bovine.texi 13.6 KB
Newer Older
1 2
\input texinfo  @c -*-texinfo-*-
@c %**start of header
3
@setfilename ../../info/bovine
4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
@set TITLE  Bovine parser development
@set AUTHOR Eric M. Ludlam, David Ponce, and Richard Y. Kim
@settitle @value{TITLE}

@c *************************************************************************
@c @ Header
@c *************************************************************************

@c Merge all indexes into a single index for now.
@c We can always separate them later into two or more as needed.
@syncodeindex vr cp
@syncodeindex fn cp
@syncodeindex ky cp
@syncodeindex pg cp
@syncodeindex tp cp

@c @footnotestyle separate
@c @paragraphindent 2
@c @@smallbook
@c %**end of header

@copying
26
Copyright @copyright{} 1999-2004, 2012 Free Software Foundation, Inc.
27 28 29

@quotation
Permission is granted to copy, distribute and/or modify this document
30 31 32 33 34 35 36 37 38
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, with the Front-Cover texts being ``A GNU Manual,''
and with the Back-Cover Texts as in (a) below.  A copy of the license
is included in the section entitled ``GNU Free Documentation License''.

(a) The FSF's Back-Cover Text is: ``You have the freedom to copy and
modify this GNU manual.  Buying copies from the FSF supports it in
developing GNU and promoting software freedom.''
39 40 41
@end quotation
@end copying

42
@dircategory Emacs misc features
43
@direntry
44
* Bovine: (bovine).             Semantic bovine parser development.
45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
@end direntry

@iftex
@finalout
@end iftex

@c @setchapternewpage odd
@c @setchapternewpage off

@titlepage
@sp 10
@title @value{TITLE}
@author by @value{AUTHOR}
@page
@vskip 0pt plus 1 fill
@insertcopying
@end titlepage
@page

64 65 66
@macro semantic{}
@i{Semantic}
@end macro
67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87

@c *************************************************************************
@c @ Document
@c *************************************************************************
@contents

@node top
@top @value{TITLE}

The @dfn{bovine} parser is the original @semantic{} parser, and is an
implementation of an @acronym{LL} parser.  It is good for simple
languages.  It has many conveniences making grammar writing easy.  The
conveniences make it less powerful than a Bison-like @acronym{LALR}
parser.  For more information, @inforef{top, the Wisent Parser Manual,
wisent}.

Bovine @acronym{LL} grammars are stored in files with a @file{.by}
extension.  When compiled, the contents is converted into a file of
the form @file{NAME-by.el}.  This, in turn is byte compiled.
@inforef{top, Grammar Framework Manual, grammar-fw}.

88 89 90 91
@ifnottex
@insertcopying
@end ifnottex

92 93
@menu
* Starting Rules::              The starting rules for the grammar.
94 95 96 97 98
* Bovine Grammar Rules::        Rules used to parse a language.
* Optional Lambda Expression::  Actions to take when a rule is matched.
* Bovine Examples::             Simple Samples.
* GNU Free Documentation License::  The license for this documentation.
@c * Index::
99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154
@end menu

@node Starting Rules
@chapter Starting Rules

In Bison, one and only one nonterminal is designated as the ``start''
symbol.  In @semantic{}, one or more nonterminals can be designated as
the ``start'' symbol.  They are declared following the @code{%start}
keyword separated by spaces.  @inforef{start Decl, ,grammar-fw}.

If no @code{%start} keyword is used in a grammar, then the very first
is used.  Internally the first start nonterminal is targeted by the
reserved symbol @code{bovine-toplevel}, so it can be found by the
parser harness.

To find locally defined variables, the local context handler needs to
parse the body of functional code.  The @code{scopestart} declaration
specifies the name of a nonterminal used as the goal to parse a local
context, @inforef{scopestart Decl, ,grammar-fw}.  Internally the
scopestart nonterminal is targeted by the reserved symbol
@code{bovine-inner-scope}, so it can be found by the parser harness.

@node Bovine Grammar Rules
@chapter Bovine Grammar Rules

The rules are what allow the compiler to create tags from a language
file.  Once the setup is done in the prologue, you can start writing
rules.  @inforef{Grammar Rules, ,grammar-fw}.

@example
@var{result} : @var{components1} @var{optional-semantic-action1})
       | @var{components2} @var{optional-semantic-action2}
       ;
@end example

@var{result} is a nonterminal, that is a symbol synthesized in your grammar.
@var{components} is a list of elements that are to be matched if @var{result}
is to be made.  @var{optional-semantic-action} is an optional sequence
of simplified Emacs Lisp expressions for concocting the parse tree.

In bison, each time an element of @var{components} is found, it is
@dfn{shifted} onto the parser stack.  (The stack of matched elements.)
When all @var{components}' elements have been matched, it is
@dfn{reduced} to @var{result}.  @xref{(bison)Algorithm}.

A particular @var{result} written into your grammar becomes
the parser's goal.  It is designated by a @code{%start} statement
(@pxref{Starting Rules}).  The value returned by the associated
@var{optional-semantic-action} is the parser's result.  It should be
a tree of @semantic{} @dfn{tags}, @inforef{Semantic Tags, ,
semantic-appdev}.

@var{components} is made up of symbols.  A symbol such as @code{FOO}
means that a syntactic token of class @code{FOO} must be matched.

@menu
155 156 157
* How Lexical Tokens Match::
* Grammar-to-Lisp Details::
* Order of components in rules::
158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456
@end menu

@node How Lexical Tokens Match
@section How Lexical Tokens Match

A lexical rule must be used to define how to match a lexical token.

For instance:

@example
%keyword FOO "foo"
@end example

Means that @code{FOO} is a reserved language keyword, matched as such
by looking up into a keyword table, @inforef{keyword Decl,
,grammar-fw}.  This is because @code{"foo"} will be converted to
@code{FOO} in the lexical analysis stage.  Thus the symbol @code{FOO}
won't be available any other way.

If we specify our token in this way:

@example
%token <symbol> FOO "foo"
@end example

then @code{FOO} will match the string @code{"foo"} explicitly, but it
won't do so at the lexical level, allowing use of the text
@code{"foo"} in other forms of regular expressions.

In that case, @code{FOO} is a @code{symbol}-type token.  To match, a
@code{symbol} must first be encountered, and then it must
@code{string-match "foo"}.

@table @strong
@item Caution:
Be especially careful to remember that @code{"foo"}, and more
generally the %token's match-value string, is a regular expression!
@end table

Non symbol tokens are also allowed.  For example:

@example
%token <punctuation> PERIOD "[.]"

filename : symbol PERIOD symbol
         ;
@end example

@code{PERIOD} is a @code{punctuation}-type token that will explicitly
match one period when used in the above rule.

@table @strong
@item Please Note:
@code{symbol}, @code{punctuation}, etc., are predefined lexical token
types, based on the @dfn{syntax class}-character associations
currently in effect.
@end table

@node Grammar-to-Lisp Details
@section Grammar-to-Lisp Details

For the bovinator, lexical token matching patterns are @emph{inlined}.
When the grammar-to-lisp converter encounters a lexical token
declaration of the form:

@example
%token <@var{type}> @var{token-name} @var{match-value}
@end example

It substitutes every occurrences of @var{token-name} in rules, by its
expanded form:

@example
@var{type} @var{match-value}
@end example

For example:

@example
%token <symbol> MOOSE "moose"

find_a_moose: MOOSE
            ;
@end example

Will generate this pseudo equivalent-rule:

@example
find_a_moose: symbol "moose"   ;; invalid syntax!
            ;
@end example

Thus, from the bovinator point of view, the @var{components} part of a
rule is made up of symbols and strings.  A string in the mix means
that the previous symbol must have the additional constraint of
exactly matching it, as described in @ref{How Lexical Tokens Match}.

@table @strong
@item Please Note:
For the bovinator, this task was mixed into the language definition to
simplify implementation, though Bison's technique is more efficient.
@end table

@node Order of components in rules
@section Order of components in rules

If a rule has multiple components, order is important, for example

@example
headerfile : symbol PERIOD symbol
           | symbol
           ;
@end example

would match @samp{foo.h} or the @acronym{C++} header @samp{foo}.
The bovine parser will first attempt to match the long form, and then
the short form.  If they were in reverse order, then the long form
would never be tested.

@c @xref{Default syntactic tokens}.

@node Optional Lambda Expression
@chapter Optional Lambda Expressions

The @acronym{OLE} (@dfn{Optional Lambda Expression}) is converted into
a bovine lambda.  This lambda has special short-cuts to simplify
reading the semantic action definition.  An @acronym{OLE} like this:

@example
( $1 )
@end example

results in a lambda return which consists entirely of the string
or object found by matching the first (zeroth) element of match.
An @acronym{OLE} like this:

@example
( ,(foo $1) )
@end example

executes @code{foo} on the first argument, and then splices its return
into the return list whereas:

@example
( (foo $1) )
@end example

executes @code{foo}, and that is placed in the return list.

Here are other things that can appear inline:

@table @code
@item $1
The first object matched.

@item ,$1
The first object spliced into the list (assuming it is a list from a
non-terminal).

@item '$1
The first object matched, placed in a list.  i.e. @code{( $1 )}.

@item foo
The symbol @code{foo} (exactly as displayed).

@item (foo)
A function call to foo which is stuck into the return list.

@item ,(foo)
A function call to foo which is spliced into the return list.

@item '(foo)
A function call to foo which is stuck into the return list in a list.

@item (EXPAND @var{$1} @var{nonterminal} @var{depth})
A list starting with @code{EXPAND} performs a recursive parse on the
token passed to it (represented by @samp{$1} above.)  The
@dfn{semantic list} is a common token to expand, as there are often
interesting things in the list.  The @var{nonterminal} is a symbol in
your table which the bovinator will start with when parsing.
@var{nonterminal}'s definition is the same as any other nonterminal.
@var{depth} should be at least @samp{1} when descending into a
semantic list.

@item (EXPANDFULL @var{$1} @var{nonterminal} @var{depth})
Is like @code{EXPAND}, except that the parser will iterate over
@var{nonterminal} until there are no more matches.  (The same way the
parser iterates over the starting rule (@pxref{Starting Rules}). This
lets you have much simpler rules in this specific case, and also lets
you have positional information in the returned tokens, and error
skipping.

@item (ASSOC @var{symbol1} @var{value1} @var{symbol2} @var{value2} @dots{})
This is used for creating an association list.  Each @var{symbol} is
included in the list if the associated @var{value} is non-@code{nil}.
While the items are all listed explicitly, the created structure is an
association list of the form:

@example
((@var{symbol1} . @var{value1}) (@var{symbol2} . @var{value2}) @dots{})
@end example

@item (TAG @var{name} @var{class} [@var{attributes}])
This creates one tag in the current buffer.

@table @var
@item name
Is a string that represents the tag in the language.

@item class
Is the kind of tag being create, such as @code{function}, or
@code{variable}, though any symbol will work.

@item attributes
Is an optional set of labeled values such as @w{@code{:constant-flag t :parent
"parenttype"}}.
@end table

@item  (TAG-VARIABLE @var{name} @var{type} @var{default-value} [@var{attributes}])
@itemx (TAG-FUNCTION @var{name} @var{type} @var{arg-list} [@var{attributes}])
@itemx (TAG-TYPE @var{name} @var{type} @var{members} @var{parents} [@var{attributes}])
@itemx (TAG-INCLUDE @var{name} @var{system-flag} [@var{attributes}])
@itemx (TAG-PACKAGE @var{name} @var{detail} [@var{attributes}])
@itemx (TAG-CODE @var{name} @var{detail} [@var{attributes}])
Create a tag with @var{name} of respectively the class
@code{variable}, @code{function}, @code{type}, @code{include},
@code{package}, and @code{code}.
See @inforef{Creating Tags, , semantic-appdev} for the lisp
functions these translate into.
@end table

If the symbol @code{%quotemode backquote} is specified, then use
@code{,@@} to splice a list in, and @code{,} to evaluate the expression.
This lets you send @code{$1} as a symbol into a list instead of having
it expanded inline.

@node Bovine Examples
@chapter Examples

The rule:

@example
any-symbol: symbol
          ;
@end example

is equivalent to

@example
any-symbol: symbol
            ( $1 )
          ;
@end example

which, if it matched the string @samp{"A"}, would return

@example
( "A" )
@end example

If this rule were used like this:

@example
%token <punctuation> EQUAL "="
@dots{}
assign: any-symbol EQUAL any-symbol
        ( $1 $3 )
      ;
@end example

it would match @samp{"A=B"}, and return

@example
( ("A") ("B") )
@end example

The letters @samp{A} and @samp{B} come back in lists because
@samp{any-symbol} is a nonterminal, not an actual lexical element.

To get a better result with nonterminals, use @asis{,} to splice lists
in like this:

@example
%token <punctuation> EQUAL "="
@dots{}
assign: any-symbol EQUAL any-symbol
        ( ,$1 ,$3 )
      ;
@end example

which would return

@example
( "A" "B" )
@end example

@node GNU Free Documentation License
@appendix GNU Free Documentation License

457
@include doclicense.texi
458

459 460
@c There is nothing to index at the moment.
@ignore
461 462 463
@node Index
@unnumbered Index
@printindex cp
464
@end ignore
465 466 467 468 469 470 471 472 473 474 475

@iftex
@contents
@summarycontents
@end iftex

@bye

@c Following comments are for the benefit of ispell.

@c  LocalWords:  bovinator inlined