Commit cfa49c1e authored by Eric M. Ludlam, committed by Glenn Morris

Import bovine manual from CEDET trunk

and preceding discussion

Imported from
parent 9b97b143
2012-12-13 Eric Ludlam <>
David Ponce <>
Richard Kim <>
* bovine.texi: New file, imported from CEDET trunk.
2012-12-12 Glenn Morris <>
* flymake.texi (Customizable variables, Locating the buildfile):
\input texinfo @c -*-texinfo-*-
@c %**start of header
@set TITLE Bovine parser development
@set AUTHOR Eric M. Ludlam, David Ponce, and Richard Y. Kim
@settitle @value{TITLE}
@c *************************************************************************
@c @ Header
@c *************************************************************************
@c Merge all indexes into a single index for now.
@c We can always separate them later into two or more as needed.
@syncodeindex vr cp
@syncodeindex fn cp
@syncodeindex ky cp
@syncodeindex pg cp
@syncodeindex tp cp
@c @footnotestyle separate
@c @paragraphindent 2
@c @@smallbook
@c %**end of header
This manual documents Bovine parser development in Semantic
Copyright @copyright{} 1999, 2000, 2001, 2002, 2003, 2004 Eric M. Ludlam
Copyright @copyright{} 2001, 2002, 2003, 2004 David Ponce
Copyright @copyright{} 2002, 2003 Richard Y. Kim
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1 or
any later version published by the Free Software Foundation; with the
Invariant Sections being list their titles, with the Front-Cover Texts
being list, and with the Back-Cover Texts being list. A copy of the
license is included in the section entitled ``GNU Free Documentation
@end quotation
@end copying
@dircategory Emacs
* Semantic bovine parser development: (bovine).
@end direntry
@end ifinfo
@end iftex
@c @setchapternewpage odd
@c @setchapternewpage off
This file documents parser development with the bovine parser generator
@emph{Infrastructure for parser based text analysis in Emacs}
Copyright @copyright{} 1999, 2000, 2001, 2002, 2003, 2004 @value{AUTHOR}
@end ifinfo
@sp 10
@title @value{TITLE}
@author by @value{AUTHOR}
@vskip 0pt plus 1 fill
Copyright @copyright{} 1999, 2000, 2001, 2002, 2003, 2004 @value{AUTHOR}
@vskip 0pt plus 1 fill
@end titlepage
@c MACRO inclusion
@include semanticheader.texi
@c *************************************************************************
@c @ Document
@c *************************************************************************
@node top
@top @value{TITLE}
The @dfn{bovine} parser is the original @semantic{} parser, and is an
implementation of an @acronym{LL} parser. It is good for simple
languages. It has many conveniences making grammar writing easy. The
conveniences make it less powerful than a Bison-like @acronym{LALR}
parser. For more information, @inforef{top, the Wisent Parser Manual,
Bovine @acronym{LL} grammars are stored in files with a @file{.by}
extension. When compiled, the contents are converted into a file of
the form @file{NAME-by.el}. This, in turn, is byte compiled.
@inforef{top, Grammar Framework Manual, grammar-fw}.
@menu
* Starting Rules:: The starting rules for the grammar.
* Bovine Grammar Rules:: Rules used to parse a language
* Optional Lambda Expression:: Actions to take when a rule is matched
* Bovine Examples:: Simple Samples
* GNU Free Documentation License::
* Index::
@end menu
@node Starting Rules
@chapter Starting Rules
In Bison, one and only one nonterminal is designated as the ``start''
symbol. In @semantic{}, one or more nonterminals can be designated as
the ``start'' symbol. They are declared following the @code{%start}
keyword separated by spaces. @inforef{start Decl, ,grammar-fw}.
If no @code{%start} keyword is used in a grammar, then the very first
nonterminal is used. Internally the first start nonterminal is targeted by the
reserved symbol @code{bovine-toplevel}, so it can be found by the
parser harness.
To find locally defined variables, the local context handler needs to
parse the body of functional code. The @code{scopestart} declaration
specifies the name of a nonterminal used as the goal to parse a local
context, @inforef{scopestart Decl, ,grammar-fw}. Internally the
scopestart nonterminal is targeted by the reserved symbol
@code{bovine-inner-scope}, so it can be found by the parser harness.
@node Bovine Grammar Rules
@chapter Bovine Grammar Rules
The rules are what allow the compiler to create tags from a language
file. Once the setup is done in the prologue, you can start writing
rules. @inforef{Grammar Rules, ,grammar-fw}.
@example
@var{result} : @var{components1} @var{optional-semantic-action1}
| @var{components2} @var{optional-semantic-action2}
@end example
@var{result} is a nonterminal, that is a symbol synthesized in your grammar.
@var{components} is a list of elements that are to be matched if @var{result}
is to be made. @var{optional-semantic-action} is an optional sequence
of simplified Emacs Lisp expressions for concocting the parse tree.
In Bison, each time an element of @var{components} is found, it is
@dfn{shifted} onto the parser stack. (The stack of matched elements.)
When all @var{components}' elements have been matched, it is
@dfn{reduced} to @var{result}. @xref{(bison)Algorithm}.
A particular @var{result} written into your grammar becomes
the parser's goal. It is designated by a @code{%start} statement
(@pxref{Starting Rules}). The value returned by the associated
@var{optional-semantic-action} is the parser's result. It should be
a tree of @semantic{} @dfn{tags}, @inforef{Semantic Tags, ,
@var{components} is made up of symbols. A symbol such as @code{FOO}
means that a syntactic token of class @code{FOO} must be matched.
@menu
* How Lexical Tokens Match::
* Grammar-to-Lisp Details::
* Order of components in rules::
@end menu
@node How Lexical Tokens Match
@section How Lexical Tokens Match
A lexical rule must be used to define how to match a lexical token.
For instance:
@example
%keyword FOO "foo"
@end example
This means that @code{FOO} is a reserved language keyword, matched as such
by looking up into a keyword table, @inforef{keyword Decl,
,grammar-fw}. This is because @code{"foo"} will be converted to
@code{FOO} in the lexical analysis stage. Thus the symbol @code{FOO}
won't be available any other way.
If we specify our token in this way:
@example
%token <symbol> FOO "foo"
@end example
then @code{FOO} will match the string @code{"foo"} explicitly, but it
won't do so at the lexical level, allowing use of the text
@code{"foo"} in other forms of regular expressions.
In that case, @code{FOO} is a @code{symbol}-type token. To match, a
@code{symbol} must first be encountered, and then it must
@code{string-match "foo"}.
@table @strong
@item Caution:
Be especially careful to remember that @code{"foo"}, and more
generally the %token's match-value string, is a regular expression!
@end table
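As a hedged illustration of the caution above, assuming (as stated
earlier in this section) that the match-value is applied with an
unanchored @code{string-match}, plain Emacs Lisp shows how a pattern
can match more text than intended:

@example
;; An unanchored regexp matches anywhere in the token text.
(string-match "foo" "foobar")        ; => 0
(string-match "foo" "buffoon")       ; => 3
;; Anchoring with \` and \' restricts it to the whole string.
(string-match "\\`foo\\'" "foobar")  ; => nil
(string-match "\\`foo\\'" "foo")     ; => 0
;; "." is a regexp wildcard; "[.]" matches only a literal period.
(string-match "." "x")               ; => 0
(string-match "[.]" "x")             ; => nil
@end example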
Non-symbol tokens are also allowed. For example:
@example
%token <punctuation> PERIOD "[.]"
filename : symbol PERIOD symbol
@end example
@code{PERIOD} is a @code{punctuation}-type token that will explicitly
match one period when used in the above rule.
@table @strong
@item Please Note:
@code{symbol}, @code{punctuation}, etc., are predefined lexical token
types, based on the @dfn{syntax class}-character associations
currently in effect.
@end table
@node Grammar-to-Lisp Details
@section Grammar-to-Lisp Details
For the bovinator, lexical token matching patterns are @emph{inlined}.
When the grammar-to-lisp converter encounters a lexical token
declaration of the form:
@example
%token <@var{type}> @var{token-name} @var{match-value}
@end example
It substitutes every occurrence of @var{token-name} in rules with its
expanded form:
@example
@var{type} @var{match-value}
@end example
For example:
@example
%token <symbol> MOOSE "moose"
find_a_moose: MOOSE
@end example
This generates the pseudo-equivalent rule:
@example
find_a_moose: symbol "moose" ;; invalid syntax!
@end example
Thus, from the bovinator's point of view, the @var{components} part of a
rule is made up of symbols and strings. A string in the mix means
that the previous symbol must have the additional constraint of
exactly matching it, as described in @ref{How Lexical Tokens Match}.
@table @strong
@item Please Note:
For the bovinator, this task was mixed into the language definition to
simplify implementation, though Bison's technique is more efficient.
@end table
@node Order of components in rules
@section Order of components in rules
If a rule has multiple components, order is important. For example:
@example
headerfile : symbol PERIOD symbol
| symbol
@end example
would match @samp{foo.h} or the @acronym{C++} header @samp{foo}.
The bovine parser will first attempt to match the long form, and then
the short form. If they were in reverse order, then the long form
would never be tested.
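The effect of ordering can be sketched in plain Emacs Lisp. This is a
toy model only, not the bovinator's implementation: the hypothetical
function @code{my-first-alternative} tries each alternative in turn
against the front of a list of token classes and stops at the first
one that fits.

@example
(require 'seq)

(defun my-first-alternative (alternatives tokens)
  "Return the first of ALTERNATIVES that matches the start of TOKENS.
Both arguments are lists of token-class symbols.  This is an
illustrative helper, not part of Semantic."
  (seq-find (lambda (alt)
              (equal alt (seq-take tokens (length alt))))
            alternatives))

;; Long form first: the whole of foo.h is consumed.
(my-first-alternative '((symbol punctuation symbol) (symbol))
                      '(symbol punctuation symbol))
;; => (symbol punctuation symbol)

;; Short form first: it matches at once, so the long form is never tried.
(my-first-alternative '((symbol) (symbol punctuation symbol))
                      '(symbol punctuation symbol))
;; => (symbol)
@end example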
@c @xref{Default syntactic tokens}.
@node Optional Lambda Expression
@chapter Optional Lambda Expressions
The @acronym{OLE} (@dfn{Optional Lambda Expression}) is converted into
a bovine lambda. This lambda has special short-cuts to simplify
reading the semantic action definition. An @acronym{OLE} like this:
@example
( $1 )
@end example
results in a lambda return which consists entirely of the string
or object found by matching the first (zeroth) element of match.
An @acronym{OLE} like this:
@example
( ,(foo $1) )
@end example
executes @code{foo} on the first argument, and then splices its return
value into the return list, whereas:
@example
( (foo $1) )
@end example
executes @code{foo}, and that is placed in the return list.
Here are other things that can appear inline:
@table @code
@item $1
The first object matched.
@item ,$1
The first object spliced into the list (assuming it is a list from a
@item '$1
The first object matched, placed in a list, i.e., @code{( $1 )}.
@item foo
The symbol @code{foo} (exactly as displayed).
@item (foo)
A function call to @code{foo}, whose result is placed in the return list.
@item ,(foo)
A function call to @code{foo}, whose result is spliced into the return list.
@item '(foo)
A function call to @code{foo}, whose result is wrapped in a list and placed
in the return list.
@item (EXPAND @var{$1} @var{nonterminal} @var{depth})
A list starting with @code{EXPAND} performs a recursive parse on the
token passed to it (represented by @samp{$1} above.) The
@dfn{semantic list} is a common token to expand, as there are often
interesting things in the list. The @var{nonterminal} is a symbol in
your table which the bovinator will start with when parsing.
@var{nonterminal}'s definition is the same as any other nonterminal.
@var{depth} should be at least @samp{1} when descending into a
semantic list.
@item (EXPANDFULL @var{$1} @var{nonterminal} @var{depth})
Is like @code{EXPAND}, except that the parser will iterate over
@var{nonterminal} until there are no more matches. (This is the same way the
parser iterates over the starting rule; @pxref{Starting Rules}.) This
lets you have much simpler rules in this specific case, and also lets
you have positional information in the returned tokens, and error
@item (ASSOC @var{symbol1} @var{value1} @var{symbol2} @var{value2} @dots{})
This is used for creating an association list. Each @var{symbol} is
included in the list if the associated @var{value} is non-@code{nil}.
While the items are all listed explicitly, the created structure is an
association list of the form:
@example
((@var{symbol1} . @var{value1}) (@var{symbol2} . @var{value2}) @dots{})
@end example
@item (TAG @var{name} @var{class} [@var{attributes}])
This creates one tag in the current buffer.
@table @var
@item name
Is a string that represents the tag in the language.
@item class
Is the kind of tag being created, such as @code{function} or
@code{variable}, though any symbol will work.
@item attributes
Is an optional set of labeled values such as @w{@code{:constant-flag t :parent
@end table
@item (TAG-VARIABLE @var{name} @var{type} @var{default-value} [@var{attributes}])
@itemx (TAG-FUNCTION @var{name} @var{type} @var{arg-list} [@var{attributes}])
@itemx (TAG-TYPE @var{name} @var{type} @var{members} @var{parents} [@var{attributes}])
@itemx (TAG-INCLUDE @var{name} @var{system-flag} [@var{attributes}])
@itemx (TAG-PACKAGE @var{name} @var{detail} [@var{attributes}])
@itemx (TAG-CODE @var{name} @var{detail} [@var{attributes}])
Create a tag with @var{name} of respectively the class
@code{variable}, @code{function}, @code{type}, @code{include},
@code{package}, and @code{code}.
See @inforef{Creating Tags, , semantic-appdev} for the Lisp
functions these translate into.
@end table
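The behavior described for @code{ASSOC} in the table above can be
modeled in plain Emacs Lisp. The function below is a hypothetical
stand-in written for this illustration, not the code the grammar
compiler emits; it only shows that pairs whose value is @code{nil} are
left out of the result.

@example
(defun my-assoc-sketch (&rest args)
  "Build an alist from alternating SYMBOL VALUE pairs in ARGS.
Pairs whose VALUE is nil are skipped."
  (let (result)
    (while args
      (let ((sym (pop args))
            (val (pop args)))
        (when val
          (push (cons sym val) result))))
    (nreverse result)))

(my-assoc-sketch 'const t 'pointer nil 'parent "base")
;; => ((const . t) (parent . "base"))
@end example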
If the symbol @code{%quotemode backquote} is specified, then use
@code{,@@} to splice a list in, and @code{,} to evaluate the expression.
This lets you send @code{$1} as a symbol into a list instead of having
it expanded inline.
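For comparison, ordinary Emacs Lisp backquote behaves as follows.
This is standard Lisp, shown only as an analogy for the two operators;
it is not grammar action syntax.

@example
(let ((x '("A")))
  (list `(,x)      ; "," inserts the value as one element
        `(,@@x)))  ; ",@@" splices the elements in
;; => ((("A")) ("A"))
@end example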
@node Bovine Examples
@chapter Examples
The rule:
@example
any-symbol: symbol
@end example
is equivalent to
@example
any-symbol: symbol
( $1 )
@end example
which, if it matched the string @samp{"A"}, would return
@example
( "A" )
@end example
If this rule were used like this:
@example
%token <punctuation> EQUAL "="
assign: any-symbol EQUAL any-symbol
( $1 $3 )
@end example
it would match @samp{"A=B"}, and return
@example
( ("A") ("B") )
@end example
The letters @samp{A} and @samp{B} come back in lists because
@samp{any-symbol} is a nonterminal, not an actual lexical element.
To get a better result with nonterminals, use @asis{,} to splice lists
in like this:
@example
%token <punctuation> EQUAL "="
assign: any-symbol EQUAL any-symbol
( ,$1 ,$3 )
@end example
which would return
@example
( "A" "B" )
@end example
@node GNU Free Documentation License
@appendix GNU Free Documentation License
@include fdl.texi
@node Index
@unnumbered Index
@printindex cp
@end iftex
@c Following comments are for the benefit of ispell.
@c LocalWords: bovinator inlined