About precedence and associativity in parsers

precedence is the ability to give more or less priority to symbols. for example the expression 5 + 4 * 10 can be understood in two different way. the first one, (5 + 4) * 10 which give you the value 90, and the second one 5 + (4 10) which give you the value 45. basic mathematics taught us that the second one is the right one, and that has priority over +. in this case * has higher precedence than +.

associativity is the way to associate same thing together. for example the expression 8 / 4 * 4 can again be understood in two different way. the first one (8 / 4) * 4 which give you the value 8, and the second one 8 / (4 * 4) which gives you the value 0.5. * and / got the same precedence so you can’t determine only with this criteria. on a left-to-right association, expression associate from left to right. so 1 * 2 * 3 * 4 is actually understood as ( (1 2) 3 ) * 4. on a right-to-left association with associate term on the opposite way. 1 * 2 * 3 * 4 is understood as 1 * (2 * ( 3 * 4 ) ).

common parser generator, using LALR(1) type of parsing, work with a stack of symbol and only do two operations repetitively: reduce or shift. shift means that we take the symbol we looking at and we put it on the stack. reducing means that part of (or all) the stack can be replace by some rule.

the following grammar contains conflict with the + and * terms. conflicts means that the parser has multiple options as to how things can be parsed.

E -> E * E (rule 1)
  -> E + E (rule 2)

when parsing the following expression E + E * E, the following actions are performed:

Stack           Input           Action
                E + E * E       shift
E               + E * E         shift
E +             E * E           shift
E + E           * E             shift/reduce conflict

At this point the parser can choose either to replace the stack elements E + E by E through rule 2, or it can choose to shift the following elements through rule 1. by default usually, they choose to shift which lead into the following

Stack           Input           Action
E + E           * E             shift
E + E *         E               shift
E + E * E                       reduce (rule 1)
E + E                           reduce (rule 2)
E                               accept

that means the expression has been parsed E + (E * E).

going back to the shift choosing, and replacing it by a reduce will lead into the following:

Stack           Input           Action
E + E           * E             reduce (rule 2)
E               * E             shift
E *             E               shift
E * E                           reduce (rule 1)
E                               accept

which is (E + E) * E

in this specific case, the grammar is very easily written to get rid of the conflicts between the 2 operators. however when the number of symbols grows this is getting more and more tedious to do.

Fortunately, this is possible to actually influence the parser to do an educated guess at what to do when there’s a shift/reduce conflict instead of defaulting to some action. this is done with precedence and associativity of terminal symbols.

When the precedence of the next symbol is higher than your current stack precedence, you want to shift, and when it’s lower, reduce. when precedence is the same, for example * and /, associativity come into action. left-to-right associativity will means reduce, and right-to-left shift.

practically with bison or ocamlyacc, precedence and associativity are defined together at the beginning of the grammar file. using the %left %right %nonassoc directives.

%left directive means that the following symbols will associate from left-to-right. %right directive means right-to-left. and the %nonassoc directive means that instead of associating symbols either from left-to-right or right-to-left, we want the parser to error-out on those case, which means that the user has to write the way it’s associated explicitly (using parenthesis for examples).

The precedence is implicitely defined with the same directives using the order in which they appears in the file. the first of those directives will be the lowest precedence, and the last of those directives will be the highest precedence.

for example a subset of the C operators precedence will be written the following way:

%left ,
%right = += -= *= ...
%left == !=
%left <= < > >=
%left << >>
%left + -
%left * / %

note that sometimes some symbol have different precedence in different context. -4 * 5 need to be understood has (- 4) * 5. using the previous precedence this is not possible since the precedence of the stack after shifting “- 4” is the precedence of -, which is lower than the precedence of *. hence the parser will shift to give priority to *.

the %prec directive is useful to artificially change the precedence of the stack. in our case we want to define a dummy symbol that going to have higher precedence than *. First you need to tweak the precedence directives adding

%left + -
%left * / %
%left DUMMY_MINUS

and in your grammar the “E -> - E” rules will be rewritten “E -> - E %prec DUMMY_MINUS”.


posted by Vincent Hanquez on November 28, 2008.

tags precedence, associativity, parser, grammar.

in technical.