Languages 7

Lex Source to C Program
• The table is translated to a C program (lex.yy.c) which
  – reads an input stream,
  – partitions the input into strings which match the given expressions, and
  – copies it to an output stream if necessary

PLLab, NTHU, Cs2403 Programming Languages

An Overview of Lex
[Diagram: a Lex source program is processed by Lex and then by a C compiler; the resulting scanner turns input into tokens.]

Lex Source
• Lex source is separated into three sections by %% delimiters
• The general format of Lex source is
    {definitions}
    %%
    {transition rules}
    %%
    {user subroutines}
  (the first %% is required; everything else is optional)
• The absolute minimum Lex program is thus
    %%

Lex vs. Yacc
• Lex
  – Lex generates C code for a lexical analyzer, or scanner
  – Lex uses patterns that match strings in the input and converts the strings to tokens
• Yacc
  – Yacc generates C code for a syntax analyzer, or parser
  – Yacc uses grammar rules that allow it to analyze tokens from Lex and create a syntax tree

Lex with Yacc
[Diagram: Lex turns the Lex source (lexical rules) into yylex(); Yacc turns the Yacc source (grammar rules) into yyparse(). yyparse() calls yylex(), which reads the input and returns a token; the result is the parsed input.]

Regular Expressions

Lex Regular Expressions (Extended Regular Expressions)
• A regular expression matches a set of strings
• Regular expression
  – Operators
  – Character classes
  – Arbitrary character
  – Optional expressions
  – Alternation and grouping
  – Context sensitivity
  – Repetitions and definitions

Operators
    " \ [ ] ^ - ? . * + | ( ) $ / { } %
• If they are to be used as text characters, an escape should be used
    \$ = "$"
    \\ = "\"
• Every character but blank, tab (\t), newline (\n) and the list above is always a text character

Character Classes []
• [abc] matches a single character, which may be a, b, or c
• Every operator meaning is ignored except \, - and ^
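The character-class rules can be made concrete with a small hand-written matcher. The sketch below is not what Lex generates (Lex compiles patterns into tables); it is only an illustration in plain C of how a class body is read when \, - and a leading ^ are the only special characters. The function name class_matches and its interface are hypothetical, and octal escapes such as \40 are deliberately not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of Lex): returns true if character c
 * belongs to the class whose body, i.e. the text between '[' and ']',
 * is given in spec.  Only three characters are special inside a class:
 *   ^  as the first character complements the class,
 *   -  between two characters denotes a range,
 *   \  makes the following character literal.
 * Octal escapes such as \40 are NOT handled in this sketch. */
bool class_matches(const char *spec, char c)
{
    bool negate = false;
    bool found = false;

    assert(spec != NULL);

    if (*spec == '^') {                 /* leading ^ : complement */
        negate = true;
        spec++;
    }

    while (*spec != '\0') {
        char lo = *spec++;
        if (lo == '\\' && *spec != '\0')    /* \x stands for literal x */
            lo = *spec++;

        char hi = lo;                       /* single character by default */
        if (spec[0] == '-' && spec[1] != '\0') {
            spec++;                         /* skip the '-' */
            hi = *spec++;                   /* upper end of the range */
            if (hi == '\\' && *spec != '\0')
                hi = *spec++;
        }

        if (c >= lo && c <= hi)
            found = true;
    }

    return negate ? !found : found;
}
```

Under these assumptions, class_matches("abc", 'b') is true and class_matches("abc", 'd') is false, while class_matches("^a-zA-Z", '3') is true because the leading ^ complements the set of letters. A hyphen that comes first or last in the body, as in "-+0-9", is treated as an ordinary character.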
• Examples:
    [ab]       = a or b
    [a-z]      = a or b or c or … or z
    [-+0-9]    = all the digits and the two signs
    [^a-zA-Z]  = any character which is not a letter

Arbitrary Character .
• To match almost any character, the operator character . is the class of all characters except newline
• [\40-\176] matches all printable characters in the ASCII character set, from octal 40 (blank) to octal 176 (tilde ~)

Optional & […]
    […]
    %%
    main() { yylex(); […]
  – Causes the three spacing characters to be ignored
    a = b + c;
• An unmatched token gets the default action, ECHO, which copies it from the input to the output

Transition Rules (cont'd)
• REJECT
  – Go do the next alternative
    …
    %%
    pink {npink++; […] [a-z]+ ECHO; […] }
    %%
    main() { yylex(); […] '+' […] return 0; […]

    […] factor | factor ;
    […] term | term ;
    […] | ID | NUM ;
    […] factor { $$ = $1 * $3; } | ID | NUM ;
    […] '+' […] '(' […] '\n' […] '+' […] factor { $$ = $1 * $3; […] expression […] } ;
    %%

Scanner (cont'd)

YACC Command
• Yacc (AT&T) […]
    […] '(' […] NE LE GE
    %left […]
    […] } | expr […] expr { $$ = $1 […] $3; […]
• Declare the collection of data types that semantic values may have `%tok