【正文】
if Accept[state] then accept。 while not Accept[state] and not error(state) do newstate:= T[state][ch]。 end while。 break。 default: state:=error。 else {error or other cases} endif 1 2 letter letter digit [other] 3 return ID Another Implementation state:=1。 = delimits two indentifiers – Blanks, tabs, and new line are delimiters. ? Delimiters end the token string, but they may be part of the next token string. ? Delimiter is not removed from the rest of the input. – By returning it to the input (backing up). – By looking ahead before removing the delimiter. Ambiguity ? Some strings can be matched by several different regular expressions. – if and while could be either keywords or identifiers ? Language definition must give disambiguating rules. – Keyword interpretation is preferred. – Principle of longest substring. Finite Automata ? Finite automata, or finitestate machines, are a mathematical way of describing particular kinds of algorithms. ? Finite automata can be used to recognize patterns. ? There is a strong relationship between finite automata and regular expressions. ? ( regular expression FA) Example ? identifier = letter(letter|digit)* ? Circles represent states. ? The arrowed lines represent transitions upon a match of the labeled characters. ? Start state is indicated by unlabeled arrow ? Accepting states are indicated by doubleline border. 1 2 letter letter digit Deterministic Final Automata ? A DFA M consists of – An alphabet Σ – A set of states S – A transition function T: (S x Σ) S – A start state s0 S – A set of accepting states A S ? The language accepted by M, written L(M), is the set of strings of characters c1c2…c n with each ci Σ such that there exist states s1=T(s0,c1), s2=T(s1,c2), …. sn=T(sn1,) with sn an element of A. ? ? ?Error Transitions 1 2 letter digit Error other other letter By convention, the error transitions are not drawn in the diagram, but assumed to always exist. other = ~letter other = ~(letter|digit) Example ? The set of strings that contain exactly one b is accepted by the following DFA: ? ? (~a)*b(~a)* (Regular expression ) ? We will omit labels when it is not necessary to refer to the states by name. b notb notb Example ? The set of strings that contain at most one b is accepted by the following DFA: ? (it is simpler than its regular expression ) ? We will omit labels when it is not necessary to refer to the states by name. b notb notb Example ? digit = [09] ? nat = digit+ ? signedNat = (+|)? nat digit digit digit digit digit + digit Example (cont) ? number = signedNat(“.” nat)?(E signedNat)? digit digit + digit digit digit . digit digit + digit E E Example ? {Pascal Comments} ? /* C Comments */ other { } other * / * / other * Actions ? When a transitions is made, the character from the input string is moved to a string that accumulates the characters belonging to a single token (lexeme of the token). ? When an accepting state is reached, the recognized token is returned. ? When an error state is reached, an error token is generated or backtracking is done. Actions Example 1 2 letter letter digit start in_id letter letter digit [other] finish return ID [ ] indicate that the delimiting character should be returned to the input string and not consumed. Uniting DFA’s ? In a typical programming language there are many tokens, and each token is recognized by its own DFA. ? If each token begins with a different character, then it is easy to unite their start states into one start state. – Example: :=, =, = = = return ASSIGN :