Lexical Analysis
nongdamgom
2024. 4. 18. 14:39
Tokens
- A token is a syntactic category.
- categories: identifier / number / operator ...
- ex) the identifier A
- => token name = identifier / token value = A
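A token can be sketched as a (name, value) pair. The class name `Token` and its field names are hypothetical, chosen only to mirror the "token name / token value" wording above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical token representation: a (name, value) pair."""
    name: str   # syntactic category, e.g. "IDENTIFIER"
    value: str  # attribute value, here the matched text, e.g. "A"


tok = Token(name="IDENTIFIER", value="A")
print(tok)  # Token(name='IDENTIFIER', value='A')
```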
Lexemes
- A lexeme is a sequence of characters that matches the pattern for a token.
- ex) token name = identifier / lexemes = pi, score, i, j, k ...
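One token name covers many lexemes, which can be checked with a regular expression. The identifier pattern below (a letter or underscore followed by letters, digits, or underscores) is an assumed C-style rule, not one given in these notes, and `is_identifier_lexeme` is a hypothetical helper.

```python
import re

# Assumed C-style identifier pattern (not specified in the notes).
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_identifier_lexeme(s: str) -> bool:
    """Return True if the whole string s is a lexeme of the IDENTIFIER token."""
    return IDENTIFIER.fullmatch(s) is not None


for lexeme in ["pi", "score", "i", "j", "k", "3x", "a+b"]:
    print(lexeme, is_identifier_lexeme(lexeme))
```

pi, score, i, j and k all match the same pattern, so they are all lexemes of the single token name IDENTIFIER; "3x" and "a+b" are not.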
Classes of tokens
- keywords: IF, ELSE, FLOAT, CHAR ...
- operators: ADD, COMPARISON ...
- identifiers: all kinds of identifiers
- constants (e.g., numeric constants): INTEGER, REAL, LITERAL
- punctuation symbols: LPAREN for "(", COMMA for ","
- whitespace: carries no meaning and is ignored.
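A minimal regex-driven tokenizer sketch that sorts lexemes into the classes above and throws whitespace away. The keyword table, the token names, and every pattern are assumptions for a toy language. Keywords are handled by matching the identifier pattern first and then looking the lexeme up in a keyword table. This is a common technique, but not necessarily the one used in the course. Python's `re` tries alternatives left to right and takes the first one that matches, not the longest, so the order of `TOKEN_SPEC` matters.

```python
import re
from typing import Iterator, Tuple

# Hypothetical keyword table for a toy language: lexeme -> token name.
KEYWORDS = {"if": "IF", "else": "ELSE", "float": "FLOAT", "char": "CHAR"}

# (token name, pattern) pairs. Alternatives are tried left to right,
# so REAL must precede INTEGER and ">=" / "<=" must precede ">" / "<".
TOKEN_SPEC = [
    ("WHITESPACE", r"[ \t\n]+"),
    ("REAL",       r"[0-9]+\.[0-9]+"),
    ("INTEGER",    r"[0-9]+"),
    ("LITERAL",    r'"[^"\n]*"'),
    ("ID",         r"[A-Za-z_][A-Za-z0-9_]*"),
    ("COMPARISON", r"==|!=|<=|>=|<|>"),
    ("ADD",        r"\+"),
    ("LPAREN",     r"\("),
    ("RPAREN",     r"\)"),
    ("COMMA",      r","),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(src: str) -> Iterator[Tuple[str, str]]:
    """Yield (token name, lexeme) pairs, skipping whitespace."""
    pos = 0
    while pos < len(src):
        m = MASTER.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at offset {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "WHITESPACE":
            continue  # whitespace carries no meaning, so it is discarded
        if kind == "ID":
            # Keywords share the identifier pattern; a table lookup separates them.
            kind = KEYWORDS.get(lexeme, "IDENTIFIER")
        yield kind, lexeme


print(list(tokenize('if (score >= 3.14) x + 1, "hi"')))
```

On the sample input this yields IF, LPAREN, IDENTIFIER, COMPARISON, REAL, RPAREN, IDENTIFIER, ADD, INTEGER, COMMA, LITERAL. The keyword lookup also means that `iffy` comes out as IDENTIFIER rather than IF followed by an identifier.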
How do we specify the patterns for tokens?
=> Regular languages (written as regular expressions)
How do we recognize tokens in the input stream?
=> Finite automata
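To show how a finite automaton recognizes a token, here is a sketch of a hand-coded DFA for identifiers. It assumes a C-style rule: a letter or underscore, followed by any number of letters, digits, or underscores. The state names, the input classes, and the `DELTA` table are hypothetical. Lexer generators such as lex/flex build equivalent tables automatically from regular expressions.

```python
from typing import Optional

# States of a DFA for identifiers: letter (letter | digit)*, with "_" as a letter.
START, IN_ID = "START", "IN_ID"
ACCEPTING = {IN_ID}


def char_class(c: str) -> str:
    """Map a character to one of the DFA's input classes."""
    if c.isascii() and (c.isalpha() or c == "_"):
        return "letter"
    if c.isascii() and c.isdigit():
        return "digit"
    return "other"


# Transition function delta(state, class); a missing entry means the dead state.
DELTA = {
    (START, "letter"): IN_ID,
    (IN_ID, "letter"): IN_ID,
    (IN_ID, "digit"): IN_ID,
}


def accepts(s: str) -> bool:
    """Run the DFA over s; accept iff it ends in an accepting state."""
    state: Optional[str] = START
    for c in s:
        state = DELTA.get((state, char_class(c)))
        if state is None:  # fell into the dead state
            return False
    return state in ACCEPTING
```

For example, `accepts("x1")` is True, while `accepts("1x")` and `accepts("")` are False, because START is not an accepting state.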
Definitions: alphabet, string, language
- alphabet Σ: any finite set of symbols
- string s: a finite sequence of symbols from Σ. If Σ = {a, b}, s = a, b, aa, ab, ba ...
- language L: any set of strings over some fixed alphabet Σ
- => If Σ = {a, b}, L1 = {a, ab, ba, aba} => finite L
- => If Σ = {a, b}, L2 = {a, ab, ba, aba, aaa, ...} => infinite L
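The definitions above can be made concrete in a few lines. This sketch enumerates strings over Σ = {a, b} up to a given length and stores the finite language L1 as a set. An infinite language cannot be stored element by element, so it is given a finite description instead, here a membership predicate. The notes do not say which infinite language L2 is meant. The predicate `in_L2` ("strings over Σ containing at least one a") is an assumption that happens to include every listed element. The helper names are also hypothetical.

```python
from itertools import product
from typing import Iterator, Tuple

SIGMA: Tuple[str, ...] = ("a", "b")  # alphabet: a finite set of symbols


def strings_up_to(n: int, sigma: Tuple[str, ...] = SIGMA) -> Iterator[str]:
    """Yield every string over sigma of length 0..n, shortest first.
    Length 0 gives the empty string (usually written ε)."""
    for length in range(n + 1):
        for symbols in product(sigma, repeat=length):
            yield "".join(symbols)


# A finite language can be stored explicitly as a set.
L1 = {"a", "ab", "ba", "aba"}


# An infinite language needs a finite description, e.g. a predicate.
# Assumed example: all strings over SIGMA that contain at least one "a".
def in_L2(s: str) -> bool:
    return all(c in SIGMA for c in s) and "a" in s


print(list(strings_up_to(2)))                     # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
print([s for s in strings_up_to(3) if s in L1])   # ['a', 'ab', 'ba', 'aba']
print(sum(in_L2(s) for s in strings_up_to(3)))    # 11
```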
Operations on strings
Operations on languages
Regular expressions
I don't think the rest needs to be summarized..