
Lexical Analysis

nongdamgom 2024. 4. 18. 14:39

Tokens

  • a token is a syntactic category
  • identifier / number / operator ....
  • ex) identifier A
  • => token name = identifier / token value = A

 

Lexemes

  • a lexeme is a sequence of characters that matches the pattern for a token
  • ex) token(name) = identifier / lexeme = pi, score, i, j, k....

 

Class of tokens

  • keyword : IF, ELSE, FLOAT, CHAR ...
  • operators : ADD, COMPARISON ...
  • identifiers : all kinds of identifiers
  • constants : INTEGER, REAL, LITERAL (numeric constant)
  • punctuation symbol : LPAREN ( , COMMA ,
  • whitespace : carries no meaning, so the lexer ignores it.
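
To make the token / lexeme distinction concrete, here is a minimal sketch of a tokenizer in Python. The pattern table, the `Token` type and the `tokenize` function are illustrative assumptions, not something from the lecture; a real compiler's token classes would differ.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    name: str    # token class (syntactic category), e.g. IDENTIFIER
    lexeme: str  # the actual characters that matched the pattern


# Hypothetical token classes and patterns, chosen only for illustration.
# Order matters: KEYWORD is tried before IDENTIFIER, REAL before INTEGER.
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:if|else|float|char)\b"),
    ("REAL", r"\d+\.\d+"),
    ("INTEGER", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("COMPARISON", r"==|!=|<=|>=|<|>"),
    ("ADD", r"\+"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("COMMA", r","),
    ("WHITESPACE", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text: str) -> Iterator[Token]:
    """Yield (token name, lexeme) pairs, skipping whitespace."""
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        pos = m.end()
        if m.lastgroup != "WHITESPACE":  # whitespace carries no meaning
            yield Token(m.lastgroup, m.group())


for tok in tokenize("if (score >= 10) pi + 3.14"):
    print(tok.name, tok.lexeme)
```

Here `score` and `pi` are two different lexemes that share the same token name, IDENTIFIER.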

 

 

How to specify the patterns for tokens?

=> Regular languages
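
As a rough illustration, a token pattern can be written as a regular expression, and a string is a valid lexeme for that token when the whole string matches. The two patterns below are assumptions made for this sketch (C-style identifiers and unsigned integers), not definitions from the lecture.

```python
import re

# Assumed patterns for illustration only.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")  # letter or _, then letters/digits/_
INTEGER = re.compile(r"[0-9]+")                     # one or more digits


def matches(pattern: re.Pattern, s: str) -> bool:
    """True if the entire string s belongs to the language of the pattern."""
    return pattern.fullmatch(s) is not None


print(matches(IDENTIFIER, "score"))  # True
print(matches(IDENTIFIER, "2pi"))    # False: starts with a digit
print(matches(INTEGER, "42"))        # True
```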

 

How to recognize the tokens from input streams?

=> Finite automata
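
A small sketch of the idea: a deterministic finite automaton that recognizes identifiers. The state names, the transition table and the identifier rule (an ASCII letter or `_`, followed by letters, digits or `_`) are assumptions for this example.

```python
def char_class(ch: str) -> str:
    """Map a character to the input class the automaton works with."""
    if ch == "_" or (ch.isascii() and ch.isalpha()):
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "other"


# (current state, input class) -> next state; a missing entry means "reject".
TRANSITIONS = {
    ("start", "letter"): "in_id",
    ("in_id", "letter"): "in_id",
    ("in_id", "digit"): "in_id",
}
ACCEPTING = {"in_id"}


def accepts(s: str) -> bool:
    """Run the DFA over s and report whether it ends in an accepting state."""
    state = "start"
    for ch in s:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:  # no transition defined: the input is rejected
            return False
    return state in ACCEPTING


print(accepts("score"))  # True
print(accepts("x1"))     # True
print(accepts("1x"))     # False
```

The automaton reads one character at a time and keeps only the current state, which is why recognition runs in time linear in the input length.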

 

 

 

Definition : Alphabet, string, language

  • alphabet  Σ : any finite set of symbols
  • string s : a finite sequence of symbols drawn from Σ
  • => If Σ = {a,b},  s = a, b, aa, ab, ba, ...
  • language L  : any set of strings over some fixed alphabet Σ
  • => If Σ = {a,b},  L1 = {a, ab, ba, aba}  => finite L
  • => If Σ = {a,b},  L2 = {a, ab, ba, aba, aaa, ...}  => infinite L
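
These definitions can be tried out directly. The sketch below enumerates strings over Σ = {a, b} up to a given length and checks that a set of strings is a language over Σ. The helper names `strings_over` and `is_language_over` are made up for this example, and the empty string is left out to mirror the listing above.

```python
from itertools import product


def strings_over(alphabet: set, max_len: int) -> list:
    """All strings over the alphabet with length 1..max_len, shortest first."""
    result = []
    for n in range(1, max_len + 1):
        for combo in product(sorted(alphabet), repeat=n):
            result.append("".join(combo))
    return result


def is_language_over(language: set, alphabet: set) -> bool:
    """A language over Σ is any set of strings that use only symbols of Σ."""
    return all(set(s) <= set(alphabet) for s in language)


SIGMA = {"a", "b"}
L1 = {"a", "ab", "ba", "aba"}  # the finite language from the example

print(strings_over(SIGMA, 2))       # ['a', 'b', 'aa', 'ab', 'ba', 'bb']
print(is_language_over(L1, SIGMA))  # True
```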

 

Operations on Strings

 

 

Operations on Languages

 

 

 

Regular expressions

 

The rest probably doesn't need to be summarized..