
Converter 3: Hand-Writing a PHP-to-Python Compiler, the Lexer

Last week I wrote "Converting ThinkPHP Templates to Flask and Django Templates".

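That left me itching to go after something bigger: converting an entire PHP program to Python. Templates let me cut corners with regex matching, but this time there is no way around writing a real PHP compiler.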

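I searched online and found that most "Python to xxx" transpilers work directly on the AST. They skip the most important parts, the Tokenizer and the Parser, and just write a Visitor. The rest are built on generators such as Antlr and end up as piles of code that are tiresome to read.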

Since nobody wants to do the grunt work, I'll give it a try and hand-write a PHP compiler in three parts: Tokenizer, Parser, and Visitor.

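I dug out the Dragon Book and the Tiger Book for reference and studied PHP carefully. Only then did I realize how many features PHP has. Writing a compiler for it is exhausting.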

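The lexer is simple: it is just an automaton. I designed a structure to hold the automaton and then programmed against it in a quick-and-dirty way, with no regard for performance. This is a one-off job.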

Writing it went fairly quickly. Debugging did not go so smoothly, but I'm not going to admit that, ha.

The automaton is not complicated, so I'm posting it here. Corrections are welcome.

self.statemachine = {
    # Bookkeeping for the scanner: current state, accumulated text, line number.
    'current': {
        'state': 'default', 'content': '', 'line': 0},
    # Outside PHP tags: only the opening tags are recognized.
    'default': [
        {'name': 'open', 'next': 'php', 'extra': 0, 'start': 0, 'end': 0, 'cache': '',
         'token': r'<\?'},
        {'name': 'open', 'next': 'php', 'extra': 0, 'start': 0, 'end': 0, 'cache': '',
         'token': r'<\?php'}],
    # Inside PHP tags: numbers, labels, comment/string openers, symbols.
    'php': [
        {'name': 'close', 'next': 'default', 'extra': 0,
         'token': r'\?>', 'start': 0, 'end': 0, 'cache': ''},
        {'name': 'lnum', 'next': '', 'extra': 0, 'start': 0, 'end': 0, 'cache': '',
         'token': r'[0-9]+'},
        {'name': 'dnum', 'next': '', 'extra': 0, 'start': 0, 'end': 0, 'cache': '',
         'token': r'([0-9]*\.[0-9]+)|([0-9]+\.[0-9]*)'},
        {'name': 'exponent', 'next': '', 'extra': 0, 'start': 0, 'end': 0, 'cache': '',
         'token': r'(([0-9]+|([0-9]*\.[0-9]+)|([0-9]+\.[0-9]*))[eE][+-]?[0-9]+)'},
        {'name': 'hnum', 'next': '', 'extra': 0, 'start': 0, 'end': 0, 'cache': '',
         'token': r'0x[0-9a-fA-F]+'},
        {'name': 'bnum', 'next': '', 'extra': 0, 'start': 0, 'end': 0, 'cache': '',
         'token': r'0b[01]+'},
        {'name': 'label', 'next': '', 'extra': 0, 'start': 0, 'end': 0, 'cache': '',
         'token': r'[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*'},
        {'name': 'comment', 'next': 'commentline', 'extra': 1,
         'token': r'//', 'start': 0, 'end': 0, 'cache': ''},
        {'name': 'comment', 'next': 'commentline', 'extra': 1,
         'token': r'#', 'start': 0, 'end': 0, 'cache': ''},
        {'name': 'comment', 'next': 'comment', 'extra': 1,
         'token': r'/\*', 'start': 0, 'end': 0, 'cache': ''},
        {'name': 'string', 'next': 'string1', 'extra': 1,
         'token': r'\'', 'start': 0, 'end': 0, 'cache': ''},
        {'name': 'string', 'next': 'string2', 'extra': 1,
         'token': r'"', 'start': 0, 'end': 0, 'cache': ''},
        # '-' is escaped so it is a literal, not a range inside the class.
        {'name': 'symbol', 'next': '', 'extra': 0, 'start': 0, 'end': 0, 'cache': '',
         'token': r'[\\\{\};:,\.\[\]\(\)\|\^&\+\-/\*=%!~$<>\?@]'}],
    # Single-quoted string body.
    'string1': [
        {'name': 'string', 'next': 'php', 'extra': 0,
         'token': r'\'', 'start': 0, 'end': 0, 'cache': ''},
        {'name': 'string', 'next': 'escape1', 'extra': 1,
         'token': r'\\', 'start': 0, 'end': 0, 'cache': ''},
        {'name': 'string', 'next': '', 'extra': 1,
         'token': r'', 'start': 0, 'end': 0, 'cache': ''}],
    'escape1': [
        {'name': 'string', 'next': 'string1', 'extra': 1,
         'token': r'.', 'start': 0, 'end': 0, 'cache': ''}],
    # Double-quoted string body: it must close on a double quote.
    'string2': [
        {'name': 'string', 'next': 'php', 'extra': 0,
         'token': r'"', 'start': 0, 'end': 0, 'cache': ''},
        {'name': 'string', 'next': 'escape2', 'extra': 1,
         'token': r'\\', 'start': 0, 'end': 0, 'cache': ''},
        {'name': 'string', 'next': '', 'extra': 1,
         'token': r'', 'start': 0, 'end': 0, 'cache': ''}],
    'escape2': [
        {'name': 'string', 'next': 'string2', 'extra': 1,
         'token': r'.', 'start': 0, 'end': 0, 'cache': ''}],
    # Line comment: '\r\n' is tried first so it is consumed as one line ending.
    'commentline': [
        {'name': 'comment', 'next': 'php', 'extra': 0,
         'token': r'(\r\n|\r|\n)', 'start': 0, 'end': 0, 'cache': ''},
        {'name': 'comment', 'next': 'php', 'extra': 0,
         'token': r'', 'start': 0, 'end': 0, 'cache': ''}],
    # Block comment body.
    'comment': [
        {'name': 'comment', 'next': 'php', 'extra': 0,
         'token': r'\*/', 'start': 0, 'end': 0, 'cache': ''},
        {'name': 'comment', 'next': '', 'extra': 1,
         'token': r'', 'start': 0, 'end': 0, 'cache': ''}]}
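The post does not show the code that drives this table, so the following is only a minimal sketch of how a table of this shape could be run, not the implementation in the zip. Everything in it is an assumption made for illustration: the trimmed-down `RULES` table, the `tokenize` function, the reading of `extra` (1 means "keep accumulating", 0 means "emit a token"), the reading of an empty `token` pattern as a one-character fallback, and the choice of the longest match when several rules apply.

```python
import re

# Hypothetical, trimmed-down table in the same shape as the one above.
# Only the fields this sketch uses are kept.
RULES = {
    'default': [
        {'name': 'open', 'next': 'php', 'extra': 0, 'token': r'<\?'},
        {'name': 'open', 'next': 'php', 'extra': 0, 'token': r'<\?php'},
    ],
    'php': [
        {'name': 'close', 'next': 'default', 'extra': 0, 'token': r'\?>'},
        {'name': 'lnum', 'next': '', 'extra': 0, 'token': r'[0-9]+'},
        {'name': 'label', 'next': '', 'extra': 0,
         'token': r'[a-zA-Z_][a-zA-Z0-9_]*'},
        {'name': 'string', 'next': 'string1', 'extra': 1, 'token': r"'"},
        {'name': 'symbol', 'next': '', 'extra': 0, 'token': r'[;=$]'},
    ],
    'string1': [
        {'name': 'string', 'next': 'php', 'extra': 0, 'token': r"'"},
        {'name': 'string', 'next': '', 'extra': 1, 'token': r''},
    ],
}


def tokenize(source):
    """Run the table over `source` and return (name, text) pairs."""
    state, pos, pending, tokens = 'default', 0, '', []
    while pos < len(source):
        best, best_text, fallback = None, '', None
        for rule in RULES[state]:
            if rule['token'] == '':
                # Assumed meaning: "any other character".
                fallback = rule
                continue
            match = re.compile(rule['token']).match(source, pos)
            # Longest match wins, so '<?php' beats '<?'.
            if match and len(match.group()) > len(best_text):
                best, best_text = rule, match.group()
        if best is None and fallback is not None:
            best, best_text = fallback, source[pos]
        if best is None:
            # Nothing applies (whitespace, or HTML outside PHP tags): skip.
            pos += 1
            continue
        pos += len(best_text)
        pending += best_text
        if best['extra'] == 0:
            # extra == 0: the token is complete, emit what was accumulated.
            tokens.append((best['name'], pending))
            pending = ''
        if best['next']:
            state = best['next']
    return tokens
```

Under these assumptions, `tokenize("<?php $a = 'hi'; ?>")` yields an `open` token, the symbols and label of the assignment, one `string` token that includes its quotes, and a `close` token. Picking the longest match also means the order of rules inside a state stops mattering, which is convenient for entries like `<\?` and `<\?php` that share a prefix.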

 

Source code: converterV0.3.zip

<To be continued>

 
