Based on parser combinators.
- Node.js v12+
$ npm i
Similar to the notation used by Robert Nystrom in his Lox
|
means alternatives?
means "zero or one"*
means "zero to many"+
means "one to many"
Differences:
- syntax rules use
->
to separate head and body - lexical rules use
:
to separate head and body - lexical rules use JavaScript
RegExp
instead of natural language, and must start with^
See grammar/play.grammar
.
prog -> (functionDecl | functionCall)* ;
functionDecl -> "function" Identifier "(" ")" functionBody ;
functionBody -> "{" functionCall* "}" ;
functionCall -> Identifier "(" parameterList? ")" ";" ;
parameterList -> StringLiteral ("," StringLiteral)* ;
Identifier : /^[a-zA-Z_][a-zA-Z0-9_]*/ ;
StringLiteral : /^"[^"]*"/ ;
Usually the parsing result should be mapped to get sensible intermediate results for further processing. This is done by appending .map()
calls to rules, which are just plain TypeScript. See grammar/play.map.grammar
.
# parserGen <grammar>
$ npx ts-node src/parserGen.ts grammar/play.map.grammar > src/play.grammar.ts
The heads of the rules are exported as parsers. To use the generated parsers (src/play.grammar.ts
), just call .parse()
on source text:
// src/play.ts
import { prog } from './play.grammar'
// ...
const res = await prog.parse(source)
console.log(res.result)
// ...
Then for a test program (grammar/play.ts
):
/**
* function call in function body
*/
// foo
function foo(){
println("in foo...");
}
// bar
function bar(){
println("in bar...");
foo();
}
bar();
Parse it:
$ npx ts-node src/play.ts grammar/play.ts
{ type: 'prog',
stmts:
[ { type: 'funcDecl',
name: 'foo',
body:
{ type: 'funcBody',
stmts: [ { type: 'funcCall', name: 'println', params: [ '"in foo..."' ] } ] } },
{ type: 'funcDecl',
name: 'bar',
body:
{ type: 'funcBody',
stmts:
[ { type: 'funcCall', name: 'println', params: [ '"in bar..."' ] },
{ type: 'funcCall', name: 'foo', params: undefined } ] } },
{ type: 'funcCall', name: 'bar', params: undefined } ] }
See grammar/grammar.grammar
Prologue : /^```.*```/s ;
Regex : /^[/][^].*[/](?=\s*[;\.])/ ;
MapCode : /^\.map\(.*?\)(?=\s*;)/s ;
Terminal : /^[A-Z][A-Za-z_]*/ ;
NonTerminal : /^[a-z][A-Za-z_]*/ ;
Literal : /^"[^"]*"/ ;
primary -> Terminal | NonTerminal | Literal | "(" choice ")" ;
qualified -> primary "?" | primary "*" | primary "+" | primary ;
sequence -> qualified+ ;
choice -> sequence ("|" sequence)* ;
syntax -> NonTerminal "->" choice MapCode? ";" ;
lexical -> Terminal ":" Regex MapCode? ";" ;
grammar -> Prologue? (syntax | lexical)* ;
# generate parser with meta grammar
$ npx ts-node src/parserGen.ts grammar/grammar.grammar > src/grammar1.grammar.ts
# overwrite existing parser with generated parser
$ mv src/grammar1.grammar.ts src/grammar.grammar.ts
# generate a new parser with generated parser
$ npx ts-node src/parserGen.ts grammar/grammar.grammar > src/grammar1.grammar.ts
# compare generated parser with previous parser
$ diff src/grammar.grammar.ts src/grammar1.grammar.ts
# files are the same
- Error reporting