Creating Domain Specific Languages with OMeta
I've been researching different ways of constructing the Forge Architect DSL. There are tons of tools and algorithms for lexing, tokenising, and evaluating context-free languages.
I found OMeta early on, and after reading Alessandro Warth's PhD dissertation, it appeared that OMeta was well suited to prototyping new domain specific languages.
OMeta is based on Parsing Expression Grammars (PEGs), extended to work around some limitations of the original PEG formulation. For example, it supports left recursion through a simple seed-parsing and memoization trick that I don't yet fully grok. (It's all in the PhD dissertation.)
OMeta is implemented on top of host languages. It's currently available for JavaScript, Squeak, Python, Ruby, C#, Scheme, and Common Lisp, at varying levels of maintenance.
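To get a feel for the seed idea, here is a minimal hand-written sketch in plain JavaScript, not OMeta's actual implementation. It assumes a single directly left-recursive rule, `expr = expr '-' num | num`, and every name in it (`parseNum`, `makeParser`, `evaluate`, `memo`) is made up for illustration. The rule first plants a failing result in the memo table so the recursive call terminates, then re-runs its body and keeps the result for as long as each run consumes more input.

```javascript
// Sketch of seed-growing for ONE directly left-recursive rule:
//   expr = expr '-' num | num
// All names here are illustrative; this is not how OMeta/JS is written internally.

// num: one or more digits, returns { value, next } or null on failure
function parseNum(input, pos) {
  var start = pos;
  while (pos < input.length && input[pos] >= '0' && input[pos] <= '9') {
    pos++;
  }
  if (pos === start) return null;
  return { value: parseInt(input.slice(start, pos), 10), next: pos };
}

function makeParser(input) {
  var memo = {}; // position -> best known result of `expr` at that position

  // One pass over the body of `expr`; the recursive call goes through the memo.
  function exprBody(pos) {
    // first alternative: expr '-' num
    var left = expr(pos);
    if (left !== null && input[left.next] === '-') {
      var right = parseNum(input, left.next + 1);
      if (right !== null) {
        return { value: left.value - right.value, next: right.next };
      }
    }
    // second alternative: num
    return parseNum(input, pos);
  }

  function expr(pos) {
    if (pos in memo) return memo[pos]; // the left-recursive call ends up here
    memo[pos] = null;                  // plant a failing seed
    while (true) {
      var result = exprBody(pos);
      // stop as soon as a pass fails or consumes no more input than the seed
      if (result === null ||
          (memo[pos] !== null && result.next <= memo[pos].next)) {
        break;
      }
      memo[pos] = result;              // grow the seed and try again
    }
    return memo[pos];
  }

  return expr;
}

// Parses the whole input, or returns null if anything is left over.
function evaluate(input) {
  var result = makeParser(input)(0);
  return (result !== null && result.next === input.length) ? result.value : null;
}

console.log(evaluate("10-3-2")); // 5, i.e. (10 - 3) - 2, so the parse is left-associative
```

A full packrat parser memoizes every rule and has to deal with indirect left recursion too; this sketch deliberately skips both.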
My plan now is to create an OMeta hosted on Elixir in order to allow Elixir to interpret the Architect as an external DSL.
But before I can start, I need to test my understanding of OMeta. I found a blog post by Jeff Moser, who worked on OMeta# (OMeta for C#) 6 years ago. He published a small compiler for a toy FizzBuzz language that compiled the language into C# and executed it.
Now I'm not familiar with C#, but I wanted to try rewriting this in JavaScript, so it would run inside OMetaJS.
So here's our mission. We have a new language that looks like this:
for every number from 1 to 100
if the number is a multiple of 3 and it is a multiple of 5 then
print "FizzBuzz"
else if it is a multiple of 3 then
print "Fizz"
else if it is a multiple of 5 then
print "Buzz"
else
print the number
We need to compile this into JavaScript and execute it.
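For reference, the behaviour we're aiming for is roughly what the hand-written JavaScript below does. This is not what the compiler emits, just a plain version to compare against; `fizzBuzzLines` is a name I made up for this sketch.

```javascript
// Hand-written reference for what the FizzBuzz program above should do.
// `fizzBuzzLines` is a hypothetical helper, not part of OMeta or the solution.
function fizzBuzzLines(low, high) {
  var lines = [];
  for (var number = low; number <= high; number++) {
    if (number % 3 === 0 && number % 5 === 0) {
      lines.push("FizzBuzz");
    } else if (number % 3 === 0) {
      lines.push("Fizz");
    } else if (number % 5 === 0) {
      lines.push("Buzz");
    } else {
      lines.push(number);
    }
  }
  return lines;
}

// print one line per number, like the DSL's `print`
fizzBuzzLines(1, 100).forEach(function (line) {
  console.log(line);
});
```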
Here is my solution (gist):
// our FizzBuzz language
var code =
"for every number from 1 to 100\n\
if the number is a multiple of 3 and it is a multiple of 5 then\n\
print \"FizzBuzz\"\n\
else if it is a multiple of 3 then\n\
print \"Fizz\"\n\
else if it is a multiple of 5 then\n\
print \"Buzz\"\n\
else\n\
print the number\n\
";
ometa FizzBuzz {
// number is overridden to parse an optionally signed run of digit characters and return it as an integer
number = spaces ('+' | '-' | empty):prefix digit+:ds -> (
parseInt(
(prefix.length > 0) ?
prefix + ds.join('') :
ds.join('')
)
),
// quotedString matches strings inside quotes
quotedString = spaces '"' (~'"' anything)*:string '"' -> (
string.length == 0 ?
"" :
string.join("")
),
// variables can be prefixed with `the`, we need to track it as `_it` in the state table so it can be referenced again
variableName =
("the" | empty) spaces
firstAndRest('letter', 'letterOrDigit'):x
!(self.set("_it", x.join("")))
-> (x.join("")),
// expressions are either an andExpression, multipleExpression, numberExpression or a quotedString
// all expressions are translated into functions
expression = andExpression
| multipleExpression
| numberExpression
| quotedString:qs -> (function () { return qs; }),
// and expressions are left recursive allowing nested and expressions, and they evaluate into a function returning a boolean
andExpression = andExpression:l "and" booleanExpression:r -> (
function () {
return !!l() && !!r();
}
)
| booleanExpression,
// a boolean expression is just a boolean function
booleanExpression = expression:e -> (function () {
var object = e();
if (typeof object == "boolean") {
return object;
} else if (typeof object == "number") {
return object != 0;
} else {
return (String(object).length > 0) && object !== null && object !== undefined;
}
}),
// number expressions are functions that return an integer
// this is where `_it` can be resolved from the previously assigned `the`
numberExpression = number:n -> (function () {
return parseInt(n);
})
| "it" -> (function () {
return parseInt(
self.get(
self.get("_it")
)
);
})
| variableName:vn -> (function () {
return parseInt(self.get(vn));
}),
// `is a multiple of` is a primitive infix operator
multipleExpression = numberExpression:left "is a multiple of" numberExpression:right -> (
function () {
return (left() % right()) == 0;
}
),
// statements represent top expressions
// we have `print`, `if then else` and `for every`
statement = "print" expression:e -> (function () { console.log(e()); })
| "if" andExpression:condition "then" statement:first ("else" statement | empty):second -> (
function () {
if (condition()) {
first();
} else if (typeof second == "function") {
second();
}
}
)
| "for every" variableName:vn "from" number:low "to" number:high statement:s -> (
function () {
for (var i = low; i <= high; i++) {
self.set(vn, i);
s();
}
}
),
// a block is zero or more statements
block = statement*:ss -> (
function () {
ss.forEach(function (statement) {
statement();
});
}
),
// our program is just one block!
program = block
}
FizzBuzz.initialize = function() {
// our global state table
this.vars = {};
this.set = function(k, v){
this.vars[k] = v;
return this;
};
this.get = function(k) {
return this.vars[k];
};
};
// compiles our language into JavaScript with the top level program rule
var result = FizzBuzz.matchAll(
code,
'program'
);
// execute the code!
result();
For the sake of brevity, this code directly executes the FizzBuzz language; it doesn't create an intermediate abstract syntax tree. Notice how it essentially converts every expression into a function, and the function returned by the top-level rule runs them all.
Copy the above code into the Source field of the online OMeta interpreter: http://www.tinlizzie.org/ometa-js/ Or you can use https://github.com/alexwarth/ometa-js
Hit run, and you get this inside your console:
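To make the "everything becomes a function" idea concrete without the grammar, here is a hand-built sketch of the kind of closures the semantic actions produce, for a smaller program: `for every number from 1 to 6 if the number is a multiple of 2 and it is a multiple of 3 then print "both" else print the number`. The helper names (`variable`, `it`, `constant`, `isMultipleOf`, `andExpr`, `printStmt`, `ifThenElse`, `forEvery`) and the `output` array are assumptions made for this sketch; only the `set`/`get` state table mirrors the code above.

```javascript
// Hand-built closures standing in for what the grammar would construct.
// Helper names and `output` are hypothetical; only set/get mirror the solution.
var state = {
  vars: {},
  set: function (k, v) { this.vars[k] = v; return this; },
  get: function (k) { return this.vars[k]; }
};
var output = []; // collected instead of console.log so the result can be inspected

// variableName: remember the latest variable name under `_it` at build time
function variable(name) {
  state.set("_it", name);
  return function () { return parseInt(state.get(name), 10); };
}
// "it": look up the remembered name first, then that variable's value
function it() {
  return function () { return parseInt(state.get(state.get("_it")), 10); };
}
function constant(n) {
  return function () { return n; };
}
function isMultipleOf(left, right) {
  return function () { return left() % right() === 0; };
}
function andExpr(l, r) {
  return function () { return !!l() && !!r(); };
}
function printStmt(e) {
  return function () { output.push(e()); };
}
function ifThenElse(condition, first, second) {
  return function () {
    if (condition()) {
      first();
    } else if (typeof second === "function") {
      second();
    }
  };
}
function forEvery(name, low, high, body) {
  return function () {
    for (var i = low; i <= high; i++) {
      state.set(name, i);
      body();
    }
  };
}

// "Compile": nothing runs yet, we only nest closures.
var program = forEvery("number", 1, 6,
  ifThenElse(
    andExpr(
      isMultipleOf(variable("number"), constant(2)),
      isMultipleOf(it(), constant(3))
    ),
    printStmt(function () { return "both"; }),
    printStmt(variable("number"))
  )
);

// "Execute": calling the outermost function runs the whole program.
program();
console.log(output); // [ 1, 2, 3, 4, 5, 'both' ]
```

The trade-off is that there is nothing left to inspect or optimise between parsing and running; an intermediate syntax tree would give you that at the cost of a second pass.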
1
2
Fizz
4
Buzz
Fizz
7
8
Fizz
Buzz
11
Fizz
13
14
FizzBuzz
16
17
Fizz
19
Buzz
Fizz
22
23
Fizz
Buzz
26
Fizz
28
29
FizzBuzz
31
32
Fizz
34
Buzz
Fizz
37
38
Fizz
Buzz
41
Fizz
43
44
FizzBuzz
46
47
Fizz
49
Buzz
Fizz
52
53
Fizz
Buzz
56
Fizz
58
59
FizzBuzz
61
62
Fizz
64
Buzz
Fizz
67
68
Fizz
Buzz
71
Fizz
73
74
FizzBuzz
76
77
Fizz
79
Buzz
Fizz
82
83
Fizz
Buzz
86
Fizz
88
89
FizzBuzz
91
92
Fizz
94
Buzz
Fizz
97
98
Fizz
Buzz
Later I wrote a simple Markdown compiler in OMetaJS: https://gist.github.com/CMCDragonkai/4bebe4156fcc5fdd76b0 It was derived from the original here: http://joshondesign.com/2013/03/05/ometa1
I also discovered some interesting OMetaJS idiosyncrasies when it comes to string matching: https://gist.github.com/CMCDragonkai/963bf8066ade0253bb78
OMetaJS itself is no longer being maintained, but there are 3 active forks that I am going to investigate:
- https://github.com/xixixao/meta-coffee
- https://github.com/Page-/ometa-js
- https://github.com/veged/ometa-js
Now for more OMeta research...