In this series of posts, we want to parse expressions. In order to make it worthwhile and fun, we will begin by placing some constraints on this.
- We will be using the language Standard ML for describing the approach. Though note that any functional language of a certain power can do this. Converting it to Ocaml, Haskell, Erlang or a lisp should be pretty straightforward once you get the hang of the idea.
In some sense, Standard ML is the simplest of the (statically typed) functional languages in that it has almost no fancy tricks you can pull.
- We will not be using regular expressions. Not that regular expressions are bad, but I want to show that this can be done easily without the use of RE's. The goal here is to show people that there are different approaches to parsing data than just throwing a regular expression at the problem.
The language we will study has the following simple syntax structure, given in BNF form:
e ::= x | n | e1 + e2 | e1 - e2 | e1 * e2
The 'x' is a variable. Its value can range over integers (we won't be concerned with the actual range of these integers, altough it is interesting to muse over). The 'n' is some integer constant. The term 'e1 + e2' is an addition of two numbers. Likewise for subtraction and multiplication. Here is a typical expression:
x + 3 * (y - 5)
Note that the parenthesis are not given in the syntax. We just assume their existence and they work as you would expect: grouping together terms to force evaluation order.
Writing down the syntax in Standard ML is straightforward:
datatype exp = Var of string | Const of int | Add of exp * exp
| Sub of exp * exp | Mul of exp * exp
Hopefully it is clear what is going on here. We have translated the syntax into an ML datatype. To discriminate each construct, we have added
constructors 'Var','Const','Add','Sub', and 'Mul'.
If we can somehow bring strings of the form 'x + 3 * (y - 5)' into this datatype (correctly!) -- and we know what 'x' and 'y' is, then we are able to evaluate the expression. Here is how:
fun eval_exp env exp =
case exp of
Var x => env x
| Const n => n
| Add (e1, e2) => (eval_exp env e1) + (eval_exp env e2)
| Sub (e1, e2) => (eval_exp env e1) - (eval_exp env e2)
| Mul (e1, e2) => (eval_exp env e1) * (eval_exp env e2)
So we destruct the form of 'exp'. If it is a 'Var', we look up the variable in the environment 'env', which we will define shortly. If it is a 'Const', we just return the integer value. Finally, if it is a binary operator, we proceed to evaluate the two sub-expressions and apply the right ML-operator to it.
We still need to give a meaning to the environment 'env'. We would like to represent it as a map from strings to integers (ie, a function of type 'string -> int'), as our variables are strings. This is straightforward in Standard ML fortunately. Suppose we want to represent the mapping 'x' maps to 2 and 'y' maps to 7:
exception env_notfound of string
fun test_env v =
case v of
"x" => 2
| "y" => 7
| _ => raise (env_notfound v)
We define an exception for the case where we don't know about the variable and raise it if the variable is unknown. In Standard ML '_' stands as a wild-card for "any other unnamed value."
Let us round up by playing with what we have. If we manually parse the term 'x + 3 * (y - 5)' it becomes:
val term = Add (Var "x", Mul (Const 3, Sub (Var "y", Const 5)))
So if we try to run the term with our environment from above, we expect the result to be '2 + 3 * (7 - 5) = 2 + 3 * 2 = 2 + 6 = 8'. Here is what our Standard ML system says:
- eval_exp test_env term;
val it = 8 : int
In the next part, we will write the parser that allows us to automate the parsing of the term into the datatype of 'val term'
View comments