Sunday, November 15, 2009

Compiling DSLs

Recently I had occasion to revisit this post from Peter Seibel's blog about compiling queries in Lisp. It's a cliche in the Lisp world that “if you're using eval, you're doing something wrong.” A more helpful formulation is, “if you're using eval, you should probably rewrite your code to use macros instead.” This is generally good advice—see here for some contrary examples—but it does not tell the whole story; that's where Seibel's post comes in.

Seibel considers the problem of a DSL for performing simple keyword queries on a block of text. He gives the example

(or
 (and "space" "mission")
 (and (or "moon" "lunar") "landing"))

which tests whether the text contains the words space and mission, or moon and landing, or lunar and landing. It's pretty easy to translate the above into a Lisp expression such as

(or
 (and (word-in-string-p "space" text)
      (word-in-string-p "mission" text))
 (and
   (or (word-in-string-p "moon" text)
       (word-in-string-p "lunar" text))
   (word-in-string-p "landing" text)))

where word-in-string-p returns t if the first argument is a substring of the second and nil otherwise.

The next problem is to evaluate this expression. The naive way to do this is to pass it to eval, and you know what? The earth will still be orbiting the sun if you do. But Seibel goes on to show us a better way and to illustrate why the prejudice against eval is usually well-founded.

Instead of invoking eval, we turn the above expression into a function

(lambda (#:G1)
 (or
   (and (word-in-string-p "space" #:G1)
        (word-in-string-p "mission" #:G1))
   (and
     (or (word-in-string-p "moon" #:G1)
         (word-in-string-p "lunar" #:G1))
     (word-in-string-p "landing" #:G1))))
where the #:G1 is a gensym'd symbol. Now an immediate advantage of

this is that we have a function that we can pass to some other function or otherwise use in any of the ways that Lisp's first class functions can be used. This means that it can be called over and over again on a series of text blocks (the original problem is posed in terms of a database query). The key here is that we can take this function and compile it using compile-expression so that (for systems that support native compilation) we have a fast implementation compiled to machine language.

Notice that we're not talking about much more work than using eval. We merely have to wrap the code in the lambda form and then call compile-expression rather than eval. Seibel's got all the code in the post I linked above, so you should go take a look at it.

No comments:

Post a Comment