Monday, August 31, 2009

All Hail the Technical Typist of Yore

Many years ago I produced a dissertation in Mathematics for my Ph.D. This was in 1982: the IBM PC had been introduced the year before and TeX82 had just been released. TeX as we know it today was still 7 years in the future. Device-Independent Troff was just 3 years old and only available under Unix. What all this meant was that my thesis, like those of all my fellow graduate students, was typed on a typewriter and photocopied for submission to the graduate school.

Recently the notion popped into my head that I should typeset my thesis with TeX. Don't ask me why—like many (or probably most) theses, mine was read by me, my committee, and my mother, and to tell the truth I'm not sure that my mom actually read it. Anyway, typesetting the thesis got me to thinking about the technical typists who used to produce theses and other technical documents.

It was a lot harder than you might think, and not for the reasons you might think. Yes, working on a typewriter meant the cost of a typo was high: you couldn't just backspace, retype, and continue; you had to apply correction fluid to the paper to erase the error. If you accidentally skipped some text from the handwritten original, you usually had to start the current page over. But that's not really what made the process hard. When typing mathematics (that is, the stuff between the dollar signs in TeX), a good technical typist would insert extra spaces around operators to avoid making the text look too cramped. And then there were the symbols. Want a γ? You couldn't just type \gamma; you had to change the symbol ball to the one with Greek letters, remember which key to press, type the γ, and then put the Courier ball back in. Special symbols such as large integral signs had to be built up from three separate parts.

It was a difficult process that took considerable skill and training to get right. It was also expensive. A thesis like mine cost about $250 then; that would be $540 now. Today any damn fool can download a copy of TeX for free and typeset beautiful mathematical documents with almost no training at all. It's a good thing to remember the next time we're whining about having to produce a technical document.

Saturday, August 22, 2009

Two Schemes

The Scheme Steering Committee recently released a position statement in which they recommend splitting Scheme into "separate but compatible languages": a small Scheme that is basically a modernized R5RS, and a large Scheme based on R6RS but with, as they put it, a "happier outcome."

The idea is that the small Scheme would be aimed at "educators, casual implementors, researchers, embedded languages, and `50-page' purists," while the large Scheme would be aimed at programmers and implementors of industrial-strength Schemes. As a practical matter, this would probably mean that large Scheme would have small Scheme as a core and extend it with a larger standard library and perhaps some extra core features to support things like a module system. The charters for the small and large working groups specifically require that the languages be compatible and that small Scheme be a subset of large Scheme.

I think this is a good idea. By and large I'm happy with R5RS, but it would be nice to have a standard library for things like networking. On the other hand, I recognize that industrial use of Scheme requires things like a standard module system and perhaps (but only perhaps) some object-oriented features like CLOS. In any event, the goal is to make Scheme portable among implementations, something that is conspicuously missing today.

Scheme is a great language and we should support changes that will increase its use and acceptance, but not at the cost of losing the features and principles that make it such a great language. Specifically, I would oppose any PLT-like changes that remove functionality in the service of protecting programmers from themselves. To paraphrase dmr, if you want Pascal, you know where to find it. Scheme is a Lisp dialect and Lisp is all about power, not about artificial restraints.

Wednesday, August 19, 2009

Speaking of JSON

Over at the Yahoo! Interface Blog they have an interesting talk on JSON by Douglas Crockford, the inventor (or, as he puts it, discoverer) of JSON. He recounts the history of JSON, compares it to some of its competitors, including XML, and explains some of the wrinkles in the notation, such as why keys are quoted.

It's an interesting talk and well worth 50 minutes of your time.

Tuesday, August 11, 2009

Scheme on Android

Previously, I wrote about developing in Scheme on the iPhone. Now, in a post on the Google Research Blog, Bill Magnuson, Hal Abelson, and Mark Friedman write about using Scheme to develop on Android. Their work is part of the App Inventor for Android project, which envisions using Android phones as part of introductory programming courses for majors and non-majors alike.

Students would use a visual programming language, similar to Scratch, that provides a drag-and-drop interface in which students assemble applications for the phone. The visual language is compiled to s-expressions for a DSL written in Scheme macros. They are using the Kawa framework as their Scheme engine.
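The post doesn't give details of the DSL, but just to make the idea concrete, here's a hypothetical sketch of how a block-based DSL might be built from Scheme macros. Everything in it (when-clicked, notify, the handler registry) is my own invention for illustration, not anything from App Inventor:

(define handlers '())                      ; alist of (component . thunk)

(define (register-handler component thunk)
  (set! handlers (cons (cons component thunk) handlers)))

(define (notify msg)                       ; stand-in for a phone UI call
  (display msg)
  (newline))

;; A "when button1 is clicked, say hello" block might compile to the
;; s-expression (when-clicked button1 (notify "Hello!")), which this
;; macro rewrites into the registration of an event handler.
(define-syntax when-clicked
  (syntax-rules ()
    ((_ component body ...)
     (register-handler 'component (lambda () body ...)))))

(when-clicked button1 (notify "Hello!"))   ; the compiled block

((cdr (assq 'button1 handlers)))           ; simulate a click; prints Hello!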

There aren't a lot of details in the post, but we can hope that developers will eventually get access to the underlying Scheme. This project is tremendously cheering to those of us who have been mourning the passing of 6.001.

Sunday, August 9, 2009

Data-interchange in Scheme

In my last post I talked about data-interchange using JSON. I remarked that the JSON format maps readily onto a very similar Lisp representation and that the Lisp version had the advantage of being parsed automatically by the Lisp (or Scheme) reader. In this post, I'd like to follow up on that by discussing the paper Experiences with Scheme in an Electro-Optics Laboratory by Richard Cleis and Keith Wilson. I forget when I first read this paper, but since then I've reread it several times and I learn something new each time I do.

The authors work at the Starfire Optical Range, an Air Force Laboratory doing research on controlling the distortion in optical telescopes caused by air turbulence. The laboratory has five telescope systems, which are run by about a dozen computers that provide tracking calculations, motion control, gimbal control, and command and status functions. Starfire uses Scheme in a variety of ways: as an extension language by embedding it in the legacy C application that provides motion control; as a configuration language for the telescopes and experiments; and as a bridge to proprietary hardware controllers by extending Scheme with the vendors' libraries. But what I want to talk about is their use of Scheme for data-interchange. All communications between the telescopes, their subsystems, and the computers are conducted with s-expressions that are evaluated by Scheme (or in one case by a C-based s-expression library).

This is more subtle than merely using s-expressions as a substitute for JSON. Every request for data or command to perform some function is sent as an s-expression. These messages have the form

(list 'callback (function1 parameter1...) ...)

where the optional callback function is defined by the sender and is intended to handle the remote system's response. The functions and their parameters (function1, parameter1, etc.) are commands and data for the remote system.

This seems a little strange until you realize that the remote system merely evaluates the message and returns the result using code like the stylized version below—see the paper for the actual, slightly more complex code.

(define handle-request-and-reply
  (lambda (udp-socket)
    (let* ((buffer (make-string max-buffer-size))
           (n (read-udp-socket udp-socket buffer)))   ; n = characters received
      ;; Parse the message into an s-expression, evaluate it, and
      ;; write the printed result back on the same socket.
      (write-udp-socket
        udp-socket
        (format #f "~s"
                (eval (read (open-input-string (substring buffer 0 n)))
                      (interaction-environment)))))))

Now consider what happens when the remote system evaluates the message

(list 'callback (do-something arg1 arg2))

First, the arguments to list are evaluated. The callback function is quoted, so the remote system merely treats it as a symbol. The second argument is a call to a local (to the remote system) function called do-something. When the argument (do-something arg1 arg2) is evaluated, do-something is called and returns a result, say the-result. Finally, the function list is called with the arguments callback and the-result, so the result of the eval is

(callback the-result)

and this is returned to the sender on the same socket that received the original message. When the sender receives this message, it is read by a routine similar to the one above, except that the sender's version evaluates the message but does not send a response. The result of that evaluation is to call callback with the argument the-result.
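To make the round trip concrete, here's a minimal sketch of the sender's side. It assumes the same hypothetical socket procedures as the stylized server above, and telescope-socket, get-position, and show-position are all my inventions (with get-position assumed to return a self-evaluating value such as a number):

(define (show-position position)          ; the callback named in the request
  (display "telescope at: ")
  (display position)
  (newline))

(define (send-request socket request)
  (write-udp-socket socket (format #f "~s" request))
  (let* ((buffer (make-string max-buffer-size))
         (n (read-udp-socket socket buffer)))
    ;; The reply is (show-position the-result); evaluating it
    ;; invokes the callback on the remote system's answer.
    (eval (read (open-input-string (substring buffer 0 n)))
          (interaction-environment))))

(send-request telescope-socket '(list 'show-position (get-position)))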

This is truly a beautiful thing. The function handle-request-and-reply takes care of all communication with the remote system's clients. Notice that it is completely general and will handle any legal message a client sends. There is no need to write special code for each type of message, merely a function to do whatever the client is requesting. The actual code is wrapped in an error handler so that even malformed messages are caught and handled gracefully with an error message returned to the sender.

This is, I submit, a better solution in many cases than something like JSON or XML. Just a few lines of code completely handle data-interchange, message parsing, and communication, and that shows the power inherent in the humble s-expression.

Thursday, August 6, 2009

JSON

ESR recently put up a paper on his redesign of the request-response protocol for gpsd. The gpsd project is interesting in its own right, but what interests me is his use of JSON to serialize command and response data to the back end of the daemon.

Like many hackers, I'm indifferent to Java. I've never had the urge to learn it and I've been fortunate that my employment has never required its use. I have noticed, however, that Java-related things always seem to start with a capital J, so when I saw references to JSON I always assumed it was yet another Java something or other and passed it by without further investigation. Like most forms of prejudice, of course, this was merely ignorance. While it's true that the J does stand for Java, JSON actually stands for JavaScript Object Notation, and it's a really neat way of serializing complex data for transfer between applications.

JSON has two types of data structures:

  • arrays, in which the values are comma-separated and enclosed in square brackets, and
  • objects, which are serialized representations of what Lispers call hash tables or alists and Pythonistas call dictionaries. An object is a comma-separated list of name/value pairs enclosed in curly braces. The name/value pairs have the form "name" : value.

These structures are fully recursive and each can contain instances of itself or the other. Names are always strings, but object and array values can be strings, numbers, objects, arrays, true, false, or null. The exact syntax, complete with railroad diagrams, is given on the JSON.org site linked above; a formal description is given in RFC 4627. As an example, here is a hypothetical object representing a family:

{"father":{"name":"John Smith", "age":45, "employer":"YoYodyne, inc."},
"mother":{"name":"Wilma Smith", "age":42},
"children":[{"name":"William Smith", "age":15},
         {"name":"Sally Smith","age":17}]}
Notice that white space can be inserted between any pair of tokens to

make the object more human-readable, but that it is not required. Also notice that unused fields in objects (but not arrays) can be omitted as the employer field is in all but the father's object.

What I like about this is that it's fundamentally Lispy, which shouldn't be a surprise given JavaScript's Scheme lineage. For example, the above JSON family object rendered in a typical Lisp way is shown below. Notice how similar it is to the JSON rendering.

'((father . ((name . "John Smith") (age . 45) (employer . "YoYodyne, inc.")))
  (mother . ((name . "Wilma Smith") (age . 42)))
  (children . #(((name . "William Smith") (age . 15))
                ((name . "Sally Smith") (age . 17)))))

The nice thing about this representation is that the Lisp/Scheme reader will parse it directly. True, the name/value pairs are represented as alists, but for many applications that's the appropriate choice. Common Lisp hash tables don't have an external representation that is directly readable, but we could certainly use alists as such a representation at the cost of spinning down the list and hashing the names.
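To see how little code is needed, here's a minimal sketch. It assumes the family data above has been saved, without the leading quote, in a file called family.scm (the file name and the field helper are mine, for illustration):

;; One call to read parses the entire structure.
(define family
  (with-input-from-file "family.scm" read))

;; Field access is just an alist lookup.
(define (field name alist)
  (cdr (assq name alist)))

(field 'name (field 'father family))      ; => "John Smith"
(field 'age (field 'mother family))       ; => 42
(vector-ref (field 'children family) 0)   ; => the alist for William Smith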

None of this is to say that we should eschew JSON in favor of s-expressions, especially when doing data-interchange between applications written in non-Lisp languages or even when interchanging data between a Lisp application and, say, a C application. Rather, my point is that Lispers should find JSON familiar and comfortable.

So what's the takeaway? First, that JSON is a simple and ubiquitous data-interchange format (the JSON.org page has links to parsers for dozens of languages, including CL and Scheme) that can make interprocess communication much easier when complex data is involved. Second, I think it nicely illustrates the power of Lisp ideas and data structures, even if JSON itself is neither.

Monday, August 3, 2009

Interview

Over at the essential emacs-fu, djcb has an interesting interview with Chong Yidong and Stefan Monnier, the current maintainers of Emacs. They talk about how they became maintainers, the Emacs 23.1 release, and their plans for the immediate future of Emacs 23. It's well worth a read.