Common Lisp

From Devpit
(Redirected from Lisp)
Jump to: navigation, search

Helpful Links

The Common Lisp Hyperspec is extremely handy to have open any time you are working with the language: http://www.lispworks.com/documentation/HyperSpec/Front/index.htm See the Symbol Index to look up functions by name.

There is a burgeoning Common Lisp community. Here are some contact points:

Lisp Implementations

There are many Common Lisps out there. Some are free, some are commercial, and some are not true Common Lisps. The best free implementations to get you up and running are:

  • OpenMCL - MacOS X, Darwin, and PowerPC Linux
  • CMUCL - very portable; at least Linux, *BSD, and MacOS X
  • SBCL - fork of CMUCL, even more portable
  • GNU CLISP - bytecode-interpreted, very portable, including to Windows

Note that CLISP includes an editable and tab-completion-friendly command line, whereas the others are more plain. This makes it very suitable for starting out. However, powerful command lines are available for all the others in the form of Emacs with SLIME, and they also have better performance than CLISP does.

SLIME

The Superior Lisp Interaction Mode for Emacs (SLIME) is a very powerful Lisp IDE that runs within Emacs. You get a full Read-Eval-Print-Loop (REPL) connected to the Common Lisp of your choosing, Lisp editing features, quick links to the Lisp Hyperspec for any Lisp symbol, and more. Many people find that it is worth learning Emacs just to use SLIME.

Lisp is Structured Peculiarly

In every computer language, you start at some point with plain text and eventually get CPU cycles. The process of doing so goes through a number of steps:

  1. Read plain text and tokenize it
  2. Parse the tokens into a parse tree
  3. Translate the parse tree (this can include source-level optimizations, etc.)
  4. Compile or interpret the parse tree

In most languages, the first two steps are combined into one component, and that component is called a "parser." The only control you normally have at that level comes from, using C as an example, preprocessor directives and preprocessor macros.

In Common Lisp, the first and second steps are combined into one component, which is instead called the "reader." However, in Common Lisp, you have control over the second step in the form of reader macros.

In most languages, the third and fourth steps are combined and entirely off-limits for the programmer. Once your code is in its final C (or Perl, or Java, etc.) form, the compiler or interpreter takes over and translates code into machine instructions.

In Common Lisp, however, the third step is within your realm of control. You can write macros (and symbol macros) to translate the parse tree before feeding it to the compiler or interpreter. This is all handled by the Lisp system for you - all you have to do is use defmacro and define-symbol-macro to create them, call for them in your code, and let the compiler do its business (although you always have functions such as macroexpand-1 available to expand macros and see what code the compiler will get).

The remaining feature to keep in mind is that you have the entire Common Lisp language available to you, whether you are writing your actual program, writing a macro, or writing a reader macro. This is what makes Lispers claim that Lisp macros make it more powerful than any other language there is.

Just in case the macros are not dynamic enough: The entire language including the compiler is even available for code put together at runtime, i.e. one can define functions (and macros) at runtime resulting in code being executed at machine language speed. No other language offers this. As an example, consider a function plotter receiving the function expression from the user at runtime. There's also the very interesting idea of optimization by specialization at runtime, called Multi-Stage Programming.

Domain Specific Languages

Lispers often talk about creating domain-specific languages in Lisp and then solving their problem in that language. Most often, such a domain-specific language will be created using Lisp macros.

The general idea is this: Assume you want to solve problem X. The best language to solve problem X in is a language custom-designed for solving problem X. You can generalize this a bit to say that the best language to write a program that solves problems of the class P is a language custom-designed for solving problems in class P. Such a specific language would allow you to write the solution to any problem X in the class P by writing one statement in that language.

In Lisp, the paradigm is to design a language for solving the problem and then craft bits and pieces of Lisp to create that language. While there are thousands of examples of this in action, it is probably easiest to start with a domain-specific language that is included in Common Lisp: loop.

The Common Lisp macro named loop implements a domain-specific language for loops. It is a very rich language for looping (see Chapter 6 of the Hyperspec for documentation). Within a loop, you can write Common Lisp code (which is normally what you want to do in the first place), but you do not have to. For instance, the following code snippet is written entirely in the loop language with no non-loop Common Lisp at all:

(loop for i from 1 upto 100 collecting i)

Somewhere in the sources of your Common Lisp compiler is a piece of code that starts out like this:

(defmacro loop (&rest loop-code)
  ...)

This macro implements the loop language by translating it into Common Lisp code before the compiler sees it. To see what it does, try the following:

(macroexpand-1 '(loop for i from 1 upto 100 collecting i))

The same thing applies to the Common Lisp Object System (CLOS) - it is just a domain-specific language written in Common Lisp and designed for object-oriented programming.

Backquote Syntax

Most macros will involve some amount of backquote syntax. This is a very quick explanation since backquote syntax is one of the hardest parts of Lisp code to read. (Note that it is mostly hard to read because it is a domain-specific language for code templates, implemented as a reader macro that dispatches on the backquote character.)

In Lisp, you often will have to quote a form. To do so, you can use the macro (quote FORM) or you can use its synonym (implemented as a reader macro defined using set-macro-character), the single-quote symbol. For instance, when the reader/parser sees 'X it feeds the interpreter/compiler the form (quote X) instead, no matter what X is. This is especially handy for creating lists or passing around symbols, since a plain list is treated as a function call and a plain symbol is treated as a fetch of a variable value.

Backquote syntax allows you to quote a long form but escape from the quoting for some parts within. It has three basic pieces:

  1. The backquote symbol `
  2. The comma character ,
  3. The comma-at symbol ,@

The backquote symbol introduces a backquoted form. Backquoted forms are almost always lists, because otherwise the backquote is synonymous with a single quote (since only lists can contain sub-pieces to use commas and comma-ats on).

The comma character un-quotes the next form, and the comma-at symbol splices in the next form as a list after un-quoting it.

Example:

(let ((x 10)
      (y (list 1 2 3)))
  `(list
     x
     ,x
     y
     ,y
     ',y
     ,@y))
 =>
(list x 10 y (1 2 3) '(1 2 3) 1 2 3)

What happens here is this:

  1. The outermost let creates bindings for x = 10 and y = (1 2 3)
  2. The backquote symbol quotes out the (list ...) form
  3. The first x is quoted out as a symbol
  4. The ,x escapes from the backquote and gets the binding of x from the let form, so you get 10
  5. The first y is quoted out as a symbol
  6. The ,y escapes from the backquote and gets the binding of y, (1 2 3)
  7. The ',y is commonly seen since (1 2 3) would be evaluated as a funcall of 1, which is an error
  8. The ,@y evaluates y and splices it in as a list

These all make more sense when you see them in common usage, but it is important to know the basics to be able to understand what is happening when you see them in actual code.

Another thing to keep in mind is this: The backquote is just a reader macro. That means that it is applied when code is being read, and whatever you end up with after applying the backquote syntax still has to be fed to the interpreter or compiler. That's why you see things like ',y - if you just did ,y then you would have the list (1 2 3), and the compiler would signal an error when it got to that point.

Backquoted forms are often used in macros, since a macro is supposed to transform code before it is fed to the compiler. You can think of backquoted forms as "templates" for code - the template looks like the code that you want the compiler to see, except that the comma-unqouted and comma-at-unquoted pieces are replaced as appropriate.

For instance, a simple macro will look like this (note that &body is equivalent to &rest other than the fact that source code formatters often indent &body differently):

(defmacro do-it-n-times (n &body body)
  `(loop for ,(gensym) from 1 upto ,n
     do (progn
          ,@body)))

When the macro expansion occurs, which is after the code is read/parsed and before it is interpreted or compiled, you get transformations like this:

(do-it-n-times 10
  (format t "Hello ...~%")
  (format t " ... world!~%"))
 =>
(loop for #:G1 from 1 upto 10
  do (progn
       (format t "Hello ...~%")
       (format t " ... world!~%")))

As you can see, the (gensym) and n have been evaluated and put into the template, and the body of the do-it-n-times form has been spliced into the template as well. The compiler will now compile the loop form.

Reader Macros

Reader macros let you change the syntax of the language at the parser level. They can be created in a couple of ways. The following is an example using set-dispatch-macro-character, with set-macro-character being the other common method.

The following example is a simple use of dispatch macro characters to create a new syntax. Specifically, the syntax

#[key1 value1 key2 value2]

will create a hash-table with two keys (the symbols KEY1 and KEY2) with the values you would expect. You can also use an integer argument to the dispatch character to control the initialize size of the hash-table. That is,

#100[key1 value1 key2 value2]

will create a hash-table with initial size 100 instead of the implementation-specific default.

Most of the code in the example is complex macrology with the purpose of making sure that value1 and value2 get evaluated at the correct time and in a safe environment. Basically, if you write the example syntax above, the Lisp reader/parser generates the following code for the interpreter or compiler to see:

(let ((#:G1 (make-hash-table :test #'equalp)))
  (setf (gethash 'key1 #:G1) value1)
  (setf (gethash 'key2 #:G1) value2)
  #:G1)

You can test this by using read and, optionally instead of standard input, make-string-input-stream (example run with OpenMCL and indented by hand):

(read (make-string-input-stream "#[key1 value1 key2 value2]"))
 =>
(LET ((#:G82 (MAKE-HASH-TABLE :TEST #'EQUALP)))
  (SETF (GETHASH 'KEY1 #:G82) VALUE1)
  (SETF (GETHASH 'KEY2 #:G82) VALUE2)
  #:G82)

Notice how the Lisp reader/parser translates this on the fly before handing it over to the Lisp interpreter or compiler. That is what makes it a reader macro. The next phase after reading would be (regular) macro expansion, and then compilation.

In a similar vein, the second function is a regular reader macro that turns code like this:

[hashtab key1]

into a hash access that the compiler sees as this:

(gethash 'key1 hashtab)

Note that you can use the same syntax with setf to set values in the hash table:

(setf [hashtab key1] newvalue)

The actual code for all this is:

(set-dispatch-macro-character #\# #\[ ; Dispatch on the sequence #[ or #arg[
  #'(lambda (stream subchar arg) ; Anonymous function to read/parse it
      (declare (ignore subchar)) ; We already know it is [
      (let ((list (read-delimited-list #\] stream t)) ; Read in the rest of the list, up to ]
	    (keys '()) ; Empty list, filled below
	    (values '()) ; Empty list, filled below
            (hashtab (gensym))) ; Gensym name for the hashtab so the values can't clobber it
	(do ((key list (cddr key)) ; Loop for keys being sublists, skipping 2 ahead each time
	     (value (cdr list) (cddr value))) ; ...and for values being sublists, skipping 2 ahead
	    ((null key)) ; Terminate loop when out of keys
	  (when (not (symbolp (car key))) ; Make sure that the keys are symbols
	    (error "Keys must be symbols"))
	  (push (car key) keys) ; Assemble the keys in reverse order
	  (push (car value) values)) ; Assemble value forms in reverse order
	(setf keys (nreverse keys)) ; Reverse the keys - push/nreverse is the fast way to do this
	(setf values (nreverse values)) ; Reverse the value forms
;;; The next 8 lines are the code template
	`(let ((,hashtab ,(if arg ; If there is an argument given, make the hash-table that size
			      `(make-hash-table :test #'equalp :size ,arg)
			      '(make-hash-table :test #'equalp)))) ; Otherwise use the default size
	   ,@(mapcar #'(lambda (key value) ; Map this function across keys/values
			 `(setf (gethash ',key ,hashtab) ,value)) ; Add the item to the hash
		     keys
		     values)
	   ,hashtab)))) ; Return the generated hashtab

(set-macro-character #\[ ; Dispatch on [
  #'(lambda (stream char) ; Anonymous function to read/parse
      (declare (ignore char)) ; We already know that it's [
      (let ((list (read-delimited-list #\] stream t))) ; Read up through ]
	(when (/= (length list) 2) ; Make sure that we have two elements
	  (error "Invalid number of arguments to []"))
	(when (not (symbolp (cadr list))) ; Make sure that the key is a symbol
	  (error "Key must be a symbol"))
	`(gethash ',(cadr list) ,(car list))))) ; The actual code template

(set-macro-character #\] (get-macro-character #\))) ; This is a helper for read-delimited-list

An even simpler, although utterly pointless, example is the reader macro version of hello world:

(set-dispatch-macro-character #\# #\* (lambda (s c a) "Hello, world!"))

Now, when the Lisp reader/parser encounters the sequence "#*", it will feed the interpreter/compiler the string "Hello, world!" instead.

Macros

Lisp is famous for its macros. Paul Graham describes them in his book On Lisp as follows: "The definition of a macro is essentially a function that generates Lisp code - a program that writes programs."

Graham goes on to write four chapters dedicated to writing macros. Macros in Lisp are similar in concept but not in power to those in C. In each language, a macro changes the source code before the compiler sees it. The key difference in Lisp is that the full language is available to the macro author, because macros are written in Lisp.

Macros are very powerful and can be very useful. However, they should be used sparingly. Graham avoids using them whenever it makes sense to do things without writing macros, and then goes on to quantify how much easier it was to solve a given problem with Lisp based on the percentage of the code that consists of macros. The more macros, the harder it would be to do in a language lacking them.

Macro calls in Lisp look just like function calls. For instance, if you see (operator arg1 arg2) in Lisp code, you need to determine whether operator names a function or a macro (or a special operator, which means a built-in Lisp operator; the list of these is constant). When the compiler sees this code, it checks if the operator is a special operator. If so, it compiles the special operator call. If it is not, then the compiler checks whether the operator names a macro. If so, then the compiler expands the macro form and compiles it. Otherwise, the compiler generates a function call (regardless of whether the operator names a function at that time). Because of this behavior, it is necessary to define macros before using them; whereas functions can be used in code that gets compiled before the function gets defined.

As stated above, backquote syntax is common in macros. The reason for this is that backquoting allows you to create code templates that look just like the code they will generate, with special comma and comma-at syntax for the places in the template that will be filled in by the macro.

Generally speaking, if you cannot do something with a function, you can do it with a macro. This can be anything from opting out of evaluating an argument to the macro to extending Lisp and right on up to implementing a compiler for another language, such as a domain-specific language (see loop above for an example of a language that is compiled into Common Lisp by a macro).

An example of extending the Lisp language is a for macro. Suppose you want to have a for loop that behaves similarly to what you would have in C. You want to write:

(for (i 0) (< i 10) (incf i)
  your
  code
  here)

Common Lisp does not have anything that looks like this. It has several ways to accomplish the same thing, and in fact we will use one of them to implement this macro. (As a matter of fact, the one used below is probably the least appropriate for a simple loop like this, since there are several excellent loop facilities available in Common Lisp. However, goto is more fun for didactic purposes.)

(defmacro for (init continue next &body body)
  (unless (consp init)
    (error "Syntax error - init form must be a list"))  ; Thrown at compile-time
  (let ((looplabel (gensym)) ; We use a gensym to avoid using a name that is in the body
        (exitlabel (gensym))
        (var (first init))
        (initval (second init)))
    (unless (symbolp var)
      (error "Syntax error - variable must be a symbol")) ; Thrown at compile-time
    `(prog ((,var ,initval)) ; Here begins the code template; prog is like (let () (tagbody ...))
        ,looplabel ; In a tagbody/prog, you can jump to symbols used as labels
        (unless ,continue ; If the condition is not true...
          (go ,exitlabel)) ; ...then exit the loop
        ,@body ; Splice in the body of the loop
        ,next ; Update the counter
        (go ,looplabel) ; Loop
        ,exitlabel)))) 

Note that this is what is known as an anaphoric macro - that is, it creates a variable that is visible inside the body of the macro call. Specifically, the variable i will be available inside of the example loop given above. This is one of those gotchas that should be avoided in most macros, but it has its place. For instance, there is a popular package of macros that provides anaphoric versions of some conditional operators. These tend to work by binding the variable it to the result of the conditional test. The C equivalent is usually if((it = (test))).

At any rate, the above macro is anaphoric. However, it also demonstrates how to avoid accidentally capturing variables, using gensym to create an uninterned symbol, meaning one that cannot possibly be the same symbol as anything in the body of the macro call, and therefore one that cannot interfere with the user's code.

Note that this macro will not be applied by the reader, but rather by the compiler. We can use read-from-string to test this:

(read-from-string "(for (i 0) (< i 10) (incf i) (format t \"~d~%\" i))")
 =>
(FOR (I 0) (< I 10) (INCF I) (FORMAT T "~d~%" I))

To see what the compiler will end up compiling, use macroexpand-1 (indentation done by hand):

(macroexpand-1 '(for (i 0) (< i 10) (incf i) (format t "~d~%" i)))
 =>
(PROG ((I 0))
  #:G99
  (UNLESS (< I 10)
    (GO #:G100))
  (FORMAT T "~d~%" I)
  (INCF I)
  (GO #:G99)
  #:G100
  )

So if you tell Lisp to compile our sample form and run it, you will get this:

(for (i 0) (< i 10) (incf i) (format t "~d~%" i))
 =>
0
1
2
3
4
5
6
7
8
9
NIL

The numbers were printed to standard output, while the NIL is the return value of the macro call, which is the return value of the prog form, which is in turn the return value of the unless form, which returns NIL since its test came up false the last time through the loop.

If you have understood everything up to this point, then you know that the above macro has provided a for loop to a language that had none, using only that language's built-in versions of if and goto. C macros cannot do this.

There are, of course, much better uses for macros, but adding something simple to the language is the most common way to teach them, largely because adding more complicated things to the language is the most common use of macros.

For another example, consider if you had a basic mutex library available to use. The following code should be easy to read, and does not reflect the use of any particular mutex library but rather should serve only as an example:

(defmutex *global-x-mutex*)
(defvar *global-x*)

(defun process ()
  (lock-mutex *global-x-mutex*)
  (do-something-to *global-x*)
  (unlock-mutex *global-x-mutex*))

This code looks simple, but obviously could get much more complicated in practice. It also has a bug, which is not easy to spot right away in more complex code and which a programmer is likely to write into his code more than once. The bug is that the mutex will not be unlocked if an error causes do-something-to to throw a condition outside of process. The solution is to write the code with unwind-protect like this:

(defun process ()
  (lock-mutex *global-x-mutex*)
  (unwind-protect
       (do-something-to *global-x*)
    (unlock-mutex *global-x-mutex*)))

This avoids the pitfall, since unwind-protect ensures that the unlock operation occurs before it returns, no matter whether the return is local or non-local (that is, even if an error condition is thrown outside of process). But this code is ugly and clumsy to write over and over again. It also hides the meaningful code in a mess of what amounts to janitorial code - the one line of work is buried within three lines of bookkeeping.

Instead, you can write a macro to do the dirty work for you and reduce the likelihood that you will screw it up at some point in your code. Then, you can simply call the macro. Compare the following code to each of the above snippets (note that the second progn allows you to have multi-form bodies in code that uses the macro, even though the sample use below does not use this feature of the macro written below):

(defmacro with-locked-mutex (mutex &body body)
  `(progn
     (lock-mutex ,mutex)
     (unwind-protect
          (progn ,@body)
       (unlock-mutex ,mutex))))

(defun process ()
  (with-locked-mutex *global-x-mutex*
    (do-something-to *global-x*)))

The bookkeeping code is down to one line, and that line is easy to read and understand. With-locked-mutex tells the person reading the code that, for the duration of its body, the *global-x-mutex* will be locked. Moreover, it is impossible to forget to unlock the mutex, since the macro does it for you. In short, this macro makes it easier to write code and easier to read the code you have written.

What you already know about factoring your code into functions to make your code easier to read and to make bugs easier to fix, among other benefits, applies analogously to using macros. Macros allow you to turn Lisp into a more powerful and more expressive language, and into a language that is especially well suited to solving the problem at hand.

For example, On Lisp spends one chapter guiding the reader through writing a Prolog interpreter and then a Prolog compiler using some macros. (Note that the syntax is in this case still written in s-expressions, so you do not have to write a Prolog parser and can focus on the semantics.) There is virtually no limit to what can be done with macros when it comes to building up a language with which to solve a given problem.

Knowing how and when to use macros (and, conversely, how and when not to use them) is one of the key skills in writing good Lisp code.

Symbol Macros

Symbol macros do not get used terribly often, but they are useful to know about. They are simply macros that replace symbols with code before the compiler sees the code. They are created with define-symbol-macro.

(define-symbol-macro hi (progn (format t "Hello world!~%") 'hello))

When you use the symbol hi in your code after this, it will be replaced with the progn form and then compiled.

hi
 =>
Hello world!
HELLO

The orange text is on standard output, while the HELLO is the value returned by the symbol macro. The only other thing to know about symbol macros is that they can be shadowed, for instance by let.

(let ((hi "Goodbye!"))
  hi)
 =>
"Goodbye!"

Compiler Macros

The "other" kind of macro is the compiler macro. Compiler macros are macros that have the same name as a function or macro, but may be applied by the compiler instead of directly producing a function call or macroexpanding the macro. Note that the compiler does not have to use the compiler macro. What you do is create a compiler macro to manually optimize your function or macro calls. This is a rarely-used facility, but the short version is that you use define-compiler-macro to do it.

By way of example, suppose that you are writing a program that uses many regular expressions to process text.

Start off with the assumption of a regular expression system that provides the function REGEXP-PARSE, which takes a regular expression as a string and returns a function that, when applied to a string, scans that string according to the regular expression. You can then define an all-in-one function to scan a string according to a regular expression:

(defun regexp-scan (expression text)
  (let ((parsed-expression (regexp-parse expression)))
    (funcall parsed-expression text)))

The problem with REGEXP-SCAN is that, every time it is called, it parses the regular expression at runtime. If your regular expressions are always constant strings, you could use a macro to parse them at compile time. However, if they could be either constant strings or runtime variables, then you cannot just use a macro or just use a function. You could always use a function for runtime variables and a macro for compile-time constants, but they would need to be named differently and your code would be less maintainable as a result.

The solution is to write a compiler macro named the same as the REGEXP-SCAN function. This compiler macro will check for a constant string passed to it as the regular expression. If the expression is a constant string, then it will be parsed at compile time. If it is a variable, then the compiler macro will return the original form, indicating to the Lisp compiler that it should use the function (which is defined above) instead.

(define-compiler-macro regexp-scan (&whole whole expression text)
  (if (and (stringp expression) (constantp expression))
    `(funcall ,(regexp-parse expression) ,text)
    whole))

Whether the compiler actually chooses to use the compiler macro, however, is implementation-dependent. Therefore, you should check your implementation to see how much support it has for compiler macros before you rely on them for all of your optimizations. That said, it is generally considered good practice not to optimize prematurely, so the best thing to do is normally just write Lisp code in a natural way and worry about performance bottlenecks later on.