Y combinator: October 2012

I am a great fan of the QuickCheck testing library for Haskell and have been using it for some time now. If you don't know about it stop reading this immediately and go read the original ICFP paper. Alternatively (for the impatient) browse the article on Wikipedia which also refers to some re-implementations in other languages.

QuickCheck can be seen as two domain-specific languages embedded in Haskell: one for generating test data and another writing properties of Haskell programs (and also collecting coverage statistics). But you can also use QuickCheck to generate complex input data for programs in any language.

The example I am reporting came up when setting up a problem for TIUP, a locally organized ACM-style programming contest for undergraduate students. The competitors must solve a number of coding problems in C/C++, Java or PASCAL. The solutions are tested automatically by running the programs against a number of hidden test inputs to assess output correctness (and perhaps efficiency by imposing time and space limits).

For my particular problem, the inputs are planar graphs representing a city road map. I used Haskell and QuickCheck to generate random planar graphs of increasing size. The idea is simple (and probably not original): start with a regular mesh and randomly delete some edges; some care is needed to avoid disconnecting the graph and avoid leaving nodes with a single outward edge (for esthetical reasons).

Here are some examples of generated graphs with 25 nodes and an increasing number of edges removed.

Full mesh

20% random edges removed

50% edges removed

You can download the source file here: Graph.hs. This Haskell script generates some random graphs and invokes graphviz to render them. It is very short (around 120 lines) and, I my opinion, quite readable; it should also be easy to modify. The functional programming "idioms" like parametric polymorphism, higher-order functions, list comprehensions and point-free style really shine on this kind of problem.

This is a short Haskell program that implements regular expression matching ( you can download the source file here: RE.hs). Two neat things about this implementation:

matching is defined by recursion over the structure of the regexp (i.e. there is no need to convert the regexp to an automata); and
it uses the "overloaded strings" extension to allow treating string literals as regexps.

Matching is implemented using continuations that specify what to do with the suffix left-over after matching the string; this leads to a very elegant and self-documented solution.
There is, however, one drawback: the handling of star-closure is inefficient due to the test cs'/=cs to check that a prefix of the string was consumed. The changes needed to implement a more efficient version are left as an exercise for the reader.

{- ------------------------------------------------------------------ 
   Regular expression matching in Haskell 
 
   This is a classical example of functional programming with 
   continuations. Adapted from the SML version by Olivier Danvy 
   (from "Defunctionalization at Work", BRICS RS-01-23, 2001).
   This version uses the "overloaded strings" GHC extension for
   mixing character literals in regexps.  

   Some examples to try on the GHCi interpreter;
   (Note: use the command ":set -XOverloadedStrings"
   to enable overloaded strings for the GHCi session.)

   > :set -XOverloadedStrings
   > match (star "ab") ""
   = True
   > match (star "ab") "aba"
   = False
   > match (star "ab") "abab"
   = True
   > match (star "a" <+> star "b") "aaaa"
   = True
   > match (star "a" <+> star "b") "aabb"
   = False
   > match (star ("a" <+> "b")) "aabb"
   = True

   Pedro Vasconcelos, 2012
  -------------------------------------------------------------------
-}

-- enable the overloaded strings language extension in GHC
{-# LANGUAGE OverloadedStrings #-}
module RE where
-- import the interface for overloading strings
import GHC.Exts ( IsString(..) )

-- Abstract syntax of regular expressions;
-- literal characters, empty word, empty language
-- concatenation, union and Kleene star closure
data Regexp = Lit Char
            | Epsilon
            | Empty
            | Concat Regexp Regexp
            | Union Regexp Regexp
            | Star Regexp
            deriving (Show, Read, Eq)
                            
-- "Smart" constructors that perform some simplifications
-- Infix binary operators for concation and union
infixl 7 <>
infixl 6 <+>

(<>) :: Regexp -> Regexp -> Regexp
Empty <> _ = Empty
_ <> Empty = Empty
Epsilon <> r = r
r <> Epsilon = r
r <> r'      = Concat r r'

(<+>) :: Regexp -> Regexp -> Regexp
Empty <+> r = r
r <+> Empty = r
r <+> r'    = Union r r'

-- Kleene star closure
star :: Regexp -> Regexp 
star Empty     = Epsilon
star Epsilon   = Epsilon
star (Star r) = Star r
star r        = Star r

-- coercion of strings to regular expressions
-- e.g. fromString "abc" = Lit 'a' <> Lit 'b' <> Lit 'c' <> Epsilon
instance IsString Regexp where
  fromString cs = foldr ((<>) . Lit) Epsilon cs


-- Higher-order, continuation-based matcher for regular expressions
-- worker function to match a regexp with string
-- 3rd argument is the continuation for the non-matched suffix
accept :: Regexp -> String -> (String -> Bool) -> Bool
accept Empty cs k   = False
accept Epsilon cs k = k cs
accept (Lit c) (c':cs) k = c==c' && k cs
accept (Lit c) []  k   = False
accept (Concat r1 r2) cs k = accept r1 cs (\cs' -> accept r2 cs' k)
accept (Union r1 r2) cs k = accept r1 cs k || accept r2 cs k
accept (Star r) cs k = accept_star r cs k
  
accept_star r cs k 
  = k cs || accept r cs (\cs' -> cs'/=cs && accept_star r cs' k)

-- top-level matcher; initial continuation tests for the empty string
match :: Regexp -> String -> Bool
match r cs = accept r cs null

-- end of file --

Y combinator

Tuesday, October 16, 2012

Generating random planar graphs

Wednesday, October 10, 2012

Regular expression matching in Haskell

About Me