Friday, August 31, 2012

Why Haskell is not a lazy language

There is a recurrent misconception that Haskell is a lazy functional language. As some people have pointed out, this is not an accurate description: Haskell is a non-strict functional language. Lazy evaluation (i.e. call-by-need or graph reduction) is an (operational) implementation technique; non-strictness is a (denotational) semantic property.
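To make the distinction concrete, here is a minimal illustration: under Haskell's non-strict semantics the value of the expression below is 42, and that holds whether the implementation reaches it by call-by-name, call-by-need, or any other strategy that preserves the semantics.

 -- Illustration only: the denotation of `example` is 42; whether (or how)
 -- the second component of the pair is ever evaluated is an implementation
 -- detail, not part of the meaning of the program.
 example :: Int
 example = fst (42, undefined)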

The Haskell standard (almost) never specifies evaluation order; it says what the values (denotations) of expressions should be (in particular, whether they should be terminating or non-terminating - designated "bottom" in denotational semantics). Implementations are free to choose any evaluation strategy that preserves the semantics. With call-by-name being too inefficient for general use, a "safe" approach would be to use lazy evaluation everywhere. However, an implementation can use eager evaluation whenever this does not change the termination properties of the program. This can be detected automatically by strictness analysis and is routinely done by the GHC compiler to optimize numerical code (for example).
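As a rough sketch of what strictness analysis enables, consider this (hypothetical) accumulating loop: the accumulator is demanded on every path, so a compiler may evaluate acc + x eagerly instead of building a chain of thunks, without changing the program's value.

 -- Hypothetical example: `acc` is needed in every branch, so strictness
 -- analysis can justify evaluating it eagerly; the result is the same
 -- either way, only the operational behaviour differs.
 sumTo :: Int -> Int
 sumTo n = go 0 [1 .. n]
   where
     go acc []       = acc
     go acc (x : xs) = go (acc + x) xs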

Moreover, Haskell is non-strict only by default: the programmer can also express strictness information through annotations. Consider the definition of the natural numbers in Peano style; a first attempt would be as follows.

 data Nat = Zero | Suc Nat deriving Show

Data constructors are non-strict by default, so this type includes an "infinite" natural number; let's try it out in the Haskell interpreter:

> let inf = Suc inf in inf
Suc (Suc (Suc (....^C Interrupted

Such cyclic values can lead to undesirable consequences. For example, they break some familiar algebraic laws: from A+x=B+x we can no longer infer that A=B. For a counterexample, let's first define addition using primitive recursion:

 -- Note: in a compiled module this definition requires
 -- "import Prelude hiding ((+))" so the uses of + below are unambiguous.
 (+) :: Nat -> Nat -> Nat
 Zero  + y = y
 Suc x + y = Suc (x + y)

Now we can show that Suc Zero + inf = Zero + inf:

  Suc Zero + inf 
= {2nd equation of +}
  Suc (Zero + inf) 
= {1st equation of +}
  Suc inf 
= {definition of inf}
  inf
= {1st equation of +}
  Zero + inf

Thus, if the cancellation law held, we could conclude from Suc Zero + inf = Zero + inf that Suc Zero = Zero, which is not a good thing.
We can prevent such pathological values by adding a strictness annotation to the data type declaration; let's define proper Peano naturals:

> data PNat = PZero | PSuc !PNat deriving Show

The exclamation mark (pronounced "bang") declares that PSuc is strict in its argument, i.e. PSuc bottom = bottom (where bottom is a non-terminating expression). (By the way, such strictness annotations are part of the Haskell standard, not a GHC extension).

What happens if we attempt to define the infinite number? Well, we simply get a non-terminating computation (bottom). In other words, inf becomes indistinguishable from a completely circular definition:

> let inf = PSuc inf in inf
^C Interrupted

> let bottom = bottom in bottom
^C Interrupted

Now what if we want to specify the strictness of functions? Strictness annotations don't work here because we are not defining a new data type; we can instead use bang patterns (which are not part of the Haskell standard, but are an extension supported by GHC and other implementations). For example, let's define strict and non-strict constant functions (i.e. functions that yield their first argument and ignore the second).

{-# LANGUAGE BangPatterns #-}
import Prelude hiding (const)  -- avoid a clash with the Prelude's const

strict_const :: a -> b -> a
strict_const k !x = k

const :: a -> b -> a
const k x = k

Testing these in the interpreter works as expected:

> let bottom=bottom in const 42 bottom
42
> let bottom=bottom in strict_const 42 bottom
^C Interrupted

The moral of the story is that the Haskell programmer is not forced to be lazy or non-strict. We have at our disposal the means to express our intent. Use them often and use them wisely!

Thursday, August 30, 2012

Getting started!

In this blog I'll write about things that surprise and amuse me in computer science, programming language theory and practice, mathematics, science and technology.

About the title

In programming language theory, the "Y combinator" (a fixed-point combinator) is a higher-order function that allows encoding recursion without self-reference (i.e. without defining a name in terms of itself); this is useful for giving meaning to recursive definitions in a mathematically precise way. You can read more about it in the Wikipedia article.
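For a flavour of the idea in Haskell (a sketch, using the typed fix combinator rather than the untyped Y combinator itself), recursion can be factored out so that the body of factorial never mentions factorial by name; note that this fix is itself defined recursively, whereas the untyped Y combinator needs no self-reference at all.

 -- A sketch: `fix` ties the recursive knot (it behaves like Data.Function.fix),
 -- and the factorial body refers to `rec` instead of to itself.
 fix :: (a -> a) -> a
 fix f = f (fix f)

 factorial :: Integer -> Integer
 factorial = fix (\rec n -> if n == 0 then 1 else n * rec (n - 1))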