Haskell

1. Concept
- 1.1. strictness
2. Lexical
3. Expression
4. Declaration
5. Predefined classes
6. Monad
7. Reference
- 7.1. Parsers

1 Concept

1.1 strictness

In a strict language, the arguments to a function are always evaluated before it is invoked. In a non-strict language, the arguments are not evaluated until their values are actually required. As a result, if during the evaluation of the arguments, run-time error occurs, the strict language will crash, but the non-strict program might finish peacefully.

Scheme is strict, while Haskell is non-strict.

2 Lexical

2.1 Names

There are six kinds of names:

(value) variables
(value) constructors
type variables
type constructors
type classes
module names

There are two constraints on naming:

lower-case vs upper-case
- variables and type variables must begin with lowercase letter or underscore
- other 4 kinds of names must begin with uppercase letter
in the same scope, type constructor and type class must not be the same

2.2 Comment

Pretty flexible, but just use

-- for line comment
{- XXX -} for block comment

2.3 layout

Haskell support two style of programming, possibly mixed: using explicit braces and semicolons, or use indention. The rule is pretty straight forward.

The layout rule takes effect whenever the open brace is omitted after keywords where, let, do, of. If it is not omitted, nothing happens, the indention will not matter at all. The indention of the next lexeme is remembered, and the omitted open brace is inserted. In the subsequent lines:

if the indention is larger, nothing inserted, i.e. it is same as writing on the previous line
if the indention is the same, a semicolon is inserted
if the indention is smaller, a close brace is inserted, and the layout rule ends

3 Expression

conditional

if e1 then e2 else e3
case e1 of { True -> e2; False -> e3}

List

[e1,e2,...,ek]
e1:(e2:(...(ek:[])))

Tuple

(e1,...,ek)

enumerations (note the .. is an operator)

[e1..]
[e1,e2..]
[e1..e3]
[e1,e2..e3]

list comprehension: Each of q_i is a qualifier, can have three forms. These qualifiers are nested. The first is pattern matching that matches pat against a list expression (thus called generator), the second create local bindings, the third a boolean guard. Pattern matching creates lambda bound, while let creates lexical bounds.

[e | q1,...,qn]

pat <- exp
let decls
exp

Let expression introduces a nested, lexically-scoped mutually-recursive (letrec) list of declarations.

let {d1;...;dn} in e

case expression matches e against a list of patches. But if a pattern matched, the body is not a single expression, but seems to be another round of matching. Each match is a list of guard expression separated by vertical bar (note the first vertical bar is mandatory). Each guard expression is a list of actual guards. Only if all the actual guards succeed, the guard expression succeeds. The corresponding expression is the result. Seems that the top-level patterns and the match guards have the same semantic, i.e. if one fail, try the next one.

case e of {p1 match1; ... ; pn matchn}
match ::= | gs1 -> e1 | gs2 -> e2 | ...
gs ::= guard1, guard2, ...
guard ::= p <- e
guard ::= let decls
guard ::= boolean guard

do expression seems to be exclusively used in monad. It is a syntax sugar of bind operation.

do x <- f; g x = f >>= (\x -> g x)

Expression type-signature is a notation for human only. The compiler does not need to know it, because it infers the type signature. Of course the compiler will complain if they do not match. Thus, the expression with type-signature simply evaluates to the expression.

The declared type may be more specific than the principal type derived from exp, but it is an error that the declared type is more general. But, it seems that, if we declare a more general type signature before the use of the type (thus before the inference), it is OK.

exp :: [context =>] type

Pattern matching match pattern against values, and construct bounds.

matching var against a value v always succeed, and bounds var to v
matching ~apat against a value v always succeed. This is called a irrefutable pattern, means I force it to match, don't give me error. The error will occur when the matched bindings are evaluated.
wildcard pattern _ against any value always succeed
matching con pat where con is a type constructor (defined by newtype), the value must also be con v to succeed
matching con pat1 ... patn where con is a data constructor (defined by data), the value must also be con v1 ... vn with the same constructor to succeed. The variables are bound.
[ ] fields
matching numeric, character, or string literal will only succeed if the value is that literal.
match var@apat will basically match apat against v, but introduce additional bound var for the whole v.

Pattern matching is a way of destructuring an algebraic data type, selecting a code clause based on its constructor and then binding the components to variables.

4 Declaration

Actually there are two categories of declarations: type and binding. So first we need to figure out what is type in Haskell.

There are two kinds of types, namely * and *->*. In Haskell, a type variable is assumed to be universally qualified, i.e. \(a \rightarrow a\) means \(\forall a . a \rightarrow a\).

Type context indicates the membership of a type variable to some type classes. The context is often written as cx => t.

The type that best describe an expression is its principal type.

4.1 Data Types (`type`, `newtype`, `data`)

This is called polymorphic types: (forall a)[a] denotes the family of types, for each type a, the type "list of a". Identifiers such as a above are called type variables, and are uncapitalized to distinguish them from specific types such as Int. This is called parametric polymorphism, useful to define families of types by universally quantifying them.

data declares an algebraic datatype. The following declares a new type constructor T, with zero or more data constructors (or just constructor) K1 to Kn.

data cx => T u1 ... uk = K1 t ... t | ... | Kn t ... t

type introduces a new type that is equivalent to the old one. This is called Type Synonym.

type T u1 ... uk = t

newtype introduces a new type whose representation is the same as old one. This seems very similar to type synonyms. But this is called renaming. It is introduced by the form

newtype cx => T u1 ... uk = N t

The difference:

newtype creates a distinct type that must be explicitly coerced to or from the original type
newtype can be used to define recursive types.
New instance can be defined for a newtype, but may not be defined for a type synonym.

4.2 Type Classes (`class`, `instance`, `default`)

This is ad hoc polymorphism. Compared to parametric polymorphism, it quantifies over some smaller set of types, e.g. those that can be compared for equality. It seems that type classes are defining some constrained set of types for some type, e.g. a subset of Integers. It defines not a type, but rather expresses a constraint on a type. The constraint is called a context. The context is placed at the front of type expressions.

The class declares a new type class (or just class). It will together define a set of methods that all instance of this class must support. You have two ways to define the instance relationship: through an explicit instance declaration, or use deriving to embed it into the class declaration.

class cx => C u where
  op :: cx => t -- cdecl 1
  infixr 7 'op' -- cdecl 2
  op = ... -- cdecl 3

This introduces a class named C, with super class from cx. As shown in the example, there are three kinds of decls. First is a method declaration. This only declares the type signature of the method.

Second is a fixity declaration. It has the following grammar, with integer be 0 to 9, where 9 is the highest precedence. I'm not sure why op needs to be quoted (or is it a quote?) though.

The third declaration is a default class method for any of the method declared.

Finally, a instance introduce an instance T of a class C. T is a type constructor, and cannot be a type synonym. This is called a C-T instance declaration.

instance cx => C (T u1 ... uk) where {d}

Since instance only declares the relationship, the type T still needs to be defined. It is also possible to declare the instance relationship together with the declaration of type. Specifically newtype and data can have an optional deriving sub-form for that. Omitting it is equivalent to writing an empty deriving instance deriving ().

data ... deriving C
newtype ... deriving C

4.3 nested declarations

This type of declaration denotes those that can be nested inside let or where.

First is type signature declaration

var1, ..., varn :: cx => t

Then the fixity declaration

(infixl | infixr | infix) [integer] ops

Function binding is a list of bindings. They are used to match different patterns of parameters. Each match is very similar to the match in case expression, but instead use = instead of ->.

foo p1 ... pn match
foo p1 ... pn match
...
match ::= | gs1 = e1 | gs2 = e2 | ...
gs ::= guard1, guard2, ...
guard ::= p <- e
guard ::= let decls
guard ::= boolean guard

Basically, it is semantically equivalent to this case statement:

x = \x1, ..., xk -> case (x1, ..., xk) of
    p1 ... pn match
    p1 ... pn match

Finally, we have pattern binding decl. The form can be:

-- simple form
p = e
-- general form
p | gs1 = e1
  | gs2 = e2
  ...

This is semantically equivalent to

p = let decls in
    case () of
      () | gs1 -> e1
      () | gs2 -> e2
      ...

5 Predefined classes

Here just literally copy the definition of those types.

Basic types

  -- bool
  data  Bool  =  False | True deriving
                               (Read, Show, Eq, Ord, Enum, Bounded)
  -- string
  type  String  =  [Char]
  -- list
  data  [a]  =  [] | a : [a]  deriving (Eq, Ord)
  -- unit
  data  () = () deriving (Eq, Ord, Bounded, Enum, Read, Show)
  data  Maybe a     =  Nothing | Just a  deriving (Eq, Ord, Read, Show)
  data  Either a b  =  Left a | Right b  deriving (Eq, Ord, Read, Show)
  data  Ordering    =  LT | EQ | GT deriving
                                    (Eq, Ord, Bounded, Enum, Read, Show)
  class  Bounded a  where
      minBound, maxBound :: a

Eq is pretty canonical

  class  Eq a  where
        (==), (/=)  ::  a -> a -> Bool
        x /= y  = not (x == y)
        x == y  = not (x /= y)
  class  (Eq a) => Ord a  where
     compare              :: a -> a -> Ordering
     (<), (<=), (>=), (>) :: a -> a -> Bool
     max, min             :: a -> a -> a

     compare x y | x == y    = EQ
                 | x <= y    = LT
                 | otherwise = GT

     x <= y  = compare x y /= GT
     x <  y  = compare x y == LT
     x >= y  = compare x y /= LT
     x >  y  = compare x y == GT

     -- Note that (min x y, max x y) = (x,y) or (y,x)
     max x y | x <= y    =  y
             | otherwise =  x
     min x y | x <= y    =  x
             | otherwise =  y

Read and Show

  type  ReadS a = String -> [(a,String)]
  type  ShowS   = String -> String

  class  Read a  where
      readsPrec :: Int -> ReadS a
      readList  :: ReadS [a]
      -- ... default decl for readList given in Prelude

  class  Show a  where
      showsPrec :: Int -> a -> ShowS
      show      :: a -> String
      showList  :: [a] -> ShowS

      showsPrec _ x s   = show x ++ s
      show x            = showsPrec 0 x ""
      -- ... default decl for showList given in Prelude

Enumerator is a classical example of laziness

  class  Enum a  where
      succ, pred     :: a -> a
      toEnum         :: Int -> a
      fromEnum       :: a -> Int
      enumFrom       :: a -> [a]            -- [n..]
      enumFromThen   :: a -> a -> [a]       -- [n,n'..]
      enumFromTo     :: a -> a -> [a]       -- [n..m]
      enumFromThenTo :: a -> a -> a -> [a]  -- [n,n'..m]

Of course we have the Monad:

  class  Functor f  where
      fmap    :: (a -> b) -> f a -> f b
  class  Monad m  where
      (>>=)   :: m a -> (a -> m b) -> m b
      (>>)    :: m a -> m b -> m b
      return  :: a -> m a
      fail    :: String -> m a

      m >> k  =  m >>= \_ -> k
      fail s  = error s

6 Monad

A monad is a way to structure computations in terms of values and sequences of computations using those values.

It is useful to think of a monad as a strategy for combining computations into more complex computations.

In general, use >> if the actions don't return a value, >>= if you'll be immediately passing that value into the next action, and do-notation otherwise.

6.1 Three Components of monad

A monad is a type constructor, a function called return, and a combinator function called bind or >>=. These three elements work together to encapsulate a strategy for combining computations to produce more complex computations.

the monad type constructor defines a type of computation, the return function creates primitive values of that computation type and >>= combines computations of that type together to make more complex computations of that type.

6.2 Type constructor

A type constructor is a parameterized type definition used with polymorphic types.

6.3 Using class

To define a monad, you basically need the three components. You can do it from scratch, but a better idea is through the use of class Monad.

class Monad m where
    (>>=)  :: m a -> (a -> m b) -> m b
    return :: a -> m a

There's a do notation that can be used with a monad. It is a syntactic sugar that "provides a simple, imperative-style notation for describing computations with monads"

Apart from these two operations, the Monad class also has two more operations: fail and >>. They are optional. Use fail if you want to have different behavior for failure. The >> function is a convenience operator that "used to bind a monadic computation that does not require input from the previous computation in the sequence." It is defined in terms of >>=:

  (>>) :: m a -> m b -> m b
  m >> k = m >>= (\_ -> k)

So the full version should be

  class Monad m where
    (>>=)  :: m a -> (  a -> m b) -> m b
    (>>)   :: m a ->  m b         -> m b
    return ::   a                 -> m a
    fail   :: String -> m a

6.4 the monad laws

All instances of Monad should obey the following equations, called Monad Laws:

return a >>= k                  =  k a
m        >>= return             =  m
m        >>= (\x -> k x >>= h)  =  (m >>= k) >>= h

return is a left-identity with respect to >>=
return is a right-identity with respect to >>=
a kind of associativity law for >>=

Any type constructor with return and bind operators that satisfy the three monad laws is a monad.

The compiler, however, does not check these laws.

6.5 A different expression of Monad Laws

create a description of a computation that will produce (a.k.a. "return") a given Haskell value, and
combine (a.k.a. "bind") a computation description with a reaction to it
- a pure Haskell function that is set to receive a computation-produced value (when and if that happens) and return another computation description, using or dependent on that value if need be
- creating a description of a combined computation that will feed the original computation's output through the reaction while automatically taking care of the particulars of the computational process itself.

6.6 One Way Monad

The IO monad is a familiar example of a one-way monad in Haskell. Because you can't escape from the IO monad, it is impossible to write a function that does a computation in the IO monad but whose result type does not include the IO type constructor. This means that any function whose result type does not contain the IO type constructor is guaranteed not to use the IO monad. Other monads, such as List and Maybe, do allow values out of the monad. So it is possible to write functions which use these monads internally but return non-monadic values.

The wonderful feature of a one-way monad is that it can support side-effects in its monadic operations but prevent them from destroying the functional properties of the non-monadic portions of the program.

7 Reference

[X] Haskell wiki: https://wiki.haskell.org
[X] Haskell wiki book: https://en.wikibooks.org/wiki/Haskell
[X] Write yourself a scheme: https://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours
[X] The tutorial: https://www.haskell.org/tutorial/index.html
[X] language specification: https://wiki.haskell.org/Language_and_library_specification
[X] GHC: https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/
[X] Cabal: package management https://www.haskell.org/cabal/

7.1 Parsers

[X] Parsec Parser: https://hackage.haskell.org/package/parsec
- [X] megaparsec: This is a fork of parsec https://hackage.haskell.org/package/megaparsec
[X] Happy parser: https://www.haskell.org/happy/
[X] Alex: https://www.haskell.org/alex/
[X] Earley: believe it or not, this is a new one. https://hackage.haskell.org/package/Earley
attoparsec: another combinator, but seems not feature rich. https://github.com/bos/attoparsec
trifecta: this is yet another combinator. But seems to be out of date and buggy? https://github.com/ekmett/trifecta/
parsers: also combinator. https://hackage.haskell.org/package/parsers