# Haskell

## Table of Contents

## 1 Concept

### 1.1 strictness

In a strict language, the arguments to a function are always evaluated
before it is invoked. In a non-strict language, the arguments are not
evaluated until their values are **actually** required. As a result, if
during the evaluation of the arguments, run-time error occurs, the
strict language will crash, but the non-strict program might finish
peacefully.

Scheme is strict, while Haskell is non-strict.

## 2 Lexical

### 2.1 Names

There are six kinds of names:

- (value) variables
- (value) constructors
- type variables
- type constructors
- type classes
- module names

There are two constraints on naming:

- lower-case vs upper-case
- variables and type variables must begin with
**lowercase**letter or underscore - other 4 kinds of names must begin with
**uppercase**letter

- variables and type variables must begin with
- in the same scope,
**type constructor and type class**must not be the same

### 2.2 Comment

Pretty flexible, but just use

`--`

for line comment`{- XXX -}`

for block comment

### 2.3 layout

Haskell support two style of programming, possibly mixed: using explicit braces and semicolons, or use indention. The rule is pretty straight forward.

The layout rule takes effect whenever **the open brace is omitted**
after keywords ** where**,

**,**

`let`

**,**

`do`

**. If it is not omitted, nothing happens, the indention will not matter at all. The indention of the next lexeme is remembered, and**

`of`

**the omitted open brace is inserted**. In the subsequent lines:

- if the indention is larger, nothing inserted, i.e. it is same as writing on the previous line
- if the indention is the same, a semicolon is inserted
- if the indention is smaller, a close brace is inserted, and the layout rule ends

## 3 Expression

conditional

if e1 then e2 else e3 case e1 of { True -> e2; False -> e3}

List

[e1,e2,...,ek] e1:(e2:(...(ek:[])))

Tuple

```
(e1,...,ek)
```

enumerations (note the `..`

is an operator)

[e1..] [e1,e2..] [e1..e3] [e1,e2..e3]

list comprehension: Each of q_{i} is a qualifier, can have three
forms. These qualifiers are **nested**. The first is pattern matching
that matches `pat`

against a **list expression** (thus called
*generator*), the second create local bindings, the third a boolean
guard. Pattern matching creates lambda bound, while let creates
lexical bounds.

[e | q1,...,qn]

pat <- exp let decls exp

Let expression introduces a nested, lexically-scoped
mutually-recursive (letrec) list of *declarations*.

let {d1;...;dn} in e

case expression matches `e`

against a list of patches. But if a
pattern matched, the body is not a single expression, but seems to be
another round of matching. Each match is a list of guard expression
separated by vertical bar (note **the first vertical bar is
mandatory**). Each guard expression is a list of actual guards. Only if
all the actual guards succeed, the guard expression succeeds. The
corresponding expression is the result. Seems that the top-level
patterns and the match guards have the same semantic, i.e. if one
fail, try the next one.

case e of {p1 match1; ... ; pn matchn} match ::= | gs1 -> e1 | gs2 -> e2 | ... gs ::= guard1, guard2, ... guard ::= p <- e guard ::= let decls guard ::= boolean guard

do expression seems to be exclusively used in monad. It is a syntax
sugar of *bind* operation.

do x <- f; g x = f >>= (\x -> g x)

Expression type-signature is a notation for human only. The compiler does not need to know it, because it infers the type signature. Of course the compiler will complain if they do not match. Thus, the expression with type-signature simply evaluates to the expression.

The declared type may be more specific than the principal type derived
from exp, but **it is an error that the declared type is more
general**. But, it seems that, if we declare a more general type
signature **before** the use of the type (thus before the inference), it
is OK.

exp :: [context =>] type

Pattern matching match pattern against values, and construct bounds.

- matching
`var`

against a value`v`

always succeed, and bounds`var`

to`v`

- matching
`~apat`

against a value`v`

always succeed. This is called a irrefutable pattern, means I force it to match, don't give me error. The error will occur when the matched bindings are evaluated. - wildcard pattern
`_`

against any value always succeed - matching
`con pat`

where`con`

is a type constructor (defined by`newtype`

), the value must also be`con v`

to succeed - matching
`con pat1 ... patn`

where`con`

is a data constructor (defined by`data`

), the value must also be`con v1 ... vn`

with the same constructor to succeed. The variables are bound. `[ ]`

fields- matching numeric, character, or string literal will only succeed if the value is that literal.
- match
`var@apat`

will basically match`apat`

against`v`

, but introduce additional bound`var`

for the whole`v`

.

Pattern matching is a way of destructuring an algebraic data type, selecting a code clause based on its constructor and then binding the components to variables.

## 4 Declaration

Actually there are two categories of declarations: type and binding. So first we need to figure out what is type in Haskell.

There are two *kinds* of types, namely `*`

and `*->*`

. In Haskell, a
type variable is assumed to be universally qualified, i.e. \(a
\rightarrow a\) means \(\forall a . a \rightarrow a\).

Type context indicates the membership of a type variable to some type
classes. The context is often written as `cx => t`

.

The type that best describe an expression is its *principal type*.

### 4.1 Data Types (`type`

, `newtype`

, `data`

)

This is called *polymorphic types*: `(forall a)[a]`

denotes the family
of types, for each type a, the type "list of a". Identifiers such as a
above are called *type variables*, and are uncapitalized to
distinguish them from specific types such as `Int`

. This is called
*parametric polymorphism*, useful to define families of types by
universally quantifying them.

`data`

declares an algebraic datatype. The following declares a new
*type constructor* T, with zero or more *data constructors* (or just
*constructor*) K1 to Kn.

data cx => T u1 ... uk = K1 t ... t | ... | Kn t ... t

`type`

introduces a new type that is equivalent to the old one. This
is called *Type Synonym*.

type T u1 ... uk = t

`newtype`

introduces a new type whose representation is the same as
old one. This seems very similar to type synonyms. But this is called
*renaming*. It is introduced by the form

newtype cx => T u1 ... uk = N t

The difference:

`newtype`

**creates a distinct type**that must be explicitly coerced to or from the original type`newtype`

can be used to define recursive types.- New instance can be defined for a
`newtype`

, but may not be defined for a type synonym.

### 4.2 Type Classes (`class`

, `instance`

, `default`

)

This is *ad hoc polymorphism*. Compared to *parametric polymorphism*,
it quantifies over some smaller set of types, e.g. those that can be
compared for equality. It seems that type classes are defining some
constrained set of types for some type, e.g. a subset of Integers. It
defines not a type, but rather expresses a constraint on a type. The
constraint is called a *context*. The context is placed at the front
of type expressions.

The `class`

declares a new *type class* (or just *class*). It will
together define a set of methods that all instance of this class must
support. You have two ways to define the instance relationship:
through an explicit `instance`

declaration, or use `deriving`

to embed
it into the `class`

declaration.

class cx => C u where op :: cx => t -- cdecl 1 infixr 7 'op' -- cdecl 2 op = ... -- cdecl 3

This introduces a class named C, with super class from `cx`

. As shown
in the example, there are three kinds of decls. First is a method
declaration. This only declares the type signature of the method.

Second is a *fixity declaration*. It has the following grammar, with
integer be 0 to 9, where 9 is the highest precedence. I'm not sure why
op needs to be quoted (or is it a quote?) though.

The third declaration is a default class method for any of the method declared.

Finally, a `instance`

introduce an instance T of a class C. T is a
type constructor, and cannot be a type synonym. This is called a *C-T
instance declaration*.

instance cx => C (T u1 ... uk) where {d}

Since `instance`

only declares the relationship, the type T still
needs to be defined. It is also possible to declare the instance
relationship together with the declaration of type. Specifically
`newtype`

and `data`

can have an optional `deriving`

sub-form for
that. Omitting it is equivalent to writing an empty deriving instance
`deriving ()`

.

data ... deriving C newtype ... deriving C

### 4.3 nested declarations

This type of declaration denotes those that can be nested inside `let`

or `where`

.

First is type signature declaration

var1, ..., varn :: cx => t

Then the fixity declaration

(infixl | infixr | infix) [integer] ops

Function binding is a list of bindings. They are used to match
different patterns of parameters. Each match is very similar to the
match in `case`

expression, but instead use `=`

instead of `->`

.

foo p1 ... pn match foo p1 ... pn match ... match ::= | gs1 = e1 | gs2 = e2 | ... gs ::= guard1, guard2, ... guard ::= p <- e guard ::= let decls guard ::= boolean guard

Basically, it is semantically equivalent to this case statement:

x = \x1, ..., xk -> case (x1, ..., xk) of p1 ... pn match p1 ... pn match

Finally, we have pattern binding decl. The form can be:

-- simple form p = e -- general form p | gs1 = e1 | gs2 = e2 ...

This is semantically equivalent to

p = let decls in case () of () | gs1 -> e1 () | gs2 -> e2 ...

## 5 Predefined classes

Here just literally copy the definition of those types.

Basic types

-- bool data Bool = False | True deriving (Read, Show, Eq, Ord, Enum, Bounded) -- string type String = [Char] -- list data [a] = [] | a : [a] deriving (Eq, Ord) -- unit data () = () deriving (Eq, Ord, Bounded, Enum, Read, Show) data Maybe a = Nothing | Just a deriving (Eq, Ord, Read, Show) data Either a b = Left a | Right b deriving (Eq, Ord, Read, Show) data Ordering = LT | EQ | GT deriving (Eq, Ord, Bounded, Enum, Read, Show) class Bounded a where minBound, maxBound :: a

Eq is pretty canonical

class Eq a where (==), (/=) :: a -> a -> Bool x /= y = not (x == y) x == y = not (x /= y) class (Eq a) => Ord a where compare :: a -> a -> Ordering (<), (<=), (>=), (>) :: a -> a -> Bool max, min :: a -> a -> a compare x y | x == y = EQ | x <= y = LT | otherwise = GT x <= y = compare x y /= GT x < y = compare x y == LT x >= y = compare x y /= LT x > y = compare x y == GT -- Note that (min x y, max x y) = (x,y) or (y,x) max x y | x <= y = y | otherwise = x min x y | x <= y = x | otherwise = y

Read and Show

type ReadS a = String -> [(a,String)] type ShowS = String -> String class Read a where readsPrec :: Int -> ReadS a readList :: ReadS [a] -- ... default decl for readList given in Prelude class Show a where showsPrec :: Int -> a -> ShowS show :: a -> String showList :: [a] -> ShowS showsPrec _ x s = show x ++ s show x = showsPrec 0 x "" -- ... default decl for showList given in Prelude

Enumerator is a classical example of laziness

class Enum a where succ, pred :: a -> a toEnum :: Int -> a fromEnum :: a -> Int enumFrom :: a -> [a] -- [n..] enumFromThen :: a -> a -> [a] -- [n,n'..] enumFromTo :: a -> a -> [a] -- [n..m] enumFromThenTo :: a -> a -> a -> [a] -- [n,n'..m]

Of course we have the Monad:

class Functor f where fmap :: (a -> b) -> f a -> f b class Monad m where (>>=) :: m a -> (a -> m b) -> m b (>>) :: m a -> m b -> m b return :: a -> m a fail :: String -> m a m >> k = m >>= \_ -> k fail s = error s

## 6 Monad

A monad is a way to structure computations in terms of values and sequences of computations using those values.

It is useful to think of a monad as a strategy for combining computations into more complex computations.

In general, use >> if the actions don't return a value, >>= if you'll be immediately passing that value into the next action, and do-notation otherwise.

### 6.1 Three Components of monad

A monad is **a type constructor**, a function called ** return**, and a
combinator function called

**bind**or

`>>=`

. These three elements work
together to encapsulate a strategy for combining computations to
produce more complex computations.
the monad **type constructor** defines a type of computation, the
**return function** creates primitive values of that computation type
and ** >>=** combines computations of that type together to make more
complex computations of that type.

### 6.2 Type constructor

A type constructor is a parameterized type definition used with polymorphic types.

### 6.3 Using class

To define a monad, you basically need the three components. You can do
it from scratch, but a better idea is through the use of class
`Monad`

.

class Monad m where (>>=) :: m a -> (a -> m b) -> m b return :: a -> m a

There's a *do notation* that can be used with a monad. It is a
syntactic sugar that "provides a simple, imperative-style notation for
describing computations with monads"

Apart from these two operations, the `Monad`

class also has two more
operations: `fail`

and `>>`

. They are optional. Use `fail`

if you want
to have different behavior for failure. The `>>`

function is a
convenience operator that "used to bind a monadic computation that
does not require input from the previous computation in the sequence."
It is defined in terms of >>=:

(>>) :: m a -> m b -> m b m >> k = m >>= (\_ -> k)

So the full version should be

class Monad m where (>>=) :: m a -> ( a -> m b) -> m b (>>) :: m a -> m b -> m b return :: a -> m a fail :: String -> m a

### 6.4 the monad laws

All instances of Monad should obey the following equations, called
*Monad Laws*:

return a >>= k = k a m >>= return = m m >>= (\x -> k x >>= h) = (m >>= k) >>= h

- return is a
**left-identity**with respect to >>= - return is a
**right-identity**with respect to >>= - a kind of
**associativity**law for >>=

Any type constructor with return and bind operators that satisfy the three monad laws is a monad.

The compiler, however, does not check these laws.

### 6.5 A different expression of Monad Laws

- create a description of a computation that will produce (a.k.a. "return") a given Haskell value, and
- combine (a.k.a. "bind") a computation description with a reaction
to it
- a pure Haskell function that is set to receive a computation-produced value (when and if that happens) and return another computation description, using or dependent on that value if need be
- creating a description of a combined computation that will feed the original computation's output through the reaction while automatically taking care of the particulars of the computational process itself.

### 6.6 One Way Monad

The IO monad is a familiar example of a one-way monad in Haskell. Because you can't escape from the IO monad, it is impossible to write a function that does a computation in the IO monad but whose result type does not include the IO type constructor. This means that any function whose result type does not contain the IO type constructor is guaranteed not to use the IO monad. Other monads, such as List and Maybe, do allow values out of the monad. So it is possible to write functions which use these monads internally but return non-monadic values.

The wonderful feature of a one-way monad is that it can support side-effects in its monadic operations but prevent them from destroying the functional properties of the non-monadic portions of the program.

## 7 Reference

`[X]`

Haskell wiki: https://wiki.haskell.org`[X]`

Haskell wiki book: https://en.wikibooks.org/wiki/Haskell`[X]`

Write yourself a scheme: https://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours`[X]`

The tutorial: https://www.haskell.org/tutorial/index.html`[X]`

language specification: https://wiki.haskell.org/Language_and_library_specification`[X]`

GHC: https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/`[X]`

Cabal: package management https://www.haskell.org/cabal/

### 7.1 Parsers

`[X]`

Parsec Parser: https://hackage.haskell.org/package/parsec`[X]`

megaparsec: This is a fork of parsec https://hackage.haskell.org/package/megaparsec

`[X]`

Happy parser: https://www.haskell.org/happy/`[X]`

Alex: https://www.haskell.org/alex/`[X]`

Earley: believe it or not, this is a new one. https://hackage.haskell.org/package/Earley- attoparsec: another combinator, but seems not feature rich. https://github.com/bos/attoparsec
- trifecta: this is yet another combinator. But seems to be out of date and buggy? https://github.com/ekmett/trifecta/
- parsers: also combinator. https://hackage.haskell.org/package/parsers