Haskell
Table of Contents
1 Concept
1.1 strictness
In a strict language, the arguments to a function are always evaluated before it is invoked. In a non-strict language, the arguments are not evaluated until their values are actually required. As a result, if during the evaluation of the arguments, run-time error occurs, the strict language will crash, but the non-strict program might finish peacefully.
Scheme is strict, while Haskell is non-strict.
2 Lexical
2.1 Names
There are six kinds of names:
- (value) variables
- (value) constructors
- type variables
- type constructors
- type classes
- module names
There are two constraints on naming:
- lower-case vs upper-case
- variables and type variables must begin with lowercase letter or underscore
- other 4 kinds of names must begin with uppercase letter
- in the same scope, type constructor and type class must not be the same
2.2 Comment
Pretty flexible, but just use
--
for line comment{- XXX -}
for block comment
2.3 layout
Haskell support two style of programming, possibly mixed: using explicit braces and semicolons, or use indention. The rule is pretty straight forward.
The layout rule takes effect whenever the open brace is omitted
after keywords where
, let
, do
, of
. If it is not
omitted, nothing happens, the indention will not matter at all. The
indention of the next lexeme is remembered, and the omitted open
brace is inserted. In the subsequent lines:
- if the indention is larger, nothing inserted, i.e. it is same as writing on the previous line
- if the indention is the same, a semicolon is inserted
- if the indention is smaller, a close brace is inserted, and the layout rule ends
3 Expression
conditional
if e1 then e2 else e3 case e1 of { True -> e2; False -> e3}
List
[e1,e2,...,ek] e1:(e2:(...(ek:[])))
Tuple
(e1,...,ek)
enumerations (note the ..
is an operator)
[e1..] [e1,e2..] [e1..e3] [e1,e2..e3]
list comprehension: Each of qi is a qualifier, can have three
forms. These qualifiers are nested. The first is pattern matching
that matches pat
against a list expression (thus called
generator), the second create local bindings, the third a boolean
guard. Pattern matching creates lambda bound, while let creates
lexical bounds.
[e | q1,...,qn]
pat <- exp let decls exp
Let expression introduces a nested, lexically-scoped mutually-recursive (letrec) list of declarations.
let {d1;...;dn} in e
case expression matches e
against a list of patches. But if a
pattern matched, the body is not a single expression, but seems to be
another round of matching. Each match is a list of guard expression
separated by vertical bar (note the first vertical bar is
mandatory). Each guard expression is a list of actual guards. Only if
all the actual guards succeed, the guard expression succeeds. The
corresponding expression is the result. Seems that the top-level
patterns and the match guards have the same semantic, i.e. if one
fail, try the next one.
case e of {p1 match1; ... ; pn matchn} match ::= | gs1 -> e1 | gs2 -> e2 | ... gs ::= guard1, guard2, ... guard ::= p <- e guard ::= let decls guard ::= boolean guard
do expression seems to be exclusively used in monad. It is a syntax sugar of bind operation.
do x <- f; g x = f >>= (\x -> g x)
Expression type-signature is a notation for human only. The compiler does not need to know it, because it infers the type signature. Of course the compiler will complain if they do not match. Thus, the expression with type-signature simply evaluates to the expression.
The declared type may be more specific than the principal type derived from exp, but it is an error that the declared type is more general. But, it seems that, if we declare a more general type signature before the use of the type (thus before the inference), it is OK.
exp :: [context =>] type
Pattern matching match pattern against values, and construct bounds.
- matching
var
against a valuev
always succeed, and boundsvar
tov
- matching
~apat
against a valuev
always succeed. This is called a irrefutable pattern, means I force it to match, don't give me error. The error will occur when the matched bindings are evaluated. - wildcard pattern
_
against any value always succeed - matching
con pat
wherecon
is a type constructor (defined bynewtype
), the value must also becon v
to succeed - matching
con pat1 ... patn
wherecon
is a data constructor (defined bydata
), the value must also becon v1 ... vn
with the same constructor to succeed. The variables are bound. [ ]
fields- matching numeric, character, or string literal will only succeed if the value is that literal.
- match
var@apat
will basically matchapat
againstv
, but introduce additional boundvar
for the wholev
.
Pattern matching is a way of destructuring an algebraic data type, selecting a code clause based on its constructor and then binding the components to variables.
4 Declaration
Actually there are two categories of declarations: type and binding. So first we need to figure out what is type in Haskell.
There are two kinds of types, namely *
and *->*
. In Haskell, a
type variable is assumed to be universally qualified, i.e. \(a
\rightarrow a\) means \(\forall a . a \rightarrow a\).
Type context indicates the membership of a type variable to some type
classes. The context is often written as cx => t
.
The type that best describe an expression is its principal type.
4.1 Data Types (type
, newtype
, data
)
This is called polymorphic types: (forall a)[a]
denotes the family
of types, for each type a, the type "list of a". Identifiers such as a
above are called type variables, and are uncapitalized to
distinguish them from specific types such as Int
. This is called
parametric polymorphism, useful to define families of types by
universally quantifying them.
data
declares an algebraic datatype. The following declares a new
type constructor T, with zero or more data constructors (or just
constructor) K1 to Kn.
data cx => T u1 ... uk = K1 t ... t | ... | Kn t ... t
type
introduces a new type that is equivalent to the old one. This
is called Type Synonym.
type T u1 ... uk = t
newtype
introduces a new type whose representation is the same as
old one. This seems very similar to type synonyms. But this is called
renaming. It is introduced by the form
newtype cx => T u1 ... uk = N t
The difference:
newtype
creates a distinct type that must be explicitly coerced to or from the original typenewtype
can be used to define recursive types.- New instance can be defined for a
newtype
, but may not be defined for a type synonym.
4.2 Type Classes (class
, instance
, default
)
This is ad hoc polymorphism. Compared to parametric polymorphism, it quantifies over some smaller set of types, e.g. those that can be compared for equality. It seems that type classes are defining some constrained set of types for some type, e.g. a subset of Integers. It defines not a type, but rather expresses a constraint on a type. The constraint is called a context. The context is placed at the front of type expressions.
The class
declares a new type class (or just class). It will
together define a set of methods that all instance of this class must
support. You have two ways to define the instance relationship:
through an explicit instance
declaration, or use deriving
to embed
it into the class
declaration.
class cx => C u where op :: cx => t -- cdecl 1 infixr 7 'op' -- cdecl 2 op = ... -- cdecl 3
This introduces a class named C, with super class from cx
. As shown
in the example, there are three kinds of decls. First is a method
declaration. This only declares the type signature of the method.
Second is a fixity declaration. It has the following grammar, with integer be 0 to 9, where 9 is the highest precedence. I'm not sure why op needs to be quoted (or is it a quote?) though.
The third declaration is a default class method for any of the method declared.
Finally, a instance
introduce an instance T of a class C. T is a
type constructor, and cannot be a type synonym. This is called a C-T
instance declaration.
instance cx => C (T u1 ... uk) where {d}
Since instance
only declares the relationship, the type T still
needs to be defined. It is also possible to declare the instance
relationship together with the declaration of type. Specifically
newtype
and data
can have an optional deriving
sub-form for
that. Omitting it is equivalent to writing an empty deriving instance
deriving ()
.
data ... deriving C newtype ... deriving C
4.3 nested declarations
This type of declaration denotes those that can be nested inside let
or where
.
First is type signature declaration
var1, ..., varn :: cx => t
Then the fixity declaration
(infixl | infixr | infix) [integer] ops
Function binding is a list of bindings. They are used to match
different patterns of parameters. Each match is very similar to the
match in case
expression, but instead use =
instead of ->
.
foo p1 ... pn match foo p1 ... pn match ... match ::= | gs1 = e1 | gs2 = e2 | ... gs ::= guard1, guard2, ... guard ::= p <- e guard ::= let decls guard ::= boolean guard
Basically, it is semantically equivalent to this case statement:
x = \x1, ..., xk -> case (x1, ..., xk) of p1 ... pn match p1 ... pn match
Finally, we have pattern binding decl. The form can be:
-- simple form p = e -- general form p | gs1 = e1 | gs2 = e2 ...
This is semantically equivalent to
p = let decls in case () of () | gs1 -> e1 () | gs2 -> e2 ...
5 Predefined classes
Here just literally copy the definition of those types.
Basic types
-- bool data Bool = False | True deriving (Read, Show, Eq, Ord, Enum, Bounded) -- string type String = [Char] -- list data [a] = [] | a : [a] deriving (Eq, Ord) -- unit data () = () deriving (Eq, Ord, Bounded, Enum, Read, Show) data Maybe a = Nothing | Just a deriving (Eq, Ord, Read, Show) data Either a b = Left a | Right b deriving (Eq, Ord, Read, Show) data Ordering = LT | EQ | GT deriving (Eq, Ord, Bounded, Enum, Read, Show) class Bounded a where minBound, maxBound :: a
Eq is pretty canonical
class Eq a where (==), (/=) :: a -> a -> Bool x /= y = not (x == y) x == y = not (x /= y) class (Eq a) => Ord a where compare :: a -> a -> Ordering (<), (<=), (>=), (>) :: a -> a -> Bool max, min :: a -> a -> a compare x y | x == y = EQ | x <= y = LT | otherwise = GT x <= y = compare x y /= GT x < y = compare x y == LT x >= y = compare x y /= LT x > y = compare x y == GT -- Note that (min x y, max x y) = (x,y) or (y,x) max x y | x <= y = y | otherwise = x min x y | x <= y = x | otherwise = y
Read and Show
type ReadS a = String -> [(a,String)] type ShowS = String -> String class Read a where readsPrec :: Int -> ReadS a readList :: ReadS [a] -- ... default decl for readList given in Prelude class Show a where showsPrec :: Int -> a -> ShowS show :: a -> String showList :: [a] -> ShowS showsPrec _ x s = show x ++ s show x = showsPrec 0 x "" -- ... default decl for showList given in Prelude
Enumerator is a classical example of laziness
class Enum a where succ, pred :: a -> a toEnum :: Int -> a fromEnum :: a -> Int enumFrom :: a -> [a] -- [n..] enumFromThen :: a -> a -> [a] -- [n,n'..] enumFromTo :: a -> a -> [a] -- [n..m] enumFromThenTo :: a -> a -> a -> [a] -- [n,n'..m]
Of course we have the Monad:
class Functor f where fmap :: (a -> b) -> f a -> f b class Monad m where (>>=) :: m a -> (a -> m b) -> m b (>>) :: m a -> m b -> m b return :: a -> m a fail :: String -> m a m >> k = m >>= \_ -> k fail s = error s
6 Monad
A monad is a way to structure computations in terms of values and sequences of computations using those values.
It is useful to think of a monad as a strategy for combining computations into more complex computations.
In general, use >> if the actions don't return a value, >>= if you'll be immediately passing that value into the next action, and do-notation otherwise.
6.1 Three Components of monad
A monad is a type constructor, a function called return
, and a
combinator function called bind or >>=
. These three elements work
together to encapsulate a strategy for combining computations to
produce more complex computations.
the monad type constructor defines a type of computation, the
return function creates primitive values of that computation type
and >>=
combines computations of that type together to make more
complex computations of that type.
6.2 Type constructor
A type constructor is a parameterized type definition used with polymorphic types.
6.3 Using class
To define a monad, you basically need the three components. You can do
it from scratch, but a better idea is through the use of class
Monad
.
class Monad m where (>>=) :: m a -> (a -> m b) -> m b return :: a -> m a
There's a do notation that can be used with a monad. It is a syntactic sugar that "provides a simple, imperative-style notation for describing computations with monads"
Apart from these two operations, the Monad
class also has two more
operations: fail
and >>
. They are optional. Use fail
if you want
to have different behavior for failure. The >>
function is a
convenience operator that "used to bind a monadic computation that
does not require input from the previous computation in the sequence."
It is defined in terms of >>=:
(>>) :: m a -> m b -> m b m >> k = m >>= (\_ -> k)
So the full version should be
class Monad m where (>>=) :: m a -> ( a -> m b) -> m b (>>) :: m a -> m b -> m b return :: a -> m a fail :: String -> m a
6.4 the monad laws
All instances of Monad should obey the following equations, called Monad Laws:
return a >>= k = k a m >>= return = m m >>= (\x -> k x >>= h) = (m >>= k) >>= h
- return is a left-identity with respect to >>=
- return is a right-identity with respect to >>=
- a kind of associativity law for >>=
Any type constructor with return and bind operators that satisfy the three monad laws is a monad.
The compiler, however, does not check these laws.
6.5 A different expression of Monad Laws
- create a description of a computation that will produce (a.k.a. "return") a given Haskell value, and
- combine (a.k.a. "bind") a computation description with a reaction
to it
- a pure Haskell function that is set to receive a computation-produced value (when and if that happens) and return another computation description, using or dependent on that value if need be
- creating a description of a combined computation that will feed the original computation's output through the reaction while automatically taking care of the particulars of the computational process itself.
6.6 One Way Monad
The IO monad is a familiar example of a one-way monad in Haskell. Because you can't escape from the IO monad, it is impossible to write a function that does a computation in the IO monad but whose result type does not include the IO type constructor. This means that any function whose result type does not contain the IO type constructor is guaranteed not to use the IO monad. Other monads, such as List and Maybe, do allow values out of the monad. So it is possible to write functions which use these monads internally but return non-monadic values.
The wonderful feature of a one-way monad is that it can support side-effects in its monadic operations but prevent them from destroying the functional properties of the non-monadic portions of the program.
7 Reference
[X]
Haskell wiki: https://wiki.haskell.org[X]
Haskell wiki book: https://en.wikibooks.org/wiki/Haskell[X]
Write yourself a scheme: https://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours[X]
The tutorial: https://www.haskell.org/tutorial/index.html[X]
language specification: https://wiki.haskell.org/Language_and_library_specification[X]
GHC: https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/[X]
Cabal: package management https://www.haskell.org/cabal/
7.1 Parsers
[X]
Parsec Parser: https://hackage.haskell.org/package/parsec[X]
megaparsec: This is a fork of parsec https://hackage.haskell.org/package/megaparsec
[X]
Happy parser: https://www.haskell.org/happy/[X]
Alex: https://www.haskell.org/alex/[X]
Earley: believe it or not, this is a new one. https://hackage.haskell.org/package/Earley- attoparsec: another combinator, but seems not feature rich. https://github.com/bos/attoparsec
- trifecta: this is yet another combinator. But seems to be out of date and buggy? https://github.com/ekmett/trifecta/
- parsers: also combinator. https://hackage.haskell.org/package/parsers