Functional programming languages, such as Haskell and ML, make a radical departure from imperative languages such as C/C++, Pascal, and to a lesser degree, Lisp (Lisp was the first functional language, but it is not purely functional). The general idea in an imperative language is that the programmer writes code in a manner that describes how to solve a problem. As a result, programs end up little resembling the original problem definition (e.g., most problem specifications do not use arrays, pointers, or counters in their descriptions, but that is often what the program solutions end up using). In a functional language, the program (it is hoped) closely resembles the problem statement.
Functional programming languages were designed such that programs closely resemble mathematical notation. What would appear to an imperative programmer to be merely a function definition is actually, in a functional language, a complete function declaration. The functional program more describes what needs to be solved than how to solve it.
As an example, let us consider the recursive-multiplicative method of calculating xn. In mathematical notation it looks like this:
x0 = 1 xn+1 = x * xnThe Haskell program to perform the calculation is:
x^0 = 1 x^(n+1) = x*(x^n)which, aside from some minor formatting differences, is identical to the mathematical notation. This is not nearly what a similar C++ program would look like.
The interpreted environment which is used by most of the implementations of functional programming allows for rapid program development. The code/compile/[debug]/test/ debug loop is reduced to a code/test/debug loop. Furthermore, there is no need to write a function that, for example, accepts a string of characters (or something worse such as a list of strings) at the keyboard and either parses it into a syntactically usable structure or informs the user that the input is not valid. The interpreted environment already contains such a parser. More time is spent developing a problem solution, rather than writing small pieces of accidental* code.
*Author's footnote:
Frederick Brooks, in his article "No Silver Bullet" defines accidents
as "those difficulties that today attend [the development of software]
but are not inherent [in it]." [Broo87, p. 11] The author's (my)
intent is to point out that the development of a user interface is not
inherent in the solution of a problem, but nonetheless it is an essential
task in many languages.
A drawback of interpretation as opposed to compilation is that, as one would expect, execution speed is often slow. Coupled with the relative immaturity of the implementations, that slowness is made even worse. Progress is being made, however, and improvements are made frequently. Compilers do exist for some of the languages, both to create executables and to convert the source language into compilable C/C++ code.
(Owing to character set limitations, and also in following Haskell's lead, in this section the backward slash "\" shall be used in place of the Greek letter lambda.)
A convergent function is one which yields a value. A divergent function is one which does not converge. An infinite recursion is a simple form of divergence. The canonical divergent function in the lambda-calculus is:
(\x.xx)(\x.xx)The reduction steps are:
(\x.xx)(\x.xx) => [(\x.xx)/x](\x.xx) => (\x.xx)(\x.xx) => [(\x.xx)/x](\x.xx) => ...(Where the forward slash operator "/" indicates which variable in the preceding expression becomes replaced.) The reduction diverges, or never terminates. A divergent expression is said to have the value bottom, written as an upside-down T. In Haskell, bottom can easily be represented as the following divergent function:
bottom = bottomand similarly in Scheme as:
(define (bottom) (bottom))In strict (non-lazy) programming environments, passing a divergent function as an argument to another function causes divergence, since the function evaluates all its parameters before using them (known as applicative order evaluation). A function exploiting lazy (or delayed) evaluation does not evaluate its parameters before using them (this is known as normal order evaluation). The advantage of this is that any divergent expression can be passed as an argument to a function, and the function can still converge as long as that parameter is not used. Consider the function:
test x y z = if x then y else zWhen evaluated as follows:
test True 1 bottomthe function returns 1. In this function call, bottom never gets evaluated. However, when test is evaluated as:
test False 1 bottombottom is evaluated and the function diverges.
Given this statement, it must follow that the repeated execution of an expression given the same arguments each time must yield the same result each time. In order for this to work properly, the function must be free from side effects. A language is pure when it enforces referential transparency. Referentially transparent functions and programs can be written in any language, but most languages allow, encourage, or even require the use of side effects. (C++ requires the use of side effects. For example, class member variables are accessed only through side effects.)
In a purely functional, or pure, programming language, the number
5 and the function f(x)=x-x+5 are equivalent and
exchangeable in every instance, since there is no subexpression in
f(x) which has a side effect. Moreover, this is true of every
function in a pure language. This has interesting connotations where
parallelism is concerned, as will be discussed in section 3.4.
Furthermode, having a lack of state does not mean that imperative-like
features cannot be mimicked. In [Wadl92], the idea of monads is
introduced, and how they can be used to sequence operations, effectively
making evaluation strict. A interpreter is described which provides
operations on monads. Subsequent sections in the paper describe
variations of the interpreter to include error messages, state, output,
nondeterminism (values become lists of possible values), and backward
state (where the final state is supplied and the initial state is
returned). A significant portion of the paper is devoted to the
discussion of continuations, and how they compare to monads. Ultimately,
continuations and monads are computationally equivalent. [Wadl92]
A partial application of arguments to a function results in a curried
function. [Paul91, p.148] For example, the Haskell expression 2*
is incomplete, but it is legal and it is treated like any other first-class
data object. It can be evaluated completely by supplying an argument:
(2*)(3), which yields 6. It can also be assigned to an
identifier: double = 2*. Evaluating double 3 then yields
6.
Owing to the syntax of specifying a curried function, all supplied
arguments precede all absent arguments (i.e., supplied arguments are bound
in a left-to-right order). In some instances it is useful to define
functions which are similar to other functions, but change the order of
their parameters. In Scheme, for instance, the assoc function
takes two arguments: the index of the pair being sought, and the list to
be searched. Creating a parameter-transposed version of assoc
would allow the function to be curried with the list to be searched
as the first argument. (Function currying is not a standard feature
of Scheme, but the reader is referred to a separate paper by the author,
"Function Currying in Scheme.") This creates a lookup function, where
only the item to be sought need be specified.
Any function in Haskell can be curried, including any built-in function
or operator and any user-defined function or operator.
Say, for example, that the expression foo(bar(x), baz(y)) is
to be evaluated. Both bar(x) and baz(y) can be evaluated
simultaneously, since it is guaranteed that the functions have no side
effects. The evaluation of one function cannot affect the evaluation
of the other. Additionally, there is no need to wait until
foo(bar(x), baz(y)) is called to evaluate bar(x) and
baz(y). Those functions could be evaluated as soon as the values
of x and y are known, and then foo does not need to
wait for the evaluations to take place.
Processors based on the von Neumann architecture have properties
which encourage imperative programming. Their machine languages are
rich in jump statements and memory and register mutation operations.
Better architectures have been proposed, such as those exhibiting
some form of inherent parallelism or based on graph reduction (since
purely functional programs semantically resemble expressions in the
lambda-calculus, and lambda-expressions are evaluated by performing
beta-reductions, which are graph operations [Reve88, pp.25-8]).
Parallel graph reduction computers are an area of much current research.
ftp://haskell.systemsz.cs.yale.edu/pub/haskell/hugs/pchugs.zip
for the development of Haskell code. HUGS is a superset of Gofer
(which is a subset of Haskell), and has a greater compatibility with
Haskell than does Gofer.
The follwing features of Haskell were described in [Davi92]:
ftp://ftp.inria.fr:/lang/caml-light/cl7pcbin.zip
A CAML Light tutorial and reference manual are also available from
the same directory. The syntax of ML and CAML are different enough to
merit obtaining the tutorial and reference manual. Although the CAML
implementation was obtained, conventional ML is reviewed here owing to
the availability of a well-written ML text (see [Paul91]).
ML is a strongly-typed non-pure functional language. "ML was designed
for theorem proving. This is not a broad field, and ML was intended for
the programming of one particular theorem prover--a specific purpose
indeed! This theorem prover, called Edinburgh LCF (Logic for Computable
Functions) spawned a host of successors, all of which were coded in ML."
[Paul91, pg.9] The following features of ML are described in [Paul91]:
The use of a functional language seems to be a natural way to
implement algorithms for time-dependent planning, albeit with a caveat,
to be discussed below. Time-dependent planning is "planning in which
the time available to respond to predicted events varies, and the
decision-making required to formulate effective responses is complex."
[Dean88, p.49]
The main point is that having a non-optimal answer on time is better
than having an optimal answer too late. The primary example given in
the paper is that of a robot needing to recognize and react to objects
on a conveyor belt, where the time available to the robot for recognition
and reaction is both limited and varied. The longer the robot has to
recognize an object, the more likely the robot is to recognize the object
correctly. The robot should be able to exploit any extra time it has
for recognition. Conversely, if time for recognition becomes severely
limited, the robot should still be able to put forth its best guess so far.
Neglecting the inefficiency of current functional programming
implementations, there is a great benefit to using this approach: lazy
evaluation. Those functions and variables not needed in the computation
of a given value are not evaluated. In any application, of course,
this is a good idea--time is not wasted computing values that are only
to be discarded*.
*Author's footnote:
An implementation of the time-dependent planning algorithm would have
the robot searching for a problem solution until interrupted by an
external agent, and this is the caveat with regard to functional
programming (or at least with Haskell in particular): there is no way
for a program to respond to or even recognize external interrupts
(which would need to be done on some level if the program is to be time
sensitive). It would also seem that processing an interrupt would be
a characteristically non-functional action. In a functional program it
does not matter in what order things occur. Processing an interrupt is
equivalent to saying process this now!, hence establishing an
order, where the interrupt is first and everything else is second.
Several small functions were written to exemplify time-dependent
planning (see appendix A for listings). Instead of using an interrupts,
each of the functions requires repeated calls. A state tuple is passed
into the function, and a new state tuple is returned from the function.
The state tuple consists of information encapsulating the state of the
computation along with a confidence value from 0 to 1.
A search occurs on a search tuple structure, defined as:
The following are several examples of searches:
The first example searches for the number 3 in the list [1, 2, 3, 4].
It returns the number 3 with a confidence of 1.0 after 3 iterations:
This example demonstrates an immediate failure of the search, since
the search list is empty (the 'has-been-searched' list contains
[1, 2, 3]):
This example simulates a search in mid-computation, where [7, 8, 9]
has been searched, and [4, 5, 6] is still to be searched. The 5 is
found after 2 iterations:
This example is a search for the number 17 in a list from 1 to 30.
The search fails after 10 iterations owing to a lack of allotted time:
The bisection method of root finding is used in this function. The
function uses a root tuple, defined as:
In this function, the execution duration is passed as a separate
parameter to the function, and it is not a value in the search tuple.
Given the function:
The final result is the value 1.34375 with a confidence of 97.7%.
The path planning function works iteratively, similar to the previous
examples. Each iteration the function returns a path between two
specified points in a DAG (directed acyclic graph). Repeatedly calling
the function with the previous result will yield paths with greater
confidence (see appended code for graph and graph representation).
The confidence in this example is rather arbitrary:
where num-paths is the number of paths found thus far. The first path
yields a confidence of 0.5, the second yields 0.66, the third 0.75, and
so on. If all paths are exhausted, a confidence of 1.0 is returned with
the shortest path.
It wasn't tested in this example, but delayed evaluation could be
used to simplify the path search. A separate function could be made to
return a list of paths, and only the first n paths could be checked, as
time permits, and the shortest one used. The n+ith paths are
never evaluated. This approach is functionally equivalent to the
repeated iteration approach used here.
The author firmly believes that functional programming should be
taught at the undergraduate level, and its practices should be strongly
enforced, or at least strongly encouraged, regardless of the language
being used. Much as the "goto" statement has fallen out of popularity,
so should the notion of modifiable state (of course, as long as von
Neumann architectures are still in use, imperative assembly languages will
be used, and there will be a need for mid-level languages for OS development
and the like). Overall, though, imperative practices such as the use of
modifiable state should be the exception rather than the norm.
2.3 Curried Functions
Named after Haskell B. Curry, who is responsible for the popularization
but not the conception of function currying. A footnote in [Paul91,
pg.148] states that the concept of currying "has been credited to
Schoenfinkel, but Schoenfinkeling has never caught on."
2.4 Parallelism
One of the great benefits of referential transparency is that function
execution order becomes arbitrary. If two or more functions are to be
executed (and all their arguments are known), it does not matter which
one is executed first. This naturally leads to the possiblity of
executing them in parallel.
2.5 Processor Architecture
"The conventional (von Neumann) model of computation, which forms the
basis for nearly all computers and programming languages in use today,
is not ideally suited to expressing, or supporting, declarative
programming concepts." [Eise87, p.111]
3. Summaries of Functional Programming Implementations
3.1 Haskell
An implementation of HUGS (Haskell User's Gofer System) was obtained by
anonymous ftp from:
let sum [] = 0
sum (a:b) = a + sum b
in sum [1, 2, 3, 4]
where sum [1, 2, 3, 4] evaluates to 1 + sum [2, 3, 4], and the
recursion continues until sum [] is evaluated, when 0 is returned.
Here, sum [] and sum (a:b) are treated like different functions--which
one is called depends on the argument supplied.
3.2 ML
An implementation of CAML Light was obtained by anonymous ftp from:
Non-pure aspects of ML:
4. Time-Dependent Planning
This section is based on the paper "An Analysis of Time-Dependent
Planning" by Thomas Dean and Mark Boddy [Dean88].
The inefficiency suffered by the interpreters, though, cannot be
ignored. At what expense is lazy evaluation achieved? As far as processor
utilization is concerned, the extra computation needed to achieve lazy
evaluation may, at present, outweigh its benefits. (Strangely, it seems
that this problem is similar to the time-dependent planning problem itself:
to evaluate everything takes too much time, but to decide what not to
evaluate also takes too much time. Is there something in between?)
4.1 Search
type TSrchTup a = (Float, [a], [a], [a], a->Bool, Int, Int)
where the elements of the structure are:
The search function takes a search tuple as a parameter. The function
checks the first element in the unsought portion of the list with the
predicate function. If the predicate returns true, the search function
returns a search tuple with the found value in the return value position,
and a 1.0 in the confidence position. If the predicate fails, then the
function recurses on the tail of the list. Each iteration, the time
delta is subtracted from the allowed execution duration. If the
duration becomes 0 (or negative), the function returns a search tuple
with the current element being checked in the return value position, the
sought portion of the list in the sought portion position, the remainder
of the list in the unsought portion position, and a 0.0 in the confidence
position. This returned tuple can be passed directly back to the search
function to continue the search.
search (0.0, [0], [1, 2, 3, 4], [], (==)3, 1, 10)
=> (1.0, [3], [3, 4], [1, 2], (==)3, 1, 7)
search (0.0, [0], [], [1, 2, 3], (==)4, 1, 10)
=> (0.0, [0], [], [1, 2, 3], (==)4, 1 ,9)
? search (0.0, [0], [4, 5, 6], [7, 8, 9], (==)5, 1, 10)
(1.0, [5], [5, 6], [7, 8, 9, 4], (==)5, 1, 8)
search (0.0, [0], [1..30], [], (==)17, 1, 10)
=> (0.0, [10], [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23,
24, 25, 26, 27, 28, 29, 30],[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
(==)17, 1, 0)
4.2 Root finding
type TRootTup = (Float, Float, Float, Float, Float, Float->Float)
where the elements of the structure are:
This function operates very similarly to the search function. On each
iteration the function finds a better solution to the equation. The
confidence in this function is not a binary yes-or-no value as it was
for the search function, but is computed by the formula:
confidence pn po = 1.0 - (absv (pn - po)) / (absv pn)
where pn is the current guess of the root, and po is the
previous guess of the root. The closer pn and po are, the
higher the confidence becomes.
f x = x^3 + 4.0*x^2 - 10.0
and the initial root tuple
(0.0, 0.0, 1.0, 2.0, 0.0, f)
the following are several steps in the solution of the function (each
iteration is shown):
findRoot (0.0, 0.0, 1.0, 2.0, 0.0, f) 1
=> (0.8, 1.25, 1.0, 1.5, 1.5, f)
findRoot (0.8, 1.25, 1.0, 1.5, 1.5, f) 1
=> (0.909091, 1.375, 1.25, 1.5, 1.25, f)
findRoot (0.909091, 1.375, 1.25, 1.5, 1.25, f) 1
=> (0.952381, 1.3125, 1.25, 1.375, 1.375, f)
findRoot (0.952381, 1.3125, 1.25, 1.375, 1.375, f) 1
=>(0.976744, 1.34375, 1.3125, 1.375, 1.3125, f)
4.3 Path planning
conf = 1.0 - (1.0 / (num-paths+1))
5. Conclusion
The author has found that learning about functional programming has been
a most rewarding experience. Functional languages have many advantages
over imperative languages, such as delayed evaluation, currying, and
referential transparency. There are also many open issues in this
field still, such as how best to implement functional compilers and
interpreters, how to add parallelism efficiently, and how to perform
I/O (a difficult proposition given the purity of some languages). Monads
and continuations have been used to implement certain imperative actions
in functional languages, but this, too, is an area of much research.
6. Glossary
This glossary contains some entries not used in the text of the paper.
It is intended to be a repository of usable terms rather than a reference
list of used terms.
7. References
8. Appendix, program listings
8.1 Search
-- file tdsearch.gs
--------------------------------
-- search
-- a search tuple represents the state of a search
-- its structure is:
-- return value confidence (0<=x<=1)
-- return value
-- unsought portion of search list
-- sought portion of list
-- predicate function
-- time delta per iteration
-- allowed execution duration
type TSrchTup a = (Float, [a], [a], [a], a->Bool, Int, Int)
-- execute a search on a search tuple
-- the search will always return a value in the return value position
-- in the tuple, but the confidence is either 0.0 (it is not the
-- correct value) or 1.0 (it is the correct value); if confidence
-- is 0.0 then either the search has been exhausted (the duration
-- value will be nonzero in this case) or the duration has expired
-- (it will be zero)
search :: TSrchTup Int -> TSrchTup Int
search (_, [val], [], ss, pred, del, dur)
= (0.0, [val], [], ss, pred, del, dur-del)
search (_, [val], u:us, s, pred, del, dur)
-- run out of time?
| dur < del = (0.0, [val], u:us, s, pred, del, 0)
-- found a match?
| pred u = (1.0, [u], u:us, s, pred, del, dur-del)
-- no match, duration not expired
| otherwise = search (0.0, [u], us, s++[u], pred, del, dur-del)
-- allot more time to a search tuple
-- this replaces the duration value in the tuple with a new value
restart :: TSrchTup a -> Int -> TSrchTup a
restart (con, val, uns, sou, pre, del, _) newDur
= (con, val, uns, sou, pre, del, newDur)
8.2 Root finding
-- file tdbisect.gs
--------------------------------
-- root finding using bisection
-- a tolerance value is not used because the computation is
-- time-limited and will terminate after a given (simulated) duration
-- a root tuple represents the state of a root search
-- its structure is:
-- confidence of return value (0<=x<=1)
-- return value
-- a (begin point)
-- b (end point)
-- p_n-1 (previous value of p)
-- f(x) (equation to solve)
type TRootTup =
(Float, Float, Float, Float, Float, Float->Float)
deltaT :: Int
deltaT = 1
-- confidence function
-- pn: new value
-- po: old value
-- returns 0.0 <= x <= 1.0
confidence :: Float -> Float -> Float
confidence pn po
= 1.0 - (absv (pn - po)) / (absv pn)
absv :: Float -> Float
absv x
| x < 0.0 = -x
| otherwise = x
-- find the root of a function using the bisection method
-- parameters:
-- (root tuple)
-- dur: allowed execution duration
-- returns new root tuple
findRoot :: TRootTup -> Int -> TRootTup
findRoot (conf, val, a, b, p_prev, f) dur
| dur < deltaT = (conf', p, a, b, p_prev, f)
| (f p) == 0.0 = (1.0, p, a, b, p_prev, f)
| (f a)*(f p)>0.0 = findRoot (conf', p, p, b, p, f) (dur - deltaT)
| otherwise = findRoot (conf', p, a, p, p, f) (dur - deltaT)
where mid = (b-a)/2.0
p = a + mid
conf' = confidence p p_prev
-- example root tuple
rt = (0.0, 0.0, 1.0, 2.0, 0.0, f)
-- example function of which to find root
f :: Float -> Float
f x = x^3 + 4.0*x^2 - 10.0
8.3 Path planning
-- file tdpath.gs
--------------------------------
-- path planning
-- this algorithm determines the shortest
-- path through a weighted DAG
-- the graph looks like:
-- 1 1
-- H----->I----->J
-- ^ ^ ^
-- |2 |1 |3
-- | 1 | | 1
-- D----->E F----->G
-- ^ ^
-- |1 |1
-- | 1 1 |
-- A----->B----->C
-- the three paths from A to J are:
-- [A, B, C, F, J] length 6
-- [A, D, E, I, J] length 4
-- [A, D, H, I, J] length 5
data Node = A | B | C | D | E | F | G | H | I | J
deriving Eq
type Arc = ((Node, Node), Int)
type Path = [Node]
type Graph = [Arc]
graph1 :: Graph
graph1 = [ ((A, B), 1),
((A, D), 1),
((B, C), 1),
((C, F), 1),
((D, E), 1),
((D, H), 2),
((E, I), 1),
((F, G), 1),
((F, J), 3),
((H, I), 1),
((I, J), 1) ]
-- an association search can either return a value
-- or a not-found indication
data AssocVal a = Found a | NotFound
-- get association of item in a list
assoc :: Eq a => [(a,b)] -> a -> AssocVal b
assoc [] _ = NotFound
assoc ((index,val):rest) item
| item == index = Found val
| otherwise = assoc rest item
-- a new data type must be declared for the pathLength function
-- to handle the case where the specified path does not exist
data Length = Length Int | NoPath
instance Eq Length where
Length x == Length y = x == y
_ == NoPath = False
NoPath == _ = False
instance Ord Length where
Length x < Length y = x < y
_ < NoPath = True
NoPath < _ = False
-- allow addition of two path lengths
add :: Length -> Length -> Length
add (Length x) (Length y) = Length (x+y)
add _ NoPath = NoPath
add NoPath _ = NoPath
-- compute length of a path in a graph, where a path
-- is a list of adjacent nodes
pathLength :: Graph -> Path -> Length
pathLength _ [] = Length 0
pathLength _ [x] = Length 0
pathLength graph (n0:n1:ns)
-- same node specified twice
| n0 == n1 = pathLength graph (n1:ns)
| val == NotFound = NoPath
| val == Found x = add (Length x) (pathLength graph (n1:ns))
where val = assoc graph (n0, n1)
-- same node specified twice
-- | n0 == n1 = pathLength graph (n1:ns)
-- | otherwise = case val of
-- NotFound -> NoPath
-- Found x -> add (Length x) (pathLength graph (n1:ns))
-- where val = assoc graph (n0, n1)
-- get the nodes adjacent to a particular node
adjacent :: Graph -> Node -> [Node]
adjacent [] _ = []
adjacent (((node0, node1), _):rest) origin
| node0 == origin = node1 : adjacent rest origin
| otherwise = adjacent rest origin
-- depth-first path generation
listPaths :: Graph -> [Path] -> Node -> [Path]
listPaths graph [] dest = []
listPaths graph (path:ps) dest
| last path == dest = path : listPaths graph ps dest
| null adj = listPaths graph ps dest
| otherwise = listPaths graph (map (\x->path++[x]) adj) dest
++ listPaths graph ps dest
where adj = adjacent graph (last path)
-- PathTuple:
-- confidence
-- length of shortest path so far
-- shortest path so far
-- graph
-- list of paths
-- number of paths checked
type PathTuple = (Float, Length, Path, Graph, [Path], Int)
pt :: PathTuple
pt = (0.0, NoPath, [], graph1, listPaths graph1 [[A]] J, 0)
-- find the shortest path between two nodes in a graph
shortestPath :: PathTuple -> Int -> PathTuple
shortestPath (conf, len, path, graph, [], numPaths) dur
= (1.0, len, path, graph, [], numPaths)
shortestPath (conf, len, path, graph, p:ps, numPaths) dur
| dur <= 0 = (conf', len, path, graph, ps, numPaths)
| len' < len = shortestPath (conf, len', p, graph, ps, 1+numPaths) (dur-1)
| otherwise = shortestPath (conf, len, path, graph, ps, 1+numPaths) (dur-1)
where len' = pathLength graph p
conf' = 1.0 - (1.0/(fromInt (numPaths+1)))