Nearly all my development has been done under linux. Only occasionally have I worked under osx. This is all to change – osx is to be my primary development platform. In the past, my experiences with ghc on osx have been a little fraught. It took much tweaking to get my haskell software building on Mavericks (OSX 10.9). Problems I had included:
etc, etc.
I’m pleased to discover that things have improved immensely. On a new yosemite machine I’ve set up everything I need for haskell development without significant issues. A combination of 3 things works together:
What follows is an overview of the steps I took to get up and running in haskell on osx 10.10.
Everything (including ghc) seems to depend on these.
xcode-select --install
This is quick and easy, following the instructions on the brew homepage.
"ghcformacosx" is a "drag and drop" installation of ghc 7.8.4 and cabal 1.22.0.0. It installs as a regular osx application, but gives you access to the ghc and cabal command line tools. A nice feature is that if you run the application, it tells you what you need to do to set your environment up correctly, and shows a dashboard indicating whether you have done so:
Once this is done you need to bring the local package database up to date:
cabal update
One of my libraries has pcre-light as a transitive dependency, which needs the corresponding C library. Also, cairo is the fastest rendering backend for my haskell charting library, and gtk is necessary if you want to show charts in windows. Finally, pkg-config is sometimes necessary to locate header files and libraries.
brew install pkg-config
brew install pcre
# gtk and cairo need xquartz
brew tap Caskroom/cask
brew install Caskroom/cask/xquartz
# later steps in the build processes need to find libraries
# like xcb-shm via package config. Tell pkg-config
# where they are.
export PKG_CONFIG_PATH=/opt/X11/lib/pkgconfig
brew install cairo
brew install gtk
A nice feature of brew is that whilst it installs libraries and headers to versioned directories in /usr/local/Cellar, it symlinks these back into the expected locations in /usr/local. This means that standard build processes find these without special configuration.
I use pandoc and ghc-mod a lot, and still need darcs sometimes. Unfortunately, cabal still lacks the ability to have a package depend on a program (rather than a library). Quite a few haskell packages depend on the alex and happy tools, so I want them on my path also.
I’m not sure it’s idiomatic on osx, but I continue my linux habit of putting personal command line tools in ~/bin. I like to build all of these tools in a single cabal sandbox, and then link them into ~/bin. Hence, assuming ~/bin is on my path:
cd ~/bin
mkdir hackage
(cd hackage && cabal sandbox init)
(cd hackage && cabal install alex happy)
ln -s hackage/.cabal-sandbox/bin/alex
ln -s hackage/.cabal-sandbox/bin/happy
(cd hackage && cabal install pandoc darcs ghc-mod)
ln -s hackage/.cabal-sandbox/bin/pandoc
ln -s hackage/.cabal-sandbox/bin/darcs
ln -s hackage/.cabal-sandbox/bin/ghc-mod
(In the sequence above I had to make sure that alex and happy were linked onto the PATH before building ghc-mod)
The hard work is already done by brew. We can build gtk2hs following the standard build instructions:
export PKG_CONFIG_PATH=/opt/X11/lib/pkgconfig
export PATH=.cabal-sandbox/bin:$PATH
mkdir gtk2hs
cd gtk2hs
cabal sandbox init
cabal install gtk2hs-buildtools
cabal install gtk
Note how we need to ensure that the sandbox is on the path, so that the command line tools built in the first call to cabal install can be found in the second.
All in all, this process was much smoother than before. Both ghcformacosx and brew are excellent pieces of work – kudos to their developers. ghc is, of course, as awesome as ever. When used with sandboxes, cabal works well (despite the "cabal hell" reputation). However, having to manually resolve dependencies on build tools is tedious; I’d really like to see this cabal issue resolved.
One issue cropped up after this post was written. It turns out that ghc-mod has some constraints on the combinations of ghc and cabal versions, and unfortunately the combination provided in ghcformacosx is not supported. I worked around this by installing an older version of cabal in ~/bin:
cd ~/bin/hackage
cabal install --constraint "Cabal < 1.22" cabal-install
cd ~/bin
ln -s hackage/.cabal-sandbox/bin/cabal
Matus Tejiscak and I have produced a new draft paper titled Practical Erasure in Dependently Typed Languages, in which we explain how Idris erases computationally irrelevant parts of programs. The abstract is:
Full-spectrum dependently typed languages and tools, such as Idris and Agda, have recently been gaining interest due to the expressive power of their type systems, in particular their ability to describe precise properties of programs which can be verified by type checking.
With full-spectrum dependent types, we can treat types as first-class language constructs: types can be parameterised on values, and types can be computed like any other value. However, this power brings new challenges when compiling to executable code. Without special treatment, values which exist only for compile-time checking may leak into compiled code, even in relatively simple cases. Previous attempts to tackle the problem are unsatisfying in that they either fail to erase all irrelevant information, require user annotation or in some other way restrict the expressive power of the language.
In this paper, we present a new erasure mechanism based on whole-program analysis, currently implemented in the Idris programming language. We give some simple examples of dependently typed functional programs with compile-time guarantees of their properties, but for which existing erasure techniques fall short. We then describe our new analysis method and show that with it, erasure can lead to asymptotically faster code thanks to the ability to erase not only proofs but also indices.
Comments, feedback, questions, etc, all welcome!
A couple of days ago I wrote a small implementation of a type inferencer for a mini ML language. It turns out there are very few explanations of how to do this properly, and the ones that exist tend to present the really naive, super-exponential algorithm. I wrote the algorithm in SML, but nothing should be unfamiliar to the average Haskeller.
Type inference breaks down into essentially 2 components: constraint generation and unification.
We inspect the program we’re trying to infer a type for and generate a bunch of statements (constraints) which are of the form
This type is equal to this type
These types have “unification variables” in them. These aren’t normal ML/Haskell type variables. They’re generated by the compiler, for the compiler, and will eventually be filled in with either a concrete type or a normal type variable (via generalization).
They should be thought of as holes in an otherwise normal type. For example, if we’re looking at the expression

f a

we first just say that f : 'f, where 'f is one of those unification variables I mentioned. Next we say that a : 'a. Since we’re applying f to a, we can generate the constraints

'f ~ 'x -> 'y
'a ~ 'x

since we can only apply things of the form _ -> _. We then unify these constraints to produce f : 'a -> 'y and a : 'a. We’d then use the surrounding constraints to produce more information about what exactly 'a and 'y might be. If these were all the constraints we had, we’d then “generalize” 'a and 'y to be normal type variables, making our expression have the type y, where f : a -> y and a : a.
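The little derivation above is easy to mechanize. Here is a minimal sketch in Python, purely for illustration (the real implementation in this post is the SML code below), that solves exactly the two constraints from the `f a` example:

```python
# A tiny first-order unifier, in Python purely for illustration (the
# real implementation in this post is the SML code further down).
# Types are modelled as nested tuples:
#   ("var", n)          -- a unification variable such as 'f
#   ("arrow", dom, rng) -- a function type dom -> rng

def apply_subst(subst, ty):
    """Replace solved variables in ty by their bindings."""
    if ty[0] == "var":
        return apply_subst(subst, subst[ty[1]]) if ty[1] in subst else ty
    return ("arrow", apply_subst(subst, ty[1]), apply_subst(subst, ty[2]))

def unify(constraints):
    """Solve a list of (type, type) equations into a substitution.
    (The occurs check is omitted here for brevity.)"""
    subst = {}
    while constraints:
        l, r = constraints.pop()
        l, r = apply_subst(subst, l), apply_subst(subst, r)
        if l == r:
            continue
        if l[0] == "var":
            subst[l[1]] = r
        elif r[0] == "var":
            subst[r[1]] = l
        else:  # both arrows: equate domains and ranges
            constraints += [(l[1], r[1]), (l[2], r[2])]
    return subst

# The constraints from the example: 'f ~ 'x -> 'y and 'a ~ 'x.
f, a, x, y = (("var", v) for v in "faxy")
sol = unify([(f, ("arrow", x, y)), (a, x)])
print(apply_subst(sol, f))  # ('arrow', ('var', 'x'), ('var', 'y'))
```

With 'a ~ 'x solved, the result says f : 'x -> 'y, matching the derivation above up to the choice of variable names.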
Now, on to some specifics.
In order to actually talk about type inference we first have to define our language. We have the abstract syntax tree:
type tvar = int
local val freshSource = ref 0 in
fun fresh () : tvar =
!freshSource before freshSource := !freshSource + 1
end
datatype monotype = TBool
| TArr of monotype * monotype
| TVar of tvar
datatype polytype = PolyType of int list * monotype
datatype exp = True
| False
| Var of int
| App of exp * exp
| Let of exp * exp
| Fn of exp
| If of exp * exp * exp
First we have type variables, which are globally unique integers. To give us a method for actually producing them we have fresh, which uses a ref-cell to never return the same result twice. This is probably surprising to Haskellers: SML isn’t purely functional, and frankly this is less noisy than using something like monad-gen.
From there we have mono-types. These are normal ML types without any polymorphism. There are type/unification variables, booleans, and functions. Polytypes are just monotypes with an extra forall at the front. This is where we get polymorphism from. A polytype binds a number of type variables, stored in this representation as an int list. There is one ambiguity here: when looking at a variable, it’s not clear whether it’s supposed to be a type variable (bound in a forall) or a unification variable. The idea is that we never ever inspect a type bound under a forall except when we’re converting it to a monotype with fresh unification variables in place of all of the bound variables. Thus, when inferring a type, every variable we come across is a unification variable.
Finally, we have expressions. Aside from the normal constants, we have variables, lambdas, applications, and if. The way we represent variables here is with DeBruijn indices: a variable is a number that tells you how many binders are between it and where it was bound. For example, const would be written Fn (Fn (Var 1)) in this representation.
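To make the indexing concrete, here is a sketch in Python (illustrative only; the term shapes mirror the exp datatype above) that converts a named term to this DeBruijn representation:

```python
# Converting named lambda terms to DeBruijn indices (a Python sketch
# for illustration; constructors mirror the article's exp datatype).
def to_debruijn(term, env=()):
    """env is the stack of binder names, innermost binder first."""
    kind = term[0]
    if kind == "var":
        # the index counts the binders between the use and its binding
        return ("Var", env.index(term[1]))
    if kind == "fn":   # ("fn", argument_name, body)
        return ("Fn", to_debruijn(term[2], (term[1],) + env))
    if kind == "app":  # ("app", function, argument)
        return ("App", to_debruijn(term[1], env), to_debruijn(term[2], env))

# const = fn x => fn y => x, i.e. Fn (Fn (Var 1))
const = ("fn", "x", ("fn", "y", ("var", "x")))
print(to_debruijn(const))  # ('Fn', ('Fn', ('Var', 1)))
```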
With this in mind we define some helpful utility functions. When type checking, we have a context full of information. The two facts we know are
datatype info = PolyTypeVar of polytype
| MonoTypeVar of monotype
type context = info list
Where the ith element of a context indicates the piece of information we know about the ith DeBruijn variable. We’ll also need to substitute a type for a type variable, and to find out all the free variables in a type.
fun subst ty' var ty =
case ty of
TVar var' => if var = var' then ty' else TVar var'
| TArr (l, r) => TArr (subst ty' var l, subst ty' var r)
| TBool => TBool
fun freeVars t =
case t of
TVar v => [v]
| TArr (l, r) => freeVars l @ freeVars r
| TBool => []
Both of these functions just recurse over types and do some work at the variable case. Note that freeVars can contain duplicates; this turns out not to matter anywhere except in one place: generalizeMonoType. The basic idea there is that, given a monotype with a bunch of unification variables and a surrounding context, we figure out which variables can be bound up in a polymorphic type. If they don’t appear in the surrounding context, we generalize them by binding them in a new polytype’s forall spot.
fun dedup [] = []
| dedup (x :: xs) =
if List.exists (fn y => x = y) xs
then dedup xs
else x :: dedup xs
fun generalizeMonoType ctx ty =
let fun notMem xs x = List.all (fn y => x <> y) xs
fun free (MonoTypeVar m) = freeVars m
| free (PolyTypeVar (PolyType (bs, m))) =
List.filter (notMem bs) (freeVars m)
val ctxVars = List.concat (List.map free ctx)
val polyVars = List.filter (notMem ctxVars) (freeVars ty)
in PolyType (dedup polyVars, ty) end
Here the bulk of the code is deciding whether or not a variable is free in the surrounding context using free, which looks at a piece of info to determine what variables occur in it. We then accumulate all of these variables into ctxVars and use this list to decide what to generalize.
Next we need to take a polytype to a monotype. This is the specialization of a polymorphic type that we love and use when we use map on a function from int -> double. This works by taking each bound variable and replacing it with a fresh unification variable. This is nicely handled by folds!
fun mintNewMonoType (PolyType (ls, ty)) =
foldl (fn (v, t) => subst (TVar (fresh ())) v t) ty ls
Last but not least, we have a function to take a context and a variable and give us a monotype which corresponds to it. This may produce a new monotype if we think the variable has a polytype.
exception UnboundVar of int
fun lookupVar var ctx =
case List.nth (ctx, var) handle Subscript => raise UnboundVar var of
PolyTypeVar pty => mintNewMonoType pty
| MonoTypeVar mty => mty
For the sake of nice error messages, we also throw UnboundVar instead of just Subscript in the error case. Now that we’ve gone through all of the utility functions, on to unification!
A large part of this program is basically “I’ll give you a list of constraints and you give me the solution”. The program to solve these proceeds by pattern matching on the constraints.
In the empty case, we have no constraints so we give back the empty solution.
fun unify [] = []
In the next case we actually have to look at what constraint we’re trying to solve.
| unify (c :: constrs) =
case c of
If we’re lucky, we’re just trying to unify TBool with TBool. This does nothing since these types have no variables and are equal; in this case we just recurse.
(TBool, TBool) => unify constrs
If we’ve got two function types, we just constrain their domains and ranges to be the same and continue on unifying things.
| (TArr (l, r), TArr (l', r')) => unify ((l, l') :: (r, r') :: constrs)
Now we have to deal with finding a variable. We definitely want to avoid adding (TVar v, TVar v) to our solution, so we’ll have a special case for trying to unify two variables.
| (TVar i, TVar j) =>
if i = j
then unify constrs
else addSol i (TVar j) (unify (substConstrs (TVar j) i constrs))
This is our first time actually adding something to our solution, so there are several new elements here. The first is the function addSol, defined as
fun addSol v ty sol = (v, applySol sol ty) :: sol
So in order to make sure our solution is internally consistent, it’s important that whenever we add a type to our solution we first apply the solution to it. This ensures that we can substitute a variable in our solution for its corresponding type and not worry about whether we need to do something further. Additionally, whenever we add a new binding we substitute for it in the constraints we have left, to ensure we never end up with an inconsistent solution. This prevents us from unifying v ~ TBool and v ~ TArr(TBool, TBool) in the same solution! The actual code for doing this is the substConstrs (TVar j) i constrs bit.
The next case is the general case for unifying a variable with some type. It looks very similar to the previous one.
| ((TVar i, ty) | (ty, TVar i)) =>
if occursIn i ty
then raise UnificationError c
else addSol i ty (unify (substConstrs ty i constrs))
Here we have the critical occursIn check. This checks whether a variable appears in a type, and prevents us from making erroneous unifications like TVar a ~ TArr (TVar a, TVar a). The occurs check is actually very easy to implement:
fun occursIn v ty = List.exists (fn v' => v = v') (freeVars ty)
Finally we have one last case: the failure case. This is the catch-all case for if we try to unify two things that are obviously incompatible.
| _ => raise UnificationError c
Altogether, that code is:
fun applySol sol ty =
foldl (fn ((v, ty), ty') => subst ty v ty') ty sol
fun applySolCxt sol cxt =
let fun applyInfo i =
case i of
PolyTypeVar (PolyType (bs, m)) =>
PolyTypeVar (PolyType (bs, (applySol sol m)))
| MonoTypeVar m => MonoTypeVar (applySol sol m)
in map applyInfo cxt end
fun addSol v ty sol = (v, applySol sol ty) :: sol
fun occursIn v ty = List.exists (fn v' => v = v') (freeVars ty)
fun unify ([] : constr list) : sol = []
| unify (c :: constrs) =
case c of
(TBool, TBool) => unify constrs
| (TVar i, TVar j) =>
if i = j
then unify constrs
else addSol i (TVar j) (unify (substConstrs (TVar j) i constrs))
| ((TVar i, ty) | (ty, TVar i)) =>
if occursIn i ty
then raise UnificationError c
else addSol i ty (unify (substConstrs ty i constrs))
| (TArr (l, r), TArr (l', r')) =>
unify ((l, l') :: (r, r') :: constrs)
| _ => raise UnificationError c
The other half of this algorithm is the constraint generation part. We generate constraints and use unify to turn them into solutions. This boils down to two functions. The first glues together solutions.
fun <+> (sol1, sol2) =
let fun notInSol2 v = List.all (fn (v', _) => v <> v') sol2
val sol1' = List.filter (fn (v, _) => notInSol2 v) sol1
in
map (fn (v, ty) => (v, applySol sol1 ty)) sol2 @ sol1'
end
infixr 3 <+>
Given two solutions, we figure out which bindings don’t occur in the second solution. Next, we apply solution 1 everywhere in the second solution, giving a consistent solution which contains everything in sol2. Finally, we add in all the stuff in sol1 but not in sol2. This doesn’t check that the solutions are actually consistent; that is done elsewhere.
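The behaviour of <+> can be sketched in Python (illustrative only; a solution here is a dict from variable names to types, with types as nested tuples like in a toy unifier):

```python
# The solution-composition operator <+>, sketched in Python for
# illustration. A solution maps variable names to types; types are
# nested tuples: ("var", v), ("arrow", l, r) or ("bool",).

def apply_subst(subst, ty):
    """Replace solved variables in ty by their bindings."""
    if ty[0] == "var":
        return apply_subst(subst, subst[ty[1]]) if ty[1] in subst else ty
    if ty[0] == "arrow":
        return ("arrow", apply_subst(subst, ty[1]), apply_subst(subst, ty[2]))
    return ty  # ("bool",)

def compose(sol1, sol2):
    """sol1 <+> sol2: apply sol1 throughout sol2's bindings, then add
    the bindings of sol1 that sol2 does not already cover."""
    out = {v: apply_subst(sol1, ty) for v, ty in sol2.items()}
    for v, ty in sol1.items():
        out.setdefault(v, ty)
    return out

# 'y was solved in terms of 'x; composing pushes 'x's binding through.
print(compose({"x": ("bool",)}, {"y": ("var", "x")}))
# {'y': ('bool',), 'x': ('bool',)}
```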
Next is the main function here: constrain. This actually generates a type and a solution, given a context and an expression. The first few cases are nice and simple.
fun constrain ctx True = (TBool, [])
| constrain ctx False = (TBool, [])
| constrain ctx (Var i) = (lookupVar i ctx, [])
In these cases we don’t infer any constraints; we just figure out types based on information we know previously. Next, for Fn, we generate a fresh variable to represent the argument’s type and just constrain the body.
| constrain ctx (Fn body) =
let val argTy = TVar (fresh ())
val (rTy, sol) = constrain (MonoTypeVar argTy :: ctx) body
in (TArr (applySol sol argTy, rTy), sol) end
Once we have the solution for the body, we apply it to the argument type, which might replace it with a concrete type if the constraints we inferred for the body demand it. For If we do something similar, except we add a few constraints of our own to solve.
| constrain ctx (If (i, t, e)) =
let val (iTy, sol1) = constrain ctx i
val (tTy, sol2) = constrain (applySolCxt sol1 ctx) t
val (eTy, sol3) = constrain (applySolCxt (sol1 <+> sol2) ctx) e
val sol = sol1 <+> sol2 <+> sol3
val sol = sol <+> unify [ (applySol sol iTy, TBool)
, (applySol sol tTy, applySol sol eTy)]
in
(tTy, sol)
end
Notice how we apply each solution to the context for the next thing we’re constraining. This is how we ensure that each solution will be consistent. Once we’ve generated solutions to the constraints in each of the subterms, we smash them together to produce the first solution. Next, we ensure that the subcomponents have the right types by generating a few constraints: iTy must be a bool, and tTy and eTy (the types of the branches) must be the same. We have to carefully apply sol to each of these prior to unifying them to make sure our solution stays consistent.
This is practically the same as the App case:
| constrain ctx (App (l, r)) =
let val (domTy, ranTy) = (TVar (fresh ()), TVar (fresh ()))
val (funTy, sol1) = constrain ctx l
val (argTy, sol2) = constrain (applySolCxt sol1 ctx) r
val sol = sol1 <+> sol2
val sol = sol <+> unify [(applySol sol funTy,
applySol sol (TArr (domTy, ranTy)))
, (applySol sol argTy, applySol sol domTy)]
in (ranTy, sol) end
The only real difference here is that we generate different constraints: we make sure we’re applying a function whose domain is the same as the argument type.
The most interesting case here is Let. This implements let generalization, which is how we actually get polymorphism. After inferring the type of the thing we’re binding, we generalize it, giving us a polytype to use in the body of the let. The key to generalizing it is the generalizeMonoType function we defined before.
| constrain ctx (Let (e, body)) =
let val (eTy, sol1) = constrain ctx e
val ctx' = applySolCxt sol1 ctx
val eTy' = generalizeMonoType ctx' (applySol sol1 eTy)
val (rTy, sol2) = constrain (PolyTypeVar eTy' :: ctx') body
in (rTy, sol1 <+> sol2) end
We do pretty much everything we had before, except now we carefully apply the solution we get for the bound expression to the context and then generalize the type with respect to that new context. This is how we actually get polymorphism: it assigns a proper polymorphic type to the binding.
That wraps up constraint generation. Now all that’s left to see is the overall driver for type inference.
fun infer e =
let val (ty, sol) = constrain [] e
in generalizeMonoType [] (applySol sol ty) end
So all we do is infer and generalize a type! And there you have it, that’s how ML and Haskell do type inference.
Hopefully that clears up a little of the magic of how type inference works. The next challenge is to figure out how to do type inference on a language with patterns and ADTs! This is actually quite fun, pattern checking involves synthesizing a type from a pattern which needs something like linear logic to handle pattern variables correctly.
With this we’re actually a solid 70% of the way to building a type checker for SML. Until I have more free time though, I leave this as an exercise to the curious reader.
Cheers,
For the last 3 or so weeks I’ve been writing a bunch of Twelf code for my research (hence my flat-lined github punch card). Since it’s actually a lot of fun I thought I’d share a bit about Twelf.
Since Twelf isn’t a terribly well known language it’s worth stating what exactly it is we’re talking about. Twelf is a proof assistant. It’s based on a logic called LF (similarly to how Coq is based on CiC).
Twelf is less powerful than some other proof assistants, but by limiting some of its power it’s wonderfully suited to proving certain types of theorems. In particular, Twelf admits true “higher order abstract syntax” (don’t worry if you don’t know what this means), which makes it great for formalizing programming languages with variable bindings.
In short, Twelf is a proof assistant which is very well suited for defining and proving things about programming languages.
It’s much more fun to follow along a tutorial if you actually have a Twelf installation to try out the code. You can download and compile the sources to Twelf with SML/NJ or Mlton. You could also use smackage to get the compiler.
Once you’ve compiled the thing you should be left with a binary twelf-server. This is your primary way of interacting with the Twelf system. There’s quite a slick Emacs interface to smooth over this process. If you’ve installed Twelf into a directory ~/twelf/, all you need is the incantation
(setq twelf-root "~/twelf/")
(load (concat twelf-root "emacs/twelf-init.el"))
Without further ado, let’s look at some Twelf code.
When writing Twelf code we encode the thing that we’re studying, the object language, as a bunch of type families and constructors in Twelf. This means that when we edit a Twelf file we’re just writing signatures.
For example, if we want to encode natural numbers we’d write something like
nat : type.
z : nat.
s : nat -> nat.
This is an LF signature: we declare a series of constants with NAME : TYPE. (note the period at the end of each declaration). First we declare a type for natural numbers called nat, with nat : type. Here type is the base kind of all types in Twelf. Next we go on to declare what the values of type nat are.
In this case there are two constructors for nat. We either have zero, z, or the successor of another value of type nat, s. This gives us a canonical forms lemma for natural numbers: all values of type nat are either z, or s N for some value N : nat. Later on, we’ll justify the proofs we write with this lemma.
Anyways, now that we’ve encoded the natural numbers, I wanted to point out a common point of confusion about Twelf. We’re not writing programs to be run. We’re writing programs exclusively for the purpose of typechecking. Heck, we’re not even writing programs at the term level! We’re just writing a bunch of constants out with their types! More than this even, Twelf is defined so that you can only write canonical forms. This means that if you write something in your program, it has to be in normal form, fully applied! In PL speak, it has to be β-normal and η-long. This precludes actually writing programs for the sake of reducing them. You’re never going to write a web server in Twelf; you won’t even be writing “Hello World”. You might use it to verify the language you’re writing them in, though.
Now that we’ve gotten the awkward bit out of the way, let’s define a Twelf encoding of a judgment. We want to encode the addition judgment m + n = p, which is given by the following rules
—————————
z + n = n
m + n = p
———————————————
s(m) + n = s(p)
In the rest of the world we have this idea that propositions are types. In Twelf, we’re worried about defining logics and systems, so we have the metatheoretic equivalent: judgments are types.
So we define a type family plus.

plus : nat -> nat -> nat -> type.
So plus is a type family indexed over 3 natural numbers. This is our first example of dependent types: plus is a type which depends on 3 terms. Now we can list out how to construct a derivation of plus; inference rules in the meta theory correspond to constants in Twelf as well.
plus/z : {n : nat} plus z n n.
This is some new syntax: in Twelf, {NAME : TYPE} TYPE is a dependent function type, a pi type. This notation is awfully similar to Agda and Idris if you’re familiar with them. It means that this constructor takes a natural number n and returns a derivation of plus z n n. The fact that the return type depends on what nat we supply is why this needs a dependent type.
In fact, this is such a common pattern that Twelf has sugar for it: if we write an unbound capital variable name, Twelf will automagically introduce a binder {N : ...} at the front of our type. We can thus write our inference rules as
plus/z : plus z N N.
plus/s : plus N M P -> plus (s N) M (s P).
These rules, together with our declaration of plus, fully define the judgment. In fact, there’s something kinda special about these two rules: we know that for any term n : nat in canonical form, there should be an applicable rule. In Twelf speak, we say that this type family is total.
We can ask Twelf to check this fact for us by saying
plus : nat -> nat -> nat -> type.
%mode plus +N +M -P.
plus/z : plus z N N.
plus/s : plus N M P -> plus (s N) M (s P).
%worlds () (plus _ _ _).
%total (N) (plus N _ _).
We want to show that for all terms n, m : nat in canonical form, there is a term p in canonical form so that plus n m p. This sort of theorem is what we’d call a ∀∃-theorem, literally because it’s a theorem of the form “∀ something. ∃ something. so that something”. These are the sort of theorems that Twelf can help us prove.
Here’s the workflow for writing one of these proofs in Twelf:

1. Write out the type family encoding the theorem.
2. Give a %mode specification to say what is bound in the ∀ and what is bound in the ∃.
3. Give a %worlds declaration saying which contexts to check the family in; usually we want the empty context, ().
4. Give a %total declaration, where the N specifies what to induct on.

In our case we have a case for each canonical form of nat, so our type family is total. This means that our theorem passes. Hurray!
Believe it or not, this is what life is like in Twelf land. All the code I’ve written these last couple of weeks is literally type signatures and 5 occurrences of %total. What’s kind of fun is how unreasonably effective a system this is for proving things.
Let’s wrap things up by proving one last theorem: if plus A B N and plus A B M both have derivations, then we should be able to show that M and N are the same. Let’s start by defining what it means for two natural numbers to be the same.
nat-eq : nat -> nat -> type.
nat-eq/r : nat-eq N N.
nat-eq/s : nat-eq N M -> nat-eq (s N) (s M).
I’ve purposefully defined this so it’s amenable to our proof, but it’s still a believable formulation of equality: it’s reflexive, and if N is equal to M, then s N is equal to s M. Now we can actually state our theorem.
plus-fun : plus N M P -> plus N M P' -> nat-eq P P' -> type.
%mode plus-fun +A +B -C.
Our theorem says that if you give us two derivations of plus with the same arguments, we can prove that the outputs are equal. There are two cases we have to cover for our induction, so there are two constructors for this type family.
plus-fun/z : plus-fun plus/z plus/z nat-eq/r.
plus-fun/s : plus-fun (plus/s L) (plus/s R) (nat-eq/s E)
<- plus-fun L R E.
A bit of syntactic sugar here: I used the backwards arrow, which is identical to the normal -> except its arguments are flipped. Finally, we ask Twelf to check that we’ve actually proven something here.
%worlds () (plus-fun _ _ _).
%total (P) (plus-fun P _ _).
And there you have it: an actual theorem we’ve mechanically checked using Twelf.
I wanted to keep this short, so now that we’ve covered Twelf basics I’ll just refer you to one of the more extensive tutorials. You may be interested in
If you’re interested in learning a bit more about the nice mathematical foundations for LF you should check out “The LF Paper”.
Almost seven years ago, at a time when the “VCS wars” had not even properly started yet, GitHub was seven days old and most Haskell related software projects were using Darcs as their version control system of choice, when you submitted a patch, you simply ran darcs send and a mail with your changes would be sent to the right address, e.g. the maintainer or a mailing list. This was almost as convenient as Pull Requests are on GitHub now, only that it was tricky to keep track of what was happening with the patch, and it was easy to forget to follow up on it.
So back then I announced DarcsWatch: a service that you could CC in your patch submitting mail, which would then monitor the repository and tell you about the patch’s status, i.e. whether it was applied or obsoleted by another patch.
Since then, it quietly did its work without many hiccups. But by now, a lot of projects have moved away from Darcs, so I don’t really use it myself any more. Also, its Darcs patch parser does not like every submission produced by a contemporary darcs, so it is becoming more and more unreliable. I asked around on the xmonad and darcs mailing lists whether others were still using it, and nobody spoke up. Therefore, after seven years and 4660 monitored patches, I am officially ceasing to run DarcsWatch.
The code and data is still there, so if you believe this was a mistake, you can still speak up -- but be prepared to be asked to take over maintaining it.
I have a dislike for actually deleting data, so I’ll keep the static parts of the DarcsWatch web page running in their current state.
I’d like to thank the guys from spiny.org.uk for hosting DarcsWatch on urching for the last 5 years.
Summary: Currently you have to call withSocketsDo before using the Haskell network library. In the next version you won't have to.
The Haskell network library has always had a weird and unpleasant invariant: under Windows, you must call withSocketsDo before calling any other functions. If you forget, the error message isn't particularly illuminating (e.g. getAddrInfo: does not exist (error 10093)). Calling withSocketsDo isn't harmful under Linux, but equally isn't necessary, and is thus easy to accidentally omit. The network library has recently merged some patches so that in future versions there is no requirement to call withSocketsDo, even on Windows.
Existing versions of network
The reason for requiring withSocketsDo is so that the network library can initialise the Windows Winsock library. The code for withSocketsDo was approximately:
withSocketsDo :: IO a -> IO a
#if WINDOWS
withSocketsDo act = do
initWinsock
act `finally` termWinsock
#else
withSocketsDo act = act
#endif
Here initWinsock and termWinsock were C functions. Both checked a mutable variable so they only initialised/terminated once. The initWinsock function immediately initialised the Winsock library. The termWinsock function did not terminate the library, but merely installed an atexit handler, providing a function that ran when the program shut down and terminated the Winsock library.
As a result, in all existing versions of the network library, it is fine to nest calls to withSocketsDo, to call withSocketsDo multiple times, and to perform networking operations after withSocketsDo has returned.
Future versions of network
My approach to removing the requirement to call withSocketsDo
was to make it very cheap, then sprinkle it everywhere it might be needed. Making such a function cheap on non-Windows just required an INLINE
pragma (although it's very likely GHC would have always inlined the function anyway).
For Windows, I changed to:
withSocketsDo act = do evaluate withSocketsInit; act
{-# NOINLINE withSocketsInit #-}
withSocketsInit = unsafePerformIO $ do
initWinsock
termWinsock
Now withSocketsDo
is very cheap, with subsequent calls requiring no FFI calls, and thanks to pointer tagging, just a few cheap instructions. When placing additional withSocketsDo
calls my strategy was to make sure I called it before constructing a Socket
(which many functions take as an argument), and when taking one of the central locks required for the network library. In addition, I identified a few places not otherwise covered.
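The once-only initialisation trick can be sketched in plain Haskell without any Winsock calls. Here `initCount`, `initOnce`, and `withInitDo` are hypothetical names invented for this sketch, standing in for the real library internals; the point is that forcing a `NOINLINE` CAF runs its `unsafePerformIO` action exactly once:

```haskell
import Control.Exception (evaluate)
import Data.IORef (IORef, newIORef, modifyIORef', readIORef)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical stand-in for the Winsock setup: a counter recording how
-- many times initialisation has actually run.
initCount :: IORef Int
initCount = unsafePerformIO (newIORef 0)
{-# NOINLINE initCount #-}

-- A CAF guarded by NOINLINE: forcing it runs its unsafePerformIO action
-- exactly once, no matter how many times it is evaluated.
initOnce :: ()
initOnce = unsafePerformIO (modifyIORef' initCount (+ 1))
{-# NOINLINE initOnce #-}

-- Analogue of the new withSocketsDo: after the first call, evaluating
-- the already-forced CAF costs just a few instructions.
withInitDo :: IO a -> IO a
withInitDo act = evaluate initOnce >> act
```

Calling `withInitDo` any number of times leaves `initCount` at 1, mirroring how repeated `withSocketsDo` calls trigger only one Winsock initialisation.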
In newer versions of the network
library it is probably never necessary to call withSocketsDo
- if you find a place where one is necessary, let me know. However, for compatibility with older versions on Windows, it is good practice to always call withSocketsDo
. Libraries making use of the network library should probably call withSocketsDo
on their users' behalf.
This Senior Software Engineer position is with the new LearnSmart team at McGraw-Hill Education's new and growing Research & Development center in Boston's Innovation District.
We make software that helps college students study smarter, earn better grades, and retain more knowledge.
The LearnSmart adaptive engine powers the products in our LearnSmart Advantage suite — LearnSmart, SmartBook, LearnSmart Achieve, LearnSmart Prep, and LearnSmart Labs. These products provide a personalized learning path that continuously adapts course content based on a student’s current knowledge and confidence level.
On our team, you'll get to:
Our team's products are built with Flow, a functional language in the ML family. Flow lets us write code once and deliver it to students on multiple platforms and device types. Other languages in our development ecosystem include JavaScript especially, but also C++, SWF (Flash), and Haxe.
If you're interested in functional languages like Scala, Swift, Erlang, Clojure, F#, Lisp, Haskell, and OCaml, then you'll enjoy learning Flow. We don't require that you have previous experience with functional programming, only enthusiasm for learning it. But if you do have some experience with functional languages, so much the better! (On-the-job experience is best, but coursework, personal projects, and open-source contributions count too.)
We require only that you:
Get information on how to apply for this position.
In-memory caches form an important optimisation for modern applications. This is one area where people often tend to write their own implementation (though usually based on an existing idea). The reason for this is mostly that having a one-size-fits-all cache is really hard, and people often want to tune it for performance reasons according to their usage pattern, or use a specific interface that works really well for them.
However, this sometimes results in less-than-optimal design choices. I thought I would take some time and explain how an LRU cache can be written in a reasonably straightforward way (the code is fairly short), while still achieving great performance. Hence, it should not be too much trouble to tune this code to your needs.
The data structure usually underpinning an LRU cache is a Priority Search Queue, where the priority of an element is the time at which it was last accessed. A number of Priority Search Queue implementations are provided by the psqueues package, and in this blogpost we will be using its HashPSQ
data type.
Disclaimer: I am one of the principal authors of the psqueues package.
This blogpost is written in literate Haskell, so you should be able to plug it into GHCi and play around with it. The raw file can be found here.
First, we import some things, including the Data.HashPSQ
module from psqueues.
> {-# LANGUAGE BangPatterns #-}
> import Control.Applicative ((<$>))
> import Data.Hashable (Hashable, hash)
> import qualified Data.HashPSQ as HashPSQ
> import Data.IORef (IORef, newIORef, atomicModifyIORef')
> import Data.Int (Int64)
> import Data.Maybe (isNothing)
> import qualified Data.Vector as V
> import Prelude hiding (lookup)
Let’s start with our datatype definition. Our Cache
type is parameterized by k
and v
, which represent the types of our keys and values respectively. The priorities of our elements will be the logical time at which they were last accessed, or the time at which they were inserted (for elements which have never been accessed). We will represent these logical times by values of type Int64
.
> type Priority = Int64
The cTick
field stores the “next” logical time – that is, the value of cTick
should be one greater than the maximum priority in cQueue
. At the very least, we need to maintain the invariant that all priorities in cQueue
are smaller than cTick
. A consequence of this is that cTick
should increase monotonically. This is violated in the case of an integer overflow, so we need to take special care of that case.
> data Cache k v = Cache
> { cCapacity :: !Int -- ^ The maximum number of elements in the queue
> , cSize :: !Int -- ^ The current number of elements in the queue
> , cTick :: !Priority -- ^ The next logical time
> , cQueue :: !(HashPSQ.HashPSQ k Priority v)
> }
Creating an empty Cache
is easy; we just need to know the maximum capacity.
> empty :: Int -> Cache k v
> empty capacity
> | capacity < 1 = error "Cache.empty: capacity < 1"
> | otherwise = Cache
> { cCapacity = capacity
> , cSize = 0
> , cTick = 0
> , cQueue = HashPSQ.empty
> }
Next, we will write a utility function to ensure that the invariants of our datatype are met. We can then use that in our lookup
and insert
functions.
> trim :: (Hashable k, Ord k) => Cache k v -> Cache k v
> trim c
The first thing we want to check is if our logical time reaches the maximum value it can take. If this is the case, we can either reset all the ticks in our queue, or we can clear it. We choose the latter here, since it is simply easier to code, and we are talking about a scenario that should not happen very often.
> | cTick c == maxBound = empty (cCapacity c)
Then, we just need to check if our size is still within bounds. If it is not, we drop the oldest item – that is, the item with the smallest priority. We will only ever need to drop one item at a time, because our cache is number-bounded and we will call trim
after every insert
.
> | cSize c > cCapacity c = c
> { cSize = cSize c - 1
> , cQueue = HashPSQ.deleteMin (cQueue c)
> }
> | otherwise = c
Insert is pretty straightforward to implement now. We use the insertView
function from Data.HashPSQ
which tells us whether or not an item was overwritten.
insertView
:: (Hashable k, Ord p, Ord k)
=> k -> p -> v -> HashPSQ k p v -> (Maybe (p, v), HashPSQ k p v)
This is necessary, since we need to know whether or not we need to update cSize
.
> insert :: (Hashable k, Ord k) => k -> v -> Cache k v -> Cache k v
> insert key val c = trim $!
> let (mbOldVal, queue) = HashPSQ.insertView key (cTick c) val (cQueue c)
> in c
> { cSize = if isNothing mbOldVal then cSize c + 1 else cSize c
> , cTick = cTick c + 1
> , cQueue = queue
> }
Lookup is not that hard either, but we need to remember that in addition to looking up the item, we also want to bump the priority. We can do this using the alter
function from psqueues: that allows us to modify a value (bump its priority) and return something (the value, if found) at the same time.
alter
:: (Hashable k, Ord k, Ord p)
=> (Maybe (p, v) -> (b, Maybe (p, v)))
-> k -> HashPSQ.HashPSQ k p v -> (b, HashPSQ.HashPSQ k p v)
The b
in the signature above becomes our lookup result.
> lookup
> :: (Hashable k, Ord k) => k -> Cache k v -> Maybe (v, Cache k v)
> lookup k c = case HashPSQ.alter lookupAndBump k (cQueue c) of
> (Nothing, _) -> Nothing
> (Just x, q) ->
> let !c' = trim $ c {cTick = cTick c + 1, cQueue = q}
> in Just (x, c')
> where
> lookupAndBump Nothing = (Nothing, Nothing)
> lookupAndBump (Just (_, x)) = (Just x, Just ((cTick c), x))
That basically gives a clean and simple implementation of a pure LRU Cache. If you are only writing pure code, you should be good to go! However, most applications deal with caches in IO, so we will have to adjust it for that.
Using an IORef
, we can update our Cache
to be easily usable in the IO Monad.
> newtype Handle k v = Handle (IORef (Cache k v))
Creating one is easy:
> newHandle :: Int -> IO (Handle k v)
> newHandle capacity = Handle <$> newIORef (empty capacity)
Our simple interface only needs to export one function. cached
takes the key of the value we are looking for, and an IO
action which produces the value. However, we will only actually execute this IO
action if it is not present in the cache.
> cached
> :: (Hashable k, Ord k)
> => Handle k v -> k -> IO v -> IO v
> cached (Handle ref) k io = do
First, we check the cache using our lookup
function from above. This uses atomicModifyIORef'
, since our lookup
might bump the priority of an item, and in that case we modify the cache.
> lookupRes <- atomicModifyIORef' ref $ \c -> case lookup k c of
> Nothing -> (c, Nothing)
> Just (v, c') -> (c', Just v)
If it is found, we can just return it.
> case lookupRes of
> Just v -> return v
Otherwise, we execute the IO
action and call atomicModifyIORef'
again to insert it into the cache.
> Nothing -> do
> v <- io
> atomicModifyIORef' ref $ \c -> (insert k v c, ())
> return v
This scheme already gives us fairly good performance. However, that can degrade a little when lots of threads are calling atomicModifyIORef'
on the same IORef
.
atomicModifyIORef'
is implemented using a compare-and-swap, so conceptually it works a bit like this:
atomicModifyIORef' :: IORef a -> (a -> (a, b)) -> IO b
atomicModifyIORef' ref f = do
x <- readIORef ref
let (!x', !y) = f x
-- Atomically write x' if value is still x
swapped <- compareAndSwap ref x x'
if swapped
then return y
else atomicModifyIORef' ref f -- Retry
We can see that this can lead to contention: if we have a lot of concurrent atomicModifyIORef'
s, we can get into a retry loop. It cannot cause a deadlock (i.e., it should still eventually finish), but it will still bring our performance to a grinding halt. This is a common problem with IORef
s which I have also personally encountered in real-world scenarios.
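Before turning to striping, it is worth noting what atomicModifyIORef' does guarantee. In this small base-only sketch (contend is a name invented here), several threads hammer a single IORef; however many compare-and-swap retries happen internally, no increment is ever lost:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM, forM_, replicateM_)
import Data.IORef (newIORef, atomicModifyIORef', readIORef)

-- Run nThreads threads, each performing nIncrs atomic increments on one
-- shared IORef, and return the final counter value.
contend :: Int -> Int -> IO Int
contend nThreads nIncrs = do
  ref   <- newIORef (0 :: Int)
  dones <- forM [1 .. nThreads] $ \_ -> do
    done <- newEmptyMVar
    _ <- forkIO $ do
      replicateM_ nIncrs (atomicModifyIORef' ref (\x -> (x + 1, ())))
      putMVar done ()
    return done
  forM_ dones takeMVar   -- wait for all workers
  readIORef ref
```

The result is always nThreads * nIncrs; the cost of the retries, not their correctness, is what striping addresses.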
A good solution around this problem, since we already have a Hashable
instance for our key anyway, is striping the keyspace. We can even reuse our Handle
in quite an elegant way. Instead of just using one Handle
, we create a Vector
of Handle
s instead:
> newtype StripedHandle k v = StripedHandle (V.Vector (Handle k v))
The user can configure the number of handles that are created:
> newStripedHandle :: Int -> Int -> IO (StripedHandle k v)
> newStripedHandle numStripes capacityPerStripe =
> StripedHandle <$> V.replicateM numStripes (newHandle capacityPerStripe)
Our hash function then determines which Handle
we should use:
> stripedCached
> :: (Hashable k, Ord k)
> => StripedHandle k v -> k -> IO v -> IO v
> stripedCached (StripedHandle v) k =
> cached (v V.! idx) k
> where
> idx = hash k `mod` V.length v
Because our access pattern is now distributed among the different Handle
s, we should be able to avoid the contention problem.
We have implemented a very useful data structure for many applications, with two variations and decent performance. Thanks to the psqueues package, the implementations are very straightforward, small in code size, and it should be possible to tune the caches to your needs.
Many variations are possible: you can use real timestamps (UTCTime
) as priorities in the queue and have items expire after a given amount of time. Or, if modifications of the values v
are allowed, you can add a function which writes the updates to the cache as well as to the underlying data source.
For embedding the pure cache into IO, there are many alternatives to using IORef
s: for example, we could have used MVar
s or TVar
s. There are other strategies for reducing contention other than striping, too.
You could even write a cache which is bounded by its total size on the heap, rather than by the number of elements in the queue. If you want a single bounded cache for use across your entire application, you could allow it to store heterogeneously-typed values, and provide multiple strongly-typed interfaces to the same cache. However, implementing these things is a story for another time.
Thanks to the dashing Alex Sayers for proofreading and suggesting many corrections and improvements.
Hi *,
It's once again time for your sometimes-slightly-irregularly-scheduled GHC news! This past Friday marked the end of the FTP vote for GHC 7.10, there's an RC on the way (see below), we've closed out a good set of patches and tickets from users and pushed them into HEAD, and to top it off - it's your editor's birthday today, so that's nice!
Quick note: as said above GHC HQ is expecting to make a third release candidate for GHC 7.10.1 soon in early March since the delay has allowed us to pick up some more changes and bugfixes. We plan on the final release being close to the end of March (previously end of February).
This week, GHC HQ met up again to discuss and write down our current priorities and thoughts:
We've also had a little more list activity this week than we did before:
Some noteworthy commits that went into ghc.git in the past week include:
Closed tickets the past week include: #9266, #10095, #9959, #10086, #9094, #9606, #9402, #10093, #9054, #10102, #4366, #7604, #9103, #10104, #7765, #7103, #10051, #7056, #9907, #10078, #10096, #10072, #10043, #9926, #10088, #10091, #8309, #9049, #9895, and #8539.
Community.haskell.org is a server in our ecosystem that comparatively few know about these days. It actually was, to my knowledge, a key part of how the whole haskell.org community infrastructure got set up way back when. The sparse homepage still even says: "This server is run by a mysterious group of Haskell hackers who do not wish to be known as a Cabal, and is funded from money earned by haskell.org mentors in the Google Summer-of-Code programme." At a certain point after this server was created, it ceased to be run by a "mysterious group of Haskell hackers" and instead became managed officially by the haskell.org Committee that we know today. You can see the original announcement email in the archives.
The community server, first set up in 2007, played a key role back before the current set of cloud-based services we know today was around. It provided a shared host which could provide many of the services a software project needs -- VCS hosting, public webspace for documentation, issue trackers, mailing lists, and so forth.
Today, the server is somewhat of a relic of another time. People prefer to host projects in places like github, bitbucket, or darcs hub. Issue trackers likewise tend to be associated with these hosts, and there are other free, hosted issue trackers around as well. When folks want a mailing list, they tend to reach for google groups.
Meanwhile, managing a big box full of shell accounts has become a much more difficult, riskier proposition. Every shell account is a security vulnerability waiting to happen, and there are more adversarial "scriptkiddie" hackers than ever looking to claim new boxes to spam and otherwise operate from.
Managing a mailman installation is likewise more difficult. There are more spammers out there, with better methods, and unmaintained lists quickly can turn into ghost towns filled with pages of linkspam and nothing but. The same sad fate falls on unmaintained tracs.
As a whole, the internet is a more adversarial world for small, self-hosted services, especially those whose domain names have some "google juice". We think it would be good to, to the extent possible, get out of the business of doing this sort of hosting. And indeed, very few individuals tend to request accounts, since there are now so many nicer, better ways of getting the benefits that community.haskell.org once was rare in providing.
So what next? Well, we want to "end of life" most of community.haskell.org, but in as painless a way as possible. This means finding what few tracs, if any, are still active, and helping their owners migrate. Similarly for mailing lists. Of course we will find a way to continue to host their archives for historical purposes.
Similarly, we will attempt to keep source repositories accessible for historical purposes, but would very much like to encourage owners to move to more well supported code hosting. One purpose that, until recently, was hard to serve elsewhere was in hosting of private darcs repositories with shared access -- such as academics might use to collaborate on a work in project. However, that capability is now also provided on http://hub.darcs.net. At this point, we can't think of anything in this regard that is not better provided elsewhere -- but if you can, please let us know.
On webspace, it may be the case that a little more leniency is in order. For one, it is possible to provide restricted accounts that are able to control web-accessible files but have no other rights. For another, while many open source projects now host documentation through github pages or similar, and there are likewise many services for personal home pages, nonetheless it seems a nice thing to allow projects to host their resources on a system that is not under the control of a for-profit third party that, ultimately, is responsible to its bottom line and not its users.
But all this is open for discussion! Community.haskell.org was put together to serve the open source community of Haskell developers, and its direction needs to be determined based on feedback regarding current needs. What do you think? What would you like to see continued to be provided? What do you feel is less important? Are there other good hosted services that should be mentioned as alternatives?
And, of course, are you interested in rolling up your sleeves to help with any of the changes discussed? This could mean simply helping out with sorting out the mailman and trac situation, inventorying the active elements and collaborating with their owners. Or, it could mean a more sustained technical involvement. Whatever you have to offer, we will likely have a use for it. As always, you can email admin@h.o or hop on the #haskell-infrastructure freenode channel to get involved directly.
There is much written about the duality between strict-order (call-by-value) evaluation for the lambda calculus and normal-order (call-by-need) evaluation (or, semantically equivalent, lazy evaluation). In the simply typed lambda calculus, all evaluation eventually terminates, so both evaluation strategies result in the same values. However, when general recursion is added to the simply typed lambda calculus (via a fixpoint operator, for example) then evaluation of some expressions does not terminate. More expressions terminate with normal-order evaluation than with strict-order evaluation. In fact, if evaluation terminates in any order, then it terminates with normal-order evaluation.
I would like to discuss the possibility of a third, even laxer evaluation strategy for the typed lambda calculus that allows even more expressions to terminate. I did just say that normal-order evaluation is, in some sense, the best possible evaluation order, so, in order to beat it, we will be adding more redexes: the commuting conversions.
The typed lambda calculus enjoys certain commuting conversions for case expressions that allow every elimination term to pass through the case expression.
For example, the commuting conversion for the π₁
elimination term and the case
expression says that
π₁(case e₀ of σ₁ x. e₁ | σ₂ y. e₂)
converts to
case e₀ of σ₁ x. π₁(e₁) | σ₂ y. π₁(e₂)
These commuting conversions are required so that the subformula property holds.
My understanding is that a corollary of this says that
f(case e₀ of σ₁ x. e₁ | σ₂ y. e₂)
and
case e₀ of σ₁ x. f(e₁) | σ₂ y. f(e₂)
are denotationally equivalent whenever f
is a strict function.
I would like to develop a version of the lambda calculus that allows these two expressions to denote the same value for any f
.
Call this the unrestricted commuting conversion property.
A lambda calculus with this property would necessarily be parallel and thus will require a parallel evaluation strategy.
For example, the natural definition of or
becomes the parallel-or operation.
or x y := if x then True else y
This definition has the usual short-circuit property that or True ⊥
is True
where ⊥
is defined by
⊥ := fix id
If we use the unrestricted commuting conversion property then we also have the that or ⊥ True
is True
:
or ⊥ True
= {definition of or}
if ⊥ then True else True
= {β-expansion}
if ⊥ then const True ⟨⟩ else const True ⟨⟩
= {commuting}
const True (if ⊥ then ⟨⟩ else ⟨⟩)
= {β-reduction}
True
Hence or
is parallel-or.
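For contrast, transcribing the same definition into Haskell (orF is a name chosen here to avoid the Prelude's or) shows that ordinary call-by-need gives only the left short-circuit, because no commuting conversion is ever applied:

```haskell
-- The natural definition of 'or' under call-by-need: it must scrutinise
-- its first argument, so it short-circuits only on the left.
orF :: Bool -> Bool -> Bool
orF x y = if x then True else y
```

Here orF True undefined evaluates to True, but orF undefined True blows up on the undefined first argument; a parallel-or would return True in both cases.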
Other parallel functions, such as the majority function, also follow from their natural definitions.
maj x y z := if x then (or y z) else (and y z)
In this case maj ⊥ True True
is True
.
maj ⊥ True True
= {definition of maj}
if ⊥ then (or True True) else (and True True)
= {evaluation of (or True True) and (and True True)}
if ⊥ then True else True
= {commuting}
True
It is easy to verify that maj True ⊥ True
and maj True True ⊥
are also both True
.
My big question is whether we can devise some nice operational semantics for the lambda calculus that will have the unrestricted commuting conversions property that I desire. Below I document my first attempt at such operational semantics, but, spoiler alert, it does not work.
The usual rule for computing weak head normal form for the case expression case e₀ of σ₁ x. e₁ | σ₂ y. e₂
says to first convert e₀
to weak head normal form.
If it is σ₁ e₀′
then return the weak head normal form of e₁[x ↦ e₀′]
.
If it is σ₂ e₀′
then return the weak head normal form of e₂[y ↦ e₀′]
.
In addition to this rule, I want to add another rule for computing the weak head normal form for the case expression.
This alternative rule says that we compute the weak head normal forms of e₁
and e₂
.
If we get C₁ e₁′
and C₂ e₂′
respectively for introduction terms (a.k.a. constructors) C₁
and C₂
, and
if additionally C₁
= C₂
then return the following as a weak head normal form, C₁ (case e₀ of σ₁ x. e₁′ | σ₂ y. e₂′)
.
If C₁
≠ C₂
or if we get stuck on a neutral term (e.g. a variable), then this rule does not apply.
This new rule is in addition to the usual rule for case
. Any implementation must run these two rules in parallel because it is possible that either rule (or both) can result in non-termination when recursively computing weak head normal forms of sub-terms.
I suppose that in case one has unlifted products then when computing the weak head normal form of a case
expression having a product type or function type, one can immediately return
⟨case e₀ of σ₁ x. π₁ e₁ | σ₂ y. π₁ e₂, case e₀ of σ₁ x. π₂ e₁ | σ₂ y. π₂ e₂⟩
or
λz. case e₀ of σ₁ x. e₁ z | σ₂ y. e₂ z
This amended computation of weak head normal form seems to work for computing or
and maj
functions above so that they are non-strict in every argument, but there is another example where even this method of computing weak head normal form is not sufficient.
Consider the functions that implement associativity for the sum type.
assocL : A + (B + C) -> (A + B) + C
assocL z := case z of σ₁ a. σ₁ (σ₁ a) | σ₂ bc. (case bc of σ₁ b. σ₁ (σ₂ b) | σ₂ c. σ₂ c)

assocR : (A + B) + C -> A + (B + C)
assocR z := case z of σ₁ ab. (case ab of σ₁ a. σ₁ a | σ₂ b. σ₂ (σ₁ b)) | σ₂ c. σ₂ (σ₂ c)
Now let us use unrestricted commuting conversions to evaluate assocR (assocL (σ₂ ⊥))
.
assocR (assocL (σ₂ ⊥))
= { definition of assocL and case evaluation }
assocR (case ⊥. σ₁ b. σ₁ (σ₂ b) | σ₂ c. σ₂ c)
= { commuting conversion }
case ⊥. σ₁ b. assocR (σ₁ (σ₂ b)) | σ₂ c. assocR (σ₂ c)
= { definition of assocR and case evaluations }
case ⊥. σ₁ b. σ₂ (σ₁ b) | σ₂ c. σ₂ (σ₂ c)
= { commuting conversion }
σ₂ (case ⊥. σ₁ b. σ₁ b | σ₂ c. σ₂ c)
= { η-contraction for case }
σ₂ ⊥
Even if η-contraction is not a reduction rule used for computation, it is still the case that t
and case t. σ₁ b. σ₁ b | σ₂ c. σ₂ c
should always be denotationally equivalent.
Anyhow, we see that by using commuting conversions that a weak head normal form of assocR (assocL (σ₂ ⊥))
should expose the σ₂
constructor.
However, even if you apply my amended computation of weak head normal form, it will not produce any constructor.
What I find particularly surprising is the domain semantics of assocL
and assocR
.
assocL
seems to map σ₂ ⊥
to ⊥
because no constructor can be exposed.
assocR
maps ⊥
to ⊥
.
Therefore, according to the denotational semantics, the composition should map σ₂ ⊥
to ⊥
, but as we saw, under parallel evaluation it does not.
It would seem that the naive denotational semantics appears to not capture the semantics of parallel evaluation.
The term case ⊥. σ₁ b. σ₁ (σ₂ b) | σ₂ c. σ₂ c
seems to be more defined than ⊥, even though no constructor is available in the head position.
Although my attempt at nice operational semantics failed, I am still confident some nice computation method exists.
At the very least, I believe a rewriting system will work which has all the usual rewriting rules plus a few extra new redexes that says that an elimination term applied to the case
expression commutes the elimination term into all of the branches,
and another that says when all branches of a case expression contain the same introduction term, that introduction term is commuted to the outside of the case expression, and maybe also the rules I listed above for unlifted products.
I conjecture this rewriting system is confluent and unrestricted commuting conversions are convertable (probably requiring η-conversions as well).
Without proofs of my conjectures I am a little concerned that this all does not actually work out. There may be some bad interaction with fixpoints that I am not seeing. If this does all work out then shouldn’t I have heard about it by now?
In Monads in dynamic languages, I explained what the definition of a monad in a dynamic language should be and concluded that there’s nothing precluding them from existing. But I didn’t give an example either.
So, in case you are still wondering whether non-trivial monads are possible in a dynamic language, here you go. I’ll implement a couple of simple monads — Reader and Maybe — with proofs.
And all that will take place in the ultimate dynamic language — the (extensional) untyped lambda calculus.
The definitions of the Reader and Maybe monads are not anything special; they are the same definitions as you would write in a typed language, except Maybe is Church-encoded.
What I find fascinating about this is that despite the untyped language, which allows more things to go wrong than a typed one, the monad laws still hold. You can still write monadic code and reason about it in the untyped lambda calculus in the same way as you would do in Haskell.
return x = λr.x
a >>= k = λr.k(ar)r
return x >>= k
{ inline return }
= λr.x >>= k
{ inline >>= }
= λr.k((λr.x)r)r
{ β-reduce }
= λr.kxr
{ η-reduce }
= kx
a >>= return
{ inline return }
= a >>= λx.λr.x
{ inline >>= }
= λr.(λx.λr.x)(ar)r
{ β-reduce }
= λr.ar
{ η-reduce }
= a
a >>= f >>= g
{ inline 1st >>= }
= λr.f(ar)r >>= g
{ inline 2nd >>= }
= λr.g((λr.f(ar)r)r)r
{ β-reduce }
= λr.g(f(ar)r)r
a >>= (λx. f x >>= g)
{ inline 2nd >>= }
= a >>= λx.λr.g((fx)r)r
{ inline 1st >>= }
= λr.(λx.λr.g(fxr)r)(ar)r
{ β-reduce }
= λr.g(f(ar)r)r
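These λ-calculus definitions transcribe directly into Haskell, where Reader is just a function type (retR and bindR are names chosen here to avoid clashing with the Prelude), and the laws can be spot-checked at sample arguments:

```haskell
-- Reader transcribed from the untyped-λ definitions above.
retR :: a -> (r -> a)
retR x = \_ -> x

-- a >>= k = λr. k (a r) r
bindR :: (r -> a) -> (a -> r -> b) -> (r -> b)
bindR a k = \r -> k (a r) r
```

For example, with k x r = x + r, left identity gives bindR (retR 3) k 10 = k 3 10 = 13, matching the β/η derivation above.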
return x = λj.λn.jx
a >>= k = λj.λn.a(λx.kxjn)n
return x >>= k
{ inline return }
= λj.λn.jx >>= k
{ inline >>= }
= λj.λn.(λj.λn.jx)(λx.kxjn)n
{ β-reduce }
= λj.λn.kxjn
{ η-reduce }
= kx
a >>= return
{ inline return }
= a >>= λx.λj.λn.jx
{ inline >>= }
= λj.λn.a(λx.(λx.λj.λn.jx)xjn)n
{ β-reduce }
= λj.λn.a(λx.jx)n
{ η-reduce }
= λj.λn.ajn
{ η-reduce }
= a
a >>= f >>= g
{ inline 1st >>= }
= (λj.λn.a(λx.fxjn)n) >>= g
{ inline 2nd >>= }
= (λj.λn.(λj.λn.a(λx.fxjn)n)(λx.gxjn)n)
{ β-reduce }
= λj.λn.a(λx.fx(λx.gxjn)n)n
a >>= (λx. f x >>= g)
{ inline 2nd >>= }
= a >>= (λx.λj.λn.fx(λx.gxjn)n)
{ inline 1st >>= }
= λj.λn.a(λx.(λx.λj.λn.fx(λx.gxjn)n)xjn)n
{ β-reduce }
= λj.λn.a(λx.fx(λx.gxjn)n)n
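The Church-encoded Maybe also transcribes into Haskell. To keep this sketch free of higher-rank types it is monomorphised to Int; the type alias and the names retM, nothingM, bindM are choices made here, not from the post:

```haskell
-- Church-encoded Maybe: a value takes a "just" continuation j and a
-- "nothing" result n, monomorphised to Int results for this sketch.
type CMaybe = (Int -> Int) -> Int -> Int

-- return x = λj.λn. j x
retM :: Int -> CMaybe
retM x = \j _ -> j x

-- Nothing = λj.λn. n
nothingM :: CMaybe
nothingM = \_ n -> n

-- a >>= k = λj.λn. a (λx. k x j n) n
bindM :: CMaybe -> (Int -> CMaybe) -> CMaybe
bindM a k = \j n -> a (\x -> k x j n) n
```

Running bindM (retM 3) (\x -> retM (x + 1)) with the identity continuation and default 0 yields 4, while binding nothingM falls through to the default, just as the equational proofs predict.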
This year’s program is titled Types, Logic, Semantics, and Verification and features the following speakers:
The registration deadline is March 16, 2015.
Full information on registration and scholarships is available at https://www.cs.uoregon.edu/research/summerschool/.
Please address all inquiries to summerschool@cs.uoregon.edu.
Best regards from the OPLSS 2015 organizers!
Robert Harper
Greg Morrisett
Zena Ariola
It is often stated that Foldable
is effectively the toList
class. However, this turns out to be wrong. The real fundamental member of Foldable
is foldMap
(which should look suspiciously like traverse
, incidentally). To understand exactly why this is, it helps to understand another surprising fact: lists are not free monoids in Haskell.
This latter fact can be seen relatively easily by considering another list-like type:
data SL a = Empty | SL a :> a

instance Monoid (SL a) where
  mempty = Empty
  mappend ys Empty = ys
  mappend ys (xs :> x) = (mappend ys xs) :> x

single :: a -> SL a
single x = Empty :> x
So, we have a type SL a
of snoc lists, which are a monoid, and a function that embeds a
into SL a
. If (ordinary) lists were the free monoid, there would be a unique monoid homomorphism from lists to snoc lists. Such a homomorphism (call it h
) would have the following properties:
h [] = Empty
h (xs <> ys) = h xs <> h ys
h [x] = single x
And in fact, this (together with some general facts about Haskell functions) should be enough to define h
for our purposes (or any purposes, really). So, let's consider its behavior on two values:
h [1] = single 1
h [1,1..] = h ([1] <> [1,1..]) -- [1,1..] is an infinite list of 1s
          = h [1] <> h [1,1..]
This second equation can tell us what the value of h
is at this infinite value, since we can consider it the definition of a possibly infinite value:
x = h [1] <> x
  = fix (single 1 <>)

h [1,1..] = x
(single 1 <>) is a strict function, so the fixed point theorem tells us that x = ⊥.
This is a problem, though. Considering some additional equations:
[1,1..] <> [n] = [1,1..] -- true for all n
h [1,1..] = ⊥
h ([1,1..] <> [1]) = h [1,1..] <> h [1]
= ⊥ <> single 1
= ⊥ :> 1
≠ ⊥
So, our requirements for h
are contradictory, and no such homomorphism can exist.
The issue is that Haskell types are domains. They contain these extra partially defined values and infinite values. The monoid structure on (cons) lists has infinite lists absorbing all right-hand sides, while the snoc lists are just the opposite.
This also means that finite lists (or any method of implementing finite sequences) are not free monoids in Haskell. They, as domains, still contain the additional bottom element, and it absorbs all other elements, which is incorrect behavior for the free monoid:
pure x <> ⊥ = ⊥
h ⊥ = ⊥
h (pure x <> ⊥) = [x] <> h ⊥
= [x] ++ ⊥
= x:⊥
≠ ⊥
So, what is the free monoid? In a sense, it can't be written down at all in Haskell, because we cannot enforce value-level equations, and because we don't have quotients. But, if conventions are good enough, there is a way. First, suppose we have a free monoid type FM a
. Then for any other monoid m
and embedding a -> m
, there must be a monoid homomorphism from FM a
to m
. We can model this as a Haskell type:
forall a m. Monoid m => (a -> m) -> FM a -> m
Where we consider the Monoid m
constraint to be enforcing that m
actually has valid monoid structure. Now, a trick is to recognize that this sort of universal property can be used to define types in Haskell (or, GHC at least), due to polymorphic types being first class; we just rearrange the arguments and quantifiers, and take FM a
to be the polymorphic type:
newtype FM a = FM { unFM :: forall m. Monoid m => (a -> m) -> m }
Types defined like this are automatically universal in the right sense. [1] The only thing we have to check is that FM a
is actually a monoid over a
. But that turns out to be easily witnessed:
embed :: a -> FM a
embed x = FM $ \k -> k x
instance Monoid (FM a) where
  mempty = FM $ \_ -> mempty
  mappend (FM e1) (FM e2) = FM $ \k -> e1 k <> e2 k
Demonstrating that the above is a proper monoid delegates to instances of Monoid
being proper monoids. So as long as we trust that convention, we have a free monoid.
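Relying on that convention, the encoding runs as-is. Here is a self-contained sketch for modern GHC (where Monoid has a Semigroup superclass, so the mappend above moves to (<>)), together with a couple of eliminations into concrete monoids; the helper names toList' and total are mine:

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Monoid (Sum (..))

newtype FM a = FM { unFM :: forall m. Monoid m => (a -> m) -> m }

embed :: a -> FM a
embed x = FM $ \k -> k x

-- On modern GHC, Monoid requires Semigroup, so the text's mappend
-- becomes the (<>) method.
instance Semigroup (FM a) where
  FM e1 <> FM e2 = FM $ \k -> e1 k <> e2 k

instance Monoid (FM a) where
  mempty = FM $ \_ -> mempty

-- Eliminating into the list monoid and the Sum monoid:
toList' :: FM a -> [a]
toList' e = unFM e (\x -> [x])

total :: FM Int -> Int
total e = getSum (unFM e Sum)
```

Choosing the target monoid at elimination time is the whole trick: the same FM value yields a list under one instantiation and a sum under another.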
However, one might wonder what a free monoid would look like as something closer to a traditional data type. To construct that, first ignore the required equations, and consider only the generators; we get:
data FMG a = None | Single a | FMG a :<> FMG a
Now, the proper FM a
is the quotient of this by the equations:
None :<> x = x = x :<> None
x :<> (y :<> z) = (x :<> y) :<> z
One way of mimicking this in Haskell is to hide the implementation in a module, and only allow elimination into Monoid
s (again, using the convention that Monoid
ensures actual monoid structure) using the function:
unFMG :: forall a m. Monoid m => FMG a -> (a -> m) -> m
unFMG None _ = mempty
unFMG (Single x) k = k x
unFMG (x :<> y) k = unFMG x k <> unFMG y k
This is actually how quotients can be thought of in richer languages; the quotient does not eliminate any of the generated structure internally, it just restricts the way in which the values can be consumed. Those richer languages just allow us to prove equations, and enforce properties by proof obligations, rather than conventions and structure hiding. Also, one should note that the above should look pretty similar to our encoding of FM a
using universal quantification earlier.
Now, one might look at the above and have some objections. For one, we'd normally think that the quotient of the above type is just [a]
. Second, it seems like the type is revealing something about the associativity of the operations, because defining recursive values via left nesting is different from right nesting, and this difference is observable by extracting into different monoids. But aren't monoids supposed to remove associativity as a concern? For instance:
ones1 = embed 1 <> ones1
ones2 = ones2 <> embed 1
Shouldn't we be able to prove these are the same, because of an argument like:
ones1 = embed 1 <> (embed 1 <> ...)
... reassociate ...
= (... <> embed 1) <> embed 1
= ones2
The answer is that the equation we have only specifies the behavior of associating three values:
x <> (y <> z) = (x <> y) <> z
And while this is sufficient to nail down the behavior of finite values, and finitary reassociating, it does not tell us that infinitary reassociating yields the same value back. And the "... reassociate ..." step in the argument above was decidedly infinitary. And while the rules tell us that we can peel any finite number of copies of embed 1
to the front of ones1
or the end of ones2
, it does not tell us that ones1 = ones2
. And in fact it is vital for FM a
to have distinct values for these two things; it is what makes it the free monoid when we're dealing with domains of lazy values.
Finally, we can come back to Foldable
. If we look at foldMap
:
foldMap :: (Foldable f, Monoid m) => (a -> m) -> f a -> m
we can rearrange things a bit, and get the type:
Foldable f => f a -> (forall m. Monoid m => (a -> m) -> m)
And thus, the most fundamental operation of Foldable
is not toList
, but toFreeMonoid
, and lists are not free monoids in Haskell.
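Rearranged as an actual function, this hypothetical toFreeMonoid is just foldMap with its arguments flipped, landing in the FM encoding from earlier (repeated here so the sketch stands alone):

```haskell
{-# LANGUAGE RankNTypes #-}

-- The free monoid encoding from earlier in the post.
newtype FM a = FM { unFM :: forall m. Monoid m => (a -> m) -> m }

-- foldMap with its arguments rearranged: every Foldable container
-- eliminates into the free monoid over its elements.
toFreeMonoid :: Foldable f => f a -> FM a
toFreeMonoid t = FM $ \k -> foldMap k t
```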
[1]: What we are doing here is noting that (co)limits are objects that internalize natural transformations, but the natural transformations expressible by quantification in GHC are already automatically internalized using quantifiers. However, one has to be careful that the quantifiers are actually enforcing the relevant naturality conditions. In many simple cases they are.
The program consists of commands executed in order. There is a single one-bit command:
0: Delete the left-most data bit.
and a single two-bit command:
1 x: If the left-most data bit is 1, copy bit x to the right of the data string.
We halt if ever the data string is empty.
Remarkably, this is enough to do universal computation. Implementing it in Rust's macro system gives a proof (probably not the first one) that Rust's macro system is Turing-complete, aside from the recursion limit imposed by the compiler.
#![feature(trace_macros)]

macro_rules! bct {
    // cmd 0: d ... => ...
    (0, $($ps:tt),* ; $_d:tt)
        => (bct!($($ps),*, 0 ; ));
    (0, $($ps:tt),* ; $_d:tt, $($ds:tt),*)
        => (bct!($($ps),*, 0 ; $($ds),*));

    // cmd 1p: 1 ... => 1 ... p
    (1, $p:tt, $($ps:tt),* ; 1)
        => (bct!($($ps),*, 1, $p ; 1, $p));
    (1, $p:tt, $($ps:tt),* ; 1, $($ds:tt),*)
        => (bct!($($ps),*, 1, $p ; 1, $($ds),*, $p));

    // cmd 1p: 0 ... => 0 ...
    (1, $p:tt, $($ps:tt),* ; $($ds:tt),*)
        => (bct!($($ps),*, 1, $p ; $($ds),*));

    // halt on empty data string
    ( $($ps:tt),* ; )
        => (());
}

fn main() {
    trace_macros!(true);
    bct!(0, 0, 1, 1, 1 ; 1, 0, 1);
}
This produces the following compiler output:
bct! { 0 , 0 , 1 , 1 , 1 ; 1 , 0 , 1 }
bct! { 0 , 1 , 1 , 1 , 0 ; 0 , 1 }
bct! { 1 , 1 , 1 , 0 , 0 ; 1 }
bct! { 1 , 0 , 0 , 1 , 1 ; 1 , 1 }
bct! { 0 , 1 , 1 , 1 , 0 ; 1 , 1 , 0 }
bct! { 1 , 1 , 1 , 0 , 0 ; 1 , 0 }
bct! { 1 , 0 , 0 , 1 , 1 ; 1 , 0 , 1 }
bct! { 0 , 1 , 1 , 1 , 0 ; 1 , 0 , 1 , 0 }
bct! { 1 , 1 , 1 , 0 , 0 ; 0 , 1 , 0 }
bct! { 1 , 0 , 0 , 1 , 1 ; 0 , 1 , 0 }
bct! { 0 , 1 , 1 , 1 , 0 ; 0 , 1 , 0 }
bct! { 1 , 1 , 1 , 0 , 0 ; 1 , 0 }
bct! { 1 , 0 , 0 , 1 , 1 ; 1 , 0 , 1 }
bct! { 0 , 1 , 1 , 1 , 0 ; 1 , 0 , 1 , 0 }
bct! { 1 , 1 , 1 , 0 , 0 ; 0 , 1 , 0 }
bct! { 1 , 0 , 0 , 1 , 1 ; 0 , 1 , 0 }
bct! { 0 , 1 , 1 , 1 , 0 ; 0 , 1 , 0 }
...
bct.rs:19:13: 19:45 error: recursion limit reached while expanding the macro `bct`
bct.rs:19 => (bct!($($ps),*, 1, $p ; $($ds),*));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can try it online, as well.
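As a cross-check of the semantics, here is a sketch of a direct BCT interpreter in Haskell (my own addition, not part of the original post); stepping it on the same program and data reproduces the trace above:

```haskell
-- One BCT step: Nothing means the data string is empty, so we halt.
-- The program is cyclic, so executed commands rotate to the back.
step :: ([Int], [Int]) -> Maybe ([Int], [Int])
step (_, []) = Nothing
step (0 : ps, _ : ds) = Just (ps ++ [0], ds)
step (1 : x : ps, ds@(d : _)) =
  Just (ps ++ [1, x], if d == 1 then ds ++ [x] else ds)
step _ = Nothing  -- malformed program

-- The full (possibly infinite) run: every (program, data) state in order.
run :: [Int] -> [Int] -> [([Int], [Int])]
run ps ds = (ps, ds) : maybe [] (uncurry run) (step (ps, ds))
```

On the example input the run never halts, which is exactly why the macro version hits the compiler's recursion limit.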
I would much rather drop the commas and write
// cmd 0: d ... => ...
(0 $($ps:tt)* ; $_d:tt $($ds:tt)*)
    => (bct!($($ps)* 0 ; $($ds)*));

// cmd 1p: 1 ... => 1 ... p
(1 $p:tt $($ps:tt)* ; 1 $($ds:tt)*)
    => (bct!($($ps)* 1 $p ; 1 $($ds)* $p));

// cmd 1p: 0 ... => 0 ...
(1 $p:tt $($ps:tt)* ; $($ds:tt)*)
    => (bct!($($ps)* 1 $p ; $($ds)*));
but this runs into the macro future-proofing rules.
If we're required to have commas, then it's at least nice to handle them uniformly, e.g.
// cmd 0: d ... => ...
(0 $(, $ps:tt)* ; $_d:tt $(, $ds:tt)*)
    => (bct!($($ps),*, 0 ; $($ds),*));

// cmd 1p: 1 ... => 1 ... p
(1, $p:tt $(, $ps:tt)* ; $($ds:tt),*)
    => (bct!($($ps),*, 1, $p ; 1 $(, $ds)*, $p));

// cmd 1p: 0 ... => 0 ...
(1, $p:tt $(, $ps:tt)* ; $($ds:tt),*)
    => (bct!($($ps),*, 1, $p ; $($ds),*));
But this too is disallowed. An $x:tt
variable cannot be followed by a repetition $(...)*
, even though it's (I believe) harmless. There is an open RFC about this issue. For now I have to handle the "one" and "more than one" cases separately, which is annoying.
In general, I don't think macro_rules!
is a good language for arbitrary computation. This experiment shows the hassle involved in implementing one of the simplest known "arbitrary computations". Rather, macro_rules!
is good at expressing patterns of code reuse that don't require elaborate compile-time processing. It does so in a way that's declarative, hygienic, and high-level.
However, there is a big middle ground of non-elaborate, but non-trivial computations. macro_rules!
is hardly ideal for that, but procedural macros have problems of their own. Indeed, the bct!
macro is an extreme case of a pattern I've found useful in the real world. The idea is that every recursive invocation of a macro gives you another opportunity to pattern-match the arguments. Some of html5ever's macros do this, for example.
For decades my colleague, Guy Blelloch, and I have promoted a grand synthesis of the two “theories” of computer science, combinatorial theory and logical theory. It is only a small exaggeration to say that these two schools of thought work in isolation. The combinatorial theorists concern themselves with efficiency, based on hypothetical translations of high-level algorithms to low-level machines, and have no useful theory of composition, the most important tool for developing large software systems. Logical theorists concern themselves with composition, emphasizing the analysis of the properties of components of systems and how those components are combined; the heart of logic is a theory of composition (entailment). But relatively scant attention is paid to efficiency, and, to a distressingly large extent, the situation is worsening, and not improving.
Guy and I have argued, through our separate and joint work, for the applicability of PL ideas to algorithms design, leading, for example, to the concept of adaptive programming that Umut Acar has pursued aggressively over the last dozen years. And we have argued for the importance of cost analysis, for various measures of cost, at the level of the code that one actually writes, and not how it is compiled. Last spring, prompted by discussions with Anindya Banerjee at NSF in the winter of 2014, I decided to write a position paper on the topic, outlining the scientific opportunities and challenges that would arise in an attempt to unify the two, disparate theories of computing. I circulated the first draft privately in May, and revised it in July to prepare for a conference call among algorithms and PL researchers (sponsored by NSF) to find common ground and isolate key technical challenges to achieving its goals.
There are serious obstacles to be overcome if a grand synthesis of the “two theories” is to be achieved. The first step is to get the right people together to discuss the issues and to formulate a unified vision of what are the core problems, and what are promising directions for short- and long-term research. The position paper is not a proposal for funding, but is rather a proposal for a meeting designed to bring together two largely (but not entirely) disparate communities. In summer of 2014 NSF hosted a three-hour long conference call among a number of researchers in both areas with a view towards hosting a workshop proposal in the near future. Please keep an eye out for future developments.
I am grateful to Anindya Banerjee at NSF for initiating the discussion last winter that led to the paper and discussion, and I am grateful to Swarat Chaudhuri for his helpful comments on the proposal.
[Update: word smithing, corrections, updating, removed discussion of cost models for fuller treatment later, fixed incoherence after revision.]
Roughly 3/4 of a year after Chris Done first proposed his redesign, we finally went live with the new https://haskell.org homepage.
Much of the intermediate time was spent on cleanup of the introductory text, as well as adding features and tweaking designs. There was also a very lengthy process of setting everything up to ensure that the "try it" feature could be deployed and well supported. Finally, we had to do all the work to ensure both that the wiki remained hosted well somewhere else with proper rewrite rules, and that the non-wiki content hosted under haskell.org (things like /ghc, /platform, /hoogle, etc) continued to work. Some of the more publicly visible elements of this are the move of mailing list hosting to http://mail.haskell.org and of the wiki content to http://wiki.haskell.org.
When we did finally go live, we got great feedback on reddit and made #1 on hacker news.
There were also a lot of things we missed, which those threads pointed out -- in particular, we didn't pay enough attention to the content of various pages. The "documentation" and "community" sections needed lots of cleanup and expansion. Furthermore, the "downloads" page didn't point to the platform. Whoops! (It should be noted that whether or not to recommend the platform is an issue of some discussion now, for example on this active reddit thread.)
We think that we have fixed most of the issues raised. But there are still issues! The code sample for primes doesn't work in the repl. We have an active github ticket discussing the best way to fix this -- either change the sample, change the repl, or both. This is one of a number of issues under discussion on the github tracker.
We still want feedback, we still want patches, and there's still plenty to be made more awesome. For example: should we have some sort of integrated search? How? Where? The github infrastructure makes taking patches and tracking these issues easy. The repo now has 27 contributors, and we look forward to more.
One balance that has been tricky to strike has been in making the site maximally useful for new users just looking to learn and explore Haskell, while also providing access to the wide range of resources and entry points that the old wiki did. One advantage that we have now is that the wiki is still around, and is still great. Freed from needing to also serve as a default haskell landing page, the wiki can hopefully grow to be an even better resource. It needs volunteers to chip in and curate the content. And ideally it needs a new front page that highlights the things that you _won't_ find on the main homepage, instead of the things you now can. One place we could use more help is in wiki admins, if anyone wants to volunteer. We need people to curate the news and events sections, to fight spam, to manage plugins and investigate adding new features, to update and clean up the stylesheets, and to manage new user accounts. If you're interested, please get in touch at admin@h.o.
All the resources we have today are the result of our open-source culture, that believes in equal parts in education, innovation, and sharing. They stem from people who enjoy Haskell wanting to contribute back to others -- wanting to make more accessible the knowledge and tools they've struggled to master, and wanting to help us make even better and more powerful tools.
There's always more infrastructure work to be done, and there are always more ways to get involved. In a forthcoming blog, I'll write further about some of the other pressing issues we hope to tackle, and where interested parties can chip in.
Imagine a language, where all we have are strings and numbers, and where + is a built-in function that can either add numbers or concatenate strings. Consider the following code:
a + b
What are the types of a
, b
and +
?
Using a Haskell-like type signature we can say that + has either of these types:
+ :: (Number, Number) -> Number
or
+ :: (String, String) -> String
(Currying avoided intentionally.)
This is a classic case of ad-hoc polymorphism. With type classes one could say:
class Plus a where
  (+) :: (a, a) -> a

instance Plus Number where
  x + y = ... implement addition ...

instance Plus String where
  x + y = ... implement string concatenation ...
That’s great! Now we can type our a + b
:
a + b :: Plus t => t
Where a :: t
and b :: t
.
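In real Haskell the class can be sketched as follows; using Double to stand in for the toy language's Number type, and an uncurried plus (so it doesn't clash with Prelude's +), are both my assumptions:

```haskell
{-# LANGUAGE FlexibleInstances #-}

-- Named plus rather than (+) to avoid shadowing the Prelude operator.
class Plus a where
  plus :: (a, a) -> a

instance Plus Double where
  plus (x, y) = x + y

-- String is a synonym for [Char], hence FlexibleInstances above.
instance Plus String where
  plus (x, y) = x ++ y
```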
Yes, there are also Scala-style implicits (recently discussed here in a proposal for OCaml), and probably other solutions I’m less aware of.
Notice that in the + example, a constraint on a single type (expressed through type class requirements) was enough to solve the problem.
Now, let’s look at a more complicated problem, involving multiple types at once.
Consider a language with only parameterized lists [a]
and maps from strings to some type, Map a
. Now, throw in a terribly ambiguous syntax where brackets denote either accessing a list at some index, or accessing a map at some key (wat):
x[i]
That syntax means “access the container x at index/key i”. Now, what are the types of x
, i
and the brackets? There are two possibilities: if x
is an array, then i
must be a number; otherwise if x
is a map, then i
is a string.
Type families can be used to encode the above constraints:
class Indexable a where
  type Index a
  type Result a
  atIndex :: (a, Index a) -> Result a
The syntax means that any instance of the type class Indexable a
“comes with” two accompanying types, Index a
and Result a
which are uniquely determined by the appropriate choice of a
. For [t], Index = Number and Result = t. For Map t, Index = String and Result = t.
Now we just need syntax sugar where x[i] = x `atIndex` i
. We could then define instances for our two types, [a] and Map a (remember, in our language the map keys are always Strings):
instance Indexable [a] where
  type Index [a] = Number
  type Result [a] = a
  atIndex = ... implement list lookup by index ...

instance Indexable (Map a) where
  type Index (Map a) = String
  type Result (Map a) = a
  atIndex = ... implement map lookup by key ...
Nice. Now, to type our original expression x[i]
:
x[i] :: Indexable a => Result a
Where x :: a
and i :: Index a
.
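A runnable sketch of the same idea, with Int standing in for the toy language's Number and Data.Map with String keys standing in for its maps (both stand-ins are my assumptions):

```haskell
{-# LANGUAGE TypeFamilies #-}
import qualified Data.Map as Map

-- Each instance fixes the index and result types for its container.
class Indexable a where
  type Index a
  type Result a
  atIndex :: (a, Index a) -> Result a

instance Indexable [t] where
  type Index [t] = Int
  type Result [t] = t
  atIndex (xs, i) = xs !! i

instance Indexable (Map.Map String t) where
  type Index (Map.Map String t) = String
  type Result (Map.Map String t) = t
  atIndex (m, k) = m Map.! k
```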
Great! Type families (or rather, “type dependencies”) provide a way to represent inter-type constraints and can be used to resolve ambiguous expressions during type inference. (I’ve heard that type families are equivalent to functional dependencies or even GADTs for some value of “equivalent” , maybe where “equivalent = not equivalent at all”, but that’s off-topic.) See also
a valid Haskell implementation of the above example (thanks to Eyal Lotem).
I don’t know if Scala-style implicits can be used to such effect – let me know if you do.
Now, here’s an altogether different way to approach the ad-hoc polymorphism problem. This was my idea for a solution to ad-hoc polymorphism with inter-type constraints, before I realized type families could do that too.
Define an “ambiguous type” as a type that represents a disjunction between a set of known types. For example, for + we could say:
+ :: (a = (String | Number)) => (a, a) -> a
The type variable a
is constrained to be either a String or a Number, explicitly, without a type class. I see two main differences with type classes, from a usage point of view. First, this type is closed: because there is now class, you can’t define new instances. It must be a Number or a String. Second, you don’t need to add + to some class, and if we have more operators that require a similar constiant, or even user-defined functions that require some kind of ad-hoc constraint, we don’t need to define a type class and add functions/operators to it. Lastly, the type is straightforward, and is easier to explain to people familiar with types but not with type classes.
By the way, I have no idea (yet) if this leads to a sound type system.
Let’s continue to our more complicated, ambiguous “atIndex” function that can either index into lists (using a number) or into maps (using a string). A simple disjunction (or, |) is not enough: we must express the dependencies between the container type and the index type. To that end we add conjunctions (and, &) so we can express arbitrary predicates on types, such as (t & s) | u
. The type of atIndex will now be:
atIndex :: (a = [t] & i = Number) | (a = Map t & i = String) => a -> i -> t
This definitely does the job. Is it sound? Will it blend? I don’t know (yet).
The main drawback of this system is the combinatorial explosion of the predicate when combining ambiguous (overloaded) functions, such as in the following program:
x[i] + y[j]
x
could be either a list or map of either numbers or strings, and so can y
, so i
and j
can be either numbers or strings (to index into the lists or maps). We quickly get to a very large predicate expression, that is slow to analyze and more importantly, very hard to read and understand.
Nevertheless, I implemented it.
Infernu is a type inference engine for JavaScript. All of the examples described above are “features” of the JavaScript language. One of the goals of Infernu is to infer the safest most general type. JS expressions such as x[i] + d
should be allowed to be used in a way that makes sense. To preserve safety, Infernu doesn’t allow implicit coercions anywhere, including in +, or when indexing into a map (JS objects can act as string-keyed maps). To retain a pseudo-dynamic behavior safely, polymorphism with fancy type constraints as discussed above are required.
To properly implement ambiguous types I had to solve a bunch of implementation problems. Among them, representing types qualified by a constraint predicate:
data QualType t = QualType (Pred t) (Type t)
Then, the unification function is left unchanged: when doing inference, I also call the constraint unifier separately, using its result to "annotate" the inferred type with a constraint predicate.
Apparently, the ambiguous types are not a good solution due to the complex type signatures. I'll either leave the ambiguous types in (having already implemented them) or just get rid of them and implement type families, which will require another crusade on my code.
I’m still thinking about whether or not type families cover all cases that my ambiguous types can. One example is the “type guard” pattern common in JavaScript:
if (typeof obj == 'string') { .. here obj should be treated as a string! ... }
Can ambiguous types and type families both be used coherently to implement compile-time checking inside type guards? (Haven’t given it much thought – so far I think yes for both.)
Also, since ambiguous types are closed, they may offer some features that type families can’t, such as warnings about invalid or incomplete guards, which can be seen as type pattern matching. Maybe closed type families are a solution: I don’t know much about them yet.
I also don’t know if ambiguous types lead to a sound type system or are there pitfalls I haven’t considered.
Remember that these ambiguous types may also interact with many features: parametric polymorphism, row-type polymorphism, the requirement for principal types and full type inference without annotations, etc.
Lastly, I’ve probably re-invented the wheel and somebody has written a bunch of papers in 1932, and there’s some well-accepted wisdom I’ve somehow missed.
Thanks to Eyal Lotem for a short, but very fruitful conversation where he showed me how type families may obsolete my idea.
Hi *,
It's time for the GHC weekly news. It's been a particularly quiet week again, and the ghc-7.10 branch has also seen little activity, so the notes are relatively short this week.
This week, GHC HQ met up to discuss some new stuff:
As usual, we've had a healthy amount of random assorted chatter on the mailing lists:
Some noteworthy commits that went into ghc.git in the past week include:
Closed tickets the past week include: #10047, #10082, #10019, #10007, #9930, #10085, #10080, #9266, #10095, and #3649.
Summary: Don't use nub
. A much faster alternative is nubOrd
from the extra
package.
The Haskell Data.List
module contains the function nub
, which removes duplicate elements. As an example:
nub [1,2,1,3] == [1,2,3]
The function nub
has the type Eq a => [a] -> [a]
. The complexity of take i $ nub xs
is O(length xs * i). Assuming all elements are distinct and you want them all, that is O(length xs ^ 2). If we only have an Eq
instance, that's the best complexity we can achieve. The reason is that given a list as ++ [b]
, to check if b
should be in the output requires checking b
for equality against nub as
, which requires a linear scan. Since checking each element requires a linear scan, we end up with a quadratic complexity.
However, if we have an Ord
instance (as we usually do), we have a complexity of O(length xs * log i) - a function that grows significantly slower. The reason is that we can build a balanced binary-tree for the previous elements, and check each new element in log time. Does that make a difference in practice? Yes. As the graph below shows, by the time we get to 10,000 elements, nub
is 70 times slower. Even at 1,000 elements nub
is 8 times slower.
The fact nub
is dangerous isn't new information, and I even suggested changing the base
library in 2007. Currently there seems to be a nub
hit squad, including Niklas Hambüchen, who go around raising tickets against various projects suggesting they avoid nub
. To make that easier, I've added nubOrd
to my extra
package, in the Data.List.Extra
module. The function nubOrd
has exactly the same semantics as nub
(both strictness properties and ordering), but is asymptotically faster, so is almost a drop-in replacement (just the additional Ord
context).
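The balanced-tree idea can be sketched with Data.Set; this is not the actual implementation in extra, just an illustration (my own) of how the O(length xs * log i) bound arises while keeping nub's laziness and ordering:

```haskell
import qualified Data.Set as Set

-- Keep a set of elements seen so far; each membership test is O(log i)
-- rather than the O(i) scan that nub performs.
nubOrdSketch :: Ord a => [a] -> [a]
nubOrdSketch = go Set.empty
  where
    go _ [] = []
    go seen (x : xs)
      | x `Set.member` seen = go seen xs
      | otherwise = x : go (Set.insert x seen) xs
```

Note that the recursion produces each output element before examining the rest of the input, so it works on infinite lists just like nub.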
For the curious, the above graph was generated in Excel, with the code below. I expect the spikes in nub
correspond to garbage collection, or just random machine fluctuations.
import Control.Exception
import Data.List.Extra
import Control.Monad
import System.Time.Extra
benchmark xs = do
    n <- evaluate $ length xs
    (t1,_) <- duration $ evaluate $ length $ nub xs
    (t2,_) <- duration $ evaluate $ length $ nubOrd xs
    putStrLn $ show n ++ "," ++ show t1 ++ "," ++ show t2

main = do
    forM_ [0,100..10000] $ \i -> benchmark $ replicate i 1
    forM_ [0,100..10000] $ \i -> benchmark [1..i]
I originally wrote this content as a chapter of Mezzo Haskell. I'm going to be starting up a similar effort to Mezzo Haskell in the next few days, and I wanted to get a little more attention on this content to get feedback on style and teaching approach. I'll be discussing that new initiative on the Commercial Haskell mailing list.
The point of this chapter is to help you peel back some of the layers of abstraction in Haskell coding, with the goal of understanding things like primitive operations, evaluation order, and mutation. Some concepts covered here are generally "common knowledge" in the community, while others are less well understood. The goal is to cover the entire topic in a cohesive manner. If a specific section seems like it's not covering anything you don't already know, skim through it and move on to the next one.
While this chapter is called "Primitive Haskell," the topics are very much GHC-specific. I avoided calling it "Primitive GHC" for fear of people assuming it was about the internals of GHC itself. To be clear: these topics apply to anyone compiling their Haskell code using the GHC compiler.
Note that we will not be fully covering all topics here. There is a "further reading" section at the end of this chapter with links for more details.
Let's start with a really simple question: tell me how GHC deals with the
expression 1 + 2
. What actually happens inside GHC? Well, that's a bit of a
trick question, since the expression is polymorphic. Let's instead use the more
concrete expression 1 + 2 :: Int
.
The +
operator is actually a method of the Num
type class, so we need to look at the Num Int
instance:
instance Num Int where
    I# x + I# y = I# (x +# y)
Huh... well that looks somewhat magical. Now we need to understand both the
I#
constructor and the +#
operator (and what's with the hashes all of a
sudden?). If we do a Hoogle
search, we can easily
find the relevant
docs,
which leads us to the following definition:
data Int = I# Int#
So our first lesson: the Int
data type you've been using since you first
started with Haskell isn't magical at all, it's defined just like any other
algebraic data type... except for those hashes. We can also search for
+#
, and end up at
some
documentation
giving the type:
(+#) :: Int# -> Int# -> Int#
Now that we know all the types involved, go back and look at the Num
instance
I quoted above, and make sure you feel comfortable that all the types add up
(no pun intended). Hurrah, we now understand exactly how addition of Int
s
works. Right?
Well, not so fast. The Haddocks for +#
have a very convenient source link...
which (apparently due to a Haddock bug) doesn't actually work. However, it's
easy enough to find the correct hyperlinked
source.
And now we see the implementation of +#
, which is:
infixl 6 +#
(+#) :: Int# -> Int# -> Int#
(+#) = let x = x in x
That doesn't look like addition, does it? In fact, let x = x in x
is another
way of saying bottom, or undefined
, or infinite loop. We have now officially
entered the world of primops.
primops, short for primitive operations, are core pieces of functionality
provided by GHC itself. They are the magical boundary between "things we do in
Haskell itself" and "things which our implementation provides." This division
is actually quite elegant; as we already explored, the standard +
operator
and Int
data type you're used to are actually themselves defined in normal
Haskell code, which provides many benefits: you get standard type class
support, laziness, etc. We'll explore some of that in more detail later.
Look at the implementation of other functions in
GHC.Prim
;
they're all defined as let x = x in x
. When GHC reaches a call to one of
these primops, it automatically replaces it with the real implementation for
you, which will be some assembly code, LLVM code, or something similar.
Why do all of these functions end in a #
? That's called the magic hash
(enabled by the MagicHash
language extension), and it is a convention to
distinguish boxed and unboxed types and operations. Which, of course, brings us
to our next topic.
The I#
constructor is actually just a normal data constructor in Haskell,
which happens to end with a magic hash. However, Int#
is not a normal
Haskell data type. In GHC.Prim
, we can see that its implementation is:
data Int#
Which, like everything else in GHC.Prim
is really a lie. In fact, it's
provided by the implementation, and is in fact a normal long int
from C
(32-bit or 64-bit, depending on architecture). We can see something even
funnier about it in GHCi:
> :k Int
Int :: *
> :k Int#
Int# :: #
That's right, Int#
has a different kind than normal Haskell datatypes: #
.
To quote the GHC
docs:
Most types in GHC are boxed, which means that values of that type are represented by a pointer to a heap object. The representation of a Haskell
Int
, for example, is a two-word heap object. An unboxed type, however, is represented by the value itself, no pointers or heap allocation are involved.
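To make the distinction concrete, here's a small example using the MagicHash extension: it unpacks the boxed Ints, adds the raw machine integers with +#, and boxes the result, mirroring the Num Int instance quoted earlier (the name addInt is mine):

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (..), (+#))

-- Pattern matching on I# exposes the unboxed Int#; (+#) operates on
-- the raw machine integers; applying I# boxes the result again.
addInt :: Int -> Int -> Int
addInt (I# x) (I# y) = I# (x +# y)
```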
See those docs for more information on distinctions between boxed and unboxed types. It is vital to understand those differences when working with unboxed values. However, we're not going to go into those details now. Instead, let's sum up what we've learnt so far:
- Int addition is just normal Haskell code in a typeclass
- Int itself is a normal Haskell datatype
- GHC provides Int# and +# as an unboxed long int and addition on that type, respectively. This is exported by GHC.Prim, but the real implementation is "inside" GHC
- An Int contains an Int#, which is an unboxed type
- Addition of Ints takes advantage of the +# primop

Alright, we understand basic addition! Let's make things a bit more complicated. Consider the program:
main = do
    let x = 1 + 2
        y = 3 + 4
    print x
    print y
We know for certain that the program will first print 3
, and then print 7
.
But let me ask you a different question. Which operation will GHC perform
first: 1 + 2
or 3 + 4
? If you guessed 1 + 2
, you're probably right, but
not necessarily! Thanks to referential transparency, GHC is fully within its
rights to rearrange evaluation of those expressions and add 3 + 4
before
1 + 2
. Since neither expression depends on the result of the other, we
know that it is irrelevant which evaluation occurs first.
Note: This is covered in much more detail on the GHC wiki's evaluation order and state tokens page.
That begs the question: if GHC is free to rearrange evaluation like that, how
could I say in the previous paragraph that the program will always print 3
before printing 7
? After all, it doesn't appear that print y
uses the
result of print x
at all, so we not rearrange the calls? To answer that, we
again need to unwrap some layers of abstraction. First, let's evaluate and
inline x
and y
and get rid of the do
-notation sugar. We end up with the
program:
main = print 3 >> print 7
We know that print 3
and print 7
each have type IO ()
, so the >>
operator being used comes from the Monad IO
instance. Before we can understand that, though, we need to look at the definition of IO
itself
newtype IO a = IO (State# RealWorld -> (# State# RealWorld, a #))
We have a few things to understand about this line. Firstly,
State#
and
RealWorld
.
For now, just pretend like they are a single type; we'll see when we get to
ST
why State#
has a type parameter.
The other thing to understand is that (# ... #)
syntax. That's an unboxed
tuple, and it's a way of returning multiple values from a function. Unlike a
normal, boxed tuple, unboxed tuples involve no extra allocation and create no
thunks.
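To make unboxed tuples concrete, here is a small standalone sketch of our own (the name quotRem# is ours, chosen for illustration; GHC.Prim in fact ships a similar quotRemInt# primop):

```haskell
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnboxedTuples #-}
import GHC.Exts (Int (I#), Int#, quotInt#, remInt#)

-- Return quotient and remainder at once; the unboxed tuple means
-- no heap allocation for the pair and no thunks for its components.
quotRem# :: Int# -> Int# -> (# Int#, Int# #)
quotRem# n d = (# n `quotInt#` d, n `remInt#` d #)

main :: IO ()
main = case quotRem# 17# 5# of
    (# q, r #) -> print (I# q, I# r)  -- prints (3,2)
```

Note that an unboxed tuple can only be consumed by immediately pattern matching on it, as above; it is not a first-class value you can store in a list or pass around boxed.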
So IO
takes a real world state, and gives you back a real world state and
some value. And that right there is how we model side effects and mutation in a
referentially transparent language. You may have heard the description of IO
as "taking the world and giving you a new one back." What we're doing here is
threading a specific state token through a series of function calls. By
creating a dependency on the result of a previous function, we are able to
ensure evaluation order, yet still remain purely functional.
Let's see this in action, by coming back to our example from above. We're now
ready to look at the Monad IO
instance:
instance Monad IO where
    (>>) = thenIO

thenIO :: IO a -> IO b -> IO b
thenIO (IO m) k = IO $ \ s -> case m s of (# new_s, _ #) -> unIO k new_s

unIO :: IO a -> (State# RealWorld -> (# State# RealWorld, a #))
unIO (IO a) = a
(Yes, I changed things a bit to make them easier to understand. As an exercise,
compare that this version is in fact equivalent to what is actually defined in
GHC.Base
.)
Let's inline these definitions into print 3 >> print 7
:
main = IO $ \s0 ->
    case unIO (print 3) s0 of
        (# s1, res1 #) -> unIO (print 7) s1
Notice how, even though we ignore the result of print 3
(the res1
value), we still depend on the new state token s1
when we evaluate print 7
,
which forces the order of evaluation to first evaluate print 3
and then
evaluate print 7
.
If you look through GHC.Prim
, you'll see that a number of primitive
operations are defined in terms of State# RealWorld
or State# s
, which
allows us to force evaluation order.
Exercise: implement a function getMaskingState :: IO Int
using the
getMaskingState#
primop and the IO
data constructor.
Let's compare the definitions of the IO
and ST
types:
newtype IO a = IO (State# RealWorld -> (# State# RealWorld, a #))
newtype ST s a = ST (State# s -> (# State# s, a #))
Well that looks oddly similar. Said more precisely, IO
is isomorphic to ST
RealWorld
. ST
works under the exact same principles as IO
for threading
state through, which is why we're able to have things like mutable references
in the ST
monad.
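This state threading is exactly what lets ST offer mutable references behind a pure interface. A small self-contained example of our own, using only Control.Monad.ST and Data.STRef from base:

```haskell
import Control.Monad.ST (runST)
import Data.STRef (modifySTRef', newSTRef, readSTRef)

-- Sum a list using a genuinely mutable accumulator; runST guarantees
-- the mutation cannot leak, so the function is observably pure.
sumST :: [Int] -> Int
sumST xs = runST (do
    ref <- newSTRef 0
    mapM_ (\x -> modifySTRef' ref (+ x)) xs
    readSTRef ref)

main :: IO ()
main = print (sumST [1 .. 10])  -- prints 55
```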
By using an uninstantiated s
value, we can ensure that we aren't "cheating"
and running arbitrary IO
actions inside an ST
action. Instead, we just have
"local state" modifications, for some definition of local state. The details of
using ST
correctly and the Rank2Types approach to runST
are interesting,
but beyond the scope of this chapter, so we'll stop discussing them here.
Since ST RealWorld
is isomorphic to IO
, we should be able to convert
between the two of them. base
does in fact provide the
stToIO
function.
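For instance (our own example; stToIO itself lives in Control.Monad.ST in base):

```haskell
import Control.Monad.ST (ST, runST, stToIO)
import Data.STRef (modifySTRef', newSTRef, readSTRef)

-- An ST computation with some local mutable state.
bump :: ST s Int
bump = do
    ref <- newSTRef (41 :: Int)
    modifySTRef' ref (+ 1)
    readSTRef ref

-- Because ST RealWorld is isomorphic to IO, the same computation can
-- either be run purely (runST) or embedded in IO (stToIO).
main :: IO ()
main = stToIO bump >>= print  -- prints 42
```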
Exercise: write a pair of functions to convert between IO a
and ST
RealWorld a
.
Exercise: GHC.Prim
has a section on mutable
variables,
which forms the basis of IORef
and STRef
. Provide a new implementation of
STRef
, including newSTRef,
readSTRef, and
writeSTRef.
It's a bit unfortunate that we have to have two completely separate sets of
APIs: one for IO
and another for ST
. One common example of this is IORef
and STRef
, but- as we'll see at the end of this section- there are plenty of
operations that we'd like to be able to generalize.
This is where PrimMonad
, from the primitive
package, comes into play. Let's
look at its definition:
-- | Class of primitive state-transformer monads
class Monad m => PrimMonad m where
    -- | State token type
    type PrimState m

    -- | Execute a primitive operation
    primitive :: (State# (PrimState m) -> (# State# (PrimState m), a #)) -> m a
Note: I have not included the internal
method, since it will likely be
removed. In fact, at the time
you're reading this, it may already be gone!
PrimState
is an associated type giving the type of the state token. For IO
,
that's RealWorld
, and for ST s
, it's s
. primitive
gives a way to lift
the internal implementation of both IO
and ST
to the monad under question.
Exercise: Write implementations of the PrimMonad IO
and PrimMonad (ST s)
instances, and compare against the real ones.
The primitive package provides a number of wrappers around types and functions
from GHC.Prim
and generalizes them to both IO
and ST
via the PrimMonad
type class.
Exercise: Extend your previous STRef
implementation to work in any
PrimMonad
. After you're done, you may want to have a look at
Data.Primitive.MutVar.
The vector
package builds on top of the primitive
package to provide
mutable vectors that can be used from both IO
and ST
. This chapter is not
a tutorial on the vector
package, so we won't go into any more details now.
However, if you're curious, please look through the
Data.Vector.Generic.Mutable
docs.
To tie this off, we're going to implement a ReaderIO
type. This will flatten
together the implementations of ReaderT
and IO
. Generally speaking, there's
no advantage to doing this: GHC should always be smart enough to generate the
same code for this and for ReaderT r IO
(and in my benchmarks, they perform
identically). But it's a good way to test that you understand the details here.
You may want to try implementing this yourself before looking at the implementation below.
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE UnboxedTuples #-}
import Control.Applicative (Applicative (..))
import Control.Monad (ap, liftM)
import Control.Monad.IO.Class (MonadIO (..))
import Control.Monad.Primitive (PrimMonad (..))
import Control.Monad.Reader.Class (MonadReader (..))
import GHC.Base (IO (..))
import GHC.Prim (RealWorld, State#)

-- | Behaves like a @ReaderT r IO a@.
newtype ReaderIO r a = ReaderIO
    (r -> State# RealWorld -> (# State# RealWorld, a #))

-- standard implementations...
instance Functor (ReaderIO r) where
    fmap = liftM

instance Applicative (ReaderIO r) where
    pure = return
    (<*>) = ap

instance Monad (ReaderIO r) where
    return x = ReaderIO $ \_ s -> (# s, x #)
    ReaderIO f >>= g = ReaderIO $ \r s0 ->
        case f r s0 of
            (# s1, x #) ->
                let ReaderIO g' = g x
                 in g' r s1

instance MonadReader r (ReaderIO r) where
    ask = ReaderIO $ \r s -> (# s, r #)
    local f (ReaderIO m) = ReaderIO $ \r s -> m (f r) s

instance MonadIO (ReaderIO r) where
    liftIO (IO f) = ReaderIO $ \_ s -> f s

instance PrimMonad (ReaderIO r) where
    type PrimState (ReaderIO r) = RealWorld
    primitive f = ReaderIO $ \_ s -> f s

    -- Cannot properly define internal, since there's no way to express a
    -- computation that requires an @r@ input value as one that doesn't. This
    -- limitation of @PrimMonad@ is being addressed:
    --
    -- https://github.com/haskell/primitive/pull/19
    internal (ReaderIO f) =
        f (error "PrimMonad.internal: environment evaluated")
Exercise: Modify the ReaderIO
monad to instead be a ReaderST
monad, and
take an s
parameter for the specific state token.
For the last 3 or so weeks I’ve been writing a bunch of Twelf code for my research (hence my flat-lined github punch card). Since it’s actually a lot of fun I thought I’d share a bit about Twelf.
Since Twelf isn’t a terribly well known language it’s worth stating what exactly it is we’re talking about. Twelf is a proof assistant. It’s based on a logic called LF (similarly to how Coq is based on CiC).
Twelf is less powerful than some other proof assistants but by limiting some of its power it’s wonderfully suited to proving certain types of theorems. In particular, Twelf admits true “higher order abstract syntax” (don’t worry if you don’t know what this means) this makes it great for formalizing programming languages with variable bindings.
In short, Twelf is a proof assistant which is very well suited for defining and proving things about programming languages.
It’s much more fun to follow along with a tutorial if you actually have a Twelf installation to try out the code. You can download and compile the sources to Twelf with SML/NJ or MLton. You could also use smackage to get the compiler.
Once you’ve compiled the thing you should be left with a binary twelf-server
. This is your primary way of interacting with the Twelf system. There’s quite a slick Emacs interface to smooth over this process. If you’ve installed twelf into a directory ~/twelf/
all you need is the incantation
(setq twelf-root "~/twelf/")
(load (concat twelf-root "emacs/twelf-init.el"))
Without further ado, let’s look at some Twelf code.
When writing Twelf code we encode the thing that we’re studying, the object language, as a bunch of type families and constructors in Twelf. This means that when we edit a Twelf file we’re just writing signatures.
For example, if we want to encode natural numbers we’d write something like
nat : type.
z : nat.
s : nat -> nat.
This is an LF signature: we declare a series of constants with NAME : TYPE.
. Note the period at the end of each declaration. First we start by declaring a type for natural numbers called nat
with nat : type.
Here type
is the base kind of all types in Twelf. Next we go on to declare what the values of type nat
are.
In this case there are two constructors for nat
. We either have zero, z
, or the successor of another value of type nat
, s
. This gives us a canonical forms lemma for natural numbers: All values of type nat
are either

- z, or
- s N for some value N : nat
Later on, we’ll justify the proofs we write with this lemma.
Anyways, now that we’ve encoded the natural numbers I wanted to point out a common point of confusion about Twelf. We’re not writing programs to be run. We’re writing programs exclusively for the purpose of typechecking. Heck, we’re not even writing programs at the term level! We’re just writing a bunch of constants out with their types! More than this even, Twelf is defined so that you can only write canonical forms. This means that if you write something in your program, it has to be in normal form, fully applied! In PL speak it has to be β-normal and η-long. This precludes actually writing programs for the sake of reducing them. You’re never going to write a web server in Twelf; you won’t even be writing “Hello World”. You might use it to verify the language you’re writing them in, though.
Now that we’ve gotten the awkward bit out the way, let’s now define a Twelf encoding of a judgment. We want to encode the judgment +
which is given by the following rules
—————————
z + n = n
m + n = p
———————————————
s(m) + n = s(p)
In the rest of the world we have this idea that propositions are types. In Twelf, we’re worried about defining logics and systems, so we have the metatheoretic equivalent: judgments are types.
So we define a type family plus
.
plus : nat -> nat -> nat -> type.
So plus
is a type indexed over 3 natural numbers. This is our first example of dependent types: plus
is a type which depends on 3 terms. Now we can list out how to construct a derivation of plus
. This means that inference rules in a meta theory corresponds to constants in Twelf as well.
plus/z : {n : nat} plus z n n
This is some new syntax, in Twelf {NAME : TYPE} TYPE
is a dependent function type, a pi type. This notation is awfully similar to Agda and Idris if you’re familiar with them. This means that this constructor takes a natural number, n
and returns a derivation that plus z n n
. The fact that the return type depends on what nat
we supply is why this needs a dependent type.
In fact, this is such a common pattern that Twelf has sugar for it. If we write an unbound capital variable name Twelf will automagically introduce a binder {N : ...}
at the front of our type. We can thus write our inference rules as
plus/z : plus z N N.
plus/s : plus N M P -> plus (s N) M (s P).
These rules, together with our declaration of plus, complete the definition. In fact, there’s something kinda special about these two rules. We know that for any term n : nat
which is in canonical form, there should be an applicable rule. In Twelf speak, we say that this type family is total.
We can ask Twelf to check this fact for us by saying
plus : nat -> nat -> nat -> type.
%mode plus +N +M -P.
plus/z : plus z N N.
plus/s : plus N M P -> plus (s N) M (s P).
%worlds () (plus _ _ _).
%total (N) (plus N _ _).
We want to show that for all terms n, m : nat
in canonical form, there is a term p
in canonical form so that plus n m p
. This sort of theorem is what we’d call a ∀∃-theorem. This is literally because it’s a theorem of the form “∀ something. ∃ something. so that something”. These are the sort of thing that Twelf can help us prove.
Here’s the workflow for writing one of these proofs in Twelf:

- Give a %mode specification to say what is bound in the ∀ and what is bound in the ∃.
- Specify the contexts the proof holds in with %worlds; usually we want to say the empty context, ().
- Check the proof with %total, where the N specifies what to induct on.

In our case we have a case for each canonical form of nat, so our type family is total. This means that our theorem passes. Hurray!
Believe it or not this is what life is like in Twelf land. All the code I’ve written these last couple of weeks is literally type signatures and 5 occurrences of %total
. What’s kind of fun is how unreasonably effective a system this is for proving things.
Let’s wrap things up by proving one last theorem, if plus A B N
and plus A B M
both have derivations, then we should be able to show that M
and N
are the same. Let’s start by defining what it means for two natural numbers to be the same.
nat-eq : nat -> nat -> type.
nat-eq/r : nat-eq N N.
nat-eq/s : nat-eq N M -> nat-eq (s N) (s M).
I’ve purposefully defined this so it’s amenable to our proof, but it’s still a believable formulation of equality. It’s reflexive and if N
is equal to M
, then s N
is equal to s M
. Now we can actually state our proof.
plus-fun : plus N M P -> plus N M P' -> nat-eq P P' -> type.
%mode plus-fun +A +B -C.
Our theorem says if you give us two derivations of plus
with the same arguments, we can prove that the outputs are equal. There are two cases we have to cover for our induction so there are two constructors for this type family.
plus-fun/z : plus-fun plus/z plus/z nat-eq/r.
plus-fun/s : plus-fun (plus/s L) (plus/s R) (nat-eq/s E)
    <- plus-fun L R E.
A bit of syntactic sugar here, I used the backwards arrow which is identical to the normal ->
except its arguments are flipped. Finally, we ask Twelf to check that we’ve actually proven something here.
%worlds () (plus-fun _ _ _).
%total (P) (plus-fun P _ _).
And there you have it, some actual theorem we’ve mechanically checked using Twelf.
I wanted to keep this short, so now that we’ve covered Twelf basics I’ll just refer you to one of the more extensive tutorials. You may be interested in
If you’re interested in learning a bit more about the nice mathematical foundations for LF you should check out “The LF Paper”.
Position Title: Engineering Manager
Location: Dresden, Germany
The Company
FireEye has invented a purpose-built, virtual machine-based security platform that provides real-time threat protection to enterprises and governments worldwide against the next generation of cyber attacks. These highly sophisticated cyber attacks easily circumvent traditional signature-based defenses, such as next-generation firewalls, IPS, anti-virus, and gateways. The FireEye Threat Prevention Platform provides real-time, dynamic threat protection without the use of signatures to protect an organization across the primary threat vectors and across the different stages of an attack life cycle. The core of the FireEye platform is a virtual execution engine, complemented by dynamic threat intelligence, to identify and block cyber attacks in real time. FireEye has over 3,100 customers across 67 countries, including over 200 of the Fortune 500.
Job Description:
In Dresden, Germany, an outstanding team of formal methods engineers uses formal methods tools, such as proof assistants, to develop correctness proofs for FireEye's leading edge products. In real world applications of formal methods tools, automation is often not sufficient for the specific problems at hand. Therefore, we are seeking outstanding software developers with a passion for implementing both well-designed as well as ad-hoc formal methods software tools for proof refactoring, proof search, systematic testing and other areas.
Responsibilities:
Desired Skills & Experience
Get information on how to apply for this position.
An important aspect of studying evolution is the construction of phylogenetic trees, graphically representing the relationship between current and historic species. These trees are usually calculated based on similarities and differences between genetic material of current species, and one particular challenge is that the topology of the resulting trees depends on the selection of genes used to construct them. Quite often, the species tree based on one set of genes differs substantially from the tree based on another set of genes.
The phylogenetic tree is usually presented as a simple tree of species. The end points of branches at the bottom of the tree (leaves) represent current species, and branching points higher up (internal nodes) represent the most recent common ancestor, or MRCA, for the species below it.
A very simple example could look something like this:
Here you have two current species, and you can trace back their lineage to a MRCA, and further back to some ancient origin. Varying colors indicate that gradual change along the branches has introduced differences, and that the current species now have diverged from each other, and their origin.
This representation has the advantage of being nice and simple, and the disadvantage of being misleading. For instance, one might get the impression that a species is a well-defined concept, and ask questions like: when the first member of a species diverged from its ancestor species, how did it find a mate?
But we are talking about species here - that is, not individuals but populations of individuals. So a more accurate representation might look like this:
Circles now represent individuals, and it should perhaps be clearer that there is no such thing as the “first” of anything. At the separation point, there is no difference between the two populations, and it is only after a long period of separation that differences can arise. (Of course, if there are selective pressure favoring specific properties - perhaps redness is very disadvantageous for species B, for instance - this change will be much quicker. Look at how quickly we have gotten very different breeds of dogs by keeping populations artificially separate, and selecting for specific properties.)
The so-called “speciation” is nothing more than a population being split into two separate parts. Typically, this can be geographically - a few animals being carried to Madagascar on pieces of driftwood - but anything that prevents members of one branch from mating with members of the other one will suffice. At the outset, the two branches are just indistinguishable subpopulations of the same species, but if the process goes on long enough, differences between the two populations can become large enough that they can no longer interbreed, and we can consider them different species.
In practice, such a separation is often not complete, some individuals can stray between the groups. In that case, speciation is less likely to happen, since the property of being unable to breed with the other group represents a reproductive disadvantage, and it would therefore be selected against. In other words, if your neighbor is able to have children with more members of the population than you, his chances of having children are better than yours. Your genes get the short end of the stick. Kind of obvious, no?
But we can also view evolution as working on populations, not of individuals, but of individual genes. This is illustrated in the next picture:
The colored circles now represent genes, and an individual is here just a sample from the population of genes - illustrated by circling three gene circles. (Note that by “gene”, we here mean an abstract unit of inheritance. In other fields of biology, the word might be synonymous with a genetic region that codes for a protein, or is transcribed to (possibly non-coding) RNA.)
Here, we see that although the genes themselves do not change (in reality they are subject to mutations), the availability of the different genes vary over time, and some might disappear from one of the branches entirely - like red genes from species B here. This kind of genetic drift can still cause distinct changes in individuals.
Each individual typically gets half its genes from each parent, one fourth from each grandparent, and so on, so after a few generations, all genes come from essentially different ancestors. This means you can calculate the MRCA for each gene individually, and this is exactly what has been done to estimate the age of our “mitochondrial Eve” and “Y-chromosomal Adam”. Here is the example lineage for the green gene:
We see that the green-gene MRCA is much older than the speciation event. In addition, each gene has its unique history. This means that when we try to compute the MRCA, different genes will give different answers, and it can be difficult to construct a sensible consensus.
For example, bits of our genome appear to come from the Neanderthal, and those bits will have a MRCA that predates the time point where Neanderthal branched from H. sapiens (possibly 1-2 million years ago). (Interestingly, “Adam” and “Eve” are both estimated to have lived in the neighborhood of 200000 years ago. This means that although 20% of the Neanderthal genome is claimed to have survived, all Neanderthal mitochondria and Y-chromosomes have been eradicated.)
Recently the Haskell community has been engaged in a very intense discussion
around potential changes to the Prelude
(aka "burning bridges" or "FTP").
Here's the most recent incarnation of the discussion for
context.
The changes under discussion are non-trivial, and many people are putting in a
huge amount of energy to try and make Haskell the best it can be. And to be
clear: I'm talking about people arguing on both sides of this discussion, and
people trying to moderate it. As someone who's mostly been sitting on the
sidelines in this one, I want to start by expressing a big thank you to
everyone working on this.
(If anyone's wondering why I'm mostly sitting this one out, it's because I don't feel very strongly about it either way. I think there are great arguments going both ways, and over the past week I've fluctuated between being -0.2 on the proposal and being +0.2.)
When a big discussion like this happens, it's easy for people to misinterpret it as something unhealthy. I'm here to remind everyone that, in fact, the opposite is true: what we're seeing now is the sign of an incredibly healthy community, based on an amazing language, that is undertaking extraordinary things. And there are of course some warts being revealed too, but those are relatively minor, and I believe we will be able to overcome them.
So to begin my cheerleading:
The fact that we can even consider doing this is amazing. I don't think very many languages could sustain a significant rewrite of their most core library. Let's just keep things in perspective here: even the worst case scenario damage from this change involves updating some documentation and modifying a relatively small amount of code in such a way that will be backwards compatible with old versions of the library. This is a true testament not only to the power of the Haskell language, but to the thoughtfulness with which this proposal was made.
The discussion has been incredibly civil. This topic had all the makings for an internet flame war: strongly held opinions, good arguments on both sides, lots of time and effort involved, and Reddit. I am happy to say that I have not seen a single personal attack in the entire discussion. Almost every piece of discourse has been beyond reproach, and the few times where things have gotten close to crossing the line, people on both sides have politely expressed that sentiment, leading to the original content being removed.
To some extent, I think we're all a bit spoiled by how good the civility in the Haskell world is, and we should take a moment to appreciate it. That's not to say we should ever expect any less, but we should feel comfortable patting ourselves on the back a bit.
We're dynamically coming up with new, better processes. When opinions are so strongly divided, it's difficult to make any kind of progress. As a community, we're adapting quickly and learning how to overcome that. As you can see in the thread I linked above, we now have a clear path forward: a feedback form that will be processed by Simon PJ and Simon Marlow, who will make the ultimate decision. This process is clear, and we couldn't be more fortunate to have such great and well respected leaders in our community.
Nothing else has stopped. If you look at issue trackers, commit logs, and mailing list discussions, you can see that while many members of the community are actively participating in this discussion, nothing else has ground to a halt. We're a dynamic community with many things going on, so the ability to digest a major issue while still moving forward elsewhere is vital.
That said, I think there are still some areas for improvement. The biggest one lies with the core libraries committee, of which I'm a member. We need to learn how to be better at communicating with the community about these kinds of large scale changes. I'm taking responsibility for that problem, so if you don't see improvements on that front in the next few weeks, you can blame me.
More generally, I think there are some process and communications improvements that can be made at various places in the community. I know that's an incredibly vague statement, but that's all I have for the moment. I intend to follow up in the coming weeks with more concrete points and advice on how to improve things.
In sum: Haskell's an amazing language, which has attracted an amazing community. This discussion doesn't detract from that statement, but rather emphasizes it. Like any group, we can still learn to do a few things better, but we've demonstrated time and again (including right now!) that we have the strength to learn and improve, and I'm certain we'll do so again.
I'm proud to be part of this community, and everyone else should be as well.
Recruit IT are looking for a Senior Developer to join a bleeding edge Big Data organisation in the New Zealand market. You will bring a proven background in big data systems, business intelligence, and/or data warehouse technologies along with a preference for functional programming and open source solutions.
You will be playing an integral part in ensuring the growth and performance of a state of the art Big Data platform. To do this you will need to have a good understanding of the importance of analytics and a variety of Big Data technologies.
Your experience to date will include:
Experience with big data, business intelligence, and data warehouse applications. This will include: Hadoop, Hive, Spring, Pivotal HD, Cloud Foundry, HAWQ, GreenPlum, MongoDB, Cassandra, Hortonworks, Cloudera, MapReduce, Flume, or Sqoop to name a few!
Ideally functional programming experience including: Scala, Haskell, Lisp, Python, etc.
Passion for bleeding edge tech is a MUST!
If you are interested in finding out more, apply online to Kaleb at Recruit IT with your CV and an overview of your current situation.
Get information on how to apply for this position.
The Foldable-Traversable proposal (aka FTP) has spawned a lot of debate in the Haskell community.
Here I want to analyze the specific concern which Ben Moseley raised in his post, FTP dangers.
Ben’s general point is that more polymorphic (specifically, ad-hoc polymorphic, i.e. using type classes) functions are less readable and reliable than their monomorphic counterparts.
On the other hand, Tony Morris and Chris Allen argue on twitter that polymorphic functions are more readable due to parametricity.
Is that true, however? Are the ad-hoc generalized functions more parametric than the monomorphic versions?
@shebang @dibblego is (a -> b) -> [a] -> [b]
more parametric than Functor f => (a -> b) -> f a -> f b
?
— Je Suis Petit Gâteau (@bitemyapp) February 12, 2015
Technically, the Functor-based type is more parametric. A function with type (a -> b) -> [a] -> [b]
is something like map
, except it may drop or duplicate some elements. On the other hand, Functor f => (a -> b) -> f a -> f b
has to be fmap
.
But this is a trick question! The first thing we see in the code is the function’s name, not its type. What carries more information, map
or fmap
? (Assuming both come from the current Prelude.) Certainly map
. When fmap
is instantiated at the list type, it is nothing more than map
. When we see fmap
, we know that it may or may not be map
. When we see map
, we know it is map
and nothing else.
The paradox is that there are more functions with map
’s type than fmap
’s, but there are more functions with fmap
’s name than map
’s. Even though fmap
is more parametric, that doesn’t win us much.
Nevertheless, is there a benefit in using more parametric functions in your code? No. If it were true, we’d all be pumping our code with «parametricity» by writing id 3
instead of 3
. You can’t get more parametric than id
.
Merely using parametric functions doesn’t make code better. Parametricity may pay off when we’re defining polymorphic parametric functions in our code instead of their monomorphic instantiations, since parametric types are more constrained and we’re more likely to get a compile error should we do anything stupid.
(It won’t necessarily pay off; the type variables and constraints do impose a tax on types’ readability.)
But if you have an existing, monomorphic piece of code that works with lists, simply replacing Data.List
functions with Data.Foldable
ones inside it, ceteris paribus, will not make your code any safer or more readable.
After a recent chat with Simon Meier, we decided that I would take over the maintenance of the exceedingly popular blaze-builder
package.
Of course, this package has been largely superseded by the new builder shipped inside bytestring
itself. The point of this new release is to offer a smooth migration path from the old to the new.
If you have a package that only uses the public interface of the old blaze-builder
, all you should have to do is compile it against blaze-builder-0.4
and you will in fact be using the new builder. If your program fails to compile against the old public interface, or there’s any change in the semantics of your program, then please file a bug against my blaze-builder
repository.
If you are looking for a function to convert Blaze.ByteString.Builder.Builder
to Data.ByteString.Builder.Builder
or back, it is id
. These two types are exactly the same, as the former is just a re-export of the latter. Thus inter-operation between code that uses the old interface and the new should be efficient and painless.
The one caveat is that the old implementation has all but disappeared, and programs and libraries that touch the old internal modules will need to be updated.
This compatibility shim is especially important for those libraries that have the old blaze-builder as part of their public interface, as now you can move to the new builder without breaking your interface.
There are a few things to consider in order to make this transition as painless as possible, however: libraries that touch the old internals should probably move to the new bytestring builder as soon as possible, while those libraries who depend only on the public interface should probably hold off for a bit and continue to use this shim.
For example, blaze-builder
is part of the public interface of both the Snap Framework and postgresql-simple
. Snap touches the old internals, while postgresql-simple
uses only the public interface. Both libraries are commonly used together in the same projects.
There would be some benefit to postgresql-simple
to move to the new interface. However, let’s consider the hypothetical situation where postgresql-simple
has transitioned, and Snap has not. This would cause problems for any project that (1) depends on this compatibility shim for interacting with postgresql-simple
, and (2) uses Snap.
Any such project would have to put off upgrading postgresql-simple
until Snap is updated, or interact with postgresql-simple
through the new bytestring builder interface and continue to use the old blaze-builder
interface for Snap. The latter option could range anywhere from trivial to extremely painful, depending on how entangled the usage of Builders is between postgresql-simple
and Snap.
By comparison, as long as postgresql-simple
continues to use the public blaze-builder
interface, it can easily use either the old or new implementation. If postgresql-simple
holds off until after Snap makes the transition, then there’s little opportunity for these sorts of problems to arise.
In the past, I’ve said some negative things^{1} about Doug Beardsley’s snaplet-postgresql-simple
, and in this long overdue post, I retract my criticism.
The issue was that a connection from the pool wasn’t reserved for the duration of the transaction. This meant that the individual queries of a transaction could be issued on different connections, and that queries from other requests could be issued on the connection that’s in a transaction. Setting the maximum size of the pool to a single connection fixes the first problem, but not the second.
At Hac Phi 2014, Doug and I finally sat down and got serious about fixing this issue. The fix did require breaking the interface in a fairly minimal fashion. Snaplet-postgresql-simple
now offers the withPG
and liftPG
operators that will exclusively reserve a single connection for a duration, and in turn uses withPG
to implement withTransaction
.
We were both amused by the fact that apparently a fair number of people have been using snaplet-postgresql-simple
, even using transactions in some cases, without apparently noticing the issue. One could speculate about the reasons why, but Doug did mention that he pretty much never uses transactions. So in response, I came up with a list of five common use cases: the first three involve changing the database, and the last two are useful even in a read-only context.
All-or-nothing changes
Transactions allow one to make a group of logically connected changes such that either they are all reflected in the resulting state of the database, or none of them are. So if anything fails before the commit, say due to a coding error or even something outside the control of the software, the database isn’t polluted with partially applied changes.
Bulk inserts
Databases that provide durability, like PostgreSQL, are limited in the number of transactions per second by the rotational speed of the disk they are writing to. Thus individual DML statements are rather slow, as each PostgreSQL statement that isn’t run in an explicit transaction is run in its own individual, implicit transaction. Batching multiple insert statements into a single transaction is much faster.
This use case is relatively less important when writing to a solid-state disk, which is becoming increasingly common. Alternatively, PostgreSQL allows a client program to turn synchronous_commit
off for the connection or even just a single transaction, if sacrificing a small amount of durability is acceptable for the task at hand.
Avoiding Race Conditions
Transactional databases, like Software Transactional Memory, do not automatically eliminate all race conditions, they only provide a toolbox for avoiding and managing them. Transactions are the primary tool in both toolboxes, though there are considerable differences around the edges.
Using Cursors
Cursors are one of several methods to stream data out of PostgreSQL, and you’ll almost always want to use them inside a single transaction.^{2} One advantage that cursors have over the other streaming methods is that one can interleave the cursor with other queries, updates, and cursors over the same connection, and within the same transaction.
Running multiple queries against a single snapshot
If you use the REPEATABLE READ
or higher isolation level, then every query in the transaction will be executed on a single snapshot of the database.
So I no longer have any reservations about using snaplet-postgresql-simple
if it is a good fit for your application, and I do recommend that you learn to use transactions effectively if you are using Postgres. Perhaps in a future post, I’ll write a bit about picking an isolation level for your postgres transactions.
See for example, some of my comments in the github issue thread on this topic, and the reddit thread which is referenced in the issue.↩
There is the WITH HOLD
option for keeping a cursor open after a transaction commits, but this just runs the cursor to completion, storing the data in a temporary table, which might occasionally be acceptable in some contexts, but is definitely not streaming.↩
*Borders.Base.Utils> let a' = ["alice", "bob"]
*Borders.Base.Utils> let a = (True, a')
*Borders.Base.Utils> length $ concat a
<interactive>:6:17:
Couldn't match expected type ‘[[a0]]’
with actual type ‘(Bool, [[Char]])’
In the first argument of ‘concat’, namely ‘a’
In the second argument of ‘($)’, namely ‘concat a’
*Borders.Base.Utils> length $ concat a'
8
*Borders.Base.Utils> length $ Data.Foldable.concat a
2
Lookingglass is seeking a qualified Senior Development Engineer to join our team!
Are you an experienced senior software engineer in security, networking, cloud and big data? Are you interested in cyber security or improving the security of the Internet? Do you push yourself to be creative and innovative and expect the same of others?
At Lookingglass, we are driven and passionate about what we do. We believe that teams, not individuals, deliver great products. We inspire each other and our customers every day with technology that improves the security of the Internet and of our customers. Behind our success is a team that thrives on collaboration and creativity, delivering meaningful impact to our customers.
Get information on how to apply for this position.
I write a lot about types. Up until now however, I’ve only made passing references to the thing I’ve actually been studying in most of my free time lately: proof theory. Now I have a good reason for this: the proof theory I’m interested in is undeniably intertwined with type theory and computer science as a whole. In fact, you occasionally see someone draw the triangle
Type Theory
/ \
/ \
Proof Theory ---- Category Theory
Which nicely summarizes the lay of the land in the world I’m interested in. People will often pick up something well understood on one corner of the triangle and drag it off to another, producing a flurry of new ideas and papers. It’s all very exciting and leads to really cool stuff. I think the most talked about example lately is homotopy type theory, which takes a mathematical structure (weak infinite groupoids) and hoists it off to type theory!
If you read the [unprofessional, mostly incorrect, and entirely more fun to read] blog posts on these subjects you’ll find most of the lip service is paid to category theory and type theory with poor proof theory shunted off to the side.
In this post, I’d like to jot down my notes on Frank Pfenning’s introduction to proof theory materials to change that in some small way.
The obvious question is just “What is proof theory?”. The answer is that proof theory is the study of proofs. In this world we study proofs as first class mathematical objects which we prove interesting things about. This is the branch of math that formalizes our handwavy notion of a proof into a precise object governed by rules.
We can then prove things like “Given a proof that Γ ⊢ A
and another derivation of Γ, A ⊢ B
, then we can produce a derivation of Γ ⊢ B
”. Such a theorem is utterly crazy unless we can formalize what it means to derive something.
From this we grow beautiful little sets of rules and construct derivations with them. Later, we can drag these derivations off to type theory and use them to model all sorts of wonderful phenomena. My most recent personal example was when folks noticed that the rules for modal logic perfectly capture what the semantics of static pointers ought to be.
So in short, proof theory is devoted to answering the question that every single one of your math classes dodged:
Professor, what exactly is a proof?
In every logic that we’ll study we’ll keep circling back to two core objects: judgments and propositions. The best explanation of judgments I’ve read comes from Frank Pfenning
A judgment is something we may know, that is, an object of knowledge. A judgment is evident if we in fact know it.
So judgments are the things we’ll structure our logic around. You’ve definitely heard of one judgment: A true
. This judgment signifies whether or not some proposition A
is true. Judgments can be much fancier though: we might have a whole bunch of judgments like n even
, A possible
or A resource
.
These judgments act across various syntactic objects. In particular, from our point of view we’ll understand the meaning of a proposition by the ways we can prove it, that is, the proofs that A true
is evident.
We prove a judgment J
through inference rules. An inference rule takes the form
J₁ J₂ .... Jₓ
—————————————
J
Which should be read as “When J₁
, J₂
… and Jₓ
hold, then so does J
”. Here the things above the line are premises and the ones below are conclusions. What we’ll do is define a bunch of these inference rules and use them to construct proofs of judgments. For example, we might have the inference rules
n even
—————— ————————————
0 even S(S(n)) even
for the judgment n even
. We can then form proofs to show that n even
holds for some particular n
.
——————
0 even
————————————
S(S(0)) even
——————————————————
S(S(S(S(0)))) even
This tree for example is evidence that 4 even
holds. We apply the second inference rule to S(S(S(S(0))))
first. This leaves us with one premise to show, S(S(0)) even
. For this we repeat the process and end up with the new premise that 0 even
. For this we can apply the first inference rule which has no premises completing our proof.
One judgment we’ll often see is A prop
. It simply says that A
is a well formed proposition, not necessarily true but syntactically well formed. This judgment is defined inductively over the structure of A
. An example rule would be
A prop B prop
——————————————
A ∧ B prop
Which says that A ∧ B
(A and B) is a well formed proposition if and only if A
and B
are! We can imagine a whole bunch of these rules
A prop B prop
—————— —————— ————————————— ...
⊤ prop ⊥ prop A ∨ B prop
that lay out the propositions of our logic. This doesn’t yet tell us how to prove any of these propositions to be true, but it’s a start. After we formally specify what sentences are propositions in our logic, we need to discuss how to prove that one is true. We do this with a different judgment A true
which is once again defined inductively.
For example, we might want to give meaning to the proposition A ∧ B
. To do this we define its meaning through the inference rules for proving that A ∧ B true
. In this case, we have the rule
A true B true
—————————————— (∧ I)
A ∧ B true
I claim that this defines the meaning of ∧
: to prove a conjunction to be true we must prove its left and right halves. The rather proof-theoretic thing we’ve done here is said that the meaning of something is what we use to prove it. This is sometimes called the “verificationist perspective”. Finally, note that I annotated this rule with the name ∧ I
simply for convenience, so we can refer to it.
Now that we know what A ∧ B
means, what does having a proof of it imply? Well, we should be able to “get out what we put in”, which would mean we’d have two inference rules
A ∧ B true A ∧ B true
—————————— ——————————
A true B true
We’ll refer to these rules as ∧ E1
and ∧ E2
respectively.
Now for a bit of terminology, rules that let us “introduce” new proofs of propositions are introduction rules. Once we have a proof, we can use it to construct other proofs. The rules for how we do that are called elimination rules. That’s why I’ve been adding I’s and E’s to the ends of our rule names.
How do we convince ourselves that these rules are correct with respect to our understanding of ∧
? This question leads us to our first sort of proofs-about-proofs we’ll make.
What we want to say is that the introduction and elimination rules match up. This should mean that any time we prove something by an introduction rule followed by an elimination rule, we should be able to rewrite the proof to avoid this detour. This also hints that the rules aren’t too powerful: we can’t prove anything with the elimination rules that we didn’t have a proof for at some point already.
For ∧
this proof looks like this
D E
– –
A B D
—————— ∧I ⇒ ————
A ∧ B A
—————— ∧E 1
A
So whenever we introduce a ∧ and then eliminate it with ∧ E1
we can always rewrite our proof to not use the elimination rules. Here notice that D and E range over derivations in this proof. They represent a chain of rule applications that let us produce an A
or B
in the end. Note I got a bit lazy and started omitting the true
judgments; this is something I’ll do a lot since it’s mostly unambiguous.
The proof for ∧E2
is similar.
D E
– –
A B E
————— ∧I ⇒ ————
A ∧ B B
————— ∧E 2
B
Given this we say that the elimination rules for ∧ are “locally sound”. That is, when used immediately after an introduction rule they don’t let us produce anything truly new.
Next we want to show that if we have a proof of A ∧ B
, the elimination rules give us enough information that we can pick the proof apart and produce a reassembled A ∧ B
.
D D
————– ————–
D A ∧ B A ∧ B
————— ⇒ —————∧E1 ——————∧E2
A ∧ B A B
———————————————— ∧I
A ∧ B
This somewhat confusing derivation takes our original proof of A ∧ B
and pulls it apart into proofs of A
and B
and uses these to assemble a new proof of A ∧ B
. This means that our elimination rules give us back all the information we put in, so we say they’re locally complete.
These two properties combined, local soundness and completeness, are how we show that an elimination rule is balanced with its introduction rule.
If you’re more comfortable with programming languages (I am) our local soundness property is equivalent to stating that
fst (a, b) ≡ a
snd (a, b) ≡ b
And local completeness is that
a ≡ (fst a, snd a)
The first two equations are reductions and the last is an expansion. These correspond to the beta and eta rules we expect a programming language to have! This is a nice little example of why proof theory is useful: it gives a systematic way to define some parts of the behavior of a program. Given the logic a programming language gives rise to, we can double-check that all rules are locally sound and complete, which gives us confidence that our language isn’t horribly broken.
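These equations can be checked directly in Haskell. A small sketch (the andI/andE names are mine): pairing plays the role of ∧I and the projections play the elimination rules.

```haskell
-- ∧I is pairing; ∧E1 and ∧E2 are the projections.
andI :: a -> b -> (a, b)
andI = (,)

andE1 :: (a, b) -> a
andE1 = fst

andE2 :: (a, b) -> b
andE2 = snd

-- Local soundness (the beta rules): eliminate right after introducing.
beta1, beta2 :: Bool
beta1 = andE1 (andI (1 :: Int) "b") == 1
beta2 = andE2 (andI (1 :: Int) "b") == "b"

-- Local completeness (the eta rule): take a proof apart and reassemble it.
eta :: Bool
eta = let p = (1 :: Int, "b") in (andE1 p, andE2 p) == p

main :: IO ()
main = print (beta1 && beta2 && eta)
```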
Before I wrap up this post I wanted to talk about one last important concept in proof theory: judgments with hypotheses. This is best illustrated by trying to write the introduction and elimination rules for “implies” or “entailment”, written A ⊃ B
.
Clearly A ⊃ B
is supposed to mean we can prove B true
assuming A true
. In other words, we can construct a derivation of the form
A true
——————
.
.
.
——————
B true
We can notate our rules then as
—————— u
A true
——————
.
.
.
——————
B true A ⊃ B A
—————————— u ——————————
A ⊃ B true B true
This notation is a bit clunky, so we’ll opt for a new one: Γ ⊢ J
. In this notation Γ
is some list of judgments we assume to hold and J
is the thing we want to show holds. Generally we’ll end up with the rule
J ∈ Γ
—————
Γ ⊢ J
Which captures the fact that Γ contains assumptions we may or may not use to prove our goal. This specific rule may vary depending on how we want to express how assumptions work in our logic (substructural logics spring to mind here). For our purposes, this is the most straightforward characterization of how this ought to work.
Our hypothetical judgments come with a few rules which we call “structural rules”. They modify the structure of the judgment, rather than any particular proposition we’re trying to prove.
Weakening
Γ ⊢ J
—————————
Γ, Γ' ⊢ J
Contraction
Γ, A, A, Γ' ⊢ J
———————————————
Γ, A, Γ' ⊢ J
Exchange
Γ' = permute(Γ) Γ' ⊢ A
————————————————————————
Γ ⊢ A
Finally, we get a substitution principle. This allows us to eliminate some of the assumptions we made to prove a theorem.
Γ ⊢ A Γ, A ⊢ B
————————————————
Γ ⊢ B
These five rules give meaning to our hypothetical judgments. With this less clunky notation, we can then restate our formulation of entailment as
A prop B prop
——————————————
A ⊃ B prop
Γ, A ⊢ B Γ ⊢ A ⊃ B Γ ⊢ A
————————— ——————————————————
Γ ⊢ A ⊃ B Γ ⊢ B
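Under the proofs-as-programs reading, these two rules are exactly lambda abstraction and function application. A hedged sketch (the impI/impE names are mine):

```haskell
-- ⊃I: a derivation of B under the hypothesis A is a lambda abstraction.
impI :: (a -> b) -> (a -> b)
impI f = \x -> f x

-- ⊃E (modus ponens): given A ⊃ B and A, conclude B by application.
impE :: (a -> b) -> a -> b
impE f x = f x

main :: IO ()
main = print (impE (impI (+ 1)) (41 :: Int))
```

The substitution principle above corresponds to the same idea: plugging a proof of A into a derivation that assumed A is just function application.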
One thing in particular to note here is that entailment actually internalizes the notion of hypothetical judgments into our logic. This is the aspect of it that makes it behave so differently than the other connectives we looked at.
As an exercise to the reader: prove the local soundness and completeness of these rules.
In this post we’ve laid out a bunch of rules and I’ve hinted that a bunch more are possible. When put together, these rules define a logic using “natural deduction”, a particular way of specifying proofs that uses inference rules rather than axioms or something else entirely.
Hopefully I’ve inspired you to poke a bit further into proof theory; if so, I heartily recommend Frank Pfenning’s lectures at the Oregon Summer School for Programming Languages.
Cheers,
Hi *,
Welcome! This is the first GHC Weekly news of February 2015. You might be wondering what happened to the last one. Well, your editor was just in New York for the past week attending Compose Conference, making friends and talking a lot about Haskell (luckily we missed a snow storm that may have messed it up quite badly!)
The conference was great. I got to have some interesting discussions about GHC and Haskell with many friendly faces from all around at an incredibly well run conference with a stellar set of people. Many thanks to NY Haskell (organizing), Spotify (hosting space), and to all the speakers for a wonderful time. (And of course, your editor would like to thank his employer Well-Typed for sending him!)
But now, since your author has returned, GHC HQ met back up this week for some discussion, with some regularly scheduled topics. For the most part it was a short meeting this week - our goals are pretty well identified:
Since my last post, we've also had other random assorted chatter on the mailing lists by the dev team:
Closed tickets the past two weeks include: #10028, #10040, #10031, #9935, #9928, #2615, #10048, #10057, #10054, #10060, #10017, #10038, #9937, #8796, #10030, #9988, #10066, #7425, #7424, #7434, #10041, #2917, #4834, #10004, #10050, #10020, #10036, #9213, and #10047.
Summary: Hoogle 4 is out of date. The alpha version Hoogle 5 has fresh code and data every day (and isn't yet ready).
Someone recently asked why Hoogle's index is so out of date. Making the index both more current (updated daily) and larger (indexing all of Stackage) is one of the goals behind my Hoogle 5 rewrite (which still isn't finished). Let's compare the different update processes:
Hoogle 4 updates took about two hours to complete, if they went well, and often had to be aborted. I first compiled the Hoogle binary on the haskell.org
machines, which often failed, as typically the version of GHC was very old. Once I'd got a compiled binary, I needed to generate the database, which took about 2 hours, and occasionally failed halfway through. Once I had the new binary and databases I moved everything to the correct place for Apache, accepting a small window of downtime during the move. Assuming that worked, I did a few test searches and smiled. Often the new Hoogle binary failed to start (usually failure to find some files, sometimes permissions) and I had to switch back to the old copy. Fixing up such issues took up to an hour. I had a mix of Windows .bat
and Linux .sh
scripts to automate some of the steps, but they weren't very robust, and required babysitting.
Hoogle 5 updates happen automatically at 8pm every night, take 4 minutes, and have yet to fail. I have a cron script that checks out the latest code and runs an update script. That script clones a fresh repo, compiles Hoogle, builds the databases, runs the test suite, kills the old version and launches the new version. The Hoogle code is all tested on Travis, so I don't expect that to fail very often. The upgrade script is hard to test, but the two failure modes are upgrading to a broken version, or not upgrading. The upgrade script runs checks and fails if anything doesn't work as expected, so it errs on the side of not upgrading. I use Uptime Robot to run searches and check the server is working, along with a canary page which raises an error if no upgrade happens for two days.
Clearly, the Hoogle 5 version update story is better. But why didn't I do it that way with Hoogle 4? The answer is that Hoogle 4 came out over six years ago, and a lot has changed since then:
When will Hoogle 5 be ready? It doesn't yet do type search, there is no offline version and no API. There are probably lots of other little pieces missing. If you want, feel free to use it now at hoogle.haskell.org. You can still use Hoogle 4 at haskell.org/hoogle, or the more up-to-date FP complete hosted Hoogle 4.
This post asks for your help in deciding how to proceed with some Prelude changes in GHC 7.10. Please read on, but all the info is also at the survey link, here: http://goo.gl/forms/XP1W2JdfpX. Deadline is 21 Feb 2015.
The Core Libraries Committee (CLC) is responsible for developing the core libraries that ship with GHC. This is an important but painstaking task, and we owe the CLC a big vote of thanks for taking it on.
For over a year the CLC has been working on integrating the Foldable and Traversable classes (shipped in base in GHC 7.8) into the core libraries, and into the Prelude in particular. Detailed planning for GHC 7.10 started in the autumn of 2014, and the CLC went ahead with this integration.
Then we had a failure of communication. As these changes affect the Prelude, which is in scope for all users of Haskell, these changes should be held to a higher bar than the regular libraries@ review process. However, the Foldable/Traversable changes were not particularly well signposted. Many people have only recently woken up to them, and some have objected (both in principle and detail).
This is an extremely unfortunate situation. On the one hand we are at RC2 for GHC 7.10, so library authors have invested effort in updating their libraries to the new Prelude. On the other, altering the Prelude is in effect altering the language, something we take pretty seriously. We should have had this debate back in 2014, but here we are, and it is unproductive to argue about whose fault it is. We all share responsibility. We need to decide what to do now. A small group of us met by Skype and we've decided to do this:
Wiki pages have been created summarizing these two primary alternatives, including many more points and counter-points and technical details:
This survey invites your input on which plan we should follow. Would you please
Please do read the background. Well-informed responses will help. Thank you!
DEADLINE: 21 February 2015
Simon PJ
To prevent possible name clashes, the name of my type inference engine for JavaScript is now Infernu (formerly Inferny). I hope I don’t have to change the name again!
In other news, here is some recent progress:
I changed inference of ‘new’ calls, so that the constructed function must have a row type as its “this” implicit parameter (nothing else makes sense).
The change to “new” typing made it possible to define the built in String, Number, Boolean type coercion functions in a way that disallows constructing them (e.g. “new String(3)”) which is considered bad practice in JavaScript. The reason is that the constructed values are “boxed” (kind of) so that they don’t equate by reference as normal strings, booleans and numbers do. For example, new String(3) == '3'
but at the same time, new String(3) !== '3'.
I added an initial implementation of what I call ambiguous types. These are simple type constraints that limit a type variable to a set of specific types.
The motivation for type constraints is that some JavaScript operators make sense for certain types, but not all types. So having a fully polymorphic type variable would be too weak. For example, the + operator has weird outputs when using all sorts of different input types (NaNNaNNaNNaNNaNNaN….batman!). I would like to constrain + to work only between strings or between numbers.
With the new type constraints, + has the following type:
a = (TNumber | TString) => ((a, a) -> a)
The syntax is reminiscent of Haskell’s type classes, and means: given a type variable “a” that is either a TNumber
or a TString
, the type of + is: (a, a) -> a
I’m thinking about implementing full-blown type classes, or alternatively, a more powerful form of ambiguous types, to deal with some other more complicated examples.
As we approach the release of GHC 7.10, there is a new wave of Haskell packages that require trivial fixes to build with the new versions of the compiler and standard libraries, but whose authors/maintainers are not around to apply the fixes. This is especially annoying when there is a pull request on GitHub, and all the maintainer would have to do is to press the green Merge button, and upload the new version on hackage.
If you are a responsible maintainer and don’t want this to happen to your packages in the future, you should appoint backup maintainers for your packages.
But what if you are a user of a package that remains broken on hackage, even though a fix is available? Here I review several ways to deal with this problem, including the new and promising Stackage snapshots.
If all you care about is to get something installed locally (be it the broken package itself, or something that directly or indirectly depends on it), you can install the fixed version locally.
Check out the repository or branch with the fix, and cabal-install it:
% git clone -b ghc710 https://github.com/markus1189/feed.git
% cabal install ./feed
(I prepend ./
to make sure cabal understands that I mean the directory, and not the package name on hackage.)
If you’re installing in the sandbox, then you can use add-source
(although the non-sandboxed version will work in this case, too):
% git clone -b ghc710 https://github.com/markus1189/feed.git
% cabal sandbox add-source feed
% cabal install whatever-you-needed
If the package whatever-you-needed
has feed
among its transitive dependencies, cabal will automatically install it from the add-source’d directory.
This approach doesn’t work well if:
You are a maintainer of a package that depends on the broken library. It’s hard to ask your users to check out and build the fixed version by hand.
You work on an in-house application that your coworkers should be able to build, for the same reason.
You cannot upload the fixed version of a package on hackage bypassing the maintainer. However, you can upload it under a new name. This works well if you are a maintainer of a package that directly depends on the broken package, because you can easily make your package depend on your fork.
Examples of this are tasty depending on regex-tdfa-rc (a fork of regex-tdfa) and tasty-golden depending on temporary-rc (a fork of temporary).
This doesn’t work well if there’s a chain of dependencies leading from your package to the broken one. You have to either persuade the other maintainer(s) to depend on your fork or fork the entire chain.
If the broken package becomes actively developed again, you need to either move back to using it or backport the bugfixes from it to your fork. (I only fork packages when I find this scenario unlikely.)
Other packages that depend on the broken package won’t automatically get fixed.
Some people get upset when you fork packages.
Instead of uploading the fixed version to hackage (which you can’t), you can upload it to Stackage instead, by creating a custom snapshot.
The procedure is described in Experimental package releases via Stackage Server. You create four files:
The package tarball itself (produced by cabal sdist
). You probably want to bump the package’s version, so that it doesn’t conflict with the version already on hackage.
Two metadata files, desc
and slug
. The first one contains a human-readable description of the snapshot; the second contains an id that will become part of the snapshot’s URL.
Then you pack these four files into a tarball (that’s right, it’ll be a tarball with a tarball inside) and upload to stackage (after registering, if you haven’t registered before).
The outcome will be a custom hackage-like repository which will contain the single version of a single package — the one you’ve uploaded. (Of course, you can include multiple packages or versions if you like.)
The Stackage website will give you the remote-repo
line that you can put into your cabal.config
along with the hackage or stackage repos that are already there.
In contrast to building packages locally, you can easily tell your users or coworkers to add that repo as well.
If the new hackage release of the broken package will get the same version number as your stackage version, there will be a conflict. (I actually don’t know what happens in that case; my guess is that cabal will silently pick one of the two available versions.)
If the package you maintain (which depends on the broken package) is a small one, or is deep down the dependency chain, it may be hard to tell your users to add the repository. If, on the other hand, you maintain a major web framework or other such thing, it would probably work.
There’s a procedure for taking over a package described on the wiki. You’ll need to contact the current maintainer; wait an indefinite amount of time (there’s no consensus about it; estimates vary from 2 weeks to 6-12 months); ask again on the mailing list and wait again; finally ask Hackage admins to grant you the rights.
Since this procedure takes a long time, it’s almost never sufficient by itself, and you’ll need to resort to one of the other strategies until you’re given the upload rights.
It’s not clear how long you actually need to wait.
I find it odd that you need to jump through all these hoops in order to do a service to the community.
Some of the work I lead at Galois was highlighted in the initial story on 60 Minutes last night, a spot interviewing Dan Kaufman at DARPA. I’m Galois’ principal investigator for the HACMS program, focused on building more reliable software for automobiles and aircraft and other embedded systems. The piece provides a nice overview for the general public on why software security matters and what DARPA is doing about it; HACMS is one piece of that story.
I was busy getting married when filming was scheduled, but two of my colleagues (Dylan McNamee and Pat Hickey) appear in brief cameos in the segment (don’t blink!). Good work, folks! I’m proud of my team and the work we’ve accomplished so far.
You can see more details about how we have been building better programming languages for embedded systems and using them to build unpiloted air vehicle software here.
A few weeks ago I received a bug report against streaming-commons. Since then, the details of what we discovered when discussing this report have been bothering me quite a bit, as they expose a lot of the brittleness of the Haskell toolchain. I'm documenting all of these aspects now to make clear how fragile our tooling is, and thereby explain why I think Stackage is so vital to our community.
In this blog post, I'm going to describe six separate problems I've identified when looking into this issue, and explain how Stackage (or some similar deterministic build system) would have protected users against these problems had it been employed.
streaming-commons is a library that provides helper utilities for a number of different streaming concepts, one of them being a streaming way to convert blaze-builder Builders to filled ByteString buffers. Since blaze-builder was released a few years ago, a new set of modules was added to the bytestring package in version 0.10, known as the "bytestring builder." I asked one of the engineers at FP Complete, Emanuel Borsboom, to start working on a new module for streaming-commons to provide similar functionality for the bytestring builder.
And now we run into the first problem with the Haskell toolchain. You would think that we should just add a lower bound on bytestring >= 0.10 in the streaming-commons.cabal file. However, setting restrictive lower bounds on ghc-package dependencies can be a problem.
Fortunately, Leon Smith already solved this problem for us with bytestring-builder, which provides a compatibility layer for older bytestring versions (much like Ed's transformers-compat). The idea is that, when compiled against an older version of bytestring, the bytestring-builder package provides the necessary missing modules, and otherwise does nothing.
When Emanuel wrote his changes to streaming-commons, he added a dependency on bytestring-builder. We then proceeded to test this on multiple versions of GHC via Travis CI and Herbert's multi-ghc-travis. Everything compiled and passed tests, so we shipped the updated version.
However, that original bug report I linked to (reported by Ozgun Ataman) told us there was a problem with GHC 7.6. This was pretty surprising, given that we'd tested on GHC 7.6. Fortunately Lane Seppala discovered the culprit: the Cabal library. It turns out that installing a new version of the Cabal library causes the build of streaming-commons to break, whereas our tests had only used the default version of Cabal shipped with GHC 7.6. (We'll get back to why that broke things in a bit.)
After some digging, Emanuel discovered the deeper cause of the problem: Bryan O'Sullivan reported an issue a year ago where, when using a new version of the Cabal library, bytestring-builder does not in fact provide its compatibility modules. This leads us to our second issue: this known bug existed for almost a year without resolution, and, since it only occurs in unusual circumstances, was not detected by any of our automated tooling.
The reason this bug existed, though, is by far the most worrisome thing I saw in this process: the Cabal library silently changed the semantics of one of its fields in the 1.18 (or 1.20? I'm not sure) release. You see, bytestring-builder was detecting which version of bytestring it was compiled against by inspecting the configConstraints field (you can see the code yourself on Hackage). And starting in Cabal 0.19.1 (a development release), that field was no longer being populated. As a result, as soon as that newer Cabal library was installed, the bytestring-builder package became worse than useless.
As an aside, this points to another problematic aspect of our toolchain: there is no way to specify constraints on dependencies used in custom Setup.hs files. That actually causes more difficulty than it may sound like, but I'll skip diving into it for now.
The fix for this was relatively simple: use some flag logic in the cabal file instead of a complicated custom Setup.hs file. (Once this pull request was merged in and released, it did fix the original bug report.) But don't take this as a critique of Leon's choice of a complicated Setup.hs file. Because in reality, the flag trick, while the "standard" solution to this problem, broke cabal-install's dependency solver for quite a while. To be fair, I'm still not completely convinced that the solver bug is fixed, but for now that bug is the lesser of two evils versus the Cabal library bug.
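For readers unfamiliar with the flag trick: the idea is to let the cabal solver, rather than Setup.hs, decide which bytestring range is in play. A rough sketch of what such flag logic looks like (illustrative only; the flag name and module list here are assumptions, not the actual bytestring-builder.cabal):

```cabal
flag bytestring_has_builder
  description: Does bytestring already ship the builder modules?
  default: True

library
  if flag(bytestring_has_builder)
    -- Modern bytestring: provide nothing, just depend on it.
    build-depends: bytestring >= 0.10
  else
    -- Old bytestring: fill in the missing modules ourselves.
    build-depends: bytestring < 0.10
    exposed-modules: Data.ByteString.Builder
```

Because the flag is automatic, cabal-install backtracks over its value until the build-depends constraints are satisfiable, which is exactly what makes this trick lean so heavily on the dependency solver.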
And finally, based on the bug report from Ozgun, it seems like an internal build failed based on all of this occurring. This has been a constant criticism I've made about the way we generally do builds in the Haskell world. Rarely is reproducibility a part of the toolchain. To quote Ozgun:
We are in fact quite careful in dependency management with lower-upper bounds on most outside packages, so breakages like this are unexpected.
And many people feel that this is the way things should be. But as this discussion hopefully emphasizes, just playing with lower and upper bounds is not sufficient to avoid build failures in general. In this case, we're looking at a piece of software that was broken by a change in a library that it didn't depend on, namely Cabal, since our tooling makes an implicit dependency on that library, and we have no way of placing bounds on it.
So here are the toolchain problems I've identified above:

1. Restrictive bounds on ghc-package dependencies (like bytestring) are tightly coupled to GHC versions.
2. A known bug can sit in a widely used compatibility package for almost a year without any automated tooling detecting it.
3. The Cabal library silently changed the semantics of one of its fields.
4. The "standard" cabal flag trick broke cabal-install's dependency solver for a long time.
5. There is no way to place constraints on the dependencies of a custom Setup.hs file.
6. Our builds are generally not reproducible: an implicit, unpinned dependency (here, the Cabal library itself) can break them.
Stackage completely solves (2), (3), (5), and (6) for end users. By specifying all library versions used, and then testing all of those versions together, we avoid many possible corner cases of weird library interactions, and provide a fully reproducible build. (Note that Stackage doesn't solve all such cases: the operating system, system libraries, executables, etc. are still unpinned. That's why FP Complete is working on Docker-based tooling.)
(1) is highly mitigated by Stackage because, even though the tight coupling still exists, Stackage provides a set of packages that take that coupling into account for you, so you're not stuck trying to put the pieces together yourself.
As for (4)... Stackage helps the situation by making the job of the solver simpler by pinning down version numbers. Unfortunately, there are still potential gotchas when encountering solver bugs. Sometimes we end up needing to implement terribly awkward solutions to work around those bugs.
The analysis of preferred flow regimes in the previous article is all very well, and in its way quite illuminating, but it was an entirely static analysis – we didn’t make any use of the fact that the original $Z_{500}$ data we used was a time series, so we couldn’t gain any information about transitions between different states of atmospheric flow. We’ll attempt to remedy that situation now.
What sort of approach can we use to look at the dynamics of changes in patterns of $Z_{500}$? Our $(\theta, \phi)$ parameterisation of flow patterns seems like a good start, but we need some way to model transitions between different flow states, i.e. between different points on the $(\theta, \phi)$ sphere. Each of our original $Z_{500}$ maps corresponds to a point on this sphere, so we might hope that we can come up with a way of looking at trajectories of points in $(\theta, \phi)$ space that will give us some insight into the dynamics of atmospheric flow.
Since atmospheric flow clearly has some stochastic element to it, a natural approach to take is to try to use some sort of Markov process to model transitions between flow states. Let me give a very quick overview of how we’re going to do this before getting into the details. In brief, we partition our $(\theta, \phi)$ phase space into $P$ components, assign each $Z_{500}$ pattern in our time series to a component of the partition, then count transitions between partition components. In this way, we can construct a matrix $M$ with

$$M_{ij} = \frac{N_{i \to j}}{N_{\mathrm{tot}}}$$

where $N_{i \to j}$ is the number of transitions from partition $i$ to partition $j$ and $N_{\mathrm{tot}}$ is the total number of transitions. We can then use this Markov matrix to answer some questions about the type of dynamics that we have in our data – splitting the Markov matrix into its symmetric and antisymmetric components allows us to respectively look at diffusive (or irreversible) and non-diffusive (or conservative) dynamics.
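Counting transitions into such a matrix is straightforward. Here is a minimal sketch (the function name is mine; for the real data we would count transitions within each winter separately and pool the counts, for the reasons discussed below):

```haskell
import qualified Data.Map.Strict as M

-- Build the Markov matrix M_ij = N_{i->j} / N_tot from a time series
-- of partition indices (0 .. p-1), counting transitions between
-- consecutive samples.
markovMatrix :: Int -> [Int] -> [[Double]]
markovMatrix p states =
    [ [ count i j / ntot | j <- [0 .. p - 1] ] | i <- [0 .. p - 1] ]
  where
    transitions = zip states (tail states)
    counts = M.fromListWith (+) [ (t, 1 :: Double) | t <- transitions ]
    count i j = M.findWithDefault 0 (i, j) counts
    ntot = fromIntegral (length transitions)
```

For example, the index sequence `[0,0,1,0,1]` has four transitions, giving the matrix `[[0.25,0.5],[0.25,0.0]]`.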
Before trying to apply these ideas to our $Z_{500}$ data, we’ll look (in the next article) at a very simple Markov matrix calculation by hand to get some understanding of what these concepts really mean. Before that though, we need to take a look at the temporal structure of the $Z_{500}$ data – in particular, if we’re going to model transitions between flow states by a Markov process, we really want uncorrelated samples from the flow, and our daily $Z_{500}$ data is clearly correlated, so we need to do something about that.
Let’s look at the autocorrelation properties of the PCA projected component time series from our original $Z_{500}$ data. We use the autocorrelation function in the statistics package to calculate and save the autocorrelation for these PCA projected time series. There is one slight wrinkle – because we have multiple winters of data, we want to calculate autocorrelation functions for each winter and average them. We do not want to treat all the data as a single continuous time series, because if we do we’ll be treating the jump from the end of one winter to the beginning of the next as “just another day”, which would be quite wrong. We’ll need to pay attention to this point when we calculate Markov transition matrices too. Here’s the code to calculate the autocorrelation:
npcs, nday, nyear :: Int
npcs = 10
nday = 151
nyear = 66

main :: IO ()
main = do
  -- Open projected points data file for input.
  Right innc <- openFile $ workdir </> "z500-pca.nc"
  let Just ntime = ncDimLength <$> ncDim innc "time"
  let (Just projvar) = ncVar innc "proj"
  Right (HMatrix projsin) <-
    getA innc projvar [0, 0] [ntime, npcs] :: HMatrixRet CDouble

  -- Split projections into one-year segments.
  let projsconv = cmap realToFrac projsin :: Matrix Double
      lens = replicate nyear nday
      projs = map (takesV lens) $ toColumns projsconv

  -- Calculate autocorrelation for one-year segments and average.
  let vsums :: [Vector Double] -> Vector Double
      vsums = foldl1 (SV.zipWith (+))
      fst3 (x, _, _) = x
      doone :: [Vector Double] -> Vector Double
      doone ps = SV.map (/ (fromIntegral nyear)) $
                 vsums $ map (fst3 . autocorrelation) ps
      autocorrs = fromColumns $ map doone projs

  -- Generate output file.
  let outpcdim = NcDim "pc" npcs False
      outpcvar = NcVar "pc" NcInt [outpcdim] M.empty
      outlagdim = NcDim "lag" (nday - 1) False
      outlagvar = NcVar "lag" NcInt [outlagdim] M.empty
      outautovar = NcVar "autocorr" NcDouble [outpcdim, outlagdim] M.empty
      outncinfo =
        emptyNcInfo (workdir </> "autocorrelation.nc") #
        addNcDim outpcdim # addNcDim outlagdim #
        addNcVar outpcvar # addNcVar outlagvar #
        addNcVar outautovar
  flip (withCreateFile outncinfo) (putStrLn . ("ERROR: " ++) . show) $
    \outnc -> do
      -- Write coordinate variable values.
      put outnc outpcvar $
        (SV.fromList [0..fromIntegral npcs-1] :: SV.Vector CInt)
      put outnc outlagvar $
        (SV.fromList [0..fromIntegral nday-2] :: SV.Vector CInt)
      put outnc outautovar $ HMatrix $
        (cmap realToFrac autocorrs :: Matrix CDouble)
      return ()
We read in the component time series as a hmatrix matrix, split the matrix into columns (the individual component time series), then split each time series into year-long segments. Then we use the autocorrelation function on each segment of each time series (dropping the confidence limit values that the autocorrelation function returns, since we’re not so interested in those here) and average across segments of each time series. The result is an autocorrelation function (for lags from zero to $\mathtt{nday}-2$) for each PCA component. We write those to a NetCDF file for further processing.
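For reference, the quantity being computed at each lag is the standard sample autocorrelation: the lag-$k$ autocovariance of the mean-centred series divided by its variance. A minimal pure sketch (a simplified stand-in for the library function, which also returns confidence limits):

```haskell
-- Biased sample autocorrelation at lag k: lag-k autocovariance of the
-- mean-centred series divided by its variance.
autocorr :: [Double] -> Int -> Double
autocorr xs k = cov / var
  where
    n   = fromIntegral (length xs)
    mu  = sum xs / n
    ds  = map (subtract mu) xs
    var = sum (map (^ 2) ds) / n
    cov = sum (zipWith (*) ds (drop k ds)) / n
```

At lag 0 this is exactly 1, and for an alternating series the lag-1 value is strongly negative, as expected.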
The plot below shows the autocorrelation functions for the first three PCA projected component time series. The important thing to notice here is that there is significant autocorrelation in each of the PCA projected component time series out to lags of 5–10 days (the horizontal line on the plot is at a correlation of $e^{-1}$). This makes sense – even at the bottom of the atmosphere, where temporal variability tends to be less structured than at 500 mb, we expect the weather tomorrow to be reasonably similar to the weather today.
It appears that there is pretty strong correlation in the $Z_{500}$ data at short timescales, which would be an obstacle to performing the kind of Markov matrix analysis we’re going to do next. To get around this, we’re going to average our data over non-overlapping 7-day windows (seven days seems like a good compromise between throwing lots of data away and reducing the autocorrelation to a low enough level) and work with those 7-day means instead of the unprocessed PCA projected component time series. This does mean that we now need to rerun all of our spherical PDF analysis for the 7-day mean data, but that’s not much of a problem because everything is nicely scripted and it’s easy to rerun it all.
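The windowing step itself is tiny. A sketch of the non-overlapping 7-day mean (applied to each winter's series separately, so that windows never straddle the jump between winters; the function name is mine):

```haskell
-- Mean of each non-overlapping window of n samples; a trailing partial
-- window (fewer than n samples) is discarded.
blockMeans :: Int -> [Double] -> [Double]
blockMeans n xs = case splitAt n xs of
  (block, rest)
    | length block == n -> sum block / fromIntegral n : blockMeans n rest
    | otherwise         -> []
```

For instance, `blockMeans 7` turns a 151-day winter into 21 weekly means, silently dropping the four leftover days.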
The figures below show the same plots as we earlier had for all the PCA projected component time series, except this time we’re looking at the 7-day means of the projected component time series, to ensure that we have data without significant temporal autocorrelation.
The first figure tab (“Projected points”) shows the individual 7-day mean data points, plotted using $(\theta, \phi)$ polar coordinates. Comparing with the corresponding plot for all the data in the earlier article, we can see (obviously!) that there’s less data here, but also that it’s not really any easier to spot clumping in the data points than it was when we used all the data. It again makes sense to do KDE to find a smooth approximation to the probability density of our atmospheric flow patterns.
The “Spherical PDF” tab shows the spherical PDF of 7-day mean PCA components (parametrised by spherical polar coordinates $\theta$ and $\phi$) calculated by kernel density estimation: darker colours show regions of greater probability density. Two “bumps” are labelled for further consideration. Compared to the “all data” PDF, the kernel density estimate of the probability density for the 7-day mean data is more concentrated, with more of the probability mass appearing in the two labelled bumps on the plot. (Recall that the “all data” PDF had four “bumps” that we picked out to look at – here we only really have two clear bumps.)
We can determine the statistical significance of those bumps in exactly the same way as we did before. The “Significance” tab above shows the results. As you’d expect, both of the labelled bumps are highly significant. However, notice that the significance scale here extends only to 99% significance, while that for the “all data” case extends to 99.9%. The reduced significance levels are simply a result of having fewer data points – we have 1386 7-day mean points as compared to 9966 “all data” points, which means that we have more sampling variability in the null hypothesis PDFs that we use to generate the histograms used for the significance calculation. That increased sampling variability translates into less certainty that our “real data” couldn’t have occurred by chance, given the assumptions of the null hypothesis. Still, 99% confidence isn’t too bad!
Finally, we can plot the spatial patterns of atmospheric flow corresponding to the labelled bumps in the PDF, just as we did for the “all data” case. The “Bump patterns” tab shows the patterns for the two most prominent bumps in the 7-day means PDF. As before, the two flow patterns seem to distinguish quite clearly between “normal” zonal flow (in this case, pattern #2) and blocking flow (pattern #1).
Now that we’ve dealt with this autocorrelation problem, we’re ready to start thinking about how we model transitions between different flow states. In the next article, we’ll use a simple low-dimensional example to explain what we’re going to do.
Summary: Haskell is great for refactoring, thanks to being able to reason about and transform programs with confidence.
I think one of Haskell's strengths as a practical language is that it's easy to refactor, and more importantly, easy to refactor safely. Programs in the real world often accumulate technical debt - code that is shaped more by its history than its current purpose. Refactoring is one way to address that technical debt, making the code simpler, but not changing any meaningful behaviour.
When refactoring, you need to think of which alternative forms of code are equivalent but better. In C++ I've removed unused variables, only to find they were RAII variables, and their mere presence had a semantic effect. In R I've removed redundant if expressions, only to find the apparently pure condition had the effect of coercing a variable and changing its type. In Haskell, it's equally possible to make refactorings that at first glance appear safe but aren't - however, in practice, it happens a lot less. I think there are a few reasons for that:
Functions like unsafePerformIO, which could harm refactoring, are almost always used behind a suitable pure abstraction.

Note that these reasons are due to both the language, and the conventions of the Haskell community. (Despite these factors, there are a few features that can trip up refactorings, e.g. exceptions, record wildcards, space-leaks.)
To take a very concrete example, today I was faced with the code:
f = fromMaybe (not b) . select
if f v == b then opt1 else opt2
At one point the function f was used lots, had a sensible name and nicely abstracted some properties. Now f is used once, the semantics are captured elsewhere, and the code is just unclear. We can refactor this statement, focusing on the condition:
f v == b
-- inline f
(fromMaybe (not b) . select) v == b
-- remove brackets and inline (.)
fromMaybe (not b) (select v) == b
-- expand to a case statement
(case select v of Nothing -> not b; Just x -> x) == b
-- push the == down
case select v of Nothing -> not b == b; Just x -> x == b
-- simplify not b == b
case select v of Nothing -> False; Just x -> x == b
-- collapse back up
select v == Just b
And now substitute back in:
if select v == Just b then opt1 else opt2
Our code is now much simpler and more direct. Thanks to the guarantees I expect of Haskell programs, I also have a high degree of confidence this code really is equivalent - even if it isn't obvious just looking at beginning and end.
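In fact, because the condition only depends on the value of `select v` and on `b`, the claimed equivalence can be checked exhaustively over a Boolean model (a quick sanity check of my own, not from the original refactoring):

```haskell
import Data.Maybe (fromMaybe)

-- The refactoring claims: fromMaybe (not b) (select v) == b
-- is equivalent to:        select v == Just b.
-- Only the value of `select v` matters, so model it as a Maybe Bool
-- and check every case.
equivalent :: Bool
equivalent = and
  [ (fromMaybe (not b) mx == b) == (mx == Just b)
  | b  <- [False, True]
  , mx <- [Nothing, Just False, Just True]
  ]
```

Each step of the derivation above is an equational rewrite, and this exhaustive check confirms the end points agree on every input.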
Someone posted to the Haskell subreddit this blogpost of Lennart's, where he goes step-by-step through implementing an evaluator and type checker for CoC. I don't know why this post from 2007 showed up on Reddit this week, but it's a very good post, pedagogically speaking. Go and read it.
In this post, I'd like to elaborate on the simply-typed lambda calculus part of his blogpost. His typechecker defines the following types for representing STLC types, terms, and environments:
data Type = Base | Arrow Type Type deriving (Eq, Show)

type Sym = String

data Expr = Var Sym
          | App Expr Expr
          | Lam Sym Type Expr
          deriving (Eq, Show)
The signature of the typechecker presented in his post is as follows:
type ErrorMsg = String
type TC a = Either ErrorMsg a

newtype Env = Env [(Sym, Type)] deriving (Show)

tCheck :: Env -> Expr -> TC Type
My approach is to instead create a representation of terms of STLC in such a way that only well-scoped, well-typed terms can be represented. So let's turn on a couple of heavy-weight language extensions from GHC 7.8 (we'll see how each of them is used), and define a typed representation of STLC terms:
{-# LANGUAGE GADTs, StandaloneDeriving #-}
{-# LANGUAGE DataKinds, KindSignatures, TypeFamilies, TypeOperators #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TemplateHaskell #-} -- sigh...

import Data.Singletons.Prelude
import Data.Singletons.TH
import Data.Type.Equality

-- | A (typed) variable is an index into a context of types
data TVar (ts :: [Type]) (a :: Type) where
    Here :: TVar (t ': ts) t
    There :: TVar ts a -> TVar (t ': ts) a
deriving instance Show (TVar ctx a)

-- | Typed representation of STLC: well-scoped and well-typed by construction
data TTerm (ctx :: [Type]) (a :: Type) where
    TConst :: TTerm ctx Base
    TVar :: TVar ctx a -> TTerm ctx a
    TLam :: TTerm (a ': ctx) b -> TTerm ctx (Arrow a b)
    TApp :: TTerm ctx (Arrow a b) -> TTerm ctx a -> TTerm ctx b
deriving instance Show (TTerm ctx a)
The idea is to represent the context of a term as a list of types of variables in scope, and index into that list, de Bruijn-style, to refer to variables. This indexing operation maintains the necessary connection between the pointer and the type that it points to. Note the type of the TLam constructor, where we extend the context at the front for the inductive step.
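To make the indexing concrete, here is a stripped-down, standalone copy of the two GADTs together with a couple of example terms (identity and apply are my names, not from the post; the declarations are repeated so the snippet compiles on its own):

```haskell
{-# LANGUAGE GADTs, DataKinds, KindSignatures, TypeOperators #-}

data Type = Base | Arrow Type Type

data TVar (ts :: [Type]) (a :: Type) where
  Here  :: TVar (t ': ts) t
  There :: TVar ts a -> TVar (t ': ts) a

data TTerm (ctx :: [Type]) (a :: Type) where
  TConst :: TTerm ctx Base
  TVar   :: TVar ctx a -> TTerm ctx a
  TLam   :: TTerm (a ': ctx) b -> TTerm ctx (Arrow a b)
  TApp   :: TTerm ctx (Arrow a b) -> TTerm ctx a -> TTerm ctx b

-- \x -> x : under TLam, the context is extended with Base at the
-- front, and Here points at that innermost binder.
identity :: TTerm '[] (Arrow Base Base)
identity = TLam (TVar Here)

-- \f x -> f x : There Here skips the innermost binder (x) to reach f.
apply :: TTerm '[] (Arrow (Arrow Base Base) (Arrow Base Base))
apply = TLam (TLam (TApp (TVar (There Here)) (TVar Here)))
```

Trying to write, say, `TLam (TVar (There Here))` at the type of `identity` is rejected by GHC, because the index `There Here` points outside the one-element context.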
To give a taste of how convenient it is to work with this representation programmatically, here's a total evaluator:
-- | Interpretation (semantics) of our types
type family Interp (t :: Type) where
    Interp Base = ()
    Interp (Arrow t1 t2) = Interp t1 -> Interp t2

-- | An environment gives the value of all variables in scope in a given context
data Env (ts :: [Type]) where
    Nil :: Env '[]
    Cons :: Interp t -> Env ts -> Env (t ': ts)

lookupVar :: TVar ts a -> Env ts -> Interp a
lookupVar Here      (Cons x _)  = x
lookupVar (There v) (Cons _ xs) = lookupVar v xs

-- | Evaluate a term of STLC. This function is total!
eval :: Env ctx -> TTerm ctx a -> Interp a
eval env TConst = ()
eval env (TVar v) = lookupVar v env
eval env (TLam lam) = \x -> eval (Cons x env) lam
eval env (TApp f e) = eval env f $ eval env e
Of course, the problem is that this representation is not at all convenient for other purposes. For starters, it is certainly not how we would expect human beings to type in their programs.
My version of the typechecker is such that instead of giving the type of a term (when it is well-typed), it instead transforms the loose representation (Term) into the tight one (TTerm). A Term is well-scoped and well-typed (under some binders) iff there is a TTerm corresponding to it. Let's use singletons to store type information in existential positions:
$(genSingletons [''Type])
$(singDecideInstance ''Type)

-- | Existential version of 'TTerm'
data SomeTerm (ctx :: [Type]) where
    TheTerm :: Sing a -> TTerm ctx a -> SomeTerm ctx

-- | Existential version of 'TVar'
data SomeVar (ctx :: [Type]) where
    TheVar :: Sing a -> TVar ctx a -> SomeVar ctx

-- | A typed binder of variable names
data Binder (ctx :: [Type]) where
    BNil :: Binder '[]
    BCons :: Sym -> Sing t -> Binder ts -> Binder (t ': ts)
Armed with these definitions, we can finally define the type inferer. I would argue that it is no more complicated than Lennart's version. In fact, it has the exact same shape, with value-level equality tests replaced with Data.Type.Equality-based checks.
-- | Type inference for STLC
infer :: Binder ctx -> Term -> Maybe (SomeTerm ctx)
infer bs (Var v) = do
    TheVar t v' <- inferVar bs v
    return $ TheTerm t $ TVar v'
infer bs (App f e) = do
    TheTerm (SArrow t0 t) f' <- infer bs f
    TheTerm t0' e' <- infer bs e
    Refl <- testEquality t0 t0'
    return $ TheTerm t $ TApp f' e'
infer bs (Lam v ty e) = case toSing ty of
    SomeSing t0 -> do
        TheTerm t e' <- infer (BCons v t0 bs) e
        return $ TheTerm (SArrow t0 t) $ TLam e'

inferVar :: Binder ctx -> Sym -> Maybe (SomeVar ctx)
inferVar (BCons u t bs) v
  | v == u = return $ TheVar t Here
  | otherwise = do
      TheVar t' v' <- inferVar bs v
      return $ TheVar t' $ There v'
inferVar _ _ = Nothing
Note that pattern matching on Refl in the App case brings in scope type equalities that are crucial to making infer well-typed.
Of course, because of the existential nature of SomeVar, we should provide a typechecker as well which is a much more convenient interface to work with:
-- | Typechecker for STLC
check :: forall ctx a. (SingI a) => Binder ctx -> Term -> Maybe (TTerm ctx a)
check bs e = do
    TheTerm t' e' <- infer bs e
    Refl <- testEquality t t'
    return e'
  where
    t = singByProxy (Proxy :: Proxy a)

-- | Typechecker for closed terms of STLC
check_ :: (SingI a) => Term -> Maybe (TTerm '[] a)
check_ = check BNil
(The SingI a constraint is an unfortunate implementation detail; the kind of a is Type, which is closed, so GHC should be able to know there is always going to be a SingI a instance).
To review, we've written a typed embedding of STLC into Haskell, with a total evaluator and a typechecker, in about 110 lines of code.
If we were doing this in something more like Agda, one possible improvement would be to define a function untype :: TTerm ctx a -> Term and use that to give check basically a type of Binder ctx -> (e :: Term) -> Either ((e' :: TTerm ctx a) -> untype e' == e -> Void) (TTerm ctx a), i.e. to give a proof in the non-well-typed case as well.
San Francisco, CA – Telecommute
FP Complete is excited to announce we are hiring again. Currently we are looking to hire creative software engineers to fill a couple testing roles we have open. You will be joining our Haskell development team to help us test and validate our products. Here is a little more about the positions we're looking to fill.
Haskell Test Software Engineer – Scientific Simulation
For this role we are looking for someone to help build our test and delivery capabilities. You will be working as a member of the development team providing direct input and support on product implementation, testing, and quality. Your mission is to innovate on the test infrastructure, enabling and implementing automated tests and test suites across multiple product components. Learn more here: Haskell Test Software Engineer – Scientific Simulation
Software Test Engineer – Scientific Medical Simulation
For this position we are looking for creative software test engineers to work on our scientific medical SaaS product. You will be working as a member of an international product team and you will be expected to provide direct input on product implementation, testing, and quality. Your mission is to represent the customer. You will learn the system from top to bottom, validating the product and making sure it delivers what the customer needs. Learn more here: Software Test Engineer – Scientific Medical Simulation
If you’d like to be part of our team and shape the future of Haskell, please send a resume or CV to admin@fpcomplete.com. Please include the title of the position you're applying for in the subject line.
Monads are often considered to be a stumbling block for learning Haskell. Somehow they are thought of as scary and hard to understand. They are, however, nothing more than a design pattern, and a rather simple one at that. Monads give us a standardized way to glue a particular kind of computation together, passing the result of each computation to the next. As such they are useful well beyond functional programming.
My favourite example is the well-known “callback hell” in JavaScript. JavaScript functions often take a callback as argument, a function that they invoke on completion. Functions that take a callback argument are called asynchronous functions. The problem arises when we want to call another asynchronous function inside the callback; we end up with deeply nested callbacks.
Let’s consider an example. Suppose we want to collect configuration files for our application in the current directory, in the user’s home directory, and in the system-wide configuration directory. We might do something like
function getConfigFiles(callback) {
function extractConfigFiles(filess) { ... }
fs.readdir(".", function(hereErr, here) {
fs.readdir(process.env.HOME, function(homeErr, home) {
fs.readdir("/etc", function(etcErr, etc) {
callback(extractConfigFiles([here, home, etc]));
})
})
});
}
Since readdir is an asynchronous function, and we need to call it three times, we end up with this deeply nested structure. You can see how this might get out of hand. By contrast, in Haskell we would recognize that these kinds of asynchronous functions form a monad (the continuation monad, to be precise); after all, this fits the pattern: we want to glue asynchronous functions together, passing the result of each function to the next. In Haskell we would write the above code as
getConfigFiles = do
here <- readdir "."
home <- readdir homedir
etc <- readdir "/etc"
return $ extractConfigFiles [here, home, etc]
where
extractConfigFiles = ...
This looks and feels like simple sequential code, but it isn’t; this code is precisely equivalent to the JavaScript example above it. Note that there are tons of attempts to address callback hell in JavaScript; many of which are in fact inspired by monads.
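To see why callback-taking functions form a monad, here is a minimal sketch of the idea. The `readdir` below is a pure stand-in (it just invokes its continuation immediately), not a binding to a real filesystem API:

```haskell
-- A computation that, given a handler for its result, performs some IO.
-- This is the continuation monad specialised to an IO result.
newtype Callback a = Callback { runCallback :: (a -> IO ()) -> IO () }

instance Functor Callback where
  fmap f (Callback g) = Callback $ \k -> g (k . f)

instance Applicative Callback where
  pure x = Callback ($ x)
  Callback f <*> Callback g = Callback $ \k -> f (\h -> g (k . h))

instance Monad Callback where
  -- Run g, then feed its result x to f, wiring the final continuation
  -- through: exactly the "nest the next callback inside" pattern.
  Callback g >>= f = Callback $ \k -> g (\x -> runCallback (f x) k)

-- Stand-in for an asynchronous directory listing (NOT the real fs API).
readdir :: String -> Callback [String]
readdir dir = Callback $ \k -> k ["file-in-" ++ dir]

-- The nested-callback JavaScript example, flattened by do-notation.
getConfigFiles :: Callback [[String]]
getConfigFiles = do
  here <- readdir "."
  home <- readdir "/home/user"
  etc  <- readdir "/etc"
  return [here, home, etc]
```

Running `runCallback getConfigFiles print` hands the three listings to the final continuation, just as the JavaScript version hands them to `callback`.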
Let’s now move from the mainstream and high level to the more esoteric and low level. Ziria is a domain specific language designed at Microsoft Research specifically for the development of Software-defined radios. Well-Typed have been working with Microsoft Research over the last few months to improve the Ziria compiler (itself written in Haskell), primarily the front end (internal code representation, parser, scope resolution and type checking) and the optimizer.
Ziria is a two-level language. The expression language is a fairly standard imperative language. For example, we can write a function that computes factorial as
fun fac(n : int) {
var result : int := 1;
while(n > 0) {
result := result * n;
n := n - 1;
}
return result
}
The expression language contains all the usual suspects (mutable variables, arrays, structs, control flow constructs, etc.) as well as good support for bit manipulation and complex numbers.
The more interesting part of the language however is the computation language. Ziria programs define stream computations: programs that transform an incoming stream of bits into an outgoing stream of bits. In particular, Ziria is designed so that we can write such stream computations in a high level way and yet end up with highly performant code; Ziria’s flagship example is an implementation of the WiFi 802.11a/g protocol that is able to operate in realtime.
The simplest way to turn our simple factorial computation into a stream transformer is to map our fac function:
let comp streamFac = map fac
But there are also other ways to build up stream computations. There are two fundamental primitive stream computations: take reads an element from the input stream, and emit writes an element to the output stream. Of course now we need a way to glue stream computations together; you guessed it, they form a monad. For example, here is a stream processor which creates an output stream such that each element of the output stream is the sum of all preceding elements of the input stream, starting with some initial value init:
fun comp sum(init : int) {
  var total : int := init;
  repeat {
    x <- take;
    do { total := total + x }
    emit total;
  }
}
As a second example, consider this function which copies the first n elements from the input stream to the output stream and then terminates, returning the last element it wrote to the output stream:
fun comp prefix(n : int) {
  var last : int;
  times n {
    x <- take;
    do { last := x }
    emit x;
  }
  return last
}
This kind of monadic composition where the result of one stream computation is passed to another is known as “control path composition” in the Ziria literature. We also have “data path composition”, where the output stream of one processor becomes the input stream of another. For example, consider
let comp compositionExample = {
  var total : int := 0;
  repeat {
    newTotal <- sum(total) >>> prefix(5);
    do { total := newTotal - 1 }
  }
}
We use data path composition (>>>) to make the output stream of the sum stream transformer the input stream of the prefix stream computer. We then use control path composition to update a local variable with the last value that was written minus one, and we repeat. The effect is that we sum the input stream, but decrement the running total by one every 5 elements. So, given an input stream
1,2,3,4,5,6,7,8,9,10
we output
1,3,6,10,15,20,27,35,44,54
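We can check this output with a simple list-based model in Haskell (my own sketch, not Ziria code): `compositionExample` below mimics the composed pipeline, producing running sums over blocks of five elements and restarting each block with the previous block's last output minus one:

```haskell
-- Hypothetical list model of the Ziria pipeline above: sum the input
-- stream, restarting with (last output - 1) after every 5 elements.
compositionExample :: Int -> [Int] -> [Int]
compositionExample total xs =
  case splitAt 5 xs of
    ([], _)       -> []
    (block, rest) ->
      -- running sums over this block, seeded with the carried total
      let outs = tail (scanl (+) total block)
      in outs ++ if length block == 5
                   then compositionExample (last outs - 1) rest
                   else []

main :: IO ()
main = print (compositionExample 0 [1..10])
-- prints [1,3,6,10,15,20,27,35,44,54]
```

This reproduces the output stream quoted in the post: the first block sums to 15, the second block restarts from 14, giving 20, 27, 35, 44, 54.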
Ziria also offers a variant operator (|>>>|) which makes sure that the computations performed by the two stream processors happen in parallel.
A Haskell perspective. Stream computers can be compared to Haskell pipes, where a stream computer is something of type Pipe a b IO, with an input stream of type a and an output stream of type b; do is the equivalent of liftIO. Control path composition corresponds to monadic or “horizontal” composition, while data path composition corresponds to vertical composition.
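The analogy can be made concrete with a toy pure pipe type (my own sketch, deliberately simpler than the pipes library: no effects, and data-path composition is omitted). `take'` and `emit` play the roles of Ziria's take and emit, and the Monad instance gives control-path composition:

```haskell
-- A toy pure stream computer: it can finish with a result,
-- demand an input element, or emit an output element.
data Pipe i o a
  = Pure a
  | Take (i -> Pipe i o a)
  | Emit o (Pipe i o a)

instance Functor (Pipe i o) where
  fmap f (Pure a)   = Pure (f a)
  fmap f (Take k)   = Take (fmap f . k)
  fmap f (Emit o p) = Emit o (fmap f p)

instance Applicative (Pipe i o) where
  pure = Pure
  pf <*> px = pf >>= \f -> fmap f px

instance Monad (Pipe i o) where
  Pure a   >>= f = f a
  Take k   >>= f = Take ((>>= f) . k)
  Emit o p >>= f = Emit o (p >>= f)

take' :: Pipe i o i
take' = Take Pure

emit :: o -> Pipe i o ()
emit o = Emit o (Pure ())

-- Run a pipe against a finite input list, collecting the outputs;
-- the pipe stops when the input is exhausted.
run :: Pipe i o a -> [i] -> [o]
run (Pure _)   _      = []
run (Take _)   []     = []
run (Take k)   (i:is) = run (k i) is
run (Emit o p) is     = o : run p is

-- Ziria's sum computation, with the mutable total threaded explicitly.
sumC :: Int -> Pipe Int Int a
sumC total = do
  x <- take'
  emit (total + x)
  sumC (total + x)

main :: IO ()
main = print (run (sumC 0) [1, 2, 3, 4, 5])
-- prints [1,3,6,10,15]
```

Vertical (data-path) composition can be defined for this type too, but its termination bookkeeping is fiddly, so it is left out here.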
Monads come with laws: properties that must hold true for all monadic code. The Ziria compiler takes advantage of these laws in the optimizer. For example, suppose you write
x <- take
y <- return (x + 1)
emit y
At this point the so-called “left identity” law kicks in, and the compiler rewrites this to
x <- take
let y = x + 1
emit y
which will then subsequently be cleaned up by the inliner. Other optimizations applied by the Ziria compiler include loop unrolling, partial evaluation, etc. It also uses a simple SMT solver to remove unnecessary conditionals, following the approach we described in a previous blogpost.
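The law being used here is left identity: `return a >>= f` must behave exactly like `f a` in any lawful monad, which is what licenses replacing the bind with a plain let. A quick check in Haskell (my own illustration, using the Maybe monad):

```haskell
-- Left identity: (return a >>= f) is equivalent to (f a).
f :: Int -> Maybe Int
f x = Just (x + 1)

leftIdentity :: Bool
leftIdentity = (return 41 >>= f) == f 41   -- holds in any lawful monad

main :: IO ()
main = print leftIdentity
-- prints True
```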
One optimization deserves special mention, although it’s not related to monads per se. The vectorization optimization turns a stream computer that takes single elements from the input stream and outputs single elements to the output stream into a stream computer which takes arrays of elements from the input stream and outputs arrays of elements to the output stream, so that the resulting code can be further optimized to operate on multiple elements simultaneously and to reduce the overhead of reading from and writing to stream buffers.
For example, consider the following Ziria program:
fun sumArray(xs : arr int) {
  var total : int := 0;
  for i in [0, length(xs)] {
    total := total + xs[i];
  }
  return total;
}
let comp sum4 = {
  repeat {
    xs <- takes 4;
    emit sumArray(xs);
  }
}
let comp stutter = {
  repeat {
    x <- take;
    emit x;
    emit x;
  }
}
let comp stutterSum = sum4 >>> stutter
Computation sum4 takes 4 elements from the input stream, sums them up, and emits the result; we say that the cardinality of sum4 is 4:1. Computation stutter writes every element in the input stream twice to the output stream; we say that its cardinality is 1:2; the cardinality of the composition stutterSum is therefore 2:1. The optimizer turns this program into this (cleaned up for readability only):
fun sum4_vect(xs : arr[288] int) {
  var ys : arr[72] int
  for i in [0, 72] {
    let xs : arr[4] int = xs[i*4:+4]
    var total : int
    total := xs[0];
    total := total + xs[1];
    total := total + xs[2];
    total := total + xs[3];
    ys[i] := total
  };
  return ys
}
fun stutter_vect(xs : arr[72] int) {
  var ys : arr[144] int
  for i in [0, 72] {
    ys[i*2]   := xs[i];
    ys[i*2+1] := xs[i]
  };
  return ys
}
let comp stutterSum = map sum4_vect >>> map stutter_vect
The optimizer has done an excellent job here. Both sum4 and stutter have become expressions, rather than computations, that are passed as arguments to map, which can be optimized better in the code generator; sum4_vect now takes arrays of 288 elements and returns arrays of 72 elements (4:1), while stutter_vect takes arrays of 72 elements and returns arrays of 144 elements (2:1), and the inner loop in stutter has been unrolled.
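One way to convince ourselves that this rewrite is semantics-preserving is a toy list model (my own sketch, not compiler output): model the element-level and vectorized pipelines over Haskell lists and check they agree on a stream whose length is a multiple of 288:

```haskell
import Data.List (unfoldr)

chunksOf :: Int -> [a] -> [[a]]
chunksOf n = unfoldr (\xs -> if null xs then Nothing else Just (splitAt n xs))

sum4 :: [Int] -> [Int]          -- element-level model, cardinality 4:1
sum4 = map sum . chunksOf 4

stutter :: [Int] -> [Int]       -- cardinality 1:2
stutter = concatMap (\x -> [x, x])

sum4_vect :: [Int] -> [Int]     -- vectorized model: 288 elements in, 72 sums out
sum4_vect = map sum . chunksOf 4

stutter_vect :: [Int] -> [Int]  -- 72 elements in, 144 out
stutter_vect = concatMap (\x -> [x, x])

-- The element-level pipeline and the vectorized pipeline (applied
-- block-by-block to 288-element chunks) produce the same output stream.
agrees :: Bool
agrees = stutter (sum4 xs)
           == concatMap (stutter_vect . sum4_vect) (chunksOf 288 xs)
  where xs = [1 .. 2 * 288]

main :: IO ()
main = print agrees
-- prints True
```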
This ability of the Ziria compiler to optimize the code for different kinds of pipeline widths is one of the reasons that it is possible to write software-defined radios in Ziria in a high level manner; with other approaches such as Sora this kind of optimization had to be done by hand. The Ziria compiler also does a number of other optimizations; for a more detailed discussion, see Ziria: A DSL for wireless systems programming.
Monads aren’t an obscure concept that has been invented just to work around peculiarities of Haskell. They are a very general and universal design principle with applications everywhere. The concept of monads has been found useful in JavaScript, C++, Java, Scala, C#, Ruby, Rust, Go, and many other languages. Recognizing the monad pattern when it arises in your code can lead to nicer, more readable and easier to maintain designs, and recognizing the monad pattern in language design helps programmers do this. In Ziria monads turn out to be precisely the right abstraction that makes it possible to have a language in which we can write software-defined radios at a very high level of abstraction and which can yet be compiled down to highly efficient code.
Having just returned from the annual Oregon Programming Languages Summer School, at which I teach every year, I am once again impressed with the growth in the technical sophistication of the field and with its ability to attract brilliant young students whose enthusiasm and idealism are inspiring. Eugene was, as ever, an ideal setting for the summer school, providing a gorgeous environment for work and relaxation. I was particularly glad for the numerous chances to talk with students outside of the classroom, usually over beer, and I enjoyed, as usual, the superb cycling conditions in Eugene and the surrounding countryside. Many students commented to me that the atmosphere at the summer school is wonderful, filled with people who are passionate about programming languages research, and suffused with a spirit of cooperation and sharing of ideas.
Started by Zena Ariola a dozen years ago, this year’s instance was organized by Greg Morrisett and Amal Ahmed in consultation with Zena. As usual, the success of the school depended critically on the dedication of Jim Allen, who has been the de facto chief operating officer since its inception. Without Jim, OPLSS could not exist. His attention to detail, and his engagement with the students are legendary. Support from the National Science Foundation CISE Division, ACM SIGPLAN, Microsoft Research, Jane Street Capital, and BAE Systems was essential for providing an excellent venue, for supporting a roster of first-rate lecturers, and for supporting the participation of students who might otherwise not have been able to attend. And, of course, an outstanding roster of lecturers donated their time to come to Eugene for a week to share their ideas with the students and their fellow lecturers.
The schedule of lectures is posted on the web site; all of the lectures were taped and are available on the web. In addition many speakers provided course notes, software, and other backing materials that are also available online. So even if you were not able to attend, you can still benefit from the summer school, and perhaps feel more motivated to come next summer. Greg and I will be organizing, in consultation with Zena. Applying the principle “don’t fix what isn’t broken”, we do not anticipate major changes, but there is always room for improvement and the need to freshen up the content every year. For me the central idea of the summer school is the applicability of deep theory to everyday practice. Long a dream held by researchers such as me, these connections become more “real” every year as the theoretical abstractions of yesterday become the concrete practices of today. It’s breathtaking to see how far we’ve come from the days when I was a student just beginning to grasp the opportunities afforded by ideas from proof theory, type theory, and category theory (the Holy Trinity) to building beautiful software systems. No longer the abstruse fantasies of mad (computer) scientists, these ideas are the very air we breathe in PL research. Gone are the days of ad hoc language designs done in innocence of the foundations on which they rest. Nowadays serious industrial-strength languages are emerging that are grounded in theory and informed by practice.
Two examples have arisen just this summer, Rust (from Mozilla) and Swift (from Apple), that exemplify the trend. Although I have not had time to study them carefully, much less write serious code using them, it is evident from even a brief review of their web sites that these are serious languages that take account of the academic developments of the last couple of decades in formulating new language designs to address new classes of problems that have arisen in programming practice. These languages are type safe, a basic criterion of sensibility, and feature sophisticated type systems that include ideas such as sum types, which have long been missing from commercial languages, or provided only in comically obtuse ways (such as objects). The infamous null pointer mistakes have been eradicated, and the importance of pattern matching (in the sense of the ML family of languages) is finally being appreciated as the cure for Boolean blindness. For once I can look at new industrial languages without an overwhelming sense of disappointment, but instead with optimism and enthusiasm that important ideas are finally, at long last, being recognized and adopted. As has often been observed, it takes 25 years for an academic language idea to make it into industrial practice. With Java it was simply the 1970’s idea of automatic storage management; with languages such as Rust and Swift we are seeing ideas from the 80’s and 90’s make their way into industrial practice. It’s cause for celebration, and encouragement for those entering the field: the right ideas do win out in the end, one just has to have the courage to be irrelevant.
I hope to find the time to comment more meaningfully on the recent developments in practical programming languages, including Rust and Swift, but also languages such as Go and OCaml that are also making inroads into programming practice. (The overwhelming success and future dominance of Haskell is self-evident. Kudos!) But for now, let me say that the golden age of programming language research is here and now, and promises to continue indefinitely.
Update: word smithing.
The spherical PDF we constructed by kernel density estimation in the article before last appeared to have “bumps”, i.e. it’s not uniform in $\theta$ and $\varphi$. We’d like to interpret these bumps as preferred regimes of atmospheric flow, but before we do that, we need to decide whether these bumps are significant. There is a huge amount of confusion that surrounds this idea of significance, mostly caused by blind use of “standard recipes” in common data analysis cases. Here, we have some data analysis that’s anything but standard, and that will rather paradoxically make it much easier to understand what we really mean by significance.
So what do we mean by “significance”? A phenomenon is significant if it is unlikely to have occurred by chance. Right away, this definition raises two questions. First, chance implies some sort of probability, a continuous quantity, which leads to the idea of different levels of significance. Second, if we are going to be thinking about probabilities, we are going to need to talk about a distribution for those probabilities. The basic idea is thus to compare our data to a distribution that we explicitly decide based on a null hypothesis. A bump in our PDF will be called significant if it is unlikely to have occurred in data generated under the assumptions in our null hypothesis.
So, what’s a good null hypothesis in this case? We’re trying to determine whether these bumps are a significant deviation from “nothing happening”. In this case, “nothing happening” would mean that the data points we use to generate the PDF are distributed uniformly over the unit sphere parametrised by $\theta$ and $\varphi$, i.e. that no point in $(\theta, \varphi)$ space is any more or less likely to occur than any other. We’ll talk more about how we make use of this idea (and how we sample from our “uniform on the sphere” null hypothesis distribution) below.
We thus want some way of comparing the PDF generated by doing KDE on our data points to PDFs generated by doing KDE on “fake” data points sampled from our null hypothesis distribution. We’re going to follow a sampling-based approach: we will generate “fake” data sets, do exactly the same analysis we did on our real data points to produce “fake” PDFs, then compare these “fake” PDFs to our real one (in a way that will be demonstrated below).
There are a couple of important things to note here, which I usually think of under the heading of “minimisation of cleverness”. First, we do exactly the same analysis on our “fake” data as we do on our “real” data. That means that we can treat arbitrarily complex chains of data analysis without drama: here, we’re doing KDE, which is quite complicated from a statistical perspective, but we don’t really need to think about that, because the fact that we treat the fake and real data sets identically means that we’re comparing like with like and the complexity just “divides out”. Second, because we’re simulating, in the sense that we generate fake data based on a null hypothesis on the data and run it through whatever data analysis steps we’re doing, we make any assumptions that go into our null hypothesis perfectly explicit. If we assume Gaussian data, we can see that, because we’ll be sampling from a Gaussian to generate our fake data. If, as in this case, our null hypothesis distribution is something else, we’ll see that perfectly clearly because we sample directly from that distribution to generate our fake data.
This is a huge contrast to the usual case for “normal” statistics, where one chooses some standard test ($<semantics>t<annotation\; encoding="application/x-tex">t</annotation></semantics>$-test, Mann-Whitney test, Kolmogorov-Smirnov test, and so on) and turns a crank to produce a test statistic. In this case, the assumptions inherent in the form of the test are hidden – a good statistician will know what those assumptions are and will understand the consequences of them, but a bad statistician (I am a bad statistician) won’t and will almost certainly end up applying tests outside of the regime where they are appropriate. You see this all the time in published literature: people use tests that are based on the assumption of Gaussianity on data that clearly isn’t Gaussian, people use tests that assume particular variance structures that clearly aren’t correct, and so on. Of course, there’s a very good reason for this. The kind of sampling-based strategy we’re going to use here needs a lot of computing power. Before compute power was cheap, standard tests were all that you could do. Old habits die hard, and it’s also easier to teach a small set of standard tests than to educate students in how to design their own sampling-based tests. But we have oodles of compute power, we have a very non-standard situation, and so a sampling-based approach allows us to sidestep all the hard thinking we would have to do to be good statisticians in this sense, which is what I meant by “minimisation of cleverness”.
So, we’re going to do sampling-based significance testing here. It is shockingly easy to do and, if you’ve been exposed to the confusion of “traditional” significance testing, shockingly easy to understand.
In order to test the significance of the bumps we see in our spherical PDF, we need some way of sampling points from our null hypothesis distribution, i.e. from a probability distribution that is uniform on the unit sphere. The simplest way to do this is to sample points from a spherically symmetric three-dimensional probability distribution then project those points onto the unit sphere. The most suitable three-dimensional distribution, at least from the point of view of convenience, is a three-dimensional Gaussian distribution with zero mean and unit covariance matrix. This is particularly convenient because if we sample points $\mathbf{u} = (x, y, z)$ from this distribution, each of the coordinates $x$, $y$ and $z$ is individually distributed as a standard Gaussian, i.e. $x \sim \mathcal{N}(0, 1)$, $y \sim \mathcal{N}(0, 1)$, $z \sim \mathcal{N}(0, 1)$. To generate $N$ random points uniformly distributed on the unit sphere, we can thus just generate $3N$ standard random deviates, partition them into 3-vectors and normalise each vector.
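The recipe is simple enough to sketch from scratch before looking at the post's mwc-random code below. The following is a self-contained illustration (mine): a tiny linear congruential generator feeds Box-Muller to produce standard normal deviates, which are grouped into 3-vectors and projected onto the sphere. All names here are my own, and a real implementation should use a proper RNG library:

```haskell
import Data.Word (Word32)
import Data.List (unfoldr)

-- Numerical-Recipes-style LCG; arithmetic wraps modulo 2^32.
lcg :: Word32 -> Word32
lcg s = 1664525 * s + 1013904223

-- Uniform deviates in (0, 1); the +1 / 2^32+1 shift avoids exactly 0.
uniforms :: Word32 -> [Double]
uniforms = map (\w -> (fromIntegral w + 1) / 4294967297) . tail . iterate lcg

-- Box-Muller: two uniform deviates give two standard normal deviates.
boxMuller :: Double -> Double -> (Double, Double)
boxMuller u1 u2 = (r * cos t, r * sin t)
  where r = sqrt (-2 * log u1)
        t = 2 * pi * u2

chunksOf :: Int -> [a] -> [[a]]
chunksOf n = unfoldr (\xs -> if null xs then Nothing else Just (splitAt n xs))

normals :: Word32 -> [Double]
normals seed =
  concatMap (\[u1, u2] -> let (a, b) = boxMuller u1 u2 in [a, b])
            (chunksOf 2 (uniforms seed))

-- Partition the normal deviates into 3-vectors and normalise each,
-- giving points uniformly distributed on the unit sphere.
spherePts :: Int -> Word32 -> [(Double, Double, Double)]
spherePts n seed =
  take n [ (x / m, y / m, z / m)
         | [x, y, z] <- chunksOf 3 (normals seed)
         , let m = sqrt (x*x + y*y + z*z) ]

main :: IO ()
main = print (length (spherePts 100 42))
-- prints 100
```

Every generated point lies on the unit sphere by construction, and the spherical symmetry of the 3D Gaussian guarantees uniformity of the projected points.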
Haskell code to do this sampling using the mwc-random package is shown below – here, nData is the number of points we want to sample, and the randPt function generates a single normalised $(x, y, z)$ point as a Haskell 3-tuple (as usual, the code is in a Gist; this is from the make-unif-pdf-sample.hs program):
-- Random data point generation.
gen <- create
let randPt gen = do
      unnorm <- SV.replicateM 3 (standard gen)
      let mag = sqrt $ unnorm `dot` unnorm
          norm = scale (1.0 / mag) unnorm
      return (norm ! 0, norm ! 1, norm ! 2)

-- Create random data points, flatten to vector and allocate on device.
dataPts <- replicateM nData (randPt gen)
If we sample the same number of points from this distribution that we have in our real data and then use the same KDE approach that we used for the real data to generate an empirical PDF on the unit sphere, what do we get? Here’s what one distribution generated by this procedure looks like, using the same colour scale as the “real data” distribution in the earlier article to aid in comparison (darker colours show regions of greater probability density):
We can see that our sample from the null hypothesis distribution also has “bumps”, although they seem to be less prominent than the bumps in PDF for our real data. Why do we see bumps here? Our null hypothesis distribution is uniform, so why is the simulated empirical PDF bumpy? The answer, of course, is sampling variation. If we sample 9966 points on the unit sphere, we are going to get some clustering of points (leading to bumps in the KDE-derived distribution) just by chance. Those chance concentrations of points are what lead to the bumps in the plot above.
What we ultimately want to do then is to answer the question: how likely is it that the bumps in the distribution of our real data could have arisen by chance, assuming that our real data arose from a process matching our null hypothesis?
The way we’re going to answer the question posed in the last section is purely empirically. We’re going to generate empirical distributions (histograms) of the possible values of the null hypothesis distribution to get a picture of the sampling variability that is possible, then we’re going to look at the values of our “real data” distribution and calculate the proportion of the null hypothesis distribution values less than the real data distribution values. This will give us the probability that our real data distribution could have arisen by chance if the data really came from the null hypothesis distribution.
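The core comparison is just a counting exercise; a hedged Haskell sketch (the names are mine, not from the post's code):

```haskell
-- Fraction of null-hypothesis samples strictly below the observed
-- value: the empirical probability that a value this large could
-- arise under the null hypothesis by sampling variation alone.
significance :: [Double] -> Double -> Double
significance nullSamples observed =
  fromIntegral (length (filter (< observed) nullSamples))
    / fromIntegral (length nullSamples)

main :: IO ()
main = print (significance [1 .. 100] 95.5)
-- prints 0.95
```

A value of 0.95 here means the observed value exceeds 95% of the null-hypothesis realisations at that grid point.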
In words, it sounds complicated. In reality and in code, it’s not. First, we generate a large number of realisations of the null hypothesis distribution, by sampling points on the unit sphere and using KDE to produce PDFs from those point distributions in exactly the same way that we did for our real data, as shown here (code from the make-hist.hs program):
-- Generate PDF realisations.
pdfs <- forM [1..nrealisations] $ \r -> do
  putStrLn $ "REALISATION: " ++ show r
  -- Create random data points.
  dataPts <- SV.concat <$> replicateM nData (randPt gen)
  SV.unsafeWith dataPts $ \p -> CUDA.pokeArray (3 * nData) p dDataPts
  -- Calculate kernel values for each grid point/data point
  -- combination and accumulate into grid.
  CUDA.launchKernel fun gridSize blockSize 0 Nothing
    [CUDA.IArg (fromIntegral nData), CUDA.VArg dDataPts, CUDA.VArg dPdf]
  CUDA.sync
  res <- SVM.new (ntheta * nphi)
  SVM.unsafeWith res $ \p -> CUDA.peekArray (ntheta * nphi) dPdf p
  unnormv <- SV.unsafeFreeze res
  let unnorm = reshape nphi unnormv
  -- Normalise.
  let int = dtheta * dphi * sum (zipWith doone sinths (toRows unnorm))
  return $ cmap (realToFrac . (/ int)) unnorm :: IO (Matrix Double)
(This is really the key aspect of this sampling-based approach: we perform exactly the same data analysis on the test data sampled from the null hypothesis distribution that we perform on our real data.) We generate 10,000 realisations of the null hypothesis distribution (stored in the pdfs value), in this case using CUDA to do the actual KDE calculation, so that it doesn’t take too long.
Then, for each spatial point on our unit sphere, i.e. each point in the $(\theta, \varphi)$ grid that we’re using, we collect all the values of our null hypothesis distribution – this is the samples value in this code:
-- Convert to per-point samples and generate per-point histograms.
let samples = [SV.generate nrealisations (\s -> (pdfs !! s) ! i ! j) :: V |
               i <- [0..ntheta-1], j <- [0..nphi-1]]
    (rngmin, rngmax) = range nbins $ SV.concat samples
    hists = map (histogram_ nbins rngmin rngmax) samples :: [V]
    step i = rngmin + d * fromIntegral i
    d = (rngmax - rngmin) / fromIntegral nbins
    bins = SV.generate nbins step
For each point on our grid on the unit sphere, we then calculate a histogram of the samples from the 10,000 empirical PDF realisations, using the histogram_ function from the statistics package. We use the same bins for all the histograms to make life easier in the next step.
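The shared-bin behaviour can be sketched in plain Haskell (a simplified stand-in of my own, not the statistics package implementation):

```haskell
-- Count samples into nbins equal-width bins spanning [lo, hi);
-- out-of-range values are clamped into the first or last bin.
histogram :: Int -> Double -> Double -> [Double] -> [Int]
histogram nbins lo hi xs =
  [ length (filter ((== b) . bin) xs) | b <- [0 .. nbins - 1] ]
  where
    width = (hi - lo) / fromIntegral nbins
    bin x = min (nbins - 1) (max 0 (floor ((x - lo) / width)))

main :: IO ()
main = print (histogram 4 0 4 [0.5, 1.5, 1.5, 3.9])
-- prints [1,2,0,1]
```

Because `lo` and `hi` are fixed across all grid points, bin indices are directly comparable between histograms, which is what the next step relies on.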
There’s one thing that’s worth commenting on here. You might think that we’re doing excess work. Our null hypothesis distribution is spherically symmetric, so shouldn’t the histograms be the same for all points on the unit sphere? Well, should they? Or might the exact distribution of samples depend on $\theta$, since the $(\theta, \varphi)$ grid cells will be smaller at the poles of our unit sphere than at the equator? Well, to be honest, I don’t know. And I don’t really care. By taking the approach I’m showing you here, I don’t need to worry about that question, because I’m generating independent histograms for each grid cell on the unit sphere, so my analysis is immune to any effects related to grid cell size. Furthermore, this approach also enables me to change my null hypothesis if I want to, without changing any of the other data analysis code. What if I decide that this spherically symmetric null hypothesis is too weak? What if I want to test my real data against the hypothesis that there is a spherically symmetric background distribution of points on my unit sphere, plus a couple of bumps (of specified amplitude and extent) representing what I think are the most prominent patterns of atmospheric flow? That’s quite a complicated null hypothesis, but as long as I can define it clearly and sample from it, I can use exactly the same data analysis process that I’m showing you here to evaluate the significance of my real data compared to that null hypothesis. (And sampling from a complicated distribution is usually easier than doing anything else with it. In this case, I might say what proportion of the time I expect to be in each of my bump or background regimes; for the background I can sample uniformly on the sphere, and for the bumps I can sample from a Kent distribution¹.)
Once we have the histograms for each grid point on the unit sphere, we can calculate the significance of the values of the real data distribution (this is from the make-significance.hs program – I split these things up to make checking what was going on during development easier):
-- Split histogram values for later processing.
let nhistvals = SV.length histvals
    oneslice i = SV.slice i nbin histvals
    histvecs = map oneslice [0, nbin .. nhistvals - nbin]
    hists = A.listArray ((0, 0), (ntheta-1, nphi-1)) histvecs
    nrealisations = SV.sum $ hists A.! (0, 0)

-- Calculate significance values.
let doone :: CDouble -> CDouble -> CDouble
    doone dith diph =
      let ith = truncate dith ; iph = truncate diph
          pdfval = pdf ! ith ! iph
          hist = hists A.! (ith, iph)
          pdfbin0 = truncate $ (pdfval - minbin) / binwidth
          pdfbin = pdfbin0 `max` 0 `min` nbin - 1
      in (SV.sum $ SV.take pdfbin hist) / nrealisations
    sig = build (ntheta, nphi) doone :: Matrix CDouble
We read the histograms into the histvals value from an intermediate NetCDF file and build an array of histograms indexed by grid cell indexes in the $\theta$ and $\varphi$ directions. Then, for each grid cell, we determine which histogram bin the relevant value from the real data distribution falls into and sum the histogram values from the corresponding histogram from all the bins smaller than the real data value bin. Dividing this sum by the total number of null hypothesis distribution realisations used to construct the histograms gives us the fraction of null hypothesis distribution values for this grid cell that are smaller than the actual value from the real data distribution.
For instance, if the real data distribution value is greater than 95% of the values generated by the null hypothesis distribution simulation, then we say that we have a significance level of 95% at that point on the unit sphere. We can plot these significance levels in the same way that we’ve been plotting the spherical PDFs. Here’s what those significance levels look like, choosing contour levels for the plot to highlight the most significant regions, i.e. the regions least likely to have occurred by chance if the null hypothesis is true:
In particular, we see that each of the three bumps picked out with labels in the “real data” PDF plot in the earlier article are among the most significant regions of the PDF according to this analysis, being larger than 99.9% of values generated from the null hypothesis uniform distribution.
It’s sort of traditional to try to use some other language to talk about these kinds of results, giving specific terminological meanings to the words “significance levels” and “$p$-values”, but I prefer to keep away from that because, as was the case for the terminology surrounding PCA, the “conventional” choices of words are often confusing, either because no-one can agree on what the conventions are (as for PCA) or the whole basis for setting up the conventions is confusing. In the case of hypothesis testing, there are still papers being published in statistical journals arguing about what significance and $p$-values and hypothesis testing really mean, nearly 100 years after these ideas were first outlined by Ronald Fisher and others. I’ve never been sure enough about what all this means to be comfortable using the standard terminology, but the sampling-based approach we’ve used here makes it much harder to get confused – we can say “our results are larger than 99.9% of results that could be encountered as a result of sampling variability under the assumptions of our null hypothesis”, which seems quite unambiguous (if a little wordy!).
In the next article we’ll take a quick look at what these “bumps” in our “real data” PDF represent in terms of atmospheric flow.
I still get compliments on and criticisms of my post from three years ago (can it possibly be that long?) on parallelism and concurrency. In that post I offered a “top down” argument to the effect that these are different abstractions with different goals: parallelism is about exploiting computational resources to maximize efficiency, concurrency is about non-deterministic composition of components in a system. Parallelism never introduces bugs (the semantics is identical to the sequential execution), but concurrency could be said to be the mother lode of all bugs (the semantics of a component changes drastically, without careful provision, when composed concurrently with other components). From this point of view the two concepts aren’t comparable, yet relatively few people seem to accept the distinction, or, even if they do, do not accept the terminology.
Here I’m going to try a possible explanation of why the two concepts, which seem separable to me, may seem inseparable to others.
I think that it is to do with scheduling.
One view of parallelism is that it’s just talk for concurrency, because all you do when you’re programming in parallel is fork off some threads, and then do something with their results when they’re done. I’ve previously argued that parallelism is about cost, but let’s leave that aside. It’s unarguable that a parallel computation does consist of a bunch of, well, parallel computations, and so it is about concurrency. I’ve previously argued that that’s not a good way to think about concurrency either, but let’s leave that aside as well. So, the story goes, concurrency and parallelism are synonymous, and people like me are just creating confusion.
Perhaps that is true, but here’s why it may not be a good idea to think of parallelism this way. Scheduling as you learned about it in OS class (for example) is altogether different from scheduling for parallelism. There are two aspects of OS-like scheduling that I think are relevant here. First, it is non-deterministic, and second, it is competitive. Non-deterministic, because you have little or no control over what runs when or for how long. A beast like the Linux scheduler is controlled by a zillion “voodoo parameters” (a turn of phrase borrowed from my queueing theory colleague, Mor Harchol-Balter), and who the hell knows what is going to happen to your poor threads once they’re in its clutches. Second, and more importantly, an OS-like scheduler is allocating resources competitively. You’ve got your threads, I’ve got my threads, and we both want ours to get run as soon as possible. We’ll even pay for the privilege (priorities) if necessary. The scheduler, and the queueing theory behind it, is designed to optimize resource usage on a competitive basis, taking account of quality of service guarantees purchased by the participants. It does not matter whether there is one processor or one thousand processors, the schedule is unpredictable. That’s what makes concurrent programming hard: you have to program against all possible schedules. And that’s why it’s hard to prove much about the time or space complexity of your program when it’s implemented concurrently.
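A toy illustration of "programming against all possible schedules" (my example, not from the post): even this tiny program has two valid outputs, because the order in which the two threads reach the shared MVar is entirely up to the runtime's scheduler.

```haskell
import Control.Concurrent
import Control.Monad (replicateM_)

-- Two threads race to prepend their name to a shared list; the final
-- order depends entirely on the scheduler, so the program has two
-- possible outputs, both of which a correct client must handle.
main :: IO ()
main = do
  box  <- newMVar []
  done <- newEmptyMVar
  let worker name = do
        modifyMVar_ box (return . (name :))
        putMVar done ()
  mapM_ (forkIO . worker) ["a", "b"]
  replicateM_ 2 (takeMVar done)
  readMVar box >>= print   -- either ["a","b"] or ["b","a"]
```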
Parallel scheduling is a whole ‘nother ball of wax. It is (usually, but not necessarily) deterministic, so that you can prove bounds on its efficiency (Brent-type theorems, as discussed in a previous post and in PFPL). And, more importantly, it is cooperative in the sense that all threads are working together for the same computation towards the same ends. The threads are scheduled so as to get the job (there’s only one) done as quickly and as efficiently as possible. Deterministic schedulers for parallelism are the most common, because they are the easiest to analyze with respect to their time and space bounds. Greedy schedulers, which guarantee to maximize use of available processors, never leaving any idle when there is work to be done, form an important class for which the simple form of Brent’s Theorem is obvious.
Many deterministic greedy scheduling algorithms are known, of which I will mention p-DFS and p-BFS, which do p-at-a-time depth- and breadth-first search of the dependency graph, and various forms of work-stealing schedulers, pioneered by Charles Leiserson at MIT. (Incidentally, if you don’t already know what p-DFS or p-BFS are, I’ll warn you that they are a little trickier than they sound. In particular p-DFS uses a data structure that is sort of like a stack but is not a stack.) These differ significantly in their time bounds (for example, work stealing usually involves expectation over a random variable, whereas the depth- and breadth-first traversals do not), and differ dramatically in their space complexity. For example, p-BFS is absolutely dreadful in its space complexity. (For a full discussion of these issues in parallel scheduling, I recommend Dan Spoonhower’s PhD Dissertation. His semantic profiling diagrams are amazingly beautiful and informative!)
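To make the idea of a greedy, deterministic scheduler concrete, here is a small sketch of my own (not taken from any of the cited works): it runs up to p ready tasks per round over a dependency graph. For the diamond-shaped graph below, two processors finish in three rounds while one processor needs four, consistent with the W/p + D flavour of Brent's Theorem.

```haskell
import qualified Data.Map as Map
import           Data.Map (Map)

-- For each task, the tasks it depends on (the graph must be acyclic).
type Deps = Map Int [Int]

-- Greedy p-at-a-time schedule: every round runs up to p ready tasks
-- (tasks whose dependencies have all completed), never idling a
-- processor while work is available. Taking ready tasks in a fixed
-- order keeps the schedule deterministic.
schedule :: Int -> Deps -> [[Int]]
schedule p deps = go (Map.keys deps) []
  where
    go [] _ = []
    go pending done =
      let ready = [t | t <- pending, all (`elem` done) (deps Map.! t)]
          batch = take p ready
      in batch : go (filter (`notElem` batch) pending) (done ++ batch)

-- A diamond: task 1 first, then 2 and 3 in either order, then 4.
diamond :: Deps
diamond = Map.fromList [(1, []), (2, [1]), (3, [1]), (4, [2, 3])]

main :: IO ()
main = do
  print (schedule 2 diamond)  -- 3 rounds on 2 processors
  print (schedule 1 diamond)  -- 4 rounds on 1 processor
```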
So here’s the thing: when you’re programming in parallel, you don’t just throw some threads at some non-deterministic competitive scheduler. Rather, you generate an implicit dependency graph that a cooperative scheduler uses to maximize efficiency, end-to-end. At the high level you do an asymptotic cost analysis without considering platform parameters such as the number of processors or the nature of the interconnect. At the low level the implementation has to validate that cost analysis by using clever techniques to ensure that, once the platform parameters are known, maximum use is made of the computational resources to get your job done for you as fast as possible. Not only are there no bugs introduced by the mere fact of being scheduled in parallel, but even better, you can prove a theorem that tells you how fast your program is going to run on a real platform. Now how cool is that?
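In Haskell this determinism is directly visible: with the `parallel` package (an assumption here – the snippet needs that library installed), `parMap` farms the work out across however many processors are available, yet the result is identical to the sequential `map`, exactly the "no bugs introduced by being scheduled in parallel" property.

```haskell
import Control.Parallel.Strategies (parMap, rdeepseq)

-- Deliberately naive, so there is real work to parallelise.
fib :: Int -> Integer
fib n | n < 2     = toInteger n
      | otherwise = fib (n - 1) + fib (n - 2)

-- The same answer on 1 core or 1000: only the running time changes.
main :: IO ()
main = print (sum (parMap rdeepseq fib [20 .. 25]))
```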
[Update: word-smithing.]
[Update: more word-smithing for clarity and concision.]
Similarly to how Bryan O’Sullivan got side-tracked over five years ago, I recently found myself wishing a library existed to more easily deal with monad transformers.
There are quite a few libraries that try and provide more convenient ways of dealing with monad transformers (typically using those defined in transformers so as to avoid re-defining them all the time and to provide inter-compatibility): the old standard of mtl, the type-family variant found in monads-tf, the more ambitious layers package and Roman Cheplyaka’s monad-classes work.
However, I found that none of the libraries I could find really satisfied me. Even layers and monad-classes – which aim to simplify/remove the quadratic instance problem – still require a “catch-all” default instance for all other monad transformers. Ideally for me, if I want to define a new transformer class, then I should only need to define instances for transformers that directly implement its functionality.
As such, I’m pleased to announce the first (alpha-level) release of my new library: monad-levels.
Originally, all I wanted was to be able to lift operations in a base monad up through any transformers I might stack on top of it.
We already have MonadIO; I just need to generalise it to work on any monad, right?
Except that I didn’t want to just lift a single monad up through the stack: I wanted to be able to convert a function on my base monad up to whatever set of transformers I had stacked on top of it. So I resigned myself to writing out instances for every existing transformer in the transformers library.
As I started doing so though, I noticed a common pattern: for each method in the instance, I would be using a combination of the following operations (using StateT as an example):

- wrap: apply the monad transformer (e.g. m (a,s) → StateT s m a)
- unwrap: remove the monad transformer (e.g. StateT s m a → m (a,s))
- addInternal: add the internal per-transformer specific state (e.g. m a → m (a,s))

In particular, wrap is used everywhere, unwrap is used when lowering existing monads down so that they can (eventually) be used in the base monad, and addInternal is used when lifting monadic values.
Thus, if I define this as a type class for monad transformers, then I could use the wonderful DefaultSignatures extension to simplify defining all the instances, and what’s more, such a class would be re-usable.
Generally, the definitions of unwrap and addInternal require information from within the scope of the transformer (e.g. the s parameter within StateT); as such, wrap ends up being a continuation function. I thus came up with the following class:
class (Monad m) => MonadLevel m where
  type LowerMonad m :: * -> *
  type InnerValue m a :: *

  -- A continuation-based approach for how to lift/lower a monadic value.
  wrap :: (    (m a -> LowerMonad m (InnerValue m a))              -- unwrap
            -> (LowerMonad m a -> LowerMonad m (InnerValue m a))   -- addInternal
            -> LowerMonad m (InnerValue m a))
       -> m a
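To see how the continuation shape plays out, here is a sketch of what an instance for StateT might look like, written against the class exactly as shown above (this is my own illustration, not the actual monad-levels source), together with a hypothetical liftL built from wrap alone:

```haskell
{-# LANGUAGE TypeFamilies #-}
import Control.Monad.Trans.State (StateT (..))

class (Monad m) => MonadLevel m where
  type LowerMonad m :: * -> *
  type InnerValue m a :: *
  wrap :: (    (m a -> LowerMonad m (InnerValue m a))              -- unwrap
            -> (LowerMonad m a -> LowerMonad m (InnerValue m a))   -- addInternal
            -> LowerMonad m (InnerValue m a))
       -> m a

instance (Monad m) => MonadLevel (StateT s m) where
  type LowerMonad (StateT s m)   = m
  type InnerValue (StateT s m) a = (a, s)
  wrap f = StateT $ \s ->
    f (\m -> runStateT m s)           -- unwrap: run with the current state
      (fmap (\a -> (a, s)))           -- addInternal: pair with the state

-- A hypothetical lift built purely from 'wrap': ignore unwrap, and use
-- addInternal to give the lower-monad value the transformer's state.
liftL :: (MonadLevel m) => LowerMonad m a -> m a
liftL m = wrap (\_unwrap addInternal -> addInternal m)

main :: IO ()
main = print =<< runStateT (liftL (return 5 :: IO Int)) "go"
```

Note how the continuation closes over the current state `s`, which is exactly the "information from within the scope of the transformer" that forces wrap into continuation form.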
(Note that I’m not using MonadTrans for this as I also wanted to be able to use this with newtype wrappers.)
So I define this class, use DefaultSignatures, and my instances – whilst still needing to be explicitly defined – become much simpler (and in many/most cases empty)!
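The DefaultSignatures trick in isolation looks like this (a generic GHC feature, illustrated here with a toy class of my own rather than anything from monad-levels): the default only type-checks under an extra constraint, so instances that satisfy it can be left completely empty.

```haskell
{-# LANGUAGE DefaultSignatures #-}

class Describe a where
  describe :: a -> String
  -- The default is only available when the instance type is also Show:
  default describe :: Show a => a -> String
  describe = show

data P = P Int Int deriving Show

-- An empty instance: the default signature fills in the method.
instance Describe P

main :: IO ()
main = putStrLn (describe (P 1 2))
```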
Whilst I was looking to see if any existing libraries had something similar (layers came the closest, but it uses multiple classes and requires being able to specify function inverses when using it), I came across Roman Cheplyaka’s blog post on how monad-control uses closed type families to automatically recursively lift monads down to a monad that satisfies the required constraint. I became intrigued with this, and wondered if it would be possible to achieve this for any constraint (more specifically, something of kind (* → *) → Constraint) rather than using something that was almost identical for every possible monad transformer class.
So I wrote a prototype that made it seem as if this would indeed work (note that I used the term “lower” rather than “lift”, as I saw it as lowering operations on the overall monadic stack down to where the constraint would be satisfied):
data Nat = Zero | Suc Nat

class SatisfyConstraint (n :: Nat) (m :: * -> *) (c :: (* -> *) -> Constraint) where
  _lower :: Proxy c -> Proxy n -> (forall m'. (c m') => m' a) -> m a

instance (ConstraintSatisfied c m ~ True, c m) => SatisfyConstraint Zero m c where
  _lower _ _ m = m

instance (MonadLevel m, SatisfyConstraint n (LowerMonad m) c)
         => SatisfyConstraint (Suc n) m c where
  _lower _ _ m = wrap (\ _unwrap addI ->
                          addI (_lower (Proxy :: Proxy c) (Proxy :: Proxy n) m))
(This is a simplified snippet: for more information – including where the ConstraintSatisfied definition comes from – see here.)
With this, you also get liftBase for free! However, if all I wanted was a function just to lift a value in the base monad up the stack, then I could have used a much simpler definition. For this to actually be useful, I have to be able to write (semi-)arbitrary functions and lift/lower them as well.
I could just go back to my original plan and use MonadLevel combined with DefaultSignatures and not bother with this automatic lifting/lowering business… but I’ve already started, and in for a penny, in for a pound. So full steam ahead!
It took a while to sort out how it would work (dealing with State and Reader was easy; having to extend how this worked for Cont took quite a while, and then even more for Writer), but monad-levels is now able to deal with arbitrary monadic functions.
Well… I say arbitrary…
To be able to deal with functions, you first need to use the provided sub-language to specify the type of the function. For example, a basic function of type m a → m a is specified as Func MonadicValue (MkVarFnFrom MonadicValue) (or more simply as just MkVarFn MonadicValue, using the inbuilt simplification that most such functions will return a value of type m a); something more complicated like CallCC becomes MkVarFn (Func (Func ValueOnly (MonadicOther b)) MonadicValue).
This language of lower-able functions is used to be able to know how to convert arguments and results up and down the monadic stack.
I’m finally releasing this library after being able to successfully replicate all the existing monad transformer classes in mtl (with the exception of the deprecated MonadError class). As an example, here is the equivalent to MonadCont:
import Control.Monad.Levels
import Control.Monad.Levels.Constraints

import           Control.Monad.Trans.Cont (ContT (..))
import qualified Control.Monad.Trans.Cont as C
import           Control.Monad.Trans.List (ListT)

-- | A simple class just to match up with the 'ContT' monad
--   transformer.
class (MonadLevel m) => IsCont m where
  -- Defined just to have it based upon the constraint
  _callCC :: CallCC m a b

instance (MonadTower m) => IsCont (ContT r m) where
  _callCC = C.callCC

instance ValidConstraint IsCont where
  type ConstraintSatisfied IsCont m = IsContT m

type family IsContT m where
  IsContT (ContT r m) = True
  IsContT m           = False

-- | Represents monad stacks that can successfully pass 'callCC' down
--   to a 'ContT' transformer.
type HasCont m a b = SatisfyConstraintF IsCont m a (ContFn b)

-- | This corresponds to @CallCC@ in @transformers@.
type ContFn b = MkVarFn (Func (Func ValueOnly (MonadicOther b)) MonadicValue)

-- This is defined solely as an extra check on 'ContFn' matching the
-- type of 'C.callCC'.
type CallCC m a b = VarFunction (ContFn b) m a

-- Not using CallCC here to avoid having to export it.

-- | @callCC@ (call-with-current-continuation) calls a function with
--   the current continuation as its argument.
callCC :: forall m a b. (HasCont m a b) => ((a -> m b) -> m a) -> m a
callCC = lowerSat c vf m a _callCC
  where
    c :: Proxy IsCont
    c = Proxy

    vf :: Proxy (ContFn b)
    vf = Proxy

    m :: Proxy m
    m = Proxy

    a :: Proxy a
    a = Proxy

-- By default, ListT doesn't allow arbitrary constraints through; with
-- this definition it is now possible to use 'callCC' on
-- @ListT (ContT r m) a@.
instance (MonadTower m) => ConstraintPassThrough IsCont (ListT m) True
One thing that should be obvious is that the constraint is a tad more complicated than that required for MonadCont. Specifically, it requires the a and b parameters as well; this is because not all instances of MonadLevel allow dealing with arbitrary other monadic values (that is, we’re dealing with m a overall, but we also need to consider m b in this case). In practice, however, the only existing monad transformer with this constraint is ContT itself, and you can’t pass through a call to callCC from one ContT transformer to another (as there’s no way to distinguish between the two).
(Something that might not be obvious is that the interaction between StateT – both lazy and strict – and how I’ve defined callCC differs from how it’s defined in mtl. Hence why I started this thread on Haskell Cafe.)
But any monad transformer in the transformers library that is an instance of MonadCont also satisfies the requirements for the HasCont constraint; furthermore, just by making it an instance of MonadLevel, any new transformer (including a newtype wrapper over a monadic stack) will also automatically satisfy the constraint!
There are two main sources of problems currently with monad-levels.
I have no idea how it compares speed- and memory-wise to mtl; as it uses a lot of type families, explicit dictionary passing, etc., I expect it to be slower, but I haven’t compared it or investigated if there’s anywhere I can improve it.
I’m not sure of all the names (e.g. MkVarFn and MkVarFnFrom for dealing with variadic functions probably could be improved); not to mention that there’s probably also room for improvement in terms of what is exported (e.g. should the actual type classes for dealing with variadic arguments be fully exported in case people think of more possible argument types?).
It could also do with a lot more documentation.
These, however, are for the most part just a matter of time (though it might be that the performance one should actually belong to the next category).
The biggest noticeable problem is one of discovery: with mtl, it’s obvious when a transformer is an instance of a particular class; in contrast, with monad-levels there’s no obvious way of looking at the Haddock documentation to tell whether or not this is the case. The best you can do for a specific constraint c and monad m (without trying it in ghci) is to check that its MonadLevel_ definition has AllowOtherValues m ~ True and DefaultAllowConstraints m ~ True (both of which are the defaults) and that the latter hasn’t been overridden with instance ConstraintPassThrough c m ~ False; if so, then it is allowed. (Assuming that the constraint and its functions are sound, and something screwy hasn’t been done like having the monad actually being a loop.)
Something that might also be a problem for some is the complexity: lots of language extensions are used, not to mention a lot of things like Proxy and explicit dictionary passing.
As part of this, type errors can sometimes be difficult to resolve due to the heavy usage of associated types and constraint kinds. Furthermore, as you probably saw in the HasCont definition shown above, you typically need to use ScopedTypeVariables with proxies.
Whilst it most definitely isn’t perfect, I think monad-levels is now in a usable state. As such, I’d appreciate any attempts people make at using it, and any feedback you might have.
This is also the first time I’ve used git and Github for my own project. I missed the simplicity and discoverability of darcs, but magit for Emacs makes using it a bit easier, and in-place branches and re-basing turned out to be quite nice.
After far too long, and far too many obstacles to be overcome, Dave MacQueen, Lars Bergstrom, and I have finally prepared an open-source site for the entire family of languages derived from Standard ML. The defining characteristic of Standard ML has always been that it has a rigorous definition, so that it is always clear what is a valid program and how it should behave. And indeed we have seven different working compilers, all of which are compatible with each other, with the exception of some corner cases arising from known mistakes in the definition. Moreover, there are several active projects developing new variations on the language, and it would be good to maintain the principle that such extensions be precisely defined.
To this end the sources of the 1990 and 1997 versions of the definition are on the web site, with the permission of MIT Press, as is the type-theoretic definition formulated by Stone and H., which was subsequently used as the basis for a complete machine-checked proof of type safety for the entire language done by Crary, Lee, and H. It is to be hoped that the errors in the definition (many are known; we provide links to the extensive lists compiled by Kahrs and Rossberg in separate investigations) may now be corrected. Anyone is free to propose an alteration to be merged into the main branch, which is called “SML, The Living Language” and also known as “Successor ML”. One may think of this as a kind of “third edition” of the definition, but one that is in continual revision by the community. Computer languages, like natural languages, belong to us all collectively, and we all contribute to their evolution.
Everyone is encouraged to create forks for experimental designs or new languages that enrich, extend, or significantly alter the semantics of the language. The main branch will be for generally accepted corrections, modifications, and extensions, but it is to be expected that completely separate lines of development will also emerge.
The web site, sml-family.org is up and running, and will be announced in various likely places very soon.
Update: We have heard that some people get a “parked page” error from GoDaddy when accessing sml-family.org. It appears to be a DNS propagation problem.
Update: The DNS problems have been resolved, and I believe that the web site is stably available now as linked above.
Update: Word smithing for clarity.
While recovering from a knee surgery, I entertained myself by solving a geometry problem from the last International Mathematical Olympiad. My solution, shown below, is an example of using plane transformations (spiral similarity, in this case) to prove geometric statements.
(IMO-2014, P4)
Points \(P\) and \(Q\) lie on side \(BC\) of acute-angled triangle \(ABC\) such that \(\angle PAB=\angle BCA\) and \(\angle CAQ = \angle ABC\). Points \(M\) and \(N\) lie on lines \(AP\) and \(AQ\), respectively, such that \(P\) is the midpoint of \(AM\) and \(Q\) is the midpoint of \(AN\). Prove that the lines \(BM\) and \(CN\) intersect on the circumcircle of triangle \(ABC\).
Let \(\angle BAC = \alpha\).
\[\angle APB = \pi - \angle PAB - \angle PBA = \pi - \angle ACB - \angle CBA = \alpha\]
Let \(B_1\) and \(C_1\) be such points that \(B\) and \(C\) are midpoints of \(AB_1\) and \(AC_1\), respectively.
Consider a spiral similarity \(h\) such that \(h(B_1)=A\) and \(h(B)=C\) (it necessarily exists).
Now we shall prove that \(h(M)=N\), i.e. that \(h\) transforms the green \(\triangle B_1BM\) into the magenta \(\triangle ACN\) .
Being a spiral similarity, \(h\) rotates all lines by the same angle. It maps \(B_1B\) to \(AC\), therefore that angle equals \(\angle(B_1B, AC)=\pi-\alpha\). (We need to be careful to measure all rotations in the same direction; on my drawing it is clockwise.)
\(h(A)=C_1\), since \(h\) preserves length ratios. So \(h(AM)\) (where \(AM\) denotes the line, not the segment) is a line that passes through \(h(A)=C_1\). It also needs to be parallel to \(BC\), because \(\angle (AM,BC)=\pi-\alpha\) is the rotation angle of \(h\). \(C_1B_1\) is the unique such line (\(C_1B_1 \parallel BC\) by the midline theorem).
Since \(h(AM)=C_1B_1\) and \(h(MB_1)=NA\), \[h(M)=h(AM\cap MB_1)=h(AM)\cap h(MB_1)=C_1B_1\cap NA=N.\]
Now that we know that \(h(BM)=CN\), we can deduce that \(\angle BZC=\angle(BM,CN)=\pi-\alpha\) (the rotation angle). And because \(\angle BAC+\angle BZC=\pi\), \(Z\) lies on the circumcircle of \(ABC\).
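As a sanity check on the construction (not part of the proof), the configuration can be verified numerically with complex-number coordinates: the similar triangles △ABP ∼ △CBA and △ACQ ∼ △BCA give BP = AB²/BC and CQ = CA²/CB, which pins down every point, and we can then confirm that the intersection Z of BM and CN is equidistant from the circumcentre with A, B, C.

```haskell
import Data.Complex

type Pt = Complex Double

dist :: Pt -> Pt -> Double
dist u v = magnitude (u - v)

-- Intersection of the line through p with direction u and the line
-- through q with direction v.
intersectL :: Pt -> Pt -> Pt -> Pt -> Pt
intersectL p u q v = p + (t :+ 0) * u
  where
    cross w1 w2 = realPart w1 * imagPart w2 - imagPart w1 * realPart w2
    t = cross (q - p) v / cross u v

main :: IO ()
main = do
  let a = 0 :+ 3; b = (-2) :+ 0; c = 4 :+ 0   -- an acute triangle
      -- BP = AB^2/BC and CQ = CA^2/CB, i.e. fractions AB^2/BC^2 and
      -- CA^2/BC^2 along the segment BC.
      p = b + ((dist a b ** 2 / dist b c ** 2) :+ 0) * (c - b)
      q = c + ((dist c a ** 2 / dist b c ** 2) :+ 0) * (b - c)
      m = 2 * p - a                    -- P is the midpoint of AM
      n = 2 * q - a                    -- Q is the midpoint of AN
      z = intersectL b (m - b) c (n - c)
      -- Circumcentre: intersection of two perpendicular bisectors.
      rot90 w = w * (0 :+ 1)
      o = intersectL ((a + b) / 2) (rot90 (b - a))
                     ((b + c) / 2) (rot90 (c - b))
  print (abs (dist o z - dist o a) < 1e-9)   -- Z lies on the circumcircle
```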
tl;dr A non-nameless term equipped with a map specifying a de Bruijn numbering can support an efficient equality without needing a helper function. More abstractly, quotients are not just for proofs: they can help efficiency of programs too.
The cut. You're writing a small compiler, which defines expressions as follows:
type Var = Int
data Expr = Var Var
          | App Expr Expr
          | Lam Var Expr
Where Var is provided from some globally unique supply. But while working on a common sub-expression eliminator, you find yourself needing to define equality over expressions.
You know the default instance won’t work, since it will not say that Lam 0 (Var 0) is equal to Lam 1 (Var 1). Your colleague Nicolaas teases you that the default instance would have worked if you used a nameless representation, but de Bruijn levels make your head hurt, so you decide to try to write an instance that does the right thing by yourself. However, you run into a quandary:
instance Eq Expr where
  Var v     == Var v'      = v == v'
  App e1 e2 == App e1' e2' = e1 == e1' && e2 == e2'
  Lam v e   == Lam v' e'   = _what_goes_here
If v == v', things are simple enough: just check if e == e'. But if they're not... something needs to be done. One possibility is to rename e' before proceeding, but this results in an equality which takes quadratic time. You crack open the source of one famous compiler, and you find that in fact: (1) there is no Eq instance for terms, and (2) an equality function has been defined with this type signature:
eqTypeX :: RnEnv2 -> Type -> Type -> Bool
Where RnEnv2 is a data structure containing renaming information: the compiler has avoided the quadratic blow-up by deferring any renaming until we need to test variables for equality.
“Well that’s great,” you think, “But I want my Eq instance, and I don’t want to convert to de Bruijn levels.” Is there anything to do?
Perhaps a change of perspective is in order:
The turn. Nicolaas has the right idea: a nameless term representation has a very natural equality, but the type you've defined is too big: it contains many expressions which should be equal but structurally are not. But in another sense, it is also too small.
Here is an example. Consider the term x, which is a subterm of λx. λy. x. The x in this term is free; it is only through the context λx. λy. x that we know it is bound. However, in the analogous situation with de Bruijn levels (not indexes—as it turns out, levels are more convenient in this case) we have 0, which is a subterm of λ λ 0. Not only do we know that 0 is a free variable, but we also know that it binds to the outermost enclosing lambda, no matter the context. With just x, we don’t have enough information!
If you know you don’t know something, you should learn it. If your terms don’t know enough about their free variables, you should equip them with the necessary knowledge:
import qualified Data.Map as Map
import           Data.Map (Map)

data DeBruijnExpr = D Expr NEnv

type Level = Int
data NEnv = N Level (Map Var Level)

lookupN :: Var -> NEnv -> Maybe Level
lookupN v (N _ m) = Map.lookup v m

extendN :: Var -> NEnv -> NEnv
extendN v (N i m) = N (i+1) (Map.insert v i m)
and when you do that, things just might work out the way you want them to:
instance Eq DeBruijnExpr where
  D (Var v) n == D (Var v') n' =
    case (lookupN v n, lookupN v' n') of
      (Just l,  Just l') -> l == l'   -- both bound: compare levels
      (Nothing, Nothing) -> v == v'   -- both free: compare names
      _                  -> False
  D (App e1 e2) n == D (App e1' e2') n' =
    D e1 n == D e1' n' && D e2 n == D e2' n'
  D (Lam v e) n == D (Lam v' e') n' =
    D e (extendN v n) == D e' (extendN v' n')
  _ == _ = False   -- differing constructors are never equal
(Though perhaps Coq might not be able to tell, unassisted, that this function is structurally recursive.)
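Putting the pieces together as a self-contained demo (emptyN is a helper I've added for the purpose; it isn't defined in the post):

```haskell
import qualified Data.Map as Map
import           Data.Map (Map)

type Var = Int
data Expr = Var Var | App Expr Expr | Lam Var Expr

data DeBruijnExpr = D Expr NEnv

type Level = Int
data NEnv = N Level (Map Var Level)

-- The numbering for a term with no enclosing binders.
emptyN :: NEnv
emptyN = N 0 Map.empty

lookupN :: Var -> NEnv -> Maybe Level
lookupN v (N _ m) = Map.lookup v m

extendN :: Var -> NEnv -> NEnv
extendN v (N i m) = N (i+1) (Map.insert v i m)

instance Eq DeBruijnExpr where
  D (Var v) n == D (Var v') n' =
    case (lookupN v n, lookupN v' n') of
      (Just l,  Just l') -> l == l'   -- both bound: compare levels
      (Nothing, Nothing) -> v == v'   -- both free: compare names
      _                  -> False
  D (App e1 e2) n == D (App e1' e2') n' =
    D e1 n == D e1' n' && D e2 n == D e2' n'
  D (Lam v e) n == D (Lam v' e') n' =
    D e (extendN v n) == D e' (extendN v' n')
  _ == _ = False

main :: IO ()
main = do
  -- Alpha-equivalent closed terms now compare equal...
  print (D (Lam 0 (Var 0)) emptyN == D (Lam 1 (Var 1)) emptyN)
  -- ...while distinct free variables still differ.
  print (D (Var 0) emptyN == D (Var 1) emptyN)
```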
Exercise. Define a function with type DeBruijnExpr -> DeBruijnExpr' and its inverse, where:
data DeBruijnExpr' = Var'   Var
                   | Bound' Level
                   | Lam'   DeBruijnExpr'
                   | App'   DeBruijnExpr' DeBruijnExpr'
The conclusion. What have we done here? We have quotiented a type—made it smaller—by adding more information. In doing so, we recovered a simple way of defining equality over the type, without needing to define a helper function, do extra conversions, or suffer quadratically worse performance.
Sometimes, adding information is the only way to get the minimal definition. This situation occurs in homotopy type theory, where equivalences must be equipped with an extra piece of information, or else it is not a mere proposition (has the wrong homotopy type). If you, gentle reader, have more examples, I would love to hear about them in the comments. We are frequently told that “less is more”, that the route to minimalism lies in removing things: but sometimes, the true path lies in adding constraints.
Postscript. In Haskell, we haven’t truly made the type smaller: I can distinguish two expressions which should be equivalent by, for example, projecting out the underlying Expr. A proper type system which supports quotients would oblige me to demonstrate that if two elements are equivalent under the quotienting equivalence relation, my elimination function can't observe it.
Postscript 2. This technique has its limitations. Here is one situation where I have not been able to figure out the right quotient: suppose that the type of my expressions is such that all free variables are implicitly universally quantified. That is to say, there exists some ordering of quantifiers on a and b such that a b is equivalent to b a. Is there a way to get the quantifiers in order on the fly, without requiring a pre-pass on the expressions, using this quotienting technique? I don’t know!