Planet Haskell

September 01, 2014

Douglas M. Auclair (geophf)

August Haskell Daily puzzles and solutions

August, 2014

  • August 1st, 2014: "But programming isn't math, so ..." Today's #haskell problem? 'Fix' yesterday's solution to be less ... 'icky.' π, fast (So you're not eating π? ... okay: 'quickly.') A solution set to today's exercises with Wow-o-wow results.
  • Bonus: August 1st, 2014: This is Embarrassing(ly parallel) to have π in da face!
  • A-to-the-ST for da D-down-low-on-the-SL. Today's #haskell exercise is write your own language, then some rules. Yeah. LAWLZ! NAND on the A-to-the-ST! A solution to the first 2 exercises to today's problem. YMMV: not pretty nor generic.
  • Baby needs a new pair of shoes! And you need a new daily #haskell problem to solve. Done! ;) Love is the Universal Language. No, wait. We were talking about Money, yes? Me confused. Solution to today's exercise I have a Yen for Thinking Japanese, or so I think so ... a solution to the bonus #haskell exercise
  • Bayes was a man of letters. A Bayesian classifier for letter-recognition? Sure, let's give it a go for today's #haskell exercise. So now we know 'M' looks like 'W' to some peeps but 'B', 'I', 'O' check out fine, so WE ARE READING WITH HASKELL! YAY ... added definitions to do test runs over the entire data set and then did various runs, tweaking the system. Results noted. Informative.
  • Today's #haskell exercise. An expert system for blood donation. I use the term 'expert' loosely, eh? ;) Have at it! "I vant to drinq yur bloot! Bwa-hahaha!" Okay. That quote is just ... creepy. A solution to today's exercise is at
  • Veni, Vidi, ... Duci? Today's #haskell exercise. It is a relief to know that no tactical nukes were launched by testing the solution to today's #haskell exercise. or How do you count Mississippi's in Roman numerals?
  • August 11th, 2014, Monday: You've heard of 'Rock Management'? How about ROCK LOBSTERS! GIVE ME A ROCK, NAOW!!!1!!!!11! Today's #haskell exercise 
  • August 12th, 2014, Tuesday: To Fract or Not to Fract: why are we even considering this question? Today's #haskell exercise 

  • Ooh! Bonus problem? On a Tuesday? Wowsers! Today's #haskell #bonus problem: "Continuum spectrum maximum... uh, 'thing'um"
  • August 13th, 2014, Wednesday. No problem! (literally), but: "Fract! The Cylons!" No... wait: Starbuck used a different word. A solution to yesterday's fracting problem is at
Flatliners. No fracting, no 'peak'ing, just a baseline data set of 
initializeSpectrum defined with flatline function.

Twin Peaks. Still not fracted, but data set with two spikes 
rolled into the base data set with smoothSpike fn 

Fracted. Data set with peaks, fracted using the frackedSpike function

  • August 14th, 2014, Thursday: Pining for that next fibo date-time before 2584. Today's #haskell problem inspired by @AlgebraFact Updated problem. Tightened up return value, removing unnecessary semideterminism. So, it gives, leik, a septatuple answer, leik. ... Yeah. Time keeps flowing like a (Fibonacci) river, to the sea. A solution to  the 'next Fibonacci' problem posted today.
  • Hail, Eris! or the Law of Fives or today's #haskell problem (implementing a Ripple-down rule set). Do it to it! One of three-parter solution to today's problem: Forall Existential(ism) A solution allowing forall-quantified values. Two of the three-part solution: All you need is fnord (la-di-da-di-dah)! Rippling down (and accumulating fired) rules Third of three-part solution: RippleDownRuleto ergo sum, adding rules to the RDR system  Updated the 3rd solution to the RDR (Writer monad definition) to ripple down to left and right, fixing inconsistency in rule findings.
  • August 18th, 2014: What's FP good for? Crypto and technical (financial) indicators. Why? Um, let's do today's #haskell exercise! Full on, all the way! Double Rain-... wait: line-graph. Solution to today's #haskell exercise

  • August 19th, 2014: What does it all mean? All the way! No, wait: this is just today's #haskell exercise (backtesting SMA indicator ). Take the monoid and run(State-WriterT), or WAAH! WAAH! I lost 100 corn chips on my investment strategy or, solution: But, as solace, it does come with a pretty picture, illustrating today's solution. Ooh! 

  • August 20th, 2014: Next up for today's #haskell exercise is the Exponential Moving Average. A solution to the E(xponential)M(oving)A(verage) #haskell problem: Stately Conformance 

  • August 21st, 2014: My Jeans. No ... I meant: 'GENES'! That's it. Today's #haskell exercise. A solution to (not even CLOSE to) 'yesterday's' #haskell exercise: Nature or Nurture? We present a genetic algorithm
  • August 22nd, 2014: Today's (much delayed) #haskell problem: the Silver Ratio (from @AlgebraFact Every cloud has a silver lining .. Every Haskell problem has a solution. (sometimes) (except noncomputable problems)

  • August 25th, 2014: LYADCFGG! Today's #haskell exercise? (automated/learned) Document classification.
  • August 26th, 2014: "Join me, Luke, on the Dark Side of the Force, and help me go #FORTH to solve today's #haskell exercise! MWA-HAHA!" The World's smallest DSL: Forth. A solution to today's #haskell exercise
  • August 28th, 2014: "As I compose this #haskell problem ... p.s. I love you ... You! You! You!" #1960s #song A little card logic problem. Oh, NOES! LOOK OUT! IT'S A CARD-#SHARKNADO! Nah, it's just 'today's #haskell solution, is all
  • August 29th, 2014: Groovitudes! WordNet Themes! or: today's #haskell exercise A 3x3 matrix has 36 possible solutions. A 5x5 matrix? Over 200M poss solutions? YIKES! A solution to the themed-words
  • The Forth language problem solution given on August 26th gives a very snazzy RPN ('reverse Polish notation') calculator, but that's all it does the ':'-defining word needs access and look-ahead to the program parsed stream, and that's a bit more to ask than pushing and popping stack operators.
  • For the August 29th problem(WordNet themes) the raw generated solution set is over 209M possibilities. My little Haskell program was not finished scanning them over four hours when I killed the process. However, my dear wife solved the problem in under five minutes. Setting aside the fact that she's a genius, the program needs to be better. It needs to use the ontology of English-language words to eliminate fruitless endeavors during their generation, not afterwards during the test phase.

by geophf ( at September 01, 2014 10:11 AM

Yesod Web Framework

Planning Yesod 1.4

Now that persistent 2.0 is out the door, it's time to start talking about Yesod version 1.4. First question you might ask: what happened to Yesod 1.3? Answer: a few of the Yesod libraries (e.g., yesod-auth) are already on version 1.3, so to avoid confusion, we're jumping straight to 1.4.

Second question: what are we planning on breaking this time? Answer: hopefully nothing! The main purpose of this release is actually to just remove some backwards-compatibility hacks in the Yesod codebase for older versions of dependencies, like shakespeare pre-2.0, conduit pre-1.2, WAI pre-3.0, and persistent pre-2.0.

There are few exceptions to this, which should hopefully have minimal impact on users. You can see these in the detailed change list. One change I'd like to call out is the updated routing system. This is a fundamental change to how yesod-routes works. The generated code is drastically simpler as a result. Instead of constructing a data structure that allows for efficient pattern matching of the request path and then attempting to parse the resulting pieces, the new code simply generates a series of clauses, one for each route, and ensures proper parsing using view patterns. In my initial benchmarking, this made routing twice as fast as Yesod 1.2. I would release this as part of 1.2, but it introduces a new requirement on the ViewPatterns language extension. So instead, I held it off for the 1.4 release.

If there are other breaking changes that people would like to propose, now's the time to do it. But be aware that I'll likely push back hard on any breakage. If there's a very good reason for it, we can do it. But I'd rather keep stability wherever possible.

There's one exception to that rule, which is the purpose of the rest of this blog post: the scaffolded site. Making changes to the scaffolded site never breaks existing application, and therefore we can be much more liberal about changing things there. There is a downside in terms of education: all existing tutorials on the scaffolding would need to be updated. But one of my points below addresses that.

So here are my proposed scaffolding changes:

  • Let's move away from config files towards environment variables for configuration. A config file is still a convenient way to record configuration, but injecting that configuration through environment variables means configuration can also be stored in a database or elsewhere and injected through environment variables the same way.
  • Along the same lines, we would no longer need a command line argument to indicate which environment we're in (devel vs production, etc). All such settings would be controlled via environment variables.
  • To allow for easy development, we would have a single YESOD_DEVEL environment variables which would indicate if we're currently in development. If so, it would apply a number of default environment variable values to avoid the need to set these in your shell manually.
  • Finally, and I expect this to be controversial: let's use classy-prelude-yesod in the Import module, instead of just taking Prelude with a few objectionable functions filtered out.

This is just a first pass at a scaffolding cleanup, I'm sure there are other improvements that can be made as well.

I don't have a specific date on a Yesod 1.4 release, but I'm not expecting it to be a long development process. The vast majority of the work is already done (on the yesod-1.4 branch), and that codebase is already being used extensively in a rather large Yesod application, so I'm not too worried about regressions having slipped in.

September 01, 2014 12:00 AM

August 30, 2014

Philip Wadler

Howard on Curry-Howard

When writing Propositions as Types, I realised I was unclear on parts of the history. Below is a letter I wrote to William Howard and his response (with corrections he provided after I asked to publish it). I believe it is a useful historical document, and am grateful to Howard for his permission to publish.

Update. References to Figures 5 and 6 in the following are to Propositions as Types. Thanks to Ezra Cooper for pointing out this was unclear.

Here is my original request.

Subject: The Formulae-as-Types Notion of Construction

Dear Prof Howard,

My research has been greatly influenced by your own, particularly the paper cited in my subject. I am now writing a paper on the field of work that grew out of that paper, which was solicited for publications by the Communications of the ACM (the flagship of the professional organisation for computer scientists). A draft of the paper is attached.

I would like to portray the history of the subject accurately. I have read your interview with Shell-Gallasch, but a few questions remain, which I hope you will be kind enough to answer.

Your paper breaks into two halves. The first describes the correspondence between propositional logic and simple types, the second introduces the correspondence between predicate logic and dependent types. Did you consider the first half to be new material or merely a reprise of what was known? To what extent do you consider your work draws on or was anticipated by the work of Heyting and Kolmogorov, and Kleene's realisability? To what extent did your work influence the subsequent work of de Bruijn and Martin Lof? What was the history of your mimeograph on the subject, and why was it not published until the Curry Festschrift in 1980?

Many thanks for your consideration, not to mention for founding my field! Yours, -- P

And here is his response: 

Dear Prof. Wadler,

As mentioned in the interview with Shell-Gellasch, my work on propositions as types (p-a-t) originated from my correspondence with Kreisel, who was very interested in getting a mathematical notion (i.e., in ordinary mathematics) for Brouwer's idea of a construction (as explained by Heyting). I was not familiar with the work of Brouwer or Heyting, let alone Kolmogorov, but, from what Kreisel had to say, the idea was clear enough: a construction of  alpha - > beta was to be a construction F which, acting on a construction A of alpha, gives a construction B of beta. So we have constructions acting on constructions, rather like functionals acting on functionals. So, as an approximation,

(1)   let's take "construction" to mean "functional".

But what kind of functionals? In constructive mathematics, a functional is not given as a set of ordered pairs. Rather,

(2)   to give a functional is to give not only the action or process it performs but also to give its type (domain and counterdomain).

Clearly, the type structure is going to be complicated. I set myself the project of finding a suitable notation for the type symbols. So one needs a suitable type symbol for the functional F, above. Well, just take it to be alpha itself (at this point, I was thinking of propositional logic). Suddenly I remembered something that Curry had talked about in the logic seminar during my time at Penn State. If we consider typed combinators, and look at the structure of the type symbols of the basic combinators (e.g., S, K, I), we see that each of the type symbols corresponds to (is isomorphic to) one of the axioms of pure implicative logic. Well! This was just what I needed!

How do we formulate the following notion?

(3)   F is a construction of phi.

Consider the case in which phi has the form alpha - > beta. The temptation is to define "F is a construction of alpha - > beta" to mean "for all A: if A is a construction of alpha, then FA is a construction of beta". Well, that is circular, because we have used if ... then ... to define implication. This is what you call "Zeno’s paradox of logic". I avoided this circularity by taking (3) to mean:

(4)   F is assigned the type phi according to the way F is built up; i.e., the way in which F is constructed.

Thus F is a construction of phi {\em by construction}. Your figure 6 illustrates precisely what I meant by this. (I did not have that beautiful notation at the time but it conveys what I meant.)

To summarize: My basic insight consisted simultaneously of the thoughts (2) and (4) plus the thought that Curry's observation provided the means to implement (2), (4). Let me say this in a different way. The thought (2) was not new. I had had the thought (2) for many years, ever since I had begun to study primitive recursive functionals of finite type. What was new was the thought (4) plus the recognition that Curry's idea provided the way to implement (4). I got this basic insight in the summer of 1966. Once I saw how to do it with combinators, I wondered what it would look like from the vewpoint of the lambda calculus, and saw, to my delight, that this corresponded to the intuitionistic version of Gentzen's sequent calculus.

Incidentally, Curry's observation concerning the types of the basic combinators is presented in his book with Feys (Curry-Feys), but I was unaware of this, though I had owned a copy for several years (since 1959, when I was hired at Penn State). After working out the details of p-a-t over a period of several months, I began to think about writing it up, so I thought I had better see if it is in the book. Well, it is easy enough to find if you know what you are looking for. On looking at it, I got a shock: not only had they extended the ideas to Gentzen's sequent calculus, but they had the connection between elimination of cuts from a derivation and normalization of the corresponding lambda term. But, on a closer look, I concluded that they had {\em a} connection but not {\em the} connection. It turns out that I was not quite right about that either. See my remark about their Theorem 5, below. Not that it would have mattered much for anything I might have published: even if they had the connection between Gentzen's sequent calculus and the lambda calculus, I had a far-reaching generalization (i.e., to Heyting arithmetic).

The above is more detailed than would be required to answer your questions, but I needed to write this out to clarify my thoughts about the matter; so I may as well include the above, since I think it will interest you. It answers one of your questions, "To what extent do you consider your work draws on or was anticipated by the work of Heyting and Kolmogorov, and Kleene's realisability?" Namely, my work draws on the work of Heyting and Brouwer, via Kreisel's explanation of that work to me. None of it was anticipated by the work of Heyting, Kolmogorov or Kleene: they were not thinking of functionals of finite type. Though I was familiar with Kleene's recursive realizability, I was not thinking about it at the time. Admittedly, it touches on ideas about Brouwer's constructions but is far from capturing the notion of a construction (actually, Kleene once made remarks to this effect, I forget where). Because of the relation between constructions and Kleene's recursive realizability, there could have been some unconscious influence; but, in any case, not a significant influence.

"did your work influence the subsequent work of de Bruijn and Martin Lof? " As far as I know, my work had no influence on the work of de Bruijn. His work appears to be completely independent of mine. I do recall that he once sent me a package of Automath material. The project of a computer program for checking existing proofs did not appear very interesting and I did not reply. What I would have been interested in is a program for finding proofs of results that had not yet been proved! Even a proof-assistant would have been fine. Why did he send me the Automath material? I don't recall what year it was. Sometime in the 1970s. Whatever the accompanying letter, it was not informative; merely something like: "Dear Professor Howard, you may be interested in the following material ...". Since that time, I have seen two or three articles by him, and I have a more favorable impression. It is good, solid work. Obviously original. He discovered the idea of derivations as terms, and the accompanying idea of formulae-as-types, on his own. He uses lambda terms but, I think, only for purposes of description. In other words, I don't think that he has the connection between normalization and cut-elimination, but I have not made an extensive examination of his work. In fact, does he use a Gentzen system at all? I just don't know. The latter two questions would easily be answered by anyone familiar with his work. In any case, give him credit where credit is due. There are enough goodies for everyone!

My influence on Martin-Löf? No problem there. I met him at the Buffalo 1968 conference and I told him my ideas. His instant reaction was: "Now, why didn't I think of that?" He had a visiting appointment at UIC for the academic year 1968-1969, so we had lot's of opportunity to talk, and he started developing his own approach to the ideas. In Jan. 1969, mainly to make sure that we were both clear on who had discovered what, I wrote up my own ideas in the form of handwritten notes. By then, Xerox machines were prevalent, so I sent a copy to Kreisel, and he gave copies to various people, including Girard. At least, I think that is how Girard got a copy, or maybe Martin-Löf gave him one. I like Martin-Löf's work. I could say more about this, but the short answer to your question is: Martin-Löf's work originated from mine. He has always given me credit and we are good friends.

On further thought, I need to mention that, in that first conversation, Martin-Löf suggested that the derivations-as-terms idea would work particularly well in connection with Prawitz's theory of natural deduction. I thought: okay, but no big deal. Actually, at that time, I was not familiar with Prawitz's results (or, if at all, then only vaguely). But it was a bigger deal than I had thought, because Prawitz's reductions steps for a deduction correspond direcly to reduction steps for the associated lambda term! Actually, for most purposes, I like the sequent formulation of natural deduction as given in pages 33 and 88 of Sorensen and Urzyczyn (2006). In fact, if we add left-implication-introduction to this (let's confine ourselves to pure implicative logic), the resulting system P# is pretty interesting. All occurrences of modus ponens can be eliminated, not just those which are preceded by left-implication-introduction. This is what I am up to in my JSL 1980 paper, "Ordinal analysis of terms of finite type." Also, the cut rule is easy to derive in P# (just consider, for typed lambda terms: a well-formed term substituted into a well-formed term results in a well-formed term); hence P# is is a conservative extension of the system P* in Part I of my little paper in the Curry Festschrift.

The phrase formulae-as-types was coined by Kreisel in order that we would have a name for the subject matter in our correspondence back and forth. I would assume that the phrase "propositions as types" was coined by Martin-Löf; at least, during our first conversation at the Buffalo 1968 meeting, he suggested that one could think of a type as a proposition, according to the idea that, in intuitionistic mathematics, the meaning of a proposition phi is given by the species of "all" proofs of phi. I use quotes here because we are not talking about a set-theoretic, completed infinity.

"the second [part] intrudes the correspondence between predicate logic and dependent types." I was not thinking about it in that way at all. I wanted to provided an interpretation of the notion of construction to some nontrivial part of intuitionistic mathematics (Heyting arithmetic). Part I of the paper was just the preliminaries for this. Actually, what you say in the pdf is consistent with this. No need for change here.

"Did you consider the first half to be new material or merely a reprise of what was known?" New. But in January of last year I had occasion to take a really hard look at the material in Curry-Feys, pp. 313-314; and I now see that there is a much closer relationship between my Theorem 2 in Part I and their Theorem 5, page 326, than I had thought. The issues here are quite interesting. I can provide a discussion if you want.

In the introduction to my little paper, I mention that Tait had influenced me. Let me say a few words about that. In the summer of 1963 we had conversations in which he explained to me that he had developed a theory of infinite terms in analogy to Schütte's theory of infinite proofs, where normalization (via lambda reductions) of an infinite terms corresponds to cut elimination of the corresponding proof. He did not know what to make of it. He thought of his theory of infinite terms as a sort of pun on Schütte's theory of infinite proofs. But we both agreed that  there must be a deep connection between normalization of lambda terms and Gentzen's cut elimination. We puzzled over this during two or three of our conversations but could not come up with an answer.

As explained in the first paragraph of this e-mail, my work originated with a problem posed by Kreisel; so, at the start of this work, certainly I was not thinking of those conversations with Tait. But, as mentioned above, as soon as I got the basic insight about the relevance of Curry's combinators, I considered how it would work for lambda terms. At that point, I remembered my conversations with Tait. In other words, when I verified that

(5)   cut elimination for a derivation corresponds to normalization for the term,

the conversations with Tait were very much on my mind. Most likely I would have noticed (5) without having had the conversations with Tait. But who knows? In any case, he deserves credit for having noticed the correspondence between derivations and terms. What he did not have was the associated correspondence between propositions and types. In fact, he was not using a general enough notion of type for this. By hindsight we can see that in his system there is a homomorphism, not an isomorphism, from propositions to types.

I need to say a bit more about Tait and types. Since Schütte had extended his system of proofs to transfinite orders, Tait extended his system of terms to transfinite type levels. I already had my own system of primitive recursive functionals of transfinite type. In our very first conversation, we compared out ideas on this topic. This topic requires that one think very hard about the notion of type. Certainly, I had already thought extensively about the notion of type (because of (2), above) before I ever met Tait, but my conversations with him reinforced this tendency. Thoughts about types were very much on my mind when I began to consider (1), (2), above.

As already mentioned, the notes were handwritten and xeroxed; no mimeographs. "why [were they] not published until the Curry Festschrift in 1980?" First let me mention why they got published in the Curry Festschrift. Selden was bringing out the Festschrift for Curry's 80th birthday. He asked me to contribute the notes. I said: "Sure. I'll write up an improved version. I can now do much better." He replied: "No, I want the original notes. It is a historical document." In other words, by that time various copies had been passed around and there were a number of references to them in the literature. So I had them typed up and I sent them in.

Why didn't I publish them before that? Simply because they did not solve the original problem. That was Kreisel's and Gödel’s verdict (Kreisel had shown or described the work to Gödel). In fact, even before communicating the work to Kreisel, I knew that I had gotten only an approximation to the notion of construction, and that more work had to be done. Essentially, the criticism is as follows. In my little paper, I do not provide axioms and rules of inference for proving statements of the form

(3)   F is a construction of phi.

Remember, we have to avoid "Zeno’s paradox of logic"! The answer is that the proofs will look like what you have in Figure 6. In other words, Figure 6 is not only a program; it is also a proof (or: it can be reinterpreted as a proof). But Figure 6 can also be interpreted as an explanation of how a construction (blue) is to be built up in order to have a given type (red). In other words, figures such as Figure 6 implements the idea (4) mentioned near the beginning of this e-mail; i.e., F is assigned the type phi according to the way F is built up.

I hope this tickles you; it certainly tickles me. Of course, the rules of inference are as in Figure 5. So these simple ideas provide the missing theory of constructions; or, at the very least, provide a significant step in that direction.

In Jan. 2013, I exchanged a few e-mails with Tait and Constable on the history of p-a-t. This caused me to take a really careful look at the Curry-Feys book. Here is something I found that really made me laugh: the required theory, whose inferences are of the form given in Figure 5 is already in Curry-Feys. Admittedly, to see this you first have to erase all the turnstyles ( |-- ); Curry seems to have some kind of obsession with them. In particular, erase the turnstiles from the proof tree on page 281. The result is exactly a proof tree of the general form given in your Figure 6. (Hint: (...)X is to be read "X has type (...)". In other words, rewrite (...)X as X : (...).) What does Fbc mean, where F is boldface? Just rewrite Fbc as b -> c. You see? I am an expert. I could probably make money writing a translation manual. In summary, the required theory is essentially just Curry's theory of functionality (more precisely, the appropriate variant of Curry's theory). So, did I miss the boat here? Might I have seen all this in 1969 if only I had had the determination to take a hard look at Curry-Feys? I don't know. It may require the clarity of mind represented by the notation of Figure 5. Do you have any idea when and where this notation came into use?

One more remark concerning my reason for not publishing. Didn't I feel that I had made an important breakthrough, in spite of Kreisel's and Gödel’s criticisms? On the one hand, yes. On the other hand, I had reservations. Except for Martin-Löf, Prawitz, Tait and Girard, very few people showed an interest in the ideas. But maybe Martin-Löf, Prawitz, Tait and Girard should have been enough. You say: "Certainly Howard was proud of the connection he drew, citing it as one of the two great achievements of his career [43]." Should we let that passage stand? Sure. The interview occurred in the spring of 2000. By that time, I was getting lots of praise from the computer science community. So, pride is a peculiar thing. Let me end this on a positive note. In 1969 Prawitz was in the US and came to UIC to give a talk. When he entered the room, he made a beeline for me, looked me in the eye and shook my hand. The message was: Well done! THAT made me proud.

There is more to say; but this answers your questions, I think; so I'll send it to avoid further delay. 

Your pdf, Propositions as Types, is very readable.


by Philip Wadler ( at August 30, 2014 06:04 PM

Joachim Breitner

DebConf 14

I’m writing this blog post on the plain from Portland towards Europe (which I now can!), using the remaining battery live after having watched one of the DebConf talks that I missed. (It was the systemd talk, which was good and interesting, but maybe I should have watched one of the power management talks, as my battery is running down faster than it should be, I believe.)

I mostly enjoyed this year’s DebConf. I must admit that I did not come very prepared: I had neither something urgent to hack on, nor important things to discuss with the other attendees, so in a way I had a slow start. I also felt a bit out of touch with the project, both personally and technically: In previous DebConfs, I had more interest in many different corners of the project, and also came with more naive enthusiasm. After more than 10 years in the project, I see a few things more realistic and also more relaxed, and don’t react on “Wouldn’t it be cool to have <emph>crazy idea</emph>” very easily any more. And then I mostly focus on Haskell packaging (and related tooling, which sometimes is also relevant and useful to others) these days, which is not very interesting to most others.

But in the end I did get to do some useful hacking, heard a few interesting talks and even got a bit excited: I created a new tool to schedule binNMUs for Haskell packages which is quite generic (configured by just a regular expression), so that it can and will be used by the OCaml team as well, and who knows who else will start using hash-based virtual ABI packages in the future... It runs via a cron job on to produce output for Haskell and for OCaml, based on data pulled via HTTP. If you are a Debian developer and want up-to-date results, log into and run ~nomeata/binNMUs --sql; it then uses the projectb and wanna-build databases directly. Thanks to the ftp team for opening up, by the way!

Unsurprisingly, I also held a talk on Haskell and Debian (slides available). I talked a bit too long and we had too little time for discussion, but in any case not all discussion would have fitted in 45 minutes. The question of which packages from Hackage should be added to Debian and which not is still undecided (which means we carry on packaging what we happen to want in Debian for whatever reason). I guess the better our tooling gets (see the next section), the more easily we can support more and more packages.

I am quite excited by and supportive of Enrico’s agenda to remove boilerplate data from the debian/ directories and relying on autodebianization tools. We have such a tool for Haskell package, cabal-debian, but it is unofficial, i.e. neither created by us nor fully endorsed. I want to change that, so I got in touch with the upstream maintainer and we want to get it into shape for producing perfect Debian packages, if the upstream provided meta data is perfect. I’d like to see the Debian Haskell Group to follows Enrico’s plan to its extreme conclusion, and this way drive innovation in Debian in general. We’ll see how that goes.

Besides all the technical program I enjoyed the obligatory games of Mao and Werewolves. I also got to dance! On Saturday night, I found a small but welcoming Swing-In-The-Park event where I could dance a few steps of Lindy Hop. And on Tuesday night, Vagrant Cascadian took us (well, three of us) to a blues dancing night, which I greatly enjoyed: The style was so improvisation-friendly that despite having missed the introduction and never having danced Blues before I could jump right in. And in contrast to social dances in Germany, where it is often announced that the girls are also invited to ask the boys, but then it is still mostly the boys who have to ask, here I took only half a minute of standing at the side until I got asked to dance. In retrospect I should have skipped the HP reception and went there directly...

I’m not heading home at the moment, but will travel directly to Göteborg to attend ICFP 2014. I hope the (usually worse) west-to-east jet lag will not prevent me from enjoying that as much as I could.

by Joachim Breitner ( at August 30, 2014 03:10 PM

Philip Wadler

Independent information on independence

A few sources of information that those interested may find helpful.

Thanks to my colleague James Cheney for spotting these. (I previously circulated a pointer to the second but not to the first or third.) The sources appear reputable, but---as with everything on the net---caveat emptor.

by Philip Wadler ( at August 30, 2014 11:59 AM

August 29, 2014

Yesod Web Framework

Announcing Persistent 2

We are happy to announce the release of persistent 2.0

persistent 2.0 adds a flexible key type and makes some breaking changes. 2.0 is an unstable release that we want your feedback on for the soon to follow stable 2.1 release.

New Features

  • type-safe composite primary and foreign keys
  • added an upsert operation (update or insert)
  • added an insertMany_ operation


  • An Id suffix is no longer automatically assumed to be a Persistent type
  • JSON serialization * MongoDB ids no longer have a prefix 'o' character.

Breaking changes

  • Use a simple ReaderT for the underlying connection
  • fix postgreSQL timezone storage
  • remove the type parameter from EntityDef and FieldDef

In depth

Composite keys

The biggest limitation of data modeling with persistent is an assumption of a simple (for current SQL backends an auto-increment) primary key. We learned from Groundhog that a more flexible primary key type is possible. Persistent adds a similar flexible key type while maintaining its existing invariant that a Key is tied to a particular table.

To understand the changes to the Key data type, lets look at a change in the test suite for persistent 2.

       i <- liftIO $ randomRIO (0, 10000)
-      let k = Key $ PersistInt64 $ abs i
+      let k = PersonKey $ SqlBackendKey $ abs i

Previously Key contained a PersistValue. This was not type safe. PersistValue is meant to serialize any basic Haskell type to the database, but a given table only allows specific values as the key. Now we generate the PersonKey data constructor which specifies the Haskell key types. SqlBackendKey is the default key type for SQL backends.

Now lets look at code from CompositeTest.hs

mkPersist sqlSettings [persistLowerCase|
      name  String maxlen=20
      name2 String maxlen=20
      age Int
      Primary name name2 age
      deriving Show Eq
      name  String maxlen=20
      name2 String maxlen=20
      age Int
      Foreign Parent fkparent name name2 age
      deriving Show Eq

Here Parent has a composite primary key made up of 3 fields. Child uses that as a foreign key. The primary key of Child is the default key for the backend.

let parent = Parent "a1" "b1" 11
let child = Child "a1" "b1" 11
kp <- insert parent
_ <- insert child
testChildFkparent child @== parent

Future changes

Short-term improvements

Before the 2.1 release I would like to look at doing some simple things to speed up model compilation a little bit.

  • Speed up some of the compile-time persistent code (there is a lot of obviously naive code).
  • Reduce the size of Template Haskell generation (create a reference for each EntityDef and some other things rather than potentially repeatedly inlining it)

Medium-term improvement: better support for Haskell data types

We want to add better support for modeling ADTs, particularly for MongoDB where this is actually very easy to do in the database itself. Persistent already support a top-level entity Sum Type and a simple field ADT that is just an enumeration.

Another pain point is serializing types not declared in the schema. The declaration syntax in groundhog is very verbose but allows for this. So one possibility would be to allow the current DRY persistent declaration style and also a groundhog declaration style.

Long-term improvements: Projections

It would be possible to add projections now as groundhog or esqueleto have done. However, the result is not as end-user friendly as we would like. When the record namespace issue is dealt with in the GHC 7.10 release we plan on adding projections to persistent.

Ongoing: Database specific functionality

We always look forward to see more databases adapters for persistent. In the last year, a Redis and ODBC adapter were added.

Every database is different though, and you also want to take advantage of your database-specific features. esqueleto and persistent-mongoDB have shown how to build database specific features in a type-safe way on top of persistent.


Although the persistent code has no dependency on Yesod, I would like to make the infrastructure a little more independent of yesod. The first steps would be

  • putting it under a different organization on github.
  • having a separate mail list (should stackoverflow be prioritized over e-mail?)

August 29, 2014 09:46 PM

Antti-Juhani Kaijanaho (ibid)

Licentiate Thesis is now publicly available

My recently accepted Licentiate Thesis, which I posted about a couple of days ago, is now available in JyX.

Here is the abstract again for reference:

Kaijanaho, Antti-Juhani
The extent of empirical evidence that could inform evidence-based design of programming languages. A systematic mapping study.
Jyväskylä: University of Jyväskylä, 2014, 243 p.
(Jyväskylä Licentiate Theses in Computing,
ISSN 1795-9713; 18)
ISBN 978-951-39-5790-2 (nid.)
ISBN 978-951-39-5791-9 (PDF)
Finnish summary

Background: Programming language design is not usually informed by empirical studies. In other fields similar problems have inspired an evidence-based paradigm of practice. Central to it are secondary studies summarizing and consolidating the research literature. Aims: This systematic mapping study looks for empirical research that could inform evidence-based design of programming languages. Method: Manual and keyword-based searches were performed, as was a single round of snowballing. There were 2056 potentially relevant publications, of which 180 were selected for inclusion, because they reported empirical evidence on the efficacy of potential design decisions and were published on or before 2012. A thematic synthesis was created. Results: Included studies span four decades, but activity has been sparse until the last five years or so. The form of conditional statements and loops, as well as the choice between static and dynamic typing have all been studied empirically for efficacy in at least five studies each. Error proneness, programming comprehension, and human effort are the most common forms of efficacy studied. Experimenting with programmer participants is the most popular method. Conclusions: There clearly are language design decisions for which empirical evidence regarding efficacy exists; they may be of some use to language designers, and several of them may be ripe for systematic reviewing. There is concern that the lack of interest generated by studies in this topic area until the recent surge of activity may indicate serious issues in their research approach.

Keywords: programming languages, programming language design, evidence-based paradigm, efficacy, research methods, systematic mapping study, thematic synthesis

by Antti-Juhani Kaijanaho at August 29, 2014 08:45 AM

August 28, 2014

Functional Jobs

Senior Software Engineer (Functional) at McGraw-Hill Education (Full-time)

This Senior Software Engineer position is with the new LearnSmart team at McGraw-Hill Education's new and growing Research & Development center in Boston's Innovation District. We make software that helps college students study smarter, earn better grades, and retain more knowledge.

The LearnSmart adaptive engine powers the products in our LearnSmart Advantage suite — LearnSmart, SmartBook, LearnSmart Achieve, LearnSmart Prep, and LearnSmart Labs. These products provide a personalized learning path that continuously adapts course content based on a student’s current knowledge and confidence level.

On our team, you'll get to:

  • Move textbooks and learning into the digital era
  • Create software used by millions of students
  • Advance the state of the art in adaptive learning technology
  • Make a real difference in education

Our team's products are built with Flow, a functional language in the ML family. Flow lets us write code once and deliver it to students on multiple platforms and device types. Other languages in our development ecosystem include especially JavaScript, but also C++, SWF (Flash), and Haxe.

If you're interested in functional languages like Scala, Swift, Erlang, Clojure, F#, Lisp, Haskell, and OCaml, then you'll enjoy learning Flow. We don't require that you have previous experience with functional programming, only enthusiasm for learning it. But if you have do some experience with functional languages, so much the better! (On-the-job experience is best, but coursework, personal projects, and open-source contributions count too.)

We require only that you:

  • Have a solid grasp of CS fundamentals (languages, algorithms, and data structures)
  • Be comfortable moving between multiple programming languages
  • Be comfortable with modern software practices: version control (Git), test-driven development, continuous integration, Agile

Get information on how to apply for this position.

August 28, 2014 09:18 PM

Douglas M. Auclair (geophf)

Dylan: the harsh realities of the market

So, this is a little case study.

I did everything for Dylan. And when I say everything, I mean everything.  Here's my resumé:

  • I got excited about Dylan as a user, and I used it. I bought an old Mac that I don't ever remember the designation for, it's so '90's old, and got the floppies for the Dylan IDE from Apple research.
I'm not joking.

  • I integrated Dylan into my work at work, building an XML parser then open-sourcing it to the community under the (then) non-restrictive license. I think mine was the only XML parser that was industrial-strength for Dylan. Can't claim originality: I ported over the Common-LISP one, but it was a lot of (fun) work.
  • I made improvements to the gwydion-dylan compiler, including some library documentation (you can see my name right there, right in the compiler code), including some library functionality, did I work on the compiler itself? The Dylan syntax extensions or type system? I don't recall; if not in those places, I know I've looked at those guts: I had my fingers all over parts of the compiler.
I was in the Dylan compiler code. For you ll-types ('little language') that's no big deal.

But ask a software developer in industry if they've ever been in their compiler code. I have, too: I've found bugs in Java Sun-compiler that I fixed locally and reported up the chain.
  • I taught a course at our community college on Dylan. I had five students from our company that made satellite mission software.
  • I effing had commercial licenses bought when the boss asked me: what do we have to do to get this (my system) done/integrated into the build. I put my job on the line, for Dylan. ... The boss bought the licenses: he'd rather spend the $x than spending six weeks to back-port down to Java or C++.
  • I built a rule-based man-power scheduling system that had previously took three administrative assistants three days each quarter to generate. My system did it, and printed out a PDF in less than one second. I sold it, so that means I started a commercial company and sold my software.
I sold commercial Dylan software. That I wrote. Myself. And sold. Because people bought it. Because it was that good.

Hells yeah.

Question: what more could I have done?

I kept Dylan alive for awhile. In industry. For real.

So why is Dylan dead?

That's not the question.

Or, that question is answered over and over and over again.

Good languages, beautiful languages, right-thing languages languish and die all the time.

Dylan was the right-thing, and they (Apple) killed it in the lab, and for a reason.

Who is Dylan for?

That's not the question either. Because you get vague, general, useless answers.

The question is to ask it like Paul Graham answered it for LISP.

Lisp is a pointless, useless, weird language that nobody uses.

But Paul and his partner didn't care. They didn't give a ...


... what anybody else thought. They knew that this language, the language they loved, was built and designed and made for them. Just them and only them, because the only other people who were using it were college kids on comp.lang.lisp asking for the answers for problem-set 3 on last night's homework.

That's what Lisp was good for: nothing.
That's who Lisp was good for: nobody.

Same exact scenario for Erlang. Exactly the same. Erlang was only good for Joe Armstrong and a couple of buddies/weirdos like him, you know: kooks, who believed that Erlang was the right-thing for what they were doing, because they were on a mission, see, and nothing nobody could say could stop them nor stand against them, and all who would rise up against them would fall.


What made Lisp and Haskell and Erlang and Scala and Prolog (yes, Prolog, although you'll never hear that success story publicly, but $26M and three lives saved? Because of a Prolog system I wrote? And that's just one day in one month's report for data? I call that a success) work when nobody sane would say that these things would work?

Well, it took a few crazy ones to say, no, not: 'say' that it would work, but would make it work with their beloved programming language come hell or high water or, worse: indifferent silence, or ridicule, or pity from the rest of the world.

That is the lesson of perl and python and all these other languages. They're not good for anything. They suck. And they suck in libraries and syntax and semantics and weirdness-factor and everything.

But two, not one, but at least two people loved that language enough to risk everything, and ...

They lost.

Wait. What?

Did you think I was going to paint the rosy picture and lie to you and say 'they won'?

Because they didn't.

Who uses Lisp commercially? Or Haskell, except some fringers, or Scala or Clojure or Erlang or Smalltalk or Prolog

... or Dylan.

These languages are defined, right there in the dictionary.

Erlang: see 'career wrecker.'

Nobody uses those languages nor admits to even touching them with a 10-foot (3-meter) pole. I had an intern from college. 'Yeah, we studied this weird language called ML in Comp.sci. Nobody uses it.'

She was blown away when I started singing ML's praises and what it can do.

A meta-language, and she called it useless? Seriously?

Because that's what the mainstream sees.

Newsflash. I'm sorry. Dylan, Haskell, Idris: these aren't main-stream, and they never will be.

Algebraic types? Dependent types? You'll never see them. They're too ... research-y. They stink of academe, which is: they stink of uselessness-to-industry. You'll be dead and buried to see them in this form, even after they discover the eternity elixir. Sorry.

Or you'll see them in Visual Basic as a new Type-class form that only a few Microserfs use because they happened to have written those extensions. Everybody else?


Here's how Dylan will succeed, right now.

Bruce and I will put our heads together, start a company, and we'll code something. Not for anybody else to use and to love and to cherish, just for us, only for us, and it will blow out the effing doors, and we'll be bought out for $40M because our real worth is $127M.

And the first thing that Apple will do, after they bought us, is to show us the door, then convert the code into Java. Or Swift. Or Objective-C, or whatever.

And that's how we'll win.

Not the $40M. Not the lecture series on 'How to Make Functional Programming Work in Industry for Real' afterwards at FLoC and ICFP conferences with fan-bois and -girls wanting to talk to us afterwards and ask us how they can get a job doing functional programming.

Not that.

We'll win because we made something in Dylan, and it was real, and it worked, and it actually did something for enough people that we can now go to our graves knowing that we did something once with our lives (and we can do it again and again, too: there's no upper limit on the successes you're allowed to have, people) that meant something to some bodies. And we did that. With Dylan.


I've done that several times already, by my counting: the Prolog project, the Dylan project, the Mercury project, and my writing.

I'm ready to do that, again.

Because, actually, fundamentally, doing something in this world and for it ... there's nothing like it.

You write that research paper, and I come up to you, waving it in your face, demanding you implement your research because I need it to do my job in Industry?

I've done that to three professors so far. Effing changed their world-view in that moment. "What?" they said, to a person, "somebody actually wants to use this?" The look of bemused surprise on their faces?

It was sad, actually, because they did write something that somebody out there (moiself) needed, but they never knew that what they were doing meant something.

And it did.

Effing change your world-view. Your job? Your research? Your programming language?

That's status quo, and that's good and necessary and dulce and de leche (or decorum, I forget which).

But get up out of the level you're at, and do something with it so that that other person, slouched in their chair, sits up and takes notice, and a light comes over their face and they say, 'Ooh! That does that? Wow!' and watch their world change, because of you and what you've done.

Dylan is for nothing and for nobody.

So is everything under the Sun, my friend.

Put your hand to the plow, and with the sweat of your brow, make it yours for this specific thing.

Regardless of the long hours, long months of unrewarded work, and regardless of the hecklers, naysayers, and concerned friends and parents, and regardless of the mountain of unpaid bills.

You make it work, and you don't stop until it does.

That's how I've won.

Every time.

by geophf ( at August 28, 2014 04:06 PM


Dealing with Asynchronous Exceptions during Resource Acquisition


Consider the following code: we open a socket, compute with it, and finally close the socket again. The computation happens inside an exception handler (try), so even when an exception happens we still close the socket:

example1 :: (Socket -> IO a) -> IO a 
example1 compute = do -- WRONG
  s <- openSocket 
  r <- try $ compute s
  closeSocket s
  case r of
    Left ex -> throwIO (ex :: SomeException)
    Right a -> return a

Although this code correctly deals with synchronous exceptions–exceptions that are the direct result of the execution of the program–it does not deal correctly with asynchronous exceptions–exceptions that are raised as the result of an external event, such as a signal from another thread. For example, in

example2 :: (Socket -> IO a) -> IO (Maybe a)
example2 compute = timeout someTimeout $ example1 compute

it is possible that the timeout signal arrives after we have opened the socket but before we have installed the exception handler (or indeed, after we leave the scope of the exception handler but before we close the socket). In order to address this we have to control precisely where asynchronous exceptions can and cannot be delivered:

example3 :: (Socket -> IO a) -> IO a
example3 compute =
  mask $ \restore -> do
    s <- openSocket 
    r <- try $ restore $ compute s
    closeSocket s
    case r of
      Left ex -> throwIO (ex :: SomeException)
      Right a -> return a

We mask asynchronous exceptions, and then restore them only inside the scope of the exception handler. This very common pattern is captured by the higher level combinator bracket, and we might rewrite the example as

example4 :: (Socket -> IO a) -> IO a
example4 = bracket openSocket closeSocket

Allowing asynchronous exceptions during resource acquisition

Suppose that we wanted to define a derived operation that opens a socket and performs some kind of handshake with the server on the other end:

openHandshake :: IO Socket
openHandshake = do
  mask $ \restore -> do
    s <- openSocket
    r <- try $ restore $ handshake s
    case r of
      Left  ex -> closeSocket s >> throwIO (ex :: SomeException)
      Right () -> return s 

(These and the other examples can be defined in terms of bracket and similar, but we use mask directly so that it’s easier to see what is happening.) We might use openHandshake as follows:

example5 :: (Socket -> IO a) -> IO a
example5 compute = do
  mask $ \restore -> do
    s <- openHandshake 
    r <- try $ restore $ compute s
    closeSocket s
    case r of
      Left ex -> throwIO (ex :: SomeException)
      Right a -> return a

There are no resource leaks in this code, but there is a different problem: we call openHandshake with asynchronous exceptions masked. Although openHandshake calls restore before doing the handshake, restore restores the masking state to that of the enclosing context. Hence the handshake with the server cannot be timed out. This may not be what we want–we may want to be able to interrupt example5 with a timeout either during the handshake or during the argument computation.

Note that this is not a solution:

example6 :: (Socket -> IO a) -> IO a 
example6 compute = do
  mask $ \restore -> do
    s <- restore openHandshake -- WRONG
    r <- try $ restore $ compute s
    closeSocket s
    case r of
      Left ex -> throwIO (ex :: SomeException)
      Right a -> return a

Consider what might happen: if an asynchronous exception is raised after openHandshake returns the socket, but before we leave the scope of restore, the asynchronous exception will be raised and the socket will be leaked. Installing an exception handler does not help: since we don’t have a handle on the socket, we cannot release it.

Interruptible operations

Consider this definition from the standard libraries:

withMVar :: MVar a -> (a -> IO b) -> IO b
withMVar m io =
  mask $ \restore -> do
    a <- takeMVar m
    b <- restore (io a) `onException` putMVar m a
    putMVar m a
    return b

This follows almost exactly the same pattern as the examples we have seen so far; we mask asynchronous exceptions, take the contents of the MVar, and then execute some operation io with the contents of the MVar, finally putting the contents of the MVar back when the computation completes or when an exception is raised.

An MVar acts as a lock, with takeMVar taking the role of acquiring the lock. This may, of course, take a long time if the lock is currently held by another thread. But we call takeMVar with asynchronous exceptions masked. Does this mean that the takeMVar cannot be timed out? No, it does not: takeMVar is a so-called interruptible operation. From the Asynchronous Exceptions in Haskell paper:

Any operation which may need to wait indefinitely for a resource (e.g., takeMVar) may receive asynchronous exceptions even within an enclosing block, but only while the resource is unavailable. Such operations are termed interruptible operations. (..) takeMVar behaves atomatically when enclosed in block. The takeMVar may receive asynchronous exceptions right up until the point when it acquires the MVar, but not after.

(block has been replaced by mask since the publication of the paper, but the principle is the same.) Although the existence of interruptible operations makes understanding the semantics of mask harder, they are necessary: like in the previous section, wrapping takeMVar in restore is not safe. If we really want to mask asynchronous exceptions, even across interruptible operations, Control.Exception offers uninterruptibleMask.

Custom interruptible operations

So an interruptible operation is one that can be interrupted by an asynchronous exception even when asynchronous exceptions are masked. Can we define our own interruptible operations? Yes, we can:

-- | Open a socket and perform handshake with the server
-- Note: this is an interruptible operation.
openHandshake' :: IO Socket
openHandshake' = 
  mask_ $ do 
    s <- openSocket
    r <- try $ unsafeUnmask $ handshake s
    case r of
      Left  ex -> closeSocket s >> throwIO (ex :: SomeException)
      Right () -> return s 

unsafeUnmask is defined in GHC.IO, and unmasks asynchronous exceptions, no matter what the enclosing context is. This is of course somewhat dangerous, because now calling openHandshake' inside a mask suddenly opens up the possibility of an asynchronous exception being raised; and the only way to know is to look at the implementation of openHandshake', or its Haddock documentation. This is somewhat unsatisfactory, but exactly the same goes for takeMVar and any other interruptible operation, or any combinator that uses an interruptible operation under the hood. A sad state of affairs, perhaps, but one that we don’t currently have a better solution for.

Actually, using unsafeUnmask is a bit too crude. Control.Exception does not export it, but does export

allowInterrupt :: IO ()
allowInterrupt = unsafeUnmask $ return ()

with documentation

When invoked inside mask, this function allows a blocked asynchronous exception to be raised, if one exists. It is equivalent to performing an interruptible operation, but does not involve any actual blocking.

When called outside mask, or inside uninterruptibleMask, this function has no effect.

(emphasis mine.) Sadly, this documentation does not reflect the actual semantics: unsafeUnmask, and as a consequence allowInterrupt, unmasks asynchronous exceptions no matter what the enclosing context is: even inside uninterruptibleMask. We can however define our own operator to do this:

interruptible :: IO a -> IO a
interruptible act = do
  st <- getMaskingState
  case st of
    Unmasked              -> act
    MaskedInterruptible   -> unsafeUnmask act
    MaskedUninterruptible -> act 

where we call unsafeUnmask only if the enclosing context is mask, but not if it is uninterruptibleMask (TODO: What is the semantics when we nest these two?). We can use it as follows to define a better version of openHandshake:

-- | Open a socket and perform handshake with the server
-- Note: this is an interruptible operation.
openHandshake' :: IO Socket
openHandshake' = 
  mask_ $ do 
    s <- openSocket
    r <- try $ interruptible $ handshake s
    case r of
      Left  ex -> closeSocket s >> throwIO (ex :: SomeException)
      Right () -> return s 

Resource allocation timeout

If we wanted to timeout the allocation of the resource only, we might do

example7 :: (Socket -> IO a) -> IO a
example7 compute = do
  mask $ \restore -> do
    ms <- timeout someTimeout $ openHandshake'
    case ms of
      Nothing -> throwIO (userError "Server busy")
      Just s  -> do r <- try $ restore $ compute s
                    closeSocket s
                    case r of
                      Left ex -> throwIO (ex :: SomeException)
                      Right a -> return a

Exceptions are masked when we enter the scope of the timeout, and are unmasked only once we are inside the exception handler in openHandshake'–in other words, if a timeout happens, we are guaranteed to clean up the socket. The surrounding mask is however necessary. For example, suppose we are writing some unit tests and we are testing openHandshake'. This is wrong:

example8 :: IO ()
example8 = do 
  ms <- timeout someTimeout $ openHandshake'
  case ms of
    Just s  -> closeSocket s
    Nothing -> return ()

Even if we are sure that the example8 will not be interrupted by asynchronous exceptions, there is still a potential resource leak here: the timeout exception might be raised just after we leave the mask_ scope from openHandshake but just before we leave the timeout scope. If we are sure we don’t need to worry about other asynchronous exceptions we can write

example8 :: IO ()
example8 = do
  s <- mask_ $ timeout someTimeout $ openHandshake'
  case ms of
    Just s  -> closeSocket s
    Nothing -> return ()

although of course it might be better to simply write

example8 :: IO ()
example8 = 
  bracket (timeout someTimeout $ openHandshake')
          (\ms -> case ms of Just s  -> closeSocket s
                             Nothing -> return ())
          (\_ -> return ())


Making sure that resources are properly deallocated in the presence of asynchronous exceptions is difficult. It is very important to make sure that asynchronous exceptions are masked at crucial points; unmasking them at the point of calling a resource allocation function is not safe. If you nevertheless want to be able to timeout resource allocation, you need to make your resource allocation function interruptible.

For completeness’ sake, there are some other solutions that avoid the use of unsafeUnmask. One option is to thread the restore argument through (and compose multiple restore arguments if there are multiple nested calls to mask). This requires resource allocations to have a different signature, however, and it is very error prone: a single mask somewhere along the call chain where we forget to thread through the restore argument will mean the code is no longer interruptible. The other option is to run the code that we want to be interruptible in a separate thread, and wait for the thread to finish with, for example, a takeMVar. Getting this right is however no easy task, and it doesn’t change anything fundamentally anyway: rather than using unsafeUnmask we are now using a primitive interruptible operation; either way we introduce the possibility of exceptions even in the scope of mask_.

Finally, when your application does not fit the bracket pattern we have been using (implicitly or explicitly), you may want to have a look at resourcet and pipes or conduit, or my talk Lazy I/O and Alternatives in Haskell.

by edsko, duncan at August 28, 2014 03:48 PM

Keegan McAllister

Calling a Rust library from C (or anything else!)

One reason I'm excited about Rust is that I can compile Rust code to a simple native-code library, without heavy runtime dependencies, and then call it from any language. Imagine writing performance-critical extensions for Python, Ruby, or Node in a safe, pleasant language that has static lifetime checking, pattern matching, a real macro system, and other goodies like that. For this reason, when I started html5ever some six months ago, I wanted it to be more than another "Foo for BarLang" project. I want it to be the HTML parser of choice, for a wide variety of applications in any language.

Today I started work in earnest on the C API for html5ever. In only a few hours I had a working demo. And this is a fairly complicated library, with 5,000+ lines of code incorporating

It's pretty cool that we can use all this machinery from C, or any language that can call C. I'll describe first how to build and use the library, and then I'll talk about the implementation of the C API.

html5ever (for C or for Rust) is not finished yet, but if you're feeling adventurous, you are welcome to try it out! And I'd love to have more contributors. Let me know on GitHub about any issues you run into.

Using html5ever from C

Like most Rust libraries, html5ever builds with Cargo.

$ git clone
$ cd html5ever
$ git checkout dev
$ cargo build
Updating git repository ``
Compiling phf_mac v0.0.0 (
Compiling html5ever-macros v0.0.0 (file:///tmp/html5ever)
Compiling phf v0.0.0 (
Compiling html5ever v0.0.0 (file:///tmp/html5ever)

The C API isn't Cargo-ified yet, so we'll build it using the older Makefile-based system.

$ mkdir build
$ cd build
$ ../configure
$ make libhtml5ever_for_c.a
rustc -D warnings -C rpath -L /tmp/html5ever/target -L /tmp/html5ever/target/deps \
-o libhtml5ever_for_c.a --cfg for_c --crate-type staticlib /tmp/html5ever/src/
warning: link against the following native artifacts when linking against this static library
note: the order and any duplication can be significant on some platforms, and so may need to be preserved
note: library: rt
note: library: dl
note: library: pthread
note: library: gcc_s
note: library: pthread
note: library: c
note: library: m

Now we can build an example C program using that library, and following the link instructions produced by rustc.

$ H5E_PATH=/tmp/html5ever
$ gcc -Wall -o tokenize tokenize.c -I $H5E_PATH/capi -L $H5E_PATH/build \
-lhtml5ever_for_c -lrt -ldl -lpthread -lgcc_s -lpthread -lc -lm

$ ./tokenize 'Hello&comma; <i class=excellent>world!</i>'
CHARS : Hello
TAG : <i>
ATTR: class="excellent"
CHARS : world!
TAG : </i>

The build process is pretty standard for C; we just link a .a file and its dependencies. The biggest obstacle right now is that you won't find the Rust compiler in your distro's package manager, because the language is still changing so rapidly. But there's a ton of effort going into stabilizing the language for a Rust 1.0 release this year. It won't be too long before rustc is a reasonable build dependency.

Let's look at the C client code.

#include <stdio.h>

#include "html5ever.h"

void put_str(const char *x) {
fputs(x, stdout);

void put_buf(struct h5e_buf text) {
fwrite(, text.len, 1, stdout);

void do_start_tag(void *user, struct h5e_buf name, int self_closing, size_t num_attrs) {
put_str("TAG : <");
if (self_closing) {

// ...

struct h5e_token_ops ops = {
.do_chars = do_chars,
.do_start_tag = do_start_tag,
.do_tag_attr = do_tag_attr,
.do_end_tag = do_end_tag,

struct h5e_token_sink sink = {
.ops = &ops,
.user = NULL,

int main(int argc, char *argv[]) {
if (argc < 2) {
printf("Usage: %s 'HTML fragment'\n", argv[0]);
return 1;

struct h5e_tokenizer *tok = h5e_tokenizer_new(&sink);
h5e_tokenizer_feed(tok, h5e_buf_from_cstr(argv[1]));
return 0;

The struct h5e_token_ops contains pointers to callbacks. Any events we don't care to handle are left as NULL function pointers. Inside main, we create a tokenizer and feed it a string. html5ever for C uses a simple pointer+length representation of buffers, which is this struct h5e_buf you see being passed by value.

This demo only does tokenization, not tree construction. html5ever can perform both phases of parsing, but the API surface for tree construction is much larger and I didn't get around to writing C bindings yet.

Implementing the C API

Some parts of Rust's libstd depend on runtime services, such as task-local data, that a C program may not have initialized. So the first step in building a C API was to eliminate all std:: imports. This isn't nearly as bad as it sounds, because large parts of libstd are just re-exports from other libraries like libcore that we can use with no trouble. To be fair, I did write html5ever with the goal of a C API in mind, and I avoided features like threading that would be difficult to integrate. So your library might give you more trouble, depending on which Rust features you use.

The next step was to add the #![no_std] crate attribute. This means we no longer import the standard prelude into every module. To compensate, I added use core::prelude::*; to most of my modules. This brings in the parts of the prelude that can be used without runtime system support. I also added many imports for ubiquitous types like String and Vec, which come from libcollections.

After that I had to get rid of the last references to libstd. The biggest obstacle here involved macros and deriving, which would produce references to names under std::. To work around this, I create a fake little mod std which re-exports the necessary parts of core and collections. This is similar to libstd's "curious inner-module".

I also had to remove all uses of format!(), println!(), etc., or move them inside #[cfg(not(for_c))]. I needed to copy in the vec!() macro which is only provided by libstd, even though the Vec type is provided by libcollections. And I had to omit debug log messages when building for C; I did this with conditionally-defined macros.

With all this preliminary work done, it was time to write the C bindings. Here's how the struct of function pointers looks on the Rust side:

pub struct h5e_token_ops {
do_start_tag: extern "C" fn(user: *mut c_void, name: h5e_buf,
self_closing: c_int, num_attrs: size_t),

do_tag_attr: extern "C" fn(user: *mut c_void, name: h5e_buf,
value: h5e_buf),

do_end_tag: extern "C" fn(user: *mut c_void, name: h5e_buf),

// ...

The processing of tokens is straightforward. We pattern-match and then call the appropriate function pointer, unless that pointer is NULL. (Edit: eddyb points out that storing NULL as an extern "C" fn is undefined behavior. Better to use Option<extern "C" fn ...>, which will optimize to the same one-word representation.)

To create a tokenizer, we heap-allocate the Rust data structure in a Box, and then transmute that to a raw C pointer. When the C client calls h5e_tokenizer_free, we transmute this pointer back to a box and drop it, which will invoke destructors and finally free the memory.

You'll note that the functions exported to C have several special annotations:

  • #[no_mangle]: skip name mangling, so we end up with a linker symbol named h5e_tokenizer_free instead of _ZN5for_c9tokenizer18h5e_tokenizer_free.
  • unsafe: don't let Rust code call these functions unless it promises to be careful.
  • extern "C": make sure the exported function has a C-compatible ABI. The data structures similarly get a #[repr(C)] attribute.

Then I wrote a C header file matching this ABI:

struct h5e_buf {
unsigned char *data;
size_t len;

struct h5e_buf h5e_buf_from_cstr(const char *str);

struct h5e_token_ops {
void (*do_start_tag)(void *user, struct h5e_buf name,
int self_closing, size_t num_attrs);

void (*do_tag_attr)(void *user, struct h5e_buf name,
struct h5e_buf value);

void (*do_end_tag)(void *user, struct h5e_buf name);

/// ...

struct h5e_tokenizer;

struct h5e_tokenizer *h5e_tokenizer_new(struct h5e_token_sink *sink);
void h5e_tokenizer_free(struct h5e_tokenizer *tok);
void h5e_tokenizer_feed(struct h5e_tokenizer *tok, struct h5e_buf buf);
void h5e_tokenizer_end(struct h5e_tokenizer *tok);

One remaining issue is that Rust is hard-wired to use jemalloc, so linking html5ever will bring that in alongside the system's libc malloc. Having two separate malloc heaps will likely increase memory consumption, and it prevents us from doing fun things like allocating Boxes in Rust that can be used and freed in C. Before Rust can really be a great choice for writing C libraries, we need a better solution for integrating the allocators.

If you'd like to talk about calling Rust from C, you can find me as kmc in #rust and #rust-internals on And if you run into any issues with html5ever, do let me know, preferably by opening an issue on GitHub. Happy hacking!

by keegan ( at August 28, 2014 06:18 AM

August 27, 2014

Philip Wadler

Informatics Independence Referendum Debate - the result

Here is the outcome of the debate announced earlier:
<embed allowfullscreen="true" allowscriptaccess="always" flashvars="config=;file=" height="360" src="" width="740"></embed>
Yes 19
No 25
Undecided 28

Yes 28
No 31
Undecided 26

(Either some people entered after the debate began, or some people began the debate unsure even whether they were undecided.)

Thank you to Alan, Mike, and the audience for a fantastic debate. The audience asked amazing questions on both sides, clearly much involved with the process.

Video of debate here.
Alan's slides here.
My slides here.

by Philip Wadler ( at August 27, 2014 01:43 PM

Bill Atkins

NSNotificationCenter, Swift and blocks

The conventional way to register observers with NSNotificationCenter is to use the target-action pattern. While this gets the job done, it's inherently not type-safe.

For example, the following Swift snippet will compile perfectly:

    NSNotificationCenter.defaultCenter().addObserver(self, selector: Selector("itemAdded:"),
      name: MyNotificationItemAdded, object: nil)

even though at runtime it will fail unless self has a method named itemAdded that takes exactly one parameter (leaving off that last colon in the selector will turn this line into a no-op). Plus, this method gives you no way to take advantages of Swift's closures, which would allow the observer to access local variables in the method that adds the observer and would eliminate the need to create a dedicated method to handle the event.

A better way to do this is to use blocks. And NSNotificationCenter does include a block-based API:

    NSNotificationCenter.defaultCenter().addObserverForName(MyNotificationItemAdded, object: nil, queue: nil) { note in
      // ...

This is much nicer, especially with Swift's trailing closure syntax. There are no method names to be looked up at runtime, we can refer to local variables in the method that registered the observer and we can perform small bits of logic in reaction to events without having to create and name dedicated methods.

The catch comes in resource management. It's very important that an object remove its event observers when it's deallocated, or else NSNotificationCenter will try to invoke methods on invalid pointers.

The traditional target-action method has the one advantage that we can easily handle this requirement with a single call in deinit:

  deinit {

With the block API, however, since there is no explicit target object, each call to addObserverForName returns "an opaque object to act as observer." So your observer class would need to track all of these objects and then remove them all from the notification center in deinit, which is a pain.

In fact, the hassle of having to do bookkeeping on the observer objects almost cancels out the convenience of using the block API. Frustrated by this situation, I sat down and created a simple helper class, NotificationManager:

class NotificationManager {
  private var observerTokens: [AnyObject] = []

  deinit {

  func deregisterAll() {
    for token in observerTokens {

    observerTokens = []

  func registerObserver(name: String!, block: (NSNotification! -> ()?)) {
    let newToken = NSNotificationCenter.defaultCenter().addObserverForName(name, object: nil, queue: nil) {note in

  func registerObserver(name: String!, forObject object: AnyObject!, block: (NSNotification! -> ()?)) {
    let newToken = NSNotificationCenter.defaultCenter().addObserverForName(name, object: object, queue: nil) {note in

First, this simple class provides a Swift-specialized API around NSNotificationCenter.  It provides an additional convenience method without an object parameter (rarely used, in my experience) to make it easier to use trailing-closure syntax. But most importantly, it keeps track of the observer objects generated when observers are registered, and removes them when the object is deinit'd.

A client of this class can simply keep a member variable of type NotificationManager and use it to register its observers. When the parent class is deallocated, the deinit method will automatically be called on its NotificationManager member variable, and its observers will be properly disposed of:

class MyController: UIViewController {
  private let notificationManager = NotificationManager()
  override init() {
    notificationManager.registerObserver(MyNotificationItemAdded) { note in
      println("item added!")
  required init(coder: NSCoder) {
    fatalError("decoding not implemented")

When the MyController instance is deallocated, its NotificationManager member variable will be automatically deallocated, triggering the call to deregisterAll that will remove the dead objects from NSNotificationCenter.

In my apps, I add a notificationManager instance to my common UIViewController base class so I don't have to explicitly declare the member variable in all of my controller subclasses.

Another benefit of using my own wrapper around NSNotificationCenter is that I can add useful functionality, like group observers: an observer that's triggered when any one of a group of notifications are posted:

struct NotificationGroup {
  let entries: [String]
  init(_ newEntries: String...) {
    entries = newEntries


extension NotificationManager {
  func registerGroupObserver(group: NotificationGroup, block: (NSNotification! -> ()?)) {
    for name in group.entries {
      registerObserver(name, block: block)

This can be a great way to easily set up an event handler to run when, for example, an item is changed in any way at all:

   let MyNotificationItemsChanged = NotificationGroup(

    notificationManager.registerGroupObserver(MyNotificationItemsChanged) { note in
      // ...

by More Indirection ( at August 27, 2014 11:21 AM

Simon Michael

Creating well-behaved Hakyll blog posts

Posts in a Hakyll-powered blog need to be created with care, if you want your feed to work well with clients and aggregators. There are many things to remember:

  • If you have clones of your site, decide which one to work in and make sure it’s up to date
  • Create the file in the right place
  • Name it consistently (I use
  • In my setup, prefix it with _ if it’s a draft (I render but don’t publish those)
  • Set title, tags, and author with a metadata block
  • Set published time with metadata to get a more precise timestamp than Hakyll can guess from the filename. Include a time zone. Use the right format.
  • When moving a post from draft to published:
    • Update the published time
    • Update the file name if the title or publish date has changed
  • If changing a post after it has been published: set updated time in the metadata
  • At some point, commit it to version control and sync it to other clones

I find this makes blogging feel tedious, especially after an absence when I’ve forgotten the details. Case in point, I managed to share an ugly template post with Planet Haskell readers while working on this one.

So I’m trying out this bash shell script, maybe it will help. Adjust to suit your setup.
(updated 2014/8/27)

# add to ~/.bashrc


# List recent blog posts.
alias blog-ls="ls $BLOGDIR | tail -10"

# Create a new hakyll-compatible draft blog post.
# blog-new ["The Title" ["tag1, tag2" ["Author Name"]]]
function blog-new() {
    TITLE=${1:-Default Title}
    TAGS=${2:-defaulttag1, defaulttag2}
    AUTHOR=${3:-Default Author Name}
    SLUG=${TITLE// /-}
    DATE=`date +"%Y-%m-%d"`
    TIME=`date +"%Y-%m-%d %H:%M:%S%Z"`
    echo creating $BLOGDIR/$FILE
    cat <<EOF >>$BLOGDIR/$FILE && emacsclient -n $BLOGDIR/$FILE
title:     $TITLE
tags:      $TAGS
author:    $AUTHOR
published: $TIME


An example:

$ blog-new 'Scripted Hakyll blog post creation' 'hakyll, haskell'
(file opens in emacs, edit & save)
$ make
./site build
  Creating store...
  Creating provider...
  Running rules...
Checking for out-of-date items
  updated blog/

See also part 2.

August 27, 2014 02:15 AM

Well-behaved Hakyll blog posts, continued

More hakyll blog fixes:

Ugly things showing on planets

My posts were showing unwanted things on planet haskell - double heading, redundant date, tag links, and ugly disqus html. By comparing with Jasper Van der Jeugt’s blog, I found the problem: I was snapshotting content for the feed at the wrong time, after applying the post template:

>>= return . fmap demoteHeaders
>>= loadAndApplyTemplate "templates/post.html"    (postCtx tags)
>>= saveSnapshot "content"
>>= loadAndApplyTemplate "templates/default.html" defaultContext


>>= saveSnapshot "content"  --
>>= return . fmap demoteHeaders
>>= loadAndApplyTemplate "templates/post.html"    (postCtx tags)
>>= loadAndApplyTemplate "templates/default.html" defaultContext

Manual feed publishing

The main blog feed is now generated with a _ prefix, and I must manually rename it (with make feed) to make it live it on Planet Haskell. This will hopefully reduce snafus (and not create new ones).

./site.hs 95
-    create ["blog.xml"] $ do
+    create ["_blog.xml"] $ do

./Makefile 14
+feed: _site/blog.xml
+_site/blog.xml: _site/_blog.xml
+	cp _site/_blog.xml _site/blog.xml

Better HTML titles

Changed the “Joyful Systems” prefix to a suffix in the HTML page titles, making search results and browser tab names more useful.

August 27, 2014 02:00 AM

FP Complete

IAP: conduit stream fusion

Both the changes described in this blog post, and in the previous blog post, are now merged to the master branch of conduit, and have been released to Hackage as conduit 1.2.0. That doesn't indicate stream fusion is complete (far from it!). Rather, the optimizations we have so far are valuable enough that I want them to be available immediately, and future stream fusion work is highly unlikely to introduce further breaking changes. Having the code on Hackage will hopefully also make it easier for others to participate in the discussion around this code.

Stream fusion

Last time, I talked about applying the codensity transform to speed up conduit. This greatly increases performance when performing many monadic binds. However, this does nothing to help us with speeding up the "categorical composition" of conduit, where we connect two components of a pipeline together so the output from the first flows into the second. conduit usually refers to this as fusion, but given the topic at hand (stream fusion), I think that nomenclature will become confusing. So let's stick to categorical composition, even though conduit isn't actually a category.

Duncan Coutts, Roman Leshchinskiy and Don Stewart wrote the stream fusion paper, and that technique has become integral to getting high performance in the vector and text packages. The paper is well worth the read, but for those unfamiliar with the technique, let me give a very brief summary:

  • GHC is very good at optimising non-recursive functions.
  • We express all of our streaming functions has a combination of some internal state, and a function to step over that state.
  • Stepping either indicates that the stream is complete, there's a new value and a new state, or there's a new state without a new value (this last case helps avoid recursion for a number of functions like filter).
  • A stream transformers (like map) takes a Stream as input and produces a new Stream as output.
  • The final consuming functions, like fold, are the only place where recursion happens. This allows all other components of the pipeline to be inlined, rewritten to more efficient formats, and optimized by GHC.

Let's see how this looks compared to conduit.

Data types

I'm going to slightly rename data types from stream fusion to avoid conflicts with existing conduit names. I'm also going to add an extra type parameter to represent the final return value of a stream; this is a concept that exists in conduit, but not common stream fusion.

data Step s o r
    = Emit s o
    | Skip s
    | Stop r
data Stream m o r = forall s. Stream
    (s -> m (Step s o r))
    (m s)

The Step datatype takes three parameters. s is the internal state used by the stream, o is the type of the stream of values it generates, and r is the final result value. The Stream datatype uses an existential to hide away that internal state. It then consists of a step function that takes a state and gives us a new Step, as well as an initial state value (which is a monadic action, for cases where we want to do some initialization when starting a stream).

Let's look at some functions to get a feel for what this programming style looks like:

enumFromToS_int :: (Integral a, Monad m) => a -> a -> Stream m a ()
enumFromToS_int !x0 !y =
    Stream step (return x0)
    step x | x <= y    = return $ Emit (x + 1) x
           | otherwise = return $ Stop ()

This function generates a stream of integral values from x0 to y. The internal state is the current value to be emitted. If the current value is less than or equal to y, we emit our current value, and update our state to be the next value. Otherwise, we stop.

We can also write a function that transforms an existing stream. mapS is likely the simplest example of this:

mapS :: Monad m => (a -> b) -> Stream m a r -> Stream m b r
mapS f (Stream step ms0) =
    Stream step' ms0
    step' s = do
        res <- step s
        return $ case res of
            Stop r -> Stop r
            Emit s' a -> Emit s' (f a)
            Skip s' -> Skip s'

The trick here is to make a function from one Stream to another. We unpack the input Stream constructor to get the input step and state functions. Since mapS has no state of its own, we simply keep the input state unmodified. We then provide our modified step' function. This calls the input step function, and any time it sees an Emit, applies the user-provided f function to the emitted value.

Finally, let's consider the consumption of a stream with a strict left fold:

foldS :: Monad m => (b -> a -> b) -> b -> Stream m a () -> m b
foldS f b0 (Stream step ms0) =
    ms0 >>= loop b0
    loop !b s = do
        res <- step s
        case res of
            Stop () -> return b
            Skip s' -> loop b s'
            Emit s' a -> loop (f b a) s'

We unpack the input Stream constructor again, get the initial state, and then loop. Each loop, we run the input step function.

Match and mismatch with conduit

There's a simple, straightforward conversion from a Stream to a Source:

toSource :: Monad m => Stream m a () -> Producer m a
toSource (Stream step ms0) =
    lift ms0 >>= loop
    loop s = do
        res <- lift $ step s
        case res of
            Stop () -> return ()
            Skip s' -> loop s'
            Emit s' a -> yield a >> loop s'

We extract the state, and then loop over it, calling yield for each emitted value. And ignoring finalizers for the moment, there's even a way to convert a Source into a Stream:

fromSource :: Monad m => Source m a -> Stream m a ()
fromSource (ConduitM src0) =
    Stream step (return $ src0 Done)
    step (Done ()) = return $ Stop ()
    step (Leftover p ()) = return $ Skip p
    step (NeedInput _ p) = return $ Skip $ p ()
    step (PipeM mp) = liftM Skip mp
    step (HaveOutput p _finalizer o) = return $ Emit p o

Unfortunately, there's no straightforward conversion for Conduits (transformers) and Sinks (consumers). There's simply a mismatch in the conduit world- which is fully continuation based- to the stream world- where the upstream is provided in an encapsulated value. I did find a few representations that mostly work, but the performance characteristics are terrible.

If anyone has insights into this that I missed, please contact me, as this could have an important impact on the future of stream fusion in conduit. But for the remainder of this blog post, I will continue under the assumption that only Source and Stream can be efficiently converted.


Once I accepted that I wouldn't be able to convert a stream transformation into a conduit transformation, I was left with a simple approach to start working on fusion: have two representations of each function we want to be able to fuse. The first representation would use normal conduit code, and the second would be streaming. This looks like:

data StreamConduit i o m r = StreamConduit
    (ConduitM i o m r)
    (Stream m i () -> Stream m o r)

Notice that the second field uses the stream fusion concept of a Stream-transforming function. At first, this may seem like it doesn't properly address Sources and Sinks, since the former doesn't have an input Stream, and the latter results in a single output value, not a Stream. However, those are really just special cases of the more general form used here. For Sources, we provide an empty input stream, and for Sinks, we continue executing the Stream until we get a Stop constructor with the final result. You can see both of these in the implementation of the connectStream function (whose purpose I'll explain in a moment):

connectStream :: Monad m
              => StreamConduit () i    m ()
              -> StreamConduit i  Void m r
              -> m r
connectStream (StreamConduit _ stream) (StreamConduit _ f) =
    run $ f $ stream $ Stream emptyStep (return ())
    emptyStep _ = return $ Stop ()
    run (Stream step ms0) =
        ms0 >>= loop
        loop s = do
            res <- step s
            case res of
                Stop r -> return r
                Skip s' -> loop s'
                Emit _ o -> absurd o

Notice how we've created an empty Stream using emptyStep and a dummy () state. And on the run side, we loop through the results. The type system (via the Void datatype) prevents the possibility of a meaningful Emit constructor, and we witness this with the absurd function. For Stop we return the final value, and Skip implies another loop.

Fusing StreamConduit

Assuming we have some functions that use StreamConduit, how do we get things to fuse? We still need all of our functions to have a ConduitM type signature, so we start off with a function to convert a StreamConduit into a ConduitM:

unstream :: StreamConduit i o m r -> ConduitM i o m r
unstream (StreamConduit c _) = c
{-# INLINE [0] unstream #-}

Note that we hold off on any inlining until simplification phase 0. This is vital to our next few rewrite rules, which is where all the magic happens.

The next thing we want to be able to do is categorically compose two StreamConduits together. This is easy to do, since a StreamConduit is made up of ConduitMs which compose via the =$= operator, and Stream transformers, which compose via normal function composition. This results in a function:

fuseStream :: Monad m
           => StreamConduit a b m ()
           -> StreamConduit b c m r
           -> StreamConduit a c m r
fuseStream (StreamConduit a x) (StreamConduit b y) = StreamConduit (a =$= b) (y . x)
{-# INLINE fuseStream #-}

That's very logical, but still not magical. The final trick is a rewrite rule:

{-# RULES "fuseStream" forall left right.
        unstream left =$= unstream right = unstream (fuseStream left right)

We're telling GHC that, if we see a composition of two streamable conduits, then we can compose the stream versions of them and get the same result. But this isn't enough yet; unstream will still end up throwing away the stream version. We now need to deal with running these things. The first case we'll handle is connecting two streamable conduits, which is where the connectStream function from above comes into play. If you go back and look at that code, you'll see that the ConduitM fields are never used. All that's left is telling GHC to use connectStream when appropriate:

{-# RULES "connectStream" forall left right.
        unstream left $$ unstream right = connectStream left right

The next case we'll handle is when we connect a streamable source to a non-streamable sink. This is less efficient than the previous case, since it still requires allocating ConduitM constructors, and doesn't expose as many opportunities for GHC to inline and optimize our code. However, it's still better than nothing:

connectStream1 :: Monad m
               => StreamConduit () i    m ()
               -> ConduitM      i  Void m r
               -> m r
connectStream1 (StreamConduit _ fstream) (ConduitM sink0) =
    case fstream $ Stream (const $ return $ Stop ()) (return ()) of
        Stream step ms0 ->
            let loop _ (Done r) _ = return r
                loop ls (PipeM mp) s = mp >>= flip (loop ls) s
                loop ls (Leftover p l) s = loop (l:ls) p s
                loop _ (HaveOutput _ _ o) _ = absurd o
                loop (l:ls) (NeedInput p _) s = loop ls (p l) s
                loop [] (NeedInput p c) s = do
                    res <- step s
                    case res of
                        Stop () -> loop [] (c ()) s
                        Skip s' -> loop [] (NeedInput p c) s'
                        Emit s' i -> loop [] (p i) s'
             in ms0 >>= loop [] (sink0 Done)
{-# INLINE connectStream1 #-}

{-# RULES "connectStream1" forall left right.
        unstream left $$ right = connectStream1 left right

There's a third case that's worth considering: a streamable sink and non-streamable source. However, I ran into two problems when implementing such a rewrite rule:

  • GHC did not end up firing the rule.

  • There are some corner cases regarding finalizers that need to be dealt with. In our previous examples, the upstream was always a stream, which has no concept of finalizers. But when the upstream is a conduit, we need to make sure to call them appropriately.

So for now, fusion only works for cases where all of the functions can by fused, or all of the functions before the $$ operator can be fused. Otherwise, we'll revert to the normal performance of conduit code.


I took the benchmarks from our previous blog post and modified them slightly. The biggest addition was including an example of enumFromTo =$= map =$= map =$= fold, which really stresses out the fusion capabilities, and demonstrates the performance gap stream fusion offers.

The other thing to note is that, in the "before fusion" benchmarks, the sum results are skewed by the fact that we have the overly eager rewrite rules for enumFromTo $$ fold (for more information, see the previous blog post). For the "after fusion" benchmarks, there are no special-case rewrite rules in place. Instead, the results you're seeing are actual artifacts of having a proper fusion framework in place. In other words, you can expect this to translate into real-world speedups.

You can compare before fusion and after fusion. Let me provide a few select comparisons:

Benchmark Low level or vector Before fusion After fusion Speedup
map + sum 5.95us 636us 5.96us 99%
monte carlo 3.45ms 5.34ms 3.70ms 71%
sliding window size 10, Seq 1.53ms 1.89ms 1.53ms 21%
sliding vector size 10, unboxed 2.25ms 4.05ms 2.33ms 42%

Note at the map + sum benchmark is very extreme, since the inner loop is doing very cheap work, so the conduit overhead dominated the analysis.

Streamifying a conduit

Here's an example of making a conduit function stream fusion-compliant, using the map function:

mapC :: Monad m => (a -> b) -> Conduit a m b
mapC f = awaitForever $ yield . f
{-# INLINE mapC #-}

mapS :: Monad m => (a -> b) -> Stream m a r -> Stream m b r
mapS f (Stream step ms0) =
    Stream step' ms0
    step' s = do
        res <- step s
        return $ case res of
            Stop r -> Stop r
            Emit s' a -> Emit s' (f a)
            Skip s' -> Skip s'
{-# INLINE mapS #-}

map :: Monad m => (a -> b) -> Conduit a m b
map = mapC
{-# INLINE [0] map #-}
{-# RULES "unstream map" forall f.
    map f = unstream (StreamConduit (mapC f) (mapS f))

Notice the three steps here:

  • Define a pure-conduit implementation (mapC), which looks just like conduit 1.1's map function.
  • Define a pure-stream implementation (mapS), which looks very similar to vector's mapS.
  • Define map, which by default simply reexposes mapC. But then, use an INLINE statement to delay inlining until simplification phase 0, and use a rewrite rule to rewrite map in terms of unstream and our two helper functions mapC and mapS.

While tedious, this is all we need to do for each function to expose it to the fusion framework.

Vector vs conduit, mapM style

Overall, vector has been both the inspiration for the work I've done here, and the bar I've used to compare against, since it is generally the fastest implementation you can get in Haskell (and tends to be high-level code to boot). However, there seems to be one workflow where conduit drastically outperforms vector: chaining together monadic transformations.

I put together a benchmark which does the same enumFromTo+map+sum benchmark I demonstrated previously. But this time, I have four versions: vector with pure functions, vector with IO functions, conduit with pure functions, and conduit with IO functions. You can see the results here, the important takeaway is:

  • Pure is always faster, since it exposes more optimizations to GHC.
  • vector and conduit pure are almost identical, at 57.7us and 58.1us.
  • Monadic conduit code does have a slowdown (86.3us). However, monadic vector code has a drastic slowdown (305us), presumably because monadic binds defeat its fusion framework.

So there seems to be at least one workflow for which conduit's fusion framework can outperform even vector!


The biggest downside to this implementation of stream fusion is that we need to write all of our algorithms twice. This can possibly be mitigated by having a few helper functions in place, and implementing others in terms of those. For example, mapM_ can be implemented in terms foldM.

There's one exception to this: using the streamSource function, we can convert a Stream into a Source without having to write our algorithm twice. However, due to differences in how monadic actions are performed between Stream and Conduit, this could introduce a performance degredation for pure Sources. We can work around that with a special case function streamSourcePure for the Identity monad as a base.

Getting good performance

In order to take advantage of the new stream fusion framework, try to follow these guidelines:

  • Use fusion functions whenever possible. Explicit usage of await and yield will immediately kick you back to non-fusion (the same as explicit pattern matching defeats list fusion).
  • If you absolutely cannot use an existing fusion function, consider writing your own fusion variant.
  • When mixing fusion and non-fusion, put as many fusion functions as possible together with the $= operator before the connect operator $$.

Next steps

Even though this work is now publicly available on Hackage, there's still a lot of work to be done. This falls into three main categories:

  • Continue rewriting core library functions in streaming style. Michael Sloan has been working on a lot of these functions, and we're hoping to have almost all the combinators from Data.Conduit.List and Data.Conduit.Combinators done soon.
  • Research why rewrite rules and inlining don't play nicely together. In a number of places, we've had to explicitly use rewrite rules to force fusion to happen, when theoretically inlining should have taken care of it for us.
  • Look into any possible alternative formulations of stream fusion that provide better code reuse or more reliable rewrite rule firing.

Community assistance on all three points, but especially 2 and 3, are much appreciated!

August 27, 2014 12:00 AM

August 26, 2014

Edward Z. Yang

A taste of Cabalized Backpack

So perhaps you've bought into modules and modularity and want to get to using Backpack straightaway. How can you do it? In this blog post, I want to give a tutorial-style taste of how to program Cabal in the Backpack style. None of these examples are executable, because only some of this system is in GHC HEAD--the rest are on branches awaiting code review or complete vaporware. However, we've got a pretty good idea how the overall design and user experience should go, and so the purpose of this blog post is to communicate that idea. Comments and suggestions would be much appreciated; while the design here is theoretically well-founded, for obvious reasons, we don't have much on-the-ground programmer feedback yet.

A simple package in today's Cabal

To start, let's briefly review how Haskell modules and Cabal packages work today. Our running example will be the bytestring package, although I'll inline, simplify and omit definitions to enhance clarity.

Let's suppose that you are writing a library, and you want to use efficient, packed strings for some binary processing you are doing. Fortunately for you, the venerable Don Stewart has already written a bytestring package which implements this functionality for you. This package consists of a few modules: an implementation of strict ByteStrings...

module Data.ByteString(ByteString, empty, singleton, ...) where
  data ByteString = PS !(ForeignPtr Word8) !Int !Int
  empty :: ByteString
  empty = PS nullForeignPtr 0 0

...and an implementation of lazy ByteStrings:

module Data.ByteString.Lazy(ByteString, empty, singleton, ...) where
  data ByteString = Empty | Chunk !S.ByteString ByteString
  empty :: ByteString
  empty = Empty

These modules are packaged up into a package which is specified using a Cabal file (for now, we'll ignore the ability to define libraries/executables in the same Cabal file and assume everything is in a library):

name: bytestring
build-depends: base >= 4.2 && < 5, ghc-prim, deepseq
exposed-modules: Data.ByteString, Data.ByteString.Lazy, ...
other-modules: ...

We can then make a simple module and package which depends on the bytestring package:

module Utils where
  import Data.ByteString.Lazy as B
  blank :: IO ()
  blank = B.putStr B.empty
name: utilities
version: 0.1
build-depends: base, bytestring >= 0.10
exposed-modules: Utils

It's worth noting a few things about this completely standard module setup:

  1. It's not possible to switch Utils from using lazy ByteStrings to strict ByteStrings without literally editing the Utils module. And even if you do that, you can't have Utils depending on strict ByteString, and Utils depending on lazy ByteString, in the same program, without copying the entire module text. (This is not too surprising, since the code really is different.)
  2. Nevertheless, there is some amount of indirection here: while Utils includes a specific ByteString module, it is unspecified which version of ByteString it will be. If (hypothetically) the bytestring library released a new version where lazy byte-strings were actually strict, the functionality of Utils would change accordingly when the user re-ran dependency resolution.
  3. I used a qualified import to refer to identifiers in Data.ByteString.Lazy. This is a pretty common pattern when developing Haskell code: we think of B as an alias to the actual model. Textually, this is also helpful, because it means I only have to edit the import statement to change which ByteString I refer to.

Generalizing Utils with a signature

To generalize Utils with some Backpack magic, we need to create a signature for ByteString, which specifies what the interface of the module providing ByteStrings is. Here one such signature, which is placed in the file Data/ByteString.hsig inside the utilities package:

module Data.ByteString where
  import Data.Word
  data ByteString
  instance Eq ByteString
  empty :: ByteString
  singleton :: Word8 -> ByteString
  putStr :: ByteString -> IO ()

The format of a signature is essentially the same of that of an hs-boot file: we have normal Haskell declarations, but omitting the actual implementations of values.

The utilities package now needs a new field to record signatures:

name: utilities
indefinite: True
build-depends: base
exposed-modules: Utils
required-signatures: Data.ByteString

Notice that there have been three changes: (1) We've removed the direct dependency on the bytestring package, (2) we've added a new field indefinite, which indicates that this indefinite package has signatures and cannot be compiled until those signatures are filled in with implementations (this field is strictly redundant, but is useful for documentation purposes, as we will see later), and (3) we have a new field required-signatures which simply lists the names of the signature files (also known as holes) that we need filled in.

How do we actually use the utilities package, then? Let's suppose our goal is to produce a new module, Utils.Strict, which is Utils but using strict ByteStrings (which is exported by the bytestring package under the module name Data.ByteString). To do this, we'll need to create a new package:

name: strict-utilities
build-depends: utilities, bytestring
reexported-modules: Utils as Utils.Strict

That's it! strict-utilities exports a single module Utils.Strict which is utilities using Data.ByteString from bytestring (which is the strict implementation). This is called a mix-in: in the same dependency list, we simply mix together:

  • utilities, which requires a module named Data.ByteString, and
  • bytestring, which supplies a module named Data.ByteString.

Cabal automatically figures out that how to instantiate the utilities package by matching together module names. Specifically, the two packages above are connected through the module name Data.ByteString. This makes for a very convenient (and as it turns out, expressive) mode of package instantiation. By the way, reexported-modules is a new (orthogonal) feature which lets us reexport a module from the current package or a dependency to the outside world under a different name. The modules that are exported by the package are the exposed-modules and the reexported-modules. The reason we distinguish them is to make clear which modules have source code in the package (exposed-modules).

Unusually, strict-utilities is a package that contains no code! Its sole purpose is to mix existing packages.

Now, you might be wondering: how do we instantiate utilities with the lazy ByteString implementation? That implementation was put in Data.ByteString.Lazy, so the names don't match up. In this case, we can use another new feature, module thinning and renaming:

name: lazy-utilities
  bytestring (Data.ByteString.Lazy as Data.ByteString)
reexported-modules: Utils as Utils.Lazy

The utilities dependency is business as usual, but bytestring has a little parenthesized expression next to it. This expression is the thinning and renaming applied to the package import: it controls what modules are brought into the scope of the current package from a dependency, possibly renaming them to different names. When I write build-depends: bytestring (Data.ByteString.Lazy as Data.ByteString), I am saying "I depend on the bytestring package, but please only make the Data.ByteString.Lazy module available under the name Data.ByteString when considering module imports, and ignore all the other exposed modules." In strict-utilities, you could have also written bytestring (Data.ByteString), because this is the only module that utilities uses from bytestring.

An interesting duality is that you can do the renaming the other way:

name: lazy-utilities
  utilities (Utils, Data.ByteString as Data.ByteString.Lazy),

Instead of renaming the implementation, I renamed the hole! It's equivalent: the thing that matters it that the signature and implementation need to be mixed under the same name in order for linking (the instantiation of the signature with the implementation) to occur.

There are a few things to note about signature usage:

  1. If you are using a signature, there's not much point in also specifying an explicit import list when you import it: you are guaranteed to only see types and definitions that are in the signature (modulo type classes... a topic for another day). Signature files act like a type-safe import list which you can share across modules.

  2. A signature can, and indeed often must, import other modules. In the type signature for singleton in Data/ByteString.hsig, we needed to refer to a type Word8, so we must bring it into scope by importing Data.Word.

    Now, when we compile the signature in the utilities package, we need to know where Data.Word came from. It could have come from another signature, but in this case, it's provided by the definite package base: it's a proper concrete module with an implementation! Signatures can depend on implementations: since we can only refer to types from those modules, we are saying, in effect: any implementation of the singleton function and any representation of the ByteString type is acceptable, but regarding Word8 you must use the specific type from Data.Word in prelude.

  3. What happens if, independently of my packages strict-utilities, someone else also instantiatiates utilities with Data.ByteString? Backpack is clever enough to reuse the instantiation of utilities: this property is called applicativity of the module system. The specific rule that we use to decide if the instantiation is the same is to look at how all of the holes needed by a package are instantiated, and if they are instantiated with precisely the same modules, the instantiated packages are considered type equal. So there is no need to actually create strict-utilities or lazy-utilities: you can just instantiate utilities on the fly.

Mini-quiz: What does this package do?

name: quiz-utilities
  utilities (Utils, Data.ByteString as B),
  bytestring (Data.ByteString.Lazy as B)

Sharing signatures

It's all very nice to be able to explicitly write a signature for Data.ByteString in my package, but this could get old if I have to do this for every single package I depend on. It would be much nicer if I could just put all my signatures in a package and include that when I want to share it. I want all of the Hackage mechanisms to apply to my signatures as well as my normal packages (e.g. versioning). Well, you can!

The author of bytestring can write a bytestring-sig package which contains only signatures:

name: bytestring-sig
version: 1.0
indefinite: True
build-depends: base
exposed-signatures: Data.ByteString

...and declare that the bytestring package satisfies this signature:

name: bytestring
implements: bytestring-sig-1.0

The implements fields is purely advisory: it offers a proactive check to library authors to make sure they aren't breaking compatibility with signatures, and it also helps Cabal offer suggestions for how to provide implementations for signatures.

Now, utilities can include this package to indicate its dependence on the signature:

name: utilities
indefinite: True
build-depends: base, bytestring-sig-1.0
exposed-modules: Utils

Unlike normal dependencies, signature dependencies should be exact: after all, while you might want an upgraded implementation, you don't want the signature to change on you!

Another interesting difference is that we specified the signatures using exposed-signatures, as opposed to required-signatures. We can summarize all of the fields as follows:

  1. exposed-modules says that there is a public module defined in this package
  2. other-modules says that there is a private module defined in this package
  3. exposed-signatures says that there is a public signature defined in this package
  4. required-signatures says that there is a "private" signature defined in this package
  5. reexported-modules says that there is a public module or signature defined in a dependency.

In this list, public means that it is available to clients. Notice the first four fields list all of the source code in this package. Here is a simple example of a client:

name: utilities-extras
indefinite: True
build-depends: utilities
exposed-modules: Utils.Extra

Utils/Extra.hs defined in this package can import Utils (because it's exposed by utilities) but can't import Data.ByteString (because it's not exposed). Had we said reexported-modules: Data.ByteString in utilities, then Data.ByteString would have been accessible.

Do note, however, that the package is still indefinite (since it depends on an indefinite package). Despite Data.ByteString being "private" to utilities (not importable), a client may still refer to it in a renaming clause in order to instantiate the module:

name: utilities-extras-lazy
  utilities-extras (Data.ByteString as Data.ByteString.Lazy),

You can't "hide" holes altogether: that would be like saying, "I'm never going to say what the actual implementation is!" But you can choose not to directly rely on them.

By the way, if Utils/Extra.hs, in utilities-extras, wanted to import Data.ByteString (even though utilities did not expose it), utilities-extras simply needs directly depend on the signature package:

name: utilities-extras
indefinite: True
build-depends: utilities, bytestring-sig == 1.0
exposed-modules: Utils.Extra

The Data.ByteString hole from utilities and the new hole included here are automatically checked for compatibility and linked together: you only need to provide one implementation for both of them.

Mini-quiz: What does this package do? Specifically, if I include it in a package, what modules are available for import?

name: attoparsec-sig
version: 1.0
indefinite: True
build-depends: base, bytestring-sig
exposed-signatures: Data.Attoparsec


We've covered a lot of ground, but when it comes down to it, Backpack really comes together because of set of orthogonal features which interact in a good way:

  1. Module signatures (mostly implemented but needs lots of testing): the heart of a module system, giving us the ability to write indefinite packages and mix together implementations,
  2. Module reexports (fully implemented and in HEAD): the ability to take locally available modules and reexport them under a different name, and
  3. Module thinning and renaming (fully implemented and in code review): the ability to selectively make available modules from a dependency.

To compile a Backpack package, we first run the traditional version dependency solving, getting exact versions for all packages involved, and then we calculate how to link the packages together. That's it! In a future blog post, I plan to more comprehensively describe the semantics of these new features, especially module signatures, which can be subtle at times. Also, note that I've said nothing about how to type-check against just a signature, without having any implementation in mind. As of right now, this functionality is vaporware; in a future blog post, I also plan on saying why this is so challenging.

by Edward Z. Yang at August 26, 2014 10:01 PM

Chris Smith

On CodeWorld and Haskell

I’ve been pouring a lot of effort into CodeWorld lately… and I wanted to write a sort of apology to the Haskell community.  Well, perhaps not an apology, because I believe I did the right thing.  But at the same time, I realize that decisions I’ve made haven’t been entirely popular among Haskell programmers.  I’d like to explain what happened, and try to make it up to you!

What Happened

Originally, I started this project using Haskell and the excellent gloss package, by Ben Lippmeier.  CodeWorld has been moving slowly further and further away from the rest of the Haskell community.  This has happened in a sequence of steps:

  1. Way back in 2011, I started “CodeWorld”, but at the time, I called it Haskell for Kids.  At the time, I understood that the reasons I’d chosen Haskell as a language were not about cool stuff like type classes (which I love) and monads and categories and other commonplace uses of solid abstractions (which fascinate me).  Instead, I chose Haskell for the simple reason that it looked like math.  The rest of Haskell came with the territory.  I built the first CodeWorld web site in a weekend, and I had to settle on a language and accept all that came with it.
  2. From the beginning, I made some changes for pedagogical reasons.  For example, gloss defines rotation to be clockwise.  I insisted on rotation working in the counter-clockwise direction, because that’s the convention universally used in math.  Later, I resized the canvas to 20×20, so that typical programs would need to use fractions and decimals, which is a middle school math education goal.  I made thes changes, even though they broke compatibility with a widely used package.  Sorry for anyone that’s struggled with this.
  3. I rebranded “Haskell for Kids” as CodeWorld, and stopped explicitly depending on gloss in favor of just reproducing its general approach in a new Prelude.  This was a deliberate attempt to get away from focusing on the Haskell language and libraries, and also to the accompanying import statements and such.  This hid the ways that Haskell was a general purpose language with uses outside this toy environment.  That is unfortunate.
  4. I rewrote the Haskell Prelude, to remove type classes.  Along the way, I collapsed the whole numeric type class hierarchy into a single type, and even got Luite (the author of GHCJS) to help me with some deep black magic to implement equality on arbitrary Haskell types without type classes.  This threw away much of the beauty of Haskell… in favor of dramatically improved error messages, and fewer things you need to know to get started.  It was a real loss.
  5. Finally, I commited the unforgivable sin.  I dropped curried functions, in favor of defining functions of multiple parameters using tuples.  This finally makes CodeWorld feel like a completely different language from Haskell.  That really sucks, and I know some people are frustrated.

Why It Happened?

First, I want to point out some things that are not the reason for any of this:

  • I did not do this because I think there’s something wrong with Haskell.  I love type classes.  I love currying, and especially love how it’s not just a convenient trick, but sometimes introduces whole new perspectives by viewing tedious functions of multiple parameters as simple, clean, and elegant higher-order functions.
  • I also did not do this because I think anyone is incapable of learning full-fledged Haskell.  In fact, I taught full-fledged Haskell to middle schoolers for a year.  I know they can do it.

So why did I do it?  Two reasons:

  • Teaching mathematics has always been more important to me than teaching Haskell.  While Haskell is an awesome programming language, mathematics is just an awesome perspective on life.  For every student who benefits from learning an inspiring programming language, many students will benefit from learning that humanity has a method called mathematics for thinking about fundamental truths in a systematic, logical way that can capture things precisely.  So any time I have to choose between pushing students further toward their math education or away from it, I’ll choose toward it.
  • Details matter.  Even though I know kids are capable of a lot, they are capable of a lot more without artificial obstacles in their way.  I learned this the hard way teaching this class the first time.  The smallest little things, with absolutely no great significance as a language, matter a lot.  Having to put parentheses around negative numbers obscures students from reaching leaps of understanding.  Confusing error messages mean the difference between a student who spends a weekend learning, and one who gives up on Friday afternoon and doesn’t think about it until the next school day.  Different surface syntax means that a lot of kids never fully make the connection that functions here are the same thing as functions there.

In the end, I do think these were the right decisions… despite the frustration they can cause for Haskell programmers who know there’s a better way.

Making Up For It

A couple weekends ago, though, I worked on something to hopefully restore some of this loss for Haskellers.  You see, all the changes I’ve made, in the end, come from replacing the Prelude module with my own alternative.  Specifically:

  1. I deliberately replaced functions from the Prelude with my modified versions.
  2. Because I provided an alternative Prelude, I had to hide the base package, which made it impossible to import things like Control.Monad.  This was not a deliberate decision.  It just happened.

So I fixed this.  I added to the codeworld-base package re-exports of all of the modules from base.  I renamed Prelude to HaskellPrelude in the process, so that it doesn’t conflict with my own Prelude.  And finally, I added a new module, CodeWorld, that exports all the really new stuff from CodeWorld like pictures, colors, and the interpreters for pictures, animations, simulations, etc.  The result is that you can now start your programs with the following:

import Prelude()
import HaskellPrelude
import CodeWorld -- If you still want to do pictures, etc.

main = putStrLn "Hello, World"

At this point, you can write any Haskell you like!  You aren’t even constrained to pure code, or safe code.  (The exception: TemplateHaskell is still rejected, since the compiler runs on the server, so TH code would execute code on the server.)

In fact, it’s even better!  You’re free to use GHCJS JavaScript foreign imports, to interact with the browser environment!  See a brief example here.  Now you’re out of the sandbox, and are free to play around however you like.

Right now, the CodeWorld module still uses uncurried functions and other CodeWorld conventions like Number for numbers, etc.  There’s no reason for this, and it’s something that I should probably change.  Anyone want to send a pull request?

by cdsmith at August 26, 2014 04:38 PM

Dominic Steinitz

Haskell Vectors and Sampling from a Categorical Distribution


Suppose we have a vector of weights which sum to 1.0 and we wish to sample n samples randomly according to these weights. There is a well known trick in Matlab / Octave using sampling from a uniform distribution.

num_particles = 2*10^7
likelihood = zeros(num_particles,1);
likelihood(:,1) = 1/num_particles;
[_,index] = histc(rand(num_particles,1),[0;cumsum(likelihood/sum(likelihood))]);
s = sum(index);

Using tic and toc this produces an answer with

Elapsed time is 10.7763 seconds.


I could find no equivalent function in Haskell nor could I easily find a binary search function.

> {-# OPTIONS_GHC -Wall                     #-}
> {-# OPTIONS_GHC -fno-warn-name-shadowing  #-}
> {-# OPTIONS_GHC -fno-warn-type-defaults   #-}
> {-# OPTIONS_GHC -fno-warn-unused-do-bind  #-}
> {-# OPTIONS_GHC -fno-warn-missing-methods #-}
> {-# OPTIONS_GHC -fno-warn-orphans         #-}
> {-# LANGUAGE BangPatterns                 #-}
> import System.Random.MWC
> import qualified Data.Vector.Unboxed as V
> import Control.Monad.ST
> import qualified Data.Vector.Algorithms.Search as S
> import Data.Bits
> n :: Int
> n = 2*10^7

Let’s create some random data. For a change let’s use mwc-random rather than random-fu.

> vs  :: V.Vector Double
> vs = runST (create >>= (asGenST $ \gen -> uniformVector gen n))

Again, I could find no equivalent of cumsum but we can write our own.

> weightsV, cumSumWeightsV :: V.Vector Double
> weightsV = V.replicate n (recip $ fromIntegral n)
> cumSumWeightsV = V.scanl (+) 0 weightsV

Binary search on a sorted vector is straightforward and a cumulative sum ensures that the vector is sorted.

> binarySearch :: (V.Unbox a, Ord a) =>
>                 V.Vector a -> a -> Int
> binarySearch vec x = loop 0 (V.length vec - 1)
>   where
>     loop !l !u
>       | u <= l    = l
>       | otherwise = let e = vec V.! k in if x <= e then loop l k else loop (k+1) u
>       where k = l + (u - l) `shiftR` 1
> indices :: V.Vector Double -> V.Vector Double -> V.Vector Int
> indices bs xs = (binarySearch bs) xs

To see how well this performs, let’s sum the indices (of course, we wouldn’t do this in practice) as we did for the Matlab implementation.

> js :: V.Vector Int
> js = indices (V.tail cumSumWeightsV) vs
> main :: IO ()
> main = do
>   print $ V.foldl' (+) 0 js

Using +RTS -s we get

Total   time   10.80s  ( 11.06s elapsed)

which is almost the same as the Matlab version.

I did eventually find a binary search function in vector-algorithms and since one should not re-invent the wheel, let us try using it.

> indices' :: (V.Unbox a, Ord a) => V.Vector a -> V.Vector a -> V.Vector Int
> indices' sv x = runST $ do
>   st <- V.unsafeThaw (V.tail sv)
>   V.mapM (S.binarySearch st) x
> main' :: IO ()
> main' = do
>   print $  V.foldl' (+) 0 $ indices' cumSumWeightsV vs

Again using +RTS -s we get

Total   time   11.34s  ( 11.73s elapsed)

So the library version seems very slightly slower.

by Dominic Steinitz at August 26, 2014 03:05 PM

August 25, 2014

Danny Gratzer

Introduction to Dependent Types: Haskell on Steroids

Posted on August 25, 2014

I’d like to start another series of blog posts. This time on something that I’ve wanted to write about for a while, dependent types.

There’s a noticeable lack of accessible materials introducing dependent types at a high level aimed at functional programmers. That’s what this series sets out help fill. Therefore, if you’re a Haskell programmer and don’t understand something, it’s a bug! Please comment so I can help make this a more useful resource for you :)

There are four parts to this series, each answering one question

  1. What are dependent types?
  2. What does a dependently typed language look like?
  3. What does it feel like to write programs with dependent types?
  4. What does it mean to “prove” something?

So first things first, what are dependent types? Most people by now have heard the unhelpful quick answer

A dependent type is a type that depends on a value, not just other types.

But that’s not helpful! What does this actually look like? To try to understand this we’re going to write some Haskell code that pushes us as close as we can get to dependent types in Haskell.

Kicking GHC in the Teeth

Let’s start with the flurry of extensions we need

{-# LANGUAGE DataKinds            #-}
{-# LANGUAGE KindSignatures       #-}
{-# LANGUAGE GADTs                #-}
{-# LANGUAGE TypeFamilies         #-}
{-# LANGUAGE UndecidableInstances #-}

Now our first definition is a standard formulation of natural numbers

    data Nat = Z | S Nat

Here Z represents 0 and S means + 1. So you should read S Z as 1, S (S Z) as 2 and so on and so on.

If you’re having some trouble, this function to convert an Int to a Nat might help

    -- Naively assume n >= 0
    toNat :: Int -> Nat
    toNat 0 = Z
    toNat n = S (toNat $ n - 1)

We can use this definition to formulate addition

    plus :: Nat -> Nat -> Nat
    plus Z n     = n
    plus (S n) m = S (plus n m)

This definition proceeds by “structural induction”. That’s a scary word that pops up around dependent types. It’s not all that complicated, all that it means is that we use recursion only on strictly smaller terms.

There is a way to formally define smaller, if a term is a constructor applied to several (recursive) arguments. Any argument to the constructor is strictly smaller than the original terms. In a strict language if we restrict ourselves to only structural recursion we’re guaranteed that our function will terminate. This isn’t quite the case in Haskell since we have infinite structures.

    toInt :: Nat -> Int
    toInt (S n) = 1 + toInt n
    toInt Z     = 0

    bigNumber = S bigNumber

    main = print (toInt bigNumber) -- Uh oh!

Often people will cheerfully ignore this part of Haskell when talking about reasoning with Haskell and I’ll stick to that tradition (for now).

Now back to the matter at hand. Since our definition of Nat is quite straightforward, it get’s promoted to the kind level by DataKinds.

Now we can “reflect” values back up to this new kind with a second GADTed definition of natural numbers.

    data RNat :: Nat -> * where
      RZ :: RNat Z
      RS :: RNat n -> RNat (S n)

Now, let’s precisely specify the somewhat handwavy term “reflection”. I’m using it in the imprecise sense meaning that we’ve lifted a value into something isomorphic at the type level. Later we’ll talk about reflection precisely mean lifting a value into the type level. That’s currently not possible since we can’t have values in our types!

What on earth could that be useful for? Well with this we can do something fancy with the definition of addition.

    type family Plus n m :: Nat where
      Plus Z n     = n
      Plus (S n) m = S (Plus n m)

Now we’ve reflected our definition of addition to the type family. More than that, what we’ve written above is fairly obviously correct. We can now force our value level definition of addition to respect this type family

    plus' :: RNat n -> RNat m -> RNat (Plus n m)
    plus' RZ n     = n
    plus' (RS n) m = RS (plus' n m)

Now if we messed up this definition we’d get a type error!

    plus' :: RNat n -> RNat m -> RNat (Plus n m)
    plus' RZ n     = n
    plus' (RS n) m = plus' n m -- Unification error! n ~ S n

Super! We know have types that express strict guarantees about our program. But how useable is this?

To put it to the test, let’s try to write some code that reads to integers for standard input and prints their sum.

We can easily do this with our normal plus

    readNat :: IO Nat
    readNat = toNat <$> readLn

    main :: IO ()
    main = plus <$> readNat <*> readNat

Easy as pie! But what about RNat, how can we convert a Nat to an RNat? Well we could try something with type classes I guess

class Reify a where
  type N
  reify :: a -> RNat N

But wait, that doesn’t work since we can only have once instance for all Nats. What if we did the opposite

class Reify (n :: Nat) where
  nat :: RNat n -> Nat

This let’s us go in the other direction.. but that doesn’t help us! In fact there’s no obvious way to propagate runtime values back into the types. We’re stuck.

GHC with Iron Dentures

Now, if we could add some magical extension to GHC could we write something like above program? Yes of course! The key idea is to not reflect up our types with data kinds, but rather just allow the values to exist in the types on their own.

For these I propose two basic ideas

  1. A special reflective function type
  2. Lifting expressions into types

For our special function types, we allow the return type to use the supplied value. These are called pi types. We’ll give this the following syntax

(x :: A) -> B x

Where A :: * and B :: A -> * are some sort of type. Notice that that A in B’s kind isn’t the data kind promoted version, but just the goodness to honest normal value.

Now in order to allow B to actually make use of it’s supplied value, our second idea let’s normal types be indexed on values! Just like how GADTs can be indexed on types. We’ll call these GGADTs.

So let’s define a new version of RNat

    data RNat :: Nat -> * where
      RZ :: RNat Z
      RS :: RNat n -> RNat (S n)

This looks exactly like what we had before, but our semantics are different now. Those Z’s and S’s are meant to represent actual values, not members of some kind. There’s no promoting types to singleton kinds anymore, just plain old values being held in fancier types.

Because we can depend on normal values, we don’t even have to use our simple custom natural numbers.

    data RInt :: Int -> * where
      RZ :: RInt 0
      RS :: RInt n -> RInt (1 + n)

Notice that we allowed our types to call functions, like +. This can potentially be undecidable, something that we’ll address later.

Now we can write our function with a combination of these two ideas

    toRInt :: (n :: Int) -> RInt n
    toRInt 0 = RZ
    toRInt n = RS (toRInt $ n - 1)

Notice how we used pi types to change the return type dependent on the input value. Now we can feed this any old value, including ones we read from standard input.

    main = print . toInt $ plus' <$> fmap toRInt readLn <*> fmap toRInt readLn

Now, one might wonder how the typechecker could possibly know how to handle such things, after all how could it know what’ll be read from stdin!

The answer is that it doesn’t. When a value is reflected to the type level we can’t do anything with it. For example, if we had a type like

    (n :: Int) -> (if n == 0 then Bool else ())

Then we would have to pattern match on n at the value level to propagate information about n back to the type level.

If we did something like

    foo :: (n :: Int) -> (if n == 0 then Bool else ())
    foo n = case n of
      0 -> True
      _ -> ()

Then the typechecker would see that we’re matching on n, so if we get into the 0 -> ... branch then n must be 0. It can then reduce the return type to if 0 == 0 then Bool else () and finally Bool. A very important thing to note here is that the typechecker doesn’t evaluate the program. It’s examining the function in isolation of all other values. This means we sometimes have to hold its hand to ensure that it can figure out that all branches have the correct type.

This means that when we use pi types we often have to pattern match on our arguments in order to help the typechecker figure out what’s going on.

To make this clear, let’s play the typechecker for this function. I’m reverting to the Nat type since it’s nicer for pattern matching.

    toRNat :: (n :: Nat) -> RNat n
    toRNat Z = RZ -- We know that n is `Z` in this branch
    toRNat (S n) = RS (toRNat n {- This has the type RNat n' -})

    p :: (n :: Nat) -> (m :: Int) -> RNat (plus n m)
    p Z m     = toRNat m
    p (S n) m = RS (toRNat n m)

First the type checker goes through toRNat.

In the first branch we have n equals Z, so RZ trivially typechecks. Next we have the case S n.

  • We know that toRNat n has the type RNat n' by induction
  • We also know that S n' = n.
  • Therefore RS builds us a term of type RNat n.

Now for p. We start in much the same manner.

if we enter the p Z m case

  • we know that n is Z.
  • we can reduce plus n m since plus Z m is by definition equal to m Look at the definition of plus to confirm this).
  • We know how to produce RNat m easily since we have a function toRNat :: (n :: Nat) -> RNat n.
  • We can apply this to m and the resulting term has the type RNat m.

In the RS case we know that we’re trying to produce a term of type RNat (plus (S n) m).

  • Now since we know that the constructor for the first argument of plus, we can reduce plus (S n) m to S (plus n m) by the definition of plus.
  • We’re looking to build a term of type plus n m and that’s as simple as a recursive call.
  • From here we just need to apply RS to give us S (plus n m)
  • As we previously noted S (plus n m) is equal to plus (S n) m

Notice how as we stepped through this as the typechecker we never needed to do any arbitrary reductions. We only ever reduce definitions when we have the outer constructor (WHNF) of one of the arguments.

While I’m not actually proposing adding {-# LANGUAGE PiTypes #-} to GHC, it’s clear that with only a few orthogonal editions to system F we can get some seriously cool types.

Wrap Up

Believe or not we’ve just gone through two of the most central concepts in dependent types

  • Indexed type families (GGADTs)
  • Dependent function types (Pi types)

Not so bad was it? :) From here we’ll look in the next post how to translate our faux Haskell into actual Agda code. From there we’ll go through a few more detailed examples of pi types and GGADTs by poking through some of the Agda standard library.

Thanks for reading, I must run since I’m late for class. It’s an FP class ironically enough.

<script type="text/javascript"> /* * * CONFIGURATION VARIABLES: EDIT BEFORE PASTING INTO YOUR WEBPAGE * * */ var disqus_shortname = 'codeco'; // required: replace example with your forum shortname /* * * DON'T EDIT BELOW THIS LINE * * */ (function() { var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true; dsq.src = '//' + disqus_shortname + ''; (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq); })(); </script> <noscript>Please enable JavaScript to view the comments powered by Disqus.</noscript> comments powered by Disqus

August 25, 2014 12:00 AM

August 24, 2014

Alejandro Serrano Mena

Using Emacs for Haskell development

In the last months, the toolchain for using Haskell within Emacs has changed a lot, and has become a lot better. Apart from my additions to ghc-mod, new autocompletion packages such as company-ghc have appeared.

In the past, I've felt that there was a need for a comprehensive article of all the available options for Haskell development in Emacs, including haskell-mode, ghc-mod, company-ghc, HaRe and structured-haskell-mode. To fill this gap, I have written a tutorial covering installation, configuration and use of these tools, especially keeping an eye into making all of them work nicely when put together.

Hope it helps!

by Alejandro Serrano ( at August 24, 2014 10:48 AM

Summer of Code on Emacs!

This summer I've been participating in Google Summer of Code, as I did some years ago. My aim was the same: to make it easier for Haskell developers to interact with their code. But instead of Eclipse, I've focused on another very well-known editor: Emacs. In particular, I've been extending the already excellent ghc-mod.

During the last year I've turned increasingly jealous of the Emacs modes for Agda and Idris, two programming languages which resemble Haskell but add dependent types to the mix. Using those modes, you can work interactively with your code, write pattern matches automatically, refine certain parts of your code, ask the compiler what is the type that a certain code should have, and so on. Furthermore, since version 7.8 GHC includes support for typed holes, so it seemed like all the necessary infrastructure from the compiler was in place to do this.

Instead of a boring description of the outcome of the project, I have prepared a video demostration ;)

<iframe allowfullscreen="allowfullscreen" frameborder="0" height="450" src="" width="600"></iframe>

As a summary, here is the list of new key bindings that you can use since the release (just a few days ago) of ghc-mod 5.0:

  • C-u M-t: create the skeleton of a function from its signature, or the skeleton of a type class instance from its declaration,
  • M-t: perform case splitting on variables;
  • C-c M-n and C-c M-p: navigate between typed holes in your program, to the next or the previous one, respectively;
  • C-c C-f: refine a hole through an expression, including as much holes as needed to make it type check;
  • C-c C-a: try automatic completion of a hole by calling Djinn.

I would like to thank eveybody who has helped me during this summer, especially my mentor David Raymond Christiansen (whose work in idris-mode is just amazing) and Kazu Yamamoto, the creator and maintainer of ghc-mod.

by Alejandro Serrano ( at August 24, 2014 09:09 AM

August 23, 2014

Antti-Juhani Kaijanaho (ibid)

A milestone toward a doctorate

Yesterday I received my official diploma for the degree of Licentiate of Philosophy. The degree lies between a Master’s degree and a doctorate, and is not required; it consists of the coursework required for a doctorate, and a Licentiate Thesis, “in which the student demonstrates good conversance with the field of research and the capability of independently and critically applying scientific research methods” (official translation of the Government decree on university degrees 794/2004, Section 23 Paragraph 2).

The title and abstract of my Licentiate Thesis follow:

Kaijanaho, Antti-Juhani
The extent of empirical evidence that could inform evidence-based design of programming languages. A systematic mapping study.
Jyväskylä: University of Jyväskylä, 2014, 243 p.
(Jyväskylä Licentiate Theses in Computing,
ISSN 1795-9713; 18)
ISBN 978-951-39-5790-2 (nid.)
ISBN 978-951-39-5791-9 (PDF)
Finnish summary

Background: Programming language design is not usually informed by empirical studies. In other fields similar problems have inspired an evidence-based paradigm of practice. Central to it are secondary studies summarizing and consolidating the research literature. Aims: This systematic mapping study looks for empirical research that could inform evidence-based design of programming languages. Method: Manual and keyword-based searches were performed, as was a single round of snowballing. There were 2056 potentially relevant publications, of which 180 were selected for inclusion, because they reported empirical evidence on the efficacy of potential design decisions and were published on or before 2012. A thematic synthesis was created. Results: Included studies span four decades, but activity has been sparse until the last five years or so. The form of conditional statements and loops, as well as the choice between static and dynamic typing have all been studied empirically for efficacy in at least five studies each. Error proneness, programming comprehension, and human effort are the most common forms of efficacy studied. Experimenting with programmer participants is the most popular method. Conclusions: There clearly are language design decisions for which empirical evidence regarding efficacy exists; they may be of some use to language designers, and several of them may be ripe for systematic reviewing. There is concern that the lack of interest generated by studies in this topic area until the recent surge of activity may indicate serious issues in their research approach.

Keywords: programming languages, programming language design, evidence-based paradigm, efficacy, research methods, systematic mapping study, thematic synthesis

A Licentiate Thesis is assessed by two examiners, usually drawn from outside of the home university; they write (either jointly or separately) a substantiated statement about the thesis, in which they suggest a grade. The final grade is almost always the one suggested by the examiners. I was very fortunate to have such prominent scientists as Dr. Stefan Hanenberg and Prof. Stein Krogdahl as the examiners of my thesis. They recommended, and I received, the grade “very good” (4 on a scale of 1–5).

The thesis has been accepted for publication published in our faculty’s licentiate thesis series and will in due course appear has appeared in our university’s electronic database (along with a very small number of printed copies). In the mean time, if anyone wants an electronic preprint, send me email at

<figure class="wp-caption aligncenter" id="attachment_1622" style="width: 334px;">Figure 1 of the thesis: an overview of the mapping process<figcaption class="wp-caption-text">Figure 1 of the thesis: an overview of the mapping process</figcaption></figure>

As you can imagine, the last couple of months in the spring were very stressful for me, as I pressed on to submit this thesis. After submission, it took me nearly two months to recover (which certain people who emailed me on Planet Haskell business during that period certainly noticed). It represents the fruit of almost four years of work (way more than normally is taken to complete a Licentiate Thesis, but never mind that), as I designed this study in Fall 2010.

<figure class="wp-caption aligncenter" id="attachment_1625" style="width: 330px;">Figure 8 of the thesis: Core studies per publication year<figcaption class="wp-caption-text">Figure 8 of the thesis: Core studies per publication year</figcaption></figure>

Recently, I have been writing in my blog a series of posts in which I have been trying to clear my head about certain foundational issues that irritated me during the writing of the thesis. The thesis contains some of that, but that part of it is not very strong, as my examiners put it, for various reasons. The posts have been a deliberately non-academic attempt to shape the thoughts into words, to see what they look like fixed into a tangible form. (If you go read them, be warned: many of them are deliberately provocative, and many of them are intended as tentative in fact if not in phrasing; the series also is very incomplete at this time.)

I closed my previous post, the latest post in that series, as follows:

In fact, the whole of 20th Century philosophy of science is a big pile of failed attempts to explain science; not one explanation is fully satisfactory. [...] Most scientists enjoy not pondering it, for it’s a bit like being a cartoon character: so long as you don’t look down, you can walk on air.

I wrote my Master’s Thesis (PDF) in 2002. It was about the formal method called “B”; but I took a lot of time and pages to examine the history and content of formal logic. My supervisor was, understandably, exasperated, but I did receive the highest possible grade for it (which I never have fully accepted I deserved). The main reason for that digression: I looked down, and I just had to go poke the bridge I was standing on to make sure I was not, in fact, walking on air. In the many years since, I’ve taken a lot of time to study foundations, first of mathematics, and more recently of science. It is one reason it took me about eight years to come up with a doable doctoral project (and I am still amazed that my department kept employing me; but I suppose they like my teaching, as do I). The other reason was, it took me that long to realize how to study the design of programming languages without going where everyone has gone before.

Debian people, if any are still reading, may find it interesting that I found significant use for the dctrl-tools toolset I have been writing for Debian for about fifteen years: I stored my data collection as a big pile of dctrl-format files. I ended up making some changes to the existing tools (I should upload the new version soon, I suppose), and I wrote another toolset (unfortunately one that is not general purpose, like the dctrl-tools are) in the process.

For the Haskell people, I mainly have an apology for not attending to Planet Haskell duties in the summer; but I am back in business now. I also note, somewhat to my regret, that I found very few studies dealing with Haskell. I just checked; I mention Haskell several times in the background chapter, but it is not mentioned in the results chapter (because there were not studies worthy of special notice).

I am already working on extending this work into a doctoral thesis. I expect, and hope, to complete that one faster.

by Antti-Juhani Kaijanaho at August 23, 2014 05:44 PM

Joachim Breitner

This blog goes static

After a bit more than 9 years, I am replacing Serendipity, which as been hosting my blog, by a self-made static solution. This means that when you are reading this, my server no longer has to execute some rather large body of untyped code to produce the bytes sent to you. Instead, that happens once in a while on my laptop, and they are stored as static files on the server.

I hope to get a little performance boost from this, so that my site can more easily hold up to being mentioned on hackernews. I also do not want to worry about security issues in Serendipity – static files are not hacked.

Of course there are down-sides to having a static blog. The editing is a bit more annoying: I need to use my laptop (previously I could post from anywhere) and I edit text files instead of using a JavaScript-based WYSIWYG editor (but I was slightly annoyed by that as well). But most importantly your readers cannot comment on static pages. There are cloud-based solutions that integrate commenting via JavaScript on your static pages, but I decided to go for something even more low-level: You can comment by writing an e-mail to me, and I’ll put your comment on the page. This has the nice benefit of solving the blog comment spam problem.

The actual implementation of the blog is rather masochistic, as my web page runs on one of these weird obfuscated languages (XSLT). Previously, it contained of XSLT stylesheets producing makefiles calling XSLT sheets. Now it is a bit more-self-contained, with one XSLT stylesheet writing out all the various html and rss files.

I managed to import all my old posts and comments thanks to this script by Michael Hamann (I had played around with this some months ago and just spend what seemed to be an hour to me to find this script again) and a small Haskell script. Old URLs are rewritten (using mod_rewrite) to the new paths, but feed readers might still be confused by this.

This opens the door to a long due re-design of my webpage. But not today...

by Joachim Breitner ( at August 23, 2014 03:54 PM

Dominic Steinitz

Importance Sampling

Importance Sampling

Suppose we have an random variable X with pdf 1/2\exp{-\lvert x\rvert} and we wish to find its second moment numerically. However, the random-fu package does not support sampling from such as distribution. We notice that

\displaystyle   \int_{-\infty}^\infty x^2 \frac{1}{2} \exp{-\lvert x\rvert} \mathrm{d}x =  \int_{-\infty}^\infty x^2 \frac{\frac{1}{2} \exp{-\lvert x\rvert}}                                 {\frac{1}{\sqrt{8\pi}}{\exp{-x^2/8}}}                        \frac{1}{\sqrt{8\pi}}{\exp{-x^2/8}}  \,\mathrm{d}x

So we can sample from {\cal{N}}(0, 4) and evaluate

\displaystyle   x^2 \frac{\frac{1}{2} \exp{-\lvert x\rvert}}           {\frac{1}{\sqrt{8\pi}}{\exp{-x^2/8}}}

> {-# OPTIONS_GHC -Wall                     #-}
> {-# OPTIONS_GHC -fno-warn-name-shadowing  #-}
> {-# OPTIONS_GHC -fno-warn-type-defaults   #-}
> {-# OPTIONS_GHC -fno-warn-unused-do-bind  #-}
> {-# OPTIONS_GHC -fno-warn-missing-methods #-}
> {-# OPTIONS_GHC -fno-warn-orphans         #-}
> module Importance where
> import Control.Monad
> import Data.Random.Source.PureMT
> import Data.Random
> import Data.Random.Distribution.Binomial
> import Data.Random.Distribution.Beta
> import Control.Monad.State
> import qualified Control.Monad.Writer as W
> sampleImportance :: RVarT (W.Writer [Double]) ()
> sampleImportance = do
>   x <- rvarT $ Normal 0.0 2.0
>   let x2 = x^2
>       u = x2 * 0.5 * exp (-(abs x))
>       v = (exp ((-x2)/8)) * (recip (sqrt (8*pi)))
>       w = u / v
>   lift $ W.tell [w]
>   return ()
> runImportance :: Int -> [Double]
> runImportance n =
>   snd $
>   W.runWriter $
>   evalStateT (sample (replicateM n sampleImportance))
>              (pureMT 2)

We can run this 10,000 times to get an estimate.

ghci> import Formatting
ghci> format (fixed 2) (sum (runImportance 10000) / 10000)

Since we know that the n-th moment of the exponential distribution is n! / \lambda^n where \lambda is the rate (1 in this example), the exact answer is 2 which is not too far from our estimate using importance sampling.

The value of

\displaystyle   w(x) = \frac{1}{N}\frac{\frac{1}{2} \exp{-\lvert x\rvert}}                         {\frac{1}{\sqrt{8\pi}}{\exp{-x^2/8}}}       = \frac{p(x)}{\pi(x)}

is called the weight, p is the pdf from which we wish to sample and \pi is the pdf of the importance distribution.

Importance Sampling Approximation of the Posterior

Suppose that the posterior distribution of a model in which we are interested has a complicated functional form and that we therefore wish to approximate it in some way. First assume that we wish to calculate the expectation of some arbitrary function f of the parameters.

\displaystyle   {\mathbb{E}}(f({x}) \,\vert\, y_1, \ldots y_T) =  \int_\Omega f({x}) p({x} \, \vert \, y_1, \ldots y_T) \,\mathrm{d}{x}

Using Bayes

\displaystyle   \int_\Omega f({x}) {p\left(x \,\vert\, y_1, \ldots y_T\right)} \,\mathrm{d}{x} =  \frac{1}{Z}\int_\Omega f({x}) {p\left(y_1, \ldots y_T \,\vert\, x\right)}p(x) \,\mathrm{d}{x}

where Z is some normalizing constant.

As before we can re-write this using a proposal distribution \pi(x)

\displaystyle   \frac{1}{Z}\int_\Omega f({x}) {p\left(y_1, \ldots y_T \,\vert\, x\right)}p(x) \,\mathrm{d}{x} =  \frac{1}{Z}\int_\Omega \frac{f({x}) {p\left(y_1, \ldots y_T \,\vert\, x\right)}p(x)}{\pi(x)}\pi(x) \,\mathrm{d}{x}

We can now sample X^{(i)} \sim \pi({x}) repeatedly to obtain

\displaystyle   {\mathbb{E}}(f({x}) \,\vert\, y_1, \ldots y_T) \approx \frac{1}{ZN}\sum_1^N  f({X^{(i)}}) \frac{p(y_1, \ldots y_T \, \vert \, {X^{(i)}})p({X^{(i)}})}                              {\pi({X^{(i)}})} =  \sum_1^N w_if({X^{(i)}})

where the weights w_i are defined as before by

\displaystyle   w_i = \frac{1}{ZN} \frac{p(y_1, \ldots y_T \, \vert \, {X^{(i)}})p({X^{(i)}})}                          {\pi({X^{(i)}})}

We follow Alex Cook and use the example from (Rerks-Ngarm et al. 2009). We take the prior as \sim {\cal{Be}}(1,1) and use {\cal{U}}(0.0,1.0) as the proposal distribution. In this case the proposal and the prior are identical just expressed differently and therefore cancel.

Note that we use the log of the pdf in our calculations otherwise we suffer from (silent) underflow, e.g.,

ghci> pdf (Binomial nv (0.4 :: Double)) xv

On the other hand if we use the log pdf form

ghci> logPdf (Binomial nv (0.4 :: Double)) xv
> xv, nv :: Int
> xv = 51
> nv = 8197
> sampleUniform :: RVarT (W.Writer [Double]) ()
> sampleUniform = do
>   x <- rvarT StdUniform
>   lift $ W.tell [x]
>   return ()
> runSampler :: RVarT (W.Writer [Double]) () ->
>               Int -> Int -> [Double]
> runSampler sampler seed n =
>   snd $
>   W.runWriter $
>   evalStateT (sample (replicateM n sampler))
>              (pureMT (fromIntegral seed))
> sampleSize :: Int
> sampleSize = 1000
> pv :: [Double]
> pv = runSampler sampleUniform 2 sampleSize
> logWeightsRaw :: [Double]
> logWeightsRaw = map (\p -> logPdf (Beta 1.0 1.0) p +
>                            logPdf (Binomial nv p) xv -
>                            logPdf StdUniform p) pv
> logWeightsMax :: Double
> logWeightsMax = maximum logWeightsRaw
> weightsRaw :: [Double]
> weightsRaw = map (\w -> exp (w - logWeightsMax)) logWeightsRaw
> weightsSum :: Double
> weightsSum = sum weightsRaw
> weights :: [Double]
> weights = map (/ weightsSum) weightsRaw
> meanPv :: Double
> meanPv = sum $ zipWith (*) pv weights
> meanPv2 :: Double
> meanPv2 = sum $ zipWith (\p w -> p * p * w) pv weights
> varPv :: Double
> varPv = meanPv2 - meanPv * meanPv

We get the answer

ghci> meanPv

But if we look at the size of the weights and the effective sample size

ghci> length $ filter (>= 1e-6) weights

ghci> (sum weights)^2 / (sum $ map (^2) weights)

so we may not be getting a very good estimate. Let’s try

> sampleNormal :: RVarT (W.Writer [Double]) ()
> sampleNormal = do
>   x <- rvarT $ Normal meanPv (sqrt varPv)
>   lift $ W.tell [x]
>   return ()
> pvC :: [Double]
> pvC = runSampler sampleNormal 3 sampleSize
> logWeightsRawC :: [Double]
> logWeightsRawC = map (\p -> logPdf (Beta 1.0 1.0) p +
>                             logPdf (Binomial nv p) xv -
>                             logPdf (Normal meanPv (sqrt varPv)) p) pvC
> logWeightsMaxC :: Double
> logWeightsMaxC = maximum logWeightsRawC
> weightsRawC :: [Double]
> weightsRawC = map (\w -> exp (w - logWeightsMaxC)) logWeightsRawC
> weightsSumC :: Double
> weightsSumC = sum weightsRawC
> weightsC :: [Double]
> weightsC = map (/ weightsSumC) weightsRawC
> meanPvC :: Double
> meanPvC = sum $ zipWith (*) pvC weightsC
> meanPvC2 :: Double
> meanPvC2 = sum $ zipWith (\p w -> p * p * w) pvC weightsC
> varPvC :: Double
> varPvC = meanPvC2 - meanPvC * meanPvC

Now the weights and the effective size are more re-assuring

ghci> length $ filter (>= 1e-6) weightsC

ghci> (sum weightsC)^2 / (sum $ map (^2) weightsC)

And we can take more confidence in the estimate

ghci> meanPvC


Rerks-Ngarm, Supachai, Punnee Pitisuttithum, Sorachai Nitayaphan, Jaranit Kaewkungwal, Joseph Chiu, Robert Paris, Nakorn Premsri, et al. 2009. “Vaccination with ALVAC and AIDSVAX to Prevent HIV-1 Infection in Thailand.” New England Journal of Medicine 361 (23) (December 3): 2209–2220. doi:10.1056/nejmoa0908492.

by Dominic Steinitz at August 23, 2014 08:05 AM

August 22, 2014

Philip Wadler

Informatics Independence Referendum Debate

School of Informatics, University of Edinburgh
Independence Referendum Debate
4.00--5.30pm Monday 25 August
Appleton Tower Lecture Room 2

For the NAYs: Prof. Alan Bundy
For the AYEs: Prof. Philip Wadler
Moderator: Prof. Mike Fourman

All members of the School of Informatics
and the wider university community welcome

(This is a debate among colleagues and not a formal University event.
All views expressed are those of the individuals who express them,
and not the University of Edinburgh.)

by Philip Wadler ( at August 22, 2014 10:43 AM

Research funding in an independent Scotland

A useful summary, written by a colleague.
In Summary:
  •  the difference between the Scottish tax contribution and RCUK spending in Scotland is small compared to savings that will be made in other areas such as defence
  • Funding levels per institution are actually similar in Scotland to those in the rest of the UK, it’s just that there are more institutions here
  • Because of its relative importance any independent Scottish government would prioritise research
  • If rUK rejects a common research area it would lose the benefits of its previous investments, and the Scottish research capacity, which is supported by the Scottish government and the excellence of our universities
  • There are significant disadvantages with a No vote, resulting from UK immigration policy and the possibility of exiting the EU

by Philip Wadler ( at August 22, 2014 10:32 AM

August 21, 2014

Theory Lunch (Institute of Cybernetics, Tallinn)

Transgressing the limits

Today, the 14th of January 2014, we had a special session of our Theory Lunch. I spoke about ultrafilters and how they allow to generalize the notion of limit.

Consider the space \ell^\infty of bounded sequences of real numbers, together with the supremum norm. We would like to define a notion of limit which holds for every \{x_n\}_{n \geq 0} \in \ell^\infty and satisfies the well known properties of standard limit:

  1. Linearity: \lim_{n \to \infty} (\lambda x_n + \mu y_n) = \lambda \lim_{n \to \infty} x_n + \mu \lim_{n \to \infty} y_n.
  2. Homogeneity: \lim_{n \to \infty} (x_n \cdot y_n) = (\lim_{n \to \infty} x_n) \cdot (\lim_{n \to \infty} y_n).
  3. Monotonicity: if x_n \leq y_n for every n \geq 0 then \lim_{n \to \infty} x_n \leq \lim_{n \to \infty} y_n.
  4. Nontriviality: if x_n = 1 for every n \geq 0 then \lim_{n \to \infty} x_n = 1.
  5. Consistency: if the limit exists in the classical sense, then the two notions coincide.

The consistency condition is reasonable also because it avoids trivial cases: if we fix n_0 \in \mathbb{N} and we define the limit of the sequence x_n as the value x_{n_0}, then the first four properties are satisfied.

Let us recall the classical definition of limit: we say that x_n converges to x if and only if, for every \varepsilon > 0, the set of values n \in \mathbb{N} such that |x_n - x| < \varepsilon is cofinite, i.e., has a finite complement: the inequality |x_n - x| \geq \varepsilon can be satisfied at most for finitely many values of n. The family \mathcal{F} of cofinite subsets of \mathbb{N} (in fact, of any set X) has the following properties:

  1. Upper closure: if A \in \mathcal{F} and B \supseteq A then B \in \mathcal{F}.
  2. Meet stability: if A,B \in \mathcal{F} then A \cap B \in \mathcal{F}.

A family \mathcal{F} of subsets of X with the two properties above is called a filter on X. An immediate example is the trivial filter \mathcal{F} = \{X\}; another example is the improper filter \mathcal{F} = \mathcal{P}(X). The family \mathcal{F}(X) of cofinite subset of X is called the Fréchet filter on X. The Fréchet filter is not the improper one if and only if X is infinite.

An ultrafilter on X is a filter \mathcal{U} on X satisfying the following additional conditions:

  1. Properness: \emptyset \not \in \mathcal{U}.
  2. Maximality: for every A \subseteq X, either A \in \mathcal{U} or  X \setminus A \in \mathcal{U}.

For example, if x \in X, then (x) = \{ A \subseteq X \mid x \in A \} is an ultrafilter on X, called the principal ultrafilter generated by x. Observe that \bigcap_{A \in (x)} A = \{x\}: if \bigcap_{A \in \mathcal{U}} A = \emptyset we say that \mathcal{U} is free. These are, in fact, the only two options.

Lemma 1. For a proper filter \mathcal{F} to be an ultrafilter, it is necessary and sufficient that it satisfies the following condition: for every n \geq 2 and nonempty A_1, \ldots, A_n \subseteq X, if \bigcup_{i=1}^n A_i \in \mathcal{F} then A_i \in \mathcal{F} for at least one i \in \{1, \ldots, n\}.

Proof: It is sufficient to prove the thesis with n=2. If A \cup B \in \mathcal{F} with A,B \not \in \mathcal{F}, then \mathcal{F}' = \{ B' \subseteq X \mid B' \neq \emptyset, A \cup B' \in \mathcal{F} \} is a proper filter that properly contains \mathcal{F}. If the condition is satisfied, for every A \subseteq X which is neither \emptyset nor X we have A \cup (X \setminus A) = X \in \mathcal{F}, thus either A \in \mathcal{F} or X \setminus A \in \mathcal{F}. \Box

Theorem 1. Every nonprincipal ultrafilter is free. In addition, an ultrafilter is free if and only if it extends the Fréchet filter. In particular, every ultrafilter over a finite set is principal.

Proof: Let \mathcal{U} be a nonprincipal ultrafilter. Let x \in X: then \mathcal{U} \neq (x), so either there exists B \subseteq X such that x \not \in B and B \in \mathcal{U}, or there exists B' \subseteq X such that x \in B' and B' \not \in \mathcal{U}. In the first case, x \not \in \bigcap_{A \in \mathcal{U}} A; in the second case, we consider B = X \setminus B' and reduce to the first case. As x is arbitrary, \mathcal{U} is free.

Now, for every x \in X the set X \setminus \{x\} belongs to \mathcal{F}(X) but not to (x): therefore, no principal ultrafilter extends the Fréchet filter. On the other hand, if \mathcal{U} is an ultrafilter, A \subseteq X is finite, and X \setminus A \not \in \mathcal{U}, then A \in \mathcal{U} by maximality, hence \{x\} \in \mathcal{U} for some x \in A because of Lemma 1, thus \mathcal{U} = (x) cannot be a free ultrafilter. \Box

So it seems that free ultrafilters are the right thing to consider when trying to expand the concept of limit. There is an issue, though: we have not seen any single example of a free ultrafilter; in fact, we do not even (yet) know whether free ultrafilters do exist! The answer to this problem comes, in a shamelessly nonconstructive way, from the following

Ultrafilter lemma. Every proper filter can be extended to an ultrafilter.

The ultrafilter lemma, together with Theorem 1, implies the existence of free ultrafilters on every infinite set, and in particular on \mathbb{N}. On the other hand, to prove the ultrafilter lemma the Axiom of Choice is required, in the form of Zorn’s lemma. Before giving such proof, we recall that a family of sets has the finite intersection property if every finite subfamily has a nonempty intersection: every proper filter has the finite intersection property.

Proof of the ultrafilter lemma. Let \mathcal{F} be a proper filter on X and let \mathcal{M} be the family of the collections of subsets of X that extend \mathcal{F} and have the finite intersection property, ordered by inclusion. Let \{U_i\}_{i \in I} be a totally ordered subfamily of \mathcal{M}: then U = \bigcup_{i \in I} U_i extends \mathcal{F} and has the finite intersection property, because for every finitely many A_1, \ldots, A_n \in U there exists by construction i \in I such that A_1, \ldots, A_n \in U_i.

By Zorn’s lemma, \mathcal{M} has a maximal element \mathcal{U}, which surely satisfies \emptyset \not \in \mathcal{U} and \mathcal{F} \subseteq \mathcal{U}. If A \in \mathcal{U} and B \supseteq A, then \mathcal{U} \cup \{B\} still has the finite intersection property, therefore B \in \mathcal{U} by maximality. If A,B \in \mathcal{U} then \mathcal{U} \cup \{A \cap B\} still has the finite intersection property, therefore again A \cap B \in \mathcal{U} by maximality.

Suppose, for the sake of contradiction, that there exists A \subseteq X such that A \not \in \mathcal{U} and X \setminus A \not \in \mathcal{U}: then neither \mathcal{U} \cup \{A\} nor \mathcal{U} \cup \{X \setminus A\} have the finite intersection property, hence there exist A_1, \ldots, A_m, B_1, \ldots, B_n \in \mathcal{U} such that A_1 \cap \ldots \cap A_m \cap A = B_1 \cap \ldots \cap B_n \cap (X \setminus A) = \emptyset. But A_1 \cap \ldots \cap A_m \cap A = \emptyset means A_1 \cap \ldots \cap A_m \subseteq X \setminus A, and B_1 \cap \ldots \cap B_n \cap (X \setminus A) = \emptyset means B_1 \cap \ldots \cap B_n \subseteq A: therefore,

A_1 \cap \ldots \cap A_m \cap B_1 \cap \ldots \cap B_n \subseteq (X \setminus A) \cap A = \emptyset,

against \mathcal{U} having the finite intersection property. \Box

We are now ready to expand the idea of limit. Let (X,d) be a metric space and let \mathcal{U} be an ultrafilter on X: we say that x \in X is the ultralimit of the sequence \{x_n\}_{n \geq 0} \subseteq X along \mathcal{U} if for every \varepsilon > 0 the set

\{ n \geq 0 \mid d(x_n, x) < \varepsilon \}

belongs to \mathcal{U}. (Observe how, in the standard definition of limit, the above set is required to belong to the Fréchet filter.) If this is the case, we write

\lim_{n \to \mathcal{U}} x_n = x

Ultralimits, if they exist, are unique and satisfy our first four conditions. Moreover, the choice of a principal ultrafilter \mathcal{U} = (n_0) corresponds to the trivial definition \lim_{n \to \mathcal{U}} x_n = x_{n_0}. So, what about free ultrafilters?

Theorem 2. Every bounded sequence of real numbers has an ultralimit along every free ultrafilter on \mathbb{N}.

Proof: It is not restrictive to suppose x_n \in [0,1] for every n \geq 0. Let \mathcal{U} be an arbitrary, but fixed, free ultrafilter on \mathbb{N}. We will construct a sequence of closed intervals A_k, k \geq 0, such that A_{k+1} \subseteq A_k and \mathrm{diam} \, A_k = 2^{-k} for every k \geq 0. By the Cantor intersection theorem it will be \bigcap_{k \geq 0} A_k = \{x\}: we will then show that \lim_{n \to \mathcal{U}} x_n = x.

Let A_0 = [0,1]. Let A_1 be either [0,1/2] or [1/2,1], chosen according to the following criterion: \{n \geq 0 \mid x_n \in A_1\} \in \mathcal{U}. If both halves satisfy the criterion, then we just choose one once and for all. We iterate the procedure by always choosing A_{k+1} as one of the two halves of A_k such that \{n \geq 0 \mid x_n \in A_{k+1}\} \in \mathcal{U}.

Let \bigcap_{k \geq 0} A_k = \{x\}. Let \varepsilon > 0, and let k be so large that 2^{-k} < \varepsilon: then A_k \subseteq (x-\varepsilon, x+\varepsilon), thus \{n \geq 0 \mid x_n \in A_k\} \subseteq \{n \geq 0 \mid |x_n-x| < \varepsilon\}. As the smaller set belongs to \mathcal{U}, so does the larger one. \Box

We have thus almost achieved our original target: a notion of limit which applies to every bounded sequence of real numbers. Such notion will depend on the specific free ultrafilter we choose: but it is already very reassuring that such a notion exists at all! To complete our job we need one more check: we have to be sure that the definition is consistent with the classical one. And this is indeed what happens!

Theorem 3. Let \{x_n\}_{n \geq 0} be a sequence of real numbers and let x \in \mathbb{R}. Then \lim_{n \to \infty} x_n = x in the classical sense if and only if \lim_{n \to \mathcal{U}} x_n = x for every free ultrafilter \mathcal{U} on \mathbb{N}.

To prove Theorem 3 we make use of an auxiliary result, which is of interest by itself.

Lemma 2. Let \mathcal{M}(X) be the family of collections of subsets of X that have the finite intersection property. The maximal elements of \mathcal{M} are precisely the ultrafilters.

Proof: Every ultrafilter is clearly maximal in \mathcal{M}. If \mathcal{U} is maximal in \mathcal{M}, then it is clearly proper and upper closed, and we can reason as in the proof of the ultrafilter lemma to show that it is actually an ultrafilter. \Box

Proof of Theorem 3: Suppose x_n does not converge to x in the classical sense. Fix \varepsilon_0 > 0 such that the set S = \{n \geq 0 \mid |x_n-x| \geq \varepsilon_0\} is infinite. Then the family \mathcal{V} = \{S \setminus \{n\} \mid n \geq 0\} has the finite intersection property: an ultrafilter \mathcal{U} that extends \mathcal{V} must be free. Then S \in \mathcal{U}, and x_n does not have an ultralimit along \mathcal{U}.

The converse implication follows from the classical definition of limit, together with the very notion of free ultrafilter. \Box

Theorem 3 does hold for sequences of real numbers, but does not extend to arbitrary metric spaces. In fact, the following holds, which we state without proving.

Theorem 4. Let X be a metric space. The following are equivalent.

  1. For some free ultrafilter \mathcal{U} on \mathbb{N}, every sequence in X has an ultralimit along \mathcal{U}.
  2. For every free ultrafilter \mathcal{U} on \mathbb{N}, every sequence in X has an ultralimit along \mathcal{U}.
  3. X is compact.

Ultrafilters are useful in many other contexts. For instance, they are used to construct hyperreal numbers, which in turn allow a rigorous definition of infinitesimals and the foundation of calculus over those. But this might be the topic for another Theory Lunch talk.

by Silvio Capobianco at August 21, 2014 03:15 PM

Edward Z. Yang

The fundamental problem of programming language package management

Why are there so many goddamn package managers? They sprawl across both operating systems (apt, yum, pacman, Homebrew) as well as for programming languages (Bundler, Cabal, Composer, CPAN, CRAN, CTAN, EasyInstall, Go Get, Maven, npm, NuGet, OPAM, PEAR, pip, RubyGems, etc etc etc). "It is a truth universally acknowledged that a programming language must be in want of a package manager." What is the fatal attraction of package management that makes programming language after programming language jump off this cliff? Why can't we just, you know, reuse an existing package manager?

You can probably think of a few reasons why trying to use apt to manage your Ruby gems would end in tears. "System and language package managers are completely different! Distributions are vetted, but that's completely unreasonable for most libraries tossed up on GitHub. Distributions move too slowly. Every programming language is different. The different communities don't talk to each other. Distributions install packages globally. I want control over what libraries are used." These reasons are all right, but they are missing the essence of the problem.

The fundamental problem is that programming languages package management is decentralized.

This decentralization starts with the central premise of a package manager: that is, to install software and libraries that would otherwise not be locally available. Even with an idealized, centralized distribution curating the packages, there are still two parties involved: the distribution and the programmer who is building applications locally on top of these libraries. In real life, however, the library ecosystem is further fragmented, composed of packages provided by a huge variety of developers. Sure, the packages may all be uploaded and indexed in one place, but that doesn't mean that any given author knows about any other given package. And then there's what the Perl world calls DarkPAN: the uncountable lines of code which probably exist, but which we have no insight into because they are locked away on proprietary servers and source code repositories. Decentralization can only be avoided when you control absolutely all of the lines of code in your application.. but in that case, you hardly need a package manager, do you? (By the way, my industry friends tell me this is basically mandatory for software projects beyond a certain size, like the Windows operating system or the Google Chrome browser.)

Decentralized systems are hard. Really, really hard. Unless you design your package manager accordingly, your developers will fall into dependency hell. Nor is there a one "right" way to solve this problem: I can identify at least three distinct approaches to the problem among the emerging generation of package managers, each of which has their benefits and downsides.

Pinned versions. Perhaps the most popular school of thought is that developers should aggressively pin package versions; this approach advocated by Ruby's Bundler, PHP's Composer, Python's virtualenv and pip, and generally any package manager which describes itself as inspired by the Ruby/node.js communities (e.g. Java's Gradle, Rust's Cargo). Reproduceability of builds is king: these package managers solve the decentralization problem by simply pretending the ecosystem doesn't exist once you have pinned the versions. The primary benefit of this approach is that you are always in control of the code you are running. Of course, the downside of this approach is that you are always in control of the code you are running. An all-to-common occurrence is for dependencies to be pinned, and then forgotten about, even if there are important security updates to the libraries involved. Keeping bundled dependencies up-to-date requires developer cycles--cycles that more often than not are spent on other things (like new features).

A stable distribution. If bundling requires every individual application developer to spend effort keeping dependencies up-to-date and testing if they keep working with their application, we might wonder if there is a way to centralize this effort. This leads to the second school of thought: to centralize the package repository, creating a blessed distribution of packages which are known to play well together, and which will receive bug fixes and security fixes while maintaining backwards compatibility. In programming languages, this is much less common: the two I am aware of are Anaconda for Python and Stackage for Haskell. But if we look closely, this model is exactly the same as the model of most operating system distributions. As a system administrator, I often recommend my users use libraries that are provided by the operating system as much as possible. They won't take backwards incompatible changes until we do a release upgrade, and at the same time you'll still get bugfixes and security updates for your code. (You won't get the new hotness, but that's essentially contradictory with stability!)

Embracing decentralization. Up until now, both of these approaches have thrown out decentralization, requiring a central authority, either the application developer or the distribution manager, for updates. Is this throwing out the baby with the bathwater? The primary downside of centralization is the huge amount of work it takes to maintain a stable distribution or keep an individual application up-to-date. Furthermore, one might not expect the entirety of the universe to be compatible with one another, but this doesn't stop subsets of packages from being useful together. An ideal decentralized ecosystem distributes the problem of identifying what subsets of packages work across everyone participating in the system. Which brings us to the fundamental, unanswered question of programming languages package management:

How can we create a decentralized package ecosystem that works?

Here are a few things that can help:

  1. Stronger encapsulation for dependencies. One of the reasons why dependency hell is so insidious is the dependency of a package is often an inextricable part of its outwards facing API: thus, the choice of a dependency is not a local choice, but rather a global choice which affects the entire application. Of course, if a library uses some library internally, but this choice is entirely an implementation detail, this shouldn't result in any sort of global constraint. Node.js's NPM takes this choice to its logical extreme: by default, it doesn't deduplicate dependencies at all, giving each library its own copy of each of its dependencies. While I'm a little dubious about duplicating everything (it certainly occurs in the Java/Maven ecosystem), I certainly agree that keeping dependency constraints local improves composability.
  2. Advancing semantic versioning. In a decentralized system, it's especially important that library writers give accurate information, so that tools and users can make informed decisions. Wishful, invented version ranges and artistic version number bumps simply exacerbate an already hard problem (as I mentioned in my previous post). If you can enforce semantic versioning, or better yet, ditch semantic versions and record the true, type-level dependency on interfaces, our tools can make better choices. The gold standard of information in a decentralized system is, "Is package A compatible with package B", and this information is often difficult (or impossible, for dynamically typed systems) to calculate.
  3. Centralization as a special-case. The point of a decentralized system is that every participant can make policy choices which are appropriate for them. This includes maintaining their own central authority, or deferring to someone else's central authority: centralization is a special-case. If we suspect users are going to attempt to create their own, operating system style stable distributions, we need to give them the tools to do so... and make them easy to use!

For a long time, the source control management ecosystem was completely focused on centralized systems. Distributed version control systems such as Git fundamentally changed the landscape: although Git may be more difficult to use than Subversion for a non-technical user, the benefits of decentralization are diverse. The Git of package management doesn't exist yet: if someone tells you that package management is solved, just reimplement Bundler, I entreat you: think about decentralization as well!

by Edward Z. Yang at August 21, 2014 01:02 PM

Lee Pike

SmartChecking Matt Might’s Red-Black Trees

Matt Might gave a nice intro to QuickCheck via testing red-black trees recently. Of course, QuickCheck has been around for over a decade now, but it’s still useful (if underused–why aren’t you QuickChecking your programs!?).

In a couple of weeks, I’m presenting a paper on an alternative to QuickCheck called SmartCheck at the Haskell Symposium.

SmartCheck focuses on efficiently shrinking and generalizing large counterexamples. I thought it’d be fun to try some of Matt’s examples with SmartCheck.

The kinds of properties Matt Checked really aren’t in the sweet spot of SmartCheck, since the counterexamples are so small (Matt didn’t even have to define instances for shrink!). SmartCheck focuses on shrinking and generalizing large counterexamples.

Still, let’s see what it looks like. (The code can be found here.)

SmartCheck is only interesting for failed properties, so let’s look at an early example in Matt’s blog post where something goes wrong. A lot of the blog post focuses on generating sufficiently constrained arbitrary red-black trees. In the section entitled, “A property for balanced black depth”, a property is given to check that the path from the root of a tree to every leaf passes through the same number of black nodes. An early generator for trees fails to satisfy the property.

To get the code to work with SmartCheck, we derive Typeable and Generic instances for the datatypes, and use GHC Generics to automatically derive instances for SmartCheck’s typeclass. The only other main issue is that SmartCheck doesn’t support a `forall` function like in QuickCheck. So instead of a call to QuickCheck such as

> quickCheck (forAll nrrTree prop_BlackBalanced)

We change the arbitrary instance to be the nrrTree generator.

Because it is so easy to find a small counterexample, SmartCheck’s reduction algorithm does a little bit of automatic shrinking, but not too much. For example, a typical minimal counterexample returned by SmartCheck looks like

T R E 2 (T B E 5 E)

which is about as small as possible. Now onto generalization!

There are three generalization phases in SmartCheck, but we’ll look at just one, in which a formula is returned that is universally quantified if every test case fails. For the test case above, SmartCheck returns the following formula:

forall values x0 x1:
T R E 2 (T B x1 5 x0)

Intuitively, this means that for any well-typed trees chosen that could replace the variables x0 and x1, the resulting formula is still a counterexample.

The benefit to developers is seeing instantly that those subterms in the counterexample probably don’t matter. The real issue is that E on the left is unbalanced with (T B E 5 E) on the right.

One of the early design decisions in SmartCheck was focus on structurally shrinking data types and essentially ignore “base types” like Int, char, etc. The motivation was to improve efficiency on shrinking large counterexamples.

But for a case like this, generalizing base types would be interesting. We’d hypothetically get something like

forall values (x0, x1 :: RBSet Int) (x2, x3 :: Int):
T R E x2 (T B x1 x3 x0)

further generalizing the counterexample. It may be worth adding this behavior to SmartCheck.

SmartCheck’s generalization begins to bridge the gap from specific counterexamples to formulas characterizing counterexamples. The idea is related to QuickSpec, another cool tool developed by Claessen and Hughes (and SmallBone). Moreover, it’s a bridge between testing and verification, or as Matt puts it, from the 80% to the 20%.

by Lee Pike at August 21, 2014 04:53 AM

FP Complete

IAP: Speeding up conduit

This post contains fragments of active Haskell code, best viewed and executed at

As most of us know, performance isn't a one-dimensional spectrum. There are in fact multiple different ways to judge performance of a program. A commonly recognized tradeoff is that between CPU and memory usage. Often times, a program can be sped up by caching more data, for example.

conduit is a streaming data library. In that sense, it has two very specific performance criterion it aims for:

  • Constant memory usage.
  • Efficient usage of scarce resources, such as closing file descriptors as early as possible.

While CPU performance is always a nice goal, it has never been my top priority in the library's design, especially given that in the main use case for conduit (streaming data in an I/O context), the I/O cost almost always far outweighs any CPU overhead from conduit.

However, for our upcoming Integrated Analysis Platform (IAP) release, this is no longer the case. conduit will be used in tight loops, where we do need to optimize for the lowest CPU overhead possible.

This blog post covers the first set of optimizations I've applied to conduit. There is still more work to be done, and throughout this blogpost I'll be describing some of the upcoming changes I am attempting.

I'll give a brief summary up front:

  • Applying the codensity transform results in much better complexity of monadic bind.
  • We're also less reliant on rewrite rules firing, which has always been unreliable (and now I know why).
  • This change does represent a breaking API change. However, it only affects users of the Data.Conduit.Internal module. If you've just been using the public API, your code will be unaffected, besides getting an automatic speedup.
  • These changes will soon be released as conduit 1.2.0, after a period for community feedback.

Note that this blog post follows the actual steps I went through (more or less) in identifying the performance issues I wanted to solve. If you want to skip ahead to the solution itself, you may want to skip to the discussion on difference lists, or even straight to continuation passing style, church-encoding, codensity.

By the way, after I originally wrote this blog post, I continued working on the optimizations I describe as possible future enhancements. Those are actually working out far better than I expected, and it looks like conduit 1.2.0 will be able to ship with them. I'll be writing a separate blog post detailing those changes. A bit of a teaser is: for vector-equivalent code, conduit now generates identical core as vector itself.

The benchmarks

Before embarking on any kind of serious optimizations, it's important to have some benchmarks. I defined three benchmarks for the work I was going to be doing:

  • A simple sum: adding up the numbers from 1 to 10000. This is to get a baseline of the overhead coming from conduit.

  • A monte carlo analysis: This was based on a previous IAP blog post. I noticed when working on that benchmark that, while the conduit solution was highly memory efficient, there was still room to speed up the benchmark.

  • Sliding vectors: Naren Sundar recently sent a sliding windows pull requests, which allow us to get a view of a fixed size of a stream of values. This feature is very useful for a number of financial analyses, especially regarding time series.

    Naren's pull request was based on immutable data structures, and for those cases it is highly efficient. However, it's possible to be far more memory efficient by writing to a mutable vector instead, and then taking immutable slices of that vector. Mihaly Barasz sent a pull request for this feature, and much to our disappointment, for small window sizes, it performed worse than sliding windows. We want to understand why.

You can see the benchmark code, which stays mostly unchanged for the rest of this blog post (a few new cases are added to demonstrate extra points). The benchmarks always contain a low-level base case representing the optimal performance we can expect from hand-written Haskell (without resorting to any kind of FFI tricks or the like).

You can see the first run results which reflect conduit 1.1.7, plus inlining of a few functions. Some initial analysis:

  • Control.Monad.foldM is surpringly slow.
  • Data.Conduit.List.foldM has a rather steep performance hit versus Data.Conduit.List.fold.
  • There's a very high overhead in the monte carlo analysis.
  • For sliding vector, the conduit overhead is more pronounced at smaller window sizes.
  • But even with large window sizes, mutable vector conduits still have a large overhead. The sliding window/immutable approach, however, shows almost no overhead.

That hopefully sets the scene enough for us to begin to dive in.

Rewrite rules: lift

GHC offers a very powerful optimization technique: rewrite rules. This allows you to tell the compiler that a certain expression can be rewritten to a more efficient one. A common example of a rewrite rule would be to state that map f . map g is the same as map (f . g). This can be expressed as:

{-# RULES "map f . map g" forall f g. map f . map g = map (f . g) #-}

Note that GHC's list rewrite rules are actually more complicated than this, and revolve around a concept called build/foldr fusion.

Let's look at the implementation of the yield function in conduit (with some newtypes stripped away):

yield :: Monad m => o -> ConduitM i o m ()
yield o = HaveOutput (Done ()) (return ()) o
{-# INLINE [1] yield #-}
    "yield o >> p" forall o (p :: ConduitM i o m r).
    yield o >> p = HaveOutput p (return ()) o

The core datatype of conduit is recursive. The HaveOutput constructor contains a field for "what to do next." In the case of yield, there isn't anything to do next, so we fill that with Done (). However, creating that Done () value just to throw it away after a monadic bind is wasteful. So we have a rewrite rule to fuse those two steps together.

But no such rewrite rule exists for lift! My first step was to add such a rule, and check the results. Unfortunately, the rule didn't have any real impact, because it wasn't firing. Let's put that issue to the side; we'll come back to it later.

Cleanup, inlining

One of the nice features introduced in (I believe) GHC 7.8 is that the compiler will now warn you when a rewrite rule may not fire. When compiling conduit, I saw messages like:

Data/Conduit/List.hs:274:11: Warning:
    Rule "source/map fusion $=" may never fire
      because ‘$=’ might inline first
    Probable fix: add an INLINE[n] or NOINLINE[n] pragma on ‘$=’

Data/Conduit/List.hs:275:11: Warning:
    Rule "source/map fusion =$=" may never fire
      because ‘=$=’ might inline first
    Probable fix: add an INLINE[n] or NOINLINE[n] pragma on ‘=$=’

Data/Conduit/List.hs:542:11: Warning:
    Rule "source/filter fusion $=" may never fire
      because ‘$=’ might inline first
    Probable fix: add an INLINE[n] or NOINLINE[n] pragma on ‘$=’

Data/Conduit/List.hs:543:11: Warning:
    Rule "source/filter fusion =$=" may never fire
      because ‘=$=’ might inline first
    Probable fix: add an INLINE[n] or NOINLINE[n] pragma on ‘=$=’

Data/Conduit/List.hs:552:11: Warning:
    Rule "connect to sinkNull" may never fire
      because ‘$$’ might inline first
    Probable fix: add an INLINE[n] or NOINLINE[n] pragma on ‘$$’

This demonstrates an important interaction between inlining and rewrite rules. We need to make sure that expressions that need to be rewritten are not inlined first. If they are first inlined, then GHC won't be able to rewrite them to our more optimized version.

A common approach to this is to delay inlining of functions until a later simplification phase. The GHC simplification process runs in multiple steps, and we can state that rules and inlining should only happen before or after a certain phase. The phases count down from 2 to 0, so we commonly want to delay inlining of functions until phase 0, if they may be subject to rewriting.

Conversely, some functions need to be inlined before a rewrite rule can fire. In stream fusion, for example, the fusion framework depends on the following sequencing to get good performance:

map f . map g
-- inline map
unstream . mapS f . stream . unstream . mapS g . stream
-- rewrite stream . unstream
unstream . mapS f . mapS g . stream
-- rewrite mapS . mapS
unstream . mapS (f . g) . stream

In conduit, we need to make sure that all of this is happening in the correct order. There was one particular complexity that made it difficult to ensure this happened. conduit in fact has two core datatypes: Pipe and ConduitM, with the latter being a more friendly newtype wrapper around the first. Up until this point, the code for the two was jumbled into a single internal module, making it difficult to track which things were being written in which version of the API.

My next step was to split things into .Pipe and .Conduit internal modules, and then clean up GHC's warnings to get rules to fire more reliably. This gave a modest performance boost to the sliding vector benchmarks, but not much else. But it does pave the way for future improvements.

Getting serious about sum, by cheating

The results so far have been uninspiring. We've identified a core problem (too many of those Done data constructors being used), and noticed that the rewrite rules that should fix that don't seem to be doing their job. Now let's take our first stab at really improving performance: with aggressive rewrite rules.

Our sum benchmark is really simple: use enumFromTo to create a stream of values, and fold (or foldM) to consume that. The thing that slows us down is that, in between these two simple functions, we end up allocating a bunch of temporary data structures. Let's get rid of them with rewrite rules!

This certainly did the trick. The conduit implementation jumped from 185us to just 8.63us. For comparison, the low level approach (or vector's stream fusion) clocks in at 5.77us, whereas foldl' on a list is 80.6us. This is a huge win!

But it's also misleading. All we've done here is sneakily rewritten our conduit algorithm into a low-level format. This solves the specific problem on the table (connecting enumFromTo with fold), but won't fully generalize to other cases. A more representative demonstration of this improvement is the speedup for foldM, which went from 1180us to 81us. The reason this is more realistic is that the rewrite rule is not specialized to enumFromTo, but rather works on any Source.

I took a big detour at this point, and ended up writing an initial implementation of stream fusion in conduit. Unfortunately, I ran into a dead end on that branch, and had to put that work to the side temporarily. However, the improvements discussed in the rest of this blog post will hopefully reopen the door to stream fusion, which I hope to investigate next.

Monte carlo, and associativity

Now that I'd made the results of the sum benchmark thoroughly useless, I decided to focus on the results of monte carlo, where the low level implementation still won by a considerable margin (3.42ms vs 10.6ms). The question was: why was this happening? To understand, let's start by looking at the code:

analysis = do
    successes <- sourceRandomN count
              $$ CL.fold (\t (x, y) ->
                            if (x*x + y*(y :: Double) < 1)
                                then t + 1
                                else t)
                    (0 :: Int)
    return $ fromIntegral successes / fromIntegral count * 4

sourceRandomN :: (MWC.Variate a, MonadIO m) => Int -> Source m a
sourceRandomN cnt0 = do
    gen <- liftIO MWC.createSystemRandom
    let loop 0 = return ()
        loop cnt = do
            liftIO (MWC.uniform gen) >>= yield >> loop (cnt - 1)
    loop cnt0

The analysis function is not very interesting: it simply connects sourceRandomN with a fold. Given that we now have a well behaved and consistently-firing rewrite rule for connecting to folds, it's safe to say that was not the source of our slowdown. So our slowdown must be coming from:

liftIO (MWC.uniform gen) >>= yield >> loop (cnt - 1)

This should in theory generate really efficient code. yield >> loop (cnt - 1) should be rewritten to \x -> HaveOutput (loop (cnt - 1)) (return ()) x), and then liftIO should get rewritten to generate:

PipeM $ do
    x <- MWC.uniform gen
    return $ HaveOutput (loop $ cnt - 1) (return ()) x

I added another commit to include a few more versions of the monte carlo benchmark (results here). The two most interesting are:

  • Explicit usage of the Pipe constructors:

    sourceRandomNConstr :: (MWC.Variate a, MonadIO m) => Int -> Source m a
    sourceRandomNConstr cnt0 = ConduitM $ PipeM $ do
        gen <- liftIO MWC.createSystemRandom
        let loop 0 = return $ Done ()
            loop cnt = do
                x <- liftIO (MWC.uniform gen)
                return $ HaveOutput (PipeM $ loop (cnt - 1)) (return ()) x
        loop cnt0

    This version ran in 4.84ms, vs the original conduit version which ran in 15.8ms. So this is definitely the problem!

  • Explicitly force right-associated binding order:

    sourceRandomNBind :: (MWC.Variate a, MonadIO m) => Int -> Source m a
    sourceRandomNBind cnt0 = lift (liftIO MWC.createSystemRandom) >>= \gen ->
        let loop 0 = return ()
            loop cnt = do
                lift (liftIO $ MWC.uniform gen) >>= (\o -> yield o >> loop (cnt - 1))
         in loop cnt0

    Or to zoom in on the important bit:

    lift (liftIO $ MWC.uniform gen) >>= (\o -> yield o >> loop (cnt - 1))

    By the monad laws, this code is identical to the original. However, instead of standard left-associativity, we have right associativity or monadic bind. This code ran in 5.19ms, an approximate threefold speedup vs the left associative code!

This issue of associativity was something Roman Cheplyaka told me about back in April, so I wasn't surprised to see it here. Back then, I'd looked into using Codensity together with ConduitM, but didn't get immediate results, and therefore postponed further research until I had more time.

OK, so why exactly does left-associativity hurt us so much? There are two reasons actually:

  • Generally speaking, many monads perform better when they are right associated. This is especially true for free monads, of which conduit is just a special case. Janis Voigtl ̈ander's paper Asymptotic Improvement of Computations over Free Monads and Edward Kmett's blog post series free monads for less do a far better job of explaining the issue than I could.
  • In the case of conduit, left associativity prevented the lift and yield rewrite rules from firing, which introduced extra, unnecessary monadic bind operations. Forcing right associativity allows these rules to fire, avoiding a lot of unnecessary data constructor allocation and analysis.

At this point, it became obvious at this point that the main slowdown I was seeing was driven by this problem. The question is: how should we solve it?

Difference lists

To pave the way for the next step, I want to take a quick detour and talk about something simpler: difference lists. Consider the following code:

(((w ++ x) ++ y) ++ z)

Most experienced Haskellers will cringe upon reading that. The append operation for a list needs to traverse every cons cell in its left value. When we left-associate append operations like this, we will need to traverse every cell in w, then every cell in w ++ x, then every cell in w ++ x ++ y. This is highly inefficient, and would clearly be better done in a right-associated style (sound familiar?).

But forcing programmers to ensure that their code is always right-associated isn't always practical. So instead, we have two common alternatives. The first is: use a better datastructure. In particular, Data.Sequence has far cheaper append operations than lists.

The other approach is to use difference lists. Difference lists are functions instead of actual list values. They are instructions for adding values to the beginning of the list. In order to append, you use normal function composition. And to convert them to a list, you apply the resulting function to an empty list. As an example:

type DList a = [a] -> [a]

dlist1 :: DList Int
dlist1 rest = 1 : 2 : rest

dlist2 :: DList Int
dlist2 rest = 3 : 4 : rest

final :: [Int]
final = dlist1 . dlist2 $ []

main :: IO ()
main = print final

Both difference lists and sequences have advantages. Probably the simplest summary is:

  • Difference lists have smaller constant factors for appending.
  • Sequences allow you to analyze them directly, without having to convert them to a different data type first.

That second point is important. If you need to regularly analyze your list and then continue to append, the performance of a difference list will be abysmal. You will constantly be swapping representations, and converting from a list to a difference list is an O(n) operation. But if you will simply be constructing a list once without any analysis, odds are difference lists will be faster.

This situation is almost identical to our problems with conduit. Our monadic composition operator- like list's append operator- needs to traverse the entire left hand side. This connection is more clearly spelled out in Reflection without Remorse by Atze van der Ploeg and Oleg Kiselyov (and for me, care of Roman).

Alright, with that out of the way, let's finally fix conduit!

Continuation passing style, church-encoding, codensity

There are essentially two things we need to do with conduits:

  • Monadically compose them to sequence two streams into a larger stream.
  • Categorically compose them to connect one stream to the next in a pipeline.

The latter requires that we be able to case analyze our datatypes, while theoretically the former does not: something like difference lists for simple appending would be ideal. In the past, I've tried out a number of different alternative implementations of conduit, none of which worked well enough. The problem I always ran into was that either monadic bind became too expensive, or categorical composition became too expensive.

Roman, Mihaly, Edward and I discussed these issues a bit on Github, and based on Roman's advice, I went ahead with writing a benchmark of different conduit implementations. I currently have four implementations in this benchmark (and hope to add more):

  • Standard, which looks very much like conduit 1.1, just a bit simplified (no rewrite rules, no finalizers, no leftovers).
  • Free, which is conduit rewritten to explicitly use the free monad transformer.
  • Church, which modifies Free to instead use the Church-encoded free monad transformer.
  • Codensity, which is a Codensity-transform-inspired version of conduit.

You can see the benchmark results, which clearly show the codensity version to be the winner. Though it would be interesting, I think I'll avoid going into depth on the other three implementations for now (this blog post is long enough already).

What is Codensity?

Implementing Codensity in conduit just means changing the ConduitM newtype wrapper to look like this:

newtype ConduitM i o m r = ConduitM
    { unConduitM :: forall b.
                    (r -> Pipe i i o () m b) -> Pipe i i o () m b

What this says is "I'm going to provide an r value. If you give me a function that needs an r value, I'll give it that r value and then continue with the resulting Pipe." Notice how similar this looks to the type signature of monadic bind itself:

(>>=) :: Pipe i i o () m r
      -> (r -> Pipe i i o () m b)
      -> Pipe i i o () m b

This isn't by chance, it's by construction. More information is available in the Haddocks of kan-extension, or in the above-linked paper and blog posts by Janis and Edward. To see why this change is important, let's look at the new implementations of some of the core conduit functions and type classes:

yield o = ConduitM $ \rest -> HaveOutput (rest ()) (return ()) o

await = ConduitM $ \f -> NeedInput (f . Just) (const $ f Nothing)

instance Monad (ConduitM i o m) where
    return x = ConduitM ($ x)
    ConduitM f >>= g = ConduitM $ \h -> f $ \a -> unConduitM (g a) h

instance MonadTrans (ConduitM i o) where
    lift mr = ConduitM $ \rest -> PipeM (liftM rest mr)

Instead of having explicit Done constructors in yield, await, and lift, we use the continuation rest. This is the exact same transformation we were previously relying on rewrite rules to provide. However, our rewrite rules couldn't fire properly in a left-associated monadic binding. Now we've avoided the whole problem!

Our Monad instance also became much smaller. Notice that in order to monadically compose, there is no longer any need to case-analyze the left hand side, which avoids the high penalty of left association.

Another interesting quirk is that our Monad instance on ConduitM no longer requires that the base m type constructor itself be a Monad. This is nice feature of Codensity.

So that's half the story. What about categorical composition? That certainly does require analyzing both the left and right hand structures. So don't we lose all of our speed gains of Codensity with this? Actually, I think not. Let's look at the code for categorical composition:

ConduitM left0 =$= ConduitM right0 = ConduitM $ \rest ->
    let goRight final left right =
            case right of
                HaveOutput p c o  -> HaveOutput (recurse p) (c >> final) o
                NeedInput rp rc   -> goLeft rp rc final left
                Done r2           -> PipeM (final >> return (rest r2))
                PipeM mp          -> PipeM (liftM recurse mp)
                Leftover right' i -> goRight final (HaveOutput left final i) right'
            recurse = goRight final left

        goLeft rp rc final left =
            case left of
                HaveOutput left' final' o -> goRight final' left' (rp o)
                NeedInput left' lc        -> NeedInput (recurse . left') (recurse . lc)
                Done r1                   -> goRight (return ()) (Done r1) (rc r1)
                PipeM mp                  -> PipeM (liftM recurse mp)
                Leftover left' i          -> Leftover (recurse left') i
            recurse = goLeft rp rc final
     in goRight (return ()) (left0 Done) (right0 Done)

In the last line, we apply left0 and right0 to Done, which is how we convert our Codensity version into something we can actually analyze. (This is equivalent to applying a difference list to an empty list.) We then traverse these values in the same way that we did in conduit 1.1 and earlier.

The important difference is how we ultimately finish. The code in question is the Done clause of the goRight's case analysis, namely:

Done r2           -> PipeM (final >> return (rest r2))

Notice the usage of rest, instead of what we would have previously done: used the Done constructor. By doing this, we're immediately recreating a Codensity version of our resulting Pipe, which allows us to only traverse our incoming Pipe values once each, and not need to retraverse the outgoing Pipe for future monadic binding.

This trick doesn't just work for composition. There are a large number of functions in conduit that need to analyze a Pipe, such as addCleanup and catchC. All of them are now implemented in this same style.

After implementing this change, the resulting benchmarks look much better. The naive implementation of monte carlo is now quite close to the low-level version (5.28ms vs 3.44ms, as opposed to the original 15ms). Sliding vector is also much better: the unboxed, 1000-size window benchmark went from 7.96ms to 4.05ms, vs a low-level implementation at 1.87ms.

Type-indexed sequences

One approach that I haven't tried yet is the type-indexed sequence approach from Reflection without Remorse. I still intend to add it to my conduit benchmark, but I'm not optimistic about it beating out Codensity. My guess is that a sequence data type will have a higher constant factor overhead, and based on the way composition is implemented in conduit, we won't get any benefit from avoiding the need to transition between two representations.

Edward said he's hoping to get an implementation of such a data structure into the free package, at which point I'll update my benchmark to see how it performs.

To pursue next: streamProducer, streamConsumer, and more

While this round of benchmarking produced some very nice results, we're clearly not yet at the same level as low-level code. My goal is to focus on that next. I have some experiments going already relating to getting conduit to expose stream fusion rules. In simple cases, I've generated a conduit-compatible API with the same performance as vector.

The sticking point is getting something which is efficient not just for functions explicitly written in stream style, but also provides decent performance when composed with the await/yield approach. While the latter approach will almost certainly be slower than stream fusion, I'm hoping we can get it to degrade to current-conduit performance levels, and allow stream fusion to provide a significant speedup when categorically composing two Conduits written in that style.

The code discussed in this post is now available on the next-cps branch of conduit. conduit-extra, conduit-combinators, and a number of other packages either compile out-of-the-box with these changes, or require minor tweaks (already implemented), so I'm hoping that this API change does not affect too many people.

As I mentioned initially, I'd like to have some time for community discussion on this before I make this next release.

August 21, 2014 12:00 AM

August 20, 2014

Neil Mitchell

Continuations and Exceptions

Summary: In moving Shake to continuations, exceptions were the biggest headache. I figured out how to somewhat integrate continuations and exception handling.

The git repo for Shake now suspends inactive computations by capturing their continuation instead of blocking their thread, based on the continuations I described in a previous blog post. The most difficult part was managing exceptions. I needed to define a monad where I could capture continuations and work with exceptions, requiring the definitions:

data M a = ... deriving (Functor, Applicative, Monad, MonadIO)
throwM :: SomeException -> M a
catchM :: M a -> (SomeException -> M a) -> M a
captureM :: ((a -> IO ()) -> IO ()) -> M a

I'm using M as the name of the monad. I want equivalents of throwIO and catch for M, along with a function to capture continuations.

The first observation is that since catchM must catch any exceptions, including those thrown by users calling error, then throwM can be defined as:

throwM = liftIO . throwIO

Using throwIO gives better guarantees about when the exception is raised, compared to just throw.

The second observation is that sometimes I want to raise an exception on the continuation, rather than passing back a value. I can build that on top of captureM with:

captureM' :: ((Either SomeException a -> IO ()) -> IO ()) -> M a
captureM' k = either throwM return =<< captureM k

The third observation (which I observed after a few weeks trying not to follow it) is that the continuation may never be called, and that means you cannot implement a robust finallyM function. In particular, if the person who was intending to run the continuation themselves raises an exception, the continuation is likely to be lost. I originally tried to come up with schemes for defining the function passed the continuation to guarantee the continuation was called, but it became messy very quickly.

The properties we expect of the implementation, to a rough approximation, include:

  • catchM (x >> throwM e) (\_ -> y) >> z === x >> y >> z -- if you throw an exception inside a catchM, you must run the handler.
  • captureM (\k -> x) >>= y === x -- if you execute something not using the continuation inside captureM it must behave like it does outside captureM. In particular, if the captureM is inside a catchM, that catchM must not catch the exception.
  • captureM (\k -> k x) >>= y === x >>= y -- if you capture the continuation then continue that must be equivalent to not capturing the continuation.
  • captureM (\k -> k x >> k x) >>= y === (x >>= y) >> (x >>= y) -- if you run the continuation twice it must do the same IO actions each time. In particular, if the first gets its exceptions caught, the second must do also.

These properties are incomplete (there are other things you expect), and fuzzy (for example, the second property isn't type correct) - but hopefully they give an intuition.

The implementation was non-trivial and (sadly) non-elegant. I suspect a better implementation is known in the literature, and I'd welcome a pointer. My implementation defines M as:

type M a = ContT () (ReaderT (IORef (SomeException -> IO ())) IO) a

Here we have a continuation monad wrapping a reader monad. The reader contains an IORef which stores the exception handler. The basic idea is that whenever we start running anything in M we call the Haskell catch function, and the exception handler forwards to the IORef. We can define catchM as:

catchM :: M a -> (SomeException -> M a) -> M a
catchM m hdl = ContT $ \k -> ReaderT $ \s -> do
old <- liftIO $ readIORef s
writeIORef s $ \e -> do
s <- newIORef old
hdl e `runContT` k `runReaderT` s `catch`
\e -> ($ e) =<< readIORef s
flip runReaderT s $ m `runContT` \v -> do
s <- ask
liftIO $ writeIORef s old
k v
  • We store the previous exception handler as old, and insert a new one. After the code has finished (we have left the catchM block) we restore the old exception handler. In effect, we have a stack of exception handlers.
  • When running the handler we pop off the current exception handler by restoring old, then since we have already used up our catch, we add a new catch to catch exceptions in the handler.

We then define captureM as:

captureM :: ((a -> IO ()) -> IO ()) -> M a
captureM f = ContT $ \k -> ReaderT $ \s -> do
old <- readIORef s
writeIORef s throwIO
f $ \x -> do
s <- newIORef old
flip runReaderT s (k x) `E.catch`
\e -> ($ e) =<< readIORef s
writeIORef s throwIO
  • We make sure to switch the IORef back to throwIO before we start running the users code, and after we have finished running our code and switch back to user code. As a result, if the function that captures the continuation throws an exception, it will be raised as normal.
  • When running the continuation we create a new IORef for the handler, since the continuation might be called twice in parallel, and the separate IORef ensures they don't conflict with each other.

Finally, we need a way to run the computation. I've called that runM:

runM :: M a -> (Either SomeException a -> IO ()) -> IO ()
runM m k = do
let mm = do
captureM $ \k -> k ()
catchM (Right <$> m) (return . Left)
s <- newIORef throwIO
mm `runContT` (liftIO . k) `runReaderT` s

The signature of runM ends up being the only signature the makes sense given the underlying mechanisms. We define mm by using the facilities of captureM to insert a catch and catchM to ensure we never end up in an exception state from runM. The rest is just matching up the types.

Stack depth could potentially become a problem with this solution. If you regularly do:

captureM (\k -> k ())

Then each time a catch will be wrapped around the function. You can avoid that by changing captureM to throw an exception:

captureM :: ((a -> IO ()) -> IO ()) -> M a
captureM f = ContT $ \k -> ReaderT $ \s -> do
old <- readIORef s
writeIORef s $ \_ ->
f $ \x -> do
s <- newIORef old
flip runReaderT s (k x) `E.catch`
\e -> ($ e) =<< readIORef s
throwIO anyException

Here we unwind the catch by doing a throwIO, after installing our exception handler which actually passes the continuation. It is a bit ugly, and I haven't checked if either the catch is a problem, or that this solution solves it.

The implementation in Shake is a bit different to that described above. In Shake I know that captured continuations are never called more than once, so I
can avoid creating a new IORef in captureM, and I can reuse the existing one. Since I never change the handler, I can use a slightly less powerful definition of M:

type M a = ReaderT (IORef (SomeException -> IO ())) (ContT () IO) a

The resulting code is Development.Shake.Monad, which implements the RAW monad, and also does a few extra things which are irrelevant to this post.

The cool thing about Haskell is that I've been able to completely replace the underlying Shake Action monad from StateT/IO, to ReaderT/IO, to ReaderT/ContT/IO, without ever breaking any users of Shake. Haskell allows me to produce effective and flexible abstractions.

by Neil Mitchell ( at August 20, 2014 05:39 PM

August 19, 2014

Brent Yorgey

Maniac week postmortem

My maniac week was a great success! First things first: here’s a time-lapse video1 (I recommend watching it at the full size, 1280×720).

<iframe class="youtube-player" frameborder="0" height="315" src=";rel=1&amp;fs=1&amp;showsearch=0&amp;showinfo=1&amp;iv_load_policy=1&amp;wmode=transparent" type="text/html" width="560"></iframe>

Some statistics2:

  • Total hours of productive work: 55.5 (74 pings)
  • Average hours of work per day3: 11
  • Average hours of sleep per night: 7.8 (52 pings over 5 nights)4
  • Total hours not working or sleeping: 27.25 (37 pings)
  • Average hours not working per day: 5.5
  • Pages of dissertation written: 24 (157 to 181)

[I was planning to also make a visualization of my TagTime data showing when I was sleeping, working, or not-working, but putting together the video and this blog post has taken long enough already! Perhaps I’ll get around to it later.]

Overall, I would call the experiment a huge success—although as you can see, I was a full 2.5 hours per day off my target of 13.5 hours of productive work each day. What with eating, showering, making lunch, getting dinner, taking breaks (both intentional breaks as well as slacking off), and a few miscellaneous things I had to take care of like taking the car to get the tire pressure adjusted… it all adds up surprisingly fast. I think this was one of the biggest revelations for me; going into it I thought 3 hours of not-work per day was extremely generous. I now think three hours of not-work per day is probably within reach for me but would be extremely difficult, and would probably require things like planning out meals ahead of time. In any case, 55 hours of actual, focused work is still fantastic.

Some random observations/thoughts:

  • Having multiple projects to work on was really valuable; when I got tired of working on one thing I could often just switch to something else instead of taking an actual break. I can imagine this might be different if I were working on a big coding project (as most of the other maniac weeks have been). The big project would itself provide multiple different subtasks to work on, but more importantly, coding provides immediate feedback that is really addictive. Code a new feature, and you can actually run the new code! And it does something cool! That it didn’t do before! In contrast, when I write another page of my dissertation I just have… another page of my dissertation. I am, in fact, relatively excited about my dissertation, but it can’t provide that same sort of immediate reinforcing feedback, and it was difficult to keep going at times.

  • I found that having music playing really helped me get into a state of “flow”. The first few days I would play some album and then it would stop and I wouldn’t think to put on more. Later in the week I would just queue up many hours of music at a time and that worked great.

  • I was definitely feeling worn out by the end of the week—the last two days in particular, it felt a lot harder to get into a flow. I think I felt so good the first few days that I became overconfident—which is good to keep in mind if I do this again. The evening of 12 August was particularly bad; I just couldn’t focus. It might have been better in the long run to just go home and read a book or something; I’m just not sure how to tell in the moment when I should push through and when it’s better to cut my losses.

  • Blocking Facebook, turning off email notifications, etc. was really helpful. I did end up allowing myself to check email using my phone (I edited the rules a few hours before I started) and I think it was a good idea—I ended up still needing to communicate with some people, so it was very convenient and not too distracting.

  • Note there are two places on Tuesday afternoon where you can see the clock jump ahead by an hour or so; of course those are times when I turned off the recording. One corresponded to a time when I needed to read and write some sensitive emails; during the other, I was putting student pictures into an anki deck, and turned off the recording to avoid running afoul of FERPA.

That’s all I can think of for now; questions or comments, of course, are welcome.

  1. Some technical notes (don’t try this at home; see for some recommendations on making your own timelapse). To record and create the video I used a homegrown concoction of scrot, streamer, ImageMagick, ffmpeg, with some zsh and Haskell scripts to tie it all together, and using diagrams to generate the clock and tag displays. I took about 3GB worth of raw screenshots, and it takes probably about a half hour to process all of it into a video.

  2. These statistics are according to TagTime, i.e. gathered via random sampling, so there is a bit of inherent uncertainty. I leave it as an exercise for the reader to calculate the proper error bars on these times (given that I use a standard ping interval of 45 minutes).

  3. Computed as 74/(171 – 9) pings multiplied by 24 hours; 9 pings occurred on Sunday morning which I did not count as part of the maniac week.

  4. This is somewhat inflated by Saturday night/Sunday morning, when I both slept in and got a higher-than-average number of pings; the average excluding that night is 6.75 hours, which sounds about right.

by Brent at August 19, 2014 06:18 PM

August 18, 2014

Functional Programming Group at the University of Kansas

Bluetooth on Haskell

I'm presenting a very early draft of my bluetooth library. As its name suggests, bluetooth is a Haskell frontend to low-level Bluetooth APIs, making it similar in spirit to Python's PyBluez and Java's BlueCove.

What it can do

Currently, bluetooth only supports Linux. It has the capability of running an RFCOMM server and client. Theoretically, it should also support L2CAP servers and clients, although this has not been tested yet.

What it will eventually do

I plan to have bluetooth support each of the GHC Tier 1 platforms—that is, Windows, OS X, Linux, and FreeBSD. I want to have the capability to run the full gamut of L2CAP and RFCOMM-related functions on each OS, as well as any additional OS-specific functionality.


Bluetooth programming on Haskell is currently in a very primitive state. As of now, there are only two packages on Hackage that make any mention of Bluetooth (as far as I can tell):

  1. network hints in its Network.Socket module that there is an AF_BLUETOOTH socket address family. However, trying to use it with network will fail, since there is no corresponding Bluetooth SockAddr.
  2. simple-bluetooth by Stephen Blackheath offers some of what my own bluetooth package offers (namely, RFCOMM client capability on Windows and Linux).
However, there is currently no comprehensive, cross-platform Haskell Bluetooth library à la PyBluez or BlueCove. I want bluetooth to fill that niche.

How bluetooth works

bluetooth can be thought of as a wrapper around network. After all, Bluetooth programming is socket-based, so Network.Socket already provides most of what one needs to implement a Bluetooth server or client. There are several gotchas with Bluetooth programming, however:
  • Every major OS has a completely different Bluetooth software stack. For example, Linux uses BlueZ, and Windows has several different stacks, including Winsock and Widcomm. Therefore, bluetooth is not likely to work identically on every OS.
  • Windows in particular is challenging to support since several Winsock functions do not work correctly on the version of MinGW-w64 that is currently shipped with GHC for Windows (only the 64-bit version, no less). For this reason, I probably won't develop a Windows version of bluetooth until this issue is resolved.
It is recommended that you have a basic understanding of Bluetooth programming before attempting to use bluetooth. I recommend this introduction by Albert Huang.


The following are abridged examples of the RFCOMM client and server examples from the bluetooth repo.

RFCOMM server

module Main where
import Data.Set

import Network.Bluetooth
import Network.Socket

main :: IO ()
main = withSocketsDo $ do
let uuid = serviceClassToUUID SerialPort
proto = RFCOMM
settings = defaultSDPInfo {
sdpServiceName = Just "Roto-Rooter Data Router"
, sdpProviderName = Just "Roto-Rooter"
, sdpDescription = Just "An experimental plumbing router"
, sdpServiceClasses = singleton SerialPort
, sdpProfiles = singleton SerialPort

handshakeSock <- bluetoothSocket proto
btPort <- bluetoothBindAnyPort handshakeSock anyAddr
bluetoothListen handshakeSock 1
service <- registerSDPService uuid settings proto btPort
(connSock, connAddr) <- bluetoothAccept handshakeSock
putStrLn $ "Established connection with address " ++ show connAddr

message <- recv connSock 4096
putStrLn $ "Received message! [" ++ message ++ "]"
let response = reverse message
respBytes <- send connSock response
putStrLn $ "Sent response! " ++ show respBytes ++ " bytes."

close connSock
close handshakeSock
closeSDPService service

RFCOMM client

module Main where

import Network.Bluetooth
import Network.Socket

import System.IO

main :: IO ()
main = withSocketsDo $ do
let addr = read "12:34:56:78:90:00"
port = 1

sock <- bluetoothSocket RFCOMM
bluetoothConnect sock addr port
putStrLn "Established a connection!"

putStr "Please enter a message to send: "
hFlush stdout
message <- getLine

messBytes <- send sock message
response <- recv sock 4096
putStrLn $ "Received reponse! [" ++ reponse ++ "]"

close sock
<script type="text/javascript"> SyntaxHighlighter.highlight(); </script>

by Ryan Scott ( at August 18, 2014 09:04 PM

JP Moresmau

Fame at last!

I was reading the book "Haskell Data Analysis Cookbook" when suddenly, my name pops up! Funny to see a link to a 7 year old blog entry, who knew I would go down in history for a few line of codes for a perceptron? It's deep in Chapter 7, for those interested. Maybe this is a sign that I should abandon everything else and spend my time on AI, since it's obviously where fame and riches abound! Right...

by JP Moresmau ( at August 18, 2014 03:10 PM

Tom Schrijvers

Algebraic Effect Handlers, Twice

I have two new papers on algebraic effect handlers:

  • Effect Handlers in Scope
    Nicolas Wu, Tom Schrijvers, Ralf Hinze.
    To appear at the Haskell Symposium 2014.

    Algebraic effect handlers are a powerful means for describing effectful computations. They provide a lightweight and orthogonal technique to define and compose the syntax and semantics of different effects. The semantics is captured by handlers, which are functions that transform syntax trees.
    Unfortunately, the approach does not support syntax for scoping constructs, which arise in a number of scenarios. While handlers can be used to provide a limited form of scope, we demonstrate that this approach constrains the possible interactions of effects and rules out some desired semantics.
    This paper presents two different ways to capture scoped constructs in syntax, and shows how to achieve different semantics by reordering handlers. The first approach expresses scopes using the existing algebraic handlers framework, but has some limitations. The problem is fully solved in the second approach where we introduce higher-order syntax.
  • Heuristics Entwined with Handlers Combined
    Tom Schrijvers, Nicolas Wu, Benoit Desouter, Bart Demoen.
    To appear at the PPDP Symposium 2014.

    A long-standing problem in logic programming is how to cleanly separate logic and control. While solutions exist, they fall short in one of two ways: some are too intrusive, because they require significant changes to Prolog’s underlying implementation; others are lacking a clean semantic grounding. We resolve both of these issues in this paper.
    We derive a solution that is both lightweight and principled. We do so by starting from a functional specification of Prolog based on monads, and extend this with the effect handlers approach to capture the dynamic search tree as syntax. Effect handlers then express heuristics in terms of tree transformations. Moreover, we can declaratively express many heuristics as trees themselves that are combined with search problems using a generic entwining handler. Our solution is not restricted to a functional model: we show how to implement this technique as a library in Prolog by means of delimited continuations.

by Tom Schrijvers ( at August 18, 2014 08:05 AM

August 17, 2014

Don Stewart (dons)

Haskell development job at Standard Chartered

The Strats team at Standard Chartered is hiring expert typed FP developers for Haskell dev roles in London.

This is a “front office” finance role – meaning you will work directly with traders building software to automate and improve their efficiency. The role is very development focused, and you will use Haskell for almost all tasks: data analysis, DSLs, market data publishing, databases, web services, desktop GUIs, large parallel tasks, quantitative models, solvers, everything. There may be a small amount of C++ or F# on occasion. This is a fast paced role – code you write today will be deployed within hours to hundreds of users and has to work.

You will join an expert team in London, and significant, demonstrable experience in typed FP (Haskell, OCaml, F# etc) is strongly preferred. We have more than 2 million lines of Haskell, and our own compiler. In this context we look for skill and taste in typed API design to capture and abstract over complex, messy systems.

Experience writing typed APIs to external systems such as databases, web services, pub/sub platforms is very desirable. We like working code, so if you have Hackage or github libraries, we want to see them. We also like StackOverflow answers, blog posts, academic papers, or other arenas where you can show broad FP ability.

The role requires physical presence on the trading floor in London. Remote work isn’t an option. Ideally you have some project and client management skills — you will talk to users, understand their problems, and then implement and deliver what they really need. No financial background is required.

More info about our development process is in the 2012 PADL keynote, and a 2013 HaskellCast interview.

If this sounds exciting to you, please send CVs to me – dons00 at

Tagged: community, jobs, london

by Don Stewart at August 17, 2014 01:07 PM

August 16, 2014

Noam Lewis

xml-to-json – new version released, constant memory usage

I’ve released a new version (1.0.0) of xml-to-json, which aims to solve memory issues encountered when converting large XML files. The new version includes two executables: the regular (aka “classic”) version, xml-to-json, which includes the various features, and the newly added executable xml-to-json-fast, which runs with constant memory usage and can process files of arbitrary size. It does this by not validating the input xml, and by basically streaming json output as it encounters xml elements (tags) in the input. The implementation takes advantage of the cool tagsoup library for processing XML.

Check the for more details. Hackage is updated.

Tagged: Haskell

by sinelaw at August 16, 2014 07:27 PM

wren gayle romano

Citation, Recitation, and Resuscitation

Citation is a necessary practice for any sort of intellectual engagement, whether formal or colloquial, and whether academic or activistic. It is crucial to give credit to the originators of ideas— for ethical honesty: to acknowledge those who've enlightened you; for professional honesty: to make clear where your contributions begin; and for intellectual honesty: to allow others to read the sources for themselves and to follow up on other extensions and criticisms of that work.

When encountering a new idea or text, I often engage in a practice I call "encitation". In order to more thoroughly understand and ingrain a text's intellectual content, I try (temporarily) to view all other ideas and arguments through its lens. This is why when I was reading Whipping Girl I was citing it left and right, just as when I was reading Killing Rage I quoted it incessantly. To understand structuralism, I embraced the structuralist theory and viewed all things in structuralist terms; to understand functionalism, or Marxism, or Freudianism, or performativity, I did the same. Of course, every framework is incomplete and emphasizes certain things to the exclusion of observing others; so viewing the world entirely from within any single framework distorts your perception of reality. The point of the exercise is not to embrace the framework per se, it's to roleplay the embracing of it. The point of this roleplay is to come to understand the emphases and limitations of the framework— not abstractly but specifically. This is especially important for trying to understand frameworks you disagree with. When we disagree with things, the instinct is to discount everything they say. But it's intellectually dishonest to refuse to understand why you disagree. And it's counterproductive, since you cannot debunk the theory nor convince people to change their minds without knowing and addressing where they're coming from.

I engage in encitation not only for anthropological or philosophical ideas, I also do it for mathematical ideas. By trying to view all of mathematics through a particular idea or framework, you come to understand both what it's good at and what it cannot handle. That's one of the things I really love about the way Jason Eisner teaches NLP and declarative methods. While it's brutal to give people a framework (like PCFGs or SAT solving) and then ask them to solve a problem just barely outside of what that framework can handle, it gives you a deep understanding of exactly where and why the framework fails. This is the sort of knowledge you usually have to go out into industry and beat your head against for a while before you see it. But certain fields, like anthropology and writing, do try to teach encitation as a practice for improving oneself. I wonder how much of Jason's technique comes from his background in psychology. Regardless, this practice is one which should, imo, be used (and taught explicitly) more often in mathematics and computer science. A lot of the arguing over OO vs FP would go away if people did this. Instead, we only teach people hybridized approaches, and they fail to internalize the core philosophical goals of notions like objects, functions, types, and so on. These philosophical goals can be at odds, and even irreconcilable, but that does not make one or the other "wrong". The problem with teaching only hybridized approaches is that this irreconcilability means necessarily compromising on the full philosophical commitment to these goals. Without understanding the full philosophical goals of these different approaches, we cannot accurately discuss why sometimes one philosophy is more expedient or practical than another, and yet why that philosophy is not universally superior to others.

The thing to watch out for, whether engaging in the roleplay of encitation or giving citations for actual work, is when you start reciting quotes and texts like catechisms. Once things become a reflexive response, that's a sign that you are no longer thinking. Mantras may be good for meditation, but they are not good critical praxis. This is, no doubt, what Aoife is referring to when she castigates playing Serano says. This is also why it's so dangerous to engage with standardized narratives. The more people engage in recitations of The Narrative, the more it becomes conventionalized and stripped of whatever humanity it may once have had. Moreover, reiterating The Narrative to everyone you meet is the surest way to drive off anyone who doesn't believe in that narrative, or who believes the content but disagrees with the message. Even if I was "born this way", saying so doesn't make it any more true or any more acceptable to those who who would like Jesus to save me from myself. More to the point, saying so places undue emphasis on one very tiny aspect of the whole. I'd much rather convince people of the violent nature of gender enculturation, and get them to recognize the psychological damage that abuse causes, than get them to believe that transgender has a natal origin.

As time goes on, we ask different questions. Consequently, we end up discarding old theories and embracing new ones when the old theory cannot handle our new questions. In our tireless pursuit of the "truth", educators are often reticent to teach defunct theories because we "know" they are "wrong". The new theory is "superior" in being able to address our new questions, but we often lose track of the crucial insights of the old theory along the way. For this reason, it's often important to revive old theories in order to re-highlight those insights and to refocus on old questions which may have become relevant once more. In a way, this revitalization is similar to encitation: the goal is not to say that the old theory is "right", the goal is to understand what the theory is saying and why it's important to say those things.

But again, one must be careful. When new theories arise, practitioners of the immediately-old theory often try to derail the asking of new questions by overemphasizing the questions which gave rise to the preceding theory. This attempt to keep moribund theories on life support often fuels generational divides: the new theoreticians cannot admit to any positives of the old theory lest they undermine their own work, while the old theoreticians feel like they must defend their work against the unrelenting tide lest it be lost forever. I think this is part of why radfems have been spewing such vitriol lately. The theoretical framework of radical feminism has always excluded and marginalized trans women, sex workers, and countless others; but the framework does not justify doxxing, stalking, and harassing those women who dare refute the tenets of The Doctrine. This reactionary violence bears a striking resemblance to the violence of religious fundamentalists1. And as with the religious fundamentalists, I think the reactionary violence of radfems stems from living in a world they can no longer relate to or make sense of.

Major changes in mathematics often result in similar conflicts, though they are seldom so violent. The embracing/rejection of constructivism as a successor to classical mathematics. The embracing/rejection of category theory as an alternative to ZFC set theory. Both of these are radical changes to the philosophical foundations of mathematical thought, and both of these are highly politicized, with advocates on both sides who refuse to hear what the other side is saying. Bob Harper's ranting and railing against Haskell and lazy evaluation is much the same. Yes, having simple cost models and allowing benign side effects is important; but so is having simple semantic models and referential transparency. From where we stand now, those philosophical goals seem to be at odds. But before we can make any progress on reconciling them, we must be willing to embrace both positions long enough to understand their crucial insights and to objectively recognize where and how both fail.

[1] To be clear: I do not draw this analogy as a way of insulting radfems; only to try and make sense of their behavior. There are many religious people (even among those who follow literalist interpretations of their religious texts) who are not terrorists; so too, there are women who believe in the radfem ideology and don't support the behavior of TERFs, SWERFs, etc. It is important to recognize both halves of each community in order to make sense of either side's reactions; and it's important to try to understand the mechanism that leads to these sorts of splits. But exploring this analogy any further is off-topic for this post. Perhaps another time.

comment count unavailable comments

August 16, 2014 12:26 AM

August 15, 2014

Ken T Takusagawa

[leuzkdqp] Units

Some notes on dimensional quantities and type systems:

Addition, subtraction, assignment, and comparison should fail if the units are incompatible.

Multiplication, division, and exponentiation by a rational dimensionless power always work.  These operations assume commutativity.

Distinguishing addition from multiplication vaguely reminds me of the difference between floating point and fixed point.

Unit conversion: a quantity can be read in one set of units then shown in another set.  Abstractly it does not exist as a real number in either.

Converting between different families of units requires exact linear algebra on rational numbers.

In some functions, units pass through just fine.  Others, e.g., trigonometric, require dimensionless numbers.

Not all dimensionless numbers are the same unit: adding an angle to the fine structure constant seems as meaningless as adding a foot to a volt.  But multiplying them could be reasonable.

One can take any compound type with a dimensionless internal type and turn it into a new compound type with that internal type having units.  But should this be considered a "new" type?  Of course, this is useless unless the internal type defined arithmetic operations: "True" miles per hour seems meaningless.

Creating such compound types is analogous to the "function" idea above by viewing a compound type as a data constructor function of a base type.  Constructors do not do operations which can fail, like addition, so the function always succeeds.

Creating a list populated by successive time derivatives of position seems like a useful thing to be able do.  But every element of the list will have different dimensions, which violates the naive idea of a list being items all of the same type.

We would like to catch all dimensionality errors at compile time, but this may not be possible.  The extreme example would be implementing the "units" program.  Is that an anomaly?

It is OK to add a vector to a coordinate (which has an origin) but not a coordinate to a coordinate.  There seems to be a concept of units and "delta" units.

It is OK to subtract coordinates to get a delta.

Maybe multiplying coordinates is also illegal.

Coordinates versus vectors, units versus delta units, seems like an orthogonal problem to "regular" units.  Separate them in software so one can use either concept independently, for example, distinguishing dimensionless from delta dimensionless.

Go further than just "delta" to distinguish first, second, etc., differences.

An X component and Y component of a vector might have the same units, say, length, but one wants to avoid adding them, as this is typically a typo.  But sometimes, for rotations, one does add them.

A Haskell Wiki page: The units package seems promising.

by Ken ( at August 15, 2014 06:39 PM

August 14, 2014

FP Complete

Announcing Stackage Server

A New Service

A couple months ago I made a post explaining Stackage server, its motivations and use-cases, and that it would be available in the coming months. It's now officially available in beta!

Stackage server.

As a quick recap: the essence of Stackage is that rather than publishing at the granularity of packages, like Hackage, it publishes at the granularity of package sets: Either everything builds together, or nothing is published until it does. We call these published things “snapshots.”

Note: A snapshot is an exact image that can be reproduced at any time. With the snapshot's digest hash, you can download and install the same index and install packages based on that index all the time. Subsequently generated snapshots have no effect on previous ones.

I've been using it for a couple months for every project I've worked on, private and public. It's perfect for application developers and teams who want to share a common always-building package set. Provided you're using one of the 500+ packages we're publishing in the snapshots, there will always be a build plan for the package you want to use in your project.

And if your library is in Stackage, as explained in the previous post, you will get a heads up on Github if your updates or other people's updates cause a build failure related to your library.

How it Works

Snapshots are built every couple days. It takes about 16 hours to complete a build. You can view the build progress at

There are two types of snapshots published by FP Complete:

  1. Exclusive: this excludes packages not specified in the Stackage configuration. This means anything that you try to install from this snapshot will have a build plan.
  2. Inclusive: this includes Hackage packages not known to build. If you try to install a package not tracked by Stackage, it may or may not build.

You can use whichever suites your needs. If you want everything to always build, the former is an attractive choice. If you need to use a package not currently on Stackage, the latter choice makes sense.

Try it Right Now

Choose a snapshot. Each snapshot applies to a specific GHC version. For example, the latest (as of writing) GHC 7.8 build. You'll see something like this:

To use, copy the following to your ~/.cabal/config:

remote-repo: stackage:

Note: Remove or comment out any existing remote-repo line.

Run the following to update your packages:

$ cabal update

If you already have installed some packages, it's better to clear out your package set. See this page in the FAQ for how to do that.


How does this interact with sandboxes? Good question. Here's the rundown:

  • hsenv: Yes, works fine. Edit the .hsenv/cabal/config file and off you go.
  • cabal sandbox: Not yet! There is an open issue about this. But I have tried cabal sandboxes inside hsenv, which worked.

We've added this to the FAQ on the wiki. Contributions to this wiki page are welcome!


Personally, I'm very satisfied with this service so far. I just use my existing tools with a different remote-repo.

Others familiar with Nix have asked how they compare. They are very similar solutions in terms of versioning and curation (although Stackage has full-time maintenance); the main advantage to Stackage is that it just uses existing tools, so you don't have to learn a new tool and way of working to have a better user experience.

We'd like feedback on a few points:

  • Is the inclusive/exclusive separation useful?
  • Is the process of using Stackage in an existing system (clearing the package set and starting fresh) easy?
  • Should we establish a convention for storing Stackage snapshot hashes in projects source-tracking repositories?

And any other feedback you come up with while using it.

Stackage for businesses

As part of my last announcement for Stackage I mentioned there will also be custom services for businesses looking to build their development platform on Stackage.

These commercial services include:

  1. Premium support - FP Complete will quickly respond and make improvements or fixes to the public Stackage server as they need to happen.
  2. Private snapshots with premium support - very helpful for commercial users looking to add proprietary or custom libraries.
  3. Validated pre-compiled build images based on public or private snapshots. These can be used on developer systems or automated build systems.
  4. Packaged Jenkins server using the pre-compiled build images.

All these additional commercial services are meant to be helpful add-ons and we look forward to hearing more about what features you think would be beneficial to you. For more information email us at:

August 14, 2014 12:00 AM

August 12, 2014

Neil Mitchell

Safe library - generalising functions

Summary: The Safe library now has exact versions of take/drop, with twelve functions implemented on top of a generalised splitAt.

The Safe library is a simple Haskell library that provides versions of standard Prelude and Data.List functions that usually throw errors (e.g. tail), but wrapped to provide better error messages (e.g. tailNote), default values (e.g. tailDef) and Maybe results (e.g. tailMay).

I recently released version 0.3.5, which provides a new module Safe.Exact containing crashing versions of functions such as zip/zipWith (which error if the lists are not equal length) and take/drop/splitAt (which error if there are not enough elements), then wraps them to provide safe variants. As an example, the library provides:

takeExact    :: Int -> [a] -> [a]
takeExactMay :: Int -> [a] -> Maybe [a]

These are like take, but if the Int is larger than the length of the list it will throw an error or return Nothing. Some sample evaluations:

takeExactMay 2 [1,2,3] == Just [1,2]
takeExact 2 [1,2,3] == [1,2]
takeExactMay 2 [1] == Nothing
takeExact 2 [1] ==
1:error "Safe.Exact.takeExact, index too large, index=2, length=1"
take 1 (takeExact 2 [1]) == [1]

So takeExactMay computes up-front whether the whole computation will succeed, and returns a Nothing if it will fail. In contrast, takeExact produces elements while they are present, but if you demand an additional element that is missing it will result in an error. All the exceptions in the Safe library are designed to provide the maximum level of detail about what went wrong, here telling us the index we were after and the length of the list.

The library provides takeExact, dropExact and splitAtExact, plus Def/May/Note versions, resulting in twelve similar functions. While the implementation of any one function is reasonably short (although not that short, once proper error messages are provided), I didn't want to write the same code twelve times. However, generalising over functions that check up-front and those that check on-demand requires a bit of thought. In the end I settled for:

splitAtExact_ :: (String -> r) -> ([a] -> r) -> (a -> r -> r) -> Int -> [a] -> r
splitAtExact_ err nil cons o xs
| o < 0 = err $ "index must not be negative, index=" ++ show o
| otherwise = f o xs
f 0 xs = nil xs
f i (x:xs) = x `cons` f (i-1) xs
f i [] = err $
"index too large, index=" ++ show o ++ ", length=" ++ show (o-i)

Here the splitAtExact_ function has a parameterised return type r, along with three functional arguments that construct and consume the r values. The functional arguments are:

  • err :: String -> r, says how to convert an error into a result value. For up-front checks this produces a Nothing, for on-demand checks this calls error.
  • nil :: [a] -> r, says what to do once we have consumed the full number of elements. For take we discard all the remaining elements, for drop we are only interested in the remaining elements.
  • cons :: a -> r -> r, says how to deal with one element before we reach the index. For take this will be (:), but for functions producing a Maybe we have to check the r parameter first.

With this generalisation, I was able to write all twelve variants. As a few examples:

addNote fun msg = error $ "Safe.Exact." ++ fun ++ ", " ++ msg

takeExact = splitAtExact_ (addNote "takeExact") (const []) (:)

dropExact = splitAtExact_ (addNote "dropExact") id (flip const)

takeExactMay = splitAtExact_ (const Nothing) (const $ Just []) (\a -> fmap (a:))

dropExactMay = splitAtExact_ (const Nothing) Just (flip const)

splitAtExact = splitAtExact_ (addNote "splitAtExact")
(\x -> ([], x)) (\a b -> first (a:) b)

splitAtExactMay = splitAtExact_ (const Nothing)
(\x -> Just ([], x)) (\a b -> fmap (first (a:)) b)

Normally I would have defined takeExact and dropExact in terms of fst/snd on top of splitAtExact. However, in the Safe library error messages are of paramount importance, so I go to additional effort to ensure the error says takeExact and not splitAtExact.

by Neil Mitchell ( at August 12, 2014 09:08 PM

August 11, 2014

Dan Burton

Similarities: Monoid, MonadPlus, Category

This is perhaps obvious to anyone who has thoroughly studied category theory, but the similarities between Monoid, MonadPlus, and Category, have really struck me lately. I’m going to take a smidgeon of artistic license to present this train of thought. … Continue reading

by Dan Burton at August 11, 2014 09:41 PM

Functional Jobs

Big Data Engineer / Data Scientist at Recruit IT (Full-time)

  • Are you a Big Data Engineer who wants to work on innovative cloud and real-time data analytic technologies?
  • Do you have a passion for turning data into meaningful information?
  • Does working on a world-class big data project excite you?

Our client is currently looking in growth phase and looking for passionate and creative Data Scientists who can design, development, and implement robust, and scalable big data solutions. This is a role where you will need to enjoy being on the cusp of emerging technologies, and have a genuine interest in breaking new ground.

Your skills and experience will cover the majority of the following:

  • Experience working across real-time data analytics, machine learning, and big data solutions
  • Experience working with large data sets and cloud clusters
  • Experience with various NoSQL technologies and Big Data platforms including; Hadoop, Cassandra, HBASE, Accumulo, and MapReduce
  • Experience with various functional programming languages including; Scala, R, Clojure, Erlang, F#, Caml, Haskell, Common Lisp, or Scheme

This is an excellent opportunity for someone who is interested in a change in lifestyle, and where you would be joining other similar experienced professionals!

New Zealand awaits!

Get information on how to apply for this position.

August 11, 2014 12:32 AM

Gabriel Gonzalez

Equational reasoning at scale

Haskell programmers care about the correctness of their software and they specify correctness conditions in the form of equations that their code must satisfy. They can then verify the correctness of these equations using equational reasoning to prove that the abstractions they build are sound. To an outsider this might seem like a futile, academic exercise: proving the correctness of small abstractions is difficult, so what hope do we have to prove larger abstractions correct? This post explains how to do precisely that: scale proofs to large and complex abstractions.

Purely functional programming uses composition to scale programs, meaning that:

  • We build small components that we can verify correct in isolation
  • We compose smaller components into larger components

If you saw "components" and thought "functions", think again! We can compose things that do not even remotely resemble functions, such as proofs! In fact, Haskell programmers prove large-scale properties exactly the same way we build large-scale programs:

  • We build small proofs that we can verify correct in isolation
  • We compose smaller proofs into larger proofs

The following sections illustrate in detail how this works in practice, using Monoids as the running example. We will prove the Monoid laws for simple types and work our way up to proving the Monoid laws for much more complex types. Along the way we'll learn how to keep the proof complexity flat as the types grow in size.


Haskell's Prelude provides the following Monoid type class:

class Monoid m where
mempty :: m
mappend :: m -> m -> m

-- An infix operator equivalent to `mappend`
(<>) :: Monoid m => m -> m -> m
x <> y = mappend x y

... and all Monoid instances must obey the following two laws:

mempty <> x = x                -- Left identity

x <> mempty = x -- Right identity

(x <> y) <> z = x <> (y <> z) -- Associativity

For example, Ints form a Monoid:

-- See "Appendix A" for some caveats
instance Monoid Int where
mempty = 0
mappend = (+)

... and the Monoid laws for Ints are just the laws of addition:

0 + x = x

x + 0 = x

(x + y) + z = x + (y + z)

Now we can use (<>) and mempty instead of (+) and 0:

>>> 4 <> 2
>>> 5 <> mempty <> 5

This appears useless at first glance. We already have (+) and 0, so why are we using the Monoid operations?

Extending Monoids

Well, what if I want to combine things other than Ints, like pairs of Ints. I want to be able to write code like this:

>>> (1, 2) <> (3, 4)
(4, 6)

Well, that seems mildly interesting. Let's try to define a Monoid instance for pairs of Ints:

instance Monoid (Int, Int) where
mempty = (0, 0)
mappend (x1, y1) (x2, y2) = (x1 + x2, y1 + y2)

Now my wish is true and I can "add" binary tuples together using (<>) and mempty:

>>> (1, 2) <> (3, 4)
(4, 6)
>>> (1, 2) <> (3, mempty) <> (mempty, 4)
(4, 6)
>>> (1, 2) <> mempty <> (3, 4)
(4, 6)

However, I still haven't proven that this new Monoid instance obeys the Monoid laws. Fortunately, this is a very simple proof.

I'll begin with the first Monoid law, which requires that:

mempty <> x = x

We will begin from the left-hand side of the equation and try to arrive at the right-hand side by substituting equals-for-equals (a.k.a. "equational reasoning"):

-- Left-hand side of the equation
mempty <> x

-- x <> y = mappend x y
= mappend mempty x

-- `mempty = (0, 0)`
= mappend (0, 0) x

-- Define: x = (xL, xR), since `x` is a tuple
= mappend (0, 0) (xL, xR)

-- mappend (x1, y1) (x2, y2) = (x1 + x2, y1 + y2)
= (0 + xL, 0 + xR)

-- 0 + x = x
= (xL, xR)

-- x = (xL, xR)
= x

The proof for the second Monoid law is symmetric

-- Left-hand side of the equation
= x <> mempty

-- x <> y = mappend x y
= mappend x mempty

-- mempty = (0, 0)
= mappend x (0, 0)

-- Define: x = (xL, xR), since `x` is a tuple
= mappend (xL, xR) (0, 0)

-- mappend (x1, y1) (x2, y2) = (x1 + x2, y1 + y2)
= (xL + 0, xR + 0)

-- x + 0 = x
= (xL, xR)

-- x = (xL, xR)
= x

The third Monoid law requires that (<>) is associative:

(x <> y) <> z = x <> (y <> z)

Again I'll begin from the left side of the equation:

-- Left-hand side
(x <> y) <> z

-- x <> y = mappend x y
= mappend (mappend x y) z

-- x = (xL, xR)
-- y = (yL, yR)
-- z = (zL, zR)
= mappend (mappend (xL, xR) (yL, yR)) (zL, zR)

-- mappend (x1, y1) (x2 , y2) = (x1 + x2, y1 + y2)
= mappend (xL + yL, xR + yR) (zL, zR)

-- mappend (x1, y1) (x2 , y2) = (x1 + x2, y1 + y2)
= mappend ((xL + yL) + zL, (xR + yR) + zR)

-- (x + y) + z = x + (y + z)
= mappend (xL + (yL + zL), xR + (yR + zR))

-- mappend (x1, y1) (x2, y2) = (x1 + x2, y1 + y2)
= mappend (xL, xR) (yL + zL, yR + zR)

-- mappend (x1, y1) (x2, y2) = (x1 + x2, y1 + y2)
= mappend (xL, xR) (mappend (yL, yR) (zL, zR))

-- x = (xL, xR)
-- y = (yL, yR)
-- z = (zL, zR)
= mappend x (mappend y z)

-- x <> y = mappend x y
= x <> (y <> z)

That completes the proof of the three Monoid laws, but I'm not satisfied with these proofs.

Generalizing proofs

I don't like the above proofs because they are disposable, meaning that I cannot reuse them to prove other properties of interest. I'm a programmer, so I loathe busy work and unnecessary repetition, both for code and proofs. I would like to find a way to generalize the above proofs so that I can use them in more places.

We improve proof reuse in the same way that we improve code reuse. To see why, consider the following sort function:

sort :: [Int] -> [Int]

This sort function is disposable because it only works on Ints. For example, I cannot use the above function to sort a list of Doubles.

Fortunately, programming languages with generics let us generalize sort by parametrizing sort on the element type of the list:

sort :: Ord a => [a] -> [a]

That type says that we can call sort on any list of as, so long as the type a implements the Ord type class (a comparison interface). This works because sort doesn't really care whether or not the elements are Ints; sort only cares if they are comparable.

Similarly, we can make the proof more "generic". If we inspect the proof closely, we will notice that we don't really care whether or not the tuple contains Ints. The only Int-specific properties we use in our proof are:

0 + x = x

x + 0 = x

(x + y) + z = x + (y + z)

However, these properties hold true for all Monoids, not just Ints. Therefore, we can generalize our Monoid instance for tuples by parametrizing it on the type of each field of the tuple:

instance (Monoid a, Monoid b) => Monoid (a, b) where
mempty = (mempty, mempty)

mappend (x1, y1) (x2, y2) = (mappend x1 x2, mappend y1 y2)

The above Monoid instance says that we can combine tuples so long as we can combine their individual fields. Our original Monoid instance was just a special case of this instance where both the a and b types are Ints.

Note: The mempty and mappend on the left-hand side of each equation are for tuples. The memptys and mappends on the right-hand side of each equation are for the types a and b. Haskell overloads type class methods like mempty and mappend to work on any type that implements the Monoid type class, and the compiler distinguishes them by their inferred types.

We can similarly generalize our original proofs, too, by just replacing the Int-specific parts with their more general Monoid counterparts.

Here is the generalized proof of the left identity law:

-- Left-hand side of the equation
mempty <> x

-- x <> y = mappend x y
= mappend mempty x

-- `mempty = (mempty, mempty)`
= mappend (mempty, mempty) x

-- Define: x = (xL, xR), since `x` is a tuple
= mappend (mempty, mempty) (xL, xR)

-- mappend (x1, y1) (x2, y2) = (mappend x1 x2, mappend y1 y2)
= (mappend mempty xL, mappend mempty xR)

-- Monoid law: mappend mempty x = x
= (xL, xR)

-- x = (xL, xR)
= x

... the right identity law:

-- Left-hand side of the equation
= x <> mempty

-- x <> y = mappend x y
= mappend x mempty

-- mempty = (mempty, mempty)
= mappend x (mempty, mempty)

-- Define: x = (xL, xR), since `x` is a tuple
= mappend (xL, xR) (mempty, mempty)

-- mappend (x1, y1) (x2, y2) = (mappend x1 x2, mappend y1 y2)
= (mappend xL mempty, mappend xR mempty)

-- Monoid law: mappend x mempty = x
= (xL, xR)

-- x = (xL, xR)
= x

... and the associativity law:

-- Left-hand side
(x <> y) <> z

-- x <> y = mappend x y
= mappend (mappend x y) z

-- x = (xL, xR)
-- y = (yL, yR)
-- z = (zL, zR)
= mappend (mappend (xL, xR) (yL, yR)) (zL, zR)

-- mappend (x1, y1) (x2 , y2) = (mappend x1 x2, mappend y1 y2)
= mappend (mappend xL yL, mappend xR yR) (zL, zR)

-- mappend (x1, y1) (x2 , y2) = (mappend x1 x2, mappend y1 y2)
= (mappend (mappend xL yL) zL, mappend (mappend xR yR) zR)

-- Monoid law: mappend (mappend x y) z = mappend x (mappend y z)
= (mappend xL (mappend yL zL), mappend xR (mappend yR zR))

-- mappend (x1, y1) (x2, y2) = (mappend x1 x2, mappend y1 y2)
= mappend (xL, xR) (mappend yL zL, mappend yR zR)

-- mappend (x1, y1) (x2, y2) = (mappend x1 x2, mappend y1 y2)
= mappend (xL, xR) (mappend (yL, yR) (zL, zR))

-- x = (xL, xR)
-- y = (yL, yR)
-- z = (zL, zR)
= mappend x (mappend y z)

-- x <> y = mappend x y
= x <> (y <> z)

This more general Monoid instance lets us stick any Monoids inside the tuple fields and we can still combine the tuples. For example, lists form a Monoid:

-- Exercise: Prove the monoid laws for lists
instance Monoid [a] where
mempty = []

mappend = (++)

... so we can stick lists inside the right field of each tuple and still combine them:

>>> (1, [2, 3]) <> (4, [5, 6])
(5, [2, 3, 5, 6])
>>> (1, [2, 3]) <> (4, mempty) <> (mempty, [5, 6])
(5, [2, 3, 5, 6])
>>> (1, [2, 3]) <> mempty <> (4, [5, 6])
(5, [2, 3, 5, 6])

Why, we can even stick yet another tuple inside the right field and still combine them:

>>> (1, (2, 3)) <> (4, (5, 6))
(5, (7, 9))

We can try even more exotic permutations and everything still "just works":

>>> ((1,[2, 3]), ([4, 5], 6)) <> ((7, [8, 9]), ([10, 11), 12)
((8, [2, 3, 8, 9]), ([4, 5, 10, 11], 18))

This is our first example of a "scalable proof". We began from three primitive building blocks:

  • Int is a Monoid
  • [a] is a Monoid
  • (a, b) is a Monoid if a is a Monoid and b is a Monoid

... and we connected those three building blocks to assemble a variety of new Monoid instances. No matter how many tuples we nest the result is still a Monoid and obeys the Monoid laws. We don't need to re-prove the Monoid laws every time we assemble a new permutation of these building blocks.

However, these building blocks are still pretty limited. What other useful things can we combine to build new Monoids?


We're so used to thinking of Monoids as data, so let's define a new Monoid instance for something entirely un-data-like:

-- See "Appendix A" for some caveats
instance Monoid b => Monoid (IO b) where
mempty = return mempty

mappend io1 io2 = do
a1 <- io1
a2 <- io2
return (mappend a1 a2)

The above instance says: "If a is a Monoid, then an IO action that returns an a is also a Monoid". Let's test this using the getLine function from the Prelude:

-- Read one line of input from stdin
getLine :: IO String

String is a Monoid, since a String is just a list of characters, so we should be able to mappend multiple getLine statements together. Let's see what happens:

>>> getLine  -- Reads one line of input
>>> getLine <> getLine
>>> getLine <> getLine <> getLine

Neat! When we combine multiple commands we combine their effects and their results.

Of course, we don't have to limit ourselves to reading strings. We can use readLn from the Prelude to read in anything that implements the Read type class:

-- Parse a `Read`able value from one line of stdin
readLn :: Read a => IO a

All we have to do is tell the compiler which type a we intend to Read by providing a type signature:

>>> readLn :: IO (Int, Int)
(1, 2)<Enter>
(1 ,2)
>>> readLn <> readLn :: IO (Int, Int)
>>> readLn <> readLn <> readLn :: IO (Int, Int)

This works because:

  • Int is a Monoid
  • Therefore, (Int, Int) is a Monoid
  • Therefore, IO (Int, Int) is a Monoid

Or let's flip things around and nest IO actions inside of a tuple:

>>> let ios = (getLine, readLn) :: (IO String, IO (Int, Int))
>>> let (getLines, readLns) = ios <> ios <> ios
>>> getLines
>>> readLns

We can very easily reason that the type (IO String, IO (Int, Int)) obeys the Monoid laws because:

  • String is a Monoid
  • If String is a Monoid then IO String is also a Monoid
  • Int is a Monoid
  • If Int is a Monoid, then (Int, Int) is also a `Monoid
  • If (Int, Int) is a Monoid, then IO (Int, Int) is also a Monoid
  • If IO String is a Monoid and IO (Int, Int) is a Monoid, then (IO String, IO (Int, Int)) is also a Monoid

However, we don't really have to reason about this at all. The compiler will automatically assemble the correct Monoid instance for us. The only thing we need to verify is that the primitive Monoid instances obey the Monoid laws, and then we can trust that any larger Monoid instance the compiler derives will also obey the Monoid laws.

The Unit Monoid

Haskell Prelude also provides the putStrLn function, which echoes a String to standard output with a newline:

putStrLn :: String -> IO ()

Is putStrLn combinable? There's only one way to find out!

>>> putStrLn "Hello" <> putStrLn "World"

Interesting, but why does that work? Well, let's look at the types of the commands we are combining:

putStrLn "Hello" :: IO ()
putStrLn "World" :: IO ()

Well, we said that IO b is a Monoid if b is a Monoid, and b in this case is () (pronounced "unit"), which you can think of as an "empty tuple". Therefore, () must form a Monoid of some sort, and if we dig into Data.Monoid, we will discover the following Monoid instance:

-- Exercise: Prove the monoid laws for `()`
instance Monoid () where
mempty = ()

mappend () () = ()

This says that empty tuples form a trivial Monoid, since there's only one possible value (ignoring bottom) for an empty tuple: (). Therefore, we can derive that IO () is a Monoid because () is a Monoid.


Alright, so we can combine putStrLn "Hello" with putStrLn "World", but can we combine naked putStrLn functions?

>>> (putStrLn <> putStrLn) "Hello"

Woah, how does that work?

We never wrote a Monoid instance for the type String -> IO (), yet somehow the compiler magically accepted the above code and produced a sensible result.

This works because of the following Monoid instance for functions:

instance Monoid b => Monoid (a -> b) where
mempty = \_ -> mempty

mappend f g = \a -> mappend (f a) (g a)

This says: "If b is a Monoid, then any function that returns a b is also a Monoid".

The compiler then deduced that:

  • () is a Monoid
  • If () is a Monoid, then IO () is also a Monoid
  • If IO () is a Monoid then String -> IO () is also a Monoid

The compiler is a trusted friend, deducing Monoid instances we never knew existed.

Monoid plugins

Now we have enough building blocks to assemble a non-trivial example. Let's build a key logger with a Monoid-based plugin system.

The central scaffold of our program is a simple main loop that echoes characters from standard input to standard output:

main = do
hSetEcho stdin False
forever $ do
c <- getChar
putChar c

However, we would like to intercept key strokes for nefarious purposes, so we will slightly modify this program to install a handler at the beginning of the program that we will invoke on every incoming character:

install :: IO (Char -> IO ())
install = ???

main = do
hSetEcho stdin False
handleChar <- install
forever $ do
c <- getChar
handleChar c
putChar c

Notice that the type of install is exactly the correct type to be a Monoid:

  • () is a Monoid
  • Therefore, IO () is also a Monoid
  • Therefore Char -> IO () is also a Monoid
  • Therefore IO (Char -> IO ()) is also a Monoid

Therefore, we can combine key logging plugins together using Monoid operations. Here is one such example:

type Plugin = IO (Char -> IO ())

logTo :: FilePath -> Plugin
logTo filePath = do
handle <- openFile filePath WriteMode
hSetBuffering handle NoBuffering
return (hPutChar handle)

main = do
hSetEcho stdin False
handleChar <- logTo "file1.txt" <> logTo "file2.txt"
forever $ do
c <- getChar
handleChar c
putChar c

Now, every key stroke will be recorded to both file1.txt and file2.txt. Let's confirm that this works as expected:

$ ./logger
$ cat file1.txt
$ cat file2.txt

Try writing your own Plugins and mixing them in with (<>) to see what happens. "Appendix C" contains the complete code for this section so you can experiment with your own Plugins.


Notice that I never actually proved the Monoid laws for the following two Monoid instances:

instance Monoid b => Monoid (a -> b) where
mempty = \_ -> mempty
mappend f g = \a -> mappend (f a) (g a)

instance Monoid a => Monoid (IO a) where
mempty = return mempty

mappend io1 io2 = do
a1 <- io1
a2 <- io2
return (mappend a1 a2)

The reason why is that they are both special cases of a more general pattern. We can detect the pattern if we rewrite both of them to use the pure and liftA2 functions from Control.Applicative:

import Control.Applicative (pure, liftA2)

instance Monoid b => Monoid (a -> b) where
mempty = pure mempty

mappend = liftA2 mappend

instance Monoid b => Monoid (IO b) where
mempty = pure mempty

mappend = liftA2 mappend

This works because both IO and functions implement the following Applicative interface:

class Functor f => Applicative f where
pure :: a -> f a
(<*>) :: f (a -> b) -> f a -> f b

-- Lift a binary function over the functor `f`
liftA2 :: Applicative f => (a -> b -> c) -> f a -> f b -> f c
liftA2 f x y = (pure f <*> x) <*> y

... and all Applicative instances must obey several Applicative laws:

pure id <*> v = v

((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)

pure f <*> pure x = pure (f x)

u <*> pure x = pure (\f -> f y) <*> u

These laws may seem a bit adhoc, but this paper explains that you can reorganize the Applicative class to this equivalent type class:

class Functor f => Monoidal f where
unit :: f ()
(#) :: f a -> f b -> f (a, b)

Then the corresponding laws become much more symmetric:

fmap snd (unit # x) = x                 -- Left identity

fmap fst (x # unit) = x -- Right identity

fmap assoc ((x # y) # z) = x # (y # z) -- Associativity
assoc ((a, b), c) = (a, (b, c))

fmap (f *** g) (x # y) = fmap f x # fmap g y -- Naturality
(f *** g) (a, b) = (f a, g b)

I personally prefer the Monoidal formulation, but you go to war with the army you have, so we will use the Applicative type class for this post.

All Applicatives possess a very powerful property: they can all automatically lift Monoid operations using the following instance:

instance (Applicative f, Monoid b) => Monoid (f b) where
mempty = pure mempty

mappend = liftA2 mappend

This says: "If f is an Applicative and b is a Monoid, then f b is also a Monoid." In other words, we can automatically extend any existing Monoid with some new feature f and get back a new Monoid.

Note: The above instance is bad Haskell because it overlaps with other type class instances. In practice we have to duplicate the above code once for each Applicative. Also, for some Applicatives we may want a different Monoid instance.

We can prove that the above instance obeys the Monoid laws without knowing anything about f and b, other than the fact that f obeys the Applicative laws and b obeys the Applicative laws. These proofs are a little long, so I've included them in Appendix B.

Both IO and functions implement the Applicative type class:

instance Applicative IO where
pure = return

iof <*> iox = do
f <- iof
x <- iox
return (f x)

instance Applicative ((->) a) where
pure x = \_ -> x

kf <*> kx = \a ->
let f = kf a
x = kx a
in f x

This means that we can kill two birds with one stone. Every time we prove the Applicative laws for some functor F:

instance Applicative F where ...

... we automatically prove that the following Monoid instance is correct for free:

instance Monoid b => Monoid (F b) where
mempty = pure mempty

mappend = liftA2 mappend

In the interest of brevity, I will skip the proofs of the Applicative laws, but I may cover them in a subsequent post.

The beauty of Applicative Functors is that every new Applicative instance we discover adds a new building block to our Monoid toolbox, and Haskell programmers have already discovered lots of Applicative Functors.

Revisiting tuples

One of the very first Monoid instances we wrote was:

instance (Monoid a, Monoid b) => Monoid (a, b) where
mempty = (mempty, mempty)

mappend (x1, y1) (x2, y2) = (mappend x1 x2, mappend y1 y2)

Check this out:

instance (Monoid a, Monoid b) => Monoid (a, b) where
mempty = pure mempty

mappend = liftA2 mappend

This Monoid instance is yet another special case of the Applicative pattern we just covered!

This works because of the following Applicative instance in Control.Applicative:

instance Monoid a => Applicative ((,) a) where
pure b = (mempty, b)

(a1, f) <*> (a2, x) = (mappend a1 a2, f x)

This instance obeys the Applicative laws (proof omitted), so our Monoid instance for tuples is automatically correct, too.

Composing applicatives

In the very first section I wrote:

Haskell programmers prove large-scale properties exactly the same way we build large-scale programs:

  • We build small proofs that we can verify correct in isolation
  • We compose smaller proofs into larger proofs

I don't like to use the word compose lightly. In the context of category theory, compose has a very rigorous meaning, indicating composition of morphisms in some category. This final section will show that we can actually compose Monoid proofs in a very rigorous sense of the word.

We can define a category of Monoid proofs:

So in our Plugin example, we began on the proof that () was a Monoid and then composed three Applicative morphisms to prove that Plugin was a Monoid. I will use the following diagram to illustrate this:

| |
| Legend: * = Object |
| |
| v |
| | = Morphism |
| v |
| |

* `()` is a `Monoid`

| IO

* `IO ()` is a `Monoid`

| ((->) String)

* `String -> IO ()` is a `Monoid`

| IO

* `IO (String -> IO ())` (i.e. `Plugin`) is a `Monoid`

Therefore, we were literally composing proofs together.


You can equationally reason at scale by decomposing larger proofs into smaller reusable proofs, the same way we decompose programs into smaller and more reusable components. There is no limit to how many proofs you can compose together, and therefore there is no limit to how complex of a program you can tame using equational reasoning.

This post only gave one example of composing proofs within Haskell. The more you learn the language, the more examples of composable proofs you will encounter. Another common example is automatically deriving Monad proofs by composing monad transformers.

As you learn Haskell, you will discover that the hard part is not proving things. Rather, the challenge is learning how to decompose proofs into smaller proofs and you can cultivate this skill by studying category theory and abstract algebra. These mathematical disciplines teach you how to extract common and reusable proofs and patterns from what appears to be disposable and idiosyncratic code.

Appendix A - Missing Monoid instances

These Monoid instance from this post do not actually appear in the Haskell standard library:

instance Monoid b => Monoid (IO b)

instance Monoid Int

The first instance was recently proposed here on the Glasgow Haskell Users mailing list. However, in the short term you can work around it by writing your own Monoid instances by hand just by inserting a sufficient number of pures and liftA2s.

For example, suppose we wanted to provide a Monoid instance for Plugin. We would just newtype Plugin and write:

newtype Plugin = Plugin { install :: IO (String -> IO ()) }

instance Monoid Plugin where
mempty = Plugin (pure (pure (pure mempty)))

mappend (Plugin p1) (Plugin p2) =
Plugin (liftA2 (liftA2 (liftA2 mappend)) p1 p2)

This is what the compiler would have derived by hand.

Alternatively, you could define an orphan Monoid instance for IO, but this is generally frowned upon.

There is no default Monoid instance for Int because there are actually two possible instances to choose from:

-- Alternative #1
instance Monoid Int where
mempty = 0

mappend = (+)

-- Alternative #2
instance Monoid Int where
mempty = 1

mappend = (*)

So instead, Data.Monoid sidesteps the issue by providing two newtypes to distinguish which instance we prefer:

newtype Sum a = Sum { getSum :: a }

instance Num a => Monoid (Sum a)

newtype Product a = Product { getProduct :: a}

instance Num a => Monoid (Product a)

An even better solution is to use a semiring, which allows two Monoid instances to coexist for the same type. You can think of Haskell's Num class as an approximation of the semiring class:

class Num a where
fromInteger :: Integer -> a

(+) :: a -> a -> a

(*) :: a -> a -> a

-- ... and other operations unrelated to semirings

Note that we can also lift the Num class over the Applicative class, exactly the same way we lifted the Monoid class. Here's the code:

instance (Applicative f, Num a) => Num (f a) where
fromInteger n = pure (fromInteger n)

(+) = liftA2 (+)

(*) = liftA2 (*)

(-) = liftA2 (-)

negate = fmap negate

abs = fmap abs

signum = fmap signum

This lifting guarantees that if a obeys the semiring laws then so will f a. Of course, you will have to specialize the above instance to every concrete Applicative because otherwise you will get overlapping instances.

Appendix B

These are the proofs to establish that the following Monoid instance obeys the Monoid laws:

instance (Applicative f, Monoid b) => Monoid (f b) where
mempty = pure mempty

mappend = liftA2 mappend

... meaning that if f obeys the Applicative laws and b obeys the Monoid laws, then f b also obeys the Monoid laws.

Proof of the left identity law:

mempty <> x

-- x <> y = mappend x y
= mappend mempty x

-- mappend = liftA2 mappend
= liftA2 mappend mempty x

-- mempty = pure mempty
= liftA2 mappend (pure mempty) x

-- liftA2 f x y = (pure f <*> x) <*> y
= (pure mappend <*> pure mempty) <*> x

-- Applicative law: pure f <*> pure x = pure (f x)
= pure (mappend mempty) <*> x

-- Eta conversion
= pure (\a -> mappend mempty a) <*> x

-- mappend mempty x = x
= pure (\a -> a) <*> x

-- id = \x -> x
= pure id <*> x

-- Applicative law: pure id <*> v = v
= x

Proof of the right identity law:

x <> mempty = x

-- x <> y = mappend x y
= mappend x mempty

-- mappend = liftA2 mappend
= liftA2 mappend x mempty

-- mempty = pure mempty
= liftA2 mappend x (pure mempty)

-- liftA2 f x y = (pure f <*> x) <*> y
= (pure mappend <*> x) <*> pure mempty

-- Applicative law: u <*> pure y = pure (\f -> f y) <*> u
= pure (\f -> f mempty) <*> (pure mappend <*> x)

-- Applicative law: ((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)
= ((pure (.) <*> pure (\f -> f mempty)) <*> pure mappend) <*> x

-- Applicative law: pure f <*> pure x = pure (f x)
= (pure ((.) (\f -> f mempty)) <*> pure mappend) <*> x

-- Applicative law : pure f <*> pure x = pure (f x)
= pure ((.) (\f -> f mempty) mappend) <*> x

-- `(.) f g` is just prefix notation for `f . g`
= pure ((\f -> f mempty) . mappend) <*> x

-- f . g = \x -> f (g x)
= pure (\x -> (\f -> f mempty) (mappend x)) <*> x

-- Apply the lambda
= pure (\x -> mappend x mempty) <*> x

-- Monoid law: mappend x mempty = x
= pure (\x -> x) <*> x

-- id = \x -> x
= pure id <*> x

-- Applicative law: pure id <*> v = v
= x

Proof of the associativity law:

(x <> y) <> z

-- x <> y = mappend x y
= mappend (mappend x y) z

-- mappend = liftA2 mappend
= liftA2 mappend (liftA2 mappend x y) z

-- liftA2 f x y = (pure f <*> x) <*> y
= (pure mappend <*> ((pure mappend <*> x) <*> y)) <*> z

-- Applicative law: ((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)
= (((pure (.) <*> pure mappend) <*> (pure mappend <*> x)) <*> y) <*> z

-- Applicative law: pure f <*> pure x = pure (f x)
= ((pure f <*> (pure mappend <*> x)) <*> y) <*> z
f = (.) mappend

-- Applicative law: ((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)
= ((((pure (.) <*> pure f) <*> pure mappend) <*> x) <*> y) <*> z
f = (.) mappend

-- Applicative law: pure f <*> pure x = pure (f x)
= (((pure f <*> pure mappend) <*> x) <*> y) <*> z
f = (.) ((.) mappend)

-- Applicative law: pure f <*> pure x = pure (f x)
= ((pure f <*> x) <*> y) <*> z
f = (.) ((.) mappend) mappend

-- (.) f g = f . g
= ((pure f <*> x) <*> y) <*> z
f = ((.) mappend) . mappend

-- Eta conversion
= ((pure f <*> x) <*> y) <*> z
f x = (((.) mappend) . mappend) x

-- (f . g) x = f (g x)
= ((pure f <*> x) <*> y) <*> z
f x = (.) mappend (mappend x)

-- (.) f g = f . g
= ((pure f <*> x) <*> y) <*> z
f x = mappend . (mappend x)

-- Eta conversion
= ((pure f <*> x) <*> y) <*> z
f x y = (mappend . (mappend x)) y

-- (f . g) x = f (g x)
= ((pure f <*> x) <*> y) <*> z
f x y = mappend (mappend x y)

-- Eta conversion
= ((pure f <*> x) <*> y) <*> z
f x y z = mappend (mappend x y) z

-- Monoid law: mappend (mappend x y) z = mappend x (mappend y z)
= ((pure f <*> x) <*> y) <*> z
f x y z = mappend x (mappend y z)

-- (f . g) x = f (g x)
= ((pure f <*> x) <*> y) <*> z
f x y z = (mappend x . mappend y) z

-- Eta conversion
= ((pure f <*> x) <*> y) <*> z
f x y = mappend x . mappend y

-- (.) f g = f . g
= ((pure f <*> x) <*> y) <*> z
f x y = (.) (mappend x) (mappend y)

-- (f . g) x = f
= ((pure f <*> x) <*> y) <*> z
f x y = (((.) . mappend) x) (mappend y)

-- (f . g) x = f (g x)
= ((pure f <*> x) <*> y) <*> z
f x y = ((((.) . mappend) x) . mappend) y

-- Eta conversion
= ((pure f <*> x) <*> y) <*> z
f x = (((.) . mappend) x) . mappend

-- (.) f g = f . g
= ((pure f <*> x) <*> y) <*> z
f x = (.) (((.) . mappend) x) mappend

-- Lambda abstraction
= ((pure f <*> x) <*> y) <*> z
f x = (\k -> k mappend) ((.) (((.) . mappend) x))

-- (f . g) x = f (g x)
= ((pure f <*> x) <*> y) <*> z
f x = (\k -> k mappend) (((.) . ((.) . mappend)) x)

-- Eta conversion
= ((pure f <*> x) <*> y) <*> z
f = (\k -> k mappend) . ((.) . ((.) . mappend))

-- (.) f g = f . g
= ((pure f <*> x) <*> y) <*> z
f = (.) (\k -> k mappend) ((.) . ((.) . mappend))

-- Applicative law: pure f <*> pure x = pure (f x)
= (((pure g <*> pure f) <*> x) <*> y) <*> z
g = (.) (\k -> k mappend)
f = (.) . ((.) . mappend)

-- Applicative law: pure f <*> pure x = pure (f x)
= ((((pure (.) <*> pure (\k -> k mappend)) <*> pure f) <*> x) <*> y) <*> z
f = (.) . ((.) . mappend)

-- Applicative law: ((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)
= ((pure (\k -> k mappend) <*> (pure f <*> x)) <*> y) <*> z
f = (.) . ((.) . mappend)

-- u <*> pure y = pure (\k -> k y) <*> u
= (((pure f <*> x) <*> pure mappend) <*> y) <*> z
f = (.) . ((.) . mappend)

-- (.) f g = f . g
= (((pure f <*> x) <*> pure mappend) <*> y) <*> z
f = (.) (.) ((.) . mappend)

-- Applicative law: pure f <*> pure x = pure (f x)
= ((((pure g <*> pure f) <*> x) <*> pure mappend) <*> y) <*> z
g = (.) (.)
f = (.) . mappend

-- Applicative law: pure f <*> pure x = pure (f x)
= (((((pure (.) <*> pure (.)) <*> pure f) <*> x) <*> pure mappend) <*> y) <*> z
f = (.) . mappend

-- Applicative law: ((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)
= (((pure (.) <*> (pure f <*> x)) <*> pure mappend) <*> y) <*> z
f = (.) . mappend

-- Applicative law: ((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)
= ((pure f <*> x) <*> (pure mappend <*> y)) <*> z
f = (.) . mappend

-- (.) f g = f . g
= ((pure f <*> x) <*> (pure mappend <*> y)) <*> z
f = (.) (.) mappend

-- Applicative law: pure f <*> pure x = pure (f x)
= (((pure f <*> pure mappend) <*> x) <*> (pure mappend <*> y)) <*> z
f = (.) (.)

-- Applicative law: pure f <*> pure x = pure (f x)
= ((((pure (.) <*> pure (.)) <*> pure mappend) <*> x) <*> (pure mappend <*> y)) <*> z

-- Applicative law: ((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)
= ((pure (.) <*> (pure mappend <*> x)) <*> (pure mappend <*> y)) <*> z

-- Applicative law: ((pure (.) <*> u) <*> v) <*> w = u <*> (v <*> w)
= (pure mappend <*> x) <*> ((pure mappend <*> y) <*> z)

-- liftA2 f x y = (pure f <*> x) <*> y
= liftA2 mappend x (liftA2 mappend y z)

-- mappend = liftA2 mappend
= mappend x (mappend y z)

-- x <> y = mappend x y
= x <> (y <> z)

Appendix C: Monoid key logging

Here is the complete program for a key logger with a Monoid-based plugin system:

import Control.Applicative (pure, liftA2)
import Control.Monad (forever)
import Data.Monoid
import System.IO

instance Monoid b => Monoid (IO b) where
mempty = pure mempty

mappend = liftA2 mappend

type Plugin = IO (Char -> IO ())

logTo :: FilePath -> Plugin
logTo filePath = do
handle <- openFile filePath WriteMode
hSetBuffering handle NoBuffering
return (hPutChar handle)

main = do
hSetEcho stdin False
handleChar <- logTo "file1.txt" <> logTo "file2.txt"
forever $ do
c <- getChar
handleChar c
putChar c

by Gabriel Gonzalez ( at August 11, 2014 12:13 AM

August 10, 2014

Gabriel Gonzalez

managed-1.0.0: A monad for managed resources

I'm splitting off the Managed type from the mvc library into its own stand-alone library. I've wanted to use this type outside of mvc for some time now, because it's an incredibly useful Applicative that I find myself reaching for in my own code whenever I need to acquire resources.

If you're not familiar with the Managed type, it's simple:

-- The real implementation uses smart constructors
newtype Managed a =
Managed { with :: forall r . (a -> IO r) -> IO r }

-- It's a `Functor`/`Applicative`/`Monad`
instance Functor Managed where ...
instance Applicative Managed where ...
instance Monad Managed where ...

-- ... and also implements `MonadIO`
instance MonadIO Managed where ...

Here's an example of mixing the Managed monad with pipes to copy one file to another:

import Control.Monad.Managed
import System.IO
import Pipes
import qualified Pipes.Prelude as Pipes

main = runManaged $ do
hIn <- managed (withFile "in.txt" ReadMode)
hOut <- managed (withFile "out.txt" WriteMode)
liftIO $ runEffect $
Pipes.fromHandle hIn >-> Pipes.toHandle hOut

However, this is not much more concise than the equivalent callback-based version. The real value of the Managed type is its Applicative instance, which you can use to lift operations from values that it wraps.

Equational reasoning

My previous post on equational reasoning at scale describes how you can use Applicatives to automatically extend Monoids while preserving the Monoid operations. The Managed Applicative is no different and provides the following type class instance that automatically lifts Monoid operations:

instance Monoid a => Monoid (Managed a)

Therefore, you can treat the Managed Applicative as yet another useful building block in your Monoid tool box.

However, Applicatives can do more than extend Monoids; they can extend Categorys, too. Given any Category, if you extend it with an Applicative you can automatically derive a new Category. Here's the general solution:

import Control.Applicative
import Control.Category
import Prelude hiding ((.), id)

newtype Extend f c a b = Extend (f (c a b))

instance (Applicative f, Category c)
=> Category (Extend f c) where
id = Extend (pure id)

Extend f . Extend g = Extend (liftA2 (.) f g)

So let's take advantage of this fact to extend one of the pipes categories with simple resource management. All we have to do is wrap the pull-based pipes category in a bona-fide Category instance:

import Pipes

newtype Pull m a b = Pull (Pipe a b m ())

instance Monad m => Category (Pull m) where
id = Pull cat

Pull p1 . Pull p2 = Pull (p1 <-< p2)

Now we can automatically define resource-managed pipes by Extending them with the Managed Applicative:

import Control.Monad.Managed
import qualified Pipes.Prelude as Pipes
import System.IO

fromFile :: FilePath -> Extend Managed (Pull IO) () String
fromFile filePath = Extend $ do
handle <- managed (withFile filePath ReadMode)
return (Pull (Pipes.fromHandle handle))

toFile :: FilePath -> Extend Managed (Pull IO) String X
toFile filePath = Extend $ do
handle <- managed (withFile filePath WriteMode)
return (Pull (Pipes.toHandle handle))

All we need is a way to run Extended pipes and then we're good to go:

runPipeline :: Extend Managed (Pull IO) () X -> IO ()
runPipeline (Extend mp) = runManaged $ do
Pull p <- mp
liftIO $ runEffect (return () >~ p)

If we compose and run these Extended pipes they just "do the right thing":

main :: IO ()
main = runPipeline (fromFile "in.txt" >>> toFile "out.txt")

Let's check it out:

$ cat in.txt
$ ./example
$ cat out.txt

We can even reuse existing pipes, too:

reuse :: Monad m => Pipe a b m () -> Extend Managed (Pull m) a b
reuse = Extend . pure . Pull

main = runPipeline $
fromFile "in.txt" >>> reuse (Pipes.take 2) >>> toFile "out.txt"

... and reuse does the right thing:

$ ./example
$ cat out.txt

What does it mean for reuse to "do the right thing"? Well, we can specify the correctness conditions for reuse as the following functor laws:

reuse (p1 >-> p2) = reuse p1 >>> reuse p2

reuse cat = id

These two laws enforce that reuse is "well-behaved" in a rigorous sense.

This is just one example of how you can use the Managed type to extend an existing Category. As an exercise, try to take other categories and extend them this way and see what surprising new connectable components you can create.


Experts will recognize that Managed is a special case of Codensity or ContT. The reason for defining a separate type is:

  • simpler inferred types,
  • additional type class instances, and:
  • a more beginner-friendly name.

Managed is closely related in spirit to the Resource monad, which is now part of resourcet. The main difference between the two is:

  • Resource preserves the open and close operations
  • Managed works for arbitrary callbacks, even unrelated to resources

This is why I view the them as complementary Monads.

Like all Applicatives, the Managed type is deceptively simple. This type does not do much in isolation, but it grows in power the more you compose it with other Applicatives to generate new Applicatives.

by Gabriel Gonzalez ( at August 10, 2014 11:58 PM


Field Accessors Considered Harmful

It's pretty well known these days that Haskell's field accessors are rather cumbersome syntactically and not composable.  The lens abstraction that has gotten much more popular recently (thanks in part to Edward Kmett's lens package) solves these problems.  But I recently ran into a bigger problem with field accessors that I had not thought about before.  Consider the following scenario.  You have a package with code something like this:

data Config = Config { configFieldA :: [Text] }

So your Config data structure gives your users getters and setters for field A (and any other fields you might have).  Your users are happy and life is good.  Then one day you decide to add a new feature and that feature requires expanding and restructuring Config.  Now you have this:
data MConfig = MConfig { mconfigFieldA :: [Text] }
data Config = Config { configMC :: MConfig , configFieldX :: Text , configFieldY :: Bool }

This is a nice solution because your users get to keep the functionality over the portion of the Config that they are used to, and they still get the new functionality.  But now there's a problem.  You're still breaking them because configFieldA changed names to mconfigFieldA and now refers to the MConfig structure instead of Config.  If this was not a data structure, you could preserve backwards compatibility by creating another function:

configFieldA = mconfigFieldA . configMC

But alas, that won't work here because configFieldA is not a normal function.  It is a special field accessor generated by GHC and you know that your users are using it as a setter.  It seems to me that we are at an impasse.  It is completely impossible to deliver your new feature without breaking backwards compatibility somehow.  No amount of deprecated cycles can ease the transition.  The sad thing is that it seems like it should have been totally doable.  Obviously there are some kinds of changes that understandably will break backwards compatibility.  But this doesn't seem like one of them since it is an additive change.  Yes, yes, I's impossible to do this change without changing the type of the Config constructor, so that means that at least that function will break.  But we should be able to minimize the breakage to the field accessor functions, and field accessors prevent us from doing that.

However, we could have avoided this problem.  If we had a bit more foresight, we could have done this.

module Foo (mkConfig, configFieldA) where

data Config = Config { _configFieldA :: [Text] }

mkConfig :: [Text] -> Config
mkConfig = Config

configFieldA = lens _configFieldA (\c a -> c { _configFieldA = a })

This would allow us to avoid breaking backwards compatibility by continuing to export appropriate versions of these symbols.  It would look something like this.
module Foo
  ( MConfig
  , mkMConfig
  , mconfigFieldA
  , Config
  , mkConfig
  , configFieldA
  , configMC
  -- ...
  ) where

data MConfig = MConfig { _mconfigFieldA :: [Text] }
data Config = Config { _configMC :: MConfig
                     , _configFieldX :: Text
                     , _configFieldY :: Bool }

mkMConfig = MConfig

mkConfig a = Config (mkMConfig a) "" False

mconfigFieldA = lens _mconfigFieldA (\c a -> c { _mconfigFieldA = a })
configMC = lens _configMC (\c mc -> c { _configMC = mc })

-- The rest of the field lenses here

configFieldA = configMC . mconfigFieldA
Note that the type signatures for mkConfig and configFieldA stay exactly the same.  We weren't able to do this with field accessors because they are not composable.  Lenses solve this problem for us because they are composable and we have complete control over their definition.

For quite some time now I have thought that I understood the advantage that lenses give you over field accessors.  Discovering this added ability of lenses in helping us preserve backwards compatibility came as a pleasant surprise.  I'll refrain from opining on how this should affect your development practices, but I think it makes the case for using lenses in your code a bit stronger than it was before.

by mightybyte ( at August 10, 2014 09:33 PM

Brent Yorgey

Readers wanted!

tl;dr: Read a draft of my thesis and send me your feedback by September 9!

Over the past year I’ve had several people say things along the lines of, “let me know if you want me to read through your thesis”. I never took them all that seriously (it’s easy to say you are willing to read a 200-page document…), but it never hurts to ask, right?

My thesis defense is scheduled for October 14, and I’m currently undertaking a massive writing/editing push to try to get as much of it wrapped up as I can before classes start on September 4. So, if there’s anyone out there actually interested in reading a draft and giving feedback, now is your chance!

The basic idea of my dissertation is to put combinatorial species and related variants (including a port of the theory to HoTT) in a common categorical framework, and then be able to use them for working with/talking about data types. If you’re brave enough to read it, you’ll find lots of category theory and type theory, and very little code—but I can promise lots of examples and pretty pictures. I’ve tried to make it somewhat self-contained, so it may be a good way to learn a bit of category theory or homotopy type theory, if you’ve been curious to learn more about those topics.

You can find the latest draft here (auto-updated every time I commit); more generally, you can find the git repo here. If you notice any typos or grammatical errors, feel free to open a pull request. For anything more substantial—thoughts on the organization, notes or questions about things you found confusing, suggestions for improvement, pointers to other references—please send me an email (first initial last name at gmail). And finally, please send me any feedback by September 9 at the latest (but the earlier the better). I need to have a final version to my committee by September 23.

Last but not least, if you’re interested to read it but don’t have the time or inclination to provide feedback on a draft, never fear—I’ll post an announcement when the final version is ready for your perusal!

by Brent at August 10, 2014 07:51 PM

Manuel M T Chakravarty

What is currying (in Swift)?

A blog post by Ole Begemann has led some people interested in Swift wondering what exactly curried functions and currying are — for example, listen to the discussion on the Mobile Couch podcast Episode 37.

Let’s look at currying in Swift. Here is a binary add function

func add(x: Int, #y: Int) -> Int {
    return x + y

and next we have a curried version of the same function

func curriedAdd(x: Int)(y: Int) -> Int {
  return x + y

The difference between the two is that add takes two arguments (two Ints) and returns an Int, whereas curriedAdd takes only one argument and returns a function of type (y: Int) -> Int. If you put those two definitions into a Playground, both add(1, y: 2) and curriedAdd(1)(y: 2) yield 3.

In add(1, 2), we supply two arguments at once, but in curriedAdd(1)(y: 2), we supply only one argument, get a new function as the result, and then, apply that new function to the second argument. In other words, add requires two arguments at once, whereas its curried variant requires the two arguments one after the other in two separate function calls.

This works not only for binary functions, but also for functions expecting three or more arguments. More generally, currying refers to the fact that any n-ary function (i.e., any function expecting n arguments) can be rewritten as a computationally equivalent function that doesn’t get all n arguments at once, but gets them one after the other, always returning an intermediate function for each of the n function applications.

That’s…interesting, but why should we care? Curried functions are more versatile than their uncurried counterparts. We can apply the function add only to two arguments. That’s it. In contrast, we can apply curriedAdd to either one or two arguments. If we want to define an increment function, we can do that easily in terms of curriedAdd:

let inc = curriedAdd(1)

As expected, inc(y: 2) gives 3.

For a simple function, such as, add, this extra versatility is not very impressive. However, Ole’s blog post explains how this ultimately enables the target-action pattern in Swift and that is pretty impressive!

As a side note, in the functional language Haskell all functions are curried by default. In fact, the concept was called currying after the mathematician Haskell B. Curry in whose honour the language was called Haskell.

August 10, 2014 09:19 AM

Mike Izbicki

Scribal traditions of "ancient" Hebrew scrolls

Scribal traditions of "ancient" Hebrew scrolls

posted on 2014-08-10

In 2006, I saw the dead sea scrolls in San Diego. The experience changed my life. I realized I knew nothing about ancient Judea, and decided to immerse myself in it. I studied biblical Hebrew and began a collection of Hebrew scrolls.

a pile of Torah scrolls

a pile of Torah scrolls

Each scroll is between 100 to 600 years old, and is a fragment of the Torah. These scrolls were used by synagogues throughout Southern Europe, Africa, and the Middle East. As we’ll see in a bit, each region has subtly different scribal traditions. But they all take their Torah very seriously.

The first thing that strikes me about a scroll is its color. Scrolls are made from animal skin, and the color is determined by the type of animal and method of curing the skin. The methods and animals used depend on the local resources, so color gives us a clue about where the scroll originally came from. For example, scrolls with a deep red color usually come from North Africa. As the scroll ages, the color may either fade or deepen slightly, but remains largely the same. The final parchment is called either gevil or klaf depending on the quality and preparation method.

The four scrolls below show the range of colors scrolls come in:

4 Torah scrolls side by side with different ages

4 Torah scrolls side by side with different ages

My largest scroll is about 60 feet long. Here I have it partially stretched out on the couch in my living room:

Torah scroll stretched out on couch

Torah scroll stretched out on couch

The scroll is about 300 years old, and contains most of Exodus, Leviticus, and Numbers. A complete Torah scroll would also have Genesis and Deuteronomy and be around 150 feet long. Sadly, this scroll has been damaged throughout its long life, and the damaged sections were removed.

As you can imagine, many hides were needed to make these large scrolls. These hides get sewn together to form the full scroll. You can easily see the stitching on the back of the scroll:

back of a Torah scroll

back of a Torah scroll

Also notice how rough that skin is! The scribes (for obvious reasons) chose to use the nice side of the skin to write on.

Here is a front end, rotated view of the same seam above. Some columns of text are separated at these seems, but some columns are not.

front of Torah seam

front of Torah seam

Animal hides come in many sizes. The hide in this image is pretty large and holds five columns of text:

5 panels of parchment

5 panels of parchment

But this hide is smaller and holds only three columns:

3 panels of parchment

3 panels of parchment

The coolest part of these scrolls is their calligraphy. Here’s a zoomed in look on one of the columns of text above:

zoomed in Hebrew Torah scroll

zoomed in Hebrew Torah scroll

There’s a lot to notice in this picture:

  1. The detail is amazing. Many characters have small strokes decorating them. These strokes are called tagin (or crowns in English). A bit farther down the page we’ll see different ways other scribal traditions decorate these characters. Because of this detail in every letter, a scribe (or sopher) might spend the whole day writing without finishing a single piece of parchment. The average sopher takes between nine months to a year to complete a Torah scroll.

  2. There are faint indentations in the parchment that the sopher used to ensure he was writing straight. We learned to write straight in grade school by writing our letters on top of lines on paper. But in biblical Hebrew, the sopher writes their letters below the line!

  3. Hebrew is read and written right to left (backwards from English). To keep the left margin crisp, the letters on the left can be stretched to fill space. This effect is used in different amounts throughout the text. The stretching is more noticeable in this next section:

Hebrew stretched letters in Torah scroll

Hebrew stretched letters in Torah scroll

And sometimes the sopher goes crazy and stretches all the things:

scribe stretched all the letters on this line of a Hebrew Torah

scribe stretched all the letters on this line of a Hebrew Torah

If you look at the pictures above carefully, you can see that only certain letters get stretched: ת ד ח ה ר ל. These letters look nice when stretched because they have a single horizontal stroke.

The next picture shows a fairly rare example of stretching the letter ש. It looks much less elegant than the other stretched letters:

stretching the shem letter in a Hebrew Torah scroll

stretching the shem letter in a Hebrew Torah scroll

Usually these stretched characters are considered mistakes. An experienced sopher evenly spaces the letters to fill the line exactly. But a novice sopher can’t predict their space usage as well. When they hit the end of the line and realize they can’t fit another full word, they’ll add one of these stretched characters to fill the space.

In certain sections, however, stretched lettering is expected. It is one of the signs of poetic text in the Torah. For example, in the following picture, the sopher intentionally stretched each line, even when they didn’t have to:

closeup of Torah scroll with cool calligraphy

closeup of Torah scroll with cool calligraphy

Keeping the left margin justified isn’t just about looks. The Torah is divided into thematic sections called parashot. There are two types of breaks separating parashot. The petuha (open) is a jagged edge, much like we end paragraphs in English. The setumah (closed) break is a long space in the middle of the line. The picture below shows both types of breaks:

Torah scroll containing a petuha and setumah parashah break

Torah scroll containing a petuha and setumah parashah break

A sopher takes these parashot divisions very seriously. If the sopher accidentally adds or removes parashot from the text, the entire scroll becomes non-kosher and cannot be used. A mistake like this would typically be fixed by removing the offending piece of parchment from the scroll, rewriting it, and adding the corrected version back in. (We’ll see some pictures of less serious mistakes at the bottom of this page.)

The vast majority of of the Torah is formatted as simple blocks of text. But certain sections must be formatted in special ways. This is a visual cue that the text is more poetic.

The passage below is of Numbers 10:35-36. Here we see an example of the inverted nun character being used to highlight some text. This is the only section of the Torah where this formatting appears (although it also appears seven times in the book of psalms). The inverted nun characters are set all by themselves, and surround a command about the Ark of the Covenant:

Moses gives a command about the ark of the covenant in fancy Hebrew script; inverted nun character

Moses gives a command about the ark of the covenant in fancy Hebrew script; inverted nun character

It’s really cool when two different scrolls have sections that overlap. We can compare them side-by-side to watch the development of different scribal traditions. The image below shows two versions of Numbers 6:22-27.

The lord bless you and keep you rendered in a Hebrew Torah in fancy Hebrew script

The lord bless you and keep you rendered in a Hebrew Torah in fancy Hebrew script

The writing is almost identical in both versions, with one exception. On the first line with special formatting, the left scroll has two words in the right column: אמור להם, but the right scroll only has the world אמור) להם is the last word on the previous line). When the sopher is copying a scroll, he does his best to preserve the formatting in these special sections. But due to the vast Jewish diaspora, small mistakes like this get made and propagate. Eventually they form entirely new scribal traditions. (Note that if a single letter is missing from a Torah, then the Torah is not kosher and is considered unfit for use. These scribal differences are purely stylistic.)

Many individual characters and words also receive special formatting throughout the text. Both images below come from the same piece of parchment (in Genesis 23) and were created by the same sopher. The image on the left shows the letter פ in its standard form, and the image on the right shows it in a modified form.

a whirled pe in the Hebrew Torah side by side with a normal pe

a whirled pe in the Hebrew Torah side by side with a normal pe

The meaning of these special characters is not fully known, and every scribal tradition exhibits some variation in what letters get these extra decorations. In the scroll above, the whirled פ appears only once. But some scrolls exhibit the special character dozens of times. Here is another example where you can see a whirled פ a few letters to the right of its normal version:

a whirled pe and normal pe in the Hebrew Torah in the same sentence

a whirled pe and normal pe in the Hebrew Torah in the same sentence

Another special marker is when dots are placed over the Hebrew letters. The picture below comes from the story when Esau is reconciling with Jacob in Genesis 33. Normally, the dotted word would mean that Esau kissed Jacob in reconciliation; but tradition states that these dots indicate that Esau was being incincere. Some rabbis say that this word, when dotted, could be more accurately translated as Esau “bit” Jacob.

dots above words on the Hebrew Torah

dots above words on the Hebrew Torah

Next, let’s take a look at God’s name written in many different styles. In Hebrew, God’s name is written יהוה. Christians often pronounce God’s name as Yahweh or Jehovah. Jews, however, never say God’s name. Instead, they say the word adonai, which means “lord.” In English old testaments, anytime you see the word Lord rendered in small caps, the Hebrew is actually God’s name. When writing in English, Jews will usually write God’s name as YHWH. Removing the vowels is a reminder to not say the name out loud.

Below are nine selected images of YHWH. Each comes from a different scroll and illustrates the decorations added by a different scribal tradition. A few are starting to fade from age, but they were the best examples I could find in the same style. The simplest letters are in the top left, and the most intricate in the bottom right. In the same scroll, YHWH is always written in the same style.

yahweh, jehova, the name of god, in many different Hebrew scripts

yahweh, jehova, the name of god, in many different Hebrew scripts

The next image shows the word YHWH at the end of the line. The ה letters get stretched just like in any other word. When I first saw this I was surprised a sopher would stretch the name of God like this—the name of God is taken very seriously and must be handled according to special rules. I can just imagine rabbis 300 years ago getting into heated debates about whether or not this was kosher!

stretched yahweh in Hebrew Torah scroll

stretched yahweh in Hebrew Torah scroll

There is another oddity in the image above. The letter yod (the small, apostrophe looking letter at the beginning of YHWH) appears in each line. But it is written differently in the last line. Here, it is given two tagin, but everywhere else it only has one. Usually, the sopher consistently applies the same styling throughout the scroll. Changes like this typically indicate the sopher is trying to emphasize some aspect of the text. Exactly what the changes mean, however, would depend on the specific scribal tradition.

The more general word for god in Hebrew is אלוהים, pronounced elohim. This word can refer to either YHWH or a non-Jewish god. Here it is below in two separate scrolls:

elohim, god, in Hebrew

elohim, god, in Hebrew

Many Christians, when they first learn Hebrew, get confused by the word elohim. The ending im on Hebrew words is used to make a word plural, much like the ending s in English. (For example, the plural of sopher is sophrim.) Christians sometimes claim that because the Hebrew word for god looks plural, ancient Jews must have believed in the Christian doctrine of the trinity. But this is very wrong, and rather offensive to Jews.

Tradition has that Moses is the sole author of the Torah, and that Jewish sophrim have given us perfect copies of Moses’ original manuscripts. Most modern scholars, however, believe in the documentary hypothesis, which challenges this tradition. The hypothesis claims that two different writers wrote the Torah. One writer always referenced God as YHWH, whereas the other always referenced God as elohim. The main evidence for the documentary hypothesis is that some stories in the Torah are repeated twice with slightly different details; in one version God is always called YHWH, whereas in the other God is always called elohim. The documentary hypothesis suggests that some later editor merged two sources together, but didn’t feel comfortable editing out the discrepancies, so left them exactly as they were. Orthodox Jews reject the documentary hypothesis, but some strains of Judaism and most Christian denominations are willing to consider that the hypothesis might be true. This controversy is a very important distinction between different Jewish sects, but most Christians aren’t even aware of the controversy in their holy book.

The next two pictures show common gramatical modifications of the words YHWH and elohim: they have letters attached to them in the front. The word YHWH below has a ל in front. This signifies that something is being done to YHWH or for YHWH. The word elohim has a ה in front. This signifies that we’re talking about the God, not just a god. In Hebrew, prepositions like “to,” “for,” and “the” are not separate words. They’re just letters that get attached to the words they modify.

lamed on yhwh adonai plus a he on elohium

lamed on yhwh adonai plus a he on elohium

Names are very important in Hebrew. Most names are actually phrases. The name Jacob, for example, means “heel puller.” Jacob earned his name because he was pulling the heel of his twin brother Esau when they were born in Genesis 25:26. Below are two different versions of the word Jacob:

the name jacob written in fancy Hebrew script; genesis 25:26

the name jacob written in fancy Hebrew script; genesis 25:26

But names often change in the book of Genesis. In fact, Jacob’s name is changed to Israel in two separate locations: first in Genesis 32 after Jacob wrestles with “a man”; then again in Genesis 35 after Jacob builds an alter to elohim. (This is one of the stories cited as evidence for the documentary hypothesis.) The name Israel is appropriate because it literally means “persevered with God.” The el at the end of Israel is a shortened form of elohim and is another Hebrew word for god.

Here is the name Israel in two different scripts:

Israel in Torah script Hebrew

Israel in Torah script Hebrew

Another important Hebrew name is ישוע. In Hebrew, this name is pronounced yeshua, but Christians commonly pronounce it Jesus! The name literally translates as “salvation.” That’s why the angel in Matthew 1:21 and Luke 1:31 gives Jesus this name. My scrolls are only of the old testament, so I don’t have any examples to show of Jesus’ name!

To wrap up the discussion of scribal writing styles, let’s take a look at the most common phrase in the Torah: ודבר יהוה אל משה. This translates to “and the Lord said to Moses.” Here is is rendered in three different styles:

vaydaber adonai lmosheh

vaydaber adonai lmosheh

vaydaber adonai lmosheh

vaydaber adonai lmosheh

vaydaber adonai lmosheh

vaydaber adonai lmosheh

Now let’s move on to what happens when the sophrim make mistakes.

Copying all of these intricate characters was exhausting work! And hard! So mistakes are bound to happen. But if even a single letter is wrong anywhere in the scroll, the entire scroll is considered unusable. The rules are incredibly strict, and this is why Orthodox Jews reject the documentary hypothesis. To them, it is simply inconceivable to use a version of the Torah that was combined from multiple sources.

The most common way to correct a mistake is to scratch off the outer layer of the parchment, removing the ink. In the picture below, the sopher has written the name Aaron (אהרן) over the scratched off parchment:

scribe mistake in Hebrew Torah scroll

scribe mistake in Hebrew Torah scroll

The next picture shows the end of a line. Because of the mistake, however, the sopher must write several characters in the margin of the text, ruining the nice sharp edge they created with the stretched characters. Writing that enters the margins like this is not kosher.

scribe mistake in Hebrew Torah scroll

scribe mistake in Hebrew Torah scroll

Sometimes a sopher doesn’t realize they’ve made a mistake until several lines later. In the picture below, the sopher has had to scratch off and replace three and a half lines:

scribe makes a big mistake and scratches off several lines in a Torah scroll

scribe makes a big mistake and scratches off several lines in a Torah scroll

Scratching the parchment makes it thinner and weaker. Occasionally the parchment is already very thin, and scratching would tear through to the other side. In this case, the sopher can take a thin piece of blank parchment and attach it to the surface. In the following picture, you can see that the attached parchment has a different color and texture.

parchment mistake added ontop Torah scroll

parchment mistake added ontop Torah scroll

The next picture shows a rather curious instance of this technique. The new parchment is placed so as to cover only parts of words on multiple lines. I can’t imagine how a sopher would make a mistake that would best be fixed in this manner. So my guess is that this patch was applied some time later, by a different sopher to repair some damage that had occurred to the scroll while it was in use.

parchment of repair added to the top of a Torah scroll

parchment of repair added to the top of a Torah scroll

Our last example of correcting mistakes is the most rare. Below, the sopher completely forgot a word when copying the scroll, and added it in superscript above the standard text:

superscript mistake fixing in Toral scroll

superscript mistake fixing in Toral scroll

If we zoom in, you can see that the superscript word is slightly more faded than the surrounding text. This might be because the word was discovered to be missing a long time (days or weeks) after the original text was written, so a different batch of ink was used to write the word.

superscript mistake fixing in Toral scroll

superscript mistake fixing in Toral scroll

Since these scrolls are several hundred years old, they’ve had plenty of time to accumulate damage. When stored improperly, the parchment can tear in some places and bunch up in others:

parchment Torah scroll damage

parchment Torah scroll damage

One of the worst things that can happen to a scroll is water. It damages the parchment and makes the ink run. If this happens, the scroll is ruined permanently.

water damage on a torah scroll

water damage on a torah scroll

you should learn Hebrew!

If you’ve read this far and enjoyed it, then you should learn biblical Hebrew. It’s a lot of fun! You can start right now at any of these great sites:

When you’re ready to get serious, you’ll need to get some books. The books that helped me the most were:

These books all have lots of exercises and make self study pretty simple. The Biblical Hebrew Workbook is for absolute beginners. Within the first few sessions you’re translating actual bible verses and learning the nuances that get lost in the process. I spent two days a week with this book, two hours at each session. It took about four months to finish.

The other two books start right where the workbook stops. They walk you through many important passages and even entire books of the old testament. After finishing these books, I felt comfortable enough to start reading the old testament by myself. Of course I was still very slow and was constantly looking things up in the dictionary!

For me, learning the vocabulary was the hardest part. I used a great free piece of software called FoundationStone to help. The program remembers which words you struggle with and quizes you on them more frequently.

Finally, let’s end with my favorite picture of them all. Here we’re looking down through a rolled up Torah scroll at one of my sandals.

torah sandals james bond

torah sandals james bond

August 10, 2014 12:00 AM

August 09, 2014

Edward Z. Yang

What’s a module system good for anyway?

This summer, I've been working at Microsoft Research implementing Backpack, a module system for Haskell. Interestingly, Backpack is not really a single monolothic feature, but, rather, an agglomeration of small, infrastructural changes which combine together in an interesting way. In this series of blog posts, I want to talk about what these individual features are, as well as how the whole is greater than the sum of the parts.

But first, there's an important question that I need to answer: What's a module system good for anyway? Why should you, an average Haskell programmer, care about such nebulous things as module systems and modularity. At the end of the day, you want your tools to solve specific problems you have, and it is sometimes difficult to understand what problem a module system like Backpack solves. As tomejaguar puts it: "Can someone explain clearly the precise problem that Backpack addresses? I've read the paper and I know the problem is 'modularity' but I fear I am lacking the imagination to really grasp what the issue is."

Look no further. In this blog post, I want to talk concretely about problems Haskellers have today, explain what the underlying causes of these problems are, and say why a module system could help you out.

The String, Text, ByteString problem

As experienced Haskellers are well aware, there are multitude of string types in Haskell: String, ByteString (both lazy and strict), Text (also both lazy and strict). To make matters worse, there is no one "correct" choice of a string type: different types are appropriate in different cases. String is convenient and native to Haskell'98, but very slow; ByteString is fast but are simply arrays of bytes; Text is slower but Unicode aware.

In an ideal world, a programmer might choose the string representation most appropriate for their application, and write all their code accordingly. However, this is little solace for library writers, who don't know what string type their users are using! What's a library writer to do? There are only a few choices:

  1. They "commit" to one particular string representation, leaving users to manually convert from one representation to another when there is a mismatch. Or, more likely, the library writer used the default because it was easy. Examples: base (uses Strings because it completely predates the other representations), diagrams (uses Strings because it doesn't really do heavy string manipulation).
  2. They can provide separate functions for each variant, perhaps identically named but placed in separate modules. This pattern is frequently employed to support both strict/lazy variants Text and ByteStringExamples: aeson (providing decode/decodeStrict for lazy/strict ByteString), attoparsec (providing Data.Attoparsec.ByteString/Data.Attoparsec.ByteString.Lazy), lens (providing Data.ByteString.Lazy.Lens/Data.ByteString.Strict.Lens).
  3. They can use type-classes to overload functions to work with multiple representations. The particular type class used hugely varies: there is ListLike, which is used by a handful of packages, but a large portion of packages simply roll their own. Examples: SqlValue in HDBC, an internal StringLike in tagsoup, and yet another internal StringLike in web-encodings.

The last two methods have different trade offs. Defining separate functions as in (2) is a straightforward and easy to understand approach, but you are still saying no to modularity: the ability to support multiple string representations. Despite providing implementations for each representation, the user still has to commit to particular representation when they do an import. If they want to change their string representation, they have to go through all of their modules and rename their imports; and if they want to support multiple representations, they'll still have to write separate modules for each of them.

Using type classes (3) to regain modularity may seem like an attractive approach. But this approach has both practical and theoretical problems. First and foremost, how do you choose which methods go into the type class? Ideally, you'd pick a minimal set, from which all other operations could be derived. However, many operations are most efficient when directly implemented, which leads to a bloated type class, and a rough time for other people who have their own string types and need to write their own instances. Second, type classes make your type signatures more ugly String -> String to StringLike s => s -> s and can make type inference more difficult (for example, by introducing ambiguity). Finally, the type class StringLike has a very different character from the type class Monad, which has a minimal set of operations and laws governing their operation. It is difficult (or impossible) to characterize what the laws of an interface like this should be. All-in-all, it's much less pleasant to program against type classes than concrete implementations.

Wouldn't it be nice if I could import String, giving me the type String and operations on it, but then later decide which concrete implementation I want to instantiate it with? This is something a module system can do for you! This Reddit thread describes a number of other situations where an ML-style module would come in handy.

(PS: Why can't you just write a pile of preprocessor macros to swap in the implementation you want? The answer is, "Yes, you can; but how are you going to type check the thing, without trying it against every single implementation?")

Destructive package reinstalls

Have you ever gotten this error message when attempting to install a new package?

$ cabal install hakyll
cabal: The following packages are likely to be broken by the reinstalls:
Use --force-reinstalls if you want to install anyway.

Somehow, Cabal has concluded that the only way to install hakyll is to reinstall some dependency. Here's one situation where a situation like this could come about:

  1. pandoc and Graphalyze are compiled against the latest unordered-containers-, which itself was compiled against the latest hashable-
  2. hakyll also has a dependency on unordered-containers and hashable, but it has an upper bound restriction on hashable which excludes the latest hashable version. Cabal decides we need to install an old version of hashable, say hashable-
  3. If hashable- is installed, we also need to build unordered-containers against this older version for Hakyll to see consistent types. However, the resulting version is the same as the preexisting version: thus, reinstall!

The root cause of this error an invariant Cabal currently enforces on a package database: there can only be one instance of a package for any given package name and version. In particular, this means that it is not possible to install a package multiple times, compiled against different dependencies. This is a bit troublesome, because sometimes you really do want the same package installed multiple times with different dependencies: as seen above, it may be the only way to fulfill the version bounds of all packages involved. Currently, the only way to work around this problem is to use a Cabal sandbox (or blow away your package database and reinstall everything, which is basically the same thing).

You might be wondering, however, how could a module system possibly help with this? It doesn't... at least, not directly. Rather, nondestructive reinstalls of a package are a critical feature for implementing a module system like Backpack (a package may be installed multiple times with different concrete implementations of modules). Implementing Backpack necessitates fixing this problem, moving Haskell's package management a lot closer to that of Nix's or NPM.

Version bounds and the neglected PVP

While we're on the subject of cabal-install giving errors, have you ever gotten this error attempting to install a new package?

$ cabal install hledger-0.18
Resolving dependencies...
cabal: Could not resolve dependencies:
# pile of output

There are a number of possible reasons why this could occur, but usually it's because some of the packages involved have over-constrained version bounds (especially upper bounds), resulting in an unsatisfiable set of constraints. To add insult to injury, often these bounds have no grounding in reality (the package author simply guessed the range) and removing it would result in a working compilation. This situation is so common that Cabal has a flag --allow-newer which lets you override the upper bounds of packages. The annoyance of managing bounds has lead to the development of tools like cabal-bounds, which try to make it less tedious to keep upper bounds up-to-date.

But as much as we like to rag on them, version bounds have a very important function: they prevent you from attempting to compile packages against dependencies which don't work at all! An under-constrained set of version bounds can easily have compiling against a version of the dependency which doesn't type check.

How can a module system help? At the end of the day, version numbers are trying to capture something about the API exported by a package, described by the package versioning policy. But the current state-of-the-art requires a user to manually translate changes to the API into version numbers: an error prone process, even when assisted by various tools. A module system, on the other hand, turns the API into a first-class entity understood by the compiler itself: a module signature. Wouldn't it be great if packages depended upon signatures rather than versions: then you would never have to worry about version numbers being inaccurate with respect to type checking. (Of course, versions would still be useful for recording changes to semantics not seen in the types, but their role here would be secondary in importance.) Some full disclosure is warranted here: I am not going to have this implemented by the end of my internship, but I'm hoping to make some good infrastructural contributions toward it.


If you skimmed the introduction to the Backpack paper, you might have come away with the impression that Backpack is something about random number generators, recursive linking and applicative semantics. While these are all true "facts" about Backpack, they understate the impact a good module system can have on the day-to-day problems of a working programmer. In this post, I hope I've elucidated some of these problems, even if I haven't convinced you that a module system like Backpack actually goes about solving these problems: that's for the next series of posts. Stay tuned!

by Edward Z. Yang at August 09, 2014 11:21 PM


7 Startups - part 5 - the XMPP backend

Note: I ran out of time weeks ago. I could never finish this serie as I envisionned, and I don’t see much free time on the horizon. Instead of letting this linger forever, here is a truncated conclusion. The previous episodes were :

  • Part 1 : probably the best episode, about the basic game types.
  • Part 2 : definition of the game rules in an unspecified monad.
  • Part 3 : writing an interpreter for the rules.
  • Part 4 : stumbling and failure in writing a clean backend system.

In the previous episode I added a ton of STM code and helper functions in several 15 minutes sessions. The result was not pretty, and left me dissatisfied.

For this episode, I decided to release my constraints. For now, I am only going to support the following :

  • The backend list will not be dynamic : a bunch of backends are going to be registered once, and it will be not be possible to remove an existing or add a previous backend once this is done.
  • The backends will be text-line based (XMPP and IRC are good protocols for this). This will unfortunately make it harder to write a nice web interface for the game too, but given how much time I can devote to this side-project this doesn’t matter much …

The MVC paradigm

A great man once said that “if you have category theory, everything looks like a pipe. Or a monad. Or a traversal. Or perhaps it’s a cosomething”. With the previously mentionned restrictions, I was able to shoehorn my problem in the shape of the mvc package, which I wanted to try for a while. It might be a bit different that what people usually expect when talking about the model - view - controller pattern, and is basically :

  • Some kind of pollable input (the controllers),
  • a pure stream based computation (the model), sporting an internal state and transforming the data coming from the inputs into something that is passed to …
  • … IO functions that run the actual effects (the views).

Each of these components can be reasoned about separately, and combined together in various ways.

There is however one obvious problem with this pattern, due to the way the game is modeled. Currently, the game is supposed to be able to receive data from the players, and to send data to them. It would need to live entirely in the model for this to work as expected, but the way it is currently written doesn’t make it obvious.

It might be possible to have the game be explicitely CPS, so that the pure part would run the game until communication with the players is required, which would translate nicely in an output that could be consumed by a view.

This would however require some refactoring and a lot of thinking, which I currently don’t have time for, so here is instead how the information flows :

Information flow

Here PInput and GInput are the type of the inputs (respectively from player and games). The blue boxes are two models that will be combined together. The pink ones are the type of outputs emitted from the models. The backends serve as drivers for player communication. The games run in their respective threads, and the game manager spawns and manages the game threads.

Comparison with the “bunch of STM functions” model

I originally started with a global TVar containing the state information of each players (for example if they are part of a game, still joining, due to answer to a game query, etc.). There were a bunch of “helper functions” that would manipulate the global state in a way that would ensure its consistency. The catch is that the backends were responsible for calling these helper functions at appropriate times and for not messing with the global state.

The MVC pattern forces the structure of your program. In my particular case, it means a trick is necessary to integrate it with the current game logic (that will be explained later). The “boSf” pattern is more flexible, but carries a higher cognitive cost.

With the “boSf” pattern, response to player inputs could be :

  • Messages to players, which fits well with the model, as it happened over STM channels, so the whole processing / state manipulation / player output could be of type Input -> STM ().
  • Spawning a game. This time we need forkIO and state manipulation. This means a type like c :: Input -> STM (IO ()), with a call like join (atomically (c input)).

Now there are helper functions that return an IO action, and some that don’t. When some functionnality is added, some functions need to start returning IO actions. This is ugly and makes it harder to extend.

Conclusion of the serie

Unfortunately I ran out of time for working on this serie a few weeks ago. The code is out, the game works and it’s fun. My original motivation for writing this post was as an exposure on basic type-directed design to my non-Haskeller friends, but I think it’s not approachable to non Haskellers, so I never shown them.

The main takeaways are :

Game rules

The game rules have first been written with an unspecified monad that exposed several functions required for user interaction. That’s the reason I started with defining a typeclass, that way I wouldn’t have to worry about implementing the “hard” part and could concentrate on writing the rules instead. For me, this was the fun part, and it was also the quickest.

As of the implementation of the aforementionned functions, I then used the operational package, that would let me write and “interpreter” for my game rules. One of them is pure, and used in tests. There are two other interpreters, one of them for the console version of the game, the other for the multi-backends system.

Backend system

The backends are, I think, easy to expand. Building the core of the multi-game logic with the mvc package very straightforward. It would be obvious to add an IRC backend to the XMPP one, if there weren’t that many IRC packages to choose from on hackage …

A web backend doesn’t seem terribly complicated to write, until you want to take into account some common web application constraints, such as having several redundant servers. In order to do so, the game interpreter should be explicitely turned into an explicit continuation-like system (with the twist it only returns on blocking calls) and the game state serialized in a shared storage system.


My main motivation was to show it was possible to eliminate tons of bug classes by encoding of the invariants in the type system. I would say this was a success.

The area where I expected to have a ton of problems was the card list. It’s a tedious manual process, but some tests weeded out most of the errors (it helps that there are some properties that can be verified on the deck). The other one was the XMPP message processing in its XML horror. It looks terrible.

The area where I wanted this process to work well was a success. I wrote the game rules in one go, without any feedback. Once they were completed, I wrote the backends and tested the game. It turned out they were very few bugs, especially when considering the fact that the game is a moderately complicated board game :

  • One of the special capabilities was replaced with another, and handled at the wrong moment in the game. This was quickly debugged.
  • I used traverse instead of both for tuples. I expected them to have the same result, and it “typechecked” because my tuple was of type (a,a), but the Applicative instance for tuples made it obvious this wasn’t the case. That took a bit longer to find out, as it impacted half of the military victory points, which are distributed only three times per game.
  • I didn’t listen to my own advice, and didn’t take the time to properly encode that some functions only worked with nonempty lists as arguments. This was also quickly found out, using quickcheck.

The game seems to run fine now. There is a minor rule bugs identified (the interaction between card-recycling abilities and the last turn for example), but I don’t have time to fix it.

There might be some interest with the types of the Hub, as they also encode a lot of invariants.

Also off-topic, but I really like using the lens vocabulary to encode the relationship between types these days. A trivial example can be found here.

The game

That might be the most important part. I played a score of games, and it was a lot of fun. The game is playable, and just requires a valid account on an XMPP server. Have fun !

August 09, 2014 08:48 AM

Yesod Web Framework

Deprecating yesod-platform

I want to deprecate the yesod-platform, and instead switch to Stackage server as the recommended installation method for Yesod for end users. To explain why, let me explain the purpose of yesod-platform, the problems I've encountered maintaining it, and how Stackage Server can fit in. I'll also explain some unfortunate complications with Stackage Server.

Why yesod-platform exists

Imagine a simpler Yesod installation path:

  1. cabal install yesod-bin, which provides the yesod executable.
  2. yesod init to create a scaffolding.
  3. cabal install inside that directory, which downloads and installs all of the necessary dependencies.

This in fact used to be the installation procedure, more or less. However, this led to a number of user problems:

  • Back in the earlier days of cabal-install, it was difficult for the dependency solver to find a build plan in this situation. Fortunately, cabal-install has improved drastically since then.
    • This does still happen occasionally, especially with packages with restrictive upper bounds. Using --max-backjumps=-1 usually fixes that.
  • It sometimes happens that an upstream package from Yesod breaks Yesod, either by changing an API accidentally, or by introducing a runtime bug.

This is where yesod-platform comes into play. Instead of leaving it up to cabal-install to track down a consistent build plan, it specifies exact versions of all depedencies to ensure a consistent build plan.

Conflicts with GHC deps/Haskell Platform

Yesod depends on aeson. So logically, yesod-platform should have a strict dependency on aeson. We try to always use the newest versions of dependencies, so today, that would be aeson == In turn, this demands text >= However, if you look at the Haskell Platform changelog, there's no version of the platform that provides a new enough version of text to support that constraint.

yesod-platform could instead specify an older version of aeson, but that would unnecessarily constrain users who aren't sticking to the Haskell Platform versions (which, in my experience, is the majority of users). This would also cause more dependency headaches down the road, as you'd now also need to force older versions of packages like criterion.

To avoid this conflict, yesod-platform has taken the approach of simply omitting constraints on any packages in the platform, as well as any packages with strict bounds on those packages. And if you look at yesod-platform today, you'll that there is no mention of aeson or text.

A similar issue pops up for packages that are a dependency of the GHC package (a.k.a., GHC-the-library). The primary problem there is the binary package. In this case, the allowed version of the package depends on which version of GHC is being used, not the presence or absence of the Haskell Platform.

This results in two problems:

  • It's very difficult to maintain this list of excluded packages correctly. I get large number of bug reports about these kinds of build plan problems.

  • We're giving up quite a bit of the guaranteed buildability that yesod-platform was supposed to provide. If aeson (as an example) doesn't work with yesod-form, yesod-platform won't be able to prevent such a build plan from happening.

There's also an issue with the inability to specify dependencies on executable-only packages, like alex, happy, and yesod-bin.

Stackage Server

Stackage Server solves exactly the same problem. It provides a consistent set of packages that can be installed together. Unlike yesod-platform, it can be distinguished based on GHC version. And it's far simpler to maintain. Firstly, I'm already maintaining Stackage Server full time. And secondly, all of the testing work is handled by a very automated process.

So here's what I'm proposing: I'll deprecate the yesod-platform package, and change the Yesod quickstart guide to have the following instructions:

  • Choose an appropriate Stackage snapshot from
  • Modify your cabal config file appropriately
  • cabal install yesod-bin alex happy
  • Use yesod init to set up a scaffolding
  • cabal install --enable-tests in the new directory

For users wishing to live on more of a bleeding edge, the option is always available to simply not use Stackage. Such a usage will give more control over package versions, but will also lack some stability.

The problems

There are a few issues that need to be ironed out.

  • cabal sandbox does not allow changing the remote-repo. Fortunately, Luite seems to have this solved, so hopefully this won't be a problem for long. Until then, you can either use a single Stackage snapshot for all your development, or use a separate sandboxing technology like hsenv.

  • Haskell Platform conflicts still exist. The problem I mentioned above with aeson and text is a real problem. The theoretically correct solution is to create a Stackage snapshot for GHC 7.8 + Haskell Platform. And if there's demand for that, I'll bite the bullet and do it, but it's not an easy bullet to bite. But frankly, I'm not hearing a lot of users saying that they want to peg Haskell Platform versions specifically.

    In fact, the only users who really seem to want to stick to Haskell Platform versions are Windows users, and the main reason for this is the complexity in installing the network package on Windows. I think there are three possible solutions to this issue, without forcing Windows users onto old versions of packages:

    1. Modify the network package to be easier to install on Windows. I really hope this has some progress. If this is too unstable to be included in the official Hackage release, we could instead have an experimental Stackage snapshot for Windows with that modification applied.
    2. Tell Windows users to simply bypass Stackage and yesod-platform, with the possibility of more build problems on that platform.
      • We could similarly recommend Windows users develop in a Linux virtual machine/Docker image.
    3. Provide a Windows distribution of GHC + cabal-install + network. With the newly split network/network-uri, this is a serious possibility.

Despite these issues, I think Stackage Server is a definite improvement on yesod-platform on Linux and Mac, and will likely still improve the situation on Windows, once we figure out the Haskell Platform problems.

I'm not making any immediate changes. I'd very much like to hear people using Yesod on various operating systems to see how these changes will affect them.

August 09, 2014 05:10 AM

August 08, 2014

Brent Yorgey

Maniac week

Inspired by Bethany Soule (and indirectly by Nick Winter, and also by the fact that my dissertation defense and the start of the semester are looming), I am planning a “maniac week” while Joyia and Noah will be at the beach with my family (I will join them just for the weekend). The idea is to eliminate as many distractions as possible and to do a ton of focused work. Publically committing (like this) to a time frame, ground rules, and to putting up a time-lapse video of it afterwards are what actually make it work—if I don’t succeed I’ll have to admit it here on my blog; if I waste time on Facebook the whole internet will see it in the video; etc. (There’s actually no danger of wasting time on Facebook in particular since I have it blocked, but you get the idea.)

Here are the rules:

  • I will start at 6pm (or thereabouts) on Friday, August 8.
  • I will continue until 10pm on Wednesday, August 13, with the exception of the morning of Sunday, August 10 (until 2pm).
  • I will get at least 7.5 hours of sleep each night.
  • I will not eat cereal for any meal other than breakfast.
  • I will reserve 3 hours per day for things like showering, eating, and just plain resting.  Such things will be tracked by the TagTime tag “notwork”.
  • I will spend the remaining 13.5 hours per day working productively. Things that will count as productive work:
    • Working on my dissertation
    • Course prep for CS 354 (lecture and assignment planning, etc.) and CS 134 (reading through the textbook); making anki decks with names and faces for both courses
    • Updating my academic website (finish converting to Hakyll 4; add potential research and independent study topics for undergraduates)
    • Processing FogBugz tickets
    • I may work on other research or coding projects (e.g. diagrams) each day, but only after spending at least 7 hours on my dissertation.
  • I will not go on IRC at all during the week.  I will disable email notifications on my phone (but keep the phone around for TagTime), and close and block gmail in my browser.  I will also disable the program I use to check my UPenn email account.
  • For FogBugz tickets which require responding to emails, I will simply write the email in a text file and send it later.
  • I may read incoming email and write short replies on my phone, but will keep it to a bare minimum.
  • I will not read any RSS feeds during the week.  I will block feedly in my browser.
  • On August 18 I will post a time-lapse video of August 8-13.  I’ll probably also write a post-mortem blog post, if I feel like I have anything interesting to say.
  • I reserve the right to tweak these rules (by editing this post) up until August 8 at 6pm.  After that point it’s shut up and work time, and I cannot change the rules any more.

And no, I’m not crazy. You (yes, you) could do this too.

by Brent at August 08, 2014 07:43 PM

Bryan O'Sullivan

criterion 1.0

<html xmlns=""> <head> <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/> <meta content="text/css" http-equiv="Content-Style-Type"/> <meta content="pandoc" name="generator"/> <style type="text/css">code{white-space: pre;}</style> </head> <body>

Almost five years after I initially released criterion, I'm delighted to announce a major release with a large number of appealing new features.

As always, you can install the latest goodness using cabal install criterion, or fetch the source from github.

Please let me know if you find criterion useful!

New documentation

I built both a home page and a thorough tutorial for criterion. I've also extended the inline documentation and added a number of new examples.

All of the documentation lives in the github repo, so if you'd like to see something improved, please send a bug report or pull request.

New execution engine

Criterion's model of execution has evolved, becoming vastly more reliable and accurate. It can now measure events that take just a few hundred picoseconds.

benchmarking return ()
time                 512.9 ps   (512.8 ps .. 513.1 ps)

While almost all of the core types have changed, criterion should remain API-compatible with the vast majority of your benchmarking code.

New metrics

In addition to wall-clock time, criterion can now measure and regress on the following metrics:

  • CPU time
  • CPU cycles
  • bytes allocated
  • number of garbage collections
  • number of bytes copied during GC
  • wall-clock time spent in mutator threads
  • CPU time spent running mutator threads
  • wall-clock time spent doing GC
  • CPU time spent doing GC

Linear regression

Criterion now supports linear regression of a number of metrics.

Here's a regression conducted using --regress cycles:iters:

cycles:              1.000 R²   (1.000 R² .. 1.000 R²)
  iters              47.718     (47.657 .. 47.805)

The first line of the output is the R² goodness-of-fit measure for this regression, and the second is the number of CPU cycles (measured using the rdtsc instruction) to execute the operation in question (integer division).

This next regression uses --regress allocated:iters to measure the number of bytes allocated while constructing an IntMap of 40,000 values.

allocated:           1.000 R²   (1.000 R² .. 1.000 R²)
  iters              4.382e7    (4.379e7 .. 4.384e7)

(That's a little under 42 megabytes.)

New outputs

While its support for active HTML has improved, criterion can also now output JSON and JUnit XML files.

New internals

Criterion has received its first spring cleaning, and is much easier to understand as a result.


I was inspired into some of this work by the efforts of the authors of the OCaml Core_bench package.

</body> </html>

by Bryan O'Sullivan at August 08, 2014 10:02 AM

August 07, 2014

wren gayle romano

On being the "same" or "different": Introduction to Apartness

Meanwhile, back in math land... A couple-few months ago I was doing some work on apartness relations. In particular, I was looking into foundational issues, and into what an apartness-based (rather than equality-based) dependently-typed programming language would look like. Unfortunately, too many folks think "constructive mathematics" only means BHK-style intuitionistic logic— whereas constructive mathematics includes all sorts of other concepts, and they really should be better known!

So I started writing a preamble post, introducing the basic definitions and ideas behind apartnesses, and... well, I kinda got carried away. Instead of a blog post I kinda ended up with a short chapter. And then, well, panic struck. In the interests of Publish Ever, Publish Often, I thought I might as well share it: a brief introduction to apartness relations. As with my blog posts, I'm releasing it under Creative Commons Attribution-NonCommercial-NoDerivs 4.0; so feel free to share it and use it for classes. But, unlike the other columbicubiculomania files, it is not ShareAlike— since I may actually turn it into a published chapter someday. So do respect that. And if you have a book that needs some chapters on apartness relations, get in touch!

The intro goes a little something like this:

We often talk about values being "the same as" or "different from" one another. But how can we formalize these notions? In particular, how should we do so in a constructive setting?

Constructively, we lack a general axiom for double-negation elimination; therefore, every primitive notion gives rise to both strong (strictly positive) and weak (doubly-negated) propositions. Thus, from the denial of (weak) difference we can only conclude weak sameness. Consequently, in the constructive setting it is often desirable to take difference to be a primitive— so that, from the denial of strong difference we can in fact conclude strong sameness.

This ability "un-negate" sameness is the principal reason for taking difference to be one of our primitive notions. While nice in and of itself, it also causes the strong and weak notions of sameness to become logically equivalent (thm 1.4); enabling us to drop the qualifiers when discussing sameness.

But if not being different is enough to be considered the same, then do we still need sameness to be primitive? To simplify our reasoning, we may wish to have sameness be defined as the lack of difference. However, this is not without complications. Sameness has been considered a natural primitive for so long that it has accrued many additional non-propositional properties (e.g., the substitution principle). So, if we eliminate the propositional notion of primitive equality, we will need somewhere else to hang those coats.

The rest of the paper fleshes out these various ideas.

comment count unavailable comments

August 07, 2014 09:51 AM

Oliver Charles

Working with postgresql-simple with generics-sop

The least interesting part of my job as a programmer is the act of pressing keys on a keyboard, and thus I actively seek ways to reduce typing. As programmers, we aim for reuse in a our programs - abstracting commonality into reusable functions such that our programs get more concise. Functional programmers are aware of the benefits of higher-order functions as one form of generic programming, but another powerful technique is that of data type generic programming.

This variant of generic programming allows one to build programs that work over arbitrary data types, providing they have some sort of known “shape”. We describe the shape of data types by representing them via a code - often we can describe a data type as a sum of products. By sum, we are talking about the choice of a constructor in a data type (such as choosing between Left and Right to construct Either values), and by product we mean the individual fields in a constructor (such as the individual fields in a record).

Last month, Edsko and Löh announced a new library for generic programming: generics-sop. I’ve been playing with this library in the last couple of days, and I absolutely love the approach. In today’s short post, I want to demonstrate how easy it is to use this library. I don’t plan to go into a lot of detail, but I encourage interested readers to check out the associated paper - True Sums of Products - a paper with a lovely balance of theory and a plethora of examples.


When working with postgresql-simple, one often defines records and corresponding FromRow and ToRow instances. Let’s assume we’re modelling a library. No library is complete without books, so we might begin with a record such as:

data Book = Book
  { bookTitle :: Text
  , bookAuthor :: Text
  , bookISBN :: ISBN
  , bookPublishYear :: Int

In order to store and retrieve these in our database, we need to write the following instances:

instance FromRow Book where
  toRow = Book <$> field <*> field <*> field <*> field

instance ToRow Book where
  toRow Book{..} =
    [ toField bookTitle
    , toField bookAuthor
    , toField bookISBN
    , toField bookPublishYear

As you can see - that’s a lot of boilerplate. In fact, it’s nearly twice as much code as the data type itself! The definitions of these instances are trivial, so it’s frustrating that I have to manually type the implementation bodies by hand. It’s here that we turn to generics-sop.

First, we’re going to need a bit of boiler-plate in order to manipulate Books generically:

data Book = ...
  deriving (GHC.Generics.Generic)

instance Generics.SOP.Generic Book

We derive generic representations of our Book using GHC.Generics, and in turn use this generic representation to derive the Generics.SOP.Generic instance. With this out of the way, we’re ready to work with Books in a generic manner.


The generics-sop library works by manipulating heterogeneous lists of data. If we look at our Book data type, we can see that the following two are morally describing the same data:

book = Book "Conceptual Mathematics" "Lawvere, Schanuel" "978-0-521-71916-2" 2009
book = [ "Conceptual Mathematics", "Lawvere, Schanuel", "978-0-521-71916-2", 2009 ]

Of course, we can’t actually write such a thing in Haskell - lists are required to have all their elements of the same type. However, using modern GHC extensions, we can get very close to modelling this:

data HList :: [*] -> * where
  Nil :: HList '[]
  (:*) :: x -> HList xs -> HList (x ': xs)

book :: HList '[Text, Text, ISBN, Int]
book = "Conceptual Mathematics"
    :* "Lawvere, Schanuel"
    :* "978-0-521-71916-2"
    :* 2009
    :* Nil

Once we begin working in this domain, a lot of the techniques we’re already familiar with continue fairly naturally. We can map over these lists, exploit their applicative functor-like structure, fold them, and so on.

generics-sop continues in the trend, using kind polymorphism and a few other techniques to maximise generality. We can see what exactly is going on with generics-sop if we ask GHCI for the :kind! of Book’s generic Code:

> :kind! Code Book
Code Book = SOP I '[ '[ Text, Text, ISBN, Int ] ]

The list of fields is contained within another list of all possible constructors - as Book only has one constructor, thus there is only one element in the outer list.

FromRow, Generically

How does this help us solve the problem of our FromRow and ToRow instances? First, let’s think about what’s happening when we write instances of FromRow. Our Book data type has four fields, so we need to use field four times. field has side effects in the RowParser functor, so we sequence all of these calls using applicative syntax, finally applying the results to the Book constructor.

Now that we’ve broken the problem down, we’ll start by solving our first problem - calling field the correct number of times. Calling field means we need to have an instance of FromField for each field in a constructor, so to enforce this, we can use All to require all fields have an instance of a type class. We also use a little trick with Proxy to specify which type class we need to use. We combine all of this with hcpure, which is a variant of pure that can be used to build a product:

fields :: (All FromField xs, SingI xs) => NP RowParser xs
fields = hcpure fromField field
  where fromField = Proxy :: Proxy FromField

So far, we’ve built a product of field calls, which you can think of as being a list of RowParsers - something akin to [RowParser ..]. However, we need a single row parser returning multiple values, which is more like RowParser [..]. In the Prelude we have a function to sequence a list of monadic actions:

sequence :: Monad m => [m a] -> m [a]

There is an equivalent in generics-sop for working with heterogeneous lists - hsequence. Thus if we hsequence our fields, we build a single RowParser that returns a product of values:

fields :: (All FromField xs, SingI xs) => RowParser (NP I xs)
fields = hsequence (hcpure fromField field)
  where fromField = Proxy :: Proxy FromField

(I is the “do nothing” identity functor).

Remarkably, these few lines of code are enough to construct data types. All we need to do is embed this product in a constructor of a sum, and then switch from the generic representation to a concrete data type. We’ll restrict ourselves to data types that have only one constructor, and this constraint is mentioned in the type below (Code a ~ '[ xs ] forces a to have only one constructor):

  :: (All FromField xs, Code a ~ '[xs], SingI xs, Generic a)
  => RowParser a
gfrowRow = to . SOP . Z <$> hsequence (hcpure fromField field)
  where fromField = Proxy :: Proxy FromField

That’s all there is to it! No type class instances, no skipping over meta-data - we just build a list of field calls, sequence them, and turn the result into our data type.

ToRow, Generically

It’s not hard to apply the same ideas for ToRow. Recall the definition of ToRow:

class ToRow a where
  toRow :: a -> [Action]

toRow takes a value of type a and turns it into a list of actions. Usually, we have one action for each field - we just call toField on each field in the record.

To work with data generically, we first need move from the original data type to its generic representation, which we can do with from and a little bit of pattern matching:

gtoRow :: (Generic a, Code a ~ '[xs]) => a -> [Action]
gtoRow a =
  case from a of
    SOP (Z xs) -> _

Here we pattern match into the fields of the first constructor of the data type. xs is now a product of all fields, and we can begin turning into Actions. The most natural way to do this is simply to map toField over each field, collecting the resulting Actions into a list. That is, we’d like to do:

map toField xs

That’s not quite possible in generics-sop, but we can get very close. Using hcliftA, we can lift a method of a type class over a heterogeneous list:

gtoRow :: (Generic a, Code a ~ '[xs], All ToField xs, SingI xs) => a -> [Action]
gtoRow a =
  case from a of
    SOP (Z xs) -> _ (hcliftA toFieldP (K . toField . unI) xs)

  where toFieldP = Proxy :: Proxy ToField

We unwrap from the identity functor I, call toField on the value, and then pack this back up using the constant functor K. The details here are a little subtle, but essentially this moves us from a heterogeneous list to a homogeneous list, where each element of the list is an Action. Now that we have a homogeneous list, we can switch back to a more basic representation by collapsing the structure with hcollapse:

gtoRow :: (Generic a, Code a ~ '[xs], All ToField xs, SingI xs) => a -> [Action]
gtoRow a =
  case from a of
    SOP (Z xs) -> hcollapse (hcliftA toFieldP (K . toField . unI) xs)

  where toFieldP = Proxy :: Proxy ToField

Admittedly this definition is a little more complicated than one might hope, but it’s still extremely concise and declarative - there’s only a little bit of noise added. However, again we should note there was no need to write type class instances, perform explicit recursion or deal with meta-data - generics-sop stayed out of way and gave us just what we needed.


Now that we have gfromRow and gtoRow it’s easy to extend our application. Perhaps we now want to extend our database with Author objects. We’re now free to do so, with minimal boiler plate:

data Book = Book
  { bookId :: Int
  , bookTitle :: Text
  , bookAuthorId :: Int
  , bookISBN :: ISBN
  , bookPublishYear :: Int
  } deriving (GHC.Generics.Generic)

instance Generic.SOP.Generic Book
instance FromRow Book where fromRow = gfromRow
instance ToRow Book where toRow = gtoRow

data Author = Author
  { authorId :: Int
  , authorName :: Text
  , authorCountry :: Country
  } deriving (GHC.Generics.Generic)

instance Generic.SOP.Generic Author
instance FromRow Author where fromRow = gfromRow
instance ToRow Author where toRow = gtoRow

generics-sop is a powerful library for dealing with data generically. By using heterogeneous lists, the techniques we’ve learnt at the value level naturally extend and we can begin to think of working with generic data in a declarative manner. For me, this appeal to familiar techniques makes it easy to dive straight in to writing generic functions - I’ve already spent time learning to think in maps and folds, it’s nice to see the ideas transfer to yet another problem domain.

generics-sop goes a lot further than we’ve seen in this post. For more real-world examples, see the links at the top of the generics-sop Hackage page.

August 07, 2014 12:00 AM

August 06, 2014

Yesod Web Framework

Announcing auto-update

Kazu and I are happy to announce the first release of auto-update, a library to run update actions on a given schedule. To make it more concrete, let's start with a motivating example.

Suppose you're writing a web service which will return the current time. This is simple enough with WAI and Warp, e.g.:

{-# LANGUAGE OverloadedStrings #-}
import           Data.ByteString.Lazy.Char8 (pack)
import           Data.Time                  (formatTime, getCurrentTime)
import           Network.HTTP.Types         (status200)
import           Network.Wai                (responseLBS)
import           Network.Wai.Handler.Warp   (run)
import           System.Locale              (defaultTimeLocale)

main :: IO ()
main =
    run 3000 app
    app _ respond = do
        now <- getCurrentTime
        respond $ responseLBS status200 [("Content-Type", "text/plain")]
                $ pack $ formatTime defaultTimeLocale "%c" now

This is all well and good, but it's a bit inefficient. Imagine if you have a thousand requests per second (some people really like do know what time it is). We will end up recalculating the string representation of the time a 999 extra times than is necessary! To work around this, we have a simple solution: spawn a worker thread to calculate the time once per second. (Note: it will actually calculate it slightly less than once per second due to the way threadDelay works; we're assuming we have a little bit of latitude in returning a value thats a few milliseconds off.)

{-# LANGUAGE OverloadedStrings #-}
import           Control.Concurrent         (forkIO, threadDelay)
import           Control.Monad              (forever)
import           Data.ByteString.Lazy.Char8 (ByteString, pack)
import           Data.IORef                 (newIORef, readIORef, writeIORef)
import           Data.Time                  (formatTime, getCurrentTime)
import           Network.HTTP.Types         (status200)
import           Network.Wai                (responseLBS)
import           Network.Wai.Handler.Warp   (run)
import           System.Locale              (defaultTimeLocale)

getCurrentTimeString :: IO ByteString
getCurrentTimeString = do
    now <- getCurrentTime
    return $ pack $ formatTime defaultTimeLocale "%c" now

main :: IO ()
main = do
    timeRef <- getCurrentTimeString >>= newIORef
    _ <- forkIO $ forever $ do
        threadDelay 1000000
        getCurrentTimeString >>= writeIORef timeRef
    run 3000 (app timeRef)
    app timeRef _ respond = do
        time <- readIORef timeRef
        respond $ responseLBS status200 [("Content-Type", "text/plain")] time

Now we will calculate the current time once per second, which is far more efficient... right? Well, it depends on server load. Previously, we talked about a server getting a thousand requests per second. Let's instead reverse it: a server that gets one request every thousand seconds. In that case, our optimization turns into a pessimization.

This problem doesn't just affect getting the current time. Another example is flushing logs. A hot web server could be crippled by flushing logs to disk on every request, whereas flushing once a second on a less popular server simply keeps the process running for no reason. One option is to put the power in the hands of users of a library to decide how often to flush. But often times, we won't know until runtime how frequently a service will be requested. Or even more complicated: traffic will come in spikes, with both busy and idle times.

(Note that I've only given examples of running web servers, though I'm certain there are plenty of other examples out there to draw from.)

This is the problem that auto-update comes to solve. With auto-update, you declare an update function, a frequency with which it should run, and a threshold at which it should "daemonize". The first few times you request a value, it's calculated in the main thread. Once you cross the daemonize threshold, a dedicated worker thread is spawned to recalculate the value. If the value is not requested during an update period, the worker thread is shut down, and we go back to the beginning.

Let's see how our running example works out with this:

{-# LANGUAGE OverloadedStrings #-}
import           Control.AutoUpdate         (defaultUpdateSettings,
                                             mkAutoUpdate, updateAction)
import           Data.ByteString.Lazy.Char8 (ByteString, pack)
import           Data.Time                  (formatTime, getCurrentTime)
import           Network.HTTP.Types         (status200)
import           Network.Wai                (responseLBS)
import           Network.Wai.Handler.Warp   (run)
import           System.Locale              (defaultTimeLocale)

getCurrentTimeString :: IO ByteString
getCurrentTimeString = do
    now <- getCurrentTime
    return $ pack $ formatTime defaultTimeLocale "%c" now

main :: IO ()
main = do
    getTime <- mkAutoUpdate defaultUpdateSettings
        { updateAction = getCurrentTimeString
    run 3000 (app getTime)
    app getTime _ respond = do
        time <- getTime
        respond $ responseLBS status200 [("Content-Type", "text/plain")] time

If you want to see the impact of this change, add a putStrLn call to getCurrentTimeString and make a bunch of requests to the service. You should see just one request per second, once you get past that initial threshold period (default of 3).

Kazu and I have started using this library in a few places:

  • fast-logger no longer requires explicit flushing; it's handled for you automatically.
  • wai-logger and wai-extra's request logger, by extension, inherit this functionality.
  • Warp no longer has a dedicated thread for getting the current time.
  • The Yesod scaffolding was able to get rid of an annoying bit of commentary.

Hopefully others will enjoy and use this library as well.


The second module in auto-update is Control.Reaper. This provides something similar, but slightly different, from Control.AutoUpdate. The goal is to spawn reaper/cleanup threads on demand. These threads can handle such things as:

  • Recycling resources in a resource pool.
  • Closing out unused connections in a connection pool.
  • Terminating threads that have overstayed a timeout.

This module is currently being used in Warp for slowloris timeouts and file descriptor cache management, though I will likely use it in http-client in the near future as well for its connection manager management.

August 06, 2014 07:10 AM

Dominic Steinitz

Fun with (Kalman) Filters Part II


Suppose we have particle moving in at constant velocity in 1 dimension, where the velocity is sampled from a distribution. We can observe the position of the particle at fixed intervals and we wish to estimate its initial velocity. For generality, let us assume that the positions and the velocities can be perturbed at each interval and that our measurements are noisy.

A point of Haskell interest: using type level literals caught a bug in the mathematical description (one of the dimensions of a matrix was incorrect). Of course, this would have become apparent at run-time but proof checking of this nature is surely the future for mathematicians. One could conceive of writing an implementation of an algorithm or proof, compiling it but never actually running it purely to check that some aspects of the algorithm or proof are correct.

The Mathematical Model

We take the position as x_i and the velocity v_i:

\displaystyle   \begin{aligned}  x_i &= x_{i-1} + \Delta T v_{i-1} + \psi^{(x)}_i \\  v_i &= v_{i-1} + \psi^{(v)}_i \\  y_i &= a_i x_i + \upsilon_i  \end{aligned}

where \psi^{(x)}_i, \psi^{(v)}_i and \upsilon_i are all IID normal with means of 0 and variances of \sigma^2_x, \sigma^2_v and \sigma^2_y

We can re-write this as

\displaystyle   \begin{aligned}  \boldsymbol{x}_i &= \boldsymbol{A}_{i-1}\boldsymbol{x}_{i-1} + \boldsymbol{\psi}_{i-1} \\  \boldsymbol{y}_i &= \boldsymbol{H}_i\boldsymbol{x}_i + \boldsymbol{\upsilon}_i  \end{aligned}


\displaystyle   \boldsymbol{A}_i =    \begin{bmatrix}      1 & \Delta T\\      0 & 1\\    \end{bmatrix}  ,\quad  \boldsymbol{H}_i =    \begin{bmatrix}      a_i & 0 \\    \end{bmatrix}  ,\quad  \boldsymbol{\psi}_i \sim {\cal{N}}\big(0,\boldsymbol{\Sigma}^{(x)}_i\big)  ,\quad  \boldsymbol{\Sigma}^{(x)}_i =    \begin{bmatrix}      \sigma^2_{x} & 0\\      0 & \sigma^2_{v} \\    \end{bmatrix}  ,\quad  \boldsymbol{\upsilon}_i \sim {\cal{N}}\big(0,\boldsymbol{\Sigma}^{(y)}_i\big)  ,\quad  \boldsymbol{\Sigma}^{(y)}_i =    \begin{bmatrix}      \sigma^2_{z} \\    \end{bmatrix}

Let us denote the mean and variance of \boldsymbol{X}_i\,\vert\,\boldsymbol{Y}_{i-1} as \hat{\boldsymbol{x}}^\flat_i and \hat{\boldsymbol{\Sigma}}^\flat_i respectively and note that

\displaystyle   \begin{aligned}  {\boldsymbol{Y}_i}\,\vert\,{\boldsymbol{Y}_{i-1}} =  {\boldsymbol{H}_i\boldsymbol{X}_i\,\vert\,{\boldsymbol{Y}_{i-1}} + \boldsymbol{\Upsilon}_i}\,\vert\,{\boldsymbol{Y}_{i-1}} =  {\boldsymbol{H}_i\boldsymbol{X}_i\,\vert\,{\boldsymbol{Y}_{i-1}} + \boldsymbol{\Upsilon}_i}  \end{aligned}

Since {\boldsymbol{X}_i}\,\vert\,{\boldsymbol{Y}_{i-1}} and {\boldsymbol{Y}_i}\,\vert\,{\boldsymbol{Y}_{i-1}} are jointly Gaussian and recalling that ({\hat{\boldsymbol{\Sigma}}^\flat_i})^\top = \hat{\boldsymbol{\Sigma}}^\flat_i as covariance matrices are symmetric, we can calculate their mean and covariance matrix as

\displaystyle   \begin{bmatrix}      \hat{\boldsymbol{x}}^\flat_i \\      \boldsymbol{H}_i\hat{\boldsymbol{x}}^\flat_i  \end{bmatrix}  ,\quad  \begin{bmatrix}      \hat{\boldsymbol{\Sigma}}^\flat_i & \hat{\boldsymbol{\Sigma}}^\flat_i \boldsymbol{H}_i^\top \\       \boldsymbol{H}_i \hat{\boldsymbol{\Sigma}}^\flat_i & \boldsymbol{H}_i \hat{\boldsymbol{\Sigma}}^\flat_i \boldsymbol{H}_i^\top + \boldsymbol{\Sigma}^{(y)}_i \\  \end{bmatrix}

We can now use standard formulæ which say if

\displaystyle   \begin{bmatrix}      \boldsymbol{X} \\      \boldsymbol{Y}  \end{bmatrix}  \sim  {\cal{N}}  \begin{bmatrix}  \begin{bmatrix}      \boldsymbol{\mu}_x \\      \boldsymbol{\mu}_y  \end{bmatrix}  &  ,  &  \begin{bmatrix}      \boldsymbol{\Sigma}_x & \boldsymbol{\Sigma}_{xy} \\      \boldsymbol{\Sigma}^\top_{xy} & \boldsymbol{\Sigma}_y  \end{bmatrix}  \end{bmatrix}


\displaystyle   \boldsymbol{X}\,\vert\,\boldsymbol{Y}=\boldsymbol{y} \sim {{\cal{N}}\big( \boldsymbol{\mu}_x + \boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}^{-1}_y(\boldsymbol{y} - \boldsymbol{\mu}_y) , \boldsymbol{\Sigma}_x - \boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}^{-1}_y\boldsymbol{\Sigma}^\top_{xy}\big)}

and apply this to

\displaystyle   (\boldsymbol{X}_i\,\vert\, \boldsymbol{Y}_{i-1})\,\vert\,(\boldsymbol{Y}_i\,\vert\, \boldsymbol{Y}_{i-1})

to give

\displaystyle   \boldsymbol{X}_i\,\vert\, \boldsymbol{Y}_{i} = \boldsymbol{y}_i  \sim  {{\cal{N}}\big( \hat{\boldsymbol{x}}^\flat_i + \hat{\boldsymbol{\Sigma}}^\flat_i \boldsymbol{H}_i^\top  \big(\boldsymbol{H}_i \hat{\boldsymbol{\Sigma}}^\flat_i \boldsymbol{H}_i^\top + \boldsymbol{\Sigma}^{(y)}_i\big)^{-1}  (\boldsymbol{y}_i - \boldsymbol{H}_i\hat{\boldsymbol{x}}^\flat_i) , \hat{\boldsymbol{\Sigma}}^\flat_i - \hat{\boldsymbol{\Sigma}}^\flat_i \boldsymbol{H}_i^\top(\boldsymbol{H}_i \hat{\boldsymbol{\Sigma}}^\flat_i \boldsymbol{H}_i^\top + \boldsymbol{\Sigma}^{(y)}_i)^{-1}\boldsymbol{H}_i \hat{\boldsymbol{\Sigma}}^\flat_i\big)}

This is called the measurement update; more explicitly

\displaystyle   \begin{aligned}  \hat{\boldsymbol{x}}^i &\triangleq  \hat{\boldsymbol{x}}^\flat_i +  \hat{\boldsymbol{\Sigma}}^\flat_i  \boldsymbol{H}_i^\top  \big(\boldsymbol{H}_i \hat{\boldsymbol{\Sigma}}^\flat_i \boldsymbol{H}_i^\top + \boldsymbol{\Sigma}^{(y)}_i\big)^{-1}  (\boldsymbol{y}_i - \boldsymbol{H}_i\hat{\boldsymbol{x}}^\flat_i) \\  \hat{\boldsymbol{\Sigma}}_i &\triangleq  {\hat{\boldsymbol{\Sigma}}^\flat_i - \hat{\boldsymbol{\Sigma}}^\flat_i \boldsymbol{H}_i^\top(\boldsymbol{H}_i \hat{\boldsymbol{\Sigma}}^\flat_i \boldsymbol{H}_i^\top + \boldsymbol{\Sigma}^{(y)}_i)^{-1}\boldsymbol{H}_i \hat{\boldsymbol{\Sigma}}^\flat_i}  \end{aligned}

Sometimes the measurement residual \boldsymbol{v}_i, the measurement prediction covariance \boldsymbol{S}_i and the filter gain \boldsymbol{K}_i are defined and the measurement update is written as

\displaystyle   \begin{aligned}  \boldsymbol{v}_i & \triangleq  \boldsymbol{y}_i - \boldsymbol{H}_i\hat{\boldsymbol{x}}^\flat_i \\  \boldsymbol{S}_i & \triangleq  \boldsymbol{H}_i \hat{\boldsymbol{\Sigma}}^\flat_i  \boldsymbol{H}_i^\top + \boldsymbol{\Sigma}^{(y)}_i \\  \boldsymbol{K}_i & \triangleq \hat{\boldsymbol{\Sigma}}^\flat_i  \boldsymbol{H}_i^\top\boldsymbol{S}^{-1}_i \\  \hat{\boldsymbol{x}}^i &\triangleq \hat{\boldsymbol{x}}^\flat_i + \boldsymbol{K}_i\boldsymbol{v}_i \\  \hat{\boldsymbol{\Sigma}}_i &\triangleq \hat{\boldsymbol{\Sigma}}^\flat_i - \boldsymbol{K}_i\boldsymbol{S}_i\boldsymbol{K}^\top_i  \end{aligned}

We further have that

\displaystyle   \begin{aligned}  {\boldsymbol{X}_i}\,\vert\,{\boldsymbol{Y}_{i-1}} =  {\boldsymbol{A}_i\boldsymbol{X}_{i-1}\,\vert\,{\boldsymbol{Y}_{i-1}} + \boldsymbol{\Psi}_{i-1}}\,\vert\,{\boldsymbol{Y}_{i-1}} =  {\boldsymbol{A}_i\boldsymbol{X}_{i-1}\,\vert\,{\boldsymbol{Y}_{i-1}} + \boldsymbol{\Psi}_i}  \end{aligned}

We thus obtain the Kalman filter prediction step:

\displaystyle   \begin{aligned}  \hat{\boldsymbol{x}}^\flat_i &=  \boldsymbol{A}_{i-1}\hat{\boldsymbol{x}}_{i-1} \\  \hat{\boldsymbol{\Sigma}}^\flat_i &= \boldsymbol{A}_{i-1}                                       \hat{\boldsymbol{\Sigma}}_{i-1}                                       \boldsymbol{A}_{i-1}^\top                                     + \boldsymbol{\Sigma}^{(x)}_{i-1}  \end{aligned}

Further information can be found in (Boyd 2008), (Kleeman 1996) and (Särkkä 2013).

A Haskell Implementation

The hmatrix now uses type level literals via the DataKind extension in ghc to enforce compatibility of matrix and vector operations at the type level. See here for more details. Sadly a bug in the hmatrix implementation means we can’t currently use this excellent feature and we content ourselves with comments describing what the types would be were it possible to use it.

> {-# OPTIONS_GHC -Wall                     #-}
> {-# OPTIONS_GHC -fno-warn-name-shadowing  #-}
> {-# OPTIONS_GHC -fno-warn-type-defaults   #-}
> {-# OPTIONS_GHC -fno-warn-unused-do-bind  #-}
> {-# OPTIONS_GHC -fno-warn-missing-methods #-}
> {-# OPTIONS_GHC -fno-warn-orphans         #-}
> {-# LANGUAGE DataKinds                    #-}
> {-# LANGUAGE ScopedTypeVariables          #-}
> {-# LANGUAGE RankNTypes                   #-}
> module FunWithKalmanPart1a where
> import Numeric.LinearAlgebra.HMatrix hiding ( outer )
> import Data.Random.Source.PureMT
> import Data.Random hiding ( gamma )
> import Control.Monad.State
> import qualified Control.Monad.Writer as W
> import Control.Monad.Loops

Let us make our model almost deterministic but with noisy observations.

> stateVariance :: Double
> stateVariance = 1e-6
> obsVariance :: Double
> obsVariance = 1.0

And let us start with a prior normal distribution with a mean position and velocity of 0 with moderate variances and no correlation.

> -- muPrior :: R 2
> muPrior :: Vector Double
> muPrior = vector [0.0, 0.0]
> -- sigmaPrior :: Sq 2
> sigmaPrior :: Matrix Double
> sigmaPrior = (2 >< 2) [ 1e1,   0.0
>                       , 0.0,   1e1
>                       ]

We now set up the parameters for our model as outlined in the preceeding section.

> deltaT :: Double
> deltaT = 0.001
> -- bigA :: Sq 2
> bigA :: Matrix Double
> bigA = (2 >< 2) [ 1, deltaT
>                 , 0,      1
>                 ]
> a :: Double
> a = 1.0
> -- bigH :: L 1 2
> bigH :: Matrix Double
> bigH = (1 >< 2) [ a, 0
>                 ]
> -- bigSigmaY :: Sq 1
> bigSigmaY :: Matrix Double
> bigSigmaY = (1 >< 1) [ obsVariance ]
> -- bigSigmaX :: Sq 2
> bigSigmaX :: Matrix Double
> bigSigmaX = (2 >< 2) [ stateVariance, 0.0
>                      , 0.0,           stateVariance
>                      ]

The implementation of the Kalman filter using the hmatrix package is straightforward.

> -- outer ::  forall m n . (KnownNat m, KnownNat n) =>
> --           R n -> Sq n -> L m n -> Sq m -> Sq n -> Sq n -> [R m] -> [(R n, Sq n)]
> outer :: Vector Double
>          -> Matrix Double
>          -> Matrix Double
>          -> Matrix Double
>          -> Matrix Double
>          -> Matrix Double
>          -> [Vector Double]
>          -> [(Vector Double, Matrix Double)]
> outer muPrior sigmaPrior bigH bigSigmaY bigA bigSigmaX ys = result
>   where
>     result = scanl update (muPrior, sigmaPrior) ys
>     -- update :: (R n, Sq n) -> R m -> (R n, Sq n)
>     update (xHatFlat, bigSigmaHatFlat) y =
>       (xHatFlatNew, bigSigmaHatFlatNew)
>       where
>         -- v :: R m
>         v = y - bigH #> xHatFlat
>         -- bigS :: Sq m
>         bigS = bigH <> bigSigmaHatFlat <> (tr bigH) + bigSigmaY
>         -- bigK :: L n m
>         bigK = bigSigmaHatFlat <> (tr bigH) <> (inv bigS)
>         -- xHat :: R n
>         xHat = xHatFlat + bigK #> v
>         -- bigSigmaHat :: Sq n
>         bigSigmaHat = bigSigmaHatFlat - bigK <> bigS <> (tr bigK)
>         -- xHatFlatNew :: R n
>         xHatFlatNew = bigA #> xHat
>         -- bigSigmaHatFlatNew :: Sq n
>         bigSigmaHatFlatNew = bigA <> bigSigmaHat <> (tr bigA) + bigSigmaX

We create some ranodm data using our model parameters.

> singleSample ::(Double, Double) ->
>                RVarT (W.Writer [(Double, (Double, Double))]) (Double, Double)
> singleSample (xPrev, vPrev) = do
>   psiX <- rvarT (Normal 0.0 stateVariance)
>   let xNew = xPrev + deltaT * vPrev + psiX
>   psiV <- rvarT (Normal 0.0 stateVariance)
>   let vNew = vPrev + psiV
>   upsilon <- rvarT (Normal 0.0 obsVariance)
>   let y = a * xNew + upsilon
>   lift $ W.tell [(y, (xNew, vNew))]
>   return (xNew, vNew)
> streamSample :: RVarT (W.Writer [(Double, (Double, Double))]) (Double, Double)
> streamSample = iterateM_ singleSample (1.0, 1.0)
> samples :: ((Double, Double), [(Double, (Double, Double))])
> samples = W.runWriter (evalStateT (sample streamSample) (pureMT 2))

Here are the actual values of the randomly generated positions.

> actualXs :: [Double]
> actualXs = map (fst . snd) $ take nObs $ snd samples
> test :: [(Vector Double, Matrix Double)]
> test = outer muPrior sigmaPrior bigH bigSigmaY bigA bigSigmaX
>        (map (\x -> vector [x]) $ map fst $ snd samples)

And using the Kalman filter we can estimate the positions.

> estXs :: [Double]
> estXs = map (!!0) $ map toList $ map fst $ take nObs test
> nObs :: Int
> nObs = 1000

And we can see that the estimates track the actual positions quite nicely.

Of course we really wanted to estimate the velocity.

> actualVs :: [Double]
> actualVs = map (snd . snd) $ take nObs $ snd samples
> estVs :: [Double]
> estVs = map (!!1) $ map toList $ map fst $ take nObs test


Boyd, Stephen. 2008. “EE363 Linear Dynamical Systems.”

Kleeman, Lindsay. 1996. “Understanding and Applying Kalman Filtering.” In Proceedings of the Second Workshop on Perceptive Systems, Curtin University of Technology, Perth Western Australia (25-26 January 1996).

Särkkä, Simo. 2013. Bayesian Filtering and Smoothing. Vol. 3. Cambridge University Press.

by Dominic Steinitz at August 06, 2014 06:34 AM

Danny Gratzer

Equality is Hard

Posted on August 6, 2014

Equality seems like one of the simplest things to talk about in a theorem prover. After all, the notion of equality is something any small child can intuitively grasp. The sad bit is, while it’s quite easy to hand-wave about, how equality is formalized seems to be a rather complex topic.

In this post I’m going to attempt to cover a few of the main different means of “equality proofs” or identity types and the surrounding concepts. I’m opting for a slightly more informal approach in the hopes of covering more ground.

Definitional Equality

This is not really an equality type per say, but it’s worth stating explicitly what definitional equality is since I must refer to it several times throughout this post.

Two terms A and B are definitional equal is a judgment notated

Γ ⊢ A ≡ B

This is not a user level proof but rather a primitive, untyped judgment in the meta-theory of the language itself. The typing rules of the language will likely include a rule along the lines of

Γ ⊢ A ≡ B, Γ ⊢ x : A
     Γ ⊢ x : B

So this isn’t an identity type you would prove something with, but a much more magical notion that two things are completely the same to the typechecker.

Now in most type theories we have a slightly more powerful notion of definitional equality where not only are x ≡ y if x is y only by definition but also by computation.

So in Coq for example

(2 + 2) ≡ 4

Even though “definitionally” these are entirely separate entities. In most theories, definitionally equal means “inlining all definitions and with normalization”, but not all.

In type theories that distinguish between the two, the judgment that when normalized x is y is called judgmental equality. I won’t distinguish between the two further because most don’t, but it’s worth noting that they can be seen as separate concepts.

Propositional Equality

This is the sort of equality that we’ll spend the rest of our time discussing. Propositional equality is a particular type constructor with the type/kind

Id : (A : Set) → A → A → Type

We should be able to prove a number of definitions like

reflexivity  : (A : Set)(x     : A) → Id x x
symmetry     : (A : Set)(x y   : A) → Id x y → Id y x
transitivity : (A : Set)(x y z : A) → Id x y → Id y z → Id x z

This is an entirely separate issue from definitional equality since propositional equality is a concept that users can hypothesis about.

One very important difference is that we can make proofs like

sanity : Id 1 2 → ⊥

Since the identity proposition is a type family which can be used just like any other proposition. This is in stark contrast to definitional equality which a user can’t even normally utter!


This is arguably the simplest form of equality. Identity types are just normal inductive types with normal induction principles. The most common is equality given by Martin Lof

data Id (A : Set) : A → A → Type where
   Refl : (x : A) → Id x x

This yields a simple induction principle

id-ind : (P : (x y : A) → Id x y → Type)
       → ((x : A) → P x x (Refl x))
       → (x y : A)(p : Id x y) → P x y p

In other words, if we can prove that P holds for the reflexivity case, than P holds for any x and y where Id x y.

We can actually phrase Id in a number of ways, including

data Id (A : Set)(x : A) : A → Set where
  Refl : Id x x

This really makes a difference in the resulting induction principle

j : (A : Set)(x : A)(P : (y : A) → Id x y → Set)
  → P x Refl
  → (y : A)(p : Id x y) → P y p

This clearly turned out a bit differently! In particular now P is only parametrized over one value of A, y. This particular elimination is traditionally named j.

These alternative phrasings can have serious impacts on proofs that use them. It also has even more subtle effects on things like heterogeneous equality which we’ll discuss later.

The fact that this only relies on simple inductive principles is also a win for typechecking. Equality/substitution fall straight out of how normal inductive types are handled! This also means that we can keep decidability within reason.

The price we pay of course is that this is much more painful to work with. An intensional identity type means the burden of constructing our equality proofs falls on users. Furthermore, we lose the ability to talk about observational equality.

Observational equality is the idea that two “thingies” are indistinguishable by any test.

It’s clear that we can prove that if Id x y, then f x = f y, but it’s less clear how to go the other way and prove something like

fun_ext : (A B : Set)(f g : A → B)
         → ((x : A) → Id (f x) (g x)) → Id f g
fun_ext f g p = ??

Even though this is clearly desirable. If we know that f and g behave exactly the same way, we’d like our equality to be able to state that. However, we don’t know that f and g are constructed the same way, making this impossible to prove.

This can be introduced as an axiom but to maintain our inductively defined equality type we have to sacrifice one of the following

  1. Coherence
  2. Inductive types
  3. Extensionality
  4. Decidability

Some this has been avoided by regarding equality as an induction over the class of types as in Martin Lof’s intuitionist type theory.

In the type theory that we’ve outlined, this isn’t expressible sadly.

Definitional + Extensional

Some type theories go a different route to equality, giving us back the extensionality in the process. One of those type theories is extensional type theory.

In the simplest formulation, we have intensional type theory with a new rule, reflection

Γ ⊢ p : Id x y
  Γ ⊢ x ≡ y

This means that our normal propositional equality can be shoved back into the more magical definitional equality. This gives us a lot more power, all the typecheckers magic and support of definitional equality can be used with our equality types!

It isn’t all puppies an kittens though, arbitrary reflection can also make things undecidable in general. For example Martin Lof’s system is undecidable in with extensional equality.

It’s worth noting that no extensional type theory is implemented this way. Instead they’ve taken a different approach to defining types themselves!

In this model of ETT types are regarded as a partial equivalence relation (PER) over unityped (untyped if you want to get in a flamewar) lambda calculus terms.

These PERs precisely reflect the extensional equality at that “type” and we then check membership by reflexivity. So a : T is synonymous with (a, a) ∈ T. Notice that since we are dealing with a PER, we know that ∀ a. (a, a) ∈ T need not hold. This is reassuring, otherwise we’d be able to prove that every type was inhabited by every term!

The actual NuRPL&friends theory is a little more complicated than that. It’s not entirely dependent on PERs and allows a few different ways of introducing types, but I find that PERs are a helpful idea.

Propositional Extensionality

This is another flavor of extensional type theory which is really just intensional type theory plus some axioms.

We can arrive at this type theory in a number of ways, the simplest is to add axiom K

k : (A : Set)(x : A)(P : (x : A) → Id x x → Type)
  → P x (Refl x) → (p : Id x x) → P x p

This says that if we can prove that for any property P, P x (Refl x) holds, then it holds for any proof that Id x x. This is subtly different than straightforward induction on Id because here we’re not proving that a property parameterized over two different values of A, but only one.

This is horribly inconsistent in something like homotopy type theory but lends a bit of convenience to theories where we don’t give Id as much meaning.

Using k we can prove that for any p q : Id x y, then Id p q. In Agda notation

    prop : (A : Set)(x y : A)(p q : x ≡ y)
         → p ≡ q
    prop A x .x refl q = k A P (λ _ → refl) x q
      where P : (x : A) → x ≡ x → Set
            P _ p = refl ≡ p

This can be further refined to show that that we can eliminate all proofs that Id x x are Refl x

    rec : (A : Set)(P : A → Set)(x y : A)(p : P x) → x ≡ y → P y
    rec A P x .x p refl = p

    rec-refl-is-useless : (A : Set)(P : A → Set)(x : A)
                        → (p : P x)(eq : x ≡ x) → p ≡ rec A P x x p eq 
    rec-refl-is-useless A P x p eq with prop A x x eq refl
    rec-refl-is-useless A P x p .refl | refl = refl

This form of extensional type theory still leaves a clear distinction between propositional equality and definitional equality by avoiding a reflection rule. However, with rec-refl-is–useless we can do much of the same things, whenever we have something that matches on an equality proof we can just remove it.

We essentially have normal propositional equality, but with the knowledge that things can only be equal in 1 way, up to propositional equality!

Heterogeneous Equality

The next form of equality we’ll talk about is slightly different than previous ones. Heterogeneous equality is designed to co-exist in some other type theory and supplement the existing form of equality.

Heterogeneous equality is most commonly defined with John Major equality

    data JMeq : (A B : Set) → A → B → Set where
      JMrefl : (A : Set)(x : A) → JMeq A A x x

This is termed after a British politician since while it promises that any two terms can be equal regardless of their class (type), only two things from the same class can ever be equal.

Now remember how earlier I’d mentioned that how we phrase these inductive equality types can have a huge impact? We’ll here we can see that because the above definition doesn’t typecheck in Agda!

That’s because Agda is predicative, meaning that a type constructor can’t quantify over the same universe it occupies. We can however, cleverly phrase JMeq so to avoid this

    data JMeq (A : Set) : (B : Set) → A → B → Set where
      JMrefl : (a : A) → JMeq A A a a

Now the constructor avoids quantifying over Set and therefore fits inside the same universe as A and B.

JMeq is usually paired with an axiom to reflect heterogeneous equality back into our normal equality proof.

reflect : (A : Set)(x y : A) → JMeq x y → Id x y

This reflection doesn’t look necessary, but arises for similar reasons that dictate that k is unprovable.

It looks like this heterogeneous equality is a lot more trouble than it’s worth at first. It really shines when we’re working with terms that we know must be the same, but require pattern matching or other jiggering to prove.

If you’re looking for a concrete example, look no further than Observational Equality Now!. This paper gives allows observational equality to be jammed into a principally intensional system!

Wrap Up

So this has been a whirlwind tour through a lot of different type theories. I partially wrote this to gather some of this information in one (free) place. If there’s something here missing that you’d like to see added, feel free to comment or email me.

Thanks to Jon Sterling for proof reading and many subtle corrections :)

<script type="text/javascript"> /* * * CONFIGURATION VARIABLES: EDIT BEFORE PASTING INTO YOUR WEBPAGE * * */ var disqus_shortname = 'codeco'; // required: replace example with your forum shortname /* * * DON'T EDIT BELOW THIS LINE * * */ (function() { var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true; dsq.src = '//' + disqus_shortname + ''; (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq); })(); </script> <noscript>Please enable JavaScript to view the comments powered by Disqus.</noscript> comments powered by Disqus

August 06, 2014 12:00 AM