Planet Haskell

December 01, 2016

Douglas M. Auclair (geophf)

November 2016 1HaskellADay Problems and Solutions

by geophf at December 01, 2016 03:38 AM

November 29, 2016

Functional Jobs

Back End Functional Developer at NYU (Full-time)

Position Summary

The Databrary project is looking for a smart, energetic and flexible back end developer to join its technical team. The developer will act as the primary owner of the code base of our service. Working closely with the managing director and the service team, the developer will design, develop and maintain tools to enable behavioral researchers to collaborate, store, discover, explore and access video-based research datasets. (S)he will maintain an existing code base and build new features, enhancements and integrations.

Databrary is the leading open source video data-sharing system for developmental science. Datavyu is the leading free, open source, multi-platform video coding tool. This position provides a unique opportunity to play a central role in advancing open science through data sharing and reuse.

The ideal candidate is a self-starter who is not afraid of learning new technologies, thinks outside the box, takes initiative, has excellent attention to detail, and can see tasks through to completion both collaboratively in a team and independently. The developer will adapt to the evolving and growing needs of the project.

Essential Responsibilities/Functions

Research and evaluation

The developer will analyze and understand the current system and application architecture, logical and physical data models, security and storage implementation, and the code base; document them thoroughly; produce high-level architectural and call graph diagrams; and make recommendations to the managing director on future strategic direction.

Development and maintenance

The developer will maintain the existing code base and troubleshoot it to improve application reliability and performance. (S)he will lead development, manage releases, deploy code, and track bug and QA progress. (S)he will build dynamic, modular and responsive web applications by implementing clean, reusable, well-designed and well-tested code to add enhancements, features and new integrations to the platform in current technologies (Haskell, PostgreSQL, AngularJS) or any other secure, modern, sustainable web frameworks.

Innovation in data management

The developer will work closely with experts in the field to understand the complete data lifecycle and management for researchers. (S)he will advocate for and become a force of innovation at each step of activities undertaken in relation to the collection, processing, description, transformation, retention and reuse of research data. (S)he will design, develop, implement, test and validate existing and new data management and web-based tools to facilitate research.

Preferred Skills, Knowledge and Abilities

  • Hands-on experience with functional languages like Haskell, OCaml, F#, or Scala.

  • Knowledge of modern web frameworks in high-level languages such as Java, Ruby, Python or PHP, and of video technologies.

  • Knowledge of JavaScript, JS frameworks, HTML, CSS and other front end technologies.

  • Understanding of best practices in SDLC (Software Development Life Cycle).

  • Understanding of TDD (Test-driven development), security and design patterns.

  • Experience with version control, unix scripting, automation and DevOps practices.

  • Familiarity using CRM, project management and task management systems.

  • Passion for open source projects and building high quality software.

  • Strong written and oral communication skills.

  • Superior listening and analytical skills and a knack for tackling tough problems.

  • Ability to multitask and juggle multiple priorities and projects.

  • Adaptability and openness to learn and change.

Required Experience

  • Track record of designing scalable software for web applications in modern web frameworks.

  • Exceptional understanding of system architecture, object-oriented principles, web technologies, REST APIs and MVC patterns.

  • Solid knowledge of SQL and RDBMS like PostgreSQL.

  • Basic knowledge of scientific practices and research tools, such as Matlab, SPSS, or R.

Preferred Education

  • BS, MS or Ph.D in Computer Science, Information Technology or other relevant field.

New York University is an Equal Opportunity Employer. New York University is committed to a policy of equal treatment and opportunity in every aspect of its hiring and promotion process without regard to race, color, creed, religion, sex, pregnancy or childbirth (or related medical condition), sexual orientation, partnership status, gender and/or gender identity or expression, marital or parental status, national origin, ethnicity, alienage or citizenship status, veteran or military status, age, disability, predisposing genetic characteristics, domestic violence victim status, unemployment status, or any other legally protected basis. Women, racial and ethnic minorities, persons of minority sexual orientation or gender identity, individuals with disabilities, and veterans are encouraged to apply for vacant positions at all levels.

Get information on how to apply for this position.

November 29, 2016 08:15 PM

FP Complete

Do you like Scala? Give Haskell a try!

The language Scala promises a smooth migration path from object-oriented Java to functional programming. It runs on the JVM, has concepts from both OOP and FP, and places an emphasis on interoperability with Java.

As a multi-paradigm language, Scala can flatten the learning curve for those already familiar with OOP, at the price of making some compromises in language design that can make functional programming in Scala more cumbersome than it has to be.

If you like the functional aspects of Scala, but are somewhat bothered by the annoyances that doing FP in Scala incurs, you should take a look at Haskell! It is one of the languages that inspired Scala, so you will find many familiar concepts. By fully embracing the functional paradigm, Haskell makes functional programming much more convenient. The following is an (incomplete) overview of some of the things you can expect from Haskell.

If you want to try out Haskell right away, you can find instructions for getting started here. We have collected some useful training resources at FP Complete Haskell Syllabus, and also provide commercial training. If you're planning to use Haskell for a project and want some help, we can provide you with Consulting services.

Concise syntax

Scala uses a Java-like syntax, but avoids a lot of unnecessary boilerplate by making things like an end-of-line semicolon or empty code blocks { } optional.

Haskell goes much further in terms of conciseness of syntax. For instance, function application, the most common operation in a functional language, is indicated by whitespace, writing f x instead of f(x).

As an example of Haskell's conciseness, consider the definition of a binary tree containing data of some type A. In Scala, you'd define a trait, and a case object/case class each for an empty node and a node that contains an A, and left and right sub-trees, like this:

sealed trait Tree[+A]
case object Empty extends Tree[Nothing]
case class Node[A](v: A, l: Tree[A], r: Tree[A]) extends Tree[A]

The Haskell equivalent is

data Tree a = Empty
    | Node a (Tree a) (Tree a)
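To show how such a declaration is used, here is a minimal sketch of two functions over this Tree type, written with pattern matching on the two constructors. The names insert and toList are illustrative choices, not part of the original article, and the tree is treated as a binary search tree for the sake of the example.

```haskell
data Tree a = Empty
            | Node a (Tree a) (Tree a)
  deriving Show

-- Insert into a binary search tree; one equation per constructor.
insert :: Ord a => a -> Tree a -> Tree a
insert x Empty = Node x Empty Empty
insert x t@(Node v l r)
  | x < v     = Node v (insert x l) r
  | x > v     = Node v l (insert x r)
  | otherwise = t  -- value already present: leave the tree unchanged

-- In-order traversal yields the elements in ascending order.
toList :: Tree a -> [a]
toList Empty        = []
toList (Node v l r) = toList l ++ [v] ++ toList r

main :: IO ()
main = print (toList (foldr insert Empty [5, 2, 8, 1, 9 :: Int]))
```

Because Empty and Node are the only constructors, GHC can warn (with -Wall) if a function forgets to handle one of them.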

Another thing that is syntactically more pleasant in Haskell is currying: in Scala, if you want to use a curried function, you have to anticipate this when defining the function, and write

def curriedF(x: Int)(y: Int) = ...

instead of

def f(x: Int, y: Int) = ...

and then use an underscore to get a one-argument function, as in curriedF(x)_. In Haskell, you can curry any function by applying it to a subset of its arguments; if you define a function of two arguments, as in

f x y = ...

then f x is a function of one argument.
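A small sketch of partial application: scale is an ordinary two-argument function (a hypothetical name chosen for illustration), and supplying only its first argument produces a new one-argument function without any special syntax at the definition site.

```haskell
-- An ordinary two-argument function; no special currying syntax is needed.
scale :: Double -> Double -> Double
scale factor x = factor * x

-- Applying scale to just its first argument gives a one-argument function.
double :: Double -> Double
double = scale 2

main :: IO ()
main = do
  print (map (scale 10) [1, 2, 3])  -- partial application passed to map
  print (double 21)
```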

Function composition, i.e., combining two functions f :: b -> c and g :: a -> b to a function from a to c, is simply indicated by a dot, as in f . g :: a -> c. With function composition and currying, it is possible to combine small functions with minimal visual overhead. For example,

tenSmallest = take 10 . sort

defines a function that will first sort a list, and then return the first 10 elements.
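For reference, a runnable version of the tenSmallest one-liner needs an import of sort from Data.List; the type signature below is added for clarity. The second function, a hypothetical extension, shows that composition keeps chaining with no extra syntax.

```haskell
import Data.List (sort)

-- The one-liner from the text, with its import and a type signature.
tenSmallest :: Ord a => [a] -> [a]
tenSmallest = take 10 . sort

-- Composition chains further: the sum of the squares of the ten smallest values.
sumSquaresOfTenSmallest :: [Int] -> Int
sumSquaresOfTenSmallest = sum . map (\x -> x * x) . tenSmallest

main :: IO ()
main = do
  print (tenSmallest [42, 7, 19, 3, 88, 1, 56, 23, 11, 5, 70, 2 :: Int])
  print (sumSquaresOfTenSmallest [1 .. 20])  -- 1^2 + ... + 10^2 = 385
```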

The minimalistic syntax of Haskell both avoids excessive typing and makes code easier to read, since there is less distracting boilerplate.

More powerful type system

Both Scala and Haskell are statically typed languages, but Haskell's type system is arguably more powerful, in the sense that it provides more guarantees that are checked by the compiler. It also has superior type inference, which means that it places a lower burden on the programmer.

Zero-cost newtypes

One of the benefits of using a type system is that it prevents you from making mistakes such as passing an argument to a function that makes no sense. The more fine-grained the type system is, the more mistakes it can preclude.

For instance, you might have many values of type Double in your program, where some may stand for lengths as measured in meters, some for times in seconds, etc. If the type system only knows that those are Doubles, it will not get in your way when you make a silly mistake, such as adding a time and a distance.

In Scala, you can wrap a Double up in a value class that represents a time interval as in,

class Seconds(val value: Double) extends AnyVal

This type is essentially a Double, but Scala will not let you mix up values of type Seconds and Double, eliminating a class of possible errors.

Scala tries to avoid instantiating value classes, using the underlying type as the runtime representation whenever possible. However, since the JVM does not support value classes, under some circumstances (such as when performing pattern matches or storing a value class in an array), instantiation is unavoidable and there is a run-time cost to the abstraction.

If we want to perform operations, such as addition or subtraction, on the new type Seconds, we'll have to define operations on them, such as

class Seconds(val value: Double) extends AnyVal {
  def +(x: Seconds): Seconds = new Seconds(value + x.value)
  def -(x: Seconds): Seconds = new Seconds(value - x.value)
}

While this is straightforward, it is boilerplate code that you'd like to avoid writing yourself.

In Haskell, you can avoid it: defining Seconds as a newtype, as in

{-# LANGUAGE GeneralizedNewtypeDeriving #-}
newtype Seconds = Seconds Double deriving (Eq, Num, Ord)

will define a new type, Seconds, that has numerical operations, comparisons, and equality defined. Furthermore, the run-time representation of this will always be a plain Double, so there is no cost for this abstraction.
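A minimal sketch of the payoff, assuming a second, hypothetical unit type Meters alongside Seconds. Deriving Num for a newtype relies on the GeneralizedNewtypeDeriving extension, which is why the pragma is needed; Show is derived only so the values can be printed.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

-- Two distinct wrappers around Double; Meters is a hypothetical addition.
newtype Seconds = Seconds Double deriving (Eq, Ord, Show, Num)
newtype Meters  = Meters  Double deriving (Eq, Ord, Show, Num)

-- Arithmetic works within one unit, via the derived Num instance.
totalTime :: Seconds
totalTime = Seconds 1.5 + Seconds 2.5

-- Unwrapping is explicit, so mixing units is always a deliberate choice.
speed :: Meters -> Seconds -> Double
speed (Meters d) (Seconds t) = d / t

-- badSum = Seconds 1.5 + Meters 2.0   -- rejected at compile time: type mismatch

main :: IO ()
main = do
  print totalTime
  print (speed (Meters 100) (Seconds 4))
```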

Since newtypes are so lightweight (both in terms of performance and in the amount of boilerplate code), they are used pervasively in Haskell projects, which contributes to clean, robust code.

Track side effects in the type system

Haskell records side effects in types, and the most prominent example is the IO type: the type signature

f :: a -> b

indicates a function f that takes a value of type a, and returns a value of type b, without producing any side effects. This is distinct from

g :: a -> IO b

which says that g takes input of type a and gives output of type b, but can also perform some IO in between.

Just by looking at the type signature, a programmer can conclude that f does not interact with the file system, the network, or the user. This means that the function f is referentially transparent, i.e., any application f x in the code can be replaced by its result without changing the program's behavior.
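A small illustration of the two shapes of signature, using hypothetical function names. The pure function cannot read a file; doing so forces the IO marker into the result type, and the commented-out version shows what the compiler rejects.

```haskell
import Data.Char (toUpper)

-- Pure, of the shape f :: a -> b: no file, network, or console access possible.
shout :: String -> String
shout s = map toUpper s ++ "!"

-- Effectful, of the shape g :: a -> IO b: reading a file is visible in the type.
shoutFile :: FilePath -> IO String
shoutFile path = fmap shout (readFile path)

-- Rejected by the compiler, because readFile returns IO String, not String:
-- shoutBad :: FilePath -> String
-- shoutBad path = shout (readFile path)

main :: IO ()
main = putStrLn (shout "referential transparency")
```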

In Scala, writing referentially transparent functions is encouraged as a best practice, but it is not evident, without reading the actual code, whether a given function has side effects or not. In Haskell, referential transparency is enforced by the type system, which not only gives important information to the programmer, but also allows for optimizations in the form of rewrite rules.

As an example, consider the consecutive application of two functions f :: a -> b and g :: b -> c to all the elements of a list, via map. You can either write this as

map g . map f

or as

map (g . f)

For referentially transparent functions, both versions give the same result, but the first version involves two list traversals, and the second only one. Since referential transparency is evident in the types, GHC can safely use a rewrite rule to replace the first version with the second.

These rewrite rules are not hard-wired into the compiler but are a feature exposed to the programmer, so when you implement your own data structures, you can define your own rewrite rules to boost efficiency. A prominent example is the vector package, which uses rewrite rules to perform stream fusion, giving C-like performance to high-level code.
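As a sketch of what a user-defined rule looks like, here is a RULES pragma for a hypothetical myMap function that fuses two traversals into one, mirroring the map g . map f example above. This is an illustration only; base implements list fusion with its own, more elaborate set of rules. GHC applies rules only when optimizing (for example with ghc -O), and the flag -ddump-rule-firings reports which rules fired. The program's result is the same whether or not the rule fires, which is exactly what referential transparency guarantees.

```haskell
module Main where

-- A hypothetical list-mapping function. NOINLINE keeps GHC from inlining it
-- before the rewrite rule below has a chance to match.
myMap :: (a -> b) -> [a] -> [b]
myMap _ []       = []
myMap f (x : xs) = f x : myMap f xs
{-# NOINLINE myMap #-}

-- Rewrite two traversals into one; valid because f and g are pure.
{-# RULES
"myMap/myMap" forall f g xs. myMap f (myMap g xs) = myMap (f . g) xs
  #-}

main :: IO ()
main = print (myMap (+ 1) (myMap (* 2) [1 .. 5 :: Int]))  -- [3,5,7,9,11]
```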

No conflicting implicit conversions

Scala has the notion of implicit conversions, which let you add functionality to your types. For instance, declaring an implicit conversion to the Ordered trait lets you add functions for comparing values with respect to some ordering. Implicit conversions are a powerful tool, but also one that can lead to subtle errors. Since implicits can be brought into scope anywhere, you have no guarantee of consistency.

For example, when you use implicit conversions to store and retrieve data in a sorted set, there is no guarantee that the implicit conversion used for storage and retrieval is the same.

In Haskell, you would instead write an instance of the Ord typeclass for your data type. Since writing multiple instances of a given type class for the same data type is a compile-time error, consistency is guaranteed.
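A minimal sketch, using a hypothetical Version type: its single Ord instance is the ordering Data.Set (from the containers package that ships with GHC) uses for both insertion and lookup, so storage and retrieval cannot disagree.

```haskell
import qualified Data.Set as Set

-- A hypothetical record type, ordered by (major, minor).
data Version = Version { major :: Int, minor :: Int } deriving (Eq, Show)

-- The one ordering for Version; a second "instance Ord Version" declared in
-- this module would be rejected by the compiler as a duplicate instance.
instance Ord Version where
  compare a b = compare (major a, minor a) (major b, minor b)

versions :: Set.Set Version
versions = Set.fromList [Version 2 0, Version 1 4, Version 1 10]

main :: IO ()
main = do
  print (Set.toList versions)                 -- ascending by (major, minor)
  print (Set.member (Version 1 10) versions)  -- lookup uses the same instance
```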

Superior type inference

One argument brought against static type systems is that writing down type signatures for every function can be tedious. But modern languages like Scala and Haskell feature algorithms that let the compiler infer the type of a function when none is stated explicitly.

Scala, being a multi-paradigm language, has to account for inheritance, which makes type inference more challenging. As a result, in many cases Scala cannot infer types that are much easier to infer in Haskell.
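A small illustration: the helper functions below (hypothetical names) carry no type signatures, yet GHC infers their most general polymorphic types. The comments show those types, which you can confirm in GHCi with :type.

```haskell
-- No type signatures on these three definitions; GHC infers the types shown.
pairUp x y = (x, y)                -- a -> b -> (a, b)
countWhere p = length . filter p   -- (a -> Bool) -> [a] -> Int
applyTwice f = f . f               -- (a -> a) -> a -> a

main :: IO ()
main = print (pairUp 'x' True, countWhere even [1 .. 10 :: Int], applyTwice (+ 1) (0 :: Int))
```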

More advanced features

If you miss features like existential or universal quantification, type families, or generalized algebraic data types, Haskell has those as well.
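As one example, a generalized algebraic data type can index a small expression language by its result type, so that ill-typed expressions are rejected at compile time and the evaluator needs no run-time type checks. The language below is a hypothetical illustration, enabled by the GADTs extension.

```haskell
{-# LANGUAGE GADTs #-}

-- The type index records what each expression evaluates to.
data Expr a where
  IntLit  :: Int -> Expr Int
  BoolLit :: Bool -> Expr Bool
  Add     :: Expr Int -> Expr Int -> Expr Int
  IsZero  :: Expr Int -> Expr Bool
  If      :: Expr Bool -> Expr a -> Expr a -> Expr a

-- Pattern matching refines a, so each equation returns the right type.
eval :: Expr a -> a
eval (IntLit n)  = n
eval (BoolLit b) = b
eval (Add x y)   = eval x + eval y
eval (IsZero x)  = eval x == 0
eval (If c t e)  = if eval c then eval t else eval e

-- Add (BoolLit True) (IntLit 1) would not type-check.

main :: IO ()
main = print (eval (If (IsZero (IntLit 0)) (Add (IntLit 2) (IntLit 3)) (IntLit 0)))
```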

With many researchers amongst its users, Haskell gets new extensions to its type system regularly. One notable example is Liquid Haskell, which lets you encode restrictions on your types, and pre- and postconditions for your functions, and have their correctness proven at compile time, eliminating the need for run-time checks.

Proper tail call optimization

Functional programming relies heavily on the use of recursion where imperative languages would use loops. In order not to blow the stack, tail call optimization is employed.

Unfortunately, due to limitations in the JVM, Scala only has fairly limited tail call optimization, in that it only optimizes self-tail calls. This excludes the case of multiple functions making tail calls to each other (which flatMap does), severely limiting the usefulness of functional programming in Scala.

There are workarounds for this, such as using trampolines, but they come with performance costs. Thus, when programming in Scala, you often face the question of whether it is OK to use functional programming techniques for some problem, or whether the price you pay in terms of performance would be too high.

Haskell does not have this problem, since all tail calls are optimized.
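A minimal sketch of the case Scala's self-tail-call optimization excludes: two mutually recursive functions (hypothetical names) calling each other in tail position. GHC compiles these calls as jumps, so ten million alternating calls run without growing the stack. The sketch assumes a non-negative argument. One caveat worth knowing: a lazily accumulated parameter can still build up unevaluated thunks, which is a separate issue from tail calls.

```haskell
-- Mutually recursive functions; every recursive call is in tail position.
-- Assumes n >= 0 (a negative argument would never reach the base case).
isEven :: Int -> Bool
isEven 0 = True
isEven n = isOdd (n - 1)

isOdd :: Int -> Bool
isOdd 0 = False
isOdd n = isEven (n - 1)

main :: IO ()
main = print (isEven 10000000)  -- True, in constant stack space
```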

Language interoperability

A strong selling point of Scala is its interoperability with Java, allowing the re-use of a plethora of existing libraries.

Haskell provides a lot in terms of language interoperability as well:

  • The Foreign Function Interface (FFI) allows calling C functions from Haskell code. It is very easy to use, and allows interoperation with all languages that also have C interoperability (such as C++, Python, and Ruby). A lot of popular C libraries have Haskell bindings using the FFI.
  • Furthermore, it is possible to embed snippets of C code directly into your Haskell code, via inline-c.
  • For data analysis and visualization, you can embed R code with inline-r.
  • With the recent addition of inline-java, it is straightforward to call Java functions from Haskell (although Haskell will not share a heap with Java, as Scala does).
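To give a sense of how light the FFI is, here is the standard kind of example: binding sin from the C math library. The wrapper name fastSin is a hypothetical choice; realToFrac converts between Haskell's Double and the C-compatible CDouble.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.Types (CDouble (..))

-- Import the C function "sin" declared in math.h as a pure Haskell function.
foreign import ccall unsafe "math.h sin"
  c_sin :: CDouble -> CDouble

-- A Haskell-typed wrapper converting between Double and CDouble.
fastSin :: Double -> Double
fastSin x = realToFrac (c_sin (realToFrac x))

main :: IO ()
main = print (fastSin (pi / 2))
```

Declaring c_sin without IO is safe here only because sin has no side effects; a C function with effects should be imported with an IO result type.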

Haskell is ready for production

With its origins in academia, Haskell has a reputation for being a language purely for the Ivory Tower. Despite this preconception, Haskell is being used successfully in real world projects: Facebook built its spam filter with Haskell, the Commercial Haskell special interest group lists companies using Haskell, and our own success stories show some examples of successful applications of Haskell in industry.

Get Started

If you want to try out Haskell right away, you can find instructions for getting started here. If you want to learn more, have a look at the FP Complete Haskell Syllabus, or look at our Haskell training.

If you're planning to use Haskell for a project and want some help, we can provide you with Consulting services.

November 29, 2016 12:15 PM

November 28, 2016

FP Complete

Devops best practices: Multifaceted Testing

Even among skilled enterprise IT departments, it is too rare that software is thoroughly tested before deployment. Failed deployments mean costly downtime, service failures, upset users, and even security breaches. How can we verify that a solution is actually ready to deploy, free of serious defects?

Contact us for professional assistance

You probably need more kinds of tests

We’ve all seen “ready to deploy” applications that do not work as expected once deployed. Often it’s because the production system is not in fact identical to the staging system, so the testing wasn’t valid. This can be prevented with another devops best practice, automated deployments -- which we talked about in this recent post and will return to again.

But often the problem is that the software, even on a properly configured test system and staging system, was never fully tested. Before an app is approved for deployment, your QA system (mostly automated) should complete:

  • Success tests
  • Failure tests
  • Corner-case tests
  • Randomized tests (also called mutation tests, fuzz tests)
  • Load and performance tests
  • Usability tests
  • Security tests

(If you were not using automated, reproducible deployments, you would also have to do explicit pre-tests of the deployment script itself -- but you are doing fully automated deployments, right? If you aren’t, consider moving to the sorts of tools we use in FP Deploy -- like Docker and Kubernetes, Puppet and Ansible.)

In the rest of this article we’ll see how each kind of testing adds something different and important.

Success testing: run with realistic inputs

Most operations teams won’t accept a deployment from the engineering group (dev or test or QA) unless the system has at least passed success testing. When presented with correct inputs, does the system generate correct outputs and not crash?

It’s the most basic testing. Yet it’s often left incomplete. To avoid serious omissions, use at least this checklist:

  • Did you play back a realistic workload, based on a real sample of typical inputs? If not, your test’s idea of what to try may be very different from what your users will actually do on their first day in production. This is a huge source of real-world failures.

  • Did you test through both the UI and the API? Automated testing only through the UI may mask problems that the UI prevents or corrects. Automated testing only through the API can’t find pure UI bugs.

  • Were your tests updated when the spec was updated? For that matter, was the spec even updated when the software’s intended function was updated? If “no” on either of these, you are not actually testing the software’s correct function.

  • Do you have coverage? Has someone gone back through the spec, identified all the promised functionality, and verified that some test actually tests each item -- and that it was run on the latest build, and passed?

  • Did you test in a system that’s configured the same as your production system? Especially if deployments are not automated, it is crucial to ensure that nothing was done in the staging system to make it easier to pass than the production system. Common examples: omitting firewalls that would be present between layers in production; running test services under accounts with excessive permissions; having a manual security checklist for production deployments that is not used on staging deployments; sequencing inputs so that multiple simulated users cannot appear concurrently. Passing tests may not mean much if the environment is rigged to ensure success.

Believe it or not, that was the easy part. For enterprise production quality, you still want to test your system six more ways. Jumping right in to number two...

Failure testing: when you break the law, do you go to jail?

Your testing is all under an automated set of continuous integration (CI) scripts, right? (If not, time to look into that.) But do you have tests that force all of the specified error conditions to occur? Do you pass in every identified kind of prohibited/invalid input? Do you also create all the realistic external error conditions, like a network link failure, a timeout, a full disk, low memory (with a tool like Chaos Monkey)?

A test suite that doesn’t check whether specified error conditions actually generate the right errors is not complete. We recommend that the QA team present a report showing a list of all existing tests, what conditions they purport to test, and when they were last run and passed.
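A minimal sketch of failure testing in Haskell, assuming a hypothetical withdraw operation whose spec names three error conditions. Each case forces one condition and checks that the specified error comes back, and the output doubles as the kind of per-condition report suggested above.

```haskell
-- Hypothetical system under test, with three specified error conditions.
data WithdrawError = AccountFrozen | NonPositiveAmount | InsufficientFunds
  deriving (Eq, Show)

withdraw :: Bool -> Int -> Int -> Either WithdrawError Int
withdraw frozen balance amount
  | frozen           = Left AccountFrozen
  | amount <= 0      = Left NonPositiveAmount
  | amount > balance = Left InsufficientFunds
  | otherwise        = Right (balance - amount)

-- One case per specified error condition: (condition, actual, expected).
failureCases :: [(String, Either WithdrawError Int, Either WithdrawError Int)]
failureCases =
  [ ("frozen account",  withdraw True  100 10,     Left AccountFrozen)
  , ("zero amount",     withdraw False 100 0,      Left NonPositiveAmount)
  , ("negative amount", withdraw False 100 (-5),   Left NonPositiveAmount)
  , ("overdraft",       withdraw False 100 101,    Left InsufficientFunds)
  ]

main :: IO ()
main = mapM_ report failureCases
  where
    report (name, actual, expected)
      | actual == expected = putStrLn ("PASS " ++ name)
      | otherwise          = putStrLn ("FAIL " ++ name ++ ": got " ++ show actual)
```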

Corner-case testing: try something crazy

Maybe you’ve heard the joke: a QA engineer walks into a bar, and orders a beer, and 2 beers, and 20 beers, and 0 beers, and a million beers, and -1 beers. And a duck.

Corner-case testing means success testing using unrealistic but legal inputs. Often, developers write code that works correctly in typical cases, but fails in the extremes.

Before deploying, consider: are any of your users going to try anything crazy? What would be the oddest things still permitted, and what happens if you try? Who tested that and verified that the output was correct? Correct code works on all permitted inputs, not just average ones. This is a fast way to find bugs before deployment -- push the system right to the edge of what it should be able to do.

Corner cases vary by application, but here are some typical examples to spark your thinking. Where strings are permitted, what happens if they are in a very differently structured language, like Chinese or Arabic? What happens if they are extremely long? Where numbers are permitted, what happens if they are very large, very small, zero, negative? And why is the permitted range as big as it is; should it be reduced? Is it legal to request output of a billion records, and what happens if I do? Where options are permitted, what if someone chooses all of them, or a bizarre mixture? Can I order a pizza with 30 toppings? Can I prescribe 50 medicines, at 50 bottles each, for a sample patient? What happens if nested or structured inputs are extremely complex? Can I send an email with 100 embeddings, each of which is an email with 100 embeddings?
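The bar joke translates directly into a table of corner cases. This sketch assumes a hypothetical spec allowing 1 to 100 beers per order. Note the design choice of parsing into Integer: reading a huge string straight into Int would silently wrap around, exactly the kind of bug that ridiculous-but-legal inputs expose.

```haskell
import Text.Read (readMaybe)

-- Hypothetical spec: an order may contain between 1 and 100 beers.
data OrderError = NotANumber | OutOfRange Integer
  deriving (Eq, Show)

-- Parse as Integer, not Int, so enormous inputs cannot wrap around.
orderBeers :: String -> Either OrderError Integer
orderBeers input = case readMaybe input of
  Nothing -> Left NotANumber
  Just n
    | n >= 1 && n <= 100 -> Right n
    | otherwise          -> Left (OutOfRange n)

-- The joke's orders, plus both edges of the permitted range.
cornerCases :: [(String, Either OrderError Integer)]
cornerCases =
  [ ("1", Right 1), ("2", Right 2), ("20", Right 20), ("100", Right 100)
  , ("0", Left (OutOfRange 0)), ("101", Left (OutOfRange 101))
  , ("-1", Left (OutOfRange (-1))), ("1000000", Left (OutOfRange 1000000))
  , ("99999999999999999999", Left (OutOfRange 99999999999999999999))
  , ("a duck", Left NotANumber), ("", Left NotANumber)
  ]

main :: IO ()
main = mapM_ check cornerCases
  where
    check (input, expected)
      | orderBeers input == expected = putStrLn ("ok   " ++ show input)
      | otherwise = putStrLn ("FAIL " ++ show input ++ ": got " ++ show (orderBeers input))
```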

If an application hasn’t been tested with ridiculous-but-legal inputs, no one really knows if it’s going to hold up in production.

Randomized testing: never saw that before!

No human team can test every possible combination of cases and actions. And that may be okay, because many projects find more bugs per unit of effort through randomized generation of test cases than any other way.

This means writing scripts that start with well-understood inputs, and then letting them make random, arbitrary changes to these inputs and run again, then change and run again, many thousands of times. Even if it’s not realistic to test the outputs for correctness (because the script may be unable to tell what its crazy inputs were supposed to do), the outputs can be tested for structural validity -- and the system can be watched for not crashing, and not generating any admin alerts or unhandled errors or side effects.

It’s downright surprising how fast you can find bugs in a typical unsafe language (like Python or C or Java) through simple mutation testing. Extremely safe languages like Haskell tend to find these bugs at compile time, but it may still be worth trying some randomized testing. Remember, machine time is cheap; holes in deployed code are very expensive.
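A minimal sketch of the mutation approach described above, written with only base. Everything here is an assumption for illustration: parseConfig is a hypothetical system under test, and the toy linear congruential generator stands in for a real random source (Haskell projects more commonly use a library such as QuickCheck). Starting from a well-understood input, each round overwrites one character and runs again; since correct outputs are unknown, the harness checks only structural validity and counts exceptions.

```haskell
import Control.Exception (SomeException, evaluate, try)
import Data.Char (chr)

-- Hypothetical system under test: parses "key=value" pairs separated by ';'.
parseConfig :: String -> [(String, String)]
parseConfig = map splitPair . splitOn ';'
  where
    splitPair s = let (k, rest) = break (== '=') s in (k, drop 1 rest)

splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn c rest

-- Structural check: whatever a mutated input "means", no key may contain ';'.
wellFormed :: String -> Bool
wellFormed = all (notElem ';' . fst) . parseConfig

-- Toy linear congruential generator (modulus 2^31).
nextSeed :: Int -> Int
nextSeed s = (s * 1103515245 + 12345) `mod` 2147483648

-- Overwrite one pseudo-randomly chosen position with a pseudo-random character,
-- using the higher bits of the generator, which are better distributed.
mutate :: Int -> String -> String
mutate seed input
  | null input = [chr ((seed `div` 65536) `mod` 256)]
  | otherwise  = before ++ [chr ((nextSeed seed `div` 65536) `mod` 256)] ++ drop 1 after
  where
    (before, after) = splitAt ((seed `div` 65536) `mod` length input) input

-- Successive mutations: each input is derived from the previous one.
mutations :: Int -> Int -> String -> [String]
mutations n seed0 input0 = take n (go seed0 input0)
  where
    go seed input =
      let input' = mutate seed input
      in input' : go (nextSeed (nextSeed seed)) input'

main :: IO ()
main = do
  let inputs = mutations 10000 42 "host=example.org;port=8080;debug=false"
  results <- mapM (\s -> try (evaluate (wellFormed s)) :: IO (Either SomeException Bool)) inputs
  let crashes = length [() | Left _ <- results]
      invalid = length [() | Right False <- results]
  putStrLn ("crashes: " ++ show crashes ++ ", malformed outputs: " ++ show invalid)
```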

Load and performance testing: now that's heavy

Companies with good devops say heavy load is one of the top remaining sources of failure. The app works for a while, but fails when peak user workload hits. Be on the lookout for conditions that could overload your servers, and make sure someone is forcing them to happen on the staging system -- before they happen in production.

Consider whether your test and staging systems are similar enough to your production system. If your production system accepts 5000 requests per second on 10 big machines, and your test system accepts 5 per second on one tiny VM, how will you know about database capacity issues or network problems?

A good practice is to throw enormous, concurrent, simulated load at your test system that (1) exceeds any observed real-world load and (2) includes a wide mix of realistic inputs, perhaps a stream of historic real captured inputs as well as random ones. This reduces the chance that you threw a softball at the system when real users are going to throw a hardball.

Performance testing can include sending faster and faster inputs until some hardware resource becomes saturated. (You may enjoy watching system monitor screens as this is happening!) Find the bottleneck -- what resource can you expect to fail first in production? How will you prevent it? Do your deployment scripts specify an abundance of this resource? Have you implemented cloud auto-scaling so that new parallel servers are fired up when the typically scarce resource (CPU, RAM, network link, …) gets too busy?

Usability testing: it doesn't work if people can't use it

Most people consider this to be outside the realm of devops. Who cares if users find your system confusing and hard to use? Well, lots of people, but why should a devops person care?

What will happen to your production environment if a new feature is deployed and suddenly user confusion goes through the roof? Will they think there’s a bug? Will support calls double in an hour, and stay doubled? Will you be forced to do a rollback?

User interface design probably isn’t your job. But if you are deploying user-facing software that has not been usability tested, you’re going to hear about it. Encourage your colleagues to do real testing of their UI on realistic, uninitiated users (not just team members who know what the feature is supposed to do), or at least skeptical test staff who know how to try naive things on purpose, before declaring a new feature ready to deploy.

Security testing: before it's too late

One of the worst things you can do is to cause a major security breach, leading to a loss of trust and exposure of users’ private data.

Testing a major, public-facing, multi-server system (a distributed app) for security is a big topic, and I’d be doing a disservice by trying to summarize it in just a couple of paragraphs. We’ll return in future posts to both the verification and testing side, and the design and implementation side, of security. A best practice is to push quality requirements upstream, letting developers know that security is their concern too, and ensuring that integration-test systems use a secure automated deployment similar or identical to the production system. Don’t let developers say “I assume you’ll secure this later.”

Meanwhile, as a devops best practice, your deployment and operations team should have at least one identified security expert, a person whose job includes knowing about all the latest security test tools and ensuring that they are being used where appropriate. Are you checking for XSS attacks and SQL injection attacks? DDoS attacks? Port-scan attacks? Misconfigured default accounts? It’s easy to neglect security until it’s too late, so make someone responsible.

Security holes can appear in application code, or in the platform itself (operating system, library packages, and middleware). At a minimum, security testing should include running standard off-the-shelf automated scanning software that looks for known ways to intrude, most often taking advantage of poor default configurations, or of platform components that have not been upgraded to the latest patch level. Run an automated security scan before moving a substantially changed system from staging into production. Automated testing is almost free, in stark contrast to the costly manual clean-up after a breach.


Wow, that’s a lot of testing! More than a lot of companies actually do. Yet it’s quite hard to look back through that list and find something that’s okay to omit. Clearly the devops pipeline doesn’t begin at the moment of deployment, but sooner, in the development team itself. That means developers and testers taking responsibility for delivering a quality product that’s actually ready to deploy.

As usual, systems thinking wins. A good operations team doesn’t say “give us what you’ve got, and we’ll somehow get it online.” A good operations team says “we offer deployment and operation services, and here’s how you can give us software that will deploy successfully into our production environment.”

We hope you’ll keep reading our blog and making use of what we have learned. Thanks for spending time with FP Complete.

Contact us to help your engineering and devops teams

November 28, 2016 06:00 PM

Ken T Takusagawa

[xxneozpu] Demonstration of Data.Numbers.Fixed

Here is a little demonstration of arbitrary precision floating-point (actually fixed-point) arithmetic in Haskell, using Data.Numbers.Fixed in the numbers package.  Using dynamicEps, we calculate sqrt(1/x) to arbitrary precision, outputting the number in a user-specified base.

Performance is not so great; the package implements Floating functions using elegant but not necessarily high performance algorithms based on continued fractions (Gosper, HAKMEM).  The hmpfr package might be better.  (For square root, it might also be easy to roll one's own implementation of Newton's method on Rational.)
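A minimal sketch of the roll-your-own alternative mentioned above: Newton's method for square roots on exact Rationals. This is not the code from the post (which uses Data.Numbers.Fixed and dynamicEps); newtonSqrt is a hypothetical name, and it assumes a positive argument. Starting from max 1 a, which is never below sqrt a, the iterates decrease toward the root and roughly double their correct digits each step, while the numerators and denominators grow accordingly.

```haskell
import Data.Ratio ((%))

-- Newton's iteration for sqrt a on exact rationals: x' = (x + a / x) / 2.
-- Assumes a > 0; the starting point max 1 a is always >= sqrt a.
newtonSqrt :: Rational -> Int -> Rational
newtonSqrt a steps = iterate step (max 1 a) !! steps
  where
    step x = (x + a / x) / 2

main :: IO ()
main = do
  let x = newtonSqrt (1 % 2) 6           -- approximates sqrt (1/x) for x = 2
  print (fromRational x :: Double)
  print (x * x - 1 % 2 < 1 % 10 ^ 40)    -- True: error already below 10^-40
```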

Radix conversion of a fractional number is implemented as an unfold, similar to radix conversion of an integer.
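A small sketch of that parallel, using Data.List.unfoldr; the function names are hypothetical and this is not the post's own code. For an integer, each step peels off the least significant digit; for a fraction in [0, 1), each step multiplies by the base, emits the integer part as the next digit, and continues with the remaining fraction. The fractional list is infinite when the number has no finite expansion in that base, so callers take a prefix.

```haskell
import Data.List (unfoldr)

-- Integer radix conversion: unfold least significant digits, then reverse.
intDigits :: Integer -> Integer -> [Integer]
intDigits base n
  | n == 0    = [0]
  | otherwise = reverse (unfoldr step n)
  where
    step 0 = Nothing
    step m = Just (m `mod` base, m `div` base)

-- Fractional radix conversion, same unfold shape. Assumes 0 <= r < 1.
fracDigits :: Integer -> Rational -> [Integer]
fracDigits base = unfoldr step
  where
    step r
      | r == 0    = Nothing
      | otherwise =
          let scaled = r * fromInteger base
              d      = floor scaled
          in Just (d, scaled - fromInteger d)

main :: IO ()
main = do
  print (intDigits 2 10)                  -- [1,0,1,0]
  print (fracDigits 10 (1 / 4))           -- [2,5]
  print (take 6 (fracDigits 10 (1 / 7)))  -- [1,4,2,8,5,7]
```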

by Ken at November 28, 2016 06:38 AM

Michael Snoyman

Haskell Documentation, 2016 Update

I've blogged, Tweeted, and conversed about Haskell documentation quite a bit in the past. Following up on tooling issues, all available evidence tells me that improving the situation for documentation in Haskell is the next obstacle we need to knock down.

This blog post will cover:

  • Where I think the biggest value is to be had in improving Haskell documentation
  • Status of various initiatives I've been involved in (and boy do I walk away red-faced from this)
  • Recommendations for how others can best contribute
  • A basic idea of what actions I - and others at FP Complete - have been taking

Intermediate docs

In my opinion, the sore spot for Haskell overall is intermediate docs. (Yes, that's vague, bear with me momentarily.) I'm going to posit that:

  • Beginners are currently well served by introductory Haskell books, and most recently by Haskell Programming from First Principles
  • Once you have a solid basis in intermediate concepts, it's much easier to jump into libraries and academic papers and understand what's going on

To me, intermediate means you already know the basics of Haskell syntax, monads, and common typeclasses, but aren't really familiar with any non-base libraries, concurrency, or exception handling. The goal of intermediate documentation is to:

  • Teach which libraries to use, when to use them, and how to use them
  • Give a guide on structuring Haskell programs
  • Educate on important techniques, including those mentioned above, as well as issues around lazy evaluation and other common stumbling blocks

Many of us who have learned Haskell over the past many years have probably picked up these topics sporadically. While some people will want to plow ahead in that kind of haphazard approach, my belief is that the vast majority of users want to be more guided through the process. We'll get to the Haskell Syllabus and opinionated vs unopinionated content a bit later.

Previous efforts

It turns out, as I'm quite embarrassed to admit, that I've essentially tried reinventing the same intermediate docs concept multiple times, first with MezzoHaskell, and then with the Commercial Haskell doc initiative. You could add School of Haskell to that list, but I'm going to treat it separately.

These initiatives never took off. A pessimistic view is that Haskellers are simply uninterested in contributing to such a shared body of intermediate-level docs. I actually believed that for a bit, but recent activity has convinced me otherwise. I think these previous initiatives failed due to an unsatisfactory user experience. These initiatives required people to go to an infrequently used GitHub repo to view docs, which no one was doing. A few months back, a new option presented itself.

haskell-lang's documentation

For those who haven't seen it, you should check out the libraries page and documentation page on the site. I believe this hits the nail on the head in many different ways:

  • It directly addresses a common user question: which are the best libraries to use for a certain task? The libraries page gives this answer (though it can certainly be improved and expanded!)
  • We're able to curate the documentation as a community. Providing a list of recommended documents on this site gives a reader more confidence than simply Googling and hoping the author knows what he/she is talking about
  • The collaboration is done via pull requests on markdown files. I've discussed previously why I think this is a far better collaboration technique than Wikis or other options.
  • Instead of requiring all docs to live within the repository, documents can be embedded from elsewhere. For example, I've written a conduit tutorial in the conduit repository, and embedded its content on haskell-lang via a simple inclusion mechanism. This allows authors to maintain their documentation individually, but provide users with a central location to find these kinds of documents. (I'd encourage other sites to take advantage of this transclusion technique, getting quality content into user hands is the goal!)

haskell-lang tries to host only "uncontroversial" documentation. Documents explaining how to use a library are pretty straightforward. Recommending libraries like bytestring, text, and vector is pretty well accepted. And for cases where multiple libraries are used, we link to both.

I've merged all of the content I wrote in MezzoHaskell and the Commercial Haskell doc initiative into haskell-lang where it fit. However, there was still some more controversial content left, such as exceptions best practices, which I know many people disagree with me about. Also, I'd like to be able to tell a user looking for a solution, "yes, there are multiple libraries around, I recommend X." Neither of these belong on a community site like haskell-lang, so for those...

More opinionated content

This is where alternative sites thrive. Since I'm collaborating with others at FP Complete on this, and actively using this in training courses, I've put together a Haskell Syllabus page. This is where I'll tell someone "you should do X, even though others disagree with me." I won't enumerate the contentious decisions here (odds are someone else will ultimately make such a list on my behalf).

And if you disagree with this? Write a new syllabus! I think it would be a healthy thing if we could get to the point as a community where we could have multiple recommended, opinionated syllabuses, and link to them with a short description for each one. This may sound at odds with some of my previous statements, so let me clarify:

  • When there's an obviously best choice, tell the user to use it
  • When most users will be best with one choice, and another option is available, mention it as a footnote
  • When multiple options are available and there's no way to know which the user will want, break down and give them all the information they need. But...
  • Try to make that happen as infrequently - and as late in the learning process - as possible! If we could have a "you've completed Beginner Haskell, please choose between one of the following," and explain the difference between "FP Complete's course" vs (for example) "lens-first Haskell", that would be a good trade-off.

My thoughts on this are still evolving, and will likely change in the future as I get more feedback from users.

Writing style

Another big change I've made over the years is writing style. I wrote the Yesod book in a very prose-heavy manner, focusing on explaining details with words, and using concise, to-the-point code examples. Many users have given me feedback to push me in a different direction. Instead, I've recently been trying to write in a very different, and thankfully easier to write, style:

  • Short explanation of what the thing is I'm talking about and when you'd use it
  • Synopsis: medium sized code snippet to give a flavor (I used this in the Yesod book too, and stole the idea straight from Perl docs)
  • A series of increasingly complex examples, with the bare minimum amount of content around it to explain what's going on

I'd describe this style as a hybrid of tutorial and cookbook, and I think it works well overall. I've only heard positives so far versus previous styles, so that's encouraging.

I'm taking this approach because I think it's what most users want. Some important points:

  • Not all users are the same! There will almost certainly be users who would prefer a different style of docs. Given enough user feedback and manpower to write docs, it would be great to cater to all tastes, but it's best right now to focus on the highest demand
  • API docs are still necessary, and are completely orthogonal to tutorials. A tutorial doesn't document each API call, an API-call-level explanation doesn't give enough breadth, and certainly users need more than just the type signatures.

What you can do

After all of that, my recommendation on how to get involved is pretty simple:

  • Pick a library that doesn't have a good tutorial
  • Write a tutorial
  • Submit a PR to haskell-lang to include the content
  • Alternatively: get the Markdown file included in the project's own repo, and have haskell-lang include the remote file

Linking to libraries

haskell-lang has a nifty feature. If you visit the vector library page on haskell-lang, it will display the vector documentation it has. But if you visit the page for a package which doesn't (yet) have a tutorial on haskell-lang, it will automatically redirect you to the Stackage package page. When giving out links to people on the internet, I recommend using the haskell-lang library link.

  • When a tutorial is available, the haskell-lang page is great
  • When a tutorial isn't available, the doc building on Stackage is still the most reliable around
  • In addition, Stackage docs properly link to docs built in the same Stackage snapshot, making the cross linking more reliable
  • When present, Stackage gives preference to the files in a package, which are generally more useful than the description fields.

School of Haskell

I'd be remiss in not mentioning School of Haskell here. As far as I'm concerned, School of Haskell is a great platform for an individual to write content without any collaboration. However, for all of the cases I'm describing here, some kind of easy collaboration (via pull requests) is a huge win. Opening things up more with pull requests, files, and embedding content into multiple external sites seems like the best option today (until someone comes up with something better!).

November 28, 2016 12:00 AM

November 26, 2016

Manuel M T Chakravarty

Here is the video of my talk “A Type is Worth a Thousand...


Here is the video of my talk “A Type is Worth a Thousand Tests” presented at Sydney CocoaHeads, November 2016 (you can also get the slides) — I previously presented this talk at YOW! Connected, Melbourne, and an earlier version at Curry On in Rome.

In this talk, I argue that types are a design tool that, if applied correctly, reduces the need for tests. I illustrate this using the example of the design of a simple iPhone app in Swift whose source code is available on GitHub.

November 26, 2016 03:38 AM

November 25, 2016

Derek Elkins

Category Theory, Syntactically

(or: How Model Theory Got Scooped by Category Theory)


This will be a very non-traditional introduction to the ideas behind category theory. It will essentially be a slice through model theory (presented in a more programmer-friendly manner) with an unusual organization. Of course the end result will be ***SPOILER ALERT*** it was category theory all along. A secret decoder ring will be provided at the end. This approach is inspired by the notion of an internal logic/language and by Vaughan Pratt’s paper The Yoneda Lemma Without Category Theory.

I want to be very clear, though. This is not meant to be an analogy or an example or a guide for intuition. This is category theory. It is simply presented in a different manner.


The first concept we’ll need is that of a theory. If you’ve ever implemented an interpreter for even the simplest language, then most of what follows, modulo some terminological differences, should be both familiar and very basic. If you are familiar with algebraic semantics, then that is exactly what is happening here, only restricted to unary (but multi-sorted) algebraic theories.

For us, a theory, #ccT#, is a collection of sorts, a collection of (unary) function symbols1, and a collection of equations. Each function symbol has an input sort and an output sort which we’ll call the source and target of the function symbol. We’ll write #ttf : A -> B# to say that #ttf# is a function symbol with source #A# and target #B#. We define #"src"(ttf) -= A# and #"tgt"(ttf) -= B#. Sorts and function symbols are just symbols. Something is a sort if it is in the collection of sorts. Nothing else is required. A function symbol is not a function, it’s just a, possibly structured, name. Later, we’ll map those names to functions, but the same name may be mapped to different functions. In programming terms, a theory defines an interface or signature. We’ll write #bb "sort"(ccT)# for the collection of sorts of #ccT# and #bb "fun"(ccT)# for the collection of function symbols.

A (raw) term in a theory is either a variable labelled by a sort, #bbx_A#, or it’s a function symbol applied to a term, #tt "f"(t)#, such that the sort of the term #t# matches the source of #ttf#. The sort, or target, of a term is the sort of the variable if the term is a variable, and otherwise the target of the outermost function symbol. The source of a term is the sort of the innermost variable. In fact, all terms are just sequences of function symbol applications to a variable, so there will always be exactly one variable. All this is to say the expressions need to be “well-typed” in the obvious way. Given a theory with two function symbols #ttf : A -> B# and #ttg : B -> A#, #bbx_A#, #bbx_B#, #tt "f"(bbx_A)#, and #tt "f"(tt "g"(tt "f"(bbx_A)))# are all examples of terms. #tt "f"(bbx_B)# and #tt "f"(tt "f"(bbx_A))# are not terms because they are not “well-typed”, and #ttf# by itself is not a term simply because it doesn’t match the syntax. Using Haskell syntax, we can define a data type representing this syntax if we ignore the sorting:

data Term = Var Sort | Apply FunctionSymbol Term

Using GADTs, we could capture the sorting constraints as well:

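{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}
-- Sort is promoted to a kind; FunctionSymbol x t is indexed by source x and target t
data Term (s :: Sort) (t :: Sort) where
    Var :: Term t t                                      -- a variable: source = target
    Apply :: FunctionSymbol x t -> Term s x -> Term s t  -- apply f : x -> t to a term with target x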

An important operation on terms is substitution. Given a term #t_1# with source #A# and a term #t_2# with target #A# we define the substitution of #t_2# into #t_1#, written #t_1[bbx_A |-> t_2]#, as:

If #t_1 = bbx_A# then #bbx_A[bbx_A |-> t_2] -= t_2#.

If #t_1 = tt "f"(t)# then #tt "f"(t)[bbx_A |-> t_2] -= tt "f"(t[bbx_A |-> t_2])#.

Using the theory from before, we have:

#tt "f"(bbx_A)[bbx_A |-> tt "g"(bbx_B)] = tt "f"(tt "g"(bbx_B))#

As a shorthand, for arbitrary terms #t_1# and #t_2#, #t_1(t_2)# will mean #t_1[bbx_("src"(t_1)) |-> t_2]#.
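To make substitution concrete, here is a sketch that instantiates the GADT above for the two-symbol theory used in the examples, with sorts A and B and symbols f : A -> B and g : B -> A. The names `Sort`, `FunctionSymbol`, `subst`, and `render` are illustrative choices, not taken from the post. Since `Var` carries its sort only at the type level, the variable's name has to be passed in when rendering.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

-- Sorts of the example theory, promoted to the type level.
data Sort = A | B

-- Function symbols indexed by source and target: f : A -> B, g : B -> A.
data FunctionSymbol (s :: Sort) (t :: Sort) where
  F :: FunctionSymbol 'A 'B
  G :: FunctionSymbol 'B 'A

-- Terms with source s and target t.
data Term (s :: Sort) (t :: Sort) where
  Var   :: Term t t
  Apply :: FunctionSymbol x t -> Term s x -> Term s t

-- Substitution t1[x_a |-> t2]: replace the single variable of t1 (whose sort
-- is t1's source) by t2. The types force t2's target to match t1's source.
subst :: Term a t -> Term s a -> Term s t
subst Var             t2 = t2
subst (Apply f inner) t2 = Apply f (subst inner t2)

-- Rendering, for inspecting results.
showSymbol :: FunctionSymbol s t -> String
showSymbol F = "f"
showSymbol G = "g"

render :: String -> Term s t -> String
render v Var             = v
render v (Apply f inner) = showSymbol f ++ "(" ++ render v inner ++ ")"
```

With `t1 = Apply F Var :: Term 'A 'B` and `t2 = Apply G Var :: Term 'B 'A`, the expression `render "x_B" (subst t1 t2)` gives `"f(g(x_B))"`. This matches the worked example of substituting #tt "g"(bbx_B)# into #tt "f"(bbx_A)#.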

Finally, equations2. An equation is a pair of terms with equal source and target, for example, #(: tt "f"(tt "g"(bbx_B)), bbx_B :)#. The idea is that we want to identify these two terms. To do this we quotient the set of terms by the congruence generated by these pairs, i.e. by the reflexive-, symmetric-, transitive-closure of the relation generated by the equations which further satisfies “if #s_1 ~~ t_1# and #s_2 ~~ t_2# then #s_1(s_2) ~~ t_1(t_2)#”. From now on, by “terms” I’ll mean this quotient with “raw terms” referring to the unquotiented version. This means that when we say “#tt "f"(tt "g"(bbx_B)) = bbx_B#”, we really mean the two terms are congruent with respect to the congruence generated by the equations. We’ll write #ccT(A, B)# for the collection of terms, in this sense, with source #A# and target #B#. To make things look a little bit more normal, I’ll write #s ~~ t# as a synonym for #(: s, t :)# when the intent is that the pair represents a given equation.

Expanding the theory from before, we get the theory of isomorphisms, #ccT_{:~=:}#, consisting of two sorts, #A# and #B#, two function symbols, #ttf# and #ttg#, and two equations #tt "f"(tt "g"(bbx_B)) ~~ bbx_B# and #tt "g"(tt "f"(bbx_A)) ~~ bbx_A#. The equations lead to equalities like #tt "f"(tt "g"(tt "f"(bbx_A))) = tt "f"(bbx_A)#. In fact, it doesn’t take much work to show that this theory only has four distinct terms: #bbx_A#, #bbx_B#, #tt "f"(bbx_A)#, and #tt "g"(bbx_B)#.
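The four-terms claim can be checked by brute force with the following sketch, which is an illustration and not from the post. It represents a raw term as its list of function symbols, outermost first, together with the sort of its variable. The two equations act as rewrite rules that cancel an adjacent f/g pair. Collecting the normal forms of every well-typed term up to a fixed length leaves exactly four survivors. Those four have four different source/target pairs, so no further equations could identify them. Here sorts are plain data, so well-typedness is checked at run time rather than by a GADT.

```haskell
import Control.Monad (replicateM)
import Data.List (nub)

data Sort = A | B deriving (Eq, Show)
data Sym  = F | G deriving (Eq, Show)

-- A raw term: function symbols from outermost to innermost, applied to a
-- variable of the given sort; e.g. f(g(x_B)) is ([F, G], B).
type RawTerm = ([Sym], Sort)

src, tgt :: Sym -> Sort
src F = A
src G = B
tgt F = B
tgt G = A

-- Reading from the variable outward, each symbol's source must equal the
-- target of the term it is applied to.
wellTyped :: RawTerm -> Bool
wellTyped (syms, v) = go (reverse syms) v
  where
    go [] _ = True
    go (s : rest) cur = src s == cur && go rest (tgt s)

-- f(g(x)) ~ x and g(f(x)) ~ x as rewrite rules: working from the inside out,
-- a symbol cancels against the one directly beneath it when they differ.
normalize :: RawTerm -> RawTerm
normalize (syms, v) = (foldr cancel [] syms, v)
  where
    cancel s (s' : rest) | s /= s' = rest
    cancel s acc = s : acc

-- All well-typed raw terms with at most n function symbols.
termsUpTo :: Int -> [RawTerm]
termsUpTo n =
  [ (syms, v) | k <- [0 .. n], syms <- replicateM k [F, G], v <- [A, B], wellTyped (syms, v) ]

-- Distinct normal forms among those terms.
distinctTerms :: Int -> [RawTerm]
distinctTerms = nub . map normalize . termsUpTo
```

`distinctTerms 8` evaluates to `[([],A),([],B),([F],A),([G],B)]`, which correspond to #bbx_A#, #bbx_B#, #tt "f"(bbx_A)#, and #tt "g"(bbx_B)#.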

In traditional model theory or universal algebra we tend to focus on multi-ary operations, i.e. function symbols that can take multiple inputs. By restricting ourselves to only unary function symbols, we expose a duality. For every theory #ccT#, we have the opposite theory, #ccT^(op)# defined by using the same sorts and function symbols but swapping the source and target of the function symbols which also requires rewriting the terms in the equations. The rewriting on terms is the obvious thing, e.g. if #ttf : A -> B#, #ttg : B -> C#, and #tth : C -> D#, then the term in #ccT#, #tt "h"(tt "g"(tt "f"(bbx_A)))# would become the term #tt "f"(tt "g"(tt "h"(bbx_D)))# in #ccT^(op)#. From this it should be clear that #(ccT^(op))^(op) = ccT#.
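The sketch below shows the rewriting of terms into the opposite theory for the example chain f : A -> B, g : B -> C, h : C -> D. It is an illustration, not from the post, and the `Theory` record and the function names are assumptions. The list of symbols reverses, and the variable takes the sort that was the original term's target. Applying the construction twice returns the original term, mirroring #(ccT^(op))^(op) = ccT#.

```haskell
data Sort = A | B | C | D deriving (Eq, Show)
data Sym  = F | G | H deriving (Eq, Show)

-- A theory's typing of its function symbols.
data Theory = Theory { srcOf :: Sym -> Sort, tgtOf :: Sym -> Sort }

-- The example theory: f : A -> B, g : B -> C, h : C -> D.
chain :: Theory
chain = Theory src tgt
  where
    src F = A
    src G = B
    src H = C
    tgt F = B
    tgt G = C
    tgt H = D

-- The opposite theory keeps the symbols and swaps source and target.
opTheory :: Theory -> Theory
opTheory th = Theory { srcOf = tgtOf th, tgtOf = srcOf th }

-- A raw term: symbols outermost first, applied to a variable of the given sort.
type RawTerm = ([Sym], Sort)

-- Target sort of a term in theory th.
termTarget :: Theory -> RawTerm -> Sort
termTarget _  ([], v)    = v
termTarget th (s : _, _) = tgtOf th s

-- Rewrite a term of th as a term of th^op.
opTerm :: Theory -> RawTerm -> RawTerm
opTerm th t@(syms, _) = (reverse syms, termTarget th t)
```

For example, `opTerm chain ([H, G, F], A)` is `([F, G, H], D)`, i.e. #tt "h"(tt "g"(tt "f"(bbx_A)))# becomes #tt "f"(tt "g"(tt "h"(bbx_D)))#.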

Product Theories

Given two theories #ccT_1# and #ccT_2# we can form a new theory #ccT_1 xx ccT_2# called the product theory of #ccT_1# and #ccT_2#. The sorts of this theory are pairs of sorts from #ccT_1# and #ccT_2#. The collection of function symbols is the disjoint union #bb "fun"(ccT_1) xx bb "sort"(ccT_2) + bb "sort"(ccT_1) xx bb "fun"(ccT_2)#. A disjoint union is like Haskell’s Either type. Here we’ll write #tt "inl"# and #tt "inr"# for the left and right injections, respectively. #tt "inl"# takes a function symbol from #ccT_1# and a sort from #ccT_2# and produces a function symbol of #ccT_1 xx ccT_2#; #tt "inr"# takes a sort from #ccT_1# and a function symbol from #ccT_2#. If #tt "f" : A -> B# in #ccT_1# and #C# is a sort of #ccT_2#, then #tt "inl"(tt "f", C) : (A, C) -> (B, C)#, and similarly for #tt "inr"#.

The collection of equations for #ccT_1 xx ccT_2# consists of the following:

  • for every equation #l ~~ r# of #ccT_1# and every sort #C# of #ccT_2#, we produce an equation #l' ~~ r'# by replacing each function symbol #ttf# in #l# and #r# with #tt "inl"(tt "f", C)#
  • similarly for equations of #ccT_2#
  • for every pair of function symbols #ttf : A -> B# from #ccT_1# and #ttg : C -> D# from #ccT_2#, we produce the equation #tt "inl"(tt "f", D)(tt "inr"(A, tt "g")(bbx_{:(A, C")":})) ~~ tt "inr"(B, tt "g")(tt "inl"(tt "f", C)(bbx_{:(A, C")":}))#

The above is probably unreadable. If you work through it, you can show that every term of #ccT_1 xx ccT_2# is equivalent to a pair of terms #(t_1, t_2)# where #t_1# is a term in #ccT_1# and #t_2# is a term in #ccT_2#. Using this equivalence, the first bullet is seen to be saying that if #l = r# in #ccT_1# and #C# is a sort in #ccT_2# then #(l, bbx_C) = (r, bbx_C)# in #ccT_1 xx ccT_2#. The second is similar. The third then states

#(t_1, bbx_C)((bbx_A, t_2)(bbx_{:"(A, C)":})) = (t_1, t_2)(bbx_{:"(A, C)":}) = (bbx_A, t_2)((t_1, bbx_C)(bbx_{:"(A, C)":}))#.

To establish the equivalence between terms of #ccT_1 xx ccT_2# and pairs of terms from #ccT_1# and #ccT_2#, we use the third bullet to move all the #tt "inl"#s outward, at which point we’ll have a sequence of #ccT_1# function symbols followed by a sequence of #ccT_2# function symbols, each corresponding to a term.

The above might seem a bit roundabout. An alternative approach would be to define the function symbols of #ccT_1 xx ccT_2# to be all pairs of all the terms from #ccT_1# and #ccT_2#. The problem with this approach is that it leads to an explosion in the number of function symbols and equations required. In particular, it easily produces an infinitude of function symbols and equations even when provided with theories that only have a finite number of sorts, function symbols, and equations.

As a concrete and useful example, consider the theory #ccT_bbbN# consisting of a single sort, #0#, a single function symbol, #tts#, and no equations. This theory has a term for each natural number, #n#, corresponding to #n# applications of #tts#. Now let’s articulate #ccT_bbbN xx ccT_bbbN#. It has one sort, #(0, 0)#, two function symbols, #tt "inl"(tt "s", 0)# and #tt "inr"(0, tt "s")#, and it has one equation, #tt "inl"(tt "s", 0)(tt "inr"(0, tt "s")(bbx_{:(0, 0")":})) ~~ tt "inr"(0, tt "s")(tt "inl"(tt "s", 0)(bbx_{:(0, 0")":}))#. Unsurprisingly, the terms of this theory correspond to pairs of natural numbers. If we had used the alternative definition, we’d have had an infinite number of function symbols and an infinite number of equations.
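The correspondence with pairs of natural numbers can be sketched by writing a term of #ccT_bbbN xx ccT_bbbN# as its sequence of the two function symbols. The single equation swaps an adjacent #tt "inl"# and #tt "inr"#, so every term can be rearranged with all the #tt "inl"#s outermost. Because no equation changes the counts, two terms are equal exactly when they have the same counts. The names below are illustrative, not from the post.

```haskell
-- The two function symbols of T_N x T_N: inl(s, 0) and inr(0, s).
data Sym = InlS | InrS deriving (Eq, Show)

-- A term is a sequence of symbols, outermost first, applied to x_(0,0).
-- Using the equation to swap adjacent InlS/InrS pairs, every term can be
-- rewritten with all InlS outermost ("moving the inls outward").
moveInlOutward :: [Sym] -> [Sym]
moveInlOutward syms = filter (== InlS) syms ++ filter (== InrS) syms

-- The normal form is determined by the two counts: a pair of naturals.
toPair :: [Sym] -> (Int, Int)
toPair syms = (length (filter (== InlS) syms), length (filter (== InrS) syms))

-- Back from a pair of naturals to the normal-form term.
fromPair :: (Int, Int) -> [Sym]
fromPair (m, n) = replicate m InlS ++ replicate n InrS
```

For example, `toPair [InlS, InrS, InlS]` is `(2, 1)`, and `fromPair (toPair ts) == moveInlOutward ts` holds for any symbol list `ts`.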

Nevertheless, for clarity I will typically write a term of a product theory as a pair of terms.

As a relatively easy exercise — easier than the above — you can formulate and define the disjoint sum of two theories #ccT_1 + ccT_2#. The idea is that every term of #ccT_1 + ccT_2# corresponds to either a term of #ccT_1# or a term of #ccT_2#. Don’t forget to define what happens to the equations.

Related to these, we have the theory #ccT_{:bb1:}#, which consists of one sort and no function symbols or equations, and #ccT_{:bb0:}# which consists of no sorts and thus no possibility for function symbols or equations. #ccT_{:bb1:}# has exactly one term while #ccT_{:bb0:}# has no terms.


Sometimes we’d like to talk about function symbols whose source is in one theory and target is in another. As a simple example that we’ll explore in more depth later, we may want function symbols whose sources are in a product theory. This would let us consider terms with multiple inputs.

The natural way to achieve this is to simply make a new theory that contains sorts from both theories plus the new function symbols. A collage, #ccK#, from a theory #ccT_1# to #ccT_2#, written #ccK : ccT_1 ↛ ccT_2#, is a theory whose collection of sorts is the disjoint union of the sorts of #ccT_1# and #ccT_2#. The function symbols of #ccK# consist of, for each function symbol #ttf : A -> B# in #ccT_1#, a function symbol #tt "inl"(ttf) : tt "inl"(A) -> tt "inl"(B)#, and similarly for function symbols from #ccT_2#. Equations from #ccT_1# and #ccT_2# are likewise taken and lifted appropriately, i.e. #ttf# is replaced with #tt "inl"(ttf)# or #tt "inr"(ttf)# as appropriate. Additional function symbols of the form #k : tt "inl"(A) -> tt "inr"(Z)# where #A# is a sort of #ccT_1# and #Z# is a sort of #ccT_2#, and potentially additional equations involving these function symbols, may be given. (If no additional function symbols are given, then this is exactly the disjoint sum of #ccT_1# and #ccT_2#.) These additional function symbols and equations are what differentiate two collages that have the same source and target theories. Note that there are no function symbols #tt "inr"(Z) -> tt "inl"(A)#, i.e. where #Z# is in #ccT_2# and #A# is in #ccT_1#. That is, there are no function symbols going the “other way”. To avoid clutter, I’ll typically assume that the sorts and function symbols of #ccT_1# and #ccT_2# are disjoint already, and dispense with the #tt "inl"#s and #tt "inr"#s.

Summarizing, we have #ccK(tt "inl"(A), tt "inl"(B)) ~= ccT_1(A, B)#, #ccK(tt "inr"(Y), tt "inr"(Z)) ~= ccT_2(Y, Z)#, and #ccK(tt "inr"(Z), tt "inl"(A)) = O/# for all #A#, #B#, #Y#, and #Z#. #ccK(tt "inl"(A), tt "inr"(Z))#, for any #A# and #Z#, is arbitrary: it is whatever the additional function symbols and equations generate. To distinguish them, I’ll call the function symbols that go from one theory to another bridges. More generally, an arbitrary term that has its source in one theory and its target in the other will be described as a bridging term.

Here’s a somewhat silly example. Consider #ccK_+ : ccT_bbbN xx ccT_bbbN ↛ ccT_bbbN# that has one bridge #tt "add" : (0, 0) -> 0# with the equations #tt "add"(tt "inl"(tts, 0)(bbx_("("0, 0")"))) ~~ tts(tt "add"(bbx_("("0, 0")")))# and #tt "add"(tt "inr"(0, tts)(bbx_("("0, 0")"))) ~~ tts(tt "add"(bbx_("("0, 0")")))#.

More usefully, if a bit degenerately, every theory induces a collage in the following way. Given a theory #ccT#, we can build the collage #ccK_ccT : ccT ↛ ccT# where the bridges consist of the following. For each sort, #A#, of #ccT#, we have the following bridge: #tt "id"_A : tt "inl"(A) -> tt "inr"(A)#. Then, for every function symbol, #ttf : A -> B# in #ccT#, we have the following equation: #tt "inl"(tt "f")(tt "id"_A(bbx_(tt "inl"(A)))) ~~ tt "id"_B(tt "inr"(tt "f")(bbx_(tt "inl"(A))))#. We have #ccK_ccT(tt "inl"(A), tt "inr"(B)) ~= ccT(A, B)#.

You can think of a bridging term in a collage as a sequence of function symbols partitioned into two parts by a bridge. Naturally, we might consider partitioning into more than two parts by having more than one bridge. It’s easy to generalize the definition of collage to combine an arbitrary number of theories, but I’ll take a different, probably less good, route. Given collages #ccK_1 : ccT_1 ↛ ccT_2# and #ccK_2 : ccT_2 ↛ ccT_3#, we can make the collage #ccK_2 @ ccK_1 : ccT_1 ↛ ccT_3# by defining its bridges to be triples of a bridge of #ccK_1#, #k_1 : A_1 -> A_2#, a term, #t : A_2 -> B_2# of #ccT_2#, and a bridge of #ccK_2#, #k_2 : B_2 -> B_3# which altogether will be a bridge of #ccK_2 @ ccK_1# going from #A_1 -> B_3#. These triples essentially represent a term like #k_2(t(k_1(bbx_(A_1))))#. With this intuition we can formulate the equations. For each equation #t'(k_1(t_1)) ~~ s'(k'_1(s_1))# where #k_1# and #k'_1# are bridges of #ccK_1#, we have for every bridge #k_2# of #ccK_2# and term #t# of the appropriate sorts #(k_2, t(t'(bbx)), k_1)(t_1) ~~ (k_2, t(s'(bbx)), k'_1)(s_1)# and similarly for equations involving the bridges of #ccK_2#.

This composition is associative… almost. Furthermore, the collages generated by theories, #ccK_ccT#, behave like identities for this composition… almost. It turns out these statements are true, but only up to isomorphism of theories. That is, #(ccK_3 @ ccK_2) @ ccK_1 ~= ccK_3 @ (ccK_2 @ ccK_1)#, but the two are not equal.

To talk about isomorphism of theories we need the notion of…

Interpretations

An interpretation of a theory gives meaning to the syntax of a theory. There are two nearly identical notions of interpretation for us: interpretation (into sets) and interpretation into a theory. I’ll define them in parallel. An interpretation (into a theory), #ccI#, is a mapping, written #⟦-⟧^ccI# though the superscript will often be omitted, which maps sorts to sets (sorts) and function symbols to functions (terms). The mapping satisfies:

#⟦"src"(f)⟧ = "src"(⟦f⟧)# and #⟦"tgt"(f)⟧ = "tgt"(⟦f⟧)# where #"src"# and #"tgt"# on the right are the domain and codomain operations for an interpretation.

We extend the mapping to a mapping on terms via:

  • #⟦bbx_A⟧ = x |-> x#, i.e. the identity function, or, for interpretation into a theory, #⟦bbx_A⟧ = bbx_{:⟦A⟧:}#
  • #⟦tt "f"(t)⟧ = ⟦tt "f"⟧ @ ⟦t⟧# or, for interpretation into a theory, #⟦tt "f"(t)⟧ = ⟦tt "f"⟧(⟦t⟧)#

and we require that for any equation of the theory, #l ~~ r#, #⟦l⟧ = ⟦r⟧#. (Technically, this is implicitly required for the extension of the mapping to terms to be well-defined, but it’s clearer to state it explicitly.) I’ll write #ccI : ccT -> bb "Set"# when #ccI# is an interpretation of #ccT# into sets, and #ccI’ : ccT_1 -> ccT_2# when #ccI’# is an interpretation of #ccT_1# into #ccT_2#.

An interpretation of the theory of isomorphisms produces a bijection between two specified sets. Spelling out a simple example where #bbbB# is the set of booleans:

  • #⟦A⟧ -= bbbB#
  • #⟦B⟧ -= bbbB#
  • #⟦tt "f"⟧ -= x |-> not x#
  • #⟦tt "g"⟧ -= x |-> not x#

plus the proof #not not x = x#.
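The two clauses that extend #⟦-⟧# to terms amount to a structural recursion. The sketch below assumes the GADT encoding of terms from earlier in the post, specialized to the theory of isomorphisms and the boolean interpretation just given. A closed type family `Sem` maps sorts to Haskell types, `symbol` maps function symbols to functions, and `interpret` extends both to terms. All of these names are illustrative.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeFamilies #-}

data Sort = A | B

data FunctionSymbol (s :: Sort) (t :: Sort) where
  F :: FunctionSymbol 'A 'B
  G :: FunctionSymbol 'B 'A

data Term (s :: Sort) (t :: Sort) where
  Var   :: Term t t
  Apply :: FunctionSymbol x t -> Term s x -> Term s t

-- Action on sorts: both A and B are interpreted as the booleans.
type family Sem (s :: Sort) where
  Sem 'A = Bool
  Sem 'B = Bool

-- Action on function symbols: f and g are both interpreted as 'not'.
symbol :: FunctionSymbol s t -> Sem s -> Sem t
symbol F = not
symbol G = not

-- Extension to terms: a variable denotes the identity function, and f(t)
-- denotes [[f]] composed after [[t]].
interpret :: Term s t -> Sem s -> Sem t
interpret Var         = id
interpret (Apply f t) = symbol f . interpret t
```

The equations of the theory hold in this interpretation because `interpret (Apply F (Apply G Var))` returns its argument unchanged on both booleans, and similarly for `g(f(x_A))`.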

As another simple example, we can interpret the theory of isomorphisms into itself slightly non-trivially.

  • #⟦A⟧ -= B#
  • #⟦B⟧ -= A#
  • #⟦tt "f"⟧ -= tt "g"(bbx_B)#
  • #⟦tt "g"⟧ -= tt "f"(bbx_A)#

As an (easy) exercise, you should define #pi_1 : ccT_1 xx ccT_2 -> ccT_1# and similarly #pi_2#. If you defined #ccT_1 + ccT_2# before, you should define #iota_1 : ccT_1 -> ccT_1 + ccT_2# and similarly for #iota_2#. As another easy exercise, show that an interpretation of #ccT_{:~=:}# is a bijection. In Haskell, an interpretation of #ccT_bbbN# would effectively be foldNat. Something very interesting happens when you consider what an interpretation of the collage generated by a theory, #ccK_ccT#, is. Spell it out. In a different vein, you can show that a collage #ccK : ccT_1 ↛ ccT_2# and an interpretation #ccT_1^(op) xx ccT_2 -> bb "Set"# are essentially the same thing in the sense that each gives rise to the other.
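One way to read the remark about foldNat is shown below. An interpretation of #ccT_bbbN# picks a set, here a Haskell type `a`, and a function #⟦tt "s"⟧ :: a -> a#. The term with #n# applications of #tts# is then interpreted as the n-fold composite of #⟦tt "s"⟧#, which is `\z -> foldNat s z n`. The `Nat`, `foldNat`, and `toNat` definitions are written here for illustration and are not taken from a library.

```haskell
-- Terms of T_N: the variable x_0 (Z) or s applied to a term (S).
data Nat = Z | S Nat

-- Interpreting a term, given the interpretation s of the function symbol and
-- a starting value z for the variable: s applied n times to z.
foldNat :: (a -> a) -> a -> Nat -> a
foldNat _ z Z     = z
foldNat s z (S n) = s (foldNat s z n)

-- Helper for building terms from machine integers (non-positive maps to Z).
toNat :: Int -> Nat
toNat k
  | k <= 0    = Z
  | otherwise = S (toNat (k - 1))
```

For example, `foldNat (+ 1) 0 (toNat 3)` is `3` and `foldNat not True (toNat 3)` is `False`. These are two different interpretations of the same term.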

Two theories are isomorphic if there exist interpretations #ccI_1 : ccT_1 -> ccT_2# and #ccI_2 : ccT_2 -> ccT_1# such that #⟦⟦A⟧^(ccI_1)⟧^(ccI_2) = A# and vice versa, and similarly for function symbols. In other words, each is interpretable in the other, and if you apply one interpretation and then the other, you end up where you started. Yet another way to say this is that there is a one-to-one correspondence between sorts and terms of each theory, and this correspondence respects substitution.

As a crucially important example, the set of terms, #ccT(A, B)#, can be extended to an interpretation. In particular, for each sort #A#, we have #ccT(A, -) : ccT -> bb "Set"#. Its action on a function symbol #ttf : B -> C# is the following:

#⟦tt "f"⟧^(ccT(A, -)) -= t |-> tt "f"(t)#

We have, dually, #ccT(-, A) : ccT^(op) -> bb "Set"# with the following action on a function symbol #ttf : B -> C#:

#⟦tt "f"⟧^(ccT(-, A)) -= t |-> t(tt "f"(bbx_B))#
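The two actions amount to post-composition (apply #ttf# on the outside) and pre-composition (substitute #tt "f"(bbx_B)# for the variable, putting #ttf# on the inside). The sketch below makes this concrete, assuming a bare list of symbol names, outermost first, with the variable left implicit. Sort checking is omitted to keep the example short, and the names are illustrative.

```haskell
-- A raw term as its function symbol names, outermost first; the single
-- variable at the bottom is implicit and sorts are not checked.
type RawTerm = [String]

-- Action of f on T(A, -): t |-> f(t), i.e. apply f on the outside.
postcompose :: String -> RawTerm -> RawTerm
postcompose f t = f : t

-- Action of f : B -> C on T(-, A): t |-> t(f(x_B)), i.e. substitute f(x_B)
-- for the variable, which puts f on the inside.
precompose :: String -> RawTerm -> RawTerm
precompose f t = t ++ [f]

-- Render a raw term with an explicit variable name, for inspection.
render :: String -> RawTerm -> String
render v = foldr (\f inner -> f ++ "(" ++ inner ++ ")") v
```

For example, `render "x_A" (postcompose "h" ["g", "f"])` is `"h(g(f(x_A)))"`, and `render "x_B" (precompose "f" ["h", "g"])` is `"h(g(f(x_B)))"`.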

We can abstract from both parameters making #ccT(-, =) : ccT^(op) xx ccT -> bb "Set"# which, by an early exercise, can be shown to correspond with the collage #ccK_ccT#.

Via an abuse of notation, I’ll identify #ccT^(op)(A, -)# with #ccT(-, A)#, though technically we only have an isomorphism between the interpretations, and to talk about isomorphisms between interpretations we need the notion of…

Homomorphisms

The theories we’ve presented are (multi-sorted) universal algebra theories. Universal algebra allows us to specify a general notion of “homomorphism” that generalizes monoid homomorphism or group homomorphism or ring homomorphism or lattice homomorphism.

In universal algebra, the algebraic theory of groups consists of a single sort, a nullary operation, #1#, a binary operation, #*#, a unary operation, #tt "inv"#, and some equations which are unimportant for us. Operations correspond to our function symbols except that they are not restricted to being unary. A particular group is a particular interpretation of the algebraic theory of groups, i.e. it is a set and three functions into the set. A group homomorphism between two groups, i.e. between two interpretations, is then a function between their underlying sets that preserves the operations. In a traditional presentation this would look like the following:

Say #alpha : G -> K# is a group homomorphism from the group #G# to the group #K# and #g, h in G# then:

  • #alpha(1_G) = 1_K#
  • #alpha(g *_G h) = alpha(g) *_K alpha(h)#
  • #alpha(tt "inv"_G(g)) = tt "inv"_K(alpha(g))#

Using something more akin to our notation, it would look like:

  • #alpha(⟦1⟧^G) = ⟦1⟧^K#
  • #alpha(⟦*⟧^G(g,h)) = ⟦*⟧^K(alpha(g), alpha(h))#
  • #alpha(⟦tt "inv"⟧^G(g)) = ⟦tt "inv"⟧^K(alpha(g))#

The #tt "inv"# case is the most relevant for us as it is unary. However, for us, a function symbol #ttf# may have a different source and target and so we may need a different function on each side of the equation. E.g. for #ttf : A -> B#, #alpha : ccI_1 -> ccI_2#, and #a in ⟦A⟧^(ccI_1)# we’d have:

#alpha_B(⟦tt "f"⟧^(ccI_1)(a)) = ⟦tt "f"⟧^(ccI_2)(alpha_A(a))#

So a homomorphism #alpha : ccI_1 -> ccI_2 : ccT -> bb "Set"# is a family of functions, one for each sort of #ccT#, that satisfies the above equation for every function symbol of #ccT#. We call the individual functions making up #alpha# the components of #alpha#, and we have #alpha_A : ⟦A⟧^(ccI_1) -> ⟦A⟧^(ccI_2)#. The definition for an interpretation into a theory, #ccT_2#, is identical except the components of #alpha# are terms of #ccT_2# and #a# can be replaced with #bbx_(⟦A⟧^(ccI_1))#. Two interpretations are isomorphic if we have a homomorphism #alpha : ccI_1 -> ccI_2# such that each component is a bijection. This is the same as requiring a homomorphism #beta : ccI_2 -> ccI_1# such that for each #A#, #alpha_A(beta_A(x)) = x# and #beta_A(alpha_A(x)) = x#. A similar statement can be made for interpretations into theories; just replace #x# with #bbx_(⟦A⟧)#.
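As a small concrete case, the sketch below uses #ccT_bbbN#, which has one sort and one function symbol #tts#, so a homomorphism has a single component and a single law to check. The two interpretations are the integers with #⟦tt "s"⟧ = (+ 1)# and the integers modulo 3 with successor mod 3. Reduction mod 3 is a homomorphism between them, and reduction mod 2 is not. The interpretations and names are chosen for illustration only.

```haskell
-- Interpretation I1 of T_N: the integers, with [[s]] = (+ 1).
sI1 :: Integer -> Integer
sI1 = (+ 1)

-- Interpretation I2 of T_N: integers mod 3 (represented by 0, 1, 2),
-- with [[s]] = successor mod 3.
sI2 :: Integer -> Integer
sI2 k = (k + 1) `mod` 3

-- Candidate homomorphism I1 -> I2: reduction mod 3.
alpha :: Integer -> Integer
alpha a = a `mod` 3

-- The homomorphism law for the only function symbol s:
--   h ([[s]]^I1 (a)) == [[s]]^I2 (h a)
-- checked on a finite list of sample points.
lawHoldsOn :: (Integer -> Integer) -> [Integer] -> Bool
lawHoldsOn h = all (\a -> h (sI1 a) == sI2 (h a))
```

`lawHoldsOn alpha [-10 .. 10]` is `True`. `lawHoldsOn (`mod` 2) [0 .. 5]` is `False`, because at `a = 1` the left side is `0` and the right side is `2`.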

Another way to look at homomorphisms is via collages. A homomorphism #alpha : ccI_1 -> ccI_2 : ccT -> bb "Set"# gives rise to an interpretation of the collage #ccK_ccT#. The interpretation #ccI_alpha : ccK_ccT -> bb "Set"# is defined by:

  • #⟦tt "inl"(A)⟧^(ccI_alpha) -= ⟦A⟧^(ccI_1)#
  • #⟦tt "inr"(A)⟧^(ccI_alpha) -= ⟦A⟧^(ccI_2)#
  • #⟦tt "inl"(ttf)⟧^(ccI_alpha) -= ⟦ttf⟧^(ccI_1)#
  • #⟦tt "inr"(ttf)⟧^(ccI_alpha) -= ⟦ttf⟧^(ccI_2)#
  • #⟦tt "id"_A⟧^(ccI_alpha) -= alpha_A#

The homomorphism law guarantees that it satisfies the equation on #tt "id"#. Conversely, given an interpretation of #ccK_ccT#, we have the homomorphism #⟦tt "id"⟧ : ⟦tt "inl"(-)⟧ -> ⟦tt "inr"(-)⟧ : ccT -> bb "Set"#, and the equation on #tt "id"# is exactly the homomorphism law.


Consider a homomorphism #alpha : ccT(A, -) -> ccI#. It needs to satisfy, for every function symbol #ttf : C -> D# and every term #t : A -> C#:

#alpha_D(tt "f"(t)) = ⟦tt "f"⟧^ccI(alpha_C(t))#

Looking at this equation, the possibility of viewing it as a recursive “definition” leaps out, suggesting that the action of #alpha# is completely determined by its action on the variables. Something like this, for example:

#alpha_D(tt "f"(tt "g"(tt "h"(bbx_A)))) = ⟦tt "f"⟧(alpha_C(tt "g"(tt "h"(bbx_A)))) = ⟦tt "f"⟧(⟦tt "g"⟧(alpha_B(tt "h"(bbx_A)))) = ⟦tt "f"⟧(⟦tt "g"⟧(⟦tt "h"⟧(alpha_A(bbx_A))))#

We can easily establish that there’s a one-to-one correspondence between the set of homomorphisms #ccT(A, -) -> ccI# and the elements of the set #⟦A⟧^ccI#. Given a homomorphism, #alpha#, we get an element of #⟦A⟧^ccI# via #alpha_A(bbx_A)#. Conversely, given an element #a in ⟦A⟧^ccI#, we can define a homomorphism #a^**# via:

  • #a_D^**(tt "f"(t)) -= ⟦tt "f"⟧^ccI(a_C^**(t))#
  • #a_A^**(bbx_A) -= a#

which clearly satisfies the condition on homomorphisms by definition. It’s easy to verify that #(alpha_A(bbx_A))^** = alpha#, and immediately true that #a_A^**(bbx_A) = a#, establishing the bijection.
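To make the correspondence concrete, here is a minimal Python sketch. It assumes a unary theory in which every term out of #A# is the variable #bbx_A# followed by a sequence of function symbol applications, encoded as a tuple of symbol names listed innermost first, and it uses a hypothetical interpretation that sends every sort to the integers (sort-checking is omitted). `star` builds #a^**# by the recursive definition above, and `at_variable` recovers #alpha_A(bbx_A)#.

```python
from typing import Callable, Dict, Tuple

# A term out of sort A: symbols applied to x_A, listed innermost first,
# so ("h", "g", "f") encodes f(g(h(x_A))) and () encodes x_A itself.
Term = Tuple[str, ...]

# Hypothetical interpretation I: every sort is sent to the integers.
INTERP: Dict[str, Callable[[int], int]] = {
    "h": lambda n: n + 1,
    "g": lambda n: 2 * n,
    "f": lambda n: n - 3,
}


def star(a: int) -> Callable[[Term], int]:
    """a^**: the homomorphism T(A, -) -> I that sends x_A to a."""
    def alpha(t: Term) -> int:
        value = a                      # a^**(x_A) = a
        for sym in t:                  # a^**(f(t)) = [[f]](a^**(t))
            value = INTERP[sym](value)
        return value
    return alpha


def at_variable(alpha: Callable[[Term], int]) -> int:
    """alpha |-> alpha_A(x_A): evaluate alpha on the bare variable."""
    return alpha(())


alpha = star(5)
t = ("h", "g")
# The homomorphism law: alpha(f(t)) == [[f]](alpha(t))
assert alpha(t + ("f",)) == INTERP["f"](alpha(t))
print(alpha(("h", "g", "f")))   # 9, i.e. (2 * (5 + 1)) - 3
print(at_variable(alpha))       # 5
```

Going around the loop in either direction returns what you started with: `at_variable(star(a)) == a`, and `star(at_variable(alpha))` agrees with `alpha` on every term because both obey the same recursion.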

We can state something stronger. Given any homomorphism #alpha : ccT(A, -) -> ccI# and any function symbol #ttg : A -> X#, we can make a new homomorphism #alpha * ttg : ccT(X, -) -> ccI# via the following definition:

#(alpha * ttg)(t) = alpha(t(tt "g"(bbx_A)))#

Verifying that this is a homomorphism is straightforward:

#(alpha * ttg)(tt "f"(t)) = alpha(tt "f"(t(tt "g"(bbx_A)))) = ⟦tt "f"⟧(alpha(t(tt "g"(bbx_A)))) = ⟦tt "f"⟧((alpha * ttg)(t))#

and like any homomorphism of this form, as we’ve just established, it is completely determined by its action on variables, namely #(alpha * ttg)_X(bbx_X) = alpha_X(tt "g"(bbx_A)) = ⟦tt "g"⟧(alpha_A(bbx_A))#. In particular, if #alpha = a^**#, then we have #a^** * ttg = (⟦tt "g"⟧(a))^**#. Together these facts establish that we have an interpretation #ccY : ccT -> bb "Set"# such that #⟦A⟧^ccY -= (ccT(A, -) -> ccI)#, the set of homomorphisms, and #⟦tt "g"⟧^ccY(alpha) -= alpha * tt "g"#. The work we did before established that we have homomorphisms #(-)(bbx) : ccY -> ccI# and #(-)^** : ccI -> ccY# that are inverses. This is true for all theories and all interpretations, since at no point did we use any particular facts about them. This statement is the (dual form of the) Yoneda lemma. To get the usual form, simply replace #ccT# with #ccT^(op)#. A particularly important and useful case (so useful it’s usually used tacitly) occurs when we choose #ccI = ccT(B,-)#: we get #(ccT(A, -) -> ccT(B, -)) ~= ccT(B, A)# or, choosing #ccT^(op)# everywhere, #(ccT(-, A) -> ccT(-, B)) ~= ccT(A, B)#, which states that a term from #A# to #B# is equivalent to a homomorphism from #ccT(-, A)# to #ccT(-, B)#.
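The action #alpha * ttg# is easy to see in a small Python sketch. It assumes a hypothetical encoding in which a term is the tuple of function symbols applied to the variable, innermost first, so substituting #tt "g"(bbx_A)# for #bbx_X# just prepends `"g"`. The check below confirms, on a few sample terms only, that #a^** * ttg# and #(⟦tt "g"⟧(a))^**# agree; it illustrates the identity rather than proving it.

```python
from typing import Callable, Dict, Tuple

Term = Tuple[str, ...]  # innermost-first symbols applied to the variable

# Hypothetical interpretation: every sort is sent to the integers.
INTERP: Dict[str, Callable[[int], int]] = {
    "g": lambda n: 2 * n,
    "f": lambda n: n - 3,
    "k": lambda n: n * n,
}


def star(a: int) -> Callable[[Term], int]:
    """a^**: the homomorphism determined by sending the variable to a."""
    def alpha(t: Term) -> int:
        value = a
        for sym in t:
            value = INTERP[sym](value)
        return value
    return alpha


def act(alpha: Callable[[Term], int], g: str) -> Callable[[Term], int]:
    """alpha * g, defined by (alpha * g)(t) = alpha(t(g(x_A)))."""
    return lambda t: alpha((g,) + t)


samples = [(), ("f",), ("k", "f"), ("f", "k", "g")]
a = 4
lhs = act(star(a), "g")          # a^** * g
rhs = star(INTERP["g"](a))       # ([[g]](a))^**
print([lhs(t) for t in samples] == [rhs(t) for t in samples])  # True
```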

There is another result, dual in a different way, called the co-Yoneda lemma. It turns out it is a corollary of the fact that for a collage #ccK : ccT_1 ↛ ccT_2#, #ccK_(ccT_2) @ ccK ~= ccK# and the dual is just the composition the other way. To get (closer to) the precise result, we need to be able to turn an interpretation into a collage. Given an interpretation, #ccI : ccT -> bb "Set"#, we can define a collage #ccK_ccI : ccT_bb1 ↛ ccT# whose bridges from #1 -> A# are the elements of #⟦A⟧^ccI#. Given this, the co-Yoneda lemma is the special case, #ccK_ccT @ ccK_ccI ~= ccK_ccI#.

Note that the Yoneda and co-Yoneda lemmas only apply to interpretations into sets, as #ccY# involves the set of homomorphisms.


The Yoneda lemma suggests that the interpretations #ccT(A, -)# and #ccT(-, A)# are particularly important and this will be borne out as we continue.

We call an interpretation, #ccI : ccT^(op) -> bb "Set"# representable if #ccI ~= ccT(-, X)# for some sort #X#. We then say that #X# represents #ccI#. What this states is that every term of sort #X# corresponds to an element in one of the sets that make up #ccI#, and these transform appropriately. There’s clearly a particularly important element, namely the image of #bbx_X# which corresponds to an element in #⟦X⟧^ccI#. This element is called the universal element. The dual concept is, for #ccI : ccT -> bb "Set"#, #ccI# is co-representable if #ccI ~= ccT(X, -)#. We will also say #X# represents #ccI# in this case as it actually does when we view #ccI# as an interpretation of #(ccT^(op))^(op)#.

As a rather liberating exercise, you should establish the following result called parameterized representability. Assume we have theories #ccT_1# and #ccT_2#, and a family of sorts of #ccT_2#, #X#, and a family of interpretations of #ccT_2^(op)#, #ccI#, both indexed by sorts of #ccT_1#, such that for each #A in bb "sort"(ccT_1)#, #X_A# represents #ccI_A#, i.e. #ccI_A ~= ccT_2(-, X_A)#. Given all this, there are unique interpretations #ccX : ccT_1 -> ccT_2# and #ccI : ccT_1 xx ccT_2^(op) -> bb "Set"# where #⟦A⟧^(ccX) -= X_A# and #"⟦("A, B")⟧"^ccI -= ⟦B⟧^(ccI_A)# such that #ccI ~= ccT_2(=,⟦-⟧^ccX)#. To be a bit more clear, the right hand side means #(A, B) |-> ccT_2(B, ⟦A⟧^ccX)#. Simply by choosing #ccT_1# to be a product of multiple theories, we can generalize this result to an arbitrary number of parameters. What makes this result liberating is that we just don’t need to worry about the parameters: they will automatically transform homomorphically. As a technical warning though, since two interpretations may have the same action on sorts but a different action on function symbols, if the family #X_A# was derived from an interpretation #ccJ#, i.e. #X_A -= ⟦A⟧^ccJ#, it may not be the case that #ccX = ccJ#.

Let’s look at some examples.

As a not-so-special case of representability, we can consider #ccI -= ccK(tt "inl"(-), tt "inr"(Z))# where #ccK : ccT_1 ↛ ccT_2#. Saying that #A# represents #ccI# in this case is saying that bridging terms of sort #tt "inr"(Z)#, i.e. sort #Z# in #ccT_2#, in #ccK#, correspond to terms of sort #A# in #ccT_1#. We’ll call the universal element of this representation the universal bridge (though technically it may be a bridging term, not a bridge). Let’s write #varepsilon# for this universal bridge. What representability states in this case is given any bridging term #k# of sort #Z#, there exists a unique term #|~ k ~|# of sort #A# such that #k = varepsilon(|~ k ~|)#. If we have an interpretation #ccX : ccT_2 -> ccT_1# such that #⟦Z⟧^ccX# represents #ccK(tt "inl"(-), tt "inr"(Z))# for each sort #Z# of #ccT_2#, we say we have a right representation of #ccK#. Note that the universal bridges become a family #varepsilon_Z : ⟦Z⟧^ccX -> Z#. Similarly, if #ccK(tt "inl"(A), tt "inr"(-))# is co-representable for each #A#, we say we have a left representation of #ccK#. The co-universal bridge is then a bridging term #eta_A : A -> ⟦A⟧# such that for any bridging term #k# with source #A#, there exists a unique term #|__ k __|# in #ccT_2# such that #k = |__ k __|(eta_A)#. For reference, we’ll call these equations universal properties of the left/right representation. Parameterized representability implies that a left/right representation is essentially unique.

Define #ccI_bb1# via #⟦A⟧^(ccI_bb1) -= bb1# where #bb1# is some one element set. #⟦ttf⟧^(ccI_bb1)# is the identity function for all function symbols #ttf#. We’ll say a theory #ccT# has a unit sort or has a terminal sort if there is a sort that we’ll also call #bb1# that represents #ccI_bb1#. Spelling out what that means, we first note that there is nothing notable about the universal element as it’s the only element. However, writing the homomorphism #! : ccI_bb1 -> ccT(-, bb1)# and noting that since there’s only one element of #⟦A⟧^(ccI_bb1)# we can, with a slight abuse of notation, also write the term #!# picks out as #!# which gives the equation:

#!_B(tt "g"(t)) = !_A(t)# for any function symbol #ttg : A -> B# and term, #t#, of sort #A#, note #!_A : A -> bb1#.

This equation states what the isomorphism also fairly directly states: there is exactly one term of sort #bb1# from any sort #A#, namely #!_A(bbx_A)#. The dual notion is called a void sort or an initial sort and will usually be notated #bb0#, the analog of #!# will be written as #0#. The resulting equation is:

#tt "f"(0_A) = 0_B# for any function symbol #ttf : A -> B#, note #0_A : bb0 -> A#.

For the next example, I’ll leverage collages. Consider the collage #ccK_2 : ccT ↛ ccT xx ccT# whose bridges from #A -> (B, C)# consist of pairs of terms #t_1 : A -> B# and #t_2 : A -> C#. #ccT# has pairs if #ccK_2# has a right representation. We’ll write #(B, C) |-> B xx C# for the representing interpretation’s action on sorts. We’ll write the universal bridge as #(tt "fst"(bbx_(B xx C)), tt "snd"(bbx_(B xx C)))#. The universal property then looks like #(tt "fst"(bbx_(B xx C)), tt "snd"(bbx_(B xx C)))((: t_1, t_2 :)) = (t_1, t_2)# where #(: t_1, t_2 :) : A -> B xx C# is the unique term induced by the bridge #(t_1, t_2)#. The universal property implies the following equations:

  • #(: tt "fst"(bbx_(B xx C)), tt "snd"(bbx_(B xx C)) :) = bbx_(B xx C)#
  • #tt "fst"((: t_1, t_2 :)) = t_1#
  • #tt "snd"((: t_1, t_2 :)) = t_2#
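In #bb "Set"# the pair sort is the cartesian product, and the universal property can be checked by brute force on small finite sets. The Python sketch below (the sets and the example terms `t1`, `t2` are arbitrary choices for illustration) enumerates every function #C -> A xx B#, confirms that exactly one of them satisfies the two projection equations, namely the pairing #(: t_1, t_2 :)#, and checks the first equation by pairing the two projections.

```python
from itertools import product

# Small finite sets chosen for illustration.
C = [0, 1, 2]
A = ["a", "b"]
B = [True, False]
AxB = list(product(A, B))          # the cartesian product A x B


def fst(p):
    return p[0]


def snd(p):
    return p[1]


def pairing(t1, t2):
    """(: t1, t2 :) : C -> A x B."""
    return lambda c: (t1(c), t2(c))


def t1(c):                         # example term C -> A
    return "a" if c % 2 == 0 else "b"


def t2(c):                         # example term C -> B
    return c > 0


# Every function C -> A x B, represented as a dict from C to A x B.
all_maps = [dict(zip(C, images)) for images in product(AxB, repeat=len(C))]
solutions = [h for h in all_maps
             if all(fst(h[c]) == t1(c) and snd(h[c]) == t2(c) for c in C)]

print(len(all_maps), len(solutions))                          # 64 1
print(all(solutions[0][c] == pairing(t1, t2)(c) for c in C))  # True
print(all(pairing(fst, snd)(p) == p for p in AxB))            # True
```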

Notably, regardless of whether #ccK_2# has a right representation, i.e. regardless of whether #ccT# has pairs, it always has a left representation. The co-universal bridge is #(bbx_A, bbx_A)# and the unique term #|__(t_1, t_2)__|# is #tt "inl"(t_1, bbx_A)(tt "inr"(bbx_A, t_2)(bbx_("("A,A")")))#.

Define an interpretation #Delta : ccT -> ccT xx ccT# so that #⟦A⟧^Delta -= (A,A)# and similarly for function symbols. #Delta# left represents #ccK_2#. If the interpretation #(B,C) |-> B xx C# right represents #ccK_2#, then we say we have an adjunction between #Delta# and #(- xx =)#, written #Delta ⊣ (- xx =)#, and that #Delta# is left adjoint to #(- xx =)#, and conversely #(- xx =)# is right adjoint to #Delta#.

More generally, whenever we have the situation #ccT_1(⟦-⟧^(ccI_1), =) ~= ccT_2(-, ⟦=⟧^(ccI_2))# we say that #ccI_1 : ccT_2 -> ccT_1# is left adjoint to #ccI_2 : ccT_1 -> ccT_2# or conversely that #ccI_2# is right adjoint to #ccI_1#. We call this arrangement an adjunction and write #ccI_1 ⊣ ccI_2#. Note that we will always have this situation if #ccI_1# left represents and #ccI_2# right represents the same collage. As we noted above, parameterized representability actually determines one adjoint given (its action on sorts and) the other adjoint. With this we can show that adjoints are unique up to isomorphism, that is, given two left adjoints to an interpretation, they must be isomorphic. Similarly for right adjoints. This means that stating something is a left or right adjoint to some other known interpretation essentially completely characterizes it. One issue with adjunctions is that they tend to be wholesale. Let’s say the pair sort #A xx B# existed but no other pair sorts existed, then the (no longer parameterized) representability approach would work just fine, but the adjunction would no longer exist.

Here are a few exercises using this. First, a moderately challenging one (until you catch the pattern): spell out the details of the left adjoint to #Delta#. We say a theory has sums, and write those sums as #A + B#, if #(- + =) ⊣ Delta#. Recast void and unit sorts using adjunctions and/or left/right representations. As a terminological note, we say a theory has finite products if it has unit sorts and pairs. Similarly, a theory has finite sums or has finite coproducts if it has void sorts and sums. An even more challenging exercise is the following: a theory has exponentials if it has finite products and for every sort #A#, #(A xx -) ⊣ (A => -)# (note that parameterized representability applies to #A#). Spell out the equations characterizing #A => B#.

Finite Product Theories

Finite products start to lift us off the ground. So far the theories we’ve been working with have been extremely basic: a language with only unary functions, all terms being just a sequence of applications of function symbols. It shouldn’t be underestimated though. It’s more than enough to do monoid and group theory. A good amount of graph theory can be done with just this. And obviously we were able to establish several general results assuming only this structure. Nevertheless, while we can talk about specific groups, say, we can’t talk about the theory of groups. Finite products change this.

A theory with finite products allows us to talk about multi-ary function symbols and terms by considering unary function symbols from products. This allows us to do all of universal algebra. For example, the theory of groups, #ccT_(bb "Grp")#, consists of a sort #S# and all its products, which we’ll abbreviate as #S^n# with #S^0 -= bb1# and #S^(n+1) -= S xx S^n#. It has three function symbols #tte : bb1 -> S#, #ttm : S^2 -> S#, and #tti : S -> S#, plus the ones that having finite products requires. In fact, instead of just heaping an infinite number of sorts and function symbols into our theory (and we haven’t even gotten to equations), let’s define a compact set of data from which we can generate all this data.

A signature, #Sigma#, consists of a collection of sorts, #sigma#, a collection of multi-ary function symbols, and a collection of equations. Equations still remain pairs of terms, but we now need to extend our definition of terms for this context. A term (in a signature) is either a variable, #bbx_i^[A_0,A_1,...,A_n]# where the #A_i# are sorts and #0 <= i <= n#, the operators #tt "fst"# or #tt "snd"# applied to a term, the unit term written #(::)^A# with sort #A#, a pair of terms written #(: t_1, t_2 :)#, or the (arity correct) application of a multi-ary function symbol to a series of terms, e.g. #tt "f"(t_1, t_2, t_3)#. As a Haskell data declaration, it might look like:

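-- Assumed representations for sorts and function symbols (not specified in the text)
data Sort = One | Prod Sort Sort | Base String
    deriving (Show, Eq)
type FunctionSymbol = String

data SigTerm
    = SigVar [Sort] Int                  -- x_i^[A_0,...,A_n]
    | Fst SigTerm
    | Snd SigTerm
    | Unit Sort                          -- (::)^A
    | Pair SigTerm SigTerm               -- (: t_1, t_2 :)
    | SigApply FunctionSymbol [SigTerm]  -- f(t_1, ..., t_n)
    deriving (Show, Eq)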

At this point, sorting (i.e. typing) the terms is no longer trivial, though it is still pretty straightforward. Sorts are either #bb1#, or #A xx B# for sorts #A# and #B#, or a sort #A in sigma#. The sources of function symbols and terms are lists of sorts.

  • #bbx_i^[A_0, A_1, ..., A_n] : [A_0, A_1, ..., A_n] -> A_i#
  • #(::)^A : [A] -> bb1#
  • #(: t_1, t_2 :) : bar S -> T_1 xx T_2# where #t_i : bar S -> T_i#
  • #tt "fst"(t) : bar S -> T_1# where #t : bar S -> T_1 xx T_2#
  • #tt "snd"(t) : bar S -> T_2# where #t : bar S -> T_1 xx T_2#
  • #tt "f"(t_1, ..., t_n) : bar S -> T# where #t_i : bar S -> T_i# and #ttf : [T_1,...,T_n] -> T#
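These rules translate almost line for line into a checker. The Python sketch below uses tagged tuples mirroring the constructors of `SigTerm`; the encodings of sorts (`ONE`, `prod`, `base`) and of a signature as a dict from symbol names to argument and result sorts are assumptions for illustration. It returns the pair (source, target) and raises `TypeError` when a rule is violated. An application with no arguments is rejected, since the rules give it no way to determine a source.

```python
from typing import Dict, List, Tuple

Sort = tuple
ONE: Sort = ("one",)


def prod(a: Sort, b: Sort) -> Sort:
    return ("prod", a, b)


def base(name: str) -> Sort:
    return ("base", name)


# Hypothetical signature encoding: symbol name -> (argument sorts, result sort).
Signature = Dict[str, Tuple[List[Sort], Sort]]


def sort_of(term: tuple, sig: Signature) -> Tuple[Tuple[Sort, ...], Sort]:
    """Return (source, target) of a signature term, or raise TypeError."""
    tag = term[0]
    if tag == "var":                  # x_i^[A_0..A_n] : [A_0..A_n] -> A_i
        _, ctx, i = term
        if not 0 <= i < len(ctx):
            raise TypeError("variable index out of range")
        return tuple(ctx), ctx[i]
    if tag == "unit":                 # (::)^A : [A] -> 1
        return (term[1],), ONE
    if tag == "pair":                 # (: t1, t2 :) : S -> T1 x T2
        s1, ty1 = sort_of(term[1], sig)
        s2, ty2 = sort_of(term[2], sig)
        if s1 != s2:
            raise TypeError("pair components have different sources")
        return s1, prod(ty1, ty2)
    if tag in ("fst", "snd"):         # projections out of T1 x T2
        s, ty = sort_of(term[1], sig)
        if ty[0] != "prod":
            raise TypeError(f"{tag} applied to a non-product sort")
        return s, ty[1] if tag == "fst" else ty[2]
    if tag == "app":                  # f(t1..tn) : S -> T
        _, f, args = term
        params, result = sig[f]
        if not args:
            raise TypeError("nullary application has no determinable source")
        if len(args) != len(params):
            raise TypeError("arity mismatch")
        typed = [sort_of(t, sig) for t in args]
        if len({s for s, _ in typed}) != 1:
            raise TypeError("arguments have different sources")
        if [ty for _, ty in typed] != list(params):
            raise TypeError("argument sorts do not match the symbol")
        return typed[0][0], result
    raise TypeError(f"unknown term {term!r}")


S = base("S")
sig: Signature = {"e": ([ONE], S), "m": ([S, S], S)}
# m(e((::)^S), x_0^[S]) : [S] -> S
t = ("app", "m", [("app", "e", [("unit", S)]), ("var", [S], 0)])
print(sort_of(t, sig) == ((S,), S))   # True
```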

The point of a signature was to represent a theory, so we can compile a term of a signature into a term of a theory with finite products. The theory generated from a signature #Sigma# has the same sorts as #Sigma#. The equations will be the equations of #Sigma#, with the terms compiled as will be described momentarily, plus, for every pair of sorts, the equations that describe pairs and the equations for #!#. Finally, we need to describe how to take a term of the signature and make a function symbol of the theory, but before we do that we need to explain how to convert the sources of terms, which are lists. That’s just a conversion to right nested pairs, #[A_0,...,A_n] |-> A_0 xx (... xx (A_n xx bb1) ... )#. The compilation of a term #t#, which we’ll write as #ccC[t]#, is defined as follows:

  • #ccC[bbx_i^[A_0, A_1, ..., A_n]] = tt "fst"(tt "snd"^i(bbx_(A_0 xx (... xx (A_n xx bb1) ...))))# where #tt "snd"^i# means the #i#-fold application of #tt "snd"#
  • #ccC[(::)^A] = !_A#
  • #ccC[(: t_1, t_2 :)] = (: ccC[t_1], ccC[t_2] :)#
  • #ccC[tt "fst"(t)] = tt "fst"(ccC[t])#
  • #ccC[tt "snd"(t)] = tt "snd"(ccC[t])#
  • #ccC[tt "f"(t_1, ..., t_n)] = tt "f"((: ccC[t_1], (: ... , (: ccC[t_n], ! :) ... :) :))#
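A sketch of #ccC# in Python, producing theory terms as strings. The tagged-tuple input and the concrete output syntax (`fst`, `snd`, `<t1, t2>`, `!`) are assumptions made for readability. For a variable it projects the #i#-th component out of the right-nested product by applying `snd` #i# times and then `fst`, so `fst` is the outermost operation.

```python
def nest(sources):
    """[A_0, ..., A_n] |-> A_0 x (... x (A_n x 1) ...), rendered as a string."""
    result = "1"
    for s in reversed(sources):
        result = f"({s} x {result})"
    return result


def compile_term(term):
    """C[t]: compile a signature term (tagged tuple) to a theory term string."""
    tag = term[0]
    if tag == "var":                  # i-th projection out of the nested product
        _, ctx, i = term
        body = f"x_{nest(ctx)}"
        for _ in range(i):
            body = f"snd({body})"
        return f"fst({body})"
    if tag == "unit":                 # C[(::)^A] = !_A
        return f"!_{term[1]}"
    if tag == "pair":                 # C[(: t1, t2 :)] = <C[t1], C[t2]>
        return f"<{compile_term(term[1])}, {compile_term(term[2])}>"
    if tag in ("fst", "snd"):
        return f"{tag}({compile_term(term[1])})"
    if tag == "app":                  # f(<C[t1], <..., <C[tn], !>...>>)
        _, f, args = term
        packed = "!"
        for arg in reversed(args):
            packed = f"<{compile_term(arg)}, {packed}>"
        return f"{f}({packed})"
    raise ValueError(f"unknown term {term!r}")


print(nest(["A", "B", "C"]))                          # (A x (B x (C x 1)))
print(compile_term(("var", ["A", "B", "C"], 2)))
print(compile_term(("app", "f", [("var", ["A", "B"], 0),
                                 ("var", ["A", "B"], 1)])))
```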

As you may have noticed, the generated theory will have an infinite number of sorts, an infinite number of function symbols, and an infinite number of equations no matter what the signature is — even an empty one! Having an infinite number of things isn’t a problem as long as we can algorithmically describe them and this is what the signature provides. Of course, if you’re a (typical) mathematician you nominally don’t care about an algorithmic description. Besides being compact, signatures present a nicer term language. The theories are like a core or assembly language. We could define a slightly nicer variation where we keep a context and manage named variables leading to terms-in-context like:

#x:A, y:B |-- tt "f"(x, x, y)#

which is

#tt "f"(bbx_0^[A,B], bbx_0^[A,B], bbx_1^[A,B])#

for our current term language for signatures. Of course, compilation will be (slightly) trickier for the nicer language.
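Most of the extra work for the named-variable language is replacing each name by its position in the context. A minimal Python sketch, assuming a named term is either a variable name (a string) or a pair of a function symbol and a list of arguments, with a later binding of the same name shadowing an earlier one; `render` prints the result in the indexed notation used above.

```python
def resolve(ctx, term):
    """Replace named variables by indexed ones: x |-> x_i^[A_0, ..., A_n]."""
    names = [name for name, _ in ctx]
    sorts = [sort for _, sort in ctx]

    def go(t):
        if isinstance(t, str):                       # a variable name
            positions = [k for k, n in enumerate(names) if n == t]
            if not positions:
                raise KeyError(f"unbound variable {t!r}")
            return ("var", sorts, positions[-1])     # innermost binding wins
        f, args = t
        return ("app", f, [go(a) for a in args])

    return go(term)


def render(term):
    """Print a resolved term in the x_i^[...] notation."""
    if term[0] == "var":
        _, sorts, i = term
        return f"x_{i}^[{','.join(sorts)}]"
    _, f, args = term
    return f"{f}({', '.join(render(a) for a in args)})"


# x:A, y:B |-- f(x, x, y)
print(render(resolve([("x", "A"), ("y", "B")], ("f", ["x", "x", "y"]))))
# f(x_0^[A,B], x_0^[A,B], x_1^[A,B])
```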

The benefit of having compiled the signature to a theory, in addition to being able to reuse the results we’ve established for theories, is that we only need to define operations on the theory, which is simpler since we only need to deal with pairs and unary function symbols. For example, we’d like to extend our notion of interpretation to one that respects the structure of the signature, and we can do that by defining an interpretation of theories that respects finite products.

A finite product preserving interpretation (into a finite product theory), #ccI#, is an interpretation (into a finite product theory) that additionally satisfies:

  • #⟦bb1⟧^ccI = bb1#
  • #⟦A xx B⟧^ccI = ⟦A⟧^ccI xx ⟦B⟧^ccI#
  • #⟦!_A⟧^ccI = !_(⟦A⟧^ccI)#
  • #⟦tt "fst"(t)⟧^ccI = tt "fst"(⟦t⟧^ccI)#
  • #⟦tt "snd"(t)⟧^ccI = tt "snd"(⟦t⟧^ccI)#
  • #⟦(: t_1, t_2 :)⟧^ccI = (: ⟦t_1⟧^ccI, ⟦t_2⟧^ccI :)#

where, for #bb "Set"#, #bb1 -= {{}}#, #xx# is the cartesian product, #tt "fst"# and #tt "snd"# are the projections, #!_A -= x |-> \{\}#, and #(: f, g :) -= x |-> (: f(x), g(x) :)#.
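In #bb "Set"#, the clauses above determine how every term built from the product operations is interpreted once the function symbols are. Below is a small evaluator sketch in Python, assuming theory terms are tagged tuples (`"x"` for the variable, `"sym"` for a function symbol applied to a term), with #bb1# modelled by the empty tuple and #A xx B# by Python pairs. The symbol `m` on #S xx (S xx bb1)# (addition mod 5) is an arbitrary example.

```python
UNIT = ()  # the one-element set 1 = {{}}, modelled by its element, the empty tuple


def interpret(term, symbols):
    """[[t]] as a Python function; products become tuples, 1 becomes UNIT."""
    tag = term[0]
    if tag == "x":                       # the variable: identity
        return lambda v: v
    if tag == "fst":                     # [[fst(t)]] = fst([[t]])
        inner = interpret(term[1], symbols)
        return lambda v: inner(v)[0]
    if tag == "snd":                     # [[snd(t)]] = snd([[t]])
        inner = interpret(term[1], symbols)
        return lambda v: inner(v)[1]
    if tag == "bang":                    # [[!]] = x |-> {}
        return lambda v: UNIT
    if tag == "pair":                    # [[(: t1, t2 :)]] = (: [[t1]], [[t2]] :)
        left = interpret(term[1], symbols)
        right = interpret(term[2], symbols)
        return lambda v: (left(v), right(v))
    if tag == "sym":                     # a function symbol applied to a term
        f = symbols[term[1]]
        inner = interpret(term[2], symbols)
        return lambda v: f(inner(v))
    raise ValueError(f"unknown term {term!r}")


# Hypothetical symbol m : S x (S x 1) -> S, interpreted as addition mod 5.
symbols = {"m": lambda p: (p[0] + p[1][0]) % 5}
x = ("x",)
# The compiled form of m(x_0, x_1): m(<fst(x), <fst(snd(x)), !>>)
term = ("sym", "m", ("pair", ("fst", x), ("pair", ("fst", ("snd", x)), ("bang",))))
print(interpret(term, symbols)((2, (4, UNIT))))                     # 1
print(interpret(("pair", ("fst", x), ("snd", x)), {})(("a", "b")))  # ('a', 'b')
```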

With signatures, we can return to our theory, now signature, of groups. #Sigma_bb "Grp"# has a single sort #S#, three function symbols #tte : [bb1] -> S#, #tti : [S] -> S#, and #ttm : [S, S] -> S#, with the following equations (written as equations rather than pairs):

  • #tt "m"(tt "e"((::)^S), bbx_0^[S]) = bbx_0^[S]#
  • #tt "m"(tt "i"(bbx_0^[S]), bbx_0^[S]) = tt "e"((::)^S)#
  • #tt "m"(tt "m"(bbx_0^[S,S,S], bbx_1^[S,S,S]), bbx_2^[S,S,S]) = tt "m"(bbx_0^[S,S,S], tt "m"(bbx_1^[S,S,S], bbx_2^[S,S,S]))#

or using the nicer syntax:

  • #x:S |-- tt "m"(tt "e"(), x) = x#
  • #x:S |-- tt "m"(tt "i"(x), x) = tt "e"()#
  • #x:S, y:S, z:S |-- tt "m"(tt "m"(x, y), z) = tt "m"(x, tt "m"(y, z))#

An actual group is then just a finite product preserving interpretation of (the theory generated by) this signature. All of universal algebra and much of abstract algebra can be formulated this way.
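In #bb "Set"#, giving such an interpretation amounts to choosing a carrier for #S# and functions for #tte#, #tti#, and #ttm# that satisfy the three equations. A brute-force check on a finite carrier, sketched in Python, with the integers mod #n# as an assumed example and subtraction mod #n# as a non-example; #tte# takes the unique element of #bb1#, modelled as the empty tuple.

```python
UNIT = ()  # the unique element of 1


def is_group(carrier, e, i, m):
    """Check the three equations of Sigma_Grp for functions on a finite carrier."""
    xs = list(carrier)
    left_unit = all(m(e(UNIT), x) == x for x in xs)
    left_inverse = all(m(i(x), x) == e(UNIT) for x in xs)
    assoc = all(m(m(x, y), z) == m(x, m(y, z))
                for x in xs for y in xs for z in xs)
    return left_unit and left_inverse and assoc


n = 5
print(is_group(range(n), lambda _: 0, lambda x: (-x) % n,
               lambda x, y: (x + y) % n))        # True
print(is_group(range(n), lambda _: 0, lambda x: x,
               lambda x, y: (x - y) % n))        # False: 0 - 1 is not 1 mod 5
```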

The Simply Typed Lambda Calculus and Beyond

We can consider additionally assuming that our theory has exponentials. I left articulating exactly what that means as an exercise, but the upshot is we have the following two operations:

For any term #t : A xx B -> C#, we have the term #tt "curry"(t) : A -> C^B#. We also have the function symbol #tt "app"_(AB) : B^A xx A -> B#. They satisfy:

  • #tt "curry"(tt "app"(bbx_(B^A xx A))) = bbx_(B^A)#
  • #tt "app"((: tt "curry"(t_1), t_2 :)) = t_1((: bbx_A, t_2 :))# where #t_1 : A xx B -> C# and #t_2 : A -> B#.
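In #bb "Set"#, #C^B# is the set of functions from #B# to #C#, `curry` is ordinary currying, and `app` is function application. The Python sketch below checks both equations pointwise, with `t1` and `t2` as arbitrary example terms; comparing functions on a handful of sample inputs is an illustration, not a proof.

```python
def curry(t):
    """curry(t) : A -> C^B for t : A x B -> C."""
    return lambda a: (lambda b: t((a, b)))


def app(p):
    """app : B^A x A -> B, applying the first component to the second."""
    f, a = p
    return f(a)


def t1(p):          # example term A x B -> C
    a, b = p
    return 10 * a + b


def t2(a):          # example term A -> B
    return a + 1


samples = range(5)
# app((: curry(t1), t2 :)) = t1((: x_A, t2 :)), evaluated at each a
beta = all(app((curry(t1)(a), t2(a))) == t1((a, t2(a))) for a in samples)
# curry(app) = identity on B^A, compared on sample functions and arguments
funcs = [lambda x: x * x, lambda x: x - 7]
eta = all(curry(app)(f)(b) == f(b) for f in funcs for b in samples)
print(beta, eta)   # True True
```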

We can view these, together with the product operations, as combinators, and it turns out we can compile the simply typed lambda calculus into the above theory. This is exactly what the Categorical Abstract Machine did. The “Caml” in “O’Caml” stands for “Categorical Abstract Machine Language”, though O’Caml no longer uses the CAM. Conversely, every term of the theory can be expressed as a simply typed lambda term. This means we can view the simply typed lambda calculus as just a different presentation of the theory.

At this point, this presentation of category theory starts to connect to the mainstream categorical literature on universal algebra, internal languages, sketches, and internal logic. This page gives a synopsis of the relationship between type theory and category theory. For some reason, it is unusual to talk about the internal language of a plain category, but that is exactly what we’ve done here.

I haven’t talked about finite limits or colimits beyond products and coproducts, nor have I talked about even the infinitary versions of products and coproducts, let alone arbitrary limits and colimits. These can be handled the same way as products and coproducts. Formulating a language like signatures or the simply typed lambda calculus is a bit more complicated, but not that hard. I may make a follow-up article covering this among other things. I also have a side project (don’t hold your breath) that implements the internal language of a category with finite limits. The result looks roughly like a simple version of an algebraic specification language like the OBJ family. The RING theory described in the Maude manual gives an idea of what it would look like. In fact, here’s an example of the current actual syntax I’m using.[3]

theory Categories
    type O
    type A
    given src : A -> O
    given tgt : A -> O

    given id : O -> A
    satisfying o:O | src (id o) = o, tgt (id o) = o

    given c : { f:A, g:A | src f = tgt g } -> A
    satisfying (f, g):{ f:A, g:A | src f = tgt g }
        | tgt (c (f, g)) = tgt f, src (c (f, g)) = src g
    satisfying "left unit" (o, f):{ o:O, f:A | tgt f = o }
        | c (id o, f) = f
    satisfying "right unit" (o, f):{ o:O, f:A | src f = o }
        | c (f, id o) = f
    satisfying "associativity" (f, g, h):{ f:A, g:A, h:A | src f = tgt g, src g = tgt h }
        | c (c (f, g), h) = c (f, c (g, h))

It turns out this is a particularly interesting spot in the design space. The fact that the theory of theories with finite limits is itself a theory with finite limits has interesting consequences. It is still relatively weak though. For example, it’s not possible to describe the theory of fields in this language.

There are other directions one could go. For example, the internal logic of monoidal categories is (a fragment of) ordered linear logic. You can cross this bridge either way. You can look at different languages and consider what categorical structure is needed to support the features of the language, or you can add features to the category and see how that impacts the internal language. The relationship is similar to that between a source language and a core/intermediate language in a compiler, e.g. GHC Haskell and System FC.


If you’ve looked at category theory at all, you can probably make most of the connections without me telling you. The table below outlines the mapping, but there are some subtleties. First, as a somewhat technical detail, my definition of a theory corresponds to a small category, i.e. a category which has a set of objects and a set of arrows. Programmers should think of “set” as Set in Agda, i.e. similar to the * kind in Haskell. Usually “category” means “locally small category”, which may have a proper class of objects and between any two objects a set of arrows (though the union of all those sets may be a proper class). Again, for programmers, the distinction between “class” and “set” is basically the difference between Set and Set1 in Agda.[4] To make my definition of theory closer to this, all that is necessary is, instead of having a set of function symbols, to have a family of sets indexed by pairs of objects. Here’s what a partial definition in Agda of the two scenarios would look like:

-- Small category (the definition I used)
record SmallCategory : Set1 where
    field
        objects : Set
        arrows : Set
        src : arrows -> objects
        tgt : arrows -> objects

-- Locally small category
record LocallySmallCategory : Set2 where
    field
        objects : Set1
        hom : objects -> objects -> Set

-- Different presentation of a small category
record SmallCategory' : Set1 where
    field
        objects : Set
        hom : objects -> objects -> Set

The benefit of the notion of locally small category is that Set itself is a locally small category. The distinction I was making between interpretations into theories and interpretations into Set was due to the fact that Set wasn’t a theory. If I used a definition of theory corresponding to a locally small category, I could have combined the notions of interpretation by making Set a theory. The notion of a small category, though, is still useful. Also, an interpretation into Set corresponds to the usual notion of a model or semantics, while interpretations into other theories were a less emphasized concept in traditional model theory and universal algebra.

A less technical and more significant difference is that my definition of a theory doesn’t correspond to a category, but rather to a presentation of a category, from which a category can be generated. The analog of arrows in a category is terms, not function symbols. This is a somewhat more natural route from the model theory/universal algebra/programming side. Similarly, having an explicit collection of equations, rather than just an equivalence relation on terms, is part of the presentation of the category but not part of the category itself.

Model theory | Category theory
sort | object
term | arrow
function symbol | generating arrow
theory | presentation of a (small) category
collage | collage, cograph of a profunctor
bridge | heteromorphism
signature | presentation of a (small) category with finite products
interpretation into sets, aka models | a functor into Set, a (co)presheaf
interpretation into a theory | functor
homomorphism | natural transformation
simply typed lambda calculus (with products) | a cartesian closed category


In some ways I’ve stopped just when things were about to get good. I may do a follow-up to elaborate on this good stuff. Some examples are: if I expand the definition so that Set becomes a “theory”, then interpretations also form such a “theory”, and these are often what we’re really interested in. The category of finite-product preserving interpretations of the theory of groups essentially is the category of groups. In fact, universal algebra is, in categorical terms, just the study of categories with finite products and finite-product preserving functors from them, particularly into Set. It’s easy to generalize this in many directions. It’s also easy to make very general definitions, like a general definition of a free algebraic structure. In general, we’re usually more interested in the interpretations of a theory than the theory itself.

While I often do advocate thinking in terms of internal languages of categories, I’m not sure that it is a preferable perspective for the very basics of category theory. Nevertheless, there are a few reasons why I wrote this. First, this very syntactical approach is, I think, more accessible to someone coming from a programming background. From this view, a category is a very simple programming language. Adding structure to the category corresponds to adding features to this programming language. Interpretations are denotational semantics.

Another aspect of this presentation that is quite different is the use and emphasis on collages. Collages correspond to profunctors, a crucially important and enabling concept that is rarely covered in categorical introductions. The characterization of profunctors as collages in Vaughan Pratt’s paper (not using that name) was one of the things I enjoyed about that paper and part of what prompted me to start writing this. In earlier drafts of this article, I was unable to incorporate collages in a meaningful way as I was trying to start from profunctors. This approach just didn’t add value. Collages just looked like a bizarre curio and weren’t integrated into the narrative at all. For other reasons, though, I ended up revisiting the idea of a heteromorphism. My (fairly superficial) opinion is that once you have the notion of functors and natural transformations, adding the notion of heteromorphisms has a low power-to-weight ratio, though it does make some things a bit nicer. Nevertheless, in thinking of how best to fit them into this context, it was clear that collages provided the perfect mechanism (which isn’t a big surprise), and the result works rather nicely. When I realized a fact that can be cryptically but compactly represented as #ccK_ccT ≃ bbbI xx ccT#, where #bbbI# is the interval category, i.e. two objects with a single arrow joining them, I realized that this is actually an interesting perspective. Since most of this article was written at that point, I wove collages into the narrative, replacing some things. If, though, I had started with this perspective from the beginning, I suspect I would have written a significantly different article, though the latter sections would likely be similar.

  1. It’s actually better to organize this as a family of collections of function symbols indexed by pairs of sorts.

  2. Instead of having equations that generate an equivalence relation on (raw) terms, we could simply require an equivalence relation on (raw) terms be directly provided.

  3. Collaging is actually quite natural in this context. I already intend to support one theory importing another. A collage is just a theory that imports two others and then adds function symbols between them.

  4. For programmers familiar with Agda, at least, if you haven’t made this connection, this might help you understand and appreciate what a “class” is versus a “set” and what “size issues” are, which are typically handled extremely vaguely in a lot of the literature.

November 25, 2016 06:23 AM

November 24, 2016

Mark Jason Dominus

Imaginary Albanian eggplant festivals… IN SPACE

Wikipedia has a list of harvest festivals which includes this intriguing entry:

Ysolo: festival marking the first day of harvest of eggplants in Tirana, Albania

(It now says “citation needed”; I added that yesterday.)

I am confident that this entry, inserted in July 2012 by an anonymous user, is a hoax. When I first read it, I muttered “Oh, what bullshit,” but then went looking for a reliable source, because you never know. I have occasionally been surprised in the past, but this time I found clear evidence of a hoax: There are only a couple of scattered mentions of Ysolo on a couple of blogs, all from after 2012, and nothing at all in Google Books about Albanian eggplant celebrations. Nor is there an article about it in Albanian Wikipedia.

But reality gave ground before I arrived on the scene. Last September NASA's Dawn spacecraft visited the dwarf planet Ceres. Ceres is named for the Roman goddess of the harvest, and so NASA proposed harvest-related names for Ceres’ newly-discovered physical features. It appears that someone at NASA ransacked the Wikipedia list of harvest festivals without checking whether they were real, because there is now a large mountain at Ceres’ north pole whose official name is Ysolo Mons, named for this spurious eggplant festival. (See also: NASA JPL press release; USGS Astrogeology Science Center announcement.)

To complete the magic circle of fiction, the Albanians might begin to celebrate the previously-fictitious eggplant festival. (And why not? Eggplants are lovely.) Let them do it for a couple of years, and then Wikipedia could document the real eggplant festival… Why not fall under the spell of Tlön and submit to the minute and vast evidence of an ordered planet?

Happy Ysolo, everyone.

by Mark Dominus at November 24, 2016 06:04 PM

Dominic Steinitz

Mercator: A Connection with Torsion


In most presentations of Riemannian geometry, e.g. O’Neill (1983) and Wikipedia, the fundamental theorem of Riemannian geometry (“the miracle of Riemannian geometry”) is given: that for any semi-Riemannian manifold there is a unique torsion-free metric connection. Presumably partly because of this, and partly because the major application of Riemannian geometry is General Relativity, connections with torsion are given little if any attention.

It turns out we are all very familiar with a connection with torsion: the Mercator projection. Some mathematical physics texts, e.g. Nakahara (2003), allude to this but leave the details to the reader. Moreover, this connection respects the metric induced from Euclidean space.

We use SageManifolds to assist with the calculations. We hint at how this might be done more slickly in Haskell.

A Cartographic Aside

%matplotlib inline
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
import cartopy
import as ccrs
from cartopy.mpl.ticker import LongitudeFormatter, LatitudeFormatter
plt.figure(figsize=(8, 8))

ax = plt.axes(




We can see that Greenland looks much broader in the north than in the middle. But if we use a polar projection (below), we see that this is not the case. Why, then, is the Mercator projection used in preference to, e.g., the polar projection or the once-controversial Gall-Peters projection? (See here for more on map projections.)

plt.figure(figsize=(8, 8))

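bx = plt.axes(projection=ccrs.NorthPolarStereo())  # assumed: a north polar stereographic projection

bx.set_extent([-180, 180, 90, 50], ccrs.PlateCarree())
bx.coastlines()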





This is written as a Jupyter notebook. In theory, it should be possible to run it, assuming you have installed at least sage and Haskell. To publish it, I used

jupyter-nbconvert --to markdown Mercator.ipynb
pandoc -s -t markdown+lhs -o Mercator.lhs \
       --filter pandoc-citeproc --bibliography DiffGeom.bib
BlogLiteratelyD --wplatex Mercator.lhs > Mercator.html

Not brilliant but good enough.

Some commands to jupyter to display things nicely.

%display latex
viewer3D = 'tachyon'

Warming Up With SageManifolds

Let us try a simple exercise: finding the connection coefficients of the Levi-Civita connection for the Euclidean metric on \mathbb{R}^2 in polar co-ordinates.

Define the manifold.

N = Manifold(2, 'N',r'\mathcal{N}', start_index=1)

Define a chart and frame with Cartesian co-ordinates.

ChartCartesianN.<x,y> = N.chart()
FrameCartesianN = ChartCartesianN.frame()

Define a chart and frame with polar co-ordinates.

ChartPolarN.<r,theta> = N.chart()
FramePolarN = ChartPolarN.frame()

The standard transformation from Cartesian to polar co-ordinates.

cartesianToPolar = ChartCartesianN.transition_map(ChartPolarN, (sqrt(x^2 + y^2), arctan(y/x)))
Change of coordinates from Chart (N, (x, y)) to Chart (N, (r, theta))

\displaystyle       \left\{\begin{array}{lcl} r & = & \sqrt{x^{2} + y^{2}} \\ \theta & = & \arctan\left(\frac{y}{x}\right) \end{array}\right.

cartesianToPolar.set_inverse(r * cos(theta), r * sin(theta))
Check of the inverse coordinate transformation:
   x == x
   y == y
   r == abs(r)
   theta == arctan(sin(theta)/cos(theta))

Now we define the metric to make the manifold Euclidean.

g_e = N.metric('g_e')
g_e[1,1], g_e[2,2] = 1, 1

We can display this in Cartesian co-ordinates.


\displaystyle       g_e = \mathrm{d} x\otimes \mathrm{d} x+\mathrm{d} y\otimes \mathrm{d} y

And we can display it in polar co-ordinates


\displaystyle       g_e = \mathrm{d} r\otimes \mathrm{d} r + \left( x^{2} + y^{2} \right) \mathrm{d} \theta\otimes \mathrm{d} \theta

Next let us compute the Levi-Civita connection from this metric.

nab_e = g_e.connection()

\displaystyle       \nabla_{g_e}

If we use Cartesian co-ordinates, we expect that \Gamma^k_{ij} = 0, \forall i,j,k. Only non-zero entries get printed.


Just to be sure, we can print out all the entries.


\displaystyle       \left[\left[\left[0, 0\right], \left[0, 0\right]\right], \left[\left[0, 0\right], \left[0, 0\right]\right]\right]

In polar co-ordinates, we get


\displaystyle       \begin{array}{lcl} \Gamma_{ \phantom{\, r } \, \theta \, \theta }^{ \, r \phantom{\, \theta } \phantom{\, \theta } } & = & -\sqrt{x^{2} + y^{2}} \\ \Gamma_{ \phantom{\, \theta } \, r \, \theta }^{ \, \theta \phantom{\, r } \phantom{\, \theta } } & = & \frac{1}{\sqrt{x^{2} + y^{2}}} \\ \Gamma_{ \phantom{\, \theta } \, \theta \, r }^{ \, \theta \phantom{\, \theta } \phantom{\, r } } & = & \frac{1}{\sqrt{x^{2} + y^{2}}} \end{array}

We can rewrite this as

\displaystyle   \begin{aligned}  \Gamma^r_{\theta,\theta} &= -r \\  \Gamma^\theta_{r,\theta} &= 1/r \\  \Gamma^\theta_{\theta,r} &= 1/r  \end{aligned}

with all other entries being 0.
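These coefficients can be cross-checked without sage. Below is a minimal sketch in plain Python that evaluates the textbook formula \Gamma^k_{ij} = \frac{1}{2} g^{kl}(\partial_i g_{jl} + \partial_j g_{il} - \partial_l g_{ij}) for a 2-dimensional metric. It rests on our own assumptions: central finite differences stand in for sage's symbolic derivatives, and the helper names `christoffel`, `polar_metric` and `sphere_metric` are ours. It is applied to polar co-ordinates here and to the round sphere used below.

```python
import math

def christoffel(metric, p, h=1e-5):
    """Approximate Levi-Civita coefficients G[k][i][j] = Gamma^k_{ij} at point p.

    `metric` maps a point [u1, u2] to the 2x2 matrix [[g11, g12], [g21, g22]].
    Uses Gamma^k_{ij} = 1/2 g^{kl} (d_i g_{jl} + d_j g_{il} - d_l g_{ij}),
    with the derivatives taken by central finite differences of step h.
    """
    n = 2
    g = metric(p)
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    ginv = [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]
    # dg[l][i][j] approximates the partial derivative of g_{ij} along co-ordinate l
    dg = []
    for l in range(n):
        plus, minus = list(p), list(p)
        plus[l] += h
        minus[l] -= h
        gp, gm = metric(plus), metric(minus)
        dg.append([[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(n)]
                   for i in range(n)])
    return [[[0.5 * sum(ginv[k][l] * (dg[i][j][l] + dg[j][i][l] - dg[l][i][j])
                        for l in range(n))
              for j in range(n)]
             for i in range(n)]
            for k in range(n)]

def polar_metric(p):
    """Euclidean metric in polar co-ordinates (r, theta): dr^2 + r^2 dtheta^2."""
    r, _ = p
    return [[1.0, 0.0], [0.0, r ** 2]]

def sphere_metric(p):
    """Round metric on the unit sphere in (theta, phi): dtheta^2 + sin^2(theta) dphi^2."""
    th, _ = p
    return [[1.0, 0.0], [0.0, math.sin(th) ** 2]]

G = christoffel(polar_metric, [2.0, 0.7])
print(round(G[0][1][1], 6), round(G[1][0][1], 6))  # -2.0 0.5, i.e. -r and 1/r at r = 2
```

Indices are 0-based, so G[0][1][1] is \Gamma^r_{\theta\theta} and G[1][0][1] is \Gamma^\theta_{r\theta}; all other entries come out as (numerically) zero.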

The Sphere

We define a 2-dimensional manifold. We call it the 2-dimensional (unit) sphere, but we are going to remove a meridian to allow us to define the desired connection with torsion on it.

S2 = Manifold(2, 'S^2', latex_name=r'\mathbb{S}^2', start_index=1)

\displaystyle       \mathbb{S}^2

To start with, we cover the manifold with two charts.

polar.<th,ph> = S2.chart(r'th:(0,pi):\theta ph:(0,2*pi):\phi'); print(latex(polar))

\displaystyle       \left(\mathbb{S}^2,({\theta}, {\phi})\right)

mercator.<xi,ze> = S2.chart(r'xi:(-oo,oo):\xi ze:(0,2*pi):\zeta'); print(latex(mercator))

\displaystyle       \left(\mathbb{S}^2,({\xi}, {\zeta})\right)

We can now check that we have two charts.


\displaystyle       \left[\left(\mathbb{S}^2,({\theta}, {\phi})\right), \left(\mathbb{S}^2,({\xi}, {\zeta})\right)\right]

We can then define co-ordinate frames.

epolar = polar.frame(); print(latex(epolar))

\displaystyle       \left(\mathbb{S}^2 ,\left(\frac{\partial}{\partial {\theta} },\frac{\partial}{\partial {\phi} }\right)\right)

emercator = mercator.frame(); print(latex(emercator))

\displaystyle       \left(\mathbb{S}^2 ,\left(\frac{\partial}{\partial {\xi} },\frac{\partial}{\partial {\zeta} }\right)\right)

And define a transition map from one chart to the other, together with its inverse, checking that they really are inverses.

xy_to_uv = polar.transition_map(mercator, (log(tan(th/2)), ph))
xy_to_uv.set_inverse(2*arctan(exp(xi)), ze)
Check of the inverse coordinate transformation:
   th == 2*arctan(sin(1/2*th)/cos(1/2*th))
   ph == ph
   xi == xi
   ze == ze
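The transition map and its inverse are easy to spot-check numerically. The following minimal sketch uses plain Python floats rather than sage symbolics (the function names are ours). It confirms the round trip for a few values of \theta, together with two identities that follow from \tan(\theta/2) = e^\xi and are handy whenever something must be expressed in the Mercator chart: \sin\theta = 1/\cosh\xi and \cos\theta = -\tanh\xi.

```python
import math

def polar_to_mercator(theta, phi):
    """Transition map: xi = log(tan(theta/2)), zeta = phi, for 0 < theta < pi."""
    return math.log(math.tan(theta / 2)), phi

def mercator_to_polar(xi, zeta):
    """Inverse transition map: theta = 2*arctan(exp(xi)), phi = zeta."""
    return 2 * math.atan(math.exp(xi)), zeta

for theta in (0.1, 0.5, 1.0, math.pi / 2, 2.0, 3.0):
    xi, zeta = polar_to_mercator(theta, 1.0)
    back, _ = mercator_to_polar(xi, zeta)
    print(f"theta = {theta:.4f}  xi = {xi:+.4f}  "
          f"round-trip error = {back - theta:+.1e}  "
          f"sin(theta) - 1/cosh(xi) = {math.sin(theta) - 1 / math.cosh(xi):+.1e}  "
          f"cos(theta) + tanh(xi) = {math.cos(theta) + math.tanh(xi):+.1e}")
```

As \theta runs over (0, \pi), \xi runs over the whole real line, with \xi \to -\infty as \theta \to 0. This is why the mercator chart is declared with xi:(-oo,oo).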

We can define the metric, which is the pullback of the Euclidean metric on \mathbb{R}^3.

g = S2.metric('g')
g[1,1], g[2,2] = 1, (sin(th))^2

And then calculate the Levi-Civita connection defined by it.

nab_g = g.connection()

\displaystyle       \begin{array}{lcl} \Gamma_{ \phantom{\, {\theta} } \, {\phi} \, {\phi} }^{ \, {\theta} \phantom{\, {\phi} } \phantom{\, {\phi} } } & = & -\cos\left({\theta}\right) \sin\left({\theta}\right) \\ \Gamma_{ \phantom{\, {\phi} } \, {\theta} \, {\phi} }^{ \, {\phi} \phantom{\, {\theta} } \phantom{\, {\phi} } } & = & \frac{\cos\left({\theta}\right)}{\sin\left({\theta}\right)} \\ \Gamma_{ \phantom{\, {\phi} } \, {\phi} \, {\theta} }^{ \, {\phi} \phantom{\, {\phi} } \phantom{\, {\theta} } } & = & \frac{\cos\left({\theta}\right)}{\sin\left({\theta}\right)} \end{array}

We know the geodesics defined by this connection are the great circles.
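A quick numerical way to see the great-circle claim is sketched below in plain Python. It relies on our own choices: a hand-rolled fourth-order Runge-Kutta integrator, an arbitrary step size and arbitrary initial conditions. The sketch integrates the geodesic equations of g = \mathrm{d}\theta^2 + \sin^2\theta\,\mathrm{d}\phi^2, built from the Christoffel symbols printed above, embeds the curve in \mathbb{R}^3, and measures how far it strays from the plane through the origin spanned by the initial position and velocity.

```python
import math

def geodesic_rhs(state):
    """Geodesic equations of dtheta^2 + sin^2(theta) dphi^2, as a first-order system:
    theta'' = sin(theta) cos(theta) phi'^2,   phi'' = -2 cot(theta) theta' phi'."""
    th, ph, dth, dph = state
    return (dth, dph,
            math.sin(th) * math.cos(th) * dph ** 2,
            -2.0 * math.cos(th) / math.sin(th) * dth * dph)

def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, c):
        return tuple(s + c * ki for s, ki in zip(state, k))
    k1 = geodesic_rhs(state)
    k2 = geodesic_rhs(shifted(k1, dt / 2))
    k3 = geodesic_rhs(shifted(k2, dt / 2))
    k4 = geodesic_rhs(shifted(k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def embed(th, ph):
    """The standard embedding of the unit sphere in R^3."""
    return (math.sin(th) * math.cos(ph), math.sin(th) * math.sin(ph), math.cos(th))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

# Arbitrary start: on the equator, moving at an angle to it.
state = (math.pi / 2, 0.0, 0.3, 1.0)   # (theta, phi, theta', phi')
th, ph, dth, dph = state
# Initial velocity in R^3: theta' * dX/dtheta + phi' * dX/dphi.
v0 = tuple(dth * a + dph * b for a, b in zip(
    (math.cos(th) * math.cos(ph), math.cos(th) * math.sin(ph), -math.sin(th)),
    (-math.sin(th) * math.sin(ph), math.sin(th) * math.cos(ph), 0.0)))
n = cross(embed(th, ph), v0)
n_len = math.sqrt(sum(c * c for c in n))
normal = tuple(c / n_len for c in n)   # unit normal of the expected great-circle plane

max_off_plane = 0.0
for _ in range(10_000):                 # integrate up to t = 10 with dt = 1e-3
    state = rk4_step(state, 1e-3)
    x = embed(state[0], state[1])
    max_off_plane = max(max_off_plane, abs(sum(a * b for a, b in zip(normal, x))))
print(f"largest distance from the great-circle plane: {max_off_plane:.1e}")
```

The initial data keep the curve well away from the poles, where the chart (and \cot\theta) breaks down.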

We can check that this connection respects the metric.


\displaystyle       \nabla_{g} g = 0

And that it has no torsion.


A New Connection

Let us now define an orthonormal frame.

ch_basis = S2.automorphism_field()
ch_basis[1,1], ch_basis[2,2] = 1, 1/sin(th)
e = S2.default_frame().new_frame(ch_basis, 'e')

\displaystyle       \left(\mathbb{S}^2, \left(e_1,e_2\right)\right)

We can calculate the dual 1-forms.

dX = S2.coframes()[2] ; print(latex(dX))

\displaystyle       \left(\mathbb{S}^2, \left(e^1,e^2\right)\right)

print(latex((dX[1], dX[2])))

\displaystyle       \left(e^1, e^2\right)

print(latex((dX[1][:], dX[2][:])))

\displaystyle       \left(\left[1, 0\right], \left[0, \sin\left({\theta}\right)\right]\right)

In this case it is trivial to check that the coframe really is dual to the frame, but we let sage do it anyway.

print(latex(((dX[1](e[1]).expr(), dX[1](e[2]).expr()), (dX[2](e[1]).expr(), dX[2](e[2]).expr()))))

\displaystyle       \left(\left(1, 0\right), \left(0, 1\right)\right)

Let us define two vectors to be parallel if their angles to a given meridian are the same. For this to be true we must have a connection \nabla with \nabla e_1 = \nabla e_2 = 0.

nab = S2.affine_connection('nabla', latex_name=r'\nabla')

Displaying the connection only gives the non-zero components.


For safety, let us check all the components explicitly.


\displaystyle       \left[\left[\left[0, 0\right], \left[0, 0\right]\right], \left[\left[0, 0\right], \left[0, 0\right]\right]\right]

Of course the components are not all zero in other frames.


\displaystyle       \begin{array}{lcl} \Gamma_{ \phantom{\, {\phi} } \, {\phi} \, {\theta} }^{ \, {\phi} \phantom{\, {\phi} } \phantom{\, {\theta} } } & = & \frac{\cos\left({\theta}\right)}{\sin\left({\theta}\right)} \end{array}


\displaystyle       \begin{array}{lcl} \Gamma_{ \phantom{\, {\xi} } \, {\xi} \, {\xi} }^{ \, {\xi} \phantom{\, {\xi} } \phantom{\, {\xi} } } & = & 2 \, \cos\left(\frac{1}{2} \, {\theta}\right)^{2} - 1 \\ \Gamma_{ \phantom{\, {\zeta} } \, {\zeta} \, {\xi} }^{ \, {\zeta} \phantom{\, {\zeta} } \phantom{\, {\xi} } } & = & \frac{2 \, \cos\left(\frac{1}{2} \, {\theta}\right) \cos\left({\theta}\right) \sin\left(\frac{1}{2} \, {\theta}\right)}{\sin\left({\theta}\right)} \end{array}

This connection also respects the metric g.


\displaystyle       \nabla g = 0

Thus, since the Levi-Civita connection is the unique torsion-free metric connection and \nabla differs from it, \nabla must have torsion.


\displaystyle       \frac{\cos\left({\theta}\right)}{\sin\left({\theta}\right)} e_2\otimes e^1\otimes e^2 -\frac{\cos\left({\theta}\right)}{\sin\left({\theta}\right)} e_2\otimes e^2\otimes e^1
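As a hand check (not part of the sage session), take the usual definition T(X,Y) = \nabla_X Y - \nabla_Y X - [X,Y]; we assume this is also the sign convention behind the display above. Since \nabla e_1 = \nabla e_2 = 0, only the Lie bracket of the frame contributes:

\displaystyle   \begin{aligned}  T(e_1, e_2) &= -[e_1, e_2] = -\left[\partial_\theta, \frac{1}{\sin\theta}\partial_\phi\right] = -\partial_\theta\left(\frac{1}{\sin\theta}\right)\partial_\phi \\  &= \frac{\cos\theta}{\sin^2\theta}\,\partial_\phi = \frac{\cos\theta}{\sin\theta}\, e_2  \end{aligned}

This matches the e_2\otimes e^1\otimes e^2 component of the torsion above, and antisymmetry, T(e_2, e_1) = -T(e_1, e_2), gives the other one.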

The equations for geodesics are

\displaystyle   \ddot{\gamma}^k + \Gamma_{ \phantom{\, {k} } \, {i} \, {j} }^{ \, {k} \phantom{\, {i} } \phantom{\, {j} } }\dot{\gamma}^i\dot{\gamma}^j = 0

Explicitly for both variables in the polar co-ordinates chart.

\displaystyle   \begin{aligned}  \ddot{\gamma}^\phi & + \frac{\cos\theta}{\sin\theta}\dot{\gamma}^\phi\dot{\gamma}^\theta &= 0 \\  \ddot{\gamma}^\theta & &= 0  \end{aligned}

We can check that \gamma^\phi(t) = \alpha\log\tan t/2 and \gamma^\theta(t) = t are solutions although sage needs a bit of prompting to help it.

t = var('t'); a = var('a')
print(latex(diff(a * log(tan(t/2)),t).simplify_full()))

\displaystyle       \frac{a}{2 \, \cos\left(\frac{1}{2} \, t\right) \sin\left(\frac{1}{2} \, t\right)}

We can simplify this further by recalling the trigonometric identity \sin 2t = 2\sin t\cos t.

print(latex(sin(2 * t).trig_expand()))

\displaystyle       2 \, \cos\left(t\right) \sin\left(t\right)

print(latex(diff (a / sin(t), t)))

\displaystyle       -\frac{a \cos\left(t\right)}{\sin\left(t\right)^{2}}
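The same check can be done numerically. The minimal sketch below uses an arbitrary value \alpha = 3 (our choice) and central finite differences. It evaluates the residual of the \phi equation along \gamma^\phi(t) = \alpha\log\tan(t/2), \gamma^\theta(t) = t. The \theta equation holds trivially because \gamma^\theta is linear in t.

```python
import math

ALPHA = 3.0  # arbitrary illustrative value of the constant alpha

def gamma_phi(t):
    return ALPHA * math.log(math.tan(t / 2))

def gamma_theta(t):
    return t

def residual(t, h=1e-4):
    """gamma_phi'' + cot(gamma_theta) * gamma_phi' * gamma_theta', by central differences."""
    d1 = (gamma_phi(t + h) - gamma_phi(t - h)) / (2 * h)
    d2 = (gamma_phi(t + h) - 2 * gamma_phi(t) + gamma_phi(t - h)) / h ** 2
    dth = (gamma_theta(t + h) - gamma_theta(t - h)) / (2 * h)
    return d2 + math.cos(gamma_theta(t)) / math.sin(gamma_theta(t)) * d1 * dth

for t in (0.5, 1.0, 1.5, 2.0, 2.5):
    print(f"t = {t}: residual = {residual(t):+.1e}")
```

The residuals are at the level of finite-difference error; the sample points stay inside (0, \pi), away from the poles where \log\tan(t/2) blows up.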

In the Mercator co-ordinates chart this is

\displaystyle   \begin{aligned}  \gamma^\xi(t) &= \log\tan(t/2) \\   \gamma^\zeta(t) &= \alpha\log\tan(t/2)  \end{aligned}

In other words: straight lines.

Reparametrising with s = \alpha\log\tan t/2, we obtain

\displaystyle   \begin{aligned}  \gamma^\phi(s) &= s \\  \gamma^\theta(s) &= 2\arctan e^\frac{s}{\alpha}  \end{aligned}

Let us draw such a curve.

R.<t> = RealLine() ; print(R)
Real number line R
c = S2.curve({polar: [2*atan(exp(-t/10)), t]}, (t, -oo, +oo), name='c')

\displaystyle       \begin{array}{llcl} c:& \mathbb{R} & \longrightarrow & \mathbb{S}^2 \\ & t & \longmapsto & \left({\theta}, {\phi}\right) = \left(2 \, \arctan\left(e^{\left(-\frac{1}{10} \, t\right)}\right), t\right) \\ & t & \longmapsto & \left({\xi}, {\zeta}\right) = \left(-\frac{1}{10} \, t, t\right) \end{array}


\displaystyle       \mathrm{Hom}\left(\mathbb{R},\mathbb{S}^2\right)

c.plot(chart=polar, aspect_ratio=0.1)


It’s not totally clear this is curved so let’s try with another example.

d = S2.curve({polar: [2*atan(exp(-t)), t]}, (t, -oo, +oo), name='d')

\displaystyle       \begin{array}{llcl} d:& \mathbb{R} & \longrightarrow & \mathbb{S}^2 \\ & t & \longmapsto & \left({\theta}, {\phi}\right) = \left(2 \, \arctan\left(e^{\left(-t\right)}\right), t\right) \\ & t & \longmapsto & \left({\xi}, {\zeta}\right) = \left(-t, t\right) \end{array}

d.plot(chart=polar, aspect_ratio=0.2)


Now it’s clear that this curve, a straight line in Mercator co-ordinates, is curved in polar co-ordinates.

But of course in Mercator co-ordinates, it is a straight line. This explains the projection's popularity with mariners: if you draw a straight line on your chart and follow that bearing, or rhumb line, using a compass, you will arrive at the end of the straight line. Of course, it is not the shortest path (great circles are), but it is much easier to navigate.
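The constant bearing can also be checked numerically. Substituting \theta = 2\arctan e^\xi, so that \mathrm{d}\theta = \sin\theta\,\mathrm{d}\xi and \sin\theta = 1/\cosh\xi, shows the round metric is \cosh^{-2}\xi\,(\mathrm{d}\xi^2 + \mathrm{d}\zeta^2) in the Mercator chart. It is conformal, so the angle a chart line makes with the meridians is its true angle on the sphere. The sketch below is plain Python with finite differences, and `angle_to_meridian` is a helper of our own. It measures that angle along the curves c and d defined above, working in the polar chart and the orthonormal frame e_1, e_2.

```python
import math

def angle_to_meridian(curve, t, h=1e-6):
    """Angle between the tangent of curve(t) = (theta, phi) and the meridians.

    In the orthonormal frame e_1 = d/dtheta, e_2 = (1/sin theta) d/dphi the tangent
    has components (theta', sin(theta) * phi'); derivatives use central differences.
    """
    th_minus, ph_minus = curve(t - h)
    th_plus, ph_plus = curve(t + h)
    th, _ = curve(t)
    u1 = (th_plus - th_minus) / (2 * h)
    u2 = math.sin(th) * (ph_plus - ph_minus) / (2 * h)
    return math.atan2(abs(u2), abs(u1))

def c(t):
    """The curve c: (xi, zeta) = (-t/10, t) in the Mercator chart."""
    return 2 * math.atan(math.exp(-t / 10)), t

def d(t):
    """The curve d: (xi, zeta) = (-t, t) in the Mercator chart."""
    return 2 * math.atan(math.exp(-t)), t

for t in (-3.0, 0.0, 2.0, 5.0):
    print(f"t = {t:4.1f}   c: {angle_to_meridian(c, t):.6f}   d: {angle_to_meridian(d, t):.6f}")
print(f"pi/2 - arctan(1/10) = {math.pi / 2 - math.atan(0.1):.6f}   pi/4 = {math.pi / 4:.6f}")
```

Both columns stay constant along each curve. This is exactly the property a compass bearing needs.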

c.plot(chart=mercator, aspect_ratio=0.1)


d.plot(chart=mercator, aspect_ratio=1.0)


We can draw these curves on the sphere itself, not just on its charts.

R3 = Manifold(3, 'R^3', r'\mathbb{R}^3', start_index=1)
cart.<X,Y,Z> = R3.chart(); print(latex(cart))

\displaystyle       \left(\mathbb{R}^3,(X, Y, Z)\right)

Phi = S2.diff_map(R3, {
    (polar, cart): [sin(th) * cos(ph), sin(th) * sin(ph), cos(th)],
    # With th = 2*arctan(exp(xi)): sin(th) = 1/cosh(xi) and cos(th) = -tanh(xi)
    (mercator, cart): [cos(ze) / cosh(xi), sin(ze) / cosh(xi),
                       -sinh(xi) / cosh(xi)]},
    name='Phi', latex_name=r'\Phi')

We can either plot using polar co-ordinates.

graph_polar = polar.plot(chart=cart, mapping=Phi, nb_values=25, color='blue')
show(graph_polar, viewer=viewer3D)


Or using Mercator co-ordinates. In either case we get the sphere (minus the prime meridian).

graph_mercator = mercator.plot(chart=cart, mapping=Phi, nb_values=25, color='red')
show(graph_mercator, viewer=viewer3D)


We can plot the curve c, which makes an angle of \pi/2 - \arctan(1/10) with the meridians.

graph_c = c.plot(mapping=Phi, max_range=40, plot_points=200, thickness=2)
show(graph_polar + graph_c, viewer=viewer3D)


And we can plot the curve d, which makes an angle of \pi/4 with the meridians.

graph_d = d.plot(mapping=Phi, max_range=40, plot_points=200, thickness=2, color="green")
show(graph_polar + graph_c + graph_d, viewer=viewer3D)



With automatic differentiation and symbolic numbers, symbolic differentiation is straightforward in Haskell.

> import Data.Number.Symbolic
> import Numeric.AD
> x = var "x"
> y = var "y"
> test xs = jacobian ((\x -> [x]) . f) xs
>   where
>     f [x, y] = sqrt $ x^2 + y^2
ghci> test [1, 1]

ghci> test [x, y]
  [[x/(2.0*sqrt (x*x+y*y))+x/(2.0*sqrt (x*x+y*y)),y/(2.0*sqrt (x*x+y*y))+y/(2.0*sqrt (x*x+y*y))]]
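For readers without the ad and Symbolic packages to hand, here is a minimal sketch of the idea behind automatic differentiation, written in Python with dual numbers. It uses forward mode, one pass per input, which is not necessarily how the ad package's jacobian works internally; the class `Dual` and the helpers are our own. It differentiates the same function f(x, y) = \sqrt{x^2 + y^2}, whose gradient is (x/r, y/r) with r = \sqrt{x^2 + y^2}.

```python
import math

class Dual:
    """Forward-mode automatic differentiation: a value paired with a derivative."""

    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule
        return Dual(self.val * other.val, self.der * other.val + self.val * other.der)

    __rmul__ = __mul__

def dsqrt(x):
    """sqrt lifted to dual numbers via the chain rule."""
    r = math.sqrt(x.val)
    return Dual(r, x.der / (2 * r))

def f(x, y):
    return dsqrt(x * x + y * y)

def gradient(fn, point):
    """One forward pass per input, seeding that input's derivative with 1."""
    grads = []
    for i in range(len(point)):
        args = [Dual(v, 1.0 if j == i else 0.0) for j, v in enumerate(point)]
        grads.append(fn(*args).der)
    return grads

print(gradient(f, [1.0, 1.0]))  # both entries are 1/sqrt(2), about 0.7071
```

Unlike the Haskell version, this works on numbers only; getting the symbolic expressions shown above needs a symbolic number type such as Data.Number.Symbolic's Sym.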

Anyone wishing to produce a Haskell version of SageManifolds is advised to look here before embarking on the task.

Appendix A: Conformal Equivalence

Agricola and Thier (2004) show that the geodesics of the Levi-Civita connection of a conformally equivalent metric are the geodesics of a connection with vectorial torsion. Let’s put some, but not all, of the flesh on the bones.

The Koszul formula (see e.g. O’Neill (1983)) characterizes the Levi-Civita connection \nabla:

\displaystyle   \begin{aligned}  2  \langle \nabla_X Y, Z\rangle & = X  \langle Y,Z\rangle + Y  \langle Z,X\rangle - Z  \langle X,Y\rangle \\  &-  \langle X,[Y,Z]\rangle +   \langle Y,[Z,X]\rangle +  \langle Z,[X,Y]\rangle  \end{aligned}

Being more explicit about the metric, this can be re-written as

\displaystyle   \begin{aligned}  2 g(\nabla^g_X Y, Z) & = X g(Y,Z) + Y g(Z,X) - Z g(X,Y) \\  &- g(X,[Y,Z]) +  g(Y,[Z,X]) + g(Z,[X,Y])  \end{aligned}

Let \nabla^h be the Levi-Civita connection for the metric h = e^{2\sigma}g where \sigma \in C^\infty M. Following [Gadea2010], we substitute into the Koszul formula and then apply the product rule:

\displaystyle   \begin{aligned}  2 e^{2 \sigma} g(\nabla^h_X Y, Z) & = X\left(e^{2 \sigma} g(Y,Z)\right) + Y\left(e^{2 \sigma} g(Z,X)\right) - Z\left(e^{2 \sigma} g(X,Y)\right) \\  & - e^{2 \sigma} g(X,[Y,Z]) + e^{2 \sigma} g(Y,[Z,X]) + e^{2 \sigma} g(Z,[X,Y]) \\  & = 2 e^{2\sigma}\left[g(\nabla^{g}_X Y, Z) + (X\sigma) g(Y,Z) + (Y\sigma) g(Z,X) - (Z\sigma) g(X,Y)\right] \\  & = 2 e^{2\sigma} g\left(\nabla^{g}_X Y + (X\sigma) Y + (Y\sigma) X - g(X,Y)\,\mathrm{grad}\,\sigma, Z\right)  \end{aligned}

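where, as usual, the vector field \mathrm{grad}\phi for \phi \in C^\infty M is defined via g(\mathrm{grad}\phi, X) = \mathrm{d}\phi(X) = X\phi. Since Z is arbitrary and e^{2\sigma} > 0, it follows that \nabla^h_X Y = \nabla^g_X Y + (X\sigma) Y + (Y\sigma) X - g(X,Y)\,\mathrm{grad}\,\sigma.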
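In components, take X = \partial_i and Y = \partial_j in the last line of the derivation above and use the fact that Z is arbitrary. This gives the following, assuming the convention \nabla_{\partial_i}\partial_j = \Gamma^k_{ij}\partial_k, which may differ from sage's index ordering:

\displaystyle   \tilde{\Gamma}^k_{ij} = \Gamma^k_{ij} + \delta^k_j\,\partial_i\sigma + \delta^k_i\,\partial_j\sigma - g_{ij}\,g^{kl}\,\partial_l\sigma

where \tilde{\Gamma} are the coefficients of \nabla^h. For example, on the sphere with \sigma = -\ln\sin\theta we have \partial_\theta\sigma = -\cos\theta/\sin\theta, \partial_\phi\sigma = 0 and g^{\theta\theta} = 1, so

\displaystyle   \begin{aligned}  \tilde{\Gamma}^\theta_{\theta\theta} &= 0 - 2\frac{\cos\theta}{\sin\theta} + \frac{\cos\theta}{\sin\theta} = -\frac{\cos\theta}{\sin\theta} \\  \tilde{\Gamma}^\theta_{\phi\phi} &= -\cos\theta\sin\theta + \sin^2\theta\,\frac{\cos\theta}{\sin\theta} = 0  \end{aligned}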

Let’s try an example.

nab_tilde = S2.affine_connection('nabla_t', r'\tilde{\nabla}')
f = S2.scalar_field(-ln(sin(th)), name='f')
for i in S2.irange():
    for j in S2.irange():
        for k in S2.irange():
            nab_tilde.add_coef()[k,i,j] = \
                nab_g(polar.frame()[i])(polar.frame()[j])(polar.coframe()[k]) + \
                polar.frame()[i](f) * polar.frame()[j](polar.coframe()[k]) + \
                polar.frame()[j](f) * polar.frame()[i](polar.coframe()[k]) + \
                g(polar.frame()[i], polar.frame()[j]) * \
                polar.frame()[1](polar.coframe()[k]) * cos(th) / sin(th)

\displaystyle       \begin{array}{lcl} \Gamma_{ \phantom{\, {\theta} } \, {\theta} \, {\theta} }^{ \, {\theta} \phantom{\, {\theta} } \phantom{\, {\theta} } } & = & -\frac{\cos\left({\theta}\right)}{\sin\left({\theta}\right)} \end{array}

g_tilde = exp(2 * f) * g

\displaystyle       \mathcal{T}^{(0,2)}\left(\mathbb{S}^2\right)


\displaystyle       \left(\begin{array}{rr}      \frac{1}{\sin\left({\theta}\right)^{2}} & 0 \\      0 & 1      \end{array}\right)

nab_g_tilde = g_tilde.connection()

\displaystyle       \begin{array}{lcl} \Gamma_{ \phantom{\, {\theta} } \, {\theta} \, {\theta} }^{ \, {\theta} \phantom{\, {\theta} } \phantom{\, {\theta} } } & = & -\frac{\cos\left({\theta}\right)}{\sin\left({\theta}\right)} \end{array}

It’s not clear (to me at any rate) what the solutions are to the geodesic equations despite the guarantees of Agricola and Thier (2004). But let’s try a different chart.


\displaystyle       \left[\left[\left[0, 0\right], \left[0, 0\right]\right], \left[\left[0, 0\right], \left[0, 0\right]\right]\right]

In this chart, the geodesics are clearly straight lines as we would hope.
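To see why, here is a hand calculation (not part of the sage session). The substitution \theta = 2\arctan e^\xi gives \mathrm{d}\theta/\mathrm{d}\xi = 2e^\xi/(1 + e^{2\xi}) = 1/\cosh\xi = \sin\theta, and \phi = \zeta. The rescaled metric e^{2f}g = g/\sin^2\theta is therefore Euclidean in the Mercator chart:

\displaystyle   \frac{1}{\sin^2\theta}\mathrm{d}\theta\otimes\mathrm{d}\theta + \mathrm{d}\phi\otimes\mathrm{d}\phi = \mathrm{d}\xi\otimes\mathrm{d}\xi + \mathrm{d}\zeta\otimes\mathrm{d}\zeta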


Agricola, Ilka, and Christian Thier. 2004. “The geodesics of metric connections with vectorial torsion.” Annals of Global Analysis and Geometry 26 (4): 321–32. doi:10.1023/B:AGAG.0000047509.63818.4f.

Nakahara, M. 2003. “Geometry, Topology and Physics.” Text 822: 173–204. doi:10.1007/978-3-642-14700-5.

O’Neill, B. 1983. Semi-Riemannian Geometry with Applications to Relativity, 103. Pure and Applied Mathematics. Elsevier Science.

by Dominic Steinitz at November 24, 2016 05:04 PM

Yesod Web Framework

The New (Experimental) Yesod Development Server - Feedback Requested!

I'm guessing almost every Yesod user has, at some point, used the venerable yesod devel command, which launches a server that automatically recompiles your source code on any file change. This has been a core part of the Yesod ecosystem for many years. Unfortunately, it's had to be far more complicated than I'd have liked:

  • Since it predates addDependentFile (good work Greg Weber on getting that in!), it has some pretty complex logic around guessing which external files (like Hamlet files) should force a recompile. (Adding support for addDependentFile to the current yesod devel is possible, but it's a non-trivial undertaking.)
  • In order to ensure a consistent set of dependencies, it does some real fancy footwork around intercepting arguments passed to ghc and linker executables.
  • In order to parse various files, it links against the ghc library, tying it to a specific compiler version. This makes things difficult for users (don't accidentally use yesod from GHC 7.10.3 with GHC 8.0.1!), and sometimes really painful for maintainers.

For a few months now, I've been meaning to greatly simplify yesod devel, but the maintenance burden finally gave me the excuse I needed to bite the bullet and do it. The result is a dramatic simplification of the code base. First I'd like to ask for user feedback, and then I'll discuss some of the details of implementation.

Please try it out!

Since this is such a big change, I'd really appreciate if others could give this a shot before I release it. There are two ways you can do this:

Method 1:

  • git clone --branch 1304-stack-based-devel yesod-new-devel
  • cd yesod-new-devel
  • stack install yesod-bin
  • From your project directory, run yesod devel. NOTE: do not use stack exec -- yesod devel, you want to use the newly globally installed executable, not the one from your snapshot!

Method 2:

  • Add the following to your stack.yaml file's packages list:

    - location:
        commit: f3fc735a25eb3d5c051c761b59070eb9a0e4e156
      - yesod-bin
      extra-dep: true
  • Likely: add the following to your stack.yaml file's extra-deps list:

    - say-
    - typed-process-
  • stack build yesod-bin
  • stack exec -- yesod devel

Use whichever method you feel most comfortable with. Please let me know both successes and failures, and then I'll try to get this rolled out. Comments would be great on the GitHub pull request. So far, in my limited testing, I've found that the new yesod devel runs faster than the current one, but that could very much be confirmation bias speaking.

Note: there are a few removed features in this update, please see the changelog.

How it works

The big change, as the branch name implies, was relying entirely on Stack for all of the heavy lifting. Stack already provides a --file-watch option to automatically recompile, and uses GHC's own addDependentFile information to track external file dependencies. This cuts out the vast majority of the complexity. There's no longer any need to depend on the ghc library, there's less Cabal library code involved (making cross-version support much simpler), and almost everything is handled by shelling out to external executables.

I also got to redo the concurrency aspects of this using my absolute favorite package in the world: async. The result is, in my opinion, very straightforward. I also leveraged some of the newer libraries I've worked on, like safe-exceptions, typed-process, and say.
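The async package's race combinator runs two actions concurrently, returns the result of whichever finishes first and cancels the other. The real yesod devel code is not reproduced here. As a rough illustration of the pattern only, a minimal Python analogue using asyncio might look like the sketch below; watch_and_rebuild and serve_app are hypothetical stand-ins, not functions from yesod-bin.

```python
import asyncio

async def race(left, right):
    """Run two coroutines concurrently, return ("left" | "right", result) for the
    first to finish and cancel the other: a rough analogue of async's race."""
    tasks = {asyncio.ensure_future(left): "left",
             asyncio.ensure_future(right): "right"}
    done, pending = await asyncio.wait(list(tasks), return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    # Wait until the cancelled task has actually finished unwinding.
    await asyncio.gather(*pending, return_exceptions=True)
    winner = done.pop()
    return tasks[winner], winner.result()  # re-raises if the winner failed

async def watch_and_rebuild():
    """Hypothetical stand-in for a file-watching build that completes a rebuild."""
    await asyncio.sleep(0.05)
    return "rebuild finished"

async def serve_app():
    """Hypothetical stand-in for the running web application."""
    await asyncio.sleep(10)
    return "app exited"

print(asyncio.run(race(watch_and_rebuild(), serve_app())))  # ('left', 'rebuild finished')
```

In a development server, the tag of the winning side could tell a supervising loop whether to restart the app after a rebuild or to stop because the app exited; that loop is deliberately left out of this sketch.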

The code is (finally) well commented, so you can jump in and look yourself. I've also added a decent README, and an example of using yesod devel with a non-Yesod project.

November 24, 2016 05:15 AM

November 23, 2016

Neil Mitchell

The Website Working Group (HWWG)

Haskell represents both a language and a user community - and moreover a fantastic community full of friends, fun, and deep technical debate. Unfortunately, in recent times the community has started to fracture, e.g. Cabal vs Stack, vs These divisions have risen above technical disagreements and at some points turned personal. The solution, shepherded by Simon Peyton Jones, and agreed to by both members of the committee and the maintainers of, is to form the Haskell Website Working Group (HWWG). The charter of the group is at the bottom of this post.

The goal of the Haskell Website Working Group is to make sure the Haskell website caters to the needs of Haskell programmers, particularly beginners. In doing so we hope to either combine or differentiate and, and give people clear recommendations of what "downloading Haskell" means. Concretely, we hope that either redirects to, or that ends up being used for something very different from today.

The Haskell Website Working Group (HWWG)

Scope and goals

  • The HWWG is responsible for the design and content of the user-facing web site, including tutorials, videos, news, resource, downloads, etc.

  • The HWWG is not responsible for:
    • The infrastructure of
    • Toolchains, Hackage, compilers, etc
  • The HWWG focuses on serving users of Haskell, not suppliers of technology or libraries.
  • An explicit goal is to re-unite the and web sites.

Expected mode of operation

  • HWWG is not responsible for actually doing everything! The web site is on github. Anyone can make a pull request. The general expectation is that uncontroversial changes will be accepted and committed without much debate.
  • If there is disagreement about a proposed change, it's up to the HWWG to engage in (open) debate, and to seek input as widely as time allows, but finally to make a decision.


Initial membership comprises:

  • Neil Mitchell (chair)
  • Nicolas Wu
  • Andrew Cowie
  • Vincent Hanquez
  • Ryan Trinkle
  • Chris Done

It is expected the committee will change over time, but the mechanism has not yet been thought about.

Rules of engagement

  • Recognising that honestly-held judgements may differ, we will be scrupulously polite both in public and in private.
  • Recognising that Haskell has many users, and that different users have different needs and tastes, we want to be inclusive rather than exclusive, providing a variety of alternative resources (toolchains, tutorials, books, etc) clearly signposted with their intended audiences.
  • Ultimately the committee owns the URL, but it delegates authority for the design and content of the web site to the HWWG. In extremis, if the committee believes that the HWWG is mismanaging the web site, it can revoke that delegation.

by Neil Mitchell at November 23, 2016 10:04 PM

Joachim Breitner

microG on Jolla

I am incorrigible in picking non-mainstream, open smartphones, and then struggling hard. Back then in 2008, I tried to use the OpenMoko FreeRunner, but eventually gave up because of hardware glitches and reverted to my good old Siemens S35. It was not that I was unwilling to put up with inconveniences, but as soon as it makes life more difficult for the people I communicate with, it becomes hard to sustain.

Two years ago I tried again, and got myself a Jolla phone, running Sailfish OS. Things are much nicer now: The hardware is mature, battery life is good, and the Android compatibility layer enables me to run many important apps that are hard to replace, especially the Deutsche Bahn Navigator and various messengers, namely Telegram, Facebook Messenger, Threema and GroupMe.

Some apps that require Google Play Services, which provides a bunch of common tasks and usually comes with the Google Play store, would not run on my phone, as Google Play is not supported on Sailfish OS. So far, the most annoying ones of that sort were Uber and Lyft, making me pay for expensive taxis when others would ride cheaper, but I can live with that. I tried to install Google Play Services from shady sources, but it would regularly crash.

Signal on Jolla

Now in Philadelphia, people urged me to use the Signal messenger, and I was convinced by its support for good end-to-end crypto, while still supporting offline messages and allowing me to switch from my phone to my desktop and back during a conversation. The official Signal app uses Google Cloud Messaging (GCM, part of Google Play Services) to get push updates about new posts, and while I do not oppose this use of Google services (it really is just a ping without any metadata), this is a problem on Sailfish OS.

Luckily, the Signal client is open source, and someone created a “LibreSignal” edition that replaced the use of GCM with websockets, and indeed, this worked on my phone, and I could communicate.

Things were not ideal, though: I would often have to restart the app to get newly received messages; messages that I sent via Signal Desktop would often not show up on the phone; and, most severe, basically after every three messages, sending more messages from Desktop would stop working for my correspondents, which freaked them out. (Strangely, it continued working from their phone app, so we coped for a while.)

So again, my choice of non-standard devices causes inconveniences to others. This, and the fact that the original authors of Signal and the maintainers of LibreSignal got into a fight that ended with LibreSignal being discontinued, meant that I had to change something about this situation. I was almost ready to give in and get myself a Samsung S7 or something boring of the sort, but then I decided to tackle this issue once more, following some of the more obscure instructions out there, trying to get vanilla Signal working on my phone. About a day later, I got it, and this is how I did it.


So I need Google Play Services somehow, but installing the “real thing” did not seem very promising (I tried, and regularly got pop-ups telling me that Play Services had crashed). But I found some references to a project called “microG”, which is an independent re-implementation of (some of) the Play Services, in particular including GCM.

Installing microG itself was easy, as you can add their repository to F-Droid. I installed the core services, the services framework and the fake store apps. If that had been all there was to do, things would have been easy!

Play Store detection workarounds

But Signal would still complain about the lack of Google Play Services. It asks Android if an app with a certain name is installed, and would refuse to work if this app does not exist. For some reason, the microG apps cannot just have the names of the “real” Google apps.

There seem to be two ways of working around this: Patching Signal, or enabling Signature Spoofing.

The initially most promising instructions (which are in a README in a tarball on a fishy file hoster linked from an answer on the Jolla support forum…) suggested patching Signal, and actually came with both a version of an app called “Lucky Patcher” and a patched Android package, both about two years old. I tried a recent version of the Lucky Patcher, but it failed to patch the current version of Signal.

Signature Spoofing

So on to Signature Spoofing. This is a feature of some non-standard Android builds that allows apps (such as microG) to fake the existence of other apps (the Play Store), and is recommended by the microG project. Sailfish OS’s Android compatibility layer “Alien Dalvik” does not support it out of the box, but there is a tool “tingle” that adds this feature to existing Android systems. One just has to get the /system/framework/framework.jar file, put it into the input folder of this project, run python, select 2, and copy the framework.jar from output/ back. Great.

Deodexing Alien Dalvik

The catch is that it only works on “deodexed” files. I did not know anything about odexed Android Java classes (and did not really want to know), but there was no way around it. Following this explanation, I gathered that one finds files foo.odex in the Android system folder, runs some tool on them to create a classes.dex file, adds that to the corresponding foo.jar or foo.apk file, copies this back to the phone and deletes the foo.odex file.

The annoying thing is that one does not only have to do it for framework.jar in order to please tingle: if one does it to one odex file, one has to do it to all of them! For people using Windows, the Universal Deodexer V5 seems to be a convenient tool, but I had to go about it more manually.

So I first fetched “smali”, compiled it using ./gradlew build. Then I fetched the folders /opt/alien/system/framework and /opt/alien/system/app from the phone (e.g. using scp). Keep a backup of these in case something breaks. Then I ran these commands (disclaimer: I fetched these from my bash history and slightly cleaned them up. This is not a fire-and-forget script! Use it when you know what it and you are doing):

cd framework
for file in *.odex; do
  java -jar ~/build/smali/baksmali/build/libs/baksmali.jar deodex "$file" -o out
  java -jar ~/build/smali/smali/build/libs/smali.jar a out -o classes.dex
  zip -u "$(basename "$file" .odex).jar" classes.dex
  rm -rf out classes.dex "$file"
done
cd ..

cd app
for file in *.odex; do
  # -d ../framework: directory that baksmali searches for the classes the apps depend on
  java -jar ~/build/smali/baksmali/build/libs/baksmali.jar deodex -d ../framework "$file" -o out
  java -jar ~/build/smali/smali/build/libs/smali.jar a out -o classes.dex
  zip -u "$(basename "$file" .odex).apk" classes.dex
  rm -rf out classes.dex "$file"
done
cd ..
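Each loop iteration above boils down to two steps: derive foo.jar (or foo.apk) from foo.odex, then add the regenerated classes.dex to that archive. A minimal Python sketch of those two steps using the standard zipfile module follows. The helper names are hypothetical. The sketch assumes, as is typical for odexed archives, that the .jar/.apk does not yet contain a classes.dex, because zipfile's append mode cannot replace an existing member the way zip -u can.

```python
import os
import tempfile
import zipfile

def companion_archive(odex_path, ext):
    """foo.odex -> foo.jar (framework) or foo.apk (app), in the same directory."""
    return os.path.splitext(odex_path)[0] + ext

def add_classes_dex(archive_path, dex_path):
    """Add dex_path to the archive as classes.dex, roughly what `zip -u` does above.

    Refuses to touch an archive that already has a classes.dex, because zipfile's
    append mode would add a duplicate entry instead of replacing it.
    """
    with zipfile.ZipFile(archive_path, "a") as archive:
        if "classes.dex" in archive.namelist():
            raise ValueError(f"{archive_path} already contains classes.dex")
        archive.write(dex_path, arcname="classes.dex")

# Demonstration on a throwaway archive; the file contents are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    jar = companion_archive(os.path.join(tmp, "demo.odex"), ".jar")
    with zipfile.ZipFile(jar, "w") as archive:
        archive.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
    dex = os.path.join(tmp, "classes.dex")
    with open(dex, "wb") as fh:
        fh.write(b"placeholder, not real dex bytecode")
    add_classes_dex(jar, dex)
    with zipfile.ZipFile(jar) as archive:
        names = archive.namelist()
print(names)  # ['META-INF/MANIFEST.MF', 'classes.dex']
```

Producing classes.dex itself still needs baksmali and smali as in the loops above; this only replaces the zip step.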

The resulting framework.jar can now be patched with tingle:

mv framework/framework.jar ~/build/tingle/input
cd ~/build/tingle
# select 2
cd -
mv ~/build/tingle/output/framework.jar framework/framework.jar

Now I copy these framework and app folders back to my phone, and restart Dalvik:

devel-su systemctl restart aliendalvik.service

It might start a bit slower than usual, but eventually, all the Android apps should work as before.

The final bit that was missing in my case was that I had to reinstall Signal: If it is installed before microG is installed, it does not get permission to use GCM, and when it tries (while registering: After generating the keys) it just crashes. I copied /data/data/org.thoughtcrime.secretsms/ before removing Signal and moved it back after (with cp -a to preserve permissions) so that I could keep my history.

And now, it seems, vanilla Signal is working just fine on my Jolla phone!

What’s missing

Am I completely happy with Signal? No! An important feature that it lacks is a way to export all data (message history including media files) in a file format that can be read without Signal, e.g. YAML files or clean HTML code. I do want to be able to re-read some of the more interesting conversations when I am 74 or 75, and I doubt that there will be a Signal app, or even Android, then. I hope that this becomes available in time, maybe in the Desktop version.

I would also hope that pidgin gets support for the Signal protocol, so that I can conveniently use one program for all my messaging needs on the desktop.

Finally it would be nice if my Signal identity was less tied to one phone number. I have a German and a US phone number, and would want to be reachable under both on all my clients. (If you want to contact me on Signal, use my US phone number.)


Could I have avoided this hassle by simply convincing people to use something other than Signal? Tricky, at the moment. Telegram (which works super reliably for me, and has a pidgin plugin) has dubious crypto and does not support crypto when using multiple clients. Threema has no desktop client that I know of. OTR on top of Jabber does not support offline messages. So nothing great seems to exist right now.

In the long run, the best bet seems to be OMEMO (which is, in essence, the Signal protocol) on top of Jabber. It is currently supported by one Android Jabber client (Conversations) and one Desktop application (gajim, via a plugin). I should keep an eye on pidgin support for OMEMO and other development around this.

by Joachim Breitner at November 23, 2016 05:44 PM

Michael Snoyman

Haskell for Dummies

There was an image that made the rounds a while ago.

Haskell as seen by other language fans

The joke being: haha, Haskell is only for super-geniuses like Einstein. There's lots to complain about in this chart, but I'm going to pick on the lower-right corner. Specifically:

Haskellers don't use Haskell because we think we're Einstein. We use Haskell because we know we aren't.

When I speak to Haskellers, the general consensus is: "I'm not smart enough to write robust code in a language like Python." We're not using Haskell because we're brilliant; we're using Haskell because we know we need a language that will protect us from ourselves.

That said, I should acknowledge that Haskell does have a steeper learning curve for most programmers. But this is mostly due to unfamiliarity: Haskell is significantly different from languages like Python, Ruby, and Java, whereas those languages are all relatively similar to each other. Great educational material helps with this.

You should set your expectations appropriately: it will take you longer to learn Haskell, but it's worth it. Personally, I use Haskell because:

  • It gives me the highest degree of confidence that I'll write my program correctly, due to its strong, static typing
  • It has great support for modern programming techniques, like functional programming and green-thread-based concurrency
  • I can write more maintainable code in it than other languages
  • It has a great set of libraries and tools
  • It's got great performance characteristics for high-level code, and allows low-level performance tweaking when needed

I'm certainly leaving off a lot of points here; my goal isn't to be comprehensive. Instead, I'd like to dispel this notion of the Haskeller super-genius. We Haskellers don't believe it. We know why we're using a language like Haskell: to protect us from ourselves.

November 23, 2016 12:00 AM

November 22, 2016

FP Complete

Scripting in Haskell

Writing scripts in Haskell using Stack is straightforward and reliable. We've made a screencast to demonstrate this.


Slides in the screencast cover:

  • What Haskell is
  • What Stack is
  • A case for reproducible scripting

We cover the following example cases:

  • Hello, World! as a Haskell script
  • Generating a histogram of lines of a file
  • Using FSNotify to watch a directory for any changes to files

In summary, we show:

  1. Scripting in Haskell is reproducible: it works over time without bitrot.
  2. It's easy to do: just add a shebang #!/usr/bin/env stack to the top of your file.
  3. We can use useful, practical libraries that are not available in traditional scripting languages.
  4. Our script is cross-platform: it'll work on OS X, Linux or Windows.
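To give a feel for the histogram case, here's a minimal sketch of such a script. It is not the code from the screencast: the histogram function name and reading lines from standard input are assumptions made for illustration. It only needs the containers package, which ships with GHC, so no extra --package flag is used.

```haskell
#!/usr/bin/env stack
-- stack --install-ghc --resolver lts-6.23 runghc
-- Hypothetical sketch: print how often each distinct line occurs on stdin.
import qualified Data.Map.Strict as Map

-- Map each distinct line to the number of times it appears.
histogram :: [String] -> Map.Map String Int
histogram = Map.fromListWith (+) . map (\line -> (line, 1))

main :: IO ()
main = do
    contents <- getContents
    -- One "count<TAB>line" row per distinct line, in sorted line order.
    mapM_ (\(line, n) -> putStrLn (show n ++ "\t" ++ line))
          (Map.toList (histogram (lines contents)))
```

With the file saved as histogram.hs and marked executable, running ./histogram.hs < some-file.txt should print each distinct line with its count. Since Data.Map keeps its keys sorted, the rows come out in line order rather than by frequency.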

November 22, 2016 02:00 AM

Comparative Concurrency with Haskell

Last week, I was at DevConTLV X and attended a workshop by Aaron Cruz. While the title was a bit of a lie (it wasn't four hours, and we didn't do a chat app), it was a great way to see some basics of concurrency in different languages. Of course, that made me all the more curious to add Haskell to the mix.

I've provided multiple different implementations of this program in Haskell, focusing on different approaches (matching the approaches of the other languages, highly composable code, and raw efficiency). These examples are intended to be run and experimented with. The only requirement is that you install the Haskell build tool Stack. You can download a Windows installer, or on OS X and Linux run:

curl -sSL | sh

We'll start with approaches very similar to those in other languages like Go and Rust, and then move on to techniques like Software Transactional Memory, which provide a much improved concurrency experience for more advanced workflows. Finally, we'll dive into the async library, which provides some very high-level functions for writing concurrent code in a robust manner.

Unfortunately I don't have access to the source code for the other languages right now, so I can't provide a link to it. If anyone has such code (or wants to write up some examples for other languages), please let me know so I can add a link.

The problem

We want to spawn a number of worker threads which will each sleep for a random period of time, grab an integer off of a shared work queue, square it, and put the result back on a result queue. Meanwhile, a master thread will fill up the work queue with integers, and read and print results.

Running the examples

Once you've installed Stack, you can save each code snippet into a file with a .hs extension (like concurrency.hs), and then run it with stack concurrency.hs. If you're on OS X or Linux, you can also:

chmod +x concurrency.hs
./concurrency.hs

The first run will take a bit longer as it downloads the GHC compiler and installs library dependencies, but subsequent runs will be able to use the cached results. You can read more about scripting with Haskell.

Channels

Most of the other language examples used some form of channels. We'll begin with a channel-based implementation for a convenient comparison to other language implementations. As you'll see, Haskell's channel-based concurrency is quite similar to what you'd experience in languages like Go and Elixir.

#!/usr/bin/env stack
-- stack --install-ghc --resolver lts-6.23 runghc --package random
import Control.Concurrent (forkIO, threadDelay, readChan, writeChan, newChan)
import Control.Monad (forever)
import System.Random (randomRIO)

-- The following type signature is optional. Haskell has type
-- inference, which makes most explicit signatures unneeded. We
-- usually include them at the top level for easier reading.
workerCount, workloadCount, minDelay, maxDelay :: Int
workerCount = 250
workloadCount = 10000
minDelay = 250000 -- in microseconds, == 0.25 seconds
maxDelay = 750000 --                  == 0.75 seconds

-- Launch a worker thread. We take in the request and response
-- channels to communicate on, as well as the ID of this
-- worker. forkIO launches an action in a new thread, and forever
-- repeats the given action indefinitely.
worker requestChan responseChan workerId = forkIO $ forever $ do
    -- Get a random delay value between the min and max delays
    delay <- randomRIO (minDelay, maxDelay)
    -- Delay this thread by that many microseconds
    threadDelay delay
    -- Read the next item off of the request channel
    int <- readChan requestChan
    -- Write the response to the response channel
    writeChan responseChan (workerId, int * int)

main = do
    -- Create our communication channels
    requestChan <- newChan
    responseChan <- newChan

    -- Spawn off our worker threads. mapM_ performs the given action
    -- on each value in the list, which in this case is the
    -- identifiers for each worker.
    mapM_ (worker requestChan responseChan) [1..workerCount]

    -- Define a helper function to handle each integer in the workload
    let perInteger int = do
            -- Write the current item to the request channel
            writeChan requestChan int
            -- Read the result off of the response channel
            (workerId, square) <- readChan responseChan
            -- Print out a little message
            putStrLn $ concat
                [ "Worker #"
                , show workerId
                , ": square of "
                , show int
                , " is "
                , show square
                ]

    -- Now let's loop over all of the integers in our workload
    mapM_ perInteger [1..workloadCount]

This is a pretty direct translation of how you would do things in a language like Go or Erlang/Elixir. We've replaced for-loops with maps, but otherwise things are pretty similar.

There's a major limitation in this implementation, unfortunately. In the master thread, our perInteger function is responsible for providing requests to the workers. However, it will only provide one work item at a time and then block for a response. This means that we are effectively limiting ourselves to one concurrent request. We'll address this in various ways in the next few examples.

Compare-and-swap

It turns out that, in this case, we can use a lighter-weight alternative to a channel for the requests: we can just put all of our requests into an IORef - which is the basic mutable variable type in Haskell - and then pop requests off of the list inside that variable. Veterans of concurrency bugs will be quick to point out the read/write race condition you'd usually expect:

  1. Thread A reads the list from the variable
  2. Thread B reads the list from the variable
  3. Thread A pops the first item off the list and writes the rest to the variable
  4. Thread B pops the first item off the list and writes the rest to the variable

In this scenario, both threads A and B will end up with the same request to work on, which is certainly not our desired behavior. However, Haskell provides built-in compare-and-swap functionality, allowing us to guarantee that our read and write happen as a single atomic operation. This only works for a subset of Haskell functionality (specifically, the pure subset which does not have I/O side effects), and fortunately our pop-an-element-from-a-list operation falls into that subset. Let's see the code.
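To see that guarantee in isolation, here's a small sketch in which several threads drain one shared IORef using the same pop-the-head function we hand to atomicModifyIORef. The names pop and drainConcurrently are hypothetical, not from any library. If the read and write could interleave as in the scenario above, some numbers would come out twice; here each one is popped exactly once.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM)
import Data.IORef (atomicModifyIORef, newIORef)
import Data.List (sort)

-- The pure function we hand to atomicModifyIORef: the first component is
-- the new contents of the IORef, the second is the popped element (if any).
pop :: [a] -> ([a], Maybe a)
pop []       = ([], Nothing)
pop (x : xs) = (xs, Just x)

-- Start the given number of threads, each popping from one shared IORef
-- until it is empty, and return everything they popped, sorted.
drainConcurrently :: Int -> [Int] -> IO [Int]
drainConcurrently threads xs = do
    ref <- newIORef xs
    let grab acc = do
            m <- atomicModifyIORef ref pop
            case m of
                Nothing -> return acc
                Just x  -> grab (x : acc)
    -- One MVar per thread to hand back what that thread popped.
    boxes <- forM [1 .. threads] $ \_ -> do
        box <- newEmptyMVar
        _ <- forkIO (grab [] >>= putMVar box)
        return box
    popped <- mapM takeMVar boxes
    return (sort (concat popped))

main :: IO ()
main = do
    out <- drainConcurrently 8 [1 .. 1000]
    -- True: every element was popped exactly once, by exactly one thread.
    print (out == [1 .. 1000])
```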

#!/usr/bin/env stack
-- stack --install-ghc --resolver lts-6.23 runghc --package random
import Control.Concurrent (forkIO, threadDelay, writeChan, readChan, newChan)
import Data.IORef (atomicModifyIORef, newIORef)
import Control.Monad (replicateM_)
import System.Random (randomRIO)

workerCount = 250
workloadCount = 10000
minDelay = 250000
maxDelay = 750000

worker requestsRef responseChan workerId = forkIO $ do
    -- Define a function to loop on the available integers in the
    -- requests reference.
    let loop = do
            delay <- randomRIO (minDelay, maxDelay)
            threadDelay delay

            -- atomicModifyIORef is our compare-and-swap function. We
            -- give it a reference, and a function that works on the
            -- contained value. That function returns a pair of the
            -- new value for the reference, and a return value.
            mint <- atomicModifyIORef requestsRef $ \requests ->
                -- Let's see if we have anything to work with...
                case requests of
                    -- Empty list, so no requests! Put an empty list
                    -- back in and return Nothing
                    [] -> ([], Nothing)
                    -- We have something. Put the tail of the list
                    -- back in the reference, and return the new item.
                    int:rest -> (rest, Just int)

            -- Now we'll see what to do next based on whether or not
            -- we got something from the requests reference.
            case mint of
                -- No more requests, stop looping
                Nothing -> return ()
                -- Got one, so...
                Just int -> do
                    -- Write the response to the response channel
                    writeChan responseChan (workerId, int, int * int)
                    -- And loop again
                    loop

    -- Kick off the loop
    loop

main = do
    -- Create our request reference and response channel
    requestsRef <- newIORef [1..workloadCount]
    responseChan <- newChan

    mapM_ (worker requestsRef responseChan) [1..workerCount]

    -- We know how many responses to expect, so ask for exactly that
    -- many with replicateM_.
    replicateM_ workloadCount $ do
        -- Read the result off of the response channel
        (workerId, int, square) <- readChan responseChan
        -- Print out a little message
        putStrLn $ concat
            [ "Worker #"
            , show workerId
            , ": square of "
            , show int
            , " is "
            , show square
            ]

Compare-and-swap operations can be significantly more efficient than using true concurrency datatypes (like the Chans we saw above, or Software Transactional Memory). But they are also limiting, and don't compose nicely. Use them when you need a performance edge, or have some other reason to prefer an IORef.

Compared to our channels example, there are some differences in behavior:

  • In the channels example, our workers looped forever, whereas here they have an explicit stop condition. In reality, the Haskell runtime will automatically kill worker threads that are blocked on a channel without any writer. However, we'll see how to use closable channels in a later example.
  • The channels example would only allow one request on the request channel at a time. This is similar to some of the examples from other languages, but defeats the whole purpose of concurrency: only one worker will be occupied at any given time. This IORef approach allows multiple workers to have work items at once. (Again, we'll see how to achieve this with channels in a bit.)

You may be concerned about memory usage: won't holding that massive list of integers in memory all at once be expensive? Not at all: Haskell is a lazy language, meaning that the list will be constructed on demand. Each time a new element is asked for, it will be allocated, and then can be immediately garbage collected.

Software Transactional Memory

One of the most powerful concurrency techniques available in Haskell is Software Transactional Memory (STM). It allows us to have mutable variables, and to make modifications to them atomically. For example, consider this little snippet from a theoretical bank account application:

transferFunds from to amt = atomically $ do
    fromOrig <- readTVar from
    toOrig <- readTVar to
    writeTVar from (fromOrig - amt)
    writeTVar to (toOrig + amt)

In a typical concurrent setting, this would be incredibly unsafe: it's entirely possible for another thread to modify the from or to bank account values between the time our thread reads and writes them. However, with STM, we are guaranteed atomicity. STM will keep a ledger of changes made during an atomic transaction, and then attempt to commit them all at once. If any of the referenced variables have been modified by another thread during the transaction, the ledger will be rolled back and the transaction tried again. And like atomicModifyIORef from above, Haskell disallows side effects inside a transaction, so that this retry behavior cannot be observed from the outside world.
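As a concrete check, here's a self-contained sketch that runs many concurrent transferFunds calls in both directions between two accounts. The starting balances, the thread count and the stressTransfers helper are illustrative assumptions. Because each transfer runs inside atomically, the total a + b never changes, whatever the interleaving.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM
    (TVar, atomically, newTVarIO, readTVar, readTVarIO, writeTVar)
import Control.Monad (forM_, replicateM_)

-- The transferFunds snippet from above, with a type signature added.
transferFunds :: TVar Int -> TVar Int -> Int -> IO ()
transferFunds from to amt = atomically $ do
    fromOrig <- readTVar from
    toOrig <- readTVar to
    writeTVar from (fromOrig - amt)
    writeTVar to (toOrig + amt)

-- Fork n threads: even-numbered ones move 1 from a to b, odd-numbered
-- ones move 1 from b to a. Returns the final balances (a, b) once all
-- threads have finished.
stressTransfers :: Int -> IO (Int, Int)
stressTransfers n = do
    a <- newTVarIO 1000
    b <- newTVarIO 1000
    done <- newEmptyMVar
    forM_ [1 .. n] $ \i -> forkIO $ do
        if even i then transferFunds a b 1 else transferFunds b a 1
        putMVar done ()
    -- Wait until every transfer thread has signalled completion.
    replicateM_ n (takeMVar done)
    (,) <$> readTVarIO a <*> readTVarIO b

main :: IO ()
main = do
    (a, b) <- stressTransfers 10000
    -- Every committed transaction preserves a + b, so this prints 2000.
    print (a + b)
```

With an even thread count the two directions cancel out, so both balances should end at 1000; with an odd count they differ by two, but their sum is still 2000.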

To stress this point: Haskell's STM can eliminate the possibility of race conditions and deadlocks from many common concurrency patterns, greatly simplifying your code. The leg-up that Haskell has over other languages in the concurrency space is the ability to take something that looks like a calamity and make it safe.

We're going to switch our channels from above to STM channels. For the request channel, we'll use a bounded, closable channel (TBMChan). Bounding the size of the channel prevents us from loading too many values into memory at once, and using a closable channel allows us to tell our workers to exit.
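To make "bounded" and "closable" concrete, here's a deliberately minimal, hypothetical queue built directly on TVars. It is not how stm-chans implements TBMChan, and the BQueue names are my own. Blocking is done with STM's retry, which abandons the transaction and re-runs it once one of the TVars it read has changed: writers wait while the queue is full, and readers wait while it is empty but still open.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM
    (STM, TVar, atomically, newTVar, readTVar, retry, writeTVar)

-- Hypothetical bounded, closable queue: pending items plus a closed flag.
data BQueue a = BQueue
    { bqItems  :: TVar [a]  -- pending items, oldest first
    , bqLimit  :: Int       -- maximum number of pending items
    , bqClosed :: TVar Bool -- True once the producer is finished
    }

newBQueue :: Int -> STM (BQueue a)
newBQueue limit = BQueue <$> newTVar [] <*> pure limit <*> newTVar False

-- Block (via retry) while the queue is full; drop the item if it is closed.
writeBQueue :: BQueue a -> a -> STM ()
writeBQueue q x = do
    closed <- readTVar (bqClosed q)
    if closed
        then return ()
        else do
            xs <- readTVar (bqItems q)
            if length xs >= bqLimit q
                then retry
                else writeTVar (bqItems q) (xs ++ [x])

-- Just the oldest item; Nothing once closed and drained; block otherwise.
readBQueue :: BQueue a -> STM (Maybe a)
readBQueue q = do
    xs <- readTVar (bqItems q)
    case xs of
        y : ys -> do
            writeTVar (bqItems q) ys
            return (Just y)
        [] -> do
            closed <- readTVar (bqClosed q)
            if closed then return Nothing else retry

closeBQueue :: BQueue a -> STM ()
closeBQueue q = writeTVar (bqClosed q) True

-- A producer thread writes every item and then closes the queue; the
-- calling thread reads until it gets Nothing. At most limit items are
-- ever pending at once.
roundTrip :: Int -> [Int] -> IO [Int]
roundTrip limit xs = do
    q <- atomically (newBQueue limit)
    _ <- forkIO $ do
        mapM_ (atomically . writeBQueue q) xs
        atomically (closeBQueue q)
    let drain acc = do
            m <- atomically (readBQueue q)
            case m of
                Nothing -> return (reverse acc)
                Just x  -> drain (x : acc)
    drain []

main :: IO ()
main = roundTrip 4 [1 .. 20] >>= print
```

The semantics chosen here (writes after close are dropped, and readers get Nothing only once the queue is closed and drained) match how the example below uses TBMChan, but check the stm-chans documentation for TBMChan's exact guarantees.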

#!/usr/bin/env stack
{- stack --install-ghc --resolver lts-6.23 runghc
    --package random --package stm-chans
-}
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM (atomically, writeTChan, readTChan, newTChan)
import Control.Concurrent.STM.TBMChan (readTBMChan, writeTBMChan, newTBMChan, closeTBMChan)
import Control.Monad (replicateM_)
import System.Random (randomRIO)

workerCount = 250
workloadCount = 10000
minDelay = 250000 -- in microseconds, == 0.25 seconds
maxDelay = 750000 --                  == 0.75 seconds

worker requestChan responseChan workerId = forkIO $ do
    let loop = do
            delay <- randomRIO (minDelay, maxDelay)
            threadDelay delay

            -- Interact with the STM channels atomically
            toContinue <- atomically $ do
                -- Get the next request, if the channel is open
                mint <- readTBMChan requestChan
                case mint of
                    -- Channel is closed, do not continue
                    Nothing -> return False
                    -- Channel is open and we have a request
                    Just int -> do
                        -- Write the response to the response channel
                        writeTChan responseChan (workerId, int, int * int)
                        -- And yes, please continue
                        return True
            if toContinue
                then loop
                else return ()

    -- Kick it off!
    loop

main = do
    -- Create our communication channels. We're going to ensure the
    -- request channel never gets more than twice the size of the
    -- number of workers to avoid high memory usage.
    requestChan <- atomically $ newTBMChan (workerCount * 2)
    responseChan <- atomically newTChan

    mapM_ (worker requestChan responseChan) [1..workerCount]

    -- Fill up the request channel in a dedicated thread
    forkIO $ do
        mapM_ (atomically . writeTBMChan requestChan) [1..workloadCount]
        atomically $ closeTBMChan requestChan

    replicateM_ workloadCount $ do
        -- Read the result off of the response channel
        (workerId, int, square) <- atomically $ readTChan responseChan
        -- Print out a little message
        putStrLn $ concat
            [ "Worker #"
            , show workerId
            , ": square of "
            , show int
            , " is "
            , show square
            ]

Overall, this looks pretty similar to our previous channels example, which isn't surprising given the relatively basic usage of concurrency going on here. However, using STM is a good default choice in Haskell applications, due to how easy it is to build up complex concurrent programs with it.

Address corner cases

Alright, we've tried mirroring how examples in other languages work, given a taste of compare-and-swap, and explored the basics of STM. Now let's make our code more robust. The examples here - and those for other languages - often take some shortcuts. For example, what happens if one of the worker threads encounters an error? When our workload is simply "square a number," that's not a concern, but with more complex workloads this is very much expected.

Our first example, as mentioned above, didn't allow for true concurrency, since it only ever had one request in flight at a time. And all of our examples have made one other assumption: the number of results expected. In many real-world applications, one request may result in 0, 1, or many result values. So to sum up, let's create an example with the following behavior:

  • If any of the threads involved aborts with an exception, take down the whole computation, leaving no threads alive
  • Make sure that multiple workers can work in parallel
  • Let the workers exit successfully when there are no more requests available
  • Keep printing results until all worker threads exit.

We have one final tool in our arsenal that we haven't used yet: the async library, which provides some incredibly useful concurrency tools. Arguably, the most generally useful functions there are concurrently (which runs two actions in separate threads, as we'll describe in the comments below) and mapConcurrently, which runs an action for each value in a list, each in its own thread.

This example is how I'd recommend implementing this algorithm in practice: it uses solid library functions, accounts for exceptions, and is easy to extend for more complicated use cases.

#!/usr/bin/env stack
{- stack --install-ghc --resolver lts-6.23 runghc
    --package random --package async --package stm-chans
-}
import Control.Concurrent (threadDelay)
import Control.Concurrent.Async (mapConcurrently, concurrently)
import Control.Concurrent.STM (atomically)
import Control.Concurrent.STM.TBMChan (readTBMChan, writeTBMChan, newTBMChan, closeTBMChan)
import System.Random (randomRIO)

workerCount = 250
workloadCount = 10000
minDelay = 250000 -- in microseconds, == 0.25 seconds
maxDelay = 750000 --                  == 0.75 seconds

-- Not meaningfully changed from above, just some slight style
-- tweaks. Compare and contrast with the previous version if desired
-- :)
worker requestChan responseChan workerId = do
    let loop = do
            delay <- randomRIO (minDelay, maxDelay)
            threadDelay delay

            mint <- atomically $ readTBMChan requestChan
            case mint of
                Nothing -> return ()
                Just int -> do
                    atomically $
                        writeTBMChan responseChan (workerId, int, int * int)
                    loop
    loop

main = do
    -- Create our communication channels. Now the response channel is
    -- also bounded and closable.
    requestChan <- atomically $ newTBMChan (workerCount * 2)
    responseChan <- atomically $ newTBMChan (workerCount * 2)

    -- We're going to have three main threads. Let's define them all
    -- here. Note that we're _defining_ an action to be run, not
    -- running it yet! We'll run them below.
    let
        -- runWorkers is going to run all of the worker threads
        runWorkers = do
            -- mapConcurrently runs each function in a separate thread
            -- with a different argument from the list, and then waits
            -- for them all to finish. If any of them throw an
            -- exception, all of the other threads are killed, and
            -- then the exception is rethrown.
            mapConcurrently (worker requestChan responseChan) [1..workerCount]
            -- Workers are all done, so close the response channel
            atomically $ closeTBMChan responseChan

        -- Fill up the request channel, exactly the same as before
        fillRequests = do
            mapM_ (atomically . writeTBMChan requestChan) [1..workloadCount]
            atomically $ closeTBMChan requestChan

        -- Print each result
        printResults = do
            -- Grab a response if available
            mres <- atomically $ readTBMChan responseChan
            case mres of
                -- No response available, so exit
                Nothing -> return ()
                -- We got a response, so...
                Just (workerId, int, square) -> do
                    -- Print it...
                    putStrLn $ concat
                        [ "Worker #"
                        , show workerId
                        , ": square of "
                        , show int
                        , " is "
                        , show square
                        ]
                    -- And loop!
                    printResults

    -- Now that we've defined our actions, we can use concurrently to
    -- run all of them. This works just like mapConcurrently: it forks
    -- a thread for each action and waits for all threads to exit
    -- successfully. If any thread dies with an exception, the other
    -- threads are killed and the exception is rethrown.
    runWorkers `concurrently` fillRequests `concurrently` printResults

    return ()

By using the high-level concurrently and mapConcurrently functions, we avoid any possibility of orphaned threads, and get automatic exception handling and cancellation functionality.

Why Haskell

As you can see, Haskell offers many tools for advanced concurrency. At the most basic level, Chans and forkIO give you pretty similar behavior to what other languages provide. However, IORefs with compare-and-swap provide a cheap concurrency primitive not available in most other languages. And the combination of STM and the async package is a toolset that to my knowledge has no equal in other languages. The fact that side-effects are explicit in Haskell allows us to do many advanced feats that wouldn't be possible elsewhere.

We've only just barely scratched the surface of what you can do with Haskell. If you're interested in learning more, please check out our Haskell Syllabus for a recommended learning route. There's also lots of content on the haskell-lang get started page. And if you want to learn more about concurrency, check out the async tutorial.

FP Complete also provides corporate and group webinar training sessions. Please check out our training page for more information, or see our consulting page for how we can help your team succeed with devops and functional programming.

Advanced questions

We skirted some more advanced topics above, but for the curious, let me address some points:

  • In our first example, we used forever to ensure that our workers would never exit. But once they had no more work to do, what happens to them? The Haskell runtime is smart enough to notice in general when a channel has no more writers, and will automatically send an asynchronous exception to a thread which is trying to read from such a channel. This works well enough for a demo, but is not recommended practice:

    1. It's possible, though unlikely, that the runtime system won't be able to figure out that your thread should be killed
    2. It's much harder to follow the logic of a program which has no explicit exit case
    3. Using exceptions for control flow is generally a risky endeavor, and in the worst case, can lead to very unexpected bugs
  • For the observant Haskeller, our definitions of runWorkers and fillRequests in the last example may look dangerous. Specifically: what happens if one of those actions throws an exception before closing the channel? The other threads reading from the channel will be blocked indefinitely! Well, three things:

    1. As just described, the runtime system will likely be able to kill the thread if needed
    2. However, because of the way we structured our program, it won't matter: if either of these actions dies, it will take down the others, so no one will end up blocked on a channel read
    3. Nonetheless, I strongly recommend being exception-safe in all cases (I'm kind of obsessed with it), so a better way to implement these functions would be with finally (from Control.Exception), e.g.:

       fillRequests =
           mapM_ (atomically . writeTBMChan requestChan) [1..workloadCount]
               `finally` atomically (closeTBMChan requestChan)
  • This post was explicitly about concurrency, or running multiple I/O actions at the same time. I avoided talking about the very much related topic of parallelism, which is speeding up a computation by performing work on multiple cores. In other languages, the distinction between these is minor. In Haskell, with our separation between purity and impurity, parallelism can often be achieved with something as simple as replacing map with parMap (parallel map).

    That said, it's certainly possible - and common - to implement parallelism via concurrency. In order to make that work, we would need to force evaluation of the result value (int * int) before writing it to the channel. This could be achieved with something like:

    let !result = int * int
    writeChan responseChan (workerId, result)

    The ! is called a bang pattern (it requires the BangPatterns language extension), and indicates that evaluation should be forced immediately.
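The last two points can be tried out with base alone. The following hypothetical variant (not one of the examples above) uses a plain Chan of Maybe values and treats Nothing as "closed", since Control.Concurrent.Chan has no close operation. The producer uses finally so every worker still receives its end marker if producing fails part-way, and each worker uses a bang pattern so the squaring happens in the worker thread.

```haskell
{-# LANGUAGE BangPatterns #-}
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Exception (finally)
import Control.Monad (replicateM_)
import Data.List (sort)

-- Square each request until a Nothing ("closed") marker arrives, then
-- forward a Nothing so the collector knows this worker has stopped.
worker :: Chan (Maybe Int) -> Chan (Maybe Int) -> IO ()
worker requests responses = loop
  where
    loop = do
        mint <- readChan requests
        case mint of
            Nothing -> writeChan responses Nothing
            Just int -> do
                -- The bang forces int * int here, in the worker thread,
                -- instead of leaving a thunk for the reader to evaluate.
                let !result = int * int
                writeChan responses (Just result)
                loop

-- Run the given number of workers over xs and return the sorted squares.
squareAll :: Int -> [Int] -> IO [Int]
squareAll workers xs = do
    requests <- newChan
    responses <- newChan
    replicateM_ workers (forkIO (worker requests responses))
    -- finally guarantees one end marker per worker, even if producing
    -- the requests throws part-way through.
    _ <- forkIO $
        mapM_ (writeChan requests . Just) xs
            `finally` replicateM_ workers (writeChan requests Nothing)
    -- Keep reading until every worker has reported that it stopped.
    let collect 0 acc = return (sort acc)
        collect n acc = do
            m <- readChan responses
            case m of
                Nothing -> collect (n - 1) acc
                Just r  -> collect n (r : acc)
    collect workers []

main :: IO ()
main = squareAll 4 [1 .. 10] >>= print  -- prints [1,4,9,16,25,36,49,64,81,100]
```

Because each worker reads exactly one end marker and Chan is FIFO, every Just result a worker produces reaches the collector before that worker's Nothing, so no result is lost.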

November 22, 2016 02:00 AM

Michael Snoyman

Spreading the Gospel of Haskell

Yesterday I fired off two tweets about the state of Haskell evangelism.

But simply complaining about the state of things, instead of actually proposing a way to make things better, is a waste of 280 characters. So I'd like to expand on where I think we, the Haskell community, can do better.

To try and prevent the flamewars which I'm sure are about to start, let me give some guidelines on who shouldn't read this blog post:

  • If you think that programming languages should succeed on purely technical merits, and silly "marketing activities" like writing newbie-oriented tutorials and making engaging screencasts are unfair competition, you shouldn't read this blog post.
  • If you think that broken dependency solving, Hackage downtime, confusing cabal-install incantations, and problematic global package databases in the Haskell Platform have had no ill effect on Haskell adoption, you shouldn't read this blog post.
  • If you think that new users who give up after 5 minutes of confusion on a website weren't serious enough in the first place and we shouldn't be sad that we lost them, you shouldn't read this blog post.

And most likely, you shouldn't post this to /r/haskell. That's not going to end well.

Attacking Haskell's Flaws

As the Twitter discussion yesterday pointed out, there are undoubtedly flaws in Haskell. It may be inflammatory to admit that publicly, but so be it. Every language has flaws. Haskell is blessed to also have some of the greatest strengths in any programming language available today: beautiful concurrency, a powerful and useful type system, a plethora of real-world libraries, and (as of recently) pretty good tooling and educational resources.

At FP Complete, we often talk about the attractors and obstacles (thanks to our CEO, Aaron Contorer, for this great prism through which to view things). Using that terminology: Haskell is chock-full of attractors. The problem is the obstacles which prevent Haskell from taking off. I'm going to claim that, at this point, we need to do very little as far as making Haskell more attractive, but instead need to collectively knock down obstacles preventing its success.

Obstacles can be a great many things, some of which you may have categorized as "missing attractors." Let me give some examples:

  • Missing IDE tooling. For some people, this is a deal-breaker, and will prevent them from using Haskell.
  • A missing library. Again, if someone needs to access, say, MS SQL Server, and a library doesn't exist, this is an obstacle to adoption. (Yes, that person could go ahead and write the library him/herself. If you think that's the right response, you probably shouldn't be reading this blog post.)
  • Lack of a tutorial/example/cookbook for a specific problem domain. Yes, someone could struggle through reading API docs until "it clicks." If that's your answer, you also probably shouldn't be reading this post.
  • Lack of support for an OS/architecture.

The important thing about obstacles is that they are not universal. For most of us, lack of support for Haiku OS will not prevent us from using Haskell. Those of us who have been using Haskell for years have decided that the obstacles of bad tooling weren't enough to deter us from the great language features. And so on.


Many people in the Haskell community have been chipping away at random obstacles (or adding random attractors) for years now, on a hobbyist basis. If that's all you want to do, more power to you, and enjoy. What I'm doing here is making a call for a more concerted, organized effort into knocking down these obstacles to Haskell adoption.

I'd say that we can measure how high a priority an obstacle-destroying action is based on two criteria:

  • How difficult it will be to accomplish
  • How big an impact it will have on Haskell adoption

I would call easy actions with big impact low-hanging fruit, and recommend we focus on those actions for now. For example, while improving GHC compile times may have a big impact, it's also difficult to accomplish. Similarly, changing the Haskell logo from purple to blue is easy to accomplish, but doesn't have much impact.

So my set of easy-to-do, big-impact actions comes down entirely to spreading the word. I would say the biggest and most easily knocked-down obstacles are:

  • Someone's never heard of Haskell
  • Someone's heard of Haskell, but doesn't know why it's relevant
  • Someone's well aware of Haskell, but thinks it will be hard to start with
  • Someone's already tried Haskell and run into problems (like Dependency Hell), and doesn't realize we've solved them

So what does this entail? Here are my suggestions:

  • Write a blog post about how you solved a problem in Haskell
  • Give a talk at a conference on what problems Haskell is particularly good at solving (my money goes to concurrency on this)
  • Put together a screencast on Haskell
  • Encourage a non-Haskeller to go through the Haskell Book and Haskell Syllabus

The intention here is to show the world that Haskell is ready to help them, and that it's easy to get started now. Many of us at FP Complete have been putting out such posts for a while. I'm asking others to join in the fun and help give Haskell adoption a kick-start.

One final request: if you've gotten this far, odds are you agree that we need to encourage users to take the most-likely-to-succeed route to Haskell, be that with tooling, training, library installation, or library selection. We've put a lot of effort into making the destination for that goal. Hopefully can converge on this goal in the future, but for now it's very likely to just present another obstacle. When you tell people to get started with Haskell, I strongly recommend linking to

November 22, 2016 12:00 AM

November 21, 2016

FP Complete

Mastering Time-to-Market with Haskell

For bringing your product to market, there isn't just one metric for success. That depends on your business needs. Haskell is the pure functional programming language that brings tangible benefits over its competitors for a variety of TTM priorities. Let's explore four of them.

Speed to market

You may want to bring your product to market as quickly as possible, because you have direct competitors; you have to produce a functioning demo for your investors; or your success in the market is time sensitive. Haskell speeds up this process in various ways.

Construct things correctly the first time: Haskell is statically typed, which is a form of program verification that guarantees correctness of certain properties, like in Java or C#. Unlike Java or C#, Haskell is a pure functional language, leading to verification of far more portions of the program source code. With feedback from the compiler while developing, your developers are guaranteed a certain level of correctness, and this allows them to concentrate on your domain business logic. As written elsewhere, worst practices should be hard. See also a case study on Haskell vs C# for contract writing.

Reduce testing time: Haskell emphasizes the correct by construction approach, which is to use the type system to verify that program parts are combined in only the ways that will not crash and that make sense. Examples range from the simple to the advanced. For example, Haskell web frameworks like Yesod prevent XSS and accidental broken links statically. The more surface area of your problem covered by static analysis, the less time and effort is needed by your developers for writing unit and integration tests, and the tests blow up less often in continuous integration.
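As a small illustration of correct by construction (a minimal sketch, not Yesod's actual API), the hypothetical Html type below can only be built by escaping untrusted text or by combining values that are already safe, so a raw String cannot reach the page unescaped. In a real module, Html would be exported without its constructor to enforce this. Load it into GHCi to try it.

```haskell
-- Hypothetical type for markup that is known to be safe to emit.
-- In a real library the Html constructor would not be exported, so the
-- functions below would be the only way to create values of this type.
newtype Html = Html String

-- Turn untrusted text into safe markup by escaping HTML metacharacters.
escapeText :: String -> Html
escapeText = Html . concatMap escape
  where
    escape '<'  = "&lt;"
    escape '>'  = "&gt;"
    escape '&'  = "&amp;"
    escape '"'  = "&quot;"
    escape '\'' = "&#39;"
    escape c    = [c]

-- Concatenating two safe fragments yields a safe fragment.
(<+>) :: Html -> Html -> Html
Html a <+> Html b = Html (a ++ b)

-- Wrap safe content in an element. The tag name is assumed to be a
-- developer-written literal, never user input.
tag :: String -> Html -> Html
tag name (Html body) = Html ("<" ++ name ++ ">" ++ body ++ "</" ++ name ++ ">")

-- The only way to get the raw text back out, e.g. to send it to a client.
render :: Html -> String
render (Html s) = s
```

For example, render (tag "p" (escapeText "<script>")) gives "<p>&lt;script&gt;</p>", while passing a plain String where an Html is expected is rejected at compile time.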

Make builds reproducible: Haskell projects built with the Stack build tool are guaranteed reproducible builds using a stable set of packages, and Stack also provides Docker support out of the box, adding an additional layer of reproducibility. If a reproducible build is created on one developer's machine, it will work on any. This significantly reduces ramp-up time for your developers, and makes continuous integration trivial for your devops people.

Use the concurrency: Many problems are solved more easily with concurrency (networking, file I/O, video/image/document processing, database access, etc.) simply because the programming model is easier to understand. Haskell has some of the best concurrency support of any popular language: it offers a breadth of tools, efficiency, and stability, and it is trivial to use out of the box. Let your developers use it. See Beautiful concurrency for more about concurrency in Haskell. Additionally, the code doesn't have to be rewritten in an arcane style, as in NodeJS, to gain concurrency.

Shipping on schedule

You may not need to ship as soon as possible, but to ship on schedule, for a demo, a conference or as promised to your investors or customers. For this, there is another mitigating feature of Haskell.

Types shed light on scope: Using the type system of Haskell to model your domain logic helps to clear the "fog of war" that we experience with project scope: there are many unknowns, and we need to enumerate all the cases and kinds of things. Armed with this UML-without-the-UML, you can have confidence in how much scope you can cover now for the current shipping schedule, and what needs to come in version 2. This gives confidence in time estimates made by your developers. See Haskell-Providing Peace of Mind for a case study. See Using Haskell at SQream Technologies for a technical demonstration.
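A minimal sketch of this kind of modeling, using a hypothetical OrderStatus type (not taken from the linked case studies): each case of the domain is one constructor, and GHC's incomplete-pattern warning points at every function that has not yet handled a newly added case.

```haskell
{-# OPTIONS_GHC -Wincomplete-patterns #-}

-- Hypothetical domain model: every state an order can be in, spelled out
-- as one sum type. Money is kept in integer cents to avoid rounding issues.
data OrderStatus
  = Pending
  | Paid Integer       -- amount received, in cents
  | Shipped String     -- tracking number
  | Cancelled String   -- reason given
  deriving (Show, Eq)

-- The match is checked for exhaustiveness: adding a new constructor makes
-- GHC warn here until the new case is handled.
describe :: OrderStatus -> String
describe Pending         = "awaiting payment"
describe (Paid cents)    = "paid " ++ show cents ++ " cents"
describe (Shipped track) = "shipped, tracking " ++ track
describe (Cancelled why) = "cancelled: " ++ why

-- Business rules are written down per case, too.
canCancel :: OrderStatus -> Bool
canCancel Pending       = True
canCancel (Paid _)      = True
canCancel (Shipped _)   = False
canCancel (Cancelled _) = False
```

Adding a constructor such as a hypothetical Refunded makes GHC flag both describe and canCancel, which gives a concrete list of the places a scope change touches.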

Avoid build system detours: Reproducible builds help to avoid the inevitable build system detours that happen on large projects, where developers waste time getting the build system to work on each other's machines and debugging failures. Stack is reproducible out of the box.

Minimizing resources

You might want or need to be cost-effective in your development cycle, using as few developers or machines as possible. Haskell also reduces development costs.

Less testing is required: Haskell software requires less testing. Your developers always write tests, but with Haskell they can write fewer and spend less time on them. Furthermore, fewer developers are needed to achieve a stable system, because a type system helps limit the scope of possibilities and lets your developers manage complexity.

Use Haskell's raw performance: Additionally, Haskell is a fast language on a single core. It is compiled to native machine code, and is competitive with C# and Java. At times it is competitive with C, see Haskell from C: Where are the for loops? for a technical demonstration. This means that you need fewer machine resources, and fewer machines. Haskell is also easy to write concurrent code for, which means you can make use of those additional cores on your machines.

Flexibility to make changes

Most development efforts put a high priority on flexibility to change, or should. But you may have a particularly pronounced need for flexibility in changes to requirements without disruption. This is perhaps Haskell's strongest, most desirable advantage.

Correct by construction extends to reconstruction: Making a change to a well-typed system in Haskell is far more stable and reliable than its competitors because correctness invariants are maintained with any change of the code. See also this case study by Silk.

Less maintenance of the test suite under change: This requires less maintenance of the test suite, because static analysis gives your developers immediate feedback at a granular level, whereas a test suite typically does not guide them through the change process, it only tells them what their out-of-date specification expects. Once the change process is complete, updating the test suite becomes easier.

Types are rarely buggy: It's very rare to design a data type that has bugs in it, because they are so simple. Meanwhile a unit test or integration test suite presents additional developer overhead because it itself is a program that requires maintenance too.

See this Bump case study for why teams are choosing Haskell over Python, and here for a similar comparison against Ruby: Haskell yields fewer errors and bugs.

Expanding development effort

You may find that your project requires hiring new developers and building a team. Now is the perfect time to hire Haskell developers. Like Python developers 10 years ago, Haskell developers are self-selecting; they learn it because it's a better language, not because it will guarantee them employment. At the same time, at this stage in Haskell's development, the wealth of practical, stable packages available indicate an infusion of pragmatic, experienced programmers.

Summary and further reading

In summary we've seen that:

  • Haskell decreases development time by narrowing the scope of development to your domain.
  • Haskell decreases the need and dependency on unit testing alone.
  • Haskell aids in reproducibility which helps teams work together, expand, and deploy.
  • Haskell's speed and easy concurrency reduce the cost of development substantially.
  • Haskell helps your developers evolve a system over time safely and with confidence, yielding more reliable time estimates.

You can learn more about using Haskell as a business at FP Complete's home page, in particular the Consulting page, or go and contact us straight away and we'll be in touch.

November 21, 2016 02:00 AM

November 16, 2016

Brent Yorgey

MonadRandom 0.5 and mwc-random: feedback wanted

Since 2013 or so I have been the maintainer of the MonadRandom package, which provides an mtl-style type class for monads with support for generation of pseudorandom values, along with a concrete random monad transformer RandT. As of this writing it has 89 reverse dependencies on Hackage—a healthy number, and one that makes me think carefully about any breaking changes to the package.
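For readers unfamiliar with the term, here is a toy, self-contained sketch of what an mtl-style class for random generation looks like. It is not the MonadRandom package's API: the class MonadRand, the Rand state monad, the LCG constants, and evalToyRand are all illustrative assumptions. Code such as rollDie is written only against the class, so it runs in any monad that provides an instance.

```haskell
import Data.Word (Word32)

-- Hypothetical mtl-style class: any monad that can produce pseudorandom words.
class Monad m => MonadRand m where
  nextWord :: m Word32

-- A toy concrete instance: a state monad threading a 32-bit seed.
newtype Rand a = Rand { runRand :: Word32 -> (a, Word32) }

instance Functor Rand where
  fmap f (Rand g) = Rand $ \s -> let (a, s') = g s in (f a, s')

instance Applicative Rand where
  pure a = Rand $ \s -> (a, s)
  Rand f <*> Rand g = Rand $ \s ->
    let (h, s1) = f s
        (a, s2) = g s1
    in (h a, s2)

instance Monad Rand where
  Rand g >>= k = Rand $ \s -> let (a, s1) = g s in runRand (k a) s1

-- Linear congruential step (constants from Numerical Recipes); Word32
-- arithmetic wraps around, so this is computed modulo 2^32.
instance MonadRand Rand where
  nextWord = Rand $ \s -> let s' = 1664525 * s + 1013904223 in (s', s')

-- Written only against the class, so it works in any instance.
-- Uses the high 16 bits, since an LCG's low bits are of poor quality.
rollDie :: MonadRand m => m Int
rollDie = do
  w <- nextWord
  return (fromIntegral ((w `div` 65536) `mod` 6) + 1)

-- Run a Rand computation from a seed, discarding the final seed.
evalToyRand :: Rand a -> Word32 -> a
evalToyRand m seed = fst (runRand m seed)
```

For example, in GHCi, evalToyRand (sequence (replicate 3 rollDie)) 42 yields three values between 1 and 6, and the same seed always yields the same list.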

Recently I got a number of pull requests, and have been working on putting together an 0.5 release which adds a few functions, adds lazy- and strict-state variants of RandT, and reorganizes things to be closer to standard practice of the transformers package. Since this release will include some technically breaking changes already, it’s a good time to think about potentially including others.

The one thing I am not sure what to do about is this issue: Allow MonadRandom interface for MWC-random. mwc-random is a very nice package for pseudorandom number generation, but apparently it does not fit into the MonadRandom abstraction. First of all, I would like to understand why; I am not very familiar with mwc-random. Second, I’d love to figure out a solution, ideally one that causes as little breakage to existing code as possible.

Leave a comment (either here or on the github issue) if this is something you know/care about, and let’s see if we can figure out a good solution together!

by Brent at November 16, 2016 11:36 AM

Michael Snoyman

Haskell's Missing Concurrency Basics

I want to discuss two limitations in standard Haskell libraries around concurrency, and discuss methods of improving the status quo. Overall, Haskell's concurrency story is - in my opinion - the best in class versus any other language I'm aware of, at least for the single-machine use case. The following are two issues that I run into fairly regularly and are a surprising wart:

  • putStrLn is not thread-safe
  • Channels cannot be closed

Let me back up these claims, and then ask for some feedback on how to solve them.

putStrLn is not thread-safe

The example below is, in my opinion, a prime example of beautiful concurrency in Haskell:

#!/usr/bin/env stack
-- stack --resolver lts-6.23 --install-ghc runghc --package async
import Control.Concurrent.Async
import Control.Monad (replicateM_)

worker :: Int -> IO ()
worker num = replicateM_ 5 $ putStrLn $ "Hi, I'm worker #" ++ show num

main :: IO ()
main = do
    mapConcurrently worker [1..5]
    return ()

Well, it's beautiful until you see the (abridged) output:

Hi, HIiH'HH,imii , ,,I w  'IoIIm'r'' mkmmw e  owrwwro ookr#rrek2kkre
ee rrr# H  3#i##

Your mileage may vary of course. The issue here is that Prelude.putStrLn works on String, which is a lazy list of Chars, and in fact sends one character at a time to stdout. This is clearly not what we want. However, at the same time, many Haskellers - myself included - consider String-based I/O a bad choice anyway. So let's replace this with Text-based I/O:

#!/usr/bin/env stack
-- stack --resolver lts-6.23 --install-ghc runghc --package async --package text
{-# LANGUAGE OverloadedStrings #-}
import Control.Concurrent.Async
import Control.Monad (replicateM_)
import qualified Data.Text as T
import qualified Data.Text.IO as T

worker :: Int -> IO ()
worker num = replicateM_ 5
           $ T.putStrLn
           $ T.pack
           $ "Hi, I'm worker #" ++ show num

main :: IO ()
main = do
    mapConcurrently worker [1..5]
    return ()

Unfortunately, if you run this (at least via runghc), the results are the same. If you look at the implementation of Data.Text.IO.hPutStr, you'll see that there are different implementations of that function depending on the buffering strategy of the Handle we're writing to. In the case of NoBuffering (which is the default with GHCi and runghc), this will output one character at a time (just like String), whereas LineBuffering and BlockBuffering have batch behavior. You can see this with:

#!/usr/bin/env stack
-- stack --resolver lts-6.23 --install-ghc runghc --package async --package text
{-# LANGUAGE OverloadedStrings #-}
import Control.Concurrent.Async
import Control.Monad (replicateM_)
import qualified Data.Text as T
import qualified Data.Text.IO as T
import System.IO

worker :: Int -> IO ()
worker num = replicateM_ 5
           $ T.putStrLn
           $ T.pack
           $ "Hi, I'm worker #" ++ show num

main :: IO ()
main = do
    hSetBuffering stdout LineBuffering
    mapConcurrently worker [1..5]
    return ()

While better, this still isn't perfect:

Hi, I'm worker #4Hi, I'm worker #5Hi, I'm worker #1

Hi, I'm worker #4Hi, I'm worker #5Hi, I'm worker #1

Hi, I'm worker #4Hi, I'm worker #5

Unfortunately, because newlines are written to stdout separately from the message, these kinds of issues happen too frequently. This can be worked around too by using putStr instead and manually appending a newline character:

#!/usr/bin/env stack
-- stack --resolver lts-6.23 --install-ghc runghc --package async --package text
{-# LANGUAGE OverloadedStrings #-}
import Control.Concurrent.Async
import Control.Monad (replicateM_)
import qualified Data.Text as T
import qualified Data.Text.IO as T
import System.IO

worker :: Int -> IO ()
worker num = replicateM_ 5
           $ T.putStr
           $ T.pack
           $ "Hi, I'm worker #" ++ show num ++ "\n"

main :: IO ()
main = do
    hSetBuffering stdout LineBuffering
    mapConcurrently worker [1..5]
    return ()

Finally, we can avoid the buffering-dependent code in the text package and use ByteString output, which has the advantage of automatically using this append-a-newline logic for small-ish ByteStrings:

#!/usr/bin/env stack
-- stack --resolver lts-6.23 --install-ghc runghc --package async
{-# LANGUAGE OverloadedStrings #-}
import Control.Concurrent.Async
import Control.Monad (replicateM_)
import qualified Data.ByteString.Char8 as S8

worker :: Int -> IO ()
worker num = replicateM_ 100
           $ S8.putStrLn
           $ S8.pack
           $ "Hi, I'm worker #" ++ show num

main :: IO ()
main = do
    mapConcurrently worker [1..100]
    return ()

However, this has the downside of assuming a certain character encoding, which may be different from the encoding of the Handle.

What I'd like: a function Text -> IO () which, regardless of buffering strategy, appends a newline to the Text value and sends the entire chunk of data to a Handle in a thread-safe manner. Ideally it would account for character encoding (though assuming UTF8 may be an acceptable compromise for most use cases), and it would be OK if very large values are occasionally compromised during output (due to the write system call not accepting the entire chunk at once).

What I'd recommend today: in a number of my smaller applications/scripts, I've become accustomed to defining say = BS.hPutStrLn stdout . encodeUtf8. I'm tempted to add this to a library, possibly even classy-prelude, along with either reimplementing print as print = say . T.pack . show or providing an alternative to print. I've also considered replacing the putStrLn in classy-prelude with this implementation of say.

However, I'm hoping others have some better thoughts on this, because I don't really find these solutions very appealing.
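One more option, offered only as a sketch and not as the say function described above: serialize output with an explicit lock. Every writer that goes through the hypothetical sayLocked helper takes the same MVar before writing, so whole lines never interleave regardless of the Handle's buffering mode. The catch is that the guarantee only holds if all code writing to that Handle uses the same lock. The names OutputLock, sayLocked, and runWorkers are illustrative.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Exception (finally)
import Control.Monad (forM, replicateM_)
import qualified Data.Text as T
import qualified Data.Text.IO as T
import System.IO

-- Hypothetical lock that every writer to a given Handle agrees to take.
newtype OutputLock = OutputLock (MVar ())

newOutputLock :: IO OutputLock
newOutputLock = OutputLock <$> newMVar ()

-- Write one line while holding the lock. Even under NoBuffering, where
-- Data.Text.IO emits one character at a time, no other sayLocked call can
-- write to the Handle until this line (including its newline) is done.
sayLocked :: OutputLock -> Handle -> T.Text -> IO ()
sayLocked (OutputLock m) h t = withMVar m $ \_ -> T.hPutStrLn h t

-- Run n workers that each write k lines through the lock, then wait for all
-- of them. 'finally' makes sure a worker signals completion even if it throws.
runWorkers :: OutputLock -> Handle -> Int -> Int -> IO ()
runWorkers lock h n k = do
  dones <- forM [1 .. n] $ \num -> do
    done <- newEmptyMVar
    _ <- forkIO $
      replicateM_ k (sayLocked lock h (T.pack ("Hi, I'm worker #" ++ show num)))
        `finally` putMVar done ()
    return done
  mapM_ takeMVar dones
```

In a script you would create one lock at startup, pass it to every thread, and call sayLocked lock stdout where you would otherwise call putStrLn.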

Non-closable channels

Let's implement a very simple multi-worker application with communication over a Chan:

#!/usr/bin/env stack
-- stack --resolver lts-6.23 --install-ghc runghc --package async --package text
{-# LANGUAGE OverloadedStrings #-}
import Control.Concurrent
import Control.Concurrent.Async
import Control.Monad (forever)
import Data.Text (Text, pack)
import Data.Text.Encoding (encodeUtf8)
import qualified Data.ByteString.Char8 as S8

say :: Text -> IO ()
say = S8.putStrLn . encodeUtf8

worker :: Chan Int -> Int -> IO ()
worker chan num = forever $ do
    i <- readChan chan
    say $ pack $ concat
        [ "Worker #"
        , show num
        , " received value "
        , show i
        ]

main :: IO ()
main = do
    chan <- newChan
    mapConcurrently (worker chan) [1..5] `concurrently`
        mapM_ (writeChan chan) [1..10]
    return ()

(Yes, I used the aforementioned say function.)

This looks all well and good, but check out the end of the output:

Worker #5 received value 8
Worker #3 received value 9
Worker #1 received value 10
Main: thread blocked indefinitely in an MVar operation

You see, the worker threads have no way of knowing that there are no more writeChan calls incoming, so they continue to block. The runtime system notes this, and sends them an async exception to kill them. This is a really bad idea for program structure as it can easily lead to deadlocks. Said more simply:

If you rely on exceptions for non-exceptional cases, you're gonna have a bad time

Instead, the workers should have some way of knowing that the channel is closed. This is a common pattern in other languages, and one I think we should borrow. Implementing this with STM isn't too bad actually, and can easily have an IO-based API if desired:

#!/usr/bin/env stack
-- stack --resolver lts-6.23 --install-ghc runghc --package async --package text
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative ((<|>))
import Control.Concurrent.Async
import Control.Concurrent.STM
import Data.Text (Text, pack)
import Data.Text.Encoding (encodeUtf8)
import qualified Data.ByteString.Char8 as S8

say :: Text -> IO ()
say = S8.putStrLn . encodeUtf8

data TCChan a = TCChan (TChan a) (TVar Bool)

newTCChan :: IO (TCChan a)
newTCChan = atomically $ TCChan <$> newTChan <*> newTVar False

closeTCChan :: TCChan a -> IO ()
closeTCChan (TCChan _ var) = atomically $ writeTVar var True

writeTCChan :: TCChan a -> a -> IO ()
writeTCChan (TCChan chan var) val = atomically $ do
    closed <- readTVar var
    if closed
        -- Could use nicer exception types, or return a Bool to
        -- indicate if writing failed
        then error "Wrote to a closed TCChan"
        else writeTChan chan val

readTCChan :: TCChan a -> IO (Maybe a)
readTCChan (TCChan chan var) = atomically $
    (Just <$> readTChan chan) <|> (do
        closed <- readTVar var
        check closed
        return Nothing)

worker :: TCChan Int -> Int -> IO ()
worker chan num =
    loop
  where
    loop = do
        mi <- readTCChan chan
        case mi of
            Nothing -> return ()
            Just i -> do
                say $ pack $ concat
                    [ "Worker #"
                    , show num
                    , " received value "
                    , show i
                    ]
                loop

main :: IO ()
main = do
    chan <- newTCChan
    mapConcurrently (worker chan) [1..5] `concurrently` do
        mapM_ (writeTCChan chan) [1..10]
        closeTCChan chan
    return ()

Fortunately, this problem has a preexisting solution: the stm-chans package, which provides closable and bounded channels and queues. Our problem above can be more easily implemented with:

#!/usr/bin/env stack
-- stack --resolver lts-6.23 --install-ghc runghc --package async --package text --package stm-chans
{-# LANGUAGE OverloadedStrings #-}
import Control.Concurrent.Async
import Control.Concurrent.STM
import Control.Concurrent.STM.TMQueue
import Data.Text (Text, pack)
import Data.Text.Encoding (encodeUtf8)
import qualified Data.ByteString.Char8 as S8

say :: Text -> IO ()
say = S8.putStrLn . encodeUtf8

worker :: TMQueue Int -> Int -> IO ()
worker q num =
    loop
  where
    loop = do
        mi <- atomically $ readTMQueue q
        case mi of
            Nothing -> return ()
            Just i -> do
                say $ pack $ concat
                    [ "Worker #"
                    , show num
                    , " received value "
                    , show i
                    ]
                loop

main :: IO ()
main = do
    q <- newTMQueueIO
    mapConcurrently (worker q) [1..5] `concurrently` do
        mapM_ (atomically . writeTMQueue q) [1..10]
        atomically $ closeTMQueue q
    return ()

What I'd like: the biggest change needed here is just to get knowledge of this very awesome stm-chans package out there more. That could be with blog posts, or even better with links from the stm package itself. A step up from there could be to include this functionality in the stm package itself. Another possible nicety would be to add a non-STM API for these (whether based on STM or MVars internally) for more ease of use. I may take a first step here by simply depending on and reexporting stm-chans from classy-prelude.

What I'd recommend: probably pretty obvious, use stm-chans!

Like the previous point though, I'm interested to see how other people have approached this problem, since I haven't heard it discussed much in the past. Either others haven't run into this issue as frequently as I have, everyone already knows about stm-chans, or there's some other solution people prefer.
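For comparison, a common base-only workaround (a different technique from closable channels, sketched here under the assumption that the number of readers is known up front) is to send values wrapped in Just and then write one Nothing per worker, so each reader stops after seeing its sentinel. It avoids the blocked-indefinitely exception, but the producer has to know how many consumers exist, which is exactly the bookkeeping a closable queue such as stm-chans' TMQueue removes. The helper names are hypothetical.

```haskell
import Control.Concurrent (Chan, forkIO, newChan, readChan, writeChan)
import Control.Concurrent.MVar
import Control.Exception (finally)
import Control.Monad (forM, replicateM_)

-- Read values until the sentinel Nothing arrives, passing each one on.
sentinelWorker :: Chan (Maybe a) -> (a -> IO ()) -> IO ()
sentinelWorker chan onValue = loop
  where
    loop = do
      mx <- readChan chan
      case mx of
        Nothing -> return ()
        Just x  -> onValue x >> loop

-- Start nWorkers readers, write every value, then one Nothing per reader.
-- Chan is FIFO, so all Just values are dequeued before any Nothing.
runWithSentinels :: Int -> [a] -> (Int -> a -> IO ()) -> IO ()
runWithSentinels nWorkers values onValue = do
  chan <- newChan
  dones <- forM [1 .. nWorkers] $ \num -> do
    done <- newEmptyMVar
    _ <- forkIO (sentinelWorker chan (onValue num) `finally` putMVar done ())
    return done
  mapM_ (writeChan chan . Just) values
  replicateM_ nWorkers (writeChan chan Nothing)
  mapM_ takeMVar dones
```

If a reader dies early, its sentinel is left unread but the others still terminate; if the reader count is wrong, though, either some workers block forever or extra sentinels pile up, which is why a real close operation is the more robust design.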

November 16, 2016 12:00 AM

November 15, 2016

wren gayle romano

Three ineffectual strategies for dealing with trauma and pain

The last week has been challenging for all of us. In the depths of my own fear and uncertainty, I reached for one of my favorite books, Pema Chödrön’s Comfortable with Uncertainty, and opened to a passage at random. On Friday, a friend of mine asked how I’ve been able to deal with it all. I told him about the passage, and he (a non-buddhist) found it helpful in dealing with his own pain, so I wanted to share more broadly.

Before getting to the passage, I think it’s important for people to recognize that this pain we are feeling is a collective trauma. This is not our day-to-day pain, not our usual suffering. Everyone develops habits and skills for addressing the typical discomforts of life, but those skills are often inapplicable or ineffective for recovering from truly traumatic events. When someone is in a car wreck, or attacked, or raped, or abruptly loses a job or loved one— we recognize these things as traumas. We recognize that these events take some extra work to recover from. In the aftermath of the election I have seen many of the symptoms of trauma in the people around me. Depression, hypervigilance, difficulty concentrating, short tempers, and so on. When trauma hits, our usual coping mechanisms often fail or go haywire. A drink or two to unwind, turns into bleary drunkenness every night. Playing games to let go, turns into escapism to avoid thinking. Solitude, turns into reclusion. A healthy skepticism, turns into paranoia. If we do not recognize traumas for what they are, it becomes all too easy to find ourselves with even worse problems. Recognition is necessary for forming an appropriate response.

Now, the passage. As humans we have three habitual methods for relating to suffering. All three are ineffectual at reducing that suffering. These three ineffectual strategies are: attacking, indulging, and ignoring. And I’ve seen all three in great quantities in all the OpEd pieces floating around over the past week.

By “attacking” Pema Chödrön means not just lashing out, attacking Trump’s supporters or their ideals, but also all the ways we attack ourselves: We condemn ourselves, criticize ourselves for any indulgence, pity ourselves to the point of not getting out of bed. This strategy shows up in all those articles criticizing us for not having interpreted the polls correctly, or chastising us for not voting, or condemning the way the internet has formed these echo-chamber bubbles, and so on. But all this self-flagellation, all this beating ourselves up, does nothing to heal our pain. Now we suffer not only from our fears of what’s to come, but also because “it’s all our fault”. We refuse to “let ourselves off easy”, so whenever someone tries to address our pain we attack them and beat them away, protecting our pain because we feel like we deserve it.

Indulging is just as common. Though we are haunted by self-doubt, we condone our behavior. We say “I don’t deserve this discomfort. I have plenty of reasons to be angry or sleep all day.” We justify our pain to the point of turning it into a virtue and applauding ourselves. This strategy shows up in all those articles that relish in the details of how bad things will become, or congratulating ourselves for saying something like this would happen. But again, by cherishing our pain and presenting it as something to be praised, we are preventing ourselves from healing. No one wants to give up something they cherish, nor give up on all the attention and sympathy they are lavished with.

Ignoring is no less common. “Ignoring” means not just refusing to acknowledge our pain and fear, but also pretending it doesn’t exist, dissociating, spacing out, going numb, acting on autopilot, or any of the other ways to try to keep our suffering out of sight and out of mind. This strategy is advocated by all those articles talking about how things actually aren’t that bad, or how this is just business as usual, or how it’ll all get better once the mid-term elections happen. While ignoring seems effective in the short term, it does nothing to address the suffering you feel. In addition to not healing that initial wound, it creates more pain as we inevitably force ourselves into tighter and tighter spaces in order to keep it out of mind.

There is an alternative to these three futile strategies. The enlightened strategy is to try fully experiencing whatever you’ve been resisting— without exiting in your habitual way. Become inquisitive about your habits. Recognize when you are pushing your suffering away, or embracing it, or denying it. Become inquisitive about your suffering. What is it, exactly, that you are denying? Why does it feel so urgent to push it away? Why does it feel so necessary to cling to it? Stop trying to justify your feelings, stop trying to explain them. Start instead to look at them, to see them for what they really are. Ask why it is they hurt, what part of your ego they compromise, what ideals they belie.

The passage on the three futile strategies follows a koan about “heaven and hell”. From a buddhist perspective, “hell” is not a place, it is all the feelings of pain and fear and suffering we experience. Nor is “heaven” a place, but rather all our feelings of gratitude and joy and understanding. Thus, the buddhist does not say “hell is bad and heaven is good” nor “get rid of hell and just seek heaven”. Rather, one should approach all things with an open mind, greeting both heaven and hell with that openness. In her words,

Only with this kind of equanimity can we realize that no matter what comes along, we’re always standing in the middle of a sacred space. Only with equanimity can we see that everything that comes into our circle has come to teach us what we need to know.

I find these words powerfully healing. It is healing to remember that no matter where we are or what befalls us, our life is a blessing, and in virtue of that blessing our bodies and the places we move through are sacred spaces. The sacred is not something which exists without us, but something which is created from within. Moreover, it is healing to step away from questions like “what did I do to deserve this?” and instead remember to ask what it is we can learn from the experience.

I have endured many traumas in my life, and I half expected the election outcome, but still it felt like a kick in the chest. This wound brought back all my darkest habits. Once I recovered from the shock enough to begin the rituals of healing and self-care, I reflected on the question of why this particular wound hurt so bad. In my experience (and not just because I’m buddhist), deep emotional pain always stems from some threat to one’s ego; so what part of my ego is on the line? For me, the reason the election hurt so much is because I had become complacent in believing that the world is steadily becoming a more just place and believing that people are by-and-large fundamentally good. With the election of Obama, the passing of the ACA, the supreme court ruling on Obergefell v. Hodges, and so on, I think a lot of us on the progressive side have been susceptible to those beliefs. The election hurt so much, for me, because it forced the recognition that it’s not just the legacy of systemic institutionalized hatred we must fight, but that over a quarter of the population actively supports the worst extremes of that hatred. Yes, the election itself was offensive. Yes, I fear for my life and the lives of those close to me. But the real root of the pain itself, the reason it hurt so bad, is this refutation of those optimistic beliefs about humanity and the path towards justice. Realizing that this was the root cause of my pain did a lot to help me process it and move on. It also gave a healthy way to shift focus from the pain itself, to something actionable. Having experienced the pain, I can accept it. And having learned what it has to teach me, I know what I must do.

So sit with your pain, and try to experience it fully. Stop pushing it away. Stop embracing it. Stop beating yourself up over it. Approach it with an open mind and let it pass through you. And, finally, ask yourself what you can learn from it.


November 15, 2016 09:53 AM

November 11, 2016

Mark Jason Dominus

The worst literature reference ever

I think I may have found the single worst citation on Wikipedia. It's in the article on sausage casing. There is the following very interesting claim:

Reference to a cooked meat product stuffed in a goat stomach like a sausage was known in Babylon and described as a recipe in the world’s oldest cookbook 3,750 years ago.

That was exciting, and I wanted to know more. And there was a citation, so I could follow up!

The citation was:

(Yale Babylonian collection, New Haven Connecticut, USA)

I had my work cut out for me. All I had to do was drive up to New Haven and start translating their 45,000 cuneiform tablets until I found the cookbook.

(I tried to find a better reference, and turned up the book The Oldest Cuisine in the World: Cooking in Mesopotamia. The author, Jean Bottéro, was the discoverer of the cookbook, or rather he was the person who recognized that this tablet was a cookbook and not a pharmacopoeia or whatever. If the Babylonian haggis recipe is anywhere, it is probably there.)

by Mark Dominus at November 11, 2016 10:52 AM

November 10, 2016

Derek Elkins

Constant-time Binary Logarithm


I’ve been watching the Spring 2012 lectures for MIT 6.851 Advanced Data Structures with Prof. Erik Demaine. Lecture 12, “Fusion Trees”, mentions a constant-time algorithm for finding the index of the most significant 1 bit in a word, i.e. the binary logarithm. Assuming word operations are constant time, i.e. in the Word RAM model, the algorithm below takes 27 word operations (not counting copying). When I compiled it with GHC 8.0.1 -O2 the core of the algorithm was 44 straight-line instructions. The theoretically interesting thing is that, other than changing the constants, the same algorithm works for any word size that’s an even power of 2 (such as 64 = 2^6 or 16 = 2^4). Odd powers of two, such as 32 = 2^5, need a slight tweak. This is demonstrated for Word64, Word32, and Word16. It should be possible to do this for any arbitrary word size w.

The clz (count leading zeros) instruction can be used to implement this function; the algorithm here is a possible simulation if that or a similar instruction isn’t available. It’s probably not the fastest way. Similarly, find first set and count trailing zeros can be implemented in terms of this operation.
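For comparison, here is a sketch of the clz-based route and of the two derived operations, using `countLeadingZeros` and `finiteBitSize` from `Data.Bits` rather than a hand-written routine. The function names are hypothetical, and like `indexOfMostSignificant1` the logarithm assumes a non-zero input; treat it as a reference point for testing, not as a statement about which version is faster.

```haskell
import Data.Bits (countLeadingZeros, finiteBitSize, (.&.))
import Data.Word (Word64)

-- Binary logarithm via the count-leading-zeros primitive from Data.Bits.
-- Like indexOfMostSignificant1, this assumes the input is non-zero.
log2ViaClz :: Word64 -> Int
log2ViaClz w = finiteBitSize w - 1 - countLeadingZeros w

-- Count trailing zeros from the binary logarithm: w .&. negate w keeps only
-- the lowest set bit, and that bit's index is the number of trailing zeros.
ctzViaLog2 :: Word64 -> Int
ctzViaLog2 w = log2ViaClz (w .&. negate w)

-- Find first set: 1-based index of the lowest set bit, 0 if no bit is set.
findFirstSet :: Word64 -> Int
findFirstSet 0 = 0
findFirstSet w = ctzViaLog2 w + 1
```

Replacing `log2ViaClz` with `fromIntegral . indexOfMostSignificant1` gives the same derived operations without clz, and the clz version doubles as a convenient oracle when checking the constant-time code.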


Below is the complete code. You can also download it here.

{-# LANGUAGE BangPatterns #-}
import Data.Word
import Data.Bits

-- Returns 0-based bit index of most significant bit that is 1. Assumes input is non-zero.
-- That is, 2^indexOfMostSignificant1 x <= x < 2^(indexOfMostSignificant1 x + 1)
-- From Erik Demaine's presentation in Spring 2012 lectures of MIT 6.851, particularly "Lecture 12: Fusion Trees".
-- Takes 26 (source-level) straight-line word operations.
indexOfMostSignificant1 :: Word64 -> Word64
indexOfMostSignificant1 w = idxMsbyte .|. idxMsbit
    where
        -- top bits of each byte
        !wtbs = w .&. 0x8080808080808080
        -- all but top bits of each byte producing 8 7-bit chunks
        !wbbs = w .&. 0x7F7F7F7F7F7F7F7F              

        -- parallel compare of each 7-bit chunk to 0, top bit set in result if 7-bit chunk was not 0
        !pc = parallelCompare 0x8080808080808080 wbbs

        -- top bit of each byte set if the byte has any bits set in w
        !ne = wtbs .|. pc                             

        -- a summary of which bytes (except the first) are non-zero as a 7-bit bitfield, i.e. top bits collected into bottom byte
        !summary = sketch ne `unsafeShiftR` 1

        -- parallel compare summary to powers of two
        !cmpp2 = parallelCompare 0xFFBF9F8F87838180 (0x0101010101010101 * summary)
        -- index of most significant non-zero byte * 8
        !idxMsbyte = sumTopBits8 cmpp2                

        -- most significant 7-bits of most significant non-zero byte
        !msbyte = ((w `unsafeShiftR` (fromIntegral idxMsbyte)) .&. 0xFF) `unsafeShiftR` 1

        -- parallel compare msbyte to powers of two
        !cmpp2' = parallelCompare 0xFFBF9F8F87838180 (0x0101010101010101 * msbyte)

        -- index of most significant non-zero bit in msbyte
        !idxMsbit = sumTopBits cmpp2' 

        -- Maps top bits of each byte into lower byte assuming all other bits are 0.
        -- 0x2040810204081 = sum [2^j | j <- map (\i -> 49 - 7*i) [0..7]]
        -- In general if w = 2^(2*k+p) and p = 0 or 1 the formula is:
        -- sum [2^j | j <- map (\i -> w-(2^k-1) - 2^(k+p) - (2^(k+p) - 1)*i) [0..2^k-1]]
        -- Followed by shifting right by w - 2^k
        sketch w = (w * 0x2040810204081) `unsafeShiftR` 56

        parallelCompare w1 w2 = complement (w1 - w2) .&. 0x8080808080808080
        sumTopBits w = ((w `unsafeShiftR` 7) * 0x0101010101010101) `unsafeShiftR` 56
        sumTopBits8 w = ((w `unsafeShiftR` 7) * 0x0808080808080808) `unsafeShiftR` 56

indexOfMostSignificant1_w32 :: Word32 -> Word32
indexOfMostSignificant1_w32 w = idxMsbyte .|. idxMsbit
    where !wtbs = w .&. 0x80808080
          !wbbs = w .&. 0x7F7F7F7F
          !pc = parallelCompare 0x80808080 wbbs
          !ne = wtbs .|. pc
          !summary = sketch ne `unsafeShiftR` 1
          !cmpp2 = parallelCompare 0xFF838180 (0x01010101 * summary)
          !idxMsbyte = sumTopBits8 cmpp2
          !msbyte = ((w `unsafeShiftR` (fromIntegral idxMsbyte)) .&. 0xFF) `unsafeShiftR` 1
          !cmpp2' = parallelCompare 0x87838180 (0x01010101 * msbyte)

          -- extra step when w is not an even power of two
          !cmpp2'' = parallelCompare 0xFFBF9F8F (0x01010101 * msbyte)
          !idxMsbit = sumTopBits cmpp2' + sumTopBits cmpp2''

          sketch w = (w * 0x204081) `unsafeShiftR` 28
          parallelCompare w1 w2 = complement (w1 - w2) .&. 0x80808080
          sumTopBits w = ((w `unsafeShiftR` 7) * 0x01010101) `unsafeShiftR` 24
          sumTopBits8 w = ((w `unsafeShiftR` 7) * 0x08080808) `unsafeShiftR` 24

indexOfMostSignificant1_w16 :: Word16 -> Word16
indexOfMostSignificant1_w16 w = idxMsnibble .|. idxMsbit
    where !wtbs = w .&. 0x8888
          !wbbs = w .&. 0x7777
          !pc = parallelCompare 0x8888 wbbs
          !ne = wtbs .|. pc
          !summary = sketch ne `unsafeShiftR` 1
          !cmpp2 = parallelCompare 0xFB98 (0x1111 * summary)
          !idxMsnibble = sumTopBits4 cmpp2
          !msnibble = ((w `unsafeShiftR` (fromIntegral idxMsnibble)) .&. 0xF) `unsafeShiftR` 1
          !cmpp2' = parallelCompare 0xFB98 (0x1111 * msnibble)
          !idxMsbit = sumTopBits cmpp2'

          sketch w = (w * 0x249) `unsafeShiftR` 12
          parallelCompare w1 w2 = complement (w1 - w2) .&. 0x8888
          sumTopBits w = ((w `unsafeShiftR` 3) * 0x1111) `unsafeShiftR` 12
          sumTopBits4 w = ((w `unsafeShiftR` 3) * 0x4444) `unsafeShiftR` 12
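The two helpers doing the real work, `parallelCompare` and `sumTopBits`, can be checked in isolation. The sketch below repeats the `Word64` versions and adds a hypothetical `countThresholdsBelow` that combines them the way the `cmpp2'`/`idxMsbit` steps do: broadcast a 7-bit value into every byte, compare it against the thresholds 0, 1, 3, ..., 127, and count the bytes where the value was larger. For x = 5, only the bytes holding thresholds 0, 1 and 3 get their top bit set, so the count is 3, which is floor(log2 5) + 1.

```haskell
import Data.Bits (complement, unsafeShiftR, (.&.))
import Data.Word (Word64)

-- SWAR comparison: each byte of `thresholds` is 0x80 plus some k < 128 and
-- each byte of `xs` is a 7-bit value x. Because x <= 0x7F < 0x80 + k, the
-- byte-wise subtraction never borrows into the next byte, and after the
-- complement the top bit of a result byte is set exactly when x > k.
parallelCompare :: Word64 -> Word64 -> Word64
parallelCompare thresholds xs = complement (thresholds - xs) .&. 0x8080808080808080

-- Move each top bit down to bit 0 of its byte, then multiply so that the
-- top byte accumulates the sum of all eight bytes.
sumTopBits :: Word64 -> Word64
sumTopBits w = ((w `unsafeShiftR` 7) * 0x0101010101010101) `unsafeShiftR` 56

-- Thresholds 2^j - 1 for j = 0..7, one per byte, lowest byte first.
powersOfTwoMinusOne :: Word64
powersOfTwoMinusOne = 0xFFBF9F8F87838180

-- For a 7-bit x, counts the j with x >= 2^j: floor (log2 x) + 1, or 0 for x = 0.
countThresholdsBelow :: Word64 -> Word64
countThresholdsBelow x =
  sumTopBits (parallelCompare powersOfTwoMinusOne (0x0101010101010101 * x))
```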

November 10, 2016 07:48 AM

Chung-chieh Shan

Very far away from anywhere else

My experience immigrating to a foreign place is inextricable from my experience growing up. Somewhere along the way I learned the habit of recoiling from disappointment by lowering my expectations to match. I shrunk myself in space and time, never delaying others by holding the subway door open. As I practiced attending to the present and the internal, I dissociated from planning for the future and trusting the external. What’s beyond my arm’s reach is not my home and I don’t belong there. If I don’t have time to make a vegetable available by growing it myself, then it doesn’t deserve to be my comfort food.

People say they don’t recognize their country anymore. I just realized that in my case, it’s myself that I don’t recognize anymore.

November 10, 2016 06:11 AM

November 09, 2016

Brent Yorgey

Sandy Maguire

Book Announcement



November 9, 2016 (meta, announcements)

Last night something strange happened. For a brief moment, the internet somehow thought that my criticism of Elm was more important to discuss than the presidential election, because it was at the #1 spot on Hacker News. I’m not 100% sure how things end up at the #1 spot on Hacker News, but it sounds like a pretty desirable place to be.

My traffic yesterday was up three orders of magnitude from its average, so it seems like now’s as good a time as any to announce my new project:

I’m writing a book! It’s a gentle introduction to computer science, from first principles of electronics to category theory. If that sounds like the kind of thing you might be into, you can find it here.


November 09, 2016 12:00 AM

November 07, 2016

Philip Wadler

Anti-semitism, conjured and real

Accusations of anti-semitism in the Labour party have gone virtually unchallenged, which is unconscionable because almost all of what is referred to as 'anti-semitism' is simply legitimate protest against Israel's oppression of Palestinians. David Plank at Jews Sans Frontiers has just published a thorough debunking.

I've been lucky to rarely face anti-semitism in my personal life. So it's salutary to be reminded of the extent to which it actually exists in the world. If nothing else, this is something that Donald Trump does well.

Trump's campaign is based on dog-whistle racism, including anti-semitism, as called out in An Open Letter to Jared Kushner from one of your Employees and, more humorously, by Jon Stewart in The Day I Woke Up To Find Out Somebody Was Tweeting Weird Shit About Me.


Of course, many others besides Jews have faced the same racism, as noted in The Price I’ve Paid for Opposing Donald Trump.

The issues at stake have been eloquently stated, more forthrightly than in most media, by Adam Gopnik in A Point of View. I expect most folk reading this will not be supporters of Trump, but, if you are, please listen to it before you vote.

by Philip Wadler at November 07, 2016 03:10 PM

Ken T Takusagawa

[wisnmyni] Data as a number

Convert a number with possibly many leading zeroes in base M to base N, prefixing the base N output representation with leading zeroes in a way that unambiguously specifies the number of leading zeroes in the base M input.  I think this is possible when M > N.  Some potentially useful conversions:

(M=16, N=10); (M=256, N=10); (M=256, N=100); (M=10, N=2)
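The post doesn't spell out the padding rule, so here is one scheme that fits the M > N condition, offered as a hypothetical sketch rather than the method in the linked code: pad the base-N output to the fewest digits that can hold any L-digit base-M number. That width is the smallest k with N^k >= M^L, i.e. the ceiling of L·log_N M, and since log_N M > 1 when M > N it grows strictly with L, so a decoder can read L back off the output length. All function names are made up for illustration; digits are lists of Integers, most significant first.

```haskell
-- Value of a digit list in the given base.
fromDigits :: Integer -> [Integer] -> Integer
fromDigits base = foldl (\acc d -> acc * base + d) 0

-- Digits of a non-negative number with no leading zeroes (0 gives []).
toDigits :: Integer -> Integer -> [Integer]
toDigits _ 0 = []
toDigits base x = toDigits base (x `div` base) ++ [x `mod` base]

-- Fewest base-n digits able to hold every len-digit base-m number:
-- the smallest k with n^k >= m^len.
width :: Integer -> Integer -> Int -> Int
width m n len = head [k | k <- [0 ..], n ^ k >= m ^ len]

-- Pad the base-n digits to a width that depends only on the input length.
encode :: Integer -> Integer -> [Integer] -> [Integer]
encode m n ds = replicate (width m n (length ds) - length out) 0 ++ out
  where
    out = toDigits n (fromDigits m ds)

-- Recover the input length from the output length. When m > n the width is
-- strictly increasing, so at most one length matches; otherwise Nothing.
decode :: Integer -> Integer -> [Integer] -> Maybe [Integer]
decode m n es =
  case [l | l <- takeWhile (\l -> width m n l <= k) [0 ..], width m n l == k] of
    (l : _) -> Just (replicate (l - length out) 0 ++ out)
    []      -> Nothing
  where
    k = length es
    out = toDigits m (fromDigits n es)
```

With M=16 and N=10, the hex strings "f", "0f" and "00f" become "15", "015" and "0015", so the leading zeroes survive the trip.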

The deeper idea is, whenever a string of characters represents raw data and not a word in a natural language, it should be encoded so that it is clear that the string is not meant to be interpreted as a word in natural language.  Unannotated hexadecimal fails this rule: if you see the string "deadbeef", is it a word, with a meaning perhaps related to food, or is it a number?  (Annotated hexadecimal, e.g., "0xdeadbeef" is clearly not a word.)  Simpler example: what does "a" mean, indefinite article or ten in hexadecimal?  English, and other orthographies which use Hindu-Arabic numerals, already have a character set which unambiguously states that a string encodes data and not words: the numerals.  (One could argue that numbers -- strings of Hindu-Arabic numerals -- have more meaning than strictly data: they have ordinality, they obey group and field axioms, etc.  However, we frequently see numbers which aren't that, e.g., serial numbers, phone numbers, ZIP codes, ID and credit card numbers.)

The inspiration was that expressing hashes in hexadecimal is silly.  Radix conversion is quick and easy for a computer; it should be in decimal with leading zeroes if necessary.  If making the hash compact is a goal, then base 26 or base 95, with some annotation signifying it is encoded data, is better than hexadecimal.

Some Haskell code demonstrating some of the conversions.

by Ken at November 07, 2016 04:04 AM

November 06, 2016

Yesod Web Framework

Use MySQL Safely in Yesod Applications

With the latest version (0.1.4) of the mysql library, we now have the machinery needed to use it properly in a concurrent setting. In the past, any multi-threaded use was a little risky, although in practice it seems to have been satisfactory for applications which were not too demanding.

The necessary changes have just been made to the MySQL version of the scaffolding, and are described here. Existing Yesod sites should be updated in a similar manner. This post should give you all you need to know, but further background can be found in the MySQL manual and Roman Cheplyaka's blog.

But It Worked Anyway, Didn't It?

Let's start by reviewing why the mysql library works automatically in a single-threaded program, and why we might have got away with it most of the time in Yesod applications.

The underlying C library (libmysqlclient) requires a one-off initialisation, and then each thread in which it is called must be initialised to allocate some thread-local data. However, these actions are carried out automatically when a connect call is made, if they have not already been done. So nothing further is needed in a single-threaded program: a connect necessarily comes first, and it performs the required initialisations.

This behaviour of the connect call probably also explains why we have mostly got away with ignoring the problem in Yesod applications. Warp creates lightweight Haskell threads by default, and these are scheduled onto a rather small number of OS threads. When a new connection is opened and added to the pool, the OS thread running at the time will be initialised, as just described. Due to the small number of these threads, there is a reasonable chance that this is the first database action in each of them, resulting in correct initialisation. But there are no guarantees!

Correct Multi-Threaded Use

To be completely correct, we have to do all of the following:

  • Initialise the library as a whole.
  • Use bound threads for those which might perform database operations.
  • Initialise each thread properly.
  • Finalise each thread to free the memory used by its thread-local state.

The library initialisation is not thread-safe; it needs to be called separately to ensure that subsequent connect calls, occurring in multiple threads, detect that it has been done and do not repeat it themselves. This has been achieved in the scaffolding by calling MySQL.initLibrary from makeFoundation, before any database actions are carried out:

import qualified Database.MySQL.Base as MySQL
makeFoundation appSettings = do
    MySQL.initLibrary  -- once, before any database actions
    -- ... rest of makeFoundation unchanged ...

The point about bound threads is that they provide a guarantee that related initialisation and database operations really do occur in the same OS thread. However, using them means that OS threads are created frequently, and the argument given above no longer applies, not even as an approximation: the threads definitely need explicit initialisation. They also need finalising to avoid a memory leak - again this is made important by the large number of threads. (There are some situations in which the finalisation can be omitted, but check the documentation carefully before doing so.)

The settings passed to warp can be used to make it spawn bound threads, instead of Haskell threads, and to specify functions to initialise and finalise them. This code shows how it is now done in the scaffolding, in Application.hs:

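warpSettings :: App -> Settings
warpSettings foundation =
      -- (port, host and other scaffolding settings omitted here)
      setFork (\x -> void $ forkOSWithUnmask x)
    $ setOnOpen (const $ MySQL.initThread >> return True)
    $ setOnClose (const MySQL.endThread)
      defaultSettings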

Warp forks a new thread to handle each connection, using the function specified by setFork. The functions passed to setOnOpen and setOnClose are called right at the start of processing the connection, and right at the end, so they are valid places to initialise and finalise the thread for use by the mysql library.
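The same per-thread lifecycle can be imitated with plain `base`, as a rough sketch: fork a thread (bound when the RTS supports it), run an initialiser, do the work, and always run the finaliser. `Hooks` and `handleConnection` are hypothetical names, and the hooks stand in for `MySQL.initThread` and `MySQL.endThread`; this is not how warp itself is implemented.

```haskell
import Control.Concurrent
import Control.Exception (bracket_, finally)

-- Hypothetical per-thread hooks standing in for MySQL.initThread / MySQL.endThread.
data Hooks = Hooks
  { hookOpen  :: IO ()
  , hookClose :: IO ()
  }

-- Handle one "connection" on its own thread and wait for it to finish.
-- bracket_ runs the close hook even if the work throws, which is what keeps
-- per-thread state from leaking.
handleConnection :: Hooks -> IO () -> IO ()
handleConnection hooks work = do
  done <- newEmptyMVar
  let body = bracket_ (hookOpen hooks) (hookClose hooks) work `finally` putMVar done ()
      -- forkOS requires the threaded RTS; fall back to forkIO otherwise.
      fork = if rtsSupportsBoundThreads then forkOS else forkIO
  _ <- fork body
  takeMVar done
```

Because the initialiser, the work and the finaliser all run inside `body`, a bound thread guarantees they share one OS thread, which is the property the mysql library needs.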

The argument to setFork is a function which creates bound threads. If you are wondering why it is written the way it is, instead of void . forkOSWithUnmask, it simply avoids the need for the ImpredicativeTypes language extension, which is considered fragile and is sometimes broken by new compiler releases!

Unfortunately, forkOSWithUnmask is not exported by Control.Concurrent until base-4.9 (i.e. GHC 8), so, when using earlier versions, we have to copy its definition into our code:

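{-# LANGUAGE RankNTypes           #-}
import Control.Concurrent                   (ThreadId, forkOS)
import GHC.IO                               (unsafeUnmask)

-- Backport of Control.Concurrent.forkOSWithUnmask for base < 4.9
forkOSWithUnmask :: ((forall a . IO a -> IO a) -> IO ()) -> IO ThreadId
forkOSWithUnmask io = forkOS (io unsafeUnmask)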

What About Efficiency?

OS threads are more expensive than Haskell threads, but the difference may not matter much compared to all the other processing which is going on in a real application. It would be wise to do some benchmarks before worrying about it!

One possible optimisation is to make sure that HTTP keepalive is used, since warp creates a thread per connection, not per request. Some reverse proxies might need explicit configuration for this.

November 06, 2016 09:39 PM

November 04, 2016

Toby Goodwin

Debian chroot on Android

Sometimes, a simple idea — so simple it can be distilled down to 4 words — can be truly astounding.


For quite a while, I've been considering the best way to ensure the resilience, security, and accessibility of various pieces of personal data. There are several different categories, and no solution will be optimal for all of them. My music collection, for example, is large, non-secret, and largely replaceable (although the thought of dragging that enormous box of CDs out of the garage and reripping them all is pretty daunting!) The music lives on a server in my home, with my own backups. I upload medium bitrate versions to a cloud music service, and I have a low bitrate copy on my laptop and phone. So that's pretty well covered.

A similar scheme covers my photos and videos. They are much less replaceable than music, but fortunately much smaller, so there are a few extra copies kicking about.

Then, I have a few tiny things that I want to keep in sync across various devices. For example, today's todo list, my "blue skies ideas" list, and my password store. I've looked at syncthing, which is an awesome project, and I'm sure I'm going to find a good use for it someday.

But for these things, git is really the obvious solution. Most of them are already git repos, including my password-store; the only missing piece is a git client on my phone. So I was searching for recommendations for Android git clients, and these words jumped out at me:

create a debian image, mount it in your android device and chroot to it

My flabber was well and truly gasted.


It's very straightforward. From some debian instance on which you have root, run:

debootstrap --foreign --arch=armhf jessie jessie

Tar up the resulting tree in jessie, copy it to android, unpack it (ah, but where?), chroot, and then run:

debootstrap --second-stage


Here are some things I've used: ssh, rsync, dash, bash, the rc shell (which I happen to maintain). All the usual userland tools, mv, chmod, etc. These (of course) are the proper full GNU versions, so you don't keep being bitten by the little discrepancies in, for instance, the busybox versions.

Package management with apt-get and dpkg. And perl, git, nano, vim, update-alternatives (so I never have to see nano again), less, man.

I started installing the pass package, but that pulls in gtk, pango and a whole bunch of other things I'm not going to use. So I downloaded password-store and installed it myself.

The ps command (you need to mount /proc in the chroot of course), top, strace, lsof. You can even strace android processes, it all Just Works. (OK, lsof gets upset because it's inside a chroot and it can't see all the mount points that /proc/mounts says exist. But still.)

I thought it might be fun to run mosh. It installed fine, but then bombed out with a weird error. I went on a bit of a wild goose chase, and concluded (it was late at night) that I needed a fix from the development version. So I cloned the mosh repo on github, installed a whole heap of build tools, compilers, libraries, and built mosh. On my phone!

In fact, the problem was simpler than that, and easily solved by using the stty command to declare my terminal size. And then I had to open up some ports in android's firewall... with iptables of course.

I could go on, but you're getting the idea. In summary, this is not something pretending to be a GNU/Linux system. It's the real deal.

Of course, there are some missing pieces, of which the most serious is the lack of daemons. I've installed etckeeper, but there will be no daily autocommit.

Ping doesn't work, because even a root shell is not allowed to use raw sockets. You can create a user, but it's not able to do much... I'll look at this some more when I have time, but I'm just running everything as root at the moment. Android's systems for users, permissions, and capabilities are entirely unfamiliar to me, although I'm trying to learn.


I made and remade my chroot several times before I was happy with it. Hopefully these notes will make things quicker for you.

First of all, Debian wants a “real” filesystem, which is to say, anything except FAT. Of the existing partitions, an obvious choice would be /data, which on my phone is ext4. Unfortunately, the major drawback of my phone is that its /data is tiddly, just 1GB, and perennially full. (I did try the chroot on /data, before realising the fatal flaw. One curiosity is that /data is mounted with nodev, so populating /dev fails till you remount without nodev. You might think it would be better to bind mount the real /dev into the chroot anyway, and you might well be right. But I've been running with the /dev made by debootstrap with no problems.)

So it's time to repartition my 32GB SD card. Android apparently doesn't support GPT which is only a minor nuisance. I do hate all that primary / extended / logical nonsense though, it's so 1980s.

Much, much more seriously, it complains bitterly if it finds an SD card without a FAT partition. This is infuriating. The kernel supports ext3 just fine (and ext4 too, at least for partitions on fixed internal storage, although apparently not for the SD card, which makes no sense to me). So, if I insert a card that happens to have an ext3 partition on it, why not just mount it? Or if there's some scenario I'm not aware of that might not work quite right, show a dialogue that explains and offers to mount the partition anyway. What actually happens is a notification that tells you the SD card is “damaged”, and offers to format it. Arrggh!

(I have reason to believe that the latest versions of Android will countenance SD cards with real file systems, although I need to research this further.)

My next try was a 50MB FAT partition, and the remainder ext3. This largely worked, but it didn't leave anywhere for android to move those apps which are prepared to live on SD card, absolutely vital to squeeze some extra apps onto my old phone.

The final iteration was a 4GB FAT partition, and the rest ext3. Of course, I don't need 28GB for the chroot itself: it starts off well under 1G, and even after installing loads of stuff I'm still under 2G. But I realised that I'd be able to put my music collection on the ext3 partition, which would save the tedious step of renaming everything to comply with FAT restrictions (mainly the prohibition on : in file names). Of course, I can now rsync-over-ssh the music from my laptop, which seems to go quicker than via USB.

Another annoyance is that the ext3 partition on the SD card isn't automatically mounted. I've spent some time in the past trying to find a boot time hook I can use, but with no luck. So I have to do this from the android shell every time my phone reboots, using a helper script cunningly located under the mount point:

root@android:/ # cat /data/ext3/m
mount -t ext3 /dev/block/mmcblk1p5 /data/ext3


Far and away the nicest way to communicate with the chroot is to plug into a laptop or desktop and use adb shell from the Android SDK. At that point, it's scarcely different from sshing to a remote server.

Of course, the whole point of the phone is that it's portable. On the move, I'm using Jack Palevich's Terminal Emulator for Android and Klaus Weidner's Hacker's Keyboard. The keyboard has all the keys you need — Esc, Tab, Ctrl etc — so it's ideal for non-trivial tasks (such as vim!). But the tiny keys are very fiddly on my phone, especially in portrait, so I sometimes stick to my usual keyboard.

I've got a trivial helper script to start my favourite shell under ssh-agent:

root@android:/ # cat /data/ext3/ch
exec chroot /data/ext3/jessie /usr/bin/ssh-agent /usr/bin/rc -l


So I have a fantastic solution to my document and password management problems. And a great ssh client. And a whole new architecture to build my projects on, most of which aren't the sort of thing that it makes much sense to run on a phone, but building in different places is always good for portability.

I'd heard that Android uses a "modified Linux" kernel, so I wasn't really expecting any of this stuff to work properly, let alone tools like strace and lsof. Apparently, though, the changes were folded back into the mainline kernel at the 3.3 release. My 3-year-old phone runs 3.4.5, so presumably this is a fairly vanilla kernel.

This is awesome. Google has its faults, but their commitment to free software easily earns them the “least evil” prize among the current Internet quintumvirate. (That's Google, Apple, Facebook, Amazon, and Microsoft, for anyone who's been asleep the last few years.)

Realising that, yes, that computer in my pocket is a pukka Linux box has endeared me even further to Android. I'd love to write some apps for it... except I've already got more than enough projects to fill my “copious spare time”!

Update November 2016

A couple of new things I've discovered since writing this article.

First, the debootstrap command is available on Fedora, from the system repository! So you don't need a Debian box to build the initial system image. (Having debootstrap around is also handy for making a Debian container on my desktop.)

Secondly, I think the reason Android won't automatically mount an ext2/3/4 partition is that it has no idea how to map UIDs and GIDs on the filesystem. Any obvious trivial solution, such as “only UID 0 can read or write”, would make it inaccessible to Android processes. Remember, you're not supposed to root your Android device! I've just ordered my next phone, which is supported by CyanogenMod, so it's likely that I'll end up with an OS that is rather more enlightened about users having full control of their devices. Having said that, I don't believe it has an SD slot, so the issue of real FS support won't arise.

November 04, 2016 08:47 AM

November 03, 2016

The team

Updates for November 3, 2016

The following changes have been made since September:

November 03, 2016 03:00 PM

Michael Snoyman

Designing APIs for Extensibility

This is an old bit of content that I wrote, and I'm relocating it to this blog for posterity. I've actually been using this technique extensively in the past few libraries with settings that I've written, and overall I like it. But it's definitely opinionated; your mileage may vary.

Every time you make a breaking change in your API, it means that, potentially, one or more of your users will need to change their code to adapt. Even if this update is trivial, it adds friction to the code maintenance process. On the other hand, we don't want to be constrained by bad design choices early on in a project, and sometimes a breaking API change is the best option.

The point of this document, however, is to give you a third option: design your APIs from the outset to be extensible. There are common techniques employed in the Haskell world to make APIs that are resilient to changing feature-sets, and by employing them early on in your design process, you can hopefully avoid the painful choices between a better API and happy users.

Almost all techniques start with implementation hiding. Guidelines here are simple: don't expose anything non-public. For example, if you write a number of helper functions, you may not want to start off by exposing them, since you're then telling users that these are good, stable functions to be relied upon. Instead, use explicit export lists on your modules and only include functions that are intended for public consumption.

More important, and more tricky, than functions are data constructors. In many cases, you want to avoid exposing the internals of your data types to users, to allow you to expand on them in the future. A common use case for this is some kind of a data type providing configuration information. Consider that you're going to communicate with some web services, so you write up the following API:

module MyAPI
    ( Settings (..)
    , makeAPICall
    ) where

import Data.Text (Text)

data Settings = Settings
    { apiKey :: Text
    , hostName :: Text
    }

makeAPICall :: Settings -> Foo -> IO Bar

The way your users will access this will be something like:

makeAPICall Settings
    { apiKey = myAPIKey
    , hostName = ""
    } myFoo

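Now suppose a user points out that, in some cases, the standard port 80 is not used for the API call. So you add a new field port :: Int to your Settings constructor. This will break your users' code: their record construction no longer sets every field, which GHC reports as a warning (an error under -Werror) and which fails at runtime as soon as the port field is used.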

Instead, a more robust way of specifying your API will look like:

module MyAPI
    ( Settings
    , mkSettings
    , setHostName
    , makeAPICall
    ) where

import Data.Text (Text)

data Settings = Settings
    { apiKey :: Text
    , hostName :: Text
    }

-- | Create a @Settings@ value. Uses default value for host name.
mkSettings :: Text -- ^ API Key
           -> Settings
mkSettings key = Settings
    { apiKey = key
    , hostName = ""
    }

setHostName :: Text -> Settings -> Settings
setHostName hn s = s { hostName = hn }

makeAPICall :: Settings -> Foo -> IO Bar

Now your user code will instead look like:

makeAPICall (mkSettings myAPIKey) myFoo

This has the following benefits:

  • The user is not bothered to fill in default values (in this case, the hostname).
  • Extending this API to allow for more fields in the future is trivial: add a new set* function. Internally, you'll add a field to Settings and set a default value in mkSettings.
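To make the second point concrete, here is a hypothetical later version of the same module that adds the port from the earlier example. The `port` field, `setPort`, `getPort` and the default of 80 are assumptions for illustration, and because everything sits in one file the export list that would hide the constructor is left out. Callers that wrote `mkSettings myAPIKey` keep compiling unchanged, while new callers opt in with something like `makeAPICall (setPort 8080 (mkSettings myAPIKey)) myFoo`.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)

-- In a real library only Settings (without its constructor), mkSettings and
-- the set*/get* functions would be exported.
data Settings = Settings
    { apiKey   :: Text
    , hostName :: Text
    , port     :: Int  -- new field added in a later release
    }

-- | Create a @Settings@ value. Uses default values for host name and port.
mkSettings :: Text -- ^ API Key
           -> Settings
mkSettings key = Settings
    { apiKey   = key
    , hostName = ""
    , port     = 80
    }

setHostName :: Text -> Settings -> Settings
setHostName hn s = s { hostName = hn }

-- | New setter; existing callers never need to mention it.
setPort :: Int -> Settings -> Settings
setPort p s = s { port = p }

getHostName :: Settings -> Text
getHostName = hostName

getPort :: Settings -> Int
getPort = port
```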

One thing to note: please do not expose the field accessors directly. If you want to provide getter functions in addition to setters, write them explicitly, e.g.:

getHostName :: Settings -> Text
getHostName = hostName

The reason for this is that by exposing field accessors, users will be able to write code such as:

(mkSettings myAPIKey) { hostName = "" }

This ties your hands for future internal improvements, since you are now required to keep a field of name hostName with type Text. By just using set and get functions, you can change your internal representation significantly and still provide a compatibility layer.
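As a sketch of what that freedom buys, suppose a later release groups the host and port into a single `Endpoint` value. `Endpoint`, its fields and the renamed internal fields are hypothetical; the point is that `mkSettings`, `getHostName` and `setHostName` keep exactly the types they had, so user code is unaffected even though no `hostName` field exists any more. Code written as `s { hostName = ... }` would not have survived this change.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)

-- Hypothetical new internal representation: host and port grouped together.
data Endpoint = Endpoint
    { epHost :: Text
    , epPort :: Int
    }

data Settings = Settings
    { sApiKey   :: Text
    , sEndpoint :: Endpoint
    }

-- Same public type as before; the defaults now live inside the Endpoint.
mkSettings :: Text -> Settings
mkSettings key = Settings
    { sApiKey   = key
    , sEndpoint = Endpoint { epHost = "", epPort = 80 }
    }

-- The public getter and setter keep their old signatures.
getHostName :: Settings -> Text
getHostName = epHost . sEndpoint

setHostName :: Text -> Settings -> Settings
setHostName hn s = s { sEndpoint = (sEndpoint s) { epHost = hn } }
```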

For those of you familiar with other languages: this is in fact quite similar to the approach taken in Java or C#. Just because Java does it doesn't mean it's wrong.

Note that this advice is different to, and intended to supersede, the settings type approach. Projects like Warp which previously used that settings type approach are currently migrating to this more extensible approach.

Also, while settings have been used here as a motivating example, the same advice applies to other contexts.

Internal modules

One downside of implementation hiding is that it can make it difficult for users to do things you didn't intend for them to do with your API. You can always add more functionality on demand, but the delay can be a major nuisance for users. A common compromise in the Haskell community is to provide a .Internal module for your project which exports not-quite-public components. For example, in wai, the Response constructors are exposed in a Network.Wai.Internal module. Normally, users are supposed to use smart constructors like responseFile, but occasionally they'll want more fine-grained control.

November 03, 2016 12:00 AM

November 01, 2016

Douglas M. Auclair (geophf)

October 2016 1HaskellADay Problems and Solutions

by geophf at November 01, 2016 02:44 PM

Functional Jobs

Senior Software Engineer (Haskell) at Front Row Education (Full-time)


Senior Software Engineer to join fast-growing education startup transforming the way 5+ million K-12 students learn Math and English.

What you tell your friends you do

“You know how teachers in public schools are always overworked and overstressed with 30 kids per classroom and never ending state tests? I make their lives possible and help their students make it pretty far in life”

What you really will be doing

Architect, design and develop new web applications, tools and distributed systems for the Front Row ecosystem in Haskell, Flow, PostgreSQL, Ansible and many others. You will get to work on your deliverable end-to-end, from the UX to the deployment logic

Mentor and support more junior developers in the organization

Create, improve and refine workflows and processes for delivering quality software on time and without incurring debt

Work closely with Front Row educators, product managers, customer support representatives and account executives to help the business move fast and efficiently through relentless automation.

How you will do this

You’re part of an agile, multidisciplinary team. You bring your own unique skill set to the table and collaborate with others to accomplish your team’s goals.

You prioritize your work with the team and its product owner, weighing both the business and technical value of each task.

You experiment, test, try, fail and learn all the time

You don’t do things just because they were always done that way; you bring your experience and expertise with you and help the team make the best decisions

What we have worked on in the last quarter

We have rewritten our business logic to be decoupled from the Common Core math standards, supporting US state-specific standards and international math systems

Prototyped and tested a High School Math MVP product in classrooms

Changed assigning Math and English to a work queue metaphor across all products for conceptual product simplicity and consistency

Implemented a Selenium QA test suite 100% in Haskell

Released multiple open source libraries for generating automated unit test fixtures, integrating with AWS, parsing and visualizing Postgres logs and much more

Made numerous performance optimization passes on the system for supporting classrooms with weak Internet bandwidth


We’re an agile and lean small team of engineers, teachers and product people working on solving important problems in education. We hyper-focus on speed, communication and prioritizing what matters to our millions of users.


  • You’re smart and can find a way to show us.
  • A track record of 5+ years of working in, or leading, teams that rapidly ship high quality web-based software that provides great value to users. Having done this at a startup is a plus.
  • Awesome at a Functional Programming language: Haskell / Scala / Clojure / Erlang etc
  • Exceptional emotional intelligence and people skills
  • Organized and meticulous, but still able to focus on the big picture of the product
  • A ton of startup hustle: we're a fast-growing, VC-backed, Silicon Valley tech company that works hard to achieve the greatest impact we can.


  • Money, sweet
  • Medical, dental, vision
  • Incredible opportunity to grow, learn and build lifetime bonds with other passionate people who share your values
  • Food, catered lunch & dinner 4 days a week + snacks on snacks
  • Room for you to do things your way at our downtown San Francisco location right by the Powell Station BART, or you can work remotely from anywhere in the US, if that’s how you roll
  • Awesome monthly team events + smaller get-togethers (board game nights, trivia, etc)

Get information on how to apply for this position.

November 01, 2016 01:48 AM

October 31, 2016

Leon P Smith

Announcing Configurator-ng 0.0

I’m pleased to announce a preliminary release of configurator-ng, after spending time writing documentation for it at Hac Phi this past weekend. This release is for the slightly adventurous, but I think that many, especially those who currently use configurator, will find this worthwhile.

This is a massively breaking fork of Bryan O’Sullivan’s configurator package. The configuration file syntax is almost entirely backwards compatible, and mostly forwards compatible as well. However, the application interface used to read configuration files is drastically different. The focus so far has been on a more expressive interface, an interface that is safer in the face of concurrency, and improved error messages. The README offers an overview of the goals and motivations behind this fork, and how it is attempting to satisfy those goals.

I consider this an alpha release, but I am using it in some of my projects and it should be of reasonable quality. The new interfaces I’ve created in the Data.Configurator.Parser and Data.Configurator.FromValue modules should be pretty stable, but I am planning on major breaking changes to the file (re)loading and change notification interfaces inherited from configurator.

Documentation is currently a little sparse, but the README and the preliminary haddocks I hope will be enough to get started. Please don’t hesitate to contact me, via email, IRC (lpsmith on freenode), or GitHub if you have any questions, comments, ideas, needs or problems.

by lpsmith at October 31, 2016 04:24 AM

October 28, 2016

Roman Cheplyaka

Electoral vote distributions are polynomials

In his article Electoral vote distributions are Monoids, Gabriel Gonzalez poses and answers the following question based on 538’s data:

what would be Hillary’s chance of winning if each state’s probability of winning was truly independent of one another?

To answer the question, Gabriel devises a divide-and-conquer algorithm. He computes probability distributions over vote counts in subsets of all states and then combines them. He also observes that vote distributions form a monoid.

Here I want to share an algebraic perspective on vote counting and show why distributions form a monoid.

Let \(p_i\) be the probability of Hillary’s victory in state \(i\), and \(n_i\) be the number of electoral college votes for that state, where \(i=1,\ldots,N\), and \(N\) is the total number of states (and districts; see Gabriel’s post for details).

Then a vote distribution is a collection of probabilities \(q_k\) that Hillary will get exactly \(k\) votes:

\[ \newcommand{\p}[1]{\mathrm{Pr}\{#1\}} q_k = \p{\text{number of votes for H. Clinton} = k},\quad k=0,\ldots,\sum_{i=1}^N n_i. \]

Consider the following polynomial:

\[Q(x)=\prod_{i=1}^N\left(p_i x^{n_i}+(1-p_i)\right).\]

This is a product of \(N\) brackets, one for each state. If we expanded it, we would get \(2^N\) terms. Every such term takes either \(p_i x^{n_i}\) or \(1-p_i\) from each bracket and multiplies them. Every such term corresponds to a particular election outcome: if Hillary won in a particular state, take \(p_i x^{n_i}\) from the corresponding bracket; otherwise, take \(1-p_i\).

For example, if an election outcome means that Hillary won in states \(1,4,\ldots\) and lost in states \(2,3,\ldots\), then the corresponding term is

\[ p_1 x^{n_1}(1-p_2)(1-p_3)p_4 x^{n_4}\ldots=p_1(1-p_2)(1-p_3)p_4\ldots x^{n_1+n_4+\ldots}. \]

Notice that \(p_1(1-p_2)(1-p_3)p_4\ldots\) is exactly the probability of the outcome (under the independence assumption) and \(n_1+n_4+\ldots\) is the number of votes for Hillary under that outcome.

Since the power of \(x\) in each term is the number of votes for Hillary, outcomes that result in the same number of votes, say \(k\), correspond to like terms. If we combine them, their probabilities (terms’ coefficients) will add up. To what? To \(q_k\), the total probability of Hillary getting \(k\) votes.
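To make the correspondence concrete, here is a small brute-force sketch (the names State, outcomes and voteDistribution are hypothetical). It enumerates all \(2^N\) terms of the expanded product as (coefficient, exponent) pairs and then combines like terms, so it is only practical for a handful of states.

```haskell
-- Each state i is a pair (p_i, n_i): win probability and electoral votes.
type State = (Double, Int)

-- Expand the product of brackets (p_i x^{n_i} + (1 - p_i)) term by term.
-- Each term is (coefficient, exponent of x). Taking p_i x^{n_i} from a
-- bracket means "Hillary wins state i"; taking (1 - p_i) means she loses it.
-- For N states this produces 2^N terms.
outcomes :: [State] -> [(Double, Int)]
outcomes [] = [(1, 0)]
outcomes ((p, n) : rest) =
  [ (p * c, n + k) | (c, k) <- others ]
    ++ [ ((1 - p) * c, k) | (c, k) <- others ]
  where
    others = outcomes rest

-- Combine like terms: q_k is the sum of the coefficients of all terms
-- whose exponent is k, for k = 0 .. sum of the n_i.
voteDistribution :: [State] -> [Double]
voteDistribution states =
  [ sum [ c | (c, k') <- terms, k' == k ] | k <- [0 .. total] ]
  where
    terms = outcomes states
    total = sum (map snd states)
```

For two states worth 3 votes each with \(p_1 = 0.3\) and \(p_2 = 0.07\), the four terms collapse into \(q_0 = 0.651\), \(q_3 = 0.328\) and \(q_6 = 0.021\).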


Therefore,

\[Q(x) = \sum_{k}q_kx^k.\]

Deriving the final vote distribution \(q_k\) from \(p_i\) and \(n_i\) is just expanding and reducing \(Q(x)\) from \(\prod_{i=1}^N\left(p_i x^{n_i}+(1-p_i)\right)\) to \(\sum_{k}q_kx^k\).

As Gabriel notes, doing this in the direct way would be inefficient. His divide-and-conquer approach directly translates to expanding \(Q(x)\): divide all brackets into two groups, recursively expand the groups, combine the results.
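A sketch of that divide-and-conquer expansion, assuming polynomials are represented as coefficient lists with the lowest degree first. The names bracket, addPoly, mulPoly and expand are illustrative and do not come from either post.

```haskell
-- A polynomial as its coefficient list, lowest degree first:
-- [a0, a1, a2] represents a0 + a1 x + a2 x^2.
type Poly = [Double]

-- The bracket p x^n + (1 - p) for one state (assumes n >= 1).
bracket :: (Double, Int) -> Poly
bracket (p, n) = (1 - p) : replicate (n - 1) 0 ++ [p]

-- Coefficient-wise sum, padding the shorter polynomial with zeros.
addPoly :: Poly -> Poly -> Poly
addPoly us [] = us
addPoly [] vs = vs
addPoly (u : us) (v : vs) = (u + v) : addPoly us vs

-- Product via (a + x A') B = a B + x (A' B).
mulPoly :: Poly -> Poly -> Poly
mulPoly [] _ = []
mulPoly (a : rest) bs = addPoly (map (a *) bs) (0 : mulPoly rest bs)

-- Divide and conquer: split the brackets into two groups, expand each
-- group recursively, and multiply the two results.
expand :: [(Double, Int)] -> Poly
expand []  = [1]  -- the empty product is the constant polynomial 1
expand [s] = bracket s
expand ss  = mulPoly (expand left) (expand right)
  where
    (left, right) = splitAt (length ss `div` 2) ss
```

Setting \(x = 1\) turns every bracket into \(p_i + (1 - p_i) = 1\), so the coefficients of the expanded product always sum to 1.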

Under this formulation, it becomes obvious that vote distributions form a proper monoid: it is just a monoid of polynomials under multiplication.

October 28, 2016 08:00 PM

Functional Jobs

Software Engineer (Haskell/Clojure) at Capital Match (Full-time)


CAPITAL MATCH is a leading marketplace lending and invoice financing platform in Singapore. Our in-house platform, mostly developed in Haskell, has processed more than USD 15 million in business loans in the last year, with strong monthly growth (current rate of USD 1.5-2.5 million monthly). We are also eyeing expansion into new geographies and product categories. Very exciting times!

We have just secured another funding round to build world-class technology as the key business differentiator. The key components include a credit risk engine, seamless banking integration and end-to-end product automation from loan origination to debt collection.


We are looking to hire a software engineer with a minimum of 2-3 years coding experience.

The candidate should have been involved in the development of multiple web-based products from scratch. He should be interested in all aspects of the creation, growth and operations of a secure web-based platform: front-to-back feature development, distributed deployment and automation in the cloud, build and test automation etc.

Background in fintech and especially lending / invoice financing space would be a great advantage.


Our platform is primarily developed in Haskell with an Om/ClojureScript frontend. We are expecting our candidate to have experience working with a functional programming language e.g. Haskell/Scala/OCaml/F#/Clojure/Lisp/Erlang.

Deployment and production is managed with Docker containers using standard cloud infrastructure so familiarity with Linux systems, command-line environment and cloud-based deployment is mandatory. Minimum exposure to and understanding of XP practices (TDD, CI, Emergent Design, Refactoring, Peer review and programming, Continuous improvement) is expected.

We are looking for candidates that are living in or are willing to relocate to Singapore.


We offer a combination of salary and equity depending on experience and skills of the candidate.

Most expats who relocate to Singapore do not have to pay their home country taxes and the local tax rate in Singapore is more or less 5% (effective on the proposed salary range).

Visa sponsorship will be provided.

Singapore is a great place to live, a vibrant city rich with diverse cultures, a very strong financial sector and a central location in Southeast Asia.

Get information on how to apply for this position.

October 28, 2016 09:15 AM

October 27, 2016

Gabriel Gonzalez

Electoral vote distributions are Monoids

I'm a political junkie and I spend way too much time following the polling results on FiveThirtyEight's election forecast.

A couple of days ago I was surprised that FiveThirtyEight gave Trump a 13.7% chance of winning, which seemed too high to be consistent with the state-by-state breakdowns. After reading their methodology I learned that this was due to them not assuming that state outcomes were independent. In other words, if one swing state breaks for Trump this might increase the likelihood that other swing states also break for Trump.

However, I still wanted to do the exercise to ask: what would be Hillary's chance of winning if each state's probability of winning was truly independent of one another? Let's write a program to find out!

Raw data

A couple of days ago (2016-10-24) I collected the state-by-state data from FiveThirtyEight's website (by hand) and recorded:

  • the name of the state
  • the chance that Hillary Clinton would win the state
  • the number of electoral college votes for that state

I recorded this data as a list of 3-tuples:

probabilities :: [(String, Double, Int)]
probabilities =
    [ ("Alabama" , 0.003, 9)
    , ("Alaska" , 0.300, 3)
    , ("Arizona" , 0.529, 11)
    , ("Arkansas" , 0.012, 6)
    , ("California" , 0.999, 55)
    , ("Colorado" , 0.889, 9)
    , ("Connecticut" , 0.977, 7)
    , ("Delaware" , 0.948, 3)
    , ("District of Columbia", 0.999, 3)
    , ("Florida" , 0.731, 29)
    , ("Georgia" , 0.259, 16)
    , ("Hawaii" , 0.996, 4)
    , ("Idaho" , 0.026, 4)
    , ("Illinois" , 0.994, 20)
    , ("Indiana" , 0.121, 11)
    , ("Iowa" , 0.491, 6)
    , ("Kansas" , 0.089, 6)
    , ("Kentucky" , 0.042, 8)
    , ("Louisiana" , 0.009, 8)
    , ("Maine" , 0.852, 2)
    , ("Maine - 1" , 0.944, 1)
    , ("Maine - 2" , 0.517, 1)
    , ("Maryland" , 0.999, 10)
    , ("Massachusetts" , 0.998, 11)
    , ("Michigan" , 0.929, 16)
    , ("Minnesota" , 0.886, 10)
    , ("Mississippi" , 0.034, 6)
    , ("Missouri" , 0.168, 10)
    , ("Montana" , 0.119, 3)
    , ("Nebraska" , 0.040, 2)
    , ("Nebraska - 1" , 0.154, 1)
    , ("Nebraska - 2" , 0.513, 1)
    , ("Nebraska - 3" , 0.014, 1)
    , ("Nevada" , 0.703, 6)
    , ("New Hampshire" , 0.868, 4)
    , ("New Jersey" , 0.981, 14)
    , ("New Mexico" , 0.941, 5)
    , ("New York" , 0.998, 29)
    , ("North Carolina" , 0.689, 15)
    , ("North Dakota" , 0.070, 3)
    , ("Oklahoma" , 0.002, 7)
    , ("Ohio" , 0.563, 18)
    , ("Oregon" , 0.957, 7)
    , ("Pennsylvania" , 0.880, 20)
    , ("Rhode Island" , 0.974, 4)
    , ("South Carolina" , 0.086, 9)
    , ("South Dakota" , 0.117, 3)
    , ("Tennessee" , 0.025, 11)
    , ("Texas" , 0.166, 38)
    , ("Utah" , 0.067, 6)
    , ("Vermont" , 0.984, 3)
    , ("Virginia" , 0.943, 13)
    , ("Washington" , 0.975, 12)
    , ("West Virginia" , 0.006, 5)
    , ("Wisconsin" , 0.896, 10)
    , ("Wyoming" , 0.021, 3)
    ]

Note that some states (like Maine) apportion electoral votes in a weird way:

probabilities :: [(String, Double, Int)]
probabilities =
    ...
    , ("Maine" , 0.852, 2)
    , ("Maine - 1" , 0.944, 1)
    , ("Maine - 2" , 0.517, 1)
    ...

Maine apportions two of its electoral votes based on a state-wide vote (i.e. "Maine" in the above list) and then two further electoral votes are apportioned based on two districts (i.e. "Maine - 1" and "Maine - 2"). FiveThirtyEight computes the probabilities for each subset of electoral votes, so we just record them separately.

Combinatorial explosion

So how might we compute Hillary's chances of winning assuming the independence of each state's outcome?

One naïve approach would be to loop through all possible electoral outcomes and compute the probability and electoral vote for each outcome. Unfortunately, that's not very efficient since the number of possible outcomes doubles with each additional entry in the list:

>>> 2 ^ length probabilities
72057594037927936

... or approximately 7.2 * 10^16 outcomes. Even if I only spent a single CPU cycle to compute each outcome (which is unrealistic) on a 2.5 GHz processor that would take almost a year to compute them all. The election is only a couple of weeks away so I don't have that kind of time or computing power!
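The back-of-the-envelope estimate can be checked in a few lines, assuming (as above) the idealized rate of one outcome per cycle on a 2.5 GHz processor; the names outcomeCount and daysAtOneCyclePerOutcome are hypothetical.

```haskell
-- Number of possible outcomes for 56 independent states and districts.
outcomeCount :: Integer
outcomeCount = 2 ^ (56 :: Int)

-- Days needed at one outcome per cycle on a 2.5 GHz processor
-- (an idealized assumption, as in the text).
daysAtOneCyclePerOutcome :: Double
daysAtOneCyclePerOutcome = fromIntegral outcomeCount / 2.5e9 / 86400
```

This comes to a little over 333 days, consistent with "almost a year".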


Fortunately, we can do much better than that! We can efficiently solve this using a simple "divide-and-conquer" approach where we subdivide the large problem into smaller problems until the solution is trivial.

The central data structure we'll use is a probability distribution which we'll represent as a Vector of Doubles:

import Data.Vector.Unboxed (Vector)

newtype Distribution = Distribution (Vector Double)
    deriving (Show)

This Vector will always have 539 elements, one element per possible final electoral vote count (0 through 538) that Hillary might get. Each element is a Double representing the probability of that corresponding electoral vote count. We will maintain an invariant that all the probabilities (i.e. elements of the Vector) must sum to 1.

For example, if the Distribution were:

[1, 0, 0, 0, 0 ... ]

... that would represent a 100% chance of Hillary getting 0 electoral votes and a 0% chance of any other outcome. Similarly, if the Distribution were:

[0, 0.5, 0, 0.5, 0, 0, 0 ... ]

... then that would represent a 50% chance of Hillary getting 1 electoral vote and a 50% chance of Hillary getting 3 electoral votes.
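These invariants can be written down as a small predicate. This sketch uses a plain list as a stand-in for the Vector (slot k holds the probability of exactly k votes); isValidDistribution and the two example values are hypothetical names, and the 1e-9 tolerance is an assumption to absorb floating-point rounding.

```haskell
-- Hypothetical list-based stand-in for the Vector used in the post:
-- slot k holds the probability that Hillary gets exactly k electoral votes.
type Distribution = [Double]

-- One slot per possible vote count, 0 through 538.
slots :: Int
slots = 539

-- The invariants described above: the right number of slots, every entry
-- is a probability, and all entries sum to 1 (up to floating-point rounding).
isValidDistribution :: Distribution -> Bool
isValidDistribution ps =
  length ps == slots
    && all (\p -> p >= 0 && p <= 1) ps
    && abs (sum ps - 1) < 1e-9

-- The two examples from the text, padded with zeros to 539 slots.
certainlyZero :: Distribution
certainlyZero = 1 : replicate (slots - 1) 0

oneOrThree :: Distribution
oneOrThree = [0, 0.5, 0, 0.5] ++ replicate (slots - 4) 0
```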

In order to simplify the problem we need to subdivide the problem into smaller problems. For example, if I want to compute the final electoral vote probability distribution for all 50 states perhaps we can break that down into two smaller problems:

  • Split the 50 states into two sub-groups of 25 states each
  • Compute an electoral vote probability distribution for each sub-group of 25 states
  • Combine probability distributions for each sub-group into the final distribution

In order to do that, I need to define a function that combines two smaller distributions into a larger distribution:

import qualified Data.Vector.Unboxed

combine :: Distribution -> Distribution -> Distribution
combine (Distribution xs) (Distribution ys) = Distribution zs
  where
    zs = Data.Vector.Unboxed.generate 539 totalProbability

    -- Sum the probabilities of every way to split i votes between the groups
    totalProbability i =
        Data.Vector.Unboxed.sum
            (Data.Vector.Unboxed.generate (i + 1) probabilityOfEachOutcome)
      where
        -- j votes from the first group and (i - j) from the second
        probabilityOfEachOutcome j =
                Data.Vector.Unboxed.unsafeIndex xs j
            *   Data.Vector.Unboxed.unsafeIndex ys (i - j)

The combine function takes two input distributions named xs and ys and generates a new distribution named zs. To compute the probability of getting i electoral votes in our composite distribution, we just add up all the different ways we can get i electoral votes from the two sub-distributions.

For example, to compute the probability of getting 4 electoral votes for the entire group, we add up the probabilities for the following 5 outcomes:

  • We get 0 votes from our 1st group and 4 votes from our 2nd group
  • We get 1 vote from our 1st group and 3 votes from our 2nd group
  • We get 2 votes from our 1st group and 2 votes from our 2nd group
  • We get 3 votes from our 1st group and 1 vote from our 2nd group
  • We get 4 votes from our 1st group and 0 votes from our 2nd group

The probabilityOfEachOutcome function computes the probability of each one of these outcomes and then the totalProbability function sums them all up to compute the total probability of getting i electoral votes.
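For readers without the vector package at hand, here is a dependency-light sketch of the same indexing formula, assuming Data.Array from the array package (which ships with GHC) in place of Data.Vector.Unboxed. The fromList and toList helpers are hypothetical, added only to build and inspect test distributions.

```haskell
import Data.Array (Array, elems, listArray, (!))

-- Stand-in for the post's Distribution, using an immutable Array instead
-- of an unboxed Vector. Index k = Pr(exactly k votes).
newtype Distribution = Distribution (Array Int Double)

-- One slot for each vote count from 0 to 538.
size :: Int
size = 539

-- Build a distribution from a (possibly short) list, padding with zeros.
fromList :: [Double] -> Distribution
fromList ps = Distribution (listArray (0, size - 1) (take size (ps ++ repeat 0)))

toList :: Distribution -> [Double]
toList (Distribution a) = elems a

-- Pr(i votes in total) = sum over j of Pr(j from xs) * Pr(i - j from ys),
-- mirroring totalProbability and probabilityOfEachOutcome in the post.
combine :: Distribution -> Distribution -> Distribution
combine (Distribution xs) (Distribution ys) =
  Distribution (listArray (0, size - 1) (map totalProbability [0 .. size - 1]))
  where
    totalProbability i = sum [ xs ! j * ys ! (i - j) | j <- [0 .. i] ]
```

Like the Vector version, each call does work proportional to the square of the number of slots, which is cheap at 539 slots.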

We can also define an empty distribution representing the probability distribution of electoral votes given zero states:

empty :: Distribution
empty = Distribution (Data.Vector.Unboxed.generate 539 makeElement)
  where
    makeElement 0 = 1
    makeElement _ = 0

This distribution says that given zero states you have a 100% chance of getting zero electoral college votes and 0% chance of any other outcome. This empty distribution will come in handy later on.

Divide and conquer

There's no limit to how many times we can subdivide the problem. In the extreme case we can sub-divide the problem down to individual states (or districts for weird states like Maine and Nebraska):

  • subdivide our problem into 56 sub-groups (one group per state or district)
  • compute the probability distribution for each sub-group, which is trivial
  • combine all the probability distributions to retrieve the final result

In fact, this extreme solution is surprisingly efficient!

All we're missing is a function that converts each entry in our original probabilities list into a Distribution:

toDistribution :: (String, Double, Int) -> Distribution
toDistribution (_, probability, votes) =
    Distribution (Data.Vector.Unboxed.generate 539 makeElement)
  where
    makeElement 0 = 1 - probability
    makeElement i | i == votes = probability
    makeElement _ = 0

This says that the probability distribution for a single state has only two possible outcomes (here x is probability and n is votes):

  • Hillary Clinton has probability x of winning n votes for this state
  • Hillary Clinton has probability 1 - x of winning 0 votes for this state
  • Hillary Clinton has 0% probability of any other outcome for this state

Let's test this out on a couple of states:

>>> toDistribution ("Alaska"      , 0.300, 3)
Distribution [0.7,0.0,0.0,0.3,0.0,0.0,...
>>> toDistribution ("North Dakota", 0.070, 3)
Distribution [0.9299999999999999,0.0,0.0,7.0e-2,0.0...

This says that:

  • Alaska has a 30% chance of giving Clinton 3 votes and 70% chance of 0 votes
  • North Dakota has a 7% chance of giving Clinton 3 votes and a 93% chance of 0 votes

We can also verify that combine works correctly by combining the electoral vote distributions of both states. We expect the new distribution to be:

  • 2.1% chance of 6 votes (the probability of winning both states)
  • 65.1% chance of 0 votes (the probability of losing both states)
  • 32.8% chance of 3 votes (the probability of winning just one of the two states)

... and this is in fact what we get:

>>> let alaska      = toDistribution ("Alaska"      , 0.300, 3)
>>> let northDakota = toDistribution ("North Dakota", 0.070, 3)
>>> combine alaska northDakota
Distribution [0.6509999999999999,0.0,0.0,0.32799999999999996,0.0,0.0,2.1e-2,0.0,...

Final result

To compute the total probability of winning, we just transform each element of the list to the corresponding distribution:

distributions :: [Distribution]
distributions = map toDistribution probabilities

... then we reduce the list to a single value by repeatedly applying the combine function, falling back on the empty distribution if the entire list is empty:

import qualified Data.List

distribution :: Distribution
distribution = Data.List.foldl' combine empty distributions

... and if we want to get Clinton's chances of winning, we just add up the probabilities for all outcomes greater than or equal to 270 electoral college votes:

chanceOfClintonVictory :: Double
chanceOfClintonVictory =
    Data.Vector.Unboxed.sum (Data.Vector.Unboxed.drop 270 xs)
  where
    Distribution xs = distribution

main :: IO ()
main = print chanceOfClintonVictory

If we compile and run this program we get the final result:

$ stack --resolver=lts-7.4 build vector
$ stack --resolver=lts-7.4 ghc -- -O2 result.hs
$ ./result

In other words, Clinton has a 99.3% chance of winning if each state's outcome is independent of every other outcome. This is significantly higher than the probability estimated by FiveThirtyEight at that time: 86.3%.

These results differ for the same reason I noted above: FiveThirtyEight assumes that state outcomes are not necessarily independent and that a Trump win in one state could correlate with Trump wins in other states. This possibility of correlated victories favors the person who is behind in the race.

As a sanity check, we can also verify that the final probability distribution has probabilities that add up to approximately 1:

>>> let Distribution xs = distribution
>>> Data.Vector.Unboxed.sum xs

Exercise: Expand on this program to plot the probability distribution


Our program is also efficient, running in 30 milliseconds:

$ bench ./result
benchmarking ./result
time 30.33 ms (29.42 ms .. 31.16 ms)
0.998 R² (0.997 R² .. 1.000 R²)
mean 29.43 ms (29.13 ms .. 29.81 ms)
std dev 710.6 μs (506.7 μs .. 992.6 μs)

This is a significant improvement over a year's worth of running time.

We could even speed this up further using parallelism. Thanks to our divide and conquer approach we can subdivide this problem among up to 56 CPUs (one per state or district) to accelerate the solution. However, after a certain point the overhead of splitting up the work might outweigh the benefits of parallelism.
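The foldl' above combines distributions one at a time. A hedged sketch of the balanced split that a parallel version would rely on follows, using unpadded lists instead of Vectors (Dist, addDists and reduceTree are hypothetical names). The two recursive calls are independent, so they could be evaluated on separate cores, for example with par and pseq from the parallel package; this sketch keeps them sequential and only checks that the tree-shaped reduction agrees with a left fold, which associativity guarantees up to floating-point rounding.

```haskell
-- Hypothetical list-based distributions: index k = Pr(exactly k votes).
-- Lists are not padded to 539 slots; a combined list simply grows.
type Dist = [Double]

-- Convolution of two vote distributions.
combine :: Dist -> Dist -> Dist
combine [] _ = []
combine (x : xs) ys = addDists (map (x *) ys) (0 : combine xs ys)

-- Pointwise sum, treating the shorter list as padded with zeros.
addDists :: Dist -> Dist -> Dist
addDists us [] = us
addDists [] vs = vs
addDists (u : us) (v : vs) = (u + v) : addDists us vs

-- Distribution for zero states: 0 votes with certainty.
empty :: Dist
empty = [1]

-- Divide and conquer: split the list, reduce each half, combine the halves.
-- The two recursive calls share no data, which is what would allow them
-- to run in parallel; here they are evaluated sequentially.
reduceTree :: [Dist] -> Dist
reduceTree []  = empty
reduceTree [d] = d
reduceTree ds  = combine (reduceTree left) (reduceTree right)
  where
    (left, right) = splitAt (length ds `div` 2) ds
```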


People more familiar with Haskell will recognize that this solution fits cleanly into a standard Haskell interface known as the Monoid type class. In fact, many divide-and-conquer solutions tend to be Monoids of some sort.

The Monoid typeclass is defined as:

class Monoid m where
    mappend :: m -> m -> m

    mempty :: m

-- An infix operator that is a synonym for `mappend`
(<>) :: Monoid m => m -> m -> m
x <> y = mappend x y

... and the Monoid class has three rules that every implementation must obey, which are known as the "Monoid laws".

The first rule is that mappend (or the equivalent (<>) operator) must be associative:

x <> (y <> z) = (x <> y) <> z

The second and third rules are that mempty must be the "identity" of mappend, meaning that mempty does nothing when combined with other values:

mempty <> x = x

x <> mempty = x

A simple example of a Monoid is integers under addition, which we can implement like this:

instance Monoid Integer where
    mappend = (+)
    mempty = 0

... and this implementation satisfies the Monoid laws thanks to the laws of addition:

(x + y) + z = x + (y + z)

0 + x = x

x + 0 = x
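In base itself there is no instance Monoid Integer: since both addition (with 0) and multiplication (with 1) are lawful choices, base wraps them in the Sum and Product newtypes from Data.Monoid. The sketch below spot-checks the three laws on sample values (a test, not a proof); monoidLawsHold and totalOf are hypothetical helper names.

```haskell
import Data.Monoid (Sum (..))

-- Check the three Monoid laws for particular sample values.
-- Uses mappend/mempty directly so it does not depend on which
-- module exports (<>).
monoidLawsHold :: (Monoid m, Eq m) => m -> m -> m -> Bool
monoidLawsHold x y z =
  (x `mappend` (y `mappend` z)) == ((x `mappend` y) `mappend` z)  -- associativity
    && (mempty `mappend` x) == x                                  -- left identity
    && (x `mappend` mempty) == x                                  -- right identity

-- Integers under addition, via base's Sum wrapper.
totalOf :: [Integer] -> Integer
totalOf = getSum . mconcat . map Sum
```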

Distributions are Monoids, too! Our combine and empty definitions both have the correct types to implement the mappend and mempty methods of the Monoid typeclass, respectively:

instance Monoid Distribution where
    mappend = combine

    mempty = empty

Both mappend and mempty for Distributions satisfy the Monoid laws:

  • mappend is associative (Proof omitted)
  • mempty is the identity of mappend
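On GHC 8.4 and later (base 4.11+), Semigroup is a superclass of Monoid, so the instance above also needs a companion Semigroup instance there; the lts-7.4 snapshot used earlier predates that change. The following sketch uses unpadded lists as a stand-in for the Vector version, defines both instances, and spot-checks the identity laws on the Alaska and North Dakota example. It is a numerical check under those assumptions, not a proof.

```haskell
import Data.Semigroup (Semigroup (..))

-- Hypothetical list-based Distribution: index k holds Pr(exactly k votes).
-- Unlike the Vector version, lists are not padded to 539 slots.
newtype Distribution = Distribution { getDistribution :: [Double] }
  deriving (Show)

-- Convolution of two distributions, the same operation as combine above.
combine :: Distribution -> Distribution -> Distribution
combine (Distribution xs) (Distribution ys) = Distribution (go xs)
  where
    -- The list (p : ps) stands for p + x * ps, so
    -- (p + x * ps) * ys = p * ys + x * (ps * ys).
    go []       = []
    go (p : ps) = addAll (map (p *) ys) (0 : go ps)

    -- Pointwise sum, treating the shorter list as padded with zeros.
    addAll us [] = us
    addAll [] vs = vs
    addAll (u : us) (v : vs) = (u + v) : addAll us vs

-- Required on GHC 8.4+ (base 4.11+), where Semigroup is a superclass of Monoid.
instance Semigroup Distribution where
  (<>) = combine

instance Monoid Distribution where
  mempty  = Distribution [1]  -- zero states: 0 votes with certainty
  mappend = (<>)

-- One state: probability p of n votes, otherwise 0 votes (assumes n >= 1).
toDistribution :: (String, Double, Int) -> Distribution
toDistribution (_, p, n) = Distribution ((1 - p) : replicate (n - 1) 0 ++ [p])

-- Probability of at least 270 electoral votes.
chanceOfVictory :: Distribution -> Double
chanceOfVictory (Distribution ps) = sum (drop 270 ps)
```

Applied to the post's probabilities list, chanceOfVictory (mconcat (map toDistribution probabilities)) should compute the same quantity as chanceOfClintonVictory, up to floating-point rounding.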

We can prove the identity law using the following rules for how Vectors behave:

-- These rules assume that all vectors involved have 539 elements
-- (and, in the second rule, that 0 <= i < 539)

-- If you generate a vector by just indexing into another vector, you just get back
-- the other vector
Data.Vector.Unboxed.generate 539 (Data.Vector.Unboxed.unsafeIndex xs) = xs

-- If you index into a vector generated by a function, that's equivalent to calling
-- that function
Data.Vector.Unboxed.unsafeIndex (Data.Vector.Unboxed.generate 539 f) i = f i

Equipped with those rules, we can then prove the right identity law, mappend (Distribution xs) mempty = Distribution xs; the proof of the left identity law is analogous:

mappend (Distribution xs) mempty

-- mappend = combine
= combine (Distribution xs) mempty

-- Definition of `mempty`
= combine (Distribution xs) (Distribution ys)
  where
    ys = Data.Vector.Unboxed.generate 539 makeElement
      where
        makeElement 0 = 1
        makeElement _ = 0

-- Definition of `combine`
= Distribution zs
  where
    zs = Data.Vector.Unboxed.generate 539 totalProbability

    totalProbability i =
        Data.Vector.Unboxed.sum
            (Data.Vector.Unboxed.generate (i + 1) probabilityOfEachOutcome)
      where
        probabilityOfEachOutcome j =
                Data.Vector.Unboxed.unsafeIndex xs j
            *   Data.Vector.Unboxed.unsafeIndex ys (i - j)

    ys = Data.Vector.Unboxed.generate 539 makeElement
      where
        makeElement 0 = 1
        makeElement _ = 0

-- Data.Vector.Unboxed.unsafeIndex (Data.Vector.Unboxed.generate 539 f) i = f i
= Distribution zs
  where
    zs = Data.Vector.Unboxed.generate 539 totalProbability

    totalProbability i =
        Data.Vector.Unboxed.sum
            (Data.Vector.Unboxed.generate (i + 1) probabilityOfEachOutcome)
      where
        probabilityOfEachOutcome j =
                Data.Vector.Unboxed.unsafeIndex xs j
            *   makeElement (i - j)

    makeElement 0 = 1
    makeElement _ = 0

-- Case analysis on `j`
= Distribution zs
  where
    zs = Data.Vector.Unboxed.generate 539 totalProbability

    totalProbability i =
        Data.Vector.Unboxed.sum
            (Data.Vector.Unboxed.generate (i + 1) probabilityOfEachOutcome)
      where
        probabilityOfEachOutcome j
            | j == i =
                    Data.Vector.Unboxed.unsafeIndex xs j
                *   1  -- makeElement (i - j) = makeElement 0 = 1
            | otherwise =
                    Data.Vector.Unboxed.unsafeIndex xs j
                *   0  -- makeElement (i - j) = 0

-- x * 1 = x
-- y * 0 = 0
= Distribution zs
  where
    zs = Data.Vector.Unboxed.generate 539 totalProbability

    totalProbability i =
        Data.Vector.Unboxed.sum
            (Data.Vector.Unboxed.generate (i + 1) probabilityOfEachOutcome)
      where
        probabilityOfEachOutcome j
            | j == i    = Data.Vector.Unboxed.unsafeIndex xs j
            | otherwise = 0

-- Informally: "Sum of a vector with one non-zero element is just that element"
= Distribution zs
  where
    zs = Data.Vector.Unboxed.generate 539 totalProbability

    totalProbability i = Data.Vector.Unboxed.unsafeIndex xs i

-- Data.Vector.Unboxed.generate 539 (Data.Vector.Unboxed.unsafeIndex xs) = xs
= Distribution xs

Exercise: Prove the associativity law for combine


I hope people find this an interesting example of how you can apply mathematical design principles (in this case: Monoids) in service of simplifying and speeding up programming problems.

If you would like to test this program out yourself the complete program is provided below:

-- Semigroup is imported for the Semigroup instance below; on GHC 8.4 and
-- later it is a superclass of Monoid, so the instance is required there
import Data.Semigroup (Semigroup (..))
import Data.Vector.Unboxed (Vector)

import qualified Data.List
import qualified Data.Vector.Unboxed

probabilities :: [(String, Double, Int)]
probabilities =
    [ ("Alabama" , 0.003, 9)
    , ("Alaska" , 0.300, 3)
    , ("Arizona" , 0.529, 11)
    , ("Arkansas" , 0.012, 6)
    , ("California" , 0.999, 55)
    , ("Colorado" , 0.889, 9)
    , ("Connecticut" , 0.977, 7)
    , ("Delaware" , 0.948, 3)
    , ("District of Columbia", 0.999, 3)
    , ("Florida" , 0.731, 29)
    , ("Georgia" , 0.259, 16)
    , ("Hawaii" , 0.996, 4)
    , ("Idaho" , 0.026, 4)
    , ("Illinois" , 0.994, 20)
    , ("Indiana" , 0.121, 11)
    , ("Iowa" , 0.491, 6)
    , ("Kansas" , 0.089, 6)
    , ("Kentucky" , 0.042, 8)
    , ("Louisiana" , 0.009, 8)
    , ("Maine" , 0.852, 2)
    , ("Maine - 1" , 0.944, 1)
    , ("Maine - 2" , 0.517, 1)
    , ("Maryland" , 0.999, 10)
    , ("Massachusetts" , 0.998, 11)
    , ("Michigan" , 0.929, 16)
    , ("Minnesota" , 0.886, 10)
    , ("Mississippi" , 0.034, 6)
    , ("Missouri" , 0.168, 10)
    , ("Montana" , 0.119, 3)
    , ("Nebraska" , 0.040, 2)
    , ("Nebraska - 1" , 0.154, 1)
    , ("Nebraska - 2" , 0.513, 1)
    , ("Nebraska - 3" , 0.014, 1)
    , ("Nevada" , 0.703, 6)
    , ("New Hampshire" , 0.868, 4)
    , ("New Jersey" , 0.981, 14)
    , ("New Mexico" , 0.941, 5)
    , ("New York" , 0.998, 29)
    , ("North Carolina" , 0.689, 15)
    , ("North Dakota" , 0.070, 3)
    , ("Oklahoma" , 0.002, 7)
    , ("Ohio" , 0.563, 18)
    , ("Oregon" , 0.957, 7)
    , ("Pennsylvania" , 0.880, 20)
    , ("Rhode Island" , 0.974, 4)
    , ("South Carolina" , 0.086, 9)
    , ("South Dakota" , 0.117, 3)
    , ("Tennessee" , 0.025, 11)
    , ("Texas" , 0.166, 38)
    , ("Utah" , 0.067, 6)
    , ("Vermont" , 0.984, 3)
    , ("Virginia" , 0.943, 13)
    , ("Washington" , 0.975, 12)
    , ("West Virginia" , 0.006, 5)
    , ("Wisconsin" , 0.896, 10)
    , ("Wyoming" , 0.021, 3)
    ]

newtype Distribution = Distribution { getDistribution :: Vector Double }
    deriving (Show)

combine :: Distribution -> Distribution -> Distribution
combine (Distribution xs) (Distribution ys) = Distribution zs
  where
    zs = Data.Vector.Unboxed.generate 539 totalProbability

    totalProbability i =
        Data.Vector.Unboxed.sum
            (Data.Vector.Unboxed.generate (i + 1) probabilityOfEachOutcome)
      where
        probabilityOfEachOutcome j =
                Data.Vector.Unboxed.unsafeIndex xs j
            *   Data.Vector.Unboxed.unsafeIndex ys (i - j)

empty :: Distribution
empty = Distribution (Data.Vector.Unboxed.generate 539 makeElement)
  where
    makeElement 0 = 1
    makeElement _ = 0

instance Semigroup Distribution where
    (<>) = combine

instance Monoid Distribution where
    mappend = (<>)

    mempty = empty

toDistribution :: (String, Double, Int) -> Distribution
toDistribution (_, probability, votes) =
    Distribution (Data.Vector.Unboxed.generate 539 makeElement)
  where
    makeElement 0 = 1 - probability
    makeElement i | i == votes = probability
    makeElement _ = 0

distributions :: [Distribution]
distributions = map toDistribution probabilities

distribution :: Distribution
distribution = mconcat distributions

chanceOfClintonVictory :: Double
chanceOfClintonVictory =
    Data.Vector.Unboxed.sum (Data.Vector.Unboxed.drop 270 xs)
  where
    Distribution xs = distribution

main :: IO ()
main = print chanceOfClintonVictory

by Gabriel Gonzalez ( at October 27, 2016 01:50 PM

October 26, 2016

Functional Jobs

Clojure Engineer at ROKT (Full-time)

ROKT is hiring thoughtful, talented functional programmers, at all levels, to expand our Clojure team in Sydney, Australia. (We're looking for people who already have the right to work in Australia, please.)

ROKT is a successful startup with a transaction marketing platform used by some of the world's largest ecommerce sites. Our Sydney-based engineering team supports a business that is growing rapidly around the world.

Our Clojure engineers are responsible for ROKT's "Data Platform", a web interface for our sales teams, our operations team, and our customers to extract and upload the data that drives our customers' businesses and our own. We write Clojure on the server-side, and a ClojureScript single-page application on the frontend.

We don't have a Hadoop-based neural net diligently organising our customer data into the world's most efficiently balanced red-black tree (good news: we won't ask you to write one in an interview) — instead, we try to spend our time carefully building the simplest thing that'll do what the business needs done. We're looking for programmers who can help us build simple, robust systems — and we think that means writing in a very functional style — whether that involves hooking some CV-enhancing buzzword technology on the side or not.

If you have professional Clojure experience, that's excellent, we'd like to hear about it. But we don't have a big matrix of exacting checkboxes to measure you against, so if your Clojure isn't fluent yet, we'll be happy to hear how you've been writing functional code in whatever language you're most comfortable in, whether it be Haskell or JavaScript, Common Lisp or Scala. We have the luxury of building out a solid team of thoughtful developers — no "get me a resource with exactly X years of experience in technology Y, stat!"

Get information on how to apply for this position.

October 26, 2016 05:58 AM

Joachim Breitner

Showcasing Applicative

My plan for this week’s lecture of the CIS 194 Haskell course at the University of Pennsylvania is to dwell a bit on the concepts of Functor, Applicative and Monad, and to highlight the value of the Applicative abstraction.

I quite like the example that I came up with, so I want to share it here. In the interest of long-term archival and stand-alone presentation, I include all the material in this post.1


In case you want to follow along, start with these imports:

import Data.Char
import Data.Maybe
import Data.List

import System.Environment
import System.IO
import System.Exit

The parser

The starting point for this exercise is a fairly standard parser-combinator monad, which happens to be the result of the students’ homework from last week:

newtype Parser a = P (String -> Maybe (a, String))

runParser :: Parser t -> String -> Maybe (t, String)
runParser (P p) = p

parse :: Parser a -> String -> Maybe a
parse p input = case runParser p input of
    Just (result, "") -> Just result
    _ -> Nothing -- handles both no result and leftover input

noParserP :: Parser a
noParserP = P (\_ -> Nothing)

pureParserP :: a -> Parser a
pureParserP x = P (\input -> Just (x,input))

instance Functor Parser where
    fmap f p = P $ \input -> do
        (x, rest) <- runParser p input
        return (f x, rest)

instance Applicative Parser where
    pure = pureParserP
    p1 <*> p2 = P $ \input -> do
        (f, rest1) <- runParser p1 input
        (x, rest2) <- runParser p2 rest1
        return (f x, rest2)

instance Monad Parser where
    return = pure
    p1 >>= k = P $ \input -> do
        (x, rest1) <- runParser p1 input
        runParser (k x) rest1

anyCharP :: Parser Char
anyCharP = P $ \input -> case input of
    (c:rest) -> Just (c, rest)
    []       -> Nothing

charP :: Char -> Parser ()
charP c = do
    c' <- anyCharP
    if c == c' then return ()
               else noParserP

anyCharButP :: Char -> Parser Char
anyCharButP c = do
    c' <- anyCharP
    if c /= c' then return c'
               else noParserP

letterOrDigitP :: Parser Char
letterOrDigitP = do
    c <- anyCharP
    if isAlphaNum c then return c else noParserP

orElseP :: Parser a -> Parser a -> Parser a
orElseP p1 p2 = P $ \input -> case runParser p1 input of
    Just r -> Just r
    Nothing -> runParser p2 input

manyP :: Parser a -> Parser [a]
manyP p = (pure (:) <*> p <*> manyP p) `orElseP` pure []

many1P :: Parser a -> Parser [a]
many1P p = pure (:) <*> p <*> manyP p

sepByP :: Parser a -> Parser () -> Parser [a]
sepByP p1 p2 = (pure (:) <*> p1 <*> (manyP (p2 *> p1))) `orElseP` pure []

Using this library, a parser for CSV files, for example, could take this form:

parseCSVP :: Parser [[String]]
parseCSVP = manyP parseLine
  where
    parseLine = parseCell `sepByP` charP ',' <* charP '\n'
    parseCell = do
        charP '"'
        content <- manyP (anyCharButP '"')
        charP '"'
        return content

We want EBNF

Often when we write a parser for a file format, we might also want to have a formal specification of the format. A common form for such a specification is EBNF. This might look as follows, for a CSV file:

cell = '"', {not-quote}, '"';
line = (cell, {',', cell} | ''), newline;
csv  = {line};

It is straightforward to create a Haskell data type to represent an EBNF syntax description. Here is a simple EBNF library (data type and pretty-printer) for your convenience:

data RHS
  = Terminal String
  | NonTerminal String
  | Choice RHS RHS
  | Sequence RHS RHS
  | Optional RHS
  | Repetition RHS
  deriving (Show, Eq)

ppRHS :: RHS -> String
ppRHS = go 0
  where
    go _ (Terminal s)     = surround "'" "'" $ concatMap quote s
    go _ (NonTerminal s)  = s
    go a (Choice x1 x2)   = p a 1 $ go 1 x1 ++ " | " ++ go 1 x2
    go a (Sequence x1 x2) = p a 2 $ go 2 x1 ++ ", "  ++ go 2 x2
    go _ (Optional x)     = surround "[" "]" $ go 0 x
    go _ (Repetition x)   = surround "{" "}" $ go 0 x

    surround c1 c2 x = c1 ++ x ++ c2

    p a n | a > n     = surround "(" ")"
          | otherwise = id

    quote '\'' = "\\'"
    quote '\\' = "\\\\"
    quote c    = [c]

type Production = (String, RHS)
type BNF = [Production]

ppBNF :: BNF -> String
ppBNF = unlines . map (\(i,rhs) -> i ++ " = " ++ ppRHS rhs ++ ";")
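To make the correspondence between the EBNF text and this data type concrete, here is a small self-contained sketch that writes the CSV grammar from above directly as RHS values and pretty-prints it. RHS, ppRHS and ppBNF are copied from above; csvBNF is a hypothetical name used only for this illustration.

```haskell
-- The EBNF library from above, plus the CSV grammar written out by hand.
data RHS
  = Terminal String
  | NonTerminal String
  | Choice RHS RHS
  | Sequence RHS RHS
  | Optional RHS
  | Repetition RHS
  deriving (Show, Eq)

ppRHS :: RHS -> String
ppRHS = go 0
  where
    go _ (Terminal s)     = surround "'" "'" $ concatMap quote s
    go _ (NonTerminal s)  = s
    go a (Choice x1 x2)   = p a 1 $ go 1 x1 ++ " | " ++ go 1 x2
    go a (Sequence x1 x2) = p a 2 $ go 2 x1 ++ ", "  ++ go 2 x2
    go _ (Optional x)     = surround "[" "]" $ go 0 x
    go _ (Repetition x)   = surround "{" "}" $ go 0 x

    surround c1 c2 x = c1 ++ x ++ c2

    -- parenthesise when the surrounding context binds tighter
    p a n | a > n     = surround "(" ")"
          | otherwise = id

    -- escape quotes and backslashes inside terminals
    quote '\'' = "\\'"
    quote '\\' = "\\\\"
    quote c    = [c]

type Production = (String, RHS)
type BNF = [Production]

ppBNF :: BNF -> String
ppBNF = unlines . map (\(i, rhs) -> i ++ " = " ++ ppRHS rhs ++ ";")

-- Hypothetical: the three CSV productions, built by hand.
csvBNF :: BNF
csvBNF =
  [ ("cell", Terminal "\"" `Sequence` Repetition (NonTerminal "not-quote")
                           `Sequence` Terminal "\"")
  , ("line", (cells `Choice` Terminal "") `Sequence` NonTerminal "newline")
  , ("csv",  Repetition (NonTerminal "line"))
  ]
  where
    -- cell, {',', cell}
    cells = NonTerminal "cell"
              `Sequence` Repetition (Terminal "," `Sequence` NonTerminal "cell")
```

Choice and Sequence are written infix only for readability; the nesting mirrors the EBNF text, and the pretty-printer adds the parentheses around the choice because it sits inside a sequence.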

Code to produce EBNF

We had a good time writing combinators that create complex parsers from primitive pieces. Let us do the same for EBNF grammars. We could simply work on the RHS type directly, but we can do something more nifty: We create a data type that keeps track, via a phantom type parameter, of which Haskell type the given EBNF syntax specifies:

newtype Grammar a = G RHS

ppGrammar :: Grammar a -> String
ppGrammar (G rhs) = ppRHS rhs

So a value of type Grammar t is a description of the textual representation of the Haskell type t.

Here is one simple example:

anyCharG :: Grammar Char
anyCharG = G (NonTerminal "char")

Here is another one. This one does not describe any interesting Haskell type, but is useful when spelling out the special characters in the syntax described by the grammar:

charG :: Char -> Grammar ()
charG c = G (Terminal [c])

A combinator that creates a new grammar from two existing grammars:

orElseG :: Grammar a -> Grammar a -> Grammar a
orElseG (G rhs1) (G rhs2) = G (Choice rhs1 rhs2)

We want the convenience of our well-known type classes in order to combine these values some more:

instance Functor Grammar where
    fmap _ (G rhs) = G rhs

instance Applicative Grammar where
    pure x = G (Terminal "")
    (G rhs1) <*> (G rhs2) = G (Sequence rhs1 rhs2)

Note how the Functor instance does not actually use the function. How should it? There are no values inside a Grammar!

We cannot define a Monad instance for Grammar: We would start with (G rhs1) >>= k = …, but there is simply no way of getting a value of type a that we can feed to k. So we will do without a Monad instance. This is interesting, and we will come back to that later.

As with the parser, we can now build on these primitive examples to create more complicated combinators:

manyG :: Grammar a -> Grammar [a]
manyG p = (pure (:) <*> p <*> manyG p) `orElseG` pure []

many1G :: Grammar a -> Grammar [a]
many1G p = pure (:) <*> p <*> manyG p

sepByG :: Grammar a -> Grammar () -> Grammar [a]
sepByG p1 p2 = ((:) <$> p1 <*> (manyG (p2 *> p1))) `orElseG` pure []

Let us run a small example:

dottedWordsG :: Grammar [String]
dottedWordsG = many1G (manyG anyCharG <* charG '.')

*Main> putStrLn $ ppGrammar dottedWordsG
'', ('', char, ('', char, ('', char, ('', char, ('', char, ('', …

Oh my, that is not good. The recursion in manyG does not work here: a Grammar is plain data, so manyG p unfolds into an infinitely large RHS, and printing it never finishes. But anyway, we want to be explicit in the EBNF grammars about where something can be repeated, so let us just make many a primitive:

manyG :: Grammar a -> Grammar [a]
manyG (G rhs) = G (Repetition rhs)

With this definition, we already get a simple grammar for dottedWordsG:

*Main> putStrLn $ ppGrammar dottedWordsG
'', {char}, '.', {{char}, '.'}

This already looks like a proper EBNF grammar. One thing that is not nice about it is that there is an empty string ('') in a sequence (…,…). We do not want that.

Why is it there in the first place? Because our Applicative instance is not lawful! Remember that pure id <*> g == g should hold. One way to achieve that is to improve the Applicative instance to optimize this case away:

instance Applicative Grammar where
    pure x = G (Terminal "")
    G (Terminal "") <*> G rhs2 = G rhs2
    G rhs1 <*> G (Terminal "") = G rhs1
    (G rhs1) <*> (G rhs2) = G (Sequence rhs1 rhs2)

Now we get what we want:

*Main> putStrLn $ ppGrammar dottedWordsG
{char}, '.', {{char}, '.'}
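To check concretely that the extra equations restore the identity law, the following self-contained sketch collects the single-RHS Grammar with the improved instance and the primitive manyG. rhsOf is a hypothetical helper for looking inside a grammar; the other names are the ones used above.

```haskell
data RHS
  = Terminal String
  | NonTerminal String
  | Choice RHS RHS
  | Sequence RHS RHS
  | Optional RHS
  | Repetition RHS
  deriving (Show, Eq)

ppRHS :: RHS -> String
ppRHS = go 0
  where
    go _ (Terminal s)     = surround "'" "'" $ concatMap quote s
    go _ (NonTerminal s)  = s
    go a (Choice x1 x2)   = p a 1 $ go 1 x1 ++ " | " ++ go 1 x2
    go a (Sequence x1 x2) = p a 2 $ go 2 x1 ++ ", "  ++ go 2 x2
    go _ (Optional x)     = surround "[" "]" $ go 0 x
    go _ (Repetition x)   = surround "{" "}" $ go 0 x
    surround c1 c2 x = c1 ++ x ++ c2
    p a n | a > n     = surround "(" ")"
          | otherwise = id
    quote '\'' = "\\'"
    quote '\\' = "\\\\"
    quote c    = [c]

newtype Grammar a = G RHS

-- hypothetical helper: look inside a grammar
rhsOf :: Grammar a -> RHS
rhsOf (G rhs) = rhs

ppGrammar :: Grammar a -> String
ppGrammar (G rhs) = ppRHS rhs

instance Functor Grammar where
  fmap _ (G rhs) = G rhs            -- no values inside, so f is never used

instance Applicative Grammar where
  pure _ = G (Terminal "")
  -- drop the empty terminal contributed by pure
  G (Terminal "") <*> G rhs2 = G rhs2
  G rhs1 <*> G (Terminal "") = G rhs1
  G rhs1 <*> G rhs2          = G (Sequence rhs1 rhs2)

anyCharG :: Grammar Char
anyCharG = G (NonTerminal "char")

charG :: Char -> Grammar ()
charG c = G (Terminal [c])

-- many as a primitive
manyG :: Grammar a -> Grammar [a]
manyG (G rhs) = G (Repetition rhs)

many1G :: Grammar a -> Grammar [a]
many1G p = pure (:) <*> p <*> manyG p

dottedWordsG :: Grammar [String]
dottedWordsG = many1G (manyG anyCharG <* charG '.')
```

In this sketch the identity, homomorphism and interchange laws hold exactly, while composition only holds up to how Sequence nodes are nested, which the pretty-printer does not show.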

Remember our parser for CSV files above? Let me repeat it here, this time using only Applicative combinators, i.e. avoiding (>>=), (>>), return and do-notation:

parseCSVP :: Parser [[String]]
parseCSVP = manyP parseLine
  where
    parseLine = parseCell `sepByP` charP ',' <* charP '\n'
    parseCell = charP '"' *> manyP (anyCharButP '"') <* charP '"'

And now we try to rewrite the code to produce Grammar instead of Parser. This is straightforward with the exception of anyCharButP. The parser code for that is inherently monadic, and we just do not have a Monad instance. So we work around the issue by making that a “primitive” grammar, i.e. introducing a non-terminal in the EBNF without a production rule, pretty much like we did for anyCharG:

primitiveG :: String -> Grammar a
primitiveG s = G (NonTerminal s)

parseCSVG :: Grammar [[String]]
parseCSVG = manyG parseLine
  where
    parseLine = parseCell `sepByG` charG ',' <* charG '\n'
    parseCell = charG '"' *> manyG (primitiveG "not-quote") <* charG '"'

Of course the names parse… are not quite right any more, but let us just leave that for now.

Here is the result:

*Main> putStrLn $ ppGrammar parseCSVG
{('"', {not-quote}, '"', {',', '"', {not-quote}, '"'} | ''), '
'}

The line break is weird. We do not really want newlines in the grammar. So let us make that primitive as well, and replace charG '\n' with newlineG:

newlineG :: Grammar ()
newlineG = primitiveG "newline"

Now we get

*Main> putStrLn $ ppGrammar parseCSVG
{('"', {not-quote}, '"', {',', '"', {not-quote}, '"'} | ''), newline}

which is nice and correct, but still not quite the easily readable EBNF that we saw further up.

Code to produce EBNF, with productions

We currently let our grammars produce only the right-hand side of one EBNF production, but really, we want to produce a RHS that may refer to other productions. So let us change the type accordingly:

newtype Grammar a = G (BNF, RHS)

runGrammer :: String -> Grammar a -> BNF
runGrammer main (G (prods, rhs)) = prods ++ [(main, rhs)]

ppGrammar :: String -> Grammar a -> String
ppGrammar main g = ppBNF $ runGrammer main g

Now we have to adjust all our primitive combinators (but not the derived ones!):

charG :: Char -> Grammar ()
charG c = G ([], Terminal [c])

anyCharG :: Grammar Char
anyCharG = G ([], NonTerminal "char")

manyG :: Grammar a -> Grammar [a]
manyG (G (prods, rhs)) = G (prods, Repetition rhs)

mergeProds :: [Production] -> [Production] -> [Production]
mergeProds prods1 prods2 = nub $ prods1 ++ prods2

orElseG :: Grammar a -> Grammar a -> Grammar a
orElseG (G (prods1, rhs1)) (G (prods2, rhs2))
    = G (mergeProds prods1 prods2, Choice rhs1 rhs2)

instance Functor Grammar where
    fmap _ (G bnf) = G bnf

instance Applicative Grammar where
    pure x = G ([], Terminal "")
    G (prods1, Terminal "") <*> G (prods2, rhs2)
        = G (mergeProds prods1 prods2, rhs2)
    G (prods1, rhs1) <*> G (prods2, Terminal "")
        = G (mergeProds prods1 prods2, rhs1)
    G (prods1, rhs1) <*> G (prods2, rhs2)
        = G (mergeProds prods1 prods2, Sequence rhs1 rhs2)

primitiveG :: String -> Grammar a
primitiveG s = G ([], NonTerminal s)

The use of nub when combining productions removes duplicates that might be used in different parts of the grammar. Not efficient, but good enough for now.
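nub compares every production with every other one, so merging is quadratic, and it only removes productions that are identical in both name and right-hand side. If that ever matters, one order-preserving alternative is to remember the productions already emitted in a Data.Set. The sketch below compares the two; it simplifies a production's right-hand side to a String so that productions can be ordered (using it with the RHS type above would need an Ord instance), and mergeProdsOrd is a hypothetical name.

```haskell
import Data.List (nub)
import qualified Data.Set as Set

-- Simplified for this sketch: the right-hand side is just a String.
type Production = (String, String)

-- The version from the post: simple, but quadratic.
mergeProds :: [Production] -> [Production] -> [Production]
mergeProds prods1 prods2 = nub (prods1 ++ prods2)

-- Hypothetical alternative: keep the first occurrence of each production,
-- tracking what was already emitted in a Set (O(n log n)).
mergeProdsOrd :: [Production] -> [Production] -> [Production]
mergeProdsOrd prods1 prods2 = go Set.empty (prods1 ++ prods2)
  where
    go _ [] = []
    go seen (x : xs)
      | x `Set.member` seen = go seen xs
      | otherwise           = x : go (Set.insert x seen) xs
```

Both keep the first occurrence, so the order in which productions are printed stays the same.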

Did we gain anything? Not yet:

*Main> putStr $ ppGrammar "csv" (parseCSVG)
csv = {('"', {not-quote}, '"', {',', '"', {not-quote}, '"'} | ''), newline};

But we can now introduce a function that lets us tell the system where to give names to a piece of grammar:

nonTerminal :: String -> Grammar a -> Grammar a
nonTerminal name (G (prods, rhs))
  = G (prods ++ [(name, rhs)], NonTerminal name)

Ample use of this in parseCSVG yields the desired result:

parseCSVG :: Grammar [[String]]
parseCSVG = manyG parseLine
  where
    parseLine = nonTerminal "line" $
        parseCell `sepByG` charG ',' <* newlineG
    parseCell = nonTerminal "cell" $
        charG '"' *> manyG (primitiveG "not-quote") <* charG '"'

*Main> putStr $ ppGrammar "csv" (parseCSVG)
cell = '"', {not-quote}, '"';
line = (cell, {',', cell} | ''), newline;
csv = {line};

This is great!

Unifying parsing and grammar-generating

Note how similar parseCSVG and parseCSVP are! Would it not be great if we could implement that functionality only once, and get both a parser and a grammar description out of it? This way, the two would never be out of sync!

And surely this must be possible. The tool to reach for is of course to define a type class that abstracts over the parts where Parser and Grammar differ. So we have to identify all functions that are primitive in one of the two worlds, and turn them into type class methods. This includes char and orElse. It includes many, too: Although manyP is not primitive, manyG is. It also includes nonTerminal, which does not exist in the world of parsers (yet), but we need it for the grammars.

The primitiveG function is tricky. We use it in grammars when the code that we might use while parsing is not expressible as a grammar. So the solution is to let it take two arguments: A String, when used as a descriptive non-terminal in a grammar, and a Parser a, used in the parsing code.

Finally, the type classes that we expect, Applicative (and thus Functor), are added as constraints on our type class:

class Applicative f => Descr f where
    char :: Char -> f ()
    many :: f a -> f [a]
    orElse :: f a -> f a -> f a
    primitive :: String -> Parser a -> f a
    nonTerminal :: String -> f a -> f a

The instances are easily written:

instance Descr Parser where
    char = charP
    many = manyP
    orElse = orElseP
    primitive _ p = p
    nonTerminal _ p = p

instance Descr Grammar where
    char = charG
    many = manyG
    orElse = orElseG
    primitive s _ = primitiveG s
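    -- the Grammar-only nonTerminal from above has to be renamed (e.g. to
    -- nonTerminalG) so that it does not clash with this method
    nonTerminal = nonTerminalG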

And we can now take the derived definitions, of which so far we had two copies, and define them once and for all:

many1 :: Descr f => f a -> f [a]
many1 p = pure (:) <*> p <*> many p

anyChar :: Descr f => f Char
anyChar = primitive "char" anyCharP

dottedWords :: Descr f => f [String]
dottedWords = many1 (many anyChar <* char '.')

sepBy :: Descr f => f a -> f () -> f [a]
sepBy p1 p2 = ((:) <$> p1 <*> (many (p2 *> p1))) `orElse` pure []

newline :: Descr f => f ()
newline = primitive "newline" (charP '\n')

And thus we now have our CSV parser/grammar generator:

parseCSV :: Descr f => f [[String]]
parseCSV = many parseLine
  where
    parseLine = nonTerminal "line" $
        parseCell `sepBy` char ',' <* newline
    parseCell = nonTerminal "cell" $
        char '"' *> many (primitive "not-quote" (anyCharButP '"')) <* char '"'

We can now use this definition both to parse and to generate grammars:

*Main> putStr $ ppGrammar "csv" (parseCSV)
cell = '"', {not-quote}, '"';
line = (cell, {',', cell} | ''), newline;
csv = {line};
*Main> parse parseCSV "\"ab\",\"cd\"\n\"\",\"de\"\n\n"
Just [["ab","cd"],["","de"],[]]

The INI file parser and grammar

As a final exercise, let us transform the INI file parser into a combined parser/grammar description. Here is the parser (another artifact of last week’s homework), again using applicative style[2]:

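-- INIFile comes from last week's homework; this definition is an assumption
-- inferred from what parseSection returns
type INIFile = [(String, [(String, String)])]

parseINIP :: Parser INIFile
parseINIP = many1P parseSection
  where
    parseSection =
        (,) <$  charP '['
            <*> parseIdent
            <*  charP ']'
            <*  charP '\n'
            <*> (catMaybes <$> manyP parseLine)
    parseIdent = many1P letterOrDigitP
    parseLine = parseDecl `orElseP` parseComment `orElseP` parseEmpty

    parseDecl = Just <$> (
        (,) <$> parseIdent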
            <*  manyP (charP ' ')
            <*  charP '='
            <*  manyP (charP ' ')
            <*> many1P (anyCharButP '\n')
            <*  charP '\n')

    parseComment =
        Nothing <$ charP '#'
                <* many1P (anyCharButP '\n')
                <* charP '\n'

    parseEmpty = Nothing <$ charP '\n'

Transforming that to a generic description is quite straightforward. We use primitive again to wrap letterOrDigitP:

descrINI :: Descr f => f INIFile
descrINI = many1 parseSection
  where
    parseSection =
        (,) <$  char '['
            <*> parseIdent
            <*  char ']'
            <*  newline
            <*> (catMaybes <$> many parseLine)
    parseIdent = many1 (primitive "alphanum" letterOrDigitP)
    parseLine = parseDecl `orElse` parseComment `orElse` parseEmpty

    parseDecl = Just <$> (
        (,) <$> parseIdent
            <*  many (char ' ')
            <*  char '='
            <*  many (char ' ')
            <*> many1 (primitive "non-newline" (anyCharButP '\n'))
            <*  newline)

    parseComment =
        Nothing <$ char '#'
                <* many1 (primitive "non-newline" (anyCharButP '\n'))
                <* newline

    parseEmpty = Nothing <$ newline

This yields the following, not very helpful, grammar (abbreviated here):

*Main> putStr $ ppGrammar "ini" descrINI
ini = '[', alphanum, {alphanum}, ']', newline, {alphanum, {alphanum}, {' '}…

But with a few uses of nonTerminal, we get something really nice:

descrINI :: Descr f => f INIFile
descrINI = many1 parseSection
  where
    parseSection = nonTerminal "section" $
        (,) <$  char '['
            <*> parseIdent
            <*  char ']'
            <*  newline
            <*> (catMaybes <$> many parseLine)
    parseIdent = nonTerminal "identifier" $
        many1 (primitive "alphanum" letterOrDigitP)
    parseLine = nonTerminal "line" $
        parseDecl `orElse` parseComment `orElse` parseEmpty

    parseDecl = nonTerminal "declaration" $ Just <$> (
        (,) <$> parseIdent
            <*  spaces
            <*  char '='
            <*  spaces
            <*> remainder)

    parseComment = nonTerminal "comment" $
        Nothing <$ char '#' <* remainder

    remainder = nonTerminal "line-remainder" $
        many1 (primitive "non-newline" (anyCharButP '\n')) <* newline

    parseEmpty = Nothing <$ newline

    spaces = nonTerminal "spaces" $ many (char ' ')

*Main> putStr $ ppGrammar "ini" descrINI
identifier = alphanum, {alphanum};
spaces = {' '};
line-remainder = non-newline, {non-newline}, newline;
declaration = identifier, spaces, '=', spaces, line-remainder;
comment = '#', line-remainder;
line = declaration | comment | newline;
section = '[', identifier, ']', newline, {line};
ini = section, {section};

Recursion (variant 1)

What if we want to write a parser/grammar-generator that is able to generate the following grammar, which describes terms that are additions and multiplications of natural numbers:

const = digit, {digit};
spaces = {' ' | newline};
atom = const | '(', spaces, expr, spaces, ')', spaces;
mult = atom, {spaces, '*', spaces, atom}, spaces;
plus = mult, {spaces, '+', spaces, mult}, spaces;
expr = plus;

The production of expr is recursive (via plus, mult, atom). We have seen above that simply defining a Grammar a recursively does not go well.

One solution is to add a new combinator for explicit recursion, which replaces nonTerminal as a method of the Descr class:

class Applicative f => Descr f where
    recNonTerminal :: String -> (f a -> f a) -> f a

instance Descr Parser where
    recNonTerminal _ p = let r = p r in r

instance Descr Grammar where
    recNonTerminal = recNonTerminalG

recNonTerminalG :: String -> (Grammar a -> Grammar a) -> Grammar a
recNonTerminalG name f =
    let G (prods, rhs) = f (G ([], NonTerminal name))
    in G (prods ++ [(name, rhs)], NonTerminal name)

nonTerminal :: Descr f => String -> f a -> f a
nonTerminal name p = recNonTerminal name (const p)

runGrammer :: String -> Grammar a -> BNF
runGrammer main (G (prods, NonTerminal nt)) | main == nt = prods
runGrammer main (G (prods, rhs)) = prods ++ [(main, rhs)]

The change in runGrammer avoids adding a pointless expr = expr production to the output.
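The Parser instance of recNonTerminal ties the knot with let r = p r in r, which is exactly fix from Data.Function. The following self-contained sketch (a trimmed copy of the Parser above, with charP written by direct pattern matching instead of via the Monad instance) shows that this terminates, because the recursive reference is only used once the parser is run on input. recNonTerminalP and the depth parser are hypothetical names for this illustration.

```haskell
import Data.Function (fix)

newtype Parser a = P (String -> Maybe (a, String))

runParser :: Parser a -> String -> Maybe (a, String)
runParser (P p) = p

-- succeed only if all input was consumed
parse :: Parser a -> String -> Maybe a
parse p input = case runParser p input of
  Just (result, "") -> Just result
  _                 -> Nothing

instance Functor Parser where
  fmap f p = P $ \input -> do
    (x, rest) <- runParser p input
    return (f x, rest)

instance Applicative Parser where
  pure x = P $ \input -> Just (x, input)
  p1 <*> p2 = P $ \input -> do
    (f, rest1) <- runParser p1 input
    (x, rest2) <- runParser p2 rest1
    return (f x, rest2)

charP :: Char -> Parser ()
charP c = P $ \input -> case input of
  (c' : rest) | c == c' -> Just ((), rest)
  _                     -> Nothing

orElseP :: Parser a -> Parser a -> Parser a
orElseP p1 p2 = P $ \input -> case runParser p1 input of
  Just r  -> Just r
  Nothing -> runParser p2 input

-- Same as  recNonTerminal _ p = let r = p r in r
recNonTerminalP :: String -> (Parser a -> Parser a) -> Parser a
recNonTerminalP _ = fix

-- Hypothetical example: nesting depth of balanced parentheses like "(())".
depth :: Parser Int
depth = recNonTerminalP "depth" $ \self ->
  ((+ 1) <$ charP '(' <*> self <* charP ')') `orElseP` pure 0
```

For Grammar the same knot-tying would unfold forever, which is why recNonTerminalG instead feeds the body a NonTerminal name in place of the recursive reference.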

This lets us define a parser/grammar-generator for the arithmetic expressions given above:

data Expr = Plus Expr Expr | Mult Expr Expr | Const Integer
    deriving Show

mkPlus :: Expr -> [Expr] -> Expr
mkPlus = foldl Plus

mkMult :: Expr -> [Expr] -> Expr
mkMult = foldl Mult

parseExpr :: Descr f => f Expr
parseExpr = recNonTerminal "expr" $ \ exp ->
    ePlus exp

ePlus :: Descr f => f Expr -> f Expr
ePlus exp = nonTerminal "plus" $
    mkPlus <$> eMult exp
           <*> many (spaces *> char '+' *> spaces *> eMult exp)
           <*  spaces

eMult :: Descr f => f Expr -> f Expr
eMult exp = nonTerminal "mult" $
    mkMult <$> eAtom exp
           <*> many (spaces *> char '*' *> spaces *> eAtom exp)
           <*  spaces

eAtom :: Descr f => f Expr -> f Expr
eAtom exp = nonTerminal "atom" $
    aConst `orElse` eParens exp

aConst :: Descr f => f Expr
aConst = nonTerminal "const" $ Const . read <$> many1 digit

eParens :: Descr f => f a -> f a
eParens inner =
    id <$  char '('
       <*  spaces
       <*> inner
       <*  spaces
       <*  char ')'
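       <*  spaces

-- digit and spaces are used above but not shown in the post; these
-- definitions are an assumption, chosen to match the grammar printed below
digit :: Descr f => f Char
digit = primitive "digit" $ do
    c <- anyCharP
    if isDigit c then return c else noParserP

spaces :: Descr f => f ()
spaces = nonTerminal "spaces" $ () <$ many (char ' ' `orElse` newline)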

And indeed, this works:

*Main> putStr $ ppGrammar "expr" parseExpr
const = digit, {digit};
spaces = {' ' | newline};
atom = const | '(', spaces, expr, spaces, ')', spaces;
mult = atom, {spaces, '*', spaces, atom}, spaces;
plus = mult, {spaces, '+', spaces, mult}, spaces;
expr = plus;
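The operand lists collected by many are folded with foldl, so chains of the same operator nest to the left. A quick self-contained check, using a hypothetical eval function that is not part of the post, and deriving Eq only so that results can be compared:

```haskell
data Expr = Plus Expr Expr | Mult Expr Expr | Const Integer
  deriving (Show, Eq)   -- Eq added only for the checks

mkPlus :: Expr -> [Expr] -> Expr
mkPlus = foldl Plus

mkMult :: Expr -> [Expr] -> Expr
mkMult = foldl Mult

-- Hypothetical evaluator, used only to check the tree shape.
eval :: Expr -> Integer
eval (Const n)  = n
eval (Plus a b) = eval a + eval b
eval (Mult a b) = eval a * eval b
```

For + and * the left nesting does not change the value, but it is the shape a reader of the parse tree should expect.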

Recursion (variant 2)

Interestingly, there is another solution to this problem, which avoids introducing recNonTerminal and explicitly passing around the recursive call (i.e. the exp in the example). To implement that we have to adjust our Grammar type as follows:

newtype Grammar a = G ([String] -> (BNF, RHS))

The idea is that the list of strings contains those non-terminals that we are currently defining. So in nonTerminal, we check whether the non-terminal to be introduced is currently in the process of being defined, and if so, simply ignore the body. This way, the recursion is stopped automatically:

nonTerminalG :: String -> (Grammar a) -> Grammar a
nonTerminalG name (G g) = G $ \seen ->
    if name `elem` seen
    then ([], NonTerminal name)
    else let (prods, rhs) = g (name : seen)
         in (prods ++ [(name, rhs)], NonTerminal name)

After adjusting the other primitives of Grammar (including the Functor and Applicative instances) to type-check again, and with the Descr class once more providing nonTerminal as a method, we observe that this parser/grammar generator for expressions, with genuine recursion, now works:

parseExp :: Descr f => f Expr
parseExp = nonTerminal "expr" $
    ePlus

ePlus :: Descr f => f Expr
ePlus = nonTerminal "plus" $
    mkPlus <$> eMult
           <*> many (spaces *> char '+' *> spaces *> eMult)
           <*  spaces

eMult :: Descr f => f Expr
eMult = nonTerminal "mult" $
    mkMult <$> eAtom
           <*> many (spaces *> char '*' *> spaces *> eAtom)
           <*  spaces

eAtom :: Descr f => f Expr
eAtom = nonTerminal "atom" $
    aConst `orElse` eParens parseExp

Note that the recursion is only going to work if there is at least one call to nonTerminal somewhere around the recursive calls. We still cannot implement many as naively as above.
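The adjusted Grammar primitives for this variant are not shown. A minimal self-contained sketch of how they might look follows, with RHS trimmed to the constructors needed and a hypothetical recursive grammar parensG. Passing the same seen list to both sides of <*> and orElseG, so that only enclosing nonTerminalG calls count as "in progress", is an assumption about the implementation.

```haskell
import Data.List (nub)

data RHS
  = Terminal String
  | NonTerminal String
  | Choice RHS RHS
  | Sequence RHS RHS
  | Repetition RHS
  deriving (Show, Eq)

type BNF = [(String, RHS)]

-- The argument lists the non-terminals currently being defined.
newtype Grammar a = G ([String] -> (BNF, RHS))

runGrammar :: String -> Grammar a -> BNF
runGrammar start (G g) = case g [] of
  (prods, NonTerminal nt) | nt == start -> prods
  (prods, rhs)                          -> prods ++ [(start, rhs)]

mergeProds :: BNF -> BNF -> BNF
mergeProds prods1 prods2 = nub (prods1 ++ prods2)

-- Sequence two right-hand sides, dropping the empty terminal from pure.
seqRHS :: RHS -> RHS -> RHS
seqRHS (Terminal "") r = r
seqRHS r (Terminal "") = r
seqRHS r1 r2           = Sequence r1 r2

instance Functor Grammar where
  fmap _ (G g) = G g

instance Applicative Grammar where
  pure _ = G $ \_ -> ([], Terminal "")
  G g1 <*> G g2 = G $ \seen ->
    let (prods1, rhs1) = g1 seen
        (prods2, rhs2) = g2 seen
    in  (mergeProds prods1 prods2, seqRHS rhs1 rhs2)

charG :: Char -> Grammar ()
charG c = G $ \_ -> ([], Terminal [c])

orElseG :: Grammar a -> Grammar a -> Grammar a
orElseG (G g1) (G g2) = G $ \seen ->
  let (prods1, rhs1) = g1 seen
      (prods2, rhs2) = g2 seen
  in  (mergeProds prods1 prods2, Choice rhs1 rhs2)

-- Stop at a non-terminal that is already being defined.
nonTerminalG :: String -> Grammar a -> Grammar a
nonTerminalG name (G g) = G $ \seen ->
  if name `elem` seen
  then ([], NonTerminal name)
  else let (prods, rhs) = g (name : seen)
       in  (prods ++ [(name, rhs)], NonTerminal name)

-- Hypothetical recursive grammar:  parens = '(', parens, ')' | '';
parensG :: Grammar ()
parensG = nonTerminalG "parens" $
  (charG '(' *> parensG <* charG ')') `orElseG` pure ()
```

Without the nonTerminalG call around the recursive reference, parensG would again unfold into an infinite RHS, as the remark above says.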


If you want to play more with this: The homework is to define a parser/grammar-generator for EBNF itself, as specified in this variant:

identifier = letter, {letter | digit | '-'};
spaces = {' ' | newline};
quoted-char = non-quote-or-backslash | '\\', '\\' | '\\', '\'';
terminal = '\'', {quoted-char}, '\'', spaces;
non-terminal = identifier, spaces;
option = '[', spaces, rhs, spaces, ']', spaces;
repetition = '{', spaces, rhs, spaces, '}', spaces;
group = '(', spaces, rhs, spaces, ')', spaces;
atom = terminal | non-terminal | option | repetition | group;
sequence = atom, {spaces, ',', spaces, atom}, spaces;
choice = sequence, {spaces, '|', spaces, sequence}, spaces;
rhs = choice;
production = identifier, spaces, '=', spaces, rhs, ';', spaces;
bnf = production, {production};

This grammar is set up so that the precedence of , and | is correctly implemented: a , b | c will parse as (a, b) | c.

In this syntax for BNF, terminal characters are quoted, i.e. inside '…', a ' is replaced by \' and a \ is replaced by \\ – this is done by the function quote in ppRHS.
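Round-tripping depends on the pretty-printer and the grammar agreeing on precedence and quoting. A few checks on hand-built RHS values (a, b and c are placeholder non-terminals) show that ppRHS parenthesises a choice only where it appears inside a sequence, that braces reset the precedence, and that quotes and backslashes are escaped as described:

```haskell
data RHS
  = Terminal String
  | NonTerminal String
  | Choice RHS RHS
  | Sequence RHS RHS
  | Optional RHS
  | Repetition RHS
  deriving (Show, Eq)

ppRHS :: RHS -> String
ppRHS = go 0
  where
    go _ (Terminal s)     = surround "'" "'" $ concatMap quote s
    go _ (NonTerminal s)  = s
    go a (Choice x1 x2)   = p a 1 $ go 1 x1 ++ " | " ++ go 1 x2
    go a (Sequence x1 x2) = p a 2 $ go 2 x1 ++ ", "  ++ go 2 x2
    go _ (Optional x)     = surround "[" "]" $ go 0 x
    go _ (Repetition x)   = surround "{" "}" $ go 0 x

    surround c1 c2 x = c1 ++ x ++ c2

    -- level 1 = inside a choice, level 2 = inside a sequence
    p a n | a > n     = surround "(" ")"
          | otherwise = id

    quote '\'' = "\\'"
    quote '\\' = "\\\\"
    quote c    = [c]
```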

If you do this, you should be able to round-trip with the pretty-printer, i.e. parse back what it wrote:

*Main> let bnf1 = runGrammer "expr" parseExpr
*Main> let bnf2 = runGrammer "expr" parseBNF
*Main> let f = Data.Maybe.fromJust . parse parseBNF . ppBNF
*Main> f bnf1 == bnf1
*Main> f bnf2 == bnf2

The last line is quite meta: We are using parseBNF as a parser on the pretty-printed grammar produced from interpreting parseBNF as a grammar.


We have again seen an example of the excellent support for abstraction in Haskell: Being able to define so very different things such as a parser and a grammar description with the same code is great. Type classes helped us here.

Note that it was crucial that our combined parser/grammars are only able to use the methods of Applicative, and not Monad. Applicative is less powerful, so by giving less power to the user of our Descr interface, the other side, i.e. the implementation, can be more powerful.

The reason why Applicative is ok, but Monad is not, is that in Applicative, the results do not affect the shape of the computation, whereas in Monad, the whole point of the bind operator (>>=) is that the result of the computation is used to decide the next computation. And while this is perfectly fine for a parser, it just makes no sense for a grammar generator, where there simply are no values around!

We have also seen that a phantom type, namely the parameter of Grammar, can be useful, as it lets the type system make sure we do not write nonsense. For example, the type of orElseG ensures that both grammars that are combined here indeed describe something of the same type.

  1. It seems to be the week of applicative-appraising blog posts: Brent posted a nice piece about enumerations using Applicative yesterday.

  2. I like how, in this alignment of <*> and <*, the > characters point out which arguments are passed to the function on the left.

by Joachim Breitner at October 26, 2016 04:00 AM

October 25, 2016

Brent Yorgey

Adventures in enumerating balanced brackets

Since I’ve been coaching my school’s ACM ICPC programming team, I’ve been spending a bit of time solving programming contest problems, partly to stay sharp and be able to coach them better, but also just for fun.

I recently solved a problem (using Haskell) that ended up being tougher than I thought, but I learned a lot along the way. Rather than just presenting a solution, I’d like to take you through my thought process, crazy detours and all.

Of course, I should preface this with a big spoiler alert: if you want to try solving the problem yourself, you should stop reading now!

> {-# LANGUAGE GADTs #-}
> {-# LANGUAGE DeriveFunctor #-}
> module Brackets where
> import Data.List (sort, genericLength)
> import Data.MemoTrie (memo, memo2)
> import Prelude hiding ((++))

The problem

There’s a lot of extra verbiage in the official problem description, but what it boils down to is this:

Find the mth element of the lexicographically ordered sequence of all balanced bracketings of length n.

There is a longer description at the problem page, but hopefully a few examples will suffice. A balanced bracketing is a string consisting solely of parentheses, in which opening and closing parens can be matched up in a one-to-one, properly nested way. For example, there are five balanced bracketings of length 6:

((())), (()()), (())(), ()(()), ()()()

By lexicographically ordered we just mean that the bracketings should be in “dictionary order” where ( comes before ), that is, bracketing x comes before bracketing y if and only if in the first position where they differ, x has ( and y has ). As you can verify, the list of length-6 bracketings above is, in fact, lexicographically ordered.

A first try

Oh, this is easy, I thought, especially if we consider the well-known isomorphism between balanced bracketings and binary trees. In particular, the empty string corresponds to a leaf, and (L)R (where L and R are themselves balanced bracketings) corresponds to a node with subtrees L and R. So the five balanced bracketings of length 6 correspond to the five binary trees with three nodes:

We can easily generate all the binary trees of a given size with a simple recursive algorithm. If n = 0, generate a Leaf; otherwise, decide how many nodes to put on the left and how many on the right, and for each such distribution recursively generate all possible trees on the left and right.

> data Tree where
>   Leaf :: Tree
>   Node :: Tree -> Tree -> Tree
>   deriving (Show, Eq, Ord)
> allTrees :: Int -> [Tree]
> allTrees 0 = [Leaf]
> allTrees n =
>   [ Node l r
>   | k <- [0 .. n-1]
>   , l <- allTrees ((n-1) - k)
>   , r <- allTrees k
>   ]

We generate the trees in “left-biased” order, where we first choose to put all n-1 nodes on the left, then n-2 on the left and 1 on the right, and so on. Since a subtree on the left will result in another opening paren, but a subtree on the right will result in a closing paren followed by an open paren, it makes intuitive sense that this corresponds to generating bracketings in sorted order. You can see that the size-3 trees above, generated in left-biased order, indeed have their bracketings sorted.

Writing allTrees is easy enough, but it’s definitely not going to cut it: the problem states that we could have up to n = 1000. The number of trees with 1000 nodes has 598 digits (!!), so we can’t possibly generate the entire list and then index into it. Instead we need a function that can more efficiently generate the tree with a given index, without having to generate all the other trees before it.
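The counts involved are the Catalan numbers. As a sanity check, the self-contained sketch below (using an ordinary data declaration for Tree instead of GADT syntax) compares the length of allTrees n with the closed form C_n = (2n)! / (n! (n+1)!) for small n, and confirms the size of C_1000 without enumerating anything. catalan is a hypothetical helper, not part of the original solution.

```haskell
data Tree = Leaf | Node Tree Tree
  deriving (Show, Eq, Ord)

allTrees :: Int -> [Tree]
allTrees 0 = [Leaf]
allTrees n =
  [ Node l r
  | k <- [0 .. n-1]
  , l <- allTrees ((n-1) - k)
  , r <- allTrees k
  ]

-- Catalan numbers via the closed form: binomial(2n, n) / (n + 1).
catalan :: Integer -> Integer
catalan n = binom (2 * n) n `div` (n + 1)
  where
    binom m k = product [m - k + 1 .. m] `div` product [1 .. k]
```

Computing a single Catalan number like this is cheap even for n = 1000; it is only listing the trees that is hopeless.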

So I immediately launched into writing such a function, but it’s tricky to get right. It involves computing Catalan numbers, and cumulative sums of products of Catalan numbers, and divMod, and… I never did get that function working properly.

The first epiphany

But I never should have written that function in the first place! What I should have done first was to do some simple tests just to confirm my intuition that left-biased tree order corresponds to sorted bracketing order. Because if I had, I would have found this:

> brackets :: Tree -> String
> brackets Leaf       = ""
> brackets (Node l r) = mconcat ["(", brackets l, ")", brackets r]
> sorted :: Ord a => [a] -> Bool
> sorted xs = xs == sort xs
ghci> sorted (map brackets (allTrees 3))
True

ghci> sorted (map brackets (allTrees 4))
False

As you can see, my intuition actually led me astray! n = 3 is a small enough case that left-biased order just happens to be the same as sorted bracketing order, but for n = 4 this breaks down. Let’s see what goes wrong:
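To pin down where the two orders part ways for n = 4, one can compare the generated list with its sorted version position by position. firstMismatch is a hypothetical helper, and brackets uses concat in place of mconcat. In this sketch the first disagreement is at (0-based) position 3, where the sorted order expects ((()))(), the tree whose left spine has three nodes, while the left-biased order only produces it at position 5.

```haskell
import Data.List (sort)

data Tree = Leaf | Node Tree Tree
  deriving (Show, Eq)

allTrees :: Int -> [Tree]
allTrees 0 = [Leaf]
allTrees n =
  [ Node l r
  | k <- [0 .. n-1]
  , l <- allTrees ((n-1) - k)
  , r <- allTrees k
  ]

brackets :: Tree -> String
brackets Leaf       = ""
brackets (Node l r) = concat ["(", brackets l, ")", brackets r]

-- Hypothetical helper: the first (generated, expected) pair of bracketings
-- at which left-biased order and sorted order disagree.
firstMismatch :: Int -> Maybe (String, String)
firstMismatch n =
  case [ (g, s) | (g, s) <- zip generated (sort generated), g /= s ] of
    []      -> Nothing
    (m : _) -> Just m
  where
    generated = map brackets (allTrees n)
```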

In the top row are the size-4 trees in “left-biased” order, i.e. the order generated by allTrees. You can see it is nice and symmetric: reflecting the list across a vertical line leaves it unchanged. On the bottom row are the same trees, but sorted lexicographically by their bracketings. You can see that the lists are almost the same except the red tree is in a different place. The issue is the length of the left spine: the red tree has a left spine of three nodes, which means its bracketing will begin with (((, so it should come before any trees with a left spine of length 2, even if they have all their nodes in the left subtree (whereas the red tree has one of its nodes in the right subtree).

My next idea was to try to somehow enumerate trees in order by the length of their left spine. But since I hadn’t even gotten indexing into the original left-biased order to work, it seemed hopeless to get this to work by implementing it directly. I needed some bigger guns.

Building enumerations

At this point I had the good idea to introduce some abstraction. I defined a type of enumerations (à la FEAT or data/enumerate):

> data Enumeration a = Enumeration
>   { fromNat :: Integer -> a
>   , size    :: Integer
>   }
>   deriving Functor
> enumerate :: Enumeration a -> [a]
> enumerate (Enumeration f n) = map f [0..n-1]

An Enumeration consists of a size along with a function Integer -> a, which we think of as being defined on [0 .. size-1]. That is, an Enumeration is isomorphic to a finite list of a given length, where instead of explicitly storing the elements, we have a function which can compute the element at a given index on demand. If the enumeration has some nice combinatorial structure, then we expect that this on-demand indexing can be done much more efficiently than simply listing all the elements. The enumerate function simply turns an Enumeration into the corresponding finite list, by mapping the indexing function over all possible indices.

Note that Enumeration has a natural Functor instance, which GHC can automatically derive for us. Namely, if e is an Enumeration, then fmap f e is the Enumeration which first computes the element of e for a given index, and then applies f to it before returning.

Now, let’s define some combinators for building Enumerations. We expect them to have all the nice algebraic flavor of finite lists, aka free monoids.

First, we can create empty or singleton enumerations, or convert any finite list into an enumeration:

> empty :: Enumeration a
> empty = Enumeration (const undefined) 0
> singleton :: a -> Enumeration a
> singleton a = Enumeration (\_ -> a) 1
> list :: [a] -> Enumeration a
> list as = Enumeration (\n -> as !! fromIntegral n) (genericLength as)
ghci> enumerate (empty :: Enumeration Int)
[]
ghci> enumerate (singleton 3)
[3]
ghci> enumerate (list [4,6,7])
[4,6,7]

We can form the concatenation of two enumerations. The indexing function compares the given index against the size of the first enumeration, and then indexes into the first or second enumeration appropriately. For convenience we can also define union, which is just an iterated version of (++).

> (++) :: Enumeration a -> Enumeration a -> Enumeration a
> e1 ++ e2 = Enumeration
>   (\n -> if n < size e1 then fromNat e1 n else fromNat e2 (n - size e1))
>   (size e1 + size e2)
> union :: [Enumeration a] -> Enumeration a
> union = foldr (++) empty
ghci> enumerate (list [3, 5, 6] ++ empty ++ singleton 8)
[3,5,6,8]

Finally, we can form a Cartesian product: e1 >< e2 is the enumeration of all possible pairs of elements from e1 and e2, ordered so that all the pairs formed from the first element of e1 come first, followed by all the pairs with the second element of e1, and so on. The indexing function divides the given index by the size of e2, and uses the quotient to index into e1, and the remainder to index into e2.

> (><) :: Enumeration a -> Enumeration b -> Enumeration (a,b)
> e1 >< e2 = Enumeration
>   (\n -> let (l,r) = n `divMod` size e2 in (fromNat e1 l, fromNat e2 r))
>   (size e1 * size e2)
ghci> enumerate (list [1,2,3] >< list [10,20])
[(1,10),(1,20),(2,10),(2,20),(3,10),(3,20)]
ghci> let big = list [0..999] >< list [0..999] >< list [0..999] >< list [0..999]
ghci> fromNat big 2973428654
(((2,973),428),654)

Notice in particular how the fourfold product of list [0..999] has 1000^4 = 10^12 elements, but indexing into it with fromNat is basically instantaneous: a lookup only performs one divMod per factor, plus a short list lookup in each list [0..999].
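The literate code above defines its own (++) and empty, which clash with Prelude and Control.Applicative names unless those are hidden. For readers who want to check the fourfold-product claim in a plain GHCi session, here is a minimal self-contained sketch. It is our own trimmed copy containing just Enumeration, enumerate, list and (><), not the post's full module. Since (><) has no fixity declaration here, it gets the default infixl 9, so the pairs nest to the left.

```haskell
import Data.List (genericLength)

-- A trimmed, self-contained copy of the Enumeration type from the post.
data Enumeration a = Enumeration
  { fromNat :: Integer -> a  -- element at a given index in [0 .. size - 1]
  , size    :: Integer       -- number of elements
  }

-- Turn an enumeration back into the finite list it represents.
enumerate :: Enumeration a -> [a]
enumerate (Enumeration f n) = map f [0 .. n - 1]

-- Enumerate the elements of a finite list.
list :: [a] -> Enumeration a
list as = Enumeration (\n -> as !! fromIntegral n) (genericLength as)

-- Cartesian product: split the index with divMod by the size of the right factor.
(><) :: Enumeration a -> Enumeration b -> Enumeration (a, b)
e1 >< e2 = Enumeration
  (\n -> let (l, r) = n `divMod` size e2 in (fromNat e1 l, fromNat e2 r))
  (size e1 * size e2)

-- Four copies of [0..999]; with the default infixl 9 fixity the pairs nest left.
big :: Enumeration (((Integer, Integer), Integer), Integer)
big = thousand >< thousand >< thousand >< thousand
  where
    thousand = list [0 .. 999]
```

Evaluating `fromNat big 2973428654` peels off 654, then 428, then splits 2973 into (2, 973). Each step is a single divMod, so the 10^12 elements are never materialised.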

Since Enumerations are isomorphic to finite lists, we expect them to have Applicative and Monad instances, too. First, the Applicative instance is fairly straightforward:

> instance Applicative Enumeration where
>   pure    = singleton
>   f <*> x = uncurry ($) <$> (f >< x)
ghci> enumerate $ (*) <$> list [1,2,3] <*> list [10, 100]
[10,100,20,200,30,300]

pure creates a singleton enumeration, and applying an enumeration of functions to an enumeration of arguments works by taking a Cartesian product and then applying each pair.

The Monad instance works by substitution: in e >>= f, the continuation f is applied to each element of the enumeration e, and the resulting enumerations are unioned together in order.

> instance Monad Enumeration where
>   return  = pure
>   e >>= f = union (map f (enumerate e))
ghci> enumerate $ list [1,2,3] >>= \i -> list (replicate i i)
[1,2,2,3,3,3]

Having to actually enumerate the elements of e is a bit unsatisfying, but there is really no way around it: we otherwise have no way to know how big the resulting enumerations are going to be.

Now, remember the function I tried (and failed) to write earlier, the one that generates the tree at a particular index in left-biased order? Using these enumeration combinators, it’s a piece of cake. Since we built up combinators that mirror those available for lists, writing this indexing version is just as easy as writing the original allTrees function (which I’ve copied below for comparison):

allTrees :: Int -> [Tree]
allTrees 0 = [Leaf]
allTrees n =
  [ Node l r
  | k <- [0 .. n-1]
  , l <- allTrees ((n-1) - k)
  , r <- allTrees k
  ]

> enumTrees :: Int -> Enumeration Tree
> enumTrees 0 = singleton Leaf
> enumTrees n = union
>   [ Node <$> enumTrees (n-k-1) <*> enumTrees k
>   | k <- [0 .. n-1]
>   ]

(enumTrees and allTrees look a bit different, but actually allTrees can be rewritten in a very similar style:

allTrees :: Int -> [Tree]
allTrees 0 = [Leaf]
allTrees n = concat
  [ Node <$> allTrees ((n-1) - k) <*> allTrees k
  | k <- [0 .. n-1]
  ]

Doing as much as possible using the Applicative interface gives us added “parallelism”, which in this case means the ability to index directly into a product with divMod, rather than scanning through the results of calling a function on enumerate until we have accumulated the right size. See the paper on the GHC ApplicativeDo extension.)

Let’s try it out:

ghci> enumerate (enumTrees 3)
  [Node (Node (Node Leaf Leaf) Leaf) Leaf,Node (Node Leaf (Node Leaf Leaf)) Leaf,Node (Node Leaf Leaf) (Node Leaf Leaf),Node Leaf (Node (Node Leaf Leaf) Leaf),Node Leaf (Node Leaf (Node Leaf Leaf))]

ghci> enumerate (enumTrees 3) == allTrees 3
True
ghci> enumerate (enumTrees 7) == allTrees 7
True

ghci> brackets $ fromNat (enumTrees 7) 43

It seems to work! Though actually, if we try larger values of n, enumTrees just seems to hang. The problem is that it ends up making many redundant recursive calls. Well… nothing a bit of memoization can’t fix! (Here I’m using Conal Elliott’s nice MemoTrie package.)

> enumTreesMemo :: Int -> Enumeration Tree
> enumTreesMemo = memo enumTreesMemo'
>   where
>     enumTreesMemo' 0 = singleton Leaf
>     enumTreesMemo' n = union
>       [ Node <$> enumTreesMemo (n-k-1) <*> enumTreesMemo k
>       | k <- [0 .. n-1]
>       ]
ghci> size (enumTreesMemo 10)
16796
ghci> size (enumTreesMemo 100)

ghci> size (enumTreesMemo 1000)

ghci> brackets $ fromNat (enumTreesMemo 1000) 8234587623904872309875907638475639485792863458726398487590287348957628934765

That’s better!
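If you would rather not depend on MemoTrie, a lazily built list can play the role of the memo table, because the enumeration for size n only refers to strictly smaller sizes. The sketch below is a self-contained variant of enumTreesMemo under that assumption. The names treeTable, append and emptyE are ours. Since treeTable is a top-level constant, each entry is built once and then shared by every later lookup.

```haskell
{-# LANGUAGE DeriveFunctor #-}

data Tree = Leaf | Node Tree Tree
  deriving (Eq, Show)

data Enumeration a = Enumeration
  { fromNat :: Integer -> a
  , size    :: Integer
  }
  deriving Functor

enumerate :: Enumeration a -> [a]
enumerate (Enumeration f n) = map f [0 .. n - 1]

singleton :: a -> Enumeration a
singleton a = Enumeration (const a) 1

emptyE :: Enumeration a
emptyE = Enumeration (const (error "index into empty enumeration")) 0

-- Concatenation, named `append` here to avoid clashing with Prelude's (++).
append :: Enumeration a -> Enumeration a -> Enumeration a
append e1 e2 = Enumeration
  (\n -> if n < size e1 then fromNat e1 n else fromNat e2 (n - size e1))
  (size e1 + size e2)

union :: [Enumeration a] -> Enumeration a
union = foldr append emptyE

-- Same behaviour as the post's instance: index the product with divMod.
instance Applicative Enumeration where
  pure = singleton
  ef <*> ex = Enumeration
    (\n -> let (l, r) = n `divMod` size ex in fromNat ef l (fromNat ex r))
    (size ef * size ex)

-- treeTable !! n enumerates the trees with n internal nodes.
-- Entry n only looks at entries below n, so the lazy list is well founded.
treeTable :: [Enumeration Tree]
treeTable = map build [0 ..]
  where
    build :: Int -> Enumeration Tree
    build 0 = singleton Leaf
    build n = union
      [ Node <$> treeTable !! (n - k - 1) <*> treeTable !! k
      | k <- [0 .. n - 1]
      ]

enumTreesMemo :: Int -> Enumeration Tree
enumTreesMemo n = treeTable !! n
```

The trade-off is that `!!` costs O(n) per lookup, which is negligible next to building the enumerations. MemoTrie's `memo` gives the same sharing without hand-writing the table.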

A second try

At this point, I thought that I needed to enumerate trees in order by the length of their left spine. Given a tree with a left spine of length s, we enumerate all the ways to partition the remaining n-s elements among the right children of the s spine nodes, preferring to first put elements as far to the left as possible. As you’ll see, this turns out to be wrong, but it’s fun to see how easy it is to write this using the enumeration framework.

First, we need an enumeration of the partitions of a given n into exactly k parts (where order matters and parts may be zero), in reverse lexicographic order.

> kPartitions :: Int -> Int -> Enumeration [Int]

There is exactly one way to partition 0 into zero parts.

> kPartitions 0 0 = singleton []

We can’t partition anything other than 0 into zero parts.

> kPartitions _ 0 = empty

Otherwise, pick a number i from n down to 0 to go in the first spot, and then recursively enumerate partitions of n-i into exactly k-1 parts.

> kPartitions n k = do
>   i <- list [n, n-1 .. 0]
>   (i:) <$> kPartitions (n-i) (k-1)

Let’s try it:

ghci> let p43 = enumerate $ kPartitions 4 3
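ghci> p43
[[4,0,0],[3,1,0],[3,0,1],[2,2,0],[2,1,1],[2,0,2],[1,3,0],[1,2,1],[1,1,2],[1,0,3],[0,4,0],[0,3,1],[0,2,2],[0,1,3],[0,0,4]]
ghci> all ((==3) . length) p43
True
ghci> all ((==4) . sum) p43
True
ghci> sorted (reverse p43)
True
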
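As a cross-check on these results, the same recursion can be run over plain lists. kPartitionsL and countKPartitions are our own names, not from the post. A standard stars-and-bars argument predicts C(n+k-1, k-1) such ordered splits for k ≥ 1, which gives C(6,2) = 15 for n = 4 and k = 3, matching the length of p43.

```haskell
-- Ordered ways to split n into exactly k non-negative parts, largest first
-- part first: the same recursion as kPartitions, over plain lists.
kPartitionsL :: Int -> Int -> [[Int]]
kPartitionsL 0 0 = [[]]
kPartitionsL _ 0 = []
kPartitionsL n k =
  [ i : rest | i <- [n, n - 1 .. 0], rest <- kPartitionsL (n - i) (k - 1) ]

-- Stars and bars: there are C(n + k - 1, k - 1) such splits, for k >= 1.
countKPartitions :: Int -> Int -> Integer
countKPartitions n k = choose (toInteger (n + k - 1)) (toInteger (k - 1))
  where
    choose a b = product [a - b + 1 .. a] `div` product [1 .. b]
```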

Now we can use kPartitions to build our enumeration of trees:

> spinyTrees :: Int -> Enumeration Tree
> spinyTrees = memo spinyTrees'
>   where
>     spinyTrees' 0 = singleton Leaf
>     spinyTrees' n = do
>       -- Pick the length of the left spine
>       spineLen  <- list [n, n-1 .. 1]
>       -- Partition the remaining elements among the spine nodes
>       bushSizes <- kPartitions (n - spineLen) spineLen
>       bushes <- traverse spinyTrees bushSizes
>       return $ buildSpine (reverse bushes)
>     buildSpine :: [Tree] -> Tree
>     buildSpine []     = Leaf
>     buildSpine (b:bs) = Node (buildSpine bs) b

This appears to give us something reasonable:

ghci> size (spinyTrees 7) == size (enumTreesMemo 7)
True

But it’s pretty slow, which is to be expected given all the monadic operations involved. And there’s more:

ghci> sorted . map brackets . enumerate $ spinyTrees 3
True
ghci> sorted . map brackets . enumerate $ spinyTrees 4
True
ghci> sorted . map brackets . enumerate $ spinyTrees 5
False

Foiled again! All we did was stave off failure a bit, until n=5. I won’t draw all the trees of size 5 for you, but the failure mode is pretty similar: picking subtrees for the spine based just on how many elements they have doesn’t work, because there are cases where we want to first shift some elements to a later subtree, keeping the left spine of a subtree, before moving the elements back and having a shorter left spine.

The solution: just forget about trees, already

It finally occurred to me that there was nothing in the problem statement that said anything about trees. That was just something my overexcited combinatorial brain imposed on it: obviously, since there is a bijection between balanced bracketings and binary trees, we should think about binary trees, right? …well, there is also a bijection between balanced bracketings and permutations avoiding (231), and lattice paths that stay above the main diagonal, and hundreds of other things, so… not necessarily.

In this case, I think trees just end up making things harder. Let’s think instead about enumerating balanced bracket sequences directly. To do it recursively, we need to know how to enumerate possible endings to the start of any balanced bracket sequence. That is, we need to enumerate sequences containing n opening brackets and c extra closing brackets (so n+c closing brackets in total), which can be appended to a sequence of brackets with c more opening brackets than closing brackets.

Given this idea, the code is fairly straightforward:

> enumBrackets :: Int -> Enumeration String
> enumBrackets n = enumBracketsTail n 0
> enumBracketsTail :: Int -> Int -> Enumeration String
> enumBracketsTail = memo2 enumBracketsTail'
>   where

To enumerate a sequence with no opening brackets, just generate c closing brackets.

>     enumBracketsTail' 0 c = singleton (replicate c ')')

To enumerate balanced sequences with n opening brackets and an exactly matching number of closing brackets, start by generating an opening bracket and then continue by generating sequences with n-1 opening brackets and one extra closing bracket to match the opening bracket we started with.

>     enumBracketsTail' n 0 = ('(':) <$> enumBracketsTail (n-1) 1

In general, a sequence with n opening and c extra closing brackets is either an opening bracket followed by an (n-1, c+1)-sequence, or a closing bracket followed by an (n, c-1)-sequence.

>     enumBracketsTail' n c =
>         (('(':) <$> enumBracketsTail (n-1) (c+1))
>         ++
>         ((')':) <$> enumBracketsTail n (c-1))

This is quite fast, and as a quick check, it does indeed seem to give us the same size enumerations as the other tree enumerations:

ghci> fromNat (enumBrackets 40) 16221270422764920820

ghci> size (enumBrackets 100) == size (enumTreesMemo 100)
True

But, are they sorted? It would seem so!

ghci> all sorted (map (enumerate . enumBrackets) [1..10])
True

At this point, you might notice that this can be easily de-abstracted into a fairly simple dynamic programming solution, using a 2D array to keep track of the size of the enumeration for each (n,c) pair. I’ll leave the details to interested readers.
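Here is one way that de-abstraction might look. This is a sketch of our own, not the author's code; counts and unrank are hypothetical names. A lazily defined array holds the size of each (n,c) enumeration. unrank then walks the same two-way choice as enumBracketsTail, comparing the index against the size of the '(' branch to decide which way to go.

```haskell
import Data.Array

-- counts maxN ! (n, c): the number of bracket strings with n opening and n + c
-- closing brackets that validly continue a prefix with c unmatched '('.
-- Only entries with n + c <= maxN are ever demanded.
counts :: Int -> Array (Int, Int) Integer
counts maxN = table
  where
    bnds  = ((0, 0), (maxN, maxN))
    table = listArray bnds [ cnt n c | (n, c) <- range bnds ]
    cnt 0 _ = 1                                           -- only ')' left
    cnt n 0 = table ! (n - 1, 1)                          -- must open
    cnt n c = table ! (n - 1, c + 1) + table ! (n, c - 1) -- open or close

-- unrank n i: the i-th (0-based) balanced string with n pairs of brackets,
-- in lexicographic order with '(' < ')'.
unrank :: Int -> Integer -> String
unrank maxN = go maxN 0
  where
    table = counts maxN
    go 0 c _ = replicate c ')'
    go n 0 i = '(' : go (n - 1) 1 i
    go n c i
      | i < opens = '(' : go (n - 1) (c + 1) i
      | otherwise = ')' : go n (c - 1) (i - opens)
      where
        opens = table ! (n - 1, c + 1)  -- size of the '(' branch
```

Building the table takes O(n^2) additions of big integers. Each unranking step is then a single table lookup, so the index can be far too large to enumerate up to.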

by Brent at October 25, 2016 04:42 AM

October 22, 2016

Douglas M. Auclair (geophf)

September 2016 1HaskellADay 1Liners Problems and Solutions

  • September 15th, 2016:
    Given [1..n], create an infinite list of lists [[1.. n], [n+1 ... n+n], [n+n+1 ... 3n], ...]
    counting :: [Integer] -> [[Integer]]
    • joomy @cattheory
      counting = (map . (+) . fromIntegral . length) >>= iterate
  • September 30th, 2016: The reverse of August's one-liner:
    f :: (Maybe a, b) -> Maybe (a,b)
    define f. Snaps for elegance.
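For readers who want to play with these, here is a small sketch. It gives joomy's counting answer with a note on why it works, and one possible answer to the September 30th puzzle. The latter is our own and not necessarily the one posted in the original thread.

```haskell
import Data.Tuple (swap)

-- joomy's answer from above. In the function monad, (g >>= h) xs = h (g xs) xs,
-- so this is iterate (map (+ length xs)) xs.
counting :: [Integer] -> [[Integer]]
counting = (map . (+) . fromIntegral . length) >>= iterate

-- One possible answer to the September 30th puzzle (ours, not the posted one).
-- sequenceA on a pair only traverses its second component, so swap the Maybe
-- into that position, sequence, then swap the result back.
f :: (Maybe a, b) -> Maybe (a, b)
f = fmap swap . sequenceA . swap
```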

by geophf at October 22, 2016 02:00 PM

October 21, 2016

Edwin Brady

State Machines All The Way Down

A new draft paper, State Machines All The Way Down, which describes an architecture for dependently typed functional programs. Abstract:

A useful pattern in dependently typed programming is to define a state transition system, for example the states and operations in a network protocol, as a parameterised monad. We index each operation by its input and output states, thus guaranteeing that operations satisfy pre- and post-conditions, by typechecking. However, what if we want to write a program using several systems at once? What if we want to define a high level state transition system, such as a network application protocol, in terms of lower level states, such as network sockets and mutable variables? In this paper, I present an architecture for dependently typed applications based on a hierarchy of state transition systems, implemented as a library called states. Using states, I show: how to implement a state transition system as a dependent type, with type level guarantees on its operations; how to account for operations which could fail; how to combine state transition systems into a larger system; and, how to implement larger systems as a hierarchy of state transition systems. As an example, I implement a simple high level network application protocol.
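The basic idea of indexing each operation by its input and output states, so that the type checker enforces pre- and post-conditions, can be approximated in Haskell with GADTs and DataKinds. The sketch below is our own toy analogue using a hypothetical door protocol. It is not the paper's Idris states library, and it covers only a single state machine, not the composition and hierarchies the paper is about.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}

-- The states of a hypothetical door, promoted to the type level by DataKinds.
data DoorState = Opened | Closed

-- A command indexed by the door's state before and after it runs.
data DoorCmd (pre :: DoorState) (post :: DoorState) a where
  Open  :: DoorCmd 'Closed 'Opened ()
  Close :: DoorCmd 'Opened 'Closed ()
  Knock :: DoorCmd 'Closed 'Closed ()
  Pure  :: a -> DoorCmd s s a
  Bind  :: DoorCmd s1 s2 a -> (a -> DoorCmd s2 s3 b) -> DoorCmd s1 s3 b

-- Run one command, then another that starts in the state the first left behind.
(>>>) :: DoorCmd s1 s2 a -> DoorCmd s2 s3 b -> DoorCmd s1 s3 b
m >>> k = Bind m (const k)

-- Interpret a command as its result plus a log of the actions taken.
runDoor :: DoorCmd pre post a -> (a, [String])
runDoor Open       = ((), ["open"])
runDoor Close      = ((), ["close"])
runDoor Knock      = ((), ["knock"])
runDoor (Pure x)   = (x, [])
runDoor (Bind m k) =
  case runDoor m of
    (x, log1) -> case runDoor (k x) of
      (y, log2) -> (y, log1 ++ log2)

-- Well typed: starts and ends with the door closed.
visit :: DoorCmd 'Closed 'Closed ()
visit = Knock >>> Open >>> Close

-- Ill typed, so it is left commented out: the second Open needs a closed door.
-- broken = Open >>> Open
```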

Comments welcome! You can get the draft here.

by edwinb at October 21, 2016 11:48 PM

October 20, 2016

Roman Cheplyaka

Mean-variance ceiling

Today I was playing with the count data from a small RNA-Seq experiment performed in Arabidopsis thaliana.

At some point, I decided to look at the mean-variance relationship for the fragment counts. As I said, the dataset is small; there are only 3 replicates per condition from which to estimate the variance. Moreover, each sample is from a different batch. I wasn’t expecting to see much.

But there was a pattern in the mean-variance plot that was impossible to miss.

[Figure: Mean-variance plot of counts per million, log-log scale]

It is a nice straight line that many points lie on, but none dare to cross. A ceiling.

The ceiling looked mysterious at first, but then I found a simple explanation. The sample variance of \(n\) numbers \(a_1,\ldots,a_n\) can be written as

\[\sigma^2=\frac{n}{n-1}\left(\frac1n\sum_{i=1}^n a_i^2-\mu^2\right),\]

where \(\mu\) is the sample mean. Thus,

\[\frac{\sigma^2}{\mu^2}=\frac{\sum a_i^2}{(n-1)\mu^2}-\frac{n}{n-1}.\]

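For non-negative numbers, \(n^2\mu^2=(\sum a_i)^2\geq \sum a_i^2\), and therefore

\[\frac{\sigma^2}{\mu^2}\leq\frac{n^2\mu^2}{(n-1)\mu^2}-\frac{n}{n-1}=\frac{n^2-n}{n-1}=n.\]

This means that on a log-log plot, all points \((\mu,\sigma^2)\) lie on or below the line \(y=2x+\log n\), where \(x=\log\mu\) and \(y=\log\sigma^2\).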

Moreover, the points that lie exactly on the line correspond to the samples where all \(a_i\) but one are zero. In other words, those are gene-condition combinations where the gene’s transcripts were registered in a single replicate for that condition.
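A quick numerical check of the bound follows. It is our own sketch and uses the same (n-1)-denominator sample variance as the formulas above. For three values the ratio \(\sigma^2/\mu^2\) reaches exactly \(n=3\) when two of them are zero, and stays below 3 otherwise.

```haskell
-- Sample mean of a non-empty list.
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- Sample variance with the n - 1 denominator, matching the formula above.
sampleVar :: [Double] -> Double
sampleVar xs = sum [ (x - m) ^ (2 :: Int) | x <- xs ] / fromIntegral (length xs - 1)
  where
    m = mean xs

-- sigma^2 / mu^2; for non-negative data with mu > 0 this is at most n.
ratio :: [Double] -> Double
ratio xs = sampleVar xs / mean xs ^ (2 :: Int)
```

For example, [0, 0, 6] has mean 2 and variance (4 + 4 + 16) / 2 = 12, so the ratio is 12 / 4 = 3, exactly on the ceiling.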

October 20, 2016 08:00 PM

October 19, 2016

Roman Cheplyaka

The rule of 17 in volleyball

Scott Adams, the author of Dilbert, writes in his book “How to Fail at Almost Everything and Still Win Big”:

Recently I noticed that the high-school volleyball games I attended in my role as stepdad were almost always won by the team that reached seventeen first, even though the winning score is twenty-five and you have to win by two.

It’s common for the lead to change often during a volleyball match, and the team that first reaches seventeen might fall behind a few more times before winning, which makes the pattern extra strange.

Good observation, Scott! But why could it be so?

Scott offers two possible explanations. One is psychological: the leading team has a higher morale while the losing team feels defeated. The other is that perhaps the coach of the losing team sees this as an opportunity to let his bench players on the court.

While these reasons sound plausible to me, there is a simpler logical explanation. It would hold even if the players and coaches were robots.

Imagine that you enter a gym where a game is being played. You see the current score: 15:17. If you know nothing else about the teams except their current score, which one do you think is more likely to win the set?

There are two reasons to think it is the leading team:

  1. The score by itself doesn’t offer much evidence that the leading team is stronger or in a better shape. However, if one of the teams is stronger, it is more likely to be the leading team.
  2. Even without assuming anything about how good the teams are, the leading team at this moment is up for an easier task: it needs only 8 points to win, whereas the team behind needs 10 points.

To quantify the reliability of Scott Adams’s “rule of 17”, I wrote a simple simulation in R:

# simulate_set is a placeholder name; the original function name was lost
simulate_set <- function(prob, threshold) {
  score <- c(0,0)
  leader <- NA
  serving <- 1
  while (all(score < 25) || abs(diff(score)) < 2) {
    winner <-
      if (as.logical(rbinom(1,1,prob[[serving]])))
        serving
      else
        3 - serving
    score[[winner]] <- score[[winner]] + 1
    serving <- winner
    if (is.na(leader) && any(score == threshold)) {
      leader <- which.max(score)
    }
  }
  return(c(leader, which.max(score)))
}

Here prob is a 2-dimensional vector \((p_1,p_2)\), where \(p_i\) is the probability of team \(i\) to win their serve against the opposing team. The function simulates a single set and returns two numbers: which team first scored threshold (e.g. 17) points and which team eventually won. If the two numbers are equal, the rule worked in this game.
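For readers following along in Haskell, here is a rough port of the same simulation. It is our own sketch: simulateSet, reliability and nextUniform are hypothetical names, and we read prob[[serving]] as the probability that the serving team wins the rally. A tiny linear congruential generator keeps it dependency-free; for real experiments a proper random number package would be preferable.

```haskell
import Data.Bits (shiftR)
import Data.Word (Word64)

-- A tiny 64-bit linear congruential generator (Knuth's MMIX constants), used
-- only so that this sketch needs nothing beyond base; not a high-quality RNG.
nextUniform :: Word64 -> (Double, Word64)
nextUniform s = (fromIntegral (s' `shiftR` 11) / 2 ^ (53 :: Int), s')
  where
    s' = 6364136223846793005 * s + 1442695040888963407

-- Simulate one set. (p1, p2): probability that team 1 / team 2 wins a rally it
-- serves. Returns ((team that first reached threshold, winner), next seed).
simulateSet :: (Double, Double) -> Int -> Word64 -> ((Int, Int), Word64)
simulateSet (p1, p2) threshold = go (0, 0) Nothing 1
  where
    go :: (Int, Int) -> Maybe Int -> Int -> Word64 -> ((Int, Int), Word64)
    go (a, b) leader serving seed
      | max a b >= 25 && abs (a - b) >= 2 =
          ((maybe 0 id leader, if a > b then 1 else 2), seed)
      | otherwise =
          let (u, seed') = nextUniform seed
              p          = if serving == 1 then p1 else p2
              winner     = if u < p then serving else 3 - serving
              (a', b')   = if winner == 1 then (a + 1, b) else (a, b + 1)
              leader'    = case leader of
                Nothing | a' == threshold -> Just 1
                        | b' == threshold -> Just 2
                _ -> leader
          in go (a', b') leader' winner seed'

-- Fraction of n simulated sets won by the team that first reached threshold.
reliability :: (Double, Double) -> Int -> Int -> Word64 -> Double
reliability probs threshold n seed0 =
  fromIntegral (length (filter (uncurry (==)) (take n (outcomes seed0))))
    / fromIntegral n
  where
    outcomes s = let (r, s') = simulateSet probs threshold s in r : outcomes s'
```

Threading the seed explicitly keeps every function pure, so a given seed always reproduces the same sequence of sets.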

Then I simulated a game 1000 times for each of many combinations of \(p_1\) and \(p_2\) and calculated the fraction of the games where the rule worked. Here’s the result:


When \(p_1=p_2\), the reliability of the rule is independent of the values of \(p_1\) and \(p_2\) (within the tested limits of \(0.3\) and \(0.7\)) and equals approximately \(81\%\). This is entirely due to reason 2: all else being equal, the leading team has a head start.

When teams are unequal, reason 1 kicks in, and for large inequalities, the reliability of the rule approaches \(1\). For instance, when \(p_1=0.3\) and \(p_2=0.7\), the rule works about \(99\%\) of the time.

Is there anything magical about the number 17? No, we would expect the rule to work for any threshold at least to some extent. The reliability would grow from somewhere around \(50\%\) for the threshold of \(1\) to almost \(100\%\) for the threshold of \(25\).

And indeed, this is what we observe (for \(p_1=p_2\)):


This reminds me of the men’s gold medal match at the 2012 London Olympics, where Russia played against Brazil. Russia loses the first two sets. A match lasts until one of the teams wins 3 sets in total, so Russia cannot afford to lose a single set now. In the third set, Brazil continues to lead, reaching 17 (and then 18) points while Russia has 15. Several minutes later, Brazil leads 22:19.

And then, against all odds, the Russian team wins that set 29:27, then the two following sets, and gets the gold.

[Figure: Dmitriy Muserskiy is about to score the gold medal point]

October 19, 2016 08:00 PM

October 18, 2016

Philip Wadler

Papers We Love Remote Meetup: John Reynolds, Definitional Interpreters for Higher-Order Languages

I will reprise my June presentation to Papers We Love London at Papers We Love Remote Meetup 2, today at 7pm UK time, with the subject John Reynolds, Definitional Interpreters for Higher-Order Languages. Learn the origins of denotational semantics and continuations. Additional citations here. See you there!

by Philip Wadler at October 18, 2016 02:34 PM

October 17, 2016

Ken T Takusagawa

[uitadwod] Stackage

Stackage for Haskell packages has the curious behavior that packages can disappear from it even though they were perfectly fine.  The cause of such a disappearance of, say, a package B is as follows: package B was originally pulled in as a dependency of another package A, and the maintainer of package A quit, so package A and all its dependencies, including package B, are candidates to be removed from Stackage.  Package B survives only if it has a direct maintainer in Stackage or is a dependency of another maintained package.

Inspired by the many packages that got dropped when lambdabot got removed from Stackage nightly, e.g., brainfuck.

Although the stated goal of Stackage is a curated collection of Haskell packages, each with an explicit maintainer willing to fix bugs and compilation problems (e.g., with new versions of GHC), I have found that a side feature is more useful: the identification of a large mutually compatible collection of packages without version dependency problems.  Such a side feature -- such a collection -- could be computed automatically without having to have a direct or indirect maintainer for each package in the collection.  I wish such a larger collection existed.

Start with, say, Stackage Nightly and expand it to include every package in Hackage that compiles cleanly and is compatible with Stackage Nightly and with every other package in the expanded collection.  There may be tricky cases of mutually incompatible packages in a potential expanded set which will need to be resolved, e.g., the newest version of A requires an old version of B, and the newest version of B requires an old version of A.  Perhaps resolve such conflicts in favor of the choice which causes the expanded set to be as large as possible.

Tangentially, how can one safely build a package (to test whether it compiles cleanly) if one is not sure whether a package's build script is evil?  Probably some kind of operating system container or sandbox.  Identify packages which use simple, presumably safe, build mechanisms, probably pure Haskell, versus packages which do something unusual, e.g., call a Makefile, which ought to be scrutinized before building.  (Inspired by a build script of software, I think maxima computer algebra, which creepily attempted to send email back to the author every time it was compiled.)

Can compiling a carefully crafted source file with GHC allow the author of the source file to perform arbitrary user-level actions within the operating system?

by Ken at October 17, 2016 05:33 AM

Tweag I/O

A new ecosystem for Haskell: the JVM

Mathieu Boespflug, Alp Mestanogullari   |   17 October 2016

By now, Haskell has first-class support for seamlessly embedding foreign code into source files and casually calling anything in C (via inline-c) or R (via inline-r), not to mention compiling whole programs down to in-browser JavaScript, thanks to GHCJS. Today the interoperability story for Haskell is getting better still: we’re announcing the addition of a new set of languages into the mix. With jvm, you can call any method known to the JVM from Haskell. With inline-java, you can moreover call these methods in Java syntax, embedded in your source files. Not that this was particularly our intention, promise! inline-java and friends just fell out naturally from our work on sparkle.

To give you a taste of what it’s like to program this way, here’s an obligatory “Hello World!” in Haskell, but with a twist: we call the Swing GUI toolkit to display our message to the world in a graphical dialog box.

[Figure: A Swing GUI application in Haskell]

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE QuasiQuotes #-}
{-# LANGUAGE ScopedTypeVariables #-}
module Main where

import Data.Int
import Language.Java
import Language.Java.Inline

main :: IO Int32
main = withJVM [] $ do
    message <- reflect "Hello World!"
    [java| { javax.swing.JOptionPane.showMessageDialog(null, $message);
             return 0; } |]

In short, it’s now possible to write Java programs that call into Haskell with near-trivial overhead, as we demonstrated previously with sparkle, or indeed Haskell programs that call into any of the hundreds of thousands of publicly available JVM packages, as well as countless custom enterprise codebases.

How it works

The key enabler to talking to Java and all the other JVM languages from Haskell is that Haskell speaks C and the JVM speaks C. That is, both Haskell and the JVM make it possible to call C functions and have C functions call them. In both cases, this is done as part of their respective foreign function interfaces (FFI). So we have a lingua franca for both languages: to move from Haskell to Java or vice versa, go through C first. Each language has its own custom calling convention anyway, so some small amount of glue code to mediate between the two is inevitable.
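To make the “Haskell speaks C” half concrete, here is a minimal example of GHC's standard FFI importing sin from the C math library. It only illustrates the mechanism. The jvm package's own bindings go through the JNI and inline-c rather than hand-written imports like this one.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.Types (CDouble (..))

-- Import the C library's sin using the C calling convention.
-- `unsafe` is fine here because sin never calls back into Haskell.
foreign import ccall unsafe "math.h sin"
  c_sin :: CDouble -> CDouble

-- A Haskell-typed wrapper converting between Double and CDouble.
cSin :: Double -> Double
cSin = realToFrac . c_sin . realToFrac
```

GHC generates the glue that moves the argument into the place the C calling convention expects. The JNI plays the same role on the JVM side.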

In fact, in the case of the JVM, there may not even be native code to call: bytecode is compiled “just-in-time”, or perhaps not at all. Fortunately, that’s not something we have to worry about: the JVM’s standard interface from C, called the Java Native Interface (JNI), encapsulates all the nitty-gritty detail of invoking methods behind a simple interface. As a first step, we wrote near-complete bindings to the JNI, using inline-c under the hood for better safety.

Calling into the JVM, the hard way

We could just expose the raw JNI API in Haskell and call it a day. Using raw JNI calls to invoke, say, a static Java method called foo, which takes an int and a boolean and returns some object, goes something like this:

import Foreign.JNI

callFoo = do
  klass <- findClass "some/Java/Class"    -- JNI uses '/' instead of '.'...
  method <- getStaticMethodID klass "foo" "(IZ)Ljava/lang/Object;"
  callStaticObjectMethod klass method [JInt 0, JBoolean 1]

Because the JVM allows overloaded method names, when grabbing a handle to invoke a method, you’ll need to specify a type signature to disambiguate which method you really want to call. But the JNI was purposefully designed independently of Java’s syntax, to the point where even class names are written differently. The JNI syntax for type signatures is optimized for speed of parsing and compactness, not legibility. So constructing these type signatures by hand to invoke JVM methods via raw JNI calls is rather error prone. That’s why we wrote the jvm package, a toolkit for invoking JVM methods more conveniently and robustly.
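To see how mechanical these descriptors are, and therefore how easy they are to mistype by hand, here is a small sketch that builds them from a structured description of the types. JTypeRep, descriptor and methodSignature are hypothetical names for illustration, not the jvm package's API. The letter codes follow the type signature table in the JNI specification.

```haskell
-- A hypothetical, simplified description of Java types.
data JTypeRep
  = PrimT String    -- a primitive type such as "int" or "boolean"
  | ClassT String   -- a fully qualified class name such as "java.lang.Object"
  | ArrayT JTypeRep -- an array of some element type

-- The JNI descriptor of a single type.
descriptor :: JTypeRep -> String
descriptor (PrimT p) = case p of
  "boolean" -> "Z"
  "byte"    -> "B"
  "char"    -> "C"
  "short"   -> "S"
  "int"     -> "I"
  "long"    -> "J"
  "float"   -> "F"
  "double"  -> "D"
  "void"    -> "V"
  _         -> error ("unknown primitive type: " ++ p)
descriptor (ClassT c) = "L" ++ map dotToSlash c ++ ";"
  where
    dotToSlash '.' = '/'
    dotToSlash ch  = ch
descriptor (ArrayT t) = "[" ++ descriptor t

-- A method descriptor: argument descriptors in parentheses, then the result.
methodSignature :: [JTypeRep] -> JTypeRep -> String
methodSignature args ret = "(" ++ concatMap descriptor args ++ ")" ++ descriptor ret
```

For instance, `methodSignature [PrimT "int", PrimT "boolean"] (ClassT "java.lang.Object")` reproduces the "(IZ)Ljava/lang/Object;" string used above.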

Using Haskell types for safer JVM calls

There are two downsides to the raw JNI calls we saw above:

  • performance: getting class and method handles is expensive. Ideally, we’d only ever look up classes and methods by name at most once throughout the lifetime of the program, assuming that loaded classes exist for all time and are never redefined.
  • stringly typing: we pass signatures explicitly, but these are literally strings, typos and all. If you mistype the signature, no compiler will call that out. Ideally ill-formed signatures would be caught at compile-time, rather than at runtime when it’s far too late and your program will simply crash.

The performance issue is easily dispensed with. The trick is to write wrappers that tell Haskell that findClass and getStaticMethodID are really pure, in the sense that calling either of them multiple times and in any context always yields equivalent results. So we could in principle ascribe pure types to them. The argument goes something like this; compare the following snippet with the one above:

callFoo = do
  let pureStuff@(klass, method) = unsafePerformIO $ do
        k <- findClass "some/Java/Class"
        m <- getStaticMethodID k "foo" "(IZ)Ljava/lang/Object;"
        return (k, m)
  callStaticObjectMethod klass method [JInt 0, JBoolean 1]

The expression for pureStuff is a closed expression (no free variables occur). And because its type is not IO, the compiler is free to float it to top-level, effectively turning it into a CAF, which is evaluated at most once thanks to laziness:

(klass, method) = unsafePerformIO $ do
  k <- findClass "some/Java/Class"
  m <- getStaticMethodID k "foo" "(IZ)Ljava/lang/Object;"
  return (k, m)

callFoo = do
  callStaticObjectMethod klass method [JInt 0, JBoolean 1]
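Relying on GHC to float the binding out is somewhat fragile. A common idiom is to write the top-level binding yourself and mark it NOINLINE, which keeps GHC from duplicating the unsafePerformIO call at use sites. This is our suggestion, not necessarily what the jvm package does. The sketch below uses a hypothetical expensiveLookup in place of findClass and counts how often the lookup really runs.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafePerformIO)

-- Global counter of how many lookups actually ran (the usual NOINLINE idiom).
lookupCount :: IORef Int
lookupCount = unsafePerformIO (newIORef 0)
{-# NOINLINE lookupCount #-}

-- Hypothetical stand-in for an expensive, effectively pure lookup such as findClass.
expensiveLookup :: String -> IO String
expensiveLookup name = do
  modifyIORef' lookupCount (+ 1)
  return ("handle for " ++ name)

-- A top-level CAF: evaluated at most once, the first time it is demanded.
cachedHandle :: String
cachedHandle = unsafePerformIO (expensiveLookup "some/Java/Class")
{-# NOINLINE cachedHandle #-}

-- Every call reuses the cached handle instead of repeating the lookup.
useHandle :: IO String
useHandle = return cachedHandle
```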

As for the stringly typing problem, we’ll need some tools first. To begin with, we need to reflect enough type information in Haskell. To that end, we’ll index the type of Java objects by their Java type:

newtype J (a :: JType) = J (Ptr (J a))

Java types can either be primitives (int, boolean, etc.) or reference types (classes, arrays, interfaces, generics, etc.). So our definition of JType goes something like this:

data JType
  = Prim Symbol
  | Class Symbol
  | Array JType
  | ...

genSingletons [''JType]

Thus equipped, we can write types like,

  • the type of Swing option panes, J ('Class "javax.swing.JOptionPane")
  • the type of boxed Java integers, J ('Class "java.lang.Integer"),
  • the type of primitive integer arrays, J ('Array ('Prim "int")),
  • etc.

What’s more, thanks to the family of singleton types and instances created by genSingletons above, we can reflect on the type of any Java object at runtime to get a representation of the type at the value level. This is helpful to automatically compute JNI type signatures from the types alone. No more stringly typing with all those typos in tow: JNI type signatures are now correct by construction.
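To give a rough idea of how types can drive signatures, the sketch below uses a type class to compute a JNI descriptor from a type-level JType, using only GHC.TypeLits. It is a simplified stand-in of our own for the singletons-based machinery, not the package's actual implementation, and it covers just two primitive types. The class Descriptor and its method are hypothetical names.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownSymbol, Symbol, symbolVal)

-- A cut-down version of the post's JType, used only at the type level here.
data JType = Prim Symbol | Class Symbol | Array JType

-- Compute the JNI descriptor of a Java type from its type-level description.
class Descriptor (t :: JType) where
  descriptor :: Proxy t -> String

-- Only two primitives, to keep the sketch short.
instance Descriptor ('Prim "int") where
  descriptor _ = "I"

instance Descriptor ('Prim "boolean") where
  descriptor _ = "Z"

-- Class names are read back from the type-level string, with '.' -> '/'.
instance KnownSymbol c => Descriptor ('Class c) where
  descriptor _ = "L" ++ map dotToSlash (symbolVal (Proxy :: Proxy c)) ++ ";"
    where
      dotToSlash '.' = '/'
      dotToSlash ch  = ch

instance Descriptor t => Descriptor ('Array t) where
  descriptor _ = "[" ++ descriptor (Proxy :: Proxy t)
```

A mistyped class name here is still just a string, but the shape of the descriptor (brackets, L...; wrapping, letter codes) can no longer be gotten wrong.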

In particular, we can define a family of variants of callStaticObjectMethod:

module Language.Java where

  :: (SingI ty1, SingI tyr)
  => Sing (klass :: Symbol) -> JNI.String -> J ty1 -> IO (J tyr)
  :: (SingI ty1, SingI ty2, SingI tyr)
  => Sing (klass :: Symbol) -> JNI.String -> J ty1 -> J ty2 -> IO (J tyr)
  :: (SingI ty1, SingI ty2, SingI ty3, SingI tyr)
  => Sing (klass :: Symbol) -> JNI.String -> J ty1 -> J ty2 -> J ty3 -> IO (J tyr)

The types of these functions are expressive enough to infer a type signature for the called method. Thanks to the type reflection provided by the singletons package, we can reify types as values and produce JNI type signatures from that. Of course, a fixed number of callStatic* functions, one per arity, is rather limiting (what about arbitrary arities?), so in reality the Language.Java module provides a single such function, to which arguments are passed packed into a homogeneous list:

  :: SingI tyr
  => Sing (klass :: Symbol) -> JNI.String -> [JValue] -> IO (J tyr)

where JValue is defined as

data JValue
  = forall a. SingI a => JObject (J a)
  | JBoolean Word8
  | JInt Int32
  | ...

In this way, values of primitive type can be passed to Java without penalty: no need to box them into tiny objects first. It turns out we can extend the same idea to obtain unboxed return values, but the technical details get a bit more intricate, so we’ll have to defer that to the module’s documentation.

Calling a non-static method is achieved in much the same way:

call :: (SingI ty, SingI tyr)
     => J ty -> JNI.String -> [JValue] -> IO (J tyr)

JVM calls the Java way

call and callStatic are surprisingly versatile facilities for calling arbitrary JVM methods with an arbitrary number of boxed or unboxed arguments and return values, but sometimes one might still get the types wrong. For example, there’s nothing stopping us from attempting to call a java.lang.Integer constructor with a boolean typed argument. No such constructor exists, so we’ll get a method lookup exception at runtime. After all, we don’t know in Haskell what methods really do exist, and what their signatures are. But if we call the java.lang.Integer constructor using Java syntax, we can hope to get the Java compiler to perform full scope checking and type checking, thus ruling out common errors such as calling non-existent methods or supplying arguments of the wrong type.

To achieve that, we use GHC’s quasiquotation extension. This extension allows us to embed syntax from arbitrary foreign languages in Haskell source files, in between special brackets. Better yet, we are free to extend the foreign syntax to express antiquotation variables, i.e. variables that refer to the enclosing context in Haskell. Take for example our earlier “hello world” code snippet, simplified:

do message <- reflect "Hello World!"
   [java| javax.swing.JOptionPane.showMessageDialog(null, $message) |]

Using reflect, also provided by inline-java, we create a J ('Class "java.lang.String") from a Haskell String. We can then refer to this Java object, bound to a Haskell variable, from inside the Java code snippet. The $ sigil is there to disambiguate between variables bound in the Haskell context (aka antiquotation) and in the Java context.

You might have noticed a difference with inline-c: in inline-java we don’t need to annotate quasiquotations with the return type nor each antiquote with their types, which can get quite verbose. Instead, we just about manage to infer which foreign types are intended based on the types of the Haskell variables. To pull this off required a journey in compiler hacking, ghc-heap-view and a novel use of static pointers. A journey best told next time…

The road ahead

There are plenty of solutions out there for lightweight interop across languages. You can start by swapping JSON messages between separate processes and take it from there. But for a truly universal solution fit for all situations, our experience is that keeping any overheads low or perhaps even nonexistent is the key enabler to seamlessly mixing multiple languages and blithely crossing language boundaries without guilt. In this post we presented a suite of packages for high-speed Java/Haskell interop, which together ensure:

  • box-free foreign calls: because we infer precise JVM types from Haskell types, arguments passed to JVM methods are boxed only if they need to be. Small values of primitive type can be passed to/from the JVM with no allocation at all on the heap.
  • marshalling-free argument passing: Java objects can be manipulated as easily from Haskell as from Java. This means that you can stick to representing all your data as Java objects if you find yourself calling into Java very frequently, hence avoiding any marshalling costs when transferring control to/from the JVM.
  • type safe Java calls: when calls are made in Java syntax, this syntax is supplied to an embedded instance of javac at compile-time for scope checking and type checking. That way we have a static guarantee that the types on the Haskell side match up with the types on the Java side, without having to resort to FFI stub generators and preprocessors.

We were fortunate enough to be able to stand on excellent libraries to get here. Take parsing of Java syntax: that came straight from Niklas Broberg and Vincent Hanquez’s venerable language-java library.

What we haven’t addressed yet with inline-java is the perennial issue of automatic memory management when interoperating two garbage collected languages. Since we have two heaps (the GHC heap and the JVM heap), with two garbage collectors, neither of which is able to traverse objects in the other heap, we are forced to pin in memory objects shared across the language boundary. In the case of JVM objects, the JNI does this for us implicitly, provided object references are kept thread-local. It would be nice if we could make these object references safe across threads and get both garbage collectors to agree to dispose of them safely when dead. You can get a fair amount of mileage the way things are: we managed to run topic analysis on all of Wikipedia concurrently on 16 machines, over hours of machine time, without tinkering with object lifetimes and GCs.

So plenty more to do still! Make sure to check out the project’s GitHub repository to follow progress and contribute.

by Engineering team at Tweag I/O at October 17, 2016 12:00 AM

October 16, 2016

Dan Piponi (sigfpe)

Expectation-Maximization with Less Arbitrariness


There are many introductions to the Expectation-Maximisation algorithm. Unfortunately every one I could find uses arbitrary-seeming tricks that appear to be plucked out of a hat by magic. They can all be justified in retrospect, but I find it more useful to learn from reusable techniques that you can apply to further problems. Examples of tricks I've seen used are:

  1. Using Jensen's inequality. It's easy to find inequalities that apply in any situation. But there are often many ways to apply them. Why apply it to this way of writing this expression and not that one which is equal?
  2. Substituting in the middle of an expression. Again, you can use just about anywhere. Why choose this at this time? Similarly I found derivations that insert a into an expression.
  3. Majorisation-Minimisation. This is a great technique, but involves choosing a function that majorises another. There are so many ways to do this, it's hard to imagine any general purpose method that tells you how to narrow down the choice.
My goal is to fill in the details of one key step in the derivation of the EM algorithm in a way that makes it inevitable rather than arbitrary. There's nothing original here, I'm merely expanding on a stackexchange answer.

Generalities about EM

The EM algorithm seeks to construct a maximum likelihood estimator (MLE) with a twist: there are some variables in the system that we can't observe.

First assume no hidden variables. We assume there is a vector of parameters $\theta$ that defines some model. We make some observations $x$. We have a probability density $p(x|\theta)$ that depends on $\theta$. The likelihood of $\theta$ given the observations $x$ is $p(x|\theta)$. The maximum likelihood estimator for $\theta$ is the choice of $\theta$ that maximises $p(x|\theta)$ for the $x$ we have observed.

Now suppose there are also some variables $z$ that we didn't get to observe. We assume a density $p(x,z|\theta)$. We now have

$$p(x|\theta) = \sum_z p(x,z|\theta)$$

where we sum over all possible values of $z$. The MLE approach says we now need to maximise

$$\sum_z p(x,z|\theta)$$

One of the things that is a challenge here is that the components of $\theta$ might be mixed up among the terms in the sum. If, instead, each term only referred to its own unique block of $\theta$, then the maximisation would be easier as we could maximise each term independently of the others. Here's how we might move in that direction. Consider instead the log-likelihood

$$\log \sum_z p(x,z|\theta)$$

Now imagine that by magic we could commute the logarithm with the sum. We'd need to maximise

$$\sum_z \log p(x,z|\theta)$$

One reason this would be to our advantage is that $p(x,z|\theta)$ often takes the form $\exp(f(x,z,\theta))$ where $f$ is a simple function to optimise. In addition, $f$ may break up as a sum of terms, each with its own block of $\theta$'s. Moving the logarithm inside the sum would give us something we could easily maximise term by term. What's more, the $p(x,z|\theta)$ for each $z$ is often a standard probability distribution whose likelihood we already know how to maximise. But, of course, we can't just move that logarithm in.

Maximisation by proxy

Sometimes a function is too hard to optimise directly. But if we have a guess for an optimum, we can replace our function with a proxy function that approximates it in the neighbourhood of our guess and optimise that instead. That will give us a new guess and we can continue from there. This is the basis of gradient descent. Suppose $f$ is a differentiable function in a neighbourhood of $x_0$. Then around $x_0$ we have

$$f(x) \approx f(x_0) + (x - x_0)\cdot\nabla f(x_0)$$

We can try optimising $f(x_0) + (x - x_0)\cdot\nabla f(x_0)$ with respect to $x$ within a neighbourhood of $x_0$. If we pick a small circular neighbourhood then the optimal value will be in the direction of steepest descent. (Note that picking a circular neighbourhood is itself a somewhat arbitrary step, but that's another story.) For gradient descent we're choosing $f(x_0) + (x - x_0)\cdot\nabla f(x_0)$ because it matches both the value and derivatives of $f$ at $x_0$. We could go further and optimise a proxy that shares second derivatives too, and that leads to methods based on Newton-Raphson iteration.
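A minimal Haskell sketch of the proxy step just described, assuming the neighbourhood is a Euclidean disc of radius r (the radius, the list-based vectors and the test function are illustrative assumptions, not from the post). Minimising the linear proxy $f(x_0) + (x - x_0)\cdot\nabla f(x_0)$ over $|x - x_0| \le r$ puts $x$ at $x_0 - r\,\nabla f(x_0)/|\nabla f(x_0)|$, i.e. a fixed-length step in the direction of steepest descent.

```haskell
-- Vectors are plain lists of coordinates in this sketch.
type Vec = [Double]

-- Minimise the linear proxy  f(x0) + (x - x0) . g  over the disc |x - x0| <= r,
-- where g is the gradient at x0. By Cauchy-Schwarz the minimiser is
-- x0 - r * g / |g|: a step of length r along the direction of steepest descent.
proxyStep :: Double -> (Vec -> Vec) -> Vec -> Vec
proxyStep r grad x0
  | norm == 0 = x0                      -- flat proxy: stay where we are
  | otherwise = zipWith (\x gi -> x - r * gi / norm) x0 g
  where
    g    = grad x0
    norm = sqrt (sum (map (^ (2 :: Int)) g))

-- Hypothetical test function f(x, y) = (x - 1)^2 + (y + 2)^2 and its gradient.
quadratic :: Vec -> Double
quadratic [x, y] = (x - 1) ^ (2 :: Int) + (y + 2) ^ (2 :: Int)
quadratic _      = error "quadratic: expects a 2-vector"

quadraticGrad :: Vec -> Vec
quadraticGrad [x, y] = [2 * (x - 1), 2 * (y + 2)]
quadraticGrad _      = error "quadraticGrad: expects a 2-vector"
```

Iterating `proxyStep 0.1 quadraticGrad` from `[0, 0]` walks towards `(1, -2)` in steps of length 0.1 and then hovers within one step of it; a shrinking radius (or a second-order proxy, as the text suggests) is what would let it settle exactly.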

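We want our logarithm of a sum to be a sum of logarithms. But instead we'll settle for a proxy function that is a sum of logarithms. We'll make the derivatives of the proxy match those of the original function precisely so we're not making an arbitrary choice. Write

$$\log\sum_z p(x,z|\theta) \approx \sum_z a_z \log p(x,z|\theta)$$

The $a_z$ are constants we'll determine. We want to match the derivatives on either side of the $\approx$ at $\theta = \theta_0$:

$$\frac{\partial}{\partial\theta}\log\sum_z p(x,z|\theta)\bigg|_{\theta=\theta_0} = \frac{1}{\sum_z p(x,z|\theta_0)}\sum_z \frac{\partial p(x,z|\theta)}{\partial\theta}\bigg|_{\theta=\theta_0}$$

On the other hand we have

$$\frac{\partial}{\partial\theta}\sum_z a_z\log p(x,z|\theta)\bigg|_{\theta=\theta_0} = \sum_z \frac{a_z}{p(x,z|\theta_0)}\frac{\partial p(x,z|\theta)}{\partial\theta}\bigg|_{\theta=\theta_0}$$

To achieve equality we want to make these expressions match. We choose

$$a_z = \frac{p(x,z|\theta_0)}{\sum_{z'} p(x,z'|\theta_0)} = p(z|x,\theta_0)$$

Our desired proxy function is:

$$\sum_z p(z|x,\theta_0)\log p(x,z|\theta)$$

So the procedure is to take an estimated $\theta_0$ and obtain a new estimate by optimising this proxy function with respect to $\theta$. This is the standard EM algorithm.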

It turns out that this proxy has some other useful properties. For example, because of the concavity of the logarithm, the proxy is always smaller than the original likelihood. This means that when we optimise it we never optimise "too far" and that progress optimising the proxy is always progress optimising the original likelihood. But I don't need to say anything about this as it's all part of the standard literature.
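To make the procedure concrete, here is a small Haskell sketch. The model is an assumption chosen for illustration, not something from the post: a hypothetical mixture of two biased coins, where each observation is a pair (heads, tosses) and the hidden variable $z$ records which coin was used. For this model the proxy $\sum_i\sum_z p(z|x_i,\theta_0)\log p(x_i,z|\theta)$ (summed over independent observations $x_i$) has a closed-form maximiser, so each EM step is just a set of posterior-weighted proportions. All names (`Params`, `emStep`, `logLik`, ...) are hypothetical.

```haskell
-- Hypothetical model: a mixture of two biased coins.
-- theta = (mixing weight of coin A, bias of coin A, bias of coin B).
data Params = Params { weightA :: Double, biasA :: Double, biasB :: Double }
  deriving Show

type Obs = (Int, Int)  -- (heads, tosses)

-- p(x | coin with bias p), up to the binomial coefficient, which does not
-- depend on theta and so cancels in the posterior below.
coinLik :: Double -> Obs -> Double
coinLik p (h, n) = p ^ h * (1 - p) ^ (n - h)

-- The weight a_z = p(z = A | x, theta0) from the derivation.
posteriorA :: Params -> Obs -> Double
posteriorA (Params w a b) o = la / (la + lb)
  where
    la = w * coinLik a o
    lb = (1 - w) * coinLik b o

-- One EM step: maximise sum_i sum_z p(z | x_i, theta0) log p(x_i, z | theta).
-- Setting the derivatives to zero gives posterior-weighted proportions.
emStep :: [Obs] -> Params -> Params
emStep obs theta0 = Params w' a' b'
  where
    rs = map (posteriorA theta0) obs      -- weights for coin A
    qs = map (1 -) rs                     -- weights for coin B
    hs = map (fromIntegral . fst) obs
    ns = map (fromIntegral . snd) obs
    dot u v = sum (zipWith (*) u v)
    w' = sum rs / fromIntegral (length obs)
    a' = dot rs hs / dot rs ns
    b' = dot qs hs / dot qs ns

-- Observed-data log-likelihood log sum_z p(x, z | theta), again up to the
-- binomial coefficients, which only add a constant.
logLik :: [Obs] -> Params -> Double
logLik obs (Params w a b) =
  sum [log (w * coinLik a o + (1 - w) * coinLik b o) | o <- obs]
```

Tracking `logLik` along `iterate (emStep obs) theta0` is a cheap sanity check: EM should never decrease it.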


As a side effect we have a general purpose optimisation algorithm that has nothing to do with statistics. If your goal is to compute

$$\arg\max_x \log\sum_i f_i(x)$$

you can iterate, at each step computing

$$\arg\max_x \sum_i f_i(x_0)\log f_i(x)$$

where $x_0$ is the previous iteration. If the $f_i$ take a convenient form then this may turn out to be much easier.
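As a hypothetical instance of this recipe (the choice of $f_i$ is an assumption for illustration), take $f_i(x) = c_i\,e^{-(x-m_i)^2/2}$. Then $\log f_i(x) = \log c_i - (x-m_i)^2/2$, and each step has the closed form $x \leftarrow \sum_i f_i(x_0)\,m_i / \sum_i f_i(x_0)$, a weighted mean of the centres $m_i$. This particular update coincides with the mean-shift iteration for a weighted Gaussian kernel, and like EM it climbs to a local maximum that depends on the starting point.

```haskell
-- Hypothetical choice: f_i(x) = c_i * exp (-(x - m_i)^2 / 2), given as (c_i, m_i).
type Bump = (Double, Double)

bump :: Bump -> Double -> Double
bump (c, m) x = c * exp (negate ((x - m) ^ (2 :: Int)) / 2)

-- The objective log (sum_i f_i x) that we actually want to maximise.
objective :: [Bump] -> Double -> Double
objective bs x = log (sum [bump b x | b <- bs])

-- One iteration: argmax_x sum_i f_i(x0) * log f_i(x). Because
-- log f_i(x) = log c_i - (x - m_i)^2 / 2, the maximiser is the
-- f_i(x0)-weighted mean of the centres m_i.
step :: [Bump] -> Double -> Double
step bs x0 = sum (zipWith (*) ws (map snd bs)) / sum ws
  where
    ws = [bump b x0 | b <- bs]
```

Iterating `step bs` never decreases `objective bs`, even though each step only ever solves the easy "sum of logarithms" problem.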


This was originally written as a PDF using LaTeX. It'll be available here for a while. Some fidelity was lost when converting it to HTML.

by Dan Piponi at October 16, 2016 11:04 PM

October 12, 2016

Philip Wadler

Lambdaman (and Lambdawoman) supporting Bootstrap - Last Three Days!

You have just three more days to order your own Lambdaman or Lambdawoman t-shirt, as featured in the video of Propositions as Types. Now available in unisex, children's, and women's shirts. Profits go to Bootstrap, an organisation run by Shriram Krishnamurthi, Matthias Felleisen, and the PLT group that teaches functional programming to middle and high school students. Orders will be printed on October 15.

by Philip Wadler at October 12, 2016 09:42 AM

October 10, 2016

Edward Z. Yang

Try Backpack: ghc --backpack

Backpack, a new system for mix-in packages in Haskell, has landed in GHC HEAD. This means that it has become a lot easier to try Backpack out: you just need a nightly build of GHC. Here is a step-by-step guide to get you started.

Download a GHC nightly

Get a nightly build of GHC. If you run Ubuntu, this step is very easy: add Herbert V. Riedel's PPA to your system and install ghc-head:

sudo add-apt-repository ppa:hvr/ghc
sudo apt-get update
sudo apt-get install ghc-head

This will place a Backpack-ready GHC in /opt/ghc/head/bin/ghc. My recommendation is that you create a symlink named ghc-head to this binary from a directory that is in your PATH.

If you are not running Ubuntu, you'll have to download a nightly or build GHC yourself.

Hello World

GHC supports a new file format, bkp files, which let you easily define multiple modules and packages in a single source file, making it easy to experiment with Backpack. This format is not suitable for large scale programming, but we will use it for our tutorial. Here is a simple "Hello World" program:

unit main where
  module Main where
    main = putStrLn "Hello world!"

We define a unit (think package) with the special name main, and in it define a Main module (also specially named) which contains our main function. Place this in a file named hello.bkp, and then run ghc --backpack hello.bkp (using your GHC nightly). This will produce an executable at main/Main which you can run; you can also explicitly specify the desired output filename using -o filename. Note that by default, ghc --backpack creates a directory with the same name as every unit, so -o main won't work (it'll give you a linker error; use a different name!)

A Play on Regular Expressions

Let's write some nontrivial code that actually uses Backpack. For this tutorial, we will write a simple matcher for regular expressions as described in A Play on Regular Expressions (Sebastian Fischer, Frank Huch, Thomas Wilke). The matcher itself is inefficient (it checks for a match by testing all exponentially many decompositions of a string), but it will be sufficient to illustrate many key concepts of Backpack.

To start things off, let's go ahead and write a traditional implementation of the matcher by copy-pasting the code from this Functional Pearl into a Regex module in the Backpack file and writing a little test program to run it:

unit regex where
    module Regex where
        -- | A type of regular expressions.
        data Reg = Eps
                 | Sym Char
                 | Alt Reg Reg
                 | Seq Reg Reg
                 | Rep Reg

        -- | Check if a regular expression 'Reg' matches a 'String'
        accept :: Reg -> String -> Bool
        accept Eps       u = null u
        accept (Sym c)   u = u == [c]
        accept (Alt p q) u = accept p u || accept q u
        accept (Seq p q) u =
            or [accept p u1 && accept q u2 | (u1, u2) <- splits u]
        accept (Rep r) u =
            or [and [accept r ui | ui <- ps] | ps <- parts u]

        -- | Given a string, compute all splits of the string.
        -- E.g., splits "ab" == [("","ab"), ("a","b"), ("ab","")]
        splits :: String -> [(String, String)]
        splits [] = [([], [])]
        splits (c:cs) = ([], c:cs):[(c:s1,s2) | (s1,s2) <- splits cs]

        -- | Given a string, compute all possible partitions of
        -- the string (where all partitions are non-empty).
        -- E.g., parts "ab" == [["ab"],["a","b"]]
        parts :: String -> [[String]]
        parts [] = [[]]
        parts [c] = [[[c]]]
        parts (c:cs) = concat [[(c:p):ps, [c]:p:ps] | p:ps <- parts cs]

unit main where
    dependency regex
    module Main where
        import Regex
        nocs = Rep (Alt (Sym 'a') (Sym 'b'))
        onec = Seq nocs (Sym 'c')
        -- | The regular expression which tests for an even number of cs
        evencs = Seq (Rep (Seq onec onec)) nocs
        main = print (accept evencs "acc")

If you put this in regex.bkp, you can once again compile it using ghc --backpack regex.bkp and invoke the resulting executable at main/Main. It should print True.
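To make the "exponentially many decompositions" concrete, here is a standalone Haskell check. The splits and parts definitions are copied from the Regex module; decompositionCounts is a hypothetical helper added for illustration. A string of length n has n + 1 splits but 2^(n-1) partitions (for n ≥ 1), since each of the n - 1 gaps between characters is either cut or not.

```haskell
-- The String decompositions from the Regex module, repeated so this
-- block stands alone.
splits :: String -> [(String, String)]
splits [] = [([], [])]
splits (c:cs) = ([], c:cs) : [(c:s1, s2) | (s1, s2) <- splits cs]

parts :: String -> [[String]]
parts [] = [[]]
parts [c] = [[[c]]]
parts (c:cs) = concat [[(c:p):ps, [c]:p:ps] | p:ps <- parts cs]

-- (number of splits, number of partitions) for a string of length n.
decompositionCounts :: Int -> (Int, Int)
decompositionCounts n = (length (splits s), length (parts s))
  where
    s = replicate n 'a'
```

For example, `decompositionCounts 20` evaluates to `(21,524288)`: in the worst case a single Rep node has over half a million partitions of a 20-character input to try.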

Functorizing the matcher

The previously shown code isn't great because it hardcodes String as the type to do regular expression matching over. A reasonable generalization (which you can see in the original paper) is to match over arbitrary lists of symbols; however, we might also reasonably want to match over non-list types like ByteString. To support all of these cases, we will instead use Backpack to "functorize" (in ML parlance) our matcher.

We'll do this by creating a new unit, regex-indef, and writing a signature which provides a string type (we've decided to call it Str, to avoid confusion with String) and all of the operations which need to be supported on it. Here are the steps I took:

  1. First, I copy-pasted the old Regex implementation into the new unit. I replaced all occurrences of String with Str, and deleted splits and parts: we will require these to be implemented in our signature.

  2. Next, we create a new Str signature, which is imported by Regex, and defines our type and operations (splits and parts) which it needs to support:

    signature Str where
      data Str
      splits :: Str -> [(Str, Str)]
      parts :: Str -> [[Str]]
  3. At this point, I ran ghc --backpack to typecheck the new unit. But I got two errors!

    regex.bkp:90:35: error:
        • Couldn't match expected type ‘t0 a0’ with actual type ‘Str’
        • In the first argument of ‘null’, namely ‘u’
          In the expression: null u
          In an equation for ‘accept’: accept Eps u = null u
    regex.bkp:91:35: error:
        • Couldn't match expected type ‘Str’ with actual type ‘[Char]’
        • In the second argument of ‘(==)’, namely ‘[c]’
          In the expression: u == [c]
          In an equation for ‘accept’: accept (Sym c) u = u == [c]

    Foldable null nonsense aside, the errors are quite clear: Str is a completely abstract data type: we cannot assume that it is a list, nor do we know what instances it has. To solve these type errors, I introduced the combinators null and singleton, an instance Eq Str, and rewrote Regex to use these combinators (a very modest change). (Notice we can't write instance Foldable Str; it's a kind mismatch.)

Here is our final indefinite version of the regex unit:

unit regex-indef where
    signature Str where
        data Str
        instance Eq Str
        null :: Str -> Bool
        singleton :: Char -> Str
        splits :: Str -> [(Str, Str)]
        parts :: Str -> [[Str]]
    module Regex where
        import Prelude hiding (null)
        import Str

        data Reg = Eps
                 | Sym Char
                 | Alt Reg Reg
                 | Seq Reg Reg
                 | Rep Reg

        accept :: Reg -> Str -> Bool
        accept Eps       u = null u
        accept (Sym c)   u = u == singleton c
        accept (Alt p q) u = accept p u || accept q u
        accept (Seq p q) u =
            or [accept p u1 && accept q u2 | (u1, u2) <- splits u]
        accept (Rep r) u =
            or [and [accept r ui | ui <- ps] | ps <- parts u]

(To keep things simple for now, I haven't parametrized Char.)

Instantiating the functor (String)

This is all very nice but we can't actually run this code, since there is no implementation of Str. Let's write a new unit which provides a module which implements all of these types and functions with String, copy-pasting in the old implementations of splits and parts:

unit str-string where
    module Str where
        import Prelude hiding (null)
        import qualified Prelude as P

        type Str = String

        null :: Str -> Bool
        null = P.null

        singleton :: Char -> Str
        singleton c = [c]

        splits :: Str -> [(Str, Str)]
        splits [] = [([], [])]
        splits (c:cs) = ([], c:cs):[(c:s1,s2) | (s1,s2) <- splits cs]

        parts :: Str -> [[Str]]
        parts [] = [[]]
        parts [c] = [[[c]]]
        parts (c:cs) = concat [[(c:p):ps, [c]:p:ps] | p:ps <- parts cs]

One quirk when writing Backpack implementations for functions is that Backpack does no subtype matching on polymorphic functions, so you can't implement Str -> Bool with a polymorphic function Foldable t => t a -> Bool (adding this would be an interesting extension, and not altogether trivial). So we have to write a little impedance matching binding which monomorphizes null to the expected type.

To instantiate regex-indef with str-string:Str, we modify the dependency in main:

-- dependency regex -- old
dependency regex-indef[Str=str-string:Str]

Backpack files require instantiations to be explicitly specified (this is as opposed to Cabal files, which do mix-in linking to determine instantiations). In this case, the instantiation specifies that regex-indef's signature named Str should be filled with the Str module from str-string.

After making these changes, give ghc --backpack a run; you should get an identical-looking result.
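If you don't have a GHC nightly at hand, a rough ordinary-Haskell analogue of what regex-indef abstracts over is to pass the Str operations explicitly as a record. This is a hypothetical illustration, not Backpack: the record type StrOps and its field names are made up here, and the caller picks the record at run time, whereas Backpack fills in the Str signature when the unit is instantiated, at compile time.

```haskell
-- Hypothetical record mirroring the Str signature of regex-indef.
data StrOps s = StrOps
  { strNull      :: s -> Bool
  , strSingleton :: Char -> s
  , strSplits    :: s -> [(s, s)]
  , strParts     :: s -> [[s]]
  }

data Reg = Eps | Sym Char | Alt Reg Reg | Seq Reg Reg | Rep Reg

-- The matcher from the post; every Str operation now comes from the record.
-- The Eq constraint plays the role of "instance Eq Str" in the signature.
accept :: Eq s => StrOps s -> Reg -> s -> Bool
accept ops = go
  where
    go Eps       u = strNull ops u
    go (Sym c)   u = u == strSingleton ops c
    go (Alt p q) u = go p u || go q u
    go (Seq p q) u = or [go p u1 && go q u2 | (u1, u2) <- strSplits ops u]
    go (Rep r)   u = or [and [go r ui | ui <- ps] | ps <- strParts ops u]

-- The String "instantiation", using the definitions from str-string.
stringOps :: StrOps String
stringOps = StrOps null (\c -> [c]) splitsS partsS
  where
    splitsS [] = [([], [])]
    splitsS (c:cs) = ([], c:cs) : [(c:s1, s2) | (s1, s2) <- splitsS cs]
    partsS [] = [[]]
    partsS [c] = [[[c]]]
    partsS (c:cs) = concat [[(c:p):ps, [c]:p:ps] | p:ps <- partsS cs]
```

With the evencs expression from the post, `accept stringOps evencs "acc"` plays the role of `accept evencs "acc"` in the instantiated unit.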

Instantiating the functor (ByteString)

The whole point of parametrizing regex was to enable us to have a second implementation of Str. So let's go ahead and write a bytestring implementation. After a little bit of work, you might end up with this:

unit str-bytestring where
    module Str(module Data.ByteString.Char8, module Str) where
        import Prelude hiding (length, null, splitAt)
        import Data.ByteString.Char8
        import Data.ByteString

        type Str = ByteString

        splits :: Str -> [(Str, Str)]
        splits s = fmap (\n -> splitAt n s) [0..length s]

        parts :: Str -> [[Str]]
        parts s | null s    = [[]]
                | otherwise = do
                    n <- [1..length s]
                    let (l, r) = splitAt n s
                    fmap (l:) (parts r)

There are two things to note about this implementation:

  1. Unlike str-string, which explicitly defined every needed method in its module body, str-bytestring provides null and singleton simply by reexporting all of the entities from Data.ByteString.Char8 (which are appropriately monomorphic). We've cleverly picked our names to abide by the naming conventions of existing string packages!
  2. Our implementations of splits and parts are substantially more optimized than if we had done a straight up transcription of the consing and unconsing from the original String implementation. I often hear people say that String and ByteString have very different performance characteristics, and thus you shouldn't mix them up in the same implementation. I think this example shows that as long as you have sufficiently high-level operations on your strings, these performance changes smooth out in the end; and there is still a decent chunk of code that can be reused across implementations.
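One way to check that the ByteString splits and parts really are drop-in replacements is to compare them with the String versions directly. The standalone sketch below assumes the bytestring package that ships with GHC; the S/B-suffixed names and the agree helper are hypothetical. It shows the two implementations produce the same decompositions, although parts enumerates them in a different order, which does not affect accept since it only asks whether any decomposition matches.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.List (sort)

-- String versions, as in str-string.
splitsS :: String -> [(String, String)]
splitsS [] = [([], [])]
splitsS (c:cs) = ([], c:cs) : [(c:s1, s2) | (s1, s2) <- splitsS cs]

partsS :: String -> [[String]]
partsS [] = [[]]
partsS [c] = [[[c]]]
partsS (c:cs) = concat [[(c:p):ps, [c]:p:ps] | p:ps <- partsS cs]

-- ByteString versions, as in str-bytestring but with qualified names.
splitsB :: B.ByteString -> [(B.ByteString, B.ByteString)]
splitsB s = [B.splitAt n s | n <- [0 .. B.length s]]

partsB :: B.ByteString -> [[B.ByteString]]
partsB s
  | B.null s  = [[]]
  | otherwise = do
      n <- [1 .. B.length s]
      let (l, r) = B.splitAt n s
      map (l :) (partsB r)

-- Do both implementations yield the same decompositions, ignoring order?
agree :: String -> Bool
agree str =
  sort (splitsS str) == sort [(B.unpack a, B.unpack b) | (a, b) <- splitsB bs]
    && sort (partsS str) == sort (map (map B.unpack) (partsB bs))
  where
    bs = B.pack str
```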

To instantiate regex-indef with str-bytestring:Str, we once again modify the dependency in main:

-- dependency regex -- oldest
-- dependency regex-indef[Str=str-string:Str] -- old
dependency regex-indef[Str=str-bytestring:Str]

We also need to stick an {-# LANGUAGE OverloadedStrings #-} pragma so that "acc" gets interpreted as a ByteString (unfortunately, the bkp file format only supports language pragmas that get applied to all modules defined; so put this pragma at the top of the file). But otherwise, everything works as it should!

Using both instantiations at once

There is nothing stopping us from using both instantiations of regex-indef at the same time, simply by uncommenting both dependency declarations, except that the module names provided by each dependency conflict with each other and are thus ambiguous. Backpack files thus provide a renaming syntax for modules which let you give each exported module a different name:

dependency regex-indef[Str=str-string:Str]     (Regex as Regex.String)
dependency regex-indef[Str=str-bytestring:Str] (Regex as Regex.ByteString)

How should we modify Main to run our regex on both a String and a ByteString? But is Regex.String.Reg the same as Regex.ByteString.Reg? A quick query to the compiler will reveal that they are not the same. The reason for this is Backpack's type identity rule: the identity of all types defined in a unit depends on how all signatures are instantiated, even if the type doesn't actually depend on any types from the signature. If we want there to be only one Reg type, we will have to extract it from regex-indef and give it its own unit, with no signatures.

After the refactoring, here is the full final program:

{-# LANGUAGE OverloadedStrings #-}

unit str-bytestring where
    module Str(module Data.ByteString.Char8, module Str) where
        import Prelude hiding (length, null, splitAt)
        import Data.ByteString.Char8
        import Data.ByteString

        type Str = ByteString

        splits :: Str -> [(Str, Str)]
        splits s = fmap (\n -> splitAt n s) [0..length s]

        parts :: Str -> [[Str]]
        parts s | null s    = [[]]
                | otherwise = do
                    n <- [1..length s]
                    let (l, r) = splitAt n s
                    fmap (l:) (parts r)

unit str-string where
    module Str where
        import Prelude hiding (null)
        import qualified Prelude as P

        type Str = String

        null :: Str -> Bool
        null = P.null

        singleton :: Char -> Str
        singleton c = [c]

        splits :: Str -> [(Str, Str)]
        splits [] = [([], [])]
        splits (c:cs) = ([], c:cs):[(c:s1,s2) | (s1,s2) <- splits cs]

        parts :: Str -> [[Str]]
        parts [] = [[]]
        parts [c] = [[[c]]]
        parts (c:cs) = concat [[(c:p):ps, [c]:p:ps] | p:ps <- parts cs]

unit regex-types where
    module Regex.Types where
        data Reg = Eps
                 | Sym Char
                 | Alt Reg Reg
                 | Seq Reg Reg
                 | Rep Reg

unit regex-indef where
    dependency regex-types
    signature Str where
        data Str
        instance Eq Str
        null :: Str -> Bool
        singleton :: Char -> Str
        splits :: Str -> [(Str, Str)]
        parts :: Str -> [[Str]]
    module Regex where
        import Prelude hiding (null)
        import Str
        import Regex.Types

        accept :: Reg -> Str -> Bool
        accept Eps       u = null u
        accept (Sym c)   u = u == singleton c
        accept (Alt p q) u = accept p u || accept q u
        accept (Seq p q) u =
            or [accept p u1 && accept q u2 | (u1, u2) <- splits u]
        accept (Rep r) u =
            or [and [accept r ui | ui <- ps] | ps <- parts u]

unit main where
    dependency regex-types
    dependency regex-indef[Str=str-string:Str]     (Regex as Regex.String)
    dependency regex-indef[Str=str-bytestring:Str] (Regex as Regex.ByteString)
    module Main where
        import Regex.Types
        import qualified Regex.String
        import qualified Regex.ByteString
        nocs = Rep (Alt (Sym 'a') (Sym 'b'))
        onec = Seq nocs (Sym 'c')
        evencs = Seq (Rep (Seq onec onec)) nocs
        main = print (Regex.String.accept evencs "acc") >>
               print (Regex.ByteString.accept evencs "acc")

And beyond!

Next time, I will tell you how to take this prototype in a bkp file, and scale it up into a set of Cabal packages. Stay tuned!

Postscript. If you are feeling adventurous, try further parametrizing regex-types so that it no longer hard-codes Char as the element type, but some arbitrary element type Elem. It may be useful to know that you can instantiate multiple signatures using the syntax dependency regex-indef[Str=str-string:Str,Elem=str-string:Elem] and that if you depend on a package with a signature, you must thread the signature through using the syntax dependency regex-types[Elem=<Elem>]. If this sounds user-unfriendly, it is! That is why in the Cabal package universe, instantiation is done implicitly, using mix-in linking.

by Edward Z. Yang at October 10, 2016 08:39 AM

October 08, 2016

Joachim Breitner

T430s → T460s

Earlier this week, I finally got my new machine that came with my new position at the University of Pennsylvania: A shiny Thinkpad T460s that now replaces my T430s. (Yes, there is a pattern. It continues with T400 and T41p.) I decided to re-install my Debian system from scratch and copy over only the home directory – a bit of purification does not hurt. This blog post contains some random notes that might be useful to someone or, alternatively, where I hope someone can tell me how to fix and improve things.


The installation (using debian-installer from a USB drive) went mostly smoothly, including LVM on an encrypted partition. Unfortunately, it did not set up grub correctly for the UEFI system to boot, so I had to jump through some hoops (using the grub on the USB drive to manually boot into the installed system, and installing grub-efi from there) until the system actually came up.

High-resolution display

This laptop has a 2560×1440 high resolution display. Modern desktop environments like GNOME supposedly handle that quite nicely, but for reasons explained in an earlier post, I do not use a desktop environment but have a minimalistic setup based on Xmonad. I managed to get a decent setup now, by turning lots of manual knobs:

  • For the linux console, setting


    in /etc/default/console-setup yielded good results.

  • For the few GTK-2 applications that I am still running, I set

    gtk-font-name="Sans 16"

    in ~/.gtkrc-2.0. Similarly, for GTK-3 I have

    gtk-font-name = Sans 16

    in ~/.config/gtk-3.0/settings.ini.

  • Programs like gnome-terminal, Evolution and hexchat refer to the “System default document font” and “System default monospace font”. I remember that it was possible to configure these in the GNOME control center, but I could not find any way of configuring these using command line tools, so I resorted to manually setting the font for these. With the help of Alexandre Franke I figured out that the magic incantation here is:

    gsettings set org.gnome.desktop.interface monospace-font-name 'Monospace 16'
    gsettings set org.gnome.desktop.interface document-font-name 'Serif 16'
    gsettings set org.gnome.desktop.interface font-name 'Sans 16'
  • Firefox seemed to have picked up these settings for the UI, so that was good. To make web pages readable, I set layout.css.devPixelsPerPx to 1.5 in about:config.

  • GVim has set guifont=Monospace\ 16 in ~/.vimrc. The toolbar is tiny, but I hardly use it anyways.

  • Setting the font of Xmonad prompts requires the syntax

    , font = "xft:Sans:size=16"

    Speaking about Xmonad prompts: Check out the XMonad.Prompt.Unicode module that I have been using for years and recently submitted upstream.

  • I launch Chromium (or rather the desktop applications that I use that happen to be Chrome apps) with the parameter --force-device-scale-factor=1.5.

  • Libreoffice seems to be best configured by running xrandr --dpi 194 beforehand. This seems also to be read by Firefox, doubling the effect of the font size in the gtk settings, which is annoying. Luckily I do not work with Libreoffice often, so for now I’ll just set that manually when needed.

I am not quite satisfied. I have the impression that the 16 point size font, e.g. in Evolution, is not really pretty, so I am happy to take suggestions here.

I found the ArchWiki page on HiDPI very useful here.

Trackpoint and Touchpad

One reason for me to stick with Thinkpads is their trackpoint, which I use exclusively. In previous models, I disabled the touchpad in the BIOS, but this did not seem to have an effect here, so I added the following section to /etc/X11/xorg.conf.d/30-touchpad.conf

Section "InputClass"
        Identifier "SynPS/2 Synaptics TouchPad"
        MatchProduct "SynPS/2 Synaptics TouchPad"
        Option "ignore" "on"
EndSection

At one point I left out the MatchProduct line, disabling all input in the X server. Had to boot into recovery mode to fix that.

Unfortunately, there is something wrong with the trackpoint and the buttons: When I am moving the trackpoint (and maybe if there is actual load on the machine), mouse button press and release events sometimes get lost. This is quite annoying – I try to open a folder in Evolution and accidentally move it.

I installed the latest kernel from Debian experimental (4.8.0-rc8), but it did not help.

I filed a bug report against libinput although I am not fully sure that that’s the culprit.

Update: According to Benjamin Tissoires it is a known firmware bug and the appropriate people are working on a work-around. Until then I am advised to keep my palm off the touchpad.

Also, I found the trackpoint too slow. I am not sure if it is simply because of the large resolution of the screen, or because some movement events are also swallowed. For now, I simply changed the speed by writing

SUBSYSTEM=="serio", DRIVERS=="psmouse", ATTR{speed}="120"

to /etc/udev/rules.d/10-trackpoint.rules.

Brightness control

The system would not automatically react to pressing Fn-F5 and Fn-F6, which are the keys to adjust the brightness. I am unsure about how and by what software component it “should” be handled, but the solution that I found was to set

Section "Device"
        Identifier  "card0"
        Driver      "intel"
        Option      "Backlight"  "intel_backlight"
        BusID       "PCI:0:2:0"
EndSection

so that the command line tool xbacklight would work, and then use Xmonad keybinds to perform the action, just as I already do for sound control:

    , ((0, xF86XK_Sleep),       spawn "dbus-send --system --print-reply --dest=org.freedesktop.UPower /org/freedesktop/UPower org.freedesktop.UPower.Suspend")
    , ((0, xF86XK_AudioMute), spawn "ponymix toggle")
    , ((0, 0x1008ffb2 {- xF86XK_AudioMicMute -}), spawn "ponymix --source toggle")
    , ((0, xF86XK_AudioRaiseVolume), spawn "ponymix increase 5")
    , ((0, xF86XK_AudioLowerVolume), spawn "ponymix decrease 5")
    , ((shiftMask, xF86XK_AudioRaiseVolume), spawn "ponymix increase 5 --max-volume 200")
    , ((shiftMask, xF86XK_AudioLowerVolume), spawn "ponymix decrease 5")
    , ((0, xF86XK_MonBrightnessUp), spawn "xbacklight +10")
    , ((0, xF86XK_MonBrightnessDown), spawn "xbacklight -10")

The T460s does not actually have a sleep button; that line is a leftover from my T430s. I suspend the machine by pressing the power button now, thanks to HandlePowerKey=suspend in /etc/systemd/logind.conf.

Profile Weirdness

Something strange happened to my environment variables after the move. It is clearly not hardware related, but I simply cannot explain what has changed: All relevant files in /etc look similar enough.

I use ~/.profile to extend the PATH and set some other variables. Previously, these settings were in effect in my whole X session, which is started by lightdm with auto-login, followed by xmonad-session. Now they are not, and I could find no better way to fix that than adding . ~/.profile early in my ~/.xmonad/xmonad-session-rc. Very strange.

by Joachim Breitner at October 08, 2016 09:22 PM

October 05, 2016

Brent Yorgey

ICFP roundup

ICFP 2016 in Nara, Japan was a blast. Here are a few of my recollections.

The Place

Although I was a coauthor on an ICFP paper in 2011, when it was in Tokyo, I did not go since my son was born the same week. So this was my first time in Japan, or anywhere in Asia, for that matter. (Of course, this time I missed my son’s fifth birthday…)

I’ve been to Europe multiple times, and although it is definitely foreign, the culture is similar enough that I feel like I basically know how to behave. I did not feel that way in Japan. I’m pretty sure I was constantly being offensive without realizing it, but most of the time people were polite and accommodating.

…EXCEPT for that one time I was sitting in a chair chatting with folks during a break between sessions, with my feet up on a (low, plain) table, and an old Japanese guy WHACKED his walking stick on the table and shouted angrily at me in Japanese. That sure got my adrenaline going. Apparently putting your feet on the table is a big no-no, lesson learned.

The food was amazing even though I didn’t know what half of it was. I was grateful that I (a) am not vegetarian, (b) know how to use chopsticks decently well, and (c) am an adventurous eater. If any one of those were otherwise, things might have been more difficult!

On my last day in Japan I had the whole morning before I needed to head to the airport, so Ryan Yates and I wandered around Nara and saw a bunch of temples, climbed the hill, and such. It’s a stunningly beautiful place with a rich history.

The People

As usual, it’s all about the people. I enjoyed meeting some new people, including (but not limited to):

  • Pablo Buiras and Marco Vassena were my hotel breakfast buddies; it was fun getting to know them a bit.
  • I finally met Dominic Orchard, though I feel like I’ve known his name and known about some of his work for a while.
  • I don’t think I had met Max New before but we had a nice chat about the Scheme enumerations library he helped develop and combinatorial species. I hope to be able to follow up that line of inquiry.
  • As promised, I met everyone who commented on my blog post, including Jürgen Peters (unfortunately we did not get a chance to play go), Andrey Mokhov (who nerd-sniped me with a cool semiring-ish thing with some extra structure — perhaps that will be another blog post), and Jay McCarthy (whom I had actually met before, but we had some nice chats, including one in the airport while waiting for our flight to LAX).
  • I don’t think I had met José Manuel Calderón Trilla before; we had a great conversation over a meal together (along with Ryan Yates) in the Osaka airport while waiting for our flights.
  • I met Diogenes Nunez, who went to my alma mater Williams College. When I taught at Williams a couple years ago I’m pretty sure I heard Diogenes mentioned by the other faculty, so it was fun to get to meet him.
  • Last but certainly not least, I met my coauthor, Piyush Kurur. We collaborated on a paper through the magic of the Internet (Github in particular), and I actually met him in person for the first time just hours before he presented our paper!

My student Ollie Kwizera came for PLMW—it was fun having him there. I only crossed paths with him three or four times, but I think that was all for the best, since he made his own friends and had his own experiences.

Other people who I enjoyed seeing and remember having interesting conversations with include (but I am probably forgetting someone!) Michael Adams, Daniel Bergey, Jan Bracker, Joachim Breitner, David Christiansen, David Darais, Stephen Dolan, Richard Eisenberg, Kenny Foner, Marco Gaboardi, Jeremy Gibbons, John Hughes, David Janin, Neel Krishnaswami, Dan Licata, Andres Löh, Simon Marlow, Tom Murphy, Peter-Michael Osera, Jennifer Paykin, Simon Peyton Jones, Ryan Scott, Mary Sheeran, Mike Sperber, Luite Stegeman, Wouter Swierstra, David Terei, Ryan Trinkle, Tarmo Uustalu, Stephanie Weirich, Nick Wu, Edward Yang, and Ryan Yates. My apologies if I forgot you, just remind me and I’ll add you to the list! I’m amazed and grateful I get to know all these cool people.

The Content

Here are just a few of my favorite talks:

  • I’m a sucker for anything involving geometry and/or random testing and/or pretty pictures, and Ilya Sergey’s talk Growing and Shrinking Polygons for Random testing of Computational Geometry had them all. In my experience, doing effective random testing in any domain beyond basic functions usually requires some interesting domain-specific insights, and Ilya had some cool insights about how to generate and shrink polygons so that small counterexamples for computational geometry algorithms were much more likely to turn up.

  • Idris gets more impressive by the day, and I always enjoy David Christiansen’s talks.

  • Sandra Dylus gave a fun talk, All Sorts of Permutations, with the cute observation that a sorting algorithm equipped with a nondeterministic comparison operator generates permutations (though it goes deeper than that). During the question period someone asked whether there is a way to generate all partitions, and someone sitting next to me suggested using the group function—and indeed, I think this works. I wonder what other sorts of combinatorial objects can be enumerated by this method. In particular I wonder if quicksort with nondeterministic comparisons can be adapted to generate not just all permutations, but all binary trees.

  • I greatly enjoyed TyDe, especially Jeremy Gibbons’ talk on APLicative Programming with Naperian Functors (I don’t think the video is online yet, if there is one). I’ll be serving as co-chair of the TyDe program committee next year, so start thinking about what you would like to submit!

  • There were also some fun talks at FARM, for example, Jay McCarthy’s talk on Bithoven. But I don’t think the FARM videos are uploaded yet. Speaking of FARM, the performance evening was incredible. It will be hard to live up to next year.
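To illustrate the observation from Sandra Dylus’ talk in the list above, here is a small sketch (not necessarily the formulation used in the talk): insertion sort parameterised over a monadic comparison. Instantiating the comparison in the list monad with one that answers both True and False enumerates every permutation exactly once. The names insertM, sortM and coinCmp are made up for this example.

```haskell
-- Insert x into a list using a monadic comparison `le`
-- (True means "x goes before y").
insertM :: Monad m => (a -> a -> m Bool) -> a -> [a] -> m [a]
insertM _  x []       = return [x]
insertM le x (y : ys) = do
  before <- le x y
  if before
    then return (x : y : ys)
    else (y :) <$> insertM le x ys

-- Insertion sort over the same monadic comparison.
sortM :: Monad m => (a -> a -> m Bool) -> [a] -> m [a]
sortM _  []       = return []
sortM le (x : xs) = sortM le xs >>= insertM le x

-- A nondeterministic comparison in the list monad:
-- every comparison answers both True and False.
coinCmp :: a -> a -> [Bool]
coinCmp _ _ = [True, False]
```

With coinCmp, sortM coinCmp [1, 2, 3] yields all six permutations, [[1,2,3],[2,1,3],[2,3,1],[1,3,2],[3,1,2],[3,2,1]]; with a deterministic comparison such as \x y -> [x <= y] it simply sorts.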

by Brent at October 05, 2016 03:36 AM

October 03, 2016

Douglas M. Auclair (geophf)

September 2016 1HaskellADay problems and solutions

by geophf at October 03, 2016 02:04 AM

October 02, 2016

Jasper Van der Jeugt

Patat and Myanmar travels

Presentations in the terminal

At work, I frequently need to give (internal) presentations and demos using video conferencing. I prefer to do these quick-and-dirty presentations in the terminal for a few reasons:

  • I don’t spend time worrying about layout, terminal stuff always looks cool.
  • I want to write markdown if possible.
  • You can have a good “Questions?” slide just by running cowsay 'Questions?'
  • Seamless switching between editor/shell and presentation using tmux.

The last point is important for video conferencing especially. The software we use allows you to share a single window from your desktop. This is pretty neat if you have a multi-monitor setup. However, it does not play well with switching between a PDF viewer and a terminal.

Introducing patat

To this end, I wrote patat (Presentations And The ANSI Terminal) because I was not entirely happy with the available solutions. You can get it from Hackage: cabal install patat.

patat screenshot


You run it simply by passing it the file containing your presentation.

The key features are:

  • Built on Pandoc:

    The software I was using before contained some Markdown parsing bugs. By using Pandoc under the hood, this should not happen.

    Additionally, we get all the input formats Pandoc supports (Literate Haskell is of particular importance to me) and some additional elements like tables and definition lists.

  • Smart slide splitting:

    Most Markdown presentation tools seem to split slides at --- (horizontal rules). This is a bit verbose since you usually start each slide with an h1 as well. patat will check if --- is used and, if it’s not, it will split on h1s instead.

  • Live reload:

    If you run patat --watch, patat will poll the file for changes and reload automatically. This is really handy when you are writing the presentation; I usually use it with a split pane in tmux.

An example of a presentation is:

---
title: This is my presentation
author: Jane Doe
...

# This is a slide

Slide contents.  Yay.

# Important title

Things I like:

- Markdown
- Haskell
- Pandoc
- Traveling
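The smart splitting rule can be approximated on raw lines of text as below. This is a simplified sketch under stated assumptions, not patat’s actual code (patat works on the Pandoc document rather than on lines): a line consisting of exactly --- counts as a horizontal rule, and a line starting with "# " counts as an h1. All function names are hypothetical.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical line classifiers; real Markdown parsing is more involved.
isRule, isH1 :: String -> Bool
isRule l = l == "---"
isH1 l   = "# " `isPrefixOf` l

-- Split on horizontal rules if there are any, otherwise before every h1.
splitSlides :: [String] -> [[String]]
splitSlides ls
  | any isRule ls = splitOnRules ls
  | otherwise     = splitBeforeH1 ls

-- Each rule ends the current slide; the rule line itself is dropped.
splitOnRules :: [String] -> [[String]]
splitOnRules ls = case break isRule ls of
  (slide, [])       -> [slide]
  (slide, _ : rest) -> slide : splitOnRules rest

-- Every h1 starts a new slide (text before the first h1 forms its own chunk).
splitBeforeH1 :: [String] -> [[String]]
splitBeforeH1 []       = []
splitBeforeH1 (l : ls) = (l : body) : splitBeforeH1 rest
  where
    (body, rest) = break isH1 ls
```

Applied to the body of the example above (everything after the metadata), this produces two slides, one per h1.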

How patat came to be

I started writing a simple prototype of patat during downtime at ICFP2016, when I discovered that MDP was not able to parse my presentation correctly.

After ICFP, I flew to Myanmar, and I am currently traveling around the country with my girlfriend. It’s a super interesting place to visit, with a rich history. Now that NLD is the ruling party, I think it is a great time to visit the country responsibly.

Riding around visiting temples in Bagan


However, it is a huge country (the largest in mainland south-east Asia), so there is some downtime traveling on domestic flights, buses and boats. I thought it was a good idea to use that time to improve the tool a bit further, since you don’t need internet to hack on this sort of thing.

Pull requests are welcome as always! Note that I will be slow to respond: for the next three days I will be trekking from Kalaw to Inle Lake, so I have no connectivity (or electricity, for that matter).

Sunset at U Bein bridge


Sidenote: “Patat” is the Flemish word for “potato”. Dutch people also use it to refer to French Fries but I don’t really do that – in Belgium we just call fries “Frieten”.

by Jasper Van der Jeugt at October 02, 2016 12:00 AM

October 01, 2016

JP Moresmau

Everything is broken

This week was, I suppose, fairly typical. Started using a new library, the excellent sqlg that provides the TinkerPop graph API on top of relational databases. Found a bug pretty quickly. Off we go to contribute to another open source project, good for my street cred I suppose. Let’s fork it, and open the source code in IDEA (Community edition). After years of hearing abuse about Eclipse, I’m now trying to use “the best IDE ever” (say all the fanboys) instead. Well, that didn’t go so well: apparently importing a Maven project and resolving the dependencies proves too much for IDEA. I fought with it for a while, then gave up.

Fired up Eclipse, it opened and built the sqlg projects without a hitch. Wrote a test, fixed the bug, raised a PR, got it accepted with a thank you, life is good.

Then I find another bug. Except that upon investigation, it’s not in sqlg, it’s in the actual TinkerPop code. The generics on a map are wrong: there are values that are not instances of the key class (thanks, generics type erasure!). So I can fix it either by changing the method signature or by changing the keys. Both will break existing code. Sigh…

Oh, and the TinkerPop project doesn’t build in Eclipse. The Eclipse compiler chokes on some Java 8 code. Off to the Eclipse bug tracker. Maybe I need to have three different Java IDEs to be able to handle all the projects I may find bugs in.

Everything is broken. Off I go to my own code to add my own bugs.

by JP Moresmau at October 01, 2016 01:09 PM

September 30, 2016


Hackage reliability via mirroring

TL;DR: Hackage now has multiple secure mirrors which can be used fully automatically by clients such as cabal.

In the last several years, as a community, we’ve come to greatly rely on services like Hackage and Stackage being available 24/7. There is always enormous frustration when either of these services goes down.

I think as a community we’ve also been raising our expectations. We’re all used to services like Google which appear to be completely reliable. Of course these are developed and operated by huge teams of professionals, whereas our community services are developed, maintained and operated by comparatively tiny teams on shoestring budgets.

A path to greater reliability

Nevertheless, reliability is important to us all, and so there has been a fair bit of effort put in over the last few years to improve reliability. I’ll talk primarily about Hackage since that is what I am familiar with.

Firstly, a couple years ago Hackage and were moved from super-cheap VM hosting (where our machines tended to go down several times a year) to actually rather good quality hosting provided by Rackspace. Thanks to Rackspace for donating that, and the infrastructure team for getting that organised and implemented. That in itself has made a huge difference: we’ve had far fewer incidents of downtime since then.

Obviously even with good quality hosting we’re still only one step away from unscheduled downtime, because the architecture is too centralised.

There were two approaches that people proposed. One was classic mirroring: spread things out over multiple mirrors for redundancy. The other proposal was to adjust the Hackage architecture somewhat so that while the main active Hackage server runs on some host, the core Hackage archive would be placed on an ultra-reliable 3rd party service like AWS S3, so that this would stay available even if the main server was unavailable.

The approach we decided to take was the classic mirroring one. In some ways this is the harder path, but I think ultimately it gives the best results. This approach also tied in with the new security architecture (The Update Framework – TUF) that we were implementing. The TUF design includes mirrors and works in such a way that mirrors do not need to be trusted. If we (or rather end users) do not have to trust the operators of all the mirrors then this makes a mirroring approach much more secure and much easier to deploy.

Where we are today

The new system has been in beta for some time and we’re just short of flipping the switch for end users. The new Hackage security system is in place on the server side, while on the client side the latest release of cabal-install can be configured to use it, and the development version uses it by default.

There is lots to say about the security system, but that has been covered elsewhere (1, 2, 3) and will continue to be. This post is about mirroring.

For mirrors, we currently have two official public mirrors, and a third in the works. One mirror is operated by FP Complete and the other by Herbert Valerio Riedel. For now, Herbert and I manage the list of mirrors and we will be accepting contributions of further public mirrors. It is also possible to run private mirrors.

Once you are using a release of cabal-install that uses the new system then no further configuration is required to make use of the mirrors (or indeed the security). The list of public mirrors is published by the Hackage server (along with the security metadata) and cabal-install (and other clients using hackage-security) will automatically make use of them.

Reliability in the new system

Both of the initial mirrors are individually using rather reliable hosting. One is on AWS S3 and one on DreamHost S3. Indeed the weak point in the system is no longer the hosting. It is other factors like reliability of the hosts running the agents that do the mirroring, and the ever present possibility of human error.

The fact that the mirrors are hosted and operated independently is the key to improved reliability. We want to reduce the correlation of failures.

Failures in hosting can be mitigated by using multiple providers. Even AWS S3 goes down occasionally. Failures in the machines driving the mirroring are mitigated by using a normal decentralised pull design (rather than pushing out from the centre) and hosting the mirroring agents separately. Failures due to misconfiguration and other human errors are mitigated by having different mirrors operated independently by different people.

So all these failures can and will happen, but if they are not correlated and we have enough mirrors then the system overall can be quite reliable.
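A back-of-the-envelope model makes the argument concrete. It rests on an assumption that is exactly the crux above: each mirror is up independently with some probability. The function names and the numbers in the note below are made up for illustration, not measured Hackage figures.

```haskell
-- Probability that at least one mirror is reachable, assuming each mirror
-- is up with the given probability and failures are independent.
availability :: [Double] -> Double
availability ps = 1 - product [1 - p | p <- ps]

-- A failure mode shared by all mirrors (say, a common misconfiguration),
-- striking with probability q, caps the result at 1 - q no matter how
-- many independent mirrors are added.
withSharedFailure :: Double -> [Double] -> Double
withSharedFailure q ps = (1 - q) * availability ps
```

For example, two independent mirrors that are each up 99% of the time give availability [0.99, 0.99] of roughly 0.9999, whereas withSharedFailure 0.01 never exceeds 0.99 however many such mirrors there are.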

There is of course still the possibility that the upstream server goes down. It is annoying not to be able to upload new packages, but it is far more important that people be able to download packages. The mirrors mean there should be no interruption in the download service, and it gives the upstream server operators the breathing space to fix things.

by duncan at September 30, 2016 03:08 PM

September 29, 2016

Neil Mitchell

Full-time Haskell jobs in London, at Barclays

Summary: I'm hiring 9 Haskell programmers. Email neil.d.mitchell AT to apply.

I work for Barclays, in London, working on a brand new Haskell project. We're looking for nine additional Haskell programmers to come and join the team.

What we offer

A permanent job, writing Haskell, using all the tools you know and love – GHC/Cabal/Stack etc. In the first two weeks in my role I've already written parsers with attoparsec, both Haskell and HTML generators and am currently generating a binding to C with lots of Storable/Ptr stuff. Since it's a new project you will have the chance to help shape the project.

The project itself is to write a risk engine – something that lets the bank calculate the value of the trades it has made, and how things like changes in the market change their value. Risk engines are important to banks and include lots of varied tasks – talking to other systems, overall structuring, actual computation, streaming values, map/reduce.

We'll be following modern but lightweight development principles – including nightly builds, no check-ins to master which break the tests (enforced automatically) and good test coverage.

These positions have attractive salary levels.

What we require

We're looking for the best functional programmers out there, with a strong bias towards Haskell. We have a range of seniorities available to suit even the most experienced candidates. We don't have anything at the very junior end; instead we're looking for candidates that are already fluent and productive. That said, a number of very good Haskell programmers think of themselves as beginners even after many years, so if you're not sure, please get in touch.

We do not require any finance knowledge.

The role is in London, Canary Wharf, and physical presence in the office on a regular basis is required – permanent remote working is not an option.

How to apply

To apply, email neil.d.mitchell AT with a copy of your CV. If you have any questions, email me.

The best way to assess technical ability is to look at code people have written. If you have any packages on Hackage or things on GitHub, please point me at the best projects. If your best code is not publicly available, please describe the Haskell projects you've been involved in.

by Neil Mitchell at September 29, 2016 12:26 PM

Don Stewart (dons)

Haskell dev roles with Strats @ Standard Chartered

The Strats team at Standard Chartered is growing. We have 10 more open roles currently, in a range of areas:

  • Haskell dev for hedging effectiveness analytics, and to build hedging services.
  • Haskell devs for derivatives pricing services. Generic roles using Haskell.
  • Web-experienced Haskell devs for frontends to analytics services written in Haskell. PureScript and/or data viz and user interface skills desirable.
  • Haskell dev for trading algorithms and strategy development.
  • Dev/ops role to extend our continuous integration infrastructure (Haskell+git)
  • Contract analysis and manipulation in Haskell for trade formats (FpML + Haskell).
  • Haskell dev for low latency (< 100 microsecond) components in soft real-time non-linear pricing charges service.

You would join an existing team of 25 Haskell developers in Singapore or London. Generally our roles involve directly working with traders to automate their work and improve their efficiency. We use Haskell for all tasks, either GHC Haskell or our own (“Mu”) implementation, and this is a rare chance to join a large, experienced Haskell dev team.

We offer permanent or contractor positions, at Director and Associate Director level, with very competitive compensation. Demonstrated experience in typed FP (Haskell, OCaml, F#, etc.) is required.

All roles require some physical presence in either Singapore or London, and we offer flexibility within these constraints (with work from home available). No financial background is required or assumed.

More info about our development process is in the 2012 PADL keynote, and a 2013 HaskellCast interview.

If this sounds exciting to you, please send your PDF resume to me – donald.stewart <at>

Tagged: jobs

by Don Stewart at September 29, 2016 08:17 AM


Sharing, Space Leaks, and Conduit and friends

TL;DR: Sharing conduit values leads to space leaks. Make sure that conduits are completely reconstructed on every call to runConduit; this implies we have to be careful not to create any (potentially large) conduit CAFs (skip to the final section “Avoiding space leaks” for some details on how to do this). Similar considerations apply to other streaming libraries and indeed any Haskell code that uses lazy data structures to drive computation.


We use large lazy data structures in Haskell all the time to drive our programs. For example, consider

import Control.Monad (forM_)

main1 :: IO ()
main1 = forM_ [1..5] $ \_ -> mapM_ print [1 .. 1000000]

It’s quite remarkable that this works and that this program runs in constant memory. But this stands on a delicate cusp. Consider the following minor variation on the above code:

ni_mapM_ :: (a -> IO b) -> [a] -> IO ()
{-# NOINLINE ni_mapM_ #-}
ni_mapM_ = mapM_

main2 :: IO ()
main2 = forM_ [1..5] $ \_ -> ni_mapM_ print [1 .. 1000000]

This program runs, but unlike main1, it has a maximum residency of 27 MB; in other words, this program suffers from a space leak. As it turns out, main1 was running in constant memory because the optimizer was able to eliminate the list altogether (due to the fold/build rewrite rule), but it is unable to do so in main2.

But why is main2 leaking? In fact, we can recover constant space behaviour by recompiling the code with -fno-full-laziness. The full laziness transformation is effectively turning main2 into

longList :: [Integer]
longList = [1 .. 1000000]

main3 :: IO ()
main3 = forM_ [1..5] $ \_ -> ni_mapM_ print longList

The first iteration of the forM_ loop constructs the list, which is then retained to be used by the next iterations. Hence, the large list is retained for the duration of the program, which is the aforementioned space leak.
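The -fno-full-laziness flag can also be set for a single module with an OPTIONS_GHC pragma rather than for the whole build. The sketch below is a variant of main2; the hypothetical sumFiveTimes sums the elements into an IORef instead of printing them, so its result is easy to check. With the pragma, the [1 .. n] list should no longer be floated out of the loop body and shared between iterations; whether that removes every leak in a larger program depends on what else is shared.

```haskell
{-# OPTIONS_GHC -fno-full-laziness #-}
import Control.Monad (forM_)
import Data.IORef (modifyIORef', newIORef, readIORef)

ni_mapM_ :: (a -> IO b) -> [a] -> IO ()
{-# NOINLINE ni_mapM_ #-}
ni_mapM_ = mapM_

-- Same loop shape as main2: five passes over [1 .. n]. Full laziness may
-- float [1 .. n] out of the lambda (it does not mention the loop variable),
-- so that all passes share one list; the pragma above disables that.
sumFiveTimes :: Integer -> IO Integer
sumFiveTimes n = do
  total <- newIORef 0
  forM_ [1 .. 5 :: Int] $ \_ ->
    ni_mapM_ (\x -> modifyIORef' total (+ x)) [1 .. n]
  readIORef total
```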

The full laziness optimization is taking away our ability to control when data structures are not shared. That ability is crucial when we have actions driven by large lazy data structures. One particularly important example of such lazy structures that drive computation are conduits or pipes. For example, consider the following conduit code:

import Control.Monad.IO.Class (liftIO)
import qualified Data.Conduit as C

countConduit :: Int -> C.Sink Char IO ()
countConduit cnt = do
    mi <- C.await
    case mi of
      Nothing -> liftIO (print cnt)
      Just _  -> countConduit $! cnt + 1

getConduit :: Int -> C.Source IO Char
getConduit 0 = return ()
getConduit n = do
    ch <- liftIO getChar
    C.yield ch
    getConduit (n - 1)

Here countConduit is a sink that counts the characters it receives from upstream, and getConduit n is a conduit that reads n characters from the console and passes them downstream.

To illustrate what might go wrong, we will use the following exception handler throughout this blog post5:

{-# LANGUAGE ScopedTypeVariables #-}
import Control.Exception (SomeException, try)

retry :: IO a -> IO a
retry io = do
    ma <- try io
    case ma of
      Right a -> return a
      Left (_ :: SomeException) -> retry io

The important point to notice about this exception handler is that it retains a reference to the action io as it executes that action, since it might potentially have to execute it again if an exception is thrown. However, all the space leaks we discuss in this blog post arise even when an exception is never thrown and hence the action is run only once; simply maintaining a reference to the action until the end of the program is enough to cause the space leak.
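To see why retry has to keep io alive, it helps to watch it re-run the very same action value. The sketch below repeats the retry definition (with the extension and imports it needs) and pairs it with a hypothetical flaky action that fails a given number of times before succeeding.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Exception (SomeException, throwIO, try)
import Data.IORef (IORef, atomicModifyIORef', newIORef)

retry :: IO a -> IO a
retry io = do
  ma <- try io
  case ma of
    Right a -> return a
    Left (_ :: SomeException) -> retry io  -- io is still referenced here

-- Hypothetical flaky action: throws on its first `failures` runs.
flaky :: IORef Int -> Int -> IO String
flaky attempts failures = do
  n <- atomicModifyIORef' attempts (\k -> (k + 1, k + 1))
  if n <= failures
    then throwIO (userError ("attempt " ++ show n ++ " failed"))
    else return ("succeeded on attempt " ++ show n)
```

With a fresh counter starting at 0, retry (flaky ref 2) runs the same action three times and returns "succeeded on attempt 3". In the space leak examples the action never throws, but retry cannot know that in advance.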

If we use this exception handler as follows:

main :: IO ()
main = retry $ C.runConduit $ getConduit 1000000 C.=$= countConduit 0

we again end up with a large space leak, this time of type Pipe and ->Pipe (conduit’s internal type).

Although the values that stream through the conduit come from IO, the conduit itself is fully constructed and retained in memory. In this blog post we examine what exactly is being retained here, and why. We will finish with some suggestions on how to avoid such space leaks, although sadly there is no easy answer. Note that these problems are not specific to the conduit library, but apply equally to all other similar libraries.

We will not assume any knowledge of conduit but start from first principles; however, if you have never used any of these libraries before this blog post is probably not the best starting point; you might for example first want to watch my presentation Lazy I/O and Alternatives in Haskell.


Before we look at the more complicated case, let’s first consider another program using just lists:

main :: IO ()
main = retry $ ni_mapM_ print [1..1000000]

This program suffers from a space leak for similar reasons to the example with lists we saw in the introduction, but it’s worth spelling out the details here: where exactly is the list being maintained?

Recall that the IO monad is effectively a state monad over a token RealWorld state (if that doesn’t make any sense to you, you might want to read ezyang’s article Unraveling the mystery of the IO monad first). Hence, ni_mapM_ (just a wrapper around mapM_) is really a function of three arguments: the action to execute for every element of the list, the list itself, and the world token. That means that

ni_mapM_ print [1..1000000]

is a partial application, and hence we are constructing a PAP object. Such a PAP object is a runtime representation of a partial application of a function; it records the function we want to execute (ni_mapM_), as well as the arguments we have already provided. It is this PAP object that we give to retry, and which retry retains until the action completes because it might need it in the exception handler. The long list in turn is being retained because there is a reference from the PAP object to the list (as one of the arguments that we provided).

Full laziness does not make a difference in this example: whether or not the [1 .. 1000000] expression gets floated out, the PAP still references the list.
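A toy model can make the three-argument view concrete. The sketch below is not GHC’s actual definition of IO (which threads an unboxed State# RealWorld token); ToyIO, World, tick, toyMapM_ and pending are invented for illustration. Once ToyIO is unwrapped, toyMapM_ visibly takes three arguments, and supplying only the first two gives a partial application that holds on to the whole list until the world token arrives.

```haskell
-- Hypothetical world token; here it counts how many effects have run.
newtype World = World Int

-- Toy IO: a function from the current world to a new world and a result.
newtype ToyIO a = ToyIO { runToyIO :: World -> (World, a) }

-- One "effect": bump the counter in the world.
tick :: ToyIO ()
tick = ToyIO (\(World n) -> (World (n + 1), ()))

-- mapM_ with ToyIO unwrapped: action, list, and world token.
toyMapM_ :: (a -> ToyIO ()) -> [a] -> World -> (World, ())
toyMapM_ _ []       w = (w, ())
toyMapM_ f (x : xs) w =
  let (w', ()) = runToyIO (f x) w
  in toyMapM_ f xs w'

-- Only two of the three arguments are supplied: a partial application
-- (at runtime typically a PAP) that references the list until it is run.
pending :: World -> (World, ())
pending = toyMapM_ (const tick) [1 .. 1000 :: Int]
```

Running pending (World 0) performs 1000 ticks and returns a world whose counter is 1000.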

Reminder: Conduits/Pipes

Just to make sure we don’t get lost in the details, let’s define a simple conduit-like or pipe-like data structure:

data Pipe i o m r =
    Yield o (Pipe i o m r)
  | Await (Either r i -> Pipe i o m r)
  | Effect (m (Pipe i o m r))
  | Done r

A pipe or a conduit is a free monad which provides three actions:

  1. Yield a value downstream
  2. Await a value from upstream
  3. Execute an effect in the underlying monad.

The argument to Await is passed an Either; we give it a Left value if upstream terminated, or a Right value if upstream yielded a value.1

This definition is not quite the same as the one used in real streaming libraries and ignores various difficulties (in particular exception safety, as well as other features such as leftovers); however, it will suffice for the sake of this blog post. We will use the terms “conduit” and “pipe” interchangeably in the remainder of this article.


The various Pipe constructors differ in their memory behaviour and the kinds of space leaks that they can create. We therefore consider them one by one. We will start with sources, because their memory behaviour is relatively straightforward.

A source is a pipe that only ever yields values downstream.2 For example, here is a source that yields the values [n, n-1 .. 1]:

yieldFrom :: Int -> Pipe i Int m ()
yieldFrom 0 = Done ()
yieldFrom n = Yield n $ yieldFrom (n - 1)

We could “run” such a pipe as follows:

printYields :: Show o => Pipe i o m () -> IO ()
printYields (Yield o k) = print o >> printYields k
printYields (Done ())   = return ()

If we then run the following program:

main :: IO ()
main = retry $ printYields (yieldFrom 1000000)

we get a space leak. This space leak is very similar to the space leak we discussed in section Lists above, with Done () playing the role of the empty list and Yield playing the role of (:). As in the list example, this program has a space leak independent of full laziness.
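The correspondence with lists can be made explicit. The sketch below repeats the Pipe type and yieldFrom from above and adds two hypothetical conversion functions; for pure sources (no Await or Effect nodes) they are inverse to each other, so a retained source occupies memory much like a retained list.

```haskell
data Pipe i o m r =
    Yield o (Pipe i o m r)
  | Await (Either r i -> Pipe i o m r)
  | Effect (m (Pipe i o m r))
  | Done r

yieldFrom :: Int -> Pipe i Int m ()
yieldFrom 0 = Done ()
yieldFrom n = Yield n $ yieldFrom (n - 1)

-- Yield plays the role of (:), Done () the role of [].
sourceToList :: Pipe i o m () -> [o]
sourceToList (Yield o k) = o : sourceToList k
sourceToList (Done ())   = []
sourceToList _           = error "sourceToList: not a pure source"

listToSource :: [o] -> Pipe i o m ()
listToSource = foldr Yield (Done ())
```

For example, sourceToList (yieldFrom 3) is [3, 2, 1].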


A sink is a conduit that only ever awaits values from upstream; it never yields anything downstream.2 The memory behaviour of sinks is considerably more subtle than the memory behaviour of sources and we will examine it in detail. As a reminder, the constructor for Await is

data Pipe i o m r = Await (Either r i -> Pipe i o m r) | ...

As an example of a sink, consider this pipe that counts the number of characters it receives:

countChars :: Int -> Pipe Char o m Int
countChars cnt =
    Await $ \mi -> case mi of
      Left  _ -> Done cnt
      Right _ -> countChars $! cnt + 1

We could “run” such a sink by feeding it a bunch of characters; say, 10000000 of them:

feed :: Char -> Pipe Char o m Int -> IO ()
feed ch = feedFrom 10000000
  where
    feedFrom :: Int -> Pipe Char o m Int -> IO ()
    feedFrom _ (Done r)  = print r
    feedFrom 0 (Await k) = feedFrom 0     $ k (Left 0)
    feedFrom n (Await k) = feedFrom (n-1) $ k (Right ch)

If we run this as follows and compile with optimizations enabled, we once again end up with a space leak:

main :: IO ()
main = retry $ feed 'A' (countChars 0)

We can recover constant space behaviour by disabling full laziness; however, the effect of full laziness on this example is a lot more subtle than the example we described in the introduction.

Full laziness

Let’s take a brief moment to describe what full laziness is, exactly. Full laziness is one of the optimizations that ghc applies by default when optimizations are enabled; it is described in the paper “Let-floating: moving bindings to give faster programs”. The idea is simple; if we have something like

f = \x y -> let e = .. -- expensive computation involving x but not y
            in ..

full laziness floats the let binding out over the lambda to get

f = \x -> let e = .. in \y -> ..

This potentially avoids unnecessarily recomputing e for different values of y. Full laziness is a useful transformation; for example, it turns something like

f x y = ..
  where
    go = .. -- some local function

into

f x y   = ..
f_go .. = ..

which avoids allocating a function closure every time f is called. It is also quite a notorious optimization, because it can create unexpected CAFs (constant applicative forms; top-level definitions of values); for example, if you write

nthPrime :: Int -> Int
nthPrime n = allPrimes !! n
  where
    allPrimes :: [Int]
    allPrimes = ..

you might expect nthPrime to recompute allPrimes every time it is invoked; but full laziness might move that allPrimes definition to the top-level, resulting in a large space leak (the full list of primes would be retained for the lifetime of the program). This goes back to the point we made in the introduction: full laziness is taking away our ability to control when values are not shared.
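For illustration, here is roughly what the floated program amounts to, written by hand. This is a sketch rather than GHC’s actual output, and the trial-division generator is a hypothetical stand-in for the elided definition of allPrimes.

```haskell
-- Stand-in definition (hypothetical): trial division over [2 ..].
-- As a top-level value (a CAF), every element demanded so far stays in
-- memory for as long as nthPrime may still be called.
allPrimes :: [Int]
allPrimes = sieve [2 ..]
  where
    sieve []       = []
    sieve (p : xs) = p : sieve [x | x <- xs, x `mod` p /= 0]

-- nthPrime now just indexes into the shared list.
nthPrime :: Int -> Int
nthPrime n = allPrimes !! n
```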

Full laziness versus sinks

Back to the sink example. What exactly is full laziness doing here? Is it constructing a CAF we weren’t expecting? Actually, no; it’s more subtle than that. Our definition of countChars was

countChars :: Int -> Pipe Char o m Int
countChars cnt =
    Await $ \mi -> case mi of
      Left  _ -> Done cnt
      Right _ -> countChars $! cnt + 1

Full laziness is turning this into something more akin to

countChars' :: Int -> Pipe Char o m Int
countChars' cnt =
    let k = countChars' $! cnt + 1
    in Await $ \mi -> case mi of
                        Left  _ -> Done cnt
                        Right _ -> k

Note how the computation of countChars' $! cnt + 1 has been floated over the lambda; ghc can do that, since this expression does not depend on mi. So in memory, the countChars 0 expression from our main function (retained, if you recall, because of the surrounding retry wrapper) develops something like this. It starts off as a simple thunk.

Then when feed matches on it, it gets reduced to weak head normal form, exposing the top-most Await constructor.

The body of the await is a function closure pointing to the function inside countChars (\mi -> case mi ..), which has countChars $! (cnt + 1) as an unevaluated thunk in its environment. Evaluating it one step further yields another Await constructor, whose own function closure in turn holds an unevaluated thunk for the step after that.

So where for a source the data structure in memory was a straightforward “list” consisting of Yield nodes, for a sink the situation is more subtle: we build up a chain of Await constructors, each of which points to a function closure which in its environment has a reference to the next Await constructor. This wouldn’t matter of course if the garbage collector could clean up after us; but if the conduit itself is shared, then this results in a space leak.

Without full laziness, incidentally, evaluating countChars 0 yields

and the chain stops there; the only thing in the function closure now is cnt. Since we don’t allocate the next Await constructor before running the function, we never construct a chain of Await constructors and hence we have no space leak.
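The following is a minimal, self-contained sketch of the two versions side by side. It assumes a Pipe type reconstructed from how the constructors are used in this post (Await receives Left r when upstream has finished and Right i for the next input); feedPure is a hypothetical pure driver in the spirit of the feed functions used here. Both sinks compute the same result; they differ only in what the heap looks like while they run.

```haskell
-- Simplified Pipe type, reconstructed from how its constructors are used in
-- this post (conduit's real type has an extra parameter, see footnote 1).
data Pipe i o m r
  = Yield o (Pipe i o m r)              -- emit a value downstream
  | Await (Either r i -> Pipe i o m r)  -- Left r: upstream done; Right i: next input
  | Effect (m (Pipe i o m r))           -- run an effect, then continue
  | Done r                              -- finished with result r

-- The sink as written in the post.
countChars :: Int -> Pipe Char o m Int
countChars cnt =
  Await $ \mi -> case mi of
    Left  _ -> Done cnt
    Right _ -> countChars $! cnt + 1

-- The same sink after (manual) full laziness: k does not mention mi, so it
-- is bound outside the lambda. Each Await closure now references the next
-- (lazily constructed) Await, and a retained root keeps the whole chain.
countChars' :: Int -> Pipe Char o m Int
countChars' cnt =
  let k = countChars' $! cnt + 1
  in Await $ \mi -> case mi of
       Left  _ -> Done cnt
       Right _ -> k

-- Hypothetical pure driver: feed n copies of 'A', then signal end of input.
feedPure :: Int -> Pipe Char o m Int -> Int
feedPure _ (Done r)  = r
feedPure 0 (Await k) = feedPure 0 (k (Left 0))
feedPure n (Await k) = feedPure (n - 1) (k (Right 'A'))
feedPure _ _         = error "feedPure: unexpected Yield or Effect"
```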

Depending on values

It is tempting to think that if the conduit varies its behaviour depending on the values it receives from upstream, the same chain of Await constructors cannot be constructed and we avoid a space leak. For example, consider this variation on countChars which only counts spaces:

countSpaces :: Int -> Pipe Char o m Int
countSpaces cnt =
    Await $ \mi ->
      case mi of
        Left  _   -> Done cnt
        Right ' ' -> countSpaces $! cnt + 1
        Right _   -> countSpaces $! cnt

If we substitute this conduit for countChars in the previous program, do we fare any better? Alas, the memory behaviour of this conduit, when shared, is in fact far, far worse.

The reason is that both countSpaces $! cnt + 1 and countSpaces $! cnt can be floated out by the full laziness optimization. Hence every Await constructor will now have a function closure in its payload with two thunks, one for each alternative way to execute the conduit. What’s more, both of these thunks are retained as long as we retain a reference to the top-level conduit.
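A minimal sketch of what this looks like after the float, written out by hand. It assumes the same reconstructed Pipe type as before; countSpaces' and the driver feedString are hypothetical names used only for illustration.

```haskell
-- Simplified Pipe type, reconstructed from how its constructors are used in
-- this post (conduit's real type has an extra parameter, see footnote 1).
data Pipe i o m r
  = Yield o (Pipe i o m r)
  | Await (Either r i -> Pipe i o m r)
  | Effect (m (Pipe i o m r))
  | Done r

-- The sink as written in the post.
countSpaces :: Int -> Pipe Char o m Int
countSpaces cnt =
  Await $ \mi ->
    case mi of
      Left  _   -> Done cnt
      Right ' ' -> countSpaces $! cnt + 1
      Right _   -> countSpaces $! cnt

-- After (manual) full laziness: neither continuation mentions mi, so both
-- are bound outside the lambda. Each Await closure keeps two thunks alive,
-- and each of those can grow its own chain: one per path through the input.
countSpaces' :: Int -> Pipe Char o m Int
countSpaces' cnt =
  let kSpace = countSpaces' $! cnt + 1
      kOther = countSpaces' $! cnt
  in Await $ \mi -> case mi of
       Left  _   -> Done cnt
       Right ' ' -> kSpace
       Right _   -> kOther

-- Hypothetical pure driver: feed the given string, then signal end of input.
feedString :: String -> Pipe Char o m Int -> Int
feedString _        (Done r)  = r
feedString []       (Await k) = feedString [] (k (Left 0))
feedString (c : cs) (Await k) = feedString cs (k (Right c))
feedString _        _         = error "feedString: unexpected Yield or Effect"
```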

We can neatly illustrate this using the following program:

main :: IO ()
main = do
    let count = countSpaces 0
    feed ' ' count
    feed ' ' count
    feed ' ' count
    feed 'A' count
    feed 'A' count
    feed 'A' count

The first feed ' ' explores a path through the conduit where every character is a space; this constructs (and retains) one long chain of Await constructors. The next two calls to feed ' ', however, walk over the exact same path, so memory usage does not increase for a while. But then we explore a different path, in which every character is a non-space, and memory usage goes up again. During the second call to feed 'A' memory usage is stable again, until we start executing the last feed 'A', at which point the garbage collector can finally start cleaning things up:

What’s worse, there is an infinite number of paths through this conduit. Every different combination of space and non-space characters will explore a different path, leading to combinatorial explosion and terrifying memory usage.

Effects

The precise situation for effects depends on the underlying monad, but let’s explore one common case: IO. As we will see, for the case of IO the memory behaviour of Effect is actually similar to the memory behaviour of Await. Recall that the Effect constructor is defined as

data Pipe i o m r = Effect (m (Pipe i o m r)) | ...

Consider this simple pipe that prints the numbers [n, n-1 .. 1]:

printFrom :: Int -> Pipe i o IO ()
printFrom 0 = Done ()
printFrom n = Effect $ print n >> return (printFrom (n - 1))

We might run such a pipe using the following function (see footnote 3):

runPipe :: Show r => Pipe i o IO r -> IO ()
runPipe (Done r)   = print r
runPipe (Effect k) = runPipe =<< k
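A runnable sketch of printFrom and runPipe, assuming the same reconstructed Pipe type. To make the behaviour observable, the hypothetical emitFrom generalises printFrom over the action run for each number, and collectFrom (also hypothetical) records the numbers in an IORef. runPipe is copied from above, with an extra catch-all clause for pipes that still await or yield.

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)

-- Simplified Pipe type, reconstructed from how its constructors are used in
-- this post (conduit's real type has an extra parameter, see footnote 1).
data Pipe i o m r
  = Yield o (Pipe i o m r)
  | Await (Either r i -> Pipe i o m r)
  | Effect (m (Pipe i o m r))
  | Done r

-- printFrom, generalised over the action run for each number.
emitFrom :: (Int -> IO ()) -> Int -> Pipe i o IO ()
emitFrom _    0 = Done ()
emitFrom emit n = Effect $ emit n >> return (emitFrom emit (n - 1))

-- The pipe from the post: prints n, n-1 .. 1.
printFrom :: Int -> Pipe i o IO ()
printFrom = emitFrom print

-- runPipe from the post: runs each Effect in turn, then prints the result.
runPipe :: Show r => Pipe i o IO r -> IO ()
runPipe (Done r)   = print r
runPipe (Effect k) = runPipe =<< k
runPipe _          = error "runPipe: pipe still awaits or yields"

-- Run the pipe, collecting the emitted numbers (runPipe also prints "()").
collectFrom :: Int -> IO [Int]
collectFrom n = do
  ref <- newIORef []
  runPipe (emitFrom (\x -> modifyIORef' ref (x :)) n)
  reverse <$> readIORef ref
```

For example, runPipe (printFrom 3) prints 3, 2 and 1, followed by the final () result.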

In order to understand the memory behaviour of Effect, we need to understand how the underlying monad behaves. For the case of IO, IO actions are state transformers over a token RealWorld state. This means that the Effect constructor actually looks rather similar to the Await constructor. Both have a function as payload; Await a function that receives an upstream value, and Effect a function that receives a RealWorld token. To illustrate what printFrom might look like with full laziness, we can rewrite it as

printFrom :: Int -> Pipe i o IO ()
printFrom n =
    let k = printFrom (n - 1)
    in case n of
         0 -> Done ()
         _ -> Effect $ IO $ \st -> unIO (print n >> return k) st

If we visualize the heap (using ghc-vis), we can see that it does indeed look very similar to the picture for Await:

Increasing sharing

If we cannot guarantee that our conduits are not shared, then perhaps we should try to increase sharing instead. If we can avoid allocating these chains of pipes, but instead have pipes refer back to themselves, perhaps we can avoid these space leaks.

In theory, this is possible. For example, when using the conduit library, we could try to take advantage of monad transformers and rewrite our feed source and our count sink as:

feed :: Source IO Char
feed = evalStateC 1000000 go
  where
    go :: Source (StateT Int IO) Char
    go = do
      st <- get
      if st == 0
        then return ()
        else do put $! (st - 1) ; yield 'A' ; go

count :: Sink Char IO Int
count = evalStateC 0 go
  where
    go :: Sink Char (StateT Int IO) Int
    go = do
        mi <- await
        case mi of
          Nothing -> get
          Just _  -> modify' (+1) >> go

In both definitions go refers back to itself directly, with no arguments; hence, it ought to be self-referential, without any long chain of sources or sinks ever being constructed. This works; the following program runs in constant space:

main :: IO ()
main = retry $ print =<< (feed $$ count)
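The same self-referential idea can be sketched without the conduit library, using the simplified Pipe type reconstructed from this post. This is an illustration under assumptions, not the conduit code above: the hypothetical countSelf keeps its running count in an IORef (playing the role of the StateT state), so that go takes no arguments and refers back to itself. Evaluating it yields a single cyclic Await node instead of an ever-growing chain. feedIO is likewise a hypothetical driver.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Simplified Pipe type, reconstructed from how its constructors are used in
-- this post (conduit's real type has an extra parameter, see footnote 1).
data Pipe i o m r
  = Yield o (Pipe i o m r)
  | Await (Either r i -> Pipe i o m r)
  | Effect (m (Pipe i o m r))
  | Done r

-- A self-referential counting sink: 'go' has no arguments and its
-- continuation returns 'go' itself, so the heap holds one cyclic Await.
countSelf :: IORef Int -> Pipe Char o IO Int
countSelf ref = go
  where
    go = Await $ \mi -> case mi of
      Left  _ -> Effect (Done <$> readIORef ref)
      Right _ -> Effect (modifyIORef' ref (+ 1) >> return go)

-- Feed n copies of 'A', running effects in IO, then signal end of input.
feedIO :: Int -> Pipe Char o IO Int -> IO Int
feedIO _ (Done r)    = return r
feedIO n (Effect m)  = m >>= feedIO n
feedIO 0 (Await k)   = feedIO 0 (k (Left 0))
feedIO n (Await k)   = feedIO (n - 1) (k (Right 'A'))
feedIO _ (Yield _ _) = error "feedIO: unexpected Yield"
```

One design caveat: unlike the evalStateC version, which starts from a fresh state on every run, this sink reuses whatever IORef it is given, so each run (for instance each retry) needs a freshly allocated IORef.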

However, this kind of code is extremely brittle. For example, consider the following minor variation on count:

count :: Sink Char IO Int
count = evalStateC 0 go
  where
    go :: Sink Char (StateT Int IO) Int
    go = withValue $ \_ -> modify' (+1) >> go

    withValue :: (i -> Sink i (StateT Int IO) Int)
              -> Sink i (StateT Int IO) Int
    withValue k = do
      mch <- await
      case mch of
        Nothing -> get
        Just ch -> k ch

This seems like a straightforward variation, but this code in fact suffers from a space leak again (see footnote 4). The optimized core version of this variation of count looks something like this:

count :: ConduitM Char Void (StateT Int IO) Int
count = ConduitM $ \k ->
    let countRec = modify' (+ 1) >> count
    in unConduitM await $ \mch ->
         case mch of
           Nothing -> unConduitM get      k
           Just _  -> unConduitM countRec k

In the conduit library, ConduitM is a codensity transformation of an internal Pipe datatype; the latter corresponds more or less to the Pipe datastructure we’ve been describing here. But we can ignore these details: the important point here is that this has the same typical shape that we’ve been studying above, with an allocation inside a lambda but before an await.

We can fix it by writing our code as

count :: Sink Char IO Int
count = evalStateC 0 go
  where
    go :: Sink Char (StateT Int IO) Int
    go = withValue goWithValue

    goWithValue :: Char -> Sink Char (StateT Int IO) Int
    goWithValue _ = modify' (+1) >> go

    withValue :: (i -> Sink i (StateT Int IO) Int)
              -> Sink i (StateT Int IO) Int
    withValue k = do
      mch <- await
      case mch of
        Nothing -> get
        Just ch -> k ch

Ironically, it would seem that full laziness here could have helped us by floating out that modify' (+1) >> go expression for us. The reason that it didn’t is probably related to the exact way the k continuation is threaded through in the compiled code (I simplified a bit above). Whatever the reason, tracking down problems like these is difficult and incredibly time-consuming; I’ve spent many, many hours studying the output of -ddump-simpl and comparing before and after pictures. Not a particularly productive way to spend my time, and this kind of low-level thinking is not what I want to do when writing application-level Haskell code!

Composed pipes

Normally we construct pipes by composing components together. Composition of pipes can be defined as

(=$=) :: Monad m => Pipe a b m r -> Pipe b c m r -> Pipe a c m r
{-# NOINLINE (=$=) #-}
_         =$= Done   r   = Done r
u         =$= Effect   d = Effect $ (u =$=) <$> d
u         =$= Yield  o d = Yield o (u =$= d)
Yield o u =$= Await    d = u =$= d (Right o)
Await   u =$= Await    d = Await $ \ma -> u ma =$= Await d
Effect  u =$= Await    d = Effect $ (=$= Await d) <$> u
Done  r   =$= Await    d = Done r =$= d (Left r)

The downstream pipe “is in charge”; the upstream pipe only plays a role when downstream awaits. This mirrors Haskell’s lazy “demand-driven” evaluation model.

Typically we only run self-contained pipes that don’t have any Awaits or Yields left (after composition), so we are only left with Effects. The good news is that if the pipe components don’t consist of long chains, then their composition won’t either; at every Effect point we wait for either upstream or downstream to complete its effect; only once that is done do we receive the next part of the pipeline and hence no chains can be constructed.
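A minimal runnable sketch of composition, assuming the reconstructed Pipe type; (=$=) and countChars are copied from this post, while fromList (a source that yields the elements of a list) and runPure (a runner for self-contained pipes over Identity) are hypothetical helpers.

```haskell
import Data.Functor.Identity (Identity (..))

-- Simplified Pipe type, reconstructed from how its constructors are used in
-- this post (conduit's real type has an extra parameter, see footnote 1).
data Pipe i o m r
  = Yield o (Pipe i o m r)
  | Await (Either r i -> Pipe i o m r)
  | Effect (m (Pipe i o m r))
  | Done r

-- Composition as defined in the post: downstream is in charge, and upstream
-- only runs when downstream awaits.
(=$=) :: Monad m => Pipe a b m r -> Pipe b c m r -> Pipe a c m r
{-# NOINLINE (=$=) #-}
_         =$= Done   r   = Done r
u         =$= Effect   d = Effect $ (u =$=) <$> d
u         =$= Yield  o d = Yield o (u =$= d)
Yield o u =$= Await    d = u =$= d (Right o)
Await   u =$= Await    d = Await $ \ma -> u ma =$= Await d
Effect  u =$= Await    d = Effect $ (=$= Await d) <$> u
Done  r   =$= Await    d = Done r =$= d (Left r)

-- Source that yields each list element, then finishes with r.
fromList :: [o] -> r -> Pipe i o m r
fromList xs r = foldr Yield (Done r) xs

countChars :: Int -> Pipe Char o m Int
countChars cnt =
  Await $ \mi -> case mi of
    Left  _ -> Done cnt
    Right _ -> countChars $! cnt + 1

-- Run a self-contained pipe (no Yield or Await left) in Identity.
runPure :: Pipe i o Identity r -> r
runPure (Done r)   = r
runPure (Effect m) = runPure (runIdentity m)
runPure _          = error "runPure: pipe is not self-contained"
```

For example, runPure (fromList "hello" 0 =$= countChars 0) evaluates to 5: each Yield from upstream is handed to the downstream Await as a Right, and the upstream Done finally arrives as a Left.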

On the other hand, composition of course doesn’t get rid of these space leaks either. As an example, we can define a pipe equivalent to the getConduit from the introduction:

getN :: Int -> Pipe i Char IO Int
getN 0 = Done 0
getN n = Effect $ do
           ch <- getChar
           return $ Yield ch (getN (n - 1))

and then compose getN and countChars to get a runnable program:

main :: IO ()
main = retry $ runPipe $ getN 1000000 =$= countChars 0

This program suffers from the same space leaks as before, because the individual pipeline components are kept in memory. As in the sink example, memory behaviour would be much worse still if there were different paths through the conduit network.

Conclusions

At Well-Typed we’ve been developing an application for a client to do streaming data processing. We’ve been using the conduit library to do this, with great success. However, occasionally space leaks arise that are difficult to fix, and even harder to track down. Of course, we’re not the first to suffer from these problems; for example, see ghc ticket #9520 or issue #6 for the streaming library (a library similar to conduit).

In this blog post we described how such space leaks arise. Similar space leaks can arise with any kind of code that uses large lazy data structures to drive computation, including other streaming libraries such as pipes or streaming, but the problem is not restricted to streaming libraries.

The conduit library tries to avoid these intermediate data structures by means of fusion rules; naturally, when this is successful the problem is avoided. We can increase the likelihood of this happening by using combinators such as folds etc., but in general the intermediate pipe data structures are difficult to avoid.

The core of the problem is that in the presence of the full laziness optimization we have no control over when values are not shared. While it is possible in theory to write code in such a way that the lazy data structures are self-referential and hence keeping them in memory does not cause a space leak, in practice the resulting code is too brittle and writing code like this is just too difficult. Just to provide one more example, in our application we had some code that looked like this:

go x@(C y _) = case y of
         Constr1 -> doSomethingWith x >> go
         Constr2 -> doSomethingWith x >> go
         Constr3 -> doSomethingWith x >> go
         Constr4 -> doSomethingWith x >> go
         Constr5 -> doSomethingWith x >> go

This worked and ran in constant space. But after adding a single additional clause to this pattern match, suddenly we reintroduced a space leak again:

go x@(C y _) = case y of
         Constr1 -> doSomethingWith x >> go
         Constr2 -> doSomethingWith x >> go
         Constr3 -> doSomethingWith x >> go
         Constr4 -> doSomethingWith x >> go
         Constr5 -> doSomethingWith x >> go
         Constr6 -> doSomethingWith x >> go

This was true even when that additional clause was never used; it had nothing to do with the change in the runtime behaviour of the code. Instead, when we added the additional clause some limit got exceeded in ghc’s bowels and suddenly something got allocated that wasn’t getting allocated before.

Full laziness can be disabled using -fno-full-laziness, but sadly this throws out the baby with the bathwater. In many cases, full laziness is a useful optimization. In particular, there is probably never any point in allocating a thunk for something that is entirely static. We saw one such example above; it’s unexpected that when we write

go = withValue $ \_ -> modify' (+1) >> go

we get memory allocations corresponding to the modify' (+1) >> go expression.

Avoiding space leaks

So how do we avoid these space leaks? The key idea is pretty simple: we have to make sure the conduit is fully reconstructed on every call to runConduit. Conduit code typically looks like

runMyConduit :: Some -> Args -> IO r
runMyConduit some args =
    runConduit $ stage1 some
             =$= stage2 args
             =$= stageN

You should put all top-level calls to runConduit into a module of their own, and disable full laziness in that module by declaring

{-# OPTIONS_GHC -fno-full-laziness #-}

at the top of the file. This means the computation of the conduit (stage1 =$= stage2 .. =$= stageN) won’t get floated to the top and the conduit will be recomputed on every invocation of runMyConduit (note that this relies on runMyConduit having some arguments; if it doesn’t, you should add a dummy one).

This might not be enough, however. In the example above, stageN is still a CAF, and the evaluation of the conduit stage1 =$= ... =$= stageN will cause that CAF to be evaluated and potentially retained in memory. CAFs are fine for conduits that are guaranteed to be small, or that loop back onto themselves; however, as discussed in section “Increasing sharing”, writing such conduit values is not an easy task, although it is manageable for simple conduits.

To avoid CAFs, conduits like stageN must be given a dummy argument and full laziness must be disabled for the module where stageN is defined. But it’s more subtle than that; even if a conduit does have real (non-dummy) arguments, part of that conduit might still be independent of those arguments and hence be floated to the top by the full laziness optimization, creating yet more unwanted CAF values. Full laziness must again be disabled to stop this from happening.
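The advice above can be sketched with the simplified Pipe type from this post instead of conduit; this is an illustration under assumptions, and all names (countCharsCAF, countCharsStage, skipThenCount, runMyPipe, feedPure) are hypothetical stand-ins for stageN and runMyConduit. The OPTIONS_GHC pragma must be at the very top of the file and affects only this module.

```haskell
{-# OPTIONS_GHC -fno-full-laziness #-}

-- Simplified Pipe type, reconstructed from how its constructors are used in
-- this post (conduit's real type has an extra parameter, see footnote 1).
data Pipe i o m r
  = Yield o (Pipe i o m r)
  | Await (Either r i -> Pipe i o m r)
  | Effect (m (Pipe i o m r))
  | Done r

countChars :: Int -> Pipe Char o m Int
countChars cnt =
  Await $ \mi -> case mi of
    Left  _ -> Done cnt
    Right _ -> countChars $! cnt + 1

-- Pure driver: feed n copies of 'A', then signal end of input.
feedPure :: Int -> Pipe Char o m Int -> Int
feedPure _ (Done r)  = r
feedPure 0 (Await k) = feedPure 0 (k (Left 0))
feedPure n (Await k) = feedPure (n - 1) (k (Right 'A'))
feedPure _ _         = error "feedPure: unexpected Yield or Effect"

-- A stage written as a CAF: once evaluated, the Await chain built while
-- running it can stay reachable from this top-level value.
countCharsCAF :: Pipe Char o m Int
countCharsCAF = countChars 0

-- The same stage with a dummy () argument; with full laziness disabled in
-- this module, every call builds a fresh pipe.
countCharsStage :: () -> Pipe Char o m Int
countCharsStage () = countChars 0

-- A stage with a real argument whose body still contains a sub-expression
-- that does not mention n (countChars 0); with full laziness enabled, GHC
-- could float that sub-expression to the top level as a CAF.
skipThenCount :: Int -> Pipe Char o m Int
skipThenCount n
  | n <= 0    = countChars 0
  | otherwise = Await $ \mi -> case mi of
      Left  _ -> Done 0
      Right _ -> skipThenCount (n - 1)

-- Runner in the style of runMyConduit, given a dummy argument so that the
-- pipe is reconstructed on every call.
runMyPipe :: () -> Int
runMyPipe () = feedPure 1000000 (countCharsStage ())
```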

If you are sure that full laziness cannot float anything harmful to the top, you can leave it enabled; however, verifying that this is the case is highly non-trivial. You can of course test the code, but if you are unlucky the memory leak will only arise under certain specific usage conditions. Moreover, a small modification to the codebase, the libraries it uses, or even the compiler, perhaps years down the line, might change the program and reintroduce a memory leak.

Proceed with caution.

Further reading

Addendum 1: ghc’s “state hack”

Let’s go back to the section about sinks; if you recall, we considered this example:

countChars :: Int -> Pipe Char o m Int
countChars cnt =
    let k = countChars $! cnt + 1
    in Await $ \mi -> case mi of
                        Left  _ -> Done cnt
                        Right _ -> k

feedFrom :: Int -> Pipe Char o m Int -> IO ()
feedFrom n (Done r)  = print r
feedFrom 0 (Await k) = feedFrom 0 $ k (Left 0)
feedFrom n (Await k) = feedFrom (n - 1) $ k (Right 'A')

main :: IO ()
main = retry $ feedFrom 10000000 (countChars 0)

We explained how countChars 0 results in a chain of Await constructors and function closures. However, you might be wondering, why would this be retained at all? After all, feedFrom is just an ordinary function, albeit one that computes an IO action. Why shouldn’t the whole expression

feedFrom 10000000 (countChars 0)

just be reduced to a single print 10000000 action, leaving no trace of the pipe at all? Indeed, this is precisely what happens when we disable ghc’s “state hack”; if we compile this program with -fno-state-hack it runs in constant space.

So what is the state hack? You can think of it as the opposite of the full laziness transformation; where full laziness transforms

     \x -> \y -> let e = <expensive> in ..    
~~>  \x -> let e = <expensive> in \y -> ..

the state hack does the opposite

     \x -> let e = <expensive> in \y -> ..
~~>  \x -> \y -> let e = <expensive> in ..    

though only for arguments y of type State# <token>. In general this is not sound, of course, as it might duplicate work; hence, the name “state hack”. Joachim Breitner’s StackOverflow answer explains why this optimization is necessary; my own blog post Understanding the RealWorld provides more background.
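The two shapes can be illustrated with ordinary functions (hypothetical names; here y is an Int rather than a State# token, so GHC would not apply the state hack to this code). Both definitions compute the same results; they differ only in how often the expensive computation runs and what is kept alive in between. With -O GHC is free to rewrite either form into the other, so this illustrates the shapes rather than what the optimiser will output.

```haskell
-- Stand-in for an expensive computation (hypothetical).
expensive :: Int -> Int
expensive x = sum [1 .. x]

-- The shape full laziness produces: e is bound outside the inner lambda, so
-- for a given partial application (shared x) it is computed at most once and
-- then reused (and retained) by every call of the returned function.
shared :: Int -> (Int -> Int)
shared x = let e = expensive x in \y -> e + y

-- The shape the state hack produces (GHC does this only when y is a State#
-- token): e is bound inside the inner lambda, so it is recomputed each time
-- the returned function is applied, but nothing is retained in between.
unshared :: Int -> (Int -> Int)
unshared x = \y -> let e = expensive x in e + y
```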

Let’s leave aside the question of why this optimization exists, and consider the effect on the code above. If you ask ghc to dump the optimized program (for example in STG form, using -ddump-stg) and translate the result back to readable Haskell, you will find that it boils down to a single-line change. With the state hack disabled the last line of feedFrom is effectively:

feedFrom n (Await k) = IO $
    unIO (feedFrom (n - 1) (k (Right 'A')))

where IO and unIO just wrap and unwrap the IO monad. But when the state hack is enabled (the default), this turns into

feedFrom n (Await k) = IO $ \w ->
    unIO (feedFrom (n - 1) (k (Right 'A'))) w

Note how this floats the recursive call to feedFrom into the lambda. This means that

feedFrom 10000000 (countChars 0)

no longer reduces to a single print statement (after an expensive computation); instead, it reduces immediately to a function closure, waiting for its world argument. It’s this function closure that retains the Await/function chain and hence causes the space leak.

Addendum 2: Interaction with cost-centres (SCC)

A final cautionary tale. Suppose we are studying a space leak, and so we are compiling our code with profiling enabled. At some point we add some cost centres, or use -fprof-auto perhaps, and suddenly find that the space leak disappeared! What gives?

Consider one last time the sink example. We can make the space leak disappear by adding a single cost centre:

feed :: Char -> Pipe Char o m Int -> IO ()
feed ch = feedFrom 10000000
  where
    feedFrom :: Int -> Pipe Char o m Int -> IO ()
    feedFrom n p = {-# SCC "feedFrom" #-}
      case (n, p) of
        (_, Done r)  -> print r
        (0, Await k) -> feedFrom 0     $ k (Left 0)
        (_, Await k) -> feedFrom (n-1) $ k (Right ch)

Adding this cost centre effectively has the same result as specifying -fno-state-hack; with the cost centre present, the state hack can no longer float the computations into the lambda.


  1. The ability to detect upstream termination is one of the characteristics that sets conduit apart from the pipes package, in which this is impossible (or at least hard to do). Personally, I consider this an essential feature. Note that the definition of Pipe in conduit takes an additional type argument to avoid insisting that the type of the upstream return value matches the type of the downstream return value. For simplicity I’ve omitted this additional type argument here.

  2. Sinks and sources can also execute effects, of course; since we are interested in the memory behaviour of the individual constructors, we treat effects separately.

  3. runPipe is (close to) the actual runPipe we would normally use; we connect pipes that await or yield into a single self-contained pipe that does neither.

  4. For these simple examples actually the optimizer can work its magic and the space leak doesn’t appear, unless evalStateC is declared NOINLINE. Again, for larger examples problems arise whether it’s inlined or not.

  5. The original definition of retry used in this blogpost was

    retry io = catch io (\(_ :: SomeException) -> retry io)

    but as Eric Mertens rightly points out, this is not correct as catch runs the exception handler with exceptions masked. For the purposes of this blog post however the difference is not important; in fact, none of the examples in this blog post run the exception handler at all.

by edsko at September 29, 2016 06:20 AM