Planet Haskell

February 20, 2018

Mark Jason Dominus

Composition of utility pole ID tags

In a recent article discussing utility poles, and the metal ID plates they carry, I wondered what the plates were made of:

Steel would rust; and I thought even stainless steel wouldn't last as long as these tags need to. Aluminum is expensive. Tin degrades at low temperatures. … I will go test the tags with a magnet to see if they are ferrous.

They are not ferrous. Probably they are aluminum. My idea that aluminum is too expensive to use for the plates was ridiculous. The pole itself costs a lot of money. The sophisticated electrical equipment on the pole costs thousands of dollars. The insulated wire strung from the pole is made of copper. Compared with all this, a ten-centimeter oval of stamped aluminum is not a big deal.

1.8mm aluminum sheet costs $100 per square meter even if you don't buy it in great quantity. Those aluminum tags probably cost no more than fifty cents each.

by Mark Dominus at February 20, 2018 04:40 AM

February 18, 2018

Neil Mitchell

Atomic Expressions Generically

Summary: For certain hints HLint needs to determine if a Haskell expression is atomic. I wrote a generic method to generate expressions and test if they are atomic.

With HLint, if you write a statement such as:

main = print ("Hello")

You get the hint:

Sample.hs:1:14: Warning: Redundant bracket
Why not:

One of the ways HLint figures out if brackets are redundant is if the expression inside the brackets is "atomic" - if you never have to bracket it in any circumstances. As an example, a literal string is atomic, but an if expression is not. The isAtom function from haskell-src-exts-util has a list of the types of expression which are atomic, but the Exp type from haskell-src-exts has 55 distinct constructors, and I don't even know what many of them do. How can we check the isAtom function is correct?

One approach is to use human thought, and that's the approach used until now, with reasonable success. However, I've recently written a script which solves the problem more permanently, generating random expressions and checking that isAtom gives the right value. In this post I'm going to outline a few features of how that script works. There are basically three steps:

1) Generate a type-correct Exp

The first step is to generate a random Exp which follows the type definition. Fortunately the Data class in Haskell lets us generate values. We define:

mkValue :: forall a . Data a => Int -> IO a
mkValue depth
    | Just x <- cast "aA1:+" = randomElem x
    | Just x <- cast [-1 :: Int, 1] = randomElem x
    | Just x <- cast [-1 :: Integer, 1] = randomElem x
    | AlgRep cs <- dataTypeRep $ dataTypeOf (undefined :: a) =
        if depth <= 0 then throwIO LimitReached else fromConstrM (mkValue $ depth - 1) =<< randomElem cs

Here we are saying that given a depth, and a result type a, we generate a value of type a. Note that the a argument is the result, but we don't pass anything in of type a. The first three lines of the body follow the pattern:

    | Just x <- cast [list_of_element] = randomElem x

This tries to convert list_of_element to [a] by using runtime type information. If it succeeds, we pick a random element from the list. If it doesn't we continue onwards.

The final case uses dataTypeRep/dataTypeOf to get a list of the constructors of a. Note that we don't have a value of a, so we make one up using undefined :: a - but that's OK because dataTypeOf promises not to look at its argument. Given a list of constructors, we pick one at random, and then call fromConstrM - which says how to create a value of the right constructor, using some argument to fill in all the fields. We pass mkValue as that argument, which causes us to recursively build up random values.

One immediate problem is what if we are building a [Int] and the random generator often picks (:)? We'll take a very long time to finish. To solve this problem we keep a depth counter, decrement it in every recursive call, and when it runs out, throwIO an exception and give up.
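
To see the Data-driven generation in isolation, here is a minimal sketch on a toy type. It is not the HLint script: the Expr type and the name allValues are hypothetical, and it enumerates every value up to a depth in the list monad instead of sampling randomly in IO, so that it needs nothing beyond base. The cast trick for leaf types and the depth counter work the same way as in mkValue.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE ScopedTypeVariables #-}
import Data.Data (Data, DataRep (AlgRep), cast, dataTypeOf, dataTypeRep, fromConstrM)

-- Hypothetical toy AST standing in for the much larger Exp type.
data Expr = Var Char | Not Expr | And Expr Expr
    deriving (Show, Eq, Data)

-- Enumerate all values of the requested type using at most 'depth'
-- nested algebraic constructors. (Assumption: exhaustive enumeration
-- replaces the random choice used in the post.)
allValues :: forall a . Data a => Int -> [a]
allValues depth
    -- Leaf case: if 'a' happens to be Char, offer a hardcoded list.
    | Just xs <- cast "xy" = xs
    -- Depth budget exhausted: give up on this branch.
    | depth <= 0 = []
    -- Algebraic type: try every constructor, filling fields recursively.
    | AlgRep cs <- dataTypeRep (dataTypeOf (undefined :: a)) =
        concatMap (fromConstrM (allValues (depth - 1))) cs
    -- Any other primitive type has no hardcoded values here.
    | otherwise = []
```

Asking for allValues 2 :: [Expr] yields the two variables, their negations, and the four conjunctions of variables. The result type alone selects which values are built, just as with mkValue.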

2) Generate a parsing Exp

Now we've got a valid Exp value, but just because an Exp can be represented in the AST doesn't mean it corresponds to a Haskell fragment. As an example, consider Var (UnQual (Ident "Test")). That's a valid value of type Exp, but if you pretty print it you get Test, and if you parse it back you'll get Con (UnQual (Ident "Test")) - variables must start with a leading lower-case letter.

To ignore invalid expressions we try pretty printing then parsing the expression, and ignore all expressions which don't roundtrip.

3) Determine if the Exp is atomic

Now that we've got a valid Exp, which we know the user could have typed in as a source program, we need to figure out if isAtom is correct. To do that, given an expression x, we check whether its self-application x x roundtrips. As a positive example, foo (a variable) roundtrips as foo foo, being foo applied to itself. However, if b then t else f when applied to itself gives if b then t else f if b then t else f, which parses back more like if b then t else f (if b then t else f), and is not atomic.
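
The roundtrip idea from steps 2 and 3 can be sketched on a toy language. Everything below is hypothetical and only stands in for haskell-src-exts: a three-constructor Expr, a pretty printer that never inserts brackets, and a tiny token-based parser in which an if expression may appear as the last argument of an application. The point is only to show how printing x x and parsing it back separates atomic from non-atomic expressions.

```haskell
-- Hypothetical toy language, not the haskell-src-exts Exp type.
data Expr = Var String | App Expr Expr | If Expr Expr Expr
    deriving (Eq, Show)

keywords :: [String]
keywords = ["if", "then", "else"]

-- Naive pretty printer: it never inserts brackets.
pretty :: Expr -> String
pretty (Var v) = v
pretty (App f x) = pretty f ++ " " ++ pretty x
pretty (If c t e) = unwords ["if", pretty c, "then", pretty t, "else", pretty e]

-- Parse a whole string, failing if any tokens are left over.
parse :: String -> Maybe Expr
parse s = case expr (words s) of
    Just (e, []) -> Just e
    _ -> Nothing

-- An expression is an if, or a variable applied to zero or more arguments.
expr :: [String] -> Maybe (Expr, [String])
expr ("if" : ts) = do
    (c, "then" : ts1) <- expr ts
    (t, "else" : ts2) <- expr ts1
    (e, ts3) <- expr ts2
    Just (If c t e, ts3)
expr (t : ts) | t `notElem` keywords = Just (args (Var t) ts)
expr _ = Nothing

-- Collect arguments left-associatively; a trailing if is taken as one argument.
args :: Expr -> [String] -> (Expr, [String])
args f (t : ts) | t `notElem` keywords = args (App f (Var t)) ts
args f ts@("if" : _) | Just (x, rest) <- expr ts = (App f x, rest)
args f ts = (f, ts)

-- An expression is judged atomic if applying it to itself survives a roundtrip.
atomicByRoundtrip :: Expr -> Bool
atomicByRoundtrip x = parse (pretty (App x x)) == Just (App x x)
```

In this toy setting a variable passes the check, while an if expression and an application both fail it, because the printed text parses back with a different shape.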

Putting it all together

Now that we've got a random expression, and we know whether its atomicity agrees with what we were expecting, we can report any differences. That approach has identified many additional patterns to match, but it's not perfect, in particular:

  • Most values either exceed the depth limit or fail to roundtrip. For 10,000 if expressions I typically get 1 or 2 which roundtrip properly. For non-if expressions it's usually 100 or so. The advantage of random testing is that throwing more time at a problem solves such issues without thinking too hard.
  • For some expressions, e.g. ParComp, I've never managed to get a valid value created. Perhaps haskell-src-exts can't parse it, or perhaps it requires constants I don't have in my hardcoded list - none of these were particularly common examples.
  • haskell-src-exts has a bug where -1 is pretty printed as (-1), which is then parsed as a paren and -1. That fails step 2, so we don't test with negative literals. As it happens, non-negative literals are atomic, but negative literals aren't, so we need to take care.
  • There are some patterns which appear to roundtrip successfully on their own, but not when surrounded by brackets, but secretly are just very weird. For example do rec\n [] parses successfully, but with source positions that are error values, and when applied to itself pretty prints incorrectly. There's at least one haskell-src-exts bug here.
  • The program appears to leak progressively more memory. I solved that by running slices of it at a time, and didn't look too hard. I've seen cases of blowup in Data constructors when recursing, so it could be that, but it needs investigating.

As a result of all this work a future HLint will spot unnecessary brackets for 20 more types of expression, 8 more types of pattern and 7 more types of type.

by Neil Mitchell at February 18, 2018 11:00 PM

Kevin Reid (kpreid)

Blog moved

Moving to

(In the event that you are looking at this post in a distant future where both are abandoned, check my web site for the freshest link.)

by Kevin Reid (kpreid) at February 18, 2018 08:39 PM

Michael Snoyman

Haskell Ecosystem Requests

Last month, I clarified some parts of the SLURP proposal. I'm intentionally not getting into the SLURP proposal itself here; if you missed that episode, don't worry about it. One of the outcomes of that blog post was that I shared some of the requests I had made in private that ultimately led to the SLURP proposal.

A single comment in a mega-thread on Github is hardly a good place to write down these requests, however, and it seems like there's no progress on them. I'm going to instead put down these ideas here, with a bit more explanation, and a few more ideas that have popped up since then.

(If you really want to, feel free to see the context of my original comment.)

These points should be made in some kind of more official forum, but:

  1. I'm honestly not sure where that forum is
  2. I don't believe the official forums we typically use for discussions of community infrastructure are nearly visible enough to most community members

So I'll start the conversation here, and later we can move it to the right place.

PVP adherence is optional

I would like to see some kind of statement on Hackage that says something like, "PVP adherence is recommended, but not required. You are free to upload a package even if it does not conform to the PVP." I realize this is in fact exactly what the current policy is, but in many discussions, this was unclear to people. Having a clear sentence to quote when online discussions get heated would be useful. Without something like this, I believe that we will continue having regular online flamewars about the PVP, which is the biggest thing I've been trying to get to stop over the past few years.

Hackage Trustee guidelines

Going along with this, I would like to request a change to the Hackage Trustee guidelines (or whatever the appropriate term is), namely that it is not appropriate to PVP police on social media. Sending PRs and opening issues: totally acceptable. Emails to authors: totally acceptable. If an author requests that these stop: they must stop. Publicly criticizing an author for not following the PVP: unacceptable. I do realize that enforcing a policy on how people behave personally is difficult. But I'd be happy to see the change even if it wasn't easily enforceable.

Downstream projects

Private discussions tried to achieve some kind of technical policy which would avoid breakage to Stackage and Stack. It seems like those private discussions did not reach any conclusion. However, regardless of any technical policy that is put in place, I would request that a simple goal be stated:

GHC, Hackage, and Cabal will strive to meet the needs of commonly used downstream projects, including but not limited to Stackage, Stack, and Nix.

I'm not asking for any demands of compatibility or testing, simply a stated policy that "it works with cabal-install, that's all that matters" is not a sufficient response.

Maintainer guidelines

There have been a number of issues and pull requests recently where contributors to some infrastructure projects have been discouraged by the unclear process for getting their changes included upstream. See, as examples:

More generally, there is an ongoing culture in some places of goals/agendas/plans being made privately and not shared, which leads to an inability of people outside of an inner circle to contribute. See, for example:

I would like to recommend some maintainer guidelines be put in place for any core Haskell packages and projects. (What constitutes "core" could definitely be up for debate as well.) I'd like to see some rules like:

  • Plans for significant changes must start as an issue in an issue tracker (see Gabriel's golden rule)
  • Plans for major changes should have a mention in a more public forum than an issue tracker. As a concrete example: the newly added ^>= operator has significant impacts on how downstream projects like Stackage interact with dependency bounds, but no public comment period was granted to provide input before the 2.0 release. (And even post release, as referenced above, the full plan has not been revealed.)
  • Pull requests which are rejected are given a reason for being rejected (this includes simple refusal to merge). See, for example, hackage-security #206.

There are likely many other guidelines we could come up with, some more onerous than others. I encourage others to recommend other ideas too. One possible source of inspiration for this could be the maintainer communication advice I wrote up a few years ago.

February 18, 2018 04:41 PM

February 17, 2018

Ken T Takusagawa

[omqzhxkn] Lagrange Four-Square Theorem examples

A listing of numbers and how to express them as a sum of 4 squares will probably provoke curiosity: There isn't an obvious pattern of how to express a given number as a sum of 4 squares.  Can all natural numbers be expressed this way?  (Yes, by Lagrange.)  Which numbers can be expressed as the sum of just 3 squares (answer: Legendre Three-Square Theorem), or 2?  As numbers get larger, there seems to be a trend of more ways to express them as 4 or fewer squares, kind of reminiscent of the Goldbach conjecture.  What is the rate of growth of the number of ways?  What about cubes and higher powers (Waring's problem)?  There's lots of deep mathematics lurking just beneath the surface.  It's just a short skip and a jump to Fermat's Last Theorem.

We generated 4-square decompositions up to 121=11^2 in order to include 112 = 7 * 4^2, the first instance where Legendre's 3-square theorem applies with an exponent (on 4) greater than 1.  The number in the range with the most ways to express it was 90, with 9.
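
Legendre's theorem says a natural number is a sum of three squares exactly when it is not of the form 4^a (8b + 7).  As a small sketch of that criterion (the function name needsFourSquares is ours, not taken from the linked source), one can strip factors of 4 and then test the residue modulo 8:

```haskell
-- True when n has the form 4^a * (8b + 7), i.e. when n cannot be
-- written as a sum of three squares and so needs all four.
needsFourSquares :: Integer -> Bool
needsFourSquares n
    | n <= 0 = False
    | n `mod` 4 == 0 = needsFourSquares (n `div` 4)  -- peel off one factor of 4
    | otherwise = n `mod` 8 == 7
```

For 112 the function divides by 4 twice to reach 7, which is why 112 is the first case with an exponent greater than 1.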

We also provide a more compact list which expresses each number in the fewest number of squares, but still listing all possibilities for that fewest number of squares.  The full version has 436 lines; the compact version has 188.  The compact version makes it clearer (perhaps inspiring more curiosity) which numbers require 4 squares and which ones can be done in fewer.

Similar lists could be made for Gauss's Eureka Theorem on sum of 3 triangular numbers and the Goldbach conjecture on the sum of 2 primes.

Haskell source code is here.  Here is a pedagogical excerpt of how to choose num decreasing numbers bounded by 0 and amax.  We use the list as a nondeterminism monad.

choose_n_of_max :: Integer -> Int -> [[Integer]];
choose_n_of_max amax num = case compare num 0 of {
LT -> error "negative choose_n_of_max";
EQ -> return [];
GT -> do {
  x <- [0..amax];
  y <- choose_n_of_max x (pred num);
  return (x:y);
}};
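
As a sketch of how such a chooser can produce the listing, the following self-contained version filters the non-increasing 4-tuples by their sum of squares.  The names chooseNonIncreasing and fourSquares are ours, and the linked source may well be organized differently; this only illustrates the list-monad search.

```haskell
-- Choose num numbers, each between 0 and amax, in non-increasing order.
chooseNonIncreasing :: Integer -> Int -> [[Integer]]
chooseNonIncreasing amax num
    | num < 0 = error "negative chooseNonIncreasing"
    | num == 0 = [[]]
    | otherwise =
        [x : ys | x <- [0 .. amax], ys <- chooseNonIncreasing x (num - 1)]

-- Largest k with k*k <= n, found by linear search (fine for small n).
isqrt :: Integer -> Integer
isqrt n = last (takeWhile (\k -> k * k <= n) [0 ..])

-- All ways to write n as a sum of four squares, largest term first.
fourSquares :: Integer -> [[Integer]]
fourSquares n =
    [xs | xs <- chooseNonIncreasing (isqrt n) 4, sum (map (^ (2 :: Int)) xs) == n]
```

For example, fourSquares 18 gives [[3,2,2,1],[3,3,0,0],[4,1,1,0]], in the same order as the corresponding entry of the listing.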

Below is a machine-readable listing of the numbers through 121 and all the ways to express each number as the sum of 4 or fewer squares.

(0,[[0,0,0,0]]) (1,[[1,0,0,0]]) (2,[[1,1,0,0]]) (3,[[1,1,1,0]]) (4,[[1,1,1,1],[2,0,0,0]]) (5,[[2,1,0,0]]) (6,[[2,1,1,0]]) (7,[[2,1,1,1]]) (8,[[2,2,0,0]]) (9,[[2,2,1,0],[3,0,0,0]]) (10,[[2,2,1,1],[3,1,0,0]]) (11,[[3,1,1,0]]) (12,[[2,2,2,0],[3,1,1,1]]) (13,[[2,2,2,1],[3,2,0,0]]) (14,[[3,2,1,0]]) (15,[[3,2,1,1]]) (16,[[2,2,2,2],[4,0,0,0]]) (17,[[3,2,2,0],[4,1,0,0]]) (18,[[3,2,2,1],[3,3,0,0],[4,1,1,0]]) (19,[[3,3,1,0],[4,1,1,1]]) (20,[[3,3,1,1],[4,2,0,0]]) (21,[[3,2,2,2],[4,2,1,0]]) (22,[[3,3,2,0],[4,2,1,1]]) (23,[[3,3,2,1]]) (24,[[4,2,2,0]]) (25,[[4,2,2,1],[4,3,0,0],[5,0,0,0]]) (26,[[3,3,2,2],[4,3,1,0],[5,1,0,0]]) (27,[[3,3,3,0],[4,3,1,1],[5,1,1,0]]) (28,[[3,3,3,1],[4,2,2,2],[5,1,1,1]]) (29,[[4,3,2,0],[5,2,0,0]]) (30,[[4,3,2,1],[5,2,1,0]]) (31,[[3,3,3,2],[5,2,1,1]]) (32,[[4,4,0,0]]) (33,[[4,3,2,2],[4,4,1,0],[5,2,2,0]]) (34,[[4,3,3,0],[4,4,1,1],[5,2,2,1],[5,3,0,0]]) (35,[[4,3,3,1],[5,3,1,0]]) (36,[[3,3,3,3],[4,4,2,0],[5,3,1,1],[6,0,0,0]]) (37,[[4,4,2,1],[5,2,2,2],[6,1,0,0]]) (38,[[4,3,3,2],[5,3,2,0],[6,1,1,0]]) (39,[[5,3,2,1],[6,1,1,1]]) (40,[[4,4,2,2],[6,2,0,0]]) (41,[[4,4,3,0],[5,4,0,0],[6,2,1,0]]) (42,[[4,4,3,1],[5,3,2,2],[5,4,1,0],[6,2,1,1]]) (43,[[4,3,3,3],[5,3,3,0],[5,4,1,1]]) (44,[[5,3,3,1],[6,2,2,0]]) (45,[[4,4,3,2],[5,4,2,0],[6,2,2,1],[6,3,0,0]]) (46,[[5,4,2,1],[6,3,1,0]]) (47,[[5,3,3,2],[6,3,1,1]]) (48,[[4,4,4,0],[6,2,2,2]]) (49,[[4,4,4,1],[5,4,2,2],[6,3,2,0],[7,0,0,0]]) (50,[[4,4,3,3],[5,4,3,0],[5,5,0,0],[6,3,2,1],[7,1,0,0]]) (51,[[5,4,3,1],[5,5,1,0],[7,1,1,0]]) (52,[[4,4,4,2],[5,3,3,3],[5,5,1,1],[6,4,0,0],[7,1,1,1]]) (53,[[6,3,2,2],[6,4,1,0],[7,2,0,0]]) (54,[[5,4,3,2],[5,5,2,0],[6,3,3,0],[6,4,1,1],[7,2,1,0]]) (55,[[5,5,2,1],[6,3,3,1],[7,2,1,1]]) (56,[[6,4,2,0]]) (57,[[4,4,4,3],[5,4,4,0],[6,4,2,1],[7,2,2,0]]) (58,[[5,4,4,1],[5,5,2,2],[6,3,3,2],[7,2,2,1],[7,3,0,0]]) (59,[[5,4,3,3],[5,5,3,0],[7,3,1,0]]) (60,[[5,5,3,1],[6,4,2,2],[7,3,1,1]]) (61,[[5,4,4,2],[6,4,3,0],[6,5,0,0],[7,2,2,2]]) (62,[[6,4,3,1],[6,5,1,0],[7,3,2,0]]) 
(63,[[5,5,3,2],[6,3,3,3],[6,5,1,1],[7,3,2,1]]) (64,[[4,4,4,4],[8,0,0,0]]) (65,[[6,4,3,2],[6,5,2,0],[7,4,0,0],[8,1,0,0]]) (66,[[5,4,4,3],[5,5,4,0],[6,5,2,1],[7,3,2,2],[7,4,1,0],[8,1,1,0]]) (67,[[5,5,4,1],[7,3,3,0],[7,4,1,1],[8,1,1,1]]) (68,[[5,5,3,3],[6,4,4,0],[7,3,3,1],[8,2,0,0]]) (69,[[6,4,4,1],[6,5,2,2],[7,4,2,0],[8,2,1,0]]) (70,[[5,5,4,2],[6,4,3,3],[6,5,3,0],[7,4,2,1],[8,2,1,1]]) (71,[[6,5,3,1],[7,3,3,2]]) (72,[[6,4,4,2],[6,6,0,0],[8,2,2,0]]) (73,[[5,4,4,4],[6,6,1,0],[7,4,2,2],[8,2,2,1],[8,3,0,0]]) (74,[[6,5,3,2],[6,6,1,1],[7,4,3,0],[7,5,0,0],[8,3,1,0]]) (75,[[5,5,4,3],[5,5,5,0],[7,4,3,1],[7,5,1,0],[8,3,1,1]]) (76,[[5,5,5,1],[6,6,2,0],[7,3,3,3],[7,5,1,1],[8,2,2,2]]) (77,[[6,4,4,3],[6,5,4,0],[6,6,2,1],[8,3,2,0]]) (78,[[6,5,4,1],[7,4,3,2],[7,5,2,0],[8,3,2,1]]) (79,[[5,5,5,2],[6,5,3,3],[7,5,2,1]]) (80,[[6,6,2,2],[8,4,0,0]]) (81,[[6,5,4,2],[6,6,3,0],[7,4,4,0],[8,3,2,2],[8,4,1,0],[9,0,0,0]]) (82,[[5,5,4,4],[6,6,3,1],[7,4,4,1],[7,5,2,2],[8,3,3,0],[8,4,1,1],[9,1,0,0]]) (83,[[7,4,3,3],[7,5,3,0],[8,3,3,1],[9,1,1,0]]) (84,[[5,5,5,3],[6,4,4,4],[7,5,3,1],[8,4,2,0],[9,1,1,1]]) (85,[[6,6,3,2],[7,4,4,2],[7,6,0,0],[8,4,2,1],[9,2,0,0]]) (86,[[6,5,4,3],[6,5,5,0],[7,6,1,0],[8,3,3,2],[9,2,1,0]]) (87,[[6,5,5,1],[7,5,3,2],[7,6,1,1],[9,2,1,1]]) (88,[[6,6,4,0],[8,4,2,2]]) (89,[[6,6,4,1],[7,6,2,0],[8,4,3,0],[8,5,0,0],[9,2,2,0]]) (90,[[6,5,5,2],[6,6,3,3],[7,4,4,3],[7,5,4,0],[7,6,2,1],[8,4,3,1],[8,5,1,0],[9,2,2,1],[9,3,0,0]]) (91,[[5,5,5,4],[7,5,4,1],[8,3,3,3],[8,5,1,1],[9,3,1,0]]) (92,[[6,6,4,2],[7,5,3,3],[9,3,1,1]]) (93,[[6,5,4,4],[7,6,2,2],[8,4,3,2],[8,5,2,0],[9,2,2,2]]) (94,[[7,5,4,2],[7,6,3,0],[8,5,2,1],[9,3,2,0]]) (95,[[6,5,5,3],[7,6,3,1],[9,3,2,1]]) (96,[[8,4,4,0]]) (97,[[6,6,4,3],[6,6,5,0],[7,4,4,4],[8,4,4,1],[8,5,2,2],[9,4,0,0]]) (98,[[6,6,5,1],[7,6,3,2],[7,7,0,0],[8,4,3,3],[8,5,3,0],[9,3,2,2],[9,4,1,0]]) (99,[[7,5,4,3],[7,5,5,0],[7,7,1,0],[8,5,3,1],[9,3,3,0],[9,4,1,1]]) (100,[[5,5,5,5],[7,5,5,1],[7,7,1,1],[8,4,4,2],[8,6,0,0],[9,3,3,1],[10,0,0,0]]) 
(101,[[6,6,5,2],[7,6,4,0],[8,6,1,0],[9,4,2,0],[10,1,0,0]]) (102,[[6,5,5,4],[7,6,4,1],[7,7,2,0],[8,5,3,2],[8,6,1,1],[9,4,2,1],[10,1,1,0]]) (103,[[7,5,5,2],[7,6,3,3],[7,7,2,1],[9,3,3,2],[10,1,1,1]]) (104,[[6,6,4,4],[8,6,2,0],[10,2,0,0]]) (105,[[7,6,4,2],[8,4,4,3],[8,5,4,0],[8,6,2,1],[9,4,2,2],[10,2,1,0]]) (106,[[6,6,5,3],[7,5,4,4],[7,7,2,2],[8,5,4,1],[9,4,3,0],[9,5,0,0],[10,2,1,1]]) (107,[[7,7,3,0],[8,5,3,3],[9,4,3,1],[9,5,1,0]]) (108,[[6,6,6,0],[7,5,5,3],[7,7,3,1],[8,6,2,2],[9,3,3,3],[9,5,1,1],[10,2,2,0]]) (109,[[6,6,6,1],[8,5,4,2],[8,6,3,0],[10,2,2,1],[10,3,0,0]]) (110,[[7,6,4,3],[7,6,5,0],[8,6,3,1],[9,4,3,2],[9,5,2,0],[10,3,1,0]]) (111,[[6,5,5,5],[7,6,5,1],[7,7,3,2],[9,5,2,1],[10,3,1,1]]) (112,[[6,6,6,2],[8,4,4,4],[10,2,2,2]]) (113,[[6,6,5,4],[8,6,3,2],[8,7,0,0],[9,4,4,0],[10,3,2,0]]) (114,[[7,6,5,2],[7,7,4,0],[8,5,4,3],[8,5,5,0],[8,7,1,0],[9,4,4,1],[9,5,2,2],[10,3,2,1]]) (115,[[7,5,5,4],[7,7,4,1],[8,5,5,1],[8,7,1,1],[9,4,3,3],[9,5,3,0]]) (116,[[7,7,3,3],[8,6,4,0],[9,5,3,1],[10,4,0,0]]) (117,[[6,6,6,3],[7,6,4,4],[8,6,4,1],[8,7,2,0],[9,4,4,2],[9,6,0,0],[10,3,2,2],[10,4,1,0]]) (118,[[7,7,4,2],[8,5,5,2],[8,6,3,3],[8,7,2,1],[9,6,1,0],[10,3,3,0],[10,4,1,1]]) (119,[[7,6,5,3],[9,5,3,2],[9,6,1,1],[10,3,3,1]]) (120,[[8,6,4,2],[10,4,2,0]]) (121,[[7,6,6,0],[8,5,4,4],[8,7,2,2],[9,6,2,0],[10,4,2,1],[11,0,0,0]])

by Ken at February 17, 2018 03:23 AM

February 15, 2018

Joachim Breitner

Interleaving normalizing reduction strategies

A little, not very significant, observation about lambda calculus and reduction strategies.

A reduction strategy determines, for every lambda term with redexes left, which redex to reduce next. A reduction strategy is normalizing if this procedure terminates for every lambda term that has a normal form.

A fun fact is: If you have two normalizing reduction strategies s1 and s2, consulting them alternately may not yield a normalizing strategy.

Here is an example. Consider the lambda-term o = (λx.xxx), and note that oo → ooo → oooo → …. Let Mi = (λx.(λx.x))(oo…o) (with i occurrences of o). Mi has two redexes, and reduces to either (λx.x) or Mi+1. In particular, Mi has a normal form.

The two reduction strategies are:

  • s1, which picks the second redex if given Mi for an even i, and the first (left-most) redex otherwise.
  • s2, which picks the second redex if given Mi for an odd i, and the first (left-most) redex otherwise.

Both strategies are normalizing: If during a reduction we come across Mi, then the reduction terminates in one or two steps; otherwise we are just doing left-most reduction, which is known to be normalizing.

But if we alternately consult s1 and s2 while trying to reduce M2, we get the sequence

M2 → M3 → M4 → …

which shows that this strategy is not normalizing.
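
The argument can be replayed in a deliberately abstract model. The sketch below does not implement lambda calculus: it assumes that a term Mi can be represented by its index i alone, that reducing the first redex yields the normal form, and that reducing the second redex yields Mi+1. All names in it are hypothetical.

```haskell
-- Abstract terms: either some Mi, or the normal form (λx.x).
data Term = M Int | Identity
    deriving (Eq, Show)

data Redex = First | Second

-- In this model a strategy only needs to see the index i of Mi.
type Strategy = Int -> Redex

s1, s2 :: Strategy
s1 i = if even i then Second else First
s2 i = if odd i then Second else First

-- One reduction step of Mi under the given strategy.
step :: Strategy -> Term -> Term
step _ Identity = Identity
step s (M i) = case s i of
    First -> Identity    -- outer redex: discards the argument
    Second -> M (i + 1)  -- inner redex: the argument grows by one o

-- Apply the first n strategies of the list in turn.
runWith :: [Strategy] -> Int -> Term -> Term
runWith strategies n t = foldl (flip step) t (take n strategies)
```

With this model, runWith (repeat s1) 2 (M 2) reaches Identity, while runWith (cycle [s1, s2]) n (M 2) is M (n + 2) for every n, matching the sequence above.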

Afterthought: The interleaved strategy is not actually a reduction strategy in the usual definition, as it is not a pure (stateless) function from lambda term to redex.

by Joachim Breitner at February 15, 2018 07:17 PM

February 14, 2018

Mark Jason Dominus

Utility poles

I am almost always interested in utility infrastructure. I see it every day, and often don't think about it. The electric power distribution grid is a gigantic machine, one of the biggest devices ever built, and people spend their whole lives becoming experts on just one part of it. What is it all for, how does it work? What goes wrong, and how do you fix it? Who makes the parts, and how much do they cost? Every day I go outside and see things like these big cylinders:

A wooden power utility pole, including cobra-head street light, with three large gray cylinders mounted on it under the main wires

and I wonder what they are. In this case from clues in the environment I was able to guess they were electrical power transformers. Power is distributed on these poles at about seven thousand volts, which is called “medium voltage”. But you do not want 7000-volt power in your house because it would come squirting out of the electric outlets in awesome lightnings and burn everything up. Also most household uses do not want three-phase power, they want single-phase power. So between the pole and the house there is a transformer to change the shape of the electricity to 120V, and that's what these things are. They turn out to be called “distribution transformers” and they are manufactured by — guess who? — General Electric, and they cost a few thousand bucks each. And because of the Wonders of the Internet, I can find out quite a lot about them. The cans are full of mineral oil, or sometimes vegetable oil! (Why are they full of oil? I don't know; I guess for insulation. But I could probably find out.) There are three because that is one way to change the three-phase power to single-phase, something I wish I understood better. Truly, we live in an age of marvels.

Anyway, I was having dinner with a friend recently and for some reason we got to talking about the ID plates on utility poles. The poles around here all carry ID numbers, and I imagine that back at the electric company there are giant books listing, for each pole ID number, where the pole is. Probably they computerized this back in the seventies, and the books are moldering in a closet somewhere.

As I discussed recently, some of those poles are a hundred years old, and the style of the ID tags has changed over that time:

An old, stamped-metal identification plate nailed to a wooden utility pole.  The plate is elliptical, and says 'PHILA ELEC. Cº 79558 B'

This wooden pole has the following letters burned into it: 'BWR CPT 51017 SPSK6 250 PECO'

It looks to me like the original style was those oval plates that you see on the left, and that at some point some of the plates started to wear out and were replaced by the yellow digit tags in the middle picture. The most recent poles don't have tags: the identifier is burnt into the pole.

Poles in my neighborhood tend to have consecutive numbers. I don't think this was carefully planned. I guess how this happened is: when they sent the poles out on the truck to be installed, they also sent out a bunch of ID plates, perhaps already attached to the poles, or perhaps to be attached onsite. The plates would already have the numbers on them, and when you grab a bunch of them out of the stack they will naturally tend to have consecutive numbers, as in the pictures above, because that's how they were manufactured. So the poles in a vicinity will tend to have numbers that are close together, until they don't, because at that point the truck had to go back for more poles. So although you might find poles 79518–79604 in my neighborhood, poles 79605–79923 might be in a completely different part of the city.

Later on someone was inspecting pole 79557 (middle picture) and noticed that the number plate was wearing out. So they pried it off and replaced it with the yellow digit tag, which is much newer than the pole itself. The inspector will have a bunch of empty frames and a box full of digits, so they put up a new tag with the old ID number.

But sometime more recently they switched to these new-style poles with numbers burnt into them at the factory, in a different format than before. I have tried to imagine what the number-burning device looks like, but I'm not at all sure. Is it like a heated printing press, or perhaps a sort of configurable branding iron? Or is it more like a big soldering iron that is on a computer-controlled axis and writes the numbers on like a pen?

I wonder what the old plates are made of. They have to last a long time. For a while I was puzzled. Steel would rust; and I thought even stainless steel wouldn't last as long as these tags need to. Aluminum is expensive. Tin degrades at low temperatures. But thanks to the Wonders of the Internet, I have learned that, properly made, stainless steel tags can indeed last long enough; the web site of the British Stainless Steel Association advises me that even in rough conditions, stainless steel with the right composition can last 85 years outdoors. I will do what I should have done in the first place, and go test the tags with a magnet to see if they are ferrous.

Here's where some knucklehead in the Streets Department decided to nail a No Parking sign right over the ID tag:

A close-up of an old oval tag just peeking out from behind the corner of the metal regulation sign that was nailed to the same pole

Another thing you can see on these poles is inspection tags:

A very old pole. Three groups of tags are nailed to it.  The bottom two groups each contain an oval tag stamped with OSMOSE and an inspection year (2001 or 2013), and a quarter-circle tag stamped with MITC-FUME.  The top group is missing its oval tag, and has only a rather rusty quarter-circle that says WOODFUME

Without the Internet I would just have to wonder what these were and what OSMOSE meant. It is the name of the company that PECO has hired to inspect and maintain the poles. They specialize in this kind of work. This old pole was inspected in 2001 and again in 2013. The dated inspection tag from the previous inspection is lost but we can see a pie-shaped tag that says WOODFUME. You may recall from my previous article that the main killer of wood poles is fungal infection. Woodfume is an inexpensive fumigant that retards pole decay. It propagates into the pole and decomposes into MITC (methyl isothiocyanate). By 2001 PECO had switched to using MITC-FUME, which impregnates the pole directly with MITC. Osmose will be glad to tell you all about it.

(Warning: Probably at least 30% of the surmise in this article is wrong.)

by Mark Dominus at February 14, 2018 05:05 PM

Michael Snoyman

Stack Patching Policy

This blog post is about a potential policy decision affecting the maintenance of the Stack code base itself. It will affect contributors to the project, and those building Stack for other purposes (such as maintainers of Linux distro packages). It will only indirectly affect end users, as hopefully is made clear in the discussion below.

Github issue for official discussion


Until now, every version of Stack that has been released (or even merged to master, unless I'm mistaken) has exclusively used versions of dependencies available on Hackage. It has not used the extra-dep archive or Git repo feature, or submodules to include alternative versions of source code. This means that, for the most part, you get the same Stack whether you get an official download, run stack build inside the source tree, use stack build using a Stackage snapshot, or run cabal install stack.

Now, as it happens, this isn't completely true either. The official Stack binaries pin the dependencies to exact versions which have been tested together, via the stack.yaml file. This means that the latter two approaches of getting Stack binaries may have different behavior, due to the snapshot or the dependency solver choosing different versions. Some distros have already run into bugs because of this.

To pull all of that back in: the official way to get Stack today will guarantee a specific set of dependencies which have gone through the full Stack integration test suite. Some alternative methods may not provide the same level of guarantees. But with a bit of effort, you can force Stack or cabal-install to build exactly the same thing the official binaries provide.

The new problem

One issue that pops up is: what do we do in a situation where an upstream package has a bug, and either cannot (within the timeframe desired) or will not release a new version with a fix? The concrete example that pops up is hackage-security pull request #203 (addressing issue #187), though the specific details aren't too important for the discussion here. The discussion here is about the general rule: what should Stack do in this case?

Four options

Others may be more creative than me, but I can see four different options to respond in a situation like this:

  1. Continue using the officially released upstream version of hackage-security, bugs and all
  2. Fork hackage-security on Hackage, and depend on the fork
  3. Inline the code from hackage-security into Stack itself, and drop the explicit dependency on hackage-security
  4. Include hackage-security via an extra-dep pointing at a Git commit. Our official builds will use the patched version of hackage-security, and anyone building from Hackage will end up with the unpatched version

Option (1) is the status quo: we cannot fix this bug until upstream fixes it. This is a disappointing outcome for users, as we know how to fix the bug, and can imminently do so, but users will continue to suffer regardless. However, it makes maintenance of Stack relatively easy, and has no impact on packagers.

Options (2) and (3) are relatively similar: you end up with a forked version of the codebase feeding into Stack, but all of the code necessary is still available from Hackage. Packagers, and people building with a command like cabal install stack, will still be able to get the right version of the executable, assuming they pin their dependencies the same way we do (as mentioned above).

Option (4) is a more radical departure. It means that cabal install stack, without quite a bit of extra work, will not result in the same executable. You can argue that, given the assumed lack of pinning of dependency versions, this isn't too terribly different from the status quo. And with the patch I've written for hackage-security now, that's basically true. However, it's theoretically possible that, in the future, we could have a patch that changes the API, and makes it impossible to build Stack against the Hackage version of a package. So let's break up option 4 into two subchoices:

  • Option 4a: we can use an extra-dep, but we must ensure that the Stack codebase continues to build against the Hackage version of the package, even if it's missing a bug fix, performance enhancement, or whatever else we wrote
  • Option 4b: free-for-all: use whatever extra-deps we want, and state that there is no support for building from Hackage alone.

My recommendation

I lean towards option 4a. I don't want to upload forks to Hackage (option (2)); it's a confusing situation for users, and may be seen as an aggressive move (which is certainly not the intent here). Option (3) could work, but makes it more painful than it should be to work on the Stack codebase. I'd rather not subject contributors (or myself!) to that.

Option 4b is IMO a step too far: we'd be uploading something to Hackage which we know for a fact could never be built there. At that point, there's not really any reason for uploading to Hackage. And option 1 (we cannot fix bugs) is just too limiting.

The biggest impact I can see is how others will end up packaging Stack. But frankly, this is already a situation that deserves an official discussion. There have certainly been plenty of cases in the past where users tripped on bugs that didn't exist in the official Stack releases, and the Stack team needed to spend inordinate time tracing this back to a bad build. So if nothing else, hopefully this post will spawn some discussion of correct packaging behavior.

Official discussion

As mentioned above, I've created a GitHub issue for an official discussion of this topic: issue #3866. Other discussions (Disqus below, mailing list, etc.) are welcome, but may not receive the full attention of the Stack team.

February 14, 2018 08:38 AM

February 13, 2018

Mark Jason Dominus

Weighted Reservoir Sampling

(If you already know about reservoir sampling, just skip to the good part.)

The basic reservoir sampling algorithm asks us to select a random item from a list, easy peasy, except:

  1. Each item must be selected with equal probability
  2. We don't know ahead of time how big the list is
  3. We may only make one pass over the list
  4. We may use only constant memory

Maybe the items are being read from a pipe or some other lazy data structure. There might be zillions of them, so we can't simply load them into an array. Obviously something like this doesn't work:

# Python
from random import random
selected = None
for item in inputs:
    if random() < 0.5:
        selected = item

because it doesn't select the items with equal probability. Far from it! The last item is selected as often as all the preceding items put together.

The requirements may seem at first impossible to satisfy, but it can be done and it's not even difficult:

from random import random
n = 0
selected = None

for item in inputs:
    n += 1
    if random() < 1/n:
        selected = item

Here inputs is some sort of generator that presents the list of items, one at a time. After the loop completes, the chosen item is in selected. A proof that this selects each item equiprobably is left as an easy exercise, or see this math StackExchange post. A variation for selecting k items instead of only one is quite easy.
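The several-items variation can be sketched as a pure fold. This is a minimal Haskell version of the usual approach (fill the reservoir, then replace a random slot with decreasing probability), not code from the original discussion. To stay dependency-free it takes its randomness as an argument: `us` is a hypothetical supply of uniform draws from [0, 1), one per input item, and `reservoirK` is an illustrative name.

```haskell
import Data.List (foldl')

-- | Keep a uniform random sample of (up to) k items from a stream, in one pass.
--
-- The first k items fill the reservoir. After that, item number n (1-based)
-- replaces a uniformly chosen slot with probability k/n.
--
-- ASSUMPTION: `us` supplies one uniform draw from [0, 1) per input item;
-- how those draws are produced is up to the caller. Items beyond the end
-- of `us` are ignored.
reservoirK :: Int -> [Double] -> [a] -> [a]
reservoirK k us xs = snd (foldl' step (0, []) (zip us xs))
  where
    -- n is the number of items seen before the current one
    step (n, res) (u, x)
      | n < k     = (n + 1, res ++ [x])
      | j < k     = (n + 1, replaceAt j x res)
      | otherwise = (n + 1, res)
      where
        -- j is uniform on 0 .. n, so it lands inside the reservoir with
        -- probability k / (n + 1)
        j = floor (u * fromIntegral (n + 1)) :: Int

    replaceAt i y ys = take i ys ++ [y] ++ drop (i + 1) ys
```

With k = 1 this is the same test as `random() < 1/n` in the loop above. For example, `reservoirK 2 [0.0, 0.0, 0.0, 0.1] "abcd"` evaluates to `"db"`.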

The good part

Last week I thought of a different simple variation. Suppose each item is presented along with an arbitrary non-negative weight \(w_i\), measuring the relative likelihood of its being selected for the output. For example, an item with weight 6 should be selected twice as often as an item with weight 3, and three times as often as an item with weight 2.

The total weight is \(W = \sum_i w_i\), and at the end, whenever that is, we want to have selected each item with probability \(w_i / W\):

from random import random
total = 0
selected = None

for item, weight in inputs:
    if weight == 0: continue
    total += weight
    if random() < weight/total:
        selected = item

The correctness proof is almost the same. Clearly this reduces to the standard algorithm when all the weights are equal.
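One way to check the correctness claim without writing out the induction is to compute the selection probabilities exactly. The Haskell sketch below does not run the sampler; it follows the same loop symbolically with rational arithmetic. An item ends up selected if it is picked when it arrives and every later item fails to replace it. The names `finalProbabilities` and `targetProbabilities` are illustrative, and the weights are assumed to be positive integers.

```haskell
import Data.List (tails)
import Data.Ratio ((%))

-- | For each weight, the exact probability that the weighted reservoir
-- loop ends with that item selected.
--
-- Item i survives if it is picked on arrival (probability w_i / W_i, with
-- W_i the running total up to and including i) and each later item j is
-- rejected (probability 1 - w_j / W_j).
--
-- ASSUMPTION: every weight is > 0; the loop in the text skips zero weights.
finalProbabilities :: [Integer] -> [Rational]
finalProbabilities ws =
    [ pick * product (map (1 -) later)
    | (pick : later) <- tails picks
    ]
  where
    runningTotals = scanl1 (+) ws
    -- probability that each item replaces the current selection on arrival
    picks = zipWith (%) ws runningTotals

-- | The target distribution: each weight divided by the total weight.
targetProbabilities :: [Integer] -> [Rational]
targetProbabilities ws = map (% sum ws) ws
```

For the weights 6, 3 and 2 from the example, `finalProbabilities [6, 3, 2] == targetProbabilities [6, 3, 2]` is `True`: both lists are 6/11, 3/11, 2/11.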

This isn't a major change, but it seems useful and I hadn't seen it before.

by Mark Dominus at February 13, 2018 07:48 PM

Brent Yorgey

A (work in progress) translation of Joyal’s original paper on species

tl;dr: I’m working on an English translation, with additional commentary, of Joyal’s 1981 paper introducing the concept of combinatorial species. Collaboration and feedback welcome!

Back when I was writing my PhD thesis on combinatorial species, I was aware that André Joyal’s original papers introducing combinatorial species are written in French, which I don’t read. I figured this was no big deal, since there is plenty of secondary literature on species in English (most notably Bergeron et al., which, though originally written in French, has been translated into English by Margaret Readdy). But at some point I asked a question on MathOverflow to which I hadn’t been able to find an answer, and was told that the answer was already in one of Joyal’s original papers!

So I set out to try to read Joyal’s original papers in French (there are two in particular: Une théorie combinatoire des séries formelles, and Foncteurs analytiques et espèces de structures), and found out that it was actually possible since (a) they are mathematics papers, not high literature; (b) I already understand a lot of the mathematics; and (c) these days, there are many easily accessible digital tools to help with the task of translation.

However, although it was possible for me to read them, it was still hard work, and for someone without my background in combinatorics it would be very tough going, which is a shame since the papers are really very beautiful. So I decided to do something to help make the papers and their ideas more widely accessible. In particular, I’m making an English translation of the papers[1] (or at least of the first one, for now), interspersed with my own commentary to fill in more background, give additional examples, make connections to computation and type theory, or offer additional perspective. I hope it will be valuable to those in the English-speaking mathematics and computer science communities who want to learn more about species or gain more appreciation for a beautiful piece of mathematical history.

This is a long-term project, and not a high priority at the moment; I plan to work on it slowly but steadily. I’ve only worked on the first paper so far, and I’m at least far enough along that I’m not completely embarrassed to publicize it (but not much more than that). I decided to publicize my effort now, instead of waiting until I’m done, for several reasons: first, it may be a very long time before I’m really “done”, and some people may find it helpful or interesting before it gets to that point. Second, I would welcome collaboration, whether in the form of help with the translation itself, editing or extending the commentary, or simply offering feedback on early drafts or fixing typos. You can find an automatically updated PDF with the latest draft here, and the github repo is here. There are also simple instructions for compiling the paper yourself (using stack) should you want to do that.

  1. And yes, I checked carefully, and this is explicitly allowed by the copyright holder (Elsevier) as long as I put certain notices on the first page.

by Brent at February 13, 2018 03:07 PM

Manuel M T Chakravarty

PLT engineers needed!

Do you know how to write FP compilers? Would you like to design & implement next-generation, functional(!) smart contract languages with Phil Wadler and myself? Check out Phil’s post and the IOHK job ad.

February 13, 2018 12:44 AM

February 12, 2018

Mark Jason Dominus

Philadelphia sports fans behaving badly

Philadelphia sports fans have a bad reputation. For example, we are famous for booing Santa Claus and hitting him with snowballs. I wasn't around for that; it happened in 1968. When the Santa died in 2015, he got an obituary in the Philadelphia Inquirer:

Frank Olivo, the Santa Claus who got pelted with snowballs at the Eagles game that winter day in 1968, died Thursday, April 30…

The most famous story of this type is about Ed Rendell (after he was Philadelphia District Attorney, but before he was Mayor) betting an Eagles fan that they could not throw snowballs all the way from their upper-deck seat onto the field. This was originally reported in 1989 by Steve Lopez in the Inquirer.

(Lopez's story is a blast. He called up Rendell, who denied the claim, and referred Lopez to a friend who had been there with him. Lopez left a message for the friend. Then Rendell called back to confess. Later Rendell's friend called back to deny the story. Lopez wrote:

Was former D.A. Ed Rendell's worst mistake to (A) bet a drunken hooligan he couldn't reach the field, (B) lie about it, (C) confess, or (D) take his friend down with him?

My vote is C. Too honest. Why do you think he can't win an election?

A few years later Rendell was elected Mayor of Philadelphia, and later, Governor of Pennsylvania. Anyway, I digress.)

I don't attend football games, and baseball games are not held in snowy weather, so we have to find other things to throw on the field. I am too young to remember Bat Day, where each attending ticket-holder was presented with a miniature souvenir baseball bat; that was eliminated long ago because too many bats were thrown at the visiting players. (I do remember when those bats stopped being sold at the concession stands, for the same reason.) Over the years, all the larger and harder premiums were eliminated, one by one, but we are an adaptable people and once, to protest a bad call by the umpire, we delayed the game by wadding up our free promotional sport socks and throwing them onto the field. That was the end of Sock Day.

On one memorable occasion, two very fat gentlemen down by the third-base line ran out of patience during an excessively long rain delay and climbed over the fence, ran out and belly-flopped onto the infield, sliding on the wet tarpaulin all the way to the first-base side. Confronted there by security, they evaded capture by turning around and sliding back. These heroes were eventually run down, but only after livening up what had been a very trying evening.

The main point of this note is to shore up a less well-known story of this type. I have seen it reported that Phillies fans once booed Miss Pennsylvania, and I have also seen people suggest that this never really happened. On my honor, it did happen. We not only booed Miss Pennsylvania, we booed her for singing the national anthem. I was at that game, in 1993. The Star-Spangled Banner has a lot of problems that the singer must solve one way or another, and there are a lot of ways to interpret it. But it has a melody, and the singer's interpretation is not permitted to stray so far from the standard that they are singing a different song that happens to have the same words. I booed too, and I'm not ashamed to admit it.

by Mark Dominus at February 12, 2018 05:26 PM

February 08, 2018

Sandy Maguire

Devlog: Navigation


February 8, 2018 · devlog, neptune

One of the tropes of the golden era of point-n-click adventure games is, would you believe it, the pointing and clicking. In particular, pointing where you’d like the avatar to go, and clicking to make it happen. This post will explore how I made that happen in my neptune game engine.

The first thing we need to do is indicate to the game which parts of the background should be walkable. Like we did for marking hotspots, we’ll use an image mask. Since we have way more density in an image than we’ll need for this, we’ll overlay it on the hotspot mask.

Again, if the room looks like this:

[Figure: room background]

Our mask image would look like this:

[Figure: room mask]

Here, the walkable section of the image is colored in blue. You’ll notice there’s a hole in the walk mask corresponding to the table in the room; we wouldn’t want our avatar to find a path that causes him to walk through the table.

However there is something important to pay attention to here; namely that we’re making an adventure game. Which is to say that our navigation system doesn’t need to be all that good; progress in the game is blocked more by storytelling and puzzles than it is by the physical location of the player (unlike, for example, in a platformer game.) If the avatar does some unnatural movement as he navigates, it might be immersion-breaking, but it’s not going to be game-breaking.

Which means we can half ass it, if we need to. But I’m getting ahead of myself.

The first thing we’re going to need is a function which samples our image mask and determines if a given position is walkable.

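import Codec.Picture (Image, PixelRGBA8 (..), pixelAt)
import Data.Bits (testBit)
import Linear (V2 (..))

canWalkOn :: Image PixelRGBA8 -> V2 Int -> Bool
canWalkOn img (V2 x y)
    = flip testBit walkableBit
    . getWalkableByte
    $ pixelAt img x y
  where
    -- the walk mask lives in the top bit of the blue channel
    getWalkableByte (PixelRGBA8 _ _ b _) = b
    walkableBit = 7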

Partially applying this function to our image mask gives us a plain ol’ function which we can use to query walk-space.

In a 3D game, you’d use an actual mesh to mark the walkable regions, rather than using this mask thing. For that reason, from here on out we’ll call this thing a navmesh, even though it isn’t strictly an appropriate name in our case.

Because pathfinding algorithms are defined in terms of graphs, the next step is to convert our navmesh into a graph. There are lots of clever ways to do this, but remember, we’re half-assing it. So instead we’re going to do something stupid and construct a square graph by sampling every \(n\) pixels, and connecting it to its orthogonal neighbors if both the sample point and its neighbor are walkable.

It looks like this:

[Figure: graph building]

Given the navmesh, we sample every \(n\) points, and determine whether or not to put a graph vertex there (white squares are vertices, the black squares are just places we sampled.) Then, we put an edge between every neighboring vertex (the white lines.)

We’re going to want to run A* over this graph eventually, which is implemented in Haskell via Data.Graph.AStar.aStar. This package uses an implicit representation of this graph rather than taking in a graph data structure, so we’ll construct our graph in a manner suitable for aStar.

But first, let’s write some helper functions to ensure we don’t get confused about whether we’re in world space or navigation space.

-- | Sample every n pixels on the navmesh.
sampleRate :: Float
sampleRate = 4

-- | Newtype to differentiate nav node coordinates from world coordinates.
-- Deriving the numeric classes needs GeneralizedNewtypeDeriving, and
-- Integral needs Enum as a superclass.
newtype Nav = Nav { unNav :: Int }
  deriving (Eq, Ord, Enum, Num, Real, Integral)

toNav :: V2 Float -> V2 Nav
toNav = fmap round
      . fmap (/ sampleRate)

fromNav :: V2 Nav -> V2 Float
fromNav = fmap (* sampleRate)
        . fmap fromIntegral

toNav and fromNav are roughly inverses of one another – good enough for half-assing it at least. We’ll do all of our graph traversal stuff in nav-space, and use world-space only at the boundaries.
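To see what “roughly inverses” means in practice, here is a small dependency-free sketch. It swaps linear’s `V2` for a local stand-in type and uses a plain `Int` instead of the `Nav` newtype, so treat it as an illustration of the quantization rather than the engine’s code. `snapToGrid` is a made-up name.

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- Stand-in for linear's V2, so the sketch needs nothing beyond base.
data V2 a = V2 a a
  deriving (Eq, Show, Functor)

sampleRate :: Float
sampleRate = 4

-- World space -> nav space: divide by the sample rate and round.
toNav :: V2 Float -> V2 Int
toNav = fmap (round . (/ sampleRate))

-- Nav space -> world space: scale back up.
fromNav :: V2 Int -> V2 Float
fromNav = fmap ((* sampleRate) . fromIntegral)

-- Going nav -> world -> nav is exact, but world -> nav -> world snaps the
-- point to the sampling grid, moving each coordinate by at most
-- sampleRate / 2.
snapToGrid :: V2 Float -> V2 Float
snapToGrid = fromNav . toNav
```

So `toNav (fromNav (V2 3 7))` gives back `V2 3 7`, while `snapToGrid (V2 9 13)` is `V2 8.0 12.0`: close to where we started, but not the same point.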

We start with some helper functions:

navBounds :: Image a -> V2 Nav
navBounds = subtract 1
          . toNav
          . fmap fromIntegral
          . imageSize

navBounds gives us the largest valid navigation point from an image, which will be useful later when we want to build a graph and don’t want to sample points that are not on it.

The next step is our neighbors function, which should compute the edges for a given node in the navigation graph.

neighbors :: Image PixelRGBA8 -> V2 Nav -> HashSet (V2 Nav)
neighbors img v2 = HS.fromList $ do
  let canWalkOn' = canWalkOn img
                 . fmap floor
                 . fromNav
      -- largest valid nav coordinates, used for the bounds checks below
      V2 w h = navBounds img

  V2 x y <- fmap (v2 &)
            [ _x -~ 1
            , _x +~ 1
            , _y -~ 1
            , _y +~ 1
            ]
  guard $ canWalkOn' v2
  guard $ x >= 0
  guard $ x <= w
  guard $ y >= 0
  guard $ y <= h
  guard . canWalkOn' $ V2 x y
  return $ V2 x y

We use the list monad here to construct all of the possible neighbors: those which are left, right, above and below our current location, respectively. We then guard on each, ensuring that our current nav point is walkable, that our candidate neighbor is within nav bounds, and finally that the candidate itself is walkable. We need to do this walkable check last, since everything will explode if we try to sample a pixel that is not in the image.

Aside: if you actually have a mesh (or correspondingly a polygon in 2D), you can bypass all of this sampling nonsense by tessellating the mesh into triangles, and using the results as your graph. In my case I didn’t have a polygon, and I didn’t want to write a tessellating algorithm, so I went with this route instead.

Finally we need a distance function, which we will use both as our A* heuristic and as our actual distance. The actual distance metric we use doesn’t matter, so long as it corresponds monotonically with the actual distance. We’ll use distance squared, because it has this monotonic property we want, and saves us from having to pay the cost of computing square roots.

distSqr :: V2 Nav -> V2 Nav -> Float
distSqr x y = qd (fmap fromIntegral x) (fmap fromIntegral y)

And with that, we’re all set! We can implement our pathfinding by filling in all of the parameters to aStar:

pathfind :: Image PixelRGBA8 -> V2 Float -> V2 Float -> Maybe [V2 Float]
pathfind img src dst =
    fmap fromNav <$> aStar (neighbors img) distSqr (distSqr navDst) (== navDst) navSrc
  where
    navSrc = toNav src
    navDst = toNav dst

Sweet. We can run it, and we’ll get a path that looks like this:

Technically correct, in that it does in fact get from our source location to our destination. But it’s obviously half-assed. This isn’t the path that a living entity would take; as a general principle we try not to move in rectangles if we can help it.

We can improve on this path by attempting to shorten it. In general this is a hard problem, but we can solve that by giving it the old college try.

Our algorithm to attempt to shorten will be a classic divide and conquer approach – pick the two endpoints of your current path, and see if there is a straight line between the two that is walkable throughout its length. If so, replace the path with the line you just constructed. If not, subdivide your path in two, and attempt to shorten each half of it.

Before we actually get into the nuts and bolts of it, here’s a quick animation of how it works. The yellow circles are the current endpoints of the path being considered, and the yellow lines are the potential shortened routes. Whenever we can construct a yellow line that doesn’t leave the walkable region, we replace the path between the yellow circles with the line.

[Figure: path shortening]

The “divide and conquer” bit of our algorithm is easy to write. We turn our path list into a Vector so we can randomly access it, and then call out to a helper function sweepWalkable to do the nitty gritty stuff. We append the src and dst to the extrema of the constructed vector because aStar won’t return our starting point in its found path, and because we quantized the dst when we did the pathfinding, so the last node on the path is the closest navpoint, rather than being where we asked the character to move to.

shorten :: Image PixelRGBA8 -> V2 Float -> V2 Float -> [V2 Float] -> [V2 Float]
shorten img src dst path =
    let v = V.fromList $ (src : path) ++ [dst]
     in go 0 (V.length v - 1) v
  where
    go l u v =
      if sweepWalkable img (v V.! l) (v V.! u)
         then [v V.! u]
         else let mid = ((u - l) `div` 2) + l
               in go l mid v ++ go mid u v
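The recursion is easier to play with when the walkability test is just a parameter. The following is a sketch of the same divide-and-conquer idea over a plain list, with a hypothetical `clear` predicate standing in for `sweepWalkable img`; `shortenWith` is an illustrative name, not part of the engine. It also stops splitting once the two endpoints are adjacent in the path, since splitting an adjacent pair at its midpoint just yields the same pair again.

```haskell
-- | Divide-and-conquer path shortening over a plain list.
--
-- ASSUMPTION: `clear a b` is a stand-in for `sweepWalkable img a b`; it
-- answers "can we walk in a straight line from a to b?".
-- The result lists the points to walk to, in order, excluding the start.
-- A list is used for brevity; indexing it is linear, which is why the
-- real thing wants a Vector.
shortenWith :: (p -> p -> Bool) -> [p] -> [p]
shortenWith clear pts = go 0 (length pts - 1)
  where
    go l u
      | u <= l                      = []
      | u - l == 1                  = [pts !! u]  -- adjacent: nothing left to split
      | clear (pts !! l) (pts !! u) = [pts !! u]  -- straight shot: skip what lies between
      | otherwise =
          let mid = l + (u - l) `div` 2
           in go l mid ++ go mid u
```

For instance, with points on a number line and a predicate that only allows hops of length two or less, `shortenWith (\a b -> abs (a - b) <= 2) [0, 1, 2, 3, 4 :: Int]` gives `[2,4]`.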

The final step, then, is to figure out what this sweepWalkable thing is. Obviously it wants to construct a potential line between its endpoints, but we don’t want to have to sample every damn pixel. Remember, we’re half-assing it. Instead, we can construct a line, but actually only sample the nav points that are closest to it.

In effect this is “rasterizing” our line from its vector representation into its pixel representation.

Using the Pythagorean theorem in navigation space will give us the “length” of our line in navigation space, which corresponds to the number of navpoints we’ll need to sample.

For example, if our line looks like this:

[Figure: pythagorean theorem]

Then the number \(n\) of nav points we need to sample is:

\[ \begin{align*} n &= \lfloor \sqrt{4^2 + 5^2} \rfloor \\ &= \lfloor \sqrt{16 + 25} \rfloor \\ &= \lfloor \sqrt{41} \rfloor \\ &= \lfloor 6.4 \rfloor \\ &= 6 \end{align*} \]

We can then subdivide our line into 6 segments, and find the point on the grid that is closest to the end of each. These points correspond with the nodes that need to be walkable individually in order for our line itself to be walkable. This approach will fail for tiny strands of unwalkable terrain that slice through otherwise walkable regions, but maybe just don’t do that? Remember, all we want is for it to be good enough, half-assing it and all.



So, how do we do it?

sweepWalkable :: Image PixelRGBA8 -> V2 Float -> V2 Float -> Bool
sweepWalkable img src dst =
  let dir            = normalize $ dst - src
      distInNavUnits = round $ distance src dst
      bounds         = navBounds img
   in getAll . flip foldMap [0 .. distInNavUnits] $ \n ->
        let me = src + dir ^* (fromIntegral @Int n)
         in All . canWalkOn' img
                . clamp (V2 0 0) bounds
                $ toNav me
  where
    -- not shown in the listing; assumed to be the same nav-space lookup
    -- that neighbors defines locally
    canWalkOn' i = canWalkOn i . fmap floor . fromNav

Sweet! Works great! Our final pathfinding function is thus:

navigate :: Image PixelRGBA8 -> V2 Float -> V2 Float -> Maybe [V2 Float]
navigate img src dst = fmap (shorten img src dst) $ pathfind img src dst

Golden, baby.

Next time we’ll talk about embedding a scripting language into our game so we don’t need to wait an eternity for GHC to recompile everything whenever we want to change a line of dialog. Until then!


February 08, 2018 12:00 AM

February 07, 2018

Mark Jason Dominus

The many faces of the Petersen graph

(Actually the Petersen graph cannot really be said to have faces, as it is nonplanar. HA! HA! I MAKE JOKE!​!1!)

This article was going to be about how GraphViz renders the Petersen graph, but instead it turned out to be about how GraphViz doesn't render the Petersen graph. The GraphViz stuff will be along later.

Here we have the Petersen graph, which, according to Donald Knuth, “serves as a counterexample to many optimistic predictions about what might be true for graphs in general.” It is not that the Petersen graph is stubborn! But it marches to the beat of a different drummer. If you have not met it before, prepare to be delighted.

The Petersen graph has two sets of five vertices each.  Each set is connected into a pentagonal ring.  There are five more edges between vertices in opposite rings, but instead of being connected 0–0 1–1 2–2 3–3 4–4, they are connected 0–0 1–2 2–4 3–1 4–3.

This is the basic structure: a blue 5-cycle, and a red 5-cycle. Corresponding vertices in the two cycles are connected by five purple edges. But there is a twist! Notice that the vertices in the red cycle are connected in the order 1–3–5–2–4.

There are different ways to lay out the Petersen graph that showcase its many interesting properties. For example, the standard presentation, above, demonstrates that the Petersen graph is nonplanar, since it obviously contracts to \(K_5\). The presentation below obscures this, but it is good for seeing that the graph has diameter only 2:

Wait, what? Where did the pentagons go?

Try this instead:

The Petersen graph laid out as a tree, with a root attached to three level-1 nodes, each attached to 2 level-2 nodes.  The six level-2 nodes are then connected into a ring so that each level-2 node is at distance 1 or distance 2 from each other level-2 node.

Again the red vertices are connected in the order 1–3–5–2–4.

Okay, that is indeed the Petersen graph, but how does it help us see that the graph has diameter 2? Color the nodes by how far down they are from the root:

  • Obviously, the root node (black) has distance at most 2 to every other node, because the tree has only depth 2.

  • Each of the three second-level nodes (red) is distance 2 from the other two, via a path through the root.

  • The six third-level nodes (blue) are linked in a 6-cycle (dotted lines), so that each third-level node is at most two steps away along the cycle from the others, except for the one furthest away, but that is its sibling in the tree, and it has a path of length 2 through their common parent.

  • And since each third-level node (say, the one with the red ring) is connected by a dotted edge (orange) to cousins in both of the other branches of the tree, it's only distance 2 from both of its red uncle nodes.
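The diameter claim is also small enough to check by brute force. Here is a sketch in Haskell that builds the graph from the pentagon-and-pentagram description and tests every pair of vertices. The numbering is an assumption made for the sketch, not the labeling used in the diagrams: the outer pentagon is 0..4, the inner vertices are 5..9, vertex i is joined to i+5, and the inner cycle steps by two.

```haskell
type Vertex = Int

-- | Edges of the Petersen graph in the pentagon-and-pentagram layout.
-- ASSUMPTION: vertices are numbered 0..4 (outer) and 5..9 (inner).
petersenEdges :: [(Vertex, Vertex)]
petersenEdges =
       [ (i, (i + 1) `mod` 5)         | i <- [0 .. 4] ]  -- outer 5-cycle
    ++ [ (i, i + 5)                   | i <- [0 .. 4] ]  -- spokes
    ++ [ (i + 5, (i + 2) `mod` 5 + 5) | i <- [0 .. 4] ]  -- inner 5-cycle, stepping by two

-- | All vertices adjacent to v.
neighbours :: Vertex -> [Vertex]
neighbours v = [ b | (a, b) <- petersenEdges, a == v ]
            ++ [ a | (a, b) <- petersenEdges, b == v ]

-- | True when v can be reached from u in one or two steps.
withinTwo :: Vertex -> Vertex -> Bool
withinTwo u v =
    v `elem` neighbours u || any (`elem` neighbours v) (neighbours u)

-- | Diameter 2: every pair of distinct vertices is within two steps, and
-- at least one pair is not adjacent.
hasDiameterTwo :: Bool
hasDiameterTwo =
       and [ withinTwo u v          | u <- [0 .. 9], v <- [0 .. 9], u /= v ]
    && or  [ v `notElem` neighbours u | u <- [0 .. 9], v <- [0 .. 9], u /= v ]
```

`hasDiameterTwo` evaluates to `True`, and `map (length . neighbours) [0 .. 9]` confirms that every vertex has degree 3.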

Looking at the pentagonal version, you would not suspect the Petersen graph of also having a sixfold symmetry, but it does. We'll get there in two steps. Again, here's a version where it's not so easy to see that it's actually the Petersen graph, but whatever it is, it is at least clear that it has an automorphism of order six (give it a one-sixth turn):


The represents three vertices, one in each color. In the picture they are superimposed, but in the actual graph, no pair of the three is connected by an edge. Instead, each of the three is connected not to the others but to a tenth vertex that I omitted from the diagram entirely.

Let's pull apart the three vertices and reveal the hidden tenth vertex and its three edges:


Here is the same drawing, recolored to match the tree diagram from before; the outer hexagon is just the 6-cycle formed by the six blue leaf nodes:


But maybe it's easier to see if we look for red and blue pentagons. There are a couple of ways to do that:

[Figures: two ways of picking out red and blue pentagons]

As always, the red vertices are connected in the order 1–3–5–2–4.

Finally, here's a presentation you don't often see. It demonstrates that the Petersen graph also has fourfold symmetry:


Again, and represent single vertices stretched out into dumbbell shapes. The diagram only shows 14 of the 15 edges; the fifteenth connects the two dumbbells.

The pentagons are deeply hidden here. Can you find them? (Spoiler)

Even though this article was supposed to be about GraphViz, I found it impossible to get it to render the diagrams I wanted it to, and I had to fall back on Inkscape. Fortunately Inkscape is a ton of fun.

by Mark Dominus at February 07, 2018 11:37 PM

FP Complete

Best Practices for Developing Medical Device Software

At FP Complete we have experience writing Medical Device software that has to go through rigorous compliance steps and eventually be approved by a government regulatory body such as the US Food and Drug Administration (FDA).

by Niklas Hambüchen at February 07, 2018 08:30 PM

Gabriel Gonzalez

The wizard monoid

<html xmlns=""><body>

Recent versions of GHC (8.0 and later) provide a Monoid instance for IO, and this post gives a motivating example for why this instance is useful by building combinable "wizards".


I'll define a "wizard" as a program that prompts a user "up front" for multiple inputs and then performs several actions after all input has been collected.

Here is an example of a simple wizard:

main :: IO ()
main = do
    -- First, we request all inputs:
    putStrLn "What is your name?"
    name <- getLine

    putStrLn "What is your age?"
    age <- getLine

    -- Then, we perform all actions:
    putStrLn ("Your name is: " ++ name)
    putStrLn ("Your age is: " ++ age)

... which produces the following interaction:

What is your name?
Gabriel<Enter>
What is your age?
31<Enter>
Your name is: Gabriel
Your age is: 31

... and here is an example of a slightly more complex wizard:

import qualified System.Directory

main :: IO ()
main = do
    -- First, we request all inputs:
    files <- System.Directory.listDirectory "."
    let askFile file = do
            putStrLn ("Would you like to delete " ++ file ++ "?")
            response <- getLine
            case response of
                "y" -> return [file]
                _   -> return []

    listOfListOfFilesToRemove <- mapM askFile files
    let listOfFilesToRemove = concat listOfListOfFilesToRemove

    -- Then, we perform all actions:
    let removeFile file = do
            putStrLn ("Removing " ++ file)
            System.Directory.removeFile file
    mapM_ removeFile listOfFilesToRemove

... which produces the following interaction:

Would you like to delete file1.txt?
y<Enter>
Would you like to delete file2.txt?
n<Enter>
Would you like to delete file3.txt?
y<Enter>
Removing file1.txt
Removing file3.txt

In each example, we want to avoid performing any irreversible action before the user has completed entering all requested input.


Let's revisit our first example:

main :: IO ()
main = do
    -- First, we request all inputs:
    putStrLn "What is your name?"
    name <- getLine

    putStrLn "What is your age?"
    age <- getLine

    -- Then, we perform all actions:
    putStrLn ("Your name is: " ++ name)
    putStrLn ("Your age is: " ++ age)

This example is really combining two separate wizards:

  • The first wizard requests and displays the user's name
  • The second wizard requests and displays the user's age

However, we had to interleave the logic for these two wizards because we needed to request all inputs before performing any action.

What if there were a way to define these two wizards separately and then combine them into a larger wizard? We can do so by taking advantage of the Monoid instance for IO, like this:

import Data.Monoid ((<>))

name :: IO (IO ())
name = do
    putStrLn "What is your name?"
    x <- getLine
    return (putStrLn ("Your name is: " ++ x))

age :: IO (IO ())
age = do
    putStrLn "What is your age?"
    x <- getLine
    return (putStrLn ("Your age is: " ++ x))

runWizard :: IO (IO a) -> IO a
runWizard request = do
    respond <- request
    respond

main :: IO ()
main = runWizard (name <> age)

This program produces the exact same behavior as before, but now all the logic for dealing with the user's name is totally separate from the logic for dealing with the user's age.

The way this works is that we split each wizard into two parts:

  • the "request" (i.e. prompting the user for input)
  • the "response" (i.e. performing an action based on that input)

... and we do so at the type-level by giving each wizard the type IO (IO ()):

name :: IO (IO ())

age :: IO (IO ())

The outer IO action is the "request". When the request is done the outer IO action returns an inner IO action which is the "response". For example:

--      ↓ The request
name :: IO (IO ())
--          ↑ The response
name = do
    putStrLn "What is your name?"
    x <- getLine
    -- ↑ Everything above is part of the outer `IO` action (i.e. the "request")

    -- ↓ This return value is the inner `IO` action (i.e. the "response")
    return (putStrLn ("Your name is: " ++ x))

We combine wizards using the (<>) operator, which has the following behavior when specialized to IO actions:

ioLeft <> ioRight

= do resultLeft  <- ioLeft
     resultRight <- ioRight
     return (resultLeft <> resultRight)

In other words, if you combine two IO actions you just run each IO action and then combine their results. This in turn implies that if we nest two IO actions then we repeat this process twice:

requestLeft <> requestRight

= do respondLeft  <- requestLeft
     respondRight <- requestRight
     return (respondLeft <> respondRight)

= do respondLeft  <- requestLeft
     respondRight <- requestRight
     return (do
         unitLeft  <- respondLeft
         unitRight <- respondRight
         return (unitLeft <> unitRight) )

-- Both `unitLeft` and `unitRight` are `()` and `() <> () = ()`, so we can
-- simplify this further to:
= do respondLeft  <- requestLeft
     respondRight <- requestRight
     return (do
         respondLeft
         respondRight )

In other words, when we combine two wizards we combine their requests and then combine their responses.
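To make that ordering observable without a terminal, here is a minimal sketch that swaps the console for a log. The names `logged` and `runLogged` are hypothetical helpers introduced only for this illustration; each wizard records a "request" entry in its outer action and a "respond" entry in its inner action.

```haskell
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- A stand-in wizard: instead of prompting the user, it appends to a log.
-- The outer action is the "request", the returned action is the "response".
logged :: IORef [String] -> String -> IO (IO ())
logged ref label = do
    modifyIORef ref (++ ["request " ++ label])
    return (modifyIORef ref (++ ["respond " ++ label]))

runWizard :: IO (IO a) -> IO a
runWizard request = do
    respond <- request
    respond

-- Combine one wizard per label with the Monoid instance for IO,
-- run the combined wizard, and return the log it produced.
runLogged :: [String] -> IO [String]
runLogged labels = do
    ref <- newIORef []
    runWizard (foldMap (logged ref) labels)
    readIORef ref
```

Running `runLogged ["name", "age"]` should yield every request entry first, followed by every response entry, in the same left-to-right order.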

This works for more than two wizards. For example:

request0 <> request1 <> request2 <> request3

= do respond0 <- request0
     respond1 <- request1
     respond2 <- request2
     respond3 <- request3
     return (do
         respond0
         respond1
         respond2
         respond3 )

To show this in action, let's revisit our original example once again:

import Data.Monoid ((<>))

name :: IO (IO ())
name = do
    putStrLn "What is your name?"
    x <- getLine
    return (putStrLn ("Your name is: " ++ x))

age :: IO (IO ())
age = do
    putStrLn "What is your age?"
    x <- getLine
    return (putStrLn ("Your age is: " ++ x))

runWizard :: IO (IO a) -> IO a
runWizard request = do
    respond <- request
    respond

main :: IO ()
main = runWizard (name <> age)

... and this time note that name and age are awfully similar, so we can factor them out into a shared function:

import Data.Monoid ((<>))

prompt :: String -> IO (IO ())
prompt attribute = do
    putStrLn ("What is your " ++ attribute ++ "?")
    x <- getLine
    return (putStrLn ("Your " ++ attribute ++ " is: " ++ x))

runWizard :: IO (IO a) -> IO a
runWizard request = do
    respond <- request
    respond

main :: IO ()
main = runWizard (prompt "name" <> prompt "age")

We were not able to factor out this shared logic back when the logic for the two wizards was manually interleaved. Once we split them into separate logical wizards, we can begin to exploit shared structure to compress our program.

This program compression lets us easily add new wizards:

import Data.Monoid ((<>))

prompt :: String -> IO (IO ())
prompt attribute = do
    putStrLn ("What is your " ++ attribute ++ "?")
    x <- getLine
    return (putStrLn ("Your " ++ attribute ++ " is: " ++ x))

runWizard :: IO (IO a) -> IO a
runWizard request = do
    respond <- request
    respond

main :: IO ()
main = runWizard (prompt "name" <> prompt "age" <> prompt "favorite color")

... and take advantage of standard library functions that work on Monoids, like foldMap so that we can mass-produce wizards:

import Data.Monoid ((<>))

prompt :: String -> IO (IO ())
prompt attribute = do
    putStrLn ("What is your " ++ attribute ++ "?")
    x <- getLine
    return (putStrLn ("Your " ++ attribute ++ " is: " ++ x))

runWizard :: IO (IO a) -> IO a
runWizard request = do
    respond <- request
    respond

main :: IO ()
main = runWizard (foldMap prompt [ "name", "age", "favorite color", "sign" ])

More importantly, we can now easily see at a glance what our program does, and ease of reading is a greater virtue than ease of writing.

Final example

Now let's revisit the file removal example through the same lens:

import qualified System.Directory

main :: IO ()
main = do
    -- First, we request all inputs:
    files <- System.Directory.listDirectory "."
    let askFile file = do
            putStrLn ("Would you like to delete " ++ file ++ "?")
            response <- getLine
            case response of
                "y" -> return [file]
                _   -> return []

    listOfListOfFilesToRemove <- mapM askFile files
    let listOfFilesToRemove = concat listOfListOfFilesToRemove

    -- Then, we perform all actions:
    let removeFile file = do
            putStrLn ("Removing " ++ file)
            System.Directory.removeFile file
    mapM_ removeFile listOfFilesToRemove

We can simplify this using the same pattern:

import qualified System.Directory

main :: IO ()
main = do
    files <- System.Directory.listDirectory "."
    runWizard (foldMap prompt files)

prompt :: FilePath -> IO (IO ())
prompt file = do
    putStrLn ("Would you like to delete " ++ file ++ "?")
    response <- getLine
    case response of
        "y" -> return (do
            putStrLn ("Removing " ++ file)
            System.Directory.removeFile file )
        _   -> return (return ())

runWizard :: IO (IO a) -> IO a
runWizard request = do
    respond <- request
    respond

All we have to do is define a wizard for processing a single file and mass-produce the wizard using foldMap; the Monoid instance for IO takes care of bundling all the requests up front and threading the selected files to be removed afterwards.


This pattern does not subsume all possible wizards that users might want to write. For example, if the wizards depend on one another then this pattern breaks down pretty quickly. However, hopefully this provides an example of how you can chain the Monoid instance for IO with other Monoid instances (even itself!) to generate emergent behavior.


by Gabriel Gonzalez ( at February 07, 2018 03:56 PM

February 06, 2018

Douglas M. Auclair (geophf)

January 2018 1Liner 1HaskellADay problems and solutions

  • January 8th, 2018: from Nicoλas‏ @BeRewt
    A small @1HaskellADay, old-school. Define foo:

    > foo 3 [1..5]
    [([1,2,3], 4), ([2,3,4], 5)]

    > foo 2 [1..4]
    [([1,2], 3), ([2,3], 4)]

    > foo 2 [1..20]
    [([1,2],3), ([2,3],4), ..., ([18,19],20)]

    > foo 20 [1..2]
    []
    • Demiurge With a Teletype @mrkgrnao
      foo n
        = tails
        # filter (length # (> n))
        # map (splitAt n # second head)

      (#) = flip (.)
    • Andreas Källberg @Anka213
      I haven't tested it, but this should work:
      foo n xs = [ (hd,x) | (hd , x:_) <- splitAt n <$> tails xs ]
    • Nicoλas @BeRewt foo n = zip <$> fmap (take n) . tails <*> drop n
  • January 5th, 2018: You have the following DAG-paths:

    a -> b -> c -> e
    a -> b -> d -> e
    q -> r -> s
    w -> x
    y -> z

    and many more.

    From a path, provide a bi-directional encoding* given maximum graph depth is, say, 7, max number of roots is, say, 10, and max number of nodes is, say, 1000.
    • *bi-directional encoding of a graph path:

      DAG path -> enc is unique for an unique DAG path
      enc -> DAG path yields the same DAG path that created the unique enc.

      *DAG: "Directed, acyclic graph."
  • January 5th, 2018: given s :: Ord k => a -> (k,[v])

    define f using s

    f :: Ord k => [a] -> Map k [v]

    with no duplicate k in [a]
    • Christian Bay @the_greenbourne f = foldr (\e acc -> uncurry M.insert (s e) acc) M.empty
      • me: you can curry away the acc variable easily
      • Christian Bay @the_greenbourne You're right :)
        f = foldr (uncurry M.insert . s) M.empty
    • Bazzargh @bazzargh fromList.(map s) ?
      • me: Yuppers
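For readers who want to try the answers above, here is a sketch that collects them into one module. The names `fooPipe`, `fooComp`, `fooZip`, `fFold` and `fList` are made up here so the variants can coexist, and `s` is passed as an explicit argument because the problem statement leaves it abstract.

```haskell
import Control.Arrow (second)
import Data.List (tails)
import qualified Data.Map as M

-- Left-to-right composition, as in the first answer.
(#) :: (a -> b) -> (b -> c) -> a -> c
(#) = flip (.)

-- Pipeline version: keep tails longer than n, then split off the next element.
fooPipe :: Int -> [a] -> [([a], a)]
fooPipe n
  = tails
  # filter (length # (> n))
  # map (splitAt n # second head)

-- List-comprehension version: the pattern (hd, x:_) skips tails that are too short.
fooComp :: Int -> [a] -> [([a], a)]
fooComp n xs = [ (hd, x) | (hd, x:_) <- splitAt n <$> tails xs ]

-- Point-free version using the Applicative instance for functions.
fooZip :: Int -> [a] -> [([a], a)]
fooZip n = zip <$> fmap (take n) . tails <*> drop n

-- The two answers to the Map problem. They agree when keys are unique,
-- which the problem guarantees; with duplicate keys they may differ.
fFold, fList :: Ord k => (a -> (k, [v])) -> [a] -> M.Map k [v]
fFold s = foldr (uncurry M.insert . s) M.empty
fList s = M.fromList . map s
```

All three `foo` variants should agree on the sample inputs from the problem, including the empty result when `n` exceeds the list length.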

by geophf ( at February 06, 2018 09:43 AM

Ken T Takusagawa

[iblofees] Calendar Facts

Here is a machine readable version of xkcd #1930 "Calendar Facts", perhaps useful for followup projects like creating a random fact generator.  Further notes follow.

Sequence [Atom "Did you know that",Choice [Sequence [Atom "the",Choice [Sequence [Choice [Atom "fall",Atom "spring"],Atom "equinox"],Sequence [Choice [Atom "winter",Atom "summer"],Choice [Atom "solstice",Atom "Olympics"]],Sequence [Choice [Atom "earliest",Atom "latest"],Choice [Atom "sunrise",Atom "sunset"]]]],Sequence [Atom "Daylight",Choice [Atom "Saving",Atom "Savings"],Atom "Time"],Sequence [Atom "leap",Choice [Atom "day",Atom "year"]],Atom "Easter",Sequence [Atom "the",Choice [Atom "Harvest",Atom "super",Atom "blood"],Atom "moon"],Atom "Toyota Truck Month",Atom "Shark Week"],Choice [Sequence [Atom "happens",Choice [Atom "earlier",Atom "later",Atom "at the wrong time"],Atom "every year"],Sequence [Atom "drifts out of sync with the",Choice [Atom "sun",Atom "moon",Atom "zodiac",Sequence [Choice [Atom "Gregorian",Atom "Mayan",Atom "lunar",Atom "iPhone"],Atom "calendar"],Atom "atomic clock in Colorado"]],Sequence [Atom "might",Choice [Atom "not happen",Atom "happen twice"],Atom "this year"]],Atom "because of",Choice [Sequence [Atom "time zone legislation in",Choice [Atom "Indiana",Atom "Arizona",Atom "Russia"]],Atom "a decree by the Pope in the 1500s",Sequence [Choice [Atom "precession",Atom "libration",Atom "nutation",Atom "libation",Atom "eccentricity",Atom "obliquity"],Atom "of the",Choice [Atom "moon",Atom "sun",Atom "earth's axis",Atom "equator",Atom "prime meridian",Sequence [Choice [Atom "International Date",Atom "Mason-Dixon"],Atom "line"]]],Atom "magnetic field reversal",Sequence [Atom "an arbitrary decision by",Choice [Atom "Benjamin Franklin",Atom "Isaac Newton",Atom "FDR"]]],Atom "? 
",Atom "Apparently",Choice [Atom "it causes a predictable increase in car accidents",Atom "that's why we have leap seconds",Atom "scientists are really worried",Sequence [Atom "it was even more extreme during the",Choice [Sequence [Choice [Atom "Bronze",Atom "Ice"],Atom "Age"],Atom "Cretaceous",Atom "1990s"]],Sequence [Atom "there's a proposal to fix it, but it",Choice [Atom "will never happen",Atom "actually makes things worse",Atom "is stalled in Congress",Atom "might be unconstitutional"]],Atom "it's getting worse and no one knows why"],Atom ". ",Atom "While it may seem like trivia, it",Choice [Atom "causes huge headaches for software developers",Atom "is taken advantage of by high-speed traders",Atom "triggered the 2003 Northeast Blackout",Atom "has to be corrected for by GPS satellites",Atom "is now recognized as a major cause of World War I"],Atom "."]

Including the mouseover text, the grammar encodes 780,000 facts.
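A count like this can be computed directly from the grammar. The following is a sketch only, not the author's code: it repeats the `Grammar` type from the excerpt below so that it stands alone, and `countFacts` and `expand` are hypothetical helper names. A `Choice` adds the counts of its alternatives and a `Sequence` multiplies the counts of its parts, so the result counts derivations (two different derivations that happen to spell the same sentence are counted twice).

```haskell
data Grammar = Atom String | Choice [Grammar] | Sequence [Grammar]
  deriving (Show, Eq)

-- Number of sentences the grammar can derive.
countFacts :: Grammar -> Integer
countFacts (Atom _)      = 1
countFacts (Choice gs)   = sum (map countFacts gs)
countFacts (Sequence gs) = product (map countFacts gs)

-- Enumerate every sentence; parts of a Sequence are joined with single spaces,
-- which is a simplification (punctuation atoms get a space in front of them).
expand :: Grammar -> [String]
expand (Atom s)      = [s]
expand (Choice gs)   = concatMap expand gs
expand (Sequence gs) = map unwords (mapM expand gs)
```

For a random fact generator, picking a uniformly random index below `countFacts g` and indexing into `expand g` would be one simple, if inefficient, approach.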

The above grammar is the output of "show" by a Haskell program where we typed the grammar slightly more compactly, using/abusing the OverloadedStrings language extension and a Num instance.  OverloadedStrings is a nice language extension for when we have a large number of literals in the source.  Excerpts of the full source code below:

{-# LANGUAGE OverloadedStrings #-}

data Grammar = Atom String | Choice [Grammar] | Sequence [Grammar] deriving (Show,Eq);

instance IsString Grammar where {
fromString = Atom;
};

instance Num Grammar where {
(+) x y = Choice [x,y];
(*) x y = Sequence [x,y];
abs = undefined;
signum = undefined;
fromInteger = undefined;
negate = undefined;
};

facts :: Grammar;
facts =
    "Did you know that"

by Ken ( at February 06, 2018 03:40 AM

February 05, 2018

FP Complete

Cache CI builds to an S3 Bucket

Just by reading the blogpost title you are likely to guess the problem at hand, but to be fair I will recap it anyways.

by Alexey Kuleshevich ( at February 05, 2018 12:00 PM

February 04, 2018

Michael Snoyman

The Conduitpocalypse

At the end of last week, I made a number of breaking releases of libraries. The API impact of these changes was relatively minor, so most code should continue to work with little to no modification. I'm going to call out the major motivating changes for these releases below. If I leave out a package from explanations, assume the reason is just "upstream breaking changes caused breaking changes here." And of course check the relevant ChangeLogs for more details.

For completeness, the list of packages I released at the end of last week is:

  • conduit-1.3.0
  • conduit-extra-1.3.0
  • network-conduit-tls-1.3.0
  • resourcet-1.2.0
  • xml-conduit-1.8.0
  • html-conduit-1.3.0
  • xml-hamlet-0.5.0
  • persistent-2.8.0
  • persistent-mongoDB-2.8.0
  • persistent-mysql-2.8.0
  • persistent-postgresql-2.8.0
  • persistent-sqlite-2.8.0
  • persistent-template-
  • persistent-test-
  • conduit-combinators-1.3.0
  • yesod-1.6.0
  • yesod-auth-1.6.0
  • yesod-auth-oauth-1.6.0
  • yesod-bin-1.6.0
  • yesod-core-1.6.0
  • yesod-eventsource-1.6.0
  • yesod-form-1.6.0
  • yesod-newsfeed-
  • yesod-persistent-1.6.0
  • yesod-sitemap-1.6.0
  • yesod-static-1.6.0
  • yesod-test-1.6.0
  • yesod-websockets-0.3.0
  • classy-prelude-1.4.0
  • classy-prelude-conduit-1.4.0
  • classy-prelude-yesod-1.4.0
  • mutable-containers-0.3.4

Switching to MonadUnliftIO

The primary instigator for this set of releases was moving my libraries from MonadBaseControl and MonadCatch/MonadMask (from the monad-control and exceptions packages, respectively) over to MonadUnliftIO. I gave a talk recently (slides) at LambdaWorld about this topic, and have blogged at length as well. Therefore, I'm not going to get into the arguments here of why I think MonadUnliftIO is a better solution to this class of problems.

Unless I missed something, this change dropped direct dependency on the monad-control, lifted-base, and lifted-async packages throughout all of the packages listed above. The dependency on the exceptions package remains, but only for using the MonadThrow typeclass, not the MonadCatch and MonadMask typeclasses. (This does leave open a question of whether we should still define valid instances of MonadCatch and MonadMask, see rio issue #38.)

User impact: You may need to switch some usages of the lifted-base package to unliftio or similar, and update some type signatures. It's possible that if you're using a monad transformer stack which is not an instance of MonadUnliftIO that you'll face compilation issues.

Safer runResourceT

In previous versions of the resourcet package, if you register a cleanup action which throws an exception itself, the exception would be swallowed. In this new release, any exceptions thrown during cleanup will be rethrown by runResourceT.

conduit cleanups

There were some big-ish changes to conduit:

  • Dropped finalizers from the library, as discussed previously. This resulted in the removal of the yieldOr and addCleanup functions, and the replacement of the ResumableSource and ResumableConduit types with the SealedConduitT type.
  • Deprecated the old type synonyms and operators from the library. This has been planned for a long time.
  • Moved the Conduit and Data.Conduit.Combinators modules from conduit-combinators into conduit itself. This increases the dependency footprint of conduit itself, but makes it a fully loaded streaming data library. conduit-combinators is now an empty library.

Yesod: no more transformers!

The changes mentioned in my last blog post have been carried out. The biggest impact of that is replacing HandlerT and WidgetT (as transformers over IO) with HandlerFor and WidgetFor, as concrete monads parameterized by the site data type. Thanks to backwards compat HandlerT and WidgetT type synonyms, and the Template Haskell-generated Handler and Widget synonyms being updated automatically, hopefully most users will feel almost no impact from this. (Authors of subsites, however, will likely have a more significant amount of work to do.)

That's it?

Yeah, this post turned out much smaller than I expected. There are likely breakages that I've forgotten about and which should be called out. I'll ask that if anyone notices particular breakages they needed to work around, to please either include a note below or send a PR to this blog post (link above) adding information on the change.

February 04, 2018 10:15 AM

February 03, 2018

Roman Cheplyaka

Undefined behavior with StablePtr in Haskell

What will the following Haskell code snippet do?

  ptr1 <- newStablePtr 1
  ptr2 <- newStablePtr 2

  print =<< deRefStablePtr ptr1

Of course it will print 1… most of the time. But it’s also easy to make it print 2.

Here is the full program:

import Foreign.StablePtr

main = do
  ptr <- newStablePtr ()
  freeStablePtr ptr
  freeStablePtr ptr

  ptr1 <- newStablePtr 1
  ptr2 <- newStablePtr 2

  print =<< deRefStablePtr ptr1

If I compile it with ghc 8.0.2 on x86-64 Linux, it prints 2 and exits cleanly. (Under ghci or runghc, it also prints 2 but then gets a SIGBUS.) You can imagine how much fun it was to debug this issue in a complex library with a non-obvious double free.

The docs for freeStablePtr say:

Dissolve the association between the stable pointer and the Haskell value. Afterwards, if the stable pointer is passed to deRefStablePtr or freeStablePtr, the behaviour is undefined.

As far as undefined behaviors go, this one is fairly benign — at least it didn’t delete my code or install a backdoor.

Let’s see what’s going on here. The relevant definitions are in includes/stg/Types.h, includes/rts/Stable.h, rts/Stable.h, and rts/Stable.c. The excerpts below are simplified compared to the actual ghc source.

A stable pointer is just an integer index into an array, stable_ptr_table, although it is represented as a void*.

/*
 * Stable Pointers: A stable pointer is represented as an index into
 * the stable pointer table.
 * StgStablePtr used to be a synonym for StgWord, but stable pointers
 * are guaranteed to be void* on the C-side, so we have to do some
 * occasional casting. Size is not a matter, because StgWord is always
 * the same size as a void*.
 */

typedef void* StgStablePtr;

typedef struct {
    StgPtr addr;
} spEntry;

spEntry *stable_ptr_table;

This is how the table works:

  • If an index i is allocated to a valid StablePtr, then stable_ptr_table[i].addr points to whatever heap object the stable pointer is supposed to point to.

    StgPtr deRefStablePtr(StgStablePtr sp)
    {
        return stable_ptr_table[(StgWord)sp].addr;
    }
  • If the index i is free (not allocated to any valid StablePtr), then the corresponding entry in the table acts as a node in a linked list that contains all free entries. The variable stable_ptr_free contains a pointer to the start of this linked list.

    static spEntry *stable_ptr_free;

    /* Free a StgStablePtr */
    void freeSpEntry(spEntry *sp)
    {
        sp->addr = (P_)stable_ptr_free;
        stable_ptr_free = sp;
    }

    /* Allocate a fresh StgStablePtr */
    StgStablePtr getStablePtr(StgPtr p)
    {
      StgWord sp;
      sp = stable_ptr_free - stable_ptr_table;
      stable_ptr_free  = (spEntry*)(stable_ptr_free->addr);
      stable_ptr_table[sp].addr = p;
      return (StgStablePtr)(sp);
    }

Suppose that the first two free entries in stable_ptr_table are 18 and 19, which is what I actually observe at the program startup. (The RTS creates a few stable pointers of its own for the purpose of running the program, which explains why these don’t start at 0.) Here’s the double-free code annotated with what happens on each step.

  ptr <- newStablePtr ()
  -- ptr == 18
  -- stable_ptr_free == &stable_ptr_table[19]
  -- stable_ptr_table[18].addr is a pointer to a Haskell value ()
  freeStablePtr ptr
  -- stable_ptr_free == &stable_ptr_table[18]
  -- stable_ptr_table[18].addr == &stable_ptr_table[19]
  freeStablePtr ptr
  -- stable_ptr_free == &stable_ptr_table[18]
  -- stable_ptr_table[18].addr == &stable_ptr_table[18]

  ptr1 <- newStablePtr 1
  -- ptr1 == 18
  -- stable_ptr_free == &stable_ptr_table[18]
  -- stable_ptr_table[18].addr is a pointer to a Haskell value 1
  ptr2 <- newStablePtr 2
  -- ptr2 == 18
  -- stable_ptr_free is a pointer to a Haskell value 1
  -- stable_ptr_table[18].addr is a pointer to a Haskell value 2

Because stable_ptr_free now points into the Haskell heap and outside of stable_ptr_table, further allocations of stable pointers will corrupt Haskell memory and eventually result in SIGBUS or SIGSEGV.
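The trace above can be replayed with a small pure model of the simplified free list. This is only a sketch of the simplified C excerpts, not of the real RTS: the `Cell`, `Table`, `alloc` and `free` names are invented for the illustration, integers stand in for heap pointers, and `Nothing` stands in for "the free-list head no longer points into the table", where the real program would corrupt memory rather than stop cleanly.

```haskell
-- What a table entry (or the free-list head) currently holds.
data Cell
  = Ptr Int      -- a link to table entry i (part of the free list)
  | Val Integer  -- a stand-in for a pointer to a Haskell value
  deriving (Eq, Show)

data Table = Table
  { freeHead :: Cell         -- models stable_ptr_free
  , entry    :: Int -> Cell  -- models stable_ptr_table[i].addr
  }

-- A table whose free list is 18 -> 19 -> 20 -> ...
initial :: Table
initial = Table (Ptr 18) (\i -> Ptr (i + 1))

setEntry :: Int -> Cell -> Table -> Table
setEntry i c t = t { entry = \j -> if j == i then c else entry t j }

-- Models getStablePtr: pop the head of the free list and store the value.
alloc :: Integer -> Table -> Maybe (Int, Table)
alloc v t = case freeHead t of
  Ptr i -> Just (i, setEntry i (Val v) t { freeHead = entry t i })
  Val _ -> Nothing

-- Models freeSpEntry: push entry i onto the free list, with no validity check.
free :: Int -> Table -> Table
free i t = (setEntry i (freeHead t) t) { freeHead = Ptr i }

-- The double-free scenario: returns ptr1, ptr2, what ptr1 dereferences to,
-- and the final free-list head.
doubleFree :: Maybe (Int, Int, Cell, Cell)
doubleFree = do
  (p, t1)  <- alloc 0 initial        -- 0 stands in for ()
  let t2 = free p (free p t1)        -- the double free
  (p1, t3) <- alloc 1 t2
  (p2, t4) <- alloc 2 t3
  pure (p1, p2, entry t4 p1, freeHead t4)
```

In this model both allocations receive index 18, dereferencing the first pointer yields the second value, and the free-list head ends up holding a value instead of a table link.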

February 03, 2018 08:00 PM

February 02, 2018

Joachim Breitner

The magic “Just do it” type class

One of the great strengths of strongly typed functional programming is that it allows type driven development. When I have some non-trivial function to write, I first write its type signature, and then writing the implementation is often very obvious.

Once more, I am feeling silly

In fact, it often is completely mechanical. Consider the following function:

foo :: (r -> Either e a) -> (a -> (r -> Either e b)) -> (r -> Either e (a,b))

This is somewhat like the bind for a combination of the error monad and the reader monad, and remembers the intermediate result, but that doesn’t really matter now. What matters is that once I wrote that type signature, I feel silly having to also write the code, because there isn’t really anything interesting about that.
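For reference, a hand-written implementation might look like the sketch below. It is one natural inhabitant of the type, written out manually; it is not claimed to be the exact term any tool would produce.

```haskell
-- Run the first computation, feed its result to the second (with the same
-- environment r), and return both results; the first error wins.
foo :: (r -> Either e a) -> (a -> (r -> Either e b)) -> (r -> Either e (a,b))
foo f g r = case f r of
  Left e  -> Left e
  Right a -> case g a r of
    Left e  -> Left e
    Right b -> Right (a, b)
```

Every step is forced by the types: the only way to obtain an `a` is to apply `f` to `r`, and the only way to obtain a `b` is to apply `g` to that `a` and `r`.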

Instead, I’d like to tell the compiler to just do it for me! I want to be able to write

foo :: (r -> Either e a) -> (a -> (r -> Either e b)) -> (r -> Either e (a,b))
foo = justDoIt

And now I can! Assuming I am using GHC HEAD (or eventually GHC 8.6), I can run cabal install ghc-justdoit, and then the following code actually works:

{-# OPTIONS_GHC -fplugin=GHC.JustDoIt.Plugin #-}
import GHC.JustDoIt
foo :: (r -> Either e a) -> (a -> (r -> Either e b)) -> (r -> Either e (a,b))
foo = justDoIt

What is this justDoIt?

*GHC.LJT GHC.JustDoIt> :browse GHC.JustDoIt
class JustDoIt a
justDoIt :: JustDoIt a => a
(…) :: JustDoIt a => a

Note that there are no instances for the JustDoIt class -- they are created, on the fly, by the GHC plugin GHC.JustDoIt.Plugin. During type-checking, it looks at these JustDoIt t constraints and tries to construct a term of type t. It is based on Dyckhoff’s LJT proof search in intuitionistic propositional calculus, which I have implemented to work directly on GHC’s types and terms (and I find it pretty slick). Those who like Unicode can write (…) instead.

What is supported right now?

Because I am working directly in GHC’s representation, it is pretty easy to support user-defined data types and newtypes. So it works just as well for

data Result a b = Failure a | Success b
newtype ErrRead r e a = ErrRead { unErrRead :: r -> Result e a }
foo2 :: ErrRead r e a -> (a -> ErrRead r e b) -> ErrRead r e (a,b)
foo2 = (…)

It doesn’t infer coercions or type arguments or any of that fancy stuff, and carefully steps around anything that looks like it might be recursive.

How do I know that it creates a sensible implementation?

You can check the generated Core using -ddump-simpl of course. But it is much more convenient to use inspection-testing to test such things, as I am doing in the Demo file, which you can skim to see a few more examples of justDoIt in action. I very much enjoyed reaping the benefits of the work I put into inspection-testing, as this is so much more convenient than manually checking the output.

Is this for real? Should I use it?

Of course you are welcome to play around with it, and it will not launch any missiles, but at the moment, I consider this a prototype that I created for two purposes:

  • To demonstrate that you can use type checker plugins for program synthesis. Depending on what you need, this might allow you to provide a smoother user experience than the alternatives, which are:

    • Preprocessors
    • Template Haskell
    • Generic programming together with type-level computation (e.g. generic-lens)
    • GHC Core-to-Core plugins

    In order to make this viable, I slightly changed the API for type checker plugins, which are now free to produce arbitrary Core terms as they solve constraints.

  • To advertise the idea of taking type-driven computation to its logical conclusion and free users from having to implement functions that they have already specified sufficiently precisely by their type.

What needs to happen for this to become real?

A bunch of things:

  • The LJT implementation is somewhat neat, but I probably did not implement backtracking properly, and there might be more bugs.
  • The implementation is very much unoptimized.
  • For this to be practically useful, the user needs to be able to use it with confidence. In particular, the user should be able to predict what code comes out. If there are multiple possible implementations, there needs to be a clear specification of which implementations are more desirable than others, and it should probably fail if there is ambiguity.
  • It ignores any recursive type, so it cannot do anything with lists. It would be much more useful if it could do some best-effort thing here as well.

If someone wants to pick it up from here, that’d be great!

I have seen this before…

Indeed, the idea is not new.

Most famous in the Haskell world is certainly Lennart Augustsson’s Djinn tool, which creates Haskell source expressions based on types. Alejandro Serrano has connected that to GHC in the library djinn-ghc, but I couldn’t use this because it was still outputting Haskell source terms (and it is easier to re-implement LJT rather than to implement type inference).

Lennart Spitzner’s exference is a much more sophisticated tool that also takes library API functions into account.

In the Scala world, Sergei Winitzki very recently presented the pretty neat curryhoward library that uses Scala macros. He seems to have some good ideas about ordering solutions by likely desirability.

And in Idris, Joomy Korkut has created hezarfen.

by Joachim Breitner ( at February 02, 2018 07:01 PM

Ken T Takusagawa

[wisnmyni] Data as a number

Convert a number with possibly many leading zeroes in base M to base N, prefixing the base N output representation with leading zeroes in a way that unambiguously specifies the number of leading zeroes in the base M input.  I think this is possible when M > N.  Some potentially useful conversions:

(M=16, N=10); (M=256, N=10); (M=256, N=100); (M=10, N=2)
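One possible scheme for the (M=16, N=10) case, offered as an assumption rather than as the method of the linked code: pad the decimal output to the number of digits needed for the largest hexadecimal number of the same length. Because 16 > 10, each extra hexadecimal digit needs at least one more decimal digit, so the decimal length alone determines the hexadecimal length. The names `width`, `encode`, `decode` and `pad` are hypothetical.

```haskell
import Data.Char (digitToInt, isDigit, isHexDigit)
import Numeric (showHex)

-- Decimal digits needed for the largest k-digit hexadecimal number.
width :: Int -> Int
width k = length (show (16 ^ k - 1 :: Integer))

pad :: Int -> String -> String
pad w s = replicate (w - length s) '0' ++ s

-- Non-empty hexadecimal string -> zero-padded decimal string.
encode :: String -> Maybe String
encode hex
  | not (null hex) && all isHexDigit hex = Just (pad (width (length hex)) (show n))
  | otherwise = Nothing
  where n = foldl (\acc c -> acc * 16 + toInteger (digitToInt c)) 0 hex

-- Inverse of encode; output uses lowercase hexadecimal digits.
decode :: String -> Maybe String
decode dec
  | null dec || not (all isDigit dec) = Nothing
  | otherwise = case [k | k <- [1 .. length dec], width k == length dec] of
      (k:_) | n < 16 ^ k -> Just (pad k (showHex n ""))
      _ -> Nothing
  where n = read dec :: Integer
```

Under this scheme `encode "00ff"` gives `Just "00255"`: five decimal digits signal a four-digit hexadecimal input, so decoding restores both leading zeroes.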

The deeper idea is, whenever a string of characters represents raw data and not a word in a natural language, it should be encoded so that it is clear that the string is not meant to be interpreted as a word in natural language.  Unannotated hexadecimal fails this rule: if you see the string "deadbeef", is it a word, with a meaning perhaps related to food, or is it a number?  (Annotated hexadecimal, e.g., "0xdeadbeef" is clearly not a word.)  Simpler example: what does "a" mean, indefinite article or ten in hexadecimal?  English, and other orthographies which use Hindu-Arabic numerals, already have a character set which unambiguously states that a string encodes data and not words: the numerals.  (One could argue that numbers -- strings of Hindu-Arabic numerals -- have more meaning than strictly data: they have ordinality, they obey group and field axioms, etc.  However, we frequently see numbers which aren't that, e.g., serial numbers, phone numbers, ZIP codes, ID and credit card numbers.)

The inspiration was, expressing hashes in hexadecimal is silly.  Radix conversion is quick and easy for a computer; it should be in decimal with leading zeroes if necessary.  If making the hash compact is a goal, then base 26 or base 95, with some annotation signifying it is encoded data, is better than hexadecimal.

Some Haskell code demonstrating some of the conversions.

by Ken ( at February 02, 2018 06:58 AM

February 01, 2018

Douglas M. Auclair (geophf)

January 2018 1HaskellADay Problems and Solutions

by geophf ( at February 01, 2018 03:21 AM

Sandy Maguire

Devlog: Action Menus, Timers and Hit Detection


February 1, 2018 devlog, neptune

The other day, I found myself working on the interaction subsystem of my game engine. I want the game to play like Monkey Island 3, which means you can click on the ground to walk there. You can also click and hold on an interactive piece of scenery in order to have a context-sensitive menu pop-up, from which you can choose how to interact with the object in question. If you’re not familiar with the genre, watching a few minutes of the video linked above should give you some idea of what I’m trying to build.

An adventure game in which you’re unable to interact with anything isn’t much of a game, and that’s where we left the engine. So it seemed like a thing to focus on next.

I knew that the click/hold interaction I wanted formed some sort of DFA, so I unwisely headed down that garden path for a bit. After implementing a bit, I ended up with a state machine with the denotation type DFA s e a = s -> e -> Either s a, where s is the state of the machine, e is the type of an edge transition, and a is the eventual output of the machine. Once it was finished, however, it became clear that I had fallen into an abstraction hole. I spent a bunch of time figuring out the implementation of this thing, and then afterwards realized it didn’t actually solve my problem. Whoops. Amateur Haskell mistake :)
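To make that denotation concrete, here is a small sketch of how such a machine could be driven. Only the `DFA` type synonym comes from the text; `MouseState`, `MouseEvent`, `Outcome`, `clickHold` and `runDFA` are hypothetical names invented for the illustration, not code from the engine.

```haskell
type DFA s e a = s -> e -> Either s a

-- Hypothetical states, events and outputs for a click/hold interaction.
data MouseState = Idle | Pressed deriving (Eq, Show)
data MouseEvent = Down | Up | TimerExpired deriving (Eq, Show)
data Outcome    = Click | OpenMenu deriving (Eq, Show)

-- Left means "keep going in this new state", Right means "finished".
clickHold :: DFA MouseState MouseEvent Outcome
clickHold Idle    Down         = Left Pressed
clickHold Idle    _            = Left Idle
clickHold Pressed Up           = Right Click
clickHold Pressed TimerExpired = Right OpenMenu
clickHold Pressed Down         = Left Pressed

-- Feed events until the machine produces an output or the events run out.
runDFA :: DFA s e a -> s -> [e] -> Either s a
runDFA _    s []     = Left s
runDFA step s (e:es) = either (\s' -> runDFA step s' es) Right (step s e)
```

Note that the step function is pure: entering `Pressed` has no way to start the timer that would later deliver `TimerExpired`.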

The problem is that transitioning into some state might need to make a monadic action in order to generate the next edge. For example, when you press down on the mouse button, we need to start a timer which will open the action menu when it expires. This could be alleviated by changing Either to These and letting a ~ (Monad m => m b), but that struck me as a pretty ugly hack, and getting the implementation of the denotation to work again was yucky.

So I decided that instead maybe I should write a dumb version of what I wanted, and find out how to abstract it later if I should need similar machinery again in the future. I burned my DFA implementation in a fire.

This posed a problem, though, because if I wanted to write this for real I was going to need things to actually interact with, and I didn’t yet have those. I decided to put the interaction sprint on hold, in order to focus more on having things with which to interact.

One abstraction I think in terms of when working with adventure games is that of the hotspot. A hotspot is a mask on the background image which indicates a static piece of interesting geometry. For example, a window that never moves would be baked into the background image of the room, and then a hotspot would be masked on top of it to allow the character to interact with it.

For example, if our room looks like this (thanks to MI2 for the temporary art):

room background

Then our mask image would look like this:

room mask

We can add some logic to be able to read the mask:

mkHotspot
    :: Image PixelRGBA8
    -> (Word8 -> Bool)
    -> Hotspot
    -> Pos
    -> Maybe Hotspot
mkHotspot img f h = bool Nothing (Just h)
                  . f
                  . getHotspotByte
                  . uncurry (pixelAt img)
                  . (\(V2 x y) -> (x, y))
                  . clampToWorld
                  . fmap round
  where
    clampToWorld = clamp (V2 0 0) $ imageSize img
    getHotspotByte (PixelRGBA8 _ g _ _) = g

and now bake the first three parameters of this function when we construct our level definition.

In order to test these things, I added a field _hsName :: Hotspot -> String so that I could check whether my logic worked. The next step was to bind the click event to be able to call the Pos -> Maybe Hotspot that I curried out of mkHotspot and stuck into my Room datastructure (_hotspots :: Room -> Pos -> Maybe Hotspot).

I clicked around a bunch, and found that print . fmap _hsName $ _hotspots currentRoom mousePos lined up with the door when I clicked on it. It seemed to be working, so I considered my first yak shave successful: I now had something in the world that I could interact with.

The next step was to code up a little bit of the DFA I was originally working on. I decided that I should make the avatar walk to the place you clicked if it wasn’t a hotspot.

case event of
  MouseButton Down ->
    case _hotspots currentRoom mousePos of
      Just hs ->
        print $ _hsName hs

      Nothing ->
        when (isWalkable (_navmesh currentRoom) mousePos) $
          emap $ do
            with isAvatar
            pure defEntity'
              { pathing = Set $ NavTo mousePos
              }

So: when the mouse is pressed, see if it was over top of a hotspot. If so, print out the name of it. Otherwise, check the navmesh of the room, and see if that’s a valid place to walk. If so, update any entity who has the isAvatar component and set its pathing component to be the location we want.

The engine at this point already has navigation primitives, which is why this works. We’ll discuss how the navmesh is generated and used in another devlog post.

I ran this code and played around with it for a while. Everything looked good – after I remembered to set isAvatar on my player entity :)

The next step was to implement timers that would have a callback, and could be started and stopped. I’d need support for these in order to wait a little bit before opening up the action menu. Thankfully, timers are super easy: just have an amount of time you decrement every frame until it hits zero, and then do the necessary action. I came up with this model for timers:

data Timer = Timer
  { _tTime     :: Time
  , _tCallback :: Game ()
  }

data TimerType
  = TimerCoin
  deriving (Eq, Ord)

data GlobalState = GlobalState
  { ... -- other stuff
  , _timers :: Map TimerType Timer
  }

A Timer is just an amount of remaining time and something to do afterwards. It’s stored in the GlobalState with a TimerType key. I originally thought about using a bigger type (such as Int) as my timer key, but realized that would make canceling specific timers harder as it would imply they’re given a non-deterministic key when started. The interface for starting and canceling timers turned out to be trivial:

startTimer :: TimerType -> Time -> Game () -> Game ()
startTimer tt t cb =
  setGlobals $ timers . at tt ?~ Timer t cb

cancelTimer :: TimerType -> Game ()
cancelTimer tt =
  setGlobals $ timers . at tt .~ Nothing

The only thing left is to update timers and run their callbacks when it’s time. I fucked around with this implementation too hard, trying to find a completely lensy way of doing it, but eventually settled on this ugly fromList . toList thing:

updateTimers :: Time -> Game ()
updateTimers dt = do
  ts  <- getGlobals $ view timers
  ts' <- forOf traverse ts $ \t ->
           if _tTime t - dt <= 0
             then _tCallback t $> Nothing
             else pure . Just
                       $ t & tTime -~ dt

  setGlobals $
    timers .~ M.fromList (catMaybes . fmap sequence $ M.toList ts')

ts' is the result of traversing the Map of timers: the traversal decrements each timer, runs the callback of any timer that has expired, and returns a Maybe Timer for each one. The last line is where the interesting bit is – sequence over a (TimerType, Maybe Timer) is a Maybe (TimerType, Timer), which we can then insert back into our Map as we construct it – essentially filtering out any timers which have expired.
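
To make the sequence trick concrete, here is a small pure sketch. It assumes timers are reduced to their remaining time (callbacks are left out so no Game monad is needed); Timers, tick, stepTimers and TimerOther are hypothetical names. It also shows Data.Map's mapMaybe, which does the same filtering without the round trip through a list.

```haskell
import qualified Data.Map as M
import Data.Maybe (catMaybes)

-- TimerOther is a made-up second key, only here to make the example visible.
data TimerType = TimerCoin | TimerOther
  deriving (Eq, Ord, Show)

-- Assumption: a timer is just its remaining time; callbacks are omitted.
type Timers = M.Map TimerType Double

-- Decrement one timer, returning Nothing once it has expired.
tick :: Double -> Double -> Maybe Double
tick dt t
  | t - dt <= 0 = Nothing
  | otherwise   = Just (t - dt)

-- The fromList . toList round trip: sequence turns each
-- (key, Maybe value) pair into a Maybe (key, value), and
-- catMaybes drops the expired ones.
stepTimers :: Double -> Timers -> Timers
stepTimers dt ts =
  M.fromList (catMaybes . fmap sequence $ M.toList (fmap (tick dt) ts))

-- The same filtering in a single pass over the map.
stepTimers' :: Double -> Timers -> Timers
stepTimers' dt = M.mapMaybe (tick dt)
```

The mapMaybe version only works in this pure setting; once the per-timer step has to run an effectful callback, something like the traversal in the post is still needed.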

Finally we can get back to our DFA. Instead of printing out the name of the hotspot you clicked on, we can now start a timer that will update our game state. I added a field to GlobalState:

data GlobalState = GlobalState
  { ... -- other stuff
  , _gInputDFA :: InputDFA
  }

data InputDFA
  = IStart
  | IBeforeCoin
  | ICoinOpen Pos Hotspot
  deriving (Eq, Ord)

The idea is that we start in state IStart, transition into IBeforeCoin when we start the timer, and into ICoinOpen when the timer expires. Additionally, if the user releases the mouse button, we want to cancel the timer. All of this becomes:

case (_gInputDFA globalState, event) of
  (IStart, MouseButton Down) ->
    case _hotspots currentRoom mousePos of
      Just hs -> do
        startTimer TimerCoin 0.5 $ do
          setGlobals $ gInputDFA .~ ICoinOpen mousePos hs
        setGlobals $ gInputDFA .~ IBeforeCoin

      Nothing ->
        -- as before

  (IBeforeCoin, MouseButton Up) -> do
    cancelTimer TimerCoin
    setGlobals $ gInputDFA .~ IStart

  (ICoinOpen p hs, MouseButton Up) -> do
    let verb = getBBSurface (coinSurface p) mousePos
    for_ verb $ doInteraction hs
    setGlobals $ gInputDFA .~ IStart

If you care, try to trace through these cases and convince yourself that this logic is correct. The reason we have a position stored inside the ICoinOpen is so that we know where the mouse was when the user started holding their mouse down. This corresponds to where we should draw the action menu.
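
One way to trace the cases is to write the machine as a pure transition function. The sketch below is a simplification under stated assumptions: the timer callback is modelled as an explicit TimerFired event, the pressed position is carried inside IBeforeCoin (the real code captures it in the callback's closure), and the Hotspot payload is dropped. Event, Button and step are hypothetical names.

```haskell
data Button = Down | Up
  deriving (Eq, Show)

-- TimerFired stands in for the timer callback running.
data Event = Mouse Button | TimerFired
  deriving (Eq, Show)

type Pos = (Int, Int)

data InputDFA
  = IStart
  | IBeforeCoin Pos   -- position remembered until the timer fires
  | ICoinOpen Pos
  deriving (Eq, Show)

-- One transition, given whether the mouse is over a hotspot and where it is.
step :: Bool -> Pos -> InputDFA -> Event -> InputDFA
step overHotspot mousePos st ev = case (st, ev) of
  (IStart, Mouse Down) | overHotspot -> IBeforeCoin mousePos
  (IBeforeCoin _, Mouse Up)          -> IStart      -- released too early
  (IBeforeCoin p, TimerFired)        -> ICoinOpen p -- held long enough
  (ICoinOpen _, Mouse Up)            -> IStart      -- a verb gets chosen here
  _                                  -> st          -- everything else is ignored
```

Folding step over a list of events, for example foldl (step True (3, 4)) IStart [Mouse Down, TimerFired], gives a quick way to check a sequence of inputs by hand or in GHCi.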

This is done in the drawing routine by checking the current state of _gInputDFA – if it’s ICoinOpen it means the menu is up and we need to draw it.

The last thing is how to map where you release your mouse button on the menu to the interaction we should do. Our action menu looks like this:

the action menu

From left to right, these squares represent talking/eating, examining, and manipulating. We need some way of mapping a location on this image to a desired outcome.

Doing rectangle collision is easy enough – we define a bounding box and a test to see if a point is inside of it (as well as some auxiliary functions for constructing and moving BBs, elided here):

data BB = BB
  { leftX   :: Float
  , rightX  :: Float
  , topY    :: Float
  , bottomY :: Float
  } deriving (Eq, Ord, Show)

inBB :: BB -> Pos -> Bool
inBB BB{..} (V2 x y) = and
  [ x >= leftX
  , x <  rightX
  , y >= topY
  , y <  bottomY
  ]

rectBB :: Float -> Float -> BB
moveBB :: Pos -> BB -> BB

The final step is to somehow map these bounding boxes to things we want to return. This seems like it’ll be a recurring theme, so we build some machinery for it:

data BBSurface a = BBSurface [(BB, a)]
  deriving (Eq, Ord, Show)

getBBSurface :: BBSurface a -> Pos -> Maybe a
getBBSurface (BBSurface bs) p =
  getFirst . flip foldMap bs $ \(b, a) ->
    if inBB b p
       then First $ Just a
       else First $ Nothing

The abstraction is my amazingly-named BBSurface, which is a mapping of BBs to values of some type a. We can find a Maybe a on the BBSurface by just checking if the point is in any of the bounding boxes. If it is, we return the first value we find.

All that’s left is to construct one of these BBSurfaces for the coin, and then to move it to the position indicated inside the ICoinOpen. Easy as pie. Pulling everything together, and our interactive menu works as expected. Great success!
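
For the curious, here is a self-contained sketch of what that last step could look like. The bodies of rectBB and moveBB are guesses at the elided helpers (a box anchored at the origin, and a translation), and the Verb type, the 16-pixel squares and the centring are all made up for illustration; Pos is a plain tuple here rather than a V2.

```haskell
import Data.Monoid (First (..))

type Pos = (Float, Float)

data BB = BB
  { leftX   :: Float
  , rightX  :: Float
  , topY    :: Float
  , bottomY :: Float
  } deriving (Eq, Ord, Show)

inBB :: BB -> Pos -> Bool
inBB (BB l r t b) (x, y) = x >= l && x < r && y >= t && y < b

-- Assumed meaning: a width-by-height box with its top-left corner at the origin.
rectBB :: Float -> Float -> BB
rectBB w h = BB 0 w 0 h

-- Assumed meaning: translate a box by an offset.
moveBB :: Pos -> BB -> BB
moveBB (dx, dy) (BB l r t b) = BB (l + dx) (r + dx) (t + dy) (b + dy)

data BBSurface a = BBSurface [(BB, a)]

-- Return the value of the first box containing the point, if any.
getBBSurface :: BBSurface a -> Pos -> Maybe a
getBBSurface (BBSurface bs) p =
  getFirst $ foldMap (\(b, a) -> First (if inBB b p then Just a else Nothing)) bs

-- Hypothetical verbs for the three squares of the menu.
data Verb = Talk | Examine | Manipulate
  deriving (Eq, Show)

-- Three squares side by side, centred on the point where the press began.
coinSurface :: Pos -> BBSurface Verb
coinSurface (cx, cy) = BBSurface
  [ (place 0, Talk), (place 1, Examine), (place 2, Manipulate) ]
  where
    size    = 16
    place i = moveBB (cx - 1.5 * size + i * size, cy - size / 2) (rectBB size size)
```

With these numbers, releasing the button exactly where it was pressed lands in the middle square, and releasing far away from the menu yields Nothing, so no interaction runs.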

Next time we’ll talk about navigation. Thanks for reading!


February 01, 2018 12:00 AM

January 31, 2018

FP Complete

How to Implement Containers to Streamline Your DevOps Workflow

What are Docker Containers?

Docker containers are a form of "lightweight" virtualization. They allow a process or process group to run in an environment with its own file system, somewhat like chroot jails, and also with its own process table, users and groups and, optionally, virtual network and resource limits. For most purposes, the processes in a container think they have an entire OS to themselves and do not have access to anything outside the container (unless explicitly granted). This lets you precisely control the environment in which your processes run, allows multiple processes on the same (virtual) machine that have completely different (even conflicting) requirements, and significantly increases isolation and container security.

by Emanuel Borsboom at January 31, 2018 04:00 PM

Hash Based Package Downloads - part 2 of 2

In our previous post, we defined a common problem around reproducible build plans. The solution we desired was some form of cryptographic hash based configuration and download system for packages, package metadata, and snapshot definitions. This blog post will describe a potential concrete implementation.

by Michael Snoyman at January 31, 2018 02:00 PM

January 28, 2018

Gabriel Gonzalez

Dhall Survey Results (2017-2018)

I advertised a survey in my prior post to collect feedback on the Dhall configuration language. You can find the raw results here:

18 people completed the survey at the time of this writing and this post discusses their feedback.

This post assumes that you are familiar with the Dhall configuration language. If you are not familiar, then you might want to check out the project website to learn more:

"Which option describes how often you use Dhall"

  • "Never used it" - 7 (38.9%)
  • "Briefly tried it out" - 7 (38.9%)
  • "Use it for my personal projects" - 1 (5.6%)
  • "Use it at work" - 3 (16.7%)

The survey size was small and the selection process is biased towards people who follow me on Twitter or read my blog and are interested enough in Dhall to answer the survey. So I don't read too much into the percentages but I am interested in the absolute count of people who benefit from using Dhall (i.e. the last two categories).

That said, I was pretty happy that four people who took the survey benefit from the use of Dhall. These users provided great insight into how Dhall currently provides value in the wild.

I was also surprised by how many people took the survey that had never used Dhall. These people still gave valuable feedback on their first impressions of the language and use cases that still need to be addressed.

"Would anything encourage you to use Dhall more often?"

This was a freeform question designed to guide future improvements and to assess if there were technical issues holding people back from adopting the language.

Since there were only 18 responses, I can directly respond to most of them:

Mainly, a formal spec (also an ATS parsing implementation)

Backends for more languages would be great.

More compilation outputs, e.g. Java

The main thing is more time on my part (adoption as JSON-preprocessor and via Haskell high on my TODO), but now that you ask, a Python binding

Stable spec, ...

Julia bindings

Scala integration, ...

I totally agree with this feedback! Standardizing the language and providing more backends are the highest priorities for this year.

No language was mentioned more than once, but I will most likely target Java and Scala first for new backends.

Some way to find functions would be immensely helpful. The way to currently find functions is by word of mouth.

This is a great idea, so I created an issue on the issue tracker to record this suggestion:

I probably won't have time myself to implement this (at least over the next year), but if somebody is looking for a way to contribute to the Dhall ecosystem this would be an excellent way to do so.

I would use Dhall more often if it had better type synonym support.

...; ability to describe the whole config in one file

Fortunately, I recently standardized and implemented type synonyms in the latest release of the language. For example, this is legal now:

    let Age = Natural

in let Name = Text

in let Person = { age : Age, name : Name }

in let John : Person = { age = +23, name = "John" }

in let Mary : Person = { age = +30, name = "Mary" }

in [ John, Mary ]

I assume the feedback on "ability to describe the whole config in one file" is referring to a common work-around of importing repeated types from another file. The new type synonym support means you should now be able to describe the whole config in one file without any issues.

Verbosity for writing values in sum types was a no-go for our use case

...; more concise sum types.

The most recent release added support for the constructors keyword to simplify sum types. Here is an example of this new keyword in action:

    let Project =
          constructors
          < GitHub : { repository : Text, revision : Text }
          | Hackage : { package : Text, version : Text }
          | Local : { relativePath : Text }
          >

in  [ Project.GitHub
      { repository = ""
      , revision = "ae5edf227b515b34c1cb6c89d9c58ea0eece12d5"
      }
    , Project.Local { relativePath = "~/proj/optparse-applicative" }
    , Project.Local { relativePath = "~/proj/discrimination" }
    , Project.Hackage { package = "lens", version = "4.15.4" }
    , Project.GitHub
      { repository = ""
      , revision = "ccbfabedea1cf5b38ff19f37549feaf01225e537"
      }
    , Project.Local { relativePath = "~/proj/servant-swagger" }
    , Project.Hackage { package = "aeson", version = "" }
    ]

Missing a linter usable in editors for in-buffer error highlighting.

..., editor support, ...

Dhall does provide a Haskell API including support for structured error messages. These store source spans which can be used for highlighting errors, so this should be straightforward to implement. These source spans power Dhall's error messages, such as:

$ dhall <<< 'let x = 1 in Natural/even x'

Use "dhall --explain" for detailed errors

Error: Wrong type of function argument

Natural/even x


... the source span is how Dhall knows to highlight the Natural/even x subexpression at the bottom of the error message and that information can be programmatically retrieved from the Haskell API.

Sometimes have to work around record or Turing completeness limitations to generate desired json or yaml: often need record keys that are referenced from elsewhere as strings.

I will probably never allow Dhall to be Turing complete. Similarly, I will probably never permit treating text as anything other than an opaque value.

Usually the solution here is to upstream the fix (i.e. don't rely on weakly typed inputs). For example, instead of retrieving the field's name as text, retrieve a Dhall function to access the desired field, since functions are first-class values in Dhall.

If you have questions about how to integrate Dhall into your workflow, the current best place to ask is the issue tracker for the language. I'm not sure if this is the best place to host such questions and I'm open to suggestions for other forums for support.

Would like to be able to compute the sha1 or similar of a file or expression.

The Dhall package does provide a dhall-hash utility exactly for this purpose, which computes the semantic hash of any file or expression. For example:

$ dhall-hash <<< 'λ(x : Integer) → [1, x, 3]'

Better syntax for optional values. It is abysymmal now. But perhaps I'm using dhall wrong. My colleague suggests merging into default values.

Dropping the lets

simpler syntax; ...

There were several complaints about the syntax. The best place to discuss and/or propose syntax improvements is the issue tracker for the language standard.

For example:

Better error messages; ...

The issue tracker for the Haskell implementation is the appropriate place to request improvements to error messages. Here are some recent examples of reported issues in error messages:

I worry that contributors/friends who'd like to work on a project with me would be intimidated because they don't know the lambda calculus

Dhall will always support anonymous functions, so I doubt the language will be able to serve those who shy away from lambda calculus. This is probably where something like JSON would be a more appropriate choice.

..., type inference

..., full type inference, row polymorphism for records

This is probably referring to Haskell/PureScript/Elm-style inference where the types of anonymous functions can be inferred (i.e. using unification). I won't rule out supporting type inference in the future but I doubt it will happen this year. However, I welcome any formal proposal to add type inference to the language now that the type-checking semantics have been standardized.

It's cool, but I'm yet to be convinced it will make my life easier

No problem! Dhall does not have a mission to "take over the world" or displace other configuration file formats.

Dhall's only mission is to serve people who want programmable configuration files without sacrificing safety or ease of maintenance.

"What do you use Dhall for?"

"Why do you use Dhall?"

This was the part of the survey that I looked forward to the most: seeing how people benefit from Dhall. These were two freeform questions that I'm combining in one section:

WHAT: HTML templating, providing better manifest files (elm-package.json)

WHY: Dhall as a template language allows us to express all of the inputs to a template in one place. We don't have to hunt-and-peck to find out what a template is parameterized by. And if we want to understand what some variable is in the template, we can just look at the top of the file where its type is declared. We want exact dependencies for our elm code. There is no specific syntax for exact versions in elm-package.json. There is a hack around that limitation which requires enforcement by convention (or CI scripts or some such). We use Dhall to generate exact versions by construction. The JSON that's generated uses the hack around the limitation, but we don't have to think about that accidental complexity.

WHAT: Ad hoc command-line tools

WHAT: Generating docker swarm and other configs from common templates with specific variables inserted safely.

WHY: JSON and yaml suck, want to know about errors at compile time.

WHAT: Kubernetes

WHY: Schema validation. Kubernetes is known for blowing up in your face at runtime by doing evaluation and parsing at the same time

WHAT: Config

WHY: Strong typing & schema

WHY: Rich functinality and concise syntax

WHAT: JSON preprocessor

WHY: JSON is universal but a bit dumb

WHAT: configuration - use haskell integration and spit out json/yaml on some projects (to kubernetes configs for instance)

WHY: I love pure fp and dealing with configs wastes a lot of my time and is often error prone

Interestingly, ops-related applications like Docker Swarm and Kubernetes were mentioned in three separate responses. I've also received feedback (outside the survey) from people interested in using Dhall to generate Terraform and Cloudformation templates, too.

This makes sense because configuration files for operational management tend to be large and repetitive, which benefits from Dhall's support for functions and let expressions to reduce repetition.

Also, reliability matters for operations engineers, so being able to validate the correctness of a configuration file ahead of time using a strong type system can help reduce outages. As Dan Luu notes:

Configuration bugs, not code bugs, are the most common cause I’ve seen of really bad outages. When I looked at publicly available postmortems, searching for “global outage postmortem” returned about 50% outages caused by configuration changes. Publicly available postmortems aren’t a representative sample of all outages, but a random sampling of postmortem databases also reveals that config changes are responsible for a disproportionate fraction of extremely bad outages. As with error handling, I’m often told that it’s obvious that config changes are scary, but it’s not so obvious that most companies test and stage config changes like they do code changes.

Also, type-checking a Dhall configuration file is a faster and more convenient way to validate correctness ahead of time than an integration test.

Functions and types were the same two justifications for my company (Awake Security) using Dhall internally. We had large and repetitive configurations for managing a cluster of appliances that we wanted to simplify and validate ahead of time to reduce failures in the field.

Another thing that stood out was how many users rely on Dhall's JSON integration. I already suspected this but this survey solidified that.

I also learned that one user was using Dhall not just to reduce repetition, but to enforce internal policy (i.e. a version pinning convention for elm-package.json files). This was a use case I hadn't thought of marketing before.

Other feedback

We tried it as an option at work for defining some APIs but describing values in sum types was way too verbose (though I think I understand why it might be needed). We really like the promise of a strongly typed config language but it was pretty much unusable for us without building other tooling to potentially improve ergonomics.

As mentioned above, this is fixed in the latest release with the addition of the constructors keyword.

Last time I checked, I found syntax a bit awkward. Hey, Wadler's law :) Anyway, great job! I'm not using Dhall, but I try to follow its development, because it's different from everything else I've seen.

Syntax concerns strike again! This makes me suspect that this will be a recurring complaint (similar to Haskell and Rust).

I also appreciate that people are following the language and hopefully getting ideas for their own languages and tools. The one thing I'd really like more tools to steal is Dhall's import system.

You rock. Thanks for all your work for the community.

Dhall is awesome!


I love the project and am grateful for all the work that's been done to make it accessible and well documented. Keep up the good work!

Thank you all for the support, and don't forget to thank other people who have contributed to the project by reporting issues and implementing new features.


by Gabriel Gonzalez at January 28, 2018 05:05 PM

Sandy Maguire

Why Take Ecstasy

<article> <header>


January 28, 2018 haskell, dsl, design, type trickery

/u/Ahri asked on reddit about yesterday’s post,

Perhaps you could explain a little bit about your choice to write ecstasy rather than to use apecs? I’ve not used apecs, I’m just interested as I had done some limited research into writing games in Haskell and apecs seemed to have traction.

That seems like a really good idea, and combined with the fact that I really haven’t published anything about ecstasy suggested I actually write about it!

What is an ECS?

So before diving in, let’s take a look at the problem an entity-component-system (ECS) solves. Let’s say we’re writing a simple 2D platformer: we’ll have dudes who can run around and jump on platforms.

The way I’d go about writing this before knowing about ECS would be to implement one feature at a time, generally using the player character to test it as I went. I’d write functions that look something like this:

moveActor :: Controller -> Actor -> Actor
moveActor ctrl actor =
  actor & actorPos +~ movingDirection ctrl

and then provide some types to hold all of the world together:

data Universe = Universe
  { _uPlayer       :: Actor
  , _uPlayerCtrl   :: Controller
  , _uCurrentLevel :: Level
  }

data Level = Level
  { _lActors :: [Actor]
  }

and finally write some glue code to lift moveActor over the universe.

updateUniverse :: Universe -> Universe
updateUniverse u@Universe{..} =
  u & uPlayer %~ moveActor _uPlayerCtrl
    & uCurrentLevel . lActors . traverse %~ moveActor someCtrl

On the surface this feels good. We’ve reused the code for moveActor for both the player and any other dudes on the level who might want to walk around. It feels like we can build up from here, and compose pieces as we go.

Which is true if you’re really patient, good at refactoring, or have spent a lot of time building things like this and know where you’re going to run afoul. Because you’re always going to run afoul in software.

The problem with our first attempt at this code is that it codifies a lot of implicit assumptions about our game. For example, did you notice that it implies we’ll always have an Actor for the player? It seems like a reasonable assumption, but what if you want to play a cut-scene? Or how about if you don’t want to always have control over the player? Maybe you’ve just been hit by something big that should exert some acceleration on you, and you don’t want to be able to just press the opposite direction on the control stick to negate it.

All of a sudden, as you try to code for these things, your simple moveActor function takes more and more parameters about the context of the circumstances in which it’s running. And what’s worse is that often the rules of how these behaviors should play out will change depending on whether it’s the player or some enemy in the level. We’re left with a conundrum – should we build ad-hoc infrastructure around the callers of moveActor or should we put all of the logic inside of it?

As you can imagine, it pretty quickly becomes a mess.

In one of the few times I’ll praise object-oriented programming, I have to say that its inheritance-based polymorphism lends itself well to this problem. You can build more complicated and specific behaviors out of your ancestors’ behaviors. Unfortunately, this approach bucks the OOP best-practice of “composition over inheritance.”

ECS takes what I consider to be the functional-programming-equivalent of this OOP strategy. Its fundamental stake in the ground is that rather than representing your universe of game objects as an array-of-structs, you instead represent it as a struct-of-arrays. Conceptually, this is a cognitive shift that means instead of looking like this:

data GameObject = GameObject
  { position :: V2
  , velocity :: V2
  , graphics :: Picture
  , buffs    :: [Buff]
  , notAffectedByGravity :: Bool
  }

type Universe = [GameObject]

you instead model the domain like this:

data Universe = Universe
  { position :: Array V2
  , velocity :: Array V2
  , graphics :: Array Picture
  , buffs    :: Array [Buff]
  , notAffectedByGravity :: Array Bool
  }

This has some profound repercussions. First of all, notice that we have no guarantee that our Arrays are the same length, which implies that not every GameObject need have all of its possible components.

All of a sudden, we can pick and choose which components an entity has. Entities, instead of being explicitly modeled by a GameObject, are now implicitly defined by an Int corresponding to their index in all of the arrays.

From here, we can now write specific, global behaviors that should manipulate the components of an entity. We can avoid a lot of our previous ad-hoc machinery by essentially running a map that performs pattern matching on only the components we want to care about. For example, we can say that we only want to draw entities who have both a position and a graphics. We want to apply gravity to all entities that have a velocity, but don’t have the notAffectedByGravity flag.
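
Here is a minimal sketch of those two behaviors. It swaps the conceptual Arrays for IntMaps keyed by entity id (so a missing key means a missing component), uses a String in place of Picture, and the names drawables and applyGravity are made up for illustration.

```haskell
import qualified Data.IntMap.Strict as IM

type V2 = (Double, Double)

-- One sparse store per component, keyed by entity id,
-- so an entity may lack any of them.
data Universe = Universe
  { position             :: IM.IntMap V2
  , velocity             :: IM.IntMap V2
  , graphics             :: IM.IntMap String  -- stand-in for Picture
  , notAffectedByGravity :: IM.IntMap ()
  } deriving (Eq, Show)

-- Only entities with both a position and a graphics get drawn.
drawables :: Universe -> [(V2, String)]
drawables u = IM.elems (IM.intersectionWith (,) (position u) (graphics u))

-- Gravity applies to entities that have a velocity but not the flag.
applyGravity :: Double -> Universe -> Universe
applyGravity g u = u { velocity = IM.union fallen (velocity u) }
  where
    affected = velocity u `IM.difference` notAffectedByGravity u
    fallen   = IM.map (\(vx, vy) -> (vx, vy + g)) affected
```

The "pattern matching on components" becomes ordinary set operations on keys: intersection for "has both", difference for "has this but not that".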


EDIT 2018-01-30: The author of apecs has replied to this post. It’s worth reading through, as it gives a useful perspective from the other side.

With an understanding of what ECS brings to the table, we’re now ready to take a look at different ways of implementing such a system. We first turn our attention to apecs.

If we wanted to model our above GameObject via apecs, it might look something like this:

newtype Position = Position (V2 Double)
instance Component Position where
  type Storage Position = Map Position

newtype Velocity = Velocity (V2 Double)
instance Component Velocity where
  type Storage Velocity = Map Velocity

newtype Graphics = Graphics Picture
instance Component Graphics where
  type Storage Graphics = Map Graphics

newtype Buffs = Buffs [Buff]
instance Component Buffs where
  type Storage Buffs = Map Buffs

data NotAffectedByGravity = NotAffectedByGravity
instance Flag NotAffectedByGravity where
  flag = NotAffectedByGravity
instance Component NotAffectedByGravity where
  type Storage NotAffectedByGravity = Set NotAffectedByGravity

makeWorld "World"
  [ ''Position
  , ''Velocity
  , ''Graphics
  , ''Buffs
  , ''NotAffectedByGravity
  ]

You’ll have to admit it’s a lot of boilerplate, which in turn would use Template Haskell to generate something similar to our conceptual Universe above:

data World = World
  { position :: Array (Maybe Position)
  , velocity :: Array (Maybe Velocity)
  , graphics :: Array (Maybe Graphics)
  , buffs    :: Array (Maybe Buffs)
  , notAffectedByGravity :: Set Int
  }

I haven’t dug too much into the internals of apecs, so this representation might not be perfect, but it’s good enough for us to get an understanding of what’s going on here.

We can now use some of apecs’ primitives to, for example, transfer our velocity over to our position:

rmap $ \(Position p, Velocity v) -> Position $ p + v

This rmap function is something I’d describe as “fucking magic.” You pass it a lambda, it inspects the type of the lambda, uses the tuple of its input to determine which components an entity must have, and then will update the components of the corresponding output tuple.
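
To demystify the idea a little, here is a toy version of type-directed component selection. This is not how apecs is implemented; it is a sketch under the assumption of a two-component world, and the class methods getStore and putStore as well as rmap2 are hypothetical names. The point is only that a typeclass lets the lambda's argument and result types pick which stores get read and written.

```haskell
import qualified Data.IntMap.Strict as IM

newtype Position = Position Double deriving (Eq, Show)
newtype Velocity = Velocity Double deriving (Eq, Show)

data World = World
  { positions  :: IM.IntMap Position
  , velocities :: IM.IntMap Velocity
  } deriving (Eq, Show)

-- Each component type knows which store it lives in.
class Component c where
  getStore :: World -> IM.IntMap c
  putStore :: IM.IntMap c -> World -> World

instance Component Position where
  getStore     = positions
  putStore s w = w { positions = s }

instance Component Velocity where
  getStore     = velocities
  putStore s w = w { velocities = s }

-- Read two components, write one. Entities missing either input are skipped,
-- because intersectionWith only keeps keys present in both stores.
rmap2 :: (Component a, Component b, Component c) => ((a, b) -> c) -> World -> World
rmap2 f w = putStore (IM.union updated (getStore w)) w
  where
    updated = IM.intersectionWith (curry f) (getStore w) (getStore w)
```

A call such as rmap2 (\(Position p, Velocity v) -> Position (p + v)) world never names a store: the two getStore calls resolve to different instances purely from the type of the lambda.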

At first, this seems like a fine abstraction, but it breaks down pretty quickly when used in anger. For example, what if you want to run a function over Position that only works if you don’t have a Velocity? Or if you want to remove a component from an entity? apecs can do it, but good luck finding the right function. Do you want cmap, cmapM, cmapM_, cimapM, cimapM_, rmap', rmap, wmap, wmap' or cmap'? After a week of working with the library, I still couldn’t come up with heads or tails for which function I needed in any circumstance. I’m sure there’s a mnemonic here somewhere, but I’m not bright enough to figure it out.

When you do eventually find the right function, doing anything other than a pure map from one component to another becomes an exercise in futility and magic pattern matching. There’s this thing called Safe you sometimes need to pattern match over, or produce, and it roughly corresponds to when you’re not guaranteed to have all of the components you asked for.

There are several other gotchas, too. For example, you can construct an entity by providing a tuple of the components you want to set. Unfortunately, due to apecs’ design, this thing must be type-safe. Which means you can’t construct one based on runtime data if you’re loading the particular components from e.g. a level editor. Well, you can, if you’re willing to play “existentialize the dictionary” and learn enough of the underlying library (and quirks of Haskell’s type inference algorithm) in order to convince the compiler what you’re doing is sound.

One final gotcha I’ll mention is that this magic tuple stuff is provided through typeclasses which are generated for the library by template haskell. Out of the box, you only get support for 5-tuples, which means you can’t easily construct entities with more components than that. Furthermore, changing the TH to generate more results in exponential growth of your compile times.

None of this is to say that apecs is bad software. It’s actually pretty brilliant in terms of its technology; I just feel as though its execution is lacking. It depends on a lot of tricks that I wouldn’t consider to be idiomatic Haskell, and its usability suffers as a consequence.


So with all of the above frustrations in mind, and a lot of time to kill in a Thai airport, I felt like I could make a better ECS. Better is obviously subjective for things like this, but I wanted to optimize it for being used by humans.

My explicit desiderata were:

  1. Keep boilerplate to a minimum.
  2. The user shouldn’t ever bump into any magic.

I think ecstasy knocks it out of the park on both of these fronts. Before diving into how it all works, let’s take a peek at how it’s used. We can define our components like so:

data EntWorld f = Entity
  { position :: Component f 'Field V2
  , velocity :: Component f 'Field V2
  , graphics :: Component f 'Field Picture
  , buffs    :: Component f 'Field [Buff]
  , notAffectedByGravity :: Component f 'Field ()
  } deriving (Generic)

type Entity = EntWorld 'FieldOf

That’s it! No template haskell, no typeclasses, no nothing. You get everything for free just out of this one deriving Generic statement. We’ll talk about how it works in just a second.

We can implement the velocity/position behavior as follows:

emap $ do
  p <- get position
  v <- get velocity
  pure defEnt'
    { position = Set $ p + v
    }

Ecstasy clearly wins on minimizing the definition-side of boilerplate, but it seems like we’ve gained some when we actually go to use these things. This is true, but what we buy for that price is flexibility. In fact, emap is powerful enough to set, unset and keep components, as well as branch on whether or not a component is actually there. Compare this to the ten functions with different signatures and semantics that you need to keep in mind when working with apecs, and it feels like more of a win than the syntax feels like a loss.
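To make the set/unset/keep idea concrete, here is a toy model of what an emap-style traversal does. This is only a sketch and not how ecstasy is implemented: the names Ent, Setter, unchanged, applyUpdate, emapModel and move are all made up for illustration, and the query is modelled as a plain function into Maybe rather than ecstasy's query monad. A query that fails (because a component is missing, or because a guard rejects the entity) leaves that entity untouched.

```haskell
module Main where

import Control.Monad (guard)
import Data.Maybe (isNothing)

-- What to do with one component when updating an entity.
data Update a = Set a | Unset | Keep

-- A toy entity: every component is optional.
data Ent = Ent
  { pos    :: Maybe Int
  , vel    :: Maybe Int
  , frozen :: Maybe ()
  } deriving (Eq, Show)

-- One Update per component (hypothetical stand-in for an entity setter).
data Setter = Setter
  { setPos    :: Update Int
  , setVel    :: Update Int
  , setFrozen :: Update ()
  }

-- A setter that changes nothing; override only the fields you care about.
unchanged :: Setter
unchanged = Setter Keep Keep Keep

applyUpdate :: Update a -> Maybe a -> Maybe a
applyUpdate (Set a) _ = Just a
applyUpdate Unset   _ = Nothing
applyUpdate Keep    m = m

-- Run a query over every entity; entities the query rejects are left alone.
emapModel :: (Ent -> Maybe Setter) -> [Ent] -> [Ent]
emapModel query = map step
  where
    step e = case query e of
      Nothing -> e
      Just s  -> Ent
        { pos    = applyUpdate (setPos s) (pos e)
        , vel    = applyUpdate (setVel s) (vel e)
        , frozen = applyUpdate (setFrozen s) (frozen e)
        }

-- Move anything that has a position and a velocity and is NOT frozen.
move :: Ent -> Maybe Setter
move e = do
  p <- pos e
  v <- vel e
  guard (isNothing (frozen e))
  pure unchanged { setPos = Set (p + v) }

main :: IO ()
main = mapM_ print $ emapModel move
  [ Ent (Just 1) (Just 2) Nothing    -- moves to position 3
  , Ent (Just 5) Nothing  Nothing    -- no velocity: untouched
  , Ent (Just 0) (Just 9) (Just ())  -- frozen: untouched
  ]
```

The point of the model is that one traversal function covers every case: the branching lives in the query, and the three-way Update says what happens to each component afterwards.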

So the question I’m sure you’re wondering is “how does any of this work?” And it’s a good question. Part of the reason I wrote this library was to get a feel for the approach and for working with GHC.Generics.

The idea comes from my colleague Travis Athougies and his mind-meltingly cool library beam. The trick is to get the library user to define one semantic type that makes sense in their domain, and then to use tricky type system extensions in order to corral it into everything you need. beam uses this approach to model database tables; ecstasy uses it to provide both a struct-of-arrays for your components, as well as just a struct corresponding to a single entity.

As you’d expect, the sorcery is inside of the Component type family. We can look at its definition:

type family Component (s :: StorageType)
                      (c :: ComponentType)
                      (a :: *) :: * where
  Component 'FieldOf  c      a = Maybe a
  Component 'SetterOf c      a = Update a

  Component 'WorldOf 'Field  a = IntMap a
  Component 'WorldOf 'Unique a = Maybe (Int, a)

This Component thing spits out different types depending on if you want a record for the entity ('FieldOf), an updater to change which components an entity has ('SetterOf), or the actual universe container to hold all of this stuff ('WorldOf). If we’re building an entity record, every component is a Maybe. If we’re describing a change to an entity, we use data Update a = Set a | Unset | Keep. If we want a place to store all of our entities, we generate an IntMap for every 'Field. There’s also support for adding components that are uniquely owned by a single entity, but we won’t get into that today.
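As a rough, self-contained sketch of the paragraph above, the following reproduces the shape of the type family and instantiates one record at all three storage types. The World record and the values anEntity, anUpdate and storage are assumptions made up for this example; the definitions of StorageType, ComponentType and Update are reconstructed from the prose rather than copied from the library.

```haskell
{-# LANGUAGE DataKinds      #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeFamilies   #-}

module Main where

import           Data.IntMap (IntMap)
import qualified Data.IntMap as IM
import           Data.Kind   (Type)

data StorageType   = FieldOf | SetterOf | WorldOf
data ComponentType = Field | Unique

data Update a = Set a | Unset | Keep

type family Component (s :: StorageType)
                      (c :: ComponentType)
                      (a :: Type) :: Type where
  Component 'FieldOf  c       a = Maybe a
  Component 'SetterOf c       a = Update a
  Component 'WorldOf  'Field  a = IntMap a
  Component 'WorldOf  'Unique a = Maybe (Int, a)

-- One user-facing definition, polymorphic in the storage type.
data World (f :: StorageType) = World
  { position :: Component f 'Field (Int, Int)
  , name     :: Component f 'Field String
  }

-- A single entity: every field is a Maybe.
anEntity :: World 'FieldOf
anEntity = World { position = Just (0, 0), name = Nothing }

-- A change request: every field is an Update.
anUpdate :: World 'SetterOf
anUpdate = World { position = Set (1, 1), name = Keep }

-- The struct-of-arrays: every field is an IntMap keyed by entity id.
storage :: World 'WorldOf
storage = World
  { position = IM.fromList [(0, (0, 0))]
  , name     = IM.empty
  }

main :: IO ()
main = do
  print (position anEntity)
  print (IM.size (position storage))
```

The same field name, position, has three different types here depending only on the index passed to World, which is the whole trick.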

The trick here is that we get the user to fill in the c :: ComponentType when they define the components, and ask them to keep the s :: StorageType polymorphic. The library then can instantiate your EntWorld f with different StorageTypes in order to pull out the necessary types for actually plumbing everything together.

We use the Generic derivation on EntWorld in order to allow ourselves to construct the underlying machinery. For example, when you’re defining an entity, you don’t want to be able to Keep the old value of its components, since it didn’t have any to begin with. We can use our Generic constraint in order to generate a function toSetter :: EntWorld 'FieldOf -> EntWorld 'SetterOf which takes an entity record and turns it into an entity update request, so that we don’t actually need special logic to construct things. The Generic constraint also helps generate default values of the EntWorld 'WorldOf and other things, so that you don’t need to write out any boilerplate at the value level in order to use these things.
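For the curious, here is one way such a toSetter could be written with GHC.Generics. This is a sketch under assumptions, not ecstasy's actual code: the GSetter class is a hypothetical helper invented for this example, the Component family is cut down to two storage types with no ComponentType argument, and World and ent are made-up names. The idea is to walk the generic representation and turn every Maybe field into the corresponding Update field.

```haskell
{-# LANGUAGE DataKinds             #-}
{-# LANGUAGE DeriveGeneric         #-}
{-# LANGUAGE FlexibleContexts      #-}
{-# LANGUAGE FlexibleInstances     #-}
{-# LANGUAGE KindSignatures        #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE TypeFamilies          #-}
{-# LANGUAGE TypeOperators         #-}

module Main where

import Data.Kind    (Type)
import GHC.Generics

data StorageType = FieldOf | SetterOf

data Update a = Set a | Unset | Keep
  deriving (Eq, Show)

-- Simplified: no ComponentType argument, no 'WorldOf.
type family Component (s :: StorageType) (a :: Type) :: Type where
  Component 'FieldOf  a = Maybe a
  Component 'SetterOf a = Update a

data World (f :: StorageType) = World
  { position :: Component f (Int, Int)
  , name     :: Component f String
  } deriving (Generic)

-- Hypothetical helper class: convert one generic representation to another.
class GSetter f g where
  gSetter :: f x -> g x

-- A single field: Just becomes Set, Nothing becomes Unset (never Keep).
instance GSetter (K1 i (Maybe a)) (K1 i' (Update a)) where
  gSetter (K1 m) = K1 (maybe Unset Set m)

-- Metadata wrappers (datatype, constructor, selector) are passed through.
instance GSetter f g => GSetter (M1 i c f) (M1 i' c' g) where
  gSetter (M1 x) = M1 (gSetter x)

-- Records with several fields are converted field by field.
instance (GSetter a c, GSetter b d) => GSetter (a :*: b) (c :*: d) where
  gSetter (a :*: b) = gSetter a :*: gSetter b

toSetter :: World 'FieldOf -> World 'SetterOf
toSetter = to . gSetter . from

ent :: World 'FieldOf
ent = World { position = Just (3, 4), name = Nothing }

main :: IO ()
main = do
  let u = toSetter ent
  print (position u)  -- Set (3,4)
  print (name u)      -- Unset
```

Nothing in toSetter mentions the fields of World, so adding a component to the record needs no change to the conversion code.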

The actual mechanics of the GHC.Generics code are outside the scope of today’s post, but you can read through the source code if you’re curious.


January 28, 2018 12:00 AM

January 27, 2018

Sandy Maguire

Devlog: Starting a Game Engine



January 27, 2018 (devlog, neptune)

I’m ravenously working my way through Austin Kleon’s excellent book Show Your Work. One of the points that most resonated with me was to, as you might anticipate, show your work. But more importantly, to share it every day. I’ve decided to take up that challenge in documenting the development of some of my bigger projects. The goal has a few facets: to show how I work and the struggles that I face while writing Haskell on a day-to-day basis; to lend my voice towards the art of game programming in Haskell; and to bolster my 2018 publishing goals.

I want to make an old school point-and-click adventure game in the style of Monkey Island or Full Throttle. I’ve wanted to make one for as long as I can remember, and I finally have a concept and some amount of script that I think would be fitting for the medium. I spent roughly two days searching for engines to run this baby on, and I didn’t have any luck whatsoever.

  • adventure - an old adventure game engine I wrote back in ’12 or so. It requires writing a lot of lua, and appears to have bitrotten since then. I couldn’t get it to compile.
  • Adventure Game Studio - the latest version of the IDE immediately segfaults when run through WINE.
  • WinterMute - has a “garbage” rating on WINE HQ.
  • Godot/Escoria - Escoria doesn’t appear to run on recent versions of Godot.
  • Visionaire - I successfully got the editor running on WINE, but it couldn’t draw anything, so I could edit everything but had no visual feedback on anything.
  • Bladecoder Adventure Engine - I fought to compile this for a while, and eventually succeeded, but got scared of it. It’s written by a single guy in a language I never want to touch, and I decided the risk factor was too high.
  • Unity Adventure Creator - looks promising, but required forking out 70 euros before you could try it. As someone who is unemployed and knows nothing about Unity, this is a pretty steep price to determine whether or not the project will work for my purposes.

So it looks like we’re SOL. The existing engines don’t seem like they’re going to cut it. Which means we’re going to need to roll our own.

Fortunately I’ve rolled a few of my own already. This wasn’t my first rodeo. There’s the previously mentioned adventure, an unnamed XNA/C# one I wrote before knowing about source control which is unfortunately lost to the sands of time, and one I most recently put together as a technical demo for a project a friend and I were going to work on. The friend pulled out, unfortunately, so the project died, but that means I have a starting point.

The engine as it existed had basic functionality for pathing around a bitmap, moving between rooms, and basic support for interacting with the environment. Unwisely, it was also a testbed for lots of type-trickery involving existentially pushing around types to manage the internal state of things in the game. It was intended that we’d do all of our game scripting directly in Haskell, and this seemed like the only approach to have that work.

So my first order of business was to tear out all of the existential stuff. I’ve learned since that you should always avoid existentializing things unless you are really really sure you know what you’re doing. It’s a soft and slow rule, but more often than not I regret existentializing things. The new plan was to script the game with a dedicated scripting language, so that Haskell never needs to know about any of the internal state.

Since writing the first draft of this game engine, I’ve published a library called ecstasy. It’s an entity-component system that allows you to describe behaviors over components of things, and then compose all of those behaviors together. The magic here is that you can write a function that only manipulates the components you need, and the library will lift it over all entities such a behavior would be relevant to. This means you can pick-and-choose different behaviors for game objects without needing to do a lot of heavy plumbing to get everything to play nicely with one another.

And so the next step was to hook up ecstasy to my existing engine. I didn’t want to alter any of the game’s behavior yet, so entities managed by ecstasy would have to exist completely parallel to the ones managed by the existing engine.

I defined my ecstasy component type with the most minimal support for drawing things on the screen.

data EntWorld f = Entity
  { pos      :: Component f 'Field Pos
  , gfx      :: Component f 'Field Picture
} deriving (Generic)

and then updated my drawing routine to find any Entity who had both a pos and a gfx and then hook it into the existing drawing stuff:

drawGame :: MyState -> IO Picture
drawGame ms@(s, _) = evalGame' ms $ do
  gfxs <- efor . const $
    (,) <$> get pos <*> get gfx

  pure . scaleToView s
       . uncurry translate (-camera)
       . pictures
       $ roomPic
       : [ drawActors actors
         , drawGfxs gfxs
         ]

There was some silly plumbing necessary to connect my old, convoluted Game monad with the System monad provided by ecstasy. That’s what this ms@(s, _) and evalGame' silliness is here; little shims that can run the two monads simultaneously and reconcile the results. It was pretty gnarly, but thankfully only a hack until I could convert enough of the game logic over to being exclusively managed by ecstasy.

I think that’s where we’ll leave the dev blog for today. I want to get us roughly caught up to the present in terms of getting from there-to-here in order to provide a better overall view of what game development in Haskell looks like. But I’m also pretty anxious to actually get some work done, rather than just describing work I have done. I expect the posts to get more technical as we get closer to being caught up, when I don’t need to depend on my memory for what changes were made.

Next time we’ll discuss ripping out most of the silly global variables that used to be in play, and talk about how an ECS better models things like “what should the camera be focused on?” and “how should characters navigate the space?”

Until then.


January 27, 2018 12:00 AM

January 25, 2018

Robert Harper

POPL 2018 Tutorial

I’ve recently returned from POPL 2018 in Los Angeles, where Carlo Angiuli and I gave a tutorial on Computational (Higher) Type Theory.  It was structured into two parts, each consisting of a presentation of the theory followed by a demonstration of its use in the RedPRL prover.  The tutorial was based on work that I have been doing over the last several years with my students,  Carlo, Evan Cavallo, Favonia, and Jon Sterling, and with my colleague Daniel Licata, supported by AFOSR MURI grant FA9550-15-1-0053.

Computational higher type theory integrates two themes in type theory:

  1. Type theory is a theory of computation that classifies programs according to their behavior, rather than their structure.  Types are themselves programs whose values stand for specifications of program equivalence.
  2. Type theory can be extended to higher dimensions that account for identifications of types and their elements.  An identification is evidence for the interchangeability of two types in all contexts by computable transformations.

The idea of computational type theory was pioneered by Per Martin-Löf in his famous paper Constructive Mathematics and Computer Programming, and developed extensively in the NuPRL type theory and proof development environment.

The idea of higher type theory arose from several developments, notably the late Vladimir Voevodsky‘s univalence principle, which identifies equivalent types.  As a first approximation equivalence may be thought of as a generalized form of isomorphism, which is respected by the constructs of type theory.  The idea of univalence is to make precise informal conventions, such as not distinguishing between isomorphic structures, a handy expedient in many situations.

Voevodsky’s formulation of univalence was as a new axiom in intensional formal type theory that populates the identity type with new elements.  This move was justified mathematically by Hofmann and Streicher’s proof that type theory does not preclude there being additional elements and by Voevodsky’s construction of a model of univalence using combinatorial structures called simplicial sets.  However, these arguments left open the computational meaning of the extended theory, which disrupted the inversion principle governing the elimination form.  What is J supposed to do with these new forms of identification?

This question sparked several attempts to give a constructive formulation of univalence by a variety of methods.  One approach is to give a model of the theory in a constructive type theory with clear computational content, which was carried out by Bickford last year using NuPRL.  Another is to develop a new semantic framework for type theory that accounts for univalence within a larger theory of identifications of types and elements.

The decisive first steps in this direction were taken by Bezem, Coquand, and Huber, and later developed by Cohen, Coquand, Huber, and Mörtberg, and by Licata and Brunerie, using cubical methods.  The main idea is that “ordinary” types and elements are points, identifications of points are lines, identifications of lines are squares, and so forth at all dimensions.  The approaches differ in the definition of a “cube” (there is more than one!), and in the conditions specifying the existence and action of cubes in the theory.  For example, a line between types ought to induce a coercion between their elements, and it ought to be possible to compose two adjacent lines to get a third line.

The approach described in this tutorial extends the NuPRL type theory to account for higher-dimensional structure.  Unlike structural type theories, a computational type theory is defined not by rules, but by semantics, specifically meaning explanations based on a prior notion of computation.  In the present case we begin with a cubical programming language based on Licata and Brunerie’s formalism, and define types as programs that evaluate to specifications of the behavior of other programs.  The resulting theory gives a computational meaning to univalence, and accounts for a higher-dimensional notion of inductive types in which one may specify generators not only for points, but also for lines, squares, and other higher-dimensional objects.  The semantics ensures that every well-typed program evaluates to a value satisfying the specification given by its type.  In particular closed programs of boolean type evaluate to either true or false; there are no “stuck” states, and no exotic elements.

This type theory is implemented in the RedPRL system, a new, open-source implementation of computational type theory in the NuPRL tradition.  It is based on a generalized form of refinement logic, the proof theory developed for NuPRL that emphasizes program extraction as a primitive notion.  It does not look very much like standard proof theories for types, because it is not designed to correspond to any extant notion of logic, but rather to be useful in practice for deriving proofs (programs).

The slides from the presentation are available on my web site, and the implementation is freely available for download and experimentation.  Enjoy!

by Robert Harper at January 25, 2018 08:23 AM

FP Complete

FP Complete and Cardano Blockchain Audit Partnership

Cardano enlists FP Complete for independent 3rd Party Audit of Cardano Blockchain

FP Complete Development specialists will provide comprehensive review of Cardano’s code and technical documentation

by Robert Bobbett at January 25, 2018 01:32 AM

January 24, 2018

Michael Snoyman


Many people in the community have seen the SLURP proposal. Some people have asked my opinion. Some others have made some... let's say colorful statements about my noninvolvement in the discussion. Let me set the record straight right now on why I've avoided the topic. The authors showed me the proposal before it was published, and I told them at that time I would not support it. I've also told them that, out of respect to them, I would hold back on commenting on SLURP. Unfortunately, that's now led to two things:

  • Some people making some very pointed implications
  • Misunderstanding about the usage of the term "fork" in the proposal, which unfortunately the authors have not rectified

To be clear: the proposal is not mine, I did not ask for this change, and I'm not "holding a gun" to anyone's head. Those descriptions aren't true. There are plenty of other statements I could comment on as well, but it's honestly not worth it.

Here's what isn't false: I regularly am in communication with many people across the Haskell community and ecosystem management teams about problems being faced. I interact with a broad group of users in my work, hear complaints, and relay them. I have my own complaints, and relay those as well. Some of these complaints have all pointed in similar directions.

My hands are tied on what I can say publicly, since so many comments are made in private emails that people object to being made public. And I know (from experience) that there are detractors out there who will straight out accuse me of lying. I've been avoiding saying anything because of this constant accusation, but I've decided to just put the info out there. I figure:

  • People who want to believe everything I do is malicious won't care if I have evidence anyway
  • People who are open to the possibility that I'm not evil will hopefully take my statements at face value

One last foreword: I used to very openly discuss my thoughts on architecture and ecosystem development. I believe it's the only real way to build an open source community. When tensions got their highest in the Stack-vs-cabal days, many people rebelled against this public broadcast methodology, and I've switched to quieter communication channels. I think this is unfortunate, and I'd much rather talk openly and loudly about ecosystem plans and let people have easy ways of input. I object strongly to the mentality of discussing everything behind closed doors. We'll see if open discussions can resume at some point.

What's the fork?

It seems clear to me now that the vast majority of discussion on SLURP has nothing to do with SLURP itself, but with its comments about forking. I really do wish that the authors had been willing to speak to that publicly if they were going to use the term fork in the document. I will speak to what I know about forking in the Stackage and Stack worlds. We'll have to leave it to the authors to speak for themselves as to whether my words here reflect what they'd intended.

The term "fork" here is definitely not being used in its most literal sense of "taking a software project, hosting the source code elsewhere, then continuing development under a different name" (my made up definition). It's referring to a more general split. Stack is called by many a fork of cabal-install, for example, even though they share no code (they share underlying libraries, like Cabal, of course).

Since everyone is most fixated on this point, let me state it clearly: I have been involved in absolutely 0 conversations where anyone wanted to host a direct competitor to Hackage. At all. No one I know wants to do this. I don't want to do this. Stackage and Stack today feed from Hackage, and no one I know wants to change that. No one I know wants to try to take over control of Hackage, for that matter.

When "fork" of Hackage is mentioned, that seems like the most logical conclusion to draw. I can guarantee that it's not the case.

Now let me address some concrete pain points that may lead to some kind of "fork."

Hackage Revisions

Many people are very outspoken about their dislike for Hackage Revisions. I dislike Hackage Revisions. I have more reason than most to dislike them: I've invested weeks to months of my life making changes to multiple tools to support revisions. I could go through the gory history of this, but it's not worth it: it would just be a programmer's war stories session. So let's turn to today.

With Stack 1.6, I finally got all of the pieces in place to fully support revision pinnings. Stackage has already had revision pinning for a long time. Stackage has the ability to list some packages as ignoring revisions.

If you ask me today, I will still say revisions are a bad idea, they should be disabled, and better solutions to the dependency resolution problem implemented (I've discussed those at length in the past). At the same time: the cost is now sunk. I still worry about the fact that users do not, in fact, pin their extra-deps to specific revisions, and that the rules for revisions on Hackage are far too lax. These are real concerns that I care about, but they are not at the top of my personal priority list.

Others, by the way, feel differently. I know many individuals who are offended at the thought of a Hackage Trustee forcibly editing their cabal files. I don't disagree with them per se, but I'm also not as passionate about this topic. In conversations with community leaders, I've made this distinction very clear (at least, I've tried to make it clear).

My biggest remaining concern about revisions is the social implication they carry. Namely: the idea that someone else is responsible for the stability of your build. I've mentioned many times that I believe a huge source of our social tension is a world where you can complain to an upstream developer because your build suddenly stopped working. That's a recipe for disaster, and is a fundamental flaw in the PVP+dependency solving world. We need tooling that focuses instead on fixed build plans. I've advocated for this for years, and ultimately created Stack largely due to inability to get traction upstream.

In sum: will revisions lead to anything of a fork? No.


A few weeks ago I tweeted:

The original design of Stackage followed a standard Linux distribution model directly. Hackage was our upstream, we maintained a set of patches to avoid massive version bound disruption, and very occasionally (if at all, I honestly don't remember) edited source files to fix bugs.

In 2014, when I discussed the plans for incorporating Stackage into cabal and the Haskell Platform (code named GPS Haskell, and which never got off the ground), the cabal, Hackage, and HP maintainers required that Stackage not maintain any local modifications. I removed that functionality, and that's the world we've been in since.

Adding that back is on the table. I'll explain why in a second. This could be considered a fork, and some may call it a soft fork. It's honestly not a feature I want to add back to Stackage, since maintaining patch sets is a lot of work. But many communities need to do it. As I understand it, Nix does it as well. So if it's a fork, it's a fork we already have widely in our ecosystem.

One "nice to have" reason for adding in this curation is to work around packages which are slow to upgrade to newer dependency versions. It can be very frustrating for Stackage package maintainers to have their packages held back because someone else won't relax an upper bound. Curation would let us work around that. I consider this a perk, but not a necessity.

But the more important reason for this is to deal with packages which are causing trouble in the Stackage or Stack world, but are not causing trouble in the cabal-install world. I didn't consider this a real concern until it happened multiple times in the past few months. You can see an example here.

I'm not demanding anything of any authors by making this statement. But here's the reality: I personally end up spending a lot of my own time dealing with these kinds of breakages. My friends and colleagues get sucked into various carry-on tasks, like cutting new emergency point releases. I do not want my life to be spent in a situation where, at a moment's notice, I'll need to dedicate large amounts of time to changing something in Stack to be compliant with something in the Cabal library which should be in a spec, but is instead undocumented.

Hackage already takes great pains to ensure it does not break cabal-install. Many people have probably heard about how the ^>= operator's introduction broke Stack 1.5. What many people didn't hear about is that it also broke cabal-install 1.24. You didn't hear about it, because Hackage implemented a workaround to hide those files from older cabal-install versions. This curation idea is to provide a way for Stackage to work around breakage for Stack, the same way Hackage will work around damage for cabal-install.

And yes: I requested that the same kind of treatment be given to Stack from Hackage. That was met with calls of asking for preferential treatment. Readers can determine what they feel.

In sum: I'm working towards allowing Stackage to apply patches to upstream packages. I don't consider this a fork, but rather curation. Others may choose to label it a fork.

Avoid uploading to Hackage

I'll start with this: my personal preference is to continue uploading all of my packages to Hackage. I have no intention nor desire to stop uploading conduit, yesod, or any of the other 80+ packages I actively maintain to Hackage. That said, not everyone feels the same way.

Today, Stackage is strictly downstream of Hackage. You cannot get a package into Stackage unless it is first uploaded to Hackage. Period, end of story. There seem to be three groups of people pushing towards the ability to change this:

  1. At least some PVP advocates have requested (or demanded) that package authors who will not follow the PVP do not upload their packages to Hackage. This is absolutely contradicted by the official guidelines of Hackage, which I've pointed out many times. Nonetheless, this request/demand has persisted.
  2. Some opposed to the PVP do not want to upload to Hackage, basically because of (1). There have been many tense altercations over adherence to the PVP. People want to avoid this, and the easiest way is if they don't upload to Hackage. I know some people who simply do not release their code to Hackage or Stackage because of this. Others do so begrudgingly. But all of them would like to avoid Hackage for this reason.
  3. Some people feel that, technically, the central repo with manually uploaded tarball model is outdated. They would rather see a workflow based on automated Git-based releases using tags or a release branch. This is not a social dynamic at all, but a desire to explore a different point in the technical space, which Hackage does not support today.

(1) has been a major pain point for me. I've requested changes to the Hackage Trustee guidelines and Hackage rules to clarify that this behavior (private emails demanding people not upload to Hackage, public criticisms on individuals and companies for not following the PVP, etc) should not be allowed. In fact, that request is what ultimately led to SLURP as far as I know. Did I demand a change with a threat to fork? Ehh... if you want to read it that way, OK. Here's my take: I've been told to stop using Hackage, full stop. I requested a change in official policy to guarantee that my usage of Hackage is allowed.

As it stands today, no such change to Hackage policy has taken place. No final decision has been made about how I will respond to people in groups (2) and (3). But as you can see from the sentiments of group (3), the idea of hosting an alternative package repository to Hackage makes no sense. Thus I can again guarantee: the most literal fork of Hackage is something neither I nor anyone I'm speaking with wants.

The other alternative is allowing Stackage to pull packages directly from Git repos, in addition to pulling from Hackage. This is being discussed as a workaround for problem (1) above. I have gone on record in the past, and I'll go on record again now: I would rather not have that situation. I would rather Hackage make it clear that it welcomes everyone to upload their packages, and then the demands I'm receiving to open up Stackage to alternative sources will be less strong (though group (3) still wants to experiment for purely technical reasons).

Am I holding a gun to someone's head? Your call. This is the most honest version of the story I know to tell.

In sum: this is the closest to a potential fork, by allowing Git repos to work as an alternative source to Hackage.


I've participated in a long, private discussion with multiple people in trying to resolve the issues referenced above. As I said: my preference has always been for public discussions. Given how the SLURP proposal went off, I will stand by my original claim that public discussions are a better method. I'm sorry that the "fork" phrasing scared so many people. To those who were truly terrified I was going to do something nefarious: I'm sorry to keep you waiting two days for an explanation.

January 24, 2018 04:45 AM

January 21, 2018

Brent Yorgey

Off the Beaten Track: Explaining Type Errors

Last week I gave a talk at Off the Beaten Track 2018 about something that Richard Eisenberg, Harley Eades and I have been thinking about recently: namely, how to generate good interactive error explanations for programmers, especially for type errors. None of the talks at OBT were recorded, but I’ve prepared a version of my slides interspersed with (something like) a transcript of what I said.

by Brent at January 21, 2018 09:26 PM

January 19, 2018

Brandon Simmons

In defense of partial functions in the haskell Prelude

…because I’m trying to blog more, and this sounds like a fun argument to try to make.

One of the most universally-maligned parts of Haskell is the inclusion of partial functions in its standard library called Prelude. These include head and tail which are undefined for the empty list:

head :: [a] -> a
head (a:_) = a
head [] = error "empty list"

It’s generally understood that the inclusion of these sorts of functions is a wart (that the type of head should be [a] -> Maybe a) that has motivated the proliferation of Prelude alternatives, few of which are used by anyone besides their authors (and fewer still have enthusiastic advocates).
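For reference, the total version that people usually have in mind looks something like the sketch below. The name safeHead is just a common convention chosen for this example, not something the Prelude exports; base does ship the equivalent listToMaybe in Data.Maybe.

```haskell
module Main where

import Data.Maybe (listToMaybe)

-- A total alternative to head: the empty case shows up in the type.
safeHead :: [a] -> Maybe a
safeHead []      = Nothing
safeHead (x : _) = Just x

main :: IO ()
main = do
  print (safeHead [1, 2, 3 :: Int])  -- Just 1
  print (safeHead ([] :: [Int]))     -- Nothing
  print (listToMaybe "abc")          -- Just 'a'
```

The cost is that every caller now has to handle the Nothing case, even where the list is known to be non-empty.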

I’ve heard a handful of allusions to tricky production bugs that involved some deep dive to find the source of a "*** Exception: Prelude.head: empty list", but personally I can recall only one instance of such a bug in the code I’ve worked on professionally and it was trivial to track down. I can’t recall flagging a use of head in a code review either, or raising an eyebrow at some use of the function in some library source I was perusing.

But most of the time the argument goes that partial functions should be removed for the sake of new users, who will become quickly disillusioned when their first function blows up with an exception. But you said haskell was safe!

It would be unfortunate if this caused a new user to give up, and maybe this is a real problem for the community, but here’s what I think really happens to most of us:

  • Your homework doesn’t work; this doesn’t matter.
  • You use Google and quickly learn that partial functions (will forever) exist, and that they’re bad
  • You ask yourself “Hm, come to think of it what did I expect to happen…?”

And so you learn an important lesson, early on and in the most painless way possible: you acquire a nose for inferring which functions must be partial, an appreciation for compiler warnings that help prevent accidentally-partial functions, etc.

Would I recommend designing a standard library around this weird sort of tough-love? Probably not, but I think the haskell library ecosystem and pedagogy have benefited from this wart.

The problems that get the most (and most passionate) attention are usually not the ones that are the most important, but the ones that are the most easily understood. I think in the proliferation of Preludes and the discussion around partial functions (and the fact that they haven’t been excised yet) we see evidence of both the Law of Triviality, and a healthy language pedagogy.

January 19, 2018 08:22 PM

January 18, 2018

Comonad Reader

Computational Quadrinitarianism (Curious Correspondences go Cubical)

Back in 2011, in an influential blog post [1], Robert Harper coined the term "computational trinitarianism" to describe an idea that had been around a long time: that the connection between programming languages, logic, and categories, most famously expressed in the Curry-Howard-Lambek correspondence, should guide the practice of researchers into computation. In particular "any concept arising in one aspect should have meaning from the perspective of the other two". This was especially satisfying to those of us trying to learn categorical semantics and often finding it disappointing how little it is appreciated in the computer science community writ large.

1. Categories

Over the years I've thought about trinitarianism a lot, and learned from where it fails to work as much as where it succeeds. One difficulty is that learning to read a logic like a type theory, or vice versa, is almost a definitional trick, because it occurs at the level of reinterpretation of syntax. With categories it is typically not so easy. (There is a straightforward version of categorical semantics like this — yielding "syntactic categories" — but it is difficult to connect it to the broader world of categorical semantics, and often it is sidestepped in favor of deeper models.)

One thing I came to realize is that there is no one notion of categorical semantics — the way in which the simply typed lambda calculus takes models in cartesian closed categories is fundamentally unlike the way in which linear logics take models in symmetric monoidal categories. If you want to study models of dependent type theories, you have a range of approaches, only some of which have been partially unified by Ahrens, Lumsdaine and Voevodsky in particular [2]. And then there are the LCCC models pioneered by Seely for extensional type theory, not to mention the approach that takes semantics directly in toposes, or in triposes (the latter having been invented to unify a variety of structures, and in the process giving rise to still more questions). And then there is the approach that doesn't use categories at all, but multicategories.

Going the other way, we also run into obstacles: there is a general notion, opposite to the "syntactic category" of a type theory, which is the "internal logic" of a category. But depending on the form of category, "internal logic" can take many forms. If you are in a topos, there is a straightforward internal logic called the Mitchell–Bénabou language. In this setting, most "logical" operations factor through the truth-lattice of the subobject classifier. This is very convenient, but if you don't have a proper subobject classifier, then you are forced to reach for other interpretations. As such, it is not infrequently the case that we have a procedure for deriving a category from some logical theory, and a procedure for constructing a logical theory from some category, but there is no particular reason to expect that where we arrive, when we take the round-trip, is close to, much less precisely, where we began.

2. Spaces, Logics

Over the past few years I've been in a topos theory reading group. In the course of this, I've realized at least one problem with all the above (by no means the only one) — Harper's holy trinity is fundamentally incomplete. There is another structure of interest — of equal weight to categories, logics, and languages — which it is necessary to understand to see how everything fits. This structure is spaces. I had thought that it was a unique innovation of homotopy type theory to consider logics (resp. type theories) that took semantics in spaces. But it turns out that I just didn't know the history of constructive logic very well. In fact, in roughly the same period that Curry was exploring the relationship of combinatory algebras to logic, Alfred Tarski and Marshall Stone were developing topological models for intuitionistic logic, in terms of what we call Heyting Algebras [3] [4]. And just as, as Harper explained, logic, programming and category theory give us insights into implication in the form of entailment, typing judgments, and morphisms, so to, as we will see, do spaces.

A Heyting algebra is a special type of distributive lattice (partially ordered set, equipped with meet and join operations, such that meet and join distribute over one another) which has an implication operation that satisfies curry/uncurry adjointness, i.e. such that c ∧ a ≤ b <-> c ≤ a → b. (Replace meet here by "and" (spelled "*"), and ≤ by ⊢ and we have the familiar type-theoretic statement that c * a ⊢ b <-> c ⊢ a → b).

If you haven't encountered this before, it is worth unpacking. Given a set, we equip it with a partial order by specifying a "≤" operation, such that a ≤ a, if a ≤ b and b ≤ a, then a = b, and finally that if a ≤ b and b ≤ c, then a ≤ c. We can think of such things as Hasse diagrams: a bunch of nodes with some lines between them that only go upwards. If a node b is reachable from a node a by following these upwards lines, then a ≤ b. This "only upwards" condition is enough to enforce all three conditions. We can define ∨ (join) as a binary operation that takes two nodes, and gives a node a ∨ b that is greater than either node, and furthermore is the uniquely least node greater than both of them. (Note: A general partial order may have many pairs of nodes that do not have any node greater than both of them, or that have more than one incomparable node greater than them.) We can define ∧ (meet) dually, as the uniquely greatest node less than both of them. If every pair of elements of a partially ordered set has a join and a meet, we have a lattice.
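
The definitions above can be tried out on small finite examples. The following is a brute-force sketch, not an efficient implementation; the Poset record and the two example orders (divisors of 12, and a four-element "bowtie") are assumptions made for illustration.

```haskell
-- A finite poset: its elements plus an order relation that we assume is
-- reflexive, antisymmetric and transitive.
data Poset a = Poset { carrier :: [a], leq :: a -> a -> Bool }

-- The least element of a candidate list, if there is exactly one.
least :: Poset a -> [a] -> Maybe a
least p xs = case [x | x <- xs, all (leq p x) xs] of
  [x] -> Just x
  _   -> Nothing

-- The same poset with the order turned upside down.
opposite :: Poset a -> Poset a
opposite p = p { leq = flip (leq p) }

-- Join: the least node above both arguments.
-- Meet: the greatest node below both, found as "least" in the opposite order.
join, meet :: Poset a -> a -> a -> Maybe a
join p x y = least p [u | u <- carrier p, leq p x u, leq p y u]
meet p x y = least (opposite p) [l | l <- carrier p, leq p l x, leq p l y]

-- Divisors of 12 ordered by divisibility: every pair has a join and a meet.
divisors12 :: Poset Int
divisors12 = Poset [1, 2, 3, 4, 6, 12] (\a b -> b `mod` a == 0)

-- A "bowtie": a and b both lie below c and d, which are incomparable.
bowtie :: Poset Char
bowtie = Poset "abcd" (\x y -> x == y || (x `elem` "ab" && y `elem` "cd"))

-- A lattice is a poset in which every pair has both a join and a meet.
isLattice :: Poset a -> Bool
isLattice p =
  and [ok (join p x y) && ok (meet p x y) | x <- carrier p, y <- carrier p]
  where ok = maybe False (const True)
```

In the bowtie, a and b have two upper bounds but no least one, so join returns Nothing and isLattice fails; in the divisor example joins and meets are least common multiples and greatest common divisors.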

It is tempting to read meet and join as "and" and "or" in logic. But these logical connectives satisfy an additional important property, distributivity: a & (b | c) = (a & b) | (a & c). (By the lattice laws, the dual property with and swapped with or is also implied). Translated for lattices this reads: a ∧ (b ∨ c) = (a ∧ b) ∨ (a ∧ c). Rather than thinking just about boolean logic, we can think about lattices built from sets, with meet as intersection, join as union, and ≤ given by inclusion. It is easy to verify that such lattices are distributive. Furthermore, every distributive lattice can be given (up to isomorphism) as one built out of sets in this way. While a partially ordered set can have a Hasse diagram of pretty arbitrary shape, a lattice is more restrictive. I imagine it as sort of the tiled diamonds of an actual lattice like one might use in a garden, but with some nodes and edges possibly removed.

Furthermore, there's an amazing result that you can tell if a lattice is distributive by looking for just two prototypical non-distributive lattices as sublattices. If neither is contained in the original lattice, then the lattice is distributive. These tell us how distribution can fail in two canonical ways. The first is three incomparable elements, all of which share a common join (the top) and meet (the bottom). The join of any two of them is therefore the top. Hence if we take the meet of two such joins, we still get the top. But the meet of any two of them is the bottom and so, if we take the join of any element with the meet of the other two, we get back to the first element, not all the way to the top, and the equality fails. The second taboo lattice is constructed by having two elements in an ordered relationship, and another incomparable to them, again augmented with a bottom and top. A similar argument shows that if you go one way across the desired equality, you pick out the topmost of the two ordered elements, and the other way yields the bottommost. (The Wikipedia article on distributive lattices has some very good diagrams to visualize all this). So a distributive lattice has even more structure than before: incomparable elements must have enough meets and joins to prevent these sublattices from appearing, and this forces even more the appearance of a tiled-diamond like structure.
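
Both failures can be checked mechanically. The sketch below encodes the two forbidden lattices (conventionally called M3 and N5) and a distributive one for contrast, then tests the law a ∧ (b ∨ c) = (a ∧ b) ∨ (a ∧ c) over all triples. The Lattice record and the character encodings are assumptions made for this example.

```haskell
-- A finite lattice given by its elements and order; joins and meets are found
-- by brute-force search. `head` is safe only because each example below
-- really is a lattice.
data Lattice a = Lattice { elems :: [a], leq :: a -> a -> Bool }

joinL, meetL :: Lattice a -> a -> a -> a
joinL l x y = head [u | u <- ubs, all (leq l u) ubs]
  where ubs = [u | u <- elems l, leq l x u, leq l y u]
meetL l x y = head [d | d <- lbs, all (\e -> leq l e d) lbs]
  where lbs = [d | d <- elems l, leq l d x, leq l d y]

-- Check a ∧ (b ∨ c) == (a ∧ b) ∨ (a ∧ c) for every triple of elements.
distributive :: Eq a => Lattice a -> Bool
distributive l =
  and [ meetL l a (joinL l b c) == joinL l (meetL l a b) (meetL l a c)
      | a <- elems l, b <- elems l, c <- elems l ]

-- '0' is the bottom and '1' the top in both forbidden lattices.
-- m3: three incomparable elements a, b, c.
-- n5: x below y, with z incomparable to both.
m3, n5 :: Lattice Char
m3 = Lattice "0abc1" (\x y -> x == '0' || y == '1' || x == y)
n5 = Lattice "0xyz1"
       (\x y -> x == '0' || y == '1' || x == y || (x == 'x' && y == 'y'))

-- Divisors of 12 under divisibility, a distributive lattice for contrast.
divisors12 :: Lattice Int
divisors12 = Lattice [1, 2, 3, 4, 6, 12] (\a b -> b `mod` a == 0)
```

In m3 the triple (a, b, c) already breaks the law: a ∧ (b ∨ c) is a, while (a ∧ b) ∨ (a ∧ c) is the bottom. In n5 the triple (y, x, z) gives y on one side and x on the other.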

To get us to a Heyting algebra, we need more structure still: we need implication, which is like an internal function arrow, or an internal ≤ relation. Recall that the equation we want to satisfy is "c ∧ a ≤ b <-> c ≤ a → b". The idea is that we should be able to read ≤ itself as an "external implication" and so if c and a taken together imply b, "a implies b" is the portion of that implication if we "only have" c. We can see it as a partial application of the external implication. If we have a lattice that permits infinite joins (or just a finite lattice such that we don't need them), then it is straightforward to see how to construct this. To build a → b, we just look at every possible choice of c that satisfies c ∧ a ≤ b, and then take the join of all of them to be our object a → b. Then, by construction, a → b is necessarily greater than or equal to any c that satisfies the left hand side of the equation. And conversely, any element that a → b is greater than is necessarily one that satisfies the left hand side, and the bi-implication is complete. (This, by the way, gives a good intuition for the definition of an exponential in a category of presheaves). Another way to think of a → b is as the greatest element of the lattice such that (a → b) ∧ a ≤ b (exercise: relate this to the first definition). It is also a good exercise to explore what happens in certain simple cases: what if a is 0 (false)? What if it is 1? The same as b? Now ask the same questions of b.
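
Here is a small sketch of that construction on the simplest non-boolean example I could pick: a four-element chain 0 < 1 < 2 < 3, where meet is min and join is max. The choice of lattice and the names top, chain, imp and neg are assumptions for illustration only.

```haskell
-- The chain 0 < 1 < ... < top, with meet = min and join = max.
top :: Int
top = 3

chain :: [Int]
chain = [0 .. top]

-- a → b as the join (here: the maximum) of every c with c ∧ a ≤ b.
-- The list is never empty because c = 0 always qualifies.
imp :: Int -> Int -> Int
imp a b = maximum [c | c <- chain, min c a <= b]

-- The curry/uncurry adjunction c ∧ a ≤ b <-> c ≤ a → b, checked exhaustively.
adjunctionHolds :: Bool
adjunctionHolds =
  and [ (min c a <= b) == (c <= imp a b) | a <- chain, b <- chain, c <- chain ]

-- Negation is implication into the bottom element.
neg :: Int -> Int
neg a = imp a 0
```

Working through the simple cases: imp 0 b is always the top, imp top b is b, and imp a a is the top. Negating 1 gives 0 and negating again gives the top, so double negation does not return to where it started, which is the usual sign that the logic is intuitionistic rather than classical.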

So why is a Heyting algebra a topological construct? Consider any topological space as given by a collection of open sets, satisfying the usual principles (including the empty set and the total set, and closed under union and finite intersection). These open sets have a partial ordering, given by containment. They have unions and intersections (arbitrary joins and finite meets), a top and bottom element (the total space, and the empty set). Furthermore, they have an implication operation as described above. As an open set, a → b is given by the join (union) of all opens c for which a ∧ c ≤ b. (We can think of this as "the biggest context, for which a ⊢ b"). In fact, the axioms for open sets feel almost exactly like the rules we've described for Heyting algebras. It turns out this is only half true: open sets always give Heyting algebras, and we can turn every Heyting algebra into a space. However, in both directions the round trip may take us to somewhere slightly different than where we started. Nonetheless it turns out that if we take complete Heyting algebras where finite meets distribute over infinite joins, we get something called "frames." And the opposite category of frames yields "locales", a suitable generalization of topological spaces, first named by John Isbell in 1972 [5]. Spaces that correspond precisely to locales are called sober, and locales that correspond precisely to spaces are said to have "enough points" or be "spatial locales".
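
To see the implication on open sets at work, here is a sketch on a three-point space whose topology I made up for the example (the opens are ∅, {1}, {2}, {1,2} and the whole space). It uses Data.Set from the containers package.

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

type Open = Set Int

-- The whole space, and a hand-picked topology on it.
space :: Open
space = Set.fromList [1, 2, 3]

opens :: [Open]
opens = map Set.fromList [[], [1], [2], [1, 2], [1, 2, 3]]

-- a → b: the union of every open c with c ∩ a ⊆ b.
imp :: Open -> Open -> Open
imp a b = Set.unions [c | c <- opens, Set.intersection c a `Set.isSubsetOf` b]

-- Negation is implication into the empty set.
neg :: Open -> Open
neg a = imp a Set.empty

-- Does a ∨ ¬a cover the whole space?
excludedMiddle :: Open -> Bool
excludedMiddle a = Set.union a (neg a) == space
```

For the open set {1}, the negation is {2}, and their union misses the point 3, so excluded middle fails for it. That point sits on the "boundary" that no proper open set reaches, which is the topological picture of why this logic is intuitionistic.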

In fact, we don't need to fast-forward to 1972 to get some movement in the opposite direction. In 1944, McKinsey and Tarski embarked on a program of "The Algebra of Topology" which sought to describe topological spaces in purely algebraic (axiomatic) terms [6]. The resultant closure algebras (these days often discussed as their duals, interior algebras) provided a semantics for S4 modal logic. [7] A further development in this regard came with Kripke models for logic [8] (though arguably they're really Beth models [9]).

Here's an easy way to think about Kripke models. Start with any partially ordered set. Now, for each object, consider instead all morphisms into it. Since each morphism from any object a to any object b exists only if a ≤ b, and we consider such paths unique (if there are two "routes" showing a ≤ b, we consider them the same in this setting) this amounts to replacing each element a with the set of all elements ≤ a. (The linked pdf does this upside down, but it doesn't really matter). Even though the initial setting may not have been a Heyting algebra, this transformed setting is a Heyting algebra. (In fact, by a special case of the Yoneda lemma!). This yields Kripke models.
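
The first step of that construction, replacing each element by the set of elements below it, is easy to sketch. The example order (divisors of 12 under divisibility) and the names points, leq and down are assumptions for illustration; the check at the end is the poset-sized shadow of the Yoneda embedding being full and faithful.

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- The example poset: divisors of 12 ordered by divisibility.
points :: [Int]
points = [1, 2, 3, 4, 6, 12]

leq :: Int -> Int -> Bool
leq a b = b `mod` a == 0

-- Replace an element by the set of all elements below it.
down :: Int -> Set Int
down a = Set.fromList [x | x <- points, leq x a]

-- The original order is recovered exactly as inclusion of these sets.
orderMatchesInclusion :: Bool
orderMatchesInclusion =
  and [leq a b == (down a `Set.isSubsetOf` down b) | a <- points, b <- points]
```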

Now consider "collapsings" of elements in the initial partial order — monotone downwards maps taken by sending some elements to other elements less than them in a way that doesn't distort orderings. (I.e. if f(a) ≤ f(b) in the collapsed order, then that means that a ≤ b in the original order). Just as we can lift elements from the initial partial order into their downsets (sets of elements less than them) in the kripkified Heyting Algebra, we can lift our collapsing functions into collapsing functions in our generated Heyting Algebra. With a little work we can see that collapsings in the partial order also yield collapsings in the Heyting Algebra.

Furthermore, it turns out, more or less, that you can generate every closure algebra in this way. Now if we consider closure algebras a bit (and this shouldn't surprise us if we know about S4), we see that we can always take a to Ca, that if we send a → b, then we can send Ca → Cb, and furthermore that CCa → Ca in a natural way (in fact, they're equal!). So closure algebras have the structure of an idempotent monad. (Note: the arrows here should not be seen as representing internal implication — as above they represent the logical turnstile ⊢ or perhaps, if you're really in a Kripke setting, the forcing turnstile ⊩).

Now we have a correspondence between logic and computation (Curry-Howard), logic and categories (Lambek-Scott), and logic and spaces (Tarski-Stone). So maybe, instead of Curry-Howard-Lambek, we should speak of Curry-Howard-Lambek-Scott-Tarski-Stone! (Or, if we want to actually bother to say it, just Curry-Howard-Lambek-Stone. Sorry, Tarski and Scott!) Where do the remaining correspondences arise from? A cubical Kan operation, naturally! But let us try to sketch in a few more details.

3. Spaces, Categories

All this about monads and Yoneda suggests that there's something categorical going on. And indeed, there is. A poset is, in essence, a "decategorified category" — that is to say, a category where any two objects have at most one morphism between them. I think of it as if it were a balloon animal that somebody let all the air out of. We can pick up the end of our poset and blow into it, inflating the structure back up, and allowing multiple morphisms between each object. If we do so, something miraculous occurs — our arbitrary posets turn into arbitrary categories, and the induced Heyting algebra from their opens turns into the induced category of set-valued presheaves of that category. The resultant structure is a presheaf topos. If we "inflate up" an appropriate notion of a closure operator we arrive at a Grothendieck topos! And indeed, the internal language of a topos is higher-order intuitionistic type theory [10].

4. Spaces, Programming Languages

All of this suggests a compelling story: logic describes theories via algebraic syntax. Equipping these theories with various forms of structural operations produces categories of one sort or another, in the form of fibrations. The intuition is that types are spaces, and contexts are also spaces. And furthermore, types are covered by the contexts in which their terms may be derived. This is one sense in which it seems possible to interpret the Melliès/Zeilberger notion of a type refinement system as a functor [11].

But where do programming languages fit in? Programming languages, difficult as it is to sometimes remember, are more than their type theories. They have a semantics of computation as well. For example, a general topos does not have partial functions, or a fixed point combinator. But computations, often, do. This led to one of the first applications of topology to programming languages: the introduction of domain theory, in which terms are special kinds of spaces (directed complete partial orders) and functions obey a special kind of continuity (preservation of directed suprema) that allows us to take their fixed points. But while the category of dcpos is cartesian closed, the category of dcpos with only appropriately continuous morphisms is not. Trying to resolve this gap, one way or another, seems to have been a theme of research in domain theory throughout the 80s and 90s [12].

Computations can also be concurrent. Topological and topos-theoretic notions again can play an important role. In particular, to consider two execution paths to be "the same" one needs a notion of equivalence. This equivalence can be seen, stepwise, as a topological "two-cell" tracing out at each step an equivalence between the two execution paths. One approach to this is in Joyal, Nielsen and Winskel's treatment of open maps [13]. I've also just seen Patrick Schultz and David I. Spivak's "Temporal Type Theory" which seems very promising in this regard [14].

What is the general theme? Computation starts somewhere, and then goes somewhere else. If it stayed in the same place, it would not "compute". A computation is necessarily a path in some sense. Computational settings describe ways to take maps between spaces, under a suitable notion of topology. To describe the spaces themselves, we need a language — that language is a logic, or a type theory. Toposes are a canonical place (though not the only one) where logics and spaces meet (and where, to a degree, we can even distinguish their "logical" and "spatial" content). That leaves categories as the ambient language in which all this interplay can be described and generalized.

5. Spaces, Categories

All the above only sketches the state of affairs up to roughly the mid '90s. The connection to spaces starts in the late 30s, going through logic, and then computation. But the categorical notion of spaces we have is in some sense impoverished. A topos-theoretic generalization of a space still only describes, albeit in generalized terms, open sets and their lattice of subobject relations. Spaces have a whole other structure built on top of that. From their topology we can extract algebraic structures that describe their shape — this is the subject of algebraic topology. In fact, it was in axiomatizing a branch of algebraic topology (homology) that category theory was first compelled to be invented. And the "standard construction" of a monad was first constructed in the study of homology groups (as the Godement resolution).

What happens if we turn the tools of categorical generalization of algebraic topology on categories themselves? This corresponds to another step in the "categorification" process described above. Where to go from "0" to "1" we took a partially ordered set and allowed there to be multiple maps between objects, to go from "1" to "2" we can now take a category, where such multiple maps exist, and allow there to be multiple maps between maps. Now two morphisms, say "f . g" and "h" need not merely be equal or not, but they may be "almost equal" with their equality given by a 2-cell. This is just as two homotopies between spaces may themselves be homotopic. And to go from "2" to "3" we can continue the process again. This yields n-categories. An n-category with all morphisms at every level invertible is an (oo,0)-category, or an infinity groupoid. And in many setups this is the same thing as a topological space (and the question of which setup is appropriate falls under the name "homotopy hypothesis" [15]). When morphisms at the first level (the category level) can have direction (just as in normal categories) then those are (oo,1)-categories, and the correspondence between groupoids and spaces is constructed as an equivalence of such categories. These too have direct topological content, and one setting in which this is especially apparent is that of quasi-categories, which are (oo,1)-categories that are built directly from simplicial sets — an especially nice categorical model of spaces (the simplicial sets at play here are those that satisfy a "weak" Kan condition, which is a way of asking that composition behave correctly).

It is in these generalized (oo,1)-toposes that homotopy type theory takes its models. And, it is hypothesized that a suitable version of HoTT should in fact be the initial model (or "internal logic") of an "elementary infinity topos" when we finally figure out how to describe what such a thing is.

So perhaps it is not that we should be computational trinitarians, or quadrinitarians. Rather, it is that the different aspects which we examine — logic, languages, categories, spaces — only appear as distinct manifestations when viewed at a low dimensionality. In the untruncated view of the world, the modern perspective is, perhaps, topological pantheism — spaces are in all things, and through spaces, all things are made as one.

Thanks to James Deikun and Dan Doel for helpful technical and editorial comments


by Gershom Bazerman at January 18, 2018 08:15 PM

January 11, 2018

Yesod Web Framework

Upcoming Yesod breaking changes

With all of the talk I've had about breaking changes in my libraries, I definitely didn't want the Yesod world to feel left out. We've been stable at yesod-core version 1.4 since 2014. But the changes going through my package ecosystem towards MonadUnliftIO are going to affect Yesod as well. The question is: how significantly?

For those not aware, MonadUnliftIO is an alternative typeclass to both MonadBaseControl and the MonadCatch/MonadMask classes in monad-control and exceptions, respectively. I've mentioned the advantages of this new approach in a number of places, but the best resource is probably the release announcement blog post.
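
For readers who haven't seen the idea, here is a self-contained sketch of what "unlifting" means. The class UnliftSketch and its method withRunInIOSketch are local stand-ins written for this example, modelled on the shape of MonadUnliftIO; they are not the definitions exported by unliftio-core.

```haskell
{-# LANGUAGE RankNTypes #-}

import Control.Exception (bracket_)
import Control.Monad.IO.Class (MonadIO)
import Control.Monad.Trans.Reader (ReaderT (..))

-- Stand-in class: a monad qualifies if we can get hold of a function that
-- runs its actions directly in IO.
class MonadIO m => UnliftSketch m where
  withRunInIOSketch :: ((forall a. m a -> IO a) -> IO b) -> m b

-- IO can trivially run itself.
instance UnliftSketch IO where
  withRunInIOSketch inner = inner id

-- ReaderT only adds an environment, so it can be peeled off and re-applied.
instance UnliftSketch m => UnliftSketch (ReaderT r m) where
  withRunInIOSketch inner =
    ReaderT $ \r ->
      withRunInIOSketch $ \run ->
        inner (run . flip runReaderT r)

-- Lifting a callback-taking IO function (bracket_) to any such monad.
bracketSketch_ :: UnliftSketch m => m a -> m b -> m c -> m c
bracketSketch_ before after inner =
  withRunInIOSketch $ \run -> bracket_ (run before) (run after) (run inner)
```

Note which instances are missing: there is no sensible instance for something like StateT, because state changes made inside the callback would have nowhere to go. That restriction is the point of the design.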

At the simplest level, the breaking change in Yesod would consist of:

  • Modifying WidgetT's internal representation. This is necessary since, currently, it's implemented as a WriterT. Instead, to match with MonadUnliftIO, it needs to be a ReaderT holding an IORef. This is just about as minor a breaking change as I can imagine, since it only affects internal modules. (Said another way: it could even be argued to be a non-breaking change.)
  • Drop the MonadBaseControl and MonadCatch/MonadMask instances. This isn't strictly necessary, but has two advantages: it reduces the dependency footprint, and further encourages avoiding dangerous behavior, like using concurrently with a StateT on top of HandlerT.
  • Switch over to the new versions of the dependent libraries that are changing, in particular conduit and resourcet. (That's not technically a breaking change, but I typically consider dropping support for a major version of a dependency a semi-breaking change.)
  • A number of minor cleanups that have been waiting for a breaking change. This includes things like adding strictness annotations in a few places, and removing the defunct GoogleEmail and BrowserId modules.
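
To illustrate the first bullet, here is a rough sketch of how writer-style accumulation can be done with a ReaderT holding an IORef. The names Accum, tellRef and runAccum are hypothetical and this is not WidgetT's actual definition; it only shows the general technique.

```haskell
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Reader (ReaderT (..), ask)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical stand-in for a widget-like monad: the accumulated output lives
-- in a mutable reference that is passed along by ReaderT.
type Accum w = ReaderT (IORef w) IO

-- The analogue of WriterT's tell: append to whatever is in the reference.
tellRef :: Monoid w => w -> Accum w ()
tellRef w = do
  ref <- ask
  liftIO (modifyIORef' ref (<> w))

-- Run an action against a fresh reference and return what was accumulated.
runAccum :: Monoid w => Accum w a -> IO (a, w)
runAccum action = do
  ref <- newIORef mempty
  a <- runReaderT action ref
  w <- readIORef ref
  pure (a, w)
```

Because the output sits in an IORef rather than in the monad's return value, it survives exceptions and can be shared with callbacks run in plain IO, which is what makes this representation compatible with unlifting.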

This is a perfectly reasonable set of changes to make, and we can easily call this Yesod 1.5 (or 2.0) and ship it. I'm going to share one more slightly larger change I've experimented with, and I'd appreciate feedback on whether it's worth the breakage to users of Yesod.

Away with transformers!

NOTE All comments here, as is usually the case in these discussions, refer to code that must be in IO anyway. Pure code gets a pass.

You can check out the changes (which appear larger than they actually are) in the no-transformers branch. You'll see shortly that that's a lie, but it does accurately indicate intent. If you look at the pattern of the blog posts and recommended best practices I've been discussing for the past year, it ultimately comes down to a simple claim: we massively overuse monad transformers in modern Haskell.

The most extreme response to this claim is that we should get rid of all transformers, and just have our code live in IO. I've made a slight compromise to this for ergonomics, and decided it's worth keeping reader capabilities, because it's a major pain (or at least perceived major pain) to pass extra stuff around for, e.g., simple functions like logInfo.

The core data type for Yesod is HandlerT, with code that looks like getHomeR :: HandlerT App IO Html. Under the surface, HandlerT looks something like:

newtype HandlerT site m a = HandlerT (HandlerData site -> m a)

Let's ask a simple question: do we really need HandlerT to be a transformer? Why not simply rewrite it to be:

newtype HandlerFor site a = HandlerFor (HandlerData site -> IO a)

All we've done is replaced the m type parameter with a concrete selection of IO. There are already assumptions all over the place that your handlers will necessarily have IO as the base monad, so we're not really losing any generality. But what we gain is:

  • Slightly clearer error messages
  • Fewer type constraints, such as MonadUnliftIO m, floating around
  • Internally, this actually simplifies quite a few ugly things around weird type families
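
As a rough sketch of why the concrete version is simple to work with, here is a HandlerFor-shaped newtype with hand-written instances. The HandlerData record below is a drastically simplified placeholder invented for the example, and getSite is a hypothetical helper; the real types in the branch carry much more.

```haskell
import Control.Monad.IO.Class (MonadIO (..))

-- Placeholder for the per-request data (fields are assumptions, not Yesod's).
data HandlerData site = HandlerData
  { handlerSite :: site
  , handlerPath :: [String]
  }

newtype HandlerFor site a = HandlerFor
  { unHandlerFor :: HandlerData site -> IO a }

instance Functor (HandlerFor site) where
  fmap f (HandlerFor g) = HandlerFor (fmap f . g)

instance Applicative (HandlerFor site) where
  pure = HandlerFor . const . pure
  HandlerFor f <*> HandlerFor g = HandlerFor (\hd -> f hd <*> g hd)

instance Monad (HandlerFor site) where
  HandlerFor g >>= k = HandlerFor (\hd -> g hd >>= \a -> unHandlerFor (k a) hd)

-- With IO fixed as the base, lifting is just ignoring the environment.
instance MonadIO (HandlerFor site) where
  liftIO = HandlerFor . const

-- Hypothetical helper in the spirit of getYesod.
getSite :: HandlerFor site site
getSite = HandlerFor (pure . handlerSite)
```

Every instance mentions IO directly, so there is no `Monad m =>` context to thread through, and type errors talk about one concrete type instead of an unknown m.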

We can also regain a lot of backwards compatibility with a helper type synonym:

type HandlerT site m = HandlerFor site

Plus, if you're using the Handler type synonym generated by the Template Haskell code, the new version of Yesod would just generate the right thing. Overall, this is a slight improvement, and we need to weigh the benefit of it versus the cost of breakage. But let me throw one other thing into the mix.

Handling subsite (yes, transformers)

I lied, twice: the new branch does use transformers, and HandlerT is more general than HandlerFor. In both cases, this has to do with subsites, which have historically been a real pain to write (using them hasn't been too bad). In fact, the entire reason we have HandlerT today is to try and make subsites work in a nicely layered way (which I think I failed at). Those who have been using Yesod long enough likely remember GHandler as a previous approach for this. And anyone who has played with writing a subsite, and the hell which ensues when trying to use defaultLayout, will agree that the situation today is not great.

So cutting through all of the crap: when writing a subsite, almost everything is the same as writing normal handler code. The following differences pop up:

  • When you call getYesod, you get the master site's app data (e.g. App in a scaffolded site). You need some way to get the subsite's data as well (e.g., the Static value in yesod-static).
  • When you call getCurrentRoute, it will give you a route in the master site. If you're inside yesod-auth, for instance, you don't want to deal with all of the possible routes in the parent, but instead get a route for the subsite itself.
  • If I'm generating URLs, I need some way to convert the routes for a subsite into the parent site.

In today's Yesod, we provide these differences inside the HandlerT type itself. This ends up adding some weird complications around special-casing the base (and common) case where m is IO. Instead, in the new branch, we have just one layer of ReaderT sitting on top of HandlerFor, providing these three pieces of functionality. And if you want to get a better view of this, check out the code.
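
To give a feel for the shape of that layer without reading the branch, here is a sketch under heavy assumptions. Every name below (SubsiteEnv, SubHandler, askSub, askSubRoute, toParentRoute) is hypothetical, and HandlerFor is replaced by a plain ReaderT stand-in so the example compiles on its own.

```haskell
import Control.Monad.Trans.Reader (ReaderT (..), asks)

-- Stand-in for HandlerFor so the sketch is self-contained; not Yesod's type.
type HandlerFor master = ReaderT master IO

-- The three extra pieces a subsite needs, one field per bullet above.
data SubsiteEnv sub subRoute masterRoute = SubsiteEnv
  { envSub          :: sub                      -- the subsite's own data
  , envCurrentRoute :: Maybe subRoute           -- current route, subsite view
  , envToParent     :: subRoute -> masterRoute  -- embed into the parent site
  }

-- One ReaderT layer on top of the master site's handler monad.
type SubHandler sub subRoute master masterRoute =
  ReaderT (SubsiteEnv sub subRoute masterRoute) (HandlerFor master)

askSub :: SubHandler sub sr m mr sub
askSub = asks envSub

askSubRoute :: SubHandler sub sr m mr (Maybe sr)
askSubRoute = asks envCurrentRoute

toParentRoute :: sr -> SubHandler sub sr m mr mr
toParentRoute r = asks (\env -> envToParent env r)
```

Ordinary handler code never sees this layer; only subsite authors pay for the extra environment, which is the layering the paragraph above describes.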

What to do?

Overall, I think this design is more elegant, easier to understand, and simplifies the codebase. In reality, I don't think it's either a major departure from the past, or a major improvement, which is what leaves me on the fence about the no transformer changes.

We're almost certainly going to have a breaking change in Yesod in the near future, but it need not include this change. If it doesn't, the breaking change will be the very minor one mentioned above. If the general consensus is in favor of this change, then we may as well throw it in at the same time.

January 11, 2018 09:29 PM

January 09, 2018

Michael Snoyman

Breaking changes, dependency trees

My previous blog post discussed a possible upcoming breaking change to the conduit library: dropping finalizers. This is one of a number of other breaking changes I have planned. Another one is switching over from MonadBaseControl to MonadUnliftIO, for reasons I've discussed at length before and spoken about too.

Beyond this change, I have a number of others planned out as well, some more solidly than others. I've started a document describing some of these, and I wanted to bring up one point in this design space for some user feedback: conduit dependency trees.


The situation today is that we have a dependency graph that looks something like the following:

  • resourcet is at the base of the hierarchy, and defines some non-conduit-specific types and functions used throughout the conduit ecosystem. It currently depends on a number of packages, like monad-control, but that number will naturally drop as we move over to MonadUnliftIO exclusively.
  • conduit is designed to provide basic conduit functionality with fewer dependencies. It does depend on resourcet, and packages like monad-control. But it does not depend on bytestring, text, or vector, even though these are almost always wanted with conduit. It provides the Data.Conduit.List set of combinators, which are not the best ones out there.
  • conduit-extra adds lots of dependencies, including things like attoparsec, and provides a nicer set of helpers around bytestring and text.
  • And finally, at the top of the tree (or our tree for today), we've got conduit-combinators, which provides the combinators I actually recommend people use in the Data.Conduit.Combinators module. This has lots of dependencies, since it inherits from conduit-extra and also adds in some extra things like mwc-random.

Benefits of this setup:


  • You can use resourcet without touching the conduit ecosystem at all
  • You can use conduit without pulling in lots of resources
  • Data.Conduit.Combinators is fully loaded

Downsides of this setup:


  • The current dependency footprint even at the base is higher than I'd like, though that's getting fixed soon regardless.
  • The conduit package is not super useful on its own due to lack of bytestring, text, and vector support.
  • To get the functionality you want in either conduit-extra or conduit-combinators, you end up with a much larger dependency footprint.

Plans for the next version

I have a number of different ideas in mind. I'll start off with the most conservative plan, and mention some variants below.

  • As already mentioned, resourcet drops a bunch of dependencies. Nothing too interesting there.
  • conduit adds a dependency on bytestring, text, and vector as basic libraries everyone should be using anyway. We move over Data.Conduit.Combinators and provide most of its functionality in conduit itself, and start recommending against Data.Conduit.List, Data.Conduit.Binary, and Data.Conduit.Text.
  • conduit-extra basically remains as-is
  • conduit-combinators retains the extra functionality not present in the new conduit

Benefits of this plan:


  • The conduit package now provides most of the functionality you'll want on a day-to-day basis
  • The dependency footprint for the Data.Conduit.Combinators module is much reduced
  • We can finally get away from the not-well-named functions in Data.Conduit.List

There aren't necessarily downsides to this approach, as I think it's simply better than what we have today already. But I want to list out the alternatives, which will make clear some things that could be possibly better still.

  • What do we do with the mono-traversable package? It's currently a dependency of conduit-combinators, and the simplest path forward for the above is to make conduit depend on mono-traversable. However, this is a slightly heavier dependency footprint, requiring adding in unordered-containers and vector-algorithms. Alternatives:
    • Strip down mono-traversable to have less deps
    • Redefine parts of mono-traversable needed for conduit in conduit itself
    • Going crazy: really move mono-traversable into conduit and swap the dependency tree around
    • My inclination: minimize mono-traversable's dependencies a bit more (like dropping the split package, and maybe vector-algorithms) and make it a dependency of conduit.
  • Do we really need conduit-combinators as well as conduit-extra? It's just adding a few extra pieces of functionality over conduit-extra, and perhaps those should be folded into conduit-extra itself.
  • Some people may not like the heavier dep footprint of conduit now. Should we split off a conduit-core package providing the core data types, functions, and operators, and have conduit depend on that?
  • It feels almost silly to have the ResourceT data type live in a separate package. If we have conduit-core, that could be a logical place to put it, since it won't have any extra dependencies versus the resourcet package itself, and then we can turn resourcet into a backwards compatibility layer. Or it may be logical to place ResourceT in the unliftio-core package, since both concepts help with resource cleanup in monad transformers. The former is necessary for continuation-based monads, while the latter (MonadUnliftIO) works for simpler monads.

If people have feedback, I'm happy to hear about it. I've spent an unfortunate amount of time bouncing around between these different options, so hopefully writing it all down and hearing some outside opinions can help move this forward.

January 09, 2018 12:00 PM

January 08, 2018

Roman Cheplyaka

New patterns in tasty

When I wrote tasty in 2013, I borrowed the pattern language and its implementation from test-framework. I wasn’t fond of that pattern language, but it did the job most of the time, and the task of coming up with a better alternative was daunting.

Over the years, however, the pattern language and implementation received more feature requests and complaints than any other aspect of tasty.

  • Mikhail Glushenkov wanted to run all tests in a group except a selected few, e.g. --pattern 'test-group/**' --ignore 'test-group/test-3'.
  • Utku Demir wanted to specify a list of tests to run, such as “A and B, but not C and D”; Philipp Hausmann and Süleyman Özarslan requested something similar in the same issue.
  • Rob Stewart wanted a pattern that would match Bar but not FooBar.
  • Daniel Mendler reported that patterns with slashes didn’t work because slashes have special meaning in the patterns.
  • Levent Erkök, whose package sbv has more than 30k tasty tests, was concerned about the performance of pattern matching.
  • Allowing regular expressions in patterns would fulfill some (though not all) of these requests. However, using a regular expression library has its downsides. Carter Schonwald repeatedly complained to me that regex-tdfa takes ages to compile. Simon Jakobi noted that regex-tdfa brings along several transitive dependencies (including parsec), which further increases compilation time, makes the package more prone to breakages, and makes it harder for the maintainers of core packages to use tasty.

Every time someone filed an issue about tasty patterns, I would say something like “oh, this will be fixed once I get around to that issue from 2013 about the new patterns”. But I still had no idea what that new pattern language should look like.

The new pattern language had to be expressive, containing at least the boolean operators. It had to allow matching against the test name, the name of any of the groups containing the test, or the full test path, like Foo/Bar/Baz for a test named Baz in the test group Bar, which itself is contained in the top-level group Foo.

Finally, there was an issue of familiarity. Whatever ad-hoc DSL I would come up with, I had to document thoroughly its syntax and semantics, and then I had to convince tasty users to learn a new language and read the docs every time they wanted to filter their tests. (Not that the old patterns were particularly intuitive.)

The insight came to me last summer while I was spending time with my family and working remotely from a cabin in Poltava oblast, Ukraine. The language I needed already existed and was relatively well-known. It’s called AWK!

[Figure: My workspace in Poltava oblast]

In AWK, the variable[1] $0 refers to the current line of input (called a “record”), and the variables $1, $2 etc. refer to the fields resulting from splitting the record on the field separator (like a tab or a comma).

The analogy with test names in tasty is straightforward: $0 denotes the full path of the test, $1 denotes the outermost test group name, $2 for the next group name, and so on. The test’s own name is $NF.

Then you can use these variables together with string, numeric, and boolean operators. Some examples:

  • $2 == "Two" — select the subgroup Two
  • $2 == "Two" && $3 == "Three" — select the test or subgroup named Three in the subgroup named Two
  • $2 == "Two" || $2 == "Twenty-two" — select two subgroups
  • $0 !~ /skip/ or ! /skip/ — select tests whose full names (including group names) do not contain the word skip
  • $NF !~ /skip/ — select tests whose own names (but not group names) do not contain the word skip
  • $(NF-1) ~ /QuickCheck/ — select tests whose immediate parent group name contains QuickCheck

The list of all supported functions and operators can be found in the README.

As a shortcut, if the -p/--pattern argument consists of letters, digits, and characters, it is matched against the full test path, so -p foo is equivalent to -p /foo/.

The subset of AWK recognized by tasty contains only expressions (no statements like loops or function definitions), no assignment operators, and no variables except NF. Other than that, the most salient deviation is that pattern matching (as in $3 ~ /foo/) does not use regular expressions, for the reasons stated above. Instead, a pattern match means a simple substring search — an idea suggested by Levent Erkök. So /foo/ in tasty means exactly the same as in AWK, while AWK’s /foo+/ cannot be expressed.
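To make the semantics above concrete, here is a minimal sketch of how fields and substring matching could be modelled. It is not tasty's implementation: the names `TestPath`, `field`, `nf`, `matches`, and `example` are hypothetical, and joining the path with "/" is an assumption borrowed from the Foo/Bar/Baz notation earlier in this post.

```haskell
import Data.List (intercalate, isInfixOf)

-- | A test path: enclosing group names first, the test's own name last.
type TestPath = [String]

-- | Hypothetical analogue of AWK's @$n@: field 0 is the whole path,
-- field n (counting from 1) is the n-th component.
field :: Int -> TestPath -> String
field 0 path = intercalate "/" path  -- assumption: "/" as the separator
field n path
  | n >= 1 && n <= length path = path !! (n - 1)
  | otherwise = ""  -- AWK yields an empty string for a missing field

-- | Hypothetical analogue of @NF@, the number of fields.
nf :: TestPath -> Int
nf = length

-- | Analogue of @$n ~ /needle/@ under plain substring semantics.
matches :: String -> String -> Bool
matches needle haystack = needle `isInfixOf` haystack

-- | The pattern @$2 == "Two" && $NF !~ /skip/@ written out by hand.
example :: TestPath -> Bool
example path =
  field 2 path == "Two" && not (matches "skip" (field (nf path) path))
```

With this model, `example ["One", "Two", "Three"]` holds, while a path whose last component contains "skip" is rejected.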

This allowed me to drop regex-tdfa as a dependency and significantly speed up the compilation time. An installation of tasty-1.0 (the new major release featuring AWK patterns) from scratch (a fresh cabal sandbox) takes 24 seconds on my laptop[2], while an installation of tasty- (the previous version, which depends on regex-tdfa) takes 2 minutes 43 seconds.

The performance improved, too. I tried Levent’s example, which runs 30k dummy tests (j @?= j). When run with --quiet (so no time is wasted on output), tasty- takes 0.3 seconds to run all tests and 0.6 seconds to run a single test selected by a pattern (-p 9_2729). The new tasty-1.0 takes the same time without a pattern, and less than 0.1 seconds with a pattern (also -p 9_2729, which is equivalent to $0 ~ /9_2729/). The overhead of pattern matching, although it was pretty low already (0.3 seconds per 30k tests), became much smaller, so that it is now outweighed by the benefit of running fewer dummy tests. I haven’t done any performance optimization at all[3], so I don’t even know where the speedup came from, exactly.

Earlier I said that I dropped regex-tdfa as a dependency for tasty and that regex-tdfa in turn depended on parsec; but didn’t I have to retain parsec or a similar library to parse the AWK syntax? No! We already have a perfectly fine parser combinators module in the base library, Text.ParserCombinators.ReadP. Its original purpose was to back the standard Read instances, but there is no reason it can’t be used for something more fun.
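As an illustration of what ReadP offers, here is a small sketch that parses a tiny fragment of the pattern syntax (field references, string literals, == and &&). It is not tasty's actual parser: the `Expr` type and the names `lexeme`, `fieldRef`, `strLit`, `comparison`, `expr`, and `parseExpr` are made up for this example, and only functions exported by Text.ParserCombinators.ReadP in base are used.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP

-- | A deliberately tiny, hypothetical AST.
data Expr
  = Field Int         -- $n
  | LastField         -- $NF
  | StrLit String     -- "text"
  | Equals Expr Expr  -- a == b
  | And Expr Expr     -- a && b
  deriving (Eq, Show)

-- | Run a parser, then skip trailing whitespace.
lexeme :: ReadP a -> ReadP a
lexeme p = p <* skipSpaces

fieldRef :: ReadP Expr
fieldRef = lexeme $ char '$' *> (lastField <++ numbered)
  where
    lastField = LastField <$ string "NF"
    numbered = Field . read <$> munch1 isDigit

strLit :: ReadP Expr
strLit = lexeme $ StrLit <$> between (char '"') (char '"') (munch (/= '"'))

atom :: ReadP Expr
atom = fieldRef <++ strLit

-- | An atom, optionally followed by @== atom@.
comparison :: ReadP Expr
comparison = do
  lhs <- atom
  option lhs (Equals lhs <$> (lexeme (string "==") *> atom))

-- | Comparisons chained with a left-associative @&&@.
expr :: ReadP Expr
expr = chainl1 comparison (And <$ lexeme (string "&&"))

-- | Accept the input only if exactly one parse consumes all of it.
parseExpr :: String -> Maybe Expr
parseExpr input =
  case readP_to_S (skipSpaces *> expr <* eof) input of
    [(e, "")] -> Just e
    _ -> Nothing
```

ReadP explores alternatives in parallel, so the trailing `eof` is what discards the partial parses; operator precedence here is encoded by hand rather than with a table-driven expression parser.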

I did borrow one small module from megaparsec for parsing expressions (Text.Megaparsec.Expr), which I adapted to work with ReadP and to parse ternary operators. The expression parser originally comes from parsec, but Mark Karpov did a great job refactoring it, so I recommend you read Mark’s version instead. The expression parser is an ingenious algorithm deserving a separate blog post.

Enjoy the new tasty patterns!

<section class="footnotes">
  1. Actually, unlike in Perl, in AWK $0 is an expression: the operator $ applied to the number 0. Instead of $0 you could write $(1-1). The most practically relevant implication is that you can write $NF for the last field in the record, $(NF-1) for the second to last field and so on.

  2. Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz, 2 cores (4 virtual), SSD.

  3. I had an idea to do partial evaluation of the expression, so that the condition $2 == "X" would prune the whole test subgroup with name $2 == "Y" without having to consider each test individually. But after doing the measurement, I saw that matching is already fast enough, and the extra complexity is not justified.


January 08, 2018 08:00 PM

Functional Jobs

Software Developer - all levels at S&P Global (Full-time)

S&P Global is hiring functional programmers at various levels and locations (New York, Boston, Denver, Delhi)

We are the Analytics Product Development Team, a relatively small group of developers focused on building next generation financial models and applications that leverage S&P's world-class data sets. Last year we launched a re-imagined Portfolio Analytics product that helps investment managers of all types measure the efficacy of their investment strategies and identify areas of risk in their equity portfolios. Put your FP skills to use as we move on to multiple asset classes, intraday analytics, and strategy modeling.

Functional Programming has a relatively long history here at S&P Global. We built our back-end data calculation engine using purely-functional Scala in 2008 and have been building new models and expanding it ever since. We created Ermine, a Haskell-like language featuring row types that can run on the JVM. Ermine drives our templating, reporting, and database access, adding type-safety to user-generated layouts. The new Portfolio Analytics is a single page web application that makes extensive use of PureScript. All of this co-exists in a diverse tech ecosystem that includes the JVM, .NET, and SQL Server.

We have a few open positions, so we are looking for developers with varying levels of experience. Ideal candidates have FP experience, but we'd still like to talk with you if you are just getting started with FP and are looking to grow.

Sponsorship is not available for these positions.

S&P Global is an equal opportunity employer committed to making all employment decisions on the basis of merit, capability and equality of opportunity, and without regard to race/ethnicity, gender, pregnancy, gender identity or expression, color, creed, religion, national origin, age, disability, marital status (including domestic partnerships and civil unions), sexual orientation, military veteran status, unemployment status, or any other basis prohibited by federal, state or local law, or any other characteristic that has no bearing on a person’s ability to perform his or her job.

Only electronic job submissions will be considered for employment. If you need an accommodation during the application process due to a disability, please send an email to: EEO [dot] Compliance [at] spglobal [dot] com and your request will be forwarded to the appropriate person

Get information on how to apply for this position.

January 08, 2018 04:26 PM

January 06, 2018

Comonad Reader

The State Comonad

Is State a Comonad?

Not Costate or rather, Store as we tend to call it today, but actually State s itself?

Let's see!

Recently there was a post to reddit in which the author King_of_the_Homeless suggested that he might have a Monad for Store. Moreover, it is one that is compatible with the existing Applicative and ComonadApply instances. My knee-jerk reaction was to disbelieve the result, but I'm glad I stuck with playing with it over the last day or so.

In a much older post, I showed how to use the Co comonad-to-monad-transformer to convert Store s into State s, but this is a different beast, it is a monad directly on Store s.

{-# language DeriveFunctor #-}
import Control.Comonad
import Data.Semigroup
data Store s a = Store
  { peek :: s -> a, pos :: s }
  deriving Functor
instance Comonad (Store s) where
  extract (Store f s) = f s
  duplicate (Store f s) = Store (Store f) s
instance (Semigroup s, Monoid s) => Applicative (Store s) where
  pure a = Store (const a) mempty
  Store f s <*> Store g t = Store (\m -> f m (g m)) (mappend s t)
instance Semigroup s => ComonadApply (Store s) where
  Store f s <@> Store g t = Store (\m -> f m (g m)) (s <> t)
instance (Semigroup s, Monoid s) => Monad (Store s) where
  return = pure
  m >>= k = Store
    (\s -> peek (k (peek m s)) s)
    (pos m `mappend` pos (k (peek m mempty)))

My apologies for the Semigroup vs. Monoid noise, as I'm still using GHC 8.2 locally. This will get a bit cleaner in a couple of months.

Also, peek here is flipped relative to the version in Control.Comonad.Store.Class so that I can use it directly as a field accessor.

As I noted, at first I was hesitant to believe it could work, but then I realized I'd already implemented something like this for a special case of Store, in the 'streams' library, which got me curious. Upon reflection, this feels like the usual Store comonad is using the ability to distribute (->) e out or (,) e in using a "comonoid", which is always present in Haskell, just like how the State monad does. But the type above seems to indicate we can go the opposite direction with a monoid.

So, in the interest of exploring duality, let's see if we can build a comonad instance for `State s`!

Writing down the definition for state:

newtype State s a = State
  { runState :: s -> (a, s) }
  deriving Functor
instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State mf < *> State ma = State $ \s -> case mf s of
    (f, s') -> case ma s' of
      (a, s'') -> (f a, s'')
instance Monad (State s) where
  return = pure
  State m >>= k = State $ \s -> case m s of
    (a, s') -> runState (k a) s'

Given a monoid for our state, extraction is pretty obvious:

instance Monoid s => Comonad (State s) where
  extract m = fst $ runState m mempty

But the first stab we might take at how to duplicate, doesn't work.

  duplicate m = State $ \s ->
   (State $ \t -> runState m (mappend s t)
   , s
   )

It passes the `extract . duplicate = id` law easily:

extract (duplicate m)
= extract $ State $ \s -> (State $ \t -> runState m (mappend s t), s)
= State $ \t -> runState m (mappend mempty t)
= State $ \t -> runState m t
= State $ runState m
= m

But fails the second law:

fmap extract (duplicate m)
= fmap extract $ State $ \s -> (State $ \t -> runState m (mappend s t), s)
= State $ \s -> (extract $ State $ \t -> runState m (mappend s t), s)
= State $ \s -> (fst $ runState m (mappend s mempty), s)
= State $ \s -> (evalState m s, s)

because it discards the changes in the state.

But the King_of_the_Homeless's trick from that post (and the Store code above) can be modified to this case. All we need to do is ensure that we modify the output state 's' as if we'd performed the action unmolested by the inner monoidal state that we can't see.

  duplicate m = State $ \s ->
    ( State $ \t -> runState m (mappend s t)
    , snd $ runState m s
    )


extract (duplicate m)
= extract $ State $ \s -> (State $ \t -> runState m (mappend s t), snd $ runState m s)
= State $ \t -> runState m (mappend mempty t)
= State $ \t -> runState m t
= State $ runState m
= m

just like before, but the inner extraction case now works out and performs the state modification as expected:

fmap extract (duplicate m)
= fmap extract $ State $ \s -> (State $ \t -> runState m (mappend s t), snd $ runState m s)
= State $ \s -> (extract $ State $ \t -> runState m (mappend s t), snd $ runState m s)
= State $ \s -> (fst $ runState m (mappend s mempty), snd $ runState m s)
= State $ \s -> (fst $ runState m s, snd $ runState m s)
= State $ \s -> runState m s
= State $ runState m
= m

This is still kind of a weird beast as it performs the state action twice with different states, but it does pass at least the left and right unit laws.
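The two unit laws can also be spot-checked by running the instance on sample states. The sketch below is self-contained, so it declares a local `Comonad` class as a stand-in for the one in the comonad package. The action `tick`, the helper `agreeOn`, and the list `samples` are assumptions made for the example, and agreement on a handful of states is evidence, not a proof.

```haskell
import Data.Monoid (Sum (..))

-- Local stand-in for the Comonad class from the comonad package.
class Functor w => Comonad w where
  extract :: w a -> a
  duplicate :: w a -> w (w a)

newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State m) = State $ \s -> let (a, s') = m s in (f a, s')

instance Monoid s => Comonad (State s) where
  extract m = fst (runState m mempty)
  duplicate m = State $ \s ->
    ( State $ \t -> runState m (mappend s t)
    , snd (runState m s)
    )

-- | The first attempt from the text, which leaves the outer state untouched.
duplicateNaive :: Monoid s => State s a -> State s (State s a)
duplicateNaive m = State $ \s -> (State $ \t -> runState m (mappend s t), s)

-- | Sample action over the Sum Int monoid: return the state, then add one.
tick :: State (Sum Int) Int
tick = State $ \(Sum n) -> (n, Sum (n + 1))

-- | Compare two actions by running both on each of the given states.
agreeOn :: (Eq a, Eq s) => [s] -> State s a -> State s a -> Bool
agreeOn states x y = all (\s -> runState x s == runState y s) states

samples :: [Sum Int]
samples = map Sum [-2 .. 3]

leftUnitHolds, rightUnitHolds, naiveRightUnitHolds :: Bool
leftUnitHolds = agreeOn samples (extract (duplicate tick)) tick
rightUnitHolds = agreeOn samples (fmap extract (duplicate tick)) tick
naiveRightUnitHolds = agreeOn samples (fmap extract (duplicateNaive tick)) tick
```

On these samples both unit laws hold for the corrected `duplicate`, while `naiveRightUnitHolds` is False because the naive version discards the state change made by `tick`.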

Some questions:

1. Proving associativity is left as an exercise. It passes visual inspection and my gut feeling, but I haven't bothered to do all the plumbing to check it out. I've been wrong enough before, it'd be nice to check!

[Edit: Simon Marechal (bartavelle) has a coq proof of the associativity and other axioms.]

2. Does this pass the ComonadApply laws?

instance Monoid s => ComonadApply (State s) where
  (<@>) = (<*>)

[Edit: No.]

3. The streams code above suggests at least one kind of use-case, something like merging together changes of position in a stream, analogous to the "zipping monad" you have on infinite streams. But now the positions aren't just Integers, they are arbitrary values taken from any monoid you want. What other kind of spaces might we want to "zip" in this manner?

4. Is there an analogous construction possible for an "update monad" or "coupdate comonad"? Does it require a monoid that acts on a monoid rather than arbitrary state like the semi-direct product of monoids or a "twisted functor"? Is the result if it exists a twisted functor?

5. Does `Co` translate from this comonad to the monad on `Store`?

6. Is there a (co)monad transformer version of these?

Link: Gist

by Edward Kmett at January 06, 2018 08:00 PM

Russell O'Connor

Verifying Bech32 Checksums with Pen and Paper

Today we are going to learn how to verify a Bech32 checksum using only pen and paper. This is useful in those cases where you need to urgently validate the checksum of a Bech32 address, but the electricity has gone out in your home or office and you have lost your smartphone.

We are going to do a worked example of verifying the checksum of BC1SW50QA3JX3S, which is one of the test vectors from the Bech32 specification. However, before we begin, we need to make some preparations. We will need three tables.

The table of power
a xe86fe
c wt5v4t
d 4vljgv
e ukpcrk
f 0reszr
g a7vy57
h k5glc5
j 7xmfyx
k yfatwf
l t2ymv2
m 39zex9
n vmwajm
p ja45ka
q qqqqqq
r lwk4nw
s n4cgp4
t zs638s
u 5yjwly
v 832x73
w 2zf8mz
x hu9r0u
y 60xz20
z dnrp9n
0 clundl
2 sd093d
3 pgduhg
4 m8t7a8
5 f672t6
6 rchdsc
7 eh306h
8 9pshep
9 gjnkuj

The matrix of wisdom
  acde fghj klmn pqrs tuvw xyz0 2345 6789
a q9sy 5420 tzxw ua7d kp3n melj hvgf 8r6c
c 9q4p 3s02 w8rt ecmg ny5k 7u6h jfdv zxla
d s4q5 y96l mjk7 vdwa x3pr tf0z 8uce hn2g
e yp5q s3wt 0xz2 ce6f j94h lamk ngvd r87u
f 53ys qp7m lkj6 gf2e z498 0dtx rcua nhwv
g 4s93 pql6 7hnm fgtc r5yx wv28 zeau jk0d
h 206w 7lq9 pgvy kh58 utme 3n4c axzr dfsj
j 02lt m69q ydfp nj3z ew7u 5ksa cr8x gv4h
k twm0 l7py qfd9 hk4x a26c sj5e u8rz vg3n
l z8jx khgd fqyv 7lu0 5rn3 emas 4w2t 9pc6
m xrkz jnvf dyqg 6mct s8h4 ale5 32w0 p9u7
n wt72 6myp 9vgq jnsr c0la 4h3u ezx8 fd5k
p uevc gfkn h76j qpz3 2ad0 89rw ts54 mlxy
q acde fghj klmn pqrs tuvw xyz0 2345 6789
r 7mw6 2t53 4ucs zrqn gl0d 98pv fjkh eayx
s dgaf ec8z x0tr 3snq mvu7 k5jl 6p9y 2wh4
t knxj zrue a5sc 2tgm qh89 d0fy p67l 34vw
u py39 45tw 2r80 aulv hqsj 6c7n kdfg xzme
v 35p4 9ym7 6nhl dv0u 8sqz 2gwr xaec kjtf
w nkrh 8xeu c34a 0wd7 9jzq g2vp ylm6 5sft
x m7tl 0w35 sea4 8x9k d62g qzyf vhnj ucpr
y eufa dvnk jmlh 9y85 0cg2 zqxt w43s 76rp
z l60m t24s 5ae3 rzpj f7wv yxqd gnhk cu98
0 jhzk x8ca es5u w0vl ynrp ftdq 976m 43g2
2 hj8n rzac u43e t2f6 pkxy vwg9 qml7 s5d0
3 vfug cexr 8w2z s3jp 6dal h4n7 mqy9 t0k5
4 gdcv uaz8 r2wx 54k9 7fem n3h6 lyqp 0tjs
5 fved aurx zt08 45hy lgc6 jskm 79pq w2n3
6 8zhr njdg v9pf m6e2 3xk5 u7c4 st0w qyal
7 rxn8 hkfv gp9d l7aw 4zjs c6u3 50t2 yqem
8 6l27 w0s4 3cu5 x8yh vmtf pr9g dkjn aeqz
9 cagu vdjh n67k y9x4 weft rp82 05s3 lmzq

The list of courage
bc1 rzqrrp
tb1 z5qrrp

Print out these tables and keep them with your emergency supplies so that you can find them when you need them.

Now we can begin. Split the message BC1SW50QA3JX3S into its prefix, BC1, and suffix SW50QA3JX3S. Take the suffix and write it vertically on a piece of paper, leaving a gap after each letter and then a line.













Find the prefix in the list of courage and write the associated word after the first letter, S, placing a diagonal line between each letter. For new prefixes, you may need to add them to the list of courage beforehand.



Take the last letter, which is p, and look it up in the table of power to find its associated word, which is ja45ka. Write this word in the gap under the Srzqrrp, extending the diagonal lines between each letter.


For each letter of this power word, we need to use the matrix of wisdom to add it to the letter above and to the left of it. For example, we look up row S and column j in the matrix of wisdom and we find the letter z. Write z after the W below the line, separating it with a diagonal line again.


We look up row r and column a in the matrix of wisdom to find the number 7. We add 7 after the z, and keep doing this until every pair of letters is done. The matrix of wisdom is symmetric, so you do not have to worry about whether you are looking up by row/column or column/row.


We repeat this process with the next line. First we look up 7 in the table of power to find eh306h and write it underneath.


Then, for each pair of letters, we add them using the matrix of wisdom.


We keep doing this until we go through all the letters of the suffix.


The final result should be pqqqqq, where q is the most powerful letter and p is the wisest letter. If you did not get this result, start over from the beginning because you probably made a mistake. Remember to mind your p's and q's.
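For readers whose electricity is still on, the same check can be sketched in code. The version below assumes the polymod formulation and generator constants from the Bech32 specification (BIP 173); the names `polymod`, `hrpExpand`, and `verifyBech32` are this example's own. It only verifies the checksum and does not enforce the specification's length or mixed-case rules.

```haskell
import Data.Bits (shiftL, shiftR, testBit, xor, (.&.))
import Data.Char (ord, toLower)
import Data.List (elemIndex, foldl')

-- | The Bech32 alphabet; a character's index is its 5-bit value.
charset :: String
charset = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

-- | Generator constants as given in BIP 173.
generators :: [Int]
generators = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]

-- | Fold the 5-bit values into the 30-bit checksum state.
polymod :: [Int] -> Int
polymod = foldl' step 1
  where
    step chk v =
      let top = chk `shiftR` 25
          shifted = ((chk .&. 0x1ffffff) `shiftL` 5) `xor` v
       in foldl' xor shifted [g | (i, g) <- zip [0 ..] generators, testBit top i]

-- | Expand the human-readable part: high bits, a zero, then low bits.
hrpExpand :: String -> [Int]
hrpExpand hrp = map ((`shiftR` 5) . ord) hrp ++ [0] ++ map ((.&. 31) . ord) hrp

-- | Split at the last '1', decode the data part, and test for polymod == 1.
verifyBech32 :: String -> Bool
verifyBech32 input =
  case break (== '1') (reverse (map toLower input)) of
    (revData, '1' : revHrp)
      | not (null revHrp) ->
          case traverse (`elemIndex` charset) (reverse revData) of
            Just values -> polymod (hrpExpand (reverse revHrp) ++ values) == 1
            Nothing -> False
    _ -> False
```

With these definitions `verifyBech32 "BC1SW50QA3JX3S"` evaluates to True, and changing any single character of the suffix makes it False.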

After a couple years of practice doing this by hand, the operations become natural. For example, you learn that x and y equals z, and so forth.

Exercise for the reader: Create a variant of this procedure for computing Bech32 checksums.

P.S. This article is not meant to be taken seriously.

January 06, 2018 04:40 PM

January 05, 2018

Douglas M. Auclair (geophf)

December 2017 1HaskellADay 1Liners problems and solutions

  • December 29th, 2017:
    given f :: Monad m => n -> a -> m (Maybe b)
    define g :: Monad m => n -> a -> m (a, Maybe b)
    using f and ... arrows? Kleisli category?
    • Bazzargh @bazzargh (\n a->liftM ((,) a) (f n a)) ... according to, that's `liftM2 fmap (,) . f` but I can't pretend to get the transformation
  • December 29th, 2017:
    given f :: a -> b
    define g :: [a] -> [Maybe c] -> [(b, c)]

    >>> g [1,2,3] [Just 7, Nothing, Just 10]

    when f = show
    • matt @themattchan
      g = catMaybes ... zipWith (fmap . (,) . f)
      where (...) = (.).(.)
    • garrison @GarrisonLJ g a b = map (***fromJust) . filter (isJust . snd) $ zip a b
    • TJ Takei @karoyakani g = (catMaybes .) . zipWith ((<$>) . (,) . f)
  • December 29th, 2017: define f :: [(a,b)] -> ([a], [b])
    • Андреев Кирилл @nonaem00 and matt @themattchan unzip
    • Victoria C @ToriconPrime f = fmap fst &&& fmap snd
      • (in a vacuum, a more general type signature would be inferred, but the compiler limits itself as instructed)
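Two of the solutions above can be checked directly. This is a small sketch in which `f` is passed as an explicit argument so the block stands alone; the name `unzipPairs` and the sample inputs are chosen for this example.

```haskell
import Data.Maybe (catMaybes)

-- | TJ Takei's solution to the second problem, with f made a parameter.
g :: (a -> b) -> [a] -> [Maybe c] -> [(b, c)]
g f = (catMaybes .) . zipWith ((<$>) . (,) . f)

-- | The third problem, written with fmap as in Victoria C's answer.
unzipPairs :: [(a, b)] -> ([a], [b])
unzipPairs xs = (fmap fst xs, fmap snd xs)

-- | The sample call from the problem statement, with f = show.
example :: [(String, Int)]
example = g show [1 :: Int, 2, 3] [Just 7, Nothing, Just 10]
-- [("1",7),("3",10)]
```

The pair for 2 is dropped because its partner is Nothing, which is the filtering that catMaybes provides.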

by geophf at January 05, 2018 05:55 PM

January 04, 2018

Twan van Laarhoven

Type theory with indexed equality - the theory

In a previous post I introduced the TTIE language, along with a type checker and interpreter. My motivation for writing that (aside from it being fun!) was to explore the type system. At the time I started this project, formalizing this system as a shallow embedding in Agda was not easy. But with the addition of a rewriting mechanism, it has become much easier to use Agda without going insane from having to put substitutions everywhere. So, in this post I will formalize the TTIE type system.

This post is literate Agda, and uses my own utility library. The utility library mainly defines automatic rewrite rules like trans x (sym x) ⟹ refl, which make life a bit more pleasant. All these rewrites use the standard library propositional equality ≡, which I will denote as ⟹ and call meta equality.

{-# OPTIONS --rewriting #-}
module _ where
open import Util.Equality as Meta using (_∎) renaming (_≡_ to _⟹_; refl to □; _≡⟨_⟩_ to _⟹⟨_⟩_; _≡⟨_⟩⁻¹_ to _⟸⟨_⟩_)
open import Data.Product
open import Data.Sum
open import Data.Nat using (ℕ; zero; suc)
open import Data.Vec
open import Function
open import Level renaming (zero to lzero; suc to lsuc)

First we postulate the existence of the interval. I will abbreviate the interval type as I.

postulate I : Set
postulate i₀ : I
postulate i₁ : I

The canonical eliminator for the interval needs equalities, to show that i₀ and i₁ are mapped to equal values. But we haven't defined those yet. However, there is one eliminator that we can define, namely into I, since values in I are always equal.

postulate icase : I → I → I → I
postulate icase-i₀ : ∀ a b → icase a b i₀ ⟹ a
postulate icase-i₁ : ∀ a b → icase a b i₁ ⟹ b
{-# REWRITE icase-i₀ icase-i₁ #-}

And with this icase construct, we can define conjunction, disjunction, and negation

_&&_ : I → I → I
i && j = icase i₀ j i
_||_ : I → I → I
i || j = icase j i₁ i
inot : I → I
inot = icase i₁ i₀

We can define some extra computation rules based on the principle that when evaluating icase a b c, if we use the a branch then c = i₀, and similarly for b.

postulate icase-same : ∀ (a b c : I → I) d → a i₀ ⟹ c i₀ → b i₁ ⟹ c i₁
                     → icase (a d) (b d) d ⟹ c d
icase-const : ∀ a b → icase a a b ⟹ a
icase-id    : ∀ a → icase i₀ i₁ a ⟹ a
icase-i₀-x  : ∀ b → icase i₀ b b ⟹ b
icase-i₁-x  : ∀ b → icase i₁ b b ⟹ i₁
icase-x-i₀  : ∀ a → icase a i₀ a ⟹ i₀
icase-x-i₁  : ∀ a → icase a i₁ a ⟹ a
<details><summary class="comment">Show implementation</summary>
icase-const a b = icase-same (const a) (const a) (const a) b □ □
icase-id a      = icase-same (const i₀) (const i₁) id a □ □
icase-i₀-x b    = icase-same (const i₀) id id b □ □
icase-i₁-x b    = icase-same (const i₁) id (const i₁) b □ □
icase-x-i₀ a    = icase-same id (const i₀) (const i₀) a □ □
icase-x-i₁ a    = icase-same id (const i₁) id a □ □
{-# REWRITE icase-const #-}
{-# REWRITE icase-id #-}
{-# REWRITE icase-i₀-x #-}
{-# REWRITE icase-i₁-x #-}
{-# REWRITE icase-x-i₀ #-}
{-# REWRITE icase-x-i₁ #-}
</details>
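As a sanity check, the computation rules can be tested at the two endpoints with a small Haskell model. This is only a sketch under a strong assumption: the real interval does not permit case analysis, so a two-element type says nothing about points in between. The names `iand`, `ior`, `endpoints`, and `derivedRulesHold` are chosen for this example.

```haskell
-- | Endpoint-only model of the interval (an assumption of this sketch).
data I = I0 | I1
  deriving (Eq, Show, Enum, Bounded)

-- | icase a b i picks a at the i0 endpoint and b at the i1 endpoint.
icase :: I -> I -> I -> I
icase a _ I0 = a
icase _ b I1 = b

-- | Conjunction, disjunction and negation, defined as in the text.
iand, ior :: I -> I -> I
iand i j = icase I0 j i
ior i j = icase j I1 i

inot :: I -> I
inot = icase I1 I0

endpoints :: [I]
endpoints = [minBound .. maxBound]

-- | Each derived rule, checked for every choice of endpoints.
derivedRulesHold :: Bool
derivedRulesHold =
  and
    [ and [icase a a b == a | a <- endpoints, b <- endpoints] -- icase-const
    , and [icase I0 I1 a == a | a <- endpoints]               -- icase-id
    , and [icase I0 b b == b | b <- endpoints]                -- icase-i0-x
    , and [icase I1 b b == I1 | b <- endpoints]               -- icase-i1-x
    , and [icase a I0 a == I0 | a <- endpoints]               -- icase-x-i0
    , and [icase a I1 a == a | a <- endpoints]                -- icase-x-i1
    ]
```

In this model `iand` and `ior` behave like Boolean conjunction and disjunction with I1 playing the role of true.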

The equality type

We can now define the indexed equality type

data Eq {a} (A : I → Set a) : A i₀ → A i₁ → Set a where
  refl : ∀ (x : (i : I) → A i) → Eq A (x i₀) (x i₁)

For convenience we write the non-indexed object level equality as

_≡_ : ∀ {a} {A : Set a} → A → A → Set a
_≡_ {A = A} x y = Eq (\_ → A) x y

And now that we have equalities, we can write down the general dependent eliminator for the interval,

postulate _^_ : ∀ {a A x y} → Eq {a} A x y → (i : I) → A i
postulate ^-i₀   : ∀ {a A x y} x≡y → _^_ {a} {A} {x} {y} x≡y i₀ ⟹ x
postulate ^-i₁   : ∀ {a A x y} x≡y → _^_ {a} {A} {x} {y} x≡y i₁ ⟹ y
postulate ^-refl : ∀ {a A} x → _^_ {a} {A} {x i₀} {x i₁} (refl x) ⟹ x
{-# REWRITE ^-i₀ ^-i₁ ^-refl #-}
infixl 6 _^_

At the same time, the _^_ operator also functions as an eliminator for Eq, projecting out the argument to refl. This also means that we have the following eta contraction rule

refl-eta : ∀ {a A x y} (x≡y : Eq {a} A x y) → refl (\i → x≡y ^ i) ⟹ x≡y -- HIDE a
refl-eta (refl x) = □
{-# REWRITE refl-eta #-}

These definitions are enough to state some object level theorems, such as function extensionality

ext : ∀ {a} {A B : Set a} {f g : A → B} → (∀ x → f x ≡ g x) → f ≡ g -- HIDE a
ext f≡g = refl \i → \x → f≡g x ^ i


cong : ∀ {a b} {A : Set a} {B : Set b} (f : A → B) {x y} → x ≡ y → f x ≡ f y -- HIDE a|b
cong f x≡y = refl \i → f (x≡y ^ i)

and symmetry of ≡,

sym : ∀ {a} {A : Set a} {x y : A} → x ≡ y → y ≡ x -- HIDE a
sym x≡y = refl \i → x≡y ^ inot i

We can also define dependent versions of all of the above, which are the same, only with more general types. I'll leave these as an exercise for the reader.

<details><summary class="comment">spoiler</summary>sym : ∀ {a} {A : I → Set a} {x y} → Eq A x y → Eq (A ∘ inot) y x
sym x≡y = refl \i → x≡y ^ inot i


In general, to make full use of equalities, you would use substitution, also called transport. I will formalize this as

postulate tr : ∀ {a} (A : I → Set a) → A i₀ → A i₁ -- HIDE a

Where tr stands for transport, since we transport a value of type A i₀ along A, to a value of type A i₁. This should be possible, because there is a path between i₀ and i₁, that is, they are indistinguishable, and because functions are continuous. So A is a continuous path between A i₀ and A i₁. In a previous blog post I have used a more general cast primitive, which can be defined in terms of tr,

cast : ∀ {a} (A : I → Set a) → (j₀ j₁ : I) → A j₀ → A j₁ -- HIDE a
cast A j₀ j₁ = tr (\i → A (icase j₀ j₁ i))

And now we can define things like the usual substitution

subst : ∀ {a b} {A : I → Set a} (B : {i : I} → A i → Set b) {x} {y} → Eq A x y → B x → B y -- HIDE a|b
subst B xy = tr (\i → B (xy ^ i))

and the J axiom

jay : ∀ {A : Set} {x : A} (B : {y : A} → x ≡ y → Set) → {y : A} → (x≡y : x ≡ y)
    → B (refl (\_ → x)) → B x≡y
jay B xy = tr (\i → B {xy ^ i} (refl \j → xy ^ (j && i)))

Yay, jay!

Evaluating transport

To be useful as a theory of computation, all primitives in our theory should reduce. In particular, we need to know how to evaluate tr, at least when it is applied to arguments without free variables. We do this by pattern matching on the first argument of tr, and defining transport for each type constructor.

The simplest case is if the type being transported along doesn't depend on the index at all

postulate tr-const : ∀ {a} {A : Set a} {x} → tr (\_ → A) x ⟹ x -- HIDE a
{-# REWRITE tr-const #-}

Much more interesting is the case when the type is a function type. To cast function types, we first transport the argument 'back', apply the function, and then transport the result forward. First look at the non-dependent case, i.e. going from A i₀ → B i₀ to A i₁ → B i₁:

postulate tr-arrow : ∀ {a b} {A : I → Set a} {B : I → Set b} {f} -- HIDE a|b
                   → tr (\i → A i → B i) f
                   ⟹ (\x → tr B (f (cast A i₁ i₀ x)))
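The shape of this rule has a loose Haskell analogue. The sketch below assumes that a "path" between two types can be modelled by a pair of conversion functions, which is much weaker than the interval-indexed types used here; `Iso`, `transportFun`, `viaShow`, and `incrementOnStrings` are hypothetical names for the example.

```haskell
-- | Stand-in for a path between two types: conversions in both directions.
-- (An assumption of this sketch; nothing forces them to be inverses.)
data Iso a b = Iso
  { forward :: a -> b
  , backward :: b -> a
  }

-- | Transport a function: move the argument back, apply, move the result forward.
transportFun :: Iso a a' -> Iso b b' -> (a -> b) -> (a' -> b')
transportFun argPath resPath f = forward resPath . f . backward argPath

-- | Example conversion between Int and its decimal rendering.
viaShow :: Iso Int String
viaShow = Iso {forward = show, backward = read}

-- | The successor function on Int, transported to decimal strings.
incrementOnStrings :: String -> String
incrementOnStrings = transportFun viaShow viaShow (+ 1)
```

Here `backward argPath` plays the role of cast A i₁ i₀ and `forward resPath` the role of tr B.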

The dependent case is a bit more complicated, since the type of the result depends on the transported argument. The result of the function has type B i₀ (cast A i₁ i₀ x), and we have to transport this to B i₁ x. So as we go from i₀ to i₁, we want to "undo" the cast operation. We can do this by changing both i₀'s to i₁'s, to get a value of the type B i₁ (cast A i₁ i₁ x). Because cast A i₁ i₁ x ⟹ x by icase-const and tr-const, this is equivalent to B i₁ x.

postulate tr-pi : ∀ {a b} {A : I → Set a} {B : (i : I) → (A i) → Set b} {f} -- HIDE a|b
                → tr (\i → (x : A i) → B i x) f
                ⟹ (\x → tr (\i → B i (cast A i₁ i x)) (f (cast A i₁ i₀ x)))

Besides function/pi types, there are also product/sigma types. The idea here is similar: transport both parts of the pair independently. Again, the type of the second part can depend on the transported first part,

postulate tr-sigma : ∀ {a b} {A : I → Set a} {B : (i : I) → A i → Set b} {x y} -- HIDE a|b
                   → tr (\i → Σ (A i) (B i)) (x , y)
                   ⟹ (tr A x , tr (\i → B i (cast A i₀ i x)) y)

Finally, let's look at sum types, for which we use simple recursion,

postulate tr-sum₁ : ∀ {a b} {A : I → Set a} {B : I → Set b} {x}
                  → tr (\i → A i ⊎ B i) (inj₁ x) ⟹ inj₁ (tr A x)
postulate tr-sum₂ : ∀ {a b} {A : I → Set a} {B : I → Set b} {x}
                  → tr (\i → A i ⊎ B i) (inj₂ x) ⟹ inj₂ (tr B x)
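For intuition, the non-dependent versions of the product and sum rules can be sketched in Haskell, where transport along each component is again just a function. This is an illustration under that simplifying assumption; trPair and trSum are hypothetical names, and the dependency of the second component on the first is not modelled.

```haskell
-- Simply-typed sketch of tr-sigma and tr-sum₁/tr-sum₂.
-- The names trPair and trSum are hypothetical.

-- | Transport a (non-dependent) pair by transporting both parts.
trPair :: (a0 -> a1) -> (b0 -> b1) -> (a0, b0) -> (a1, b1)
trPair trA trB (x, y) = (trA x, trB y)

-- | Transport a sum by recursing into whichever injection is present.
trSum :: (a0 -> a1) -> (b0 -> b1) -> Either a0 b0 -> Either a1 b1
trSum trA _   (Left x)  = Left (trA x)    -- corresponds to tr-sum₁
trSum _   trB (Right y) = Right (trB y)   -- corresponds to tr-sum₂
```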

Transport for equality types

The final type constructors in our language are equality types, and this is where things get more hairy. The idea is that a type like Eq A x y behaves like A in many respects. Its values will just be wrapped in a refl constructor.

Consider the case of equalities over (dependent) function types. The evaluation rule could look like

postulate tr-eq-pi
            : ∀ {a b} {A : I → I → Set a}
                {B : ∀ i j → A i j → Set b}
                {u : ∀ i → (x : A i i₀) → B i i₀ x}
                {v : ∀ i → (x : A i i₁) → B i i₁ x}
                {f₀ : Eq (\j → (x : A i₀ j) → B i₀ j x) (u i₀) (v i₀)}
            → tr (\i → Eq (\j → (x : A i j) → B i j x) (u i) (v i)) f₀
            ⟹ refl \j → \x →
              let x' = \i' j' → tr (\i → A (icase i₁ i' i) (icase j j' i)) x in
              (tr (\i → Eq (\j' → B i j' (x' i j')) (u i (x' i i₀)) (v i (x' i i₁)))
                  (refl \j' → (f₀ ^ j') (x' i₀ j'))) ^ j

Of course the A in Eq A x y could again be an equality type, and we would have to repeat the construction. To do this systematically, I start by collecting all the 'sides' of the equality type recursively. For example, the sides of Eq (\i → Eq (\j → _) x y) u v are eq (\i → eq (\j → done) x y) u v,

  data Sides {a} : ∀ n (A : Vec I n → Set a) → Set (lsuc a) where
    done : ∀ {A} → Sides zero A
    eq   : ∀ {n A}
         → (sides : (i : I) → Sides n (\is → A (i ∷ is)))
         → Eqs (sides i₀)
         → Eqs (sides i₁)
         → Sides (suc n) A

  Eqs : ∀ {a n A} → Sides {a} n A → Set a
  Eqs {A = A} done = A []
  Eqs {A = A} (eq sides x y) = Eq (\i → Eqs (sides i)) x y

Since I → A are the continuous functions out of the 1-dimensional interval, you can think of a Vec I n → A as a continuous function out of the n-dimensional hypercube. So in geometric terms, we can draw such a function as assigning a value to all elements of the hypercube. Similarly, you can think of Sides {n = n} as a function out of the n-dimensional hypercube with the central cell removed, and Eqs as filling in that central cell.

[Figure: diagrams of Eqs 0, Sides 1, Eqs 1, Vec I 1 → A, Sides 2, Eqs 2, and Vec I 2 → A]

I will spare you the details; see the source code of this post if you are interested. Suffice it to say that if we generalize _^_, icase, etc. from I to Vec I n and from Eq to Eqs, then we can generalize tr-eq-pi to arbitrarily deep Eqs.

tr-eqs-pi-rhs : ∀ {a b n} {A : I → Vec I n → Set a}
                  {B : (i : I) → (is : Vec I n) → A i is → Set b}
              → (sides : (i : I) → Sides n (\js → (x : A i js) → B i js x))
              → Eqs (sides i₀)
              → Eqs (sides i₁)
postulate tr-eqs-pi : ∀ {a b n}
                        {A : I → Vec I n → Set a}
                        {B : (i : I) → (is : Vec I n) → A i is → Set b}
                        (sides : (i : I) → Sides n (\js → (x : A i js) → B i js x))
                        (f₀ : Eqs (sides i₀))
                    → tr (Eqs ∘ sides) f₀ ⟹ tr-eqs-pi-rhs sides f₀

You can do a similar thing for sigma types, except that the types get even messier there because we need a dependently typed map function for Eqs and Sides.

This is the evaluation strategy implemented in the current TTIE interpreter. But it has two issues: 1) it is error prone and ugly 2) we still haven't defined tr (Eq Set u v)

What remains is to define tr (Eq Set u v).

A note about transitivity

Note that transitivity can be defined by transporting along an equality,

trans′ : ∀ {a} {A : Set a} {x y z : A} → x ≡ y → y ≡ z → x ≡ z
trans′ {y = y} x≡y y≡z = tr (\i → (x≡y ^ inot i) ≡ (y≡z ^ i)) (refl \_ → y)

There are several ways to generalize this to dependent types. I'll use a variant that is explicit about the type

trans : ∀ {a} (A : I → I → Set a) {x y z}
      → Eq (\i → A i₀ i) x y
      → Eq (\i → A i i₁) y z
      → Eq (\i → A i i) x z
trans A {y = y} x≡y y≡z = tr (\i → Eq (\j → A (icase i₀ i j) (icase (inot i) i₁ j)) (x≡y ^ inot i) (y≡z ^ i)) (refl \_ → y)

Just as transitivity can be defined in terms of tr, the converse is also true. Instead of specifying transport for nested equality types, we could define tr for Eq types in terms of transitivity and symmetry.

The most general case of such a transport is

xy = fw (\i → Eq (\j → A i j) (ux ^ i) (vy ^ i)) uv

where

ux : Eq (\i → A i i₀) u x
vy : Eq (\i → A i i₁) v y
uv : Eq (\j → A i₀ j) u v

which we can draw in a diagram as a square with corners u : A i₀ i₀, v : A i₀ i₁, x : A i₁ i₀ and y : A i₁ i₁, and with edges uv, ux and vy.

If you ignore the types for now, it seems obvious that

xy = trans (trans (sym ux) uv) vy

So, we could take

postulate tr-eq : ∀ {a} {A : I → I → Set a}
                    (ux : ∀ i → A i i₀)
                    (vy : ∀ i → A i i₁)
                    (uv : Eq (A i₀) (ux i₀) (vy i₀))
                → tr (\i → Eq (A i) (ux i) (vy i)) uv
                ⟹ trans (\i j → A (icase i₁ i j) (icase i i j))
                    (refl (ux ∘ inot)) (trans A uv (refl vy))

I will stick to taking tr as primitive. However, this definition will come in handy for defining transport along paths between types.

Inductive types

It is straightforward to extend the theory with inductive types and higher inductive types. Here are some concrete examples, taken from the HoTT book.

The homotopy circle

postulate Circle : Set
postulate point  : Circle
postulate loop   : Eq (\_ → Circle) point point
postulate Circle-elim : ∀ {a} {A : Circle → Set a}
                      → (p : A point)
                      → (l : Eq (\i → A (loop ^ i)) p p)
                      → (x : Circle) → A x

with the computation rules

postulate elim-point : ∀ {a A p l} → Circle-elim {a} {A} p l point ⟹ p
postulate elim-loop  : ∀ {a A p l i} → Circle-elim {a} {A} p l (loop ^ i) ⟹ l ^ i
{-# REWRITE elim-point #-}
{-# REWRITE elim-loop #-}

Technically we would also need to specify elim for transitive paths (or paths constructed with tr). First the non-dependent version,

postulate Circle-elim′-tr-eq : ∀ {a A p l} (x y : I → Circle) xy i
           → Circle-elim {a} {\_ → A} p l (tr (\j → x j ≡ y j) xy ^ i)
           ⟹ tr (\j → Circle-elim {a} {\_ → A} p l (x j)
                     ≡ Circle-elim {a} {\_ → A} p l (y j))
                  (refl \k → Circle-elim {a} {\_ → A} p l (xy ^ k)) ^ i

To write down the dependent version, it is helpful to first define a generalized version of transport over equality types. This generalized equality transport doesn't just give the final path, but also any of the sides, depending on the argument. Fortunately, it can be defined in terms of the existing transport primitive tr.

treq : ∀ {a} (A : I → I → Set a)
     → (x : ∀ i → A i i₀) (y : ∀ i → A i i₁) (xy : Eq (\j → A i₀ j) (x i₀) (y i₀))
     → (i j : I) → A i j
treq A x y xy i j = tr (\k → Eq (A (i && k)) (x (i && k)) (y (i && k))) xy ^ j

Note that we have

treq-i-i₀ : ∀ {a} A x y xy i → treq {a} A x y xy i i₀ ⟹ x i
treq-i-i₁ : ∀ {a} A x y xy i → treq {a} A x y xy i i₁ ⟹ y i
treq-i₀-j : ∀ {a} A x y xy j → treq {a} A x y xy i₀ j ⟹ xy ^ j
treq-i₁-j : ∀ {a} A x y xy j → treq {a} A x y xy i₁ j ⟹ tr (\i → Eq (A i) (x i) (y i)) xy ^ j

Now the dependent version of commuting Circle-elim for transitive paths looks like this:

postulate Circle-elim-tr-eq : ∀ {a A p l} (x y : I → Circle) xy i
           → Circle-elim {a} {A} p l (tr (\j → x j ≡ y j) xy ^ i)
           ⟹ tr (\j → Eq (\k → A (treq _ x y xy j k))
                          (Circle-elim {a} {A} p l (x j))
                          (Circle-elim {a} {A} p l (y j)))
                 (refl \k → Circle-elim {a} {A} p l (xy ^ k)) ^ i

We also need to continue this for higher paths, but that should be straightforward, if tedious.

-- tedious next step...
postulate Circle-elim-tr-eq-eq : ∀ {a A p ll} (x y : I → I → Circle)
                                   (xy₀ : ∀ k → x k i₀ ≡ y k i₀) (xy₁ : ∀ k → x k i₁ ≡ y k i₁)
                                   xy i j
           → Circle-elim {a} {A} p ll (tr (\k → Eq (\l → x k l ≡ y k l) (xy₀ k) (xy₁ k)) xy ^ i ^ j)
           ⟹ tr (\k → Eq (\l → Eq (\m → A (tr (\k' → Eq (\l' → x (k && k') l' ≡ y (k && k') l')
                                                         (xy₀ (k && k'))
                                                         (xy₁ (k && k'))) xy ^ l ^ m) )
                                   (Circle-elim {a} {A} p ll (x k l))
                                   (Circle-elim {a} {A} p ll (y k l)))
                          (refl \l → Circle-elim {a} {A} p ll (xy₀ k ^ l))
                          (refl \l → Circle-elim {a} {A} p ll (xy₁ k ^ l)))
                 (refl \k → refl \l → Circle-elim {a} {A} p ll (xy ^ k ^ l)) ^ i ^ j


postulate Truncate : Set → Set
postulate box  : ∀ {A} → A → Truncate A
postulate same : ∀ {A} x y → Eq (\_ → Truncate A) x y

module _ {p} {A} {P : Truncate A → Set p}
         (b : (x : A) → P (box x))
         (s : ∀ {x y} (px : P x) (py : P y) → Eq (\i → P (same x y ^ i)) px py) where

  postulate Truncate-elim : (x : Truncate A) → P x

  postulate elim-box  : ∀ x → Truncate-elim (box x) ⟹ b x
  postulate elim-same : ∀ x y i → Truncate-elim (same x y ^ i)
                                ⟹ s (Truncate-elim x) (Truncate-elim y) ^ i

Notice that in the eliminator for every path constructor, we expect an argument of type P "along that path constructor".

Quotient types

postulate _/_      : (A : Set) → (R : A → A → Set) → Set
postulate quot     : ∀ {A R} → A → A / R
postulate eqn      : ∀ {A R} → (x y : A) → R x y → Eq (\_ → A / R) (quot x) (quot y)
postulate truncate : ∀ {A R} → (x y : A / R) → (r s : Eq (\_ → A / R) x y) → r ≡ s

module _ {A R} {P : A / R → Set}
         (q : (x : A) → P (quot x))
         (e : ∀ {x y} (r : R x y) → Eq (\i → P (eqn x y r ^ i)) (q x) (q y))
         (t : ∀ {x y r s} (px : P x) (py : P y)
                (pr : Eq (\i → P (r ^ i)) px py) (ps : Eq (\i → P (s ^ i)) px py)
            → Eq (\i → Eq (\j → P (truncate x y r s ^ i ^ j)) px py) pr ps) where

  postulate /-elim : (x : A / R) → P x

  postulate elim-quot : ∀ x → /-elim (quot x) ⟹ q x
  postulate elim-eqn  : ∀ x y r i → /-elim (eqn x y r ^ i) ⟹ e r ^ i
  postulate elim-truncate : ∀ x y r s i j → /-elim (truncate x y r s ^ i ^ j)
                          ⟹ t (/-elim x) (/-elim y) (refl \k → /-elim (r ^ k)) (refl \k → /-elim (s ^ k)) ^ i ^ j

Indexed types

One caveat to the support of inductive types is indexed types. These are the types with parameters whose value can depend on the constructor, written after the colon in Agda. An obvious example is the standard inductive equality type as it is defined in the standard library,

data _≡_ {A : Set} (x : A) : A → Set where
  refl : x ≡ x

Another example is length-indexed vectors,

data Vec (A : Set) : ℕ → Set where
  [] : Vec A zero
  _∷_ : ∀ {n} → A → Vec A n → Vec A (suc n)

Such inductive types introduce a new kind of equality, and we can't have that in TTIE.

Fortunately, outlawing such definitions is not a big limitation, since any indexed type can be rewritten to a normal inductive type by making the equalities explicit. For example

data Vec (A : Set) (n : ℕ) : Set where
  [] : n ≡ zero → Vec A n
  _∷_ : ∀ {m} → A → Vec A m → n ≡ suc m → Vec A n
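Haskell's GADTs allow the same rewriting, which may help as a comparison: an indexed constructor can instead return the general type and carry the index equation as a constraint. The sketch below is only an analogy in GHC Haskell, not TTIE; the names Vec', Nil', Cons', explicit and toList' are invented for it.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeOperators #-}

-- Hypothetical Haskell rendering of the two Vec definitions above.
data Nat = Zero | Suc Nat

-- Indexed version: each constructor fixes the index in its result type.
data Vec (n :: Nat) a where
  Nil  :: Vec 'Zero a
  Cons :: a -> Vec m a -> Vec ('Suc m) a

-- Explicit-equality version: every constructor returns Vec' n a and
-- carries the equation on n as a constraint.
data Vec' (n :: Nat) a where
  Nil'  :: (n ~ 'Zero) => Vec' n a
  Cons' :: (n ~ 'Suc m) => a -> Vec' m a -> Vec' n a

-- | The indexed form converts to the explicit form constructor by constructor.
explicit :: Vec n a -> Vec' n a
explicit Nil         = Nil'
explicit (Cons x xs) = Cons' x (explicit xs)

-- | Forget the length index and return the elements in order.
toList' :: Vec' n a -> [a]
toList' Nil'         = []
toList' (Cons' x xs) = x : toList' xs
```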


The final ingredient to turn TTIE into a homotopy type theory is the univalence axiom. A univalence primitive might look like this:

postulate univalence : ∀ {a} {A B : Set a}
                     → (f : A → B)
                     → (g : B → A)
                     → (gf : ∀ x → g (f x) ≡ x)
                     → (fg : ∀ x → f (g x) ≡ x)
                     → (fgf : ∀ x → cong′ f (gf x) ≡ fg (f x))
                     → Eq (\_ → Set a) A B

By using an equality constructed with univalence in a transport, you can recover the forward and backward functions,

fw : ∀ {a} {A B : Set a} → A ≡ B → A → B
fw A≡B = tr (_^_ A≡B)

bw : ∀ {a} {A B : Set a} → A ≡ B → B → A
bw A≡B = tr (_^_ A≡B ∘ inot)

as well as the proofs of left and right-inverse,

bw∘fw : ∀ {a} {A B : Set a} → (A≡B : A ≡ B) → ∀ x → bw A≡B (fw A≡B x) ≡ x
bw∘fw A≡B x = refl \j → tr (\i → A≡B ^ icase (inot j) i₀ i)
                       (tr (\i → A≡B ^ icase i₀ (inot j) i) x)

fw∘bw : ∀ {a} {A B : Set a} → (A≡B : A ≡ B) → ∀ x → fw A≡B (bw A≡B x) ≡ x
fw∘bw A≡B x = refl \j → tr (\i → A≡B ^ icase j i₁ i)
                       (tr (\i → A≡B ^ icase i₁ j i) x)

Here the trick is that when j = i₁, the transports become the identity, while otherwise they become fw and bw.

Getting out the adjunction fgf is a bit harder. You need to come up with an expression that reduces to f (gf x ^ k) when j = i₀ and that reduces to (fg (f x) ^ k) when j = i₁. The following does the trick

not-quite-fw∘bw∘fw : ∀ {a} {A B : Set a} → (A≡B : A ≡ B) → ∀ x
                   → cong′ (fw A≡B) (bw∘fw A≡B x) ≡ fw∘bw A≡B (fw A≡B x)
not-quite-fw∘bw∘fw A≡B x = refl \j →
  refl \k → tr (\i → A≡B ^ icase                          (icase i₀ k j) i₁ i)
          $ tr (\i → A≡B ^ icase    (icase (inot k) i₁ j) (icase i₀ k j)    i)
          $ tr (\i → A≡B ^ icase i₀ (icase (inot k) i₁ j)                   i) x

but the type is not right. We want an equality between two equalities, both of type fw (bw (fw x)) ≡ fw x. But instead we get a dependent equality type that mirrors the body of the definition.

To resolve this, we need to add another reduction rule to the language, which states that if you transport from i₀ to i and then to i₁, this is the same as going directly from i₀ to i₁. This should hold regardless of what i is.

postulate tr-tr : ∀ {a} (A : I → Set a) i x → tr (A ∘ icase i i₁) (tr (A ∘ icase i₀ i) x) ⟹ tr A x
postulate tr-tr-i₀ : ∀ {a} A x → tr-tr {a} A i₀ x ⟹ □
postulate tr-tr-i₁ : ∀ {a} A x → tr-tr {a} A i₁ x ⟹ □
{-# REWRITE tr-tr-i₀ tr-tr-i₁ #-}
fw∘bw∘fw : ∀ {a} {A B : Set a} → (A≡B : A ≡ B) → ∀ x
         → cong′ (fw A≡B) (bw∘fw A≡B x) ≡ fw∘bw A≡B (fw A≡B x)
fw∘bw∘fw A≡B x =
  -- same as above, with ugly rewriting details...
  Meta.subst id (cong-Eq
    (ext \j → cong-Eq □ □ (tr-tr (\i → A≡B ^ i) (j) x)) □ □)
    (refl \j → refl \k →
            tr (\i → A≡B ^ icase                          (icase i₀ k j) i₁ i)
          $ tr (\i → A≡B ^ icase    (icase (inot k) i₁ j) (icase i₀ k j)    i)
          $ tr (\i → A≡B ^ icase i₀ (icase (inot k) i₁ j)                   i) x)

Computation rules

The computation rules are now obvious: when fw, bw, etc. are applied to a univalence primitive, return the appropriate field.

module _ {a} {A B} f g gf fg fgf (let AB = univalence {a} {A} {B} f g gf fg fgf) where
  postulate tr-univalence-f : ∀ x → tr (\i → AB ^ i) x ⟹ f x
  postulate tr-univalence-g : ∀ x → tr (\i → AB ^ inot i) x ⟹ g x
  {-# REWRITE tr-univalence-f #-}
  {-# REWRITE tr-univalence-g #-}

  postulate tr-univalence-gf : ∀ x j → tr (\i → AB ^ icase j i₀ i) (tr (\i → AB ^ icase i₀ j i) x) ⟹ gf x ^ inot j
  postulate tr-univalence-fg : ∀ x j → tr (\i → AB ^ icase j i₁ i) (tr (\i → AB ^ icase i₁ j i) x) ⟹ fg x ^ j
  {-# REWRITE tr-univalence-gf #-}
  {-# REWRITE tr-univalence-fg #-}
  -- tr-univalence-fgf omitted

Ideally, we would be able to compute tr for AB ^ f i for any function f, and even

tr (\i → AB ^ f1 i) ∘ ⋯ ∘ tr (\i → AB ^ fn i)

But we quickly run into problems. Consider

  problem : I → I → A → B
  problem j k = tr (\i → AB ^ icase k i₁ i)
              ∘ tr (\i → AB ^ icase j k i)
              ∘ tr (\i → AB ^ icase i₀ j i)

When j=i₁, this reduces to

problem i₁ k = fg ^ k ∘ f

and when k=i₀, it reduces to

problem j i₀ = f ∘ gf ^ j

These two types look a lot like the adjunction fgf, but there are two differences:

1. For the two reductions of problem to be confluent, the two right hand sides should be equal in the meta language (judgementally equal). But an adjunction inside the theory doesn't guarantee this.

2. Even when using fgf, we can not get an expression for problem with the right reductions. The issue is that depending on j and k, problem can represent any of the following compositions

problem i₀ i₀ = f  ∘ id ∘ id
problem i₀ i₁ = id ∘ f  ∘ id
problem i₁ i₀ = f  ∘ g  ∘ f
problem i₁ i₁ = id ∘ id ∘ f

Transporting univalent paths

Finally, we also need to decide how to transport along equality types involving univalence. As I showed previously, transporting along equalities can be defined in terms of transitivity. So that is what we will do here. The idea is that to transport along trans AB BC, you first transport along AB, and then along BC. The same goes for other directions of using this transitive path (bw, fw∘bw, etc.)

module _ {a} {A B C : Set a} (A≡B : A ≡ B) (B≡C : B ≡ C) where
  trans-f : A → C
  trans-f = fw B≡C ∘ fw A≡B

  trans-g : C → A
  trans-g = bw A≡B ∘ bw B≡C

  trans-gf : ∀ x → trans-g (trans-f x) ≡ x
  trans-gf x = cong′ (bw A≡B) (bw∘fw B≡C (fw A≡B x)) ⟨ trans′ ⟩ bw∘fw A≡B x

  trans-fg : ∀ x → trans-f (trans-g x) ≡ x
  trans-fg x = cong′ (fw B≡C) (fw∘bw A≡B (bw B≡C x)) ⟨ trans′ ⟩ fw∘bw B≡C x

  postulate trans-fgf : ∀ x → cong′ trans-f (trans-gf x) ≡ trans-fg (trans-f x)
  -- trans-fgf should be provable, but proof is omitted here

  trans-equivalence : A ≡ C
  trans-equivalence = univalence trans-f trans-g trans-gf trans-fg trans-fgf
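The function parts of this construction are easy to mimic in plain Haskell, which may make the direction of each composition easier to see. This is a sketch under a large simplifying assumption: the proof components gf, fg and fgf cannot be expressed, so only the forward and backward maps are kept. Equiv, transEquiv and symEquiv are hypothetical names.

```haskell
-- | Hypothetical record holding the two directions of an equivalence.
-- The proofs gf, fg and fgf have no counterpart in plain Haskell.
data Equiv a b = Equiv
  { fw :: a -> b
  , bw :: b -> a
  }

-- | Compose two equivalences, mirroring trans-f and trans-g.
transEquiv :: Equiv a b -> Equiv b c -> Equiv a c
transEquiv ab bc = Equiv
  { fw = fw bc . fw ab   -- trans-f = fw B≡C ∘ fw A≡B
  , bw = bw ab . bw bc   -- trans-g = bw A≡B ∘ bw B≡C
  }

-- | Reverse an equivalence by swapping its two directions.
symEquiv :: Equiv a b -> Equiv b a
symEquiv (Equiv f g) = Equiv g f

-- | Example: negate a Bool, then convert it to an Int.
flipThenEnum :: Equiv Bool Int
flipThenEnum = transEquiv (Equiv not not) (Equiv fromEnum toEnum)
```

Note that the backward map applies the second equivalence first: bw flipThenEnum converts the Int to a Bool and only then negates it.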

And we use this transitivity to define transport,

postulate tr-eq-Set : ∀ {a} (A B : I → Set a) (A₀≡B₀ : A i₀ ≡ B i₀)
                    → tr (\i → Eq (\_ → Set a) (A i) (B i)) A₀≡B₀
                    ⟹ trans-equivalence (refl (A ∘ inot)) (trans-equivalence A₀≡B₀ (refl B))

-- special case for fw
tr-tr-eq-Set : ∀ {a} (A B : I → Set a) (A₀≡B₀ : A i₀ ≡ B i₀) x
             → tr (\j → tr (\i → Eq (\_ → Set a) (A i) (B i)) A₀≡B₀ ^ j) x
             ⟹ tr B (tr (_^_ A₀≡B₀) (tr (A ∘ inot) x))
tr-tr-eq-Set A B A₀≡B₀ x = Meta.cong (\A₁≡B₁ → tr (_^_ A₁≡B₁) x) (tr-eq-Set A B A₀≡B₀)

Note that tr-eq-Set cannot be used as a rewrite rule. Agda incorrectly complains about universe levels, and when removing those the rule is accepted, but the file takes more than 10 minutes to type check.

Reduction rules spoiled by univalence

While we are at it, it would be nice if we could add some additional judgemental equalities to the type system. For instance, trans xy (sym xy) = refl \_ → x should hold for all xy.

However, we can not add this as a reduction. The reason is that for paths built with univalence, transporting along the left hand side reduces to bw∘fw, and this is not necessarily the same as reflexivity. Here is an example

-- A path that flips the interval in one direction, but not in the other
-- so fw ∘ bw ≠ refl
flip-I : I ≡ I
flip-I = univalence id inot
  (\i → refl (icase (inot i) i))
  (\i → refl (icase (inot i) i))
  (\i → refl \_ → refl (icase (inot i) i))

module _ (trans-sym : ∀ {a A x y} xy → trans′ {a} {A} {x} {y} xy (sym xy)
                                      ⟹ (refl \_ → x)) where
  problem2 : i₀ ⟹ i₁
  problem2 = Meta.begin
      i₀
    ⟸ tr-tr-eq-Set (_^_ flip-I ∘ inot) (_^_ flip-I ∘ inot) (refl \_ → I) i₁
      tr (\i → trans′ flip-I (sym flip-I) ^ i) i₁
    ⟹ Meta.cong (\AB → tr (\i → AB ^ i) i₁) (trans-sym flip-I)
      i₁

tr (\i → trans flip-I (sym flip-I) ^ i) i₁ evaluates to i₁ with tr-eq-Set, since we follow the equivalence backward and then forward. But according to trans-sym it is an identity path, and so this expression evaluates to i₀. So, we have a term that can evaluate to either i₀ or to i₁, depending on the evaluation order. In other words, reduction is no longer confluent.

This might not seem too bad, since i₀ ≡ i₁ inside the theory. But note that the reduction relation is not a homotopy equality. And it might even be untyped if we were using an untyped meta-theory, like the Haskell TTIE implementation. With a non-confluent reduction relation, it is easy to break the type system,

flip-Bool : Bool ≡ Bool
flip-Bool = univalence not not not-not not-not not-not-not

bad : i₀ ⟹ i₁ → (Bool , false) ⟹ (_,_ {B = id} Bool true)
bad x = Meta.cong {B = Σ Set id} (\i → flip-Bool ^ i , tr (\j → flip-Bool ^ i && j) false) x

worse : i₀ ⟹ i₁ → ⊥
worse x with bad x
... | ()

So, trans-sym is out.

Another seemingly sensible reduction is that cong f (trans xy yz) ⟹ trans (cong f xy) (cong f yz). But, if we also postulate that all paths over the interval can be defined in terms of icase, we end up in the same problematic situation.

module _ (trans-cong : ∀ {a b A B x y z} (f : A → B) xy yz
                     → cong′ f (trans′ {a} {A} {x} {y} {z} xy yz)
                     ⟹ trans′ {b} (cong′ f xy) (cong′ f yz))
         (tr-eq-I : ∀ (j k : I → I) jk₀ → tr (\i → Eq (\_ → I) (j i) (k i)) jk₀ ⟹ refl (icase (j i₁) (k i₁))) where
  trans-sym : ∀ {a A x y} xy → trans′ {a} {A} {x} {y} xy (sym xy) ⟹ (refl \_ → x)
  trans-sym {x = x} xy =
      trans′ xy (sym xy)
    ⟸ trans-cong (_^_ xy) (refl id) (refl inot)
      cong′ (\i → xy ^ i) (trans′ (refl id) (refl inot))
    ⟹ Meta.cong (cong′ (_^_ xy)) (tr-eq-I inot inot (refl \_ → i₁))
      refl (\_ → x)

  problem3 : i₀ ⟹ i₁
  problem3 = problem2 trans-sym

I don't have any solution to these problems, aside from not adding the problematic reductions.

Reductions that do seem fine are those involving only a single path. For instance, things like trans xy (refl \_ → y) ⟹ xy.


What I have presented is the type theory with indexed equality. As mentioned before, there is also a prototype implementation in Haskell.

The theory is quite similar to the cubical system, but it is developed mostly independently.

Some areas I haven't discussed or investigated yet, and some issues with the theory, are:

1. Transitive paths involving HIT path constructors are not reduced, so trans loop (sym loop) is not the same as refl \_ → point. However, the two are provably equal inside the theory. As with a general trans-sym rule, adding such a reduction would break confluence.

2. I have defined a function treq that generalizes tr (Eq ..). This could be taken as a primitive instead of tr. In that case we should further generalize it to take Sides, so that it also works for higher paths.

3. It is possible to combine transports to write terms that do not reduce, for example

x : A
AB : A ≡ B
f : A → Set
y : f (bw AB (fw AB x))
tr (\i → f (tr (\j → AB ^ icase (inot i) i₀ j)
           (tr (\j → AB ^ icase i (inot i) j)
           (tr (\j → AB ^ icase i₀ i j) x)))) y

The tr-tr rule handles one such case, but more are possible. For well-behaved equalities flattening all this out is not a problem, but with univalence the intermediate steps become important.

4. I am not entirely happy with univalence breaking confluence in combination with trans-sym. It means that you have to be really careful about what, seemingly benign, reductions are allowed.

January 04, 2018 09:18 PM

January 02, 2018

Douglas M. Auclair (geophf)

December 2017 1HaskellADay problems and solutions

by geophf at January 02, 2018 05:04 PM

Gabriel Gonzalez

Dhall - Year in review (2017-2018)


The Dhall programmable configuration language is now one year old and this post will review progress over the last year and the future direction of the language in 2018.

If you're not familiar with Dhall, you might want to visit the official GitHub project which is the recommended starting point. This post assumes familiarity with the Dhall language.

Also, I want to use this post to advertise a short survey that you can take if you are interested in the language and would like to provide feedback:


Standardization is currently the highest priority for the language since Dhall cannot feasibly be ported to other languages until the standard is complete. In the absence of the standard, parallel implementations in other languages are chasing a moving and poorly-defined target.

Fortunately, standardization made good progress this year. The completed parts are:

The one missing piece is the standard semantics for import resolution. Once that is complete then alternative implementations should have a well-defined and reasonably stable foundation to build upon.

Standardizing the language should help stabilize the language and increase the barrier to making changes. The goal is that implementations of Dhall in other languages can review and approve/reject proposed changes to the standard. I've already adopted this approach myself by submitting pull requests to change the standard before changing the Haskell implementation. For example:

This process gives interested parties an official place to propose and review changes to the language.

New integrations

One of the higher priorities for the language is integrations with other languages and configuration file formats in order to promote adoption. The integrations added over the past year were:

Also, an initial Dhall binding to JavaScript was prototyped, but has not yet been polished and published:

  • JavaScript (using GHCJS)
    • Supports everything except the import system

All of these integrations have one thing in common: they all build on top of the Haskell implementation of the Dhall configuration language. This is due to necessity: the Dhall language is still in the process of being standardized so depending on the Haskell API is the only realistic way to stay up to date with the language.

Dhall implementations in other languages are incomplete and abandoned, most likely due to the absence of a stable and well-defined specification to target:

The JSON integration has been the most successful integration by far. In fact, many users have been using the JSON integration as a backdoor to integrating Dhall into their favorite programming language until a language-native Dhall integration is available.

Some users have proposed that Dhall should emphasize the JSON integration and de-emphasize language-native Dhall bindings. In other words, they believe that Dhall should primarily market itself as an alternative to Jsonnet or HCL.

So far I've struck a middle ground, which is to market language-native bindings (currently only Haskell and Nix) when they exist and recommend going through JSON as a fallback. I don't want to lean for too long on the JSON integration because:

  • Going through JSON means certain Dhall features (like sum types and functions) are lost in translation.
  • I believe Dhall's future is healthier if the language integrates with a diverse range of configuration file formats and programming languages instead of relying on the JSON integration.
  • I want to avoid a "founder effect" problem in the Dhall community where JSON-specific concerns dwarf other concerns
  • I would like Dhall to eventually be a widely supported configuration format in its own right instead of just a preprocessor for JSON

New language features

Several new language features were introduced in 2017. All of these features were added based on user feedback:

Don't expect as many new language features in 2018. Some evolution is natural in the first year based on user feedback, but I will start rejecting more feature requests unless they have a very high power-to-weight ratio. My goal for 2018 is to slow down language evolution to give alternative implementations a stable target to aim for.

The one likely exception to this rule in 2018 is a widespread user request for "type synonym" support. However, I probably won't get around to fixing this until after the first complete draft of the language standard.


The main tooling features that landed this year were:

Tooling is one of the easier things to build for Dhall since command-line tools can easily depend on the Haskell API, which is maintained, well-documented, and up-to-date with the language.

However, excessive dependence on tooling can sometimes indicate a flaw in the language. For example, the recent constructors keyword proposal originated as something that several people were originally implementing using their own tooling and was elevated to a language feature.

Haskell-specific improvements

Dhall also provides a Haskell API that you can build upon, which also landed some notable improvements:

The language core is simple enough that some people have begun using Dhall as a starting point for implementing their own programming languages. I personally don't have any plans to market or use Dhall for this purpose but I still attempt to support this use case as long as these requests:

  • don't require changes to the language standard
  • don't significantly complicate the implementation
  • don't deteriorate performance


The language's official documentation over the last year has been the Haskell tutorial, but I've been slowly working on porting the tutorial to language-agnostic documentation on a GitHub wiki:

The most notable addition to the wiki is a self-contained JSON-centric tutorial for getting started with Dhall since that's been the most popular integration so far.

The wiki will eventually become the recommended starting point for new users after I flesh the documentation out more.

Robustness improvements

I've also made several "invisible" improvements under the hood:

I've been slowly whittling away at type-checking bugs both based on user reports and also from reasoning about the type-checking semantics as part of the standardization process.

The next step is to use proof assistants to verify the safety of the type-checking semantics but I doubt I will do that in 2018. The current level of assurance is probably good enough for most Dhall users for now. However, anybody interested in doing formal verification can take a stab at it now that the type-checking and normalization semantics have been formalized.

I also plan to turn the test suite into a standard conformance test that other implementations can use to verify that they implemented the standard correctly.


This section covers all the miscellaneous improvements to supporting infrastructure:


I'd also like to thank everybody who helped out over the past year, including:

... and also everybody who filed issues, reported bugs, requested features, and submitted pull requests! These all help improve the language and no contribution is too small.


In 2017 the focus was on responding to user feedback as people started to use Dhall "in anger". In 2018 the initial plan is to focus on standardization, language stabilization, and creating at least one complete alternative implementation native to another programming language.

Also, don't forget to take the language survey if you have time. I will review the feedback from the survey in a separate follow-up post and update the plan for 2018 accordingly.


by Gabriel Gonzalez at January 02, 2018 03:26 PM

December 30, 2017

Douglas M. Auclair (geophf)

November 2017 1HaskellADay 1Liner problem and solutions

  • November 5th, 2017: f :: Map Int [a] -> [b] -> [(Int, b)]
    for, e.g.: f mapping bs
    length bs == length (concat (Map.elems mapping))
    define f
    • Andreas Källberg @Anka213 Using parallel list comprehensions:
      f mp bs = [ (k,b) | (k,as) <-assocs mp, a <- as | b <- bs]
    • Steve Trout @strout f = zip . foldMapWithKey (fmap . const)
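
Both solutions can be tried as a small module. The sketch below assumes the containers package for Data.Map and renames the two versions fComp and fFold (hypothetical names) so they can sit side by side; sampleMapping is an invented test input.

```haskell
{-# LANGUAGE ParallelListComp #-}

import Data.Map (Map)
import qualified Data.Map as Map

-- | Parallel list comprehension (after Andreas Källberg's solution):
-- enumerate each key once per element of its list, zipped against bs.
fComp :: Map Int [a] -> [b] -> [(Int, b)]
fComp mp bs = [ (k, b) | (k, xs) <- Map.assocs mp, _ <- xs | b <- bs ]

-- | Point-free version (after Steve Trout's solution): replace every
-- list element by its key, concatenate in key order, then zip with bs.
fFold :: Map Int [a] -> [b] -> [(Int, b)]
fFold = zip . Map.foldMapWithKey (fmap . const)

-- | Hypothetical sample input: key 1 has two elements, key 2 has one.
sampleMapping :: Map Int String
sampleMapping = Map.fromList [(1, "ab"), (2, "c")]
```

With bs = [10, 20, 30], both functions pair the first two values with key 1 and the last with key 2.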

by geophf at December 30, 2017 02:16 AM

December 25, 2017

Robert Harper

Proofs by contradiction, versus contradiction proofs

It is well-known that constructivists renounce “proof by contradiction”, and that classicists scoff at the critique.  “Those constructivists,” the criticism goes, “want to rule out proofs by contradiction.  How absurd!  Look, Pythagoras showed that the square root of two is irrational by deriving a contradiction from the assumption that it is rational.  There is nothing wrong with this.  Ignore them!”

On examination that sort of critique fails, because a proof by contradiction is not a proof that derives a contradiction.  Pythagoras’s  proof is valid, one of the eternal gems of mathematics.  No one questions the validity of that argument, even if they question proof by contradiction.

Pythagoras’s Theorem expresses a negation: it is not the case that the square root of two can be expressed as the ratio of two integers.  Assume that it can be so represented.  A quick deduction shows that this is impossible.  So the assumption is false.  Done.  This is a direct proof of a negative assertion; it is not a “proof by contradiction”.

What, then, is a proof by contradiction?  It is the affirmation of a positive statement by refutation of its denial.  It is a direct proof of the negation of a negated assertion that is then pressed into service as a direct proof of the assertion, which it is not.  Anyone is free to ignore the distinction for the sake of convenience, as a philosophical issue, or as a sly use of “goto” in a proof, but the distinction nevertheless exists and is important.  Indeed, part of the beauty of constructive mathematics is that one can draw such distinctions. Once drawn, such distinctions can be disregarded; once blurred, forever blurred, irrecoverably.

For the sake of explanation, let me rehearse a standard example of a genuine proof by contradiction.  The claim is that there exist irrationals a and b such that a to the b power is rational.  Here is an indirect proof, a true proof by contradiction.  Let us prove instead that it is impossible that any two irrationals a and b are such that a to the b is irrational.  This is a negative statement, so of course one proves it by deriving a contradiction from assuming that which is negated.  Suppose, for a contradiction, that for every two irrationals a and b, the exponentiation a to the b power is irrational.  We know from Pythagoras that root two is irrational, so plug it in for both a and b, and conclude that root two to the root two power is irrational.  Now use the assumption again, taking a to be root two to the root two, and b to be root two.  Calculate a to the power of b: it is two, which is eminently rational.  Contradiction.
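Spelled out, the last calculation needs only the exponent law \((x^y)^z = x^{yz}\), which holds for positive real \(x\):

\[ \left(\sqrt{2}^{\sqrt{2}}\right)^{\sqrt{2}} = \sqrt{2}^{\,\sqrt{2}\cdot\sqrt{2}} = \sqrt{2}^{\,2} = 2 . \]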

We have now proved that it is not the case that every pair of irrationals, when exponentiated, gives an irrational.  There is nothing questionable about this proof.  But it does not prove that there are two irrationals whose exponentiation is rational!  If you think it does, then I ask you, please name them for me.  That information is not in this proof (there are other proofs that do name them, but that is not relevant for my purposes).  You may, if you wish, disregard the distinction I am drawing, that is your prerogative, and neither I nor anyone has any problem with that.  But you cannot claim that it is a direct proof; it is rather an indirect proof that proceeds by refuting the negative of the intended assertion.

So why am I writing this?  Because I have learned, to my dismay, that in U.S. computer science departments–of all places!–students are being taught, erroneously, that any proof that derives a contradiction is a “proof by contradiction”.  It is not.  Any proof of a negative must proceed by contradiction.  A proof by contradiction is, contrarily, a proof of a positive by refutation of the negative.  This distinction is important, even if you want to “mod out” by it in your work, for it is only by drawing the distinction that one can even define the equivalence with which to quotient.

That’s my main point.  But for those who may not be familiar with the distinction between direct and indirect proof, let me take the opportunity to comment on why one might care to draw such a distinction.  It is a matter of honesty, of a sort: the information content of the foregoing indirect proof does not fulfill the expectation stated in the theorem.  It is a kind of boast, an overstatement, to claim otherwise.  Compare the original statement with the reformulation used in the proof.  The claim that it is not the case that every pair of irrationals exponentiate to an irrational is uncontroversial.  The proof proves it directly, and there is nothing particularly surprising about it.  One would even wonder why anyone would bother to state it.  Yet the supposedly equivalent claim stated at the outset appears much more fascinating, because most people cannot easily think up an example of two irrationals that exponentiate to rationals.  Nor does the proof provide one. Once, when shown the indirect proof, a student of mine blurted out “oh that’s so cheap.”  Precisely.

Why should you care?  Maybe you don’t, but there are nice benefits to keeping the distinction, because it demarcates the boundary between constructive proofs, which have direct interpretation as functional programs, and classical proofs, which have only an indirect such interpretation (using continuations, to be precise, and giving up canonicity).  Speaking as a computer scientist, this distinction matters, and it’s not costly to maintain.  May I ask that you adhere to it?
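To make the reading of proofs as programs a little more concrete, here is a small sketch in Haskell, treating "not A" as a function from A into the empty type. The names Not, noContradiction, dni, tne and notNotLem are illustrative choices rather than anything from a library, and Haskell only approximates a constructive logic, since a non-terminating program inhabits every type.

```haskell
import Data.Void (Void)

-- "Not a" is read as: from a proof of a we can derive a contradiction.
type Not a = a -> Void

-- A direct proof of a negative statement: assume (a and not a), derive absurdity.
noContradiction :: Not (a, Not a)
noContradiction (x, notX) = notX x

-- Double-negation introduction is an ordinary function.
dni :: a -> Not (Not a)
dni x notX = notX x

-- Three negations collapse to one, still without any classical principle.
tne :: Not (Not (Not a)) -> Not a
tne nnn x = nnn (dni x)

-- Excluded middle cannot be refuted; the argument k is used like a continuation.
notNotLem :: Not (Not (Either a (Not a)))
notNotLem k = k (Right (\x -> k (Left x)))

-- Proof by contradiction would be a function of type
--   Not (Not a) -> a
-- for every a, and no total definition of that type can be written here.
```

Every definition above is a direct proof, including the ones whose conclusion is a negation. What is missing is exactly the step from a refuted denial back to the positive statement.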

Edit: rewrote final paragraph, sketchy and irrelevant, and improved prose throughout. Word-smithing, typos.


by Robert Harper at December 25, 2017 11:50 PM

December 21, 2017

Tweag I/O

All about reflection: a tutorial

Arnaud Spiwack

An important device in the tool belt I carry around every day is type class reflection. I don't reach for it often, but it can be very useful. Reflection is a little known device. And for some reason it is often spoken of with a hint of fear.

In this post, I want to convince you that reflection is not hard and that you ought to know about it. To that end, let me invite you to join me on a journey to sort a list:

sortBy :: (a->a->Ordering) -> [a] -> [a]

What is reflection?

Type class reflection is an extension of Haskell which makes it possible to use a value as a type class instance. There is a package on Hackage, implementing type class reflection for GHC, which I will use for this tutorial. Type class reflection being an extension of Haskell (that is, it can't be defined from other Haskell features), this implementation is GHC-specific and will probably not work with another compiler.

Literate Haskell

This blog post was generated from literate Haskell sources. You can find an extracted Haskell source file here.

There is a bit of boilerplate to get out of the way before we start.

{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE UndecidableInstances #-}

module Reflection where

import Data.Proxy
import Data.Reflection

UndecidableInstances... scary, I know. It is unfortunately required. It means that we could technically send the type checker into an infinite loop. Of course, we will be careful not to introduce such loops.

Sorted lists

My goal, today, is to sort a list. In order to make the exercise a tiny bit interesting, I will use types to enforce invariants. I'll start by introducing a type of sorted lists.

newtype SortedList a = Sorted [a]

Obviously, a SortedList is a list: we can just forget about its sortedness.

forget :: SortedList a -> [a]
forget (Sorted l) = l

But how does one construct a sorted list? Well, at the very least, the empty list and the lists of size 1 are always sorted.

nil :: SortedList a
nil = Sorted []

singleton :: a -> SortedList a
singleton a = Sorted [a]

What about longer lists though? We could go about it in several ways. Let's decide to take the union of two sorted lists:

merge :: Ord a => SortedList a -> SortedList a -> SortedList a
merge (Sorted left0) (Sorted right0) = Sorted $ mergeList left0 right0
  where
    -- 'mergeList l1 l2' returns a sorted permutation of 'l1++l2' provided
    -- that 'l1' and 'l2' are sorted.
    mergeList :: Ord a => [a] -> [a] -> [a]
    mergeList [] right = right
    mergeList left [] = left
    mergeList left@(a:l) right@(b:r) =
        if a <= b then
          a : (mergeList l right)
        else
          b : (mergeList left r)

We need Ord a to hold in order to define merge. Indeed, type classes are global and coherent: there is only one Ord a instance, and it is guaranteed that merge always uses the same comparison function for a. This enforces that if Ord a holds, then SortedList a represents lists of a sorted according to the order defined by the unique Ord a instance. In contrast, a function argument defining an order is local to this function call. So if merge were to take the ordering as an extra argument, we could change the order for each call of merge; we couldn't even state that SortedList a are sorted.
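To see what goes wrong when the order is an ordinary argument, consider this sketch. The function mergeBy and the helper isSortedBy are hypothetical names introduced only for the illustration; they are not part of the SortedList interface.

```haskell
-- A hypothetical merge step that takes the ordering as an argument
-- instead of relying on an 'Ord' constraint.
mergeBy :: (a -> a -> Ordering) -> [a] -> [a] -> [a]
mergeBy _ [] right = right
mergeBy _ left [] = left
mergeBy cmp left@(a : l) right@(b : r)
  | cmp a b /= GT = a : mergeBy cmp l right
  | otherwise = b : mergeBy cmp left r

-- Sorted with respect to 'compare'.
ascending :: [Int]
ascending = [1, 3, 5]

-- Also sorted, but with respect to 'flip compare'.
descending :: [Int]
descending = [6, 4, 2]

-- Nothing stops us from merging the two with either order.
mixed :: [Int]
mixed = mergeBy compare ascending descending

-- Checks that adjacent elements respect the given order.
isSortedBy :: (a -> a -> Ordering) -> [a] -> Bool
isSortedBy cmp xs = and (zipWith (\x y -> cmp x y /= GT) xs (drop 1 xs))
```

Here mixed evaluates to [1,3,5,6,4,2], which is sorted for neither order, even though both inputs were sorted. With the Ord constraint, the two arguments of merge are forced to agree on the comparison.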

If it weren't for you meddling type classes

That's it! We are done writing unsafe code. We can sort lists with the SortedList interface: we simply need to split the list in two parts, sort said parts, then merge them (you will have recognised merge sort).

fromList :: Ord a => [a] -> SortedList a
fromList [] = nil
fromList [a] = singleton a
fromList l = merge orderedLeft orderedRight
  where
    orderedLeft = fromList left
    orderedRight = fromList right
    (left,right) = splitAt (div (length l) 2) l

Composing with forget, this gives us a sorting function

sort :: Ord a => [a] -> [a]
sort l = forget (fromList l)

Though that's not quite what we had set out to write. We wanted

sortBy :: (a->a->Ordering) -> [a] -> [a]

It is easy to define sort from sortBy (sort = sortBy compare). But we needed the type class for type safety of the SortedList interface. What to do? We would need to use a value as a type class instance. Ooh! What may have sounded eccentric when I first brought it up is now exactly what we need!

As I said when I discussed the type of merge: one property of type classes is that they are globally attached to a type. It may seem impossible to implement sortBy in terms of sort: if I use sortBy myOrd :: [a] -> [a] and sortBy myOtherOrd :: [a] -> [a] on the same type, then I am creating two different instances of Ord a. This is forbidden.

So what if, instead, we created an entirely new type each time we need an order for a. Something like

newtype ReflectedOrd a = ReflectOrd a

Except that we can't do a newtype every time we call sortBy. So let's make one newtype once and for all, with an additional parameter.

newtype ReflectedOrd s a = ReflectOrd a

-- | Like `ReflectOrd` but takes a `Proxy` argument to help GHC with unification
reflectOrd :: Proxy s -> a -> ReflectedOrd s a
reflectOrd _ a = ReflectOrd a

unreflectOrd :: ReflectedOrd s a -> a
unreflectOrd (ReflectOrd a) = a

Now, we only have to create a new parameter s locally at each sortBy call. This is done like this:

reifyOrd :: (forall s. Ord (ReflectedOrd s a) => …) -> …

What is happening here? The reifyOrd function takes an argument which works for any s. In particular, if every time we called reifyOrd we were to actually use a different s then the program would be correctly typed. Of course, we're not actually creating types: but it is safe to reason just as if we were! For instance if you were to call reifyOrd (reifyOrd x) then x would have two distinct parameters s1 and s2: s1 and s2 behave as names for two different types. Crucially for us, this makes ReflectedOrd s1 a and ReflectedOrd s2 a two distinct types. Hence their Ord instance can be different. This is called a rank 2 quantification.
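A familiar analogue of this rank 2 device lives in base: runST has type (forall s. ST s a) -> a, and its s plays the same role of a locally generated name. The sketch below is only that analogue, not part of the reflection package; sumST is a name made up for the example.

```haskell
import Control.Monad.ST (ST, runST)
import Data.STRef (modifySTRef', newSTRef, readSTRef)

-- Sum a list with a mutable accumulator that cannot escape 'runST'.
sumST :: [Int] -> Int
sumST xs = runST action
  where
    -- The action works for every s, which is what 'runST' demands.
    action :: ST s Int
    action = do
      ref <- newSTRef 0
      mapM_ (\x -> modifySTRef' ref (+ x)) xs
      readSTRef ref

-- By contrast, 'runST (newSTRef 0)' is rejected by the type checker:
-- the result type would mention s outside the scope of the quantifier.
```

Each call to runST behaves as if it had its own fresh s, so references created in one call cannot be used in another. The s in reifyOrd separates Ord instances in the same way.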

In order to export a single reify function, rather than one for every type class, the reflection package introduces a generic type class so that you have:

reify :: forall d r. d -> (forall s. Reifies s d => Proxy s -> r) -> r

Think of d as a dictionary for Ord, and Reifies s d as a way to retrieve that dictionary. The Proxy s is only there to satisfy the type-checker, which would otherwise complain that s does not appear anywhere. To reiterate: we can read s as a unique generated type which is valid only in the scope of the reify function. For completeness, here is the Reifies type class, which just gives us back our d:

class Reifies s d | s -> d where
  reflect :: proxy s -> d

The | s -> d part is called a functional dependency. It is used by GHC to figure out which type class instance to use; we won't have to think about it.
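Before moving on to Ord, it may help to see reify and reflect on a plain value. This is a minimal sketch that assumes the reflection package is installed; scaleAll is a name invented for the example.

```haskell
import Data.Reflection (reflect, reify)

-- Pass 'factor' through a locally generated 'Reifies s Int' instance
-- and read it back with 'reflect' inside the scope of 'reify'.
scaleAll :: Int -> [Int] -> [Int]
scaleAll factor xs = reify factor (\p -> map (* reflect p) xs)
```

Inside the lambda, p has type Proxy s for the freshly generated s, and the functional dependency tells GHC that reflect p is an Int. For a plain Int this buys nothing over using factor directly; the point is only to show the round trip that the Ord instance will rely on.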

Sorting with reflection

All that's left to do is to use reflection to give an Ord instance to ReflectedOrd. We need a dictionary for Ord: in order to build an Ord instance, we need an equality function for the Eq superclass, and a comparison function for the instance proper:

data ReifiedOrd a = ReifiedOrd {
  reifiedEq :: a -> a -> Bool,
  reifiedCompare :: a -> a -> Ordering }

Given a dictionary of type ReifiedOrd, we can define instances for Eq and Ord of ReflectedOrd. But since type class instances only take type class instances as an argument, we need to provide the dictionary as a type class. That is, using Reifies.

instance Reifies s (ReifiedOrd a) => Eq (ReflectedOrd s a) where
  (==) (ReflectOrd x) (ReflectOrd y) =
    reifiedEq (reflect (Proxy :: Proxy s)) x y

instance Reifies s (ReifiedOrd a) => Ord (ReflectedOrd s a) where
  compare (ReflectOrd x) (ReflectOrd y) =
    reifiedCompare (reflect (Proxy :: Proxy s)) x y

Notice that because of the Reifies on the left of the instances GHC does not know that it will for sure terminate during type class resolution (hence the use of UndecidableInstances). However, these are indeed global instances: by definition, they are the only way to have an Ord instance on the ReflectedOrd type! Otherwise GHC would complain.

We are just about done: if we reify a ReifiedOrd a, we have a scoped instance of Ord (ReflectedOrd s a) (for some locally generated s). To sort our list, we simply need to convert between [a] and [ReflectedOrd s a].

sortBy :: (a->a->Ordering) -> [a] -> [a]
sortBy ord l =
  reify (fromCompare ord) $ \ p ->
    map unreflectOrd . sort . map (reflectOrd p) $ l

-- | Creates a `ReifiedOrd` with a comparison function. The equality function
--   is deduced from the comparison.
fromCompare :: (a -> a -> Ordering) -> ReifiedOrd a
fromCompare ord = ReifiedOrd {
  reifiedEq = \x y -> ord x y == EQ,
  reifiedCompare = ord }

Wrap up & further reading

We've reached the end of our journey. And we've seen along the way that we can enjoy the safety of type classes, which makes it safe to write functions like merge in Haskell, while still having the flexibility to instantiate the type class from a function argument, such as options from the command line. Since type class instances are global, such local instances are defined globally for locally generated types. This is what type class reflection is all about.

If you want to delve deeper into the subject of type class reflection, let me, as I'm wrapping up this tutorial, leave you with a few pointers to further material:

  • A talk by Edward Kmett, the author of the reflection package, on the importance of the global coherence of type classes and about reflection
  • There is no built-in support for reflection in GHC; this tutorial by Austin Seipp goes over the very unsafe implementation of the library, which depends on the internal representation used by the compiler
  • John Wiegley discusses an application of reflection in relation with QuickCheck.
  • You may have noticed, in the definition of sortBy, that we map reflectOrd and unreflectOrd in order to convert between a and ReflectedOrd s a. However, while reflectOrd and unreflectOrd have no computational cost, using them in combination with map will traverse the list. If you are dissatisfied with this situation, you will have to learn about the Coercible type class. I would start with this video from Simon Peyton Jones.
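As a rough idea of what Coercible offers for the last point, the sketch below converts whole lists with coerce from Data.Coerce in base. Wrapped is a stand-in newtype with the same shape as ReflectedOrd; wrapAll and unwrapAll are names chosen for the example.

```haskell
import Data.Coerce (coerce)

-- A stand-in for the tutorial's wrapper: a newtype with a phantom parameter s.
newtype Wrapped s a = Wrapped a

-- Convert the whole list at once; no 'map' over the elements is written.
wrapAll :: [a] -> [Wrapped s a]
wrapAll = coerce

-- The opposite direction works the same way.
unwrapAll :: [Wrapped s a] -> [a]
unwrapAll = coerce
```

Because a newtype shares the representation of the type it wraps, GHC accepts these conversions whenever the constructor is in scope, and they are intended to have no run-time cost.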

December 21, 2017 12:00 AM

December 17, 2017

Neil Mitchell

Announcing the 'debug' package

Haskell is a great language, but debugging Haskell is undoubtedly a weak spot. To help with that problem, I've just released the debug library. This library is intended to be simple and easy to use for a common class of debugging tasks, without solving everything. As an example, let's take a function we are interested in debugging, e.g.:

module QuickSort(quicksort) where
import Data.List

quicksort :: Ord a => [a] -> [a]
quicksort [] = []
quicksort (x:xs) = quicksort lt ++ [x] ++ quicksort gt
    where (lt, gt) = partition (<= x) xs

Turn on the TemplateHaskell and ViewPatterns extensions, import Debug, indent your code and place it under a call to debug, e.g.:

{-# LANGUAGE TemplateHaskell, ViewPatterns #-}
module QuickSort(quicksort) where
import Data.List
import Debug

debug [d|
    quicksort :: Ord a => [a] -> [a]
    quicksort [] = []
    quicksort (x:xs) = quicksort lt ++ [x] ++ quicksort gt
        where (lt, gt) = partition (<= x) xs
    |]

We can now run our debugger with:

$ ghci QuickSort.hs
GHCi, version 8.2.1: :? for help
[1 of 1] Compiling QuickSort ( QuickSort.hs, interpreted )
Ok, 1 module loaded.
*QuickSort> quicksort "haskell"
*QuickSort> debugView

The call to debugView starts a web browser to view the recorded information.

From there you can click around to explore the computation.

I'm interested in experiences using debug, and also have a lot of ideas for how to improve it, so feedback or offers of help most welcome at the bug tracker.

If you're interested in alternative debuggers for Haskell, you should check out the GHCi debugger or Hood/Hoed.

by Neil Mitchell at December 17, 2017 10:02 PM

December 15, 2017

Ken T Takusagawa

[agobrown] Longest games of chomp

What Chomp starting positions offer the longest games, perhaps the most possibilities for interesting games?  Among rectangular starting positions, good starting positions are 13x12, 12x11, 10x9, 9x8, 11x6, 7x6, 8x5, 6x5, 5x4.  Missing from the pattern of (N)x(N-1) are 11x10 and 8x7.  (Chomp is weird in how there aren't simple patterns.  It might be a good candidate for machine learning.)

We assumed 3 types of positions in Chomp are instantly known lost (P positions):

  1. L-shaped positions with both arms of the L having unit width and same lengths
  2. 2-row positions of the form [a,a-1]
  3. 3-row positions of the form [a,a-2,2]

The 3-row [a,a-2,2] class of positions is noted in Proposition 2 of "Three-Rowed Chomp" by Doron Zeilberger.  The winning strategy from such a position is as follows:

The base case is [4,2,2] (which looks kind of like a pistol).  If the opponent moves to [3,2,2], then respond moving to [3,2] and follow the 2-row strategy (or move to [3,1,1] and L-shaped strategy).  If [2,2,2] then 2-row strategy vertically.  If [4,1,1] then [3,1,1] and L-shaped strategy.  If [4,2,1] then [2,2,1] and 2-row strategy vertically.  If [4,2] then 2-row strategy.

For larger 3-row positions [a,a-2,2], if the opponent moves in the first 2 rows, leaving at least 4 in the first row and at least 2 in the second row, then restore the position to the shape [b,b-2,2].  If [3,3,2] then [3,1,1] and L-shaped strategy.  If [a,1,1] then [3,1,1] and L-shaped strategy.  If the opponent moves on the third row to [a,a-2,1] then [2,2,1] and follow the 2-row strategy vertically.  If [a,a-2], then 2-row strategy.
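For small boards, claims like these can be checked by brute force. The following is a minimal sketch and not the author's program: it only decides whether the player to move can force a win, it does not compute the Win_in and Lose_in counts, and all names in it are made up for the example. It assumes the containers package that ships with GHC.

```haskell
import qualified Data.Map as Map

-- A position lists the row lengths starting from the poisoned row;
-- lengths are positive and non-increasing. [1] is the lone poison square.
type Position = [Int]

-- Remove every square in row >= i and column >= j (both 0-based).
chomp :: Int -> Int -> Position -> Position
chomp i j p = filter (> 0) (kept ++ map (min j) cut)
  where
    (kept, cut) = splitAt i p

-- All positions reachable in one move that leaves the poison square (0,0).
moves :: Position -> [Position]
moves p =
  [ chomp i j p
  | (i, len) <- zip [0 ..] p
  , j <- [0 .. len - 1]
  , (i, j) /= (0, 0)
  ]

-- Every position that fits inside a rows-by-cols rectangle.
positionsWithin :: Int -> Int -> [Position]
positionsWithin rows cols = concatMap (go cols) [1 .. rows]
  where
    go :: Int -> Int -> [Position]
    go _ 0 = [[]]
    go longest n = [len : rest | len <- [1 .. longest], rest <- go len (n - 1)]

-- True when the player to move can force a win. The argument must be a
-- valid position. The table refers to itself and is filled in lazily,
-- so each position is solved at most once.
isWinning :: Position -> Bool
isWinning [] = error "isWinning: empty position"
isWinning p@(cols : _) = winsFrom p
  where
    winsFrom q = table Map.! q
    table =
      Map.fromList
        [ (q, any (not . winsFrom) (moves q))
        | q <- positionsWithin (length p) cols
        ]
```

With these definitions, isWinning [4,2,2], isWinning [3,1,1] and isWinning [2,1] should all be False, matching the three families of lost positions, while a rectangle such as [4,4,4] comes out as a win for the player to move.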

Here is complete output of all positions within board size 13x13 and Haskell source code.  A selection of some positions and their game values are also given below.  Computing board size 12 required 8.5 GB of RAM on a machine with 16 GB of RAM.  (Haskell programs tend to use a lot of memory unless one puts effort into conserving memory, which we did not do.)

For computing board size 13, we allowed swapping to virtual memory on SSD on a machine with 8 GB of physical RAM.  The output of /usr/bin/time was:

5751.60user 86808.57system 39:48:33elapsed 64%CPU (0avgtext+0avgdata 7192640maxresident)k
10410518744inputs+8outputs (184956202major+316491058minor)pagefaults 0swaps

This suggests a slowdown factor of about 25 for using virtual memory on SSD compared to RAM for this program which made heavy use of Data.Map.  Polling "ps xu" saw a maximum virtual memory usage of 39 GB.  For the output of the board size 13 at the link above, we omitted saving the "Win_in 1" positions to save disk space.

There are only 3 "Lose in 2" positions: [6,3,3]; [5,5,3]; and [5,2,1,1].  Memorize them to get an edge against opponents.  One could also memorize the 7 "Lose in 4" positions, 14 "Lose in 6", 26 "Lose in 8"...

There seem to be some more patterns that lose: [odd,2,1,1,1,...]; [even,3,1,1,1,...]; [even,2,2,2,1,1,1,...]; [even,2,2,1,1,1,...]; [odd,4,1,1,1,...].  These deserve future investigation.  Andries Brouwer's web site suggests that losing families of positions exist in 3-row chomp for [a+11,a+7,5]; [?,?,7]; [?,?,9]; [?,?,11]; [?,?,14] (not 13, once again breaking what seemed to be a simple pattern of odd third rows).  It still needs to be explicitly articulated how to win after giving your opponent these losing positions.  Work by Steven Byrnes suggests the game values of all 3-row Chomp positions can be rapidly computed, though probably not by a human in his or her head.  Future versions of the code should bound not by board size but number of pieces, to investigate thin positions and roughly L-shaped positions.

(Position [13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 12, 5], Win_in 103)
(Position [13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 5], Win_in 103)
(Position [13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13], Win_in 101)
(Position [12, 12, 12, 12, 12, 12, 12, 12, 12, 10, 7], Lose_in 86)
(Position [12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12], Win_in 79)
(Position [11, 11, 11, 10, 10, 10, 10, 10, 2], Win_in 57)
(Position [11, 11, 11, 10, 10, 10, 10, 9, 2], Win_in 57)
(Position [11, 11, 11, 11, 11, 9, 9, 7, 1, 1], Win_in 57)
(Position [11, 11, 11, 11, 11, 9, 9, 9, 1, 1], Win_in 57)
(Position [11, 11, 11, 11, 11, 11], Win_in 43)
(Position [11, 11, 11, 11, 11, 11, 11], Win_in 41)
(Position [11, 11, 11, 11, 11, 11, 11, 11], Win_in 39)
(Position [11, 11, 11, 11, 11, 11, 11, 11, 11], Win_in 37)
(Position [11, 11, 11, 11, 11], Win_in 35)
(Position [11, 11, 11, 11, 11, 11, 11, 11, 11, 11], Win_in 21)
(Position [10, 10, 10, 10, 10, 10, 10, 10, 4], Lose_in 56)
(Position [10, 10, 10, 10, 10, 10, 10, 10, 10], Win_in 55)
(Position [9, 9, 9, 9, 9, 9, 9, 9], Win_in 41)
(Position [8, 8, 8, 8, 8], Win_in 23)
(Position [8, 8, 8, 8, 8, 8], Win_in 15)
(Position [8, 8, 8, 8, 8, 8, 8], Win_in 13)
(Position [7, 7, 7, 7, 7, 7], Win_in 21)
(Position [6, 6, 6, 6, 2], Win_in 13)
(Position [6, 6, 6, 6, 6], Win_in 9)
(Position [5, 5, 5, 5], Win_in 5)
(Position [4, 4, 4, 4], Win_in 1)
(Position [4, 4, 4], Win_in 1)
(Position [4, 4], Win_in 1)
(Position [4], Win_in 1)

(Position [5, 2, 1, 1], Lose_in 2)
(Position [5, 5, 3], Lose_in 2)
(Position [6, 3, 3], Lose_in 2)

(Position [5, 3, 3, 2], Lose_in 4)
(Position [5, 5, 2, 2], Lose_in 4)
(Position [6, 2, 2, 1, 1], Lose_in 4)
(Position [6, 2, 2, 2], Lose_in 4)
(Position [6, 3, 1, 1, 1], Lose_in 4)
(Position [7, 2, 1, 1, 1, 1], Lose_in 4)
(Position [7, 4, 3], Lose_in 4)

(Position [6, 4, 3, 3, 2], Lose_in 6)
(Position [7, 2, 2, 2, 2], Lose_in 6)
(Position [7, 3, 2, 1, 1, 1], Lose_in 6)
(Position [7, 3, 2, 2], Lose_in 6)
(Position [7, 3, 3, 1, 1], Lose_in 6)
(Position [7, 3, 3, 2, 1, 1], Lose_in 6)
(Position [7, 4, 1, 1, 1], Lose_in 6)
(Position [7, 5, 3, 2], Lose_in 6)
(Position [7, 7, 4], Lose_in 6)
(Position [8, 2, 2, 1, 1, 1, 1], Lose_in 6)
(Position [8, 2, 2, 2, 1, 1], Lose_in 6)
(Position [8, 3, 1, 1, 1, 1, 1], Lose_in 6)
(Position [8, 4, 4], Lose_in 6)
(Position [9, 2, 1, 1, 1, 1, 1, 1], Lose_in 6)

(Position [6, 4, 4, 3, 3], Lose_in 8)
(Position [6, 6, 3, 3, 3], Lose_in 8)
(Position [6, 6, 4, 3, 2], Lose_in 8)
(Position [7, 3, 3, 3, 2, 2], Lose_in 8)
(Position [7, 4, 2, 2, 2, 2], Lose_in 8)
(Position [7, 4, 4, 2], Lose_in 8)
(Position [7, 5, 3, 3, 1, 1], Lose_in 8)
(Position [7, 7, 3, 3], Lose_in 8)
(Position [8, 3, 2, 2, 2], Lose_in 8)
(Position [8, 3, 3, 3], Lose_in 8)
(Position [8, 4, 2, 1, 1, 1], Lose_in 8)
(Position [8, 4, 2, 2], Lose_in 8)
(Position [8, 5, 1, 1, 1], Lose_in 8)
(Position [8, 5, 4, 2], Lose_in 8)
(Position [9, 2, 2, 2, 2, 1, 1], Lose_in 8)
(Position [9, 2, 2, 2, 2, 2], Lose_in 8)
(Position [9, 3, 2, 1, 1, 1, 1, 1], Lose_in 8)
(Position [9, 3, 2, 2, 1, 1, 1], Lose_in 8)
(Position [9, 4, 1, 1, 1, 1, 1], Lose_in 8)
(Position [9, 4, 4, 1, 1], Lose_in 8)
(Position [9, 5, 3, 1, 1, 1, 1], Lose_in 8)
(Position [9, 5, 4], Lose_in 8)
(Position [10, 2, 2, 1, 1, 1, 1, 1, 1], Lose_in 8)
(Position [10, 2, 2, 2, 1, 1, 1, 1], Lose_in 8)
(Position [10, 3, 1, 1, 1, 1, 1, 1, 1], Lose_in 8)
(Position [11, 2, 1, 1, 1, 1, 1, 1, 1, 1], Lose_in 8)

by Ken at December 15, 2017 08:05 AM

December 14, 2017

Mike Izbicki

how to cheat at settlers by loading the dice

(and prove it with p-values)

posted on 2017-12-14

tl;dr This post shows how to create loaded dice, and how to use these dice to gain between 5 and 15 additional resource cards per game of Settlers of Catan. Surprisingly, we’ll prove that standard scientific tests are not powerful enough to determine that the dice are unfair while playing a game. This essentially means that it’s impossible for your opponents to scientifically prove that you’re cheating. This impossibility is due to methodological defects in the current state of scientific practice, and we’ll highlight some ongoing work to fix these defects.

Loading the dice

My copy of Settlers of Catan came with two normal wooden dice. To load these dice, I placed them in a small plate of water overnight, leaving the 6 side exposed.

The submerged area absorbed water, becoming heavier. My hope was that when rolled, the heavier wet sides would be more likely to land face down, and the lighter dry side would be more likely to land face up. So by leaving the 6 exposed, I was hoping to create dice that roll 6’s more often.

This effect is called the bias of the dice. To measure this bias, my wife and I spent the next 7 days rolling dice while eating dinner. (She must love me a lot!)

In total, we rolled the dice 4310 times. The raw results are shown below.

die face           1      2      3      4      5      6
number of rolls    622    698    650    684    666    812
probability        0.151  0.169  0.157  0.165  0.161  0.196

Looking at the data, it’s “obvious” that our dice are biased: The 6 gets rolled more times than any of the other numbers. Before we prove this bias formally, however, let’s design a strategy to exploit this bias while playing Settlers of Catan.

A strategy for loaded dice

The key to winning at Settlers of Catan is to get a lot of resources. We want to figure out how many extra resources we can get using our biased dice.

First, let’s quickly review the rules. Each settlement is placed on the corner of three tiles, and each tile has a number token. Whenever the dice are rolled, if they add up to one of the numbers on the tokens, you collect the corresponding resource card. For example:

A good settlement will be placed next to numbers that will be rolled often.

To make strategizing easier, the game designers put helpful dots on each token below the number. These dots count the ways to roll that token’s number using two dice.

We can use these dots to calculate the probability of rolling each number. For example, a \(4\) can be rolled in three ways. If we name our two dice \(A\) and \(B\), then the possible combinations are \((A=1,B=3)\), \((A=2,B=2)\), \((A=3,B=1)\). To calculate the probability of rolling a 4, we calculate the probability of each of these rolls and add them together. For fair dice, the probability of every roll is the same \((1/6)\), so the calculation is:

\[\begin{align} Pr(A+B = 4) &= Pr(A = 1)Pr(B=3) + Pr(A=2)Pr(B=2) + Pr(A=3)Pr(B=1) \\ &= (1/6)(1/6) + (1/6)(1/6) + (1/6)(1/6) \\ &= 1/12 \\ &\approx 0.08333 \end{align}\]

For our biased dice, the probability of each roll is different. Using the numbers from the table above, we get:

\[\begin{align} Pr(A+B = 4) &= Pr(A = 1)Pr(B=3) + Pr(A=2)Pr(B=2) + Pr(A=3)Pr(B=1) \\ &= (0.151)(0.157) + (0.169)(0.169) + (0.157)(0.151) \\ &= 0.07597 \end{align}\]

So rolling a \(4\) is now less likely with our biased dice. Performing this calculation for each possible number gives us the following chart.

All the numbers below \(7\) are now less likely, and the numbers above 7 are now more likely. The shift is small, but it has important strategic implications.
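The same calculation can be repeated for every total with a few lines of Haskell. This is a sketch; the per-face probabilities are copied from the table of measured rolls, and the names biased, fair and sumProbability are chosen for the example.

```haskell
-- Per-face probabilities for faces 1..6, copied from the table of rolls.
biased :: [Double]
biased = [0.151, 0.169, 0.157, 0.165, 0.161, 0.196]

-- A fair die for comparison.
fair :: [Double]
fair = replicate 6 (1 / 6)

-- Probability that two independent dice with the given face
-- distribution add up to 'total'.
sumProbability :: [Double] -> Int -> Double
sumProbability die total =
  sum
    [ pa * pb
    | (a, pa) <- zip [1 ..] die
    , (b, pb) <- zip [1 ..] die
    , a + b == total
    ]

-- The distribution over all totals 2..12.
sumDistribution :: [Double] -> [(Int, Double)]
sumDistribution die = [(t, sumProbability die t) | t <- [2 .. 12]]
```

Evaluating sumProbability biased 4 reproduces the hand calculation above, and comparing sumDistribution biased with sumDistribution fair shows the shift towards the higher totals.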

Consider the two initial settlement placements below.

The naughty player knows that the dice are biased and puts her settlements on locations with high numbers, but the nice player doesn’t know the dice are biased and puts her settlements on locations with low numbers. Notice that if the dice were fair, both settlement locations would be equally good because they have the same number of dots.

The following formula calculates the average number of cards a player receives on each dice roll:

\[ \text{expected cards per roll} = \sum_{\text{adjacent tokens}} Pr(A+B=\text{token value}) \]

Substituting the appropriate values gives us the following results.

expected cards per roll
               naughty   nice
fair dice      0.500     0.500
biased dice    0.543     0.457

So the difference between the naughty and nice player is \(0.086\) cards per roll of the biased dice. A typical game of Settlers contains about 60 dice rolls (about 15 turns per player in a 4 player game), so this results in \(0.086*60=5.16\) more cards for the naughty player.
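The formula for expected cards per roll translates directly into code. In this sketch the token sets highTokens and lowTokens are hypothetical placements chosen to be mirror images of each other; they are not the settlements from the post, so the numbers they produce differ from the table above.

```haskell
-- Per-face probabilities for faces 1..6, copied from the table of rolls.
biased :: [Double]
biased = [0.151, 0.169, 0.157, 0.165, 0.161, 0.196]

fair :: [Double]
fair = replicate 6 (1 / 6)

-- Probability that two independent dice add up to 'total'.
sumProbability :: [Double] -> Int -> Double
sumProbability die total =
  sum
    [ pa * pb
    | (a, pa) <- zip [1 ..] die
    , (b, pb) <- zip [1 ..] die
    , a + b == total
    ]

-- Expected cards per roll: one term per adjacent number token.
expectedCards :: [Double] -> [Int] -> Double
expectedCards die tokens = sum (map (sumProbability die) tokens)

-- Hypothetical placements with the same number of dots on each side of 7.
highTokens, lowTokens :: [Int]
highTokens = [8, 9, 10]
lowTokens = [6, 5, 4]

-- Extra cards over a game, given a per-roll advantage and a number of rolls.
extraCards :: Double -> Int -> Double
extraCards perRoll rolls = perRoll * fromIntegral rolls
```

With fair dice the two hypothetical placements are worth the same, while with the biased dice highTokens comes out ahead. extraCards 0.086 60 is the 5.16 cards quoted above.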

And this is only considering the two starting settlements. As the game progresses, more settlements will be built, and some settlements will be upgraded to cities (which receive two cards per roll instead of one). Calculating the exact effect of these additional sources of cards is difficult because these improvements will be built at random points throughout the game. We’ll have to make some additional assumptions.

If we assume that the naughty player gets 0.043 more cards per roll per settlement/city than the nice player (this exact number will vary depending on the quality of the settlement), and that both players build settlement/cities at turns 10,20,25,30,35,40,45, and 50, then the naughty player will on average receive 15.050 more cards than the nice player.

To summarize, the naughty player will receive somewhere between 5 and 15 more resource cards depending on how their future settlements and cities are built. This advantage can’t guarantee a victory, but it’ll definitely help.

A scientific analysis

Now we’re going to do some simple statistics to prove two things:
  1. The dice really are biased. So the fact that the 6 was rolled more times than the other numbers wasn’t just due to random chance.
  2. There are not enough dice rolls in a game of Settlers for our opponents to scientifically prove that the dice are biased. So it’s scientifically impossible for our opponents to know that we’re cheating.

To show that the dice are biased, we will use a standard scientific technique called null hypothesis significance testing. We begin by assuming a hypothesis that we want to disprove. In our case, we assume that the dice are not biased. In other words, we assume that each number on the dice has a \(1/6\approx 0.167\) chance of being rolled. Our goal is to show that under this assumption, the number of 6’s rolled above is very unlikely. We therefore conclude that our hypothesis is also unlikely, and that the dice probably are in fact biased.

More formally, we let \(X\) be a random variable that represents the total number of 6’s we would roll if we were to repeat our initial experiment with fair dice. Then \(X\) follows a binomial distribution whose density is plotted below.

The \(p\)-value for our experiment is defined informally to be the probability of getting results similar to the results we observed if the dice are not biased. The formal definition and formula is \[ p\text{-value}= Pr(X\ge k) = 1-\sum_{i=0}^{k-1} {n\choose i} q^i(1-q)^{n-i} , \]

where \(n\) is the total number of dice rolls (4310), \(k\) is the number of 6’s actually rolled (812), and \(q\) is the assumed probability of rolling a 6 (1/6). Substituting these numbers gives us \[ p\text{-value}= Pr(X\ge k) \approx 0.0000884 . \] In other words, if we repeated this experiment one million times with fair dice, we would expect to get results similar to the results we actually got only 88 times. Since this is so unlikely, we conclude that our original assumption (that the dice are not biased) is probably false. Most science classes teach that \(p\)-values less than 0.05 are “significant.” We are very far below that threshold, so our result is “very significant.”
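For readers who want to reproduce such numbers, here is a sketch of the tail probability \(Pr(X\ge k)\) in Haskell. It adds up the terms for \(i=k,\dots,n\) directly and works with logarithms, because the binomial coefficients for \(n\) in the thousands do not fit in a Double. The name binomialTail is made up for the example, and the function assumes \(0<q<1\).

```haskell
-- Pr(X >= k) for X ~ Binomial(n, q), assuming 0 < q < 1.
binomialTail :: Int -> Int -> Double -> Double
binomialTail n k q
  | k <= 0 = 1
  | k > n = 0
  | otherwise = sum (map exp logTerms)
  where
    -- log Pr(X = k), using log C(n,k) = sum over j < k of log ((n-j)/(j+1)).
    logStart =
      sum [log (fromIntegral (n - j) / fromIntegral (j + 1)) | j <- [0 .. k - 1]]
        + fromIntegral k * log q
        + fromIntegral (n - k) * log (1 - q)
    -- log Pr(X = i+1) - log Pr(X = i).
    step i = log (fromIntegral (n - i) / fromIntegral (i + 1)) + log q - log (1 - q)
    -- log Pr(X = i) for i = k .. n.
    logTerms = scanl (+) logStart (map step [k .. n - 1])
```

The \(p\)-value above corresponds to the call binomialTail 4310 812 (1/6). On a tiny case the result is easy to check by hand: binomialTail 2 1 0.5 is the chance of at least one head in two fair coin flips, which is 0.75.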

Our \(p\)-value is so low because the number of trials we conducted was very large \((n=4310)\). In a typical game of Settlers, however, there will be many fewer trials. This makes it hard for our opponents to prove that we’re cheating.

We said before that there are 60 dice rolls in a typical game. Since we have two dice, that means \(n=120\). To keep the math simple, we’ll assume that we roll an average number of 6’s. That is, the number of sixes rolled during the game is \[ k=812\cdot \frac{120}{4310}\approx23. \] Substituting into our formula for the \(p\)-value, we get \[ p\text{-value}=Pr(X\ge k) \approx 0.265 . \] In words, this means that if the dice were actually fair, then we would still roll at least this number of 6’s \(26.5\%\) of the time. Since this probability is so high, the standard scientific protocol tells us to conclude that we have no “significant” evidence that the dice are biased. (Notice that this is subtly different from having evidence that the dice are not biased! Confusing these two statements is a common mistake, even for trained PhD scientists, and especially for medical doctors.)

So how many games can we play without getting caught? It turns out that if we play 6 games (so \(n=6*120=720\), and \(k=812\cdot(720/4310)\approx136\)), then the resulting \(p\)-value is 0.05. In other words, as long as we play fewer than 6 games, then our opponents won’t have enough data to conclude that their measurements of the biased dice are “significant.” The standard scientific method won’t prove we’re cheating.
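The tail probability \(Pr(X\ge k)\) used above can be computed exactly. The following Haskell sketch is one way to do it; the names `choose` and `pValue` are illustrative, and exact `Rational` arithmetic is an assumption chosen for clarity rather than speed.

```haskell
import Data.Ratio ((%))

-- | Binomial coefficient C(n, i), computed exactly with Integers.
choose :: Integer -> Integer -> Integer
choose n i = product [n - i + 1 .. n] `div` product [1 .. i]

-- | Pr(X >= k) for X ~ Binomial(n, q): the probability of seeing at
-- least k sixes in n rolls when each roll shows a six with probability q.
pValue :: Integer -> Integer -> Rational -> Rational
pValue n k q =
  sum [ fromInteger (choose n i) * q ^ i * (1 - q) ^ (n - i)
      | i <- [k .. n] ]

-- | Convenience wrapper returning a Double for display.
pValueD :: Integer -> Integer -> Rational -> Double
pValueD n k q = fromRational (pValue n k q)
```

For example, `pValueD 120 23 (1 % 6)` evaluates the single-game case, and `pValueD 720 136 (1 % 6)` the six-game case. The full experiment (`pValueD 4310 812 (1 % 6)`) also works, but is slow with this naive formulation.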

Some flaws with the \(p\)-value and “significance”

The \(p\)-value argument above is how most scientists currently test their hypotheses. But there are some major flaws with this approach. For example:

  1. The \(p\)-value test doesn’t use all the available information. In particular, our opponents may have other reasons to believe that the dice are loaded. If you look closely at the dice, you’ll notice some slight discoloration where they were submerged in water.

    This discoloration was caused because the water spread the ink on the die’s face. If you see similar discoloration on the dice in your game, it makes sense to be extra suspicious about the dice’s bias.

    Unfortunately, there’s no way to incorporate this suspicion into the \(p\)-value analysis we conducted above. An alternative to the \(p\)-value called the Bayes factor can incorporate this prior evidence. So if our opponent uses a Bayes factor analysis, they may be able to determine that we’re cheating. The Bayes factor is more complicated than the \(p\)-value, however, and so it is not widely taught to undergraduate science majors. It is rarely even used in PhD-level scientific publications, and many statisticians are calling for increased use of these more sophisticated analysis techniques.

  2. Another weakness of the \(p\)-value test is that false positives are very common. Using the standard significance threshold of \(p\le0.05\) means that 5 of every 100 games played with fair dice will have “significant” evidence that the dice are biased to roll 6’s. Common sense, however, tells us that cheating at Settlers of Catan is almost certainly not this common because most people just don’t want to cheat. But when you run many experiments, some of them will give “significant” results just by random chance. This is one of the many reasons why some scientists have concluded that most published research is false. This effect is thought to be one of the reasons that evidence of extrasensory perception (ESP) continues to be published in scientific journals. Some less scrupulous scientists exploit this deficiency in a process called p-hacking to make their research seem more important.

    To alleviate the problem of false positives, a group of statisticians is proposing a new significance threshold of \(p\le0.005\) for a result to qualify as “significant”. While this reduces the risk of false positives, it also makes detecting true effects harder. Under this new criterion, we’d have to play 16 games (for \(n=1920\) dice rolls) to get statistically significant evidence that the dice are biased.

At this point, you might be feeling overwhelmed at the complexity of statistical analysis. And this is just for the toy problem of detecting loaded dice in a game. Real world problems like evaluating the effectiveness of chemotherapy drugs are much more complicated, and so require much more complicated statistical analyses. Doing science is hard!

Edit after peer review: Vijay Lulla sent me the following message:

The blog mentions that you rolled the dice 4310 times and all your calculations are based on it, but the frequency table adds up to 4312.

Whooops! It looks like I messed up my addition. Fortunately, this mistake is small enough that it won’t affect any of the numbers in the article by much.

A lot of people mistakenly think that peer review is where other scientists repeat an experiment to test the conclusion. But that’s not the case. The purpose of peer review is for scientists like Vijay to do a sanity check on the whole procedure to make sure obvious mistakes like this get caught. Sadly, another common mistake in science is that researchers don’t publish their data, so there’s no way for checks like this to be performed.

If this were a real publication in a scientific journal, I would redo all the calculations. But since it’s not, I’ll leave the mistake for posterity.

Edit 2: There’s a good discussion on reddit’s /r/statistics. This discussion provides a much more nuanced view about significance testing than my discussion above, and a few users point out ways that I might be overstating some conclusions.

December 14, 2017 12:00 AM

December 12, 2017

Neil Mitchell

Benchmarking strchr vs memchr

Summary: memchr is faster, but the obvious implementation seems to beat the builtin versions.

There are two related C functions for finding the next character in a string - strchr which assumes the string has a NUL character at the end, and memchr which takes the string length as an argument. For strings where you have the size and a NUL terminator, which is fastest? Using gcc 6.2.0 64bit MSYS2 on Windows 10, searching for a single byte 10M bytes along a string, the times were (fastest to slowest):

Trying on 3 different Windows computers, the results are all similar (but scaled).

Given the choice, you should prefer memchr over strchr.

Surprise result

The optimised implementations shipped with GCC are slower than the obvious C implementations taken from a wiki. I have absolutely no idea why. From what I can tell, the builtin versions are coded in assembly, operating on multiple bytes at a time, using SSE instructions. In contrast, the C variants operate on a single byte at a time, and aren't vectorised by the optimiser according to Godbolt. If anyone has an explanation I'd be keen to hear it.

Benchmark Code

To benchmark the variants I wrote a Haskell program using criterion. The full code and build instructions are available in this gist. I compiled the C code with -O3, using the gcc shipped with GHC 8.2.1. I've reproduced the Haskell code below, with some comments:

-- Import all the necessary pieces
import qualified Data.ByteString as BS
import qualified Data.ByteString.Unsafe as BS
import Criterion.Main
import Foreign
import Foreign.C.Types
import Data.Monoid

-- Make all the C imports
foreign import ccall unsafe "string.h memchr" memchr_std :: Ptr Word8 -> CInt -> CSize -> IO (Ptr Word8)
foreign import ccall unsafe "string.h strchr" strchr_std :: Ptr Word8 -> CInt -> IO (Ptr Word8)
foreign import ccall unsafe memchr_c :: Ptr Word8 -> CInt -> CSize -> IO (Ptr Word8)
foreign import ccall unsafe strchr_c :: Ptr Word8 -> CInt -> IO (Ptr Word8)

-- Method for ignoring the size when using strchr
ignoreSize f a b _ = f a b

-- Build a suitable string with an interesting character i bytes along
cstr i = BS.replicate i 32 <> BS.singleton 64 <> BS.replicate i 32 <> BS.singleton 0

-- The functions to benchmark
funs =
    [("memchr_std", memchr_std)
    ,("strchr_std", ignoreSize strchr_std)
    ,("memchr_c", memchr_c)
    ,("strchr_c", ignoreSize strchr_c)]

-- The main function, using Criterion
main = defaultMain
    [ seq bs $ bench (show i ++ " " ++ name) $ whnfIO $ test fun bs
    | i <- [1,10,100,1000,10000,100000,1000000,10000000]
    , let bs = cstr i
    , (name, fun) <- funs]

-- The function under test and input string
{-# NOINLINE test #-}
test fun bs =
    BS.unsafeUseAsCStringLen bs $ \(ptr,len) ->
        fun (castPtr ptr) 64 (fromIntegral len)

by Neil Mitchell at December 12, 2017 04:56 PM

December 11, 2017

Jeremy Gibbons

Streaming Arithmetic Coding

In the previous post we saw the basic definitions of arithmetic encoding and decoding, and a proof that decoding does indeed successfully retrieve the input. In this post we go on to show how both encoding and decoding can be turned into streaming processes.

Producing bits

Recall that

\displaystyle  \begin{array}{@{}l} \mathit{encode}_0 :: \mathit{Model} \rightarrow [\mathit{Symbol}] \rightarrow \mathit{Rational} \\ \mathit{encode}_0\;m = \mathit{pick} \cdot \mathit{foldr}\;\mathit{narrow}\;\mathit{unit} \cdot \mathit{encodeSyms}\;m \vrule width0pt depth2ex \\ \mathit{decode}_0 :: \mathit{Model} \rightarrow \mathit{Rational} \rightarrow [\mathit{Symbol}] \\ \mathit{decode}_0\;m\;x = \mathit{unfoldr}\;\mathit{step}\;(m,x) \end{array}

Encoding and decoding work together. But they work only in batch mode: encoding computes a fraction, and yields nothing until the last step, and so decoding cannot start until encoding has finished. We really want encoding to yield as the encoded text a list of bits representing the fraction, rather than the fraction itself, so that we can stream the encoded text and the decoding process. To this end, we replace {\mathit{pick} :: \mathit{Interval} \rightarrow \mathit{Rational}} by {\mathit{pick}_2 = \mathit{fromBits} \cdot \mathit{toBits}}, where

\displaystyle  \begin{array}{@{}l} \mathbf{type}\;\mathit{Bit} = \mathit{Integer} - \mbox{\quad 0 or 1 only} \vrule width0pt depth2ex \\ \mathit{toBits} :: \mathit{Interval} \rightarrow [\mathit{Bit}] \\ \mathit{fromBits} :: [\mathit{Bit}] \rightarrow \mathit{Rational} \end{array}

The obvious definitions have {\mathit{toBits}\;i} yield the shortest binary expansion of any fraction within {i}, and {\mathit{fromBits}} evaluate this binary expansion. However, we don’t do quite this—it turns out to prevent the streaming condition from holding—and instead arrange for {\mathit{toBits}} to yield the bit sequence that when extended with a 1 yields the shortest expansion of any fraction within {i} (and indeed, the shortest binary expansion necessarily ends with a 1), and {\mathit{fromBits}} compute the value with this 1 appended.

\displaystyle  \begin{array}{@{}l} \mathit{fromBits} = \mathit{foldr}\;\mathit{pack}\;(\frac 1 2) \vrule width0pt depth2ex \\ \mathit{pack} :: \mathit{Bit} \rightarrow \mathit{Rational} \rightarrow \mathit{Rational} \\ \mathit{pack}\;b\;x = (b + x) / 2 \vrule width0pt depth2ex \\ \mathit{toBits} = \mathit{unfoldr}\;\mathit{nextBit} \vrule width0pt depth2ex \\ \mathit{nextBit} :: \mathit{Interval} \rightarrow \mathsf{Maybe}\;(\mathit{Bit}, \mathit{Interval}) \\ \mathit{nextBit}\;(l,r) \\ \begin{array}[t]{@{\quad}clcl} | & r \le \frac 1 2 &=& \mathit{Just}\;(0, (0, \frac 1 2) \mathbin{\triangleleft} (l,r)) \\ | & \frac 1 2 \le l &=& \mathit{Just}\;(1, (\frac 1 2,1) \mathbin{\triangleleft} (l,r)) \\ | & \mathbf{otherwise} &=& \mathit{Nothing} \end{array} \end{array}

Thus, if {r \le \frac 1 2} then the binary expansion of any fraction within {[l,r)} starts with 0; and similarly, if {\frac 1 2 \le l}, the binary expansion starts with 1. Otherwise, the interval {[l,r)} straddles {\frac 1 2}; the shortest binary expansion within it is the expansion of {\frac 1 2}, so we yield the empty bit sequence.

Note that {\mathit{pick}_2 = \mathit{fromBits} \cdot \mathit{toBits}} is a hylomorphism, so we have

\displaystyle  \begin{array}{@{}l} \mathit{pick}_2\;(l,r) \\ \begin{array}[t]{@{\quad}clcl} | & r \le \frac 1 2 &=& \mathit{pick}_2\;((0,\frac 1 2) \mathbin{\triangleleft} (l,r)) / 2 \\ | & \frac 1 2 \le l &=& (1 + \mathit{pick}_2\;((\frac 1 2,1) \mathbin{\triangleleft} (l,r))) / 2 \\ | & \mathbf{otherwise} &=& \frac 1 2 \end{array} \end{array}

Moreover, it is clear that {\mathit{toBits}} yields a finite bit sequence for any non-empty interval (since the interval doubles in width at each step, and the process stops when it includes {\frac 1 2}); so this equation serves to uniquely define {\mathit{pick}_2}. In other words, {\mathit{nextBit}} is a recursive coalgebra. Then it is a straightforward exercise to prove that {i \ni \mathit{pick}_2\;i}; so although {\mathit{pick}} and {\mathit{pick}_2} differ, they are sufficiently similar for our purposes.
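The definitions above transcribe almost directly into Haskell. In this sketch the helpers `weight`, `scale`, `narrow` and `widen` are assumed to be the interval operations from the previous post (with {i \mathbin{\triangleright} j} written `narrow i j` and {i \mathbin{\triangleleft} j} written `widen i j`); they are restated here only to make the example self-contained, and `within` is a hypothetical helper for the membership test {i \ni x}.

```haskell
import Data.List (unfoldr)
import Data.Ratio ((%))

type Interval = (Rational, Rational)   -- half-open interval [l, r)
type Bit      = Integer                -- 0 or 1 only

-- Assumed interval helpers (restated from the previous post).
weight, scale :: Interval -> Rational -> Rational
weight (l, r) x = l + (r - l) * x
scale  (l, r) x = (x - l) / (r - l)

widen :: Interval -> Interval -> Interval
widen i (p, q) = (scale i p, scale i q)

-- Emit a bit while the interval lies wholly in one half of the unit interval.
nextBit :: Interval -> Maybe (Bit, Interval)
nextBit (l, r)
  | r <= 1 % 2 = Just (0, widen (0, 1 % 2) (l, r))
  | 1 % 2 <= l = Just (1, widen (1 % 2, 1) (l, r))
  | otherwise  = Nothing

toBits :: Interval -> [Bit]
toBits = unfoldr nextBit

pack :: Bit -> Rational -> Rational
pack b x = (fromInteger b + x) / 2

-- Evaluate the bit sequence with an implicit final 1 appended.
fromBits :: [Bit] -> Rational
fromBits = foldr pack (1 % 2)

pick2 :: Interval -> Rational
pick2 = fromBits . toBits

-- Hypothetical helper: does the half-open interval contain x?
within :: Interval -> Rational -> Bool
within (l, r) x = l <= x && x < r
```

For instance, `toBits (5 % 8, 3 % 4)` yields `[1,0,1]`, and `fromBits [1,0,1]` is `11 % 16`, which does lie within the original interval.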

Now we redefine encoding to yield a bit sequence rather than a fraction, and decoding correspondingly to consume that bit sequence:

\displaystyle  \begin{array}{@{}l} \mathit{encode}_1 :: \mathit{Model} \rightarrow [\mathit{Symbol}] \rightarrow [\mathit{Bit}] \\ \mathit{encode}_1\;m = \mathit{toBits} \cdot \mathit{foldr}\;\mathit{narrow}\;\mathit{unit} \cdot \mathit{encodeSyms}\;m \vrule width0pt depth2ex \\ \mathit{decode}_1 :: \mathit{Model} \rightarrow [\mathit{Bit}] \rightarrow [\mathit{Symbol}] \\ \mathit{decode}_1\;m = \mathit{decode}_0\;m \cdot \mathit{fromBits} \end{array}

That is, we move the {\mathit{fromBits}} part of {\mathit{pick}_2} from the encoding stage to the decoding stage.

Streaming encoding

Just like {\mathit{encode}_0}, the new version {\mathit{encode}_1} of encoding consumes all of its input before producing any output, so does not work for encoding infinite inputs, nor for streaming execution even on finite inputs. However, it is nearly in the right form to be a metamorphism—a change of representation from lists of symbols to lists of bits. In particular, {\mathit{narrow}} is associative, and {\mathit{unit}} is its unit, so we can replace the {\mathit{foldr}} with a {\mathit{foldl}}:

\displaystyle  \mathit{encode}_1\;m = \mathit{unfoldr}\;\mathit{nextBit} \cdot \mathit{foldl}\;\mathit{narrow}\;\mathit{unit} \cdot \mathit{encodeSyms}\;m

Now that {\mathit{encode}_1} is in the right form, we must check the streaming condition for {\mathit{narrow}} and {\mathit{nextBit}}. We consider one of the two cases in which {\mathit{nextBit}} is productive, and leave the other as an exercise. When {r \le \frac 1 2}, and assuming {\mathit{unit} \supseteq (p,q)}, we have:

\displaystyle  \begin{array}{@{}cl} & \mathit{nextBit}\;((l,r) \mathbin{\triangleright} (p,q)) \\ = & \qquad \{ \mathit{narrow} \} \\ & \mathit{nextBit}\;(\mathit{weight}\;(l,r)\;p, \mathit{weight}\;(l,r)\;q) \\ = & \qquad \{ (l,r) \ni \mathit{weight}\;(l,r)\;q \mbox{, so in particular } \mathit{weight}\;(l,r)\;q < r \le \frac 1 2 \} \\ & \mathit{Just}\;(0, (0, \frac 1 2) \mathbin{\triangleleft} ((l,r) \mathbin{\triangleright} (p,q))) \\ = & \qquad \{ \mathit{widen} \mbox{ associates with } \mathit{narrow} \mbox{ (see below)} \} \\ & \mathit{Just}\;(0, ((0, \frac 1 2) \mathbin{\triangleleft} (l,r)) \mathbin{\triangleright} (p,q)) \end{array}

as required. The last step is a kind of associativity property:

\displaystyle  i \mathbin{\triangleleft} (j \mathbin{\triangleright} k) = (i \mathbin{\triangleleft} j) \mathbin{\triangleright} k

whose proof is left as another exercise. Therefore the streaming condition holds, and we may fuse the {\mathit{unfoldr}} with the {\mathit{foldl}}, defining

\displaystyle  \begin{array}{@{}l} \mathit{encode}_2 :: \mathit{Model} \rightarrow [\mathit{Symbol}] \rightarrow [\mathit{Bit}] \\ \mathit{encode}_2\;m = \mathit{stream}\;\mathit{nextBit}\;\mathit{narrow}\;\mathit{unit} \cdot \mathit{encodeSyms}\;m \end{array}

which streams the encoding process: the initial bits are output as soon as they are fully determined, even before all the input has been read. Note that {\mathit{encode}_1} and {\mathit{encode}_2} differ, in particular on infinite inputs (the former diverges, whereas the latter does not); but they coincide on finite symbol sequences.
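As a sketch of the fused encoder, the following Haskell code assumes the usual definition of `stream` from the earlier posts on metamorphisms, restated here along with the interval helpers. To stay independent of any particular model, the example works directly on the list of intervals that {\mathit{encodeSyms}\;m} would produce; the names `encodeBatch` and `encodeStream` are illustrative.

```haskell
import Data.List (unfoldr)
import Data.Ratio ((%))

type Interval = (Rational, Rational)
type Bit      = Integer

unit :: Interval
unit = (0, 1)

weight, scale :: Interval -> Rational -> Rational
weight (l, r) x = l + (r - l) * x
scale  (l, r) x = (x - l) / (r - l)

narrow, widen :: Interval -> Interval -> Interval
narrow i (p, q) = (weight i p, weight i q)
widen  i (p, q) = (scale i p, scale i q)

nextBit :: Interval -> Maybe (Bit, Interval)
nextBit (l, r)
  | r <= 1 % 2 = Just (0, widen (0, 1 % 2) (l, r))
  | 1 % 2 <= l = Just (1, widen (1 % 2, 1) (l, r))
  | otherwise  = Nothing

-- Assumed definition of stream: produce output while possible,
-- otherwise consume one more input; stop when both are exhausted.
stream :: (b -> Maybe (c, b)) -> (b -> a -> b) -> b -> [a] -> [c]
stream g f b x = case g b of
  Just (c, b') -> c : stream g f b' x
  Nothing      -> case x of
    a : x' -> stream g f (f b a) x'
    []     -> []

-- Batch version: fold everything, then unfold.
encodeBatch :: [Interval] -> [Bit]
encodeBatch = unfoldr nextBit . foldl narrow unit

-- Streaming version: interleave consumption and production.
encodeStream :: [Interval] -> [Bit]
encodeStream = stream nextBit narrow unit
```

The two agree on finite inputs, but only the streaming version is productive on infinite ones: `take 3 (encodeStream (cycle [(1 % 2, 1)]))` returns `[1,1,1]`, whereas the batch version would diverge.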

Streaming decoding

Similarly, we want to be able to stream decoding, so that we don’t have to wait for the entire encoded text to arrive before starting decoding. Recall that we have so far

\displaystyle  \mathit{decode}_1\;m = \mathit{decode}_0\;m \cdot \mathit{fromBits}

where {\mathit{decode}_0} is an {\mathit{unfoldr}} and {\mathit{fromBits}} a {\mathit{foldr}}. The first obstacle to streaming is that {\mathit{foldr}}, which we need to be a {\mathit{foldl}} instead. We have

\displaystyle  \textstyle \mathit{fromBits} = \mathit{foldr}\;\mathit{pack}\;(\frac 1 2)

Of course, {\mathit{pack}} is not associative; it doesn’t even have the right type for that. But we can view each bit in the input as a function on the unit interval: bit 0 is represented by the function {\mathit{weight}\;(0,\frac 1 2)} that focusses into the lower half of the unit interval, and bit 1 by the function {\mathit{weight}\;(\frac 1 2, 1)} that focusses into the upper half. The fold itself composes a sequence of such functions; and since function composition is associative, this can be written equally well as a {\mathit{foldr}} or a {\mathit{foldl}}. Having assembled the individual focussers into one composite function, we finally apply it to {\frac 1 2}. (This is in fact an instance of a general trick for turning a {\mathit{foldr}} into a {\mathit{foldl}}, or vice versa.) Thus, we have:

\displaystyle  \textstyle \mathit{fromBits}\;bs = \mathit{foldl}\;\mathit{focusf}\;\mathit{id}\;bs\;(\frac 1 2) \quad\mathbf{where}\; \mathit{focusf}\;h\;b = h \cdot \mathit{weight}\;(\mathit{half}\;b)

where {\mathit{half}} yields either the lower or the upper half of the unit interval:

\displaystyle  \begin{array}{@{}lcl} \multicolumn{3}{@{}l}{\mathit{half} :: \mathit{Bit} \rightarrow \mathit{Interval}} \\ \mathit{half}\;0 &=& (0, \frac 1 2) \\ \mathit{half}\;1 &=& (\frac 1 2, 1) \end{array}

In fact, not only may the individual bits be seen as focussing functions {\mathit{weight}\;(0, \frac 1 2)} and {\mathit{weight}\;(\frac 1 2, 1)} on the unit interval, so too may compositions of such functions:

\displaystyle  \begin{array}{@{}lcl} \mathit{id} &=& \mathit{weight}\;\mathit{unit} \\ \mathit{weight}\;i \cdot \mathit{weight}\;j &=& \mathit{weight}\;(i \mathbin{\triangleright} j) \end{array}

So any such composition is of the form {\mathit{weight}\;i} for some interval {i}, and we may represent it concretely by {i} itself, and retrieve the function via {\mathit{weight}}:

\displaystyle  \textstyle \mathit{fromBits}\;bs = \mathit{weight}\;(\mathit{foldl}\;\mathit{focus}\;\mathit{unit}\;bs)\;(\frac 1 2) \quad\mathbf{where}\; \mathit{focus}\;i\;b = i \mathbin{\triangleright} \mathit{half}\;b
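The equivalence of the two formulations of {\mathit{fromBits}} can be checked on examples. In this Haskell sketch, `fromBitsR` is the original {\mathit{foldr}} version and `fromBitsL` the {\mathit{foldl}} version over intervals; both names are illustrative, and `weight` and `narrow` are assumed to be the interval helpers from the previous post.

```haskell
import Data.Ratio ((%))

type Interval = (Rational, Rational)
type Bit      = Integer

unit :: Interval
unit = (0, 1)

weight :: Interval -> Rational -> Rational
weight (l, r) x = l + (r - l) * x

narrow :: Interval -> Interval -> Interval
narrow i (p, q) = (weight i p, weight i q)

-- Original right-fold definition.
pack :: Bit -> Rational -> Rational
pack b x = (fromInteger b + x) / 2

fromBitsR :: [Bit] -> Rational
fromBitsR = foldr pack (1 % 2)

-- Lower or upper half of the unit interval.
-- In this sketch any non-zero bit is treated as 1.
half :: Bit -> Interval
half 0 = (0, 1 % 2)
half _ = (1 % 2, 1)

focus :: Interval -> Bit -> Interval
focus i b = narrow i (half b)

-- Left-fold definition: focus the unit interval bit by bit,
-- then take the midpoint of the result.
fromBitsL :: [Bit] -> Rational
fromBitsL bs = weight (foldl focus unit bs) (1 % 2)
```

For example, both `fromBitsR [1,0,1]` and `fromBitsL [1,0,1]` evaluate to `11 % 16`; the left fold passes through the intervals (1/2, 1), (1/2, 3/4) and (5/8, 3/4) on the way.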

So we now have

\displaystyle  \textstyle \mathit{decode}_1\;m = \mathit{unfoldr}\;\mathit{step} \cdot \mathit{prepare}\;m \cdot \mathit{foldl}\;\mathit{focus}\;\mathit{unit} \quad\mathbf{where}\; \mathit{prepare}\;m\;i = (m, \mathit{weight}\;i\;(\frac 1 2))

This is almost in the form of a metamorphism, except for the occurrence of the adapter {\mathit{prepare}\;m} in between the unfold and the fold. It is not straightforward to fuse that adapter with either the fold or the unfold; fortunately, however, we can split it into the composition

\displaystyle  \mathit{prepare}\;m = \mathit{prepare}_2 \cdot \mathit{prepare}_1\;m

of two parts, where

\displaystyle  \begin{array}{@{}lcl} \multicolumn{3}{@{}l}{\mathit{prepare}_1 :: \mathit{Model} \rightarrow \mathit{Interval} \rightarrow (\mathit{Model}, \mathit{Interval})} \\ \mathit{prepare}_1\;m\;i &=& (m,i) \vrule width0pt depth2ex \\ \multicolumn{3}{@{}l}{\mathit{prepare}_2 :: (\mathit{Model}, \mathit{Interval}) \rightarrow (\mathit{Model}, \mathit{Rational})} \\ \mathit{prepare}_2\;(m,i) &=& (m, \mathit{weight}\;i\;(\frac 1 2)) \end{array}

in such a way that the first part {\mathit{prepare}_1} fuses with the fold and the second part {\mathit{prepare}_2} fuses with the unfold. For fusing the first half of the adapter with the fold, we just need to carry around the additional value {m} with the interval being focussed:

\displaystyle  \mathit{prepare}_1\;m \cdot \mathit{foldl}\;\mathit{focus}\;i = \mathit{foldl}\;\mathit{mfocus}\;(m,i)

where

\displaystyle  \begin{array}{@{}l} \mathit{mfocus} :: (\mathit{Model}, \mathit{Interval}) \rightarrow \mathit{Bit} \rightarrow (\mathit{Model}, \mathit{Interval}) \\ \mathit{mfocus}\;(m,i)\;b = (m, \mathit{focus}\;i\;b) \end{array}

For fusing the second half of the adapter with the unfold, let us check the fusion condition. We have (exercise!):

\displaystyle  \begin{array}{@{}l} \mathit{step}\;(\mathit{prepare}_2\;(m, i)) = \mathit{fmap}\;\mathit{prepare}_2\;(\mathit{Just}\;(s, (\mathit{newModel}\;m\;s, \mathit{encodeSym}\;m\;s \mathbin{\triangleleft} i))) \\ \qquad\mathbf{where}\;s = \mathit{decodeSym}\;m\;(\mathit{weight}\;i\;(\frac 1 2)) \end{array}

where the {\mathit{fmap}} is the functorial action for the base functor of the {\mathsf{List}} datatype, applying just to the second component of the optional pair. We therefore define

\displaystyle  \begin{array}{@{}l} \mathit{stepi} :: (\mathit{Model}, \mathit{Interval}) \rightarrow \mathsf{Maybe}\;(\mathit{Symbol}, (\mathit{Model}, \mathit{Interval})) \\ \mathit{stepi}\;(m,i) = \mathit{Just}\;(s, (\mathit{newModel}\;m\;s, \mathit{encodeSym}\;m\;s \mathbin{\triangleleft} i)) \\ \qquad\mathbf{where}\;s = \mathit{decodeSym}\;m\;(\mathit{weight}\;i\;(\frac 1 2)) \end{array}

and have

\displaystyle  \mathit{step}\;(\mathit{prepare}_2\;(m, i)) = \mathit{fmap}\;\mathit{prepare}_2\;(\mathit{stepi}\;(m,i))

and therefore

\displaystyle  \mathit{unfoldr}\;\mathit{step} \cdot \mathit{prepare}_2 = \mathit{unfoldr}\;\mathit{stepi}

Note that the right-hand side will eventually lead to intervals that exceed the unit interval. When {j \supseteq i}, it follows that {\mathit{unit} \supseteq j \mathbin{\triangleleft} i}; but the unfolding process keeps widening the interval without bound, so it will necessarily eventually exceed the unit bounds. We return to this point shortly.

We have therefore concluded that

\displaystyle  \begin{array}{@{}lcl} \mathit{decode}_1\;m &=& \mathit{unfoldr}\;\mathit{step} \cdot \mathit{prepare}\;m \cdot \mathit{foldl}\;\mathit{focus}\;\mathit{unit} \\ &=& \mathit{unfoldr}\;\mathit{step} \cdot \mathit{prepare}_2 \cdot \mathit{prepare}_1\;m \cdot \mathit{foldl}\;\mathit{focus}\;\mathit{unit} \\ &=& \mathit{unfoldr}\;\mathit{stepi} \cdot \mathit{foldl}\;\mathit{mfocus}\;(m,\mathit{unit}) \end{array}

Now we need to check the streaming condition for {\mathit{mfocus}} and {\mathit{stepi}}. Unfortunately, this is never going to hold: {\mathit{stepi}} is always productive, so {\mathit{stream}\;\mathit{stepi}\;\mathit{mfocus}} will only take production steps and never consume any input. The problem is that {\mathit{unfoldr}\;\mathit{stepi}} is too aggressive, and we need to use the more cautious flushing version of streaming instead. Informally, the streaming process should be productive from a given state {(m,i)} only when the whole of interval {i} maps to the same symbol in model {m}, so that however {i} is focussed by subsequent inputs, that symbol cannot be invalidated.

More formally, note that

\displaystyle  \mathit{unfoldr}\;\mathit{stepi} = \mathit{apo}\;\mathit{safestepi}\;(\mathit{unfoldr}\;\mathit{stepi})

where

\displaystyle  \begin{array}{@{}l} \mathit{safestepi} :: (\mathit{Model}, \mathit{Interval}) \rightarrow \mathsf{Maybe}\;(\mathit{Symbol}, (\mathit{Model}, \mathit{Interval})) \\ \mathit{safestepi}\;(m,i) \\ \begin{array}[t]{@{\quad}clcl} | & \mathit{safe}\;(m,i) &=& \mathit{stepi}\;(m,i) \\ | & \mathbf{otherwise} &=& \mathit{Nothing} \end{array} \end{array}

and

\displaystyle  \begin{array}{@{}l} \mathit{safe} :: (\mathit{Model},\mathit{Interval}) \rightarrow \mathit{Bool} \\ \mathit{safe}\;(m, i) = \mathit{encodeSym}\;m\;s \supseteq i \quad\mathbf{where}\; s = \mathit{decodeSym}\;m\;(\mathit{midpoint}\;i) \end{array}

That is, the interval {i} is “safe” for model {m} if it is fully included in the encoding of some symbol {s}; then all elements of {i} decode to {s}. Then, and only then, we may commit to outputting {s}, because no further input bits could lead to a different first output symbol.

Note now that the interval remains bounded by the unit interval during the streaming phase, because of the safety check in {\mathit{safestepi}}, although it will still exceed the unit interval during the flushing phase. However, at this point we can undo the fusion we performed earlier, “fissioning” {\mathit{unfoldr}\;\mathit{stepi}} into {\mathit{unfoldr}\;\mathit{step} \cdot \mathit{prepare}_2} again: this manipulates rationals rather than intervals, so there is no problem with intervals getting too wide. We therefore have:

\displaystyle  \mathit{decode}_1\;m = \mathit{apo}\;\mathit{safestepi}\;(\mathit{unfoldr}\;\mathit{step} \cdot \mathit{prepare}_2) \cdot \mathit{foldl}\;\mathit{mfocus}\;(m,\mathit{unit})

Now let us check the streaming condition for {\mathit{mfocus}} and the more cautious {\mathit{safestepi}}. Suppose that {(m,i)} is a productive state, so that {\mathit{safe}\;(m,i)} holds, that is, all of interval {i} is mapped to the same symbol in {m}, and let

\displaystyle  \begin{array}{@{}lcl} s &=& \mathit{decodeSym}\;m\;(\mathit{midpoint}\;i) \\ m' &=& \mathit{newModel}\;m\;s \\ i' &=& \mathit{encodeSym}\;m\;s \mathbin{\triangleleft} i \end{array}

so that {\mathit{safestepi}\;(m,i) = \mathit{Just}\;(s, (m',i'))}. Consuming the next input {b} leads to state {(m, \mathit{focus}\;i\;b)}. This too is a productive state, because {i \supseteq \mathit{focus}\;i\;b} for any {b}, and so the whole of the focussed interval is also mapped to the same symbol {s} in the model. In particular, the midpoint of {\mathit{focus}\;i\;b} is within interval {i}, and so the first symbol produced from the state after consumption coincides with the symbol {s} produced from the state before consumption. That is,

\displaystyle  \mathit{safestepi}\;(\mathit{mfocus}\;(m,i)\;b) = \mathit{Just}\;(s, \mathit{mfocus}\;(m', i')\;b)

as required. We can therefore rewrite decoding as a flushing stream computation:

\displaystyle  \begin{array}{@{}l} \mathit{decode}_2 :: \mathit{Model} \rightarrow [\mathit{Bit}] \rightarrow [\mathit{Symbol}] \\ \mathit{decode}_2\;m = \mathit{fstream}\;\mathit{safestepi}\;(\mathit{unfoldr}\;\mathit{step} \cdot \mathit{prepare}_2)\;\mathit{mfocus}\;(m,\mathit{unit}) \end{array}

That is, initial symbols are output as soon as they are completely determined, even before all the input bits have been read. This agrees with {\mathit{decode}_1} on finite bit sequences.
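To see the flushing stream in action, here is a Haskell sketch of {\mathit{decode}_2}. Several parts are assumptions made only for illustration: the model is a hypothetical static two-symbol model `Toy` (a real model would adapt as symbols are seen), `fstream` follows the usual definition of flushing streams from the earlier posts, and `step` is taken to rescale the fraction by the decoded symbol's interval.

```haskell
import Data.List (unfoldr)
import Data.Ratio ((%))

type Interval = (Rational, Rational)
type Bit      = Integer
type Symbol   = Char

-- Hypothetical static model: 'a' owns [0,1/2), 'b' owns [1/2,1).
data Model = Toy

encodeSym :: Model -> Symbol -> Interval
encodeSym _ 'a' = (0, 1 % 2)
encodeSym _ _   = (1 % 2, 1)

decodeSym :: Model -> Rational -> Symbol
decodeSym _ x = if x < 1 % 2 then 'a' else 'b'

newModel :: Model -> Symbol -> Model
newModel m _ = m

unit :: Interval
unit = (0, 1)

weight, scale :: Interval -> Rational -> Rational
weight (l, r) x = l + (r - l) * x
scale  (l, r) x = (x - l) / (r - l)

narrow, widen :: Interval -> Interval -> Interval
narrow i (p, q) = (weight i p, weight i q)
widen  i (p, q) = (scale i p, scale i q)

midpoint :: Interval -> Rational
midpoint i = weight i (1 % 2)

-- Assumed flushing stream: like stream, but applies the flusher h
-- to the final state once the input is exhausted.
fstream :: (b -> Maybe (c, b)) -> (b -> [c]) -> (b -> a -> b) -> b -> [a] -> [c]
fstream g h f b x = case g b of
  Just (c, b') -> c : fstream g h f b' x
  Nothing      -> case x of
    a : x' -> fstream g h f (f b a) x'
    []     -> h b

step :: (Model, Rational) -> Maybe (Symbol, (Model, Rational))
step (m, x) = Just (s, (newModel m s, scale (encodeSym m s) x))
  where s = decodeSym m x

-- Produce a symbol only when the whole interval decodes to it.
safestepi :: (Model, Interval) -> Maybe (Symbol, (Model, Interval))
safestepi (m, i)
  | l <= fst i && snd i <= r = Just (s, (newModel m s, widen (l, r) i))
  | otherwise                = Nothing
  where s      = decodeSym m (midpoint i)
        (l, r) = encodeSym m s

mfocus :: (Model, Interval) -> Bit -> (Model, Interval)
mfocus (m, i) b = (m, narrow i (if b == 0 then (0, 1 % 2) else (1 % 2, 1)))

prepare2 :: (Model, Interval) -> (Model, Rational)
prepare2 (m, i) = (m, midpoint i)

decode2 :: Model -> [Bit] -> [Symbol]
decode2 m = fstream safestepi (unfoldr step . prepare2) mfocus (m, unit)
```

Because `step` is always productive, the decoded list is infinite, so it should be consumed lazily. With this toy model, `take 5 (decode2 Toy [1,0])` gives `"babaa"`, and `take 3 (decode2 Toy (cycle [1,0]))` gives `"bab"` even though the input never ends.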

Fixed-precision arithmetic

We will leave arithmetic coding at this point. There is actually still quite a bit more arithmetic required; in particular, for competitive performance it is important to use only fixed-precision arithmetic, restricting attention to rationals within the unit interval with denominator {2^k} for some fixed {k}. In order to be able to multiply two numerators using 32-bit integer arithmetic without the risk of overflow, we can have at most {k=15}. Interval narrowing now needs to be approximate, rounding down both endpoints to integer multiples of {2^{-k}}. Care needs to be taken so that this rounding never makes the two endpoints of an interval coincide. Still, encoding can be written as an instance of {\mathit{stream}}. Decoding appears to be more difficult: the approximate arithmetic means that we no longer have interval widening as an exact inverse of narrowing, so the approach above no longer works. Instead, our 2002 lecture notes introduce a “destreaming” operator that simulates and inverts streaming: the decoder works in sympathy with the encoder, performing essentially the same interval arithmetic but doing the opposite conversions. Perhaps I will return to complete that story some time…

by jeremygibbons at December 11, 2017 01:32 PM

Philip Wadler

Simplicity and Michelson


Only once in my life have I encountered a programming language that was too simple to use. That was Lispkit Lisp, developed by Peter Henderson, Geraint Jones, and Simon Jones, which I saw while serving as a postdoc at Oxford, 1983–87, and which despite its simplicity was used to implement an entire operating system. It is an indictment of the field of programming languages that I have not since encountered another system that I consider too simple. Until today. I can now add a second system to the list of those that are too simple, the appropriately-titled Simplicity, developed by Russell O'Connor of Blockstream. It is described by a paper here and a website here.
The core of Simplicity consists of just nine combinators: three for products (pair, take, and drop), three for sums (injl, injr, and case), one for unit (unit), and two for plumbing (iden and comp). It is thoroughly grounded in ideas from the functional programming, programming language, and formal methods communities.
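One way to read the nine combinators is as the constructors of a typed term language. The following Haskell GADT is an illustrative sketch only: the constructor names, the `eval` function, and the encoding of a bit as `Either () ()` are assumptions made for this example, not O'Connor's Coq formalisation.

```haskell
{-# LANGUAGE GADTs #-}

-- A term of type `Term a b` maps an input of type a to an output of type b.
data Term a b where
  Iden :: Term a a                                -- plumbing: identity
  Comp :: Term a b -> Term b c -> Term a c        -- plumbing: composition
  Unit :: Term a ()                               -- unit
  Injl :: Term a b -> Term a (Either b c)         -- sums
  Injr :: Term a c -> Term a (Either b c)
  Case :: Term (a, c) d -> Term (b, c) d -> Term (Either a b, c) d
  Pair :: Term a b -> Term a c -> Term a (b, c)   -- products
  Take :: Term a c -> Term (a, b) c
  Drop :: Term b c -> Term (a, b) c

-- A direct functional semantics for the nine combinators.
eval :: Term a b -> a -> b
eval Iden       a            = a
eval (Comp s t) a            = eval t (eval s a)
eval Unit       _            = ()
eval (Injl t)   a            = Left (eval t a)
eval (Injr t)   a            = Right (eval t a)
eval (Case s _) (Left a, c)  = eval s (a, c)
eval (Case _ t) (Right b, c) = eval t (b, c)
eval (Pair s t) a            = (eval s a, eval t a)
eval (Take t)   (a, _)       = eval t a
eval (Drop t)   (_, b)       = eval t b

-- Bits as a sum of two units: Left () is 0, Right () is 1.
type Bit = Either () ()

-- Negation built from sums, products and unit alone.
notT :: Term Bit Bit
notT = Comp (Pair Iden Unit) (Case (Injr Unit) (Injl Unit))
```

Here `eval notT (Left ())` is `Right ()`. Note that the term language has no recursion or iteration, so every term describes a computation of fixed, statically known size.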
When I call Simplicity too simple it is intended as a compliment. It is delightful to see full adders and cryptographic hash functions cobbled together using just products, sums, and units. It is eye-opening to see how far one can get without recursion or iteration, and how this enables simple analyses of the time and space required to execute a program. It is a confirmation to see a system with foundations in category theory and sequent calculus. Now I know what to say when developers respond to my talk "Categories for the Working Hacker" by asking "But how can we use this in practice?"
The system is accompanied by a proof of its correctness in Coq, which sets a high bar for competing systems. O'Connor even claims to have a proof in Coq that the Simplicity implementation of SHA-256 matches the reference specification provided by Andrew Appel's Verified Software Toolchain project (VST), which VST proved corresponds to the OpenSSL implementation of SHA-256 in C.
At IOHK, I have been involved in the design of Plutus Core, our own smart contract scripting language, working with Darryl McAdams, Duncan Coutts, Simon Thompson, Pablo Lamela Seijas, and Grigore Rosu and his semantics team. We have a formal specification which we are preparing for release. O'Connor's work on Simplicity has caused us to rethink our own work: what can we do to make it simpler? Thank you, Russell!
That said, Simplicity is still too simple, and despite its emphasis on rigour there are some gaps in its description.


A 256-bit full adder is expressed with 27,348 combinators, meaning addition in Simplicity requires several orders of magnitude more work than the four 64-bit addition instructions one would normally use. Simplicity proposes a solution: any commonly used sequence of instructions may be abbreviated as a "jet", and implemented in any equivalent manner. Hence, the 27,348 combinators for the 256-bit full adder can be ignored, and replaced by the equivalent four 64-bit additions.
All well and good, but this is where it gets too simple. No one can afford to be inefficient by several orders of magnitude. Hence, any programmer will need to know what jets exist and to exploit them whenever possible. In this sense, Simplicity is misleadingly simple. It would be clearer and cleaner to define each jet as an opcode. Each opcode could still be specified by its equivalent in the other combinators of Simplicity, but programs would be more compact, faster to execute, and—most important—easier to read, understand, and analyse accurately. If one ignores jets, the analyses of time and space required to execute a program, given toward the end of the paper, will be useless—off by orders of magnitude. The list of defined jets is given nowhere in the paper. Nor could I spot additional information on Simplicity linked to from its web page or findable by a web search. More needs to be done before Simplicity can be used in practice.


It's not just the definition of jets which is absent from the paper, and cannot be found elsewhere on the web. Lots more remains to be supplied.
  • Sections 2.4, 2.5, 3.2 claim proofs in Coq, but apart from defining the semantics of the nine combinators in Appendix A, no Coq code is available for scrutiny.
  • Section 2.5 claims a representation of Simplicity terms as a dag, but it is not specified. Lacking this, there is no standard way to exchange code written in Simplicity.
  • Section 4.4 defines an extended semantics for Simplicity that can read the signature of the current transaction, support Merklised abstract syntax trees, and fail when a transaction does not validate. It also lifts meanings of core (unextended) Simplicity programs to the extended semantics. However, it says nothing about how the seven combinators that combine smaller Simplicity programs into bigger ones act in the extended semantics! It's not hard to guess the intended definitions, but worrying that they were omitted from a paper that aims for rigour.
  • Section 3 provides a Bit Machine to model the space and time required to execute Simplicity. The model is of limited use, since it ignores the several orders of magnitude improvement offered by jets. Further, the Bit Machine has ten instructions, enumerated on pages 10–12, but the list omits the vital "case" instruction which appears in Figure 2. Again, it's not hard to guess, but worrying it was omitted.


A second language for scripting blockchains is Michelson. It is described by a paper here and a website here. (Oddly, the website fails to link to the paper.)
I will offer just one word on Michelson. The word is: "Why?"
Michelson takes many ideas from the functional programming community, including higher-order functions, data structures such as lists and maps, and static type safety. Currently, it is also much more thoroughly described and documented than Simplicity. All of this is to be commended.
But Michelson is an inexplicably low-level language, requiring the programmer to explicitly manipulate a stack. Perhaps this was done so that there is an obvious machine model, but Simplicity offers a far superior solution: a high-level model for programming, which compiles to a low-level model (the Bit Machine) to explicate time and space costs.
Or perhaps Michelson is low-level to improve efficiency. Most of the cost of evaluating a smart contract is in cryptographic primitives. The rest is cheap, whether compiled or interpreted. Saving a few pennies of electricity by adopting an error prone language—where there is a risk of losing millions of dollars in an exploit—is a false economy indeed. Premature optimisation is the root of all evil.
The language looks a bit like all the bad parts of Forth and Lisp, without the unity that makes each of those languages a classic. Lisp idioms such as CAAR and CDADAR are retained, with new ones like DUUP, DIIIIP, and PAAIAIAAIR thrown in.
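For readers unfamiliar with the Lisp idiom, the following Python sketch shows how a `C[AD]+R` name decodes into a chain of pair projections. It follows the Lisp reading, where the letters are applied right to left, so `CDADAR` means cdr(car(cdr(car x))). Whether Michelson applies the letters in the same order is not stated in the text here, so treat the direction as an assumption; `access` is a hypothetical helper, with pairs modelled as 2-tuples.

```python
def access(name, value):
    """Apply a Lisp-style C[AD]+R accessor to nested 2-tuples.

    Letters are applied right to left (the Lisp convention):
    'A' takes the first component (car), 'D' the second (cdr).
    """
    if len(name) < 3 or name[0] != "C" or name[-1] != "R":
        raise ValueError(f"not a C[AD]+R accessor: {name}")
    for letter in reversed(name[1:-1]):
        if letter == "A":
            value = value[0]
        elif letter == "D":
            value = value[1]
        else:
            raise ValueError(f"unexpected letter {letter!r} in {name}")
    return value
```

Every such name is a path into a nested pair, which is why the reader has to track the layout of the data by hand.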
There is a fair set of built-in datatypes, including strings, signed and unsigned integers, unit, product, sum, options, lists, sets, maps, and higher-order functions. But there is no way for users to define their own data types. There is no way to name a variable or a routine; everything must be accessed by navigating a data structure on the stack.
Some operations are specified formally, but others are left informal. For lists, we are given formal rewriting rules for the first three operators (CONS, NIL, IF_CONS) but not the last two (MAP, REDUCE). Type rules are given in detail, but the process of type inference is not described, leaving me with some questions about which programs are well typed and which are not. It reminds me of a standard problem one sees in early work by students—the easy parts are thoroughly described, but the hard parts are glossed over.
If I have understood correctly, the inference rules assign types that are monomorphic, meaning each term has exactly one type. This omits one of the most successful ideas in functional programming, polymorphic routines that act on many types. It means back to the bad old days of Pascal, where one has to write one routine to sort a list of integers and a different routine to sort a list of strings.
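As an illustration of what polymorphism buys, here is a minimal sketch in Python type-hint notation of a single sorting routine that works at every element type. The name `insertion_sort` and the explicit `less` parameter are choices made for this example, not anything taken from Michelson; in a monomorphic language the body would have to be written once per element type.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

def insertion_sort(xs: List[T], less: Callable[[T, T], bool]) -> List[T]:
    """Sort a list of any element type, given an ordering on that type."""
    out: List[T] = []
    for x in xs:
        i = len(out)
        # Walk left past every element that x should precede.
        while i > 0 and less(x, out[i - 1]):
            i -= 1
        out.insert(i, x)
    return out

# The same routine serves integers and strings alike.
sorted_ints = insertion_sort([3, 1, 2], lambda a, b: a < b)
sorted_strs = insertion_sort(["pear", "apple", "fig"], lambda a, b: a < b)
```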
Several of these shortcomings are also shared by Simplicity. But whereas Simplicity is intended as a compilation target, not to be read by humans, the Michelson documentation includes a large collection of examples suggesting it is intended for humans to write and read.
Here is one of the simpler examples from the paper.
  { DUP ; CDAAR ; # T
    IF { DUP ; CDADR ; # N
         { DUP ; CDDDR ; # B
           PAIR } }
       { DUP ; CDDAR ; # A
         PAIR } }
The comment # T is inserted as a reminder that CDAAR extracts variable T, and similarly for the other variables N, B, and A. This isn't the 1950s. Why don't we write T when we mean T, instead of CDAAR? WHY ARE WE WRITING IN ALL CAPS?
In short, Michelson is a bizarre mix of some of the best and worst of computing.


It is exciting to see ideas from the functional programming, programming languages, and formal methods communities gaining traction among cryptocurrencies and blockchains. While there are shortcomings, it is fantastic to see an appreciation of how these techniques can be applied to increase reliability—something which the multi-million dollar exploits against Ethereum show is badly needed. I look forward to participating in the conversations that ensue!


The conversation has begun! Tezos have put up a page to explain Why Michelson. I've also learned there is a higher-level language intended to compile into Michelson, called Liquidity.

by Philip Wadler at December 11, 2017 09:37 AM

December 09, 2017

Edward Z. Yang

Systems ML workshop panel

  • JG: Joseph Gonzalez
  • GG: Garth Gibson (CMU)
  • DS: Dawn Song (UC Berkeley)
  • JL: John Langford (Microsoft NY)
  • YJ: Yangqing Jia (Facebook)
  • SB: Sarah Bird
  • M: Moderator
  • A: Audience

M: This workshop is bringing together ML and systems. Can you put your place on that spectrum? Who is your home community?

YJ: Right in the middle. I'd like to move more towards systems side, but Berkeley Parallel Labs kicked me out. ML is my home base.

JL: ML is where I come from, and where I will be, but I'm interested in systems. My home is NIPS and ICML.

DS: My area is AI and security, did computer security in the past, now moving into AI.

GG: Systems.

JG: I started out in ML, working on probabilistic methods. I basically, in middle of PhD, looked at systems. Now I'm moving to being a systems person that does ML.

M: We've seen a proliferation of deep learning / ML frameworks that require a lot of dev effort, money, time to put in. Q, what is the role of academia of doing research in this area. What kind of large scale ML learning can you do.

GG: I liked YJ's answer last time.

YJ: The thing that is astonishing is that academia is the source of so many innovations. With all due respect, we did very good work in Google, but then Alex came out with 2 GPUs and nuked the field. Academia is the amazing place where we find all of the new ideas, and industry scale it out.

JL: Some examples. If you're coming from academia, maybe you don't have research at big company, but it's an advantage as you will spend time on the right algorithm for solving it efficiently. And that's what will win in the long run. Short term, they'll brute force with AutoML. Long run, the learning algorithms are going to be designed where they won't have parameters. A common ML paper is "we eliminate this hyperparameter". When they're more automatic, more efficient, great things will happen. There's an advantage in being resource constrained, as you will solve things in the right way.

Another example is, the study of machine learning tells us that in the future we will regard any model that you just learned and deploy as inherently broken and buggy, as data collection is not part of the process of training, deploying. It will decay and become irrelevant. The overall paradigm of ML where you're interacting with the world, and learning, that can be studied easily in academia, and that has huge implications about how you're going to design systems.

DS: People often talk about in a startup, the best thing is to not raise a ton of money; if you're resource constrained you're more focused and creative. ML is really broad, there's lots of problems. Right now we learn from lots of data, but lots of talks at NIPS, humans have amazing ability to learn from very few example. These are problems for academia to tackle, given unique resource constraints.

GG: I'll say, it's difficult to concentrate on top accuracy if you don't have enough data, and the data available to students is stuff like DAWNbench which tends to lag. In academia, we build relationships with industry, send students for internships, they get the ability to do big data, while exploring first principles in university. It's a challenge, but open publishing and open sharing of code make the world more bearable.

JG: The one thing I've struggled with is focusing on human resources. I have grad students; good students, focus on a key problem can make a lot of progress. We struggle with a lot of data. Struggle with RL really is here, we can build simulators to build at this scale. Being able to use simulation to get data; be creative, find new and interesting problems.

M: Follow-up on process. I think a lot of you have tried to publish ML in your communities. Are they equipped to appreciate work properly; what is a common reason they don't appreciate.

JG: Publishing ML in systems, or vice versa, is hard. It goes both ways. These communities are not equipped to evaluate work in other field. ML in systems, where if you saw here, it was surprising. Or vice versa, wouldn't have done well in systems venue as systems. The failure mode I see, is systems community doesn't appreciate extreme complexity. In ML, I have this very sophisticated thing, and reducing them to their essential components. ML tries to overextend their complexity as an innovation. More broadly, each of these communities has their own biases how they look at research. One thing I've noticed, it's gotten better. Systems is better at evaluating, and at this workshop, people are pushing research in an advanced way.

GG: I'm old, so I've seen creation of conference before. So, you start off with an overlap of areas. In my prior life, it was the notion of storage as a research area, rather than app of devices. You start off, send submission in. The PC has two people that know anything about it, and they aren't assigned, and the reviews are sloppy, and you get one conference that do a little better, but other conferences don't read it. I faced this with fault tolerance, database, OS communities, they don't read each other's stuff. You get enough mass, get a conference that focuses in the middle; reviewing and PC that have seen most of the good work in the area. That's hard, but we're on the edge of doing it in SysML. We're doing the right thing to do competitive, on top of state of the art.

M: Is that the only solution, or can we mix up PCs?

GG: I've seen a lot of experiments to try it. You can end up with permanently fractured communities.

JL: Joey and Dawn are area chairs at ICML. I have found the ML community to be friendly to system type things. There's an area chair for systems. Hopefully papers get assigned appropriately.

M: We're not good about that at systems.

DS: About ML and security, we have this problem. In security, we also have very small percentage of ML, and the committee, if you submit ML, it's very hard to find people who can review the paper, and as a consequence, the review quality varies highly. Similar in terms of security in ML, similar problems. It's interesting to think about why this happens and how to solve the problem. In general, sometimes the most interesting work is the interdisciplinary areas. ML and systems, security, and examples I see, including machine learning in systems... so, one thing I actually can understand is, within each community, even though the review quality varies, I can see from committee's perspective, really what they want is papers that are more meaningful to community, help people get exposed to this new area, fostering new exploration. That's part of natural progression. As time goes on, there's more cross-pollination.

JG: We are launching a SysML conference. I had a little bit of reservations: ML is getting better at systems, but now I have to decide where I'm going to send a paper. A lot of papers we see in ML is going to have systems.

GG: When you have a new conference area, not all work is sent there. Overlapping, you have a favorite conference, your heroes, and you'll send your most exciting work to that root conference. No problem.

YJ: SysML is great, and this is how it comes out. New fields, it warrants new conferences.

M: Do you think ML expert needs to also be a systems expert? Does such a person who lies at that intersection have a different way of looking? Or you come up with a nice algorithm, and you

JL: It's not OK to have a wall.

There's many way learning algorithms can be changed. The problem with having a wall, if you don't understand, throw engineer. But if you can bridge to understand, they're not artifacts, you can break open and modify. That can let you achieve much better solutions.

GG: Agreed, but what happens initially is you reach over to other side, you put it into system, and it's my innovation that redundancy makes fault tolerance, even though it's fairly pedestrian from the other side. If it is a substantial improvement, it is worth doing. We all grow up.

JG: We need a wall, but we're going to constantly tear it down. Matlab in grad school, we made jokes about it, and MKL community would make it fast. Then they said we are going to build ML for distributed computing algorithms, and ML would write class algorithms for system. That waned in the dev of pytorch, TF, etc., which leveled up abstraction. The stack is building up again; systems community to make more efficient. Well, fp could change, and that could affect algorithm. So we're tearing it down again. But systems is about designing the wall.

YJ: It's more like a bar stool. It's a barrier, but we don't have to be both to do anything, but you need it to make it efficient. A story: a training system we looked at, SGD. That person found a very nicely rounded number: 100. But people frown, you should round to 128. Understanding and improving the common core for CS and engineering, that helps a lot for people to have good sense for how to design ML algorithms.

M: There's a lot of talk about democratizing AI, and all of you have helped that process. What is a truly democratic AI landscape look like, and how far are we from that world.

YJ: I plead guilty in participating in framework wars. When reading CS history, one thing that's pretty natural, when field is starting, there's all sorts of standards, protocols. FTP, Gopher, and now in the end HTTP took over, and everything runs on HTTP. Right now, there's all kinds of different abstractions; boiling it down, everyone is doing computation graph, optimization. I look forward to when we have one really nice graph representation, protocol for optimizing graphs. It's not a rosy dream, because in compilers we have that solution, LLVM. I don't know if we'll reach that state but I think one day we'll get there.

JL: You have AI/ML democratized when anyone can use it. What does that mean, a programmer has a library, or language constructs, that they use routinely and easily; no issues of data getting mismatched or confused or biased. All the bugs people worry about in data science; those are removed from the system because the system is designed right and easy to use. The level beyond that is when somebody is using a system, that system is learning to adapt to you. There's huge room for improvement in how people interact. I don't know how often there's a rewrite rule driving me crazy; why can't it rewrite the way I want. People can signal info to a learning algorithm, and when those can be used effectively to assist people, you have democratized AI.

DS: I have a very different view of democratizing AI. I think it's interesting to think about what democratization here really means. For systems people, it's about making it easier for people to do learning, to use these libraries, platforms. But that's really just providing them with tools. For me, I give talks on democratizing AI, we are looking at it from a completely different perspective. Code: even, whoever controls AI will control the world. So who controls AI? Even if you give everyone the tools, push a button, but they don't have the data to do the training. So who controls the AI today, and tomorrow? It's Facebook, Microsoft, Google... so for me, democratization means something totally different. Today, they collect data, train models, and they control who has access to the model, and users can get recommendations, but not direct access to models. We have a project to actually democratize AI, where users can control their data. Combining blockchain and AI, where users can donate their data to a smart contract, where the smart contract will specify the terms; e.g., if you train a model, the user can use the model, and if the model produces profits, the user can get part of the profits. The smart contract can specify various incentive terms; e.g., if the data is better than others, they can get more profits, and other mechanisms. A developer will supply the ML training algorithm, and get benefits when it is trained well. We are decentralizing the power of AI; users will be able to get direct access to models and use them. In this case, I hope for an alternate future, where big companies can continue with business, but users by pooling their data in a decentralized fashion, will see actual true democratization of AI; they will access the power of AI. Not just use tools.


GG: I think that a lot of what's meant in democratizing AI is how can you move from a small number of people innovating, to a large number. Tool development and standards. We're close to being there. There was an example in the past, was VLSI paint boxes. Up until a certain point, only an EE could really develop hardware at all. They took a lot of effort and time to make sure it could make it through every part without very much crosstalk. A group came together and thought, well, there are some design rules. This lets you build hardware pretty easily. I could paint green/red boxes, hardware months later, worked. It never worked as fast as that EE guy, so there would always be a place for it, but it would let us build a RISC computer, and ship it. We were in the game, we could innovate, and do it. The tools we're trying to build right now can build on statistical.

JG: When I started PhD, we did integrals and derivatives by hand. Automatic differentiation was a huge step forward. I blame that for the explosion of papers. A first year can build something far more complex than what I could do. That's moving AI forward, on algorithms side.
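(Editorial illustration, not from the panel: a minimal Python sketch of what automatic differentiation automates, using forward-mode dual numbers. The `Dual` class and `derivative` helper are hypothetical names; production frameworks typically use reverse mode and cover far more operations than the two shown here.)

```python
class Dual:
    """A value paired with its derivative with respect to the input."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def lift(x):
        # Plain numbers are constants: their derivative is zero.
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __radd__ = __add__
    __rmul__ = __mul__


def derivative(f, x):
    """Evaluate f at x while carrying dx/dx = 1 through every operation."""
    return f(Dual(x, 1.0)).deriv
```

For example, `derivative(lambda x: 3 * x * x + 2 * x + 1, 2.0)` applies the sum and product rules mechanically, with no derivative worked out by hand.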

The data side is interesting, and that is one where I think about in systems. There's a lot of opportunities to think about how security interacts, leveraging hardware to protect it, markets to sell/buy data from sources, and protect the data across a lot of places. I would argue we're making a substantial amount of progress in how we think about algorithms.

M: When I think about democratizing pervasive AI, recent questions that have been consuming our minds, interpretability, fairness, etc. Can you share... any experience where things like interpretability came up and became a problem, issue, do we have to worry about a lot more in ML, or systems-ML.

JG: My grad students come to me and say the models stop working. I don't know how to fix that; the process is very experimental. Tracking experiments is a big part of the process. We cared a lot about interpretable models, and that meant something very particular. Now it's explainable; we don't need to know what it did exactly, but there needs to be some connection to what we did. Interpretable, explain computation, it could be related or unrelated to the decision. That's two answers about explainability, and how we debug these systems.

GG: SOSP just happened, and they have ten years of... good copies of everything they submitted. At the end of the conference, Peter Chen took all the PDF files, and did a naive bayes classifier, and saw how well he would predict that it would be accepted. And half the things it predicted to be accepted, would be accepted.

So what did they do? They made a detector for popular authors. And so what you did is those who had succeeded, they will follow behind. I recognize this problem. You might think that you found a good way, but it's actually Nickolai Zeldovich's paper.

DS: There's a big debate. Some think it's really important, and sometimes, as long as the model works, it's fine. Our brain, we can't really explain how we arrive at certain decisions, but it works fine. And it depends on application. Some applications have stronger requirements for explainability; e.g., law and healthcare, whereas in others it's less required. Also as a whole community, there's a lot we don't understand. We can talk about causality, transparency, all related. As a whole community, we don't really understand what explainability means. Not a good definition. All these concepts are related, we're trying to figure out what's the real core. That's a really good open question.

JL: There's two different interpretations. Can you explain to a person? And that's limited; there's no explainable vision models. The other definition is debuggability. If you want to create complex systems, they need to be debuggable. This is nontrivial with a distributed system, it's nontrivial with ML. If you want to create nontrivial ML systems, you have to figure out why they're not behaving the way you want it to.

DS: Do we debug our brains?

JL: Evolution has done this the hard way for a very long way... a lot of people have bugs in their brains. I know I have bugs. I get an ocular migraine sometimes... very annoying. No, we don't debug our brains, and it's a problem.

YJ: I'm sure there's bugs in my brains; I chased chickens in my grandma's house; the chicken has one spot in its back that if you press it, it just ducks and sits there. It shuts off because of fear. We humans don't do that. But these bugs, are in our brain as well. Chasing for interpretability helps understand how things work. The old days, deep dream; this line of work started with figuring out what the gradients do, and we propagated back, and we found that direct gradient doesn't work; then we added L1 priors, and then we got pictures. This curiosity has led to the fact that convnets with random weights are codifying the local correlation; we are hardcoding the structured info in CNNs which we didn't know before. So maybe we will not achieve full interpretability, but some amount of interpretability and creativity will help.

(audience questions)

A: I'd really like to hear what Jeff said about ML for systems. As systems, I'm interested in it, but people have said, you can get far with heuristics.

JL: I think it's exciting.

GG: The index databases, when I read it for reviewing, I went, "Wow! Is that possible?" I think things like that will change the way we do systems. The novelty of the application opens a lot of people's minds. Right now we think of the machine learning tools as being expensive things that repeat what humans do easily that computers don't do well. But that's not what DB index is. We can execute it, but we're not better. But to get it half the size and twice the speed, throwing in another way of thinking about compression through a predictor is a fabulous insight.
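(Editorial illustration, not from the panel: a minimal Python sketch of the "index as predictor" idea GG describes, under the assumption that it means fitting a model from key to position in a sorted array and then searching only within the model's worst-case error. A single least-squares line stands in for the learned model; the names `fit_linear_index` and `lookup` are hypothetical.)

```python
import bisect

def fit_linear_index(keys):
    """Fit position ~ slope * key + intercept over sorted keys."""
    n = len(keys)
    mean_key = sum(keys) / n
    mean_pos = (n - 1) / 2
    var = sum((k - mean_key) ** 2 for k in keys)
    cov = sum((k - mean_key) * (i - mean_pos) for i, k in enumerate(keys))
    slope = cov / var if var else 0.0
    intercept = mean_pos - slope * mean_key
    # Worst-case gap between predicted and true position, kept with the model.
    max_err = max(abs(round(slope * k + intercept) - i)
                  for i, k in enumerate(keys))
    return slope, intercept, max_err

def lookup(keys, model, key):
    """Return the position of key, or None, searching only near the guess."""
    slope, intercept, max_err = model
    n = len(keys)
    guess = round(slope * key + intercept)
    lo = min(max(0, guess - max_err), n)
    hi = max(lo, min(n, guess + max_err + 1))
    i = bisect.bisect_left(keys, key, lo, hi)
    return i if i < n and keys[i] == key else None
```

The model here is three numbers regardless of how many keys there are; the better the prediction, the narrower the window that still has to be searched.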

JG: I tried to publish in this area for a while. For a while, systems didn't like the idea of complex algorithms in the middle of their system. Now, these days, Systems is like, "ML is cool." But where it's easier to have success, a good prediction improves the system, but a bad prediction doesn't break the system. So scheduling, that's good. Where models can boost performance but not hurt. The work in ML to solve systems is successful.

DS: ML for systems is super exciting. I'm personally very excited about this domain, esp. for people who have done systems work, and are interested in AI. ML for systems is an amazing domain of ML. I wouldn't be surprised, I would hope to see, in five years, our systems are more ML driven. A lot of systems have a lot of knobs to tune, trial and error setting, where exactly ML can help. On these amazing techniques, RL, bandits, instead of using bandits to serve ads, we can try to autotune systems. Just like we are seeing AI transforming a lot of application domains, and a lot more intelligent system, old systems, the one we built, should be more intelligent. It's a prediction: I think we are going to see a lot of work in this domain. I think it will transform systems.
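(Editorial illustration, not from the panel: a minimal Python sketch of using a bandit to tune one system knob, here with the epsilon-greedy strategy. The knob (a thread count), the latency figures, and the names `autotune` and `fake_latency_ms` are all made up for the example; a real system would measure latency rather than simulate it.)

```python
import random

def autotune(settings, measure_latency, rounds=200, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit: pick the setting with the lowest mean latency."""
    rng = random.Random(seed)
    totals = {s: 0.0 for s in settings}
    counts = {s: 0 for s in settings}

    def mean(s):
        return totals[s] / counts[s]

    for t in range(rounds):
        if t < len(settings):
            choice = settings[t]              # try every setting once
        elif rng.random() < epsilon:
            choice = rng.choice(settings)     # explore
        else:
            choice = min(settings, key=mean)  # exploit the best so far
        totals[choice] += measure_latency(choice, rng)
        counts[choice] += 1
    return min(settings, key=mean)

def fake_latency_ms(threads, rng):
    """Hypothetical workload: made-up base latencies plus bounded noise."""
    base = {1: 12.0, 2: 9.0, 4: 7.0, 8: 8.0}[threads]
    return base + rng.uniform(-0.4, 0.4)

best_threads = autotune([1, 2, 4, 8], fake_latency_ms)
```

Most of the 200 trials go to the best-looking setting, so the tuner pays little for exploration while it learns.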

M: I work in this quite a bit. We have some successes with bandits in some settings, but there are settings that are really tough: stateful, choices, decisions influence the future, it makes it hard to apply RL, or the RL techniques take a lot of data. There are challenges, but there are successes. There are a lot of papers that apply RL in caching, resource allocation. The real question is why it's not used in production? I don't know if we have an answer to that, papers do it, it seems to be really good, but it's not that mainstream, esp. having RL all over the place. Why isn't it pervasive. That I don't see.

A: Isn't it because it's not verifiable. You want some kind of verification analysis.

GG: It's called a regression sweep. If you deploy on a lot of systems. There's a lot of money, it has to work. If it falls over, that's a lawsuit. I hired a VP of software. OK, now that I'm in charge, things are going to slow down. Every LoC is bugs, if I want low bug, I stop programmers from writing code, by making the bar very high. This is the thing Joey was talking about; they need a really compelling reason with no downsides, and then they have to pass tests before the pass. So anything stochastic has a high bar.

SB: Another thing that is happening, there aren't that many people who have understanding in both areas. It's really hard to do ML in systems without deep expertise in systems. You really need to understand to explain it.

GG: It wasn't that long since we didn't have hosted services.

M: Guardrails, you constrain the ML system to not suggest something bad. We have a scenario in MS, machines are unresponsive. How long to wait? You can do it in ML. The choices are reasonable, they're never more than the max you'd want to wait.
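(Editorial illustration, not from the panel: a minimal Python sketch of a guardrail around a learned wait-time predictor. The bounds and the name `guarded_wait_seconds` are hypothetical; the only idea taken from the remark above is that whatever the model suggests is forced into a range an operator already considers acceptable.)

```python
import math

def guarded_wait_seconds(predicted, min_wait=1.0, max_wait=600.0):
    """Clamp a model's suggested wait into an operator-approved range."""
    if math.isnan(predicted):
        # No usable prediction: fall back to the most conservative choice.
        return max_wait
    return min(max(predicted, min_wait), max_wait)
```

With the clamp in place, a bad prediction can cost at most the difference between the bounds, never an unbounded wait.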

A: On democratization. There's been a lot of talk about optimizing the models so they can bear the cost. Another is decentralizing data... but there's two very big constraints for systems and models. They cost a lot of money, and there's big variance. Because of cost, if some guy gets into programming, and does research, he won't have resources to do it. So they won't go into engineering; they'll intern at Amazon instead. So if there is some community going into lowering the barrier, democratizing, what solution is there to get people much more easily? Because there's huge economic costs. People are trying to make huge amounts of money, startups, but there's no... systems have faults with decentralization... there's just a big problem colliding and ML.

JG: We teach data, I teach data science at Berkeley. The summary is, what about the costs of getting into DL? There's cost to train models, GPUs, data, how do I get a freshman in college who is excited about this, chromebook, they can do research and explore opportunities. At Berkeley we have exactly this problem. I teach 200 students, a lot of them are freshmen, chromebook ipad as primary computer. We've built tools using Azure... we run a cloud in Azure, and on these devices they can experiment with models. They get to use pretrained models and appreciate how to ... Someone built a Russian Twitterbot detector, and saw value and opportunity in those. And then they got involved in research projects where they had more funds and tools.

JL: The right interfaces make a huge difference, because they prevent you from having bugs that prevent you from doing things. Also, DL, is all the rage, but framing the problem is more important than the representation you do. If you have the right problem, and a dumb representation, you'll still do something interesting. otherwise, it's just not going to work very well at all.

YJ: As industry, don't be afraid of industry and try it out. Back at Berkeley, when Berkeley AI was using GPUs, the requirement was that you have one project per GPU. We students, framed ten different projects, and we just asked for ten GPUs. NVIDIA came to us and asked, what are you doing? We'll just give you 40 GPUs and do research on that. Nowadays, FAIR has residency, and Google AI has residency, all of these things are creating very nice collaborations between industry and academia, and I want to encourage people to try it out. Industry has funds, academia has talent, marrying those together is an everlasting theme.

A: Going back to where do we go forward in terms of conferences, the future of this workshop; has any decision been made, where we go?

SB: This is work in progress. We're interested in feedback and what you think. We've had this workshop evolving for 10 yrs, with NIPS and ICML. Then we did one with SOSP, exciting. We are now doing a separate conference at Stanford in February. We think there's really an important role to play with workshops colocated with NIPS and ICML. We're still planning to continue this series of workshops. There's also a growing amount of systems work in ICML and NIPS, natural expansion to accept that work. The field is growing, and we're going to try several venues, and form a community. If people have ideas.

JG: More people should get involved.

M: We plan to continue this; audience is great, participation is great.

It's a panel, so I have to ask you to predict the future. Tell me something you're really excited... 50-100yrs from now. If you're alive then, I will find you and see if your prediction panned out. Or say what you hope will happen...

YJ: Today we write in Python. Hopefully, we'll write every ML model in one line. Classifier, get a cat.

JL: Right now, people are in a phase where they're getting more and more knobs in learning. ML is all about having less knobs. I believe the ML vision of less knobs. I also believe in democratizing AI. You are constantly turning ... around you, and devs can incorporate learning algorithms into systems. It will be part of tech. It's part of hype cycle. NIPS went through a phase transition. At some point it's gotta go down. When it becomes routine, we're democratizing things.

DS: It's hard to give predictions... I guess, right now, we see ML as an example, we see the waves. Not so long ago, there was the wave of NNs, graphical models, now we're back to NNs. I think... I hope that we... there's a plateauing. Even this year, I have been talking to a lot of great ML researchers, even though one can say there has been more papers written this year, when you hear what people talk about in terms of milestones, many people mentioned milestones from past years. AlexNet, ResNet, ... I do hope that we will see new innovation beyond deep learning. I do teach a DL class, but I hope that we see something beyond DL that can bring us... we need something more, to bring us to the next level.

GG: I'm tempted to point out DL is five years ago, and dotcom era was not more than five years... I think, I'm looking forward to a change in the way CS, science in general, does business, having learned from statistical AI. My favorite one is overfitting. I poorly understood overfitting, in vague stories, until ML hammered what this said. I look forward to the time when students tell me, they stopped writing code, because they were adding parameters... and they added a decent random, iid process for testing code. We're nowhere near there, but I think it's coming.

JG: I'm looking forward to the return of graphical models... actually not. When we're democratizing AI, but what ultimately happens, we're democratizing technology. I can walk up to Alexa and teach it. Or I can teach my Tesla how to park more appropriately. Tech that can adapt to us because it can learn; when I can explain to a computer what I want. (Star Trek but without a transporter.)

by Edward Z. Yang at December 09, 2017 02:17 AM

December 08, 2017

Edward Z. Yang

Accelerating Persistent Neural Networks at Datacenter Scale (Daniel Lo)

The below is a transcript of a talk by Daniel Lo on BrainWave, at the ML Systems Workshop at NIPS'17.

Deploy and serve accelerated DNNs at cloud scale. As we've seen, DNNs have enabled amazing applications. Architectures achieve SoTA on computer vision, language translation and speech recognition. But this is challenging to serve in large-scale interactive because there are latency, cost and power constraints. Also, DNNs are growing larger in size and complexity.

We've seen a Cambrian explosion in startups to solve this problem. Research groups have produced DNN processing units, DPUs, custom hardware solutions to prove high throughput efficient serving of DNNs. We categorize them into two categories: fast DPUs, where the algorithms and applications have to be fixed in at design time, because they're fabbing an ASIC, or a soft DPU, FPGA. But for soft DPUs, we haven't seen them deployed at scale.

To address this, we've been working on Project BrainWave. Solution to deploy large scale DNNs with FPGA-acceleration. We've designed it to be fast, flexible and friendly. High throughput, low latency acceleration using FPGAs. Flexibility with adaptive numerical precision, update to latest AI algorithms with reconfigurable FPGAs. And it's user friendly, because we have a full stack solution, compile CNTK/Caffe/TF and compile them down. This is deployed on our configurable cloud, an outer layer of CPUs, a data center that puts everything together, and a layer of reconfigurable FPGAs.

We've deployed DNN models. One is an LSTM model that takes tens to hundreds of milliseconds on CPU. What we see is the 99th percentile for latency; even at the 99th percentile we are able to achieve sub-millisecond latencies. When you get to these levels of acceleration, it's negligible in the E2E pipeline.

Next I'll dive into details. It's a full stack solution, starting with a compiler and runtime that takes models in high level frameworks and compiles them down to our architecture. A flexible ISA for serving DNNs. We have high throughput, low latency serving. We do this all with persistency at scale, to keep models pinned in FPGA memories. Deployed on our wide deployment of Intel FPGAs using hardware microservices.

To begin with, let's talk about hardware microservices. This is something we presented at Micro. The architecture of reconfigurable cloud is FPGAs sit between CPU and network. CPU can use FPGA locally for acceleration, but because FPGAs are connected over network, they can distribute between them. We have a proprietary network protocol for low latency compute.

We've disaggregated the FPGA compute plane from the CPU. So we can aggregate FPGAs together to form larger accelerators, and you don't have to match the ratio of FPGAs to CPUs. You can serve a large number of CPUs with a small cluster of FPGAs, or vice versa.

Next I'll talk about the compiler and runtime. Goal is to make it very easy for ML specialists to do this. The typical ML specialist doesn't know how to program an FPGA. Models are developed in high level frameworks, and we compile them down to our architecture. We compile them down first into an intermediate graph-based representation. We split them into portions that run on FPGAs, and portions that run on CPU. When we execute, we also have a runtime that handles orchestration and scheduling between the parts.

There are two main categories of DNNs we have to optimize for. DNNs that have very high compute to data ratio, convnets, these are well studied. I'm going to focus on the other class of DNNs, those with less compute to data ratio, e.g. dense layers and RNNs.

The conventional approach to accelerating DNNs on FPGAs: you keep all model parameters in DRAM. When a request comes in, you stream the model parameters out of DRAM, and return a result. The issue with this is when you have DNN layers that are memory bandwidth bound, you're limited in how fast you can run this by memory bandwidth; you're not getting the full compute capabilities of the FPGA. Typically the way to solve this is with batching; you send a number of requests and reuse the model parameters for all requests. While you may achieve good throughput, latency will increase. For realtime services, this violates your SLA. What we want to do is provide high performance at low or no batching.
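The batching trade-off described above can be sketched with a simple roofline-style estimate. This is only an illustration: the function name and all the numbers (weight size, bandwidth, peak compute) are hypothetical and are not taken from the talk, and the time spent waiting for a batch to fill is not modeled.

```python
def batch_time_s(batch, param_bytes, flops_per_request, mem_bw, peak_flops):
    """Estimated time to serve one batch when weights are streamed from DRAM once per batch."""
    stream_time = param_bytes / mem_bw                       # paid once per batch
    compute_time = batch * flops_per_request / peak_flops    # grows with the batch size
    return max(stream_time, compute_time)                    # the slower of the two dominates


# Hypothetical numbers, chosen only for illustration.
PARAM_BYTES = 100e6      # 100 MB of model parameters
FLOPS_PER_REQ = 200e6    # 200 MFLOPs of work per request
MEM_BW = 20e9            # 20 GB/s of DRAM bandwidth
PEAK_FLOPS = 1e12        # 1 TFLOP/s of peak compute

for batch in (1, 25, 100):
    t = batch_time_s(batch, PARAM_BYTES, FLOPS_PER_REQ, MEM_BW, PEAK_FLOPS)
    print(f"batch={batch:3d}  time per batch={t * 1e3:5.1f} ms  throughput={batch / t:7.0f} req/s")
```

With these made-up numbers a single request is bound by the time to stream the weights, so the compute units sit mostly idle; larger batches raise throughput, but every request in a large batch waits for the whole batch to finish.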

The way we do this is with persistent DNNs. FPGAs have lots of memory on chip: 10MB memory. Since it's on chip, it's high bandwidth. So we're going to keep the model parameters on the chip, so that when we get one request in, we distribute it across the entire FPGA chip.

The obvious question is, what happens if your model doesn't fit on chip? We take advantage of the hardware microservices. We'll distribute a single model over multiple FPGAs in the datacenter.

Let's look at the architecture and microarchitecture of the processing unit we developed. The BrainWave DPU is a software programmable processor, programmed in single-threaded C, but we've added a number of instructions for serving DNNs, e.g., matrix multiply, convolution, nonlinear activations, embeddings. The processor is designed to use narrow precision format (float16) and easily flexible for extending to newer algorithms.

The microarchitecture of the processor: the main portion is dedicated to the matrix vector unit; matrix vector multiply, consisting of a number of kernels on a tile of a larger matrix. Tiling gives us flexibility while maintaining performance. Other compute units are multifunction units; vector-vector operations, such as element-wise multiply, add and activation functions. Tying it all together is an on-chip network that lets us keep all the compute together at time.

Most of the chip is dedicated to the matrix vector unit. It's composed of hundreds of multilane dot product units. Each of these dot product units consists of tens of adds and muls. To keep them fed with data, each dot product unit is fed by a set of dedicated block RAMs.
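As a rough software analogy for the tiled matrix-vector unit, here is a minimal pure-Python sketch. It is not the BrainWave design: the function names and tile sizes are assumptions for illustration, and the dot products that the hardware would run in parallel lanes are simply executed one after another.

```python
def dot(xs, ys):
    """One dot-product unit: a row of multiplies followed by an add tree."""
    return sum(x * y for x, y in zip(xs, ys))


def tiled_matvec(matrix, vector, tile_rows, tile_cols):
    """Multiply matrix by vector one tile at a time, accumulating partial sums per row."""
    n_rows, n_cols = len(matrix), len(vector)
    result = [0.0] * n_rows
    for r0 in range(0, n_rows, tile_rows):
        for c0 in range(0, n_cols, tile_cols):
            v_slice = vector[c0:c0 + tile_cols]
            # Each row of the tile is one independent dot product.
            for r in range(r0, min(r0 + tile_rows, n_rows)):
                result[r] += dot(matrix[r][c0:c0 + tile_cols], v_slice)
    return result


m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(tiled_matvec(m, [1, 0, -1], tile_rows=2, tile_cols=2))
```

The tile sizes do not have to divide the matrix dimensions; the last tile in each direction is simply smaller, which is one way tiling keeps a fixed-size unit usable for arbitrary matrix shapes.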

Next, I'd like to show performance results for this architecture. Two years ago, we had a deployment of Stratix V FPGAs. It shows the effective teraflops of this format. 16 bit integer.. we've been playing with our own format Microsoft Floating Point. 4.5Tflops at MSFP5.8. These Stratix are pretty old.

(Demo for latest generation of FPGAs)

Looking at throughput oriented DPU, the latency is 65.81ms. With brainwave, latency is 0.98ms. Under 1 millisecond.

This was done on initial engineering silicon. For production silicon, we're expecting to get 12TOps at 16-bit integer. 90TOps for MSFP8. One question is how the numeric format affects accuracy. Here is the normalized accuracy for three in-house text models, using GRU and LSTM. The orange bar shows what happens when you go to MSFP9, but we've developed a way to fine tune networks for this precision, and you see we recover our accuracy. We're working with MSFP8 and see similar results.

Project BrainWave is our project for accelerating DNNs at cloud scale. We hope it will be fast, friendly and cloud-scale, and expand capabilities of AI in the cloud, providing a way to run higher dimensional RNN networks for NLP and other great applications. We're planning to release to third parties, stay tuned.

Q: When you decrease batch size, what hardware are you evaluating? Hardware utilization as we decrease?

A: We stay highly utilized even as we decrease batch size; even at high batch size, we're still sending requests one by one. (Only one step will be processed?) Right.

Q: Regarding the FP9 and FP8, nine and eight being the number of bits used? (Yes) Is it in any way related to Flexpoint at Intel?

A: We developed this independently of flexpoint, and I'm not able to talk about our numeric format.

Q: In MS, do you really write Verilog for your FPGA, or do you use high level synthesis tool?

A: For this, we are writing System Verilog.

Q: Batchnorm layers, which require batch computation; how do you put that onto the FPGA?

A: Part of the work of the compiler is to do splitting between CPU and FPGA. So things that are not amenable to FPGA, including batchnorm, we're still running them on CPU.

by Edward Z. Yang at December 08, 2017 08:08 PM

MOCHA: Federated Multi-Task Learning (Virginia Smith)

The below is a transcript of a talk by Virginia Smith on MOCHA, at the ML Systems Workshop at NIPS'17.

The motivation for this work is that the way we think about solving ML problems in practice is changing. The typical ML workflow looks like this. You start with a dataset and a problem to solve. Say you want to build a classifier to identify high quality news articles. Next step is to select an ML model to solve the problem. Under the hood, to fit the model to your data, you have to select an optimization algorithm. The goal is to find an optimal model that minimizes some function over your data.

In practice, there's a very important part of the workflow that is missing. For new datasets and systems, the system, and the properties of the system, play a large role in the optimization algorithm we select to fit the model. To give an example, in the past several years, we've seen data that is so large that it must be distributed over multiple machines, in a datacenter environment. I've been thinking about how to perform fast distributed optimization in this setting, when data is so large.

But more and more frequently, data is not coming nicely packaged in datacenter. It's coming from mobile phones, devices, distributed across country and globe. Training ML in this setting is challenging. For one, whereas in datacenter you have hundreds to thousands, here you have millions and billions. Also, in datacenter, devices are similar capability; here, you have phones that are old, low battery, not connected to wifi. This can change ability to perform computation at any given iteration.

Additionally, there's heterogeneity in data itself. For privacy and computation reasons, data can become very unbalanced in network. And it can be non-IID, so much so that there can be interesting underlying structure to the data at hand. I'm excited because these challenges break down into both systems and statistical challenges. The one second summary of this work, thinking about both systems and statistical in this federated setting; the punchline is that systems setting plays a role not only in optimization algorithm but also the model we select to fit. It plays a more important role in this overall workflow.

I'm going to go through how we holistically tackle systems and statistical challenges.

Starting with statistical. The goal is we have a bunch of devices generating data, could be unbalanced; some devices have more data than others. One approach used in past is fit a single model across all of this data. All of the data can be aggregated; you find one model that best achieves accuracy across all of the data simultaneously. The other extreme is you find a model for each of the data devices, and not share information. From systems point of view this is great, but statistically, you might have devices that are only ... that are poor in practice. What we're proposing is something between these two extremes. We want to find local models for each device, but share information in a structured way. This can be captured in a framework called multitask learning.

The goal is to fit a separate loss function for each device. These models can be aggregated in this matrix W, and the function of the regularizer, is to force some structure omega on it. This omega is a task relationship matrix, capturing interesting relationships, e.g., all the tasks are related and you want to learn weights, or most of the tasks are related and there are a few outliers, or there are clusters and groups, or there are more sophisticated relationships like asymmetric relationships. These can all be captured in multitask.
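The objective described above can be written out as follows. This is a sketch in assumed notation, since the transcript names only W and omega: m is the number of devices (tasks), n_t the number of data points on device t, w_t the t-th column of W, \ell_t the loss for device t, and \mathcal{R} the regularizer that couples the models through \Omega.

```latex
\min_{W,\,\Omega} \;\; \sum_{t=1}^{m} \sum_{i=1}^{n_t}
    \ell_t\!\left(w_t^{\top} x_t^{i},\, y_t^{i}\right)
    \;+\; \mathcal{R}(W, \Omega),
\qquad W = [\,w_1, \dots, w_m\,]
```

The two extremes mentioned earlier fall out of the choice of \mathcal{R}: a regularizer that ignores \Omega and treats every w_t separately gives fully local models, while one that forces all w_t to be equal gives a single global model.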

We developed a benchmarking set of real federated data. This includes trying to predict human activity from mobile phone, predict if eating or drinking, land mine, and vehicle sensor; distributed sensor to determine if a vehicle is passing by.

For these various datasets, we compared global, local and MTL. The goal is to fit an SVM model. For each data set, we looked at the average error across tasks, where each model is a task. What you can see is average error, for MTL, is significantly lower than global and local approaches. This makes sense because MTL is much more expressive; it lets you go between these extremes. What's interesting is that in these real data sets, it really helps. Reduction by half. This is a significant improvement in practice.

Given that we'd like to be using multitask learning to model data in a federated environment, the next problem is to figure out how to train this in a distributed setting, thinking about massively distributed. In particular, the goal is to solve the following optimization objective. In looking at how to solve this objective, we note that it's often common to solve for W and omega in an alternating fashion. When you solve for omega, it's done centrally, you just need access to the models. But the W update must be distributed because data is stored across devices. The key component of how to solve this in practice is the W update. The challenge of doing this is communication is extremely expensive. And because of heterogeneity, you may have massive problems with stragglers and fault tolerance; e.g., someone who turns their phone off.

The high level idea for how we're doing this, take a communication efficient method that works well in data center, and modify it to work in federated setting. It will handle MTL as well as stragglers and fault tolerance.

What is the method we're using? The method we're using is CoCoA, which is a state of the art method for empirical risk minimization problems. The thing that's nice about CoCoA is it spans prior work of mini-batch and one-shot communication, by making communication a first class parameter of the method. Make it flexible as possible. It does it by not solving the primal formulation, but the dual. The dual is nice because we can easily approximate it by forming a quadratic approximation to the objective; and this more easily decomposes across machines.

To bring this to the federated setting, a key challenge is figuring out how to generalize it to the MTL framework. A second challenge: in CoCoA, the subproblems are assumed to be solved to some accuracy theta. This is nice because theta varies from 0 to 1, where 0 is exact solve, and 1 is inexact. This can be thought of as how much time you spend on local computation versus communication. However, in fact, this is not as flexible as it should be in the federated setting. There is only one theta that is set for all iterations, all nodes. And because theta cannot be set exactly to one, it cannot handle fault tolerance, where there's no work performed at any iteration. We need to make this communication parameter much more flexible in practice.

How are we doing this? We developed MOCHA. The goal is to solve the multitask learning framework; W and Omega in an alternating fashion. In particular, we're able to form the following dual formulation, similar to CoCoA, so it decomposes. In comparison, we make this much more flexible assumption on the subproblem parameter. This is important because of stragglers: for statistical reasons, unbalance, different distributions, it can be very different in how difficult it is to solve subproblems. Additionally, there can be stragglers due to systems issues. And issues of fault tolerance. So this looks like a simple fix: we make this accuracy parameter more flexible: allow it to vary by node and iteration t, and let it be exactly 1. The hard thing is showing it converges to the optimal solution.

Following this new assumption, and you can't have a device go down every single round, we show the following convergence guarantee. For L-Lipschitz loss, we get a convergence at 1/epsilon; for smooth models (logistic regression) we get a linear rate.

How does this perform in practice? The method is quite simple. The assumption is we have data stored at m different devices. We alternate between solving Omega, and W stored on each. While we're solving w update, it works by defining these local subproblems for machines, and calling solver that does approximate solution. This is flexible because it can vary by node and iteration.

In terms of comparing this to other methods, what we've seen is the following. Comparing MOCHA to CoCoA, compared to Mb-SDCA and Mb-SGD. We had a simulation, with real data, to see what would happen if we do it on wifi. We have simulated time and how close we are to optimal. What you can see is that MOCHA is converging much more quickly to the optimal solution, because MOCHA doesn't have the problem of statistical heterogeneity, and it's not bogged down by stragglers. This is true for all of the different types of networks; LTE and 3G. The blue line and MOCHA and CoCoA, they work well in high communication settings, because they are more flexible. But compared to CoCoA, MOCHA is much more robust to statistical heterogeneity.

What's interesting is that if we impose some systems heterogeneity, some devices are slower than others, we looked at imposing low and high systems heterogeneity, MOCHA with this additional heterogeneity, it's a two orders of magnitude speedup to reach optimal solution.

And for MOCHA in particular, we looked at the issue of fault tolerance. What we're showing here, we're increasing the probability a device will drop out at any iteration. Going up until half the devices drop out, MOCHA is still fairly robust, converging in almost the same amount of time. But what we see with the green dotted line: if the same device drops out every iteration, it doesn't converge. This shows the assumption we made makes sense in practice.

The punchline is that in terms of thinking about this new setting, training ML on these massive networks of devices, this is both a statistical and systems issue. We've addressed it in a holistic manner. The code is available online. I also want to reiterate about the SysML conference in February.

Q: When you compare global and local? Why is it always better than global?

A: The motivation why you want to use local model over global model, is that if you have a local data a lot, you might perform better. It boosts the overall sample size. I have some additional experiments where we took the original data, and skewed it even further than it already was. We took the local data, and there was less data locally, and they have global approaches. That's just a function of the data in the devices.

Q: I really like how your method has guarantees, but I'm wondering about an approach where you create a metalearning algorithm locally and have it work locally?

A: That's worth looking into empirically, since you can do fine tuning locally. What we were trying to do first was converge to exact optimal solution, but you might want to just work empirically well, would be good to compare to this setting.

by Edward Z. Yang at December 08, 2017 06:15 PM

A Machine Learning Approach to Database Indexes (Alex Beutel)

The below is a transcript of a talk by Alex Beutel on machine learning database indexes, at the ML Systems Workshop at NIPS'17.

DB researchers think about their research differently. You have a system that needs to work for all cases. Whereas in ML, we have a unique circumstance, I'll build a model that works well. In DB, you have to fit all.

An example of this is a B-tree. A B-tree works for range queries. We have records, key, we want to find all records for a range of keys. 0-1000, you build a tree on top of a sorted array, to quickly look up the starting point in the range. What if all my data, all of the keys, from zero to million... it becomes clear, you don't need the whole tree above. You can use the key itself as an offset into the array. Your lookup is O(1), O(1) memory, no need for extra data structure.
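The contrast between the general lookup and the dense-key special case can be sketched in a few lines of Python. Binary search over a sorted list stands in for the B-tree here, and the function names are made up for illustration.

```python
import bisect


def lookup_sorted(keys, key):
    """General case: binary search over a sorted array, O(log n) comparisons."""
    i = bisect.bisect_left(keys, key)
    return i if i < len(keys) and keys[i] == key else None


def lookup_dense(keys, key, first_key=0):
    """Special case: keys are exactly first_key, first_key + 1, ...; the key is its own offset."""
    i = key - first_key
    return i if 0 <= i < len(keys) else None


keys = list(range(1_000_000))          # every key from zero to a million
print(lookup_sorted(keys, 123_456))    # found by searching
print(lookup_dense(keys, 123_456))     # found by arithmetic alone
```

The second function is only correct because of what is known about the data, which is exactly the kind of pattern a general-purpose index cannot assume.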

Now, we can't go for each app, we can't make a custom implementation to make use of some pattern. DB scale to any application, we don't want to rebuild it any time.

But ML excels in this situation. It works well for a wide variety of distributions, learn and make use of them effectively.

This is the key insight we came to. Traditional data structures make no assumptions about your data. They work under any distribution, and generally scale O(n). Interestingly, learning, these data distributions, can offer a huge win. What we're trying to go to, is instead of scaling to size of data, we scale to complexity of it. With linear data, it's O(1). For other distributions, can we leverage this?

There are three data structures underlying databases. There are B-Trees; range queries, similarity search. Main index. Hash maps for point lookups; individual records. This is more common throughout CS. And bloom filters, are really common for set-inclusion queries. Do I have a key. If your record is stored on disk, checking first if there's a record with that key is worthwhile. We're going to focus entirely on B-trees.

B-trees take a tree like structure with high branching factor. What makes it really effective is that it's cache efficient. You can store top level nodes in your cache where it's fast to look it up, maybe others in main memory, and the actual memory on disk. By caching the hierarchy appropriately, it makes it efficient. At a high level, a B-tree maps a key to a page, some given place in memory. Once it finds that page, it will do some local search to find the particular range of that key. That could be a scan or binary search; we know the range will be the position from start of page to page size.

At an abstract level, the B-tree is just a model. It's taking the key, and trying to estimate the position. What we have in this case, we want to search in this error range to find the ultimate record. At a high level, it would mean that we can't use just any model. We need err_min and err_max. But we have all the data. At index construction time, you know all the data you're executing against, and you can calculate what the model's min and max error is.

One interesting thing is this is just a regression problem. What you're really modeling is just the CDF. On this plot here, the X axis is your keys, the Y axis is your position. This is modeling where your probability mass is located; where your data is in the keyspace. CDFs are studied somewhat, but not a ton, in the literature. This is a nice new implication of research.
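A minimal sketch of this idea, assuming a single linear model fitted by least squares (the talk uses neural networks; the class and function names here are hypothetical). The model predicts a position, err_min and err_max are computed over all keys at construction time, and the lookup only searches inside that error range.

```python
import bisect


def fit_line(keys):
    """Least-squares fit of position ~ slope * key + intercept over sorted keys."""
    n = len(keys)
    mean_k = sum(keys) / n
    mean_p = (n - 1) / 2
    var = sum((k - mean_k) ** 2 for k in keys)
    cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(keys))
    slope = cov / var if var else 0.0
    return slope, mean_p - slope * mean_k


class LinearIndex:
    def __init__(self, keys):
        self.keys = keys  # sorted, distinct
        self.slope, self.intercept = fit_line(keys)
        # All the data is known at construction time, so the worst-case
        # under- and over-estimates of the model can be computed exactly.
        errors = [p - self._predict(k) for p, k in enumerate(keys)]
        self.err_min, self.err_max = min(errors), max(errors)

    def _predict(self, key):
        return int(self.slope * key + self.intercept)

    def lookup(self, key):
        guess = self._predict(key)
        n = len(self.keys)
        lo = min(n, max(0, guess + self.err_min))
        hi = max(lo, min(n, guess + self.err_max + 1))
        # Local search restricted to the guaranteed error range.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < n and self.keys[i] == key else None


index = LinearIndex([x * x for x in range(1, 1001)])
print(index.lookup(250_000), index.err_min, index.err_max)
```

Every stored key is guaranteed to be found, because its true position lies within the recorded error range by construction; the quality of the model only affects how wide that range is, and therefore how long the final search takes.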

We thought, OK, let's try this out straightaway. Train a model, see how fast it is. We looked at 200M server logs, timestamp key, 2 layer NN, 32-width, relatively small by ML. We train to predict position, squared error. A B-Tree executes in 300ns. Unfortunately, with the model, it takes 80000ns. By most ML model speeds, this is great. If you're looking at executing on server, great. But this doesn't work for a database.

There are a bunch of problems baked into this. TF is really designed for large models. Think about translation or superresolution images; these are hefty tasks. We need to make this fast for database level speed. Second, b-trees are great for overfitting. There's no risk of over-fitting in this context. They're also cache efficient; that's not looked at in ML. The last thing is local search in the end. Is that really the most effective way of ultimately finding that key? I'm skipping that part because it's fairly detailed, I'll focus on first three.

The first part is just the raw speed of execution of the ML model. This was built really by Tim, this Learning Index Framework program. What it does is it lets you create different indexes under different configurations. For one thing, it lets you do code compilation for TF, ideas from Tupleware, where you can take a linear model and execute it extremely quickly. We can also train simple models. Use TF for more complex gradient descent based learning; extract weights, and have inference graph be codegenned. And we can do a lot of autotuning, to find what the best model architecture is. We know ahead of time what the best training is. We can make pretty smart decisions about what works best.

The next problem is accuracy and speed. If I have 100M records, I narrow down quickly from 1.5M to 24K, with each step down this tree. Each one of those steps is 50-60 cycles to look through that page, and to find what the right branch is. So we have to get to an accuracy of 12000, within 500 mul/add, to beat these levels of hierarchy, which are in cache. This is a steep task. The question is what is the right model? A really wide network? Single hidden layer? This scales nicely, we can fit in 256 layer reasonably. We could go deeper... the challenge is we have width^2, which need to be parallelized somehow. The challenge is, how do we effectively scale this. We want to add capacity to the model, make it more and more accurate, with increased size, without becoming too...

We took a different approach, based on mixed experts. We'll have a key, have a really simple classifier. We get an estimate. Then we can use that estimate to find it at the next stage. Narrow down the CDF range, and try to be more accurate in the subset of space. It will still get key as input; given key, give position, but more narrow space of keys. We build this down, and we'll walk down this hierarchy. This decouples model size and complexity. We have a huge model, overfitting, but we don't have to execute all of the sparsity that you would have to do from a pure ML view. We can decouple it usefully. The nice thing we can do is fall back to B-trees for subsets that are difficult to learn in a model. The LIF framework lets us substitute it in easily. In the worst case, B-tree. Best case, more efficient.
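The staged structure can be sketched as a two-level index: a trivial root model routes each key to a leaf, and each leaf has its own small model and its own error bounds over a narrower range of keys. This is a toy under stated assumptions, not the Learning Index Framework: the class name, the linear root, the endpoint-fitted leaves and the number of leaves are all chosen for illustration, and keys are assumed to be sorted, distinct integers.

```python
import bisect


class TwoStageIndex:
    def __init__(self, keys, n_leaves=16):
        self.keys = keys                              # sorted, distinct integers
        self.n_leaves = n_leaves
        self.k_min = keys[0]
        self.k_span = keys[-1] - keys[0] + 1
        # Stage 1 decides which leaf owns each key; group positions accordingly.
        groups = [[] for _ in range(n_leaves)]
        for pos, key in enumerate(keys):
            groups[self._route(key)].append(pos)
        self.leaves = [self._fit_leaf(g) for g in groups]

    def _route(self, key):
        """Stage 1: a linear model over the key range, clamped to a valid leaf."""
        leaf = (key - self.k_min) * self.n_leaves // self.k_span
        return min(max(leaf, 0), self.n_leaves - 1)

    @staticmethod
    def _predict(k0, first, slope, key):
        return first + int(slope * (key - k0))

    def _fit_leaf(self, positions):
        """Stage 2: a line through the leaf's endpoints, plus its exact error bounds."""
        if not positions:
            return None
        first, last = positions[0], positions[-1]
        k0, k1 = self.keys[first], self.keys[last]
        slope = (last - first) / (k1 - k0) if k1 > k0 else 0.0
        errors = [p - self._predict(k0, first, slope, self.keys[p]) for p in positions]
        return (k0, first, slope, min(errors), max(errors))

    def lookup(self, key):
        leaf = self.leaves[self._route(key)]
        if leaf is None:
            return None
        k0, first, slope, err_min, err_max = leaf
        guess = self._predict(k0, first, slope, key)
        n = len(self.keys)
        lo = min(n, max(0, guess + err_min))
        hi = max(lo, min(n, guess + err_max + 1))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < n and self.keys[i] == key else None


keys = [x * x + (x % 7) for x in range(2000)]
index = TwoStageIndex(keys)
print(index.lookup(keys[1234]))
```

Only one leaf model is evaluated per lookup, so adding more leaves increases total model capacity without increasing the cost of a single query. A leaf whose error range turns out to be too wide could be replaced by a plain search over its key range, which is the role the talk gives to the B-tree fallback.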

The quick results version here, is we find we have four different data sets. Most are integer data sets; last one is string data set. We're trying to save memory and speed; we save memory hugely; these are really simple models. Linear with simple layer, with possibly two stages. We're able to get a significant speedup in these cases. Server logs one is interesting. It looks at a high level very linear, but there's actually daily patterns to this data accessed. Maps is more linear; it's longitudes of spaces. We created synthetic data that's log normal, and here we see we can model it effectively. Strings is an interesting challenge going forward; your data is larger and more complicated, building models that are efficient over a really long string is different; the overall patterns are harder to have intuition about. One thing really worth noting here, it's not using GPUs or TPUs; it's purely CPU comparison. Apples-to-apples.

This is mostly going into the B-tree part. This is a regression model looking at CDF of data. We can use these exact same models for hash maps. With bloom filters, you can use binary classifiers. I have a bunch of results in the poster in the back.

A few minutes to talk about room for improvement. There are a bunch of directions that we're excited to explore. Obvious one is GPUs/TPUs. It's CPUs because that's when B-trees are most effective; but scaling is all about ML. Improving throughput and latency for models with GPUs, exciting going forward. Modeling themselves; there's no reason to believe hierarchy of models is the right or best choice; it's interesting to build model structures that match your hardware. Memory efficient, underlying architecture of GPUs. In the scale of ns we need for database. Multidimensional indexes; ML excels in high numbers of dimension; most things are not looking at a single integer feature. There's interesting question about how you map to multidimensional indexes that are difficult to scale. If we have a CDF, you can approximately sort it right there. And inserts and updates, assumed read-only databases. Large class of systems, but we get more data. How do we balance overfitting with accuracy; can we add some extra auxiliary data structures to balance this out?

Q: One thing is that when... this problem, we solved pretty well without ML. When we introduce ML, we should introduce new metrics. We shouldn't make our system more fragile, because distribution changes. What would be the worst case when distribution changes?

A: As the data becomes updated... in the case of inference and updates, there's a question about generalization. I think you could look at it from the ML point of view: statistically, test model today on tomorrow's inserts. (It's a method. If I use this method, and then train it with data that I don't yet have... and do.) The typical extrapolation to future generalization of ML. Guarantees are hard. There will be a worst case that is awful... but the flip side, that's the ML side... generalization. There's also a point of view, I couple this with classic data structure. We coupled modeling with classic data structures: search, bloom filter case, so you don't actually have this work. You catch worst case.

Let me add to that. If you assume that the inserts follow the same distribution as trained model, then the inserts become all one operation. They're even better. Suppose they don't follow the same distribution? you can still do delta indexing. Most systems do do delta indexing. So inserts are not a big problem.

Q: (Robert) Most of the inputs were one or two real numbers, and outputs are a single real number. How does it work if you use a low degree polynomial, or a piecewise linear classifier on the different digits?

A: In the case of strings, it's not a single input. (Treat it as integer?) Well, it's possibly a thousand characters long. It's not the best representation. Different representations work really well. The last thing I want to say, piecewise linear could work, but when you run 10k, 100k submodels, it's slow. Hierarchy helps. Polynomials are interesting, depends on data source.

Q: Can you comment how bad your worst case is? Average numbers?

A: We specifically always have a spillover. The worst case is defaulting to typical database. We haven't had a case where you do worse, because we'll default to B-tree. (Deterministic execution?) Not inference time.

by Edward Z. Yang at December 08, 2017 06:11 PM

Ray: A Distributed Execution Framework for Emerging AI Applications (Ion Stoica)

The below is a transcript of a talk by Ion Stoica on Ray, at the ML Systems Workshop at NIPS'17.

We've been working on it at Berkeley for more than one year. Over the past years, there's been tremendous progress in AI. Ad targeting, image&speech, many more. Many applications are based on supervised learning with DNNs. Supervised plus unsupervised are the two dominant approaches.

However, the next generation of AI applications will be very different. They're deployed in mission critical scenarios, need to continually learn from a rapidly changing env. Robotics, self driving cars, unmanned drones, dialogue systems. Implementing this new generation of AI applications requires a broader range of techniques. Stochastic optimization, parallel simulations, many more.

Ray provides a unified platform for implementing these approaches. To motivate Ray, I'll use reinforcement learning. RL learns by interacting with env. A policy mapping from state/observation to action that maximizes a certain reward. What are the reqs of RL? Many applications exhibit nested parallelism: search, where they use data parallel SGD, which then calls a component that does policy evaluation with a model to simulate, that runs in parallel on multiple CPUs. Second, these workloads can be highly heterogenous in hardware and time. Many of these computations require not only CPUs, but GPUs TPUs and FPGAs. Second, this computation can take wildly different times. Simulate a chess game: 3 moves to lose, or 50 moves to win or draw. And in robotics, we need to process in real time, processing the data from sensors in parallel, tens of ms.

Meeting these requirements is not easy. To meet these requirements, you need a system that is flexible and performant. Flexible: it should create and schedule tasks dynamically, and support arbitrary dependencies. Perf: it should scale to hundreds of nodes, sub-millisecond latency, millions of task, and efficiently share numeric data.

Next, I'm going to say how we meet these challenges. Flexibility? We provide a very flexible model: dynamic task graphs. On top of this, we give two models: parallel tasks and actors.

To talk about parallel tasks, here is Python code with two functions: one reads an array from a file, and the other adds two arrays. The code is simple: it creates two arrays a and b from file1 and file2, and sums them up. Parallelizing this program is quite easy. To parallelize a function, we add a ray.remote decorator to it. When you invoke these functions, you need to invoke the remote method. Remote doesn't return the object itself, just the object id. This is very similar to the futures abstraction. To get the actual object, you must invoke ray.get on the object id.
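The futures pattern described here can be sketched with the Python standard library alone. This is not Ray code: the thread pool stands in for the cluster, `submit` plays the role of the remote call, and `.result()` plays the role of ray.get. The file contents and the names `fake_files`, `a_id`, `b_id`, and `sum_id` are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for files on disk.
fake_files = {"file1": [1, 2, 3], "file2": [10, 20, 30]}

def read_array(name):
    # Pretend to read an array from a file.
    return fake_files[name]

def add(a_future, b_future):
    # Resolve both inputs (blocking until they are ready), then add elementwise.
    a, b = a_future.result(), b_future.result()
    return [x + y for x, y in zip(a, b)]

with ThreadPoolExecutor(max_workers=4) as pool:
    # Each submit returns immediately with a future, not the value itself.
    a_id = pool.submit(read_array, "file1")
    b_id = pool.submit(read_array, "file2")
    sum_id = pool.submit(add, a_id, b_id)
    # Blocking call: waits for the whole dependency chain to finish.
    result = sum_id.result()

print(result)
```

The two reads can run concurrently because neither submit call waits for its task to finish; only the final `.result()` blocks.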

To get a better idea of how Ray is executing, let's execute a simple program. Assume the files are stored on different nodes. When you call read_array on file1, Ray schedules read_array on the appropriate node. The remote call returns immediately, before the actual read finishes. This allows the driver to run the second task in parallel, on the node that holds file2, and launch the add remote function. All functions have been scheduled remotely, but none of them have finished. To actually get the result, you have to call ray.get on the result. This is a blocking call; you'll wait for the entire computation graph to be executed.

Tasks are very general, but they are not enough. Consider that you want to run a simulator, and this simulator is closed source. In this case, you do not have access to the state. You have state, action, simulations, but to set up the state in the simulator, you cannot do it. There is another use case, where the state is too expensive to create. For example, DNNs on GPUs: in this case, you want to initialize it once, rather than reinitialize it for each simulation.

In order to address these use cases, we add an Actor abstraction. An actor is just a remote class. If you have a Counter, you mark it ray.remote, and then when you create the class or invoke methods, you use the remote keyword. This is a computation graph for this very simple example. Notice the method invocations also return object identifiers. To get the results, you need to call ray.get on the object identifiers. Ray also allows you to specify the number of resources, for actors and tasks.
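A rough standard-library sketch of the actor idea, again not Ray's implementation: a stateful object lives on one dedicated thread, method calls are queued and run one at a time, and each call returns a future. The `ActorHandle` wrapper and its `call` and `shutdown` methods are hypothetical names invented for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor

class Counter:
    def __init__(self):
        self.value = 0

    def inc(self):
        self.value += 1
        return self.value

class ActorHandle:
    """Hypothetical wrapper: owns one object and runs all of its
    method calls on a single dedicated thread, so calls are serialized."""

    def __init__(self, cls, *args):
        self._executor = ThreadPoolExecutor(max_workers=1)
        # Construct the object on the actor's own thread.
        self._obj = self._executor.submit(cls, *args).result()

    def call(self, method, *args):
        # Returns a future immediately; the state stays inside the actor.
        return self._executor.submit(getattr(self._obj, method), *args)

    def shutdown(self):
        self._executor.shutdown()

counter = ActorHandle(Counter)
ids = [counter.call("inc") for _ in range(5)]
values = [f.result() for f in ids]
counter.shutdown()

print(values)
```

Because the executor has a single worker, the five increments are applied in submission order and never interleave.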

To put things together, and provide a more realistic example: evolution strategies, a scalable form of RL, by Salimans et al. at OpenAI. In a nutshell, evolution strategies try lots of policies, and try to see which runs best. This is highly parallel. So here is pseudocode for evolution strategies. There is a worker that does a simulation and returns the reward; you create twenty workers, and then do 200 simulations and update the policy. Again, if you want to parallelize this code, we have to add a bunch of remote, and now on the right hand side, you'll notice I'm also showing the computation graph. When you invoke Worker.remote, you create 20 remote workers to do it in parallel. And you invoke the methods with the remote keyword. Again, notice that in this case, the results are not the rewards themselves, but ids to the reward objects. In order to get the rewards to update the policy, you have to call ray.get.
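A toy version of that loop, using only the standard library. It is a sketch under stated assumptions, not the algorithm from the talk's slides or from the OpenAI reference code: the reward is a made-up quadratic with optimum `TARGET`, the update subtracts a mean baseline, and the values of `sigma` and `lr` are arbitrary. The thread pool stands in for the 20 remote workers.

```python
import random
from concurrent.futures import ThreadPoolExecutor

TARGET = [0.5, -0.3, 0.8]  # hypothetical optimum of the toy reward

def reward(policy):
    # Higher is better; maximal (zero) when policy == TARGET.
    return -sum((p - t) ** 2 for p, t in zip(policy, TARGET))

class Worker:
    def __init__(self, seed):
        self.rng = random.Random(seed)  # per-worker RNG keeps runs repeatable

    def do_simulation(self, policy, sigma):
        # Evaluate a randomly perturbed copy of the policy.
        noise = [self.rng.gauss(0.0, 1.0) for _ in policy]
        perturbed = [p + sigma * n for p, n in zip(policy, noise)]
        return noise, reward(perturbed)

def update_policy(policy, results, sigma, lr):
    # Move along noise directions, weighted by how much better than
    # average each perturbation scored.
    rewards = [r for _, r in results]
    baseline = sum(rewards) / len(rewards)
    new_policy = list(policy)
    for noise, r in results:
        for i, n in enumerate(noise):
            new_policy[i] += lr * (r - baseline) * n / (len(results) * sigma)
    return new_policy

sigma, lr = 0.1, 0.05
policy = [0.0, 0.0, 0.0]
initial_reward = reward(policy)
workers = [Worker(seed) for seed in range(20)]

with ThreadPoolExecutor(max_workers=20) as pool:
    for _ in range(200):
        # Launch all simulations; each submit returns a future.
        futures = [pool.submit(w.do_simulation, policy, sigma) for w in workers]
        # Block until every reward is available, then update.
        results = [f.result() for f in futures]
        policy = update_policy(policy, results, sigma, lr)

final_reward = reward(policy)
print(initial_reward, final_reward)
```

The shape matches the description: the rewards come back as futures, and the policy update only happens after they are all resolved.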

This hopefully gives you a flavor of how to program in Ray. Next, I'll switch gears and present the system design of Ray: how Ray gets high performance and scalability.

Like many classic computing frameworks, it has a driver, and a bunch of workers. The driver runs a program, and the workers run tasks remotely. You can also write and run a bunch of actors. The driver and actors on the same node share data through shared memory, and the workers and actors across nodes share data through a distributed object store we built. Each node has a local scheduler, so when a driver wants to run another task, the local scheduler tries to schedule it locally. If it cannot schedule it locally, it invokes the global scheduler, which will schedule it on another node that has resources. Actor, remote method. Finally, what we do, and one essential part of the design, is we have a Global Control State (GCS). It takes all of the state of the system, and centralizes it. The metadata for the objects, in objects table, function. This allows the rest of the system to be stateless. All these other components can fail, you can bring them up, get the most recent data from global control state. It also allows us to parallelize the global scheduler, because these replicas are going to share the same state in the GCS.
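The local-then-global scheduling decision can be sketched as follows. This is an illustration of the idea only; the `Node`, `GlobalScheduler`, and `submit` names, the CPU-count resource model, and the first-fit placement are assumptions made for the sketch, not Ray's actual scheduling policy.

```python
class Node:
    def __init__(self, name, cpus):
        self.name = name
        self.free_cpus = cpus

    def try_schedule(self, task_cpus):
        # Local scheduler: accept the task only if this node has room.
        if self.free_cpus >= task_cpus:
            self.free_cpus -= task_cpus
            return True
        return False

class GlobalScheduler:
    def __init__(self, nodes):
        self.nodes = nodes

    def schedule(self, task_cpus):
        # Place the task on the first node with enough free resources.
        for node in self.nodes:
            if node.try_schedule(task_cpus):
                return node.name
        return None  # no node can run the task right now

def submit(local_node, global_scheduler, task_cpus):
    # Try the local scheduler first; fall back to the global one.
    if local_node.try_schedule(task_cpus):
        return local_node.name
    return global_scheduler.schedule(task_cpus)

node_a, node_b = Node("A", cpus=2), Node("B", cpus=4)
global_scheduler = GlobalScheduler([node_a, node_b])

# A driver on node A submits four 1-CPU tasks.
placements = [submit(node_a, global_scheduler, 1) for _ in range(4)]
too_big = submit(node_a, global_scheduler, 8)

print(placements, too_big)
```

The first two tasks stay on the submitting node; once it is full, the remaining ones spill over to the other node through the global scheduler.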

Another nice effect of having a GCS is that it makes it easy to build a bunch of profiling and debugging tools.

This design is highly scalable. Let me try to convince you why this is. To make the GCS scalable, we just shard it. All these keys are pseudorandom, so it's easy to shard and load balance. The scheduler, as you see, is distributed; each node has a local scheduler, and Ray tries to schedule tasks which are spawned by a worker/driver locally, on the same node. If the global scheduler becomes a bottleneck, we can also replicate it. Finally, in some systems, even if the scheduler is super scalable, as in Spark, there's another bottleneck: only the driver can launch new tasks. In order to get around that, we allow in Ray the workers and actors to launch tasks. Really, there is no single bottleneck point.
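Why pseudorandom keys make sharding easy can be seen in a small sketch. The `ShardedStore` class is hypothetical, and in-memory dicts stand in for the real storage shards. The keys here are sequential strings, so the sketch hashes them to get the pseudorandom spread that the talk says Ray's keys already have.

```python
import hashlib

class ShardedStore:
    """Hypothetical key-value store split across several shards."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, key):
        # Hash the key so that keys spread roughly evenly over shards.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, value):
        self.shards[self._shard_for(key)][key] = value

    def get(self, key):
        return self.shards[self._shard_for(key)][key]

store = ShardedStore(num_shards=4)
for i in range(1000):
    store.put(f"object-{i}", {"location": f"node-{i % 10}"})

sizes = [len(shard) for shard in store.shards]
print(sizes)
```

Each key maps to exactly one shard without any coordination between shards, and the load ends up close to even.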

A few words about implementation. The GCS is implemented with Redis. For object store, we leverage Apache Arrow. For fault tolerance, we use lineage based fault tolerance like Spark. Actors are part of task graph; methods are treated as tasks, so we have a uniform model for providing fault tolerance.
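Lineage-based fault tolerance can be illustrated with a minimal single-process sketch: for every object, record which function and which inputs produced it, and if the object is lost, re-run that function. The `LineageStore` class and its methods are hypothetical names for this sketch; it assumes tasks are deterministic and says nothing about how Ray actually stores lineage.

```python
class LineageStore:
    def __init__(self):
        self.objects = {}    # object id -> value (entries may be lost)
        self.lineage = {}    # object id -> (function, ids of its inputs)
        self.reexecuted = 0  # number of tasks re-run to rebuild objects
        self._next_id = 0

    def submit(self, func, *arg_ids):
        # Record the lineage first, then compute and store the result.
        obj_id = self._next_id
        self._next_id += 1
        self.lineage[obj_id] = (func, arg_ids)
        self.objects[obj_id] = func(*[self.get(a) for a in arg_ids])
        return obj_id

    def get(self, obj_id):
        if obj_id not in self.objects:
            # Object was lost: replay the task that produced it,
            # rebuilding any lost inputs recursively.
            func, arg_ids = self.lineage[obj_id]
            self.reexecuted += 1
            self.objects[obj_id] = func(*[self.get(a) for a in arg_ids])
        return self.objects[obj_id]

    def lose(self, *obj_ids):
        # Simulate a node failure that drops some objects.
        for obj_id in obj_ids:
            self.objects.pop(obj_id, None)

store = LineageStore()
a = store.submit(lambda: [1, 2, 3])
b = store.submit(lambda: [10, 20, 30])
c = store.submit(lambda x, y: [p + q for p, q in zip(x, y)], a, b)

store.lose(a, c)         # pretend the node holding a and c died
rebuilt = store.get(c)   # transparently reconstructs a, then c

print(rebuilt, store.reexecuted)
```

The caller of `get` never sees the failure; the cost shows up only as re-executed tasks.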

So now some evaluation results. This plot represents the number of tasks per second, and you can see the number of nodes; it scales linearly. You can schedule over 1.8 M/s. Latency of local task execution is 300us, the latency of remote task is 1ms. This plot illustrates fault tolerance. You may ask why you care about fault tolerance? The problem is that, without it, you need to handle in your program that the simulation may not finish; this makes the program far more complicated, even if you're willing to ignore some results. Here, on this axis, you have the time in seconds, you have two y axes, number of nodes in system, and the throughput. As you can see, the number of nodes is starting at 50, then 25, then to 10, and goes back to 50. In the red area, we show the number of tasks per second; it follows, as you may expect, the number of nodes in the system. If you look a little bit, there are some drops; every time, you have a drop in the number of tasks. It turns out this is because of the object reconstruction. When some nodes go away, you lose the objects on the node, so you have to reconstruct them. Ray and Spark reconstruct them transparently. With blue, you can see the re-executed tasks. If you add them, you get a very nice filling curve.

Finally, for evolution strategies, we compared with reference ES from... we followed the OpenAI, and on the X axis, you have number of CPUs, mean time to solve the particular problem; simulator, learning to run, there are three points to notice. One is, as expected, as you add more CPUs, the time to solve goes down. The second is that Ray is actually better than the reference ES, better results, even though the reference ES is specialized for beating. Third, for a very large number of CPUs, ref couldn't do it, but Ray could do better and better. I should add that Ray takes half the amount of code, and was implemented in a couple of hours.

Related work: look, in this area, there are a huge number of systems, that's why you are here, lots of systems. Ray is complementary to TF, MXNet, PyTorch, etc. We use these systems to implement DNNs. We integrate with TF and PyT. There are more general systems, like MPI and Spark; these have limited support for nested parallelism; computation model, and they have much coarser grained tasks.

To conclude, Ray is a system for high performance and flexibility and scalability. We have two libraries on top of Ray: RLlib and Ray Tune. It's open source, please try, we'd love your feedback. Robert, Philip, Alex, Stephanie, Richard, Eric, Heng, William, and many thanks to my colleague Michael Jordan.

Q: In your system, you also use actor; actor is built up on shared memory. Do you have separate mailbox for actors? How do you do that?

A: No, the actors communicate by passing the argument to the shared object store.

Q: What is the granularity of parallelism? Is it task atomic, or do you split task?

A: The task granularity is given by the overhead for launching a task and scheduling the task. The tasks we are targeting are low, a few ms. A task is not implementing something like an activation function; we leave that job to much better frameworks. And a task executes atomically; method calls in the actors are serialized.

Q: Question about fault tolerance: in Spark, when you don't have a response for some time, it says this node died. Here, the task is much more, because NN, something like that. So we don't have the same time.

A: We do not do speculation; implicit speculation in Ray, for the reason you mentioned.

Q: Can you give me more details on the reference implementation, and why it doesn't scale?

A: The reference implementation, it's the OpenAI implementation, Robert here can provide you a lot more detailed answers to that question.

by Edward Z. Yang at December 08, 2017 06:07 PM