# Installing OCaml under openSUSE 11.4, or: “the compilation of conf-ncurses failed”

Recently I decided to learn the basics of OCaml and I spent yesterday installing the compiler and some basic tools. On my machine at work I have a Debian 7 installation, while on my home laptop I have openSUSE 11.4. Both systems are quite dated and ship with an OCaml 3.x compiler, which is five years old. Obviously, I wanted the latest version of the language. I could have compiled OCaml from source – and in fact I have done that in the past to compile the latest version of Coq – but luckily there is a tool called OPAM (OCaml Package Manager). OPAM makes it easy to download and install a desired version of the OCaml compiler. As the name implies, OPAM can also be used for managing the packages installed for a particular compiler version.

The installation process went very smoothly on my Debian machine, but on openSUSE I ran into problems. After getting the latest compiler I wanted to install ocamlfind, a tool required by a project I wanted to play with. To my disappointment, the installation ended with an error:

[ERROR] The compilation of conf-ncurses failed at "pkg-config ncurses".
This package relies on external (system) dependencies that may be missing.
'opam depext conf-ncurses.1' may help you find the correct installation for your system.

I verified that I had indeed installed the development files for the ncurses library, as well as the pkg-config tool. Running the suggested opam command also didn't find any missing dependencies, and the log files from the installation turned out to be completely empty, so I was left clueless. Googling revealed that I am not the first to encounter this problem, but offered no solution.

I did some more reading on pkg-config and learned that: a) it is a tool that provides meta-information about installed libraries, and b) in order to recognize that a library is installed it requires extra configuration files (aka *.pc files) provided by the library. Running pkg-config --list-all revealed that ncurses is not recognized as installed on my system, which suggested that the relevant *.pc files are missing.

Some more googling revealed that the ncurses library can be configured and compiled with the --enable-pc-files switch, which should build the files needed by pkg-config. I got the sources for the ncurses version installed on my system (5.7), only to learn that this build option is unsupported, which explains why the files are missing on my system. I then got the sources for the latest version of ncurses (6.0), configured them with --enable-pc-files and compiled them, only to learn that the *.pc files were still not built. After several minutes of debugging I realized that, for some unexplained reason, the configure-generated script which should build the *.pc files (located at misc/gen-pkgconfig) did not receive +x (executable) permission. After adding this permission manually I ran the script and got five *.pc files for the ncurses 6.0 library.

Then I had to edit the files to match the version of ncurses on my system – the relevant information can be obtained by running ncurses5-config --version. The only remaining step was to place the five *.pc files in a place where pkg-config can find them. On openSUSE this was /usr/local/pkgconfig, but this can differ between various Linux flavours.

After all these magical incantations the installation of ocamlfind went through fine and I can enjoy a working OCaml installation on both of my machines. Now I’m waiting for the “Real World OCaml” book ordered from Amazon (orders shipped from UK Amazon to Poland tend to take around two weeks to arrive).

# Speaker's Trust removes all trace of Leanne Mohamad

Leanne Mohamad, a British-Palestinian, won her local round of the Speaker's Trust "Speak Out" competition with an impassioned talk about Palestinian human rights. Her talk, and all signs that she had won the competition, were later deleted from the competition web site.

A colleague characterised her talk as "full of anti-Semitic and aggressive material from sources in the Middle East". I watched the video, and heard nothing anti-Semitic. It is a word that should be used with care.

Speaker's Trust has released a statement regarding the matter:
> “There are two fundamental rules that are made explicit during the training: the speech must have a positive and uplifting message – in fact this is one of the core terms of the agreement with the Jack Petchey Foundation [and] a speaker should never inflame or offend the audience or insult others and this, by definition, means that propaganda is ruled out absolutely from the outset… Speakers Trust and Jack Petchey Foundation judging panel decided unanimously against sending Leanne Mohamad through to the next stage and she will not be speaking at the Grand Final. These were precisely our concerns.”

And another:

> Our primary duty of care is to the young people we work with and we cannot tolerate any form of insult or abuse. We are concerned and saddened that Leanne’s experience has been less than positive.
>
> Leanne Mohamad is the Redbridge Regional Final winner and there has never been any suggestion that she should be disqualified. Almost 190,000 young people have spoken out over the years on any topic which they feel passionately about and none has ever been banned from the process or silenced.
>
> We are, however, a small charity without the capacity to moderate comments 24 hours a day and it was considered essential to protect Leanne by temporarily suspending the regional video over the bank holiday, until we were able to consult with her school and family.
>
> Of 37 talented regional Champions only fifteen can be voted through to the Grand Final. This selection process took place on Saturday 21st May based on standard judging criteria and without any external influence or input.
>
> The general “rules” of effective public speaking are guidelines to help speakers to create a speech that will connect with a large and diverse audience and every speech was judged on its own merits. At the heart of what we do lies the determination that all of our young speakers, irrespective of background, race or creed, should be able to speak out in a safe and supportive environment.

# Omar Barghouti banned from travelling by Israel for supporting BDS

Omar Barghouti, a Palestinian living in Israel, has been denied the right to travel, for no reason other than that he is an effective supporter of BDS (Boycott, Divestment, Sanctions). Glenn Greenwald interviews him in The Intercept.
Despite having lived in Israel for 22 years with no criminal record of any kind, Omar Barghouti (above) was this week denied the right to travel outside the country. As one of the pioneers of the increasingly powerful movement to impose boycotts, sanctions, and divestment measures (BDS) on Israel, Barghouti, an articulate, English-speaking activist, has frequently traveled around the world advocating his position. The Israeli government’s refusal to allow him to travel is obviously intended to suppress his speech and activism. Prime Minister Benjamin Netanyahu was one of the world leaders who traveled last year to Paris to participate in that city’s “free speech rally.” ...

Sarah Leah Whitson of Human Rights Watch told the Electronic Intifada that “Israel’s refusal to renew Barghouti’s travel document appears to be an effort to punish him for exercising his right to engage in peaceful, political activism, using its arsenal of bureaucratic control over Palestinian lives.” She added: “Israel has used this sort of control to arbitrarily ban many Palestinians from traveling, as well as to ban international human rights monitors, journalists, and activists from entering Israel and the occupied Palestinian territories.” ...

Barghouti: Many people are realizing that Israel is a regime of occupation, settler colonialism, and apartheid and are therefore taking action to hold it to account to international law. Israel is realizing that companies are abandoning their projects in Israel that violate international law, pension funds are doing the same, major artists are refusing to play Tel Aviv, as Sun City was boycotted during apartheid South Africa. ...

We live in a country where racism and racial incitement against indigenous Palestinians has grown tremendously into the Israeli mainstream. It has really become mainstream today to be very openly racist against Palestinians. Many settlers and hard-right-wing Israelis are taking matters into their own hands – completely supported by the state – and attacking Palestinians.

# weigh: Measuring allocations in Haskell

## Work motivation

While working for various clients that needed fast binary serialization, we discovered that the binary and cereal packages are both inefficient, and so we created the store package.

In the high-frequency trading sector, we had to decode and encode a binary protocol into Haskell data structures for analysis. During this process it became apparent that while we had been attentive to micro-benchmarking with the venerable criterion package, we hadn't put much work into ensuring that memory usage was well studied. Bringing down allocations (and thus work, and garbage collection) was key to achieving reasonable speed.

## Let's measure space

In response, let's measure space more, in an automatic way.

The currently available way to do this is to compile with profiling enabled, add cost centres, and then run our program with RTS options. For example, we write a program with an SCC cost centre, like this:

main :: IO ()
main = do
  let !_ = {-# SCC myfunction_10 #-} myfunction 10
  return ()

Then we compile with profiling enabled (the -prof flag) and run with +RTS -P, and we get output like this:

COST CENTRE         MODULE  no.  entries  ... bytes

MAIN                MAIN    43   0        ... 760
 CAF:main1          Main    85   0        ... 0
  main              Main    86   1        ... 0
   myfunction_10    Main    87   1        ... 160

(Information omitted with ... to save space.)

That's great – exactly the kind of information we'd like to get. But we want it in a more concise, programmatic fashion, at the test-suite level.

## Announcing weigh

To serve this purpose, I've written the weigh package, which seeks to automate the measuring of memory usage of programs, in the same way that criterion does for timing of programs.

It doesn't promise perfect measurement and comes with a grain of salt, but it's reproducible. Unlike timing, allocation is generally reliable provided you use something like stack to pin the GHC version and packages, so you can also make a test suite out of it.

## How it works

There is a simple DSL, like hspec, for writing out your tests. It looks like this:

import Weigh

main =
  mainWith (do func "integers count 0" count 0
               func "integers count 1" count 1
               func "integers count 2" count 2
               func "integers count 3" count 3
               func "integers count 10" count 10
               func "integers count 100" count 100)
  where count :: Integer -> ()
        count 0 = ()
        count a = count (a - 1)

This example weighs the function count, which counts down to zero. We want to measure the bytes allocated to perform the action. The output is:

Case                Bytes  GCs  Check
integers count 0        0    0  OK
integers count 1       32    0  OK
integers count 2       64    0  OK
integers count 3       96    0  OK
integers count 10     320    0  OK
integers count 100  3,200    0  OK

Weee! We can now go around weighing everything! I encourage you to do that. Even Haskell newbies can make use of this to get a vague idea of how costly their code (or libraries they're using) is.

## Real-world use-case: store

I wrote a few tests, while developing weigh, for the store package: encoding of lists, vectors and storable vectors. Here's the criterion result for encoding a regular Vector type:

benchmarking encode/1kb normal (Vector Int32)/store
time                 3.454 μs   (3.418 μs .. 3.484 μs)
benchmarking encode/1kb normal (Vector Int32)/cereal
time                 19.56 μs   (19.34 μs .. 19.79 μs)

benchmarking encode/10kb normal (Vector Int32)/store
time                 33.09 μs   (32.73 μs .. 33.57 μs)
benchmarking encode/10kb normal (Vector Int32)/cereal
time                 202.7 μs   (201.2 μs .. 204.6 μs)

store is 6x faster than cereal at encoding Int32 vectors. Great! Our job is done, we've overcome previous limitations of binary encoding speed. Let's take a look at how heavy this process is. Weighing the program on 1 million and 10 million elements yields:

 1,000,000 Boxed Vector Int     Encode: Store      88,008,584     140  OK
 1,000,000 Boxed Vector Int     Encode: Cereal    600,238,200   1,118  OK

10,000,000 Boxed Vector Int     Encode: Store     880,078,896   1,384  OK
10,000,000 Boxed Vector Int     Encode: Cereal  6,002,099,680  11,168  OK

store is 6.8x more memory efficient than cereal. Excellent. But is our job really finished? Take a look at those allocations. To simply allocate a vector of that size, it's:

 1,000,000 Boxed Vector Int     Allocate            8,007,936       1  OK
10,000,000 Boxed Vector Int     Allocate           80,078,248       1  OK

While store is more efficient than cereal, how are we allocating 11x the amount of space necessary? We looked into this in the codebase; it turned out more inlining was needed. After comprehensively applying the INLINE pragma to key methods and functions, the memory was brought down to:

 1,000,000 Boxed Vector Int     Allocate            8,007,936       1  OK
 1,000,000 Boxed Vector Int     Encode: Store      16,008,568       2  OK
 1,000,000 Boxed Vector Int     Encode: Cereal    600,238,200   1,118  OK

10,000,000 Boxed Vector Int     Allocate           80,078,248       1  OK
10,000,000 Boxed Vector Int     Encode: Store     160,078,880       2  OK
10,000,000 Boxed Vector Int     Encode: Cereal  6,002,099,680  11,168  OK

Now, store takes an additional 8MB to encode an 8MB vector, 80MB for an 80MB buffer. That's perfect 1:1 memory usage! Let's check out the new speed without these allocations:

benchmarking encode/1kb normal (Vector Int32)/store
time                 848.4 ns   (831.6 ns .. 868.6 ns)
benchmarking encode/1kb normal (Vector Int32)/cereal
time                 20.80 μs   (20.33 μs .. 21.20 μs)

benchmarking encode/10kb normal (Vector Int32)/store
time                 7.708 μs   (7.606 μs .. 7.822 μs)
benchmarking encode/10kb normal (Vector Int32)/cereal
time                 207.4 μs   (204.9 μs .. 210.3 μs)

store is 4x faster than previously! store is also now 20x faster than cereal at encoding a vector of ints.

## Containers vs unordered-containers

Another quick example: the map structures from the containers and unordered-containers packages. Let's weigh how heavy fromList is on 1 million elements. For fun, the keys are randomly generated rather than ordered. We force the input list completely ahead of time, because we just want to see the allocations caused by the libraries, not by our input list.

import Weigh
import Control.DeepSeq (force)
import System.Random (mkStdGen, randoms)
import qualified Data.HashMap.Lazy
import qualified Data.HashMap.Strict
import qualified Data.IntMap.Lazy
import qualified Data.IntMap.Strict
import qualified Data.Map.Lazy
import qualified Data.Map.Strict

fromlists :: Weigh ()
fromlists =
  do let !elems =
           force (zip (randoms (mkStdGen 0) :: [Int])
                      [1 :: Int .. 1000000])
     func "Data.Map.Strict.fromList     (1 million)" Data.Map.Strict.fromList elems
     func "Data.Map.Lazy.fromList       (1 million)" Data.Map.Lazy.fromList elems
     func "Data.IntMap.Strict.fromList  (1 million)" Data.IntMap.Strict.fromList elems
     func "Data.IntMap.Lazy.fromList    (1 million)" Data.IntMap.Lazy.fromList elems
     func "Data.HashMap.Strict.fromList (1 million)" Data.HashMap.Strict.fromList elems
     func "Data.HashMap.Lazy.fromList   (1 million)" Data.HashMap.Lazy.fromList elems

We clearly see that IntMap from containers is about 1.3x more memory efficient than the generic Ord-based Map. However, HashMap from unordered-containers wipes the floor with both of them (for Int keys, at least), using 6.3x less memory than Map and 4.8x less memory than IntMap:

Data.Map.Strict.fromList     (1 million)  1,016,187,152  1,942  OK
Data.Map.Lazy.fromList       (1 million)  1,016,187,152  1,942  OK
Data.IntMap.Strict.fromList  (1 million)    776,852,648  1,489  OK
Data.IntMap.Lazy.fromList    (1 million)    776,852,648  1,489  OK
Data.HashMap.Strict.fromList (1 million)    161,155,384    314  OK
Data.HashMap.Lazy.fromList   (1 million)    161,155,384    314  OK

As you can see above, it takes just a few trivial lines of code to generate this result.

## Caveat

But beware: it's not going to be obvious exactly where allocations are coming from in the computation (if you need to know that, use the profiler). It's better to consider a computation holistically: this is how much was allocated to produce this result.

Analysis at finer granularity is likely to be guess-work (beyond even what's available in profiling). For the brave, let's study some examples of that.

## Interpreting the results: Integer

Notice that in the table we generated, there is a rather odd increase of allocations:

Case                Bytes  GCs  Check
integers count 0        0    0  OK
integers count 1       32    0  OK
integers count 2       64    0  OK
integers count 3       96    0  OK
integers count 10     320    0  OK
integers count 100  3,200    0  OK

What's the explanation for those bytes in each iteration?

Refreshing our memory: The space taken up by a "small" Integer is two machine words. On 64-bit that's 16 bytes. Integer is defined like this:

data Integer
  = S# Int#                            -- small integers
  | J# Int# ByteArray#                 -- large integers
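We can sanity-check the two-machine-word figure with a quick program (not from the original post; it assumes a 64-bit platform):

```haskell
import Foreign.Storable (sizeOf)

-- One machine word: the size of an Int (8 bytes on 64-bit platforms).
wordBytes :: Int
wordBytes = sizeOf (0 :: Int)

-- A small Integer (the S# constructor) is one header word plus one
-- payload word.
smallIntegerBytes :: Int
smallIntegerBytes = 2 * wordBytes

main :: IO ()
main = do
  print wordBytes         -- 8 on 64-bit
  print smallIntegerBytes -- 16 on 64-bit
```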

For the rest, we'd expect only 16 bytes per iteration, but we're seeing more than that. Why? Let's look at the Core for count:

Main.main48 = __integer 0
Main.main41 = __integer 1
Rec {
Main.main_count [Occ=LoopBreaker] :: Integer -> ()
[GblId, Arity=1, Str=DmdType <S,U>]
Main.main_count =
  \ (ds_d4Am :: Integer) ->
    case eqInteger# ds_d4Am Main.main48 of wild_a4Fq { __DEFAULT ->
    case ghc-prim-0.4.0.0:GHC.Prim.tagToEnum# @ Bool wild_a4Fq of _ {
      False -> Main.main_count (minusInteger ds_d4Am Main.main41);
      True -> ghc-prim-0.4.0.0:GHC.Tuple.()
    }
    }
end Rec }

The eqInteger# function is a pretend-primop, which apparently combines with tagToEnum# and is optimized away at the code generation phase. This should lead to an unboxed comparison of something like Int#, which should not allocate. This leaves only the subtraction operation, which should allocate one new 16-byte Integer.

So where are those additional 16 bytes from? minusInteger for Integer is actually implemented as x + (-y):

-- TODO
-- | Subtract two 'Integer's from each other.
minusInteger :: Integer -> Integer -> Integer
minusInteger x y = inline plusInteger x (inline negateInteger y)

This means we're allocating one more Integer. That explains the additional 16 bytes!

There's a TODO there. I guess someone implemented negateInteger and plusInteger (which is non-trivial) and had enough.
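So each iteration of count allocates two small Integers: the negated literal built inside minusInteger, and the result of the addition. A quick check (my own arithmetic, assuming 64-bit machine words) reproduces the Bytes column of the table above:

```haskell
-- One 64-bit machine word.
wordBytes :: Int
wordBytes = 8

-- A small Integer is two machine words: header plus payload.
smallIntegerBytes :: Int
smallIntegerBytes = 2 * wordBytes

-- Two fresh small Integers per iteration of count: the negated literal
-- and the result of the addition inside minusInteger.
bytesPerIteration :: Int
bytesPerIteration = 2 * smallIntegerBytes

main :: IO ()
main =
  -- Bytes allocated for counts 0, 1, 2, 3, 10 and 100.
  print (map (* bytesPerIteration) [0, 1, 2, 3, 10, 100])
```

This prints [0,32,64,96,320,3200], matching the measured 0, 32, 64, 96, 320 and 3,200 bytes.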

If we implement a second function count' that takes this into account,

import Weigh

main :: IO ()
main =
  mainWith (do func "integers count 0" count 0
               func "integers count 1" count 1
               func "integers count 2" count 2
               func "integers count 3" count 3
               func "integers count' 0" count' 0
               func "integers count' 1" count' 1
               func "integers count' 2" count' 2
               func "integers count' 3" count' 3)
  where count :: Integer -> ()
        count 0 = ()
        count a = count (a - 1)
        count' :: Integer -> ()
        count' 0 = ()
        count' a = count' (a + (-1))

we get more reasonable allocations:

Case                Bytes  GCs  Check
integers count 0        0    0  OK
integers count 1       32    0  OK
integers count 2       64    0  OK
integers count 3       96    0  OK

integers count' 0       0    0  OK
integers count' 1      16    0  OK
integers count' 2      32    0  OK
integers count' 3      48    0  OK

It turns out that count' is 20% faster (from criterion benchmarks), but realistically, if speed matters, we'd be using Int, which is practically 1000x faster.

What did we learn? Even something as simple as Integer subtraction doesn't behave as you would naively expect.

## Considering a different type: Int

Comparatively, let's look at Int:

import Weigh

main =
  mainWith (do func "ints count 1" count 1
               func "ints count 10" count 10
               func "ints count 1000000" count 1000000)
  where count :: Int -> ()
        count 0 = ()
        count a = count (a - 1)

The output is:

Case                Bytes  GCs  Check
ints count 1            0    0  OK
ints count 10           0    0  OK
ints count 1000000      0    0  OK

It allocates zero bytes. Why? Let's take a look at the Core:

Rec {
Main.$wcount1 [InlPrag=[0], Occ=LoopBreaker]
  :: ghc-prim-0.4.0.0:GHC.Prim.Int# -> ()
[GblId, Arity=1, Caf=NoCafRefs, Str=DmdType <S,1*U>]
Main.$wcount1 =
  \ (ww_s57C :: ghc-prim-0.4.0.0:GHC.Prim.Int#) ->
    case ww_s57C of ds_X4Gu {
      __DEFAULT ->
        Main.$wcount1 (ghc-prim-0.4.0.0:GHC.Prim.-# ds_X4Gu 1);
      0 -> ghc-prim-0.4.0.0:GHC.Tuple.()
    }
end Rec }

It's clear that GHC is able to optimize this tight loop and unbox the Int into an Int#, which can be put into a register rather than being allocated by the GHC runtime allocator and freed later.

The lesson is not to take for granted that everything in your source has a 1:1 memory mapping at runtime, and to take each case in context.

## Data structures

Finally, moving on from our contrived examples, we can take a look at user-defined data types and observe some of the optimizations that GHC does for memory.

Let's demonstrate that unpacking a data structure yields less memory. Here is a data type that contains an Int:

data HasInt = HasInt !Int
  deriving (Generic)
instance NFData HasInt

Here are two similar data types which use HasInt: the first simply wraps it, while the latter unpacks it.

data HasPacked = HasPacked HasInt
  deriving (Generic)
instance NFData HasPacked

data HasUnpacked = HasUnpacked {-# UNPACK #-} !HasInt
  deriving (Generic)
instance NFData HasUnpacked

We can measure the difference by weighing them like this:

-- | Weigh: packing vs no packing.
packing :: Weigh ()
packing =
  do func "\\x -> HasInt x" (\x -> HasInt x) 5
     func "\\x -> HasPacked (HasInt x)" (\x -> HasPacked (HasInt x)) 5
     func "\\x -> HasUnpacked (HasInt x)" (\x -> HasUnpacked (HasInt x)) 5

The output is:

\x -> HasInt x                  16  0  OK
\x -> HasPacked (HasInt x)      32  0  OK
\x -> HasUnpacked (HasInt x)    16  0  OK

Voilà! Here we've demonstrated that:

• HasInt x consists of the 8-byte header for the constructor, and 8 bytes for the Int.
• HasPacked has 8 bytes for the constructor, 8 bytes for the pointer in its first slot, then another 8 bytes for the HasInt constructor, and finally 8 bytes for the Int itself.
• HasUnpacked only allocates 8 bytes for the header and 8 bytes for the Int.

GHC did what we wanted!

## Summary

We've looked at:

• What led to this package.
• A proposal that we start measuring our functions more, especially in libraries.
• How to use this package.
• Some of our use-cases at FP Complete.
• Caveats.
• Some contrived examples which do not lead to obvious explanations.

Now I encourage you to try it out!

## May 26, 2016

### FP Complete

# Moving Stackage Nightly to GHC 8.0

As Dan Burton recently announced, the Stackage Curators team has decided to cut an LTS 6 release based on GHC 7.10.3, and to switch Stackage Nightly over to GHC 8.0. I've pushed the relevant changes into the necessary repos, and now the most recent Stackage Nightly release is based on GHC 8.0.

A GHC upgrade is always a bit of a game of cat-and-mouse for Stackage: we wait for enough packages in the ecosystem to update their version bounds (and code) to be compatible with the newest GHC before making the transition. After some discussion, we've come up with some new guidelines on this upgrade process. In sum: we've temporarily disabled all packages that are incompatible with GHC 8, or that depend on packages which are incompatible with GHC 8. We've also removed almost all temporary upper bounds.

In order to make this transition as painless as possible, I'd like to recruit help from the community with the following:

• Test out your projects with the latest Stackage Nightly snapshot (e.g., stack init --resolver nightly --force). Make sure you back up your current stack.yaml file first!
• If everything works, great!
• If there are missing packages that were previously in Stackage, please work with the relevant maintainers to get them building with GHC 8.
• Send pull requests to the build-constraints.yaml file to add back packages that are now GHC 8-compatible. You can search through that file for the phrases GHC 8 and BLOCKED.

On a side note, I've been really impressed with how many of my projects already build with GHC 8, even those with significant dependencies. The ecosystem has done a great job of upgrading to the newest GHC quickly.
Thanks to all package authors and Stackage maintainers for their efforts on this! And of course, thank you to the GHC team for yet another great release.

### Tom Schrijvers

# TyDe '16: 1st Type-Driven Development Workshop

--------------------------------------------------------------------------------
FINAL CALL FOR PAPERS

1st Type-Driven Development (TyDe '16)
A Workshop on Dependently Typed and Generic Programming
18 September, Nara, Japan
--------------------------------------------------------------------------------

The deadline for the inaugural edition of TyDe is approaching rapidly. Please submit full papers before June 10th and abstracts before June 24th.

# Goals of the workshop

The workshop on Type-Driven Development aims to show how static type information may be used effectively in the development of computer programs. The workshop, co-located with ICFP, unifies two workshops: the Workshop on Dependently Typed Programming and the Workshop on Generic Programming. These two research areas have a rich history and bridge both theory and practice. Novel techniques explored by both communities have gradually spread to more mainstream languages.

This workshop aims to bring together leading researchers and practitioners in generic programming and dependently typed programming from around the world, and features papers capturing the state of the art in these important areas.

We welcome all contributions, both theoretical and practical, on:

- dependently typed programming;
- generic programming;
- design and implementation of programming languages, exploiting types in novel ways;
- exploiting typed data, data-dependent data, or type providers;
- static and dynamic analyses of typed programs;
- tools, IDEs, or testing tools exploiting type information;
- pearls, being elegant, instructive examples of types used in the derivation, calculation, or construction of programs.
# Program Committee

- James Chapman, University of Strathclyde (co-chair)
- Wouter Swierstra, University of Utrecht (co-chair)
- David Christiansen, Indiana University
- Pierre-Evariste Dagand, LIP6
- Richard Eisenberg, University of Pennsylvania
- Catalin Hritcu, INRIA Paris
- James McKinna, University of Edinburgh
- Keiko Nakata, FireEye
- Tomas Petricek, University of Cambridge
- Birgitte Pientka, McGill University
- Tom Schrijvers, KU Leuven
- Makoto Takeyama, Kanagawa University
- Nicolas Wu, University of Bristol
- Brent Yorgey, Hendrix College

# Proceedings and Copyright

We plan to have formal proceedings, published by the ACM. Accepted papers will be included in the ACM Digital Library. Authors must grant ACM publication rights upon acceptance, but may retain copyright if they wish. Authors are encouraged to publish auxiliary material with their paper (source code, test data, and so forth). The proceedings will be freely available for download from the ACM Digital Library from one week before the start of the conference until two weeks after the conference.

# Submission details

Submitted papers should fall into one of two categories:

- Regular research papers (12 pages)
- Extended abstracts (2 pages)

Submission is handled through Easychair: https://easychair.org/conferences/?conf=tyde16

Regular research papers are expected to present novel and interesting research results. Extended abstracts should report work in progress that the authors would like to present at the workshop. We welcome submissions from PC members (with the exception of the two co-chairs), but these submissions will be held to a higher standard.

All submissions should be in portable document format (PDF), formatted using the ACM SIGPLAN style guidelines (two-column, 9pt). Extended abstracts must be submitted with the label 'Extended abstract' clearly in the title.
# Important Dates

- Regular paper deadline: Friday, 10th June, 2016
- Extended abstract deadline: Friday, 24th June, 2016
- Author notification: Friday, 8th July, 2016
- Workshop: Sunday, 18th September, 2016

# Travel Support

Student attendees with accepted papers can apply for a SIGPLAN PAC grant to help cover travel expenses. PAC also offers other support, such as for child-care expenses during the meeting, or for travel costs for companions of SIGPLAN members with physical disabilities, as well as for travel from locations outside of North America and Europe. For details on the PAC program, see its web page.

## May 24, 2016

### Don Stewart (dons)

# Haskell dev/tools/git/… role at Standard Chartered (London)

The Modelling Infrastructure (MI) team at Standard Chartered has an open position for a typed functional programming developer, based in London. MI are a dev/ops-like team responsible for the continuous delivery, testing, tooling and general developer efficiency of the Haskell-based analytics package used by the bank. They work closely with other Haskell dev teams in the bank, providing developer tools, testing and automation on top of our git ecosystem.

The role involves improving the ecosystem for developers and further automation of our build, testing and release infrastructure. You will work with devs in London, as part of the global MI team (based in Singapore and China). Development is primarily in Haskell. Knowledge of the Shake build system and the Bake continuous integration system would be helpful. Strong git skills would be an advantage. A keen eye for analytics, data analysis and data-driven approaches to optimizing tooling and workflows is desirable.

This is a permanent, associate director-equivalent position in London. Experience writing typed APIs to external systems such as databases, web services and pub/sub platforms is very desirable. We like working code, so if you have Hackage or GitHub libraries, we definitely want to see them.
We also like StackOverflow answers, blog posts, academic papers, or other arenas where you can show broad FP ability. Demonstrated ability to write Haskell-based tooling around git systems would be super useful.

The role requires physical presence in London, in either our Basinghall or Old Street sites; remote work is not an option. No financial background is required. Contracting-based positions are also possible if desired.

More info about our development process is in the 2012 PADL keynote, and a 2013 HaskellCast interview.

If this sounds exciting to you, please send your resume to me – donald.stewart <at> sc.com

Tagged: jobs

### FP Complete

# store: a new and efficient binary serialization library

A couple of months ago, Michael Snoyman wrote a blog post describing an experiment in an efficient implementation of binary serialization. Since then, we've developed this approach into a new package for efficient serialization of Haskell datatypes. I'm happy to announce that today we are putting out the initial release of our new store package!

The store package takes a different approach than most prior serialization packages, in that performance is prioritized over other concerns. In particular, we do not make many guarantees about binary compatibility, and instead favor machine representations. For example, the binary and cereal packages use big-endian encodings for numbers, whereas x86 machines use little-endian. This means that to encode and decode numbers on an x86 machine, those packages end up swapping all of the individual bytes around twice!

To serialize a value, store first computes its size and allocates a properly sized ByteString. This keeps the serialization logic simple and fast, rather than mixing in logic to allocate new buffers. For datatypes that need to visit many values to compute their size, this can be inefficient – the datatype is traversed once to compute the size and once to do the serialization.
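The size-then-write strategy can be sketched with a toy class; the names here (Encode, byteSize, pokeAt, encodeBytes) are hypothetical illustrations, not store's actual API:

```haskell
import Data.Bits (shiftR)
import Data.Word (Word8, Word32)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekByteOff, pokeByteOff)

-- A toy serialization class: first report an exact size, then write
-- into a buffer allocated once, up front.
class Encode a where
  byteSize :: a -> Int
  pokeAt   :: Ptr Word8 -> Int -> a -> IO ()

-- Little-endian encoding of a Word32, matching the machine
-- representation on x86 rather than a big-endian wire format.
instance Encode Word32 where
  byteSize _ = 4
  pokeAt p off w =
    mapM_
      (\i -> pokeByteOff p (off + i)
               (fromIntegral (w `shiftR` (8 * i)) :: Word8))
      [0 .. 3]

-- Allocate exactly once, using the size computed up front, then write.
encodeBytes :: Encode a => a -> IO [Word8]
encodeBytes x =
  allocaBytes n $ \p -> do
    pokeAt p 0 x
    mapM (peekByteOff p) [0 .. n - 1]
  where
    n = byteSize x

main :: IO ()
main = encodeBytes (258 :: Word32) >>= print -- [2,1,0,0]
```

A real encoder writes into a ByteString rather than reading the bytes back, but the single up-front size computation is what keeps the hot path free of buffer reallocation.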
However, for datatypes with constant size, or vectors of datatypes with constant size, it is possible to very quickly compute the total required size. List / set / map-like Store instances all implement this optimization when their elements have constant size. store comes with instances for most datatypes from base, vector, bytestring, text, containers, and time. You can also use either GHC generics or Template Haskell to derive efficient instances for your datatypes.

# Benchmark Results

I updated the serial-bench with store. Happily, store is even faster than any of the implementations we had in the benchmark. See the detailed report here. Note that the x-axis is measured in microseconds taken to serialize a 100 element Vector where each element occupies at least 17 bytes. store actually performs these operations in under a microsecond (431ns to encode, 906ns to decode). The results for binary have been omitted from this graph, as they blow out the x-axis scale (binary takes around 8 times longer than cereal, and is nearly 100x slower than store). We could actually write a benchmark even more favorable to store, if we used storable or unboxed vectors! In that case, store essentially implements a memcpy.

# Speeding up stack builds

Now, the benchmark is biased towards the use case we are concerned with - serializing a Vector of a small datatype which always takes up the same amount of space. store was designed with this variety of use case in mind, so naturally it excels in this benchmark. But let's say we choose a case that isn't exactly store's strong suit - how well does it perform? In our experiments, it seems that store does a darn good job of that too! The development version of stack now uses store for serializing caches of info needed by the build.
With store (~0.082 seconds):

```
2016-05-23 19:52:06.964518: [debug] Trying to decode /home/mgsloan/.stack/indices/Hackage/00-index.cache @(stack_I9M2eJwnG6d3686aQ2OkVk:Data.Store.VersionTagged src/Data/Store/VersionTagged.hs:49:5)
2016-05-23 19:52:07.046851: [debug] Success decoding /home/mgsloan/.stack/indices/Hackage/00-index.cache @(stack_I9M2eJwnG6d3686aQ2OkVk:Data.Store.VersionTagged src/Data/Store/VersionTagged.hs:58:13)
21210280 bytes
```

With binary (~0.197 seconds):

```
2016-05-23 20:22:29.855724: [debug] Trying to decode /home/mgsloan/.stack/indices/Hackage/00-index.cache @(stack_4Jm00qpelFc1pPl4KgrPav:Data.Binary.VersionTagged src/Data/Binary/VersionTagged.hs:55:5)
2016-05-23 20:22:30.053367: [debug] Success decoding /home/mgsloan/.stack/indices/Hackage/00-index.cache @(stack_4Jm00qpelFc1pPl4KgrPav:Data.Binary.VersionTagged src/Data/Binary/VersionTagged.hs:64:13)
20491950 bytes
```

So this part of stack is now twice as fast!

# Extras

Beyond the core of store's functionality, this initial release also provides:

• Data.Store.Streaming - functions for using Store for streaming serialization with conduit. This makes it so that you don't need to have everything in memory at once when serializing / deserializing. For applications involving lots of data, this can be essential to having reasonable performance, or even functioning at all. This allows us to recoup the benefits of lazy serialization, without paying for the overhead when we don't need it. This approach is also more explicit / manual with regards to the laziness - the user must determine how their data will be streamed into chunks.

• Data.Store.TypeHash, which provides utilities for computing hashes based on the structural definitions of datatypes. The purpose of this is to provide a mechanism for tagging serialized data in such a way that deserialization issues can be anticipated. This is included in the store package for a couple reasons:

1. It is quite handy to include these hashes with your encoded datatypes.
The assumption is that any structural differences are likely to correspond with serialization incompatibilities. This is particularly true when the generics / TH deriving is used rather than custom instances.

2. It uses store on Template Haskell types in order to compute a ByteString. This allows us to directly use cryptographic hashes from the cryptohash package to get a hash of the type info.

• Data.Store.TH not only provides a means to derive Store instances for your datatypes, but it also provides utilities for checking them via smallcheck and hspec. This makes it easy to check that all of your datatypes do indeed serialize properly.

These extras were the more recently added parts of store, and so are likely to change quite a bit from the current API. The entirety of store is quite new, and so is also subject to API change while it stabilizes. That said, we encourage you to give it a try for your application!

# TH cleverness

Usually, we directly use Storable instances to implement Store. In functionality, Storable is very similar to Store. The key difference is that Store instances can take up a variable amount of size, whereas Storable types must use a constant number of bytes. The store package also provides the convenience of Peek and Poke monads, so defining custom Store instances is quite convenient.

Data.Store.TH.Internal defines a function deriveManyStoreFromStorable, which does the following:

• Reifies all Store instances
• Reifies all Storable instances
• Implements Store instances for all Storable instances

In the future, store will likely provide such a function for users, which restricts it to only deriving Store instances for types in the current package or current module. For now, this is just internal convenience. I noticed that the Storable instance for Bool is a bit wasteful with its bytes.
Rather inexplicably, perhaps due to alignment concerns, it takes up a whopping 4 bytes to represent a single bit of info:

```haskell
instance Storable Bool where
   sizeOf _          = sizeOf (undefined :: HTYPE_INT)
   alignment _       = alignment (undefined :: HTYPE_INT)
   peekElemOff p i   = liftM (/= (0 :: HTYPE_INT)) $ peekElemOff (castPtr p) i
   pokeElemOff p i x = pokeElemOff (castPtr p) i (if x then 1 else 0 :: HTYPE_INT)
```

We'd prefer to just use a single byte. Since deriveManyStoreFromStorable skips types that already have Store instances, all I needed to do was define my own instance for Bool. To do this, I used the derive function from the new th-utilities package (blogpost pending!) to define an instance for Bool:

```haskell
$($(derive [d|
    instance Deriving (Store Bool)
    |]))
```

This is a bit of a magical incantation - it runs code at compile time which generates an efficient instance Store Bool where .... We could also use generic deriving, and rely on the method defaults to just write instance Store Bool. However, this can be less efficient, because the generics instances will yield a VarSize for its size, whereas the TH instance is smart enough to yield ConstSize. In practice, this is the difference between having an O(1) implementation for size :: Size (Vector MyADT), and having an O(n) implementation. The O(1) implementation just multiplies the element size by the length, whereas the O(n) implementation needs to ask each element for its size.
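The distinction can be sketched with a toy model (the names `Size`, `ConstSize`, `VarSize` and `totalSize` here are illustrative stand-ins, not store's actual API): a size is either known up front or must be computed per value, and only the former lets a container compute its total size without visiting every element.

```haskell
-- Toy model of constant- vs variable-size serialization metadata.
data Size a = ConstSize Int | VarSize (a -> Int)

-- Total byte count for a container of elements.
totalSize :: Size a -> [a] -> Int
totalSize (ConstSize k) xs = k * length xs   -- O(1) for a real vector (length is known)
totalSize (VarSize f)   xs = sum (map f xs)  -- must visit every element

main :: IO ()
main = do
  print (totalSize (ConstSize 8) (replicate 100 (0 :: Int)))
  print (totalSize (VarSize length) ["ab", "cde"])
```

For a list, `length` is itself O(n), but for a `Vector` the element count is stored alongside the data, so the `ConstSize` branch really is constant time.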

# Hello World

So, first off, I got accepted into Haskell Summer of Code 2016. The project is to develop a visual, functional, block-based language for CodeWorld, under Chris Smith. The project is in the same vein as Scratch, although it will be based on functional principles and use a subset of Haskell. The project idea is described more fully in the Project Proposal.

Other goals of this site include tracking the progress of the project and interacting with the community.

I’m a graduate student studying Computer and Electronic Engineering and aside from Summer of Code reasons for doing the project, I’m also interested in programming languages (and obviously Haskell) and hope to learn a good deal through doing such a project.

P.S. I’m also explicitly giving permission to be included on Planet Haskell

# Why we should stop talking, and start to prepare for climate change

The other day, I attended a meeting organized by my local University. Part of a series dealing with the Horizon 2020 themes, this one dealt with energy - and specifically, how we should replace our non-sustainable dependency on fossil fuels.

Professionally led by a well-known political journalist, it started with an introductory talk by a mathematician working with geothermal energy, specifically simulating the fracturing of rock. Knowledge about the structure of cracks and fractures deep below can be used in the construction of geothermal energy plants - they produce power basically by pumping cold water down, and hot water up - so exploiting rock structure can make them more effective. It was an interesting talk, with a lot of geekish enthusiasm for the subject.

Then there was a panel of three; one politician, one solar panel evangelist-salesperson, and a geographer(?). And discussion ensued, everybody was talking about their favorite stuff on clean energy, and nobody really objected or criticized anything.

Which, I think, highlights the problem.

When they opened for questions from the public, the first one to raise her voice was a tall, enthusiastic lady in a red dress. She was a bit annoyed by all the talk about economy and things, and why don't we just fix this?

And she is right - we can. It's just a question of resources. I recently looked at the numbers for Poland, which is one of the big coal-users in Europe, producing about 150 TWh of electricity per year from coal.

Using the (now rather infamous) Olkiluoto reactor as a baseline, the contract price for unit 3 was €3 billion (but will probably end up at 2-3 times that in reality). Unit 1 and 2 which are in operation have about the same capacity, and deliver about 15 TWh/year. So, depending on how you want to include cost overruns, we can replace all coal-based electricity production in Poland with ten Olkiluoto-sized reactors for €30-80 billion. (I think it is reasonable to assume that if you build ten, you will eventually learn to avoid overruns and get closer to the price tag. On the other hand, the contractor might not give you as favorable quotes today as they gave Finland.)
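The arithmetic above is easy to check directly (a sketch using the post's own figures; the €3–8 billion per-reactor range is my reading of "contract price" versus "2–3 times that in reality"):

```haskell
-- Back-of-the-envelope check of the reactor arithmetic.
coalTWh, perReactorTWh :: Double
coalTWh       = 150   -- Poland's annual coal-fired electricity output, TWh
perReactorTWh = 15    -- output of an Olkiluoto 1+2-sized unit, TWh/year

reactors :: Double
reactors = coalTWh / perReactorTWh           -- number of units needed

costRange :: (Double, Double)                -- in € billions
costRange = (reactors * 3, reactors * 8)     -- contract price vs. with overruns

main :: IO ()
main = print (reactors, costRange)
```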

, bench "f2" $ nf f2 a ... ]

Because of this duplication, it is too easy to have benchmarks nominally comparing two (or more) functions/values, but accidentally end up comparing apples to oranges (e.g. using whnf instead of nf).

2. The output generated by criterion – especially as of version 1.0.0.0 – is rather verbose and tends not to lend itself well to directly comparing results of multiple benchmarks. I personally find myself starting to get swamped looking at the terminal output if there’s more than a few benchmarks, and the HTML report is even worse. As I said above, it’s great when I’m directly looking at just how one function compares as I tweak it, but not when I’m wanting to compare multiple functions.

Whilst I kept looking at existing comparison benchmarks, I even came across an example where a comparison ended up nominally showing that f1 was faster than f2… except that the result of f1 was a value with an O(1) implementation of rnf, whereas f2 has an O(n) definition. I don’t know if this is intentional (I think it probably wasn’t) and even if this is rectified f1 was still faster… but the difference in runtimes – whilst minor in comparison to performance between the two functions – is non-negligible. This to me demonstrated the desirability of not only having a wrapper around criterion to reduce the verbosity of comparison benchmarks, but to also be able to produce unit tests to ensure criteria are satisfied.

It’s taken me longer than I wished to produce a syntax that I was both happy with and would actually work (with lots of fighting against GHC in the form of “Why won’t you accept this? Oh, wait, now I get it; that makes sense… but can’t you accept it anyway? Pretty please?”), but I’ve now finally gotten it to a usable form and am hence releasing it. testbench is now available on Hackage with the source on GitHub.

## Example

As extremely simple and contrived examples, consider the following:

main :: IO ()
main = testBench $ do

-- Monomorphic comparisons
compareFunc "List length"
(\n -> length (replicate n ()) == n)
(testWith (@? "Not as long as specified")
<> benchNormalForm)
(mapM_ (\n -> comp ("len == " ++ show n) n) [1..5])

-- Polymorphic comparisons.
--
-- Currently it isn't possible to use a Proxy as the argument to the
-- function, so we're using 'undefined' to specify the type.
compareFuncConstraint (Proxy :: Proxy (CUnion Eq Num))
"Number type equality"
(join (==) . (0 `asTypeOf`))
(baseline "Integer"  (undefined :: Integer)
<> benchNormalForm)
$ do comp "Int"      (undefined :: Int)
     comp "Rational" (undefined :: Rational)
     comp "Float"    (undefined :: Float)
     comp "Double"   (undefined :: Double)

When this is run, the result on the console is:

```
Cases: 9  Tried: 9  Errors: 0  Failures: 0

                       Mean    MeanLB    MeanUB    Stddev    StddevLB  StddevUB  OutlierVariance
List length
  len == 1           22.15 ns  21.86 ns  22.88 ns  1.505 ns  742.2 ps  2.826 ns  83%
  len == 2           22.64 ns  22.49 ns  22.87 ns  602.0 ps  449.5 ps  825.7 ps  43%
  len == 3           23.39 ns  23.16 ns  23.78 ns  1.057 ns  632.6 ps  1.553 ns  68%
  len == 4           23.70 ns  23.51 ns  23.95 ns  773.3 ps  567.9 ps  1.050 ns  53%
  len == 5           24.14 ns  23.96 ns  24.71 ns  962.4 ps  307.5 ps  1.886 ns  63%
Number type equality
  Integer            12.59 ns  12.48 ns  12.80 ns  538.0 ps  312.4 ps  944.2 ps  67%
  Int                12.79 ns  12.69 ns  12.98 ns  463.6 ps  320.0 ps  665.2 ps  59%
  Rational           12.77 ns  12.67 ns  12.93 ns  395.1 ps  290.0 ps  535.9 ps  51%
  Float              13.13 ns  12.88 ns  13.42 ns  869.7 ps  667.3 ps  1.212 ns  83%
  Double             12.74 ns  12.57 ns  13.02 ns  704.6 ps  456.5 ps  1.047 ns  78%
```

You can see on the top line we’ve had nine tests (run using HUnit):

• From the first group we’ve specified that all five values must return True.
• From the second group, we’ve specified that all inputs must return the same value as for the Integer case.

Since all the tests passed, the benchmarks are run. The output for these is a tabular format to make it easier to do vertical comparisons (though in this case the variances are all high so we should take them with a grain of salt).

## Caveats

Whilst I’m quite pleased with the API for defining the actual tests/benchmarks (subject to what GHC will let me write), there’s still scope for more functionality (e.g. support for IO-based benchmarks). However, the default output (as seen above) isn’t configurable.
It’s possible to get the individual tests and benchmarks out to feed them explicitly to HUnit and criterion respectively, but if you’re after this particular output then you have to wait until all the benchmarks are complete before the results are printed. There is no support for saving results to file (either as a CSV of all the results or an HTML report), or even to control how the benchmarks are run (minimum time spent on each benchmark, etc.) or any other option currently offered by criterion. If there is enough interest I can look at adding these in; but this satisfies my itch for now whilst getting this library out there for people to start trying out.

Filed under: Haskell

## May 21, 2016

### Gabriel Gonzalez

# A command-line benchmark tool

I wrote a small program named bench that lets you benchmark other programs from the command line. Think of this as a much nicer alternative to the time command. The best way to illustrate how this works is to show a few example uses of the program:

```
$ bench 'ls /usr/bin | wc -l'  # You can benchmark shell pipelines
benchmarking ls /usr/bin | wc -l
time                 6.756 ms   (6.409 ms .. 7.059 ms)
                     0.988 R²   (0.980 R² .. 0.995 R²)
mean                 7.590 ms   (7.173 ms .. 8.526 ms)
std dev              1.685 ms   (859.0 μs .. 2.582 ms)
variance introduced by outliers: 88% (severely inflated)

$ bench 'sleep 1'  # You have to quote multiple tokens
benchmarking sleep 1
time                 1.003 s    (1.003 s .. 1.003 s)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.003 s    (1.003 s .. 1.003 s)
std dev              65.86 μs   (0.0 s .. 68.91 μs)
variance introduced by outliers: 19% (moderately inflated)

$ bench true  # The benchmark overhead is below 1 ms
benchmarking true
time                 383.9 μs   (368.6 μs .. 403.4 μs)
                     0.982 R²   (0.971 R² .. 0.991 R²)
mean                 401.1 μs   (386.9 μs .. 418.9 μs)
std dev              54.39 μs   (41.70 μs .. 67.62 μs)
variance introduced by outliers: 87% (severely inflated)
```

This utility just provides a command-line API for Haskell's criterion benchmarking library. The bench tool wraps any shell command you provide in a subprocess and benchmarks that subprocess through repeated runs using criterion. The number of runs varies from 10 to 10000 depending on how expensive the command is.
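The core idea — treat one subprocess launch as one iteration and time it repeatedly — can be sketched without criterion. This is a simplified illustration, not bench's actual code, and `timeCommand` is a name made up for the sketch; real bench hands the subprocess to criterion, which also does bootstrapping and outlier analysis rather than just taking a mean.

```haskell
import System.Process (callCommand)
import Data.Time.Clock (diffUTCTime, getCurrentTime)

-- Run a shell command n times and return the mean wall-clock
-- seconds per run. No statistics beyond the mean are computed.
timeCommand :: String -> Int -> IO Double
timeCommand cmd n = do
  start <- getCurrentTime
  mapM_ (\_ -> callCommand cmd) [1 .. n]
  end <- getCurrentTime
  pure (realToFrac (diffUTCTime end start) / fromIntegral n)

main :: IO ()
main = timeCommand "true" 5 >>= print
```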

This tool also threads through the same command-line options that criterion accepts for benchmark suites. You can see the full set of options using the --help flag:

```
$ bench --help
Command-line tool to benchmark other programs

Usage: bench COMMAND ([-I|--ci CI] [-G|--no-gc] [-L|--time-limit SECS]
                      [--resamples COUNT] [--regress RESP:PRED..] [--raw FILE]
                      [-o|--output FILE] [--csv FILE] [--junit FILE]
                      [-v|--verbosity LEVEL] [-t|--template FILE]
                      [-m|--match MATCH] [NAME...] |
                      [-n|--iters ITERS] [-m|--match MATCH] [NAME...] |
                      [-l|--list] | [--version])

Available options:
  -h,--help                Show this help text
  COMMAND                  The command line to benchmark
  -I,--ci CI               Confidence interval
  -G,--no-gc               Do not collect garbage between iterations
  -L,--time-limit SECS     Time limit to run a benchmark
  --resamples COUNT        Number of bootstrap resamples to perform
  --regress RESP:PRED..    Regressions to perform
  --raw FILE               File to write raw data to
  -o,--output FILE         File to write report to
  --csv FILE               File to write CSV summary to
  --junit FILE             File to write JUnit summary to
  -v,--verbosity LEVEL     Verbosity level
  -t,--template FILE       Template to use for report
  -m,--match MATCH         How to match benchmark names ("prefix" or "glob")
  -n,--iters ITERS         Run benchmarks, don't analyse
  -m,--match MATCH         How to match benchmark names ("prefix" or "glob")
  -l,--list                List benchmarks
  --version                Show version info
```

The --output option is really useful: it outputs an HTML page with a chart showing the distribution of run times. For example, the following command:

```
$ bench 'ls /usr/bin | wc -l' --output example.html
benchmarking ls /usr/bin | wc -l
time                 6.716 ms   (6.645 ms .. 6.807 ms)
                     0.999 R²   (0.999 R² .. 0.999 R²)
mean                 7.005 ms   (6.897 ms .. 7.251 ms)
std dev              462.0 μs   (199.3 μs .. 809.2 μs)
variance introduced by outliers: 37% (moderately inflated)
```

Also produces something like the following chart which you can view in example.html:

You can also increase the time limit using the --time-limit option, which will in turn increase the number of runs for better statistics. For example, criterion warned me that I had too many outliers for my benchmarks, so I can increase the time limit for the above benchmark to 30 seconds:

```
$ bench 'ls /usr/bin | wc -l' --time-limit 30 --output example.html
benchmarking ls /usr/bin | wc -l
time                 6.937 ms   (6.898 ms .. 7.002 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 6.905 ms   (6.878 ms .. 6.935 ms)
std dev              114.9 μs   (86.59 μs .. 156.1 μs)
```

... which dials up the number of runs to the ~4000 range, reduces the number of outliers, and brings down the standard deviation by a factor of four.

Keep in mind that there are a few limitations to this tool:

• this tool cannot accurately benchmark code that requires a warm up phase (such as JVM programs that depend on JIT compilation for performance)
• this tool cannot measure performance below about half a millisecond due to the overhead of launching a subprocess and bash interpreter

Despite those limitations, I find that this tool comes in handy in a few scenarios:

• Preliminary benchmarking in the prototyping phase of program development
• Benchmarking program pipelines written in multiple languages

You can install this tool by following the instructions on the Github repo. Or if you have the Haskell stack tool installed you can just run:

```
$ stack update
$ stack install bench
```

### The GHC Team

# GHC 8.0.1 is available!

The GHC developers are very pleased to announce the release of the first new super-major version of our Haskell compiler in six years, GHC 8.0.1.
This release features dozens of exciting developments including, • A more refined interface for implicit call-stacks, allowing libraries to provide more helpful runtime error messages to users • The introduction of the DuplicateRecordFields language extension, allowing multiple datatypes to declare fields of the same name • Significant improvements in error message readability and content, including facilities for libraries to provide custom error messages, more aggressive warnings for fragile rewrite rules, and more helpful errors for missing imports • A rewritten and substantially more thorough pattern match checker, providing more precise exhaustiveness checking in GADT pattern matches • More reliable debugging information including experimental backtrace support, allowing better integration with traditional debugging tools • Support for desugaring do-notation to use Applicative combinators, allowing the intuitive do notation to be used in settings which previously required the direct use of Applicative combinators • The introduction of Strict and StrictData language extensions, allowing modules to be compiled with strict-by-default evaluation of bindings • Great improvements in portability, including more reliable linking on Windows, a new PPC64 code generator, support for the AIX operating system, unregisterised m68k support, and significant stabilization on ARM targets • A greatly improved user's guide, with beautiful and modern PDF and HTML output • Introduction of type application syntax, reducing the need for proxies • More complete support for pattern synonyms, including record pattern synonyms and the ability to export patterns "bundled" with a type, as you would a data constructor • Support for injective type families and recursive superclass relationships • An improved generics representation leveraging GHC's support for type-level literals • The TypeInType extension, which unifies types and kinds, allowing GHC to reason about kind equality and 
enabling promotion of more constructs to the type level

• ...and more!

A more thorough list of the changes included in this release can be found in the release notes. As always, we have collected various points of interest for users of previous GHC releases on the GHC 8.0 migration page. Please let us know if you encounter anything missing or unclear on this page. This release is the culmination of nearly eighteen months of effort by over one hundred contributors. We'd like to thank everyone who has contributed code, bug reports, and feedback over the past year. It's only because of their efforts that GHC continues to evolve.

## How to get it

Both the source tarball and binary distributions for a wide variety of platforms are available here.

## Background

Haskell is a standardized lazy functional programming language. The Glasgow Haskell Compiler (GHC) is a state-of-the-art programming suite for Haskell. Included is an optimising compiler generating efficient code for a variety of platforms, together with an interactive system for convenient, quick development. The distribution includes space and time profiling facilities, a large collection of libraries, and support for various language extensions, including concurrency, exceptions, and foreign language interfaces. GHC is distributed under a BSD-style open source license.

## Supported Platforms

The list of platforms we support, and the people responsible for them, can be found on the GHC wiki. Ports to other platforms are possible with varying degrees of difficulty. The Building Guide describes how to go about porting to a new platform.

## Developers

We welcome new contributors. Instructions on getting started with hacking on GHC are available from GHC's developer site.

## Community Resources

There are mailing lists for GHC users, developers, and monitoring bug tracker activity; to subscribe, use the Mailman web interface.
There are several other Haskell and GHC-related mailing lists on haskell.org; for the full list, see the lists page. Some GHC developers hang out on the #ghc and #haskell channels of the Freenode IRC network, too. See the Haskell wiki for details. Please report bugs using our bug tracking system. Instructions on reporting bugs can be found here.

## May 20, 2016

### Neil Mitchell

# Another space leak: QuickCheck edition

Summary: QuickCheck had a space leak in property, now fixed (in HEAD).

Using the techniques described in my previous blog post I found another space leak, this time in QuickCheck, which has now been fixed. Using QuickCheck we can choose to "label" certain inputs, for example:

```
$ quickCheck $ \p -> label (if p > 0 then "+ve" else "-ve") True
+++ OK, passed 100 tests:
54% -ve
46% +ve
```

Here we label numbers based on their value, and at the end QuickCheck tells us how many were in each set. As you might expect, the underlying QuickCheck implementation contains a Map String Int to record how many tests get each label. Unfortunately, the implementation in QuickCheck-2.8.1 has a space leak, meaning that the memory usage is proportional to the number of tests run. We can provoke such a space leak with:

```
quickCheckWithResult stdArgs{maxSuccess=10000} $
    \(p :: Double) -> label "foo" True
```

When running with ghc --make Main.hs -rtsopts && Main +RTS -K1K we get the error:

Main: Stack space overflow: current size 33624 bytes.

Using -K1K we have detected when we evaluate the space leak, at the end of the program, when trying to print out the summary statistics. The approach taken by QuickCheck for label is to generate a separate Map String Int per run, then at each step merge these Map values together using unionWith (+). As such, there are two likely culprits for the space leak:

• Perhaps the Map is not evaluated, so in memory we have unionWith (+) x1 $ unionWith (+) x2 $ unionWith (+) x3 $ ....
• Perhaps the values inside the Map are not evaluated, so in memory we have Map {"foo" = 1 + 1 + 1 + ...}.

QuickCheck avoids the first space leak by keeping its intermediate state in a record type with a strict field for the Map. QuickCheck suffers from the second problem. As usual, actually fixing the space leak is easy - just switch from importing Data.Map to Data.Map.Strict. The Strict module ensures that the computations passed to unionWith are forced before it returns, and the memory usage remains constant, not linear in the number of tests.

I detected this space leak because the Shake test suite runs with -K1K and when running one particular test on a Mac with GHC 8.0 in profiling mode it caused a stack overflow. I did not diagnose which of those factors was the ultimate cause (it may have even been the random seed at a particular point in time - only certain inputs call label). Many space leaks are now easy to detect (using -K1K), moderate difficulty to debug (using the -xc technique or just by eye) and usually easy to fix.

### Philip Wadler

# Cycling Fallacies

Cycling Fallacies is a handy reference from The Cycling Embassy of Great Britain. A sample entry is copied below. Spotted by Ewen Maclean.

## Your cycling fallacy is…

## “If you put in a cycle lane, or pedestrianise a road, shops will get less business.”

## The Response

Cycling infrastructure (or pedestrianisation) does not restrict access to shops – it can actually make the streets and roads shops are on nicer places to visit, increasing footfall, and overall demand. Many studies – from the Netherlands in the 1970s, to big US cities in the 2010s – have found that installing cycle infrastructure does not have a negative effect on income of businesses, and in most cases has a positive effect.
It's a popular myth that people who arrive by car spend more. People who arrive at shops on foot, or by bike, may spend less per visit, but they will visit more often, and they will spend more money overall. And being able to access a shop easily by foot or by cycle means that more frequent trips involving smaller ‘baskets’ become more convenient. The headline message is: well-designed streets that make cycling and walking attractive are good for business. And in any case, cycling infrastructure won't stop people driving to shops, or parking near them and walking a short distance.

## Further reading

## May 19, 2016

### Brent Yorgey

# How to print things

I have finally finished up writing my guide on how to print things, based on this blog post from January on my other (math) blog. It specifically enumerates the pros and cons of various methods for printing and reading loose-leaf documents (the sort of thing that academics do quite a bit, when they print out a journal article to read). The main motivation for writing the page is to explain the (to my knowledge, novel) Möbius method for printing and reading double-sided, like this:

I actually now use this in practice. As compared to the usual method of printing double-sided, this has several advantages:

• One always does the exact same action after finishing every page; there is no need to remember whether you are on an even or an odd page.
• Any consecutive sequence of $n/2$ pages is spread across different sheets of paper, so it is easy to simultaneously refer to multiple pages close together. There is never any need to keep flipping a sheet back and forth to refer to the previous page (as there is with traditional double-sided printing).

But there are even new things to say about traditional double-sided printing, as well.
I now know of several different algorithms for reading double-sided, each with its pros and cons; previously I had not even considered that there might be more than one way to do it.

## May 18, 2016

### Lee Pike

# Max: Phase 1 Report

I sent the following R&D report to my colleagues, but a few other folks outside Galois have expressed interest in the project, so I’m sharing it more broadly.

Subject: Max: Phase 1 Report

As some of you know, about nine months ago, I started a skunk-works R&D project with Brooke. Autonomous systems are all the rage these days, so we decided to try to create one. First, I have to be honest and say that although I participated in the initial project kickoff, I mostly played a supporting role after that. Brooke did all the major software and hardware development. (If you’ve worked with me on a project, this should sound pretty familiar.) Once Brooke started development, she really threw herself into it. She seemed to be working on things day and night, and it even looked a bit painful at times. She sensed she was getting close to an alpha release a few days ago, and after a final four hour sprint, she made the release at 2:30am on Mother’s Day! We are officially out of stealth mode. We call our project “Machine Awareness X”, or Max for short. The current system is capable of basic knowledge discovery and processing. Now, don’t get too excited; we expect at least a few years before it’s able to do something interesting with the knowledge acquired. I won’t go into the technical details, but the programming model is very biological—think “Game of Life” on steroids. At this point, we’ll have to continue to provide guidance and some rules, but it’s basically self-programming. Following a “Johnny Ive” approach to design, Max has basically one notification method. It’s a fairly piercing audio signal used whenever his power supply is running low or there’s a hardware problem. (Frankly, sometimes it seems to just go off for no reason at all.)
We designed it to be loud enough to hear across the house, but I wish it had a volume control. Next time! Otherwise, the package is quite attractive, in my opinion, even cute. Unfortunately, at 7lbs. 8oz., the hardware is heavier than even a decade-old laptop, and we expect new versions to require an even larger form factor. Fortunately, we designed the system to be self-propelling, although it’ll take a few years before that hardware is developed (the software isn’t ready for it anyways). There’s still quite a bit of work to do. Our back-of-the-envelope estimate is that we’ll have to spend just short of two decades caring for Max before he’s fully autonomous. Even more disappointingly, we’re estimating having to spend up to a quarter million (in today’s dollars) in upkeep and maintenance! (Sadly, while others are interested in playing with the system intermittently, nobody seems that interested in joining us as early investors.) Despite all the training we’re planning to provide, the system seems too complicated to guarantee certain behaviors. For example, while more general than an autonomous car, it may take more than 15 years of training before his software is capable of piloting a conventional automobile. I’m guessing some of you are wondering about commercialization opportunities. The good news: we expect Max to be useful to society (we haven’t found a killer app yet, though) and to generate quite a bit of revenue over its lifetime. The bad news: we don’t expect it to start producing reliable revenue for more than 20 years. What’s more, it has a lot of upkeep expenses that will only grow with time. This might sound like science fiction, but we imagine he might even replicate himself in the distant future, and will likely pour his revenues into his own replicants. In short, we don’t expect to make a dime from the project. More seriously, we had a kid; mom and baby are doing fine. See you soon. 
Regards, Papa Pike

### Yesod Web Framework

# Are unused import warnings harmful?

Which of the following snippets of code is better?

```haskell
#if MIN_VERSION_base(4,8,0)
import Control.Applicative ((<*))
#else
import Control.Applicative ((<*), pure)
#endif
```

Versus:

```haskell
import Control.Applicative ((<*), pure)
```

If you are working on a project that supports multiple GHC versions, enable extra warnings via -Wall, and actually like to get your code to compile without any warnings, you'll probably say that the former is better. I'm going to claim, however, that any sane human being knows intuitively that the latter is the better version of the code, for multiple reasons:

• It doesn't require a language extension to be enabled
• It's much shorter without losing any useful information to the reader
• It's more robust to future changes: if you need to add an import, you don't have to remember to update two places

However, if you look through my code bases, and the code bases of many other open source Haskell authors, you'll find examples like the former regularly. I'm beginning to come to the conclusion that we've been attacking this problem the wrong way, and what we should be doing is:

• Turning on -Wall in our code
• Either modifying -Wall in GHC to not warn about unused imports, or explicitly disabling unused import warnings via -fno-warn-unused-imports
• As many of us already do, religiously using Travis CI to check multiple GHC versions to avoid accidental regressions
• In our Travis builds, turning on -Werror

Maintaining complex CPP in our imports is sometimes a necessary evil, such as when APIs change. But when we are simply doing it to work around changes in what Prelude or other modules export, it's an unnecessary evil. This is similar to the change to GHC a few years back which allowed hiding (isNotExported) to not generate a warning: it made it much easier to deal with the now-no-longer-present Prelude.catch function.
While it's true that removing unused imports is a nice thing to do to our codebases from time to time, their presence does not actually indicate any potential issues with our code. My concern with the presence of these warnings is that they will lead to one of two situations:

• We simply accept that our libraries generate warnings when compiled, which ends up hiding actionable warnings via a terrible signal-to-noise ratio
• In an effort to clean up all warnings, we end up creating hideous messes like those above, or breaking backwards compatibility with old versions of dependencies

I haven't actually started making these modifications to my libraries, as I'm not yet fully convinced that this is a good idea. There are also other points in this design space, like explicitly marking some imports as redundant, though that would require some deeper changes to GHC and wouldn't be usable until we drop support for all current GHC versions.

### Brent Yorgey

# In praise of Beeminder

In January 2013, I wrote a post explaining my use of Beeminder, after using it for six months. Well, I’ve now been using it continuously for almost four years! It has become such an integral part of my life and workflow that I literally don’t know what I would do if it went away. So I decided it was high time to write another blog post about Beeminder. This time, instead of enumerating things I am currently using it for, I will focus on things I have accomplished with the help of Beeminder. There is little doubt in my mind that I am much awesomer today than I would have been without Beeminder.

First, what is Beeminder? Here’s what I wrote three and a half years ago, which I think is still a good description: The basic idea is that it helps you keep track of progress on any quantifiable goals, and gives you short-term incentive to stay on track: if you don’t, Beeminder takes your money. But it’s not just about the fear of losing money.
Shiny graphs tracking your progress coupled with helpfully concrete short-term goals (“today you need to write 1.3 pages of that paper”) make for excellent positive motivation, too.

The key property that makes Beeminder work so well for me is that it makes long-term goals into short-term ones. I am a terrible procrastinator—due to hyperbolic discounting I can be counted on to pretty much ignore anything with long-term rewards or consequences. A vague sense that I ought to take better care of my bike is not enough to compel me to action in the present; but “inflate your tires and grease your chain before midnight or else pay $5” is.

So, what have I accomplished over the past three years?

• First, the big one: I finished my PhD! A PhD dissertation is much too big to put off until the last minute. Writing my thesis proposal, researching and writing my dissertation itself, and making slides for my defense were all largely driven by Beeminder goals. I am honestly not sure if I would have been able to finish otherwise.
• I landed two jobs: first, a one-year position at Williams College, and now a tenure-track position at Hendrix College. Preparing application materials and applying for academic jobs is not much fun, and it was a really tough slog putting everything together, especially while I was teaching at Williams last year. Having a Beeminder goal for spending time on my application materials was absolutely critical.
• I finished watching every single Catsters video and writing up a topologically sorted guide to the series.
• Since March 2015, when I started cranking up my Beeminder goal for writing blog posts again, I have written over 80 posts on my two blogs. I also wrote more than 40 posts in late 2012-early 2013 using another goal (the gap from 2013-2015 was when I was writing my dissertation instead of blogging!).
• Over the past three years, I have spent about 1 hour per week (typically spread over 3 or 4 days) learning biblical Hebrew. It adds up to almost 150 hours of Hebrew study—which doesn’t sound like a whole lot, but almost every minute of it was quality, focused study time. And since it has been so spread out, the material is quite firmly embedded in my long-term memory. I recently finished working through the entire introductory textbook I was using, while doing every single exercise in the associated workbook. I am still far from being an expert, but I can actually read simple things now.
• Over the past 6 months I lost more than 15 pounds.
• Since September I have been swimming two mornings a week: when I started, I could barely do two laps before feeling like I was going to be sick; now, I can swim 500m in under 9 minutes (just under double world record pace =).

There are lots of other things I use Beeminder for, but these are the accomplishments I am proudest of. If you want to do awesome things but can never quite seem to find the time or motivation to do them, give it a try!

# CMake, ExternalData, and custom fetch script

I failed to find a concrete example of how to use the CMake module ExternalData with a custom fetch script. Since I finally managed to work out how to use it, I thought I’d try to help the next person who needs to go down this route.

### Why ExternalData?

I thought I’d start with a short justification of why I was looking at the module at all.

At work I work with a product that processes images and video. When writing tests we often need some rather large files (from MiB to GiB) as input. The two obvious options are:

1. Check the files into our Git repo, or
2. Put them on shared storage

Neither of these is very appealing. The former just doesn’t feel quite right: these are large binary files that rarely, if ever, change, so why place them under version control at all? And if they do change, the Git repo is likely to balloon in size and impact cloning times negatively. The latter makes it difficult to run our tests on a machine that isn’t on the office network, and any changes to the files will break older tests, unless we only ever add files and never modify any in place. On the other hand, the former guarantees that the files needed for testing are always available, and it is possible to modify the files without breaking older tests. The pro of the latter is that we only download the files needed for the current tests.

ExternalData is one option to address this. On some level it feels like it offers a combination of both options above:

• It’s possible to use the shared storage
• When the shared storage isn’t available it’s possible to fall back on downloading the files via other means
• The layout of the storage is such that modifying in place is much less likely
• Only the files needed for the current tests will be downloaded when building off-site

### The object store

We do our building in docker images that do have our shared storage mapped in, so I’d like them to take advantage of that. At the same time I want the builds performed off-site to download the files. To get this behaviour I defined two object stores:

```cmake
set(ExternalData_OBJECT_STORES
  ${CMAKE_BINARY_DIR}/ExternalData/Objects
  /mnt/shared/over/nfs/Objects
)
```

The module will search each of these for the required files and download only if they aren’t found. Downloaded files will be put into the first of the stores. Oh, and it’s very important that the first store is given with an absolute path! The store on the shared storage looks something like this:

```
/mnt/shared/over/nfs/Objects
└── MD5
    ├── 94ed17f9b6c74a732fba7b243ab945ff
    └── a2036177b190fbee6e9e038b718f1c20
```

I can then drop a file MyInput.avi.md5 in my source tree with the md5 of the real file (e.g. a2036177b190fbee6e9e038b718f1c20) as the content. Once that is done I can follow the example found in the introduction of the reference documentation.

### curl vs sftp

So far so good. This now works on-site, but for off-site use I need to fetch the needed files. The last section of the reference documentation is called Custom Fetch Scripts. It mentions that files are normally downloaded using file(DOWNLOAD). Neither there, nor in the documentation for file, is there a mention of what is used under the hood to fetch the files. After asking in #cmake I found out that it’s curl. While curl does handle SFTP, I didn’t get it to work with my known_hosts file, nor with my SSH agent (both from OpenSSH). On the other hand it was rather easy to configure sftp to fetch a file from the internet-facing SSH server we have. Now I just had to hook it into CMake somehow.

### Custom fetch script

As the section on “Custom Fetch Scripts” mentions, three things are needed:

1. Specify the script via the ExternalDataCustomScript:// protocol.
2. Tell CMake where it can find the fetch script.
3. The fetch script itself.

The first two steps are done by providing a URL template and pointing to the script via a special variable:

```cmake
set(ExternalData_URL_TEMPLATES
  "ExternalDataCustomScript://sftp/mnt/shared/over/nfs/Objects/%(algo)/%(hash)")
set(ExternalData_CUSTOM_SCRIPT_sftp
  ${CMAKE_SOURCE_DIR}/cmake/FetchFromSftp.cmake)
```
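The store is content-addressed: the pointer file checked into the source tree holds only the MD5 of the real file, and the object store holds the real file under that hash. A Python sketch of how such a store gets populated (the helper is hypothetical, not part of the CMake module):

```python
import hashlib
from pathlib import Path

def add_to_store(data_file: Path, store: Path) -> Path:
    """Hash a data file, copy it into an MD5-keyed object store, and
    write the small .md5 pointer file next to the original."""
    digest = hashlib.md5(data_file.read_bytes()).hexdigest()
    obj = store / "MD5" / digest
    obj.parent.mkdir(parents=True, exist_ok=True)
    obj.write_bytes(data_file.read_bytes())
    # The pointer file checked into the source tree contains only the hash.
    pointer = data_file.with_suffix(data_file.suffix + ".md5")
    pointer.write_text(digest + "\n")
    return obj
```

With this layout, modifying a test input produces a new hash and therefore a new object, so existing objects are never changed in place and older tests keep working.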

It took me a ridiculous amount of time to work out how to write a script that turns out to be rather short. This is an experience that seems to repeat itself when using CMake; it could say something about me, or something about CMake.
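For orientation, the flow the fetch script has to implement is small: run sftp to fetch the object into the current working directory, then move it into the object store under the hash-derived name CMake asks for. Here is that flow sketched in Python (host and paths are placeholders; the sftp_cmd parameter exists only so the sketch can be exercised without a real server):

```python
import shutil
import subprocess
from pathlib import Path

def fetch_from_sftp(custom_location: str, custom_file: str,
                    host: str = "sftp.company.com", sftp_cmd=("sftp",)):
    """Fetch one ExternalData object and place it where CMake expects it.

    custom_location - store-relative path of the object, e.g. "MD5/<hash>"
    custom_file     - absolute path the object must end up at
    """
    store_dir = Path(custom_file).parent
    fetched = Path(custom_location).name  # sftp drops the file into the CWD
    result = subprocess.run([*sftp_cmd, f"{host}:/{custom_location}"],
                            capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"Failed to fetch from SFTP - {result.stderr}")
    store_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(fetched, custom_file)  # rename to the hash-named object file
```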

```cmake
get_filename_component(FFS_ObjStoreDir ${ExternalData_CUSTOM_FILE} DIRECTORY)
get_filename_component(FFS_InputFilename ${ExternalData_CUSTOM_LOCATION} NAME)
get_filename_component(FFS_OutputFilename ${ExternalData_CUSTOM_FILE} NAME)

execute_process(COMMAND sftp sftp.company.com:/${ExternalData_CUSTOM_LOCATION}
  RESULT_VARIABLE FFS_SftpResult
  OUTPUT_QUIET
  ERROR_VARIABLE FFS_SftpErr
)

if(FFS_SftpResult)
  set(ExternalData_CUSTOM_ERROR "Failed to fetch from SFTP - ${FFS_SftpErr}")
else(FFS_SftpResult)
  file(MAKE_DIRECTORY ${FFS_ObjStoreDir})
  file(RENAME ${FFS_InputFilename} ${FFS_ObjStoreDir}/${FFS_OutputFilename})
endif(FFS_SftpResult)
```

This script is run with cmake -P in the binary dir of the CMakeLists.txt where the test is defined, which means it’s oblivious to the project it’s part of. PROJECT_BINARY_DIR is empty and CMAKE_BINARY_DIR is the same as CMAKE_CURRENT_BINARY_DIR. This is the reason why the first store in ExternalData_OBJECT_STORES has to be an absolute path – it’s very difficult, if not impossible, to find the correct placement of the object store otherwise.

## May 16, 2016

### Ken T Takusagawa

# [szamjfab] Error messages and documentation for FTP

One of the criticisms of the Foldable/Traversable Proposal (FTP) in Haskell is that error messages get more confusing and documentation gets harder to understand. Both of these problems could be addressed with improvements to tools. Errors when calling a polymorphic function with a Foldable or Traversable context could have additional text repeating what the error message would be if the function were specialized to lists. Haddock could generate additional documentation for a polymorphic function with a Foldable or Traversable context: generate, as documentation, what the type signature would be if the function were specialized to lists. Or, the type variable could be named (renamed) "list":

```haskell
mapM :: (Traversable list, Monad m) => (a -> m b) -> list a -> m (list a)
```

### LambdaCube

# Ambient Occlusion Fields

Recently I created a new example that we added to the online editor: a simple showcase using ambient occlusion fields. This is a lightweight method to approximate ambient occlusion in real time using a 3D lookup table. There is no single best method for calculating ambient occlusion, because various approaches shine under different conditions.
For instance, screen-space methods are more likely to perform better when shading (very) small features, while working at the scale of objects or rooms requires solutions that work in world space, unless the artistic intent calls for a deliberately unrealistic look. VR applications especially favour world-space effects due to the increased need for temporal and spatial coherence.

I was interested in finding a reasonable approximation of ambient occlusion by big moving objects without unpleasant temporal artifacts, so it was clear from the beginning that a screen-space postprocessing effect was out of the question. I also ruled out approaches based on raymarching and raytracing, because I wanted it to be lightweight enough for mobile and other low-performance devices, and support any possible occluder shape defined as a triangle mesh. Being physically accurate was less of a concern for me as long as the result looked convincing.

First of all, I did a little research on world-space methods. I quickly found two solutions that are the most widely cited:

1. Ambient Occlusion Fields by Kontkanen and Laine, which uses a cube map to encode the occlusion by a single object. Each entry of the map contains coefficients for an approximation function that returns the occlusion term given the distance from the centre of the object in the direction corresponding to the entry. They also describe a way to combine the occlusion terms originating from several objects by exploiting blending.
2. Ambient Occlusion Fields and Decals in Infamous 2, which is a more direct approach that stores occlusion information (amount and general direction) in a 3D texture fitted around the casting object. This allows a more accurate reconstruction of occlusion, especially close to or within the convex hull of the object, at the cost of higher memory requirements.

I thought the latter approach was promising and created a prototype implementation.
However, I was unhappy with the results exactly where I expected this method to shine: inside and near the object, and especially when it should have been self-shadowing. After exhausting my patience for hacks, I had a different idea: instead of storing the general direction and strength of the occlusion at every sampling point, I’d directly store the strength in each of the six principal (axis-aligned) directions. The results surpassed all my expectations! The shading effect was very well-behaved and robust in general, and all the issues with missing occlusion went away instantly. While this meant increasing the number of terms from 4 to 6 for each sample, thanks to the improved behaviour the sampling resolution could be reduced enough to more than make up for it – consider that decreasing resolution by only 20% is enough to nearly halve the volume.

The real beef of this method is in the preprocessing step to generate the field, so let’s go through the process step by step. First of all, we take the bounding box of the object and add some padding to capture the domain where we want the approximation to work. Next, we sample occlusion at every point by rendering the object on an 8×8 cube map as seen from that point. We just want a black and white image where the occluded parts are white. There is no real need for higher resolution or antialiasing, as we’ll have more than 256 samples affecting each of the final terms. Here’s what the cube maps look like (using 10x10x10 sampling points for the illustration).

Now we need a way to reduce each cube map to just six occlusion terms, one for each of the principal directions. The obvious thing to do is to define them as averages over half cubes. E.g. the up term is an average of the upper half of the cube, the right term is derived from the right half etc.
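A minimal sketch of this half-cube reduction, treating the cube map as a flat list of sample directions with 0/1 occlusion values (names and representation are mine, not LambdaCube’s):

```python
# The six principal directions as unit vectors.
PRINCIPAL = {
    "left":  (-1.0, 0.0, 0.0), "right":   (1.0, 0.0, 0.0),
    "down":  (0.0, -1.0, 0.0), "up":      (0.0, 1.0, 0.0),
    "back":  (0.0, 0.0, -1.0), "forward": (0.0, 0.0, 1.0),
}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def half_cube_averages(samples):
    """Reduce cube-map samples to six occlusion terms.

    samples: list of (direction, occlusion) pairs, where direction is a
    unit vector and occlusion is 0.0 (free) or 1.0 (occluded).  For each
    principal direction, average the occlusion of the samples at an acute
    angle with it, i.e. the corresponding half cube.
    """
    terms = {}
    for name, axis in PRINCIPAL.items():
        hemisphere = [occ for d, occ in samples if dot(d, axis) > 0.0]
        terms[name] = sum(hemisphere) / len(hemisphere)
    return terms
```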
For better accuracy, it might help to weight the samples of the cube map based on the relative directions they represent, but I chose not to do this because I was satisfied with the outcome even with simple averaging, and the difference is unlikely to be significant. Your mileage may vary. The resulting terms can be stored in two RGB texels per sample, either in a 3D texture or a 2D one if your target platform has no support for the former (looking at you, WebGL). To recap, here’s the whole field generation process in pseudocode:

```
principal_directions = {left, down, back, right, up, forward}
for each sample_index in (1, 1, 1) to (x_res, y_res, z_res)
    pos = position of the grid point at sample_index
    sample = black and white 8x8 cube map capture of object at pos
    for each dir_index in 1 to 6
        dir = principal_directions[dir_index]
        hemisphere = all texels of sample in the directions at acute angle with dir
        terms[dir_index] = average(hemisphere)
    field_negative[sample_index] = (r: terms[1], g: terms[2], b: terms[3])
    field_positive[sample_index] = (r: terms[4], g: terms[5], b: terms[6])
```

This is what it looks like when sampling at a resolution of 32x32x32 (negative XYZ terms on top, positive XYZ terms on bottom). The resulting image is basically a voxelised representation of the object.

Given this data, it is very easy to extract the final occlusion term during rendering. The key equation is the following:

    occlusion = dot(minField(p), max(-n, 0)) + dot(maxField(p), max(n, 0))

where

• p = the position of the sampling point in field space (this is normalised, i.e.
(0,0,0) corresponds to one corner of the original bounding box used to generate the samples, and (1,1,1) covers the opposite corner)
• n = the normal of the surface in occluder local space
• minField = function to sample the minimum/negative terms (a single texture lookup if we have a 3D texture, two lookups and a lerp if we have a 2D texture)
• maxField = function to sample the maximum/positive terms

All we’re doing here is computing a weighted sum of the occlusion terms, where the weights are the clamped dot products of n with the six principal directions. These weights happen to be the same as the individual components of the normal, so instead of doing six dot products, we can get them by zeroing out the negative terms of n and -n.

Putting aside the details of obtaining p and n for a moment, let’s look at the result. Not very surprisingly, the ambient term computed from the above field suffers from aliasing, which is especially visible when moving the object. Blurring the field with an appropriate kernel before use can completely eliminate this artifact. I settled on the following 3x3x3 kernel, shown slice by slice:

```
1 4 1     4  9 4     1 4 1
4 9 4     9 16 9     4 9 4
1 4 1     4  9 4     1 4 1
```

Also, since the field is finite in size, I decided to simply fade out the terms to zero near the edge to improve the transition at the boundary. In the Infamous 2 implementation they opted for remapping the samples so the highest boundary value would be zero, but this means that every object has a different mapping that needs to be fixed with other magic coefficients later on. Here’s a comparison of the original (left) and the blurred (right) fields.

Back to the problem of sampling. Most of the work is just transforming points and vectors between coordinate systems, so it can be done in the vertex shader. Let’s define a few transformations:

• F – occluder local to (normalised) field space, i.e.
map the bounding box in the occluder’s local space to the range (0,0,0)-(1,1,1); this matrix is tied to the baked field, therefore it’s constant
• O – occluder local to world space, i.e. occluder model transformation
• R – receiver local to world space, i.e. receiver model transformation

I’ll use the f, o, and r subscripts to denote that a point or vector is in field, occluder local or receiver local space, e.g. p_f is the field space position, or n_r is the receiver’s local surface normal. When rendering an occluder, our constant input is F, O, R, and per vertex input is p_r and n_r. Given this data, we can now derive the values of p and n needed for sampling:

    n = n_o = normalize(O⁻¹ * R * n_r)
    p = p_f + n * bias = F * O⁻¹ * R * p_r + n * bias

The bias factor is inversely proportional to the field’s resolution (I’m using 1/32 in the example, but it could also be a non-uniform scale if the field is not cube shaped), and its role is to prevent surfaces from shadowing themselves. Note that we’re not transforming the normal into field space, since that would alter its direction.

And that’s pretty much it! So far I’m very pleased with the results. One improvement I believe might be worth looking into is reducing the number of terms from 6 to 4 per sample, so we can halve the amount of texture space and lookups needed. To do this, I’d pick the normals of a regular tetrahedron instead of the six principal directions for reference, and compute 4 dot products in the vertex shader (the 4 reference normals could be packed in a 4×3 matrix N) to determine the contribution of each term:

    weights = N * n_o = (dot(n_o, n_1), dot(n_o, n_2), dot(n_o, n_3), dot(n_o, n_4))
    occlusion = dot(field(p), max(weights, 0))

As soon as LambdaCube is in a good shape again, I’m planning to augment our beloved Stunts example with this effect.
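The runtime lookup in the key equation above is cheap enough to sketch directly; here is a Python stand-in where the two field lookups are plain functions instead of texture fetches (names are mine):

```python
def occlusion(min_field, max_field, p, n):
    """Evaluate the six-direction AO field for normalised field position p
    and surface normal n (both in field space).  min_field / max_field are
    stand-ins for the negative- and positive-direction texture lookups,
    each returning three terms (x, y, z)."""
    neg = [max(-c, 0.0) for c in n]   # weights for the negative directions
    pos = [max(c, 0.0) for c in n]    # weights for the positive directions
    mins, maxs = min_field(p), max_field(p)
    return (sum(w * t for w, t in zip(neg, mins)) +
            sum(w * t for w, t in zip(pos, maxs)))
```

Because the clamped weights are just the components of n and -n, a normal pointing straight up reads only the positive-Y term, and a diagonal normal blends the neighbouring terms, which is what makes the shading smooth.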
### Tim Docker

# An executable specification for voteflux

voteflux is an interesting new political party that will field senate candidates at the Australian federal election in July. Its sole policy is to implement delegative democracy, and to do this within the existing Australian political system. It intends to use blockchain technology to provide cryptographic guarantees to the voting process. At the time of writing the voteflux software is incomplete, and there is not yet a rigorous specification for how the voting system will work. The voteflux website explains the system at a high level, but leaves questions unanswered. Discussions in the group’s slack forums fill in some details, and the party’s founders have answered some questions of my own.

In an effort to improve my own understanding of the voteflux ideas, and provide a basis for discussion with others, I’ve attempted to write an executable specification for the system in Haskell. All of the key logic is in Flux.hs. This was a worthwhile exercise – having to write concrete types and corresponding code made me consider many questions which weren’t apparent when thinking less rigorously. Going forward, I intend to build some simulations based upon this code.

Note that this code has no endorsement from the voteflux party – it represents my own efforts to understand the proposed system. But I like their plans, and hope they do well in the election.

# Haskell on Yosemite (OSX 10.10)

## Update (2016-05-16)

Most of the information below is now out of date. The stack build tool has made everything much simpler. Getting started is just a case of installing with

    brew install haskell-stack

… and then leaving the management of ghc installations up to stack.

# Haskell on Yosemite (OSX 10.10)

Nearly all my development has been done under linux. Only occasionally have I worked under osx. This is all to change – osx is to be my primary development platform.
In the past, my experiences with ghc on osx have been a little fraught. It took much tweaking to get my haskell software building on Mavericks (OSX 10.9). Problems I had included:

• issues with ghc 7.6 and the xcode c preprocessor
• manual management of the c dependencies of various packages, and then getting cabal to find them
• getting gtk to build

etc, etc. I’m pleased to discover that things have improved immensely. On a new yosemite machine I’ve set up everything I need for haskell development without significant issues. A combination of 3 things work together. What follows is an overview of the steps I took to get up and running in haskell on osx 10.10.

# 1. Install the xcode command line tools

Everything (including ghc) seems to depend on these.

    xcode-select --install

# 2. Install Brew

This is quick and easy, following the instructions on the brew homepage.

# 3. Install ghcformacosx

"ghcformacosx" is a "drag and drop" installation of ghc 7.8.4 and cabal 1.22.0.0. It installs as a regular osx application, but gives you access to the ghc and cabal command line tools. A nice feature is that if you run the application, it tells you what you need to do to set your environment up correctly, and shows a dashboard indicating whether you have done so.

Once this is done you need to bring the local package database up to date:

    cabal update

# 4. Use brew to install some key tools and libraries

One of my libraries has pcre-light as a transitive dependency, which needs a corresponding c library. Also cairo is the fastest rendering backend for my haskell charting library, and gtk is necessary if you want to show charts in windows. Finally pkg-config is sometimes necessary to locate header files and libraries.

    brew install pkg-config
    brew install pcre

    # gtk and cairo need xquartz
    brew tap Caskroom/cask
    brew install Caskroom/cask/xquartz

    # later steps in the build processes need to find libraries
    # like xcb-shm via package config. Tell pkg-config
    # where they are.
    export PKG_CONFIG_PATH=/opt/X11/lib/pkgconfig

    brew install cairo
    brew install gtk

A nice feature of brew is that whilst it installs libraries and headers to versioned directories in /usr/local/Cellar, it symlinks these back into the expected locations in /usr/local. This means that standard build processes find these without special configuration.

# 5. Set up some favorite command line tools

I use pandoc and ghc-mod a lot, and still need darcs sometimes. Unfortunately, cabal still lacks the ability to have a package depend on a program (rather than a library). Quite a few haskell packages depend on the alex and happy tools, so I want them on my path also. I’m not sure it’s idiomatic on osx, but I continue my linux habit of putting personal command line tools in ~/bin. I like to build all of these tools in a single cabal sandbox, and then link them into ~/bin. Hence, assuming ~/bin is on my path:

    cd ~/bin
    mkdir hackage
    (cd hackage && cabal sandbox init)
    (cd hackage && cabal install alex happy)
    ln -s hackage/.cabal-sandbox/bin/alex
    ln -s hackage/.cabal-sandbox/bin/happy
    (cd hackage && cabal install pandoc darcs ghc-mod)
    ln -s hackage/.cabal-sandbox/bin/pandoc
    ln -s hackage/.cabal-sandbox/bin/darcs
    ln -s hackage/.cabal-sandbox/bin/ghc-mod

(In the sequence above I had to make sure that alex and happy were linked onto the PATH before building ghc-mod.)

# 6. Build gtk2hs in its own sandbox

The hard work is already done by brew. We can build gtk2hs following the standard build instructions:

    export PKG_CONFIG_PATH=/opt/X11/lib/pkgconfig
    export PATH=.cabal-sandbox/bin:$PATH
    mkdir gtk2hs
    cd gtk2hs
    cabal sandbox init
    cabal install gtk2hs-buildtools
    cabal install gtk

Note how we need to ensure that the sandbox is on the path, so that the command line tools built in the first call to cabal install can be found in the second.

# Summary

All in all, this process was much smoother than before. Both ghcformacosx and brew are excellent pieces of work – kudos to their developers. ghc is, of course, as awesome as ever. When used with sandboxes cabal works well (despite the "cabal hell" reputation). However, having to manually resolve dependencies on build tools is tedious, I’d really like to see this cabal issue resolved.

## Update [2015-03-01]

One issue cropped up after this post. It turns out that ghc-mod has some constraints on the combinations of ghc and cabal versions, and unfortunately the combination provided in ghcformacosx is not supported. I worked around this by installing an older version of cabal in ~/bin:

cd ~/bin/hackage
cabal install --constraint "Cabal < 1.22" cabal-install
cd ~/bin
ln -s hackage/.cabal-sandbox/bin/cabal

# Debugging tcIfaceGlobal errors in GHC: a study in interpreting trace output

I recently solved a bug where GHC was being insufficiently lazy (yes, more laziness needed!) I thought this might serve as a good blog post for how I solve these sorts of laziness bugs, and might engender a useful conversation about how we can make debugging these sorts of problems easier for people.

### Hark! A bug!

Our story begins with an inflight patch for some related changes I’d been working on. The contents of the patch are not really important—it just fixed a bug where ghc --make did not have the same behavior as ghc -c in programs with hs-boot files.

Validating the patch on GHC’s test suite, I discovered that it made the prog006 test for GHCi start failing with the following error:

ghc-stage2: panic! (the 'impossible' happened)
(GHC version 8.1.20160512 for x86_64-unknown-linux):
You are in a maze of twisty little passages, all alike.
While forcing the thunk for TyThing Data
which was lazily initialized by initIfaceTcRn,
I tried to tie the knot, but I couldn't find Data
in the current type environment.
If you are developing GHC, please read Note [Tying the knot]
and Note [Type-checking inside the knot].
Consider rebuilding GHC with profiling for a better stack trace.
Contents of current type environment: []


tcIfaceGlobal errors are a “dark” corner of how GHC implements hs-boot files, but since I’d been looking at this part of the compiler for the past week, I decided to boldly charge forward.

### If your test case doesn't fit on a slide, it's not small enough

prog006 is not a simple test case, as it involves running the following commands in a GHCi session:

:! cp Boot1.hs Boot.hs
:l Boot.hs
:! sleep 1
:! cp Boot2.hs Boot.hs
:r


While the source files involved are relatively short, my first inclination was still to simplify the test case. My initial hypothesis was that the bug involved some aspect of how GHCi reloaded modules, so I first tried to minimize the source code involved:

-- Boot.hs-boot
module Boot where
data Data

-- A.hs
module A where
import {-# SOURCE #-} Boot
class Class a where
method :: a -> Data -> a

-- Boot1.hs
module Boot where
data Data

-- Boot2.hs
{-# LANGUAGE ExistentialQuantification #-}
module Boot where
import A
data Data = forall n. Class n => D n


This example uses a fancy language feature, ExistentialQuantification, and it's generally a good bet to try to eliminate these sorts of uses if they are not relevant to the problem at hand. So my first idea was to replace the type class in module A with something more pedestrian, e.g., a type synonym. (Note: why not try to eliminate the hs-boot? In this case, I happened to know that a tcIfaceGlobal error can only occur when compiling an hs-boot file.)

I did this transformation, resulting in the following smaller program:

-- Boot.hs-boot
module Boot where
data Data

-- A.hs
module A where
import {-# SOURCE #-} Boot
type S = Data

-- Boot.hs
module Boot where
import A
x :: S


This program indeed also gave a tcIfaceGlobal error... but then I realized that Boot.hs is not well-typed anyway: it’s missing a declaration for Data! And indeed, when I inserted the missing declaration, the panic went away.

One of the important things in debugging is to know when you have accidentally triggered a different bug. And indeed, this was a different bug, which I reported here.

In the process of reducing this test case I discovered that the bug had nothing to do with GHCi at all; e.g., just running ghc --make Boot2.hs was sufficient to trigger the bug. (Or, for a version of GHC without the patch in question, running ghc -c Boot2.hs after building the rest; prior to that patch, ghc --make behaves differently in a way that masks the bug.) So here's the final test case (with some shorter names to avoid some confusing messages):

-- Boot.hs-boot
module Boot where
data D

-- A.hs
module A where
import {-# SOURCE #-} Boot
class K a where
    method :: a -> D -> a

-- Boot.hs
{-# LANGUAGE ExistentialQuantification #-}
module Boot where
import A
data Data = forall n. K n => D n


### Debugging is easier when you know what the problem is

When debugging a problem like this, it helps to have some hypothesis about why the bug is occurring. And to have a hypothesis, we must first ask ourselves the question: what is tcIfaceGlobal doing, anyway?

Whenever you have a panic like this, you should grep for the error message and look at the surrounding source code. Here it is for tcIfaceGlobal (on a slightly older version of GHC which also manifests the bug):

; case if_rec_types env of {    -- Note [Tying the knot]
    Just (mod, get_type_env)
        | nameIsLocalOrFrom mod name
        -> do           -- It's defined in the module being compiled
      { type_env <- setLclEnv () get_type_env         -- yuk
      ; case lookupNameEnv type_env name of
          Just thing -> return thing
          Nothing    -> pprPanic "tcIfaceGlobal (local): not found:"
                                 (ppr name $$ ppr type_env) }

  ; _ -> do

And if you see a Note associated with the code, you should definitely go find it and read it:

-- Note [Tying the knot]
-- ~~~~~~~~~~~~~~~~~~~~~
-- The if_rec_types field is used in two situations:
--
-- a) Compiling M.hs, which indirectly imports Foo.hi, which mentions M.T
--    Then we look up M.T in M's type environment, which is splatted into if_rec_types
--    after we've built M's type envt.
--
-- b) In ghc --make, during the upsweep, we encounter M.hs, whose interface M.hi
--    is up to date.  So we call typecheckIface on M.hi.  This splats M.T into
--    if_rec_types so that the (lazily typechecked) decls see all the other decls
--
-- In case (b) it's important to do the if_rec_types check *before* looking in the HPT
-- Because if M.hs also has M.hs-boot, M.T will *already be* in the HPT, but in its
-- emasculated form (e.g. lacking data constructors).


So case (a) is exactly what's going on here: when we are typechecking Boot.hs and load the interface A.hi, when we typecheck the reference to D, we don’t go and typecheck Boot.hi-boot; instead, we try to tie the knot with the locally defined Data in the module. If Data is not in the type environment, we get the panic we were seeing.

What makes things tricky is that there is no explicit call to "typecheck the reference to D"; instead, this lump of work is unsafely wrapped into a thunk for the TyThing representing D, which is embedded within the description of K. When we force this thunk, GHC will then scurry off and attempt to typecheck the types associated with D.

Back to our original question: why is D not defined in the local type environment? In general, this is because we forced the thunk for K (and thus caused it to call tcIfaceGlobal D) before we actually added D to the type environment. But why did this occur? There seem to be two possible explanations for this:

1. The first explanation is that we forgot to update the type environment before we forced the thunk. The fix then would be to add some extra updates to the global type environment so that we can see the missing types when we do force the thunk.
2. The second explanation is that we are forcing the thunk too early, and there is some code that needs to be made lazier so that we only force thunk at the point when the type environment has been updated sufficiently.
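To make the failure mode concrete, here is a tiny standalone sketch of a lazily tied type environment (hypothetical names, not GHC code): each entry holds a thunk that looks its dependency up in the *final* environment, so forcing such a thunk before the needed declaration has been added produces exactly this kind of "not found" panic.

```haskell
import qualified Data.Map as M

-- A Thing may lazily refer to another Thing in the same environment.
data Thing = Thing { thingName :: String, thingDep :: Maybe Thing }

mkEnv :: [(String, Maybe String)] -> M.Map String Thing
mkEnv decls = env
  where
    -- env is defined in terms of itself: the thunks inside each Thing
    -- refer back to the finished map. This is fine as long as nobody
    -- forces a dependency before its declaration has been added.
    env = M.fromList [ (n, Thing n (fmap force dep)) | (n, dep) <- decls ]
    force n = case M.lookup n env of
      Just t  -> t
      Nothing -> error ("panic! not found in type environment: " ++ n)

main :: IO ()
main =
  -- Forcing K's dependency succeeds only because "D" is present in the
  -- tied environment; remove it from the list and this call dies with
  -- the error above, the analogue of the tcIfaceGlobal panic.
  print (thingName <$> (thingDep =<< M.lookup "K" (mkEnv [("K", Just "D"), ("D", Nothing)])))
```

Running this prints Just "D"; the interesting part is how easily it would panic if the environment were consulted one declaration too early.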

So, which is it?

In both cases, it seems useful to know where in the typechecking process we actually force the thunk. Now here’s the point where I should rebuild GHC with profiling and then get a stack trace out of tcIfaceGlobal, but I was feeling a bit lazy and so I decided to use GHC’s tracing facilities instead.

GHC has existing flags -ddump-tc-trace, -ddump-rn-trace and -ddump-if-trace which dump out a lot of debugging trace information associated with typechecking, renaming and interface loading, respectively. Most of these messages are very terse and don’t say very much about how the message is supposed to be interpreted; if you want to interpret these messages you are going to have to search the source code and see what code is outputting the trace.

Here's the end of the trace we get from compiling, in one-shot mode, Boot.hs:

Tc2 (src)
Tc3
txExtendKindEnv []
txExtendKindEnv []
tcTyAndCl start kind checking ()
kcTyClGroup
module Boot
data D = forall n_anU. K n_anU => D
<<some log elided here>>
tc_lhs_type:
K n_anU
Constraint
tc_infer_lhs_type: K
lk1 K
Starting fork { Declaration for K
updating EPS_
Considering whether to load GHC.Prim {- SYSTEM -}
reason: Need home interface for wired-in thing TYPE
updating EPS_
tc-iface-class1 K
tc-iface-class2 K
tc-iface-class3 K
tc-iface-class4 K
buildClass
newGlobalBinder A C:K <no location info>
C:K
newGlobalBinder A $tcK <no location info>
$tcK
Starting fork { Class op method D -> a
ghc-stage2: panic! (the 'impossible' happened)
<<rest of the panic message>>


Amazingly, this trace actually tells you exactly what you need to know to solve the bug... but we're getting ahead of ourselves. First, we need to know how to interpret this trace.

Each trace message, e.g., Tc2 (src), Tc3, etc., comes with a unique string which you can use to find where the trace originates from. For example, grepping for Tc2 lands you in TcRnDriver.hs, right where we are about to start renaming and typechecking all of the declarations in the source file. Similarly, lk1 lands you in TcHsType.hs, where we are trying to lookup the TyThing associated with K.

The Starting fork messages are of particular interest: this is -ddump-if-trace's way of saying, “I am evaluating a thunk which has some deferred interface-typechecking work.” So we can see that shortly after the trace lk1, we force the thunk for the type class declaration K; furthermore, while we are forcing this thunk, we further force the thunk for the class operation method :: D -> a, which actually causes the panic.

### The Rube Goldberg machine

I didn’t read the trace closely enough, so I spent some time manually adding extra tracing declarations and tracing the flow of the code during typechecking. Starting with Tc2 (src), we can actually use the trace to follow the flow of typechecking (use of hasktags here is essential!):

1. tcRnModuleTcRnM is the main entry point for renaming and typechecking a module. After processing imports, it calls tcRnSrcDecls to typecheck the main body.
2. tcRnSrcDecls calls tc_rn_src_decls to typecheck all of the top-level declarations; then it simplifies all of the top-level constraints and finalizes all the types.
3. tc_rn_src_decls is the main loop of the Template Haskell / typecheck/renaming dance. We first rename (via rnTopSrcDecls) and typecheck (tcTopSrcDecls) up until the first splice, then run the splice and recurse.
4. tcTopSrcDecls outputs Tc2 (src). It successively typechecks all the different types of top-level declarations. The big important one is tcTyClsInstDecls which typechecks type and class declarations and handles deriving clauses.
5. tcTyClsInstDecls calls tcTyAndClassDecls to typecheck top-level type and class declarations, and then calls tcInstDeclsDeriv to handle deriving.
6. tcTyAndClassDecls takes every mutually recursive group of type/class declarations and calls tcTyClGroup on them.
7. tcTyClGroup calls tcTyClDecls to typecheck the group and then checks if everything is well-formed.
8. tcTyClDecls actually type checks the group of declarations. It first kind-checks the group with kcTyClGroup; then it type-checks all of the groups together, tying the knot.
9. kcTyClGroup outputs the (appropriately named) kcTyClGroup trace. At this point I stopped tracing.

Seeing the kcTyClGroup trace but no terminating kcTyClGroup result trace (which is emitted at the end of that function) tells us that the bad thunk was triggered while we were still kind checking.

It is actually quite useful to know that the panic occurs while we are kind-checking: kind-checking occurs before we actually construct the knot-tied TyThing structures for these top-level declarations. So we know that it is not the case that we are failing to update the global type environment, because it definitely is not constructed at this point. It must be that we are forcing a thunk too early.

### AAAAAAAA is the sound of a GHC disappearing down a black hole

At this point, I was pretty sure that lk1, a.k.a. tcTyVar, was responsible for forcing the thunk that ultimately led to the panic, but I wasn't sure. Here's the code for the function:

tcTyVar :: TcTyMode -> Name -> TcM (TcType, TcKind)
-- See Note [Type checking recursive type and class declarations]
-- in TcTyClsDecls
tcTyVar mode name         -- Could be a tyvar, a tycon, or a datacon
  = do { traceTc "lk1" (ppr name)
       ; thing <- tcLookup name
       ; case thing of
           ATyVar _ tv -> return (mkTyVarTy tv, tyVarKind tv)

           ATcTyCon tc_tc -> do { check_tc tc_tc
                                ; tc <- get_loopy_tc name tc_tc
                                ; handle_tyfams tc tc_tc }
                -- mkNakedTyConApp: see Note [Type-checking inside the knot]
                -- NB: we really should check if we're at the kind level
                -- and if the tycon is promotable if -XNoTypeInType is set.
                -- But this is a terribly large amount of work! Not worth it.

           AGlobal (ATyCon tc)
             -> do { check_tc tc
                   ; handle_tyfams tc tc }


tcTyVar on K should hit the AGlobal (ATyCon tc) case, and inserting a trace on that branch didn't produce any extra output. But I sealed the deal by adding thing `seq` traceTc "lk2" (ppr name) and observing that no lk2 occurred.

It is also clear that it should be OK for us to force K, which is an external declaration, at this point in the code. So something has gone wrong inside the thunk itself.

### Back to the tea leaves

Let's take a look at the end of the trace again:

Starting fork { Declaration for K
updating EPS_
Considering whether to load GHC.Prim {- SYSTEM -}
reason: Need home interface for wired-in thing TYPE
updating EPS_
tc-iface-class1 K
tc-iface-class2 K
tc-iface-class3 K
tc-iface-class4 K
buildClass
newGlobalBinder A C:K <no location info>
C:K
newGlobalBinder A $tcK <no location info>
$tcK
Starting fork { Class op method D -> a
ghc-stage2: panic! (the 'impossible' happened)
<<rest of the panic message>>


In human readable terms, the trace tells a story like this:

1. Someone forced the thunk representing the TyThing for the type class K (Starting fork { Declaration for K)
2. I'm typechecking the contents of the IfaceDecl for K (tc-iface-class, etc.)
3. I'm building the actual Class representing this type class (buildClass)
4. I allocate some global names for the class in question. (newGlobalBinder)
5. Oops! I force the thunk representing class operation method (which has type D -> a)
6. Shortly after, a panic occurs.

So, it’s off to read the code for TcIface. Here's the body of the code which typechecks an IfaceDecl for a type class declaration:

= bindIfaceTyConBinders binders $ \ tyvars binders'
  -> do { tc_name <- lookupIfaceTop tc_occ
        ; traceIf (text "tc-iface-class1" <+> ppr tc_occ)
        ; ctxt <- mapM tc_sc rdr_ctxt
        ; traceIf (text "tc-iface-class2" <+> ppr tc_occ)
        ; sigs <- mapM tc_sig rdr_sigs
        ; fds  <- mapM tc_fd rdr_fds
        ; traceIf (text "tc-iface-class3" <+> ppr tc_occ)
        ; mindef <- traverse (lookupIfaceTop . mkVarOccFS) mindef_occ
        ; cls <- fixM $ \ cls -> do
                 { ats <- mapM (tc_at cls) rdr_ats
                 ; traceIf (text "tc-iface-class4" <+> ppr tc_occ)
                 ; buildClass tc_name tyvars roles ctxt binders'
                              fds ats sigs mindef tc_isrec }
        ; return (ATyCon (classTyCon cls)) }


The methods of a type class are processed in sigs <- mapM tc_sig rdr_sigs. Looking at this helper function, we see:

tc_sig :: IfaceClassOp -> IfL TcMethInfo
tc_sig (IfaceClassOp occ rdr_ty dm)
  = do { op_name <- lookupIfaceTop occ
       ; ~(op_ty, dm') <- forkM (mk_op_doc op_name rdr_ty) $
                          do { ty  <- tcIfaceType rdr_ty
                             ; dm' <- tc_dm dm
                             ; return (ty, dm') }
             -- Must be done lazily for just the same reason as the
             -- type of a data con; to avoid sucking in types that
             -- it mentions unless it's necessary to do so
       ; return (op_name, op_ty, dm') }


Great! There is already some code which mentions that the types of the signatures need to be done lazily. If we force op_ty or dm', we will cause the types here to be loaded. So now all we need to do is find where in buildClass these are being forced. Here is the header of buildClass:

buildClass tycon_name tvs roles sc_theta binders
           fds at_items sig_stuff mindef tc_isrec


So let's look for occurrences of sig_stuff. The first place they are used is:

; op_items <- mapM (mk_op_item rec_clas) sig_stuff
        -- Build the selector id and default method id


Let's look at that helper function:

mk_op_item :: Class -> TcMethInfo -> TcRnIf n m ClassOpItem
mk_op_item rec_clas (op_name, _, dm_spec)
  = do { dm_info <- case dm_spec of
                      Nothing   -> return Nothing
                      Just spec -> do { dm_name <- newImplicitBinder op_name mkDefaultMethodOcc
                                      ; return (Just (dm_name, spec)) }
       ; return (mkDictSelId op_name rec_clas, dm_info) }


There it is! The case on dm_spec will force dm', which will in turn cause the type to be forced, which results in a panic. That can't be right.

It seems that mk_op_item only cares about the top level of wrapping on dm_spec; spec is used lazily inside dm_info, which doesn't appear to be forced later in mkClass. So the fix would be to make it so that we can peel back the outer Maybe without forcing the contents of dm:

--- a/compiler/iface/TcIface.hs
+++ b/compiler/iface/TcIface.hs
@@ -429,20 +429,23 @@ tc_iface_decl _parent ignore_prags
     tc_sig :: IfaceClassOp -> IfL TcMethInfo
     tc_sig (IfaceClassOp occ rdr_ty dm)
       = do { op_name <- lookupIfaceTop occ
-           ; ~(op_ty, dm') <- forkM (mk_op_doc op_name rdr_ty) $
-                              do { ty <- tcIfaceType rdr_ty
-                                 ; dm' <- tc_dm dm
-                                 ; return (ty, dm') }
+           ; let doc = mk_op_doc op_name rdr_ty
+           ; op_ty <- forkM (doc <+> text "ty") $ tcIfaceType rdr_ty
                  -- Must be done lazily for just the same reason as the
                  -- type of a data con; to avoid sucking in types that
                  -- it mentions unless it's necessary to do so
+           ; dm' <- tc_dm doc dm
            ; return (op_name, op_ty, dm') }

-    tc_dm :: Maybe (DefMethSpec IfaceType) -> IfL (Maybe (DefMethSpec Type))
-    tc_dm Nothing = return Nothing
-    tc_dm (Just VanillaDM) = return (Just VanillaDM)
-    tc_dm (Just (GenericDM ty)) = do { ty' <- tcIfaceType ty
-                                     ; return (Just (GenericDM ty')) }
+    tc_dm :: SDoc
+          -> Maybe (DefMethSpec IfaceType)
+          -> IfL (Maybe (DefMethSpec Type))
+    tc_dm _ Nothing = return Nothing
+    tc_dm _ (Just VanillaDM) = return (Just VanillaDM)
+    tc_dm doc (Just (GenericDM ty))
+        = do { -- Must be done lazily to avoid sucking in types
+             ; ty' <- forkM (doc <+> text "dm") $ tcIfaceType ty
+             ; return (Just (GenericDM ty')) }


We check the fix, and yes! It works!
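The essence of the fix generalizes: when a thunk holds work that is not yet safe to do, pattern match only as deep as you need. A standalone sketch (names echo the patch, but this is not GHC code) showing that peeling the outer constructors without touching the field leaves the dangerous thunk unevaluated:

```haskell
-- A toy default-method spec, as in the patch above.
data DefMethSpec ty = VanillaDM | GenericDM ty

-- Deferred "typechecking" that blows up if forced before we are ready:
tcType :: Bool -> String -> String
tcType ready ty
  | ready     = ty
  | otherwise = error "panic! type forced too early"

-- Peeling back the outer Maybe and DefMethSpec constructors is safe:
-- the field of GenericDM stays an unevaluated thunk unless we use it.
peel :: Maybe (DefMethSpec String) -> Bool
peel Nothing              = False
peel (Just VanillaDM)     = True
peel (Just (GenericDM _)) = True  -- field deliberately untouched

main :: IO ()
main = print (peel (Just (GenericDM (tcType False "D -> a"))))
```

This prints True without ever forcing tcType. In the actual bug the culprit was subtler: op_ty and dm' lived inside a single forkM'd pair, so merely casing on dm_spec forced the whole pair, type included; the fix splits them so that peeling the Maybe stays cheap.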

### The parting glass

I won’t claim that my debugging process was the most efficient possible—not mentioned in this blog post is the day I spent reading the commit history to try and convince myself that there wasn’t actually a bug where we forgot to update the global type environment. But there seem to be a few generalizable lessons here:

1. If you see some trace output, the way to make the trace most useful to you is to determine where in the code the trace comes from, and what the compiler is doing at that point in time. Usually, grepping for the trace message is good enough to figure this out.
2. The smaller your test cases, the smaller your traces will be, which will make it easier to interpret the traces. When I ran my test case using ghc --make rather than ghc -c, there was a lot more logging output. Sure, the ending trace is the same, but if there was something important in the earlier trace, it would have been that much harder to dig out.
3. If you can trust your traces, debugging is easier. If I had trusted the trace output, I could have found the bug a lot more quickly. But I didn't, and instead spent a bunch of time making sure the code was behaving in the way I expected it to. On the plus side, I understand the codepath here a lot better than I used to.

How can GHC make debugging these types of bugs easier? Have your own laziness-related debugging story? I would love to hear what you think.

# My Favorite NP-Complete Problem at !!Con 2016

Back in 2006 when this blog was new I observed that the problem of planning Elmo’s World video releases was NP-complete.

This spring I turned the post into a talk, which I gave at !!Con 2016 last week.

## May 11, 2016

### Jasper Van der Jeugt

In Haskell, it is extremely easy for the programmer to add a quick datatype. It does not have to take more than a few lines. This is useful to add auxiliary, ad-hoc datatypes.

I don’t think this is used enough. Most libraries and code I see use “heavier” datatypes: canonical and very well-thought-out datatypes, followed by dozens of instances and related functions. These are of course great to work with, but it doesn’t have to be a restriction: adding quick datatypes – without all these instances and auxiliary functions – often makes code easier to read.

The key idea is that, in order to make code as simple as possible, you want to represent your data as simply as possible. However, the definition of “simple” is not the same in every context. Sometimes, looking at your data from another perspective makes specific tasks easier. In those cases, introducing “quick-and-dirty” datatypes often makes code cleaner.

This blogpost is written in literate Haskell so you should be able to just load it up in GHCi and play around with it. You can find the raw .lhs file here.

> import Control.Monad (forM_)

Let’s look at a quick example. Here, we have a definition of a shopping cart in a fruit store.

> data Fruit = Banana | Apple | Orange
>     deriving (Show, Eq)
>
> type Cart = [(Fruit, Int)]

And we have a few functions to inspect it.

> cartIsHomogeneous :: Cart -> Bool
> cartIsHomogeneous []                = True
> cartIsHomogeneous ((fruit, _) : xs) = all ((== fruit) . fst) xs
> cartTotalItems :: Cart -> Int
> cartTotalItems = sum . map snd

This is very much like code you typically see in Haskell codebases (of course, with more complex datatypes than this simple example).
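If you want to poke at these helpers outside of the literate source, here is a standalone copy (duplicating the definitions above) with a quick sanity check:

```haskell
-- Standalone copy of the post's definitions, for experimentation.
data Fruit = Banana | Apple | Orange
    deriving (Show, Eq)

type Cart = [(Fruit, Int)]

-- A cart is homogeneous when every entry is the same fruit.
cartIsHomogeneous :: Cart -> Bool
cartIsHomogeneous []                = True
cartIsHomogeneous ((fruit, _) : xs) = all ((== fruit) . fst) xs

-- Total item count, summing the per-fruit quantities.
cartTotalItems :: Cart -> Int
cartTotalItems = sum . map snd

main :: IO ()
main = do
    print (cartIsHomogeneous [(Banana, 2), (Banana, 1)])  -- True
    print (cartIsHomogeneous [(Banana, 2), (Apple, 1)])   -- False
    print (cartTotalItems    [(Banana, 2), (Apple, 1)])   -- 3
```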

The last function we want to add is printing a cart. The exact way we format it depends on what is in the cart. There are four possible scenarios.

1. The cart is empty.
2. There is a single item in the customer's cart and we have some sort of simplified checkout.
3. The customer buys three or more of the same fruit (and nothing else). In that case we give out a bonus.
4. None of the above.

This is clearly a good candidate for Haskell’s case statement and guards. Let’s try that.

> printCart :: Cart -> IO ()
> printCart cart = case cart of
>     []           -> putStrLn $ "Your cart is empty"
>     [(fruit, 1)] -> putStrLn $ "You are buying one " ++ show fruit
>     _ | cartIsHomogeneous cart && cartTotalItems cart >= 3 -> do
>               putStrLn $
>                   show (cartTotalItems cart) ++
>                   " " ++ show (fst $ head cart) ++ "s" ++
>                   " for you!"
>               putStrLn "BONUS GET!"
>       | otherwise -> do
>           putStrLn $ "Your cart: "
>           forM_ cart $ \(fruit, num) ->
>               putStrLn $ "- " ++ show num ++ " " ++ show fruit

This is not very nice. The business logic is interspersed with the printing code. We could clean it up by adding additional predicates such as cartIsBonus, but having too many of these predicates leads to a certain kind of Boolean Blindness. Instead, it seems much nicer to introduce a temporary type:

> data CartView
>     = EmptyCart
>     | SingleCart  Fruit
>     | BonusCart   Fruit Int
>     | GeneralCart Cart

This allows us to decompose our printCart into two clean parts: classifying the cart, and printing it.

> cartView :: Cart -> CartView
> cartView []           = EmptyCart
> cartView [(fruit, 1)] = SingleCart fruit
> cartView cart
>     | cartIsHomogeneous cart && cartTotalItems cart >= 3 =
>         BonusCart (fst $ head cart) (cartTotalItems cart)
>     | otherwise = GeneralCart cart

Note that we now have a single location where we classify the cart. This is useful if we need this information in multiple places. If we had chosen to solve the problem by adding additional predicates such as cartIsBonus instead, we would have to make sure that we check these predicates in the same order everywhere. Furthermore, if we need to add a case, we can simply add a constructor to this datatype, and the compiler will tell us where we need to update our code¹.

Our printCart becomes very simple now:

> printCart2 :: Cart -> IO ()
> printCart2 cart = case cartView cart of
>     EmptyCart          -> putStrLn $ "Your cart is empty"
>     SingleCart fruit   -> putStrLn $ "You are buying one " ++ show fruit
>     BonusCart  fruit n -> do
>         putStrLn $ show n ++ " " ++ show fruit ++ "s for you!"
>         putStrLn "BONUS GET!"
>     GeneralCart items  -> do
>         putStrLn $ "Your cart: "
>         forM_ items $ \(fruit, num) ->
>             putStrLn $ "- " ++ show num ++ " " ++ show fruit

Of course, it goes without saying that ad-hoc datatypes that are only used locally should not be exported from the module – otherwise you end up with a mess again.

1. If you are compiling with -Wall, which is what you really, really should be doing.

# Doctoral Thesis Published

I have officially published my doctoral thesis “Lazy Evaluation: From natural semantics to a machine-checked compiler transformation” (DOI: 10.5445/IR/1000054251). The abstract of the 226-page document that earned me a “summa cum laude” reads

In order to solve a long-standing problem with list fusion, a new compiler transformation, 'Call Arity', is developed and implemented in the Haskell compiler GHC. It is formally proven to not degrade program performance; the proof is machine-checked using the interactive theorem prover Isabelle. To that end, a formalization of Launchbury's Natural Semantics for Lazy Evaluation is modelled in Isabelle, including a correctness and adequacy proof.

and I assembled all relevant artefacts (the thesis itself, its LaTeX-sources, the Isabelle theories, various patches against GHC, raw benchmark results, errata etc.) at http://www.joachim-breitner.de/thesis/.

Other, less retrospective news: My paper on the Incredible Proof Machine got accepted at ITP in Nancy, and I was invited to give a keynote demo about the proof machine at LFMTP in Porto. Exciting!

# TL;DR:

• use --autounmask=n
• use --backtrack=1000 (or more)
• package.mask all outdated packages that cause conflicts (usually requires more iterations)
• run world update
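Put together, the TL;DR amounts to an invocation along these lines (a sketch; the exact flag values and package.mask contents depend on your system):

```
# /etc/portage/package.mask — mask outdated packages that cause conflicts, e.g.:
#   <x11-libs/gtk+-3.20.0:3

emerge --ask --update --deep --newuse \
       --autounmask=n --backtrack=1000 @world
```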

# The problem

Occasionally (more frequently on Haskell packages) portage starts taking a long time only to tell you it was not able to figure out the upgrade path.

Some people suggest a "wipe blockers and reinstall" solution. This post will try to explain how to actually upgrade (or find out why it's not possible) without destroying your system.

# Real-world example

I’ll take a real-world example from Gentoo’s bugzilla: bug 579520, where the original upgrade error looked like this:

!!! Multiple package instances within a single package slot have been pulled
!!! into the dependency graph, resulting in a slot conflict:

x11-libs/gtk+:3

(x11-libs/gtk+-3.18.7:3/3::gentoo, ebuild scheduled for merge) pulled in by
(no parents that aren't satisfied by other packages in this slot)

(x11-libs/gtk+-3.20.0:3/3::gnome, installed) pulled in by
>=x11-libs/gtk+-3.19.12:3[introspection?] required by (gnome-base/nautilus-3.20.0:0/0::gnome, installed)
^^              ^^^^^^^^^
>=x11-libs/gtk+-3.20.0:3[cups?] required by (gnome-base/gnome-core-libs-3.20.0:3.0/3.0::gnome, installed)
^^              ^^^^^^^^
>=x11-libs/gtk+-3.19.4:3[introspection?] required by (media-video/totem-3.20.0:0/0::gnome, ebuild scheduled for merge)
^^              ^^^^^^^^
>=x11-libs/gtk+-3.19.0:3[introspection?] required by (app-editors/gedit-3.20.0:0/0::gnome, ebuild scheduled for merge)
^^              ^^^^^^^^
>=x11-libs/gtk+-3.19.5:3 required by (gnome-base/dconf-editor-3.20.0:0/0::gnome, ebuild scheduled for merge)
^^              ^^^^^^^^
>=x11-libs/gtk+-3.19.6:3[introspection?] required by (x11-libs/gtksourceview-3.20.0:3.0/3::gnome, ebuild scheduled for merge)
^^              ^^^^^^^^
>=x11-libs/gtk+-3.19.3:3[introspection,X] required by (media-gfx/eog-3.20.0:1/1::gnome, ebuild scheduled for merge)
^^              ^^^^^^^^
>=x11-libs/gtk+-3.19.8:3[X,introspection?] required by (x11-wm/mutter-3.20.0:0/0::gnome, installed)
^^              ^^^^^^^^
>=x11-libs/gtk+-3.19.12:3[X,wayland?] required by (gnome-base/gnome-control-center-3.20.0:2/2::gnome, installed)
^^              ^^^^^^^^^
>=x11-libs/gtk+-3.19.11:3[introspection?] required by (app-text/gspell-1.0.0:0/0::gnome, ebuild scheduled for merge)
^^              ^^^^^^^^^

x11-base/xorg-server:0

(x11-base/xorg-server-1.18.3:0/1.18.3::gentoo, installed) pulled in by
x11-base/xorg-server:0/1.18.3= required by (x11-drivers/xf86-video-nouveau-1.0.12:0/0::gentoo, installed)
^^^^^^^^^^
x11-base/xorg-server:0/1.18.3= required by (x11-drivers/xf86-video-fbdev-0.4.4:0/0::gentoo, installed)
^^^^^^^^^^
x11-base/xorg-server:0/1.18.3= required by (x11-drivers/xf86-input-evdev-2.10.1:0/0::gentoo, installed)
^^^^^^^^^^

(x11-base/xorg-server-1.18.2:0/1.18.2::gentoo, ebuild scheduled for merge) pulled in by
x11-base/xorg-server:0/1.18.2= required by (x11-drivers/xf86-video-vesa-2.3.4:0/0::gentoo, installed)
^^^^^^^^^^
x11-base/xorg-server:0/1.18.2= required by (x11-drivers/xf86-input-synaptics-1.8.2:0/0::gentoo, installed)
^^^^^^^^^^
x11-base/xorg-server:0/1.18.2= required by (x11-drivers/xf86-input-mouse-1.9.1:0/0::gentoo, installed)
^^^^^^^^^^

app-text/poppler:0

(app-text/poppler-0.32.0:0/51::gentoo, ebuild scheduled for merge) pulled in by
^^^^^^
>=app-text/poppler-0.16:0/51=[cxx] required by (app-office/libreoffice-5.0.5.2:0/0::gentoo, installed)
^^^^^^
>=app-text/poppler-0.12.3-r3:0/51= required by (app-text/texlive-core-2014-r4:0/0::gentoo, installed)
^^^^^^

(app-text/poppler-0.42.0:0/59::gentoo, ebuild scheduled for merge) pulled in by
>=app-text/poppler-0.33[cairo] required by (app-text/evince-3.20.0:0/evd3.4-evv3.3::gnome, ebuild scheduled for merge)
^^                 ^^^^

net-fs/samba:0

(net-fs/samba-4.2.9:0/0::gentoo, installed) pulled in by
(no parents that aren't satisfied by other packages in this slot)

(net-fs/samba-3.6.25:0/0::gentoo, ebuild scheduled for merge) pulled in by
net-fs/samba[smbclient] required by (media-sound/xmms2-0.8-r2:0/0::gentoo, ebuild scheduled for merge)
^^^^^^^^^

It may be possible to solve this problem by using package.mask to
prevent one of those packages from being selected. However, it is also
possible that conflicting dependencies exist such that they are
impossible to satisfy simultaneously.  If such a conflict exists in
the dependencies of two different packages, then those packages can
not be installed simultaneously.

page or refer to the Gentoo Handbook.

emerge: there are no ebuilds to satisfy ">=dev-libs/boost-1.55:0/1.57.0=".
(dependency required by "app-office/libreoffice-5.0.5.2::gentoo" [installed])
(dependency required by "@selected" [set])
(dependency required by "@world" [argument])

A lot of text! Let’s trim it down to the essential details first (AKA how to actually read it). I’ve dropped the "cause" of the conflicts from the previous listing and left only the problematic packages:

!!! Multiple package instances within a single package slot have been pulled
!!! into the dependency graph, resulting in a slot conflict:

x11-libs/gtk+:3
(x11-libs/gtk+-3.18.7:3/3::gentoo, ebuild scheduled for merge) pulled in by
(x11-libs/gtk+-3.20.0:3/3::gnome, installed) pulled in by

x11-base/xorg-server:0
(x11-base/xorg-server-1.18.3:0/1.18.3::gentoo, installed) pulled in by
(x11-base/xorg-server-1.18.2:0/1.18.2::gentoo, ebuild scheduled for merge) pulled in by

app-text/poppler:0
(app-text/poppler-0.32.0:0/51::gentoo, ebuild scheduled for merge) pulled in by
(app-text/poppler-0.42.0:0/59::gentoo, ebuild scheduled for merge) pulled in by

net-fs/samba:0
(net-fs/samba-4.2.9:0/0::gentoo, installed) pulled in by
(net-fs/samba-3.6.25:0/0::gentoo, ebuild scheduled for merge) pulled in by

emerge: there are no ebuilds to satisfy ">=dev-libs/boost-1.55:0/1.57.0=".

That is more manageable. We have 4 "conflicts" here and one "missing" package.

Note: all the listed requirements under "conflicts" (the previous listing) suggest these are >= dependencies. Thus the dependencies themselves can’t block the upgrade path, and the error message is misleading.

For us it means that portage keeps the old versions of gtk+ (and others) around for some unknown reason.

To get an idea of how to explore this situation we need to somehow hide the outdated packages from portage and retry the update. That is the same as uninstalling and reinstalling a package, but without actually doing it.

package.mask does exactly that. To make the package hidden for real we will also need to disable autounmask: --autounmask=n (the default is y).

Let’s hide outdated x11-libs/gtk+-3.18.7 package from portage:

# /etc/portage/package.mask
<x11-libs/gtk+-3.20.0:3

The blocker list became shorter but still does not make sense:

x11-base/xorg-server:0
(x11-base/xorg-server-1.18.2:0/1.18.2::gentoo, ebuild scheduled for merge) pulled in by
(x11-base/xorg-server-1.18.3:0/1.18.3::gentoo, installed) pulled in by
^^^^^^^^^^

app-text/poppler:0
(app-text/poppler-0.32.0:0/51::gentoo, ebuild scheduled for merge) pulled in by
(app-text/poppler-0.42.0:0/59::gentoo, ebuild scheduled for merge) pulled in by

Blocking more old stuff:

# /etc/portage/package.mask
<x11-libs/gtk+-3.20.0:3
<x11-base/xorg-server-1.18.3
<app-text/poppler-0.42.0

The output is:

[blocks B      ] <dev-util/gdbus-codegen-2.48.0 ("<dev-util/gdbus-codegen-2.48.0" is blocking dev-libs/glib-2.48.0)

* Error: The above package list contains packages which cannot be
* installed at the same time on the same system.

(dev-libs/glib-2.48.0:2/2::gentoo, ebuild scheduled for merge) pulled in by

(dev-util/gdbus-codegen-2.46.2:0/0::gentoo, ebuild scheduled for merge) pulled in by

That’s our blocker! The stable <dev-util/gdbus-codegen-2.48.0 blocks the unstable dev-libs/glib-2.48.0.

The solution is to ~arch keyword dev-util/gdbus-codegen-2.48.0:

# /etc/portage/package.accept_keywords
dev-util/gdbus-codegen

And run world update.

Simple!

# Constructivist Motto

I don’t believe classical logic is false; I just believe that it is not true.

# Reminder: Summer of Haskell Proposals Due Today

For anyone planning to participate in Summer of Haskell this summer, remember that proposals are due today.  Good luck to all!

## May 05, 2016

### Douglas M. Auclair (geophf)

• April 15th, 2016:
(\(intensf, list) -> map (second intensf) list) (insensifierFn, assocList)
Point-free-itize this lambda
• obadz @obadzz: uncurry $ map . second
• April 15th, 2016:
foldr (\card -> let c = color card in mappend c *** ((idx card, c):)) (mempty, []) cards
Point-free-itize and de-let the lambda
• Gautier DI FOLCO @gautier_difolco: foldr (uncurry (***) . (uncurry fmap . fmap (((,) =<< mappend) . color) . ((,) =<< ((:) .) . (,) . idx))) (mempty, []) cards
• me: foldr (uncurry (***) . (mappend . snd &&& (:)) . (idx &&& color)) (mempty, [])
• April 15th, 2016:
map (\(idx, color) -> C idx (cell bb idx) color) assoclist
point-free-itize the lambda in the map function.
• Eyal Lotem @EyalL: uncurry (C <*> cell bb)? Not in front of a type checker, can't check myself :)
• April 13th, 2016: point-free-itize the lambda term in:
uncurry (foldr (\a -> min a *** max a)) . ((head &&& head) &&& id)
• obadz @obadzz: liftA2 (***) min max
• April 12th, 2016:
minmax :: Ord a => [a] -> (a, a)
minmax [1..9] = (1,9) in linear time.
minmax = minimum &&& maximum fails as it's 2x too slow.
• April 12th, 2016: You have (a,b) (c,d). You need (a*c, b*d). Well, we don't have all day! (Actually, we do.) Get to it! Pointless; I MEAN POINT-FREE pls
• lotz@Haskell㌠ @lotz84_: ((uncurry . flip . curry $ id).) . ((uncurry.) . uncurry . (flip.) . flip . flip) ((((flip . (flip.)) $ ((,).) . (*)).) . (*))
• Olivier Iffrig @oiffrig: Point-free but star-full: uncurry (***) . ((*) *** (*))
• bazzargh @bazzargh: curry ((uncurry (*)) *** (uncurry (*)))
• bazzargh @bazzargh: knew there had to be a bimap answer, eventually got: curry ((bimap <$> id <*> id) (uncurry (*)))
• Андреев Кирилл @nonaem00: (((product *** product) . unzip) .) . (flip (:)) . (:[]) ... I clearly need more coffee this morning.
• April 10th, 2016:

data BBx = Bx (Int, Int) Pt2D Pt2D

define:
mkBB :: [a] -> Pt2D -> BBx
mkBB xs extents = Bx (0, length xs) (0,0) extents
points-free
• Gautier DI FOLCO @gautier_difolco
{-# LANGUAGE TupleSections #-}
data BBx = Bx (Int, Int) Pt2D Pt2D
mkBB :: [a] -> Pt2D -> BBx
mkBB = flip Bx (0,0) . (0, ) . length
• Thomas D @tthomasdd mkBB = flip Bx (0,0) . (,) 0 . length
• April 7th, 2016:
type MList a n = [(a, n)]
partSz :: Ord n => n -> MList a n -> (MList a n, MList a n)
partSz µ = partition ((µ >) . snd)
Define points-free
• partSz = partition . (. snd) . (>)  -- via @gautier_difolco

# The Mistake Everyone Makes with KnockoutJS

### Introduction

Knockout is a nice JavaScript library for making values that automatically update when any of their “dependencies” update. Those dependencies can form an arbitrary directed acyclic graph. Many people seem to think of it as “yet another” templating library, but its core idea, observable values, is useful far beyond templating. One nice aspect is that Knockout is a library, not a framework, so you can use it as little or as much as you want, and you can integrate it with other libraries and frameworks.

At any rate, this article is more geared toward those who have already decided on using Knockout or a library (in any language) offering similar capabilities. I strongly suspect the issues and solutions I’ll discuss apply to all similar sorts of libraries. While I’ll focus on one particular example, the ideas behind it apply generally. This example, admittedly, is one that almost anyone will implement, and in my experience will do it incorrectly the first time and won’t realize the problem until later.

### The Problem

When doing any front-end work, before long there will be a requirement to support “multi-select” of something. Of course, you want the standard select/deselect all functionality and for it to work correctly, and of course you want to do something with the items you’ve selected. Here’s a very simple example:

(The live demo here showed a “Number selected” count above a table of items, with a per-item checkbox on each row and a select-all checkbox in the header.)

Here, the number selected is an overly simple example of using the selected items. More realistically, the selected items will trigger other items to show up and/or trigger AJAX requests to update the data or populate other data. The HTML for this example is completely straightforward.

<div id="#badExample">
Number selected: <span data-bind="text: $data.numberSelected()"></span> <table> <tr><th><input type="checkbox" data-bind="checked:$data.allSelected"/></th><th>Item</th></tr>
<!-- ko foreach: { data: $data.items(), as: '$item' } -->
<tr><td><input type="checkbox" data-bind="checked: $data.selected"/></td><td data-bind="text: 'Item number: '+$data.body"></td></tr>
<!-- /ko -->
<tr><td><button data-bind="click: function() { $data.add(); }">Add</button></td></tr> </table> </div> The way nearly everyone (including me) first thinks to implement this is by adding a selected observable to each item and then having allSelected depend on all of the selected observables. Since we also want to write to allSelected to change the state of the selected observables we use a writable computed observable. This computed observable will loop through all the items and check to see if they are all set to determine it’s state. When it is updated, it will loop through all the selected observables and set them to the appropriate state. Here’s the full code listing. var badViewModel = { counter: 0, items: ko.observableArray() }; badViewModel.allSelected = ko.computed({ read: function() { var items = badViewModel.items(); var allSelected = true; for(var i = 0; i < items.length; i++) { // Need to make sure we depend on each item, so don't break out of loop early allSelected = allSelected && items[i].selected(); } return allSelected; }, write: function(newValue) { var items = badViewModel.items(); for(var i = 0; i < items.length; i++) { items[i].selected(newValue); } } }); badViewModel.numberSelected = ko.computed(function() { var count = 0; var items = badViewModel.items(); for(var i = 0; i < items.length; i++) { if(items[i].selected()) count++; } return count; }); badViewModel.add = function() { badViewModel.items.push({ body: badViewModel.counter++, selected: ko.observable(false) }); }; ko.applyBindings(badViewModel, document.getElementById('#badExample')); This should be relatively straightforward, and it works, so what’s the problem? The problem can be seen in numberSelected (and it also comes up with allSelected which I’ll get to momentarily). numberSelected depends on each selected observable and so it will be fired each time each one updates. That means if you have 100 items, and you use the select all checkbox, numberSelected will be called 100 times. 
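To see the shape of the problem without Knockout itself, here is a minimal sketch using a hypothetical observable helper (a toy stand-in for dependency tracking, not Knockout's implementation): a dependent subscribed to each per-item flag fires once per write.

```javascript
// Toy observable (hypothetical helper, not Knockout's API):
// calling with no arguments reads the value, calling with one writes it
// and notifies every subscriber.
function observable(initial) {
    var value = initial;
    var subscribers = [];
    function obs(newValue) {
        if (arguments.length === 0) { return value; }
        value = newValue;
        subscribers.forEach(function (fn) { fn(value); });
    }
    obs.subscribe = function (fn) { subscribers.push(fn); };
    return obs;
}

// Five per-item "selected" flags, and one dependent that recomputes on each change.
var selected = [];
for (var i = 0; i < 5; i++) { selected.push(observable(false)); }

var recomputations = 0;
selected.forEach(function (flag) {
    flag.subscribe(function () { recomputations++; });
});

// "Select all" writes each flag individually, so the dependent fires once per item.
selected.forEach(function (flag) { flag(true); });
console.log(recomputations); // 5 -- one recomputation per item, not one in total
```

With 100 items the dependent would fire 100 times for a single select-all click, which is exactly the behavior described above.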
For this example, that doesn't really matter. For a more realistic dependent than numberSelected, it may mean rendering one, then two, then three, … then 100 HTML fragments, or making 100 AJAX requests. In fact, this same behavior is present in allSelected: when it is written, as it writes to the selected observables, it also triggers itself. So the problem is that updating allSelected or numberSelected can't be done all at once, or, to use database terminology, can't be done atomically. One possible solution in newer versions of Knockout is to use deferredUpdates or, what I did back in much earlier versions of Knockout, to abuse the rate-limiting features. The problem with that solution is that it makes updates asynchronous. If you've written your code not to care whether it is called synchronously or asynchronously, this will work fine. If you haven't, it throws you into a world of shared-state concurrency and race conditions. In that case, the cure is far worse than the disease.

### The Solution

So, what's the alternative? We want to update all selected items atomically; we can atomically update a single observable; so we'll put all selected items into a single observable. Now an item determines whether it is selected by checking whether it is in the collection of selected items. More abstractly, we make our observables more coarse-grained: we have a bunch of small computed observables depend on one large observable, instead of one large computed observable depending on a bunch of small observables as in the previous code. Here's an example using the exact same HTML and presenting the same overt behavior.
And here's the code behind this second example:

var goodViewModel = {
    counter: 0,
    selectedItems: ko.observableArray(),
    items: ko.observableArray()
};

goodViewModel.allSelected = ko.computed({
    read: function() {
        return goodViewModel.items().length === goodViewModel.selectedItems().length;
    },
    write: function(newValue) {
        if(newValue) {
            goodViewModel.selectedItems(goodViewModel.items().slice(0)); // Need a copy!
        } else {
            goodViewModel.selectedItems.removeAll();
        }
    }
});

goodViewModel.numberSelected = ko.computed(function() {
    return goodViewModel.selectedItems().length;
});

goodViewModel.add = function() {
    var item = { body: goodViewModel.counter++ };
    item.selected = ko.computed({
        read: function() {
            return goodViewModel.selectedItems.indexOf(item) > -1;
        },
        write: function(newValue) {
            if(newValue) {
                goodViewModel.selectedItems.push(item);
            } else {
                goodViewModel.selectedItems.remove(item);
            }
        }
    });
    goodViewModel.items.push(item);
};

ko.applyBindings(goodViewModel, document.getElementById('#goodExample'));

One thing to note is that setting allSelected and numberSelected are now both simple operations. A write to an observable only triggers a constant number of writes to other observables. In fact, there are only two (non-computed) observables. On the other hand, reading the selected observable is more expensive. Toggling all items has quadratic complexity. (In fact, it had quadratic complexity before, due to the feedback.) Unlike the previous code, though, this version also has quadratic complexity when an individual item is toggled. That, however, is simply due to a poor choice of data structure: equipping each item with an “ID” field and using an object as a hash map would reduce the complexity to linear. In practice, for this sort of scenario, it tends not to make a big difference. Also, Knockout won't trigger dependents if the value doesn't change, so there's no risk of the extra work propagating into still more extra work.
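As a sketch of that data-structure change (the id field and the helpers below are hypothetical, not part of the article's code), keeping the selected set in an object keyed by item id makes each membership check a constant-time lookup instead of an O(n) indexOf scan:

```javascript
// Hypothetical id-keyed selection set: membership is a property lookup, O(1).
var selectedById = {};

function select(item)     { selectedById[item.id] = true; }
function deselect(item)   { delete selectedById[item.id]; }
function isSelected(item) { return selectedById[item.id] === true; }

var a = { id: 1, body: 'first' };
var b = { id: 2, body: 'second' };
select(a);
console.log(isSelected(a)); // true
console.log(isSelected(b)); // false
```

Toggling all n items then costs O(n) total rather than n separate indexOf scans.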
Nevertheless, while I endorse this solution for this particular problem, in general making finer-grained observables can help limit the scope of changes so unnecessary work isn't done. Still, the real concern and benefit of this latter approach isn't the asymptotic complexity of the operations but their atomicity. In the second solution, every update is atomic: there are no intermediate states on the way to a final state. This means that dependents, represented here by numberSelected but realistically much more complicated, don't get triggered excessively and don't need to “compensate” for unintended intermediate values.

### Epilogue

We could take the coarse-graining to its logical conclusion and have the view model for an application be a single observable holding an object representing the entire view model (and containing no observables of its own). Taking this approach actually has a lot of benefits, albeit there is little reason to use Knockout at that point. Instead, this starts to lead to things like Facebook's Flux pattern and the pattern perhaps most clearly articulated by Cycle JS.
<script lang="text/javascript" src="http://ajax.aspnetcdn.com/ajax/knockout/knockout-3.3.0.js"></script> <script lang="text/javascript"> // Bad View Model var badViewModel = { counter: 0, items: ko.observableArray() }; badViewModel.allSelected = ko.computed({ read: function() { var items = badViewModel.items(); var allSelected = true; for(var i = 0; i < items.length; i++) { // Need to make sure we depend on each item, so don't break out of loop early allSelected = allSelected && items[i].selected(); } return allSelected; }, write: function(newValue) { var items = badViewModel.items(); for(var i = 0; i < items.length; i++) { items[i].selected(newValue); } } }); badViewModel.numberSelected = ko.computed(function() { var count = 0; var items = badViewModel.items(); for(var i = 0; i < items.length; i++) { if(items[i].selected()) count++; } return count; }); badViewModel.add = function() { badViewModel.items.push({ body: badViewModel.counter++, selected: ko.observable(false) }); }; // Good View Model var goodViewModel = { counter: 0, selectedItems: ko.observableArray(), items: ko.observableArray() }; goodViewModel.allSelected = ko.computed({ read: function() { return goodViewModel.items().length === goodViewModel.selectedItems().length; }, write: function(newValue) { if(newValue) { goodViewModel.selectedItems(goodViewModel.items().slice(0)); // Need a copy! 
} else { goodViewModel.selectedItems.removeAll(); } } }); goodViewModel.numberSelected = ko.computed(function() { return goodViewModel.selectedItems().length; }); goodViewModel.add = function() { var item = { body: goodViewModel.counter++ } item.selected = ko.computed({ read: function() { return goodViewModel.selectedItems.indexOf(item) > -1; }, write: function(newValue) { if(newValue) { goodViewModel.selectedItems.push(item); } else { goodViewModel.selectedItems.remove(item); } } }); goodViewModel.items.push(item); }; ko.applyBindings(badViewModel, document.getElementById('#badExample')); ko.applyBindings(goodViewModel, document.getElementById('#goodExample')); </script> ### Disciple/DDC # DDC 0.4.2 -- the "almost useful" release DDC 0.4.2 was pushed to Hackage a few days ago. Running "cabal update; cabal install ddc-tools" should build it. You'll need a version of the LLVM tools in your path. LLVM 3.4 - 3.8 (the current release) should work. Code generation and runtime system support for higher order functions is finished. The type checker also now automatically inserts the run and box casts which were mentioned a few posts ago. Release notes are on the github page. The programs we can compile are starting to look "almost useful". For example, here is some code from the AlmostPrimes example, which I adapted from Rosetta Code: -- | A stream of k-almost-prime numbers.kPrimes (k: Nat#): Stream Nat# Nat# = sfilter (isKPrime k) (senumFrom 2) -- | Check if some number is k-almost-prime.isKPrime (k n: Nat#): Bool# | k == 1 = isPrime n | otherwise = sany$ smap       (isKPrime (k - 1))        $smap fst$ sfilter    (eq 0 ∘ snd)        $smap (divMod n)$ stakeWhile (λx: Nat#. x < n)        $primes 2main (_: Unit): S Console Unit = do forS (enumFromTo 1 5) (λk: Nat#. do write$ "k = " % showNat k % ": "                forS (listOfStream $stake 10$ kPrimes k)                        (λx: Nat#. 
                         write $ showNat x % " ")
                write "\n")

The full example code is here.

Note that Nat# is the type of a primitive natural number. The # just means "primitive", which is not the same as "unboxed" as per GHC. The hashes will go away once I implement type synonyms and can define a nicer synonym.

The example invokes some standard stream combinators: smap, sfilter and so on. The forS function is like mapM, but using the Disciple effect system instead of monads. The type of forS is:

forS : List a -> (a -> S e Unit) -> S e Unit

S e Unit is the type of a suspended computation with effect e which returns a value of type Unit when evaluated.

I'm calling this the "almost useful" release because we still need to finish the garbage collector, as well as desugaring for nested patterns, before I could imagine writing anything substantial in it. Late last year Jacob Stanley was working on the transform to insert GC slot handling into core programs, but the runtime support still needs to be implemented. Current work is to finish support for type synonyms, then I'll do either the GC or nested patterns, depending on which looks like the lowest-hanging fruit.

### FP Complete

# Stack Security GnuPG Keys

# Introduction

We are introducing a method of verifying Haskell packages with OpenPGP signatures. The initial code to sign your packages has been included in Stack as an experimental feature for some time. We will be improving it and adding verification options soon. However, we need signatures from package authors before verification is useful.

The first step in submitting package signatures is to create a secure OpenPGP key for yourself and publish it. This post will walk you through creating an offline master OpenPGP key-set with GnuPG. If you've never used GPG before and need to generate a secure set of keys, then continue reading. If you already have GPG keys that you are happy with, continue to 'Signing Packages with Stack' towards the bottom of this article.
# GPG Keys

## Choosing a version of GPG

There are 3 main flavors of GnuPG:

• 1.4 (AKA "gpg") - Old - Super stable (11 years old) & limited features
• 2.0 (AKA "gpg2") - Stable - Most people use this for personal encryption
• 2.1 (AKA "gpg2") - Modern - Latest features: ECC support, Tor, TOFU, etc

The GnuPG project recommends using the stable GPG 2 release (2.0). I've included instructions for using GPG 1.4, but I don't recommend using such an old version unless you have no choice. I've also included instructions for GnuPG 2.1 because I use it personally and find the new features useful.

• Windows - https://www.gpg4win.org/download.html
• OS X
  • GPG 2.0 is available as part of GPG Suite, which is an easy download from https://gpgtools.org (recommended). (Bonus: GPG Suite adds GPG e-mail for the Apple Mail app.) (Note: If you install GPG Suite, click "Cancel" when prompted to generate a key through their GUI. We'll go through this step below at the command line.)
  • GnuPG 2.0 is available through Homebrew as well if you prefer.
• Linux/BSD - GnuPG 2.x is available as a package for your favorite Linux flavor. (Bonus: Add Enigmail while you are at it for easy GPG e-mail.)

## Offline machine for GPG key management

Ideally your offline keys will never be on a machine connected to the outside world. The offline box will be the place where you keep your master key, manage your keys & sign other keys. This assures that you can always revoke keys and recover if your online machine(s) are somehow compromised.

To have an offline box you'll need two "machines". These can just be the same machine if you utilize a "live" Linux USB/DVD boot drive and a USB stick for the "sneaker net". We'll follow the same procedure to generate your keys even if you don't want to use an offline machine to generate them. It's highly recommended to use an offline master, though, and to keep secure backups.
It's the least hassle for everyone else in your "web of trust" in the event that keys are compromised. With an offline master key, if your laptop is compromised, you can just revoke the compromised sub-keys, create new sub-keys and publish them. If your master key is compromised, you will have to revoke all your keys, approach people about your new keys & go through the process of reestablishing trust with people in your web-of-trust.

## Configuration

Set up the $GNUPGHOME directory:

umask 077
export GNUPGHOME=$HOME/.gnupg  # default location but env var rules
mkdir -p $GNUPGHOME

Install the secure key-server certificate (Ubuntu/Debian):

mkdir -p /usr/local/share/ca-certificates/
curl -s https://sks-keyservers.net/sks-keyservers.netCA.pem \
| sudo tee /usr/local/share/ca-certificates/sks-keyservers.netCA.crt
sudo update-ca-certificates

Put some or all of the following in your $GNUPGHOME/gpg.conf file (taken from the riseup.net best GPG practices page):

no-emit-version
no-comments
keyid-format 0xlong
with-fingerprint
list-options show-uid-validity
verify-options show-uid-validity
use-agent
keyserver hkps://hkps.pool.sks-keyservers.net
# or less secure: keyserver hkp://pool.sks-keyservers.net
keyserver-options no-honor-keyserver-url
keyserver-options include-revoked
personal-cipher-preferences AES256 AES192 AES CAST5
personal-digest-preferences SHA512 SHA384 SHA256 SHA224
cert-digest-algo SHA512
default-preference-list SHA512 SHA384 SHA256 SHA224 AES256 AES192 AES CAST5 ZLIB BZIP2 ZIP Uncompressed

## Key Generation

### Master Key Public and Secret

1. Generate Key

GPG 1.4

gpg --expert --gen-key

GPG 2.0

gpg2 --expert --gen-key

GPG 2.1

gpg2 --expert --full-gen-key

• Pick the "RSA" and "set your own capabilities" option
• Turn off everything until it just says "Certify" then continue
• Select the bit count. I pick 4000 for RSA because it's big but it's also not a power of 2 (like everyone else picks.)
• Select the expiration period. I picked 1 year.
• Pick a good pass-phrase that you can remember but would take a computer a long time to guess.

2. Backup Your Keys

GPG 1.4

gpg --armor --export > $GNUPGHOME/public.asc
gpg --armor --export-secret-key > $GNUPGHOME/secret.asc

GPG 2.0 or 2.1

gpg2 --armor --export > $GNUPGHOME/public.asc
gpg2 --armor --export-secret-key > $GNUPGHOME/secret.asc

3. Try Recovering Your Keys

Delete your public and secret key and re-import them.

GPG 1.4

gpg --delete-secret-key <KEYID>
gpg --delete-key <KEYID>
gpg --import $GNUPGHOME/public.asc $GNUPGHOME/secret.asc
gpg --expert --edit-key <KEYID>

GPG 2.0 & 2.1

gpg2 --delete-secret-key <KEYID>
gpg2 --delete-key <KEYID>
gpg2 --import $GNUPGHOME/public.asc $GNUPGHOME/secret.asc
gpg2 --expert --edit-key <KEYID>

• Type trust [Return] to give yourself "ultimate" trust.
• Type save [Return] to save your changes to the key.

### Sub-keys

1. Generate Keys

Start by editing your key again.

GPG 1.4

gpg --expert --edit-key <KEYID>

GPG 2.0 or 2.1

gpg2 --expert --edit-key <KEYID>

Create a Signing Sub-key:

• Type addkey [Return]
• Pick the "RSA" and "set your own capabilities" option
• Turn off everything until it just says "Sign" then continue

Create an Encryption Sub-key:

• Type addkey [Return]
• Pick the "RSA" and "set your own capabilities" option
• Turn off everything until it just says "Encrypt" then continue

Create an Authentication Sub-key:

• Type addkey [Return]
• Pick the "RSA" and "set your own capabilities" option
• Add the "Authenticate" capability then continue
• Turn off everything until it just says "Authenticate" then continue
• Type save [Return] to save and exit

Now try encrypting and signing stuff with your keys.

GPG 1.4

echo 'hello' | gpg --armor --clearsign | gpg --verify
echo 'sekrats!' | gpg --armor --encrypt | gpg --decrypt

GPG 2.0 or 2.1

echo 'hello' | gpg2 --armor --clearsign | gpg2 --verify
echo 'sekrats!' | gpg2 --armor --encrypt | gpg2 --decrypt

2. Backup Keys

GPG 1.4

gpg --armor --export > $GNUPGHOME/public.asc
gpg --armor --export-secret-keys    > $GNUPGHOME/secret.asc gpg --armor --export-secret-subkeys >$GNUPGHOME/subkey.asc

GPG 2.0 or 2.1

gpg2 --armor --export                > $GNUPGHOME/public.asc
gpg2 --armor --export-secret-keys    > $GNUPGHOME/secret.asc
gpg2 --armor --export-secret-subkeys > $GNUPGHOME/subkey.asc

3. Try Recovery

Delete your public and secret key and re-import the offline keys (the full set).

GPG 1.4

gpg --delete-secret-key <KEYID>
gpg --delete-key <KEYID>
gpg --import $GNUPGHOME/public.asc $GNUPGHOME/secret.asc
gpg --expert --edit-key <KEYID>

GPG 2.0 or 2.1

gpg2 --delete-secret-key <KEYID>
gpg2 --delete-key <KEYID>
gpg2 --import $GNUPGHOME/public.asc $GNUPGHOME/secret.asc
gpg2 --expert --edit-key <KEYID>

1. Type trust [Return] to give yourself "ultimate" trust.
2. Type save [Return] to save your changes to the key.

Now try encrypting and signing stuff with your keys.

GPG 1.4

echo 'hello' | gpg --armor --clearsign | gpg --verify
echo 'sekrats!' | gpg --armor --encrypt | gpg --decrypt

GPG 2.0 or 2.1

echo 'hello' | gpg2 --armor --clearsign | gpg2 --verify
echo 'sekrats!' | gpg2 --armor --encrypt | gpg2 --decrypt

## Switching Secret Key Sets

### Using your offline secret keys

Delete your secret keys and re-import the offline secret keys (the full set).

GPG 1.4

gpg --delete-secret-key <KEYID>
gpg --import $GNUPGHOME/secret.asc

GPG 2.0 or 2.1

gpg2 --delete-secret-key <KEYID>
gpg2 --import $GNUPGHOME/secret.asc

With the full set in use (offline only, of course!) you can import your buddy's keys, sign them and trust them. Use gpg --import and gpg --export to move keys around to/from USB drives to/from your online machine.

If you list your secret keys you should see a plain "sec" next to your key. This indicates a full secret key is present. You may now manage keys and sign other keys.

GPG 1.4

gpg --list-secret-keys

GPG 2.0 or 2.1

gpg2 --list-secret-keys

### Using your online secret keys

Delete your secret keys and re-import the online secret keys (the subset).

GPG 1.4

gpg --delete-secret-key <KEYID>
gpg --import $GNUPGHOME/subkey.asc

GPG 2.0 or 2.1

gpg2 --delete-secret-key <KEYID>
gpg2 --import $GNUPGHOME/subkey.asc

You won't be able to sign other people's keys or create/revoke keys with this key-set. (This is a good thing in case your online machine and its subkeys are compromised later.)

If you list your secret keys you should see a "sec#" next to your key. The '#' indicates that the secret is missing for your master key. If you are following good practices, from now on it will only be available on your offline computer.

GPG 1.4

gpg --list-secret-keys

GPG 2.0 or 2.1

gpg2 --list-secret-keys

# Signing Packages with Stack

Now that we have our online secret keys ready, we can sign our Hackage-bound packages with Stack. There are currently two ways to sign packages with Stack. The first is to sign your package while uploading (the default). Just perform an upload as you might have in the past.

stack upload

This will upload your package to Hackage just like it did before. After that finishes it will GPG detach-sign the package and send that signature to sig-service. This service just collects signatures & periodically commits them to sig-archive. Later we will use these signatures and our web of trust to verify package contents as they download with stack.

If you don't want to sign for some reason, you can just skip the signature by adding a flag.

stack upload --no-signature

Another way to sign is to create an sdist file. If you add a flag you can publish a signature at the same time.

stack sdist --sign

For those package authors out there with more than a few packages to sign, we have written a simple tool to bulk-sign all your packages published to Hackage. It's called sig-tool. To bootstrap, you need to run the following:

cd sig-tool
stack install
stack exec -- sig-tool setup <myHackageUser>

This will download all of your packages to the current directory & generate a manifest with SHA256 sums for all the releases. You should inspect the files to make sure they are correct at this stage.
You can sign all of the releases or you can trim the manifest file if you like. To begin signing, run the following:

stack exec -- sig-tool sign

# Come work with us!

**FP Complete has multiple openings for software and devops engineers. If you're interested in working on Haskell, container-based deployment, build systems, or just about anything you read about on this blog, send us your CV.**

### Functional Jobs

# Software Engineer – Cryptography at LeapYear (Full-time)

### LeapYear is pioneering a new approach to the data security problem.

We’ve built the first platform that enables unrestricted access to information while providing mathematically proven guarantees of security and privacy.

Our engineers solve hard, real-world problems that impact every data-driven organization. We are an Agile team at the frontiers of software development, integrating the latest advances in cryptography and machine learning into critical systems using functional programming (99% of our code is written in Haskell). If this sounds like your dream job, read on.

### You will be one of the first ten employees at a rapidly growing, mission-driven company.

In our first six months, our team architected, developed, and deployed our product on Fortune 50 datasets with billions of records and secured over five and a half million dollars in bookings. We are backed by Bloomberg, private equity investors, and security industry experts with multi-billion dollar exits. LeapYear has collaborated with some of the nation’s largest healthcare organizations, partnered with a coalition working to deploy $250B in social impact capital, and we’re just getting started.

If your day-to-day concerns include information leaks, timing attacks, and how well PKCS#1 v2 stacks up against v1.5, this is the job for you. You will shape our security design alongside the world's leading experts in cryptography and make key strategic decisions about our product. We will look to you to identify and instill the best practices in security engineering throughout the platform and help us meet high performance benchmarks while maintaining cryptographic guarantees of privacy and security.

To learn more about our mission, team, company culture, and benefits, see http://leapyear.io/careers. For details on the specific responsibilities and requirements of the role, please see below.

### RESPONSIBILITIES

• oversee the development of secure and scalable implementations
• collaborate with developers and researchers to transform theory in cryptography into enterprise applications
• develop performance-critical code
• plan, implement, and optimize new features to carry out our product roadmap

### REQUIREMENTS

• expert knowledge of systems architecture and C
• professional experience in functional programming
• ability to read and understand academic literature in theoretical computer science
• track record of delivering high-quality product features on schedule
• strong foundation in data structures, algorithms, and software design

### PREFERRED

• expert knowledge of practical cryptography and data security
• industry experience in security and/or cryptography

### BONUS

• publications in major security or cryptography journals
• PhD in computer science, math, or related fields
• professional experience in developing secure critical systems

### A FEW OF THE PERKS

• Culture of teaching and learning
• Competitive compensation package of salary and equity
• Open vacation policy
• Catered lunch and dinner every day
• Company outings
• Free yoga and meditation classes at a world-class studio downstairs
• Relocation support

LeapYear is on a mission to resolve the conflict between innovation and data security. If you thrive in an open environment working on deeply technical challenges, we’re looking for you.

Get information on how to apply for this position.

# Software Engineer – Machine Learning at LeapYear (Full-time)

### LeapYear is pioneering a new approach to the data security problem.

We’ve built the first platform that enables unrestricted access to information while providing mathematically proven guarantees of security and privacy.

Our engineers solve hard, real-world problems that impact every data-driven organization. We are an Agile team at the frontiers of software development, integrating the latest advances in cryptography and machine learning into critical systems using functional programming (99% of our code is written in Haskell).

In our first six months, our team architected, developed, and deployed our product on Fortune 50 datasets with billions of records and secured over five and a half million dollars in bookings. We are backed by Bloomberg, private equity investors, and security industry experts with multi-billion dollar exits. LeapYear has collaborated with some of the nation’s largest healthcare organizations, partnered with a coalition working to deploy $250B in social impact capital, and we’re just getting started.

If you work at the intersection of machine learning and functional programming, we're looking for you. You will collaborate with developers and researchers to transform machine learning theory into enterprise applications, and the novel algorithms that you develop will be deployed on massive enterprise datasets.

To learn more about our mission, team, company culture, and benefits, see http://leapyear.io/careers. For details on the specific responsibilities and requirements of the role, please see below.
### RESPONSIBILITIES

• design and implement novel machine learning techniques
• develop performance-critical code
• plan, implement, and optimize new features to carry out our product roadmap

### REQUIREMENTS

• professional or research machine learning experience
• professional experience in functional programming
• ability to read and understand academic literature in theoretical computer science
• track record of delivering high-quality product features on schedule
• strong foundation in data structures, algorithms, and software design

### PREFERRED

• excellent Haskell skills
• experience in enterprise data science and data engineering
• background applying advanced machine learning techniques in enterprise data environments

### BONUS

• PhD in machine learning or related field

### A FEW OF THE PERKS

• Culture of teaching and learning
• Competitive compensation package of salary and equity
• Open vacation policy
• Build your ideal workstation
• Catered lunch and dinner every day
• Company outings
• Free yoga and meditation classes at a world-class studio downstairs
• Relocation support

LeapYear is on a mission to resolve the conflict between innovation and data security. If you thrive in an open environment working on deeply technical challenges, we’re looking for you.

Get information on how to apply for this position.

## May 04, 2016

### Bill Atkins

# NSNotificationCenter, Swift and blocks

The conventional way to register observers with NSNotificationCenter is to use the target-action pattern. While this gets the job done, it's inherently not type-safe. For example, the following Swift snippet will compile perfectly:

NSNotificationCenter.defaultCenter().addObserver(self, selector: Selector("itemAdded:"),
    name: MyNotificationItemAdded, object: nil)

even though at runtime it will fail unless self has a method named itemAdded that takes exactly one parameter (leaving off that last colon in the selector will turn this line into a no-op).
Plus, this method gives you no way to take advantage of Swift's closures, which would allow the observer to access local variables in the method that adds the observer and would eliminate the need to create a dedicated method to handle the event.

A better way to do this is to use blocks. And NSNotificationCenter does include a block-based API:

```swift
NSNotificationCenter.defaultCenter().addObserverForName(MyNotificationItemAdded, object: nil, queue: nil) { note in
    // ...
}
```

This is much nicer, especially with Swift's trailing closure syntax. There are no method names to be looked up at runtime, we can refer to local variables in the method that registered the observer and we can perform small bits of logic in reaction to events without having to create and name dedicated methods.

The catch comes in resource management. It's very important that an object remove its event observers when it's deallocated, or else NSNotificationCenter will try to invoke methods on invalid pointers. The traditional target-action method has the one advantage that we can easily handle this requirement with a single call in deinit:

```swift
deinit {
    NSNotificationCenter.defaultCenter().removeObserver(self)
}
```

With the block API, however, since there is no explicit target object, each call to addObserverForName returns "an opaque object to act as observer." So your observer class would need to track all of these objects and then remove them all from the notification center in deinit, which is a pain. In fact, the hassle of having to do bookkeeping on the observer objects almost cancels out the convenience of using the block API.
Frustrated by this situation, I sat down and created a simple helper class, NotificationManager:

```swift
class NotificationManager {
    private var observerTokens: [AnyObject] = []

    deinit {
        deregisterAll()
    }

    func deregisterAll() {
        for token in observerTokens {
            NSNotificationCenter.defaultCenter().removeObserver(token)
        }
        observerTokens = []
    }

    func registerObserver(name: String, block: (NSNotification -> Void)) {
        let newToken = NSNotificationCenter.defaultCenter().addObserverForName(name, object: nil, queue: nil, usingBlock: block)
        observerTokens.append(newToken)
    }

    func registerObserver(name: String, forObject object: AnyObject, block: (NSNotification -> Void)) {
        let newToken = NSNotificationCenter.defaultCenter().addObserverForName(name, object: object, queue: nil, usingBlock: block)
        observerTokens.append(newToken)
    }
}
```

First, this simple class provides a Swift-specialized API around NSNotificationCenter. It provides an additional convenience method without an object parameter (rarely used, in my experience) to make it easier to use trailing-closure syntax. But most importantly, it keeps track of the observer objects generated when observers are registered, and removes them when the object is deinit'd.

A client of this class can simply keep a member variable of type NotificationManager and use it to register its observers.
When the parent class is deallocated, the deinit method will automatically be called on its NotificationManager member variable, and its observers will be properly disposed of:

```swift
class MyController: UIViewController {
    private let notificationManager = NotificationManager()

    override init() {
        notificationManager.registerObserver(MyNotificationItemAdded) { note in
            println("item added!")
        }

        super.init()
    }

    required init(coder: NSCoder) {
        fatalError("decoding not implemented")
    }
}
```

When the MyController instance is deallocated, its NotificationManager member variable will be automatically deallocated, triggering the call to deregisterAll that will remove the dead objects from NSNotificationCenter.

In my apps, I add a notificationManager instance to my common UIViewController base class so I don't have to explicitly declare the member variable in all of my controller subclasses.

Another benefit of using my own wrapper around NSNotificationCenter is that I can add useful functionality, like group observers: an observer that's triggered when any one of a group of notifications is posted:

```swift
struct NotificationGroup {
    let entries: [String]

    init(_ newEntries: String...) {
        entries = newEntries
    }
}

extension NotificationManager {
    func registerGroupObserver(group: NotificationGroup, block: (NSNotification -> Void)) {
        for name in group.entries {
            registerObserver(name, block: block)
        }
    }
}
```

This can be a great way to easily set up an event handler to run when, for example, an item is changed in any way at all:

```swift
let MyNotificationItemsChanged = NotificationGroup(
    MyNotificationItemAdded,
    MyNotificationItemDeleted,
    MyNotificationItemMoved,
    MyNotificationItemEdited
)

notificationManager.registerGroupObserver(MyNotificationItemsChanged) { note in
    // ...
}
```

### Mikhail Glushenkov

# What's new in Cabal 1.24 — Nix-style local builds, setup dependencies, HTTPS and more!

We’ve just released versions 1.24 of both Cabal and cabal-install.
The 1.24 release incorporates more than a thousand commits by 89 different contributors. This post describes what’s new and improved in this version.

## User-visible features

• Nix-style local builds in cabal-install (so far only a technical preview). See this post by Edward Z. Yang for more details.

• Integration of a new security scheme for Hackage based on The Update Framework. So far this is not enabled by default, pending some changes on the Hackage side. See these three posts by Edsko de Vries and Duncan Coutts for more information.

• Support for specifying setup script dependencies in .cabal files. Setup scripts are also now built with the cabal_macros.h-style macros for conditional compilation. See this post by Duncan Coutts for more information.

• Support for HTTPS downloads in cabal-install. HTTPS is now used by default for downloads from Hackage. This uses either curl or wget or, on Windows, PowerShell, under the hood. Install target URLs can now also use HTTPS, e.g. cabal install https://example.com/foo-1.0.tar.gz.

• cabal upload learned how to upload documentation to Hackage (cabal upload --doc) (#2890).

• In related news, cabal haddock can now generate documentation intended for uploading to Hackage (cabal haddock --for-hackage, #2852). cabal upload --doc runs this command automatically if the documentation for the current package hasn’t been generated yet.

• New cabal-install command: gen-bounds, for easy generation of version bounds. See this post by Doug Beardsley for more information.

• It’s now possible to limit the scope of --allow-newer to single packages in the install plan, both on the command line and in the config file. See here for an example.

• The --allow-newer option can now be used with ./Setup configure (#3163).

• New cabal user-config subcommand: init, which creates a default config file in either the default location (~/.cabal/config) or as specified by --config-file (#2553).
• New config file field extra-framework-dirs for specifying extra locations to find OS X frameworks in (#3158). It can also be specified as an argument to the install and configure commands.

• The cabal-install solver now takes information about extensions and language flavours into account (#2873). The solver is now also aware of pkg-config constraints (#3023).

• New cabal-install option: --offline, which prevents cabal-install from downloading anything from the Internet.

• New cabal upload option -P/--password-command for reading the Hackage password from arbitrary program output.

• New --profiling-detail=$level flag with a default for libraries and executables of ‘exported-functions’ and ‘toplevel-functions’ respectively (GHC’s -fprof-auto-{exported,top} flags) (#193).

• New --show-detail mode: --show-detail=direct; like streaming, but allows the test program to detect that it is connected to a terminal, and works reliably with a non-threaded runtime (#2911; also serves as a work-around for #2398).

• Macros VERSION_$pkgname and MIN_VERSION_$pkgname are now also generated for the current package (#3235).

• The builddir option can now be specified via the CABAL_BUILDDIR environment variable and in cabal.config (#2484).

• Added a log file message similar to one printed by make when building in another directory (#2642).

## Bug fixes and minor improvements

• Support for GHC 8. NB: pre-1.24 versions of Cabal won’t work with GHC 8.

• Cabal is now aware of extra C sources generated by preprocessors (e.g. c2hs and hsc2hs) (#2467).

• Cabal now includes cabal_macros.h when running c2hs (#2600).

• C sources are now recompiled only when needed (#2601).

• Support Haddock response files to work around command-line length restrictions on Windows (#2746).

• Library support for multi-instance package DBs (#2948).

• Improvements in the ./Setup configure solver (#3082, #3076).

• If there are multiple remote repos, cabal update now updates them in parallel (#2503).

• The cabal program itself can now be used as an external setup method. This fixes an issue where a Cabal version mismatch caused unnecessary reconfigures (#2633).

• Fixed space leaks in cabal update (#2826) and in the solver (#2916, #2914). Improved performance of --reorder-goals (#3208).

• cabal exec and sandbox hc-pkg now use the configured compiler (#2859).

• The man page for cabal-install is now automatically generated (#2877).

• Miscellaneous minor and/or internal bug fixes and improvements.

## Acknowledgements

Thanks to everyone who contributed code and bug reports, and to Ryan Thomas for helping with release management. A full list of people who contributed patches to Cabal/cabal-install 1.24 is available here.

## Looking forward

We plan to make a new release of Cabal/cabal-install approximately 6 months after 1.24 — that is, in late October or early November 2016. Main features that are currently targeted at 1.26 are:

• Further work on nix-style local builds, perhaps making that code path the default.

• Enabling Hackage Security by default.

• Native support for foreign libraries: Haskell libraries that are intended to be used by non-Haskell code.

• New Parsec-based parser for .cabal files.

• A revamped homepage for Cabal, rewritten user manual, and automated build bots for binary releases.

We would like to encourage people considering contributing to take a look at the bug tracker on GitHub, take part in discussions on tickets and pull requests, or submit their own. The bug tracker is reasonably well maintained and it should be relatively clear to new contributors what is in need of attention and which tasks are considered relatively easy. For more in-depth discussion there is also the cabal-devel mailing list.

# The five arguments on why people struggle with monads

People trying to learn Haskell, or learn about Haskell, often struggle with the concept of monads: specifically, with a class called Monad that appears in much of Haskell development. People like to talk about this a lot, probably because pedagogy is interesting, because you need to use the class in Haskell (so comprehension is important for language adoption), and because people like talking about things that they understand that other people don’t.

I’m going to summarize the kinds of arguments typically put forth, each of which I consider a contributing factor to why people struggle with monads as applied in Haskell:

• The alienation argument: The concept of a monad is inherently difficult for puny minds to understand (suddenly programmers aren’t good at abstraction?). No matter the quality of the materials or the context in which they present the topic, it’s just really hard and alien.
• The bad pedagogy argument: The educational material is terrible. We just haven’t found the holy grail of monad tutorials.
• The mysticism argument: There is so much tutorial material and woo-woo (and discussion like this very blog post; you are contributing by reading it! Shame on you!) around monads that people think they still don’t “get” it, as if some kind of zen moment were required.
• The academic saturation argument: Many enthusiastic Haskellers encourage learning category theory, which leads to confusing the Haskell class Monad with the category-theoretic notion of a monad; the two are distinct.
• The foreign language argument: Not having enough understanding of Haskell, specifically of Haskell’s type system (to understand the Monad class in any realistic way, you have to grok type-classes and higher-kinded types), puts understanding a concept expressed in that language out of reach.

I have my own opinion about which factors are most to blame, which I’ll elaborate on now.

## The alienation argument

We’ve already seen monads applied in other languages. LINQ in C#, workflows in F#, promises in JavaScript. They’re not all the same as monads–let’s avoid that pedantry–but they’re not some whole other world, either. They have a notion of creating some actions as first class values and combining them in some way and then running that composed thing. If programmers have no problem with this kind of thing in other languages, what’s so hard about Haskell’s Monad class?
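To make that concrete, here is a tiny sketch of the pattern those languages share — failable actions as first-class values, combined with an operator, then run. The names safeDiv and quarter are mine, purely illustrative, not from the post:

```haskell
-- A failable step: division that returns Nothing instead of crashing.
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv x y = Just (x `div` y)

-- Two steps chained with (>>=); the whole chain yields Nothing if any
-- step fails, much like a rejected promise short-circuits a .then chain.
quarter :: Int -> Maybe Int
quarter x = safeDiv x 2 >>= \h -> safeDiv h 2

main :: IO ()
main = print (quarter 100, safeDiv 1 0 >>= quarter)
-- prints (Just 25,Nothing)
```

This is the same shape as a promise chain in JavaScript or a LINQ pipeline in C#: build the composed computation as a value, then run it.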

## The bad pedagogy argument

Actually, there are a few tutorials that always come up in online discussions as being quality monad tutorials, each followed by echoes of approval as being The Great Monad Tutorial:

Whatever you think about the tutorials, we’re not lacking in quality educational materials. So what’s the problem?

## The mysticism argument

This might be the answer to the previous one. There are indeed too many tutorials. Reading them has been called a fallacy, and people have been advised never to read monad tutorials at all:

The tutorials are not consistent, either; they fall into a few camps:

This is indeed a problem.

Quoting Keith B:

Too many “monad tutorials” are written by either actual mathematicians or frustrated mathematicians manqués, and as a result they focus far too much on trying to transmit the excitement of understanding a cool new concept. Working programmers have little interest in this. They want to gain facility with useful techniques.

Fundamentally there are two audiences: people who want to learn category theory, and people who want to learn Haskell. Unfortunately, the latter camp is sometimes baffled into believing they ought to be in the former camp.

But ultimately I think people putting the effort in aren’t really misled by this.

## The foreign language argument

This is the most compelling reason for me.

If you don’t understand a language, you cannot understand concepts expressed in that language. Try to read about a new concept, something abstract, on Wikipedia, but switch the language to one you don’t understand well. Good luck. I’ve tried, with much difficulty, to explain to non-native English speakers why Blinkenlights makes me laugh to tears (note: the ‘ATTENTION’ German version is also funny).

There are two problems:

1. You can’t understand the implications of class Monad m where .. without having a solid understanding of (1) type-classes and (2) higher-kinded types. The type of return is Monad m => a -> m a. Haskell’s type-classes are a feature that few other languages have, supporting value polymorphism. The m there is of kind m :: * -> *. That’s a higher-kinded type. It’s hard to even explain these two concepts.
2. Haskell starts out of the box with making use of the Monad class. The first thing you encounter as a newbie is also one of the things you are most ill-equipped to understand.
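A minimal illustration of point 1 — the wrapper type Box below is hypothetical, purely for exposition. Defining a Monad instance exercises both features at once: Box has kind * -> * (Box alone is not a type, only Box Int etc. are), and the instance declarations are type-class instances:

```haskell
-- A trivial wrapper; 'Box' has kind * -> *, so it can be the 'm' in
-- 'Monad m'. Since GHC 7.10, Monad requires Functor and Applicative.
newtype Box a = Box a deriving (Show, Eq)

instance Functor Box where
  fmap f (Box a) = Box (f a)

instance Applicative Box where
  pure = Box
  Box f <*> Box a = Box (f a)

instance Monad Box where
  Box a >>= f = f a    -- unwrap, apply, keep the result's wrapper

main :: IO ()
main = print (Box 1 >>= \x -> Box (x + 1))
-- prints Box 2
```

Without already grasping type-classes and higher-kinded type variables, none of the three instance declarations above even parse conceptually — which is exactly the point being made.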

Is it any wonder that this class is such a pain point?

To add insult to injury, said audience aren’t aware that this is their poison. Laurence Gonsalves arrives at an insight without really knowing it:

Of course monad tutorials about the Monad class are written in Haskell, because the class uses two language features not present in other popular languages. JavaScript, Ruby, and Python don’t have a static type system. C# and Java have generics, and Common Lisp has generic functions, but none of them have value polymorphism. The whole reason Haskell is used to teach this concept is that Monad is only a satisfying class in Haskell.
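A small sketch of what value polymorphism means here: the very same expression, return 5, produces different results depending on the type the context demands — dispatch on the return type, which generics in Java or C# cannot do:

```haskell
-- The context's type annotation selects which Monad instance's
-- 'return' is used; the call site is otherwise identical.
xs :: [Int]
xs = return 5      -- the list instance: [5]

mb :: Maybe Int
mb = return 5      -- the Maybe instance: Just 5

main :: IO ()
main = print (xs, mb)
-- prints ([5],Just 5)
```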

Let’s detour into natural language: French does not have a natural equivalent to the verb “peck”, which translated into French is “donner des coups de bec”, or “attack with the front of the beak.” Italian does not have a word for “toe”; it has “dito del piede”, which is “finger of the foot”. English has “-ish” and “-y”, like “this is greenish” or “it was salady”, whereas Italian has “-one” (large), so “bacio” (kiss) => “bacione” (big kiss), and “-o” and “-a” for male and female: figlio/figlia => son/daughter.

Programming languages are the same. There are some genuinely new features that other languages aren’t able to reproduce, without removing the utility of the thing in doing so.

## Summarizing

In summarizing I’d personally assign the following ratings to each argument:

• The alienation argument: not too convincing
• The bad pedagogy argument: not likely contributory
• The mysticism argument: very contributory
• The academic saturation argument: not a big deal
• The foreign language argument: a big contributor

I think educators can only acknowledge the two problems described under the foreign language argument and teach a good understanding of the language before moving on to the Monad class.

# Mac OS X C++ Development (part 2)

As I wrote in my previous post, I recently started working on a Mac. This is a collection of my notes on the problems and surprises I ran into. Maybe it will be useful for you, too.

# Tools

In addition to many of the command line tools you may be familiar with from C++ development on Linux, Mac OS X has specialized tools for working with binaries.

## otool

The otool utility displays information from binary headers. It works with Mach-O headers (the main format on Macs) and with “universal” binaries as well. There are many options, but the one that’s been most helpful to me so far is otool -L <lib>, which lists the dependencies of the binary — essential information when debugging dynamic linking problems. Another useful one is otool -D <lib>, which tells you the install name embedded in a given library.

## install_name_tool

This tool allows you to change the install names for a binary. Simple, but important.

By now you must be wondering what install names are.

# Install Names

Let us begin with man dyld. There is no dyld command. Nevertheless, it is the dynamic linker for Mac OS X. The man page tells us many interesting things, perhaps chief among which is how dynamic libraries are located at run-time.

When you link an application to a dynamic library, the path to that library is encoded in the application binary. Well, actually, when you build, the library is interrogated to find out where it is supposed to be installed, and that path gets encoded in the application binary. These paths where things are supposed to be installed are known as install names on Mac OS X. Every .dylib has (at least) one. You can use otool -D to find the install name of a .dylib file.

In most cases, the install name will have an absolute path, like this:

```
$ otool -D /usr/lib/libiconv.dylib
/usr/lib/libiconv.dylib:
/usr/lib/libiconv.2.dylib
```

So if you link with /usr/lib/libiconv.dylib, your application will look for /usr/lib/libiconv.2.dylib when it launches. If the .dylib isn’t where it’s expected, the app crashes.

## Relative paths

Sometimes a library will have a relative path as its install name. When this happens, dyld will search for a library with that name on several search paths. It’s similar to LD_LIBRARY_PATH in some ways. There are environment variables you can set to control where it searches. There are several, and you should consult man dyld if you want the gory details. One thing that is important to know is that one of these variables, DYLD_FALLBACK_LIBRARY_PATH, has the default value $(HOME)/lib:/usr/local/lib:/lib:/usr/lib. Occasionally you may see advice on the web that recommends symbolically linking a library into your $(HOME)/lib without any further explanation. DYLD_FALLBACK_LIBRARY_PATH is why.

## @rpath

If you want to distribute an application that relies on libraries that are not necessarily in known standard places, it is a good idea to use @rpath as part of your install name. This directs dyld to look in a path that is relative to the application binary (or, in the case of a library that is dynamically linked to another library, relative to the calling library), rather than looking in the places it normally would. This allows you to bundle a set of libraries along with your application and have them be found regardless of where you install the application. There are several other options, such as @executable_path and @loader_path, but @rpath seems to be the right choice in most cases. Again, man dyld has all the details.

# An Example: Dionysus

I wanted to build Dionysus, a library for working with persistent homology. I wanted to build the Python bindings.
## Linker Errors

After I managed to get through the compile phase, I ran into linker problems, with unresolved symbol errors similar to the ones I mentioned in part 1. However, in this case, changing the compiler didn’t fix the problem. I was getting linking errors trying to link with Boost, which I had already reinstalled on my system from source, using gcc:

```
export HOMEBREW_CC=gcc-5
export HOMEBREW_CXX=g++-5
brew reinstall --build-from-source boost
brew reinstall --build-from-source boost --with-python3
```

That wasn’t enough, though. I still got linker errors. Googling suggested maybe the problem was that Boost.Python (which Dionysus uses for its Python binding) was linked to a different version of Python than the one I was targeting. This turned out to be a red herring, however. As far as I can tell, Boost.Python doesn’t actually link to Python at all (though it has to be compiled with support for Python 3 if you want that, so there’s certainly a connection, just not a linking one). The problem actually was that Dionysus wanted the C++11 version of Boost, and the default Homebrew version doesn’t have that support. For this, you need:

```
brew reinstall --build-from-source boost --with-c++11
brew reinstall --build-from-source boost-python --with-c++11
```

Build-time linking errors resolved, I thought I was home free.

## Dynamic Linking Errors

At last, I had built the library. I fired up python2.7 and imported the library. It worked! Then I created an instance of one of the classes, and got this:

```
TypeError: __init__() should return None, not 'NoneType'
```

After a lot of googling, it turned out that the most likely cause for this was that the Dionysus binding had wound up linked to the wrong Python library. I carefully checked the CMakeCache.txt file and eliminated any occurrences of the wrong Python library, or the wrong Boost library. Still no luck. A look at otool -L output for the library showed something funny:

```
$ otool -L lib_dionysus.dylib
lib_dionysus.dylib:
	/Users/me/src/Dionysus/build/bindings/python/lib_dionysus.dylib (compatibility version 0.0.0, current version 0.0.0)
	libpython2.7.dylib (compatibility version 2.7.0, current version 2.7.0)
	/usr/local/opt/boost-python/lib/libboost_python-mt.dylib (compatibility version 0.0.0, current version 0.0.0)
	/usr/local/opt/mpfr/lib/libmpfr.4.dylib (compatibility version 6.0.0, current version 6.3.0)
	/usr/local/opt/gmp/lib/libgmp.10.dylib (compatibility version 14.0.0, current version 14.0.0)
```

Do you notice that all the entries are absolute paths except for the libpython2.7 one? A quick look at the Anaconda libpython2.7.dylib I linked against shows why:

```
$ otool -D /Users/me/anaconda2/lib/libpython2.7.dylib
/Users/me/anaconda2/lib/libpython2.7.dylib:
libpython2.7.dylib
```

So Anaconda’s version of Python instructed the linker to find libpython2.7.dylib by going through the normal dyld process - checking DYLD_LIBRARY_PATH, then DYLD_FALLBACK_LIBRARY_PATH, and so on. Since /usr/lib is on DYLD_FALLBACK_LIBRARY_PATH by default, and the system libpython2.7.dylib is in that directory, Dionysus was getting linked with that one, not with the one that was already in memory in the current process. This led to the strange error I saw. I was able to use install_name_tool to change the binary to point to the right libpython2.7:

```
$ install_name_tool -change libpython2.7.dylib /Users/me/anaconda2/lib/libpython2.7.dylib lib_dionysus.dylib
```

After this, everything was fine.

In addition to man dyld, these are useful:

# Announcing cabal new-build: Nix-style local builds

cabal new-build, also known as “Nix-style local builds”, is a new command inspired by Nix that comes with cabal-install 1.24. Nix-style local builds combine the best of non-sandboxed and sandboxed Cabal:

1. Like sandboxed Cabal today, we build sets of independent local packages deterministically and independently of any global state. new-build will never tell you that it can't build your package because it would result in a “dangerous reinstall.” Given a particular state of the Hackage index, your build is completely reproducible. For example, you no longer need to compile packages with profiling ahead of time; just request profiling and new-build will rebuild all its dependencies with profiling automatically.
2. Like non-sandboxed Cabal today, builds of external packages are cached globally, so that a package can be built once, and then reused anywhere else it is also used. No need to continually rebuild dependencies whenever you make a new sandbox: dependencies which can be shared, are shared.

Nix-style local builds work with all versions of GHC supported by cabal-install 1.24, which currently is GHC 7.0 and later. Additionally, cabal-install is on a different release cycle than GHC, so we plan to be pushing bugfixes and updates on a faster basis than GHC's yearly release cycle.

Although this feature is only in beta (there are bugs, see “Known Issues”, and the documentation is a bit sparse), I’ve been successfully using Nix-style local builds exclusively for my Haskell development. It's hard to overstate my enthusiasm for this new feature: it “just works”, and you don't need to assume that there is a distribution of blessed, version-pegged packages to build against (e.g., Stackage). Eventually, new-build will simply replace the existing build command.

### Quick start

Nix-style local builds “just work”: there is very little configuration that needs to be done to start working with it.

1. Install the latest cabal-install:

```
cabal update
cabal install cabal-install
```

Make sure the newly installed cabal is in your path.

2. To build a single Cabal package, instead of running cabal configure; cabal build, you can use Nix-style builds by prefixing these commands with new-; e.g., cabal new-configure; cabal new-build. cabal new-repl is also supported. (Unfortunately, other commands are not yet supported, e.g. new-clean (#2957) or new-freeze (#2996).)

3. To build multiple Cabal packages, you first need to create a cabal.project file in some root directory. For example, in the Cabal repository, there is a root directory with a folder per package, e.g., the folders Cabal and cabal-install. Then in cabal.project, specify each folder:

```
packages: Cabal/
          cabal-install/
```


Then, in the directory for a package, you can say cabal new-build to build all of the components in that package; alternately, you can specify a list of targets to build, e.g., package-tests cabal asks to build the package-tests test suite and the cabal executable. A component can be built from any directory; you don't have to be cd'ed into the directory containing the package you want to build. Additionally, you can qualify targets by the package they came from, e.g., Cabal:package-tests asks specifically for the package-tests component from Cabal. There is no need to manually configure a sandbox: add a cabal.project file, and it just works!

Unlike sandboxes, there is no need to add-source; just add the package directories to your cabal.project. And unlike traditional cabal install, there is no need to explicitly ask for packages to be installed; new-build will automatically fetch and build dependencies.

There is also a convenient script you can use for hooking up new-build to your Travis builds.

### How it works

Nix-style local builds are implemented with these two big ideas:

### How is CodeWorld related to Summer of Haskell?

The Haskell.org committee will choose a number of student projects based on their impact on the Haskell community.  As part of this, one project will be chosen specifically relating to CodeWorld, and funded by CodeWorld maintainers.

### Should I submit a proposal?

It’s up to you, but I believe you should submit a proposal if:

• You are eligible (see the bottom of the Summer of Haskell info page).
• You are willing and available to take on an essentially full-time commitment for the summer.
• You have a realistic idea you’d like to work on to benefit the Haskell community.

### Any advice for writing a proposal?

Yes!  Here are things you should keep in mind:

1. Propose a project with immediate impact on real people.  “If you build it, they will come” doesn’t work here.  Unless you have an extremely good reason, don’t propose to build something speculative and hope people will just like it so much that they adopt it.  Point to real people who already want this, and who will already be users and will find their lives better if and when it’s completed.
2. Demonstrate that you understand the task.  Provide enough detail to convince us that the project is feasible.  A reasonable and concrete timeline with at least rough deliverables is a good idea.  Poorly defined projects with a low probability of success are often not good fits for this format.
3. Show that you are already becoming a part of the community you’ll be working with.  Are you familiar with the project you’re proposing to contribute to?  Do core people in the project and/or the Haskell community know who you are?  Have you discussed your ideas with people already involved in the project?  Do you know someone who would be your mentor?

You can browse successful projects from last year.  There’s also some good advice by Edward Kmett in an old mailing list thread.

# Typewriters

It will surprise nobody to learn that when I was a child, computers were almost unknown, but it may be more surprising that typewriters were unusual.

Probably the first typewriter I was familiar with was my grandmother’s IBM “Executive” model C. At first I was not allowed to touch this fascinating device, because it was very fancy and expensive and my grandmother used it for her work as an editor of medical journals.

The “Executive” was very advanced: it had proportional spacing. It had two space bars, for different widths of spaces. Characters varied between two and five ticks wide, and my grandmother had typed up a little chart giving the width of each character in ticks, which she pasted to the top panel of the typewriter. The font was sans-serif, and I remember being a little puzzled when I first noticed that the lowercase j had no hook: it looked just like the lowercase i, except longer.

The little chart was important, I later learned, when I became old enough to use the typewriter and was taught its mysteries. Press only one key at a time, or the type bars will collide. Don't use the (extremely satisfying) auto-repeat feature on the hyphen or underscore, or the platen might be damaged. Don't touch any of the special controls; Grandma has them adjusted the way she wants. (As a concession, I was allowed to use the “expand” switch, which could be easily switched off again.)

The little chart was part of the procedure for correcting errors. You would backspace over the character you wanted to erase—each press of the backspace key would move the carriage back by one tick, and the chart told you how many times to press—and then place a slip of correction paper between the ribbon and the paper, and retype the character you wanted to erase. The dark ribbon impression would go onto the front of the correction slip, which was always covered with a pleasing jumble of random letters, and the correction slip impression, in white, would exactly overprint the letter you wanted to erase. Except sometimes it didn't quite: the ribbon ink would have spread a bit, and the corrected version would be a ghostly white letter with a hair-thin black outline. Or if you were a small child, as I was, you would sometimes put the correction slip in backwards, and the white ink would be transferred uselessly to the back of the ribbon instead of to the paper. Or you would select a partly-used portion of the slip and the missing bit of white ink would leave a fragment of the corrected letter on the page, like the broken-off leg of a dead bug.

Later I was introduced to the use of Liquid Paper (don't brush on a big glob, dot it on a bit at a time with the tip of the brush) and carbon paper, another thing you had to be careful not to put in backward, although if you did you got a wonderful result: the typewriter printed mirror images.

From typing alphabets, random letters, my name, and of course qwertyuiops I soon moved on to little poems, stories, and other miscellanea, and when my family saw that I was using the typewriter for writing, they presented me with one of my own, a Royal manual (model HHE maybe?) with a two-color ribbon, and I was at last free to explore the mysteries of the TAB SET and TAB CLEAR buttons. The front panel had a control for a three-color ribbon, which forever remained an unattainable mystery. Later I graduated to a Smith-Corona electric, on which I wrote my high school term papers. The personal computer arrived while I was in high school, but available printers were either expensive or looked like crap.

When I was in first grade our classroom had acquired a cheap manual typewriter, which, as I have said, was an unusual novelty, and I used it whenever I could. I remember my teacher, Ms. Juanita Adams, complaining that I spent too much time on the typewriter. “You should work more on your handwriting, Jason. You might need to write something while you’re out on the street, and you won't just be able to pull a typewriter out of your pocket.”

She was wrong.

# Paul Graham on Writing, Briefly

Thanks to Arne Ranta for introducing me to Writing, Briefly by Paul Graham.
I think it's far more important to write well than most people realize. Writing doesn't just communicate ideas; it generates them. If you're bad at writing and don't like to do it, you'll miss out on most of the ideas writing would have generated.

# Introduction

This is a bit different from my usual posts (well, apart from my write-up of hacking at Odessa) in that it is a log of how I managed to get LibBi (Library for Bayesian Inference) to run on my MacBook, and then not totally satisfactorily (as you will see if you read on).

The intention is to try a few more approaches to the same problem, for example, Stan, monad-bayes and hand-crafted.

Kermack and McKendrick (1927) give a simple model of the spread of an infectious disease. Individuals move from being susceptible ($S$) to infected ($I$) to recovered ($R$).

\displaystyle \begin{aligned} \frac{dS}{dt} &= -\delta S(t) I(t) \\ \frac{dI}{dt} &= \delta S(t) I(t) - \gamma I(t) \\ \frac{dR}{dt} &= \gamma I(t) \end{aligned}

In 1978, anonymous authors sent a note to the British Medical Journal reporting an influenza outbreak in a boarding school in the north of England (“Influenza in a boarding school” 1978). The chart below shows the solution of the SIR (Susceptible, Infected, Recovered) model with parameters which give roughly the results observed in the school.
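As a sketch of what these equations compute (this block is mine, not from the post; the step size and the rates delta and gamma are illustrative, loosely inspired by the boarding-school fit), here is a forward Euler integration in Haskell:

```haskell
-- Forward Euler integration of the SIR equations above.
-- (Illustrative helper, not LibBi code; parameter values are made up.)
type SIR = (Double, Double, Double) -- (susceptible, infected, recovered)

sirStep :: Double -> Double -> Double -> SIR -> SIR
sirStep delta gamma dt (s, i, r) =
  ( s + dt * (-delta * s * i)            -- dS/dt = -delta S I
  , i + dt * (delta * s * i - gamma * i) -- dI/dt = delta S I - gamma I
  , r + dt * (gamma * i) )               -- dR/dt = gamma I

-- The whole trajectory from an initial state is then just an iterate:
sirTrajectory :: Double -> Double -> Double -> SIR -> [SIR]
sirTrajectory delta gamma dt = iterate (sirStep delta gamma dt)
```

Note that each step conserves the total population S + I + R, as it must: the three derivative terms sum to zero.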

# LibBi

## Step 1

~/LibBi-stable/SIR-master $ ./init.sh
error: 'ncread' undefined near line 6 column 7

The README says this is optional so we can skip over it. Still, it would be nice to fit the bridge weight function as described in Moral and Murray (2015). The README does say that GPML is required, but since we don’t (yet) need to do this step, let’s move on.

~/LibBi-stable/SIR-master $ ./run.sh

Error: ./configure failed with return code 77. See
.SIR/build_openmp_cuda_single/configure.log and
.SIR/build_openmp_cuda_single/config.log for details

It seems the example is configured to run on CUDA, and it is highly likely that my installation of LibBi was not set up to allow this. We can change config.conf from

--disable-assert
--enable-single
--enable-cuda
--nthreads 2

to

--nthreads 4
--enable-sse
--disable-assert

On to the next issue.

~/LibBi-stable/SIR-master $ ./run.sh
Error: ./configure failed with return code 1. required QRUpdate library not found. See
.SIR/build_sse/configure.log and .SIR/build_sse/config.log for details

But QRUpdate is installed!

~/LibBi-stable/SIR-master $ brew info QRUpdate
homebrew/science/qrupdate: stable 1.1.2 (bottled)
http://sourceforge.net/projects/qrupdate/
/usr/local/Cellar/qrupdate/1.1.2 (3 files, 302.6K)
/usr/local/Cellar/qrupdate/1.1.2_2 (6 files, 336.3K)
Poured from bottle
/usr/local/Cellar/qrupdate/1.1.2_3 (6 files, 337.3K) *
Poured from bottle
From: https://github.com/Homebrew/homebrew-science/blob/master/qrupdate.rb
==> Dependencies
Required: veclibfort ✔
Optional: openblas ✔
==> Options
--with-openblas
Build with openblas support
--without-check
Skip build-time tests (not recommended)

Let’s look in the log as advised. It seems that a certain symbol cannot be found.

checking for dch1dn_ in -lqrupdate

Let’s try ourselves.

nm -g /usr/local/Cellar/qrupdate/1.1.2_3/lib/libqrupdate.a | grep dch1dn_
0000000000000000 T _dch1dn_

So the symbol is there! What gives? Let’s try setting one of the environment variables.

export LDFLAGS='-L/usr/local/lib'

Now we get further.

./run.sh
Error: ./configure failed with return code 1. required NetCDF header
.SIR/build_sse/config.log for details

So we just need to set another environment variable.

export CPPFLAGS='-I/usr/local/include/'

This is more mysterious.

./run.sh
Error: ./configure failed with return code 1. required Boost header
.SIR/build_sse/config.log for details

Let’s see what we have.

brew list | grep -i boost

Nothing! I recall having some problems with boost when trying to use a completely different package. So let’s install boost.

brew install boost

Now we get a different error.

./run.sh
Error: make failed with return code 2, see .SIR/build_sse/make.log for details

Fortunately at some time in the past sbfnk took pity on me and advised me here to use boost155, a step that should not be lightly undertaken.

/usr/local/Cellar/boost155/1.55.0_1: 10,036 files, 451.6M, built in 15 minutes 9 seconds

Even then I had to say

brew link --force boost155

Finally it runs.

./run.sh 2> out.txt

And produces a lot of output

wc -l out.txt
49999 out.txt

ls -ltrh results/posterior.nc
1.7G Apr 30 19:57 results/posterior.nc

Rather worryingly, out.txt is full of lines of the form

1: -51.9191 -23.2045 nan beats -inf -inf -inf accept=0.5

nan beating -inf does not sound good.

Now we are in a position to analyse the results.

octave --path oct/ --eval "plot_and_print"
error: 'bi_plot_quantiles' undefined near line 23 column 5

I previously found an Octave package(?) called OctBi so let’s create an .octaverc file which adds this to the path. We’ll also need to load the netcdf package which we previously installed.

addpath ("../OctBi-stable/inst")
pkg load netcdf
~/LibBi-stable/SIR-master $ octave --path oct/ --eval "plot_and_print"
warning: division by zero
warning: called from
    mean at line 117 column 7
    read_hist_simulator at line 47 column 11
    bi_read_hist at line 85 column 12
    bi_hist at line 63 column 12
    plot_and_print at line 56 column 5
warning: division by zero
warning: division by zero
warning: division by zero
warning: division by zero
warning: division by zero
warning: print.m: fig2dev binary is not available. Some output formats are not available.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
warning: opengl_renderer: x/y/zdata should have the same dimensions. Not rendering.
...
sh: pdfcrop: command not found

I actually get a chart from this, so some kind of success. This does not look like the chart in Moral and Murray (2015): the fitted number of infected patients looks a lot smoother and the “rates” parameters also vary in a much smoother manner. For reasons I haven’t yet investigated, it looks like over-fitting. Here are the charts in the paper.

# Bibliography

“Influenza in a boarding school.” 1978. British Medical Journal, March, 587.

Kermack, W. O., and A. G. McKendrick. 1927. “A Contribution to the Mathematical Theory of Epidemics.” Proceedings of the Royal Society of London Series A 115 (August): 700–721. doi:10.1098/rspa.1927.0118.

Moral, Pierre Del, and Lawrence M. Murray. 2015. “Sequential Monte Carlo with Highly Informative Observations.”

## April 30, 2016

### Dimitri Sabadie

# Porting a Haskell graphics framework to Rust (luminance)

I wanted to write this new article to discuss something important I’ve been doing for several weeks. It’s actually been a month that I’ve been working on luminance, but not in the usual way. Yeah, I’ve put my Haskell experience aside to… port luminance into Rust! There are numerous reasons why I decided to jump in, and I think it could be interesting for people to know about the differences I’ve been facing while porting the graphics library.

# You said Rust?

Yeah, Rust.
It’s a strong and static language aiming at system programming. Although it’s an imperative language, it has interesting functional conventions that caught my attention. Because I’m a haskeller and because Rust takes a lot from Haskell, learning it was a piece of cake, even though there were a few concepts I needed a few days to wrap my mind around. Having a strong C++11/14 experience, it wasn’t that hard though.

## How does it compare to Haskell?

The first thing that amazed me is the fact that it’s actually not that different from Haskell! Rust has a powerful type system – not as good as Haskell’s, but still – and uses immutability as the default semantics for bindings, which is great. For instance, the following is forbidden in Rust and would make rustc – the Rust compiler – freak out:

let a = "foo";
a = "bar"; // wrong; forbidden

Haskell works like that as well. However, you can introduce mutation with the mut keyword:

let mut a = "foo";
a = "bar"; // ok

Mutation should be used only when needed. In Haskell, we have the ST monad, used to introduce local mutation, or more drastically the IO monad. Under the hood, those two monads are actually almost the same type – with different guarantees though.

Rust is strict by default while Haskell is lazy. That means that Rust doesn’t know the concept of memory suspensions, or thunks – even though you can create them by hand if you want to. Thus, some algorithms are easier to implement in Haskell thanks to laziness, but some others will destroy your memory if you’re not careful enough – that’s a very common problem in Haskell due to thunks piling up in your stack / heap / whatever as you do extensive lazy computations. While it’s possible to remove those thunks by optimizing a Haskell program – profiling, strictness annotations, etc. – Rust doesn’t have that problem, because it gives you full access to the memory. And that’s a good thing if you need it. Rust exposes a lot of primitives to work with memory.
In contrast with Haskell, it doesn’t have a garbage collector, so you have to handle memory on your own. Well, not really. Rust has several very interesting concepts for handling memory in a very nice way. For instance, objects’ memory is held by scopes – which have lifetimes. RAII is a very well known use of that concept and is important in Rust. You can glue code to your type that will be run when an instance of that type dies, so that you can clean up memory and scarce resources. Rust has the concept of lifetimes, used to give names to scopes and specify how long an object reference should live. This is very powerful yet a bit complex to understand at first.

I won’t go into comparing the two languages, because it would require several articles and a lot of spare time I don’t really have. I’ll stick to what I’d like to tell you about: the Rust implementation of luminance.

# Porting luminance from Haskell to Rust

The first very interesting aspect of that port is the fact that it originated from a realization while refactoring some of my luminance Haskell code. Although it’s functional, stateless and type-safe, a typical use of luminance doesn’t really require laziness nor a garbage collector. And I don’t like using a tool – read: language – like a bazooka. Haskell is the most powerful language ever in terms of abstraction and expressivity over speed ratio, but all of that power comes with an overhead. Even though you’ll find folks around stating that Haskell is pretty okay for coding a video game, I think it will never compete with languages that are made for real-time computation or reactive programming. And don’t get me wrong: I’m sure you can write a decent video game in Haskell – I qualify myself as a Haskeller and I’ve not been writing luminance just for the joy of writing it. However, the way I use Haskell with luminance shouldn’t require all that overhead – and profiling proved me right: almost no GC was involved.
So… I looked into Rust, and discovered and learned the language in only three days. I think that’s because Rust, which is simpler than Haskell in terms of type system features and takes almost everything from Haskell, is, to me, an imperative Haskell. It’s like having Haskell minus a few abstraction tools – HKT (but they’ll come soon), GADTs, fundeps, kinds, constraints, etc. – plus total control of what’s happening. And I like that. A lot. A fucking lot.

Porting luminance to Rust wasn’t hard, as a Haskell codebase might map almost directly to Rust. I had to change a few things – for instance, Rust doesn’t have the concept of existential quantification as-is, which is used intensively in the Haskell version of luminance. But most Haskell modules map directly to their respective Rust modules. I changed the architecture of the files to have something clearer. I was working on loose coupling in Haskell for luminance, so I decided to directly introduce loose coupling into the Rust version. And it works like a charm.

So there are currently two packages available: luminance, which is the core API, exporting the whole general interface, and luminance-gl, an OpenGL 3.3 backend – though it will contain more backends as development goes on. The idea is that you need both dependencies to have access to luminance’s features.

I won’t say much today, because I’m working on a demoscene production using luminance. I want it to be proof that the framework is usable, works and acts as a first true example. Of course, the code will be open-source. The documentation is not complete yet, but I put some effort into documenting almost everything. You’ll find both packages here:

luminance-0.1.0
luminance-gl-0.1.0

I’ll write another article on how to use luminance as soon as possible! Keep the vibe!

## April 29, 2016

### Darcs

# darcs 2.12.0 release

The Darcs team is pleased to announce the release of Darcs 2.12.0.
### Downloading

One way of installing Darcs 2.12.0 is with stack:

$ stack install darcs-2.12.0

Another way is to install Darcs with cabal-install:

$ cabal update
$ cabal install darcs-2.12.0

You can also get the tarball from http://darcs.net/releases/darcs-2.12.0.tar.gz and build it by hand.

The 2.12 branch is also available as a darcs repository from
http://darcs.net/releases/branch-2.12

### What's new

#### Patch dependency graph export

The new command darcs show dependencies makes it possible to export the dependency graph of a repository (up to the latest tag, by default) as a Graphviz file.

#### whatsnew output and third-party VC frontend support

The flag whatsnew --machine-readable is a simplified version of whatsnew -s that is easier for third-party tools to parse. Darcs 2.12 adds conflict markers to the output of whatsnew when summarized (i.e., when used with the -s or --machine-readable flags, or via the darcs status alias). Thanks to this, Darcs support was reintroduced in meld 3.15.2.

#### improvements for Git repository imports

File moves are now converted to file move primitives, instead of a file delete and an add as before. This results in smaller Darcs repositories with a more understandable history. This change adds to other improvements and fixes that make Git import more practical.

#### repository Weak Hashes

The command darcs show repo now shows a hash that is the XOR of the hashes of all the patch metadata in the repository. Being an XOR, it does not depend on the patches' ordering. Also it is quite fast to calculate. This Weak Hash can be useful to quickly check whether two repositories of a single project have the same patches.
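The order-independence of an XOR combine is easy to see in a toy sketch (toyHash and the patch strings below are my illustrative stand-ins, not Darcs code):

```haskell
import Data.Bits (xor)
import Data.List (foldl')
import Data.Word (Word64)

-- A toy stand-in for a real patch-metadata hash (djb2-style; illustrative only).
toyHash :: String -> Word64
toyHash = foldl' (\h c -> h * 33 + fromIntegral (fromEnum c)) 5381

-- XOR-combine the per-patch hashes. Since XOR is commutative and
-- associative, reordering the patches does not change the result.
weakHash :: [String] -> Word64
weakHash = foldl' xor 0 . map toyHash
```

The trade-off is that this is only a weak check: XOR cancels duplicates, so it detects differing patch sets rather than proving repositories identical.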

# Hacking projects over the next few months

Life’s been really hectic lately, but I’ve been getting (slowly) back into working on my Haskell packages. In particular, since the switch from darcs to github I’ve started getting more comments and feature requests, which is nice. Over the next half-year or so, here’s what I’ll be up to in my free time between work on the dissertation and work on Hakaru:

containers — I’ve been appointed one of the new co-maintainers of our favorite venerable library. I prolly won’t be doing any major work until autumn (as mentioned when I was appointed), but I’ve had a number of conversations with David Feuer about where to take things in terms of cleaning up some old maintenance cruft.

bytestring-trie — A few years back I started reimplementing my tries to use Bagwell’s Array Mapped Tries in lieu of Okasaki’s Big-Endian Patricia Tries, but then got stalled because life. I’ve started up on it again, and it’s just about ready to be released after a few more tweaks. Also, now that I’m working on it again I can finally clear out the backlog of API requests (sorry folks!).

exact-combinatorics — A user recently pointed me towards a new fast implementation of factorial that has been making waves. It’s not clear just yet whether it’ll be faster than the current implementation, but it should be easy enough to get it going and run some benchmarks.
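For context, one well-known fast approach (a sketch of the general divide-and-conquer idea, not necessarily the implementation the user mentioned) multiplies the range in a balanced fashion so the intermediate products stay similar in size:

```haskell
-- Divide-and-conquer product of [lo .. hi]. Balanced splitting keeps
-- multiplications between numbers of similar magnitude, which is where
-- the speedup over the naive left fold comes from for big-integer types.
rangeProduct :: Integer -> Integer -> Integer
rangeProduct lo hi
  | lo > hi   = 1
  | lo == hi  = lo
  | otherwise = rangeProduct lo mid * rangeProduct (mid + 1) hi
  where mid = (lo + hi) `div` 2

fastFactorial :: Integer -> Integer
fastFactorial n = rangeProduct 1 n
```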

unification-fd — This one isn’t hacking so much as dissemination. I have a backlog of queries about why things are the way they are, which I need to address; and I’ve been meaning to continue the tutorial about how to use this library for your unification needs.

logfloat — We’ve been using this a lot in Hakaru, and there are a few performance tweaks I think I can add. The main optimization area is trying to minimize the conditionals for detecting edge cases. The biggest issue has just been coming up with some decent benchmarks. The problem, of course, is that most programs making use of logfloats do a lot of other work too so it can be tricky to detect the actual effect of changes. I think this is something Hakaru can help a lot with since it makes it easy to construct all sorts of new models.
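The kind of edge-case branching at issue can be seen in a textbook log-domain addition (a generic sketch of the technique, not logfloat's actual code): one conditional guards the log 0 case so that exp never produces a NaN.

```haskell
-- Add two numbers represented by their logarithms.
-- The single guard handles log 0 (negative infinity); every other edge
-- case falls out of the arithmetic, since exp (lo - hi) <= 1 by construction.
logAdd :: Double -> Double -> Double
logAdd x y
  | isInfinite hi && hi < 0 = hi                 -- both inputs are log 0
  | otherwise               = hi + log (1 + exp (lo - hi))
  where hi = max x y
        lo = min x y
```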

# CodeWorld & Summer of Haskell 2016

As most Haskell community members know, Haskell was turned down by Google Summer of Code this year, and has instead been seeking to continue the tradition with Summer of Haskell, funded by smaller donations. I’m happy to announce that CodeWorld will be part of Summer of Haskell! I’ve donated to support one student working specifically on CodeWorld.

Are you a student, and interested in helping to build a platform for education in expressive mathematics and computer science? Want to work on a project with immediate impact teaching Haskell in multiple schools?  Please propose a project at https://summer.haskell.org/ between now and May 6th.

### Project Ideas

A great source of CodeWorld project ideas is the bug tracker.  Less well-defined projects are tagged as proposals, while more defined features are tagged as enhancements.  A few big ones to think about are:

• Export of CodeWorld projects as mobile applications
• Better and more language-aware editor support for Haskell in CodeMirror.
• Implementing constructive geometry
• Building social, gallery, and/or showcase features to help student work be more visible.
• Building a purely functional block-based programming environment.
• Implementing visual tools to help students understand substitution, list comprehensions, and more.

I look forward to working with someone this summer building something cool!

By the way, HUGE thanks to Edward Kmett and other Haskell.org committee members for making this happen this year!

# [vcppbmdn] Rubik's cube combinations

We implement in Haskell the formulae at http://www.speedcubing.com/chris/cubecombos.html to compute the number of combinations of an NxN Rubik's cube, as well as the supercube (for which center facet orientation matters) and super-supercube (also called a "real" cube, for which inner cubie orientations also matter) variations.

import Math.Combinatorics.Exact.Factorial (factorial)

div_exact :: Integer -> Integer -> Integer
div_exact x y = case divMod x y of
  (q, 0) -> q
  _      -> error "division had a remainder"

common :: Integer -> Integer -> Integer -> Integer
common i den_base n =
  let e1, e2, numerator, denominator :: Integer
      e1 = div (n*n - 2*n) 4
      e2 = div ((n-2)^2) 4
      numerator = (24 * 2^i * factorial 12)^(mod n 2) * factorial 7 * 3^6 * (factorial 24)^e1
      denominator = den_base ^ e2
  in div_exact numerator denominator

-- https://oeis.org/A075152
cube :: Integer -> Integer
cube = common 10 ((factorial 4)^6)

-- https://oeis.org/A080660
supercube :: Integer -> Integer
supercube = common 21 2

-- https://oeis.org/A080659
super_supercube :: Integer -> Integer
super_supercube n =
  let e1, e2, e4 :: Integer
      e1 = div_exact (n^3 - 4*n + (mod n 2)*(12 - 9*n)) 24
      e2 = div (n-2) 2 + mod n 2
      e4 = mod n 2 * div n 2
  in (div_exact (factorial 24) 2)^e1 * 24^e2 * (factorial 7 * 3^6)^(div n 2) * (factorial 12 * 2^21)^e4

Incidentally, the output numbers factor easily. The largest prime factor in them is 23. One could do the calculations over a factor base of primes from 2 to 23 to get the output directly in compact factored form.
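For instance (factorBase and factorOver are my illustrative helpers, not part of the post's code), trial division over that factor base recovers the factored form directly; the standard 3x3x3 count 43252003274489856000 factors as 2^27 * 3^14 * 5^3 * 7^2 * 11:

```haskell
-- Trial division over a fixed factor base of primes from 2 to 23.
factorBase :: [Integer]
factorBase = [2, 3, 5, 7, 11, 13, 17, 19, 23]

-- Return (prime, exponent) pairs for the primes that actually divide n.
factorOver :: [Integer] -> Integer -> [(Integer, Int)]
factorOver ps n = [ (p, e) | p <- ps, let e = multiplicity p n, e > 0 ]
  where
    multiplicity p m
      | m `mod` p == 0 = 1 + multiplicity p (m `div` p)
      | otherwise      = 0
```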

Next, we consider the Megaminx, and higher odd-order variants (so not Flowerminx / Kilominx):

-- http://michael-gottlieb.blogspot.com/2008/05/number-of-positions-of-generalized.html
-- n=1 megaminx; n=2 gigaminx; etc.
megaminx :: Integer -> Integer
megaminx n = div_exact
  (factorial 30 * factorial 20 * (factorial 60)^(n^2-1) * (factorial 5)^(12*n) * 2^28 * 3^19)
  ((factorial 5)^(12*n^2) * 2^n)

# Quantifiers in type theory

All this stuff is "well known", but I want to put it out there for folks who may not have encountered it, or not encountered it all together in one picture.

The Damas–Hindley–Milner type system (i.e., the type system that Algorithm W is inferring types for) is propositional logic extended with rank-1 second-order universal quantifiers. It is interesting because it is so particularly stable with respect to inference, decidability, etc. That is, we can come up with many other algorithms besides Algorithm W and they enjoy nice properties like the fact that adding type signatures won't cause inference to fail. (It's worth noting that Algorithm W is DEXPTIME-complete; so while in practice it's often linear time, for pathological inputs it can take exponentially long. However, if we put a constant bound on the depth of nested let-bindings, then the upper bound becomes polynomial.)

The extension of DHM with rank-1 second-order existential quantifiers is strictly more powerful. It is interesting because it allows unrestricted use of both of the quantifiers in prenex position; thus, it is the limit/top of the alternating quantifier hierarchy (à la the arithmetical hierarchy) that starts with DHM. Surely there are other interesting properties here, but this system is understudied relative to the ones above and below. Edit: Although GHC gets by with encoding existentials away, it's worth noting that MLF allows existentials where the unpacking is implicit rather than requiring an "unseal" or case eliminator (Leijen 2006); and also that UHC does in fact offer first-class existentials (Dijkstra 2005).

The extension with rank-2 second-order universals (i.e., where the universal quantifier can appear to the left of one function arrow) is strictly more powerful still. Here we can encode rank-1 existentials, but my point in this whole post is to point out that rank-1 existentials themselves are strictly weaker than the rank-2 universals it takes to encode them! Also, one little-known fact: this type system is interesting because it is the last one in this progression where type inference is decidable. The decidability of rank-2 universal quantification is part of the reason why GHC distinguishes between -XRank2Types vs -XRankNTypes. Alas, although inference is decidable —and thus of mathematical interest— it is not decidable in the same robust way that DHM is. That is, if we care about human factors like good error messages or not breaking when the user adds type signatures, then we don't get those properties here. Still, the fact that this system is at the cusp of decidable inference is important to know. Edit: Also of interest, this system has the same typeable terms as simply-typed λ-calculus with rank-2 intersection types, and the type inference problem here is fundamentally DEXPTIME-complete (Jim 1995).
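The rank-2 encoding of existentials mentioned above can be sketched concretely in GHC Haskell (Obj, mkObj and describe are my illustrative names): a package of a hidden value with an operation on it, roughly ∃a. (a, a → String), becomes a function that hands its contents to a rank-2 polymorphic consumer.

```haskell
{-# LANGUAGE RankNTypes #-}

-- ∃a. (a, a -> String), encoded with a rank-2 universal: the consumer
-- must work for every a, so the concrete type stays hidden.
type Obj = forall r. (forall a. a -> (a -> String) -> r) -> r

mkObj :: a -> (a -> String) -> Obj
mkObj x render k = k x render

-- The only thing a client can do is apply the packed function to the
-- packed value; the existential witness never escapes.
describe :: Obj -> String
describe o = o (\x render -> render x)
```

Note that the quantifier over a sits to the left of a function arrow in Obj, which is exactly the rank-2 position the paragraph above refers to.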

Things keep alternating back and forth between existentials and universals of each rank; so far as I'm aware, none of these systems are of any particular interest until we hit the limit: rank-ω (aka: rank-N) second-order quantification. This type system is often called "System F", but that's a misnomer. It is important to differentiate between the syntactic system (i.e., actual System F) we're inferring types for, vs the type system (aka: propositional logic with second-order quantifiers) in which the inferred types live. That is, we can perfectly well have a syntactic system which doesn't have explicit type abstractions/applications but for which we still ascribe rank-ω types. It so happens that the type inference problem is undecidable for that syntactic system, but it was already undecidable way back at rank-3 so the undecidability isn't particularly novel.

# Data is Code


The title of this post is a play on the Lisp aphorism: "Code is Data". In the Lisp world everything is data; code is just another data structure that you can manipulate and transform.

However, you can also go to the exact opposite extreme: "Data is Code"! You can make everything into code and implement data structures in terms of code.

You might wonder what that even means: how can you write any code if you don't have any primitive data structures to operate on? Fascinatingly, Alonzo Church discovered a long time ago that if you have the ability to define functions you have a complete programming language. "Church encoding" is the technique named after his insight that you could transform data structures into functions.
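As a small taste of the idea (in Haskell rather than annah; these names are mine), here are Church booleans and numerals, where a piece of data is literally a function that folds over itself:

```haskell
{-# LANGUAGE RankNTypes #-}

-- A Church boolean chooses between its two arguments.
type ChurchBool = forall r. r -> r -> r

ctrue, cfalse :: ChurchBool
ctrue  t _ = t
cfalse _ f = f

-- A Church numeral applies a function n times to a starting value.
type ChurchNat = forall r. (r -> r) -> r -> r

czero :: ChurchNat
czero _ z = z

csucc :: ChurchNat -> ChurchNat
csucc n s z = s (n s z)

-- "Run" the encoding to recover an ordinary Int.
toInt :: ChurchNat -> Int
toInt n = n (+ 1) 0
```

There are no constructors anywhere here; pattern matching is replaced by function application, which is the whole point of the encoding.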

This post is partly a Church encoding tutorial and partly an announcement for my newly released annah compiler which implements the Church encoding of data types. Many of the examples in this post are valid annah code that you can play with. Also, to be totally pedantic annah implements Boehm-Berarducci encoding which you can think of as the typed version of Church encoding.

This post assumes that you have basic familiarity with lambda expressions. If you do not, you can read the first chapter (freely available) of Haskell Programming from First Principles, which does an excellent job of teaching lambda calculus.

If you would like to follow along with these examples, you can download and install annah by following these steps:

• Install the stack tool
• Create the following stack.yaml file

$ cat > stack.yaml
resolver: lts-5.13
packages: []
extra-deps:
- annah-1.0.0
- morte-1.6.0
<Ctrl-D>

• Run stack setup
• Run stack install annah
• Add the installed executable to your $PATH

#### Lambda calculus

In the untyped lambda calculus, you only have lambda expressions at your disposal and nothing else. For example, here is how you encode the identity function:

λx → x

That's a function that takes one argument and returns the same argument as its result.

We call this "abstraction" when we introduce a variable using the Greek lambda symbol and we call the variable that we introduce a "bound variable". We can then use that "bound variable" anywhere within the "body" of the lambda expression.

+-- Abstraction
|
|+-- Bound variable
||
vv
λx → x
     ^
     |
     +-- Body of lambda expression

Any expression that begins with a lambda is an anonymous function which we can apply to another expression. For example, we can apply the identity function to itself like this:

(λx → x) (λy → y)

-- β-reduction
= λy → y

We call this "application" when we supply an argument to an anonymous function.

We can define a function of multiple arguments by nested "abstractions":

λx → λy → x

The above code is an anonymous function that returns an anonymous function. For example, if you apply the outermost anonymous function to a value, you get a new function:

(λx → λy → x) 1

-- β-reduce
λy → 1

... and if you apply the lambda expression to two values, you return the first value:

(λx → λy → x) 1 2

-- β-reduce
(λy → 1) 2

-- β-reduce
1

So our lambda expression behaves like a function of two arguments, even though it's really a function of one argument that returns a new function of one argument. We call this "currying" when we simulate functions of multiple arguments using functions of one argument. We will use this trick because we will be programming in a lambda calculus that only supports functions of one argument.

#### Typed lambda calculus

In the typed lambda calculus you have to specify the types of all function arguments, so you have to write something like this:

λ(x : a) → x

... where a is the type of the bound variable named x.

However, the above function is still not valid because we haven't specified what the type a is. In theory, we could specify a type like Int:

λ(x : Int) → x

... but the premise of this post was that we could program without relying on any built-in data types so Int is out of the question for this experiment.

Fortunately, some typed variations of lambda calculus (most notably: "System F") let you introduce the type named a as yet another function argument:

λ(a : *) → λ(x : a) → x

This is called "type abstraction". Here the * is the "type of types" and is a universal constant that is always in scope, so we can always introduce new types as function arguments this way.

The above function is the "polymorphic identity function", meaning that this is the typed version of the identity function that still preserves the ability to operate on any type.
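As a loose illustration (and only that: Python is untyped, so this is not real System F), we can mimic type abstraction in Python by passing the type as an ordinary argument that the function body simply ignores:

```python
# Sketch of the polymorphic identity function with an explicit
# "type" argument, mimicking λ(a : *) → λ(x : a) → x.
# Python performs no checking here; the type is just another value.
polymorphic_id = lambda a: lambda x: x

id_int = polymorphic_id(int)       # "specialization" to int
print(id_int(42))                  # 42
print(polymorphic_id(bool)(True))  # True
```

The point of the sketch is only the shape: the type argument comes first and the value argument second, exactly as in the typed lambda expression above.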

If we had built-in types like Int we could apply our polymorphic function to the type just like any other argument, giving back an identity function for a specific type:

(λ(a : *) → λ(x : a) → x) Int

-- β-reduction
λ(x : Int) → x

This is called "type application" or (more commonly) "specialization". A "polymorphic" function is a function that takes a type as a function argument and we "specialize" a polymorphic function by applying the function to a specific type argument.

However, we are forgoing built-in types like Int, so what other types do we have at our disposal?

Well, every lambda expression has a corresponding type. For example, the type of our polymorphic identity function is:

∀(a : *) → ∀(x : a) → a

You can read the type as saying:

• this is a function of two arguments, one argument per "forall" (∀) symbol
• the first argument is named a and a is a type
• the second argument is named x and the type of x is a
• the result of the function must be a value of type a

This type uniquely determines the function's implementation. To be totally pedantic, there is exactly one implementation up to extensional equality of functions. Since this function has to work for any possible type a there is only one way to implement the function. We must return x as the result, since x is the only value available of type a.

Passing around types as values and function arguments might seem a bit strange to most programmers since most languages either:

• do not use types at all

Example: Javascript

// The polymorphic identity function in Javascript
function id(x) {
    return x
}

// Example use of the function
id(true)
• do use types, but they hide type abstraction and type application from the programmer through the use of "type inference"

Example: Haskell

-- The polymorphic identity function in Haskell
id x = x

-- Example use of the function
id True
• use a different syntax for type abstraction/application versus ordinary abstraction and application

Example: Scala

// The polymorphic identity function in Scala
def id[A](x : A) = x

// Example use of the function
// Note: Scala lets you omit the [Boolean] here thanks
//       to type inference but I'm making the type
//       application explicit just to illustrate that
//       the syntax is different from normal function
//       application
id[Boolean](true)

For the purpose of this post we will program with explicit type abstraction and type application so that there is no magic or hidden machinery.

So, for example, suppose that we wanted to apply the typed, polymorphic identity function to itself. The untyped version was this:

(λx → x) (λy → y)

... and the typed version is this:

(λ(a : *) → λ(x : a) → x)
    (∀(b : *) → ∀(y : b) → b)
    (λ(b : *) → λ(y : b) → y)

-- β-reduction
= (λ(x : ∀(b : *) → ∀(y : b) → b) → x)
      (λ(b : *) → λ(y : b) → y)

-- β-reduction
= λ(b : *) → λ(y : b) → y

So we can still apply the identity function to itself, but it's much more verbose. Languages with type inference automate this sort of tedious work for you while still giving you the safety guarantees of types. For example, in Haskell you would just write:

(\x -> x) (\y -> y)

... and the compiler would figure out all the type abstractions and type applications for you.

Exercise: Haskell provides a const function defined like this:

const :: a -> b -> a
const x y = x

Translate the const function to a typed and polymorphic lambda expression in System F (i.e. using explicit type abstractions).

#### Boolean values

Lambda expressions are the "code", so now we need to create "data" from "code".

One of the simplest pieces of data is a boolean value, which we can encode using typed lambda expressions. For example, here is how you implement the value True:

λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True

Note that the names have no significance at all. I could have equally well written the expression as:

λ(a : *) → λ(x : a) → λ(y : a) → x

... which is "α-equivalent" to the previous version (i.e. equivalent up to renaming of variables).

We will save the above expression to a file named ./True in our current directory. We'll see why shortly.

We can either save the expression using Unicode characters:

$ cat > ./True
λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True
<Ctrl-D>

... or using ASCII, replacing each lambda (i.e. λ) with a backslash (i.e. \) and replacing each arrow (i.e. →) with an ASCII arrow (i.e. ->):

$ cat > ./True
\(Bool : *) -> \(True : Bool) -> \(False : Bool) -> True
<Ctrl-D>

... whichever you prefer. For the rest of this tutorial I will use Unicode since it's easier to read.

Similarly, we can encode False by just changing our lambda expression to return the third argument named False instead of the second argument named True. We'll name this file ./False:

$ cat > ./False
λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → False
<Ctrl-D>

What's the type of a boolean value? Well, both the ./True and ./False files have the same type, which we shall call ./Bool:

$ cat > ./Bool
∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool
<Ctrl-D>

... and if you are following along with ASCII you can replace each forall symbol (i.e. ∀) with the word forall:

$ cat > ./Bool
forall (Bool : *) -> forall (True : Bool) -> forall (False : Bool) -> Bool
<Ctrl-D>

We are saving these terms and types to files because we can use the annah compiler to work with any lambda expression or type saved as a file. For example, I can use the annah compiler to verify that the file ./True has type ./Bool:

$ annah
-- Read this as: "./True has type ./Bool"
./True : ./Bool
<Ctrl-D>
./True

$ echo $?
0

If the expression type-checks then annah will just compile the expression to lambda calculus (by removing the unnecessary type annotation in this case) and return a zero exit code. However, if the expression does not type-check:

$ annah
./True : ./True
annah:
Expression: ∀(x : λ(Bool : *) → λ(True : Bool) → λ(False : Bool)
→ True) → λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True

Error: Invalid input type

Type: λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True

$ echo $?
1

... then annah will throw an exception and return a non-zero exit code. In this case annah complains that the ./True on the right-hand side of the type annotation is not a valid type.

The last thing we need is a function that can consume values of type ./Bool, like an ./if function:

$ cat > ./if
λ(x : ./Bool ) → x
--          ^
--          |
--          +-- Note the space.  Filenames must end with a space
<Ctrl-D>

The definition of ./if is blindingly simple: ./if is just the identity function on ./Bools!

To see why this works, let's see what the type of ./if is. We can ask for the type of any expression by feeding the expression to the morte compiler via standard input:

$ morte < ./if
∀(x : ∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool) →
∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool

λ(x : ∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool) → x

morte is a lambda calculus compiler installed alongside annah, and annah is a higher-level interface to the morte language. By default, the morte compiler will:

• resolve all file references (transitively, if necessary)
• type-check the expression
• optimize the expression
• write the expression's type to standard error as the first line of output
• write the optimized expression to standard output as the last line of output

In this case we only cared about the type, so we could have equally well just asked the morte compiler to resolve the expression and infer its type:

$ morte resolve < ./if | morte typecheck
∀(x : ∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool) →
∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool

The above type is the same thing as:

∀(x : ./Bool ) → ./Bool

If you don't believe me you can prove this to yourself by asking morte to resolve the type:

$ echo "∀(x : ./Bool ) → ./Bool" | morte resolve
∀(x : ∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool) →
∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool

However, the type will make the most sense if you only expand out the second ./Bool in the type but leave the first ./Bool alone:

./Bool → ∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool

You can read this type as saying that the ./if function takes four arguments:

• the first argument is the ./Bool that we want to branch on (i.e. ./True or ./False)
• the second argument is the result type of our ./if expression
• the third argument is the result we return if the ./Bool evaluates to ./True (i.e. the "then" branch)
• the fourth argument is the result we return if the ./Bool evaluates to ./False (i.e. the "else" branch)

For example, this Haskell code:

if True
then False
else True

... would translate to this Annah code:

$ annah
./if ./True
    ./Bool   -- The type of the result
    ./False  -- The then branch
    ./True   -- The else branch
<Ctrl-D>
./if  ./True  ./Bool  ./False  ./True

annah does not evaluate the expression. annah only translates the expression into Morte code (and the expression is already valid Morte code) and type-checks the expression. If you want to evaluate the expression you need to run the expression through the morte compiler, too:

$ morte
./if ./True
    ./Bool   -- The type of the result
    ./False  -- The then branch
    ./True   -- The else branch
<Ctrl-D>
∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool

λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → False

morte deduces that the expression has type ./Bool and the expression evaluates to ./False. morte evaluates the expression by resolving all references and repeatedly applying β-reduction. This is what happens under the hood:

./if ./True ./Bool ./False ./True

-- Resolve the ./if reference
= (λ(x : ./Bool ) → x) ./True ./Bool ./False ./True

-- β-reduce
= ./True ./Bool ./False ./True

-- Resolve the ./True reference
= (λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True) ./Bool ./False ./True

-- β-reduce
= (λ(True : ./Bool ) → λ(False : ./Bool ) → True) ./False ./True

-- β-reduce
= (λ(False : ./Bool ) → ./False ) ./True

-- β-reduce
= ./False

-- Resolve the ./False reference
= λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → False

The above sequence of steps is a white lie: the true order of steps is actually different, but equivalent.

The ./if function was not even necessary because every value of type ./Bool is already a "pre-formed if expression". That's why ./if is just the identity function on ./Bools. You can delete the ./if from the above example and the code will still work.

Now let's define the not function and save the function to a file:

$ annah > ./not
λ(b : ./Bool ) →
    ./if b
        ./Bool
        ./False  -- If b is ./True  then return ./False
        ./True   -- If b is ./False then return ./True
<Ctrl-D>

We can now use this file like an ordinary function:

$ morte
./not ./False
<Ctrl-D>
∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool

λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True

$ morte
./not ./True
<Ctrl-D>
∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool

λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → False

Notice how ./not ./False returns ./True and ./not ./True returns ./False.

Similarly, we can define an and function and an or function:

$ annah > and
λ(x : ./Bool ) → λ(y : ./Bool ) →
    ./if x
        ./Bool
        y        -- If x is ./True  then return y
        ./False  -- If x is ./False then return ./False
<Ctrl-D>

$ annah > or
λ(x : ./Bool ) → λ(y : ./Bool ) →
    ./if x
        ./Bool
        ./True   -- If x is ./True  then return ./True
        y        -- If x is ./False then return y
<Ctrl-D>

... and use them:

$ morte
./and ./True ./False
<Ctrl-D>
∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool

λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → False

$ morte
./or ./True ./False
<Ctrl-D>
∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool

λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True

We started with nothing but lambda expressions, but still managed to implement:

• a ./Bool type
• a ./True value of type ./Bool
• a ./False value of type ./Bool
• ./if, ./not, ./and, and ./or functions

... and we can do real computation with them! In other words, we've modeled boolean data types entirely as code.

Exercise: Implement an xor function

#### Natural numbers

You might wonder what other data types you can implement in terms of lambda calculus. Fortunately, you don't have to wonder because the annah compiler will actually compile data type definitions to lambda expressions for you.

For example, suppose we want to define a natural number type encoded using Peano numerals. We can write:

$ annah types
type Nat
data Succ (pred : Nat)
data Zero
fold foldNat
<Ctrl-D>

You can read the above datatype specification as saying:

• Define a type named Nat ...
• ... with a constructor named Succ with one field named pred of type Nat ...
• ... with another constructor named Zero with no fields ...
• ... and a fold named foldNat

annah then translates the datatype specification into the following files and directories:

+-- ./Nat.annah        -- annah implementation of Nat
|
+-- ./Nat
    |
    +-- @              -- morte implementation of Nat
    |                  --
    |                  -- If you import the ./Nat directory this
    |                  -- file is imported instead
    |
    +-- Zero.annah     -- annah implementation of Zero
    |
    +-- Zero           -- morte implementation of Zero
    |
    +-- Succ.annah     -- annah implementation of Succ
    |
    +-- Succ           -- morte implementation of Succ
    |
    +-- foldNat.annah  -- annah implementation of foldNat
    |
    +-- foldNat        -- morte implementation of foldNat

Let's see how the Nat type is implemented:

∀(Nat : *) → ∀(Succ : ∀(pred : Nat) → Nat) → ∀(Zero : Nat) → Nat

All Boehm-Berarducci-encoded datatypes are encoded as substitution functions, including ./Nat. Any value of type ./Nat is a function that takes three arguments that we will substitute into our natural number expression:

• The first argument replaces every occurrence of the Nat type
• The second argument replaces every occurrence of the Succ constructor
• The third argument replaces every occurrence of the Zero constructor

This will make more sense if we walk through a specific example. First, we will build the number 3 using the ./Nat/Succ and ./Nat/Zero constructors:

$ morte
./Nat/Succ (./Nat/Succ (./Nat/Succ ./Nat/Zero ))
<Ctrl-D>
∀(Nat : *) → ∀(Succ : ∀(pred : Nat) → Nat) → ∀(Zero : Nat) → Nat

λ(Nat : *) → λ(Succ : ∀(pred : Nat) → Nat) → λ(Zero : Nat) →
Succ (Succ (Succ Zero))

Now suppose that we want to compute whether or not our natural number is even. The only catch is that we must limit ourselves to substitution when computing even. We have to figure out something that we can substitute in place of the Succ constructors and something that we can substitute in place of the Zero constructors that will then evaluate to ./True if the natural number is even and ./False otherwise.

One substitution that works is the following:

• Replace every Zero with ./True (because Zero is even)
• Replace every Succ with ./not (because Succ alternates between even and odd)

So in other words, if we began with this:

./Nat/Succ (./Nat/Succ (./Nat/Succ ./Nat/Zero ))

... and we substitute ./Nat/Succ with ./not and ./Nat/Zero with ./True:

./not (./not (./not ./True ))

... then the expression will reduce to ./False.

Let's prove this by saving the above number to a file named ./three:

$ morte > ./three
./Nat/Succ (./Nat/Succ (./Nat/Succ ./Nat/Zero ))
<Ctrl-D>

$ cat three
λ(Nat : *) → λ(Succ : ∀(pred : Nat) → Nat) → λ(Zero : Nat) →
Succ (Succ (Succ Zero))

The first thing we need to do is to replace the Nat with ./Bool:

./three ./Bool

-- Resolve ./three
= (λ(Nat : *) → λ(Succ : ∀(pred : Nat) → Nat) → λ(Zero : Nat) →
      Succ (Succ (Succ Zero))
  ) ./Bool

-- β-reduce
= λ(Succ : ∀(pred : ./Bool ) → ./Bool ) → λ(Zero : ./Bool ) →
      Succ (Succ (Succ Zero))

Now the next two arguments have exactly the right type for us to substitute in ./not and ./True. The argument named Succ is now a function of type ∀(pred : ./Bool ) → ./Bool, which is the same type as ./not. The argument named Zero is now a value of type ./Bool, which is the same type as ./True. This means that we can proceed with the next two arguments:

./three ./Bool ./not ./True

-- Resolve ./three
= (λ(Nat : *) → λ(Succ : ∀(pred : Nat) → Nat) → λ(Zero : Nat) →
      Succ (Succ (Succ Zero))
  ) ./Bool ./not ./True

-- β-reduce
= (λ(Succ : ∀(pred : ./Bool ) → ./Bool ) → λ(Zero : ./Bool ) →
      Succ (Succ (Succ Zero))
  ) ./not ./True

-- β-reduce
= (λ(Zero : ./Bool ) → ./not (./not (./not Zero))) ./True

-- β-reduce
= ./not (./not (./not ./True ))

The result is exactly what we would have gotten if we took our original expression:

./Nat/Succ (./Nat/Succ (./Nat/Succ ./Nat/Zero ))

... and replaced ./Nat/Succ with ./not and replaced ./Nat/Zero with ./True.

Let's verify that this works by running the code through the morte compiler:

$ morte
./three ./Bool ./not ./True
<Ctrl-D>
∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool

λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → False

morte computes that the number ./three is not even, returning ./False. We can even go a step further and save an ./even function to a file:

$ annah > even
λ(n : ./Nat ) →
    n ./Bool
        ./not   -- Replace each ./Nat/Succ with ./not
        ./True  -- Replace each ./Nat/Zero with ./True
<Ctrl-D>

... and use our newly-formed ./even function:

$ morte
./even ./three
<Ctrl-D>
∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool

λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → False

$ morte
./even ./Nat/Zero
<Ctrl-D>
∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool

λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True
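The same substitution trick transcribes directly to Python. Here is a sketch of Church numerals and the even check, with the type abstractions dropped; all helper names here are invented for illustration:

```python
# Church numerals in Python, mirroring ./Nat: a number is a function
# that substitutes for its own Succ and Zero constructors.
zero = lambda s: lambda z: z
succ = lambda n: lambda s: lambda z: s(n(s)(z))

three = succ(succ(succ(zero)))

# Church booleans and not, as in the boolean section
true  = lambda t: lambda f: t
false = lambda t: lambda f: f
_not  = lambda b: b(false)(true)

# even: replace each Succ with not and the Zero with true
even = lambda n: n(_not)(true)

print(even(three)(True)(False))  # False: three is odd
print(even(zero)(True)(False))   # True:  zero is even
```

`even(three)` literally computes `_not(_not(_not(true)))`, exactly the substitution walked through above.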

The annah compiler actually provides direct support for natural number literals, so you can also just write:

$ annah | morte
./even 100
<Ctrl-D>
∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool

λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True

What about addition? How do we add two numbers using only substitution? Well, one way we can add two numbers, m and n, is to substitute each ./Nat/Succ in m with ./Nat/Succ (i.e. keep them the same) and substitute the ./Nat/Zero with n. In other words:

$ annah > plus
λ(m : ./Nat ) → λ(n : ./Nat ) →
    m ./Nat         -- The result will still be a ./Nat
        ./Nat/Succ  -- Replace each ./Nat/Succ with ./Nat/Succ
        n           -- Replace each ./Nat/Zero with n
<Ctrl-D>

Let's verify that this works:

$ annah | morte
./plus 2 2
<Ctrl-D>
∀(Nat : *) → ∀(Succ : ∀(pred : Nat) → Nat) → ∀(Zero : Nat) → Nat

λ(Nat : *) → λ(Succ : ∀(pred : Nat) → Nat) → λ(Zero : Nat) →
Succ (Succ (Succ (Succ Zero)))

We get back a Church-encoded 4! What happened under the hood was the following substitutions:

./plus 2 2

-- Resolve ./plus
= (λ(m : ./Nat ) → λ(n : ./Nat ) → m ./Nat ./Nat/Succ n) 2 2

-- β-reduce
= (λ(n : ./Nat ) → 2 ./Nat ./Nat/Succ n) 2

-- β-reduce
= 2 ./Nat ./Nat/Succ 2

-- Definition of 2
= (./Nat/Succ (./Nat/Succ ./Nat/Zero )) ./Nat ./Nat/Succ 2

-- Resolve and β-reduce the definition of 2 (multiple steps)
= (λ(Nat : *) → λ(Succ : ∀(pred : Nat) → Nat) → λ(Zero : Nat) →
      Succ (Succ Zero)
  ) ./Nat ./Nat/Succ 2

-- β-reduce
= (λ(Succ : ∀(pred : ./Nat ) → ./Nat ) → λ(Zero : ./Nat ) →
      Succ (Succ Zero)
  ) ./Nat/Succ 2

-- β-reduce
= (λ(Zero : ./Nat ) → ./Nat/Succ (./Nat/Succ Zero)) 2

-- β-reduce
= ./Nat/Succ (./Nat/Succ 2)

-- Definition of 2
= ./Nat/Succ (./Nat/Succ (./Nat/Succ (./Nat/Succ ./Nat/Zero )))

-- Resolve and β-reduce (multiple steps)
= λ(Nat : *) → λ(Succ : ∀(pred : Nat) → Nat) → λ(Zero : Nat) →
      Succ (Succ (Succ (Succ Zero)))

So we can encode natural numbers in lambda calculus, albeit very inefficiently! There are some tricks that we can use to greatly speed up both the time complexity and constant factors, but it will never be competitive with machine arithmetic. This is more of a proof of concept that you can model arithmetic purely in code.

Exercise: Implement a function which multiplies two natural numbers

#### Data types

annah also lets you define "temporary" data types that scope over a given expression. In fact, that's how Nat was implemented. You can look at the corresponding *.annah files to see how each type and term is defined in annah before conversion to morte code.

For example, here is how the Nat type is defined in annah:

$ cat Nat.annah
type Nat
data Succ (pred : Nat)
data Zero
fold foldNat
in   Nat

The first four lines are identical to what we wrote when we invoked the annah types command from the command line. We can use the exact same data type specification to create a scoped expression that can reference the type and data constructors we specified.

When we run this expression through annah we get back the Nat type:

$ annah < Nat.annah
∀(Nat : *) → ∀(Succ : ∀(pred : Nat) → Nat) → ∀(Zero : Nat) → Nat

You can use these scoped datatype declarations to quickly check how various datatypes are encoded without polluting your current working directory. For example, I can ask annah how the type Maybe is encoded in lambda calculus:

$ annah
λ(a : *) →
    type Maybe
    data Just (x : a)
    data Nothing
    -- You can also leave out this fold if you don't use it
    fold foldMaybe
    in   Maybe
<Ctrl-D>
λ(a : *) → ∀(Maybe : *) → ∀(Just : ∀(x : a) → Maybe) →
∀(Nothing : Maybe) → Maybe

A Maybe value is just another substitution function. You provide one branch that you substitute for Just and another branch that you substitute for Nothing. For example, the Just constructor always substitutes in the first branch and ignores the Nothing branch that you supply:

$ annah
λ(a : *) →
    type Maybe
    data Just (x : a)
    data Nothing
    in   Just
<Ctrl-D>
λ(a : *) → λ(x : a) → λ(Maybe : *) → λ(Just : ∀(x : a) → Maybe)
→ λ(Nothing : Maybe) → Just x

Vice versa, the Nothing constructor substitutes in the Nothing branch that you supply and ignores the Just branch:

$ annah
λ(a : *) →
    type Maybe
    data Just (x : a)
    data Nothing
    in   Nothing
<Ctrl-D>
λ(a : *) → λ(Maybe : *) → λ(Just : ∀(x : a) → Maybe) →
λ(Nothing : Maybe) → Nothing

Notice how we've implemented Maybe and Just purely using functions. This implies that any language with functions can encode Maybe, like Python!

Let's translate the above definition of Just and Nothing to the equivalent Python code. The only difference is that we delete the type abstractions because they are not necessary in Python:

def just(x):
  def f(just, nothing):
    return just(x)
  return f

def nothing():
  def f(just, nothing):
    return nothing
  return f

We can similarly translate Haskell-style pattern matching like this:

example :: Maybe Int -> IO ()
example m = case m of
    Just n  -> print n
    Nothing -> return ()

... into this Python code:

def example(m):
  def just(n):
    # This is what we substitute in place of Just
    print(n)
  def nothing():
    # This is what we substitute in place of Nothing
    return
  m(just, nothing)

... and verify that our pattern matching function works:

>>> example(nothing())
>>> example(just(1))
1

Neat! This means that any algebraic data type can be embedded into any language with functions, which is basically every language!

Warning: your colleagues may get angry at you if you do this! Consider using a language with built-in support for algebraic data types instead of trying to twist your language into something it's not.

#### Let expressions

You can translate let expressions to lambda calculus, too. For example, instead of saving something to a file we can just use a let expression to temporarily define something within a program.

For example, we could write:

$ annah | morte
let x : ./Nat = ./plus 1 2
in  ./plus x x
<Ctrl-D>
∀(Nat : *) → ∀(Succ : ∀(pred : Nat) → Nat) → ∀(Zero : Nat) → Nat

λ(Nat : *) → λ(Succ : ∀(pred : Nat) → Nat) → λ(Zero : Nat) →
Succ (Succ (Succ (Succ (Succ (Succ Zero)))))

... but that doesn't really tell us anything about how annah desugars let because we only see the final evaluated result. We can ask annah to desugar without performing any other transformations using the annah desugar command:

$ annah desugar
let x : ./Nat = ./plus 1 2
in  ./plus x x
<Ctrl-D>
(λ(x : ./Nat ) → ./plus  x x) (./plus  (λ(Nat : *) → λ(Succ :
∀(pred : Nat) → Nat) → λ(Zero : Nat) → Succ Zero) (λ(Nat : *) →
λ(Succ : ∀(pred : Nat) → Nat) → λ(Zero : Nat) → Succ (Succ
Zero)))

... which makes more sense if we clean up the result through the use of numeric literals:

(λ(x : ./Nat ) → ./plus x x) (./plus 1 2)

Every time we write an expression of the form:

let x : t = y
in  e

... we decode that to lambda calculus as:

(λ(x : t) → e) y
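The same desugaring works in any language with anonymous functions. As a loose illustration in Python, with ordinary integers standing in for ./Nat and an invented `plus` helper standing in for ./plus:

```python
# let x = y in e  desugars to an immediately-applied lambda: (λx → e) y
plus = lambda m: lambda n: m + n   # stand-in for ./plus, curried

# let x = plus 1 2 in plus x x
result = (lambda x: plus(x)(x))(plus(1)(2))
print(result)  # 6
```

The outer lambda binds `x`, and applying it to `plus(1)(2)` plays the role of the `= y` part of the let expression.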

We can decode function definitions, too. For example, you can write:

$ annah | morte
let increment (x : ./Nat ) : ./Nat = ./plus x 1
in  increment 3
<Ctrl-D>
∀(Nat : *) → ∀(Succ : ∀(pred : Nat) → Nat) → ∀(Zero : Nat) → Nat

λ(Nat : *) → λ(Succ : ∀(pred : Nat) → Nat) → λ(Zero : Nat) →
Succ (Succ (Succ (Succ Zero)))

... and the intermediate desugared form also encodes the function definition as a lambda expression:

$ annah desugar
let increment (x : ./Nat ) : ./Nat = ./plus x 1
in  increment 3
<Ctrl-D>
(λ(increment : ∀(x : ./Nat ) → ./Nat ) → increment (λ(Nat : *)
→ λ(Succ : ∀(pred : Nat) → Nat) → λ(Zero : Nat) → Succ (Succ
(Succ Zero)))) (λ(x : ./Nat ) → ./plus  x (λ(Nat : *) → λ(Succ
: ∀(pred : Nat) → Nat) → λ(Zero : Nat) → Succ Zero))

... which you can clean up as this expression:

(λ(increment : ∀(x : ./Nat ) → ./Nat ) → increment 3)
    (λ(x : ./Nat ) → ./plus x 1)

We can combine let expressions with data type expressions, too. For example, here's our original not program, except without saving anything to files:

$ annah
type Bool
data True
data False
fold if
in
let not (b : Bool) : Bool = if b Bool False True
in
not False
<Ctrl-D>
λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → True

#### Lists

annah provides syntactic support for lists as well. For example:

$ annah
[nil ./Bool , ./True , ./False , ./True ]
<Ctrl-D>
λ(List : *) → λ(Cons : ∀(head : ./Bool ) → ∀(tail : List) →
List) → λ(Nil : List) → Cons ./True  (Cons ./False  (Cons
./True  Nil))

Just like all the other data types, a list is defined in terms of what you use to substitute each Cons and Nil constructor. I can replace each Cons with ./and and the Nil with ./True like this:

$ annah | morte
[nil ./Bool , ./True , ./False , ./True ] ./Bool ./and ./True
<Ctrl-D>
∀(Bool : *) → ∀(True : Bool) → ∀(False : Bool) → Bool

λ(Bool : *) → λ(True : Bool) → λ(False : Bool) → False

This conceptually follows the reduction sequence below:

(   λ(List : *)
→   λ(Cons : ∀(head : ./Bool ) → ∀(tail : List) → List)
→   λ(Nil : List)
→   Cons ./True (Cons ./False (Cons ./True Nil))
) ./Bool ./and ./True

-- β-reduction
= (   λ(Cons : ∀(head : ./Bool ) → ∀(tail : ./Bool ) → ./Bool )
  →   λ(Nil : ./Bool )
  →   Cons ./True (Cons ./False (Cons ./True Nil))
  ) ./and ./True

-- β-reduction
= (   λ(Nil : ./Bool )
  →   ./and ./True (./and ./False (./and ./True Nil))
  ) ./True

-- β-reduction
= ./and ./True (./and ./False (./and ./True ./True ))

Similarly, we can sum a list by replacing each Cons with ./plus and replacing each Nil with 0:

$ annah | morte
[nil ./Nat , 1, 2, 3, 4] ./Nat ./plus 0
<Ctrl-D>
∀(Nat : *) → ∀(Succ : ∀(pred : Nat) → Nat) → ∀(Zero : Nat) → Nat

λ(Nat : *) → λ(Succ : ∀(pred : Nat) → Nat) → λ(Zero : Nat) →
Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ
Zero)))))))))

This behaves as if we had written:

./plus 1 (./plus 2 (./plus 3 (./plus 4 0)))
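To make the "a list is its own fold" idea concrete, here is a small Python sketch. The names (`church_list`, `plus`) are invented for illustration, and Python ints stand in for ./Nat:

```python
# A Church-encoded list in Python: the list itself performs the fold
# when you hand it replacements for its Cons and Nil constructors.
def church_list(xs):
    def encoded(cons, nil):
        # Build Cons x1 (Cons x2 (... Nil)) by folding from the right
        result = nil
        for x in reversed(xs):
            result = cons(x, result)
        return result
    return encoded

plus = lambda x, acc: x + acc  # stand-in for ./plus

# Replacing Cons with plus and Nil with 0 sums the list
print(church_list([1, 2, 3, 4])(plus, 0))  # 10
```

Substituting different `cons`/`nil` pairs gives different folds of the same list, just as replacing Cons with ./and and Nil with ./True computed a conjunction above.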

#### Prelude

annah also comes with a Prelude to show some more sophisticated examples of what you can encode in pure lambda calculus. You can find version 1.0 of the Prelude here:

http://sigil.place/prelude/annah/1.0/

You can use these expressions directly within your code just by referencing their URL. For example, the remote Bool expression is located here:

http://sigil.place/prelude/annah/1.0/Bool/@

... and the remote True expression is located here:

http://sigil.place/prelude/annah/1.0/Bool/True

... so we can check if the remote True's type matches the remote Bool by writing this:

$ annah
http://sigil.place/prelude/annah/1.0/Bool/True : http://sigil.place/prelude/annah/1.0/Bool
<Ctrl-D>
http://sigil.place/prelude/annah/1.0/Bool/True

$ echo $?
0

Similarly, we can build a natural number (very verbosely) using the remote Succ and Zero:

$ annah | morte
http://sigil.place/prelude/annah/1.0/Nat/Succ
(   http://sigil.place/prelude/annah/1.0/Nat/Succ
    (   http://sigil.place/prelude/annah/1.0/Nat/Succ
        http://sigil.place/prelude/annah/1.0/Nat/Zero
    )
)
<Ctrl-D>
∀(Nat : *) → ∀(Succ : ∀(pred : Nat) → Nat) → ∀(Zero : Nat) → Nat

λ(Nat : *) → λ(Succ : ∀(pred : Nat) → Nat) → λ(Zero : Nat) →
Succ (Succ (Succ Zero))

However, we can also locally clone the Prelude using wget if we wish to refer to local file paths instead of remote paths:

$ wget -np -r --cut-dirs=3 http://sigil.place/prelude/annah/1.0/
$ cd sigil.place
$ ls
(->)            Defer.annah    List.annah   Path         Sum0.annah
(->).annah      Eq             Maybe        Path.annah   Sum1
Bool            Eq.annah       Maybe.annah  Prod0        Sum1.annah
Bool.annah      Functor        Monad        Prod0.annah  Sum2
Category        Functor.annah  Monad.annah  Prod1        Sum2.annah
Category.annah  index.html     Monoid       Prod1.annah
Cmd             IO             Monoid.annah Prod2
Cmd.annah       IO.annah       Nat          Prod2.annah
Defer           List           Nat.annah    Sum0

Now we can use these expressions via their more succinct local paths:

$ annah | morte
./Nat/sum (./List/(++) ./Nat [nil ./Nat , 1, 2] [nil ./Nat , 3, 4])
<Ctrl-D>
∀(Nat : *) → ∀(Succ : ∀(pred : Nat) → Nat) → ∀(Zero : Nat) → Nat

λ(Nat : *) → λ(Succ : ∀(pred : Nat) → Nat) → λ(Zero : Nat) →
Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ
Zero)))))))))

Also, every expression has a corresponding *.annah file that documents the expression's type using a let expression. For example, we can see the type of the ./List/(++) function by studying the ./List/(++).annah file:

$ cat './List/(++).annah'
let (++) (a : *) (as1 : ../List a) (as2 : ../List a) : ../List a =
        \(List : *)
    ->  \(Cons : a -> List -> List)
    ->  \(Nil : List)
    ->  as1 List Cons (as2 List Cons Nil)
in  (++)

The top line tells us that (++) is a function that takes three arguments:

• an argument named a for the type of list elements you want to combine
• an argument named as1 for the left list you want to combine
• an argument named as2 for the right list you want to combine

... and the function returns a list of the same type as the two input lists.

#### Beyond

Exactly how far can you take lambda calculus?
Well, here's a program written in annah that reads two natural numbers, adds them, and writes out the sum:

$ annah | morte
./IO/Monad ./Prod0 (do ./IO {
    x : ./Nat   <- ./IO/get ;
    y : ./Nat   <- ./IO/get ;
    _ : ./Prod0 <- ./IO/put (./Nat/(+) x y);
})
<Ctrl-D>
∀(IO : *) → ∀(Get_ : ((∀(Nat : *) → ∀(Succ : ∀(pred : Nat) →
Nat) → ∀(Zero : Nat) → Nat) → IO) → IO) → ∀(Put_ : (∀(Nat : *)
→ ∀(Succ : ∀(pred : Nat) → Nat) → ∀(Zero : Nat) → Nat) → IO →
IO) → ∀(Pure_ : (∀(Prod0 : *) → ∀(Make : Prod0) → Prod0) → IO)
→ IO

λ(IO : *) → λ(Get_ : ((∀(Nat : *) → ∀(Succ : ∀(pred : Nat) →
Nat) → ∀(Zero : Nat) → Nat) → IO) → IO) → λ(Put_ : (∀(Nat : *)
→ ∀(Succ : ∀(pred : Nat) → Nat) → ∀(Zero : Nat) → Nat) → IO →
IO) → λ(Pure_ : (∀(Prod0 : *) → ∀(Make : Prod0) → Prod0) → IO)
→ Get_ (λ(r : ∀(Nat : *) → ∀(Succ : ∀(pred : Nat) → Nat) →
∀(Zero : Nat) → Nat) → Get_ (λ(r : ∀(Nat : *) → ∀(Succ :
∀(pred : Nat) → Nat) → ∀(Zero : Nat) → Nat) → Put_ (λ(Nat : *)
→ λ(Succ : ∀(pred : Nat) → Nat) → λ(Zero : Nat) → r@1 Nat Succ
(r Nat Succ Zero)) (Pure_ (λ(Prod0 : *) → λ(Make : Prod0) →
Make))))

This does not run the program, but it creates a syntax tree representing all program instructions and the flow of information.

annah supports do notation, so you can do things like write list comprehensions:

$ annah | morte
./List/sum (./List/Monad ./Nat (do ./List {
    x : ./Nat <- [nil ./Nat , 1, 2, 3];
    y : ./Nat <- [nil ./Nat , 4, 5, 6];
    _ : ./Nat <- ./List/pure ./Nat (./Nat/(+) x y);
}))
<Ctrl-D>
∀(Nat : *) → ∀(Succ : ∀(pred : Nat) → Nat) → ∀(Zero : Nat) → Nat

λ(Nat : *) → λ(Succ : ∀(pred : Nat) → Nat) → λ(Zero : Nat) →
Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ
(Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ
(Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ
(Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ
(Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ
(Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ (Succ
(Succ (Succ (Succ Zero))))))))))))))))))))))))))))))))))))))))
))))))))))))))))))))))

The above Annah program is equivalent to the following Haskell program:

sum (do
    x <- [1, 2, 3]
    y <- [4, 5, 6]
    return (x + y) )

... yet it is implemented 100% in a minimal and total lambda calculus without any built-in support for data types.
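For comparison, the same computation in ordinary Python is a one-liner, which makes it easy to check that the Church-encoded result above really is 63:

```python
# Sum of x + y over all pairs drawn from the two lists, matching the
# Haskell list-monad example: 3 * (1+2+3) + 3 * (4+5+6) = 63
total = sum(x + y for x in [1, 2, 3] for y in [4, 5, 6])
print(total)  # 63
```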

This tutorial doesn't cover how do notation works, but you can learn this and more by reading the Annah tutorial which is bundled with the Hackage package:

#### Conclusions

A lot of people underestimate how much you can do in a total lambda calculus. I don't recommend pure lambda calculus as a general programming language, but I could see a lambda calculus enriched with high-efficiency primitives being a realistic starting point for simple functional languages that are easy to port and distribute.

One of the projects I'm working towards in the long run is a "JSON for code", and annah is one step along the way towards that goal. annah will likely not be that language, but I still factored out annah as a separate and reusable project so that others can fork and experiment with it when designing their own languages.

Also, as far as I can tell annah is the only project in the wild that actually implements the Boehm-Berarducci encoding outlined in this paper:

... so if you prefer to learn the encoding algorithm by studying actual code you can use the annah source code as a reference real-world implementation.

You can find the Annah project on Hackage or Github:

... and you can find the Annah prelude hosted online:

The Annah tutorial goes into the Annah language and compiler in much more detail than this post, so if you would like to learn more I highly recommend reading it. The tutorial walks through the compiler, desugaring, and the Prelude step by step:

Also, for those who are curious, both the annah and morte compilers are named after characters from the game Planescape: Torment.