Planet Haskell

June 29, 2015

Dimitri Sabadie

Mac OS X support in al-0.1.4

Support for Mac users!

This will be a short announcement about al, a Haskell wrapper to the OpenAL C library.

Currently, the wrapper has been successfully tested on Linux – at least it works well on my Archlinux distro. I made a little program that reads from an .ogg file and streams the PCM signal to the wrapper – see libvorbis for further details. I’ll release the program later on if I find the spare time.

The wrapper might also work on Windows as long as you have pkg-config installed. I’d be very happy with feedback from people working on Windows. I don’t want anyone to be left out by my packages.

However, I’ve never tested the wrapper on Mac OS X. I guessed it wouldn’t work out of the box because Mac OS X doesn’t use regular libraries to compile and link – that would have been too easy otherwise, and hell, think different right? They use something called a framework. It’s possible to include a framework in a Haskell project by filling in the frameworks field of the .cabal file. I received a simple patch to do that – here – and I merged it upstream.
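
For reference, the relevant .cabal stanza looks roughly like the sketch below. This is only an illustration of the frameworks and pkgconfig-depends fields; the exact conditional al uses may differ.

  if os(darwin)
    frameworks: OpenAL
  else
    pkgconfig-depends: openal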

Mac OS X is now officially supported. The release version is 0.1.4.

About stackage

There’s something else I’d like to discuss. Quickly after the first release of al, I decided to push it onto stackage. Unfortunately, there’s a problem with the package and Ubuntu. For a very dark reason, Ubuntu doesn’t expose anything when invoking pkg-config --cflags, even if the files are there – on Ubuntu they can be found in /usr/include/AL.

That’s very silly because I don’t want to hardcode the location – it might be somewhere else on other Linux distros. The problem might be related to the OpenAL support in Ubuntu – the .pc file used by pkg-config might be incomplete. So if you experience that kind of issue, you can fix it by passing the path to your OpenAL headers:

cabal install al --extra-include-dirs=/usr/include/AL

If OpenAL is installed somewhere else, consider using:

find / -name al.h

I’ll do my best to quickly fix that issue.

by Dimitri Sabadie ([email protected]) at June 29, 2015 06:43 PM

Douglas M. Auclair (geophf)

Tabular and Visual Representations of Data using Neo4J

Corporate and Employee Relationships
Both Graphical and Tabular Results

So, there are many ways to view data, and people may have different needs for representing that data, either for visualization (in a graph: nodes-and-edges view) or for tabulation/sorting (in your standard spreadsheet view).

So, can Neo4J cater to both these needs?

Yes, it can.

Scenario 1: Relationships of owners of multiple companies

Let's say I'm doing some data exploration, and I wish to know who has interest/ownership in multiple companies. Why? Well, let's say I'm interested in the Peter-Paul problem: I want to know if Joe, who owns company X, is paying company Y in some artificial scheme to inflate or deflate the numbers of either business and thereby profit illegally.

Piece of cake. Neo4J, please show me the owners, sorted by the number of companies owned:

MATCH (o:OWNER)--(p:PERSON)-[r:OWNS]->(c:CORP)
RETURN p.ssn AS Owner, collect(c.name) as Companies, count(r) as Count 
ORDER BY Count DESC


Diagram 1: Owners by Company Ownership

Boom! There you go. Granted, this isn't a very exciting data set, as I did not have many owners owning multiple companies, but there you go.

What does it look like as a graph, however?

MATCH (o:OWNER)--(p:PERSON)-[r:OWNS]->(c:CORP)-[:EMPLOYS]->(p1) 
WHERE p.ssn in [2879,815,239,5879] 
RETURN o,p,c,p1


Diagram 2: Some companies with multiple owners

To me, this is a richer result, because it now shows that owners of more than one company sometimes own shares in companies that have multiple owners. This may yield interesting results when investigating associates who own companies related to you. This was something I didn't see in the tabular result.

Not a weakness of Neo4J: it was a weakness on my part doing the tabular query. I wasn't looking for this result in my query, so the table doesn't show it.

Tellingly, the graph does.

Scenario 2: Contract-relationships of companies 

Let's explore a different path. I wish to know, by company, the contractual-relationships between companies, sorted by companies with the most contractual-relationships on down. How do I do that in Neo4J?

MATCH (c:CORP)-[cc:CONTRACTS]->(c1:CORP) 
RETURN c.name as Contractor, collect(c1.name) as Contractees, count(cc) as Count 
ORDER BY Count DESC


Diagram 3: Contractual-Relationships between companies

This is somewhat more fruitful, it seems. Let's, then, put this up into the graph-view, looking at the top contractor:

MATCH (p:PERSON)--(c:CORP)-[:CONTRACTS*1..2]->(c1:CORP)--(p1:PERSON) 
WHERE c.name in ['YFB'] 
RETURN p,c,c1,p1


Diagram 4: Contractual-Relationships of YFB

Looking at YFB, we can see contractual-relationships 'blossom-out' from it, as it were, and this is just the immediate relationships, then distance 1 out from those! If we go out even just one more step in the contracts, the screen fills with employees, so then, again, you have the forest-trees problem where too much data hides the useful results.

Let's prune these trees, then. Do circular relations appear?

MATCH (c:CORP)-[:CONTRACTS*1..5]->(c1:CORP) WHERE c.name in ['YFB'] RETURN c,c1


Diagram 5: Circular Relationship found, but not in YFB! Huh!

Well, would you look at that. This shows the power of the visualization aspect of graph databases. I was examining a hot-spot in corporate trades, YFB, looking for irregularities there. I didn't find any, but as I probed there, a circularity did surface in downstream, unrelated companies: the obvious one being between AZB and MZB, but there's also a circular relationship that becomes apparent starting with 4ZB. Yes, this particular graph is noisy, but it did materialize an interesting area to explore that may very well have been overlooked with legacy methods of investigation.

Graph Databases.


BAM.

by geophf ([email protected]) at June 29, 2015 06:25 PM

Philip Wadler

The Thrilling Adventures of Lovelace and Babbage

Sydney Padua explores an alternate universe wherein Ada Lovelace and Charles Babbage complete the Analytical Engine and use it to (at the order of Queen Victoria) fight crime. I've blogged before about the web comic, but the book is even better.

Padua's tome reconciles hilarity with accuracy. I am not normally a fan of footnotes: if it is worth saying, don't force your poor reader to break the flow of thought and eye, and leap to the bottom of the page. But here is the glorious exception, where the footnotes supplement, argue with, and occasionally threaten to overflow the comic. I don't even object that the footnotes have footnotes: endnotes cite sources for the dialogue, introduce Ada and Charles' contemporaries Isambard Kingdom Brunel, Charles Dodgson, and George Boole, quote at length from original sources, and explain the workings of the Analytic Engine. In the midst of an illustrated fantasia riff on Alice in Wonderland, the footnotes pursue an academic war as to whether or not Babbage truly considered Ada to be the Empress of Number. Padua makes pursuit of Victorian arcana a thrilling adventure of its own. Highly recommended!


by Philip Wadler ([email protected]) at June 29, 2015 10:54 AM

Yesod Web Framework

stack support for yesod devel

Since the release of the stack build tool, I think I've received four different requests and bug reports about stack support in Yesod. The sticking point is that yesod devel has not, until now, had support for using stack. The original plan was to wait for Njagi Mwaniki's Google Summer of Code project to finish the new ide-backend based yesod devel. However, demand for this feature was too high, so I'm happy to announce official stack support in yesod-bin-1.4.11.

This blog post should be considered a beta announcement for using Yesod with stack. Please test and report issues. Also, the workflow is not yet perfected. Once we get stack new written and add ide-backend support, things will be much smoother. For now, here's the workflow:

# Install both yesod-bin and cabal-install. Both are still necessary
$ stack install yesod-bin-1.4.11 cabal-install
# Initialize your project
$ stack exec yesod init
$ cd new-directory
# Create stack.yaml
$ stack init
# Build your project to get all dependencies
# Also rebuild yesod-bin with the current GHC, just to be safe
$ stack build yesod-bin-1.4.11 .
# Now run yesod devel
$ stack exec yesod devel

Like I said, this is a little kludgy right now, but will smooth out over time.

Technical details

If you're curious about how I added this support, the commit is really short. And there's not actually anything stack-specific in here; it's just working around a well-known limitation in cabal which makes it incompatible with the GHC_PACKAGE_PATH environment variable. All we do in yesod devel is:

  • Detect the GHC_PACKAGE_PATH variable
  • Turn it into --package-db arguments to cabal configure
  • Remove GHC_PACKAGE_PATH from the environment of the cabal process
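
As a rough, illustrative sketch of those three steps (this is not the actual yesod-bin code), the conversion can be written like this:

import System.Environment (getEnvironment, lookupEnv)

-- Split a GHC_PACKAGE_PATH value on ':' (POSIX); Windows uses ';'.
splitDbs :: String -> [String]
splitDbs s = case break (== ':') s of
  (db, [])       -> [db]
  (db, _ : rest) -> db : splitDbs rest

-- Flags for "cabal configure" plus the cleaned-up environment to give
-- the cabal child process.
packageDbSetup :: IO ([String], [(String, String)])
packageDbSetup = do
  mpath <- lookupEnv "GHC_PACKAGE_PATH"
  env   <- getEnvironment
  let flags    = maybe [] (map ("--package-db=" ++) . filter (not . null) . splitDbs) mpath
      childEnv = filter ((/= "GHC_PACKAGE_PATH") . fst) env
  return (flags, childEnv)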

It would be nice if cabal got this functionality itself in the future, but I believe the current behaviour was a deliberate design decision by the cabal team.

June 29, 2015 12:00 AM

Danny Gratzer

Proving Cut Admissibility in Twelf

Posted on June 29, 2015
Tags: twelf, types

Veering wildly onto the theory side compared to my last post, I’d like to look at some more Twelf code today. Specifically, I’d like to prove a fun theorem called cut admissibility (or elimination) for a particular logic: a simple intuitionistic propositional sequent calculus. I chucked the code for this over here.

Background

If those words didn’t make any sense, here’s an incomplete primer on what we’re doing here. First of all, we’re working with a flavor of logic called “sequent calculus”. Sequent calculus describes a class of logics characterized by the study of “sequents”; a sequent is just an expression Γ ⊢ A saying “A is true under the assumption that the set of propositions, Γ, are true”. A sequent calculus defines a couple of things

  1. What exactly A is: a calculus defines what propositions it talks about

    For us we’re only interested in a few basic connectives, so our calculus can talk about true, false, A and B (A ∧ B), A or B (A ∨ B), and A implies B (A ⇒ B).

  2. Rules for inferring Γ ⊢ A holds. We can use these inference rules to build up proofs of things in our system.

    In sequent calculus there are two sorts of inference rules, left and right. A left rule takes a fact that we know and lets us reason backwards to other things we must know hold. A right rule lets us take the thing we’re trying to prove and instead prove smaller, simpler things.

    More rules will follow in the Twelf code but for a nice example consider the left and right rules for ∧,

      Γ, A, B ⊢ C
     ———————————————
      Γ, A ∧ B ⊢ C
    
     Γ ⊢ A    Γ ⊢ B
     ———————————————
        Γ ⊢ A ∧ B

    The left rule says if we know that A ∧ B is true, we can take it apart and try to prove our goal with assumptions that A and B are true. The right rule says to prove that A ∧ B is true we need to prove A is true and B is true. A proof in this system is a tree of these rules, just like you’d expect in a type theory or natural deduction.

We also tacitly assume a bunch of boring rules called structural rules about our sequents hold, so that we can freely duplicate, drop and swap assumptions in Γ. For a less awful introduction to sequent calculus, Frank Pfenning has some good notes.

Now we want to prove a particular (meta)theorem about sequent calculus

  1. if Γ ⊢ A
  2. and Γ, A ⊢ B
  3. then Γ ⊢ B

This theorem means a couple of different things: for example, our system is consistent, and our system also admits lemmas. As it turns out, proving this theorem is hard. The basic complication is that we don’t know what form either of the first two proofs takes.

The Twelf Stuff

We now formalize our sequent calculus in Twelf. First we declare a type and some constants to represent propositions.

    prop  : type.

    =>    : prop -> prop -> prop. %infix right 4 =>.
    true  : prop.
    false : prop.
    /\    : prop -> prop -> prop. %infix none 5 /\.
    \/    : prop -> prop -> prop. %infix none 5 \/.

Notice here that we use infix to let us write A /\ B => C. Having specified these we now define what a proof is in this system. This is structured a little differently than you’d be led to believe from the above. We have an explicit type proof which is inhabited by “proof terms” which serve as a nice shortcut to those trees generated by inference rules. Finally, we don’t explicitly represent Γ; instead we have this thing called hyp which is used to represent a hypothesis in Γ. Left rules use these hypotheses and introduce new ones. Pay attention to /\/l and /\/r since you’ve seen the handwritten equivalents.

    proof   : type.
    hyp     : type.

    init    : hyp -> proof.

    =>/r    : (hyp -> proof) -> proof.
    =>/l    : (hyp -> proof) -> proof -> hyp -> proof.

    true/r  : proof.

    false/l : hyp -> proof.

    /\/r    : proof -> proof -> proof.
    /\/l    : (hyp -> hyp -> proof) -> hyp -> proof.

    \//r1   : proof -> proof.
    \//r2   : proof -> proof.
    \//l    : (hyp -> proof) -> (hyp -> proof) -> hyp -> proof.

The right rules are at least a little intuitive, but the left rules are peculiar. Essentially we have a weird CPS-y feel going on here: to decompose a hypothesis you hand the hyp to the constant along with a continuation which takes the hypotheses you should get out of the decomposition. For example, for \/ we have two right rules (think Left and Right), then one left rule which takes two continuations and one hyp (think either). Finally, that init thing is the only way to actually take a hypothesis and use it as a proof.

We now want to unite these two pieces of syntax with a typing judgment letting us say that a proof proves some particular prop.

    of         : proof -> prop -> type.
    hof        : hyp   -> prop -> type.

    of/init    : of (init H) A
                  <- hof H A.

    of/=>/r    : of (=>/r F) (A => B)
                  <- ({h} hof h A -> of (F h) B).
    of/=>/l    : of (=>/l C Arg F) U
                  <- hof F (A => B)
                  <- of Arg A
                  <- ({h} hof h B -> of (C h) U).

    of/true/r  : of true/r true.

    of/false/l : of (false/l H) A
                  <- hof H false.

    of//\/r    : of (/\/r R L) (A /\ B)
                  <- of L A
                  <- of R B.
    of//\/l    : of (/\/l C H) U
                  <- hof H (A /\ B)
                  <- ({h}{h'} hof h A -> hof h' B -> of (C h h') U).

    of/\//r1   : of (\//r1 L) (A \/ B)
                  <- of L A.
    of/\//r2   : of (\//r2 R) (A \/ B)
                  <- of R B.
    of/\//l    : of (\//l R L H) C
                  <- hof H (A \/ B)
                  <- ({h} hof h A -> of (L h) C)
                  <- ({h} hof h B -> of (R h) C).

In order to handle hypotheses we have this hof judgment which handles typing various hyps. We introduce it just like we introduce hyps in those continuation-y things for left rules. Sorry for dumping so much code on you all at once: it’s just a lot of machinery we need to get working in order to actually start formalizing cut.

I would like to point out a few things about this formulation of sequent calculus though. First off, it’s very Twelf-y: we use the LF context to host the context of our logic using HOAS. We also basically just have void as the type of hypothesis! Look, there’s no way to construct a hypothesis, let alone a typing derivation hof! The idea is that we’ll just wave our hands at Twelf and say “consider our theorem in a context with hyps and hofs”, which we do with a block declaration:

    %block someHofs : some {A : prop} block {h : hyp}{deriv : hof h A}.

In short, Twelf is nifty.

The Theorem

Now we’re almost in a position to state cut admissibility, we want to say something like

    cut : of Lemma A
            -> ({h} hof h A -> of (F h) B)
            -> of ??? B

But what should that ??? be? We could just say “screw it, it’s something” and leave it as an output of this lemma, but experimentally (an hour of teeth-gnashing later) it’s absolutely not worth the pain. Instead let’s do something clever.

Let’s first define an untyped version of cut which works just across proofs without regard to typing derivations. We can’t declare this total because it’s just not going to work for ill-typed things, but we can give it a mode (it’s not needed) just as mechanical documentation.

    cut : proof -> (hyp -> proof) -> proof -> type.
    %mode cut +A +B -C.

The goal here is we’re going to state our main theorem as

    of/cut : {A} of P A
              -> ({h} hof h A -> of (F h) B)
              -> cut P F C
              -> of C B
              -> type.
    %mode of/cut +A +B +C -D -E.

Leaving that derivation of cut as an output lets us produce not just a random term but instead a proof that that term makes “sense” somehow, along with a proof that it’s well typed.

cut is going to mirror the structure of of/cut, so we now need to figure out how we’re going to structure our proof. It turns out a rather nice way to do this is to organize our cuts into 4 categories. The first category is “principal” cuts: they’re the ones where we have a right rule as our lemma and we immediately decompose that lemma in the other term with the corresponding left rule. This is sort of the case that we drive towards everywhere and it’s where the substitution bit happens.

First we have some simple cases

    trivial : cut P' ([x] P) P.
    p/init1 : cut (init H) ([x] P x) (P H).
    p/init2 : cut P ([x] init x) P.

In trivial we don’t use the hypothesis at all so we’re just “strengthening” here. p/init1 and p/init2 deal with the init rule on the left or right side of the cut: if it’s on the left we have a hypothesis of the appropriate type so we just apply the function; if it’s on the right we have a proof of the appropriate type so we just return that. In the more interesting cases we have the principal cuts for some specific connectives.

    p/=>   : cut (=>/r F) ([x] =>/l ([y] C y x) (Arg x) x) Out'
              <- ({y} cut (=>/r F) (C y) (C' y))
              <- cut (=>/r F) Arg Arg'
              <- cut Arg' F Out
              <- cut Out C' Out'.
    p//\   : cut (/\/r R L) ([x] /\/l ([y][z] C y z x) x) Out'
              <- ({x}{y} cut (/\/r R L) (C x y) (C' x y))
              <- ({x} cut R (C' x) (Out x))
              <- cut L Out Out'.
    p/\//1 : cut (\//r1 L) ([x] \//l ([y] C2 y x) ([y] C1 y x) x) Out
              <- ({x} cut (\//r1 L) (C1 x) (C1' x))
              <- cut L C1' Out.
    p/\//2 : cut (\//r2 L) ([x] \//l ([y] C2 y x) ([y] C1 y x) x) Out
              <- ({x} cut (\//r2 L) (C2 x) (C2' x))
              <- cut L C2' Out.

Let’s take a closer look at p/=>, the principal cut for =>. First off, our inputs are =>/r F and ([x] =>/l ([y] C y x) (Arg x) x). The first one is just a “function” that we’re supposed to substitute into the second. Now the second is composed of a continuation and an argument. Notice that both of these depend on x! In order to handle this, the first two lines of the proof

           <- ({y} cut (=>/r F) (C y) (C' y))
           <- cut (=>/r F) Arg Arg'

are to remove that dependence. We get back a C' and an Arg' which don’t use the hyp (x). In order to do this we just recurse and cut the =>/r F out of them. Notice that both the type and the thing we’re substituting are the same size; what decreases here is what we’re substituting into. Now we’re ready to actually do some work. First we need to get a term representing the application of F to Arg'. This is done with cut since it’s just substitution

              <- cut Arg' F Out

But this isn’t enough: we don’t need the result of the application, we need the result of the continuation! So we have to cut the output of the application through the continuation

              <- cut Out C' Out'.

This code is kinda complicated. The typed version of this took me an hour since after 2am I am charitably called an idiot. However, this same general pattern holds for all the principal cuts

  1. The subterms of the target could depend on what we’re substituting for, so get rid of that dependence with a recursive call.
  2. Get the “result”, which is just the term corresponding to the hypothesis the continuation is expecting. This is pretty trivial in all cases except => since it’s just lying about in an input.
  3. Get the actual result by cutting 2. through the continuation.

Try to work through the case for /\ now

    p//\   : cut (/\/r R L) ([x] /\/l ([y][z] C y z x) x) Out'
              <- ({x}{y} cut (/\/r R L) (C x y) (C' x y))
              <- ({x} cut R (C' x) (Out x))
              <- cut L Out Out'.

After principal cuts we really just have a number of boring cases whose job it is to recurse. The first of these is called rightist substitution because it comes up if the term on the right (the part using the lemma) has a right rule first. This means we have to hunt in the subterms to go find where we’re actually using the lemma.

    r/=> : cut P ([x] (=>/r ([y] F y x))) (=>/r ([y] F' y))
            <- ({x} cut P (F x) (F' x)).
    r/true : cut P ([x] true/r) true/r.
    r//\ : cut P ([x] /\/r (R x) (L x)) (/\/r R' L')
            <- cut P L L'
            <- cut P R R'.
    r/\/1 : cut P ([x] \//r1 (L x)) (\//r1 L')
             <- cut P L L'.
    r/\/2 : cut P ([x] \//r2 (R x)) (\//r2 R')
             <- cut P R R'.

Nothing here should be surprising, keeping in mind that all we’re doing is recursing. The next set of cuts is called leftist substitution. Here we are actually recursing on the term we’re trying to substitute.

    l/=>    : cut (=>/l ([y] C y) Arg H) ([x] P x) (=>/l ([x] C' x) Arg H)
               <- ({x} cut (C x) P (C' x)).
    l/false : cut (false/l H) P (false/l H).
    l//\    : cut (/\/l ([x][y] C x y) H) P (/\/l ([x][y] C' x y) H)
               <- ({x}{y} cut (C x y) P (C' x y)).
    l/\/    : cut (\//l ([y] R  y) ([y] L  y) H) ([x] P x)
                  (\//l ([y] R' y) ([y] L' y) H)
               <- ({x} cut (L x) P (L' x))
               <- ({x} cut (R x) P (R' x)).

It’s the same game with just a different target: we’re now recursing on the continuation because that’s where we somehow created a proof of A. This means that in l/=> we’re substituting into a left term which has three parts

  1. A continuation, from a hyp of B to a proof of C
  2. An argument of type A
  3. A hypothesis of type A => B

Now we’re only interested in how we created that proof of C; that’s the only relevant part of this substitution. The output of this case is going to have that left rule, =>/l ??? Arg H, where ??? is a replacement of C that we get by cutting C through P “pointwise”. This comes through on the recursive call

               <- ({x} cut (C x) P (C' x)).

For one more case, consider the left rule for \/

    l/\/    : cut (\//l R L H) P

We start by trying to cut a left rule into P so we need to produce a left rule in the output with different continuations, something like

               (\//l R' L' H)

Now what should R' and L' be? In order to produce them we’ll throw up a pi so we can get L x, a proof with the appropriate type to cut again. With that, we can recurse and get back the new continuation we want.

               <- ({x} cut (L x) P (L' x))
               <- ({x} cut (R x) P (R' x)).

There’s one last class of cuts to worry about; think about the cases we’ve covered so far

  1. The left term is a left rule (leftist substitution)
  2. The right term is a right rule (rightist substitution)
  3. The terms match up and we substitute

So what happens if we have a left rule on the right and a right rule on the left, but they don’t “match up”? By this I mean that the left rule in that right term works on a different hypothesis than the one that the function it’s wrapped in provides. In this case we just have to recurse some more

    lr/=> : cut P ([x] =>/l ([y] C y x) (Arg x) H) (=>/l C' Arg' H)
             <- cut P Arg Arg'
             <- ({y} cut P (C y) (C' y)).
    lr//\ : cut P ([x] /\/l ([y][z] C y z x) H) (/\/l C' H)
             <- ({y}{z} cut P (C y z) (C' y z)).
    lr/\/ : cut P ([x] \//l ([y] R y x) ([y] L y x) H) (\//l R' L' H)
             <- ({y} cut P (L y) (L' y))
             <- ({y} cut P (R y) (R' y)).

When we have such an occurrence we just do like we did with right rules.

Okay, now that we’ve handled all of these cases we’re ready to type the damn thing.

    of/cut : {A} of P A
              -> ({h} hof h A -> of (F h) B)
              -> cut P F C
              -> of C B
              -> type.
    %mode of/cut +A +B +C -D -E.

Honestly this is less exciting than you’d think. We’ve really done all the creative work in constructing the cut type family. All that’s left to do is check that this is correct. As an example, here’s a case that exemplifies how we verify all left-right commutative cuts.

    - : of/cut _ P ([x][h] of/=>/l ([y][yh] C y yh x h) (A x h) H)
         (lr/=> CC CA) (of/=>/l C' A' H)
         <- of/cut _ P A CA A'
         <- ({y}{yh} of/cut _ P (C y yh) (CC y) (C' y yh)).

We start by trying to show that

    lr/=> : cut P ([x] =>/l ([y] C y x) (Arg x) H) (=>/l C' Arg' H)
             <- cut P Arg Arg'
             <- ({y} cut P (C y) (C' y)).

is type correct. To do this we have a derivation P that the left term is well-typed. Notice that I’ve overloaded P here: in the rule lr/=> P was a term, and now it’s a typing derivation for that term. Next we have a typing derivation for [x] =>/l ([y] C y x) (Arg x) H. This is a function which takes two arguments: x is a hypothesis, the same as in lr/=>; however, now we also have h, which is a hof derivation that x has a type. There’s only one way to type a usage of the left rule for =>, namely of/=>/l, so we have that next.

Finally, our output is on the next line in two parts. First we have a derivation for cut showing how to construct the “cut out term” in this case. Next we have a new typing derivation that again uses of/=>/l. Notice that both of these depend on some terms we get from the recursive calls here.

Since we’ve gone through all the cases already and done all the thinking, I’m not going to reproduce it all here. The intuition for how cut works is really best given by the untyped version, with the understanding that we check that it’s correct with this theorem as we did above.

Wrap Up

To recap here’s what we did

  1. Define sequent calculus’ syntax
  2. Define typing rules across it
  3. Define how an untyped version of cut works across it
  4. Validate the correctness of cut by typing it with of/cut

Hope that was a little fun, cheers!


June 29, 2015 12:00 AM

June 27, 2015

Christopher Done

The path package

Here I’d like to announce and explain the motivations behind my path package.

Motivation

This package came about after working on a number of projects at FP Complete that use file paths in various ways. We used the system-filepath package, which was supposed to solve many path problems by being an opaque path type. It occurred to me that the same kind of bugs kept cropping up:

  • Expected a path to be absolute but it was relative, or vice-versa.
  • Expected two equivalent paths to be equal or order the same, but they did not (/home//foo vs /home/foo/ vs /home/bar/../foo, etc.).
  • Unpredictable behaviour with regards to concatenating paths.
  • Confusing files and directories.
  • Not knowing whether a path was a file or directory or relative or absolute based on the type alone was a drag.

All of these bugs are preventable.

Approach

My approach to problems like this is to make a type that encodes the properties I want and then make it impossible to let those invariants be broken, without compromise or backdoors to let the wrong value “slip in”. Once I have a path, I want to be able to trust it fully. This theme will be seen throughout the things I lay out below.

Solution

After having to fix bugs due to these in our software, I put my foot down and made:

  • An opaque Path type (a newtype wrapper around String).
  • Smart constructors which are very stringent in the parsing.
  • Make the parsers highly normalizing.
  • Leave equality and concatenation to basic string equality and concatenation.
  • Include relativity (absolute/relative) and type (directory/file) in the type itself.
  • Use the already cross-platform filepath package for implementation details.

Implementation

The data types

Here is the type:

newtype Path b t = Path FilePath
  deriving (Typeable)

The type variables are:

  • b - base, the base location of the path; absolute or relative.
  • t - type, whether file or directory.

The base types can be filled with these:

data Abs deriving (Typeable)
data Rel deriving (Typeable)

And the type can be filled with these:

data File deriving (Typeable)
data Dir deriving (Typeable)

(Why not use data kinds like data Type = File | Dir? Because that imposes an extension overhead of adding {-# LANGUAGE DataKinds #-} to every module you might want to write out a path type in. Given that one cannot construct paths of types other than these, via the operations in the module, it’s not a concern for me.)

There is a conversion function to give you back the filepath:

toFilePath :: Path b t -> FilePath
toFilePath (Path l) = l

Parsers

To get a path value, you need to use one of the four parsers:

parseAbsDir  :: MonadThrow m => FilePath -> m (Path Abs Dir)
parseRelDir  :: MonadThrow m => FilePath -> m (Path Rel Dir)
parseAbsFile :: MonadThrow m => FilePath -> m (Path Abs File)
parseRelFile :: MonadThrow m => FilePath -> m (Path Rel File)

The following properties apply:

  • Absolute parsers will reject non-absolute paths.
  • The only delimiter syntax accepted is the path separator: / on POSIX and \ on Windows.
  • Any other special syntax, such as .., ~/ or /./, is rejected.
  • All parsers normalize into single separators: /home//foo -> /home/foo.
  • Directory parsers always normalize with a final trailing /. So /home/foo parses into the string /home/foo/.

It was discussed briefly whether we should just have a class for parsing rather than four separate parsing functions. In my experience so far, I have had type errors where I wrote something like x <- parseAbsDir someAbsDirString because x was then passed to a place that expected a relative directory. In that case, an overloaded return value would simply have been accepted. So I don’t think having a class is a good idea. Being explicit here doesn’t exactly waste our time, either.

Why are these functions in MonadThrow? Because it means I can have it return an Either or a Maybe if I’m in pure code; and if I’m in IO and I don’t expect parsing to ever fail, I can use it in IO like this:

do x <- parseRelFile (fromCabalFileName x)
   foo x
   …

That’s really convenient and we take advantage of this at FP Complete a lot.
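
To make that concrete, here is a small illustrative sketch of the same parser used in pure code; Either SomeException and Maybe are just two possible instantiations of MonadThrow, and the example strings are made up.

import Control.Exception (SomeException)
import Control.Monad.Catch ()  -- MonadThrow instance for Either SomeException
import Path (Abs, Dir, Path, parseAbsDir)

-- Left on a malformed path, Right on a good one.
asEither :: Either SomeException (Path Abs Dir)
asEither = parseAbsDir "/home/chris"

-- Nothing on a malformed path; this one is relative, so it fails.
asMaybe :: Maybe (Path Abs Dir)
asMaybe = parseAbsDir "not/an/absolute/dir"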

The instances

Equality, ordering and printing are simply re-using the String instances:

instance Eq (Path b t) where
  (==) (Path x) (Path y) = x == y

instance Ord (Path b t) where
  compare (Path x) (Path y) = compare x y

instance Show (Path b t) where
  show (Path x) = show x

Which gives us for free the following equational properties:

toFilePath x == toFilePath y        ≡ x == y           -- Eq instance
toFilePath x `compare` toFilePath y ≡ x `compare` y    -- Ord instance
toFilePath x == toFilePath y        ≡ show x == show y -- Show instance

In other words, the representation and the path you get out at the end are the same. Two paths that are equal will always give you back the same thing.

Smart constructors

For when you know what a path will be at compile-time, there are constructors for that:

$(mkAbsDir "/home/chris")
$(mkRelDir "chris")
$(mkAbsFile "/home/chris/x.txt")
$(mkRelFile "chris/x.txt")

These will run at compile-time and underneath use the appropriate parser.

Overloaded strings

No IsString instance is provided, because that has no way to statically determine whether the path is correct, and would otherwise have to be a partial function.

In practice I have written the wrong path format in a $(mk… "") and been thankful it was caught early.
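
For illustration, the only IsString instance one could write would look something like the sketch below; it is not in the package, precisely because fromString is pure and so has nothing better to do than call error on bad input.

{-# LANGUAGE FlexibleInstances #-}
import Control.Exception (SomeException)
import Control.Monad.Catch ()  -- MonadThrow instance for Either SomeException
import Data.String (IsString (..))
import Path (Abs, Dir, Path, parseAbsDir)

-- Hypothetical, deliberately omitted instance: a parse failure can
-- only become a runtime error here.
instance IsString (Path Abs Dir) where
  fromString s =
    either (error . show) id
           (parseAbsDir s :: Either SomeException (Path Abs Dir))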

Operations

There is path concatenation:

(</>) :: Path b Dir -> Path Rel t -> Path b t

Get the parent directory of a path:

parent :: Path Abs t -> Path Abs Dir

Get the filename of a file path:

filename :: Path b File -> Path Rel File

Get the directory name of a directory path:

dirname :: Path b Dir -> Path Rel Dir

Stripping the parent directory from a path:

stripDir :: MonadThrow m => Path b Dir -> Path b t -> m (Path Rel t)
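
As a small usage sketch (the concrete paths are made up), these operations compose like this:

{-# LANGUAGE TemplateHaskell #-}
import Path

report :: Path Abs File
report = $(mkAbsDir "/home/chris/") </> $(mkRelFile "docs/report.txt")

reportDir :: Path Abs Dir
reportDir = parent report     -- "/home/chris/docs/"

reportName :: Path Rel File
reportName = filename report  -- "report.txt"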

Review

Let’s review my initial list of complaints and see if they’ve been satisfied.

Relative vs absolute confusion

Paths now distinguish in the type system whether they are relative or absolute. You can’t append two absolute paths, for example:

λ> $(mkAbsDir "/home/chris") </> $(mkAbsDir "/home/chris")
<interactive>:23:31-55:
    Couldn't match type ‘Abs’ with ‘Rel’

The equality problem

Paths are now stringently normalized. They have to be a valid path, and they only support single path separators, and all directories are suffixed with a trailing path separator:

λ> $(mkAbsDir "/home/chris//") == $(mkAbsDir "/./home//chris")
True
λ> toFilePath $(mkAbsDir "/home/chris//") ==
   toFilePath $(mkAbsDir "/./home//chris")
True
λ> ($(mkAbsDir "/home/chris//"),toFilePath $(mkAbsDir "/./home//chris"))
("/home/chris/","/home/chris/")

Unpredictable concatenation issues

Because of the stringent normalization, path concatenation, as seen above, is simply string concatenation. This is about as predictable as it can get:

λ> toFilePath $(mkAbsDir "/home/chris//")
"/home/chris/"
λ> toFilePath $(mkRelDir "foo//bar")
"foo/bar/"
λ> $(mkAbsDir "/home/chris//") </> $(mkRelDir "foo//bar")
"/home/chris/foo/bar/"

Confusing files and directories

Now that the path type is encoded in the type system, our </> operator prevents improper appending:

λ> $(mkAbsDir "/home/chris/") </> $(mkRelFile "foo//bar")
"/home/chris/foo/bar"
λ> $(mkAbsFile "/home/chris") </> $(mkRelFile "foo//bar")
<interactive>:35:1-26:
    Couldn't match type ‘File’ with ‘Dir’

Self-documentation

Now I can read the path like:

{ fooPath :: Path Rel Dir, ... }

And know that this refers to the directory relative to some other path, meaning I should be careful to consider the current directory when using this in IO, or that I’ll probably need a parent to append to it at some point.

In practice

We’ve been using this at FP Complete in a number of packages for some months now, and it’s turned out surprisingly sufficient for most of our path work, with only one bug found. We weren’t sure initially whether it would just be too much of a pain to use, but really it’s quite acceptable given the advantages. You can see its use all over the stack codebase.

Doing I/O

Currently any operations involving I/O can be done by using the existing I/O library:

doesFileExist (toFilePath fp)
readFile (toFilePath fp)

etc. This has problems with respect to accidentally running something like:

doesFileExist $(mkRelDir "foo")

But I/O is currently outside the scope of what this package solves. Once you leave the realm of the Path type, invariants are back to being your responsibility.

As with the original version of this library, we’re currently building up a set of functions in a Path.IO module over time that fits our real-world use-cases. It may or may not appear in the path package eventually. It’ll need cleaning up and some consideration of what should really be included.
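
The flavour of such a module is easy to sketch: thin wrappers that keep files and directories apart at the I/O boundary too. The names below are illustrative and not the package’s actual API.

import Path (Dir, File, Path, toFilePath)
import qualified System.Directory as D

doesFileExist :: Path b File -> IO Bool
doesFileExist = D.doesFileExist . toFilePath

doesDirExist :: Path b Dir -> IO Bool
doesDirExist = D.doesDirectoryExist . toFilePath

readTextFile :: Path b File -> IO String
readTextFile = readFile . toFilePath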

Doing textual manipulations

One problem that crops up sometimes is wanting to manipulate paths. Currently the way we do it is via the filepath library and re-parsing the path:

parseAbsFile . flip addExtension "ext" . toFilePath

It doesn’t happen too often, in our experience, to the extent this needs to be more convenient.

Accepting user input

Sometimes you have user input that contains ../. The solution we went with is to have a function like resolveDir:

resolveDir :: (MonadIO m, MonadThrow m)
           => Path Abs Dir -> FilePath -> m (Path Abs Dir)

This will call canonicalizePath, which collapses and normalizes a path, and then we parse with regular old parseAbsDir and we’re cooking with gas. This and others like it might get added to the path package.
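
A minimal sketch of what such a resolveDir could look like follows; the real implementation may differ, but the shape is simply "canonicalize first, then run the strict parser on the result":

import Control.Monad.Catch (MonadThrow)
import Control.Monad.IO.Class (MonadIO, liftIO)
import Path (Abs, Dir, Path, parseAbsDir, toFilePath)
import qualified System.Directory as D
import qualified System.FilePath as FP

resolveDir :: (MonadIO m, MonadThrow m)
           => Path Abs Dir -> FilePath -> m (Path Abs Dir)
resolveDir base rel = do
  -- canonicalizePath collapses "..", "." and duplicate separators
  canonical <- liftIO (D.canonicalizePath (toFilePath base FP.</> rel))
  parseAbsDir canonical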

Comparing with existing path libraries

filepath and system-filepath

The filepath package is intended as the complementary package to be used before parsing into a Path value, and/or after printing from a Path value. The package itself contains no type-safety; instead it contains a range of cross-platform textual operations. Definitely reach for this library when you want to do more involved manipulations.

The system-filepath package is deprecated in favour of filepath.

system-canonicalpath, canonical-filepath, directory-tree

The system-canonicalpath and canonical-filepath packages are both a kind of subset of path. They canonicalize a string into an opaque path, but neither distinguishes directories from files or absolute from relative paths. Useful if you just want a canonical path, but they don’t do anything else.

The directory-tree package contains a sum type of dir/file/etc. but doesn’t distinguish relativity or path type in its operations.

pathtype

Finally, we come to a path library that path is similar to: the pathtype library. There are the same types of Path Abs File / Path Rel Dir, etc.

The points where this library isn’t enough for me are:

  • There is an IsString instance, which means people will use it, and will make mistakes.

  • Paths are not normalized into a predictable format, leaving me unsure when equality will succeed. This is the same problem I encountered in system-filepath. The equality function normalizes, but according to what properties can I reason about it? I don’t know.

System.Path.Posix> ("/tmp//" :: Path a Dir) == ("/tmp" :: Path a Dir)
True
System.Path.Posix> ("tmp" :: Path a Dir) == ("/tmp" :: Path a Dir)
True
System.Path.Posix> ("/etc/passwd/" :: Path a b) == ("/etc/passwd" :: Path a b)
True
System.Path.Posix> ("/tmp//" :: Path Abs Dir) == ("/tmp/./" :: Path Abs Dir)
False
System.Path.Posix> ("/tmp/../" :: Path Abs Dir) == ("/" :: Path Abs Dir)
False
  • Empty string should not be allowed, and introduction of . due to that gets weird:
System.Path.Posix> fmap getPathString (Right ("." :: Path Rel File))
Right "."
System.Path.Posix> fmap getPathString (mkPathAbsOrRel "")
Right "."
System.Path.Posix> (Right ("." :: Path Rel File)) == (mkPathAbsOrRel "")
False
System.Path.Posix> takeDirectory ("tmp" :: Path Rel Dir)
.
System.Path.Posix> (getPathString ("." :: Path Rel File) ==
                    getPathString ("" :: Path Rel File))
True
System.Path.Posix> (("." :: Path Rel File) == ("" :: Path Rel File))
False
  • It has functions like <.>/addExtension which let you insert an arbitrary string into a path.

  • Some functions let you produce nonsense (could be prevented by a stricter type), for example:

System.Path.Posix> takeFileName ("/tmp/" :: Path Abs Dir)
tmp

I’m being a bit picky here, a bit unfair. But the point is really to show the kind of things I tried to avoid in path. In summary, it’s just hard to know where things can go wrong, similar to what was going on in system-filepath.

data-filepath

The data-filepath package is also very similar; I discovered it after writing my own at work and was pleased to see it’s mostly the same. The main differences are:

  • Uses DataKinds for the relative/absolute and file/dir distinction which as I said above is an overhead.
  • Uses a GADT for the path type, which is fine. In my case I wanted to retain the original string which functions that work on the FilePath (String) type already deal with well. It does change the parsing step somewhat, because it parses into segments.
  • It’s more lenient at parsing (allowing .. and trailing .).

The API is a bit awkward for just parsing a directory: it requires a couple of functions to get there (going via WeakFilePath), returns only an Either, and there are no functions like parent. But there’s not much to complain about. It’s a fine library, but I didn’t feel the need to drop my own in favor of it. Check it out and decide for yourself.

Summary

There’s a growing interest in making practical use of well-typed file path handling. I think everyone’s wanted it for a while, but few people have really committed to it in practice. Now that I’ve been using path for a while, I can’t really go back. It’ll be interesting to see what new packages crop up in the coming year; I expect there’ll be more.

June 27, 2015 12:00 AM

June 25, 2015

Douglas M. Auclair (geophf)

Business Interrelations as a Graph

We look at a practical application of Graph Theory to take a complex representation of information and distill it to something 'interesting.' And by 'interesting,' I mean: finding a pattern in the sea of information otherwise obscured (sometimes intentionally) by all the overwhelming availability of data.

We'll be working with a graph database created from biz.csv, so to get this started, load that table into neo4j:

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///[...]/biz.csv" AS csvLine
MERGE (b1:CORP { name: csvLine.contractor })
MERGE (b2:CORP { name: csvLine.contractee })

MERGE (b1)-[:CONTRACTS]->(b2)

We got to a graph of business interrelations this past week (see, e.g.: http://lpaste.net/5444203916235374592) showing who contracts with whom, and came to a pretty picture like this from the Cypher query:

MATCH (o:OWNER)--(p:PERSON)-[]->(c:CORP) RETURN o,p,c

diagram 1: TMI

That is to say: the sea of information. Yes, we can tell that businesses are conducting commerce, but ... so what? That's all well and good, but let's say that some businesses want to mask how much they are actually making by selling their money to other companies, and then getting it paid back to them. This is not an impossibility, and perhaps it's not all that common, but companies are savvy to watch dogs, too, so it's not (usually) going to be an obvious A -[contracts]-> B -[contracts] -> A relationship.

Not usually; sometimes you have to drill deeper, but if you drill too deeply, you get the above diagram, which tells you nothing, because it shares too much information.

(Which you never want to do ... not on the first date, anyway, right?)

But the above, even though noisy, does show some companies contracting with other companies, and then, in turn being contracted by some companies.

So, let's pick one of them. How about company 'YPB,' for example? (Company names are changed to protect the modeled shenanigans)

MATCH (c:CORP)-[:CONTRACTS]->(c1:CORP) WHERE c.name='YPB' RETURN c, c1

diagram 2: tier 1 of YPB

So, in this first tier we see YPB contracting with four other companies. Very nice. Very ... innocuous. Let's push this inquiry to the next tier, is there something interesting happening here?

MATCH (c:CORP)-[:CONTRACTS*1..2]->(c1:CORP) WHERE c.name='YPB' RETURN c, c1

diagram 3: tier 2 of YPB

Nope. Or what is interesting is to see the network of businesses and their relationships (at this point, not interrelationships) begin to extend the reach. You tell your friends, they tell their friends, and soon you have the MCI business-model.

But we're not looking at MCI. We're looking at YPB, which is NOT MCI, I'd like to say for the record.

Okay. Next tier:

MATCH (c:CORP)-[:CONTRACTS*1..3]->(c1:CORP) WHERE c.name='YPB' RETURN c, c1

diagram 4: tier 3 of YPB

Okay, a little more outward growth. Okay. (trans: 'meh') How about the next tier, that is to say: tier 4?

MATCH (c:CORP)-[:CONTRACTS*1..4]->(c1:CORP) WHERE c.name='YPB' RETURN c, c1

diagram 5: tier 4 of YPB

So, we've gone beyond our observation cell, but still we have no loop-back to YPB. Is there none (that is to say: no circular return to YPB)? Let's push it one more time to tier 5 and see if we have a connection.

MATCH (c:CORP)-[:CONTRACTS*1..5]->(c1:CORP) WHERE c.name='YPB' RETURN c, c1


diagram 6: tier 5 with a (nonobvious) cycle of contracts

Bingo. At tier 5, we have a call-back.

But from whom?

Again, we've run into the forest-trees problem in that we see we have a source of YPB, and YPB is the destination as well, but what is the chain of companies that closes this loop? We can't see this well in this diagram, as we have so many companies. So let's zoom in on the company that feeds money back to YPB and see if that answers our question.

MATCH (c:CORP)-[:CONTRACTS]->(c1:CORP)-[:CONTRACTS]->(c2:CORP)-[:CONTRACTS]->(c3:CORP)-[:CONTRACTS]->(c4:CORP)-[:CONTRACTS]->(c5:CORP) WHERE c.name='YPB' AND c5.name='GUB' RETURN c, c1, c2, c3, c4, c5

diagram 7: cycle of contracts from YPB

Aha! There we go. By focusing our query the information leaps right out at us. Behold, we're paying Peter, who pays Paul to pay us back, and it's there, plain as day.

Now, lock them up and throw away the key? No. We've just noted a cyclical flow of contracts, but as to the legality of it, that is: whether it is allowed or whether this is fraudulent activity, there are teams of analysts and lawyers who can sink their mandibles into this juicy case.

No, we haven't determined innocence or guilt, but we have done something, and that is: we've revealed an interesting pattern, a cycle of contracts, and we've also identified the parties to these contracts. Bingo. Win.

The problem analysts face today is diagram 1: they have just too much information, and they spend the vast majority of their time weeding out the irrelevant information to winnow down to something that may be interesting. We were presented with the same problem: TMI. But, by using graphs, we were able to see, firstly, that there are some vertices (or 'companies') that have contracts in and contracts out, and, by investigating further, we were able to see a pattern develop that eventually cycled. My entire inquiry lasted minutes of queries and response. Let's be generous and say it took me a half-hour to expose this cycle.

Data analysts working on these kinds of problems are not so fortunate. Working with analysts, I've observed that:

  1. First of all, they never see the graph: all they see are spreadsheets,
  2. Secondly, it takes days to get to even just the spreadsheets of information
  3. Third, where do they go from there? How do they see these patterns? The learning curve for them is prohibitive, making training a bear, and niching their work to just a few brilliant experts and shunting out able-bodied analysts who are more than willing to help, but just don't see the patterns in grids of numbers.

With the graph-representation, you can run into the same problems, but 
  1. Training is much easier for those who can work with these visual patterns,
  2. Information can be overloaded, leaving one overwhelmed, but then one can very easily reset to just one vertex and expand from there (simplifying the problem-space). And then, the problem grows in scope when you decide to expand your view, and if you don't like that expanse, it's very easy either to reset or to contract that view.
  3. An analyst can focus on a particular thread or materialize that thread on which to focus, or the analyst can branch from a point or branch to associated sets. If a thread is not yielding interesting results, then they can pursue other, more interesting, areas of the graph, all without losing one's place.

The visual impact of the underlying graph (theory) cannot be over-emphasized. "Boss, we have a cycle of contracts!" an analyst says, presenting a spreadsheet that requires explanation, checking and verification. That same analyst comes into the boss's office with diagram 7, and the cycle practically leaps off the page and explains itself. That, coupled with the ease and speed with which these cycles can be explored and materialized visually, makes a compelling case for modeling related data as graphs.

We present, for your consideration: graphs.



Models presented above are derived from various governmental sources, including the Census Bureau, the Department of Labor, the Department of Commerce, and the Small Business Administration.

Graphs calculated in Haskell and stored and visualized in Neo4J

by geophf ([email protected]) at June 25, 2015 03:17 PM

Yesod Web Framework

Cleaning up the Warp APIs

For the last one and a half years, I have been trying to implement HTTP/2 in Warp. Since the HTTP/2 implementations of both Firefox and Chrome require TLS, I'm also trying to improve the performance of WarpTLS. In the process, I need to change the Connection data type. I felt a bit nervous because Connection is exported from the top module, Network.Wai.Handler.Warp. I believe there are only two users of this data type: Warp itself and WarpTLS. Normal users do not use it. So, Connection should be exported from the Internal module. This motivated me to clean up the Warp APIs.

The APIs of Warp 3.0 have the following issues:

  • The top module exports many internal APIs which are prone to change.
  • The Internal module does not export enough internal APIs.

Michael and I decided to clean up the Warp APIs. The following changes will be made in the version 3.1:

  • Already deprecated APIs (settingsFoo) are removed from the top module.
  • In the documentation of the top module, each API is categorized into either standard or internal.
  • The Internal module exports all internal APIs including the internal APIs in the top module.
  • Stop exporting the Buffer and Timeout modules, which have been exported from the top module.

The standard APIs of the top module mainly consist of high-level run functions, Settings-related stuff and necessary data types. We will try to keep these stable.

The internal APIs in the top module will be removed in Warp version 3.2. This is just documented; DEPRECATED pragmas are not added, since there is no simple way to make an API deprecated in the top module but live in the Internal module.

Warp version 3.1 is not released yet but is available from the GitHub repository. We will wait for a week or two to hear users' opinions.

June 25, 2015 09:00 AM

June 24, 2015

FP Complete

Why is stack not cabal?

This blog post is intended to answer two very frequent questions about stack: how is it different from Cabal? And: why was it developed as a separate project instead of being worked on with Cabal?

Before we delve into the details, let’s first deconstruct the premises of the questions. There are really three things that people talk about when they say “Cabal”:

  1. a package metadata format (.cabal files) and specification for a “common architecture for building applications and tools”, aka Cabal-the-spec,
  2. an implementation of the spec as a framework, aka Cabal-the-library,
  3. cabal-install, aka Cabal-the-tool, which is a command-line tool that uses Cabal-the-library.

Stack complies with Cabal-the-spec, both in the sense that it groks .cabal files in their entirety and behaves in a way that complies with the spec (insofar as that is relevant since the spec hasn’t seen any updates in recent years). In fact it was easy for Stack to do, because just like Cabal-the-tool, it is implemented using Cabal-the-library. Therefore, a first answer to the questions at hand is that stack is Cabal: it is 100% compatible with the existing Cabal file format for specifying package metadata, supports exactly the same package build harnesses and is implemented on top of the same reference implementation of the spec as cabal-install, which is just one tool among others using Cabal-the-library. cabal-install and stack are separate tools that both share the same framework. A successful framework at that: Haskell’s ecosystem would not be where it is today without Cabal, which way back in 2004, for the first time in the long history of Haskell made it possible to easily reuse code across projects by standardizing the way packages are built and used by the compiler.

Stack is different in that it is a from-the-ground-up rethink of Cabal-the-tool. So the real questions are: why was a new tool necessary, and why now? We’ll tackle these questions step-by-step in the remainder of this post:

  • What problem does stack address?
  • How are stack’s design choices different?
  • stack within the wider ecosystem

The problem

Stack was started because the Haskell ecosystem has a tooling problem. Like any number of other factors, this tooling problem is limiting the growth of the ecosystem and of the community around it. Fixing this tooling problem was born out of a systematic effort of growth hacking: identify the bottlenecks that hamper growth and remove them one by one.

The fact that Haskell has a tooling problem is not a rumour, nor is it a fringe belief of disgruntled developers. In an effort to collect the data necessary to identify the bottlenecks in the growth of the community, FP Complete conducted a wide survey of the entire community on behalf of the Commercial Haskell SIG. The results are in and the 1,200+ respondents are unequivocal: package management with cabal-install is the single worst aspect of using Haskell. Week after week, Reddit and mailing list posts pop up regarding basic package installation problems using cabal-install. Clearly there is a problem, no matter whether seasoned users understand their tools well, know how to use them exactly right and how to back out gracefully from tricky situations. For every battle-hardened power user, there are 10 enthusiasts willing to give the language a try, if only simple things were simple.

Of a package building and management tool, users expect, out-of-the-box (that means, by default!):

  1. that the tool facilitates combining sets of packages to build new applications, and does not fail without pointing to the solution just because packages advertise conservative bounds on their dependencies;
  2. that the tool ensures that success today is success tomorrow: instructions that worked for a tutorial writer should continue to work for all her/his readers, now and in the future;
  3. that invoking the tool to install one package doesn’t compromise the success of invoking the tool for installing another package;
  4. that much like make, the UI not require the user to remember what previous commands (s)he did or did not run (dependent actions should be run automatically and predictably).

In fact these are the very same desirable properties that Johan Tibell identified in 2012 and which the data supports today. If our tooling does not support them, this is a problem.

Stack is an attempt to fix this problem - oddly enough, by building in at its core much of the same principles that underlie how power users utilize cabal-install successfully. The key to stack’s success is to start from common workflows, choosing the right defaults to support them, and making those defaults simple.

The design

One of the fundamental problems that users have with package management systems is that building and installing a package today might not work tomorrow. Building and installing on my system might not work on your system. Despite typing exactly the same commands. Despite using the exact same package metadata. Despite using the exact same version of the source code. The fundamental problem is: lack of reproducibility. Stack strives hard to make the results of every single command reproducible, because that is the right default. Said another way, stack applies to package management the same old recipe that made the success of functional programming: manage complexity by making the output of all actions proper functions of their inputs. State explicitly what your inputs are. Gain the confidence that the outputs that you see today are the outputs that you see tomorrow. Reproducibility is the key to understandability.

In the cabal workflow, running cabal install is necessary to get your dependencies. It's also a black box which depends on three pieces of global, mutable, implicit state: the compiler and versions of system libraries on your system, the Cabal packages installed in GHC’s package database, and the package metadata du jour downloaded from Hackage (via cabal update). Running cabal install at different times can lead to wildly different install plans, without giving any good reason to the user. The interaction with the installed package set is non-obvious, and arbitrary decisions made by the dependency solver can lead to broken package databases. Due to lack of isolation between different invocations of cabal install for different projects, calling cabal install the first time can affect whether cabal install will work the second time. For this reason, power users use the cabal freeze feature to pin down exactly the version of every dependency, so that every invocation of cabal install always comes up with the same build plan. Power users also build in so-called “sandboxes”, in order to isolate the actions of calling cabal install for building the one project from the actions of calling cabal install for building this other project.

In stack, all versions of all dependencies are explicit and determined completely in a stack.yaml file. Given the same stack.yaml and OS, stack build should always run the exact same build plan. This does wonders for avoiding accidentally breaking the package database, having reproducible behavior across your team, and producing reliable and trustworthy build artifacts. It also makes it trivial for stack to have a user-friendly UI of just installing dependencies when necessary, since future invocations don’t have to guess what the build plan of previous invocations was. The build plan is always obvious and manifest. Unlike cabal sandboxes, isolation in stack is complete: packages built against different versions of dependencies never interfere, because stack transparently installs packages in separate databases (but is smart enough to reuse databases when it is always safe to do, hence keeping build times low).

Note that this doesn’t mean users have to painstakingly write out all package versions longhand. Stack supports naming package snapshots as shorthand for specifying sets of package versions that are known to work well together.
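
For illustration, a minimal stack.yaml along these lines might look as follows. The snapshot name here is only an example, and extra-deps would list any packages taken from outside the snapshot:

resolver: lts-2.17
packages:
- '.'
extra-deps: []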

Other key design principles are portability (work consistently and have a consistent UI across all platforms) and a very short ramp-up phase. It should be easy for a new user with little knowledge of Haskell to write “hello world” in Haskell, package it up and publish it with just a few lines of configuration or none at all. Learning a new programming language is challenge enough that learning a new package specification language is quite unnecessary. These principles are in contrast with those of platform-specific and extremely general solutions such as Nix.

Modularity (do one thing and do it well), security (don’t trust stuff pulled from the Internet unless you have a reason to) and UI consistency are also principles fundamental to the design, and key strategies for keeping the bug count low. But more on that another time.

These have informed the following "nice to have" features compared to cabal-install:

  • multi-package project support (build all packages in one go, test all packages in one go…),
  • depend on experimental and unpublished packages directly, stored in Git repositories, not just Hackage and the local filesystem,
  • transparently install the correct version of GHC automatically so that you don’t have to (and multiple concurrently installed GHC versions work just fine),
  • optionally use Docker for bullet-proof isolation of all system resources and deploying full, self-contained Haskell components as microservices.

The technologies underpinning these features include:

  • Git (for package index management),
  • S3 (for high-reliability package serving),
  • SSL libraries (for secure HTTP uploads and downloads),
  • Docker,
  • many state-of-the-art Haskell libraries.

These technologies have enabled swift development of stack without reinventing the wheel and have helped keep the implementation of stack simple and accessible. With the benefit of a clean slate to start from, we believe stack to be very hackable and easy to contribute to. These are also technologies that cabal-install did not have the benefit of being able to use when it was first conceived some years ago.

Whither cabal-install, stack and other tools

Stack is but one tool for managing packages on your system and building projects. Stack was designed specifically to interoperate with the existing frameworks for package management and package building, so that all your existing packages work as-is with stack, but with the added benefit of a modern, predictable design. Because stack is just Cabal under the hood, other tools such as Halcyon for deployment and Nix complement stack nicely, as indeed does cabal-install for those who prefer to work with a UI that they know well. We have already heard reports of users combining these tools to good effect. And remember: stack packages are cabal-install packages are super-new-fangled-cabal-tool packages. You can write the exact same packages in stack or in another tool, using curated package sets if you like, tight version bounds à la PVP if you like, or none of these at all. stack likes to make common usage easy but is otherwise very much policy agnostic.

Stack is a contributor-friendly project, with 18 contributors to the code already in its very short existence, several times as many bug reporters and documentation writers, and counting! Help make stack a better tool that suits your needs by filing bug reports and feature requests, improving the documentation and contributing new code. Above all, use stack and tell your friends about it. We hope stack will eliminate a documented bottleneck to community growth, and turn Haskell into a more productive language accessible to many more users.

June 24, 2015 01:00 PM

June 23, 2015

Noam Lewis

In Python, don’t initialize local variables unnecessarily

A common pattern:

def foo():
    x = some_value  # but do we need this? (short answer: no)
    if something:
        # ... do stuff ...
        x = 'bla'
    else:
        x = 'blo'

The variable x is being initialized before the if/else, but the intention of the programmer is that its value will actually be determined by the if/else itself. If somebody later comes around and mistakenly removes one of the assignments (inside ‘if’ or ‘else’), no runtime error will occur and x will remain initialized to a probably wrong value.

Leaving out the initialization is better – in that case, forgetting to set x in one of the branches will cause an UnboundLocalError:

>>> def foo():
...     if False:
...         x = 0
...     return x
... 
>>> foo()
Traceback (most recent call last):
  File "", line 1, in 
  File "", line 4, in foo
UnboundLocalError: local variable 'x' referenced before assignment

Errors are good! (when they flag buggy code)

Now, what if we also have an x declared in the global scope? Because of how Python handles variable scope, the error will still happen (which is good).

>>> x = 1
>>> def foo():
...     if False:
...         x = 0
...     return x
... 
>>> foo()
Traceback (most recent call last):
..
UnboundLocalError: local variable 'x' referenced before assignment

Summary: In Python, don’t initialize variables until you know what value to assign to them.


by sinelaw at June 23, 2015 06:07 PM

FP Complete

stack 0.1 released

A few weeks ago, we announced the first public beta of stack, a new build tool for Haskell. Since then we've seen a huge amount of work on stack: code contributions, feature requests, bug reports, and design discussions. In other words, the response we've had from the community has been amazing. Thank you for helping us push forward with improved Haskell tooling.

Today we're announcing a new milestone: we believe that stack is now stable enough to be the standard build tool for most Haskell development. We're taking the beta label off of stack, and recommending people dive in. Please keep in mind that stack is still young software, and there are likely corner cases that haven't been worked out fully. However, the feedback we've received from other users has indicated that stack is ready. The stack team itself (both inside and outside of FP Complete) has been using stack for months now as its primary tool. And at FP Complete, we've already moved our biggest projects over to it.

Relevant links:

One question which I've personally held off on addressing until now is a direct comparison between stack and other build tools, cabal being the most obvious point of comparison. Expect a blog post on the subject in the near future. For now, my recommendation is: try out stack, and see if you like it.

Bringing Haskell to the masses

While stack started with the needs of existing Commercial Haskell users, the goal we've always had in mind for it is a tool that lowers the barrier to entry for non-Haskellers. As we've discussed before, FP Complete has done quite a bit of research on this topic, and the data shows that build tool issues have been an obstacle to adoption. We want stack to solve that, and we want your help.

Firstly: keep doing what you've been doing! Using stack, testing it, providing feedback, reporting bugs; all of these make stack better for both our existing community and for newcomers. But the next step is marketing Haskell outside of our community. All of us Haskellers already know how wonderful a language Haskell is. Let's send a clear message to the rest of the world that it's time they try Haskell. stack's aim is to remove those initial barriers. But now we need to tell people what Haskell has to offer.

If you're interested in getting involved in this kind of a push, please join and discuss on the Commercial Haskell mailing list. There's no preexisting formula for how this should be done. But the basic idea I have right now is:

  • Give users a clear message on how to get up and running with Haskell based on stack
  • Give a good example of a strength Haskell has
  • Make it easy for people to play around with the concept with some sample code

I'm sure others will have better ideas than I on how to accomplish this. If you want to see Haskell take off as not only a successful, powerful language (which it is now), but as a more popular language too, come get involved!

June 23, 2015 05:00 PM

June 22, 2015

Philip Wadler

How badly will Audible misuse my contact list?


In today's world, where our books, music, and photos belong to the Cloud rather than to ourselves, one problem we face is commercial concerns insisting on increased access to personal data.

I received the following note from Audible:

It looks like you may have an older version of Audible app installed on your device which needs to be updated before 6/30/15 to ensure that you enjoy uninterrupted access to your library.
Option 1: Continue to use the older version of the app.
If you receive an error message when you attempt to sign in, look in your emails for a password that you will need for sign in.
OR
Option 2 (Recommended): Upgrade to the latest version.
...

Warmest Regards,
The Audible Team
What the note doesn't mention is that updating the app requires giving Audible access to my contacts list.

Does anyone know how Audible is using the contact list? Worst case scenario is they email advertisements to my friends in my name, telling them what I am currently reading.

Do customers have any legal redress? Changing the terms of service to require access to the customer's contact list is the sort of thing the law should protect against.


by Philip Wadler ([email protected]) at June 22, 2015 04:54 PM

Daniil Frumin

Darcs binaries for OSX with Homebrew

Recently I’ve updated my Darcs homebrew build to Darcs 2.10. You can install it with

brew install http://darcs.covariant.me/darcs.rb

The formula contains a context (--exact-version) and it is a static binary.


Tagged: darcs, haskell, homebrew

by Dan at June 22, 2015 12:33 PM

Mark Jason Dominus

My week at Recurse Center

In late April I served a residency at Recurse Center, formerly known as Hacker School. I want to write up what I did before I forget.

Recurse Center bills itself as being like a writer's retreat, but for programming. Recursers get better at programming four days a week for three months. There are some full-time instructors there to help, and periodically a resident, usually someone notable, shows up for a week. It's free to students: RC partners with companies that then pay it a fee if they hire a Recurser.

I got onto the RC chat system and BBS a few weeks ahead and immediately realized that it was going to be great. I am really wary about belonging to groups, but I felt like I fit right in at RC, in a way that I hadn't felt since I went off to math camp at age 14. Recurse Center isn't that different from math camp now that I think about it.

The only prescribed duty of a resident is to give a half-hour talk on Monday night, preferably on a technical topic. I gave mine on the history and internals of lightweight hash structures in programming languages like Python and Perl. (You can read all about that if you want to.)

Here's what else I did:

  1. I gave a bunch of other talks: two on Git, one on calculating with continued fractions, one on how the Haskell type inferencer works, one on the topology of data types, one on the Unix process model, one on Alien Horrors from the Dawn of Unix. This was too many talks. I didn't have enough energy and time to prepare all of them properly. On the other hand, a lot of people were very complimentary about the talks and said they were very glad that I gave so many. Also, giving talks is a great way to get people familiar with you so that they won't be shy about talking to you or asking you to work with them. But I think I'll cut it down to one per day next time.

  2. Alex Taipale was inspired by my hash talk to implement hashes synthetically in Python, and I paired with her on that for the first part and reviewed her code a couple of times after. It was really fun to see how she went about it.

  3. Libby Horacek showed me around the text adventure game she wrote in Haskell. I had the first of several strokes of luck here. Libby had defined an input format to specify the room layout and the objects, and I observed that it was very similar to Asherah, a project that another Recurser, Michelle Steigerwalt, had done a couple of years before. I found this out because I read everyone's self-posted bio ahead of time and browsed the interesting-sounding links.

  4. Aditya Mukerjee was implementing Git in Go. He wanted help deciphering the delta format. Later I paired with Aditya again and we debugged his implementation of the code that expanded the deltas back into complete files. I hadn't known any Go but it's easy to pick up.

  5. Geoffrey Gilmore had read my ancient article on how to write a regex matcher. He had written his own implementation in Scala and wanted to show it to me. I didn't know any Scala but the code was very clear. Geoffrey had worked out a clever way to visualize the resulting finite automaton: his automaton object had a method that would dump out its graph in the "dot" language, and he could feed that to Graphviz to get it to draw the graph.

  6. I had a conference with Ahmed Abdalla and Joel Burget about SML. The main question they wanted me to answer: Why might they want to look at SML instead of Haskell? This was a stroke of luck: I was unusually well-prepared to answer this question, having written many thousands of lines of SML since about 1993. My answer was unequivocally that there was no reason, SML was obsolete, because it had big problems which had never been solved, and Haskell had been introduced in part to solve, avoid, or finesse these problems.

    For example, nobody knows how to incorporate references into a Hindley-Milner type system. SML has tried at least three methods for doing this over the years. They all suck, and none of them work right. Haskell avoids the whole issue: no references. If you want something like references, you can build a monad that simulates it locally.

    I could probably write a whole blog article about this, so maybe another time.

  7. Libby wanted to pair with me again. She offered me a choice: she was building an e-reader, controlled by a Raspberry Pi, and mounted inside an antique book that she had hollowed out. I would have been willing to try this, although I didn't know anything about Raspberry Pi. But my other choice was very attractive: she was reviving KiSS, an ancient Windows paper-doll app that had been current in the 1990s. People had designed hundreds or thousands of dolls and costumes, which were all languishing because nobody wanted to run the app any more. She wanted to reimplement the dress-up program in Javascript, and port the doll and clothing cels to PNG files. Here I had another stroke of luck. I was already familiar with the program, and I think I have even been into its source code at some point.

    Libby had found that Gimp could load a KiSS cel, so we looked at the Gimp source code to figure out the file format. She had been planning to use libpng to turn the cel into a PNG, but I showed her a better way: convert it into a PPM file and feed it to ppmtopng. This saved a lot of trouble! (I have written a little bit about this approach in the past.) Libby hacked on the Gimp code, grafting her PPM file writing code into the Gimp cel reading code in place of Gimp's internal pixmap operations. It worked!

  8. I talked to Chris Ball about his GitTorrent project. Chris wants to make a decentralized github that doesn't depend on the GitHub company or on their technical infrastructure. He spent a long time trying to make me understand why he wanted to do the project at all and what it was for. I think I eventually got it. It also transpired that Chris knows way more about BitTorrent than I do. I don't think I was much help to Chris.

  9. Jesse Chen paired with me to fix the layout problems that have been troubling my blog for years. We redid the ancient table-based layout that I had inherited from Blosxom ten years ago, converting it mostly to CSS, and fixed a bunch of scrolling problems. We also fixed it to be legible on a phone display, which it previously wasn't. Thanks Jesse!

  10. I had a discussion with Michelle Steigerwalt about big-O notation and how you figure out what an algorithm's big-O-ness is, either from counting lines in the source code or the pseudocode, or from running the algorithm on different-size inputs and timing it. It's fun that you can do the static analysis and then run the program and see it produce the results you predicted.

There was a lot of other stuff. I met or at least spoke with around 90% of the seventy or so Recursers who were there with me. I attended the daily stand-up status meetings with a different group each time. I ate lunch and dinner with many people and had many conversations. I went out drinking with Recursers at least once. The RC principals kindly rescheduled the usual Thursday lightning talks to Monday so I could attend. I met Erik Osheim for lunch one day. And I baked cookies for our cookie-decorating party!

It was a great time, definitely a high point in my life. A thousand thanks to RC, to Rachel Vincent and Dave Albert for essential support while I was there, and to the facilitators, principals, and especially to the other Recursers.

by Mark Dominus ([email protected]) at June 22, 2015 10:49 AM

June 21, 2015

Dimitri Sabadie

HID and MSI keyboards

MSI keyboards

I have an MSI GS60 Ghost Pro 2QE I’m very proud of. It’s sexy and powerful. It comes with a fancy and configurable backlit keyboard.

There’s a tool called SteelSeries Engine for Windows we can use to change the colors of the keyboard. It supports several features:

  • changing colors of three parts of the keyboard (left, middle and right);
  • changing modes (normal, breathe, wave, demo, gaming).

Unfortunately, that software doesn’t work on Linux, even with wine. I tried hard to make it work and never actually found a way to run it. Then, I decided to look for alternatives and… found nothing working.

Yesterday, I tried a node.js-powered tool called msi-keyboard. And it worked. However, its user interface was not my cup of tea, so I decided to dig in to understand how it works and to write my own tool with a decent interface.

HID access

The key idea is that such keyboards are just HID devices. As a Haskell programmer, I looked for something that would grant me access to such devices, but nothing worked. There’s a hidapi package, but it doesn’t work and has bugs.

I didn’t give up though. I wrote my own Haskell HID API binding, called hid. Several good things about it:

  • it does work;
  • it’s simple;
  • I recently wrote a piece of software using it.

Feel free to install and use it!

msi-kb-backlit

Then, I wrote another Haskell package, msi-kb-backlit. It might require super user rights to work. If you’re not a Haskeller, you can find installation details here.

Note: if you use Archlinux, you can directly download msi-kb-backlit through the AUR! Search for msi-kb-backlit with yaourt, or download the tarball.

The software has an embedded documentation to help you tweak with colors and modes. ;)

Feel free to use all those pieces of software. I made them with love for you all!

Enjoy your week end, and keep the vibe!

by Dimitri Sabadie ([email protected]) at June 21, 2015 10:48 PM

Christopher Done

Existentials and the heterogenous list fallacy

An oft-stated argument against static typing is that heterogenous lists are unreasonably difficult to model. Why is static typing so difficult? Why can’t it just be like dynamic typing? This is a specious argument.

For example, in one article I read, I saw:

In fact you can program heterogeneous lists in dependently typed languages, but it’s unreasonably complicated. Python makes no complaints:

>>> g = lambda x: x**2
>>> [1, 'a', "hello", g]
[1, 'a', 'hello', <function <lambda> at 0x103e4aed8>]

To me this is one methodological weakness of type theory […]

(I’m not sure what “methodological weakness” is supposed to mean, but let’s ignore that.)

There are two problems with this argument and demonstration:

  1. It’s contrived. I’ve written about as much Emacs Lisp and JavaScript as I have written Haskell and C#, and I cannot with all intellectual honesty remember wanting a heterogenous list.1
  2. It’s ill-defined. What is this data structure? What can I assume about the elements so that I can write operations generically? Which, I presume, is the only reason I would be using a list in the first place (otherwise a record would be the correct thing to use); to write algorithms that apply generally to any index.

Even cutting the author some slack and assuming they might want to just temporarily put some things together as a tuple, static languages have tuples, which are heterogenous.

When you look at it beyond the superficial, it’s rather odd.

Regardless, I am sporting. Some people will say, yes, okay, it’s contrived, and never really arises, but if I really wanted this, how could I do it in a statically typed language? So here is the above in Haskell.

Let’s look at the example:

>>> g = lambda x: x**2
>>> [1, 'a', "hello", g]
[1, 'a', 'hello', <function <lambda> at 0x103e4aed8>]

So the list contains a bunch of disparate things and the implicit invariant here is that we can print each of them out. So we can model that with an existential data type Py (for “Python”) that holds some type a that is showable.

data Py = forall a. Show a => Py a
instance Show Py where show (Py s) = show s

(Oh, Haskell doesn’t define an instance for printing functions, so let’s use instance Show (a -> b) where show _ = "<function>" to vaguely match Python.)

I may not know, or care, what the type is, but I at least need to know something about it, in a duck-typing kind of way. If it walks like a duck, quacks like a duck, etc. then it’s a good enough duck for my purposes. In this case, Py says, is it at least showable?

Now we can wrap up any type which is an instance of Show with that:

λ> [Py 1,Py 'a',Py "hello",Py (\x -> x ** 2)]
[1,'a',"hello",<function>]

And we can apply the show method (from the Show class) to each element generically:

λ> map (\(Py p) -> show p) [Py 1,Py 'a',Py "hello",Py (\x -> x ** 2)]
["1","'a'","\"hello\"","<function>"]

So that’s how to do it.

Continuing the discussion, if I extend the type to support also Num (numbers),

data Py = forall a. (Show a,Num a) => Py a

then I can use *, but only for actual numbers. If I try something else I get a type error:

λ> map (\(Py p) -> Py (p*2)) [Py 1,Py 'c']
<interactive>:634:33:
    No instance for (Num Char) arising from a use of ‘Py’
    In the expression: Py 'c'
λ> map (\(Py p) -> Py (p*2)) [Py 1,Py 2.0]
[2,4.0]

Doing the same in Python, we have more or less the same error:

>>> def pow2(x): return x ** 2
...
>>> list(map(pow2,[1,5,'a']))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in pow2
TypeError: unsupported operand type(s) for ** or pow(): 'str' and 'int'

The difference being that the Haskell error was signalled statically at compile-time and the Python error was signalled dynamically at run-time.


  1. To demonstrate my good faith, I have constructed heterogenous lists in Emacs Lisp to model a record, but this is because Emacs Lisp lacks native records. JavaScript, on the other hand, has records.

June 21, 2015 12:00 AM

June 20, 2015

Joachim Breitner

Running circle-packing in the Browser, now using GHCJS

Quite a while ago, I wrote a small Haskell library called circle-packing to pack circles in a tight arrangement. Back then, I used the Haskell to JavaScript compiler fay to create a pretty online demo of that library, and shortly after, I created the identical demo using haste (another Haskell to JavaScript compiler).

The main competitor of these two compilers, and the most promising one, is GHCJS. Back then, it was too annoying to install. But after two years, things have changed, and it only takes a few simple commands to get GHCJS running, so I finally created the circle packing demo in a GHCJS variant.

Quick summary: Cabal integration is very good (like haste, but unlike fay), interfacing JavaScript is nice and easy (like fay, but unlike haste), and a quick check seems to indicate that it is faster than either of these two. I should note that I did not update the other two demos, so they represent the state of fay and haste back then, respectively.

With GHCJS now available at my fingertips, maybe I will produce some more Haskell to be run in your browser. For example, I could port FrakView, a GUI program to render, explore and explain iterated function systems, from GTK to HTML.

by Joachim Breitner ([email protected]) at June 20, 2015 08:50 PM

Dominic Steinitz

Some Background on Hidden Markov Models

Introduction

If you look at the wikipedia article on Hidden Markov Models (HMMs) then you might be forgiven for concluding that these deal only with discrete time and finite state spaces. In fact, HMMs are much more general. Furthermore, a better understanding of such models can be helped by putting them into context. Before actually specifying what an HMM is, let us review something of Markov processes. A subsequent blog post will cover HMMs themselves.

Markov Process and Chains

Recall that a transition kernel is a mapping \mu : X \times {\cal{Y}} \rightarrow \overline{\mathbb{R}}_{+} where (X, {\cal{X}}) and (Y, {\cal{Y}}) are two measurable spaces such that \mu(s, \cdot) is a probability measure on {\cal{Y}} for all s \in X and such that \mu(\cdot, A) is a measurable function on X for all A \in {\cal{Y}}.

For example, we could have X = Y = \{a, b\} and {\cal{X}} = {\cal{Y}} = \{\emptyset, \{a\}, \{b\}, \{a,b\}\} and \mu(a,\{a\}) = 0.4, \mu(a,\{b\}) = 0.6, \mu(b,\{a\}) = 0.6, \mu(b,\{b\}) = 0.4. Hopefully this should remind you of the transition matrix of a Markov chain.

Recall further that a family of such transitions \{\mu_t\}_{t \in T} where T is some index set satisfying

\displaystyle  \mu_{t+s}(x, A) = \int_{Y} \mu_s(x, {\mathrm{d}y})\mu_t(y, A)

gives rise to a Markov process (under some mild conditions — see Rogers and Williams (2000) and Kallenberg (2002) for much more detail), that is, a process in which what happens next only depends on where the process is now and not how it got there.

Let us carry on with our example and take T = \mathbb{N}. With a slight abuse of notation and since Y is finite we can re-write the integral as a sum

\displaystyle  \mu_{n+m}(x, z) = \sum_{y \in Y} \mu_m(x, y)\mu_n(y, z)

which we recognise as a restatement of how Markov transition matrices combine.
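
As a purely illustrative aside (not part of the original post), the two-state example and this composition rule fit in a few lines of Haskell, with kernels represented as functions and the integral reduced to a sum over the intermediate state:

data S = A | B deriving (Eq, Show, Enum, Bounded)

type Kernel = S -> S -> Double

mu1 :: Kernel
mu1 A A = 0.4
mu1 A B = 0.6
mu1 B A = 0.6
mu1 B B = 0.4

-- Chapman-Kolmogorov: mu_{n+m}(x, z) = sum over y of mu_m(x, y) * mu_n(y, z)
composeK :: Kernel -> Kernel -> Kernel
composeK m n x z = sum [ m x y * n y z | y <- [minBound .. maxBound] ]

-- e.g. composeK mu1 mu1 A A == 0.4*0.4 + 0.6*0.6 == 0.52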

Some Examples

A Fully Deterministic System

A deterministic system can be formulated as a Markov process with a particularly simple transition kernel given by

\displaystyle  \mu_t(x_s, A) = \delta(f_t(x_s), A) \triangleq  \begin{cases}  1           & \text{if } f_t(x_s) \in    A \\  0           & \text{if } f_t(x_s) \notin A  \end{cases}

where f_t is the deterministic state update function (the flow) and \delta is the Dirac delta function.

Parameters

Let us suppose that the deterministic system depends on some time-varying values for which we are unable or unwilling to specify a deterministic model. For example, we may be considering a predator-prey model where the parameters cannot explain every aspect. We could augment the deterministic kernel in the previous example with

\displaystyle  \mu_t(\theta_s, {\mathrm{d}\phi}) =  {{\cal{N}} \left( {\mathrm{d}\phi} \,\vert\, \theta_s, \tau^2(t-s) \right)}

where we use Greek letters for the parameters (and Roman letters for state) and we use e.g. {\mathrm{d}\phi} to indicate probability densities. In other words, the parameters tend to wiggle around like Brown’s pollen particles rather than remaining absolutely fixed.

Ornstein-Uhlenbeck

Of course Brownian motion or diffusion may not be a good model for our parameters; with Brownian motion, the parameters could drift off to \pm\infty. We might believe that our parameters tend to stay close to some given value (mean-reverting) and use the Ornstein-Uhlenbeck kernel.

\displaystyle   \mu_t(\theta_s, {\mathrm{d}\phi}) =  {{\cal{N}} \left( {\mathrm{d}\phi} \,\vert\, \alpha + (\theta_s - \alpha)e^{-\beta t},\frac{\sigma^2}{2\beta}\big(1 - e^{-2\beta t}\big) \right)}

where \beta expresses how strongly we expect the parameter to respond to perturbations, \alpha is the mean to which the process wants to revert (aka the asymptotic mean) and \sigma^2 expresses how noisy the process is.
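
As a small sketch of my own (none of this code appears in the post), the kernel above can be sampled directly from its Gaussian form using only System.Random, with a Box-Muller transform supplying the standard normal draw:

import System.Random (randomRIO)

-- One draw from mu_t(theta_s, .) with asymptotic mean alpha,
-- mean-reversion rate beta and noise level sigma.
ouStep :: Double -> Double -> Double -> Double -> Double -> IO Double
ouStep alpha beta sigma t thetaS = do
  z <- standardNormal
  let mean = alpha + (thetaS - alpha) * exp (-beta * t)
      var  = sigma * sigma / (2 * beta) * (1 - exp (-2 * beta * t))
  return (mean + sqrt var * z)

-- Box-Muller: a standard normal from two uniform draws.
standardNormal :: IO Double
standardNormal = do
  u1 <- randomRIO (1.0e-12, 1.0)
  u2 <- randomRIO (0.0, 1.0)
  return (sqrt (-2 * log u1) * cos (2 * pi * u2))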

It is sometimes easier to view these transition kernels in terms of stochastic differential equations. Brownian motion can be expressed as

\displaystyle   \mathrm{d}X_t = \sigma\mathrm{d}W_t

and Ornstein-Uhlenbeck can be expressed as

\displaystyle   \mathrm{d}X_t = -\beta(X_t - \alpha)\mathrm{d}t + \sigma\mathrm{d}W_t

where W_t is the Wiener process.

Let us check that the latter stochastic differential equation gives the stated kernel. Re-writing it in integral form and without loss of generality taking s= 0

\displaystyle   X_t = \alpha + (x_0 - \alpha)e^{-\beta t} + \sigma\int_0^t e^{-\beta(t - s)}\mathrm{d}W_s

Since the integral is of a deterministic function, the distribution of X_t is normal. Thus we need only calculate the mean and variance.

The mean is straightforward.

\displaystyle   \mathbb{E}[X_t \,\vert\, X_0 = x_0] =  \mathbb{E}\Bigg[\alpha + (x_0 - \alpha)e^{-\beta t} + \sigma\int_0^t e^{-\beta(t - s)}\mathrm{d}W_s \,\vert\, X_0 = x_0\Bigg] =  \alpha + (x_0 - \alpha)e^{-\beta t}

Without loss of generality assume t \leq u and writing \mathbb{C} for covariance

\displaystyle   \begin{aligned}  \mathbb{C}[X_u, X_t \,\vert\, X_0 = x_0] &=  \mathbb{E}\Bigg[\Bigg( \sigma\int_0^u e^{-\beta(u - s)}\mathrm{d}W_s  \Bigg)                  \Bigg( \sigma\int_0^t e^{-\beta(t - s)}\mathrm{d}W_s  \Bigg)\Bigg] \\  &=  \sigma^2e^{-\beta(u + t)}  \mathbb{E}\Bigg[\Bigg(\int_0^u e^{\beta s}\mathrm{d}W_s\Bigg)            \Bigg(\int_0^t e^{\beta s}\mathrm{d}W_s\Bigg)\Bigg] \\  &=  \sigma^2e^{-\beta(u + t)}  \mathbb{E}\Bigg[\Bigg(\int_0^t e^{\beta s}\mathrm{d}W_s + \int_t^u e^{\beta s}\mathrm{d}W_s\Bigg)            \Bigg(\int_0^t e^{\beta s}\mathrm{d}W_s\Bigg)\Bigg]  \end{aligned}

And now we can use Ito and independence

\displaystyle   \begin{aligned}  \mathbb{C}[X_u, X_t \,\vert\, X_0 = x_0] &=  \sigma^2e^{-\beta(u + t)}\mathbb{E}\Bigg[ \int_0^t e^{2\beta s}\mathrm{d}s  \Bigg] \\  &=  \frac{\sigma^2e^{-\beta(u + t)}}{2\beta}\big(e^{2\beta t} - 1\big)  \end{aligned}

Substituting in t = u gives the desired result.

Bibliography

Kallenberg, O. 2002. Foundations of Modern Probability. Probability and Its Applications. Springer New York. http://books.google.co.uk/books?id=TBgFslMy8V4C.

Rogers, L. C. G., and David Williams. 2000. Diffusions, Markov Processes, and Martingales. Vol. 1. Cambridge Mathematical Library. Cambridge: Cambridge University Press.


by Dominic Steinitz at June 20, 2015 06:20 PM

Neil Mitchell

Announcing the 'extra' library

Summary: I wrote an extra library, which contains lots of my commonly used functions.

When starting to write Bake, my first step was to copy a lot of utility functions from Shake - things like fst3 (select the first element of a triple), concatMapM (monadic version of concatMap), withCurrentDirectory (setCurrentDirectory under bracket). None of these functions are specific to either Bake or Shake, and indeed, many of the ones in Shake had originally come from HLint. Copying and pasting code is horrible, so I extracted the best functions into a common library which I named extra. Unlike the copy/paste versions in each package, I then wrote plenty of tests, made sure the functions worked in the presence of exceptions, did basic performance optimisation and filled in some obvious gaps in functionality.

I'm now using the extra library in all the packages above, plus things like ghcid and Hoogle. Interestingly, I'm finding my one-off scripts are making particularly heavy use of the extra functions. I wrote this package to reduce my maintenance burden, but welcome other users of extra.

My goal for the extra library is simple additions to the standard Haskell libraries, just filling out missing functionality, not inventing new concepts. In some cases, later versions of the standard libraries provide the functions I want, so in those cases extra makes them available all the way back to GHC 7.2, reducing the amount of CPP in my projects. A few examples:

The module Extra documents all functions provided by the library, so is a good place to go to see what is on offer. Modules such as Data.List.Extra provide extra functions over Data.List and also reexport Data.List. Users are recommended to replace Data.List imports with Data.List.Extra if they need the extra functionality.
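
For instance, here is a tiny sketch of my own using only functions the post itself mentions; the module names are those under which extra documents them, to the best of my recollection:

import Control.Monad.Extra (concatMapM, whenJust)
import Data.Tuple.Extra (fst3)

main :: IO ()
main = do
  print (fst3 (1 :: Int, "two", 'c'))               -- first element of a triple
  xs <- concatMapM (\n -> return [n, n * 10]) [1, 2, 3 :: Int]
  print xs                                          -- monadic concatMap
  whenJust (Just "hello") putStrLn                  -- act only on a Just value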

Which functions?

When selecting functions I have been guided by a few principles.

  • I have been using most of these functions in my packages - they have proved useful enough to be worth copying/pasting into each project.
  • The functions follow the spirit of the original Prelude/base libraries. I am happy to provide partial functions (e.g. fromRight), and functions which are specialisations of more generic functions (whenJust).
  • Most of the functions have trivial implementations. If a beginner couldn't write the function, it probably doesn't belong here.
  • I have defined only a few new data types or type aliases. It's a package for defining new utilities on existing types, not new types or concepts.

Testing

One feature I particularly like about this library is that the documentation comments are tests. A few examples:

Just True ||^ undefined  == Just True
retry 3 (fail "die") == fail "die"
whenJust (Just 1) print == print 1
\x -> fromRight (Right x) == x
\x -> fromRight (Left x) == undefined

These equalities are more semantic equality than Haskell's value equality. Things with lambda's are run through QuickCheck. Things which print to stdout are captured, so the print 1 test really does a print, which is scraped and compared to the LHS. I run these tests by passing them through a preprocessor, which spits out this code, which I then run with some specialised testing functions.

by Neil Mitchell ([email protected]) at June 20, 2015 02:46 PM

June 19, 2015

Cartesian Closed Comic

Neil Mitchell

Maximum Patches, Minimum Time

Summary: Writing a fast continuous integration server is tricky.

When I started writing the continuous integration server Bake, I thought I had a good idea of what would go fast. It turned out I was wrong. The problem Bake is trying to solve is:

  • You have a set of tests that must pass, which take several hours to run.
  • You have a current state of the world, which passes all the tests.
  • You have a stream of hundreds of incoming patches per day that can be applied to the current state.
  • You want to advance the state by applying patches, ensuring the current state always passes all tests.
  • You want to reject bad patches and apply good patches as fast as possible.

I assume that tests can express dependencies, such as you must compile before running any tests. The test that performs compilation is special because it can go much faster if only a few things have changed, benefiting from incremental compilation.

Both my wrong solution, and my subsequent better solution, are based on the idea of a candidate - a sequence of patches applied to the current state that is the focus of testing. The solutions differ in when patches are added/removed from the candidate and how the candidates are compiled.

A Wrong Solution

My initial solution compiled and ran each candidate in a separate directory. When the directory was first created, it copied a nearby candidate to try and benefit from incremental compilation.

Each incoming patch was immediately included in the candidate, compiled, and run on all tests. I would always run the test that had not passed for the longest time, to increase confidence in more patches. Concretely, if I have run test T1 on patch P1, and P2 comes in, I start testing T2 on the combination of P1+P2. After that passes I can be somewhat confident that P1 passes both T1 and T2, despite not having run T2 on just P1.

If a test fails, I bisect to find the patch that broke it, reject the patch, and immediately throw it out of the candidate.
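
Bisection here is essentially a binary search over prefixes of the candidate. The following is a minimal sketch of that idea in Haskell (my own code, not Bake's), assuming the empty prefix passes and the full candidate fails:

-- Return the first patch whose inclusion makes the test fail.
-- 'passes' runs the tests against a prefix of the candidate.
bisectFailure :: Monad m => ([patch] -> m Bool) -> [patch] -> m (Maybe patch)
bisectFailure passes patches
  | null patches = return Nothing
  | otherwise    = go 0 (length patches)
  where
    -- invariant: the prefix of length lo passes, the prefix of length hi fails
    go lo hi
      | lo + 1 == hi = return (Just (patches !! lo))
      | otherwise    = do
          let mid = (lo + hi) `div` 2
          ok <- passes (take mid patches)
          if ok then go mid hi else go lo mid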

The Problems

There are four main problems with this approach:

  • Every compilation starts with a copy of a nearby candidate. Copying a directory of lots of small files (the typical output of a compiler) is fantastically expensive on Windows.
  • When bisecting, I have to compile at lots of prefixes of the candidate, the cost of which varies significantly based on the directory it starts from.
  • I'm regularly throwing patches out of the candidate, which requires a significant amount of compilation, as it has to recompile all patches that were after the rejected patch.
  • I'm regularly adding patches to the candidate, each of which requires an incremental compilation, but tends to be dominated by the directory copy.

This solution spent all the time copying and compiling, and relatively little time testing.

A Better Solution

To benefit from incremental compilation and avoid copying costs, I always compile in the same directory. Given a candidate, I compile each patch in the series one after another, and after each compilation I zip up the interesting files (the test executables and test data). To bisect, I unzip the relevant files to a different directory. On Windows, unzipping is much more expensive than zipping, but that only needs to be done when bisecting is required. I also only need to zip the stuff required for testing, not for building, which is often much smaller.

When testing a candidate, I run all tests without extending the candidate. If all the tests pass I update the state and create a new candidate containing all the new patches.

If any test fails I bisect to figure out who should be rejected, but don't reject until I've completed all tests. After identifying all failing tests, and the patch that caused each of them to fail, I throw those patches out of the candidate. I then rebuild with the revised candidate and run only those tests that failed last time around, trying to seek out tests where two patches in a candidate both broke them. I keep repeating with only the tests that failed last time, until no tests fail. Once there are no failing tests, I extend the candidate with all new patches, but do not update the state.

As a small tweak, if there are two patches in the queue from the same person, where one is a superset of the other, I ignore the subset. The idea is that if the base commit has an error I don't want to track it down twice, once to the first failing commit and then again to the second one.

Using this approach in Bake

First, the standard disclaimer: Bake may not meet your needs - it is a lot less developed than other continuous integration systems. If you do decide to use Bake, you should run from the git repo, as the Hackage release is far behind. That said, Bake is now in a reasonable shape, and might be suitable for early adopters.

In Bake this approach is implemented in the StepGit module, with the ovenStepGit function. Since Bake doesn't have the notion of building patches in series it pretends (to the rest of Bake) that it's building the final result, but secretly caches the intermediate steps. If there is a failure when compiling, it caches that failure, and reports it to each step in the bisection, so Bake tracks down the correct root cause.

I am currently recommending ovenStepGit as the "best" approach for combining git and an incremental build system with Bake. While any incremental build system works, I can't help but plug Shake, because it's the best build system I've ever written.

by Neil Mitchell ([email protected]) at June 19, 2015 06:40 PM

Mark Jason Dominus

Math.SE report 2015-05

A lot of the stuff I've written in the past couple of years has been on math.StackExchange. Some of it is pretty mundane, but some is interesting. My summary of April's interesting posts was well-received, so here are the noteworthy posts I made in May 2015.

  • What matrix transforms into and transforms into ? was a little funny because the answer is $$\begin{pmatrix}2 & 4 \\ 6 & 8 \end{pmatrix}$$ and yeah, it works exactly like it appears to, there's no trick. But if I just told the guy that, he might feel unnecessarily foolish. I gave him a method for solving the problem and figured that when he saw what answer he came up with, he might learn the thing that the exercise was designed to teach him.

  • Is a “network topology'” a topological space? is interesting because several people showed up right away to say no, it is an abuse of terminology, and that network topology really has nothing to do with mathematical topology. Most of those comments have since been deleted. My answer was essentially: it is topological, because just as in mathematical topology you care about which computers are connected to which, and not about where any of the computers actually are.

    Nobody constructing a token ring network thinks that it has to be a geometrically circular ring. No, it only has to be a topologically circular ring. A square is fine; so is a triangle; topologically they are equivalent, both in networking and in mathematics. The wires can cross, as long as they don't connect at the crossings. But if you use something that isn't topologically a ring, like say a line or a star or a tree, the network doesn't work.

    The term “topological” is a little funny. “Topos” means “place” (like in “topography” or “toponym”) but in topology you don't care about places.

  • Is there a standard term for this generalization of the Euler totient function? was asked by me. I don't include all my answers in these posts, but I think maybe I should have a policy of including all my questions. This one concerned a simple concept from number theory which I was surprised had no name: I wanted to be the number of integers that are no larger than for which . For this is the famous Euler totient function, written .

    But then I realized that the reason it has no name is that it's simply so there's no need for a name or a special notation.

    As often happens, I found the answer myself shortly after I asked the question. I wonder if the reason for this is that my time to come up with the answer is Poisson-distributed. Then if I set a time threshold for how long I'll work on the problem before asking about it, I am likely to find the answer to almost any question that exceeds the threshold shortly after I exceed the threshold. But if I set the threshold higher, this would still be true, so there is no way to win this particular game. Good feature of this theory: I am off the hook for asking questions I could have answered myself. Bad feature: no real empirical support.

  • how many ways can you divide 24 people into groups of two? displays a few oddities, and I think I didn't understand what was going on at that time. OP has calculated the first few special cases:

    1:1 2:1 3:3 4:3 5:12 6:15

    which I think means that there is one way to divide 2 people into groups of 2, 3 ways to divide 4 people, and 15 ways to divide 6 people. This is all correct! But what could the 1:1, 3:3, 5:12 terms mean? You simply can't divide 5 people into groups of 2. Well, maybe OP was counting the extra odd person left over as a sort of group on their own? Then odd values would be correct; I didn't appreciate this at the time.

    But having calculated 6 special cases correctly, why can't OP calculate the seventh? Perhaps they were using brute force: the next value is 48, hard to brute-force correctly if you don't have enough experience with combinatorics.

    I tried to suggest a general strategy: look at special cases, and not by brute force, but try to analyze them so that you can come up with a method for solving them. The method is unnecessary for the small cases, where brute force enumeration suffices, but you can use the brute force enumeration to check that the method is working. And then for the larger cases, where brute force is impractical, you use your method.

    It seems that OP couldn't understand my method, and when they tried to apply it, got wrong answers. Oh well, you can lead a horse to water, etc.

    The other pathology here is:

    I think I did what you said and I got 1.585 times 10 to the 21

    for the case. The correct answer is $$23\cdot21\cdot19\cdot17\cdot15\cdot13\cdot11\cdot9\cdot7\cdot5\cdot3\cdot1 = 316234143225 \approx 3.16\cdot 10^{11}.$$ OP didn't explain how they got so there's not much hope of correcting their weird error.

    This is someone who probably could have been helped in person, but on the Internet it's hopeless. Their problems are Internet communication problems.

  • Lambda calculus typing isn't especially noteworthy, but I wrote a fairly detailed explanation of the algorithm that Haskell or SML uses to find the type of an expression, and that might be interesting to someone.

  • I think Special representation of a number is the standout post of the month. OP speculates that, among numbers of the form (where are prime), the choice of is unique. That is, the mapping is reversible.

    I was able to guess that this was not the case within a couple of minutes, replied pretty much immediately:

    I would bet money against this representation being unique.

    I was sure that a simple computer search would find counterexamples. In fact, the smallest is which is small enough that you could find it without the computer if you are patient.

    The obvious lesson to learn from this is that many elementary conjectures of this type can be easily disproved by a trivial computer search, and I frequently wonder why more amateur mathematicians don't learn enough computer programming to investigate this sort of thing. (I wrote recently on the topic of An ounce of theory is worth a pound of search, and this is an interesting counterpoint to that.)

    But the most interesting thing here is how I was able to instantly guess the answer. I explained in some detail in the post. But the basic line of reasoning goes like this.

    Additive properties of the primes are always distributed more or less at random unless there is some obvious reason why they can't be. For example, let be prime and consider . This must have exactly one of the three forms or for some integer . It obviously has the form almost never (the only exception is ). But of the other two forms there is no obvious reason to prefer one over the other, and indeed of the primes up to 10,000, 611 are of the type and and 616 are of the type .

    So we should expect the value to be distributed more or less randomly over the set of outputs, because there's no obvious reason why it couldn't be, except for simple stuff, like that it's obviously almost always even.

    So we are throwing a bunch of balls at random into bins, and the claim is that no bin should contain more than one ball. For that to happen, there must be vastly more bins than balls. But the bins are numbers, and primes are not at all uncommon among numbers, so the number of bins isn't vastly larger, and there ought to be at least some collisions.

    In fact, a more careful analysis, which I wrote up on the site, shows that the number of balls is vastly larger—to have them be roughly the same, you would need primes to be roughly as common as perfect squares, but they are far more abundant than that—so as you take larger and larger primes, the number of collisions increases enormously and it's easy to find twenty or more quadruples of primes that all map to the same result. But I was able to predict this after a couple of minutes of thought, from completely elementary considerations, so I think it's a good example of Lower Mathematics at work.

    This is an example of a fairly common pathology of math.se questions: OP makes a conjecture that never occurs or that there are no examples with property , when actually almost always occurs or every example has property .

    I don't know what causes this. Rik Signes speculates that it's just wishful thinking: OP is doing some project where it would be useful to have be unique, so posts in hope that someone will tell them that it is. But there was nothing more to it than baseless hope. Rik might be right.

[ Addendum 20150619: A previous version of this article included the delightful typo “mathemativicians”. ]

by Mark Dominus ([email protected]) at June 19, 2015 03:01 AM

Christopher Done

The constraint trick for instances

Ever seen this in a library,

instance (var ~ AType) => ClassName (SomeType var)

and thought, “Shenanigans! Why not just have this?”

instance ClassName (SomeType AType)

Me too!

I only learned of this solution relatively recently, and I know experienced Haskellers who also only understood this recently or still don’t. Hence this quick write up. Here’s the thought process.

We’re writing a trivial pretty printer and we’re using Writer. We write things like:

λ> execWriter (do tell "hello"; tell "world" :: Writer String ())
"helloworld"

Quality. But writing tell every time is so boring! How about we use the IsString class so that we can just write the string literals like this?

do "hello"; "world"

Let’s write the IsString instance:

instance IsString (Writer String a) where
  fromString = tell

What do you say, GHC?

Couldn’t match type ‘a’ with ‘()’

‘a’ is a rigid type variable bound by the instance declaration

Oh. Good point. The type of our tell call results in Writer String (). A small setback. Fine, let's change the instance declaration to just be ():

instance IsString (Writer String ()) where
  fromString = tell

GHC loves it!

Let’s try using it:

λ> execWriter (do "hello"; "world" :: Writer String ())
<interactive>:42:16:
    No instance for (IsString (WriterT String Identity a))
      arising from the literal ‘"hello"’
    The type variable ‘a’ is ambiguous

This displeases me. But it adds up given the type of (>>):

(>>) :: Monad m => m a -> m b -> m b

In _ >> return () :: Writer String (), the type of _ is Writer String a, so we really need an IsString instance that matches that. But we already tried that. Oh, woe!

Some people reading this will be nodding in recognition of this same problem they had while writing that perfect API that just won’t work because of this niggling issue.

Here comes the trick.1 So let’s go back to a basic instance:

data MyTuple a b = MyTuple a b
instance Show (MyTuple a b) where
  show _ = "MyTuple <some value> <some value>"

Suppose I replace this instance with a new instance that has constraints:

instance (Show a,Show b) => Show (MyTuple a b) where
  show (MyTuple a b) = "MyTuple " ++ show a ++ " " ++ show b

Question: Does that change whether GHC decides to pick this new version of instance over others that may be available, compared to the one above? Have a think.

The answer is: nein! The constraints of an instance don’t have anything to do with deciding whether an instance is picked from the list of instances available. Constraints only apply after GHC has already decided it’s going with this instance.

So, cognizant of this obvious-after-the-fact property, let’s use the equality constraint that was introduced with GADTs and type families (enabling either brings in ~):

instance a ~ () => IsString (Writer String a) where
  fromString = tell

Let’s try it:

λ> execWriter (do "hello" ; "world" :: Writer String ())
"helloworld"

This instance is picked by GHC, as we hoped, because of the a. The instance method also type checks, because the constraint applies when type checking the instance methods, just like if you write a regular declaration like:

foo :: (a ~ ()) => a
foo = ()

That’s it! This crops up in a number of my own libraries and knowing this really helped me. Here is a real example from my lucid library:

instance (a ~ (),Monad m) => Monoid (HtmlT m a) where
  mempty  = return mempty
  mappend = liftM2 mappend

Hope this was helpful!


  1. Actually, it’s a natural consequence to grokking how instance resolution works (but calling it a “trick” makes for a catchy title).

June 19, 2015 12:00 AM

June 17, 2015

Robin KAY

Fast and Slow Sticky Notes in Haskell and Qt

Last year I gave a talk for the London Haskell User Group on building a Sticky Notes application using Haskell, Qt, and the HsQML binding between them. The video has been out for a while now and made the rounds elsewhere, but I've thus far neglected to post it on my own blog! Slides are here. See below for an update on the example program and an answer to the most commonly asked question after the fact.

Video: https://www.youtube.com/embed/JCSxWfUvi6o

Fast and Slow? An update!

For the sake of simplicity, the original version of the sticky notes program shown in the talk uses an SQLite database to serve the view directly, with no in-memory model or caching. This causes at best some sluggishness and at worst pathological performance, depending on your machine.

As alluded to in the talk, there is a better albeit more complicated way. This involves moving the slow database accesses onto a separate thread and keeping a fast in-memory model to support the user interface. Changes made by the user are now queued up and then flushed to disk asynchronously.

I've now released an updated version (0.3.3.0) of the hsqml-demo-notes package which includes a new faster version of the program illustrating exactly this technique. This hsqml-notes executable is now built using the new code and, for reference, the original code is built into the hsqml-notes-slow executable.

The fast variant uses three MVars to communicate between the UI and database threads. An MVar is a kind of synchronisation primitive offered by Haskell, akin to a box which may either contain a data value or be empty. Operations on MVars take out or put in values to the box and will block if the MVar is not in the appropriate state to accept or produce a value. Hence, a variety of different constructs such as locks and semaphores can be built out of MVars.

The first MVar in the new notes code, modelVar, contains a Map from note IDs to the data associated with each note. This is the in-memory model. It includes all the fields held in the database table plus an additional field which indicates whether there are any pending changes which need flushing to the database (whether the record is "dirty"). The MVar semantics here act as a lock to prevent more than one thread trying to manipulate the model at the same time.

A second MVar, cmdVar, is used as a shallow channel for the UI thread to signal the database thread when there is work to do. The database thread normally waits blocked on this MVar until a new command value is placed in it, at which point it takes it out and acts upon it. The first command given to the database thread when the program starts is to populate the model with the data stored on disk. Thereafter, whenever a user makes a change to the model, the dirty bit is set on the altered record and a command is issued to the database thread to write those dirty records to the database.

Finally, the third possible type of command causes the database thread to close the SQLite file and cleanly exit. In that case, the third MVar, finVar, is used as a semaphore to signal back to the UI thread once it has shut down cleanly. This is necessary because the Haskell runtime will normally exit once the main thread has finished, and the MVar provides something for it to block on so that the database thread has time to finish cleaning up first.
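
To make the shape of this concrete, here is a stripped-down sketch of the same three-MVar arrangement. The names and the Command type are illustrative only; this is not the actual demo code, which also talks to SQLite and to QML:

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import qualified Data.Map as Map

data Note = Note { noteText :: String, noteDirty :: Bool }
data Command = Populate | FlushDirty | Shutdown

dbThread :: MVar (Map.Map Int Note) -> MVar Command -> MVar () -> IO ()
dbThread modelVar cmdVar finVar = loop
  where
    loop = do
      cmd <- takeMVar cmdVar                -- block until the UI signals work
      case cmd of
        Populate -> do
          modifyMVar_ modelVar (\_ -> return loadFromDisk)
          loop
        FlushDirty -> do
          -- stand-in: the real code writes dirty records to SQLite, then clears the flag
          modifyMVar_ modelVar (return . Map.map (\n -> n { noteDirty = False }))
          loop
        Shutdown ->
          putMVar finVar ()                 -- signal the UI thread we are done
    loadFromDisk = Map.empty                -- stand-in for reading the SQLite file

main :: IO ()
main = do
  modelVar <- newMVar Map.empty
  cmdVar   <- newEmptyMVar
  finVar   <- newEmptyMVar
  _ <- forkIO (dbThread modelVar cmdVar finVar)
  putMVar cmdVar Populate
  putMVar cmdVar Shutdown
  takeMVar finVar                           -- wait for a clean shutdown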


What is the FactoryPool actually for?

QML objects require a relatively explicit degree of handling by Haskell standards because the idea that data values can have distinct identities to one another even if they are otherwise equal is somewhat at odds with Haskell's embrace of referential transparency. This sense of identity is important to the semantics of QML and can't be swept under the rug too easily. Crucially, using signals to pass events from Haskell to QML requires that both the Haskell and QML code are holding on to exactly the same object.

One way to accomplish this is to carefully keep track of the QML objects you create in your own code. The factory-pool is an attempt to provide a convenience layer atop object creation which saves the programmer from having to do this. It is essentially an interning table which enforces the invariant that there is no more than one QML object for each distinct value (according to the Ord type-class) of the Haskell type used to back it. If you query the pool twice with two equal values then it will give you the same object back both times. Importantly, it uses weak references so that objects which are no longer in use are cleared from the intern table and it doesn't grow in size indefinitely.

One of the difficulties people have had with understanding the factory-pool from the notes example is that it isn't strictly necessary to make that example work. Aside from the initial loading of the database, all the activity is driven from the front-end, and so displaying a single view isn't strictly reliant on the signals firing correctly. If you replace the code that retrieves an object from the pool with a plain object instantiation, the default QML front-end for the demo would still work the same, albeit more wastefully.

To see the pool doing something useful, try the "notes-dual" front-end (by specifying it on the command line), which I've come to think of as the most interesting of the demos. It displays two front-ends simultaneously backed by the same data model. Changes in one are immediately reflected in the other. This works because when each front-end retrieves the list of notes, they both obtain the same ObjRef from the pool for each Note. Hence, when a change is made in one front-end, and a change signal is fired, both front-ends receive it and remain in sync.

Without the pool, the onus would be on the application code to keep track of the objects in some other fashion. For example, if each read of the notes property created a new set of objects, then many more objects would need to have signals fired on them in order to ensure that every piece of active QML had received notice of the event. Each front-end would still be backed by the same data, methods and properties would still have access to the same set of Note values, but their objects would be distinct from the perspective of signalling and, unless special care was taken, the front-ends wouldn't update correctly.

by Robin KAY ([email protected]) at June 17, 2015 11:09 PM

Kevin Reid (kpreid)

What I've been working on: ShinySDR

As vaguely promised before, another update on what I've been working on for the past couple of years:

ShinySDR (why yes, I am terrible at naming things) is a software-defined radio receiver application.

Specifically, it is in the same space as Gqrx, SDR#, HDSDR, etc.: a program which runs on your computer (as opposed to embedded in a standalone radio) and uses a peripheral device (rtl-sdr, HackRF, USRP, etc.) for the RF interface. Given such a device, it can be used to listen to or otherwise decode a variety of radio transmissions (including the AM and FM broadcast bands everyone knows, but also shortwave, amateur radio, two-way radios, certain kinds of telemetry including aircraft positions, and more as I get around to it).

ShinySDR is basically my “I want my own one of these” project (the UI still shows signs of “I’ll just do what Gqrx did for now”), but it does have some unique features. I'll just quote myself from the README:

I (Kevin Reid) created ShinySDR out of dissatisfaction with the user interface of other SDR applications that were available to me. The overall goal is to make, not necessarily the most capable or efficient SDR application, but rather one which is, shall we say, not clunky.

Here’s some reasons for you to use ShinySDR:

  • Remote operation via browser-based UI: The receiver can be listened to and remotely controlled over a LAN or the Internet, as well as from the same machine the actual hardware is connected to. Required network bandwidth: 3 Mb/s to 8 Mb/s, depending on settings.

    Phone/tablet compatible (though not pretty yet). Internet access is not required for local or LAN operation.

  • Persistent waterfall display: You can zoom, pan, and retune without losing any of the displayed history, whereas many other programs will discard anything which is temporarily offscreen, or the whole thing if the window is resized. If you zoom in to get a look at one signal, you can zoom out again.

  • Frequency database: Jump to favorite stations; catalog signals you hear; import published tables of band, channel, and station info; take notes. (Note: Saving changes to disk is not yet well-tested.)

  • Map: Plot station locations from the frequency database, position data from APRS and ADS-B, and mark your own location on the map. (Caveat: No basemap, i.e. streets and borders, is currently present.)

Supported modes:

  • Audio: AM, FM, WFM, SSB, CW.
  • Other: APRS, Mode S/ADS-B, VOR.

If you’re a developer, here’s why you should consider working on ShinySDR (or: here’s why I wrote my own rather than contributing to another application):

  • All server code is Python, and has no mandatory build or install step.

  • Plugin system allows adding support for new modes (types of modulation) and hardware devices.

  • Demodulators prototyped in GNU Radio Companion can be turned into plugins with very little additional code. Control UI can be automatically generated or customized and is based on a generic networking layer.

On the other hand, you may find that the shiny thing is lacking substance: if you’re looking for functional features, we do not have the most modes, the best filters, or the lowest CPU usage. Many features are half-implemented (though I try not to have things that blatantly don’t work). There’s probably lots of code that will make a real DSP expert cringe.

Now that I've finally written this introduction post, I hope to get around to further posts related to the project.

At the moment, I'm working on adding the ability to transmit (given appropriate hardware), and secondarily improving the frequency database subsystem (particularly to have a useful collection of built-in databases and allow you to pick which ones you want to see).

Side note: ShinySDR may hold the current record for most popular program I've written by myself; at least, it's got 106 stars on GitHub. (Speaking of which: ShinySDR doesn't have a page anywhere on my own web site. Need to fix that — probably starting with a general topics/radio. Eventually I hope to have a publicly accessible demo instance, but there’s a few things I want to do to make it more multiuser and robust first.)

by Kevin Reid (kpreid) ([email protected]) at June 17, 2015 03:22 PM

June 16, 2015

Dimitri Sabadie

Asset management in a real time 3D engine in Haskell

Foreword

In a real time rendering system, it’s not uncommon to find constructs for dealing with assets. One famous construct is the resource manager. A resource manager is responsible for several tasks, among them:

  • providing a simple interface to load objects from disk (1);
  • ensuring we don’t try to load the same object twice (2);
  • resolving dependencies between resources (3).

The first point is obvious, but the two others are less intuitive. (2) is important when the user might try to load the same object several times – for instance, a car model, a character or a weapon. The best-known strategy to prevent such a situation is to use a software cache.

A software cache – let’s just say cache – is an opaque object that loads the object on the first request, then just returns that object for future same requests. For instance, consider the following requests and the corresponding cache behavior:

  1. load "wood.png" -> not cached ; loading ; return
  2. load "grass.png" -> not cached ; loading ; return
  3. load "wood.png" -> cached ; return
  4. load "metal.png" -> not cached ; loading ; return
  5. load "metal.png" -> cached ; return
  6. etc.

That behavior is very nice because it spares a lot of computation and memory.

(3) is about dependencies. For instance, when you load a car model, you might need to load its textures as well. Well, not necessarily load them from disk every time. Consider the following:

  1. load "car.mesh" -> not cached
    1. load "metal_car.png" -> not cached ; loading ; return
    2. loading ; return
  2. load "metal_car.png" -> cached ; return
  3. load "other_car.mesh" -> not cached
    1. load "metal_car.png" -> cached ; return
    2. return
  4. load "car.mesh" -> cached ; return

You get the idea. (3) needs (2) to be efficient.

Possible implementations

Singleton

In imperative languages, and especially in those that support templates and/or generics, people tend to implement the cache system with an ugly design pattern – which is actually an anti-pattern: the singleton. Each type of resource is assigned a manager via a template parameter, and if a manager needs to load a dependency, it just references the corresponding manager by stating the type in the template parameter:

Model & getResource<Model>(std::string const &name) {
    Texture &dependency = getResource<Texture>(...);
    ...
}

That way of doing things might sound great but, eh, singletons are just global variables with a uniqueness constraint. We don’t want that.

Explicit pure store

We can use an explicit store object. That is, some kind of map. For instance, the store that holds textures would have a type like (in Haskell):

textureStore :: Map String Texture

A model store would have the following type:

modelStore :: Map String Model

And each store is assigned a function: loadTexture, loadModel, and so on.

There are several drawbacks if we go that way. First, we have to carry all the stores around when using their functions, and some functions might need several stores in order to resolve dependencies. Secondly, because the state is explicit, we need to manually accumulate it! A loading function would have a type like the following:

loadTexture :: Map String Texture -> String -> m (Texture, Map String Texture)

That will expose a lot of boilerplate to the user, and we don’t want that.
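
To make that accumulation concrete, here is a hedged sketch of what calling code ends up looking like with explicit stores – loadTexture is assumed to have the type above, and Texture stands for whatever the engine uses:

-- Every call takes the current store and hands back an updated one that we
-- must thread through by hand.
loadLevelTextures :: Map String Texture -> IO ([Texture], Map String Texture)
loadLevelTextures store0 = do
    (wood,  store1) <- loadTexture store0 "wood.png"
    (grass, store2) <- loadTexture store1 "grass.png"
    (metal, store3) <- loadTexture store2 "metal.png"
    pure ([wood, grass, metal], store3)

Every new resource type multiplies this threading, and any function that resolves dependencies has to take and return several stores at once.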

Implicit pure store

We can enhance the explicit store by putting it into some kind of context; for instance, in MonadState. We can then write loadTexture to make it nicer to use:

loadTexture :: (MonadState (Map String Texture) m,...)
            => String
            -> m Texture

There is a problem with that. What happens when we add more types? For instance, what if we want to handle both textures and models? MonadState has a functional dependency – the monad m determines the state type s – which forbids two instances for the same m with different s. The following is not allowed and will raise a compiler error:

instance MonadState (Map String Texture) MyState where
    ...

instance MonadState (Map String Model) MyState where
    ...

The solution to that problem is to make the carried state a polymorphic type and use a typeclass constraint to extract and modify the map we want:

class HasMap a s where
    extractMap :: s -> Map String a
    modifyMap :: (Map String a -> Map String a) -> s -> s

With that, we can do something like this:

loadTexture :: (MonadState s m, HasMap Texture s,...)
            => String
            -> m Texture

loadModel :: (MonadState s m, HasMap Texture s, HasMap Model s,...)
          => String
          -> m Model

However, we hit a new issue here. What are s and m? Well, m doesn’t really matter. For simplicity, let’s say we’re using a monad transformer; that is, we use StateT s m as our monad.

We still need s. The problem is that s has to be provided by the user. Worse, they have to implement all the instances we need so that the loading functions work. That’s less boilerplate than the explicit store solution, but still a lot. Imagine you provide a type for s, like Cache; extending the cache to support new types – like user-defined ones – means yet more boilerplate to write.
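
For instance, a rough sketch of that user-side boilerplate – a concrete Cache type plus the HasMap instances the loading functions require (Texture and Model stand in for the actual types):

data Cache = Cache
    { cacheTextures :: Map String Texture
    , cacheModels   :: Map String Model
    }

instance HasMap Texture Cache where
    extractMap        = cacheTextures
    modifyMap f cache = cache { cacheTextures = f (cacheTextures cache) }

instance HasMap Model Cache where
    extractMap        = cacheModels
    modifyMap f cache = cache { cacheModels = f (cacheModels cache) }

Supporting a new resource type means adding a field to Cache and writing yet another instance – exactly the extra boilerplate being described.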

Closures

The solution I use in my engine might not be the perfect one. It’s not referentially transparent – an important concept in Haskell. However, Haskell is not designed to be used only in convenient situations, and we’ve hit a problematic one. We need to make a compromise between elegance and simplicity.

The solution relies on closures. If you don’t know what a closure is, you should check out the Wikipedia page for a first look.

The idea is that our loading functions will perform some IO operations to load objects. Why not put the cache directly inside that function? We’d have a function with an opaque, invisible associated state. Consider the following:

type ResourceMap a = Map String a

getTextureManager :: (MonadIO m,...)
                  => m (String -> m Texture)
getTextureManager = do
  ref <- liftIO (newIORef empty)
  pure $ \name -> do
    -- we can use the ref ResourceMap to insert / lookup values in the map
    -- thanks to the closure!
    cache <- liftIO (readIORef ref)
    case lookup name cache of                          -- Data.Map’s lookup
      Just tex -> pure tex
      Nothing  -> do
        tex <- loadTextureFromDisk name                -- stand-in for the real IO loading
        liftIO (writeIORef ref (insert name tex cache)) -- Data.Map’s insert
        pure tex

That solution is great because now, a manager is just a function. How would you implement getModelManager? Well:

getModelManager :: (MonadIO m,...)
                => (String -> m Texture)
                -> m (String -> m Model)
getModelManager loadTexture = ...

We can then get the loader functions with the following:

loadTexture <- getTextureManager
loadModel <- getModelManager loadTexture

And you just have to pass those functions around. The cool thing is that you can wrap them in a type in your library and provide a function that initializes them all at once – I do that in my engine. Later on, the user can extend the available managers by providing new functions for new types. In my engine, I provide a few functions like mkResourceManager, which hides the ResourceMap and provides two functions – one for looking up in the map, one for inserting into it.
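
As an illustration of the idea only – this is a guess at the shape, not the engine’s actual API – mkResourceManager could look something like this:

import Control.Monad.IO.Class (MonadIO, liftIO)
import Data.IORef
import qualified Data.Map as M

-- Hides the ResourceMap behind a lookup function and an insert function.
mkResourceManager :: (MonadIO m) => m (String -> m (Maybe a), String -> a -> m ())
mkResourceManager = do
    ref <- liftIO (newIORef M.empty)
    let lookupResource name     = liftIO (M.lookup name <$> readIORef ref)
        insertResource name res = liftIO (modifyIORef ref (M.insert name res))
    pure (lookupResource, insertResource)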

Conclusion

I truly believe that my solution is a good compromise between elegance and ease. It has a lot of advantages:

  • simple to use;
  • simple to implement; you just have to play around with closures;
  • dependency resolution is easy to add and stays hidden once the functions are generated;
  • little runtime overhead (due to closures, though the compiler might optimize it away);
  • can be easily extended by the user to support custom/new types;
  • implementations can replace IORef with TVar or similar objects for thread-safe behaviour;
  • several replicated functions mean several stores (which can be useful in certain cases).

The huge drawback I see in that solution is its opacity. There’s also no way to query the state of each cache. Such a feature could be added by providing a new function, for instance. The same goes for deletion.

I’m one of those Haskellers who love purity. I try to keep my code as pure as I can, but there are exceptions, and this cache problem is one of them.

Feel free to comment, and as always, keep the vibe and happy hacking!

by Dimitri Sabadie ([email protected]) at June 16, 2015 07:01 PM

June 15, 2015

Noam Lewis

Beware of ‘var’ in for loops, JS programmers

Time and time again, I see people using ‘var’ in the initialization part of a for loop. Example from MDN (Mozilla Developer Network):

for (var i = 0; i < 9; i++) {
   console.log(i);
   // more statements
}

What’s wrong with var i = 0 above? The problem is that variables declared in a for initialization have function scope, just like any var declaration does. In other words, they affect the scope of the entire function. Consider the following:

function outer() {
    var x = 'outer';
    function inner() {
        x = 'inner';
        //
        // ... lots o' code
        //
        for (var x = 0; x < 1; x++) {
            // in for
        }
    }
    inner();
}

In the inner function, x shadows the outer variable throughout, not just inside the for loop. So even the assignment x = 'inner' at the head of inner affects only the locally scoped variable.

This is a classic example of var hoisting, which should qualify as one of JavaScript’s awful parts.

Don’t do it! Move all your ‘var’ statements to the head of each function, please.


by sinelaw at June 15, 2015 07:50 PM

Mark Jason Dominus

Math.SE report 2015-04

A lot of the stuff I've written in the past couple of years has been on Mathematics StackExchange. Some of it is pretty mundane, but some is interesting. I thought I might have a little meta-discussion in the blog and see how that goes. These are the noteworthy posts I made in April 2015.

  • Languages and their relation : help is pretty mundane, but interesting for one reason: OP was confused about a statement in a textbook, and provided a reference, which OPs don't always do. The text used the symbol . OP had interpreted it as meaning , but I think what was meant was .

    I dug up a copy of the text and groveled over it looking for the explanation of , which is not standard. There was none that I could find. The book even had a section with a glossary of notation, which didn't mention . Math professors can be assholes sometimes.

  • Is there an operation that takes and , and returns is more interesting. First off, why is this even a reasonable question? Why should there be such an operation? But note that there is an operation that takes and and returns , namely, multiplication, so it's plausible that the operation that OP wants might also exist.

    But it's easy to see that there is no operation that takes and and returns : just observe that although , the putative operation (call it ) should take and yield , but it should also take and yield . So the operation is not well-defined. And you can take this even further: can be written as , so should also take and yield .

    The key point is that the representation of a number, or even an integer, in the form is not unique. (Jargon: "exponentiation is not injective".) You can raise , but having done so you cannot look at the result and know what and were, which is what needs to do.

    But if can't do it, how can multiplication do it when it multiplies and and gets ? Does it somehow know what is? No, it turns out that it doesn't need in this case. There is something magical going on there, ultimately related to the fact that if some quantity is increasing by a factor of every units of time, then there is some for which it is exactly doubling every units of time. Because of this there is a marvelous group homomophism $$\log : \langle \Bbb R^+, \times\rangle \to \langle \Bbb R ,+\rangle$$ which can change multiplication into addition without knowing what the base numbers are.

    In that thread I had a brief argument with someone who thinks that operators apply to expressions rather than to numbers. Well, you can say this, but it makes the question trivial: you can certainly have an "operator" that takes expressions and and yields the expression . You just can't expect to apply it to numbers, such as and , because those numbers are not expressions in the form . I remembered the argument going on longer than it did; I originally ended this paragraph with a lament that I wasted more than two comments on this guy, but looking at the record, it seems that I didn't. Good work, Mr. Dominus.

  • how 1/0.5 is equal to 2? wants a simple explanation. Very likely OP is a primary school student. The question reminds me of a similar question, asking why the long division algorithm is the way it is. Each of these is a failure of education to explain what division is actually doing. The long division answer is that long division is an optimization for repeated subtraction; to divide you want to know how many shares of three cookies each you can get from cookies. Long division is simply a notation for keeping track of removing shares, leaving cookies, then further shares, leaving none.

    In this question there was a similar answer. is because if you have one cookie, and want to give each kid a share of cookies, you can get out two shares. Simple enough.

    I like division examples that involve giving cookies to kids, because cookies are easy to focus on, and because the motivation for equal shares is intuitively understood by everyone who has kids, or who has been one.

    There is a general pedagogical principle that an ounce of examples is worth a pound of theory. My answer here is a good example of that. When you explain the theory, you're telling the student how to understand it. When you give an example, though, if it's the right example, the student can't help but understand it, and when they do they'll understand it in their own way, which is better than if you told them how.

  • How to read a cycle graph? is interesting because hapless OP is asking for an explanation of a particularly strange diagram from Wikipedia. I'm familiar with the eccentric Wikipedian who drew this, and I was glad that I was around to say "The other stuff in this diagram is nonstandard stuff that the somewhat eccentric author made up. Don't worry if it's not clear; this author is notorious for that."

  • In Expected number of die tosses to get something less than 5, OP calculated as follows: The first die roll is a winner of the time. The second roll is the first winner of the time. The third roll is the first winner of the time. Summing the series we eventually obtain the answer, . The accepted answer does it this way also.

    But there's a much easier way to solve this problem. What we really want to know is: how many rolls before we expect to have seen one good one? And the answer is: the expected number of winners per die roll is , expectations are additive, so the expected number of winners per die rolls is , and so we need rolls to expect one winner. Problem solved!

    I first discovered this when I was around fifteen, and wrote about it here a few years ago.

    As I've mentioned before, this is one of the best things about mathematics: not that it works, but that you can do it by whatever method occurs to you and you get the same answer. This is where mathematics pedagogy goes wrong most often: it prescribes that you must get the answer by method X, rather than that you must get the answer by hook or by crook. If the student uses method Y, and it works (and if it is correct) that should be worth full credit.

    Bad instructors always say "Well, we need to test to see if the student knows method X." No, we should be testing to see if the student can solve problem P. If we are testing for method X, that is a failure of the test or of the curriculum. Because if method X is useful, it is useful because for some problems, it is the only method that works. It is the instructor's job to find one of these problems and put it on the test. If there is no such problem, then X is useless and it is the instructor's job to omit it from the curriculum. If Y always works, but X is faster, it is the instructor's job to explain this, and then to assign a problem for the test where Y would take more time than is available.

    I see now I wrote the same thing in 2006. It bears repeating. I also said it again a couple of years ago on math.se itself in reply to a similar comment by Brian Scott:

    If the goal is to teach students how to write proofs by induction, the instructor should damned well come up with problems for which induction is the best approach. And if even then a student comes up with a different approach, the instructor should be pleased. ... The directions should not begin [with "prove by induction"]. I consider it a failure on the part of the instructor if he or she has to specify a technique in order to give students practice in applying it.

by Mark Dominus ([email protected]) at June 15, 2015 07:38 PM

Gabriel Gonzalez

break-1.0.0: A small library for breaking from loops

I've implemented a small library named break for breaking from loops, based off of a previous post of mine.

This library is simple: you wrap the command you want to loop with a function named loop and you call break when you want to exit from the loop.

import Control.Break
import Control.Monad.State
import Prelude hiding (break)

example :: State Int ()
example = loop (do
    n <- lift get               -- Inside a `loop`, wrap commands in `lift`
    if n < 10
        then lift (put (n + 1)) -- You keep looping by default
        else break () )         -- Use `break` to exit from the `loop`

The above loop increments n until n is 10 and then exits from the loop. We can verify this by running the State action:

>>> execState example 0
10

By default, you wrap commands other than break with lift. However, for some effects (like State) you can omit the lift:

example :: State Int ()
example = loop (do
    n <- get
    if n < 10
        then put (n + 1)
        else break () )

This library uses a Break type which is implemented as ExceptT under the hood:

newtype Break r m a = Break { unBreak :: ExceptT r m a }

The break function just "throws an exception":

break :: Monad m => r -> Break r m a
break r = Break (throwE r)

Here "throwing an exception" doesn't use any sort of out-of-band feature built into the Haskell language. "Exceptions" are just ordinary values implemented within the language:

throwE :: Monad m => e -> ExceptT e m a
throwE e = ExceptT (return (Left e))

... and ExceptT's implementation of (>>=) short-circuits when encountering a Left internally, skipping subsequent commands.
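
To make that concrete, the Monad instance for ExceptT behaves roughly like this (paraphrasing the transformers implementation):

instance Monad m => Monad (ExceptT e m) where
    return a = ExceptT (return (Right a))
    ExceptT m >>= k = ExceptT $ do
        x <- m
        case x of
            Left  e -> return (Left e)   -- short-circuit: skip whatever follows
            Right a -> runExceptT (k a)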

loop is just an ordinary function that repeats the loop body indefinitely and only stops when you break from the loop:

loop :: Monad m => Break r m () -> m r
loop m = do
    x <- runExceptT (unBreak (forever m))
    case x of
        Left r -> return r
        Right r -> return r

Conceptually we just "catch" the "exceptional value" and return the value. I use quotes because there's no reason to restrict this value to exceptions. You can return any old value from the loop this way:

example :: State Int Bool
example = loop (do
    n <- get
    if n < 10
        then put (n + 1)
        else break True )

>>> runState example 0
(True,10)

Notice how I don't have to define a special variation on the forever function that integrates with ExceptT or break to terminate correctly. Instead, break interacts correctly with forever out-of-the-box thanks to Haskell's laziness: forever only lazily evaluates as many actions as necessary and once the internal ExceptT short-circuits the forever function doesn't demand any further command repetitions.

This library also showcases one of Haskell's nifty features: the GeneralizedNewtypeDeriving language extension. Even though I wrap ExceptT in a Break newtype, I can still configure Break to reuse many of the interfaces that were originally implemented for ExceptT, like this:

{-# LANGUAGE GeneralizedNewtypeDeriving #-}

newtype Break r m a = Break { unBreak :: ExceptT r m a }
    deriving
        ( Functor
        , Applicative
        , Monad
        , MonadTrans
        , MonadIO
        , MonadCont
        , MonadState s
        , MonadWriter w
        )

The GeneralizedNewtypeDeriving extension is clever and knows how to wrap and unwrap the newtype to transparently lift all of these type classes to work on Break.

In fact, the library implementation is remarkably small. Here's the entire implementation if you omit the documentation:

{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Control.Applicative (Applicative)
import Control.Monad (forever)
import Control.Monad.IO.Class (MonadIO)
import Control.Monad.Trans.Class (MonadTrans(..))
import Control.Monad.Trans.Except (ExceptT, runExceptT, throwE)
import Control.Monad.Cont (MonadCont )
import Control.Monad.State (MonadState )
import Control.Monad.Writer (MonadWriter)
import Prelude hiding (break)

newtype Break r m a = Break { unBreak :: ExceptT r m a }
    deriving
        ( Functor
        , Applicative
        , Monad
        , MonadTrans
        , MonadIO
        , MonadCont
        , MonadState s
        , MonadWriter w
        )

break :: Monad m => r -> Break r m a
break r = Break (throwE r)

loop :: Monad m => Break r m () -> m r
loop m = do
    x <- runExceptT (unBreak (forever m))
    case x of
        Left r -> return r
        Right r -> return r

Newtypes are probably one of the Haskell features I miss the most when programming in other languages. They are free performance-wise (especially with the recent work on Data.Coerce) and you pay very little code to transport interfaces across newtype boundaries. You get all of the benefits of abstraction and precise types at very little cost.
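
For example, Data.Coerce can convert through a newtype at no runtime cost, even underneath another type constructor:

import Data.Coerce (coerce)

newtype Age = Age Int

toAges :: [Int] -> [Age]
toAges = coerce    -- no traversal of the list, just a free change of type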

If you would like to use the break library you can find the library on Hackage or Github.

by Gabriel Gonzalez ([email protected]) at June 15, 2015 02:04 PM

optional-args-1.0.0: Optional function arguments

I'm releasing a small library named optional-args to simplify functions that take optional arguments.

Traditionally you would represent an optional argument using Maybe, like this:

greet :: Maybe String -> String
greet (Just name) = "Hello, " ++ name
greet Nothing = "Hello"

Then you would provide a specific value by wrapping the value in pure or Just:

>>> greet (pure "John")
"Hello, John"

... or you can use the default by supplying empty or Nothing:

>>> greet empty
"Hello"

The disadvantage to this approach is that Maybe does not implement the IsString, Num, or Fractional type classes, meaning that you cannot use numeric or string literals in place of Maybe. You have to explicitly wrap them in pure or Just.

The optional-args library provides an Optional type that is equivalent to Maybe, but with additional instances for IsString, Num, and Fractional:

data Optional a = Default | Specific a

instance IsString a => IsString (Optional a)
instance Num a => Num (Optional a)
instance Fractional a => Fractional (Optional a)

Additionally, the constructors are better-named for the task of defining function arguments:

import Data.Optional

greet :: Optional String -> String
greet (Specific name) = "Hello, " ++ name
greet Default = "Hello"

Since Optional implements IsString, you can use a naked string literal anywhere a function expects an Optional string argument, as long as you enable the OverloadedStrings extension:

>>> :set -XOverloadedStrings
>>> greet "John"
"Hello, John"

... and you can still use either Default or empty to use the default:

>>> greet empty
"Hello"

Similarly, any function that accepts an Optional numeric argument:

birthday :: Optional Int -> String
birthday (Specific age) = "You are " ++ show age ++ " years old!"
birthday Default = "You are one year older!"

... will accept naked numeric literals when you invoke the function:

>>> birthday 20
"You are 20 years old!"

For values that are not literals, you can still wrap them in pure:

>>> let age = 20
>>> birthday (pure age)
"You are 20 years old!"

The IsString, Num, and Fractional instances are recursive, so you can wrap your types in a more descriptive newtype and derive IsString or Num:

{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Data.Optional
import Data.String (IsString)

newtype Name = Name { getName :: String } deriving (IsString)

greet :: Optional Name -> String
greet (Specific name) = "Hello, " ++ getName name
greet Default = "Hello"

newtype Age = Age { getAge :: Int } deriving (Num)

birthday :: Optional Age -> String
birthday (Specific age) = "You are " ++ show (getAge age) ++ " years old!"
birthday Default = "You are one year older!"

... and you would still be able to use naked numeric or string literals as function arguments:

>>> :set -XOverloadedStrings
>>> greet "John"
"Hello, John"
>>> birthday 20
"You are 20 years old!"

The Optional type's Monoid instance also plays nicely with the IsString instance. Specifically, Optional obeys these two laws:

fromString (str1 <> str2) = fromString str1 <> fromString str2

fromString mempty = mempty

... which is a fancy way of saying that this code:

>>> greet ("John " <> "Smith")
"Hello, John Smith"

... will behave the same as this code:

>>> greet "John Smith"
"Hello, John Smith"

Even if we were to implement an IsString instance for Maybe, it would not satisfy the second law.

The Optional type is also the only Maybe-like type I know of that has a recursive Monoid instance:

instance Monoid a => Monoid (Optional a)

These sorts of recursive instances come in handy when chaining Monoid instances as illustrated in my post on scaling equational reasoning.

The optional-args library also comes with documentation explaining intended usage in detail (almost identical to this post), so if the user clicks on the Optional type they will discover clear instructions on how to use the type.

If you would like to use this, you can find the library on Hackage or Github.

by Gabriel Gonzalez ([email protected]) at June 15, 2015 02:03 PM

Don Stewart (dons)

Haskell dev role in Strats at Standard Chartered [London]

The Strats team at Standard Chartered has an open position for a typed functional programming developer, based in London. Strats are a specialized software engineering and quantitative analysis team building software for financial markets users.

You will work on the trading floor, directly with traders, building software to automate their work and improve their efficiency. The role is highly development focused and you will use Haskell for almost all tasks: data analysis, market data publishing, database access, web services, desktop GUIs, large parallel tasks, quantitative models, solvers, everything. This is a fast paced role – code you write today will be deployed within hours to hundreds of users and has to work.

This is a permanent position in London as part of the Strats global team. Demonstrated experience in typed FP (Haskell, OCaml, F# etc) is required. We have around 2.5 million lines of Haskell, and our own Haskell compiler. In this context we look for skill and taste in typed functional programming to capture and abstract over complex, messy systems.

Experience writing typed APIs to external systems such as databases, web services, pub/sub platforms is very desirable. We like working code, so if you have Hackage or github libraries, we definitely want to see them. We also like StackOverflow answers, blog posts, academic papers, or other arenas where you can show broad FP ability. A PhD in computer science is a strong advantage.

The role requires physical presence on the trading floor in London. Remote work is not an option. Ideally you have some project and client management skills — you will talk to users, understand their problems and then implement and deliver what they really need. No financial background is required.

More info about our development process is in the 2012 PADL keynote, and a 2013 HaskellCast interview.

If this sounds exciting to you, please send your resume to me – donald.stewart <at> sc.com.

Role posted 2015-06-15


Tagged: jobs

by Don Stewart at June 15, 2015 09:11 AM

Magnus Therning

Using QuickCheck to test C APIs

Last year at ICFP I attended the tutorial on QuickCheck with John Hughes. We got to use the Erlang implementation of QuickCheck to test a C API. Ever since, I’ve been planning to do the same thing using Haskell. I’ve put it off for the better part of a year now, but then Francesco Mazzoli wrote about inline-c (Call C functions from Haskell without bindings) and I found the motivation to actually start writing some code.

The general idea

Many C APIs are rather stateful beasts, so to test one I

  1. generate a sequence of API calls (a program of sorts),
  2. run the sequence against a model,
  3. run the sequence against the real implementation, and
  4. compare the model against the real state each step of the way.

The C API

To begin with I hacked up a simple implementation of a stack in C. The “specification” is

/**
 * Create a stack.
 */
void *create();

/**
 * Push a value onto an existing stack.
 */
void push (void *, int);

/**
 * Pop a value off an existing stack.
 */
int pop(void *);

Using inline-c to create bindings for it is amazingly simple:

{-# LANGUAGE QuasiQuotes #-}
{-# LANGUAGE TemplateHaskell #-}

module CApi
    where

import qualified Language.C.Inline as C
import Foreign.Ptr

C.include "stack.h"

create :: IO (Ptr ())
create = [C.exp| void * { create() } |]

push :: Ptr () -> C.CInt -> IO ()
push s i = [C.exp| void { push($(void *s), $(int i)) } |]

pop :: Ptr () -> IO C.CInt
pop s = [C.exp| int { pop($(void *s)) } |]

In the code below I import this module qualified.

Representing a program

To represent a sequence of calls I first used a custom type, but later realised that there really was no reason at all to not use a wrapped list:

newtype Program a = P [a]
    deriving (Eq, Foldable, Functor, Show, Traversable)

Then each of the C API functions can be represented with

data Statement = Create | Push Int | Pop
    deriving (Eq, Show)

Arbitrary for Statement

My implementation of Arbitrary for Statement is very simple:

instance Arbitrary Statement where
    arbitrary = oneof [return Create, return Pop, liftM Push arbitrary]
    shrink (Push i) = Push <$> shrink i
    shrink _ = []

That is, arbitrary just returns one of the constructors of Statement, and shrinking only returns anything for the one constructor that takes an argument, Push.

Prerequisites of Arbitrary for Program Statement

I want to ensure that all Program Statement are valid, which means I need to define the model for running the program and functions for checking the precondition of a statement as well as for updating the model (i.e. for running the Statement).

Based on the C API above it seems necessary to track creation, the contents of the stack, and even if it isn’t explicitly mentioned it’s probably a good idea to track the popped value. Using record (Record is imported as R, and Record.Lens as RL) I defined it like this:

type ModelContext = [R.r| { created :: Bool, pop :: Maybe Int, stack :: [Int] } |]

Based on the rather informal specification I coded the pre-conditions for the three statements as

preCond :: ModelContext -> Statement -> Bool
preCond ctx Create = not $ RL.view [R.l| created |] ctx
preCond ctx (Push _) = RL.view [R.l| created |] ctx
preCond ctx Pop = RL.view [R.l| created |] ctx && l > 0
    where
        l = length $ RL.view [R.l| stack |] ctx

That is

  • Create requires that the stack hasn’t been created already.
  • Push i requires that the stack has been created.
  • Pop requires that the stack has been created and that it holds at least one value. (That last bit isn’t explicitly mentioned but seems like a good thing to assume.)

Furthermore the “specification” suggests the following definition of a function for running a statement:

modelRunStatement :: ModelContext -> Statement -> ModelContext
modelRunStatement ctx Create = RL.set [R.l| created |] True ctx
modelRunStatement ctx (Push i) = RL.over [R.l| stack |] (i :) ctx
modelRunStatement ctx Pop = [R.r| { created = c, pop = headMay s, stack = tail s } |]
    where
        c = RL.view [R.l| created |] ctx
        s = RL.view [R.l| stack |] ctx

(This definition assumes that the model satisfies the pre-conditions, as can be seen in the use of tail.)

Arbitrary for Program Statement

With this in place I can define Arbitrary for Program Statement as follows.

instance Arbitrary (Program Statement) where
    arbitrary = liftM P $ ar baseModelCtx
        where
            ar m = do
                push <- liftM Push arbitrary
                let possible = filter (preCond m) [Create, Pop, push]
                if null possible
                    then return []
                    else do
                        s <- oneof (map return possible)
                        let m' = modelRunStatement m s
                        frequency [(499, liftM2 (:) (return s) (ar m')), (1, return [])]

The idea is to, in each step, choose a valid statement given the provided model and cons it with the result of a recursive call with an updated model. The constant 499 is just an arbitrary one I chose after running arbitrary a few times to see how long the generated programs were.

For shrinking I take advantage of the already existing implementation for lists:

    shrink (P p) = filter allowed $ map P (shrink p)
        where
            allowed = and . snd . mapAccumL go baseModelCtx
                where
                    go ctx s = (modelRunStatement ctx s, preCond ctx s)

Some thoughts so far

I would love to make an implementation of Arbitrary s, where s is something that implements a type class containing preCond, modelRunStatement and anything else needed. I made an attempt using something like

class S a where
    type Ctx a :: *

    baseCtx :: Ctx a
    preCond :: Ctx a -> a -> Bool
    ...

However, when trying to use baseCtx in an implementation of arbitrary I ran into the issue of injectivity. I’m still not entirely sure what that means, or if there is something I can do to work around it. Hopefully someone reading this can offer a solution.

Running the C code

When running the sequence of Statement against the C code I catch the results in

type RealContext = [R.r| { o :: Ptr (), pop :: Maybe Int } |]

Actually running a statement and capturing the output in a RealContext is easily done using inline-c and record:

realRunStatement :: RealContext -> Statement -> IO RealContext
realRunStatement ctx Create = CApi.create >>= \ ptr -> return $ RL.set [R.l| o |] ptr ctx
realRunStatement ctx (Push i) = CApi.push o (toEnum i) >> return ctx
    where
        o = RL.view [R.l| o |] ctx
realRunStatement ctx Pop = CApi.pop o >>= \ v -> return $ RL.set [R.l| pop |] (Just (fromEnum v)) ctx
    where
        o = RL.view [R.l| o |] ctx

Comparing states

Comparing a ModelContext and a RealContext is easily done:

compCtx :: ModelContext -> RealContext -> Bool
compCtx mc rc = mcC == rcC && mcP == rcP
    where
        mcC = RL.view [R.l| created |] mc
        rcC = RL.view [R.l| o |] rc /= nullPtr
        mcP = RL.view [R.l| pop|] mc
        rcP = RL.view [R.l| pop|] rc

Verifying a Program Statement

With all that in place I can finally write a function for checking the validity of a program:

validProgram :: Program Statement -> IO Bool
validProgram p = and <$> snd <$> mapAccumM go (baseModelCtx, baseRealContext) p
    where
        runSingleStatement mc rc s = realRunStatement rc s >>= \ rc' -> return (modelRunStatement mc s, rc')

        go (mc, rc) s = do
            ctxs@(mc', rc') <- runSingleStatement mc rc s
            return (ctxs, compCtx mc' rc')

(This uses mapAccumM from an earlier post of mine.)

The property, finally!

To wrap this all up I then define the property

prop_program :: Program Statement -> Property
prop_program p = monadicIO $ run (validProgram p) >>= assert

and a main function

main :: IO ()
main = quickCheck prop_program

June 15, 2015 12:00 AM

June 11, 2015

Yesod Web Framework

cabal's does not exist error message

I've seen many people confused by this error message, and just ran into it myself, so I decided a blog post to explain it was in order.

As I was hacking on stack this morning, I pushed and got a build failure from Travis with the following content at the end:

Failed to install asn1-encoding-0.9.0
Last 10 lines of the build log ( /home/travis/.cabal/logs/asn1-encoding-0.9.0.log ):
cabal: /home/travis/.cabal/logs/asn1-encoding-0.9.0.log: does not exist

What the error message means is: I tried to load up the log file containing the output of the build, but the file does not exist. It's possible that there are multiple reasons for this, but I know of only one: when cabal is unable to download the necessary package from Hackage, usually because Hackage is having a temporary blip.

As a user: if you see this message, just try running your command again. If it's a temporary Hackage outage, the second attempt may succeed. Another option (which I'd strongly recommend) is to switch to a more reliable package index; see the blog post on FP Complete's S3 mirror for more details. I've been pushing for a while to make this the default remote-repo used by cabal-install, but unfortunately that hasn't happened. (For the record: stack does use the S3 mirror by default.)

For cabal: I see three steps worth taking:

  1. Fix this confusing error message to say something like "Download of http://... failed". I thought there was a cabal issue opened about this already, which is why I didn't open up a new one right now. If there isn't one, it's worth opening.
  2. Automatically retry a failing download. I'm not convinced this is a good thing to do, but it's a possible mitigation. (We don't do that right now in stack, though I've been toying with the idea. Feedback appreciated.)
  3. Switch to a more reliable mirror for the default remote-repo.

Fixing this instability at the Hackage level would of course also be a good move.

June 11, 2015 12:00 AM

Christopher Done

Stream fusion and composability (Java 8 and Haskell) for newbies

This came up in an online discussion when Java 8 released its stream API (written about here), which lets you write e.g.

public List<Article> getAllJavaArticles() {
    return articles.stream()
        .filter(article -> article.getTags().contains("Java"))
        .collect(Collectors.toList());
}

Someone asked, “But my question: would the streams be faster than loops? Or is the only benefit better readability?” Someone answered that the benefit is that streams compose and loops don’t. What does composable mean here? Below is my answer, using two languages I know, JavaScript and Haskell.

Composable in this context means being able to compose two things into one without redundancy or overhead. For example, suppose you want to map a function f over an array arr to produce a new array; you might do this:

var arr2 = [];
for (var i = 0; i < arr.length; i++)
    arr2.push(f(arr[i]));

If you want to filter the array based on a predicate p, you might do this:

var arr3 = [];
for (var i = 0; i < arr2.length; i++)
    if (p(arr2[i]))
        arr3.push(arr2[i]);

Or maybe you want to take all elements until a predicate p2 is not satisfied:

var arr4 = [];
for (var i = 0; i < arr3.length; i++)
    if (p2(arr3[i]))
        arr4.push(arr3[i]);
    else
        break;

Now, if you want to do that all in one process you have a few options:

  • Put them all one after the other verbatim as I’ve written above. Redundant, a maintenance issue and inefficient.
  • Merge them all into one clever loop. Also redundant (re-implementing the same concept of mapping, filtering and taking), error prone (it’s easy to get manual loops wrong, especially merging several concepts together), and a maintenance burden.
  • Put them each into a method on your language’s Array type as map(), filter(), and takeWhile() and then write arr.map(f).filter(p).takeWhile(p2). Good abstraction, very low maintenance because the functions are black boxes. But inefficient.

An ideal stream API will give you the last point, but be able to understand concepts like mapping and filtering and know how to merge them together into an efficient loop. This is called stream fusion, which you can google if you want to know more.

I don’t know Java but I can give a Haskell example:

map f . filter p . takeWhile p2

(Note: In Haskell the operations separated by . are run right to left, like map f (filter p (takeWhile p2 …)).)

If I compile this with GHC, e.g.

main = print ((map f . filter p . takeWhile p2) [1..10])
  where p2 = (<5)
        p = even
        f = (+2)

and look at the reduced output called Core – an intermediate language the compiler produces before generating assembly or byte code – then the map f . filter p parts are compiled into a single loop (Core output is verbose, so I collapsed it into this more readable form). This just walks over the list and checks whether each item is even; if so, it keeps it and adds 2 to it, otherwise it skips that item:

mainzugo xs =
  case xs of
    [] -> []
    (x:ys) ->
      case even x of
        False -> mainzugo ys
        True -> x + 2 : mainzugo ys

Which is pretty nifty. Furthermore, if you fold (also called reducing) e.g.

foldr (+) 0 . map f . filter p

Then that whole thing is also compiled into one loop:

mainzugo xs =
  case xs of
    [] -> 0
    (x:ys) ->
      case even x of
        False -> mainzugo ys
        True -> (x + 2) + mainzugo ys

There’re limits to what can compose with what, though.

June 11, 2015 12:00 AM

June 10, 2015

The GHC Team

GHC Weekly News - 2015/06/10

Hi *,

Welcome to the latest entry in the GHC Weekly News. This past week, GHC HQ met up for a quick catch-up on 7.10.2 (which you'll want to read up on; see below) and some other technical nonsense about work we've been doing. As a result the current weekly notes have been slow - the current priority is the next release though, which leads us to...

7.10.2 status

7.10.2 is going to be out soon - our current plan is to have a release candidate on the weekend of Saturday the 13th, and the final release the next week. That means if you want something fixed, you'd better holler very soon, or we're just not going to get to it!

If you're wondering what's currently been fixed/scheduled, the status page shows the current set of tickets we've fixed and plan on fixing.

List chatter

  • Jan Stolarek asks: in some cases, GHC will generate default instances or values, but that source code has no textual information location (for example, consider an instance clause without the where) - what do people think about fixing this, and are there anythings to look out for? https://mail.haskell.org/pipermail/ghc-devs/2015-June/009202.html
  • David Luposchainsky has opened a new thread - about moving fail out of Monad and into its own typeclass, MonadFail. This change is a request that's very long in the tooth (older than the AMP or FTP changes by a landslide), but David's proposal has a clearly set out goal to tackle compatibility, warnings, and implementation. https://mail.haskell.org/pipermail/ghc-devs/2015-June/009186.html

Noteworthy commits

Closed tickets

#10460, #7672, #9252, #9506, #10294, #5316, #10408, #10386, #9960, #10145, #9259, #10386, #9507, #8723, #10442, #5014, #4215, #10443, #8244, #10499, #10500, #10428, #10488, #10489, #10406, #10501, #10441, #10406, #10502, #9101, #9663, and #9945.

by thoughtpolice at June 10, 2015 09:34 PM

Brent Yorgey

Ally Skills Tutorial at ICFP

I just signed up for the Ally Skills Tutorial at ICFP, and if you are (1) a man who (2) will be at ICFP in Vancouver, I encourage you to sign up, too! From the website:

The Ally Skills Tutorial teaches men simple, everyday ways to support women in their workplaces and communities. Participants learn techniques that work at the office, in classrooms, at conferences, and online. The skills we teach are relevant everywhere, including skills particularly relevant to open technology and culture communities. At the end of the tutorial, participants will feel more confident in speaking up to support women, be more aware of the challenges facing women in their workplaces and communities, and have closer relationships with the other participants.

This sounds super helpful—I suspect there is often a large gap between the extent to which I want to support women and the extent to which I actually know, practically, how to do so. The workshop will be taught by Valerie Aurora, Linux filesystem developer and Ada Initiative co-founder; I expect it will be high quality!


by Brent at June 10, 2015 08:47 PM

Well-Typed.Com

Summer School on Generic and Effectful Programming

I’m one of the lecturers at

Summer School on Generic and Effectful Programming

St Anne’s College, Oxford, 6th to 10th July 2015

(Register here)

Datatype-generic programming was the topic of my PhD thesis many years ago, and it has continued to be a fascinating field of work and research for me since then.

At the upcoming summer school, I will give a three-lecture course on Applying Type-level and Generic Programming in Haskell. In this course, I will describe the state-of-the-art of datatype-generic programming in Haskell/GHC. This means we’ll look at the GHC extension that allows you to generically derive your own type classes, but also at the relatively recent generics-sop library. We will discuss the GHC type system features that make all of this possible, such as data kinds, kind polymorphism, GADTs, higher-rank types, constraint kinds and more, and we will look at a number of real-world applications of generic programming, taken, e.g., from the areas of web programming and databases.

But my course is only one of many. Ralf Hinze, the main organizer, has done an outstanding job and assembled a fantastic lineup of lecturers: I’m honoured to be teaching alongside Edwin Brady, Fritz Henglein, Conor McBride, Don Syme and Tarmo Uustalu. I am sure I will learn a lot from them and their lectures.

If you always wanted to learn more about generic and effectful programming, this is your chance! You can still register for the school! I’d be happy to see you there.

by andres at June 10, 2015 12:36 PM

Don Stewart (dons)

Haskell dev role in Strats at Standard Chartered [London]

The Strats team at Standard Chartered has an open position for a typed functional programming developer, based in London.

You will work on the trading floor, directly with traders, building software to automate their work and improve their efficiency. The role is highly development focused and you will use Haskell for almost all tasks: data analysis, market data publishing, database access, web services, desktop GUIs, large parallel tasks, quantitative models, solvers, everything. This is a fast paced role – code you write today will be deployed within hours to hundreds of users and has to work.

This is a permanent position in London as part of the Strats global team. Demonstrated experience in typed FP (Haskell, OCaml, F# etc) is required. We have around 2.5 million lines of Haskell, and our own Haskell compiler. In this context we look for skill and taste in typed functional programming to capture and abstract over complex, messy systems.

Experience writing typed APIs to external systems such as databases, web services, pub/sub platforms is very desirable. We like working code, so if you have Hackage or github libraries, we definitely want to see them. We also like StackOverflow answers, blog posts, academic papers, or other arenas where you can show broad FP ability. A PhD in computer science is a strong advantage.

The role requires physical presence on the trading floor in London. Remote work is not an option. Ideally you have some project and client management skills — you will talk to users, understand their problems and then implement and deliver what they really need. No financial background is required.

More info about our development process is in the 2012 PADL keynote, and a 2013 HaskellCast interview.

If this sounds exciting to you, please send your resume to me – donald.stewart <at> sc.com.

Role posted 2015-06-10


Tagged: jobs

by Don Stewart at June 10, 2015 08:27 AM

June 09, 2015

FP Complete

ANNOUNCING: first public beta of stack

stack is a new, complete, cross-platform development tool aimed at both new and experienced Haskell developers alike, for installing and setting up the compiler, installing packages needed, and building, testing or benchmarking one or more packages in a project at a time. It's the whole stack.

We developed it in collaboration with the Commercial Haskell Group, with two deliberate goals in mind:

  1. Newbie friendliness, ease of setup and doing the right thing by default. Just get people running with Haskell as conveniently as possible.
  2. Catering to the finer points of serious commercial Haskell development. Supporting the modern use-cases of working on large Haskell projects.

In short, it's a program that you run in a directory with a stack configuration file in it (stack.yaml—automatically created if one doesn't exist). That directory may contain one or many packages inside it to be built. Then there are commands to build, test, install dependencies, run GHC, etc.

Here's a rundown of the advantages stack brings to the table. For both newbies and experienced Haskellers:

  • It's really easy to get going with Haskell. You can just run stack build in a package directory and it will automatically download and install GHC, the standard Haskell compiler, for you, download the package index and install packages that are needed.
  • It also caters to newbies reading books who just want to run ghc on a file. You can run stack ghc X.hs and it will download GHC if necessary, etc. until it can run GHC on that file. The same is true for stack ghci, etc.
  • It automatically assumes the latest LTS Haskell release as its source of packages (doing some figuring out if there's a project Cabal file), or otherwise a nightly release. This means you can just use packages without worrying about which versions match up or having to think like a package manager thinks.

For experienced Haskellers, there is support for typical use-cases and subtleties of larger projects:

  • This also means you can just put the snapshot release in your version control, via the stack.yaml file, which is important for reliable collaboration.
  • Serious commercial or large-scale users of Haskell can build so-called "mega-repos": by adding to the list of packages in your stack.yaml, you can build/test/bench multiple projects at once. stack build will just build everything. stack build this will build this package and rebuild anything that depends on it, and also any of its dependencies that changed.
  • You can enable profiling in the configuration and this will automatically rebuild all packages with profiling turned on for you.
  • Packages are installed into isolated package databases by default. There is a layering of three package databases: The global database, the snapshot database, and your personal project's database. Many projects can share the same global database (base, bytestring, etc.) and the snapshot database (e.g. text, haskell-src-exts, etc.)—which means no unnecessary rebuilding of packages—and yet each project has its own package database, so they are isolated from each other and cannot break each other.
  • Binary installs of your personal projects are also isolated into that directory.
  • In this "local" database, you can also specify newer upstream versions than are available in the snapshot. In fact, the snapshot is optional: you can base a project on plain Hackage. This is not the default behaviour and is a configuration option.
  • It also supports docker integration for truly sandboxed environments. By enabling docker in the configuration, running stack build will download the image if necessary and then run the build inside the docker image. There is an array of options available for the docker integration.

We first arrived at this tool based on feedback from our clients at FP Complete and have been using it in-house ourselves for around a year, in several iterations. This final iteration has had a particular focus on ease of use. We were also motivated by the results of the recent Haskell survey.

Feedback from members of the Commercial Haskell Group has been very helpful in shaping the direction of the tool in a positive way, and we now open it up, open source and in beta, to the Haskell community as a whole. We welcome your contributions, ideas and feedback!

stack is available on Hackage (cabal install stack), as well as binary downloads for Windows, Mac, and Linux, with Ubuntu and Arch packages. Please note that this is a beta release: we anticipate bugs, and will work to quickly resolve them. Please be sure to report issues on the Github issue tracker.

June 09, 2015 08:00 AM

Magnus Therning

`mapAccum` in monad

I recently had two functions of very similar shape; the only difference was that one was pure and the other needed some I/O. The former was easily written using mapAccumL. I failed to find a function like mapAccumL that runs in a monad, so I wrote up the following:

import Control.Monad.State (runStateT, get, put, lift)
import Data.Tuple (swap)
-- (<$>) is in the Prelude from base 4.8 (GHC 7.10); on older compilers
-- also import Data.Functor ((<$>)).

mapAccumM :: (Monad m, Traversable t) => (a -> b -> m (a, c)) -> a -> t b -> m (a, t c)
mapAccumM f a l = swap <$> runStateT (mapM go l) a
    where
        -- thread the accumulator through a StateT layer, collecting the results
        go i = do
            s <- get
            (s', r) <- lift $ f s i
            put s'
            return r
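
For instance (a hypothetical usage, not from the original post), one could number a list of items while performing I/O for each of them:

-- Hypothetical usage of mapAccumM: print each item with a running
-- index, returning the final counter together with the indices used.
numberAndPrint :: [String] -> IO (Int, [Int])
numberAndPrint = mapAccumM step 0
  where
    step n s = do
      putStrLn (show n ++ ": " ++ s)
      return (n + 1, n)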

Bring on the comments/suggestions/improvements/etc!

June 09, 2015 12:00 AM

June 08, 2015

Ken T Takusagawa

[gszzqvjg] Seeding a PRNG with a string

Some Haskell code demonstrating how to initialize random number generators (equivalently, stream ciphers) with a string (and salt).  The main point of these demonstrations is to list which modules one needs to import and to show how to hook things together.

aes-ctr-demo.hs generates a stream of random bytes using AES-256 in counter mode (CTR).  (This implements option 2 among this enumeration of ways to do counter mode.)  Without arguments, it uses the scrypt key derivation function to convert the password and salt to a 256-bit AES key.  It also demonstrates two alternatives to scrypt: a SHA-512 instance of PBKDF2, and straight unsalted SHA-256 as a key derivation function (the latter is a cryptographically terrible idea because it is relatively easy to mount a dictionary attack against it compared to real KDFs).

mwc-demo.hs demonstrates seeding the non-cryptographic MWC random number generator with 258 words generated with PBKDF2.
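
A minimal sketch in the same spirit (not the linked demo itself): deriving seed material for mwc-random from a passphrase, here using plain SHA-256 from the cryptohash package.  Like the unsalted SHA-256 variant mentioned above, this is fine for illustration but is not a proper KDF.

import qualified Crypto.Hash.SHA256 as SHA256   -- cryptohash
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import qualified Data.Vector.Unboxed as V
import Data.Word (Word32)
import System.Random.MWC (initialize, uniform)

-- Turn a passphrase into seed words (one Word32 per hash byte; wasteful
-- but simple).  A real program should use scrypt or PBKDF2 with a salt.
seedWords :: BS.ByteString -> V.Vector Word32
seedWords = V.fromList . map fromIntegral . BS.unpack . SHA256.hash

main :: IO ()
main = do
  gen <- initialize (seedWords (BC.pack "correct horse battery staple"))
  x <- uniform gen :: IO Double
  print x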

tf-demo.hs demonstrates seeding the TF (ThreeFish) random number generator with a 4-tuple of Word64 generated with PBKDF2.

Alternate Source code directory

Disclaimer: I have paid no attention to whether these demonstrations are vulnerable to side-channel attacks.  They almost certainly are vulnerable.

by Ken ([email protected]) at June 08, 2015 04:23 AM

June 07, 2015

Dominic Steinitz

Population Growth Estimation via Hamiltonian Monte Carlo

Here’s the same analysis of estimating population growth using Stan.

data {
  int<lower=0> N; // number of observations
  vector[N] y;    // observed population
}

parameters {
  real r;
}

model {
  real k;
  real p0;
  real deltaT;
  real sigma;
  real mu0;
  real sigma0;
  vector[N] p;
  k      <- 1.0;
  p0     <- 0.1;
  deltaT <- 0.0005;
  sigma  <- 0.01;
  mu0    <- 5;
  sigma0 <- 10;

  r ~ normal(mu0, sigma0);

  for (n in 1:N) {
    p[n] <- k * p0 * exp((n - 1) * r * deltaT) / (k + p0 * (exp((n - 1) * r * deltaT) - 1));
    y[n] ~ normal(p[n], sigma);
  }
}

Empirically, by looking at the posterior, this seems to do a better job than either extended Kalman or vanilla Metropolis.


by Dominic Steinitz at June 07, 2015 06:47 PM

Neil Mitchell

ghcid 0.4 released

Summary: I've released a new version of ghcid, which is faster and works better for Vim users.

I've just released version 0.4 of ghcid. For those who haven't tried it, it's the simplest/dumbest "IDE" that could possibly exist. It presents a small fixed-height console window that shows the active warnings/errors from ghci, which is updated on every save. While that might not seem like much, there are a number of improvements over a standard workflow:

  • You don't have to switch to a different window and hit :reload, just saving in your editor is enough.
  • It includes warnings from modules that successfully compiled previously, as opposed to ghci which omits warnings from modules that didn't reload.
  • It reformats the messages to the height of your console, so you can see several messages at once.

I find it gives me a significant productivity boost, and since there is no editor integration, it adds to my existing tools, rather than trying to replace them (unlike a real IDE).

Version 0.4 offers a few additional improvements:

  • We use fsnotify to watch for changes, rather than polling as in older versions. The result is a significant decrease in battery usage, combined with faster responses to changes.
  • If you change the .cabal or .ghci file then ghcid will restart the ghci session to pick up the changes.
  • If you are a Vim user, then the sequence of temporary files Vim uses on save could upset ghcid. I believe these issues are now all eliminated.
  • The number of errors/warnings is displayed in the titlebar of the console.
  • There is a feature to run a test after each successful save (using --test). I am expecting this to be a niche feature, but feel free to prove me wrong.

Tempted to give it a try? See the README for details.

by Neil Mitchell ([email protected]) at June 07, 2015 09:57 AM

wren gayle romano

ANN: bytestring-lexing 0.5.0

bytestring-lexing 0.5.0

The bytestring-lexing package offers extremely efficient bytestring parsers for some common lexemes: namely integral and fractional numbers. In addition, it provides efficient serializers for (some of) the formats it parses.

As of version 0.3.0, bytestring-lexing offers the best-in-show parsers for integral values. (According to the Warp web server's benchmark of parsing the Content-Length field of HTTP headers.) And as of this version (0.5.0) it offers (to my knowledge) the best-in-show parser for fractional/floating numbers.

Changes since 0.4.3 (2013-03-21)

I've completely overhauled the parsers for fractional numbers.

The old Data.ByteString.Lex.Double and Data.ByteString.Lex.Lazy.Double modules have been removed, as has their reliance on Alex as a build tool. I know some users were reluctant to use bytestring-lexing because of that dependency, and forked their own version of bytestring-lexing-0.3.0's integral parsers. This is no longer an issue, and those users are requested to switch over to using bytestring-lexing.

The old modules are replaced by the new Data.ByteString.Lex.Fractional module. This module provides two variants of the primary parsers. The readDecimal and readExponential functions are very simple and should suffice for most users' needs. The readDecimalLimited and readExponentialLimited functions are variants which take an argument specifying the desired precision limit (in decimal digits). With care, the limited-precision parsers can perform far more efficiently than the unlimited-precision parsers. Performance aside, they can also be used to intentionally restrict the precision of your program's inputs.
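
For illustration only, using the unlimited-precision parser might look roughly like this (the Maybe-of-value-and-remaining-input shape is assumed from the description above; check the Haddocks for the exact signatures):

{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString (ByteString)
import qualified Data.ByteString.Lex.Fractional as F

-- Parse a fractional field, ignoring any trailing input.
parsePrice :: ByteString -> Maybe Double
parsePrice bs = fmap fst (F.readExponential bs)

main :: IO ()
main = print (parsePrice "2.718e3")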

Benchmarks

The Criterion output of the benchmark discussed below can be seen here. The main competitors we compare against are the previous version of bytestring-lexing (which already surpassed text and attoparsec/scientific) and bytestring-read, which was the previous best-in-show.

The unlimited-precision parsers provide 3.3× to 3.9× speedup over the readDouble function from bytestring-lexing-0.4.3.3, as well as being polymorphic over all Fractional values. For Float/Double: these functions have essentially the same performance as bytestring-read on reasonable inputs (1.07× to 0.89×), but for inputs which have far more precision than Float/Double can handle these functions are much slower than bytestring-read (0.30× 'speedup'). However, for Rational: these functions provide 1.26× to 1.96× speedup compared to bytestring-read.

The limited-precision parsers do even better, but require some care to use properly. For types with infinite precision (e.g., Rational) we can pass in an 'infinite' limit by passing the length of the input string plus one. For Rational: doing so provides 1.5× speedup over the unlimited-precision parsers (and 1.9× to 3× speedup over bytestring-read), because we can avoid intermediate renormalizations. Whether other unlimited precision types would see the same benefit remains an open question.

For types with inherently limited precision (e.g., Float/Double), we could either pass in an 'infinite' limit or we could pass in the actual inherent limit. For types with inherently limited precision, passing in an 'infinite' limit degrades performance compared to the unlimited-precision parsers (0.51× to 0.8× 'speedup'). Whereas, passing in the actual inherent limit gives 1.3× to 4.5× speedup over the unlimited-precision parsers. They also provide 1.2× to 1.4× speedup over bytestring-read; for a total of 5.1× to 14.4× speedup over bytestring-lexing-0.4.3.3!

Links




June 07, 2015 01:04 AM

June 06, 2015

Dominic Steinitz

Population Growth Estimation via Markov Chain Monte Carlo

Introduction

Let us see if we can estimate the parameter for population growth using MCMC in the example in which we used Kalman filtering.

We recall the model.

\displaystyle   \begin{aligned}  \dot{p} & =  rp\Big(1 - \frac{p}{k}\Big)  \end{aligned}

\displaystyle   p = \frac{kp_0\exp rt}{k + p_0(\exp rt - 1)}

And we are allowed to sample at regular intervals

\displaystyle   \begin{aligned}  p_i &= \frac{kp_0\exp r\Delta T i}{k + p_0(\exp r\Delta T i - 1)} \\  y_i &= p_i + \epsilon_i  \end{aligned}

In other words y_i \sim {\cal{N}}(p_i, \sigma^2), where \sigma is known so the likelihood is

\displaystyle   p(y\,|\,r) \propto \prod_{i=1}^n \exp{\bigg( -\frac{(y_i - p_i)^2}{2\sigma^2}\bigg)} =  \exp{\bigg( -\sum_{i=1}^n \frac{(y_i - p_i)^2}{2\sigma^2}\bigg)}

Let us assume a prior of r \sim {\cal{N}}(\mu_0,\sigma_0^2) then the posterior becomes

\displaystyle   p(r\,|\,y) \propto \exp{\bigg( -\frac{(r - \mu_0)^2}{2\sigma_0^2} \bigg)} \exp{\bigg( -\sum_{i=1}^n \frac{(y_i - p_i)^2}{2\sigma^2}\bigg)}

Preamble

> {-# OPTIONS_GHC -Wall                      #-}
> {-# OPTIONS_GHC -fno-warn-name-shadowing   #-}
> {-# OPTIONS_GHC -fno-warn-type-defaults    #-}
> {-# OPTIONS_GHC -fno-warn-unused-do-bind   #-}
> {-# OPTIONS_GHC -fno-warn-missing-methods  #-}
> {-# OPTIONS_GHC -fno-warn-orphans          #-}
> {-# LANGUAGE NoMonomorphismRestriction     #-}
> module PopGrowthMCMC where
> 
> import qualified Data.Vector.Unboxed as V
> import Data.Random.Source.PureMT
> import Data.Random
> import Control.Monad.State
> import Data.Histogram.Fill
> import Data.Histogram.Generic ( Histogram )

Implementation

We assume most of the parameters are known with the exception of the growth rate r. We fix this also in order to generate test data.

> k, p0 :: Double
> k = 1.0
> p0 = 0.1
> r, deltaT :: Double
> r = 10.0
> deltaT = 0.0005
> nObs :: Int
> nObs = 150

Here’s the implementation of the logistic function

> logit :: Double -> Double -> Double -> Double
> logit p0 k x = k * p0 * (exp x) / (k + p0 * (exp x - 1))

Let us create some noisy data.

> sigma :: Double
> sigma = 1e-2
> samples :: [Double]
> samples = zipWith (+) mus epsilons
>   where
>     mus = map (logit p0 k . (* (r * deltaT))) (map fromIntegral [0..])
>     epsilons = evalState (sample $ replicateM nObs $ rvar (Normal 0.0 sigma)) (pureMT 3)

Arbitrarily let us set the prior to have a rather vague normal distribution.

> mu0, sigma0 :: Double
> mu0 = 5.0
> sigma0 = 1e1
> prior :: Double -> Double
> prior r = exp (-(r - mu0)**2 / (2 * sigma0**2))
> likelihood :: Double -> [Double] -> Double
> likelihood r ys = exp (-sum (zipWith (\y mu -> (y - mu)**2 / (2 * sigma**2)) ys mus))
>   where
>     mus :: [Double]
>     mus = map (logit p0 k . (* (r * deltaT))) (map fromIntegral [0..])
> posterior :: Double -> [Double] -> Double
> posterior r ys = likelihood r ys * prior r

The Metropolis algorithm tells us that we always jump to a better place but only sometimes jump to a worse place. We count the number of acceptances as we go.

> acceptanceProb' :: Double -> Double -> [Double] -> Double
> acceptanceProb' r r' ys = min 1.0 ((posterior r' ys) / (posterior r ys))
> oneStep :: (Double, Int) -> (Double, Double) -> (Double, Int)
> oneStep (r, nAccs) (proposedJump, acceptOrReject) =
>   if acceptOrReject < acceptanceProb' r (r + proposedJump) samples
>   then (r + proposedJump, nAccs + 1)
>   else (r, nAccs)

Here are our proposals.

> normalisedProposals :: Int -> Double -> Int -> [Double]
> normalisedProposals seed sigma nIters =
>   evalState (replicateM nIters (sample (Normal 0.0 sigma)))
>   (pureMT $ fromIntegral seed)

We also need samples from the uniform distribution

> acceptOrRejects :: Int -> Int -> [Double]
> acceptOrRejects seed nIters =
>   evalState (replicateM nIters (sample stdUniform))
>   (pureMT $ fromIntegral seed)

Now we can actually run our simulation. We set the number of jumps and a burn in but do not do any thinning.

> nIters, burnIn :: Int
> nIters = 100000
> burnIn = nIters `div` 10

Let us start our chain at the mean of the prior. In theory this shouldn’t matter as by the time we have burnt in we should be sampling in the high density region of the distribution.

> startMu :: Double
> startMu = 5.0

This jump size should allow us to sample the region of high density at a reasonable granularity.

> jumpVar :: Double
> jumpVar = 0.01

Now we can test our MCMC implementation.

> test :: Int -> [(Double, Int)]
> test seed =
>   drop burnIn $
>   scanl oneStep (startMu, 0) $
>   zip (normalisedProposals seed jumpVar nIters)
>       (acceptOrRejects seed nIters)

We put the data into a histogram.

> numBins :: Int
> numBins = 400
> hb :: HBuilder Double (Data.Histogram.Generic.Histogram V.Vector BinD Double)
> hb = forceDouble -<< mkSimple (binD lower numBins upper)
>   where
>     lower = r - 0.5 * sigma0
>     upper = r + 0.5 * sigma0
> 
> hist :: Int -> Histogram V.Vector BinD Double
> hist seed = fillBuilder hb (map fst $ test seed)
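
A rough point estimate of the growth rate can then be read off the chain directly (a hypothetical GHCi session, not part of the original literate source):

ghci> let rs = map fst (test 42)
ghci> sum rs / fromIntegral (length rs)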

With 50 observations we don’t seem to be very certain about the growth rate.

With 100 observations we do very much better.

And with 150 observations we do even better.


by Dominic Steinitz at June 06, 2015 02:16 PM

Philip Wadler

Haskell in Production: Bdellium


At Medium, Fredrik (@ique) describes using Haskell in anger.
At the start of the product’s life we mostly analyzed small retirement plans with 100 to 500 plan participants. As time went on we started seeing plans with 2,000 participants and even 10,000 participants. At these numbers the execution time for the processing started going outside of acceptable bounds. Luckily, every participant could be analyzed in isolation from the others and so the problem was embarrassingly parallel.
I changed one line of code from
map outputParticipant parts
to
map outputParticipant parts `using` parListChunk 10 rdeepseq
and execution times were now about 3.7x faster on our 4-core server. That was enough to entirely fix the issue for another 6 months, during which time we could focus on further enhancing the product instead of worrying about performance. Later, we did do extensive profiling and optimization of the code to further reduce execution times significantly, but we didn’t want to prematurely optimize anything.
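
For anyone who wants to try the same trick, here is a self-contained sketch (with a stand-in for outputParticipant, which is Bdellium's own code); it needs the parallel and deepseq packages, and should be compiled with -threaded (and e.g. -rtsopts) and run with +RTS -N to actually use multiple cores:

import Control.Parallel.Strategies (parListChunk, rdeepseq, using)

-- Stand-in for the real per-participant analysis.
outputParticipant :: Int -> Int
outputParticipant n = n * n

main :: IO ()
main = print (sum results)
  where
    parts = [1 .. 100000] :: [Int]
    -- Evaluate the result list in chunks of 10, chunks in parallel,
    -- each element forced to normal form.
    results = map outputParticipant parts `using` parListChunk 10 rdeepseq
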
Spotted via Manuel Chakravarty @TechnicalGrace.

by Philip Wadler ([email protected]) at June 06, 2015 08:37 AM

June 05, 2015

Toby Goodwin

Gnome Application Menus

I don't follow Gnome development closely, but I am a Fedora user so never too far behind the times. I've always used the latest packaged Gnome version: sometimes grumbling about the changes, but learning to live with them; more often delighted with the gradually improving desktop experience.

But now it has all gone horribly wrong.

Before I get onto my current moan though, let me talk about a very old one instead. I've been using point-to-focus since twm was the cutting edge of user interface design, and I have no intention of stopping. My attitude was well expressed by my former colleague in this article from 12 years ago.

Apple's usability people will no doubt explain how tests show that point-to-focus is terrible and unintuitive and hard to learn and inefficient and any number of other unhealthy things. This is quite beside the point. I find it very difficult to work without it and since I use other machines which do use point-to-focus, it's very annoying to have to change behaviour when moving from one computer to another.

I'm aware, of course, that Gnome considers click-to-focus the "correct" option, and makes it the default. Point-to-focus is a second-class citizen, but appears to be grudgingly supported. For example, I understand that changes were recently made so that in point-to-focus modes, a window only gains focus when the pointer stops on it.

Still, this all raises some interesting questions. When was the decision that c-t-f is "right" made? By whom? For what reasons? Where was it documented? And suppose somebody wanted to challenge it, how could they do so? Although I've only glanced at it, I don't believe the Gnome HIG mandates c-t-f, so lobbying for a change in the HIG is not going to help.

Of course, the choice of click-to-focus was made a very long time ago, and is unlikely to change now. But I think it will be helpful to bear those questions in mind, as we consider an (almost) brand new feature: application menus.

Soon after upgrading to Fedora 21, I opened a bug in Fedora's bugzilla against sound-juicer. I needed to access the Preferences menu, and it wasn't there. I looked everywhere (well, I thought I looked everywhere). I even tried to press the right hot keys that I remember from the last time I had a working sound-juicer, all to no avail.

Christophe Fergeau kindly set me straight. The menu is now hidden behind the name of the application on the "top bar", which is on a different monitor, several mouse-miles away from my app. Hot keys apparently don't exist any more. And every Gnome app is being redesigned this way (it's just that most of the ones I use daily haven't been yet).

I was directed to this page on the wiki, which talks about the change a bit.

And I've followed some links from there, and searched for more background. What I'm really struggling to understand is the motivation for App Menus. What problem are they trying to solve? I can see that, for an application that has one window per document, it's slightly inconsistent to have global menu entries on each window. But the obvious solution, of repeating the global menus on each window, seems to work well enough. And I'm sure you know the Emerson quote about consistency. (And, speaking of consistency, isn't it inconsistent to use point-to-focus for the scrollwheel? But everybody does that!)

I simply think that Gnome (and OS X before it) is wrong to attach menus to the screen rather than to the application. The handwavy justification for my position is obvious: each window represents a single self-contained application: everything you do within that application occurs within the window. OS X breaks that intuition, and now Gnome breaks it more (at least on OS X the full menu bar is displayed, which makes it a bit clearer what is going on.)

Does the Gnome project have HCI research that says I'm wrong? If so, I haven't found it yet. Can somebody send me the link?

The application where the new App Menus floored me is not a multi-document one. As far as I know, sound-juicer only ever displays one window. Moving the menus away from this one window strikes me as a near-catastrophic design error.

I've already mentioned that on a multi-monitor setup the top bar may be a very long way away from the application. I use focus mode "sloppy", and on the long trek between the application and its menu, it's possible for another application to be selected. Sloppy mode is actually very good about not changing the focus till the mouse has stopped moving; I probably go wrong only about 1 time in 4. But when every app works like this, that's going to be a very serious irritation.

Consider another point: documentation. I was misled in the case of sound-juicer because F1 brought up help pages that showed screenshots of a menu bar. OK, so it's just out of date. However, look at the documentation for gedit, which has been updated. In various places it tells you to use the "gedit" menu... but it never tells you where that is. Obviously you can't have a screenshot that shows it, because it's not part of the application any more! The very fact that it's so difficult to describe the new menus clearly should be a warning that they will not be intuitive to users. Even users that do read the docs will be confused.

I cannot live with Gnome's new Application Menus. Fortunately, I have found Cinnamon. A "traditional" look and feel, built using Gnome technology. Yay! Point-to-focus supported out of the box. Yay! No more application menus. Yay! Alt-Tab works on windows again, not applications (I'd been suffering the Gnome / OS X way for so long that I'd forgotten how wrong and anti-productive it is). Yay! Hot corner still works for window selection (although it turns out I use that far less now that Alt-Tab works properly again).

So I suppose, in the end, that's the beauty of free software: I can choose Gnome or Cinnamon (or KDE, or twm...), and they can all share code and ideas. Still, I would like the folks at Gnome to know that they've lost me to Cinnamon, and this is why.

One final prediction. I do not believe application menus will survive in their current form. They're just too user-antagonistic. I note this extraordinary line in the HIG: "If an item from the application menu is frequently used, consider moving it to a more accessible location." Note also that one of the items that is definitely supposed to be in the application menu is "Help"!

Comments

Magnus Therning writes:

I can only agree. I've often wondered what reasons the interaction designers of desktop systems like Gnome have for making changes. Certainly many changes feel more beginner-friendly rather than expert-friendly. There's an episode of 99% Invisible about Doug Engelbart (Of Mice And Men). I think some of his ideas regarding user friendliness ought to be spread more widely.

June 05, 2015 07:18 PM

Douglas M. Auclair (geophf)

May 2015 1HaskellADay problems and solutions

May 2015
  • May 29th, 2015: The President of the United States wants YOU to solve today's #haskell problem http://lpaste.net/5394990454380953600 Super-secret decoder ring sez: http://lpaste.net/4232036193933459456
  • May 28th, 2015: So, you got the for-loop down. Good. Now: how about the 'no'-loop. HUH? http://lpaste.net/419764621470072832 Ceci n'est pas un for-loop http://lpaste.net/4585222543073345536
  • May 27th, 2015: For today's #haskell problem we look at Haskell by Example and bow before the mighty for-loop! http://lpaste.net/188189811754926080 The for-loop solution. I count even. http://lpaste.net/3919145189309939712 *groans
  • May 26th, 2015: TMI! So how to distinguish the stand-outs from the noise? Standard deviations FTW! http://lpaste.net/7326574484482162688 for today's #haskell problem. First rule: don't lose! http://lpaste.net/5153008470756163584
  • May 25th, 2015: We look at monad transformers to help in logging http://lpaste.net/8301304912738779136 for today's #haskell problem. Ooh! What does the log say? (that foxy log!) http://lpaste.net/109687 Ooh! Ouch! Bad investment strategy! Onto the next strategy!
  • May 22nd, 2015: Tomorrow's another day for today's #haskell problem http://lpaste.net/8939124091818868736 #trading Huh! http://lpaste.net/5738095651988701184 we get very similar returns as from previously essayed. Well, now we are confident of a more accurate model.
  • May 21st, 2015: For today's #haskell problem http://lpaste.net/1900634933852897280 we talk of categories (not 'The'), so we need a cat-pic: LOL
    Is 'parfait' a Category? http://lpaste.net/3432870087273480192 … If so, I would eat categories every day! 


  • May 20th, 2015: Let's view a graph as, well, a #neo4j-graph for today's #haskell problem http://lpaste.net/8344081425502306304 Ooh! Pretty Graph(ic)s! http://lpaste.net/621084277097889792 
  • May 19th, 2015: "Lemme put that on my calendar," sez the trader http://lpaste.net/7002659561530195968 for today's #haskell problem So ... next Monday, is that a #trading day? #inquiringmindswanttoknow http://lpaste.net/729791673181143040
  • May 18th, 2015: So, what, precisely, does a bajillion clams look like? We find out by doing today's #haskell problem http://lpaste.net/3228618210228043776 #trading #clams Were those cherry clams? No: apple http://lpaste.net/6731000367502327808
  • May 15th, 2015: We learn that π is a Π-type in today's #haskell problem of co-constants http://lpaste.net/2567253109898215424 (#typehumour) 'I' like how Id is the I-combinator http://lpaste.net/7574047523665346560
  • May 14th, 2015: Today's #haskell problem brings out the statistician in you that you didn't know you had. Hello, Standard Deviations! http://lpaste.net/3520547034257948672 Wow, rational roots? Sure, why not? And: 'σ'? ... so cute! http://lpaste.net/8056787266321776640
  • May 13th, 2015: Today's #haskell problem teaches us that variables ... AREN'T! http://lpaste.net/7856430804354727936 Well, if variables aren't, then they certainly do. Except in Haskell – A solution to this problem is posted at http://lpaste.net/767712970229678080
  • May 12th, 2015: For Today's #haskell problem we look at values and their types, but NOT types and their values, please! http://lpaste.net/7773498464792477696 *glares* taf, tof, tiff, tough solution today http://lpaste.net/7739074628332552192
  • May 11th, 2015: Hello, world! Haskell-style! http://lpaste.net/7162836591558262784 -- A solution to this 'Hello, world!'-problem is posted at http://lotz84.github.io/haskellbyexample/ex/hello-world
  • Octo de Mayo, 2015: Strings not enumerable in Haskell? Balderdash! Today's #haskell problem http://lpaste.net/7204216668719939584 Today we produce not only one, but two solutions to the enumerable-strings problem http://lpaste.net/5041390572904906752 #haskell 
  • Septo de Mayo, 2015: When all your Haskell documentation is gone, whom do you call? The web crawler, SpiderGwen! to the rescue! http://lpaste.net/4022946512970448896 
    We CURL the HTML-parsing solution at http://lpaste.net/6475977888209305600
  • Sexo de Mayonnaise, 2015: @BenVitale Offers us a multi-basal ('basel'? 'basil'?) palindromic number-puzzle for today's #haskell problem http://lpaste.net/858675562900619264
  • Cinco de Mayo, 2015: May the #maths be with you. Always. http://lpaste.net/8062431866961002496 Today's #haskell problem, we learn that 5 is the 6th number. Happy Cinco de Mayo! Pattern-matching, by any other name, would still smell as sweet #lolsweet http://lpaste.net/7419804910079705088 A solution to today's #haskell problem.
  • May 4th, 2015: "There is no trie, my young apprentice!" Today's #haskell problem proves Master Yoda wrong. FINALLY! http://lpaste.net/179758090873208832 So ... I 'trie'd ... *GROAN!* http://lpaste.net/5961897275272724480 A solution to today's 'trie'ing #haskell problem 
  • May 1st, 2015: (Coming up to) Today's #haskell problem is 'slightly' more challenging than yesterday's problem (forall, anyone?) http://lpaste.net/1400137464227561472 Existential Quantification, For(all) The Win! http://lpaste.net/4400237442641166336

by geophf ([email protected]) at June 05, 2015 02:40 PM

June 04, 2015

Magnus Therning

Oh no! Success

This can’t possibly be good for Haskell…

We chose Haskell and the FP Complete stack because we knew the complexity of the problem required a new approach. The result was that we built better software faster than ever, and delivered it defect-free into a production environment where it has proven robust and high-performance.

June 04, 2015 12:00 AM

June 03, 2015

Brent Yorgey

Crystal Ball Connection Patterns via Species and Generating Functions

A couple weeks ago, Denise posted Puzzle: Crystal Ball Connection Patterns on her blog, Let’s Play Math. I had fun playing with it and thought I would demonstrate how to apply some high-powered combinatorial techniques to it (probably not what Denise had in mind!).

The setup is that there are n (distinct) friends who can talk to each other on the phone. Only two people can talk at a time (no conference calls). The question is to determine how many different “configurations” there are. Not everyone has to talk, so a configuration consists of some subset of the friends arranged in (unordered) conversational pairs.

Warning: spoilers ahead! If you’d like to play around with this yourself (and it is indeed a nice, accessible combinatorics problem to play with), stop reading now. My goal in this post is to have fun applying some advanced tools to this (relatively) simple problem.

Telephone numbers

Let’s start by visualizing some configurations. In her post, Denise illustrated the complete set of configurations for n = 4, which I will visualize like this:

Notice how I’ve arranged them: in the first row is the unique configuration where no one is talking (yes, that counts). In the second row are the six possible configurations with just a single conversation. The last row has the three possible configurations with two conversations.

One good approach at this point would be to derive some recurrences. This problem does indeed admit a nice recurrence, but I will let you ponder it. Instead, let’s see if we can just “brute-force” our way to a general formula, using our combinatorial wits. Later I will demonstrate a much more principled, mechanical way to derive a general formula.

Let’s start by coming up with a formula for T_{n,k}, the number of configurations with n people and k conversations. The number of ways of choosing k pairs out of a total of n is the multinomial coefficient \displaystyle \binom{n}{2,2,\dots,2,n-2k} = \frac{n!}{(2!)^k(n-2k)!}. However, that overcounts things: it actually distinguishes the first pair, second pair, and so on, but we don’t want to have any ordering on the pairs. So we have to divide by k!, the number of distinct orderings of the pairs. Thus,

\displaystyle T_{n,k} = \frac{n!}{2^k (n-2k)! k!}.

Let’s do a few sanity checks. First, when k=0, we have T_{n,0} = \frac{n!}{n!} = 1. We can also try some other small numbers we’ve already enumerated by hand: for example, T_{4,1} = \frac{4!}{2 \cdot 2 \cdot 1} = 6, and T_{4,2} = \frac{4!}{4 \cdot 1 \cdot 2} = 3. So this seems to work.

For n people, there can be at most \lfloor n/2 \rfloor conversations. So, the total number of configurations is going to be

\displaystyle T_n = \sum_{k=0}^{\lfloor n/2 \rfloor} T_{n,k}.

We can use this to compute T_n for the first few values of n:

\begin{array}{rcl}T_0 &=& 1\\T_1 &=& 1 \\ T_2 &=& 1 + 1 = 2 \\ T_3 &=& 1 + 3!/2 = 4 \\ T_4 &=& 1 + 6 + 3 = 10 \\ T_5 &=& 1 + 5!/(2 \cdot 3!) + 5!/(4 \cdot 2) = 1 + 10 + 15 = 26 \\ T_6 &=& 1 + 6!/(2 \cdot 4!) + 6!/(4 \cdot 2 \cdot 2) + 6!/(8 \cdot 3!) = 1 + 15 + 45 + 15 = 76 \end{array}
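
(As a quick cross-check, the formula is easy to transcribe into Haskell; this snippet is an addition, not part of the original post.)

tnk :: Integer -> Integer -> Integer
tnk n k = factorial n `div` (2^k * factorial (n - 2*k) * factorial k)
  where factorial m = product [1..m]

tn :: Integer -> Integer
tn n = sum [ tnk n k | k <- [0 .. n `div` 2] ]

-- map tn [0..6]  ==  [1,1,2,4,10,26,76]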

At this point we could look up the sequence 1,1,2,4,10,26,76 on the OEIS and find out all sorts of fun things: e.g. that we are also counting self-inverse permutations, i.e. involutions, that these numbers are also called “restricted Stirling numbers of the second kind”, some recurrence relations, etc., as well as enough references to keep us busy reading for a whole year.

Species

We can describe configurations as elements of the combinatorial species C = E \circ (E_2 + X). That is, a configuration is an unordered set (E) of (\circ) things (E_2 + X), where each thing can either be an unordered pair (E_2) of people talking on the phone, or (+) a single person (X) who is not talking.

We can now use the Haskell species library to automatically generate some counts and see whether they agree with our manual enumerations. First, some boilerplate setup:

ghci> :set -XNoImplicitPrelude
ghci> :m +NumericPrelude
ghci> :m +Math.Combinatorics.Species

Now we define the species of configurations:

ghci> let configurations = set `o` (set `ofSizeExactly` 2 + singleton)

We can ask the library to count the number of configurations for different n:

ghci> take 10 (labelled configurations)
  [1,1,2,4,10,26,76,232,764,2620]

Oh good, those numbers look familiar! Now, I wonder how many configurations there are for n = 100?

ghci> labelled configurations !! 100
  24053347438333478953622433243028232812964119825419485684849162710512551427284402176

Yikes!

We can also use the library to generate exhaustive lists of configurations, and draw them using diagrams. For example, here are all 76 configurations for n=6. (If you want to see the code used to generate this diagram, you can find it here.)

And just for fun, let’s draw all 232 configurations for n = 7:

Whee!

A general formula, via generating functions

Finally, I want to show how to use the species definition given above and the theory of generating functions to (somewhat) mechanically derive a general formula for the number of configurations. (Hopefully it will end up being equivalent to the formula we came up with near the beginning of the post!) Of course, this is also what the species library is doing, but only numerically—we will do things symbolically.

First, note that we are counting labelled configurations (the friends are all distinct), so we want to consider exponential generating functions (egfs). Recall that the egf for a species F is given by

\displaystyle F(x) = \sum_{n \geq 0} |F[n]| \frac{x^n}{n!},

that is, a (possibly infinite) formal power series where the coefficient of x^n/n! is the number of distinct labelled F-structures of size n. In our case, we need

\displaystyle E(x) = \sum_{n \geq 0} 1 \cdot \frac{x^n}{n!} = e^x,

since there is exactly one set structure of any size, and

\displaystyle E_2(x) = \frac{x^2}{2},

which is just the restriction of E(x) to only the x^2 term. Of course, we also have X(x) = x. Putting this together, we calculate

\begin{array}{rcl}\displaystyle (E \circ (E_2 + X))(x) &=& e^{(x^2/2 + x)} \\[1em] &=& \displaystyle \sum_{n \geq 0} \frac{(x^2/2 + x)^n}{n!} \\[1em] &=& \displaystyle \sum_{n \geq 0} \sum_{k = 0}^n \frac{1}{n!} \binom{n}{k} \left(\frac{x^2}{2}\right)^k x^{n-k} \\[1em] &=& \displaystyle \sum_{n \geq 0} \sum_{k=0}^n \frac{1}{n!} \binom{n}{k} \frac{x^{n+k}}{2^k} \end{array}

Ultimately, we want something of the form \displaystyle \sum_{m \geq 0} f_m \frac{x^m}{m!}, so we’ll need to collect up like powers of x. To do that, we can do a bit of reindexing. Right now, the double sum is adding up a bunch of terms that can be thought of as making a triangle:

Each ordered pair in the triangle corresponds to a single term being added. Each column corresponds to a particular value of n, with n increasing to the right. Within each column, k goes from 0 up to n.

The powers of x in our double sum are given by n+k. If we draw in lines showing terms that have the same power of x, it looks like this:

So let’s choose a new variable m, defined by m = n + k. We can see that we will have terms for every m \geq 0. We will also keep the variable k for our other index, and substitute n = m - k to get rid of n. In other words, instead of adding up the triangle by columns, we are going to add it up by diagonals.

Previously we had k \leq n; substituting for n that now turns into k \leq m - k. Adding k to both sides and dividing by 2 yields k \leq \lfloor m/2 \rfloor (we can round down since k is an integer). Looking at the diagram above, this makes sense: the height of each diagonal line is indeed half its index. Rewriting our indices of summation and substituting m - k for n, we now have:

\begin{array}{rcl}\displaystyle \sum_{n \geq 0} \sum_{k=0}^n \frac{1}{n!} \binom{n}{k} \frac{x^{n+k}}{2^k} &=& \displaystyle \sum_{m \geq 0} \sum_{k=0}^{\lfloor m/2 \rfloor} \frac{1}{(m-k)!} \binom{m-k}{k} \frac{x^m}{2^k} \\[1em] &=& \displaystyle \sum_{m \geq 0} \sum_{k=0}^{\lfloor m/2 \rfloor} \frac{1}{k!(m-2k)!} \frac{x^m}{2^k} \\[1em] &=& \displaystyle \sum_{m \geq 0} \frac{x^m}{m!} \sum_{k=0}^{\lfloor m/2 \rfloor} \frac{m!}{k!(m-2k)!2^k} \end{array}

And hey, look at that! The coefficient of x^m/m! is exactly what we previously came up with for T_m. Math works!


by Brent at June 03, 2015 07:07 PM

mightybyte

The Problem with Curation

Recently I received a question from a user asking about "cabal hell" when installing one of my packages. The scenario in question worked fine for us, but for some reason it wasn't working for the user. When users report problems like this they usually do not provide enough information for us to solve it. So then we begin the sometimes arduous back and forth process of gathering the information we need to diagnose the problem and suggest a workaround or implement a fix.

In this particular case luck was on our side and the user's second message just happened to include the key piece of information. The problem in this case was that they were using stackage instead of the normal hackage build that people usually use. Using stackage pins each of your dependencies to a single version. The user reporting the problem was trying to add additional dependencies to his project and those dependencies required different versions. Stackage was taking away degrees of freedom from the dependency solver (demoting it from the driver seat to the passenger seat). Fortunately in this case the fix was simple: stop freezing down versions with stackage. As soon as the user did that it worked fine.

This highlights the core problem with package curation: it is based on a closed-world assumption. I think that this makes it not a viable answer to the general question of how to solve the package dependency problem. The world that many users will encounter is not closed. People are constantly creating new packages. Curation resources are finite and trying to keep up with the world is a losing battle. Also, even if we had infinite curation resources and zero delay between the creation of a package and its inclusion in the curated repository, that would still not be good enough. There are many people working with code that is not public and therefore cannot be curated. We need a more general solution to the problem that doesn't require a curator.

by mightybyte ([email protected]) at June 03, 2015 05:25 PM

Don Stewart (dons)

Haskell dev role in Strats at Standard Chartered [Singapore]

The Strats team at Standard Chartered has an open position for a typed functional programming developer, based in Singapore.

You will work on the trading floor, directly with traders, building software to automate their work and improve their efficiency. The role is highly development focused and you will use Haskell for almost all tasks: data analysis, market data publishing, database access, web services, desktop GUIs, large parallel tasks, quantitative models, solvers, everything. This is a fast paced role – code you write today will be deployed within hours to hundreds of users and has to work.

This is a permanent position in Singapore as part of the Strats global team. Demonstrated experience in typed FP (Haskell, OCaml, F# etc) is required. We have around 2.5 million lines of Haskell, and our own Haskell compiler. In this context we look for skill and taste in typed functional programming to capture and abstract over complex, messy systems.

Experience writing typed APIs to external systems such as databases, web services, pub/sub platforms is very desirable. We like working code, so if you have Hackage or github libraries, we definitely want to see them. We also like StackOverflow answers, blog posts, academic papers, or other arenas where you can show broad FP ability. A PhD in computer science is a strong advantage.

The role requires physical presence on the trading floor in Singapore. Remote work is not an option. Ideally you have some project and client management skills — you will talk to users, understand their problems and then implement and deliver what they really need. No financial background is required.

More info about our development process is in the 2012 PADL keynote, and a 2013 HaskellCast interview.

If this sounds exciting to you, please send your resume to me – donald.stewart <at> sc.com.

Role posted 2015-06-03


Tagged: jobs

by Don Stewart at June 03, 2015 08:40 AM

The GHC Team

GHC Weekly News - 2015/06/03

Hi *,

It's that time once again - to get some info on what's happening in the world of GHC! It's been a quiet few weeks as a UK Holiday punted one of GHC HQ's meetings off, and this week we were only partially there.

The main point of discussion was 7.10.2, and continuing work on compiler performance. The good news is, the past few weeks have seen good work on both these fronts!

7.10.2 status

7.10.2 is swimming along very nicely - the status page shows the current set of tickets we've fixed and plan on fixing.

Not much has changed from last time, except we've fixed even more bugs! We're currently sitting at about 85 bugs fixed, some of them pretty important - code generation bugs, compiler performance fixes, some RTS and event manager work. Your author is actually quite happy with what GHC 7.10.2 looks like, at this rate.

List chatter

  • Facundo Dominguez asks: sometimes we want to create a static pointer in a function with a local definition – how can we do that? The current problem is that the desugarer gets in the way and such approaches are currently rejected, but Facundo has some ideas/questions about a fix. https://mail.haskell.org/pipermail/ghc-devs/2015-May/009110.html

Noteworthy commits

Closed tickets

#10407, #10408, #10177, #10359, #10403, #10248, #9579, #10415, #10419, #10427, #10429, #10397, #10422, #10335, #10366, #10110, #10397, #10349, #10244, #8555, #8799, #9131, #10396, #10354, #10278, #9899, #3533, #9950, #10092, #9950, #10430, #9682, #9584, #10446, #10410, #10298, #10449, #10399, #7695, #10261, #8292, #10360, #10126, #10317, #10101, #10322, #10313, #10471, #10473, #7170, #10473, #10423, #10466, #8695, #10461, #10052, #10370, #10425, #10452, #10474,

by thoughtpolice at June 03, 2015 07:03 AM

Magnus Therning

Re. cryptonite

Excellent work Vincent, but then you’ve never done anything but! I’m looking forward to purging all the old crypto packages from ArchHaskell in the near future.

Oh, BTW, why Disqus for comments?

June 03, 2015 12:00 AM

June 02, 2015

Noam Lewis

infernu news

In the past month, most of the work on infernu was to stabilize the type system, making decisions about some trade-offs. Here’s a short summary:

  • Colors! I refactored all the pretty printing code to use the ansi-wl-pprint package, which fixes indentation and formatting issues and also adds ansi colors to the output. One neat thing is that identical type variables have the same color:
    infernu-colors
  • Changed the way constructor functions are treated. Until now, constructor functions were treated at definition site as regular functions. The difference between constructors and non-constructors only happened at “new” calls, where the expected “this” value was forced to be a closed row type. Unfortunately this breaks if you call “new Foo()” inside the definition of “Foo”.
    To avoid this issue, functions with uppercase names are now treated specially and the “this” row-type is forced to be closed when the function name is added to the environment. This allows maximum flexibility while defining Foo, while ensuring Foo’s type is closed outside the constructor to prevent junk code like var x = new Foo(); x.myWrongProperty = 2;
  • Explored the idea of “Maybe” (or “optional”) types, including having common JS APIs use them for stronger safety. For example, array access should return a Maybe value.
    Unfortunately, since there is no syntax to construct a Maybe-typed value (no “Just”), all values can be implicitly “upgraded” to Maybe. In other words, there is an ambiguity that breaks type inference. So for now, I am not implementing Maybe types (perhaps with explicit annotations they can come back).
  • Decided to disable generalization of row fields (object properties). This decision means that user-defined objects will by default not have polymorphic methods, although the object itself could be polymorphic (and thus different occurrences of the object in the program syntax will allow instantiation to different types). The reason for this decision is that overly-polymorphic fields cause unexpected type errors, such as when passing objects that contain them as parameters to functions (contravariance can surprise you).
  • Started a document sketching out the denotational semantics of JS as infernu decides they should be, which helped clarify a few issues in the JS -> LC translator. The next step is to change all translations to preserve semantics; currently they only preserve types.
  • Bug fixes: polymorphic subsumption checking, unification of recursive types.
  • Increased compatibility: now using base-compat and a custom prelude to increase compatibility with different GHC versions (thanks to RyanGlScott for submitting a fix to use base-orphans which prompted me to do this).

Tagged: Infernu

by sinelaw at June 02, 2015 10:26 PM

A simple problem with recursive types and subtyping

Here’s a simple example of recursive types interacting badly with subtyping:

T={ foo: T -> A}
U={ foo: U -> B}

Consider T <: U, therefore

(T -> A) <: (U -> B)

Which implies:

U <: T

So T <: U but also U <: T, which is true iff A <: B and B <: A.

In my case, the subtyping relation is polymorphic subsumption: T is subsumed by U iff U is “more polymorphic”; intuitively, U can be instantiated to all types that T can.

This situation arises in rather simple code, involving polymorphic vs. non-polymorphic row fields. For example, A is a row with a polymorphic method, whereas B is a row with a monomorphic (but compatible) method, such as:

A = { method: forall a. a -> () }
B = { method: String -> () }

In this case subsumption (the form of subtyping in play) fails.

One way around this is to avoid subsumption issues altogether by keeping things rank-1, and not using higher-rank row fields. Unfortunately, throwing away polymorphic methods is very bad: consider a non-polymorphic array.map (in JS).

A slightly better workaround is to push the foralls all the way out, keeping all types (including row types) rank-1. Every time an object method is accessed via the property syntax obj.method, we end up instantiating the object’s row type, and get a “fresh” type for the method. We get practically polymorphic methods. That’s the approach I’m investigating for infernu.


Tagged: Infernu

by sinelaw at June 02, 2015 09:56 PM

Brent Yorgey

BlogLiterately 0.8.1, now with HTTPS!

I’ve just released version 0.8.1 of BlogLiterately, a tool for formatting and posting stuff (especially Haskelly stuff) to blogs. This is in conjunction with the release of haxr-3000.11. After much blood, sweat, and tears, I was able to rip the HTTP package out of the guts of haxr, replace it with http-streams, and carefully sew everything back together around the edges. The result is that haxr now finally supports making XML-RPC calls via HTTPS, which in turn means that BlogLiterately once again works with WordPress, which no longer supports XML-RPC over HTTP. Happy blogging!


by Brent at June 02, 2015 02:52 AM

Magnus Therning

Facebook and PGP

I wonder if this will have any impact at all on PGP usage… I doubt it, but it’s still nice to see it done. The Register also writes about it.

June 02, 2015 12:00 AM

Vincent Hanquez

Announcing: cryptonite

For the last 5 years, I’ve worked intermittently on cryptography-related packages for Haskell. Lately, I’ve consolidated it all into one single package. Announcing cryptonite
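
As a taste of the consolidated API, hashing a ByteString with the new package looks roughly like this (a minimal sketch; it assumes the Crypto.Hash module with a polymorphic hash function, as shipped in cryptonite):

import Crypto.Hash (Digest, SHA256, hash)
import qualified Data.ByteString.Char8 as B8

main :: IO ()
main = do
  let msg = B8.pack "hello, cryptonite"
      digest = hash msg :: Digest SHA256
  print digest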

This new package merges the following packages:

Also this package adds support for the following features:

Why this new packaging model ?

This is mostly rooted in three reasons:

  • Discoverability
  • Cryptographic taxonomy
  • Maintenance

Discovering new packages in our current world of hackage is not easy. Unless you communicate heavily on new packages, there’s a good chance that most people would not know about a new package, leading to re-implementation, duplicated features, and inconsistencies.

Cryptographic taxonomy is hard, and getting harder; cryptographic primitives are re-used creatively, for example making a hash from a cipher primitive, a random generator from a cipher, or an authentication code from Galois field operations. This creates problems as to where the code lives, how the code is tested, etc.

Then, finally, if I had to choose a single reason for doing this, it would be maintenance. Maintenance of many cabal packages is costly in time: lower bounds, upper bounds, re-installation, compatibility modules, testing frameworks, benchmarking frameworks.

My limited free time has been siphoned into unproductive cross-package tasks, for example:

  • Upgrading bounds
  • Sorting out ghc database of installed-packages when reinstalling packages for testing features
  • Duplicating compatibility modules for supporting many compilers and library versions
  • Maintaining meta data for many packages (e.g. LICENSE, CHANGELOG, README, .travis, .cabal)
  • Tagging and releasing new versions

Doing everything in one package simplifies the build issues, makes it easier to test features, gives a more consistent cryptographic solution, and minimizes meta data changes.

What happens to other crypto packages ?

Cryptonite should be better in almost every aspect: better features, better testing. There are no real reasons to maintain any of the old packages anymore, so in the long run I expect most of those packages to become deprecated. I encourage everyone to move to the new package.

I’ll try to answer any migration questions as they arise, but most of the migration should be straightforward in general.

I’m committed to maintain cryptohash for now, as it is very widely used. I’ll try to maintain the rest of the packages for now, but don’t expect this to last for long.

Otherwise, if some people are interested in keeping certain other pieces independent and maintained, come talk to me directly with motivated arguments.

Contributing

I hope this does bring contributions, that this becomes a more community-maintained package, and especially that cryptonite becomes the canonical place for anything cryptography-related in Haskell.

The main things to look out for, for successful contributions:

  • respect the coding style
  • do not introduce dependencies

Also, you don't need to know every little thing in cryptography to help maintain and add features to cryptonite.

PS: I’m also looking forward to more cryptography-related discussions about timing attacks, which sources of randomness are truly random, etc. :-þ

June 02, 2015 12:00 AM