Planet Haskell

November 22, 2014

Mark Jason Dominus

Fucking Microsoft, fucking Windows

I have a new Lenovo laptop, which dual-boots Windows and Linux. (Huge thanks to the author of [these detailed instructions](https://askubuntu.com/questions/221835/installing-ubuntu-on-a-pre-installed-windows-8-64-bit-system-uefi-supported/228069#228069) and [this tool](http://sourceforge.net/projects/boot-repair/) that magically fixes whatever is messed up.) So I'm using Windows now for the first time in several years, and remembering why I stopped.

I had forgotten, for a long time, what this was like. You hear a lot of propaganda about how a Linux user has to solve a lot of problems that the free development process didn't solve, has to understand system administration, and, when things go wrong, has to depend on free support, the implication being that the free support is sketchy and incomplete. In the intervening years I had forgotten what a load of bullshit this is. When my Linux system was broken, or when the free support was sketchy and incomplete, I would wonder if I wouldn't be better off using something else. A few months of dealing with Windows have reminded me that no, I would not be better off.

When I first got the laptop, I had Windows booted, and I tried plugging in a VGA monitor for the first time. This is a universal, core function of the laptop, so one might expect them to have gotten it right a long time ago. Or maybe Lenovo tried it thirty minutes after the first laptop came from the factory, discovered something wrong, and then worked with Microsoft to make sure the problem was fixed. Anyway, one would expect to plug in the monitor and have it just work; whereas under Linux you might expect to have to fiddle with `Xorg.conf`, or download `xrandr` and puzzle out its man page, or something like that. But no, it was just the opposite. When I plugged in the monitor under Linux, everything just worked instantly. When I plugged in the monitor under Windows, Windows recognized the external monitor, but the integrated monitor went black. I tried all the monitor and display settings; the integrated display was still black.

So I called Lenovo support. What a mistake that was. The guy on the other end had me try the same set of display settings I had just tried, tried disabling and reinstalling some of the device drivers, and then threw up his hands and said I had a bad motherboard and should return the laptop for replacement. I said I was sure he was mistaken. I knew the motherboard was fine because the VGA monitor worked fine under Linux. (I didn't say this on the phone, because sometimes the word “Linux” is like waving a red cape in front of a bull: oh, you use Linux, that must be the cause of your problem.) Then at the end of the call: “Are you satisfied with the results of your call today?” Uh, what? You didn't solve my problem and your suggested solution is obvious nonsense, so, um, no, I'm not satisfied with the results of my call today. Anyway, that's the last time I'll make that mistake. I did some web searching and found a page that suggested an incantation in the boot settings that fixed the problem. So much for paid support; it was worse than useless.

More recently, I took the laptop to Los Angeles, and when I arrived, the AC power adapter wasn't working; I would plug in the laptop and it wouldn't charge. Both Linux and Windows reported “not plugged in”. (Lenovo got rid of the LED that announces when AC power is connected, so you have to boot the machine to find out whether it thinks it is plugged in.)
This had worked a few hours before, when I had it plugged in at the airport. I tested with a spare line cord and guessed that the power brick was faulty. Then I limped along for two days on reserve battery and a borrowed power brick until a replacement arrived. The replacement worked for a few hours in the office, but when I got back to the hotel, it failed in the same way.

Over the next couple of hours I discovered that if I powered the machine down, pulled out and replaced the battery, and then booted it up, the AC power would work indefinitely under Linux, and would even work under Windows _until I logged in_, when it would stop working after a few seconds. Once it stopped working, the only way to get it to work again was to power the machine down and pull the battery. So the problem was evidently a Windows software problem. And eventually I made the problem go away by booting with AC power and no battery, uninstalling the Windows battery device driver (yes, the battery has a device driver), shutting down, and booting with battery but no AC power. It has continued to work since. The original power brick, the one I thought had failed, is fine. I wouldn't have imagined that a Windows device driver problem could cause a failure of AC charging, but it seems so.

Had this kind of nonsense happened under Linux, I would have been annoyed but taken it in stride; Linux is written and maintained by volunteers, so you take what you can get. Microsoft has no such excuse. We paid Microsoft a hundred billion dollars for this shoddy shitware.

by Mark Dominus ([email protected]) at November 22, 2014 05:54 PM

Within this instrument, resides the Universe

When opportunity permits, I have been trying to teach my ten-year-old daughter rudiments of algebra and group theory. Last night I posed this problem:

Mary and Sue are sisters. Today, Mary is three times as old as Sue; in two years, she will be twice as old as Sue. How old are they now?

I have tried to teach Ms. 10 that these problems have several phases. In the first phase you translate the problem into algebra, and then in the second phase you manipulate the symbols, almost mechanically, until the answer pops out as if by magic.

There is a third phase, which is pedagogically and practically essential. This is to check that the solution is correct by translating the results back to the context of the original problem. It's surprising how often teachers neglect this step; it is as if a magician who had made a rabbit vanish from behind a screen then forgot to take away the screen to show the audience that the rabbit had vanished.

Ms. 10 set up the equations, not as I would have done, but using four unknowns, to represent the two ages today and the two ages in the future:

$$\begin{align} M_T & = 3S_T \\ M_Y & = 2S_Y \end{align} $$

($M_T$ here is the name of a single variable, not a product of $M$ and $T$; the others should be understood similarly.)

“Good so far,” I said, “but you have four unknowns and only two equations. You need to find two more relationships between the unknowns.” She thought a bit and then wrote down the other two relations:

$$\begin{align} M_Y & = M_T + 2 \\ S_Y & = S_T + 2 \end{align} $$

I would have written two equations in two unknowns:

$$\begin{align} M_T & = 3S_T\\ M_T+2 & = 2(S_T + 2) \end{align} $$

but one of the best things about mathematics is that there are many ways to solve each problem, and no method is privileged above any other except perhaps for reasons of practicality. Ms. 10's translation is different from what I would have done, and it requires more work in phase 2, but it is correct, and I am not going to tell her to do it my way. The method works both ways; this is one of its best features. If the problem can be solved by thinking of it as a problem in two unknowns, then it can also be solved by thinking of it as a problem in four or in eleven unknowns. You need to find more relationships, but they must exist and they can be found.

Ms. 10 may eventually want to learn a technically easier way to do it, but to teach that right now would be what programmers call a premature optimization. If her formulation of the problem requires more symbol manipulation than what I would have done, that is all right; she needs practice manipulating the symbols anyway.

She went ahead with the manipulations, reducing the system of four equations to three, then two and then one, solving the one equation to find the value of the single remaining unknown, and then substituting that value back to find the other unknowns. One nice thing about these simple problems is that when the solution is correct you can see it at a glance: Mary is six years old and Sue is two, and in two years they will be eight and four. Ms. 10 loves picking values for the unknowns ahead of time, writing down a random set of relations among those values, and then working the method and seeing the correct answer pop out. I remember being endlessly delighted by almost the same thing when I was a little older than her. In The Dying Earth Jack Vance writes of a wizard who travels to an alternate universe to learn from the master “the secret of renewed youth, many spells of the ancients, and a strange abstract lore that Pandelume termed ‘Mathematics.’”

“I find herein a wonderful beauty,” he told Pandelume. “This is no science, this is art, where equations fall away to elements like resolving chords, and where always prevails a symmetry either explicit or multiplex, but always of a crystalline serenity.”

After Ms. 10 had solved this problem, I asked if she was game for something a little weird, and she said she was, so I asked her:

Mary and Sue are sisters. Today, Mary is three times as old as Sue; in two years, they will be the same age. How old are they now?

“WHAAAAAT?” she said. She has a good number sense, and immediately saw that this was a strange set of conditions. (If they aren't the same age now, how can they be the same age in two years?) She asked me what would happen. I said (truthfully) that I wasn't sure, and suggested she work through it to find out. So she set up the equations as before and worked out the solution, which is obvious once you see it: Both girls are zero years old today, and zero is three times as old as zero. Ms. 10 was thrilled and delighted, and shared her discovery with her mother and her aunt.
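
One way the reduction can go in her four-unknown setup (my working, not a record of her exact steps):

$$\begin{align} M_T & = 3S_T \\ M_Y & = S_Y \\ M_Y & = M_T + 2 \\ S_Y & = S_T + 2 \end{align} $$

Substituting the last two relations into $M_Y = S_Y$ gives $M_T + 2 = S_T + 2$, so $M_T = S_T$; combined with $M_T = 3S_T$ this forces $S_T = 0$ and hence $M_T = 0$.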

There are some powerful lessons here. One is that the method works even when the conditions seem to make no sense; often the results pop out just the same, and can sometimes make sense of problems that seem ill-posed or impossible. Once you have set up the equations, you can just push the symbols around and the answer will emerge, like a familiar building as you approach it through a fog.

But another lesson, only hinted at so far, is that mathematics has its own way of understanding things, and this is not always the way that humans understand them. Goethe famously said that whatever you say to mathematicians, they immediately translate it into their own language and then it is something different; I think this is exactly what he meant.

In this case it is not too much of a stretch to agree that Mary is three times as old as Sue when they are both zero years old. But in the future I plan to give Ms. 10 a problem that requires Mary and Sue to have negative ages—say that Mary is twice as old as Sue today, but in three years Sue will be twice as old—to demonstrate that the answer that pops out may not be a reasonable one, or that the original translation into mathematics can lose essential features of the original problem. The solution that says that Mary is −2 years old and Sue is −1 is mathematically irreproachable, and if the original problem had been posed as “Find two numbers such that…” it would be perfectly correct. But translated back to the original context of a problem that asks about the ages of two sisters, the solution is unacceptable. This is the point of the joke about the spherical cow.

by Mark Dominus ([email protected]) at November 22, 2014 05:38 PM

The GHC Team

GHC Weekly News - 2014/11/21

Hi *,

To get things back on track, we have a short post following up the earlier one this week. It's been busy today so I'll keep it short:

  • The STABLE freeze Austin announced two weeks ago is happening now, although at this point a few things we wanted to ship are just 98% ready. So it may wait until Monday.
  • HEAD has been very busy the past two days as many things are now trying to merge as closely to the window as possible. Some notes follow.
  • HEAD now has support for using the 'deriving' clause for arbitrary classes (see #5462); a minimal example follows this list.
  • HEAD now has 64bit iOS and SMP support for ARM64, thanks to Luke Iannini. See #7942.
  • base now exports a new module for Natural numbers called Numeric.Natural following Herbert Valerio Riedel's recent proposal.
  • Your author has been busy and delayed due to some bad travel experiences the past week, so the 7.8.4 RC1 hasn't landed this past week. Hopefully it will be out by the end of this week still.
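
For the curious, here is a minimal sketch of what the new deriving support allows with the DeriveAnyClass extension; this is my own example, not taken from the ticket. Deriving a class that is not one of the stock derivable ones simply generates an empty instance, so every method falls back to its class default:

    {-# LANGUAGE DeriveAnyClass #-}

    -- A class whose methods all have defaults.
    class HasDefault a where
      def :: Maybe a
      def = Nothing

    -- The deriving clause produces the empty instance "instance HasDefault Colour",
    -- so "def :: Maybe Colour" is Nothing via the class default.
    data Colour = Red | Green | Blue
      deriving (Show, HasDefault)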

Since the last update was only a few days ago, you'd think we haven't closed a lot of tickets, but we have! Thomas Miedema has been very very persistent about closing tickets and cleaning them up, which is greatly appreciated: #9810, #8324, #8310, #9396, #9626, #9776, #9807, #9698, #7942, #9703, #8584, #8968, #8174, #9812, #9209, #9220, #9151, #9201, #9318, #9109, #9126, #8406, #8102, #8093, #8085, #8068, #8094, #9590, #9368, #2526, #9569, #8149, #9815, #5462, #9647, #8568, #9293, #7484, #1476, #9824, #9628, #7942

by thoughtpolice at November 22, 2014 12:33 AM

November 21, 2014

Philip Wadler

Bright Club


Tuesday 25 November at The Stand Comedy Club, I will be part of the line up at Bright Club, speaking on the subject of 'Turingery'. Bright Club is stand-up by academics---we are trained professionals; don't try this at home! Doors open 7:30pm, show starts 8:30pm. The Stand is at 5 York Place, Edinburgh, EH1 3EB. Tickets £5 at the door or online.

by Philip Wadler ([email protected]) at November 21, 2014 04:49 PM

November 20, 2014

Austin Seipp

Outages and improvements...

This past week, the primary Haskell.org server hosting the wiki and mailing systems went down due to RAID failures in its underlying hardware. Thanks to some (real) luck and hard work, we've managed to recover, move off that server to a new system, avoid some faulty hardware, and hopefully gain some improved reliability.

The rundown on what happened

Before we started using Rackspace, everything existed on a single machine hosted by Hetzner, in Germany. This machine was named rock and ran several VMs with IPs allocated to them, which ran Hackage, Mediawiki (AKA www) and ghc.haskell.org on top of KVM.

This server had a RAID1 setup between two drives. These drives were partitioned to have a primary 1.7TB partition for most of the data that was striped, and another small partition also striped.

We had degradation on both partitions and essentially lost one drive completely. This is why the system had been so slow the past week, grinding to a halt so often.

A quick move

We began to investigate this when the drives became almost entirely unable to service I/O. We found something really, really bad: we had been running without one of the drives for nearly two weeks!

We had neglected to install SMART and RAID monitoring in our Nagios setup, an enormous and nearly disastrous blunder. We also hadn't been watching the SFTP backup space that Hetzner provides, which was close to full the last time we checked. Still, the remaining drive seemed to be coping with the read-only workload.

Overall, it was a set of amateur mistakes that almost cost us. We'd been so focused on moving things out that we didn't take the time to check this server when we moved Hackage a few weeks prior and make sure the existing services kept working.

We'd planned to do this move earlier, but it obviously couldn't wait. So Davean spent most of his Tuesday migrating all the services over to Rackspace on a new VM, and we migrated the data to our shared MySQL server, and got the mail relay running again.

The whole process took close to 12 hours, but overall it went quite smoothly, with no read errors or problems on the remaining drive, and service was restored.

Improvements

In the midst of anticipating this move, I migrated a lot of download data - specifically GHC and the Haskell Platform - to a new server, https://downloads.haskell.org. It's powered by a brand new CDN via Fastly, and provides the Platform and GHC downloads and documentation. Don't worry: redirects from the main server are still in place.

We've also finally fixed a few bugs stopping us from deploying the new main website. We've deployed https://try.haskell.org now as an official instance, as well as fixed https://new-www.haskell.org to use it.

We've also moved most of the MediaWiki data to our consolidated MariaDB server. However, because we were moving data so hastily, we didn't put the new wiki in the same DC as the database! As a result, we've taken something of a performance hit. In the meantime, we've deployed Memcached to try and offset this a bit. We'll be moving things again soon, and it should end up faster and much less problematic.

We've also taken the time to archive a lot of our data from Rock onto DreamHost, who have joined us and given us free S3-compatible object storage! We'll be moving a lot of data in there soon.

Future improvements

We've got several improvements we're going to deploy in the near future:

  • We're obviously going to overhaul our Nagios setup, which is clearly inadequate and now out of date with all the recent changes.
  • We'll be moving Fastly in front of Hackage soon, which should hopefully dramatically increase download speeds for all packages, and save us a lot of bandwidth in the long run.
  • Beyond memcached, our MediaWiki instance will be using Fastly as a CDN, as it's one of the most bandwidth-heavy parts of the site besides Hackage. We'll be working towards integrating purge support from Mediawiki with the upstream CDN.
  • We'll also be moving our new wiki server to a separate DC so it can talk to the MySQL server more efficiently.
  • We're planning on doing some other tech-stack upgrades: a move to nginx across the remaining servers using apache, upgrading MediaWiki, etc.
  • We'll be working towards better backups for all our servers, possibly using a solution like Attic and S3 synchronization.

And some other stuff I'll keep secret.

Conclusion

Overall, the past few weeks have seen several improvements but also a lot of problems, and some horrible mistakes that could have cost us weeks of data if we hadn't been careful. Most of our code, repositories, configurations, and critical data was mirrored at the time - but we still got struck where we were vulnerable: on old hardware we had had issues with before.

For that, I take responsibility on behalf of the administration team (as the de-facto lead, it seems) for the outage this past week, which cost us nearly a day of email and main site availability.

The past week hasn't been great, but here's to hoping the next one will be better. Upwards and onwards.

by austin (Austin Seipp) at November 20, 2014 04:34 AM

FP Complete

Stackage server: new features and open source

Open source

We've been working on Stackage server for a while and now that the code has stabilized it's ready to be open source. You can fork it on Github! We're a responsive team, used to bringing pull requests forward and getting them deployed.

Since the last update we added a bunch of things. Here's a rundown:

Clarifying improvements

  • Recommended snapshots: The home page now lists recommended snapshots for GHC 7.8, the GHC 7.8 Haskell Platform flavour, and GHC 7.6. This makes it a little more straightforward to know where to start.

  • Snapshots: The snapshots page has been cleaned up a bit to clearly separate the times when snapshots were uploaded. Just an aesthetic improvement.

  • We also reorganized the Preparing your system to use Stackage wiki page to make it more straightforward to get started.

Packages

We now list all packages from Hackage on the packages page. A subset of these are included in Stackage snapshots. Hitting those links takes you to our new package page. Examples:

On this page we display:

  • Metadata for the most recently uploaded version of that package:

    • Name, version, license, authorship, maintainer
  • We also display the dependencies of that package. Notice that there are no version constraints. That information is stored in the snapshot itself.

  • Documentation versions: each snapshot that contains a version of this package will be listed along with documentation for that version.

README

We support showing a README from your package (see e.g. hlint example above), which is a really nice way to introduce people to your package. You can use the same one that you use for your Github README.md, just include it in your .cabal file with:

extra-source-files: README.md

If your filename is just README it'll be included as plain text. With the .md extension it will be rendered as markdown.

Please do go ahead and include your README.md in your .cabal file. If you don't have one, please write one and make it helpful and descriptive for any newbies to your package.

If no README file is found it falls back to the package description, which is displayed as plain text. As is well known in the community at large, writing descriptive prose in Haddock syntax is not pleasant, whereas pretty much everyone writes their READMEs in markdown anyway thanks to Github.

CHANGELOGs are also displayed as markdown if the file extension is .md, otherwise it's treated as plain text.

Specifying remote-repo in sandboxes

An issue with using Stackage in the past was that you had to either put your remote-repo field in your global cabal config, set up an hsenv, or use a cabal.config with constraints in it. Now a feature has been merged that allows remote-repo to be specified in a cabal.config file in your project root. Once this is released you'll be able to use a Stackage snapshot within a sandbox and keep the cabal.config file in your source repository.
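
For illustration only, a project-local cabal.config might then contain little more than the repository line; the URL below is a placeholder, and the real one to copy is shown on each Stackage snapshot page:

remote-repo: stackage:http://www.stackage.org/snapshot/YOUR-SNAPSHOT-ID

After a cabal update inside the sandbox, installs resolve against that snapshot instead of raw Hackage.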

Additional package page features

Additional features include:

  • Tagging: tagging a package is easy: you just click the + button, type something and hit return.

    We try to keep the tags in a simple format (a slug), so if you type characters that shouldn't be in there, it'll remove them and then prompt you to confirm.

    Related: if you click a tag it will take you to a page of all packages tagged with that, e.g. parsing.

    Finally, there is a list of all tags. Currently it's rather small because that's just what I populated myself.

  • Likes: just upvote a package if you like it. Hit the thumbs up icon.

  • Comments: comments are provided by Disqus. It's very easy to comment with Disqus, you can use your normal account.

    If you're interested in a package, or are an author, you can hit the Subscribe link displayed at the bottom of the page to subscribe to that discussion to get updates of future comments.

In addition to the other metadata, we make use of the files in your package (as Github does), so we will also display:

Summary

Stackage just got easier to use:

  • The site is clearer now.
  • The wiki guide is easier.
  • Using Stackage from a sandbox will soon be very easy.
  • You can browse documentation of your snapshot on Stackage via the snapshot (e.g. here).
  • Or you can start from the package list and view an individual package, vote and tag it.
  • It's open source!

November 20, 2014 12:00 AM

Christopher Done

Lucid: templating DSL for HTML

I’m not big on custom templating languages, for reasons I’ll write about another time. I prefer EDSLs. I preferred the xhtml package back when that was what everybody used. It looked like this:

header << thetitle << "Page title"

thediv noHtml ! [theclass "logo"] << "…"
thediv noHtml ! [identifier "login"]

Pretty line-noisy to read and write, and hard to edit in a reasonable manner.

Later, blaze-html became the new go-to HTML writing library. It improved upon the xhtml package by being faster and having a convenient monad instance. It looks like this:

page1 = html $ do
    head $ do
        title "Introduction page."
        link ! rel "stylesheet" ! type_ "text/css" ! href "screen.css"
    body $ do
        div ! id "header" $ "Syntax"
        p "This is an example of BlazeMarkup syntax."
        ul $ mapM_ (li . toMarkup . show) [1, 2, 3]

Much easier to read, write and edit thanks to the monad instance.

However, after several years of using that, I’ve come to write my own. I’ll cover the infelicities about Blaze and then discuss my alternative approach.

Reading back through what I’ve written below, it could be read as a bit attacky, and some of the issues are less philosophical and more incidental. I think of it more that the work on writing HTML in a DSL is incomplete and to some degree people somewhat gave up on doing it more conveniently at some point. So I’m re-igniting that.

The combination of needing to write a few HTML reports and recent discussions about Blaze made me realise it was time for me to come at this problem afresh with my own tastes in mind. I also haven’t used my own approach much yet, other than porting some trivial apps to it.

Blaze

Names that conflict with base

The first problem is that Blaze exports many names which conflict with base. Examples:

div, id, head, map

The obvious problem with this is that you either have to qualify any use of those names, which means you have to qualify Blaze, and end up with something inconsistent like this:

H.div ! A.id "logo" $ "…"

Where H and A come from importing the element and attribute modules like this:

import qualified Text.Blaze.Html5            as H
import qualified Text.Blaze.Html5.Attributes as A

Or you don’t import Prelude and only import Blaze, but then you can’t do a simple map without qualification.

You might’ve noticed in the old xhtml package that thediv and identifier are used instead. The problem with using different names from the actual things they refer to is that they’re hard to learn and remember, both for regular Haskellers and newbies coming to edit your templates.

Names that are keywords

This is a common problem in DSLs, too. In Blaze the problematic names are class and type (perhaps others I don’t recall). Blaze solves it with class_ and type_.

Again, the problem with this is that it is inconsistent with the other naming conventions. It’s another exception to the rule that you have to remember and makes the code look bad.

Conflicting attribute and element names

There are also names which are used for both attributes and elements. Examples are style and map. That means you can’t write:

H.head $ style "body { background: red; }"
body $ p ! style "foo" $

You end up writing:

H.head $ H.style "body { background: red; }"
body $ p ! A.style "foo" $

Inconsistency is difficult and ugly

What the above problems amount to is ending up with code like this:

body $ H.div ! A.id "logo" ! class_ "left" ! hidden $ "Content"

At this point users of Blaze give up with second-guessing every markup term they write and decide it’s more consistent to qualify everything:

H.body $ H.div ! A.id "logo" ! A.class_ "left" ! A.hidden $ "Content"

Or, taken from some real code online:

H.input H.! A.type_ "checkbox"
        H.! A.checked True
        H.! A.readonly "true"

This ends up being too much. Inconvenient to type, ugly to read, and one more step removed from the HTML we’re supposed to be generating.

The Monad instance isn’t

The monad instance was originally conceived as a handy way to write HTML nicely without having to use <> or lists of lists and other less wieldy syntax.

In the end the monad ended up being defined like this:

instance Monad MarkupM where
  return _ = Empty
  {-# INLINE return #-}
  (>>) = Append
  {-# INLINE (>>) #-}
  h1 >>= f = h1 >> f
      (error "Text.Blaze.Internal.MarkupM: invalid use of monadic bind")
  {-# INLINE (>>=) #-}

And has been for some years. Let’s take a trivial example of why this is not good. You render some HTML and while doing so build a result to be used later:

do xs <- foldM (\c i -> …)
               mempty
               ys
   mapM_ dd xs

Uh-oh:

*** Exception: Text.Blaze.Internal.MarkupM: invalid use of monadic bind

The overloaded strings instance is bad

The previous point leads onto this next point, which is that due to this phantomesque monad type, the instance is like this:

instance IsString (MarkupM a) where
    fromString = Content . fromString
    {-# INLINE fromString #-}

How can it make this value? It cannot. If you want to go ahead and extract that ‘a’, you get:

*** Exception: Text.Blaze.Internal.MarkupM: invalid use of monadic bind

Additionally, this instance is too liberal. You end up getting this warning:

A do-notation statement discarded a result of type GHC.Prim.Any

Suppress this warning by saying _ <- "Example" or by using the flag -fno-warn-unused-do-bind

So you end up having to write in practice (again, taken from a real Blaze codebase by one of the authors):

void "Hello!"

Which pretty much negates the point of using IsString in the first-place. Alternatively, you use -fno-warn-unused-do-bind in your module.

Working with attributes is awkward

The ! syntax seems pretty convenient from superficial inspection:

link ! rel "stylesheet" ! type_ "text/css" ! href "screen.css"

But in practice it means you always have the same combination:

div ! H.class_ "logo" $ "…"

Which I find—personally speaking—a bit distasteful to read; it’s not far from what we saw in the old xhtml package:

thediv ! [theclass "logo"] << "…"

Did we really save that much in the attribute department? Operators are evil.

But mostly it presents an editing challenge. Operators like this make it tricky to navigate, format in a regular way, and do code transformations on. All Haskell code has operators, so this is a general problem. But if your DSL doesn’t actually need these operators, I consider this a smell.

Attributes don’t compose

Attributes should compose. For example, let’s say you want to define a re-usable component with bootstrap:

container inner = div ! class_ "container" $ inner

Now you can use it to make a container. But consider that you also want to add additional attributes to it later. You can try to do that by applying ! again:

container ! class_ "main" $ "My content!"

In Blaze this produces:

λ> main
"<div class=\"container\" class=\"main\">My content!</div>"

Browsers ignore the second class attribute, so the composition didn’t work.

Ceremony is tiring

Here’s the example from Blaze’s package that’s introduced to users.

import Prelude hiding (head, id, div)
import Text.Blaze.Html4.Strict hiding (map)
import Text.Blaze.Html4.Strict.Attributes hiding (title)
import Text.Blaze.Renderer.Utf8 (renderMarkup)

page1 :: Markup
page1 = html $ do
    head $ do
        title "Introduction page."
        link ! rel "stylesheet" ! type_ "text/css" ! href "screen.css"
    body $ do
        div ! id "header" $ "Syntax"
        p "This is an example of BlazeMarkup syntax."
        ul $ mapM_ (li . toMarkup . show) [1, 2, 3]

main = print (renderMarkup page1)

Apart from the import backflips you have to do to resolve names properly, you have at least three imports to make just to render some HTML. Call me lazy, or stupid, but I never remember this deep hierarchy of modules and always have to look it up every single time. And I’ve been using Blaze for as long as the authors have.

Transforming

A smaller complaint is that it would sometimes be nice to transform over another monad. The simplest example is storing read-only model information in a reader monad, so that you don’t have to pass around a bunch of things as arguments to all your view functions. I’m a big fan of function arguments for explicit state, but not so much if it’s the same argument every time.

No Show instance

It would be nice if you could just write some markup in the REPL without having to import some other modules and wrap it all in a function just to see it.

Lucid

My new library, Lucid, attempts to solve most of these problems.

Naming issues

Firstly, all names which are representations of HTML terms are suffixed with an underscore _:

p_, class_, table_, style_

No ifs or buts. All markup terms.

That solves the following problems (from the issues described above):

  • Names that conflict with base: div_, id_, head_, map_, etc.
  • Names that are keywords: class_, type_, etc.
  • Conflicting attribute and element names: solved by abstracting those names via a class. You can write style_ to mean either the element name or the attribute name (a toy sketch of the idea follows this list).
  • Inconsistency is difficult and ugly: there’s no inconsistency, all names are the same format.
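
As a toy sketch of that point about conflicting names (an illustration of return-type overloading, not Lucid’s actual internals), one name can stand for either an element or an attribute because the instance is chosen by the type at which the call is used:

-- Toy types standing in for Lucid's element and attribute machinery.
newtype Element   = Element String   deriving Show
newtype Attribute = Attribute String deriving Show

class ToStyle r where
  style_ :: String -> r

-- Used where an element is expected, style_ builds a <style> element...
instance ToStyle Element where
  style_ css = Element ("<style>" ++ css ++ "</style>")

-- ...and used inside an attribute list, it builds a style="..." attribute.
instance ToStyle Attribute where
  style_ val = Attribute ("style=\"" ++ val ++ "\"")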

No import problems or qualification. Just write code without worrying about it.

How it looks

Plain text is written using the OverloadedStrings and ExtendedDefaultRules extensions, and is automatically escaped:

λ> "123 < 456" :: Html ()
123 &lt; 456

Elements nest by function application:

λ> table_ (tr_ (td_ (p_ "Hello, World!")))
<table><tr><td><p>Hello, World!</p></td></tr></table>

Elements are juxtaposed via monoidal append:

λ> p_ "hello" <> p_ "sup"
<p>hello</p><p>sup</p>

Or monadic sequencing:

λ> div_ (do p_ "hello"; p_ "sup")
<div><p>hello</p><p>sup</p></div>

Attributes are set using the with combinator:

λ> with p_ [class_ "brand"] "Lucid Inc"
<p class="brand">Lucid Inc</p>

Conflicting attributes (like style_) work for attributes or elements:

λ> html_ (head_ (style_ "body{background:red}") <>
                 with body_ [style_ "color:white"]
                      "Look ma, no qualification!")
<html><head><style>body{background:red}</style></head>
<body style="color:white">Look ma, no qualification!</body></html>

The Blaze example

For comparison, here’s the Blaze example again:

page1 = html $ do
    head $ do
        title "Introduction page."
        link ! rel "stylesheet" ! type_ "text/css" ! href "screen.css"
    body $ do
        div ! id "header" $ "Syntax"
        p "This is an example of BlazeMarkup syntax."
        ul $ mapM_ (li . toMarkup . show) [1, 2, 3]

And the same thing in Lucid:

page2 = html_ $ do
    head_ $ do
      title_ "Introduction page."
      with link_ [rel_ "stylesheet",type_ "text/css",href_ "screen.css"]
    body_ $ do
        with div_ [id_ "header"] "Syntax"
        p_ "This is an example of Lucid syntax."
        ul_ $ mapM_ (li_ . toHtml . show) [1,2,3]

I’m not into operators like ($) and swung indentation like that, but I followed the same format.

I’d write it in a more Lispy style and run my hindent tool on it:

page1 =
  html_ (do head_ (do title_ "Introduction page."
                      with link_
                           [rel_ "stylesheet"
                           ,type_ "text/css"
                           ,href_ "screen.css"])
            body_ (do with div_ [id_ "header"] "Syntax"
                      p_ "This is an example of Lucid syntax."
                      ul_ (mapM_ (li_ . toHtml . show)
                                 [1,2,3])))

But that’s another discussion.

It’s a real monad

Normal monadic operations work properly:

λ> (return "OK!" >>= p_)
<p>OK!</p>

It’s basically a writer monad.

In fact, it’s also a monad transformer:

λ> runReader (renderTextT (html_ (body_ (do name <- lift ask
                                            p_ (toHtml name)))))
             ("Chris" :: String)
"<html><body><p>Chris</p></body></html>"

Overloaded strings instance is fine

The instance is constrained over the return type being (). So string literals can only be type HtmlT m ().

λ> do "x" >> "y" :: Html ()
xy

λ> do x <- "x"; toHtml (show x)
x()

Attributes

Attributes are simply written as a list. That’s all. Easy to manipulate as a data structure, easy to write and edit, and automatically indent in a predictable way:

λ> with p_ [id_ "person-name",class_ "attribute"] "Mary"
<p id="person-name" class="attribute">Mary</p>

No custom operators are required. Just the with combinator. If you want to indent it, just indent it like normal function application:

with p_
     [id_ "person-name",class_ "attribute"]
     "Mary"

And you’re done.

Composing attributes

You should be able to compose with. For example, let’s say you want to define a re-usable component with bootstrap:

λ> let container_ = with div_ [class_ "container "]

Now you can use it to make a container:

λ> container_ "My content!"
<div class="container ">My content!</div>

But consider now that you also want to add additional attributes to it later. You can do that with another call to with:

λ> with container_ [class_ "main"] "My content!"
<div class="container main">My content!</div>

Duplicate attributes are composed with normal monoidal append. Note that I added a space in my definition of container_, anticipating further extension later. Other attributes might not compose with spaces.

Unceremonious

Another part I made sure was right was lack of import nightmare. You just import Lucid and away you go:

λ> import Lucid
λ> p_ "OK!"
<p>OK!</p>
λ> p_ (span_ (strong_ "Woot!"))
<p><span><strong>Woot!</strong></span></p>
λ> renderBS (p_ (span_ (strong_ "Woot!")))
"<p><span><strong>Woot!</strong></span></p>"
λ> renderToFile "/tmp/foo.html" (p_ (span_ (strong_ "Woot!")))

If I want to do more advanced stuff, it’s all available in Lucid. But by default it’s absolutely trivial to get going and output something.

Speed

Actually, despite having a trivial implementation, being a real monad and a monad transformer, it’s not far from Blaze. You can compare the benchmark reports here. A quick test of writing 38M of HTML to file yielded the same speed (about 1.5s) for both Lucid and Blaze. With such decent performance for very little work I’m already ready to start using it for real work.

Summary

So the point of this post was really to explain why there is yet another HTML DSL, and I hope I did that well enough.

The code is on Github. I pushed to Hackage but you can consider it beta for now.

November 20, 2014 12:00 AM

November 19, 2014

Kototama

Four Haskell books

In 2013 and 2014 I read four Haskell books. In this post I’m making mini-reviews for each of them, chronologically ordered.

Learn you a Haskell for Great Good

Learn you a Haskell for Great Good is one of the best computer science books I have had the occasion to read, and I even admit I was a little sad once I finished it! It exposes the main concepts of Haskell in a very pedagogic way and with a clear writing style. After reading the book, functors, applicatives and monads become easily understandable. The only drawback is that this book is not enough to start with Haskell: there are no explanations of how to create a Haskell project using Cabal, use libraries, test your application, etc. I think it’s okay for an introductory book, but one should be informed. The book is suited for people without experience with Haskell or functional programming.

Developing Web Applications with Haskell and Yesod

I read Developing Web Applications with Haskell and Yesod to discover other ways of building web applications, as I mostly had experience with Clojure and Compojure. I enjoyed reading this book, which is short but contains a lot, going from the frontend to the backend and covering many subjects: HTML templates, sessions, authentication, persistence with databases, etc. You discover how a statically typed language can help you build a web application with more guarantees. The only drawback for me was that even after reading one book on Haskell, some of the type signatures of functions were still hard to understand. Some familiarity with monad transformers prior to reading this book may help.

Real World Haskell

The goal of Real World Haskell was to bring Haskell to a less academic audience by showing how Haskell could be used for “real world” applications, at a time when there weren’t many Haskell books around. In this sense I think the book succeeds. There are a lot of interesting subjects tackled in this book, like profiling and performance analysis, but I did not really enjoy reading it. Either the examples were a bit boring or the writing style was too dry for me; in the end I had to fight to finish this very long book (~700 pages!). I nonetheless appreciate that this book exists and I may use it in the future as a reference. The book is a bit outdated and some code is not valid anymore; it was not a problem for me since I didn’t try out the examples, but this should be considered if you want to learn Haskell with it.

Beginning Haskell

Beginning Haskell is a paradoxical book. The truth is this book should not have been published, because its editing and proofreading are too poor. What do you think about a book where the first code sample is wrong? map listOfThings action instead of map action listOfThings on page 4, come on… Also the subtitle reads “a Project-Based approach”, but this seems exaggerated since you will only get some code samples here and there… That being said, I really enjoyed reading this book! I don’t know if it’s a good introductory book, but it surely is a decent second book on Haskell. It explains how to use Cabal, test an application, use popular libraries like Conduit, build parsers with attoparsec, build a web application with Scotty and Fay, etc. It includes advanced topics like type-level programming (enforcing more constraints at the type level) and even a small introduction to Idris. Conclusion: either burn it or read it.

Conclusion

There are more and more books on Haskell newly published or coming, which makes me happy: it broadens the Haskell community, shows that it is growing, and makes it easier to learn the language. Reading is not enough to learn, so at the moment I’m writing small Haskell projects to go from a beginner to an intermediate level.

November 19, 2014 11:00 PM

Neil Mitchell

Announcing Shake 0.14

Summary: Shake 0.14 is now out. The *> operator became %>. If you are on Windows the FilePath operations work differently.

I'm pleased to announce Shake 0.14, which has one set of incompatible changes, one set of deprecations and some new features.

The *> operator and friends are now deprecated

The *> operator has been the Shake rule-definition operator for many years, but in GHC 7.10 the *> operator is exported by the Prelude, which causes name clashes. To fix that, I've added %> as an alias for *>. I expect to export *> for the foreseeable future, but you should use %> going forward, and will likely have to switch with GHC 7.10. All the Shake documentation has been updated. The &*> and |*> operators have been renamed &%> and |%> to remain consistent.
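
As a quick illustration (my sketch, not taken from the announcement), a typical rule changes only in its operator; the pattern and the body stay exactly the same:

    import Development.Shake
    import Development.Shake.FilePath

    main :: IO ()
    main = shakeArgs shakeOptions $ do
        want ["_build/foo.o"]
        -- previously written as:  "_build//*.o" *> \out -> do ...
        "_build//*.o" %> \out -> do
            let src = dropExtension out <.> "c"
            need [src]
            cmd "gcc" ["-c", src, "-o", out]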

Development.Shake.FilePath now exports System.FilePath

Previously the module Development.Shake.FilePath mostly exported System.FilePath.Posix (even on Windows), along with some additional functions, but modified normalise to remove /../ and made </> always call normalise. The reason is that if you pass non-normalised FilePath values to need, Shake can end up with two names for one on-disk file and everything goes wrong, so it tried to avoid creating non-normalised paths.

As of 0.14 I now fully normalise the values inside need, so there is no requirement for the arguments to need to be normalised already. This change allows the simplification of directly exporting System.FilePath and only adding to it. If you are not using Windows, the changes are:

  • normalise doesn't eliminate /../, but normaliseEx does.
  • </> no longer calls normalise. It turned out calling normalise is pretty harmful for FilePattern values, which leads to fairly easy-to-make but hard-to-debug issues.

If you are using Windows, you'll notice all the operations now use \ instead of /, and properly cope with Windows-specific aspects like drives. The function toStandard (convert all separators to /) might be required in a few places. The reasons for this change are discussed in bug #193.

New features

This release has lots of new features since 0.13. You can see a complete list in the change log, but the most user-visible ones are:

  • Add getConfigKeys to Development.Shake.Config.
  • Add withTempFile and withTempDir for the Action monad.
  • Add -j to run with one thread per processor.
  • Make |%> matching with simple files much faster.
  • Use far fewer threads, with correspondingly less stack usage.
  • Add copyFileChanged.

Plans

I'm hoping the next release will be 1.0!

by Neil Mitchell ([email protected]) at November 19, 2014 09:27 PM

Philip Wadler

CITIZENFOUR showing in Edinburgh

CITIZENFOUR, Laura Poitras's documentary on Edward Snowden, sold out for its two showings in Edinburgh. The movie is rated five-stars by the Guardian ("Gripping"), and four-stars by the Independent ("Extraordinary"), the Financial Times ("True-life spy thriller"), the Observer ("Utterly engrossing"), and the Telegraph ("Everybody needs to see it").

Kickstarter-like, OurScreen will arrange to show a film if enough people sign up to see it. I've scheduled a showing:

Cameo Edinburgh, 12 noon, Tuesday 2 December 2014

If 34 people sign up by Sunday 23 November, then the showing will go ahead.

by Philip Wadler ([email protected]) at November 19, 2014 11:40 AM

Danny Gratzer

Functors and Recursion

Posted on November 19, 2014
Tags: haskell

One of the common pieces of folklore in the functional programming community is how one can cleanly formulate recursive types with category theory. Indeed, using a few simple notions we can build a coherent enough explanation to derive some concrete benefits.

In this post I’ll outline how one thinks of recursive types and then we’ll discuss some of the practical ramifications of such thoughts.

Precursor

I’m assuming the reader is familiar with some basic notions from category theory. Specifically familiarity with the definitions of categories and functors.

Let’s talk about endofunctors, which are functors whose domain and codomain are the same. Spoiler: these are the ones we care about in Haskell. An interesting notion that comes from endofunctors is that of algebras. An algebra in this sense is a pair of an object C and a map F C → C. Here F is called the “signature” and C is called the carrier.

If you’re curious about where these funny terms come from: in abstract algebra we deal with algebras which are comprised of a set of distinguished elements, functions, and axioms, called the signature. From there we look at sets (called carriers) which satisfy the specification. We can actually cleverly rearrange the specification for something like a group into an endofunctor! It’s out of scope for this post, but interesting if algebras are your thing.

Now we can in fact define a category of F-algebras. In such a category an object is an algebra α : F A → A and each arrow is a triplet:

  • A normal arrow f : A → B
  • An F-algebra α : F A → A
  • Another F-algebra β : F B → B

So that f ∘ α = β ∘ F f. In picture form

         F f
F A ———————————————–→ F B
 |                    |
 |                    |
 | α                  | β
 ↓                    ↓
 A —————————————————→ B
           f

commutes. I generally elide the fact that we’re dealing with triplets and instead focus on the arrow, since that’s the interesting bit.

Now that we’ve established F-algebras, we need just one more concept: the notion of initial objects. An initial object is an object I in a category such that for any object C there is a map

          f
 I - - - - - - - - → C

So that f is unique.

Now what we’re interested in investigating is the initial object in the category of F-algebras. That’d mean that

            α
F I ————————————————–→ I
 |                     |
 |                     |
 | F λ                 | λ
 ↓                     ↓
F C —————————————————→ C

Commutes only for a unique λ.

A List is just an Initial Object in the Category of F-Algebras.

What’s the problem?

Now, remembering that we’re actually trying to understand recursive types, how can we fit the two together? We can think of recursive types as solutions to certain equations. In fact, our types are what are called the least fixed point solutions. Let’s say we’re looking at IntList. We can imagine it defined as

    data IntList = Cons Int IntList | Nil

We can in fact, factor out the recursive call in Cons and get

    data IntList a = Cons Int a | Nil
                   deriving Functor

Now we can represent a list of length 3 as something like

    type ThreeList = IntList (IntList (IntList Void))

Which is all well and good, but we really want arbitrary-length lists. We want a solution to the equation

X = IntList X

We can view such a type as a set {EmptyList, OneList, TwoList, ThreeList ... }. Now how can we actually go about saying this? Well we need to take a fixed point of the equation! This is easy enough in Haskell since Haskell’s type system is unsound.

    -- Somewhere, somehow, a domain theorist is crying.
    data FixedPoint f = Fix {unfix :: f (FixedPoint f)}

Now we can regain our normal representation of lists with

    type List = FixedPoint IntList

To see how this works

    out :: FixedPoint IntList -> [Int]
    out (Fix f) = case fmap out f of
                    Nil -> []
                    Cons a b -> a : b

    -- `in` is a Haskell keyword, so we call the inverse `into`
    into :: [Int] -> FixedPoint IntList
    into []       = Fix Nil
    into (x : xs) = Fix (Cons x (into xs))

Now this transformation is interesting for one reason in particular, IntList is a functor. Because of this, we can formulate an F-algebra for IntList.

    type ListAlg a = IntList a -> a

Now we consider what the initial object in this category would be. It’d be something I so that we have a function

    cata :: ListAlg a -> (I -> a) -- Remember that I -> a is an arrow in F-Alg
    cata :: (IntList a -> a) -> I -> a
    cata :: (Either () (a, Int) -> a) -> I -> a
    cata :: (() -> a) -> ((a, Int) -> a) -> I -> a
    cata :: a -> (Int -> a -> a) -> I -> a
    cata :: (Int -> a -> a) -> a -> I -> a

Now that looks sort of familiar, what’s the type of foldr again?

    foldr :: (a -> b -> b) -> b -> [a] -> b
    foldr :: (Int -> a -> a) -> a -> [Int] -> a

So the arrow we get from the initiality of I is precisely the same as foldr! This leads us to believe that maybe the initial object for F-algebras in Haskell is just the least fixed point, just as [Int] is the least fixed point for IntList.

To confirm this, let’s generalize a few of our definitions from before

    type Alg f a = f a -> a
    data Fix f = Fix {unfix :: f (Fix f)}

    type Init f = Alg f (Fix f)

    cata :: Functor f => Alg f a -> Fix f -> a
    cata f = f . fmap (cata f) . unfix

Exercise: draw out the reduction tree for cata on lists.
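
To have something concrete to reduce, here is a small catamorphism over the functorfied IntList from earlier (my own example, not from the post): summing a list by collapsing one layer at a time.

    sumAlg :: Alg IntList Int    -- i.e. IntList Int -> Int
    sumAlg Nil          = 0
    sumAlg (Cons x acc) = x + acc

    total :: Fix IntList -> Int
    total = cata sumAlg

    -- total (Fix (Cons 1 (Fix (Cons 2 (Fix Nil))))) evaluates to 1 + (2 + 0) = 3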

Our suspicion is confirmed: the fixed point of a functor is indeed the initial object. Furthermore, we can easily show that initial objects are unique up to isomorphism (exercise!), so anything that can implement cata is isomorphic to the original, recursive definition we were interested in.

When The Dust Settles

Now that we’ve gone and determined a potentially interesting fact about recursive types, how can we use this knowledge? Well let’s start with a few things, first is that we can define a truly generic fold function now:

    fold :: Functor f => (f a -> a) -> Fix f -> a

This delegates all the messy details of handling the “shape” of the container we’re folding across to the collapsing function f a -> a.

While this may seem like a small accomplishment, it does mean that we can build off it to create data type generic programs that can be fitted into our existing world.

For example, what about mutual recursion? Fold captures the notion of recurring across one list in a rather slick way; however, recurring over two in lockstep involves a call to zip and other fun and games. How can we capture this with cata?

We’d imagine that the folding functions for such a scenario would have the types

    f (a, b) -> a
    f (a, b) -> b

From here we can build

    muto :: Functor f => (f (a, b) -> a) -> (f (a, b) -> b) -> Fix f -> (a, b)
    muto f g = cata ((,) <$> f <*> g)
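
As a usage sketch (again my own example, using the IntList functor from earlier), the two algebras can read each other's running results through the pair, here computing a sum and a length in one pass:

    sumAlg' :: IntList (Int, Int) -> Int
    sumAlg' Nil             = 0
    sumAlg' (Cons x (s, _)) = x + s

    lenAlg' :: IntList (Int, Int) -> Int
    lenAlg' Nil             = 0
    lenAlg' (Cons _ (_, n)) = 1 + n

    -- sumAndLength (Fix (Cons 1 (Fix (Cons 2 (Fix Nil))))) == (3, 2)
    sumAndLength :: Fix IntList -> (Int, Int)
    sumAndLength = muto sumAlg' lenAlg'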

Similarly we can build up oodles of combinators for dealing with folding all built on top of cata!

That unfortunately sounds like a lot of work! Thanks to Hackage, though, we can shamelessly freeload off the hard work of others. In particular, the package recursion-schemes has built up a nice little library for dealing with initial algebras. There’s only one big twist between what we’ve laid out and what it does.

One of the bigger stumbling blocks for our library was changing the nice recursive definition of a type into the functorfied version. Really it’s not realistic to write all your types this way. To help simplify the process recursion-schemes provides a type family called Base which takes a type and returns its functorfied version. We can imagine something like

    data instance Base [a] b = Cons a b | Nil

This simplifies the process of actually using all these combinators we’re building. To use recursion-schemes, all you need to do is define such an instance and write project :: t -> Base t t. After that it’s all kittens and recursion.

Wrap Up

So dear reader, where are we left? We’ve got a new interesting formulation of recursive types that yields some interesting results and power. There’s one interesting chunk we’ve neglected though: what does unfolding look like?

It turns out there’s a good story for this as well: unfolding is the operation (anamorphism) given by the terminal object in the category of F-coalgebras. A terminal object is the precise dual of an initial one. You can notice this all in recursion-schemes, which features ana as well as cata.


November 19, 2014 12:00 AM

November 18, 2014

The GHC Team

GHC Weekly News 2014/11/18

Hello *,

Once more we have the GHC Weekly news! This one is a bit late due to Austin being in limbo unexpectedly for a few days last week. (The next one will of course come again on Friday to keep things straight.)

With that out of the way, let's see what exactly is going on:

  • The STABLE freeze is happening at the end of this week! That means if you have something you want to get in, try to get people aware of it! Austin (yours truly) has a backed up review queue it would seem, but hopes to clear a bunch of it out before then.
  • Herbert Valerio Riedel has finally landed integer-gmp2, AKA Phab:D86, which implements a complete overhaul of the integer-gmp library. This library will be switched on by default in GHC 7.10.1, which means the integer-gmp library version will have a super-major bump (version 1.0.0.0). This is the beginning of a longer-term vision for more flexible Integer support in GHC, as described by Herbert on the design page: https://ghc.haskell.org/trac/ghc/wiki/Design/IntegerGmp2 This implementation also fixes a long-standing pain point where GHC would hook GMP allocations to live on the GHC heap. Now GMP is just called like any other FFI library.
  • Jan Stolarek made a heads up to help out GHC newcomers: if you see a ticket that should be easy, please tag it with the newcomer keyword! This will let us have a live search of bugs that new developers can take over. (Incidentally, Joachim mentions this is the same thing Debian is doing in their bug tracker): https://www.haskell.org/pipermail/ghc-devs/2014-November/007313.html
  • Adam Gundry, Eric Seidel, and Iavor Diatchki have grouped together to get a new, unexpected feature into 7.10: type checking plugins. Now, GHC will be able to load a regular Haskell package as a plugin during the compilation process. Iavor has a work-in-progress plugin that solves constraints for type-level natural numbers using a SMT solver. The code review from everyone was published in Phab:D489.

Closed tickets this week include: #9785, #9053, #9513, #9073, #9077, #9683, #9662, #9646, #9787, #8672, #9791, #9781, #9621, #9594, #9066, #9527, #8100, #9064, #9204, #9788, #9794, #9608, #9442, #9428, #9763, #9664, #8750, #9796, #9341, #9330, #9323, #9322, #9749, #7381, #8701, #9286, #9802, #9800, #9302, #9174, #9171, #9141, #9100, #9134, #8798, #8756, #8716, #8688, #8680, #8664, #8647, #9804, #8620, #9801, #8559, #8559, #8545, #8528, #8544, #8558

by thoughtpolice at November 18, 2014 05:35 PM

Darcs

Darcs News #107

News and discussions

  1. Darcs has received two grants from the Google Summer of Code program, as part of the umbrella organization Haskell.org. Alejandro Gadea will work on history reordering:
  2. Marcio Diaz will work on the cache system:
  3. Repository cloning to remote ssh hosts has been present for years as darcs put. This feature has now a more efficient implementation:

Issues resolved (11)

issue851 Dan Frumin
issue1066 Guillaume Hoffmann
issue1268 Guillaume Hoffmann
issue1416 Ale Gadea
issue1987 Marcio Diaz
issue2263 Ale Gadea
issue2345 Dan Frumin
issue2357 Dan Frumin
issue2365 Guillaume Hoffmann
issue2367 Guillaume Hoffmann
issue2379 Guillaume Hoffmann

Patches applied (41)

See darcs wiki entry for details.

by guillaume ([email protected]) at November 18, 2014 03:45 AM

Darcs News #109

News and discussions

  1. We are in the feature freeze period of darcs 2.10:
  2. Our two Summer of Code projects ended two months ago. Marcio and Ale's code will be part of the upcoming new stable version of darcs. In case you missed them, here are the latest posts of Marcio for his project:
  3. Ale's posts:

Issues resolved (7)

issue1514 Guillaume Hoffmann
issue1624 Marcio Diaz
issue2153 Andreas Brandt
issue2249 Mateusz Lenik
issue2380 Owen Stephens
issue2403 Ganesh Sittampalam
issue2409 Ganesh Sittampalam

Patches applied (118)

See darcs wiki entry for details.

by guillaume ([email protected]) at November 18, 2014 03:45 AM

November 16, 2014

Magnus Therning

Regular Haskelling. How?

Ever since ICFP 2014 I’ve had as a goal to get into the habit of coding in Haskell. It’s been the language I enjoy most for a few years now, but being surrounded by and talking to so many brilliant developers as I did during that week really drove home that I will only have more fun the more code I write. My goal was not very ambitious; just write something in Haskell most days every week. So far I’ve managed to keep it up.

These are a few tricks I’ve used and they’ve worked well for me so far.

Just write, no matter what, just write

In ninth grade a rather successful Swedish author visited my school and what I remember most from that is one thing he said:

Just read! It doesn’t matter what. It doesn’t matter if what you read isn’t considered good literature; read Harlequin books if that’s what you like, read magazines, read comics. Just read!

I think the same holds for writing code; it’s only with practice that one gets comfortable expressing oneself in a particular language.

Fix warts

I can’t actually think of any piece of code I’ve written that doesn’t have some warts. It may be in the form of missing features, or quirks (bugs) in the implementation that force the user to regularly work in a less-than-optimal way. I’ve found fixing warts in tools and libraries I use myself to be one of the most rewarding tasks to take on; the feedback is so immediate that every fix causes a boost in motivation to fix the next one.

Exercise sites

Sometimes it’s simply difficult to find the motivation to tackle working on an existing project, and inspiration for starting something new might be lacking too. This happens to me regularly; I used to simply close the lid of the computer when it did, but now I try to find some exercises to do instead.

There are several sources of exercises. I know Project Euler is rather popular among new Haskellers, but there are others.

  • CodeEval is a site with problems in three different levels. It may be extra interesting for people in the US since some of the problems are sponsored by companies which seem to use the site as a place for recruiting. So far I’ve only seen American companies do that, but I suppose it might catch on in other parts of the world too. Haskell is one of several languages supported.
  • Exercism is both a site and a tool. The goal is to facilitate learning of languages. On first use the tool will download the first exercise, and after completion one uses it to upload the solution to the site. Once uploaded the solution is visible to other users, and they are allowed to “nitpick” (comment). After uploading a solution to one exercise the next exercise in the series becomes available. It supports a rather long list of programming languages.

I like both of these, but I’ve spent more time on the latter one. Personally I find the idea behind Exercism very appealing and I’ve been recommending it to a couple of co-workers already.

Feel free to put links to other sources of exercises in the comments.

Simplify old code

With more practice comes more and more insights into what functions are available and how to string them together. When I don’t even feel like doing a full exercise on Exercism I just dig out something that smells a little and clean it up. Anything is fair game, no matter how tiny. Just take a look at my implementation of reThrowError.

What else?

I’d love to hear tips and tricks from other people who aren’t lucky enough to have a day job where they get to write Haskell. How do you keep up the learning and practice?

November 16, 2014 12:00 AM

November 15, 2014

LambdaCube

Playing around with distance field font rendering

While waiting for LambdaCube to reach a stable state I started thinking about displaying text with it. I wanted to build a system that doesn’t limit the user to a pre-determined set of characters, so all the static atlas based methods were out of the question. From experience I also know that a dynamic atlas can fill very quickly as the same characters need to be rendered to it at all the different sizes we need to display them in. But I also wanted to see nice sharp edges, so simply zooming the texture was out of the question.

What’s out there?

Short of converting the characters to triangle meshes and rendering them directly, the most popular solution to this problem is to represent them as signed distance fields (SDF). The most commonly cited source for this approach is a well-known white paper from Valve. Unfortunately, the problem with SDF based rendering is that it generally cannot preserve the sharpness of corners beyond the resolution of the texture:

Rounded corners reconstructed from a low-res SDF

The above letter was rendered at 64 pixels per em size. Displaying SDF shapes is very simple: just sample the texture with bilinear interpolation and apply a step function. By choosing the right step function adapted to the scale, it’s possible to get cheap anti-aliasing. Even simple alpha testing can do the trick if that’s not needed, in which case rendering can be done without shaders. As an added bonus, by choosing different step functions we can render outlines, cheap blurs, and various other effects. In practice, the distance field does an excellent job when it comes to curves. When zoomed in, the piecewise bilinear nature of the curves becomes apparent – it actually tends to look piecewise linear –, but at that point we could just swap in the real geometry if the application needs higher fidelity.
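
As a rough sketch of that shader logic in Haskell (the sampleBilinear lookup and the 0.5 iso-level are assumptions rather than Lafonten’s actual code):

-- Turn a sampled distance value into pixel coverage.
-- sampleBilinear is the bilinearly interpolated texture lookup, assumed given.
coverage :: (Float -> Float -> Float) -> Float -> Float -> Float -> Float
coverage sampleBilinear aaWidth u v =
  smoothstep (0.5 - aaWidth) (0.5 + aaWidth) (sampleBilinear u v)

-- The usual smoothstep ramp; picking the window width relative to the output
-- scale is what gives the cheap anti-aliasing mentioned above.
smoothstep :: Float -> Float -> Float -> Float
smoothstep lo hi x = t * t * (3 - 2 * t)
  where t = min 1 (max 0 ((x - lo) / (hi - lo)))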

Of course, all of this is irrelevant if we cannot afford to lose the sharp corners. There are some existing approaches to solve this problem, but only two come to mind as serious attempts: Loop and Blinn’s method (low-poly mesh with extra information on the curved parts) and GLyphy (SDF represented as a combination of arc spline approximations). Both of these solutions are a bit too heavy for my purposes, and they also happen to be patented, which makes them not so desirable to integrate in one’s own system.

The Valve paper also points towards a possible solution: corners can be reconstructed from a two-channel distance field, one for each incident edge. They claimed not to pursue that direction because they ‘like the rounded style of this text’ (yeah, right!). Interestingly, I haven’t been able to find any follow-up on this remark, so I set out to create a solution along these lines. The end result is part of the LambdaCube Font Engine, or Lafonten for short.

Doubling the channels

Valve went for the brute-force option when generating the distance fields: take a high-resolution binary image of the shape and for each pixel measure the distance to the nearest pixel of the opposite colour. I couldn’t use this method as a starting point, because in order to properly reconstruct the shapes of letters I really needed to work with the geometry. Thankfully, there’s a handy library for processing TrueType fonts called FontyFruity that can extract the Bézier control points for the character outlines (note: at the time of this writing, the version required by Lafonten is not available on Hackage yet).

The obvious first step is to take the outline of the character, and slice it up along the corners that need to be kept sharp. These curve sections need to be extruded separately and they need to alternate between the two channels. If there’s an odd number of sections, one segment – preferably the longest one to minimise artifacts – can gradually transition from one channel to the other:

Extruding the outline sections on separate channels

The extrusion is performed in both directions, so the real outline of the character is the median line of the extruded sections. Afterwards, the internal part needs to be filled somehow. It turns out that simply using the maximum value in both channels yields good results:

The inner area is filled with the maximum value

The shape used for filling is created from the inner edges of the extruded curve sections. The convex corners are derived as intersections between the consecutive curves, while the concave corners need to be covered more aggressively to make sure there are no dark holes within the contour (not to mention that those curves don’t necessarily intersect if the angle is sharp enough).

Another transformation that turned out to be useful to avoid some undesired interaction between unrelated curves is an extra cutting step applied at obtuse convex angles. For instance, the bottom left part of the 3 above actually looks like this:

Cutting off the unneeded inner parts

It might be the case that this cutting step is redundant, as it was originally introduced while I was experimenting with a different, more brittle channel selection scheme.

Unfortunately, this two-channel distance field doesn’t contain enough information to reconstruct the original shape. The problem is that depending on whether a corner is convex or concave we need to combine the fields with a different function to get an approximation. Convex corners are defined as the minimum of the two values, while concave ones are the maximum. When we render the above image on a low-res texture and sample it linearly, we cannot really determine which case we’re looking at without additional data.

Quadrupling the channels

After some experimentation, I came to the conclusion that I needed two additional channels to serve as masks. The final formula to calculate the approximate distance is the following:

max(max(d1 * m1, d2 * m2), min(d1, d2)).

The values of d1 and d2 come from the channels shown above, while m1 and m2 depend on what colour section we are in. If we’re in a section approximated by d1, then m1 = 1 and m2 = 0. For d2, it’s the opposite. The values of m1 and m2 can be obtained by thresholding the two extra channels.
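
Transcribed into Haskell (the function and argument names here are mine, just for illustration):

-- Approximate distance reconstructed from the two distance channels (d1, d2)
-- and the two thresholded weight channels (m1, m2), each expected to be 0 or 1.
reconstruct :: Float -> Float -> Float -> Float -> Float
reconstruct d1 d2 m1 m2 = max (max (d1 * m1) (d2 * m2)) (min d1 d2)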

The new channels can be thought of as weights, and they explicitly mark which distance channel is relevant in a given position. This is what they look like for the same character:

The contents of the weight channels

The trickiest part is around the corners: we need to make sure that the weights kick in before the distance from the other channel disappears, but not too soon, otherwise the current channel would interfere with the other side of the corner. There’s a lot of hand tuning and magic constants involved, because I couldn’t find the time yet to derive a proper formula in a principled manner, but it handles most situations well enough already. It’s also obvious that the weights are erroneously low in the concave corners, but somehow this bug never seems to lead to visible errors so I haven’t fixed it yet.

One way to visualise the interaction of the four channels is thresholding them and adding the corresponding weights and distances:

Visualising the reconstruction of the distance field

The left hand side shows the result of thresholding and adding the channels. The right hand side is the same image, but the green channel is filled with the result of evaluating the reconstruction formula. This visualisation also shows how delicate the balance is around convex curvatures when generating the distance fields with this approach. Fortunately it does the trick!

One last step towards happiness

Unfortunately, this is not the end of the story. When the above geometry is rendered at a small resolution, we get some new artifacts partly from rasterisation, partly from linear interpolation in certain cases. The first problem is caused by the fact that the corner vertices of the fill geometry are not shared with the outlines, just derived with a formula, so there’s no guarantee that the rasteriser will fill all the pixels. The second issue can happen when inner edges of the opposite channels touch. When this happens, interpolation is performed between opposite values that both represent inner areas, but their average is so low that it’s interpreted as an outer area:

Artifact resulting from linear interpolation

As it turns out, both issues can be resolved in a single post-processing pass that detects these specific situations and patches up the texture accordingly. All-maximum pixels are spread to adjacent all-minimum pixels, and directly facing opposite channels are also unified with the max function.

The bottom line

So was the final outcome worth the trouble? I believe so. The resolution necessary to correctly represent different characters highly depends on the font. Fonts with small details (e.g. small serifs) obviously require a finer resolution to reproduce with high fidelity. For instance, Times New Roman needs about 128 pixels per em, which is probably too much for most purposes. On the other hand, Droid Sans is fine with just 64 pixels per em, and Droid Serif with 72. As an extreme example, Ubuntu Regular renders nicely from just 40 pixels per em. In any case, the improvement over the simple distance field of the same resolution is quite spectacular:

Comparison between renders from the simple and composite distance fields

In the end, I achieved the original goal. The characters are added to the atlas dynamically on demand, and they can be rendered at arbitrary sizes without getting blurry or blocky in appearance. At the moment the baking step is not very optimised, so it’s not possible to generate many characters without introducing frame drops. But it’s fast enough to be used on a loading screen, for instance.

What next?

I’m surprised how well this method works in practice despite the ad hoc approach I took in generating the distance field. However, I’m missing the robustness of the simple distance field, which doesn’t break down in surprising ways no matter what the input shape is. One alternative I’d like to explore involves a very different way of filling the channels. Instead of rendering the geometry directly as described above, I’m considering the use of diffusion curves or generalised Voronoi diagrams. Since the resolution needed to reproduce individual characters is not terribly high, it could be okay to do the baking on the CPU with a dumb brute-force algorithm at first. This would also make it possible to generate characters in a background thread, which would be ideal for games.

by cobbpg at November 15, 2014 07:32 AM

Edward Z. Yang

Tomatoes are a subtype of vegetables

/img/vegetables/st1.png

Subtyping is one of those concepts that seems to make sense when you first learn it (“Sure, convertibles are a subtype of vehicles, because all convertibles are vehicles but not all vehicles are convertibles”) but can quickly become confusing when function types are thrown into the mix. For example, if a is a subtype of b, is (a -> r) -> r a subtype of (b -> r) -> r? (If you know the answer to this question, this blog post is not for you!) When we asked our students this question, invariably some were led astray. True, you can mechanically work it out using the rules, but what’s the intuition?

Maybe this example will help. Let a be tomatoes, and b be vegetables. a is a subtype of b if we can use an a in any context where we were expecting a b: since tomatoes are (culinary) vegetables, tomatoes are a subtype of vegetables.

What about a -> r? Let r be soup: then we can think of Tomato -> Soup as recipes for tomato soup (taking tomatoes and turning them into soup) and Vegetable -> Soup as recipes for vegetable soup (taking vegetables—any kind of vegetable—and turning them into soup). As a simplifying assumption, let's say that all we care about in the result is that it’s soup, not what type of soup it is.

/img/vegetables/recipes.png

What is the subtype relationship between these two types of recipes? A vegetable soup recipe is more flexible: you can use it as a recipe to make soup from tomatoes, since tomatoes are just vegetables. But you can’t use a tomato soup recipe on an eggplant. Thus, vegetable soup recipes are a subtype of tomato soup recipes.

/img/vegetables/st2.png
/img/vegetables/st2a.png

This brings us to the final type: (a -> r) -> r. What is (Vegetable -> Soup) -> Soup? Well, imagine the following situation...


One night, Bob calls you up on the phone. He says, “Hey, I’ve got some vegetables left in the fridge, and I know your Dad was a genius when it came to inventing recipes. Do you know if he had a good soup recipe?”

/img/vegetables/phone.png

“I don’t know...” you say slowly, “What kind of vegetables?”

“Oh, it’s just vegetables. Look, I’ll pay you back with some soup, just come over with the recipe!” You hear a click on the receiver.

You pore over your Dad’s cookbook and find a tomato soup recipe. Argh! You can’t bring this recipe, because Bob might not actually have tomatoes. As if on cue, the phone rings again. Alice is on the line: “The beef casserole recipe was lovely; I’ve got some tomatoes and was thinking of making some soup with them, do you have a recipe for that too?” Apparently, this happens to you a lot.

“In fact I do!” you turn back to your cookbook, but to your astonishment, you can’t find your tomato soup recipe any more. But you do find a vegetable soup recipe. “Will a vegetable soup recipe work?”

“Sure—I’m not a botanist: to me, tomatoes are vegetables too. Thanks a lot!”

You feel relieved too, because you now have a recipe for Bob as well.


Bob is a person who takes vegetable soup recipes and turns them into soup: he’s (Vegetable -> Soup) -> Soup. Alice, on the other hand, is a person who takes tomato soup recipes and turns them into soup: she’s (Tomato -> Soup) -> Soup. You could give Alice either a tomato soup recipe or a vegetable soup recipe, since you knew she had tomatoes, but Bob’s vague description of the ingredients he had on hand meant you could only bring a recipe that worked on all vegetables. Callers like Alice are easier to accommodate: (Tomato -> Soup) -> Soup is a subtype of (Vegetable -> Soup) -> Soup.

/img/vegetables/st3.png

In practice, it is probably faster to formally reason out the subtyping relationship than it is to intuit it out; however, hopefully this scenario has painted a picture of why the rules look the way they do.
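
Haskell has no built-in subtyping, but the same reasoning can be spelled out by writing the coercions as ordinary functions; everything below is made up purely for illustration:

data Tomato = Tomato
data Vegetable = Vegetable
data Soup = Soup

-- Tomato "is a subtype of" Vegetable: a tomato can always be viewed as a vegetable.
asVegetable :: Tomato -> Vegetable
asVegetable _ = Vegetable

-- Contravariance: a vegetable soup recipe can serve as a tomato soup recipe.
recipeCoerce :: (Vegetable -> Soup) -> (Tomato -> Soup)
recipeCoerce vegRecipe = vegRecipe . asVegetable

-- Flipping again: someone like Alice, who only needs a tomato soup recipe,
-- can be handed to anyone expecting a consumer of vegetable soup recipes.
callerCoerce :: ((Tomato -> Soup) -> Soup) -> ((Vegetable -> Soup) -> Soup)
callerCoerce alice vegRecipe = alice (recipeCoerce vegRecipe)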

by Edward Z. Yang at November 15, 2014 02:00 AM

November 14, 2014

Robin KAY

HsQML 0.3.2.0 released: Enters the Third Dimension

Last night I released HsQML 0.3.2.0, the latest edition of my Haskell binding to the Qt Quick GUI library. As usual, it's available for download from Hackage.

HsQML allows you to bind declarative user interfaces written in QML against a Haskell back-end, but sometimes you can't just let QML hog all the graphical fun to itself. This latest release allows you to incorporate 3D (OpenGL) graphics rendered from Haskell into your QML scenes using the new Canvas module.

The screenshot below shows off the OpenGL demo in the samples package. The colourful triangle is rendered using the regular Haskell Platform's OpenGL bindings, but HsQML sets up the environment so that it renders into a special HaskellCanvas element inside the QML scene. If you run the actual program you can see it being animated too, moving around and changing colour.


This release also adds the Objects.Weak module which allows you to hold weak references to QML objects and keep track of their life cycles using finalisers. The new FactoryPool abstraction uses these primitives to help you efficiently keep track of instances you've created, especially for when you need to fire change signals on them for data-binding.

London Haskell User Group

I've been fortunate enough to get a speaking slot at the London Haskell User Group and will be giving a talk on Building Pragmatic User Interfaces in Haskell with HsQML on the 26th of November. Please feel free to come along and watch. You can RSVP on the group's meet-up page.

The talk should be videoed and materials will be available online afterwards.

release-0.3.2.0 - 2014.11.13

* Added OpenGL canvas support.
* Added weak references and object finalisers.
* Added FactoryPool abstraction.
* Added To-only custom marshallers.
* Added Ignored type.
* Relaxed Cabal dependency constraint on 'text'.

by Robin KAY ([email protected]) at November 14, 2014 10:04 AM

November 13, 2014

Neil Mitchell

Operators on Hackage

Summary: I wrote a script to list all operators on Hackage, and which packages they are used by.

In GHC 7.10 the *> operator will be moving into the Prelude, which means the Shake library will have to find an alternative operator (discussion on the mailing list). In order to pick a sensible operator, I wanted to list all operators in all Hackage packages so I could be aware of clashes.

Note that exported operators is more than just those defined by the package, e.g. Shake exports the Eq class, so == is counted as being exported by Shake. However, in most cases, operators exported by a package are defined by that package.

Producing the file

First I downloaded the Hoogle databases from Hackage, and extracted them to a directory named hoogle. I then ran:

ghc --make Operators.hs && operators hoogle operators.txt

And uploaded operators.txt above. The code for Operators.hs is:

import Control.Exception.Extra
import Control.Monad
import Data.List.Extra
import System.Directory.Extra
import System.Environment
import System.FilePath
import System.IO.Extra

main = do
    -- where to find the extracted Hoogle files, and where to write the result
    [dir,out] <- getArgs
    files <- listFilesRecursive dir
    xs <- forM files $ \file -> do
        -- try UTF-8 first, then the default encoding, then give up on the file
        src <- readFileUTF8' file `catch_` \_ -> readFile' file `catch_` \_ -> return ""
        return [("(" ++ takeWhile (/= ')') x ++ ")", takeBaseName file) | '(':x <- lines src]
    writeFileUTF8 out $ unlines [unwords $ a : nub b | (a,b) <- groupSort $ concat xs]

This code relies on the normal packages distributed with GHC, plus the extra package.

Code explanation

The script is pretty simple. I first get two arguments, which is where to find the extracted files, and where to write the result. I then use listFilesRecursive to recursively find all extracted files, and forM to loop over them. For each file I read it in (trying first UTF8, then normal encoding, then giving up). For each line I look for ( as the first character, and form a list of [(operator-name, package)].

After producing the list, I use groupSort to produce [(operator-name, [package])] then writeFileUTF8 to produce the output. Running the script takes just over a minute on my ancient computer.

Writing the code

Writing the code to produce the operator list took about 15 minutes, and I made some notes as I was going.

  • I started by loading up ghcid for the file with the command line ghcid -t -c "ghci Operators.hs". Now every save immediately resulted in a list of warnings/errors, and I never bothered opening the file in ghci, I just compiled it to test.
  • I started by inserting take 20 files so I could debug the script faster and could manually check the output was plausible.
  • At first I wrote takeBaseName src rather than takeBaseName file. That produced a lot of totally incorrect output, woops.
  • At first I used readFile to suck in the data and putStr to print it to the console. That corrupted Unicode operators, so I switched to readFileUTF8' and writeFileUTF8.
  • After switching to writeFileUTF8 I found a rather serious bug in the extra library, which I fixed and added tests for, then made a new release.
  • After trying to search through the results, I added ( and ) around each operator to make it easier to search for the operators I cared about.

User Exercise

To calculate the stats of most exported operator and package with most operators I wrote two lines of code - how would you write such code? Hint: both my lines involved maximumBy.
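
One possible answer, for readers who want to check theirs (a sketch only; other formulations work just as well), assuming ops :: [(String, [String])] holds the operator/package pairs built above:

import Data.List (maximumBy)
import Data.List.Extra (groupSort)
import Data.Ord (comparing)

-- The operator exported by the most packages.
mostExported :: [(String, [String])] -> (String, [String])
mostExported ops = maximumBy (comparing (length . snd)) ops

-- The package exporting the most operators, after inverting the mapping.
mostOperators :: [(String, [String])] -> (String, [String])
mostOperators ops = maximumBy (comparing (length . snd)) (groupSort [(pkg, op) | (op, pkgs) <- ops, pkg <- pkgs])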

by Neil Mitchell ([email protected]) at November 13, 2014 09:32 PM

FP Complete

Seeking a Haskell developer for high performance, distributed computing

You know Haskell is an outstanding language for parallel and concurrent programming, and you know how fast Haskell code can be. FP Complete, the leading Haskell company, is working on a powerful, scalable medical application. We need a senior Haskell developer to work on high performance computing and related projects.

You’ll create a general purpose set of libraries and tools for performing big distributed computations that run fast and are robust, and general-purpose tools to monitor such programs at runtime and both profile and debug them. Using your work we’ll perform very large volumes of computation per CPU, across multiple cores and across multiple servers.

This is a telecommute position which can be filled from anywhere, with some preference given to applicants in North America and especially the San Francisco and San Diego areas. You should have these skills:

  • overall strong Haskell application coding ability
  • experience building reusable components/packages/libraries,
  • experience writing high-throughput computations in math, science, finance, engineering, big data, graphics, or other performance-intensive domains,
  • a solid understanding of distributed computing,
  • an understanding of how to achieve low-latency network communication,
  • knowledge of how to profile Haskell applications effectively,
  • work sharing or distributed computing algorithms,
  • multicore/parallel application development.

These further skills are a plus:

  • experience building scalable server/Web applications,
  • experience tuning GHC’s runtime options for high performance,
  • some experience working on GHC internals,
  • knowledge of Cloud Haskell,
  • knowledge of Threadscope or similar profiling tools,
  • an understanding of the implementation of GHC’s multithreaded runtime,
  • experience as a technical lead and/or a manager,
  • experience as an architect and/or a creator of technical specs.

In addition to these position-specific skills, candidates are expected to have clear written and verbal communication in English, the ability to work well within a distributed team and experience with typical project tools (Git or similar revision control system, issue tracker, etc).

Please submit your application to [email protected]. A real member of our engineering team will read it. In addition to a resume/CV, we very much like to see any open source work that you can point to in the relevant domains, or comparable source code you can show us.

November 13, 2014 12:00 AM

November 12, 2014

Yesod Web Framework

The case for curation: Stackage and the PVP

A number of months back there was a long series of discussions around the Package Versioning Policy (PVP), and in particular the policy of putting in preemptive upper bounds (that is to say, placing upper bounds before it is proven that they are necessary). Eventually, the conversation died down, and I left some points unsaid in the interests of letting that conversation die. Now that Neil Mitchell kicked up the dust again, I may as well get a few ideas out there.

tl;dr Stackage is simply better tooling, and we should be using better tooling instead of arguing policy. The PVP upper bounds advocates are arguing for a world without sin, and such a world doesn't exist. Get started with Stackage now.

This blog post will be a bit unusual. Since I'm so used to seeing questions, criticisms, and misinformation on this topic, I'm going to interject commonly stated memes throughout this blog post and answer them directly. Hopefully this doesn't cause too much confusion.

As most people reading this are probably aware, I manage the Stackage project. I have a firm belief that the PVP discussions we've been having are, essentially, meaningless for the general use case, and simply improving our tool chain is the right answer. Stackage is one crucial component of that improvement.

But Stackage doesn't really help end users, it's nothing more than a CI system for Hackage. The initial Stackage work may have counted as that, but Stackage was never intended to just be behind the scenes. Stackage server provides a very user-friendly solution for solving packaging problems. While I hope to continue improving the ecosystem- together with the Haskell Platform and Cabal maintainers- Stackage server is already a huge leap forward for most users today. (See also: GPS Haskell.)

The PVP is composed of multiple ideas. I'd like to break it into:

  1. A method for versioning packages based on API changes.
  2. How lower bounds should be set on dependencies.
  3. How upper bounds should be set on dependencies.

Just about everyone I've spoken to agrees with the PVP on (1) and (2), the only question comes up with point (3). The arguments go like this: preemptive upper bounds add a lot of maintainer overhead by requiring them to upload new versions of packages to relax version bounds regularly. (This is somewhat mitigated by the new cabal file editing feature of Hackage, but that has its own problems.) On the other hand, to quote some people on Reddit:

I'd rather make a release that relaxes bounds rather than have EVERY previous version suddenly become unusable for folks

that upper bounds should not be viewed as handcuffs, but rather as useful information about the range of dependencies that is known to work. This information makes the solver's job easier. If you don't provide them, your packages are guaranteed to break as t -> ∞.

These statements are simply false. I can guarantee you with absolute certainty that, regardless of the presence of upper bounds, I will be able to continue to build software written against yesod 1.4 (or any other library/version I'm using today) indefinitely. I may have to use the same compiler version and fiddle with shared libraries a bit if I update my OS. But this notion that packages magically break is simply false.

But I have some code that built two months ago, and I opened it today and it doesn't work! I didn't say that the standard Haskell toolchain supports this correctly. I'm saying that the absence of upper bounds doesn't guarantee that a problem will exist.

Without dancing around the issue any further, let me cut to the heart of the problem: our toolchain makes it the job of every end user to find a consistent build plan. Finding such a build plan is inherently a hard problem, so why are we pushing the work downstream? Furthermore, it's terrible practice for working with teams. The entire team should be working in the same package environment, not each working on "whatever cabal-install decided it should try to build today."

There's a well known, time-tested solution to this problem: curation. It's simple: we have a central person/team/organization that figures out consistent sets of packages, and then provides them to downstream users. Downstream users then never have to deal with battling against large sets of dependencies.

But isn't curation a difficult, time-consuming process? How can the Haskell community support that? Firstly, that's not really an important question, since the curation is already happening. Even if it took a full-time staff of 10 people working around the clock, if the work is already done, it's done. In practice, now that the Stackage infrastructure is in place, curation probably averages out to 2 hours of my time a week, unless Edward Kmett decides to release a new version of one of his packages.

This constant arguing around PVP upper bounds truly baffles me, because every discussion I've seen of it seems to completely disregard the fact that there's an improved toolchain around for which all of the PVP upper bound arguments are simply null and void. And let me clarify that statement: I'm not saying Stackage answers the PVP upper bound question. I'm saying that- for the vast majority of users- Stackage makes the answer to the question irrelevant. If you are using Stackage, it makes not one bit of difference to you whether a package has upper bounds or not.

And for the record, Stackage isn't the only solution to the problem that makes the PVP upper bound question irrelevant. Having cabal-install automatically determine upper bounds based on upload dates is entirely possible. I in fact already implemented such a system, and sent it for review to two of the staunchest PVP-upper-bounds advocates I interact with. I didn't actually receive any concrete feedback.

So that brings me back to my point: why are we constantly arguing about this issue which clearly has good arguments on both sides, when we could instead just upgrade our tooling and do away with the problem?

But surely upper bounds do affect some users, right? For one, it affects the people doing curation itself (that's me). I can tell you without any doubt that PVP upper bounds makes my life more difficult during curation. I've figured out ways to work around it, so I don't feel like trying to convince people to change their opinions. It also affects people who aren't using Stackage or some other improved tooling. And my question to those people is: why not?

I'd like to close by addressing the idea that the PVP is "a solution." Obviously that's a vague statement, because we have to define "the problem." So I'll define the problem as: someone types cabal install foo and it doesn't install. Let me count the ways that PVP upper bounds fail to completely solve this problem:

  1. The newest release of foo may have a bug in it, and cabal has no way of knowing it.
  2. One of the dependencies of foo may have a bug in it, and for whatever reason cabal chooses that version.
  3. foo doesn't include PVP upper bounds and a new version of a dependency breaks it. (See comment below if you don't like this point.)
  4. Some of the dependencies of foo don't include PVP upper bounds, and a new version of the transitive dependencies break things.
  5. There's a semantic change in a point release which causes tests to fail. (You do test your environments before using them, right? Because Stackage does.)
  6. I've been really good and included PVP upper bounds, and only depended on packages that include PVP upper bounds. But I slipped up and had a mistake in a cabal file once. Now cabal chooses an invalid build plan.
  7. All of the truly legitimate reasons why the build may fail: no version of the package was ever buildable, no version of the package was ever compatible with your version of GHC or OS, it requires something installed system wide that you don't have, etc.

That's not fair, point (3) says that the policy doesn't help if you don't follow it, that's a catch 22! Nope, that's exactly my point. A policy on its own does not enforce anything. A tooling solution can enforce invariants. Claiming that the PVP will simply solve dependency problems is built around the idea of universal compliance, lack of mistakes, historical compliance, and the PVP itself covering all possible build issues. None of these claims hold up in the real world.

To go comically over the top: assuming the PVP will solve dependency problems is hoping to live in a world without sin. We must accept the PVP into our hearts. If we have a build problem, we must have faith that it is because we did not trust the PVP truly enough. The sin is not with the cabal dependency solver, it is with ourselves. If we ever strayed from the path of the PVP, we must repent of our evil ways, and return unto the PVP, for the PVP is good. I'm a religious man. My religion just happens to not be the PVP.

I'm not claiming that Stackage solves every single reason why a build fails. The points under (7), for example, are not addressed. However, many of the common problems people face - and, I'd argue, the vast majority of issues that confuse and plague users - are addressed by simply moving over to Stackage.

If you haven't already, I highly recommend you give Stackage a try today.

November 12, 2014 11:38 AM

Installing application dependencies using Stackage, sandboxes, and freezing

Installing Haskell packages is still a pain. But I believe the community has some good enough workarounds that puts Haskell on par with a lot of other programming languages. The problem is mostly that the tools and techniques are newer, do not always integrate easily, and are still lacking some automation.

My strategy for successful installation:

  • Install through Stackage
  • Use a sandbox when you start having complexities
  • freeze (application) dependencies

Simple definitions:

  • Stackage is Stable Hackage: a curated list of packages that are guaranteed to work together
  • A sandbox is a project-local package installation
  • Freezing is specifying exact dependency versions.

I really hope that Stackage (and sandboxes to a certain extent) are temporary workarounds before we have an amazing installation system such as backpack. But right now, I think this is the best general-purpose solution we have. There are other tools that you can use if you are not on Windows:

  • hsenv (instead of sandboxes)
  • nix (instead of Stackage and sandboxes)

hsenv has been a great tool that I have used in the past, but I personally don't think that sandboxing at the shell level with hsenv is the best choice architecturally. I don't want to have a sandbox name on my command line to remind me that it is working correctly, I just want cabal to handle sandboxes automatically.

Using Stackage

See the Stackage documentation. You just need to change the remote-repo setting in your ~/.cabal/config file.
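
The change is a single line of roughly the following form; the snapshot name and URL are placeholders here, so copy the exact line given in the Stackage documentation:

remote-repo: stackage:http://www.stackage.org/stackage/<snapshot-id>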

Stackage is a curated list of packages that are guaranteed to work together. Stackage solves dependency hell with exclusive and inclusive package snapshots, but it cannot be used on every project.

Stackage offers 2 package lists: exclusive, and inclusive. Exclusive includes only packages vetted by Stackage. Exclusive will always work, even for global installations. This has the nice effect of speeding up installation and keeping your disk usage low, whereas if you default to using sandboxes and you are making minor fixes to libraries you can end up with huge disk usage. However, you may eventually need packages not on Stackage, at which point you will need to use the inclusive snapshot. At some point you will be dealing with conflicts between projects, and then you definitely need to start using sandboxes. The biggest problem with Stackage is that you may need a newer version of a package than what is on the exclusive list. At that point you definitely need to stop using Stackage and start using a sandbox.

If you think a project has complex dependencies, which probably includes most applications in a team work setting, you will probably want to start with a sandbox.

Sandboxes

cabal sandbox init

A sandbox is a project-local package installation. It solves the problem of installation conflicts with other projects (either actively overwriting each other or passively causing install problems). However, the biggest problem with sandboxes is that unlike Stackage exclusive, you still have no guarantee that cabal will be able to figure out how to install your dependencies.

Sandboxes are mostly orthogonal to Stackage. If you can use Stackage exclusive, you should, and if you never did a cabal update, you would have no need for a sandbox with Stackage exclusive. When I am making minor library patches, I try to just use my global package database with Stackage to avoid bloating disk usage from redundant installs.

So even with Stackage we are going to end up wanting to create sandboxes. But we would still like to use Stackage in our sandbox: this will give us the highest probability of a successful install. Unfortunately, Stackage (remote-repo) integration does not work for a sandbox.

The good news is that there is a patch for Cabal that has already been merged (but not yet released). Even better news is that you can use Stackage with a sandbox today! Cabal recognizes a cabal.config file which specifies a list of constraints that must be met, and we can set that to use Stackage.

cabal sandbox init
curl http://www.stackage.org/alias/fpcomplete/unstable-ghc78-exclusive/cabal.config > cabal.config
cabal install --only-dep

Freezing

There is a problem with our wonderful setup: what happens when our package is installed in another location? If we are developing a library, we need to figure out how to make it work everywhere, so this is not as much of an issue.

Application builders on the other hand need to produce reliable, re-producible builds to guarantee correct application behavior. Haskellers have attempted to do this in the .cabal file by pegging versions. But .cabal file versioning is meant for library authors to specify maximum version ranges that a library author hopes will work with their package. Pegging packages to specific versions in a .cabal file will eventually fail because there are dependencies of dependencies that are not listed in the .cabal file and thus not pegged. The previous section's usage of a cabal.config has a similar issue since only packages from Stackage are pegged, but Hackage packages are not.

The solution to this is to freeze your dependencies:

cabal freeze

This writes out a new cabal.config (overwriting any existing cabal.config). Checking in this cabal.config file guarantees that everyone on your team will be able to reproduce the exact same build of Haskell dependencies. That gets us into upgrade issues that will be discussed.
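
The generated cabal.config is just a constraints block pinning every package in the install plan to an exact version, along the lines of (version numbers invented for illustration):

constraints: base ==4.7.0.1,
             text ==1.1.1.3,
             yesod ==1.4.1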

It is also worth noting that there is still a rare situation in which freezing won't work properly because packages can be edited on Hackage.

Installation workflow

Let's go over an installation workflow:

cabal sandbox init
curl http://www.stackage.org/alias/fpcomplete/unstable-ghc78-exclusive/cabal.config > cabal.config
cabal install --only-dep

An application developer will then want to freeze their dependencies.

cabal freeze
git add cabal.config
git commit cabal.config

Upgrading packages

cabal-install should provide us with a cabal upgrade [PACKAGE-VERSION] command. That would perform an upgrade of the package to the version specified, but also perform a conservative upgrade of any transitive dependencies of that package. Unfortunately, we have to do upgrades manually.

One option for upgrading is to just wipe out your cabal.config and do a fresh re-install.

rm cabal.config
rm -r .cabal-sandbox
cabal sandbox init
curl http://www.stackage.org/alias/fpcomplete/unstable-ghc78-exclusive/cabal.config > cabal.config
cabal update
cabal install --only-dep
cabal freeze

With this approach all your dependencies can change, so you need to re-test your entire application. To make this more efficient, you are probably going to want to upgrade more dependencies than you originally had in mind, to avoid repeating this process a week from now.

The other extreme is to become the solver. Manually tinker with the cabal.config until you figure out the upgrade plan that cabal install --only-dep will accept. In between, you can attempt to leverage the fact that cabal already tries to perform conservative upgrades once you have packages installed.

rm cabal.config
curl http://www.stackage.org/alias/fpcomplete/unstable-ghc78-exclusive/cabal.config > cabal.config
cabal update
cabal install --only-dep --force-reinstalls
cabal freeze

You can make a first attempt without the --force-reinstalls flag, but the flag is likely to be necessary.

If you can no longer use Stackage because you need newer versions of the exclusive packages, then your workflow will be the same as above without the curl step. But you will have a greater desire to manually tinker with the cabal.config file. This process usually consists mostly of deleting constraints or changing them to be a lower bound.

Conclusion

Upgrading packages is still a horrible experience.

However, for a fresh install, using Stackage, sandboxes, and freezing works amazingly well. Of course, once you are unable to use Stackage because you need different exclusive versions you will encounter installation troubles. But if you originally started based off of Stackage and try to perform conservative upgrades, you may still find your situation easier to navigate because you have already greatly reduced the search space for cabal. And if you are freezing versions and checking in the cabal.config, the great thing is that you can experiment with installing new dependencies but can always revert back to the last known working dependencies.

Using these techniques I am able to get cabal to reliably install complex dependency trees with very few issues and to get consistent application builds.

November 12, 2014 11:38 AM

November 11, 2014

Neil Mitchell

Upper bounds or not?

Summary: I thought through the issue of upper bounds on Haskell package dependencies, and it turns out I don't agree with anyone :-)

There is currently a debate about whether Haskell packages should have upper bounds for their dependencies or not. Concretely, given mypackage and dependency-1.0.2, should I write dependency >= 1 (no upper bounds) or dependency >= 1 && < 1.1 (PVP/Package versioning policy upper bounds). I came to the conclusion that the bounds should be dependency >= 1, but that Hackage should automatically add an upper bound of dependency <= 1.0.2.
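
In .cabal file terms, the two choices look roughly like this (the field is shown in isolation, with the placeholder package name from above):

-- no upper bound
build-depends: dependency >= 1

-- PVP-style preemptive upper bound
build-depends: dependency >= 1 && < 1.1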

Rock vs Hard Place

The reason the debate has continued so long is because both choices are unpleasant:

  • Don't add upper bounds, and have packages break for your users because they are no longer compatible.
  • Add PVP upper bounds, and have reasonable install plans rejected and users needlessly downgraded to old versions of packages. If one package requires a minimum version of above n, and another requires a maximum below n, they can't be combined. The PVP allows adding new functions, so even if all your dependencies follow the PVP, the code might still fail to compile.
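
To make that last point concrete, here is a hedged sketch of the failure mode: the module below compiles against an older Data.List, but as soon as the library itself starts exporting sortOn (a PVP-compliant minor bump), the use site fails with an ambiguous-occurrence error.

import Data.List
import Data.Ord (comparing)

-- A local helper that a later minor release of the library also starts exporting.
sortOn :: Ord b => (a -> b) -> [a] -> [a]
sortOn f = map snd . sortBy (comparing fst) . map (\x -> (f x, x))

-- This use site is where the "Ambiguous occurrence 'sortOn'" error appears.
shortestFirst :: [[a]] -> [[a]]
shortestFirst = sortOn length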

I believe there are two relevant factors in choosing which scheme to follow.

Factor 1: How long will it take to update the .cabal file

Let us assume that the .cabal file can be updated in minutes. If there are excessively restrictive bounds for a few minutes it doesn't matter - the code will be out of date, but only by a few minutes, and it is unlikely that another package will require the very latest version in that window.

As the .cabal file takes longer to update, the problems with restrictive bounds become worse. For abandoned projects, the restrictive upper bounds make them unusable. For actively maintained projects with many dependencies, bounds bumps can be required weekly, and a two week vacation can break actively maintained code.

Factor 2: How likely is the dependency upgrade to break

If upgrading a dependency breaks the package, then upper bounds are a good idea. In general it is impossible to predict whether a dependency upgrade will break a package or not, but my experience is that most packages usually work fine. For some projects, there are stated compatibility ranges, e.g. Snap declares that any API will be supported for two 0.1 releases. For other projects, some dependencies are so tightly coupled that every 0.1 increment will almost certainly fail to compile, e.g. the HLint dependency on haskell-src-exts.

The fact that these two variable factors are used to arrive at a binary decision is likely the reason the Haskell community has yet to reach a conclusion.

My Answer

My current preference is to normally omit upper bounds. I do that because:

  • For projects I use heavily, e.g. haskell-src-exts, I have fairly regular communication with the maintainers, so am not surprised by releases.
  • For most projects I depend on only a fraction of the API, e.g. wai, and most changes are irrelevant to me.
  • Michael Snoyman and the excellent Stackage alert me to broken upgrades quickly, so I can respond when things go wrong.
  • I maintain quite a few projects, and the administrative overhead of uploading new versions, testing, waiting for continuous-integration results etc would cut down on real coding time. (While the Hackage facility to edit the metadata would be quicker, I think that tweaking fundamentals of my package, but skipping the revision control and continuous integration, seems misguided.)
  • The PVP is a heuristic, but usually the upper bound is too tight, and occasionally the upper bound is too loose. Relying on the PVP to provide bounds is no silver bullet.

On the negative side, occasionally my packages no longer compile for my users (very rarely, and for short periods of time, but it has happened). Of course, I don't like that at all, so do include upper bounds for things like haskell-src-exts.

The Right Answer

I want my packages to use versions of dependencies such that:

  • All the features I require are present.
  • There are no future changes that stop my code from compiling or passing its test suite.

I can achieve the first objective by specifying a lower bound, which I do. There is no way to predict the future, so no way I can restrict the upper bound perfectly in advance. The right answer must involve:

  • On every dependency upgrade, Hackage (or some agent of Hackage) must try to compile and test my package. Many Haskell packages are already tested using Travis CI, so reusing those tests seems a good way to gauge success.
  • If the compile and tests pass, then the bounds can be increased to the version just tested.
  • If the compile or tests fail, then the bounds must be tightened to exclude the new version, and the author needs to be informed.

With this infrastructure, the time a dependency is too tight is small, and the chance of breakage is unknown, meaning that Hackage packages should have exact upper bounds - much tighter than PVP upper bounds.

Caveats: I am unsure whether such regularly changing metadata should be incorporated into the .cabal file or not. I realise the above setup requires quite a lot of Hackage infrastructure, but will buy anyone who sorts it out some beer.

by Neil Mitchell ([email protected]) at November 11, 2014 09:11 PM

Luke Plant

You can’t compare language features, only languages

A lot of programming language debate is of the form “feature X is really good, every language needs it”, or “feature X is much better than its opposite feature Y”. The classic example is static vs dynamic typing, but there are many others, such as different types of meta-programming etc.

I often find myself pulled in both directions by these debates, as I’m rather partial to both Haskell and Python. But I’d like to suggest that doing this kind of comparison in the abstract, without talking about specific languages, is misguided, for the following reasons:

Language features can take extremely different forms in different languages

In my experience, static typing in Haskell is almost entirely unlike static typing in C, and different again from C# 1.0, and, from what I can tell, very different from static typing in C# 5.0. Does it really make sense to lump all these together?

Similarly, dynamic typing in Shell script, PHP, Python and Lisp are perhaps more different than they are alike. You can’t even put them on a spectrum — for example, Python is not simply a ‘tighter’ type system than PHP (in not treating strings as numbers etc.), because it also has features that allow far greater flexibility and power (such as dynamic subclassing due to first class classes).

Combination of features is what matters

One of my favourite features of Python, for example, is keyword arguments. They often increase the clarity of calling code, and give functions the ability to grow new features in a backwards compatible way. However, this feature only makes sense in combination with other features. If you had keyword arguments but without the **kwargs syntax for passing and receiving an unknown set of keyword arguments, it would make decorators extremely difficult.

If you are thinking of how great Python is, I don’t think it helps to talk about keyword arguments in general as a killer feature. It is keyword arguments in Python that work particularly well.

Comparing language features opens up lots of opportunities for bad arguments

For example:

Attacking the worst implementation

So, a dynamic typing advocate might say that static typing means lots of repetitive and verbose boilerplate to indicate types. That criticism might apply to Java, but it doesn’t apply to Haskell and many other modern languages, where type inference handles 95% of the times where you might need to specify types.

Defending the best implementation

The corollary to the above fallacy is that if you are only debating language features in the abstract, you can pick whichever implementation you want in order to refute a claim. Someone claims that dynamic typing makes IDE support for refactoring very difficult, and a dynamic typing advocate retorts that this isn’t the case with Smalltalk — ignoring the fact that they don’t use Smalltalk, they have never used Smalltalk, and their dynamically-typed language of choice does indeed present much greater or even insurmountable problems to automated refactoring.

Defending a hypothetical implementation

Defending the best implementation goes further when you actually defend one that doesn’t exist yet.

The mythical “smart enough compiler” is an example of this, and another would be dynamic typing advocates might talk about “improving” dynamic analysis.

Hypothetical implementations are always great for winning arguments, especially as they can combine all the best features of all the languages, without worrying about whether those features will actually fit together, and produce something that people would actually want to use. Sometimes a hybrid turns out like Hercules, and sometimes like the Africanized bee.

Ignoring everything else

In choosing a programming language, it’s not only the features of the language that you have to consider — there is a long list of other factors, such as the maturity of the language, the community, the libraries, the documentation, the tooling, the availability (and quality) of programmers etc.

Sometimes the quality of these things are dominated by accidents of history (which language became popular and when), and sometimes they can be traced back to features of the language design.

Many language-war debates ignore all these things. But it’s even easier if you are not actually comparing real languages — just language features, abstracted from everything else.

I understand that comparing everything at once is difficult, and we will always attempt to break things down into smaller pieces for analysis. But I doubt that this goes very far with programming languages, because of the way the different features interact with each other, and also exert huge influence on the way that everything else develops e.g. libraries.

Conclusion

Language features exist within the context of a language and everything surrounding that language. It seems to me that attempts to analyse them outside that context simply lead to false generalisations.

Of course, being really concrete and talking about specific languages often ends up even more personal, which has its own pitfalls! Is there a good way forward?

by Luke Plant at November 11, 2014 09:51 AM

Oliver Charles

Self-Memoizing HTML Rendering via Mutually Recursive Data Types

> {-# LANGUAGE TypeFamilies #-}
> {-# LANGUAGE TypeOperators #-}
> import Debug.Trace
> import Data.Monoid
> import Data.HashMap.Strict
> import Data.MemoTrie

This post demonstrates a cute little construct I’ve been playing with recently, which allows building HTML trees that use memoization to cache the result of their rendering function. By storing memoization at each node, we can then mutate the tree, but subsequent renders only do as little work as necessary.

We start out with the deep embedding of a HTML tree. This is exactly as you might expect; a constructor for text and a constructor for elements, with their tag name and a list of children.

> data HTML' = Text String | Element String [HTML]

However, notice that the list of children is not of type HTML', as one might expect. Instead, we bounce over to the next data type:

> data HTML = HTML (HTML' :->: String) HTML'

Here we wrap up the HTML' values with a trie that tabulates the rendering function. We’re using Conal’s excellent MemoTrie library. Notice how HTML' has values of type HTML, but HTML itself refers back to HTML' - this is an example of mutually recursive data types.

What we have so far is a HTML type that “knows how to render itself”, but also carries the deep-embedding that was used to construct it. Furthermore, each subtree knows how to render itself, so we can modify the tree but only pay to render what we changed. We’ll see an example of this shortly.

In order to work with the MemoTrie library, we need to define instances of HasTrie. As scary as these definitions look, they mostly come from using the basic instances and then following the types.

> instance HasTrie HTML' where
>   data HTML' :->: x = HTML'Trie (String :->: x) ((String, [HTML]) :->: x)
>   trie f = HTML'Trie (trie (f . Text)) (trie (f . uncurry Element))
>   untrie (HTML'Trie f _) (Text t) = untrie f t
>   untrie (HTML'Trie _ g) (Element a c) = untrie g (a, c)
> 
> instance HasTrie HTML where
>   data HTML :->: x = HTMLTrie (HTML' :->: x)
>   trie f = HTMLTrie (trie (f . HTML (trie render')))
>   untrie (HTMLTrie f) (HTML _ x) = untrie f x

The data types we defined above are private, so we export smart constructors. The application of a smart constructor ensures we pair up the rendering function with the correct data.

> text :: String -> HTML
> text t = embed (Text t)
> element :: String -> [HTML] -> HTML
> element el children = embed (Element el children)
> embed :: HTML' -> HTML
> embed = HTML (trie render')

All that is left is to define how to render our deep embedding of a HTML tree. This can be done naively with some trusty pattern matching and recursion. We could, of course, also call out to another library like xmlhtml.

> render' :: HTML' -> String
> render' (Text t) = trace "** RENDERING TEXT **" t
> render' (Element el children) =
>   trace "** RENDERING ELEMENT **" $
>   "<" ++ el ++ ">" ++ concatMap render children ++ "</" ++ el ++ ">"

There are a few trace calls here so we can see what goes on as we try and evaluate these values. Also, note that render' itself is somewhat shallow - it doesn’t call itself in the recursive position, but instead it calls render:

> render :: HTML -> String
> render (HTML f x) = untrie f x

render builds the rendering function from the trie at hand, which gets us back to render'. However, this function is also memoized, so only initial calls will enter the function. We export render but keep render' private.

Examples

Let’s start by building a simple document:

> document :: HTML
> document = element "div"
>              [ element "h1" [ text "Hello!" ]
>              , element "p" [ text "Bonjour tout le monde" ]
>              ]

When we try and render this, we’ll be forced to do some work:

... render document

"** RENDERING ELEMENT **
<div>** RENDERING ELEMENT **
<h1>** RENDERING TEXT **
Hello!</h1>** RENDERING ELEMENT **
<p>** RENDERING TEXT **
Bonjour tout le monde</p></div>"

However, if we try and render the same document again (in the same environment), something interesting happens…

... render document
"<div><h1>Hello!</h1><p>Bonjour tout le monde</p></div>"

No tracing! The lack of any tracing calls indicates that we never actually did any rendering - instead we returned the memoized value we computed prior. This starts to become really useful when we have functions to modify a document. For example, this combinator appends a child element to a document:

> (<+/>) :: HTML -> HTML -> HTML
> html@(HTML f (Text _)) <+/> _ = html
> (HTML f (Element x children)) <+/> y = HTML f (Element x (children ++ [y]))

Taking our original document, we can expand it with another paragraph:

> document' :: HTML
> document' =
>   document <+/> (
>    element "p" [text "That's an approximation of 'Hello, world!' in French"])

Again, we can render this document:

... render document'
"** RENDERING ELEMENT **
<div><h1>Hello!</h1><p>Bonjour tout le monde</p>** RENDERING ELEMENT **
<p>** RENDERING TEXT **
That's an approximation of 'Hello, world!' in French</p></div>"

This time we do see some tracing, but only on the elements that have actually changed - in this case the new paragraph, and the container div itself. A final call to render document' does no work and gives us back the rendering:

... render document'
"<div>
<h1>Hello!</h1><p>Bonjour tout le monde</p>
<p>That's an approximation of 'Hello, world!' in French</p></div>"

(Line breaks added for clarity)

Concluding Thoughts

I’m pretty blown away by how elegantly Haskell is able to capture such an idea. This work arose as I’m currently exploring some ideas using GHCJS and reactive-banana, so I really do have documents that change over time and need to be fast. While reactive-banana has a network that means I already do minimal recomputation, it’s still easy to pay too much. reactive-banana is based around behaviors, which can be combined using Applicative, such that if one changes so does the result. However, beyond this basic dependency information, reactive-banana has no way of knowing what’s “inside” the function.

One thing that I don’t yet understand, though, is how this plays out with memory usage. For example, if I’m caching from my root element, this presumably means every single view that makes it to the browser stays in memory… forever. That seems pretty bad! One easy fix is to annotate my rendering with explicit “don’t cache this” points, but every time I find a model that requires the programmer to annotate it for performance, my heart sinks.

I’m sure this work has been discovered before me, so if anyone has any thoughts or pointers to related work, I’m all ears.

November 11, 2014 12:00 AM

November 10, 2014

Philip Wadler

Save our Universities

An open letter to my MSPs. Support and suggestions for how to carry
this forward as a campaign are solicited.

Dear Gavin Brown, Sarah Boyack, Alison Johnstone, Kezia Dugdale,
Cameron Buchanan and Neil Findlay,

I write to ask your help to preserve a secure future for Scottish
Universities.

Academics for all UK universities, including those in Scotland, have
faced falling wages and falling pensions over many years. "The real
wages of academics have fallen by 13% since 2008, one of the largest
sustained wage cuts any profession has suffered since the Second World
War." So wrote Will Hutton in the Guardian, October 2013 [1].

In 2011, Universities UK imposed vastly reduced pensions on new
hires. Old hires who pay into the pension fund for forty years receive
a pension of one-half their final salary; new hires who do the same
receive a pension of one-half their average salary. Basing pensions on
average rather than final salary may be sensible, but to do so with no
adjustment in multiplier suggests employers are using this as an
excuse to slip in a large cut; it means new hires receive about 2/3
the benefits received by old hires. All staff also suffered other cuts
to pensions: additional caps and less good adjustment for inflation.
At the time, it was predicted that within a few years old hires would
be moved to the inferior scheme for new hires, and that is what has
now come to pass. [2]

Universities UK argue that the reductions are necessary to avoid a
deficit, but their claim has been widely criticised. Notably, a group
of prominent statisticians point out Universities UK inflated the
deficit by assuming a buoyant economy when predicting future salaries
but assuming a recession when predicting investment returns. [3]

A strong university system is one of the jewels in the crown of the
UK, and particularly for Scotland. That excellence is a huge driver
of innovation and growth. If Scotland reduces its investment in
universities, it won't be long before we feel that loss throughout the
economy. [4,5]

Scotland has a University system second to none, and to keep it strong
we need pay and pensions that attract and retain the best minds
throughout the world. We must have a system that is fair to both: old
hires must retain attractive conditions; new hires must have the bad
deal imposed on them in 2011 rolled back. Speaking as an old hire, I'd
settle for a small cut in pension if it meant bringing new hires onto
the same level: we must keep the system strong for the future.

The UCU has gone on strike over the issue (suspension of assessment,
which it hopes will minimise disruption for students). But UCU is
unlikely to succeed without political support.

I write to ask you, as my representative in the Scottish parliament,
will you direct the Scottish Funding Council to make fair treatment
for academics in Scottish Universities, both new hires and old, a top
priority?

Thank you for your consideration. Yours,

-- Philip Wadler
Professor of Theoretical Computer Science
School of Informatics
University of Edinburgh


[1] http://www.theguardian.com/commentisfree/2013/oct/13/england-leave-funding-universities-students
[2] http://www.theguardian.com/education/2010/nov/23/oxbridge-pensions-academics-protest-united
[3] http://www.theguardian.com/higher-education-network/blog/2014/oct/29/marking-boycott-why-are-academics-protesting-about-pensions
[4] http://www.universities-scotland.ac.uk/index.php?mact=News,cntnt01,detail,0&cntnt01articleid=156&cntnt01returnid=23
[5] Royal Society of Edinburgh, Enlightening the Constitutional Debate, Science and Higher Education, p177--198,
    http://www.royalsoced.org.uk/cms/files/events/reports/2013-2014/The%20Book.pdf

by Philip Wadler ([email protected]) at November 10, 2014 02:42 PM

November 07, 2014

The GHC Team

GHC Weekly News - 2014/11/07

Hello *,

It's that time again, so get ready for some good ol' fashioned news about your favorite compiler.

And this week's closed tickets include quite a long list, thanks to everyone cleaning up the bug tracker: #9747, #9236, #9753, #9752, #9262, #8953, #9084, #9738, #8571, #8295, #8261, #9754, #110, #9345, #8849, #8819, #9658, #8960, #9395, #9705, #9433, #9633, #9359, #9081, #8482, #3376, #9712, #9739, #9211, #9728, #9750, #9768, #9773, #9741, #9284, #9774, #9771, #9001, #8626, #8986, #9268, #8975, #8962, #8921, #8089, #8843, #8829, #9295, #7913, #2528, #9779.

by thoughtpolice at November 07, 2014 09:55 PM

Bill Atkins

NSNotificationCenter, Swift and blocks

The conventional way to register observers with NSNotificationCenter is to use the target-action pattern. While this gets the job done, it's inherently not type-safe.

For example, the following Swift snippet will compile perfectly:

    NSNotificationCenter.defaultCenter().addObserver(self, selector: Selector("itemAdded:"),
      name: MyNotificationItemAdded, object: nil)

even though at runtime it will fail unless self has a method named itemAdded that takes exactly one parameter (leaving off that last colon in the selector will turn this line into a no-op). Plus, this method gives you no way to take advantage of Swift's closures, which would allow the observer to access local variables in the method that adds the observer and would eliminate the need to create a dedicated method to handle the event.

A better way to do this is to use blocks. And NSNotificationCenter does include a block-based API:

    NSNotificationCenter.defaultCenter().addObserverForName(MyNotificationItemAdded, object: nil, queue: nil) { note in
      // ...
    }

This is much nicer, especially with Swift's trailing closure syntax. There are no method names to be looked up at runtime, we can refer to local variables in the method that registered the observer and we can perform small bits of logic in reaction to events without having to create and name dedicated methods.

The catch comes in resource management. It's very important that an object remove its event observers when it's deallocated, or else NSNotificationCenter will try to invoke methods on invalid pointers.

The traditional target-action method has the one advantage that we can easily handle this requirement with a single call in deinit:

  deinit {
    NSNotificationCenter.defaultCenter().removeObserver(self)
  }

With the block API, however, since there is no explicit target object, each call to addObserverForName returns "an opaque object to act as observer." So your observer class would need to track all of these objects and then remove them all from the notification center in deinit, which is a pain.

In fact, the hassle of having to do bookkeeping on the observer objects almost cancels out the convenience of using the block API. Frustrated by this situation, I sat down and created a simple helper class, NotificationManager:

class NotificationManager {
  private var observerTokens: [AnyObject] = []

  deinit {
    deregisterAll()
  }

  func deregisterAll() {
    for token in observerTokens {
      NSNotificationCenter.defaultCenter().removeObserver(token)
    }

    observerTokens = []
  }

  func registerObserver(name: String!, block: (NSNotification! -> Void)) {
    let newToken = NSNotificationCenter.defaultCenter().addObserverForName(name, object: nil, queue: nil) {note in
      block(note)
    }

    observerTokens.append(newToken)
  }
  
  func registerObserver(name: String!, forObject object: AnyObject!, block: (NSNotification! -> Void)) {
    let newToken = NSNotificationCenter.defaultCenter().addObserverForName(name, object: object, queue: nil) {note in
      block(note)
    }
    
    observerTokens.append(newToken)
  }
}

First, this simple class provides a Swift-specialized API around NSNotificationCenter.  It provides an additional convenience method without an object parameter (rarely used, in my experience) to make it easier to use trailing-closure syntax. But most importantly, it keeps track of the observer objects generated when observers are registered, and removes them when the object is deinit'd.

A client of this class can simply keep a member variable of type NotificationManager and use it to register its observers. When the parent class is deallocated, the deinit method will automatically be called on its NotificationManager member variable, and its observers will be properly disposed of:

class MyController: UIViewController {
  private let notificationManager = NotificationManager()
  
  override init() {
    notificationManager.registerObserver(MyNotificationItemAdded) { note in
      println("item added!")
    }
    
    super.init()
  }
  
  required init(coder: NSCoder) {
    fatalError("decoding not implemented")
  }
}

When the MyController instance is deallocated, its NotificationManager member variable will be automatically deallocated, triggering the call to deregisterAll that will remove the dead objects from NSNotificationCenter.

In my apps, I add a notificationManager instance to my common UIViewController base class so I don't have to explicitly declare the member variable in all of my controller subclasses.

Another benefit of using my own wrapper around NSNotificationCenter is that I can add useful functionality, like group observers: an observer that's triggered when any one of a group of notifications is posted:

struct NotificationGroup {
  let entries: [String]
  
  init(_ newEntries: String...) {
    entries = newEntries
  }

}

extension NotificationManager {
  func registerGroupObserver(group: NotificationGroup, block: (NSNotification! -> Void)) {
    for name in group.entries {
      registerObserver(name, block: block)
    }
  }
}

This can be a great way to easily set up an event handler to run when, for example, an item is changed in any way at all:

   let MyNotificationItemsChanged = NotificationGroup(
      MyNotificationItemAdded,
      MyNotificationItemDeleted,
      MyNotificationItemMoved,
      MyNotificationItemEdited
    )

    notificationManager.registerGroupObserver(MyNotificationItemsChanged) { note in
      // ...
    }

by More Indirection ([email protected]) at November 07, 2014 05:52 PM

November 06, 2014

Leon P Smith

A Brief History of Timekeeping, Part 1: Dates on the Western Calendars

Time is simple and well understood, right? Well, in a physical sense, it’s fairly well understood, but not at all simple. Cutting edge research into atomic clock technology promises an uncertainty of about a second over an interval of several billion years. This is plenty accurate enough to expose the minuscule relativistic effects of simply repositioning the clock. In fairly short order, differences of fractions of a picosecond can be detected, caused by the fact that time passes at imperceptibly different speeds depending on motion and the small changes in the local gravity field.

But the topic of these posts is the political vagaries of time, not the physical. The basic concept of time (modulo relativity) is something intuitively understood by almost all human beings, and human cultures have been accurately counting the days for thousands of years. So if understanding fractions of a picosecond is difficult, at least dates should be easy, right?

Unfortunately, not so much. Although the Western Calendar has become the de facto and de jure standard over most the world, mapping a Western date to a precise day has some complicated and even ambiguous edge cases.

Julius Caesar introduced the Julian Calendar to the Roman Republic around January 1, 45 BC, in part to help resolve some political problems created by the ancient Roman Calendar. A bit over a year later, on March 15, 44 BC, Julius Caesar was assassinated, motivated in part by some of the political problems the older calendar contributed to.

One might ask: exactly how many days have passed from Caesar's assassination until the time you are reading this? This question can certainly be answered, but arriving at the answer is a bit more subtle than most people realize.

The Julian Calendar was essentially version 1.0 of the Western Calendar, and its initial deployment had a few teething problems. There was a bit of confusion surrounding what it meant to have a leap day "every fourth year", and as a result leap days occurred every third year until the error was noticed and corrected by Augustus Caesar and his advisers nearly 40 years later. All leap years were suspended for 12 years, to be resumed at the proper interval once the error was corrected.

We don’t know exactly when those earliest leap years occurred. It does seem as though there is a single probable answer in the case of Caesar’s assassination, as both of the unfalsified candidate solutions for the early leap years lead to the same answer. However, there are many dates over the subsequent decades that could easily refer to one of two days. In any case, sometime around 1 BC to 4 AD, the Julian calendar stabilized, and for nearly 1600 years afterwards, there is an unambiguous mapping of historical dates to days.

However, the average length of a Julian year was longer than the length of the solar year. Over the next 1500 years the dates on the Julian Calendar drifted noticeably out of sync with the solar year, with the spring equinox (as determined by actual astronomical observations) coming some 10 days too early.

In February 1582, Pope Gregory XIII instituted the Gregorian Calendar. Leap years would come every fourth year, as before, but now leap years would be skipped every 100 years, except every 400 years. Also, the new calendar would skip 10 days in order to bring it back in sync with the solar year, with 1582-10-04 followed by 1582-10-15.
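
Spelled out in code, the new rule looks like this (a small illustration of my own; the Haskell time package exports an equivalent isLeapYear in Data.Time.Calendar):

``` haskell
-- The Gregorian rule described above: a leap year every fourth year,
-- except every 100 years, except again every 400 years.
isGregorianLeapYear :: Integer -> Bool
isGregorianLeapYear y =
  (y `mod` 4 == 0 && y `mod` 100 /= 0) || y `mod` 400 == 0
```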

And, if that was the entire story, it wouldn’t be so bad, but unfortunately this reform ushered in several hundred years of ambiguity and confusion due to its uneven adoption. This reform was immediately adopted by the Catholic Church, the Papal States and many Catholic nations, with other Catholic nations moving to the Gregorian Calendar soon afterwards. However, the Gregorian Calendar was resisted by most Protestant and Orthodox nations. For example, Great Britain and her colonies held out until 1752 [1], and Russia and Greece held out until 1918 and 1923 respectively, shortly after armed revolutions forced a change of government in those countries.

Then you have the case of the Swedish Calendar, which is notable for its particularly convoluted, mishandled, and then aborted transition from the Julian to the Gregorian calendars, which resulted in a February 30, 1712. Also, in a few locales, including Switzerland and the Czech Republic, some people were using the Julian calendar at the same time others were using the Gregorian calendar.

Thus, accurately identifying a date after 1582 coming from original sources, or any historical date coming from modern sources, can be complicated or even impossible, depending on the precise context (or lack thereof) surrounding the date and its source.

Most dates in modern history books remain as they were originally observed, but there are a few exceptions. For example, the date of George Washington’s birth is observed as 1732-02-22 (Gregorian), even though the American colonies were using the Julian calendar at the time. On the day that Washington was born, the date was actually 1732-02-11 (Julian).

As encouraged by the ISO 8601:2004 standard, computer software often uses the proleptic Gregorian calendar for historical dates, extending the Gregorian calendar backwards in time before its introduction and using it for all date calculations. This includes PostgreSQL’s built-in time types. Haskell’s time package supports both the proleptic Julian and Gregorian calendars, though it defaults to the Gregorian Calendar.
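
As a quick sketch of that library support (using the time package's Data.Time.Calendar and Data.Time.Calendar.Julian modules; the Washington dates come from the previous paragraph):

``` haskell
import Data.Time.Calendar        (fromGregorian, diffDays)
import Data.Time.Calendar.Julian (fromJulian)

main :: IO ()
main = do
  -- Both spellings of Washington's birthday name the same Day value.
  print (fromGregorian 1732 2 22 == fromJulian 1732 2 11)             -- True
  -- Writing "February 22, 1732" in both calendars gives days that are
  -- 11 apart, the Julian/Gregorian offset during the 18th century.
  print (diffDays (fromJulian 1732 2 22) (fromGregorian 1732 2 22))   -- 11
```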

The good news is that as time marched on, things have gotten better. Driven largely by faster methods of communication and transportation, we’ve gone from the typical context-dependence of dates down to the typical context-dependence of a dozen seconds or so. Even ignoring context, we are often more limited by the accuracy of a typical clock, and technologies such as GPS and NTP are often available to help keep them in reasonable agreement with atomic time standards.

In the upcoming posts of this two or three part series, we’ll look at the process of reducing this ambiguity, as well as local time, time zones, and how PostgreSQL’s nearly universally misunderstood timestamp with time zone type actually works.


  1. Users with access to a unix machine may be amused by the output of cal 9 1752, and differences in various man pages for cal and ncal can be informative; take, for example, these two.


by lpsmith at November 06, 2014 10:49 AM

FP Complete

3.1 release changes

On September 28 we released 3.1 of the FP Complete public services. We've discussed this a bit already, but in this blog post, we want to share some of the more technical details of what we've added.

Laundry list of features

We've added a whole bunch of new features to FP Haskell Center. Here are some of the most notable:

  • You can now access all features of the IDE for free
  • It's faster
  • More file type support: Literate Haskell, hs-boot, and C files
  • Git Shell allows you to run arbitrary Git commands on a fully interactive command line
  • In addition to GHCi runs of code, you can build and run binaries inside the IDE
  • Ability to pass arbitrary flags to deployment builds
  • View usages of an identifier
  • Better support for Template Haskell-included files (like Hamlet)
  • Modified data files immediately appear to your application (good for modifying a CSS file in a web app, for example)
  • Better session management, including logging out of all sessions
  • School of Haskell supports autorun and Disqus comments

Keep reading for a drill down of some of this list. And for some lower-level, GHC-facing bits of our work, jump ahead to the "faster" sections.

Open Publish model

As announced previously, we've decided to make FP Haskell Center even more open. Since our 3.1 release, all IDE features are turned on for all users, with and without a paid account.

The new FPHC terms & conditions no longer restrict the free community version to non-commercial use. Even anonymous users can access the full spread of features in the IDE, from type checking to executable generation, and even Git repositories (though you'll definitely want to log in to be able to use that properly).

There are two distinctions between our community and paid licenses now:

  • Commercial licenses provide more compute power.

  • Under community licenses, all commits made in the IDE will be published. Similar to the GitHub model, free projects are open projects. You'll need a commercial license for private projects, which can still be acquired by contacting FP Complete Sales. So now when you commit, you'll see this message:

    Committed changes are public

    Any new commits are now made public and will make your project public. Existing private projects will not be published until you make a new commit.

    NOTE: If you have an existing project with commits, it will remain private until your next commit.

Additionally, there are now no restrictions on how you license your published code. This means that although you give others the right to view your published code, they receive no implicit license to distribute your code or use it in derived works. You can still explicitly license your code under an open source license if you choose.

Autorun active code samples

Based on feedback (mainly from Edward Kmett), there's a new directive in the code fence syntax in School of Haskell posts, which will auto-run the code snippet when the page loads. This will be especially nice for showing graphical web pages or to begin an interactive process. The code is written as:

``` haskell active autorun
main = print "Hello, World!"
```

And that will result in something like this.

School of Haskell improvements

The School of Haskell now supports adding Disqus comments to your own tutorials (yet again suggested by Ed). It looks like this.

To enable this, go into your account settings under “Profile” and you can choose:

☑ Use site default (currently: no comments)

☐ Disable comments

☐ Use FP Complete's Disqus account

☐ Use your own Disqus account

Disqus account ID: [ ]

To use your own Disqus account, hit the “Use your own Disqus account” checkbox and put your account id in the box. Then go into your Disqus site settings under the “Advanced” tab and add fpcomplete.com under the “Trusted Domains” text box.

GHC 7.8, new libraries

We've updated our default compiler version to GHC 7.8. We still provide a 7.4 environment, but it is considered deprecated, and we recommend users upgrade to 7.8. In addition, as usual, we've updated our library set to the newest versions of upstream packages, and will begin releasing unstable package snapshots next month.

Defer type errors

Together with GHC 7.8 come some new compiler features. For the most part, these are simply available to you without any IDE support. We added one nifty feature though: You can now defer type errors using the checkbox in the Messages tab:

☑ Enable suggestions ☑ Enable warnings ☑ Defer type errors

This will change type errors into warnings which are instead thrown at runtime. A very useful way to keep getting type information when you have a simple type error elsewhere in the code.
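
For the curious, the compiler flag behind this checkbox is GHC's -fdefer-type-errors; a tiny standalone illustration (my own, not the IDE's code):

``` haskell
{-# OPTIONS_GHC -fdefer-type-errors #-}

-- This module compiles with a warning instead of an error; the type
-- error is only raised if 'oops' is actually evaluated at runtime.
oops :: Int
oops = "this is not an Int"

main :: IO ()
main = putStrLn "main runs fine as long as oops is never forced"
```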

This plays in very nicely with the new Type Holes feature. Now you can program Python in Haskell!

Improved root & filtering

In the “Root & Filtering” tab in Settings, there is a way to change the root directory of your project; this is similar to doing a cd in your terminal. This is useful if you have several projects in one repository and you want to just compile the modules within one directory, rather than everything in the whole repository.

You can also ignore certain files using the blacklist input, e.g. src/Main.hs will hide that file from the list. This can be useful for hiding things from compilation.

This feature is still considered experimental.

Faster

You should notice a definite improvement in responsiveness in the IDE. We've actually implemented many different changes to make this happen. Our network communication layer, for example, has less overhead, after removing some unnecessary parsing from our reverse proxy machines. We've also refactored quite a bit of our async architecture to provide better concurrency for backend actions.

However, the biggest change comes to how we interact with GHC. We use GHC for a few different things in the IDE:

  • Type checking code
  • Providing information on identifier locations, types, usages, etc.
  • Generating and running bytecode

Those first two bullets dovetail together rather nicely, though the second can take more time than the first. However, the third point doesn't play so nicely with the other two. The issue is that, while GHC is running the bytecode, it can't continue to compile new code. And this is definitely something we want to allow. For example, a common use case is running a web application in the IDE while continuing to hack on it as it runs in the background.

Before the 3.1 release, our solution was to have three copies of GHC running for each project: one to do a quick typecheck, one to extract type information, and one for running code. This meant you would get error messages quickly, but couldn't always get type information right away. It also meant your project required much more memory, which overall slowed things down.

Our new architecture involves just a single GHC process running for each project. We load code into this process, and it typechecks, extracts information, and generates bytecode, all at the same time. The trick is what happens when you press run? We would like to use the bytecode already available in the current GHC process, without tying that process up on just running the bytecode.

The solution we ended up with is calling forkProcess. However, it wasn't quite as simple as that. To quote forkProcess's documentation:

forkProcess comes with a giant warning: since any other running threads are not copied into the child process, it's easy to go wrong: e.g. by accessing some shared resource that was held by another thread in the parent.

Sure enough, we ran into exactly this bug almost immediately. And fortunately, we have some very good news to report: it's fixed! You can see more information in GHC trac tickets 9295 and 9296 (and check Phabricator for some great reaction gifs). These changes have been merged upstream into GHC. Since writing those patches, we've stress tested our IDE significantly, and have no longer been able to reproduce any forkProcess bugs, which looks incredibly encouraging.
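
For readers unfamiliar with the primitive, forkProcess lives in the unix package's System.Posix.Process; the following is a minimal sketch of the fork-and-keep-working pattern (a toy example, not FP Complete's actual code):

``` haskell
import System.Posix.Process (forkProcess, getProcessStatus)

main :: IO ()
main = do
  -- The child process runs the already-loaded code...
  pid <- forkProcess (putStrLn "child: running the user's bytecode")
  -- ...while the parent stays free to keep compiling and typechecking.
  putStrLn "parent: still available for other work"
  _ <- getProcessStatus True False pid   -- block until the child exits
  putStrLn "parent: child finished"
```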

Note, though, that there are still some issues around forkProcess. If you're planning on using it in your own code, you should be aware that you can still run into problems, especially with multithreaded code.

One other change we made was switching to a dynamically linked GHC process. We'd wanted to do this for a while due to improved memory usage. Ultimately, we had to make the switch due to a bug in C static initializers. Now we get the nice benefit that separate processes on the same machine can share memory space for their dynamic libraries.

November 06, 2014 12:00 AM

November 04, 2014

Functional Jobs

Senior Software Engineer at McGraw-Hill Education (Full-time)

This Senior Software Engineer position is with the new LearnSmart team at McGraw-Hill Education's new and growing Research & Development center in Boston's Innovation District.

We make software that helps college students study smarter, earn better grades, and retain more knowledge.

The LearnSmart adaptive engine powers the products in our LearnSmart Advantage suite — LearnSmart, SmartBook, LearnSmart Achieve, LearnSmart Prep, and LearnSmart Labs. These products provide a personalized learning path that continuously adapts course content based on a student’s current knowledge and confidence level.

On our team, you'll get to:

  • Move textbooks and learning into the digital era
  • Create software used by millions of students
  • Advance the state of the art in adaptive learning technology
  • Make a real difference in education

Our team's products are built with Flow, a functional language in the ML family. Flow lets us write code once and deliver it to students on multiple platforms and device types. Other languages in our development ecosystem include notably JavaScript, but also C++, SWF (Flash), and Haxe.

If you're interested in functional languages like Scala, Swift, Erlang, Clojure, F#, Lisp, Haskell, and OCaml, then you'll enjoy learning Flow. We don't require that you have previous experience with functional programming, only enthusiasm for learning it. But if you do have some experience with functional languages, so much the better! (On-the-job experience is best, but coursework, personal projects, and open-source contributions count too.)

We require only that you:

  • Have a solid grasp of CS fundamentals (languages, algorithms, and data structures)
  • Be comfortable moving between multiple programming languages
  • Be comfortable with modern software practices: version control (Git), test-driven development, continuous integration, Agile

Get information on how to apply for this position.

November 04, 2014 02:42 PM

November 03, 2014

Douglas M. Auclair (geophf)

October 2014 1HaskellADay Problems and Solutions


October, 2014
  • October 1st, 2014, Wednesday: Pathways into Darkness http://lpaste.net/111444 Because Daerkness is the new Pink. Today's #haskell problem. ...AAAAAND we went there: Dark, Darkest, Darko! http://lpaste.net/111919 A solution to the #haskell grid-pathing problem using Data.Graph. FUN!
  • October 2nd, 2014, Thursday: Tweet, tweet! Summer Factorial! http://lpaste.net/111947 a #haskell 'sum'mer exercise for the Fall. From "Power set, HO!" to "Power Sets? NO!" http://lpaste.net/111982 A solution to today's longest factors-sum #haskell problem.
  • October 3rd, 2014, Friday: In which I get WHACKED by the Missus for today's #haskell problem http://lpaste.net/112005 (I've printed out the word-ladders in long form so you can give them to your Missus, thus to avoid enWHACKification! http://lpaste.net/112009) GROSS ('Get Rid Of Stupid wordS') club and goal-directed searches using Levenstein distances give us today's solution http://lpaste.net/112064
  • BONUS! puzzles for today: YOU get RICK ROLLED! http://lpaste.net/112006 Oh, yeah! I went there!

  • You can't kill the metal, unless you're Rick Rolled; solution to #haskell bonus problem at http://lpaste.net/112127


  • October 4th, 2014: Dear Dad, SEND + MORE = MONEY http://lpaste.net/112160 Today's #haskell problem. Problem SOLVED! (naively) ... for a small processing fee, ... ;) http://lpaste.net/112175
  • October 7th, 2014: An American Family http://lpaste.net/112220 Today's #haskell problem gives us the #Twilight lexicon... in a 'VERY' 'BRIEF' synopsis. #ontology 8 of 21 questions are answered against the #Twilight (extended) family tree http://lpaste.net/112307  My enthusiasm overcame my reasonableness.
  • BONUS #haskell problem for today! http://lpaste.net/112229 Renesmee's (doting) Aunts because EVERYONE dotes on Renesmee. By law. Do-it-to-it! "And Bella Swan lived Happily Ever After" ... and that's the whole point of the 4-book #twilight series. http://lpaste.net/112339 #ontology
  • October 8th, 2014: "A more clever approach than brute force is desirable" for today's #haskell problem http://lpaste.net/112310 (Me? Brute Force, All the WAY!) (Not so) Brutish pathing through triangular sums ... Committed Choice FTW! http://lpaste.net/112355 A solution to today's #haskell problem.
  • October 9th, 2014: Sugar and spice and everything nice (http://lpaste.net/111904): that's what today's #haskell problem from a logic puzzle from 1957 is made of. A long path to a solution to this day's engagement announcements #haskell problem, involving monoidal Frege logic. http://lpaste.net/112446
  • October 10th, 2014: "Oh, it's 4:42. No, it isn't, ... zzzz" Those kinds of conversations come up all the time in today's #haskell problem http://lpaste.net/112404 And the poor little matchstick boy went back to sleep in his warm, comfy bed, hot chocolate by his bedside http://lpaste.net/112461 a solution
  • BONUS! Okay, so you're saying liars ALWAYS lie and TruthTellers ALWAYS tell the truth. http://lpaste.net/112405 Wait ... even in Vegas? #Bonus In which we may learn about your sense of humo(u)r http://lpaste.net/112508 a solution using logic, meta-logic, and coercive logic. #ontology 
  • October 13th, 2014: In which we discover Columbus is Big in Japan ... no, wait: that's #ekmett. http://lpaste.net/112511 Today's #haskell problem. We solve this problem by WAY overusing the State Monad! http://lpaste.net/112537 ... Letter to a Young Haskell Enthusiast should have added: "Don't overuse the State Monad!" I hear and obey the advice of ekmett!


  • What time is it? π-time? No. (But you were close ...) It's BONUS #haskell problem time! http://lpaste.net/112515 Go discover the New World! And, we discover, solving this bonus problem, that the New World was India (and not the ink), no matter what anyone else says, eh, Columbus? http://lpaste.net/112533
  • October 14th, 2014: We look at reciprocally-cycled decimal values of fractions in today's #haskell problem http://lpaste.net/112565 In which we discover that @geophf must read the problem statement carefully. A solution to today's #haskell problem http://lpaste.net/112592
  • October 15th, 2014: Today's #haskell problem: grid placement http://lpaste.net/112606. Or as Neo says: "Whoa."



  • October 16th, 2014: An ... 'intelligence' test http://lpaste.net/112694 for today's #haskell problem. Intelligence test: answered. http://lpaste.net/112759 I had to melt my brain to answer it, however, so that happened.
  • October 17th, 2014: Friedman day was yesterday, http://lpaste.net/112707 but it's today's #haskell problem. Bonus: Friedman proofs, no less! http://lpaste.net/112709 ... using Frege's logic ... but only if you want to ... The solution (including the bonus solution) for the Friedman-day problem posted at http://lpaste.net/112735  WITH FOR-LOOPS!
  • October 20th, 2014: O! Little Town of Milford! http://lpaste.net/112921 has a butcher, baker, and candlestick maker, but who is who? Today's #haskell problem. Okay, gag me with the sequence operator! BLEH! (I really should have gone relational calculus here! :/) A solution at http://lpaste.net/112954
  • October 21st, 2014: Se7en Little numbers ... okay, well, four of them, anyway, for today's #haskell problem http://lpaste.net/112972 Quantum nodes of scintillating thought http://lpaste.net/112996  A solution to today's #Haskell problem.

  • October 22nd, 2014: Word Numb3rs is today's #haskell puzzle (thanks to @BenVitale) 'AND' in your language of choice as you so choose. http://lpaste.net/113018 Hinglish-Vinglish is our response to this exercise.  http://lpaste.net/113045 "Fruitful." Yeah. Well, at least we got Data.Numeral.English http://lpaste.net/113037 out of it. SWEET!
  • October 23rd, 2014: Today's #haskell problem comes from http://projecteuler.net and is about ... well: Euler, himself! http://lpaste.net/113069 Well, if you wanted list of writer monads, you could've just asked for a list of writer monads http://lpaste.net/113084 
  • October 24th, 2014: Even odds is our #haskell problem today, thanks to @BenVitale http://lpaste.net/113141 A solution (actually two defined solutions) to @BenVitale even odds problem posted for today's #haskell problem http://lpaste.net/113167
I dub this coming week 'Kakuro week' for the theme of #haskell problems to solve (puzzle rendered using #haskell)

  • October 27th, 2014: #kakuro-week: the empty slate is our step one for our #haskell problem(s) to solve today http://lpaste.net/113255


  • Then we have the...'filled'-slate as a solution to today's #haskell problems for #kakuro-week http://lpaste.net/113314
  • October 28th, 2014: «On y va!» ("Let's go!") declaims Sgt. Johnson as he charges headlong into solving today's #haskell problem http://lpaste.net/113316
  • And that's how a 'partial' cookie crumbles. Partially. A partial solution to yesterday's #haskell #kakuro problem http://lpaste.net/113397
  • October 29th, 2014: A magnificent clue for today's #haskell problem in this #kakuro-puzzle-themed week http://lpaste.net/113399
  • October 30th, 2014: So, what does the 'Su' mean? http://lpaste.net/113451 Today's #haskell puzzler is a sudoku-solver. That you wrote. DoIt-ToIt!
  • October 31st, 2014: Merry Christmas! No. Heurieusement anniversaire! Nope. TRICK OR TREAT! Ya, that's the stuff! http://lpaste.net/113524 Today's #haskell problem

by geophf ([email protected]) at November 03, 2014 12:18 PM

November 02, 2014

Neil Mitchell

November talks

Summary: I'll be talking at CodeMesh 2014 on 5th November and FP Days on 20th November.

I am giving two talks in London this month:

CodeMesh 2014 - Gluing Things Together with Haskell, 5th Nov

I'll be talking about how you can use Haskell as the glue language in a project, instead of something like Bash. In particular, I'll cover Shake, NSIS and Bake. The abstract reads:

A large software project is more than just the code that goes into a release, in particular you need lots of glue code to put everything together - including build systems, test harnesses, installer generators etc. While the choice of language for the project is often a carefully considered decision, more often than not the glue code consists of shell scripts and Makefiles. But just as functional programming provides a better way to write the project, it also provides a better way to write the glue code. This talk covers some of the technologies and approaches we use at Standard Chartered to glue together the quant library. In particular, we'll focus on the build system where we replaced 10,000 lines of Makefiles with 1,000 lines of Haskell which builds the project twice as fast. We'll also look at how to test programs using Haskell, how to replace ancillary shell scripts with Haskell, and how to use Haskell to generate installers.

FP Days 2014 - Building stuff with Shake, 20th Nov

I'll be giving a tutorial on building stuff with Shake. It's going to be less sales pitch, more how you structure a build system, and how you use Shake effectively. The abstract reads:

Build systems are a key part of any large software project, relied upon by both developers and release processes. It's important that the build system is understandable, reliable and fast. This talk introduces the Shake build system which is intended to help meet those goals. Users of Shake write a Haskell program which makes heavy use of the Shake library, while still allowing the full power of Haskell to be used. The Shake library provides powerful dependency features along with useful extras (profiling, debugging, command line handling). This tutorial aims to help you learn how to think about writing build systems, and how to make those thoughts concrete in Shake.
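
To give a flavour of what such a build system looks like, here is a minimal sketch (my own toy example, not taken from the talk) using Shake's %> rule operator and cmd:

``` haskell
import Development.Shake
import Development.Shake.FilePath

-- Build hello.o from hello.c, rebuilding only when the source changes.
main :: IO ()
main = shakeArgs shakeOptions $ do
  want ["hello.o"]
  "hello.o" %> \out -> do
    let src = out -<.> "c"
    need [src]
    cmd "gcc -c" [src] "-o" [out]
```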

by Neil Mitchell ([email protected]) at November 02, 2014 09:31 PM

November 01, 2014

Joachim Breitner

Can one recommend Debian stable to Desktop users?

My significant other is running Debian stable on her laptop, and it has worked fine for quite a while. But for the last week or two, she has not been able to access her University IMAP account via Evolution. Obviously quite a showstopper!

Today I had a closer look, and my suspicion was that the University changed their SSL configuration due to the recent POODLE attack and that Evolution was incompatible with that. After some more searching, I found that Ubuntu had applied a patch, originally from Fedora, two weeks ago. For Debian, there is a bug report but no sign of action.

So I fetched the sources, applied the patch, built the package, installed it and things were working again. Yay for that! But this is obviously not the best way to handle such issues.

I know that Debian is volunteer driven and we often lack the manpower for certain things, so I don’t want to rant about this particular issue. I also continue to be a happy user of Debian unstable on my laptop, and Debian stable on my servers. But I seriously wonder: Can I really recommend Debian stable to users, for their laptops and desktops? If not, what are the alternatives? Ubuntu obviously comes to mind, having some full-time staff for such issues... But that would be giving up on promoting Debian as the universal operating system.

Update (2014-11-3): Laney just uploaded a fixed package. Thanks!

by Joachim Breitner ([email protected]) at November 01, 2014 02:52 PM

October 31, 2014

The GHC Team

GHC Weekly News - 2014/10/31 (Halloween Edition)

Hello *,

Welcome to the GHC Weekly news. And it's just in time before you go out and eat lots of candy and food.

  • David Feuer and Joachim Breitner spent some time this past week talking about more optimizations for fusing Haskell code and creating better consumers and producers. This work includes optimizations around "one-shot lambdas" (lambdas used at most once) and Call Arity, which was implemented by Joachim at Microsoft Research. The thread is here - https://www.haskell.org/pipermail/ghc-devs/2014-October/006901.html

The current situation is that Call Arity and one-shot analysis tend to have good combined results for exploiting more fusion opportunities, but sometimes these backfire. As a result, Joachim and David have worked on improving the situation - particularly by letting programmers help with a new oneShot primitive (in Phab:D392 & Phab:D393).
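
As a rough sketch of the programmer-facing side (my own example, assuming the GHC.Exts.oneShot export that these patches add for GHC 7.10): oneShot is semantically the identity on functions, but promises the optimiser that the wrapped lambda is applied at most once.

``` haskell
import GHC.Exts (oneShot)

-- Semantically addOnce x = \y -> x + y; the oneShot wrapper only tells
-- the optimiser that the inner lambda is called at most once, which can
-- unblock let-floating and fusion.
addOnce :: Int -> Int -> Int
addOnce x = oneShot (\y -> x + y)

main :: IO ()
main = print (addOnce 2 40)   -- 42
```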

  • Herbert Valerio Riedel opened up a discussion about the origins of code contributions. In short, we'd like to have some stronger ground to stand on in the face of contributions from contributors - the origin of a change and its legal status is a bit nebulous. The thread is here: https://www.haskell.org/pipermail/ghc-devs/2014-October/006959.html

Overall, there have been a lot of separate points made, including CLAs (unlikely), "Developer Certificates of Origin" a la the Linux Kernel, and reworking how we mark up header files and keep track of GHC's long history of authors. If you work on a big project where some of these concerns are real, we'd like to hear what you have to say!

  • Gintautas Milauskas has done some fantastic work for GHC on Windows lately, including fixing tests, improving the build, and making things a lot more reasonable to use. With his work, we hope GHC 7.10 will finally ship an updated MinGW compiler (a long requested feature), and have a lot of good fixes for windows. Thank you, Gintautas!
  • And on that note, the call for Windows developers rages on - it looks like Gintautas, Austin, Simon, and others will be meeting to discuss the best way to tackle our current woes. Are you a Windows user? Please give us input - having input is a crucial part of the decision making process, so let us know.
  • Jan Stolarek had a question about core2core - a lot of questions, in fact. What's the difference between demand, strictness, and cardinality analysis? Does the demand analyzer change things? And what's going on in some of the implementation? A good read if you're interested in deep GHC optimization magic: https://www.haskell.org/pipermail/ghc-devs/2014-October/006968.html
  • Peter Wortmann has put up the new DWARF generation patches for review, in Phab:D396. This is one of the major components we still plan on landing in 7.10, and with a few weeks to spare, it looks like we can make sure it's merged for the STABLE freeze!
  • There have been a lot of good changes in the tree this past week:

Thanks to Michael Orlitzky, we plan on adding doctest examples to more modules in 7.10, and to increase that coverage further. This is *really* important work, but very low-hanging fruit - thanks a ton Michael!

Data.Bifunctor is now inside base! (Phab:D336)

atomicModifyIORef' has been optimized with excellent speedups (as much as 1.7x to 1.4x, depending on the RTS used), thanks to some older work by Patrick Palka (Phab:D315). GHC's internals have been reworked to unwire Integer from GHC, leading not only to a code cleanup, but laying the foundation for further GMP (and non-GMP!) related Integer improvements (Phab:D351).

David Feuer and Joachim have been relentless in improving fusion opportunities, including the performance of take, isSuffixOf, and more prelude improvements, spread over nearly half a dozen patches. And this doesn't even include the work on improving oneShot or Call Arity!

In a slight change to base semantics, David Feuer also finally fixed #9236. This is a change that can expose latent bugs in your program (as it did for Haddock), so be sure to test thoroughly with 7.10 (Phab:D327).

GHC now has support for a new __GLASGOW_HASKELL_TH__ macro, primarily useful for testing bootstrap compilers, or compilers which don't support GHCi.

And there have been many closed tickets: #9549, #9593, #9720, #9031, #8345, #9439, #9435, #8825, #9006, #9417, #9727, #2104, #9676, #2628, #9510, #9740, #9734, #9367, #9726, #7984, #9230, #9681, #9747, and #9236.

by thoughtpolice at October 31, 2014 11:57 PM

Roman Cheplyaka

Rebalancing Open Source Portfolio

Being an open source developer means being an investor. I invest my time in creating, contributing to, and maintaining projects. As an investor, I occasionally need to re-evaluate my portfolio and decide whether it is optimal.

The result of a recent such re-evaluation is that I’ve decided to stop my active involvement in the haskell-suite family of projects.

This decision wasn’t easy. Being a maintainer of haskell-src-exts was a unique experience. During the last two years, 24 people have contributed to the project, which is amazing, given its complexity. During ICFP’14, I met with 4 or 5 contributors and many users of haskell-src-exts. People would approach me in the hallway and ask when the new version would be released or a certain feature added.

Leaving haskell-names behind is also sad. It hasn’t seen as many uses as haskell-src-exts (which has always been puzzling — you can’t do much with a bare parser, but apparently folks prefer to roll their own incomplete and buggy name resolution). Still, it’s used by fay, and I feel bad about letting the fay devs down.

So, why did I choose to stop investing in the haskell suite?

  1. I have much less spare time these days, so I have to drop something.
  2. I became involved with the haskell suite because of my interest in developer tools for Haskell (hasfix, ariadne). I am no longer working on those, and I’m not using any haskell-src-exts-powered tools (hlint, stylish-haskell) myself, which means I don’t get big personal returns on my investment.
  3. The main competitor for the haskell suite is the GHC API. It is now clear to me that we are losing to them in most areas. There is very little hope for the type checker, let alone other parts of the compilation pipeline. The community did a great job in catching up with GHC on the parser side, but we still have less resources than GHC and are bound to lag behind, it seems. On the other side, there has been some work recently (by Alan Zimmerman, in particular) to improve GHC’s AST and catch up with HSE on that front. So, even if I decide to invest in compilers or dev tools in the future, I’m probably going to team up with GHC instead.

What’s going to happen to these projects I’m leaving behind? I’m less worried about haskell-src-exts. Peter Jonsson has been doing some great work on it, and I hope he’ll step in. There’s also Niklas Broberg and all the wonderful contributors.

I’m more worried about haskell-names, haskell-packages, ariadne. If you are willing to take over any of them, let me know (and you can count on my help). Otherwise, I’m going to provide very basic support for them, but nothing more.

Oh, and in case you missed the announcement, I’m dropping support for my test-framework packages, too.

On the other hand (and to end with something positive), I have no intention to stop maintaining tasty in the foreseeable future. Haskell needs at least one decent testing framework, after all! :-) By the way, here’s a cool use case for tasty that Lars Hupel shared for the upcoming edition of HCAR:

Technische Universität München uses Tasty to assess homework submitted for the introductory functional programming course, which is compulsory for second-year Bachelor’s students in computer science. Students can upload their source code to a web application, which compiles them and executes tests (mostly QuickCheck). Graphical reports are generated with ‘tasty-html’ and presented to the students. Grading happens automatically with a custom test reporter, which – roughly speaking – checks that a minimum number of significant tests have passed. This is a huge improvement over the past years, where tests were run by a homegrown framework which offered only textual output. Tasty however is nicely extensible and allows for easy customization to the instructors’ and students’ needs.
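
For readers who haven't used it, a minimal suite in the spirit of the setup described above might look like this (a sketch using tasty and tasty-quickcheck; the property is just a placeholder):

``` haskell
import Test.Tasty
import Test.Tasty.QuickCheck (testProperty)

main :: IO ()
main = defaultMain $ testGroup "homework"
  [ testProperty "reverse is involutive" $
      \xs -> reverse (reverse xs) == (xs :: [Int])
  ]
```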

October 31, 2014 10:00 PM

Douglas M. Auclair (geophf)

Wolfgang Jeltsch

Grapefruit updated

Yesterday, I updated the Grapefruit FRP library once again, this time to make it compatible with GHC 7.8. The new version is 0.1.0.5. To install or update, you can use the following commands:

cabal update
cabal install grapefruit-ui-gtk grapefruit-examples

Many thanks to Samuel Gélineau for providing several patches. (By the way, Samuel maintains an interesting page that compares different FRP libraries, the FRP Zoo.)

Grapefruit 0.1 is actually a phase-out model, which I only update to work with newer GHC versions. However, I am working on a new Grapefruit. This will be based on my research about FRP semantics and will be quite different from the old one. I expect that the sound theoretical foundation will lead to a more powerful library with a more sensible interface. One particular new feature will be integration of side effects into FRP, in a purely functional style.


Tagged: FRP, Grapefruit, Haskell, Samuel Gélineau

by Wolfgang Jeltsch at October 31, 2014 10:52 AM

Yesod Web Framework

Slides on conduit and GHCJS

I'm currently sitting in a hotel room in Poznan, Poland, getting ready to attend the second day of PolyConf. In the past two days, I've already had quite some fun. On Wednesday, thanks to Matthias Fischmann and Adam Drake, I spoke to the Berlin Haskell users group about conduit. I had no idea that there were this many Haskellers in Berlin, and it was awesome to meet all of you. Yesterday I spoke about Haskell on the client side at PolyConf, mostly talking about how awesome GHCJS is (after Luite indoctrinated me at ICFP).

For those interested, the slides are now available online at:

I've also learnt quite a number of cool client side tricks at PolyConf, hopefully we'll get to take advantage of some of them in Yesod in the near future :).

October 31, 2014 05:00 AM

October 30, 2014

Functional Jobs

Senior Software Engineer at Soda Software Labs (Full-time)

Role: Senior Software Engineer

Primary Location: Manchester, UK

Employee Status: Permanent (Full Time)

Salary: £30K-£70K, depending on skill and experience

Company Overview

Soda Software Labs are searching for a world class software engineer to help with the design and development of their flagship FinTech product Profile. At the heart of the product is a behavioural analysis platform which mines social data and performs analysis to provide insight for a number of industry verticals. The role offers the opportunity to join a small but fast growing company, to innovate in a number of industries and to get in at the ‘ground level’ within the fascinating space of social data.

The Role

We are looking for someone who has the passion, drive and motivation to continually develop our products to the highest standard. Responsibilities and expectations for the applicant will involve:

• Writing code that is correct by design as opposed to bugs that need to be found during testing and then patched;

• Being refactoring and maintainability minded — things need to get done, but the team needs to stay lean and mean, and the code correct and efficient;

• Writing high-level automated tests using ScalaCheck as well as other, higher level system and UI automation tools for functional testing;

• Ability to liaise with and support junior developers and data scientists is a plus;

• Ability to produce well written easy to understand documentation for all audiences.

Knowledge and Skills

Initially we are looking for an experienced senior-level software engineer, with experience across a range of programming languages and several different programming paradigms. Experience with functional programming is essential. Experience with or knowledge of at least one of the following languages:

Scala, Haskell, ML/OCaml/F#, or Scheme/Lisp. The candidate needs to be willing to learn and work using Scala as the primary language, while coming into contact with languages such as Python, R, Javascript and others on a daily basis.

However, we are also in the market for skilled individuals who may not possess the experience but believe themselves to be up to the challenge, skilled in programming and willing to learn.

Additional Experience

• 4-6 years’ experience as a software engineer (preferable)

• Solid understanding of computer science and software engineering (preferably a degree in a relevant subject)

• Excellent critical thinking and analytical skills

• Ability to work to another’s design

• Commercial awareness

• Good communication skills

Get information on how to apply for this position.

October 30, 2014 07:35 AM

October 29, 2014

Keegan McAllister

A taste of Rust (yum) for C/C++ programmers

If, like me, you've been frustrated with the status quo in systems languages, this article will give you a taste of why Rust is so exciting. In a tiny amount of code, it shows a lot of ways that Rust really kicks ass compared to C and C++. It's not just safe and fast, it's a lot more convenient.

Web browsers do string interning to condense the strings that make up the Web, such as tag and attribute names, into small values that can be compared quickly. I recently added event logging support to Servo's string interner. This will allow us to record traces from real websites, which we can use to guide further optimizations.

Here are the events we can log:


#[deriving(Show)]
pub enum Event {
    Intern(u64),
    Insert(u64, String),
    Remove(u64),
}

Interned strings have a 64-bit ID, which is recorded in every event. The String we store for "insert" events is like C++'s std::string; it points to a buffer in the heap, and it owns that buffer.

This enum is a bit fancier than a C enum, but its representation in memory is no more complex than a C struct. There's a tag for the three alternatives, a 64-bit ID, and a few fields that make up the String. When we pass or return an Event by value, it's at worst a memcpy of a few dozen bytes. There's no implicit heap allocation, garbage collection, or anything like that. We didn't define a way to copy an event; this means the String buffer always has a unique owner who is responsible for freeing it.

The deriving(Show) attribute tells the compiler to auto-generate a text representation, so we can print an Event just as easily as a built-in type.

Next we declare a global vector of events, protected by a mutex:


lazy_static! {
    pub static ref LOG: Mutex<Vec<Event>>
        = Mutex::new(Vec::with_capacity(50_000));
}

lazy_static! will initialize both of them when LOG is first used. Like String, the Vec is a growable buffer. We won't turn on event logging in release builds, so it's fine to pre-allocate space for 50,000 events. (You can put underscores anywhere in an integer literal to improve readability.)

lazy_static!, Mutex, and Vec are all implemented in Rust using gnarly low-level code. But the amazing thing is that all three expose a safe interface. It's simply not possible to use the variable before it's initialized, or to read the value the Mutex protects without locking it, or to modify the vector while iterating over it.

The worst you can do is deadlock. And Rust considers that pretty bad, still, which is why it discourages global state. But it's clearly what we need here. Rust takes a pragmatic approach to safety. You can always write the unsafe keyword and then use the same pointer tricks you'd use in C. But you don't need to be quite so guarded when writing the other 95% of your code. I want a language that assumes I'm brilliant but distracted :)

Rust catches these mistakes at compile time, and produces the same code you'd see with equivalent constructs in C++. For a more in-depth comparison, see Ruud van Asseldonk's excellent series of articles about porting a spectral path tracer from C++ to Rust. The Rust code performs basically the same as Clang / GCC / MSVC on the same platform. Not surprising, because Rust uses LLVM and benefits from the same backend optimizations as Clang.

lazy_static! is not a built-in language feature; it's a macro provided by a third-party library. Since the library uses Cargo, I can include it in my project by adding

[dependencies.lazy_static]
git = "https://github.com/Kimundi/lazy-static.rs"

to Cargo.toml and then adding


#[phase(plugin)]
extern crate lazy_static;

to src/lib.rs. Cargo will automatically fetch and build all dependencies. Code reuse becomes no harder than in your favorite scripting language.

Finally, we define a function that pushes a new event onto the vector:


pub fn log(e: Event) {
    LOG.lock().push(e);
}

LOG.lock() produces an RAII handle that will automatically unlock the mutex when it falls out of scope. In C++ I always hesitate to use temporaries like this because if they're destroyed too soon, my program will segfault or worse. Rust has compile-time lifetime checking, so I can do things that would be reckless in C++.

If you scroll up you'll see a lot of prose and not a lot of code. That's because I got a huge amount of functionality for free. Here's the logging module again:


#[deriving(Show)]
pub enum Event {
    Intern(u64),
    Insert(u64, String),
    Remove(u64),
}

lazy_static! {
    pub static ref LOG: Mutex<Vec<Event>>
        = Mutex::new(Vec::with_capacity(50_000));
}

pub fn log(e: Event) {
    LOG.lock().push(e);
}

This goes in src/event.rs and we include it from src/lib.rs.


#[cfg(feature = "log-events")]
pub mod event;

The cfg attribute is how Rust does conditional compilation. Another project can specify

[dependencies.string_cache]
git = "https://github.com/servo/string-cache"
features = ["log-events"]

and add code to dump the log:


for e in string_cache::event::LOG.lock().iter() {
    println!("{}", e);
}

Any project which doesn't opt in to log-events will see zero impact from any of this.

If you'd like to learn Rust, the Guide is a good place to start. We're getting close to 1.0 and the important concepts have been stable for a while, but the details of syntax and libraries are still in flux. It's not too early to learn, but it might be too early to maintain a large library.

By the way, here are the events generated by interning the three strings foobarbaz, foo, and blockquote:

Insert(0x7f1daa023090, foobarbaz)
Intern(0x7f1daa023090)
Intern(0x6f6f6631)
Intern(0xb00000002)

There are three different kinds of IDs, indicated by the least significant bits. The first is a pointer into a standard interning table, which is protected by a mutex. The other two are created without synchronization, which improves parallelism between parser threads.

In UTF-8, the string foo is smaller than a 64-bit pointer, so we store the characters directly. blockquote is too big for that, but it corresponds to a well-known HTML tag. 0xb is the index of blockquote in a static list of strings that are common on the Web. Static atoms can also be used in pattern matching, and LLVM's optimizations for C's switch statements will apply.

by keegan ([email protected]) at October 29, 2014 07:15 PM

Yesod Web Framework

Announcing: yesod-gitrepo

I'm happy to announce the first release of a new package, yesod-gitrepo. This package encapsulates a pattern I've used a number of times, namely: loading and refreshing content from a Git repository. Below is the current contents of the README.md file.

This code is currently being used in production, and should be pretty stable. That said, it has not received a huge amount of battle testing yet. So please do test corner cases before shipping it in production for your site.


yesod-gitrepo provides a means of embedding content from a Git repository inside a Yesod application. The typical workflow is:

  • Use gitRepo to specify a repository and branch you want to work with.
  • Provide a function that will perform some processing on the cloned repository.
  • Use grContent in your Handler functions to access this parsed data.
  • Embed the GitRepo as a subsite that can be used to force a refresh of the data.
  • Set up a commit handler that pings that URL. On Github, this would be a webhook.

This is likely easiest to understand with a concrete example, so let's go meta: here's an application that will serve this very README.md file. We'll start off with language extensions and imports:

{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE QuasiQuotes       #-}
{-# LANGUAGE TemplateHaskell   #-}
{-# LANGUAGE TypeFamilies      #-}
import           ClassyPrelude.Yesod
import           Text.Markdown
import           Yesod.GitRepo

Now we're going to create our foundation datatype. We need to give it one field: the GitRepo value containing our parsed content. Our content will simply be the text inside README.md, wrapped up in a Markdown newtype for easy rendering. This gives us:

data App = App
    { getGitRepo :: GitRepo Markdown
    }

instance Yesod App

And now let's set up our routes. We need just two: a homepage, and the subsite. Our subsite type is GitRepo Markdown (as given above). We replace the space with a hyphen as an escaping mechanism inside Yesod's route syntax:

mkYesod "App" [parseRoutes|
/ HomeR GET
/refresh RefreshR GitRepo-Markdown getGitRepo
|]

Next up is our home handler. We start off by getting the current content parsed from the repository:

getHomeR :: Handler Html
getHomeR = do
    master <- getYesod
    content <- liftIO $ grContent $ getGitRepo master

Then it's just some normal Hamlet code:

    defaultLayout $ do
        setTitle "yesod-gitrepo sample"
        [whamlet|
            <p>
                <a href=@{RefreshR GitRepoRoute}>
                    Force a refresh at
                    @{RefreshR GitRepoRoute}
            <article>#{content}
        |]

And finally, our main function. We pass in the repo URL and branch name, plus a function that, given the filepath containing the cloned repo, processes it and generates a Markdown value. Finally, we provide the generated repo value to our App constructor and run it:

main :: IO ()
main = do
    repo <- gitRepo
        "https://github.com/snoyberg/yesod-gitrepo"
        "master"
        $ \fp -> fmap Markdown $ readFile $ fp </> "README.md"
    warp 3000 $ App repo

Give it a shot. You should have a webapp hosting this very README file!

October 29, 2014 01:16 PM

October 28, 2014

Gabriel Gonzalez

How to desugar Haskell code

Haskell's core language is very small, and most Haskell code desugars to either:

  • lambdas / function application,
  • algebraic data types / case expressions,
  • recursive let bindings,
  • type classes and specialization, or:
  • Foreign function calls

Once you understand those concepts you have a foundation for understanding everything else within the language. As a result, the language feels very small and consistent.

I'll illustrate how many higher-level features desugar to the same set of lower-level primitives.

if

if is equivalent to a case statement:

if b then e1 else e2

-- ... is equivalent to:
case b of
    True  -> e1
    False -> e2

This works because Bools are defined within the language:

data Bool = False | True

Multi-argument lambdas

Lambdas of multiple arguments are equivalent to nested lambdas of single arguments:

\x y z -> e

-- ... is equivalent to:
\x -> \y -> \z -> e

Functions

Functions are equivalent to lambdas:

f x y z = e

-- ... is equivalent to:
f = \x y z -> e

-- ... which in turn desugars to:
f = \x -> \y -> \z -> e

As a result, all functions of multiple arguments are really just nested functions of one argument each. This trick is known as "currying".

Infix functions

You can write functions of at least two arguments in infix form using backticks:

x `f` y

-- ... desugars to:
f x y

Operators

Operators are just infix functions of two arguments that don't need backticks. You can write them in prefix form by surrounding them with parentheses:

x + y

-- ... desugars to:
(+) x y

The compiler distinguishes operators from functions by reserving a special set of punctuation characters exclusively for operators.
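
For example (with made-up names), the only thing that distinguishes these two definitions is the kind of characters in the name being bound:

-- (<+>) is an operator because its name is built from punctuation;
-- addBoth is a function because its name is alphanumeric
(<+>) :: Int -> Int -> Int
x <+> y = x + y

addBoth :: Int -> Int -> Int
addBoth x y = x + y

-- ... and each can be used in the other's style:
three = (<+>) 1 2

four = 2 `addBoth` 2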

Operator parameters

The parentheses trick for operators works in other contexts, too. You can bind parameters using operator-like names if you surround them with parentheses:

let f (%) x y = x % y
in f (*) 1 2

-- ... desugars to:
(\(%) x y -> x % y) (*) 1 2

-- ... reduces to:
1 * 2

Operator sections

You can partially apply operators to just one argument using a section:

(1 +)

-- desugars to:
\x -> 1 + x

This works the other way, too:

(+ 1)

-- desugars to:
\x -> x + 1

This also works with infix functions surrounded by backticks:

(`f` 1)

-- desugars to:
\x -> x `f` 1

-- desugars to:
\x -> f x 1

Pattern matching

Pattern matching on constructors desugars to case statements:

f (Left  l) = eL
f (Right r) = eR

-- ... desugars to:
f x = case x of
    Left  l -> eL
    Right r -> eR

Pattern matching on numeric or string literals desugars to equality tests:

f 0 = e0
f _ = e1

-- ... desugars to:
f x = if x == 0 then e0 else e1

-- ... desugars to:
f x = case x == 0 of
    True  -> e0
    False -> e1

Non-recursive let / where

Non-recursive lets are equivalent to lambdas:

let x = y in z

-- ... is equivalent to:
(\x -> z) y

Same thing for where, which is identical in purpose to let:

z where x = y

-- ... is equivalent to:
(\x -> z) y

Actually, that's not quite true, because of let generalization, but it's close to the truth.

Recursive let / where cannot be desugared like this and should be treated as a primitive.
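
For example, this self-referential binding cannot be rewritten as (\x -> z) y, because xs refers to itself:

take 3 (let xs = 1 : xs in xs)

-- ... evaluates to:
[1,1,1]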

Top-level functions

Multiple top-level functions can be thought of as one big recursive let binding:

f0 x0 = e0

f1 x1 = e1

main = e2


-- ... is equivalent to:
main = let f0 x0 = e0
           f1 x1 = e1
       in e2

In practice, Haskell does not desugar them like this, but it's a useful mental model.

Imports

Importing modules just adds more top-level functions. Importing modules has no side effects (unlike some languages), unless you use Template Haskell.
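
For example (a small sketch using a function from base):

import Data.List (sort)

-- After the import, sort is in scope exactly as if it were another
-- top-level binding, and nothing is executed by the import itself:
sortedWords :: [String] -> [String]
sortedWords = sort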

Type-classes

Type classes desugar to records of functions under the hood where the compiler implicitly threads the records throughout the code for you.

class Monoid m where
    mappend :: m -> m -> m
    mempty  :: m

instance Monoid Int where
    mappend = (+)
    mempty  = 0

f :: Monoid m => m -> m
f x = mappend x x


-- ... desugars to:
data Monoid m = Monoid
    { mappend :: m -> m -> m
    , mempty  :: m
    }

intMonoid :: Monoid Int
intMonoid = Monoid
    { mappend = (+)
    , mempty  = 0
    }

f :: Monoid m -> m -> m
f (Monoid p z) x = p x x

... and specializing a function to a particular type class just supplies the function with the appropriate record:

g :: Int -> Int
g = f

-- ... desugars to:
g = f intMonoid

Two-line do notation

A two-line do block desugars to the infix (>>=) operator:

do x <- m
   e

-- ... desugars to:
m >>= (\x ->
    e )

One-line do notation

For a one-line do block, you can just remove the do:

main = do putStrLn "Hello, world!"

-- ... desugars to:
main = putStrLn "Hello, world!"

Multi-line do notation

do notation of more than two lines is equivalent to multiple nested dos:

do x <- mx
   y <- my
   z

-- ... is equivalent to:
do x <- mx
   do y <- my
      z

-- ... desugars to:
mx >>= (\x ->
    my >>= (\y ->
        z ))

let in do notation

Non-recursive let in a do block desugars to a lambda:

do let x = y
   z

-- ... desugars to:
(\x -> z) y

ghci

The ghci interactive REPL is analogous to one big do block (with lots and lots of caveats):

$ ghci
>>> str <- getLine
>>> let str' = str ++ "!"
>>> putStrLn str'

-- ... is equivalent to the following Haskell program:
main = do
    str <- getLine
    let str' = str ++ "!"
    putStrLn str'

-- ... desugars to:
main = do
    str <- getLine
    do let str' = str ++ "!"
       putStrLn str'

-- ... desugars to:
main =
    getLine >>= (\str ->
        do let str' = str ++ "!"
           putStrLn str' )

-- ... desugars to:
main =
    getLine >>= (\str ->
        (\str' -> putStrLn str') (str ++ "!") )

-- ... reduces to:
main =
    getLine >>= (\str ->
        putStrLn (str ++ "!") )

List comprehensions

List comprehensions are equivalent to do notation:

[ (x, y) | x <- mx, y <- my ]

-- ... is equivalent to:

do x <- mx
   y <- my
   return (x, y)

-- ... desugars to:
mx >>= (\x -> my >>= \y -> return (x, y))

-- ... specialization to lists:
concatMap (\x -> concatMap (\y -> [(x, y)]) my) mx

The real desugared code is actually more efficient, but still equivalent.

The MonadComprehensions language extension generalizes list comprehension syntax to work with any Monad. For example, you can write an IO comprehension:

>>> :set -XMonadComprehensions
>>> [ (str1, str2) | str1 <- getLine, str2 <- getLine ]
Line1<Enter>
Line2<Enter>
("Line1", "Line2")

Numeric literals

Integer literals are polymorphic by default and desugar to a call to fromInteger on a concrete Integer:

1 :: Num a => a

-- desugars to:
fromInteger (1 :: Integer)

Floating point literals behave the same way, except they desugar to fromRational:

1.2 :: Fractional a => a

-- desugars to:
fromRational (1.2 :: Rational)

IO

You can think of IO and all foreign function calls as analogous to building up a syntax tree describing all planned side effects:

main = do
    str <- getLine
    putStrLn str
    return 1


-- ... is analogous to:
data IO r
    = PutStrLn String (IO r)
    | GetLine (String -> IO r)
    | Return r

instance Monad IO where
    (PutStrLn str io) >>= f = PutStrLn str (io >>= f)
    (GetLine k      ) >>= f = GetLine (\str -> k str >>= f)
    Return r          >>= f = f r

main = do
    str <- getLine
    putStrLn str
    return 1

-- ... desugars and reduces to:
main =
    GetLine (\str ->
    PutStrLn str (
    Return 1 ))

This mental model is actually very different from how IO is implemented under the hood, but it works well for building an initial intuition for IO.

For example, one intuition you can immediately draw from this mental model is that order of evaluation in Haskell has no effect on order of IO effects, since evaluating the syntax tree does not actually interpret or run the tree. The only way you can actually animate the syntax tree is to define it to be equal to main.
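
To make that last point concrete, here's a small, self-contained toy version of the idea (mine, with the type renamed to Console so it doesn't clash with the real IO): constructing the tree does nothing observable, and only a separate interpreter performs the effects. This is a sketch of the mental model, not how GHC implements IO.

data Console r
    = PutStrLn String (Console r)
    | GetLine (String -> Console r)
    | Return r

-- Building the tree is pure; nothing runs yet:
example :: Console Int
example = GetLine (\str -> PutStrLn str (Return 1))

-- Only interpreting the tree actually performs the effects:
run :: Console r -> IO r
run (PutStrLn str rest) = putStrLn str >> run rest
run (GetLine k)         = getLine >>= \str -> run (k str)
run (Return r)          = return r

main :: IO ()
main = run example >>= print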

Conclusion

I haven't covered everything, but hopefully that gives some idea of how Haskell translates higher level abstractions into lower-level abstractions. By keeping the core language small, Haskell can ensure that language features play nicely with each other.

Note that Haskell also has a rich ecosystem of experimental extensions, too. Many of these are experimental because each new extension must be vetted to understand how it interacts with existing language features.

by Gabriel Gonzalez ([email protected]) at October 28, 2014 03:24 AM

Danny Gratzer

The Guts of a Spineless Machine

Posted on October 28, 2014
Tags: haskell, c

It’s fairly well known that Haskell is a bit um.. different from how stock hardware sees the world. I’m not aware of too many processors that have decided that immutability and higher order functions are the right way to go.

Compiling Haskell and its ilk, however, does have one interesting wrinkle on top of the normal problem: laziness. Laziness stands completely at odds with how most everything else works. Moreover, whether or not you think it’s the right default, it’s an interesting question of how to efficiently compile some evaluation strategy other than call by value or name.

To this end, people have built a lot of abstract machines that lazy languages could target. These machines can be mapped easily to what the hardware wants and transitively, we can get our compiler. Most of these work by “graph reduction” (that’s the G in STG) and the latest incarnation of these graph machines is the spineless tagless graph machine which lies at the heart of GHC and a few other compilers.

In this post, I’d like to go over how exactly the STG machine actually works. Turns out it’s pretty cool!

Core Concepts

The basic idea behind a compiler intent on going the STG route is something like

  1. .. front end stuff ..
  2. Translate IL to STG language
  3. Compile STG language to C/ASM/LLVM/Javascript

In GHC's case I understand the pipeline is something like

  1. Parsing
  2. Type checking
  3. Desugaring + a few bobs and bits
  4. Translation to core
  5. Lion share of optimization
  6. Translation to STG language
  7. STG language to C--
  8. C-- to assembly or LLVM

We’re really concerned with parts 6 and 7 here. First things first, let’s lay out what’s exactly in the STG language. It’s a tiny functional language that looks a bit like Haskell or Core, with a few restrictions. A program is simply a series of bindings, much like Haskell. The top levels look something like

f = {x y z} flag {a b c} -> ...

You should read this for now as f = \a b c -> .... The first set of variables and the flag correspond to some stuff we’ll discuss later.

Inside the ... we can write most of what you would expect from Haskell. We have let[rec] bindings, case expressions, application, constructors, literals, and primitives. There is a caveat though: first of all, constructor applications must be fully saturated. This isn't unlike OCaml or something where you can't just treat a constructor as a function with an arbitrary name. We would write

\a -> Just a

instead of just Just. Another bit of trickiness: our language has no lambdas! So we can’t even write the above. Instead if we had something like

 map Just [1, 2, 3]

We’d have to write

 let f   = \a -> Just a
     l'' = 3 : nil
     l'  = 2 : l''
     l   = 1 : l'
 in map f l

The reason for the awkward l'' series is that we’re only allowed to apply constructors and functions to atoms (literals and variables).

One other noteworthy feature of STG is that we have primitive operations. They need to be fully saturated, just like constructors, but they work on unboxed things. For example there would probably be something like +# which adds two unboxed integers. To work with these we also have unboxed literals: 1#, 2#, and so on.
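
Just to connect this back to source Haskell (this example is mine, not part of STG itself), the same unboxed machinery is visible from ordinary GHC Haskell with the MagicHash extension:

{-# LANGUAGE MagicHash #-}

import GHC.Exts

-- I# boxes an unboxed Int#, and +# adds two raw machine integers.
addInts :: Int -> Int -> Int
addInts (I# x#) (I# y#) = I# (x# +# y#)

main :: IO ()
main = print (addInts 2 3)  -- prints 5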

Now, despite all these limitations placed on STG, it's still a pretty stinking high level language. There's letrec, higher order functions, and a lot of the normal stuff we'd expect in a functional language. This means it's not actually too hard to compile something like Haskell or Core to STG (I didn't say "compile efficiently").

As an example, let’s look at translating factorial into STG language. We start with

f :: Int -> Int
f i = case i of
  0 -> 1
  i -> i * (f (i - 1))

Now the first step is we change the binding form

f = {} n {i} -> ...

The case expression's clause can remain the same; we're already casing on an atom

case i of
  (MkInt i#) -> ...

Now comes the first big change, our boxed integers are going to get in the way here, so the case expression strips away the constructor leaving us with an unboxed integer. We can similarly refactor the body to make evaluation order explicit

 case i of
   MkInt i# ->
     case i# -# 1# of
       dec# ->
         let dec = \{dec#} u {} -> MkInt dec#
         in case f dec of
              MkInt rest# ->
                case i# * rest# of
                  result# -> MkInt result#

Notice how the case expressions here are used to make the evaluation of various expressions explicit and let was used to create a new thing to evaluate.

Now we can see what those extra {}’s were for. They notate the free variables for a thunk. Remember how we can have all sorts of closures and it can make for some really nice code? Well the machine doesn’t exactly support those natively. What we need to do is note the variables that we close over explicitly and then generate code that will store these free variables with the value that closes over them. This pair is more or less what is called a “closure” for the rest of this post.

Actually, I’ll sometimes use “thunk” as another descriptor for this pair. This is because closures in STG land do quite a lot! In particular, they are used to represent the fundamental unit of lazy code, not just closing over variables. We’ll have closures that actually don’t close over anything! This would be a bit strange, but each “thunk” in Haskell land is going to become a closure in STG-ville. The notion of forcing a thunk in Haskell is analogous to evaluating an STG closure and creating a thunk is creating a new closure. This is helpful to keep in mind as we examine the rest of the machine.

dec for example has a free variable dec# and it exists to box that result for the recursive call to factorial. We use case expressions to get evaluation. Most programs thus become chains of case’s and let alternating between creating thunks and actually doing work.

That u in between the {}’s in dec was also important. It’s the update flag. Remember how in Haskell we don’t want to force the same thunk twice. If I say

let i = 1 + 1 in i + i

We should only evaluate 1 + 1 once. That means that the thunk i will have to be mutated to not evaluate 1 + 1 twice. The update flag signifies the difference between thunks that we want to update and thunks that we don’t. For example, if we replaced the thunk for + with the first result it returned, we’d be mighty surprised. Suddenly 1 + 1 + 1 is just 2!

The u flag says “yes, I’m just a normal expression that should be updated” and the n flag says the opposite.
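
You can actually watch this updating from ordinary Haskell. In the little sketch below (mine, purely illustrative, and behaviour can vary under optimization), Debug.Trace only fires once because the thunk for i is overwritten with its value after the first force:

import Debug.Trace (trace)

shared :: Int
shared = let i = trace "forcing i" (1 + 1) in i + i

main :: IO ()
main = print shared  -- prints "forcing i" once, then 4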

That about wraps up our discussion of the STG language, let’s talk about how to implement it now.

Semantics

This language wouldn’t be much good if it didn’t lend itself to an easy implementation, indeed we find that the restrictions we placed upon the language prove to be invaluable for its compilation (almost like they were designed that way!).

In order to decide how best to implement it, we first define the formal semantics for our language, which operates on a tuple of 6 things:

  1. The code - the instruction we’re currently executing
  2. The argument stack - A stack of integers or pointers to closures
  3. The return stack - A stack of continuations
  4. The update stack - A stack of update frames
  5. The heap - A map from addresses to closures
  6. The environment - A map from names to addresses of toplevel closures

A code is more or less the current thing we’re attempting to do. It’s either

  1. Eval e p - evaluate an expression in an environment (p)
  2. Enter a - Enter a closure
  3. ReturnCon c ws - Return a constructor applied to some arguments
  4. ReturnInt - Return an integer

Now the idea is we’re going to “unroll” our computations into pushing things onto the continuation stack and entering closures. We start with the code Eval main {}. That is to say, we start by running main. Then if we’re looking at a case we do something really clever

 EVAL(case expr of {pat1 -> expr1; ...}, p) as rs us h o

becomes

EVAL (expr, p) as ({pat1 -> expr1; ...} : rs) us h o

That is to say, we just push the pattern matching on to the continuation stack and evaluate the expression.

At some point we’ll get to a “leaf” in our expression. That is random literal (a number) or constructor. At this point we make use of our continuation stack

EVAL (C ws, p) as ((...; c vs -> expr; ...) : rs) us h o
ReturnCon (C ws) as ((...; c vs -> expr; ...) : rs) us h o
EVAL (expr, p[vs -> ws]) as rs us h o

So our pattern matching is rolled into ReturnCon. ReturnCon will just look at the top of the return stack for a continuation which wants its constructor and evaluate that continuation’s expression, mapping the pattern’s variables to the constructor’s arguments.

The story is similar for literals

EVAL (Int i, p) as ((...; c vs -> expr; ...) : rs) us h o
ReturnInt i as ((...; i -> expr; ...) : rs) us h o
EVAL (expr, p) as rs us h o

Another phase is how we handle let’s and letrec’s. In this phase instead of dealing with continuations, we allocate more thunks onto the heap.

EVAL ((let x = {fs} f {xs} -> e; ... in expr), p) as rs us h o
EVAL (expr, p') as rs us h' o

So as we’d expect, evaluating a let expression does indeed go and evaluate the body of the let expression, but changes up the environment in which we evaluate them. We have

p' = p[x -> Addr a, ...]
h' = h[a -> ({fs} f {xs} -> e) p fs, ...]

In words “the new environment contains a binding for x to some address a. The heap is extended with an address a with a closure {fs} f {xs} -> ... where the free variables come from p”. The definition for letrec is identical except the free variables come from p' allowing for recursion.

So the STG machine allocates things in lets, adds continuations with case, and jumps to continuation on values.

Now we also have to figure out applications.

EVAL (f xs, p) as rs us h o
ENTER a (values of xs ++ as) rs us h o

where the value of f is Addr a. So we push all the arguments (remember they’re atoms and therefore trivial to evaluate) on to the argument stack and enter the closure of the function.

How do we actually enter a closure? Well we know that our closures are of the form

({fs} f {vs} -> expr) frees

If we have enough arguments to run the closure (the argument stack holds at least length vs arguments), then we can just EVAL expr [vs -> take (length vs) as, fs -> frees]. This might not be the case in something like Haskell though; we have partial application. So what do we do in this case?

What we want is to somehow get something that’s our closure but also knows about however many arguments we actually supplied it. Something like

({fs ++ supplied} f {notSupplied} -> expr) frees ++ as

where supplied ++ notSupplied = vs. This updating of a closure is half of what our update stack us is for. The other case is when we do actually enter the closure, but the flag is u, so we’re going to want to update it. If this is the case we add an update frame to the stack (as, rs, a) where as is the argument stack, rs is the return stack, and a is the closure which should be updated. Once we’ve pushed this frame, we promptly empty the argument stack and return stack.

We then add the following rules to the definition of ReturnCon

ReturnCon c ws {} {} (as, rs, a) : us h o
ReturnCon c ws as rs us h' o

where h' is the new heap that’s replaced our old closure at a with our new, spiffy, updated closure

h' = h[a -> ({vs} n {} -> c vs) ws]

So that’s what happens when we go to update a closure. But what about partial application?

Enter a as {} (asU, rs, aU) : us h o
Enter a (as ++ asU) rs us h' o

where

h a = ({vs} n {xs} -> expr) frees
h' = h [aU -> ((vs ++ bound) n xs -> e) (frees ++ as)]

This is a simplified rule from what’s actually used, but gives some intuition to what’s happening: we’re minting a new closure in which we use the arguments we’ve just bound and that’s what the result of our update is.

Compiling This

Now that we have some idea of how this is going to work, what does this actually become on the machine?

The original paper by SPJ suggests an “interpreter” approach to compilation. In other words, we actually almost directly map the semantics to C and call it compiled. There’s a catch though, we’d like to represent the body of closures as C functions since they’re well.. functions. However, since all we do is enter closures and jump around to things till the cows come home, it had damn well better be fast. C function calls aren’t built to be that fast. Instead the paper advocates a tiny trampolining-esque approach.

When something wants to enter a closure, it merely returns it and our main loop becomes

 while(1){cont = (*cont)();}

Which won’t stackoverflow. In reality, more underhanded tricks are applied to make the performance suck less, but for we’ll ignore such things.

In our compiled results there will be 2 stacks, not the 3 found in our abstract machine. In the first stack (the A-stack) there are pointer things, and the B-stack has non-pointers. These are monitored by two variables/registers, SpA and SpB, which keep track of the heights of the two stacks. Then compilation becomes reasonably straightforward.

An application pushes the arguments onto appropriate stacks, adjusts Sp*, and enters the function. A let block allocates each of the bound variables, then the body. Entering a closure simply jumps to the closure’s code pointer. This is actually quite nifty. All the work of figuring out exactly what Enter will do (updates, continuation jiggering) is left to the closure itself.

A case expression is a bit more complicated since a continuation’s representation involves boxing up the local environment for each branch. Once that’s bundled away, we represent a continuation as a simple code pointer. It is in charge of scrutinizing the argument stack and selecting an alternative and then running the appropriate code. This is a lot of work, and, unless I’m crazy, we’ll need two types of bound variables for each branch (really just ptr/non-ptr). The selection of an alternative would be represented as a C switch, letting all sorts of trickery with jump tables be done by the C compiler.

In order to return a value, we do something clever. We take a constructor and point a global variable at its constructor closure, containing its values and jump to the continuation. The continuation can then peek and poke at this global variable to bind things as needed for the alternatives. There is potentially a massive speedup by returning through registers, but this is dangerously close to work.

From here, primitive operations can be compiled to statements/instructions in whatever environment we’re targeting. In C for example we’d just use the normal + to add our unboxed integers.

The last beast to slay is updates. We represent update frames by pointers to argument stacks and a pointer to a closure. That means that the act of updating is merely saving Sp* in an update frame, clobbering them, and then jumping into the appropriate closure. We push the update frame onto stack B and keep on going.

I realize that this is a glancing overview and I’m eliding a lot of the tricky details, but hopefully this is sufficient to understand a bit about what’s going on at an intuitive level.

Wrap Up

So now that you’ve put all the effort to get through this post, I get to tell you it’s all lies! In reality GHC has applied all manner of tricks and hacks to get fast performance out of the STG model. To be honest I’m not sure where I should point to that explains these tricks because well… I have no idea what they are.

I can point to

If you have any suggestions for other links I’d love to add them!

Thanks to Chris Ganas for proofreading


October 28, 2014 12:00 AM

Christopher Done

Fast pagination on PostgreSQL

During the implementation of IRCBrowse I discovered that Postgres’s built-in offset is not very fast.

Here are the characteristics of my data:

ircbrowse=> \d event
                                    Table "public.event"
  Column   |           Type           |
-----------+--------------------------+
 timestamp | timestamp with time zone |
 type      | text                     |
 nick      | text                     |
 text      | text                     |
 network   | integer                  |
 channel   | integer                  |
 id        | bigint                   |
Indexes:
    "event_unique" UNIQUE CONSTRAINT,
    btree (network, channel, "timestamp", nick, type, text)
    "event_unique_id" UNIQUE CONSTRAINT, btree (id)
    "event_channel_idx" btree (channel)
    "event_nick_idx" btree (nick)
    "event_timestamp_idx" btree ("timestamp")
    "event_type_idx" btree (type)

And the size:

ircbrowse=> select count(*) from event;
  count
----------
 28673917

Channel 1 is the biggest:

ircbrowse=> select count(*) from event where channel = 1;
  count
----------
 19340467

When you’re working with data on this scale (large, but not “big data”), PostgreSQL handles it beautifully. But the speed of OFFSET/LIMIT is not great:

ircbrowse=> explain analyze select * from event where channel = 1
                            order by id offset 500000 limit 30;
QUERY PLAN
Limit  (cost=5648.81..5818.28 rows=30 width=85)
       (actual time=0.301..0.309 rows=30 loops=1)
   ->  Index Scan using event_unique_id on event
   (cost=0.00..81914674.39 rows=14501220 width=85)
   (actual time=0.020..0.288 rows=1030 loops=1)
         Filter: (channel = 1)

I think that this index scan is simply too expensive. Notice that I’m ordering by id which has a unique btree index on it. Check out the speed:

ircbrowse=> select * from event where channel = 1
            order by id offset 1000 limit 30;
Time: 0.721 ms
ircbrowse=> select * from event where channel = 1
            order by id offset 500000 limit 30;
Time: 191.926 ms

You might think less than a second to sift through 500,000 rows of a 28 million row table is pretty good, but I think it sucks. It’s also deceptive. Let’s increase it to 1,000,000 rows (of 19,000,000):

ircbrowse=> select * from event where channel = 1
            order by id offset 1000000 limit 30;
Time: 35022.464 ms

This is getting worse and worse! It’s probably linear in its poor performance.

However, there’s a solution. Use an index table. A separate table which contains foreign keys pointing to this table:

ircbrowse=> \d event_order_index
Table "public.event_order_index"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer | not null
 origin | integer | not null
 idx    | integer | not null
Indexes:
    "event_order_id_origin" UNIQUE CONSTRAINT, btree (id, origin)
    "event_order_idx" btree (id)
    "event_order_idx_idx" btree (idx)
    "event_order_origin_dx" btree (origin)

Now you can have a pagination index for channel 1:

ircbrowse=> select * from event_order_index where idx = 1000 limit 1;
 id | origin | idx
----+--------+------
  1 |      1 | 1000

(I used idx=1000 for channel 1, 2000 for channel 2, etc. so that I would have space for other numerical indexes for the same channel.)

Now you can make a very efficient query for the same data as above:

ircbrowse=> SELECT idx.id,e.timestamp,e.network,e.channel,
ircbrowse=> e.type,e.nick,e.text FROM event e,
ircbrowse-> event_order_index idx
ircbrowse-> WHERE e.id = idx.origin and idx.idx = 1000 and
ircbrowse=> idx.id > 1000000 and idx.id < 1000030
ircbrowse-> ORDER BY e.id asc
ircbrowse-> LIMIT 30;
Time: 1.001 ms

This is more or less constant time.
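
For Haskell clients, here’s a minimal sketch of issuing that parameterized query with postgresql-simple (the helper name and column choice are mine, and I’m assuming the selected columns are non-null):

{-# LANGUAGE OverloadedStrings #-}

import Database.PostgreSQL.Simple

-- Fetch one 30-row page for pagination index 1000 (channel 1),
-- starting just after position `start` in that channel's ordering.
fetchPage :: Connection -> Int -> IO [(Int, String, String)]
fetchPage conn start = query conn
    "SELECT e.id, e.nick, e.text \
    \FROM event e, event_order_index idx \
    \WHERE e.id = idx.origin AND idx.idx = 1000 \
    \AND idx.id > ? AND idx.id < ? \
    \ORDER BY e.id ASC LIMIT 30"
    (start, start + 30)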

And you can see this in action on the site. Takes about 30ms to load and render the page if I run this on the server:

$ time curl 'http://ircbrowse.net/browse/haskell?events_page=234'

real	0m0.031s
user	0m0.000s
sys     0m0.004s

Of course, sending a request in your browser will take longer due to the connection overhead and assets, but generally the goal was for it to be very snappy. The old ircbrowse.com (by another individual, who kindly let me have the name) was very slow indeed. You’d see the page loading the data incrementally from the database.

Anyhoo, thought that was a decent, practical PostgreSQL-specific optimization regarding pagination. Hope it was worth writing up.

October 28, 2014 12:00 AM

October 27, 2014

Danny Gratzer

Notes on Focusing

Posted on October 27, 2014
Tags: types

I’ve been spending a lot of time whacking my head on focusing literature. I’d like to jot down some intuition around what a focused system is and how it relates to the rest of the world. I’m going to steer clear of actually proving things but I will point out where a proof would be needed.

What Is Focusing

In a nutshell, focusing is a strategy to create proofs that minimizes the amount of choices available at each step. Focusing is thus amenable to mechanization since a computer is very good at applying a lot of deterministic procedures but incredibly bad at nondeterministic choice.

Now when we set out to define a focused system we usually do something like

  1. Formalize our logical framework with natural deduction
  2. Translate our framework into a sequent calculus
  3. Transform our sequent calculus into a focused variant

At each of these steps there’s a proof that says something like “System 2 is sound and complete with respect to System 1”. We can then chain these proofs together to get that we can transform any nonfocused proof into a focused one (focalization) and the reverse (de-focalization).

In order to actually carry out these proofs there’s a fair amount of work and pain. Usually we’ll need something like cut elimination and/or identity expansion.

Groundwork

Now before we go on to define an example logic, let’s notice a few things. First off, in sequent calculus there are left and right rules. Left rules decompose known facts into other known facts while right rules transform our goal. There’s also an identity sequent which more or less just states

 A is an atom
 —————————————
   Γ, A → A

This is a bit boring though.

Now certain rules are invertible: their conclusion implies their premise in addition to the reverse. For example, if I say you must prove A ∧ B, clearly we’ll have to prove both A and B; there’s no alternative set of rule applications that lets us circumvent proving A and B.

This means that if we were mechanically trying to prove something of the form A ∧ B we can immediately apply the right rule that decomposes into 2 goals.

We call these sorts of rules invertible or asynchronous. Dually, there are rules that, when applied, can transform our goal into something impossible to prove. Consider ⊥ ∨ ⊤: clearly applying the rule that transforms this into ⊥ would be a bad idea!

Now if we begin classifying all the left and right rules, we’ll notice that they tend to fall into 2 categories

  • Things with invertible left rules and noninvertible right rules
  • Things with noninvertible left rules and invertible right rules

We dub the first group “positive” things and the second “negative” things. This is called polarization and isn’t strictly necessary but greatly simplifies a lot of our system.

Now there are a few things that could be considered both positive and negative. For example we can consider as positive with

  Γ → A⁺  Γ → B⁺
 ———————————————
   Γ → A⁺ ∧ B⁺

   Γ, A⁺, B⁺ → C
 —————————————————
   Γ, A⁺ ∧ B⁺ → C

In this case, the key determiner for the polarity of ∧ comes from its subcomponents. We can just treat ∧ as positive along with its subcomponents and with an appropriate dual ∧⁻, our proof system will still be complete.

As a quick example, implication is negative. The right rule

 Γ, A → B
——————————
Γ → A ⊃ B

While its left rule isn’t

 Γ, A ⊃ B → A  Γ, B, A ⊃ B → C
 ——————————————————————————————
         Γ, A ⊃ B → C

Since we could easily have something like ⊥ ⊃ ⊤, but using this rule would entail (heh) proving ⊥! Urk. If our system applied this rule remorselessly, we’d quickly end up in a divergent proof search.

An Actual Focused System

Do note that these typing rules are straight out of Rob Simmons’ paper, linked below

Now that we’ve actually seen some examples of invertible rules and polarized connectives, let’s see how this all fits into a coherent system. There is one critical change we must make to the structure of our judgments: an addition to the form _ → _. Instead of just an unordered multiset on the left, in order to properly do inversion we change this to Γ; Ω ⊢ A where Ω is an ordered list of propositions we intend to focus on.

Furthermore, since we’re dealing with a polarized calculus, we occasionally want to view positive things as negative and vice versa. For this we have shifts, ↓ and ↑. When we’re focusing on some proposition and we reach a shift, we pop out of the focused portion of our judgment.

Our system is broken up into 3 essentially separate judgments. In the first judgment we basically apply as many invertible rules in as many places as we can.

 Γ, A⁻; Q ⊢ U
——————————————
Γ; ↓A⁻, Q ⊢ U

Γ; A⁺, Ω ⊢ U  Γ; B⁺, Ω ⊢ U
———————————————————————————
    Γ; A⁺ ∨ B⁺, Ω ⊢ U

  Γ; A⁺, B⁺, Ω ⊢ U
————————————————————
  Γ; A⁺ ∧ B⁺, Ω ⊢ U

——————————————
 Γ; ⊥, Ω ⊢ U

We first look at how to break down Ω into simpler forms. The idea is that we’re going to keep going till there’s nothing left in Ω. Ω can only contain positive propositions so eventually we’ll decompose everything to shifts (which we move into Γ), ⊤⁺ (which we just drop on the floor), or ⊥ (which means we’re done). These are all invertible rules so we can safely apply them eagerly and we won’t change the provability of our goal.

Once we’ve moved everything out of Ω we can make a choice. If U is “stable” meaning that we can’t break it down further easily, we can pick a something negative out of our context and focus on it

   Γ; [A⁻] ⊢ U
  ————————————–
  Γ, A⁻; • ⊢ U

This pops us into the next judgment in our calculus. However, if U is not stable, then we have to decompose it further as well.

  Γ; • ⊢ A⁺
——————————————
  Γ; • ⊢ ↑ A⁺

———————————
 Γ; • ⊢ ⊤⁻

  Γ; A⁺ ⊢ B⁻
—————————————
Γ; • ⊢ A⁺ ⊃ B⁻

Γ; • ⊢ A⁻   Γ; • ⊢ B⁻
—————————————————————
   Γ; • ⊢ A⁻ ∧ B⁻

If we have a negative connective at the top level we can decompose that further, leaving us with a strictly smaller goal. Finally, we may reach a positive proposition with nothing in Ω. In this case we focus on the right.

  Γ ⊢ [A⁺]
———————————
 Γ; • ⊢ A⁺

Now we’re in a position to discuss these two focused judgments. If we focus on the right we decompose positive connectives

——————————
 Γ ⊢ [⊤⁺]

Γ; • ⊢ A⁻
—————————
Γ ⊢ ↓ A⁻

   Γ ⊢ [A⁺]
—————————————
 Γ ⊢ [A⁺ ∨ B⁺]

   Γ ⊢ [B⁺]
—————————————
 Γ ⊢ [A⁺ ∨ B⁺]

Γ ⊢ [A⁺]   Γ ⊢ [B⁺]
———————————————————
   Γ ⊢ [A⁺ ∧ B⁺]

These judgments follow the ones we’ve already seen. If we encounter a shift, we stop focusing. Otherwise we decompose the topmost positive connective. Now looking at these, you should see that sometimes these rules will lead us to a “mistake”. Imagine if we applied the 4th rule to ⊤ ∨ ⊥! This is why these rules are segregated into a separate judgment.

In this judgment’s dual we essentially apply the exact same rules to the left of the turnstile and on negative connectives.

  Γ; A⁺ ⊢ U
————————————
Γ; [↑A⁺] ⊢ U

Γ ⊢ [A⁺]   Γ; [B⁻] ⊢ U
——————————————————————
  Γ; [A⁺ ⊃ B⁻] ⊢ U

   Γ; [A⁻] ⊢ U
—————————————————
 Γ; [A⁻ ∧ B⁻] ⊢ U

   Γ; [B⁻] ⊢ U
—————————————————
 Γ; [A⁻ ∧ B⁻] ⊢ U

That wraps up our focused system. The idea is now we have this much more limited system which can express the same things our original, unfocused system could. A computer can be easily programmed to do a focused search since there’s much less backtracking everywhere leading to fewer rules being applicable at each step. I think Pfenning has referred to this as removing most of the “don’t-care” nondeterminism from our rules.

Wrap Up

I’m going to wrap up the post here. Proving focalization or even something like cut elimination is quite fiddly and I have no desire at all to try to transcribe it (painfully) into markdown and get it wrong in the process.

Instead, now that you have some idea of what focusing is about, go read Rob Simmons’ paper. It provides a clear account of everything necessary to prove that a focused system is complete and sound with respect to its unfocused counterpart.

Cheers


October 27, 2014 12:00 AM

October 24, 2014

The GHC Team

GHC Weekly News - 2014/10/24

Hi *,

Welcome to the weekly GHC news. This one will be short this week, as the preceding one occurred only on Monday - but we'll be going with Fridays from now on, so next week we'll hopefully see a longer list.

  • GHC 7.8.4 tickets have been in waiting, and the RC will follow soon after Austin finishes some final merges and tests on his branch. We have not committed to a time for the release after the RC yet, but we would like people to please seriously test it and immediately report any major showstoppers - or alert us of ones we missed.
  • For the GHC 7.10 release, one of the major features we planned to try and merge was DWARF debugging information. This is actually a small component of larger ongoing work, including adding stack traces to Haskell executables. While, unfortunately, not all the work can be merged, we talked with Peter, and made a plan: our hope is to get Phab:D169 merged, which lays all the groundwork, followed by DWARF debugging information in the code generators. This will allow tools like gdb or other extensible debuggers to analyze C-- IR accurately for compiled executables. Peter has written up a wiki page, available at SourceNotes, describing the design. We hope to land all the core infrastructure in Phab:D169 soon, followed by DWARF information for the Native Code Generator, all for 7.10.1
  • This past week, a discussion sort of organically started on the #ghc IRC channel about the future of the LLVM backend. GHC's backend is buggy, has no control over LLVM versions, and breaks frequently with new versions. This all significantly impacts users, and relegates the backend to a second class citizen. After some discussion, Austin wrote up a proposal for an improved backend, and wrangled several other people to help. The current plan is to try and execute this by GHC 7.12, with the goal of making the LLVM backend Tier 1 for major supported platforms.
  • You may notice https://ghc.haskell.org now responds slightly faster in some cases - we've activated a caching layer (CloudFlare) on the site, so hopefully things should be a little more smooth.

Closed tickets this week: #9684, #9692, #9038, #9679, #9537, #1473.

by thoughtpolice at October 24, 2014 11:43 PM

Danny Gratzer

Update on Old Projects

Posted on October 24, 2014
Tags: haskell, personal

Although most people I talk to know me for my blog, I do occasionally actually write software instead of just talking about it :)

Sadly, as a mercurial user, most of my stuff has languished on bitbucket. I’ve had a few people tell me that this is annoying for various reasons. Yesterday, I finally got around to fixing that!

As of yesterday, all of my interesting projects are mirrored on github. I’m still using mercurial, but thanks to the lovely git-hg tool this is not an issue. You can fork, pull-request, or generally peek and poke as you please. From my end all of these actions look like nice mercurial changesets, so I can continue to live under my rock where I don’t need to understand Git.

As a quick list of what haskell code is up there now

Which I think includes every project I’ve blogged about here as well as a few others. Sorry it took so long!


October 24, 2014 12:00 AM

October 21, 2014

Tom Schrijvers

Postdoc Position in Functional and Constraint Programming

Postdoctoral position in Functional and Constraint Programming at KU Leuven

The Declarative Languages and Artificial Intelligence (DTAI) group of KU Leuven (Belgium) invites applicants for a postdoctoral position in the area of functional and constraint programming. The position revolves around domain-specific languages (DSLs) embedded in Haskell for constraint programming. It is part of the EU project GRACeFUL whose overarching theme is tools for collective decision making. The KU Leuven part of the project is under the direction of prof. Tom Schrijvers.

To apply you must hold a recent PhD (or be about to graduate) related to either functional or constraint programming. Experience in both areas is an advantage.

You will work closely with prof. Schrijvers and his PhD students at KU Leuven, as well as with the GRACeFUL project partners across Europe.

The position is for 3 years. The salary is competitive and the starting date negotiable (but no later than February 1). Moreover, KU Leuven's policy of equal opportunities and diversity applies to this position.

Application procedure: http://people.cs.kuleuven.be/~tom.schrijvers/postdocposition2.html

by Tom Schrijvers ([email protected]) at October 21, 2014 08:08 AM

wren gayle romano

Upcoming talk

For all you local folks, I'll be giving a talk about my dissertation on November 5th at 4:00–5:00 in Ballantine Hall 011. For those who've heard me give talks about it before, not much has changed since NLCS 2013. But the majority of current CL/NLP, PL, and logic folks haven't seen the talk, so do feel free to stop by.

Abstract: Many natural languages allow scrambling of constituents, or so-called "free word order". However, most syntactic formalisms are designed for English first and foremost. They assume that word order is rigidly fixed, and consequently these formalisms cannot handle languages like Latin, German, Russian, or Japanese. In this talk I introduce a new calculus —the chiastic lambda-calculus— which allows us to capture both the freedoms and the restrictions of constituent scrambling in Japanese. In addition to capturing these syntactic facts about free word order, the chiastic lambda-calculus also captures semantic issues that arise in Japanese verbal morphology. Moreover, chiastic lambda-calculus can be used to capture numerous non-linguistic phenomena, such as: justifying notational shorthands in category theory, providing a strong type theory for programming languages with keyword-arguments, and exploring metatheoretical issues around the duality between procedures and values.

Edit 2014.11.05: The slides from the talk are now up.




October 21, 2014 02:39 AM

October 20, 2014

Magnus Therning

Dijkstra quotes from EWD1284

I recently read through this long post entitled Object Oriented Programming is an expensive disaster which must end. I have to admit I largely agree with what he writes, but I don’t think I ever could have written such a well-researched article, and absolutely never one of equal length ;)

It does include some nice quotes and references, and so far I’ve only read one of the many that I bookmarked, Computing Science: Achievements and Challenges (EWD1284). It does include a few quips that, based on other articles I’ve read, seem fairly typical of Dijkstra. I simply love the way he expressed his opinions at times.

This one really ought to have been in the lengthy post on the OOP disaster:

After more than 45 years in the field, I am still convinced that in computing, elegance is not a dispensable luxury but a quality that decides between success and failure; in this connection I gratefully quote from The Concise Oxford Dictionary a definition of “elegant”, viz. “ingeniously simple and effective”. Amen. (For those who have wondered: I don’t think object-oriented programming is a structuring paradigm that meets my standards of elegance.)

And his thoughts on formal methods are well-known of course, as are his thoughts on iterative design. However, I rather like how he expresses a certain level of disgust with the Anglo-Saxon world when writing about those topics:

The idea of a formal design discipline is often rejected on account of vague cultural/philosophical condemnations such as “stifling creativity”; this is more pronounced in the Anglo-Saxon world where a romantic vision of “the humanities” in fact idealizes technical incompetence. Another aspect of that same trait is the cult of iterative design.

It amazes me every time I read something by someone like Dijkstra, just how much things stay the same, even in a field like computing science, which is said to be moving so fast.

October 20, 2014 12:00 AM

FP Complete

New Stackage features

We have two new updates to Stackage: providing cabal.config files and including Haddock documentation.

Haddock documentation on snapshots

Now all new exclusive snapshots will have haddock links, which you can access via the following steps:

That link will be to an index page like this from which you can view documentation of all packages included in the snapshot. This means you can generally view everything in one place, on one high availability service.

Using Stackage without changing your repo

The recommended way to use Stackage is to simply change your remote-repo field in your .cabal/config file and run cabal update. Henceforth your dependencies will be resolved from Stackage, which is backed by high availability Amazon S3 storage servers, and you will have successful build plans, compilation and passing tests. Hurrah!

Try Haskell and the upcoming Haskell.org homepage were both developed with Stackage. This meant I could just specify the Stackage snapshot to use in the README and then the next time I want to upgrade I can just update the snapshot version to get a fresh working build plan.

The issue some people are facing is that they cannot change this remote-repo field, either because they're using a cabal sandbox, which doesn't support this yet, or because they just don't want to.

The solution to this, in my experience, has been for me to manually go and run cabal freeze in a project I've already built to get the cabal.config file and then give these people that file.

Now, it's automated via a cabal.config link on snapshot pages:

New developers working on an application who want to use Stackage can do something like this:

$ cabal sandbox init
$ curl http://www.stackage.org/<stackage version>/cabal.config > cabal.config
$ cabal install --only-dependencies

This will install their dependencies from Hackage, constrained to the snapshot's versions. We can't guarantee this will always work -- Stackage snapshots sometimes include a manual patch to a package to make it build properly with the others -- but otherwise it's as good as using Stackage as a repo.

A cabal freeze output in cabal.config will contain base and similar packages which are tied to the minor GHC version (e.g. GHC 7.8.2 and GHC 7.8.3 ship different base versions), so if you get a cabal.config and then hit a dependency error about base, you probably need to open up the cabal.config and remove the line with the base constraint. Stackage snapshots used as repos do not have this constraint (they will use whichever base your GHC major version uses).

Another difference is that cabal.config is more like an “inclusive” Stackage snapshot -- it includes packages not known to build together, unlike “exclusive” snapshots which only contain packages known to build and pass tests together. Ideally every package you need to use (directly or indirectly) will come from an exclusive snapshot. If not, it's recommended that you submit the package name to Stackage, and otherwise inclusive snapshots or cabal.config are the fallbacks you have at your disposal.

October 20, 2014 12:00 AM

October 19, 2014

Neil Mitchell

HLint now spots bad unsafePerformIO

Summary: I've just released a new version of HLint that can spot an unsafePerformIO which should have NOINLINE but doesn't.

I've just released HLint v1.9.10, a tool to suggest improvements to Haskell code. I don't usually bother with release announcements of HLint, as each version just fixes a few things and adds a few hints, it's all in the changelog (plus there have now been 102 releases). The latest release attempts to make using unsafePerformIO a little bit safer. A common idiom for top-level mutable state in Haskell is:

myGlobalVar :: IORef Int
myGlobalVar = unsafePerformIO (newIORef 17)

That is, define a top-level CAF (function with no variables) and bind it to unsafePerformIO of some created mutable state. But the definition above is unsafe. GHC might decide myGlobalVar is cheap and inline it into several uses, duplicating the variable so that some functions update different versions. Running this code through the latest version of HLint we see:

Sample.hs:2:1: Error: Missing NOINLINE pragma
Found:
myGlobalVar = unsafePerformIO (newIORef 17)
Why not:
{-# NOINLINE myGlobalVar #-}
myGlobalVar = unsafePerformIO (newIORef 17)

HLint has spotted the problem, and suggested the correct fix.
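For reference, here is the fixed idiom as a self-contained fragment, including the imports the snippet above leaves implicit:

import Data.IORef       (IORef, newIORef)
import System.IO.Unsafe (unsafePerformIO)

-- The pragma stops GHC from inlining (and thereby duplicating) the IORef.
{-# NOINLINE myGlobalVar #-}
myGlobalVar :: IORef Int
myGlobalVar = unsafePerformIO (newIORef 17)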

Trying it for real

Let's take the package slave-thread-0.1.0 and run HLint on it. Slave thread is a new package that helps you ensure you don't end up with ghost threads or exceptions being thrown but missed - a useful package. Running HLint we see:

Sample.hs:19:1: Error: Missing NOINLINE pragma
Found:
slaves = unsafePerformIO $ Multimap.newIO
Why not:
{-# NOINLINE slaves #-}
slaves = unsafePerformIO $ Multimap.newIO

Sample.hs:20:3: Warning: Redundant $
Found:
unsafePerformIO $ Multimap.newIO
Why not:
unsafePerformIO Multimap.newIO

Sample.hs:25:1: Error: Eta reduce
Found:
fork main = forkFinally (return ()) main
Why not:
fork = forkFinally (return ())

Sample.hs:55:28: Warning: Redundant $
Found:
PartialHandler.totalizeRethrowingTo_ masterThread $ mempty
Why not:
PartialHandler.totalizeRethrowingTo_ masterThread mempty

Sample.hs:72:5: Error: Use unless
Found:
if null then return () else retry
Why not:
Control.Monad.unless null retry

HLint has managed to spot the missing NOINLINE pragma, and also a handful of tiny cleanups that might make the code a little more readable. Fortunately (and independent of HLint), the NOINLINE pragma was added in slave-thread-0.1.1, so the library no longer suffers from that bug.

by Neil Mitchell ([email protected]) at October 19, 2014 09:23 PM

Matthew Sackman

Anonymity in public life

In an article published in the Guardian yesterday, author Kathleen Hale recounts how her first book got some negative reviews from reviewers on a book review website. One reviewer in particular upset her, and Kathleen ends up figuring out that the reviewer is using a false identity, finds out who the reviewer really is, and confronts her. The piece doesn't read to me like some sort of valedictory "I outed a fraud" type piece (though there are some passages in there which are questionable in that direction), and equally there are several passages where Kathleen expresses deep embarrassment and regret for the course of action she took. This episode, and that article in particular, has caused a substantial reaction: currently 600 comments on the Guardian article plus several other blog posts. There's no shortage of opinion to be found on Twitter either, as you'd expect.

The course of action that Kathleen took seems to be fairly undisputed as far as I can find. There is some dispute from some of the other blog posts as to exactly what was tweeted and said by whom, and there is dispute over Kathleen's claim that there are factual inaccuracies made in a review of her book. It is not disputed that the reviewer was using a false identity and that the reviewer had at least public Twitter, Facebook, and Instagram accounts under the false identity. The false identity was also a real name (Blythe Harris), by which I mean a name which if you introduced yourself by that name, no one would think you're using a false identity. This is distinct from claiming to be Peter Rabbit, or Buzz Lightyear.

Many people have equated Kathleen's actions with stalking. My dictionary defines the verb to stalk as:

  1. to follow or approach (game, prey, etc.) stealthily and quietly
  2. to pursue persistently and, sometimes, attack (a person with whom one is obsessed, often a celebrity)
  3., 4., ... [not relevant]

The second item there certainly fits. The British legal approach, whilst it gives no strict definition, gives examples and guidance:

....following a person, watching or spying on them or forcing contact with the victim through any means, including social media.

The effect of such behaviour is to curtail a victim's freedom, leaving them feeling that they constantly have to be careful. In many cases, the conduct might appear innocent (if it were to be taken in isolation), but when carried out repeatedly so as to amount to a course of conduct, it may then cause significant alarm, harassment or distress to the victim.

I'm glad it includes "social media" there. Some comments have suggested that stalking "in real life" is worse than online. This seems bizarre to me: as if through a computer you are not interacting with other human beings but merely with shiny pixels who have no emotional capacity. "In real life" is everything we know. Whilst we're alive we have no personal experience of anything other than "in real life".

So I'm fairly sold on the whole argument that Kathleen's behaviour towards this reviewer can be considered stalking and as such is reprehensible.

To me, the far more interesting issue is the use of anonymity, false identities and any realistic expectation we have of privacy on the internet. A number of people who claim to write book reviews on such sites have suggested that the behaviour of Kathleen is exactly why they write their reviews under false names. I think there's something of a contradiction going on here.

But let's work backwards. Firstly, Kathleen, through some social engineering (she requested from the book review site the address of the reviewer so that she could post her a copy of the book), got the address of the book reviewer. She then used a telephone directory and census results to identify who really lived there (or likely owned the land). Now the use of the telephone directory seems a bit odd to me: telephone directories map names to numbers (and maybe addresses). Yes, you could use it to map an address to a name, but it's very inefficient: you're essentially searching through the whole directory looking for the address whilst the directory is sorted by name, not address. So unless it was a very small telephone directory, I don't really buy that. Using census results is far more credible: they're public documents and when they're online, they do allow you to search by address. In the UK you can only get access to the raw census details 100 years after the census has been published, which, to a high probability, rules it out as a means to tie an address to a person who's still alive. You can get statistics and aggregates from more recent census results but you can't get the raw data. I'm assuming that in the US there's no such restriction on access to raw census data. If there is then I don't understand how Kathleen really managed to get a name for the owner of the property.

Instead, in the UK, if you want to find out who owns some land, you can pay the land registry £3 and they'll tell you. Presumably there are means by which you can legally hide this; I'm sure the rich have figured this out - probably some method by which some fake company in a tax haven technically "owns" the land and as they're registered abroad, they don't have to divulge any further details about that company. So yes, you could argue the Land Registry is profiting from facilitating stalkers, but equally there are a bunch of legitimate reasons to need access to such data and I can't think of any sane way to ensure the use of such a service isn't abused. So from that I conclude that unless the owner is a millionaire, the owner of any land is public knowledge.

The use of social engineering to get the address in the first place is more interesting but very obvious. This sort of thing happens a lot and sometimes to horrifying consequences (e.g. the Australian DJs who phoned up a hospital, pretending to be the Queen and Prince of Wales, enquiring as to the health of the Duchess of Cambridge. The nurse fell for the hoax and put the call through. Three days later, the nurse committed suicide). As a species we are not good at taking the time to verify who we're talking to or why. Whilst (hopefully) most of us would hang up if our bank apparently rang us and then asked for our credit card details "for security" this is largely only because it's in the bank's interest (in terms of cost of insurance) to reduce fraud, so they've trained us as such. But in all sorts of other scenarios we implicitly trust people we've no real reason to. A simple example: ticket inspectors on public transport. They may be wearing the uniform, but it could be faked. With their travel-card readers they could be seeing who has the expensive yearly travel cards, scanning the unique numbers from them and then using them to program up fraudulent cards. The crypto on those things is notoriously weak. Has anyone ever requested some means to verify the identity of a ticket inspector? And even if you could, how do you know they're not crooked regardless?

So phoning someone up, impersonating someone else, or pretending to have valid reasons to request the information you're requesting is always likely to work. It might be illegal in some cases, but it's certainly human nature to try to be helpful and if you're given a plausible justification, on what basis could you refuse the request unless it's contrary to some sort of company policy? In this case, if you're concerned about anonymity, wouldn't you be concerned about this possibility, and make use of an anonymous mail box?

Article 8 of the European Convention on Human Rights guarantees an individual's right to respect for privacy and family life, including correspondence. Is privacy the same as anonymity? No, definitely not:

In conflating anonymity and privacy, we have failed to see an important factual difference between them: under the condition of privacy, we have knowledge of a person’s identity, but not of an associated personal fact; whereas under the condition of anonymity, we have knowledge of a personal fact, but not of the associated person’s identity

The vast violations of our lives by state surveillance revealed by Snowden over the last year demonstrate the wholesale collation of everything we do online and off by our governments. This is both being able to observe an action and identify the individual who caused it (thus we have no hope of performing any action anonymously), and being able to observe an individual and know the actions they take (thus no privacy). I can't work out whether the ECHR has anything to say on a right to anonymity; I get the sense that it doesn't try to protect that. So that's basically saying: "the state shouldn't record your every move (as that's an invasion of privacy), but moves that we're interested in, we can know who did them". Of course, we now know they're recording everything anyway.

We also know that computer systems can always be hacked into - there is no real security anywhere. Given a skilled and sufficiently funded adversary, any computer system connected in any way to the internet can be hacked into. Why? Because humans wrote the software that runs on those computers and humans are incapable of writing bug-free software. Look at all the large scale data breaches in recent history. Nothing is secure.

So we have laws that seem to try and protect privacy, but they're violated by our own governments, and in any case, we have countless examples of our inability to store any information securely. So is there really any hope to be able to exist with anonymity on the internet?

As ever, it depends who your adversary is. If your adversary is a government (either your own or some foreign government) then no, you have no hope. If it's a previous partner of yours who has no particular computer training, then yes, you're probably going to have a reasonable chance of being anonymous for a while. But you need to read up on this and think hard: it's not a trivial undertaking. There are some good guides as to how to do this, but:

All writers - whether writing under their own names or not - should be aware of the risks they may incur by hitting 'publish'.

What is the effect of hitting "publish"? It's to put more data points out there which may lead people to be able to identify you. The fewer data points out there, the better. So coming back to our book reviewer, if you want to review books anonymously, and if your justification for acting anonymously is to avoid being stalked by authors who don't like your reviews, then why put so many data points out there? Why have the Facebook page, the Instagram profile with the faked photos, the Twitter account? Why give your real postal address to the book review club knowing they're going to post books to it and might conceivably give your address out to other people?

The social media accounts in particular I find most odd. If you want to review books then review books. Build your following, your reputation and cachet on the quality of your reviews. If I'm looking at a book review I really don't care where you went on holiday, what your tweets are, or how many pets you have. Putting that information out there undermines your entire justification for being anonymous: if you want to be anonymous (i.e. you don't want people to find out who you are) then why are you putting so much unnecessary information out there that may allow people to figure out who you are?

Equally, use a name that clearly communicates to me you're trying to be anonymous: call yourself TheBookReviewer53, DostoyevskyLover or OrwellWasRight. Doing so doesn't lessen the validity of your opinions on your chosen subject and is more honest with people reading your reviews: it's overtly saying "I have reasons to want to exist anonymously on the internet". It reveals nothing more about your real identity either: regardless of the obvious fictitiousness of your online persona, if you can be found, you can be found.

Researchers show that four data points about a person’s location can identify that person with 95% accuracy. FOUR. You think you can tweet anonymously from your phone? You think apps like Whisper allow you to act anonymously? As with pretty much everything related to the internet and computing, unless you've spent the last 20 years of your life working with computers, studying computers and thinking very hard about threat models and what data you're putting out there, and are utterly paranoid, you basically haven't got a chance. Do you turn off wifi on your phone when you leave the house? You should. You trust that USB pen drive you're transferring documents on? You shouldn't.

Finally and most obviously, any attempt at anonymity clearly doesn't insulate you from the law. As members of various hacking groups such as lulzsec found out, you always can be found out by law enforcement agencies. Yes, you might be able to make it difficult for a poorly funded person to come after you for libel (which is really just an indictment of the disgusting relationship between justice and money) but it's quite a risk to take. If you wouldn't put it in print with your real name attached, you're placing an awful lot of trust on your ability to maintain your anonymity against an adversary you probably don't know as well as you need to.

October 19, 2014 03:00 PM

Well-Typed.Com

Quasi-quoting DSLs for free

Suppose you are writing a compiler for some programming language or DSL. If you are doing source to source transformations in your compiler, perhaps as part of an optimization pass, you will need to construct and deconstruct bits of abstract syntax. It would be very convenient if you could write that abstract syntax using the syntax of your language. In this blog post we show how you can reuse your existing compiler infrastructure to make this possible by writing a quasi-quoter with support for metavariables. As we will see, a key insight is that we can reuse object variables as meta variables.

Toy Language “Imp”

For the sake of this blog post we will be working with a toy language called Imp. The abstract syntax for Imp is defined by

type VarName = String

data Expr =
    Var VarName
  | Add Expr Expr
  | Sub Expr Expr
  | Int Integer
  | Read
  deriving (Data, Typeable, Show, Eq)

data Cmd =
    Write Expr
  | Assign VarName Expr
  | Decl VarName
  deriving (Data, Typeable, Show)

data Prog = Prog [Cmd]
  deriving (Data, Typeable, Show)

and we will assume that we have some parsec parsers

parseExpr :: Parser Expr
parseProg :: Parser Prog

We will also make use of

topLevel :: Parser a -> Parser a
topLevel p = whiteSpace *> p <* eof

and the following useful combinator for running a parser:

parseIO :: Parser a -> String -> IO a

The details of these parsers are beyond the scope of this post. There are plenty of parsec tutorials online; for instance, you could start with the parsec chapter in Real World Haskell. Moreover, the full code for this blog post, including a simple interpreter for the language, is available on github if you want to play with it. Here is a simple example of an Imp program:

var x ;
x := read ;
write (x + x + 1)

A simple quasi-quoter

We want to be able to write something like

prog1 :: Prog
prog1 = [prog|
    var x ;
    x := read ;
    write (x + x + 1)
  |]

where the intention is that the [prog| ... |] quasi-quote will expand to something like

prog1 = Prog [ 
      Decl "x"
    , Assign "x" Read
    , Write (Add (Add (Var "x") (Var "x")) (Int 1))
    ]

To achieve this, we have to write a quasi-quoter. A quasi-quoter is an instance of the following data type:

data QuasiQuoter = QuasiQuoter { 
    quoteExp  :: String -> Q Exp
  , quotePat  :: String -> Q Pat
  , quoteType :: String -> Q Type
  , quoteDec  :: String -> Q [Dec] 
  }

The different fields are used when using the quasi-quoter in different places in your Haskell program: at a position where we expect a (Haskell) expression, a pattern (we will see an example of that later), a type or a declaration; we will not consider the latter two at all in this blog post.

In order to make the above example (prog1) work, we need to implement quoteExp but we can leave the other fields undefined:

prog :: QuasiQuoter
prog = QuasiQuoter {
      quoteExp = \str -> do
        l <- location'
        c <- runIO $ parseIO (setPosition l *> topLevel parseProg) str
        dataToExpQ (const Nothing) c
    , quotePat  = undefined
    , quoteType = undefined
    , quoteDec  = undefined
    }

Let’s see what’s going on here. The quasi-quoter gets as argument the string in the quasi-quote brackets, and must return a Haskell expression in the Template-Haskell Q monad. This monad supports, amongst other things, getting the current location in the Haskell file. It also supports IO.

Location

The first thing that we do is find the current location in the Haskell source file and convert it to parsec format:

location' :: Q SourcePos
location' = aux <$> location
  where
    aux :: Loc -> SourcePos
    aux loc = uncurry (newPos (loc_filename loc)) (loc_start loc)

Running the parser

Once we have the location we then parse the input string to a term in our abstract syntax (something of type Prog). We use parsec’s setPosition to tell parsec where we are in the Haskell source file, so that if we make a mistake such as

prog1 :: Prog
prog1 = [prog|
    var x ;
    x := read ;
    write (x + x + )
  |]

we get an error that points to the correct location in our Haskell file:

TestQQAST.hs:6:9:
    Exception when trying to run compile-time code:
      user error ("TestQQAST.hs" (line 9, column 20):
unexpected ")"
expecting "(", "read", identifier or integer)

Converting to Haskell abstract syntax

The parser returns something of type Prog, but we want something of type Exp; Exp is defined in Template Haskell and reifies the abstract syntax of Haskell. For example, we would have to translate the Imp abstract syntax term

Var "x" :: Prog

to its reflection as a piece of abstract Haskell syntax as

AppE (ConE 'Var) (LitE (StringL "x")) :: TH.Exp

which, when spliced into the Haskell source, yields the original Prog value. Fortunately, we don’t have to write this translation by hand, but we can make use of the following Template Haskell function:

dataToExpQ :: Data a 
           => (forall b. Data b => b -> Maybe (Q Exp)) 
           -> a -> Q Exp

This function can translate any term to a reified Haskell expression, as long as the type of the term derives Data (Data instances can be auto-derived by ghc if you enable the DeriveDataTypeable language extension). The first argument allows you to override the behaviour of the function for specific cases; we will see an example use case in the next section. In our quasi-quoter so far we don’t want to override anything, so we pass a function that always returns Nothing.

Once we have defined this quasi-quoter we can write

prog1 :: Prog
prog1 = [prog|
    var x ;
    x := read ;
    write (x + x + 1)
  |]

and ghc will run our quasi-quoter and splice in the Haskell expression corresponding to the abstract syntax tree of this program (provided that we enable the QuasiQuotes language extension).

Meta-variables

Consider this function:

prog2 :: VarName -> Integer -> Prog
prog2 y n = [prog|
    var x ;
    x := read ;
    write (x + y + n)
  |]

As mentioned, in the source code for this blog post we also have an interpreter for the language. What happens if we try to run (prog2 "x" 2)?

*Main> intIO $ intProg (prog2 "x" 2)
5
*** Exception: user error (Unbound variable "y")

Indeed, when we look at the syntax tree that got spliced in for prog2 we see

Prog [ Decl "x"
     , Assign "x" Read
     , Write (Add (Add (Var "x") (Var "y")) (Var "n"))
     ]

What happened? Didn’t we pass in "x" as the argument y? Actually, on second thought, this makes perfect sense: this is after all what our string parses to. The fact that y and n also happen to be Haskell variables, and happen to be in scope at the point of the quasi-quote, is really irrelevant. But we would still like prog2 to do what we expected it to do.

Meta-variables in Template Haskell

To do that, we have to support meta variables: variables from the “meta” language (Haskell) instead of the object language (Imp). Template Haskell supports this out of the box. For example, we can define

ex :: Lift a => a -> Q Exp
ex x = [| id x |]

Given any argument that supports Lift, ex constructs a piece of abstract Haskell syntax which corresponds to the application of the identity function to x. (Don’t confuse this with anti-quotation; see Brief Intro to Quasi-Quotation.) Lift is a type class with a single method

class Lift t where
  lift :: t -> Q Exp

For example, here is the instance for Integer:

instance Lift Integer where
  lift x = return (LitE (IntegerL x))
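To make the behaviour of ex concrete, here is a hypothetical usage sketch (not from the original post); the only subtlety is the usual Template Haskell stage restriction, which means ex has to be imported from a separate module rather than defined alongside the splice:

{-# LANGUAGE TemplateHaskell #-}
-- $(ex (42 :: Integer)) splices in the expression (id 42),
-- so answer evaluates to 42.
answer :: Integer
answer = $(ex (42 :: Integer))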

Meta-variables in quasi-quotes

Quasi-quotes don’t have automatic support for meta-variables. This makes sense: Template Haskell is for quoting Haskell, so it has a specific concrete syntax to work with, whereas quasi-quotes are for arbitrary custom syntaxes and so we have to decide what the syntax and behaviour of meta-variables is going to be.

For Imp we want to translate any unbound Imp (object-level) variable in the quasi-quote to a reference to a Haskell (meta-level) variable. To do that, we will introduce a similar type class to Lift:

class ToExpr a where
  toExpr :: a -> Expr

and provide instances for variables and integers:

instance ToExpr VarName where
  toExpr = Var

instance ToExpr Integer where
  toExpr = Int

We will also need to know which variables in an Imp program are bound and unbound; in the source code you will find a function which returns the set of free variables in an Imp program:

fvProg :: Prog -> Set VarName
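The post leaves the implementation to the repository, but to make the idea concrete, here is a minimal sketch of what such free-variable functions could look like (my code, not the repository's; in particular, treating Decl as the only binder and counting assignment targets as uses are assumptions about Imp's scoping):

import           Data.Set (Set)
import qualified Data.Set as Set

-- Free variables of an expression: every Var occurrence.
fvExpr :: Expr -> Set VarName
fvExpr (Var x)     = Set.singleton x
fvExpr (Add e1 e2) = fvExpr e1 `Set.union` fvExpr e2
fvExpr (Sub e1 e2) = fvExpr e1 `Set.union` fvExpr e2
fvExpr (Int _)     = Set.empty
fvExpr Read        = Set.empty

-- Free variables of a program: walk the commands, removing anything
-- bound by an earlier declaration.
fvProg :: Prog -> Set VarName
fvProg (Prog cmds) = go cmds
  where
    go []                  = Set.empty
    go (Decl x     : rest) = Set.delete x (go rest)
    go (Assign x e : rest) = Set.insert x (fvExpr e `Set.union` go rest)
    go (Write e    : rest) = fvExpr e `Set.union` go rest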

Overriding the behaviour of dataToExpQ

In the previous section we mentioned that rather than doing the Prog -> Q Exp transformation by hand we use the generic function dataToExpQ to do it for us. However, now we want to override the behaviour of this function for the specific case of unbound Imp variables, which we want to translate to Haskell variables.

Recall that dataToExpQ has type

dataToExpQ :: Data a 
           => (forall b. Data b => b -> Maybe (Q Exp)) 
           -> a -> Q Exp

This is a rank-2 type: the first argument to dataToExpQ must itself be polymorphic in b: it must work on any type b that derives Data. So far we have been passing in

const Nothing

which is obviously polymorphic in b since it completely ignores its argument. But how do we do something more interesting? Data and its associated combinators come from a generic programming library called Scrap Your Boilerplate (Data.Generics). A full discussion of SYB is beyond the scope of this blog post; the SYB papers are a good starting point if you would like to know more (I would recommend reading them in chronological order, the first published paper first). For the sake of what we are trying to do it suffices to know about the existence of the following combinator:

extQ :: (Typeable a, Typeable b) => (a -> q) -> (b -> q) -> a -> q

Given a polymorphic query (forall a), in our case const Nothing, extQ allows us to extend the query with a type-specific case (for a specific type b). We will use this to give a specific case for Expr: when we see a free variable in an expression we translate it to an application of toExpr to a Haskell variable with the same name:

metaExp :: Set VarName -> Expr -> Maybe ExpQ
metaExp fvs (Var x) | x `Set.member` fvs = 
  Just [| toExpr $(varE (mkName x)) |]
metaExp _ _ = 
  Nothing

The improved quasi-quoter

With this in hand we can define our improved quasi-quoter:

prog :: QuasiQuoter
prog = QuasiQuoter {
      quoteExp = \str -> do
        l <- location'
        c <- runIO $ parseIO (setPosition l *> topLevel parseProg) str
        dataToExpQ (const Nothing `extQ` metaExp (fvProg c)) c
    , quotePat  = undefined
    , quoteType = undefined
    , quoteDec  = undefined
    }

Note that we are extending the query for Expr, not for Prog; dataToExpQ (or, more accurately, SYB) makes sure that this extension is applied at all the right places. Running (prog2 "x" 2) now has the expected behaviour:

*Main> intIO $ intProg (prog2 "x" 2)
6
14

Indeed, when we have a variable in our code that is unbound both in Imp and in Haskell, we now get a Haskell type error:

prog2 :: VarName -> Integer -> Prog
prog2 y n = [prog|
    var x ;
    x := read ;
    write (x + z + n)
  |]

gives

TestQQAST.hs:15:19: Not in scope: ‘z’

Parenthetical remark: it is a design decision whether or not we want to allow local binding sites in a splice to “capture” meta-variables. Put another way, when we pass in "x" to prog2, do we mean the x that is bound locally in prog2, or do we mean a different x? Certainly a case can be made that we should not be able to refer to the locally bound x at all—after all, it’s not bound outside of the snippet! This is an orthogonal concern however and we will not discuss it any further in this blog post.

Quasi-quoting patterns

We can also use quasi-quoting to support patterns. This enables us to write something like

optimize :: Expr -> Expr
optimize [expr| a + n - m |] | n == m = optimize a
optimize other = other

As before, the occurrence of a in this pattern is free, and we intend it to correspond to a Haskell variable, not an Imp variable; the above code should correspond to

optimize (Sub (Add a n) m) | n == m = optimize a

(note that this is comparing Exprs for equality, hence the need for Expr to derive Eq). We did not mean the pattern

optimize (Sub (Add (Var "a") (Var "n")) (Var "m"))

To achieve this, we can define a quasi-quoter for Expr that supports patterns (as well as expressions):

expr :: QuasiQuoter
expr = QuasiQuoter {
      quoteExp  = \str -> do
        l <- location'
        e <- runIO $ parseIO (setPosition l *> topLevel parseExpr) str
        dataToExpQ (const Nothing `extQ` metaExp (fvExpr e)) e
    , quotePat  = \str -> do
        l <- location'
        e <- runIO $ parseIO (setPosition l *> topLevel parseExpr) str
        dataToPatQ (const Nothing `extQ` metaPat (fvExpr e)) e
    , quoteType = undefined
    , quoteDec  = undefined
    }

The implementation of quotePat is very similar to the definition of quoteExp. The only difference is that we use dataToPatQ instead of dataToExpQ to generate a Haskell pattern rather than a Haskell expression, and we use metaPat to give a type specific case which translates free Imp variables to Haskell pattern variables:

metaPat :: Set VarName -> Expr -> Maybe PatQ
metaPat fvs (Var x) | x `Set.member` fvs = Just (varP (mkName x))
metaPat _ _ = Nothing

Note that there is no need to lift now; the Haskell variable will be bound to whatever matches in the expression.

Limitations

We might be tempted to also add support for Prog patterns. While that is certainly possible, it’s of limited use if we follow the same strategy that we followed for expressions. For instance, we would not be able to write something like

opt [prog| var x ; c |] | x `Set.notMember` fvProg c = opt c

The intention here is that we can remove unused variables. Unfortunately, this will not work because this will cause a parse error: the syntax for Imp does not allow for variables for commands, and hence we also don’t allow for meta-variables at this point. This is important to remember:

By using object-level variables as stand-ins for meta-level variables, we only allow for meta-level variables where the syntax for the object-level language allows variables.

If this is too restrictive, we need to add special support in the ADT and in the corresponding parsers for meta-variables. This is a trade-off in increased expressiveness of the quasi-quotes against additional complexity in their implementation (new parsers, new ADTs).

Conclusions

By reusing object-level variables as stand-ins for meta variables you can reuse existing parsers and ADTs to define quasi-quoters. Using the approach described in this blog we were able to add support for quasi-quoting to a real compiler for a domain specific programming language with a minimum of effort. The implementation is very similar to what we have shown above, except that we also dealt with renaming (so that meta variables cannot be captured by binding sites in the quasi quotes) and type checking (reusing the existing renamer and type checker, of course).

by edsko at October 19, 2014 02:01 PM

October 17, 2014

Erik de Castro Lopo

Haskell : A neat trick for GHCi

Just found a really nice little hack that makes working in the GHC interactive REPL a little easier and more convenient. First of all, I added the following line to my ~/.ghci file.

  :set -DGHC_INTERACTIVE

All that line does is define a GHC_INTERACTIVE pre-processor symbol.

Then in a file that I want to load into the REPL, I need to add this to the top of the file:

  {-# LANGUAGE CPP #-}

and then in the file I can do things like:

  #ifdef GHC_INTERACTIVE
  import Data.Aeson.Encode.Pretty
  -- LBS is assumed to be lazy ByteStrings; this qualified import is
  -- needed for LBS.putStrLn if it isn't already present in the module:
  import qualified Data.ByteString.Lazy.Char8 as LBS

  prettyPrint :: Value -> IO ()
  prettyPrint = LBS.putStrLn . encodePretty
  #endif

In this particular case, I'm working with some relatively large chunks of JSON and it's useful to be able to pretty print them when I'm in the REPL, but I have no need for that function when I compile that module into my project.

October 17, 2014 10:16 PM

Danny Gratzer

Notes on Quotient Types

Posted on October 17, 2014
Tags: types

Lately I’ve been reading a lot of type theory literature. In an effort to help my future self, I’m going to jot down a few thoughts on quotient types, the subject of some recent google-fu.

But Why!

The problem quotient types are aimed at solving is actually a very common one. I’m sure at some point or another you’ve used a piece of data you’ve wanted to compare for equality. Additionally, that data probably needed some work to determine whether it was equal to another piece.

A simple example would be representing rational numbers. A rational number is a fraction of two integers, so let’s just say

    type Rational = (Integer, Integer)

Now all is well, we can define a Num instance and what not. But what about equality? Clearly we want equivalent fractions to be equal. That should mean that (2, 4) = (1, 2) since they both represent the same number.

Now our implementation has a sticky point, clearly this isn’t the case on its own! What we really want to say is “(2, 4) = (1, 2) up to trivial rejiggering”.

Haskell’s own Rational type solves this by not exposing a raw tuple. It still exists under the hood, but we only expose smart constructors that will reduce our fractions as far as possible.
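As a concrete illustration of that smart-constructor approach, here is a minimal sketch (the names Rat and mkRat are mine, not what actually sits underneath Data.Ratio): normalize on construction so that the derived structural equality coincides with the equality we want.

    data Rat = Rat Integer Integer
      deriving (Eq, Show)

    -- Reduce the fraction and force a positive denominator, so that
    -- equal rationals have identical representations.
    mkRat :: Integer -> Integer -> Rat
    mkRat _ 0 = error "mkRat: zero denominator"
    mkRat a b = Rat (signum b * a `div` g) (abs b `div` g)
      where g = gcd a b

With that invariant, mkRat 2 4 == mkRat 1 2 evaluates to True using nothing but the derived Eq instance.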

This is displeasing in a dependently typed setting, however: we want to be able to formally prove the equality of some things. This “equality modulo normalization” leaves us with a choice. One option is to provide a function which is essentially

    foo : (a b : Rational)
        -> Either (reduce a = reduce b) (reduce a /= reduce b)

This doesn’t really help us though: there’s no way to express that a should be observationally equivalent to b. This is a problem seemingly as old as dependent types: how can we have a simple representation of equality that captures all the structure we want and none that we don’t?

Hiding away the representation of rationals certainly buys us something, we can use a smart constructor to ensure things are normalized. From there we could potentially prove a (difficult) theorem which essentially states that

    =-with-norm : (a b c d : Integer)
                -> a * d = b * c -> mkRat a b = mkRat c d

This still leaves us with some woes, however: now a lot of computations become difficult to talk about, since we’ve lost the helpful notion that denominator ∘ mkRat a = id and similar. The lack of transparency shifts a lot of the burden of proof onto the code privy to the internal representation of the type, the only place where we know enough to prove such things.

Really what we want to say is “Hey, just forget about a bit of the structure of this type and consider things to be identical up to R”, where R is some equivalence relation, e.g.

  1. a R a
  2. a R b implies b R a
  3. a R b and b R c implies a R c

If you’re a mathematician, this should sound similar. It’s a lot like how we can take a set and partition it into equivalence classes. This operation is sometimes called “quotienting a set”.

For our example above, we really mean that our rational is a type quotiented by the relation (a, b) R (c, d) iff a * d = b * c.
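Spelled out as a plain Haskell predicate (just to pin the relation down; this isn't code from the post):

    -- (a, b) R (c, d) iff a * d = b * c, i.e. a/b and c/d denote the same number.
    sameRat :: (Integer, Integer) -> (Integer, Integer) -> Bool
    sameRat (a, b) (c, d) = a * d == b * c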

Some other things that could potentially use quotienting

  • Sets
  • Maps
  • Integers
  • Lots of Abstract Types

Basically anything where we want to hide some of the implementation details that are irrelevant for their behavior.

More than Handwaving

Now that I’ve spent some time essentially waving my hand about quotient types what are they? Clearly we need a rule that goes something like

 Γ ⊢ A type, E is an equivalence relation on A
———————————————–———————————————————————————————
        Γ ⊢ A // E type

Along with the typing rule

    Γ ⊢ a : A
——————————————————
  Γ ⊢ a : A // E

So all members of the original type belong to the quotiented type, and finally

  Γ ⊢ a : A, Γ ⊢ b : A, Γ ⊢ a E b
–——————————————–——————————————————
         Γ ⊢ a ≡ b : A // E

Notice something important here: that ≡ is the fancy shmancy judgmental equality baked right into the language. This calls decidability into question, since it seems that a E b could involve some non-trivial proof terms.

More than that, in a constructive, proof relevant setting things can be a bit trickier than they seem. We can’t just define a quotient to be the same type with a different equivalence relation, since that would imply some icky things.

To illustrate this problem, imagine we have a predicate P on a type A where a E b implies P a ⇔ P b. If we just redefine the equivalence relation on the quotient, P would not be a well-formed predicate on A // E, since a ≡ b : A // E doesn’t mean that P a ≡ P b. This would be unfortunate.

Clearly some subtler treatment of this is needed. To that end I found this paper discussing some of the handling of NuPRL’s quotients enlightening.

How NuPRL Does It

The paper I linked to is a discussion on how to think about quotients in terms of other type theory constructs. In order to do this we need a few things first.

The first thing to realize is that NuPRL’s type theory is different than what you are probably used to. We don’t have this single magical global equality. Instead, we define equality inductively across the type. This notion means that our equality judgment doesn’t have to be natural in the type it works across. It can do specific things at each case. Perhaps the most frequent is that we can have functional extensionality.

f = g ⇔ ∀ a. f a = g a

Okay, so now that we’ve tossed aside the notion of a single global equality, what else is new? Well, something new is the lens through which many people look at NuPRL’s type theory: PER semantics. Remember that a PER is a relation satisfying

  1. a R b → then b R a
  2. a R b ∧ b R c → a R c

In other words, a PER is an equivalence relationship that isn’t necessarily reflexive at all points.

The idea is to view types not as some opaque “thingy” but instead as partial equivalence relations across the set of untyped lambda calculus terms. Inductively defined equality falls right out of this idea since we can just define a ≡ b : A to be equivalent to (a, b) ∈ A.

Now another problem rears its head: what does a : A mean? Well, even though we’re dealing with PERs, it’s quite reasonable to say something is a member of a type if it’s reflexive there. That is to say, each relation is a full equivalence relation for the things we call members of that type. So we can therefore define a : A to be (a, a) ∈ A.

Another important constraint: in order for a type family to be well formed, it needs to respect the equality of the type it maps across. In other words, for all B : A → Type, we have (a, a') ∈ A ⇒ B a = B a' ∈ U. This should seem on par with how we defined function equality, and we call this “type functionality”.

Let’s also touch on another concept: squashed types. The idea is to take a type and throw away all information other than whether or not it’s occupied. There are two basic types of squashing, extensional and intensional. In the intensional case we consider two squashed things equal if and only if the types they’re squashing are equal

     A = B
  ————————————
   [A] = [B]

Now we can also consider only the behavior of the squashed type, the extensional view. Since the only behavior of a squashed type is simply existing, our extensional squash type has the equivalence

   ∥A∥ ⇔ ∥B∥
   ————————–
    ∥A∥ = ∥B∥

Now aside from this, the introduction rules for these types are basically the same: if we can prove that a type is occupied, we can grab a squashed type. Similarly, when we eliminate a squashed type all we get is its trivial occupant, called •.

    Γ ⊢ A
   ———————
   Γ ⊢ [A]

    Γ, x : |A|, Δ[•] ⊢ C[•]
  ——————————————————————————
    Γ, x : |A|, Δ[x] ⊢ C[x]

What’s interesting is that when proving an equality judgment, we can unsquash both of these types. This is only possible because NuPRL’s equality proofs are computationally trivial.

Now with all of that out of the way, I’d like to present two typing rules. First

  Γ ⊢ A ≡ A';  Γ, x : A, y : A ⊢ E[x; y] = E'[x; y]; E and E' are PERs
  ————————————————————————————————————————————————————————————————————
                      Γ ⊢ A ‌// E ≡ A' // E'

In English, two quotients are equal when the types and their quotienting relations are equal.

  Γ, u : x ≡ y ∈ (A // E), v : ∥x E y∥, Δ[u] ⊢ C[u]
 —————————————————————————————————————————————————————
        Γ, u : x ≡ y ∈ (A // E), Δ[u] ⊢ C[u]

There are a few new things here. The first is that we have a new Δ[u] thing. This is a result of dependent types: we can have things in our context that depend on u, so to indicate this we “split” the context into Γ, u, Δ and apply the dependent part of the context, Δ, to the variable it depends on, u.

Now the long and short of this is that when we’re trying to use an equivalence between two terms in a quotient, we only get the squashed term. This does mean that we only need to provide a squash to get equality in the first place, though:

  Γ ⊢ ∥x E y∥;  Γ ⊢ x : A;  Γ ⊢ y : A
 ———————————————————————————————————————
          Γ ⊢ x ≡ y : A // E

Remember that we can trivially form an ∥A∥ from an A.

Now there’s just one thing left to talk about, using our quotiented types. To do this the paper outlines one primitive elimination rule and defines several others.

Γ, x : A, y : A, e : x E y, a : ND, Δ[ndₐ{x;y}] ⊢ |C[ndₐ{x;y}]|
——————————————————————————————————————————————————————————————–
               Γ, x : A // E, Δ[x] ⊢ |C[x]|

ND is an admittedly odd type that’s supposed to represent nondeterministic choice. It has two terms, tt and ff, and they’re considered “equal” under ND. However, nd returns its first argument if it’s fed tt and the second if it is fed ff. Hence, nondeterminism.

Now in our rule we use this to indicate that if we’re eliminating some quotiented type we can get any value that’s considered equal under E. We can only be assured that when we eliminate a quotiented type, it will be related by the equivalence relation to x. This rule captures this notion by allowing us to randomly choose some y : A so that x E y.

Overall, this rule simply states that if C is occupied for any term related to x, then it is occupied for C[x].

Wrap up

As with my last post, here are some questions for the curious reader to pursue

  • What elimination rules can we derive from the above?
  • If we’re only proving equality, can we get more expressive rules?
  • What would an extensional quotient type look like?
  • Why would we want intensional or extensional quotients?
  • How can we express quotient types with higher inductive types from HoTT?

The last one is particularly interesting.

Thanks to Jon Sterling for proof reading


October 17, 2014 12:00 AM

Jasper Van der Jeugt

Generalizing function composition

TL;DR

In this blogpost I present a proof-of-concept operator $.$, which allows you to replace:

foo x0 x1 x2 ... xN = bar $ qux x0 x1 x2 ... xN

by:

foo = bar $.$ qux

Introduction

This is a literate Haskell file, which means you should be able to just drop it into GHCi and play around with it. You can find the raw .lhs file here. Do note that this requires GHC 7.8 (it was tested on GHC 7.8.2).

> {-# LANGUAGE FlexibleContexts     #-}
> {-# LANGUAGE FlexibleInstances    #-}
> {-# LANGUAGE OverlappingInstances #-}
> {-# LANGUAGE TypeFamilies         #-}
> {-# LANGUAGE TypeOperators        #-}
> import Data.Char (toLower)

If you have been writing Haskell code for a while, you have undoubtedly used the $ operator to “wrap” some expression with another function, mapping over the result type. For example, we can “wrap” the expression toLower 'A' with print to output the result.

print $ toLower 'A'

It is not unlikely either to have functions that just wrap other functions, e.g.:

> printLower :: Char -> IO ()
> printLower x = print $ toLower x

If the function that is being wrapped (toLower in the previous example) has only one argument, the . operator allows writing a very concise definition of functions which just wrap those single-argument functions.

> printLower' :: Char -> IO ()
> printLower' = print . toLower

However, this gets tedious when the number of arguments increases. Say that we have the following function which takes three arguments (don’t worry about the horrible implementation, but rather focus on the type):

> -- | Formats a double using a simple spec. Doesn't do proper rounding.
> formatDouble
>     :: Bool    -- ^ Drop trailing '0'?
>     -> Int     -- ^ #digits after decimal point
>     -> Double  -- ^ Argument
>     -> String  -- ^ Result
> formatDouble dropTrailingZero numDigits double =
>     let (pre, post) = case break (== '.') (show double) of
>             (x, '.' : y) -> (x, y)
>             (x, y)       -> (x, y)
>         post'       = take numDigits (post ++ repeat '0')
>         pre'        = case pre of
>             '0' : x -> if dropTrailingZero then x else pre
>             _       -> pre
>     in pre' ++ "." ++ post'

We can wrap formatDouble using print by successively using the . operator, but the result does not look pretty, nor very readable:

> printDouble :: Bool -> Int -> Double -> IO ()
> printDouble = (.) ((.) ((.) print)) formatDouble
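An equivalent way to write this, which makes the “mapping over the result type” idea concrete (my reformulation, not from the post): since the Functor instance for ((->) r) has fmap = (.), each extra argument is just one more fmap.

printDouble'' :: Bool -> Int -> Double -> IO ()
printDouble'' = fmap (fmap (fmap print)) formatDouble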

The $.$ operator

This makes one wonder if we can’t define an additional operator $.$ (pronounced holla-holla-get-dolla) which can be used like this:

> printDouble' :: Bool -> Int -> Double -> IO ()
> printDouble' = print $.$ formatDouble

Additionally, it should be generic, as in, it should work for an arbitrary number of arguments, so that we can also have:

> printMax' :: Int -> Int -> IO ()
> printMax' = print $.$ max
> printLower'' :: Char -> IO ()
> printLower'' = print $.$ toLower

From this, it becomes clear that the type of $.$ should be something like:

($.$)
    :: (a -> b)
    -> (x0 -> x1 -> ... -> xn -> a)
    -> (x0 -> x1 -> ... -> xn -> b)

The first question is obviously, can we write such an operator? And if we can, how generic is it?

When my colleague Alex asked me this question, I spent some time figuring it out. I previously thought it was not possible to write such an operator in a reasonably nice way, but after some experiments with the closed type families in GHC 7.8 I managed to get something working. However, the solution is far from trivial (and I now suspect more elegant solutions might exist).
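To give a flavour of the type-level ingredient involved (a sketch only, with my own name WrapResult, and not necessarily the shape of the final solution): a closed type family can compute the wrapped type by peeling off argument types until it reaches the result, at which point it replaces it.

-- Requires TypeFamilies (already enabled above).
-- WrapResult (x0 -> ... -> xn -> a) b  reduces to  x0 -> ... -> xn -> b
type family WrapResult f b where
  WrapResult (x -> f) b = x -> WrapResult f b
  WrapResult a        b = b

The remaining (and harder) part is producing a term of that type, which is where the type classes and the extensions enabled above come in.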

A possible solution

Thanks to my colleague Alex for proofreading!

October 17, 2014 12:00 AM

October 15, 2014

Douglas M. Auclair (geophf)

September 2014 1HaskellADay problems and solutions


September, 2014

  • September 1st, 2014: They tried to kill the Metal...I don't know where I'm going with that. But rock-n-roll with today's #haskell exercise http://lpaste.net/110331
  • September 2nd, 2014: Good morning! Triangle Sums is our #haskell problem for today: http://lpaste.net/110404 No triangles were harmed in the solution of their sum (nor in the summation of their solution) http://lpaste.net/110432

  • September 3rd, 2014: Pay It Forward. What? You didn't think I'd just say: today's #haskell problem is hard and leave it at that, did you? http://lpaste.net/110444 Paid. Or: a constructivist approach reduces the generated sets from 200M+ down to 8 possible solutions http://lpaste.net/110684 That's doable. ... and here is the 'Mr. Clean' version of the solution: fast, and neat. Groovy! http://lpaste.net/110685
  • September 4th, 2014: Today's #haskell problem: Abacus words http://lpaste.net/110494 because MRFE says "I don't like your math problems; I want more word problems"
  • September 5th, 2014: These 'edgy' relationships these days!  Remember when today's #haskell problem didn't involve graph theory? http://lpaste.net/110543 Data.Graph FTW! http://lpaste.net/110571 A solution to today's 4sum #haskell problem, and it didn't require generating 1625702400 solutions!
  • September 8th, 2014: We have puzzles 1 and 5 from the "Montley Stew" problem set for today's #haskell problem http://lpaste.net/110699 The solution-style to Montley Stew is swimming in list-compression stew http://lpaste.net/110750
  • September 9th, 2014: King Tut! http://lpaste.net/110789 Our #haskell problem for today is NOT a Pyramid Scheme. Maybe.
  • September 10th, 2014: 'Sed' is 'but(t)' just another word ... in "'laddin" http://lpaste.net/110833 Today's #haskell problem is mix-n-match words. "But(t) I sed ..." ARG! Enough with the 3rd-grade humor! On with the solution to the mix-n-match words! http://lpaste.net/110859
  • September 11th, 2014: A-plus for you when you solve today's #haskell exercise http://lpaste.net/110862 But an F- (NOT an F# ... geddit?) for /usr/share/dict/words :/ A solution to today's injectInto #haskell problem http://lpaste.net/110891
  • September 12th, 2014: Today's #Haskell problem comes from She. She who commands you solve it before coding it. http://lpaste.net/110925 So, you know: there it is. Okay, 'thurt' is a word in WHICH Universe? A solution to today's #haskell 'ditto' problem http://lpaste.net/110955
  • September 15th, 2014: "The name of the game is Connect Four!" and today's #haskell problem http://lpaste.net/111065 as suggested by a tweet from @DrEugeniaCheng. I played Connect 4 against myself and lost! :/ A semi-solution to today's #haskell problem at http://lpaste.net/111105

  • September 16th, 2014: There's more than one way to slice and dice a matrix for today's #haskell problem http://lpaste.net/111109 (follow-up to yesterday's Connect4) A Hack-n-slash solution to today's diagonal view of matrices. http://lpaste.net/111130 The bonus solution is provided back at the Connect Four answer to make that game complete: http://lpaste.net/111105
  • September 17th, 2014: Do you Yahoo!? Today's #haskell problem: connecting to Yahoo!'s financial webservice http://lpaste.net/111071 I like my java GREEN! http://lpaste.net/111238 (java means 'coffee') A solution to stock webservice #haskell problem.

  • September 18th, 2014: Star (Tuna?) Fish? http://lpaste.net/111216 A radial word-finding/matching game is today's #haskell puzzle. Wait. Quantum...WHAT? http://lpaste.net/111259 A solution to today's #haskell problem using quantum superpositions to solve it. I'm not joking. STAR POWER! http://lpaste.net/edit/111259 A solution for pretty-printing the star-puzzle
  • September 19th, 2014: Continued fractions and dual inversionals are today's #haskell problem http://lpaste.net/111067 It even comes with (thoughts about) flowers. #Ult Today's problem was inspired by a comment, then the main article, from @aperiodical http://aperiodical.com/2013/11/from-the-mailbag-dual-inversal-numbers/#comment-611141 That was some LOOOOOOOONG Division! http://lpaste.net/111314 A solution to today's #haskell problem.
  • September 22nd, 2014: Oh, noes! The 'M'-word! http://lpaste.net/111419 for today's #haskell exercise. Project Euler problem 11. 'M' is for 'Monoid' http://lpaste.net/111435 A solution to today's #haskell problem.
  • September 23rd, 2014: "Oh, the snark bites, with his teeth, dear." MACD Knife ... geddit? http://lpaste.net/111468 Today's #haskell problem is a technical indicator.
  • September 24th, 2014: Jones, Smith, and Brown work at the Bank... but not Mr. Banks. A logic puzzle from 1957 for today's #haskell puzzle http://lpaste.net/111461. A pair of PhDs (http://lpaste.net/111580) helped to solve today's #haskell problem. Neatly, too, I might add.
  • September 25th, 2014: Corned-beef hashi? http://lpaste.net/111452 No, that's not right, and now I'm hungry! :( Shoot! Well, anyway: today's #haskell problem.
  • September 26th, 2014: HA! I TOLD you we'd be getting to cipher-text! http://lpaste.net/111459 From the 1700's, still: it IS cipher text for today's #haskell problem. Ooh! The Plot Thickens (like pea soup)! http://lpaste.net/111679 "ALLMENHAVING" and "be mis T r U st ed " is a solution to today's problem.
  • September 29th, 2014: Big (Crypto) Generator! http://lpaste.net/111795 Today's #haskell problem is a follow-on to Friday's. Human Error ... WHAT human error? http://lpaste.net/111829 A solution to today's make-your-own-crypto-table #haskell problem
  • September 30th, 2014: "I wanna be a blue-collar man!" Yes, but who, and which occupation? http://lpaste.net/111755 Today's #haskell problem addresses this. Plumber, ... or painter? IDK! http://lpaste.net/111876 TWO-solutions to today's #haskell problem (one of them MAY be correct... :/ )

by geophf ([email protected]) at October 15, 2014 05:36 PM