Planet Haskell

May 21, 2025

Chris Smith

Threshold Strategy in Approval and Range Voting

How to turn polling insight into an optimal ballot — and why anything else is wasted.

“approve of”? What does that mean anyway?

I have written previously about how approval and range voting methods are intrinsically tactical. This doesn’t mean that they are more tactical than other election systems (nearly all of which are shown to sometimes be tactical by Gibbard’s Theorem when there are three or more options). Rather, it means that tactical voting is unavoidable. Voting in such a system requires answering the question of where to set your approval threshold or how to map your preferences to a ranged voting scale. These questions don’t have more or less “honest” answers. They are always tactical choices.

But I haven’t dug deeper into what these tactics look like. Here, I’ll do the mathematical analysis to show what effective voting looks like in these systems, and make some surprising observations along the way.

Mathematical formalism for approval voting

We’ll start by assuming an approval election, so the question is where to put your threshold. At what level of approval do you switch from voting not to approve a candidate to approving them?

We’ll keep the notation minimal:

  • As is standard in probability, I’ll write ℙ[X] for the probability of an event X, and 𝔼[X] for the expected value of a (numerical) random variable X.
  • I will use B to refer to a random collection (multiset) of ballots, drawn from some probability distribution reflecting what we know from polling and other information sources on other voters. B will usually not include the approval vote that you’re considering casting, and to include that approval, we’ll write B ∪ {c}, where c is the candidate you contemplate approving.
  • I’ll write W(·) to indicate the winner of an election with a given set of ballots. This is the candidate with the most approvals. We’ll assume some tiebreaker is in place that’s independent of individual voting decisions; for instance, candidates could be shuffled into a random order before votes are cast, and in the event of a tie for number of approvals, we’ll pick the candidate who comes first in that shuffled order.
  • U(·) will be your utility function, so U(c) is the utility (i.e., happiness, satisfaction, or perceived social welfare) that you personally will get from candidate c winning the election. This doesn’t mean you have to be selfish, per se, as accomplishing some altruistic goal is still a form of utility, but we evaluate that utility from your point of view even though other voters may disagree.

With this notation established, we can clearly state, almost tautologically, when you should approve of a candidate c. You should approve of c whenever:

𝔼[U(W(B ∪ {c}))] > 𝔼[U(W(B))]

That’s just saying you should approve of c if your expected utility from the election with your approval of c is more than your utility without it.

The role of pivotal votes and exact strategy

This inequality can be made more useful by isolating the circumstances in which your vote makes a difference in the outcome; that is, when W(B ∪ {c}) ≠ W(B). Non-pivotal votes contribute zero to the net expectation and can be ignored.

In approval voting, approving a candidate can only change the outcome by making that candidate the winner. This means a pivotal vote is equivalent to both of:

  • W(B ∪ {c}) = c
  • W(B) ≠ c

It’s useful to have notation for this, so we’ll define V(B, c) to mean that W(B ∪ {c}) ≠ W(B), or equivalently, that W(B ∪ {c}) = c and W(B) ≠ c. To remember this notation, recall that V is the pivotal letter in the word “pivot”, and also visually resembles a pivot.

With this in mind, the expected gain in utility from approving c is:

  • 𝔼[U(W(B ∪ {c}))] - 𝔼[U(W(B))]. But since the utility gain is zero except for pivotal votes, this is the same as
  • ℙ[V(B, c)] · (𝔼[U(W(B ∪ {c})) | V(B, c)] - 𝔼[U(W(B)) | V(B, c)]). And since V(B, c) implies that W(B ∪ {c}) = c, this simplifies to
  • ℙ[V(B, c)] · (U(c) - 𝔼[U(W(B)) | V(B, c)])

Therefore, you ought to approve of a candidate c whenever

U(c) > 𝔼[U(W(B)) | V(B, c)]

This is much easier to interpret. You should approve of a candidate c precisely when the utility you obtain from c winning is greater than the expected utility in cases where c is right on the verge of winning (but someone else wins instead).

There are a few observations worth making about this:

  • The expectation clarifies why the threshold setting part of approval voting is intrinsically tactical. It involves evaluating how likely each other candidate is to win, and using that information to compute an expectation. That means advice to vote only based on internal feelings like whether you consider a candidate acceptable is always wrong. An effective vote takes into account external information about how others are likely to vote, including polling and understanding of public opinion and mood.
  • The conditional expectation, assuming V(B, c), tells us that the optimal strategy for whether to approve of some candidate c depends on the very specific situation where c is right on the verge of winning the election. If c is a frontrunner in the election, this scenario isn’t likely to be too different from the general case, and the conditional expectation doesn’t change much. However, if c is a long-shot candidate from some minor party, but somehow nearly ties for a win, we’re in a strange situation indeed: perhaps a major last-minute scandal, a drastic polling error, or a fundamental misunderstanding of the public mood. Here, the conditional expected utility of an alternate winner might be quite different from your unconditional expectation. If, say, voters prove to have an unexpected appetite for extremism, this can affect the runners-up as well.
  • Counter-intuitively, an optimal strategy might even involve approving some candidates that you like less than others you leave unapproved! This can happen because different candidates are evaluated against different thresholds. Therefore, a single voter’s best approval ballot isn’t necessarily monotonic in their utility rankings. This adds a level of strategic complexity I hadn’t anticipated in my earlier writings on strategy in approval voting.
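To make the exact rule concrete, here is a minimal Haskell sketch (an illustration, not code from any real voting system) that estimates the conditional expectation by Monte Carlo: sample many plausible elections B from a model of the other voters, keep only those where an extra approval of c would be pivotal, and compare U(c) against the average utility of the winner in those worlds. The ballot sampler and the utility function are assumptions you would have to supply from your own polling beliefs.

import qualified Data.Map.Strict as M

type Candidate = String
type Ballot = [Candidate]  -- the candidates one voter approves

-- Winner of an approval election: most approvals win, and ties go to
-- the alphabetically first candidate, standing in for the pre-shuffled
-- tie-break order described earlier. Assumes at least one approval.
winner :: [Ballot] -> Candidate
winner ballots = snd (minimum [(negate n, c) | (c, n) <- M.toList tallies])
  where
    tallies :: M.Map Candidate Int
    tallies = M.fromListWith (+) [(c, 1) | b <- ballots, c <- b]

-- Monte Carlo version of the exact rule: approve c iff U(c) beats the
-- average utility of W(B) over the sampled elections where one extra
-- approval of c is pivotal. Nothing means no pivotal sample was seen,
-- so the conditional expectation could not be estimated at all.
shouldApprove :: (Candidate -> Double) -> [[Ballot]] -> Candidate -> Maybe Bool
shouldApprove u samples c
  | null pivotal = Nothing
  | otherwise    = Just (u c > mean [u (winner b) | b <- pivotal])
  where
    pivotal = [b | b <- samples, winner (b ++ [[c]]) == c, winner b /= c]
    mean xs = sum xs / fromIntegral (length xs)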

Approximate strategy

The strategy described above is rigorously optimal, but not at all easy to apply. Imagining the bizarre scenarios in which each candidate, no matter how minor, might tie for a win is challenging to do well. We’re fortunate, then, that there’s a good approximation. Remember that the utility gain from approving a candidate was equal to

ℙ[V(B, c)] · (U(c) - 𝔼[U(W(B)) | V(B, c)])

In precisely the cases where V(B, c) is a bizarre assumption that’s difficult to imagine, we’re also multiplying by ℙ[V(B, c)], which is vanishingly small, so this vote is very unlikely to make a difference in the outcome. For front-runners, who are relatively much more likely to be in a tie for the win, the conditional expectation changes a lot less: scenarios that end in a near-tie are not too different from the baseline expectation.

This happens because ℙ[V(B, c)] falls off quite quickly indeed as the popularity of c decreases, especially for large numbers of voters. For a national-scale election (say, about 10 million voters), if c expects around 45% of approvals, then ℙ[V(B, c)] is around one in a million. That’s a small number, telling us that very large elections aren’t likely to be decided by a one-vote margin anyway. But it’s gargantuan compared to the number if c expects only 5% of approvals. Then ℙ[V(B, c)] is around one in 10^70. That’s a one followed by seventy zeros, roughly the odds of picking one specific atom at random out of an entire galaxy! The probability of casting a pivotal vote drops off exponentially, and by this point it’s effectively zero.

With that in mind, we can drop the conditioning in the second term, giving us a new rule: approve of a candidate c any time that:

U(c) > 𝔼[U(W(B))]

That is, approve of any candidate whose win you would like better than you expect to like the outcome of the election.

  • This rule is still tactical. To determine how much you expect to like the outcome of the election, you need to have beliefs about who else is likely to win, which still requires an understanding of polling and public opinion and mood.
  • However, there is now just one threshold, derived from polling data about realistic scenarios, and your ballot is monotonic: you approve exactly the candidates whose utility exceeds that single threshold.

This is no longer a true optimal strategy, but with enough voters, the exponential falloff in ℙ[V(B, c)] as c becomes less popular is a pretty good assurance that the incorrect votes you might cast by using this strategy instead of the optimal ones are extremely unlikely to matter. In practice, this is probably the best rule to communicate to voters in an approval election with moderate to large numbers of voters.
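In code, this shortcut collapses to a single threshold. Reusing the winner function, Ballot type, and sampled electorates from the sketch above (the sampler is still an assumed polling model, not real data), the whole ballot becomes one comparison per candidate:

-- One threshold for the whole ballot, per the approximate rule.
approvalBallot :: (Candidate -> Double) -> [[Ballot]] -> [Candidate] -> [Candidate]
approvalBallot u samples candidates = [c | c <- candidates, u c > threshold]
  where
    -- expected utility of the election outcome, estimated once
    threshold = sum [u (winner b) | b <- samples] / fromIntegral (length samples)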

Reducing range to approval voting

I promised to look at strategy for range voting as well. Armed with an appreciation of approval strategy, it’s easy to extend the analysis to an optimal range strategy for large-scale elections.

The key is to recognize that a range voting election with options 0, 1, 2, …, n is mathematically equivalent to an approval election in which everyone is allowed to cast n approval ballots. The number you mark on the range ballot says how many of those n ballots approve that candidate.

Looking at it this way presents the obvious question: why would you vote differently on some ballots than others? In what situation could that possibly be the right choice?

  • For small elections, say if you’re voting on places to go out and eat with your friends or coworkers, it’s possible that adding in a couple of ballots materially changes the election so that the optimal vote is different. Then it may well be optimal to cast a range ballot using some intermediate number.
  • For large elections, though, you’re presented with pretty much exactly the same question each time, and you may as well give the same answer. Therefore, in large-scale elections, the optimal way to vote with a range ballot is always to rate everyone either the minimum or maximum possible score. This reduces a range election exactly to an approval election. The additional expressiveness of a range ballot is a siren call: by using it, you always vote less effectively than you would have by ignoring it and using only the two extreme choices.

Since we’re discussing political elections, which have relatively large numbers of voters, this answers the question for range elections, as well: Rate a candidate the maximum score if you like them better than you expect to like the outcome of the election. Otherwise, rate them the minimum score.
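Continuing the sketch above (same caveats), that decision rule for a range ballot is just the approval decision dressed up in scores:

-- Maximum score for every candidate above the threshold, minimum (0)
-- for everyone else; intermediate scores are never the right choice
-- at large scale.
rangeBallot :: Int -> (Candidate -> Double) -> Double -> [Candidate] -> [(Candidate, Int)]
rangeBallot maxScore u threshold candidates =
    [(c, if u c > threshold then maxScore else 0) | c <- candidates]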

Summing it up

What we’ve learned, then, is that optimal voting in approval or range systems boils down to two nested rules.

  • Exact rule (for the mathematically fearless): approve c iff U(c) > 𝔼[ U(W(B)) | your extra vote for c is pivotal ]. This Bayesian test weighs each candidate against the expected utility in the razor-thin worlds where they tie for first.
  • Large-electorate shortcut (for everyone else): because those pivotal worlds become astronomically rare as the electorate grows, the condition shrinks to a single cutoff: approve of (or give the maximum score to) every candidate whose victory you would enjoy more than the expected outcome of the election.

We’ve seen why the first rule is the gold standard, but the second captures virtually all of its benefit when millions are voting. Either way, strategy is inseparable from sincerity: you must translate beliefs about polling into a utility threshold, and then measure every candidate against it. We’ve also seen, by a clear mathematical equivalence, why range ballots add no real leverage in large-scale elections, instead only offering false choices that are always wrong.

The entire playbook fits on a sticky note: compute the threshold, vote all-or-nothing, and let the math do the rest.

by Chris Smith at May 21, 2025 03:41 AM

May 19, 2025

Monday Morning Haskell

Comparing Code: LeetCode Problems in Rust vs. Haskell

Today will be the first in a series where we’ll be exploring some LeetCode problems and comparing different solutions from Haskell and Rust. The main idea is to demonstrate how you might translate ideas between the recursive core of Haskell, and the loop-based framing of most other languages.

If you want to learn more about problem solving in Haskell, you should take a closer look at Solve.hs! This course will give you an in-depth walkthrough of problem solving ideas in Haskell, including how concepts compare to more typical languages.

The Problem

The first problem we’ll consider is called H-Index. In academia, a person has an “H-Index” of n if they have published at least n papers that have n or more citations. So the input to our problem is a list of integers, where each integer is the number of citations of a particular paper the author wrote. Our job is to calculate the author’s H-Index.

The Algorithm

This problem is fairly straightforward if you sort the input list. Once we do this, we can look at any index i and consider the number of remaining entries, n - i. Since the list is sorted, every one of those remaining entries has at least as many citations as the entry at index i, so we know there are n - i papers with at least that many citations.

So we can accomplish this task with a single loop over the sorted list. Throughout this loop, we’ll be tracking the maximum “H-Index” we’ve seen so far (maxH). At each iteration, we take the following steps:

  1. Get the number of remaining papers (rem) and the citations at this index (next).
  2. If rem is at least as large as next, update maxH to next if next is larger.
  3. Otherwise, update maxH to rem if rem is greater.

The last step is a key edge case! If we have the list [1, 1, 1, 9, 9], we’ll get to index 3, with next being 9 and rem being 2. The remainder is smaller than the citation count, but we would still update maxH to 2, because there are at least 2 papers remaining with 2 or more citations.

Rust Solution

Here’s our Rust solution:

pub fn h_index(citations: Vec<i32>) -> i32 {
    let mut cp = citations.clone();
    cp.sort();
    let n = cp.len();
    let mut maxH: i32 = 0;
    for i in 0..n {
        let next = cp[i];
        let rem: i32 = (n - i) as i32;
        if (rem >= next) {
            maxH = std::cmp::max(next, maxH);
        } else {
            maxH = std::cmp::max(rem, maxH);
        }
    }
    return maxH;
}

We have the first part, where we clone the input, sort it, and set up our loop variables:

pub fn h_index(citations: Vec<i32>) -> i32 {
    let mut cp = citations.clone();
    cp.sort();
    let n = cp.len();
    let mut maxH: i32 = 0;
    ...
}

Then we have the loop itself, where we have our two cases to consider:

for i in 0..n {
    let next = cp[i];
    let rem: i32 = (n - i) as i32;
    if (rem >= next) {
        // There are at least ‘next’ papers >= ‘next’
        maxH = std::cmp::max(next, maxH);
    } else {
        // ‘next’ > ‘rem’, so there are at least ‘rem’ papers >= ‘rem’
        maxH = std::cmp::max(rem, maxH);
    }
}

So this is pretty straightforward. Now how do we approach this kind of problem in Haskell?

Haskell Solution

Our Haskell solution will have the same structure, but instead of running a loop and indexing into a vector, we’ll use a linked list and call a recursive function. Let’s begin by getting the length and sorting our input:

import qualified Data.List as L

hIndex :: [Int] -> Int
hIndex inputs = ...
  where
    n = length inputs
    sorted = L.sort inputs

    ...

Now we need to think about our recursive loop function. At each iteration, we need access to the remaining number of values and the next citation value, and we need to pass along maxH. As with many list-based recursive functions, we’ll peel off one element of the input list each time. Ultimately we’ll return maxH from this loop when we hit our base case of an empty input list. So its type signature should look like this:

loop :: (Int, [Int], Int) -> Int

When writing a recursive function, we always handle the base case first:

loop :: (Int, [Int], Int) -> Int
loop (_, [], maxH) = maxH

Now in the recursive case, we can apply our algorithm, updating maxH if necessary:

loop :: (Int, [Int], Int) -> Int
loop (_, [], maxH) = maxH
loop (remaining, next : rest, maxH) = if remaining >= next
  then loop (remaining - 1, rest, max next maxH)
  else loop (remaining - 1, rest, max remaining maxH)

To finish up, all we need to do is call our loop function with the appropriate initial inputs (n, sorted, 0). Here’s our complete Haskell solution:

import qualified Data.List as L

hIndex :: [Int] -> Int
hIndex inputs = loop (n, sorted, 0)
  where
    n = length inputs
    sorted = L.sort inputs

    loop :: (Int, [Int], Int) -> Int
    loop (_, [], maxH) = maxH
    loop (remaining, next : rest, maxH) = if remaining >= next
      then loop (remaining - 1, rest, max next maxH)      -- at least ‘next’ papers have >= ‘next’ citations
      else loop (remaining - 1, rest, max remaining maxH) -- at least ‘remaining’ papers have >= ‘remaining’ citations
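As a quick sanity check, here are two hand-worked examples (the first is the standard sample input for this problem):

hIndex [3,0,6,1,5] -- 3 (three papers have at least 3 citations)
hIndex [1,1,1,9,9] -- 2 (the edge case above: two papers have 2 or more citations)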

Using a Fold

Now we can notice that our loop has a particular structure. We have one piece of accumulated state (maxH), and this changes based on each value in our list (combined with the remaining values). We can easily re-imagine this kind of loop using a fold. We just have to think of the folding function like this:

loop :: Int -> (Int, Int) -> Int
loop maxH (remaining, next) = if remaining >= next
  then max next maxH
  else max remaining maxH

This has the a -> b -> a structure of a left-fold function, where a is our accumulated maxH value, and the other values come from our list. The main benefit here is that our loop function no longer has to deal with the burden of handling a base case or passing the “shrinking” list as an argument to the next recursive call.

We can invoke this loop at the top level like so:

hIndex :: [Int] -> Int
hIndex inputs = L.foldl' loop 0 (zip [n,n-1..1] sorted) -- strict left fold avoids building up thunks
  where
    n = length inputs
    sorted = L.sort inputs

    loop :: Int -> (Int, Int) -> Int
    loop maxH (remaining, next) = if remaining >= next
      then max next maxH
      else max remaining maxH

We just have to zip the decreasing indices together with our sorted list. Now our recursive “loop” is more like a typical for-loop. We’re only considering one element at a time, and we’re updating the important state each time.
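If you want extra confidence that the fold version really is a faithful refactor of the explicit recursion, a quick property test is a nice habit (hIndexRec and hIndexFold are hypothetical names for the two versions above, assumed in scope side by side):

import Test.QuickCheck

-- The two formulations should agree on every input list.
prop_sameHIndex :: [Int] -> Bool
prop_sameHIndex xs = hIndexRec xs == hIndexFold xs

-- ghci> quickCheck prop_sameHIndex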

Conclusion

In this article, we compared a normal for-loop in Rust with a recursive solution in Haskell. We also saw how we could simplify the recursive formulation into a “fold” structure.

If you're interested in learning more about writing recursive functions in Haskell, check out our Solve.hs course. You’ll learn how to start thinking about problems in a functional way, and you’ll learn the step-by-step processes for tackling problems with basic recursion and folds like we saw in this example.

by James Bowen at May 19, 2025 08:30 AM

May 15, 2025

Tweag I/O

Why we blog

I was the chief editor for this blog for the past 8 years or so, and I’ve just recently passed the mantle to Chris Harrison. I thought I’d take the opportunity to write a little bit about this blog, how it’s operated and what it means to us. Besides, we do like when things get meta here, so this is a blog post about the blog.

A little bit of history

One of the tenets under which Mathieu Boespflug founded Tweag was that software engineers naturally don’t write enough. Writing is an essential part of the engineering job. We write issues, pull requests, code comments, and documentation of various sorts. We discuss and debate online, and we have to arbitrate trade-offs. Most of this is common, in fact, to every engineering profession. And although we typically become engineers because of our taste for the technical part of the work (I certainly did; I don’t know for sure how ubiquitous that is), writing is still a big part of our job.

So, Mathieu reasoned, if writing is such a big part of our job, while we’re technical people at heart rather than writers, then for the company to be at its best it needs to make it clear that precise and clear writing is important to our work. To that effect, there ought to be venues for us to write, where writing is made an expectation, an actual requirement. The blog is one of them.

At first, the blog was just there. After all, our blog was, and still is, just (part of) a Github repository. So we’d make pull requests and merge blog posts. But, of course, people seek review. When you’re posting in the name of a company, you tend to be a little more careful about what you write; besides, we’re all so used to having our PRs reviewed. This was all very ad hoc; there was no process for it.

Whatever my reason at the time (I honestly don’t remember after all this time), I ended up participating in the review of most blog posts. It’s a poorly kept but surprisingly little-discussed secret that the most common way to get a responsibility in a company is to just assume said responsibility. Do the thing, and it will become your charge. Anyway, I became editor in chief.

Why we blog

Besides being a venue to exercise our technical writing, this blog has been tremendously useful to us. Keeping a high-quality, serious, technical blog helped establish trust in our work, build bridges with communities, and attract clients and employees.

It’s very rewarding for us, as individuals, to be recognised by our peers, and to work with a company which is likewise recognised. This is certainly a great motivation for us to write.

Besides, it fits very well with our open-source values, both as a company and as individuals. We strongly believe in open-source, and the value it has to the world. So we participate. And open-source software isn’t just about putting software’s sources out there. Open-source is also about sharing knowledge. Which is another important element of the company’s ethos. We do have formal and informal venues for sharing knowledge internally, all very important stuff, but it’s a topic for another time. The blog is one of our main venues for sharing knowledge with the rest of the world.

So we do share, we talk about our open-source journey, the software we build, or the software we use. But we also share what we learn through working with clients, not necessarily on open-source projects.

So this blog is something that operates at the sweet spot of any company’s activities: it’s something we do because we believe that it’s the right thing to do, and it’s also something which is valuable to the company. In fact the blog brought so many clients that it allowed Tweag to operate without a marketing team for quite a few years. Nowadays, the company’s bigger and we can’t rely on a blog alone, but it’s still a great way to connect.

How we blog

Writing a blog post is making a pull request against the Github repository for the tweag.io website. A post is a Markdown file, possibly accompanied by resources (such as images). It’s a workflow which is very familiar to software engineers. All the review process is, in fact, a Github pull request review.

The author of a blog post is first tasked with obtaining a “technical review” from their peers: people close to the topic review the blog post for accuracy and relevance. The blog post editors are all engineers (the entire process, in fact, is owned by engineers), but they aren’t necessarily very familiar with the blog post’s topic, so this first round of review couldn’t be performed by the editing team. It’s also much better for scalability: technical review uses a little time from a lot of people, rather than a lot of time from a few.

When the author and technical reviewers are satisfied with the blog post, they send it to the editing team, which will do two rounds of review (which we cleverly call “first editorial review” and “second editorial review”). The editing team reviews for clarity and writing quality. Something that isn’t a concern of reviewers (or really of the process as a whole), on the other hand, is SEO. Certainly our blog posts make our website easier to find, and of course this is important to us. But this is the result, we hope, of making blog posts that you like.

We can propose edits in three different ways: we can ask questions in the review thread, propose “suggestions” in the Github interface which the author is free to accept or not, or push directly to the blog post (in which case we try our best to preserve the author’s voice, and always leave time for the author to check that the edits are correct). The choice between the three is primarily driven by our degree of confidence in the edit. Really, we do what makes sense to reduce the amount of back-and-forth.

Reviewing blog posts on Github with a team distributed all around the world is essentially a distributed process. As with any distributed process, we try to minimise the amount of synchronisation. Our target is to be able to do a review in two weeks or less (one week for each round). Sometimes it takes more though, when the blog post presents more difficult editing challenges, or when the author doesn’t have enough availability outside of their client work to respond to our reviews quickly enough. But we hit the target more often than not.

The two rounds of review are functionally identical. We just found that, in practice, having two rounds helps raise the quality a lot. The first reviewer often gets deeply involved in the writing of the blog post, to the point that they stop seeing it as a reader would. The second reviewer receives a blog post in almost publishable condition and brings fresh eyes to apply the finishing touches.

What blogging means to us

When Tweag was a standalone company, blogging was almost our entire marketing strategy. Now that we’re part of the substantially larger Modus Create, we aren’t in a position to rely solely on technical blogging for marketing. Yet, this blog is still an important part of our strategy.

All of what I wrote above still applies, but I’d like to bring up something else. See, a company is a little bit faceless. It doesn’t have a drive, it doesn’t have a personality: people do. This is truer the larger a company is. We don’t aspire to be a faceless abstract entity, though. Modus Create is made of people, people that we believe to be interesting, and that we hope you’ll find interesting too. In fact our business is largely for clients (maybe you!) to find our people interesting and hire us. This blog is one of the ways we use to promote individuals. Who they are, what they like, what they know, what they can do. This is why our writing recommendations encourage authors to use the pronoun “I” where many of us would be tempted to say “we”.

A more collectivist consideration is communities. We take part in a number of communities (Haskell, Nix, Bazel, Typescript, …), and it’s all too easy for a company to say it’s part of a community when what it really means is that it just uses Haskell, Nix, Bazel, or Typescript. But that’s not really what being part of a community means, is it? It also entails taking part in community life. There’s a bunch of things we do: we write libraries, contribute upstream, help with governance, sponsor and sometimes organise events, …. And we blog, which also plays its part in community life. This is why we don’t shy away from highly specialised blog posts. Sure, they are addressed to a pretty narrow, sometimes quite niche audience, but they are relevant to one of our communities. Of course it helps establish our technical chops. Hopefully it also builds trust.

I should note that what I’m writing about is what works for us. It’s not a recipe that will automatically work for you. There are many excellent technical blogs out there which, I assume, probably take different approaches from ours, such as CockroachDB’s, Netflix’s, fly.io’s and Trail of Bits’s (the latter two I wasn’t aware of, by the way; they were brought to my attention by this blog post’s technical review). If you like this blog, you’ll probably like theirs, so go check them out! Dan Luu shares some more general considerations. But at the end of the day, what makes a strong communication strategy is to build around and empower the people you already have. Capitalise on your strengths; don’t go against the grain just to imitate what someone else does, however successful they appear to be.

Sign-off

A paradox of the work of software engineers (and probably all engineering disciplines, really) is that writing is such an integral part of our job, yet most of us receive virtually no training in it during our studies.

I’ve never had technical writing classes, myself. The way I learned writing was in part by gleaning some of the popular wisdom taught among my peers, but mostly by co-authoring scientific articles with better technical writers than me and seeing them do their magic. Honestly, I found no better learning experience than seeing an entire paragraph of mine rewritten in a mere few words that were also more precise.

I don’t know how to teach technical writing formally. So a lot of the process we’ve converged on reflects my attempt to replicate, for the company, what worked on me: teaching technical writing by example. By showing authors how a blog post can be improved, we hope they will learn to write better next time. Because the blog is this as well: a tool to teach ourselves, collectively, how to be technical writers.

This is a very soft target: it’s hard to measure the degree to which we’ve succeeded. But, because I don’t really know how to close this post without getting a touch emotional (it is, after all, no small moment for me), I’ll just say this: I’ve witnessed many of our engineers mature as writers, and I can’t help feeling some parent-like pride at their growth.

May 15, 2025 12:00 AM

May 14, 2025

Well-Typed.Com

State-based testing with quickcheck-lockstep (Haskell Unfolder #44)

Today, 2025-05-14, at 1830 UTC (11:30 am PDT, 2:30 pm EDT, 7:30 pm GMT, 20:30 CET, …) we are streaming the 44th episode of the Haskell Unfolder live on YouTube.

State-based testing with quickcheck-lockstep (Haskell Unfolder #44)

Many Haskell programmers will be familiar with property based testing of pure functions (for those who are not, various episodes of the Haskell Unfolder have discussed this: #4, #21, #38 and #40). Property based testing for stateful systems (“IO code”) is however much less well-known, which is a pity as it is just as useful! In this episode we will demonstrate how we can use quickcheck-lockstep to verify the responses we get from a simple stateful API; as we will see, all of the lessons from property based testing for pure functions can be applied in this stateful setting also.

About the Haskell Unfolder

The Haskell Unfolder is a YouTube series about all things Haskell hosted by Edsko de Vries and Andres Löh, with episodes appearing approximately every two weeks. All episodes are live-streamed, and we try to respond to audience questions. All episodes are also available as recordings afterwards.

We have a GitHub repository with code samples from the episodes.

And we have a public Google calendar (also available as ICal) listing the planned schedule.

There’s now also a web shop where you can buy t-shirts and mugs (and potentially in the future other items) with the Haskell Unfolder logo.

by andres, edsko at May 14, 2025 12:00 AM

May 12, 2025

Monday Morning Haskell

Hey ChatGPT, Write me a Haskell Course?

In last week’s article, I discussed how Monday Morning Haskell courses compare to other Haskell courses that I’ve seen out there online. Obviously I’m wildly biased, but I think MMH courses have some serious advantages.

But there’s still the elephant in the room…how do my courses compare to the possibility of using generative AI (e.g. ChatGPT) to learn Haskell instead? While AI is a great tool that has opened up a lot of doors in terms of learning complex concepts, human-developed courses still have some important advantages over the current way you would learn from a chatbot.

Analogy: Going to the Library

I’ll start my case by drawing an (imperfect) analogy. Suppose you are enrolled in a college and want to learn a particular subject, like physical chemistry. You could enroll in your school’s physical chemistry course. Or you could spend the same amount of time going to the library. After all, the library has tons of books on physical chemistry. So you could read all these books and gain the same level of insight, right?

In this example, most people would recognize the shortcomings of just going to the library. For example, you are now responsible for determining the curriculum and course of study.

You could, of course, look at the table of contents of an introductory book and just run with that way of organizing the material. But how much of that do you need to learn? Most college courses don’t go all the way through a textbook, because the professor already has a good idea of which material is the most important, and has organized the course around that.

A professor will also know when and how to introduce supplemental material from other sources. If you’re just “learning from the library”, you’d be responsible for selecting which materials are the most important, and you probably aren’t qualified!

Also, while textbooks may have practice problems, and they may even have answers to those problems, you still have to do the work of figuring out which problems to study, and how many you need to study before you know the material. Taking a full course with assignments would solve this for you.

Finally, textbooks will rarely tell you about the human process of learning a particular subject. You probably aren’t going to read a sentence like “lots of students struggle to understand this, here’s a way of thinking about the problem that has helped a lot of them.” These are insights you’ll gain from working with a professor (who has taught real students) or other students in the class.

So let’s sum up the shortcomings of “learning from the library”:

  1. Direction - You must take on the cognitive overhead of determining which areas of the subject to study.
  2. Filtering - You must figure out how much detail is necessary, and how much practice you need to learn it.
  3. Human Learning Insight - Textbooks are generally lacking in the actual insights and breakthroughs that help students understand particularly challenging ideas.

From Physical to Online Learning

Now let’s consider what changes about the analogy if instead of comparing physical learning environments, we think about the current online learning environment. Entering an online course is significantly easier than enrolling in a university course. You don’t have to wait for the start of the semester, or go to a physical location.

But using ChatGPT as your “library” is vastly easier than studying from textbooks. In a matter of minutes, you can get tons of information on your screen that would have taken hours or days of effort at the library. And best of all, you can get information on virtually any topic, rather than just those that have pre-existing textbooks.

But I would still claim that using chatbots for learning shares some of the drawbacks of “learning from the library”. And for these reasons, it’s still worthwhile to consider online courses where they exist instead of relying solely on ChatGPT. Some of these drawbacks might seem counterintuitive, but let’s think about it.

Direction of Study

You might think, “I don’t need to set my own direction, ChatGPT will do that for me!” And yes, you can ask it to lay out a syllabus for you (I did this myself in one of the examples below). This will give you a list of topics to study.

But it won’t just write out the whole course for you based on this initial syllabus in one go. You have to keep prompting it to provide you with the information you want. And it will get sidetracked, consistently asking you to go deeper and deeper down particular rabbit holes.

So it’s still up to you to determine how much you really want to study about particular topics, and you need to maintain the discipline to pull it back out and shift gears. A human-designed course puts these limits in there for you, so that you don’t need to carry that cognitive load.

Filtering

This brings us to the next issue of “filtering”. ChatGPT will provide you with a lot of information, all at once. You’ll ask a simple question, and get a very complicated answer with lots of tables comparing various different ways of looking at the question.

Sometimes, this is nice. It will expose you to ideas you wouldn’t have thought of otherwise. Sometimes though, it’s very distracting. It takes you away from the core of what you’re trying to learn. You have to make sure you aren’t getting dragged into an infinite loop of concepts.

The “practice” problem also exists. ChatGPT can keep coming up with practice problems, but it’s up to you to know how many you really need to study. In our case study below, we’ll also consider that it’s not necessarily the best tool for coming up with practice problems.

Again, a human-designed course does the filtering and measuring for you.

Human Insight

Once at my job, I was reviewing a teammate’s code that implemented a complicated algorithm. I told him, “After I looked closely at this one particular line, my understanding of this algorithm went from like 30% to 70%, so adding an explanatory comment here would be very helpful!”

This experience helped me understand the idea of “knowledge inflection points”. These are the key insights that really help you understand a topic. I’ve had several of these with various Haskell concepts, from monads, to folds, to data structures and certain algorithms. I’ve done my best to incorporate these insights into my course content.

An example from Solve.hs might be my understanding of “the common API” of Haskell data structures. This made it much easier for me to reason about using different structures in Haskell.

An AI probably wouldn’t frame the issue in the way I did, unless you already have the knowledge to prompt it. AIs don’t have the experience of “learning” a concept piece by piece, and knowing when things finally “clicked”. You could try asking the chatbot what insights help people learn a topic, but it will only be able to piece that information together from what other people have written. On the whole, it still doesn’t beat the experience of someone who’s been there.

Human insights around learning are always going to get baked into a human-designed course, whereas AI is not generally going to be thinking in these terms.

Case Study: Learning Concurrency

I wanted to share a couple case studies that highlight some of the promise but also some of the frustrations with using AI for learning. Here’s a link to an extensive, multi-day study I did with ChatGPT to learn about concurrency topics. It helped me review a lot of topics I had learned in college (10 years ago), and also learn many new things. But there were still some pain points.

The “filtering” problem should be very evident. For each prompt I gave, ChatGPT provided tons of information. It was entirely up to me to figure out how much of this I really needed to know in order to be satisfied.

The “direction” problem is also clear. I started by asking for an organizational outline, and the chatbot duly obliged. But as I dug into certain topics, its preference was to ask me to keep going deeper down certain knowledge paths. I had to consistently drag it back to the syllabus it originally designed.

There were also no clear insights on what the key knowledge was. Over the course of the study, I figured some of these out for myself. But again, I had to filter through a lot of data to get there.

Another drawback I haven’t mentioned yet is the “memory” issue. Chatbots have limited, token-based memory, so they’ll forget what you’ve already learned over even a medium-length study. My concurrency study introduced the idea of a “lock-free queue” using compare-and-swap operations early on. ChatGPT reintroduced this idea later as if I had never heard of it. Human-designed courses will avoid this sort of behavior.

I didn’t ask for practice problems in this study, so let’s consider another case study where I was specifically looking to do this in Haskell.

Case Study: Dijkstra’s Algorithm

In this quick study, I asked ChatGPT to come up with a practice problem for learning Dijkstra’s algorithm. Some things were good about its response, but some things weren’t.

On the positive side, the code works, the tests work, and some of the follow-up suggestions are also pretty good. For example, putting a bound on the number of nodes your path can have, or allowing multiple start nodes, are simple extensions that didn’t occur to me when I was writing problems.

My main gripe is that the problems are a bit too obvious as graph problems. It started essentially with “implement Dijkstra’s algorithm” rather than giving me a practice problem using Dijkstra’s algorithm. And when I asked for a “disguised graph problem”, it gave me the delivery problem which wasn’t much of a disguise.

Also, the code used PSQueue, rather than the more beginner-friendly Data.Heap. This package may be better for certain things, but the type operator it uses would be a bit more confusing for a novice.

The line-by-line explanations were pretty good on the whole, but I don’t know that they’re a perfect substitute for really good visual/slide-based instructions like you would find in one of my courses.

With enough prompt engineering, you could get around these issues. But that’s exactly my point. It’s nice to not have to keep coming up with new prompts to get what you’re looking for, especially when you get a long explanation after every question.

Conclusion

Generative AI is a massive innovation for learning, especially on subjects that don’t have a lot of good guide material. But extensive, well-thought-out, human-designed content still has some significant advantages. The content is informed by the personal experience of someone who has actually been in your shoes and has had to learn something the same way you’ll learn it. This is not something an AI can relate to.

Prompt engineering involves a lot of cognitive effort. You have to constantly be directing the flow of what you’re supposed to learn, filter out the unnecessary parts, and then you have to learn it! While the freedom of being able to learn almost anything can be desirable, it can also be exhausting to always be directing the flow. It can be much easier and more helpful to just follow the lead of what another person has done.

I’ve used generative AI for learning and will continue to do so. But when human-designed content is available, I’ll look there first, and consider using AI as a supplement where I feel there are gaps.

When it comes to generating content, I don’t like AI as much, certainly not as a general purpose content producer. But it certainly has its uses. Looking back on course creation, I wish I had used it for writing test cases, for example. Another idea might be translating my work into other languages.

I’ll continue to experiment with AI going forward. But a solid guiding principle is that you should be using AI to enhance yourself, and not replace yourself. I still believe that human content has an edge over AI content for the same subject matter, so I encourage you to take another look at our courses at Monday Morning Haskell Academy, and to subscribe to our mailing list for future updates and discounts!

by James Bowen at May 12, 2025 08:30 AM

May 08, 2025

Mark Jason Dominus

A descriptive theory of seasons in the Mid-Atlantic

[ I started thinking about this about twenty years ago, and then writing it down in 2019, but it seems to be obsolete. I am publishing it anyway. ]

The canonical division of the year into seasons in the northern temperate zone goes something like this:

  • Spring: March 21 – June 21
  • Summer: June 21 – September 21
  • Autumn: September 21 – December 21
  • Winter: December 21 – March 21

Living in the mid-Atlantic region of the northeast U.S., I have never been happy with this. It is just not a good description of the climate.

I begin by observing that the year is not equally partitioned between the four seasons. The summer and winter are longer, and spring and autumn are brief and happy interludes in between.

I have no problem with spring beginning in the middle of March. I think that is just right. March famously comes in like a lion and goes out like a lamb. The beginning of March is crappy, like February, and frequently has snowstorms and freezes. By the end of March, spring is usually skipping along, with singing birds and not just the early flowers (snowdrops, crocuses, daffodils) but many of the later ones also.

By the middle of May the spring flowers are over and the weather is getting warm, often uncomfortably so. Summer continues through the beginning of September, which is still good for swimming and lightweight clothes. In late September it finally gives way to autumn.

Autumn is jacket weather but not overcoat weather. Its last gasp is in the middle of November. By this time all the leaves have changed, and the ones that are going to fall off the trees have done so. The cool autumn mist has become a chilly winter mist. The cold winter rains begin at the end of November.

So my first cut would look something like this:

Months                          Seasons
January – mid-March             Winter
mid-March – mid-May             Spring
mid-May – mid-September         Summer
mid-September – mid-November    Autumn
mid-November – December         Winter

Note that this puts Thanksgiving where it belongs, at the boundary between autumn (harvest season) and winter (did we harvest enough to survive?). Also, it puts the winter solstice (December 21) about one quarter of the way through the winter. This is correct. By the solstice the days have gotten short, and after that the cold starts to kick in. (“As the days begin to lengthen, the cold begins to strengthen”.) The conventional division takes the solstice as the beginning of winter, which I just find perplexing. December 1 is not the very coldest part of winter, but it certainly isn't autumn.

There is something to be said for it though. I think I can distinguish several subseasons — ten in fact:

Dominus Seasonal Calendar

Seasons                                Sub-seasons
Winter (January – mid-March)           Midwinter, Late winter
Spring (mid-March – mid-May)           Early spring, Late spring
Summer (mid-May – mid-September)       Early summer, Midsummer, Late summer
Autumn (mid-September – mid-November)  Early autumn, Late autumn
Winter (mid-November – December)       Early winter, Midwinter

Midwinter, beginning around the solstice, is when the really crappy weather arrives, day after day of bitter cold. In contrast, early and late winter are typically much milder. By late February the snow is usually starting to melt. (March, of course, is always unpredictable, and usually has one nasty practical joke hiding up its sleeve. Often, March is pleasant and springy in the second week, and then mocks you by turning back into January for the third week. This takes people by surprise almost every year and I wonder why they never seem to catch on.)

Similarly, the really hot weather is mostly confined to midsummer. Early and late summer may be warm but you do not get blazing sun and you have to fry your eggs indoors, not on the pavement.

Why the seasons seem to turn in the middle of each month, and not at the beginning, I can't say. Someone messed up, but who? Probably the Romans. I hear that the Persians and the Baha’i start their year on the vernal equinox. Smart!

Weather in other places is very different, even in the temperate zones. For example, in southern California they don't have any of the traditional seasons. They have a period of cooler damp weather in the winter months, and then instead of summer they have a period of gloomy haze from June through August.

However

I may have waited too long to publish this article, as climate change seems to have rendered it obsolete. In recent years, we have barely had midwinter, and instead of the usual two to three annual snows we have zero. Midsummer has grown from two to four months, and summer now lasts into October.

by Mark Dominus (mjd@plover.com) at May 08, 2025 10:39 PM

May 07, 2025

Well-Typed.Com

The Haskell Unfolder Episode 43: monomorphism restriction and defaulting

Today, 2025-05-07, at 1830 UTC (11:30 am PDT, 2:30 pm EDT, 7:30 pm GMT, 20:30 CET, …) we are streaming the 43rd episode of the Haskell Unfolder live on YouTube.

The Haskell Unfolder Episode 43: monomorphism restriction and defaulting

In this episode, we are going to look at two interacting “features” of the Haskell language (the monomorphism restriction and defaulting) that can be somewhat surprising, in particular to newcomers: there are situations where Haskell’s type inference algorithm deliberately refuses to infer the most general type. We are going to look at a number of examples, explain what exactly is going on, and why.

About the Haskell Unfolder

The Haskell Unfolder is a YouTube series about all things Haskell hosted by Edsko de Vries and Andres Löh, with episodes appearing approximately every two weeks. All episodes are live-streamed, and we try to respond to audience questions. All episodes are also available as recordings afterwards.

We have a GitHub repository with code samples from the episodes.

And we have a public Google calendar (also available as ICal) listing the planned schedule.

There’s now also a web shop where you can buy t-shirts and mugs (and potentially in the future other items) with the Haskell Unfolder logo.

by andres, edsko at May 07, 2025 12:00 AM

May 06, 2025

Philip Wadler

The Provocateurs: Brave New Bullshit

Following two sell-out shows at the Fringe last year, I'm on at the Fringe again:

11.30 Monday 4 August, Stand 2 w/Lucy Remnant and Susan Morrison
17.30 Sunday 17 August, Stand 4 w/Smita Kheria and Sarah-Jane Judge
17.30 Tuesday 19 August, Stand 4 w/Cameron Wyatt and Susan Morrison

Shows are under the banner of The Provocateurs (formerly Cabaret of Dangerous Ideas). Tickets go on sale Wednesday 7 May, around noon. Links above are preliminary; you may need to search to find the actual links. The official blurb is brief:

Professor Philip Wadler (The University of Edinburgh) separates the hopes and threats of AI from the chatbot bullshit.

by Philip Wadler (noreply@blogger.com) at May 06, 2025 09:00 AM

May 05, 2025

Matthew Sackman

Payslips and tax: calculating your own

In the UK, it’s very common that your employer pays you once a month. When this happens, they give you a document called a payslip, which has some numbers on it, such as how much your salary is, how much they paid you this month, how much went to HMRC in tax, how much went to your pension, and a few other numbers. But they never show any workings, so you really have no way to check whether any of these numbers are correct. There are plenty of online take-home-pay calculators, but these all focus on the full year; they have no facility to calculate your next payslip.

About half way through April 2024, I stopped working for one company. Everything was wrapped up – I received my final payslip from them, along with my P45. I then had a few months off, and started a new job in July 2024. When you start a new job it always takes a while for money things to get sorted out, for example pension enrolment and sorting out pension contributions, so it’s really worthwhile to keep a close eye on your payslips particularly for these first few months. Mine were arriving and some numbers looked right, but other numbers, such as the amount of tax I was paying, were changing dramatically, month to month. I had no idea why; whether they should be changing like that; whether they were going to keep changing or would eventually settle down. I had no way to check any of these numbers. Was I going to get in trouble with HMRC and get investigated?

I was also a little on edge because this was the first job where my pension contributions were using a thing called Qualifying Earnings. In all my previous jobs, if I chose for 10% of my salary to go into my pension, then that’s what would happen. But now there was this thing called Qualifying Earnings, which is (numbers correct at time of writing) a band from £6240 to £50,270. If you’re earning, say, £30k, then your x% contribution is actually x% of £30,000−£6240. If you’re earning above £50,270, then any further increase to your salary will not result in any extra contributions to your pension, because you’re above the band. The 2008 Pensions Act, which created the legal requirement for all employees to have workplace pensions and for automatic enrolment (with a minimum 8% combined contribution from the employer and employee), also created this concept of Qualifying Earnings. I consider this a pretty scummy way of reducing employer pension contributions for large firms. It complicates the maths and no doubt adds confusion for people trying to check their own payslips. Given that 74% of the population have pensions that are too small to retire on, this whole concept of Qualifying Earnings seems amoral at best.
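To see the effect concretely, take the £30k example above with a 10% employee contribution (band figures as quoted):

  10% of full salary:           0.10 × £30,000            = £3,000 per year
  10% of Qualifying Earnings:   0.10 × (£30,000 − £6,240) = £2,376 per year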

These days, a lot of smaller companies outsource their payroll processing. In my case, I was officially working for an international Employer of Record and they were then outsourcing payroll processing to local firms with country-specific expertise. So when I started asking questions, there was no ability to go and sit with someone and work through it. Or have a call. It was all messages passed across multiple different systems, and partial answers at best would come back several days later. Even if your payroll is done in-house, I strongly suspect that a lot of the time, some software package will be being used that does all the calculations and quite likely no one will actually understand or be able to explain the maths that’s going on.

After a while of getting nowhere, and after uncovering some substantial mistakes that had been made that affected me, I decided to spend some weekends actually figuring out how PAYE works, and writing some code that can calculate my next payslip. This library is available for anyone to use. There’s a README that hopefully explains the basic principles of how the calculations are done. This only works if your tax-code ends in an L, and it only works if you’re in National Insurance category A. All the code can do is use some details you provide to predict your next payslips. Also, I’m not a trained accountant or financial adviser, and even for my own payslips, every month, the numbers don’t quite match up (but they’re within £1). So please treat this as a toy, rather than the basis for building a payroll processor!

Getting started

The library is written in Go so you’ll need Go installed. Then, in a terminal do:

$ mkdir payslips
$ cd payslips
$ go mod init mypayslips
$ go get wellquite.org/tax@latest

Now we need to write a tiny amount of code. In your new payslips directory, create a main.go file, and open it in your editor. You want something like this:

package main

import (
   "fmt"
   "wellquite.org/tax"
)

func main() {
   payslips := tax.Payslips{
      {
         Year:                            2024,
         TaxCode:                         "1257L",
         Salary:                          tax.Yearly(50000),
         PensionType:                     tax.Salary,
         EmployeePensionContributionRate: 0.05,
         EmployerPensionContributionRate: 0.03,
      },
      {
         Salary:                          tax.Yearly(50000),
         PensionType:                     tax.Salary,
         EmployeePensionContributionRate: 0.05,
         EmployerPensionContributionRate: 0.03,
      },
      {},
      {
         Salary:                          tax.Yearly(60000),
         PensionType:                     tax.QualifyingEarnings,
         EmployeePensionContributionRate: 0.05,
         EmployerPensionContributionRate: 0.03,
      },
      {
         Salary:                          tax.Yearly(60000),
         PensionType:                     tax.QualifyingEarnings,
         EmployeePensionContributionRate: 0.15,
         EmployerPensionContributionRate: 0.03,
      },
      {
         Salary:                          tax.Yearly(60000),
         PensionType:                     tax.QualifyingEarnings,
         Expenses:                        116.08,
         EmployeePensionContributionRate: 0.15,
         EmployerPensionContributionRate: 0.03,
      },
   }

   payslips.Complete()
   fmt.Println(payslips)
}

We create a list of Payslips. The first payslip must specify a year, and your tax-code. These details are automatically applied to the payslips that follow, if not explicitly provided. Many of the calculations rely on year-to-date totals, and so we must have a complete record of your payslips from the start of the tax year. So that means the first payslip is month 1 (in this example, April 2024), then month 2 (May 2024) and so on. If you have no income for a month then you can just put in an empty payslip ({}). The above example describes being paid in April and May 2024, then nothing in June, and then being paid (with a higher salary) in July, August and September.

Save this main.go file. Then, back in your terminal, in your payslips directory, just do:

go run main.go

You should get some output showing all sorts of calculations, including income tax, and personal allowance. With a little luck, if you change the numbers to match your own salary and other details, the numbers produced should match your own payslips quite closely, provided nothing you’re doing is too exotic.

There is documentation for all the different fields that you can provide in each payslip. In general, the code will try to fill in missing values. It should be able to cope with things like salary-sacrifice, or, if you change job within a month and have several payslips for the same month, this should work too. Everything is run locally on your computer: please feel free to check the source – there are no 3rd party libraries at all, and nothing imports the net package. It’ll work just the same if you yank out your network cable or disable your WiFi.

Note however, this code is lightly tested. Whilst it works for me (and one or two friends), I make no claims that it correctly models the entirety of PAYE, so it may very well not work for you. Feedback, contributions, corrections, and patches are all very welcome!

May 05, 2025 02:30 PM

Monday Morning Haskell

Comparing Courses: MMH vs. The Rest

Due to some technical issues, our Spring Sale has been extended! You have until Monday, May 12 to get 20% off of all our courses and bundles with the code SOLVE25, and you can get an even bigger 30% discount if you subscribe to our mailing list.

Having now released the final portion of Solve.hs (probably my last course for a while) I wanted to consider the broader landscape of Haskell courses. What other courses are out there? Are they better than mine?

So I’ve actually purchased a few other Haskell courses, and spent a decent amount of time going through their material. I may not be the smartest person to write a Haskell course and I definitely don’t have the most industry experience with Haskell. But, having explored some of these other courses, I think there are some good reasons to consider my courses among the top tier in the Haskell community.

So on the last day of this sale, I wanted to explore a few areas where I think my courses stand out above the rest.

Breadth of Material

There’s a common thread among most Haskell material out there, including and especially courses: they generally all cover the same topics. You can expect to see all of the following in a Haskell course:

  1. Basic Syntax and Types
  2. Typeclasses and polymorphism
  3. Basic Recursion
  4. Understanding Functors, Applicatives & Monads
  5. Using the IO monad
  6. Basic use of the Map type

In some cases, you’ll also see something like a basic web server. And there’s a good reason for this progression. I covered the same material in Haskell From Scratch!

But there’s generally a lack of material in a lot of cool and interesting areas. I’ve done my best to cover a lot of these areas throughout my courses. Here are some of those topics, and the corresponding courses that cover them.

  1. Data structures (beyond lists and maps) - Solve.hs
  2. Algorithms - Solve.hs
  3. Parsing Complex Data - Making Sense of Monads, Solve.hs
  4. Advanced Web Servers - Practical Haskell
  5. Complex Effect Stacks - Effectful Haskell, Practical Haskell
  6. Unit Testing Details - Practical Haskell
  7. Machine Learning - The Haskell Brain

Simply put, I haven’t found a Haskell resource anywhere else that puts all these concepts together in a course-like environment. You could potentially find some blog posts that discuss them, or read the documentation, but this leads to the next point.

Detailed and Challenging Exercises

Reading by itself is rarely enough to retain knowledge, especially when it comes to programming. If you read a great article about unit testing in Haskell, you’ll probably forget all the details and have to go back to it the next time you actually want to use the ideas.

You can try to follow along with the article by writing the code in your own IDE. But you’ll still probably end up just copying things, which also isn’t the best way to learn.

You can even try to devise your own project to use the knowledge. But there’s often a significant cognitive effort involved in coming up with a new idea that fits these requirements…different enough from the article that you’re actually testing yourself, but similar enough that you can actually apply the concept.

Great programming courses should provide exercises so that you can try the techniques in your own environment, without a spoon-fed answer already available to you. They should remove the overhead of coming up with your own way to test yourself, while also providing rapid feedback on whether or not you’ve succeeded.

A lot of Haskell courses I’ve seen don’t satisfy these criteria. Some have no exercises at all. And the ones that do often have at least one of the following issues:

  1. Only 1-2 problems per lecture
  2. Problems are too easy
  3. Lack of test cases
  4. No starter code (i.e. you’re only given a written description)
  5. No toolchain integration (i.e. you’re just given a file, but no project to work with or limited build instructions)

Every course on Monday Morning Haskell Academy comes with detailed exercises to help you learn the material. You’ll usually get several problems per lecture (4-6), and the starter code for these problems comes with full toolchain integration and instructions, plus automated unit test cases.

Difficulty is always going to be a bit subjective, but for most lectures I’ve made an effort to have some easier problems as well as more challenging ones.

Lecture Content and Slides

Naturally, the core content of the course is the lecture materials, so it’s worth talking about that as well! Some Haskell courses rely strictly on written material, but most incorporate slides and audio presentation.

For the most part, course authors do a fine job with their slides. But I think I go above and beyond the norm by using bold text to highlight the most important parts of the code presented, and using colors to show the relationship between different elements on the same slide and across slides.

With our courses, you’re able to get the slides as a downloadable asset. And with the level of detail on them, they serve as a useful reference for you to quickly come back to, even without listening again to the lecture audio.

Other Guarantees

Finally, it’s worth noting that our courses and bundles all come with a 14-day money back guarantee. If you don’t like the materials, you can get a refund within 14 days with no questions asked.

Additionally, all our courses guarantee lifetime access to the content. There’s no recurring subscription. So if your life is too busy to go through the full course right now, you can always save it for later!

So you may as well take a look at our course listings now, since you can get a 20% discount using the code SOLVE25 (today only!). If you subscribe to our mailing list, you’ll get an extra 10% off as well.

Our new bundles (e.g. Beginners & Advanced) are a great way to save money while exploring the full breadth of Haskell materials and topics we have to offer. If you get MMH Complete, you’ll get lifetime access to all our course content, past, present and future! So don’t miss out, take advantage of the sale today!

by James Bowen at May 05, 2025 08:30 AM

Sandy Maguire

Analyzing API Design via Algebraic Laws

The other day, someone asked:

Why doesn’t [the Data.Map function] unionWith :: (a -> a -> a) -> Map k a -> Map k a -> Map k a allow for different value types the way intersectionWith :: (a -> b -> c) -> Map k a -> Map k b -> Map k c does?

This is a very reasonable question, and it led down an interesting rabbit hole at the intersection of API design and efficient implementation.

To answer the original question, what would the type of a different value type of unionWith look like? It would be something in the flavor of:

unionWith :: (Maybe a -> Maybe b -> c) -> Map k a -> Map k b -> Map k c

But this new Maybe a -> Maybe b -> c parameter is somewhat lossy, in that it gives the impression that it could be called with Nothing Nothing as parameters, which doesn’t fit into the vibe of being a “union.”

So instead we could restrict that possibility by using These a b:

data These a b = This a | That b | These a b

unionWith :: (These a b -> c) -> Map k a -> Map k b -> Map k c

which seems reasonable enough.


But let’s take reasonableness out of the picture and start again from first principles. Instead let’s ask ourselves the deep philosophical question of what even IS a map?

A Map k v is a particularly efficient implementation of functions with type k -> Maybe v. But why is this Maybe here? It’s really only to encode the “default” value of performing a lookup. Nothing goes wrong if we generalize this to be Monoid v => k -> v. In fact, it helps us make sense of the right bias present in Data.Map, where we see:

lookup k (singleton k v1 <> singleton k v2) = Just v2

This equality is hard to justify under the normal understanding of Map k v being an encoding of a function k -> Maybe v. But under the general monoid interpretation, we get a nice semigroup homomorphism:

lookup k (m1 <> m2) = lookup k m1 <> lookup k m2

where the monoid in question has been specialized to be Last.
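
To make that specialization concrete, here’s a quick sanity check using the Last wrapper from Data.Monoid (just an illustration, not part of the construction):

import Data.Monoid (Last(..))

-- Last keeps the rightmost Just, which is exactly the right bias above:
rightBias :: Bool
rightBias = Last (Just 1) <> Last (Just 2) == Last (Just (2 :: Int))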

Of course, we also have a monoid homomorphism:

lookup k mempty = mempty

Let’s re-evaluate the original question in terms of this newly-generalized Map. Now that we’ve removed all of the unnecessary baggage of Maybe, we can again think about the desired type of unionWith:

unionWith
    :: (a -> b -> c)
    -> Map k a
    -> Map k b
    -> Map k c

which looks awfully familiar. This new type signature automatically resolves our original concerns about “what should we do if the key isn’t present?”—just call the function with mempty as a parameter!

We can give some semantics as to what unionWith ought to do again by relating it to the observation lookup. The relevant law here seems like it ought to be:

lookup k (unionWith f m n) = f (lookup k m) (lookup k n)

By choosing a degenerate function f, say, \_ _ -> nontrivial, where nontrivial is some value that is not mempty, we can see the beginnings of a problem:

  lookup k (unionWith f m n)
=
  f (lookup k m) (lookup k n)
= <let f = \_ _ -> nontrivial>
  nontrivial

Regardless of the key we look up in our unionWithed Map, we need to get back nontrivial. How can we implement such a thing? I see only two ways:

  1. explicitly associate every key in the map with nontrivial, or
  2. keep nontrivial around as a default value in the map

#1 is clearly a non-starter, given that we want our Maps to be efficient encodings of functions, which leaves us with only #2. This is actually a pretty common construction, which stems immediately from the fact that a pair of monoids is itself a monoid. The construction would look something like this:

data Map k v = Map
  { defaultValue :: v
  , implementation :: Data.Map.Map k v
  }
  deriving stock Generic
  deriving (Semigroup, Monoid) via (Generically (Map k v))
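
On this representation, lookup would presumably just fall back to the default — a minimal sketch of my own, building on the definition above:

import Prelude hiding (lookup)
import Data.Maybe (fromMaybe)
import qualified Data.Map

-- Look up a key, falling back to the map's default value.
lookup :: Ord k => k -> Map k v -> v
lookup k (Map def imp) = fromMaybe def (Data.Map.lookup k imp)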

unionWith
    :: (a -> b -> c)
    -> Map k a
    -> Map k b
    -> Map k c
unionWith f (Map def1 imp1) (Map def2 imp2) =
  Map (f def1 def2) (liftA2 f imp1 imp2)

Seems fine, right? The nail in the coffin comes from when we reintroduce our semigroup homomorphism:

lookup k (m1 <> m2) = lookup k m1 <> lookup k m2

Without loss of generality, take m2 = pure nontrivial (where pure is just unionWith with a constant function.) This gives us:

lookup k (m1 <> pure nontrivial) = lookup k m1 <> nontrivial

Making this thing efficient is a further complication! We again have two options:

  1. modify the value at every key by multiplying in nontrivial, or
  2. find a way of suspending this computation

#1 clearly requires \(O(n)\) work, which again forces us to look at #2. But #2 seems very challenging, because the monoidal values we need to suspend need not span the entire Map. For example, consider a Map constructed a la:

(pure prefix1 <> ((pure prefix2 <> m) <> n)) <> (p <> pure suffix)

Representing this thing efficiently certainly isn’t impossible, but you’re not going to be able to do it on the balanced binary search trees that underlie the implementation of Data.Map.Map.


I find this quite an interesting result. I always assumed that Data.Map.Map (or at least, Data.Map.Monoidal.MonoidalMap) didn’t have an Applicative instance because it would require a Monoid constraint on its output—but that’s not the sort of thing we can express in Haskell.

But the analysis above says that’s not actually the reason! It’s that there can be no efficient implementation of Applicative, even if we could constrain the result.

What I find so cool about this style of analysis is that we didn’t actually write any code, nor did we peek into the implementation of Data.Map (except to know that it’s implemented as a balanced BST.) All we did was look at the obvious laws, instantiate them with degenerate inputs, and think about what would be required to efficiently get the right answer.

May 05, 2025 07:33 AM

May 04, 2025

Sandy Maguire

Using Obscure Graph Theory to solve PL Problems

Usually I write about solutions to problems I’ve worked out, but I’ve found myself increasingly interested in where solutions come from. Maybe it’s because I’ve been reading Boorstin’s excellent The Discoverers, which I’d strongly recommend.

Regardless of why, I thought I’d switch up the usual dance step today, and discuss what solving my most-recent-big-problem actually looked like, in terms of what I tried, where I looked, and what the timeline was.

The Problem

The problem is to serialize a program graph into a series of let-bindings. For example, given the following graph:

      +
    /   \
  f ---> g
  |     / \
  a     \ /
      expensive

which represents the program:

f a (g expensive expensive) + g expensive expensive

Unfortunately, this is a naive representation of the program, since it duplicates the work required to compute expensive four times, and g expensive expensive twice. Instead, we would prefer to generate the equivalent-but-more-efficient program:

let $0 = expensive
    $1 = g $0 $0
 in f a $1 + $1

This transformation is affectionately known as sharing, since it shares the computed answer whenever there is repeated work to be done.

So this is what we’re trying to do. Given the original graph, determine the best place to insert these let-bindings, for some reasonable definition of “best.” We can assume there are no side effects involved, so any place that an expression is well-scoped is an acceptable solution.

In order to understand some of my attempted solutions, it’s worth noting that our final solution should build something of type Expr, and the original graph is represented as an IntMap (ExprF Int). ExprF is the Base functor of Expr, with all of its self-references replaced by some type variable, in this case Int. Thus, the graph above looks much more like:

_ :: IntMap (ExprF Int)
_ = IM.fromList
  [ (0, Apply "+" [1, 3])
  , (1, Apply "f" [2, 3])
  , (2, ...)  -- a
  , (3, Apply "g" [4, 4])
  , (4, ...)  -- expensive
  ]

The Original Solution

I spent over a year trying to solve this problem, with various mostly-working solutions during that time. My strategy here was to think really hard, write up some algorithm that seemed plausible, and then run it against our (small) battery of integration tests to make sure it got the same answer as before.

Why not property test it? I tried, but found it very challenging to implement well-typed generators that would reliably introduce shared thunks. But maybe there’s a different lesson to be learned here about writing good generators.

Anyway. For eight months, one of these think-really-hard algorithms fit the bill and didn’t give us any problems. It was a weird, bespoke solution to the problem that independently kept track of all of the free variables in every graph fragment, and tried to let-bind a fragment as soon as we landed in a context where all of the free variables were in scope. It seemed to work, but it was extremely messy and unmaintainable.

At the time of writing, this sharing algorithm was the only source of let-binds in our entire language, which meant that it didn’t need to account for let-binds in the program.

Of course, that invariant eventually changed. We added a way in the source language to introduce lets, which meant my algorithm was wrong. And I had written it sufficiently long ago that I no longer remembered exactly why it worked. Which meant the theory of my program was lost, and thus that we ought to rewrite it.

Unfolding a Solution

I went back to the problem statement, and stared at it for a long time (back to the think-really-hard algorithm!) Upon staring at the problem, I realized that what I was really trying to do was determine where diamond patterns arose in the program graph.

Recall our original graph:

      +
    /   \
  f ---> g
  |     / \
  a     \ /
      expensive

If we redraw it such that g is on a different rank than f, then the two diamond patterns become much clearer:

      +
    /  \
  f     |
  | \   |
  a  \ /
      g
     / \
     \ /
   expensive

The insight I came up with is that if a node n is the source of a diamond, then we must let-bind the sink of the diamond immediately before inlining the definition of n.

This gives rise to the question of “how do we identify a diamond?” What we can do is give a mapping from each node to its reachable set of nodes. For example, in the above, we’d compute the map:

+         -> {+, f, a, g, expensive}
f         -> {f, a, g, expensive}
a         -> {a}
g         -> {g, expensive}
expensive -> {expensive}
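
One way to compute such a table is to tie the knot over the graph — a sketch of mine, assuming the graph is given as an adjacency map from each node to its immediate subterms (lazy Data.IntMap makes this work, and it terminates because the graph is acyclic):

import qualified Data.IntMap as IM
import qualified Data.IntSet as IS

-- Map each node to the set of nodes reachable from it, itself included.
reachable :: IM.IntMap [Int] -> IM.IntMap IS.IntSet
reachable graph = result
  where
    result = IM.mapWithKey go graph
    go node children =
      IS.insert node (IS.unions [ result IM.! c | c <- children ])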

Then when we go to inline a node, say, +, we can look for any nodes that are reachable via more than one of its immediate subterms. Since the immediate subterms of + are f and g, we can take the intersections of their reachable sets:

{f, a, g, expensive} ∩ {g, expensive}

giving us

{g, expensive}

which is exactly the set of nodes that we need to perform sharing on. If you topologically sort this set, it gives you the order that you should perform your let bindings.
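
That check — which nodes are reachable via more than one immediate subterm — can be sketched as the union of the pairwise intersections of the children’s reachable sets (again, the names here are mine):

import Data.List (tails)
import qualified Data.IntMap as IM
import qualified Data.IntSet as IS

-- Nodes reachable from at least two of the given children.
sharedBelow :: IM.IntMap IS.IntSet -> [Int] -> IS.IntSet
sharedBelow reach children = IS.unions
  [ IS.intersection r r' | (r : rest) <- tails rs, r' <- rest ]
  where
    rs = [ reach IM.! c | c <- children ]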

EXCEPT there’s a kink in the whole thing. What happens if one of the terms in this diamond contains free variables? In particular, we might have something like this:

      +
    /  \
  f     |
  | \   |
  a  \ /
      λx
     / \
     \ /
   expensive
      |
      x

This gives us an analogous set of reachable nodes when we look at +, but we obviously can’t lift expensive x above the lambda.

Resolving this problem required giving up on the notion of memoizing the entire reachable set of nodes, and to instead crawl the graph ensuring that everything is well-scoped.

Performance Woes

My algorithm looked fine, and, importantly, got the right answer in a reasonable amount of time on our (small) battery of integration tests. So I shipped it, commended myself on a job well done, and thought nothing more about it. For about a week, until a bug report came in saying that our compiler now seemed to hang on big programs.

Which was something I hadn’t noticed, since we didn’t have any big programs in our integration tests.

Damn!

Upon digging in to what exactly was so slow, I noticed that my algorithm was accidentally quadratic. I needed to fold over every node in the graph, and that required looking at the entire reachable set underneath it. I had put in some of the obvious safeguards, hoping that they would prune the search tree early, but it wasn’t enough sacrifice for the Great God of Asymptotes.

Did I mention that at this point in the story, having this algorithm working fast was on the critical path of the company? Everybody else was blocked on me figuring this out. Talk about pressure!

Anyway. You’ll notice above that in my description of the algorithm, everything sounds fine. But the juice is in the details, as the common saying goes. Computing reachability isn’t quite the right thing to be using here, as it gave us the wrong answer for the lambda example above. Which is unfortunate because reachability is something we can do in linear time.

And then when reachability didn’t work, I just threw away the fast performance and hoped my bespoke algorithm would do the job. My only redemption comes from the fact that at least it got the right answer, even if it did so very slowly.

Finding the Kernel

Back to the drawing board.

Whenever I have graph theory problems, I call up my boy Vikrem. He’s good at nerd stuff like this.

We rubberducked the problem, and tried to reframe it in the language of graph theory. We had a Merkiv–Maguire moment where we independently realized that the goal was somehow related to finding the lowest common ancestor (LCA) of a node.

Which is to say, roughly, that we are looking for forks in the diamond diagram. Which we already knew, but it was nice to have some language for.

Our new problem is that LCA is defined only over trees. There are some extensions to DAGs, but none of them seem to be particularly well founded. However, searching for exactly that brought me to this stackoverflow question, where nestled in the comments is someone suggesting that the poster isn’t looking for the LCA, but instead for a related notion, the lowest single common ancestor. The LSCA is defined in a 2010 paper, New common ancestor problems in trees and directed acyclic graphs.

The standard definition of LCA(x, y) = l is that “l is an ancestor of x and of y, and that no descendent of l has this property.”

But the definition of LSCA(x, y) = l is that “l lies on all root-to-x paths, and that l lies on all root-to-y paths, and that no descendent of l has this property.”

The distinction between the two is easily seen in the following graph:

  0
 / \
1   2
| X |
3   4

Under the standard definition, LCA is not uniquely defined for DAGs. That is, LCA(3, 4) = {1, 2}. But neither 1 nor 2 lies on all paths from the root. Under LSCA therefore we get LSCA(3, 4) = 0, which is the obviously-correct place to let-bind 3 and 4.

The paper gives a preprocessing scheme for computing LSCA by building a “lowest single ancestor” (LSA) tree. The LSA of a node is the LSCA of all of its in-edges. This definition cashes out to mean “the most immediate diamond above any node.” Finally! This is exactly what we’re looking for, since this is where we must insert our let-bindings! Even better, the paper gives us an algorithm for computing the LSA tree in linear time!

The First Implementer

Of course, I’m lazy and would prefer not to implement this thing. So instead I searched on hackage for lsca, and found nothing. But then I searched for lca and found that, like always, Ed Kmett was 13 years ahead of me.

The lca package implements an \(O(\log n)\) algorithm for computing the LCA of any two nodes in a graph. Which is very convenient for me, since the LSCA algorithm requires being able to do this.

Time to roll up the sleeves and get cracking I suppose.

The paper was surprisingly straightforward, and my first attempt implemented the (imperative) algorithms as given (imperatively.) The first step is to do a topological sort on the DAG in order to know in which order one ought to unfold the LSA tree.

But as is so often the case, this topological sort isn’t actually relevant to the algorithm; it’s just an encoding detail of expressing the algorithm imperatively. But you don’t need that when you’ve got laziness on your side! Instead you can just tie the knot and do something cool like this:

lsaTree :: Ord v => Map v (Set v) -> Map v (Path v)
lsaTree input = fix $ \result -> M.fromList $ do
  (node, parents) <- M.toList input
  let parentResults = fmap (result M.!) parents
  ...

Notice how we use fix to bind the eventual result of the final computation. Then we can chase pointers by looking them up in result—even though it’s not yet “computed.” Who cares what order the computer does it in. Why is that a thing I should need to specify?

Anyway. The exact details of implementing LSA are not particularly important for the remainder of this blog post. If you’re interested, you can peep the PR, which is delightfully small.

Tying It All Back Together

Equipped with my LSA tree, I was now ready to go back and solve the original problem of figuring out where to stick let-bindings. It’s easy now. Given the original program graph, find the LSA for each node. The LSA is the place you should insert the let binding.

So given the map of nodes to their LSAs, invert that map and get back a map of nodes to descendents who have this node as an LSA. Now when you go to inline a node, just look up everything in this map and inline it first.
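
The inversion itself is a one-liner — a sketch, with names of my own choosing:

import qualified Data.Map as M

-- From each node's LSA, build: for each node, the nodes that must be
-- let-bound immediately before it is inlined.
invertLSA :: Ord v => M.Map v v -> M.Map v [v]
invertLSA lsa = M.fromListWith (++) [ (l, [n]) | (n, l) <- M.toList lsa ]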

It turns out to be a very elegant solution. It’s one third of the length of my horrible ad-hoc implementations, and it runs in linear time of the number of nodes in the graph. All in all, very good.

More often than I’m comfortable about, people will ask me how I can have so many good ideas. And what I like about this story is that it’s pretty typical of how I actually “have” “good” ideas. I’m reminded of the fact that luck favors the prepared mind. Attentive readers will notice that none of this process was due to brilliance on my part. I happened to know Vikrem, who’s a genius. Together we pulled at some ancient graph theory strings and remembered a fact that someone else had thought important to teach us. That wasn’t actually the right path, but it led us to stackoverflow, where someone had linked to a relevant paper. I implemented the paper using a library that someone else had done the heavy lifting on, and simplified the implementation using this knot-tying trick I picked up somewhere along the way.

Also, I’m just really pleased that the solution came from trying to reverse engineer the relevant graph-theory search terms. Maybe that’s the actual takeaway here.

May 04, 2025 08:05 AM

May 03, 2025

Mark Jason Dominus

Claude and I write a utility program

Then I had two problems…

A few days ago I got angry at xargs for the hundredth time, because for me xargs is one of those "then he had two problems" technologies. It never does what I want by default and I can never remember how to use it. This time what I wanted wasn't complicated: I had a bunch of PDF documents in /tmp and I wanted to use GPG to encrypt some of them, something like this:

    gpg -ac $(ls *.pdf | menupick)

menupick is a lovely little utility that reads lines from standard input, presents a menu, prompts on the terminal for a selection from the items, and then prints the selection to standard output. Anyway, this didn't work because some of the filenames I wanted had spaces in them, and the shell sucks. Also because gpg probably only does one file at a time.

I could have done it this way:

    ls *.pdf | menupick | while read f; do gpg -ac "$f"; done

but that's a lot to type. I thought “aha, I'll use xargs.” Then I had two problems.

    ls *.pdf | menupick | xargs gpg -ac

This doesn't work because xargs wants to batch up the inputs to run as few instances of gpg as possible, and gpg only does one file at a time. I glanced at the xargs manual looking for the "one at a time please" option (which should have been the default) but I didn't see it amongst the forest of other options.

I think now that I needed -n 1 but I didn't find it immediately, and I was tired of looking it up every time when it was what I wanted every time. After many years of not remembering how to get xargs to do what I wanted, I decided the time had come to write a stripped-down replacement that just did what I wanted and nothing else.

(In hindsight I should perhaps have looked to see if gpg's --multifile option did what I wanted, but it's okay that I didn't, this solution is more general and I will use it over and over in coming years.)

xar is a worse version of xargs, but worse is better (for me)

First I wrote a comment that specified the scope of the project:

    # Version of xargs that will be easier to use
    #
    # 1. Replace each % with the filename, if there are any
    # 2. Otherwise put the filename at the end of the line
    # 3. Run one command per argument unless there is (some flag)
    # 4. On error, continue anyway
    # 5. Need -0 flag to allow NUL-termination

There! It will do one thing well, as Brian and Rob commanded us in the Beginning Times.

I wrote a draft implementation that did not even do all those things, just items 2 and 4, then I fleshed it out with item 1. I decided that I would postpone 3 and 5 until I needed them. (5 at least isn't a YAGNI, because I know I have needed it in the past.)

The result was this:

    import subprocess
    import sys

    def command_has_percent(command):
        for word in command:
            if "%" in word:
                return True
        return False

    def substitute_percents(target, replacement):
        return [ s.replace("%", replacement) for s in target ]

    def run_command_with_filename(command_template, filename):
        command = command_template.copy()
        if not command_has_percent(command):
            command.append("%")

        res = subprocess.run(substitute_percents(command, filename), check=False)
        return res.returncode == 0

    if __name__ == '__main__':
        template = sys.argv[1:]
        ok = True
        for line in sys.stdin:
            if line.endswith("\n"):
                line = line[:-1]
            if not run_command_with_filename(template, line):
                ok = False
        exit(0 if ok else 1)

Short, clean, simple, easy to use. I called it xar, ran

    ls *.pdf | menupick | xar gpg -ac

and was content.

Now again, with Claude

The following day I thought this would be the perfect opportunity to try getting some LLM help with programming. I already had a baseline version of xar working, and had thought through the problem specification. Now I could do it over with the LLM and compare the two results. The program I wanted was small and self-contained. If the LLM sabotaged me somehow, I would probably notice, and if I didn't, it wouldn't matter, because I had a working version already.

So I asked Claude about it. I knew Claude was not the right choice, because Claude is not really designed for this task, and has no integration with the coding editor, but Claude is what I am familiar with so I thought I'd give it a shot.

On the whole it went extremely well. The complete transcript is here. I imagine this was pretty much a maximally good experience, that all the terrible problems of LLM coding arise in larger, more complicated systems. But this program is small and self-contained, with no subtleties, so the LLM could be at its best.

Phase 1

First I just told Claude what I wanted and let it write the program from scratch.

Let's write a program in Python called xar. It will be a stripped-down version of the standard Unix xargs utility, but it will do less and be simpler to use.

It will get command-line arguments that specify a command template. Then it will read lines from stdin. Each line will be appended to the command template to produce a command. The program will run the resulting command, one command for each line of input.

Any questions?

Default Claude is very chatty and I am going to omit most of its chatty comments, which make me impatient.

The program it wrote was not what I would have written, but it was good enough. If I had just used it right off the bat, instead of writing my own, it would have been enough, and it would have taken somewhere between 2% and 10% as long to produce.

So the one-line summary of this article is: I should stop writing simple command-line utilities, and just have Claude write them instead, because Claude's are good enough, and definitely better than yak-shaving.

Here's Claude's initial version:

    #!/usr/bin/env python3

    import sys
    import subprocess
    import shlex

    def main():
        if len(sys.argv) < 2:
            print("Usage: xar command [args...]", file=sys.stderr)
            sys.exit(1)

        command_template = sys.argv[1:]

        for line in sys.stdin:
            line = line.rstrip('\n')

            full_command = command_template + [line]

            try:
                result = subprocess.run(full_command, check=True)
            except subprocess.CalledProcessError as e:
                print(f"Command failed with exit code {e.returncode}: {' '.join(full_command)}", 
                      file=sys.stderr)
            except FileNotFoundError:
                print(f"Command not found: {full_command[0]}", file=sys.stderr)
                sys.exit(1)

    if __name__ == "__main__":
        main()

Claude's version had numerous comments, which I have omitted. I later told it to stop putting in comments, which it did.

Claude's use of check here was not what I wanted, because that makes subprocess.run raise an exception when the subcommand fails, and then the exception has to be immediately caught and handled. My original control flow had been simpler:

        res = subprocess.run(substitute_percents(command, filename), check=False)
        return res.returncode == 0

        …

        if not run_command_with_filename(template, line):
            ok = False

Claude's program pulled in shlex without noticing that it was unused. But Claude did teach me about str.rstrip('\n') which I had not known about before (or maybe had forgotten), so that was one small win already.

Argument parsing

The next step was a big win. Python's library for command-line argument handling is called argparse and it is really nice. If I were still writing programs in Perl, I would implement a Perl version of argparse because Perl has 29 available argument parsing libraries and they are all loathsome. The one problem with argparse is I never remember off the top of my head how to use it. I think the module is called argparse but it provides a class called Argparser but I often get these backward and try to use argparser and Argparse.

Instead of figuring it out every time I usually dig up some previous Python program and then copy-paste the argument parser from there, amending it to suit the purpose.

But this time I didn't have to do that. Instead, I just said to Claude:

This is good, now please add code at the top to handle argument parsing with the standard Argparse library, even though there are no options yet.

Claude handed me back pretty much the same program, but with the argument parser at the top:

    def parse_arguments():
        """Parse command-line arguments using argparse."""
        parser = argparse.ArgumentParser(
            description="A simplified version of xargs that appends each line from stdin to a command template.",
            prog="xar"
        )

        # Store all positional arguments as the command to execute
        parser.add_argument(
            "command", 
            nargs="+", 
            help="The command template to execute for each input line"
        )

        return parser.parse_args()

Let's pause for a moment. Maybe you kids are unimpressed by this. But if someone had sent this interaction back in time and showed it to me even as late as five years ago, I would have been stunned. It would have been completely science-fictional. The code that it produced is not so impressive, maybe, but that I was able to get it in response to an unstructured request in plain English is like seeing a nuclear-powered jetpack in action.

I make some changes

It bothered me that the command-line arguments were being taken implicitly from sys.argv; that makes the program hard to test. A better way to do this is to have the list of command-line arguments be a parameter to the main function, and then for the runtime to call main(sys.argv[1:]) instead of just main(). That way you can test the program without having to fake up sys.argv. So I changed that, and while I was making changes I got rid of Claude's comments and the unnecessary exception handling. I pasted the result back at Claude, saying:

Good so far. I have adjusted your source code in several ways:

  1. I removed the comments.
  2. I removed the exception handling for the subcommand. If one of the subcommands fails, that no longer terminates the main loop
  3. Command-line arguments are passed to the main() function rather than being accessed directly

Here is the code. Tell me if you see any problems or if you have any questions about it

I had also sabotaged the code with at least one likely error. Python's subprocess.run returns a “result” structure that includes the subcommand's output and exit status, and I can never remember the name of the component that records the exit status. I either look it up or get it by trial-and-error. This time I didn't bother to do either; I just guessed that it was result.exitcode, which I handed off to Claude even though I thought it was probably wrong.

Claude's reply said:

Here are a couple of issues I noticed:

There's an error in the error handling code - you're using result.exitcode but in the subprocess.run() function, the attribute is called returncode not exitcode

Awesome, now I don't have to remember that name of that component.

Claude continued:

The parse_arguments() function definition is missing in the pasted code, but it would need to be updated to accept the argv parameter

I had passed sys.argv[1:] to main, and then had main call parse_arguments(argv). But I hadn't changed parse_arguments to expect an argument list. (I hadn't included it in my paste at all, in fact.) Claude was exactly right, and continued with the corrected version:

    #!/usr/bin/env python3

    import sys
    import subprocess
    import argparse

    def parse_arguments(argv):
        …
        return parser.parse_args(argv)

    def main(argv):
        # Parse command-line arguments
        args = parse_arguments(argv)
        …

    if __name__ == "__main__":
        ok = main(sys.argv[1:])
        exit(0 if ok else 1)

Handling NUL-separated values

I said

It's great, thanks for catching those errors. Now let's add a -0 command-line option that tells the program to split the input on "\0" characters instead of at line breaks.

Recall that my handwritten xar did not do this yet; I had left that feature for another day. Claude just added it for me, adding

    parser.add_argument(
        "-0", "--null",
        action="store_true",
        help="Input items are terminated by a null character instead of by newline"
    )

to the argument parser and changing the main loop to:

    # Split input based on the delimiter option
    if args.null:
        # Split on null characters
        items = stdin_data.split('\0')
    else:
        # Split on newlines
        items = stdin_data.splitlines()

    for item in items:
        if not item:
            continue

        …

I was tired of Claude's comments, so I said

Give me the same thing, with no comments

which it did, so I said

From now on leave out the comments. I'm a Python expert and I don't need them. If there is something I don't understand I will ask you.

Claude complied. If I were going to use Claude again in the future I would include that in the canned instructions that Claude is given up front. Instead I will probably use a tool better-suited to programming, and do whatever the analogous thing is.

Template filling

Now I told Claude to add the % feature:

I changed if not item to if item == "". Now let's make the following change:

  • If the command template includes any % characters, each of these should be replaced with the input item.
  • Otherwise, if there were no % characters, the input item should be appended to the end of the command as in the current version

Claude did this. It used an explicit loop instead of the list comprehension that I had used (and preferred), but it did do it correctly:

    for arg in command_template:
        if '%' in arg:
            has_placeholder = True
            full_command.append(arg.replace('%', item))
        else:
            full_command.append(arg)

    if not has_placeholder:
        full_command.append(item)

Even without the list comprehension, I would have factored out the common code:

    for arg in command_template:
        if '%' in arg:
            has_placeholder = True
        full_command.append(arg.replace('%', item))

    if not has_placeholder:
        full_command.append(item)

But I am not going to complain, my code is simpler but is doing unnecessary work.

Claude also took my hint to change item == "" even though I didn't explicitly tell it to change that.

At this point the main loop of the main function was 15 lines long, because Claude had stuck all the %-processing inline. So I said:

Good, let's extract the command template processing into a subroutine.

It did this right, understanding correctly what code I was referring to and extracting it into a subroutine called process_command_template. More science fiction: I can say "command template processing" and it guesses what I had in mind!

This cut the main loop to 7 lines. That worked so well I tried it again:

Good, now let's extract the part of main that processes stdin into a subroutine that returns the items array

It pulled the correct code into a function called process_stdin. It did not make the novice mistake of passing the entire args structure to this function. In the caller it had process_stdin(args.null), and inside of process_stdin this parameter was named use_null_delimiter.

YAGNI?

At this point I was satisfied but I thought I might as well ask if it should do something else before we concluded:

Can you think of any features I left out that would be useful enough to warrant inclusion? Remember this program is supposed to be small and easy to use, in contrast to the existing xargs which is very complicated.

Claude had four suggestions:

  1. A -p or --parallel option to run commands in parallel

  2. A -n or --max-args option to specify the maximum number of items to pass per command

  3. A simple -v or --verbose flag to show commands as they're executed

  4. A way to replace the command's standard input with the item instead of adding it as an argument

All reasonable suggestions, nothing stupid. (It also supplied code for #3, which I had not asked for and did not want, but as I said before, default Claude is very chatty.)

Parallelization

I didn't want any of these, and I knew that #2–4 would be easy to add if I did want any of them later. But #1 was harder. I've done code like this in the past, where the program has a worker pool and runs a new process whenever the worker pool isn't at capacity. It's not even that hard. In Perl you can play a cute trick and use something like

    $workers{spawn()} = 1 while delete $workers{wait()};

where the workers hash maps process IDs to dummy values. A child exits, wait() awakens and returns the process ID of the completed child, which is then deleted from the map, and the loop starts another worker.

I wanted to see how Claude would do it, and the result was an even bigger win than I had had previously, because Claude wrote this:

   with concurrent.futures.ProcessPoolExecutor(max_workers=args.parallel) as executor:
        futures = [executor.submit(execute_command, cmd, args.verbose) for cmd in commands]

        for future in concurrent.futures.as_completed(futures):
            success = future.result()
            if not success:
                ok = False

What's so great about this? What's great is that I hadn't known about concurrent.futures or ProcessPoolExecutor. And while I might have suspected that something like them existed, I didn't know what they were called. But now I do know about them.

If someone had asked me to write the --parallel option, I would have had to have this conversation with myself:

Python probably has something like this already. But how long will it take me to track it down? And once I do, will the API documentation be any good, or will it be spotty and incorrect? And will there be only one module, or will there be three and I will have to pick the right one? And having picked module F6, will I find out an hour later that F6 is old and unmaintained and that people will tell me “Oh, you should have used A1, it is the new hotness, everyone knows that.”

When I put all that uncertainty on a balance, and weigh it against the known costs of doing it myself, which one wins?

The right choice is: I should do the research, find the good module (A1, not F6), and figure out how to use it.

But one of my biggest weaknesses as a programmer is that I too often make the wrong choice in this situation. I think “oh, I've done this before, it will be quicker to just do it myself”, and then I do and it is.

Let me repeat, it is quicker to do it myself. But that is still the wrong choice.

Maybe the thing I wrote would be sooner or smaller or faster or more technically suitable to the project than the canned module would have been. But it would only have been more technically suitable today. If it needed a new feature in the future it might have to be changed by someone who had never seen it before, whereas the canned module could well already have the needed feature ready to go, already documented, and perhaps already familiar to whoever had to make the change. My bespoke version would certainly be unfamiliar to every other programmer on the project — including perhaps myself six months later — so would be the wrong thing to use.

I'm really good at hacking this stuff up. Which is a problem. It makes me want to hack stuff up, even when I shouldn't.

Claude tips the balance strongly toward the correct side, which is that I should use the prepackaged module that someone else wrote and not hack something up.

And now I know about concurrent.futures.ProcessPoolExecutor! The world is full of manuals, how can I decide which ones I should read? One way is to read them all, which used to work back when I was younger, but now I have more responsibilities and I don't have time to read the entire Python library reference including all the useless bits I will never use. But here's Claude pointing out to me that this is something I need to know about, now, today, and I should read this one. That is valuable knowledge.

Claude flubs

While testing the program I did

    xar --parallel=2 bash -c '...'

which didn't work because the argparse library tried to parse the -c as a flag to xar, when I wanted it to be a flag to bash. This was my fault, and there is a standard way around it, which is that I should have said

    xar --parallel=2 -- bash -c '...'

which tells argparse to drop the -- and take everything after it literally with no interpretation.

I thought I'd see what Claude had to say about this, and it dropped the ball in a very LLM sort of way.

I asked:

When I did xar --parallel=2 bash -c '...' it misunderstood the arguments. It thought I was trying to pass -c to xar and failed with unrecognized arguments. What should I do about that?

Claude replied:

There are two main approaches to handle this:

  1. Use the -- separator to indicate the end of arguments for xar: …
  2. Modify your code to handle this case specifically by looking for a command terminator in the argument parser.

Here's how you could modify your code to implement the first approach:

And then it followed with the same, unmodified code…

Which was correct, because -- is already handled by argparse, so no change was actually required to implement the first approach! But Claude's characterization of it as a modification was misleading, wrong, and potentially very confusing.

I said:

I don't see what the difference is in that argument parsing code as compared to what I had before.

and Claude got into the LLM tail-chasing mode where it got more and more confused:

You're right, I apologize for the confusion. There's no difference in the argument parsing code I provided compared to what you had before. The issue is with how argparse handles arguments by default.

To fix the issue with xar --parallel=2 bash -c '...', you need to actually implement support for the -- separator in your code…

   # Find the index of '--' if it exists
   try:
       separator_index = argv.index('--')
       xar_args = argv[:separator_index]
       command_args = argv[separator_index+1:]
   except ValueError: …

Ugh, all wrong in the worst way. The code probably works, but it is completely unnecessary. Claude's claim that “you need to actually implement support for the -- separator” is flat wrong. I pointed this out and Claude got more confused. Oh well, nobody is perfect!

Lessons learned

A long time ago, when syntax-coloring editors were still new, I tried one and didn't like it, then tried again a few years later and discovered that I liked it better than I had before, and not for the reasons that anyone had predicted or that I would have been able to predict. (I wrote an article about the surprising reasons to use the syntax coloring.)

This time also. As usual, an actual experiment produced unexpected results, because the world is complicated and interesting. Some of the results were unsurprising, but some were not anything I would have thought of beforehand.

Claude's code is good enough, but it is not a magic oracle

Getting Claude to write most of the code was a lot faster and easier than writing it myself. This is good! But I was dangerously tempted to just take Claude's code at face value instead of checking it carefully. I quickly got used to flying along at great speed, and it was tough to force myself to slow down and be methodical, looking over everything as carefully as I would if Claude were a real junior programmer. It would be easy for me to lapse into bad habits, especially if I were tired or ill. I will have to be wary.

Fortunately there is already a part of my brain trained to deal with bright kids who lack experience, and I think perhaps that part of my brain will be able to deal effectively with Claude.

I did not notice any mistakes on Claude's part — at least this time.

At one point my testing turned up what appeared to be a bug, but it was not. The testing was still time well-spent.

Claude remembers the manual better than I do

Having Claude remember stuff for me, instead of rummaging the manual, is great. Having Claude stub out an argument parser, instead of copying one from somewhere else, was pure win.

Partway along I was writing a test script and I wanted to use that Bash flag that tells Bash to quit early if any of the subcommands fails. I can never remember what that flag is called. Normally I would have hunted for it in one of my own shell scripts, or groveled over the 378 options in the bash manual. This time I just asked in plain English “What's the bash option that tells the script to abort if a command fails?” Claude told me, and we went back to what we were doing.

Claude can talk about code with me, at least small pieces

Claude easily does simple refactors. At least at this scale, it got them right. I was not expecting this to work as well as it did.

When I told Claude to stop commenting every line, it did. I wonder, if I had told it to use if not expr only for Boolean expressions, would it have complied? Perhaps, at least for a while.

When Claude wrote code I wasn't sure about, I asked it what it was doing and at least once it explained correctly. Claude had written

     parser.add_argument(
        "-p", "--parallel",
        nargs="?",
        const=5,
        type=int,
        default=1,
        help="Run up to N commands in parallel (default: 5)"
    )

Wait, I said, I know what the const=5 is doing, that's so that if you have --parallel with no number it defaults to 5. But what is the default=1 doing here? I just asked Claude and it told me: that's used if there is no --parallel flag at all.

This was much easier than it would have been for me to pick over the argparse manual to figure out how to do this in the first place.

More thoughts

On a different project, Claude might have done much worse. It might have given wrong explanations, or written wrong code. I think that's okay though. When I work with human programmers, they give wrong explanations and write wrong code all the time. I'm used to it.

I don't know how well it will work for larger systems. Possibly pretty well if I can keep the project sufficiently modular that it doesn't get confused about cross-module interactions. But if the criticism is “that LLM stuff doesn't work unless you keep the code extremely modular” that's not much of a criticism. We all need more encouragement to keep the code modular.

Programmers often write closely-coupled modules knowing that it is bad and it will cause maintenance headaches down the line, knowing that the problems will most likely be someone else's to deal with. But what if writing closely-coupled modules had an immediate cost today, the cost being that the LLM would be less helpful and more likely to mess up today's code? Maybe programmers would be more careful about letting that happen!

Will my programming skill atrophy?

Folks at Recurse Center were discussing this question.

I don't think it will. It will only atrophy if I let it. And I have a pretty good track record of not letting it. The essence of engineering is to pay attention to what I am doing and why, to try to produce a solid product that satisfies complex constraints, to try to spot problems and correct them. I am not going to stop doing this. Perhaps the problems will be different ones than they were before. That is all right.

Starting decades ago I have repeatedly told people

You cannot just paste code with no understanding of what is going on and expect it to work.

That was true then without Claude and it is true now with Claude. Why would I change my mind about this? How could Claude change it?

Will I lose anything from having Claude write that complex parser.add_argument call for me? Perhaps if I had figured it out on my own, on future occasions I would have remembered the const=5 and default=1 specifications and how they interacted. Perhaps.

But I suspect that I have figured it out on my own in the past, more than once, and it didn't stick. I am happy with how it went this time. After I got Claude's explanation, I checked its claimed behavior pretty carefully with a stub program, as if I had been reviewing a colleague's code that I wasn't sure about.

The biggest win Claude gave me was that I didn't know about this ProcessPoolExecutor thing before, and now I do. That is going to make me a better programmer. Now I know something useful that I didn't know before, and I have a pointer to documentation I know I should study.

My skill at writing ad-hoc process pool managers might atrophy, but if it does, that is good. I have already written too many ad-hoc process pool managers. It was a bad habit, I should have stopped long ago, and this will help me stop.

Conclusion

This works.

Perfectly? No, it's technology, technology never works perfectly. Have you ever used a computer?

Will it introduce new problems? Probably, it's new technology, and new technology always introduces new problems.

But is it better than what we had before? Definitely.

I still see some programmers turning up their noses at this technology as if they were sure it was a silly fad that would burn itself out once people came to their senses and saw what a terrible idea it was.

I think that is not going to happen, and those nose-turning-up people, like the people who pointed out all the drawbacks and unknown-unknowns of automobiles as compared to horse-drawn wagons, are going to look increasingly foolish.

Because it works.

by Mark Dominus (mjd@plover.com) at May 03, 2025 04:46 PM

A puzzle about balancing test tubes in a centrifuge

Suppose a centrifuge has n slots, arranged in a circle around the center, and we have k test tubes we wish to place into the slots. If the tubes are not arranged symmetrically around the center, the centrifuge will explode.

(By "arranged symmetrically around the center, I mean that if the center is at , then the sum of the positions of the tubes must also be at .)

Let's consider the example of n = 12. Clearly we can arrange 2, 3, 4, or 6 tubes symmetrically:

[Diagrams: twelve small circles arranged around a central point, like a clock. In the four figures, circles 5 and 11 are filled in; circles 1, 5, and 9 are filled in; circles 0, 3, 6, and 9 are filled; and circles 1, 3, 5, 7, 9, and 11 are filled.]

Equally clearly we can't arrange only 1. Also it's easy to see we can do k tubes if and only if we can also do 12 − k tubes, which rules out 11.

From now on I will write ⟨n, k⟩ to mean the problem of balancing k tubes in a centrifuge with n slots. So ⟨12, 2⟩ and ⟨12, 3⟩ are possible, and ⟨12, 1⟩ and ⟨12, 11⟩ are not. And ⟨n, k⟩ is solvable if and only if ⟨n, n − k⟩ is.

It's perhaps a little surprising that ⟨12, 7⟩ is possible. If you just ask this to someone out of nowhere they might have a happy inspiration: “Oh, I'll just combine the solutions for 3 and 4, easy.” But that doesn't work, because two groups of the forms {a, a+4, a+8} and {b, b+3, b+6, b+9} always overlap: the square hits each residue class mod 4 exactly once, and the triangle is precisely one of those classes.

For example, if your group of 4 is the slots {0, 3, 6, 9} then you can't also have your group of 3 be {1, 5, 9}, because slot 9 already has a tube in it.

[Diagram: the k = 3 and k = 4 diagrams from before, superimposed; there is a small explosion symbol at 9 o'clock where they interfere.]

The other balanced groups of 3 are blocked in the same way. You cannot solve the puzzle with 4 + 3; you have to do 3 + 2 + 2, as below left. The best way to approach this is to do ⟨12, 5⟩ as 3 + 2, as below right. This is easy, since the triangle only blocks three of the six symmetric pairs. Then you replace the holes with tubes and the tubes with holes to turn ⟨12, 5⟩ into ⟨12, 7⟩.

[Diagrams: left, a triangle filling slots 1, 5, and 9, plus a pair at 0, 6 and another pair at 2, 8; right, the opposite of the previous diagram, consisting of a triangle filling slots 3, 7, 11 and a pair at 4, 10.]

Given n and k, how can we decide whether the centrifuge can be safely packed?

Clearly you can solve ⟨n, k⟩ when n is a multiple of k, but the example of ⟨12, 7⟩ (or ⟨12, 5⟩) shows this isn't a necessary condition.

A generalization of this is that ⟨n, jk⟩ is always solvable when n is a multiple of k, since you can easily balance k tubes at positions {0, n/k, 2n/k, …}, then do another k tubes one position over, and so on. For example, to do ⟨12, 8⟩ you just put the first four tubes in slots {0, 3, 6, 9} and the next four one position over, in slots {1, 4, 7, 10}.

[Diagram: two squares this time, as described in the previous paragraph.]

An interesting counterexample is that the strategy for ⟨12, 7⟩, where we did 3 + 2 + 2, cannot be extended to ⟨10, 7⟩. One would want to do 5 + 2, but there is no way to arrange the tubes so that the group of 2 doesn't conflict with the group of 5, which blocks one slot from every pair.

But we can see that this must be true without even considering the geometry. ⟨10, 7⟩ is the reverse of ⟨10, 3⟩, which is impossible: the only nontrivial divisors of 10 are 2 and 5, so 3 would have to be a sum of 2s and 5s, and it is not.

You can't fit 8 tubes when n = 15, but again the reason is a bit tricky. When I looked at ⟨15, 8⟩ directly, I did a case analysis to make sure that the 5-group and the 3-group would always conflict. But again there was an easier way to see this: ⟨15, 8⟩ is the reverse of ⟨15, 7⟩, and ⟨15, 7⟩ clearly won't work, as 7 is not a sum of 3s and 5s. I wonder if there's an example where both ⟨n, k⟩ and ⟨n, n − k⟩ are not obvious?

For n = 12, every k works except 11 and the always-impossible 1.

What's the answer in general? I don't know.
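In the meantime, small cases are easy to settle by brute force. Here is a throwaway checker (a quick sketch of mine, not anything from the complete solution linked below): a placement is balanced exactly when the unit vectors pointing at the occupied slots sum to zero.

from itertools import combinations
from math import cos, sin, tau, hypot

def balanced(slots, n, eps=1e-9):
    # Sum the unit vectors pointing at the occupied slots.
    x = sum(cos(tau * s / n) for s in slots)
    y = sum(sin(tau * s / n) for s in slots)
    return hypot(x, y) < eps

def solvable(n, k):
    # Try every way of choosing k of the n slots.
    return any(balanced(c, n) for c in combinations(range(n), k))

print([k for k in range(13) if solvable(12, k)])
# [0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12] -- everything except 1 and 11

Running solvable(10, 7) and solvable(15, 8) both come back False, agreeing with the two impossibility arguments above.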

Addenda

20250502

Now I am amusing myself thinking about the perversity of a centrifuge with a prime number of slots, say p. If you use it at all, you must fill every slot. I hope you like explosions!

While I did not explode any centrifuges in university chemistry, I did once explode an expensive Liebig condenser.

[Image: a chemistry lab apparatus. In the middle is the Liebig condenser, a pair of long concentric glass tubes, one inside the other, supported by a metal ring stand. At left a heater heats a flask whose top is connected to the upper end of the inner tube of the condenser. The condensate collects in a flask at right. Two rubber tubes connect to the top and bottom of the outer tube, carrying water through it.]

Condenser setup by Mario Link from an original image by Arlen on Flickr. Licensed cc-by-2.0, provided via Wikimedia Commons.

20250503

  • Michael Lugo informs me that a complete solution may be found on Matt Baker's math blog. I have not yet looked at this myself.

  • Omar Antolín points out an important consideration I missed: it may be necessary to subtract polygons. Consider ⟨30, 9⟩. This is obviously possible since 9 = 3 + 3 + 3. But there is a more interesting solution. We can add the pentagon {0, 6, 12, 18, 24} to the digons {5, 20} and {10, 25} to obtain the solution $${0, 5, 6, 10, 12, 18, 20, 24, 25}.$$

    Then from this we can subtract the triangle {0, 10, 20} to obtain $${5, 6, 12, 18, 24, 25},$$ a solution to ⟨30, 6⟩ which is not a sum of regular polygons.

  • Thanks to Dave Long for pointing out a small but significant error, which I have corrected.

20250505

by Mark Dominus (mjd@plover.com) at May 03, 2025 03:25 PM

GHC Developer Blog

GHC 9.10.2 is now available


Zubin Duggal - 2025-05-03

The GHC developers are very pleased to announce the availability of the final release for GHC 9.10.2. Binary distributions, source distributions, and documentation are available at downloads.haskell.org and via GHCup.

GHC 9.10.2 is a bug-fix release fixing over 50 issues of a variety of severities and scopes, including:

  • Significantly improved performance when dynamically loading Haskell symbols (#23415).
  • Fixing a bug where the simplifier sometimes destroyed join points during float out, which could impact performance (#24768).
  • Reduced memory fragmentation in the non-moving GC’s segment allocator, improving resident set size by up to 26% for some applications (#24150).
  • Added new flags to control speculative evaluation (-fspec-eval and -fspec-eval-dictfun) to work around performance regressions (#25606).
  • Fixed several platform-specific issues, including segfaults with FFI on PowerPC (#23034) and improved code generation for AArch64 with multiway branches now using jump tables (#19912).
  • And many more!

A full accounting of these fixes can be found in the release notes. As always, GHC’s release status, including planned future releases, can be found on the GHC Wiki status page.

We would like to thank Well-Typed, Tweag I/O, Juspay, QBayLogic, Channable, Serokell, SimSpace, the Haskell Foundation, and other anonymous contributors whose ongoing financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprises this release.

As always, do give this release a try and open a ticket if you see anything amiss.

by ghc-devs at May 03, 2025 12:00 AM

May 02, 2025

Gabriella Gonzalez

Prompt chaining reimagined with type inference


At work I’ve been researching how to improve the ergonomics of prompt engineering and I wanted to share and open source some of what I’ve done. This initial post is about how I’ve been experimenting with using bidirectional type inference to streamline prompt chaining.

“Prompt chaining” is a prompt engineering technique that splits a larger task/prompt into multiple smaller tasks/prompts which are chained together using code. For example, instead of prompting a model to generate a poem in one prompt like this:

Write a poem based off this idea:

${idea}

… by following this process:

  • First think through the form, stanza count, lines per stanza, and rhyme scheme
  • Then choose a poetic style (tone, voice, and literary devices) based on the poem’s form
  • Then write a complete poem based on that plan

… you can split it into smaller prompts, like this:

structure prompt:

Plan the structure of a new poem based on this idea

${idea}

Describe its form, stanza count, lines per stanza, and rhyme scheme

style prompt:

Given this poem structure:

  • Form: ${structure.form}
  • Stanzas: ${structure.stanzaCount}
  • Lines per stanza: ${structure.linesPerStanza}
  • Rhyme scheme: ${structure.rhymeScheme}

Choose a poetic style: tone, voice, and literary devices to emphasize

poem prompt:

Write a complete poem based on this idea:

${idea}

Structure:

  • Form: ${structure.form}
  • Stanzas: ${structure.stanzaCount}
  • Lines per stanza: ${structure.linesPerStanza}
  • Rhyme scheme: ${structure.rhymeScheme}

Style:

  • Tone: ${style.tone}
  • Voice: ${style.voice}
  • Literary Devices: ${style.literaryDevices}

Why might you want to do this?

  • to improve the quality of the results

    Models perform better when working on more constrained subproblems. Splitting a larger prompt into smaller prompts helps the model stay focused at each step.

  • to introspect intermediate results

    This comes in handy when you want to log, validate, or correct intermediate results.

  • to perform actions in between prompts

    You might want to take the output of one prompt, use that to call some tool, then use the output of that tool to decide what the next prompt should be, which you can’t do with a single prompt.

In other words, prompt chaining unlocks greater accuracy, control, and flexibility for prompt engineering.

The problem

The main issue with prompt chaining is that it is a huge pain in the ass; if you start to do anything even a little bit complicated you need to start using structured outputs (i.e. JSON), which adds a whole lot of boilerplate to the process:

  • you have to define the schema for each intermediate step of the process

    You typically do this by defining your data model in your host programming language (e.g. a Pydantic model in Python) or directly defining your JSON schema

  • You have to instruct the model to produce JSON and explain the shape of the expected output

  • (Depending on the framework) you have to decode the JSON into your data model

For small prompt chaining pipelines this isn’t too hard, but it starts to get annoying to define all these schemas when you scale this up to more sophisticated prompt chaining pipelines.
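To make that concrete, here is roughly what the boilerplate looks like for just the structure step, sketched in Python with Pydantic (illustrative only; call_model is a hypothetical stand-in for a real client call):

from pydantic import BaseModel

class Structure(BaseModel):
    # One hand-written schema per intermediate step of the chain.
    form: str
    stanzaCount: int
    linesPerStanza: int
    rhymeScheme: str

def call_model(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call; returns canned JSON here.
    return ('{"form": "free verse", "stanzaCount": 5,'
            ' "linesPerStanza": 4, "rhymeScheme": "ABCB"}')

def plan_structure(idea: str) -> Structure:
    prompt = (
        f"Plan the structure of a new poem based on this idea: {idea}\n"
        "Respond with JSON fields: form, stanzaCount, linesPerStanza, rhymeScheme."
    )
    return Structure.model_validate_json(call_model(prompt))  # decode and validate

print(plan_structure("The stillness of early morning"))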

So as a thought experiment I wanted to create a research prototype that handled all of that for you so that you didn’t need to specify any schemas at all. In other words I wanted to build a programming language that harnessed bidirectional type inference to perform schema inference for prompts with structured JSON outputs.

Example

I’ll cut to the chase by showing the above prompt chain written as a program in this language:

let concatSep =
      https://raw.githubusercontent.com/Gabriella439/grace/refs/heads/main/prelude/text/concatSep.ffg

let lines = concatSep "\n"

let generatePoem idea =
        let structure = prompt
                { model: "gpt-4o"
                , text: lines
                    [ "Plan the structure of a new poem based on this idea:"
                    , ""
                    , idea
                    , ""
                    , "Describe its form, stanza count, lines per stanza, and rhyme scheme."
                    ]
                }

        let renderedStructure = lines
                [ "- Form: " + structure.form
                , "- Stanzas: " + Real/show (structure.stanzaCount : Integer)
                , "- Lines per stanza: " + Real/show (structure.linesPerStanza : Integer)
                , "- Rhyme scheme: " + structure.rhymeScheme
                ]

        let style = prompt
                { model: "gpt-4o"
                , text: lines
                    [ "Given this poem structure:"
                    , renderedStructure
                    , ""
                    , "Choose a poetic style: tone, voice, and literary devices to emphasize."
                    ]
                }

        let renderedStyle = lines
                [ "- Tone: " + style.tone
                , "- Voice: " + style.voice
                , "- Literary Devices: " + concatSep ", " style.literaryDevices
                ]

        let poem : Text = prompt
                { model: "gpt-4o"
                , text: lines
                    [ "Write a complete poem based on this idea:"
                    , ""
                    , idea
                    , ""
                    , "Structure:"
                    , renderedStructure
                    , ""
                    , "Style:"
                    , renderedStyle
                    ]
                }

        in { structure: structure, style: style, poem: poem }

in  generatePoem "The stillness of early morning"

The above prompt chain is a real program in a research programming language named Fall-from-Grace (or “Grace” for short). Specifically, this is implemented in a branch of the language that adds a magic prompt keyword that works like in the above example.

If you run the above example, you might get an output like this:

{
  "structure": {
    "form": "Free verse with a sprinkle of rhyme",
    "linesPerStanza": 4,
    "rhymeScheme": "ABCB",
    "stanzaCount": 5
  },
  "style": {
    "literaryDevices": [
      "imagery",
      "alliteration",
      "personification",
      "enjambment"
    ],
    "tone": "nostalgic",
    "voice": "reflective"
  },
  "poem": "### The Stillness of Early Morning\n\n**I**  \nIn the gentle rise of dawn's embrace,  \nA whisper of winds through sleepy skies,  \nThe world lies suspended in an echo  \nOf yesterday's dreams, reluctantly saying goodbye.\n\n**II**  \nSoft orange light graces fields of dew,  \nCradling tenacious tendrils of the past.  \nMorning breathes upon the waiting world  \nWith promises that this serenity will last.\n\n**III**  \nWaves of silence surge across the meadow,  \nNature's symphony gently cradles rest.  \nA chorus of leaves sings its soft refrain,  \nComforted by morning's tender caress.\n\n**IV**  \nThe silence dips to caress weary trees,  \nWhose branches waltz to whispers soft and slow.  \nEach tendril holds close the morning's intention,  \nBefore the day's demands, the chaos, start to show.\n\n**V**  \nEarly rays dance upon a tranquil sea,  \nA canvas painted through a dreamy haze.  \nTime sits by, savoring the gentle hush,  \nAs the world awakens, stirs, and finally plays."
}

… and if you’re curious what the formatted poem looks like:

The Stillness of Early Morning

I
In the gentle rise of dawn’s embrace,
A whisper of winds through sleepy skies,
The world lies suspended in an echo
Of yesterday’s dreams, reluctantly saying goodbye.

II
Soft orange light graces fields of dew,
Cradling tenacious tendrils of the past.
Morning breathes upon the waiting world
With promises that this serenity will last.

III
Waves of silence surge across the meadow,
Nature’s symphony gently cradles rest.
A chorus of leaves sings its soft refrain,
Comforted by morning’s tender caress.

IV
The silence dips to caress weary trees,
Whose branches waltz to whispers soft and slow.
Each tendril holds close the morning’s intention,
Before the day’s demands, the chaos, start to show.

V
Early rays dance upon a tranquil sea,
A canvas painted through a dreamy haze.
Time sits by, savoring the gentle hush,
As the world awakens, stirs, and finally plays.

Type inference

The sample Grace program hardly specifies any types (mainly the final expected type for the poem: Text). The reason this works is because Grace supports bidirectional type inference, which means that Grace can work backwards from how intermediate results are used to infer their schemas.

I’ll illustrate this with a contrived Grace example:

let numbers = prompt{ text: "Give me two numbers" }

in  { x: numbers.x
    , y: numbers.y
    , sum: numbers.x + numbers.y : Integer
    }

… which might produce an output like this:

$ grace interpret ./numbers.ffg
{ "x": 7, "y": 14, "sum": 21 }

When Grace analyzes this program the type checker works backwards from this expression:

numbers.x + numbers.y : Integer

… and reasons about it like this:

  • the addition produces an Integer, therefore numbers.x and numbers.y must also be Integers

  • therefore numbers is a record with two fields, x and y, both of which are Integers

    … or using Grace syntax, the inferred type of numbers is: { x: Integer, y: Integer }

  • therefore the output of the prompt command must have the same type

… and then Grace generates a JSON schema for the prompt which looks like this:

{ "type": "object",
  "properties": {
    "x": { "type": "integer" },
    "y": { "type": "integer" }
  },
  "required": [ "x", "y" ],
  "additionalProperties": false
}

Of course, you can specify types if you want (and they’re more lightweight than schemas in traditional prompt chaining frameworks). For example:

$ grace repl
>>> prompt{ text: "Give me a first and last name" } : { first: Text, last: Text }
{ "first": "Emily", "last": "Johnson" }
>>> prompt{ text: "Give me a list of names" } : List Text
[ "Alice"
, "Bob"
, "Charlie"
, "Diana"
, "Ethan"
, "Fiona"
, "George"
, "Hannah"
, "Isaac"
, "Jack"
]

However in our original example we don’t need to specify intermediate types because when the type-checker sees this code:

let structure = prompt
        { model: "gpt-4o"
        , text: lines
            [ "Plan the structure of a new poem based on this idea:"
            , ""
            , idea
            , ""
            , "Describe its form, stanza count, lines per stanza, and rhyme scheme."
            ]
        }

let renderedStructure = lines
        [ "- Form: " + structure.form
        , "- Stanzas: " + Real/show (structure.stanzaCount : Integer)
        , "- Lines per stanza: " + Real/show (structure.linesPerStanza : Integer)
        , "- Rhyme scheme: " + structure.rhymeScheme
        ]

… the compiler can reason backwards from how the structure value is used to infer that the JSON schema for the prompt needs to be:

{ "type": "object",
  "properties": {
    "form": { "type": "string" },
    "stanzaCount": { "type": "integer" },
    "linesPerStanza": { "type": "integer" },
    "rhymeScheme": { "type": "string" }
  },
  "required": [
    "form",
    "stanzaCount",
    "linesPerStanza",
    "rhymeScheme"
    ],
  "additionalProperties": false
}

Tool use

Grace also supports generating sum types (a.k.a. tagged unions), and you can imagine using this to subsume traditional tool use frameworks.

For example, consider this Grace program:

let concatSep = https://raw.githubusercontent.com/Gabriella439/grace/refs/heads/main/prelude/text/concatSep.ffg

let call = merge
      { HttpRequest: \x -> "curl " + x.url
      , ShellCommand: \x -> concatSep " " ([ x.executable ] + x.arguments)
      }

in  List/map call (prompt{ text: "Call some tools" })

This doesn’t actually run any tools (I haven’t added any callable tools to my work-in-progress branch yet), but just renders the tool use as a string for now:

$ grace interpret ./tools.ffg
[ "curl https://api.example.com/data", "ls -l -a" ]

However, the idea is that you can model a tool as a sum type with one constructor per callable tool, and in the above example the type checker infers that the sum type representing one tool call is:

< HttpRequest: { url: Text }
| ShellCommand: { executable: Text, arguments: List Text }
>

… so the inferred type of call is:

call : < HttpRequest: …, ShellCommand: … > -> Text

… but since we List/map the call function over the output of the prompt the type checker infers that the prompt needs to generate a List of tool calls:

prompt{ text: "Call some tools" } : List < HttpRequest: …, ShellCommand:>

… and then Grace does some magic under the hood to convert that type to the equivalent JSON schema.

What’s particularly neat about this example is that the prompt is so incredibly bare (“Call some tools”) because all the information the model needs is present in the schema.

Schema-driven prompting

We can explore this idea of using the schema to drive the prompt instead of prose using an example like this:

prompt{ text: "Generate some characters for a story", model: "gpt-4o" }
  : List
    { "The character's name": Text
    , "The most memorable thing about the character": Text
    , "The character's personal arc": Text
    }
[ { "The character's name": "Aveline Thatcher"
  , "The character's personal arc":
      "Aveline starts as a skeptical journalist who doubts the stories of mythical creatures. Over time, she becomes a firm believer, risking her career to uncover the truth and protect these creatures."
  , "The most memorable thing about the character":
      "The intricate tattoo of a phoenix on her forearm that seems to glow when she discovers hidden truths."
  }
, { "The character's name": "Kelan Frost"
  , "The character's personal arc":
      "A former rogue alchemist who turns hero after he inadvertently creates a dangerous substance. Driven by guilt, Kelan seeks redemption by finding an antidote and saving his village."
  , "The most memorable thing about the character":
      "His iridescent blue eyes that seem to see into one's soul, a side effect of his alchemical experiments."
  }
, { "The character's name": "Luciana Blair"
  , "The character's personal arc":
      "Luciana is a reclusive artist who initially fears the world outside her home. After a mysterious vision rejuvenates her, she sets out on a journey of self-discovery, ultimately finding both her voice and courage."
  , "The most memorable thing about the character":
      "Her ability to paint scenes before they happen, which she attributes to the visions she sees in her dreams."
  }
, { "The character's name": "Ezra Hartman"
  , "The character's personal arc":
      "Once a charismatic but self-centered lawyer, Ezra is confronted with a moral crisis that forces him to reevaluate his values. He chooses a path of integrity, becoming an advocate for justice."
  , "The most memorable thing about the character":
      "His perfectly tailored suits that slowly become more casual, symbolizing his transformation and shifting priorities."
  }
, { "The character's name": "Seraphine Mora"
  , "The character's personal arc":
      "Seraphine is a young music prodigy who loses her hearing after an accident. Battling despair, she learns to embrace a new way of 'hearing' music through vibrations and her other senses."
  , "The most memorable thing about the character":
      "The ethereal way she 'dances' with the music, using her entire body to express each note's emotion."
  }
]

Grace is a superset of JSON, and since JSON supports arbitrary field names, so does Grace! Field names in Grace support arbitrary capitalization, punctuation, and whitespace as long as you quote them, and we can use the field names to “smuggle” the description of each field into the schema.

Conclusion

Hopefully this gives you some idea of why I’ve begun to think of prompt chaining as a programming languages problem. Type inference is just the beginning and I think it is possible to use a domain-specific programming language not just to simplify the code but to ultimately unlock greater reasoning power.

I’m going to continue to use Grace as a research vehicle for prompt chaining but my LLM-enabled branch of Grace (like Grace itself) is not really intended to be used in production and I created it mainly as a proof-of-concept for where I’d like prompt chaining frameworks to go. If I do end up eventually productionizing this research I will create a proper fork with its own name and the whole works.

by Gabriella Gonzalez (noreply@blogger.com) at May 02, 2025 05:56 PM

April 30, 2025

Mark Jason Dominus

Proof by insufficient information

Content warning: rambly

Given the coordinates of the three vertices of a triangle, can we find the area? Yes. If by no other method, we can use the Pythagorean theorem to find the lengths of the edges, and then Heron's formula to compute the area from that.
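Here's that recipe in a few lines (a minimal sketch):

from math import dist, sqrt

def triangle_area(p, q, r):
    # Edge lengths from coordinates (the Pythagorean theorem, via math.dist),
    # then Heron's formula with the semi-perimeter s.
    a, b, c = dist(p, q), dist(q, r), dist(r, p)
    s = (a + b + c) / 2
    return sqrt(s * (s - a) * (s - b) * (s - c))

print(triangle_area((0, 0), (4, 0), (0, 3)))  # 6.0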

Now, given the coordinates of the four vertices of a quadrilateral, can we find the area? And the answer is, no, there is no method to do that, because there is not enough information:

[Diagrams: three quadrilaterals, each drawn on the same four points (three arranged in an irregular triangle, with one in the middle); in each figure a different four of the six possible edges are drawn in, defining three different quadrilaterals with the same vertices.]

These three quadrilaterals have the same vertices, but different areas. Just knowing the vertices is not enough; you also need their order.
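Once the order is supplied, the area becomes computable again, for instance with the shoelace formula. A quick demonstration (my own) that the same four vertices in two different orders give two different areas:

def shoelace_area(vertices):
    # Signed-area accumulation over consecutive vertex pairs, in order.
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2

pts = [(0, 0), (4, 0), (2, 4), (2, 1)]  # three outer points, one inner
print(shoelace_area(pts))                               # 5.0
print(shoelace_area([pts[0], pts[3], pts[1], pts[2]]))  # 6.0, same vertices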

I suppose one could abstract this: Let A be the function that maps the set of vertices to the area of the quadrilateral. Can we calculate values of A? No, because there is no such A; it is not well-defined.

Put that way it seems less interesting. It's just another example of the principle that, just because you put together a plausible sounding description of some object, you cannot infer that such an object must exist. One of the all-time pop hits here is:

Let x be the smallest [real / rational] number strictly greater than 0

which appears on Math SE quite frequently. Another one I remember is someone who asked about the volume of a polyhedron with exactly five faces, all triangles. This is a fallacy at the ontological level, not the mathematical level, so when it comes up I try to demonstrate it with a nonmathematical counterexample, usually something like “the largest purple hat in my closet” or perhaps “the current Crown Prince of the Ottoman Empire”. The latter is less good because it relies on the other person to know obscure stuff about the Ottoman Empire, whatever that is.

This is unfortunately also the error in Anselm's so-called “ontological proof of God”. A philosophically-minded friend of mine once remarked that being known for the discovery of the ontological proof of God is like being known for the discovery that you can wipe your ass with your hand.

Anyway, I'm digressing. The interesting part of the quadrilateral thing, to me, is not so much that A doesn't exist, but the specific reasoning that demonstrates that it can't exist. I think there are more examples of this proof strategy, where we prove nonexistence by showing there is not enough information for the thing to exist, but I haven't thought about it enough to come up with one.

There is a proof, the so-called “information-theoretic proof”, that a comparison sorting algorithm takes at least $\Omega(n \log n)$ time, based on comparing the amount of information gathered from the comparisons (one bit each) with that required to distinguish all $n!$ possible permutations ($\log_2 n!$ bits total). I'm not sure that's what I'm looking for here. But I'm also not sure it isn't, or why I feel it might be different.
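Spelled out, the counting argument is (a standard calculation): each comparison yields at most one bit, and distinguishing all $n!$ orderings requires $\log_2 n!$ bits, so

$$\text{comparisons} \ge \log_2 n! = \sum_{k=1}^{n} \log_2 k \ge \frac{n}{2} \log_2 \frac{n}{2} = \Omega(n \log n).$$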

Addenda

20250430

Carl Muckenhoupt suggests that logical independence proofs are of the same sort. He says, for example:

Is there a way to prove the parallel postulate from Euclid's other axioms? No, there is not enough information. Here are two geometric models that produce different results.

This is just the sort of thing I was looking for.

20250503

Rik Signes has allowed me to reveal that he was the source of the memorable disparagement of Anselm's dumbass argument.

by Mark Dominus (mjd@plover.com) at April 30, 2025 03:48 PM

Well-Typed.Com

Explicit Level Imports awarded best paper at TFP 2025

In January, we presented our work on Explicit Level Imports at the Trends in Functional Programming symposium. We’re pleased to announce that the paper was awarded the John McCarthy Prize for best paper overall!

The paper introduces the ExplicitLevelImports extension to GHC, which gives programmers fine-grained control over which modules and dependencies are required by Template Haskell. For instance, in the following example, the splice import tells the compiler that Control.Lens.TH is needed only at compile time and not at runtime:

import splice Control.Lens.TH (makeLenses)
import App (S)
data T = MkT { foo :: S }
$(makeLenses ''T)

By taking advantage of this extra information, the compiler can perform less work in certain situations. In one benchmark, we modified pandoc to use ExplicitLevelImports and compilation time was halved when using -fno-code. For full details, read the paper or check out GHC Proposal #682. The implementation has landed in GHC MR !14241, so ExplicitLevelImports will be available in the next major release of GHC (9.14).

Explicit Level Imports

Matthew Pickering, Rodrigo Mesquita, Adam Gundry

TFP 2025 (PDF) (Awarded John McCarthy Prize for best paper overall)

Abstract. Cross-stage persistence rules are commonly admitted in multi-stage programming languages. These rules codify the assumption that all module and package dependencies are available at all stages. However, in practice, only a small number of dependencies may be needed at each particular stage.

This paper introduces Explicit Level Imports, a mechanism which gives programmers precise control over which dependencies are required at each stage. Imports are annotated with a modifier which brings identifiers into scope at a specific level. This precision means it is straightforward for the compiler to work out what is exactly needed at each stage, and only provide that. The result is faster compilation times and the potential for improved cross-compilation support.

We have implemented these ideas in GHC Haskell, consider a wide variety of practical considerations in the design, and finally demonstrate that the feature solves a real-world issue in a pragmatic way.

by matthew, rodrigo, adam at April 30, 2025 12:00 AM

April 29, 2025

Dan Piponi (sigfpe)

Expectation-Maximization with Less Arbitrariness

Introduction

Google have stopped supporting the Chart API so all of the mathematics notation below is missing. There is a PDF version of this article at GitHub.

There are many introductions to the Expectation-Maximisation algorithm. Unfortunately every one I could find uses arbitrary seeming tricks that seem to be plucked out of a hat by magic. They can all be justified in retrospect, but I find it more useful to learn from reusable techniques that you can apply to further problems. Examples of tricks I've seen used are:

  1. Using Jensen's inequality. It's easy to find inequalities that apply in any situation. But there are often many ways to apply them. Why apply it to this way of writing this expression and not that one which is equal?
  2. Substituting $1 = \frac{q(z)}{q(z)}$ in the middle of an expression. Again, you can use $1 = \frac{x}{x}$ just about anywhere. Why choose this $q$ at this time? Similarly I found derivations that insert a $\log \exp$ into an expression.
  3. Majorisation-Minimisation. This is a great technique, but involves choosing a function that majorises another. There are so many ways to do this, it's hard to imagine any general purpose method that tells you how to narrow down the choice.
My goal is to fill in the details of one key step in the derivation of the EM algorithm in a way that makes it inevitable rather than arbitrary. There's nothing original here, I'm merely expanding on a stackexchange answer.


Generalities about EM

The EM algorithm seeks to construct a maximum likelihood estimator (MLE) with a twist: there are some variables in the system that we can't observe.


First assume no hidden variables. We assume there is a vector of parameters $\theta$ that defines some model. We make some observations $x$. We have a probability density $p(x|\theta)$ that depends on $\theta$. The likelihood of $\theta$ given the observations is $L(\theta) = p(x|\theta)$. The maximum likelihood estimator for $\theta$ is the choice of $\theta$ that maximises $L(\theta)$ for the $x$ we have observed.


Now suppose there are also some variables $z$ that we didn't get to observe. We assume a density $p(x, z|\theta)$. We now have

$$p(x|\theta) = \sum_z p(x, z|\theta)$$

where we sum over all possible values of $z$. The MLE approach says we now need to maximise

$$\sum_z p(x, z|\theta).$$

One of the things that is a challenge here is that the components of $\theta$ might be mixed up among the terms in the sum. If, instead, each term only referred to its own unique block of $\theta$, then the maximisation would be easier as we could maximise each term independently of the others. Here's how we might move in that direction. Consider instead the log-likelihood

$$\log \sum_z p(x, z|\theta).$$

Now imagine that by magic we could commute the logarithm with the sum. We'd need to maximise

$$\sum_z \log p(x, z|\theta).$$

One reason this would be to our advantage is that $p(x, z|\theta)$ often takes the form $e^{f(x, z; \theta)}$ where $f$ is a simple function to optimise. In addition, $f$ may break up as a sum of terms, each with its own block of $\theta$'s. Moving the logarithm inside the sum would give us something we could easily maximise term by term. What's more, the $p(x, z|\theta)$ for each $z$ is often a standard probability distribution whose likelihood we already know how to maximise. But, of course, we can't just move that logarithm in.


Maximisation by proxy

Sometimes a function is too hard to optimise directly. But if we have a guess for an optimum, we can replace our function with a proxy function that approximates it in the neighbourhood of our guess and optimise that instead. That will give us a new guess and we can continue from there. This is the basis of gradient descent. Suppose $f$ is a differentiable function in a neighbourhood of $x_0$. Then around $x_0$ we have

$$f(x) \approx f(x_0) + (x - x_0) \cdot \nabla f(x_0).$$

We can try optimising $f(x_0) + (x - x_0) \cdot \nabla f(x_0)$ with respect to $x$ within a neighbourhood of $x_0$. If we pick a small circular neighbourhood then the optimal value will be in the direction of steepest descent. (Note that picking a circular neighbourhood is itself a somewhat arbitrary step, but that's another story.) For gradient descent we're choosing this proxy because it matches both the value and derivatives of $f$ at $x_0$. We could go further and optimise a proxy that shares second derivatives too, and that leads to methods based on Newton-Raphson iteration.


We want our logarithm of a sum to be a sum of logarithms. But instead we'll settle for a proxy function that is a sum of logarithms. We'll make the derivatives of the proxy match those of the original function precisely so we're not making an arbitrary choice.


Write

$$\log \sum_z p(x, z|\theta) \approx \sum_z \lambda_z \log p(x, z|\theta) + C.$$

The $\lambda_z$ are constants we'll determine. We want to match the derivatives on either side of the $\approx$ at $\theta = \theta_0$:

$$\frac{\partial}{\partial \theta} \sum_z \lambda_z \log p(x, z|\theta) \Big|_{\theta_0} = \sum_z \frac{\lambda_z}{p(x, z|\theta_0)} \frac{\partial p(x, z|\theta)}{\partial \theta} \Big|_{\theta_0}.$$

On the other hand we have

$$\frac{\partial}{\partial \theta} \log \sum_z p(x, z|\theta) \Big|_{\theta_0} = \frac{1}{\sum_w p(x, w|\theta_0)} \sum_z \frac{\partial p(x, z|\theta)}{\partial \theta} \Big|_{\theta_0}.$$

To achieve equality we want to make these expressions match. We choose

$$\lambda_z = \frac{p(x, z|\theta_0)}{\sum_w p(x, w|\theta_0)} = p(z|x, \theta_0).$$

Our desired proxy function is:

$$Q(\theta, \theta_0) = \sum_z p(z|x, \theta_0) \log p(x, z|\theta).$$

So the procedure is to take an estimated $\theta_0$ and obtain a new estimate $\theta_1$ by optimising this proxy function with respect to $\theta$. This is the standard EM algorithm.


It turns out that this proxy has some other useful properties. For example, because of the concavity of the logarithm, the proxy is always smaller than the original likelihood. This means that when we optimise it we never optimise “too far” and that progress optimising the proxy is always progress optimising the original likelihood. But I don't need to say anything about this as it's all part of the standard literature.


Afterword

As a side effect we have a general purpose optimisation algorithm that has nothing to do with statistics. If your goal is to compute

$$\operatorname{argmax}_\theta \log \sum_i f_i(\theta)$$

you can iterate, at each step computing

$$\theta_{n+1} = \operatorname{argmax}_\theta \sum_i \frac{f_i(\theta_n)}{\sum_j f_j(\theta_n)} \log f_i(\theta)$$

where $\theta_n$ is the previous iteration. If the $f_i$ take a convenient form then this may turn out to be much easier.
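As a toy check of the recipe (my own example, not from the derivation above): take Gaussian bumps $f_i(\theta) = e^{-(\theta - a_i)^2/2}$, for which the inner argmax has a closed form, namely the weighted mean of the $a_i$.

from math import exp

a = [0.0, 1.0]                 # centres of the bumps f_i

def f(i, t):
    return exp(-(t - a[i]) ** 2 / 2)

t = 0.9                        # initial guess
for _ in range(50):
    lam = [f(i, t) for i in range(len(a))]
    z = sum(lam)
    lam = [w / z for w in lam]                # weights from the previous iterate
    t = sum(w * ai for w, ai in zip(lam, a))  # closed-form argmax of the proxy
print(t)  # ~0.5, the maximiser of log(f_0 + f_1)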


Note

This was originally written as a PDF using LaTeX. It'll be available here for a while. Some fidelity was lost when converting it to HTML.

by sigfpe (noreply@blogger.com) at April 29, 2025 09:12 PM

Running from the past


Important Note

Google have stopped supporting the Chart API so all of the mathematics notation below is missing. There is a PDF version of this article at GitHub.


Preface

Functional programming encourages us to program without mutable state. Instead we compose functions that can be viewed as state transformers. It's a change of perspective that can have a big impact on how we reason about our code. But it's also a change of perspective that can be useful in mathematics and I'd like to give an example: a really beautiful technique that allows you to sample from the infinite limit of a probability distribution without needing an infinite number of operations. (Unless you're infinitely unlucky!)



Markov Chains

A Markov chain is a sequence of random states where each state is drawn from a random distribution that possibly depends on the previous state, but not on any earlier state. So it is a sequence $X_0, X_1, X_2, \ldots$ such that $P(X_{n+1} = x | X_n = x_n, \ldots, X_0 = x_0) = P(X_{n+1} = x | X_n = x_n)$ for all $n$. A basic example might be a model of the weather in which each day is either sunny or rainy but where it's more likely to be rainy (or sunny) if the previous day was rainy (or sunny). (And to be technically correct: having information about two days ago or earlier doesn't help us if we know yesterday's weather.)


Like imperative code, this description is stateful. The state at step $n+1$ depends on the state at step $n$. Probability is often easier to reason about when we work with independent identically drawn random variables and our $X_n$ aren't of this type. But we can eliminate the state from our description using the same method used by functional programmers.


Let's choose a Markov chain to play with. I'll pick one with 3 states called $A$, $B$ and $C$, and with transition probabilities given by $P(X_{n+1} = y | X_n = x) = T_{xy}$ where

$$T = \begin{pmatrix} \frac{1}{2} & \frac{1}{2} & 0 \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ 0 & \frac{1}{2} & \frac{1}{2} \end{pmatrix}$$

with rows and columns ordered $A$, $B$, $C$.


Here's a diagram illustrating our states:




Implementation

First some imports:


> {-# LANGUAGE LambdaCase #-}
> {-# LANGUAGE TypeApplications #-}


> import Data.Sequence(replicateA)
> import System.Random
> import Control.Monad.State
> import Control.Monad
> import Data.List
> import Data.Array


And now the type of our random variable:


> data ABC = A | B | C deriving (Eq, Show, Ord, Enum, Bounded)


We are now in a position to simulate our Markov chain. First we need some random numbers drawn uniformly from [0, 1]:


> uniform :: (RandomGen gen, MonadState gen m) => m Double
> uniform = state random


And now the code to take a single step in the Markov chain:


> step :: (RandomGen gen, MonadState gen m) => ABC -> m ABC
> step A = do
>     a <- uniform
>     if a < 0.5
>         then return A
>         else return B
> step B = do
>     a <- uniform
>     if a < 1/3.0
>         then return A
>         else if a < 2/3.0
>             then return B
>             else return C
> step C = do
>     a <- uniform
>     if a < 0.5
>         then return B
>         else return C


Notice how the step function generates a new state at random in a way that depends on the previous state. The m ABC in the type signature makes it clear that we are generating random states at each step.


We can simulate the effect of taking $n$ steps with a function like this:


> steps :: (RandomGen gen, MonadState gen m) => Int -> ABC -> m ABC
> steps 0 i = return i
> steps n i = do
>     i <- steps (n-1) i
>     step i


We can run for three steps, starting with $A$, with a line like so:


*Main> evalState (steps 3 A) gen
B


The starting state of our random number generator is given by gen.


Consider the distribution of states after taking $n$ steps. For Markov chains of this type, we know that as $n$ goes to infinity the distribution of the $n$th state approaches a limiting "stationary" distribution. There are frequently times when we want to sample from this final distribution. For a Markov chain as simple as this example, you can solve exactly to find the limiting distribution. But for real world problems this can be intractable. Instead, a popular solution is to pick a large $n$ and hope it's large enough. As $n$ gets larger the distribution gets closer to the limiting distribution. And that's the problem I want to solve here - sampling from the limit. It turns out that by thinking about random functions instead of random states we can actually sample from the limiting distribution exactly.



Some random functions


Here is a new version of our random step function:


> step' :: (RandomGen gen, MonadState gen m) => m (ABC -> ABC)
> step' = do
>     a <- uniform
>     return $ \case
>         A -> if a < 0.5 then A else B
>         B -> if a < 1/3.0
>                 then A
>                 else if a < 2/3.0 then B else C
>         C -> if a < 0.5 then B else C


In many ways it's similar to the previous one. But there's one very big difference: the type signature m (ABC -> ABC) tells us that it's returning a random function, not a random state. We can simulate the result of taking 10 steps, say, by drawing 10 random functions, composing them, and applying the result to our initial state:


> steps' :: (RandomGen gen, MonadState gen m) => Int -> m (ABC -> ABC)
> steps' n = do
>   fs <- replicateA n step'
>   return $ foldr (flip (.)) id fs


Notice the use of flip. We want to compose functions $f_n \circ f_{n-1} \circ \cdots \circ f_1$, each time composing on the left by the new $f$. This means that for a fixed seed gen, each time you increase $n$ by 1 you get the next step in a single simulation: (BTW I used replicateA instead of replicateM to indicate that these are independent random draws. It may be well known that you can use Applicative instead of Monad to indicate independence but I haven't seen it written down.)


*Main> [f A | n <- [0..10], let f = evalState (steps' n) gen]
[A,A,A,B,C,B,A,B,A,B,C]


When I first implemented this I accidentally forgot the flip. So maybe you're wondering what effect removing the flip has? The effect is about as close to a miracle as I've seen in mathematics. It allows us to sample from the limiting distribution in a finite number of steps!


Here's the code:


> steps_from_past :: (RandomGen gen, MonadState gen m) => Int -> m (ABC -> ABC)
> steps_from_past n = do
>   fs <- replicateA n step'
>   return $ foldr (.) id fs


We end up building $f_1 \circ f_2 \circ \cdots \circ f_n$. This is still a composition of independent identically distributed functions and so it's still drawing from exactly the same distribution as steps'. Nonetheless, there is a difference: for a particular choice of seed, steps_from_past n no longer gives us a sequence of states from a Markov chain. Running with argument $n$ draws a random composition of $n$ functions. But if you increase $n$ by 1 you don't add a new step at the end. Instead you effectively restart the Markov chain with a new first step generated by a new random seed.


Try it and see:


*Main> [f A | n <- [0..10], let f = evalState (steps_from_past n) gen]
[A, A, A, A, A, A, A, A, A, A]


Maybe that's surprising. It seems to get stuck in one state. In fact, we can try applying the resulting function to all three states.


*Main> [fmap f [A, B, C] | n <- [0..10], let f = evalState (steps_from_past n) gen]
[[A,B,C],[A,A,B],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A]]


In other words, for large enough $n$ we get the constant function.


Think of it this way: If f isn't injective then it's possible that two states get collapsed to the same state. If you keep picking random f's it's inevitable that you will eventually collapse down to the point where all arguments get mapped to the same state. Once this happens, we'll get the same result no matter how large we take $n$. If we can detect this then we've found the limit of $f_1 \circ f_2 \circ \cdots \circ f_n$ as $n$ goes to infinity. But because we know composing forwards and composing backwards lead to draws from the same distribution, the limiting backward composition must actually be a draw from the same distribution as the limiting forward composition. That flip can't change what probability distribution we're drawing from - just the dependence on the seed. So the value the constant function takes is actually a draw from the limiting stationary distribution.
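In symbols (my notation): write

$$F_n = f_1 \circ f_2 \circ \cdots \circ f_n, \qquad F_{n+1} = F_n \circ f_{n+1}.$$

Once some $F_n$ is a constant function, $F_{n+1} = F_n \circ f_{n+1} = F_n$, so the sequence is frozen from then on, and the frozen constant is the sample.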


We can code this up:


> all_equal :: (Eq a) => [a] -> Bool
> all_equal [] = True
> all_equal [_] = True
> all_equal (a : as) = all (== a) as


> test_constant :: (Bounded a, Enum a, Eq a) => (a -> a) -> Bool
> test_constant f =
>     all_equal $ map f $ enumFromTo minBound maxBound


This technique is called coupling from the past. It's "coupling" because we've arranged that different starting points coalesce. And it's "from the past" because we're essentially answering the question of what the outcome of a simulation would be if we started infinitely far in the past.


> couple_from_past :: (RandomGen gen, MonadState gen m, Enum a, Bounded a, Eq a) =>
>                   m (a -> a) -> (a -> a) -> m (a -> a)
> couple_from_past step f = do
>     if test_constant f
>         then return f
>         else do
>             f' <- step
>             couple_from_past step (f . f')


We can now sample from the limiting distribution a million times, say:


*Main> let samples = map ($ A) $ evalState (replicateA 1000000 (couple_from_past step' id)) gen


We can now count how often A appears:


*Main> fromIntegral (length $ filter (== A) samples)/1000000
0.285748


That's a pretty good approximation to $\frac{2}{7}$, the exact answer that can be found by finding the eigenvector of the transition matrix corresponding to an eigenvalue of 1.
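Spelling that out (my arithmetic): the stationary distribution satisfies $\pi T = \pi$ with $\pi_A + \pi_B + \pi_C = 1$, which for the matrix above gives

$$\pi = \left( \frac{2}{7}, \frac{3}{7}, \frac{2}{7} \right),$$

and $\frac{2}{7} \approx 0.2857$, matching the simulated frequency 0.285748.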


> gen = mkStdGen 669



Notes

The technique of coupling from the past first appeared in a paper by Propp and Wilson. The paper Iterated Random Functions by Persi Diaconis gave me a lot of insight into it. Note that the code above is absolutely not how you'd implement this for real. I wrote the code that way so that I could switch algorithm with the simple removal of a flip. In fact, with some clever tricks you can make this method work with state spaces so large that you couldn't possibly hope to enumerate all starting states to detect if convergence has occurred. Or even with uncountably large state spaces. But I'll let you read the Propp-Wilson paper to find out how.

by sigfpe (noreply@blogger.com) at April 29, 2025 09:10 PM

Constructing Clifford Algebras using the Super Tensor Product

Google have stopped supporting the Chart API so all of the mathematics notation below is missing. There is a PDF version of this article at GitHub.

Some literate Haskell but little about this code is specific to Haskell...


> {-# LANGUAGE DataKinds #-}
> {-# LANGUAGE TypeFamilies #-}
> {-# LANGUAGE TypeOperators #-}
> {-# LANGUAGE UndecidableInstances #-}
> 
> import GHC.TypeLits



Introduction

This is a followup to Geometric Algebra for Free and More Low Cost Geometric Algebra.


In those articles I showed how you could build up the Clifford algebras like so:


type Cliff1  = Complex R
type Cliff1' = Split R
type Cliff2  = Quaternion R
type Cliff2' = Matrix R
type Cliff3  = Quaternion Cliff1'
type Cliff3' = Matrix Cliff1
type Cliff4  = Quaternion Cliff2'
type Cliff4' = Matrix Cliff2
type Cliff5  = Quaternion Cliff3'
...


I used CliffN as the Clifford algebra for a negative definite inner product and CliffN' for the positive definite case. It's not a completely uniform sequence in the sense that CliffN is built from CliffN' for dimension two lower and you use a mix of Matrix and Quaternion.


The core principle making this work is that for type constructors implemented like Matrix, Quaternion etc. we have the property that

$$F(G(\mathbb{R})) \cong F(\mathbb{R}) \otimes G(\mathbb{R}),$$

eg. Matrix (Quaternion Float) is effectively the same thing as Matrix Float ⊗ Quaternion Float.


But John Baez pointed out to me that you can build up the CliffN algebras much more simply enabling us to use these definitions:


> type Cliff1 = Complex Float
> type Cliff2 = Complex Cliff1
> type Cliff3 = Complex Cliff2
> type Cliff4 = Complex Cliff3
> type Cliff5 = Complex Cliff4


...


Or even better:


> type family Cliff (n :: Nat) :: * where
>   Cliff 0 = Float
>   Cliff n = Complex (Cliff (n - 1))


But there's one little catch. We have to work, not with the tensor product, but the super tensor product.


We define Complex the same way as before:


> data Complex a = C a a deriving (Eq, Show)


Previously we used a definition of multiplication like this:


instance Num a => Num (Complex a) where
  C a b * C c d = C (a * c - b * d) (a * d + b * c) 


We can think of C a b in Complex R as representing the element $a + ib$. The definition of multiplication in a tensor product of algebras is defined by $(a \otimes b)(c \otimes d) = (ac) \otimes (bd)$. So we have $(a + ib)(c + id) = (ac - bd) + i(ad + bc)$.


This means that line of code we wrote above defining * for Complex isn't simply a definition of multiplication of complex numbers, it says how to multiply in an algebra tensored with the complex numbers.



Let's go Super!

A superalgebra is an algebra graded by $\mathbb{Z}_2$ where $\mathbb{Z}_2$ is the ring of integers modulo 2. What that means is that we have some algebra $A$ that can be broken down as a direct sum $A = A_0 \oplus A_1$ (the subscripts live in $\mathbb{Z}_2$) with the property that multiplication respects the grading, ie. if $a$ is in $A_i$ and $b$ is in $A_j$ then $ab$ is in $A_{i+j}$.


The elements of $A_0$ are called "even" (or bosonic) and those in $A_1$ "odd" (or fermionic). Often even elements commute with everything and odd elements anticommute with each other but this isn't always the case. (The superalgebra is said to be supercommutative when this happens. This is a common pattern: a thing X becomes a superX if it has odd and even parts and swapping two odd things introduces a sign flip.)


The super tensor product is much like the tensor product but it respects the grading. This means that if $a$ is in $A_i$ and $b$ is in $B_j$ then $a \otimes b$ is in $(A \otimes B)_{i+j}$. From now on I'm using $\otimes$ to mean super tensor product.


Multiplication in the super tensor product of two superalgebras $A$ and $B$ is now defined by the following modified rule: if $b$ is in $B_i$ and $c$ is in $A_j$ then $(a \otimes b)(c \otimes d) = (-1)^{ij} (ac) \otimes (bd)$. Note that the sign flip arises when we shuffle the odd $c$ left past the odd $b$.


The neat fact that John pointed out to me is that $\mathrm{Cliff}_{m+n} \cong \mathrm{Cliff}_m \otimes \mathrm{Cliff}_n$, where $\otimes$ is the super tensor product.
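As a quick sanity check of this fact in the smallest case (my own verification, in the notation above): inside $\mathrm{Cliff}_1 \otimes \mathrm{Cliff}_1$ put $e_1 = i \otimes 1$ and $e_2 = 1 \otimes i$, both odd. Then

$$e_1^2 = i^2 \otimes 1 = -1, \qquad e_2^2 = 1 \otimes i^2 = -1,$$

$$e_1 e_2 = i \otimes i, \qquad e_2 e_1 = -(i \otimes i),$$

with the minus sign coming from shuffling one odd $i$ past the other. Two anticommuting square roots of $-1$ are exactly the generators of $\mathrm{Cliff}_2$.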


We have to modify our definition of * to take into account that sign flip.


I initially wrote a whole lot of code to define a superalgebra as a pair of algebras with four multiplication operations and it got a bit messy. But I noticed that the only specifically superalgebraic operation I ever performed on an element of a superalgebra was negating the odd part of an element.


So I could define SuperAlgebra like so:


class SuperAlgebra a where
  conjugation :: a -> a


where conjugation is the negation of the odd part.


(I'm not sure if this operation corresponds to what is usually called conjugation in this branch of mathematics.)


But there's a little efficiency optimization I want to write. If I used the above definition, then later I'd often find myself computing a whole lot of negates in a row. This means applying negate to many elements of large algebraic objects even though any pair of them cancel each other's effect. So I add a little flag to my conjugation function that is used to say we want an extra negate and we can accumulate flips of a flag rather than flips of lots of elements.


> class SuperAlgebra a where
>   conjugation :: Bool -> a -> a


Here's our first instance:


> instance SuperAlgebra Float where
>   conjugation False x = x
>   conjugation True x = negate x


This is saying that the conjugation is the identity on Float but if we want to perform an extra flip we can set the flag to True. Maybe I should call it conjugationWithOptionalExtraNegation.


And now comes the first bit of non-trivial superalgebra:


> instance (Num a, SuperAlgebra a) => SuperAlgebra (Complex a) where
>   conjugation e (C a b) = C (conjugation e a) (conjugation (not e) b)


We consider $1$ to be even and $i$ to be odd. When we apply the conjugation to C a b then we can just apply it directly to $a$. But the $i$ flips the "parity" of $b$ (because tensor product respects the grading) so we need to swap the flag when we use the conjugation on $b$. And that should explain why conjugation is defined the way it is.


Now we can use the modified rule for multiplication defined above:


> instance (Num a, SuperAlgebra a) => Num (Complex a) where
>   fromInteger n = C (fromInteger n) 0
>   C a b + C a' b' = C (a + a') (b + b')
>   C a b * C c d = C (a * c - conjugation False b * d)
>                     (conjugation False a * d + b * c) 
>   negate (C a b) = C (negate a) (negate b)
>   abs = undefined
>   signum = undefined


For example, conjugation False is applied to the first $b$ on the RHS because $b$ implicitly represents an $i \otimes b$ term and when expanding out the product we shuffle the (odd) $i$ in $i \otimes d$ left of $b$. It doesn't get applied to the second $b$ because $b$ and $c$ remain in the same order.


That's it!



Tests

I'll test it with some examples from Cliff3:


> class HasBasis a where
>   e :: Integer -> a


> instance HasBasis Float where
>   e = undefined


> instance (Num a, HasBasis a) => HasBasis (Complex a) where
>   e 0 = C 0 1
>   e n = C (e (n - 1)) 0


> make a b c d e f g h =
>     C (C (C a b) (C c d))
>       (C (C e f) (C g h))


> e1, e2, e3, e21, e31, e32, e321 :: Cliff 3
> e1 = e 0
> e2 = e 1
> e21 = e2 * e1
> e3 = e 2
> e31 = e3 * e1
> e32 = e3 * e2
> e321 = e3 * e2 * e1


> main = do
>   print (e1 * e1 + 1 == 0)
>   print (e31 * e31 + 1 == 0)
>   print (e3 * e3 + 1 == 0)
>   print (e21 * e21 + 1 == 0)
>   print (e2 * e2 + 1 == 0)
>   print (e32 * e32 + 1 == 0)
>   print (e321 * e321 - 1 == 0)
>   print (e3 * e2 * e1 - e321 == 0)
>   print (e2 * e1 - e21 == 0)
>   print (e3 * e1 - e31 == 0)
>   print (e3 * e2 - e32 == 0)
>   print (e21 * e32 - e31 == 0)



Observation

The implementation of multiplication looks remarkably like it's the Cayley-Dickson construction. It can't be (because iterating it three times gives you a non-associative algebra but the Clifford algebras are associative). Nonetheless, I think comparison with Cayley-Dickson may be useful.



Efficiency

As mentioned above, before I realised I just needed the conjugation operation I wrote the above code with an explicit split of a superalgebra into two pieces intertwined by four multiplications. I think the previous approach may have a big advantage - it may be possible to use variations on the well known "speed-up" of complex multiplication that uses three real multiplications instead of four. This should lead to a fast implementation of Clifford algebras.


Also be warned: you can kill GHC if you turn on optimization and try to multiply elements of high-dimensional Clifford algebras. I think it tries to inline absolutely everything and you end up with a block of code that grows exponentially with the dimension $n$.


Note also that this code translates directly into many languages.

by sigfpe (noreply@blogger.com) at April 29, 2025 09:08 PM

April 28, 2025

Monday Morning Haskell

New Course Bundles!

With the release of the final module of Solve.hs last week, we now have 7 finished courses available at Monday Morning Haskell Academy. With this many courses, it might be a little challenging to pick the right one.

While we have a little guide on our website to help you pick, I wanted to make it a bit easier to select courses at the right level of experience, and to provide some really great deals on our course material.

So last week, we released 3 new course bundles to help you save. The 3 levels are Beginner, Advanced, and Complete.

Beginner Bundle

Our Beginner Bundle includes a total of 4 courses.

This bundle is great if you’re just starting out with Haskell, even if you haven’t installed it or written a line before! The first two courses in this bundle will help you install your toolchain and learn the language fundamentals. Then after that, you’ll learn about some trickier Haskell concepts, like monads and advanced problem solving techniques.

Along the way, you’ll also get the chance to write a couple small projects to build your skills and confidence. With the progression of these courses, you can really go from “Zero Knowledge” to “Confident Haskell User”.

Advanced Bundle

Our Advanced Bundle is for Haskellers who’ve mastered the basics and are trying to learn how to apply Haskell in some more “real-world” settings. The courses are:

In the first three courses, you’ll learn about things like machine learning, writing web servers, deploying applications, and managing complex effect stacks.

Then you’ll see that our newly completed Solve.hs course appears in both bundles. It bridges the gap between basic problem solving skills, like manipulating lists and strings, and more advanced ideas, like implementing data structures from scratch and writing complex algorithms. So even if you’ve got some decent skills already, you’ll definitely still find quite a few challenges in this course!

MMH Complete

Finally, MMH Complete will give you access to our entire library of courses. You’ll get all 7 courses, at a substantial discount! Plus you are guaranteed to receive any new course content we come up with in the future.

Discounts!

Speaking of discounts, here are the discounts you would get for each bundle vs. purchasing each course individually:

Plus, this week, you can get an extra 20% off all courses and bundles using the code SOLVE25. If you want an even bigger discount, you can subscribe to our newsletter. You’ll get monthly updates AND a code for 30% off all products.

So don’t miss out on these offers! Head to the courses page now! Next week, they’ll be going away!

by James Bowen at April 28, 2025 08:30 AM

April 26, 2025

Gabriella Gonzalez

Unification-free ("keyword") type checking

Unification-free ("keyword") type checking

From my perspective, one of the biggest open problems in implementing programming languages is how to add a type system to the language without significantly complicating the implementation.

For example, in my tutorial Fall-from-Grace implementation the type checker logic accounts for over half of the code. In the following lines of code report I’ve highlighted the modules responsible for type-checking with a ‡:

$ cloc --by-file src/Grace/*.hs       

--------------------------------------------------------------------------------
File                                    blank        comment           code
--------------------------------------------------------------------------------
src/Grace/Infer.hs        ‡               499            334           1696
src/Grace/Type.hs         ‡                96             91            633
src/Grace/Syntax.hs                        61            163            543
src/Grace/Parser.hs                       166             15            477
src/Grace/Lexer.hs                         69             25            412
src/Grace/Normalize.hs                     47             48            409
src/Grace/Context.hs      ‡                72            165            249
src/Grace/Import.hs                        38              5            161
src/Grace/REPL.hs                          56              4            148
src/Grace/Interpret.hs                     30             28            114
src/Grace/Pretty.hs                        25             25            108
src/Grace/Monotype.hs     ‡                11             48             61
src/Grace/Location.hs                      16             15             60
src/Grace/TH.hs                            23             32             53
src/Grace/Value.hs                         12             53             53
src/Grace/Input.hs                         10              8             43
src/Grace/Compat.hs                         9              2             32
src/Grace/Existential.hs  ‡                12             23             25
src/Grace/Domain.hs       ‡                 4              7             20
--------------------------------------------------------------------------------
SUM:                                     1256           1091           5297
--------------------------------------------------------------------------------

That’s 2684 lines of code (≈51%) just for type-checking (and believe me: I tried very hard to simplify the type-checking code).

This is the reason why programming language implementers are often keen to just not implement a type-checker for their language (especially given that a type system is technically an optional language feature) when doing so would balloon the size of their codebase. And that’s how we end up with a proliferation of untyped programming languages (e.g. Godot or Nix), or ones that end up with a type system bolted on long after the fact (e.g. TypeScript or Python).

So I’m extremely keen on implementing a “lean” type checker that has a high power-to-weight ratio. I also believe that a compact type checker is an important foundational step for functional programming to “go viral” and displace imperative programming. This post outlines one approach to this problem that I’ve been experimenting with¹.

Unification

The thing that bloats the size of most type-checking implementations is the need to track unification variables. These variables are placeholders for storing as-yet-unknown information about something’s type.

For example, when a functional programming language infers the type of something like this Grace expression:

(λx → x) true

… the way it typically works is that it will infer the type of the function (λx → x) which will be:

λx → x : α → α

… where α is a unification variable (an unsolved type). So you can read the above type annotation as saying “the type of λx → x is a function from some unknown input type (α) to the same output type (α)”.

Then the type checker will infer the type of the function’s input argument (true) which will be:

true : Bool

… and finally the type checker will combine those two pieces of information and reason about the final type like this:

  • the input to the function (true) is a Bool
  • therefore the function’s input type (α) must also be Bool
  • therefore the function’s output type (α) must also be Bool
  • therefore the entire expression’s type is Bool

… which gives the following conclusion of type inference:

(λx → x) true : Bool

However, managing unification variables like α is a lot trickier than it sounds. There are multiple unification algorithms/frameworks in the wild but the problem with all of them is that you have to essentially implement a bespoke logic programming language (with all of the complexity that entails). Like, geez, I’m already implementing a programming language and I don’t want to have to implement a logic programming language on top of that just to power my type-checker.

So there are a couple of ways I’ve been brainstorming how to address this problem and one idea I had was: what if we could get rid of unification variables altogether?

Deleting unification

Alright, so this is the part of the post that requires some familiarity/experience with implementing a type-checker. If you’re somebody new to programming language theory then you can still keep reading, but this is where I have to assume some prior knowledge; otherwise this post will get way too long.

The basic idea is that you start from the “Complete and Easy” bidirectional type checking algorithm, which is a type checking algorithm that does use unification variables² but is simpler than most type checking algorithms. The type checking rules are the ones given in Figures 9–11 of the paper (you can just gloss over them).

Now, delete all the rules involving unification variables. Yes, all of them. That means that all of the type-checking judgments from Figures 9 and 10 are gone and also quite a few rules from Figure 11 disappear, too.

Surprisingly, you can still type check a lot of code with what’s left, but you lose two important type inference features if you do this:

  • you can no longer infer the types of lambda arguments

  • you can no longer automatically instantiate polymorphic code

… and I’ll dig into those two issues in more detail.

Inferring lambda argument types

You lose the ability to infer the type of a function like this one when you drop support for unification variables:

λx → x == False

Normally, a type checker that supports unification can infer that the above function has type Bool → Bool, but (in general) a type checker can no longer infer that when you drop unification variables from the implementation.

This loss is not too bad (in fact, it’s a pretty common trade-off proposed in the bidirectional type checking literature) because you can make up for it in a few ways (all of which are easy and efficient to implement in a type checker):

  • You can allow the input type to be inferred if the lambda is given an explicit type annotation, like this:

    λx → x == False : Bool → Bool

    More generally, you can allow the input type to be inferred if the lambda is checked against an expected type (and a type annotation is one case, but not the only case, where a lambda is checked against an expected type).

    We’re going to lean on this pretty heavily because it’s pretty reasonable to ask users to provide type annotations for function definitions and also because there are many situations where we can infer the expected type of a lambda expression from its immediate context.

  • You can allow the user to explicitly supply the type of the argument

    … like this:

    λ(x : Bool) → x == False

    This is how Dhall works, although it’s not as ergonomic.

  • You can allow the input type to be inferred if the lambda is applied to an argument

    This is not that interesting, but I’m mentioning it for completeness. The reason it’s not interesting is that you won’t often see expressions of the form (λx → e) y in the wild, because they can more idiomatically be rewritten as let x = y in e.

Instantiating polymorphic code

The bigger issue with dropping support for unification variables is: all user-defined polymorphic functions now require explicit type abstraction and explicit type application, which is a major regression in the type system’s user experience.

For example, in a language with unification variables you can write the polymorphic identity function as:

λx → x

… and use it like this³:

let id = λx → x
in  (id true, id 1)

… but when you drop support for unification variables then you have to do something like this:

let id = λ(a : Type) → λ(x : a) → x
in  (id Bool true, id Natural 1)

Most programmers do NOT want to program in a language where they have to explicitly manipulate type variables in this way. In particular, they really hate explicit type application. For example, nobody wants to write:

map { x : Bool, … large record … } Bool (λr → r.x) rs

So we need to figure out some way to work around this limitation.

The trick

However, there is a solution that I believe gives a high power-to-weight ratio, which I will refer to as “keyword” type checking:

  • add a bunch of built-in functions

    Specifically, add enough built-in functions to cover most use cases where users would need a polymorphic function.

  • add special type-checking rules for those built-in functions when they’re fully saturated with all of their arguments

    These special-cased type-checking rules would not require unification variables.

  • still require explicit type abstraction when these built-in functions are not fully saturated

    Alternatively, you can require that built-in polymorphic functions are fully saturated with their arguments and make it a parsing error if they’re not.

  • still require explicit type abstraction and explicit type application for all user-defined (i.e. non-builtin) polymorphic functions

  • optionally, turn these built-in functions into keywords or language constructs

I’ll give a concrete example: the map function for lists. In many functional programming languages this map function is not a built-in function; rather it’s defined within the host language as a function of the following type:

map : ∀(a b : Type) → (a → b) → List a → List b

What I’m proposing is that the map function would now become a built-in function within the language and you would now apply a special type-checking rule when the map function is fully saturated:

Γ ⊢ xs ⇒ List a   Γ ⊢ f ⇐ a → b
───────────────────────────────
Γ ⊢ map f xs ⇐ List b

In other words, we’re essentially treating the map built-in function like a “keyword” in our language (when it’s fully saturated). Just like a keyword, it’s a built-in language feature that has special type-checking rules. Hell, you could even make it an actual keyword or language construct (e.g. a list comprehension) instead of a function call.

I would even argue that you should make each of these special-cased builtin-functions a keyword or a language construct instead of a function call (which is why I call this “keyword type checking” in the first place). When viewed through this lens the restrictions that these polymorphic built-in functions (A) are saturated with their arguments and (B) have a special type checking judgment are no different than the restrictions for ordinary keywords or language constructs (which also must be saturated with their arguments and also require special type checking judgments).

To make an analogy, in many functional programming languages the if/then/else construct has this same “keyword” status. You typically don’t implement it as a user-space function of this type:

ifThenElse : ∀(a : Type) → Bool → a → a → a

Rather, you define if as a language construct and you also add a special type-checking rule for if:

Γ ⊢ b ⇐ Bool   Γ ⊢ x ⇒ a   Γ ⊢ y ⇐ a
────────────────────────────────────
Γ ⊢ if b then x else y ⇒ a

… and what I’m proposing is essentially greatly exploding the number of “keywords” in the implementation of the language by turning a whole bunch of commonly-used polymorphic functions into built-in functions (or keywords, or language constructs) that are given special type-checking treatment.

For example, suppose the user were to create a polymorphic function like this one:

let twice = λ(a : Type) → λ(x : a) → [ x, x ]

in  twice (List Bool) (twice Bool true)

That’s not very ergonomic to define and use, but we also can’t reasonably expect our programming language to provide a twice built-in function. However, our language could provide a generally useful replicate builtin function (like Haskell’s replicate function):

replicate : ∀(a : Type) → Natural → a → List a

… with the following type-checking judgment:

Γ ⊢ n ⇐ Natural   Γ ⊢ x ⇒ a
───────────────────────────
Γ ⊢ replicate n x ⇒ List a

… and then you would tell the user to use replicate directly instead of defining their own twice function:

replicate 2 (replicate 2 true)

… and if the user were to ask you “How do I define a twice synonym for replicate 2” you would just tell them “Don’t do that. Use replicate 2 directly.”
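
To make the mechanics concrete, here is a minimal sketch in Haskell of such a checker, written against a hypothetical toy AST (none of these names come from Grace). The map rule fires in checking mode, the replicate and if rules fire in inference mode, and no unification variables appear anywhere:

{-# LANGUAGE LambdaCase #-}

-- Hypothetical toy language, for illustration only.
data Type = TBool | TNatural | TList Type | TFun Type Type
  deriving (Eq, Show)

data Expr
  = EBool Bool
  | ENat Integer
  | EVar String
  | EIf Expr Expr Expr   -- if b then x else y
  | EMap Expr Expr       -- "keyword": map f xs, always saturated
  | EReplicate Expr Expr -- "keyword": replicate n x, always saturated
  deriving (Show)

type Ctx = [(String, Type)]

-- Synthesis: Γ ⊢ e ⇒ A
infer :: Ctx -> Expr -> Either String Type
infer _   (EBool _) = Right TBool
infer _   (ENat _)  = Right TNatural
infer ctx (EVar x)  =
  maybe (Left ("unbound variable: " <> x)) Right (lookup x ctx)
infer ctx (EIf b x y) = do       -- Γ ⊢ b ⇐ Bool, Γ ⊢ x ⇒ a, Γ ⊢ y ⇐ a
  check ctx b TBool
  a <- infer ctx x
  check ctx y a
  Right a
infer ctx (EReplicate n x) = do  -- Γ ⊢ n ⇐ Natural, Γ ⊢ x ⇒ a
  check ctx n TNatural
  a <- infer ctx x
  Right (TList a)
infer _ e = Left ("cannot infer the type of: " <> show e)

-- Checking: Γ ⊢ e ⇐ A
check :: Ctx -> Expr -> Type -> Either String ()
check ctx (EMap f xs) (TList b) =  -- the special-cased rule for map
  infer ctx xs >>= \case
    TList a -> check ctx f (TFun a b)  -- Γ ⊢ f ⇐ a → b
    other   -> Left ("map expects a list, got: " <> show other)
check ctx e expected = do          -- fallback: synthesize, then compare
  actual <- infer ctx e
  if actual == expected
    then Right ()
    else Left ("expected " <> show expected <> ", got " <> show actual)

A real implementation would also need a clause for checking lambdas, but even this skeleton shows the flavor: each keyword gets one bespoke clause instead of a unification engine.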

Conclusion

This approach has the major upside that it’s much easier to implement a large number of keywords than it is to implement a unification algorithm, but there are other benefits to doing this, too!

  • It discourages complexity and fragmentation in user-space code

    Built-in polymorphic functions have an ergonomic advantage over user-defined polymorphic functions because under this framework type inference works better for built-in functions. This creates an ergonomic incentive to stick to the “standard library” of built-in polymorphic functions, which in turn promotes an opinionated coding style across all code written in that language.

    You might notice that this approach is somewhat similar in spirit to how Go handles polymorphism, which is to say: it doesn’t handle user-defined polymorphic code well. For example, Go provides a few built-in language features that support polymorphism (e.g. the map data structure and for loops) but if users ask for any sort of user-defined polymorphism then the maintainers tell them they’re wrong for wanting that. The main difference here is that (unlike Go) we do actually support user-defined polymorphism; it’s not forbidden, but it is less ergonomic than sticking to the built-in utilities that support polymorphism.

  • It improves error messages

    When you special-case the type-checking logic you can also special-case the error messages, too! With general-purpose unification the error message can often be a bit divorced from the user’s intent, but with “keyword type checking” the error message is not only more local to the problem but it can also suggest highly-specific tips or fixes appropriate for that built-in function (or keyword or language construct).

  • It can in some cases more closely match the expectations of imperative programmers

    What I mean is: most programmers coming from an imperative and typed background are used to languages where (most of the time) polymorphism is “supported” via built-in language constructs and keywords and user-defined polymorphism might be supported but considered “fancy”. Leaning on polymorphism via keywords and language constructs would actually make them more comfortable using polymorphism instead of trying to teach them how to produce and consume user-defined polymorphic functions.

    For example, in a lot of imperative languages the idiomatic solution for how to do anything with a list is “use a for loop” where you can think of a for loop as a built-in keyword that supports polymorphic code. The functional programming equivalent of “just use a for loop” would be something like “just use a list comprehension” (where a list comprehension is a “keyword” that supports polymorphic code that we can give special type checking treatment).

That said, this approach is still more brittle than unification and will require more type annotations in general. The goal here isn’t to completely recover the full power of unification but rather to get something that’s not too bad but significantly easier to implement.

I think this “keyword type checking” can potentially occupy a “low tech” point in the type checking design space for functional programming languages that need to have efficient and compact implementations (e.g. for ease of embedding). It can also provide a stop-gap solution for novice language implementers who want some sort of type system but aren’t willing to commit to implementing a unification-based one.

There’s also a variation on this idea, which Verity Scheel has been exploring: providing userland support for defining new functions with special type-checking rules. There’s a post from her outlining how to do that:

User Operators with Implicits & Overloads


  1. The other approach is to create essentially an “ABNF for type checkers” that would let you write type-checking judgments in a standard format that could generate the corresponding type-checking code in multiple languages. That’s still a work-in-progress, though.↩︎

  2. I believe some people might take issue with calling these unification variables because they consider bidirectional type checking as a distinct framework from unification. Moreover, in the original bidirectional type checking paper they’re called “unsolved” variables rather than unification variables. However, I feel that for the purpose of this post it’s still morally correct to refer to these unsolved variables as unification variables since their usage and complexity tradeoffs are essentially identical to unification variables in traditional unification algorithms.↩︎

  3. … assuming let expressions are generalized.↩︎

by Gabriella Gonzalez (noreply@blogger.com) at April 26, 2025 07:49 PM

Mark Jason Dominus

Willie Singletary will you please go now?

(Previously: [1] [2])

Welcome to Philadelphia! We have a lot of political corruption here. I recently wrote about the unusually corrupt Philadelphia Traffic Court, where four of the judges went to the federal pokey, and the state decided there was no way to clean it up, they had to step on it like a cockroach. I ended by saying:

One of those traffic court judges was Willie Singletary, who I've been planning to write about since 2019. But he is a hard worker who deserves better than to be stuck in an epilogue, so I'll try to get to him later this month.

This is that article from 2019, come to fruit at last. It was originally inspired by this notice that appeared at my polling place on election day that year:

[Image: cropped version of a notice posted in a Philadelphia polling place; its text appears below.]

VOTES FOR THIS CANDIDATE WILL NOT BE COUNTED

DEAR VOTERS:

Willie Singletary, candidate for Democratic Council At-Large, has been removed from the Primary Ballot by Court Order. Although his name appears on the ballot, votes for this candidate will not be counted because he was convicted of two Class E felonies by the United States District Court for the Eastern District of Pennsylvania, which bars his candidacy under Article 2, Section 7 of the Pennsylvania Constitution.

That's because Singletary had been one of those traffic court judges. In 2014 he had been convicted of lying to the FBI in connection with that case, and was sentenced to 20 months in federal prison; I think he actually served 12.

That didn't stop Willie from trying to run for City Council, though, and the challenge to his candidacy didn't wrap up before the ballots were printed, so they had to post these notices.

Even before the bribery scandal and the federal conviction, Singletary had already lost his Traffic Court job when it transpired that he had showed dick pics to a Traffic Court cashier.

Before that, when he was campaigning for the Traffic Court job, he was caught on video promising to give favorable treatment to campaign donors.

But Willie’s enterprise and go-get-it attitude mean he can’t be kept down for long. Willie rises to all challenges! He is now enjoying a $90,000 annual salary as a Deputy Director of Community Partnerships in the administration of Philadelphia Mayor Cherelle Parker. Parker’s spokesperson says “The Parker administration supports every person’s right to a second chance in society.”

I think he might be on his fourth or fifth chance by now, but who's counting? Let it never be said that Willie Singletary was a quitter.

Lorrie once made a remark that will live in my memory forever, about the “West Philadelphia local politics-to-prison pipeline”. Mayor Parker is such a visionary that she has been able to establish a second pipeline in the opposite direction!

Addendum 20250501

I don't know how this happened, but when I committed the final version of this article a few days ago, the commit message that my fingers typed was:

Date:   Sat Apr 26 14:24:19 2025 -0400

    Willie Wingletsray finally ready to go

And now, because Git, it's written in stone.

by Mark Dominus (mjd@plover.com) at April 26, 2025 06:24 PM

Matthew Sackman

Rallentando, and the awfulness of anti-virus software

Since I was a child, I’ve been playing the French Horn. I still play, and I take it quite seriously. I’m lucky enough to play with some good ensembles, and I perform many concerts each year.

When learning difficult music, I often practise with a metronome. A metronome is a device that clicks or beeps regularly. You can set how often it clicks; for example you might set it to click 80 times a minute. The tempo (or speed) of a piece of music is often specified by the composer telling you how many beats per minute they want. This is guidance and not sacrosanct: you don’t normally have to play at exactly this tempo, and frequently music requires some implicit variation of tempo for it to be successful. But it certainly is an important piece of information from the composer, telling you at what basic speed the piece should be going.

The problem with metronomes is that they can’t change their speed by themselves. For complex music, the speed could be changing a lot, or maybe the number of beats per bar changes. This is annoying because it means for some pieces of music you have to repeatedly stop playing, readjust the metronome, and then continue on. There are also gradual changes in tempo in music: a part of a piece might gently get faster or slower. No metronome can cope with this: essentially, metronomes know nothing about the piece of music you’re playing.

So in some spare time this year, I built one that does understand music.

It does nearly all the musical things you would want it to. As this is mainly a technical blog though, I’ll focus here on the technical aspects.

It could have been an app; I’ve built apps before. But the faff of getting onto the Play Store or the App Store is just not worth it. The development tools are heavyweight and annoying. Sending out new versions requires approval processes, and you have to convince people to install something before they can use it. So I wanted this to be browser based. Also, modern web browsers are pretty amazing – great features and well designed APIs. Yep, all the old APIs are terrible and awful to work with, but everything that’s at all recent is pretty great.

Drawing music in a browser is quite a challenge though. The way I’m doing it is I’m building an SVG, client-side. This was the first thing I started on: trying to figure out how to draw music in a browser, how to be able to detect clicks, and make it all interactive. The client side code is all generated from TypeScript using the plain tsc to do the translation to JavaScript. I can’t stand complex tool-chains, and modern browsers are absolutely fine with loading modules (and you can do some really nice things with import maps as we’ll see). I’m not even minimising the JavaScript: I’ve written the server myself; the modules are sent over the wire gzipped and I have correct cache-control settings using immutable and “cache-busting”, so minimising the source just makes debugging life harder for no real gain.

A score is essentially a list of blocks. I’m using a CRDT (the fugue list CRDT) to allow local-first editing (and even offline editing). Dirty blocks get sent over a websocket and stored on the server, using LMDB which is all very normal for me.

The server has a neat part of its design: when you compile the server, all the static assets are embedded into the binary, thus making it a single self-contained executable. Now those assets (HTML, CSS, images, JavaScript etc) are just normally named files, but they can also be Go templates. When the server starts up, it works through these static assets, building HTTP routes for them. The HTTP routes contain in their paths the hashcode of the file – this is necessary for the cache busting. If the asset is a template, the server knows how to run the template, and critically, I provide a url function in the template engine so that templates can get the URL of some other asset including its hashcode. So this means that if some HTML file needs to link to some CSS file, the HTML file as built into the server can be a template. At start up, this template gets run, it can invoke this url function, and it can find out the final URL of the CSS file. And of course this URL now influences the hashcode of the HTML file itself. This also plays very nicely with the integrity attribute you can put on all sorts of things these days.

So it all works out rather nicely: consider the tree of file A importing files B and C, where file B imports file D. If I make some change to file D, its hashcode changes, and so its URL changes. This propagates up to file B, and from there to file A (but no change to file C). So it’s safe to serve all these static assets with immutable cache-control headers and rely on this lovely hashcode chaining. All of this work is done once, each time the server starts up. And it’ll all explode and stop working if there’s ever a cycle in the graph of file imports.
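
As a sketch of that chaining idea (in Haskell rather than the Go of the real server, and with hypothetical names), an asset’s final URL can be computed recursively from its content and the URLs of its imports:

import qualified Data.Map.Strict as Map
import Data.Hashable (hash)  -- from the hashable package

-- One static asset: its (template-rendered) content plus the names of
-- the assets it references.
data Asset = Asset
  { body    :: String
  , imports :: [String]
  }

-- An asset's URL depends on its own content *and* on the final URLs of
-- everything it imports, so a change anywhere propagates up the import
-- tree. As noted above, a cycle in the import graph would make this
-- recurse forever.
urlFor :: Map.Map String Asset -> String -> String
urlFor assets name =
  let Asset b is = assets Map.! name
      h = hash (b, map (urlFor assets) is)
  in "/assets/" <> show (abs h) <> "/" <> name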

Now in practice, it seems that references between HTML, CSS, images, or JavaScript don’t seem to create cycles – at least I’ve not had a problem so far. But between JavaScript modules, it’s much more common, as you’d likely expect. But here, import maps come to the rescue: in my TypeScript/JavaScript, I just import modules normally. I have a function in the template engine which knows how to generate an import-map of all my JavaScript modules, which gets injected into the top HTML page. This import-map provides both the rewriting of paths (to add hashcodes onto the paths), and also provides the integrity section. This solves the problem of circular imports because it means the JavaScript itself never needs to contain the hashcode of any module it imports. Yet, if I change some JavaScript module, then its hashcode changes, which means the import-map changes, and so again, the browser is forced into correctly fetching the updated resource.


A couple of weekends ago, I was up visiting my parents and I wanted to demonstrate this thing to them (they’re also musicians). They have computers running Windows. I tried loading up a demo score, and it just didn’t work. Their browsers were up to date. Nothing of note in the server logs, so I opened up the browser console and found errors from deserialisation of data coming over the websocket: it was claiming the data was corrupted. I’d never seen this in my own development and use.

Checking a few other things, I spotted that the source HTML for the page had had some additional <script> elements added to it: something was injecting some JavaScript. And then the penny dropped: this is MITM behaviour by some shitty anti-virus software – in this case, AVG. Some quick web searching confirmed that, yep, those products are also known for dicking around with websocket traffic: if you’re sending binary messages and you’re compressing the stream, it’s apparently quite common that the anti-virus software intercepts the traffic and then screws up the framing, leading your own code to face corrupted data. Completely ridiculous.

In my case, disabling compression on the websocket was enough to prevent the corruption, and I then established that even for big scores, the initial load would be maybe 25kB of data over the websocket, so not compressing it isn’t terrible.

What made me laugh though was this: the browser console was telling me both about the corrupted data, and also about the fact the browser was refusing to run some script, due to it violating CSP settings. It took me a moment to realise that the script that wasn’t being run was the script that the anti-virus software was injecting into my pages! Now, import-maps can’t be external files; they have to be inline in the HTML. But there’s no way I’m serving HTML pages with a CSP header with script-src 'unsafe-inline'. Instead, I’m issuing a script-src CSP header with 'self' and also the hashcode of the import-map itself.

What this says to the browser is that it can trust the import-map (because hashing it will give a hashcode that matches the CSP header); the import-map itself carries an integrity hash for every JavaScript module it needs to load; and the CSP headers also tell the browser that it’s OK to load JavaScript modules from the same domain (this is the 'self'). But none of this gives permission for the browser to run arbitrary bits of crap JavaScript that some awful anti-virus thing has injected! So, by making use of CSP and import-maps, you can stop attackers from tampering with your website and code!
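
To illustrate how these pieces fit together (the hash values below are made up, not from my site), the header and the inline import-map look roughly like this:

Content-Security-Policy: script-src 'self' 'sha256-MadeUpHashOfTheImportMap='

<script type="importmap">
{
  "imports":   { "score": "/assets/1a2b3c/score.js" },
  "integrity": { "/assets/1a2b3c/score.js": "sha384-MadeUpHashOfTheModule=" }
}
</script>

The inline import-map runs only because its hash matches the one in the CSP header; every module it names must in turn match its integrity entry; and anything injected by third-party software matches neither.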

April 26, 2025 05:30 PM

April 24, 2025

Tweag I/O

The minimal megaparsec tutorial

In my functional programming course for Master’s students at Telecom Nancy, I like to use parsing as an example of monadic programming, relying on the megaparsec library. My only concern with megaparsec is that its official tutorial is long: at the time I’m writing, it’s 15,000 words long.

Unlike the official megaparsec tutorial, this blog post is intended to be smaller, and is aimed at an audience with only a basic understanding of Haskell and monadic programming.

All the Haskell material from this blogpost is available on our GitHub: https://github.com/tweag/minimal-megaparsec-tutorial. You can fork this repository to get a full-fledged setup (including CI and Haskell Language Server support) for experimenting with a megaparsec parser 🚀

Running example

My running example is a parser for a domain-specific language that I designed for the class. This language uses primitive drawing commands to represent ASCII art roguelike maps. It looks like this:

HLine 0 0 8; HLine 0 4 8; VLine 0 0 5; VLine 7 0 2; VLine 7 3 2
HLine 8 1 2; HLine 8 3 2
HLine 10 0 6; HLine 10 4 6; VLine 10 0 2; VLine 10 3 2; VLine 15 0 5
Start 2 1
Cell 13 3 ~; Cell 14 3 ~; Cell 14 2 ~

Here, HLine x y len and VLine x y len draw horizontal and vertical walls respectively. The Start x y command marks the player’s starting point and Cell x y ~ places special terrain.

Roguelike maps typically consist of rectangular rooms and connecting corridors, where walls are shown as #, water as ~, and walkable spaces as dots (.). For example, the snippet above draws a map with two connected rooms. The room on the left contains the player’s start location (>), while some water appears in the lower right corner of the room on the right:

########  ######
#.>....####....#
#.............~#
#......####..~~#
########  ######

Walkable floor cells are omitted from the domain-specific language, as they can be inferred by computing the set of cells reachable from the starting point. In implementations of roguelikes, maps like this one are translated into an array of arrays of symbols, with some symbols being walkable (e.g. dot cells and water cells) and some symbols being blockers (walls). The top-level array is then used to compute possible moves and collisions.

The Parsec monad

To use megaparsec, we define our main monad type using the Parsec e s a type. It has three arguments:

  1. The type of errors returned by the parser,
  2. the type of stream accepted as input by the parser, and
  3. the type of data returned upon successful parsing of an input stream.

For a simple parser, we define:

  • The error type to be Text, for simplicity. In a production parser, you would use a structured error type that distinguishes the different error cases, so that you can handle them differently.
  • The input stream to be Text, because this is the most idiomatic choice in the Haskell ecosystem:
import Data.Text (Text)
import Text.Megaparsec

type Error = Text
type Input = Text

-- | @Parser a@ is a parser that accepts @Text@ as input and returns an @a@ upon
-- successful parsing.
type Parser a = Parsec Error Input a

Our first parser

Parsers are built from primitive combinators (e.g. lookAhead, notFollowedBy, end of file eof) and combinators derived from them (e.g. oneOf, anySingle, satisfy). These combinators are designed to consume a few symbols, not complex structures (more on this later).

Combinators return parsers in any MonadParsec monad, which means that they have a signature whose head is MonadParsec e s m => ... and whose return type is of the form m a¹. In our context, it suffices to know that m a is instantiated to Parser a, so we can use these combinators for our parsers.
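
For example, here is the signature of the satisfy combinator:

satisfy :: MonadParsec e s m => (Token s -> Bool) -> m (Token s)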

Let’s parse the different kinds of symbols we usually find in ASCII art roguelike maps, using the anySingle function, which parses a single token. In our case, since the input type is Text, the type of tokens is Char (see the ShareInput case of Stream’s documentation, as well as the instances of Stream):

-- | A symbol in the map of an ASCII roguelike
data Symbol
  = -- | A wall, depicted by a # character
    Wall
  | -- | A water cell, depicted by a ~ character
    Water
  deriving (Eq, Show)

-- | A parser for the symbol of a single cell. Used in 'parseElement' below.
parseSymbol :: Parser Symbol
parseSymbol = do
  c <- anySingle
  case c of
    '#' -> return Wall
    '~' -> return Water
    _   -> fail $ "Unknown symbol: " <> [c] -- See below for how to avoid this case altogether (in parseLineElement)
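
Besides the parseLineElement approach referenced in the comment above, another way (not from the original tutorial) to avoid the fail case is to commit to the two valid characters up front, using char from Text.Megaparsec.Char (imported further below) together with choice:

-- | Equivalent to 'parseSymbol', but with no catch-all failure case
parseSymbol' :: Parser Symbol
parseSymbol' = choice [Wall <$ char '#', Water <$ char '~']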

Parser combinators

By virtue of MonadParsec parsers being monads, parsers can be built using functions that are common in monadic Haskell code (including functions from Functor, Applicative, etc.). Let’s demonstrate this by building a parser for more advanced roguelike map constructs:

data Element
  = -- | Horizontal wall, starting at @(x,y)@ with @length@ cells (ending at @(x+length-1,y)@)
    HorizontalLine Int Int Int
  | -- | Vertical wall, starting at @(x,y)@ with @length@ cells (ending at @(x,y+length-1)@)
    VerticalLine Int Int Int
  | -- | A cell at @(x,y)@ with a symbol
    Cell Int Int Symbol
  | -- | The starting point of the player
    Start Int Int
  deriving (Eq, Show)

The parser for the HorizontalLine and VerticalLine cases can be written as follows:

import Control.Monad (void)
import Control.Monad.Extra (when)
import Text.Megaparsec.Char
import Text.Megaparsec.Char.Lexer

parseLineElement :: Parser Element
parseLineElement = do
  constructor <- choice [string "HLine" >> return HorizontalLine, string "VLine" >> return VerticalLine]
  space1 -- One or more space
  x <- decimal
  space1
  y <- decimal
  space1
  len <- decimal
  when (len < 1) $ fail $ "Length must be greater than 0, but got " <> show len
  return $ constructor x y len

The do block’s first statement parses either the string HLine or the string VLine, using the choice function to encode the two possibilities. Also, because each line in a do block encodes a step in the computation, writing monadic parsers is natural: each line consumes some of the input, until enough is consumed to return the desired value. Another example of using a regular monadic function is the use of when to stop parsing when an incorrect value is consumed.

Running parsers

Since our parser takes Text as input, it can be tested in a pure context. Megaparsec provides the runParser function for this. To be able to print our parser’s errors, our error type must be an instance of ShowErrorComponent; then we can define a convenient runMyParser function that returns either an error or the parsed value:

import Data.Text (pack, unpack)

-- | Instance required for 'runMyParser'
instance ShowErrorComponent Error where
  showErrorComponent = unpack

-- | A variant of megaparsec's 'runParser', instantiated to our context.
-- Successfully parses an @a@ or returns an error message.
runMyParser :: Parser a -> Input -> Either Text a
runMyParser parser input =
  case runParser parser "" input of
    Left err -> Left $ pack $ errorBundlePretty err
    Right x  -> Right x
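
For example, in a ghci session (OverloadedStrings lets string literals stand in for Text):

ghci> :set -XOverloadedStrings
ghci> runMyParser parseSymbol "#"
Right Wall
ghci> runMyParser parseSymbol "~"
Right Water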

Parsing expressions, lists, etc.

Megaparsec not only provides building blocks for parsing tokens and combining parsers. It also provides parsers for common constructs found in programming languages and domain-specific languages, such as expressions and lists. Megaparsec does this by relying on the parser-combinators package.

I don’t want to go into the details of parsing expressions here (e.g. parsing 1 + 2 - 3…), but let me emphasize that it is a bad idea to write your own expression parser. Instead, think about what kind of operators you need and encode them, using the Operator type.

List parsing, on the other hand, is done with various sep… functions. In our case of roguelike maps, we allow different elements to be separated by a semicolon, or by one or more newlines. This is encoded as follows:

parseElements :: Parser [Element]
parseElements = parseElement `sepBy1` separator
  where
    separator = do
      hspace -- Optional horizontal (non-newline) space
      choice [void $ char ';', void $ some eol] -- Either a single ';' or many newlines
      hspace
    parseElement :: Parser Element
    parseElement = choice [parseLineElement, parseStart, parseCell]
      where
        parseStart = do
          void $ string "Start"
          space1
          (x, y) <- parseCoord
          return $ Start x y
        parseCell = do
          void $ string "Cell"
          space1
          (x, y) <- parseCoord
          space1
          symbol <- parseSymbol
          return $ Cell x y symbol
        parseCoord = do
          x <- decimal
          space1
          y <- decimal
          return (x, y)
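
Continuing the same ghci session, a one-line fragment of the running example parses as expected:

ghci> runMyParser parseElements "HLine 0 0 8; Start 2 1"
Right [HorizontalLine 0 0 8,Start 2 1]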

Conclusion

We’ve presented how to parse simple constructs using megaparsec and how to run our parsers. This blog post is less than 1500 words long: mission accomplished presenting megaparsec in a shorter way than the official tutorial 🥳

If you want to use the code from this blog post as a starting point, feel free to clone https://github.com/tweag/minimal-megaparsec-tutorial. And once your project is moving away from a minimal viable product, head over to megaparsec’s official tutorial to learn about more advanced ways to use megaparsec!


  1. This is an instance of the monad transformer pattern.↩

April 24, 2025 12:00 AM

Chris Penner

3 things other languages should steal from Unison

New languages are coming out all the time: some experimental, some industrial, others purpose-built for a specific domain. No single language has the people-power or scope to try every cool new feature, so a critical step in designing a new language is to observe how experimental features have played out in practice.

As the saying goes, good [language designers] copy, great [language designers] steal.

If you’ve heard anything about the Unison Language, it’s not a surprise to you that it innovates in many areas. Unison very much tries to reinvent Human-Compiler interactions for the 21st century, and in that pursuit has spawned a fully integrated ecosystem spanning the compiler, codebase-manager, language server, version control, and package manager.

Some of these features are still too new to have proven their worth (we have our fingers crossed), but there are aspects that I think new languages should certainly consider as part of their designs.

A Fully Interactive and Incremental Compiler

With the modern era of language servers and programming assistants, developers greatly benefit from instant feedback on their work. With traditional batch compilers it's all too tempting to go for a coffee, or a walk, or a YouTube binge every time you kick off a big build. The context-switching induced by switching tasks while compiling wastes developer time by paging things in and out of their working memory, not to mention: it just feels bad. After the build finishes, the developer is left with a giant wall of text, sentenced to dig through a large list of compiler errors trying to find some root-cause error in the file they're working on.

Unison has a fully interactive compilation experience. The language-server is typechecking your scratch-file on every keystroke providing error feedback right in your editor, and offering helpful information via hover-hints which use your codebase and typechecking info to help you orient yourself. It can even partially typecheck the file to suggest which types or operators you may want to fill into a given slot.

Once you're happy with a chunk of code, you can check it in to the codebase and it won't be compiled again unless you want to change it, or an update is automatically propagated into it from a downstream change.

While most languages won't adopt Unison's scratch-file and codebase model; having an interactive compiler with good support for caching of already-compiled-assets is a huge boon to productivity in any language.

On the topic of the language server, Unison's language server is built directly into the compiler. This ensures we avoid the awkward disagreements between the LSP and compiler that sometimes happen in other languages. It can also help to avoid duplicate work, many languages are running the compiler independently and in their LSP at the same time without sharing any of the work between them, causing redundant work and a waste of precious resources.

Codebase API

It’s the compiler’s job to understand your code intimately. It knows exactly how every definition is linked together, even if you don’t! In many languages it can be frustrating to know that this information exists deep within the compiler, yet to have no access to it yourself!

Unison stores all your code as structured data within your codebase and exposes the ability for you to ask it useful questions about your code, exposing that precious understanding to you as a developer.

Unison allows searching by type, finding the dependencies of a definition, or inverting that relationship to find all definitions which depend on a definition.

Via the UCM CLI you can use utilities like text.find to search only string constants, or find to search only definition names.

Some codebase data is provided via an API which is exposed from the interactive UCM compiler, allowing developers to write tooling to customize their workflow. For example, check out this VS Code plugin someone wrote to view codebase definitions in the sidebar. In other languages you'd typically need to write a scrappy Regex or re-compile the code in a subprocess in order to achieve something similar.

It doesn’t have to be an API; it could be a parquet file or a SQLite database or any number of things. The important part is that a language exposes its one-true-source of information about the codebase in some structured format for third-party tools to build upon.

Smart docs

It doesn't matter how great your language's package ecosystem is if nobody can figure out how to use it! Documentation is critical for helping end users understand and use functionality in your language, but it has a fatal flaw: documentation isn't compiled and falls out of date with the code.

In Unison, docs are a data-type within the language itself. This means that docs can be generated dynamically by running Unison code! We’ve leveraged this ability to enable embedding typechecked, runnable code examples into your docs. These examples are compiled alongside the rest of your program, so they’re guaranteed to be kept up to date, and the output from your example code is re-run and updated whenever the source definitions change.

You can also write code which generates documentation based on your real application code. For example, you could write code which crawls your web-server's implementation and collects all the routes and parameters the server defines and displays them nicely as documentation.

Unison goes one step further here by providing special support for the documentation format on Unison Share, ensuring any definitions mentioned in docs and code examples are hyper-linked to make for a seamless package-browsing experience.

As an example of how far this can go, check out this awesome project by community contributor Alvaro which generates mermaid graphs in the docs representing the behaviour of simulations. The graphs are generated from the same underlying library code so they won't go out of date.

Get stealing

This subset of topics doesn't touch on Unison's ability system, continuation capturing, or code serialization so I'll probably need at least a part 2!

Hopefully you learned something 🤞! Did you know I’m currently writing a book? It’s all about Lenses and Optics! It takes you all the way from beginner to optics-wizard and it’s currently in early access! Consider supporting it, and more posts like this one, by pledging on my Patreon page! It takes quite a bit of work to put these things together; if I managed to teach you something or even just entertain you for a minute or two, maybe send a few bucks my way for a coffee? Cheers!

Become a Patron!

April 24, 2025 12:00 AM

April 23, 2025

Haskell Interlude

64: Sandy Maguire

In this episode, Mike Sperber and Niki Vazou talk with Sandy Maguire, lead compiler engineer at Manifold Valley. They talk about the benefits of using Haskell, of course, but also about all the books Sandy has written, about effects and the problem with monads, and about combinator libraries and programming with laws.

by Haskell Podcast at April 23, 2025 10:00 PM

April 21, 2025

Monday Morning Haskell

Solve.hs Module 4 Now Available!

Back in 2023, I introduced Solve.hs, my newest course focused on problem solving in Haskell. This course was inspired by my experiences solving programming puzzles with Haskell, especially by the feeling of how different it was compared to other languages.

Solve.hs will teach you all the core knowledge you need around data structures and algorithms to tackle these kinds of puzzles (which often appear as interview questions), as well as the mindset shifts you have to make when solving them in Haskell.

In 2023, I released the first two modules, which focused on data structures, with a special emphasis on how Haskell uses linked lists. These also explored the patterns that replace ‘for’ and ‘while’ loops from other languages.

Then in 2024 I released module 3, which explained all of the most essential algorithms in great detail, and showed how we have to implement them differently in Haskell.

Finally, today, I am releasing the fourth and final module for this course! This module explains parsing in great detail. You’ll learn:

  1. Basic string manipulation techniques for simple parsing
  2. How to use libraries to parse common data formats (e.g. JSON)
  3. How to use the Megaparsec library to parse any other kind of structured data
  4. How to write your own monadic parser
  5. How to use regular expressions for parsing in Haskell

These skills can be important in puzzle solving challenges where your input is just a string. But they’re also applicable in a wide variety of “real world” projects!

For the next 2 weeks, you can get Solve.hs for 20% off with the code SOLVE25. You can also get an extra 10% discount by subscribing to our newsletter!

After these 2 weeks are up, you’ll not only lose the discount, but the price of the course will go up to reflect the added material from module 4. This course will never be cheaper, so grab it now by going to the course page!

by James Bowen at April 21, 2025 08:30 AM

April 18, 2025

GHC Developer Blog

GHC 9.10.2-rc1 is now available

bgamari - 2025-04-18

The GHC developers are very pleased to announce the availability of the release candidate for GHC 9.10.2. Binary distributions, source distributions, and documentation are available at downloads.haskell.org and via GHCup.

GHC 9.10.2 is a bug-fix release fixing over 50 issues of a variety of severities and scopes. A full accounting of these fixes can be found in the release notes. As always, GHC’s release status, including planned future releases, can be found on the GHC Wiki’s status page.

This release candidate will have a two-week testing period. If all goes well the final release will be available the week of 1 May 2025.

We would like to thank Well-Typed, Tweag I/O, Juspay, QBayLogic, Channable, Serokell, SimSpace, the Haskell Foundation, and other anonymous contributors whose ongoing financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprises this release.

As always, do give this release a try and open a ticket if you see anything amiss.

by ghc-devs at April 18, 2025 12:00 AM

April 17, 2025

Tweag I/O

Frontend live-coding via ghci

A few months ago, I announced that the GHC wasm backend added support for Template Haskell and ghci. Initially, the ghci feature only supported running code in nodejs and accessing the nodejs context, and I’ve been asked a few times when ghci was going to work in browsers in order to allow live-coding the frontend. Sure, why not? I promised it in the last blog post’s wishlist. After all, GHCJS used to support GHCJSi for browsers almost 10 years ago!

I was confident this could be done with moderate effort. Almost all the pieces are already in place: the external interpreter logic in GHC is there, and the wasm dynamic linker already works in nodejs. So just make it runnable in browsers as well, add a bit of logic for communicating with GHC, and we’re done, right? Well, it still took a few months for me to land it… but finally here it is!

To keep this post within reasonable length, I will only introduce the user-facing aspects of the wasm ghci browser mode and won’t cover the underlying implementation. The rest of the post is an example ghci session followed by a series of bite sized subsections, each covering one important tip about using this feature.

How to use it

The ghc-wasm-meta repo provides user-facing installation methods for the GHC wasm backend. Here we’ll go with the simplest nix-based approach:

$ nix shell 'gitlab:haskell-wasm/ghc-wasm-meta?host=gitlab.haskell.org'
$ wasm32-wasi-ghc --interactive -fghci-browser
GHCi, version 9.12.2.20250327: https://www.haskell.org/ghc/  :? for help
Open http://127.0.0.1:38827/main.html or import http://127.0.0.1:38827/main.js to boot ghci

The -fghci-browser flag enables the browser mode. There are a couple of other related flags which you can read about in the user manual, but for now, let’s open that page to proceed. You’ll see a blank page, but you can press F12 to open the devtools panel and check the network monitor tab to see that it’s sending a lot of requests and downloading a bunch of wasm modules. Within a few seconds, the initial loading process should be complete, and the ghci prompt should appear in the terminal and accept user commands.

Let’s start with the simplest:

ghci> putStrLn "hello firefox"
ghci>

The message is printed in the browser’s devtools console. That’s not impressive, so let’s try something that only works in a browser:

ghci> import GHC.Wasm.Prim
ghci> newtype JSButton = JSButton JSVal
ghci> foreign import javascript unsafe "document.createElement('button')" js_button_create :: IO JSButton
ghci> foreign import javascript unsafe "document.body.appendChild($1)" js_button_setup :: JSButton -> IO ()
ghci> btn <- js_button_create
ghci> js_button_setup btn

A fresh button just appeared on the page! It wouldn’t be useful if clicking it does nothing, so:

ghci> newtype Callback t = Callback JSVal
ghci> foreign import javascript "wrapper sync" syncCallback :: IO () -> IO (Callback (IO ()))
ghci> foreign import javascript unsafe "$1.addEventListener('click', $2)" js_button_on_click :: JSButton -> Callback (IO ()) -> IO ()

The above code implements logic to export a Haskell IO () function as a JavaScript synchronous callback that can be attached as a button’s click event listener. Synchronous callbacks always attempt to run Haskell computations to completion, which works fine as long as the exported Haskell function’s main thread does not block indefinitely, like waiting for an async JSFFI import to resolve or be rejected. You can read more about JSFFI in the user manual, but let’s carry on with this example:

ghci> import Data.IORef
ghci> ref <- newIORef 0
ghci> :{
ghci| cb <- syncCallback $ do
ghci|   print =<< readIORef ref
ghci|   modifyIORef' ref succ
ghci| :}
ghci> js_button_on_click btn cb

Now, the button is attached to a simple counter in Haskell that prints an incrementing integer to the console each time the button is clicked. And that should be sufficient for a minimal demo! Now, there are still a couple of important tips to be mentioned before we wrap up this post:

Hot reloading

Just like native ghci, you can perform hot reloading:

ghci> :r
Ok, no modules to be reloaded.
ghci> btn
<interactive>:15:1: error: [GHC-88464]
    Variable not in scope: btn

Reloading nukes all bindings in the current scope. But it doesn’t magically undo all the side effects we’ve performed so far: if you click on the button now, you’ll notice the counter is still working and the exported Haskell function is still retained by the JavaScript side! And this behavior is also consistent with native ghci: hot-reloading does not actually wipe the Haskell heap, and there exist tricks like foreign-store to persist values across ghci reloads.

For the wasm ghci, things like foreign-store should work, but you can also simply allocate a stable pointer and print it, then reconstruct the stable pointer and dereference it after a future reload. Since wasm ghci runs in a JavaScript runtime after all, you can also cook up your own global variable by assigning to globalThis. Or you can locate the button element and fetch its event handler: it’s the same Haskell callback exported earlier, and it can be freed by freeJSVal.
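
For instance, the stable pointer trick goes something like this (the printed address is made up here, and the annotation on the last line is needed because nothing else pins down the pointee’s type):

ghci> import Foreign.StablePtr
ghci> import Foreign.Ptr
ghci> sp <- newStablePtr ref
ghci> print (castStablePtrToPtr sp)
0x00012345
ghci> :r
Ok, no modules to be reloaded.
ghci> ref <- deRefStablePtr (castPtrToStablePtr (wordPtrToPtr 0x12345)) :: IO (IORef Integer)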

So, when you do live-coding that involves some non-trivial back and forth calling between JavaScript and Haskell, don’t forget that hot reloads don’t kill old code, and you need to implement your own logic to disable earlier callbacks to prevent inconsistent behavior.

Loading object code

The wasm ghci supports loading GHC bytecode and object code. All the code you type into the interactive session is compiled to bytecode. The code that you put in a .hs source file and load via command line or :l commands can be compiled as object code if you pass -fobject-code to ghci.

I fixed the ghci debugger for all 32-bit cross targets since the last blog post. Just like native ghci, debugger features like breakpoints now work for bytecode. If you don’t use the ghci debugger, it’s recommended that you use -fobject-code to load Haskell modules, since object code is faster and more robust at run-time.

Interrupting via ^C

My GHC patch that landed the ghci browser mode also fixed a previous bug in wasm ghci: ^C was not handled at all and would kill the ghci session. Now, the behavior should be consistent with native ghci. With or without -fghci-browser, if you’re running a long computation and you press ^C, an async exception should interrupt the computation and unblock the ghci prompt.

Read the :doc, Luke

Among the many changes I landed in GHC since last blog post, one of them is adding proper haddock documentation to all user-facing things exported by GHC.Wasm.Prim. Apart from the GHC user manual, the haddock documentation is also worth reading for users. I haven’t set up a static site to serve the haddock pages yet, but they are already accessible in ghci via the :doc command. Just try import GHC.Wasm.Prim and check :doc JSVal or :doc freeJSVal, then you can read them in plain text.

As the Haskell wasm user community grows, so will the frustration with the lack of proper documentation. I’m slowly improving that. What you see in :doc will continue to be polished, same for the user manual.

Importing an npm library in ghci

You can use JavaScript’s dynamic import() function as an async JSFFI import. If you want to import an npm library in a ghci session, the simplest approach is using a service like esm.run which serves pre-bundled npm libraries as ES modules over a CDN.
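
For example, something along these lines should work straight from the prompt (hypothetical: it assumes the esm.run bundle for the canvas-confetti package exposes its usual callable default export):

ghci> foreign import javascript safe "const m = await import('https://esm.run/canvas-confetti'); m.default()" js_confetti :: IO ()
ghci> js_confetti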

If you have a local npm project and want to use the code there, you need to do your own bundling and start your own development server that serves a page to make that code somehow accessible (e.g. via globalThis bindings). But how does that interact with the wasm ghci? Read on.

Using ghci to debug other websites

The browser mode works by starting a local HTTP server that serves some requests to be made from the browser side. For convenience, that HTTP server accepts CORS requests from any origin, which means it’s possible to inject the main.js startup script into browser tabs of other websites and use the wasm ghci session to debug those websites! Once you fire up a ghci session, just open the devtools console of another website and drop an import("http://127.0.0.1:38827/main.js") call. If that website doesn’t actively block third-party scripts, you can have more fun than running in the default blank page.

All JavaScript code for the GHC wasm backend consists of proper ES modules that don’t pollute the globalThis namespace. This principle has been enforced since day one, which allows multiple Haskell wasm modules or even wasm ghci sessions to co-exist in the same page! It works fine as long as you respect their boundaries and don’t attempt to do things like freeing a JSVal allocated elsewhere, but even if you only have one wasm module or ghci session, the “no global variable” principle should also minimize the interference with the original page.

In my opinion, being able to interact with other websites is the most exciting aspect of the browser mode. Sure, for Haskell developers who want to experiment with frontend development, using ghci should already be much easier than setting up a playground project and manually handling linker flags, wrapper scripts, etc. But there’s even greater potential: who said the website itself needs to be developed in Haskell? Haskell can be used to test websites written in foreign tech stacks, and testing backed by an advanced type system is undoubtedly one of our core strengths! You can use libraries like quickcheck-state-machine or quickcheck-dynamic to perform state machine property testing interactively, which has much greater potential for finding bugs than a few hard-coded interactions in JavaScript.

No host file system in wasm

The default nodejs mode of wasm ghci has full access to the host file system, so you can use Haskell APIs like readFile to operate on any host file path. This is no longer the case for browser mode: the only handles available are stdout/stderr, which output to the devtools console in a line-buffered manner, and there’s no file to read/write in wasm otherwise. The same restriction also applies to Template Haskell splices evaluated in a browser mode ghci session, so splices like $(embedFile ...) will fail.

This is a deliberate design choice. The dev environment backed by ghci browser mode should be as close as possible to the production environment used by statically linked wasm modules, and the production environment won’t have access to the host file system either. It would be possible to add extra plumbing to expose the host file system to ghci browser mode, but that is quite a bit of extra work and also makes the dev environment less realistic, so I’d like to keep the current design for a while.

If you need to read a local asset, you can serve the asset via another local HTTP server and fetch it in ghci. If you have modules that use splices like embedFile, those modules should be pre-compiled to object code and loaded later in ghci.
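
Here is a minimal sketch of the first option, assuming an asset served at some URL of your own; $1 is the JSFFI placeholder for the first argument, and JSString comes from GHC.Wasm.Prim:

-- Sketch: fetch a text asset from a URL served by your own local
-- HTTP server; the safe import is async, so await works here too.
foreign import javascript safe "const r = await fetch($1); return await r.text()"
  js_fetchText :: JSString -> IO JSString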

Don’t press F5

It’s very important that the browser page is never refreshed. The lifetime of the browser tab is supposed to be tied to the ghci session: just exit ghci and close the tab when you’re done. Refreshing the page would completely break ghci! A lot of shared state between the browser side and the host side is required to make it work, and refreshing wipes the browser side of that state.

Likewise, the browser mode currently can’t recover from network glitches. That shouldn’t be a concern when you run GHC and the browser on the same machine, but if you use SSH port forwarding or tailscale to establish the GHC/browser connection over an unstable network, once the WebSocket connection is broken, the game is over.

This is not ideal for sure, but supporting auto-recovery upon network issues or even page reloads is incredibly challenging, so let’s live with what is supported for now.

Doesn’t work on Safari yet

Currently the browser mode works fine for Firefox/Chrome, including desktop/mobile versions and all the forks with different logos and names. Sadly, Safari users are quite likely to see spurious crashes with a call_indirect to a null table entry error in the console. Rest assured, normal statically-linked Haskell wasm modules still work fine in Safari.

This is not my fault, but WebKit’s! I’ve filed a WebKit bug, and if we’re lucky, it may be looked into on their side and eventually fixed. If not, or if many people complain loudly, I can implement a workaround that seems to mitigate the WebKit bug and make the browser mode work in Safari too. That would be an extra maintenance burden, so for now, if you’re on macOS, your best bet is installing Firefox or Chrome and using that for ghci.

Huge libraries don’t work yet

How large is “huge”? Well, you can check the source code of V8, SpiderMonkey and JavaScriptCore. In brief: there are limits agreed upon among the major browser engines that restrict a wasm module’s number of imports/exports, etc., and we do run into those limits occasionally when a Haskell library is huge. For instance, the monolithic ghc library exceeds the limit, and so does the profiling way of ghc-internal. So cost-center profiling doesn’t work in the ghci browser mode yet, though it does work for statically linked wasm modules and ghci nodejs mode.

Unfortunately, this issue is definitely not low-hanging fruit, even for me. I maintain a nodejs fork that patches the V8 limits so that the Template Haskell runner still works for huge libraries, but I can’t do the same for browsers. A fundamental fix that sidesteps the browser limits would be a huge amount of work, so I’ll be prioritizing other work first. If you need to load a huge library in the browser, you may need to split it into cabal sublibraries.

Wishlist, as usual

My past blog posts usually end with a “what comes next” section, and this one is no exception. The browser mode is in its early days, so it’s natural to find bugs and other rough edges, and there will be continuous improvement in the coming months. Another thing worth looking into is profiling: modern browsers have powerful profilers, and it would be nice to integrate our own profiling and eventlog mechanisms with the browser devtools to improve the developer experience.

The next big thing I’ll be working on is threaded RTS support. Currently, all Haskell wasm modules are single-threaded and run in the browser main thread, but some workloads can benefit from multiple CPU cores. Once this is delivered, Haskell will also become the first functional language with multi-core support in wasm!

You’re welcome to join the Haskell wasm Matrix room to chat about the GHC wasm backend and get my quick updates on this project.

April 17, 2025 12:00 AM

April 16, 2025

Well-Typed.Com

The Haskell Unfolder Episode 42: logic programming with typedKanren

Today, 2025-04-16, at 1830 UTC (11:30 am PDT, 2:30 pm EDT, 7:30 pm GMT, 20:30 CET, …) we are streaming the 42nd episode of the Haskell Unfolder live on YouTube.

The Haskell Unfolder Episode 42: logic programming with typedKanren

Functional programming is programming with mathematical functions, mapping inputs to outputs. By contrast, logic programming—perhaps best known from the language Prolog—is programming with mathematical relations between values, without making a distinction between inputs and outputs. In this two-year anniversary episode of the Haskell Unfolder we take a look at typedKanren, an embedding of the logic programming language miniKanren in Haskell. We will see how we can use it to write a type checker for a simple functional language in a few lines of code.

About the Haskell Unfolder

The Haskell Unfolder is a YouTube series about all things Haskell hosted by Edsko de Vries and Andres Löh, with episodes appearing approximately every two weeks. All episodes are live-streamed, and we try to respond to audience questions. All episodes are also available as recordings afterwards.

We have a GitHub repository with code samples from the episodes.

And we have a public Google calendar (also available as ICal) listing the planned schedule.

There’s now also a web shop where you can buy t-shirts and mugs (and potentially in the future other items) with the Haskell Unfolder logo.

by andres, edsko at April 16, 2025 12:00 AM

April 12, 2025

Derek Elkins

What difference lists actually are

Introduction

Purely functional list concatenation, xs ++ ys in Haskell syntax, is well known to be linear time in the length of the first input and constant time in the length of the second, i.e. xs ++ ys is O(length xs). This leads to quadratic complexity if we have a bunch of left associated uses of concatenation.

The ancient trick to resolve this is, instead of producing lists, to produce list-to-list functions à la [a] -> [a] or ShowS = String -> String = [Char] -> [Char]. “Concatenation” of “lists” represented this way is just function composition, which is a constant-time operation. We can lift a list xs to this representation via the section (xs ++). Applying the resulting function will still take O(length xs) work, but a composition of such functions applied to a list always results in a fully right-associated expression, even if the function compositions aren’t right-associated.
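
As a minimal sketch of the technique in Haskell (the DList name follows the dlist package on Hackage; the helper names here are mine):

type DList a = [a] -> [a]

fromList :: [a] -> DList a
fromList = (++)

toList :: DList a -> [a]
toList f = f []

-- “Concatenation” is just (.), a constant-time operation:
example :: [Int]
example = toList (fromList [1, 2] . fromList [3, 4])  -- [1,2,3,4]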

In the last several years, it has become popular to refer to this technique as “difference lists”. Often no justification is given for this name. When it is given, it is usually a reference to the idea of difference lists in logic programming. Unfortunately, other than both techniques giving rise to efficient concatenation, they have almost no similarities.

Functional Lists

To start, I want to do a deeper analysis of the “functional lists” approach, because I think what it is doing is a bit misunderstood and, consequently, oversold1. Let’s see how we would model this approach in an OO language without higher-order functions, such as early Java. I’ll use strings for simplicity, but it would be exactly the same for generic lists.

interface PrependTo {
  String prependTo(String end);
}

class Compose implements PrependTo {
  private PrependTo left;
  private PrependTo right;
  public Compose(PrependTo left, PrependTo right) {
    this.left = left; this.right = right;
  }
  public String prependTo(String end) {
    return this.left.prependTo(this.right.prependTo(end));
  }
}

class Prepend implements PrependTo {
  private String s;
  public Prepend(String s) { this.s = s; }
  public String prependTo(String end) {
    return this.s + end;
  }
}

This is just a straight, manual implementation of closures for (.) and (++) (specialized to strings). Other lambdas not of the above two forms would lead to other implementations of PrependTo. Let’s say, however, that these are the only two forms that actually occur, which is mostly true in Haskell practice. Then another view on this OO code (to escape back to FP) is that it is an OOP encoding of the algebraic data type:

data PrependTo = Compose PrependTo PrependTo | Prepend String

prependTo :: PrependTo -> String -> String
prependTo (Compose left right) end = prependTo left (prependTo right end)
prependTo (Prepend s) end = s ++ end

We could have also arrived at this by defunctionalizing a typical example of the technique. Modulo some very minor details (that could be resolved by using the Church-encoded version of this), this does accurately reflect what’s going on in the technique. Compose is clearly constant time. Less obviously, applying these functional lists requires traversing this tree of closures – made into an explicit tree here. In fact, this reveals that this representation could require arbitrarily large amounts of work for a given size of output. This is due to the fact that prepending an empty string doesn’t increase the output size but still increases the size of the tree. In practice, it’s a safe assumption that, on average, at least one character will be prepended per leaf of the tree which makes the overhead proportional to the size of the output.

This tree representation is arguably better than the “functional list” representation. It’s less flexible for producers, but that’s arguably a good thing because we didn’t really want arbitrary String -> String functions. It’s more flexible for consumers. For example, getting the head of the list is a relatively efficient operation compared to applying a “functional list” and taking the head of the result even in an eager language. (Laziness makes both approaches comparably efficient.) Getting the last element is just the same for the tree version, but, even with laziness, is much worse for the functional version. More to the point, this concrete representation allows the concatenation function to avoid adding empty nodes to the tree whereas (.) can’t pattern match on whether a function is the identity function or not.
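
To make “more flexible for consumers” concrete, here is a short Haskell sketch over the PrependTo type above; headPT and the smart constructor compose are names introduced here, not part of the original code:

-- Head of the prepended prefix, if any, without materializing the
-- whole string; empty leaves are skipped cheaply.
headPT :: PrependTo -> Maybe Char
headPT (Prepend [])      = Nothing
headPT (Prepend (c : _)) = Just c
headPT (Compose l r)     = case headPT l of
  Just c  -> Just c
  Nothing -> headPT r

-- Concatenation that avoids adding empty nodes, something (.) cannot
-- do for opaque functions.
compose :: PrependTo -> PrependTo -> PrependTo
compose (Prepend []) r = r
compose l (Prepend []) = l
compose l r            = Compose l r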

This view makes it very clear what the functional version is doing.

Difference Lists in Prolog

List append is the archetypal example of a Prolog program due to the novelty of its “invertibility”.

append([], Ys, Ys).
append([X|Xs], Ys, [X|Zs]) :- append(Xs, Ys, Zs).

For our purposes, viewing this as a function of the first two arguments, this is exactly the usual functional implementation of list concatenation with exactly the same problems. We could, of course, encode the defunctionalized version of the functional approach into (pure) Prolog. This would produce:

prepend_to(compose(Xs, Ys), End, Zs) :- prepend_to(Ys, End, End2), prepend_to(Xs, End2, Zs).
prepend_to(prepend(Xs), End, Zs) :- append(Xs, End, Zs).

(I’ll be ignoring the issues that arise due to Prolog’s untyped nature.)

However, this being a logic programming language means we have additional tools available to use that functional languages lack. Namely, unification variables. For an imperative (destructive) implementation of list concatenation, the way we’d support efficient append of linked lists is we’d keep pointers to the start and end of the list. To append two lists, we’d simply use the end pointer of the first to update the end of the first list to point at the start of the second. We’d then return a pair consisting of the start pointer of the first and the end pointer of the second.

This is exactly how Prolog difference lists work, except that instead of pointers, we use unification variables, which are more principled. Concretely, we represent a list as a pair of lists, but the second list will be represented by an unbound unification variable, and the first list contains that same unification variable as a suffix. This pair is often represented using the infix operator - (a “functor” in Prolog terminology), e.g. Xs - Ys. We could use diff(Xs, Ys) or some other name; - isn’t a built-in operator, it’s essentially just a binary constructor.

At the level of logic, there are no unification variables. The constraints above mean that Xs - Ys is a list Xs which contains Ys as a suffix.

The name “difference list” is arguably motivated by the definition of concatenation in this representation.

concat(Xs - Ys, Ys - Zs, Xs - Zs).

This looks a lot like \(Xs - Ys + Ys - Zs = Xs - Zs\). If the suffix component of the first argument is unbound, like it’s supposed to be, then this is a constant-time operation of binding that component to Ys. If it is bound, then we need to unify, which, in the worst case, is O(length Ys) where the length is up to either nil or an unbound variable tail2.

We also have the unit of concat, i.e. the empty list via3:

empty(Xs - Xs).

See the footnote, but this does in some way identify Xs - Ys with the “difference” of Xs and Ys.

We get back to a “normal” list via:

to_list(Xs - [], Xs).

% or more generally,
prepend_to(Xs - Ys, Ys, Xs).

to_list is a constant-time operation, no matter what. Note, to_list binds the suffix component of the difference list. This means that the first input no longer meets our condition to be a difference list. In other words, to_list (and prepend_to) consumes the difference list. More precisely, it constrains the possible suffixes the list could be. Indeed, any operation that binds the suffix component of a difference list consumes it. For example, concat consumes its first argument.

Of course, it still makes logical sense to work with the difference list when its suffix component is bound, it’s just that its operational interpretation is different. More to the point, given a difference list, you cannot prepend it (via prepend_to or concat) to two different lists to get two different results.

Converting from a list does require traversing the list since we need to replace the nil node, i.e. [], with a fresh unbound variable. Luckily, this is exactly what append does.

from_list(Xs, Ys - Zs) :- append(Xs, Zs, Ys).

from_list also suggests this “difference list” idea. If all of Xs, Ys, and Zs are ground terms, then from_list(Xs, Ys - Zs) holds exactly when append(Xs, Zs, Ys) holds, provided our invariant is maintained, i.e. that Zs is a suffix of Ys. Writing these relations more functionally and writing append as addition, we’d have:

\[\mathtt{from\_list}(Xs) = Ys - Zs \iff Xs + Zs = Ys\]

If we did want to “duplicate” a difference list, we’d essentially need to convert it to a (normal) list with to_list, and then we could use from_list multiple times on that result. This would, of course, still consume the original difference list. We’d also be paying O(length Xs) for every duplicate, including to replace the one we just consumed4.

That said, we can prepend a list to a difference list without consuming it. We can perform other actions at the risk of (partially) consuming the list, e.g. indexing into the list. Indexing forces the list to be at least a certain length, but still allows prepending it to any list that will result in a final list at least that long.

Comparison

I’ll start the comparison with a massive discrepancy that we will ignore going forward. Nothing enforces that a value of type ShowS actually just appends something to its input. We could use abstract data type techniques or the defunctionalized version to avoid this. To be fair, difference lists also need an abstraction barrier to ensure their invariants, though their failure modes are different. A difference list can’t change what it is based on what it is prepended to.

Functional Representation | Difference Lists
--- | ---
constant-time concatenation | constant-time concatenation
constant-time conversion from a list (though you pay for it later) | O(n) conversion from a list
persistent | non-persistent, requires linear use
represented by a tree of closures | represented by a pair of a list and a unification variable
O(n) (or worse!) conversion to a list | constant-time conversion to a list
defunctionalized version can be implemented in pretty much any language | requires at least single-assignment variables
unclear connection to being the difference of two lists (which two lists?) | mathematical, if non-obvious, connection to being the difference of two (given) lists


As an illustration of the difference between persistent and non-persistent uses, the function:

double f = f . f

is a perfectly sensible function on ShowS values that behaves exactly as you’d expect. On the other hand:

double(In, Out) :- concat(In, In, Out).

is nonsense that will fail the occurs check (if it is enabled; otherwise it will create a cyclic list) except when In is the empty difference list.

Conclusion

I hope I’ve illustrated that the functional representation is not just not difference lists, but is, in fact, wildly different from difference lists.

This functional representation is enshrined in Haskell via the ShowS type and related functions, but I’d argue the concrete tree representation is actually clearer and better. The functional representation is more of a cute trick that allows us to reuse existing functions. Really, ShowS should have been an abstract type.

Difference lists are an interesting example of how imperative ideas can be incorporated into a declarative language. That said, difference lists come with some of the downsides of an imperative approach, namely the lack of persistence.

As far as I’m aware, there isn’t an unambiguous and widely accepted name for this functional representation. Calling it “functional lists” or something like that is, in my opinion, very ambiguous and potentially misleading. I think the lack of a good name for this is why “difference lists” started becoming popular. As I’ve argued, using “difference list” in this context is even more misleading and confusing.

If people really want a name, one option might be “delta list”. I don’t think this term is used. It keeps the intuitive idea that the functional representation represents some “change” to a list, a collection of deltas that will all be applied at once, but it doesn’t make any false reference to difference lists. I’m not super into this name; I just want something that isn’t “difference list” or otherwise misleading.


  1. To be clear, it’s still much, much better than using plain concatenation.

  2. Such a length relation couldn’t be written in pure Prolog but can in actual Prolog.

  3. For those algebraically minded, this almost makes concat and empty into another monoid, except concat is partial, but such a partial monoid is just a category! In other words, we have a category whose objects are lists and whose homsets are, at most, singletons containing Xs - Ys for Hom(Xs, Ys). If we maintain our invariant that we have Xs - Ys only when Ys is a suffix of Xs, this thin category is exactly the category corresponding to the reflexive, transitive “has suffix” relation. We could generalize this to any monoid via a “factors through” relation, i.e. \(\mathrm{Hom}(m, n)\) is inhabited if and only if \(\exists p.\, m = pn\), which you can easily prove is a reflexive, transitive relation given the monoid axioms. However, for a general monoid, we can have a (potentially) non-thin category by saying \(p \in \mathrm{Hom}(m, n)\) if and only if \(m = pn\). The category will be thin if and only if the monoid is cancellative. This is exactly the slice category of the monoid viewed as a one-object category.

  4. Again, in actual Prolog, we could make a duplicate without consuming the original, though it would still take O(length Xs) time using the notion of length mentioned before.

April 12, 2025 11:25 PM

April 11, 2025

Well-Typed.Com

Evaluating AI's Impact on Haskell Open Source Development

We have recently contributed to a research study investigating how AI can help with realistic software development tasks. METR initiated this study to measure how AI tools affect real-world software engineering productivity, particularly in substantial open-source projects, and to assess how experts can use AI tools to improve their workflows. It’s all well and good if the latest model can fix an artificial test case, but what’s more interesting is how AI can be driven by expert knowledge.

The study required us to work on about 20 small, everyday development tasks. These tasks were randomised; in some, we were allowed to use AI tools, and in others, we were not. Apart from that, we could solve them however we wanted. We compiled notes on how we approached using the tools and recorded our screens to provide a record of our experience.

Sam used the hours to work on GHC tickets, and I (Matt) used the hours to work on Cabal tickets. I have included a list of the issues we worked on at the end of the post. Sam focused on fixing small bugs, while I fixed all the known regressions in the cabal-install-3.14 release.

In this post, I’ll briefly discuss how we used the tools and what our overall experience was.

Haskell Programming with the help of AI

Before this study, neither of us was experienced with using AI tools to help with software development. I was impressed that the models could interact with Haskell code at all. At the start, it was quite overwhelming trying to understand what was available and what the trade-offs were between different tools. The AI landscape is changing rapidly at the moment; there is a new model and tool every week. Therefore, I won’t go into too much detail about what specific models or tools we used, but rather focus on our findings and experiences.

Development Environment

For the study, we were primarily using the following models and tools:

  • The text editor Cursor with AI autocomplete. Cursor is a fork of VSCode with AI related features. In the version of Cursor we used (0.45), there were two modes: the “chat” mode, which does not directly perform edits to your files, and the “compose” mode, which does.
  • From within Cursor, the LLM claude-3.7-sonnet-thinking for the “chat” and “compose” features.
  • The standard ChatGPT 4o model from the web interface.
  • Haskell Language Server

Using an editor with integrated LLM support, in particular one that supports Haskell Language Server, is key to getting the most out of the AI tools:

  • Within the editor, it means that LLMs have access to relevant context for the task. This includes any files we pass to the model as context, but also the rest of the codebase which can be searched by the LLM.
  • When an LLM suggests a change, it will receive feedback from HLS, which allows it to fix issues (e.g. fixing up missing imports, resolving typechecker errors, etc.). In practice, this made LLMs much more autonomous and reliable.

We didn’t use anything complicated or new, such as the Model Context Protocol or very advanced thinking models such as ChatGPT o1.

Armed with these tools, we were ready to set about our task.

Architectural Understanding Tasks

For the AI-enabled tasks, we were encouraged to use the AI as much as possible. Therefore, I typically started by just giving the AI a link to the GitHub issue and asking it to explain to me what to do. The summary was useful for checking that I had understood correctly, and hearing the problem phrased differently was a good sanity check before starting the issue.

Asking specific questions about the codebase had more mixed results. In general, the AI could usually give plausible answers to understanding tasks, but they were often wrong in some subtle way. It is also very suggestible and will tend to agree with whatever you state you think the solution is.

My impression for architectural understanding tasks was that you would have to provide a summary document as context in order to answer questions more accurately.

Technology Understanding Tasks

For tasks that required me to understand something new or unfamiliar, the AI was very good. In one issue, I had to investigate something wrong with the GitHub CI setup, which was an area I was quite clueless about. ChatGPT was able to suggest the probable cause of the issue with minimal prompting and just the issue description for context. That certainly saved a lot of time.

The ability to generate ad-hoc scripts for particular tasks was also very useful. I generated several useful single-use bash and python scripts for extracting specific pieces of information from the codebase. These scripts can also be used to generate information to feed back into the prompt, which can perform a useful feedback loop.

Code Generation Tasks

Once the AI demonstrated to me it understood the problem, I would ask it to generate a solution. The AI could generate plausible, syntactically correct code, but it was often the wrong idea. I think this was the biggest waste of time. Once a solution is generated, it was quite tempting to just “fix” the wrongness, but more often than not, the architecture or design was wrong. Many fixes in a codebase like Cabal require changing a few lines very precisely; that’s not something the AI is good at doing on its own at the moment.

On the other hand, if you are precise with your prompts and set the correct context, the AI can save a lot of time generating specific definitions for you. I would often use it to generate routine instances, simple definitions, or other well-defined generation tasks. It normally got these correct, which I was very surprised about.

Generating test cases was also a good use of the AI. It was able to handle generating the right structure for the custom Cabal test framework. These invariably required some tweaking, but getting all the right files in place made it a much simpler task.

The final approach to improve generation tasks is to first converse with the “chat” interface to clarify the problem, discuss different parts of the design, and point out any issues. Once this context is established, you can ask the “chat” window to first generate a prompt for the “compose” window. This prompt then gives precise instructions already to “compose,” but it can be edited further if something is not quite right.

Documentation Tasks

Opinion was split between us about how useful the LLMs were for documentation tasks.

I thought that this was a strong point in favour of using LLMs. Often when working on an issue, you end up having to explain the same thing several different times. First, you explain precisely to the machine what your intent is with the code you write. Then you explain to a developer in the comments and commit message. Finally, you explain to the user in the changelog and documentation. Each of these tasks requires modifying somewhere slightly different with a slightly different focus. I found that I was much more inclined to include all these different parts when using the AI since it could do a good job generating the necessary files without requiring too much further editing.

The code changes themselves, along with the context developed in “chat,” were normally enough to be able to generate the commit message, changelog, and documentation updates with very little effort.

On the other hand, the suggestions weren’t to Sam’s taste. He thought that the style generated for the commit messages was rambly and indirect. The model might focus on explaining a small detail rather than giving a bigger picture overview. For the more complicated code in GHC, the explanation of the code was a vague transcription rather than relaying any higher-level ideas the user might want to know.

He felt similarly when it came to note writing, a developer documentation artifact common in GHC development: the LLMs would “get stuck” explaining details of the code rather than the bigger picture. He did have some success in writing commit messages: the LLMs were good at summarising which functions and parts of the code were modified, which gave a good starting point for structuring the necessary explanations.

It’s interesting that we had different experiences in this area; perhaps it was due to the difference in codebases, or a difference in our styles of using the models. People often struggle to write commit messages or documentation, and I think using LLMs can reduce the barrier to entry in this area. A human-crafted commit message is often much better than one generated by a model, but I would much prefer a commit message generated by an LLM to none at all.

Verification Tasks

Another interesting use case is to use the AI to perform ad-hoc verification tasks. For example, I used the AI to check that all NFData instances had a certain structure. For this, I first worked with the AI to generate a script to extract all the code for NFData instances from the codebase. This required a small amount of debugging, but it would have taken me several hours to write the awk script myself due to unfamiliarity with the language. Once I had the script, I extracted all the NFData instances and asked ChatGPT to check that they all had the correct structure. The instance-by-instance summary allowed me to also quickly verify the AI’s answer. It resulted in spotting a few missed cases that were very hard to spot by eye.

LLMs were also useful in diagnosing failing test cases. For example, Sam implemented a change to GHC which led to a few dozen failing tests. After giving relevant context about the change, the LLM was able to categorise the failing tests:

  • Some test results only involved minor output changes or improvements in error messages; these could be accepted.
  • The LLM further categorised the serious test failures, e.g. “tests 1, 4 and 5 failed for one reason, while test 2 failed for another reason”.

This categorisation was useful for identifying potential issues with a change and quickly addressing them. It often happens in GHC development that a small change leads to hundreds of failing test cases, and it can be very time-consuming to go over all of them individually. Having an assistant that can quickly do a first pass at sorting the test failures is very helpful.

Of course, the answers given to you by an LLM must always be taken with suspicion. In situations where 95% confidence is good enough, or when it is quick and easy to check the correctness of an answer, they can be very useful.

Refactoring Tasks

Using an LLM can be helpful for refactoring tasks that are routine and well-defined. In our experience, however, they tend to struggle with larger tasks or those requiring nuance.

For instance, the LLMs performed well when adding a new error message to Cabal’s diagnostic infrastructure. This kind of task requires modifying quite a few different places in a routine manner. There is not much code to add, nothing to move around or delete. Similarly, for smaller tasks like lifting an expression to a top-level definition or adding debugging traces, the AI was able to do this with a high degree of confidence.

For repetitive refactoring tasks, such as renaming fields or parameters, the Cursor autocomplete is extremely useful. I could often just change the name of a field, navigate to the next type error, and the autocomplete would suggest the correct modification. I personally also found it useful that the AI liked to keep things consistent. Fields were named consistently, functions named consistently, etc. I find this task of making sure the whole API is consistent quite difficult to do manually.

Finally, I didn’t really try to use the AI for large refactoring tasks. There tended to be a lot of random or incomplete changes to the codebase, and subtle things would go wrong. Sam also reported that when working on some more routine refactorings, there might have been one or two places which required a decision to be made, and you could waste quite a lot of time if the LLM chose incorrectly. It would be useful if an LLM could indicate the places it modified with lower confidence.

Conclusion

Overall, I found the experience of using AI tools in my normal development workflow to be very useful, and I will continue to use them after the study. It’s clear to me that it is going to become essential to be familiar with these tools as a developer in the future.

Sam has a more negative outlook in comparison. While he found LLMs useful, he is concerned that increased use of LLMs will affect our shared ability to reason about our code. Usage of LLMs risks disincentivising thinking deeply about the design or architecture of our software, which then increases the burden placed on reviewers and risks the community losing their shared understanding of how the codebase is supposed to operate.

We appreciate METR’s support in conducting this research, which has helped us better understand both the potential and limitations of AI-assisted development in the Haskell ecosystem.

If your company is interested in funding open-source work then we offer Haskell Ecosystem Support Packages to provide commercial users with support from Well-Typed’s experts, while investing in the Haskell community and its technical ecosystem.

Issues Fixed

Sam’s GHC Issues

Sam focused on fixing small bugs in the typechecker, together with a couple of bugfixes related to LLVM code generation.

Issue Description MR
#24035 Address incorrect unused imports warning when using DuplicateRecordFields extension. !14066
#25778 Fix oversight in the implementation of NamedDefaults in the typechecker. !14075
#25529 Stop caching HasCallStack constraints to resolve an inconsistency in call stacks in GHC 9.8. !14084
#25807 Document defaulting in the user’s guide. !14057 (first commit)
#23388 Enhance documentation for the ExtendedDefaultRules and OverloadedStrings extensions. !14057 (second commit)
#25777 Document the change in defaulting semantics caused by NamedDefaults. !14072 + !14057 (third)
#25825 Improve defaulting of representational equalities. !14100
#24027 Address issues related to type data declarations and their import/export behaviour. !14119
#23982 Improve error messages for out-of-scope data constructors and types in terms. !14122 (second commit)
#22688 Improve error messages when an instance is declared for something that isn’t a class. !14105
#25770 Fix segmentation fault occurring in the LLVM backend, due to a bug in FP register padding. !14134
#25769 Identify and fix the AtomicFetch test failure with the LLVM backend. !14129
#25857 Stop emitting incorrect duplicate export warnings when using NamedDefaults. !14142
#25882 Ensure NamedDefaults properly handles poly-kinded classes such as Typeable. !14143
#25874 Refactor: remove GhcHint field from TcRnNotInScope error message constructor. !14150
#25848 Refactor: split up CtEvidence into CtGiven and CtWanted. !14139
#25877 Preserve user-written module qualification in error messages. !14122 (first commit)
#25204 Fix Windows binary distribution error when there are spaces in the installation directory path. !14137
#25881 Improve recompilation checking mechanisms when explicit import lists are used. !14178
#24090 Investigate bug with ill-scoped type synonyms, implement next step of a deprecation plan. !14171

Matt’s Cabal Issues

I focused on fixing regressions in cabal-3.14.1.0, updating commands to use the project infrastructure and fixing bugs in the multi-repl.

Issue Description PR
#2015 cabal repl does not reflect changes in the PATH environment variable, causing inconsistencies in environment-dependent behaviours. #10817
#10783 Add a test to verify the fix implemented by the contributor in #10783. #10783
#10295 Enhance cabal check to reject packages containing invalid file names, improving package validation. #10816
#10810 Allow compatibility with containers-0.8 by updating package constraints. #10814
#7504 Implement v2-gen-bounds. #10840
#10718 Create a reproducer for the issue where Cabal 3.14.1.0 invokes test binaries with a corrupt (duplicated) environment variable list, aiding in debugging and resolution. #10827
#10718 Cabal 3.14.1.0 invokes test binaries with a corrupt (duplicated) environment variable list, causing test execution failures. #10827
#10759 Write a reproducer for the issue where cabal-install 3.14 linking fails with “shared object file not found”. #10828
#10759 cabal-install 3.14 linking fails with “shared object file not found,” indicating issues in locating shared libraries during the linking process. #10828
#10717 Regression in Cabal-3.14.1.0: v1-test and Setup.hs test cause test suites of alex-3.4.0.1 and happy-1.20.1.1 to be unable to find data files. #10830
#10717 Develop a test for the regression where Cabal-3.14.1.0’s v1-test and Setup.hs test cause test suites of alex-3.4.0.1 and happy-1.20.1.1 to be unable to find data files. #10830
#10797 Ensure that C++ environment variables are correctly passed to configure scripts, facilitating proper configuration of packages requiring C++ settings. #10844
#8419 Continuous Integration (CI) using GitHub Actions produces error annotations despite tests passing, leading to misleading CI results. #10837
#7502 Implement per-version index cache in the ~/.cabal directory to improve performance and accuracy of package index operations. #10848
#10744 The fix-whitespace job produces overly verbose output, making it difficult to identify relevant information in logs. #71
#10726 The plan.json file generated by Cabal does not include the compiler ID, omitting crucial information about the compiler used in the build plan. #10845
#10775 Encountering “Error: Dependency on unbuildable library” when using three internal libraries with --enable-multi-repl, indicating issues in handling multiple internal dependencies. #10841
#10818 Unable to access __HADDOCK_VERSION__ macro during documentation generation, affecting conditional compilation based on Haddock version. #10851
#8283 Extend cabal outdated command to support multi-package projects, allowing users to check for outdated dependencies across all packages in a project. #10878
#10181 The cabal repl command does not support renaming of re-exported modules when loading multiple components. This limitation leads to failures when one component re-exports a module from another with a different name. #10880

by matthew, sam at April 11, 2025 12:00 AM

April 10, 2025

Tweag I/O

Practical recursion schemes in Rust: traversing and extending trees

Rust has always felt like a strange beast, culturally speaking. The community is made of a mix of people with very different perspectives, including anything from hardcore low-level kernel hackers to category-theorist and functional programming gurus. This is also what makes this community so fertile: whether you’re coming from C, Haskell or TypeScript, you’re likely to learn a lot from other perspectives.

I’d like to add my modest contribution by introducing a pattern coming from the functional programming world, recursion schemes1. Recursion schemes are a design pattern for representing and traversing recursive data structures (typically trees) which help factor the common part of recursive traversals, making transformations nicer to write, to read and to compose.

Even in the functional programming world, recursion schemes are not so well-known. Like monads, they are usually presented in Haskell with frightening words like zygohistomorphic prepromorphisms. It’s a pity, because recursion schemes can be simple, useful and practical. I’d even argue that in Rust, the most interesting part is perhaps the representation technique, more than the traversal, despite the latter being the original and usual motivation for using recursion schemes.

In this post, we’ll work through a concrete example to introduce recursion schemes and what they can do. We’ll point to a more real life example of how we use them in the implementation of the Nickel configuration language, and we’ll discuss the pros and cons of using recursion schemes in the particular context of Rust.

(In)flexible representations

Let’s say you’re writing a JSON parser library. You’ll need to expose a type representing JSON values. For the sake of argument, let’s assume that you support an extension of the JSON language with pairs, so you can write {"foo": ("hello","world")}. Here’s a natural representation:

pub enum JsonValue {
  String(String),
  Number(f64),
  Pair(Box<JsonValue>, Box<JsonValue>),
  Array(Vec<JsonValue>),
  Object(HashMap<String, JsonValue>),
}

This data structure is recursive: JSON values can contain other JSON values. We thus have to use Box (or some other indirection) around recursive occurrences of JsonValue; otherwise, this enum would have an infinite size (except for Array and Object, since Vec and HashMap add their own indirection, but that’s mostly luck).

Now, user requestor asks that your parser add location information to the output, because they validate some user-provided configuration and would like to point to specific items on error. This is a reasonable request which is sadly very hard to satisfy in the serde ecosystem. Anyway, our parser isn’t interfacing with serde, so we can add span information:

pub type Span = std::ops::Range<usize>;

pub struct Spanned<T> {
  pos: Span,
  data: T,
}

pub type SpannedValue = Spanned<JsonValue>;

pub enum JsonValue {
  String(String),
  Number(f64),
  Pair(Box<SpannedValue>, Box<SpannedValue>),
  Array(Vec<SpannedValue>),
  Object(HashMap<String, SpannedValue>),
}

You could go about this in different ways. We could have added a second argument to each constructor of the enum, such as in String(String, Span), to avoid the additional Spanned layer, but that would be a lot of repetition. We could also have moved the Box to data: Box<T>. Still, the general idea is that we now have two layers:

  • a struct layer gathering the JSON data and the span together;
  • the original enum layer, the core of JSON, which is almost unchanged.

So far, so good. But user conservator is now complaining that you’ve spoiled their performance. They’re using JSON as a machine exchange format and don’t care about position information. Could you restore the old representation and a way to produce it, ignoring spans?

Unfortunately, we had to change JsonValue. Copy-pasting the original JsonValue enum under a different name is possible, but it’s unsatisfying, as we now have multiple copies to maintain. It also doesn’t scale. Besides adding position information, you might want a value representation that uses Rc instead of Box, because you’re going to need to keep references to arbitrary nodes during some complex transformation.

The functorial representation

The recursion schemes pattern has two components: a representation technique and a transformation technique. I believe the representation part is particularly interesting for Rust, so let’s start with that.

We’ll try to make our JSON representation more generic to accommodate for the different variations that we mentioned in the previous section. The fundamental idea is to replace the recursive occurrences of JsonValue within itself, Box<JsonValue> (or JsonValue for Array and Object), by a generic parameter T. Doing so, we’re defining just one layer of a JSON tree where recursive children can be anything, not necessarily JSON values (we use the F suffix for that generic version because it’s technically a functor, but that doesn’t really matter).

pub enum JsonValueF<T> {
  String(String),
  Number(f64),
  Pair(T, T),
  Array(Vec<T>),
  Object(HashMap<String, T>),
}

Let’s play with a few examples to get familiar with this representation.

  • If we set T = (), we get a type that is isomorphic (modulo some ()) to:

    JsonValueF<()> ~ enum {
      String(String),
      Number(f64),
      Pair,
      Array,
      Object,
    }

    This is precisely a single node of a JSON tree, that is either a leaf or a marker of a node with children but without actually including them.

  • If we set T = Box<JsonValueF<T>>, we get back the original JsonValue. But wait, we can’t define the generic parameter T to be something which depends on T itself! In fact we can, but we need to introduce an extra indirection:

    pub struct JsonValue {data: JsonValueF<Box<JsonValue>>}

    The price to pay is an additional struct layer, so you need to match on value.data, and wrap new values as JsonValue { data: JsonValueF::Number(0.0) }. Note that this layer doesn’t have any cost at run-time.

    Another difference is that we now box the values in Array and Object, which isn’t needed. For now I’ll just ignore that, but you could take a second generic parameter U to represent the occurrences of T that don’t need an indirection if this really matters to you.

  • If we extend our intermediate layer a bit, we can get SpannedValue!

    pub struct SpannedJsonValue {
      data: JsonValueF<Box<SpannedJsonValue>>,
      span: Span,
    }

    You can create any extension of JsonValue with additional metadata lying at each node of the tree, which is pretty neat.

  • We are also able to change the ownership model of JSON values. It’s simple to write a reference-counted variant:

    pub struct SharedJsonValue {data: JsonValueF<Rc<SharedJsonValue>>}

    Or a borrowed version, that you could allocate in an arena:

    pub struct ArenaJsonValue<'a> {data: JsonValueF<&'a ArenaJsonValue>}

This idea of putting a self-referential type within JsonValueF is referred to as tying the knot. The power of this approach is that you can keep the core JsonValueF type unchanged. This applies to any tree-like recursive structure.

Some methods can be implemented only once on JsonValueF for any T, say is_string or is_number. With additional trait constraints on T, we can write more involved functions, still operating on the generic functor representation.

Let’s now see how to traverse our JSON values.

Traversals

The strong point of recursion schemes is to provide an interface for traversing recursive structures that let you focus on what the function actually does, which is otherwise mixed with how the recursion is done. The idea is to use generic combinators which factor out the plumbing of recursive traversals.

Let’s count the number of String nodes in a JSON value, the naive way.

fn count_strings(value: &JsonValue) -> u32 {
    match &value.data {
        JsonValueF::String(_) => 1,
        JsonValueF::Number(_) => 0,
        JsonValueF::Pair(fst, snd) => count_strings(fst) + count_strings(snd),
        JsonValueF::Array(array) => array.iter().map(|elt| count_strings(elt)).sum(),
        JsonValueF::Object(object) => object.values().map(|elt| count_strings(elt)).sum(),
    }
}

We’ll see how to write this function in the style of recursion schemes. First, we need to define one core combinator: map.

map takes a JsonValueF<T> and a function f from T to U, and returns a JsonValueF<U>. That is, map takes a JSON layer where all the direct children (the recursive occurrences in our full type) have some type T, and applies f to transform them into something of type U. This is the secret sauce for defining traversals.

(Figure: map applied to an array.)

impl<T> JsonValueF<T> {
    fn map<U>(self, mut f: impl FnMut(T) -> U) -> JsonValueF<U> {
        match self {
            JsonValueF::String(s) => JsonValueF::String(s),
            JsonValueF::Number(n) => JsonValueF::Number(n),
            JsonValueF::Pair(fst, snd) => JsonValueF::Pair(f(fst), f(snd)),
            JsonValueF::Array(array) => {
                JsonValueF::Array(array.into_iter().map(|elt| f(elt)).collect())
            }
            JsonValueF::Object(object) => {
                JsonValueF::Object(object.into_iter().map(|(k, v)| (k, f(v))).collect())
            }
        }
    }
}

map isn’t specific to JsonValueF. It can be defined mechanically for any functor representation (e.g. through a macro) of a data structure.

Note that there’s no recursion in sight: there can’t be, because T and U are entirely generic and could very well be (), and we saw that JsonValueF<()> is a single node. map only operates on the current layer.

The trick is that f can use map itself. Let’s see how to use it for count_strings:

fn count_strings(value: JsonValue) -> u32 {
    match value.data.map(|child| count_strings(*child)) {
        JsonValueF::String(_) => 1,
        JsonValueF::Number(_) => 0,
        JsonValueF::Pair(fst, snd) => fst + snd,
        JsonValueF::Array(array) => array.iter().sum(),
        JsonValueF::Object(object) => object.values().sum(),
    }
}

If you look closely, there’s no more recursion in the body of the pattern matching. It’s factored out in the map call. Let’s break down this example:

  1. map, given a function from T to U, promises that it can transform the direct children of type T in JsonValueF<T> to U, producing a JsonValueF<U>. We use it immediately with a recursive call to count_strings, which can indeed transform a direct child from a Box<JsonValue> to a u32. If the children have children themselves, count_strings handles them recursively as its first action, down to the leaves.
  2. Once we’ve reduced potential children of deeper layers to u32s, we get a JsonValueF<u32>. We sum its content at the current layer.

There is a catch though: our count_strings function takes an owned argument, which consumes the original JSON value. I’ll come back to that later.

While I find the second version of count_strings a little cleaner, the difference between the two isn’t really astonishing.

As a more compelling example, let’s define a generic bottom-up traversal function on JsonValue. This traversal is able to map — that is to rewrite — nodes (more exactly entire subtrees). map_bottom_up takes a generic transformation f and applies this function to every subtree starting from the leaves. You could use such a function to apply program transformations or optimizations on an abstract syntax tree.

impl JsonValue {
    pub fn map_bottom_up(self, f: &mut impl FnMut(JsonValue) -> JsonValue) -> JsonValue {
        let data = self.data.map(|v| Box::new(v.map_bottom_up(f)));
        f(JsonValue { data })
    }
}

This example is quite remarkable: it’s almost a one-liner and there is no pattern matching at all! Once again, the structural recursion is entirely factored out in the map function. We implemented map_bottom_up on JsonValue directly, but with some trait constraints on T, we can write a more generic version on JsonValueF that works on both the Boxed and Rced versions (the arena one is trickier, as it requires an explicit allocator). This example is only scratching the surface.

Mapping is just one example: other common traversals are folds (known as catamorphisms in the recursion schemes jargon), which generalize the well-known Iterator::fold from sequences to trees. In fact, count_strings would make more sense as a fold, but we’ll leave that for another time.
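
For readers curious about the Haskell original this pattern comes from, here is a minimal catamorphism sketch; the constructor names are mine and the Object case is omitted for brevity:

-- One layer of JSON, generic in the children, as in the Rust version.
data JsonValueF t
  = StringF String
  | NumberF Double
  | PairF t t
  | ArrayF [t]

newtype JsonValue = JsonValue (JsonValueF JsonValue)

instance Functor JsonValueF where
  fmap _ (StringF s) = StringF s
  fmap _ (NumberF n) = NumberF n
  fmap f (PairF a b) = PairF (f a) (f b)
  fmap f (ArrayF xs) = ArrayF (map f xs)

-- cata folds bottom-up: reduce the children first, then hand the
-- one-layer result to the algebra.
cata :: (JsonValueF a -> a) -> JsonValue -> a
cata alg (JsonValue layer) = alg (fmap (cata alg) layer)

countStrings :: JsonValue -> Int
countStrings = cata alg
  where
    alg (StringF _) = 1
    alg (NumberF _) = 0
    alg (PairF a b) = a + b
    alg (ArrayF xs) = sum xs

In this light, map_bottom_up is just cata where the algebra rebuilds a JsonValue instead of reducing to an arbitrary result type.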

Are recursion schemes useful in Rust?

Haskell has a number of features that make recursion schemes particularly nice to use and to compose, not the least of which is garbage collection. You don’t have to think about ownership; it’s references all the way down. Recursive data structures are easy to express.

On the other side, there is Rust, which culturally doesn’t like recursive functions that much, for good and bad reasons2. Though sometimes recursion is hard to avoid, especially on tree-like data structures.

An important issue is that our count_strings consumes its argument, which is unacceptable in practice. It is possible to write a version of map that takes a value by reference, and thus similarly for count_strings, but it’s not entirely straightforward nor free. You can find a by-reference version and more explanations in our associated repository. At any rate, you can always write specific traversals manually without resorting to the recursion schemes way if needed. It’s not an all or nothing approach.

In fact, even if you don’t use map at all, the functor representation alone is quite useful.

How we use recursion schemes in Nickel

In the implementation of the Nickel configuration language, we use the functor representation for the abstract syntax tree of a static type. Here are the stages we went through:

  1. In the parser and most of the Nickel pipeline, we used to have a simple Box-based, owned representation, akin to JsonValue.

  2. However, during type inference, the Nickel typechecker needs to handle new type constructions, in particular unification variables. Those are as-of-yet unknown types, similar to unknowns in an algebraic equation. Extending the base representation is readily done as for SpannedJsonValue:

    pub enum UnifType {
      Concrete(Box<TypeF<UnifType>>),
      /// A unification variable.
      UnifVar(VarId),
      //.. rigid type variables, etc.
    }
  3. More recently, we’ve split the historical, all-powerful unique representation of expressions (including Nickel types) into two intermediate ones. The new initial representation is arena-allocated, which makes it natural to use bare references as the recursive indirection instead of allocating in the heap through e.g. Box. This is easy with recursion schemes: that is precisely the ArenaJsonValue example. For a smooth transition, we need to temporarily keep the old Box-ed Type representation in parts of the codebase, but having different representations co-exist is a basic feature of recursion schemes.

We typically use map-based traversals to substitute type variables (that is, a Nickel generic type, like our T in Rust) with a concrete type, and for similar rewriting operations. We have variants of the core map function that can also thread mutable state, raise errors, or both. Traversals by reference are implemented manually, with a plain recursive function.

On the downside, type and core function definitions can be a bit verbose and tricky to get right. For example, Nickel’s TypeF has sub-components that themselves contain types, leading to 4 generic parameters. There are multiple possibilities for Box placement in particular; only some of them are correct, and they are subtly different. Though once you’ve defined a new variant, this complexity is mostly hidden from the consumers of your API. It can still manifest as terrible Rust type errors sometimes if, God forbid, you’ve put a Box in the wrong place.

Conclusion

We’ve introduced recursion schemes, a design pattern for representing and traversing recursive data structures. While the traversal part isn’t as good a fit as in purer functional languages like Haskell, it can still be useful in Rust. The representation part is particularly relevant, making it easy to define variations on a recursive data structure with different ownership models or metadata. We’ve shown how we use recursion schemes in Nickel, and while there are performance and complexity trade-offs to consider, they can bring value for moderately complex tree types that need to be extended and transformed in various ways.


  1. The classical paper on this subject is Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire.
  2. Rust allocates on the stack by default, which makes it easier to overflow (though the stack can be configured to be larger at compile time). However, I have the impression that there’s a misleading idea that recursive functions perform poorly. For tree transformations at least, the iterative version is usually harder to write and can require explicitly representing the context on the heap through an auxiliary data structure such as a zipper, which is likely to perform worse. The stack can overflow, and (recursive) function calls aren’t entirely free either, but in terms of allocation, deallocation and locality, the stack is also hard to beat!

April 10, 2025 12:00 AM

April 04, 2025

Edward Z. Yang

Why you should maintain a personal LLM coding benchmark

Do you use an LLM for coding? Do you maintain a personal benchmark based on problems you have posed the LLM? The purpose of this blog post is to convince you that you should do this: that you can do so with marginal effort on top of your day-to-day vibe coding, and that you will get both short- and long-term benefits from making your own personal benchmark exist.


I started thinking about benchmarks for coding in part because of my frustration with the discourse around LLMs in the public squares I frequent (Reddit and Twitter). People often want to know "what's the best model?" or "what's the best coding IDE?" One might imagine that the way to answer this question would be to test the models on a variety of problems from real-world uses of the LLM for coding, and then compare how well various systems do on this. Indeed, whenever a new SOTA model releases, the lab will usually tell you about the model's performance against a few well-known coding benchmarks. Problem solved?

[Screenshot: https://blog.ezyang.com/wp-content/uploads/2025/03/Screenshot-2025-03-31-at-10.10.14%E2%80%AFAM.png]

Of course not! In fact, for the most part, no one really talks about benchmarks when comparing models. Why? I argue the most popular benchmarks measure tasks that are largely different from what a user wants out of an LLM. For example, take the recent Gemini 2.5 Pro release. In their headline table, they test against LiveCodeBench, Aider Polyglot and SWE-bench Verified. LiveCodeBench and Aider Polyglot derive their problems from contest programming and pedagogical exercises, respectively, while SWE-bench assesses bug fixes to preexisting codebases. While useful, this is only a small slice of the things people want to do with LLMs.

Wouldn't it be great if you had your own, personal benchmark, based on problems you actually care about? If you are tweaking your .cursorrules, you could run your benchmark to see if a change you made helped or not. When a new model comes out, you could spend a few bucks to run your eval and decide whether you should switch your daily driver. And then on social media, if you wanted to stan the new model, instead of asking the model to drop a ball inside a rotating hexagon or vagueposting about how the new model is incredible, you could just post your benchmark results.


Nicholas Carlini's Yet Another Applied LLM Benchmark is an existence proof that this playbook can work. As Nicholas describes it:

It's a collection of nearly 100 tests I've extracted from my actual conversation history with various LLMs.

There are two defining features of this benchmark that make it interesting. Most importantly, I've implemented a simple dataflow domain specific language to make it easy for me (or anyone else!) to add new tests that realistically evaluate model capabilities. This DSL allows for specifying both how the question should be asked and also how the answer should be evaluated. Most questions are evaluated by actually running the code the model writes but the framework supports a bunch of other evaluation methods as well. And then, directly as a result of this, I've written nearly 100 tests for different situations I've actually encountered when working with LLMs as assistants.

I have been working on my own benchmark based off of Carlini's benchmark, and I can confirm that this works well for the traditional style of coding eval, where you have a one-shot task that generates and executes the code against some test cases. My basic strategy is to vibe code as usual, but whenever I give an LLM a task that it isn't able to one shot, I consider adding it to the benchmark. In more detail:

  • I only add a task if a SOTA LLM failed it. This ensures the benchmark consists of problems of appropriate difficulty: easy enough that I thought an LLM should be able to do it, but hard enough that a SOTA model failed on it. I don't need problems that are too hard (that space is already well covered by well-known benchmarks like SWE-Bench or SWE-Lancer), and I don't mind if my problems saturate because, hey, that means the models are that much better for my use cases!
  • After I have added the task to the benchmark, I can use the benchmark runner to tell if changing the model, tweaking the prompt, or even just running the prompt again at nonzero temperature can make it pass. Indeed, it's helpful to find some configuration that makes the eval pass, as this is good for debugging issues in the evaluation function itself... also it means you have working code for whatever task you were working on. Conversely, you can make the task harder by leaving things out from the prompt.
  • Writing the test is the labor-intensive part, but you can always vibe code a test. Importantly, you have a failing implementation (your initial generation) and some way you (manually?) determined that the implementation was wrong, so just turn this into your evaluation function! (And for all you yak shaving aficionados, if the model fails to vibe code your test, well, you have another task for your benchmark!) A sketch of what such a task might look like follows this list.
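Here is that sketch: a minimal, hypothetical shape for a task plus a runner, written in Haskell for brevity. None of these names (Task, runBenchmark, askModel) come from Carlini's actual framework; they are made up for illustration.

-- Hypothetical sketch: a benchmark task is a prompt plus an executable check.
data Task = Task
  { taskName :: String
  , prompt   :: String
  , evaluate :: String -> IO Bool  -- inspects or runs the model's output
  }

-- Run every task against a model and count how many pass.
runBenchmark :: (String -> IO String) -> [Task] -> IO Int
runBenchmark askModel tasks = do
  results <- mapM (\t -> askModel (prompt t) >>= evaluate t) tasks
  pure (length (filter id results))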

For example, the other day I needed to take an asciinema recording and convert it into a sequence of frames rendered as plain text. However, the only project for doing these conversions was agg, which converts recordings into animated gifs. In agg_to_text, I ask an LLM to take agg's source code and create a new program which dumps the frames as plain text rather than gif images. This task is difficult because there is some discretion in deciding when to emit a frame, and with my original prompt the LLM didn't precisely replicate the original behavior in agg. While working on the benchmark, I realized that instructing the model specifically about how frame batching worked was enough to get it to preserve the original behavior. But I don't think I should need to do this: thus this task. (P.S. If this test saturates, well, I can always make it harder by removing the agg source code from the prompt.)


The ability to benchmark one-shot tasks is here today, but I would like to speculate a bit about what lies beyond them. In particular, most of my LLM coding activity involves asking the LLM to make changes to a pre-existing project, which makes it less amenable to "single prompt creates self-contained program". (Also, I usually only ask one-shot questions that the LLM can answer, so most of them would never go in my benchmark.)

In short, how can I extract tasks from my day-to-day work? There seem to be two big extra levers we have:

  • Codebase tasks. This is the heavy-weight approach: you record the Git commit of your codebase at the time you prompted for some new feature to be added, and then when you want to run an eval on a new model you just check out the codebase at that commit and let the end-to-end system go. You'll typically want to execute the modified code, which means you'll also need a way to reliably setup the runtime environment for the code; things like lockfiles can help a lot here.
  • Transcript tasks. You don't actually need the entire codebase to be available to ask an LLM for a completion; you only need the conversation transcript up to the point of the critical generation. If the transcript is mostly your agent system reading in files for context, you can end up with a relatively system-generic prompt that can tell you something about other systems. Of course, if you want to actually run the change, you still need the full codebase, which is why this approach is much more amenable if you're going to do some static analysis on the output instead. For example, if a model keeps adding try: ... except: ... blocks that suppress errors, you can take some transcripts where you've caught the model red-handed doing this and make an eval that checks if the model is still doing it. I suspect testing on transcripts works best for checking whether changing prompts or rules improves performance, since the transcript itself puts the model into some particular latent space; a different model might have made different choices, leading to a different latent space. Transcripts from thinking models are especially susceptible to this!

I have started adapting Carlini's framework to work better for these cases, although I would love to be told someone has already solved this problem for me. In particular, I am very excited about using transcript tasks to evaluate whether or not things I add to my prompts / triggered rules are helping or not. Current SOTA model instruction following isn't great and I regularly catch models doing behaviors that I explicitly told them not to in the system prompt. I have started some initial analysis over all of my chat logs to find cases where the model misbehaved, although I haven't quite worked out how I want to build an eval out of it.

One word of warning: to make transcript tasks, you need an AI coding system that doesn't obscure how it assembles its underlying prompts (which rules out most of the popular closed source AI code editors.)


I started building evals for a selfish reason: I wanted to be able to tell if modifications to my prompts were doing anything. But I also think there is a broader opportunity that arises if we also publish these benchmarks to the world.

For one, building a real-world benchmark on use cases we care about is a way to communicate to the people training AI models whether or not they are doing well. Historical evals have focused on LeetCoding, and consequently we have models that would ace any big tech interview and yet on real-world tasks will drive you off a cliff at the first opportunity. And this is not just free labor for the top labs: if you believe in open source models, one of the biggest barriers to good small models is having really high quality data. We, the OSS vibe coding community, can directly help here.

I think there is a tremendous opportunity for the open source community to really push the state of the art in coding evaluations. There's only so many benchmarks that I, personally, can create, but if everyone is making benchmarks I could eventually imagine a universe of benchmarks where you could curate the problems that are relevant to your work and quickly and cheaply judge models in this way: a Wikipedia of Coding Benchmarks.

To summarize: every time an LLM fails to solve a problem you give it, that is a potential new benchmark. As long as there is a way to automate testing whether the LLM has solved the problem, you can turn this into a benchmark. Do this for yourself, and you can quickly have a personal benchmark with which to evaluate new models. Do this at scale, and you can help push the frontier in coding models.

by Edward Z. Yang at April 04, 2025 07:05 AM

April 01, 2025

in Code

Faking ADTs and GADTs in Languages That Shouldn't Have Them

Haskell is the world’s best programming language1, but let’s face the harsh reality that a lot of times in life you’ll have to write in other programming languages. But alas, you have been fully Haskell-brained and have lost all ability to program unless it is type-directed; you don’t even know how to start writing a program without imagining its shape as a type first.

Well, fear not. The theory behind Algebraic Data Types and Generalized Algebraic Data Types (ADTs and GADTs) is so fundamental that they’ll fit (somewhat) seamlessly into whatever language you’re forced to write. After all, if they can fit profunctor optics in Microsoft’s Java code, the sky’s the limit!

This is an “April Fools” joke in the tradition of my previous one, in that some of the ways we are going to twist these other languages might seem unconventional or possibly ill-advised… but also the title is definitely a lie: these languages definitely should have them! :D

Normal ADTs

As a reminder, Algebraic Data Types (ADTs) are products and sums; that’s why they’re algebraic, after all!

Product Types

Products are just immutable structs, which pretty much every language supports — as long as you’re able to make sure they are never mutated.

Structs in C, for example, look like:

#include <stdint.h>

typedef struct {
    uint32_t timestamp;
    double amount;
} Transaction;

But you’ll need a proper immutable API for it:

Transaction createTransaction(uint32_t timestamp, double amount) {
    return (Transaction){ timestamp, amount};
}

uint32_t getTimestamp(const Transaction* t) {
    return t->timestamp;
}

double getAmount(const Transaction* t) {
    return t->amount;
}

Transaction setTimestamp(const Transaction* t, uint32_t timestamp) {
    return (Transaction){timestamp, t->amount};
}

Transaction setAmount(const Transaction* t, double amount) {
    return (Transaction){t->timestamp, amount};
}

This is much simpler in languages where you can associate functions with data, like OOP and classes. For example, this is the common “value object” pattern in java (roughly related to the java bean2):

public class Transaction {
    private final long timestamp;
    private final double amount;

    public Transaction(long timestamp, double amount) {
        this.timestamp = timestamp;
        this.amount = amount;
    }

    public long getTimestamp() { return timestamp; }
    public double getAmount() { return amount; }

    public Transaction setTimestamp(long newTimestamp) {
        return new Transaction(newTimestamp, this.amount);
    }

    public Transaction setAmount(double newAmount) {
        return new Transaction(this.timestamp, newAmount);
    }
}

And there you go. Nothing too surprising there!

In this case, not only are these ADTs (algebraic data types), they’re also ADTs (abstract data types): you are meant to work with them based on a pre-defined abstract interface based on type algebra, instead of their internal representations.

Sum Types

If your language doesn’t support sum types, usually the way to go is with the visitor pattern: the underlying implementation is hidden, and the only way to process a sum type value is by providing handlers for every branch — a pattern match as a function, essentially. Your sum values then basically determine which handler is called.

For example, we can implement it for a network address type that can either be IPv4 or IPv6. Here we are using C++ just for generics and lambdas with closures, for simplicity, but we’ll discuss how this might look in C later.

#include <algorithm>
#include <cstdint>
#include <format>
#include <iostream>
#include <string>

struct IPAddress {
    bool isIPv4;
    union {
        uint32_t ipv4;
        uint8_t ipv6[16];
    };
};

template <typename R>
struct IPAddressVisitor {
    R (*visitIPv4)(uint32_t);
    R (*visitIPv6)(const uint8_t (&)[16]);
};

template <typename R>
R acceptIPAddress(const IPAddress& ip, IPAddressVisitor<R> visitor) {
    return ip.isIPv4 ? visitor.visitIPv4(ip.ipv4)
                     : visitor.visitIPv6(ip.ipv6);
}

You can create the values using:

IPAddress mkIPv4(uint32_t value) {
    return { true, { value } };
}

IPAddress mkIPv6(const uint8_t (&value)[16]) {
    IPAddress out = { false };
    std::copy(std::begin(value), std::end(value), out.ipv6);
    return out;
}

And we can show an address:

std::string showIPAddress(const IPAddress& ip) {
    IPAddressVisitor<std::string> visitor = {
        [](uint32_t v) {
            return std::format("{}.{}.{}.{}",
                               (v >> 24) & 0xFF, (v >> 16) & 0xFF,
                               (v >> 8) & 0xFF, v & 0xFF);
        },
        [](const uint8_t (&v)[16]) {
            return std::format("{:02X}{:02X}:{:02X}{:02X}:{:02X}{:02X}:{:02X}{:02X}:"
                               "{:02X}{:02X}:{:02X}{:02X}:{:02X}{:02X}:{:02X}{:02X}",
                               v[0], v[1], v[2], v[3], v[4], v[5], v[6], v[7],
                               v[8], v[9], v[10], v[11], v[12], v[13], v[14], v[15]);
        }
    };
    return acceptIPAddress(ip, visitor);
}

Note that in this way, the compiler enforces that we handle every branch. And, if we ever add a new branch, everything that ever consumes IPAddress with an IPAddressVisitor will have to add a new handler.

In a language without generics or powerful enough polymorphism, it’s difficult to enforce the “pure” visitor pattern because you can’t ensure that all branches return the same type.

One common pattern is to have an “effectful” visitor pattern, where the point isn’t to return something, but to execute something on the payload of the present branch. This is pretty effective for languages like C, javascript, python, etc. where types aren’t really a rigid thing.

For example, this might be how you treat an “implicit nullable”:

export const visitMaybe = (visitNothing, visitJust, val) =>
  (val == null) ? visitNothing() : visitJust(val);

This is basically for_ from Haskell: you can conditionally launch some action if the value is present.

visitMaybe(
  () => console.log("Nothing to request"),
  (reqPayload) => makeRequest("google.com", reqPayload),
  maybeRequest
);

On a simpler note, if your language has subtyping built in (maybe with classes and subclasses) or some other form of dynamic dispatch, you can implement it in terms of that, which is nice in python, java, C++, etc.

interface ExprVisitor<R> {
    R visitLit(int value);
    R visitNegate(Expr unary);
    R visitAdd(Expr left, Expr right);
    R visitMul(Expr left, Expr right);
}

abstract class Expr {
    public abstract <R> R accept(ExprVisitor<R> visitor);
}

Alternatively, if you’re in a language where lambdas are easy, instead of tupling up the visitor, you could just have accept itself take a number of arguments corresponding to each constructor:

// Alternative definition without an explicit Visitor class
abstract class Expr {
    public abstract <R> R accept(
        Function<Integer,R> visitLit,
        Function<Expr,R> visitNegate,
        BiFunction<Expr,Expr,R> visitAdd,
        BiFunction<Expr,Expr,R> visitMul
    );
}

(Note that C++ doesn’t allow template virtual methods — not because it’s not possible within the language semantics and syntax, but rather because the maintainers are too lazy to add it — so doing this faithfully requires a bit more creativity)

Now, if your language has dynamic dispatch or subclass polymorphism, you can actually do a different encoding, instead of the tagged union. This will work in languages that don’t allow or fully support naked union types, too. In this method, each constructor becomes a class, but it’s important to only allow access using accept to properly enforce the sum type pattern.

class Lit extends Expr {
    private final int value;

    public Lit(int value) {
        this.value = value;
    }

    @Override
    public <R> R accept(ExprVisitor<R> visitor) {
        return visitor.visitLit(value);
    }
}

class Negate extends Expr {
    private final Expr unary;

    public Negate(Expr unary) { this.unary = unary; }

    @Override
    public <R> R accept(ExprVisitor<R> visitor) {
        return visitor.visitNegate(unary);
    }
}

class Add extends Expr {
    private final Expr left;
    private final Expr right;

    public Add(Expr left, Expr right) {
        this.left = left;
        this.right = right;
    }

    @Override
    public <R> R accept(ExprVisitor<R> visitor) {
        return visitor.visitAdd(left, right);
    }
}

class Mul extends Expr {
    private final Expr left;
    private final Expr right;

    public Mul(Expr left, Expr right) {
        this.left = left;
        this.right = right;
    }

    @Override
    public <R> R accept(ExprVisitor<R> visitor) {
        return visitor.visitMul(left, right);
    }
}

(But, just wanted to note that if you actually are working in java, you can actually do something with sealed classes, which allows exhaustiveness checking for its native switch/case statements.)

Alternatively you could make all of the subclasses anonymous and expose them as factory methods, if your language allows it:

abstract class Expr {
    public abstract <R> R accept(ExprVisitor<R> visitor);

    public static Expr lit(int value) {
        return new Expr() {
            @Override
            public <R> R accept(ExprVisitor<R> visitor) {
                return visitor.visitLit(value);
            }
        };
    }

    public static Expr negate(Expr unary) {
        return new Expr() {
            @Override
            public <R> R accept(ExprVisitor<R> visitor) {
                return visitor.visitNegate(unary);
            }
        };
    }

    public static Expr add(Expr left, Expr right) {
        return new Expr() {
            @Override
            public <R> R accept(ExprVisitor<R> visitor) {
                return visitor.visitAdd(left, right);
            }
        };
    }

    // ... etc
}

You’d then call using:

public class Main {
    public static void main(String[] args) {
        Expr expr = new Mul(new Negate(new Add(new Lit(4), new Lit(5))), new Lit(8));
        // or
        // Expr expr = Expr.mul(Expr.negate(Expr.add(Expr.lit(4), Expr.lit(5))), Expr.lit(8));

        ExprVisitor<Integer> eval = new ExprVisitor<>() {
            @Override public Integer visitLit(int value) {
                return value;
            }
            @Override public Integer visitNegate(Expr unary) {
                return -unary.accept(this);
            }
            @Override public Integer visitAdd(Expr left, Expr right) {
                return left.accept(this) + right.accept(this);
            }
            @Override public Integer visitMul(Expr left, Expr right) {
                return left.accept(this) * right.accept(this);
            }
        };

        System.out.println("Result: " + expr.accept(eval));
    }
}

Passing around function references like this is actually pretty close to the scott encoding of our data type — and for non-recursive types, it’s essentially the church encoding.
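For reference, here is roughly what that scott encoding looks like written out in Haskell (a sketch; ExprS and friends are names made up for this illustration):

{-# LANGUAGE RankNTypes #-}

-- Each constructor becomes a continuation; recursive fields are passed
-- un-folded, exactly like the Java accept above.
newtype ExprS = ExprS
  { acceptS :: forall r.
       (Int -> r)               -- visitLit
    -> (ExprS -> r)             -- visitNegate
    -> (ExprS -> ExprS -> r)    -- visitAdd
    -> (ExprS -> ExprS -> r)    -- visitMul
    -> r
  }

litS :: Int -> ExprS
litS n = ExprS (\lit _ _ _ -> lit n)

addS :: ExprS -> ExprS -> ExprS
addS l r = ExprS (\_ _ add _ -> add l r)

evalS :: ExprS -> Int
evalS e = acceptS e
    id
    (\u -> negate (evalS u))
    (\l r -> evalS l + evalS r)
    (\l r -> evalS l * evalS r)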

Recursive Types

Speaking of recursive types…what if your language doesn’t allow recursive data types? What if it doesn’t allow recursion at all, or what if recursively generated values are just annoying to deal with? Just imagine writing that Expr type in a language with explicit memory management, for example. Or, what if you wanted a way to express your recursive types in a more elegant and runtime-safe manner?

One thing you can do instead is phrase your visitor as a “catamorphism”, or church encoding: instead of having the “visitor” take the recursive sub-values, have it take the results of recursively applying itself to them.

Let’s do this in dhall, one of the most famous non-recursive languages. Dhall does have native sum types, so we won’t worry about manually writing a visitor pattern. But it does not have recursive data types.

Let’s define a type like:

data Expr = Lit Natural
          | Add Expr Expr
          | Mul Expr Expr

But we can’t define data types in dhall that refer to themselves. So instead, we can define them in their “church encoding”: give what you would do with an Expr to consume it, where the consumption function is given as if it were recursively applied.

let ExprF : Type -> Type
      = \(r : Type) ->
        { lit : Natural -> r
        , add : r -> r -> r
        , mul : r -> r -> r
        }

let Expr : Type
      = forall (r : Type) -> ExprF r -> r

Note that ExprF r is essentially ExprVisitor<R>, except instead of add being Expr -> Expr -> r, it’s r -> r -> r: the input values aren’t the expression, but rather the results of recursively folding on the expression. In fact, our original non-recursive ExprVisitor<R> (to be more precise, the R accept(ExprVisitor<R>)) is often called the “scott encoding”, as opposed to the recursive “church encoding” fold.

For value creation, you take the visitor and recursively apply:

let lit : Natural -> Expr
      = \(x : Natural) ->
        \(r : Type) ->
        \(handlers : ExprF r) ->
            handlers.lit x

let add : Expr -> Expr -> Expr
      = \(left : Expr) ->
        \(right : Expr) ->
        \(r : Type) ->
        \(handlers : ExprF r) ->
            handlers.add (left r handlers) (right r handlers)

let mul : Expr -> Expr -> Expr
      = \(left : Expr) ->
        \(right : Expr) ->
        \(r : Type) ->
        \(handlers : ExprF r) ->
            handlers.mul (left r handlers) (right r handlers)

And finally, using the data type involves providing the handler to fold up from the bottom to top. Note that add : \(left : Natural) -> \(right : Natural) -> left + right already assumes that the handler has been applied to the sub-expressions, so you get Naturals on both sides instead of Expr.

let eval : Expr -> Natural
      = \(e : Expr) ->
          e Natural
            { lit = \(x : Natural) -> x
            , add = \(left : Natural) -> \(right : Natural) -> left + right
            , mul = \(left : Natural) -> \(right : Natural) -> left * right
            }

let testVal : Expr
      = mul (add (lit 4) (lit 5)) (lit 8)

in  assert : eval testVal === 72

This pattern is useful even in languages with good datatype recursion, like Haskell — it’s actually the recursion-schemes refactoring of a recursive data type, and I’ve written a blog post about how useful it is to have this form live alongside your normal recursive types.
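As a quick sketch of what “living alongside” can mean, converting between the recursive type and its church encoding is a pair of small functions (names here are illustrative):

{-# LANGUAGE RankNTypes #-}

import Numeric.Natural (Natural)

data Expr = Lit Natural | Add Expr Expr | Mul Expr Expr

-- The church encoding from the dhall example, as a Haskell type.
type ExprC = forall r. (Natural -> r) -> (r -> r -> r) -> (r -> r -> r) -> r

toChurch :: Expr -> ExprC
toChurch e lit add mul = go e
  where
    go (Lit n)   = lit n
    go (Add l r) = add (go l) (go r)
    go (Mul l r) = mul (go l) (go r)

fromChurch :: ExprC -> Expr
fromChurch f = f Lit Add Mul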

This pattern is pretty portable to other languages too, as long as you can scrounge together something like Rank-N types:

interface ExprFold<R> {
    R foldLit(int value);
    R foldNegate(R unary);
    R foldAdd(R left, R right);
    R foldMul(R left, R right);
}

interface Expr {
    public abstract <R> R accept(ExprFold<R> fold);

    public static Expr lit(int value) {
        return new Expr() {
            @Override
            public <R> R accept(ExprFold<R> fold) {
                return fold.foldLit(value);
            }
        };
    }

    public static Expr negate(Expr unary) {
        return new Expr() {
            @Override
            public <R> R accept(ExprFold<R> fold) {
                return fold.foldNegate(unary.accept(fold));
            }
        };
    }

    // etc.
}

By “Rank-N types” here, I mean that your objects can generate polymorphic functions: given an Expr, you could generate an <R> R accept(ExprFold <R> fold) for any R, and not something pre-determined or pre-chosen by your choice of representation of Expr.

Generalized Algebraic Data Types

You’ve implemented ADTs in your language of choice, or you are currently in a language with native ADTs. Life is good, right? Until that sneaky voice starts whispering in your ear: “we need more type safety.” You resist that urge, maybe even get a lot done without it, but eventually you are compelled to give in and sink into the warm yet harsh embrace of ultimate type safety. Now what?

Singletons and Witnesses

In Haskell, singletons are essentially enums used to associate a value with a reifiable type. “Reifiable” here means that you can take the runtime value of a singleton and use it to bring evidence to the type level. I ran into a real-world usage of this while writing https://coronavirus.jle.im/, a web-based data visualizer of COVID-19 data (source here) in purescript. I needed a singleton to represent scales for scatter plots and link them to the data that can be plotted. And, not only did it need to be type-safe in purescript (which has ADTs but not GADTs), it had to be type-safe in the javascript FFI as well.

Here’s how it might look in Haskell:

-- | Numeric types
data NType :: Type -> Type where
    NInt :: NType Int
    NDouble :: NType Double
    NPercent :: NType Percent

-- | Define a scale
data Scale :: Type -> Type where
    ScaleDate :: Scale Date
    ScaleLinear :: Bool -> NType a -> Scale a   -- ^ whether to include zero in the axis or not
    ScaleLog :: NType a -> Scale a

You’d then run it like this:

plot :: Scale a -> Scale b -> [(a, b)] -> Canvas

So, we have the type of the input tuples being determined by the values you pass to plot:

ghci> :t plot ScaleDate (ScaleLinear True NInt)
[(Date, Int)] -> Canvas

But let’s say we only had ADTs. And then we’re passing them down to a javascript FFI which only has structs and functions. We could drop the type safety and instead error at runtime, but…no. Type unsafety is not acceptable.

The fundamental ability we want to gain is that if we pattern match on ScaleDate, then we know a has to be Date. If we match on NInt, we know that a has to be Int.

For the sake of this example, we’re going to be implementing a simpler function in purescript and in javascript: a function that takes a scale type and a list of points and prints the bounds. In Haskell, this looks like:

data AxisBounds a = AB
    { minValue :: a
    , minLabel :: String
    , maxValue :: a
    , maxLabel :: String
    }

displayAxis :: Scale a -> [a] -> AxisBounds a
displayAxis = \case
    ScaleDate -> \xs ->
      let xMin = minimum xs
          xMax = maximum xs
       in AB xMin (showDate xMin) xMax (showDate xMax)
    ScaleLinear hasZero nt -> \xs ->
      displayNumericAxis nt (if hasZero then 0:xs else xs)
    ScaleLog nt -> \xs ->
      displayNumericAxis nt xs

displayNumericAxis :: NType a -> [a] -> AxisBounds a
displayNumericAxis = \case
    NInt -> \xs ->
      let xMin = minimum xs
          xMax = maximum xs
       in AB xMin (printf "%d" xMin) xMax (printf "%d" xMax)
    NDouble -> \xs ->
      let xMin = minimum xs
          xMax = maximum xs
       in AB xMin (printf "%.4f" xMin) xMax (printf "%.4f" xMax)
    NPercent -> \xs ->
      let xMin = minimum xs
          xMax = maximum xs
       in AB xMin (printf "%.1f%%" (xMin*100)) xMax (printf "%.1f%%" (xMax*100))

(Pretend the Percent type is just a newtype-wrapped Float or something)

There are at least two main approaches to do this. We’ll be discussing runtime equality witnesses and Higher-Kinded Eliminators.

Runtime Witnesses and Coyoneda Embedding

The Yoneda Lemma is one of the most powerful tools that Category Theory has yielded as a branch of math, but its sibling coyoneda is one of the most useful Haskell abstractions.

This doesn’t give you GADTs, but it’s a very lightweight way to “downgrade” your GADTs into normal ADTs, which is appropriate if you don’t need the full power.

The trick is this: if you have MyGADT a, and you know you are going to be using it to produce as, you can do a covariant coyoneda transform.

For example, if you have this type representing potential data sources:

data Source :: Type -> Type where
    ByteSource :: Handle -> Source Word
    StringSource :: FilePath -> Source String

readByte :: Handle -> IO Word
readString :: FilePath -> IO String

readSource :: Source a -> IO a
readSource = \case
    ByteSource h -> readByte h
    StringSource fp -> readString fp

You could instead turn Source into a non-GADT by making it a normal parameterized ADT and adding a X -> a field, which is a type of CPS transformation:

data Source a =
    ByteSource Handle (Word -> a)
  | StringSource FilePath (String -> a)

byteSource :: Handle -> Source Word
byteSource h = ByteSource h id

stringSource :: FilePath -> Source String
stringSource fp = StringSource fp id

readSource :: Source a -> IO a
readSource = \case
    ByteSource h out -> out <$> readByte h
    StringSource fp out -> out <$> readString fp

A nice benefit of this method is that Source can now have a Functor instance, which the original GADT could not.
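Concretely, the Functor instance just post-composes onto the continuation (a small sketch):

instance Functor Source where
  fmap f (ByteSource h out)    = ByteSource h (f . out)
  fmap f (StringSource fp out) = StringSource fp (f . out)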

And, if MyGADT a is going to be consuming as, you can do the contravariant coyoneda transform:

data Sink a =
    ByteSink Handle (a -> Word)
  | StringSink FilePath (a -> String)

This gives it a free Contravariant instance too!
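Again as a sketch, contramap pre-composes onto the consuming functions:

import Data.Functor.Contravariant (Contravariant (..))

instance Contravariant Sink where
  contramap f (ByteSink h g)    = ByteSink h (g . f)
  contramap f (StringSink fp g) = StringSink fp (g . f)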

And, if you are going to be both consuming and producing as, you can do the invariant coyoneda transform:

data Interface a =
    ByteInterface Handle (Word -> a) (a -> Word)
  | StringInterface FilePath (String -> a) (a -> String)

However, in practice, true equality involves being able to lift under injective type constructors, and carrying every single continuation is unwieldy. We can package them up together with a runtime equality witness.

This is something we can put “inside” NInt such that, when we pattern match on a NType a, the type system can be assured that a is an Int.

You need some sort of data of type IsEq a b with functions:

  • refl :: IsEq a a
  • to :: IsEq a b -> a -> b
  • sym :: IsEq a b -> IsEq b a
  • trans :: IsEq a b -> IsEq b c -> IsEq a c
  • inj :: IsEq (f a) (f b) -> IsEq a b

If you have to and sym you also get from :: IsEq a b -> b -> a.

From all of this, we can recover our original Word -> a and a -> Word functions from a single IsEq a Word, saving us from having to store two separate continuations.
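For instance, here is a sketch of Source carrying a witness instead of a continuation, written against the abstract IsEq interface above:

data Source a =
    ByteSource Handle (IsEq a Word)
  | StringSource FilePath (IsEq a String)

readSource :: Source a -> IO a
readSource = \case
    ByteSource h eq    -> from eq <$> readByte h
    StringSource fp eq -> from eq <$> readString fp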

Your language of choice might already have this IsEq. But one of the more interesting ways to get it, to me, is Leibniz equality (discussed a lot in this Ryan Scott post), which works in languages with higher-kinded polymorphism. Leibniz equality says that a and b are equal if forall p. p a -> p b: any property of a is also true of b.

In Haskell, we write this like:

newtype Leibniz a b = Leibniz (forall p. p a -> p b)

refl :: Leibniz a a
refl = Leibniz id

The only possible way to construct a ‘Leibniz’ is with both type parameters being the same: You can only ever create a value of type Leibniz a a, never a value of Leibniz a b where b is not a.

You can prove that this is actually equality by writing functions Leibniz a b -> Leibniz b a and Leibniz a b -> Leibniz b c -> Leibniz a c (this Ryan Scott post goes over it well), but in practice we realize this equality by safely coercing a and b back and forth:

newtype Identity a = Identity { runIdentity :: a }

to :: Leibniz a b -> a -> b
to (Leibniz f) = runIdentity . f . Identity

newtype Op a b = Op { getOp :: b -> a }

from :: Leibniz a b -> b -> a
from (Leibniz f) = getOp (f (Op id))
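For completeness, sym and trans fall out of the same trick of picking a clever p (a sketch; Symm is a helper newtype made up here):

newtype Symm a b = Symm { getSymm :: Leibniz b a }

sym :: Leibniz a b -> Leibniz b a
sym (Leibniz f) = getSymm (f (Symm refl))

trans :: Leibniz a b -> Leibniz b c -> Leibniz a c
trans (Leibniz f) (Leibniz g) = Leibniz (g . f)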

So, if your language supports higher-kinded Rank-2 types, you have a solution!

There are other solutions in other languages, but they will usually all be language-dependent.

Let’s write everything in purescript. The key difference is that we use map (to isNumber) :: Array a -> Array Number, etc., to convert our Array a into an array of the type we know it to be.

import Text.Printf

newtype Leibniz a b = Leibniz (forall p. p a -> p b)

-- implemented exactly as in the Haskell version above
to :: Leibniz a b -> a -> b
from :: Leibniz a b -> b -> a

data NType a =
    NInt (Leibniz a Int)
  | NNumber (Leibniz a Number)
  | NPercent (Leibniz a Percent)

type AxisBounds a =
    { minValue :: a
    , minLabel :: String
    , maxValue :: a
    , maxLabel :: String
    }

displayNumericAxis :: NType a -> Array a -> AxisBounds a
displayNumericAxis = case _ of
    NInt isInt -> \xs ->
      let xMin = minimum $ map (to isInt) xs
          xMax = maximum $ map (to isInt) xs
          showInt = show
       in { minValue: xMin
          , minLabel: showInt xMin
          , maxValue: xMax
          , maxLabel: showInt xMax
          }
    NNumber isNumber -> \xs ->
      let xMin = minimum $ map (to isNumber) xs
          xMax = maximum $ map (to isNumber) xs
          showFloat = printf (Proxy :: Proxy "%.4f")   -- it works a little differently
       in { minValue: xMin
          , minLabel: showFloat xMin
          , maxValue: xMax
          , maxLabel: showFloat xMax
          }
    NPercent isPercent -> \xs ->
      let xMin = minimum $ map (to isPercent) xs
          xMax = maximum $ map (to isPercent) xs
          showPercent = printf (Proxy :: Proxy "%.1f%%") <<< (_ * 100.0)
       in { minValue: xMin
          , minLabel: showPercent xMin
          , maxValue: xMax
          , maxLabel: showPercent xMax
          }

To work with our [a] as if it were [Int], we have to map over it the coercion function that our Leibniz a Int gave us. Admittedly, this naive way adds the runtime cost of copying the array, but we could be more creative and find the minimum and maximum in constant space with no extra allocations.

And, if we wanted to outsource this to the javascript FFI, remember that javascript doesn’t quite have sum types, so we can create a quick visitor:

type NVisitor a r =
    { nvInt :: Leibniz a Int -> r
    , nvNumber :: Leibniz a Number -> r
    , nvPercent :: Leibniz a Percent -> r
    }

type NAccept a = forall r. NVisitor a r -> r

toAccept :: NType a -> NAccept a
toAccept = case _ of
    NInt isInt -> \nv -> nv.nvInt isInt
    NNumber isNumber -> \nv -> nv.nvNumber isNumber
    NPercent isPercent -> \nv -> nv.nvPercent isPercent

foreign import _formatNumeric :: forall a. Fn2 (NAccept a) a String

formatNumeric :: NType a -> a -> String
formatNumeric nt = runFn2 _formatNumeric (toAccept nt)

The FFI binding looks like: (taken from my actual source code)

import * as d3 from "d3-format";

export const _formatNumeric = (naccept, x) =>
  naccept(
    { nvInt: (isInt) => d3.format("~s")
    , nvNumber: (isNumber) => d3.format(".3~s")
    , nvPercent: (isPercent) => d3.format("+.3~p")
    }
  )(x);

Admittedly, in the javascript we are throwing away the “GADT type safety” because we throw away the equality. But we take what we can — we at least retain the visitor pattern for sum-type safety and exhaustiveness checking. I haven’t done this in typescript yet, so there might be a way to formalize Leibniz equality in typescript and keep the whole chain type-safe from top to bottom.

Higher-Kinded Eliminators

This is essentially the higher-kinded version of the visitor pattern, except in dependent type theory these visitors are more often called “eliminators” or destructors, which is definitely a cooler name.

In the normal visitor you’d have:

data User = TheAdmin | Member Int

data UserHandler r = UH
    { uhTheAdmin :: r
    , uhMember :: Int -> r
    }

But note that if you have the right set of continuations, you have something that is essentially equal to User without having to actually use User:

type User' = forall r. UserHandler r -> r

fromUser :: User -> User'
fromUser = \case
    TheAdmin -> \UH{..} -> uhTheAdmin
    Member userId -> \UH{..} -> uhMember userId

toUser :: User' -> User
toUser f = f $ UH { uhTheAdmin = TheAdmin, uhMember = Member }

This means that User is actually equivalent to forall r. UserHandler r -> r: they’re the same type, so if your language doesn’t have sum types, you could encode it as forall r. UserHandler r -> r instead. Visitors, baby.

But, then, what actually does the r type variable represent here, semantically? Well, in a UserHandler r, r is the “target” that we interpret into. But there’s a deeper relationship between r and User: A UserHandler r essentially “embeds” a User into an r. And, a UserHandler r -> r is the application of that embedding to an actual User.

If we pick r ~ (), then UserHandler () embeds User into (). If we pick r ~ String, then UserHandler String embeds User into String (like, “showing” it). And if we pick r ~ User, a UserHandler User embeds a User into…itself?

So here, r is essentially the projection that we view the user through. And by making sure we are forall r. UserHandler r -> r for all r, we ensure that we do not lose any information: the embedding is completely 1-to-1. It lets you “create” the User faithfully in a “polymorphic” way.

In fact, to hammer this home, some people like to use the name of the type as the type variable: UserHandler user:

-- | The same thing as before but with things renamed to prove a point
data MakeUser user = MakeUser
    { uhTheAdmin :: user
    , uhMember :: Int -> user
    }

type User' = forall user. MakeUser user -> user

The forall user. lets us faithfully “create” a User within the system we have, without actually having a User data type. Essentially we can imagine the r in the forall r as “standing in” for User, even if that type doesn’t actually exist.

Now, here’s the breakthrough: If we can use forall (r :: Type) to substitute for User :: Type, how about we use a forall (p :: Type -> Type) to substitute for a Scale :: Type -> Type?

data Scale :: Type -> Type where
    ScaleDate :: Scale Date
    ScaleLinear :: Bool -> NType a -> Scale a
    ScaleLog :: NType a -> Scale a

data ScaleHandler p a = SH
    { shDate :: p Date
    , shLinear :: Bool -> NType a -> p a
    , shLog :: NType a -> p a
    }

type Scale' a = forall p. ScaleHandler p a -> p a

fromScale :: Scale a -> Scale' a
fromScale = \case
    ScaleDate -> \SH{..} -> shDate
    ScaleLinear hasZero lt -> \SH{..} -> shLinear hasZero lt
    ScaleLog nt -> \SH{..} -> shLog nt

toScale :: Scale' a -> Scale a
toScale f = f $ SH { shDate = ScaleDate, shLinear = ScaleLinear, shLog = ScaleLog }

So in our new system, forall p. ScaleHandler p a -> p a is identical to Scale a: we can use p a to substitute for Scale a in our language even if our language itself cannot support GADTs.

So let’s write formatNType in purescript. We no longer have an actual Scale sum type, but its higher-kinded church encoding:

type NType a = forall p.
    { int :: p Int
    , number :: p Number
    , percent :: p Percent
    } -> p a

type Scale a = forall p.
    { date :: p Date
    , linear :: Bool -> NType a -> p a
    , log :: NType a -> p a
    } -> p a

ntInt :: NType Int
ntInt nth = nth.int

ntNumber :: NType Number
ntNumber nth = nth.number

ntPercent :: NType Percent
ntPercent nth = nth.percent

formatNType :: NType a -> a -> String
formatNType nt = f
  where
    Op f = nt
      { int: Op show
      , number: Op $ printf (Proxy "%.4f")
      , percent: Op $ printf (Proxy "%.1f%%") <<< (_ * 100.0)
      }

Here we are using

newtype Op b a = Op (a -> b)

as our “target”: turning an NType a into an Op String a. And an Op String a is an a -> String, which is what we wanted! The int field is Op String Int, the number field is Op String Number, etc.

In many languages, using this technique effectively requires having a newtype wrapper on-hand, so it might be unwieldy in non-trivial situations. For example, if we wanted to write our previous axis function which is NType a -> [a] -> String, we’d have to have a newtype wrapper for [a] -> String that has a as its argument:

newtype OpList b a = OpList ([a] -> b)

or you could re-use Compose:

newtype Compose f g a = Compose (f (g a))

and your p projection type would be Compose (Op String) []. So, you don’t necessarily have to write a bespoke newtype wrapper, but you do have to devote some brain cycles to think it through (unless you’re in a language that doesn’t need newtype wrappers to have this work, like we’ll discuss later).
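To make that concrete, here is a small Haskell sketch of the Compose route; NTypeH and boundsLabel are names invented for this illustration:

{-# LANGUAGE RankNTypes #-}

newtype Op b a = Op { getOp :: a -> b }

newtype Compose f g a = Compose { getCompose :: f (g a) }

-- A Haskell rendering of the church-encoded NType, for illustration:
data NTypeH p = NTypeH { hInt :: p Int, hNumber :: p Double }

type NType a = forall p. NTypeH p -> p a

-- p = Compose (Op String) [], so p a unwraps to [a] -> String.
boundsLabel :: NType a -> [a] -> String
boundsLabel nt = getOp . getCompose $ nt NTypeH
  { hInt    = Compose (Op (\xs -> show (minimum xs) <> ".." <> show (maximum xs)))
  , hNumber = Compose (Op (\xs -> show (minimum xs) <> ".." <> show (maximum xs)))
  }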

By the way, this method generalizes well to multiple arguments: if you have a type like MyGADT a b c, you just need to project into a forall (p :: k1 -> k2 -> k3 -> Type).

I believe I have read somewhere that the two methods discussed here (runtime equality witness vs. higher-kinded eliminator) are not actually fully identical in their power, and there are GADTs where one would work and not the other … but I can’t remember where I read this and I’m also not big-brained enough to figure out what those situations are. But if you, reader, have any idea, please let me know!

Existential Types

Let’s take a quick break to talk about something that’s not technically related to GADTs but is often used alongside them.

What if we wanted to store a value with its NType and hide the type variable? In Haskell we’d write this like:

data NType :: Type -> Type where
    NInt :: NType Int
    NDouble :: NType Double
    NPercent :: NType Percent

data SomeNType = forall a. SomeNType (NType a) a

formatNType :: NType a -> a -> String
formatNType nt x = ...

formatSomeNType :: SomeNType -> String
formatSomeNType (SomeNType nt x) = formatNType nt x

myFavoriteNumbers :: [SomeNType]
myFavoriteNumbers = [SomeNType NInt 3, SomeNType NDouble pi]

But what if our language doesn’t have existentials? Remember, this is basically a value SomeNType that has no type parameters itself, but contains both an NType a and an a with a matching type variable.

One strategy we have available is to transform our existentials into their CPS form (continuation-passing style). Basically, we write exactly what we want to do with the contents if we were to pattern match on them. It’s essentially a Rank-N visitor pattern with only a single constructor:

type SomeNType = forall r. (forall a. NType a -> a -> r) -> r

someNType :: NType a -> a -> SomeNType
someNType nt x = \f -> f nt x

formatSomeNumeric :: SomeNType -> String
formatSomeNumeric snt = snt $ \nt x -> formatNType nt x

You can imagine, syntactically, that snt acts as its “own” pattern match, except instead of matching on SomeNType nt x -> .., you “match” on \nt x -> ..

This general pattern works for languages with traditional generics like Java too:

interface SomeNTypeVisitor<R> {
    <A> R visit(NType<A> nt, A val);
}

interface SomeNType {
    public abstract <R> R accept(SomeNTypeVisitor<R> visitor);

    // One option: the factory method
    public static <A> SomeNType someNType(NType<A> nt, A val) {
        return new SomeNType() {
            @Override
            public <R> R accept(SomeNTypeVisitor<R> visitor) {
                return visitor.visit(nt, val);
            }
        };
    }
}

// Second option: the subtype hiding a type variable, which you have to always
// make sure to upcast into `SomeNType` after creating
class SomeNTypeImpl<A> implements SomeNType {
    private NType<A> nt;
    private A val;

    public SomeNTypeImpl(NType<A> nt, A val) {
        this.nt = nt;
        this.val = val;
    }

    @Override
    public <R> R accept(SomeNTypeVisitor<R> visitor) {
        return visitor.visit(nt, val);
    }
}

Does…anyone write java like this? I tried committing this once while at Google and I got automatically flagged to be put on a PIP.

Recursive GADTs

The climax of this discussion: what if your language does not support GADTs or recursive data types?

We’re going to be using dhall as an example again, but note that the lessons applied here are potentially useful even when you do have recursive types: we’re going to be talking about a higher-kinded church encoding, which can be a useful form of your data types that live alongside your normal recursive ones.

Let’s imagine Expr as a GADT, where Expr a represents an Expr that evaluates to an a:

data Expr :: Type -> Type where
    NatLit :: Natural -> Expr Natural
    BoolLit :: Bool -> Expr Bool
    Add :: Expr Natural -> Expr Natural -> Expr Natural
    LTE :: Expr Natural -> Expr Natural -> Expr Bool
    Ternary :: Expr Bool -> Expr a -> Expr a -> Expr a

eval :: Expr a -> a
eval = \case
    NatLit n -> n
    BoolLit b -> b
    Add x y -> eval x + eval y
    LTE a b -> eval a <= eval b
    Ternary b x y -> if eval b then eval x else eval y

Adding this type variable ensures that our Expr is type-safe: it’s impossible to Add an Expr Bool, and the two branches of a Ternary must have the same result type, etc. And, we can write eval :: Expr a -> a and know exactly what type will be returned.

Now, let’s combine the two concepts: First, the church encoding, where our handlers take the “final result” of our fold r instead of the recursive value Expr. Second, the higher-kinded eliminator pattern where we embed Expr :: Type -> Type into forall (p :: Type -> Type).

And finally, we get:3

let ExprF =
      \(p : Type -> Type) ->
        { natLit : Natural -> p Natural
        , boolLit : Bool -> p Bool
        , add : p Natural -> p Natural -> p Natural
        , ternary : forall (a : Type) -> p Bool -> p a -> p a -> p a
        }

let Expr
    : Type -> Type
    = \(a : Type) -> forall (p : Type -> Type) -> ExprF p -> p a

let eval
    : forall (a : Type) -> Expr a -> a
    = \(a : Type) ->
      \(e : Expr a) ->
        e
          (\(q : Type) -> q)
          { natLit = \(x : Natural) -> x
          , boolLit = \(x : Bool) -> x
          , add = \(x : Natural) -> \(y : Natural) -> x + y
          , ternary =
              \(a : Type) ->
              \(b : Bool) ->
              \(x : a) ->
              \(y : a) ->
                if b then x else y
          }

Again, now instead of add taking Expr, it takes p Natural: the “Natural result of the fold”. p not only stands in for what we embed Expr into, it stands in for the result of the recursive fold. That’s why in eval, the first arguments of add are the Natural results of the sub-evaluation.

These values can be created in the same way as before, merging the two techniques, sending the handlers downstream:

let natLit
    : Natural -> Expr Natural
    = \(n : Natural) ->
      \(p : Type -> Type) ->
      \(handlers : ExprF p) ->
        handlers.natLit n

let boolLit
    : Bool -> Expr Bool
    = \(n : Bool) ->
      \(p : Type -> Type) ->
      \(handlers : ExprF p) ->
        handlers.boolLit n

let add
    : Expr Natural -> Expr Natural -> Expr Natural
    = \(x : Expr Natural) ->
      \(y : Expr Natural) ->
      \(p : Type -> Type) ->
      \(handlers : ExprF p) ->
        handlers.add (x p handlers) (y p handlers)

let ternary
    : forall (a : Type) -> Expr Bool -> Expr a -> Expr a -> Expr a
    = \(a : Type) ->
      \(b : Expr Bool) ->
      \(x : Expr a) ->
      \(y : Expr a) ->
      \(p : Type -> Type) ->
      \(handlers : ExprF p) ->
        handlers.ternary (b p handlers) (x p handlers) (y p handlers)

let testVal
    : Expr Natural
    = add (natLit 5) (add (natLit 6) (natLit 7))

in  assert : eval testVal === 18

If all of this is difficult to parse, try reviewing both the recursive ADT section and the higher-kinded eliminator section and making sure you understand both well before tackling this, which combines them together!

Admittedly in Haskell (and purescript) this is a lot simpler because we don’t have to explicitly pass in type variables:

data ExprF p = ExprF
    { natLit :: Natural -> p Natural
    , boolLit :: Bool -> p Bool
    , add :: p Natural -> p Natural -> p Natural
    , ternary :: forall a.  p Bool -> p a -> p a -> p a
    }

type Expr a = forall p. ExprF p -> p a

eval :: Expr a -> a
eval e = runIdentity $
  e ExprF
    { natLit = Identity
    , boolLit = Identity
    , add = \(Identity x) (Identity y) -> Identity (x + y)
    , ternary = \(Identity b) x y -> if b then x else y
    }

ternary :: Expr Bool -> Expr a -> Expr a -> Expr a
ternary b x y handlers@(ExprF _ _ _ tern) =
    tern (b handlers) (x handlers) (y handlers)

But one nice thing about the dhall version that’s incidental to dhall is that it doesn’t require any extra newtype wrappers like the Haskell one does. That’s because type inference tends to choke on things like this, but dhall doesn’t really have any type inference: all of the types are passed explicitly. It’s one of the facts about dhall that makes it nice for things like this.

Congratulations

In any case, if you’ve made it this far, congratulations! You are a master of ADTs and GADTs. Admittedly every language is different, and some of these solutions have to be tweaked for the language in question. And, if your program gets very complicated, there is a good chance that things will become ergonomically unfeasible.

But I hope, at least, that this inspires your imagination to try to bring your haskell principles, techniques, standards, practices, and brainrot into the language of your choice (or language you are forced to work with).

And, if you ever find interesting ways to bring these things into a language not discussed here (or a new interesting technique or pattern), I would absolutely love to hear about it!

Until next time, happy “Haskelling”!

Special Thanks

I am very humbled to be supported by an amazing community, who make it possible for me to devote time to researching and writing these posts. Very special thanks to my supporter at the “Amazing” level on patreon, Josh Vera! :)


  1. I bet you thought there was going be some sort of caveat in this footnote, didn’t you?↩︎

  2. I didn’t think I’d ever write “java bean” non-ironically on my blog, but there’s a first time for everything.↩︎

  3. Be aware that this implementation is not necessarily appropriately lazy or short-circuiting in Ternary: it might evaluate both sides before returning the chosen branch.↩︎

by Justin Le at April 01, 2025 05:29 PM

March 30, 2025

Lysxia's blog

Unfolding trees breadth-first in Haskell

To visit a tree or graph in breadth-first order, there are two main implementation approaches: queue-based or level-based. Our goal here is to develop a level-based approach where the levels of the breadth-first walk are constructed compositionally and dynamically.

Compositionality means that for every node, its descendants—the other nodes reachable from it—are defined by composing the descendants of its children. Dynamism means that the children of a node are generated only when that node is visited; we will see that this requirement corresponds to asking for a monadic unfold.

A prior solution, using the Phases applicative functor, is compositional but not dynamic in that sense. The essence of Phases is a zipping operation in free applicative functors. What if we did zipping in free monads instead?

This is a Literate Haskell post. The source code is on Gitlab. A reusable version of this code is now available on Hackage: the weave library.

Extensions and imports for this Literate Haskell file
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveTraversable #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE PackageImports #-}
{-# LANGUAGE PatternSynonyms #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE ViewPatterns #-}
{-# OPTIONS_GHC -Wno-x-partial -Wno-unused-matches -Wno-unused-top-binds -Wno-unused-imports #-}

import "deepseq" Control.DeepSeq (NFData)
import Data.Foldable (toList)
import Data.Function ((&))
import Data.Functor ((<&>))
import Data.Functor.Identity (Identity(..), runIdentity)
import GHC.Generics (Generic)
import "tasty" Test.Tasty (TestTree, localOption)
import "tasty-hunit" Test.Tasty.HUnit ((@?=), testCase)
import "tasty-bench" Test.Tasty.Bench (bgroup, bench, defaultMain, nf, bcompare)
-- import "tasty-bench" Test.Tasty.Bench (mutatorCpuTime)
import "tasty-expected-failure" Test.Tasty.ExpectedFailure (expectFail)
import "some" Data.Some.Newtype (Some(Some))
import "transformers" Control.Monad.Trans.State
import qualified "containers" Data.Set as Set
import "containers" Data.Set (Set)

Background: breadth-first folds and traversals

Our running example will be the type of binary trees:

data Tree a = Leaf | Node a (Tree a) (Tree a)
  deriving (Eq, Show, Generic, NFData)

A breadth-first walk explores the tree level by level; every level contains the nodes at the same distance from the root. The list of levels of a tree can be defined recursively—it is a fold. For a tree Node x l r, the first level contains just the root node x, and the subsequent levels are obtained by appending the levels of the subtrees l and r pairwise.

levels :: Tree a -> [[a]]
levels Leaf = []
levels (Node x l r) = [x] : zipLevels (levels l) (levels r)
zipLevels :: [[a]] -> [[a]] -> [[a]]
zipLevels [] yss = yss
zipLevels xss [] = xss
zipLevels (xs : xss) (ys : yss) = (xs ++ ys) : zipLevels xss yss

(We can’t just use zipWith because it throws away the end of a list when the other list is empty.)
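For example, zipWith discards the leftover levels while zipLevels keeps them:

ghci> zipWith (++) [[1], [2]] []
[]
ghci> zipLevels [[1], [2]] []
[[1],[2]]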

Finally, we concatenate the levels together to obtain the list of nodes in breadth-first order.

toListBF :: Tree a -> [a]
toListBF = concat . levels

Thanks to laziness, the list will indeed be produced by walking the tree in breadth-first order. So far so good.
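
As a quick sanity check, here is a small example (exampleTree is a name introduced here for illustration):

-- An example tree:
--        1
--       / \
--      2   3
--     /
--    4
exampleTree :: Tree Int
exampleTree = Node 1 (Node 2 (Node 4 Leaf Leaf) Leaf) (Node 3 Leaf Leaf)

-- levels   exampleTree  ==  [[1], [2, 3], [4]]
-- toListBF exampleTree  ==  [1, 2, 3, 4]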

The above function lets us fold a tree in breadth-first order. The next level of difficulty is to traverse a tree, producing a tree with the same shape as the original tree, only with modified labels.

traverseBF :: Applicative m => (a -> m b) -> Tree a -> m (Tree b)

This has the exact same type as traverse, which you might obtain with deriving (Foldable, Traversable). The stock-derived Traversable—enabled by the DeriveTraversable extension—is a depth-first traversal, but the laws of traverse don’t specify the order in which nodes should be visited, so you could make it a breadth-first traversal if you wanted.

To define a breadth-first traversal is a surprisingly non-trivial exercise, as pointed out by Chris Okasaki in Breadth-first numbering: lessons from a small exercise in algorithm design (ICFP 2000).

“Breadth-first numbering” is a special case of “breadth-first traversal” where the arrow (a -> m b) is specialized to a counter. Okasaki presents a “numbering” solution based on queues and another solution based on levels. Both are easily adaptable to the more general “traversal” problem as we will soon see.

There is a wonderful Discourse thread from 2024 on the topic of breadth-first traversals. The first post gives an elegant breadth-first numbering algorithm which also appears in the appendix of Okasaki’s paper, but sadly it does not generalize from “numbering” to “traversal” beyond the special case m = State s.

Last but not least, another level-based solution to the breadth-first traversal problem can be found in the tree-traversals library by Noah Easterly. It is built around an applicative transformer named Phases, which is a list of actions—imagine the type “[m _]”—where each element m _ represents one level of the tree. The Phases applicative enables a compositional definition of a breadth-first traversal, similarly to the levels function above: the set of nodes reachable from the root is defined by combining the sets of nodes reachable from its children. This concern of compositionality is one of the main motivations behind this post.

Non-standard terminology

The broad family of algorithms being discussed is typically called “breadth-first search” (BFS) or “breadth-first traversal”, but in general these algorithms are not “searching” for anything, and in Haskell, “traversal” is reserved for “things like traverse”. Instead, this post will use “walks” as a term encompassing folds, traversals, unfolds, or any concept that can be qualified with “breadth-first”.

Problem statement: Breadth-first unfolds

Both the fold toListBF and the traversal traverseBF had in common that they receive a tree as an input. This explicit tree makes the notion of levels “static”. With unfolds, we will have to deal with levels that exist only “dynamically” as the result of unfolding the tree progressively.

To introduce the unfolding of a tree, it is convenient to introduce its “base functor”. We modify the tree type by replacing the recursive tree fields with an extra type parameter:

data TreeF a t = LeafF | NodeF a t t
  deriving (Functor, Foldable, Traversable)

An unfold generates a tree from a seed and a function which expands the seed into a leaf or a node containing more seeds. A pure unfold—or anamorphism—can be defined readily:

unfold :: (s -> TreeF a s) -> s -> Tree a
unfold f s = case f s of
  LeafF -> Leaf
  NodeF a l r -> Node a (unfold f l) (unfold f r)

The order in which nodes are evaluated depends on how the resulting tree is consumed. Hence unfold is neither inherently “depth-first” nor “breadth-first”.
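
For example, consuming an infinite unfolded tree with toListBF from earlier yields its labels level by level, thanks to laziness (heapTree is an illustrative name):

-- The infinite tree of positive integers, labelled in heap order.
heapTree :: Tree Int
heapTree = unfold (\n -> NodeF n (2 * n) (2 * n + 1)) 1

-- take 7 (toListBF heapTree)  ==  [1, 2, 3, 4, 5, 6, 7]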

The situation changes if we make the unfold monadic.

unfoldM :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)

An implementation of unfoldM must decide upon an ordering between actions. To see why adding an M to unfold imposes an ordering, contemplate the fact that these expressions have the same meaning:

Node a (unfold f l) (unfold f r)
= ( let tl = unfold f l in
    let tr = unfold f r in
    Node a tl tr )
= ( let tr = unfold f r in
    let tl = unfold f l in
    Node a tl tr )

whereas these monadic expressions do not have the same meaning in general:

( unfoldM f l >>= \tl ->
  unfoldM f r >>= \tr ->
  pure (Node a tl tr) )
/=
( unfoldM f r >>= \tr ->
  unfoldM f l >>= \tl ->
  pure (Node a tl tr) )

Without further requirements, there is an “obvious” definition of unfoldM, which is a depth-first unfold:

unfoldM_DF :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_DF f s = f s >>= \case
  LeafF -> pure Leaf
  NodeF a l r -> liftA2 (Node a) (unfoldM_DF f l) (unfoldM_DF f r)

We unfold the left subtree l fully before unfolding the right one r.
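
As a sanity check, specializing unfoldM_DF to the Identity monad recovers the pure unfold (unfold' is an illustrative name):

-- The pure unfold, recovered from the depth-first monadic one.
unfold' :: (s -> TreeF a s) -> s -> Tree a
unfold' f = runIdentity . unfoldM_DF (Identity . f)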

The problem is to define a breadth-first unfoldM.

If you want to think about this problem on your own, you can stop reading here. The rest of this post presents solutions.

Queue-based unfold

The two breadth-first numbering algorithms in Okasaki’s paper can actually be generalized to breadth-first unfolds. Here is the first one that uses queues (using the function (<+) for “push” and pattern-matching on (:>) for “pop”):

unfoldM_BF_Q :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_BF_Q f b0 = go (b0 <+ Empty) <&> \case
    _ :> t -> t
    _ -> error "impossible"
  where
    go Empty = pure Empty
    go (q :> b) = f b >>= \case
      LeafF -> go q <&> \p -> Leaf <+ p
      NodeF a b1 b2 -> go (b2 <+ b1 <+ q) <&> \case
        p :> t1 :> t2 -> Node a t1 t2 <+ p
        _ -> error "impossible"

(The operator (<&>) is flip (<$>). I use it to avoid parentheses around lambdas.)

Queue implementation for unfoldM_BF_Q
data Q a = Q [a] [a]

pattern Empty :: Q a
pattern Empty = Q [] []

infixr 1 <+
(<+) :: a -> Q a -> Q a
x <+ Q xs ys = Q (x : xs) ys

pop :: Q a -> Maybe (Q a, a)
pop (Q xs (y : ys)) = Just (Q xs ys, y)
pop (Q xs []) = case reverse xs of
  [] -> Nothing
  y : ys -> Just (Q [] ys, y)

infixl 1 :>
pattern (:>) :: Q a -> a -> Q a
pattern q :> y <- (pop -> Just (q, y))

{-# COMPLETE Empty, (:>) #-}

As it happens, containers uses that queue-based technique to implement breadth-first unfold for rose trees (Data.Tree.unfoldTreeM_BF). There is a pending question of whether we can improve upon it. This post might provide a theoretical alternative, but it seems too slow to be worth serious consideration (see the benchmark section).

If you’re frowning upon the use of error—as you should be—you can replace error with dummy values here (Empty, Leaf), but (1) that won’t be possible with tree structures that must be non-empty (e.g., if Leaf contained a value), and (2) the result is dead code either way: a dummy value is harmless, but no more elegant than making the impossibility obvious with error.

The correctness of this solution is also not quite obvious. There are subtle ways to get this implementation wrong: should the recursive call be b2 <+ b1 <+ q or b1 <+ b2 <+ q? Should the pattern be p :> t1 :> t2 or p :> t2 :> t1? For another version of this challenge, try implementing the unfold for another tree type, such as finger trees or rose trees, without getting lost in the order of pushes and pops (for rose trees, this is Data.Tree.unfoldTreeM_BF in containers). The invariant is not complex, but there is room for mistakes. I believe that the compositional approach that will be presented later is more robust on that front, although that is admittedly a subjective quality for which it is difficult to make a strong case.

Some uses of unfolds

Traversals from unfolds

One sense in which unfoldM is a more difficult problem than traverse is that we can use unfoldM to implement traverse. We do have to make light of the technicality that there is a Monad constraint instead of Applicative, which makes unfoldM not suited to implement the Traversable class.

A depth-first unfold gives a depth-first traversal:

traverse_DF :: Monad m => (a -> m b) -> Tree a -> m (Tree b)
traverse_DF = unfoldM_DF . traverseRoot

-- auxiliary function
traverseRoot :: Applicative m => (a -> m b) -> Tree a -> m (TreeF b (Tree a))
traverseRoot _ Leaf = pure LeafF
traverseRoot f (Node a l r) = f a <&> \b -> NodeF b l r

A breadth-first unfold gives a breadth-first traversal:

traverse_BF_Q :: Monad m => (a -> m b) -> Tree a -> m (Tree b)
traverse_BF_Q = unfoldM_BF_Q . traverseRoot
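
Closing the loop with Okasaki’s problem, breadth-first numbering is then just this traversal specialized to a counter in State (numberBF is an illustrative name):

numberBF :: Tree a -> Tree Int
numberBF t = evalState (traverse_BF_Q (\_ -> state (\n -> (n, n + 1))) t) 0

-- numberBF exampleTree  ==  Node 0 (Node 1 (Node 3 Leaf Leaf) Leaf) (Node 2 Leaf Leaf)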

Unfolds in graphs

We can use a tree unfold to explore a graph. This usage distinguishes unfolds from folds and traversals, which only let you explore trees.

Given a type of vertices V, a directed graph is represented by a function V -> F V, where F is a functor which describes the arity of each node. The obvious choice for F is lists, but we will stick to TreeF here so we can just reuse this post’s unfoldM implementations. The TreeF functor restricts us to graphs where each node has zero or two outgoing edges; it is a weird restriction, but we will make do for the sake of example.

An ASCII drawing of a graph
        +-------+
        v       |
+->1--->2--->3  |
|  |    |    ^  |
|  v    v    |  |
|  4--->5--->6--+
|  |    |    ^
|  +----|----+
|       |
+-------+

The graph drawn above turns into the following function, where every vertex is mapped either to NodeF with the same vertex as the first argument followed by its two adjacent vertices, or to LeafF if it has no outgoing edges or does not belong to the graph.

graph :: Int -> TreeF Int Int
graph 1 = NodeF 1 2 4
graph 2 = NodeF 2 3 5
graph 3 = LeafF
graph 4 = NodeF 4 5 6
graph 5 = NodeF 5 1 6
graph 6 = NodeF 6 2 3
graph _ = LeafF

If we simply feed that function to unfold, we will get the infinite tree of all possible paths from a chosen starting vertex.

To obtain a finite tree, we want to keep track of vertices that we have already visited, using a stateful memory. The following function wraps graph, returning LeafF also if a vertex has already been visited.

visitGraph :: Int -> State (Set Int) (TreeF Int Int)
visitGraph vertex = do
  visited <- get
  if vertex `elem` visited then pure LeafF
  else do
    put (Set.insert vertex visited)
    pure (graph vertex)

Applying a breadth-first unfold (here, unfoldM_BF_Q) to that function produces a “breadth-first tree” of the graph, an encoding of the trajectory of a breadth-first walk through the graph. “Breadth-first trees” are a concept from graph theory with well-studied properties.

-- Visit `graph` in breadth-first order
bfGraph_Q :: Int -> Tree Int
bfGraph_Q = (`evalState` Set.empty) . unfoldM_BF_Q visitGraph
testGraphQ :: TestTree
testGraphQ = testCase "Q-graph" $
  bfGraph_Q 1 @?=
    Node 1
      (Node 2 Leaf
              (Node 5 Leaf Leaf))
      (Node 4 Leaf (Node 6 Leaf Leaf))

Compile and run

This post is a compilable Literate Haskell file. You can run all of the tests and benchmarks in here. The source repository provides the necessary configuration to build it with cabal.

$ cabal build breadth-first-unfolds

Test cases can then be selected with the -p option and a pattern (see the tasty documentation for details). Run all tests and benchmarks by passing no option.

$ cabal exec breadth-first-unfolds -- -p "/Q-graph/||/S-graph/"
All
  Q-graph: OK
  S-graph: OK

“Global” level-based unfold

The other solution from Okasaki’s paper can also be adapted into a monadic unfold.

The starting point is to unfold a list of seeds [s] instead of a single seed: we can traverse the list with the expansion function s -> m (TreeF a s) to obtain another list of seeds, the next level of the breadth-first unfold, and keep going.

Iterating this process naively yields a variant of monadic unfold without a result. This no-result variant can be generalized from TreeF to any foldable structure:

-- Inner loop: multi-seed unfold
unfoldsM_BF_G_ :: (Monad m, Foldable f) => (s -> m (f s)) -> [s] -> m ()
unfoldsM_BF_G_ f [] = pure ()
-- Read from right to left: traverse, flatten, recurse.
unfoldsM_BF_G_ f xs = unfoldsM_BF_G_ f . concatMap toList =<< traverse f xs

-- Top-level function: single-seed unfold
unfoldM_BF_G_ :: (Monad m, Foldable f) => (s -> m (f s)) -> s -> m ()
unfoldM_BF_G_ f = unfoldsM_BF_G_ f . (: [])
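
Even without a result, this variant can already do useful work through its effects. For instance, walking the example graph from earlier just to collect the set of visited vertices (an illustrative check):

-- execState (unfoldM_BF_G_ visitGraph 1) Set.empty
--   should evaluate to Set.fromList [1, 2, 3, 4, 5, 6]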

Modifying this solution to create the output tree requires a little more thought. We must keep hold of the intermediate list ts :: [TreeF a s] to reconstruct the trees after the recursive call returns.

unfoldsM_BF_G :: Monad m => (s -> m (TreeF a s)) -> [s] -> m [Tree a]
unfoldsM_BF_G f [] = pure []
-- traverse, flatten, recurse, reconstruct
unfoldsM_BF_G f xs = traverse f xs >>= \ts ->
  reconstruct ts <$> unfoldsM_BF_G f (concatMap toList ts)

The reconstruction function picks a root in the first list and completes it with subtrees from the second list:

reconstruct :: [TreeF a s] -> [Tree a] -> [Tree a]
reconstruct (LeafF : ts) us = Leaf : reconstruct ts us
reconstruct (NodeF a _ _ : ts) (l : r : us) = Node a l r : reconstruct ts us
reconstruct _ _ = error "impossible"

You could modify the final branch to produce [], but error makes it explicit that this branch should never be reached by the unfold where it is used.
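
A small worked example with Int seeds (take is used deliberately: the error branch at the very end of the result list is a thunk that is never forced):

-- take 2 (reconstruct [NodeF 1 0 0, LeafF] [Leaf, Node 2 Leaf Leaf])
--   should evaluate to [Node 1 Leaf (Node 2 Leaf Leaf), Leaf]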

The top-level unfold function wraps the seed in a singleton input list and extracts the root from a singleton output list.

unfoldM_BF_G :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_BF_G f = fmap head . unfoldsM_BF_G f . (: [])
Unit test testGraphG
bfGraph_G :: Int -> Tree Int
bfGraph_G = (`evalState` Set.empty) . unfoldM_BF_G visitGraph
testGraphG :: TestTree
testGraphG = testCase "G-graph" $
  bfGraph_G 1 @?=
    Node 1
      (Node 2 Leaf
              (Node 5 Leaf Leaf))
      (Node 4 Leaf (Node 6 Leaf Leaf))

This solution is less brittle than the queue-based solution because we always traverse lists left-to-right. To avoid the uses of error in reconstruct, you can probably create a specialized data structure in place of [TreeF a s], but that is finicky in its own way.

In search of compositionality

Both of the solutions above (the queue-based and the “monolithic” level-based unfolds) stem from a global view of breadth-first walks: we are iterating on a list or a queue which holds all the seeds from one or two levels at a time. That structure represents a “front line” between visited and unvisited vertices, and every iteration advances the front line a little: with a queue we advance it one vertex at a time, with a list we advance the whole front line in an inner loop—one call to traverse—before recursing.

The opposite local view of breadth-first order is exemplified by the earlier levels function: it only produces a list of lists of the vertices reachable from the current root. It does so recursively, by composing together the vertices reachable from its children. Our goal here is to find a similarly local, compositional implementation of breadth-first unfolds.

Rather than defining unfoldM directly, which sequences the computations on all levels into a single computation, we will introduce an intermediate function weave that keeps levels separate—just as toListBF is defined using levels. The result of weave will be in an as yet unknown applicative functor F m depending on m. And because levels are kept separate, weave only needs a constraint Applicative m to compose computations on the same level. The goal is to implement this signature, where the result type F is also an unknown:

weave :: Applicative m => (s -> m (TreeF a s)) -> s -> F m (Tree a)

The name weave comes from visualizing a breadth-first walk as a path zigzagging across a tree like this:

[Figure: a breadth-first path zigzagging across a tree]

which is reminiscent of weaving as in the making of textile:

[Illustration from the Wikipedia article on Weaving: warp and weft.]

With only what we know so far, a bit of type-directed programming leads to the following incomplete definition. We have constructed something of type m (F m (Tree a)), while we expect F m (Tree a):

weave :: Applicative m => (s -> m (TreeF a s)) -> s -> F m (Tree a)
weave f s = _ (step <$> f s) where
  step :: TreeF a s -> F m (Tree a)
  step LeafF = pure Leaf
  step (NodeF a l r) = liftA2 (Node a) (weave f l) (weave f r)

To fill the hole _, we postulate the following primitive, weft, as part of the unknown definition of F:

weft :: Applicative m => m (F m a) -> F m a

Intuitively, F m represents “multi-level computations”. The weft function constructs a multi-level (F m)-computation from one level of m-computation which returns the subsequent levels as an (F m)-computation.

We fill the hole with weft, completing the definition of weave:

weave :: forall m s a. Applicative m => (s -> m (TreeF a s)) -> s -> F m (Tree a)
weave f s = weft (weaveF <$> f s) where
  weaveF :: TreeF a s -> F m (Tree a)
  weaveF LeafF = pure Leaf
  weaveF (NodeF a l r) = liftA2 (Node a) (weave f l) (weave f r)

The function weave defines a multi-level computation which represents a breadth-first walk from a seed s:

  • the first level of the walk is f s, expanding the initial seed;
  • the auxiliary function weaveF constructs the remaining levels from the initial seed’s expansion:
    • if the seed expands to LeafF, there are no more seeds, and we terminate with an empty computation (pure);
    • if the seed expands to NodeF, we obtain two sub-seeds l and r, they generate their own weaves recursively (weave f l and weave f r), and we compose them (liftA2).

One way to think about weft is as a generalization of the following primitives: we can “embed” m-computations into F m, and we can “delay” multi-level (F m)-computations, shifting the m-computation on each level to the next level.

embed :: Applicative m => m a -> F m a
embed u = weft (pure <$> u)

delay :: Applicative m => F m a -> F m a
delay u = weft (pure u)

The key law relating these two operations is that embedded computations and delayed computations commute with each other:

embed u *> delay v = delay v <* embed u

The embed and delay operations are provided by the Phases applicative functor that I mentioned earlier, which enables breadth-first traversals, but not breadth-first unfolds. Thus, weft is a strictly more expressive primitive than embed and delay.

Eventually, we will run a multi-level computation as a single m-computation so that we can use weave to define unfoldM. The runner function will be called mesh:

mesh :: Monad m => F m a -> m a

It is characterized by this law which says that mesh executes the first level of the computation u :: m (F m a), then executes the remaining levels recursively:

mesh (weft u) = u >>= mesh

Putting everything together, weave and mesh combine into a breadth-first unfold:

unfoldM_BF :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_BF f s = mesh (weave f s)

It remains to find an applicative functor F equipped with weft and mesh.

The weave applicative

A basic approach to designing a type is to turn some of the operations it should support into constructors. The weave applicative WeaveS has constructors for pure and weft:

data WeaveS m a
  = EndS a
  | WeftS (m (WeaveS m a))

(The suffix “S” stands for Spoilers. Read on!)

We instantiate the unknown functor F with WeaveS.

type F = WeaveS

Astute readers will have recognized WeaveS as the free monad. Just as Phases has the same type definition as the free applicative functor but a different Applicative instance, we will give WeaveS an Applicative instance that does not coincide with the Applicative and Monad instances of the free monad.

Starting with the easy functions, weft is WeftS, and the equation for mesh above is basically its definition. We just need to add an equation for EndS.

weft :: m (WeaveS m a) -> WeaveS m a
weft = WeftS

mesh :: Monad m => WeaveS m a -> m a
mesh (EndS a) = pure a
mesh (WeftS u) = u >>= mesh

Recall that WeaveS represents multi-level computations. Computations are composed level-wise with the following liftS2. The interesting case is the one where both arguments are WeftS: we compose the first level with liftA2, and the subsequent ones with liftS2 recursively.

liftS2 :: Applicative m => (a -> b -> c) -> WeaveS m a -> WeaveS m b -> WeaveS m c
liftS2 f (EndS a) wb = f a <$> wb
liftS2 f wa (EndS b) = flip f b <$> wa
liftS2 f (WeftS wa) (WeftS wb) = WeftS ((liftA2 . liftS2) f wa wb)

liftS2 will be the liftA2 in WeaveS’s Applicative instance. The Functor and Applicative instances show that WeaveS is an applicative transformer: for every applicative functor m, WeaveS m is also an applicative functor.

instance Functor m => Functor (WeaveS m) where
  fmap f (EndS a) = EndS (f a)
  fmap f (WeftS wa) = WeftS ((fmap . fmap) f wa)

instance Applicative m => Applicative (WeaveS m) where
  pure = EndS
  liftA2 = liftS2
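
To see the level-wise composition in action, here is a small sketch (demoWeave and its log strings are invented for illustration; it assumes imports of runWriter and tell from Control.Monad.Trans.Writer, and ($>) from Data.Functor). The first levels of both computations run together, then the remaining level of the longer one:

demoWeave :: (Int, [String])
demoWeave = runWriter (mesh (liftS2 (+) wa wb))
  where
    -- a two-level computation returning 1
    wa = WeftS (tell ["a: level 1"] $> WeftS (tell ["a: level 2"] $> EndS 1))
    -- a one-level computation returning 2
    wb = WeftS (tell ["b: level 1"] $> EndS 2)

-- demoWeave should evaluate to (3, ["a: level 1", "b: level 1", "a: level 2"]):
-- the two "level 1" effects are zipped into the same level.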

That completes the definition of unfoldM_BF: a level-based, compositional breadth-first unfold.

As a unit test, we copy the code for visiting a graph from earlier:

bfGraphS :: Int -> Tree Int
bfGraphS = (`evalState` Set.empty) . unfoldM_BF visitGraph
testGraphS :: TestTree
testGraphS = testCase "S-graph" $
  bfGraphS 1 @?=
    Node 1
      (Node 2 Leaf
              (Node 5 Leaf Leaf))
      (Node 4 Leaf (Node 6 Leaf Leaf))

Code golf

There is a variant of weave that I prefer:

weaveS :: Applicative m => (s -> m (TreeF a s)) -> s -> m (WeaveS m (Tree a))
weaveS f s = f s <&> \case
  LeafF -> pure Leaf
  NodeF a l r -> liftA2 (Node a) (weft (weaveS f l)) (weft (weaveS f r))

The outer weft constructor was moved into the recursive calls. The result type has an extra m, which makes it more apparent that we always start with a call to f. It’s the same vibe as replacing the type [a] with NonEmpty a when we know that a list will always have at least one element; weaveS always produces at least one level of computation. We also replace (<$>) with its flipped version (<&>) for aesthetic reasons: we can apply it to a lambda without parentheses, and that change makes the logic flow naturally from left to right: we first expand the seed s using f, and continue depending on whether the expansion produced LeafF or NodeF.

To define unfoldM, instead of applying mesh directly, we chain it with (>>=).

unfoldM_BF_S :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_BF_S f s = weaveS f s >>= mesh

A wrinkle in time

That solution is Obviously Correct™, but it has a terrible flaw: it does not run in linear time!

We can demonstrate this by generating a “thin” tree whose height is equal to its size. The height h is the seed of the unfolding, and we generate a NodeF as long as it is non-zero, asking for a decreased height h - 1 on the right, and a zero height on the left.

thinTreeS :: Int -> Tree ()
thinTreeS = runIdentity . unfoldM_BF_S f
  where
    f 0 = pure LeafF
    f h = pure (NodeF () 0 (h - 1))

Compare the running times of evaluating thinTreeS at height 100 (the baseline) and at height 1000 (10x the baseline).

benchS :: TestTree
benchS = bgroup "S-thin"
  [ bench  "1x" (nf thinTreeS 100)
  , bench "10x" (nf thinTreeS 1000) & bcompare "S-thin.1x"
  ]

Benchmark output (relative):

height      time
baseline    1x
10x         105x
Raw output
$ cabal exec breadth-first-unfolds -- -p "S-thin"
All
  S-thin
    1x:  OK
      27.6 μs ± 2.6 μs, 267 KB allocated, 317 B  copied, 6.0 MB peak memory
    10x: OK
      2.90 ms ± 181 μs,  23 MB allocated, 178 KB copied, 7.0 MB peak memory, 105.35x

Multiplying the height by 10x makes the function run 100x slower. Dramatically quadratic.

Complexity analysis

We can compare this implementation with levels from earlier, which is linear-time. In particular, comparing zipLevels with liftS2—which play similar roles—there is a crucial difference when one of the arguments is empty ([] or EndS): zipLevels simply returns the other argument, whereas liftS2 calls (<$>), continuing the recursion down the other argument. So zipLevels stops working after reaching the end of either argument, whereas liftS2 walks to the end of both arguments. There is at least one call to liftS2 on every level which will walk to the bottom of the tree, so we get a quadratic lower bound Ω(height²).

Out of sight, out of mind

The problematic combinators are fmap and liftS2, which weaveS uses to construct the unfolded tree. If we don’t care about that tree—wanting only the effect of a monadic unfold—then we can get rid of the complexity associated with those combinators.

With no result to return, we remove the a type parameter from the definition of WeaveS, yielding the oblivious (“O”) variant:

data WeaveO m
  = EndO
  | WeftO (m (WeaveO m))

We rewrite mesh into meshO, reducing a WeaveO m computation into m () instead of m a.

meshO :: Monad m => WeaveO m -> m ()
meshO EndO = pure ()
meshO (WeftO u) = u >>= meshO

The Applicative instance for WeaveS becomes a Monoid instance for WeaveO. liftA2 is replaced with (<>), zipping two computations level-wise.

instance Applicative m => Semigroup (WeaveO m) where
  EndO <> v = v
  u <> EndO = u
  WeftO u <> WeftO v = WeftO (liftA2 (<>) u v)

instance Applicative m => Monoid (WeaveO m) where
  mempty = EndO
  mappend = (<>)

To implement a breadth-first walk, we modify weaveS above by replacing liftA2 (Node a) with (<>). Note that the type parameter a is no longer in the result. It was only used in the tree that we decided to forget.

weaveO :: Applicative m => (s -> m (TreeF a s)) -> s -> m (WeaveO m)
weaveO f s = f s <&> \case
  LeafF -> mempty
  NodeF _ l r -> WeftO (weaveO f l) <> WeftO (weaveO f r)

Running weaveO with meshO yields an oblivious monadic unfold: it produces m () instead of m (Tree a). (You may remember seeing another implementation of that same signature earlier, unfoldM_BF_G_.)

unfoldM_BF_O_ :: Monad m => (s -> m (TreeF a s)) -> s -> m ()
unfoldM_BF_O_ f s = weaveO f s >>= meshO

Previously, we benchmarked the function thinTreeS, which outputs a tree, by forcing that tree. With an oblivious unfold, there is no tree to force. Instead, we will count the number of generated NodeF constructors:

thinTreeO :: Int -> Int
thinTreeO = (`execState` 0) . unfoldM_BF_O_ (state . f)
  where
    f 0 counter = (LeafF, counter)
    f h counter = (NodeF () 0 (h - 1), counter + 1)  -- increment the counter for every NodeF

We adapt the benchmark from before to measure the complexity of unfolding thin trees. We have to increase the baseline height from 100 to 500 because this benchmark runs so much faster than the previous ones.

benchO :: TestTree
benchO = bgroup "O-thin"
  [ bench  "1x" (nf thinTreeO 500)
  , bench "10x" (nf thinTreeO 5000) & bcompare "O-thin.1x"
  ]

Benchmark output (relative):

height      time
baseline    1x
10x         9.8x
Raw output
$ cabal exec breadth-first-unfolds -- -p O-thin
All
  O-thin
    1x:  OK
      148  μs ± 8.3 μs, 543 KB allocated, 773 B  copied, 6.0 MB peak memory
    10x: OK
      1.45 ms ± 113 μs, 5.4 MB allocated,  82 KB copied, 7.0 MB peak memory, 9.78x

The growth is linear, as desired: the “10x” bench is 10x slower than the baseline “1x” bench.

Laziness for the win

The oblivious unfold avoided quadratic explosion by simplifying the problem. Now let’s solve the original problem again, so we can’t just get rid of fmap and liftA2. As mentioned previously, the root cause was that (1) liftA2 calls fmap when one of the constructors is EndS, and (2) fmap traverses the other argument. The next solution will be to make fmap take constant time, by storing the “mapped function” in the constructor. Behold the “L” variant of WeaveS, which is a GADT:

data WeaveL m a where
  EndL :: a -> WeaveL m a
  WeftL :: m (WeaveL m b) -> (b -> a) -> WeaveL m a

For comparison, here is the previous “S” variant with GADT syntax:

data WeaveS m a where
  EndS :: a -> WeaveS m a
  WeftS :: m (WeaveS m a) -> WeaveS m a

This trick is also known as the “co-Yoneda construction”.
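
For reference, here is the general construction in isolation, a sketch modeled on Data.Functor.Coyoneda from the kan-extensions package; WeftL is essentially this construction applied to m (WeaveL m b):

-- The co-Yoneda construction: a "free Functor" over f.
data Coyoneda f a where
  Coyoneda :: (b -> a) -> f b -> Coyoneda f a

instance Functor (Coyoneda f) where
  fmap g (Coyoneda h x) = Coyoneda (g . h) x  -- compose functions, constant time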

The definition of fmap is no longer recursive. It doesn’t even need m to be a functor anymore!

instance Functor (WeaveL m) where
  fmap f (EndL a) = EndL (f a)
  fmap f (WeftL wa g) = WeftL wa (f . g)

The Applicative instance is… a good exercise for the reader. The details are not immediately important—we only care about improving fmap for now—and we will come back to the Applicative instance soon.

The runner function meshL is a simple bit of type Tetris.

meshL :: Monad m => WeaveL m a -> m a
meshL (EndL a) = pure a
meshL (WeftL wa f) = f <$> (wa >>= meshL)

By partially applying WeftL to id as its second argument, we obtain a counterpart to the unary WeftS constructor:

weftL :: m (WeaveL m a) -> WeaveL m a
weftL wa = WeftL wa id

With those primitives redefined, the “weave” and “unfold” are identical. Below, we only renamed the “S” suffixes to “L”:

weaveL :: Applicative m => (s -> m (TreeF a s)) -> s -> m (WeaveL m (Tree a))
weaveL f s = f s <&> \case
  LeafF -> pure Leaf
  NodeF a s1 s2 -> liftA2 (Node a) (weftL (weaveL f s1)) (weftL (weaveL f s2))

unfoldM_BF_L :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_BF_L f s = weaveL f s >>= meshL

The benchmarks show that 10x the height takes 10x the time. Linear growth again.

Benchmark code and output: thinTreeL and benchL

Copy of the benchS benchmark.

thinTreeL :: Int -> Tree ()
thinTreeL = runIdentity . unfoldM_BF_L f
  where
    f 0 = pure LeafF
    f h = pure (NodeF () 0 (h - 1))

benchL :: TestTree
benchL = bgroup "L-thin"
  [ bench  "1x" (nf thinTreeL  100)
  , bench "10x" (nf thinTreeL 1000) & bcompare "L-thin.1x"
  ]

Benchmark output (relative):

height      time
baseline    1x
10x         9.93x

Raw output:

$ cabal exec breadth-first-unfolds -- -p "L-thin"     
All
  L-thin
    1x:  OK
      14.1 μs ± 782 ns,  59 KB allocated,   5 B  copied, 6.0 MB peak memory
    10x: OK
      140  μs ±  13 μs, 586 KB allocated,  51 B  copied, 6.0 MB peak memory, 9.93x

Lazy in more ways than one

As hinted by the “L” and “S” suffixes, WeaveL is a “lazy” variant of WeaveS: fmap for WeaveL “postpones” work by accumulating functions in the WeftL constructor. That work is “forced” by meshL, which is where the fmap ((<$>)) of the underlying monad m is called, performing the work accumulated by possibly many calls to WeaveL’s fmap.

One subtlety is that there are multiple “lazinesses” at play. The main benefit of using WeaveL is really to delay computation, that is a kind of laziness, but WeaveL doesn’t need to be implemented in a lazy language. We can rewrite all of the code we’ve seen so far in a strict language with minor changes, and we will still observe the quadratic vs linear behavior of WeaveS vs WeaveL on thin trees. The “manufactured laziness” of WeaveL is a concept independent of the “ambient laziness” in Haskell.

Nevertheless, we can still find an interesting role for that “ambient laziness” in this story. Indeed, the function weaveL also happens to be lazier than weaveS in the usual sense.

A concrete test case is worth a thousand words. Consider the following tree generator which keeps unfolding left subtrees while making every right subtree undefined:

partialTreeF :: Bool -> TreeF () Bool
partialTreeF True = NodeF () True False
partialTreeF False = undefined

If we used the pure unfold, we would get the same tree as this recursive definition:

partialTree :: Tree ()
partialTree = Node () partialTree undefined

What happens if we use one of the monadic unfolds? For example unfoldM_BF_S:

partialTreeS :: Tree ()
partialTreeS = runIdentity (unfoldM_BF_S (Identity . partialTreeF) True)

Try to force the first Node constructor.

whnfTreeS :: TestTree
whnfTreeS = expectFail $ testCase "S-whnf" $ do
  case partialTreeS of
    Node _ _ _ -> pure ()  -- Succeed
    Leaf -> error "unreachable" -- definitely not a Leaf

As it turns out, this test using the “S” variant fails. (That’s why the test is marked with expectFail.) Forcing partialTreeS evaluates the undefined in partialTreeF. Therefore partialTreeS is not equivalent to partialTree.

$ cabal exec breadth-first-unfolds -- -p "S-whnf"
All
  S-whnf: FAIL (expected)
    Exception: Prelude.undefined
    CallStack ...

In contrast, the “L” variant makes that same test succeed.

partialTreeL :: Tree ()
partialTreeL = runIdentity (unfoldM_BF_L (Identity . partialTreeF) True)

whnfTreeL :: TestTree
whnfTreeL = testCase "L-whnf" $ do
  case partialTreeL of
    Node _ _ _ -> pure ()  -- Succeed
    Leaf -> error "unreachable"

Test output:

$ cabal exec breadth-first-unfolds -- -p "L-whnf"
All
  L-whnf: OK

This difference can only be seen with “lazy monads”, where (>>=) is lazy in its first argument. (If this definition sounds not quite right, that’s probably because of seq. It makes a precise definition of “lazy monad” more complicated.) Examples of lazy monads from the transformers library are Identity, Reader, lazy State, lazy Writer, and Accum.
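
A contrived check makes the point (lazyBind is an illustrative name): for Identity, (>>=) simply applies the continuation without forcing its first argument, so the undefined below is never evaluated.

-- Evaluates to True with the lazy Identity monad;
-- the analogous expression in a strict monad like IO would throw.
lazyBind :: Bool
lazyBind = runIdentity (undefined >>= \_ -> Identity True)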

The secret sauce is the definition of liftA2 for WeaveL:

instance Applicative m => Applicative (WeaveL m) where
  pure = EndL
  liftA2 f (EndL a) wb = f a <$> wb
  liftA2 f wa (EndL b) = flip f b <$> wa
  liftA2 f (WeftL wa g) (WeftL wb h)
    = WeftL ((liftA2 . liftA2) (,) wa wb) (\ ~(a, b) -> f (g a) (h b))

In the third clause of liftA2, we put the function f in a lambda with a lazy pattern (~(a, b)) directly under the topmost constructor WeftL. Thus, we can access the result of f from the second field of WeftL without looking at the first field. In liftS2 earlier, f was passed as an argument to (liftA2 . liftS2), which forces us to run the computation before we can get hold of the result of f.

Maximizing laziness

The “L” variant of unfoldM is lazier than the “S” variant, but there is still a gap between partialTreeL and the pure partialTree: if we force not only the root, but also the left subtree of partialTreeL, then we run into undefined again.

forceLeftTreeL :: TestTree
forceLeftTreeL = expectFail $ testCase "L-left" $ do
  case partialTreeL of
    Node _ (Node _ _ _) _ -> pure ()  -- Succeed
    _ -> error "unreachable"

Test output:

$ cabal exec breadth-first-unfolds -- -p "L-left" 
All
  L-left: FAIL (expected)
    Exception: Prelude.undefined

Although the unfold using WeaveL is lazier than using WeaveS, it is not yet as lazy as it could be. The reason is that, strictly speaking, WeaveL’s liftA2 is a strict function. The expansion function partialTreeF produces a level with an undefined sub-computation, which crashes the whole level. Each level in a computation will be either completely defined or undefined.

To recap, we’ve been looking at the following trees:

partialTreeS = undefined
partialTreeL = Node () undefined undefined
partialTree  = Node () partialTree undefined

It is natural to ask: can we define a breadth-first unfold that, when applied to partialTreeF, will yield the same tree as partialTree?

More generally, the new problem is to define a breadth-first unfoldM whose specialization with the Identity functor is equivalent to the pure unfold even on partially-defined values. That is, it satisfies the following equation:

unfold f = runIdentity . unfoldM (Identity . f)

Laziness without end

The strictness of liftA2 is caused by WeaveL having two constructors. Let’s get rid of EndL.

data WeaveE m a where
  WeftE :: m (WeaveE m b) -> (b -> a) -> WeaveE m a

Having only one constructor lets us use lazy patterns:

instance Functor (WeaveE m) where
  fmap f ~(WeftE wa g) = WeftE wa (f . g)

Wait a second. I spoke too fast: GHC gives us an error:

error: [GHC-87005]
    • An existential or GADT data constructor cannot be used
        inside a lazy (~) pattern
    • In the pattern: WeftE wa g
      In the pattern: ~(WeftE wa g)
      In an equation for ‘fmap’: fmap f ~(WeftE wa g) = WeftE wa (f . g)
    |
641 | >   fmap f ~(WeftE wa g) = WeftE wa (f . g)
    |              ^^^^^^^^^^

The feature we need is “first-class existentials”, for which there is an open GHC proposal.

Not letting that stop us, there is a simple version of first-class existentials available in the package some, as the module Data.Some.Newtype (internally using unsafeCoerce). That will be sufficient for our purposes. All we need is an abstract type Some and a pattern synonym:

-- imported from Data.Some.Newtype
data Some f
pattern Some :: f a -> Some f

And we’re back on track. Here comes the actual “E” (endless) variant:

newtype WeaveE m a = MkWeaveE (Some (WeavingE m a))

data WeavingE m a b where
  WeftE :: m (WeaveE m b) -> (b -> a) -> WeavingE m a b

I spare you the details.

Functor, Applicative, weftE, meshE
instance Functor (WeaveE m) where
  fmap f (MkWeaveE (Some ~(WeftE u g))) = MkWeaveE (Some (WeftE u (f . g)))

instance Applicative m => Applicative (WeaveE m) where
  pure x = MkWeaveE (Some (WeftE (pure (pure ())) (\_ -> x)))
  liftA2 f (MkWeaveE (Some ~(WeftE u g))) (MkWeaveE (Some ~(WeftE v h)))
    = MkWeaveE (Some (WeftE ((liftA2 . liftA2) (,) u v) (\ ~(x, y) -> f (g x) (h y))))

weftE :: m (WeaveE m a) -> WeaveE m a
weftE u = MkWeaveE (Some (WeftE u id))

meshE :: Monad m => WeaveE m a -> m a
meshE (MkWeaveE (Some (WeftE u f))) = f <$> (u >>= meshE)
Breadth-first unfold, “E” variant: weaveE and unfoldM_BF_E
weaveE :: Applicative m => (s -> m (TreeF a s)) -> s -> m (WeaveE m (Tree a))
weaveE f s = f s <&> \case
  LeafF -> pure Leaf
  NodeF a s1 s2 -> liftA2 (Node a) (weftE (weaveE f s1)) (weftE (weaveE f s2))

unfoldM_BF_E :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_BF_E f s = weaveE f s >>= meshE

The endless WeaveE enables an even lazier implementation of unfoldM. When specialized to the identity monad, it lets us force the resulting tree in any order. The forceLeftTreeE test passes (unlike forceLeftTreeL).

partialTreeE :: Tree ()
partialTreeE = runIdentity (unfoldM_BF_E (Identity . partialTreeF) True)

forceLeftTreeE :: TestTree
forceLeftTreeE = testCase "E-left" $ do
  case partialTreeE of
    Node _ (Node _ _ _) _ -> pure ()  -- Succeed
    _ -> error "unreachable"

Test output:

$ cabal exec breadth-first-unfolds -- -p "E-left"
All
  E-left: OK

One can also check that forcing the left spine of partialTreeE arbitrarily deep throws no errors.

We made it lazy, but at what cost? First, this “Endless” variant only works for lazy monads. With a strict monad, the runner meshE will loop forever. It is possible to run things more incrementally by pattern-matching on WeaveE, but you’re better off using the oblivious WeaveO anyway.

Second, when you aren’t running into an unproductive loop, the “Endless” variant of unfoldM has quadratic time complexity Ω(height²). The reason is essentially the same as for the “Strict” variant: liftA2 keeps looping even if one argument is a pure weave—before, that was in order to traverse the other, non-pure argument; now, there isn’t even a way to tell when the computation has ended. Thus, every leaf may create work proportional to the height of the tree.

Running the same benchmark as before, we measure even more baffling timings:

height      time
baseline    1x
10x         738x
Benchmark: thinTreeE and benchE
thinTreeE :: Int -> Tree ()
thinTreeE = runIdentity . unfoldM_BF_E f
  where
    f 0 = pure LeafF
    f h = pure (NodeF () 0 (h - 1))

benchE :: TestTree
benchE = {- localOption mutatorCpuTime $ -} bgroup "E-thin"
  [ bench "1x" (nf thinTreeE 100)
  , bench "10x" (nf thinTreeE 1000) & bcompare "E-thin.1x"
  ]

Raw output:

$ cabal exec breadth-first-unfolds -- -p "E-thin."
All
  E-thin
    1x:  OK
      243  μs ±  22 μs, 1.2 MB allocated,  13 KB copied, 6.0 MB peak memory
    10x: OK
      179  ms ±  17 ms, 119 MB allocated,  29 MB copied,  21 MB peak memory, 737.76x

Using the previous setup comparing a baseline and a 10x run, we see a more than 700x slowdown, so much worse than the 100x predicted by a quadratic model. Interestingly, the raw output shows that the total cumulative allocations did grow by a 100x factor.1

But it gets weirder with more data points: it does not follow a clear power law. If Time(n) grew as nᶜ for some fixed exponent c, then the ratio Time(Mn)/Time(n) would be Mᶜ, a constant that does not depend on n.

In the following benchmark, we keep doubling the height (M = 2) for every test case, and we measure the time relative to the preceding case each time. A quadratic model predicts a 4x slowdown at every step. Instead, we observe wildly varying factors.

Benchmark output (each time factor is relative to the preceding line, for example, the “4x” benchmark is 9.5x slower than the “2x” benchmark):

height   time
1x
2x       10.9x
4x       9.5x
8x       5.4x
16x      1.4x
Code and raw output
benchE' :: TestTree
benchE' = {- localOption mutatorCpuTime $ -} bgroup "E-thin-more"
  [ bench "1x" (nf thinTreeE 100)
  , bench "2x" (nf thinTreeE 200) & bcompare "E-thin-more.1x"
  , bench "4x" (nf thinTreeE 400) & bcompare "E-thin-more.2x"
  , bench "8x" (nf thinTreeE 800) & bcompare "E-thin-more.4x"
  , bench "16x" (nf thinTreeE 1000) & bcompare "E-thin-more.8x"
  ]
$ cabal exec breadth-first-unfolds -- -p "E-thin-more"
All
  E-thin-more
    1x:  OK
      222  μs ± 9.3 μs, 1.2 MB allocated,  13 KB copied, 6.0 MB peak memory
    2x:  OK
      2.43 ms ±  85 μs, 4.8 MB allocated, 236 KB copied, 7.0 MB peak memory, 10.94x
    4x:  OK
      23.1 ms ± 1.2 ms,  19 MB allocated, 2.7 MB copied,  10 MB peak memory, 9.53x
    8x:  OK
      126  ms ± 7.8 ms,  76 MB allocated,  18 MB copied,  24 MB peak memory, 5.44x
    16x: OK
      181  ms ± 7.0 ms, 119 MB allocated,  30 MB copied,  24 MB peak memory, 1.44x

I believe this benchmark is triggering some pathological behavior in the garbage collector. I modified tasty-bench with an option to measure CPU time without GC (mutator time). At the time of writing, tasty-bench is still waiting for a new release. We can point Cabal to an unreleased commit of tasty-bench by adding the following lines to cabal.project.local.

source-repository-package
    type: git
    location: https://github.com/Bodigrim/tasty-bench.git
    tag: 81ff742a3db1d514461377729e00a74e5a9ac1b8

Then, uncomment the setting “localOption mutatorCpuTime $” in benchE and benchE' above and uncomment the import of mutatorCpuTime at the top.

Benchmark output (excluding GC time, relative):

height      time
baseline    1x
10x         95x
Raw output
$ cabal exec breadth-first-unfolds -- -p "E-thin."
All
  E-thin
    1x:  OK
      216  μs ±  18 μs, 1.2 MB allocated,  13 KB copied, 6.0 MB peak memory
    10x: OK
      20.5 ms ± 1.9 ms, 119 MB allocated,  29 MB copied,  21 MB peak memory, 94.91x

For the “2x” benchmarks, we are closer to the expected 4x slowdown, but there is still a noticeable gap. I’m going to chalk the rest up to inherent measurement errors (the cost of tasty-bench’s simplicity), exacerbated by the pathological GC behavior; a possible explanation is that the pattern of memory usage becomes so bad that it affects non-GC time.

Benchmark output (excluding GC time, each measurement is relative to the preceding line):

height   time
1x
2x       3.2x
4x       4.2x
8x       4.5x
16x      1.7x
Raw output
$ cabal exec breadth-first-unfolds -- -p "E-thin-more"
All
  E-thin-more
    1x:  OK
      186  μs ±  16 μs, 1.2 MB allocated,  13 KB copied,  21 MB peak memory
    2x:  OK
      597  μs ±  28 μs, 4.8 MB allocated, 236 KB copied,  21 MB peak memory, 3.20x
    4x:  OK
      2.48 ms ± 148 μs,  19 MB allocated, 2.9 MB copied,  21 MB peak memory, 4.15x
    8x:  OK
      11.2 ms ± 986 μs,  76 MB allocated,  18 MB copied,  24 MB peak memory, 4.50x
    16x: OK
      18.4 ms ± 1.7 ms, 119 MB allocated,  29 MB copied,  24 MB peak memory, 1.65x

It doesn’t seem possible for a breadth-first unfold to be both maximally lazy and of linear time complexity, but I don’t know how to formally prove that impossibility either.

Microbenchmarks: Queues vs Global Levels vs Weaves

So far we’ve focused on asymptotics (linear vs quadratic). Some readers will inevitably wonder about real speed. Among the linear-time algorithms—queues (“Q”), global levels (“G”), and weaves (lazy “L” or oblivious “O”)—which one is faster?

tl;dr: Queues are (much) faster in these microbenchmarks (up to 25x!), but keep in mind that these are all quite naive implementations.

There are two categories to measure separately: unfolds which produce trees, and oblivious unfolds—which don’t produce trees. These microbenchmarks construct full trees up to a chosen number of nodes. When there is an output tree, we force it (using nf), otherwise we force a counter of the number of nodes. We run on different sufficiently large sizes (500 and 5000) to check the stability of the measured factors, ensuring that we are only comparing the time components that dominate at scale.

The tables list times relative to the queue benchmark for each tree size.

Tree-producing unfolds

algorithm        size   time
Queue            500    1x
Global Levels    500    1.4x
Lazy Weave       500    3.1x
Queue            5000   1x
Global Levels    5000   1.2x
Lazy Weave       5000   3.3x
Code and raw output
fullTreeF :: Int -> Int -> TreeF Int Int
fullTreeF size n | n >= size = LeafF
fullTreeF size n = NodeF n (2 * n) (2 * n + 1)

fullTree_Q :: Int -> Tree Int
fullTree_Q size = runIdentity (unfoldM_BF_Q (Identity . fullTreeF size) 1)

fullTree_G :: Int -> Tree Int
fullTree_G size = runIdentity (unfoldM_BF_G (Identity . fullTreeF size) 1)

fullTree_L :: Int -> Tree Int
fullTree_L size = runIdentity (unfoldM_BF_L (Identity . fullTreeF size) 1)

fullTree :: TestTree
fullTree = bgroup "fullTree"
  [ bench "Q-1x" (nf fullTree_Q 500)
  , bench "G-1x" (nf fullTree_G 500) & bcompare "fullTree.Q-1x"
  , bench "L-1x" (nf fullTree_L 500) & bcompare "fullTree.Q-1x"
  , bench "Q-10x" (nf fullTree_Q 5000)
  , bench "G-10x" (nf fullTree_G 5000) & bcompare "fullTree.Q-10x"
  , bench "L-10x" (nf fullTree_L 5000) & bcompare "fullTree.Q-10x"
  ]
$ cabal exec breadth-first-unfolds -- -p fullTree
All
  fullTree
    Q-1x:  OK
      20.6 μs ± 1.1 μs, 141 KB allocated, 477 B  copied, 6.0 MB peak memory
    G-1x:  OK
      28.6 μs ± 2.4 μs, 223 KB allocated, 928 B  copied, 6.0 MB peak memory, 1.39x
    L-1x:  OK
      64.3 μs ± 5.6 μs, 353 KB allocated, 3.7 KB copied, 6.0 MB peak memory, 3.13x
    Q-10x: OK
      287  μs ±  26 μs, 1.5 MB allocated,  57 KB copied, 7.0 MB peak memory
    G-10x: OK
      349  μs ±  30 μs, 2.2 MB allocated,  94 KB copied, 7.0 MB peak memory, 1.22x
    L-10x: OK
      935  μs ±  73 μs, 3.5 MB allocated, 386 KB copied, 7.0 MB peak memory, 3.25x

Oblivious unfolds

algorithm         size   time
Queue             500    1x
Global Levels     500    11x
Oblivious Weave   500    25x
Queue             5000   1x
Global Levels     5000   10x
Oblivious Weave   5000   24x
Code and raw output
unfoldM_BF_Q_ :: Monad m => (s -> m (TreeF a s)) -> s -> m ()
unfoldM_BF_Q_ f s0 = unfoldM_f (s0 <+ Empty)
  where
    unfoldM_f (q :> s) = f s >>= \case
      LeafF -> unfoldM_f q
      NodeF _ l r -> unfoldM_f (r <+ l <+ q)
    unfoldM_f Empty = pure ()
eatFullTree_Q :: Int -> Int
eatFullTree_Q size = (`execState` 0) (unfoldM_BF_Q_ (state . \n c -> (fullTreeF size n, c + 1)) 1)

eatFullTree_G :: Int -> Int
eatFullTree_G size = (`execState` 0) (unfoldM_BF_G_ (state . \n c -> (fullTreeF size n, c + 1)) 1)

eatFullTree_O :: Int -> Int
eatFullTree_O size = (`execState` 0) (unfoldM_BF_O_ (state . \n c -> (fullTreeF size n, c + 1)) 1)

eatFullTree :: TestTree
eatFullTree = bgroup "eatFullTree"
  [ bench "Q-1x" (nf eatFullTree_Q 500)
  , bench "G-1x" (nf eatFullTree_G 500) & bcompare "eatFullTree.Q-1x"
  , bench "W-1x" (nf eatFullTree_O 500) & bcompare "eatFullTree.Q-1x"
  , bench "Q-10x" (nf eatFullTree_Q 5000)
  , bench "G-10x" (nf eatFullTree_G 5000) & bcompare "eatFullTree.Q-10x"
  , bench "W-10x" (nf eatFullTree_O 5000) & bcompare "eatFullTree.Q-10x"
  ]
$ cabal exec breadth-first-unfolds -- -p eatFullTree
All
  eatFullTree
    Q-1x:  OK
      11.0 μs ± 724 ns,  78 KB allocated, 338 B  copied, 6.0 MB peak memory
    G-1x:  OK
      116  μs ±  11 μs, 379 KB allocated, 1.3 KB copied, 6.0 MB peak memory, 10.57x
    W-1x:  OK
      278  μs ±  14 μs, 830 KB allocated, 5.9 KB copied, 6.0 MB peak memory, 25.36x
    Q-10x: OK
      120  μs ±  11 μs, 781 KB allocated,  21 KB copied, 6.0 MB peak memory
    G-10x: OK
      1.23 ms ± 122 μs, 3.9 MB allocated, 109 KB copied, 7.0 MB peak memory, 10.27x
    W-10x: OK
      2.92 ms ± 255 μs, 8.4 MB allocated, 631 KB copied, 7.0 MB peak memory, 24.43x

Conclusion

I hope to have piqued your interest in breadth-first unfolds without using queues. To the best of my knowledge, this specific problem hasn’t been studied in the literature. It is of course related to breadth-first traversals, previously solved using the Phases applicative.2 The intersection of functional programming and breadth-first walks is a small niche, which makes it quick to survey that corner of the world for ideas related to those presented here.

The paper Modular models of monoids with operations by Zhixuan Yang and Nicolas Wu, in ICFP 2023, mentions a general construction of Phases as an example application of their theory. Basically, Phases is defined by a fixed-point equation:

Phases f = Day f Phases :+: Identity

We can express Phases abstractly as a least fixed point μx. f ⊗ x + Id in any monoidal category with a suitable structure. If we instantiate the monoidal product ⊗ not with Day convolution, but with functor composition (Compose), then we get Weave.
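
Concretely, the two fixed points can be sketched as plain Haskell data types (constructor names invented here; this assumes Day from the kan-extensions package and Compose from base):

-- Phases: the least fixed point of x ↦ Day f x + Identity
data PhasesFix f a = PDone a | PStep (Day f (PhasesFix f) a)

-- Weave: the same fixed point with Day replaced by functor composition
data WeaveFix f a = WDone a | WStep (Compose f (WeaveFix f) a)

-- Compose f (WeaveFix f) a unwraps to f (WeaveFix f a),
-- so WeaveFix f is isomorphic to WeaveS f.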

In another coincidence, the monad-coroutine package implements a weave function which is a generalization of liftS2—this may require some squinting. While WeaveS as a data type coincides with the free monad Free, monad-coroutine’s core data type Coroutine coincides with the free monad transformer FreeT.

We can view Phases as a generalization of “zipping” from lists to free applicatives (which are essentially lists of actions), and Weave generalizes that further to free monads. To recap, the surprise was that the naive data type of free monads results in a quadratic-time unfold. That issue motivated a “lazy” variant3 which achieves a linear-time breadth-first unfold. That in turn suggested an even “lazier” variant which enables more control over evaluation order at the cost of efficiency.

I’ve just released the weave library which implements the main ideas of this post. I don’t expect it to have many users, given how much slower it is compared to queue-based solutions. But I would be curious to find a use case for the new compositionality afforded by this abstraction.

Recap table

Variant               Unfolds          Time         Laziness         Compositional
Phases*               No               linear       by levels        Yes
Queue (Q)             Yes              linear       strict           No
Global Levels (G)     Yes              linear       by levels        No
Strict Weave (S)      Yes              quadratic    strict           Yes
Oblivious Weave (O)   Oblivious only   linear       N/A              Yes
Lazy Weave (L)        Yes              quadratic‡E  maximally lazy   Yes

Linear wrt. size: Θ(size).
Quadratic wrt. height: lower bound Ω(height²), upper bound O(height × size).
E: The “Endless” meshE only terminates with lazy monads.
* I guess there exists an “endless Phases” variant, which would be quadratic and maximally lazy.
The definition of “maximally lazy” in this post actually misses a range of possible lazy behaviors with monads other than Identity. A further refinement seems to be another can of worms.


The main action of this Literate Haskell program
main :: IO ()
main = defaultMain
  [ testGraphQ
  , testGraphG
  , testGraphS
  , testGraphL
  , testGraphE
  , whnfTreeQ
  , whnfTreeS
  , whnfTreeL
  , whnfTreeE
  , forceLeftTreeL
  , forceLeftTreeE
  , benchS
  , benchO
  , benchL
  , benchE
  , benchE'
  , fullTree
  , eatFullTree
  ]
Extra test cases
whnfTreeE :: TestTree
whnfTreeE = testCase "E-whnf" $ do
  case partialTreeE of
    Node _ _ _ -> pure ()  -- Succeed
    Leaf -> error "unreachable"

whnfTreeQ :: TestTree
whnfTreeQ = expectFail $ testCase "Q-whnf" $ do
  case partialTreeQ of
    Node _ _ _ -> pure ()  -- Succeed
    Leaf -> error "unreachable"

partialTreeQ :: Tree ()
partialTreeQ = runIdentity (unfoldM_BF_Q (Identity . partialTreeF) True)

bfGraph_L :: Int -> Tree Int
bfGraph_L = (`evalState` Set.empty) . unfoldM_BF_L visitGraph

testGraphL :: TestTree
testGraphL = testCase "L-graph" $
  bfGraph_L 1 @?=
    Node 1
      (Node 2 Leaf
              (Node 5 Leaf Leaf))
      (Node 4 Leaf (Node 6 Leaf Leaf))

bfGraph_E :: Int -> Tree Int
bfGraph_E = (`evalState` Set.empty) . unfoldM_BF_E visitGraph

testGraphE :: TestTree
testGraphE = testCase "E-graph" $
  bfGraph_E 1 @?=
    Node 1
      (Node 2 Leaf
              (Node 5 Leaf Leaf))
      (Node 4 Leaf (Node 6 Leaf Leaf))

  1. Note that tasty-bench also reports memory statistics (allocated, copied, and peak memory) when certain RTS options are enabled, which I’ve done by compiling the test executable with -with-rtsopts=-T.↩︎

  2. ↩︎
  3. Speaking of variants of free monads, one might think of the “freer” monad, which has different motivations and which does not help us here.↩︎

by Lysxia at March 30, 2025 12:00 AM

March 24, 2025

GHC Developer Blog

GHC 9.6.7 is now available

GHC 9.6.7 is now available

Luite Stegeman - 2025-03-24

The GHC developers are very pleased to announce the availability of GHC 9.6.7. Binary distributions, source distributions, and documentation are available on the release page.

GHC 9.6.7 brings a number of fixes, including:

  • GHC’s internal Unique type has been widened to 64 bits on 32-bit architectures, avoiding potential miscompilations on large projects.
  • Fix a runtime crash when using the compacting GC, caused by black holes in large objects.
  • Added new flags -fspec-eval and -fspec-eval-dictfun to allow switching off speculative evaluation.

The following libraries have been updated since GHC 9.6.6:

  • base-4.18.3.0
  • filepath-1.4.301.0
  • unix-2.8.6.0
  • bytestring-0.11.5.4
  • array-0.5.8.0

A more detailed description can be found in the release notes, and a full list of changes in GHC!13841 and GHC!13891.

Note about Haskell Language Server and building GHC 9.8+:

The change of Unique to 64 bits (GHC#22010) adds the exported symbol ghc_unique_counter64 to the RTS. Unfortunately, it’s impossible to avoid this without breaking other things. If you encounter a linker error related to ghc_unique_counter64 when building GHC (or a GHC-derived package like ghc-lib-parser) with GHC 9.6.7, you probably have to add this fix to the program you’re building.

We would like to thank GitHub, IOG, the Zw3rk stake pool, Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, the Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprise this release.

Please give this release a try and open a ticket if you see anything amiss.

~ Luite

by ghc-devs at March 24, 2025 12:00 AM

March 20, 2025

Tweag I/O

A hundred pull requests for Liquid Haskell

A new release of Liquid Haskell is out after quite an active period of development with 99 pull requests in the liquidhaskell repository, and 29 pull requests in the liquid-fixpoint repository from about ten contributors. This post is to provide an overview of the changes that made it into the latest release.

There were contributions to the reflection and proof mechanisms; we got contributions to the integration with GHC; the support of cvc5 was improved when dealing with sets, bags, and maps; and there was a rather large overhaul of the name resolution mechanism.

Reflection improvements

Liquid Haskell is a tool to verify Haskell programs. We can write formal specifications inside special Haskell comments {-@ ... @-}, and the tool will check whether the program behaves as specified. For instance, the following specification of the filter function says that we expect all of the elements in the result to satisfy the given predicate.

{-@ filter :: p:(a -> Bool) -> xs:[a] -> {v:[a] | all p v } @-}

Liquid Haskell would then analyze the implementation of filter to verify that it does indeed yield elements that satisfy the predicate.

To verify such a specification, Liquid Haskell needs to attach a meaning to the names in the predicate all p v. It readily learns that p is a parameter of filter, and that v is the result. all, however, isn’t bound by the specification’s parameters, so it refers to whatever is in scope, which is the Haskell function from the Prelude.

all :: (a -> Bool) -> [a] -> Bool

And Liquid Haskell has a mechanism to give logical meaning to the implementation of a function like all, known as reflection. While it has always been convenient to reflect functions in modules analyzed by Liquid Haskell, it was not so easy when there was a mix of local definitions and imported definitions from dependencies that are not analyzed with Liquid Haskell. Last year, there was an internship at Tweag to address exactly this friction, which resulted in contributions to the latest release.
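
For a flavor of the mechanism, reflecting a function is a one-line annotation. A minimal sketch (myAll is a stand-in for a user-defined function):

{-@ reflect myAll @-}
myAll :: (a -> Bool) -> [a] -> Bool
myAll p []       = True
myAll p (x : xs) = p x && myAll p xs

-- Once reflected, myAll can appear in specifications such as
-- {-@ filter :: p:(a -> Bool) -> xs:[a] -> {v:[a] | myAll p v} @-}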

Reasoning and reflection of lambdas

The reflection mechanism also has other specific limitations at the moment. For instance, it doesn’t allow reflecting recursive functions defined in let or where bindings. And until recently, it didn’t allow reflecting functions that contained anonymous functions. For example,

takePositives = filter (\x -> x > 0)

In the latest release, we have several contributions that introduce support for reflecting lambdas and improve the story for reasoning with them. This feature is considered experimental at the moment, since there are still usability and performance concerns that deserve further contributions, but one can already explore the experience that we could expect in the long run.

Integration with GHC

In 2020 Liquid Haskell became a compiler plugin for GHC. It was hooked into the end of the type checking phase firstly to ensure it only runs on well-typed programs, and secondly, to ensure the plugin runs when GHC is only asked to typecheck the module but not to generate code, which was helpful to IDEs.

For a few technical reasons, the plugin was re-parsing and re-typechecking the module instead of using the abstract syntax tree (AST) that GHC handed to it as the result of type checking. That is no longer the case in the latest release, where the AST after type checking is now used for all purposes. In addition, there were several improvements to how the ghc library is used.

cvc5 support

Liquid Haskell offloads part of its reasoning to a family of automated theorem provers known as SMT solvers. For most developments, Liquid Haskell has been used with the Z3 SMT solver, and this is what has been used most of the time in continuous integration pipelines.

In theory, any SMT solver can be used with Liquid Haskell if it provides the standard interface known as SMT-LIB. In practice, however, experiments are done with theories that are not part of the standard. For instance, the reasoning capabilities for bags, sets, and maps used to require Z3, but the latest release implements support for cvc5 as well.

Name resolution overhaul

Name resolution determines, for each name in a program, which definition it refers to. Liquid Haskell, in particular, is responsible for resolving names that appear in specifications. This task was problematic when the programs it was asked to verify spanned many modules.

There were multiple kinds of names, each with its own resolution rules. Worse, names were resolved in different environments when verifying a module and when importing it elsewhere, not always yielding the same results, which often produced confusing errors.

Name resolution, however, was done all over the code base, and any attempt to rationalize it would require a few months of effort. I started such an epic last September, and managed to conclude it in February. These changes made it into the latest release together with an awful lot of side quests to simplify the existing code.

The road ahead

There is no coordinated roadmap for Liquid Haskell. Many of the contributions it receives depend on opportunities enabled by academic research or on the needs of particular use cases.

On my side, I’m trying to improve the adoption of Liquid Haskell. Much of the challenge is reducing the number of common workarounds that the proficient Liquid Haskeller needs to employ today. For instance, supporting reflection of functions in local bindings would save the user the trouble of rewriting her programs to put the recursive functions at the top level. Repairing the support for type classes would allow functions to be verified even if they use type classes, which covers a large fraction of Haskell code today. And, without having defined its scope with precision yet, Liquid Haskell still needs to improve its user documentation, its error messages, and its tracing and logging.

The project is chugging along, though. It is making significant leaps in usability. The upgrade costs have been quantified for a few GHC releases, and no longer look like an unbounded risk. The number of external contributions increased last year, although we have yet to see whether that is a trend. And there is no shortage of interest from academia and industrial interns.

Thanks to the many contributors for their work and their help during code reviews. I look forward to learning what makes it into the coming Liquid Haskell releases!

March 20, 2025 12:00 AM

March 17, 2025

Haskell Interlude

63: Farhad Mehta

On this episode of the Haskell Interlude, Andres Löh and Mike Sperber are joined by Farhad Mehta, a professor at OST Rapperswil, and one of the organizers of ZuriHac. Farhad tells us about formal methods, building tunnels, the importance of education, and the complicated relationship between academia and industry.

by Haskell Podcast at March 17, 2025 02:00 PM

March 16, 2025

Magnus Therning

Using lens-aeson to implement FromJSON

At work I sometimes need to deal with large and deep JSON objects where I'm only interested in a few of the values. If all the interesting values are on the top level, then aeson has functions that make it easy to implement FromJSON's parseJSON (Constructors and accessors), but if the values are spread out then the functions in aeson come up a bit short. That's when I reach for lens-aeson, as lenses make it very easy to work with large structures. However, I've found that using its lenses to implement parseJSON becomes a lot easier with a few helper functions.

Many of the lenses produce results wrapped in Maybe, so the first function is one that transforms a Maybe a into a Parser a. Here I make use of Parser implementing MonadFail.

infixl 8 <!>
(<!>) :: (MonadFail m) => Maybe a -> String -> m a
(<!>) mv err = maybe (fail err) pure mv

In some code I wrote this week I used it to extract the user name out of a JWT produced by Keycloak:

instance FromJSON OurClaimsSet where
    parseJSON = ... $ \o -> do
        cs <- parseJSON o
        n <- o ^? key "preferred_username" . _String <!> "preferred username missing"
        ...
        pure $ OurClaimsSet cs n ...

Also, all the lenses start with a Value, and that makes the withX functions in aeson not quite a perfect fit. So I define variations of the withX functions, e.g.

withObjectV :: String -> (Value -> Parser a) -> Value -> Parser a
withObjectV s f = withObject s (f . Object)

That makes the full FromJSON instance for OurClaimsSet look like this:

instance FromJSON OurClaimsSet where
    parseJSON = withObjectV "OurClaimsSet" $ \o -> do
        cs <- parseJSON o
        n <- o ^? key "preferred_username" . _String <!> "name"
        let rs = o ^.. key "resource_access" . members . key "roles" . _Array . traverse . _String
        pure $ OurClaimsSet cs n rs
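For completeness, the snippets above assume imports along these lines (my reconstruction, not part of the original post):

import Control.Lens ((^?), (^..))
import Data.Aeson (FromJSON (..), Value (Object), withObject)
import Data.Aeson.Lens (key, members, _Array, _String)
import Data.Aeson.Types (Parser)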

March 16, 2025 08:26 AM

March 14, 2025

GHC Developer Blog

GHC 9.12.2 is now available

GHC 9.12.2 is now available

Ben Gamari - 2025-03-14

The GHC developers are happy to announce the release of GHC 9.12.2. Binary distributions, source distributions, and documentation are available at downloads.haskell.org.

We hope to have this release available via ghcup shortly. This is a small release fixing a critical code generation bug, #25653, affecting some subword division operations.

As always, GHC’s release status, including planned future releases, can be found on the GHC Wiki status page.

We would like to thank IOG, the Zw3rk stake pool, Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, the Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors who contribute their code, tickets, and energy to the GHC project.

As always, do give this release a try and open a ticket if you see anything amiss.

by ghc-devs at March 14, 2025 12:00 AM

March 13, 2025

Brent Yorgey

Hendrix College Programming Contest 2025

Hendrix College Programming Contest 2025

Posted on March 13, 2025

I haven’t written on here in a while, mostly because a lot of my time has gone into preparing for the second annual Hendrix College Programming Contest, which will take place this Saturday, March 15, from 12:30-5:30pm CDT (17:30-22:30 UTC).

I’ve created an open mirror contest which will run in parallel to the official contest, so if you want to grab some friends and try solving some of the problems together using your favorite language, be my guest!


by Brent Yorgey at March 13, 2025 12:00 AM

March 10, 2025

Joachim Breitner

Extrinsic termination proofs for well-founded recursion in Lean

A few months ago I explained that one reason why this blog has become more quiet is that all my work on Lean is covered elsewhere.

This post is an exception, because it is an observation that is (arguably) interesting, but does not lead anywhere, so where else to put it than my own blog…

Want to share your thoughts about this? Please join the discussion on the Lean community zulip!

Background

When defining a function recursively in Lean that has nested recursion, e.g. a recursive call that is in the argument to a higher-order function like List.map, then extra attention used to be necessary so that Lean can see that xs.map applies its argument only to elements of the list xs. The usual idiom is to write xs.attach.map instead, where List.attach attaches to the list elements a proof that they are in that list. You can read more about this in my Lean blog post on recursive definitions and our shiny new reference manual; look for the example “Nested Recursion in Higher-order Functions”.

To make this step less tedious I taught Lean to automatically rewrite xs.map to xs.attach.map (where suitable) within the construction of well-founded recursion, so that nested recursion just works (issue #5471). We already do such a rewriting to change if c then … else … to the dependent if h : c then … else …, but the attach-introduction is much more ambitious (the rewrites are not definitionally equal, there are higher-order arguments, etc.). Rewriting the terms in a way that we can still prove the connection later when creating the equational lemmas is hairy at best. Also, we want the whole machinery to be extensible by the user, setting up their own higher-order functions to add more facts to the context of the termination proof.

I implemented it like this (PR #6744) and it ships with 4.18.0, but in the course of this work I thought about a quite different and maybe better™ way to do this, and well-founded recursion in general:

A simpler fix

Recall that to use WellFounded.fix

WellFounded.fix : (hwf : WellFounded r) (F : (x : α) → ((y : α) → r y x → C y) → C x) (x : α) : C x

we have to rewrite the functorial of the recursive function, which naturally has type

F : ((y : α) →  C y) → ((x : α) → C x)

to the one above, where all recursive calls take the termination proof r y x. This is a fairly hairy operation, mangling the type of matcher’s motives and whatnot.

Things are simpler for recursive definitions using the new partial_fixpoint machinery, where we use Lean.Order.fix

Lean.Order.fix : [CCPO α] (F : β → β) (hmono : monotone F) : β

so the functorial’s type is unmodified (here β will be ((x : α) → C x)), and everything else is in the propositional side-condition monotone F. For this predicate we have a syntax-guided compositional tactic, and it’s easily extensible, e.g. by

theorem monotone_mapM (f : γ → α → m β) (xs : List α) (hmono : monotone f) :
    monotone (fun x => xs.mapM (f x)) 

Once given, we don’t care about the content of that proof. In particular proving the unfolding theorem only deals with the unmodified F that closely matches the function definition as written by the user. Much simpler!

Isabelle has it easier

Isabelle also supports well-founded recursion, and has great support for nested recursion. And it’s much simpler!

There, all you have to do to make nested recursion work is define a congruence lemma; for List.map, it is something like our List.map_congr_left:

List.map_congr_left : (h : ∀ a ∈ l, f a = g a) :
    List.map f l = List.map g l

This is because in Isabelle, too, the termination proof is a side-condition that essentially states “the functorial F calls its argument f only on smaller arguments”.

Can we have it easy, too?

I had wished we could do the same in Lean for a while, but that form of congruence lemma just isn’t strong enough for us.

But maybe there is a way to do it, using an existential to give a witness that F can alternatively be implemented using the more restrictive argument. The following callsOn P F predicate can express that F calls its higher-order argument only on arguments that satisfy the predicate P:

section setup

variable {α : Sort u}
variable {β : α → Sort v}
variable {γ : Sort w}

def callsOn (P : α → Prop) (F : (∀ y, β y) → γ) :=
  ∃ (F': (∀ y, P y → β y) → γ), ∀ f, F' (fun y _ => f y) = F f

variable (R : α → α → Prop)
variable (F : (∀ y, β y) → (∀ x, β x))

local infix:50 " ≺ " => R

def recursesVia : Prop := ∀ x, callsOn (· ≺ x) (fun f => F f x)

noncomputable def fix (wf : WellFounded R) (h : recursesVia R F) : (∀ x, β x) :=
  wf.fix (fun x => (h x).choose)

def fix_eq (wf : WellFounded R) h x :
    fix R F wf h x = F (fix R F wf h) x := by
  unfold fix
  rw [wf.fix_eq]
  apply (h x).choose_spec

This allows nice compositional lemmas to discharge callsOn predicates:

theorem callsOn_base (y : α) (hy : P y) :
    callsOn P (fun (f : ∀ x, β x) => f y) := by
  exists fun f => f y hy
  intros; rfl

@[simp]
theorem callsOn_const (x : γ) :
    callsOn P (fun (_ : ∀ x, β x) => x) :=
  ⟨fun _ => x, fun _ => rfl⟩

theorem callsOn_app
    {γ₁ : Sort uu} {γ₂ : Sort ww}
    (F₁ :  (∀ y, β y) → γ₂ → γ₁) -- can this also support dependent types?
    (F₂ :  (∀ y, β y) → γ₂)
    (h₁ : callsOn P F₁)
    (h₂ : callsOn P F₂) :
    callsOn P (fun f => F₁ f (F₂ f)) := by
  obtain ⟨F₁', h₁⟩ := h₁
  obtain ⟨F₂', h₂⟩ := h₂
  exists (fun f => F₁' f (F₂' f))
  intros; simp_all

theorem callsOn_lam
    {γ₁ : Sort uu}
    (F : γ₁ → (∀ y, β y) → γ) -- can this also support dependent types?
    (h : ∀ x, callsOn P (F x)) :
    callsOn P (fun f x => F x f) := by
  exists (fun f x => (h x).choose f)
  intro f
  ext x
  apply (h x).choose_spec

theorem callsOn_app2
    {γ₁ : Sort uu} {γ₂ : Sort ww}
    (g : γ₁ → γ₂ → γ)
    (F₁ :  (∀ y, β y) → γ₁) -- can this also support dependent types?
    (F₂ :  (∀ y, β y) → γ₂)
    (h₁ : callsOn P F₁)
    (h₂ : callsOn P F₂) :
    callsOn P (fun f => g (F₁ f) (F₂ f)) := by
  apply_rules [callsOn_app, callsOn_const]

With this setup, we can have the following, possibly user-defined, lemma expressing that List.map calls its arguments only on elements of the list:

theorem callsOn_map (δ : Type uu) (γ : Type ww)
    (P : α → Prop) (F : (∀ y, β y) → δ → γ) (xs : List δ)
    (h : ∀ x, x ∈ xs → callsOn P (fun f => F f x)) :
    callsOn P (fun f => xs.map (fun x => F f x)) := by
  suffices callsOn P (fun f => xs.attach.map (fun ⟨x, h⟩ => F f x)) by
    simpa
  apply callsOn_app
  · apply callsOn_app
    · apply callsOn_const
    · apply callsOn_lam
      intro ⟨x', hx'⟩
      dsimp
      exact (h x' hx')
  · apply callsOn_const

end setup

So here is the (manual) construction of a nested map for trees:

section examples

structure Tree (α : Type u) where
  val : α
  cs : List (Tree α)

-- essentially
-- def Tree.map (f : α → β) : Tree α → Tree β :=
--   fun t => ⟨f t.val, t.cs.map Tree.map⟩)
noncomputable def Tree.map (f : α → β) : Tree α → Tree β :=
  fix (sizeOf · < sizeOf ·) (fun map t => ⟨f t.val, t.cs.map map⟩)
    (InvImage.wf (sizeOf ·) WellFoundedRelation.wf) <| by
  intro ⟨v, cs⟩
  dsimp only
  apply callsOn_app2
  · apply callsOn_const
  · apply callsOn_map
    intro t' ht'
    apply callsOn_base
    -- ht' : t' ∈ cs -- !
    -- ⊢ sizeOf t' < sizeOf { val := v, cs := cs }
    decreasing_trivial

end examples

This makes me happy!

All details of the construction are now contained in a proof that can proceed by a syntax-driven tactic and that’s easily and (likely robustly) extensible by the user. It also means that we can share a lot of code paths (e.g. everything related to equational theorems) between well-founded recursion and partial_fixpoint.

I wonder if this construction is really as powerful as our current one, or if there are certain (likely dependently typed) functions where this doesn’t fit, but the β above is dependent, so it looks good.

With this construction, functions defined by well-founded recursion will reduce even worse in the kernel, I assume. This may be a good thing.

The cake is a lie

What unfortunately kills this idea, though, is the generation of the functional induction principles, which I believe is not (easily) possible with this construction: The functional induction principle is proved by massaging F to return a proof, but since the extra assumptions (e.g. for ite or List.map) only exist in the termination proof, they are not available in F.

Oh wey, how anticlimactic.

PS: Path dependencies

Curiously, if we didn’t have functional induction at this point yet, then very likely I’d change Lean to use this construction, and then we’d either not get functional induction, or it would be implemented very differently, maybe a more syntactic approach that would re-prove termination. I guess that’s called path dependence.

by Joachim Breitner (mail@joachim-breitner.de) at March 10, 2025 05:47 PM

March 06, 2025

in Code

Sum Types and Subtypes and Unions

There’s yet again been a bit of functional programming-adjacent twitter drama recently, but it’s actually touched on some subtleties about sum types that I am asked about (and think about) a lot nowadays. So, I’d like to take this opportunity to talk a bit about the “why” and nature of sum types, how to use them effectively, how they contrast with other related concepts in programming and software development, and even the cases where sum types aren’t the best option.

Sum Types at their Best

The quintessential sum type that you just can’t live without is Maybe, now adopted in a lot of languages as Optional:

data Maybe a = Nothing | Just a

If you have a value of type Maybe Int, it means that its valid values are Nothing, Just 0, Just 1, etc.

This is also a good illustration of why we call it a “sum” type: if a has n possible values, then Maybe a has 1 + n: we add the single new value Nothing to it.

The “benefit” of the sum type is illustrated pretty clearly here too: every time you use a value of type Maybe Int, you are forced to consider the fact that it could be Nothing:

showMaybeInt :: Maybe Int -> String
showMaybeInt = \case
  Nothing -> "There's nothing here"
  Just i -> "Something is here: " <> show i

That’s because sum types are usually implemented in a way that forces you to handle each case exhaustively. Otherwise, sum types are much less useful.
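In GHC, for instance, that enforcement comes from the exhaustiveness checker; a minimal sketch (the warning text is paraphrased):

{-# OPTIONS_GHC -Wincomplete-patterns #-}

describe :: Maybe Int -> String
describe (Just i) = show i
-- GHC warns, roughly: pattern matches are non-exhaustive,
-- patterns of type ‘Maybe Int’ not matched: Nothing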

At the most fundamental level, this behaves like a compiler-enforced null check, but built within the language in user-space instead of being compiler magic, ad-hoc syntax1, or static analysis — and the fact that it can live in user-space is why it’s been adopted so widely. At a higher level, functional abstractions like Functor, Applicative, Monad, Foldable, and Traversable allow you to use a Maybe a like just a normal a with the appropriate semantics, but that’s a topic for another time (like 2014).

This power is very special to me on a personal level. I remember many years ago on my first major haskell project changing a type from String to Maybe String, and then GHC telling me every place in the codebase where something needed to change in order for things to work still. Coming from dynamically typed languages in the past, this sublime experience truly altered my brain chemistry and Haskell-pilled me for the rest of my life. I still remember the exact moment, what coffee shop I was at, what my order was, the weather that day … it was truly the first day of the rest of my life.

It should be noted that I don’t consider sum types a “language feature” or a compiler feature so much as a design pattern. Languages that don’t have sum types built-in can usually implement them using typed unions and an abstract visitor pattern interface (more on that later). Of course, having a way to “check” your code before running it (like with a type system or statically verified type annotations) does make a lot of the features much more useful.

Anyway, this basic pattern can be extended to include more error information in your Nothing branch, which is how you get the Either e a type in the Haskell standard library, or the Result<T,E> type in Rust.
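For reference, Either is exactly that extension of Maybe: the standard definition (with the type variables named to match the text) puts a payload on the failure branch:

data Either e a = Left e | Right a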

Along different lines, we have the common use case of defining syntax trees:

data Expr =
    Lit Int
  | Negate Expr
  | Add Expr Expr
  | Sub Expr Expr
  | Mul Expr Expr

eval :: Expr -> Int
eval = \case
    Lit i -> i
    Negate x -> -(eval x)
    Add x y -> eval x + eval y
    Sub x y -> eval x - eval y
    Mul x y -> eval x * eval y

pretty :: Expr -> String
pretty = go 0
  where
    wrap :: Int -> Int -> String -> String
    wrap prio opPrec s
      | prio > opPrec = "(" <> s <> ")"
      | otherwise = s
    go prio = \case
        Lit i -> show i
        Negate x -> wrap prio 2 $ "-" <> go 2 x
        Add x y -> wrap prio 0 $ go 0 x <> " + " <> go 1 y
        Sub x y -> wrap prio 0 $ go 0 x <> " - " <> go 1 y
        Mul x y -> wrap prio 1 $ go 1 x <> " * " <> go 2 y

main :: IO ()
main = do
    putStrLn $ pretty myExpr
    print $ eval myExpr
  where
    myExpr = Mul (Negate (Add (Lit 4) (Lit 5))) (Lit 8)
-(4 + 5) * 8
-72

Now, if we add a new constructor to the sum type, the compiler forces us to handle it.

data Expr =
    Lit Int
  | Negate Expr
  | Add Expr Expr
  | Sub Expr Expr
  | Mul Expr Expr
  | Abs Expr

eval :: Expr -> Int
eval = \case
    Lit i -> i
    Negate x -> -(eval x)
    Add x y -> eval x + eval y
    Sub x y -> eval x - eval y
    Mul x y -> eval x * eval y
    Abs x -> abs (eval x)

pretty :: Expr -> String
pretty = go 0
  where
    wrap :: Int -> Int -> String -> String
    wrap prio opPrec s
      | prio > opPrec = "(" <> s <> ")"
      | otherwise = s
    go prio = \case
        Lit i -> show i
        Negate x -> wrap prio 2 $ "-" <> go 2 x
        Add x y -> wrap prio 0 $ go 0 x <> " + " <> go 1 y
        Sub x y -> wrap prio 0 $ go 0 x <> " - " <> go 1 y
        Mul x y -> wrap prio 1 $ go 1 x <> " * " <> go 2 y
        Abs x -> wrap prio 2 $ "|" <> go 0 x <> "|"

Another example where things shine is clearly-defined APIs between processes. For example, we can imagine a “command” type that sends different types of commands with different payloads. This can be interpreted as, perhaps, the result of parsing command line arguments, or the message in some communication protocol.

For example, you could have a protocol that launches and controls processes:

data Command a =
    Launch String (Int -> a)    -- ^ takes a name, returns a process ID
  | Stop Int (Bool -> a)        -- ^ takes a process ID, returns success/failure

launch :: String -> Command Int
launch nm = Launch nm id

stop :: Int -> Command Bool
stop pid = Stop pid id

This ADT is written in the “interpreter” pattern (used often with things like free monads), where any arguments not involving a are the command payload, and any X -> a field indicates that the command responds with an X.

Let’s write a sample interpreter backing the state in an IntMap in an IORef:

import Data.IORef (IORef, modifyIORef, newIORef, readIORef)
import qualified Data.IntMap as IM
import Data.IntMap (IntMap)

runCommand :: IORef (IntMap String) -> Command a -> IO a
runCommand ref = \case
    Launch newName next -> do
        currMap <- readIORef ref
        let newId = case IM.lookupMax currMap of
              Nothing -> 0
              Just (i, _) -> i + 1
        modifyIORef ref $ IM.insert newId newName
        pure (next newId)
    Stop procId next -> do
        existed <- IM.member procId <$> readIORef ref
        modifyIORef ref $ IM.delete procId
        pure (next existed)

main :: IO ()
main = do
    ref <- newIORef IM.empty
    aliceId <- runCommand ref $ launch "alice"
    putStrLn $ "Launched alice with ID " <> show aliceId
    bobId <- runCommand ref $ launch "bob"
    putStrLn $ "Launched bob with ID " <> show bobId
    success <- runCommand ref $ stop aliceId
    putStrLn $
      if success
        then "alice succesfully stopped"
        else "alice unsuccesfully stopped"
    print =<< readIORef ref
Launched alice with ID 0
Launched bob with ID 1
alice successfully stopped
fromList [(1, "bob")]

Let’s add a command to “query” a process id for its current status:

data Command a =
    Launch String (Int -> a)    -- ^ takes a name, returns a process ID
  | Stop Int (Bool -> a)        -- ^ takes a process ID, returns success/failure
  | Query Int (String -> a)     -- ^ takes a process ID, returns a status message

query :: Int -> Command String
query pid = Query pid id

runCommand :: IORef (IntMap String) -> Command a -> IO a
runCommand ref = \case
    -- ...
    Query procId next -> do
        procName <- IM.lookup procId <$> readIORef ref
        pure $ case procName of
          Nothing -> "This process doesn't exist, silly."
          Just n -> "Process " <> n <> " chugging along..."

Relationship with Unions

To clarify a common confusion: sum types can be described as “tagged unions”: you have a tag to indicate which branch you are on (which can be case-matched on), and then the rest of your data is conditionally present.

In many languages this can be implemented under the hood as a struct with a tag and a union of data, along with some abstract visitor pattern interface to ensure exhaustiveness.

Remember, it’s not exactly a union because, e.g., consider a type like:

data Entity = User Int | Post Int

An Entity here could represent a user at a user id, or a post at a post id. If we considered it purely as a union of Int and Int:

union Entity {
    int user_id;
    int post_id;
};

we’d lose the ability to branch on whether we have a user or a post. If we have the tagged union, we recover the original sum type semantics:

struct Entity {
    bool is_user;
    union {
        int user_id;
        int post_id;
    } payload;
};

Of course, you still need an abstract interface like the visitor pattern to actually be able to use this as a sum type with guarantees that you handle every branch, but that’s a story for another day. Alternatively, if your language supports dynamic dispatch nicely, that’s another underlying implementation that would work to back a higher-level visitor pattern interface.
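In Haskell terms, that “visitor” is just the eliminator for the type; a small sketch (foldEntity is my own name for it, nothing standard):

{-# LANGUAGE LambdaCase #-}

data Entity = User Int | Post Int

-- One handler per branch; having to supply both is what gives the
-- exhaustiveness guarantee.
foldEntity :: (Int -> r) -> (Int -> r) -> Entity -> r
foldEntity onUser onPost = \case
  User userId -> onUser userId
  Post postId -> onPost postId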

Subtypes Solve a Different Problem

Now, sum types aren’t exactly a part of common programming education curriculum, but subtypes and supertypes definitely were drilled into every CS student’s brain and waking nightmares from their first year.

Informally (a la Liskov), B is a subtype of A (and A is a supertype of B) if anywhere that expects an A, you could also provide a B.

In normal object-oriented programming, this often shows up in early lessons as Cat and Dog being subclasses of an Animal class, or Square and Circle being subclasses of a Shape class.

When people first learn about sum types, there is a tendency to understand them as similar to subtyping. This is unfortunately understandable, since a lot of introductions to sum types often start with something like

-- | Bad Sum Type Example!
data Shape = Circle Double | Rectangle Double Double

While there are situations where this might be a good sum type (e.g., for an API specification or a state machine), at face value this is a bad example for the sum types vs. subtyping distinction.

You might notice the essential “tension” of the sum type: you declare all of your options up-front, while the functions that consume your value are open and declared ad-hoc. And, if you add new options, all of the consuming functions must be adjusted.

So, subtypes (and supertypes) are more effective when they lean into the opposite end: the universe of possible options are open and declared ad-hoc, but the consuming functions are closed. And, if you add new functions, all of the members must be adjusted.

In typed languages with a concept of “objects” and “classes”, subtyping is often implemented using inheritance and interfaces.

interface Widget {
    void draw();
    void handleEvent(String event);
    String getName();
}

class Button implements Widget {
    // ..
}

class InputField implements Widget {
    // ..
}

class Box implements Widget {
    // ..
}

So, a function like processWidget(Widget widget) that expects a Widget could be passed a Button, an InputField, or a Box. And, if you had a container like List<Widget>, you could assemble a structure using Button, InputField, and Box. A perfect Liskov storm.

In typical library design, you’re able to add new implementations of Widget as an open universe easily: anyone that imports Widget can, and they can now use it with functions taking Widgets. But, if you ever wanted to add new functionality to the Widget interface, that would be a breaking change to all downstream implementations.

However, this implementation of subtyping, while prevalent, is the most mind-numbingly boring realization of the concept, and it pained my soul to even spend time talking about it. So let’s jump into the more interesting way that subtype and supertype relationships manifest in the only language where anything is interesting: Haskell.

Subtyping via Parametric Polymorphism

In Haskell, subtyping is implemented in terms of parametric polymorphism and sometimes typeclasses. This allows us to work nicely with the concept of functions and APIs as subtypes and supertypes of each other.

For example, let’s look at a function that takes indexers and applies them:

sumAtLocs :: ([Double] -> Int -> Double) -> [Double] -> Double
sumAtLocs ixer xs = ixer xs 1 + ixer xs 2 * ixer xs 3
ghci> sumAtLocs (!!) [1,2,3,4,5]
14

So, what functions could you pass to sumAtLocs? Can you only pass [Double] -> Int -> Double?

Well, not quite. Look at the above where we passed (!!), which has type forall a. [a] -> Int -> a!

In fact, what other types could we pass? Here are some examples:

fun1 :: [a] -> Int -> a
fun1 = (!!)

fun2 :: [a] -> Int -> a
fun2 xs i = reverse xs !! i

fun3 :: Floating a => [a] -> Int -> a
fun3 xs i = if length xs > i then xs !! i else pi

fun4 :: Num a => [a] -> Int -> a
fun4 xs i = sum (take i xs)

fun5 :: (Integral b, Num c) => a -> b -> c
fun5 xs i = fromIntegral i

fun6 :: (Foldable t, Fractional a, Integral b) => t a -> b -> a
fun6 xs i = sum xs / fromIntegral i

fun7 :: (Foldable t, Integral b, Floating a) => t a -> b -> a
fun7 xs i = logBase (fromIntegral i) (sum xs)

What’s going on here? Well, the function expects a [Double] -> Int -> Double, but there are a lot of other types that could be passed instead.

At first this might seem like meaningless semantics or trickery, but it’s deeper than that: remember that each of the above types actually has a very different meaning and different possible behaviors!

  1. forall a. [a] -> Int -> a means that the a must come from the given list. In fact, any function with that type is guaranteed to be partial: if you pass it an empty list, there is no a available to use.
  2. forall a. Num a => [a] -> Int -> a means that the result might actually come from outside of the list: the implementation could always return 0 or 1, even if the list is empty. It also guarantees that it will only add, subtract, multiply, or abs: it will never divide.
  3. forall a. Fractional a => [a] -> Int -> a means that we could possibly do division on the result, but we can’t do anything “floating” like square rooting or logarithms.
  4. forall a. Floating a => [a] -> Int -> a means that we can possibly start square rooting or taking the logarithms of our input numbers
  5. [Double] -> Int -> Double gives us the least guarantees about the behavior: the result could come from thin air (and not be a part of the list), and we can even inspect the machine representation of our inputs.

So, we have all of these types with completely different semantics and meanings. And yet, they can all be passed to something expecting a [Double] -> Int -> Double. That means that they are all subtypes of [Double] -> Int -> Double! [Double] -> Int -> Double is a supertype that houses multitudes of possible values, uniting all of the possible values and semantics into one big supertype.
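To make that concrete, here is a quick check reusing the definitions above (demo is my own name; every element of the list instantiates to [Double] -> Int -> Double):

demo :: [Double]
demo = map (\ixer -> sumAtLocs ixer [1,2,3,4,5]) [fun1, fun3, fun4]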

Through the power of parametric polymorphism and typeclasses, you can actually create an extensible hierarchy of supertypes, not just of subtypes.

Consider a common API for json serialization. You could have multiple functions that serialize into JSON:

fooToJson :: Foo -> Value
barToJson :: Bar -> Value
bazToJson :: Baz -> Value

Through typeclasses, you can create:

toJSON :: ToJSON a => a -> Value

The type of toJSON :: forall a. ToJSON a => a -> Value is a subtype of Foo -> Value, Bar -> Value, and Baz -> Value, because everywhere you would want a Foo -> Value, you could give toJSON instead. Every time you want to serialize a Foo, you could use toJSON.

This usage works well, as it gives you an extensible abstraction to design code around. When you write code polymorphic over Monoid a, it forces you to reason about your values with respect to only the aspects relating to monoidness. If you write code polymorphic over Num a, it forces you to reason about your values only with respect to how they can be added, subtracted, negated, or multiplied, instead of having to worry about things like their machine representation.

The extensibility comes from the fact that you can create even more supertypes of forall a. ToJSON a => a -> Value easily, just by defining a new typeclass instance. So, if you need a MyType -> Value, you could make it a supertype of toJSON :: ToJSON a => a -> Value by defining an instance of the ToJSON typeclass, and now you have something you can use in its place.
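For instance, a minimal sketch (MyType and its field are made up for illustration):

{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson (ToJSON (..), object, (.=))

data MyType = MyType { myName :: String }

instance ToJSON MyType where
  toJSON (MyType n) = object ["name" .= n]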

Practically this is used by many libraries. For example, ad uses it for automatic differentiation: its diff function looks scary:

diff :: (forall s. AD s ForwardDouble -> AD s ForwardDouble) -> Double -> Double

But it relies on the fact that (forall s. AD s ForwardDouble -> AD s ForwardDouble) is a supertype of (forall a. Floating a => a -> a), (forall a. Num a => a -> a), etc., so you can give it functions like \x -> x * x (which is a forall a. Num a => a -> a) and it will work as that AD s type:

ghci> diff (\x -> x * x) 10
20      -- 2*x

This “numeric overloading” method is used by libraries for GPU programming, as well, to accept numeric functions to be optimized and compiled to GPU code.

Another huge application is in the lens library, which uses subtyping to unite its hierarchy of optics.

For example, an Iso is a subtype of Lens, which is a subtype of Traversal, which in turn is a subtype of Fold; equivalently, Fold is a supertype of Traversal and Lens. In the end the system even allows you to use id from the Prelude as a lens or a traversal, because the type signature of id :: a -> a is actually a subtype of all of those types!

Subtyping using Existential Types

What more closely matches the spirit of subtypes in OOP and other languages is the existential type: a value that can be a value of any type matching some interface.

For example, let’s imagine a value that could be any instance of Num:

data SomeNum = forall a. Num a => SomeNum a

someNums :: [SomeNum]
someNums = [SomeNum (1 :: Int), SomeNum (pi :: Double), SomeNum (0xfe :: Word)]

This is somewhat equivalent to Java’s List<MyInterface> or List<MyClass>, or Python’s List[MyClass].

Note that to use this effectively in Haskell with superclasses and subclasses, you need to manually wrap and unwrap:

data SomeFractional = forall a. Fractional a => SomeFractional a

castUp :: SomeFractional -> SomeNum
castUp (SomeFractional x) = SomeNum x

So, SomeNum is “technically” a supertype of SomeFractional: everywhere a SomeNum is expected, a SomeFractional can be given…but in Haskell it’s a lot less convenient because you have to explicitly cast.

In OOP languages, you can often cast “down” using runtime reflection (SomeNum -> Maybe SomeFractional). However, this is impossible in Haskell the way we have written it!

castDown :: SomeNum -> Maybe SomeFractional
castDown = error "impossible!"

That’s because of type erasure: Haskell does not (by default) couple a value at runtime with all of its associated interface implementations. When you create a value of type SomeNum, you are packing an untyped pointer to that value as well as a “dictionary” of all the functions you could use it with:

-- Field names are prefixed because Haskell record fields can't be
-- operators; the comments show the Num method each field stands for.
data NumDict a = NumDict
    { ndPlus        :: a -> a -> a    -- (+)
    , ndTimes       :: a -> a -> a    -- (*)
    , ndNegate      :: a -> a         -- negate
    , ndAbs         :: a -> a         -- abs
    , ndFromInteger :: Integer -> a   -- fromInteger
    }

mkNumDict :: Num a => NumDict a
mkNumDict = NumDict (+) (*) negate abs fromInteger

data FractionalDict a = FractionalDict
    { fdNumDict      :: NumDict a
    , fdDivide       :: a -> a -> a     -- (/)
    , fdFromRational :: Rational -> a   -- fromRational
    }

-- | Essentially equivalent to the previous 'SomeNum'
data SomeNum = forall a. SomeNum
    { numDict :: NumDict a
    , value   :: a
    }

-- | Essentially equivalent to the previous 'SomeFractional'
data SomeFractional = forall a. SomeFractional
    { fractionalDict :: FractionalDict a
    , fracValue      :: a
    }

castUp :: SomeFractional -> SomeNum
castUp (SomeFractional d x) = SomeNum (fdNumDict d) x

castDown :: SomeNum -> Maybe SomeFractional
castDown (SomeNum nd x) = error "not possible!"

All of these function pointers essentially exist at runtime inside the SomeNum. So, SomeFractional can be “cast up” to SomeNum by simply dropping the FractionalDict. However, you cannot “cast down” from SomeNum because there is no way to materialize the FractionalDict: the association from type to instance is lost at runtime. OOP languages usually get around this by having the value itself hold pointers to all of its interface implementations at runtime. However, in Haskell, we have type erasure by default: there are no tables carried around at runtime.2

In the end, existential subtyping requires explicit wrapping/unwrapping instead of implicit or lightweight casting possible in OOP languages optimized around this sort of behavior.3 Existential-based subtyping is just less common in Haskell because parametric polymorphism offers a solution to most similar problems. For more on this topic, Simon Peyton Jones has a nice lecture on the topic.

The pattern of using existentially quantified data in a container (like [SomeNum]) is often called the “widget pattern” because it’s used in libraries like xmonad to allow extensible “widgets” stored alongside the methods used to manipulate them. It’s more common to explicitly store the handler functions (a “dictionary”) inside the type instead of existential typeclasses, but sometimes it can be nice to let the compiler handle generating and passing your method tables implicitly for you. Using existential typeclasses instead of explicit dictionaries also allows you to bless certain methods and functions as “canonical” to your type, and the compiler will make sure they are always coherent.

I do mention in a blog post about different types of existential lists, however, that this “container of instances” type is much less useful in Haskell than in other languages for many reasons, including the up/downcasting issues mentioned above. In addition, Haskell gives you a whole wealth of functionality for operating over homogeneous parameters (like [a], where all items have the same type) that jumping to heterogeneous lists gives up.

Aside

Let’s briefly take a moment to talk about how typeclass hierarchies give us subtle subtype/supertype relationships.

Let’s look at the classic Num and Fractional:

class Num a

class Num a => Fractional a

Num is a superclass of Fractional, and Fractional is a subclass of Num. Everywhere a Num constraint is required, you can provide a Fractional constraint to do the same thing.

However, in these two types:

Num a => a
Fractional a => a

forall a. Num a => a is actually a subtype of forall a. Fractional a => a! That’s because if you need a forall a. Fractional a => a, you can provide a forall a. Num a => a instead. In fact, let’s look at three levels: Double, forall a. Fractional a => a, and forall a. Num a => a.

-- can be used as `Double`
1.0 :: Double
1.0 :: Fractional a => a
1 :: Num a => a

-- can be used as `forall a. Fractional a => a`
1.0 :: Fractional a => a
1 :: Num a => a

-- can be used as `forall a. Num a => a`
1 :: Num a => a

So, Double is a supertype of Fractional a => a is a supertype of Num a => a.

The general idea here is that the more super- you go, the more you “know” about the actual term you are creating. So, with Num a => a, you know the least (and, you have the most possible actual terms because there are more instances of Num than of Fractional). And, with Double, you know the most: you even know its machine representation!

So, Num is a superclass of Fractional, but forall a. Num a => a is a subtype of forall a. Fractional a => a. This actually follows the typical rules of subtyping: if something appears on the “left” of an arrow (=> in this case), it gets flipped from sub- to super-. We often call the left side a “negative” (contravariant) position and the right side a “positive” position, because a negative of a negative (the left side of a left side, like a in (a -> b) -> c) is a positive.
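Here is a quick way to see the flip in GHC (a sketch; needsFrac and one are my own names):

{-# LANGUAGE RankNTypes #-}

needsFrac :: (forall a. Fractional a => a) -> Double
needsFrac x = x

one :: forall a. Num a => a
one = 1

-- A Num-polymorphic value is accepted where a Fractional-polymorphic
-- one is wanted, because Num is a superclass of Fractional.
ok :: Double
ok = needsFrac one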

Also note that our “existential wrappers”:

data SomeNum = forall a. Num a => SomeNum a
data SomeFractional = forall a. Fractional a => SomeFractional a

can be CPS-transformed to their equivalent types:

type SomeNum' = forall r. (forall a. Num a => a -> r) -> r
type SomeFractional' = forall r. (forall a. Fractional a => a -> r) -> r

toSomeNum' :: SomeNum -> SomeNum'
toSomeNum' (SomeNum x) f = f x

toSomeNum :: SomeNum' -> SomeNum
toSomeNum sn = sn SomeNum

And in those cases, Num and Fractional again appear in the covariant (positive) position, since they’re the negative of negative. So, this aligns with our intuition that SomeFractional is a subtype of SomeNum.

The Expression Problem

This tension that I described earlier is closely related to the expression problem, and is a tension that is inherent to a lot of different aspects of language and abstraction design. However, in the context laid out in this post, it serves as a good general guide to decide what pattern to go down:

  • If you expect a canonical set of “inhabitants” and an open set of “operations”, sum types can suit that end of the spectrum well.
  • If you expect a canonical set of “operations” and an open set of “inhabitants”, consider subtyping and supertyping.

I don’t really think of the expression problem as a “problem” in the sense of “some hindrance to deal with”. Instead, I see it in the “math problem” sort of way: by adjusting how you approach things, you can play with the equation to make the most of the requirements of your design.

Looking Forward

A lot of frustration in Haskell (and programming in general) lies in trying to force abstraction and tools to work in a way they weren’t meant to. Hopefully this short run-down can help you avoid going against the point of these design patterns and start making the most of what they can offer. Happy Haskelling!

Special Thanks

I am very humbled to be supported by an amazing community, who make it possible for me to devote time to researching and writing these posts. Very special thanks to my supporter at the “Amazing” level on patreon, Josh Vera! :)


  1. ?↩︎

  2. Most OOP languages also have mechanisms for type erasure, but the default is unerased, which is the opposite of Haskell.↩︎

  3. Note that there are current GHC proposals that attempt to allow “naked” existentials without newtype wrappers, so we could actually get the same seamless and implicit up-casting as we would get in OOP languages. However, the jury is out on whether or not this is a good idea.↩︎

by Justin Le at March 06, 2025 05:25 PM

February 27, 2025

Russell O'Connor

The Right Number of YubiKeys is Three

Four years ago I bought a pair of YubiKey 5s: one YubiKey 5 Nano, which fits in my laptop’s USB slot, and another YubiKey 5 NFC as backup, which sat in my home office. However, I kept worrying about what happens if my house burns down or something, taking both my laptop and office YubiKeys at the same time. On the other hand, if I stored my YubiKey 5 NFC offsite, then whenever I needed to register a new FIDO service, I would need to go fetch the key, update it, and then return it. Based on my personal experience, even if that were not a big pain, the "return it" step often gets delayed indefinitely because it feels so low priority.

Then I read a popular comment made on Hacker News: Get three YubiKeys. Suddenly everything clicked! I bought a second YubiKey 5 NFC last year.

Now, I keep a second YubiKey 5 NFC offsite, in addition to the one in my laptop and the one in my office. If my home burns down, I still have an offsite YubiKey available. But the best thing about having a second YubiKey 5 NFC is that it partly mitigates the offsite update problem.

In the previous scenario, we required potentially two trips offsite to update the backup YubiKey. Now, however, the procedure to register a new FIDO service is to first update the office YubiKey 5 NFC (and the YubiKey 5 Nano). Then, at your earliest convenience, you swap the office YubiKey 5 NFC with the offsite YubiKey 5 NFC. When you get the offsite YubiKey home, you update it with the new FIDO service, and it becomes the new office YubiKey. There is no need to return to the offsite location.

Part of the issue is that there is no "public FIDO key" the way there is a "public PGP key". You need the actual YubiKey in hand to register it with a FIDO service, no matter whether it is a discoverable credential or not. If you were only using the YubiKey as an OpenPGP smart card, then perhaps you could get away with just having a local key and an offsite key. Even still, I would recommend a third YubiKey so that whenever the time comes to do some operation on your offsite key, you can perform the same swapping trick.

The title of this article says that three is the right number of YubiKeys, but that is because I have only one Nano, in my laptop, which is my primary computing interface. I do have a desktop computer that I mostly access as a remote server. If you regularly use multiple computing devices, it would make sense to have a YubiKey Nano in each of them, and in addition to those, one offsite key and one local key for swapping with the offsite key.

February 27, 2025 01:25 AM

February 26, 2025

GHC Developer Blog

GHC 9.12.2-rc1 is now available

GHC 9.12.2-rc1 is now available

Ben Gamari - 2025-02-26

The GHC developers are happy to announce the availability of the first and likely final release candidate of GHC 9.12.2. This is an important bug-fix release resolving a significant correctness issue present in 9.12.1 (#25653).

In accordance with our under-discussion release policies this candidate will have a two-week testing window. The final 9.12.2 release will likely come the week of 12 March 2025.

As always, if you find anything amiss please open a ticket.

Happy Haskelling!

  • Ben

by ghc-devs at February 26, 2025 12:00 AM

February 17, 2025

Haskell Interlude

62: Conal Elliott

In this episode, Wouter Swierstra and Niki Vazou talk with Conal Elliott. Conal discusses doing things just for the poetry, how most programs miss their purpose, and the simplest way to ask a question. Conal is currently working on a book about his ideas and actively looking for partners.

by Haskell Podcast at February 17, 2025 11:00 AM

February 15, 2025

Sandy Maguire

Bidirectional Instance Contexts

Just a quick one today, but I wanted to point out a little trick you can do with Haskell’s typeclass inference.

Imagine we have some little class, the details of which matter not in the least:

type Foo :: Type -> Constraint
class Foo a where
  ...

We can give some instances of this type:

instance Foo Int where ...
instance Foo Bool where ...
instance Foo () where ...

Regular, everyday stuff. But the instances for type constructors are more interesting, because they come with an instance context:

instance (Foo a, Foo b) => Foo (a, b) where ...

Then, of course, if we know both Foo a and Foo b, we can infer Foo (a, b). To make this fact overwhelmingly explicit, we can reify the usual constraint-solving logic by using the Dict type, and thus the following program will typecheck:

import Data.Constraint

forwards
  :: Dict (Foo a)
  -> Dict (Foo b)
  -> Dict (Foo (a, b))
forwards Dict Dict = Dict

Perhaps tipped off by the name, the gentle reader is asked to notice the asymmetry here, since the converse program will not typecheck:

backwards
  :: Dict (Foo (a, b))
  -> (Dict (Foo a), Dict (Foo b))
backwards Dict = (Dict, Dict)

But why should it not typecheck?1 Recall from the relevant instance definition that these instances must, in fact, exist:

instance (Foo a, Foo b) => Foo (a, b)

As a testament to just how good GHC is, we can support this bidirectionality via a minor tweak to the definition of class and its instances.

The trick is to add an associated type family to Foo, and to use it as a superclass constraint:

type Foo :: Type -> Constraint
class Evidence a => Foo a where
  type Evidence a :: Constraint
  type Evidence a = ()
  ...

Because we’ve given a default implementation of the type family, our existing simple instances work as before:

instance Foo Int where ...
instance Foo Bool where ...
instance Foo () where ...

with the only change required coming from the type constructor instances:

instance (Foo a, Foo b) => Foo (a, b) where
  type Evidence (a, b) = (Foo a, Foo b)
  ...

or, if you want to be cute about it:

instance Evidence (a, b) => Foo (a, b) where
  type Evidence (a, b) = (Foo a, Foo b)
  ...

By sticking Evidence into the superclass constraint, GHC knows that this dictionary is always available when you’ve got a Foo dictionary around. And our earlier backwards program now typechecks as expected.

This is all available in a play session if you’d like to fool around with it.
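If you want to reproduce it locally, this is roughly the header I would expect it to need (an assumption on my part, not from the post):

{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE UndecidableSuperClasses #-}

import Data.Constraint (Dict (..))
import Data.Kind (Constraint, Type)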


  1. Rhetorical question. I don’t want to hear about orphans or overlapping instances or whatever.↩︎

February 15, 2025 02:15 AM

February 13, 2025

Oleg Grenrus

PHOAS to de Bruijn conversion

Posted on 2025-02-13 by Oleg Grenrus agda

Recently I looked again at PHOAS, and once again I concluded it's nice for library APIs, but so painful to do anything with inside those libraries. So let's convert to something else, like de Bruijn.

There are standalone source files if you just want to see the code.

How to convert PHOAS terms to de Bruijn terms?

The solution is hard to find.

You can cheat, as mentioned by Roman on the Agda mailing list (https://lists.chalmers.se/pipermail/agda/2018/010033.html):

There is always a way to cheat, though. You can turn the PHOAS -> untyped de Bruijn machinery into the PHOAS -> typed de Bruijn machinery by checking that future contexts indeed extend past contexts and throwing an error otherwise (which can't happen, because future contexts always extend past contexts, but it's a metatheorem).

In "Generic Conversions of Abstract Syntax Representation" by Steven Keuchel and Johan Jeuring, authors also "cheat" a bit. The "Parametrhic higher-order abstract syntax" section ends with a somewhat disappointing

  where postulate whatever : _

Keuchel and Jeuring also mention "Unembedding Domain-Specific Languages" by Robert Atkey, Sam Lindley and Jeremy Yallop; where there is one unsatisfactory ⊥ (undefined in Haskell) hiding.

I think that for practical developments (say, a library in Haskell), it is OK to take a small shortcut; but I kept wondering whether there is a way to do the conversion without cheating.

Well... it turns out that we cannot "cheat". Well-formedness of the PHOAS representation depends on parametricity, and the conversion challenge seems to require a theorem for which there is no proof in Agda.

In unpublished (?) work, Adam Chlipala shows a way to do the conversion without relying on postulates (http://adam.chlipala.net/cpdt/html/Intensional.html), but that procedure requires an extra well-formedness proof of the given PHOAS term.

This Agda development is a translation of that development.

Common setup

Our syntax representations will be well-typed, so we need types:

-- Types
data Ty : Set where
  emp : Ty
  fun : Ty → Ty → Ty

Ctx : Set
Ctx = List Ty

variable
  A B C : Ty
  Γ Δ Ω : Ctx
  v : Ty → Set

de Bruijn syntax

Var : Ctx → Ty → Set
Var Γ A = Idx A Γ -- from agda-np, essentially the membership relation.

data DB (Γ : Ctx) : Ty → Set where
  var : Var Γ A → DB Γ A
  app : DB Γ (fun A B) → DB Γ A → DB Γ B
  lam : DB (A ∷ Γ) B → DB Γ (fun A B)
  abs : DB Γ emp → DB Γ A

Parametric Higher-order abstract syntax

data PHOAS (v : Ty → Set) : Ty → Set where
  var : v A → PHOAS v A
  app : PHOAS v (fun A B) → PHOAS v A → PHOAS v B
  lam : (v A → PHOAS v B) → PHOAS v (fun A B)
  abs : PHOAS v emp → PHOAS v A

-- closed "true" PHOAS terms.
PHOAS° : Ty → Set₁
PHOAS° A = ∀ {v} → PHOAS v A

de Bruijn to PHOAS

This direction is trivial, which is anecdotal evidence that the de Bruijn representation is easier to do transformations on.

phoasify : NP v Γ → DB Γ A → PHOAS v A
phoasify γ (var x)   = var (lookup γ x)
phoasify γ (app f t) = app (phoasify γ f) (phoasify γ t)
phoasify γ (lam t)   = lam λ x → phoasify (x ∷ γ) t
phoasify γ (abs t)   = abs (phoasify γ t)

Interlude: Well-formedness of PHOAS terms

Adam Chlipala defines an equivalence relation between two PHOAS terms (exp_equiv in Intensional, wf in the CPDT book). We only need single-term well-formedness, so we can do a little less.

The goal is to rule out standalone terms like

module Invalid where
  open import Data.Unit using (⊤; tt)

  invalid : PHOAS (λ _ → ⊤) emp
  invalid = var tt

Terms like invalid cannot be values of PHOAS°, as all values of v inside PHOAS° have to originate from lam-constructor abstractions. We really should keep the v parameter free, i.e. parametric, when constructing PHOAS terms.

The idea is then simply to track which variables (values of v) are introduced by lambda abstraction.

data phoasWf {v : Ty → Set} (G : List (Σ Ty v)) : {A : Ty} → PHOAS v A → Set
 where
  varWf : ∀ {A} {x : v A}
    → Idx (A , x) G
    → phoasWf G (var x)
  appWf : ∀ {A B} {f : PHOAS v (fun A B)} {t : PHOAS v A}
    → phoasWf G f
    → phoasWf G t
    → phoasWf G (app f t)
  lamWf : ∀ {A B} {f : v A → PHOAS v B}
    → (∀ (x : v A) → phoasWf ((A , x) ∷ G) (f x))
    → phoasWf G (lam f)
  absWf : ∀ {A} {t : PHOAS v emp}
    → phoasWf G t
    → phoasWf G (abs {A = A} t)

-- closed terms start with an empty G
phoasWf° : PHOAS° A → Set
phoasWf° tm = ∀ {v} → phoasWf {v = v} [] tm

A meta-theorem is then that all PHOAS° terms are well-formed, i.e.

meta-theorem-proposition : Set₁
meta-theorem-proposition = ∀ {A} (t : PHOAS° A) → phoasWf° t

As far as I'm aware, this proposition can be neither proved nor refuted in Agda.

de Bruijn to PHOAS translation creates well-formed PHOAS terms.

As a small exercise we can show that phoasify of closed de Bruijn terms creates well-formed PHOAS terms.

toList : NP v Γ → List (Σ Ty v)
toList []       = []
toList (x ∷ xs) = (_ , x) ∷ toList xs

phoasifyWfVar : (γ : NP v Γ) (x : Var Γ A) → Idx (A , lookup γ x) (toList γ)
phoasifyWfVar (x ∷ γ) zero    = zero
phoasifyWfVar (x ∷ γ) (suc i) = suc (phoasifyWfVar γ i)

phoasifyWf : (γ : NP v Γ) (t : DB Γ A) → phoasWf (toList γ) (phoasify γ t)
phoasifyWf γ (var x)   = varWf (phoasifyWfVar γ x)
phoasifyWf γ (app f t) = appWf (phoasifyWf γ f) (phoasifyWf γ t)
phoasifyWf γ (lam t)   = lamWf λ x → phoasifyWf (x ∷ γ) t
phoasifyWf γ (abs t)   = absWf (phoasifyWf γ t)

phoasifyWf° : (t : DB [] A) → phoasWf° (phoasify [] t)
phoasifyWf° t = phoasifyWf [] t

PHOAS to de Bruijn

The rest deals with the opposite direction.

In Intensional, Adam Chlipala uses the v = λ _ → ℕ instantiation to make the translation.

I think that in the typed setting using v = λ _ → Ctx turns out nicer.

The idea in both is that we instantiate PHOAS variables to be de Bruijn levels.

data IsSuffixOf {ℓ} {a : Set ℓ} : List a → List a → Set ℓ where
  refl : ∀ {xs} → IsSuffixOf xs xs
  cons : ∀ {xs ys} → IsSuffixOf xs ys → ∀ {y} → IsSuffixOf xs (y ∷ ys)

We need to establish well-formedness of a PHOAS expression in relation to some context Γ.

Note that variables encode de Bruijn levels, thus the contexts we "remember" in variables should be suffixes of the outside context.

wf : (Γ : Ctx) → PHOAS (λ _ → Ctx) A → Set
wf {A = A} Γ (var Δ)         = IsSuffixOf (A ∷ Δ) Γ
wf         Γ (app f t)       = wf Γ f × wf Γ t
wf         Γ (lam {A = A} t) = wf (A ∷ Γ) (t Γ)
wf         Γ (abs t)         = wf Γ t

And if (A ∷ Δ) is a suffix of the context Γ, we can convert the evidence to a de Bruijn index (i.e. a variable):

makeVar : IsSuffixOf (A ∷ Δ) Γ → Var Γ A
makeVar refl     = zero
makeVar (cons s) = suc (makeVar s)

Given that the term is well-formed in relation to the context Γ, we can convert it to de Bruijn representation.

dbify : (t : PHOAS (λ _ → Ctx) A) → wf Γ t → DB Γ A
dbify         (var x)   wf        = var (makeVar wf)
dbify         (app f t) (fʷ , tʷ) = app (dbify f fʷ) (dbify t tʷ)
dbify {Γ = Γ} (lam t)   wf        = lam (dbify (t Γ) wf)
dbify         (abs t)   wf        = abs (dbify t wf)

What is left is to show that we can construct wf for all phoasWf-well-formed terms.

Adam Chlipala defines a helper function:

makeG′ : Ctx → List (Σ Ty (λ _ → Ctx))
makeG′ [] = []
makeG′ (A ∷ Γ) = (A , Γ) ∷ makeG′ Γ

However, for somewhat technical reasons, we instead define

expand : (Γ : Ctx) → NP (λ _ → Ctx) Γ
expand []      = []
expand (_ ∷ Γ) = Γ ∷ expand Γ

and use expand with previously defined toList to define our version of makeG:

makeG : Ctx → List (Σ Ty (λ _ → Ctx))
makeG Γ = toList (expand Γ)

makeG and makeG′ are the same:

toList∘expand≡makeG : ∀ Γ → makeG Γ ≡ makeG′ Γ
toList∘expand≡makeG []      = refl
toList∘expand≡makeG (A ∷ Γ) = cong ((A , Γ) ∷_) (toList∘expand≡makeG Γ)

Then we can construct wf for all phoasWf:

wfWfVar : Idx (A , Δ) (makeG Γ) → IsSuffixOf (A ∷ Δ) Γ
wfWfVar {Γ = B ∷ Γ} zero    = refl
wfWfVar {Γ = B ∷ Γ} (suc i) = cons (wfWfVar i)

wfWf : (t : PHOAS (λ _ → Ctx) A) → phoasWf (makeG Γ) t → wf Γ t
wfWf         (var x)   (varWf xʷ)    = wfWfVar xʷ
wfWf         (app f t) (appWf fʷ tʷ) = wfWf f fʷ , wfWf t tʷ
wfWf {Γ = Γ} (lam f)   (lamWf fʷ)    = wfWf (f Γ) (fʷ Γ)
wfWf         (abs t)   (absWf tʷ)    = wfWf t tʷ

And finally we define dbify° for all well-formed PHOAS° terms.

dbify° : (t : PHOAS° A) → phoasWf° t → DB [] A
dbify° t w = dbify t (wfWf t w)

Bonus section

We can show that converting closed de Bruijn term to PHOAS and back is an identity function:

bonus-var : (x : Var Γ A) → x ≡ makeVar (wfWfVar (phoasifyWfVar (expand Γ) x))
bonus-var {Γ = A ∷ Γ} zero    = refl
bonus-var {Γ = A ∷ Γ} (suc i) = cong suc (bonus-var i)

bonus : (t : DB Γ A)
      → t ≡ dbify (phoasify (expand Γ) t)
               (wfWf (phoasify (expand Γ) t) (phoasifyWf _ t))
bonus (var x)   = cong var (bonus-var x)
bonus (app f t) = cong₂ app (bonus f) (bonus t)
bonus (lam t)   = cong lam (bonus t)
bonus (abs t)   = cong abs (bonus t)

bonus° : ∀ (t : DB [] A) → t ≡ dbify° (phoasify [] t) (phoasifyWf° t)
bonus° t = bonus t

February 13, 2025 12:00 AM

February 11, 2025

Oleg Grenrus

NbE PHOAS

Posted on 2025-02-11 by Oleg Grenrus agda

Normalization by evaluation using parametric higher-order abstract syntax. In Agda.

I couldn't find a self-contained example of PHOAS NbE, so here it is. I hope someone might find it useful.

module NbEXP.PHOAS where

data Ty : Set where
  emp : Ty
  fun : Ty → Ty → Ty

data Tm (v : Ty → Set) : Ty → Set where
  var : ∀ {a} → v a → Tm v a
  app : ∀ {a b} → Tm v (fun a b) → Tm v a → Tm v b
  lam : ∀ {a b} → (v a → Tm v b) → Tm v (fun a b)

data Nf (v : Ty → Set) : Ty → Set
data Ne (v : Ty → Set) : Ty → Set

data Ne v where
  nvar : ∀ {a} → v a → Ne v a
  napp : ∀ {a b} → Ne v (fun a b) → Nf v a → Ne v b

data Nf v where
  neut : Ne v emp → Nf v emp
  nlam : ∀ {a b} → (v a → Nf v b) → Nf v (fun a b)

Sem : (Ty → Set) → Ty → Set
Sem v emp       = Ne v emp
Sem v (fun a b) = Sem v a → Sem v b

lower : ∀ {v : Ty → Set} (a : Ty) → Sem v a → Nf v a
raise : ∀ {v : Ty → Set} (a : Ty) → Ne v a → Sem v a

lower emp       s = neut s
lower (fun a b) s = nlam λ x → lower b (s (raise a (nvar x)))

raise emp       n   = n
raise (fun a b) n x = raise b (napp n (lower a x))

eval : {v : Ty → Set} {a : Ty} → Tm (Sem v) a → Sem v a
eval (var x)   = x
eval (app f t) = eval f (eval t)
eval (lam t) x = eval (t x)

nf : {a : Ty} → {v : Ty → Set} → Tm (Sem v) a → Nf v a
nf {a} t = lower a (eval t)

nf_parametric : {a : Ty} → ({v : Ty → Set} → Tm v a) → ({v : Ty → Set} → Nf v a)
nf_parametric t = nf t
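
As an aside of mine (not part of the original post): the same construction can be transliterated to Haskell, where a singleton STy replaces Agda's pattern matching on types, and a newtype wrapper is needed because type families cannot be partially applied. A rough sketch:

{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeFamilies #-}
import Data.Kind (Type)

data Ty = Emp | Fun Ty Ty

-- singleton, since Haskell cannot pattern-match on a type directly
data STy (a :: Ty) where
  SEmp :: STy 'Emp
  SFun :: STy a -> STy b -> STy ('Fun a b)

data Tm (v :: Ty -> Type) (a :: Ty) where
  Var :: v a -> Tm v a
  App :: Tm v ('Fun a b) -> Tm v a -> Tm v b
  Lam :: (v a -> Tm v b) -> Tm v ('Fun a b)

data Ne (v :: Ty -> Type) (a :: Ty) where
  NVar :: v a -> Ne v a
  NApp :: Ne v ('Fun a b) -> Nf v a -> Ne v b

data Nf (v :: Ty -> Type) (a :: Ty) where
  Neut :: Ne v 'Emp -> Nf v 'Emp
  NLam :: (v a -> Nf v b) -> Nf v ('Fun a b)

type family Sem (v :: Ty -> Type) (a :: Ty) where
  Sem v 'Emp       = Ne v 'Emp
  Sem v ('Fun a b) = Sem v a -> Sem v b

-- type families cannot be partially applied, hence this wrapper
newtype Sem' v a = Sem' { unSem' :: Sem v a }

lower :: STy a -> Sem v a -> Nf v a
lower SEmp       s = Neut s
lower (SFun a b) s = NLam (\x -> lower b (s (raise a (NVar x))))

raise :: STy a -> Ne v a -> Sem v a
raise SEmp       n = n
raise (SFun a b) n = \x -> raise b (NApp n (lower a x))

eval :: Tm (Sem' v) a -> Sem v a
eval (Var x)   = unSem' x
eval (App f t) = eval f (eval t)
eval (Lam t)   = \x -> eval (t (Sem' x))

nf :: STy a -> Tm (Sem' v) a -> Nf v a
nf a t = lower a (eval t)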

February 11, 2025 12:00 AM

February 10, 2025

Oskar Wickström

Machine: Learning; Human: Unlearning;

This last month has been fascinating. I guess LLMs have finally resonated with me on a deeper level. It wasn’t like I woke up and suddenly everything was different, but their impact is growing on me non-linearly, forcing me to rewire my brain.

February 10, 2025 11:00 PM

February 06, 2025

Philip Wadler

I've been nominated for a teaching award


I've been fortunate to be nominated for a few teaching awards over my career, and even to win a couple. The nomination I just received may be the best.

As a new student at the uni, Philip Wadler was the first introductory lecture I had, and his clear passion for the subject made me feel excited to begin my journey in computer science. In particular he emphasised the importance of asking questions, which made the idea of tutorials and lectures a lot less intimidating, and went on to give really valuable advice for starting university. I enjoyed this session so much, and so was looking forward to the guest lectures he was going to do for Inf1A at the end of semester 1. They certainly did not disappoint, the content he covered was engaging, interesting, and above all very entertaining to listen to, especially when he dressed up as a superhero to cement his point. Because I found these talks so rewarding, I also attended the STMU that he spoke at about AI and ChatGPT, and everyone I talked to after the event said they had a really good time whilst also having a completely new insightful perspective on the topic. In summary, Philip Wadler has delivered the best lectures I have attended since starting university, and I have gotten a lot out of them.

Thank you, anonymous first-year student! 

by Philip Wadler (noreply@blogger.com) at February 06, 2025 10:07 PM

February 04, 2025

Michael Snoyman

Who pays a tax?

President Trump has started rolling out his tariffs, something I blogged about in November. People are talking about these tariffs a lot right now, with many people (correctly) commenting on how consumers will end up with higher prices as a result of these tariffs. While that part is true, I’ve seen a lot of people taking it to the next, incorrect step: that consumers will pay the entirety of the tax. I put up a poll on X to see what people thought, and while the right answer got a lot of votes, it wasn't the winner.

For purposes of this blog post, our ultimate question will be the following:

  • Suppose apples currently sell for $1 each in the entire United States.
  • There are domestic sellers and foreign sellers of apples, all receiving the same price.
  • There are no taxes or tariffs on the purchase of apples.
  • The question is: if the US federal government puts a $0.50 import tariff per apple, what will be the change in the following:
    • Number of apples bought in the US
    • Price paid by buyers for apples in the US
    • Post-tax price received by domestic apple producers
    • Post-tax price received by foreign apple producers

Before we can answer that question, we need to ask an easier, first question: before instituting the tariff, why do apples cost $1?

And finally, before we dive into the details, let me provide you with the answers to the ultimate question. I recommend you try to guess these answers before reading this, and if you get it wrong, try to understand why:

  1. The number of apples bought will go down
  2. The buyers will pay more for each apple they buy, but not the full amount of the tariff
  3. Domestic apple sellers will receive a higher price per apple
  4. Foreign apple sellers will receive a lower price per apple, but not lowered by the full amount of the tariff

In other words, regardless of who sends the payment to the government, both taxed parties (domestic buyers and foreign sellers) will absorb some of the costs of the tariff, while domestic sellers will benefit from the protectionism provided by tariffs and be able to sell at a higher price per unit.

Marginal benefit

All of the numbers discussed below are part of a helper Google Sheet I put together for this analysis. Also, apologies about the jagged lines in the charts below, I hadn’t realized before starting on this that there are some difficulties with creating supply and demand charts in Google Sheets.

Let’s say I absolutely love apples, they’re my favorite food. How much would I be willing to pay for a single apple? You might say “$1, that’s the price in the supermarket,” and in many ways you’d be right. If I walk into supermarket A, see apples on sale for $50, and know that I can buy them at supermarket B for $1, I’ll almost certainly leave A and go buy at B.

But that’s not what I mean. What I mean is: how high would the price of apples have to go everywhere so that I’d no longer be willing to buy a single apple? This is a purely personal, subjective opinion. It’s impacted by how much money I have available, other expenses I need to cover, and how much I like apples. But let’s say the number is $5.

How much would I be willing to pay for another apple? Maybe another $5. But how much am I willing to pay for the 1,000th apple? 10,000th? At some point, I’ll get sick of apples, or run out of space to keep the apples, or not be able to eat, cook, and otherwise preserve all those apples before they rot.

The point being: I’ll be progressively willing to spend less and less money for each apple. This form of analysis is called marginal benefit: how much benefit (expressed as dollars I’m willing to spend) will I receive from each apple? This is a downward sloping function: for each additional apple I buy (quantity demanded), the price I’m willing to pay goes down. This is what gives my personal demand curve. And if we aggregate demand curves across all market participants (meaning: everyone interested in buying apples), we end up with something like this:

Demand curve before tax

Assuming no changes in people’s behavior and other conditions in the market, this chart tells us how many apples will be purchased by our buyers at each price point between $0.50 and $5. And ceteris paribus (all else being equal), this will continue to be the demand curve for apples.

Marginal cost

Demand is half the story of economics. The other half is supply, or: how many apples will I sell at each price point? Supply curves are upward sloping: the higher the price, the more a person or company is willing and able to sell a product.

Let’s understand why. Suppose I have an apple orchard. It’s a large property right next to my house. With about 2 minutes of effort, I can walk out of my house, find the nearest tree, pick 5 apples off the tree, and call it a day. 5 apples for 2 minutes of effort is pretty good, right?

Yes, there was all the effort necessary to buy the land, and plant the trees, and water them… and a bunch more than I likely can’t even guess at. We’re going to ignore all of that for our analysis, because for short-term supply-and-demand movement, we can ignore these kinds of sunk costs. One other simplification: in reality, supply curves often start descending before ascending. This accounts for achieving efficiencies of scale after the first number of units purchased. But since both these topics are unneeded for understanding taxes, I won’t go any further.

Anyway, back to my apple orchard. If someone offers me $0.50 per apple, I can do 2 minutes of effort and get $2.50 in revenue, which equates to a $75/hour wage for me. I’m more than happy to pick apples at that price!

However, let’s say someone comes to buy 10,000 apples from me instead. I no longer just walk out to my nearest tree. I’m going to need to get in my truck, drive around, spend the day in the sun, pay for gas, take a day off of my day job (let’s say it pays me $70/hour). The costs go up significantly. Let’s say it takes 5 days to harvest all those apples myself, it costs me $100 in fuel and other expenses, and I lose out on my $70/hour job for 5 days. We end up with:

  • Total expenditure: $100 + $70 * 8 hours a day * 5 days == $2900
  • Total revenue: $5000 (10,000 apples at $0.50 each)
  • Total profit: $2100

So I’m still willing to sell the apples at this price, but it’s not as attractive as before. And as the number of apples purchased goes up, my costs keep increasing. I’ll need to spend more money on fuel to travel more of my property. At some point I won’t be able to do the work myself anymore, so I’ll need to pay others to work on the farm, and they’ll be slower at picking apples than me (less familiar with the property, less direct motivation, etc.). The point being: at some point, the number of apples can go high enough that the $0.50 price point no longer makes me any money.

This kind of analysis is called marginal cost. It refers to the additional amount of expenditure a seller has to spend in order to produce each additional unit of the good. Marginal costs go up as quantity sold goes up. And like demand curves, if you aggregate this data across all sellers, you get a supply curve like this:

Supply curve before tariff

Equilibrium price

We now know, for every price point, how many apples buyers will purchase, and how many apples sellers will sell. Now we find the equilibrium: where the supply and demand curves meet. Beyond this point, the marginal benefit the next buyer would receive from an apple is less than the marginal cost the next seller would incur to produce it. Let’s see it in a chart:

Supply and demand before tariff

You’ll notice that these two graphs cross at the $1 price point, where 63 apples are both demanded (bought by consumers) and supplied (sold by producers). This is our equilibrium price. We also have a visualization of the surplus created by these trades. Everything to the left of the equilibrium point and between the supply and demand curves represents surplus: an area where someone is receiving something of more value than they give. For example:

  • When I bought my first apple for $1, but I was willing to spend $5, I made $4 of consumer surplus. The consumer portion of the surplus is everything to the left of the equilibrium point, between the supply and demand curves, and above the equilibrium price point.
  • When a seller sells his first apple for $1, but it only cost $0.50 to produce it, the seller made $0.50 of producer surplus. The producer portion of the surplus is everything to the left of the equilibrium point, between the supply and demand curves, and below the equilibrium price point.

Another way of thinking of surplus is “every time someone got a better price than they would have been willing to take.”

OK, with this in place, we now have enough information to figure out how to price in the tariff, which we’ll treat as a negative externality.

Modeling taxes

Alright, the government has now instituted a $0.50 tariff on every apple sold within the US by a foreign producer. We can generally model taxes by either increasing the marginal cost of each unit sold (shifting the supply curve up), or by decreasing the marginal benefit of each unit bought (shifting the demand curve down). In this case, since only some of the producers will pay the tax, it makes more sense to modify the supply curve.
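
To make the mechanics concrete, here is a toy numerical sketch in Haskell. The linear curves below are made up for illustration (they are not the data from the linked Google Sheet); the point is just the procedure: shift only the foreign supply curve up by the tariff, add the two supply curves, and scan for the new equilibrium.

-- Made-up linear curves, chosen only so the no-tariff equilibrium is
-- roughly $1.00 and 63 apples, echoing the numbers in this post.
demand, domesticSupply, foreignSupply :: Double -> Double
demand p         = max 0 (90 - 27 * p)  -- quantity demanded at price p
domesticSupply p = max 0 (21 * p)       -- domestic quantity supplied
foreignSupply p  = max 0 (42 * p)       -- foreign quantity supplied

-- The tariff shifts only the foreign curve: at consumer price p,
-- foreign sellers behave as if the price were p minus the tariff.
totalSupply :: Double -> Double -> Double
totalSupply tariff p = domesticSupply p + foreignSupply (p - tariff)

-- Scan a grid of prices for the point where supply meets demand.
equilibriumPrice :: Double -> Double
equilibriumPrice tariff =
  snd (minimum [ (abs (totalSupply tariff p - demand p), p)
               | p <- [0, 0.01 .. 5] ])

main :: IO ()
main = do
  print (equilibriumPrice 0)    -- about 1.00
  print (equilibriumPrice 0.5)  -- higher, but by less than the full 0.50

Running it reproduces the qualitative story: the consumer price rises by only part of the tariff, with the rest coming out of what foreign sellers receive.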

First, let’s see what happens to the foreign seller-only supply curve when you add in the tariff:

Foreign supply shift from tariff

With the tariff in place, for each quantity level, the price at which the seller will sell is $0.50 higher than before the tariff. That makes sense: if I was previously willing to sell my 82nd apple for $3, I would now need to charge $3.50 for that apple to cover the cost of the tariff. We see this as the tariff “pushing up” or “pushing left” the original supply curve.

We can add this new supply curve to our existing (unchanged) supply curve for domestic-only sellers, and we end up with a result like this:

Supply curves post tariff

The total supply curve adds up the individual foreign and domestic supply curves. At each price point, we add up the total quantity each group would be willing to sell to determine the total quantity supplied for each price point. Once we have that cumulative supply curve defined, we can produce an updated supply-and-demand chart including the tariff:

Supply and demand post tariff

As we can see, the equilibrium has shifted:

  • The equilibrium price paid by consumers has risen from $1 to $1.20.
  • The total number of apples purchased has dropped from 63 apples to 60 apples.
  • Consumers therefore received 3 fewer apples. They spent $72 for these 60 apples, whereas previously they spent $63 for 3 more apples, a definite decrease in consumer surplus.
  • Foreign producers sold 36 of those apples (see the raw data in the linked Google Sheet), for a gross revenue of $43.20. However, they also need to pay the tariff to the US government, which accounts for $18, meaning they only receive $25.20 post-tariff. Previously, they sold 42 apples at $1 each with no tariff to be paid, meaning they took home $42.
  • Domestic producers sold the remaining 24 apples at $1.20, giving them a revenue of $28.80. Since they don’t pay the tariff, they take home all of that money. By contrast, previously, they sold 21 apples at $1, for a take-home of $21.
  • The government receives $0.50 for each of the 60 apples sold, or in other words receives $30 in revenue it wouldn’t have received otherwise.

We could be more specific about the surpluses, and calculate the actual areas for consumer surplus, producer surplus, inefficiency from the tariff, and government revenue from the tariff. But I won’t bother, as those calculations get slightly more involved. Instead, let’s just look at the aggregate outcomes:

  • Consumers were unquestionably hurt. Their price paid went up by $0.20 per apple, and they received fewer apples.
  • Foreign producers were also hurt. Their price received went down from the original $1 to the new price of $1.20 minus the $0.50 tariff. In other words: foreign producers only receive $0.70 per apple now. This hurt can be mitigated by shifting sales to other countries without a tariff, but the pain will exist regardless.
  • Domestic producers scored. They can sell fewer apples and make more revenue doing it.
  • And the government walked away with an extra $30.

Hopefully you now see the answers to the original questions. Importantly, while the government imposed a $0.50 tariff, neither side fully absorbed that cost. Consumers paid a bit more, foreign producers received a bit less. The exact details of how that tariff was split across the groups are mediated by the relevant supply and demand curves of each group. If you want to learn more about this, the relevant search term is “price elasticity,” or how much a group’s quantity supplied or demanded will change based on changes in the price.

Other taxes

Most taxes are some kind of a tax on trade. Tariffs on apples is an obvious one. But the same applies to income tax (taxing the worker for the trade of labor for money) or payroll tax (same thing, just taxing the employer instead). Interestingly, you can use the same model for analyzing things like tax incentives. For example, if the government decided to subsidize domestic apple production by giving the domestic producers a $0.50 bonus for each apple they sell, we would end up with a similar kind of analysis, except instead of the foreign supply curve shifting up, we’d see the domestic supply curve shifting down.

And generally speaking, this is what you’ll always see with government involvement in the economy. It will disrupt an existing equilibrium, let the market readjust to a new one, and incentivize some behavior, causing some people to benefit and others to lose out. As we saw with the apple tariff, domestic producers and the government benefited while others lost.

You can see the reverse though with tax incentives. If I give a tax incentive of providing a deduction (not paying income tax) for preschool, we would end up with:

  • Government needs to make up the difference in tax revenue, either by raising taxes on others or printing more money (leading to inflation). Either way, those paying the tax or those holding the government’s debased currency will pay a price.
  • Those people who don’t use the preschool deduction will receive no benefit, so they simply pay a cost.
  • Those who do use the preschool deduction will end up paying less on tax+preschool than they would have otherwise.

This analysis is fully amoral. It’s not saying whether providing subsidized preschool is a good thing or not; it simply tells you where the costs will be felt, and points out that such government interference in free economic choice does result in inefficiencies in the system. Once you have that knowledge, you’re better educated to make a decision about whether the costs of government intervention are worth the benefits.

February 04, 2025 12:00 AM

February 02, 2025

Joachim Breitner

Coding on my eInk Tablet

For many years I wished I had a setup that would allow me to work (that is, code) productively outside in the bright sun. It’s winter right now, but when it’s summer again it’s always a bit of a shame to be stuck inside. This weekend I got closer to that goal.

TL;DR: Using code-server on a beefy machine seems to be quite neat.

Passively lit coding

Personal history

Looking back at my own old blog entries I find one from 10 years ago describing how I bought a Kobo eBook reader with the intent of using it as an external monitor for my laptop. It seems that I got a proof-of-concept setup working, using VNC, but it was tedious to set up, and I never actually used that. I subsequently noticed that the eBook reader is rather useful to read eBooks, and it has been in heavy use for that ever since.

Four years ago I gave this old idea another shot and bought an Onyx BOOX Max Lumi. This is an A4-sized tablet running Android, and it had the very promising feature of an HDMI input. So hopefully I’d attach it to my laptop and it would just work™. Turns out that this never worked as well as I hoped: Even if I set the resolution to exactly the tablet’s screen’s resolution I got blurry output, and it also drained the battery a lot, so I gave up on this. I subsequently noticed that the tablet is rather useful to take notes, and it has been in sporadic use for that.

Going off on this tangent: I later learned that the HDMI input of this device appears to the system like a camera input, and I don’t have to use Boox’s “monitor” app but could use other apps like FreeDCam as well. This somehow managed to fix the resolution issues, but the setup still wasn’t as convenient to be used regularly.

I also played around with pure terminal approaches, e.g. SSH’ing into a system, but since my usual workflow was never purely text-based (I was at least used to using a window manager instead of a terminal multiplexer like screen or tmux) that never led anywhere either.

VSCode, working remotely

Since these attempts I have started a new job working on the Lean theorem prover, and working on or with Lean basically means using VSCode. (There is a very good neovim plugin as well, but I’m using VSCode nevertheless, if only to make sure I am dogfooding our default user experience).

My colleagues have said good things about using VSCode with the remote SSH extension to work on a beefy machine, so I gave this a try now as well, and while it’s not a complete game changer for me, it does make certain tasks (rebuilding everything after switching branches, running the test suite) very convenient. And it’s a bit spooky to run these workloads without the laptop’s fan spinning up.

In this setup, the workspace is remote, but VSCode still runs locally. But it made me wonder about my old goal of being able to work reasonably efficient on my eInk tablet. Can I replicate this setup there?

VSCode itself doesn’t run on Android directly. There are projects that run a Linux chroot or run in termux on the Android system, and then you can use VNC to connect to it (e.g. on Andronix)… but that did not seem promising. It seemed fiddly, and I probably should take it easy on the tablet’s system.

code-server, running remotely

A more promising option is code-server. This is a fork of VSCode (actually of VSCodium) that runs completely on the remote machine, and the client machine just needs a browser. I set that up this weekend and found that I was able to do a little bit of work reasonably well.

Access

With code-server one has to decide how to expose it safely enough. I decided against the tunnel-over-SSH option, as I expected that to be somewhat tedious to set up (both initially and for each session) on the Android system, and I liked the idea of being able to use any device to work in my environment.

I also decided against the more involved “reverse proxy behind proper hostname with SSL” setups, because they involve a few extra steps, and some of them I cannot do as I do not have root access on the shared beefy machine I wanted to use.

That left me with the option of using code-server’s built-in support for self-signed certificates and a password:

$ cat .config/code-server/config.yaml
bind-addr: 1.2.3.4:8080
auth: password
password: xxxxxxxxxxxxxxxxxxxxxxxx
cert: true

With trust-on-first-use this seems reasonably secure.

Update: I noticed that the browsers would forget that I trust this self-signed cert after restarting the browser, and also that I cannot “install” the page (as a Progressive Web App) unless it has a valid certificate. But since I don’t have superuser access to that machine, I can’t just follow the official recommendation of using a reverse proxy on port 80 or 443 with automatic certificates. Instead, I pointed a hostname that I control to that machine, obtained a certificate manually on my laptop (using acme.sh) and copied the files over, so the configuration now reads as follows:

bind-addr: 1.2.3.4:3933
auth: password
password: xxxxxxxxxxxxxxxxxxxxxxxx
cert: .acme.sh/foobar.nomeata.de_ecc/foobar.nomeata.de.cer
cert-key: .acme.sh/foobar.nomeata.de_ecc/foobar.nomeata.de.key

(This is getting very specific to my particular needs and constraints, so I’ll spare you the details.)

Service

To keep code-server running I created a systemd service that’s managed by my user’s systemd instance:

~ $ cat ~/.config/systemd/user/code-server.service
[Unit]
Description=code-server
After=network-online.target

[Service]
Environment=PATH=/home/joachim/.nix-profile/bin:/nix/var/nix/profiles/default/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
ExecStart=/nix/var/nix/profiles/default/bin/nix run nixpkgs#code-server

[Install]
WantedBy=default.target

(I am using nix as a package manager on a Debian system there, hence the additional PATH and complex ExecStart. If you have a more conventional setup then you do not have to worry about Environment and can likely use ExecStart=code-server.)

For this to survive me logging out I had to ask the system administrator to run loginctl enable-linger joachim, so that systemd allows my jobs to linger.

Git credentials

The next issue to be solved was how to access the git repositories. The work is all on public repositories, but I still need a way to push my work. With the classic VSCode-SSH-remote setup from my laptop, this is no problem: My local SSH key is forwarded using the SSH agent, so I can seamlessly use that on the other side. But with code-server there is no SSH key involved.

I could create a new SSH key and store it on the server. That did not seem appealing, though, because SSH keys on Github always have full access. It wouldn’t be horrible, but I still wondered if I can do better.

I thought of creating fine-grained personal access tokens that allow me to push code to specific repositories, and nothing else, and just storing them permanently on the remote server. Still a neat and convenient option, but creating PATs for our org requires approval and I didn’t want to bother anyone on the weekend.

So I am experimenting with Github’s git-credential-manager now. I have configured it to use git’s credential cache with an elevated timeout, so that once I log in, I don’t have to again for one workday.

$ nix-env -iA nixpkgs.git-credential-manager
$ git-credential-manager configure
$ git config --global credential.credentialStore cache
$ git config --global credential.cacheOptions "--timeout 36000"

To log in, I have to visit https://github.com/login/device on an authenticated device (e.g. my phone) and enter an 8-character code. Not too shabby in terms of security. I only wish that webpage would not require me to press Tab after each character…

This still grants rather broad permissions to the code-server, but at least only temporarily.

Android setup

On the client side I could now open https://host.example.com:8080 in Firefox on my eInk Android tablet, click through the warning about self-signed certificates, log in with the fixed password mentioned above, and start working!

I switched to a theme that supposedly is eInk-optimized (eInk by Mufanza). It’s not perfect (e.g. git diffs are unhelpful because it is not possible to distinguish deleted from added lines), but it’s a start. There are more eInk themes on the official Visual Studio Marketplace, but because code-server is a fork it cannot use that marketplace, and for example this theme isn’t on Open-VSX.

For some reason the F11 key doesn’t work, but going fullscreen is crucial, because screen estate is scarce in this setup. I can go fullscreen using VSCode’s command palette (Ctrl-P) and invoking the command there, but Firefox often jumps out of the fullscreen mode, which is annoying. I still have to pay attention to when that’s happening; maybe it’s the Esc key, which I am of course using a lot due to me using vim bindings.

A more annoying problem was that on my Boox tablet, sometimes the on-screen keyboard would pop up, which is seriously annoying! It took me a while to track this down: The Boox has two virtual keyboards installed: The usual Google AOSP keyboard, and the Onyx Keyboard. The former is clever enough to stay hidden when there is a physical keyboard attached, but the latter isn’t. Moreover, pressing Shift-Ctrl on the physical keyboard rotates through the virtual keyboards. Now, VSCode has many keyboard shortcuts that require Shift-Ctrl (especially on an eInk device, where you really want to avoid using the mouse). And the limited settings exposed by the Boox Android system do not allow you to configure that or disable the Onyx keyboard! To solve this, I had to install the KISS Launcher, which would allow me to see more Android settings, and in particular allow me to disable the Onyx keyboard. So this is fixed.

I was hoping to improve the experience even more by opening the web page as a Progressive Web App (PWA), as described in the code-server FAQ. Unfortunately, that did not work. Firefox on Android did not recognize the site as a PWA (even though it recognizes a PWA test page). And I couldn’t use Chrome either because (unlike Firefox) it would not consider a site with a self-signed certificate as a secure context, and then code-server does not work fully. Maybe this is just some bug that gets fixed in later versions.

Now that I use a proper certificate, I can use it as a Progressive Web App, and with Firefox on Android this starts the app in full-screen mode (no system bars, no location bar). The F11 key still doesn’t work, and using the command palette to enter fullscreen does nothing visible, but then Esc leaves that fullscreen mode and I suddenly have the system bars again. But maybe if I just don’t do that I get the full screen experience. We’ll see.

I did not work enough with this yet to assess how much the smaller screen estate, the lack of colors and the slower refresh rate will bother me. I probably need to hide Lean’s InfoView more often, and maybe use the Error Lens extension, to avoid having to split my screen vertically.

I also cannot easily work on a park bench this way, with a tablet and a separate external keyboard. I’d need at least a table, or some additional piece of hardware that turns tablet + keyboard into some laptop-like structure that I can put on my, well, lap. There are cases for Onyx products that include a keyboard, and maybe they work on the lap, but they don’t have the Trackpoint that I have on my ThinkPad TrackPoint Keyboard II, and how can you live without that?

Conclusion

After this initial setup, chances are good that entering and using this environment is convenient enough for me to actually use it; we will see when it gets warmer.

A few bits could be better. In particular logging in and authenticating GitHub access could be both more convenient and more safe – I could imagine that when I open the page I confirm that on my phone (maybe with a fingerprint), and that temporarily grants access to the code-server and to specific GitHub repositories only. Is that easily possible?

by Joachim Breitner (mail@joachim-breitner.de) at February 02, 2025 03:07 PM

January 26, 2025

Chris Reade

PenroseKiteDart Animations

About PenroseKiteDart

Below we present some animations that illustrate operations on finite patches of Penrose’s Kite and Dart tiles.

These were created using PenroseKiteDart which is a Haskell package available on Hackage making use of the Haskell Diagrams package. For details, see the PenroseKiteDart user guide.

Penrose’s Kite and Dart tiles can produce infinite aperiodic tilings of the plane. There are legal tiling rules to ensure aperiodicity, but these rules do not guarantee that a finite tiling will not get stuck. A legal finite tiling which can be continued to cover the whole plane is called a correct tiling. The rest, which are doomed to get stuck, are called incorrect tilings. (More details can be found in the links at the end of this blog.)

Decomposition Animations

The function decompose is a total operation which is guaranteed to preserve the correctness of a finite tiling represented as a tile graph (or Tgraph). Let us start with a particular Tgraph called sunGraph which is defined in PenroseKiteDart and consists of 5 kites arranged with a common origin vertex. It is drawn using default style in figure 1 on the left. On the right of figure 1 it is drawn with both vertex labels and dotted lines for half-tile join edges.

Figure 1: sunGraph

We can decompose sunGraph three times by selecting index 3 of the infinite list of its decompositions.

    sunD3 :: Tgraph
    sunD3 = decompositions sunGraph !! 3

where we have used

    decompose :: Tgraph -> Tgraph
    
    decompositions :: Tgraph -> [Tgraph]
    decompositions = iterate decompose

The result (sunD3) is drawn in figure 2 (scaled up).

Figure 2: sunD3

The animation in figure 3 illustrates two further decompositions of sunD3 in two stages.

Figure 3: Two decompositions of sunD3

Figure 4 also illustrates two decompositions, this time starting from forcedKingD.

    forcedKingD :: Tgraph
    forcedKingD = force (decompose kingGraph)
Figure 4: Two decompositions of forcedKingD

A Composition Animation

An inverse to decomposing (namely composing) has some extra intricacies. In the literature (see for example [1] and [2]) versions of the following method are frequently described.

  • Firstly, split darts in half.
  • Secondly, glue all the short edges of the half-darts where they meet a kite (simultaneously). This will form larger scale complete darts and larger scale half kites.
  • Finally join the halves of the larger scale kites.

This works for infinite tilings, but we showed in Graphs,Kites and Darts and Theorems that this method is unsound for finite tilings. There is the trivial problem that a half-dart may not have a complete kite on its short edge. Worse still, the second step can convert a correct finite tiling into an incorrect larger scale tiling. An example of this is given in Graphs, Kites and Darts and Theorems where we also described our own safe method of composing (never producing an incorrect Tgraph when given a correct Tgraph). This composition can leave some boundary half-tiles out of the composition (called remainder half-tiles).

The animation in figure 5 shows such a composition where the remainder half-tiles are indicated with lime green edges.

Figure 5: Composition Animation

In general, compose is a partial operation as the resulting half-tiles can break some requirements for Tgraphs (namely, connectedness and no crossing boundaries). However we have shown that it is a total function on forced Tgraphs. (Forcing is discussed next.)

Forcing Animations

The process of forcing a Tgraph adds half-tiles on the boundary where only one legal choice is possible. This continues until either there are no more forced additions possible, or a clash is found showing that the tiling is incorrect. In the latter case it must follow that the initial tiling before forcing was already an incorrect tiling.

The process of forcing is animated in figure 6, starting with a 5 times decomposed kite and in figure 7 with a 5 times decomposed dart.

Figure 6: Force animation
Figure 7: Another force animation

It is natural to wonder what forcing will do with cut-down (but still correct) Tgraphs. For example, taking just the boundary faces from the final Tgraph shown in the previous animation forms a valid Tgraph (boundaryExample) shown in figure 8.

    boundaryExample :: Tgraph
    boundaryExample = runTry $ tryBoundaryFaceGraph $ force $ decompositions dartGraph !!5
Figure 8: boundaryExample

Applying force to boundaryExample just fills in the hole to recreate force (decompositions dartGraph !!5) modulo vertex numbering. To make it more interesting we tried removing further half-tiles from boundaryExample to make a small gap. Forcing this also completes the filling in of the boundary half-tiles to recreate force (decompositions dartGraph !!5). However, we can see that this filling in is constrained to preserve the required Tgraph property of no crossing boundaries, which prevents the tiling closing round a hole.

This is illustrated in the animation shown in figure 9.

Figure 9: Boundary gap animation

As another experiment, we take the boundary faces of a (five times decomposed but not forced) star. When forced this fills in the star and also expands outwards, as illustrated in figure 10.

Figure 10: Star boundary

In the final example, we pick out a shape within a correct Tgraph (ensuring the chosen half-tiles form a valid Tgraph) then animate the force process and then run the animation in both directions (by adding a copy of the frames in reverse order).

The result is shown in figure 11.

Figure 11: Heart animation

Creating Animations

Animations as gif files can be produced by the Haskell Diagrams package using the rasterific back end.

The main module should import both Diagrams.Prelude and Diagrams.Backend.Rasterific.CmdLine. This will expose the type B standing for the imported backend, and diagrams then have type Diagram B.

An animation should have type [(Diagram B, Int)] and consist of a list of frames for the animation, each paired with an integer delay (in one-hundredths of a second).

The animation can then be passed to mainWith.

module Main (main) where
    
import Diagrams.Prelude
import Diagrams.Backend.Rasterific.CmdLine

...

fig::[(Diagram B,Int)]
fig = myExampleAnimation

main :: IO ()
main = mainWith fig

If main is then compiled and run (e.g. with parameters -w 700 -o test.gif) it will produce an output file (test.gif with width 700).

Crossfade tool

The decompose and compose animations were defined using crossfade.

crossfade :: Int -> Diagram B -> Diagram B -> [Diagram B]
crossfade n d1 d2 = map blending ratios 
  where
    blending r = opacity (1-r) d1 <> opacity r d2
    ratios = map ((/ fromIntegral n) . fromIntegral) [0..n]

Thus crossfade n d1 d2 produces n+1 frames, each with d1 overlaid on d2 but with varying opacities (decreasing for d1 and increasing for d2).

Adding the same pause (say 10 hundredths of a second) to every frame can be done by applying map (,10) and this will produce an animation.
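
For example, a complete animation from two diagrams (here using hypothetical names d1 and d2) might be:

-- Hypothetical example: 21 frames crossfading from d1 to d2, each
-- paused 10 hundredths of a second ((,10) needs TupleSections).
myExampleAnimation :: [(Diagram B, Int)]
myExampleAnimation = map (,10) (crossfade 20 d1 d2)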

Force animation tool

To create force animations it was useful to create a tool to produce frames with stages of forcing.

forceFrames :: Angle Double 
            -> Int
            -> Tgraph 
            -> (Colour Double, Colour Double, Colour Double)
            -> [Diagram B]

This takes as arguments

  • an angle argument (to rotate the diagrams in the animation from the default alignment of the Tgraph),
  • an Int (for the required number of frames),
  • a Tgraph (to be forced),
  • a triple of colours for filling darts, kites and grout (edge colour), respectively.

The definition of forceFrames uses stepForce to advance forcing a given number of steps to get the intermediate Tgraphs. The total number of forcing steps will be the number of faces (half-tiles) in the final force g less the number of faces in the initial g. All the Tgraphs are drawn (using colourDKG) but the resulting diagrams must all be aligned properly. The alignment can be achieved by creating a VPatch (vertex patch) from the final Tgraph which is then rotated. All the Tgraphs can then be drawn using sub vertex patches of the final rotated one. (For details see Overlaid examples in the PenroseKiteDart user guide.)
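
By way of illustration, a hypothetical invocation (the angle, frame count, Tgraph and colours here are my stand-ins, not values from the blog) might be:

-- 50 frames of forcing myTgraph, rotated a quarter turn, with
-- dark magenta darts, gold kites, and black grout.
myForceAnimation :: [(Diagram B, Int)]
myForceAnimation =
  map (,10) (forceFrames (90 @@ deg) 50 myTgraph (darkmagenta, gold, black))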

Previous related blogs

  • PenroseKiteDart user guide – this explains how to install and use the PenroseKiteDart package.
  • Graphs,Kites and Darts and Theorems established some important results relating force, compose, decompose.
  • Empires and SuperForce – these new operations were based on observing properties of boundaries of forced Tgraphs.
  • Graphs, Kites and Darts introduced Tgraphs. This gave more details of implementation and results of early explorations. (The class Forcible was introduced subsequently).
  • Diagrams for Penrose Tiles – the first blog introduced drawing Pieces and Patches (without using Tgraphs) and provided a version of decomposing for Patches (decompPatch).

References

[1] Martin Gardner (1977) MATHEMATICAL GAMES. Scientific American, 236(1), (pages 110 to 121). http://www.jstor.org/stable/24953856

[2] Grünbaum B., Shephard G.C. (1987) Tilings and Patterns. W. H. Freeman and Company, New York. ISBN 0-7167-1193-1 (Hardback) (pages 540 to 542).

by readerunner at January 26, 2025 10:11 AM

January 25, 2025

Ken T Takusagawa

[kufstdwm] alpha-beta with transposition table as a library function

transposition table is the other elegant improvement to minimax (after alpha-beta): elegant in principle, hairy to implement in practice.

consider a generic implementation of alpha-beta game tree search with transposition table, generic enough to be applicable to any user-specified game.  what should be its API?  what features should it provide?

evaluate to infinite depth (possible because of transposition table), returning game value and line (principal variation).  intended for small games.

return the transposition table so that it can be reused for subsequent moves.

evaluate to given depth.  or, user-specified predicate of whether to stop searching, e.g., quiescence search.  quiescence search wants access to the transposition table.
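
as a concrete starting point, here is a minimal haskell sketch (mine, not a settled design) of one possible shape: full-depth negamax with the transposition table threaded through State and returned for reuse.  note it caches only exact values; sound caching under alpha-beta cutoffs must also record whether a stored value is exact or a bound, which is part of the hairiness mentioned above.

import qualified Data.Map.Strict as Map
import Control.Monad.State.Strict

-- the user describes their game; an empty successor list means terminal.
data Game pos key = Game
  { successors    :: pos -> [pos]
  , terminalValue :: pos -> Int   -- value for the side to move
  , tableKey      :: pos -> key
  }

type Table key = Map.Map key Int

-- negamax to infinite depth, memoized in the transposition table;
-- run with runState to get both the game value and the final table.
value :: Ord key => Game pos key -> pos -> State (Table key) Int
value g p = do
  cached <- gets (Map.lookup (tableKey g p))
  case cached of
    Just v  -> pure v
    Nothing -> do
      v <- case successors g p of
             [] -> pure (terminalValue g p)
             ps -> maximum <$> traverse (fmap negate . value g) ps
      modify' (Map.insert (tableKey g p) v)
      pure v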

ambitious: because of the many ways game tree search can be customized (for many examples, albeit often poorly described, see the chessprogramming wiki), structure the algorithm as a collection of components each of which can be modified and hooked together in various ways.  I have no idea what language or framework could enable this kind of software engineering, though functional programming languages seem attractive as the first thing to try.  but beware that a pure functional programming language such as Haskell easily leaks space for this kind of task, and threading state, the transposition table, through the computation may be awkward.

common customizations sacrifice accuracy (correctness or completeness) for speed.  for example, if two different evaluated positions have the same key (for example, a 64-bit Zobrist hash in chess), one can optimize by doing no transposition table collision resolution; the second position gets ignored, assumed to have already been evaluated.  the default algorithm should not do such optimizations but should allow the user to specify both safe and unsafe optimizations.

allow the search to be augmented with various statistics gathered along the way that get consumed by other user-specified parts of the algorithm.  for example, the move generator could order moves based on values of similar moves already evaluated in other parts of the tree.

provide visibility into how user customizations are working, ways to evaluate whether or not they are worth it.

by Unknown (noreply@blogger.com) at January 25, 2025 04:34 AM

January 24, 2025

Sandy Maguire

Use Monoids for Construction

There’s a common anti-pattern I see in beginner-to-intermediate Haskell programmers that I wanted to discuss today. It’s the tendency to conceptualize the creation of an object by repeated mutation. Often this takes the form of repeated insertion into an empty container, but comes up under many other guises as well.

This anti-pattern isn’t particularly surprising in its prevalence; after all, if you’ve got the usual imperative brainworms, this is just how things get built. The gang of four “builder pattern” is exactly this; you can build an empty object, and setters on such a thing change the state but return the object itself. Thus, you build things by chaining together setter methods:

Foo myFoo = new Foo().setBar().setQux(17).setZap(true);

Even if you don’t ascribe to the whole OOP design principle thing, you’re still astronomically likely to think about building data structures like this:

Doodad doodad = new Doodad();
foreach (Widget widget in widgets) {
  doodad.addWidget(widget);
}

To be more concrete, maybe instead of doodads and widgets you have BSTs and Nodes. Or dictionaries and key-value pairs. Or graphs and edges. Anywhere you look, you’ll probably find examples of this sort of code.

Maybe you’re thinking to yourself “I’m a hairy-chested functional programmer and I scoff at patterns like these.” That might be true, but perhaps you too are guilty of writing code that looks like:

foldr
    (\(k, v) m -> Map.insert k v m)
    Map.empty
  $ toKVPairs something

Just because it’s dressed up with functional combinators doesn’t mean you’re not still writing C code. To my eye, the great promise of functional programming is its potential for conceptual clarity, and repeated mutation will always fall short of the mark.

The complaint, as usual, is that repeated mutation tells you how to build something, rather than focusing on what it is you’re building. An algorithm cannot be correct in the absence of intention—after all, you must know what you’re trying to accomplish in order to know if you succeeded. What these builder patterns, for loops, and foldrs all have in common is that they are algorithms: strategies for building something.

But you’ll notice none of them come with comments. And therefore we can only ever guess at what the original author intended, based on the context of the code we’re looking at.

I’m sure this all sounds like splitting hairs, but that’s because the examples so far have been extremely simple. But what about this one?

cgo :: (a -> (UInt, UInt)) -> [a] -> [NonEmpty a]
cgo f = foldr step []
  where
    step a [] = [pure a]
    step a bss0@((b :| bs) : bss)
      | let (al, ac) = f a
      , let (bl, bc) = f b
      , al + 1 == bl && ac == bc
            = (a :| b : bs) : bss
      | otherwise = pure a : bss0

which I found by grepping through haskell-language-server for foldr, and then mangled to remove the suggestive variable names. What does this one do? Based solely on the type we can presume it’s using that function to partition the list somehow. But how? And is it correct? We’ll never know—and the function doesn’t even come with any tests!

It’s Always Monoids

The shift in perspective necessary here is to reconceptualize building-by-repeated-mutation as building-by-combining. Rather than chiseling out the object you want, instead find a way of gluing it together from simple, obviously-correct pieces.

The notion of “combining together” should evoke in you a cozy warm fuzzy feeling. Much like being in a secret pillow fort. You must come to be one with the monoid. Once you have come to embrace monoids, you will have found inner programming happiness. Monoids are a sacred, safe place, at the fantastic intersection of “overwhelmingly powerful” and yet “hard to get wrong.”

As an amazingly fast recap, a monoid is a collection of three things: some type m, some value of that type mempty, and a binary operation over that type (<>) :: m -> m -> m, subject to a bunch of laws:

∀a. mempty <> a = a = a <> mempty
∀a b c. (a <> b) <> c = a <> (b <> c)

which is to say, mempty does nothing and (<>) doesn’t care where you stick the parentheses.

If you’re going to memorize any two particular examples of monoids, it had better be these two:

instance Monoid [a] where
  mempty = []
  a <> b = a ++ b

instance (Monoid a, Monoid b) => Monoid (a, b) where
  mempty = (mempty, mempty)
  (a1, b1) <> (a2, b2) = (a1 <> a2, b1 <> b2)

The first says that lists form a monoid under the empty list and concatenation. The second says that products preserve monoids.

The list monoid instance is responsible for the semantics of the ordered, “sequency” data structures. That is, if I have some sequential flavor of data structure, its monoid instance should probably satisfy the equation toList a <> toList b = toList (a <> b). Sequency data structures are things like lists, vectors, queues, deques, that sort of thing. Data structures where, when you combine them, you assume there is no overlap.
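
For instance, Data.Sequence concatenates under (<>), so a quick check of that law (my example, not from the post) looks like:

import qualified Data.Sequence as Seq
import Data.Foldable (toList)

-- toList distributes over (<>) for Seq, as the law demands.
checkSequency :: Bool
checkSequency =
  let a = Seq.fromList [1, 2, 3 :: Int]
      b = Seq.fromList [4, 5]
  in toList (a <> b) == toList a <> toList b  -- True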

The second monoid instance here, over products, is responsible for pretty much all the other data structures. The first thing we can do with it is remember that functions are just really, really big product types, with one “slot” for every value in the domain. We can show an isomorphism between pairs and functions out of booleans, for example:

from :: (Bool -> a) -> (a, a)
from f = (f False, f True)

to :: (a, a) -> (Bool -> a)
to (a, _) False = a
to (_, a) True  = a

and under this isomorphism, we should thereby expect the Monoid a => Monoid (Bool -> a) instance to agree with Monoid a => Monoid (a, a). If you generalize this out, you get the following instance:

instance Monoid a => Monoid (x -> a) where
  mempty = \_ -> mempty
  f <> g = \x -> f x <> g x

which combines values in the codomain monoidally. We can show the equivalence between this monoid instance and our original product preservation:

  from f <> from g
= (f False,  f True) <> (g False, g True)
= (f False <> g False, f True <> g True)
= ((f <> g) False, (f <> g) True)
= from (f <> g)

and

  to (a11, a12) <> to (a21, a22)
= \x -> to (a11, a12) x <> to (a21, a22) x
= \x -> case x of
    False -> to (a11, a12) False <> to (a21, a22) False
    True  -> to (a11, a12) True  <> to (a21, a22) True
= \x -> case x of
    False -> a11 <> a21
    True  -> a12 <> a22
= \x -> to (a11 <> a21, a12 <> a22) x
= to (a11 <> a21, a12 <> a22)

which is a little proof that our function monoid agrees with the preservation-of-products monoid. The same argument works for any type x in the domain of the function, but showing it generically is challenging.

Anyway, I digress.

The reason to memorize this Monoid instance is that it’s the monoid instance that every data structure is trying to be. Recall that almost all data structures are merely different encodings of functions, designed to make some operations more efficient than they would otherwise be.

Don’t believe me? A Map k v is an encoding of the function k -> Maybe v optimized to efficiently query which k values map to Just something. That is to say, it’s a sparse representation of a function.

From Theory to Practice

What does all of this look like in practice? Stuff like worrying about foldr is surely programming-in-the-small, which is worth knowing, but isn’t the sort of thing that turns the tides of a successful application.

The reason I’ve been harping on about the function and product monoids is that they are compositional. The uninformed programmer will be surprised by just how far one can get by composing these things.

At work, we need to reduce a tree (+ nonlocal references) into an honest-to-goodness graph. While we’re doing it, we need to collect certain nodes. And the tree has a few constructors which semantically change the scope of their subtrees, so we need to preserve that information as well.

It’s actually quite the exercise to sketch out an algorithm that will accomplish all of these goals when you’re thinking about explicit mutation. Our initial attempts at implementing this were clumsy. We’d fold the tree into a graph, adding fake nodes for the Scope constructors. Then we’d filter all the nodes in the graph, trying to find the ones we needed to collect. Then we’d do a graph traversal from the root, trying to find these Scope nodes, and propagating their information downstream.

Rather amazingly, this implementation kinda sorta worked! But it was slow, and took \(O(10k)\) SLOC to implement.

The insight here is that everything we needed to collect was monoidal:

data Solution = Solution
  { graph :: Graph
  , collectedNodes :: Set Node
  , metadata :: Map Node Metadata
  }
  deriving stock (Generic)
  deriving (Semigroup, Monoid) via Generically Solution

where the deriving (Semigroup, Monoid) via Generically Solution stanza gives us the semigroup and monoid instances that we’d expect from Solution being the product of a bunch of other monoids.

And now for the coup de grace: we hook everything up with the Writer monad. Writer is a chronically slept-on type, because most people seem to think it’s useful only for logging, and underwhelming at logging compared to a real logger type. But the charm is in the details:

instance Monoid w => Monad (Writer w)

Writer w is a monad whenever w is a monoid, which makes it the perfect monad for solving data-structure-creation problems like the one we’ve got in mind. Such a thing gives rise to a few helper functions:

collectNode :: MonadWriter Solution m => Node -> m ()
collectNode n = tell $ mempty { collectedNodes = Set.singleton n }

addMetadata :: MonadWriter Solution m => Node -> Metadata -> m ()
addMetadata n m = tell $ mempty { metadata = Map.singleton n m }

emitGraphFragment :: MonadWriter Solution m => Graph -> m ()
emitGraphFragment g = tell $ mempty { graph = g }

each of which is responsible for adding a little piece to the final solution. Our algorithm is thus a function of the type:

algorithm
  :: Metadata
  -- ^ the current scope
  -> Tree
  -- ^ the tree we're reducing
  -> Writer Solution Node
  -- ^ our partial solution, and the node corresponding to the root of the tree

which traverses the Tree, recursing with a different Metadata whenever it comes across a Scope constructor, and calling our helper functions as it goes. At each step of the way, the only thing it needs to return is the root Node of the section of the graph it just built, which recursive calls can use to break up the problem into inductive pieces.

This new implementation is roughly 20x smaller, coming in at \(O(500)\) SLOC, and was free of all the bugs we’d been diligently trying to squash under the previous implementation.

Chalk it down to another win for induction!

January 24, 2025 09:35 AM

January 23, 2025

Brent Yorgey

You could have invented Fenwick trees

Posted on January 23, 2025

My paper, You could have invented Fenwick trees, has just been published as a Functional Pearl in the Journal of Functional Programming. This blog post is an advertisement for the paper, which presents a novel way to derive the Fenwick tree data structure from first principles.

Suppose we have a sequence of integers \(a_1, \dots, a_n\) and want to be able to perform two operations:

  • we can update any \(a_i\) by adding some value \(v\) to it; or
  • we can perform a range query, which asks for the sum of the values \(a_i + \dots + a_j\) for any range \([i,j]\).

There are several ways to solve this problem. For example:

  1. We could just keep the sequence of integers in a mutable array. Updating is \(O(1)\), but range queries are \(O(n)\) since we must actually loop through the range and add up all the values.
  2. We could keep a separate array of prefix sums on the side, so that \(P_i\) stores the sum \(a_1 + \dots + a_i\). Then the range query on \([i,j]\) can be computed as \(P_j - P_{i-1}\), which only takes \(O(1)\); however, updates now take \(O(n)\) since we must also update all the prefix sums which include the updated element.
  3. We can get the best of both worlds using a segment tree, a binary tree storing the elements at the leaves, with each internal node caching the sum of its children. Then both update and range query can be done in \(O(\lg n)\).

I won’t go through the details of this third solution here, but it is relatively straightforward to understand and implement, especially in a functional language.
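
For concreteness, here is a minimal Haskell sketch of such a segment tree (my own illustration, not code from the paper); update and range each walk only root-to-leaf paths, hence \(O(\lg n)\):

-- A segment tree caching each node's sum; leaves hold the a_i.
data SegTree = Leaf Integer | Node Integer SegTree SegTree

total :: SegTree -> Integer
total (Leaf v)     = v
total (Node s _ _) = s

-- An all-zero tree over indices [lo..hi].
build :: Int -> Int -> SegTree
build lo hi
  | lo == hi  = Leaf 0
  | otherwise = Node 0 (build lo m) (build (m + 1) hi)
  where m = (lo + hi) `div` 2

-- Add v at index i, updating the cached sums along the path.
update :: Int -> Int -> Int -> Integer -> SegTree -> SegTree
update _ _ _ v (Leaf x) = Leaf (x + v)
update lo hi i v (Node s l r)
  | i <= m    = Node (s + v) (update lo m i v l) r
  | otherwise = Node (s + v) l (update (m + 1) hi i v r)
  where m = (lo + hi) `div` 2

-- Sum over [i..j], recursing only into partially covered subtrees.
range :: Int -> Int -> Int -> Int -> SegTree -> Integer
range lo hi i j t
  | j < lo || hi < i   = 0        -- disjoint
  | i <= lo && hi <= j = total t  -- fully covered
  | otherwise = case t of
      Leaf v     -> v  -- unreachable: a leaf is disjoint or covered
      Node _ l r -> range lo m i j l + range (m + 1) hi i j r
  where m = (lo + hi) `div` 2

For a tree t = build 1 8, a point update is update 1 8 3 5 t and a query is range 1 8 2 6 t.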

However, there is a fourth solution, known as a Fenwick tree or Fenwick array, independently invented by Ryabko (1989) and Fenwick (1994). Here’s a typical Java implementation of a Fenwick tree:

class FenwickTree {
    private long[] a;
    public FenwickTree(int n) { a = new long[n+1]; }
    public long prefix(int i) {
        long s = 0;
        for (; i > 0; i -= LSB(i)) s += a[i]; return s;
    }
    public void update(int i, long delta) {
        for (; i < a.length; i += LSB(i)) a[i] += delta;
    }
    public long range(int i, int j) {
        return prefix(j) - prefix(i-1);
    }
    public long get(int i) { return range(i,i); }
    public void set(int i, long v) { update(i, v - get(i)); }
    private int LSB(int i) { return i & (-i); }
}

I know what you’re thinking: what the heck!? There are some loops adding and subtracting LSB(i), which is defined as the bitwise AND of i and -i? What on earth is this doing? Unless you have seen this before, this code is probably a complete mystery, as it was for me the first time I encountered it.
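
Before the derivation, it may help to see concretely what LSB computes; a quick Haskell illustration (mine, not from the paper):

import Data.Bits ((.&.))

-- In two's complement, i .&. (-i) isolates the lowest set bit of i.
lsb :: Int -> Int
lsb i = i .&. negate i

-- ghci> map lsb [1..8]
-- [1,2,1,4,1,2,1,8]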

However, from the right point of view, we can derive this mysterious imperative code as an optimization of segment trees. In particular, in my paper I show how we can:

  1. Start with a segment tree.
  2. Delete some redundant info from the segment tree, and shove the remaining values into an array in a systematic way.
  3. Define operations for moving around in the resulting Fenwick array by converting array indices to indices in a segment tree, moving around the tree appropriately, and converting back.
  4. Describe these operations using a Haskell EDSL for infinite-precision 2’s complement binary arithmetic, and fuse away all the intermediate conversion steps, until the above mysterious implementation pops out.
  5. Profit.

I may be exaggerating step 5 a teensy bit. But you’ll find everything else described in much greater detail, with pretty pictures, in the paper! The official JFP version is here, and here’s an extended version with an appendix containing an omitted proof.

References

Fenwick, Peter M. 1994. “A New Data Structure for Cumulative Frequency Tables.” Software: Practice and Experience 24 (3): 327–36.
Ryabko, Boris Yakovlevich. 1989. “A Fast on-Line Code.” Doklady Akademii Nauk 306 (3): 548–52. Russian Academy of Sciences.

by Brent Yorgey at January 23, 2025 12:00 AM