How to turn polling insight into an optimal ballot — and why anything else is wasted.
“Approve of”? What does that mean, anyway?
I have written previously about how approval and range voting methods are intrinsically tactical. This doesn’t mean that they are more tactical than other election systems (nearly all of which are shown to sometimes be tactical by Gibbard’s Theorem when there are three or more options). Rather, it means that tactical voting is unavoidable. Voting in such a system requires answering the question of where to set your approval threshold, or how to map your preferences to a ranged voting scale. These questions don’t have more or less “honest” answers. They are always tactical choices.
But I haven’t dug deeper into what these tactics look like. Here, I’ll do the mathematical analysis to show what effective voting looks like in these systems, and make some surprising observations along the way.
Mathematical formalism for approval voting
We’ll start by assuming an approval election, so the question is where to put your threshold. At what level of approval do you switch from voting not to approve a candidate to approving them?
We’ll keep the notation minimal:
As is standard in probability, I’ll write ℙ[X] for the probability of an event X, and 𝔼[X] for the expected value of a (numerical) random variable X.
I will use B to refer to a random collection (multiset) of ballots, drawn from some probability distribution reflecting what we know from polling and other information sources on other voters. B will usually not include the approval vote that you’re considering casting, and to include that approval, we’ll write B ∪ {c}, where c is the candidate you contemplate approving.
I’ll write W(·) to indicate the winner of an election with a given set of ballots. This is the candidate with the most approvals. We’ll assume some tiebreaker is in place that’s independent of individual voting decisions; for instance, candidates could be shuffled into a random order before votes are cast, and in the event of a tie for number of approvals, we’ll pick the candidate who comes first in that shuffled order.
U(·) will be your utility function, so U(c) is the utility (i.e., happiness, satisfaction, or perceived social welfare) that you personally will get from candidate c winning the election. This doesn’t mean you have to be selfish, per se, as accomplishing some altruistic goal is still a form of utility, but we evaluate that utility from your point of view even though other voters may disagree.
With this notation established, we can clearly state, almost tautologically, when you should approve of a candidate c. You should approve of c whenever:
𝔼[U(W(B ∪ {c}))] > 𝔼[U(W(B))]
That’s just saying you should approve of c if your expected utility from the election with your approval of c is more than your utility without it.
The role of pivotal votes and exact strategy
This inequality can be made more useful by isolating the circumstances in which your vote makes a difference in the outcome. That is, W(B ∪ {c}) ≠ W(B). Non-pivotal votes contribute zero to the net expectation, and can be ignored.
In approval voting, approving a candidate can only change the outcome by making that candidate the winner. This means a pivotal vote is equivalent to both of:
W(B ∪ {c}) = c
W(B) ≠ c
It’s useful to have notation for this, so we’ll define V(B, c) to mean that W(B ∪ {c}) ≠ W(B), or equivalently, that W(B ∪ {c}) = c and W(B) ≠ c. To remember this notation, recall that V is the pivotal letter in the word “pivot”, and also visually resembles a pivot.
With this in mind, the expected gain in utility from approving c is:
𝔼[U(W(B ∪ {c}))] - 𝔼[U(W(B))]. But since the utility gain is zero except for pivotal votes, this is the same as
ℙ[V(B,c)] · (𝔼[U(W(B ∪ {c})) | V(B,c)] - 𝔼[U(W(B)) | V(B,c)]). Since V(B,c) implies that W(B ∪ {c}) = c, this simplifies to
ℙ[V(B,c)] · (U(c) - 𝔼[U(W(B)) | V(B, c)])
Therefore, you ought to approve of a candidate c whenever
U(c) > 𝔼[U(W(B)) | V(B, c)]
This is much easier to interpret. You should approve of a candidate c precisely when the utility you obtain from c winning is greater than the expected utility in cases where c is right on the verge of winning (but someone else wins instead).
There are a few observations worth making about this:
The expectation clarifies why the threshold-setting part of approval voting is intrinsically tactical. It involves evaluating how likely each other candidate is to win, and using that information to compute an expectation. That means advice to vote based only on internal feelings, like whether you consider a candidate “acceptable”, is always wrong. An effective vote takes into account external information about how others are likely to vote, including polling and an understanding of public opinion and mood.
The conditional expectation, assuming V(B, c), tells us that the optimal strategy for whether to approve of some candidate c depends on the very specific situation where c is right on the verge of winning the election. If c is a frontrunner in the election, this scenario isn’t likely to be too different from the general case, and conditioning on it doesn’t change the expectation much. However, if c is a long-shot candidate from some minor party, but somehow nearly ties for a win, we’re in a strange situation indeed: perhaps a major last-minute scandal, a drastic polling error, or a fundamental misunderstanding of the public mood. Here, the conditional expected utility of an alternate winner might be quite different from your unconditional expectation. If, say, voters prove to have an unexpected appetite for extremism, this can affect the runners-up as well.
Counter-intuitively, an optimal strategy might even involve approving some candidates that you like less than some that you don’t approve! This can happen because different candidates are evaluated against different thresholds. Therefore, a single voter’s best approval ballot isn’t necessarily monotonic in their utility rankings. This adds a level of strategic complexity I hadn’t anticipated in my earlier writings on strategy in approval voting.
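For a toy illustration with invented numbers: suppose U(A) = 10, U(B) = 6, U(C) = 5, and U(D) = 0, and suppose the polling says that if B ever ties for the win it is almost certainly with front-runner A, while if C ever ties it is almost certainly with D. Then B is measured against a threshold of about 10, so you withhold approval from B, while C is measured against a threshold of about 0, so you approve C. The optimal ballot approves C but not the better-liked B.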
Approximate strategy
The strategy described above is rigorously optimal, but not at all easy to apply. Imagining the bizarre scenarios in which each candidate, no matter how minor, might tie for a win, is challenging to do well. We’re fortunate, then, that there’s a good approximation. Remember that the utility gain from approving a candidate was equal to
ℙ[V(B,c)] · (U(c) - 𝔼[U(W(B)) | V(B, c)])
In precisely the cases where V(B, c) is a bizarre assumption that’s difficult to imagine, we’re also multiplying by ℙ[V(B,c)], which is vanishingly small, so this vote is very unlikely to make a difference in the outcome. For front-runners, who are relatively much more likely to be in a tie for the win, the conditioning changes things a lot less: scenarios that end in a near-tie are not too different from the baseline expectation.
This happens because ℙ[V(B,c)] falls off quite quickly indeed as the popularity of c decreases, especially for large numbers of voters. For a national-scale election (say, about 10 million voters), if c expects around 45% of approvals, then ℙ[V(B,c)] is around one in a million. That’s a small number, telling us that very large elections aren’t likely to be decided by a one-vote margin anyway. But it’s gargantuan compared to the number if c expects only 5% of approvals. Then ℙ[V(B,c)] is around one in 10^70, approaching the scale of picking one specific atom at random from the entire universe! The probability of casting a pivotal vote drops off exponentially, and by this point it’s effectively zero.
With that in mind, we can drop the condition from the expectation in the second term, giving us a new rule: approve of a candidate c any time that:
U(c) > 𝔼[U(W(B))]
That is, approve of any candidate whose win you would like better than you expect to like the outcome of the election.
This rule is still tactical. To determine how much you expect to like the outcome of the election, you need to have beliefs about who else is likely to win, which still requires an understanding of polling and public opinion and mood.
However, there is now just a single threshold, derived from real polling data about realistic scenarios, and you can cast your approval ballot monotonically based on that one threshold.
This is no longer a true optimal strategy, but with enough voters, the exponential falloff in ℙ[V(B,c)] as c becomes less popular is a pretty good assurance that the incorrect votes you might cast by using this strategy instead of the optimal ones are extremely unlikely to matter. In practice, this is probably the best rule to communicate to voters in an approval election with moderate to large numbers of voters.
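As a concrete sketch, here is what the large-electorate rule looks like in code. This is a minimal illustration rather than anything prescribed above: the win probabilities and utilities are hypothetical inputs you would estimate from polling.

import Data.Map (Map)
import qualified Data.Map as Map

type Candidate = String

-- Approve every candidate whose utility beats the expected utility of
-- the election outcome, per the rule U(c) > 𝔼[U(W(B))].
approvals :: Map Candidate Double  -- estimated ℙ[candidate wins]; sums to 1
          -> Map Candidate Double  -- U(candidate), your utilities
          -> [Candidate]
approvals winProb utility = [ c | (c, u) <- Map.toList utility, u > expected ]
  where
    expected = sum [ p * Map.findWithDefault 0 c utility
                   | (c, p) <- Map.toList winProb ]

For instance, with win probabilities A: 0.5, B: 0.45, C: 0.05 and utilities A: 10, B: 6, C: 8, the expected outcome utility is 0.5·10 + 0.45·6 + 0.05·8 = 8.1, so you approve only A, even though you also rather like C.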
Reducing range to approval voting
I promised to look at strategy for range voting as well. Armed with an appreciation of approval strategy, it’s easy to extend this to an optimal range strategy for large-scale elections.
The key is to recognize that a range voting election with options 0, 1, 2, …, n is mathematically equivalent to an approval election where everyone is just allowed to vote n times. The number you mark on the range ballot can be interpreted as saying how many of your approval ballots you want to mark as approving that candidate.
Looking at it this way presents the obvious question: why would you vote differently on some ballots than others? In what situation could that possibly be the right choice?
For small elections, say if you’re voting on places to go out and eat with your friends or coworkers, it’s possible that adding in a couple ballots materially changes the election so that the optimal vote is different. Then it may well be optimal to cast a range ballot using some intermediate number.
For large elections, though, you’re presented with pretty much exactly the same question each time, and you may as well give the same answer. Therefore, in large-scale elections, the optimal way to vote with a range ballot is always to rate everyone either the minimum or maximum possible score. This reduces a range election exactly to an approval election. The additional expressiveness of a range ballot is a siren call: by using it, you always vote less effectively than you would have by ignoring it and using only the two extreme choices.
Since we’re discussing political elections, which have relatively large numbers of voters, this answers the question for range elections, as well: Rate a candidate the maximum score if you like them better than you expect to like the outcome of the election. Otherwise, rate them the minimum score.
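In code, the range rule is just the approval computation with the answer mapped onto the two extreme scores. Continuing the hypothetical sketch from the approval section:

-- Rate each candidate either the minimum or maximum score, reusing the
-- 'approvals' sketch from above. 'lo' and 'hi' are the ballot's score
-- bounds (e.g. 0 and 5).
rangeBallot :: Double -> Double
            -> Map Candidate Double  -- estimated ℙ[candidate wins]
            -> Map Candidate Double  -- U(candidate)
            -> Map Candidate Double  -- score to mark for each candidate
rangeBallot lo hi winProb utility = Map.mapWithKey score utility
  where
    approved  = approvals winProb utility
    score c _ = if c `elem` approved then hi else lo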
Summing it up
What we’ve learned, then, is that optimal voting in approval or range systems boils down to two nested rules.
Exact rule (for the mathematically fearless): approve c iff U(c) > 𝔼[U(W(B)) | your extra vote for c is pivotal]. This Bayesian test weighs each candidate against the expected utility in the razor-thin worlds where they tie for first.
Large-electorate shortcut (for everyone else): because those pivotal worlds become astronomically rare as the electorate grows, the condition shrinks to a single cutoff: approve (or give a maximum score to) every candidate whose victory you would enjoy more than the expected outcome of the election.
We’ve seen why the first rule is the gold standard, but the second captures virtually all of its benefit when millions are voting. Either way, strategy is inseparable from sincerity: there is no “honest” ballot distinct from the tactical one. You must translate beliefs about polling into a utility threshold, and then measure every candidate against it. We’ve also seen, by a clear mathematical equivalence, why range ballots add no real leverage in large-scale elections, instead only offering false choices that are always wrong.
The entire playbook fits on a sticky note: compute the threshold, vote all-or-nothing, and let the math do the rest.
Today’s article will be the first in a series where we explore some LeetCode problems and compare different solutions in Haskell and Rust. The main idea is to demonstrate how you might translate ideas between the recursive core of Haskell and the loop-based framing of most other languages.
If you want to learn more about problem solving in Haskell, you should take a closer look at Solve.hs! This course will give you an in-depth walkthrough of problem solving ideas in Haskell, including how concepts compare to more typical languages.
The Problem
The first problem we’ll consider is called H-Index. In academia, a person has an “H-Index” of n if they have published at least n papers that each have n or more citations. So the input to our problem is a list of integers, where each integer is the number of citations of a particular paper the author wrote. Our job is to calculate the author’s H-Index: the largest such n.
The Algorithm
This problem is fairly straightforward if you sort the input list in ascending order. Once we do this, we can look at any index i and consider the number of remaining entries (i.e. n - i). Since the list is sorted, those n - i papers each have at least as many citations as the entry at index i.
So we can accomplish this task with a single loop over the sorted list. Throughout this loop, we’ll be tracking the maximum “H-Index” we’ve seen so far (maxH). At each iteration, we take the following steps:
Get the number of remaining papers (rem) and the citations at this index (next)
If rem is greater than or equal to next, then update maxH to next if next is larger.
Otherwise, update maxH to rem if rem is greater.
The last step is a key edge case! If we have the list [1, 1, 1, 9, 9], we’ll get to index 3, with next being 9 and rem being 2. The remainder is smaller than the citation count, but we would still update maxH to 2, because there are at least 2 papers remaining with 2 or more citations.
Rust Solution
Here’s our Rust solution:
pub fn h_index(citations: Vec<i32>) -> i32 {
let mut cp = citations.clone();
cp.sort();
let n = cp.len();
let mut maxH: i32 = 0;
for i in 0..n {
let next = cp[i];
let rem: i32 = (n - i) as i32;
if rem >= next {
maxH = std::cmp::max(next, maxH);
} else {
maxH = std::cmp::max(rem, maxH);
}
}
maxH
}
We have the first part, where we clone the input, sort it, and set up our loop variables:
pub fn h_index(citations: Vec<i32>) -> i32 {
let mut cp = citations.clone();
cp.sort();
let n = cp.len();
let mut maxH: i32 = 0;
...
}
Then we have the loop itself, where we have our two cases to consider:
for i in 0..n {
let next = cp[i];
let rem: i32 = (n - i) as i32;
if rem >= next {
// There are at least ‘next’ papers >= ‘next’
maxH = std::cmp::max(next, maxH);
} else {
// ‘next’ > ‘rem’, so there are at least ‘rem’ papers >= ‘rem’
maxH = std::cmp::max(rem, maxH);
}
}
So this is pretty straightforward. Now how do we approach this kind of problem in Haskell?
Haskell Solution
Our Haskell solution will have the same structure, but instead of running a loop and indexing into a vector, we’ll use a linked list and call a recursive function. Let’s begin by getting the length and sorting our input:
import qualified Data.List as L
hIndex :: [Int] -> Int
hIndex inputs = ...
where
n = length inputs
sorted = L.sort inputs
...
Now we need to think about our recursive loop function. At each iteration, we need access to the remaining number of values, the next citation value, and we need to pass along maxH. As with many list-based recursive functions, we’ll peel off one element of the input list each time. Ultimately we’ll return maxH from this loop when we hit our base case of an empty input list. So its type signature should look like this:
loop :: (Int, [Int], Int) -> Int
When writing a recursive function, we always handle the base case first. Here, the base case is an empty input list, where we simply return the accumulated maxH. Then, in the recursive case, we can apply our algorithm, updating maxH if necessary:
loop :: (Int, [Int], Int) -> Int
loop (_, [], maxH) = maxH
loop (remaining, next : rest, maxH) = if remaining >= next
then loop (remaining - 1, rest, max next maxH)
else loop (remaining - 1, rest, max remaining maxH)
To finish up, all we need to do is call our loop function with the appropriate initial inputs (n, sorted, 0). Here’s our complete Haskell solution:
import qualified Data.List as L
hIndex :: [Int] -> Int
hIndex inputs = loop (n, sorted, 0)
where
n = length inputs
sorted = L.sort inputs
loop :: (Int, [Int], Int) -> Int
loop (_, [], maxH) = maxH
loop (remaining, next : rest, maxH) = if remaining >= next
then loop (remaining - 1, rest, max next maxH)
else loop (remaining - 1, rest, max remaining maxH)
Using a Fold
Now we can notice that our loop has a particular structure. We have one piece of accumulated state (maxH), and this changes based on each value in our list (combined with the remaining values). We can easily re-imagine this kind of loop using a fold. We just have to think of the folding function like this:
loop :: Int -> (Int, Int) -> Int
loop maxH (remaining, next) = if remaining >= next
then max next maxH
else max remaining maxH
This has the a -> b -> a structure of a left-fold function, where a is our accumulated maxH value, and the b values come from our list. The main benefit here is that our loop function no longer has to deal with the burden of handling a base case or passing the “shrinking” list as an argument to the next recursive call.
We can invoke this loop at the top level like so:
hIndex :: [Int] -> Int
hIndex inputs = foldl loop 0 (zip [n,n-1..1] sorted)
where
n = length inputs
sorted = L.sort inputs
loop :: Int -> (Int, Int) -> Int
loop maxH (remaining, next) = if remaining >= next
then max next maxH
else max remaining maxH
We just have to zip the decreasing indices together with our sorted list. Now our recursive “loop” is more like a typical for-loop. We’re only considering one element at a time, and we’re updating the important state each time.
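One small refinement: the Prelude’s foldl builds up lazy thunks as it accumulates, so for a fold like this it’s idiomatic to reach for the strict foldl' from Data.List instead. The solution is otherwise unchanged:

import Data.List (foldl', sort)

hIndex :: [Int] -> Int
hIndex inputs = foldl' loop 0 (zip [n,n-1..1] sorted)
  where
    n = length inputs
    sorted = sort inputs
    -- Same folding function as before: update maxH from each
    -- (remaining, next) pair.
    loop :: Int -> (Int, Int) -> Int
    loop maxH (remaining, next) = if remaining >= next
      then max next maxH
      else max remaining maxH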
Conclusion
In this article, we saw a direct comparison between a normal for-loop in Rust and a recursive solution in Haskell. We also saw how we could simplify the recursive formulation into a “fold” structure.
If you're interested in learning more about writing recursive functions in Haskell, check out our Solve.hs course. You’ll learn how to start thinking about problems in a functional way, and you’ll learn the step-by-step processes for tackling problems with basic recursion and folds like we saw in this example.
I was the chief editor for this blog for the past 8 years or so,
and I’ve just recently passed the mantle to Chris Harrison. I thought
I’d take the opportunity to write a little bit about this blog, how
it’s operated and what it means to us. Besides, we do like when things
get meta here, so this is a blog post about the blog.
A little bit of history
One of the tenets under which Mathieu Boespflug founded
Tweag was that software engineers naturally don’t write enough.
Writing is an essential part of the engineering job. We write issues,
pull requests, code comments, documentation of various sorts. We
discuss and debate online, we have to arbitrate trade-offs. Most of
this is common, in fact, to every engineering profession. And
although we typically become engineers because of our taste for the
technical part of our job (I certainly did; I don’t know for sure how
ubiquitous that is), writing is still a big part of it.
So Mathieu reasoned: if writing is to be such a big part of our job,
but we’re technical people at heart, not writers, then for the company
to be at its best, it needs to make clear that precise and clear
writing is important to our work. To that effect, there ought to be
venues for us to write, where writing was made an expectation, an
actual requirement. The blog is one of them.
At first, the blog was just there. After all, our blog was, and still
is, just (part of) a Github repository. So we’d make pull requests and
merge blog posts. But, of course, people seek review. When you’re
posting in the name of a company, you tend to be a little more careful
about what you write; besides, we’re all so used to having our PRs
reviewed. This was all very ad hoc; there was no process for it.
Whatever my reason at the time (I honestly don’t remember after
all this time), I ended up participating in the review of most blog
posts. It’s a poorly kept, but surprisingly little discussed secret
that the most common way to get a responsibility in a company is to
just assume said responsibility. Do the thing, and it will become your
charge. Anyway, I became editor in chief.
Why we blog
Besides being a venue to exercise our technical writing, this blog has
been tremendously useful to us. Keeping a high-quality, serious,
technical blog helped establish trust in our work, build bridges with
communities, and attract clients and employees.
It’s very rewarding for us, as individuals, to be recognised by our
peers, and to work with a company which is likewise recognised. This
is certainly a great motivation for us to write.
Besides, it fits very well with our open-source values, both as a
company and as individuals. We strongly believe in open-source, and
the value it has to the world. So we participate. And open-source
software isn’t just about putting software’s sources out there.
Open-source is also about sharing knowledge. Which is another important
element of the company’s ethos. We do have formal and informal venues
for sharing knowledge internally, all very important stuff, but it’s a
topic for another time. The blog is one of our main venues for sharing
knowledge with the rest of the world.
So we do share, we talk about our open-source journey, the software we
build, or the software we use. But we also share what we learn through
working with clients, not necessarily on open-source projects.
So this blog is something that operates at the sweet spot of any
company’s activities: it’s something we do because we believe that
it’s the right thing to do, and it’s also something which is valuable
to the company. In fact the blog brought so many clients that it
allowed Tweag to operate without a marketing team for quite a few
years. Nowadays, the company’s bigger and we can’t rely on a blog
alone, but it’s still a great way to connect.
How we blog
Writing a blog post means making a pull request against the Github
repository for the tweag.io website. A post is a
Markdown file, possibly accompanied by resources (such as
images). It’s a workflow which is very familiar to software
engineers. The whole review process is, in fact, a Github pull request
review.
The author of a blog post is first tasked with obtaining
a “technical review” from their peers: people close to the topic
review the blog post for accuracy and relevance. The blog post editors
are all engineers (the entire process, in fact, is owned by
engineers), but aren’t necessarily very familiar with the blog post’s
topic, so this first round of review couldn’t be performed by the
editing team. It’s also much better for scalability: technical
review uses a little time from a lot of people, rather than a lot of
time from a few.
When the author and technical reviewers are satisfied with the blog
post, they send it to the editing team, which will do two rounds of
review (which we cleverly call “first editorial review” and “second
editorial review”). The editing team reviews for clarity and writing
quality. Something that isn’t a concern of reviewers (or really of
the process as a whole), on the other hand, is SEO. Certainly our blog
posts make our website easier to find, and of course this is important
to us. But this is the result, we hope, of making blog posts that you
like.
We can propose edits in three different ways: we can ask
questions in the review thread, propose “suggestions” in the Github
interface which the author is free to accept or not, or we can push
directly to the blog post (in which case we try our best to preserve
the author’s voice, and always leave time for the author to check that
the edits are correct). The choice between the three is primarily
driven by our degree of confidence in the edit. Really we do what makes
sense to reduce the amount of back-and-forth.
Reviewing blog posts on Github with a team distributed all around the
world is essentially a distributed process. As with any distributed process,
we try to minimise the amount of synchronisation. Our target is to be
able to do a review in two weeks or less (one week for each
round). Sometimes it takes more though, when the blog post presents
more difficult editing challenges, or when the author doesn’t have
enough availability outside of their client work to respond to our
reviews quickly enough. But we hit the target more often than not.
The two rounds of review are functionally identical. We just found
that, in practice, having two rounds of review helps raise the quality
a lot. The first reviewer often gets involved deeply in the writing of
the blog post, to the point that they stop seeing the blog post as a
reader. The second reviewer receives a blog post in almost publishable
condition and brings fresh eyes to apply the finishing touches.
What blogging means to us
When Tweag was a standalone company, blogging was almost our entire
marketing strategy. Now that we’re part of the substantially larger
Modus Create, we aren’t in a position to rely solely on technical
blogging for marketing. Yet, this blog is still an important part of
our strategy.
All of what I wrote above still applies, but I’d like to bring up
something else. See, a company is a little bit faceless. It doesn’t
have a drive, it doesn’t have a personality: people do. This is truer
the larger a company is. We don’t aspire to be a faceless abstract
entity, though. Modus Create is made of people, people that we believe
to be interesting, and that we hope you’ll find interesting too. In
fact our business is largely for clients (maybe you!) to find our
people interesting and hire us. This blog is one of the ways we use to
promote individuals. Who they are, what they like, what they know,
what they can do. This is why our writing recommendations encourage
authors to use the pronoun “I” where many of us would be tempted to say “we”.
A more collectivist consideration is communities. We take part in a
number of communities (Haskell, Nix, Bazel, Typescript, …), and it’s
all too easy for a company to say it’s part of a community while
really meaning that they just use Haskell, Nix, Bazel, Typescript. But
that’s not really what being part of a community means, is it? It also
entails taking part in community life. There’s a bunch of things
we do: we write libraries,
contribute
upstream,
help with
governance,
sponsor and
sometimes organise events, …. And we blog, which also plays its part in
community life. This is why we don’t shy away from highly
specialised blog posts. Sure they are addressed to a pretty narrow,
sometimes quite niche audience, but they are relevant to one of our
communities. Of course it helps establish our technical
chops. Hopefully it also builds trust.
I should note that what I’m writing about is what works for us. It’s
not a recipe that will automatically work for you. There are many
excellent technical blogs out there, which, I assume, probably have
different approaches from ours, such as
CockroachDB’s,
Netflix’s,
fly.io’s and Trail of
Bits’s (the latter two I wasn’t aware
of, by the way, they were brought to my attention by this blog post’s
technical review). If you like this blog, you’ll probably like theirs,
go check them out! Dan Luu shares some more general
considerations. But at the end of
the day, what makes a strong communication strategy is to build
around and empower the people you already have. Capitalise on your
strengths, don’t go against the grain just to imitate what someone
else does, however successful they appear to be.
Sign-off
A paradox of the work of software engineers (and probably all
engineering disciplines, really) is that writing is such an integral
part of our job, yet most of us receive virtually no training in it
during our studies.
I’ve never had technical writing classes, myself. The way I learned
writing was in part by gleaning some of the popular wisdom taught
among my peers, but mostly by co-authoring scientific articles with
better technical writers than me and seeing them do their
magic. Honestly, I found no better learning experience than seeing an
entire paragraph of mine rewritten in a mere few words that were also
more precise.
I don’t know how to teach technical writing formally. So a lot of the
process we’ve converged on reflects my attempt to replicate, for the
company, what worked on me: teaching technical writing by
example. By showing how a blog post can be improved, hopefully the
authors will learn to write better next time. Because the blog is this
as well: a tool to teach ourselves, collectively, how to be technical
writers.
This is a very soft target; it’s hard to measure the degree to which
we’ve succeeded. But, because I don’t really know how to close this
post without getting a touch emotional (it is, after all, no small
moment for me): I’ve witnessed many of our engineers mature as
writers, and I can’t help feeling some parent-like pride at their
growth.
Today, 2025-05-14, at 1830 UTC (11:30 am PDT, 2:30 pm EDT, 7:30 pm GMT, 20:30 CET, …)
we are streaming the 44th episode of the Haskell Unfolder live on YouTube.
Many Haskell programmers will be familiar with property based testing of pure functions (for those who are not, various episodes of the Haskell Unfolder have discussed this: #4, #21, #38 and #40). Property based testing for stateful systems (“IO code”) is however much less well-known, which is a pity as it is just as useful! In this episode we will demonstrate how we can use quickcheck-lockstep to verify the responses we get from a simple stateful API; as we will see, all of the lessons from property based testing for pure functions can be applied in this stateful setting also.
About the Haskell Unfolder
The Haskell Unfolder is a YouTube series about all things Haskell hosted by
Edsko de Vries and Andres Löh, with episodes appearing approximately every two
weeks. All episodes are live-streamed, and we try to respond to audience
questions. All episodes are also available as recordings afterwards.
In last week’s article, I discussed how Monday Morning Haskell courses compare to other Haskell courses that I’ve seen out there online. Obviously I’m wildly biased, but I think MMH courses have some serious advantages.
But there’s still the elephant in the room…how do my courses compare to the possibility of using generative AI (e.g. ChatGPT) to learn Haskell instead? While AI is a great tool that has opened up a lot of doors in terms of learning complex concepts, human-developed courses still have some important advantages over the current way you would learn from a chatbot.
Analogy: Going to the Library
I’ll start my case by drawing an (imperfect) analogy. Suppose you are enrolled in a college and want to learn a particular subject, like physical chemistry. You could enroll in your school’s physical chemistry course. Or you could spend the same amount of time going to the library. After all, the library has tons of books on physical chemistry. So you could read all these books and gain the same level of insight, right?
In this example, most people would recognize the shortcomings of just going to the library. For example, you are now responsible for determining the curriculum and course of study.
You could, of course, look at the table of contents of an introductory book and just run with that way of organizing the material. But how much of that do you need to learn? Most college courses aren’t going all the way through a textbook, because the professor already has a good idea of what material is the most important, and has organized the course around that.
A professor will also know when and how to introduce supplemental material from other sources. If you’re just “learning from the library”, you’d be responsible for selecting which materials are the most important, and you probably aren’t qualified!
Also, while textbooks may have practice problems, and they may even have answers to those problems, you still have to do the work of figuring out which problems to study, and how many you need to study before you know the material. Taking a full course with assignments would solve this for you.
Finally, textbooks will rarely tell you about the human process of learning a particular subject. You probably aren’t going to read a sentence like “lots of students struggle to understand this, here’s a way of thinking about the problem that has helped a lot of them.” These are insights you’ll gain from working with a professor (who has taught real students) or other students in the class.
So let’s sum up the shortcomings of “learning from the library”:
Direction - You must take on the cognitive overhead of determining which areas of the subject to study.
Filtering - You must figure out how much detail is necessary, and how much practice you need to learn it.
Human Learning Insight - Textbooks are generally lacking in the actual insights and breakthroughs that help students understand particularly challenging ideas.
From Physical to Online Learning
Now let’s consider what changes about the analogy if instead of comparing physical learning environments, we think about the current online learning environment. Entering an online course is significantly easier than enrolling in a university course. You don’t have to wait for the start of the semester, or go to a physical location.
But using ChatGPT as your “library” is vastly easier than studying from textbooks. In a matter of minutes, you can get tons of information on your screen that would have taken hours or days of effort at the library. And best of all, you can get information on virtually any topic, rather than just those that have pre-existing textbooks.
But I would still claim that using Chatbots for learning shares some of the drawbacks of “learning from the library”. And for these reasons, it’s still worthwhile to consider online courses where they exist instead of relying solely on ChatGPT. Some of these drawbacks might seem counterintuitive, but let’s think about it.
Direction of Study
You might think, “I don’t need to set my own direction, ChatGPT will do that for me!” And yes, you can ask it to lay out a syllabus for you (I did this myself in one of the examples below). This will give you a list of topics to study.
But it won’t just write out the whole course for you based on this initial syllabus in one go. You have to keep prompting it to provide you with the information you want. And it will get sidetracked, consistently asking you to go deeper and deeper down particular rabbit holes.
So it’s still up to you to determine how much you really want to study about particular topics, and you need to maintain the discipline to pull it back out and shift gears. A human-designed course puts these limits in there for you, so that you don’t need to carry that cognitive load.
Filtering
This brings us to the next issue of “filtering”. ChatGPT will provide you with a lot of information, all at once. You’ll ask a simple question, and get a very complicated answer with lots of tables comparing various different ways of looking at the question.
Sometimes, this is nice. It will expose you to ideas you wouldn’t have thought of otherwise. Sometimes though, it’s very distracting. It takes you away from the core of what you’re trying to learn. You have to make sure you aren’t getting dragged into an infinite loop of concepts.
The “practice” problem also exists. ChatGPT can keep coming up with practice problems, but it’s up to you to know how many you really need to study. In our case study below, we’ll also consider that it’s not necessarily the best tool for coming up with practice problems.
Again, a human-designed course does the filtering and measuring for you.
Human Insight
Once at my job, I was reviewing a teammate’s code that implemented a complicated algorithm. I told him, “After I looked closely at this one particular line, my understanding of this algorithm went from like 30% to 70%, so adding an explanatory comment here would be very helpful!”
This experience helped me understand the idea of “knowledge inflection points”. These are the key insights that really help you understand a topic. I’ve had several of these with various Haskell concepts, from monads, to folds, to data structures and certain algorithms. I’ve done my best to incorporate these insights into my course content.
An example from Solve.hs might be my understanding of “the common API” of Haskell data structures. This made it much easier for me to reason about using different structures in Haskell.
An AI probably wouldn’t frame the issue in the way I did, unless you already have the knowledge to prompt it. AIs don’t have the experience of “learning” a concept piece-by-piece, and knowing when things finally “clicked”. You could try asking the chatbot what insights help people learn a topic, but it will only be able to piece that information together from what other people have written. On the whole, it still doesn’t beat the experience of someone who’s been there.
Human insights around learning are always going to get baked into a human-designed course, whereas AI is not generally going to be thinking in these terms.
Case Study: Learning Concurrency
I wanted to share a couple case studies that highlight some of the promise but also some of the frustrations with using AI for learning. Here’s a link to an extensive, multi-day study I did with ChatGPT to learn about concurrency topics. It helped me review a lot of topics I had learned in college (10 years ago), and also learn many new things. But there were still some pain points.
The “filtering” problem should be very evident. For each prompt I gave, ChatGPT provided tons of information. It was entirely up to me to figure out how much of this I really needed to know in order to be satisfied.
The “direction” problem is also clear. I started by asking for an organizational outline, and the chatbot duly obliged. But as I dug into certain topics, its preference was to ask me to keep going deeper down certain knowledge paths. I had to consistently drag it back to the syllabus it originally designed.
There were also no clear insights on what the key knowledge was. Over the course of the study, I figured some of these out for myself. But again, I had to filter through a lot of data to get there.
Another drawback I haven’t mentioned yet is the “memory” issue. Chatbots have limited, token-based memory, so they’ll forget what you’ve already learned over even a medium-length study. My concurrency study introduced the idea of a “lock-free queue” using compare-and-swap operations early on, but ChatGPT reintroduced this idea later as if I had never heard of it. Human-designed courses avoid this sort of behavior.
I didn’t ask for practice problems in this study, so let’s consider another case study where I was specifically looking to do this in Haskell.
Case Study: Dijkstra’s Algorithm
In this quick study, I asked ChatGPT to come up with a practice problem for learning Dijkstra’s algorithm. Some things were good about its response, but some things weren’t.
On the positive side, the code works, the tests work, and some of the follow-up suggestions are also pretty good. For example, putting a bound on the number of nodes your path can have, or allowing multiple starts are simple extensions that didn’t occur to me when I was writing problems.
My main gripe is that the problems are a bit too obvious as graph problems. It started essentially with “implement Dijkstra’s algorithm” rather than giving me a practice problem using Dijkstra’s algorithm. And when I asked for a “disguised graph problem”, it gave me the delivery problem which wasn’t much of a disguise.
Also, the code used PSQueue, rather than the more beginner-friendly Data.Heap. This package may be better for certain things, but the type operator it uses would be a bit more confusing for a novice.
The line-by-line explanations were pretty good on the whole, but I don’t know that they’re a perfect substitute for really good visual/slide-based instructions like you would find in one of my courses.
With enough prompt engineering, you could get around these issues. But that’s exactly my point. It’s nice to not have to keep coming up with new prompts to get what you’re looking for, especially when you get a long explanation after every question.
Conclusion
Generative AI is a massive innovation for learning, especially on subjects that don’t have a lot of good guide material. But extensive, well-thought-out, human-designed content still has some significant advantages. The content is informed by the personal experience of someone who has actually been in your shoes and has had to learn something the same way you’ll learn it. This is not something an AI can relate to.
Prompt engineering involves a lot of cognitive effort. You have to constantly be directing the flow of what you’re supposed to learn, filter out the unnecessary parts, and then you have to learn it! While the freedom of being able to learn almost anything can be desirable, it can also be exhausting to always be directing the flow. It can be much easier and more helpful to just follow the lead of what another person has done.
I’ve used generative AI for learning and will continue to do so. But when human-designed content is available, I’ll look there first, and consider using AI as a supplement where I feel there are gaps.
When it comes to generating content, I don’t like AI as much, certainly not as a general purpose content producer. But it certainly has its uses. Looking back on course creation, I wish I had used it for writing test cases, for example. Another idea might be translating my work into other languages.
I’ll continue to experiment with AI going forward. But a solid guiding principle is that you should be using AI to enhance yourself, and not replace yourself. I still believe that human content has an edge over AI content for the same subject matter, so I encourage you to take another look at our courses at Monday Morning Haskell Academy, and to subscribe to our mailing list for future updates and discounts!
[ I started thinking about this about twenty years ago, and then writing it down in 2019, but it seems to be obsolete. I am publishing it anyway. ]
The canonical division of the year into seasons in the northern
temperate zone goes something like this:
Spring: March 21 – June 21
Summer: June 21 – September 21
Autumn: September 21 – December 21
Winter: December 21 – March 21
Living in the mid-Atlantic region of the northeast U.S., I have never
been happy with this. It is just not a good description of the
climate.
I begin by observing that the year is not equally partitioned between
the four seasons. The summer and winter are longer, and spring and
autumn are brief and happy interludes in between.
I have no problem with spring beginning in the middle of March. I
think that is just right. March famously comes in like a lion and
goes out like a lamb. The beginning of March is crappy, like
February, and frequently has snowstorms and freezes. By the end of
March, spring is usually skipping along, with singing birds and not just the early
flowers (snowdrops, crocuses, daffodils) but many of the later ones also.
By the middle of May the spring flowers are over and the weather is
getting warm, often uncomfortably so. Summer continues through the
beginning of September, which is still good for swimming and
lightweight clothes. In late September it finally gives way to
autumn.
Autumn is jacket weather but not overcoat weather. Its last gasp is
in the middle of November. By this time all the leaves have changed,
and the ones that are going to fall off the trees have done so. The
cool autumn mist has become a chilly winter mist. The cold winter
rains begin at the end of November.
So my first cut would look something like this:
Months       Seasons
January      Winter
February     Winter
March        Winter → Spring (mid-month)
April        Spring
May          Spring → Summer (mid-month)
June         Summer
July         Summer
August       Summer
September    Summer → Autumn (late month)
October      Autumn
November     Autumn → Winter (mid-month)
December     Winter
Note that this puts Thanksgiving where it belongs at the boundary
between autumn (harvest season) and winter (did we harvest enough to
survive?). Also, it puts the winter solstice (December 21) about one
quarter of the way through the winter. This is correct. By the
solstice the days have gotten short, and after that the cold starts to
kick in. (“As the days begin to lengthen, the cold begins to
strengthen”.) The conventional division takes the solstice as the
beginning of winter, which I just find perplexing. December 1 is
not the very coldest part of winter, but it certainly isn't autumn.
There is something to be said for it though. I think I can
distinguish several subseasons — ten in fact:
Dominus Seasonal Calendar

Months       Seasons            Sub-seasons
January      Winter             Midwinter
February     Winter             Late winter
March        Winter → Spring    Late winter → Early spring
April        Spring             Early spring → Late spring
May          Spring → Summer    Late spring → Early summer
June         Summer             Early summer
July         Summer             Midsummer
August       Summer             Midsummer → Late summer
September    Summer → Autumn    Late summer → Early autumn
October      Autumn             Early autumn → Late autumn
November     Autumn → Winter    Late autumn → Early winter
December     Winter             Early winter → Midwinter
Midwinter, beginning around the solstice, is when the really crappy
weather arrives, day after day of bitter cold. In contrast, early and
late winter are typically much milder. By late February the snow is
usually starting to melt. (March, of course, is always unpredictable,
and usually has one nasty practical joke hiding up its sleeve. Often,
March is pleasant and springy in the second week, and then mocks you
by turning back into January for the third week. This takes people by
surprise almost every year and I wonder why they never seem to catch
on.)
Similarly, the really hot weather is mostly confined to
midsummer. Early and late summer may be warm but you do not get
blazing sun and you have to fry your eggs indoors, not on the
pavement.
Why the seasons seem to turn in the middle of each month, and not at
the beginning, I can't say. Someone messed up, but who? Probably the
Romans. I hear that the Persians and the Baha’i start their year on
the vernal equinox. Smart!
Weather in other places is very different, even in the temperate
zones. For example, in southern California they don't have any of the
traditional seasons. They have a period of cooler damp weather in the
winter months, and then instead of summer they have a period of gloomy
haze from June through August.
However
I may have waited too long to publish this article, as climate change
seems to have rendered it obsolete. In recent years, we have barely
had midwinter, and instead of the usual two to three annual snows we
have zero. Midsummer has grown from two to four months, and summer
now lasts into October.
Today, 2025-05-07, at 1830 UTC (11:30 am PDT, 2:30 pm EDT, 7:30 pm GMT, 20:30 CET, …)
we are streaming the 43rd episode of the Haskell Unfolder live on YouTube.
In this episode, we are going to look at two interacting “features” of the Haskell language (the monomorphism restriction and defaulting) that can be somewhat surprising, in particular to newcomers: there are situations where Haskell’s type inference algorithm deliberately refuses to infer the most general type. We are going to look at a number of examples, explain what exactly is going on, and why.
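As a small taste of the kind of surprise involved, here is a classic illustration of the monomorphism restriction and defaulting interacting (an illustrative example, not necessarily one from the episode):

{-# LANGUAGE NoMonomorphismRestriction #-} -- delete this line to see the error

-- 'n' is a simple pattern binding with no type signature. Under the
-- monomorphism restriction, GHC refuses to generalise it to the most
-- general type, Num a => a: it must get one single monomorphic type,
-- and if nothing else fixes that type, defaulting picks Integer.
n = 5

main :: IO ()
main = do
  print (n + (1 :: Int))      -- uses n at type Int
  print (n + (0.5 :: Double)) -- uses n at type Double; without the
                              -- pragma this line is a type error,
                              -- since n may only have one type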
About the Haskell Unfolder
The Haskell Unfolder is a YouTube series about all things Haskell hosted by
Edsko de Vries and Andres Löh, with episodes appearing approximately every two
weeks. All episodes are live-streamed, and we try to respond to audience
questions. All episodes are also available as recordings afterwards.
Shows are under the banner of The Provocateurs (formerly Cabaret of Dangerous Ideas). Tickets go on sale Wednesday 7 May, around noon. Links above are preliminary, you may need to search to find the actual links. The official blurb is brief:
Professor Philip Wadler (The University of Edinburgh) separates the hopes and threats of AI from the chatbot bullshit.
In the UK, it’s very common that your employer pays you once a
month. When this happens, they give you a document called a payslip,
which has some numbers on it, such as how much your salary is, how much
they paid you this month, how much went to HMRC in tax, how much went
to your pension, and a few other numbers. But they never show any
workings, so you really have no way to check whether any of these
numbers are correct. There are plenty of online take-home-pay
calculators, but these all focus on the full year; they have no
facility to calculate your next payslip.
About halfway through April 2024, I stopped working for one
company. Everything was wrapped up – I received my final payslip from
them, along with my P45. I then had a few months off, and started a
new job in July 2024. When you start a new job it always takes a while
for money things to get sorted out, for example pension enrolment and
sorting out pension contributions, so it’s really worthwhile to keep a
close eye on your payslips particularly for these first few
months. Mine were arriving and some numbers looked right, but other
numbers, such as the amount of tax I was paying, were changing
dramatically, month to month. I had no idea why; whether they should
be changing like that; whether they were going to keep changing or
would eventually settle down. I had no way to check any of these
numbers. Was I going to get in trouble with HMRC and get investigated?
I was also a little on edge because this was the first job where my
pension contributions were using a thing called Qualifying
Earnings. In
all my previous jobs, if I chose for 10% of my salary to go into my
pension, then that’s what would happen. But now there was this thing
called Qualifying Earnings, which is (numbers correct at time of
writing) a band from £6,240 to £50,270. If you’re earning, say, £30k,
then your x% contribution is actually x% of £30,000 − £6,240 = £23,760
(at 10%, that’s £2,376 a year rather than the £3,000 you might
expect). If
you’re earning above £50,270, then any further increase to your salary
will not result in any extra contributions to your pension because
you’re above the band. The 2008 Pensions Act, which created the legal
requirement for all employees to have workplace pensions and for
automatic enrolment (with a minimum 8% combined contribution from the
employer and employee), also created this concept of Qualifying
Earnings. I consider this is a pretty scummy way of reducing employer
pension contributions for large firms. It complicates the maths and no
doubt adds confusion for people trying to check their own
payslips. Given that 74% of the population have pensions that are too
small to retire
on,
this whole concept of Qualifying Earnings seems amoral at best.
These days, a lot of smaller companies outsource their payroll
processing. In my case, I was officially working for an international
Employer of Record
and they were then outsourcing payroll processing to local firms with
country-specific expertise. So when I started asking questions, there
was no ability to go and sit with someone and work through it. Or have
a call. It was all messages passed across multiple different systems,
and partial answers at best would come back several days later. Even
if your payroll is done in-house, I strongly suspect that a lot of the
time, some software package is being used that does all the
calculations, and quite likely no one actually understands or is
able to explain the maths that’s going on.
After a while of getting nowhere, and after uncovering some
substantial mistakes that affected me, I decided to
spend some weekends actually figuring out how
PAYE
works, and writing some code that can calculate my next payslip. This
library is available for
anyone to use. There’s a
README that
hopefully explains the basic principles of how the calculations are
done. This only works if your
tax-code ends
in an L, and it only works if you’re in National Insurance
category A. All the code can do is use some details you provide to predict
your next payslips. Also, I’m not a trained accountant or financial
adviser, and even for my own payslips, every month, the numbers don’t
quite match up (but they’re within £1). So please treat this as a toy,
rather than the basis for building a payroll processor!
Getting started
The library is written in Go so you’ll need Go
installed. Then, in a terminal do:
$ mkdir payslips
$ cd payslips
$ go mod init mypayslips
$ go get wellquite.org/tax@latest
Now we need to write a tiny amount of code. In your new payslips
directory, create a main.go file, and open it in your editor. You
want something like this:
We create a list of
Payslips. The first
payslip must specify a year, and your tax-code. These details are
automatically applied to the payslips that follow, if not explicitly
provided. Many of the calculations rely on year-to-date totals, and so
we must have a complete record of your payslips from the start of the
tax year. So that means the first payslip is month 1 (in this example,
April 2024), then month 2 (May 2024) and so on. If you have no income
for a month then you can just put in an empty payslip ({}). The
above example describes being paid in April and May 2024, then nothing
in June, and then being paid (with a higher salary) in July, August
and September.
Save this main.go file. Then, back in your terminal, in your
payslips directory, just do:
go run main.go
You should get some output showing all sorts of calculations,
including income tax, and personal
allowance. With a little luck,
if you change the numbers to match your own salary and other details,
the numbers produced should match quite closely your own payslips,
provided nothing you’re doing is too exotic.
There is documentation for all the different
fields that you can
provide in each payslip. In general, the code will try to fill in
missing values. It should be able to cope with things like
salary-sacrifice, or, if you change job within a month and have
several payslips for the same month, this should work too. Everything
is run locally on your computer: please feel free to check the
source – there are no 3rd
party libraries at all, and nothing imports the net package. It’ll
work just the same if you yank out your network cable or disable your
WiFi.
Note however, this code is lightly tested. Whilst it works for me
(and one or two friends), I make no claims that it correctly models
the entirety of PAYE, so it may very well not work for you. Feedback,
contributions, corrections, and patches are all very welcome!
Due to some technical issues, our Spring Sale has been extended! You have until Monday, May 12 to get 20% off of all our courses and bundles with the code SOLVE25, and you can get an even bigger 30% discount if you subscribe to our mailing list.
Having now released the final portion of Solve.hs (probably my last course for a while) I wanted to consider the broader landscape of Haskell courses. What other courses are out there? Are they better than mine?
So I’ve actually purchased a few other Haskell courses, and spent a decent amount of time going through their material. I may not be the smartest person to write a Haskell course and I definitely don’t have the most industry experience with Haskell. But, having explored some of these other courses, I think there are some good reasons to consider my courses among the top tier in the Haskell community.
So on the last day of this sale, I wanted to explore a few areas where I think my courses stand out above the rest.
Breadth of Material
There’s a common thread among most Haskell material out there, including and especially courses: they generally all cover the same topics. You can expect to see all of the following in a Haskell course:
Basic Syntax and Types
Typeclasses and polymorphism
Basic Recursion
Understanding Functors, Applicatives & Monads
Using the IO monad
Basic use of the Map type
In some cases, you’ll also see something like a basic web server. And there’s a good reason for this progression. I covered the same material in Haskell From Scratch!
But there’s generally a lack of material in a lot of cool and interesting areas. I’ve done my best to cover a lot of these areas throughout my courses. Here are some of those topics, and the corresponding courses that cover them.
Data structures (beyond lists and maps) - Solve.hs
Simply put, I haven’t found a Haskell resource anywhere else that puts all these concepts together in a course-like environment. You could potentially find some blog posts that discuss them, or read the documentation, but this leads to the next point.
Detailed and Challenging Exercises
Reading by itself is rarely enough to retain knowledge, especially when it comes to programming. If you read a great article about unit testing in Haskell, you’ll probably forget all the details and have to go back to it the next time you actually want to use the ideas.
You can try to follow along with the article by writing the code in your own IDE. But you’ll still probably end up just copying things, which also isn’t the best way to learn.
You can even try to devise your own project to use the knowledge. But there’s often a significant cognitive effort involved in coming up with a new idea that fits these requirements…different enough from the article that you’re actually testing yourself, but similar enough that you can actually apply the concept.
Great programming courses should provide exercises so that you can try the techniques in your own environment, without a spoon-fed answer already available to you. They should remove the overhead of coming up with your own way to test yourself, while also providing rapid feedback on whether or not you’ve succeeded.
A lot of Haskell courses I've seen don't satisfy these criteria. Some don't have any exercises at all. And the ones that do often have at least one of the following issues:
Only 1-2 problems per lecture
Problems are too easy
Lack of test cases
No starter code (i.e. you’re only given a written description)
No toolchain integration (i.e. you’re just given a file, but no project to work with or limited build instructions)
Every course on Monday Morning Haskell Academy comes with detailed exercises to help you learn the material. You’ll usually get several problems per lecture (4-6), and the starter code for these problems comes with full toolchain integration and instructions, plus automated unit test cases.
Difficulty is always going to be a bit subjective, but for most lectures I’ve made an effort to have some easier problems as well as more challenging ones.
Lecture Content and Slides
Naturally, the core content of the course is the lecture materials, so it’s worth talking about that as well! Some Haskell courses rely strictly on written material, but most incorporate slides and audio presentation.
For the most part, course authors do a fine job with their slides. But I think I go above and beyond the norm by using bold text to highlight the most important parts of the code presented, and using colors to show the relationship between different elements on the same slide and across slides.
With our courses, you’re able to get the slides as a downloadable asset. And with the level of detail on them, they serve as a useful reference for you to quickly come back to, even without listening again to the lecture audio.
Other Guarantees
Finally, it’s worth noting that our courses and bundles all come with a 14-day money back guarantee. If you don’t like the materials, you can get a refund within 14 days with no questions asked.
Additionally, all our courses guarantee lifetime access to the content. There’s no recurring subscription. So if your life is too busy to go through the full course right now, you can always save it for later!
So you may as well take a look at our course listings now, since you can get a 20% discount using the code SOLVE25 (today only!). If you subscribe to our mailing list, you’ll get an extra 10% off as well.
Our new bundles (e.g. Beginners & Advanced) are a great way to save money while exploring the full breadth of Haskell materials and topics we have to offer. If you get MMH Complete, you’ll get lifetime access to all our course content, past, present and future! So don’t miss out, take advantage of the sale today!
Why doesn’t [the Data.Map function] unionWith :: (a -> a -> a) -> Map k a -> Map k a -> Map k a allow for different value types the way intersectionWith :: (a -> b -> c) -> Map k a -> Map k b -> Map k c does?
This is a very reasonable question, and it led down an interesting rabbit hole at the intersection of API design and efficient implementation.
To answer the original question, what would the type of a different value type of unionWith look like? It would be something in the flavor of:
unionWith :: (Maybe a -> Maybe b -> c) -> Map k a -> Map k b -> Map k c
But this new Maybe a -> Maybe b -> c parameter is somewhat lossy, in that it gives the impression that it could be called with Nothing Nothing as parameters, which doesn’t fit into the vibe of being a “union.”
So instead we could restrict that possibility by using These a b:
data These a b = This a | That b | These a b

unionWith :: (These a b -> c) -> Map k a -> Map k b -> Map k c
which seems reasonable enough.
But let's take reasonableness out of the picture and start again from first principles. Instead let's ask ourselves the deep philosophical question of what even IS a map?
A Map k v is a particularly efficient implementation of functions with type k -> Maybe v. But why is this Maybe here? It’s really only to encode the “default” value of performing a lookup. Nothing goes wrong if we generalize this to be Monoid v => k -> v. In fact, it helps us make sense of the right bias present in Data.Map, where we see:
lookup k (singleton k v1 <> singleton k v2) = Just v2
This equality is hard to justify under the normal understanding of Map k v being an encoding of a function k -> Maybe v. But under the general monoid interpretation, we get a nice semigroup homomorphism:
lookup k (m1 <> m2) = lookup k m1 <> lookup k m2
where the monoid in question has been specialized to be Last.
Of course, we also have a monoid homomorphism:
lookup k mempty = mempty
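To make the generalized reading concrete, here is a minimal sketch of the monoid-defaulting observation, written against the ordinary Data.Map API (the helper name lookupM is mine, not part of the library):

import qualified Data.Map as M

-- View a Map k v as a total function k -> v by defaulting to mempty
-- instead of wrapping the result in Maybe.
lookupM :: (Ord k, Monoid v) => k -> M.Map k v -> v
lookupM = M.findWithDefault mempty

Under this reading, lookupM k mempty = mempty holds definitionally.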
Let’s re-evaluate the original question in terms of this newly-generalized Map. Now that we’ve removed all of the unnecessary baggage of Maybe, we can again think about the desired type of unionWith:
unionWith :: (a -> b -> c) -> Map k a -> Map k b -> Map k c
which looks awfully familiar. This new type signature automatically resolves our original concerns about “what should we do if the key isn’t present?”—just call the function with mempty as a parameter!
We can give some semantics as to what unionWith ought to do again by relating it to the observation lookup. The relevant law here seems like it ought to be:
lookup k (unionWith f m n) = f (lookup k m) (lookup k n)
By choosing a degenerate function f, say, \_ _ -> nontrivial, where nontrivial is some value that is not mempty, we can see the beginnings of a problem:
lookup k (unionWith f m n)
  = f (lookup k m) (lookup k n)
  = <let f = \_ _ -> nontrivial>
    nontrivial
Regardless of the key we look up in our unionWithed Map, we need to get back nontrivial. How can we implement such a thing? I see only two ways:
explicitly associate every key in the map with nontrivial, or
keep nontrivial around as a default value in the map
#1 is clearly a non-starter, given that we want our Maps to be efficient encodings of functions, which leaves us with only #2. This is actually a pretty common construction, which stems immediately from the fact that a pair of monoids is itself a monoid. The construction would look something like this:
data Map k v = Map
  { defaultValue   :: v
  , implementation :: Data.Map.Map k v
  }
  deriving stock Generic
  deriving (Semigroup, Monoid) via (Generically (Map k v))

unionWith :: (a -> b -> c) -> Map k a -> Map k b -> Map k c
unionWith f (Map def1 imp1) (Map def2 imp2) =
  Map (f def1 def2) (liftA2 f imp1 imp2)
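Under this construction, lookup consults the stored default whenever the key is absent. A sketch, assuming the representation above:

lookup :: Ord k => k -> Map k v -> v
lookup k (Map def imp) = Data.Map.findWithDefault def k imp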
Seems fine, right? The nail in the coffin comes from when we reintroduce our semigroup homomorphism:
lookup k (m1 <> m2) = lookup k m1 <> lookup k m2
Without loss of generality, take m2 = pure nontrivial (where pure is just unionWith with a constant function.) This gives us:
lookup k (m1 <> pure nontrivial) = lookup k m1 <> nontrivial
Making this thing efficient is a further complication! We again have two options:
modify the value at every key by multiplying in nontrivial, or
finding a way of suspending this computation
#1 clearly requires \(O(n)\) work, which again forces us to look at #2. But #2 seems very challenging, because the monoidal values we need to suspend need not span the entire Map. For example, consider a Map constructed a la:
Representing this thing efficiently certainly isn’t impossible, but you’re not going to be able to do it on the balanced binary search trees that underlie the implementation of Data.Map.Map.
I find this quite an interesting result. I always assumed that Data.Map.Map (or at least, Data.Map.Monoidal.MonoidalMap) didn’t have an Applicative instance because it would require a Monoid constraint on its output—but that’s not the sort of thing we can express in Haskell.
But the analysis above says that’s not actually the reason! It’s that there can be no efficient implementation of Applicative, even if we could constrain the result.
What I find so cool about this style of analysis is that we didn't actually write any code, nor did we peek into the implementation of Data.Map (except to know that it's implemented as a balanced BST.) All we did was look at the obvious laws, instantiate them with degenerate inputs, and think about what would be required to efficiently get the right answer.
Usually I write about solutions to problems I've worked out, but I've found myself increasingly interested in where solutions come from. Maybe it's because I've been reading Boorstin's excellent The Discoverers, which I'd strongly recommend.
Regardless of why, I thought I’d switch up the usual dance step today, and discuss what solving my most-recent-big-problem actually looked like, in terms of what I tried, where I looked, and what the timeline was.
The Problem
The problem is to serialize a program graph into a series of let-bindings. For example, given the following graph:
+
/ \
f ---> g
| / \
a \ /
expensive
which represents the program:
f a (g expensive expensive) + g expensive expensive
Unfortunately, this is a naive representation of the program, since it duplicates the work required to compute expensive four times, and g expensive expensive twice. Instead, we would prefer to generate the equivalent-but-more-efficient program:
let $0 = expensive
    $1 = g $0 $0
 in f a $1 + $1
This transformation is affectionately known as sharing, since it shares the computed answer whenever there is repeated work to be done.
So this is what we’re trying to do. Given the original graph, determine the best place to insert these let-bindings, for some reasonable definition of “best.” We can assume there are no side effects involved, so any place that an expression is well-scoped is an acceptable solution.
In order to understand some of my attempted solutions, it's worth noting that our final solution should build something of type Expr, and the original graph is represented as an IntMap (ExprF Int). ExprF is the Base functor of Expr, with all of its self-references replaced by some type variable, in this case Int. Thus, the graph above looks much more like:
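(The encoding below is my own illustration, since the real ExprF constructors aren't shown here, but the shape is the point: an IntMap from node IDs to flattened nodes whose children are node IDs.)

import qualified Data.IntMap as IM

data ExprF r = VarF String | AppF String [r] | AddF r r

-- 0 = (+), 1 = f a (g ...), 2 = a, 3 = g expensive expensive, 4 = expensive
graph :: IM.IntMap (ExprF Int)
graph = IM.fromList
  [ (0, AddF 1 3)
  , (1, AppF "f" [2, 3])
  , (2, VarF "a")
  , (3, AppF "g" [4, 4])
  , (4, VarF "expensive")
  ]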
I spent over a year trying to solve this problem, with various mostly-working solutions during that time. My strategy here was to think really hard, write up some algorithm that seemed plausible, and then run it against our (small) battery of integration tests to make sure it got the same answer as before.
Why not property test it? I tried, but found it very challenging to implement well-typed generators that would reliably introduce shared thunks. But maybe there’s a different lesson to be learned here about writing good generators.
Anyway. For eight months, one of these think-really-hard algorithms fit the bill and didn't give us any problems. It was a weird, bespoke solution to the problem that independently kept track of all of the free variables in every graph fragment, and tried to let-bind a fragment as soon as we landed in a context where all of the free variables were in scope. It seemed to work, but it was extremely messy and unmaintainable.
At the time of writing, this sharing algorithm was the only source of let-binds in our entire language, which meant that it didn’t need to account for let-binds in the program.
Of course, that invariant eventually changed. We added a way in the source language to introduce lets, which meant my algorithm was wrong. And I had written it sufficiently long ago that I no longer remembered exactly why it worked. Which meant the theory of my program was lost, and thus that we ought to rewrite it.
Unfolding a Solution
I went back to the problem statement, and stared at it for a long time (back to the think-really-hard algorithm!) Upon staring at the problem, I realized that what I was really trying to do was determine where diamond patterns arose in the program graph.
Recall our original graph:
+
/ \
f ---> g
| / \
a \ /
expensive
If we redraw it such that g is on a different rank than f, then the two diamond patterns become much clearer:
+
/ \
f |
| \ |
a \ /
g
/ \
\ /
expensive
The insight I came up with is that if a node n is the source of a diamond, then we must let-bind the sink of the diamond immediately before inlining the definition of n.
This gives rise to the question of “how do we identify a diamond?” What we can do is give a mapping from each node to its reachable set of nodes. For example, in the above, we’d compute the map:
+ -> {+, f, a, g, expensive}
f -> {f, a, g, expensive}
a -> {a}
g -> {g, expensive}
expensive -> {expensive}
Then when we go to inline a node, say, +, we can look for any nodes that are reachable via more than one of its immediate subterms. Since the immediate subterms of + are f and g, we can take the intersections of their reachable sets:
{f, a, g, expensive} intersect {g, expensive}
giving us
{g, expensive}
which is exactly the set of nodes that we need to perform sharing on. If you topologically sort this set, it gives you the order that you should perform your let bindings.
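Here is a sketch of that computation over a bare adjacency-list version of the graph (the names and representation are my own, not the project's code; this is the reachability approach that later turns out to be too slow, but it matches the description here). Laziness memoizes the reachable sets, and pairwise intersections over a node's subterms recover the shared nodes:

import Data.List (tails)
import qualified Data.IntMap as IM
import qualified Data.IntSet as IS

-- For each node, the set of nodes reachable from it (including itself).
-- Laziness memoizes the per-node results; assumes the graph is acyclic.
reachable :: IM.IntMap [Int] -> IM.IntMap IS.IntSet
reachable graph = results
  where
    results = IM.mapWithKey node graph
    node n children = IS.insert n (IS.unions [results IM.! c | c <- children])

-- Nodes reachable via more than one immediate subterm of n: exactly the
-- candidates to let-bind immediately before inlining n.
shared :: IM.IntMap [Int] -> Int -> IS.IntSet
shared graph n = IS.unions
    [IS.intersection a b | (a : rest) <- tails sets, b <- rest]
  where
    results = reachable graph
    sets    = map (results IM.!) (graph IM.! n)

On the running example, shared at the + node computes exactly the {g, expensive} intersection worked out above.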
EXCEPT there’s a kink in the whole thing. What happens if one of the terms in this diamond contains free variables? In particular, we might have something like this:
+
/ \
f |
| \ |
a \ /
λx
/ \
\ /
expensive
|
x
This gives us an analogous set of reachable nodes when we look at +, but we obviously can’t lift expensive x above the lambda.
Resolving this problem required giving up on the notion of memoizing the entire reachable set of nodes, and to instead crawl the graph ensuring that everything is well-scoped.
Performance Woes
My algorithm looked fine, and, importantly, got the right answer in a reasonable amount of time on our (small) battery of integration tests. So I shipped it, commended myself on a job well done, and thought nothing more about it. For about a week, until a bug report came in saying that our compiler now seemed to hang on big programs.
Which was something I hadn’t noticed, since we didn’t have any big programs in our integration tests.
Damn!
Upon digging in to what exactly was so slow, I noticed that my algorithm was accidentally quadratic. I needed to fold over every node in the graph, and that required looking at the entire reachable set underneath it. I had put in some of the obvious safeguards, hoping that they would prune the search tree early, but it wasn’t enough sacrifice for the Great God of Asymptotes.
Did I mention that at this point in the story, having this algorithm working fast was on the critical path of the company? Everybody else was blocked on me figuring this out. Talk about pressure!
Anyway. You’ll notice above that in my description of the algorithm, everything sounds fine. But the juice is in the details, as the common saying goes. Computing reachability isn’t quite the right thing to be using here, as it gave us the wrong answer for the lambda example above. Which is unfortunate because reachability is something we can do in linear time.
And then when reachability didn’t work, I just threw away the fast performance and hoped my bespoke algorithm would do the job. My only redemption comes from the fact that at least it got the right answer, even if it did so very slowly.
Finding the Kernel
Back to the drawing board.
Whenever I have graph theory problems, I call up my boy Vikrem. He’s good at nerd stuff like this.
We rubberducked the problem, and tried to reframe it in the language of graph theory. We had a Merkiv–Maguire moment where we independently realized that the goal was somehow related to finding the lowest common ancestor (LCA) of a node.
Which is to say, roughly, that we are looking for forks in the diamond diagram. Which we already knew, but it was nice to have some language for.
Our new problem is that LCA is defined only over trees. There are some extensions to DAGs, but none of them seem to be particularly well founded. However, searching for exactly that brought me to this stackoverflow question, where nestled in the comments is someone suggesting that the poster isn't looking for LCA, but instead for a related notion, the lowest single common ancestor. LSCA is defined in a 2010 paper, New common ancestor problems in trees and directed acyclic graphs.
The standard definition of LCA(x, y) = l is that “l is an ancestor of x and of y, and that no descendent of l has this property.”
But the definition of LSCA(x, y) = l is that “l lies on all root-to-x paths, and that l lies on all root-to-y paths, and that no descendent of l has this property.”
The distinction between the two is easily seen in the following graph:
0
/ \
1 2
| X |
3 4
Under the standard definition, LCA is not uniquely defined for DAGs. That is, LCA(3, 4) = {1, 2}. But neither 1 nor 2 lies on all paths from the root. Under LSCA therefore we get LSCA(3, 4) = 0, which is the obviously-correct place to let-bind 3 and 4.
The paper gives a preprocessing scheme for computing LSCA by building a “lowest single ancestor” (LSA) tree. The LSA of a node is the LSCA of all of its in-edges. This definition cashes out to mean “the most immediate diamond above any node.” Finally! This is exactly what we’re looking for, since this is where we must insert our let-bindings! Even better, the paper gives us an algorithm for computing the LSA tree in linear time!
The First Implementer
Of course, I’m lazy and would prefer not to implement this thing. So instead I searched on hackage for lsca, and found nothing. But then I searched for lca and found that, like always, Ed Kmett was 13 years ahead of me.
The lca package implements an \(O(\log n)\) algorithm for computing the LCA of any two nodes in a graph. Which is very convenient for me, since the LSCA algorithm requires being able to do this.
Time to roll up the sleeves and get cracking I suppose.
The paper was surprisingly straightforward, and my first attempt implemented the (imperative) algorithms as given (imperatively.) The first step is to do a topological sort on the DAG in order to know in which order one ought to unfold the LSA tree.
But as is so often the case, this topological sort isn't actually relevant to the algorithm; it's just an encoding detail of expressing the algorithm imperatively. But you don't need that when you've got laziness on your side! Instead you can just tie the knot and do something cool like this:
lsaTree :: Ord v => Map v (Set v) -> Map v (Path v)
lsaTree input = fix $ \result -> M.fromList $ do
  (node, parents) <- M.toList input
  let parentResults = fmap (result M.!) parents
  ...
Notice how we use fix to bind the eventual result of the final computation. Then we can chase pointers by looking them up in result—even though it’s not yet “computed.” Who cares what order the computer does it in. Why is that a thing I should need to specify?
Anyway. The exact details of implementing LSA are not particularly important for the remainder of this blog post. If you’re interested, you can peep the PR, which is delightfully small.
Tying It All Back Together
Equipped with my LSA tree, I was now ready to go back and solve the original problem of figuring out where to stick let-bindings. It’s easy now. Given the original program graph, find the LSA for each node. The LSA is the place you should insert the let binding.
So given the map of nodes to their LSAs, invert that map and get back a map of nodes to descendents who have this node as an LSA. Now when you go to inline a node, just look up everything in this map and inline it first.
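As a sketch, the inversion step is a one-liner under the same assumed IntMap representation (again my own naming, not the code from the PR):

import qualified Data.IntMap as IM

-- From the node-to-LSA map, build the reverse index: for each node, the
-- descendants whose let-bindings must be emitted just before inlining it.
invertLSA :: IM.IntMap Int -> IM.IntMap [Int]
invertLSA lsa = IM.fromListWith (++) [(anc, [n]) | (n, anc) <- IM.toList lsa]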
It turns out to be a very elegant solution. It's one third of the length of my horrible ad-hoc implementations, and it runs in time linear in the number of nodes in the graph. All in all, very good.
More often than I'm comfortable about, people will ask me how I can have so many good ideas. And what I like about this story is that it's pretty typical of how I actually "have" "good" ideas. I'm reminded of the fact that luck favors the prepared mind. Attentive readers will notice that none of this process was due to brilliance on my part. I happened to know Vikrem who's a genius. Together we pulled at some ancient graph theory strings and remembered a fact that someone else had thought important to teach us. That wasn't actually the right path, but it led us to stackoverflow where someone had linked to a relevant paper. I implemented the paper using a library that someone else had done the heavy lifting on, and simplified the implementation using this knot-tying trick I picked up somewhere along the way.
Also, I’m just really pleased that the solution came from trying to reverse engineer the relevant graph-theory search terms. Maybe that’s the actual takeaway here.
A few days ago I got angry at xargs for the hundredth time, because
for me xargs is one of those "then he had two problems" technologies.
It never does what I want by default and I can never remember how to
use it. This time what I wanted wasn't complicated: I had a bunch of
PDF documents in /tmp and I wanted to use GPG to encrypt some of
them, something like this:
gpg -ac $(ls *.pdf | menupick)
menupick
is a lovely little utility that reads lines from standard input,
presents a menu, prompts on the terminal for a selection from the
items, and then prints the selection to standard output. Anyway, this
didn't work because some of the filenames I wanted had spaces in them,
and the shell sucks. Also because
gpg probably only does one file at a time.
I could have done it this way:
ls *.pdf | menupick | while read f; do gpg -ac "$f"; done
but that's a lot to type. I thought “aha, I'll use xargs.” Then I
had two problems.
ls *.pdf | menupick | xargs gpg -ac
This doesn't work because xargs wants to batch up the inputs to run
as few instances of gpg as possible, and gpg only does one file at
a time. I glanced at the xargs manual looking for the "one at a
time please" option (which should have been the default) but I didn't
see it amongst the forest of other options.
I think now that I needed -n 1 but I didn't find it immediately, and
I was tired of looking it up every time when it was what I wanted
every time. After many years of not remembering how to get xargs to
do what I wanted, I decided the time had come to write a stripped-down
replacement that just did what I wanted and nothing else.
(In hindsight I should perhaps have looked to see if gpg's
--multifile option did what I wanted, but it's okay that I didn't,
this solution is more general and I will use it over and over in
coming years.)
xar is a worse version of xargs, but worse is better (for me)
First I wrote a comment that specified the scope of the project:
# Version of xargs that will be easier to use
#
# 1. Replace each % with the filename, if there are any
# 2. Otherwise put the filename at the end of the line
# 3. Run one command per argument unless there is (some flag)
# 4. On error, continue anyway
# 5. Need -0 flag to allow NUL-termination
There! It will do one thing well, as Brian and Rob commanded us in
the Beginning Times.
I wrote a draft implementation that did not even do all those things,
just items 2 and 4, then I fleshed it out with item 1. I decided that
I would postpone 3 and 5 until I needed them. (5 at least isn't a
YAGNI, because I know I have needed it in the past.)
The result was this:
import subprocess
import sys

def command_has_percent(command):
    for word in command:
        if "%" in word:
            return True
    return False

def substitute_percents(target, replacement):
    return [ s.replace("%", replacement) for s in target ]

def run_command_with_filename(command_template, filename):
    command = command_template.copy()
    if not command_has_percent(command):
        command.append("%")
    res = subprocess.run(substitute_percents(command, filename), check=False)
    return res.returncode == 0

if __name__ == '__main__':
    template = sys.argv[1:]
    ok = True
    for line in sys.stdin:
        if line.endswith("\n"):
            line = line[:-1]
        if not run_command_with_filename(template, line):
            ok = False
    exit(0 if ok else 1)
Short, clean, simple, easy to use. I called it xar, ran
ls *.pdf | menupick | xar gpg -ac
and was content.
Now again, with Claude
The following day I thought this would be the perfect opportunity to
try getting some LLM help with programming. I already had a baseline
version of xar working, and had thought through the problem
specification. Now I could do it over with the LLM and compare the
two results. The program I wanted was small and self-contained. If
the LLM sabotaged me somehow, I would probably notice, and if I
didn't, it wouldn't matter, because I had a working version
already.
So I asked Claude about it. I knew Claude was not the right choice,
because Claude is not really designed for this task, and has no
integration with the coding editor, but Claude is what I am familiar
with so I thought I'd give it a shot.
On the whole it went extremely well.
The complete transcript is here. I
imagine this was pretty much a maximally good experience, that all the
terrible problems of LLM coding arise in larger, more complicated
systems. But this program is small and self-contained, with no
subtleties, so the LLM could be at its best.
Phase 1
First I just told Claude what I wanted and let it write the program
from scratch.
Let's write a program in Python called xar. It will be a
stripped-down version of the standard Unix xargs utility, but it
will do less and be simpler to use.
It will get command-line
arguments that specify a command template. Then it will read lines
from stdin. Each line will be appended to the command template to
produce a command. The program will run the resulting command, one
command for each line of input.
Any questions?
Default Claude is very chatty and I am going to omit most of its
chatty comments, which make me impatient.
The program it wrote was not what I would have written, but it was
good enough. If I had just used it right off the bat, instead of
writing my own, it would have been enough, and it would have taken
somewhere between 2% and 10% as long to produce.
So the one-line summary of this article is: I should stop writing
simple command-line utilities, and just have Claude write them
instead, because Claude's are good enough, and definitely better than
yak-shaving.
Here's Claude's initial version:
#!/usr/bin/env python3
import sys
import subprocess
import shlex

def main():
    if len(sys.argv) < 2:
        print("Usage: xar command [args...]", file=sys.stderr)
        sys.exit(1)

    command_template = sys.argv[1:]

    for line in sys.stdin:
        line = line.rstrip('\n')
        full_command = command_template + [line]
        try:
            result = subprocess.run(full_command, check=True)
        except subprocess.CalledProcessError as e:
            print(f"Command failed with exit code {e.returncode}: {' '.join(full_command)}",
                  file=sys.stderr)
        except FileNotFoundError:
            print(f"Command not found: {full_command[0]}", file=sys.stderr)
            sys.exit(1)

if __name__ == "__main__":
    main()
Claude's version had numerous comments, which I have omitted. I later
told it to stop putting in comments, which it did.
Claude's use of check here was not what I wanted, because that makes
subprocess.run raise an exception when the subcommand fails, and
then the exception has to be immediately caught and handled. My
original control flow had been simpler:
res = subprocess.run(substitute_percents(command, filename), check=False)
return res.returncode == 0
…
if not run_command_with_filename(template, line):
    ok = False
Claude's program pulled in shlex without noticing that it was
unused. But Claude did teach me about str.rstrip('\n') which I had
not known about before (or maybe had forgotten), so that was one small
win already.
Argument parsing
The next step was a big win. Python's library for command-line
argument handling is called argparse and it is really nice. If I
were still writing programs in Perl, I would implement a Perl version
of argparse because Perl has 29 available argument parsing libraries
and they are all loathsome. The one problem with argparse is I never
remember off the top of my head how to use it. I think the module is
called argparse but it provides a class called Argparser but I
often get these backward and try to use argparser and Argparse.
Instead of figuring it out every time I usually dig up some previous
Python program and then copy-paste the argument parser from there,
amending it to suit the purpose.
But this time I didn't have to do that. Instead, I just said to
Claude:
This is good, now please add code at the top to handle argument
parsing with the standard Argparse library, even though there are
no options yet.
Claude handed me back pretty much the same program, but with the
argument parser at the top:
def parse_arguments():
    """Parse command-line arguments using argparse."""
    parser = argparse.ArgumentParser(
        description="A simplified version of xargs that appends each line from stdin to a command template.",
        prog="xar"
    )

    # Store all positional arguments as the command to execute
    parser.add_argument(
        "command",
        nargs="+",
        help="The command template to execute for each input line"
    )

    return parser.parse_args()
Let's pause for a moment. Maybe you kids are unimpressed by this.
But if someone had sent this interaction back in time and showed it to
me even as late as five years ago, I would have been stunned. It
would have been completely science-fictional. The code that it
produced is not so impressive, maybe, but that I was able to get it in
response to an unstructured request in plain English is like seeing a
nuclear-powered jetpack in action.
I make some changes
It bothered me that the command-line arguments were being taken
implicitly from sys.argv; that makes the program hard to test. A
better way to do this is to have the list of command-line arguments be an
argument to the main function, and then for the runtime to call
main(sys.argv[1:]) instead of just main(). That way you can test
the program without having to fake up sys.argv. So I changed that,
and while I was making changes I got rid of Claude's comments and the
unnecessary exception handling. I pasted the result back at Claude,
saying:
Good so far. I have adjusted your source code in several ways:
I removed the comments.
I removed the exception handling for the subcommand. If one of the subcommands fails, that no longer terminates the main loop
Command-line arguments are passed to the main() function rather than being accessed directly
Here is the code. Tell me if you see any problems or if you have any questions about it
I had also sabotaged the code with at least one likely error.
Python's subprocess.run returns a “result” structure that includes
the subcommand's output and exit status, and I can never remember the
name of the component that records the exit status. I either look it
up or get it by trial-and-error. This time I didn't bother to either,
I just guessed that it was result.exitcode, which I handed off to
Claude even though I thought it was probably wrong.
Claude's reply said:
Here are a couple of issues I noticed:
There's an error in the error handling code - you're using result.exitcode but in the subprocess.run() function, the attribute is called returncode not exitcode
Awesome, now I don't have to remember the name of that component.
Claude continued:
The parse_arguments() function definition is missing in the pasted code, but it would need to be updated to accept the argv parameter
I had passed sys.argv[1:] to main, and then had main call parse_arguments(argv).
But I hadn't changed parse_arguments to expect an argument list. (I
hadn't included it in my paste at all, in fact.) Claude was exactly
right, and continued with the corrected version:
#!/usr/bin/env python3
import sys
import subprocess
import argparse

def parse_arguments(argv):
    …
    return parser.parse_args(argv)

def main(argv):
    # Parse command-line arguments
    args = parse_arguments(argv)
    …

if __name__ == "__main__":
    ok = main(sys.argv[1:])
    exit(0 if ok else 1)
Handling NUL-separated values
I said
It's great, thanks for catching those errors. Now let's add a -0 command-line option that tells the program to split the input on "\0" characters instead of at line breaks.
Recall that my handwritten xar did not do this yet; I had left that
feature for another day. Claude just added it for me, adding
parser.add_argument(
    "-0", "--null",
    action="store_true",
    help="Input items are terminated by a null character instead of by newline"
)
to the argument parser and changing the main loop to:
# Split input based on the delimiter option
if args.null:
    # Split on null characters
    items = stdin_data.split('\0')
else:
    # Split on newlines
    items = stdin_data.splitlines()

for item in items:
    if not item:
        continue
    …
I was tired of Claude's comments, so I said
Give me the same thing, with no comments
which it did, so I said
From now on leave out the comments. I'm a Python expert and I don't need them. If there is something I don't understand I will ask you.
Claude complied. If I were going to use Claude again in the future
I would include that in the canned instructions that Claude is given
up front. Instead I will probably use a tool better-suited to
programming, and do whatever the analogous thing is.
Template filling
Now I told Claude to add the % feature:
I changed if not item to if item == "". Now let's make the following change:
If the command template includes any % characters, each of these should be replaced with the input item.
Otherwise, if there were no % characters, the input item should be appended to the end of the command as in the current version
Claude did this. It used an explicit loop instead of the list
comprehension that I had used (and preferred), but it did do it correctly:
for arg in command_template:
    if '%' in arg:
        has_placeholder = True
        full_command.append(arg.replace('%', item))
    else:
        full_command.append(arg)

if not has_placeholder:
    full_command.append(item)
Even without the list comprehension, I would have factored
out the common code:
for arg in command_template:
    if '%' in arg:
        has_placeholder = True
    full_command.append(arg.replace('%', item))

if not has_placeholder:
    full_command.append(item)
But I am not going to complain, my code is simpler but is
doing unnecessary work.
Claude also took my hint to change item == "" even though I didn't
explicitly tell it to change that.
At this point the main loop of the main function was 15 lines long,
because Claude had stuck all the %-processing inline. So I said:
Good, let's extract the command template processing into a subroutine.
It did this right, understanding correctly what code I was referring
to and extracting it into a subroutine called
process_command_template. More science fiction: I can say "command
template processing" and it guesses what I had in mind!
This cut the main loop to 7 lines. That worked so well I tried it
again:
Good, now let's extract the part of main that processes stdin into a subroutine that returns the items array
It pulled the correct code into a function called process_stdin. It
did not make the novice mistake of passing the entire args structure
to this function. In the caller it had process_stdin(args.null) and
inside of process_stdin this parameter was named
use_null_delimiter.
YAGNI?
At this point I was satisfied but I thought I might as well ask if it
should do something else before we concluded:
Can you think of any features I left out that would be useful enough
to warrant inclusion? Remember this program is supposed to be small
and easy to use, in contrast to the existing xargs which is very
complicated.
Claude had four suggestions:
A -p or --parallel option to run commands in parallel
A -n or --max-args option to specify the maximum number of items to pass per command
A simple -v or --verbose flag to show commands as they're executed
A way to replace the command's standard input with the item instead of adding it as an argument
All reasonable suggestions, nothing stupid. (It also supplied code for #3,
which I had not asked for and did not want, but as I said before,
default Claude is very chatty.)
Parallelization
I didn't want any of these, and I knew that #2–4 would be easy to add if I
did want any of them later. But #1 was harder. I've done code like
this in the past, where the program has a worker pool and runs a new
process whenever the worker pool isn't at capacity. It's not even that
hard. In Perl you can play a cute trick and use something like
$workers{spawn()} = 1 while delete $workers{wait()};
where the workers hash maps process IDs to dummy values. A child
exits, wait() awakens and returns the process ID of the completed
child, which is then deleted from the map, and the loop starts another
worker.
I wanted to see how Claude would do it, and the result was an even
bigger win than I had had previously, because Claude wrote this:
with concurrent.futures.ProcessPoolExecutor(max_workers=args.parallel) as executor:
    futures = [executor.submit(execute_command, cmd, args.verbose) for cmd in commands]
    for future in concurrent.futures.as_completed(futures):
        success = future.result()
        if not success:
            ok = False
What's so great about this? What's great is that I hadn't known about
concurrent.futures or ProcessPoolExecutor. And while I might have
suspected that something like them existed, I didn't know what they
were called. But now I do know about them.
If someone had asked me to write the --parallel option, I would have
had to have this conversation with myself:
Python probably has something like this already. But how long will
it take me to track it down? And once I do, will the API
documentation be any good, or will it be spotty and incorrect? And
will there be only one module, or will there be three and I will
have to pick the right one? And having picked module F6, will I
find out an hour later that F6 is old and unmaintained and that
people will tell me “Oh, you should have used A1, it is the new
hotness, everyone knows that.”
When I put all that uncertainty on a balance, and weigh it
against the known costs of doing it myself, which one wins?
The right choice is: I should do the research, find the good module (A1, not
F6), and figure out how to use it.
But one of my biggest weaknesses as a programmer is that I too often
make the wrong choice in this situation. I think “oh, I've done this
before, it will be quicker to just do it myself”, and then I do and it
is.
Let me repeat, it is quicker to do it myself. But that is still
the wrong choice.
Maybe the thing I wrote would be sooner or smaller or faster or more technically
suitable to the project than the canned module would have been. But it
would only have been more technically suitable today. If it
needed a new feature in the future it might have to be changed by
someone who had never seen it before, whereas the canned module could well
already have the needed feature ready to go, already documented, and
perhaps already familiar to whoever had to make the change. My
bespoke version would certainly be unfamiliar to every other
programmer on the project — including perhaps myself six months later — so would be the wrong thing to use.
I'm really good at hacking this stuff up. Which is a problem. It
makes me want to hack stuff up, even when I shouldn't.
Claude tips the balance strongly toward the correct side, which is
that I should use the prepackaged module that someone else wrote and
not hack something up.
And now I know about concurrent.futures.ProcessPoolExecutor! The
world is full of manuals, how can I decide which ones I should read?
One way is to read them all, which used to work back when I was
younger, but now I have more responsibilities and I don't have time to
read the entire Python library reference including all the useless
bits I will never use. But here's Claude pointing out to me that this
is something I need to know about, now, today, and I should read this one. That is
valuable knowledge.
Claude flubs
While testing the program I did
xar --parallel=2 bash -c '...'
which didn't work because the argparse library tried to parse the
-c as a flag to xar, when I wanted it to be a flag to bash.
This was my fault, and there is a standard way around it, which is
that I should have said
xar --parallel=2 -- bash -c '...'
which tells argparse to drop the -- and take everything after it
literally with no interpretation.
I thought I'd see what Claude had to say about this, and it dropped
the ball in a very LLM sort of way.
I asked:
When I did xar --parallel=2 bash -c '...' it misunderstood the
arguments. It thought I was trying to pass -c to xar and failed
with unrecognized arguments. What should I do about that?
Claude replied:
There are two main approaches to handle this:
Use the -- separator to indicate the end of arguments for xar: …
Modify your code to handle this case specifically by looking for a command terminator in the argument parser.
Here's how you could modify your code to implement the first
approach:
And then it followed with the same, unmodified code…
Which was correct, because -- is already handled by argparse, so
no change was actually required to implement the first approach! But
Claude's characterization of it as a modification was misleading,
wrong, and potentially very confusing.
I said:
I don't see what the difference is in that argument parsing code as compared to what I had before.
and Claude got into the LLM tail-chasing mode where it got more and
more confused:
You're right, I apologize for the confusion. There's no difference
in the argument parsing code I provided compared to what you had
before. The issue is with how argparse handles arguments by
default.
To fix the issue with xar --parallel=2 bash -c '...', you need to
actually implement support for the -- separator in your code…
# Find the index of '--' if it exists
try:
    separator_index = argv.index('--')
    xar_args = argv[:separator_index]
    command_args = argv[separator_index+1:]
except ValueError: …
Ugh, all wrong in the worst way. The code probably works, but it is
completely unnecessary. Claude's claim that “you need to actually
implement support for the -- separator” is flat wrong. I pointed
this out and Claude got more confused. Oh well, nobody is perfect!
Lessons learned
A long time ago, when syntax-coloring editors were still new, I tried
one and didn't like it, then tried again a few years later and
discovered that I liked it better than I had before, and not for the
reasons that anyone had predicted or that I would have been able to
predict.
(I wrote an article about the surprising reasons to use the syntax coloring.)
This time also. As usual, an actual experiment produced unexpected
results, because the world is complicated and interesting. Some of
the results were unsurprising, but some were not anything I would have
thought of beforehand.
Claude's code is good enough, but it is not a magic oracle
Getting Claude to write most of the code was a lot faster and easier
than writing it myself. This is good! But I was dangerously tempted
to just take Claude's code at face value instead of checking it
carefully. I quickly got used to flying along at great speed, and it
was tough to force myself to slow down and be methodical, looking over
everything as carefully as I would if Claude were a real junior
programmer. It would be easy for me to lapse into bad habits,
especially if I were tired or ill. I will have to be wary.
Fortunately there is already a part of my brain trained to deal with
bright kids who lack experience, and I think perhaps that part of my brain
will be able to deal effectively with Claude.
I did not notice any mistakes on Claude's part — at least this time.
At one point my testing turned up what appeared to be a bug, but it
was not. The testing was still time well-spent.
Claude remembers the manual better than I do
Having Claude remember stuff for me, instead of rummaging the
manual, is great. Having Claude stub out an argument parser,
instead of copying one from somewhere else, was pure win.
Partway along I was writing a test script and I wanted to use that
Bash flag that tells Bash to quit early if any of the subcommands
fails. I can never remember what that flag is called. Normally I
would have hunted for it in one of my own shell scripts, or groveled
over the 378 options in the bash manual. This time I just asked in
plain English “What's the bash option that tells the script to abort
if a command fails?” Claude told me, and we went back to what we were
doing.
Claude can talk about code with me, at least small pieces
Claude easily does simple refactors. At least at this scale, it got
them right. I was not expecting this to work as well as it did.
When I told Claude to stop commenting every line, it did. I
wonder, if I had told it to use if not expr only for Boolean
expressions, would it have complied? Perhaps, at least for a
while.
When Claude wrote code I wasn't sure about, I asked it what it was
doing and at least once it explained correctly. Claude had written
parser.add_argument(
    "-p", "--parallel",
    nargs="?",
    const=5,
    type=int,
    default=1,
    help="Run up to N commands in parallel (default: 5)"
)
Wait, I said, I know what the const=5 is doing, that's so that if
you have --parallel with no number it defaults to 5. But what is
the default=1 doing here? I just asked Claude and it told me:
that's used if there is no --parallel flag at all.
This was much easier than it would have been for me to pick over
the argparse manual to figure out how to do this in the first
place.
More thoughts
On a different project, Claude might have done much worse. It might
have given wrong explanations, or written wrong code. I think that's
okay though. When I work with human programmers, they give wrong
explanations and write wrong code all the time. I'm used to it.
I don't know how well it will work for larger systems. Possibly pretty
well if I can keep the project sufficiently modular that it doesn't get
confused about cross-module interactions. But if the criticism is
“that LLM stuff doesn't work unless you keep the code extremely
modular” that's not much of a criticism. We all need more
encouragement to keep the code modular.
Programmers often write closely-coupled modules knowing that it is bad
and it will cause maintenance headaches down the line, knowing that the
problems will most likely be someone else's to deal with. But what if
writing closely-coupled modules had an immediate cost today, the cost
being that the LLM would be less helpful and more likely to mess up
today's code? Maybe programmers would be more careful about letting
that happen!
Will my programming skill atrophy?
Folks at Recurse Center were discussing this question.
I don't think it will. It will only atrophy if I let it. And I have a
pretty good track record of not letting it. The essence of
engineering is to pay attention to what I am doing and why, to try to
produce a solid product that satisfies complex constraints, to try
to spot problems and correct them. I am not going to stop doing
this. Perhaps the problems will be different ones than they were
before. That is all right.
Starting decades ago I have repeatedly told people
You cannot just paste code with no understanding of
what is going on and expect it to work.
That was true then without Claude and it is true now with Claude. Why
would I change my mind about this? How could Claude change it?
Will I lose anything from having Claude write that complex
parser.add_argument call for me? Perhaps if I had figured it out
on my own, on future occasions I would have remembered the const=5 and default=1
specifications and how they interacted. Perhaps.
But I suspect that I have figured it out on my own in the past, more
than once, and it didn't stick. I am happy with how it went this time.
After I got Claude's explanation, I checked its claimed behavior pretty
carefully with a stub program, as if I had been reviewing a
colleague's code that I wasn't sure about.
The biggest win Claude gave me was that I didn't know about this
ProcessPoolExecutor thing before, and now I do. That is going to
make me a better programmer. Now I know something useful that
I didn't know before, and I have a pointer to documentation I know I
should study.
My skill at writing ad-hoc process pool managers might atrophy, but if
it does, that is good. I have already written too many ad-hoc
process pool managers. It was a bad habit, I should have stopped long
ago, and this will help me stop.
Conclusion
This works.
Perfectly? No, it's technology, technology never works perfectly.
Have you ever used a computer?
Will it introduce new problems? Probably, it's new technology, and
new technology always introduces new problems.
But is it better than what we had before? Definitely.
I still see some programmers turning up their noses at this technology
as if they were sure it was a silly fad that would burn itself out
once people came to their senses and saw what a terrible idea it was.
I think that is not going to happen, and those nose-turning-up people,
like the people who pointed out all the drawbacks and unknown-unknowns
of automobiles as compared to horse-drawn wagons, are going to look
increasingly foolish.
Suppose a centrifuge has n slots, arranged in a circle around the
center, and we have k test tubes we wish to place into the slots.
If the tubes are not arranged symmetrically around the center, the
centrifuge will explode.
(By "arranged symmetrically around the center, I mean that if the
center is at , then the sum of the positions of the tubes
must also be at .)
Let's consider the example of n = 12. Clearly we can arrange 2,
3, 4, or 6 tubes symmetrically. Equally clearly
we can't arrange only 1. Also it's easy to see we can do k tubes if
and only if we can also do 12 − k tubes, which rules out 11.
From now on I will write (n, k) to mean the problem of balancing
k tubes in a centrifuge with n slots. So (12, 2) and (12, 3) are possible, and (12, 1) and (12, 11) are
not. And (n, k) is solvable if and only if (n, n − k) is.
It's perhaps a little surprising that (12, 7) is possible.
If you just ask this to someone out of nowhere they might
have a happy inspiration: "Oh, I'll just combine the solutions for
(12, 3) and (12, 4), easy." But that doesn't work because two groups
of the form {a, a+4, a+8} and {b, b+3, b+6, b+9} always overlap.
For example, if your group of 4 is the
slots {0, 3, 6, 9} then you can't also have your group of 3 be
{0, 4, 8}, because slot 0 already has a tube in it.
The
other balanced groups of 3 are blocked in the same way. You
cannot solve the puzzle with 4 + 3; you have to do 3 + 2 + 2, as
below left.
The best way to approach this is to do (12, 5), as below right.
This is easy,
since the triangle only blocks three of the six symmetric pairs.
Then you replace the holes with tubes and the tubes with holes to
turn (12, 5) into (12, 7).
Given n and k, how can we decide whether the centrifuge can be
safely packed?
Clearly you can solve (n, k) when n is a multiple of k, but the example
of (12, 5) (or (12, 7)) shows this isn't a necessary condition.
A generalization of this is that (n, jk) is always solvable
if (n, k) is, since you can easily
balance k tubes at positions {0, n/k, 2n/k, …}, then do another k tubes one position over, and
so on. For example, to do (12, 8) you just put the first four tubes
in slots {0, 3, 6, 9} and the next four one position over, in slots
{1, 4, 7, 10}.
An interesting counterexample is that the strategy for (12, 7),
where we did 3 + 2 + 2, cannot be extended to (10, 7). One
would want to do 5 + 2, but there is no way to arrange the tubes
so that the group of 2 doesn't conflict with the group of 5,
which blocks one slot from every pair.
But we can see that this must be true without even considering the
geometry. (10, 7) is the reverse of (10, 3), which is
impossible: the only nontrivial divisors of 10 are 2 and
5, so 3 must be a sum of 2s and 5s, and it is not.
You can't fit 9 tubes when n = 14, but again the reason is
a bit tricky. When I looked at (14, 9) directly, I did a case analysis
to make sure that the 7-group and the 2-group would always
conflict. But again there was an easier way to see this: (14, 9) is the reverse of (14, 5), and (14, 5)
clearly won't work, as 5 is not a sum of 2s and 7s.
I wonder if there's an example where both (n, k) and (n, n − k) are not obvious?
For n = 12, every k works except 1 and the always-impossible 11.
What's the answer in general? I don't know.
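For anyone who wants to experiment, here is a small brute-force checker, my own sketch rather than anything from the discussion above: it places slot j at the j-th of the n-th roots of unity and asks whether some set of k slots sums to zero, up to floating-point tolerance.

import Data.Complex (magnitude, cis)

-- All k-element subsets of a list.
choose :: Int -> [a] -> [[a]]
choose 0 _      = [[]]
choose _ []     = []
choose k (x:xs) = map (x:) (choose (k - 1) xs) ++ choose k xs

-- Can k tubes be balanced in an n-slot centrifuge?
solvable :: Int -> Int -> Bool
solvable n k = any balanced (choose k [0 .. n - 1])
  where
    balanced slots =
      magnitude (sum [cis (2 * pi * fromIntegral j / fromIntegral n) | j <- slots]) < 1e-9

It agrees with the cases above: solvable 12 7 is True, while solvable 10 3 is False.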
Addenda
20250502
Now I am amusing myself thinking about the perversity of a centrifuge
with a prime number of slots, say p. If you use it at all, you must
fill every slot. I hope you like explosions!
While I did not explode any centrifuges in university chemistry, I did
once explode an expensive Liebig condenser.
Omar Antolín points out an important consideration I missed:
it may be necessary
to subtract polygons. Consider (30, 9). This is obviously
possible since 9 = 3 + 3 + 3. But there is a more interesting
solution. We can add the pentagon {0, 6, 12, 18, 24} to the
digons {5, 20} and {10, 25} to obtain the solution
$${0,5,6,10,12,18, 20, 24, 25}.$$
Then from this we can subtract the triangle {0, 10, 20} to obtain $${5, 6, 12, 18, 24, 25},$$ a solution to
(30, 6) which is not a sum of regular polygons.
Thanks to Dave Long for pointing out a small but significant error,
which I have corrected.
The GHC developers are very pleased to announce the availability
of the final release for GHC 9.10.2. Binary distributions, source
distributions, and documentation are available at downloads.haskell.org and
via GHCup.
GHC 9.10.2 is a bug-fix release fixing over 50 issues of a variety of
severities and scopes, including:
Significantly improved performance when dynamically loading Haskell symbols (#23415).
Fixing a bug where the simplifier sometimes destroyed join points during float out, which could impact performance (#24768).
Reduced memory fragmentation in the non-moving GC’s segment allocator, improving resident set size by up to 26% for some applications (#24150).
Added new flags to control speculative evaluation (-fspec-eval and -fspec-eval-dictfun) to work around performance regressions (#25606).
Fixed several platform-specific issues, including segfaults with FFI on PowerPC (#23034) and improved code
generation for AArch64 with multiway branches now using jump tables (#19912).
And many more!
A full accounting of these fixes can be found in the release notes. As
always, GHC’s release status, including planned future releases, can be found on
the GHC Wiki status page.
We would like to thank Well-Typed, Tweag I/O, Juspay, QBayLogic, Channable,
Serokell, SimSpace, the Haskell Foundation, and other anonymous contributors
whose on-going financial and in-kind support has facilitated GHC maintenance
and release management over the years. Finally, this release would not have
been possible without the hundreds of open-source contributors whose work
comprises this release.
As always, do give this release a try and open a ticket if you see
anything amiss.
At work I’ve been researching
how to improve the ergonomics of prompt engineering and I wanted to share
and open source some of what I’ve done. This initial post is about how
I’ve been experimenting with using bidirectional type inference
to streamline prompt chaining.
“Prompt chaining” is a prompt engineering technique that splits a
larger task/prompt into multiple smaller tasks/prompts which are chained
together using code. For example, instead of prompting a model to
generate a poem in one prompt like this:
Write a poem based off this idea:
${idea}
… by following this process:
First think through the form, stanza count, lines per stanza, and
rhyme scheme
Then choose a poetic style (tone, voice, and literary devices) based
on the poem’s form
Then write a complete poem based on that plan
… you can split it into smaller prompts, like this:
structure prompt:
Plan the structure of a new poem based on this idea
${idea}
Describe its form, stanza count, lines per stanza, and rhyme
scheme
style prompt:
Given this poem structure:
Form: ${structure.form}
Stanzas: ${structure.stanzaCount}
Lines per stanza: ${structure.linesPerStanza}
Rhyme scheme: ${structure.rhymeScheme}
Choose a poetic style: tone, voice, and literary devices to
emphasize
poem prompt:
Write a complete poem based on this idea:
${idea}
Structure:
Form: ${structure.form}
Stanzas: ${structure.stanzaCount}
Lines per stanza: ${structure.linesPerStanza}
Rhyme scheme: ${structure.rhymeScheme}
Style:
Tone: ${style.tone}
Voice: ${style.voice}
Literary Devices: ${style.literaryDevices}
Why might you want to do this?
to improve the quality of the results
Models perform better when working on more constrained subproblems.
Splitting a larger prompt into smaller prompts helps the model stay
focused at each step.
to introspect intermediate results
This comes in handy when you want to log, validate, or correct
intermediate results.
to perform actions in between prompts
You might want to take the output of one prompt, use that to call
some tool, then use the output of that tool to decide what the next
prompt should be, which you can’t do with a single prompt.
In other words, prompt chaining unlocks greater accuracy,
control, and flexibility for prompt engineering.
The problem
The main issue with prompt chaining is that it is a huge pain in the
ass; if you start to do anything a little bit complicated you need to start
using structured outputs (i.e. JSON), which adds a whole lot of
boilerplate to the process:
You have to define the schema for each intermediate step of the
process
You typically do this by defining your data model in your host
programming language (e.g. a Pydantic model in Python) or directly
defining your JSON schema
You have to instruct the model to produce JSON and explain the
shape of the expected output
(Depending on the framework) you have to decode the JSON into
your data model
For small prompt chaining pipelines this isn’t too hard, but it
starts to get annoying to define all these schemas when you scale this
up to more sophisticated prompt chaining pipelines.
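In a typical Haskell setup, for example, a single intermediate step costs you a data type, a FromJSON instance, hand-written shape instructions in the prompt, and a decoding step. Here is a sketch of that boilerplate using aeson (the names and prompt wording are invented for illustration; details vary by framework):

{-# LANGUAGE DeriveGeneric #-}

import Data.Aeson (FromJSON, eitherDecode)
import GHC.Generics (Generic)
import qualified Data.ByteString.Lazy.Char8 as BL

-- The data model for one intermediate step of the chain.
data Structure = Structure
  { form           :: String
  , stanzaCount    :: Int
  , linesPerStanza :: Int
  , rhymeScheme    :: String
  } deriving (Show, Generic)

instance FromJSON Structure

-- The prompt has to spell out the expected JSON shape by hand.
structurePrompt :: String -> String
structurePrompt idea = unlines
  [ "Plan the structure of a new poem based on this idea: " ++ idea
  , "Respond with JSON only, in the shape:"
  , "{\"form\": string, \"stanzaCount\": int, \"linesPerStanza\": int, \"rhymeScheme\": string}"
  ]

-- And the model's reply has to be decoded back into the data model.
parseStructure :: BL.ByteString -> Either String Structure
parseStructure = eitherDecode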
So as a thought experiment I wanted to create a research prototype
that handled all of that for you so that you didn’t need to specify any
schemas at all. In other words I wanted to build a programming language
that harnessed bidirectional type inference to perform
schema inference for prompts with structured JSON outputs.
Example
I’ll cut to the chase by showing the above prompt chain written as a
program in this language:
let concatSep = https://raw.githubusercontent.com/Gabriella439/grace/refs/heads/main/prelude/text/concatSep.ffg

let lines = concatSep "\n"

let generatePoem idea =
      let structure = prompt
            { model: "gpt-4o"
            , text: lines
                [ "Plan the structure of a new poem based on this idea:"
                , ""
                , idea
                , ""
                , "Describe its form, stanza count, lines per stanza, and rhyme scheme."
                ]
            }

      let renderedStructure = lines
            [ "- Form: " + structure.form
            , "- Stanzas: " + Real/show (structure.stanzaCount : Integer)
            , "- Lines per stanza: " + Real/show (structure.linesPerStanza : Integer)
            , "- Rhyme scheme: " + structure.rhymeScheme
            ]

      let style = prompt
            { model: "gpt-4o"
            , text: lines
                [ "Given this poem structure:"
                , renderedStructure
                , ""
                , "Choose a poetic style: tone, voice, and literary devices to emphasize."
                ]
            }

      let renderedStyle = lines
            [ "- Tone: " + style.tone
            , "- Voice: " + style.voice
            , "- Literary Devices: " + concatSep ", " style.literaryDevices
            ]

      let poem : Text = prompt
            { model: "gpt-4o"
            , text: lines
                [ "Write a complete poem based on this idea:"
                , ""
                , idea
                , ""
                , "Structure:"
                , renderedStructure
                , ""
                , "Style:"
                , renderedStyle
                ]
            }

      in { structure: structure, style: style, poem: poem }

in generatePoem "The stillness of early morning"
If you run the above example, you might get an output like this:
{"structure":{"form":"Free verse with a sprinkle of rhyme","linesPerStanza":4,"rhymeScheme":"ABCB","stanzaCount":5},"style":{"literaryDevices":["imagery","alliteration","personification","enjambment"],"tone":"nostalgic","voice":"reflective"},"poem":"### The Stillness of Early Morning\n\n**I** \nIn the gentle rise of dawn's embrace, \nA whisper of winds through sleepy skies, \nThe world lies suspended in an echo \nOf yesterday's dreams, reluctantly saying goodbye.\n\n**II** \nSoft orange light graces fields of dew, \nCradling tenacious tendrils of the past. \nMorning breathes upon the waiting world \nWith promises that this serenity will last.\n\n**III** \nWaves of silence surge across the meadow, \nNature's symphony gently cradles rest. \nA chorus of leaves sings its soft refrain, \nComforted by morning's tender caress.\n\n**IV** \nThe silence dips to caress weary trees, \nWhose branches waltz to whispers soft and slow. \nEach tendril holds close the morning's intention, \nBefore the day's demands, the chaos, start to show.\n\n**V** \nEarly rays dance upon a tranquil sea, \nA canvas painted through a dreamy haze. \nTime sits by, savoring the gentle hush, \nAs the world awakens, stirs, and finally plays."}
… and if you’re curious what the formatted poem looks like:
The Stillness of Early Morning
I
In the gentle rise of dawn’s embrace,
A whisper of winds through sleepy skies,
The world lies suspended in an echo
Of yesterday’s dreams, reluctantly saying goodbye.
II
Soft orange light graces fields of dew,
Cradling tenacious tendrils of the past.
Morning breathes upon the waiting world
With promises that this serenity will last.
III
Waves of silence surge across the meadow,
Nature’s symphony gently cradles rest.
A chorus of leaves sings its soft refrain,
Comforted by morning’s tender caress.
IV
The silence dips to caress weary trees,
Whose branches waltz to whispers soft and slow.
Each tendril holds close the morning’s intention,
Before the day’s demands, the chaos, start to show.
V
Early rays dance upon a tranquil sea,
A canvas painted through a dreamy haze.
Time sits by, savoring the gentle hush,
As the world awakens, stirs, and finally plays.
Type inference
The sample Grace program hardly specifies any types (mainly the final
expected type for the poem: Text). The reason
this works is because Grace supports bidirectional type
inference, which means that Grace can work backwards from how
intermediate results are used to infer their schemas.
I’ll illustrate this with a contrived Grace example:
let numbers = prompt { text: "Give me two numbers" }

in  { x: numbers.x
    , y: numbers.y
    , sum: numbers.x + numbers.y : Integer
    }
… which might produce an output like this:
$ grace interpret ./numbers.ffg
{"x":7,"y":14,"sum":21}
When Grace analyzes this program the type checker works backwards
from this expression:
numbers.x + numbers.y :Integer
… and reasons about it like this:
the addition produces an Integer, therefore
numbers.x and numbers.y must also be
Integers
therefore numbers is a record with two fields,
x and y, both of which are
Integers
… or using Grace syntax, the inferred type of numbers
is: { x: Integer, y: Integer }
therefore the output of the prompt command must have
the same type
… and then Grace generates a JSON schema for the prompt which looks
like this:
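{
  "type": "object",
  "properties": {
    "x": { "type": "integer" },
    "y": { "type": "integer" }
  },
  "required": [ "x", "y" ]
}

(A sketch of the generated schema; the exact JSON Schema Grace emits may differ in details.)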
Of course, you can specify types if you want (and they’re
more lightweight than schemas in traditional prompt chaining
frameworks). For example:
$ grace repl
>>> prompt{ text: "Give me a first and last name" } : { first: Text, last: Text }
{ "first": "Emily", "last": "Johnson" }
>>> prompt{ text: "Give me a list of names" } : List Text
["Alice","Bob","Charlie","Diana","Ethan","Fiona","George","Hannah","Isaac","Jack"]
However in our original example we don’t need to specify intermediate
types because when the type-checker sees this code:
let structure = prompt
      { model: "gpt-4o"
      , text: lines
          [ "Plan the structure of a new poem based on this idea:"
          , ""
          , idea
          , ""
          , "Describe its form, stanza count, lines per stanza, and rhyme scheme."
          ]
      }

let renderedStructure = lines
      [ "- Form: " + structure.form
      , "- Stanzas: " + Real/show (structure.stanzaCount : Integer)
      , "- Lines per stanza: " + Real/show (structure.linesPerStanza : Integer)
      , "- Rhyme scheme: " + structure.rhymeScheme
      ]
… the compiler can reason backwards from how the
structure value is used to infer that the JSON schema for
the prompt needs to be:
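{
  "type": "object",
  "properties": {
    "form": { "type": "string" },
    "stanzaCount": { "type": "integer" },
    "linesPerStanza": { "type": "integer" },
    "rhymeScheme": { "type": "string" }
  },
  "required": [ "form", "stanzaCount", "linesPerStanza", "rhymeScheme" ]
}

(Again, a sketch of the plausible generated schema.)

The next example, tools.ffg, maps a call function over a prompt’s output. A rough sketch of what such a program might look like — the merge-style union handling and the field names here are my assumptions, not the actual tools.ffg source:

let call = merge
      { HttpRequest: \x -> "curl " + x.url
      , ShellCommand: \x -> x.command
      }

in  List/map call (prompt { text: "Call some tools" })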
This doesn’t actually run any tools (I haven’t added any
callable tools to my work-in-progress branch yet), but just renders the
tool use as a string for now:
$ grace interpret ./tools.ffg
["curl https://api.example.com/data","ls -l -a"]
However, the idea is that you can model a tool as a sum type with one
constructor per callable tool, and in the above example the type checker
infers that the sum type representing one tool call is:
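< HttpRequest: …, ShellCommand: … >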
… but since we List/map the call function
over the output of the prompt the type checker infers that
the prompt needs to generate a List of tool
calls:
prompt { text: "Call some tools" } : List < HttpRequest: …, ShellCommand: … >
… and then Grace does some magic under the hood to convert that type
to the equivalent JSON schema.
What’s particularly neat about this example is that the prompt is so
incredibly bare (“Call some tools”) because all the information the
model needs is present in the schema.
Schema-driven prompting
We can explore this idea of using the schema to drive the prompt
instead of prose using an example like this:
prompt{ text:"Generate some characters for a story", model:"gpt-4o" }:List { "The character's name":Text , "The most memorable thing about the character":Text , "The character's personal arc":Text }
[{"The character's name":"Aveline Thatcher","The character's personal arc":"Aveline starts as a skeptical journalist who doubts the stories of mythical creatures. Over time, she becomes a firm believer, risking her career to uncover the truth and protect these creatures.","The most memorable thing about the character":"The intricate tattoo of a phoenix on her forearm that seems to glow when she discovers hidden truths."},{"The character's name":"Kelan Frost","The character's personal arc":"A former rogue alchemist who turns hero after he inadvertently creates a dangerous substance. Driven by guilt, Kelan seeks redemption by finding an antidote and saving his village.","The most memorable thing about the character":"His iridescent blue eyes that seem to see into one's soul, a side effect of his alchemical experiments."},{"The character's name":"Luciana Blair","The character's personal arc":"Luciana is a reclusive artist who initially fears the world outside her home. After a mysterious vision rejuvenates her, she sets out on a journey of self-discovery, ultimately finding both her voice and courage.","The most memorable thing about the character":"Her ability to paint scenes before they happen, which she attributes to the visions she sees in her dreams."},{"The character's name":"Ezra Hartman","The character's personal arc":"Once a charismatic but self-centered lawyer, Ezra is confronted with a moral crisis that forces him to reevaluate his values. He chooses a path of integrity, becoming an advocate for justice.","The most memorable thing about the character":"His perfectly tailored suits that slowly become more casual, symbolizing his transformation and shifting priorities."},{"The character's name":"Seraphine Mora","The character's personal arc":"Seraphine is a young music prodigy who loses her hearing after an accident. Battling despair, she learns to embrace a new way of 'hearing' music through vibrations and her other senses.","The most memorable thing about the character":"The ethereal way she 'dances' with the music, using her entire body to express each note's emotion."}]
Grace is a superset of JSON and since JSON supports arbitrary field
names so does Grace! Field names in Grace support arbitrary
capitalization, punctuation, and whitespace as long as you quote them,
and we can use the field names to “smuggle” the description of each
field into the schema.
Conclusion
Hopefully this gives you some idea of why I’ve begun to think of
prompt chaining as a programming languages problem. Type inference is
just the beginning and I think it is possible to use a domain-specific
programming language not just to simplify the code but to ultimately
unlock greater reasoning power.
I’m going to continue to use Grace as a research vehicle for prompt
chaining but my LLM-enabled branch
of Grace (like Grace itself) is not really intended to be used in
production and I created it mainly as a proof-of-concept for where I’d
like prompt chaining frameworks to go. If I do end up eventually
productionizing this research I will create a proper fork with its own
name and the whole works.
Given the coordinates of the three vertices of a triangle, can we find
the area? Yes. If by no other method, we can use the Pythagorean
theorem to find the lengths of the edges, and then
Heron's formula to compute the area from
that.
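(For reference: Heron's formula gives the area of a triangle with side lengths $a$, $b$, $c$ as $\sqrt{s(s-a)(s-b)(s-c)}$, where $s = \frac{a+b+c}{2}$ is the semiperimeter.)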
Now, given the coordinates of the four vertices of a quadrilateral,
can we find the area? And the answer is, no, there is no method to do
that, because there is not enough information:
These three quadrilaterals have the same vertices, but different
areas. Just knowing the vertices is not enough; you also need their order.
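(Given the order, the shoelace formula does the job: for vertices $(x_1, y_1), \ldots, (x_4, y_4)$ taken in cyclic order, $\mathrm{Area} = \frac{1}{2}\left|\sum_{i=1}^{4}(x_i y_{i+1} - x_{i+1} y_i)\right|$ with indices taken mod 4. Reordering the same four points changes the value.)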
I suppose one could abstract this: Let $A$ be the function that maps
the set of vertices to the area of the quadrilateral. Can we
calculate values of $A$? No, because there is no such $A$; it is
not well-defined.
Put that way it seems less interesting. It's just another example of
the principle that, just because you put together a plausible sounding
description of some object, you cannot infer that such an object must
exist. One of the all-time pop hits here is:
Let $x$ be the smallest [real / rational] number strictly greater than …
which appears on Math SE quite frequently. Another one I remember is
someone who asked about
the volume of a polyhedron with exactly five faces, all triangles. This
is a fallacy at the ontological level, not the mathematical
level, so when it comes up I try to demonstrate it with a
nonmathematical counterexample, usually something like “the largest
purple hat in my closet” or perhaps “the current Crown Prince of the
Ottoman Empire”. The latter is less good because it relies on the
other person to know obscure stuff about the Ottoman Empire, whatever
that is.
This is unfortunately also the error in Anselm’s so-called
“ontological proof of God”. A philosophically-minded friend of mine
once remarked that being known for the discovery of the ontological
proof of God is like being known for the discovery that you can wipe
your ass with your hand.
Anyway, I'm digressing. The interesting part of the quadrilateral
thing, to me, is not so much that $A$ doesn't exist, but the specific
reasoning that demonstrates that it can't exist. I think there are
more examples of this proof strategy, where we prove nonexistence
by showing there is not enough information for the thing to exist, but
I haven't thought about it enough to come up with one.
There is a proof, the so-called
“information-theoretic proof”,
that a comparison sorting algorithm takes at least $O(n \log n)$ time, based
on comparing the amount of information gathered from the comparisons
(one bit each) with that required to distinguish all $n!$ possible
permutations ($\log_2 n! \approx n \log_2 n$ bits total). I'm not sure
that's what I'm looking for here. But I'm also not sure it isn't, or
why I feel it might be different.
Addenda
20250430
Carl Muckenhoupt suggests that logical independence proofs are of the
same sort. He says, for example:
Is there a way to prove the parallel postulate from Euclid's other
axioms? No, there is not enough information. Here are two geometric
models that produce different results.
This is just the sort of thing I was looking for.
20250503
Rik Signes has allowed me to reveal that he was the source of the
memorable disparagement of Anselm's dumbass argument.
In January, we presented our work on Explicit Level Imports at the
Trends in Functional Programming symposium.
We’re pleased to announce that the paper was awarded the John McCarthy Prize for best paper overall!
The paper introduces the ExplicitLevelImports extension to GHC, which gives
programmers fine-grained control over which modules and dependencies are
required by Template Haskell. For instance, in the following example, the
splice import tells the compiler that Control.Lens.TH is needed only at compile
time and not at runtime:
import splice Control.Lens.TH (makeLenses)
import App (S)
data T = MkT { foo :: S }
$(makeLenses ''T)
By taking advantage of this extra information, the compiler can perform less work
in certain situations. In one benchmark, we modified pandoc to use ExplicitLevelImports and
compilation time was halved when using -fno-code. For full details, read the paper
or check out
GHC Proposal #682.
The implementation has landed in
GHC MR !14241,
so ExplicitLevelImports will be available in the next major release of GHC (9.14).
Explicit Level Imports
Matthew Pickering, Rodrigo Mesquita, Adam Gundry
TFP 2025 (PDF) (Awarded John McCarthy Prize for best paper overall)
Abstract. Cross-stage persistence rules are commonly admitted in multi-
stage programming languages. These rules codify the assumption that all
module and package dependencies are available at all stages. However,
in practice, only a small number of dependencies may be needed at each
particular stage.
This paper introduces Explicit Level Imports, a mechanism which gives
programmers precise control about which dependencies are required at
each stage. Imports are annotated with a modifier which brings identifiers
into scope at a specific level. This precision means it is straightforward
for the compiler to work out what is exactly needed at each stage, and
only provide that. The result is faster compilation times and the potential
for improved cross-compilation support.
We have implemented these ideas in GHC Haskell, consider a wide variety
of practical considerations in the design, and finally demonstrate that the
feature solves a real-world issue in a pragmatic way.
Google have stopped supporting the Chart API so all of the mathematics notation below is missing. There is a PDF version of this article at GitHub.
There are many introductions to the Expectation-Maximisation algorithm.
Unfortunately every one I could find uses arbitrary seeming tricks that seem to be plucked out of a hat by magic.
They can all be justified in retrospect, but I find it more useful to learn from reusable techniques that you can apply to further problems.
Examples of tricks I've seen used are:
Using Jensen's inequality. It's easy to find inequalities that apply in any situation. But there are often many ways to apply them. Why apply it to this way of writing this expression and not that one which is equal?
Substituting $1 = \frac{q(z)}{q(z)}$ (for some arbitrarily chosen distribution $q$) in the middle of an expression. Again, you can insert a $1$ just about anywhere. Why choose this $1$ at this time? Similarly I found derivations that insert a $\sum_z q(z) = 1$ into an expression.
Majorisation-Minimisation. This is a great technique, but involves choosing a function that majorises another. There are so many ways to do this, it's hard to imagine any general purpose method that tells you how to narrow down the choice.
My goal is to fill in the details of one key step in the derivation of the EM algorithm in a way that makes it inevitable rather than arbitrary.
There's nothing original here, I'm merely expanding on a stackexchange answer.
Generalities about EM
The EM algorithm seeks to construct a maximum likelihood estimator (MLE) with a twist: there are some variables in the system that we can't observe.
First assume no hidden variables.
We assume there is a vector of parameters $\theta$ that defines some model.
We make some observations $x$.
We have a probability density $p(x|\theta)$ that depends on $\theta$.
The likelihood of $\theta$ given the observations $x$ is $p(x|\theta)$.
The maximum likelihood estimator for $\theta$ is the choice of $\theta$ that maximises $p(x|\theta)$ for the $x$ we have observed.
Now suppose there are also some variables $z$ that we didn't get to observe.
We assume a density $p(x, z|\theta)$.
We now have
$$p(x|\theta) = \sum_z p(x, z|\theta)$$
where we sum over all possible values of $z$.
The MLE approach says we now need to maximise
$$\sum_z p(x, z|\theta).$$
One of the things that is a challenge here is that the components of $\theta$ might be mixed up among the terms in the sum.
If, instead, each term only referred to its own unique block of $\theta$, then the maximisation would be easier as we could maximise each term independently of the others.
Here's how we might move in that direction.
Consider instead the log-likelihood
$$\log \sum_z p(x, z|\theta).$$
Now imagine that by magic we could commute the logarithm with the sum.
We'd need to maximise
$$\sum_z \log p(x, z|\theta).$$
One reason this would be to our advantage is that $p(x, z|\theta)$ often takes the form $\exp(s(x, z, \theta))$ where $s$ is a simple function to optimise.
In addition, $s$ may break up as a sum of terms, each with its own block of $\theta$'s.
Moving the logarithm inside the sum would give us something we could easily maximise term by term.
What's more, the $p(x, z|\theta)$ for each $z$ is often a standard probability distribution whose likelihood we already know how to maximise.
But, of course, we can't just move that logarithm in.
Maximisation by proxy
Sometimes a function is too hard to optimise directly.
But if we have a guess for an optimum, we can replace our function with a proxy function that approximates it in the neighbourhood of our guess and optimise that instead.
That will give us a new guess and we can continue from there.
This is the basis of gradient descent.
Suppose $f$ is a differentiable function in a neighbourhood of $x_0$.
Then around $x_0$ we have
$$f(x) \approx f(x_0) + \nabla f(x_0) \cdot (x - x_0).$$
We can try optimising the right hand side with respect to $x$ within a neighbourhood of $x_0$.
If we pick a small circular neighbourhood then the optimal value will be in the direction of steepest descent.
(Note that picking a circular neighbourhood is itself a somewhat arbitrary step,
but that's another story.)
For gradient descent we're choosing this linear proxy because it matches both the value and derivatives of $f$ at $x_0$.
We could go further and optimise a proxy that shares second derivatives too, and that leads to methods based on Newton-Raphson iteration.
We want our logarithm of a sum to be a sum of logarithms.
But instead we'll settle for a proxy function that is a sum of logarithms.
We'll make the derivatives of the proxy match those of the original function
precisely so we're not making an arbitrary choice.
Write
$$F(\theta) = \log \sum_z p(x, z|\theta)$$
for the function we want to maximise, and for our proxy, a sum of logarithms,
$$G(\theta) = \sum_z a_z \log p(x, z|\theta).$$
The $a_z$ are constants we'll determine.
We want to match the derivatives on either side of the approximation at our current guess $\theta_0$:
$$\frac{\partial F}{\partial \theta}\bigg|_{\theta_0} = \frac{\sum_z \frac{\partial p}{\partial \theta}(x, z|\theta_0)}{\sum_z p(x, z|\theta_0)}.$$
On the other hand we have
$$\frac{\partial G}{\partial \theta}\bigg|_{\theta_0} = \sum_z \frac{a_z}{p(x, z|\theta_0)} \frac{\partial p}{\partial \theta}(x, z|\theta_0).$$
To achieve equality we want to make these expressions match.
We choose
$$a_z = \frac{p(x, z|\theta_0)}{\sum_{z'} p(x, z'|\theta_0)} = p(z|x, \theta_0).$$
Our desired proxy function is:
$$G(\theta) = \sum_z p(z|x, \theta_0) \log p(x, z|\theta).$$
So the procedure is to take an estimated $\theta_0$ and obtain a new estimate $\theta_1$
by optimising this proxy function with respect to $\theta$.
This is the standard EM algorithm.
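To make the procedure concrete, here is a small self-contained sketch (my own toy example, not from the original text): EM for a mixture of two biased coins, where each observation is the number of heads in 10 flips and the hidden $z$ is which coin was flipped. The weights are the $p(z|x, \theta_0)$ above, and maximising the proxy has a closed form:

-- Probability of k heads in `flips` tosses of a coin with head-probability p.
flips :: Int
flips = 10

binomPMF :: Int -> Double -> Double
binomPMF k p =
  fromIntegral (product [flips - k + 1 .. flips] `div` product [1 .. k])
    * p ^ k * (1 - p) ^ (flips - k)

-- One EM step for parameters (q, p1, p2), where q is the probability of
-- picking coin 1.  `weight` computes the posterior p(z = 1 | k, theta0);
-- maximising the proxy then gives weighted frequency estimates.
emStep :: [Int] -> (Double, Double, Double) -> (Double, Double, Double)
emStep ks (q, p1, p2) = (q', p1', p2')
  where
    weight k = let a = q       * binomPMF k p1
                   b = (1 - q) * binomPMF k p2
               in a / (a + b)
    ws  = map weight ks
    hs  = map fromIntegral ks
    n   = fromIntegral (length ks)
    q'  = sum ws / n
    p1' = sum (zipWith (*) ws hs) / (fromIntegral flips * sum ws)
    p2' = sum (zipWith (*) (map (1 -) ws) hs)
        / (fromIntegral flips * sum (map (1 -) ws))

main :: IO ()
main = print (iterate (emStep [9, 8, 9, 1, 2, 8, 0, 2]) (0.5, 0.6, 0.4) !! 50)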
It turns out that this proxy has some other useful properties.
For example, because of the concavity of the logarithm,
the proxy is always smaller than the original likelihood.
This means that when we optimise it we never optimise “too far”
and that progress optimising the proxy is always progress optimising the
original likelihood.
But I don't need to say anything about this as it's all part of the standard literature.
Afterword
As a side effect we have a general purpose optimisation algorithm that has nothing to do with statistics. If your goal is to compute
$$\operatorname{argmax}_\theta \log \sum_z f(z, \theta)$$
you can iterate, at each step computing
$$\theta_{n+1} = \operatorname{argmax}_\theta \sum_z \frac{f(z, \theta_n)}{\sum_{z'} f(z', \theta_n)} \log f(z, \theta)$$
where $\theta_n$ is the previous iteration.
If the $f(z, \theta)$ take a convenient form then this may turn out to be much easier.
Note
This was originally written as a PDF using LaTeX. It'll be available here for a while. Some fidelity was lost when converting it to HTML.
Google have stopped supporting the Chart API so all of the mathematics notation below is missing. There is a PDF version of this article at GitHub.
Preface
Functional programming encourages us to program without mutable state.
Instead we compose functions that can be viewed as state transformers.
It's a change of perspective that can have a big impact on how we reason about our code.
But it's also a change of perspective that can be useful in mathematics and I'd like to give an example: a really beautiful technique that alows you to sample from the infinite limit of a probability distribution without needing an infinite number of operations.
(Unless you're infinitely unlucky!)
Markov Chains
A Markov chain is a sequence of random states where each state is drawn from a random distribution that possibly depends on the previous state, but not on any earlier state.
So it is a sequence $X_0, X_1, X_2, \ldots$ such that $P(X_{n+1} = x \mid X_0, \ldots, X_n) = P(X_{n+1} = x \mid X_n)$ for all $n$.
A basic example might be a model of the weather in which each day is either sunny or rainy but where it's more likely to be rainy (or sunny) if the previous day was rainy (or sunny).
(And to be technically correct: having information about two days or earlier doesn't help us if we know yesterday's weather.)
Like imperative code, this description is stateful.
The state at step $n+1$ depends on the state at step $n$.
Probability is often easier to reason about when we work with independent identically drawn random variables and our $X_i$ aren't of this type.
But we can eliminate the state from our description using the same method used by functional programmers.
Let's choose a Markov chain to play with.
I'll pick one with 3 states called , and and with transition probabilities given by
where
Here's a diagram illustrating our states:
Implementation
First some imports:
> {-# LANGUAGE LambdaCase #-}
> {-# LANGUAGE TypeApplications #-}
> import Control.Monad.State
> import System.Random

> -- `replicateA` (used below) isn't in base, so we define it here; it is
> -- just `replicateM` with an Applicative constraint, to emphasise that the
> -- draws are independent (see the remark about Applicative further down).
> replicateA :: Applicative m => Int -> m a -> m [a]
> replicateA n m = sequenceA (replicate n m)
> data ABC = A | B | C deriving (Eq, Show, Ord, Enum, Bounded)
We are now in a position to simulate our Markov chain.
First we need some random numbers drawn uniformly from [0, 1]:
> uniform :: (RandomGen gen, MonadState gen m) => m Double
> uniform = state random
And now the code to take a single step in the Markov chain:
> step :: (RandomGen gen, MonadState gen m) => ABC -> m ABC
> step A = do
> a <- uniform
> if a < 0.5
> then return A
> else return B
> step B = do
> a <- uniform
> if a < 1/3.0
> then return A
> else if a < 2/3.0
> then return B
> else return C
> step C = do
> a <- uniform
> if a < 0.5
> then return B
> else return C
Notice how the step function generates a new state at random in a way that depends on the previous state.
The m ABC in the type signature makes it clear that we are generating random states at each step.
We can simulate the effect of taking $n$ steps with a function like this:
> steps :: (RandomGen gen, MonadState gen m) => Int -> ABC -> m ABC
> steps 0 i = return i
> steps n i = do
> i <- steps (n-1) i
> step i
We can run for 3 steps, starting with $A$, with a line like so:
*Main> evalState (steps 3 A) gen
B
The starting state of our random number generator is given by gen.
Consider the distribution of states after taking $n$ steps.
For Markov chains of this type, we know that as $n$ goes to infinity the distribution of the $n$th state approaches a limiting "stationary" distribution.
There are frequently times when we want to sample from this final distribution.
For a Markov chain as simple as this example, you can solve exactly to find the limiting distribution.
But for real world problems this can be intractable.
Instead, a popular solution is to pick a large $n$ and hope it's large enough.
As $n$ gets larger the distribution gets closer to the limiting distribution.
And that's the problem I want to solve here - sampling from the limit.
It turns out that by thinking about random functions instead of random states we can actually sample from the limiting distribution exactly.
Some random functions
Here is a new version of our random step function:
> step' :: (RandomGen gen, MonadState gen m) => m (ABC -> ABC)
> step' = do
> a <- uniform
> return $ \case
> A -> if a < 0.5 then A else B
> B -> if a < 1/3.0
> then A
> else if a < 2/3.0 then B else C
> C -> if a < 0.5 then B else C
In many ways it's similar to the previous one.
But there's one very big difference: the type signature m (ABC -> ABC) tells us that it's returning a random function, not a random state.
We can simulate the result of taking 10 steps, say, by drawing 10 random functions, composing them, and applying the result to our initial state:
> steps' :: (RandomGen gen, MonadState gen m) => Int -> m (ABC -> ABC)
> steps' n = do
> fs <- replicateA n step'
> return $ foldr (flip (.)) id fs
Notice the use of flip.
We want to compose functions $f_n \circ f_{n-1} \circ \cdots \circ f_1$, each time composing on the left by the new $f_n$.
This means that for a fixed seed gen, each time you increase $n$ by 1 you get the next step in a single simulation:
(BTW I used replicateA instead of replicateM to indicate that these are independent random draws.
It may be well known that you can use Applicative instead of Monad to indicate independence but I haven't seen it written down.)
*Main> [f A | n <- [0..10], let f = evalState (steps' n) gen]
[A,A,A,B,C,B,A,B,A,B,C]
When I first implemented this I accidentally forgot the flip.
So maybe you're wondering what effect removing the flip has?
The effect is about as close to a miracle as I've seen in mathematics.
It allows us to sample from the limiting distribution in a finite number of steps!
Here's the code:
> steps_from_past :: (RandomGen gen, MonadState gen m) => Int -> m (ABC -> ABC)
> steps_from_past n = do
> fs <- replicateA n step'
> return $ foldr (.) id fs
We end up building $f_1 \circ f_2 \circ \cdots \circ f_n$.
This is still a composition of independent identically distributed functions and so it's still drawing from exactly the same distribution as steps'.
Nonetheless, there is a difference: for a particular choice of seed, steps_from_past n no longer gives us a sequence of states from a Markov chain.
Running with argument $n$ draws a random composition of $n$ functions.
But if you increase $n$ by 1 you don't add a new step at the end.
Instead you effectively restart the Markov chain with a new first step generated by a new random seed.
Try it and see:
*Main> [f A | n <- [0..10], let f = evalState (steps_from_past n) gen]
[A,A,A,A,A,A,A,A,A,A,A]
Maybe that's surprising.
It seems to get stuck in one state.
In fact, we can try applying the resulting function to all three states.
*Main> [fmap f [A, B, C] | n <- [0..10], let f = evalState (steps_from_past n) gen]
[[A,B,C],[A,A,B],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A]]
In other words, for large enough $n$ we get the constant function.
Think of it this way:
If f isn't injective then it's possible that two states get collapsed to the same state.
If you keep picking random f's it's inevitable that you will eventually collapse down to the point where all arguments get mapped to the same state.
Once this happens, we'll get the same result no matter how large we take $n$.
If we can detect this then we've found the limit of $f_1 \circ f_2 \circ \cdots \circ f_n$ as $n$ goes to infinity.
But because we know composing forwards and composing backwards lead to draws from the same distribution, the limiting backward composition must actually be a draw from the same distribution as the limiting forward composition.
That flip can't change what probability distribution we're drawing from - just the dependence on the seed.
So the value the constant function takes is actually a draw from the limiting stationary distribution.
We can code this up:
> all_equal :: (Eq a) => [a] -> Bool
> all_equal [] = True
> all_equal [_] = True
> all_equal (a : as) = all (== a) as
> test_constant :: (Bounded a, Enum a, Eq a) => (a -> a) -> Bool
> test_constant f =
> all_equal $ map f $ enumFromTo minBound maxBound
This technique is called coupling from the past.
It's "coupling" because we've arranged that different starting points coalesce.
And it's "from the past" because we're essentially answering the question of what the outcome of a simulation would be if we started infinitely far in the past.
> couple_from_past :: (RandomGen gen, MonadState gen m, Enum a, Bounded a, Eq a) =>
> m (a -> a) -> (a -> a) -> m (a -> a)
> couple_from_past step f = do
> if test_constant f
> then return f
> else do
> f' <- step
> couple_from_past step (f . f')
We can now sample from the limiting distribution a million times, say:
*Main> let samples = map ($ A) $ evalState (replicateA 1000000 (couple_from_past step' id)) gen
We can now count how often A appears:
*Main> fromIntegral (length $ filter (== A) samples)/1000000
0.285748
That's a pretty good approximation to $2/7 \approx 0.2857$, the exact answer that can be found by finding the eigenvector of the transition matrix corresponding to an eigenvalue of 1.
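As a quick cross-check (my addition, not part of the original post), we can also find that answer by iterating the transition matrix from an arbitrary starting distribution:

> -- Repeatedly apply the transition matrix T (rows as in `step` above)
> -- to a starting distribution; this converges to (2/7, 3/7, 2/7).
> stationary :: [Double]
> stationary = iterate next [1, 0, 0] !! 1000
>   where
>     next [a, b, c] = [a/2 + b/3, a/2 + b/3 + c/2, b/3 + c/2]
>     next _ = error "expected three states"

Evaluating stationary gives approximately [0.2857, 0.4286, 0.2857].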
> gen = mkStdGen 669
Notes
The technique of coupling from the past first appeared in a paper by Propp and Wilson.
The paper Iterated Random Functions by Persi Diaconis gave me a lot of insight into it.
Note that the code above is absolutely not how you'd implement this for real.
I wrote the code that way so that I could switch algorithm with the simple removal of a flip.
In fact, with some clever tricks you can make this method work with state spaces so large that you couldn't possibly hope to enumerate all starting states to detect if convergence has occurred.
Or even with uncountably large state spaces.
But I'll let you read the Propp-Wilson paper to find out how.
In those articles I showed how you could build up the Clifford algebras like so:
type Cliff1 = Complex R
type Cliff1' = Split R
type Cliff2 = Quaternion R
type Cliff2' = Matrix R
type Cliff3 = Quaternion Cliff1'
type Cliff3' = Matrix Cliff1
type Cliff4 = Quaternion Cliff2'
type Cliff4' = Matrix Cliff2
type Cliff5 = Quaternion Cliff3'
...
I used CliffN as the Clifford algebra for a negative definite inner product and
CliffN' for the positive definite case.
It's not a completely uniform sequence in the sense that CliffN is built from CliffN' for dimension two lower and you use a mix of Matrix and Quaternion.
The core principle making this work is that for type constructors implemented like Matrix, Quaternion etc. composition corresponds to the tensor product,
eg. Matrix (Quaternion Float) is effectively the same thing as Matrix Float ⊗ Quaternion Float.
But John Baez pointed out to me that you can build up the CliffN algebras much more simply enabling us to use these definitions:
> type Cliff1 = Complex Float
> type Cliff2 = Complex Cliff1
> type Cliff3 = Complex Cliff2
> type Cliff4 = Complex Cliff3
> type Cliff5 = Complex Cliff4
...
Or even better:
> type family Cliff (n :: Nat) :: * where
> Cliff 0 = Float
> Cliff n = Complex (Cliff (n - 1))
But there's one little catch.
We have to work, not with the tensor product, but the super tensor product.
We define Complex the same way as before:
> data Complex a = C a a deriving (Eq, Show)
Previously we used a definition of multiplication like this:
instance Num a => Num (Complex a) where
C a b * C c d = C (a * c - b * d) (a * d + b * c)
We can think of C a b in Complex R as representing the element $a + ib$.
The definition of multiplication in a tensor product of algebras is defined by $(a \otimes b)(c \otimes d) = ac \otimes bd$.
So we have $(a + ib)(c + id) = (ac - bd) + i(ad + bc)$.
This means that line of code we wrote above defining * for Complex isn't simply a definition of multiplication of complex numbers, it says how to multiply in an algebra tensored with the complex numbers.
Let's go Super!
A superalgebra is an algebra graded by $\mathbb{Z}_2$, where $\mathbb{Z}_2$ is the ring of integers modulo 2.
What that means is that we have some algebra $A$ that can be broken down as a direct sum $A = A_0 \oplus A_1$ (the subscripts live in $\mathbb{Z}_2$) with the property that multiplication respects the grading, ie. if $a$ is in $A_i$ and $b$ is in $A_j$ then $ab$ is in $A_{i+j}$.
The elements of $A_0$ are called "even" (or bosonic) and those in $A_1$ "odd" (or fermionic). Often even elements commute with everything and odd elements anticommute with each other but this isn't always the case. (The superalgebra is said to be supercommutative when this happens. This is a common pattern: a thing X becomes a superX if it has odd and even parts and swapping two odd things introduces a sign flip.)
The super tensor product is much like the tensor product but it respects the grading.
This means that if $a$ is in $A_i$ and $b$ is in $B_j$ then $a \otimes b$ is in $(A \otimes B)_{i+j}$.
From now on I'm using $\otimes$ to mean super tensor product.
Multiplication in the super tensor product of two superalgebras $A$ and $B$ is now defined by the following modified rule:
if $b$ is in $B_i$ and $c$ is in $A_j$ then $(a \otimes b)(c \otimes d) = (-1)^{ij}(ac \otimes bd)$.
Note that the sign flip arises when we shuffle an odd $b$ left past an odd $c$.
The neat fact that John pointed out to me is that
$$\mathrm{Cliff}_{n+1} \cong \mathbb{C} \otimes \mathrm{Cliff}_n$$
where $\otimes$ is the super tensor product and $i$ is considered odd.
We have to modify our definition of * to take into account that sign flip.
I initially wrote a whole lot of code to define a superalgebra as a pair of algebras with four multiplication operations and it got a bit messy.
But I noticed that the only specifically superalgebraic operation I ever performed on an element of a superalgebra was negating the odd part of an element.
So I could define SuperAlgebra like so:
class SuperAlgebra a where
conjugation :: a -> a
where conjugation is the negation of the odd part.
(I'm not sure if this operation corresponds to what is usually called conjugation in this branch of mathematics.)
But there's a little efficiency optimization I want to write.
If I used the above definition, then later I'd often find myself computing a whole lot of negates in a row.
This means applying negate to many elements of large algebraic objects even
though any pair of them cancel each other's effect.
So I add a little flag to my conjugation function that is used to say we want an extra negate and we can
accumulate flips of a flag rather than flips of lots of elements.
> class SuperAlgebra a where
> conjugation :: Bool -> a -> a
Here's our first instance:
> instance SuperAlgebra Float where
> conjugation False x = x
> conjugation True x = negate x
This is saying that the conjugation is the identity on Float but if we
want to perform an extra flip we can set the flag to True.
Maybe I should call it conjugationWithOptionalExtraNegation.
And now comes the first bit of non-trivial superalgebra:
> instance (Num a, SuperAlgebra a) => SuperAlgebra (Complex a) where
> conjugation e (C a b) = C (conjugation e a) (conjugation (not e) b)
We consider $1$ to be even and $i$ to be odd. When we apply the conjugation to $a + ib$ then we can just apply it directly to $a$.
But the $i$ flips the "parity" of $b$ (because tensor product respects the grading) so we need to swap even for odd when we use the conjugation on $b$.
And that should explain why conjugation is defined the way it is.
Now we can use the modified multiplication rule defined above:
> instance (Num a, SuperAlgebra a) => Num (Complex a) where
> fromInteger n = C (fromInteger n) 0
> C a b + C a' b' = C (a + a') (b + b')
> C a b * C c d = C (a * c - conjugation False b * d)
> (conjugation False a * d + b * c)
> negate (C a b) = C (negate a) (negate b)
> abs = undefined
> signum = undefined
For example, conjugation False is applied to the first $b$ on the RHS because that $b$ implicitly represents an $ib$ term, and when expanding out the product $(ib)(id)$ we shuffle the (odd) $i$ of $id$ left past $b$. It doesn't get applied to the second $b$ because in $bc$ the $b$ and $c$ remain in the same order.
That's it!
Tests
I'll test it with some examples from Cliff3:
> class HasBasis a where
> e :: Integer -> a
> instance HasBasis Float where
> e = undefined
> instance (Num a, HasBasis a) => HasBasis (Complex a) where
> e 0 = C 0 1
> e n = C (e (n - 1)) 0
> make a b c d e f g h =
> C (C (C a b) (C c d))
> (C (C e f) (C g h))
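As a quick sanity check (my addition, not the post's original tests): in Cliff3 the basis vectors should square to $-1$, distinct basis vectors should anticommute, and multiplication should be associative.

> test1, test2, test3 :: Bool
> test1 = e 0 * e 0 == (-1 :: Cliff3)
> test2 = e 1 * e 2 == negate (e 2 * e 1 :: Cliff3)
> test3 = (e 0 * e 1) * e 2 == e 0 * (e 1 * e 2 :: Cliff3)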
The implementation of multiplication looks remarkably like the Cayley-Dickson construction.
It can't be (because iterating it three times gives you a non-associative algebra but the Clifford algebras are associative).
Nonetheless, I think comparison with Cayley-Dickson may be useful.
Efficiency
As mentioned above, before I realised I just needed the conjugation operation I wrote the above code with an explicit split of a superalgebra into two pieces intertwined by four multiplications.
I think the previous approach may have a big advantage - it may be possible to use variations on the well known "speed-up" of complex multiplication that uses three real multiplications instead of four.
This should lead to a fast implementation of Clifford algebras.
Also be warned: you can kill GHC if you turn on optimization and try to multiply elements of high-dimensional Clifford algebras.
I think it tries to inline absolutely everything and you end up with a block of code that grows exponentially with the dimension.
Note also that this code translates directly into many languages.
With the release of the final module of Solve.hs last week, we now have 7 finished courses available at Monday Morning Haskell Academy. With this many courses, it might be a little challenging to pick the right one.
While we have a little guide on our website to help you pick, I also wanted to make it a bit easier to select the courses for the right level of experience, and also provide some really great deals on our course material.
So last week, we released 3 new course bundles to help you save. The 3 levels are Beginner, Advanced, and Complete.
Beginner Bundle
This bundle is great if you’re just starting out with Haskell, even if you haven’t even installed it or written a line before! The first two courses in this bundle will help you install your toolchain and learn the language fundamentals. Then after that, you’ll learn about some trickier Haskell concepts, like monads and advanced problem solving techniques.
Along the way, you’ll also get the chance to write a couple small projects to build your skills and confidence. With the progression of these courses, you can really go from “Zero Knowledge” to “Confident Haskell User”.
Advanced Bundle
Our Advanced Bundle is for Haskellers who’ve mastered the basics and are trying to learn how to apply Haskell in some more “real-world” settings. The courses are:
In the first three courses, you’ll learn about things like machine learning, writing web servers, deploying applications, and managing complex effect stacks.
Then you’ll see that our newly completed Solve.hs course appears in both bundles. It bridges the gap between basic problem solving skills, like manipulating lists and strings, to more advanced ideas, like implementing data structures from scratch and writing complex algorithms. So even if you’ve got some decent skills already, you’ll definitely still find quite a few challenges in this course!
MMH Complete
Finally, MMH Complete will give you access to our entire library of courses. You’ll get all 7 courses, at a substantial discount! Plus you are guaranteed to receive any new course content we come up with in the future.
Discounts!
Speaking of discounts, here are the discounts you would get for each bundle vs. purchasing each course individually:
Plus, this week, you can get an extra 20% off all courses and bundles using the code SOLVE25. If you want an even bigger discount, you can subscribe to our newsletter. You’ll get monthly updates AND a code for 30% off all products.
So don’t miss out on these offers! Head to the courses page now! Next week, they’ll be going away!
From my perspective, one of the biggest open problems in implementing
programming languages is how to add a type system to the language
without significantly complicating the implementation.
For example, in my tutorial Fall-from-Grace
implementation the type checker logic accounts for over half of the
code. In the following lines of code report I’ve highlighted the modules
responsible for type-checking with a ‡:
That’s 2684 lines of code (≈51%) just for type-checking (and believe
me: I tried very hard to simplify the type-checking code).
This is the reason why programming language implementers will be
pretty keen to just not implement a type-checker for their language, and
that’s how we end up with a proliferation of untyped programming
languages (e.g. Godot or Nix), or ones that end up with a type system
bolted on long after the fact (e.g. TypeScript or Python). You can see
why someone would be pretty tempted to skip implementing a type system
for their language (especially given that it’s an optional language
feature) if it’s going to balloon the size of their codebase.
So I’m extremely keen on implementing a “lean” type checker that has
a high power-to-weight ratio. I also believe that a compact type checker
is an important foundational step for functional programming to “go
viral” and displace imperative programming. This post outlines one
approach to this problem that I’ve been experimenting with1.
Unification
The thing that bloats the size of most type-checking implementations
is the need to track unification variables. These variables are
placeholders for storing as-yet-unknown information about something’s
type.
For example, when a functional programming language infers the type
of something like this Grace expression:
(λx → x) true
… the way it typically works is that it will infer the type of the
function (λx → x) which will be:
λx → x : α → α
… where α is a unification variable (an unsolved type).
So you can read the above type annotation as saying “the type of
λx → x is a function from some unknown input type
(α) to the same output type (α)”.
Then the type checker will infer the type of the function’s input
argument (true) which will be:
true : Bool
… and finally the type checker will combine those two pieces of
information and reason about the final type like this:
the input to the function (true) is a
Bool
therefore the function’s input type (α) must also be
Bool
therefore the function’s output type (α) must also be
Bool
therefore the entire expression’s type is Bool
… which gives the following conclusion of type inference:
(λx → x) true : Bool
However, managing unification variables like α is a lot
trickier than it sounds. There are multiple unification
algorithms/frameworks in the wild but the problem with all of them is
that you have to essentially implement a bespoke logic programming
language (with all of the complexity that entails). Like, geez, I’m
already implementing a programming language and I don’t want to have to
implement a logic programming language on top of that just to power my
type-checker.
So there are a couple of ways I’ve been brainstorming how to address
this problem and one idea I had was: what if we could get rid of
unification variables altogether?
Deleting unification
Alright, so this is the part of the post that requires some
familiarity/experience with implementing a type-checker. If you’re
somebody new to programming language theory then you can still keep
reading but this is where I have to assume some prior knowledge
otherwise this post will get way too long.
The basic idea is that you start from the “Complete and Easy”
bidirectional type checking algorithm which is a type checking
algorithm that does use unification variables2 but
is simpler than most type checking algorithms. The type checking rules
look like this (you can just gloss over them):
Now, delete all the rules involving unification variables. Yes, all
of them. That means that all of the type-checking judgments from Figures
9 and 10 are gone and also quite a few rules from Figure 11 disappear,
too.
Surprisingly, you can still type check a lot of code with what’s
left, but you lose two important type inference features if you do
this:
you can no longer infer the types of lambda arguments
you can no longer automatically instantiate polymorphic
code
… and I’ll dig into those two issues in more detail.
Inferring lambda argument
types
You lose the ability to infer the type of a function like this one
when you drop support for unification variables:
λx → x == False
Normally, a type checker that supports unification can infer that the
above function has type Bool → Bool, but (in general) a
type checker can no longer infer that when you drop unification
variables from the implementation.
This loss is not too bad (in fact, it’s a pretty common
trade-off proposed in the bidirectional type checking literature)
because you can make up for it in a few ways (all of which are easy and
efficient to implement in a type checker):
You can allow the input type to be inferred if the lambda is
given an explicit type annotation, like this:
λx → x == False : Bool → Bool
More generally, you can allow the input type to be inferred if the
lambda is checked against an expected type (and a type annotation is one
case, but not the only case, where a lambda is checked against an
expected type).
We’re going to lean on this pretty heavily because it’s pretty
reasonable to ask users to provide type annotations for function
definitions and also because there are many situations where we can
infer the expected type of a lambda expression from its immediate
context.
You can allow the user to explicitly supply the type of the
argument
… like this:
λ(x : Bool) → x == False
This is how Dhall works,
although it’s not as ergonomic.
You can allow the input type to be inferred if the lambda is
applied to an argument
This is not that interesting, but I’m mentioning it for completeness.
The reason it’s not interesting is because you won’t often see
expressions of the form (λx → e) y in the wild, because
they can more idiomatically be rewritten as
let x = y in e.
Instantiating polymorphic
code
The bigger issue with dropping support for unification variables is:
all user-defined polymorphic functions now require explicit type
abstraction and explicit type application, which is a
major regression in the type system’s user experience.
For example, in a language with unification variables you can write
the polymorphic identity function as:
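let id = λx → x

in  …

… but without unification variables you would (in general) have to write the type abstraction and type application out explicitly, something like:

let id = λ(a : Type) → λ(x : a) → x

in  id Bool true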
Most programmers do NOT want to program in a
language where they have to explicitly manipulate type variables in this
way. In particular, they really hate explicit type application. For
example, nobody wants to write:
map { x : Bool, … large record … } Bool (λr → r.x) rs
So we need to figure out some way to work around this limitation.
The trick
However, there is a solution that I believe gives a high
power-to-weight ratio, which I will refer to as “keyword” type
checking:
add a bunch of built-in functions
Specifically, add enough built-in functions to cover most use cases
where users would need a polymorphic function.
add special type-checking rules for those built-in functions when
they’re fully saturated with all of their arguments
These special-cased type-checking rules would not require unification
variables.
still require explicit type abstraction when these built-in
functions are not fully saturated
Alternatively, you can require that built-in polymorphic functions
are fully saturated with their arguments and make it a parsing error if
they’re not.
still require explicit type abstraction and explicit type
application for all user-defined (i.e. non-builtin) polymorphic
functions
optionally, turn these built-in functions into keywords or
language constructs
I’ll give a concrete example: the map function for
lists. In many functional programming languages this map
function is not a built-in function; rather it’s defined within the host
language as a function of the following type:
map : ∀(a b : Type) → (a → b) → List a → List b
What I’m proposing is that the map function would now
become a built-in function within the language and you would now apply a
special type-checking rule when the map function is fully
saturated:
Γ ⊢ xs ⇒ List a Γ ⊢ f ⇐ a → b
───────────────────────────────
Γ ⊢ map f xs ⇐ List b
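Here is a minimal sketch of what that judgment might look like inside a checker, written in Haskell (every name and data type here is invented for illustration; this is not Grace’s implementation):

-- Types and expressions for a tiny language with a saturated `map` construct.
data Type = TBool | TList Type | TFun Type Type deriving (Eq, Show)

data Expr = Var String | Lam String Expr | MapE Expr Expr

type Ctx = [(String, Type)]

-- Inference mode: without unification variables we can only infer a few forms.
infer :: Ctx -> Expr -> Either String Type
infer ctx (Var x) = maybe (Left ("unbound variable: " ++ x)) Right (lookup x ctx)
infer _ _ = Left "cannot infer; try adding a type annotation"

-- Checking mode: the special-cased rule for `map f xs` checked against List b.
check :: Ctx -> Expr -> Type -> Either String ()
check ctx (MapE f xs) expected =
  case expected of
    TList b -> do
      ta <- infer ctx xs                    -- Γ ⊢ xs ⇒ List a
      case ta of
        TList a -> check ctx f (TFun a b)   -- Γ ⊢ f ⇐ a → b
        _       -> Left "map: second argument must be a list"
    _ -> Left "map: result must be checked against a list type"
check ctx (Lam x body) (TFun a b) = check ((x, a) : ctx) body b
check ctx e expected = do
  actual <- infer ctx e
  if actual == expected then Right () else Left "type mismatch"

Note how the Lam case also recovers lambda-argument inference from the expected type, as discussed earlier; the same saturated-construct pattern extends directly to if/then/else and replicate below.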
In other words, we’re essentially treating the map
built-in function like a “keyword” in our language (when it’s fully
saturated). Just like a keyword, it’s a built-in language feature that
has special type-checking rules. Hell, you could even make it an actual
keyword or language construct (e.g. a list comprehension) instead of a
function call.
I would even argue that you should make each of these special-cased
builtin-functions a keyword or a language construct instead of a
function call (which is why I call this “keyword type checking” in the
first place). When viewed through this lens the restrictions that these
polymorphic built-in functions (A) are saturated with their arguments
and (B) have a special type checking judgment are no different than the
restrictions for ordinary keywords or language constructs (which also
must be saturated with their arguments and also require special type
checking judgments).
To make an analogy, in many functional programming languages the
if/then/else construct has this
same “keyword” status. You typically don’t implement it as a user-space
function of this type:
ifThenElse : ∀(a : Type) → Bool → a → a → a
Rather, you define if as a language construct and you
also add a special type-checking rule for if:
Γ ⊢ b ⇐ Bool Γ ⊢ x ⇒ a Γ ⊢ y ⇐ a
────────────────────────────────────
Γ ⊢ if b then x else y ⇒ a
… and what I’m proposing is essentially greatly exploding the number
of “keywords” in the implementation of the language by turning a whole
bunch of commonly-used polymorphic functions into built-in functions (or
keywords, or language constructs) that are given special type-checking
treatment.
For example, suppose the user were to create a polymorphic function
like this one:
let twice = λ(a : Type) → λ(x : a) → [ x, x ]

in  twice (List Bool) (twice Bool true)
That’s not very ergonomic to define and use, but we also can’t
reasonably expect our programming language to provide a
twice built-in function. However, our language could
provide a generally useful replicate builtin function (like
Haskell’s
replicate function):
replicate : ∀(a : Type) → Natural → a → List a
… with the following type-checking judgment:
Γ ⊢ n ⇐ Natural Γ ⊢ x ⇒ a
───────────────────────────
Γ ⊢ replicate n x ⇒ List a
… and then you would tell the user to use replicate
directly instead of defining their own twice function:
replicate 2 (replicate 2 true)
… and if the user were to ask you “How do I define a
twice synonym for replicate 2” you would just
tell them “Don’t do that. Use replicate 2 directly.”
Conclusion
This approach has the major upside that it’s much easier to implement
a large number of keywords than it is to implement a unification
algorithm, but there are other benefits to doing this, too!
It discourages complexity and fragmentation in user-space
code
Built-in polymorphic functions have an ergonomic advantage over
user-defined polymorphic functions because under this framework type
inference works better for built-in functions. This creates an ergonomic
incentive to stick to the “standard library” of built-in polymorphic
functions, which in turn promotes an opinionated coding style across all
code written in that language.
You might notice that this approach is somewhat similar in spirit to
how Go handles polymorphism which is to say: it doesn’t handle
user-defined polymorphic code well. For example, Go provides a few
built-in language features that support polymorphism (e.g. the
map data structure and for loops) but if users ask for any
sort of user-defined polymorphism then the maintainers tell them they’re
wrong for wanting that. The main difference here is that (unlike Go) we
do actually support user-defined polymorphism; it’s not forbidden, but
it is less ergonomic than sticking to the built-in utilities that
support polymorphism.
It improves error messages
When you special-case the type-checking logic you can also
special-case the error messages, too! With general-purpose unification
the error message can often be a bit divorced from the user’s intent,
but with “keyword type checking” the error message is not only more
local to the problem but it can also suggest highly-specific tips or
fixes appropriate for that built-in function (or keyword or language
construct).
It can in some cases more closely match the expectations of
imperative programmers
What I mean is: most programmers coming from an imperative and typed
background are used to languages where (most of the time) polymorphism
is “supported” via built-in language constructs and keywords and
user-defined polymorphism might be supported but considered “fancy”.
Leaning on polymorphism via keywords and language constructs would
actually make them more comfortable using polymorphism instead of trying
to teach them how to produce and consume user-defined polymorphic
functions.
For example, in a lot of imperative languages the idiomatic solution
for how to do anything with a list is “use a for loop” where you can
think of a for loop as a built-in keyword that supports polymorphic
code. The functional programming equivalent of “just use a for loop”
would be something like “just use a list comprehension” (where a list
comprehension is a “keyword” that supports polymorphic code that we can
give special type checking treatment).
That said, this approach is still more brittle than unification and
will require more type annotations in general. The goal here isn’t to
completely recover the full power of unification but rather to get
something that’s not too bad but significantly easier to
implement.
I think this “keyword type checking” can potentially occupy a “low
tech” point in the type checking design space for functional programming
languages that need to have efficient and compact implementations
(e.g. for ease of embedding). Also, this can potentially provide a
stop-gap solution for novice language implementers that want
some sort of a type system but they’re not willing to commit to
implementing a unification-based type system.
There’s also a variation on this idea which Verity Scheel has been
exploring, which is to provide userland support for defining new
functions with special type-checking rules and there’s a post from her
outlining how to do that:
The other approach is to create
essentially an “ABNF for type checkers” that would let you write
type-checking judgments in a standard format that could generate the
corresponding type-checking code in multiple languages. That’s still a
work-in-progress, though.↩︎
I believe some people might take
issue with calling these unification variables because they consider
bidirectional type checking as a distinct framework from unification.
Moreover, in the original bidirectional type checking paper they’re
called “unsolved” variables rather than unification variables. However,
I feel that for the purpose of this post it’s still morally correct to
refer to these unsolved variables as unification variables since their
usage and complexity tradeoffs are essentially identical to unification
variables in traditional unification algorithms.↩︎
Welcome to Philadelphia! We have a lot of political corruption here.
I recently wrote about the unusually corrupt Philadelphia Traffic Court,
where four of the judges went to the federal pokey, and the state
decided there was no way to clean it up, they had to step on it like a
cockroach. I ended by saying:
One of those traffic court judges was Willie Singletary, who I've
been planning to write about since 2019. But he is a hard worker who
deserves better than to be stuck in an epilogue, so I'll try to get
to him later this month.
This is that article from 2019, come to fruit at last. It was
originally inspired by this notice that appeared at my polling place on
election day that year:
Willie Singletary, candidate for Democratic Council At-Large,
has been removed from the Primary Ballot by Court Order.
Although his name appears on the ballot, votes for this
candidate will not be counted because he was convicted of two
Class E felonies by the United States District Court for the
Eastern District of Pennsylvania, which bars his candidacy under
Article 2, Section 7 of the Pennsylvania Constitution.
That's because Singletary had been one of those traffic court judges.
In 2014 he had been convicted of lying to the FBI in connection with
that case, and was sentenced to 20 months in federal prison; I think
he actually served 12.
That didn't stop Willie from trying to run for City Council, though,
and the challenge to his candidacy didn't wrap up before the ballots
were printed, so they had to post these notices.
Even before the bribery scandal and the federal conviction,
Singletary had already lost his Traffic Court job when it transpired
that he had shown dick pics to a Traffic Court cashier.
Before that, when he was campaigning for the Traffic Court job, he was
caught on video promising to give favorable treatment to campaign donors.
But Willie's enterprise and go-get-it attitude means he can't be kept
down for long. Willie rises to all challenges! He is now enjoying a
$90,000 annual salary as a Deputy Director of Community Partnerships
in the administration of Philadelphia Mayor Cherelle Parker. Parker's
spokesperson says
"The Parker administration supports every person’s right to a second chance in society.”
I think he might be on his fourth or fifth chance by now, but who's
counting? Let it never be said that Willie Singletary was a quitter.
Lorrie once made a remark that will live in my memory forever, about
the "West Philadelphia local politics-to-prison pipeline”. Mayor
Parker is such a visionary that she has been able to establish a
second pipeline in the opposite direction!
Addendum 20250501
I don't know how this happened, but when I committed the final version
of this article a few days ago, the commit message that my fingers
typed was:
Date: Sat Apr 26 14:24:19 2025 -0400
Willie Wingletsray finally ready to go
Since I was a child, I’ve been playing the French
Horn. I still play, and I
take it quite seriously. I’m lucky enough to play with some good
ensembles, and I perform many concerts each year.
When learning difficult music, I often practise with a
metronome. A metronome is a
device that clicks or beeps regularly. You can set how often it
clicks; for example you might set it to click 80 times a minute. The
tempo (or speed) of a piece of music is often specified by the
composer telling you how many beats per minute they want. This is
guidance and not sacrosanct: you don’t normally have to play at
exactly this tempo, and frequently music requires some implicit
variation of tempo for it to be successful. But it certainly is an
important piece of information from the composer, telling you at what
basic speed the piece should be going.
The problem with metronomes is that they can’t change their speed by
themselves. For complex music, the speed could be changing a lot, or,
maybe the number of beats per bar changes. This is annoying because it
means for some pieces of music you have to keep stopping playing,
readjust the metronome, and then continue on. There are also gradual
changes in tempo in music: a part of a piece might gently get faster
or slower. No metronome can cope with this: essentially, metronomes
know nothing about the piece of music you’re playing.
The tool I built does nearly all the musical things you would want it
to. As this is mainly a technical blog though, here I’ll focus on the
technical aspects.
It could have been an app; I’ve built apps before. But the faff of
getting onto the Play Store or the App Store is just not worth
it. The development tools are heavyweight and annoying. Sending out
new versions requires approval processes, and you have to convince
people to install something before they can use it. So I wanted this
to be browser based. Also, modern web browsers are pretty amazing –
great features and well designed APIs. Yep, all the old APIs are
terrible and awful to work with, but everything that’s at all recent
is pretty great.
Drawing music in a browser is quite a challenge though. The way I’m
doing it is I’m building an SVG, client-side. This was the first thing
I started on: trying to figure out how to draw music in a browser, how
to be able to detect clicks, and make it all interactive. The client
side code is all generated from
TypeScript using the plain tsc to do
the translation to JavaScript. I can’t stand complex tool-chains, and
modern browsers are absolutely fine with loading modules (and you can
do some really nice things with import
maps
as we’ll see). I’m not even minimising the JavaScript: I’ve written
the server myself; the modules are sent over the wire gzipped and I
have correct cache-control settings using immutable and
“cache-busting”,
so minimising the source just makes debugging life harder for no real
gain.
A score is essentially a list of blocks. I’m using a
CRDT
(the fugue list
CRDT) to
allow local-first editing (and even offline editing). Dirty blocks get
sent over a websocket and stored on the server, using LMDB which is
all very normal for me.
The server has a neat part of its design: when you compile the server,
all the static assets are embedded into the binary, thus making it a
single self-contained executable. Now those assets (HTML, CSS, images,
JavaScript etc) are just normally named files, but they can also be Go
templates. When the server starts up, it works through these static
assets, building HTTP routes for them. The HTTP routes contain in
their paths the hashcode of the file – this is necessary for the
cache
busting. If
the asset is a template, the server knows how to run the template, and
critically, I provide a url function in the template engine so that
templates can get the URL of some other asset including its
hashcode. So this means that if some HTML file needs to link to some
CSS file, the HTML file as built into the server can be a template. At
start up, this template gets run, it can invoke this url function,
and it can find out the final URL of the CSS file. And of course this
URL now influences the hashcode of the HTML file itself. This also
plays very nicely with the
integrity
attribute you can put on all sorts of things these days.
So it all works out rather nicely: if you consider the tree of file A
importing files B and C, and file B imports file D, then if I make
some change to file D, then it means its hashcode changes, and so its
URL changes. This propagates up to file B, and from there to file A
(but no change to file C). So it’s safe to serve all these static
assets with immutable cache-control headers and rely on this lovely
hashcode chaining. All of this work is done once, each time the server
starts up. And it’ll all explode and stop working if there’s ever a
cycle in the graph of file imports.
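The dependency-propagating part of this is easy to model. Here is a minimal sketch in Haskell (the names and the toy hash function are mine; it assumes an acyclic import graph, and indeed loops forever on a cycle):

import qualified Data.Map as Map

type Name = String
type Hash = Int -- stand-in for a real content hash

-- an asset is its raw content plus the names of the assets it references
data Asset = Asset { content :: String, imports :: [Name] }

-- the final hash of an asset mixes its own content with the *final*
-- hashes of everything it imports, so a change to file D propagates up
-- to every importer of D, transitively
finalHashes :: Map.Map Name Asset -> Map.Map Name Hash
finalHashes assets = result
  where
    result = Map.map hashOf assets
    hashOf (Asset c is) =
      foldl (\h i -> h * 31 + result Map.! i) (length c) is -- toy hash

Serving each asset at a URL containing its final hash is then what makes the immutable cache-control headers safe.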
Now in practice, references between HTML, CSS, images,
or JavaScript don’t seem to create cycles – at least I’ve not had a
problem so far. But between JavaScript modules, it’s much more common,
as you’d likely expect. But here, import
maps
come to the rescue: in my TypeScript/JavaScript, I just import modules
normally. I have a function in the template engine which knows how to
generate an import-map of all my JavaScript modules, which gets
injected into the top HTML page. This import-map provides both the
rewriting of paths (to add hashcodes onto the paths), and also
provides the integrity
section. This
solves the problem of circular imports because it means the JavaScript
itself never needs to contain the hashcode of any module it
imports. Yet, if I change some JavaScript module, then its hashcode
changes, which means the import-map changes, and so again, the browser
is forced into correctly fetching the updated resource.
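To give the flavour, the injected import-map ends up with roughly this shape (module names and digests here are made-up placeholders):

<script type="importmap">
{
  "imports": {
    "./score.js": "./score.3f9ab2.js"
  },
  "integrity": {
    "./score.3f9ab2.js": "sha384-PLACEHOLDER_BASE64_DIGEST"
  }
}
</script>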
A couple of weekends ago, I was up visiting my parents and I wanted to
demonstrate this thing to them (they’re also musicians). They have
computers running Windows. I tried loading up a demo score, and it
just didn’t work. Their browsers were up to date. Nothing of note in
the server logs, so I opened up the browser console and found errors
from deserialisation of data coming over the websocket: it was
claiming the data was corrupted. I’d never seen this in my own
development and use.
Checking a few other things, and I spotted that the source HTML for
the page had had some additional <script> elements added to it:
something was injecting some JavaScript. And then the penny dropped:
this is MITM
behaviour by some shitty anti-virus software – in this case,
AVG. Some quick web searching, and yep, those products are also known
for dicking around with websocket traffic: if you’re sending binary
messages and you’re compressing the stream, it’s apparently quite
common that the anti-virus software intercepts the traffic, and then
screws up the framing leading your own code to face corrupted
data. Completely ridiculous.
In my case, disabling compression on the websocket was enough to
prevent the corruption, and I then established that even for big
scores, the initial load would be maybe 25kB of data over the
websocket, so not compressing it isn’t terrible.
What made me laugh though was this: the browser console was telling me
both about the corrupted data, and also about the fact the browser was
refusing to run some script, due to it violating
CSP
settings. It took me a moment to realise that the script that wasn’t
being run, was the script that the anti-virus software was injecting
into my pages! Now, import-maps can’t be external files, they have to
be inline in the HTML. But there’s no way I’m serving HTML pages with
a CSP header with script-src 'unsafe-inline'. Instead, I’m issuing a
script-src CSP header with 'self' and also the hashcode of the
import-map itself.
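Concretely, the header has roughly this shape, with a real base64 digest in place of the placeholder:

Content-Security-Policy: script-src 'self' 'sha256-PLACEHOLDER_DIGEST_OF_THE_INLINE_IMPORT_MAP'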
What this says to the browser is that it can trust the import-map
(because hashing it will give a hashcode that matches the CSP header),
the import-map itself has its integrity for every JavaScript module it
needs to load, and the CSP headers also tell the browser that it’s OK
to load JavaScript modules from the same domain (this is the
'self'). But, this does not give permission for the browser to run
arbitrary bits of crap JavaScript that some awful anti-virus thing has
injected! So, by making use of CSP and import-maps, you can stop
attackers from tampering with your website and code!
In my functional programming course to Master Students of
Telecom Nancy,
I like to use parsing as an example of monadic programming, relying on the
megaparsec library.
My only concern with megaparsec is that its official tutorial
is long: at the time I’m writing, it’s 15000 words long.
Unlike the official megaparsec tutorial, this blog post is intended to be smaller,
and is aimed at an audience with only a basic understanding of Haskell and monadic programming.
My running example is a parser for a domain-specific language that I designed for the class. This language uses primitive drawing commands to represent ASCII art roguelike maps. It looks like this:
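(The coordinates in the following snippet are illustrative; any two rooms joined through gaps in their walls would do.)

HLine 0 0 6; VLine 0 1 3; HLine 0 4 6
VLine 5 1 1; VLine 5 3 1
Start 2 2
HLine 10 0 6; VLine 15 1 3; HLine 10 4 6
VLine 10 1 1; VLine 10 3 1
Cell 14 3 ~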
Here, HLine x y len and VLine x y len draw horizontal and vertical walls respectively.
The Start x y command marks the player’s starting point and Cell x y ~ places special terrain.
Roguelike maps typically consist of rectangular rooms and connecting corridors, where
walls are shown as #, water as ~, and walkable spaces as dots (.).
For example, the snippet above draws a map with two connected rooms.
The room on the left contains the player’s start location (>), while some water appears in the lower right corner of the room on the right:
Walkable floor cells are omitted from the domain-specific language,
as they can be inferred by computing the set of cells reachable from the starting point.
In implementations of roguelikes, maps like this one
are translated into an array of arrays of symbols, with some symbols being
walkable (e.g. dot cells and water cells) and some symbols being blockers (walls).
The top-level array is then used to compute possible moves and collisions.
The Parsec monad
To use megaparsec, we define our main monad type using the Parsec e s a type.
It has three arguments:
The type of errors returned by the parser,
the type of stream accepted as input by the parser, and
the type of data returned upon successful parsing of an input stream.
For a simple parser, we define:
The error type to be Text, for simplicity. In a production parser, you would use a structured error type
that distinguishes the different error cases, so that you can handle them differently.
The input stream to be Text, because this is the most idiomatic choice in the Haskell ecosystem:
import Data.Text (Text)
import Text.Megaparsec

type Error = Text
type Input = Text

-- | @Parser a@ is a parser that accepts @Text@ as input and returns an @a@ upon
-- successful parsing.
type Parser a = Parsec Error Input a
Our first parser
Parsers are built from
primitive combinators (e.g. lookAhead, notFollowedBy, end of file eof)
and combinators derived from them (e.g. oneOf, anySingle, satisfy).
These combinators are designed to consume a few symbols, not complex structures (more on this later).
Combinators return parsers in any MonadParsec monad,
which means that they have a signature where the head is MonadParsec e s m => ...
and the return type is of the form m a1.
In our context, it suffices to know that m a is instantiated to Parser a, so we can
use these combinators for our parsers.
Let’s parse the different kinds of symbols we usually find in ASCII art roguelike maps,
using the anySingle function, which parses a single token. In our case, since the input type
is Text, the type of tokens is Char (see the ShareInput case of
Stream’s documentation,
as well as the instances of Stream):
-- | A symbol in the map of an ASCII roguelike
data Symbol
  = -- | A wall, depicted by a # character
    Wall
  | -- | A water cell, depicted by a ~ character
    Water
  deriving (Eq, Show)

-- | A parser for the symbol of a single cell. Used in 'parseElement' below.
parseSymbol :: Parser Symbol
parseSymbol = do
  c <- anySingle
  case c of
    '#' -> return Wall
    '~' -> return Water
    _ -> fail $ "Unknown symbol: " <> [c]
         -- See below for how to avoid this case altogether (in parseLineElement)
Parser combinators
By virtue of MonadParsecs
being monads, parsers can be built using functions that are common in monadic Haskell code
(including functions from Functor, Applicative, etc.). Let’s demonstrate this
to build a parser for more advanced roguelike map constructs:
data Element
  = -- | Horizontal wall, starting at @(x,y)@ with @length@ cells (ending at @(x+length-1,y)@)
    HorizontalLine Int Int Int
  | -- | Vertical wall, starting at @(x,y)@ with @length@ cells (ending at @(x,y+length-1)@)
    VerticalLine Int Int Int
  | -- | A cell at @(x,y)@ with a symbol
    Cell Int Int Symbol
  | -- | The starting point of the player
    Start Int Int
  deriving (Eq, Show)
The parser for the HorizontalLine and VerticalLine cases can be written as follows:
import Control.Monad (void)
import Control.Monad.Extra (when)
import Text.Megaparsec.Char
import Text.Megaparsec.Char.Lexer

parseLineElement :: Parser Element
parseLineElement = do
  constructor <- choice
    [ string "HLine" >> return HorizontalLine
    , string "VLine" >> return VerticalLine ]
  space1 -- One or more spaces
  x <- decimal
  space1
  y <- decimal
  space1
  len <- decimal
  when (len < 1) $ fail $ "Length must be greater than 0, but got " <> show len
  return $ constructor x y len
The first two lines either parse the string HLine or the string VLine and use the
choice
function to encode the two possibilities. Also, because each line in a do block encodes a step
in the computation, writing monadic parsers is natural: each line consumes some of the input,
until enough is consumed to return the desired value. Another example of using a regular monadic function
is to use when
to stop parsing when an incorrect value is consumed.
Running parsers
Since our parser takes Text as input, it can be tested in a pure context.
Megaparsec provides the runParser
function for this. To be able to print errors of our parser, our error type must be an instance of ShowErrorComponent;
and then we can define a convenient runMyParser function that returns either an error or the parsed value:
import Data.Text (pack, unpack)

-- | Instance required for 'runMyParser'
instance ShowErrorComponent Error where
  showErrorComponent = unpack

-- | A variant of megaparsec's 'runParser', instantiated to our context.
-- Successfully parses an @a@ or returns an error message.
runMyParser :: Parser a -> Input -> Either Text a
runMyParser parser input =
  case runParser parser "" input of
    Left err -> Left $ pack $ errorBundlePretty err
    Right x -> Right x
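As a quick check, here is what running one of the parsers above looks like in ghci (my own example, not from the tutorial):

ghci> runMyParser parseLineElement (pack "HLine 0 0 5")
Right (HorizontalLine 0 0 5)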
Parsing expressions, lists, etc.
Megaparsec not only provides building blocks for parsing tokens and combining parsers. It also provides
parsers for common constructs found in programming languages and domain-specific languages, such as expressions
and lists. Megaparsec does this by relying on the
parser-combinators
package.
I don’t want to go into the details of parsing expressions here (e.g. parsing 1 + 2 - 3…), but let me emphasize that it is
a bad idea to write your own expression parser. Instead, think about what kind of operators you need and encode them, using the
Operator
type.
List parsing, on the other hand, is done with various
sep… functions.
In our case of roguelike maps, we allow different elements to be separated by a semicolon, or by one or more newlines.
This is encoded as follows:
parseElements :: Parser [Element]
parseElements = parseElement `sepBy1` separator
  where
    separator = do
      hspace -- Optional horizontal (non-newline) space
      choice [void $ char ';', void $ some eol] -- Either a single ';' or many newlines
      hspace

parseElement :: Parser Element
parseElement = choice [parseLineElement, parseStart, parseCell]
  where
    parseStart = do
      void $ string "Start"
      space1
      (x, y) <- parseCoord
      return $ Start x y
    parseCell = do
      void $ string "Cell"
      space1
      (x, y) <- parseCoord
      space1
      symbol <- parseSymbol
      return $ Cell x y symbol
    parseCoord = do
      x <- decimal
      space1
      y <- decimal
      return (x, y)
Conclusion
We’ve presented how to parse simple constructs using megaparsec and how to run our parsers.
This blog post is less than 1500 words long: mission accomplished presenting megaparsec in a shorter way than the official tutorial 🥳
New languages are coming out all the time, some experimental, some
industrial, others are purpose built for a specific domain. No single
language has the people-power or scope to try every cool new feature, so
a critical step in designing a new language is to observe how
experimental features have played out in practice.
As the saying goes, good [language designers] copy, great [language
designers] steal.
If you've heard anything about the Unison Language it's not a
surprise to you that it innovates in many areas. Unison very much tries
to reinvent Human-Compiler interactions for the 21st century, and in
that pursuit has spawned a fully integrated ecosystem spanning the
compiler, codebase-manager, language server, version control and package
manager.
While some of these features are still too new to have proven their
worth (we have our fingers crossed), there are aspects that I think
new languages should certainly consider as part of their designs.
A Fully
Interactive and Incremental Compiler
With the modern era of language servers and programming assistants,
developers greatly benefit from instant feedback on their work. With
traditional batch compilers it's all too tempting to go for a coffee, or
a walk, or a YouTube binge every time you kick off a big build. The
context-switching induced by switching tasks while compiling wastes
developer time by paging things in and out of their working memory; not
to mention, it just feels bad. After the build finishes, the
developer is left with a giant wall of text, sentenced to dig through a
large list of compiler errors trying to find some root-cause error in
the file they're working on.
Unison has a fully interactive compilation experience. The
language-server is typechecking your scratch-file on every keystroke
providing error feedback right in your editor, and offering helpful
information via hover-hints which use your codebase and typechecking
info to help you orient yourself. It can even partially typecheck the
file to suggest which types or operators you may want to fill into a
given slot.
Once you're happy with a chunk of code, you can check it in to the
codebase and it won't be compiled again unless you want to change it, or
an update is automatically propagated into it from a downstream
change.
While most languages won't adopt Unison's scratch-file and codebase
model; having an interactive compiler with good support for caching of
already-compiled-assets is a huge boon to productivity in any
language.
On the topic of the language server, Unison's language server is
built directly into the compiler. This ensures we avoid the awkward
disagreements between the LSP and compiler that sometimes happen in
other languages. It can also help to avoid duplicate work: many
languages run the compiler independently and in their LSP at the
same time without sharing any of the work between them, wasting
precious resources.
Codebase API
It's the compiler's job to understand your code intimately. It knows
exactly how every definition is linked together, even if you don't! In
many languages it can be frustrating to know that this information
exists deep within the compiler while having no access to it
yourself!
Unison stores all your code as structured data within your codebase
and exposes the ability for you to ask it useful questions about your
code, exposing that precious understanding to you as a developer.
Unison allows searching by type, finding the dependencies of a
definition, or inverting that relationship to find all definitions
which depend on a given definition.
Via the UCM CLI you can use utilities like text.find to
search only string constants, or find to search only
definition names.
Some codebase data is provided via an API which is exposed from the
interactive UCM compiler, allowing developers to write tooling to
customize their workflow. For example, check out this VS
Code plugin someone wrote to view codebase definitions in the
sidebar. In other languages you'd typically need to write a scrappy
Regex or re-compile the code in a subprocess in order to achieve
something similar.
It doesn't have to be an API; it could be a parquet file or a SQLite
database or any number of things. The important part is that a language
exposes its one true source of information about the codebase in some
structured format for third-party tools to build upon.
Smart docs
It doesn't matter how great your language's package ecosystem is if
nobody can figure out how to use it! Documentation is critical for
helping end users understand and use functionality in your language, but
it has a fatal flaw: documentation isn't compiled and falls out of date
with the code.
In Unison, docs are a data-type within the language itself. This
means that docs can be generated dynamically by running Unison
code! We've leveraged this ability to enable embedding typechecked
runnable code examples into your docs. These examples are compiled
alongside the rest of your program, so they're guaranteed to be
kept up to date, and the output from your example code is re-run
and updated whenever the source definitions change.
You can also write code which generates documentation based
on your real application code. For example, you could write code which
crawls your web-server's implementation and collects all the routes and
parameters the server defines and displays them nicely as
documentation.
Unison goes one step further here by providing special support for
the documentation format on Unison Share, ensuring any definitions
mentioned in docs and code examples are hyper-linked to make for a
seamless package-browsing experience.
As an example of how far this can go, check out this
awesome project by community contributor Alvaro which generates
mermaid graphs in the docs representing the behaviour of simulations.
The graphs are generated from the same underlying library code so they
won't go out of date.
Get stealing
This subset of topics doesn't touch on Unison's ability system,
continuation capturing, or code serialization so I'll probably need at
least a part 2!
Hopefully you learned something 🤞! Did you know I'm currently writing a book? It's all about Lenses and Optics! It takes you all the way from beginner to optics-wizard and it's currently in early access! Consider supporting it, and more posts like this one, by pledging on my Patreon page! It takes quite a bit of work to put
these things together; if I managed to teach you something or even just entertain you for a minute or two,
maybe send a few bucks my way for a coffee? Cheers!
In this episode Mike Sperber and Niki Vazou talk with Sandy Maguire, lead compiler engineer at Manifold Valley. They talk about the benefits of using Haskell (of course), about all the books Sandy has written, about effects and the problem with monads, and about combinator libraries and programming with laws.
Back in 2023, I introduced Solve.hs, my newest course focused on problem solving in Haskell. This course was inspired by my experiences solving programming puzzles with Haskell, especially by the feeling of how different it was compared to other languages.
Solve.hs will teach you all the core knowledge you need around data structures and algorithms to tackle not only these kinds of puzzles (which often appear as interview questions), but also the mindset shifts you have to make when solving them in Haskell.
In 2023, I released the first two modules, which focused on data structures, with a special emphasis on how Haskell uses linked lists. These also explored the patterns that replace ‘for’ and ‘while’ loops from other languages.
Then in 2024 I released module 3, which explained all of the most essential algorithms in great detail, and showed how we have to implement them differently in Haskell.
Finally, today, I am releasing the fourth and final module for this course! This module explains parsing in great detail. You’ll learn:
Basic string manipulation techniques for simple parsing
How to use libraries to parse common data formats (e.g. JSON)
How to use the Megaparsec library to parse any other kind of structured data
How to write your own monadic parser
How to use regular expressions for parsing in Haskell
These skills can be important in puzzle solving challenges where your input is just a string. But they’re also applicable in a wide variety of “real world” projects!
For the next 2 weeks, you can get Solve.hs for 20% off with the code SOLVE25. You can also get an extra 10% discount by subscribing to our newsletter!
After these 2 weeks are up, you’ll not only lose the discount, but the price of the course will go up to reflect the added material from module 4. This course will never be cheaper, so grab it now by going to the course page!
The GHC developers are very pleased to announce the availability
of the release candidate for GHC 9.10.2. Binary distributions, source
distributions, and documentation are available at downloads.haskell.org and
via GHCup.
GHC 9.10.2 is a bug-fix release fixing over 50 issues of a variety of
severities and scopes. A full accounting of these fixes can be found in the
release notes. As always, GHC’s release status, including planned future
releases, can be found on the GHC Wiki status page.
This release candidate will have a two-week testing period. If all goes well
the final release will be available the week of 1 May 2025.
We would like to thank Well-Typed, Tweag I/O, Juspay, QBayLogic, Channable,
Serokell, SimSpace, the Haskell Foundation, and other anonymous contributors
whose on-going financial and in-kind support has facilitated GHC maintenance
and release management over the years. Finally, this release would not have
been possible without the hundreds of open-source contributors whose work
comprises this release.
As always, do give this release a try and open a ticket if you see
anything amiss.
A few months ago, I announced that the GHC wasm
backend added support for Template Haskell and ghci. Initially, the
ghci feature only supported running code in nodejs and accessing the
nodejs context, and I’ve been asked a few times when ghci was going to work in
browsers in order to allow live-coding the frontend. Sure, why not? I promised
it in the last blog post’s wishlist. After all, GHCJS used to support
GHCJSi for browsers almost 10 years ago!
I was confident this could be done with moderate effort. Almost all the
pieces are already in place: the external interpreter logic in GHC is
there, and the wasm dynamic linker already works in nodejs. So just
make it runnable in browsers as well, add a bit of logic for
communicating with GHC, and we’re done, right? Well, it still took a few
months for me to land it… but finally here it is!
To keep this post within reasonable length, I will only introduce the
user-facing aspects of the wasm ghci browser mode and won’t cover the
underlying implementation. The rest of the post is an example ghci
session followed by a series of bite sized subsections, each covering
one important tip about using this feature.
How to use it
The ghc-wasm-meta repo provides user-facing
installation methods for the GHC wasm backend. Here we’ll go with the
simplest nix-based approach:
$ nix shell 'gitlab:haskell-wasm/ghc-wasm-meta?host=gitlab.haskell.org'
$ wasm32-wasi-ghc --interactive -fghci-browser
GHCi, version 9.12.2.20250327: https://www.haskell.org/ghc/ :? for help
Open http://127.0.0.1:38827/main.html or import http://127.0.0.1:38827/main.js to boot ghci
The -fghci-browser flag enables the browser mode. There are a couple
of other related flags which you can read about in the user
manual, but for now, let’s open that page to
proceed. You’ll see a blank page, but you can press F12 to open the
devtools panel and check the network monitor tab to see that it’s
sending a lot of requests and downloading a bunch of wasm modules.
Within a few seconds, the initial loading process should be complete,
and the ghci prompt should appear in the terminal and accept user
commands.
Let’s start with the simplest:
ghci>putStrLn"hello firefox"ghci>
The message is printed in the browser’s devtools console. That’s not
impressive, so let’s try something that only works in a browser:
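A sketch of the sort of code involved follows; the identifiers asListener and mkButton are mine, and the declarations use the JSFFI wrapper-import syntax described in the GHC user manual:

ghci> import GHC.Wasm.Prim
ghci> foreign import javascript "wrapper sync" asListener :: IO () -> IO JSVal
ghci> foreign import javascript unsafe "const b = document.createElement('button'); b.textContent = 'count'; b.addEventListener('click', $1); document.body.appendChild(b)" mkButton :: JSVal -> IO ()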
The above code implements logic to export a Haskell IO () function
to a JavaScript synchronous callback that can be attached as a
button’s client event listener. Synchronous callbacks always attempt
to run Haskell computations to completion, which works fine as long as
the exported Haskell function’s main thread does not block indefinitely,
like waiting for an async JSFFI import to resolve or be rejected. You
can read more about JSFFI in the user manual, but let’s
carry on with this example:
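Continuing the session along the lines the text describes (again a sketch):

ghci> import Data.IORef
ghci> ref <- newIORef (0 :: Int)
ghci> cb <- asListener (modifyIORef' ref (+ 1) >> readIORef ref >>= print)
ghci> mkButton cb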
Now, the button is attached to a simple counter in Haskell that prints
an incrementing integer to the console each time the button is
clicked. And that should be sufficient for a minimal demo! Now, there
are still a couple of important tips to be mentioned before we wrap up
this post:
Hot reloading
Just like native ghci, you can perform hot reloading:
ghci> :r
Ok, no modules to be reloaded.
ghci> btn
<interactive>:15:1: error: [GHC-88464]
Variable not in scope: btn
Reloading nukes all bindings in the current scope. But it doesn’t
magically undo all the side effects we’ve performed so far: if you
click on the button now, you’ll notice the counter is still working
and the exported Haskell function is still retained by the JavaScript
side! And this behavior is also consistent with native ghci:
hot-reloading does not actually wipe the Haskell heap, and there
exist tricks like foreign-store to persist values
across ghci reloads.
For the wasm ghci, things like foreign-store should work. You can
also allocate a stable pointer and print it, then reconstruct the
stable pointer and dereference it after a future reload. Since wasm
ghci runs in a JavaScript runtime after all, you can also cook up your
own global variable by assigning to globalThis. Or you can locate the
element and fetch its event handler; it should be the same Haskell
callback exported earlier, which can be freed by freeJSVal.
So, when you do live-coding that involves some non-trivial back and
forth calling between JavaScript and Haskell, don’t forget that hot
reloads don’t kill old code, and you need to implement your own logic
to disable earlier callbacks to prevent inconsistent behavior.
Loading object code
The wasm ghci supports loading GHC bytecode and object code. All the
code you type into the interactive session is compiled to bytecode.
The code that you put in a .hs source file and load via command line
or :l commands can be compiled as object code if you pass
-fobject-code to ghci.
I fixed the ghci debugger for all 32-bit cross targets
since the last blog post. Just like native ghci, debugger features
like breakpoints now work for bytecode. If you don’t use the ghci
debugger, it’s recommended that you use -fobject-code to load
Haskell modules, since object code is faster and more robust at
run-time.
Interrupting via ^C
My GHC patch that landed the ghci browser
mode also fixed a previous bug in wasm ghci: ^C was not handled at all
and would kill the ghci session. Now, the behavior should be
consistent with native ghci. With or without -fghci-browser, if
you’re running a long computation and you press ^C, an async exception
should interrupt the computation and unblock the ghci prompt.
Read the :doc, Luke
Among the many changes I landed in GHC since the last blog post, one of
them is adding proper haddock documentation to all user-facing things
exported by GHC.Wasm.Prim. Apart from the GHC user manual, the
haddock documentation is also worth reading for users. I haven’t set
up a static site to serve the haddock pages yet, but they are already
accessible in ghci via the :doc command. Just try import GHC.Wasm.Prim and check :doc JSVal or :doc freeJSVal, then you
can read them in plain text.
As the Haskell wasm user community grows, so will the frustration with
lack of proper documentation. I’m slowly improving that. What you see
in :doc will continue to be polished, same for the user manual.
Importing an npm library in ghci
You can use JavaScript’s dynamic import() function as an async JSFFI
import. If you want to import an npm library in a ghci session, the
simplest approach is using a service like esm.run which
serves pre-bundled npm libraries as ES modules over a CDN.
If you have a local npm project and want to use the code there, you
need to do your own bundling and start your own development server
that serves a page to make that code somehow accessible (e.g. via
globalThis bindings). But how does that interact with the wasm ghci?
Read on.
Using ghci to debug other websites
The browser mode works by starting a local HTTP server that serves
some requests to be made from the browser side. For convenience, that
HTTP server accepts CORS requests from any origin, which means
it’s possible to inject the main.js startup script into browser tabs
of other websites and use the wasm ghci session to debug those
websites! Once you fire up a ghci session, just open the devtools
console of another website and drop a
import("http://127.0.0.1:38827/main.js") call, if that website
doesn’t actively block third-party scripts, then you can have more fun
than running it in the default blank page.
All JavaScript code for the GHC wasm backend consists of proper ES modules
that don’t pollute the globalThis namespace. This principle has
been enforced since day one, which allows multiple Haskell wasm
modules or even wasm ghci sessions to co-exist in the same page! It
works fine as long as you respect their boundaries and don’t attempt
to do things like freeing a JSVal allocated elsewhere, but even if
you only have one wasm module or ghci session, the “no global variable”
principle should also minimize the interference with the original page.
In my opinion, being able to interact with other websites is the most
exciting aspect of the browser mode. Sure, for Haskell developers that
want to experiment with frontend development, using ghci should
already be much easier than setting up a playground project and
manually handling linker flags, wrapper scripts, etc. But there’s even
greater potential: who said the website itself needs to be developed
in Haskell? Haskell can be used to test websites written in foreign
tech stacks, and testing backed by an advanced type system is
undoubtedly one of our core strengths! You can use libraries like
quickcheck-state-machine or
quickcheck-dynamic to perform state machine
property testing interactively, which has much greater potential of
finding bugs than just a few hard coded interactions in JavaScript.
No host file system in wasm
The default nodejs mode of wasm ghci has full access to the host file
system, so you can use Haskell APIs like readFile to operate on any
host file path. This is no longer the case for browser mode: the only
handles available are stdout/stderr, which output to the devtools
console in a line-buffered manner, and there’s no file to read/write
in wasm otherwise. The same restriction also applies to Template
Haskell splices evaluated in a browser mode ghci session, so splices
like $(embedFile ...) will fail.
This is a deliberate design choice. The dev environment backed by ghci
browser mode should be as close as possible to the production
environment used by statically linked wasm modules, and the production
environment won’t have access to the host file system either. It would be
possible to add extra plumbing to expose the host file system to ghci
browser mode, but that is quite a bit of extra work and also makes the
dev environment less realistic, so I’d like to keep the current design
for a while.
If you need to read a local asset, you can serve the asset via another
local HTTP server and fetch it in ghci. If you have modules that use
splices like embedFile, those modules should be pre-compiled to
object code and loaded later in ghci.
Don’t press F5
It’s very important that the browser page is never refreshed. The
lifetime of the browser tab is supposed to be tied to the ghci
session. Just exit ghci and close the tab when you’re done, but
refreshing the page would completely break ghci! A lot of shared state
between the browser side and host side is required to make it work,
and refreshing would break the browser side of the state.
Likewise, currently the browser mode can’t recover from network
glitches. It shouldn’t be a concern when you run GHC and the browser
on the same machine, but in case you use SSH port forwarding or
tailscale to establish the GHC/browser connection over an unstable
network, once the WebSocket is broken then the game is over.
This is not ideal for sure, but supporting auto-recovery upon network
issues or even page reloads is incredibly challenging, so let’s live
with what is supported for now.
Doesn’t work on Safari yet
Currently the browser mode works fine for Firefox/Chrome, including
desktop/mobile versions and all the forks with different logos and
names. Sadly, Safari users are quite likely to see spurious crashes
with a call_indirect to a null table entry error in the console.
Rest assured, normal statically-linked Haskell wasm modules still work
fine in Safari.
This is not my fault, but WebKit’s! I’ve filed a WebKit
bug and if we’re lucky, this may be looked into on their
side and get fixed eventually. If not, or if many people complain
loudly, I can implement a workaround that seems to mitigate the WebKit
bug to make the browser mode work in Safari too. That’ll be extra
maintenance burden, so for now, if you’re on macOS, your best bet is
installing Firefox/Chrome and using that for ghci.
Huge libraries don’t work yet
How large is “huge”? Well, you can check the source code of
V8, SpiderMonkey and
JavaScriptCore. In brief: there are limits agreed upon
among major browser engines that restrict a wasm module’s
import/export numbers, etc, and we do run into those limits
occasionally when the Haskell library is huge. For instance, the
monolithic ghc library exceeds the limit, and so does the profiling way of
ghc-internal. So cost-center profiling doesn’t work for the ghci
browser mode yet, though it does work for statically linked wasm
modules and ghci nodejs mode.
Unfortunately, this issue is definitely not a low hanging fruit even
for me. I maintain a nodejs fork that patches the V8 limits so that
the Template Haskell runner should still work for huge libraries, but
I can’t do the same for browsers. A fundamental fix to sidestep the
browser limits would be a huge amount of work. So I’ll be prioritizing
other work first. If you need to load a huge library in the browser,
you may need to split it into cabal sublibraries.
Wishlist, as usual
My past blog posts usually end with a “what comes next” section. This
one is no exception. The browser mode is in its early days, so it’s
natural to find bugs and other rough edges, and there will be
continuous improvement in the coming months. Another thing worth
looking into is profiling: modern browsers have powerful profilers,
and it would be nice to integrate our own profiling and event log
mechanism with browser devtools to improve developer experience.
The next big thing I’ll be working on is threaded RTS
support. Currently all Haskell wasm modules are
single-threaded and run in the browser main thread, but there may
exist workloads that can benefit from multiple CPU cores. Once this is
delivered, Haskell will also become the first functional language with
multi-core support in wasm!
You’re welcome to join the Haskell wasm Matrix room
to chat about the GHC wasm backend and get my quick updates on this
project.
Today, 2025-04-16, at 1830 UTC (11:30 am PDT, 2:30 pm EDT, 7:30 pm BST, 20:30 CEST, …)
we are streaming the 42nd episode of the Haskell Unfolder live on YouTube.
Functional programming is programming with mathematical functions, mapping
inputs to outputs. By contrast, logic programming—perhaps best known from
the language Prolog—is programming with mathematical relations between
values, without making a distinction between inputs and outputs. In this
two-year anniversary episode of the Haskell Unfolder we take a look at
typedKanren, an embedding of the logic programming language miniKanren in
Haskell. We will see how we can use it to write a type checker for a simple
functional language in a few lines of code.
About the Haskell Unfolder
The Haskell Unfolder is a YouTube series about all things Haskell hosted by
Edsko de Vries and Andres Löh, with episodes appearing approximately every two
weeks. All episodes are live-streamed, and we try to respond to audience
questions. All episodes are also available as recordings afterwards.
Purely functional list concatenation, xs ++ ys in Haskell syntax, is well known to be linear time
in the length of the first input and constant time in the length of the second, i.e. xs ++ ys is
O(length xs). This leads to quadratic complexity if we have a bunch of left associated uses of
concatenation.
The ancient trick to resolve this is to, instead of producing lists, produce list-to-list functions
a la [a] -> [a] or ShowS = String -> String = [Char] -> [Char]. “Concatenation” of “lists”
represented this way is just function composition which is a constant time operation. We can lift a
list xs to this representation via the section (xs ++). This will still lead to O(length xs)
amount of work to apply this function, but a composition of such functions applied to a list will
always result in a fully right associated expression even if the function compositions aren’t
right associated.
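In Haskell, the whole technique fits in a few lines (a sketch with names of my own choosing; the Prelude’s ShowS plays the role of DString):

type DString = String -> String

fromString :: String -> DString
fromString xs = (xs ++)

toString :: DString -> String
toString f = f []

-- (.) is the constant-time "concatenation"; applying the composite to []
-- re-associates all the real (++) work to the right, so even left-nested
-- compositions cost only the total output length (plus tree overhead).
example :: String
example = toString ((fromString "foo" . fromString "bar") . fromString "baz")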
In the last several years, it has become popular to refer to this technique as “difference lists”.
Often no justification is given for this name. When it is given, it is usually a reference to the
idea of difference lists in logic programming. Unfortunately, other than both techniques giving rise
to efficient concatenation, they have almost no similarities.
Functional Lists
To start, I want to do a deeper analysis of the “functional lists” approach, because I think what it
is doing is a bit misunderstood and, consequently, oversold1. Let’s see how we would model this approach in an OO
language without higher-order functions, such as early Java. I’ll use strings for simplicity, but it
would be exactly the same for generic lists.
This is just a straight, manual implementation of closures for (.) and (++) (specialized to
strings). Other lambdas not of the above two forms would lead to other implementations of
PrependTo. Let’s say, however, these are the only two forms that actually occur, which is mostly
true in Haskell practice, then another view on this OO code (to escape back to FP) is that it is an
OOP encoding of the algebraic data type:
data PrependTo = Compose PrependTo PrependTo | Prepend String

prependTo :: PrependTo -> String -> String
prependTo (Compose left right) end = prependTo left (prependTo right end)
prependTo (Prepend s) end = s ++ end
We could have also arrived at this by defunctionalizing a typical example of the technique. Modulo
some very minor details (that could be resolved by using the Church-encoded version of this), this
does accurately reflect what’s going on in the technique. Compose is clearly constant time. Less
obviously, applying these functional lists requires traversing this tree of closures – made
into an explicit tree here. In fact, this reveals that this representation could require arbitrarily
large amounts of work for a given size of output. This is due to the fact that prepending an empty
string doesn’t increase the output size but still increases the size of the tree. In practice,
it’s a safe assumption that, on average, at least one character will be prepended per leaf of the
tree which makes the overhead proportional to the size of the output.
This tree representation is arguably better than the “functional list” representation. It’s less
flexible for producers, but that’s arguably a good thing because we didn’t really want arbitrary
String -> String functions. It’s more flexible for consumers. For example, getting the head of
the list is a relatively efficient operation compared to applying a “functional list” and taking
the head of the result even in an eager language. (Laziness makes both approaches comparably
efficient.) Getting the last element is just the same for the tree version, but, even with laziness,
is much worse for the functional version. More to the point, this concrete representation allows
the concatenation function to avoid adding empty nodes to the tree whereas (.) can’t pattern
match on whether a function is the identity function or not.
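For instance (a sketch on top of the PrependTo type above), a smart constructor can inspect its arguments and drop empty pieces, which (.) cannot:

-- drop empty pieces at construction time; (.) cannot do this because it
-- cannot inspect the functions it is given
compose :: PrependTo -> PrependTo -> PrependTo
compose (Prepend "") r = r
compose l (Prepend "") = l
compose l r = Compose l r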
This view makes it very clear what the functional version is doing.
Difference Lists in Prolog
List append is the archetypal example of a Prolog program due to the novelty of its “invertibility”.
For our purposes, viewing this as a function of the first two arguments, this is exactly the usual
functional implementation of list concatenation with exactly the same problems. We could, of course,
encode the defunctionalized version of the functional approach into (pure) Prolog. This would
produce:
(I’ll be ignoring the issues that arise due to Prolog’s untyped nature.)
However, this being a logic programming language means we have additional tools available to use
that functional languages lack. Namely, unification variables. For an imperative (destructive)
implementation of list concatenation, the way we’d support efficient append of linked lists is we’d
keep pointers to the start and end of the list. To append two lists, we’d simply use the end
pointer of the first to update the end of the first list to point at the start of the second. We’d
then return a pair consisting of the start pointer of the first and the end pointer of the second.
This is exactly how Prolog difference lists work, except instead of pointers, we use unification
variables which are more principled. Concretely, we represent a list as a pair of lists, but the
second list will be represented by an unbound unification variable and the first list contains
that same unification variable as a suffix. This pair is often represented using the infix
operator (“functor” in Prolog terminology), -, e.g. Xs - Ys. We could use diff(Xs, Ys) or
some other name. - isn’t a built-in operator, it’s just a binary constructor essentially.
At the level of logic, there are no unification variables. The constraints above mean that Xs - Ys
is a list Xs which contains Ys as a suffix.
The name “difference list” is arguably motivated by the definition of concatenation in this
representation.
concat(Xs-Ys,Ys-Zs,Xs-Zs).
This looks a lot like |Xs - Ys + Ys - Zs = Xs - Zs|. If the suffix component of the first argument
is unbound, like it’s supposed to be, then this is a constant-time operation of binding that
component to Ys. If it is bound, then we need to unify which, in the worst-case, is O(length Ys)
where the length is up to either nil or an unbound variable tail2.
We also have the unit of concat, i.e. the empty
list via3:
empty(Xs-Xs).
See the footnote, but this does in some way identify Xs - Ys with the “difference” of Xs and
Ys.
We get back to a “normal” list via:
to_list(Xs - [], Xs).
% or more generally,
prepend_to(Xs - Ys, Ys, Xs).
to_list is a constant-time operation, no matter what. Note, to_list binds the suffix component
of the difference list. This means that the first input no longer meets our condition to be a
difference list. In other words, to_list (and prepend_to) consumes the difference list.
More precisely, it constrains the possible suffixes the list could be.
Indeed, any operation that binds the suffix component of a difference list consumes it. For example,
concat consumes its first argument.
Of course, it still makes logical sense to work with the difference list when its suffix component
is bound, it’s just that its operational interpretation is different. More to the point, given a
difference list, you cannot prepend it (via prepend_to or concat) to two different lists to get
two different results.
Converting from a list does require traversing the list since we need to replace the nil node, i.e.
[], with a fresh unbound variable. Luckily, this is exactly what append does.
from_list(Xs,Ys-Zs) :- append(Xs,Zs,Ys).
from_list also suggests this “difference list” idea. If all of Xs, Ys, and Zs are ground
terms, then from_list(Xs, Ys - Zs) holds when append(Xs, Zs, Ys) holds – exactly when, if our
invariants are maintained, i.e. that Zs is a suffix of Ys. Writing these relations more
functionally and writing append as addition, we’d have: |Ys - Zs = Xs \iff Ys = Xs + Zs|.
If we did want to “duplicate” a difference list, we’d essentially need to convert it to a (normal)
list with to_list, and then we could use from_list multiple times on that result. This would,
of course, still consume the original difference list. We’d also be paying O(length Xs) for every
duplicate, including to replace the one we just consumed4.
That said, we can prepend a list to a difference list without consuming it. We can perform
other actions with the risk of (partially) consuming the list, e.g. indexing into the list. Indexing
into the list would force the list to be at least a certain length, but still allow prepending to
any list that will result in a final list at least that long.
Comparison
I’ll start the comparison with a massive discrepancy that we will ignore going forward. Nothing
enforces that a value of type ShowS actually just appends something to its input. We could use
abstract data type techniques or the defunctionalized version to avoid this. To be fair, difference
lists also need an abstraction barrier to ensure their invariants, though their failure modes are
different. A difference list can’t change what it is based on what it is prepended to.
Functional Representation | Difference Lists
constant-time concatenation | constant-time concatenation
constant-time conversion from a list (though you pay for it later) | O(n) conversion from a list
persistent | non-persistent, requires linear use
represented by a tree of closures | represented by a pair of a list and a unification variable
O(n) (or worse!) conversion to a list | constant-time conversion to a list
defunctionalized version can be implemented in pretty much any language | requires at least single-assignment variables
unclear connection to being the difference of two lists (which two lists?) | mathematical, if non-obvious, connection to being the difference of two (given) lists
As an illustration of the difference between persistent and non-persistent uses, the function:
double f = f . f
is a perfectly sensible function on ShowS values that behaves exactly as you’d expect. On the
other hand:
double(In,Out) :- concat(In,In,Out).
is nonsense that will fail the occurs check (if it is enabled, otherwise it will create a cyclic
list) except for when In is the empty difference list.
Conclusion
I hope I’ve illustrated that the functional representation is not just not difference lists, but
is, in fact, wildly different from difference lists.
This functional representation is enshrined into Haskell via the ShowS type and related functions,
but I’d argue the concrete tree representation is actually clearer and better. The functional
representation is more of a cute trick that allows us to reuse existing functions. Really, ShowS
should have been an abstract type.
Difference lists are an interesting example of how imperative ideas can be incorporated into a
declarative language. That said, difference lists come with some of the downsides of an imperative
approach, namely the lack of persistence.
As far as I’m aware, there isn’t an unambiguous and widely accepted name for this functional
representation. Calling it “functional lists” or something like that is, in my opinion, very
ambiguous and potentially misleading. I think the lack of a good name for this is why “difference
lists” started becoming popular. As I’ve argued, using “difference list” in this context is even
more misleading and confusing.
If people really want a name, one option might be “delta list”. I don’t think this term is used.
It keeps the intuitive idea that the functional representation represents some “change” to a list,
a collection of deltas that will all be applied at once, but it doesn’t make any false reference to
difference lists. I’m not super into this name; I just want something that isn’t “difference list”
or otherwise misleading.
To be clear, it’s still much, much
better than using plain concatenation.↩︎
Such a length relation couldn’t
be written in pure Prolog but can in actual Prolog.↩︎
For those algebraically minded, this almost makes concat and empty into another
monoid except concat is partial, but such a partial monoid is just a category! In other words,
we have a category whose objects are lists and whose homsets are, at most, singletons containing
Xs - Ys for Hom(Xs, Ys). If we maintain our invariant that we have Xs - Ys only when Ys is a
suffix of Xs, this thin category is exactly the category corresponding to the reflexive,
transitive “has suffix” relation. We could generalize this to any monoid via a “factors through”
relation, i.e. |\mathrm{Hom}(m, n)| is inhabited if and only if |\exists p. m = pn| which you can
easily prove is a reflexive, transitive relation given the monoid axioms. However, for a general
monoid, we can have a (potentially) non-thin category by saying |p \in \mathrm{Hom}(m,n)| if and
only if |m = pn|. The category will be thin if and only if the monoid is cancellative. This is
exactly the slice category of the monoid viewed as a one-object category.↩︎
Again, in actual Prolog, we could
make a duplicate without consuming the original, though it would still take O(length Xs) time using
the notion of length mentioned before.↩︎
We have recently contributed to a research study investigating how AI can help with realistic software development tasks.
METR initiated this study to measure how AI tools affect real-world software
engineering productivity, particularly in substantial open-source projects. The
study was designed to measure and assess how experts can use AI tools in order to
improve their workflows. It’s all well and good if the latest model can fix an artificial
test case, but what’s more interesting is how AI can be driven by expert knowledge.
The study required us to work on about 20 small, routine development tasks. These
tasks were randomised; in some, we were allowed to use AI tools, and in others, we were not.
Apart from that, we could solve them however we wanted. We compiled notes on how we approached using the tools
and recorded our screens to provide a record of our experience.
Sam used the hours to work on GHC tickets, and I (Matt) used the hours to work on Cabal tickets.
I have included a list of the issues we worked on at the end of the post. Sam focused on fixing small bugs,
while I fixed all the known regressions in the cabal-install-3.14 release.
In this post, I’ll briefly discuss how we used the tools and what our overall experience was.
Haskell Programming with the help of AI
Before this study, neither of us was experienced with using AI tools to help with software development. I was
impressed that the models could interact with Haskell code at all. At the
start, it was quite overwhelming trying to understand what was available and what the trade-offs were between different tools.
The AI landscape is changing rapidly at the moment; there is a new model and tool every week.
Therefore, I won’t go into too much detail about what specific models or tools we used, but rather focus on our findings and experiences.
Development Environment
For the study, we were primarily using the following models and tools:
The text editor Cursor with AI autocomplete. Cursor is a fork of VSCode with AI-related features.
In the version of Cursor we used (0.45), there were two modes: the “chat” mode, which does not directly perform edits to your
files, and the “compose” mode, which does.
From within Cursor, the LLM claude-3.7-sonnet-thinking for the “chat” and “compose” features.
The standard ChatGPT 4o model from the web interface.
Using an editor with integrated LLM support, in particular one that supports Haskell Language Server, is key to getting the most
out of the AI tools:
Within the editor, it means that LLMs have access to relevant context for the task. This includes any files we pass to the
model as context, but also the rest of the codebase which can be searched by the LLM.
When an LLM suggests a change, it will receive feedback from HLS which will allow it to fix issues (e.g. fixing up missing imports,
resolving typechecker errors, etc). In practice, this made LLMs much more autonomous and reliable.
We didn’t use anything complicated or new, such as the Model Context Protocol or
very advanced thinking models such as ChatGPT o1.
Armed with these tools, we were ready to set about our task.
Architectural Understanding Tasks
For the AI-enabled tasks, we were encouraged to use the AI as much as possible.
Therefore, I typically started by just giving the AI a link to the GitHub issue
and asking it to explain to me what needed to be done. The summary was useful for
checking that I had understood correctly, and hearing the problem phrased differently
was a good sanity check before starting the issue.
Asking specific questions about the codebase had more mixed results.
In general, the AI could usually give plausible answers to understanding tasks,
but they were often wrong in some subtle way. It is also very suggestible, tending to
agree with whatever you state you think the solution is.
My impression for architectural understanding tasks was that you would have to provide a summary document as context in order for it to answer
questions more accurately.
Technology Understanding Tasks
For tasks that required me to understand something new or unfamiliar,
the AI was very good. In one issue, I had to investigate something wrong with the GitHub CI setup,
which was an area I was quite clueless about. ChatGPT was able to suggest the probable cause of the issue
with minimal prompting and just the issue description for context. That certainly saved a lot of time.
The ability to generate ad-hoc scripts for particular tasks was also very useful. I generated several
useful single-use bash and python scripts for extracting specific pieces of information from the codebase.
These scripts can also be used to generate information to feed back into the prompt, which forms a useful
feedback loop.
Code Generation Tasks
Once the AI demonstrated to me it understood the problem, I would ask it to
generate a solution. The AI could generate plausible, syntactically correct
code, but it was often the wrong idea. I think this was the biggest waste of
time. Once a solution was generated, it was quite tempting to just “fix” the
wrongness, but more often than not, the architecture or design was wrong. Many
fixes in a codebase
like Cabal require changing a few lines very precisely; that’s not something the AI is good at doing on its own at the moment.
On the other hand, if you are precise with your prompts and set the correct
context, the AI can save a lot of time generating specific definitions for you.
I would often use it to generate routine instances, simple definitions, or other well-defined generation tasks. It normally got these
correct, which I was very surprised about.
Generating test cases was also a good use of the AI. It was able to handle
generating the right structure for the custom Cabal test framework. These
invariably required some tweaking, but getting all the right files in place made it
a much simpler task.
The final approach to improve generation tasks is to first converse with the “chat” interface to clarify the problem, discuss
different parts of the design, and point out any issues. Once this context is established, you can ask the “chat” window to first generate
a prompt for the “compose” window. The resulting prompt gives “compose” precise instructions from the start, but it can be
edited further if something is not quite right.
Documentation Tasks
Opinion was split between us about how useful the LLMs were for documentation tasks.
I thought that this was a strong point in favour of using LLMs. Often when working on an issue, you end up having to explain the same thing several
different times. First, you explain precisely to the machine what your intent is with the code you write. Then you explain to a developer
in the comments and commit message. Finally, you explain to the user in the changelog and documentation. Each
of these tasks requires modifying somewhere slightly different with a slightly different focus. I found
that I was much more inclined to include all these different parts when using the AI since it could do a good job
generating the necessary files without requiring too much further editing.
The code changes themselves, along with the context developed in “chat,” were normally enough to be able to generate the
commit message, changelog, and documentation updates with very little effort.
On the other hand, the suggestions weren’t to Sam’s taste. He thought that the style
generated for the commit messages was rambly and indirect. The model might focus on
explaining a small detail rather than giving a bigger picture overview. For the more
complicated code in GHC, the explanation of the code was a vague transcription rather
than relaying any higher-level ideas the user might want to know.
He felt similarly when it came to note writing, a developer documentation artifact common to
GHC development: the LLMs would “get stuck” explaining details of the code rather than the
bigger picture. He did have some success in writing commit messages: the LLMs were
good at summarising which functions and parts of the code were modified, which gave
a good starting point for structuring the necessary explanations.
It’s interesting that we had different experiences in this area; perhaps it was due to
the difference in the codebases, or a difference in our styles of using the models.
People often struggle writing commit messages or documentation, and I think using
LLMs can reduce the barrier to entry in this area. A human-crafted commit message is
often much better than one generated by a model, but I would much prefer a commit message
generated by an LLM rather than none at all.
Verification Tasks
Another interesting use case is to use the AI to perform ad-hoc verification tasks. For example, I used the AI to check that
all NFData instances had a certain structure. For this, I first worked with the AI to generate a script to extract all the code for
NFData instances from the codebase. This required a small amount of debugging, but it would have taken me several hours to write the
awk script myself due to unfamiliarity with the language. Once I had the script, I extracted all the NFData instances and asked
ChatGPT to check that they all had the correct structure. The instance-by-instance summary allowed me to also quickly verify
the AI’s answer. It resulted in spotting a few missed cases that were very hard to spot by eye.
LLMs were also useful in diagnosing failing test cases. For example, Sam implemented a change to GHC which led to
a few dozen failing tests. After giving relevant context about the change, the LLM was able to categorise the
failing tests:
Some test results only involved minor output changes or improvements in error messages; these could be accepted.
The LLM further categorised the serious test failures, e.g. “tests 1, 4 and 5 failed for one reason, while test 2 failed for another reason”.
This categorisation was useful for identifying potential issues with a change and addressing them quickly.
It often happens in GHC development that a small change can lead to hundreds of failing test cases, and
it can be very time consuming to go over all failing test cases individually. Having an assistant that can
quickly do a first pass at sorting the test failures is very helpful.
Of course, the answers given to you by an LLM must always be taken with suspicion.
In situations where 95% confidence is good enough, or when it is quick and easy to check the correctness of an answer,
they can be very useful.
Refactoring Tasks
Using an LLM can be helpful for refactoring tasks that are routine and
well-defined. In our experience, however, they tend to struggle with larger
tasks or those requiring nuance.
For instance, the LLMs performed well when adding a new error message to Cabal’s diagnostic
infrastructure. This kind of task requires modifying quite a few different places in
a routine manner. There is not much code to add, nothing to move around or delete.
Similarly, the AI was able to handle smaller tasks, like lifting an expression to a
top-level definition or adding debugging traces, with a high success rate.
For repetitive refactoring tasks, such as renaming fields or parameters, the Cursor
autocomplete is extremely useful. I could often just change the name of a field, navigate to the next
type error, and the autocomplete would suggest the correct modification.
I personally also found it useful that the AI liked to keep things consistent. Fields were named
consistently, functions named consistently, etc. I find this task of making sure the whole API is consistent
quite difficult to do manually.
Finally, I didn’t really try to use the AI for large refactoring tasks. There would
tend to be a lot of random or incomplete changes to the codebase, and subtle things would go
wrong. Sam also reported that when working on some more routine refactorings, there
might have been one or two places which required a decision to be made, and you
could waste quite a lot of time if the LLM chose incorrectly. It would be useful
if an LLM could indicate the places that it modified with lower confidence.
Conclusion
Overall, I found the experience of using AI tools in my normal development
workflow to be very useful, and I will continue to use them after the study. It’s
clear to me that it is going to become essential to be
familiar with these tools as a developer in the future.
Sam has a more negative outlook in comparison. While he found LLMs useful,
he is concerned that increased use of LLMs will affect our shared ability to reason about our code.
Usage of LLMs risks disincentivising thinking deeply about the design or architecture
of our software, which then increases the burden placed on reviewers and risks
the community losing their shared understanding of how the codebase is supposed to operate.
We appreciate METR’s support in conducting this research, which has helped us
better understand both the potential and limitations of AI-assisted development
in the Haskell ecosystem.
If your company is interested in funding open-source work
then we offer Haskell Ecosystem Support Packages to provide
commercial users with support from Well-Typed’s experts, while investing in the
Haskell community and its technical ecosystem.
Issues Fixed
Sam’s GHC Issues
Sam focused on fixing small bugs in the typechecker, together with a couple of bugfixes related to LLVM code generation.
Matt’s Cabal Issues
Create a reproducer for the issue where Cabal 3.14.1.0 invokes test binaries with a corrupt (duplicated) environment variable list, aiding in debugging and resolution.
Develop a test for the regression where Cabal-3.14.1.0’s v1-test and Setup.hs test cause test suites of alex-3.4.0.1 and happy-1.20.1.1 to be unable to find data files.
Encountering “Error: Dependency on unbuildable library” when using three internal libraries with --enable-multi-repl, indicating issues in handling multiple internal dependencies.
The cabal repl command does not support renaming of re-exported modules when loading multiple components. This limitation leads to failures when one component re-exports a module from another with a different name.
Rust has always felt like a strange beast, culturally speaking. The community is
made of a mix of people with very different perspectives, including anything
from hardcore low-level kernel hackers to category theorists and functional
programming gurus. This is also what makes this community so fertile: whether
you’re coming from C, Haskell or TypeScript, you’re likely to learn a lot from
other perspectives.
I’d like to add my modest contribution by introducing a pattern coming from the
functional programming world, recursion schemes1. Recursion
schemes are a design pattern for representing and traversing recursive data structures
(typically trees), which helps factor out the common part of recursive traversals,
making transformations nicer to write, to read and to compose.
Even in the functional programming world, recursion schemes are not so
well-known. Like monads, they are usually presented in Haskell with frightening
words like zygohistomorphic prepromorphisms. It’s a pity because
recursion schemes can be simple, useful and practical. I’d even argue that
in Rust, the most interesting part is perhaps the representation technique, more
than the traversal, despite the latter being the original and the usual
motivation for using recursion schemes.
In this post, we’ll work through a concrete example to introduce recursion
schemes and what they can do. We’ll point to a more real life example of how we
use them in the implementation of the Nickel configuration language,
and we’ll discuss the pros and cons of using recursion schemes in the particular
context of Rust.
(In)flexible representations
Let’s say you’re writing a JSON parser library. You’ll need to expose a type
representing JSON values. For the sake of argument, let’s assume that you
support an extension of the JSON language with pairs, so you can write {"foo": ("hello","world")}. Here’s a natural representation:
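Something along these lines (a sketch; the exact payload types are stand-ins):

use std::collections::HashMap;

pub enum JsonValue {
    Null,
    Bool(bool),
    Number(i64),
    String(String),
    Pair(Box<JsonValue>, Box<JsonValue>),
    Array(Vec<JsonValue>),
    Object(HashMap<String, JsonValue>),
}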
This data structure is recursive: JSON values can contain other JSON values. We
thus have to use Box (or any other indirection) around recursive occurrences
of JsonValue. Otherwise, this enum would have an infinite size (except for
Array and Object, since Vec and HashMap add their own indirection, but
that’s more luck than design).
Now, user requestor asks that your parser add location information to the
output, because they validate some user-provided configuration and would like to
point to specific items on error. This is a reasonable
request which is sadly very hard
to satisfy in the serde ecosystem. Anyway, our parser isn’t interfacing
with serde, so we can add span information:
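Concretely, with names invented for the sketch: a Span type, a struct layer carrying it, and the same enum layer now recursing through the struct:

pub struct Span {
    pub start: usize,
    pub end: usize,
}

pub struct Spanned {
    pub data: JsonData,
    pub span: Span,
}

pub enum JsonData {
    Null,
    Bool(bool),
    Number(i64),
    String(String),
    Pair(Box<Spanned>, Box<Spanned>),
    Array(Vec<Spanned>),
    Object(HashMap<String, Spanned>),
}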
You can go different ways about this. We could have added a second argument to
each constructor of the enum, such as in String(String, Span), to avoid the
additional Spanned layer, but that would be a lot of repetition. We could also
have moved the Box to data: Box<JsonData>. Still, the general idea is that we now have
two layers:
a struct layer gathering the JSON data and the span together;
the original enum layer, the core of JSON, which is almost unchanged.
So far, so good. But user conservator is now complaining that you’ve ruined
their performance. They’re using JSON as a machine exchange format and don’t
care about position information. Could you restore the old representation, along
with a way to produce it that ignores spans?
Unfortunately, we had to change JsonValue. Copy-pasting the original
JsonValue enum under a different name is possible, but it’s unsatisfying, as
we now have multiple copies to maintain. It also doesn’t scale. Besides adding
position information, you might want to have a value representation that uses
Rc instead of Box, because you’re going to need to keep references to
arbitrary nodes during some complex transformation.
The functorial representation
The recursion schemes pattern has two components: a representation technique and
a transformation technique. I believe the representation part is particularly
interesting for Rust, so let’s start with that.
We’ll try to make our JSON representation more generic, to accommodate the
different variations that we mentioned in the previous section. The fundamental
idea is to replace the recursive occurrences of JsonValue within itself,
Box<JsonValue> (or JsonValue for Array and Object), by a generic
parameter T. Doing so, we’re defining just one layer of a JSON tree where
recursive children can be anything, not necessarily JSON values (we use the F
suffix for that generic version because it’s technically a functor, but that
doesn’t really matter).
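In code, the idea looks something like this (same payload types as before; only the recursive positions change):

pub enum JsonValueF<T> {
    Null,
    Bool(bool),
    Number(i64),
    String(String),
    Pair(T, T),
    Array(Vec<T>),
    Object(HashMap<String, T>),
}

It is instructive to consider the instantiation JsonValueF<()>.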
This is precisely a single node of a JSON tree, that is either a leaf or a
marker of a node with children but without actually including them.
If we set T = Box<JsonValueF<T>>, we get back the original JsonValue.
But wait, you can’t define the generic parameter T to be something which
depends on T itself! In fact we can, but we need to introduce an extra
indirection:
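A sketch of the knot-tying wrapper:

pub struct JsonValue {
    pub data: JsonValueF<Box<JsonValue>>,
}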
The price to pay is an additional struct layer, so you need to match on
value.data, and wrap new values as JsonValue { data: JsonValueF::Number(0) }. Note that this layer doesn’t have any cost at run-time.
Another difference is that we now box the values in Array and Object,
which isn’t needed. For now I’ll just ignore that, but you could take a second
generic parameter U to represent the occurrences of T that don’t need an
indirection if this really matters to you.
If we extend our intermediate layer a bit, we can get SpannedJsonValue!
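For instance, sketching the span-carrying, Rc-based and arena-based variants (names invented):

pub struct SpannedJsonValue {
    pub data: JsonValueF<Box<SpannedJsonValue>>,
    pub span: Span,
}

pub struct RcJsonValue {
    pub data: JsonValueF<std::rc::Rc<RcJsonValue>>,
}

pub struct ArenaJsonValue<'a> {
    pub data: JsonValueF<&'a ArenaJsonValue<'a>>,
}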
This idea of putting a self-referential type within JsonValueF is referred to
as tying the knot. The power of this approach is that you can keep the core
JsonValueF type unchanged. This applies to any tree-like recursive structure.
Some methods can be implemented only once on JsonValueF for any T, say
is_string or is_number. With additional trait constraints on T, we can
write more involved functions, still operating on the generic functor
representation.
Let’s now see how to traverse our JSON values.
Traversals
The strong point of recursion schemes is to provide an interface for traversing
recursive structures that let you focus on what the function actually does,
which is otherwise mixed with how the recursion is done. The idea is to use
generic combinators which factor out the plumbing of recursive traversals.
Let’s count the number of String nodes in a JSON value, the naive way.
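A direct recursive version might look like this (a sketch; note that it takes its argument by value, which we’ll come back to):

fn count_strings(value: JsonValue) -> u32 {
    match value.data {
        JsonValueF::String(_) => 1,
        JsonValueF::Pair(x, y) => count_strings(*x) + count_strings(*y),
        JsonValueF::Array(xs) => xs.into_iter().map(|x| count_strings(*x)).sum(),
        JsonValueF::Object(kvs) => kvs.into_values().map(|v| count_strings(*v)).sum(),
        _ => 0,
    }
}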
We’ll see how to write this function in the style of recursion schemes. First, we need to
define one core combinator: map.
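Here is a sketch of map for our JSON functor; notice that only the child positions are touched:

impl<T> JsonValueF<T> {
    pub fn map<U>(self, mut f: impl FnMut(T) -> U) -> JsonValueF<U> {
        match self {
            JsonValueF::Null => JsonValueF::Null,
            JsonValueF::Bool(b) => JsonValueF::Bool(b),
            JsonValueF::Number(n) => JsonValueF::Number(n),
            JsonValueF::String(s) => JsonValueF::String(s),
            // The recursive positions are exactly where f is applied.
            JsonValueF::Pair(x, y) => JsonValueF::Pair(f(x), f(y)),
            JsonValueF::Array(xs) => JsonValueF::Array(xs.into_iter().map(&mut f).collect()),
            JsonValueF::Object(kvs) => {
                JsonValueF::Object(kvs.into_iter().map(|(k, v)| (k, f(v))).collect())
            }
        }
    }
}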
map takes a JsonValueF<T> and a function f from T to U, and returns a
JsonValueF<U>. That is, map takes a JSON layer where all the direct children
(the recursive occurrences in our full type) are of some type T and applies
f to transform them to something of type U. This is the secret sauce for
defining traversals.
map isn’t specific to JsonValueF. It can be defined mechanically for any
functor representation (e.g. through a macro) of a data structure.
Note that there’s no recursion in sight: there can’t be, because T and U are
entirely generic and could very well be (), but we saw that JsonValueF<()>
is a single node. map only operates at the current layer.
The trick is that f can use map itself. Let’s see how to use it for
count_strings:
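With map in hand, a sketch of the new version:

fn count_strings(value: JsonValue) -> u32 {
    // The recursive call lives entirely inside the map.
    let counted: JsonValueF<u32> = value.data.map(|child| count_strings(*child));
    match counted {
        JsonValueF::String(_) => 1,
        JsonValueF::Pair(x, y) => x + y,
        JsonValueF::Array(xs) => xs.into_iter().sum(),
        JsonValueF::Object(kvs) => kvs.into_values().sum(),
        _ => 0,
    }
}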
If you look closely, there’s no more recursion in the body of the pattern
matching. It’s factored out in the map call. Let’s break down this example:
map, given a function from T to U, promises you that it can transform
the direct children of type T in JsonValueF<T> to U, providing
JsonValueF<U>. We use it immediately with a recursive call to
count_strings, which can indeed transform the direct children from a
Box<JsonValue> to a u32. If the children have children themselves,
count_strings will do that recursively as its first action, down to the
leaves.
Once we’ve reduced potential children of deeper layers to u32s, we get a
JsonValueF<u32>. We sum its content at the current layer.
There is a catch though: our count_strings function takes an owned argument,
which consumes the original JSON value. I’ll come back to that later.
While I find the second version of count_strings a little cleaner, the
difference between the two isn’t really astonishing.
As a more compelling example, let’s define a generic bottom-up traversal
function on JsonValue. This traversal is able to map — that is to rewrite —
nodes (more exactly entire subtrees). map_bottom_up takes a generic
transformation f and applies this function to every subtree starting from the
leaves. You could use such a function to apply program transformations
or optimizations on an abstract syntax tree.
impl JsonValue {
    // f is taken by &mut so the same closure can be reused across recursive calls.
    pub fn map_bottom_up(self, f: &mut impl FnMut(JsonValue) -> JsonValue) -> JsonValue {
        let data = self.data.map(|v| Box::new(v.map_bottom_up(&mut *f)));
        f(JsonValue { data })
    }
}
This example is quite remarkable: it’s almost a one-liner and there is no
pattern matching at all! Once again, the structural recursion is entirely
factored out in the map function. We implemented map_bottom_up on
JsonValue directly, but with some trait constraints on T, we can write a
more generic version on JsonValueF that works on both the Box-ed and Rc-ed
versions (the arena one is more tricky, as it requires an explicit allocator).
This example is only scratching the surface.
Mapping is just one example: other common traversals are folds (known as
catamorphisms in the recursion schemes jargon), which generalize the well-known
Iterator::fold from sequences to trees. In fact, count_strings would make
more sense as a fold, but we’ll leave that for another time.
Are recursion schemes useful in Rust?
Haskell has a number of features that make recursion schemes particularly nice
to use and to compose, not the least of which is garbage collection. You don’t
have to think about ownership; it’s references all the way down. Recursive data
structures are easy to express.
On the other side, there is Rust, which culturally doesn’t like recursive
functions that much, for good and bad reasons2. Sometimes, though, recursion
is hard to avoid, especially on tree-like data structures.
An important issue is that our count_strings consumes its argument, which is
unacceptable in practice. It is possible to write a version of map that takes
a value by reference, and thus similarly for count_strings, but it’s not
entirely straightforward nor free. You can find a by-reference version and more
explanations in our associated repository. At any rate, you can always
write specific traversals manually without resorting to the recursion schemes
way if needed. It’s not an all-or-nothing approach.
In fact, even if you don’t use map at all, the functor representation alone is
quite useful.
How we use recursion schemes in Nickel
In the implementation of the Nickel configuration language, we use the functor
representation for the abstract syntax tree of a static type.
Here are the stages we went through:
In the parser and most of the Nickel pipeline, we used to have a simple
Box-based, owned representation, akin to JsonValue.
However, during type inference, the Nickel typechecker needs to handle new
type constructions, in particular unification
variables. Those are as-of-yet unknown types, similar
to unknowns in an algebraic equation. Extending the base representation is
readily done as for SpannedJsonValue:
pub enum UnifType {
    Concrete(Box<TypeF<UnifType>>),
    /// A unification variable.
    UnifVar(VarId),
    // .. rigid type variables, etc.
}
More recently, we’ve split the historical, all-powerful unique representation
of expressions (including Nickel types) into two intermediate ones. The new
initial representation is arena-allocated, which makes it natural to use bare
references as the recursive indirection instead of allocating in the heap
through e.g. Box. This is easy with recursion schemes: that is precisely
the ArenaJsonValue example. For a smooth transition, we need to temporarily
keep the old Box-ed Type representation in parts of the codebase, but
having different representations co-exist is a basic feature of recursion
schemes.
We typically use map-based traversals to substitute type variables (that is,
Nickel generic types, like our T in Rust) with concrete types, and for similar
rewriting operations. We have variants of the core map function that can also
thread mutable state, raise errors, or both. Traversals by reference are
implemented manually, with a plain recursive function.
On the downside, type and core function definitions can be a bit verbose and
tricky to get right. For example, Nickel’s TypeF has sub-components that themselves
contain types, leading to 4 generic parameters. There are multiple possibilities
for Box placement in particular; only some of them are correct, and they are
subtly different. Though once you’ve defined a new variant, this complexity is
mostly hidden from the consumers of your API. It can still manifest as terrible
Rust type errors sometimes if, God forbid, you’ve put a Box at the wrong
place.
Conclusion
We’ve introduced recursion schemes, a design pattern for representing and
traversing recursive data structures. While the traversal part isn’t as good a
fit as in purer functional languages like Haskell, it can still be useful in
Rust. The representation part is particularly relevant, making it easy to define
variations on a recursive data structure with different ownership models or
metadata. We’ve shown how we use recursion schemes in Nickel, and while there
are performance and complexity trade-offs to consider, they can bring value for
moderately complex tree types that need to be extended and transformed in
various ways.
Rust allocates on the stack by default, which makes it easier to overflow
(though the stack can be configured to be larger at compile time). However,
I have the impression that there’s a misleading idea that recursive
functions perform poorly. For tree transformations at least, the iterative
version is usually harder to write and can require explicitly representing
the context on the heap through an auxiliary data structure such as a
zipper, which is likely to perform worse. The stack can overflow, and
(recursive) function calls aren’t entirely free either, but in terms of
allocation, deallocation and locality, the stack is also hard to beat!↩
Do you use an LLM for coding? Do you maintain a personal benchmark based on problems you have posed the LLM? The purpose of this blog post is to convince you that you should do this: that you can do so with marginal effort on top of your day-to-day vibe coding and that you will get both short- and long-term benefits from making your own personal benchmark exist.
I started thinking about benchmarks for coding in part out of frustration with the discourse around LLMs in the public squares I frequent (Reddit and Twitter). People often want to know "what's the best model" or "what's the best coding IDE"? One might imagine that the way to answer this question would be to test the models on a variety of problems from real world uses of the LLM for coding, and then compare how well various systems do on this. Indeed, whenever a new SOTA model releases, the lab will usually tell you about the model's performance against a few well known coding benchmarks. Problem solved?
Of course not! In fact, for the most part, no one really talks about benchmarks when comparing models. Why? I argue the most popular benchmarks measure tasks that are largely different from what a user wants out of an LLM. For example, take the recent Gemini 2.5 Pro release. In their headline table, they test against LiveCodeBench, Aider Polyglot and SWE-bench Verified. Both LiveCodeBench and Aider Polyglot derive their problems from contest programming and pedagogical exercises (respectively), while SWE-bench assesses bug fixes to preexisting codebases. While useful, this is only a small slice of the things people want to do with LLMs.
Wouldn't it be great if you had your own, personal benchmark, based on problems you actually care about? If you are tweaking your .cursorrules, you could run your benchmark to see if a change you made helped or not. When a new model comes out, you could spend a few bucks to run your eval and make a decision if you should switch your daily driver. And then on social media, if you wanted to stan the new model, instead of asking the model to drop a ball inside a rotating hexagon or vagueposting about how the new model is incredible, you could just post your benchmark results.
It's a collection of nearly 100 tests I've extracted from my actual conversation history with various LLMs.
There are two defining features of this benchmark that make it interesting. Most importantly, I've implemented a simple dataflow domain specific language to make it easy for me (or anyone else!) to add new tests that realistically evaluate model capabilities. This DSL allows for specifying both how the question should be asked and also how the answer should be evaluated. Most questions are evaluated by actually running the code the model writes but the framework supports a bunch of other evaluation methods as well. And then, directly as a result of this, I've written nearly 100 tests for different situations I've actually encountered when working with LLMs as assistants.
I have been working on my own benchmark based off of Carlini's benchmark, and I can confirm that this works well for the traditional style of coding eval, where you have a one-shot task that generates and executes the code against some test cases. My basic strategy is to vibe code as usual, but whenever I give an LLM a task that it isn't able to one shot, I consider adding it to the benchmark. In more detail:
I only add a task if a SOTA LLM failed it. This ensures the benchmark consists of all appropriate difficulty problems: easy enough that I thought an LLM should be able to do it, but hard enough that a SOTA model failed on it. I don't need problems that are too hard (this is already well covered by well known benchmarks like SWE-Bench or SWE-Lancer), and I don't mind if my problems saturate because, hey, that means the models are that much better for my use cases!
After I have added the task to the benchmark, I can use the benchmark runner to tell if changing the model, tweaking the prompt, or even just running the prompt again at nonzero temperature can make it pass. Indeed, it's helpful to find some configuration that makes the eval pass, as this is good for debugging issues in the evaluation function itself... also it means you have working code for whatever task you were working on. Conversely, you can make the task harder by leaving things out from the prompt.
Writing the test is the labor intensive part, but you can always vibe code a test. Importantly, you have a failing implementation (your initial generation) and some way you (manually?) determined that the implementation was wrong, so just turn this into your evaluation function! (And for all you yak shaving aficionados, if the model fails to vibe code your test, well, you have another task for your benchmark!)
For example, the other day I needed to take an asciinema recording and convert it into a sequence of frames rendered as plain text. However, the only project for doing these conversions was agg, which converts recordings into animated gifs. In
agg_to_text, I ask an LLM to take agg's source code and create a new program which dumps the frames as plain text rather than gif images. This task is difficult because there is some discretion in deciding when to emit a frame, and with my original prompt the LLM didn't precisely replicate the original behavior in agg. While working on the benchmark, I realized that instructing the model specifically about how frame batching worked was enough to get it to preserve the original behavior. But I don't think I should need to do this: thus this task. (P.S. If this test saturates, well, I can always make it harder by removing the agg source code from the prompt.)
The ability to benchmark one shot tasks is here today, but I would like to speculate a bit about what lies beyond them. In particular, most of my LLM coding activity involves asking the LLM to make changes to a pre-existing project, which makes it less amenable to "single prompt creates self contained program". (Also, I usually only ask one-shot questions that the LLM can answer, so most of them would never go in my benchmark.)
In short, how can I extract tasks from my day-to-day work? There seems to be two big extra levers we have:
Codebase tasks. This is the heavy-weight approach: you record the Git commit of your codebase at the time you prompted for some new feature to be added, and then when you want to run an eval on a new model you just check out the codebase at that commit and let the end-to-end system go. You'll typically want to execute the modified code, which means you'll also need a way to reliably setup the runtime environment for the code; things like lockfiles can help a lot here.
Transcript tasks. You don't actually need the entire codebase to be available to ask an LLM for a completion; you only need the conversation transcript up to the point of the critical generation. If the transcript is mostly your agent system reading in files for context, you can end up with a relatively system generic prompt that can tell you something about other systems. Of course, if you want to actually run the change, you still need the full codebase, which is why this approach is much more amenable if you're going to do some static analysis on the output. For example, if a model keeps adding try: ... except: ... blocks that are suppressing errors, you can take some transcripts where you've caught the model red-handed doing this and make an eval that checks if the model is still doing this. I suspect testing on transcripts works best for testing if changing prompts or rules improves performance, since the transcript itself will put the model into some particular latent space and if it were a different model they might have made different choices leading to a different latent space. Transcripts from thinking models are especially susceptible to this!
I have started adapting Carlini's framework to work better for these cases, although I would love to be told someone has already solved this problem for me. In particular, I am very excited about using transcript tasks to evaluate whether or not things I add to my prompts / triggered rules are helping or not. Current SOTA model instruction following isn't great and I regularly catch models doing behaviors that I explicitly told them not to in the system prompt. I have started some initial analysis over all of my chat logs to find cases where the model misbehaved, although I haven't quite worked out how I want to build an eval out of it.
One word of warning: to make transcript tasks, you need an AI coding system that doesn't obscure how it assembles its underlying prompts (which rules out most of the popular closed source AI code editors.)
I started building evals for a selfish reason: I wanted to be able to tell if modifications to my prompts were doing anything. But I also think there is a broader opportunity that arises if we also publish these benchmarks to the world.
For one, building a real world benchmark on use cases we care about is a way to communicate to the people training AI models whether or not they are doing well or not. Historical evals have focused on LeetCoding, and consequently we have models that would ace any big tech interview and yet on real world tasks will drive you off a cliff at the first opportunity. And this is not just free labor for the top labs: if you believe in open source models, one of the biggest barriers to good small models is having really high quality data. We, the OSS vibe coding community, can directly help here.
I think there is a tremendous opportunity for the open source community to really push the state of the art in coding evaluations. There's only so many benchmarks that I, personally, can create, but if everyone is making benchmarks I could eventually imagine a universe of benchmarks where you could curate the problems that are relevant to your work and quickly and cheaply judge models in this way: a Wikipedia of Coding Benchmarks.
To summarize: every time an LLM fails to solve a problem you give it, that is a potential new benchmark. As long as there is a way to automate testing if the LLM has solved the problem, you can turn this into a benchmark. Do this for yourself, and you can quickly have a personal benchmark with which to evaluate new models. Do this at scale, and you can help push the frontier in coding models.
Haskell is the world’s best programming language1, but let’s
face the harsh reality that a lot of times in life you’ll have to write in other
programming languages. But alas, you have been fully Haskell-brained and
lost all ability to program unless it is type-directed; you don’t even know how
to start writing a program without imagining its shape as a type first.
Well, fear not. The theory behind Algebraic Data Types and
Generalized Algebraic Data Types (ADTs and GADTs) is so fundamental that
they’ll fit (somewhat) seamlessly into whatever language you’re forced to write.
After all, if they can fit profunctor
optics in Microsoft’s Java code, the sky’s the limit!
This is an “April Fools” joke in the tradition of my previous
one, in that some of the ways we are going to twist these other languages
might seem unconventional or possibly ill-advised… but also the title is
definitely a lie: these languages definitely should have them! :D
Normal ADTs
As a reminder, Algebraic Data Types (ADTs) are products and sums; that’s why
they’re algebraic, after all!
Product Types
Products are just immutable structs, which pretty much every language
supports — as long as you’re able to make sure they are never mutated.
This is much simpler in languages where you can associate functions with
data, like OOP and classes. For example, this is the common “value object”
pattern in java (roughly related to the java bean2):
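A minimal sketch of such a value object (the Customer type and its fields are made up for illustration):

public final class Customer {
    private final String name;
    private final int age;

    public Customer(String name, int age) {
        this.name = name;
        this.age = age;
    }

    // Read-only accessors; no setters, so the product stays immutable.
    public String getName() { return name; }
    public int getAge() { return age; }
}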
In this case, not only are these ADTs (algebraic data types), they’re also
ADTs (abstract data types): you are meant to work with them
based on a pre-defined abstract interface based on type algebra, instead of
their internal representations.
Sum Types
If your language doesn’t support sum types, usually the way to go is with the
visitor pattern: the underlying implementation is hidden, and the only
way to process a sum type value is by providing handlers for every branch — a
pattern match as a function, essentially. Your sum values then basically
determine which handler is called.
For example, we can implement it for a network address type that can either
be IPv4 or IPv6. Here we are using C++ just for generics and lambdas with
closures, for simplicity, but we’ll discuss how this might look in C later.
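A minimal sketch, with all names invented for illustration:

#include <algorithm>
#include <array>
#include <cstdint>
#include <functional>

// One handler per branch; all handlers must agree on the result type R.
template <typename R>
struct IPAddressVisitor {
    std::function<R(const std::array<std::uint8_t, 4>&)> visitV4;
    std::function<R(const std::array<std::uint8_t, 16>&)> visitV6;
};

class IPAddress {
    bool isV6;
    std::array<std::uint8_t, 16> bytes;  // only the first 4 are used for IPv4

    IPAddress() : isV6(false), bytes{} {}

public:
    static IPAddress v4(const std::array<std::uint8_t, 4>& b) {
        IPAddress a;
        std::copy(b.begin(), b.end(), a.bytes.begin());
        return a;
    }

    static IPAddress v6(const std::array<std::uint8_t, 16>& b) {
        IPAddress a;
        a.isV6 = true;
        a.bytes = b;
        return a;
    }

    // The only way to consume an IPAddress: supply a handler for every branch.
    template <typename R>
    R accept(const IPAddressVisitor<R>& v) const {
        if (isV6) return v.visitV6(bytes);
        std::array<std::uint8_t, 4> four{bytes[0], bytes[1], bytes[2], bytes[3]};
        return v.visitV4(four);
    }
};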
Note that in this way, the compiler enforces that we handle every branch.
And, if we ever add a new branch, everything that ever consumes
IPAddress with an IPAddressVisitor will have to add a
new handler.
In a language without generics or powerful enough polymorphism, it’s
difficult to enforce the “pure” visitor pattern because you can’t ensure that
all branches return the same type.
One common pattern is to have an “effectful” visitor pattern, where the point
isn’t to return something, but to execute something on the payload of
the present branch. This is pretty effective for languages like C, javascript,
python, etc. where types aren’t really a rigid thing.
For example, this might be how you treat an “implicit nullable”:
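A sketch of the visitor itself, matching the usage just below:

// Call onNothing when the value is absent, onJust with the payload otherwise.
function visitMaybe(onNothing, onJust, maybeValue) {
  if (maybeValue === null || maybeValue === undefined) {
    return onNothing();
  } else {
    return onJust(maybeValue);
  }
}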
This is basically for_ from Haskell: You can do something like
conditionally launch some action if the value is present.
visitMaybe(
  () => console.log("Nothing to request"),
  (reqPayload) => makeRequest("google.com", reqPayload),
  maybeRequest
);
On a simpler note, if your language has subtyping built in (maybe with classes
and subclasses) or some other form of dynamic dispatch, you can implement it in
terms of that, which is nice in python, java, C++, etc.
interface ExprVisitor<R> {
    R visitLit(int value);
    R visitNegate(Expr unary);
    R visitAdd(Expr left, Expr right);
    R visitMul(Expr left, Expr right);
}

abstract class Expr {
    public abstract <R> R accept(ExprVisitor<R> visitor);
}
Alternatively, if you’re in a language where lambdas are easy, then instead of
tupling up the visitor, you could just have accept itself take a
number of arguments corresponding to each constructor:
// Alternative definition without an explicit Visitor
abstract class Expr {
    public abstract <R> R accept(
        Function<Integer, R> visitLit,
        Function<Expr, R> visitNegate,
        BiFunction<Expr, Expr, R> visitAdd,
        BiFunction<Expr, Expr, R> visitMul
    );
}
(Note that C++ doesn’t allow template virtual methods — not because it’s not
possible within the language semantics and syntax, but rather because the
maintainers are too lazy to add it — so doing this faithfully requires a bit
more creativity)
Now, if your language has dynamic dispatch or subclass polymorphism, you can
actually do a different encoding, instead of the tagged union. This will work in
languages that don’t allow or fully support naked union types, too. In this
method, each constructor becomes a class, but it’s important to only
allow access using accept to properly enforce the sum type
pattern.
class Lit extends Expr {
    private final int value;

    public Lit(int value) { this.value = value; }

    @Override
    public <R> R accept(ExprVisitor<R> visitor) {
        return visitor.visitLit(value);
    }
}

class Negate extends Expr {
    private final Expr unary;

    public Negate(Expr unary) { this.unary = unary; }

    @Override
    public <R> R accept(ExprVisitor<R> visitor) {
        return visitor.visitNegate(unary);
    }
}

class Add extends Expr {
    private final Expr left;
    private final Expr right;

    public Add(Expr left, Expr right) { this.left = left; this.right = right; }

    @Override
    public <R> R accept(ExprVisitor<R> visitor) {
        return visitor.visitAdd(left, right);
    }
}

class Mul extends Expr {
    private final Expr left;
    private final Expr right;

    public Mul(Expr left, Expr right) { this.left = left; this.right = right; }

    @Override
    public <R> R accept(ExprVisitor<R> visitor) {
        return visitor.visitMul(left, right);
    }
}
(But, just wanted to note that if you are working in java,
you can actually do something with sealed classes, which allows exhaustiveness
checking for its native switch/case statements.)
Alternatively you could make all of the subclasses anonymous and expose them
as factory methods, if your language allows it:
abstract class Expr {
    public abstract <R> R accept(ExprVisitor<R> visitor);

    public static Expr lit(int value) {
        return new Expr() {
            @Override
            public <R> R accept(ExprVisitor<R> visitor) {
                return visitor.visitLit(value);
            }
        };
    }

    public static Expr negate(Expr unary) {
        return new Expr() {
            @Override
            public <R> R accept(ExprVisitor<R> visitor) {
                return visitor.visitNegate(unary);
            }
        };
    }

    public static Expr add(Expr left, Expr right) {
        return new Expr() {
            @Override
            public <R> R accept(ExprVisitor<R> visitor) {
                return visitor.visitAdd(left, right);
            }
        };
    }

    // ... etc
}
Passing around function references like this is actually pretty close to the
scott encoding of our data type — and for non-recursive types, it’s essentially
the church encoding.
Recursive Types
Speaking of recursive types…what if your language doesn’t allow recursive
data types? What if it doesn’t allow recursion at all, or what if recursively
generated values are just annoying to deal with? Just imagine writing that
Expr type in a language with explicit memory management, for
example. Or, what if you wanted a way to express your recursive types in a more
elegant and runtime-safe manner?
One thing you can do instead is have your visitor be in its “catamorphism”,
or church encoding. Instead of having the “visitor” take the recursive
sub-values, have it take the results of recursively applying
itself.
Let’s do this in dhall, one of the most famous non-recursive
languages. Dhall does have native sum types, so we won’t worry about
manually writing a visitor pattern. But it does not have recursive data
types.
Let’s define a type like:
data Expr = Lit Natural | Add Expr Expr | Mul Expr Expr
But we can’t define data types in dhall that refer to themselves. So instead,
we can define them in their “church encoding”: give what you would do with an
Expr to consume it, where the consumption function is given as if
it were recursively applied.
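In dhall, that might look something like this (a sketch; names invented):

let ExprF
    : Type -> Type
    = \(r : Type) ->
        { lit : Natural -> r
        , add : r -> r -> r
        , mul : r -> r -> r
        }

let Expr
    : Type
    = forall (r : Type) -> ExprF r -> r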
Note that ExprF r is essentially
ExprVisitor<R>, except instead of add being
Expr -> Expr -> r, it’s r -> r -> r: the
input values aren’t the expression, but rather the results of recursively
folding on the expression. In fact, our original non-recursive
ExprVisitor<R> (to be more precise, the
R accept(ExprVisitor<R>)) is often called the “scott
encoding”, as opposed to the recursive “church encoding” fold.
For value creation, you take the visitor and recursively apply:
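A sketch, continuing the invented names:

let lit
    : Natural -> Expr
    = \(n : Natural) -> \(r : Type) -> \(handler : ExprF r) -> handler.lit n

let add
    : Expr -> Expr -> Expr
    = \(x : Expr) -> \(y : Expr) -> \(r : Type) -> \(handler : ExprF r) ->
        handler.add (x r handler) (y r handler)

let mul
    : Expr -> Expr -> Expr
    = \(x : Expr) -> \(y : Expr) -> \(r : Type) -> \(handler : ExprF r) ->
        handler.mul (x r handler) (y r handler)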
And finally, using the data type involves providing the
handler to fold up from the bottom to top. Note that
add : \(left : Natural) -> \(right : Natural) -> left + right
already assumes that the handler has been applied to the sub-expressions, so you
get Naturals on both sides instead of Expr.
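For example, to evaluate (1 + 2) * 3 (still a sketch):

let example : Expr = mul (add (lit 1) (lit 2)) (lit 3)

let result
    : Natural
    = example
        Natural
        { lit = \(n : Natural) -> n
        , add = \(left : Natural) -> \(right : Natural) -> left + right
        , mul = \(left : Natural) -> \(right : Natural) -> left * right
        }

-- result = 9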
This pattern is useful even in languages with good datatype recursion, like
Haskell — it’s actually the recursion-schemes
refactoring of a recursive data type, and it can be useful to have it live
alongside your normal recursive types. I’ve written this blog
post talking about just how useful that is.
This pattern is pretty portable to other languages too, as long as you can
scrounge together something like Rank-N types:
interface ExprFold<R> {
    R foldLit(int value);
    R foldNegate(R unary);
    R foldAdd(R left, R right);
    R foldMul(R left, R right);
}

interface Expr {
    public abstract <R> R accept(ExprFold<R> fold);

    public static Expr lit(int value) {
        return new Expr() {
            @Override
            public <R> R accept(ExprFold<R> fold) {
                return fold.foldLit(value);
            }
        };
    }

    public static Expr negate(Expr unary) {
        return new Expr() {
            @Override
            public <R> R accept(ExprFold<R> fold) {
                return fold.foldNegate(unary.accept(fold));
            }
        };
    }

    // etc.
}
By “Rank-N types” here, I mean that your objects can generate polymorphic
functions: given an Expr, you could generate an
<R> R accept(ExprFold<R> fold) for any R,
and not something pre-determined or pre-chosen by your choice of representation
of Expr.
Generalized Algebraic Data Types
You’ve implemented ADTs in your language of choice, or you are currently in a
language with native ADTs. Life is good, right? Until that sneaky voice starts
whispering in your ear: “we need more type safety.” You resist that urge, maybe
even get a lot done without it, but eventually you are compelled to give in to
the warm yet harsh embrace of ultimate type safety. Now what?
Singletons and Witnesses
In Haskell, singletons are essentially enums used to associate a value with a
reifiable type. “Reifiable” here means that you can take the runtime value of a
singleton and use it to bring evidence to the type-level. I ran into a
real-world usage of this while writing https://coronavirus.jle.im/, a web-based data visualizer of
COVID-19 data (source here) in
purescript. I needed a singleton to represent scales for scatter plots
and to link them to the data that can be plotted. And, not only did it need to
be type-safe in purescript (which has ADTs but not GADTs), it had to be
type-safe in the javascript ffi as well.
Here’s how it might look in Haskell:
-- | Numeric types
data NType :: Type -> Type where
    NInt     :: NType Int
    NDouble  :: NType Double
    NPercent :: NType Percent

-- | Define a scale
data Scale :: Type -> Type where
    ScaleDate   :: Scale Date
    ScaleLinear :: Bool -> NType a -> Scale a  -- ^ whether to include zero in the axis or not
    ScaleLog    :: NType a -> Scale a
You’d then run it like this:
plot :: Scale a -> Scale b -> [(a, b)] -> Canvas
So, we have the type of the input tuples being determined by the
values you pass to plot:
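For example (hypothetical calls; myTimeSeries and myRatios are stand-in values):

plot ScaleDate (ScaleLinear True NDouble) myTimeSeries
    -- myTimeSeries :: [(Date, Double)]

plot (ScaleLog NInt) (ScaleLinear False NPercent) myRatios
    -- myRatios :: [(Int, Percent)]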
But let’s say we only had ADTs. And then we’re passing them down to a
javascript FFI which only has structs and functions. We could drop the
type-safety and instead error at runtime, but…no. Type unsafety is not
acceptable.
The fundamental ability we want to gain is that if we pattern match on
ScaleDate, then we know a has to be
Date. If we match on NInt, we know that a has to be Int.
For the sake of this example, we’re going to be implementing a simpler
function in purescript and in javascript: a function that takes a scale type and
a list of points, and prints the bounds. In Haskell, this looks like:
(Pretend the Percent type is just a newtype-wrapped
Float or something)
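A sketch of the Haskell version (AxisBounds and the instances for Percent are assumptions; compare with the purescript version further below):

import Text.Printf (printf)

data AxisBounds a = AxisBounds
    { minValue :: a
    , minLabel :: String
    , maxValue :: a
    , maxLabel :: String
    }

displayNumericAxis :: NType a -> [a] -> AxisBounds a
displayNumericAxis = \case
    -- Matching on each constructor is what teaches the type checker what a is.
    NInt -> \xs ->
      let xMin = minimum xs
          xMax = maximum xs
       in AxisBounds xMin (show xMin) xMax (show xMax)
    NDouble -> \xs ->
      let xMin = minimum xs
          xMax = maximum xs
       in AxisBounds xMin (printf "%.4f" xMin) xMax (printf "%.4f" xMax)
    NPercent -> \xs ->
      let showPercent (Percent p) = printf "%.1f%%" (p * 100)
          xMin = minimum xs
          xMax = maximum xs
       in AxisBounds xMin (showPercent xMin) xMax (showPercent xMax)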
There are at least two main approaches to do this. We’ll be discussing
runtime equality witnesses and Higher-Kinded Eliminators.
Runtime Witnesses and Coyoneda Embedding
The Yoneda Lemma
is one of the most powerful tools that Category Theory has yielded as a branch
of math, but its sibling coyoneda
is one of the most useful Haskell abstractions.
This doesn’t give you GADTs, but it’s a very lightweight way to “downgrade”
your GADTs into normal ADTs, which is appropriate if you don’t need the full
power.
The trick is this: if you have MyGADT a, and you know you are
going to be using it to produce values of type a, you can do a covariant
coyoneda transform.
For example, if you have this type representing potential data sources:
data Source :: Type -> Type where
    ByteSource   :: Handle   -> Source Word
    StringSource :: FilePath -> Source String

readByte :: Handle -> IO Word
readString :: FilePath -> IO String

readSource :: Source a -> IO a
readSource = \case
    ByteSource h    -> readByte h
    StringSource fp -> readString fp
You could instead turn Source into a non-GADT by making it a
normal parameterized ADT and adding an X -> a field, which is a
kind of CPS transformation:
data Source a
    = ByteSource   Handle   (Word -> a)
    | StringSource FilePath (String -> a)

byteSource :: Handle -> Source Word
byteSource h = ByteSource h id

stringSource :: FilePath -> Source String
stringSource fp = StringSource fp id

readSource :: Source a -> IO a
readSource = \case
    ByteSource h out    -> out <$> readByte h
    StringSource fp out -> out <$> readString fp
A nice benefit of this method is that Source can now have a
Functor instance, which the original GADT could not.
And, if MyGADT a is going to be consuming values of type a, you can do the contravariant
coyoneda transform:
data Sink a
    = ByteSink   Handle   (a -> Word)
    | StringSink FilePath (a -> String)
And, if you are going to be both consuming and producing values of type a, you
can do the invariant coyoneda transform:
data Interface a
    = ByteInterface   Handle   (Word -> a)   (a -> Word)
    | StringInterface FilePath (String -> a) (a -> String)
However, in practice, true equality involves being able to lift
under injective type constructors, and carrying every single
continuation is unwieldy. We can package them up together with a runtime
equality witness.
This is something we can put “inside” NInt such that, when we
pattern match on a NType a, the type system can be assured that
a is an Int.
You need some sort of data of type IsEq a b with functions:
refl  :: IsEq a a
to    :: IsEq a b -> a -> b
sym   :: IsEq a b -> IsEq b a
trans :: IsEq a b -> IsEq b c -> IsEq a c
inj   :: IsEq (f a) (f b) -> IsEq a b
If you have to and sym, you also get
from :: IsEq a b -> b -> a.
From all of this, we can recover our original
IsEq a Word -> Word -> a and
IsEq a Word -> a -> Word functions, saving us from having to
carry two separate functions around.
Your language of choice might already have this IsEq. But one of
the more interesting ways to me is Leibniz equality (discussed a lot in this
Ryan Scott post), which works in languages with higher-kinded polymorphism.
Leibniz equality in languages with higher-kinded polymorphism means that
a and b are equal if
forall p. p a -> p b: any property of a is also
true of b.
In Haskell, we write this like:
newtype Leibniz a b = Leibniz (forall p. p a -> p b)

refl :: Leibniz a a
refl = Leibniz id
The only possible way to construct a ‘Leibniz’ is with both type parameters
being the same: You can only ever create a value of type
Leibniz a a, never a value of Leibniz a b where
b is not a.
You can prove that this is actually equality by writing functions
Leibniz a b -> Leibniz b a and
Leibniz a b -> Leibniz b c -> Leibniz a c (this
Ryan Scott post goes over it well), but in practice we realize this equality
by safely coercing a and b back and forth:
newtype Identity a = Identity { runIdentity :: a }

to :: Leibniz a b -> a -> b
to (Leibniz f) = runIdentity . f . Identity

newtype Op a b = Op { getOp :: b -> a }

from :: Leibniz a b -> b -> a
from (Leibniz f) = getOp (f (Op id))
So, if your language supports higher-kinded Rank-2 types, you have a
solution!
There are other solutions in other languages, but they will usually all be
language-dependent.
Let’s write everything in purescript. The key difference is we use
map (to isNumber) :: Array a -> Array Number, etc., to turn our
Array into one whose element type we know.
import Text.Printf

newtype Leibniz a b = Leibniz (forall p. p a -> p b)

to :: Leibniz a b -> a -> b
from :: Leibniz a b -> b -> a

data NType a
  = NInt (Leibniz a Int)
  | NNumber (Leibniz a Number)
  | NPercent (Leibniz a Percent)

type AxisBounds a =
  { minValue :: a
  , minLabel :: String
  , maxValue :: a
  , maxLabel :: String
  }

displayNumericAxis :: NType a -> Array a -> AxisBounds a
displayNumericAxis = case _ of
  NInt isInt -> \xs ->
    let xMin = minimum $ map (to isInt) xs
        xMax = maximum $ map (to isInt) xs
        showInt = show
     in { minValue: xMin
        , minLabel: showInt xMin
        , maxValue: xMax
        , maxLabel: showInt xMax
        }
  NNumber isNumber -> \xs ->
    let xMin = minimum $ map (to isNumber) xs
        xMax = maximum $ map (to isNumber) xs
        showFloat = printf (Proxy :: Proxy "%.4f") -- it works a little differently
     in { minValue: xMin
        , minLabel: showFloat xMin
        , maxValue: xMax
        , maxLabel: showFloat xMax
        }
  NPercent isPercent -> \xs ->
    let xMin = minimum $ map (to isPercent) xs
        xMax = maximum $ map (to isPercent) xs
        showPercent = printf (Proxy :: Proxy "%.1f%%") <<< (_ * 100.0)
     in { minValue: xMin
        , minLabel: showPercent xMin
        , maxValue: xMax
        , maxLabel: showPercent xMax
        }
To work with our [a] as if it were [Int], we have
to map over it the coercion function that our Leibniz a Int gave
us. Admittedly, this naive way adds a runtime cost of copying the array. But we
could be more creative with finding the minimum and maximum in this way in
constant space and no extra allocations.
And, if we wanted to outsource this to the javascript FFI, remember that
javascript doesn’t quite have sum types, so we can create a quick visitor:
type NVisitor a r =
  { nvInt     :: Leibniz a Int -> r
  , nvNumber  :: Leibniz a Number -> r
  , nvPercent :: Leibniz a Percent -> r
  }

type NAccept a = forall r. NVisitor a r -> r

toAccept :: NType a -> NAccept a
toAccept = case _ of
  NInt isInt         -> \nv -> nv.nvInt isInt
  NNumber isNumber   -> \nv -> nv.nvNumber isNumber
  NPercent isPercent -> \nv -> nv.nvPercent isPercent

foreign import _formatNumeric :: forall a. Fn2 (NAccept a) a String

formatNumeric :: NType a -> a -> String
formatNumeric nt = runFn2 _formatNumeric (toAccept nt)
Admittedly in the javascript we are throwing away the “GADT type safety”
because we throw away the equality. But we take what we can — we at least retain
the visitor pattern for sum-type type safety and exhaustiveness checking. I
haven’t done this in typescript yet so there might be a way to formalize Leibniz
equality to do this in typescript and keep the whole chain type-safe from top to
bottom.
Higher-Kinded Eliminators
This is essentially the higher-kinded version of the visitor pattern, except
in dependent type theory these visitors are more often called “eliminators” or
destructors, which is definitely a cooler name.
In the normal visitor you’d have:
data User = TheAdmin | Member Int

data UserHandler r = UH
  { uhTheAdmin :: r
  , uhMember :: Int -> r
  }
But note that if you have the right set of continuations, you have something
that is essentially equal to User without having to actually use
User:
type User' = forall r. UserHandler r -> r

fromUser :: User -> User'
fromUser = \case
  TheAdmin -> \UH{..} -> uhTheAdmin
  Member userId -> \UH{..} -> uhMember userId

toUser :: User' -> User
toUser f = f $ UH { uhTheAdmin = TheAdmin, uhMember = Member }
This means that User is actually equivalent to
forall r. UserHandler r -> r: they’re the same type, so if your
language doesn’t have sum types, you could encode it as
forall r. UserHandler r -> r instead. Visitors, baby.
But, then, what actually does the r type variable represent
here, semantically? Well, in a UserHandler r, r is the
“target” that we interpret into. But there’s a deeper relationship between
r and User: A UserHandler r essentially
“embeds” a User into an r. And, a
UserHandler r -> r is the application of that embedding to an
actual User.
If we pick r ~ (), then UserHandler () embeds
User into (). If we pick r ~ String, then
UserHandler String embeds User into String
(like, “showing” it). And if we pick r ~ User, a
UserHandler User embeds a User into…itself?
So here, r is essentially the projection that we view the user
through. And by making sure we are forall r. UserHandler r -> r
for all r, we ensure that we do not lose any information:
the embedding is completely 1-to-1. It lets you “create” the User
faithfully in a “polymorphic” way.
In fact, to hammer this home, some people like to use the name of the type as
the type variable: UserHandler user:
-- | The same thing as before but with things renamed to prove a point
data MakeUser user = MakeUser
  { uhTheAdmin :: user
  , uhMember :: Int -> user
  }

type User' = forall user. MakeUser user -> user
The forall user. lets us faithfully “create” a User
within the system we have, without actually having a User data
type. Essentially we can imagine the r in the forall r
as “standing in” for User, even if that type doesn’t actually
exist.
Now, here’s the breakthrough: If we can use forall (r :: Type)
to substitute for User :: Type, how about we use a
forall (p :: Type -> Type) to substitute for a
Scale :: Type -> Type?
data Scale :: Type -> Type where
  ScaleDate :: Scale Date
  ScaleLinear :: Bool -> NType a -> Scale a
  ScaleLog :: NType a -> Scale a

data ScaleHandler p a = SH
  { shDate :: p Date
  , shLinear :: Bool -> NType a -> p a
  , shLog :: NType a -> p a
  }

type Scale' a = forall p. ScaleHandler p a -> p a

fromScale :: Scale a -> Scale' a
fromScale = \case
  ScaleDate -> \SH{..} -> shDate
  ScaleLinear hasZero nt -> \SH{..} -> shLinear hasZero nt
  ScaleLog nt -> \SH{..} -> shLog nt

toScale :: Scale' a -> Scale a
toScale f = f $ SH { shDate = ScaleDate, shLinear = ScaleLinear, shLog = ScaleLog }
So in our new system, forall p. ScaleHandler p a -> p a is
identical to Scale: we can use p a to substitute in
Scale in our language even if our language itself cannot support
GADTs.
So let’s write formatNType in purescript. We no longer have an
actual Scale sum type, but its higher-kinded church encoding:
type NType a = forall p.
  { int :: p Int
  , number :: p Number
  , percent :: p Percent
  } -> p a

type Scale a = forall p.
  { date :: p Date
  , linear :: Bool -> NType a -> p a
  , log :: NType a -> p a
  } -> p a

ntInt :: NType Int
ntInt nth = nth.int

ntNumber :: NType Number
ntNumber nth = nth.number

ntPercent :: NType Percent
ntPercent nth = nth.percent

formatNType :: NType a -> a -> String
formatNType nt = f
  where
  Op f = nt
    { int: Op show
    , number: Op $ printf (Proxy :: Proxy "%.4f")
    , percent: Op $ printf (Proxy :: Proxy "%.1f%%") <<< (_ * 100.0)
    }
Here we are using
newtype Op b a = Op (a -> b)
as our “target”: turning an NType a into an
Op String a. And an Op String a is an
a -> String, which is what we wanted! The int field
is Op String Int, the number field is
Op String Number, etc.
In many languages, using this technique effectively requires having a newtype
wrapper on-hand, so it might be unwieldy in non-trivial situations. For example,
if we wanted to write our previous axis function which is
NType a -> [a] -> String, we’d have to have a newtype wrapper
for [a] -> String that has a as its argument:
newtype OpList b a = OpList ([a] -> b)
or you could re-use Compose:
newtype Compose f g a = Compose (f (g a))
and your p projection type would be Compose (Op String) [].
So, you don’t necessarily have to write a bespoke newtype wrapper, but you do
have to devote some brain cycles to think it through (unless you’re in a
language that doesn’t need newtype wrappers to have this work, like we’ll
discuss later).
By the way, this method generalizes well to multiple arguments: if you have a
type like MyGADT a b c, you just need to project into a
forall (p :: k1 -> k2 -> k3 -> Type).
I believe I have read somewhere that the two methods discussed here (runtime
equality witness vs. higher-kinded eliminator) are not actually fully identical
in their power, and there are GADTs where one would work and not the other … but
I can’t remember where I read this and I’m also not big-brained enough to figure
out what those situations are. But if you, reader, have any idea, please let me
know!
Existential Types
Let’s take a quick break to talk about something that’s not
technically related to GADTs but is often used alongside them.
What if we wanted to store a value with its NType and hide the
type variable? In Haskell we’d write this like:
data NType :: Type -> Type where
  NInt :: NType Int
  NDouble :: NType Double
  NPercent :: NType Percent

data SomeNType = forall a. SomeNType (NType a) a

formatNType :: NType a -> a -> String
formatNType nt x = ...

formatSomeNType :: SomeNType -> String
formatSomeNType (SomeNType nt x) = formatNType nt x

myFavoriteNumbers :: [SomeNType]
myFavoriteNumbers = [SomeNType NInt 3, SomeNType NDouble pi]
But what if our language doesn’t have existentials? Remember, this is
basically a value SomeNType that isn’t generic (it has no type
parameter), but contains both an NType a and an a for the
same type variable a.
One strategy we have available is to CPS-transform our existentials into
their CPS form (continuation-passing style form). Basically, we write exactly
what we want to do with our contents if we pattern matched on them.
It’s essentially a Rank-N visitor pattern with only a single constructor:
type SomeNType = forall r. (forall a. NType a -> a -> r) -> r

someNType :: NType a -> a -> SomeNType
someNType nt x = \f -> f nt x

formatSomeNumeric :: SomeNType -> String
formatSomeNumeric snt = snt \nt x -> formatNumeric nt x
You can imagine, syntactically, that snt acts as its “own”
pattern match, except instead of matching on
SomeNType nt x -> .., you “match” on
\nt x -> ..
This general pattern works for languages with traditional generics like Java
too:
interface SomeNTypeVisitor<R> {
    <A> R visit(NType<A> nt, A val);
}

interface SomeNType {
    public abstract <R> R accept(SomeNTypeVisitor<R> visitor);

    // One option: the factory method
    public static <A> SomeNType someNType(NType<A> nt, A val) {
        return new SomeNType() {
            @Override
            public <R> R accept(SomeNTypeVisitor<R> visitor) {
                return visitor.visit(nt, val);
            }
        };
    }
}

// Second option: the subtype hiding a type variable, which you have to always
// make sure to upcast into `SomeNType` after creating
class SomeNTypeImpl<A> implements SomeNType {
    private NType<A> nt;
    private A val;

    public SomeNTypeImpl(NType<A> nt, A val) {
        this.nt = nt;
        this.val = val;
    }

    @Override
    public <R> R accept(SomeNTypeVisitor<R> visitor) {
        return visitor.visit(nt, val);
    }
}
Does…anyone write java like this? I tried committing this once while at
Google and I got automatically flagged to be put on a PIP.
Recursive GADTs
The climax of this discussion: what if your language does not support GADTs
or recursive data types?
We’re going to be using dhall as an example again, but note that the
lessons applied here are potentially useful even when you do have
recursive types: we’re going to be talking about a higher-kinded church
encoding, which can be a useful form of your data types that live alongside your
normal recursive ones.
Let’s imagine Expr as a GADT, where Expr a
represents an Expr that evaluates to an a:
data Expr :: Type -> Type where
  NatLit :: Natural -> Expr Natural
  BoolLit :: Bool -> Expr Bool
  Add :: Expr Natural -> Expr Natural -> Expr Natural
  LTE :: Expr Natural -> Expr Natural -> Expr Bool
  Ternary :: Expr Bool -> Expr a -> Expr a -> Expr a

eval :: Expr a -> a
eval = \case
  NatLit n -> n
  BoolLit b -> b
  Add x y -> eval x + eval y
  LTE a b -> eval a <= eval b
  Ternary b x y -> if eval b then eval x else eval y
Adding this type variable ensures that our Expr is type-safe:
it’s impossible to Add an Expr Bool, and the two
branches of a Ternary must have the same result type, etc. And, we
can write eval :: Expr a -> a and know exactly what type will be
returned.
Now, let’s combine the two concepts: First, the church encoding, where our
handlers take the “final result” of our fold r instead of the
recursive value Expr. Second, the higher-kinded eliminator pattern
where we embed Expr :: Type -> Type into
forall (p :: Type -> Type).
Again, now instead of add taking Expr, it takes
p Natural: the “Natural result of the fold”.
p not only stands in for what we embed Expr into, it
stands in for the result of the recursive fold. That’s why in eval,
the first arguments of add are the Natural results of
the sub-evaluation.
These values can be created in the same way as before, merging the two
techniques and sending the handlers downstream, as the ternary example
below shows.
If all of this is difficult to parse, try reviewing both the recursive ADT
section and the higher-kinded eliminator section and making sure you understand
both well before tackling this, which combines them together!
Admittedly in Haskell (and purescript) this is a lot simpler because we don’t
have to explicitly pass in type variables:
data ExprF p = ExprF
  { natLit :: Natural -> p Natural
  , boolLit :: Bool -> p Bool
  , add :: p Natural -> p Natural -> p Natural
  , ternary :: forall a. p Bool -> p a -> p a -> p a
  }

type Expr a = forall p. ExprF p -> p a

eval :: Expr a -> a
eval e = runIdentity $ e ExprF
  { natLit = Identity
  , boolLit = Identity
  , add = \(Identity x) -> \(Identity y) -> Identity (x + y)
  , ternary = \(Identity b) -> \(Identity x) -> \(Identity y) -> if b then x else y
  }

ternary :: Expr Bool -> Expr a -> Expr a -> Expr a
ternary b x y handlers = handlers.ternary (b handlers) (x handlers) (y handlers)
But one nice thing about the dhall version that’s incidental to dhall is that
it doesn’t require any extra newtype wrappers like the Haskell one does. That’s
because type inference tends to choke on things like this, but dhall doesn’t
really have any type inference: all of the types are passed explicitly. It’s one
of the facts about dhall that make it nice for things like this.
Congratulations
In any case, if you’ve made it this far, congratulations! You are a master of
ADTs and GADTs. Admittedly every language is different, and some of these
solutions have to be tweaked for the language in question. And, if your program
gets very complicated, there is a good chance that things will become
ergonomically unfeasible.
But I hope, at least, that this inspires your imagination to try to bring
your haskell principles, techniques, standards, practices, and brainrot into the
language of your choice (or language you are forced to work with).
And, if you ever find interesting ways to bring these things into a language
not discussed here (or a new interesting technique or pattern), I would
absolutely love to hear about it!
Until next time, happy “Haskelling”!
Special Thanks
I am very humbled to be supported by an amazing community, who make it
possible for me to devote time to researching and writing these posts. Very
special thanks to my supporter at the “Amazing” level on patreon, Josh Vera! :)
I bet you thought there was going to be some sort of caveat in this
footnote, didn’t you?↩︎
I didn’t think I’d ever write “java bean” non-ironically on my
blog, but there’s a first time for everything.↩︎
Be aware that this implementation is not necessarily
appropriately lazy or short-circuiting in Ternary: it might
evaluate both sides before returning the chosen branch.↩︎
To visit a tree or graph in breadth-first order, there are two main
implementation approaches: queue-based or level-based.
Our goal here is to develop a level-based approach where the levels of
the breadth-first walk are constructed compositionally and dynamically.
Compositionality means that for every node, its descendants—the other nodes
reachable from it—are defined by composing the descendants of its children.
Dynamism means that the children of a node are generated only when that node
is visited; we will see that this requirement corresponds to asking for a
monadic unfold.
A prior solution, using the Phases applicative functor,
is compositional but not dynamic in that sense. The essence of Phases
is a zipping operation in free applicative functors.
What if we did zipping in free monads instead?
A breadth-first walk explores the tree level by level; every level contains the
nodes at the same distance from the root. The list of levels of a tree can be defined
recursively—it is a fold. For a tree Node x l r, the first level contains
just the root node x, and the subsequent levels are obtained by appending the
levels of the subtrees l and r pairwise.
(We can’t just use zipWith because it throws away the end of a list when the
other list is empty.)
Finally, we concatenate the levels together to obtain the list of nodes in
breadth-first order.
toListBF :: Tree a -> [a]
toListBF = concat . levels
Thanks to laziness, the list will indeed be produced by walking the tree in
breadth-first order.
So far so good.
The above function lets us fold a tree in breadth-first order.
The next level of difficulty is to traverse a tree, producing a tree
with the same shape as the original tree, only with modified labels.
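Its signature, using the name traverseBF that the rest of the post refers to:

traverseBF :: Applicative m => (a -> m b) -> Tree a -> m (Tree b)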
This has the exact same type as traverse, which you might obtain with
deriving (Foldable, Traversable). The stock-derived Traversable—enabled
by the DeriveTraversable extension—is a depth-first traversal, but the laws
of traverse don’t specify the order in which nodes should be visited,
so you could make it a breadth-first traversal if you wanted.
“Breadth-first numbering” is a special case of “breadth-first traversal”
where the arrow (a -> m b) is specialized to a counter.
Okasaki presents a “numbering” solution based on queues and another solution
based on levels.
Both are easily adaptable to the more general “traversal” problem as we will
soon see.
There is a wonderful Discourse thread from 2024 on the topic of
breadth-first traversals.
The first post gives an elegant breadth-first numbering algorithm
which also appears in the appendix of Okasaki’s paper,
but sadly it does not generalize from “numbering” to
“traversal” beyond the special case m = State s.
Last but not least, another level-based solution to the breadth-first traversal
problem can be found in the
tree-traversals library by Noah Easterly.
It is built around an applicative transformer named Phases,
which is a list of actions—imagine the type “[m _]”—where each
element m _ represents one level of the tree.
The Phases applicative enables a compositional definition of a
breadth-first traversal, similarly to the levels function above:
the set of nodes reachable from the root is defined by combining the sets of
nodes reachable from its children. This concern of compositionality
is one of the main motivations behind this post.
Non-standard terminology
The broad family of algorithms being discussed is typically called
“breadth-first search” (BFS) or “breadth-first traversal”,
but in general these algorithms are not “searching” for anything,
and in Haskell, “traversal” is reserved for “things like traverse”.
Instead, this post will use “walks” as a term encompassing folds, traversals,
unfolds, or any concept that can be qualified with “breadth-first”.
Problem statement: Breadth-first unfolds
Both the fold toListBF and the traversal traverseBF have in common that they
receive a tree as an input. This explicit tree makes the notion of levels
“static”. With unfolds, we will have to deal with levels that exist only
“dynamically” as the result of unfolding the tree progressively.
To introduce the unfolding of a tree, it is convenient to introduce its “base
functor”. We modify the tree type by replacing the recursive tree fields with
an extra type parameter:
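A sketch of that base functor:

data TreeF a t = LeafF | NodeF a t t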
An unfold generates a tree from a seed and a
function which expands the seed into a leaf or a node containing more seeds.
A pure unfold—or anamorphism—can be defined readily:
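A minimal version:

unfold :: (s -> TreeF a s) -> s -> Tree a
unfold f s = case f s of
  LeafF -> Leaf
  NodeF a l r -> Node a (unfold f l) (unfold f r)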
The order in which nodes are evaluated depends on
how the resulting tree is consumed. Hence unfold
is neither inherently “depth-first” nor “breadth-first”.
The situation changes if we make the unfold monadic.
unfoldM :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
An implementation of unfoldM must decide upon an ordering between actions.
To see why adding an M to unfold imposes an ordering,
contemplate the fact that these expressions have the same meaning:
Node a (unfold f l) (unfold f r)
= ( let tl = unfold f l in
let tr = unfold f r in
Node a tl tr )
= ( let tr = unfold f r in
let tl = unfold f l in
Node a tl tr )
whereas these monadic expressions do not have the same meaning in general:
( unfoldM f l >>= \tl ->
unfoldM f r >>= \tr ->
pure (Node a tl tr) )
/=
( unfoldM f r >>= \tr ->
unfoldM f l >>= \tl ->
pure (Node a tl tr) )
Without further requirements, there is an “obvious” definition of unfoldM,
which is a depth-first unfold:
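A sketch of that depth-first definition (the name unfoldM_DF is mine, to distinguish it from the breadth-first versions below):

unfoldM_DF :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_DF f s = f s >>= \t -> case t of
  LeafF -> pure Leaf
  NodeF a l r -> do
    tl <- unfoldM_DF f l
    tr <- unfoldM_DF f r
    pure (Node a tl tr)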
We unfold the left subtree l fully before unfolding the right one r.
The problem is to define a breadth-first unfoldM.
If you want to think about this problem on your own, you can stop reading here.
The rest of this post presents solutions.
Queue-based unfold
The two breadth-first numbering algorithms in Okasaki’s paper can
actually be generalized to breadth-first unfolds.
Here is the first one that uses queues (using the function (<+) for “push” and
pattern-matching on (:>) for “pop”):
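As a rough sketch of the same idea, here is a two-phase version using Data.Sequence instead of the bespoke (<+)/(:>) operators (the phase structure and the helper names forward and backward are my own):

import Data.Sequence (Seq (..), (|>), (<|))
import qualified Data.Sequence as Seq

unfoldM_BF_Q :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_BF_Q f s0 = do
  es <- forward (Seq.singleton s0) []
  case foldl backward Empty es of
    t :<| _ -> pure t
    Empty -> error "unreachable"
  where
    -- Phase 1: expand seeds in breadth-first order, recording each
    -- expansion (the accumulator ends up in reverse breadth-first order).
    forward Empty acc = pure acc
    forward (s :<| q) acc = do
      e <- f s
      case e of
        LeafF -> forward q (e : acc)
        NodeF _ l r -> forward (q |> l |> r) (e : acc)
    -- Phase 2: rebuild trees from the expansions in reverse order, popping
    -- finished subtrees from the back and pushing new ones at the front.
    backward ts LeafF = Leaf <| ts
    backward (ts :|> tl :|> tr) (NodeF a _ _) = Node a tl tr <| ts
    backward _ (NodeF _ _ _) = error "unreachable"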
If you’re frowning upon the use of error—as you should be—you can replace
error with dummy values here (Empty, Leaf), but
(1) that won’t be possible with tree structures that must be non-empty
(e.g., if Leaf contained a value) and (2) this is dead code, which
is harmless but no more elegant than making it obvious with error.
The correctness of this solution is also not quite obvious.
There are subtle ways to get this implementation wrong:
should the recursive call be b2 <+ b1 <+ q or b1 <+ b2 <+ q?
Should the pattern be p :> t1 :> t2 or p :> t2 :> t1?
For another version of this challenge, try implementing the unfold for another
tree type, such as finger trees or rose trees, without getting lost in the
order of pushes and pops (by the way, this is Data.Tree.unfoldTreeM_BF in
containers). The invariant is not complex but there is room for mistakes.
I believe that the compositional approach that will be presented later is more
robust on that front, although it is admittedly a subjective quality for which
it is difficult to make a strong case.
Some uses of unfolds
Traversals from unfolds
One sense in which unfoldM is a more difficult problem than traverse is
that we can use unfoldM to implement traverse.
We do have to make light of the technicality that there is a Monad constraint
instead of Applicative, which makes unfoldM not suited to implement the
Traversable class.
A depth-first unfold gives a depth-first traversal:
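For instance, with the tree itself as the seed (a sketch, reusing the unfoldM_DF above):

traverseDF :: Monad m => (a -> m b) -> Tree a -> m (Tree b)
traverseDF f = unfoldM_DF expand
  where
    -- Expanding a node runs the effect on its label and keeps
    -- the subtrees as the next seeds.
    expand Leaf = pure LeafF
    expand (Node a l r) = (\b -> NodeF b l r) <$> f a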
We can use a tree unfold to explore a graph.
This usage distinguishes unfolds from folds and traversals,
which only let you explore trees.
Given a type of vertices V, a directed graph is represented by a function
V -> F V, where F is a functor which describes the arity of each node.
The obvious choice for F is lists, but we will stick to TreeF here
so we can just reuse this post’s unfoldM implementations.
The TreeF functor restricts us to graphs where each node has zero or two
outgoing edges; it is a weird restriction, but we will make do for the sake of
example.
An ASCII drawing of a graph
+-------+
v |
+->1--->2--->3 |
| | | ^ |
| v v | |
| 4--->5--->6--+
| | | ^
| +----|----+
| |
+-------+
The graph drawn above turns into the following function, where every vertex
is mapped either to NodeF with the same vertex as the first argument followed
by its two adjacent vertices, or to LeafF if it has no outgoing edges or does
not belong to the graph.
If we simply feed that function to unfold, we will get the infinite tree
of all possible paths from a chosen starting vertex.
To obtain a finite tree, we want to keep track of vertices that we have
already visited, using a stateful memory. The following function wraps graph,
returning LeafF also if a vertex has already been visited.
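A sketch of that wrapper, assuming the omitted graph function has type Int -> TreeF Int Int:

import Control.Monad.State (State, gets, modify)
import qualified Data.Set as Set

visitGraph :: Int -> State (Set.Set Int) (TreeF Int Int)
visitGraph v = do
  seen <- gets (Set.member v)
  if seen
    then pure LeafF                 -- already visited: cut the walk here
    else do
      modify (Set.insert v)
      pure (graph v)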
Applying unfoldM_BF to that function produces a “breadth-first tree”
of the graph, an encoding of the trajectory of a breadth-first walk through the
graph. “Breadth-first trees” are a concept from graph theory with well-studied
properties.
-- Visit `graph` in breadth-first order
bfGraph_Q :: Int -> Tree Int
bfGraph_Q = (`evalState` Set.empty) . unfoldM_BF_Q visitGraph
This post is a compilable Literate Haskell file. You can run all of the tests
and benchmarks in here. The source repository provides the necessary
configuration to build it with cabal.
$ cabal build breadth-first-unfolds
Test cases can then be selected with the -p option and a pattern
(see the tasty documentation for details).
Run all tests and benchmarks by passing no option.
$ cabal exec breadth-first-unfolds -- -p "/Q-graph/||/S-graph/"
All
Q-graph: OK
S-graph: OK
“Global” level-based unfold
The other solution from Okasaki’s paper can also be adapted into a monadic unfold.
The starting point is to unfold a list of seeds [s] instead of a single seed:
we can traverse the list with the expansion function s -> m (TreeF a s) to
obtain another list of seeds, the next level of the breadth-first unfold,
and keep going.
Iterating this process naively yields a variant of monadic unfold without a
result. This no-result variant can be generalized from TreeF to
any foldable structure:
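A sketch of that no-result variant (the name unfoldM_BF_G_ matches its later uses):

import Data.Foldable (toList)

unfoldM_BF_G_ :: (Monad m, Foldable f) => (s -> m (f s)) -> s -> m ()
unfoldM_BF_G_ f = go . (: [])
  where
    go [] = pure ()
    -- Expand a whole level, then recurse on all the seeds it produced.
    go seeds = traverse f seeds >>= go . concatMap toList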
Modifying this solution to create the output tree requires a little more thought.
We must keep hold of the intermediate list ts :: [TreeF a s] to
reconstruct trees after the recursive call returns.
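A sketch of that tree-producing version:

unfoldM_BF_G :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_BF_G f s0 = do
  ts <- go [s0]
  case ts of
    [t] -> pure t
    _ -> error "unreachable"
  where
    go [] = pure []
    go seeds = do
      es <- traverse f seeds            -- expand one whole level
      subtrees <- go (concatMap next es)
      pure (reconstruct es subtrees)
    next LeafF = []
    next (NodeF _ l r) = [l, r]
    -- Rebuild this level's trees, consuming the next level's trees in order.
    reconstruct [] _ = []
    reconstruct (LeafF : es) subtrees = Leaf : reconstruct es subtrees
    reconstruct (NodeF a _ _ : es) (tl : tr : subtrees) =
      Node a tl tr : reconstruct es subtrees
    reconstruct _ _ = error "unreachable"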
This solution is less brittle than the queue-based solution because
we always traverse lists left-to-right.
To avoid the uses of error in reconstruct,
you can probably create a specialized data structure in place of [TreeF a s],
but that is finicky in its own way.
In search of compositionality
Both of the solutions above (the queue-based and the “monolithic” level-based unfolds)
stem from a global view of breadth-first walks: we are iterating on a list or a
queue which holds all the seeds from one or two levels at a time.
That structure represents a “front line” between visited and unvisited
vertices, and every iteration advances the front line a little: with a queue we
advance it one vertex at a time, with a list we advance the whole front line
in an inner loop—one call to traverse—before recursing.
The opposite local view of breadth-first order is exemplified by the earlier
levels function: it only produces a list of lists of the vertices
reachable from the current root. It does so recursively, by composing
together the vertices reachable from its children. Our goal here is to find a
similarly local, compositional implementation of breadth-first unfolds.
Rather than defining unfoldM directly, which sequences the computations on
all levels into a single computation, we will introduce an intermediate
function weave that keeps levels separate—just as toListBF is defined
using levels.
The result of weave will be in an as yet unknown applicative functor F m
depending on m.
And because levels are kept separate, weave only needs
a constraint Applicative m to compose computations on the same level.
The goal is to implement this signature, where the result type F is also an
unknown:
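weave :: Applicative m => (s -> m (TreeF a s)) -> s -> F m (Tree a)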
With only what we know so far, a bit of type-directed programming leads to the
following incomplete definition. We have constructed something of type
m (F m (Tree a)), while we expect F m (Tree a):
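A sketch of that incomplete definition, with the hole written _ (weaveF is the auxiliary function discussed below):

weave :: Applicative m => (s -> m (TreeF a s)) -> s -> F m (Tree a)
weave f s = _ (weaveF <$> f s)
  where
    weaveF LeafF = pure Leaf
    weaveF (NodeF a l r) = liftA2 (Node a) (weave f l) (weave f r)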
To fill the hole _, we postulate the following primitive, weft,
as part of the unknown definition of F:
weft::Applicativem=>m (Fma) ->Fma
Intuitively, F m represents “multi-level computations”.
The weft function constructs a multi-level (F m)-computation from
one level of m-computation which returns the subsequent levels
as an (F m)-computation.
We fill the hole with weft, completing the definition of weave:
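weave :: Applicative m => (s -> m (TreeF a s)) -> s -> F m (Tree a)
weave f s = weft (weaveF <$> f s)
  where
    weaveF LeafF = pure Leaf
    weaveF (NodeF a l r) = liftA2 (Node a) (weave f l) (weave f r)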
The function weave defines a multi-level computation which represents
a breadth-first walk from a seed s:
the first level of the walk is f s, expanding the initial seed;
the auxiliary function weaveF constructs the remaining levels from
the initial seed’s expansion:
if the seed expands to LeafF, there are no more seeds,
and we terminate with an empty computation (pure);
if the seed expands to NodeF, we obtain two sub-seeds l and r,
they generate their own weaves recursively (weave f l and weave f r),
and we compose them (liftA2).
One way to think about weft is as a generalization of the following primitives:
we can “embed” m-computations into F m,
and we can “delay” multi-level (F m)-computations, shifting the
m-computation on each level to the next level.
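In terms of weft, one can sketch these two primitives as follows:

embed :: Applicative m => m a -> F m a
embed u = weft (pure <$> u)   -- one level of m, then nothing

delay :: Applicative m => F m a -> F m a
delay w = weft (pure w)       -- a do-nothing level in front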
The key law relating these two operations is that embedded computations
and delayed computations commute with each other:
embed u *> delay v = delay v <* embed u
The embed and delay operations are provided by the Phases applicative
functor that I mentioned earlier, which enables breadth-first traversals,
but not breadth-first unfolds. Thus, weft is a strictly more expressive
primitive than embed and delay.
Eventually, we will run a multi-level computation as a single m-computation
so that we can use weave to define unfoldM. The runner function will be
called mesh:
mesh :: Monad m => F m a -> m a
It is characterized by this law which says that mesh executes the first
level of the computation u :: m (F m a), then executes the remaining levels
recursively:
mesh (weft u) = u >>= mesh
Putting everything together, weave and mesh combine into a breadth-first unfold:
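That is:

unfoldM_BF :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_BF f s = mesh (weave f s)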
It remains to find an applicative functor F equipped with weft and mesh.
The weave applicative
A basic approach to design a type is to make some of the operations it
should support into constructors. The weave applicative WeaveS has
constructors for pure and weft:
data WeaveS m a = EndS a | WeftS (m (WeaveS m a))
(The suffix “S” stands for Spoilers. Read on!)
We instantiate the unknown functor F with WeaveS.
type F = WeaveS
Astute readers will have recognized WeaveS as the free monad.
Just as Phases has the same type definition as the free applicative functor but
a different Applicative instance, we will give WeaveS an Applicative
instance that does not coincide with the Applicative and Monad instances of
the free monad.
Starting with the easy functions,
weft is WeftS, and the equation for mesh above is basically its definition.
We just need to add an equation for EndS.
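In code:

weft :: m (WeaveS m a) -> WeaveS m a
weft = WeftS

mesh :: Monad m => WeaveS m a -> m a
mesh (EndS a) = pure a
mesh (WeftS u) = u >>= mesh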
Recall that WeaveS represents multi-level computations.
Computations are composed level-wise with the following liftS2.
The interesting case is the one where both arguments are WeftS: we compose
the first level with liftA2, and the subsequent ones with liftS2
recursively.
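A sketch of liftS2, the Functor instance it relies on, and the Applicative instance it will power:

instance Functor m => Functor (WeaveS m) where
  fmap f (EndS a) = EndS (f a)
  fmap f (WeftS u) = WeftS (fmap f <$> u)

liftS2 :: Applicative m => (a -> b -> c) -> WeaveS m a -> WeaveS m b -> WeaveS m c
liftS2 f (EndS a) wb = f a <$> wb
liftS2 f wa (EndS b) = (\a -> f a b) <$> wa
liftS2 f (WeftS ua) (WeftS ub) = WeftS (liftA2 (liftS2 f) ua ub)

instance Applicative m => Applicative (WeaveS m) where
  pure = EndS
  liftA2 = liftS2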
liftS2 will be the liftA2 in WeaveS’s Applicative instance.
The Functor and Applicative instances show that WeaveS is an
applicative transformer: for every applicative functor m,
WeaveS m is also an applicative functor.
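A sketch of weaveS, specializing the earlier weave to WeaveS:

import Data.Functor ((<&>))

weaveS :: Applicative m => (s -> m (TreeF a s)) -> s -> m (WeaveS m (Tree a))
weaveS f s = f s <&> \case
  LeafF -> EndS Leaf
  NodeF a l r -> liftA2 (Node a) (WeftS (weaveS f l)) (WeftS (weaveS f r))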
The outer weft constructor was moved into the recursive calls.
The result type has an extra m, which makes it more apparent that
we always start with a call to f. It’s the same vibe as replacing the type
[a] with NonEmpty a when we know that a list will always have at least one
element; weaveS always produces at least one level of computation.
We also replace (<$>) with its flipped version (<&>) for aesthetic reasons:
we can apply it to a lambda without parentheses, and that change makes the
logic flow naturally from left to right: we first expand the seed s using
f, and continue depending on whether the expansion produced LeafF or NodeF.
To define unfoldM, instead of applying mesh directly, we chain it with
(>>=).
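That is (under the name unfoldM_BF_S, assumed here to match the “S” benchmark labels):

unfoldM_BF_S :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_BF_S f s = weaveS f s >>= mesh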
That solution is Obviously Correct™, but it has a terrible flaw:
it does not run in linear time!
We can demonstrate this by generating a “thin” tree whose height
is equal to its size.
The height h is the seed of the unfolding, and we generate a NodeF as long
as it is non-zero, asking for a decreased height h - 1 on the right,
and a zero height on the left.
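A sketch of that generator (thinTreeS is the name the later benchmark discussion uses; Identity makes the unfold pure):

import Data.Functor.Identity (Identity (..))

thinTreeS :: Int -> Tree ()
thinTreeS = runIdentity . unfoldM_BF_S (Identity . expand)
  where
    expand 0 = LeafF
    expand h = NodeF () 0 (h - 1)   -- zero height on the left, h - 1 on the right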
$ cabal exec breadth-first-unfolds -- -p "S-thin"
All
S-thin
1x: OK
27.6 μs ± 2.6 μs, 267 KB allocated, 317 B copied, 6.0 MB peak memory
10x: OK
2.90 ms ± 181 μs, 23 MB allocated, 178 KB copied, 7.0 MB peak memory, 105.35x
Multiplying the height by 10x makes the function run 100x slower.
Dramatically quadratic.
Complexity analysis
We can compare this implementation with levels from earlier, which is linear-time.
In particular, looking at zipLevels with liftS2—which play similar
roles—there is a crucial difference when one of the arguments is empty
([] or EndS):
zipLevels simply returns the other argument, whereas liftS2 calls (<$>),
continuing the recursion down the other argument.
So zipLevels stops working after reaching the end of either argument, whereas
liftS2 walks to the end of both arguments. There is at least one
call to liftS2 on every level which will walk to the bottom of the tree,
so we get a quadratic lower bound Ω(height2).
Out of sight, out of mind
The problematic combinators are fmap and liftS2, which weaveS uses to
construct the unfolded tree. If we don’t care about that tree—wanting only
the effect of a monadic unfold—then we can get rid of the complexity
associated with those combinators.
With no result to return, we remove the a type parameter from the definition
of WeaveS, yielding the oblivious (“O”) variant:
data WeaveO m = EndO | WeftO (m (WeaveO m))
We rewrite mesh into meshO, reducing a WeaveO m computation
into m () instead of m a.
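A sketch:

meshO :: Monad m => WeaveO m -> m ()
meshO EndO = pure ()
meshO (WeftO u) = u >>= meshO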
To implement a breadth-first walk, we modify weaveS above by replacing
liftA2 (Node a) with (<>). Note that the type parameter a is no longer in
the result. It was only used in the tree that we decided to forget.
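A sketch, composing levels with a Semigroup instance whose EndO cases simply return the other argument (this is what restores linearity, just like zipLevels):

instance Applicative m => Semigroup (WeaveO m) where
  EndO <> w = w
  w <> EndO = w
  WeftO ua <> WeftO ub = WeftO (liftA2 (<>) ua ub)

weaveO :: Applicative m => (s -> m (TreeF a s)) -> s -> m (WeaveO m)
weaveO f s = f s <&> \case
  LeafF -> EndO
  NodeF _ l r -> WeftO (weaveO f l) <> WeftO (weaveO f r)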
Running weaveO with meshO yields an oblivious monadic unfold:
it produces m () instead of m (Tree a).
(You may remember seeing another implementation of that same signature
just earlier, unfoldM_BF_G_.)
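Putting the two together:

unfoldM_BF_O_ :: Monad m => (s -> m (TreeF a s)) -> s -> m ()
unfoldM_BF_O_ f s = weaveO f s >>= meshO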
Previously, we benchmarked the function thinTreeS that outputs a tree by
forcing the tree. With an oblivious unfold, there is no tree to force.
Instead we will count the number of generated NodeF constructors:
thinTreeO :: Int -> Int
thinTreeO = (`execState` 0) . unfoldM_BF_O_ (state . f)
  where
    f 0 counter = (LeafF, counter)
    f h counter = (NodeF () 0 (h - 1), counter + 1) -- increment the counter for every NodeF
We adapt the benchmark from before to measure the complexity of
unfolding thin trees. We have to increase the baseline height from 100 to 500
because this benchmark runs so much faster than the previous ones.
$ cabal exec breadth-first-unfolds -- -p O-thin
All
O-thin
1x: OK
148 μs ± 8.3 μs, 543 KB allocated, 773 B copied, 6.0 MB peak memory
10x: OK
1.45 ms ± 113 μs, 5.4 MB allocated, 82 KB copied, 7.0 MB peak memory, 9.78x
The growth is linear, as desired:
the “10x” bench is 10x slower than the baseline “1x” bench.
Laziness for the win
The oblivious unfold avoided quadratic explosion by simplifying the problem.
Now let’s solve the original problem again,
so we can’t just get rid of fmap and liftA2.
As mentioned previously, the root cause was that (1) liftA2 calls fmap when
one of the constructors is EndS, and (2) fmap traverses the other argument.
The next solution will be to make fmap take constant time,
by storing the “mapped function” in the constructor.
Behold the “L” variant of WeaveS, which is a GADT:
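A sketch of the GADT and its constant-time fmap:

data WeaveL m a where
  EndL :: a -> WeaveL m a
  WeftL :: m (WeaveL m b) -> (b -> a) -> WeaveL m a

instance Functor (WeaveL m) where
  fmap f (EndL a) = EndL (f a)
  fmap f (WeftL u g) = WeftL u (f . g)   -- just compose; nothing is traversed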
The Applicative instance is… a good exercise for the reader.
The details are not immediately important—we only care about improving fmap
for now—we will come back to have a look at the Applicative instance soon.
The runner function meshL is a simple bit of type Tetris.
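A sketch of meshL, together with the corresponding weaveL and unfoldM_BF_L (names assumed; weaveL mirrors weaveS with id as the initial postponed function, and liftA2 here is WeaveL’s own, whose definition appears further below):

meshL :: Monad m => WeaveL m a -> m a
meshL (EndL a) = pure a
meshL (WeftL u g) = g <$> (u >>= meshL)   -- the accumulated fmap happens here

weaveL :: Applicative m => (s -> m (TreeF a s)) -> s -> m (WeaveL m (Tree a))
weaveL f s = f s <&> \case
  LeafF -> EndL Leaf
  NodeF a l r -> liftA2 (Node a) (WeftL (weaveL f l) id) (WeftL (weaveL f r) id)

unfoldM_BF_L :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_BF_L f s = weaveL f s >>= meshL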
$ cabal exec breadth-first-unfolds -- -p "L-thin"
All
L-thin
1x: OK
14.1 μs ± 782 ns, 59 KB allocated, 5 B copied, 6.0 MB peak memory
10x: OK
140 μs ± 13 μs, 586 KB allocated, 51 B copied, 6.0 MB peak memory, 9.93x
Lazy in more ways than one
As hinted by the “L” and “S” suffixes,
WeaveL is a “lazy” variant of WeaveS: fmap for WeaveL “postpones”
work by accumulating functions in the WeftL constructor.
That work is “forced” by meshL, which is where the fmap ((<$>)) of the
underlying monad m is called, performing the work accumulated
by possibly many calls to WeaveL’s fmap.
One subtlety is that there are multiple “lazinesses” at play.
The main benefit of using WeaveL is really to delay computation,
which is a kind of laziness, but WeaveL doesn’t need to be
implemented in a lazy language.
We can rewrite all of the code we’ve seen so far in a strict language
with minor changes, and we will still observe the quadratic vs linear behavior
of WeaveS vs WeaveL on thin trees.
The “manufactured laziness” of WeaveL is a concept independent of the
“ambient laziness” in Haskell.
Nevertheless, we can still find an interesting role for that “ambient laziness”
in this story. Indeed, the function weaveL also happens to be lazier than
weaveS in the usual sense.
A concrete test case is worth a thousand words. Consider the following
tree generator which keeps unfolding left subtrees while making
every right subtree undefined:
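A sketch of such a generator and the trees built from it (the bang pattern is my assumption: it makes expansion strict in its seed, so that expanding the undefined right seed is what blows up):

partialTreeF :: Int -> Identity (TreeF Int Int)
partialTreeF !n = Identity (NodeF n (n + 1) undefined)

partialTree :: Tree Int
partialTree = unfold (runIdentity . partialTreeF) 0

partialTreeS :: Tree Int
partialTreeS = runIdentity (unfoldM_BF_S partialTreeF 0)

partialTreeL :: Tree Int
partialTreeL = runIdentity (unfoldM_BF_L partialTreeF 0)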
whnfTreeS :: TestTree
whnfTreeS = expectFail $ testCase "S-whnf" $ do
  case partialTreeS of
    Node _ _ _ -> pure ()          -- Succeed
    Leaf -> error "unreachable"    -- definitely not a Leaf
As it turns out, this test using the “S” variant fails. (That’s
why the test is marked with expectFail.)
Forcing partialTreeS evaluates the undefined in partialTreeF.
Therefore partialTreeS is not equivalent to partialTree.
$ cabal exec breadth-first-unfolds -- -p "L-whnf"
All
L-whnf: OK
This difference can only be seen with “lazy monads”, where (>>=) is
lazy in its first argument.
(If this definition sounds not quite right, that’s probably because of seq.
It makes a precise definition of “lazy monad” more complicated.)
Examples of lazy monads from the transformers library
are Identity, Reader, lazy State, lazy Writer, and Accum.
The secret sauce is the definition of liftA2 for WeaveL:
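A sketch of that instance:

instance Applicative m => Applicative (WeaveL m) where
  pure = EndL
  liftA2 f (EndL a) wb = f a <$> wb
  liftA2 f wa (EndL b) = (\a -> f a b) <$> wa
  liftA2 f (WeftL ua g) (WeftL ub h) =
    WeftL (liftA2 (liftA2 (,)) ua ub) (\ ~(a, b) -> f (g a) (h b))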
In the third clause of liftA2, we put the function f in a lambda with a
lazy pattern (~(a, b)) directly under the topmost constructor WeftL.
Thus, we can access the result of f from the second field of WeftL
without looking at the first field. In liftS2 earlier, f was
passed as an argument to (liftA2 . liftS2), which forces us to run the
computation before we can get a hold on the result of f.
Maximizing laziness
The “L” variant of unfoldM is lazier than the “S” variant,
but there is still a gap between partialTreeL and the pure partialTree:
if we force not only the root, but also the left subtree of partialTreeL,
then we run into undefined again.
Although the unfold using WeaveL is lazier than using WeaveS,
it is not yet as lazy as it could be.
The reason is that, strictly speaking, WeaveL’s liftA2 is a strict function.
The expansion function partialTreeF produces a level with an undefined
sub-computation, which crashes the whole level.
Each level in a computation will be either completely defined or undefined.
To recap, we’ve been looking at three versions of the same tree: the pure
partialTree, the stricter partialTreeS, and the in-between partialTreeL.
It is natural to ask: can we define a breadth-first unfold that, when applied
to partialTreeF, will yield the same tree as partialTree?
More generally, the new problem is to define a breadth-first unfoldM
whose specialization with the Identity functor is equivalent to
the pure unfold even on partially-defined values. That is, it satisfies
the following equation:
unfold f = runIdentity . unfoldM (Identity . f)
Laziness without end
The strictness of liftA2 is caused by WeaveL having two constructors.
Let’s get rid of EndL.
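The naive attempt keeps only the Weft constructor and makes fmap match lazily (a sketch):

data WeaveE m a where
  WeftE :: m (WeaveE m b) -> (b -> a) -> WeaveE m a

instance Functor (WeaveE m) where
  fmap f ~(WeftE wa g) = WeftE wa (f . g)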
Wait a second. I spoke too fast; GHC gives us an error:
error: [GHC-87005]
• An existential or GADT data constructor cannot be used
inside a lazy (~) pattern
• In the pattern: WeftE wa g
In the pattern: ~(WeftE wa g)
In an equation for ‘fmap’: fmap f ~(WeftE wa g) = WeftE wa (f . g)
|
641 | > fmap f ~(WeftE wa g) = WeftE wa (f . g)
| ^^^^^^^^^^
The feature we need is “first-class existentials”,
for which there is an open GHC proposal.
Not letting that stop us, there is a simple version of first-class existentials
available in the package some,
as the module Data.Some.Newtype (internally using unsafeCoerce).
That will be sufficient for our purposes.
All we need is an abstract type Some and a pattern synonym:
-- imported from Data.Some.Newtype
data Some f
pattern Some :: f a -> Some f
And we’re back on track. Here comes the actual “E” (endless) variant:
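One plausible shape for it (the names are illustrative and the instances are elided here): keep a single Weft-like constructor, with the existential payload packed in Some.

data WeftE m a b = MkWeftE (m (WeaveE m b)) (b -> a)

newtype WeaveE m a = WeaveE (Some (WeftE m a))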
The endless WeaveE enables an even lazier implementation of unfoldM.
When specialized to the identity monad, it lets us force the resulting
tree in any order. The forceLeftTreeE test passes (unlike forceLeftTreeL).
$ cabal exec breadth-first-unfolds -- -p "E-left"
All
E-left: OK
One can also check that forcing the left spine of partialTreeE
arbitrarily deep throws no errors.
We made it lazy, but at what cost?
First, this “Endless” variant only works for lazy monads.
With a strict monad, the runner meshE will loop forever.
It is possible to run things more incrementally by pattern-matching on
WeaveE, but you’re better off using the oblivious WeaveO anyway.
Second, when you aren’t running into an unproductive loop, the “Endless” variant of
unfoldM has quadratic time complexity Ω(height²). The reason
is essentially the same as the “Strict” variant: liftA2 keeps looping even if
one argument is a pure weave—before, that was to traverse the other
non-pure argument, now, there isn’t even a way to tell when the computation
has ended.
Thus, every leaf may create work proportional to the height of the tree.
Running the same benchmark as before, we measure even more baffling timings.
Using the previous setup comparing a baseline and a 10x run, we see a more than
700x slowdown, so much worse than the 100x predicted by a quadratic model.
Interestingly, the raw output shows that the total cumulative allocations did
grow by a 100x factor.1
But it gets weirder with more data points: it does not follow a clear power law.
If Time(n) grew as n^c for some fixed exponent c, then the ratio
Time(M·n)/Time(n) would be M^c,
a constant that does not depend on n.
In the following benchmark, we keep doubling the height (M = 2) for every
test case, and we measure the time relative to the preceding case each time.
A quadratic model predicts a 4x slowdown at every step. Instead, we
observe wildly varying factors.
Benchmark output (each time factor is relative to the preceding line,
for example, the “4x” benchmark is 9.5x slower than the “2x” benchmark):
$ cabal exec breadth-first-unfolds -- -p "E-thin-more"
All
E-thin-more
1x: OK
222 μs ± 9.3 μs, 1.2 MB allocated, 13 KB copied, 6.0 MB peak memory
2x: OK
2.43 ms ± 85 μs, 4.8 MB allocated, 236 KB copied, 7.0 MB peak memory, 10.94x
4x: OK
23.1 ms ± 1.2 ms, 19 MB allocated, 2.7 MB copied, 10 MB peak memory, 9.53x
8x: OK
126 ms ± 7.8 ms, 76 MB allocated, 18 MB copied, 24 MB peak memory, 5.44x
16x: OK
181 ms ± 7.0 ms, 119 MB allocated, 30 MB copied, 24 MB peak memory, 1.44x
I believe this benchmark is triggering some pathological behavior in the garbage
collector. I modified tasty-bench with an option to measure CPU time without GC
(mutator time). At time of writing, tasty-bench is still waiting for a new release.
We can point Cabal to an unreleased commit of tasty-bench by adding the following
lines to cabal.project.local.
For the “2x” benchmarks, we are closer to the expected 4x slowdown, but there is
still a noticeable gap.
I’m going to chalk the rest to inherent measurement errors (the cost of
tasty-bench’s simplicity) exacerbated by the pathological GC behavior;
a possible explanation is that the pattern of memory usage becomes so bad that
it affects non-GC time.
Benchmark output (excluding GC time, each measurement is relative to the
preceding line):
Microbenchmarks: Queues vs Global Levels vs Weaves
So far we’ve focused on asymptotics (linear vs quadratic). Some readers
will inevitably wonder about real speed.
Among the linear-time algorithms—queues (“Q”), global levels (“G”),
and weaves (lazy “L” or oblivious “O”)—which one is faster?
tl;dr: Queues are (much) faster in these microbenchmarks (up to 25x!),
but keep in mind that these are all quite naive implementations.
There are two categories to measure separately: unfolds which produce trees,
and oblivious unfolds—which don’t produce trees. These microbenchmarks
construct full trees up to a chosen number of nodes. When there is an
output tree, we force it (using nf), otherwise we force a counter of the
number of nodes. We run on different sufficiently large sizes (500 and 5000)
to check the stability of the measured factors, ensuring that we are only
comparing the time components that dominate at scale.
The tables list times relative to the queue benchmark for each tree size.
I hope to have piqued your interest in breadth-first unfolds without
using queues.
To the best of my knowledge, this specific problem hasn’t been studied in the
literature. It is of course related to breadth-first traversals,
previously solved using the Phases applicative.2
The intersection of functional programming and breadth-first walks is a small
niche, which makes it quick to survey that corner of the world for any ideas
related to those presented here.
The paper Modular models of monoids with operations by Zhixuan Yang
and Nicolas Wu, in ICFP 2023, mentions a general construction of Phases as an
example application of their theory. Basically, Phases is defined by a
fixed-point equation:
Phases f = Day f Phases :+: Identity
We can express Phases abstractly as a least fixed-point
μx.f▫x + Id in any monoidal category with a suitable structure.
If we instantiate the monoidal product ▫ not with Day convolution,
but with functor composition (Compose), then we get Weave.
In another coincidence, the monad-coroutine package
implements a weave function which is a generalization of
liftS2—this may require some squinting.
While WeaveS as a data type coincides with the free monad Free,
monad-coroutine’s core data type Coroutine coincides
with the free monad transformer FreeT.
We can view Phases as a generalization of “zipping” from
lists to free applicatives—which are essentially lists of actions,
and Weave generalizes that further to free monads. To recap, the surprise was
that the naive data type of free monads results in a quadratic-time unfold.
That issue motivated a “lazy” variant3 which achieves a linear-time
breadth-first unfold. That in turn suggested an even “lazier” variant which
enables more control on evaluation order at the cost of efficiency.
I’ve just released the weave library which implements
the main ideas of this post. I don’t expect it to have many users, given
how much slower it is compared to queue-based solutions.
But I would be curious to find a use case for the new compositionality
afforded by this abstraction.
Recap table
                     Unfolds         Time         Laziness         Compositional
Phases*              No              linear†      by levels        Yes
Queue (Q)            Yes             linear†      strict           No
Global Levels (G)    Yes             linear†      by levels        No
Strict Weave (S)     Yes             quadratic‡   strict           Yes
Oblivious Weave (O)  Oblivious only  linear†      N/A              Yes
Lazy Weave (L)       Yes             linear†      by levels        Yes
Endless Weave (E)    Yes             quadratic‡E  maximally lazy◊  Yes

†Linear wrt. size: Θ(size).
‡Quadratic wrt. height: lower bound Ω(height²), upper bound O(height × size).
EThe “Endless” meshE only terminates with lazy monads.
*I guess there exists an “endless Phases” variant that would be quadratic and maximally lazy.
◊The definition of “maximally lazy” in this post actually misses a range of possible lazy behaviors with monads other than Identity. A further refinement seems to be another can of worms.
Note that tasty-bench also reports memory statistics
(allocated, copied, and peak memory) when certain RTS options are enabled,
which I’ve done by compiling the test executable with -with-rtsopts=-T.↩︎
The GHC developers are very pleased to announce the availability of
GHC 9.6.7. Binary distributions, source distributions, and
documentation are available on the
release page.
GHC 9.6.7 brings a number of fixes, including:
GHC’s internal Unique type has been widened to 64-bits on 32-bit
architectures, avoiding potential miscompilations on large projects.
Fix a runtime crash when using the compacting GC, caused by black
holes in large objects.
Added new flags -fspec-eval and -fspec-eval-dictfun to allow
switching off speculative evaluation.
The following libraries have been updated since GHC 9.6.6:
Note about Haskell Language Server and building GHC 9.8+:
The change of Unique to 64 bit (GHC#22010)
adds the exported symbol
ghc_unique_counter64 to the RTS. Unfortunately it’s impossible to
avoid this without breaking other things. If you encounter a linker
error related to ghc_unique_counter64 when building GHC (or building a
GHC-derived package like ghc-lib-parser) with GHC 9.6.7, you probably
have to add this fix
to the program you’re building.
We would like to thank GitHub, IOG, the Zw3rk stake pool, Well-Typed,
Tweag I/O, Serokell, Equinix, SimSpace, the Haskell Foundation, and
other anonymous contributors whose on-going financial and in-kind
support has facilitated GHC maintenance and release management over
the years. Finally, this release would not have been possible without
the hundreds of open-source contributors whose work comprise this
release.
Please give this release a try and open a ticket
if you see anything amiss.
A new release of Liquid Haskell is out after quite an active period of
development with 99 pull requests in the liquidhaskell repository, and
29 pull requests in the liquid-fixpoint repository from about ten contributors.
This post is to provide an overview of the changes that made it into the latest release.
There were contributions to the reflection and proof mechanisms; we got
contributions to the integration with GHC; the support of cvc5 was improved
when dealing with sets, bags, and maps; and there was a rather large overhaul
of the name resolution mechanism.
Reflection improvements
Liquid Haskell is a tool to verify Haskell programs. We can write formal
specifications inside special Haskell comments {-@ ... @-}, and the tool
will check whether the program behaves as specified. For instance, the following
specification of the filter function says that we expect all of the elements
in the result to satisfy the given predicate.
{-@ filter :: p:(a -> Bool) -> xs:[a] -> {v:[a] | all p v } @-}
Liquid Haskell would then analyze the implementation of filter to verify that
it does indeed yield elements that satisfy the predicate.
To verify such a specification, Liquid Haskell needs to attach a meaning to the
names in the predicate all p v. It readily learns that p is a parameter
of filter, and that v is the result. all, however, isn’t bound by the specification’s parameters, so it refers to whatever is in scope, which is the
Haskell function from the Prelude.
all :: (a -> Bool) -> [a] -> Bool
And Liquid Haskell has a mechanism to provide logic meaning to the implementation
of a function like all, known as reflection. While it has always been convenient to reflect functions in modules analyzed by Liquid Haskell, it was not so easy when there was a
mix of local and imported definitions from dependencies that are not analysed with
Liquid Haskell. Last year, there was an internship at Tweag to address exactly this
friction, which resulted in contributions to the
latest release.
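For instance, reflection is requested with a reflect annotation. A minimal sketch (all' is a hypothetical stand-in definition, to avoid clashing with the Prelude's all):

{-@ reflect all' @-}
all' :: (a -> Bool) -> [a] -> Bool
all' _ [] = True
all' p (x:xs) = p x && all' p xs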
Reasoning and reflection of lambdas
The reflection mechanism also has other specific limitations at the moment. For instance,
it doesn’t allow reflecting recursive functions defined in let or where bindings. And
until recently, it didn’t allow reflecting functions that contained anonymous functions.
For example,
takePositives = filter (\x -> x > 0)
In the latest release, we have several contributions that introduce support for reflecting lambdas and improve the story for reasoning with them.
This feature is considered experimental at the moment, since we will still have usability and
performance concerns that deserve further contributions, but one can
already explore the experience that we could expect in the long run.
Integration with GHC
In 2020 Liquid Haskell became a compiler plugin for GHC. It was hooked into the
end of the type checking phase firstly to ensure it only runs on well-typed programs,
and secondly, to ensure the plugin runs when GHC is only asked to typecheck the
module but not to generate code, which was helpful to IDEs.
For a few technical reasons, the plugin was re-parsing and re-typechecking the module
instead of using the abstract syntax tree (AST) that GHC handed to it as the result of
type checking. That is no longer the case in the latest release, where the AST after
type checking is now used for all purposes. In addition, there were several improvements
to how the ghc library is used.
cvc5 support
Liquid Haskell offloads part of its reasoning to a family of automated theorem
provers known as SMT solvers. For most developments, Liquid Haskell has been
used with the Z3 SMT solver, and this is what has been used most of the time in
continuous integration pipelines.
In theory, any SMT solver can be used with Liquid Haskell, if it provides a standard
interface known as SMT-LIB. In practice, however, experiments are done with
theories that are not part of the standard. For instance, the reasoning capabilities
for bags, sets, and maps used to require z3. But now the latest release implements
support for cvc5 as well.
Name resolution overhaul
Name resolution determines, for each name in a program, what is the definition that
it refers to. Liquid Haskell, in particular, is responsible for resolving names
that appear in specifications. This task was problematic when the programs
it was asked to verify spanned many modules.
There were multiple kinds of names, each with their own name resolution rules,
and names were resolved in different environments when verifying a module and
when importing it elsewhere, not always yielding the same results, which often
produced confusing errors.
Name resolution, however, was done all over the code base, and any attempt to
rationalize it would require a few months of effort. I started such an epic last
September, and managed to conclude it in February.
These changes made it into the latest release together with an awful lot of
side quests to simplify the existing code.
The road ahead
There is no coordinated roadmap for Liquid Haskell. Much of the contributions
that it receives depend on the opportunity enabled by academic research or
the needs of particular use cases.
On my side, I’m trying to improve the adoption of Liquid Haskell. Much of the challenge
is reducing the amount of common workarounds that the proficient Liquid Haskeller
needs to employ today. For instance, supporting reflection of functions in local bindings
would save the user the trouble of rewriting her programs to put the recursive functions
in the top level.
Repairing the support for type classes would allow functions to be verified
even if they use type classes, which is a large subset of Haskell today.
And without having defined a scope with precision yet, Liquid Haskell still needs to
improve its user documentation, its error messages, and its tracing and logging.
The project is chugging along, though. It is making significant leaps in usability. The
upgrade costs have been quantified for a few GHC releases, and
no longer look like an unbounded risk. The number of external contributions
increased last year, although we have yet to see whether that is a trend. And there is
no shortage of interest from academia and industrial interns.
Thanks to the many contributors for their work and their help during code
reviews. I look forward to learning what makes it into the coming Liquid Haskell releases!
On this episode of the Haskell Interlude, Andres Löh and Mike Sperber are joined by Farhad Mehta, a professor at OST Rapperswil, and one of the organizers of ZuriHac. Farhad tells us about formal methods, building tunnels, the importance of education, and the complicated relationship between academia and industry.
At work I sometimes need to deal with large and deep JSON objects where I'm only
interested in a few of the values. If all the interesting values are on the top
level, then aeson has functions that make it easy to implement FromJSON's
parseJSON (Constructors and accessors), but if the values are spread out then
the functions in aeson come up a bit short. That's when I reach for lens-aeson,
as lenses make it very easy to work with large structures. However, I've found
that using its lenses to implement parseJSON becomes a lot easier with a few
helper functions.
Many of the lenses produce results wrapped in Maybe, so the first function is
one that transforms a Maybe a to a Parser a. Here I make use of Parser
implementing MonadFail.
infixl 8 <!>

(<!>) :: (MonadFail m) => Maybe a -> String -> m a
(<!>) mv err = maybe (fail err) pure mv
In some code I wrote this week I used it to extract the user name out of a JWT
produced by Keycloak:
instance FromJSON OurClaimsSet where
  parseJSON = ... $ \o -> do
    cs <- parseJSON o
    n <- o ^? key "preferred_username" . _String <!> "preferred username missing"
    ...
    pure $ OurClaimsSet cs n ...
Also, all the lenses start with a Value and that makes the withX functions
in aeson to not be a perfect fit. So I define variations of the withX
functions, e.g.
withObjectV :: String -> (Value -> Parser a) -> Value -> Parser a
withObjectV s f = withObject s (f . Object)
That makes the full FromJSON instance for OurClaimsSet look like this
instance FromJSON OurClaimsSet where
  parseJSON = withObjectV "OurClaimsSet" $ \o -> do
    cs <- parseJSON o
    n <- o ^? key "preferred_username" . _String <!> "name"
    let rs = o ^.. key "resource_access" . members . key "roles" . _Array . traverse . _String
    pure $ OurClaimsSet cs n rs
The GHC developers are happy to announce the release of GHC 9.12.2.
Binary distributions, source distributions, and documentation are available at
downloads.haskell.org.
We hope to have this release available via ghcup shortly. This is a small
release fixing a critical code generation bug, #25653, affecting some subword
division operations.
As always, GHC’s release status, including planned future releases, can
be found on the GHC Wiki status.
We would like to thank IOG, the Zw3rk stake pool,
Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, the Haskell
Foundation, and other anonymous contributors whose on-going financial
and in-kind support has facilitated GHC maintenance and release
management over the years. Finally, this release would not have been
possible without the hundreds of open-source contributors who
contribute their code, tickets, and energy to the GHC project.
As always, do give this release a try and open a ticket if you see
anything amiss.
I’ve created an open mirror
contest which will run in
parallel to the official contest, so if you want to grab some friends
and try solving some of the problems together using your favorite
language, be my guest!
A few months ago I explained that one reason why this blog has become more quiet is that all my work on Lean is covered elsewhere.
This post is an exception, because it is an observation that is (arguably) interesting, but does not lead anywhere, so where else to put it than my own blog…
When defining a function recursively in Lean that has nested recursion, e.g. a recursive call that is in the argument to a higher-order function like List.map, then extra attention used to be necessary so that Lean can see that xs.map applies its argument only to elements of the list xs. The usual idiom is to write xs.attach.map instead, where List.attach attaches to the list elements a proof that they are in that list. You can read more about this in my Lean blog post on recursive definitions and our new shiny reference manual; look for Example “Nested Recursion in Higher-order Functions”.
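As a small illustration of the idiom (a hypothetical rose-tree type; the subtype value that attach pairs with each element carries the membership proof that the termination checker needs, so recent Lean versions should accept this without further hints):

inductive Tree where
  | node : List Tree → Tree

def Tree.size : Tree → Nat
  | .node ts => 1 + (ts.attach.map (fun ⟨t, _h⟩ => t.size)).sum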
To make this step less tedious I taught Lean to automatically rewrite xs.map to xs.attach.map (where suitable) within the construction of well-founded recursion, so that nested recursion just works (issue #5471). We already do such a rewriting to change if c then … else … to the dependent if h : c then … else …, but the attach-introduction is much more ambitious (the rewrites are not definitionally equal, there are higher-order arguments etc.) Rewriting the terms in a way that we can still prove the connection later when creating the equational lemmas is hairy at best. Also, we want the whole machinery to be extensible by the user, setting up their own higher order functions to add more facts to the context of the termination proof.
I implemented it like this (PR #6744) and it ships with 4.18.0, but in the course of this work I thought about a quite different and maybe better™ way to do this, and well-founded recursion in general:
To use the underlying fixed-point combinator
WellFounded.fix : (hwf : WellFounded r) (F : (x : α) → ((y : α) → r y x → C y) → C x) (x : α) : C x
we have to rewrite the functorial of the recursive function, which naturally has type
F : ((y : α) → C y) → ((x : α) → C x)
to the one above, where all recursive calls take the termination proof r y x. This is a fairly hairy operation, mangling the type of matcher’s motives and whatnot.
If we instead use a fixed-point combinator that takes the functorial as a plain function together with a monotonicity proof (in the style of partial_fixpoint), the functorial’s type is unmodified (here β will be ((x : α) → C x)), and everything else is in the propositional side-condition monotone F. For this predicate we have a syntax-guided compositional tactic, and it’s easily extensible, e.g. by
theorem monotone_mapM (f : γ → α → m β) (xs : List α) (hmono : monotone f) :
monotone (fun x => xs.mapM (f x))
Once given, we don’t care about the content of that proof. In particular proving the unfolding theorem only deals with the unmodified F that closely matches the function definition as written by the user. Much simpler!
Isabelle has it easier
Isabelle also supports well-founded recursion, and has great support for nested recursion. And it’s much simpler!
There, all you have to do to make nested recursion work is to define a congruence lemma; for List.map, something like our List.map_congr_left:
List.map_congr_left (h : ∀ a ∈ l, f a = g a) :
    List.map f l = List.map g l
This is because in Isabelle, too, the termination proof is a side-condition that essentially states “the functorial F calls its argument f only on smaller arguments”.
Can we have it easy, too?
I had wished we could do the same in Lean for a while, but that form of congruence lemma just isn’t strong enough for us.
But maybe there is a way to do it, using an existential to give a witness that F can alternatively be implemented using the more restrictive argument. The following callsOn P F predicate can express that F calls its higher-order argument only on arguments that satisfy the predicate P:
section setup
variable {α : Sort u}
variable {β : α → Sort v}
variable {γ : Sort w}
def callsOn (P : α → Prop) (F : (∀ y, β y) → γ) :=
∃ (F': (∀ y, P y → β y) → γ), ∀ f, F' (fun y _ => f y) = F f
variable (R : α → α → Prop)
variable (F : (∀ y, β y) → (∀ x, β x))
local infix:50 " ≺ " => R
def recursesVia : Prop := ∀ x, callsOn (· ≺ x) (fun f => F f x)
noncomputable def fix (wf : WellFounded R) (h : recursesVia R F) : (∀ x, β x) :=
wf.fix (fun x => (h x).choose)
def fix_eq (wf : WellFounded R) h x :
fix R F wf h x = F (fix R F wf h) x := by
unfold fix
rw [wf.fix_eq]
apply (h x).choose_spec
This allows nice compositional lemmas to discharge callsOn predicates:
theorem callsOn_base (y : α) (hy : P y) :
callsOn P (fun (f : ∀ x, β x) => f y) := by
exists fun f => f y hy
intros; rfl
@[simp]
theorem callsOn_const (x : γ) :
callsOn P (fun (_ : ∀ x, β x) => x) :=
⟨fun _ => x, fun _ => rfl⟩
theorem callsOn_app
{γ₁ : Sort uu} {γ₂ : Sort ww}
(F₁ : (∀ y, β y) → γ₂ → γ₁) -- can this also support dependent types?
(F₂ : (∀ y, β y) → γ₂)
(h₁ : callsOn P F₁)
(h₂ : callsOn P F₂) :
callsOn P (fun f => F₁ f (F₂ f)) := by
obtain ⟨F₁', h₁⟩ := h₁
obtain ⟨F₂', h₂⟩ := h₂
exists (fun f => F₁' f (F₂' f))
intros; simp_all
theorem callsOn_lam
{γ₁ : Sort uu}
(F : γ₁ → (∀ y, β y) → γ) -- can this also support dependent types?
(h : ∀ x, callsOn P (F x)) :
callsOn P (fun f x => F x f) := by
exists (fun f x => (h x).choose f)
intro f
ext x
apply (h x).choose_spec
theorem callsOn_app2
{γ₁ : Sort uu} {γ₂ : Sort ww}
(g : γ₁ → γ₂ → γ)
(F₁ : (∀ y, β y) → γ₁) -- can this also support dependent types?
(F₂ : (∀ y, β y) → γ₂)
(h₁ : callsOn P F₁)
(h₂ : callsOn P F₂) :
callsOn P (fun f => g (F₁ f) (F₂ f)) := by
apply_rules [callsOn_app, callsOn_const]
With this setup, we can have the following, possibly user-defined, lemma expressing that List.map calls its arguments only on elements of the list:
theorem callsOn_map (δ : Type uu) (γ : Type ww)
(P : α → Prop) (F : (∀ y, β y) → δ → γ) (xs : List δ)
(h : ∀ x, x ∈ xs → callsOn P (fun f => F f x)) :
callsOn P (fun f => xs.map (fun x => F f x)) := by
suffices callsOn P (fun f => xs.attach.map (fun ⟨x, h⟩ => F f x)) by
simpa
apply callsOn_app
· apply callsOn_app
· apply callsOn_const
· apply callsOn_lam
intro ⟨x', hx'⟩
dsimp
exact (h x' hx')
· apply callsOn_const
end setup
So here is the (manual) construction of a nested map for trees:
section examples
structure Tree (α : Type u) where
val : α
cs : List (Tree α)
-- essentially
-- def Tree.map (f : α → β) : Tree α → Tree β :=
-- fun t => ⟨f t.val, t.cs.map Tree.map⟩)
noncomputable def Tree.map (f : α → β) : Tree α → Tree β :=
fix (sizeOf · < sizeOf ·) (fun map t => ⟨f t.val, t.cs.map map⟩)
(InvImage.wf (sizeOf ·) WellFoundedRelation.wf) <| by
intro ⟨v, cs⟩
dsimp only
apply callsOn_app2
· apply callsOn_const
· apply callsOn_map
intro t' ht'
apply callsOn_base
-- ht' : t' ∈ cs -- !
-- ⊢ sizeOf t' < sizeOf { val := v, cs := cs }
decreasing_trivial
end examples
This makes me happy!
All details of the construction are now contained in a proof that can proceed by a syntax-driven tactic and that’s easily and (likely robustly) extensible by the user. It also means that we can share a lot of code paths (e.g. everything related to equational theorems) between well-founded recursion and partial_fixpoint.
I wonder if this construction is really as powerful as our current one, or if there are certain (likely dependently typed) functions where this doesn’t fit, but the β above is dependent, so it looks good.
With this construction, functions defined by well-founded recursion will reduce even worse in the kernel, I assume. This may be a good thing.
The cake is a lie
What unfortunately kills this idea, though, is the generation of the functional induction principles, which I believe is not (easily) possible with this construction: The functional induction principle is proved by massaging F to return a proof, but since the extra assumptions (e.g. for ite or List.map) only exist in the termination proof, they are not available in F.
Oh wey, how anticlimactic.
PS: Path dependencies
Curiously, if we didn’t have functional induction at this point yet, then very likely I’d change Lean to use this construction, and then we’d either not get functional induction, or it would be implemented very differently, maybe a more syntactic approach that would re-prove termination. I guess that’s called path dependence.
There’s yet again been a bit of functional programming-adjacent twitter drama
recently, but it’s actually sort of touched into some subtleties about sum types
that I am asked about (and think about) a lot nowadays. So, I’d like to take
this opportunity to talk a bit about the “why” and nature of sum types, how
to use them effectively, how they contrast with other related concepts in
programming and software development, and even cases where sum types aren’t
the best option.
Sum Types at their Best
The quintessential sum type that you just can’t live without is
Maybe, now adopted in a lot of languages as
Optional:
data Maybe a = Nothing | Just a
If you have a value of type Maybe Int, it means that its valid
values are Nothing, Just 0, Just 1,
etc.
This is also a good illustration of why we call it a “sum” type: if
a has n possible values, then Maybe a has
1 + n: we add the single new value Nothing to it.
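For instance, a tiny check of that arithmetic:
-- Maybe Bool has 1 + 2 = 3 inhabitants:
allMaybeBools :: [Maybe Bool]
allMaybeBools = [Nothing, Just False, Just True]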
The “benefit” of the sum type is illustrated pretty clearly here too: every
time you use a value of type Maybe Int, you are forced to
consider the fact that it could be Nothing:
showMaybeInt :: Maybe Int -> String
showMaybeInt = \case
  Nothing -> "There's nothing here"
  Just i -> "Something is here: " <> show i
That’s because sum types are usually implemented in a
way that forces you to handle each case exhaustively. Otherwise, sum types are
much less useful.
At the most fundamental level, this behaves like a compiler-enforced null
check, but built within the language in user-space instead of being compiler
magic, ad-hoc syntax1, or static analysis — and the fact that it
can live in user-space is why it’s been adopted so widely. At a higher level,
functional abstractions like Functor, Applicative, Monad, Foldable, Traversable
allow you to use a Maybe a like just a normal a with
the appropriate semantics, but that’s a
topic for another time (like 2014).
This power is very special to me on a personal level. I remember many years
ago on my first major haskell project changing a type from String
to Maybe String, and then GHC telling me every place in the
codebase where something needed to change in order for things to work still.
Coming from dynamically typed languages in the past, this sublime experience
truly altered my brain chemistry and Haskell-pilled me for the rest of my life.
I still remember the exact moment, what coffee shop I was at, what my order was,
the weather that day … it was truly the first day of the rest of my life.
It should be noted that I don’t consider sum types a “language feature” or a
compiler feature as much as I’d consider it a design pattern. Languages that
don’t have sum types built-in can usually implement them using typed unions and
an abstract visitor pattern interface (more on that later). Of course, having a
way to “check” your code before running it (like with a type system or
statically verified type annotations) does make a lot of the features much more
useful.
Anyway, this basic pattern can be extended to include more error information
in your Nothing branch, which is how you get the
Either e a type in the Haskell standard library, or the
Result<T,E> type in rust.
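For reference, Either is essentially Maybe with a payload in the failure branch; its standard definition is just:
data Either e a = Left e | Right a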
Along different lines, we have the common use case of defining syntax
trees:
data Expr
    = Lit Int
    | Negate Expr
    | Add Expr Expr
    | Sub Expr Expr
    | Mul Expr Expr

eval :: Expr -> Int
eval = \case
    Lit i -> i
    Negate x -> -(eval x)
    Add x y -> eval x + eval y
    Sub x y -> eval x - eval y
    Mul x y -> eval x * eval y

pretty :: Expr -> String
pretty = go 0
  where
    wrap :: Int -> Int -> String -> String
    wrap prio opPrec s
      | prio > opPrec = "(" <> s <> ")"
      | otherwise = s
    go prio = \case
      Lit i -> show i
      Negate x -> wrap prio 2 $ "-" <> go 2 x
      Add x y -> wrap prio 0 $ go 0 x <> " + " <> go 1 y
      Sub x y -> wrap prio 0 $ go 0 x <> " - " <> go 1 y
      Mul x y -> wrap prio 1 $ go 1 x <> " * " <> go 2 y

main :: IO ()
main = do
    putStrLn $ pretty myExpr
    print $ eval myExpr
  where
    myExpr = Mul (Negate (Add (Lit 4) (Lit 5))) (Lit 8)
-(4 + 5) * 8
-72
Now, if we add a new constructor to the sum type, the compiler forces us to
handle it.
data Expr
    = Lit Int
    | Negate Expr
    | Add Expr Expr
    | Sub Expr Expr
    | Mul Expr Expr
    | Abs Expr

eval :: Expr -> Int
eval = \case
    Lit i -> i
    Negate x -> -(eval x)
    Add x y -> eval x + eval y
    Sub x y -> eval x - eval y
    Mul x y -> eval x * eval y
    Abs x -> abs (eval x)

pretty :: Expr -> String
pretty = go 0
  where
    wrap :: Int -> Int -> String -> String
    wrap prio opPrec s
      | prio > opPrec = "(" <> s <> ")"
      | otherwise = s
    go prio = \case
      Lit i -> show i
      Negate x -> wrap prio 2 $ "-" <> go 2 x
      Add x y -> wrap prio 0 $ go 0 x <> " + " <> go 1 y
      Sub x y -> wrap prio 0 $ go 0 x <> " - " <> go 1 y
      Mul x y -> wrap prio 1 $ go 1 x <> " * " <> go 2 y
      Abs x -> wrap prio 2 $ "|" <> go 0 x <> "|"
Another example where sum types shine is as clearly-defined APIs between
processes. For example, we can imagine a “command” type that sends different
types of commands with different payloads. This can be interpreted as perhaps
the result of parsing command line arguments or the message in some
communication protocol.
For example, you could have a protocol that launches and controls
processes:
data Command a
    = Launch String (Int -> a) -- ^ takes a name, returns a process ID
    | Stop Int (Bool -> a)     -- ^ takes a process ID, returns success/failure

launch :: String -> Command Int
launch nm = Launch nm id

stop :: Int -> Command Bool
stop pid = Stop pid id
This ADT is written in the “interpreter” pattern (used often with things like
free monads), where any arguments not involving a are the command
payload, and any X -> a represents that the command could respond
with X.
Let’s write a sample interpreter backing the state in an IntMap in an
IORef:
import qualified Data.IntMap as IM
import Data.IntMap (IntMap)

runCommand :: IORef (IntMap String) -> Command a -> IO a
runCommand ref = \case
    Launch newName next -> do
      currMap <- readIORef ref
      let newId = case IM.lookupMax currMap of
            Nothing -> 0
            Just (i, _) -> i + 1
      modifyIORef ref $ IM.insert newId newName
      pure (next newId)
    Stop procId next -> do
      existed <- IM.member procId <$> readIORef ref
      modifyIORef ref $ IM.delete procId
      pure (next existed)

main :: IO ()
main = do
    ref <- newIORef IM.empty
    aliceId <- runCommand ref $ launch "alice"
    putStrLn $ "Launched alice with ID " <> show aliceId
    bobId <- runCommand ref $ launch "bob"
    putStrLn $ "Launched bob with ID " <> show bobId
    success <- runCommand ref $ stop aliceId
    putStrLn $
      if success
        then "alice successfully stopped"
        else "alice unsuccessfully stopped"
    print =<< readIORef ref
Launched alice with ID 0
Launched bob with ID 1
alice successfully stopped
fromList [(1, "bob")]
Let’s add a command to “query” a process id for its current status:
data Command a
    = Launch String (Int -> a) -- ^ takes a name, returns a process ID
    | Stop Int (Bool -> a)     -- ^ takes a process ID, returns success/failure
    | Query Int (String -> a)  -- ^ takes a process ID, returns a status message

query :: Int -> Command String
query pid = Query pid id

runCommand :: IORef (IntMap String) -> Command a -> IO a
runCommand ref = \case
    -- ...
    Query procId next -> do
      procName <- IM.lookup procId <$> readIORef ref
      pure $ next $ case procName of
        Nothing -> "This process doesn't exist, silly."
        Just n -> "Process " <> n <> " chugging along..."
Relationship with Unions
To clarify a common confusion: sum types can be described as “tagged unions”:
you have a tag to indicate which branch you are on (which can be case-matched
on), and then the rest of your data is conditionally present.
In many languages this can be implemented under the hood as a struct with a
tag and a union of data, along with some abstract visitor pattern
interface to ensure exhaustiveness.
Remember, it’s not exactly a union; consider, for example, a type
like:
data Entity = User Int | Post Int
An Entity here could represent a user at a user id, or a post at
a post id. If we considered it purely as a union of Int and
Int:
union Entity {
    int user_id;
    int post_id;
};
we’d lose the ability to branch on whether we have a user or a post.
If we pair the union with a tag recording which branch is active, we recover the original sum type semantics.
Of course, you still need an abstract interface like the visitor pattern to
actually be able to use this as a sum type with guarantees that you handle every
branch, but that’s a story for another day. Alternatively, if your language
supports dynamic dispatch nicely, that’s another underlying implementation that
would work to back a higher-level visitor pattern interface.
Subtypes Solve a Different Problem
Now, sum types aren’t exactly a part of common programming education
curriculum, but subtypes and supertypes definitely were
drilled into every CS student’s brain and waking nightmares from their first
year.
Informally (a la Liskov), B is a subtype of A (and
A is a supertype of B) if anywhere that expects an
A, you could also provide a B.
In normal object-oriented programming, this often shows up in early lessons
as Cat and Dog being subclasses of an
Animal class, or Square and Circle being
subclasses of a Shape class.
When people first learn about sum types, there is a tendency to understand
them as similar to subtyping. This is unfortunately understandable, since a lot
of introductions to sum types often start with something like
-- | Bad Sum Type Example!
data Shape = Circle Double | Rectangle Double Double
While there are situations where this might be a good sum type (e.g., for an
API specification or a state machine), at face value this is a bad example for
the sum types vs. subtyping distinction.
You might notice the essential “tension” of the sum type: you declare all of
your options up-front, but the functions that consume your value are open and
declared ad-hoc. And, if you add new options, all of the consuming functions
must be adjusted.
So, subtypes (and supertypes) are more effective when they lean into
the opposite end: the universe of possible options are open and declared ad-hoc,
but the consuming functions are closed. And, if you add new functions,
all of the members must be adjusted.
In typed languages with a concept of “objects” and “classes”, subtyping is
often implemented using inheritance and interfaces.
So, a function like processWidget(Widget widget) that expects a
Widget would be able to be passed a Button or
InputField or Box. And, if you had a container like
List<Widget>, you could assemble a structure using
Button, InputField, and Box. A perfect
Liskov storm.
In typical library design, you’re able to add new implementations of
Widget as an open universe easily: anyone that imports
Widget can, and they can now use it with functions taking
Widgets. But, if you ever wanted to add new functionality
to the Widget interface, that would be a breaking change to all
downstream implementations.
However, this implementation of subtyping, while prevalent, is the most
mind-numbingly boring realization of the concept, and it pained my soul to even
spend time talking about it. So let’s jump into the more interesting way that
subtype and supertype relationships manifest in the only language where anything
is interesting: Haskell.
Subtyping via Parametric Polymorphism
In Haskell, subtyping is implemented in terms of parametric polymorphism and
sometimes typeclasses. This allows for us to work nicely with the concept of
functions and APIs as subtypes and supertypes of each other.
For example, let’s look at a function that takes indexers and applies
them:
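The definition itself appears to have been lost in transit here; a minimal sketch consistent with the discussion that follows (the body and the extra [Double] and [Int] parameters are my guesses):
sumAtLocs :: ([Double] -> Int -> Double) -> [Double] -> [Int] -> Double
sumAtLocs ix xs locs = sum (map (ix xs) locs)

-- ghci> sumAtLocs (!!) [10, 5, 8] [0, 2]
-- 18.0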
So, what functions could you pass to sumAtLocs? Can you
only pass [Double] -> Int -> Double?
Well, not quite. Look at the above where we passed (!!), which
has type forall a. [a] -> Int -> a!
In fact, what other types could we pass? Here are some examples:
fun1 :: [a] -> Int -> a
fun1 = (!!)

fun2 :: [a] -> Int -> a
fun2 xs i = reverse xs !! i

fun3 :: Floating a => [a] -> Int -> a
fun3 xs i = if length xs > i then xs !! i else pi

fun4 :: Num a => [a] -> Int -> a
fun4 xs i = sum (take i xs)

fun5 :: (Integral b, Num c) => a -> b -> c
fun5 xs i = fromIntegral i

fun6 :: (Foldable t, Fractional a, Integral b) => t a -> b -> a
fun6 xs i = sum xs / fromIntegral i

fun7 :: (Foldable t, Integral b, Floating a) => t a -> b -> a
fun7 xs i = logBase (fromIntegral i) (sum xs)
What’s going on here? Well, the function expects a
[Double] -> Int -> Double, but there are a lot of other types
that could be passed instead.
At first this might seem like meaningless semantics or trickery, but it’s
deeper than that: remember that each of the above types actually has a very
different meaning and different possible behaviors!
forall a. [a] -> Int -> a means that the a
must come from the given list. In fact, any function with that type is
guaranteed to be partial: if you pass it an empty list, there is no
a available to use.
forall a. Num a => [a] -> Int -> a means that the
result might actually come from outside of the list: the implementation could
always return 0 or 1, even if the list is empty. It
also guarantees that it will only add, subtract, multiply, or abs: it will never
divide.
forall a. Fractional a => [a] -> Int -> a means that
we could possibly do division on the result, but we can’t do anything “floating”
like square rooting or logarithms.
forall a. Floating a => [a] -> Int -> a means that we
can possibly start square rooting or taking the logarithms of our input
numbers
[Double] -> Int -> Double gives us the least guarantees
about the behavior: the result could come from thin air (and not be a part of
the list), and we can even inspect the machine representation of our
inputs.
So, we have all of these types with completely different semantics and
meanings. And yet, they can all be passed to something expecting a
[Double] -> Int -> Double. That means that they are all
subtypes of [Double] -> Int -> Double!
[Double] -> Int -> Double is a supertype that houses
multitudes of possible values, uniting all of the possible values and semantics
into one big supertype.
Through the power of parametric polymorphism and typeclasses, you can
actually create an extensible hierarchy of supertypes, not just of
subtypes.
Consider a common API for json serialization. You could have multiple
functions that serialize into JSON:
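The example serializers seem to have been dropped here; a sketch of the shape they likely had (the types Foo and Bar and their encodings are my inventions):
import Data.Aeson (ToJSON (toJSON), Value)

data Foo = Foo Int
data Bar = Bar String

fooToValue :: Foo -> Value
fooToValue (Foo n) = toJSON n

barToValue :: Bar -> Value
barToValue (Bar s) = toJSON s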
The type of toJSON :: forall a. ToJSON a => a -> Value is a
subtype of Foo -> Value, Bar -> Value, and
Baz -> Value, because everywhere you would want a
Foo -> Value, you could give toJSON instead. Every
time you want to serialize a Foo, you could use
toJSON.
This usage works well, as it gives you an extensible abstraction to design
code around. When you write code polymorphic over Monoid a, it
forces you to reason about your values with respect to only the aspects relating
to monoidness. If you write code polymorphic over Num a, it forces
you to reason about your values only with respect to how they can be added,
subtracted, negated, or multiplied, instead of having to worry about things like
their machine representation.
The extensibility comes from the fact that you can create even more
supertypes of forall a. ToJSON a => a -> Value easily,
just by defining a new typeclass instance. So, if you need a
MyType -> Value, you could make it a supertype of
toJSON :: ToJSON a => a -> Value by defining an instance of
the ToJSON typeclass, and now you have something you can use in its
place.
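For example (MyType and its tuple encoding are stand-ins of my own):
data MyType = MyType Int String

instance ToJSON MyType where
  toJSON (MyType n s) = toJSON (n, s)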
Practically this is used by many libraries. For example, ad uses it for automatic
differentiation: its diff function looks scary:
diff :: (forall s. AD s ForwardDouble -> AD s ForwardDouble) -> Double -> Double
But it relies on the fact that
(forall s. AD s ForwardDouble -> AD s ForwardDouble) is a
supertype of (forall a. Floating a => a -> a),
(forall a. Num a => a -> a), etc., so you can give it
functions like \x -> x * x (which is a
forall a. Num a => a -> a) and it will work as that
AD s type:
ghci> diff (\x -> x * x) 10
20  -- 2*x
This “numeric overloading” method is used by libraries for GPU programming,
as well, to accept numeric functions to be optimized and compiled to GPU
code.
Another huge application is in the lens library, which
uses subtyping to unite its hierarchy of optics.
For example, an Iso is a subtype of Lens, which
is a subtype of Traversal, and Fold is a supertype of both
Lens and Traversal, etc. In the end the system even
allows you to use id from the Prelude as a lens or a
traversal, because the type signature of id :: a -> a is
actually a subtype of all of those types!
Subtyping using Existential Types
What more closely matches the spirit of subtypes in OOP and other
languages is the existential type: a value that can be a value of any
type matching some interface.
For example, let’s imagine a value that could be any instance of
Num:
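The definition was dropped here; based on the wrappers that appear later in the post, it was presumably along these lines (the example list is mine):
data SomeNum = forall a. Num a => SomeNum a

someNums :: [SomeNum]
someNums = [SomeNum (3 :: Int), SomeNum (0.1 :: Double)]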
This is somewhat equivalent to Java’s
List<MyInterface> or List<MyClass>, or
python’s List[MyClass].
Note that to use this effectively in Haskell with superclasses and
subclasses, you need to manually wrap and unwrap:
data SomeFractional = forall a. Fractional a => SomeFractional a

castUp :: SomeFractional -> SomeNum
castUp (SomeFractional x) = SomeNum x
So, SomeNum is “technically” a supertype of
SomeFractional: everywhere a SomeNum is expected, a
SomeFractional can be given…but in Haskell it’s a lot less
convenient because you have to explicitly cast.
In OOP languages, you can often cast “down” using runtime reflection
(SomeNum -> Maybe SomeFractional). However, this is impossible
in Haskell the way we have written it!
That’s because of type erasure: Haskell does not (by default) couple a value
at runtime with all of its associated interface implementations. When you create
a value of type SomeNum, you are packing an untyped pointer to that
value as well as a “dictionary” of all the functions you could use it with:
data NumDict a = NumDict
    { (+) :: a -> a -> a
    , (*) :: a -> a -> a
    , negate :: a -> a
    , abs :: a -> a
    , fromInteger :: Integer -> a
    }

mkNumDict :: Num a => NumDict a
mkNumDict = NumDict (+) (*) negate abs fromInteger

data FractionalDict a = FractionalDict
    { numDict :: NumDict a
    , (/) :: a -> a -> a
    , fromRational :: Rational -> a
    }

-- | Essentially equivalent to the previous 'SomeNum'
data SomeNum = forall a. SomeNum
    { numDict :: NumDict a
    , value :: a
    }

-- | Essentially equivalent to the previous 'SomeFractional'
data SomeFractional = forall a. SomeFractional
    { fractionalDict :: FractionalDict a
    , value :: a
    }

castUp :: SomeFractional -> SomeNum
castUp (SomeFractional (FractionalDict {numDict}) x) = SomeNum numDict x

castDown :: SomeNum -> Maybe SomeFractional
castDown (SomeNum nd x) = error "not possible!"
All of these function pointers essentially exist at runtime inside
the SomeNum. So, SomeFractional can be “cast up” to
SomeNum by simply dropping the FractionalDict.
However, you cannot “cast down” from SomeNum because there is no
way to materialize the FractionalDict: the association from type to
instance is lost at runtime. OOP languages usually get around this by having the
value itself hold pointers to all of its interface implementations at
runtime. However, in Haskell, we have type erasure by default: there are no
tables carried around at runtime.2
In the end, existential subtyping requires explicit wrapping/unwrapping
instead of implicit or lightweight casting possible in OOP languages optimized
around this sort of behavior.3 Existential-based subtyping is just less
common in Haskell because parametric polymorphism offers a solution to most
similar problems. For more on this topic, Simon Peyton Jones has a nice lecture on the
topic.
The pattern of using existentially qualified data in a container
(like [SomeNum]) is often called the “widget pattern” because it’s
used in libraries like xmonad to allow
extensible “widgets” stored alongside the methods used to manipulate them. It’s
more common to explicitly store the handler functions (a “dictionary”) inside
the type instead of existential typeclasses, but sometimes it can be nice to
let the compiler handle generating and passing your method tables implicitly for
you. Using existential typeclasses instead of explicit dictionaries also allows
you to bless certain methods and functions as “canonical” to your type, and the
compiler will make sure they are always coherent.
I do mention in a blog
post about different types of existential lists, however, that this
“container of instances” type is much less useful in Haskell than in other
languages for many reasons, including the up/downcasting issues mentioned above.
In addition, Haskell gives you a whole wealth of functionality for operating
over homogeneous parameters (like [a], where all items have the same
type) that you give up by jumping to heterogeneous lists.
Aside
Let’s briefly take a moment to talk about how typeclass hierarchies give us
subtle subtype/supertype relationships.
Let’s look at the classic Num and Fractional:
class Num a
class Num a => Fractional a
Num is a superclass of Fractional, and
Fractional is a subclass of Num. Everywhere a
Num constraint is required, you can provide a
Fractional constraint to do the same thing.
However, in these two types:
Num a => a
Fractional a => a
forall a. Num a => a is actually a subtype of
forall a. Fractional a => a! That’s because if you need a
forall a. Fractional a => a, you can provide a
forall a. Num a => a instead. In fact, let’s look at three
levels: Double, forall a. Fractional a => a, and
forall a. Num a => a.
-- can be used as `Double`
1.0 :: Double
1.0 :: Fractional a => a
1 :: Num a => a

-- can be used as `forall a. Fractional a => a`
1.0 :: Fractional a => a
1 :: Num a => a

-- can be used as `forall a. Num a => a`
1 :: Num a => a
So, Double is a supertype of Fractional a => a
is a supertype of Num a => a.
The general idea here is that the more super- you go, the more you “know”
about the actual term you are creating. So, with Num a => a, you
know the least (and, you have the most possible actual terms because
there are more instances of Num than of Fractional).
And, with Double, you know the most: you even know its
machine representation!
So, Num is a superclass of Fractional but
forall a. Num a => a is a subtype of
forall a. Fractional a => a. This actually follows the typical
rules of subtyping: if something appears on the “left” of an arrow
(=> in this case), it gets flipped from sub- to super-. We often
call the left side a “negative” (contravariant) position and the right side a
“positive” position, because a negative of a negative (the left side of a left
side, like a in (a -> b) -> c) is a
positive.
Also note that our “existential wrappers”:
data SomeNum = forall a. Num a => SomeNum a
data SomeFractional = forall a. Fractional a => SomeFractional a
can be CPS-transformed to their equivalent types:
type SomeNum' = forall r. (forall a. Num a => a -> r) -> r
type SomeFractional' = forall r. (forall a. Fractional a => a -> r) -> r

toSomeNum' :: SomeNum -> SomeNum'
toSomeNum' (SomeNum x) f = f x

toSomeNum :: SomeNum' -> SomeNum
toSomeNum sn = sn SomeNum
And in those cases, Num and Fractional again appear
in the covariant (positive) position, since they’re the negative of a negative.
So, this aligns with our intuition that SomeFractional is a subtype
of SomeNum.
The Expression Problem
This tension that I described earlier is closely related to the expression
problem, and is a tension that is inherent to a lot of different aspects of
language and abstraction design. However, in the context laid out in this post,
it serves as a good general guide to decide what pattern to go down:
If you expect a canonical set of “inhabitants” and an open set of
“operations”, sum types can suit that end of the spectrum well.
If you expect a canonical set of “operations” and an open set of
“inhabitants”, consider subtyping and supertyping.
I don’t really think of the expression problem as a “problem” in the sense of
“some hindrance to deal with”. Instead, I see it in the “math problem” sort of
way: by adjusting how you approach things, you can play with the equation to make
the most out of what requirements you need in your design.
Looking Forward
A lot of frustration in Haskell (and programming in general) lies in trying
to force abstraction and tools to work in a way they weren’t meant to. Hopefully
this short run-down can help you avoid going against the point of these
design patterns and start making the most of what they can offer. Happy
Haskelling!
Special Thanks
I am very humbled to be supported by an amazing community, who make it
possible for me to devote time to researching and writing these posts. Very
special thanks to my supporter at the “Amazing” level on patreon, Josh Vera! :)
Most OOP languages also have mechanisms for type erasure, but
the default is unerased, which is the opposite of Haskell.↩︎
Note that there are current GHC proposals
that attempt to allow “naked” existentials without newtype wrappers, so we could
actually get the same seamless and implicit up-casting as we would get in OOP
languages. However, the jury is out on whether or not this is a good idea.↩︎
Four years ago I bought a pair of YubiKey 5s:
One YubiKey 5 Nano, which fits in my laptop’s USB slot, and another YubiKey 5 NFC as backup, which sat in my home office.
However, I kept worrying about what happens if my house burns down or something, taking both my laptop and office YubiKeys together at the same time.
On the other hand, if I stored my YubiKey 5 NFC offsite, then whenever I needed to register a new FIDO service, I would need to go fetch the key, update it, and then return it.
Based on my personal experience, even if that were not a big pain, the "return it" step often gets delayed indefinitely because it feels so low priority.
Then I read a popular comment made on Hacker News: Get three YubiKeys.
Suddenly everything clicked!
I bought a second YubiKey 5 NFC last year.
Now, I keep a second YubiKey 5 NFC offsite, in addition to the one in my laptop and the one in my office.
If my home burns down, I still have an offsite YubiKey available.
But the best thing about having a second YubiKey 5 NFC is that it partly mitigates the offsite update problem.
In the previous scenario, we required potentially two trips offsite to update the backup YubiKey.
However, now the procedure to register a new FIDO service is to first update the office YubiKey 5 NFC key (and the YubiKey 5 nano).
Then, at your earliest convenience, you swap the office YubiKey 5 NFC key with the offsite YubiKey 5 NFC.
When you get the offsite YubiKey home, you update it with the new FIDO service and then it becomes the new office YubiKey.
There is no need to return to the offsite location.
Part of the issue is that there is no "public FIDO key", like there is with a "public PGP key".
You need the actual YubiKey in hand to register it with a FIDO service, no matter whether it is a discoverable credential or not.
If you were only using the YubiKey as an OpenPGP smart card, then perhaps you could get away with just having a local key and an offsite key.
Even still, I would recommend a third YubiKey so that whenever the time comes to do some operation on your offsite key, you can perform the same swapping trick.
The title of this article says that three is the right number of YubiKeys.
However, this is because I only have one nano, in my laptop, which is my primary computing interface.
I do have a desktop computer that I mostly only access as a remote server.
If you have multiple computer devices that you regularly use, it would make sense to have a YubiKey nano device in each of them.
And in addition to those, have one offsite key, and one local key for swapping with the offsite key.
The GHC developers are happy to announce the availability of the first and
likely final release candidate of GHC 9.12.2. This is an important bug-fix
release resolving a significant correctness issue present in 9.12.1
(#25653).
In accordance with our under-discussion release policies this candidate
will have a two-week testing window. The final 9.12.2 release will likely come
the week of 12 March 2025.
As always, if you find anything amiss please open a ticket.
In this episode Wouter Swierstra and Niki Vazou talk with Conal Elliott. Conal discusses doing things just for the poetry, how most programs miss their purpose, and the simplest way to ask a question. Conal is currently working on a book about his ideas and actively looking for partners.
Regular, everyday stuff. But the instances for type constructors are more interesting, because they come with an instance context:
instance (Foo a, Foo b) => Foo (a, b) where
  ...
Then, of course, if we know both Foo a and Foo b, we can infer Foo (a, b). To make this fact overwhelmingly explicit, we can reify the usual constraint-solving logic by using the Dict type, and thus the following program will typecheck:
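The program itself seems to be missing here; it was presumably something of this shape (Dict is from the constraints package; the name forwards is my stand-in):
import Data.Constraint (Dict (..))

forwards :: (Foo a, Foo b) => Dict (Foo (a, b))
forwards = Dict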
with the only change required coming from the type constructor instances:
instance (Foo a, Foo b) => Foo (a, b) where
  type Evidence (a, b) = (Foo a, Foo b)
  ...
or, if you want to be cute about it:
instance Evidence (a, b) => Foo (a, b) where
  type Evidence (a, b) = (Foo a, Foo b)
  ...
By sticking Evidence into the superclass constraint, GHC knows that this dictionary is always available when you’ve got a Foo dictionary around. And our earlier backwards program now typechecks as expected.
Recently I looked again at PHOAS, and once again I concluded it's nice for library APIs, but so painful to do anything with inside those libraries. So let's convert it to something else, like de Bruijn.
There are standalone source files if you just want to see the code:
There is always a way to cheat, though. You can turn the PHOAS ->
untyped de Bruijn machinery into the PHOAS -> typed de Bruijn
machinery by checking that future contexts indeed extend past contexts
and throwing an error otherwise (which can't happen, because future
contexts always extend past contexts, but it's a metatheorem).
In "Generic Conversions of Abstract Syntax Representation" by Steven Keuchel and Johan Jeuring, authors also "cheat" a bit. The "Parametrhic higher-order abstract syntax" section ends with a somewhat disappointing
where postulate whatever : _
Keuchel and Jeuring also mention "Unembedding Domain-Specific Languages" by Robert Atkey, Sam Lindley and Jeremy Yallop; where there is one unsatisfactory ⊥ (undefined in Haskell) hiding.
I think that for practical developments (say a library in Haskell), it is ok to make a small shortcut; but I kept wondering whether there is a way to make the conversion without cheating.
Well... it turns out that we cannot avoid "cheating": well-formedness of the PHOAS representation depends on parametricity, and the conversion challenge seems to require a theorem of which there is no proof in Agda.
In (unpublished?) work, Adam Chlipala shows a way to do the conversion without relying on postulates (http://adam.chlipala.net/cpdt/html/Intensional.html), but that procedure requires an extra well-formedness proof of the given PHOAS term.
This Agda development is a translation of that development.
Common setup
Our syntax representations will be well-typed, so we need types:
-- Types
data Ty : Set where
  emp : Ty
  fun : Ty → Ty → Ty

Ctx : Set
Ctx = List Ty

variable
  A B C : Ty
  Γ Δ Ω : Ctx
  v : Ty → Set
de Bruijn syntax
Var : Ctx → Ty → Set
Var Γ A = Idx A Γ -- from agda-np, essentially membership relation.

data DB (Γ : Ctx) : Ty → Set where
  var : Var Γ A → DB Γ A
  app : DB Γ (fun A B) → DB Γ A → DB Γ B
  lam : DB (A ∷ Γ) B → DB Γ (fun A B)
  abs : DB Γ emp → DB Γ A
Parametric Higher-order abstract syntax
data PHOAS (v : Ty → Set) : Ty → Set where
  var : v A → PHOAS v A
  app : PHOAS v (fun A B) → PHOAS v A → PHOAS v B
  lam : (v A → PHOAS v B) → PHOAS v (fun A B)
  abs : PHOAS v emp → PHOAS v A

-- closed "true" PHOAS terms.
PHOAS° : Ty → Set₁
PHOAS° A = ∀ {v} → PHOAS v A
de Bruijn to PHOAS
This direction is trivial. It is anecdotal evidence that the de Bruijn representation is easier to do transformations on.
phoasify : NP v Γ → DB Γ A → PHOAS v A
phoasify γ (var x) = var (lookup γ x)
phoasify γ (app f t) = app (phoasify γ f) (phoasify γ t)
phoasify γ (lam t) = lam λ x → phoasify (x ∷ γ) t
phoasify γ (abs t) = abs (phoasify γ t)
Interlude: Well-formedness of PHOAS terms
Adam Chlipala defines an equivalence relation between two PHOAS terms (exp_equiv in Intensional, wf in the CPDT book). We only need single-term well-formedness, so we can do a little less.
Terms like invalid cannot be values of PHOAS°, as all values of "v" inside PHOAS° have to originate from lam-constructor abstractions. We really should keep the v parameter free, i.e. parametric, when constructing PHOAS terms.
The idea is then simply to track which variables (values of v) are introduced by lambda abstraction.
data phoasWf {v : Ty → Set} (G : List (Σ Ty v)) : {A : Ty} → PHOAS v A → Set where
  varWf : ∀ {A} {x : v A}
    → Idx (A , x) G
    → phoasWf G (var x)
  appWf : ∀ {A B} {f : PHOAS v (fun A B)} {t : PHOAS v A}
    → phoasWf G f
    → phoasWf G t
    → phoasWf G (app f t)
  lamWf : ∀ {A B} {f : v A → PHOAS v B}
    → (∀ (x : v A) → phoasWf ((A , x) ∷ G) (f x))
    → phoasWf G (lam f)
  absWf : ∀ {A} {t : PHOAS v emp}
    → phoasWf G t
    → phoasWf G (abs {A = A} t)

-- closed terms start with an empty G
phoasWf° : PHOAS° A → Set₁
phoasWf° tm = ∀ {v} → phoasWf {v = v} [] tm
A meta theorem is then that all PHOASᵒ terms are well-formed, i.e.
meta-theorem-proposition : Set₁
meta-theorem-proposition = ∀ {A} (t : PHOAS° A) → phoasWf° t
As far as I'm aware this proposition cannot be proved nor refuted in Agda.
de Bruijn to PHOAS translation creates well-formed PHOAS terms.
As a small exercise we can show that phoasify of closed de Bruijn terms creates well-formed PHOAS terms.
toList : NP v Γ → List (Σ Ty v)
toList [] = []
toList (x ∷ xs) = (_ , x) ∷ toList xs

phoasifyWfVar : (γ : NP v Γ) (x : Var Γ A) → Idx (A , lookup γ x) (toList γ)
phoasifyWfVar (x ∷ γ) zero = zero
phoasifyWfVar (x ∷ γ) (suc i) = suc (phoasifyWfVar γ i)

phoasifyWf : (γ : NP v Γ) (t : DB Γ A) → phoasWf (toList γ) (phoasify γ t)
phoasifyWf γ (var x) = varWf (phoasifyWfVar γ x)
phoasifyWf γ (app f t) = appWf (phoasifyWf γ f) (phoasifyWf γ t)
phoasifyWf γ (lam t) = lamWf λ x → phoasifyWf (x ∷ γ) t
phoasifyWf γ (abs t) = absWf (phoasifyWf γ t)

phoasifyWf° : (t : DB [] A) → phoasWf° (phoasify [] t)
phoasifyWf° t = phoasifyWf [] t
PHOAS to de Bruijn
The rest deals with the opposite direction.
In Intensional, Adam Chlipala uses the instantiation v = λ _ → ℕ to make the translation.
I think that in the typed setting using v = λ _ → Ctx turns out nicer.
The idea in both is that we instantiate PHOAS variables to be de Bruijn levels.
data IsSuffixOf {ℓ} {a : Set ℓ} : List a → List a → Set ℓ where
  refl : ∀ {xs} → IsSuffixOf xs xs
  cons : ∀ {xs ys} → IsSuffixOf xs ys → ∀ {y} → IsSuffixOf xs (y ∷ ys)
We need to establish well-formedness of a PHOAS expression in relation to some context Γ.
Note that variables encode de Bruijn levels, thus the contexts we "remember" in variables should be a suffix of the outside context.
wf : (Γ : Ctx) → PHOAS (λ _ → Ctx) A → Set
wf {A = A} Γ (var Δ) = IsSuffixOf (A ∷ Δ) Γ
wf Γ (app f t) = wf Γ f × wf Γ t
wf Γ (lam {A = A} t) = wf (A ∷ Γ) (t Γ)
wf Γ (abs t) = wf Γ t
And if (A ∷ Δ) is suffix of context Γ, we can convert the evidence to the de Bruijn index (i.e. variable):
makeVar : IsSuffixOf (A ∷ Δ) Γ → Var Γ A
makeVar refl = zero
makeVar (cons s) = suc (makeVar s)
Given the term is well-formed in relation to context Γ we can convert it to de Bruijn representation.
dbify : (t : PHOAS (λ _ → Ctx) A) → wf Γ t → DB Γ A
dbify (var x) wf = var (makeVar wf)
dbify (app f t) (fʷ , tʷ) = app (dbify f fʷ) (dbify t tʷ)
dbify {Γ = Γ} (lam t) wf = lam (dbify (t Γ) wf)
dbify (abs t) wf = abs (dbify t wf)
What is left is to show that we can construct wf for all phoasWf-well-formed terms.
Adam Chlipala defines a helper function:
makeG′ : Ctx → List (Σ Ty (λ _ → Ctx))
makeG′ [] = []
makeG′ (A ∷ Γ) = (A , Γ) ∷ makeG′ Γ
However for somewhat technical reasons, we rather define
Normalization by evaluation using parametric higher order syntax. In Agda.
I couldn't find a self-contained example of PHOAS NbE, so here it is. I hope someone might find it useful.
module NbEXP.PHOAS where

data Ty : Set where
  emp : Ty
  fun : Ty → Ty → Ty

data Tm (v : Ty → Set) : Ty → Set where
  var : ∀ {a} → v a → Tm v a
  app : ∀ {a b} → Tm v (fun a b) → Tm v a → Tm v b
  lam : ∀ {a b} → (v a → Tm v b) → Tm v (fun a b)

data Nf (v : Ty → Set) : Ty → Set
data Ne (v : Ty → Set) : Ty → Set

data Ne v where
  nvar : ∀ {a} → v a → Ne v a
  napp : ∀ {a b} → Ne v (fun a b) → Nf v a → Ne v b

data Nf v where
  neut : Ne v emp → Nf v emp
  nlam : ∀ {a b} → (v a → Nf v b) → Nf v (fun a b)

Sem : (Ty → Set) → Ty → Set
Sem v emp = Ne v emp
Sem v (fun a b) = Sem v a → Sem v b

lower : ∀ {v : Ty → Set} (a : Ty) → Sem v a → Nf v a
raise : ∀ {v : Ty → Set} (a : Ty) → Ne v a → Sem v a

lower emp s = neut s
lower (fun a b) s = nlam λ x → lower b (s (raise a (nvar x)))

raise emp n = n
raise (fun a b) n x = raise b (napp n (lower a x))

eval : {v : Ty → Set} {a : Ty} → Tm (Sem v) a → Sem v a
eval (var x) = x
eval (app f t) = eval f (eval t)
eval (lam t) x = eval (t x)

nf : {a : Ty} → {v : Ty → Set} → Tm (Sem v) a → Nf v a
nf {a} t = lower a (eval t)

nf_parametric : {a : Ty} → ({v : Ty → Set} → Tm v a) → ({v : Ty → Set} → Nf v a)
nf_parametric t = nf t
This last month has been fascinating. I guess LLMs have finally
resonated with me on a deeper level. It wasn’t like I woke up and
suddenly everything was different, but their impact is growing on me
non-linearly, forcing me to rewire my brain.
I've been fortunate to be nominated for a few teaching awards over my career, and even to win a couple. The nomination I just received may be the best.
As a new student at the uni, Philip Wadler was the first introductory lecture I had, and his clear passion for the subject made me feel excited to begin my journey in computer science. In particular he emphasised the importance of asking questions, which made the idea of tutorials and lectures a lot less intimidating, and went on to give really valuable advice for starting university. I enjoyed this session so much, and so was looking forward to the guest lectures he was going to do for Inf1A at the end of semester 1. They certainly did not disappoint, the content he covered was engaging, interesting, and above all very entertaining to listen to, especially when he dressed up as a superhero to cement his point. Because I found these talks so rewarding, I also attended the STMU that he spoke at about AI and ChatGPT, and everyone I talked to after the event said they had a really good time whilst also having a completely new insightful perspective on the topic. In summary, Philip Wadler has delivered the best lectures I have attended since starting university, and I have gotten a lot out of them.
President Trump has started rolling out his tariffs, something I blogged about in November. People are talking about these tariffs a lot right now, with many people (correctly) commenting on how consumers will end up with higher prices as a result of these tariffs. While that part is true, I’ve seen a lot of people taking it to the next, incorrect step: that consumers will pay the entirety of the tax. I put up a poll on X to see what people thought, and while the right answer got a lot of votes, it wasn't the winner.
Checking on people's general view of taxes. When the government imposes a tax on trade (sales tax, VAT, tariff, or even payroll tax), which party absorbs the cost of the tax?
For purposes of this blog post, our ultimate question will be the following:
Suppose apples currently sell for $1 each in the entire United States.
There are domestic sellers and foreign sellers of apples, all receiving the same price.
There are no taxes or tariffs on the purchase of apples.
The question is: if the US federal government puts a $0.50 import tariff per apple, what will be the change in the following:
Number of apples bought in the US
Price paid by buyers for apples in the US
Post-tax price received by domestic apple producers
Post-tax price received by foreign apple producers
Before we can answer that question, we need to ask an easier, first question: before instituting the tariff, why do apples cost $1?
And finally, before we dive into the details, let me provide you with the answers to the ultimate question. I recommend you try to guess these answers before reading this, and if you get it wrong, try to understand why:
The number of apples bought will go down
The buyers will pay more for each apple they buy, but not the full amount of the tariff
Domestic apple sellers will receive a higher price per apple
Foreign apple sellers will receive a lower price per apple, but not lowered by the full amount of the tariff
In other words, regardless of who sends the payment to the government, both taxed parties (domestic buyers and foreign sellers) will absorb some of the costs of the tariff, while domestic sellers will benefit from the protectionism provided by tariffs and be able to sell at a higher price per unit.
Let’s say I absolutely love apples, they’re my favorite food. How much would I be willing to pay for a single apple? You might say “$1, that’s the price in the supermarket,” and in many ways you’d be right. If I walk into supermarket A, see apples on sale for $50, and know that I can buy them at supermarket B for $1, I’ll almost certainly leave A and go buy at B.
But that’s not what I mean. What I mean is: how high would the price of apples have to go everywhere so that I’d no longer be willing to buy a single apple? This is a purely personal, subjective opinion. It’s impacted by how much money I have available, other expenses I need to cover, and how much I like apples. But let’s say the number is $5.
How much would I be willing to pay for another apple? Maybe another $5. But how much am I willing to pay for the 1,000th apple? 10,000th? At some point, I’ll get sick of apples, or run out of space to keep the apples, or not be able to eat, cook, and otherwise preserve all those apples before they rot.
The point being: I’ll be progressively willing to spend less and less money for each apple. This form of analysis is called marginal benefit: how much benefit (expressed as dollars I’m willing to spend) will I receive from each apple? This is a downward sloping function: for each additional apple I buy (quantity demanded), the price I’m willing to pay goes down. This is what gives my personal demand curve. And if we aggregate demand curves across all market participants (meaning: everyone interested in buying apples), we end up with something like this:
Assuming no changes in people’s behavior and other conditions in the market, this chart tells us how many apples will be purchased by our buyers at each price point between $0.50 and $5. And ceteris paribus (all else being equal), this will continue to be the demand curve for apples.
Marginal cost
Demand is half the story of economics. The other half is supply, or: how many apples will I sell at each price point? Supply curves are upward sloping: the higher the price, the more a person or company is willing and able to sell a product.
Let’s understand why. Suppose I have an apple orchard. It’s a large property right next to my house. With about 2 minutes of effort, I can walk out of my house, find the nearest tree, pick 5 apples off the tree, and call it a day. 5 apples for 2 minutes of effort is pretty good, right?
Yes, there was all the effort necessary to buy the land, and plant the trees, and water them… and a bunch more than I likely can’t even guess at. We’re going to ignore all of that for our analysis, because for short-term supply-and-demand movement, we can ignore these kinds of sunk costs. One other simplification: in reality, supply curves often start descending before ascending. This accounts for achieving efficiencies of scale after the first number of units purchased. But since both these topics are unneeded for understanding taxes, I won’t go any further.
Anyway, back to my apple orchard. If someone offers me $0.50 per apple, I can do 2 minutes of effort and get $2.50 in revenue, which equates to a $75/hour wage for me. I’m more than happy to pick apples at that price!
However, let’s say someone comes to buy 10,000 apples from me instead. I no longer just walk out to my nearest tree. I’m going to need to get in my truck, drive around, spend the day in the sun, pay for gas, take a day off of my day job (let’s say it pays me $70/hour). The costs go up significantly. Let’s say it takes 5 days to harvest all those apples myself, it costs me $100 in fuel and other expenses, and I lose out on my $70/hour job for 5 days. We end up with:
Total expenditure: $100 + $70 * 8 hours a day * 5 days == $2900
Total revenue: $5000 (10,000 apples at $0.50 each)
Total profit: $2100
So I’m still willing to sell the apples at this price, but it’s not as attractive as before. And as the number of apples purchased goes up, my costs keep increasing. I’ll need to spend more money on fuel to travel more of my property. At some point I won’t be able to do the work myself anymore, so I’ll need to pay others to work on the farm, and they’ll be slower at picking apples than me (less familiar with the property, less direct motivation, etc.). The point being: at some point, the number of apples can go high enough that the $0.50 price point no longer makes me any money.
This kind of analysis is called marginal cost. It refers to the additional amount of expenditure a seller has to spend in order to produce each additional unit of the good. Marginal costs go up as quantity sold goes up. And like demand curves, if you aggregate this data across all sellers, you get a supply curve like this:
Equilibrium price
We now know, for every price point, how many apples buyers will purchase, and how many apples sellers will sell. Now we find the equilibrium: where the supply and demand curves meet. This point represents where the marginal benefit a buyer would receive from the next buyer would be less than the cost it would take the next seller to make it. Let’s see it in a chart:
You’ll notice that these two graphs cross at the $1 price point, where 63 apples are both demanded (bought by consumers) and supplied (sold by producers). This is our equilibrium price. We also have a visualization of the surplus created by these trades. Everything to the left of the equilibrium point and between the supply and demand curves represents surplus: an area where someone is receiving something of more value than they give. For example:
When I bought my first apple for $1, but I was willing to spend $5, I made $4 of consumer surplus. The consumer portion of the surplus is everything to the left of the equilibrium point, between the supply and demand curves, and above the equilibrium price point.
When a seller sells his first apple for $1, but it only cost $0.50 to produce it, the seller made $0.50 of producer surplus. The producer portion of the surplus is everything to the left of the equilibrium point, between the supply and demand curves, and below the equilibrium price point.
Another way of thinking of surplus is “every time someone got a better price than they would have been willing to take.”
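To make the equilibrium-finding idea concrete, here is a toy model in code. The curve shapes are invented stand-ins for the charts (prices in cents, to avoid floating-point drift), chosen only so that they cross at the $1 / 63-apple point described above:
import Data.List (find)

-- Made-up discrete demand and supply schedules: price in cents -> quantity.
demandAt, supplyAt :: Int -> Int
demandAt p = max 0 (90 - (27 * p) `div` 100) -- downward sloping
supplyAt p = max 0 (27 + (36 * p) `div` 100) -- upward sloping

-- The equilibrium is the first price at which supply meets demand.
equilibrium :: Maybe Int
equilibrium = find (\p -> supplyAt p >= demandAt p) [50, 55 .. 500]

main :: IO ()
main = print (equilibrium, fmap demandAt equilibrium) -- (Just 100, Just 63)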
OK, with this in place, we now have enough information to figure out how to price in the tariff, which we’ll treat as a negative externality.
Modeling taxes
Alright, the government has now instituted a $0.50 tariff on every apple sold within the US by a foreign producer. We can generally model taxes by either increasing the marginal cost of each unit sold (shifting the supply curve up), or by decreasing the marginal benefit of each unit bought (shifting the demand curve down). In this case, since only some of the producers will pay the tax, it makes more sense to modify the supply curve.
First, let’s see what happens to the foreign seller-only supply curve when you add in the tariff:
With the tariff in place, for each quantity level, the price at which the seller will sell is $0.50 higher than before the tariff. That makes sense: if I was previously willing to sell my 82nd apple for $3, I would now need to charge $3.50 for that apple to cover the cost of the tariff. We see this as the tariff “pushing up” or “pushing left” the original supply curve.
We can add this new supply curve to our existing (unchanged) supply curve for domestic-only sellers, and we end up with a result like this:
The total supply curve adds up the individual foreign and domestic supply curves. At each price point, we add up the total quantity each group would be willing to sell to determine the total quantity supplied for each price point. Once we have that cumulative supply curve defined, we can produce an updated supply-and-demand chart including the tariff:
As we can see, the equilibrium has shifted:
The equilibrium price paid by consumers has risen from $1 to $1.20.
The total number of apples purchased has dropped from 63 apples to 60 apples.
Consumers therefore received 3 fewer apples. They spent $72 for these 60 apples, whereas previously they spent $63 and got 3 more apples, a definite decrease in consumer surplus.
Foreign producers sold 36 of those apples (see the raw data in the linked Google Sheet), for a gross revenue of $43.20. However, they also need to pay the tariff to the US government, which accounts for $18, meaning they only receive $25.20 post-tariff. Previously, they sold 42 apples at $1 each with no tariff to be paid, meaning they took home $42.
Domestic producers sold the remaining 24 apples at $1.20, giving them a revenue of $28.80. Since they don’t pay the tariff, they take home all of that money. By contrast, previously, they sold 21 apples at $1, for a take-home of $21.
The government receives $0.50 for each of the 60 apples sold, or in other words receives $30 in revenue it wouldn’t have received otherwise.
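As a quick sanity check, the post-tariff outcomes above can be recomputed from the quoted quantities (a throwaway script; every figure comes straight from the text):
main :: IO ()
main = do
  let price       = 1.20 :: Double -- new equilibrium price
      tariff      = 0.50           -- per-apple tariff
      foreignQty  = 36 :: Int
      domesticQty = 24 :: Int
      totalQty    = foreignQty + domesticQty
  print (fromIntegral totalQty * price)              -- consumer spending: 72.0
  print (fromIntegral foreignQty * (price - tariff)) -- foreign take-home: 25.2
  print (fromIntegral domesticQty * price)           -- domestic take-home: 28.8
  print (fromIntegral totalQty * tariff)             -- government revenue: 30.0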
We could be more specific about the surpluses, and calculate the actual areas for consumer surplus, producer surplus, inefficiency from the tariff, and government revenue from the tariff. But I won’t bother, as those calculations get slightly more involved. Instead, let’s just look at the aggregate outcomes:
Consumers were unquestionably hurt. Their price paid went up by $0.20 per apple, and received less apples.
Foreign producers were also hurt. Their price received went down from the original $1 to the new post-tariff price of $1.20, minus the $0.50 tariff. In other words: foreign producers only receive $0.70 per apple now. This hurt can be mitigated by shifting sales to other countries without a tariff, but the pain will exist regardless.
Domestic producers scored. They sell more apples and make more revenue doing it.
And the government walked away with an extra $30.
Hopefully you now see the answer to the original questions. Importantly, while the government imposed a $0.50 tariff, neither side fully absorbed that cost. Consumers paid a bit more, foreign producers received a bit less. The exact details of how that tariff was split across the groups is mediated by the relevant supply and demand curves of each group. If you want to learn more about this, the relevant search term is “price elasticity,” or how much a group’s quantity supplied or demanded will change based on changes in the price.
Other taxes
Most taxes are some kind of tax on trade. A tariff on apples is an obvious one, but the same applies to income tax (taxing the worker for the trade of labor for money) or payroll tax (the same thing, just taxing the employer instead). Interestingly, you can use the same model to analyze things like tax incentives. For example, if the government decided to subsidize domestic apple production by giving domestic producers a $0.50 bonus for each apple they sell, we would end up with a similar kind of analysis, except instead of the foreign supply curve shifting up, we’d see the domestic supply curve shifting down.
And generally speaking, this is what you’ll always see with government involvement in the economy: it disrupts an existing equilibrium, lets the market readjust to a new equilibrium, and incentivizes some behavior, causing some people to benefit and others to lose out. As we saw with the apple tariff, domestic producers and the government benefited while others lost.
You can see the reverse with tax incentives, though. If the government gives a tax incentive in the form of a deduction (income not subject to income tax) for preschool, we would end up with:
The government needs to make up the difference in tax revenue, either by raising taxes on others or by printing more money (leading to inflation). Either way, those paying the taxes or those holding the debased currency will pay a price.
Those people who don’t use the preschool deduction will receive no benefit, so they simply pay a cost.
Those who do use the preschool deduction will end up paying less on tax+preschool than they would have otherwise.
This analysis is fully amoral. It doesn’t say whether providing subsidized preschool is a good thing or not; it simply tells you where the costs will be felt, and points out that such government interference in free economic choice introduces inefficiencies into the system. Once you have that knowledge, you’re better equipped to decide whether the costs of government intervention are worth the benefits.
For many years I have wished for a setup that would allow me to work (that is, code) productively outside in the bright sun. It’s winter right now, but it’s always a bit of a shame to be stuck inside once summer comes around again; this weekend I got closer to that goal.
TL;DR: Using code-server on a beefy machine seems to be quite neat.
Passively lit coding
Personal history
Looking back at my own old blog entries I find one from 10 years ago describing how I bought a Kobo eBook reader with the intent of using it as an external monitor for my laptop. It seems that I got a proof-of-concept setup working, using VNC, but it was tedious to set up, and I never actually used it. I subsequently noticed that the eBook reader is rather useful for reading eBooks, and it has been in heavy use for that ever since.
Four years ago I gave this old idea another shot and bought an Onyx BOOX Max Lumi. This is an A4-sized tablet running Android, with the very promising feature of an HDMI input. So hopefully I could just attach it to my laptop and it would work™. Turns out that this never worked as well as I hoped: even if I set the resolution to exactly the tablet’s screen’s resolution I got blurry output, and it also drained the battery a lot, so I gave up on this. I subsequently noticed that the tablet is rather useful for taking notes, and it has been in sporadic use for that.
Going off on this tangent: I later learned that the HDMI input of this device appears to the system like a camera input, and I don’t have to use Boox’s “monitor” app but could use other apps like FreeDCam as well. This somehow managed to fix the resolution issues, but the setup still wasn’t convenient enough to be used regularly.
I also played around with pure terminal approaches, e.g. SSH’ing into a system, but since my usual workflow was never purely text-based (I was at least used to using a window manager instead of a terminal multiplexer like screen or tmux) that never led anywhere either.
My colleagues have said good things about using VSCode with the remote SSH extension to work on a beefy machine, so I gave this a try now as well, and while it’s not a complete game changer for me, it does make certain tasks (rebuilding everything after switching branches, running the test suite) very convenient. And it’s a bit spooky to run these workloads without the laptop’s fan spinning up.
In this setup, the workspace is remote, but VSCode still runs locally. But it made me wonder about my old goal of being able to work reasonably efficiently on my eInk tablet. Can I replicate this setup there?
VSCode itself doesn’t run on Android directly. There are projects that run Linux in a chroot or in termux on the Android system, and then you can connect to it via VNC (e.g. Andronix)… but that did not seem promising. It seemed fiddly, and I probably should take it easy on the tablet’s system.
code-server, running remotely
A more promising option is code-server. This is a fork of VSCode (actually of VSCodium) that runs completely on the remote machine, and the client machine just needs a browser. I set that up this weekend and found that I was able to do a little bit of work reasonably well.
Access
With code-server one has to decide how to expose it safely enough. I decided against the tunnel-over-SSH option, as I expected that to be somewhat tedious to set up (both initially and for each session) on the Android system, and I liked the idea of being able to use any device to work in my environment.
I also decided against the more involved “reverse proxy behind proper hostname with SSL” setups, because they involve a few extra steps, and some of them I cannot do as I do not have root access on the shared beefy machine I wanted to use.
That left me with the option of using code-server’s built-in support for self-signed certificates and a password:
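The configuration block itself appears to have been lost here; based on code-server’s documented config.yaml options, it presumably looked roughly like this (the password is redacted; the port is a guess, though 8080 matches the URL used later):
# ~/.config/code-server/config.yaml
bind-addr: 0.0.0.0:8080
auth: password
password: <redacted>
cert: true   # generate and use a self-signed certificate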
With trust-on-first-use this seems reasonably secure.
Update: I noticed that the browsers would forget that I trust this self-signed cert after restarting the browser, and also that I cannot “install” the page (as a Progressive Web App) unless it has a valid certificate. But since I don’t have superuser access to that machine, I can’t just follow the official recommendation of using a reverse proxy on port 80 or 443 with automatic certificates. Instead, I pointed a hostname that I control to that machine, obtained a certificate manually on my laptop (using acme.sh) and copied the files over, so the configuration now reads as follows:
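The blocks that followed here were also lost in extraction. Judging from the parenthetical below, they were presumably the updated config.yaml pointing at the real certificate files, plus a user systemd service; roughly like this, with all paths made up:
# ~/.config/code-server/config.yaml
bind-addr: 0.0.0.0:8080
auth: password
password: <redacted>
cert: /home/joachim/certs/host.example.com.crt
cert-key: /home/joachim/certs/host.example.com.key

# ~/.config/systemd/user/code-server.service
[Service]
Environment=PATH=/home/joachim/.nix-profile/bin:/usr/bin:/bin
ExecStart=/home/joachim/.nix-profile/bin/code-server

[Install]
WantedBy=default.target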
(I am using nix as a package manager on a Debian system there, hence the additional PATH and complex ExecStart. If you have a more conventional setup then you do not have to worry about Environment and can likely use ExecStart=code-server.)
For this to survive me logging out I had to ask the system administrator to run loginctl enable-linger joachim, so that systemd allows my jobs to linger.
Git credentials
The next issue to be solved was how to access the git repositories. The work is all on public repositories, but I still need a way to push my work. With the classic VSCode-SSH-remote setup from my laptop, this is no problem: My local SSH key is forwarded using the SSH agent, so I can seamlessly use that on the other side. But with code-server there is no SSH key involved.
I could create a new SSH key and store it on the server. That did not seem appealing, though, because SSH keys on Github always have full access. It wouldn’t be horrible, but I still wondered if I can do better.
I thought of creating fine-grained personal access tokens that allow me only to push code to specific repositories, and nothing else, and just storing them permanently on the remote server. Still a neat and convenient option, but creating PATs for our org requires approval and I didn’t want to bother anyone on the weekend.
So I am experimenting with Github’s git-credential-manager now. I have configured it to use git’s credential cache with an elevated timeout, so that once I log in, I don’t have to do so again for a workday.
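Concretely, that configuration is something along these lines (the option names are git-credential-manager’s documented settings; the eight-hour timeout is my own choice):
git config --global credential.credentialStore cache
git config --global credential.cacheOptions "--timeout 28800"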
To log in, I have to visit https://github.com/login/device on an authenticated device (e.g. my phone) and enter an 8-character code. Not too shabby in terms of security. I only wish that webpage would not require me to press Tab after each character…
This still grants rather broad permissions to the code-server, but at least only temporarily.
Android setup
On the client side I could now open https://host.example.com:8080 in Firefox on my eInk Android tablet, click through the warning about self-signed certificates, log in with the fixed password mentioned above, and start working!
I switched to a theme that supposedly is eInk-optimized (eInk by Mufanza). It’s not perfect (e.g. git diffs are unhelpful because it is not possible to distinguish deleted from added lines), but it’s a start. There are more eInk themes on the official Visual Studio Marketplace, but because code-server is a fork it cannot use that marketplace, and for example this theme isn’t on Open-VSX.
For some reason the F11 key doesn’t work, but going fullscreen is crucial, because screen real estate is scarce in this setup. I can go fullscreen using VSCode’s command palette (Ctrl-P) and invoking the command there, but Firefox often jumps out of fullscreen mode, which is annoying. I still have to pay attention to when that’s happening; maybe it’s the Esc key, which I am of course using a lot due to my use of vim bindings.
A more annoying problem was that on my Boox tablet, the on-screen keyboard would sometimes pop up, which is seriously annoying! It took me a while to track this down: the Boox has two virtual keyboards installed, the usual Google AOSP keyboard and the Onyx Keyboard. The former is clever enough to stay hidden when there is a physical keyboard attached, but the latter isn’t. Moreover, pressing Shift-Ctrl on the physical keyboard rotates through the virtual keyboards. Now, VSCode has many keyboard shortcuts that require Shift-Ctrl (especially on an eInk device, where you really want to avoid using the mouse). And the limited settings exposed by the Boox Android system do not allow you to configure that or to disable the Onyx keyboard! To solve this, I had to install the KISS Launcher, which allowed me to see more Android settings, and in particular to disable the Onyx keyboard. So this is fixed.
I was hoping to improve the experience even more by opening the web page as a Progressive Web App (PWA), as described in the code-server FAQ. Unfortunately, that did not work. Firefox on Android did not recognize the site as a PWA (even though it recognizes a PWA test page). And I couldn’t use Chrome either because (unlike Firefox) it would not consider a site with a self-signed certificate as a secure context, and then code-server does not work fully. Maybe this is just some bug that gets fixed in later versions.
Now that I use a proper certificate, I can use it as a Progressive Web App, and with Firefox on Android this starts the app in full-screen mode (no system bars, no location bar). The F11 key still doesn’t work, and using the command palette to enter fullscreen does nothing visible, but then Esc leaves that fullscreen mode and I suddenly have the system bars again. But maybe if I just don’t do that I get the full-screen experience. We’ll see.
I have not yet worked enough with this to assess how much the smaller screen real estate, the lack of colors and the slower refresh rate will bother me. I probably need to hide Lean’s InfoView more often, and maybe use the Error Lens extension, to avoid having to split my screen vertically.
I also cannot easily work on a park bench this way, with a tablet and a separate external keyboard. I’d need at least a table, or some additional piece of hardware that turns tablet + keyboard into some laptop-like structure that I can put on my, well, lap. There are cases for Onyx products that include a keyboard, and maybe they work on the lap, but they don’t have the Trackpoint that I have on my ThinkPad TrackPoint Keyboard II, and how can you live without that?
Conclusion
After this initial setup chances are good that entering and using this environment is convenient enough for me to actually use it; we will see when it gets warmer.
A few bits could be better. In particular, logging in and authenticating GitHub access could be both more convenient and safer. I could imagine that when I open the page I confirm that on my phone (maybe with a fingerprint), and that this temporarily grants access to the code-server and to specific GitHub repositories only. Is that easily possible?
Below we present some animations that illustrate operations on finite patches of Penrose’s Kite and Dart tiles.
These were created using PenroseKiteDart, a Haskell package available on Hackage that makes use of the Haskell Diagrams package. For details, see the PenroseKiteDart user guide.
Penrose’s Kite and Dart tiles can produce infinite aperiodic tilings of the plane. There are legal tiling rules to ensure aperiodicity, but these rules do not guarantee that a finite tiling will not get stuck. A legal finite tiling which can be continued to cover the whole plane is called a correct tiling. The rest, which are doomed to get stuck, are called incorrect tilings. (More details can be found in the links at the end of this blog.)
Decomposition Animations
The function decompose is a total operation which is guaranteed to preserve the correctness of a finite tiling represented as a tile graph (or Tgraph). Let us start with a particular Tgraph called sunGraph which is defined in PenroseKiteDart and consists of 5 kites arranged with a common origin vertex. It is drawn using default style in figure 1 on the left. On the right of figure 1 it is drawn with both vertex labels and dotted lines for half-tile join edges.
Figure 1: sunGraph
We can decompose sunGraph three times by selecting index 3 of the infinite list of its decompositions.
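In code this is presumably something like the following, using the package’s decompositions (the infinite list of successive decompositions, which we will meet again below with dartGraph):
sunD3 :: Tgraph
sunD3 = decompositions sunGraph !! 3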
The result (sunD3) is drawn in figure 2 (scaled up).
Figure 2: sunD3
The animation in figure 3 illustrates two further decompositions of sunD3 in two stages.
Figure 3: Two decompositions of sunD3
Figure 4 also illustrates two decompositions, this time starting from forcedKingD.
forcedKingD :: Tgraph
forcedKingD = force (decompose kingGraph)
Figure 4: Two decompositions of forcedKingD
A Composition Animation
An inverse to decomposing (namely composing) has some extra intricacies. In the literature (see for example 1 and 2) versions of the following method are frequently described.
Firstly, split darts in half.
Secondly, glue all the short edges of the half-darts where they meet a kite (simultaneously). This will form larger scale complete darts and larger scale half kites.
Finally join the halves of the larger scale kites.
This works for infinite tilings, but we showed in Graphs, Kites and Darts and Theorems that this method is unsound for finite tilings. There is the trivial problem that a half-dart may not have a complete kite on its short edge. Worse still, the second step can convert a correct finite tiling into an incorrect larger scale tiling. An example of this is given in Graphs, Kites and Darts and Theorems, where we also described our own safe method of composing (never producing an incorrect Tgraph when given a correct Tgraph). This composition can leave some boundary half-tiles out of the composition (called remainder half-tiles).
The animation in figure 5 shows such a composition where the remainder half-tiles are indicated with lime green edges.
Figure 5: Composition Animation
In general, compose is a partial operation as the resulting half-tiles can break some requirements for Tgraphs (namely, connectedness and no crossing boundaries). However we have shown that it is a total function on forced Tgraphs. (Forcing is discussed next.)
Forcing Animations
The process of forcing a Tgraph adds half-tiles on the boundary where only one legal choice is possible. This continues until either there are no more forced additions possible, or a clash is found showing that the tiling is incorrect. In the latter case it must follow that the initial tiling before forcing was already an incorrect tiling.
The process of forcing is animated in figure 6, starting with a 5 times decomposed kite and in figure 7 with a 5 times decomposed dart.
Figure 6: Force animation
Figure 7: Another force animation
It is natural to wonder what forcing will do with cut-down (but still correct) Tgraphs. For example, taking just the boundary faces from the final Tgraph shown in the previous animation forms a valid Tgraph (boundaryExample) shown in figure 8.
Applying force to boundaryExample just fills in the hole to recreate force (decompositions dartGraph !! 5) modulo vertex numbering. To make it more interesting we tried removing further half-tiles from boundaryExample to make a small gap. Forcing this also completes the filling in of the boundary half-tiles to recreate force (decompositions dartGraph !! 5). However, we can see that this filling in is constrained to preserve the required Tgraph property of no crossing boundaries, which prevents the tiling closing round a hole.
This is illustrated in the animation shown in figure 9.
Figure 9: Boundary gap animation
As another experiment, we take the boundary faces of a (five times decomposed but not forced) star. When forced this fills in the star and also expands outwards, as illustrated in figure 10.
Figure 10: Star boundary
In the final example, we pick out a shape within a correct Tgraph (ensuring the chosen half-tiles form a valid Tgraph) then animate the force process and then run the animation in both directions (by adding a copy of the frames in reverse order).
The result is shown in figure 11.
Figure 11: Heart animation
Creating Animations
Animations as gif files can be produced by the Haskell Diagrams package using the Rasterific backend.
The main module should import both Diagrams.Prelude and Diagrams.Backend.Rasterific.CmdLine. This will expose the type B standing for the imported backend, and diagrams then have type Diagram B.
An animation should have type [(Diagram B, Int)] and consist of a list of frames for the animation, each paired with an integer delay (in one-hundredths of a second).
The animation can then be passed to mainWith.
module Main (main) where

import Diagrams.Prelude
import Diagrams.Backend.Rasterific.CmdLine

...

fig :: [(Diagram B, Int)]
fig = myExampleAnimation

main :: IO ()
main = mainWith fig
If main is then compiled and run (e.g. with parameters -w 700 -o test.gif) it will produce an output file (test.gif with width 700).
Crossfade tool
The decompose and compose animations were defined using crossfade.
crossfade :: Int -> Diagram B -> Diagram B -> [Diagram B]
crossfade n d1 d2 = map blending ratios
  where
    blending r = opacity (1-r) d1 <> opacity r d2
    ratios = map ((/ fromIntegral n) . fromIntegral) [0..n]
Thus crossfade n d1 d2 produces n+1 frames, each with d1 overlaid on d2 but with varying opacities (decreasing for d1 and increasing for d2).
Adding the same pause (say 10 hundredths of a second) to every frame can be done by applying map (,10), producing an animation.
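For example, a minimal animation using crossfade might look like this (d1 and d2 are placeholder diagrams of my own; the tuple section (,10) needs the TupleSections extension, and the imports are those of the Main module shown earlier):
{-# LANGUAGE TupleSections #-}

-- Two placeholder diagrams, just to have something to fade between.
d1, d2 :: Diagram B
d1 = circle 1 # fc red
d2 = square 2 # fc blue

-- 21 frames, each displayed for 10 hundredths of a second.
fig :: [(Diagram B, Int)]
fig = map (,10) (crossfade 20 d1 d2)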
Force animation tool
To create force animations it was useful to create a tool (forceFrames) to produce frames with stages of forcing. It takes:
an angle argument (to rotate the diagrams in the animation from the default alignment of the Tgraph),
an Int (for the required number of frames),
a Tgraph (to be forced),
a triple of colours for filling darts, kites and grout (edge colour), respectively.
The definition of forceFrames uses stepForce to advance forcing a given number of steps to get the intermediate Tgraphs. The total number of forcing steps will be the number of faces (half-tiles) in the final force g less the number of faces in the initial g. All the Tgraphs are drawn (using colourDKG) but the resulting diagrams must all be aligned properly. The alignment can be achieved by creating a VPatch (vertex patch) from the final Tgraph which is then rotated. All the Tgraphs can then be drawn using sub vertex patches of the final rotated one. (For details see Overlaid examples in the PenroseKiteDart user guide.)
Previous related blogs
Empires and SuperForce – these new operations were based on observing properties of boundaries of forced Tgraphs.
Graphs, Kites and Darts introduced Tgraphs. This gave more details of implementation and results of early explorations. (The class Forcible was introduced subsequently).
Diagrams for Penrose Tiles – the first blog introduced drawing Pieces and Patches (without using Tgraphs) and provided a version of decomposing for Patches (decompPatch).
consider a generic implementation of alpha-beta game tree search with transposition table, generic enough to be applicable to any user-specified game. what should be its API? what features should it provide?
evaluate to infinite depth (possible because of transposition table), returning game value and line (principal variation). intended for small games.
return the transposition table so that it can be reused for subsequent moves.
evaluate to given depth. or, user-specified predicate of whether to stop searching, e.g., quiescence search. quiescence search wants access to the transposition table.
ambitious: because of the many ways game tree search can be customized (for many examples, albeit often poorly described, see the chessprogramming wiki), structure the algorithm as a collection of components, each of which can be modified and hooked together in various ways. I have no idea what language or framework could enable this kind of software engineering, though functional programming languages seem attractive as the first thing to try. but beware that a pure functional programming language such as Haskell easily leaks space for this kind of task, and threading state, such as the transposition table, through the computation may be awkward.
common customizations sacrifice accuracy (correctness or completeness) for speed. for example, if two different evaluated positions have the same key (for example, a 64-bit Zobrist hash in chess), one can optimize by doing no transposition table collision resolution; the second position gets ignored, assumed to have already been evaluated. the default algorithm should not do such optimizations but should allow the user to specify both safe and unsafe optimizations.
allow the search to be augmented with various statistics gathered along the way that get consumed by other user-specified parts of the algorithm. for example, the move generator could order moves based on values of similar moves already evaluated in other parts of the tree.
provide visibility into how user customizations are working, ways to evaluate whether or not they are worth it.
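to make the shape of such an API concrete, here is a deliberately stripped-down sketch in Haskell: plain fixed-depth negamax with a transposition table threaded through and returned for reuse. no alpha-beta pruning, no principal variation, no stop predicates, no statistics; every name here is made up, and the customization hooks above would be extra parameters or type-class methods layered on top.
import qualified Data.Map.Strict as Map

-- A game is specified by its move generator, move application,
-- and a static evaluation from the side to move.
data Game pos move = Game
  { moves    :: pos -> [move]        -- legal moves; empty = terminal
  , play     :: pos -> move -> pos   -- apply a move
  , evaluate :: pos -> Int           -- static score, side to move
  }

-- Transposition table: position -> (depth searched to, value found).
type Table pos = Map.Map pos (Int, Int)

-- Fixed-depth negamax; returns the value and the updated table so
-- the caller can reuse it for subsequent moves.
negamax :: Ord pos => Game pos move -> Int -> pos -> Table pos -> (Int, Table pos)
negamax g depth p tt
  | Just (d, v) <- Map.lookup p tt, d >= depth = (v, tt)  -- table hit
  | depth <= 0 || null ms = (evaluate g p, tt)            -- leaf node
  | otherwise = (val, Map.insert p (depth, val) tt')
  where
    ms = moves g p
    step (best, t) m =
      let (v, t') = negamax g (depth - 1) (play g p m) t
      in (max best (negate v), t')
    (val, tt') = foldl step (minBound `div` 2, tt) ms

the depth check is where a user-supplied “should we stop searching?” predicate (e.g. quiescence) would slot in.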
There’s a common anti-pattern I see in beginner-to-intermediate Haskell programmers that I wanted to discuss today. It’s the tendency to conceptualize the creation of an object by repeated mutation. Often this takes the form of repeated insertion into an empty container, but comes up under many other guises as well.
This anti-pattern isn’t particularly surprising in its prevalence; after all, if you’ve got the usual imperative brainworms, this is just how things get built. The Gang of Four “builder pattern” is exactly this: you construct an empty object, and setters on such a thing change the state but return the object itself. Thus, you build things by chaining together setter methods:
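(The code example that followed was lost here; it was presumably the familiar chain of setters, something like this, with all names made up:)
Doodad doodad = new DoodadBuilder()
    .addWidget(widget1)
    .addWidget(widget2)
    .build();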
Even if you don’t subscribe to the whole OOP design principle thing, you’re still astronomically likely to think about building data structures like this:
Doodad doodad = new Doodad();
foreach (Widget widget in widgets) {
  doodad.addWidget(widget);
}
To be more concrete, maybe instead of doodads and widgets you have BSTs and Nodes. Or dictionaries and key-value pairs. Or graphs and edges. Anywhere you look, you’ll probably find examples of this sort of code.
Maybe you’re thinking to yourself “I’m a hairy-chested functional programmer and I scoff at patterns like these.” That might be true, but perhaps you too are guilty of writing code that looks like:
foldr (\(k, v) m -> Map.insert k v m) Map.empty $ toKVPairs something
Just because it’s dressed up with functional combinators doesn’t mean you’re not still writing C code. To my eye, the great promise of functional programming is its potential for conceptual clarity, and repeated mutation will always fall short of the mark.
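(For what it’s worth, the fold above is morally just the one-liner below, modulo one subtlety: with duplicate keys, the foldr with Map.insert keeps the leftmost value, while Map.fromList keeps the rightmost. That the fold leaves you guessing about exactly this kind of intention is rather the point.)
Map.fromList (toKVPairs something)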
The complaint, as usual, is that repeated mutation tells you how to build something, rather than focusing on what it is you’re building. An algorithm cannot be correct in the absence of intention—after all, you must know what you’re trying to accomplish in order to know if you succeeded. What these builder patterns, for loops, and foldrs all have in common is that they are algorithms: strategies for building something.
But you’ll notice none of them come with comments. And therefore we can only ever guess at what the original author intended, based on the context of the code we’re looking at.
I’m sure this all sounds like splitting hairs, but that’s because the examples so far have been extremely simple. But what about this one?
cgo :: (a -> (UInt, UInt)) -> [a] -> [NonEmpty a]
cgo f = foldr step []
  where
    step a [] = [pure a]
    step a bss0@((b :| bs) : bss)
      | let (al, ac) = f a
      , let (bl, bc) = f b
      , al + 1 == bl && ac == bc
      = (a :| b : bs) : bss
      | otherwise = pure a : bss0
which I found by grepping through haskell-language-server for foldr, and then mangled to remove the suggestive variable names. What does this one do? Based solely on the type we can presume it’s using that function to partition the list somehow. But how? And is it correct? We’ll never know—and the function doesn’t even come with any tests!
It’s Always Monoids
The shift in perspective necessary here is to reconceptualize building-by-repeated-mutation as building-by-combining. Rather than chiseling out the object you want, instead find a way of gluing it together from simple, obviously-correct pieces.
The notion of “combining together” should evoke in you a cozy warm fuzzy feeling. Much like being in a secret pillow fort. You must come to be one with the monoid. Once you have come to embrace monoids, you will have found inner programming happiness. Monoids are a sacred, safe place, at the fantastic intersection of “overwhelmingly powerful” and yet “hard to get wrong.”
As an amazingly fast recap, a monoid is a collection of three things: some type m, some value of that type mempty, and a binary operation over that type (<>) :: m -> m -> m, subject to a bunch of laws:
∀a. mempty <> a = a = a <> mempty
∀a b c. (a <> b) <> c = a <> (b <> c)
which is to say, mempty does nothing and (<>) doesn’t care where you stick the parentheses.
If you’re going to memorize any two particular examples of monoids, it had better be these two:
instance Monoid [a] where
  mempty = []
  a <> b = a ++ b

instance (Monoid a, Monoid b) => Monoid (a, b) where
  mempty = (mempty, mempty)
  (a1, b1) <> (a2, b2) = (a1 <> a2, b1 <> b2)
The first says that lists form a monoid under the empty list and concatenation. The second says that products preserve monoids.
The list monoid instance is responsible for the semantics of the ordered, “sequency” data structures. That is, if I have some sequential flavor of data structure, its monoid instance should probably satisfy the equation toList a <> toList b = toList (a <> b). Sequency data structures are things like lists, vectors, queues, deques, that sort of thing. Data structures where, when you combine them, you assume there is no overlap.
The second monoid instance here, over products, is responsible for pretty much all the other data structures. The first thing we can do with it is remember that functions are just really, really big product types, with one “slot” for every value in the domain. We can show an isomorphism between pairs and functions out of booleans, for example:
from :: (Bool -> a) -> (a, a)
from f = (f False, f True)

to :: (a, a) -> (Bool -> a)
to (a, _) False = a
to (_, a) True  = a
and under this isomorphism, we should thereby expect the Monoid a => Monoid (Bool -> a) instance to agree with Monoid a => Monoid (a, a). If you generalize this out, you get the following instance:
instance Monoid a => Monoid (x -> a) where
  mempty = \_ -> mempty
  f <> g = \x -> f x <> g x
which combines values in the codomain monoidally. We can show the equivalence between this monoid instance and our original product preservation:
from f <> from g
  = (f False, f True) <> (g False, g True)
  = (f False <> g False, f True <> g True)
  = ((f <> g) False, (f <> g) True)
  = from (f <> g)
and
to (a11, a12) <> to (a21, a22)
  = \x -> to (a11, a12) x <> to (a21, a22) x
  = \x -> case x of
      False -> to (a11, a12) False <> to (a21, a22) False
      True  -> to (a11, a12) True  <> to (a21, a22) True
  = \x -> case x of
      False -> a11 <> a21
      True  -> a12 <> a22
  = \x -> to (a11 <> a21, a12 <> a22) x
  = to (a11 <> a21, a12 <> a22)
which is a little proof that our function monoid agrees with the preservation-of-products monoid. The same argument works for any type x in the domain of the function, but showing it generically is challenging.
Anyway, I digress.
The reason to memorize this Monoid instance is that it’s the monoid instance that every data structure is trying to be. Recall that almost all data structures are merely different encodings of functions, designed to make some operations more efficient than they would otherwise be.
Don’t believe me? A Map k v is an encoding of the function k -> Maybe v optimized to efficiently query which k values map to Just something. That is to say, it’s a sparse representation of a function.
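This sparse-function reading suggests the monoid instance a Map “wants”: combine values pointwise, matching the function monoid on k -> Maybe v, rather than Data.Map’s left-biased union. A sketch (this is essentially what the monoidal-containers package provides; the newtype here is mine):
import qualified Data.Map as Map

newtype MonoidalMap k v = MonoidalMap (Map.Map k v)

-- On a key collision, combine the two values with (<>) instead of
-- discarding one of them.
instance (Ord k, Semigroup v) => Semigroup (MonoidalMap k v) where
  MonoidalMap a <> MonoidalMap b = MonoidalMap (Map.unionWith (<>) a b)

instance (Ord k, Semigroup v) => Monoid (MonoidalMap k v) where
  mempty = MonoidalMap Map.empty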
From Theory to Practice
What does all of this look like in practice? Stuff like worrying about foldr is surely programming-in-the-small, which is worth knowing, but isn’t the sort of thing that turns the tides of a successful application.
The reason I’ve been harping on about the function and product monoids is that they are compositional. The uninformed programmer will be surprised by just how far one can get by composing these things.
At work, we need to reduce a tree (+ nonlocal references) into an honest-to-goodness graph. While we’re doing it, we need to collect certain nodes. And the tree has a few constructors which semantically change the scope of their subtrees, so we need to preserve that information as well.
It’s actually quite the exercise to sketch out an algorithm that will accomplish all of these goals when you’re thinking about explicit mutation. Our initial attempts at implementing this were clumsy. We’d fold the tree into a graph, adding fake nodes for the Scope constructors. Then we’d filter all the nodes in the graph, trying to find the ones we needed to collect. Then we’d do a graph traversal from the root, trying to find these Scope nodes, and propagating their information downstream.
Rather amazingly, this implementation kinda sorta worked! But it was slow, and took \(O(10k)\) SLOC to implement.
The insight here is that everything we needed to collect was monoidal:
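(The definition of Solution seems to have been lost in extraction; judging from the deriving stanza discussed next and the helper functions below, it was presumably something like this, with the field types guessed:)
data Solution = Solution
  { graph          :: Graph
  , collectedNodes :: Set Node
  , metadata       :: Map Node Metadata
  }
  deriving stock Generic
  deriving (Semigroup, Monoid) via Generically Solution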
where the deriving (Semigroup, Monoid) via Generically Solution stanza gives us the semigroup and monoid instances that we’d expect from Solution being the product of a bunch of other monoids.
And now for the coup de grace: we hook everything up with the Writer monad. Writer is a chronically slept-on type, because most people seem to think it’s useful only for logging, and underwhelming at logging compared to a real logger type. But the charm is in the details:
instance Monoid w => Monad (Writer w)
Writer w is a monad whenever w is a monoid, which makes it the perfect monad for solving data-structure-creation problems like the one we’ve got in mind. Such a thing gives rise to a few helper functions:
collectNode :: MonadWriter Solution m => Node -> m ()
collectNode n = tell $ mempty { collectedNodes = Set.singleton n }

addMetadata :: MonadWriter Solution m => Node -> Metadata -> m ()
addMetadata n m = tell $ mempty { metadata = Map.singleton n m }

emitGraphFragment :: MonadWriter Solution m => Graph -> m ()
emitGraphFragment g = tell $ mempty { graph = g }
each of which is responsible for adding a little piece to the final solution. Our algorithm is thus a function of the type:
algorithm
    :: Metadata  -- ^ the current scope
    -> Tree      -- ^ the tree we're reducing
    -> Writer Solution Node
       -- ^ our partial solution, and the node corresponding to the root of the tree
which traverses the Tree, recursing with a different Metadata whenever it comes across a Scope constructor, and calling our helper functions as it goes. At each step of the way, the only thing it needs to return is the root Node of the section of the graph it just built, which recursive calls can use to break up the problem into inductive pieces.
This new implementation is roughly 20x smaller, coming in at \(O(500)\) SLOC, and was free of all the bugs we’d been diligently trying to squash under the previous implementation.
Suppose we have a sequence of integers \(a_1, \dots, a_n\) and want to be able to perform two operations:
we can update any \(a_i\) by adding some value \(v\) to it; or
we can perform a range query, which asks for the sum of the values \(a_i + \dots + a_j\) for any range \([i,j]\).
There are several ways to solve this problem. For example:
We could just keep the sequence of integers in a mutable array. Updating is \(O(1)\), but range queries are \(O(n)\) since we must actually loop through the range and add up all the values.
We could keep a separate array of prefix sums on the side, so that \(P_i\) stores the sum \(a_1 + \dots + a_i\). Then the range query on \([i,j]\) can be computed as \(P_j - P_{i-1}\), which only takes \(O(1)\); however, updates now take \(O(n)\) since we must also update all the prefix sums which include the updated element.
We can get the best of both worlds using a segment tree, a binary tree storing the elements at the leaves, with each internal node caching the sum of its children. Then both update and range query can be done in \(O(\lg n)\).
I won’t go through the details of this third solution here, but it is relatively straightforward to understand and implement, especially in a functional language.
However, there is a fourth solution, known as a Fenwick tree or Fenwick array, independently invented by Ryabko (1989) and Fenwick (1994). Here’s a typical Java implementation of a Fenwick tree:
class FenwickTree {
    private long[] a;
    public FenwickTree(int n) { a = new long[n+1]; }

    public long prefix(int i) {
        long s = 0;
        for (; i > 0; i -= LSB(i)) s += a[i];
        return s;
    }
    public void update(int i, long delta) {
        for (; i < a.length; i += LSB(i)) a[i] += delta;
    }
    public long range(int i, int j) { return prefix(j) - prefix(i-1); }
    public long get(int i) { return range(i,i); }
    public void set(int i, long v) { update(i, v - get(i)); }

    private int LSB(int i) { return i & (-i); }
}
I know what you’re thinking: what the heck!? There are some loops adding and subtracting LSB(i), which is defined as the bitwise AND of i and -i? What on earth is this doing? Unless you have seen this before, this code is probably a complete mystery, as it was for me the first time I encountered it.
However, from the right point of view, we can derive this mysterious imperative code as an optimization of segment trees. In particular, in my paper I show how we can:
Start with a segment tree.
Delete some redundant info from the segment tree, and shove the remaining values into an array in a systematic way.
Define operations for moving around in the resulting Fenwick array by converting array indices to indices in a segment tree, moving around the tree appropriately, and converting back.
Describe these operations using a Haskell EDSL for infinite-precision 2’s complement binary arithmetic, and fuse away all the intermediate conversion steps, until the above mysterious implementation pops out.