Twentyseven
is a Rubik’s cube solver and one of my earliest projects in Haskell.
The first commit dates from January 2014, and version 0.0.0 was uploaded on Hackage in March 2016.
I first heard of Haskell in a course on lambda calculus in 2013.
A programming language with lazy evaluation sounded
like a crazy idea, so I gave it a try.
Since then, I have kept writing in Haskell as my favorite language.
For me it is the ideal blend of programming and math.
And a Rubik’s cube solver is a great excuse for doing group theory.
Twentyseven 1.0.0 is more of a commemorative release for myself,
with the goal of making it compile with the current version of GHC (9.12).
There was surprisingly little breakage.
Aside from that, the code is basically just as it was 9 years ago,
including design decisions that I would find questionable today.
For example, I use unsafePerformIO to read precomputed tables
into top-level constants, but the location of the files to read from
can be configured by command-line arguments, so I had better make sure that
the tables are not forced before the location is set…
How Twentyseven works
The input of the program is a string enumerating the 54 facelets
of a Rubik’s cube, where each character represents one color.
The facelets follow a fixed order: they are grouped
by faces (up, left, front, right, back, down), and within each face
they are listed in top-down, left-to-right order.
The output is a sequence of moves to solve that cube.
U L B' L R2 D R U2 F U2 L2 B2 U B2 D' B2 U' R2 U L2 R2 U
The implementation of Twentyseven is based on Herbert Kociemba’s notes
about Cube Explorer, a program written in Pascal!
The search algorithm is iterative deepening A*, or IDA*. Like A*, IDA* finds
the shortest path between two vertices in a graph.
A conventional A* is not feasible because the state space of a Rubik’s cube is massive (43 252 003 274 489 856 000 states,
literally billions of billions).
Instead, we run a series of depth-first searches
with a maximum allowed number of moves that increases for each search.
As it is based on depth-first search,
IDA* only needs memory for the current path,
which is super cheap.
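As a rough illustration (a generic sketch, not Twentyseven’s actual code), the whole search fits in a few lines: a depth-first search with a move budget, retried with ever larger budgets. The helper names succs, h and solved are assumptions for the sketch.

import Data.Maybe (listToMaybe)

-- Generic IDA* sketch: `succs` lists the legal moves from a state,
-- `h` is an admissible lower bound on the remaining moves,
-- `solved` recognizes the goal state.
idaStar :: (s -> [(move, s)]) -> (s -> Int) -> (s -> Bool) -> s -> [move]
idaStar succs h solved s0 =
  head [sol | bound <- [h s0 ..], Just sol <- [dfs s0 bound]]
  where
    dfs s bound
      | solved s    = Just []
      | h s > bound = Nothing  -- prune: cannot finish within the budget
      | otherwise   = listToMaybe
          [ m : rest
          | (m, s') <- succs s
          , Just rest <- [dfs s' (bound - 1)]
          ]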
IDA* relies on an estimate of the number of moves remaining
to reach the solved state. We obtain such an estimate by
projecting the Rubik’s cube state into a simpler puzzle.
For example, we can consider only the permutation of corners,
ignoring their orientation.
We can pre-compute a table mapping each corner permutation
(there are 8! = 40320 of them) to the minimum
number of moves needed to put the corners back in their places.
This is a lower bound on the number of moves to actually solve a Rubik’s cube.
Different projections yield different lower bounds (for example, by
looking at the permutation of edges instead, or their orientation),
and we can combine lower bounds into their maximum,
yielding a more precise lower bound, and thus a faster IDA*.
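As a tiny sketch (hypothetical names, not Twentyseven’s actual API), combining several such lower bounds is just a maximum:

-- Each heuristic is typically a lookup into a precomputed pruning table;
-- the maximum of several admissible lower bounds is still admissible.
combineHeuristics :: [cube -> Int] -> cube -> Int
combineHeuristics hs c = maximum [h c | h <- hs]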
Putting all that together, we obtain an optimal solver for Rubik’s cubes.
But even with these heuristics, Twentyseven can take hours to solve a random cube optimally.
Kociemba’s Cube Explorer is apparently much faster
(I’ve never tried it myself).
My guess is that the difference is due to a better selection of projections,
yielding better heuristics.
But I haven’t gotten around to figuring out whether I’ve misinterpreted
his notes or whether those improvements can only be found in the code.
A faster alternative is Kociemba’s two-phase algorithm.
It is suboptimal, but it solves Rubik’s cubes in a fraction of a second
(1000 cubes per minute).
The first phase puts cubies into a “common orientation”
and “separates” the edges into two groups.
In other words, we reach a state where the permutation
of 12 edges can be decomposed into two disjoint
permutations of 4 and 8 edges respectively.
In the second phase, we restrict the possible moves:
quarter- and half-turns on the top and bottom faces,
half-turns only on the other faces.
These restricted moves preserve the “common orientation” of edges and corners
from phase 1,
and the edges in the middle slice stay in their slice.
Each phase thus performs an IDA* search in a much smaller space
than the full Rubik’s cube state space (2 217 093 120 and 19 508 428 800
states respectively).
Buck2 is a new open source build system developed by Meta (Facebook) which we have already looked at before in some depth; see A Tour Around Buck2, Meta’s New Build System. Since then, Buck2 has gained significant improvements in user experience and language support, making it an increasingly attractive option in the build systems space.
At Tweag, we adhere to high standards for reproducible builds, which Buck2 doesn’t fully uphold in its vanilla configuration. In this post, we will introduce our ruleset that provides integration with Nix. I’ll demonstrate how it can be used, and you will gain insights into how to leverage Nix to achieve more reliable and reproducible builds with Buck2.
Reproducibility, anyone?
In short, Buck2 is a fast, polyglot build tool very similar to Bazel. Notably, it also provides fine-grained distributed caching and even speaks (in its open source variant) the same remote caching and execution protocols used by Bazel. This means you’re able to utilize the same Bazel services available for caching and remote execution.
However, in contrast to Bazel, Buck2 uses a remote-first approach and does not restrict build actions using a sandbox on the local machine. As a result, build actions can be non-hermetic, meaning their outcome might depend on what files or programs happen to be present on the local machine. This lack of hermeticity can lead to non-reproducible builds, which is a critical concern for the effective caching of build artifacts.
Non-hermeticity issues can be elusive, often surfacing unexpectedly for new developers, which affects onboarding of new team members or open source contributors. If left undetected, they can even cause problems down the line in production, which is why we think reproducible builds are important!
Achieving Reproducibility with Nix
If we want reproducible builds, we must not rely on anything installed on the local machine. We need to precisely control every compiler and build tool used in our project. Although defining each and every one of these inside the Buck2 build itself is possible, it would also be a lot of work. Nix can be the solution to this problem.
Nix is a package manager and build system for Linux and Unix-like operating systems. With nixpkgs, there is a very large and comprehensive collection of software packaged using Nix, which is extensible and can be adapted to one’s needs. Most importantly, Nix already strictly enforces hermeticity for its package builds and the nixpkgs collection goes to great lengths to achieve reproducible builds.
So, using Nix to provide compilers and build tools for Buck2 is a way to benefit from that preexisting work and introduce hermetic toolchains into a Buck2 build.
Let’s first quickly look into the Nix setup and proceed with how we can integrate it into Buck2 later.
Nix with flakes
After installing Nix, the nix command is available, and we can start declaring dependencies on packages from nixpkgs in a nix file. The Nix tool uses the Nix language, a domain-specific, purely functional and lazily evaluated programming language to define packages and declare dependencies. The language has some wrinkles, but don’t worry; we’ll only use basic expressions without delving into the more advanced concepts.
For example, here is a simple flake.nix which provides the Rust compiler as a package output:
Note: While flakes have been widely used for a long time, the feature still needs to be enabled explicitly by setting extra-experimental-features = nix-command flakes in the configuration. See the wiki for more information.
In essence, a Nix flake is a Nix expression following a specific schema. It defines its inputs (usually other flakes) and outputs (e.g. packages) which depend on the inputs. In this example the rustc package from nixpkgs is re-used for the output of this flake, but more complex expressions could be used just as well.
In order to build the rustc package output, we can call Nix in the directory of the flake.nix file like this: nix build '.#rustc'. This will either fetch pre-built artifacts of this package from a binary cache if available, or directly build the package if not. The result is the same in both cases: the rustc package output will be available in the local nix store, and from there it can be used just like other software on the system.
The output displayed above illustrates that a Nix build of a single package can produce multiple outputs. In this case the rustc package was split into a default output and an additional, separate output for the man pages.
The default output contains the main binaries such as the Rust compiler:
$ /nix/store/szc39h0qqfs4fvvln0c59pz99q90zzdn-rustc-wrapper-1.86.0/bin/rustc --version
rustc 1.86.0 (05f9846f8 2025-03-31) (built from a source tarball)
It is also important to note that the output of a Nix package depends on the specific nixpkgs revision stored in the flake.lock file, rather than any changes in the local environment. This ensures that each developer checking out the project at any point in time will receive the exact same (reproducible) output no matter what.
Using Buck2
As part of our work for Mercury, a company providing financial services, we developed rules for Buck2 which can be used to integrate packages provided by a nix flake as part of a project’s build. Recently, we have been able to publish these rules, called buck2.nix, as open source under the Apache 2 license.
To use these rules, you need to make them available in your project first. Add the following configuration to your .buckconfig:
[cells]
nix = none

[external_cells]
nix = git

[external_cell_nix]
git_origin = https://github.com/tweag/buck2.nix.git
commit_hash = accae8c8924b3b51788d0fbd6ac90049cdf4f45a # change to use a different version
This configures a cell called nix to be fetched from the specified repository on GitHub. Once set up, you can refer to that cell in your BUCK files and load rules from it.
Note: for clarity, I am going to indicate the file name in the topmost comment of a code block when it is not obvious from the context already.
To utilize a Nix package from Buck2, we need to introduce a new target that runs nix build inside of a build action producing a symbolic link to the nix store path as the build output. Here is how to do that using buck2.nix:
# BUCK
load("@nix//flake.bzl","flake")
flake.package(
name ="rustc",
binary ="rustc",
path ="nix",# path to a nix flake
package ="rustc",# which package to build, default is the value of the `name` attribute
output ="out",# which output to build, this is the default)
Note: this assumes the flake.nix and accompanying flake.lock files are found alongside the BUCK file in the nix subdirectory.
With this build file in place, a new target called rustc is made available which builds the output called out of the rustc package of the given flake. This target can be used as a dependency of other rules in order to generate an output artifact:
# BUCK
genrule(
    name = "rust-info",
    out = "rust-info.txt",
    cmd = "$(exe :rustc) --version > ${OUT}",
)
Note: Buck2 supports expanding references in string parameters using macros, such as the $(exe :rustc) part in the cmd parameter above, which expands to the path of the executable output of the :rustc target.
Using Buck2 (from nixpkgs of course!) to build the rust-info target yields:
$ nix run nixpkgs#buck2 -- build --show-simple-output :rust-info
Build ID: f3fec86b-b79f-4d8e-80c7-acea297d4a64
Loading targets. Remaining 0/10 24 dirs read, 97 targets declared
Analyzing targets. Remaining 0/20 5 actions, 5 artifacts declared
Executing actions. Remaining 0/5 9.6s exec time total
Command: build. Finished 2 local
Time elapsed: 10.5s
BUILD SUCCEEDED
buck-out/v2/gen/root/904931f735703749/__rust-info__/out/rust-info.txt
$ cat buck-out/v2/gen/root/904931f735703749/__rust-info__/out/rust-info.txt
rustc 1.86.0 (05f9846f8 2025-03-31) (built from a source tarball)
For this one-off command we just ran buck2 from the nixpkgs flake on the current system. This is nice for illustration, but it is also not reproducible, and you’ll probably end up with a different Buck2 version when you try this on your machine.
In order to provide the same Buck2 version consistently, let’s add another Nix flake to our project:
This flake defines a default development environment, or dev shell for short. It uses the mkShellNoCC function from nixpkgs which creates an environment where the programs from the given packages are available in PATH.
After entering the shell by running nix develop in the directory of the flake.nix file, the buck2 command has the exact same version for everyone working on the project, as long as the committed flake.lock file is not changed. For convenience, consider using direnv, which automates entering the dev shell as soon as you change into the project directory.
Hello Rust
With all of that in place, let’s have a look at how to build something more interesting, like a Rust project.
Similar to the genrule above, it would be possible to define custom rules utilizing the :rustc target to compile real-world Rust projects. However, Buck2 already ships with rules for various languages in its prelude, including rules to build Rust libraries and binaries.
In a default project setup with Rust these rules would simply use whatever Rust compiler is installed in the system, which may cause build failures due to version mismatches.
To avoid this non-hermeticity, we’re going to instruct the Buck2 rules to use our pinned Rust version from nixpkgs.
Let’s start by preparing such a default setup for the infamous “hello world” example in Rust:
# src/BUCK
rust_binary(
    name = "hello",
    srcs = ["hello.rs"],
)
Toolchains
What’s left to do to make these actually work is to provide a Rust toolchain. In this context, a toolchain is a configuration that specifies a set of tools for building a project, such as the compiler, the linker, and various command-line tools. In this way, toolchains are decoupled from the actual rule definitions and can be easily changed to suit one’s needs.
In Buck2, toolchains are expected to be available in the toolchains cell under a specific name. Conventionally, the toolchains cell is located in the toolchains directory of a project. For example, all the Rust rules depend on the target toolchains//:rust, which is defined in toolchains/BUCK and must provide Rust-specific toolchain information.
The rustc target is defined almost identically to before, but the nix_rust_toolchain rule also expects the rustdoc attribute to be present. In this case, the rustdoc binary is available from the rustc Nix package as well and can be referenced using the sub-target syntax :rustc[rustdoc], which refers to the corresponding item of the binaries attribute given to the flake.package rule.
Additionally, we need to pass in the clippy-driver binary, which is available from the clippy package in the nixpkgs collection. Thus, the flake.nix file needs to be changed by adding the clippy package outputs:
At this point we are able to successfully build and run the target src:hello:
(nix) $ buck2 run src:hello
Build ID: 530a4620-bfb2-454d-bae1-e937ae9e764f
Analyzing targets. Remaining 0/53 75 actions, 101 artifacts declared
Executing actions. Remaining 0/11 1.1s exec time total
Command: run. Finished 3 local
Time elapsed: 0.7s
BUILD SUCCEEDED
Hello, world!
Building a real-world Rust project would be a bit more involved. Here is an interesting article on how one can do that using Bazel.
Note that buck2.nix currently also provides toolchain rules for C/C++ and Python. Have a look at the example project provided by buck2.nix, which you can directly use as a template to start your own project:
$ nix flake new --template github:tweag/buck2.nix my-project
A big thank you to Mercury for their support and for encouraging us to share these rules as open source! If you’re looking for a different toolchain or have other suggestions, feel free to open a new issue. Pull requests are very welcome, too!
If you’re interested in exploring a more tightly integrated solution, you might want to take a look at the buck2-nix project, which also provides Nix integration. Since it defines an alternative prelude that completely replaces Buck2’s built-in rules, we could not use it in our project but drew good inspiration from it.
Conclusion
With the setup shown, we saw that all that is needed really is Nix (pun intended1):
we provide the buck2 binary with Nix as part of a development environment
we leverage Nix inside Buck2 to provide build tools such as compilers, their required utilities and third-party libraries in a reproducible way
Consequently, onboarding new team members no longer means following seemingly endless and quickly outdated installation instructions. Installing Nix is easy; entering the dev shell is fast, and you’re up and running in no time!
And using Buck2 gives us fast, incremental builds by only building the minimal set of dependencies needed for a specific target.
Next time, I will delve into how we seamlessly integrated the Haskell toolchain libraries from Nix and how we made it fast as well.
The name Nix is derived from the Dutch word niks, meaning nothing; build actions don’t see anything that hasn’t been explicitly declared as an input↩
In last week’s article, we learned how to rotate a 2D Matrix in place using Haskell’s mutable array mechanics. This taught us how to think about a Matrix in terms of layers, starting from the outside and moving in towards the center.
Today, we’ll study one more 2D Matrix problem that uses this layer-by-layer paradigm. For more practice dealing with multi-dimensional arrays, check out our Solve.hs course! In Module 2, you’ll study all kinds of different data structures in Haskell, including 2D Matrices (both mutable and immutable).
The Problem
Today’s problem is Spiral Matrix. In this problem, we receive a 2D Matrix, and we would like to return the elements of that matrix in a 1D list in “spiral order”. This ordering consists of starting from the top left and going right. When we hit the top right corner, we move down to the bottom. Then we come back across the bottom row to the left, and then back up toward the top left. Then we continue this process on inner layers.
So, for example, let’s suppose we have this 4x4 matrix:
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
This should return the following list:
[1,2,3,4,8,12,16,15,14,13,9,5,6,7,11,10]
At first glance, it seems like a lot of our layer-by-layer mechanics from last week will work again. All the numbers in the “first” layer come first, followed by the “second” layer, and so on. The trick though is that for this problem, we have to handle non-square matrices. So we can also have this matrix:
1 2 3 4
5 6 7 8
9 10 11 12
This should yield the list [1,2,3,4,8,12,11,10,9,5,6,7]. This isn’t a huge challenge, but we need a slightly different approach.
The Algorithm
We still want to generally move through the Matrix using a layer-by-layer approach. But instead of tracking the 4 corner points, we’ll just keep track of 4 “barriers”, imaginary lines dictating the “end” of each dimension (up/down/left/right) for us to scan. These barriers will be inclusive, meaning that they refer to the last valid row or column in that direction. We would call these “min row”, “min column”, “max row” and “max column”.
Now the general process for going through a layer will consist of 4 steps. Each step starts in a corner location and proceeds in one direction until the next corner is reached. Then, we can start again with the next layer.
The trick is the end condition. Because we can have rectangular matrices, the final layer can have a shape like 1 x n or n x 1, and this is a problem, because we wouldn’t need 4 steps. Even a square n x n matrix with odd n would have a 1x1 as its final layer, which is also a problem since it is unclear which “corner” this coordinate belongs to.
Thus we have to handle these edge cases. However, they are easy to both detect and resolve. We know we are in such a case when “min row” and “max row” are equal, or if “min column” and “max column” are equal. Then to resolve the case, we just do one pass instead of 4, including both endpoints.
Rust Solution
For our Rust solution, let’s start by defining important terms, like we always do. For our terms, we’ll mainly be dealing with these 4 “barrier” values, the min and max for the current row and column. These are inclusive, so they are initially 0 and (length - 1). We also make a new vector to hold our result values.
pub fn spiral_order(matrix: Vec<Vec<i32>>) -> Vec<i32> {
let mut result: Vec<i32> = Vec::new();
let mut minR: usize = 0;
let mut maxR: usize = matrix.len() - 1;
let mut minC: usize = 0;
let mut maxC: usize = matrix[0].len() - 1;
...
}
Now we want to write a while loop where each iteration processes a single layer. We’ll know we are out of layers if either “minimum” exceeds its corresponding “maximum”. Then we can start penciling in the different cases and phases of the loop. The edge cases occur when a minimum is exactly equal to its maximum. And for the normal case, we’ll do our 4-directional scanning.
pub fn spiral_order(matrix: Vec<Vec<i32>>) -> Vec<i32> {
let mut result: Vec<i32> = Vec::new();
let mut minR: usize = 0;
let mut maxR: usize = matrix.len() - 1;
let mut minC: usize = 0;
let mut maxC: usize = matrix[0].len() - 1;
while (minR <= maxR && minC <= maxC) {
// Edge cases: single row or single column layers
if (minR == maxR) {
...
break;
} else if (minC == maxC) {
...
break;
}
// Scan TL->TR
...
// Scan TR->BR
...
// Scan BR->BL
...
// Scan BL->TL
...
minR += 1;
minC += 1;
maxR -= 1;
maxC -= 1;
}
return result;
}
Our “loop update” step comes at the end, when we increase both minimums, and decrease both maximums. This shows we are shrinking to the next layer.
Now we just have to fill in each case. All of these are scans through some portion of the matrix. The only trick is getting the ranges correct for each scan.
We’ll start with the edge cases. For a single row or column scan, we just need one loop. This loop should be inclusive across its dimension. Rust has a similar range syntax to Haskell, but it is less flexible. We can make a range inclusive by using = before the end element.
pub fn spiral_order(matrix: Vec<Vec<i32>>) -> Vec<i32> {
...
while (minR <= maxR && minC <= maxC) {
// Edge cases: single row or single column layers
if (minR == maxR) {
for i in minC..=maxC {
result.push(matrix[minR][i]);
}
break;
} else if (minC == maxC) {
for i in minR..=maxR {
result.push(matrix[i][minC]);
}
break;
}
...
}
return result;
}
Now let’s fill in the other cases. Again, getting the right ranges is the most important factor. We also have to make sure we don’t mix up our dimensions or directions! We go right along minR, down along maxC, left along maxR, and then up along minC.
To represent a decreasing range, we have to make the corresponding incrementing range and then use .rev() to reverse it. This is a little inconvenient, giving us ranges that don’t look as nice, like for i in ((minC+1)..=maxC).rev(), because we want the decrementing range to include maxC but exclude minC.
pub fn spiral_order(matrix: Vec<Vec<i32>>) -> Vec<i32> {
...
while (minR <= maxR && minC <= maxC) {
...
// Scan TL->TR
for i in minC..maxC {
result.push(matrix[minR][i]);
}
// Scan TR->BR
for i in minR..maxR {
result.push(matrix[i][maxC]);
}
// Scan BR->BL
for i in ((minC+1)..=maxC).rev() {
result.push(matrix[maxR][i]);
}
// Scan BL->TL
for i in ((minR+1)..=maxR).rev() {
result.push(matrix[i][minC]);
}
minR += 1;
minC += 1;
maxR -= 1;
maxC -= 1;
}
return result;
}
But once these cases are filled in, we’re done! Here’s the full solution:
pub fn spiral_order(matrix: Vec<Vec<i32>>) -> Vec<i32> {
let mut result: Vec<i32> = Vec::new();
let mut minR: usize = 0;
let mut maxR: usize = matrix.len() - 1;
let mut minC: usize = 0;
let mut maxC: usize = matrix[0].len() - 1;
while (minR <= maxR && minC <= maxC) {
// Edge cases: single row or single column layers
if (minR == maxR) {
for i in minC..=maxC {
result.push(matrix[minR][i]);
}
break;
} else if (minC == maxC) {
for i in minR..=maxR {
result.push(matrix[i][minC]);
}
break;
}
// Scan TL->TR
for i in minC..maxC {
result.push(matrix[minR][i]);
}
// Scan TR->BR
for i in minR..maxR {
result.push(matrix[i][maxC]);
}
// Scan BR->BL
for i in ((minC+1)..=maxC).rev() {
result.push(matrix[maxR][i]);
}
// Scan BL->TL
for i in ((minR+1)..=maxR).rev() {
result.push(matrix[i][minC]);
}
minR += 1;
minC += 1;
maxR -= 1;
maxC -= 1;
}
return result;
}
Haskell Solution
Now let’s write our Haskell solution. We don’t need any fancy mutation tricks here. Our function will just take a 2D array, and return a list of numbers.
import qualified Data.Array as A

spiralMatrix :: A.Array (Int, Int) Int -> [Int]
spiralMatrix arr = ...
  where
    ((minR', minC'), (maxR', maxC')) = A.bounds arr
Since we used a while loop in our Rust solution, it makes sense that we’ll want to use a raw recursive function that we’ll just call f. Our loop state was the 4 “barrier” values in each dimension. We’ll also use an accumulator value for our result. Since our barriers are inclusive, we can simply use the bounds of our array for the initial values.
spiralMatrix :: A.Array (Int, Int) Int -> [Int]
spiralMatrix arr = f minR' minC' maxR' maxC' []
  where
    ((minR', minC'), (maxR', maxC')) = A.bounds arr

    f :: Int -> Int -> Int -> Int -> [Int] -> [Int]
    f = undefined
This recursive function has 3 base cases. First, we have the “loop condition” we used in our Rust solution. If a min dimension value exceeds the max, we are done, and should return our accumulated result list.
Then the other two cases are our edge cases of having a single row or a single column for our final layer. In all these cases, we want to reverse the accumulated list. This means that when we put together our ranges, we want to be careful that they are in reverse order! So the edge cases should start at their max value and decrease to the min value (inclusive).
spiralMatrix :: A.Array (Int, Int) Int -> [Int]
spiralMatrix arr = f minR' minC' maxR' maxC' []
  where
    ((minR', minC'), (maxR', maxC')) = A.bounds arr

    f :: Int -> Int -> Int -> Int -> [Int] -> [Int]
    f minR minC maxR maxC acc
      | minR > maxR || minC > maxC = reverse acc
      | minR == maxR = reverse $ [arr A.! (minR, c) | c <- [maxC,maxC - 1..minC]] <> acc
      | minC == maxC = reverse $ [arr A.! (r, minC) | r <- [maxR,maxR - 1..minR]] <> acc
      | otherwise = ...
Now to fill in the otherwise case, we can do our 4 steps: going right from the top left, then going down from the top right, going left from the bottom right, and going up from the bottom left.
Like the edge cases, we make list comprehensions with ranges to pull the new numbers out of our input matrix. And again, we have to make sure we accumulate them in reverse order. Then we append all of them to the existing accumulation.
spiralMatrix :: A.Array (Int, Int) Int -> [Int]
spiralMatrix arr = f minR' minC' maxR' maxC' []
  where
    ((minR', minC'), (maxR', maxC')) = A.bounds arr

    f :: Int -> Int -> Int -> Int -> [Int] -> [Int]
    f minR minC maxR maxC acc
      ...
      | otherwise =
          let goRights = [arr A.! (minR, c) | c <- [maxC - 1, maxC - 2..minC]]
              goDowns = [arr A.! (r, maxC) | r <- [maxR - 1, maxR - 2..minR]]
              goLefts = [arr A.! (maxR, c) | c <- [minC + 1..maxC]]
              goUps = [arr A.! (r, minC) | r <- [minR+1..maxR]]
              acc' = goUps <> goLefts <> goDowns <> goRights <> acc
          in f (minR + 1) (minC + 1) (maxR - 1) (maxC - 1) acc'
We conclude by making our recursive call with the updated result list, and shifting the barriers to get to the next layer.
Here’s the full implementation:
spiralMatrix :: A.Array (Int, Int) Int -> [Int]
spiralMatrix arr = f minR' minC' maxR' maxC' []
  where
    ((minR', minC'), (maxR', maxC')) = A.bounds arr

    f :: Int -> Int -> Int -> Int -> [Int] -> [Int]
    f minR minC maxR maxC acc
      | minR > maxR || minC > maxC = reverse acc
      | minR == maxR = reverse $ [arr A.! (minR, c) | c <- [maxC,maxC - 1..minC]] <> acc
      | minC == maxC = reverse $ [arr A.! (r, minC) | r <- [maxR,maxR - 1..minR]] <> acc
      | otherwise =
          let goRights = [arr A.! (minR, c) | c <- [maxC - 1, maxC - 2..minC]]
              goDowns = [arr A.! (r, maxC) | r <- [maxR - 1, maxR - 2..minR]]
              goLefts = [arr A.! (maxR, c) | c <- [minC + 1..maxC]]
              goUps = [arr A.! (r, minC) | r <- [minR+1..maxR]]
              acc' = goUps <> goLefts <> goDowns <> goRights <> acc
          in f (minR + 1) (minC + 1) (maxR - 1) (maxC - 1) acc'
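As a quick sanity check (a hypothetical usage sketch, not from the original post), we can build the 3x4 example from above and confirm the spiral order:

-- The 3x4 example matrix from earlier, stored row by row.
example :: A.Array (Int, Int) Int
example = A.listArray ((0, 0), (2, 3)) [1 .. 12]

-- spiralMatrix example == [1,2,3,4,8,12,11,10,9,5,6,7]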
Conclusion
This is the last matrix-based problem we’ll study for now. Next time we’ll start considering some tree-based problems. If you sign up for our Solve.hs course, you’ll learn about both of these kinds of data structures in Module 2. You’ll implement a tree set from scratch, and you’ll get lots of practice working with these and many other structures. So enroll today!
The GHC developers are very pleased to announce the availability
of the release candidate for GHC 9.10.3. Binary distributions, source
distributions, and documentation are available at downloads.haskell.org and
via GHCup.
GHC 9.10.3 is a bug-fix release fixing over 50 issues of a variety of
severities and scopes. A full accounting of these fixes can be found in the
release notes. As always, GHC’s release status, including planned future
releases, can be found on the GHC Wiki status page.
This release candidate will have a two-week testing period. If all goes well
the final release will be available the week of 11 August 2025.
We would like to thank Well-Typed, Tweag I/O, Juspay, QBayLogic, Channable,
Serokell, SimSpace, the Haskell Foundation, and other anonymous contributors
whose on-going financial and in-kind support has facilitated GHC maintenance
and release management over the years. Finally, this release would not have
been possible without the hundreds of open-source contributors whose work
comprises this release.
As always, do give this release a try and open a ticket if you see
anything amiss.
Feature flags reduce deployment risk, enable continuous delivery, and create controlled user experiences. As a Svelte enthusiast, I noticed the absence of official LaunchDarkly support for this growing framework, so I built the LaunchDarkly Svelte SDK to fill this gap. In this post, I’ll introduce the SDK and demonstrate how to implement it in a SvelteKit application.
Feature Flags in Frontend Development
Feature flags (or feature toggles) are runtime-controlled switches that let you enable or disable features without unnecessary deployments.
For example, imagine you are working on a new feature that requires significant changes to the UI. By using feature flags, you can deploy the changes to all environments but only enable the feature in specific ones (like development or UAT), or for a subset of users in a single environment (like users on a Pro subscription). This allows you to test the feature without exposing it to unintended users, reducing the risk of introducing bugs or breaking changes. And if things go wrong, say a feature is not working as expected, you can easily disable it without having to roll back the entire deployment.
What is LaunchDarkly?
LaunchDarkly is a feature management platform that provides an easy and scalable way to wrap parts of your code (new features, UI elements, backend changes) in flags so they can be turned on/off without redeploying. It provides a user-friendly dashboard to manage and observe flags, and supports over a dozen SDKs for client/server platforms.
In my experience, LaunchDarkly is easier to use — including for non-technical users — and more scalable than most home-grown feature flag solutions.
LaunchDarkly supports targeting and segmentation, so you can control which users see specific features based on things like a user’s location or subscription plan. It also offers solid tooling for running experiments, including A/B testing and progressive rollouts (where a new feature is released to users in stages, rather than all at once). All feature flags can be updated in real-time, meaning that there’s no need for users to refresh the page to see changes.
Those are just my favorites, but if you are interested in learning more about it, LaunchDarkly has a blog post with more information.
Flag Evaluations
LaunchDarkly flags have unique identifiers called flag keys that are defined in the LaunchDarkly dashboard. When you request a flag value, supported client-side SDKs (such as React, iOS, Android, or, now, Svelte) send the flag key along with user information (called the “context”) to LaunchDarkly. LaunchDarkly’s server computes the value of the flag using all the applicable rules (the rules are applied in order) and sends the result back to the app. This process is called flag evaluation. By default, LaunchDarkly uses streaming connections to update flags in real time. This lets you flip flags in the dashboard and see the effect almost instantly in your app.
Svelte in Brief
Svelte is a modern JavaScript framework that I’ve come to appreciate for its performance, simplicity, and excellent developer experience. What I particularly like about Svelte is that it lets you write reactive code directly using standard JavaScript variables, with an intuitive syntax that requires less boilerplate than traditional React or Vue applications. Reactive declarations and stores are built into the framework, so you don’t need Redux or similar external state management libraries for most use cases.
Svelte’s Approach
Superior Runtime Performance: Svelte doesn’t rely on a virtual DOM. By eliminating the virtual DOM and directly manipulating the real DOM, Svelte can update the UI more quickly and efficiently, resulting in a more responsive application.
Faster Load Times: Svelte’s compilation process generates smaller JavaScript bundles and more efficient code, resulting in faster initial page load times compared to frameworks that ship runtime libraries to the browser.
A Simple Example of a Svelte Component
In this example, we define a SimpleCounter component that increments a count when a button is clicked. The count variable is reactive, meaning that any changes to it will automatically update the UI.
// SimpleCounter.svelte
<script lang="ts">
  let count = $state(0);
</script>

<button onclick={() => count++}>
  clicks: {count}
</button>
Now, we can use this component in our application, which is in fact just another Svelte component, for example App.svelte:
// App.svelte
<script lang="ts">
  import SimpleCounter from './SimpleCounter.svelte';
</script>

<SimpleCounter />
After doing this, we end up with a button that displays its own click count.
Overview of the LaunchDarkly Svelte SDK
Why Use a Dedicated Svelte SDK?
Although LaunchDarkly’s vanilla JavaScript SDK could be used in a Svelte application, this new SDK aligns better with Svelte’s reactivity model and integrates with Svelte-tailored components, allowing us to use LaunchDarkly’s features more idiomatically in our Svelte projects.
I originally developed it as a standalone project and then contributed it upstream to be an official part of the LaunchDarkly SDK.
Introduction to LaunchDarkly Svelte SDK
Here are some basic steps to get started with the LaunchDarkly Svelte SDK:
1. Install the Package: First, install the SDK package in your project.
Note: Since the official LaunchDarkly Svelte SDK has not been released yet, for the purposes of this blog post, I’ve created a temporary package available on npm that contains the same code as the official repo. You can still check the official source code in LaunchDarkly’s official repository.
npm install @nosnibor89/svelte-client-sdk
2. Initialize the SDK: Next, you need to initialize the SDK with your LaunchDarkly client-side ID (you need a LaunchDarkly account). This is done using the LDProvider component, which provides the necessary context for feature flag evaluation. Here is an example of how to set it up:
<scriptlang="ts">
import { LDProvider } from '@nosnibor89/svelte-client-sdk';
import MyLayout from './MyLayout.svelte';
</script>// Use context relevant to your application. More info in https://docs.launchdarkly.com/home/observability/contextsconst context ={
user:{
key:'user-key',},};<LDProvider clientID="your-client-side-id"{context}><MyLayout/></LDProvider>
Let’s clarify the code above:
Notice how I wrapped the MyLayout component with the LDProvider component. Usually, you will wrap a high-level component that encompasses most of your application with LDProvider, although it’s up to you and how you want to structure the app.
You can also see the two parameters provided to our LDProvider. The "your-client-side-id" refers to the LaunchDarkly Client ID, and the context object refers to the LaunchDarkly Context used to evaluate feature flags. This is information we need to provide for the SDK to work correctly.
3. Evaluate a flag: The SDK provides the LDFlag component for evaluating your flag1. This component covers a common use case where you want to render different content based on the state of a feature flag. By default, LDFlag takes a boolean flag but can be extended to work with the other LaunchDarkly flag types as well.
<scriptlang="ts">
import { LDFlag } from '@nosnibor89/svelte-client-sdk';
</script><LDFlagflag={'my-feature-flag'}>{#snippet on()}<p>renders if flag evaluates to true</p>{/snippet}{#snippet off()}<p>renders if flag evaluates to false</p>{/snippet}</LDFlag>
In this example, the LDFlag component will render the content inside the on snippet2 if the feature flag my-feature-flag evaluates to true. If the flag evaluates to false, the content inside the off snippet will be rendered instead.
Building an application with SvelteKit
Now that we have seen the basics of how to use the LaunchDarkly Svelte SDK, let’s see how we can put everything together in a real application.
For the sake of brevity, I’ll be providing the key source code in this example, but if you are curious or need help, you can check out the full source code on GitHub.
How the app works
This is a simple ‘movies’ app where the main page displays a list of movies in a card format with a SearchBar component at the top. This search bar allows users to filter movies based on the text entered.
The scenario we’re simulating is that Product Owners want to replace the traditional search bar with a new AI-powered assistant that helps users get information about specific movies. This creates a perfect use case for feature flags and can be described as follows:
Feature Flag Scenarios
SearchBar vs AI Assistant: We’ll use a boolean feature flag to determine whether to display the classic SearchBar component or the new MoviesSmartAssistant3 component - simulating a simple all-at-once release.
AI Model Selection: We’ll use a JSON feature flag to determine which AI model (GPT or Gemini) the MoviesSmartAssistant will use. This includes details about which model to use for specific users, along with display information like labels. This simulates a progressive rollout where Product Owners can gather insights on which model performs better.
Prerequisites
To follow along, you’ll need:
A LaunchDarkly account
A LaunchDarkly Client ID (Check this guide to get it)
Two feature flags (see the creating new flags guide): a boolean flag (show-movie-smart-assistant) and a JSON flag (smart-assistant-config) looking like this:
{"model":"gpt-4","label":"Ask GPT-4 anything"}
A SvelteKit4 application (create with npx sv create my-app)
Integrating the LaunchDarkly Svelte SDK
After creating the project, you will have a scaffolded SvelteKit application, meaning you should have a src directory where your application code resides. Inside this folder, you will find a routes directory, which is where SvelteKit handles routing. More specifically, there are two files, +layout.svelte and +page.svelte, which are the main files we are going to highlight in this post.
Setting up the layout
// src/routes/+layout.svelte
<script lang="ts">
  import "../app.css";
  import { LDProvider } from "@nosnibor89/svelte-client-sdk";
  import { PUBLIC_LD_CLIENT_ID } from '$env/static/public';
  import LoadingSpinner from "$lib/LoadingSpinner.svelte"; // Check source code in Github https://github.com/tweag/blog-resources/blob/master/launchdarkly-svelte-sdk-intro/src/lib/LoadingSpinner.svelte

  let { children } = $props();

  // random between 0 and 1
  const orgId = Math.round(Math.random());
  const orgKey = `sdk-example-org-${orgId}`;

  const ldContext = {
    kind: "org",
    key: orgKey,
  };
</script>

<LDProvider clientID={PUBLIC_LD_CLIENT_ID} context={ldContext}>
  {#snippet initializing()}
    <div class="...">
      <LoadingSpinner message={"Loading flags"} />
    </div>
  {/snippet}
  {@render children()}
</LDProvider>
Let’s analyze this:
We are importing the LDProvider component from the LaunchDarkly Svelte SDK and wrapping our layout with it. In SvelteKit, the layout will act as the entry point for our application, so it’s a good place for us to initialize the SDK allowing us to use other members of the SDK in pages or child components.
We are also importing the PUBLIC_LD_CLIENT_ID variable from the environment variables. You can set this variable in your .env file at the root of the project (this is a SvelteKit feature).
Another thing to notice is that we are using a LoadingSpinner component while the SDK is initializing. This is optional and is a good place to provide feedback to the user while the SDK is loading and feature flags are being evaluated for the first time. Also, don’t worry about the code for LoadingSpinner; you can find it in the source code on GitHub.
Add the movies page
At this point, we are ready to start evaluating flags, so let’s now go ahead and add our page where the SDK will help us accomplish scenarios 1 and 2.
Movies Page (SearchBar vs AI Assistant)
The movies page is the main and only page of our application. It displays a list of movies along with a search bar. This is the part where we will evaluate our first feature flag to switch between the SearchBar and the MoviesSmartAssistant components.
// src/routes/+page.svelte
<script lang="ts">
  // ...some imports hidden for brevity. Check source code on Github
  import SearchBar from "$lib/SearchBar.svelte";
  import MoviesSmartAssistant from "$lib/MoviesSmartAssistant.svelte";
  import { LD, LDFlag } from "@nosnibor89/svelte-client-sdk";

  let searchQuery = $state("");
  let prompt = $state("");

  const flagKey = "show-movie-smart-assistant";
  const flagValue = LD.watch(flagKey);

  flagValue.subscribe((value) => {
    // remove search query or prompt when flag changes
    searchQuery = "";
    prompt = "";
  });

  // ...rest of the code hidden for brevity. Check source code on Github
  // https://github.com/tweag/blog-resources/blob/master/launchdarkly-svelte-sdk-intro/src/routes/%2Bpage.svelte
</script>

<div class="...">
  <LDFlag flag={flagKey}>
    {#snippet on()}
      <MoviesSmartAssistant prompt={prompt} onChange={handlePromptChange} onSubmit={handleSendPrompt} />
    {/snippet}
    {#snippet off()}
      <SearchBar value={searchQuery} onSearch={handleSearch} />
    {/snippet}
  </LDFlag>
  <div class="...">
    {#each filteredMovies as movie}
      <MovieCard {movie} />
    {/each}
  </div>
</div>
Again, let’s break this down:
We are using the LDFlag component from the SDK. It will allow us to determine which component to render based on the state of the show-movie-smart-assistant feature flag. When the flag evaluates to true, the on snippet will run, meaning the MoviesSmartAssistant component will be rendered, and when the flag evaluates to false, the off snippet will run, meaning the SearchBar component will be rendered.
We are also using the LD.watch function. This is useful when you need to get the state of a flag and keep track of it. In this case, we are simply resetting the search query or prompt so that the user can start fresh when the flag changes.
The rest of the code you are not seeing is just functionality for the filtering mechanism and the rest of the presentational components. Remember, you can find the code for those on GitHub.
MoviesSmartAssistant Component (AI Model Selection)
Whenever our MoviesSmartAssistant component is rendered, we want to check the value of the smart-assistant-config feature flag to determine which AI model to use for the assistant.
// src/lib/MoviesSmartAssistant.svelte
<script lang="ts">
  import { LD } from "@nosnibor89/svelte-client-sdk";
  import type { Readable } from "svelte/store";

  type MoviesSmartAssistantConfig = { model: string; label: string };

  const smartAssistantConfig = LD.watch("smart-assistant-config") as Readable<MoviesSmartAssistantConfig>;

  // ... rest of the code hidden for brevity. Check source code on Github
  // https://github.com/tweag/blog-resources/blob/master/launchdarkly-svelte-sdk-intro/src/lib/MoviesSmartAssistant.svelte
</script>

<div class="...">
  <input
    type="text"
    placeholder={$smartAssistantConfig?.label ?? "Ask me anything..."}
    value={prompt}
    oninput={handleInput}
    class="..."
  />
  <button type="button" onclick={handleClick} aria-label="Submit">
    <!-- ...svg code hidden for brevity -->
  </button>
</div>
As before, I’m hiding some code for brevity, but here are the key points:
We are using the LD.watch method to watch for changes in the smart-assistant-config feature flag which contains information about the AI model. This will allow us to use the proper model for a given user based on the flag evaluation.
Notice how the SDK understands it’s a JSON flag and returns a JavaScript object (with a little help5) as we defined in the LaunchDarkly dashboard.
Running the Application
Now that we have everything set up, let’s run the application. Here we are going to use the Client ID and set it as an environment variable.
PUBLIC_LD_CLIENT_ID={your_client_id} npm run dev
Open your browser and navigate to http://localhost:5173 (check your terminal as it may run at a different port). You should see the movies application with either the SearchBar or MoviesSmartAssistant component depending on your feature flag configuration.
Seeing Feature Flags in Action
If you were able to correctly set everything up, you should be able to interact with the application and LaunchDarkly Dashboard by toggling the feature flags and validating the behavior of the application.
I have included this demo video to show you how the application works and how the feature flags are being evaluated.
Conclusion
We just saw how to use the LaunchDarkly Svelte SDK and integrate it into a SvelteKit application using a realistic example. I hope this post gave you an understanding of the features the SDK provides, and also of what it still lacks in these early stages while awaiting the official release.
For now, my invitation for you is to try the SDK yourself and explore different use cases. For example, change the context with LD.identify to simulate users signing in to an application, or maybe try a different flag type like a string or number flag. Also, stay tuned for updates on the official LaunchDarkly Svelte SDK release.
LDFlag is a key component but there are other ways to evaluate a flag using the SDK.↩
The MoviesSmartAssistant component is just a visual representation without actual AI functionality — my focus is on demonstrating how the LaunchDarkly Svelte SDK enables these feature flag implementations.↩
SvelteKit is the official application framework for Svelte. It comes with out-of-the-box support for TypeScript, server-side rendering, and automatic routing through file-based organization.↩
Ok, I’m also using TypeScript here to hint the type of the object returned by the LD.watch method. Maybe this is something to fix in the future.↩
Today, 2025-07-23, at 1830 UTC (11:30 am PDT, 2:30 pm EDT, 7:30 pm BST, 20:30 CEST, …)
we are streaming the 47th episode of the Haskell Unfolder live on YouTube.
“Pure parallelism” refers to the execution of pure Haskell functions on multiple CPU cores, (hopefully) speeding up the computation. Since we are still dealing with pure functions, however, we get none of the problems normally associated with concurrent execution: no non-determinism, no need for locks, etc. In this episode we will develop a pure but parallel implementation of linear regression. We will briefly recap how linear regression works, before discussing the two primitive functions that Haskell offers for pure parallelism: par and pseq.
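The episode builds up a parallel linear regression; as a much smaller appetizer (not the episode’s code, just an illustration of the two primitives), here is how par and pseq from Control.Parallel can evaluate two independent sums at the same time:

import Control.Parallel (par, pseq)

-- `par` sparks the evaluation of its first argument in parallel;
-- `pseq` forces its first argument before returning its second.
sumAndSumSquares :: [Double] -> (Double, Double)
sumAndSumSquares xs =
  let s  = sum xs
      s2 = sum (map (^ 2) xs)
  in s `par` (s2 `pseq` (s, s2))

-- Compile with -threaded and run with +RTS -N to actually use several cores.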
About the Haskell Unfolder
The Haskell Unfolder is a YouTube series about all things Haskell hosted by
Edsko de Vries and Andres Löh, with episodes appearing approximately every two
weeks. All episodes are live-streamed, and we try to respond to audience
questions. All episodes are also available as recordings afterwards.
In last week’s article, we took our first step into working with multi-dimensional arrays. Today, we’ll be working with another Matrix problem that involves in-place mutation. The Haskell solution uses the MArray interface, which takes us out of our usual immutable comfort zone.
The MArray interface is a little tricky to work with. If you want a full overview of the API, you should sign up for our Solve.hs course, where we cover mutable arrays in module 2!
The Problem
Today’s problem is Rotate Image. We’re going to take a 2D Matrix of integer values as our input and rotate the matrix 90 degrees clockwise. We must accomplish this in place, modifying the input value without allocating a new Matrix. The input matrix is always “square” (n x n).
Here are a few examples to illustrate the idea. We can start with a 2x2 matrix:
1 2 | 3 1
3 4 | 4 2
The 4x4 rotation makes it more clear that we’re not just moving numbers one space over. Each corner element will go to a new corner. You can also see how the inside of the matrix is also rotating:
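1 2 3 4     | 13 9 5 1
5 6 7 8     | 14 10 6 2
9 10 11 12  | 15 11 7 3
13 14 15 16 | 16 12 8 4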
The 3x3 version shows how, with an odd number of rows and columns, the innermost number will stand still.
1 2 3 | 7 4 1
4 5 6 | 8 5 2
7 8 9 | 9 6 3
The Algorithm
While this problem might be a little intimidating at first, we just have to break it into sufficiently small and repeatable pieces. The core step is that we swap four numbers into each other’s positions. It’s easy to see, for example, that the four corners always trade places with one another (1, 4, 13, 16 in the 4x4 example).
What’s important is seeing the other sets of 4. We move clockwise to get the next 4 values:
The value to the right of the top left corner
The value below the top right corner
The value to the left of the bottom right corner
The value above the bottom left corner.
So in the 4x4 example, these would be 2, 8, 15, 9. Then another group is 3, 12, 14, 5.
Those 3 groups are all the rotations we need for the “outer layer”. Then we move to the next layer, where we have a single group of 4: 6, 7, 10, 11.
This should tell us that we have a 3-step process:
Loop through each layer of the matrix
Identify all groups of 4 in this layer
Rotate each group of 4
It helps to put a count on the size of each of these loops. For an n x n matrix, the number of layers to rotate is n / 2, rounded down, because the inner-most layer needs no rotation in an odd-sized matrix.
Then for a layer spanning from column c1 to c2, the number of groups in that layer is just c2 - c1. So for the first layer in a 4x4, we span columns 0 to 3, and there are 3 groups of 4. In the inner layer, we span columns 1 to 2, so there is only 1 group of 4.
Rust Solution
As is typical, we’ll see more of a loop structure in our Rust code, and a recursive version of this solution in Haskell. We’ll also start by defining various terms we’ll use. There are multiple ways to approach the details of this problem, but we’ll take an approach that maximizes the clarity of our inner loops.
We’ll define each “layer” using the four corner coordinates of that layer. So for an n x n matrix, these are (0,0), (0, n - 1), (n - 1, n - 1), (n - 1, 0). After we finish looping through a layer, we can simply increment/decrement each of these values as appropriate to get the corner coordinates of the next layer ((1,1), (1, n - 2), etc.).
So let’s start our solution by defining the 8 mutable values for these 4 corners. Each corner (top/left/bottom/right) has a row R and column C value.
pub fn rotate(matrix: &mut Vec<Vec<i32>>) {
let n = matrix.len();
let numLayers = n / 2;
let mut topLeftR = 0;
let mut topLeftC = 0;
let mut topRightR = 0;
let mut topRightC = n - 1;
let mut bottomRightR = n - 1;
let mut bottomRightC = n - 1;
let mut bottomLeftR = n - 1;
let mut bottomLeftC = 0;
...
}
It would be possible to solve the problem without these values, determining coordinates using the layer number. But I’ve found this to be somewhat more error prone, since we’re constantly adding and subtracting from different coordinates in different combinations. We get the number of layers from n / 2.
Now let’s frame the outer loop. We conclude the loop by modifying each coordinate point. Then at the beginning of the loop, we can determine the number of “groups” for the layer by taking the difference between the left and right column coordinates.
pub fn rotate(matrix: &mut Vec<Vec<i32>>) {
...
for i in 0..numLayers {
let numGroups = topRightC - topLeftC;
for j in 0..numGroups {
...
}
topLeftR += 1;
topLeftC += 1;
topRightR += 1;
topRightC -= 1;
bottomRightR -= 1;
bottomRightC -= 1;
bottomLeftR -= 1;
bottomLeftC += 1;
}
}
Now we just need the logic for rotating a single group of 4 points. This is a 5-step process:
Save top left value as temp
Move bottom left to top left
Move bottom right to bottom left
Move top right to bottom right
Move temp (original top left) to top right
Unlike the layer number, we’ll use the group variable j for arithmetic here. When you’re writing this yourself, it’s important to go slowly to make sure you’re using the right corner values and adding/subtracting j from the correct dimension.
pub fn rotate(matrix: &mut Vec<Vec<i32>>) {
...
for i in 0..numLayers {
let numGroups = topRightC - topLeftC;
for j in 0..numGroups {
let temp = matrix[topLeftR][topLeftC + j];
matrix[topLeftR][topLeftC + j] = matrix[bottomLeftR - j][bottomLeftC];
matrix[bottomLeftR - j][bottomLeftC] = matrix[bottomRightR][bottomRightC - j];
matrix[bottomRightR][bottomRightC - j] = matrix[topRightR + j][topRightC];
matrix[topRightR + j][topRightC] = temp;
}
... // (update corners)
}
}
And then we’re done! We don’t actually need to return a value since we’re just modifying the input in place. Here’s the full solution:
pub fn rotate(matrix: &mut Vec<Vec<i32>>) {
let n = matrix.len();
let numLayers = n / 2;
let mut topLeftR = 0;
let mut topLeftC = 0;
let mut topRightR = 0;
let mut topRightC = n - 1;
let mut bottomRightR = n - 1;
let mut bottomRightC = n - 1;
let mut bottomLeftR = n - 1;
let mut bottomLeftC = 0;
for i in 0..numLayers {
let numGroups = topRightC - topLeftC;
for j in 0..numGroups {
let temp = matrix[topLeftR][topLeftC + j];
matrix[topLeftR][topLeftC + j] = matrix[bottomLeftR - j][bottomLeftC];
matrix[bottomLeftR - j][bottomLeftC] = matrix[bottomRightR][bottomRightC - j];
matrix[bottomRightR][bottomRightC - j] = matrix[topRightR + j][topRightC];
matrix[topRightR + j][topRightC] = temp;
}
topLeftR += 1;
topLeftC += 1;
topRightR += 1;
topRightC -= 1;
bottomRightR -= 1;
bottomRightC -= 1;
bottomLeftR -= 1;
bottomLeftC += 1;
}
}
Haskell Solution
This is an interesting problem to solve in Haskell because Haskell is a generally immutable language. Unlike Rust, we can’t make values mutable just by putting the keyword mut in front of them.
With arrays, though, we can modify them in place using the MArray type class. We won’t go through all the details of the interface in this article (you can learn about all that in Solve.hs Module 2). But we’ll start with the type signature:
rotateImage :: (MArray array Int m) => array (Int, Int) Int -> m ()
This tells us we are taking a mutable array, where the array type is polymorphic but tied to the monad m. For example, IOArray would work with the IO monad. We don’t return anything, because we’re modifying our input.
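For instance, once the function is finished, it could be used on an IOArray like this (a hypothetical usage sketch, not from the post):

import Data.Array.IO (IOArray, newListArray, getElems)

demo :: IO [Int]
demo = do
  arr <- newListArray ((0, 0), (2, 2)) [1 .. 9] :: IO (IOArray (Int, Int) Int)
  rotateImage arr
  getElems arr -- [7,4,1,8,5,2,9,6,3], the rotated 3x3 example read row by row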
We still begin our function by defining terms, but now we need to use monadic actions to retrieve even the bounds of our array.
rotateImage :: (MArray array Int m) => array (Int, Int) Int -> m ()
rotateImage arr = do
  ((minR, minC), (maxR, maxC)) <- getBounds arr
  let n = maxR - minR + 1
  let numLayers = n `quot` 2
  ...
Our algorithm has two loop levels. The outer loop goes through the different layers of the matrix. The inner loop goes through each group of 4 within the layer. In Haskell, both of these loops are recursive, monadic functions. Our Rust loops treat the four corner points of the layer as stateful values, so these need to be inputs to our recursive functions. In addition, each function will take the layer/group number as an input.
rotateImage :: (MArray array Int m) => array (Int, Int) Int -> m ()
rotateImage arr = do
  ((minR, minC), (maxR, maxC)) <- getBounds arr
  let n = maxR - minR + 1
  let numLayers = n `quot` 2
  ...
  where
    rotateLayer tl@(tlR, tlC) tr@(trR, trC) br@(brR, brC) bl@(blR, blC) n = ...

    rotateGroup (tlR, tlC) (trR, trC) (brR, brC) (blR, blC) j = ...
Now we just have to fill in these functions. For rotateLayer, we use the “layer number” parameter as a countdown. Once it reaches 0, we’ll be done. We just need to determine the number of groups in this layer using the column difference of left and right. Then we’ll call rotateGroup for each group.
We make the first call to rotateLayer with numLayers and the original corners, coming from our dimensions. When we recurse, we add/subtract 1 from the corner dimensions, and subtract 1 from the layer number.
rotateImage :: (MArray array Int m) => array (Int, Int) Int -> m ()
rotateImage arr = do
((minR, minC), (maxR, maxC)) <- getBounds arr
let n = maxR - minR + 1
let numLayers = n `quot` 2
rotateLayer (minR, minC) (minR, maxC) (maxR, maxC) (maxR, minC) numLayers
where
rotateLayer _ _ _ _ 0 = return ()
rotateLayer tl@(tlR, tlC) tr@(trR, trC) br@(brR, brC) bl@(blR, blC) n = do
let numGroups = ([0..(trC - tlC - 1)] :: [Int])
forM_ numGroups (rotateGroup tl tr br bl)
rotateLayer (tlR + 1, tlC + 1) (trR + 1, trC - 1) (brR - 1, brC - 1) (blR - 1, blC + 1) (n - 1)
rotateGroup (tlR, tlC) (trR, trC) (brR, brC) (blR, blC) j = ...
And how do we rotate a group? We use the same five steps we took in Rust. We save the top left as temp and then move the values around. We use the monadic functions readArray and writeArray to perform these actions in place on our Matrix.
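For reference, here is one plausible body for rotateGroup (a sketch of my own, not necessarily the course’s exact code). It mirrors the five Rust steps, with arr being the array bound in rotateImage, and readArray/writeArray coming from Data.Array.MArray:
rotateGroup (tlR, tlC) (trR, trC) (brR, brC) (blR, blC) j = do
  -- Save top left, then shift bottom left -> top left, bottom right ->
  -- bottom left, top right -> bottom right, and finally temp -> top right.
  temp <- readArray arr (tlR, tlC + j)
  readArray arr (blR - j, blC) >>= writeArray arr (tlR, tlC + j)
  readArray arr (brR, brC - j) >>= writeArray arr (blR - j, blC)
  readArray arr (trR + j, trC) >>= writeArray arr (brR, brC - j)
  writeArray arr (trR + j, trC) temp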
We’ve got one more Matrix problem to solve next time, and then we’ll move on to some other data structures. To learn more about using Data Structures and Algorithms in Haskell, you can take our Solve.hs course. You’ll get the chance to write a number of data structures from scratch, and you’ll get plenty of practice working with them and using them in algorithms!
Continuing a series of
posts
on techniques for calculating range queries, today I will present
the sparse table data structure, for doing fast range queries on a
static sequence with an idempotent combining operation.
Motivation
In my previous
post,
we saw that if we have a static sequence and a binary operation with a
group structure (i.e. every element has an inverse), we can
precompute a prefix sum table in \(O(n)\) time, and then use it to answer
arbitrary range queries in \(O(1)\) time.
What if we don’t have inverses? We can’t use prefix sums, but can we
do something else that still allows us to answer range queries in
\(O(1)\)? One thing we could always do would be to construct an \(n \times n\) table storing the answer to every possible range
query—that is, \(Q[i,j]\) would store the value of the range \(a_i \diamond \dots \diamond a_j\). Then we could just look up the answer to
any range query in \(O(1)\). Naively computing the value of each
\(Q[i,j]\) would take \(O(n)\) time, for a total of \(O(n^3)\) time to fill
in each of the entries in the table (we only have to fill in \(Q[i,j]\)
where \(i < j\), but this is still about \(n^2/2\) entries), though it’s not
too hard to fill in the table in \(O(n^2)\) total time, spending only
\(O(1)\) to fill in each entry—I’ll leave this to you as an exercise.
However, \(O(n^2)\) is often too big. Can we do better? More
generally, we are looking for a particular subset of range queries
to precompute, such that the total number is asymptotically less than
\(n^2\), but we can still compute the value of any arbitrary range query
by combining some (constant number of) precomputed ranges. In the case
of a group structure, we were able to compute the values for only
prefix ranges of the form \(1 \dots k\), then compute the value of an arbitrary
range using two prefixes, via subtraction.
A sparse table is exactly such a scheme for precomputing a subset of
ranges. (In fact, I believe, but do not know for sure, that this is
where the name “sparse table” comes from—it is “sparse” in the sense
that it only stores a sparse subset of range values.)
Rather than only
a linear number of ranges, as with prefix sums, we have to compute
\(O(n \lg n)\) of them, but that’s still way better than \(O(n^2)\). Note,
however, that a sparse table only works when the combining operation
is idempotent, that is, when \(x \diamond x = x\) for all \(x\). For
example, we can use a sparse table with combining operations such as
\(\max\) or \(\gcd\), but not with \(+\) or \(\times\). Let’s see how it works.
Sparse tables
The basic idea behind a sparse table is that we precompute a series of
“levels”, where level \(i\) stores values for ranges of length \(2^i\). So level
\(0\) stores “ranges of length \(1\)”—that is, the elements of the
original sequence; level \(1\) stores ranges of length \(2\); level
\(2\) stores ranges of length \(4\); and so on. Formally, \(T[i,j]\)
stores the value of the range of length \(2^i\) starting at index \(j\).
That is,
\[T[i,j] = a_j \diamond a_{j+1} \diamond \dots \diamond a_{j + 2^i - 1}\]
We can see that \(i\) only needs to go from \(0\) up to \(\lfloor \lg n \rfloor\); above that and the stored ranges would be larger than
the entire sequence. So this table has size \(O(n \lg n)\).
Two important questions remain: how do we compute this table in the
first place? And once we have it, how do we use it to answer arbitrary
range queries in \(O(1)\)?
Computing the table is easy: each range on level \(i\), of length \(2^i\), is the
combination of two length-\(2^{i-1}\) ranges from the previous level. That is,
\[T[i,j] = T[i-1, j] \diamond T[i-1, j+2^{i-1}]\]
The zeroth level just consists of the elements of the original
sequence, and we can compute each subsequent level using values from
the previous level, so we can fill in the entire table in \(O(n \lg n)\)
time, doing just a single combining operation for each value in the table.
Once we have the table, we can compute the value of an arbitrary
range \([l,r]\) as follows:
Compute the biggest power of two that fits within the range, that
is, the largest \(k\) such that \(2^k \leq r - l + 1\). We can compute
this simply as \(\lfloor \lg (r - l + 1) \rfloor\).
Look up two range values of length \(2^k\), one for the range which begins at \(l\)
(that is, \(T[k, l]\)) and one for the range which ends at \(r\) (that is, \(T[k, r - 2^k + 1]\)). These two ranges overlap; but because the combining
operation is idempotent, combining the values of the ranges yields
the value for our desired range \([l,r]\).
This is why we require the combining operation to be idempotent:
otherwise the values in the overlap would be overrepresented in the
final, combined value.
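For a concrete illustration (my own numbers, not from the post): to answer the query
for \([2,7]\), which has length \(6\), we take \(k = \lfloor \lg 6 \rfloor = 2\) and
combine \(T[2,2]\) (covering indices \(2 \dots 5\)) with \(T[2,4]\) (covering indices
\(4 \dots 7\)). The two ranges overlap on indices \(4 \dots 5\), which is harmless
precisely because the combining operation is idempotent.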
Haskell code
Let’s write some Haskell code! First, a little module for idempotent
semigroups. Note that we couch everything in terms of semigroups,
not monoids, because we have no particular need of an identity
element; indeed, some of the most important examples like \(\min\) and
\(\max\) don’t have an identity element. The IdempotentSemigroup
class has no methods, since as compared to Semigroup it only adds a
law. However, it’s still helpful to signal the requirement. You
might like to convince yourself that all the instances listed below
really are idempotent.
module IdempotentSemigroup where

import Data.Bits
import Data.Semigroup

-- | An idempotent semigroup is one where the binary operation
-- satisfies the law @x <> x = x@ for all @x@.
class Semigroup m => IdempotentSemigroup m

instance Ord a => IdempotentSemigroup (Min a)
instance Ord a => IdempotentSemigroup (Max a)
instance IdempotentSemigroup All
instance IdempotentSemigroup Any
instance IdempotentSemigroup Ordering
instance IdempotentSemigroup ()
instance IdempotentSemigroup (First a)
instance IdempotentSemigroup (Last a)
instance Bits a => IdempotentSemigroup (And a)
instance Bits a => IdempotentSemigroup (Ior a)
instance (IdempotentSemigroup a, IdempotentSemigroup b) => IdempotentSemigroup (a, b)
instance IdempotentSemigroup b => IdempotentSemigroup (a -> b)
Now, some code for sparse tables. First, a few imports.
{-# LANGUAGE TupleSections #-}
module SparseTable where

import Data.Array (Array, array, (!))
import Data.Bits (countLeadingZeros, finiteBitSize, (!<<.))
import IdempotentSemigroup
The sparse table data structure itself is just a 2D array over some
idempotent semigroup m. Note that UArray would be more efficient,
but (1) that would make the code for building the sparse table more
annoying (more on this later), and (2) it would require a bunch of
tedious additional constraints on m.
newtype SparseTable m = SparseTable (Array (Int, Int) m)
  deriving (Show)
We will frequently need to compute rounded-down base-two logarithms,
so we define a function for it. A straightforward implementation
would be to repeatedly shift right by one bit and count the number of
shifts needed to reach zero; however, there is a better way, using
Data.Bits.countLeadingZeros. It has a naive default implementation
which counts right bit shifts, but in most cases it compiles down to
much more efficient machine instructions.
-- | Logarithm base 2, rounded down to the nearest integer. Computed
-- efficiently using primitive bitwise instructions, when available.
lg :: Int -> Int
lg n = finiteBitSize n - 1 - countLeadingZeros n
Now let’s write a function to construct a sparse table, given a
sequence of values. Notice how the sparse table array st is defined
recursively.
This works because the Array type is lazy in the stored values, with
the added benefit that only the array values we end up actually
needing will be computed. However, this comes with a decent amount of
overhead. If we wanted to use an unboxed array instead, we wouldn’t
be able to use
the recursive definition trick; instead, we would have to use an
STUArray
and fill in the values in a specific order. The code for this would
be longer and much more tedious, but could be faster if we end up
needing all the values in the array anyway.
-- | Construct a sparse table which can answer range queries over the
-- given list in $O(1)$ time. Constructing the sparse table takes
-- $O(n \lg n)$ time and space, where $n$ is the length of the list.
fromList :: IdempotentSemigroup m => [m] -> SparseTable m
fromList ms = SparseTable st
 where
  n = length ms
  lgn = lg n
  st =
    array ((0, 0), (lgn, n - 1)) $
      zip ((0,) <$> [0 ..]) ms
        ++ [ ((i, j), st ! (i - 1, j) <> st ! (i - 1, j + 1 !<<. (i - 1)))
           | i <- [1 .. lgn]
           , j <- [0 .. n - 1 !<<. i]
           ]
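For the curious, here is a rough sketch (my own, not from the post) of what the
STUArray variant mentioned above might look like. Since unboxed arrays only hold
primitive types, it is specialized to Int with min as the idempotent combining
operation, and it assumes it lives alongside the SparseTable code above so that
lg and (!<<.) are in scope.
import Control.Monad (forM_)
import Data.Array.ST (newArray, readArray, runSTUArray, writeArray)
import Data.Array.Unboxed (UArray)

-- Unboxed sparse table over Int with (<>) = min, filled level by level.
fromListU :: [Int] -> UArray (Int, Int) Int
fromListU ms = runSTUArray $ do
  let n = length ms
      lgn = lg n
  st <- newArray ((0, 0), (lgn, n - 1)) 0  -- unused slots simply stay 0
  forM_ (zip [0 ..] ms) $ \(j, x) -> writeArray st (0, j) x
  forM_ [1 .. lgn] $ \i ->
    forM_ [0 .. n - 1 !<<. i] $ \j -> do
      a <- readArray st (i - 1, j)
      b <- readArray st (i - 1, j + 1 !<<. (i - 1))
      writeArray st (i, j) (min a b)
  return st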
Finally, we can write a function to answer range queries.
-- | $O(1)$. @range st l r@ computes the range query which is the
-- @sconcat@ of all the elements from index @l@ to @r@ (inclusive).
range :: IdempotentSemigroup m => SparseTable m -> Int -> Int -> m
range (SparseTable st) l r = st ! (k, l) <> st ! (k, r - (1 !<<. k) + 1)
 where
  k = lg (r - l + 1)
Applications
Most commonly, we can use a sparse table to find the minimum or
maximum values on a range, \(\min\) and \(\max\) being the quintessential
idempotent operations. For example, this plays a key role in a
solution to the (quite tricky) problem
Ograda. (At first it
seemed like that problem should be solvable with some kind of sliding
window approach, but I couldn’t figure out how to make it work!)
What if we want to find the index of the minimum or maximum value in
a given range (see, for example, Worst Weather)? We can easily accomplish this using the semigroup Min (Arg m i) (or Max (Arg m i)), where m is the type of the values and i is
the index type. Arg, from Data.Semigroup, is just a pair which uses only the first value
for its Eq and Ord instances, and carries along the second value
(which is also exposed via Functor, Foldable, and Traversable
instances). In the example below, we can see that the call to range st 0 3 returns both the max value on the range (4) and its index
(2) which got carried along for the ride:
λ> :m +Data.Semigroup
λ> st = fromList (map Max (zipWith Arg [2, 3, 4, 2, 7, 4, 9] [0..]))
λ> range st 0 3
Max {getMax = Arg 4 2}
Finally, I will mention that being able to compute range minimum
queries is one way to compute lowest common ancestors for a (static,
rooted) tree. First, walk the tree via a depth-first search and
record the depth of each node encountered in sequence, a so-called
Euler tour (note
that you must record every visit to a node—before visiting any of
its children, in between each child, and after visiting all the
children). Now the minimum depth recorded between visits to any two
nodes will correspond to their lowest common ancestor.
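As a rough sketch (my own code, not from the post) of how this fits together with
the SparseTable above, here is a toy rose-tree type and hypothetical helpers
eulerTour and lca; the LCA of two labelled nodes is the shallowest node visited
between their first visits in the tour:
import Data.List (elemIndex)
import Data.Maybe (fromJust)
import Data.Semigroup (Arg (..), Min (..))

data Rose a = Node a [Rose a]

-- Record (depth, label) at every visit: before the children, between
-- consecutive children, and after the last child.
eulerTour :: Int -> Rose a -> [(Int, a)]
eulerTour d (Node x ts) =
  (d, x) : concat [eulerTour (d + 1) t ++ [(d, x)] | t <- ts]

-- For example, in Node 1 [Node 2 [Node 4 [], Node 5 []], Node 3 []],
-- lca of 4 and 5 is 2, and lca of 4 and 3 is 1.
lca :: Eq a => Rose a -> a -> a -> a
lca t u v = label
  where
    tour = eulerTour 0 t
    st = fromList [Min (Arg depth x) | (depth, x) <- tour]
    firstVisit w = fromJust (elemIndex w (map snd tour))
    (i, j) = (min (firstVisit u) (firstVisit v), max (firstVisit u) (firstVisit v))
    Min (Arg _ label) = range st i j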
Here are a few problems that involve computing least common ancestors
in a tree, though note there are also other techniques for computing
LCAs (such as binary jumping) which I plan to write about eventually.
The Stackage team is happy to announce that
Stackage LTS version 24 was finally
released a couple of days ago, based on GHC stable version 9.10.2.
LTS 24 includes many
package changes, and over
3400 packages! Thank you for all your nightly contributions that made this
release possible: the initial release was prepared by Mihai Maruseac. The
closest nightly snapshot to lts-24.0 is
nightly-2025-07-13.
At the same time we are excited to move Stackage
Nightly to GHC 9.12.2: the initial snapshot
release is nightly-2025-07-15.
Current nightly has over 3100 packages, and we expect that number to grow over
the coming weeks and months: we welcome your contributions and help with this.
This initial release build was made by Jens Petersen (31 commits).
A number of packages have been disabled, with the switch to a new GHC version.
You can see all the
changes
made relative to the last 9.10 nightly snapshot.
Apart from trying to build yourself, the easiest way to understand why
particular packages are disabled is to look for their < 0 lines in
build-constraints.yaml,
particularly under the "Library and exe bounds failures" section.
We also have some
tracking issues
still open related to 9.12 core boot libraries.
Thank you to all those who have already done work updating their packages for ghc-9.12.
In our problem last week, we covered a complex problem that used a binary search. Today, we’ll apply binary search again to solidify our understanding of it. This time, instead of extra algorithmic complexity, we’ll start adding some data structure complexity. We’ll be working with a 2D Matrix instead of basic arrays.
To learn more about data structures and algorithms in Haskell, you should take a look at our Solve.hs course! In particular, you’ll cover multi-dimensional arrays in module 2, and you’ll learn how to write algorithms in Haskell in module 3!
The Problem
Today’s problem is Search a 2D Matrix, and the description is straightforward. We’re given a 2D m x n matrix, as well as a target number. We have to return a boolean for whether or not that number is in the Matrix.
This is trivial with a simple scan, but we have an additional constraint that lets us solve the problem faster. The matrix is essentially ordered. Each row is non-decreasing, and the first element of each successive row is no smaller than the last element of the preceding row.
This allows us to get a solution that is O(log(n + m)), a considerable improvement over a linear scan.
The Algorithm
The algorithm is simple as well. We’ll do two binary searches. First, we’ll search over the rows to identify the last row which could contain the element. Then we’ll do a binary search of that row to see if the element is present or not.
We’ll have a slightly different form to our searches compared to last time. In last week’s problem, we knew we had to find a valid index for our search. Now, we may find that no valid index exists.
So we’ll structure our search interval in a semi-open fashion. The first index in our search interval is inclusive, meaning that it could still be a valid index. The second index is exclusive, meaning it is the lowest index that we consider invalid.
In mathematical notation, we would represent such an interval with a square bracket on the left and a parenthesis on the right. So if that interval is [0, 4), then 0, 1, 2, 3 are valid values. The interval [2,2) would be considered empty, with no valid values. We’ll see how we apply this idea in practice.
Rust Solution
We don’t have that many terms to define at the start of this solution. We’ll save the size of both dimensions, and then prepare ourselves for the first binary search by assigning low as 0 (the first potential “valid” answer), hi as m (the lowest “invalid” answer), and creating our output rowWithTarget value. We initialize this one to m as well, which is deliberately an invalid row index: if we fail to re-assign rowWithTarget in our binary search, we want it left at an easily testable invalid value.
pub fn search_matrix(matrix: Vec<Vec<i32>>, target: i32) -> bool {
let m = matrix.len();
let n = matrix[0].len();
let mut low = 0;
let mut hi = m;
let mut rowWithTarget = m;
...
}
Now we write our first binary search, looking for a row that could contain our target value. We maintain the typical pattern of binary search, using the loop while (low < hi) and assigning mid = (low + hi) / 2.
pub fn search_matrix(matrix: Vec<Vec<i32>>, target: i32) -> bool {
...
while (low < hi) {
let mid: usize = (low + hi) / 2;
if (matrix[mid][0] > target) {
hi = mid;
} else if (matrix[mid][n - 1] < target) {
low = mid + 1;
} else {
rowWithTarget = mid;
break;
}
}
if (rowWithTarget >= m) {
return false;
}
...
}
If the first element of the row is too large, we know that mid is “invalid”, so we can assign it as hi and continue. If the last element is too small, then we reassign low as mid + 1, as we want low to still be a potentially valid value.
Otherwise, we have found a potential row, so we assign rowWithTarget and break. If, after this search, rowWithTarget has the “invalid” value of m, we can return false, as there are no valid values.
Now we just do the same thing over again, but within rowWithTarget! We reassign low and hi (as n this time) to reset the while loop. And now our comparisons will look at the specific value matrix[rowWithTarget][mid].
pub fn search_matrix(matrix: Vec<Vec<i32>>, target: i32) -> bool {
...
low = 0;
hi = n;
while (low < hi) {
let mid: usize = (low + hi) / 2;
if (matrix[rowWithTarget][mid] > target) {
hi = mid;
} else if (matrix[rowWithTarget][mid] < target) {
low = mid + 1;
} else {
return true;
}
}
return false;
}
Again, we follow the same pattern of re-assigning low and hi. If we don’t hit the return true case in the loop, we’ll end up with return false at the end, because we haven’t found the target.
Here’s the full solution:
pub fn search_matrix(matrix: Vec<Vec<i32>>, target: i32) -> bool {
let m = matrix.len();
let n = matrix[0].len();
let mut low = 0;
let mut hi = m;
let mut rowWithTarget = m;
while (low < hi) {
let mid: usize = (low + hi) / 2;
if (matrix[mid][0] > target) {
hi = mid;
} else if (matrix[mid][n - 1] < target) {
low = mid + 1;
} else {
rowWithTarget = mid;
break;
}
}
if (rowWithTarget >= m) {
return false;
}
low = 0;
hi = n;
while (low < hi) {
let mid: usize = (low + hi) / 2;
if (matrix[rowWithTarget][mid] > target) {
hi = mid;
} else if (matrix[rowWithTarget][mid] < target) {
low = mid + 1;
} else {
return true;
}
}
return false;
}
Haskell Solution
In our Haskell solution, the main difference of course will be using recursion for the binary search. However, we’ll also change up the data structure a bit. In the Rust framing of the problem, we had a vector of vectors of values. We could do this in Haskell, but we could also use Array (Int, Int) Int. This lets us map row/column pairs to numbers in a more intuitive way.
import qualified Data.Array as A
search2DMatrix :: A.Array (Int, Int) Int -> Int -> Bool
search2DMatrix matrix target = ...
where
((minR, minC), (maxR, maxC)) = A.bounds matrix
Another unique feature of arrays is that the bounds don’t have to start from 0. We can have totally custom bounding dimensions for our rows and columns. So instead of using m and n, we’ll need to use the min and max of the row and column dimensions.
So now let’s define our first binary search, looking for the valid row. As we did last week, the input to our function will be two Int values, for the low and hi. As in our Rust solution we’ll access the first and last element of the row defined by the “middle” of low and hi, and compare them against the target. We make recursive calls to searchRow if the row isn’t valid.
search2DMatrix :: A.Array (Int, Int) Int -> Int -> Bool
search2DMatrix matrix target = result
where
((minR, minC), (maxR, maxC)) = A.bounds matrix
searchRow :: (Int, Int) -> Int
searchRow (low, hi) = if low >= hi then maxR + 1 else
let mid = (low + hi) `quot` 2
firstInRow = matrix A.! (mid, minC)
lastInRow = matrix A.! (mid, maxC)
in if firstInRow > target
then searchRow (low, mid)
else if lastInRow < target
then searchRow (mid + 1, hi)
else mid
rowWithTarget = searchRow (minR, maxR + 1)
result = rowWithTarget <= maxR && ...
Instead of m, we have maxR + 1, which we use as the initial hi value, as well as a return value in the base case where low meets hi. We can return a result of False if rowWithTarget does not come back with a value no larger than maxR.
Now for our second search, we follow the same pattern, but now we’re returning a boolean. The base case returns False, and we return True if we find the value in rowWithTarget at position mid. Here’s what that looks like:
search2DMatrix :: A.Array (Int, Int) Int -> Int -> Bool
search2DMatrix matrix target = result
where
...
rowWithTarget = searchRow (minR, maxR + 1)
searchCol :: (Int, Int) -> Bool
searchCol (low, hi) = low < hi &&
let mid = (low + hi) `quot` 2
val = matrix A.! (rowWithTarget, mid)
in if val > target
then searchCol (low, mid)
else if val < target
then searchCol (mid + 1, hi)
else True
result = rowWithTarget <= maxR && searchCol (minC, maxC + 1)
You’ll see we now use the outcome of searchCol for result. And this completes our solution! Here’s the full code:
search2DMatrix :: A.Array (Int, Int) Int -> Int -> Bool
search2DMatrix matrix target = result
where
((minR, minC), (maxR, maxC)) = A.bounds matrix
searchRow :: (Int, Int) -> Int
searchRow (low, hi) = if low >= hi then maxR + 1 else
let mid = (low + hi) `quot` 2
firstInRow = matrix A.! (mid, minC)
lastInRow = matrix A.! (mid, maxC)
in if firstInRow > target
then searchRow (low, mid)
else if lastInRow < target
then searchRow (mid + 1, hi)
else mid
rowWithTarget = searchRow (minR, maxR + 1)
searchCol :: (Int, Int) -> Bool
searchCol (low, hi) = low < hi &&
let mid = (low + hi) `quot` 2
val = matrix A.! (rowWithTarget, mid)
in if val > target
then searchCol (low, mid)
else if val < target
then searchCol (mid + 1, hi)
else True
result = rowWithTarget <= maxR && searchCol (minC, maxC + 1)
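As a quick sanity check (my own example, not from the article), we can build a small ordered matrix with A.listArray and query it:
example :: A.Array (Int, Int) Int
example = A.listArray ((0, 0), (2, 3)) [1, 3, 5, 7, 10, 11, 16, 20, 23, 30, 34, 60]

-- search2DMatrix example 3  evaluates to True
-- search2DMatrix example 13 evaluates to False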
Conclusion
Next week, we’ll stay on the subject of 2D matrices, but we’ll learn about array mutation. This is a very tricky subject in Haskell, so make sure to come back for that article!
To learn how these data structures work in Haskell, read about Solve.hs, our Haskell Data Structures & Algorithms course!
Cargo is the native package manager and build system for Rust, allowing you to
easily bring in dependencies from the global crates.io registry,1
or to publish your own crates to crates.io. Tor Hovland
and I recently contributed a long-requested feature to Cargo, allowing you to
package many interdependent packages in one go. That might not sound like a big deal,
but there were a few tricky parts; there’s a reason the original feature request
was open for more than 10 years! In this post, I’ll walk you through the feature
and — if you’re a Rust developer — tell you how you can try it out.
Workspaces
The Rust unit of packaging — like a gem in Ruby or a module in Go — is called
a “crate”, and it’s pretty common for a medium-to-large Rust project to be
divided into several of them. This division helps keep code modular and
interfaces well-defined, and also allows you to build and test components
individually. Cargo supports multi-crate workflows using “workspaces”: a
workspace is just a bunch of crates that Cargo handles “together”, sharing a
common dependency tree, a common build directory, and so on. A basic workspace
might look like this:
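(The original post includes an example workspace at this point. As a rough
reconstruction of the relevant pieces, which may differ from the post’s exact
files, imagine a root manifest listing two member crates, with taco depending
on tortilla both by path and by version:)
# Cargo.toml (workspace root)
[workspace]
members = ["tortilla", "taco"]

# taco/Cargo.toml (dependency section)
[dependencies]
tortilla = { path = "../tortilla", version = "1.3" }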
The dependency specification is actually pretty interesting. First, it tells
us that the tortilla package is located at ../tortilla (relative to
taco). When you’re developing locally, Cargo uses this local path to find the
tortilla crate. But when you publish the taco crate for public consumption, Cargo strips out the
path = "../tortilla" setting because it’s only meaningful within your local
workspace. Instead, the published taco crate will depend on version 1.3 of
the published tortilla crate. This doubly-specified dependency gives you the
benefits of a monorepo (for example, you get to work on tortilla and taco
simultaneously and be sure that they stay compatible) without leaking that local setup
to downstream users of your crates.
If you’ve been hurt by packaging incompatibilities before, the previous
paragraph might have raised some red flags: allowing a dependency to come
from one of two places could lead to problems if they get out-of-sync. Like,
couldn’t you accidentally make a broken package by locally updating both your
crates and then only publishing taco? You won’t see the breakage when building locally,
but the published taco will be incompatible with the previously published tortilla.
To deal with this issue, Cargo verifies packages before you publish them.
When you type cargo publish --package taco, it packages up the taco crate
(removing the local ../tortilla dependency) and then unpackages the new
package in a temporary location and attempts to build it from scratch. This
rebuild-from-scratch sees the taco crate exactly as a downstream user would,
and so it will catch any incompatibilities between the existing, published
tortilla and the about-to-be-published taco.
Cargo’s crate verification is not completely fool-proof because it only checks
that the package compiles.3 In practice, I find that checking compilation is already
pretty useful, but I also like to run other static checks.
Publish all my crates
Imagine you’ve been working in your workspace, updating your crates in backwards-incompatible
ways. Now you want to bump tortilla to version 2.0 and taco to version 3.0
and publish them both. This isn’t too hard:
Edit tortilla/Cargo.toml to increase the version to 2.0.
Run cargo publish --package tortilla, and wait for it to appear on crates.io.
Edit taco/Cargo.toml to increase its version to 3.0, and change its tortilla dependency
to 2.0.
Run cargo publish --package taco.
The ordering is important here. You can’t publish the new taco before tortilla 2.0 is
publicly available: if you try, the verification step will fail.
This multi-crate workflow works, but it has two problems:
It can get tedious. With two crates it’s manageable, but what about when the dependency graph
gets complicated? I worked for a client whose CI had custom Python scripts for
checking versions, bumping versions, publishing things in the right order, and so on.
It worked, but it wasn’t pretty.4
It’s non-atomic: if in the process of verifying and packaging dependent
crates you discover some problems with the dependencies then you’re out of
luck because you’ve already published them. crates.io doesn’t allow deleting
packages, so you’ll just have to yank5 the broken packages, increase the
version number some more, and start publishing again. This one can’t be solved
by scripts or third-party tooling: verifying the dependent crate requires
the dependencies to be published.
Starting in mid-2024, my colleague Tor Hovland and I began working on native support for this in Cargo. A few months
and dozens of code-review comments later, our initial implementation landed
in Cargo 1.83.0. By the way, the Cargo team are super supportive of new
contributors — I highly recommend going to their office hours if you’re
interested.
How it works
In our implementation, we use a sort of registry “overlay” to verify dependent
crates before their dependencies are published. This overlay wraps an upstream
registry (like crates.io), allowing us to add local crates to the overlay without
actually publishing them upstream.
This kind of registry overlay is an interesting topic on its own. The
“virtualization” of package sources is an often-requested feature that hasn’t
yet been implemented in general because it’s tricky to design without exposing
users to dependency confusion attacks: the more flexible you are about where dependencies
come from, the easier it is for an attacker to sneak their way into your
dependency tree.
Our registry overlay passed scrutiny because it’s only available to Cargo internally, and only gets
used for workspace-local packages during workspace publishing.
The registry overlay was pretty simple to implement, since it’s just a composition of two
existing Cargo features: local registries and abstract sources.
A local registry in Cargo is just a registry (like crates.io) that lives on your local disk
instead of in the cloud.
Cargo has long supported them because they’re useful for offline builds and
integration testing. When packaging a workspace we create a temporary, initially-empty
local registry for storing the new local packages as we produce them.
Our second ingredient is Cargo’s Source trait: since Cargo can pull dependencies
from many different kinds of places (crates.io, private registries, git repositories, etc.),
they already have a nice abstraction that encapsulates how to query
availability, download, and cache packages from different places. So our registry
overlay is just a new implementation of the Source trait that wraps two other Sources:
the upstream registry (like crates.io) that we want to publish to, and the local registry
that we put our local packages in.
When someone queries our overlay source for a package, we check in the local registry
first, and fall back to the upstream registry.
Now that we have our local registry overlay, the workspace-publishing workflow looks like this:
Gather all the to-be-published crates and figure out any inter-dependencies.
Sort them in a “dependency-compatible” order, meaning that every crate
will be processed after all its dependencies.
In that dependency-compatible order, package and verify each crate. For each crate:
Package it up, removing any mention of local path dependencies.
Unpackage it in a temporary location and check that it builds. This
build step uses the local registry overlay, so that it thinks all
the local dependencies that were previously added to the local overlay
are really published.
“Publish” the crate in the local registry overlay.
In the dependency-compatible order, actually upload all the crates to crates.io.
This is done in parallel as much as possible. For example, if tortilla and carnitas
don’t depend on one another but taco depends on them both, then tortilla and
carnitas can be uploaded simultaneously.
It’s possible for the final upload to fail (if your network goes down, for example)
and for some crates to remain unpublished; in that sense, the new workspace publishing
workflow is not truly atomic. But because all of the new crates have already been
verified with one another, you can just retry publishing the ones that failed to upload.
How to try it
Cargo, as critical infrastructure for Rust development, is pretty conservative about
introducing new features. Multi-package publishing has recently been stabilized, but that
change has not yet reached a stable release, so for now it is only available in nightly builds. If you’re using
a recent nightly build of Cargo 1.90.0 or later, running cargo publish in a workspace
will work as described in this blog post.
If you don’t want to publish everything in your workspace, the usual package-selection arguments
should work as expected: cargo publish --package taco --package tortilla
will publish just taco and tortilla, while correctly managing any dependencies
between them. Or you can exclude packages like cargo publish --exclude onions.
If you’re using a stable Rust toolchain, workspace publishing will be available in
Cargo 1.90 in September 2025.
If you use Node.js, Cargo is like the npm command and crates.io is like the NPM registry. If you use Python, Cargo is like pip (or Poetry, or uv) and crates.io is like PyPI.↩
It can also contain lots of other useful workspace-scoped information, like dependencies that are common between crates or global compiler settings.↩
To be even more precise, it only checks that the package compiles against
the dependencies that are locked in your Cargo.lock file, which gets included in the package.
If you or someone in your dependency tree doesn’t correctly follow
semantic versioning, downstream users could still experience
compilation problems. In practice, we’ve seen this cause binary packages to
break because cargo install ignores the lock file
by default.↩
There are also several third-party tools
(for example, cargo-release, cargo-smart-release, and release-plz) to help
automate multi-crate releases. If one of these meets your needs, it might be better
than a custom script.↩
“Yanking” is Cargo’s mechanism for marking packages as broken without actually deleting
their contents and breaking everyone’s builds.↩
Today, 2025-07-09, at 1830 UTC (11:30 am PDT, 2:30 pm EDT, 7:30 pm BST, 20:30 CEST, …)
we are streaming the 46th episode of the Haskell Unfolder live on YouTube.
In this episode targeted at beginners, we show the end-to-end application development process, starting from an empty directory. We’ll consider package configuration, taking advantage of editor integration, how to deal with dependencies, organizing code into modules, and parsing command line arguments. We will use this to write a simple but useful application.
About the Haskell Unfolder
The Haskell Unfolder is a YouTube series about all things Haskell hosted by
Edsko de Vries and Andres Löh, with episodes appearing approximately every two
weeks. All episodes are live-streamed, and we try to respond to audience
questions. All episodes are also available as recordings afterwards.
Mike and Andres speak to Alex McLean who created the TidalCycles system for electronic music - implemented in Haskell of course. We talk about how Alex got into Haskell coming from Perl, how types helped him think about the structure of music and patterns, the architecture and evolution of TidalCycles, about art, community and making space for new ideas, and lots of things in between.
This week we’ll be continuing our series of problem solving in Haskell and Rust. But now we’re going to start moving beyond the terrain of “basic” problem solving techniques with strings, lists and arrays, and start moving in the direction of more complicated data structures and algorithms. Today we’ll explore a problem that is still array-based, but uses a tricky algorithm that involves binary search!
You’ll learn more about Data Structures and Algorithms in our Solve.hs course! The last 7 weeks or so of blog articles have focused on the types of problems you’ll see in Module 1 of that course, but now we’re going to start encountering ideas from Modules 2 & 3, which look extensively at essential data structures and algorithms you need to know for problem solving.
The Problem
Today’s problem is median of two sorted arrays. In this problem, we receive two arrays of numbers as input, each of them in sorted order. The arrays are not necessarily of the same size. Our job is to find the median of the cumulative set of numbers.
Now there’s a conceptually easy approach to this. We could simply scan through the two arrays, keeping track of one index for each one. We would increase the index for whichever number is currently smaller, and stop once we have passed by half of the total numbers. This approach is essentially the “merge” part of merge sort, and it would take O(n) time, since we are scanning half of all the numbers.
However, there’s a faster approach! And if you are asked this question in an interview for anything other than a very junior position, your interviewer will expect you to find this faster approach. Because the arrays are sorted, we can leverage binary search to find the median in O(log n) time. The approach isn’t easy to see though! Let’s go over the algorithm before we get into any code.
The Algorithm
This algorithm is a little tricky to follow (this problem is rated as “hard” on LeetCode). So we’re going to treat this a bit like a mathematical proof, and begin by defining useful terms. Then it will be easy to describe the coding concepts behind the algorithm.
Defining our Terms
Our input consists of 2 arrays, arr1 and arr2 with potentially different sizes n and m, respectively. Without loss of generality, let arr1 be the “shorter” array, so that n <= m. We’ll also define t as the total number of elements, n + m.
It is worthwhile to note right off the bat that if t is odd, then a single element from one of the two lists will be the median. If t is even, then we will average two elements together. Even though we won’t actually create the final merged array, we can imagine that it consists of 3 parts:
The “prior” portion - all numbers before the median element(s)
The median element(s), either 1 or 2.
The “latter” portion - all numbers after the median element(s)
The total number of elements in the “prior” portion will end up being (t - 1) / 2, bearing in mind how integer division works. For example, whether t is 15 or 16, we get 7 elements in the “prior” portion. We’ll use p for this number.
Finally, let’s imagine p1, the number of elements from arr1 that will end up in the prior portion. If we know p1, then p2, the number of elements from arr2 in the prior portion is fixed, because p1 + p2 = p. We can then think of p1 as an index into arr1, the index of the first element that is not in the prior portion. The only trick is that this index could be n indicating that all elements of arr1 are in the prior portion.
Getting the Final Answer from our Terms
If we have the “correct” values for p1 and p2, then finding the median is easy. If t is odd, then the lower number between arr1[p1] and arr2[p2] is the median. If t is even, then we average the two smallest numbers among (arr1[p1], arr2[p2], arr1[p1 + 1], arr2[p2 + 1]).
So we’ve reduced this problem to a matter of finding p1, since p2 can be easily derived from it. How do we know we have the “correct” value for p1, and how do we search for it efficiently?
Solving for p1
The answer is that we will conduct a binary search on arr1 in order to find the correct value of p1. For any particular choice of p1, we determine the corresponding value of p2. Then we make two comparisons:
Compare arr1[p1 - 1] to arr2[p2]
Compare arr2[p2 - 1] to arr1[p1]
If both comparisons are less-than-or-equals, then our two p values are correct! The slices arr1[0..p1-1] and arr2[0..p2-1] always constitute a total of p values, and if these values are smaller than arr1[p1] and arr2[p2], then they constitute the entire “prior” set.
If, on the other hand, the first comparison yields “greater than”, then we have too many values for arr1 in our prior set. This means we need to recursively do the binary search on the left side of arr1, since p1 should be smaller.
Then if the second comparison yields “greater than”, we have too few values from arr1 in the “prior” set. We should increase p1 by searching the right half of our array.
This provides a complete algorithm for us to follow!
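To make this concrete (an example of my own, not from the article): take arr1 = [1, 3, 8] and arr2 = [2, 4, 6, 7, 9], so t = 8 and p = 3. Trying p1 = 1 gives p2 = 2; the second comparison fails because arr2[1] = 4 > arr1[1] = 3, so we search the right half. Trying p1 = 2 gives p2 = 1; now arr1[1] = 3 <= arr2[1] = 4 and arr2[0] = 2 <= arr1[2] = 8, so this split is correct. Since t is even, we average the two smallest remaining values, arr2[1] = 4 and arr2[2] = 6, giving 5, which matches the median of the merged array [1, 2, 3, 4, 6, 7, 8, 9].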
Rust Implementation
Our algorithm description was quite long, but the advantage of having so many details is that the code starts to write itself! We’ll start with our Rust implementation. Stage 1 is to define all of the terms using our input values. We want to define our sizes and array references generically so that arr1 is the shorter array:
pub fn find_median_sorted_arrays(nums1: Vec<i32>, nums2: Vec<i32>) -> f64 {
let mut n = nums1.len();
let mut m = nums2.len();
let mut arr1: &Vec<i32> = &nums1;
let mut arr2: &Vec<i32> = &nums2;
if (m < n) {
n = nums2.len();
m = nums1.len();
arr1 = &nums2;
arr2 = &nums1;
}
let t = n + m;
let p: usize = (t - 1) / 2;
...
}
Anatomy of a Binary Search
The next stage is the binary search, so we can find p1 and p2. Now a binary search is a particular kind of loop pattern. Like many of the loop patterns we worked with in the previous weeks, we can express it recursively, or with a loop construct like for or while. We’ll start with a while loop solution for Rust, and then show the recursive solution with Haskell.
All loops maintain some kind of state. For a binary search, the primary state is the two endpoints representing our “interval of interest”. This starts out as the entire interval, and shrinks by half each time until we’ve narrowed to a single element (or no elements). We’ll represent these interval endpoints with low and hi. Our loop concludes once low is as large as hi.
let mut low = 0;
// Use the shorter array size!
let mut hi = n;
while (low < hi) {
...
}
In our particular case, we are also trying to determine the values for p1 and p2. Each time we specify an interval, we’ll see if the midpoint of that interval (between low and hi) is the correct value of p1:
...
let mut low = 0;
let mut hi = n;
let mut p1 = 0;
let mut p2 = 0;
while (low < hi) {
p1 = (low + hi) / 2;
p2 = p - p1;
...
}
Now we evaluate this p1 value using the two conditions we specified in our algorithm. These are self-explanatory, except we do need to cover some edge cases where one of our values is at the edge of the array bounds.
For example, if p1 is 0, the first condition is always “true”. If this condition is negated, this means we want fewer elements from arr1, but this is impossible if p1 is 0.
...
let mut low = 0;
let mut hi = n;
let mut p1 = 0;
let mut p2 = 0;
while (low < hi) {
p1 = (low + hi) / 2;
p2 = p - p1;
let cond1 = p1 == 0 || arr1[p1 - 1] <= arr2[p2];
let cond2 = p1 == n || p2 == 0 || arr2[p2 - 1] <= arr1[p1];
if (cond1 && cond2) {
break;
} else if (!cond1) {
p1 -= 1;
hi = p1;
} else {
p1 += 1;
low = p1;
}
}
p2 = p - p1;
...
If both conditions are met, you’ll see we break, because we’ve found the right value for p1! Otherwise, we know p1 is invalid. This means we want to exclude the existing p1 value from further consideration by changing either low or hi to remove it from the interval of interest.
So if cond1 is false, hi becomes p1 - 1, and if cond2 is false, low becomes p1 + 1. In both cases, we also modify p1 itself first so that our loop does not conclude with p1 in an invalid location.
Getting the Final Answer
Now that we have p1 and p2, we have to do a couple final tricks to get the final answer. We want to get the first “smaller” value between arr1[p1] and arr2[p2]. But we have to handle the edge case where p1 might be n AND we want to increment the index for the array we take. Note that p2 cannot be out of bounds right now!
let mut median = arr2[p2];
if (p1 < n && arr1[p1] < arr2[p2]) {
median = arr1[p1];
p1 += 1;
} else {
p2 += 1;
}
If the total number of elements is odd, we can simply return this number (converting to a float). However, in the even case we need one more number to take an average. So we’ll compare the values at the indices again, but now accounting that either (but not both) could be out of bounds.
let mut median = arr2[p2];
if (p1 < n && arr1[p1] < arr2[p2]) {
median = arr1[p1];
p1 += 1;
} else {
p2 += 1;
}
if (t % 2 == 0) {
if (p1 >= n) {
median += arr2[p2];
} else if (p2 >= m) {
median += arr1[p1];
} else {
median += cmp::min(arr1[p1], arr2[p2]);
}
let medianF: f64 = median.into();
return medianF / 2.0;
} else {
return median.into();
}
Here’s the complete implementation:
pub fn find_median_sorted_arrays(nums1: Vec<i32>, nums2: Vec<i32>) -> f64 {
let mut n = nums1.len();
let mut m = nums2.len();
let mut arr1: &Vec<i32> = &nums1;
let mut arr2: &Vec<i32> = &nums2;
if (m < n) {
n = nums2.len();
m = nums1.len();
arr1 = &nums2;
arr2 = &nums1;
}
let t = n + m;
let p: usize = (t - 1) / 2;
let mut low = 0;
let mut hi = n;
let mut p1 = 0;
let mut p2 = 0;
while (low < hi) {
p1 = (low + hi) / 2;
p2 = p - p1;
let cond1 = p1 == 0 || arr1[p1 - 1] <= arr2[p2];
let cond2 = p1 == n || p2 == 0 || arr2[p2 - 1] <= arr1[p1];
if (cond1 && cond2) {
break;
} else if (!cond1) {
p1 -= 1;
hi = p1;
} else {
p1 += 1;
low = p1;
}
}
p2 = p - p1;
let mut median = arr2[p2];
if (p1 < n && arr1[p1] < arr2[p2]) {
median = arr1[p1];
p1 += 1;
} else {
p2 += 1;
}
if (t % 2 == 0) {
if (p1 >= n) {
median += arr2[p2];
} else if (p2 >= m) {
median += arr1[p1];
} else {
median += cmp::min(arr1[p1], arr2[p2]);
}
let medianF: f64 = median.into();
return medianF / 2.0;
} else {
return median.into();
}
}
Haskell Implementation
Now let’s examine the Haskell implementation. Unlike the LeetCode version, we’ll just assume our inputs are Double already instead of doing a conversion. Once again, we start by defining the terms:
medianSortedArrays :: V.Vector Double -> V.Vector Double -> Double
medianSortedArrays input1 input2 = ...
where
n' = V.length input1
m' = V.length input2
t = n' + m'
p = (t - 1) `quot` 2
(n, m, arr1, arr2) = if V.length input1 <= V.length input2
then (n', m', input1, input2) else (m', n', input2, input1)
...
Now we’ll implement the binary search, this time doing a recursive function. We’ll do this in two parts, starting with a helper function. This helper function will simply tell us if a particular index is correct for p1. The trick though is that we’ll return an Ordering instead of just a Bool:
-- data Ordering = LT | EQ | GT
f :: Int -> Ordering
This lets us signal 3 possibilities. If we return EQ, this means the index is valid. If we return LT, this will mean we want fewer values from arr1. And then GT means we want more values from arr1.
With this framing it’s easy to see the implementation of this helper now. We determine the appropriate p2, figure out our two conditions, and return the value for each condition:
medianSortedArrays :: V.Vector Double -> V.Vector Double -> Double
medianSortedArrays input1 input2 = ...
where
...
f :: Int -> Ordering
f pi1 =
let pi2 = p - pi1
cond1 = pi1 == 0 || arr1 V.! (pi1 - 1) <= arr2 V.! pi2
cond2 = pi1 == n || pi2 == 0 || (arr2 V.! (pi2 - 1) <= arr1 V.! pi1)
in if cond1 && cond2 then EQ else if (not cond1) then LT else GT
Now we can apply this in a recursive binary search. The binary search tracks two pieces of state for our interval ((Int, Int)), and it will return the correct value for p1. The implementation applies the base case (return low if low >= hi), determines the midpoint, calls our helper, and then recurses appropriately based on the helper result.
medianSortedArrays :: V.Vector Double -> V.Vector Double -> Double
medianSortedArrays input1 input2 = ...
where
...
f :: Int -> Ordering
f pi1 = ...
search :: (Int, Int) -> Int
search (low, hi) = if low >= hi then low else
let mid = (low + hi) `quot` 2
in case f mid of
EQ -> mid
LT -> search (low, mid - 1)
GT -> search (mid + 1, hi)
p1 = search (0, n)
p2 = p - p1
...
For the final part of the problem, we’ll define a helper. Given p1 and p2, it will emit the “lower” value between the two indices in the array (accounting for edge cases) as well as the two new indices (since one will increment).
This is a matter of lazily defining the “next” value for each array, the “end” condition of each array, and the “result” if that array’s value is chosen:
medianSortedArrays :: V.Vector Double -> V.Vector Double -> Double
medianSortedArrays input1 input2 = ...
where
...
findNext pi1 pi2 =
let next1 = arr1 V.! pi1
next2 = arr2 V.! pi2
end1 = pi1 >= n
end2 = pi2 >= m
res1 = (next1, pi1 + 1, pi2)
res2 = (next2, pi1, pi2 + 1)
in if end1 then res2
else if end2 then res1
else if next1 <= next2 then res1 else res2
Now we just apply this either once or twice to get our result!
medianSortedArrays :: V.Vector Double -> V.Vector Double -> Double
medianSortedArrays input1 input2 = result
where
...
tIsEven = even t
(median1, nextP1, nextP2) = findNext p1 p2
(median2, _, _) = findNext nextP1 nextP2
result = if tIsEven
then (median1 + median2) / 2.0
else median1
Here’s the complete implementation:
medianSortedArrays :: V.Vector Double -> V.Vector Double -> Double
medianSortedArrays input1 input2 = result
where
n' = V.length input1
m' = V.length input2
t = n' + m'
p = (t - 1) `quot` 2
(n, m, arr1, arr2) = if V.length input1 <= V.length input2
then (n', m', input1, input2) else (m', n', input2, input1)
-- Evaluate the index in arr1
-- If this indicates the index can be part of a median, return EQ
-- If it indicates we need to move left in arr1, return LT
-- If it indicates we need to move right in arr1, return GT
-- Precondition: p1 <= n
f :: Int -> Ordering
f pi1 =
let pi2 = p - pi1
cond1 = pi1 == 0 || arr1 V.! (pi1 - 1) <= arr2 V.! pi2
cond2 = pi1 == n || pi2 == 0 || (arr2 V.! (pi2 - 1) <= arr1 V.! pi1)
in if cond1 && cond2 then EQ else if (not cond1) then LT else GT
search :: (Int, Int) -> Int
search (low, hi) = if low >= hi then low else
let mid = (low + hi) `quot` 2
in case f mid of
EQ -> mid
LT -> search (low, mid - 1)
GT -> search (mid + 1, hi)
findNext pi1 pi2 =
let next1 = arr1 V.! pi1
next2 = arr2 V.! pi2
end1 = pi1 >= n
end2 = pi2 >= m
res1 = (next1, pi1 + 1, pi2)
res2 = (next2, pi1, pi2 + 1)
in if end1 then res2
else if end2 then res1
else if next1 <= next2 then res1 else res2
p1 = search (0, n)
p2 = p - p1
tIsEven = even t
(median1, nextP1, nextP2) = findNext p1 p2
(median2, _, _) = findNext nextP1 nextP2
result = if tIsEven
then (median1 + median2) / 2.0
else median1
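As a quick sanity check (my own examples, not from the article), assuming the usual import qualified Data.Vector as V:
-- medianSortedArrays (V.fromList [1, 3]) (V.fromList [2])    evaluates to 2.0
-- medianSortedArrays (V.fromList [1, 2]) (V.fromList [3, 4]) evaluates to 2.5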
Conclusion
If you want to learn more about these kinds of problem solving techniques, you should take our course Solve.hs! In the coming weeks, we’ll see more problems related to data structures and algorithms, which are covered extensively in Modules 2 and 3 of that course!
GHC will start maintaining an LTS release/branch in the near future
A release being designated LTS (Long Term Support) in this case means we plan to
support it over a longer timeframe than usual.
Concretely, the plan is to provide updates for an LTS release for at least two years. Most likely we will
support LTS releases for even longer than that; currently we are aiming for a support window
of three years.
During this time we will provide minor releases fixing bugs, as with any other
release. The main difference is that we will do so for a longer period of time.
There are no plans to backport any new features to LTS releases after their initial release.
In terms of frequency, we plan for the support windows of consecutive
LTS series to overlap by six months.
A potential timeline might then look like this:
2025 Aug - LTS 9.14 released
2028 Spring - LTS 9.22 released
2028 Summer - LTS 9.14.X - last 9.14 point release
2031 Spring - LTS 9.X released
2031 Summer - Last 9.22 point release
...
Non-LTS releases
GHC will continue to release new major non-lts releases on a ~6 Month cadence.
We expect to cut back on the lifetime of these releases slightly, dedicating the
resources freed up this way to enable a longer support window for the LTS releases.
Why LTS releases?
In practice, some releases have always seen more adoption by users than
others. The GHC team has not been blind to this fact and has at times
informally extended the life of a certain release based on this as well.
This resulted in a sort of informal “post-hoc LTS” status for some releases,
at times with support windows not much shorter than our proposed minimum of two years.
This worked reasonably well for people who were confident enough to stay on a fairly old
release, only upgrading to a newer “post-hoc LTS” once the dust settled.
It also worked out for those who picked one of those “post-hoc LTS” releases by
happenstance before it was clear the release would end up as “post-hoc LTS”.
However users who adopted major releases which did not end up as “post-hoc LTS”
often had to choose between upgrading earlier than expected, or risk running into
a show stopping bug after the support window of the release had already ended.
Similarly, much of this was based on informal community sentiment and rarely written
down explicitly, making this information hard to access for people not deeply
involved in the day-to-day of the Haskell community.
By designating a major release as LTS ahead of time, we hope that users can make
an informed decision about which GHC version they pick, with the tradeoffs made
clear: a choice between a longer support window and the newest features.
Why not make post-hoc LTS releases official instead?
This is a question that has come up a lot in discussion. The major downsides of
this are a lack of predictability, and that a lot of time might be lost between
the initial release and any such decision. If we declare a release as LTS 9 months
after its .1 release, we have essentially shaved months off the LTS support window.
On the flip side, if we announce it ahead of time, everyone knows that
a given release will be the new LTS. The hope is that this encourages more and quicker
support for the release by the community, compressing the timeline of
bug fixing, testing, and eventual widespread adoption.
Overall I’m hopeful that making LTS releases explicit will remove a lot
of the ambiguity around GHC versions. And while the guaranteed LTS support window
might not be as long as some might hope, having LTS releases with a longer guaranteed
support window should still be helpful to people working on long-running Haskell projects.
Next steps
The first LTS release will be GHC 9.14, which will be released this summer!
As part of our consulting business we are often invited to solve problems
that our clients cannot tackle on their own. It is not uncommon for us to
collaborate with a client for extended periods of time; during which, many
opportunities for knowledge transfer present themselves, be it in the form
of documentation, discussions, or indeed, when the client finds it
desirable, in the form of specialized workshops.
In this post we’d like to talk about a workshop that we developed and
delivered (so far) five times to different groups of people at the same
client. We received positive feedback for it and we believe it was helpful
for those who attended it.
The workshop intends to give a principled introduction to the Bazel build
system for people who have little or no knowledge of Bazel, but who
are software developers and have used a build system before. It is
definitely a workshop for a technical audience, and as such it was presented
to (among others) dedicated DevOps and DevX teams of the client.
We are happy to announce that the materials of this workshop are now
publicly available in the form of:
The original intended duration of the workshop was three days. However, one
of these days was dedicated almost entirely to a case study that we cannot
share publicly; therefore, the public version is shorter and should amount to
approximately two days.
Here are a couple of the introductory slides to give you an impression
of the scope, structure, and expected knowledge in this workshop:
It must be pointed out that the workshop was developed in 2024, when the
WORKSPACE-based approach to dependency management was still the default
choice and so, given that we were time-constrained both at the authoring and
presentation stages, we chose not to cover Bzlmod. We are still convinced
that familiarity with WORKSPACE and simple repository rules is a
prerequisite for understanding Bzlmod. Some newer features like symbolic
macros are also not covered. Learning materials for Bazel go out of date
quickly, but even so, we believe that the workshop, now public, is still
relevant and can be of use for people who are about to embark on their Bazel
journey.
Welcome back to our series comparing LeetCode problems in Haskell and Rust. Today we’ll learn a new paradigm that I call “Buffer and Save”. This will also be the hardest problem we’ve done so far! The core loop structure isn’t that hard, but there are a couple layers of tricks to massage our data to get the final answer.
This will be the last problem we do that focuses strictly on string and list manipulation. The next set of problems we do will all rely on more advanced data structures or algorithmic ideas.
For more complete practice on problem solving in Haskell, check out Solve.hs, our newest course. This course will teach you everything you need to know about problem solving, data structures, and algorithms in Haskell. You’ll get loads of practice building structures and algorithms from scratch, which is very important for understanding and remembering how they work.
The Problem
Today’s problem is Text Justification. The idea here is that we are taking a list of words and a “maximum width” and printing out the words grouped into equal-width lines that are evenly spaced. Here’s an example input and output:
Example Input (list of 9 strings):
[“Study”, “Haskell”, “with”, “us”, “every”, “Monday”, “Morning”, “for”, “fun”]
Max Width: 16
Output (list of 4 strings):
“Study    Haskell”
“with   us  every”
“Monday   Morning”
“for fun         ”
There are a few notable rules, constraints, and edge cases. Here’s a list to summarize them:
There is at least one word
No word is larger than the max width
All output strings must have max width as their length (including spaces)
The first word of every line is set to the left
The last line always has 1 space between words, and then enough spaces after the last word to reach the max width.
All other lines with multiple words will align the final word all the way to the right
The spaces in non-final lines are distributed as evenly as possible, but extra spaces go between words to the left.
The final point is potentially the trickiest to understand. Consider the second line above, with us every. The max width is 16, and we have 3 words with a total of 11 characters. This leaves us 5 spaces. Having 3 words means 2 blanks, so the “left” blank gets 3 spaces and the “right” blank gets 2 spaces.
If you had a line with 5 words, a max width of 30, and 16 characters, you would place 4 spaces in the left two blanks, and 3 spaces in the right two blanks. The relative length of the words does not matter.
Words in Line: [“A”, “good”, “day”, “to”, “endure”]
Output Line:
“A    good    day   to   endure”
The Algorithm
As mentioned above, our main algorithmic idea could be called “buffer and save”. We’ve been defining all of our loops based on the state we must maintain between iterations of the loop. The buffer and save approach highlights two pieces of state for us:
The strings we’ve accumulated for our answer so far (the “result”)
A buffer of the strings in the “current” line we’re building.
So we’ll loop through the input words one at a time. We’ll consider if the next word can be added to the “current” line. If it would cause our current line to exceed the maximum width, we’ll “save” our current line and write it out to the “result” list, adding the required spaces.
To help our calculations, we’ll also include two other pieces of state in our loop:
The number of characters in our “current” line
The number of words in our “current” line
Finally, there’s the question of how to construct each output line. Combining the math with list-mechanics is a little tricky. But the central idea consists of 4 simple steps:
Find the number of spaces (subtract number of characters from max width)
Divide the number of spaces by the number of “blanks” (number of words - 1)
The quotient is the “base” number of spaces per blank
The remainder is the number of blanks (starting from the left) that get an extra space
The exact implementation of this idea differs between Haskell and Rust. Again this rests a lot on the “reverse” differences between Rust vectors and Haskell lists.
The final line has a slightly different (but easier) process. And we should note that the final line will still be in our buffer when we exit the loop! So we shouldn’t forget to add it to the result.
Haskell Solution
We know enough now to jump into our Haskell solution. Our solution should be organized around a loop. Since we go through the input word-by-word, this should follow a fold pattern. So here’s our outline:
Let’s focus in on the choice we have to make in the loop. We need to determine if this new word fits in our current line. So we’ll get its length and add it to the number of characters in the line AND consider the number of words in the line. We count the words too since each word we already have requires at least one space!
-- (maxWidth is still in scope here)
loop :: ([String], Int, Int, [String]) -> String -> ([String], Int, Int, [String])
loop (currentLine, wordsInLine, charsInLine, currResult) newWord =
  let newWordLen = length newWord
  in if newWordLen + charsInLine + wordsInLine > maxWidth
       then ...
       else ...
How do we fill in these choices? If we don’t overflow the line, we just append the new word, bump the count of the words, and add the new word’s length to the character count.
The overflow case isn’t hard, but it does require us to have a function that can convert our current line into the final string. This function will also take the number of words and characters in this line. Assuming this function exists, we just make this new line, append it to result, and then reset our other stateful values so that they only reflect the “new word” as part of our current line.
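Concretely, the two branches might be filled in like this (a sketch based on the description above; it assumes the makeLine helper we discuss next, and the post’s actual code may differ slightly):
loop (currentLine, wordsInLine, charsInLine, currResult) newWord =
  let newWordLen = length newWord
  in if newWordLen + charsInLine + wordsInLine > maxWidth
       -- overflow: emit the buffered line and restart the buffer with just the new word
       then ([newWord], 1, newWordLen, makeLine currentLine wordsInLine charsInLine : currResult)
       -- no overflow: extend the buffer and bump the word and character counts
       else (newWord : currentLine, wordsInLine + 1, charsInLine + newWordLen, currResult)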
Before we think about the makeLine implementation though, we just about have enough to fill in the rest of the “top” of our function definition. We’d just need another function for making the “final” line, since this is different from other lines. Then when we get our “final” state values, we’ll plug them into this function to get our final line, append this to the result, and reverse it all.
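Assuming those helpers exist, the top might look something like this (a sketch with names of my choosing; maxWidth and the helpers are assumed to be in scope via the where clause):
-- foldl' comes from Data.List
fullJustify :: Int -> [String] -> [String]
fullJustify maxWidth ws = reverse (makeLineFinal lastLine lastWords lastChars : result)
  where
    (lastLine, lastWords, lastChars, result) = foldl' loop ([], 0, 0, []) ws
    -- loop, makeLine, makeLineFinal and combine are defined below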
Now let’s discuss forming these lines, starting with the general case. We can start with a couple edge cases. This should never be called with an empty list. And with a singleton, we just left-align the word and add the right number of spaces:
makeLine :: [String] -> Int -> Int -> String
makeLine [] _ _ = error "Cannot makeLine with empty string!"
makeLine [onlyWord] _ charsInLine =
  let extraSpaces = replicate (maxWidth - charsInLine) ' '
  in onlyWord <> extraSpaces
makeLine (first : rest) wordsInLine charsInLine = ...
Now we’ll calculate the quotient and remainder to get the spacing sizes, as mentioned in our algorithm section. But how do we combine them? There are multiple ways, but the idea I thought of was to zip the tail of the list with the number of spaces it needs to append. Then we can fold it into a resulting list using a function like this:
-- (String, Int) is the next string and the number of spaces after it
combine :: String -> (String, Int) -> String
combine suffix (nextWord, numSpaces) =
  nextWord <> replicate numSpaces ' ' <> suffix
Remember while doing this that we’ve accumulated the words for each line in reverse order. So we want to append each one in succession, together with the number of spaces that come after it.
To use this function, we can “fold” over the “tail” of our current line, while using the first word in our list as the base of the fold! Don’t forget the quotRem math going on in here!
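Putting the quotRem math and the fold together, the general case might look like this (my sketch of the step just described, with maxWidth in scope; the post’s final code may differ in the details):
makeLine (first : rest) wordsInLine charsInLine =
  let numBlanks = wordsInLine - 1
      (baseSpaces, numExtra) = (maxWidth - charsInLine) `quotRem` numBlanks
      -- the buffer is in reverse order (rightmost word first), so the space
      -- counts are listed right-to-left: the rightmost blanks get the base
      -- amount and the leftmost numExtra blanks get one extra space
      spaceCounts = replicate (numBlanks - numExtra) baseSpaces
                 <> replicate numExtra (baseSpaces + 1)
  in foldl combine first (zip rest spaceCounts)
For the “with   us  every” line this pairs ["us", "with"] with the counts [2, 3], reproducing the 3-and-2 split we worked out earlier.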
To make the final line, we can also leverage our combine function! It’s just a matter of combining each word in our input with the appropriate number of spaces. In this case, almost every word gets 1 space except for the last one (which comes first in our list). This just gets however many trailing spaces we need!
makeLineFinal :: [String] -> Int -> Int -> String
makeLineFinal [] _ _ = error "Cannot makeLine with empty string!"
makeLineFinal strs wordsInLine charsInLine =
  let trailingSpaces = maxWidth - charsInLine - (wordsInLine - 1)
  in foldl combine "" (zip strs (trailingSpaces : repeat 1))
Putting all these pieces together, we have our complete solution!
Rust Solution
Now let’s put together our Rust solution. Since we have a reasonable outline from writing this in Haskell, let’s start with the simpler elements, makeLine and makeLineFinal. We’ll use library functions as much as possible for the string manipulation. For example, we can start makeLineFinal by using join on our input vector of strings.
pub fn make_line_final(
    currentLine: &Vec<&str>,
    max_width: usize,
    charsInLine: usize) -> String {
    let mut result = currentLine.join(" ");
    ...
}
Now we just need to calculate the number of trailing spaces, subtracting the number of characters in the joined string. We append this to the end by taking a blank space and using repeat for the correct number of times.
pub fn make_line_final(
    currentLine: &Vec<&str>,
    max_width: usize,
    charsInLine: usize) -> String {
    let mut result = currentLine.join(" ");
    let trailingSpaces = max_width - result.len();
    result.push_str(&" ".repeat(trailingSpaces));
    return result;
}
For those unfamiliar with Rust, the type of our input vector might seem odd. When we have &Vec<&str>, this means a reference to a vector of string slices. String slices are portions of a String that we hold a reference to, but they aren’t copied. However, when we join them, we make a new String result.
Also note that we aren’t passing wordsInLine as a separate parameter. We can get this value using .len() in constant time in Rust. In Haskell, length is O(n) so we don’t want to always do that.
Now for the general make_line function, we have the same type signature, but we start with our base case, where we only have one string in our current line. Again, we use repeat with the number of spaces.
pub fn make_line(
    currentLine: &Vec<&str>,
    max_width: usize,
    charsInLine: usize) -> String {
    let mut result = String::new();
    let n = currentLine.len();
    if (n == 1) {
        result.push_str(currentLine[0]);
        result.push_str(&" ".repeat(max_width - charsInLine));
        return result;
    }
    ...
}
Now we do the “math” portion of this. Rust doesn’t have a single quotRem function in its base library, so we calculate these values separately.
pub fn make_line(
    currentLine: &Vec<&str>,
    max_width: usize,
    charsInLine: usize) -> String {
    let mut result = String::new();
    let n = currentLine.len();
    if (n == 1) {
        result.push_str(currentLine[0]);
        result.push_str(&" ".repeat(max_width - charsInLine));
        return result;
    }
    let numSpaces = (max_width - charsInLine);
    let baseNumSpaces = numSpaces / (n - 1);
    let numWithExtraSpace = numSpaces % (n - 1);
    let mut i = 0;
    while i < n {
        ...
    }
    return result;
}
The while loop we’ll write here is instructive. We use an index instead of a for each pattern because the index tells us how many spaces to use. If our index is smaller than numWithExtraSpace, we add 1 to the base number of spaces. Otherwise we use the base until the index n - 1. This index has no extra spaces, so we’re done at that point!
pub fn make_line(
    currentLine: &Vec<&str>,
    max_width: usize,
    charsInLine: usize) -> String {
    let mut result = String::new();
    let n = currentLine.len();
    if (n == 1) {
        result.push_str(currentLine[0]);
        result.push_str(&" ".repeat(max_width - charsInLine));
        return result;
    }
    let numSpaces = (max_width - charsInLine);
    let baseNumSpaces = numSpaces / (n - 1);
    let numWithExtraSpace = numSpaces % (n - 1);
    let mut i = 0;
    while i < n {
        result.push_str(currentLine[i]);
        if i < numWithExtraSpace {
            result.push_str(&" ".repeat(baseNumSpaces + 1));
        } else if i < n - 1 {
            result.push_str(&" ".repeat(baseNumSpaces));
        }
        i += 1;
    }
    return result;
}
Now we frame our solution. Let’s start by setting up our state variables (again, omitting numWordsInLine). We’ll also redefine max_width as a usize value for ease of comparison later.
pub fn full_justify(words: Vec<String>, max_width: i32) -> Vec<String> {
    let mut currentLine = Vec::new();
    let mut charsInLine = 0;
    let mut result = Vec::new();
    let mw = max_width as usize;
    ...
}
Now we’d like to frame our solution as a “for each” loop. However, this doesn’t work, for Rust-related reasons we’ll describe after the solution! Instead, we’ll use an index loop.
pub fn full_justify(words: Vec<String>, max_width: i32) -> Vec<String> {
    let mut currentLine = Vec::new();
    let mut charsInLine = 0;
    let mut result = Vec::new();
    let mw = max_width as usize;
    let mut i = 0;
    let n = words.len();
    for i in 0..n {
        ...
    }
}
We’ll get the word by index on each iteration, and use its length to see if we’ll exceed the max width. If not, we can safely push it onto currentLine and increase the character count:
pub fn full_justify(words: Vec<String>, max_width: i32) -> Vec<String> {
    let mut currentLine = Vec::new();
    let mut charsInLine = 0;
    let mut result = Vec::new();
    let mw = max_width as usize;
    let mut i = 0;
    let n = words.len();
    for i in 0..n {
        let word = &words[i];
        if word.len() + charsInLine + currentLine.len() > mw {
            ...
        } else {
            currentLine.push(&words[i]);
            charsInLine += word.len();
        }
    }
}
Now when we do exceed the max width, we have to push our current line onto result (calling make_line). We clear the current line, push our new word, and use its length for charsInLine.
pub fn full_justify(words: Vec<String>, max_width: i32) -> Vec<String> {
    let mut currentLine = Vec::new();
    let mut charsInLine = 0;
    let mut result = Vec::new();
    let mw = max_width as usize;
    let mut i = 0;
    let n = words.len();
    for i in 0..n {
        let word = &words[i];
        if word.len() + charsInLine + currentLine.len() > mw {
            result.push(make_line(&currentLine, mw, charsInLine));
            currentLine.clear();
            currentLine.push(&words[i]);
            charsInLine = word.len();
        } else {
            currentLine.push(&words[i]);
            charsInLine += word.len();
        }
    }
    ...
}
After our loop, we’ll just call make_line_final on whatever is left in our currentLine! Here’s our complete full_justify function that calls make_line and make_line_final as we wrote above:
pub fn full_justify(words: Vec<String>, max_width: i32) -> Vec<String> {
    let mut currentLine = Vec::new();
    let mut charsInLine = 0;
    let mut result = Vec::new();
    let mw = max_width as usize;
    let mut i = 0;
    let n = words.len();
    for i in 0..n {
        let word = &words[i];
        if word.len() + charsInLine + currentLine.len() > mw {
            result.push(make_line(&currentLine, mw, charsInLine));
            currentLine.clear();
            currentLine.push(&words[i]);
            charsInLine = word.len();
        } else {
            currentLine.push(&words[i]);
            charsInLine += word.len();
        }
    }
    result.push(make_line_final(&currentLine, mw, charsInLine));
    return result;
}
Why an Index Loop?
Inside our Rust loop, we have an odd pattern in getting the “word” for this iteration. We first assign word = &words[i], and then later on, when we push that word, we reference words[i] again, using currentLine.push(&words[i]).
Why do this? Why not currentLine.push(word)? And then, why can’t we just do for word in words as our loop?
If we write our loop as for word in words, then we cannot reference the value word after the loop. It is “scoped” to the loop. However, currentLine “outlives” the loop! We have to reference currentLine at the end when we make our final line.
To get around this, we would basically have to copy the word instead of using a string reference &str, but this is unnecessarily expensive.
These are the sorts of odd “lifetime” quirks you have to learn to deal with in Rust. Haskell is easier in that it spares us from thinking about this. But Rust gains a significant performance boost with these sorts of ideas.
Conclusion
This was definitely the most involved problem we’ve dealt with so far. We learned a new paradigm (buffer and save), and got some experience dealing with some of the odd quirks and edge cases of string manipulation, especially in Rust. It was a fairly tricky problem, as far as list manipulation goes. For an easier example of a buffer and save problem, try solving Merge Intervals.
If you want to level up your Haskell problem solving skills, you need to take our course Solve.hs. This course will teach you everything you need to know about problem solving, data structures, and algorithms in Haskell. After this course, you’ll be in great shape to deal with these sorts of LeetCode style problems as they come up in your projects.
the Builder type in bytestring produce lazy bytestrings.
At the time I was happy to see that attoparsec seemed to support strict and lazy
bytestrings equally well.
To get on with things I also wrote the simplest function I could come up with
for sending and receiving data over the network – I used send and recv from
Network.Socket.ByteString.Lazy in network. The function was really simple
import Network.Socket.ByteString.Lazy qualified as SB

sendCmd :: Conn -> Command r -> IO (Result r)
sendCmd (Conn p) (Command k cmd) = withResource p $ \sock -> do
  _ <- SB.send sock $ toWireCmd cmd
  resp <- SB.recv sock 4096
  case decode resp of
    Left err -> pure $ Left $ RespError "decode" (TL.pack err)
    Right r -> pure $ k <$> fromWireResp cmd r
I knew I'd have to revisit this function; it was naïve to believe that a call to
recv would always result in a single complete response. It was however good
enough to get going. When I got to improving sendCmd I was a little surprised
to find that I'd also have to switch to using strict bytestrings in the parser.
Interlude on the Redis serialisation protocol (RESP3)
The Redis protocol has some defining attributes
It's somewhat of a binary protocol. If you stick to keys and values that fall
within the set of ASCII strings, then the protocol is humanly readable and you
can rather easily use netcat or telnet as a client. However, you aren't
limited to storing only readable strings.
It's somewhat of a type-length-value style protocol. Some of the data types
include their length in bytes, e.g. bulk strings and verbatim strings.
Other types include the number of elements, e.g. arrays and maps. A large
number of them have no length at all, e.g. simple strings, integers, and
doubles.
I suspect there are good reasons; I gather a lot of it has to do with speed. It
does however cause one issue when writing a client: it's not possible to read a
whole response without parsing it.
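To make the type-length-value point concrete, here's a rough illustration (mine, not from the post) of what a command and a reply look like on the wire:
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString (ByteString)

-- A command is an array of bulk strings; both the array and each bulk string
-- carry an explicit length ("*1" = one element, "$4" = four bytes).
pingOnWire :: ByteString
pingOnWire = "*1\r\n$4\r\nPING\r\n"

-- A simple string reply carries no length at all; it just runs until CRLF.
pongOnWire :: ByteString
pongOnWire = "+PONG\r\n"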
Rewriting sendCmd
With that extra information about the RESP3 protocol the naïve implementation
above falls short in a few ways
The read buffer may contain more than one full message, and given the definition
of decode above any remaining bytes are simply dropped.1
The read buffer may contain less than one full message and then decode will
return an error.2
Surely this must be solvable, because in my mind running the parser results in
one of three things:
Parsing is done and the result is returned, together with any input that
wasn't consumed.
The parsing is not done due to lack of input, this is typically encoded as a
continuation.
The parsing failed so the error is returned, together with input that wasn't
consumed.
So, I started looking in the documentation for the module
Data.Attoparsec.ByteString.Lazy in attoparsec. I was a little surprised to find
that the Result type lacked a way to feed more input to a parser – it only
has two constructors, Done and Fail:
data Result r
  = Fail ByteString [String] String
  | Done ByteString r
I'm guessing the idea is that the function producing the lazy bytestring in the
first place should be able to produce more chunks of data on demand. That's
likely what the lazy variant of recv does, but at the same time it also
requires choosing a maximum length, and that doesn't sit well with RESP3. The lazy
recv isn't quite lazy in the way I needed it to be.
When looking at the parser for strict bytestrings I calmed down. This parser
follows what I've learned about parsers (it's not defined exactly like this;
it's parameterised in its input but for the sake of simplicity I show it with
ByteString as input):
data Result r
  = Fail ByteString [String] String
  | Partial (ByteString -> Result r)
  | Done ByteString r
Then to my delight I found that there's already a function for handling exactly
my problem
parseWith :: Monad m => m ByteString -> Parser a -> ByteString -> m (Result a)
I only needed to rewrite the existing parser to work with strict bytestrings and
work out how to write a function using recv (for strict bytestrings) that
fulfils the requirements to be used as the first argument to parseWith. The
first part wasn't very difficult due to the similarity between attoparsec's
APIs for lazy and strict bytestrings. The second only had one complication. It
turns out recv is blocking, but of course that doesn't work well with
parseWith. I wrapped it in timeout based on the idea that timing out means
there's no more data and the parser should be given an empty string so it
finishes. I also decided to pass the parser as an argument, so I could use the
same function for receiving responses for individual commands as well as for
pipelines. The full receiving function is
import Data.ByteString qualified as BS
import Data.Text qualified as T
import Network.Socket.ByteString qualified as SB

recvParse :: S.Socket -> Parser r -> IO (Either Text (BS.ByteString, r))
recvParse sock parser = do
  parseWith receive parser BS.empty >>= \case
    Fail _ [] err -> pure $ Left (T.pack err)
    Fail _ ctxs err -> pure $ Left $ T.intercalate " > " (T.pack <$> ctxs) <> ": " <> T.pack err
    Partial _ -> pure $ Left "impossible error"
    Done rem result -> pure $ Right (rem, result)
  where
    receive =
      timeout 100_000 (SB.recv sock 4096) >>= \case
        Nothing -> pure BS.empty
        Just bs -> pure bs
Then I only needed to rewrite sendCmd and I wanted to do it in such a way that
any remaining input data could be used by the next call to sendCmd.3 I
settled for modifying the Conn type to hold an IORef ByteString together
with the socket and then the function ended up looking like this
sendCmd :: Conn -> Command r -> IO (Result r)
sendCmd (Conn p) (Command k cmd) = withResource p $ \(sock, remRef) -> do
  _ <- SBL.send sock $ toWireCmd cmd
  rem <- readIORef remRef
  recvParse sock rem resp >>= \case
    Left err -> pure $ Left $ RespError "recv/parse" err
    Right (newRem, r) -> do
      writeIORef remRef newRem
      pure $ k <$> fromWireResp cmd r
What's next?
I've started looking into pub/sub, and basically all of the work described in
this post is a prerequisite for that. It's not very difficult on the protocol
level, but I think it's difficult to come up with a design that allows maximal
flexibility. I'm not even sure it's worth the complexity.
I'm sure that whatever size of buffer I choose to use there'll be someone
out there who's storing values that are larger. Then there's pipelining that
makes it even more of an issue.
To be honest I'm not totally convinced there'll ever be any remaining input.
Unless a single Conn is used by several threads – which would lead to much
pain with the current implementation – or pub/sub is used – which isn't
supported yet.
In a previous blog
post
I categorized a number of different techniques for calculating range queries.
Today, I will discuss one of those techniques which is simple but frequently
useful.
Precomputing prefix sums
Suppose we have a static sequence of values \(a_1, a_2, a_3, \dots, a_n\) drawn from some
group (that is, there is an associative binary operation with an identity
element, and every element has an inverse), and want
to be able to compute the total value (according to the group
operation) of any contiguous subrange. That is, given a range
\([i,j]\), we want to compute \(a_i \diamond a_{i+1} \diamond \dots \diamond a_j\) (where \(\diamond\) is the group operation). For example,
we might have a sequence of integers and want to compute the sum, or
perhaps the bitwise xor (but not the maximum) of all the values in any particular
subrange.
Of course, we could simply compute \(a_i \diamond \dots \diamond a_j\)
directly, but that takes \(O(n)\) time. With some simple preprocessing,
it’s possible to compute the value of any range in constant time.
The key idea is to precompute an array \(P\) of prefix sums, so \(P_i = a_1 \diamond \dots \diamond a_i\). This can be computed in linear time
via a scan; for example:
import Data.Array
import Data.List (scanl')

prefix :: Monoid a => [a] -> Array Int a
prefix a = listArray (0, length a) $ scanl' (<>) mempty a
Actually, I would typically use an unboxed array, which is
faster but slightly more limited in its uses: import
Data.Array.Unboxed, use UArray instead of Array, and add an
IArray UArray a constraint.
Note that we set \(P_0 = 0\) (or whatever the identity element is for
the group); this is why I had the sequence of values indexed starting
from \(1\), so \(P_0\) corresponds to the empty sum, \(P_1 = a_1\), \(P_2 = a_1 \diamond a_2\), and so on.
Now, for the value of the range \([i,j]\), just compute \(P_j \diamond P_{i-1}^{-1}\)—that is, we start with a prefix that ends at the right place, then
cancel or “subtract” the prefix that ends right before the range we
want. For example, to find the sum of the integers \(a_5 + \dots + a_{10}\), we can compute \(P_{10} - P_4\).
range :: Group a => Array Int a -> Int -> Int -> a
range p i j = p ! j <> inv (p ! (i - 1))
That’s why this only works for groups but not for general monoids:
only in a group can we cancel unwanted values. So, for example,
this works for finding the sum of any range, but not the maximum.
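As a quick sanity check (my example, assuming Sum Int has been given an instance of the Group class used above; Sum and getSum come from Data.Monoid):
λ> let p = prefix (map Sum [3, 1, 4, 1, 5, 9, 2, 6 :: Int])
λ> getSum (range p 3 5)   -- a_3 <> a_4 <> a_5 = 4 + 1 + 5
10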
Practice problems
Want to practice? Here are a few problems that can be solved using
techniques discussed in this post:
It is possible to generalize this scheme to 2D—that is, to compute
the value of any subrectangle of a 2D grid of values from some
group in only \(O(1)\) time. I will leave you the fun of figuring out
the details.
If you’re looking for an extra challenge, here are a few harder
problems which use techniques from this post as an important
component, but require some additional nontrivial ingredients:
Today, 2025-06-25, at 1830 UTC (11:30 am PDT, 2:30 pm EDT, 7:30 pm GMT, 20:30 CET, …)
we are streaming the 45th episode of the Haskell Unfolder live on YouTube.
Haskell records as originally designed have had a reputation of being somewhat weird or, at worst, useless. A lot of features and modifications have been proposed over the years to improve the situation. But not all of these got implemented, or widespread adoption. The result is that the situation now is quite different from what it was in the old days, and additional changes are in the works. But the current state can be a bit confusing. Therefore, in this episode, we are going to look at how to make best use of Haskell records right now, discussing extensions such as DuplicateRecordFields, NoFieldSelectors, OverloadedRecordDot and OverloadedRecordUpdate, and we’ll take a brief look at optics.
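As a tiny illustrative example (not taken from the episode) of what a couple of these extensions buy you:
{-# LANGUAGE DuplicateRecordFields, NoFieldSelectors, OverloadedRecordDot #-}

data Person  = Person  { name :: String }
data Company = Company { name :: String }

-- Two records can share a field name, and fields are read with dot syntax.
greet :: Person -> Company -> String
greet p c = "Hello " <> p.name <> " from " <> c.name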
About the Haskell Unfolder
The Haskell Unfolder is a YouTube series about all things Haskell hosted by
Edsko de Vries and Andres Löh, with episodes appearing approximately every two
weeks. All episodes are live-streamed, and we try to respond to audience
questions. All episodes are also available as recordings afterwards.
Niki and Mike talked to Daniele Micciancio who is a professor at UC San Diego. He's been using Haskell for 20 years, and works in lattice cryptography. We talked to him about how he got into Haskell, about using Haskell for teaching theoretical computer science and, of course, for his research, and about the role type systems and comonads could play in the design of cryptographic algorithms. Along the way, he gave an accessible introduction to post-quantum cryptography which we really enjoyed. We hope you do, too.
Suppose we have a sequence of values, which is static in the sense
that the values in the sequence will never change, and we want to
perform range queries, that is, for various ranges we want to
compute the total of all consecutive values in the range, according to
some binary combining operation. For example, we might want to
compute the maximum, sum, or product of all the consecutive values in
a certain subrange. We have various options depending on the kind of
ranges we want and the algebraic properties of the operation.
If we want ranges corresponding to a sliding window, we can use
an amortized queue
structure
to find the total of each range in \(O(1)\), for an arbitrary
monoid (a small sketch of such a queue follows after this list).
If we want arbitrary ranges but the operation is a group, the
solution is relatively straightforward: we can precompute all
prefix sums, and subtract to find the result for an arbitrary
range in \(O(1)\).
If the operation is an idempotent semigroup (that is, it has the
property that \(x \diamond x = x\) for all \(x\)), we can use a sparse
table, which takes \(O(n \lg n)\) time and space for precomputation,
and then allows us to answer arbitrary range queries in \(O(1)\).
If the operation is an arbitrary monoid, we can use a sqrt tree,
which uses \(O(n \lg \lg n)\) precomputed time and space, and allows
answering arbitrary range queries in \(O(\lg \lg n)\). I will write
about this in a future post.
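For the first option, here is a minimal sketch (my own code, not from the linked post) of the two-stack amortized queue: each stack entry caches the combined value of everything below it, so the total of the whole window is a single \((\diamond)\), and reversing the back stack into the front stack on underflow is what makes popping amortized \(O(1)\).
data MQueue a = MQueue [(a, a)] [(a, a)]
  -- front stack (oldest element on top) and back stack (newest on top);
  -- each entry is (element, cached total of that stack up to the entry)

emptyQ :: MQueue a
emptyQ = MQueue [] []

stackTotal :: Monoid a => [(a, a)] -> a
stackTotal []           = mempty
stackTotal ((_, t) : _) = t

-- Total of the whole window, in O(1).
windowTotal :: Monoid a => MQueue a -> a
windowTotal (MQueue front back) = stackTotal front <> stackTotal back

-- Add a new element at the back of the window.
push :: Monoid a => a -> MQueue a -> MQueue a
push x (MQueue front back) = MQueue front ((x, stackTotal back <> x) : back)

-- Remove the oldest element, rebuilding the front stack when it runs out.
pop :: Monoid a => MQueue a -> Maybe (a, MQueue a)
pop (MQueue [] [])   = Nothing
pop (MQueue [] back) = pop (MQueue front' [])
  where front' = foldl (\f (x, _) -> (x, x <> stackTotal f) : f) [] back
pop (MQueue ((x, _) : front) back) = Just (x, MQueue front back)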
Dynamic range queries
What if we want dynamic range queries, that is, we want to be able
to interleave range queries with arbitrary updates to the values of
the sequence?
If the operation is an arbitrary monoid, we can use a segment
tree.
If the operation is a group, we can use a Fenwick tree.
I published a paper about Fenwick
trees,
which also discusses segment trees, but I should write more about
them here!
Table
Here’s a table summarizing the above classification scheme. I plan to
fill in links as I write blog posts about each row.
An intriguing talk by Gabriella Gonzalez, delivered at Haskell Love 2020. Based largely on the famous marketing book, Crossing the Chasm. Gonzalez argues that marketing is not about hype, it is about setting priorities: what features and markets are you going to ignore? The key to adoption is to be able to solve a problem that people need solved today and where existing mainstream tools are inadequate. Joe Armstrong will tell you that the key to getting Erlang used was to approach failing projects and ask "Would you like us to build you a prototype?" Gonzalez makes a strong case that Haskell should first aim to capture the interpreters market. He points out that the finance/blockchain market may be another possibility. Recommended to me at Lambda Days by Pedro Abreu, host of the Type Theory Forall podcast.
A powerful op-ed by Gabor Maté in the Toronto Star.
Just as nothing justifies the atrocities of October 7, nothing about October 7 justifies Israeli atrocities against the Palestinians, either before or since October 7. Recently, I listened to orthopedic surgeon Dr. Deirdre Nunan, like me a graduate of UBC’s Faculty of Medicine, recount her harrowing experiences serving in a Gaza hospital under the siege that followed Israel’s breaking of the ceasefire in March. Her depictions of unspeakable horror, enacted as policy by one of the world’s most sophisticated militaries, were soul shattering. Many other physicians — Canadian, American, Jewish, Muslim, Christian — who have worked in Gaza speak in similar terms. British doctors describe witnessing “a slaughterhouse.” All their testimonies are widely accessible. The leading medical journal Lancet editorialized that in its assault on health care facilities and personnel in Gaza, “the Israeli Government has acted with impunity … Many medical academies and health professional organizations that claim a commitment to social justice have failed to speak out.” ...
It may be true that antisemitic animus can lurk behind critiques of Zionism. But in my decades of advocacy for Palestinian rights including medical visits to Gaza and the West Bank, I have rarely witnessed it. When present, it has a certain tone that one can feel is directed at Jewishness itself, rather than at the theory and practice of Zionism or at Israel’s actions. What is far more common and genuinely confusing for many is that Israel and its supporters, Jews and non-Jews, habitually confound opposition to Israeli policy with antisemitism. This is akin to Vietnam War protesters being accused of anti-Americanism. How is opposing the napalming of human beings anti-American or, say, deploring Israel’s use of mass starvation as a weapon of war in any sense anti-Jewish? ...
People deserve the right to experience as much liberty to publicly mourn, question, oppose, deplore, denounce what they perceive as the perpetration of injustice and inhumanity as they are, in this country, to advocate for the aims and actions of the Israeli government and its Canadian abettors amongst our political leadership, academia, and media.
Even if we feel powerless to stop the first genocide we have ever watched on our screens in real time, allow at least our hearts to be broken openly, as mine is. And more, let us be free to take democratic, non-hateful action without fear of incurring the calumny of racism.
Shows are under the banner of The Provocateurs (formerly Cabaret of Dangerous Ideas). Tickets go on sale Wednesday 7 May, around noon. The official blurb is brief:
Professor Philip Wadler (The University of Edinburgh) separates the hopes and threats of AI from the chatbot bullshit.
Here is a longer blurb, from my upcoming appearance at Curious, run by the RSE, in September.
Brave New Bullshit
In an AI era, who wins and who loses?
Your future workday might look like this:
You write bullet points.
You ask a chatbot to expand them into a report.
You send it to your boss ...
Who asks a chatbot to summarise it to bullet points.
Will AI help you to do your job or take it from you? Is it fair for AI to be trained on copyrighted material? Will any productivity gains benefit everyone or only a select few?
Join Professor Philip Wadler’s talk as he looks at the hopes and threats of AI, exploring who wins and who loses.
Arriving at a type for Redis commands required a bit of exploration. I had some
ideas early on that, for various reasons, I ended up dropping along the way. This is
a post about my travels, hopefully someone finds it worthwhile reading.
The protocol
The Redis Serialization Protocol (RESP) initially reminded me of JSON and I
thought that following the pattern of aeson might be a good idea. I decided
up-front that I'd only support the latest version of RESP, i.e. version 3. So, I
thought of a data type, Resp with a constructor for each RESP3 data type, and
a pair of type classes, FromResp and ToResp for converting between Haskell
types and RESP3. Then after some more reflection I realised that converting to
RESP is largely pointless. The main reason to convert anything to RESP3 is to
assemble a command, with its arguments, to send to Redis, but all commands are
arrays of bulk strings so it's unlikely that anyone will actually use
ToResp.1 So I scrapped the idea of ToResp. FromResp looked like this
class FromResp a where
  fromResp :: Value -> Either FromRespError a
When I started defining commands I didn't like the number of ByteString
arguments that resulted, so I defined a data type, Arg, and an accompanying
type class for arguments, ToArg:
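(The shape was roughly as follows; this is my sketch based on how unArg and toArg are used further down, not necessarily the exact definitions.)
import Data.ByteString (ByteString)

newtype Arg = Arg {unArg :: [ByteString]}

instance Semigroup Arg where
  Arg a <> Arg b = Arg (a <> b)

class ToArg a where
  toArg :: a -> Arg

instance ToArg ByteString where
  toArg = Arg . pure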
Later on I saw that it might also be nice to have a type class specifically for
keys, ToKey, though that's a wrapper for a single ByteString.
Implementing the functions to encode/decode the protocol were straight-forward
applications of attoparsec and bytestring (using its Builder).
A command is a function in need of a sender
Even though supporting pipelining was one of the goals I felt a need to make
sure I'd understood the protocol so I started off with single commands. The
protocol is a simple request/response protocol at the core so I settled on this
type for commands
type Cmd a = forall m. (Monad m) => (ByteString -> m ByteString) -> m (Either FromRespError a)
that is, a command is a function accepting a sender and returning an a.
I wrote a helper function for defining commands, sendCmd
sendCmd :: (Monad m, FromResp a) => [ByteString] -> (ByteString -> m ByteString) -> m (Either FromRespError a)
sendCmd cmdArgs send = do
  let cmd = encode $ Array $ map BulkString cmdArgs
  send cmd <&> decode >>= \case
    Left desc -> pure $ Left $ FromRespError "Decode" (Text.pack desc)
    Right v -> pure $ fromValue v
which made it easy to define commands. Here are two examples, append and mget:
append :: (ToArg a, ToArg b) => a -> b -> Cmd Int
append key val = sendCmd $ ["APPEND"] <> unArg (toArg key <> toArg val)

-- | https://redis.io/docs/latest/commands/mget/
mget :: (ToArg a, FromResp b) => NE.NonEmpty a -> Cmd (NE.NonEmpty b)
mget ks = sendCmd $ ["MGET"] <> unArg (foldMap1 toArg ks)
The function to send off a command and receive its response, sendAndRecieve,
was just a call to send followed by a call to recv in network (the variants
for lazy bytestrings).
I sort of liked this representation – there's always something pleasant with
finding a way to represent something as a function. There's a very big problem
with it though: it's difficult to implement pipelining!
Yes, Cmd is a functor since (->) r is a functor, and thus it's possible to
make it an Applicative, e.g. using free. However, to implement pipelining it's
necessary to
encode all commands, then
concatenate them all into a single bytestring and send it
read the response, which is a concatenation of the individual commands'
responses, and
convert each separate response from RESP3.
That isn't easy when each command contains its own encoding and decoding. The
sender function would have to relinquish control after encoding the command, and
resume again later to decode the response. I suspect it's doable using
continuations, or monad-coroutine, but it felt complicated and rather than
travelling down that road I asked for ideas on the Haskell Discourse. The
replies lead me to a paper, Free delivery, and a bit later a package,
monad-batcher. When I got the pointer to the package I'd already read the paper
and started implementing the ideas in it, so I decided to save exploring
monad-batcher for later.
A command for free delivery
The paper Free delivery is a perfect match for pipelining in Redis, and my
understanding is that it proposes a solution where
Commands are defined as a GADT, Command a.
Two functions are defined to serialise and deserialise a Command a. In the
paper they use String as the serialisation, so show and read is used.
A type, ActionA a, is defined that combines a command with a modification
of its a result. It implements Functor.
A free type, FreeA f a is defined, and made into an Applicative with the
constraint that f is a Functor.
A function, serializeA, is defined that traverses a FreeA ActionA a
serialising each command.
A function, deserializeA, is defined that traverses a FreeA ActionA a
deserialising the response for each command.
I defined a command type, Command a, with only three commands in it, echo,
hello, and ping. I then followed the recipe above to verify that I could get
it working at all. The Haskell used in the paper is showing its age, and there
seems to be a Functor instance missing, but it was still straightforward and
I could verify that it worked against a locally running Redis.
Then I made a few changes…
I renamed the command type to Cmd so I could use Command for what the
paper calls ActionA.
data Cmd r where
  Echo :: Text -> Cmd Text
  Hello :: Maybe Int -> Cmd ()
  Ping :: Maybe Text -> Cmd Text

data Command a = forall r. Command !(r -> a) !(Cmd r)

instance Functor Command where
  fmap f (Command k c) = Command (f . k) c

toWireCmd :: Cmd r -> ByteString
toWireCmd (Echo msg) = _
toWireCmd (Hello ver) = _
toWireCmd (Ping msg) = _

fromWireResp :: Cmd r -> Resp -> Either RespError r
fromWireResp (Echo _) = fromResp
fromWireResp (Hello _) = fromResp
fromWireResp (Ping _) = fromResp
(At this point I was still using FromResp.)
I also replaced the free applicative defined in the paper and started using
free. A couple of type aliases make it a little easier to write nice signatures
type Pipeline a = Ap Command a
type PipelineResult a = Validation [RespError] a
and defining individual pipeline commands turned into something rather
mechanical. (I also swapped the order of the arguments to build a Command so I
can use point-free style here.)
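For instance, the wrappers might look like this (my sketch using Ap and liftAp from Control.Applicative.Free in the free package; the post's exact code may differ):
-- liftAp :: f a -> Ap f a
echo :: Text -> Pipeline Text
echo = liftAp . Command id . Echo

ping :: Maybe Text -> Pipeline Text
ping = liftAp . Command id . Ping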
On the other hand deserialisation became a little more involved, but it's not
too bad
fromWirePipelineResp :: Pipeline a -> [Resp] -> PipelineResult a
fromWirePipelineResp (Pure a) _ = pure a
fromWirePipelineResp (Ap (Command k c) p) (r : rs) =
  fromWirePipelineResp p rs <*> (k <$> liftError singleton (fromWireResp c r))
fromWirePipelineResp _ _ = Failure [RespError "fromWirePipelineResp" "Unexpected wire result"]
Everything was working nicely and I started adding support for more commands. I
used the small service from work to guide my choice of what commands to add.
First out was del, then get and set. After adding lpush I was pretty much ready
to try to replace hedis in the service from work.
data Cmd r where
  -- echo, hello, ping
  Del :: (ToKey k) => NonEmpty k -> Cmd Int
  Get :: (ToKey k, FromResp r) => k -> Cmd r
  Set :: (ToKey k, ToArg v) => k -> v -> Cmd Bool
  Lpush :: (ToKey k, ToArg v) => k -> NonEmpty v -> Cmd Int
However, looking at the above definition got me thinking.
Was it really a good idea to litter Cmd with constraints like that?
Would it make sense to keep the Cmd type a bit closer to the actual Redis
commands?
Also, maybe FromResp wasn't such a good idea after all, what if I remove it?
That brought me to the third version of the type for Redis commands.
Converging and simplifying
While adding new commands and writing instances of FromResp I slowly realised
that my initial thinking of RESP3 as somewhat similar to JSON didn't really pan
out. I had quickly dropped ToResp and now the instances of FromResp didn't
sit right with me. They obviously had to "follow the commands", so to speak, but
at the same time allow users to bring their own types. For instance, LPUSH
returns the number of pushed messages, but at the same time GET should be able
to return an Int too. This led to Int's FromResp looking like this
instance FromResp Int where
  fromResp (BulkString bs) =
    case parseOnly (AC8.signed AC8.decimal) bs of
      Left s -> Left $ RespError "FromResp" (TL.pack s)
      Right n -> Right n
  fromResp (Number n) = Right $ fromEnum n
  fromResp _ = Left $ RespError "FromResp" "Unexpected value"
I could see this becoming worse. Take the instance for Bool: I'd have to
consider that
for MOVE, Integer 1 means True and Integer 0 means False
for SET, SimpleString "OK" means True
users would justifiably expect a bunch of bytestrings to be True, e.g.
BulkString "true", BulkString "TRUE", BulkString "1", etc.
However, it's impossible to cover all the ways users can encode a Bool in a
ByteString, so no matter what I do users will end up having to wrap their
Bool in a newtype and implement a fitting FromResp. On top of that, even
though I haven't found any example of it yet, I fully expect there to be,
somewhere in the large set of Redis commands, at least two commands each wanting
an instance of a basic type that simply can't be combined into a single
instance, meaning that the client library would need to do some newtype
wrapping too.
No, I really didn't like it! So, could I get rid of FromResp and still offer
users an API where they can use their own types as the result of commands?
To be concrete I wanted this
data Cmd r where
  -- other commands
  Get :: (ToKey k) => k -> Cmd (Maybe ByteString)
and I wanted the user to be able to conveniently turn a Cmd r into a Cmd s.
In other words, I wanted a Functor instance. Making Cmd itself a functor
isn't necessary and I just happened to already have a functor type that wraps
Cmd, the Command type I used for pipelining. If I were to use that I'd need
to write wrapper functions for each command though, but if I did that then I
could also remove the ToKey/ToArg constraints from the constructors of Cmd
r and put them on the wrapper instead. I'd get
data Cmd r where
  -- other commands
  Get :: Key -> Cmd (Maybe ByteString)

get :: (ToKey k) => k -> Command (Maybe ByteString)
get = Command id . Get . toKey
I'd also have to rewrite fromWireResp so it's more specific for each command.
Instead of
fromWireResp :: Cmd r -> Resp -> Either RespError r
fromWireResp (Get _) = fromResp
...
I had to match up exactly on the possible replies to GET
fromWireResp :: Cmd r -> Resp -> Either RespError r
fromWireResp _ (SimpleError err desc) = Left $ RespError (T.decodeUtf8 err) (T.decodeUtf8 desc)
fromWireResp (Get _) (BulkString bs) = Right $ Just bs
fromWireResp (Get _) Null = Right Nothing
...
fromWireResp _ _ = Left $ RespError "fromWireResp" "Unexpected value"
Even though it was more code I liked it better than before, and I think it's
slightly simpler code. I also hope it makes the use of the API a bit simpler
and clearer.
Here's an example from the code for the service I wrote for work. It reads a UTC
timestamp stored in timeKey, the timestamp is a JSON string so it needs to be
decoded.
readUTCTime :: Connection -> IO (Maybe UTCTime)
readUTCTime conn =
  sendCmd conn (maybe Nothing decode <$> get timeKey) >>= \case
    Left _ -> pure Nothing
    Right datum -> pure datum
What's next?
I'm pretty happy with the command type for now, though I have a feeling I'll
have to revisit Arg and ToArg at some point.
I've just turned the Connection type into a pool using resource-pool, and I
started looking at pub/sub. The latter thing, pub/sub, will require some thought
and experimentation I think. Quite possibly it'll end up in a post here too.
Of course one could use RESP3 as the serialisation format for storing values
in Redis. Personally I think I'd prefer using something more widely used, and
easier to read, such as JSON or BSON.
This is the twenty-seventh edition of our GHC activities report, which describes
the work Well-Typed are doing on GHC, Cabal, HLS and other parts of the core Haskell toolchain.
The current edition covers roughly the months of March 2025 to May 2025.
You can find the previous editions collected under the
ghc-activities-report tag.
Sponsorship
We offer Haskell Ecosystem Support Packages to provide commercial
users with support from Well-Typed’s experts, while investing in the Haskell
community and its technical ecosystem including through the work described in
this report. To find out more, read our recent announcement of these
packages in partnership with
the Haskell Foundation. We need funding to continue this essential maintenance work!
Many thanks to our Haskell Ecosystem Supporters: Channable
and QBayLogic;
to our existing clients who also contribute to making this work possible:
Anduril, Juspay and Mercury;
and to the HLS Open Collective for
supporting HLS release management.
Team
The Haskell toolchain team at Well-Typed currently includes:
In addition, many others within Well-Typed contribute to GHC, Cabal and HLS
occasionally, or contribute to other open source Haskell libraries and tools.
Explicit level imports
This feature
allows one to specify whether imports are needed for running Template Haskell
splices, or for generating Template Haskell quotes. This cleanly separates which
modules are required at compile-time vs those that are required at runtime.
For example, the pandoc package uses the Template Haskell deriveJSON
function from the aeson package. This function can be imported using a splice import:
{-# LANGUAGE ExplicitLevelImports #-}
{-# LANGUAGE TemplateHaskell #-}

module Text.Pandoc.App.Opt where

import splice Data.Aeson.TH (deriveJSON, defaultOptions)
-- + many other non-splice imports

data XYZ = ...

$(deriveJSON defaultOptions ''XYZ)
Declaring the Data.Aeson.TH import as a splice import informs GHC that this module is required only at compile-time, and (crucially) that the other, non-splice imports are not needed at compile time. This hugely improves the performance
of tools that use -fno-code (such as HLS), as GHC is no longer required to
pessimistically assume that all modules imported in a module enabling
TemplateHaskell are required at compile-time.
GHCi support for primops
Andreas significantly improved GHCi performance by implementing certain GHC
primops (such as integer arithmetic operations) directly in the bytecode
interpreter (!13978).
Reductions in runtime of up to 50% have been observed, with GHC-in-GHCi speeding
up by about 15%.
Improvements to the debugger
Rodrigo has made numerous improvements to the GHCi debugger, which had
accumulated many bugs over the years due to lack of maintenance
(!14246, !14195, !14160, !14106, !14196, !14195, !13997).
Usability is improved across the board, with quality-of-life fixes such as
adding breakpoints to all statements in a do block to make debugging more
predictable (#25932) to significant performance improvements to :steplocal
(#25779).
Rodrigo also published the ghc-debugger package
including an executable ghc-debug-adapter.
This implements the Debug Adapter Protocol, enabling Haskell programs
to be stepped-through and debugged from editors such as Visual Studio Code.
ghc-debug-adapter depends on many recent changes to GHC, so it is compatible only
with the upcoming GHC 9.14.
Expressions in SPECIALISE pragmas
Sam worked with Simon Peyton Jones to finalise MR !12319 “Expressions in SPECIALISE
pragmas”.
This change means that a SPECIALISE pragma is no longer required to simply be
a type signature; it can be an arbitrary expression. For full details, see GHC proposal #493,
but two particular idioms are worth noting.
Firstly, the type at which to specialise can now be specified by a type application,
e.g.
myFunction :: forall a. Num a => a -> Maybe a -> (a, a)
myFunction = ...

{-# SPECIALISE myFunction @Int #-}
This specialise pragma is much more concise than:
{-# SPECIALISE myFunction :: Int -> Maybe Int -> (Int, Int) #-}
and less prone to breakage when the type of myFunction changes.
Secondly, the syntax enables value specialisation, for example:
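(The following is an illustrative sketch rather than the report's original example.) A function taking a debug flag can be specialised to the flag being off:
import Debug.Trace (trace)

process :: Bool -> [Int] -> Int
process debug xs
  | debug     = trace ("processing " ++ show (length xs) ++ " items") (sum xs)
  | otherwise = sum xs

{-# SPECIALISE process False #-}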
This tells GHC to optimise the non-debug code path, without the debug logic
potentially getting in the way.
Multiple Home Units support in GHCi
GHC 9.14 is fully compatible with multiple home units, including all GHCi commands and the GHCi debugger,
thanks to work by Hannes about which we recently published a blog post (!14231).
Our new design generalises the architecture of GHCi so that multi-unit and single-unit sessions are handled in the same way.
The uniform handling will make sure that multi-unit sessions work correctly as GHCi evolves.
The team are now working towards the release of GHC 9.14.1 later this year.
After various community discussions, GHC HQ are planning to start designating some major release series as Long Term Support. This means increasing the length of the support window for LTS releases and reducing it for non-LTS releases.
Frontend
Sam fixed a regression in the implementation of QuickLook in GHC 9.12 that
would cause valid programs to be rejected (#26030, #25950, !14235).
Sam fixed a problem in which HasCallStack evidence was incorrectly cached
in GHC, causing GHC to bogusly report identical call stacks (#25529, !14084).
Sam rectified several oversights in the initial implementation of the
NamedDefaults language extension laid out in GHC proposal #409:
an issue with exporting named defaults (#25857, !14142),
lack of support for named default declarations for poly-kinded typeclasses such
as Typeable (#25882, !14143),
Sam fixed duplicate record fields sometimes being reported as unused
when they are actually used (#24035, !14066).
Sam improved the error message emitted by GHC when one attempts to write
a non-class at the head of a typeclass instance (#22688, !14105).
Sam fixed several issues with the renaming of export lists:
one issue involved the TypeData extension (#24027, !14119),
another was to do with bundled pattern synonyms (#25892, !14154).
Sam made “illegal term-level use” error messages more user friendly (#23982, !14122).
That MR also improved the way GHC reports name qualification to the user,
preferring to display the user-written qualification in error messages.
Sam fixed GHC creating unnecessary cycle-breaker variables, which could cause
problems for type-checking plugins that weren’t expecting them (#25933, !14206).
Sam implemented the deprecation described in GHC proposal #448:
the combination of ScopedTypeVariables and TypeApplications no longer
enables the use of type applications in constructor patterns, requiring
instead the TypeAbstractions extension (!13551).
Sam fixed an issue in which equal types compared non-equal under TypeRep-equality
by implementing a suggestion by Krzysztof Gogolewski (#25998, !14281).
Sam improved the documentation surrounding defaulting in the user’s guide,
providing a high-level overview of the different mechanisms in GHC for
defaulting ambiguous type variables (#25807, !14057).
Backend
Ben and Sam investigated testsuite failures in the LLVM backend (#25769).
They identified many different issues:
#25730 concerned incorrect type annotations in the generated LLVM, fixed
in !13936.
#25770, #25773 were symptoms of a serious bug in the implementation of
floating-point register padding (fixed in !14134),
!14129 fixed incorrect type annotations in the LLVM for atomic operations,
adding new tests to Cmm Lint to avoid similar bugs in the future.
Most of the other bugs involved initializers/finalizers, which were due to
incorrect linkage annotation for builtin arrays (fixed in !14157).
Rodrigo worked with Simon Peyton Jones to fix an issue in which the presence or absence
of unrelated RULES could affect compilation, leading to non-deterministic
compilation (#25170, !13884).
Andreas fixed a bug in which GHC would construct over-saturated constructor
applications, which caused a panic when building the xmonad-contrib package
(#23865, !14036).
Andreas made GHC constant-fold away invalid tagToEnum# calls to a particular
error expression, which unlocks dead-code elimination opportunities and
makes it easier to debug issues that arise from invalid use of tagToEnum# (#25976, !14254)
Andreas added -fhuge-code-sections, an off-by-default flag that provides
a workaround for AArch64 users running into bug #24648.
Matthew overhauled the driver to bring one-shot compilation and make mode
in line with each other, by consistently using the module graph to answer
queries related to the module import structure (!14198, !14209).
This was partly motivated by implementation requirements of the “Explicit
Splice Imports” proposal, for which module graph queries are a central
component.
Matthew added support for “fixed” nodes in the module graph, which can be used
for modules without corresponding source-files that are e.g. generated via
the GHC API (#25920, !14187).
Rodrigo moved some DynFlags consistency checks in order to consolidate
the logic into the core makeDynFlagsConsistent function.
Ben changed how GHC prints Uniques to the user to avoid NULL characters
(#25989, !14265).
Compiler performance
Matthew improved the performance of the bytecode assembler by ensuring the
code is properly specialised (!13983).
Matthew made sure that forceModIface properly forced all fields of ModIface
in order to avoid space leaks (!14078).
Matthew removed unused mi_used_th and mi_hpc fields from interfaces, which
were needlessly bloating interface files (!14073).
Matthew avoided allocation of intermediate ByteStrings when serialising
FastStrings (#25861, !14107).
Recompilation checking
Matthew overhauled the ModIface datatype, splitting it up in a more logical
way which makes it easier to identify which parts contribute to recompilation
checking (!14102). This allowed fixing several issues with recompilation
checking in !14118, such as:
it ignored changes in exported named default declarations (#25855),
it did not take into account changes to COMPLETE pragmas (#25854).
Matthew added the -fwrite-if-self-recomp flag which controls whether to
include self-recompilation information, which avoids writing recompilation
information in cases such as producing binary distributions for which
recompilation is not a concern (#10424, #22188, !8604).
Matthew refactored the implementation of recompilation-checking to ensure
that all flags that influence recompilations are correctly taken into account
(#25837, !14085).
Sam improved recompilation checking for export lists in !14178 (#25881).
In practice, this means that modules with explicit import lists will no longer
always trigger the recompilation of a module they depend on when that module’s
export list changes, as long as the explicitly imported items are preserved.
Matthew improved the output of -ddump-hi-diff to properly display the precise
change in flags which caused recompilation (#25571, !13792).
Runtime system
Ben fixed a bug in which the WinIO I/O manager was being inconsistently
selected (#25838, !14088).
Ben diagnosed and fixed a linking issue affecting global offset table usage
on macOS that manifested in incorrect runtime results when using the GHC API
(#25577, !13991).
Ben fixed an issue in which GHC’s RTS linker was too eager to load shared
objects which refer to undefined symbols (#25943, !14290).
Ben significantly improved the performance of the RTS linker, culminating in
a reduction in GHCi startup time from 2.5s to 250ms on Windows (#26052,
#26009, !14339).
GHCi & bytecode interpreter
Andreas fixed several endianness issues in the interpreter (#25791, !14172).
Matthew implemented a fix for the mishandling of stack underflow frames
(#25750, !13957). A remaining issue was subsequently identified (#25865)
and fixed by Andreas’ work on the interpreter (!13978).
Matthew ensured that all top-level functions are visible when loading
a module in the interpreter, not only exported functions (!14032).
Matthew fixed a bug in the simplifier that caused Core Lint failures when
compiling certain programs (#25790, !14019).
Matthew fixed a regression in the way that GHCi would import modules that
involved Cabal mixins stanzas (#25951, !14222).
Libraries
Ben exposed the constructors and fields of the Backtrace datatype in base
(#26049, !14351).
Ben brought base changelog entries up to date in !14320.
Build system & packaging
Sam fixed GHC not working properly if the installation path contains spaces
on Windows (#25204, !14137).
Ben fixed a couple of issues relating to the llvm-as flag:
the value of the field was incorrectly set (#25856, !14104),
the information in the field was passed incorrectly to clang (#25793, !14025).
Testsuite
Andreas fixed a bug in which tests requiring the interpreter would be run even
if the compiler didn’t support it (#25533, !14201).
Matthew fixed an issue with tests that used Template Haskell in the profiled
dynamic way (#25947, !14215).
Cabal
Mikolaj prepared the 3.14.2.0 bugfix release to the Cabal package suite
(including the Cabal library and cabal-install).
Matthew fixed all known regressions in the 3.14.1.0 release of cabal-install:
Picking up unwanted environment files (#10759, #10828).
Duplication of environment variables (#10718, #10827).
Interaction of multi-repl with internal dependencies (#10775, #10841).
The Hackage documentation builder has been completely
revamped with a
more maintainable deployment strategy and a broader set of native packages
available, enabling more Hackage packages to benefit from automatically-built
documentation.
With these maintainability improvements we hope that haskell.org’s core
infrastructure team can be more easily grown in the future.
A couple of weeks ago I needed a small, hopefully temporary, service at work. It
bridges a gap in functionality provided by a legacy system and the functionality
desired by a new system. The legacy system is cumbersome to work with, so we
tend to prefer building anti-corruption layers rather than changing it directly,
and sometimes we implement it as separate services.
This time it was good enough to run the service as a cronjob, but it did need to
keep track of when it ran the last time. It felt silly to spin up a separate DB
just to keep a timestamp, and using another service's DB is something I really
dislike and avoid.1 So, I ended up using the Redis instance that's used as a
cache by an OSS service we host.
The last time I had a look at the options for writing a Redis client in Haskell
I found two candidates, hedis and redis-io. At the time I wrote a short note
about them. This time around I found nothing much has changed, they are still
the only two contenders and they still suffer from the same issues:
hedis still has the same API and I still find it just as awkward.
redis-io still requires a logger.
I once again decided to use hedis and wrote the service for work in a couple
of days, but this time I thought I'd see what it would take to remove the
requirement on tinylog from redis-io. I spent a few evenings on it, though I
spent most time on "modernising" the dev setup, using Nix to build, re-format
using fourmolu, etc. I did the same for redis-resp, the main dependency of
redis-io. The result of that can be found on my gitlab account:
At the moment I won't take that particular experiment any further and given that
the most recent change to redis-io was in 2020 (according to its git repo)
I don't think there's much interest upstream either.
Making the changes to redis-io and redis-resp made me a little curious about
the Redis protocol so I started reading about it. It made me start thinking
about implementing a client lib myself. How hard could it be?
I'd also asked a question about Redis client libs on r/haskell and a response
led me to redis-schema. It has a very good README, and its section on
transactions makes the observation that Redis transactions are a perfect match
for Applicative. This pushed me even closer to starting to write a client lib.
What pushed me over the edge was the realisation that pipelining also is a
perfect match for Applicative.
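To make that connection concrete, here is a minimal sketch of my own (a toy model, not the API of hedis, redis-io or redis-schema) of why pipelining fits Applicative: a free-applicative-style structure lets every request be collected before anything is sent, so the whole batch can go to the server in one round trip and the replies can be decoded afterwards.

{-# LANGUAGE GADTs #-}

-- Toy model: a Command pairs a request with a pure decoder for its reply.
data Command a = Command
  { request :: String       -- what would be sent to the server
  , decode  :: String -> a  -- how to interpret the reply
  }

-- A pipeline is a free-applicative-style chain of commands.
data Pipeline a where
  Pure :: a -> Pipeline a
  Ap   :: Pipeline (b -> a) -> Command b -> Pipeline a

instance Functor Pipeline where
  fmap f (Pure a)  = Pure (f a)
  fmap f (Ap pf c) = Ap (fmap (f .) pf) c

instance Applicative Pipeline where
  pure = Pure
  pf <*> Pure a  = fmap ($ a) pf
  pf <*> Ap pg c = Ap ((.) <$> pf <*> pg) c

-- Every request in the pipeline is known before anything is sent,
-- which is exactly what pipelining needs.
requests :: Pipeline a -> [String]
requests (Pure _)  = []
requests (Ap pf c) = requests pf ++ [request c]

Because (<*>) cannot make a later request depend on an earlier reply, the whole batch is known up front; a Monad interface could not give that guarantee, which is (I think) the observation that makes both transactions and pipelining such a good fit for Applicative.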
For the last few weeks I've spent some of my free time reading and experimenting
and I'm enjoying it very much. We'll see where it leads, but hopefully I'll at
least have a bit more to write about it.
GHC’s support for compiling multiple units in a single invocation is essential
for tooling to work well with real-world Haskell projects.
Loading your whole project into a single GHCi session allows you to get feedback
quickly on changes to any part of your project, without having to restart the REPL.
Until now, not all of GHCi worked with multiple home units, and this was a source of confusion
for many users.
We’re now happy to announce that in 9.14.1, GHCi will fully support multiple home units.
This post contains a brief overview of the changes.
Multiple Home Units
Work on multiple home units has been ongoing for a while. This is the latest chapter
in our efforts to update the ecosystem to support this feature.
The main way to start a multi-unit GHCi session is by using cabal repl --enable-multi-repl with
a selector that selects multiple components in the project, such as all:
> cabal repl --enable-multi-repl all
This will start a GHCi session with a home unit for each selected component. Until now, support in the REPL was
essentially limited to reloading modules to get feedback about changes. Almost all other commands were unsupported
when using multiple home units.
GHCi Supports Multiple Home Units
Following our changes, GHCi now fully supports multiple home units in its REPL.
The experience of a user is now the same whether they are using a single home unit or multiple home units.
In particular, the following features have been fixed or enabled:
Usual REPL usage such as evaluating expressions
All GHCi commands
:seti/:set
:browse
:module [+/-] [*]Mod1 ...
… and many more!
The GHCi debugger
:break, :steplocal, :continue, etc…
Implementing Multi Unit Support in GHCi
To fully support multiple home units, GHCi needed a new internal model of how
different contexts interact during a session. There are three key contexts:
the prompt (the context in which expressions are evaluated),
the script context (in which scripts loaded by :load are executed), and
the unit context (the home units specified on the command line, e.g. the components of the Cabal packages being loaded).
Distinguishing these three different contexts is the key to our design. Before,
each GHCi session only had a single home unit, and so commands would always be interpreted
relative to that unit. In a multi-unit session, one of the units was chosen as the “active”
unit, and commands would be interpreted relative to that unit. Now since it is possible
to talk precisely about the different contexts, the dependencies between them and
where commands should be interpreted, we can properly implement all GHCi commands.
Virtual home units
Our design adds virtual home units for the prompt and script contexts. Therefore,
every GHCi session is a multi-unit session, and all commands are modified to support this.
This virtual home unit for the prompt is called interactive-ghci.
All user input is interpreted in the context of interactive-ghci (it is the “active” unit).
Since it always depends on all user-given home units (i.e. those given on the command line),
we can import modules, run code,
and execute GHCi commands as usual.
The virtual home unit for scripts is called interactive-session.
It is similar in structure to interactive-ghci, namely that it depends on all user-given home units.
This allows scripts to use packages from the current GHCi REPL session.
Additionally, interactive-ghci depends on interactive-session, allowing the user to load and execute the script modules from the prompt.
Why do we need two virtual home units?
When a script is loaded via :load Mod.hs, this Mod.hs needs to be interpreted relative to some home unit.
We do not want to guess which home unit Mod.hs should be added to, since the behaviour is hard to predict in a multiple home unit session.
However, we also can’t add Mod.hs to the interactive-ghci home unit, as we want to be able to maintain a different set of GHC options for the prompt (i.e. interactive-ghci)
and scripts.
Adding these two virtual home units to the GHCi REPL session yields the following Home Unit Graph.
We mark interactive-ghci to indicate that it is the “active” context of the GHCi prompt.
GHCi’s Home Unit Graph, showing two virtual units interactive-ghci and interactive-session, where the former depends on the latter. Both of these depend on any number of user-given home units, indicated by the names pkg1 … pkgN.
Examples
Now that we know how the GHCi session will work, let’s show a couple of concrete examples.
We assume a regular cabal project, initialised via the command:
This creates a cabal project with three components:
lib:mhu-example: The main library.
exe:mhu-example: An executable.
test:mhu-example-test: A test-suite.
From the perspective of GHC, a unit is essentially identical to a single component (with some hand-waving).
Example of a cabal project with multiple components. GHC treats each component as a separate unit.
When we load only the library into a GHCi session, then the library is the single user-specified home unit in the GHCi session.
For example, the cabal invocation
cabal repl lib:mhu-example
invokes the following GHC command:
ghc --interactive -this-unit-id lib-mhu-example -package base -package containers ...
This creates a home unit graph with three home units: interactive-ghci, interactive-session and mhu-example-library.
Home Unit Graph with a single user-specified Home Unit. There are three units, interactive-ghci, interactive-session and lib:mhu-example. interactive-ghci depends on interactive-session and lib:mhu-example, while interactive-session depends on lib:mhu-example.
In the case of more than one user-specified home unit, the graph is extended in an intuitive way.
For example, a cabal invocation selecting all three components (such as cabal repl --enable-multi-repl all, as shown earlier) yields the following home unit graph.
Home Unit Graph with multiple user-specified home units. There are five units, called interactive-ghci, interactive-session, lib:mhu-example, exe:mhu-example and test:mhu-example-test.
Naturally, home units can have dependencies on other home units, e.g. test:mhu-example-test and exe:mhu-example both depend on lib:mhu-example.
Setting REPL Options
The GHCi commands :set and :seti are used to change the GHC options of the home units and the GHC options of the prompt, respectively.
In the new architecture, the :set command applies the new options to all home units except interactive-ghci.
:seti, on the other hand, applies changes only to the interactive-ghci home unit.
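For instance, in a hypothetical session (the flag choices here are only illustrative):

ghci> :seti -XOverloadedStrings  -- affects only expressions typed at the prompt (interactive-ghci)
ghci> :set -Wall                 -- applies to all user-given home units, but not to interactive-ghci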
In the future, we may want to extend the capabilities of the :set command to change the GHC options only for certain home units.
Summary
GHCi is now fully compatible with multiple home units, including all GHCi commands and the GHCi debugger.
Our new design generalises the architecture of GHCi so that multi-unit and single-unit sessions are handled in the same way.
The uniform handling will make sure that multi-unit sessions work correctly as GHCi evolves.
This work has been performed in collaboration with Mercury, who
have a long-term commitment to the scalability and robustness of the Haskell
ecosystem.
Well-Typed are always interested in projects and looking for funding to improve
GHC and other Haskell tools. Please contact info@well-typed.com if we
might be able to work with you!
The unit arguments are passed using response files. The file exe-mhu-example contains the
arguments for the exe:mhu-example home unit, and similarly for the other files.↩︎
In January 2009, while just a baby first-year PhD student, I wrote a
blog post titled Abstraction, intuition, and the “monad tutorial
fallacy”.
In it, I made the argument that humans tend to learn best by first
grappling with concrete examples, and only later proceeding to
higher-level intuition and analogies; hence, it’s a mistake to
think that clearly presenting your intuition for a topic will help
other people understand it. Analogies and intuition can help, but
only when accompanied by concrete examples and active engagement. To
illustrate the point, I made up a fictitious programmer with a
fictitious analogy.
But now Joe goes and writes a monad tutorial called “Monads are
Burritos,” under the well-intentioned but mistaken assumption that
if other people read his magical insight, learning about monads will
be a snap for them. “Monads are easy,” Joe writes. “Think of them as
burritos.” Joe hides all the actual details about types and such
because those are scary, and people will learn better if they can
avoid all that difficult and confusing stuff. Of course, exactly
the opposite is true, and all Joe has done is make it harder for
people to learn about monads…
My intention was to choose a fictitious analogy which was obviously
ridiculous and silly, as a parody of many of the monad tutorials which
existed at the time (and still do). Mark Jason Dominus
then wrote a blog post, Monads are like
burritos, pointing out
that actually, monads are kinda like burritos. It’s really funny,
though I don’t think it’s actually a very good analogy, and my guess
is that Mark would agree: it was clearly written as a silly joke and
not as a real way to explain monads.
In any case, from that point the “monads are burritos” meme took on a
life of its own. For example:
So, to set the record straight: “monads are burritos” is not a helpful
analogy! Yes, I am writing a blog post because People Are Wrong On
The Internet, and I know it probably won’t
make any difference, but here we are.
The burrito analogy strongly implies that a value of type m a
somehow “contains” a value (or values) of type a. But that is not
true for all monads (e.g. there is no sense in which a value of type
IO String contains a String).
Relatedly, the analogy also implies that a value of type m a can
be “unwrapped” to get an a, but this is impossible for many monads.
It is not actually very easy to take a burrito containing a burrito
and merge it into a single-level burrito. At least this is not in
any sense a natural operation on burritos. Perhaps you could argue
that it is always easy to remove outer tortilla layers (but not the
innermost one since the food will all fall out), but this is a bad
analogy, since in general join does not just “remove” an outer
layer, but somehow merges the effects of two layers into one (see the small example below).
Actually, burritos are a great analogy for the Identity monad!
…but not much beyond that.
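To make the point about join concrete, here is a small example of my own using the standard join from Control.Monad: it merges the effects of the outer layer with those of the inner one rather than just peeling off a wrapper.

import Control.Monad (join)

listExample :: [Int]
listExample = join [[1,2],[3,4]]  -- [1,2,3,4]: outer and inner structure are merged

ioExample :: IO ()
ioExample = join (putStrLn "outer" >> pure (putStrLn "inner"))
  -- prints "outer" and then "inner": the outer action's effect is not discarded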
On a more positive note, my sense is that the average
pedagogical quality of Haskell materials, and monad tutorials in
particular, has indeed gone up significantly since 2009. I’d love to
think this can be at least partially attributed to my original blog
post, though of course it’s impossible to know that for sure.
(Updated June 2025 for PenroseKiteDart version 1.4)
PenroseKiteDart is a Haskell package with tools to experiment with finite tilings of Penrose’s Kites and Darts. It uses the Haskell Diagrams package for drawing tilings. As well as providing drawing tools, this package introduces tile graphs (Tgraphs) for describing finite tilings. (I would like to thank Stephen Huggett for suggesting planar graphs as a way to represent the tilings).
This document summarises the design and use of the PenroseKiteDart package.
The PenroseKiteDart package is now available on Hackage.
In figure 1 we show a dart and a kite. All angles are multiples of 36° (a tenth of a full turn). If the shorter edges are of length 1, then the longer edges are of length φ, where φ = (1+√5)/2 is the golden ratio.
Figure 1: The Dart and Kite Tiles
Aperiodic Infinite Tilings
What is interesting about these tiles is:
It is possible to tile the entire plane with kites and darts in an aperiodic way.
Such a tiling is non-periodic and does not contain arbitrarily large periodic regions or patches.
The possibility of aperiodic tilings with kites and darts was discovered by Sir Roger Penrose in 1974. There are other shapes with this property, including a chiral aperiodic monotile discovered in 2023 by Smith, Myers, Kaplan, Goodman-Strauss. (See the Penrose Tiling Wikipedia page for the history of aperiodic tilings)
This package is entirely concerned with Penrose’s kite and dart tilings also known as P2 tilings.
Legal Tilings
In figure 2 we add a temporary green line marking purely to illustrate a rule for making legal tilings. The purpose of the rule is to exclude the possibility of periodic tilings.
If all tiles are marked as shown, then whenever tiles come together at a point, they must all be marked or must all be unmarked at that meeting point. So, for example, each long edge of a kite can be placed legally on only one of the two long edges of a dart. The kite wing vertex (which is marked) has to go next to the dart tip vertex (which is marked) and cannot go next to the dart wing vertex (which is unmarked) for a legal tiling.
Figure 2: Marked Dart and Kite
Correct Tilings
Unfortunately, having a finite legal tiling is not enough to guarantee you can continue the tiling without getting stuck. Finite legal tilings which can be continued to cover the entire plane are called correct and the others (which are doomed to get stuck) are called incorrect. This means that decomposition and forcing (described later) become important tools for constructing correct finite tilings.
2. Using the PenroseKiteDart Package
You will need the Haskell Diagrams package (See Haskell Diagrams) as well as this package (PenroseKiteDart). When these are installed, you can produce diagrams with a Main.hs module. This should import a chosen backend for diagrams such as the default (SVG) along with Diagrams.Prelude.
module Main (main) where
import Diagrams.Backend.SVG.CmdLine
import Diagrams.Prelude
For Penrose’s Kite and Dart tilings, you also need to import the PKD module and (optionally) the TgraphExamples module.
import PKD
import TgraphExamples
Then, to output a someExample figure:
fig :: Diagram B
fig = someExample
main :: IO ()
main = mainWith fig
Note that the token B is used in the diagrams package to represent the chosen backend for output. So a diagram has type Diagram B. In this case B is bound to SVG by the import of the SVG backend. When the compiled module is executed it will generate an SVG file. (See Haskell Diagrams for more details on producing diagrams and using alternative backends).
3. Overview of Types and Operations
Half-Tiles
In order to implement operations on tilings (decompose in particular), we work with half-tiles. These are illustrated in figure 3 and labelled RD (right dart), LD (left dart), LK (left kite), RK (right kite). The join edges where left and right halves come together are shown with dotted lines, leaving one short edge and one long edge on each half-tile (excluding the join edge). We have shown a red dot at the vertex we regard as the origin of each half-tile (the tip of a half-dart and the base of a half-kite).
The labels are actually data constructors introduced with the type HalfTile, which has an argument type (rep) to allow for more than one representation of the half-tiles.
data HalfTile rep
  = LD rep -- Left Dart
  | RD rep -- Right Dart
  | LK rep -- Left Kite
  | RK rep -- Right Kite
  deriving (Show, Eq)
Tgraphs
We introduce tile graphs (Tgraphs) which provide a simple planar graph representation for finite patches of tiles. For Tgraphs we first specialise HalfTile with a triple of vertices (positive integers) to make a TileFace such as RD(1,2,3), where the vertices go clockwise round the half-tile triangle starting with the origin.
type TileFace = HalfTile (Vertex, Vertex, Vertex)
type Vertex = Int -- must be positive
The function
makeTgraph :: [TileFace] -> Tgraph
then constructs a Tgraph from a TileFace list after checking the TileFaces satisfy certain properties (described below). We also have
faces :: Tgraph -> [TileFace]
to retrieve the TileFace list from a Tgraph.
As an example, the fool (short for fool’s kite and also called an ace in the literature) consists of two kites and a dart (= 4 half-kites and 2 half-darts):
fool :: Tgraph
fool = makeTgraph [ RD (1,2,3), LD (1,3,4) -- right and left dart
                  , LK (5,3,2), RK (5,2,7) -- left and right kite
                  , RK (5,4,3), LK (5,6,4) -- right and left kite
                  ]
To produce a diagram, we simply draw the Tgraph
foolFigure :: Diagram B
foolFigure = draw fool
which will produce the diagram on the left in figure 4.
Alternatively,
foolFigure :: Diagram B
foolFigure = labelled drawj fool
will produce the diagram on the right in figure 4 (showing vertex labels and dashed join edges).
Figure 4: Diagram of fool without labels and join edges (left), and with (right)
When any (non-empty) Tgraph is drawn, a default orientation and scale are chosen based on the lowest numbered join edge. This is aligned on the positive x-axis with length 1 (for darts) or length φ (for kites).
Tgraph Properties
Tgraphs are actually implemented as
newtype Tgraph = Tgraph [TileFace] deriving Show
but the data constructor Tgraph is not exported to avoid accidentally by-passing checks for the required properties. The properties checked by makeTgraph ensure the Tgraph represents a legal tiling as a planar graph with positive vertex numbers, and that the collection of half-tile faces are both connected and have no crossing boundaries (see note below). Finally, there is a check to ensure two or more distinct vertex numbers are not used to represent the same vertex of the graph (a touching vertex check). An error is raised if there is a problem.
Note: If the TileFaces are faces of a planar graph there will also be exterior (untiled) regions, and in graph theory these would also be called faces of the graph. To avoid confusion, we will refer to these only as exterior regions, and unless otherwise stated, face will mean a TileFace. We can then define the boundary of a list of TileFaces as the edges of the exterior regions. There is a crossing boundary if the boundary crosses itself at a vertex. We exclude crossing boundaries from Tgraphs because they prevent us from calculating relative positions of tiles locally and create touching vertex problems.
For convenience, in addition to makeTgraph, we also have two further construction functions.
The first of these (performing no checks) is useful when you know the required properties hold. The second performs the same checks as makeTgraph except that it omits the touching vertex check. This could be used, for example, when making a Tgraph from a sub-collection of TileFaces of another Tgraph.
Main Tiling Operations
There are three key operations on finite tilings, namely decompose, force, and compose.
Decomposition (also called deflation) works by splitting each half-tile into either 2 or 3 new (smaller scale) half-tiles, to produce a new tiling. The fact that this is possible is used to establish the existence of infinite aperiodic tilings with kites and darts. Since our Tgraphs have abstracted away from scale, the result of decomposing a Tgraph is just another Tgraph. However if we wish to compare before and after with a drawing, the latter should be scaled by a factor of 1/φ times the scale of the former, to reflect the change in scale.
Figure 5: fool (left) and decompose fool (right)
We can, of course, iterate decompose to produce an infinite list of finer and finer decompositions of a Tgraph
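The package provides decompositions for this purpose (it is used with the kingGraph later in these notes); it is presumably just the obvious iteration, something like

decompositions :: Tgraph -> [Tgraph]
decompositions = iterate decompose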
Force works by adding any TileFaces on the boundary edges of a Tgraph which are forced. That is, where there is only one legal choice of TileFace addition consistent with the seven possible vertex types. Such additions are continued until either (i) there are no more forced cases, in which case a final (forced) Tgraph is returned, or (ii) the process finds the tiling is stuck, in which case an error is raised indicating an incorrect tiling. [In the latter case, the argument to force must have been an incorrect tiling, because the forced additions cannot produce an incorrect tiling starting from a correct tiling.]
An example is shown in figure 6. When forced, the Tgraph on the left produces the result on the right. The original is highlighted in red in the result to show what has been added.
Figure 6: A Tgraph (left) and its forced result (right) with the original shown red
Compose
Composition (also called inflation) is an opposite to decompose but this has complications for finite tilings, so it is not simply an inverse. (See Graphs, Kites and Darts and Theorems for more discussion of the problems). Figure 7 shows a Tgraph (left) with the result of composing (right) where we have also shown (in pale green) the faces of the original that are not included in the composition – the remainder faces.
Figure 7: A Tgraph (left) and its (part) composed result (right) with the remainder faces shown pale green
Under some circumstances composing can fail to produce a Tgraph because there are crossing boundaries in the resulting TileFaces. However, we have established that
If g is a forced Tgraph, then compose g is defined and it is also a forced Tgraph.
Try Results
It is convenient to use types of the form Try a for results where we know there can be a failure. For example, compose can fail if the result does not pass the connected and no crossing boundary check, and force can fail if its argument is an incorrect Tgraph. In situations when you would like to continue some computation rather than raise an error when there is a failure, use a try version of a function.
We define Try as a synonym for Either ShowS (which is a monad) in module Tgraph.Try.
type Try a = Either ShowS a
(Note ShowS is String -> String). Successful results have the form Right r (for some correct result r) and failure results have the form Left (s<>) (where s is a String describing the problem as a failure report).
The function
runTry :: Try a -> a
runTry = either error id
will retrieve a correct result but raise an error for failure cases. This means we can always derive an error raising version from a try version of a function by composing with runTry.
force = runTry . tryForce
compose = runTry . tryCompose
Elementary Tgraph and TileFace Operations
The module Tgraph.Prelude defines elementary operations on Tgraphs relating vertices, directed edges, and faces. We describe a few of them here.
When we need to refer to particular vertices of a TileFace we use
originV :: TileFace -> Vertex -- the first vertex - red dot in figure 2
oppV :: TileFace -> Vertex -- the vertex at the opposite end of the join edge from the origin
wingV :: TileFace -> Vertex -- the vertex not on the join edge
A directed edge is represented as a pair of vertices.
type Dedge = (Vertex, Vertex)
So (a,b) is regarded as a directed edge from a to b.
When we need to refer to particular edges of a TileFace we use
joinE :: TileFace -> Dedge -- shown dotted in figure 2
shortE :: TileFace -> Dedge -- the non-join short edge
longE :: TileFace -> Dedge -- the non-join long edge
which are all directed clockwise round the TileFace. In contrast, joinOfTile is always directed away from the origin vertex, so is not clockwise for right darts or for left kites:
joinOfTile :: TileFace -> Dedge
joinOfTile face = (originV face, oppV face)
In the special case that a list of directed edges is symmetrically closed [(b,a) is in the list whenever (a,b) is in the list] we can think of this as an edge list rather than just a directed edge list.
For example,
internalEdges :: Tgraph -> [Dedge]
produces an edge list, whereas
boundary :: Tgraph -> [Dedge]
produces single directions. Each directed edge in the resulting boundary will have a TileFace on the left and an exterior region on the right. The function
dedges :: Tgraph -> [Dedge]
produces all the directed edges obtained by going clockwise round each TileFace so not every edge in the list has an inverse in the list.
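As a small illustration (my own helper, not part of the package), the symmetrically-closed condition mentioned above can be written directly with the Dedge type:

isEdgeList :: [Dedge] -> Bool
isEdgeList des = all (\(a,b) -> (b,a) `elem` des) des  -- every directed edge has its reverse present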
Note: There is now a class HasFaces (introduced in version 1.4) which includes instances for both Tgraph and [TileFace] and others. This allows some generalisations. In particular the more general types of the above three functions are now
internalEdges :: HasFaces a => a -> [Dedge]
boundary :: HasFaces a => a -> [Dedge]
dedges :: HasFaces a => a -> [Dedge]
Patches (Scaled and Positioned Tilings)
Behind the scenes, when a Tgraph is drawn, each TileFace is converted to a Piece. A Piece is another specialisation of HalfTile using a two dimensional vector to indicate the length and direction of the join edge of the half-tile (from the originV to the oppV), thus fixing its scale and orientation. The whole Tgraph then becomes a list of located Pieces called a Patch.
type Piece = HalfTile (V2 Double)
type Patch = [Located Piece]
Piece drawing functions derive vectors for other edges of a half-tile piece from its join edge vector. In particular (in the TileLib module) we have
drawPiece :: Piece -> Diagram B
dashjPiece :: Piece -> Diagram B
fillPieceDK :: Colour Double -> Colour Double -> Piece -> Diagram B
where the first draws the non-join edges of a Piece, the second does the same but adds a dashed line for the join edge, and the third takes two colours – one for darts and one for kites, which are used to fill the piece as well as using drawPiece.
Patch is an instance of class Transformable so a Patch can be scaled, rotated, and translated.
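For example (my own snippet, assuming the imports from section 2 and the Patch type above):

-- rotate a Patch by a tenth of a turn, then scale it by the golden ratio
spinAndGrow :: Patch -> Patch
spinAndGrow p = scale ((1 + sqrt 5) / 2) (rotate (36 @@ deg) p)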
Vertex Patches
It is useful to have an intermediate form between Tgraphs and Patches, that contains information about both the location of vertices (as 2D points), and the abstract TileFaces. This allows us to introduce labelled drawing functions (to show the vertex labels) which we then extend to Tgraphs. We call the intermediate form a VPatch (short for Vertex Patch).
type VertexLocMap = IntMap.IntMap (Point V2 Double)
data VPatch = VPatch {vLocs :: VertexLocMap, vpFaces :: [TileFace]} deriving Show
and
makeVP :: Tgraph -> VPatch
calculates vertex locations using a default orientation and scale.
VPatch is made an instance of class Transformable so a VPatch can also be scaled and rotated.
One essential use of this intermediate form is to be able to draw a Tgraph with labels, rotated but without the labels themselves being rotated. We can simply convert the Tgraph to a VPatch, and rotate that before drawing with labels.
labelled draw (rotate someAngle (makeVP g))
We can also align a VPatch using vertex labels.
alignXaxis :: (Vertex, Vertex) -> VPatch -> VPatch
So if g is a Tgraph with vertex labels a and b we can align it on the x-axis with a at the origin and b on the positive x-axis (after converting to a VPatch), instead of accepting the default orientation.
labelled draw (alignXaxis (a,b)(makeVP g))
Another use of VPatches is to share the vertex location map when drawing only subsets of the faces (see Overlaid examples in the next section).
4. Drawing in More Detail
Class Drawable
There is a class Drawable with instances Tgraph, VPatch, Patch. When the token B is in scope standing for a fixed backend then we can assume
draw :: Drawable a => a -> Diagram B -- draws non-join edges
drawj :: Drawable a => a -> Diagram B -- as with draw but also draws dashed join edges
fillDK :: Drawable a => Colour Double -> Colour Double -> a -> Diagram B -- fills with colours
where fillDK clr1 clr2 will fill darts with colour clr1 and kites with colour clr2 as well as drawing non-join edges.
These are the main drawing tools. However they are actually defined for any suitable backend b so have more general types.
(Update Sept 2024) As of version 1.1 of PenroseKiteDart, these will be
draw :: (Drawable a, OKBackend b) =>
        a -> Diagram b
drawj :: (Drawable a, OKBackend b) =>
        a -> Diagram b
fillDK :: (Drawable a, OKBackend b) =>
        Colour Double -> Colour Double -> a -> Diagram b
where the class OKBackend is a check to ensure a backend is suitable for drawing 2D tilings with or without labels.
In these notes we will generally use the simpler description of types using B for a fixed chosen backend for the sake of clarity.
The drawing tools are each defined via the class function drawWith using Piece drawing functions.
class Drawable a where
  drawWith :: (Piece -> Diagram B) -> a -> Diagram B
draw = drawWith drawPiece
drawj = drawWith dashjPiece
fillDK clr1 clr2 = drawWith (fillPieceDK clr1 clr2)
To design a new drawing function, you only need to implement a function to draw a Piece, (let us call it newPieceDraw)
newPieceDraw :: Piece -> Diagram B
This can then be elevated to draw any Drawable (including Tgraphs, VPatches, and Patches) by applying the Drawable class function drawWith:
newDraw :: Drawable a => a -> Diagram B
newDraw = drawWith newPieceDraw
Class DrawableLabelled
Class DrawableLabelled is defined with instances Tgraph and VPatch, but Patch is not an instance (because this does not retain vertex label information).
class DrawableLabelled a where
  labelColourSize :: Colour Double -> Measure Double -> (Patch -> Diagram B) -> a -> Diagram B
So labelColourSize c m modifies a Patch drawing function to add labels (of colour c and size measure m). Measure is defined in Diagrams.Prelude with pre-defined measures tiny, verySmall, small, normal, large, veryLarge, huge. For most of our diagrams of Tgraphs, we use red labels and we also find small is a good default size choice, so we define
labelSize :: DrawableLabelled a => Measure Double -> (Patch -> Diagram B) -> a -> Diagram B
labelSize = labelColourSize red
labelled :: DrawableLabelled a => (Patch -> Diagram B) -> a -> Diagram B
labelled = labelSize small
and then labelled draw, labelled drawj, labelled (fillDK clr1 clr2) can all be used on both Tgraphs and VPatches as well as (for example) labelSize tiny draw, or labelColourSize blue normal drawj.
Further drawing functions
There are a few extra drawing functions built on top of the above ones. The function smart is a modifier to add dashed join edges only when they occur on the boundary of a Tgraph
smart :: (VPatch -> Diagram B) -> Tgraph -> Diagram B
So smart vpdraw g will draw dashed join edges on the boundary of g before applying the drawing function vpdraw to the VPatch for g. For example the following all draw dashed join edges only on the boundary for a Tgraph g
smart draw g
smart (labelled draw) g
smart (labelSize normal draw) g
When using labels, the function rotateBefore allows a Tgraph to be drawn rotated without rotating the labels.
There is also restrictSmart, where restrictSmart g vpdraw vp uses the given vp for drawing boundary joins and drawing faces of g (with vpdraw) rather than converting g to a new VPatch. This assumes vp has locations for the vertices of g.
Overlaid examples (location map sharing)
The function
drawForce :: Tgraph -> Diagram B
will (smart) draw a Tgraph g in red overlaid (using <>) on the result of force g as in figure 6. Similarly
drawPCompose :: Tgraph -> Diagram B
applied to a Tgraph g will draw the result of a partial composition of g as in figure 7. That is a drawing of compose g but overlaid with a drawing of the remainder faces of g shown in pale green.
Both these functions make use of sharing a vertex location map to get correct alignments of overlaid diagrams. In the case of drawForce g, we know that a VPatch for force g will contain all the vertex locations for g since force only adds to a Tgraph (when it succeeds). So when constructing the diagram for g we can use the VPatch created for force g instead of starting afresh. Similarly for drawPCompose g the VPatch for g contains locations for all the vertices of compose g so compose g is drawn using the VPatch for g instead of starting afresh.
The location map sharing is done with
subVP :: VPatch -> [TileFace] -> VPatch
so that subVP vp fcs is a VPatch with the same vertex locations as vp, but replacing the faces of vp with fcs. [Of course, this can go wrong if the new faces have vertices not in the domain of the vertex location map so this needs to be used with care. Any errors would only be discovered when a diagram is created.]
For cases where labels are only going to be drawn for certain faces, we need a version of subVP which also gets rid of vertex locations that are not relevant to the faces. For this situation we have
restrictVP :: VPatch -> [TileFace] -> VPatch
which filters out un-needed vertex locations from the vertex location map. Unlike subVP, restrictVP checks for missing vertex locations, so restrictVP vp fcs raises an error if a vertex in fcs is missing from the keys of the vertex location map of vp.
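For example, to show labels only for a chosen sub-collection fcs of the faces of a Tgraph g, while drawing the whole of g unlabelled underneath, something along these lines can be used (my own sketch; g and fcs are placeholders):

labelSubset :: Tgraph -> [TileFace] -> Diagram B
labelSubset g fcs = labelled draw (restrictVP (makeVP g) fcs) <> draw g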
5. Forcing in More Detail
The force rules
The rules used by our force algorithm are local and derived from the fact that there are seven possible vertex types as depicted in figure 8.
Figure 8: Seven vertex types
Our rules are shown in figure 9 (omitting mirror symmetric versions). In each case the TileFace shown yellow needs to be added in the presence of the other TileFaces shown.
Figure 9: Rules for forcing
Main Forcing Operations
To make forcing efficient we convert a Tgraph to a BoundaryState to keep track of boundary information of the Tgraph, and then calculate a ForceState which combines the BoundaryState with a record of awaiting boundary edge updates (an update map). Then each face addition is carried out on a ForceState, converting back when all the face additions are complete. It makes sense to apply force (and related functions) to a Tgraph, a BoundaryState, or a ForceState, so we define a class Forcible with instances Tgraph, BoundaryState, and ForceState.
This allows us to define
force :: Forcible a => a -> a
tryForce :: Forcible a => a -> Try a
The first will raise an error if a stuck tiling is encountered. The second uses a Try result which produces a Left string for failures and a Right a for successful result a.
There are several other operations related to forcing including
stepForce :: Forcible a => Int -> a -> a
tryStepForce :: Forcible a => Int -> a -> Try a
addHalfDart, addHalfKite :: Forcible a => Dedge -> a -> a
tryAddHalfDart, tryAddHalfKite :: Forcible a => Dedge -> a -> Try a
The first two force (up to) a given number of steps (=face additions) and the other four add a half dart/kite on a given boundary edge.
Update Generators
An update generator is used to calculate which boundary edges can have a certain update. There is an update generator for each force rule, but also a combined (all update) generator. The force operations mentioned above all use the default all update generator (defaultAllUGen) but there are more general (with) versions that can be passed an update generator of choice. For example
forceWith :: Forcible a => UpdateGenerator -> a -> a
tryForceWith :: Forcible a => UpdateGenerator -> a -> Try a
In fact we defined
force = forceWith defaultAllUGen
tryForce = tryForceWith defaultAllUGen
We can also define
wholeTiles :: Forcible a => a -> a
wholeTiles = forceWith wholeTileUpdates
where wholeTileUpdates is an update generator that just finds boundary join edges to complete whole tiles.
In addition to defaultAllUGen there is also allUGenerator which does the same thing apart from how failures are reported. The reason for keeping both is that they were constructed differently and so are useful for testing.
In fact UpdateGenerators are functions that take a BoundaryState and a focus (list of boundary directed edges) to produce an update map. Each Update is calculated as either a SafeUpdate (where two of the new face edges are on the existing boundary and no new vertex is needed) or an UnsafeUpdate (where only one edge of the new face is on the boundary and a new vertex needs to be created for a new face).
type UpdateGenerator = BoundaryState -> [Dedge] -> Try UpdateMap
type UpdateMap = Map.Map Dedge Update
data Update = SafeUpdate TileFace
| UnsafeUpdate (Vertex -> TileFace)
Completing (executing) an UnsafeUpdate requires a touching vertex check to ensure that the new vertex does not clash with an existing boundary vertex. Using an existing (touching) vertex would create a crossing boundary so such an update has to be blocked.
Forcible Class Operations
The Forcible class operations are higher order and designed to allow for easy additions of further generic operations. They take care of conversions between Tgraphs, BoundaryStates and ForceStates.
class Forcible a where
  tryFSOpWith :: UpdateGenerator -> (ForceState -> Try ForceState) -> a -> Try a
  tryChangeBoundaryWith :: UpdateGenerator -> (BoundaryState -> Try BoundaryChange) -> a -> Try a
  tryInitFSWith :: UpdateGenerator -> a -> Try ForceState
For example, given an update generator ugen and any f :: ForceState -> Try ForceState, then f can be generalised to work on any Forcible using tryFSOpWith ugen f. This is used to define both tryForceWith and tryStepForceWith.
We also specialize tryFSOpWith to use the default update generator
tryFSOp :: Forcible a => (ForceState -> Try ForceState) -> a -> Try a
tryFSOp = tryFSOpWith defaultAllUGen
Similarly, given an update generator ugen and any f :: BoundaryState -> Try BoundaryChange, then f can be generalised to work on any Forcible using tryChangeBoundaryWith ugen f. This is used to define tryAddHalfDart and tryAddHalfKite.
We also specialize tryChangeBoundaryWith to use the default update generator
tryChangeBoundary :: Forcible a => (BoundaryState -> Try BoundaryChange) -> a -> Try a
tryChangeBoundary = tryChangeBoundaryWith defaultAllUGen
Note that the type BoundaryChange contains a resulting BoundaryState, the single TileFace that has been added, a list of edges removed from the boundary (of the BoundaryState prior to the face addition), and a list of the (3 or 4) boundary edges affected around the change that require checking or re-checking for updates.
The class function tryInitFSWith will use an update generator to create an initial ForceState for any Forcible. If the Forcible is already a ForceState it will do nothing. Otherwise it will calculate updates for the whole boundary. We also have the special case
tryInitFS :: Forcible a => a -> Try ForceState
tryInitFS = tryInitFSWith defaultAllUGen
Efficient chains of forcing operations.
Note that (force . force) does the same as force, but we might want to chain other force related steps in a calculation.
For example, consider the following combination which, after decomposing a Tgraph, forces, then adds a half dart on a given boundary edge (d) and then forces again.
combo :: Dedge -> Tgraph -> Tgraph
combo d = force . addHalfDart d . force . decompose
Since decompose :: Tgraph -> Tgraph, the instances of force and addHalfDart d will have type Tgraph -> Tgraph, so each of these operations will begin and end with conversions between Tgraph and ForceState. We would do better to avoid these wasted intermediate conversions by working only with ForceStates, keeping only the necessary conversions at the beginning and end of the whole sequence.
This can be done using tryFSOp. To see this, let us first re-express the forcing sequence using the Try monad, so
force . addHalfDart d . force
becomes
tryForce <=< tryAddHalfDart d <=< tryForce
Note that (<=<) is the Kleisli arrow which replaces composition for Monads (defined in Control.Monad). (We could also have expressed this right to left sequence with a left to right version tryForce >=> tryAddHalfDart d >=> tryForce). The definition of combo becomes
combo :: Dedge -> Tgraph -> Tgraph
combo d = runTry . (tryForce <=< tryAddHalfDart d <=< tryForce) . decompose
This has no performance improvement, but now we can pass the sequence to tryFSOp to remove the unnecessary conversions between steps.
The sequence actually has type Forcible a => a -> Try a but when passed to tryFSOp it specialises to type ForceState -> Try ForceState. This ensures the sequence works on a ForceState and any conversions are confined to the beginning and end of the sequence, avoiding unnecessary intermediate conversions.
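Concretely, the more efficient version of combo would then be something like (my reading of the approach just described):

combo :: Dedge -> Tgraph -> Tgraph
combo d = runTry . tryFSOp (tryForce <=< tryAddHalfDart d <=< tryForce) . decompose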
A limitation of forcing
To avoid creating touching vertices (or crossing boundaries) a BoundaryState keeps track of locations of boundary vertices. At around 35,000 face additions in a single force operation the calculated positions of boundary vertices can become too inaccurate to prevent touching vertex problems. In such cases it is better to use
recalibratingForce :: Forcible a => a -> a
tryRecalibratingForce :: Forcible a => a -> Try a
These work by recalculating all vertex positions at 20,000 step intervals to get more accurate boundary vertex positions. For example, the sixth decomposition of the kingGraph has 2,906 faces. Applying force to this should result in 53,574 faces but will go wrong before it reaches that. This can be fixed by calculating either
recalibratingForce (decompositions kingGraph !! 6)
or using an extra force before the decompositions
force (decompositions (force kingGraph) !! 6)
In the latter case, the final force only needs to add 17,864 faces to the 35,710 produced by decompositions (force kingGraph) !! 6.
6. Advanced Operations
Guided comparison of Tgraphs
Asking if two Tgraphs are equivalent (the same apart from choice of vertex numbers) is an NP-complete problem. However, we do have an efficient guided way of comparing Tgraphs. In the module Tgraph.Relabelling we have
sameGraph :: (Tgraph, Dedge) -> (Tgraph, Dedge) -> Bool
The expression sameGraph (g1,d1) (g2,d2) asks if g2 can be relabelled to match g1 assuming that the directed edge d2 in g2 is identified with d1 in g1. Hence the comparison is guided by the assumption that d2 corresponds to d1.
It is defined using tryRelabelToMatch, where tryRelabelToMatch (g1,d1) (g2,d2) will either fail with a Left report if a mismatch is found when relabelling g2 to match g1 or will succeed with Right g3 where g3 is a relabelled version of g2. The successful result g3 will match g1 in a maximal tile-connected collection of faces containing the face with edge d1 and have vertices disjoint from those of g1 elsewhere. The comparison tries to grow a suitable relabelling by comparing faces one at a time starting from the face with edge d1 in g1 and the face with edge d2 in g2. (This relies on the fact that Tgraphs are connected with no crossing boundaries, and hence tile-connected.)
There is also a function which tries to find the union of two Tgraphs guided by a directed edge identification. However, there is an extra complexity arising from the fact that Tgraphs might overlap in more than one tile-connected region. After calculating one overlapping region, the full union uses some geometry (calculating vertex locations) to detect further overlaps.
Finally, there is a function which will find common regions of overlapping faces of two Tgraphs guided by a directed edge identification. The resulting common faces will be a sub-collection of faces from the first Tgraph. These are returned as a list as they may not be a connected collection of faces and therefore not necessarily a Tgraph.
Empires and SuperForce
In Empires and SuperForce we discussed forced boundary coverings which were used to implement both a superForce operation
superForce :: Forcible a => a -> a
and operations to calculate empires.
We will not repeat the descriptions here other than to note that
forcedBoundaryECovering :: Tgraph -> [Tgraph]
finds boundary edge coverings after forcing a Tgraph. That is, forcedBoundaryECovering g will first force g, then (if it succeeds) finds a collection of (forced) extensions to force g such that
each extension has the whole boundary of force g as internal edges.
each possible addition to a boundary edge of force g (kite or dart) has been included in the collection.
(possible here means – not leading to a stuck Tgraph when forced.) There is also
forcedBoundaryVCovering :: Tgraph -> [Tgraph]
which does the same except that the extensions have all boundary vertices internal rather than just the boundary edges.
Combinations and Explicitly Forced
We introduced a new type Forced (in v 1.3) to enable a forcible to be explicitly labelled as being forced. For example
forceF :: Forcible a => a -> Forced a
tryForceF :: Forcible a => a -> Try (Forced a)
forgetF :: Forced a -> a
This allows us to restrict certain functions which expect a forced argument by making this explicit.
composeF :: Forced Tgraph -> Forced Tgraph
The definition makes use of theorems established in Graphs, Kites and Darts and Theorems that composing a forced Tgraph does not require a check (for connectedness and no crossing boundaries) and the result is also forced. This can then be used to define efficient combinations such as
compForce :: Tgraph -> Forced Tgraph -- compose after forcing
compForce = composeF . forceF
allCompForce :: Tgraph -> [Forced Tgraph] -- iterated (compose after force) while not emptyTgraph
maxCompForce :: Tgraph -> Forced Tgraph -- last item in allCompForce (or emptyTgraph)
Tracked Tgraphs
The type
data TrackedTgraph = TrackedTgraph
  { tgraph :: Tgraph
  , tracked :: [[TileFace]]
  } deriving Show
has proven useful in experimentation as well as in producing artwork with darts and kites. The idea is to keep a record of sub-collections of faces of a Tgraph when doing both force operations and decompositions. A list of the sub-collections forms the tracked list associated with the Tgraph. We make TrackedTgraph an instance of class Forcible by having force operations only affect the Tgraph and not the tracked list. The significant idea is the implementation of decomposeTracked.
Decomposition of a Tgraph involves introducing a new vertex for each long edge and each kite join. These are then used to construct the decomposed faces. For decomposeTracked we do the same for the Tgraph, but when it comes to the tracked collections, we decompose them re-using the same new vertex numbers calculated for the edges in the Tgraph. This keeps a consistent numbering between the Tgraph and tracked faces, so each item in the tracked list remains a sub-collection of faces in the Tgraph.
The function
drawTrackedTgraph :: [VPatch -> Diagram B] -> TrackedTgraph -> Diagram B
is used to draw a TrackedTgraph. It uses a list of functions to draw VPatches. The first drawing function is applied to a VPatch for any untracked faces. Subsequent functions are applied to VPatches for the tracked list in order. Each diagram is beneath later ones in the list, with the diagram for the untracked faces at the bottom. The VPatches used are all restrictions of a single VPatch for the Tgraph, so will be consistent in vertex locations. When labels are used, there is also a drawTrackedTgraphRotated and drawTrackedTgraphAligned for rotating or aligning the VPatch prior to applying the drawing functions.
Note that the result of calculating empires (see Empires and SuperForce) is represented as a TrackedTgraph. The result is actually the common faces of a forced boundary covering, but a particular element of the covering (the first one) is chosen as the background Tgraph with the common faces as a tracked sub-collection of faces.
Diagrams for Penrose Tiles – the first blog introduced drawing Pieces and Patches (without using Tgraphs) and provided a version of decomposing for Patches (decompPatch).
Graphs, Kites and Darts introduced Tgraphs. This gave more details of implementation and results of early explorations. (The class Forcible was introduced subsequently).
Empires and SuperForce – these new operations were based on observing properties of boundaries of forced Tgraphs.
Have you ever wished you could browse all the Haskell packages
together in your IDE, with full navigation using go-to-definition
and find-references? Here’s a demo of something I hacked together
while at ZuriHac 2025 over the weekend:
In the previous post I talked about
how to index all of Hackage (actually Stackage, strictly speaking,
because it’s not in general possible to build all of Hackage together)
using Glean. Since that post I made some
more progress on the indexer:
The indexer now indexes
types. You can
see type-on-hover working in the demo. The types are similar to what
you see in the Haddock-generated hyperlinked source, except that
here it’s always using the type of the definition and not the type
at the usage site, which might be more specific. That’s a TODO for
later.
Fixed a bunch of things, enriched the index with details about
constructors, fields and class methods, and made indexing more
efficient.
The DB size including types is now about 850MB, and it takes
just under 8 minutes on my 9-year-old laptop to index the nearly
3000 packages in my stackage LTS 21.21 snapshot. (Note: the figures
here were updated on 12-06-2025 when I redid the measurements).
Hooking it up to VS Code
The architecture looks like this:
The LSP server is a modified version of
static-ls, which is
already designed to provide an LSP service based on static
information. I just reimplemented a few of its handlers to make calls
to Glass instead of the existing hie/hiedb implementations. You can
see the changes on my fork of
static-ls. Of
course, these changes are still quite hacky and not suitable for
upstreaming.
Glass
is a “Language-agnostic Symbol Server”. Essentially it provides an API
abstraction over Glean with operations that are useful for code
navigation and search.
Where to next?
There remain a few issues to solve before this can be useful.
Make Glean more easily installable. There’s a general concensus that
cabal install glean would lower the barrier to entry
significantly; in order to do this we need to build the folly
dependency using Cabal.
Clean up and ship the LSP server, somehow. Once Glean is
cabal-installable, we can depend on it from an LSP server package.
Think about continuous integration to build the Glean
DB. Perhaps this can piggyback off the stackage CI infra? If we
can already build a complete stackage snapshot, and Glean is
easily installable, then indexing would be fairly
straightforward. I’d love to hear suggestions on how best to do
this.
And looking forwards a bit further:
Think about how to handle multiple packages versions. There’s no
fundamental problem with indexing multiple package versions, except
that Glass’s SymbolID format currently doesn’t include the package
version but that’s easily fixable. We could for example build
multiple stackage LTS instances and index them all in a single Glean
DB. There would be advantages to doing this, if for instance there
were packages in common between two Stackage instances then the
Glean DB would only contain a single copy. A lot of the type
structure would be shared too.
Provide search functionality in the LSP. Glean can provide
simple textual search for names, and with some work could also
provide Hoogle-like type search.
Think about how to index local projects and local changes. Glean
supports stacked and
incremental DBs, so we
could build a DB for a local project stacked on top of the full
Stackage DB. You would be able to go-to-definition directly from
a file in your project to the packages it depends on in
Stackage. We could re-index new .hie files as they are
generated, rather like how static-ls currently handles changes.
Integrate with HLS? Perhaps Glean could be used to handle
references outside of the current project, switching seamlessly
from GHC-based navigation to Glean-based navigation if you jump
into a non-local package.
More use cases?
I talked with a few people at ZuriHac about potential use cases for
Glean within the Haskell ecosystem. Using it in haskell.org came up
a few times, as a way to power search, navigation and analysis. Also
mentioned was the possibility of using it as a Hoogle
backend. Potentially we could replace the Haddock-generated
hyperlinked sources on haskell.org with a Glean-based browser, which
would allow navigating links between packages and find-references.
Another use case that came up was the possibility of doing impact
analysis for core library changes (or any API changes really). Some of
this is already possible using find-references, but more complex cases
such as finding instances that override certain methods aren’t
possible yet until we extend the indexer to capture richer
information.
If you’re interested in using Glean for something, why not jump on the
Glean discord server and tell us about it!
A few days ago I gave a talk at ZuriHac
2025 entitled Haskell for Competitive
Programming, a basic introduction to competitive programming in
general, and the joy of using Haskell for competitive programming in
particular. This is an expanded version of my talk in blog post form.
(For an even gentler introduction to competitive programming in
Haskell, see this old blog post from
2019.)
Competitive Programming
First of all, what is competitive programming? It’s a broad term,
but when I talk about competitive programming I have something in mind
along the following lines:
There are well-specified input and output formats, usually with a
few examples, and a precise specification of what the output should
be for a given input.
Your job is to write a program which transforms input meeting the
specification into a correct output.
You submit your program, which is tested on a number of inputs and
declared correct if and only if it yields the correct output for all
the tested inputs.
There is often time pressure involved—that is, you have a limited
amount of time in which to write your program. However, it is also
possible to participate “recreationally”, simply for the joy of
problem-solving, without time pressure (in fact, the vast majority
of the competitive programming I do is of this form, though I have
occasionally participated in timed contests).
There are many variations: whether you are allowed to use code
libraries prepared ahead of time, or must type everything from
scratch; outputs can be scored according to some criteria rather
than simply being judged right or wrong; and so on.
There are many sites which allow you to participate in contests and/or
solve competitive programming problems recreationally. My favorite is
Open Kattis; I mention some others at the
end of this post.
Pot: a first example
As an introductory example, let’s look at
Pot. As usual, there’s a silly
story, but what it boils down to is that we will be given a sequence
of numbers, and we should interpret the last digit of each number as an
exponent, then sum the results. For example, if given 125, we
should interpret it as \(12^5\), and so on.
Dealing with I/O via interact
An imperative approach to such a problem would involve doing a
sequence of input commands, some computation, and a sequence of output
commands—possibly interleaved with one another—and we might
immediately think to start using functions like getLine and
putStrLn to do the required I/O in Haskell. However, there is a
much more fruitful functional perspective: we are simply being asked
to implement a particular (partial) function of type String -> String. The fact that the function’s input and output should be
hooked up to the program’s standard input and output is just an
implementation detail. Competitive programming is functional at
heart!
It turns out that Haskell’s standard library already has the perfect
built-in function for this scenario:
interact :: (String -> String) -> IO ()
interact takes a pure String -> String function and turns it into
an IO action which reads from standard input, passes the input to
the given String -> String function, and prints the result to standard output. It even
does this using lazy I/O—that is, the input is
read lazily, as demanded by the function, so that the output and input
can be automatically interleaved depending on which parts of the
output depend on which parts of the input. In particular, this means
that the entire input need not be stored in memory at once. If
the inputs can be processed into outputs in a streaming fashion—as
is the case in the example problem we are currently
considering—then the input and output will be interleaved. In
general, this kind of lazy I/O is
problematic
and even unsafe, but it’s perfect for this scenario.
Solving the problem with a pipeline
So interact does all the IO for us, and all we have to do is write
a pure String -> String function which transforms the input to the
output. In this case, we can split the input into lines, drop the
first line (we don’t need to know how many lines of input there
are—we just get a list of all of them, since interact will read
until EOF), read each number and turn it into its first digits
raised to the power of the last digit, then sum them and show the
result. The full solution is below. Notice how I use the “backwards
composition” operator (>>>), since I find it more convenient to type
from left to right as I’m thinking about transforming from input to
output.
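A minimal sketch of such a solution (the pot helper and its name are my own shorthand):

``` haskell
import Control.Arrow ((>>>))

-- Drop the count line, split each number into (first digits, last digit),
-- raise, sum, and show.
main :: IO ()
main = interact $ lines >>> drop 1 >>> map (read >>> pot) >>> sum >>> show
  where
    pot :: Integer -> Integer
    pot n = let (base, e) = n `divMod` 10 in base ^ e
```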
I use Integer here since raw performance doesn’t matter much for
this easy problem, and Integer avoids any potential problems with
overflow. However, using Int instead of Integer can make a big
difference for some compute-intensive problems. On Kattis, Int will
always be 64 bits, but last time I checked Int can be 32 bits on
Codeforces.
Shopping List: wholemeal programming and ByteString
Let’s consider Shopping List as a second example. In this
problem, we are given a list of shopping lists, where each shopping
list consists of a list of space-separated items on a single line. We
are asked to find the items which are common to all the shopping
lists, and print them in alphabetical order.
Wholemeal programming with standard data structures
This problem is very amenable to a “wholemeal programming”
approach,
where we work entirely at the level of whole data structure
transformations rather than looping over individual elements. We can
turn each shopping list into a set, then find the intersection of all
the sets. Moreover, if we use Data.Set, which uses an ordering on
the elements, we will get the result in alphabetical order “for free”
(“free” as in the amount of code we have to write, not necessarily
runtime cost). Haskell has a decent collection of data structures in
the containers library ((Int)Set, (Int)Map, Seq, Tree, and
even Graph) with a large collection of standard methods to construct
and manipulate them, which are bread and butter for many competitive
programming problems.
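A sketch of this approach using plain String (the input header and exact output format are simplified here to keep the pipeline clear):

``` haskell
import Control.Arrow ((>>>))
import qualified Data.Set as S

-- Each shopping list becomes a Set; intersect them all, and Data.Set hands
-- the common items back to us in sorted order.
main :: IO ()
main = interact $
  lines >>> drop 1 >>> map (words >>> S.fromList)
        >>> foldr1 S.intersection >>> S.toAscList >>> unlines
```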
Unfortunately, when we try submitting this code, we get a Time Limit
Exceeded error! What’s wrong?
The issue is our use of String, which is an actual linked list of
characters and is very slow, especially when we have many short
strings, as in this problem. In the worst case, we could have 100
shopping lists, each with 5000 items of length 10, for a total of up
to 5 MB of input; with that much input data to read, any overhead
associated with reading and parsing the input can make a significant
difference.
Switching to ByteString is much faster. Why not Text, you ask?
Well, Text has to do a bunch of extra work to deal properly with
Unicode encodings, but in 99.99% of all competitive programming problems
I’ve ever seen, the input is guaranteed to be ASCII. So not
only do we not need Text, we can get away with a version of
ByteString that simply assumes every character is a single 8-bit
byte!
Once we import it, all we need to do is replace a bunch of
String operations with corresponding ByteString ones.
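Something along these lines, with the same simplifications as before:

``` haskell
import Control.Arrow ((>>>))
import qualified Data.ByteString.Char8 as BS
import qualified Data.Set as S

-- The same wholemeal pipeline, but over 8-bit-per-character ByteStrings.
main :: IO ()
main = BS.interact $
  BS.lines >>> drop 1 >>> map (BS.words >>> S.fromList)
           >>> foldr1 S.intersection >>> S.toAscList >>> BS.unlines
```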
A Favourable Ending: input parsing and lazy recursive structures
As a last example, let’s look at A Favourable
Ending. This problem
consists of a number of test cases; each test case describes a
choose-your-own-adventure book with a number of sections, where each
section is either an ending (either good or bad), or allows the reader
to choose among three sections to proceed to next. For each test case,
we are asked how many distinct stories there are with good endings.
More abstractly, since we are guaranteed that there are no loops, the
sections of the book form a
DAG, and we
are asked to count the number of distinct paths in a DAG from a
distinguished start node to any of a distinguished set of “good”
leaves.
Parsing with Scanner
Parsing the input for this problem is trickier than the other
examples so far. In theory, we could still ignore the first number
specifying the number of test cases, and just continue reading test
cases until EOF. However, each test case begins with a number
specifying the number of sections in the book, and we cannot ignore
this number: we need to know how many lines to read before the start
of the next test case. Doing this manually involves pattern-matching
on a list of lines, using splitAt to split off the lines for each
test case, and manually passing around the list of the remaining
lines: tedious.
Fortunately, Haskell is great at building abstractions to insulate us
from such tedium. I’ve developed a simple Scanner
abstraction
which works well in this context.
We begin by creating some data types to represent the input in
structured form:
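Concretely, the representation can look something like this (the constructor name for the unfavourable ending is a placeholder; only End, Choice, and Favourably are pinned down by the code below):

``` haskell
type Book = M.Map Int Section

data Section
  = End Ending    -- a section that ends the story
  | Choice [Int]  -- a section offering three sections to proceed to

data Ending = Favourably | Unfavourably
  deriving (Eq, Read, Show)
```

The Scanners for a book and its sections then look like this: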
book :: Scanner Book
book = do
  s <- int
  M.fromList <$> s >< ((,) <$> int <*> section)

section :: Scanner Section
section = do
  t <- peek
  if isDigit (BS.head t)
    then Choice <$> (3 >< int)
    else End . readLower . BS.unpack <$> str

readLower :: Read a => String -> a
readLower = read . onHead toUpper

onHead :: (a -> a) -> [a] -> [a]
onHead _ []       = []
onHead f (x : xs) = f x : xs
(readLower and onHead are functions in my personal competitive
programming template, included here for completeness).
One more piece of boilerplate we can write at this point is the main
function, which simply consists of running the Scanner to read all the
test cases, solving each test case, and formatting the output.
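For instance, a sketch of that main function, assuming runScanner and a numberOf combinator from the Scanner module linked above (the exact names may differ slightly):

``` haskell
import Control.Arrow ((>>>))
import qualified Data.ByteString.Char8 as BS

-- Read all test cases, solve each one, and print one answer per line.
main :: IO ()
main = BS.interact $
  runScanner (numberOf book) >>> map (solve >>> show >>> BS.pack) >>> BS.unlines
```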
With all that framework out of the way, we can turn to actually
solving the problem. And here is where something really fun happens.
In a typical imperative language, we would have to first topologically
sort the book sections, then use dynamic programming to compute the
number of good stories beginning at each section, starting with the
leaves and proceeding backwards through the topological sort to the
start—dozens of lines of code. However, in Haskell we can get all
of this for free, just by defining a lazy, recursive map!
solve :: Book -> Int
solve book = endings ! 1
  where
    endings = M.fromList [(p, endingsFrom (book ! p)) | p <- M.keys book]
    endingsFrom (End d)     = if d == Favourably then 1 else 0
    endingsFrom (Choice ps) = sum $ map (endings !) ps
endings is a Map from each book section to the number of favorable
stories starting with that section. Notice how its values are defined
via the endingsFrom function, which is in turn defined, in the
Choice case, by looking up the values of the choices in the
endings map and summing them. endings is thus defined
recursively, which works because it is lazy in the values. When we
demand the value of endings ! 1, the runtime system starts evaluating
thunks in the map as needed, implicitly doing a topological sort for us.
Here’s another way to think about this: what we really want is the
function endingsFrom : Section -> Int, which tells us how many good
endings there are starting at a given section. It can be defined via a
recurrence; however, if we were to literally implement it as a
recursive function, our program would spend a ridiculous amount of
time recomputing the same values over and over again. So, we insert a
lazy map in the middle to memoize it (there are other data
structures
that can be used for this purpose as well).
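For instance, assuming the sections are numbered 1 through n, a lazy immutable Array works just as well; here is a sketch (solveA is my name for this variant):

``` haskell
import Data.Array (listArray, (!))
import qualified Data.Map as M

-- The same memoization trick as solve, but with a lazy Array instead of a Map.
solveA :: Book -> Int
solveA book = endings ! 1
  where
    n       = M.size book
    endings = listArray (1, n) [endingsFrom (book M.! p) | p <- [1 .. n]]
    endingsFrom (End d)     = if d == Favourably then 1 else 0
    endingsFrom (Choice ps) = sum (map (endings !) ps)
```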
Resources
Here are some resources in case you’re interested in exploring more.
Open Kattis has a collection of thousands
of high-quality problems which can be solved in Haskell (or many
other languages). If you just want to try solving some problems for
fun, it’s a great place to start.
There are also other sites which accept Haskell, such as
Codeforces. Check these out if you want
to actually participate in timed contests.
I’ve written a series of blog posts about competitive
programming in Haskell, on a variety of topics.
I also have a repository of modules I’ve developed
specifically for competitive programming. Many of the modules are
documented in one or more blog posts.
A lot of strong engineers that I know haven't really taken a serious look at AI coding; they've used LLMs to ask questions or write simple scripts and appreciate that it is a useful tool, but haven't actually tried building a nontrivial application entirely from scratch in vibe coding style (here, I use the term in its original meaning: when you do AI coding without carefully reviewing the output). This is understandable: if you're not working on a green field project, there aren't that many opportunities to write code in this style--standard practice for established projects is that someone else needs to review all of the code you write: this is a bad match for vibe coding! So in this post, I want to give a concrete case study of a nontrivial system that was entirely vibe coded (ScubaDuck), to argue the following claims:
AI coding can be done on a manager's schedule: you don't need continuous blocks of coding time and context-switching is considerably less harmful. ScubaDuck was implemented in three days of part time work, where all of the work happened when the baby was napping.
AI coding substantially lowers the cost of doing projects in tech stacks you are less familiar with. ScubaDuck is mostly JavaScript UI code, which is not something I write on a day-to-day basis.
AI coding is an unlock for "sidequests": support software that's ancillary to your main task that is nice to have, but not essential. If you previously decided the cost outweighed the benefit, AI coding lowers the cost, so it's worth redoing that calculation.
Vibe coding works and can produce working software. ScubaDuck is an existence proof that vibe coding is a viable strategy for generating JavaScript UI code (NB: I don't claim vibe coding will work for all domains, nor do I claim this is the only domain for which it works. Hopefully you can also build some intuition for where it is more or less likely to work). You will not one-shot it (ScubaDuck was 150 prompts in the end) but if you are prompting the LLM to also generate tests, you can reliably fix issues without causing regressions to existing code.
Vibe coding is good for situations where buggy software is low impact; be on the lookout for ways to engineer this sort of situation. ScubaDuck is a read-only interface, where the only downside to being buggy is you can't issue the queries you want to issue.
Update: You can see all of my prompts and the resulting agent trajectories at scubaduck-prompts.
What is ScubaDuck?
ScubaDuck is a discount implementation of Meta's internal Scuba realtime database system. You can read more about what exactly this is on GitHub, but it's not so important for the purposes of this post: the key details you need to know about ScubaDuck is that it consists of a Python server that exposes an API to perform queries against a DuckDB database, and an HTML and JavaScript frontend application which implements the forms for building these queries and rendering of the output data. Both the forms and output data rendering have nontrivial JavaScript enhancements: some form inputs are chip inputs and support autocomplete, and the time series view is an SVG chart. All of these components were coded from scratch, so the project has no third-party JavaScript dependencies.
So on the one hand, this project is pretty simple. There are no stringent performance or uptime requirements, it's a pretty standard server-client program that the LLM has seen millions of times before (this is good!) On the other hand, the exact behavior of the frontend UI is quite intricate and would be very difficult to one-shot in a single prompt. Indeed, as I was coding and testing the application, I frequently ran into situations that I didn't anticipate in my original specification, and that I had to ask Codex to refine. Another way to put it is that ScubaDuck is a relatively simple functional specification (although this too was not one shot), but I did a lot of polishing of small behaviors so that the interface behaved in the way that I expected Scuba to behave. Here, it was helpful that I had a very clear idea of what I wanted (since I've used Scuba quite a lot at work).
Going into ScubaDuck, I had a pretty good sense that this project should be a good fit for LLMs. HTML, JavaScript and Python are all extremely high-resource languages, and I'd heard lots of people raving about how good LLMs were at transforming wireframes and mockups into fully functional websites. It is also fully self-contained and straightforward-ish to test (only "ish" because you do have to use something like Playwright to actually test the frontend UI, which honestly is a slog. But fortunately, the LLM can write the tests for you!) One design decision I made, which I didn't originally anticipate but worked out in the end, was the decision to not use any third-party JavaScript libraries. This was by accident: Python has no native way of bundling third-party JavaScript, but I wanted the tool to work offline. I wasn't sure if you could vibe code an SVG charting library from scratch, but apparently you can and it's quite easy!
Agent setup
ScubaDuck was implemented with OpenAI Codex in the cloud (not the CLI tool). Codex's cloud offering requires you to initialize a hermetic environment which the coding agent can execute commands in. It's pretty well known now that AI coding agents work much better if they are able to run the code they write and see if it worked or not, so this is quite an important part of the process. Unfortunately, setting this up took some time-consuming trial and error. I had a fairly detailed initial prompt, and what I would do was submit it to Codex, watch it fail, read over the trajectory (the agent logs) to see what happened (Codex wanted to use npm! Codex couldn't download something from the internet! Codex tried to use a package that wasn't available!) and then fix whatever environment misconfiguration had caused it to fail, or edit AGENTS.md to instruct it not to do some behavior. According to my history, the first day of the project was spent unsuccessfully trying to get the project set up, and my first successful Codex PR only happened on May 19.
At the end of setup, I had the following:
A pyproject.toml with exactly the dependencies I wanted to be used (duckdb, flask and python-dateutil), a lockfile for it (since I was using uv) and my preferred configuration for various tools (pytest, ruff). I'm a big fan of pytest-xdist for vibe coded projects, since you can prompt the LLM to write tests that will work when run in parallel and it does a pretty good job at this. Later I'd also add a pyright configuration, though initially I left it out because I saw Codex doing some strange things on account of duckdb being untyped, and I didn't want to debug it at the time (the fix, by the way, is instructing the LLM to define stubs as necessary in this case.)
An AGENTS.md file with some basic instructions to try to get Codex to stop doing things I saw it doing in the initial trajectories that I didn't want it to do. Nothing fancy, just if you see Codex do something bad, tell it not to do it in AGENTS.md. A good example of this is the "There are no nested AGENTS.md files, this is the only agents file": Codex is post-trained to look for nested AGENTS.md files, but you can save a few tool calls if you tell it there aren't any. (Note: folklore for Claude 3.7 is that instruction following for this sort of rules following was not great. Word on the street is that both Codex and Claude 4 are substantially better at this. Extra note: For uv users, another notable instruction in AGENTS.md is how to activate the venv, since at time of writing I couldn't get Codex to make this happen automatically.)
A setup script for the environment. This took the most debugging, because Codex runs all Internet access through a proxy and sometimes it works imperfectly.
With the environment in place, the development loop was simple:
Interact with the application and find things that are broken
Prompt the LLM to fix them
Repeat
For example, after the very first PR, some very mild poking around immediately revealed the bugs fixed in #2:
There's a race condition in the current test logic for matching against table contents in run_query. Specifically, if there were previously valid results in lastResults, and for some reason Dive doesn't do anything, then we will still see the old results. The testing framework should explicitly clear lastResults before attempting an interaction.
Filter functionality does not work. We will first add a failing test, and then fix it. The failing test should click "Add Filter", then select "user" as the field, and then add an "alice" chip (by typing alice in the text box and pressing ENTER). Then when we dive, we should see two alice rows. Right now, NO request is issued at all when we click Dive. Diagnose and then fix the problem.
Prompt the agent to write tests. It's very helpful to prompt the agent to generate tests for whatever bugs it's fixing. For frontend code, I decided to use Playwright to write these tests. An example in #11:
This is kind of a very manual and somewhat brittle test. I doubt I would have had the patience to manually write and debug this. But the LLM will happily do it. There is also a fear that the test might not actually be testing anything for real. But because these tests are all generated from bug reports, you can prompt the model to write the test first, check that it's failing, and then fix the problem. I didn't instruct this in the above prompt, but I did instruct it for #155, for example. The trajectory shows that the LLM demonstrates it can repro the bug before fixing the issue:
I’ll add a case-insensitive test for "samples" to check server behavior with lowercase input.
I’ve found the bug happens when "Samples" is lowercase. I’ll adjust the query to handle case insensitivity and add a test for it. Time to modify server.py to fix the issue!
Beware BC-breaking changes. The LLM generated tests worked well when I didn't need to change behavior in a BC-breaking way. Example prompt:
Change time column to default to none unconditionally. (I want to default it to a time column if we find a good candidate but this threshold needs tuning and I don't have the tuning right now.)
This hit a Codex timeout:
This attempt to update the default time column led to extensive frontend test failures that require deeper changes. The environment made it difficult to fully diagnose the Playwright test behaviors within the time available, so the work could not be completed.
In standard software engineering practice, when this happens, decouple the BC compatible and BC breaking changes!
Make it so that Time Column can be set to (none). When it is set this way, the Start/End fields are hidden and we don't apply a filter on time range. (#115)
and then later instead of defaulting the time column to none, I added a heuristic to pick a column that looked like time, which picked the same column that all of the existing tests had also expected to be called with.
Refactors have to be split up. Codex's timeout means that you can't ask it to do too much in one go. Here's a prompt that timed out:
scubaduck/index.html has gotten a bit long. Let's split out some of the JS code into dedicated JS files for their functionality. Also setup the necessary Flask scaffolding to serve these JS files. I think splitting out these specific components would be good:
Dropdown implementation
Sidebar resizing
JS controlling the View Settings (e.g., updateDisplayTypeUI, as well as one off interactions on form elements, columns handling, filter handling, the actual Dive implementation (including query updating), reading in defaults from query string)
Table rendering (e.g., formatNumber, sorting)
Chip input implementation
Chart rendering (showTimeSeries)
Make changes to AGENTS.md or README.md describing the structure so you can quickly find where the components you need are
I eventually did manage the refactor by prompting Codex to individually move out the pieces I wanted to extract one-by-one. This is a place where I think Claude Code probably would have performed better.
Parallelizing tasks. As you can see from the lengths of my prompts, it does take a while to write a good prompt; you're basically writing a bug report with enough detail that the LLM can repro it and then fix it. So sometimes I would be bottlenecked on prompt writing. However, sometimes the prompts were quite short. In those cases, Codex encourages you to submit more tasks that can run in parallel. I found this worked well, and I'd sometimes have as many as five instances going (once again, rate limited by discovering problems, making designs and typing prompts!) One irritation is when the tasks end up conflicting with each other. Sometimes the conflicts are easy to fix, but if it feels nontrivial, it's often better to just ask Codex to redo one of the PRs on latest main after the other has landed. To avoid merge conflicts, it helps to have only one "main feature" agent going at any time, and then ask the agent to do random bugfixes in parallel with it. Once you have no more tasks to get running, you can go do something else while you wait for the agents to finish (manager schedule!)
Prompting
As a reminder, I've posted all of my prompts (including the ones that failed) at scubaduck-prompts, and I think it's helpful to skim through them to get a flavor of what I was asking the LLM. But to summarize, what did I spend most of my time on prompting Codex to do? My general vibe (ahem) is that I spent most of my time doing minor enhancements, where I instructed Codex to make some part of the program work slightly differently, in a way that was previously unspecified from the previous prompt. The metaphor I had in my head while I was working on the project was like that of a sculptor chiseling away marble: in the beginning, anything is possible, but as I kept prompting, I continuously narrowed down the space of possible programs I had until I had exactly the one I wanted. One big thing I want to note is that Codex rarely needed to make updates to my tests; for the most part, tests that were added never got taken away, because I never "changed my mind". I suspect that the vibe coding process would have been rockier if I was having to change behavior frequently.
One of the things that surprised me the most about the process was how easy it was to implement a line chart in SVG with Codex. My first prompt resulted in a chart that looked broken on the test data:
We're going to add a new View type, to go along with Samples and Table: Time Series. Time Series supports all the fields that Table supports, and a few more:
X-axis: Main group by dimension, e.g., the x-axis on time series view. This is our custom dropdown selector, but only time columns are populated here. It should prefer a default setting from the following list, most preferred first: "time", "timestamp"
Granularity: Choose the time interval between data points on the chart.
For example, a granularity of 1 hour means there will be a data point every 60 minutes that is aggregated with the chosen Aggregate function over the data for the granularity period before point. This is a plain drop down. The valid values are: Auto, Fine, 1 second, 5 seconds, 10 seconds, 30 seconds, 1 minute, 4 minutes, 5 minutes, 10 minutes, 15 minutes, 30 minutes, 1 hour, 3 hours, 6 hours, 1 day, 1 week, 30 days. The semantics of the Auto setting is that it sets the interval to whatever would result in maximum 100 buckets (if there are not enough data points for that many buckets, it just picks the finest time interval that makes sense), and Fine which sets the interval to 500 buckets.
Fill Missing Buckets: This is a dropdown. For now, it has the settings "Fill with 0 (Per Series)" (default), "Connect (Per Series)" and "Leave blank".
Additionally, the default setting of Limit is 7, as it controls how many elements from group by will be plotted (the actual number of lines plotted could be a multiple of this, as we will plot every selected Column).
Unlike Samples and Table, we will instead display a line chart in the right panel. To plot the line chart, we will implement it by hand with JS and SVG, similar to how highcharts implements it. We will not use any third party dependencies. Lines will be plotted as paths, no smoothing, no dots for individual data points. Each series (as generated by group by) should be plotted with a different color, assigned using a best practices color palette for graph design. There should be a rendering of x-axis and y-axis; the x-axis should have slanted labels to aid readability. When we mouse over the chart, a vertical line should snap to the center of the time bucket that we are closest to. We should also display a crosshair on all of the series showing us their values at that data point, and highlight the closest point we are on, and increase the thickness of the series that point is on. To the left of the graph (still in the right panel), there should be a legend. The legend looks like this:
[GROUP BY VALUE] [AGGREGATE]
[First Column name, with series color]
[Number of samples for the first column]
[Second Column name, with series color]
[Number of samples for the second column]
... for all columns
----
... for all group by values (up to the limit)
So for example, if I group by user, I might see:
Alice AVG
value
4 (samples)
The highlighted series (which has a thicker line) should also be highlighted in the legend.
This was kind of terrifying, because I initially thought I didn't have a good way to test the SVG outputs. But after doing some regular old-fashioned debugging and reading the code (yes, this part not vibe coded), I figured out the problem, and also realized that Playwright can test that an SVG path is not just entirely straight. After the initial bugs were fixed, I mostly had to add missing features like x-axis/y-axis and interactivity features (amusingly, Codex ignored most of the instructions in the latter half of the prompt, giving only the barest bones legend. I suspect this was because I had some files which were too long). My general take after this was that JS chart libraries are going to become obsolete: it's much easier to vibe code a bespoke implementation and then customize the heck out of it.
Conclusion
ScubaDuck was implemented in about 150 Codex prompts. As you can see from the sample prompts above, the prompts are recognizably programming, they just happen to be in plain English language. This is a big help, because I never had to keep track of the nest of callbacks and state machines for implementing complex UI elements in JavaScript. I had to be fluent in what I wanted my program to do, and a good QA tester for the application to discover new problems that needed to be fixed, but I did not have to worry at all about the vagaries of SVG DOM elements or pixel position computation minutiae. It's hard to say how long it would have taken to code this by hand, but I think reproducing a UI that's been in production for years at Meta in three (part-time) days is pretty good!
Despite having done a bit of AI coding before, I also learned a bit from working on Codex. Codex made it blindingly clear that the parallel modality (and subsequent conflict resolution) is important. It made me adjust up my estimation of the capability of LLMs to write raw HTML/JS and evoked a future where people vibe code components in place of taking on a third party dependency. I was very appreciative of no rate limit Codex (though I doubt it's going to last.) It also reminded me how difficult it will be to setup agent environments for "real" projects (like PyTorch).
Hopefully, this case study has given you some ideas for things to try. Go forth and vibe code, responsibly!
I don't know about you, but testing isn't my favourite part of
software development.
It's usually the last thing standing between me and shipping a shiny
new feature, and writing tests is often an annoying process with a lot
of boilerplate and fighting against your system to get your app into a
good starting state for the test, or mocking out whichever services your
app depends on.
Much ink has been spilled about how to organize your code in order to
make this easier, but the fact that so many blog posts and frameworks
exist for this express purpose suggests to me that we as a community of
software developers haven't quite solved this issue yet.
Keep reading to see how I've solved this problem for myself by simply
avoiding unit testing altogether.
An alternative testing
method
When I first started at Unison Computing I was submitting my first
feature when I learned there were precious few unit tests. I found it
rather surprising for the codebase of a programming language's compiler!
How do you prevent regressions without unit tests?
The answer is what the Unison team has dubbed transcript
tests. These are a variation on the concept of golden-file
tests.
A Unison transcript is a markdown file which explains in
prose what behaviour it is going to test, then intersperses
code-blocks which outline the steps involved in testing that feature
using a mix of Unison code and UCM commands (UCM is Unison's CLI tool).
After that comes the magic trick; UCM itself can understand and run
these transcript files directly and record the results of each
block.
When running a transcript file with the ucm transcript
command, UCM produces a deterministic output file containing the result
of processing each code block. Unless the behaviour of UCM has changed
since the last time it was run, the resulting file will always be the
same.
Each block in the markdown file is either a command, which is sent to
the UCM shell tool, or it represents an update to a file on the
(virtual) file-system, in which case it will be typechecked against the
state of the codebase.
Here's a quick example of a transcript for testing UCM's view command
so you can get a feel for it.
# Testing the `view` command
First, let's write a simple definition to view:
``` unison
isZero = cases
0 -> true
_ -> false
```
Now we add the definition to the codebase, and view it.
``` ucm
scratch/main> update
scratch/main> view isZero
```
We run this transcript file with
ucm transcript my-transcript.md which produces the
my-transcript.output.md file.
Notice how the compiler output is added inline. Ignore the hashed names;
they appear because I'm skipping the step which adds names for Unison's
builtins.
# Testing the `view` command
First, let's write a simple definition to view:
``` unison
isZero = cases
0 -> true
_ -> false
```
``` ucm :added-by-ucm
Loading changes detected in scratch.u.
I found and typechecked these definitions in scratch.u. If you
do an `add` or `update`, here's how your codebase would
change:
⍟ These new definitions are ok to `add`:
isZero : ##Nat -> ##Boolean
```
Now we add the definition to the codebase, and view it.
``` ucm
scratch/main> update
Done.
scratch/main> view isZero
isZero : ##Nat -> ##Boolean
isZero = cases
0 -> true
_ -> false
```
Feel free to browse through the collection
of transcripts we test in CI to keep UCM working as expected.
Testing in CI
Running transcript tests in CI is pretty trivial; we discover all
markdown files within our transcript directory and run them all. After
the outputs have been written we can use
git diff --exit-code which will then fail with a non-zero
code if any of the outputs have changed from what was committed.
Conveniently, git will also report exactly what changed, and
what the old output was.
This failure method allows the developer to know exactly which file
has unexpected behaviour so they can easily re-run that file or recreate
the state in their own codebase if they desire.
Transcript tests in other
domains
I liked the transcript tests in UCM so much that when I was tasked
with building out the Unison Share webapp I decided to use
transcript-style testing for that too. Fast forward a few years and
Unison Share is now a fully-featured package repository and code
collaboration platform running in production without a
single unit test.
If you're interested in how I've adapted transcript tests to work
well for a webapp, I'll leave a few notes at the end of the post.
Benefits of transcript tests
Here's a shortlist of benefits I've found working with transcript
tests over alternatives like unit tests.
You write a transcript using the same syntax you'd use to
interact with UCM itself.
This allows all your users to codify any buggy behaviour they've
encountered into a deterministic transcript. Knowing exactly how to
reproduce the behaviour your users are seeing is a huge boon, and having
a single standardized format for accepting bug reports helps reduce a
lot of the mental work that usually goes into reproducing bug reports
from a variety of sources. This also means that the bug report itself
can go directly into the test suite if we so desire.
All tests are written against the tool's external
interface.
The tests use the same interface that the users of your software will
employ, which means that internal refactors won't ever break
tests unless there's a change in behaviour that's externally
observable.
This has been a huge benefit for me personally. I'd often find myself
hesitant to re-work code because I knew that at the end I'd be rewriting
thousands of lines of tests. If you always have to rewrite your tests at
the same time you've rewritten your code, how do you have any confidence
that the tests still work as intended?
Updating tests is trivial
In the common case where transcripts are mismatched because some help
message was altered, or perhaps the behaviour has changed but the change
is intended, you don't need to rewrite any complex assertions, or mock
out any new dependencies. You can simply look at the new output, and if
it's reasonable you commit the changed transcript output files.
It can't be overstated how convenient this is when making sweeping
changes; e.g. making changes to Unison's pretty printer. We don't need
to manually update test-cases, we just run the transcripts locally and
commit the output if it all looks good!
Transcript changes appear in PR reviews
Since all transcript outputs are committed, any change in behaviour
will show up in the PR diff in an easy-to-read form. This allows
reviewers to trivially see the old and new behaviour for each relevant
feature.
Transcript tests are documentation
Each transcript shows how a feature is intended to be used by
end-users.
Transcripts as a collaboration tool
When I'm implementing new features in Unison Share I need to
communicate the shape of a JSON API with our Frontend designer Simon.
Typically I'll just write a transcript test which exercises all possible
variants of the new feature, then I can just point at the transcript
output as the interface for those APIs.
It's beneficial for both of us since I don't need to keep an example
up-to-date for him, and he knows that the output is actually accurate
since it's generated from an execution of the service itself.
Transcript testing for
Webapps
I've adapted transcript testing a bit for the Unison Share webapp. I
run the standard Share executable locally with its dependencies mocked
out via docker-compose. I've got a SQL file which resets the database
with a known set of test fixtures, then use a zsh script to reset my
application state in between running each transcript.
Each transcript file is just a zsh script that interacts with the
running server using a few bash functions which wrap curl commands, but
save the output to json files, which serve as the transcript output.
I've also got helpers for capturing specific fields from an API call
into local variables which I can then interpolate into future queries;
this is handy if you need to, for example, create a project then switch
it from private to public, then fetch that project via API.
Here's a small snippet from one of my transcripts for testing Unison
Share's project APIs:
#!/usr/bin/env zsh

# Fail the transcript if any command fails
set -e

# Load utility functions and variables for user credentials
source "../../transcript_helpers.sh"

# Run a UCM transcript to upload some code to load in projects.
transcript_ucm transcript prelude.md

# I should be able to see the fixture project as an unauthenticated user.
fetch "$unauthenticated_user" GET project-get-simple '/users/test/projects/publictestproject'

# I should be able to create a new project as an authenticated user.
fetch "$transcripts_user" POST project-create '/users/transcripts/projects/containers' '{
    "summary": "This is my project",
    "visibility": "private",
    "tags": []
}'

fetch "$transcripts_user" GET project-list '/users/transcripts/projects'
You can see the output files generated by the full transcript in
this directory.
Requirements of
a good transcript testing tool
After working with two different transcript testing tools across two
different apps I've got a few criteria for what makes a good transcript
testing tool, if you're thinking of adding transcript tests to your app
consider the following:
Transcripts should be deterministic
This is critical. Transcripts are only useful if they produce the
same result on every run, on every operating system, at every time of
day.
You may need to make a few changes in your app to adapt or remove
randomness, at least when in the context of a transcript test.
In Share there were a lot of timestamps, random IDs, and JWTs (which
contain a timestamp). The actual values of these weren't important for
the tests themselves, so I solved the issue by piping the curl output
through a sed script before writing to disk. The script
matches timestamps, UUIDs, and JWTs and replaces them with placeholders
like <TIMESTAMP>, <UUID>, and
<JWT> accordingly.
A special mode in your app for transcript testing which avoids
randomness can be useful, but use custom modes sparingly lest your app's
behaviour differ too much during transcripts and you can't test the real
thing.
I also make sure that the data returned by APIs is always sorted by
something other than randomized IDs; it's a small price to pay, and it
reduces randomness and heisenbugs in the app as a helpful byproduct.
Transcripts should be isolated
Each individual transcript should be run in its own pristine
environment. Databases should be reset to known state, if the
file-system is used, it should be cleared or even better, a virtual
file-system should be used.
Transcripts should be self-contained
Everything that pertains to a given test-case's state or
configuration should be evident from within the transcript file itself.
I've found that changes in behaviour from the file's location or name
can just end up being confusing.
Difficulties working with
Transcripts
Transcripts often require custom tooling
In UCM's case the transcript tooling has evolved slowly over many
years; it has its own parser, and you can even test UCM's API server by
using special code blocks for that.
Share has a variety of zsh utility scripts which provide
helpers for fetching endpoints using curl, and filtering output to
capture data for future calls. It also has a few tools for making
database calls and assertions.
Don't shy away from investing a bit of time into making transcript
testing sustainable and pleasant; it will pay dividends down the
road.
Intensive Setup
As opposed to unit tests, which are generally pretty lightweight,
transcript tests are full integration tests, and require setting up
data, and sometimes executing entire flows so that we can get the system
into a good state for testing each feature.
You can mitigate the setup time by testing multiple features with
each transcript.
I haven't personally found transcript tests to take too much time in
CI, largely because I think transcript testing tends to produce fewer
tests, but of higher value than unit testing. I've seen many unit test
suites bogged down by particular unit tests which generate hundreds of
test cases that aren't actually providing real value. Also, any
setup/teardown is going to be more costly on thousands of unit-tests as
compared to dozens or hundreds of transcript tests.
Service Mocking
Since transcript tests run against the system-under-test's external
interface, you won't have traditional mocking/stubbing frameworks
available to you. Instead, you'll mock out the system's dependencies by
specifying custom services using environment variables, or wiring things
up in docker-compose.
Most systems have a setup for local development anyways, so
integrating transcript tests against it has the added benefit that
they'll ensure your local development setup is tested in CI, is
consistent for all members of your team, and continues to work as
expected.
In Summary
Hopefully this post has helped you to consider your relationship with
unit tests and perhaps think about whether other testing techniques may
work better for your app.
Transcript tests surely aren't ideal for all
possible apps or teams, but my last few years at Unison have proven to
me that tests can be more helpful, efficient, and readable than I'd
previously thought possible.
Let me know how it works out for you!
Hopefully you learned something! Did you know I'm currently writing a book? It's all about Lenses and Optics! It takes you all the way from beginner to optics-wizard and it's currently in early access! Consider supporting it, and more posts like this one, by pledging on my Patreon page! It takes quite a bit of work to put
these things together; if I managed to teach you something or even just entertain you for a minute or two,
maybe send a few bucks my way for a coffee? Cheers!
Andy Gordon from Cogna is interviewed by Sam and Matti. We learn about Andy’s influential work, including the origins of the bind symbol in Haskell and the introduction of lambdas in Excel. We go on to discuss his current work at Cogna on using AI to allow non-programmers to write apps using natural language. We delve deeper into the ethics of AI and consider the most likely AI apocalypse.
As software engineers at Modus Create,
we are always on the lookout for tools that can enhance our productivity and code quality.
The advent of AI-powered coding assistants such as GitHub Copilot
has sparked excitement in the development community.
Copilot code completions propose snippets at the current cursor that the user can quickly insert,
while Copilot Chat allows users to discuss their code with an AI.
These tools promise to revolutionize software development,
allowing engineers to focus on higher-level tasks while delegating implementation details to machines.
However, their adoption also raises questions:
Do they genuinely improve developer productivity?
How do they affect code quality and maintainability?
Which users and tasks benefit the most from these AI-driven coding assistants?
This blog post explores the challenges of measuring the impact of AI tools in
our software engineering practices, with a focus on GitHub Copilot. Note that
the data discussed in the post was collected in Q2 2024. We expect that GitHub
Copilot has improved since then; we have also not yet had the opportunity to
quantitatively investigate newer interfaces to AI development, like
Cursor or Windsurf.
“Developer Productivity”
At Modus Create, we’re passionate about improving the experience of developers,
both for our own teams and those at clients.
We have been working for years on tools that we think improve developer productivity,
for instance with
Nix,
Bazel,
Python,
and many more.
But measuring developer productivity is a notoriously difficult task.
At the heart of this question lies the nature of software development itself.
Is it a productive activity that can fit scientific management,
be objectively measured, and be optimized?
Part of the research on developer productivity goes down this path, trying to measure things like the time it takes to complete standardized tasks.
Another trend suggests that developers themselves can be their own assessors of productivity, where
frameworks like
SPACE are used to guide self-assessment.
Each of these angles has strengths and weaknesses. To get as broad a picture as possible, we tried to use
a bit of both. We found, though, that data collection issues made our task timings unusable (more on this below). Therefore,
all our conclusions are drawn from self-assessments.
Our in-house experiment
To gain a deeper understanding of the impact of GitHub Copilot at Modus Create,
we designed and conducted an in-house experiment.
We managed to recruit 22 participants in total, ranging from Junior to Principal software engineers.
They had a wide range of programming experience.
The experiment consisted of four coding tasks that participants
needed to complete using Python within an existing
codebase. The tasks were designed
to evaluate different aspects of software development:
Data ingestion: Loading and parsing data from a file into a Pandas DataFrame
Data analysis: Performing statistical computations and aggregations using Pandas’ groupby operations
Test development: Writing tests using Python’s unittest framework
Data visualization: Creating interactive plots using the Streamlit library
Participants had varied levels of experience with the required tools. Most participants
had at least a little bit of Python experience, but Pandas experience was less common
and hardly anyone had used Streamlit before.1
Upon completion of the assigned tasks, all participants completed a comprehensive survey to provide
detailed feedback on their experience.
The survey consisted of approximately 50 questions designed to assess multiple dimensions of the
development process, including:
Assessment of participant expertise levels regarding task requirements, AI tooling and GitHub Copilot proficiency
Evaluation of task-specific perceived productivity
Analysis of the impact on learning and knowledge acquisition
Insights into potential future GitHub Copilot adoption
Perceived productivity gains
We asked participants the following questions.
Question: If you didn't have Copilot, reaching the answer for task X would have taken...
Choices: Less time / About the same time / More time
This question was core to our study,
as it allowed us to directly measure the perceived productivity gain of using Copilot versus not using it.
The result was clear:
almost every Copilot user felt more productive using Copilot on every task.
We also broke out the same data by Python experience level, and found that more experienced Python
users found less productivity gain than less experienced users. In this plot, we grouped the “no Python
experience” and “beginner” users into the “less experienced” group, with the rest of the users in the “more experienced” group.
To better understand how participants tackled these tasks, we collected information by asking for each task:
Question: Which of the following have you used to complete task X?
Choices: Copilot code completions / Copilot Chat / Google search / Library documentation / My knowledge
We were also interested in comparing these usages across profiles of developers,
so we asked this question as well:
Question: How would you describe your Python level?
Choices: No Python experience / Beginner / Intermediate / Advanced
We could then visualize how participants who felt more productive with Copilot solved each problem,
and see if there were variations depending on their profile. Since each participant could choose multiple
options, sometimes there are more responses than participants.
Apparently, people don’t like library documentation. Also, we thought it was
strange that the most experienced Python users never reported using their own
knowledge. It would be interesting to dig more into this, but we don’t have
the data available. One theory is that when reviewing AI suggestions everyone
relied on their own Python knowledge, but experienced users took that knowledge
for granted and so didn’t report using it.
Among people who felt more productive on tasks “Write unit tests” and “Plot with Streamlit”,
we really see more usage of Copilot Chat than other sources.
Our hypothesis is that these tasks typically require making more global changes to the code or adding code in places that are not obvious at first.
In these scenarios, Copilot Chat is more useful because it will tell you where and what code to add.
In other tasks, it was clearer where to add code, so participants could likely place their cursor and prompt Copilot for a suggestion.
This is supported by the questions we asked:
Question: Which of the following do you think is true?
Choices: Copilot is better with acceleration: it helps accelerate work that I already know how to do. / Copilot is better with exploration: it helps me explore the problem and how to solve it when I am not sure how to proceed.
This question uses checkboxes, so respondents were not restricted to a single answer.
On average, participants thought Copilot was suited for both acceleration and exploration, but with some notable
differences depending on experience level: experienced Pythonistas strongly favored Copilot for acceleration,
while less experienced users thought it was better for exploration.
We also found that the participants’ perspective on acceleration versus exploration seems related to the usage of Copilot Chat.
The most interesting part of this chart is that
participants who think Copilot is good for exploration or bad for acceleration relied most heavily on Copilot Chat.
This suggests that users find the autocomplete features more useful for acceleration, while the chat features —
which allow general questions, divorced from a specific code location — are useful for exploration.
But it is interesting to note how
usage of Copilot Chat versus autocomplete is correlated with how users perceive Copilot as a whole.
For more on acceleration versus exploration with Copilot, this OOPSLA23 talk, which inspired us to ask this question,
is worth watching.
Copilot will make code flow
The SPACE framework mentions “flow”
as an important aspect of productivity.
Some research associates productivity with the ability to get complex tasks done with minimal distractions or interruptions.
This conceptualization of productivity is echoed by many developers when they talk about “getting into the flow” […].
This concept of flow is really interesting,
because it is a way to measure productivity that is not based on outputs,
but rather on the experience of the developers themselves. And although
“flow” might be subjective and perceptual,
studies have linked it to higher productivity and reduced stress; see
this open-access book chapter for a readable overview of the research.
To get an idea of Copilot’s impact on flow, we asked the following questions:
Question: Did Copilot decrease your need to switch out of your IDE (for example to search for answers or check the documentation)?
Choices: Significantly / A bit / No
Question: Did Copilot enhance your capacity to stay in your development flow?
Choices: Significantly / A bit / No
The results were unambiguous: most users found that Copilot helped significantly, and a strong majority
found that it helped at least a little.
Learnings from organizing the experiment
Although the experiment went well overall, we noted a few challenges worth sharing.
First, ensuring active participation in the experiment required a collective effort within the company.
Spreading the word internally about the experiment and looking for participants is an effort not to be underestimated.
In our case, we benefited from great support from internal leaders and managers who helped communicate with and recruit participants.
Even so, we would have liked to have more participants. It turns out that engineers are sometimes just too busy!
Second, keeping participants focused on the experiment was harder than expected.
We had asked participants to make a git commit at the end of each task,
thinking that we could use this data to quantify the time it took for each participant to complete their tasks.
When looking at the data, we were surprised to see that the time between commits varied widely and was often much longer than expected.
When asked, several participants reported that they had to interrupt our experiment to deal with higher-priority tasks.
In the end, we discarded the timing data: they were too limited and too heavily influenced by external factors to provide useful conclusions.
For the same reason, we haven’t even mentioned yet that our study had a control group: since the timing data
wasn’t useful, we’ve omitted the control group entirely from the data presented here.
The ideal scenario of securing dedicated, uninterrupted time from a large pool
of engineers proved impractical within our organizational context. Nevertheless,
despite these limitations, we successfully gathered a meaningful dataset
that contributes valuable perspectives to the existing body of research on
AI-assisted development.
Further references
Speaking of other work out there, there’s a lot of it! It turns out that many people are excited by the potential
of code assistants and want to understand them better. Who knew? Here is some further reading that we found
particularly interesting:
Experiments at Microsoft and Accenture introduced
Copilot into engineers’ day-to-day workflow and measured the impact on various productivity metrics,
like the number of opened pull requests; they found that Copilot usage significantly increased the
number of successful builds. They had a much larger sample size than we did — Microsoft
and Accenture have a lot of engineers — but unlike us they didn’t specifically consider
the uptake of unfamiliar tools and libraries.
A research team from Microsoft and MIT recruited developers from Upwork, gave them a task,
and measured the time it took with and without Copilot’s help; they
found that Copilot users were about 50% faster. They
did a better job than we did at measuring completion time (they used GitHub Classroom), but we think
our exit survey asked more interesting questions.
The Pragmatic Engineer ran a survey
about how engineers are using AI tooling, covering popular tools and their perceived impact
on development.
Conclusion
Our experiment provided valuable insights into the impact of GitHub Copilot on
developer experiences at Modus Create. Overall, developers reported increased
productivity and a more seamless workflow. Participants used Copilot extensively
in specific coding scenarios, such as automated testing and modifying code that
used libraries they were unfamiliar with, and they felt more productive in those cases.
It was particularly interesting to see how the interface to the AI assistant
(chat vs. completion) affected participants’ opinions on what the assistant was useful
for, with chat-heavy users prioritizing exploration over acceleration and completion-heavy
users the other way around. As interfaces and tooling continue to evolve — faster than we can design
and run experiments to test them — we expect them to play a huge
role in the success of AI-powered code assistants.
We made a small mistake with the wording of the Pandas and Streamlit questions:
we gave them the options “I have never used it”, “I have heard of it”, “I
have used it before in a limited way”, “I am comfortable with it”, and “I am
an advanced user”. The problem, of course, is that these responses
aren’t mutually exclusive. Given the order the responses were presented in, we
think it’s reasonable to interpret “I have never used it” responses to mean that they’d
heard of it but never used it. For the plot, we’ve combined “I have never used it”
and “I have heard of it” into “Never used it”.
How to turn polling insight into an optimal ballot — and why anything else is wasted.
“approve of”? What does that mean anyway?
I have written previously about how approval and range voting methods are intrinsically tactical. This doesn’t mean that they are more tactical than other election systems (nearly all of which are shown to sometimes be tactical by Gibbard’s Theorem when there are three or more options). Rather, it means that tactical voting is unavoidable. Voting in such a system requires answering the question of where to set your approval threshold or how to map your preferences to a ranged voting scale. These questions don’t have more or less “honest” answers. They are always tactical choices.
But I haven’t dug deeper into what these tactics look like. Here, I’ll do the mathematical analysis to show what effective voting looks like in these systems, and make some surprising observations along the way.
Mathematical formalism for approval voting
We’ll start by assuming an approval election, so the question is where to put your threshold. At what level of approval do you switch from voting not to approve a candidate to approving them?
We’ll keep the notation minimal:
As is standard in probability, I’ll write ℙ[X] for the probability of an event X, and 𝔼[X] for the expected value of a (numerical) random variable X.
I will use B to refer to a random collection (multiset) of ballots, drawn from some probability distribution reflecting what we know from polling and other information sources on other voters. B will usually not include the approval vote that you’re considering casting, and to include that approval, we’ll write B ∪ {c}, where c is the candidate you contemplate approving.
I’ll write W(·) to indicate the winner of an election with a given set of ballots. This is the candidate with the most approvals. We’ll assume some tiebreaker is in place that’s independent of individual voting decisions; for instance, candidates could be shuffled into a random order before votes are cast, and in the event of a tie for the number of approvals, we’ll pick the candidate who comes first in that shuffled order.
U(·) will be your utility function, so U(c) is the utility (i.e., happiness, satisfaction, or perceived social welfare) that you personally will get from candidate c winning the election. This doesn’t mean you have to be selfish, per se, as accomplishing some altruistic goal is still a form of utility, but we evaluate that utility from your point of view even though other voters may disagree.
With this notation established, we can clearly state, almost tautologically, when you should approve of a candidate c. You should approve of c whenever:
𝔼[U(W(B ∪ {c}))] > 𝔼[U(W(B))]
That’s just saying you should approve of c if your expected utility from the election with your approval of c is more than your utility without it.
The role of pivotal votes and exact strategy
This inequality can be made more useful by isolating the circumstances in which your vote makes a difference in the outcome. That is, W(B ∪ {c}) ≠ W(B). Non-pivotal votes contribute zero to the net expectation, and can be ignored.
In approval voting, approving a candidate can only change the outcome by making that candidate the winner. This means a pivotal vote is equivalent to both of:
W(B ∪ {c}) = c
W(B) ≠ c
It’s useful to have notation for this, so we’ll define V(B, c) to mean that W(B ∪ {c}) ≠ W(B), or equivalently, that W(B ∪ {c}) = c and W(B) ≠ c. To remember this notation, recall that V is the pivotal letter in the word “pivot”, and also visually resembles a pivot.
With this in mind, the expected gain in utility from approving c is:
𝔼[U(W(B ∪ {c}))] - 𝔼[U(W(B))]. But since the utility gain is zero except for pivotal votes, this is the same as
ℙ[V(B,c)] · (𝔼[U(W(B ∪ {c})) | V(B,c)] - 𝔼[U(W(B)) | V(B,c)]). And since V(B,c) implies that W(B ∪ {c}) = c, this simplifies to
ℙ[V(B,c)] · (U(c) - 𝔼[U(W(B)) | V(B, c)])
Therefore, you ought to approve of a candidate c whenever
U(c) > 𝔼[U(W(B)) | V(B, c)]
This is much easier to interpret. You should approve of a candidate c precisely when the utility you obtain from c winning is greater than the expected utility in cases where c is right on the verge of winning (but someone else wins instead).
There are a few observations worth making about this:
The expectation clarifies why the threshold setting part of approval voting is intrinsically tactical. It involves evaluating how likely each other candidate is to win, and using that information to compute an expectation. That means advice to vote only based on internal feelings like whether you consider a candidate acceptable is always wrong. An effective vote takes into account external information about how others are likely to vote, including polling and understanding of public opinion and mood.
The conditional expectation, assuming V(B, c), tells us that the optimal strategy for whether to approve of some candidate c depends on the very specific situation where c is right on the verge of winning the election. If c is a frontrunner in the election, this scenario isn’t likely to be too different from the general case, and the conditional probability doesn’t change much. However, if c is a long-shot candidate from some minor party, but somehow nearly ties for a win, we’re in a strange situation indeed: perhaps a major last-minute scandal, a drastic polling error, or a fundamental misunderstanding of the public mood. Here, the conditional expected utility of an alternate winner might be quite different from your unconditional expectation. If, say, voters prove to have an unexpected appetite for extremism, this can affect the runners-up, as well.
Counter-intuitively, an optimal strategy might even involve approving some candidates that you like less than some that you don’t approve! This can happen because different candidates are evaluated against different thresholds. Therefore, a single voter’s best approval ballot isn’t necessarily monotonic in their utility rankings. This adds a level of strategic complexity I hadn’t anticipated in my earlier writings on strategy in approval voting.
Approximate strategy
The strategy described above is rigorously optimal, but not at all easy to apply. Imagining the bizarre scenarios in which each candidate, no matter how minor, might tie for a win, is challenging to do well. We’re fortunate, then, that there’s a good approximation. Remember that the utility gain from approving a candidate was equal to
ℙ[V(B,c)] · (U(c) - 𝔼[U(W(B)) | V(B, c)])
In precisely the cases where V(B, c) is a bizarre assumption that’s difficult to imagine, we’re also multiplying by ℙ[V(B,c)], which is vanishingly small, so this vote is very unlikely to make a difference in the outcome. For front-runners, who are relatively much more likely to be in a tie for the win, the conditional probability changes a lot less: scenarios that end in a near-tie are not too different from the baseline expectation.
This happens because ℙ[V(B,c)] falls off quite quickly indeed as the popularity of c decreases, especially for large numbers of voters. For a national scale election (say, about 10 million voters), if c expects around 45% of approvals, then ℙ[V(B,c)] is around one in a million. That’s a small number, telling us that very large elections aren’t likely to be decided by a one-vote margin anyway. But it’s gargantuan compared to the number if c expects only 5% of approvals. Then ℙ[V(B,c)] is around one in 10^70. That’s about one in a quadrillion-vigintillion, if you want to know, and near the scale of possibly picking one atom at random from the entire universe! The probability of casting a pivotal vote drops off exponentially, and by this point it’s effectively zero.
With that in mind, we can drop the condition on the probability in the second term, giving us a new rule: Approve of a candidate c any time that:
U(c) > 𝔼[U(W(B))]
That is, approve of any candidate whose win you would like better than you expect to like the outcome of the election. In other words, imagine you have no other information on election night, and hear that this candidate has won. If this would be good news, approve of the candidate on your ballot. If it would be bad news, don’t.
This rule is still tactical. To determine how much you expect to like the outcome of the election, you need to have beliefs about who else is likely to win, which still requires an understanding of polling and public opinion and mood.
However, there is now only one threshold, derived from realistic polling expectations, and you can cast your approval ballot monotonically against that single threshold.
This is no longer a true optimal strategy, but with enough voters, the exponential falloff in ℙ[V(B,c)] as c becomes less popular is a pretty good assurance that the incorrect votes you might cast by using this strategy instead of the optimal ones are extremely unlikely to matter. In practice, this is probably the best rule to communicate to voters in an approval election with moderate to large numbers of voters.
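To make the shortcut concrete, here is a minimal sketch in Haskell, assuming you can put rough numbers on your utilities U(c) and on each candidate’s probability of winning (estimated from polling). The names and the Double encodings are illustrative assumptions, not a prescribed interface:

import qualified Data.Map as M

-- utilities: U(c) for each candidate; winProbs: your polling-based estimate
-- of P[W(B) = c], which should sum to (roughly) 1.
approvals :: Ord c => M.Map c Double -> M.Map c Double -> [c]
approvals utilities winProbs =
  [ c | (c, u) <- M.toList utilities, u > expectedOutcome ]
  where
    -- E[U(W(B))]: how much you expect to like the result, ignoring your own vote
    expectedOutcome =
      sum [ p * M.findWithDefault 0 c utilities | (c, p) <- M.toList winProbs ]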
We can get closer with the following hypothetical: Imagine that on election night, you have no information on the results except for a headline that proclaims: Election Too Close To Call. With that as your prior, you ask of each candidate: would it be good or bad news to hear now that this candidate has won? If it would be good news, then you approve of them. This still leaves one threshold, but we’re no longer making the leap that the pivotal condition for front-runners is unnecessary; we’re imagining a world in which at least some candidates, almost surely the front-runners, are tied. If this changes your decision (which it likely would only in very marginal cases), you can use this more accurate approximation.
Reducing range to approval voting
I promised to look at strategy for range voting, as well. Armed with an appreciation of approval strategy, it’s easy to extend it to an optimal range strategy for large-scale elections.
The key is to recognize that a range voting election with options 0, 1, 2, …, n is mathematically equivalent to an approval election where everyone is just allowed to vote n times. The number you mark on the range ballot can be interpreted as saying how many of your approval ballots you want to mark as approving that candidate.
Looking at it this way presents the obvious question: why would you vote differently on some ballots than others? In what situation could that possibly be the right choice?
For small elections, say if you’re voting on places to go out and eat with your friends or coworkers, it’s possible that adding in a handful of approvals materially changes the election so that the optimal vote is different. Then it may well be optimal to cast a range ballot using some intermediate number.
For large elections, though, you’re presented with pretty much exactly the same question each time, and you may as well give the same answer. Therefore, in large-scale elections, the optimal way to vote with a range ballot is always to rate everyone either the minimum or maximum possible score. This reduces a range election exactly to an approval election. The additional expressiveness of a range ballot is a siren call: by using it, you always vote less effectively than you would have by ignoring it and using only the two extreme choices.
Since we’re discussing political elections, which have relatively large numbers of voters, this answers the question for range elections, as well: Rate a candidate the maximum score if you like them better than you expect to like the outcome of the election. Otherwise, rate them the minimum score.
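Rendered as code, the range version is the same decision with an extreme score attached. This sketch reuses the assumed inputs from the approval sketch above (utilities and win probabilities as plain numbers):

import qualified Data.Map as M

-- Give every candidate either the minimum or the maximum score, using the
-- same threshold as the approval sketch above.
rangeBallot :: Ord c => Int -> Int -> M.Map c Double -> M.Map c Double -> M.Map c Int
rangeBallot minScore maxScore utilities winProbs = M.map score utilities
  where
    expectedOutcome =
      sum [ p * M.findWithDefault 0 c utilities | (c, p) <- M.toList winProbs ]
    score u = if u > expectedOutcome then maxScore else minScore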
Summing it up
What we’ve learned, then, is that optimal voting in approval or range systems boils down to two nested rules.
Exact rule (for the mathematically fearless): approve c iff U(c) > 𝔼[ U(W(B)) | your extra vote for c is pivotal ]. This Bayesian test weighs each candidate against the expected utility in the razor-thin worlds where they tie for first.
Large-electorate shortcut (for everyone else): because those pivotal worlds become astronomically rare as the electorate grows, the condition shrinks to a single cutoff: approve (or give the maximum score to) every candidate whose victory you would enjoy more than you expect to like the overall result. (If you can, imagine only cases where you know the election is close.)
We’ve seen why the first rule is the gold standard; but the second captures virtually all of its benefit when millions are voting. Either way, strategy is inseparable from sincerity: you must translate beliefs about polling into a utility threshold, and then measure every candidate against it. We’ve also seen by a clear mathematical equivalence why range ballots add no real leverage in large-scale elections, instead only offering false choices that are always wrong.
The entire playbook fits on a sticky note: compute the threshold, vote all-or-nothing, and let the math do the rest.
Shows are under the banner of The Provocateurs (formerly Cabaret of Dangerous Ideas). Tickets go on sale Wednesday 7 May, around noon. The official blurb is brief:
Professor Philip Wadler (The University of Edinburgh) separates the hopes and threats of AI from the chatbot bullshit.
A quincunx is the X-shaped pattern of pips on the #5 face of a die.
It's so-called because the Romans had a common copper coin called an
as, and it was divided (monetarily, not physically) into twelve
uncia. There was a bronze coin worth five uncia called a quīncunx, which
is a contraction of quīnque (“five”) + uncia, and the coin had
that pattern of dots on it to indicate its value.
Uncia generally meant a twelfth of something. It was not just a
twelfth of an as, but also a twelfth of a pound, which is where we
get the word “ounce”, and a twelfth of a foot, which is where we get
the word “inch”.
The story I always heard about the connection between the coin and the
X-shaped pattern of dots was the one that is told by Wikipedia:
Its value was sometimes
represented by a pattern of five dots arranged at the corners and the
center of a square, like the pips of a die. So, this pattern also came
to be called quincunx.
Or the Big Dictionary:
… [from a] coin of this value (occasionally marked with a pattern
resembling the five spots on a dice cube),…
But today I did a Google image search for quincunxes. And while most
had five dots, I found not even one that had the dots arranged in an
X pattern.
(I believe the heads here are Minerva, goddess of wisdom. The owl is
also associated with Minerva.)
Where's the quincunx that actually has a quincuncial arrangement of
dots? Nowhere to be found, it seems. But everyone says it, so it must be true.
Addenda
The first common use of “quincunx” as an English word was to refer
to trees that were planted in a quincuncial pattern, although not
necessarily in groups of exactly five, in which each square of four
trees had a fifth at its center.
Similarly, the
Galton Box,
has a quincuncial arrangement of little pegs. Galton himself called
it a “quincunx”.
The OED also offers this fascinating aside:
Latin quincunx occurs earlier in an English context. Compare
the following use apparently with reference to a v-shaped figure:
1545 Decusis, tenne hole partes or ten Asses...It is also a
fourme in any thynge representyng the letter, X, whiche parted in
the middel, maketh an other figure called Quincunx, V.
which shows that for someone, a quincuncial shape was a V and not
an X, presumably because V is the Roman numeral for five.
A decussis was a coin worth not ten uncia but ten asses, and
it did indeed have an X on the front. A five-as coin was a
quincussis and it had a V. I wonder if the author was confused?
The source is
Bibliotheca Eliotæ.
The OED does not provide a page number.
It wasn't until after I published this that I realized that
today's date was the extremely quincuncial 2025-05-25. I thank the
gods of chance and fortune for this little gift.
The quince is so-named because, like other fruits in the apple family,
it has a natural fivefold symmetry:
This is because their fruits develop from five-petaled flowers, and
the symmetry persists through development. These are pear blossoms:
You can see this in most apples if you cut them into equatorial slices:
The fivefold symmetry isn't usually apparent from the outside once the
structure leaves the flowering stage. But perfect Red Delicious
specimens do have five little feet:
P.S.: I was just kidding about the name of the quince, which actually
has nothing to do with any of this. It is a coincidence.
I thought it might be fun to try to use Glean to index as much of
Hackage as I could, and then do some rough comparisons against hiedb and also play around to see what interesting queries
we could run against a database of all the code in Hackage.
This project was mostly just for fun: Glean is not going to replace
hiedb any time soon, for reasons that will become clear. Neither are
we ready (yet) to build an HLS plugin that can use Glean, but
hopefully this at least demonstrates that such a thing should be
possible, and Glean might offer some advantages over hiedb in
performance and flexibility.
A bit of background:
Glean is a code-indexing system
that we developed at Meta. It’s used internally at Meta for a wide
range of use cases, including code browsing, documentation
generation and code analysis. You can read about the ways in which
Glean is used at Meta in Indexing
Code At Scale with Glean.
hiedb is a code-indexing system for Haskell. It takes
the .hie files that GHC produces when given the option
-fwrite-ide-info and writes the information to a SQLite database
in various tables. The idea is that putting the information in a DB
allows certain operations that an IDE needs to do, such as
go-to-definition, to be fast.
You can think of Glean as a general-purpose system that does the same
job as hiedb, but for multiple languages and with a more flexible
data model. The open-source version of Glean comes with indexers for
ten languages or
so, and moreover Glean supports SCIP which has
indexers for various languages available from SourceGraph.
Since a hiedb is just a SQLite DB with a few tables, if you want you
can query it directly using SQL. However, most users will access the
data through either the command-line hiedb tool or through the API,
which provide the higher-level operations such as go-to-definition and
find-references. Glean has a similar setup: you can make raw queries
using Glean’s query language (Angle) using the
Glean shell or the command-line tool, while the higher-level
operations that know about symbols and references are provided by a
separate system called Glass which also has a command-line tool and
API. In Glean the raw data is language-specific, while the Glass
interface provides a language-agnostic view of the data in a way
that’s useful for tools that need to navigate or search code.
An ulterior motive
In part all of this was an excuse to rewrite Glean’s Haskell
indexer. We built a Haskell indexer a while ago but it’s pretty
limited in what information it stores, only capturing enough
information to do go-to-definition and find-references and only for a
subset of identifiers. Furthermore the old indexer works by first
producing a hiedb and consuming that, which is both unnecessary and
limits the information we can collect. By processing the .hie files
directly we have access to richer information, and we don’t have the
intermediate step of creating the hiedb which can be slow.
The rest of this post
The rest of the post is organised as follows, feel free to jump
around:
Performance: a few results comparing hiedb with Glean on an
index of all of Hackage
Queries: A couple of examples of queries we can do with
a Glean index of Hackage: searching by name, and finding dead code.
Apparatus: more details on how I set
everything up and how it all works.
What’s next: some thoughts on what we still need to add to
the indexer.
Performance
All of this was performed on a build of 2900+ packages from Hackage,
for more details see Building all of Hackage
below.
Indexing performance
I used this hiedb command:
hiedb index -D /tmp/hiedb . --skip-types
I’m using --skip-types because at the time of writing I haven’t
implemented type indexing in Glean’s Haskell indexer, so this should
hopefully give a more realistic comparison.
I should note that in the case of Glean the only parallelism is
between the indexer and the server that is writing to the DB. We
didn’t try to index multiple .hie files in parallel, although that
would be fairly trivial to do. I suspect hiedb is also
single-threaded just going by the CPU load during indexing.
Size of the resulting DB
hiedb: 5.2GB
Glean: 0.8GB
It’s quite possible that hiedb is simply storing more information, but
Glean does have a rather efficient storage system based on RocksDB.
Performance of find-references
Let’s look up all the references of Data.Aeson.encode:
hiedb -D /tmp/hiedb name-refs encode Data.Aeson
This is the query using Glass:
cabal run glass-democlient -- --service localhost:12345 \
references stackage/hs/aeson/Data/Aeson/var/encode
(side note: hiedb found 416 references while Glean found 415. I
haven’t yet checked where this discrepancy comes from.)
But these results don’t really tell the whole story.
In the case of hiedb, name-refs does a full table scan so it’s
going to take time proportional to the number of refs in the DB. Glean
meanwhile has indexed the references by name, so it can serve this
query very efficiently. The actual query takes a few milliseconds, the
main overhead is encoding and decoding the results.
The reason the Glass query takes longer than the raw Glean query is
because Glass also fetches additional information about each
reference, so it performs a lot more queries.
We can also do the raw hiedb query using the sqlite shell:
sqlite> select count(*) from refs where occ = "v:encode" AND mod = "Data.Aeson";
417
Run Time: real 2.038 user 1.213905 sys 0.823001
Of course hiedb could index the refs table to make this query much
faster, but it’s interesting to note that Glean has already done that
and it was still quicker to index and produced a smaller DB.
Performance of find-definition
Let’s find the definition of Data.Aeson.encode, first with hiedb:
(worth noting that hiedb is giving the span of the identifier only,
while Glass is giving the span of the whole definition. This is just a
different choice; the .hie file contains both.)
Again, the issue with hiedb is that its data is not indexed in a way
that makes this query efficient: the defs table is indexed by the
pair (hieFile,occ) not occ alone. Interestingly, when the module
is known it ought to be possible to do a more efficient query with
hiedb by first looking up the hieFile and then using that to query
defs.
What other queries can we do with Glean?
I’ll look at a couple of examples here, but really the possibilities
are endless. We can collect whatever data we like from the .hie
file, and design the schema around whatever efficient queries we want
to support.
Search by case-insensitive prefix
Let’s search for all identifiers that start with the case-insensitive
prefix "withasync":
In less than 0.1 seconds we find 55 such identifiers in Hackage. (The
output isn’t very readable so I didn’t include it here, but for
example this finds results not just in async but in a bunch of
packages that wrap async too).
Case-insensitive prefix search is supported by an index that Glean
produces when the DB is created. It works in the same way as efficient
find-references, more details on that below.
Why only prefix and not suffix or infix? What about fuzzy search? We
could certainly provide a suffix search too; infix gets more tricky
and it’s not clear that Glean is the best tool to use for infix or
fuzzy text search: there are better data representations for that kind
of thing. Still, case-insensitive prefix search is a useful thing to
have.
Could we support Hoogle using Glean? Absolutely. That said, Hoogle
doesn’t seem too slow. Also we need to index types in Glean before it
could be used for type search.
Identify dead code
Dead code is, by definition, code that isn’t used anywhere. We have a
handy way to find that: any identifier with no references isn’t
used. But it’s not quite that simple: we want to ignore references
in imports and exports, and from the type signature.
Admittedly finding unreferenced code within Hackage isn’t all that
useful, because the libraries in Hackage are consumed by end-user code
that we haven’t indexed so we can’t see all the references. But you
could index your own project using Glean and use it to find dead
code. In fact, I did that for Glean itself and identified one entire
module that was dead, amongst a handful of other dead things.
Here’s a query to find dead code:
N where
N = hs.Name _;
N.sort.external?;
hs.ModuleSource { mod = N.mod, file = F };
!(
hs.NameRefs { target = N, file = RefFile, uses = R };
RefFile != F;
coderef = (R[..]).kind
)
Without going into all the details, here’s roughly how it works:
N = hs.Name _; declares N to be a fact of hs.Name
N.sort.external?; requires N to be external (i.e. exported), as
opposed to a local variable
hs.ModuleSource { mod = N.mod, file = F }; finds the file F
corresponding to this name’s module
The last part is checking to see that there are no references to
this name that are (a) in a different file and (b) are in code,
i.e. not import/export references. Restricting to other files isn’t
exactly what we want, but it’s enough to exclude references from
the type signature. Ideally we would be able to identify those more
precisely (that’s on the TODO list).
You can try this on Hackage and it will find a lot of stuff. It might
be useful to focus on particular modules to find things that aren’t
used anywhere, for example I was interested in which identifiers in
Control.Concurrent.Async aren’t used:
N where
N = hs.Name _;
N.mod.name = "Control.Concurrent.Async";
N.mod.unit = "async-2.2.4-inplace";
N.sort.external?;
hs.ModuleSource { mod = N.mod, file = F };
!(
hs.NameRefs { target = N, file = RefFile, uses = R };
RefFile != F;
coderef = (R[..]).kind
)
This finds 21 identifiers, which I can use to decide what to deprecate!
Apparatus
Building all of Hackage
The goal was to build as much of Hackage as possible and then to index
it using both hiedb and Glean, and see how they differ.
To avoid problems with dependency resolution, I used a Stackage LTS
snapshot of package versions. Using LTS-21.21 and GHC 9.4.7, I was
able to build 2922 packages. About 50 failed for some reason or other.
I did a large cabal get to fetch all the packages in LTS-21.21.
Then
cabal build all --keep-going
After a few retries to install any required RPMs to get the dependency
resolution phase to pass, and to delete a few packages that weren’t
going to configure successfully, I went away for a few hours to let
the build complete.
It’s entirely possible there’s a better way to do this that I don’t
know about - please let me know!
Building Glean
The Haskell indexer I’m using is in this pull
request which at the time of writing isn’t merged yet. (Since I’ve
left Meta I’m just a regular open-source contributor and have to wait
for my PRs to be merged just like everyone else!).
Admittedly Glean is not the easiest thing in the world to build,
mainly because it has a couple of troublesome dependencies:
folly (Meta’s library of
highly-optimised C++ utilities) and RocksDB.
Glean depends on very up-to-date versions of these libraries so we
can’t use any distro packaged versions.
Full instructions for building Glean are
here but roughly it goes like
this on Linux:
Install a bunch of dependencies with apt or yum
Build the C++ dependencies with ./install-deps.sh and set some env vars
make
The Makefile is needed because there are some codegen steps that
would be awkward to incorporate into the Cabal setup. After the first
make you can usually just switch to cabal for rebuilding stuff
unless you change something (e.g. a schema) that requires re-running
the codegen.
Running Glean
I’ve done everything here with a running Glean server, which was
started like this:
While it’s possible to run Glean queries directly on the DB without a
server, running a server is the normal way because it avoids the
latency from opening the DB each time, and it keeps an in-memory cache
which significantly speeds up repeated queries.
The examples that use Glass were done using a running Glass server,
started like this:
cabal run glass-server -- --service localhost:1234 --port 12345
How does it work?
The interesting part of the Haskell indexer is the schema in hs.angle. Every
language that Glean indexes needs a schema, which describes the data
that the indexer will store in the DB. Unlike an SQL schema, a Glean
schema looks more like a set of datatype declarations, and it really
does correspond to a set of (code-generated) types that you can work
with when programmatically writing data, making queries, or inspecting
results. For more about Glean schemas, see the
documentation.
Being able to design your own schema means that you can design
something that is a close match for the requirements of the language
you’re indexing. In our Glean schema for Haskell, we use a Name,
OccName, and Module structure that’s similar to the one GHC uses
internally and is stored in the .hie files.
The indexer
itself
just reads the .hie files and produces Glean data using datatypes
that are generated from the schema. For example, here’s a fragment of
the indexer that produces Module facts, which contain a ModuleName
and a UnitName:
here NameRefs is a predicate—which you can think of as a datatype,
or a table in SQL—defined in terms of another predicate,
FileXRefs. The facts of the predicate NameRefs (rows of the table)
are derived automatically using this definition when the DB is
created. If you’re familiar with SQL, a stored derived predicate in
Glean is rather like a materialized view in SQL.
What’s next?
As I mentioned earlier, the indexer doesn’t yet index types, so that
would be an obvious next step. There are a handful of weird corner
cases that aren’t handled correctly, particularly around record
selectors, and it would be good to iron those out.
Longer term ideally the Glean data would be rich enough to produce the
Haddock docs. In fact Meta’s internal code browser does produce
documentation on the fly from Glean data for some languages - Hack and
C++ in particular. Doing it for Haskell is a bit tricky because while
I believe the .hie file does contain enough information to do this,
it’s not easy to reconstruct the full ASTs for declarations. Doing it
by running the compiler—perhaps using the Haddock API—would be
an option, but that involves a deeper integration with Cabal so it’s
somewhat more awkward to go that route.
Could HLS use Glean? Perhaps it would be useful to have a full Hackage
index to be able to go-to-definition from library references? As a
plugin this might make sense, but there are a lot of things to fix and
polish before it’s really practical.
Longer term should we be thinking about replacing hiedb with Glean?
Again, we’re some way off from that. The issue of incremental updates
is an interesting one - Glean does support incremental
indexing
but so far it’s been aimed at speeding up whole-repository indexing
rather than supporting IDE features.
[ I started thinking about this about twenty years ago, and then writing it down in 2019, but it seems to be obsolete. I am publishing it anyway. ]
The canonical division of the year into seasons in the northern
temperate zone goes something like this:
Spring: March 21 – June 21
Summer: June 21 – September 21
Autumn: September 21 – December 21
Winter: December 21 – March 21
Living in the mid-Atlantic region of the northeast U.S., I have never
been happy with this. It is just not a good description of the
climate.
I begin by observing that the year is not equally partitioned between
the four seasons. The summer and winter are longer, and spring and
autumn are brief and happy interludes in between.
I have no problem with spring beginning in the middle of March. I
think that is just right. March famously comes in like a lion and
goes out like a lamb. The beginning of March is crappy, like
February, and frequently has snowstorms and freezes. By the end of
March, spring is usually skipping along, with singing birds and not just the early
flowers (snowdrops, crocuses, daffodil) but many of the later ones also.
By the middle of May the spring flowers are over and the weather is
getting warm, often uncomfortably so. Summer continues through the
beginning of September, which is still good for swimming and
lightweight clothes. In late September it finally gives way to
autumn.
Autumn is jacket weather but not overcoat weather. Its last gasp is
in the middle of November. By this time all the leaves have changed,
and the ones that are going to fall off the trees have done so. The
cool autumn mist has become a chilly winter mist. The cold winter
rains begin at the end of November.
So my first cut would look something like this:
Winter: mid-November – mid-March
Spring: mid-March – mid-May
Summer: mid-May – late September
Autumn: late September – mid-November
Note that this puts Thanksgiving where it belongs at the boundary
between autumn (harvest season) and winter (did we harvest enough to
survive?). Also, it puts the winter solstice (December 21) about one
quarter of the way through the winter. This is correct. By the
solstice the days have gotten short, and after that the cold starts to
kick in. (“As the days begin to lengthen, the cold begins to
strengthen”.) The conventional division takes the solstice as the
beginning of winter, which I just find perplexing. December 1 is
not the very coldest part of winter, but it certainly isn't autumn.
There is something to be said for it though. I think I can
distinguish several subseasons — ten in fact:
Dominus Seasonal Calendar
Winter: Early winter, Midwinter, Late winter
Spring: Early spring, Late spring
Summer: Early summer, Midsummer, Late summer
Autumn: Early autumn, Late autumn
Midwinter, beginning around the solstice, is when the really crappy
weather arrives, day after day of bitter cold. In contrast, early and
late winter are typically much milder. By late February the snow is
usually starting to melt. (March, of course, is always unpredictable,
and usually has one nasty practical joke hiding up its sleeve. Often,
March is pleasant and springy in the second week, and then mocks you
by turning back into January for the third week. This takes people by
surprise almost every year and I wonder why they never seem to catch
on.)
Similarly, the really hot weather is mostly confined to
midsummer. Early and late summer may be warm but you do not get
blazing sun and you have to fry your eggs indoors, not on the
pavement.
Why the seasons seem to turn in the middle of each month, and not at
the beginning, I can't say. Someone messed up, but who? Probably the
Romans. I hear that the Persians and the Baha’i start their year on
the vernal equinox. Smart!
Weather in other places is very different, even in the temperate
zones. For example, in southern California they don't have any of the
traditional seasons. They have a period of cooler damp weather in the
winter months, and then instead of summer they have a period of gloomy
haze from June through August.
However
I may have waited too long to publish this article, as climate change
seems to have rendered it obsolete. In recent years, we have barely
had midwinter, and instead of the usual two to three annual snows we
have zero. Midsummer has grown from two to four months, and summer
now lasts into October.
In the UK, it’s very common that your employer pays you once a
month. When this happens, they give you a document called a payslip,
that has some numbers on it, such as how much your salary is, how much
they paid you this month, how much went to HMRC in tax, how much went
to your pension, and a few other numbers. But they never show any
workings, so you really have no way to check whether any of these
numbers are correct. There are plenty of online take-home-pay
calculators, but these all focus on the full year; they have no
facility to calculate your next payslip.
About half way through April 2024, I stopped working for one
company. Everything was wrapped up – I received my final payslip from
them, along with my P45. I then had a few months off, and started a
new job in July 2024. When you start a new job it always takes a while
for money things to get sorted out, for example pension enrolment and
sorting out pension contributions, so it’s really worthwhile to keep a
close eye on your payslips particularly for these first few
months. Mine were arriving and some numbers looked right, but other
numbers, such as the amount of tax I was paying, were changing
dramatically, month to month. I had no idea why; whether they should
be changing like that; whether they were going to keep changing or
would eventually settle down. I had no way to check any of these
numbers. Was I going to get in trouble with HMRC and get investigated?
I was also a little on edge because this was the first job where my
pension contributions were using a thing called Qualifying
Earnings. In
all my previous jobs, if I chose for 10% of my salary to go into my
pension, then that’s what would happen. But now there was this thing
called Qualifying Earnings, which is (numbers correct at time of
writing) a band from £6240 to £50,270. If you’re earning, say £30k,
then your x% contribution is actually x% of £30,000 - £6,240 = £23,760, so a 10% contribution comes to £2,376 a year rather than £3,000. If
you’re earning above £50,270, then any further increase to your salary
will not result in any extra contributions to your pension because
you’re above the band. The 2008 Pensions Act, which created the legal
requirement for all employees to have workplace pensions and for
automatic enrolment (with a minimum 8% combined contribution from the
employer and employee), also created this concept of Qualifying
Earnings. I consider this a pretty scummy way of reducing employer
pension contributions for large firms. It complicates the maths and no
doubt adds confusion for people trying to check their own
payslips. Given that 74% of the population have pensions that are too
small to retire
on,
this whole concept of Qualifying Earnings seems amoral at best.
These days, a lot of smaller companies outsource their payroll
processing. In my case, I was officially working for an international
Employer of Record
and they were then outsourcing payroll processing to local firms with
country-specific expertise. So when I started asking questions, there
was no ability to go and sit with someone and work through it. Or have
a call. It was all messages passed across multiple different systems,
and partial answers at best would come back several days later. Even
if your payroll is done in-house, I strongly suspect that a lot of the
time, some software package will be being used that does all the
calculations and quite likely no one will actually understand or be
able to explain the maths that’s going on.
After a while of getting nowhere, and after uncovering some
substantial mistakes that had been made that affected me, I decided to
spend some weekends actually figuring out how
PAYE
works, and writing some code that can calculate my next payslip. This
library is available for
anyone to use. There’s a
README that
hopefully explains the basic principles of how the calculations are
done. This only works if your
tax-code ends
in an L, and it only works if you’re in National Insurance
category A. All the code can do is use some details you provide to predict
your next payslips. Also, I’m not a trained accountant or financial
adviser, and even for my own payslips, every month, the numbers don’t
quite match up (but they’re within £1). So please treat this as a toy,
rather than the basis for building a payroll processor!
Getting started
The library is written in Go so you’ll need Go
installed. Then, in a terminal do:
$ mkdir payslips
$ cd payslips
$ go mod init mypayslips
$ go get wellquite.org/tax@latest
Now we need to write a tiny amount of code. In your new payslips
directory, create a main.go file, and open it in your editor. You
want something like this:
We create a list of
Payslips. The first
payslip must specify a year, and your tax-code. These details are
automatically applied to the payslips that follow, if not explicitly
provided. Many of the calculations rely on year-to-date totals, and so
we must have a complete record of your payslips from the start of the
tax year. So that means the first payslip is month 1 (in this example,
April 2024), then month 2 (May 2024) and so on. If you have no income
for a month then you can just put in an empty payslip ({}). The
above example describes being paid in April and May 2024, then nothing
in June, and then being paid (with a higher salary) in July, August
and September.
Save this main.go file. Then, back in your terminal, in your
payslips directory, just do:
go run main.go
You should get some output showing all sorts of calculations,
including income tax, and personal
allowance. With a little luck,
if you change the numbers to match your own salary and other details,
the numbers produced should match quite closely your own payslips,
provided nothing you’re doing is too exotic.
There is documentation for all the different
fields that you can
provide in each payslip. In general, the code will try to fill in
missing values. It should be able to cope with things like
salary-sacrifice, or, if you change job within a month and have
several payslips for the same month, this should work too. Everything
is run locally on your computer: please feel free to check the
source – there are no 3rd
party libraries at all, and nothing imports the net package. It’ll
work just the same if you yank out your network cable or disable your
WiFi.
Note however, this code is lightly tested. Whilst it works for me
(and one or two friends), I make no claims that it correctly models
the entirety of PAYE, so it may very well not work for you. Feedback,
contributions, corrections, and patches are all very welcome!
Why doesn’t [the Data.Map function] unionWith :: (a -> a -> a) -> Map k a -> Map k a -> Map k a allow for different value types the way intersectionWith :: (a -> b -> c) -> Map k a -> Map k b -> Map k c does?
This is a very reasonable question, and it led down an interesting rabbit hole at the intersection of API design and efficient implementation.
To answer the original question, what would the type of a different value type of unionWith look like? It would be something in the flavor of:
unionWith :: (Maybe a -> Maybe b -> c) -> Map k a -> Map k b -> Map k c
But this new Maybe a -> Maybe b -> c parameter is somewhat lossy, in that it gives the impression that it could be called with Nothing for both parameters, which doesn’t fit into the vibe of being a “union.”
So instead we could restrict that possibility by using These a b:
data These a b = This a | That b | These a b

unionWith :: (These a b -> c) -> Map k a -> Map k b -> Map k c
which seems reasonable enough.
But let’s take reasonableness out of the picture and start again from first principles. Instead let’s ask ourselves the deep philosophical question of what even IS a map?
A Map k v is a particularly efficient implementation of functions with type k -> Maybe v. But why is this Maybe here? It’s really only to encode the “default” value of performing a lookup. Nothing goes wrong if we generalize this to be Monoid v => k -> v. In fact, it helps us make sense of the left bias present in Data.Map, where we see:
lookup k (singleton k v1 <> singleton k v2) = Just v1
This equality is hard to justify under the normal understanding of Map k v being an encoding of a function k -> Maybe v. But under the general monoid interpretation, we get a nice semigroup homomorphism:
lookup k (m1 <> m2) = lookup k m1 <> lookup k m2
where the monoid in question has been specialized to be First.
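Spelled out as a small worked chain (wrapping each result in First purely to make the monoid explicit; this is an illustration of the law, not library code):

lookup k (singleton k v1 <> singleton k v2)
  = lookup k (singleton k v1) <> lookup k (singleton k v2)  -- the homomorphism
  = Just (First v1) <> Just (First v2)                      -- both keys are present
  = Just (First v1)                                         -- the leftmost value wins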
Of course, we also have a monoid homomorphism:
lookup k mempty = mempty
Let’s re-evaluate the original question in terms of this newly-generalized Map. Now that we’ve removed all of the unnecessary baggage of Maybe, we can again think about the desired type of unionWith:
unionWith :: (a -> b -> c) -> Map k a -> Map k b -> Map k c
which looks awfully familiar. This new type signature automatically resolves our original concerns about “what should we do if the key isn’t present?”—just call the function with mempty as a parameter!
We can give some semantics as to what unionWith ought to do again by relating it to the observation lookup. The relevant law here seems like it ought to be:
lookup k (unionWith f m n) = f (lookup k m) (lookup k n)
By choosing a degenerate function f, say, \_ _ -> nontrivial, where nontrivial is some value that is not mempty, we can see the beginnings of a problem:
lookup k (unionWith f m n)
  = f (lookup k m) (lookup k n)
  = <let f = \_ _ -> nontrivial>
    nontrivial
Regardless of the key we lookup in our unionWithed Map, we need to get back nontrivial. How can we implement such a thing? I see only two ways:
explicitly associate every key in the map with nontrivial, or
keep nontrivial around as a default value in the map
#1 is clearly a non-starter, given that we want our Maps to be efficient encodings of functions, which leaves us with only #2. This is actually a pretty common construction, which stems immediately from the fact that a pair of monoids is itself a monoid. The construction would look something like this:
data Map k v = Map
  { defaultValue :: v
  , implementation :: Data.Map.Map k v
  }
  deriving stock Generic
  deriving (Semigroup, Monoid) via (Generically (Map k v))

unionWith :: (a -> b -> c) -> Map k a -> Map k b -> Map k c
unionWith f (Map def1 imp1) (Map def2 imp2) =
  Map (f def1 def2) (liftA2 f imp1 imp2)
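For concreteness, the corresponding observation and unit for this defaulted wrapper might look like the following sketch (it assumes the Map wrapper just defined, an Ord k constraint, and shadows the Prelude names; this is not the real Data.Map API):

import Prelude hiding (lookup, pure)
import qualified Data.Map

-- Missing keys fall back to the stored default: the "keep nontrivial around" idea.
lookup :: Ord k => k -> Map k v -> v
lookup k (Map def imp) = maybe def id (Data.Map.lookup k imp)

-- A map that sends every key to the same value.
pure :: v -> Map k v
pure v = Map v Data.Map.empty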
Seems fine, right? The nail in the coffin comes when we reintroduce our semigroup homomorphism:
lookup k (m1 <> m2) = lookup k m1 <> lookup k m2
Without loss of generality, take m2 = pure nontrivial (where pure is just unionWith with a constant function.) This gives us:
lookup k (m1 <> pure nontrivial) = lookup k m1 <> nontrivial
Making this thing efficient is a further complication! We again have two options:
modify the value at every key by multiplying in nontrivial, or
finding a way of suspending this computation
#1 clearly requires \(O(n)\) work, which again forces us to look at #2. But #2 seems very challenging, because the monoidal values we need to suspend need not span the entire Map. For example, consider a Map constructed a la:
Representing this thing efficiently certainly isn’t impossible, but you’re not going to be able to do it on the balanced binary search trees that underlie the implementation of Data.Map.Map.
I find this quite an interesting result. I always assumed that Data.Map.Map (or at least, Data.Map.Monoidal.MonoidalMap) didn’t have an Applicative instance because it would require a Monoid constraint on its output—but that’s not the sort of thing we can express in Haskell.
But the analysis above says that’s not actually the reason! It’s that there can be no efficient implementation of Applicative, even if we could constrain the result.
What I find so cool about this style of analysis is that we didn’t actually write any code, nor did we peek into the implementation of Data.Map (except to know that it’s implemented as a balanced BST.) All we did was look at the obvious laws, instantiate them with degenerate inputs, and think about what would be required to efficiently get the right answer.
Usually I write about solutions to problems I’ve worked out, but I’ve found myself increasingly becoming interested in where solutions come from. Maybe it’s because I’ve been reading Boorstin’s excellent The Discoverers, which I’d strongly recommend.
Regardless of why, I thought I’d switch up the usual dance step today, and discuss what solving my most-recent-big-problem actually looked like, in terms of what I tried, where I looked, and what the timeline was.
The Problem
The problem is to serialize a program graph into a series of let-bindings. For example, given the following graph:
+
/ \
f ---> g
| / \
a \ /
expensive
which represents the program:
f a (g expensive expensive) + g expensive expensive
Unfortunately, this is a naive representation of the program, since it duplicates the work required to compute expensive four times, and g expensive expensive twice. Instead, we would prefer to generate the equivalent-but-more-efficient program:
let $0 = expensive
    $1 = g $0 $0
 in f a $1 + $1
This transformation is affectionately known as sharing, since it shares the computed answer whenever there is repeated work to be done.
So this is what we’re trying to do. Given the original graph, determine the best place to insert these let-bindings, for some reasonable definition of “best.” We can assume there are no side effects involved, so any place that an expression is well-scoped is an acceptable solution.
In order to understand some of my attempted solutions, it’s worth noting that our final solution should build something of type Expr, and the original graph is represented as an IntMap (ExprF Int). ExprF is the Base functor of Expr, with all of its self-references replaced by some type variable, in this case Int. Thus, the graph above looks much more like:
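(The real ExprF constructors aren’t shown here, so the following uses invented ones, purely to illustrate the shape of the encoding.)

{-# LANGUAGE DeriveFunctor, DeriveFoldable #-}
import qualified Data.IntMap as IM

-- Hypothetical base functor; the real one will differ.
data ExprF r = AddF r r | AppF String [r] | VarF String
  deriving (Functor, Foldable)

graph :: IM.IntMap (ExprF Int)
graph = IM.fromList
  [ (0, AddF 1 2)          -- the (+) at the root
  , (1, AppF "f" [3, 2])   -- f a (g expensive expensive)
  , (2, AppF "g" [4, 4])   -- g expensive expensive, with both arguments shared
  , (3, VarF "a")
  , (4, VarF "expensive")  -- the shared, expensive subterm
  ]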
I spent over a year trying to solve this problem, with various mostly-working solutions during that time. My strategy here was to think really hard, write up some algorithm that seemed plausible, and then run it against our (small) battery of integration tests to make sure it got the same answer as before.
Why not property test it? I tried, but found it very challenging to implement well-typed generators that would reliably introduce shared thunks. But maybe there’s a different lesson to be learned here about writing good generators.
Anyway. For eight months, one of these think-really-hard algorithms fit the bill and didn’t give us any problems. It was a weird, bespoke solution to the problem that independently kept track of all of the free variables in every graph fragment, and tried to let-bind a fragment as soon as we landed in a context where all of the free variables were in scope. It seemed to work, but it was extremely messy and unmaintainable.
At the time of writing, this sharing algorithm was the only source of let-binds in our entire language, which meant that it didn’t need to account for let-binds in the program.
Of course, that invariant eventually changed. We added a way in the source language to introduce lets, which meant my algorithm was wrong. And I had written it sufficiently long ago that I no longer remembered exactly why it worked. Which meant the theory of my program was lost, and thus that we ought to rewrite it.
Unfolding a Solution
I went back to the problem statement, and stared at it for a long time (back to the think-really-hard algorithm!). Upon staring at the problem, I realized that what I was really trying to do was determine where diamond patterns arose in the program graph.
Recall our original graph:
+
/ \
f ---> g
| / \
a \ /
expensive
If we redraw it such that g is on a different rank than f, then the two diamond patterns become much clearer:
+
/ \
f |
| \ |
a \ /
g
/ \
\ /
expensive
The insight I came up with is that if a node n is the source of a diamond, then we must let-bind the sink of the diamond immediately before inlining the definition of n.
This gives rise to the question of “how do we identify a diamond?” What we can do is give a mapping from each node to its reachable set of nodes. For example, in the above, we’d compute the map:
+ -> {+, f, a, g, expensive}
f -> {f, a, g, expensive}
a -> {a}
g -> {g, expensive}
expensive -> {expensive}
Then when we go to inline a node, say, +, we can look for any nodes that are reachable via more than one of its immediate subterms. Since the immediate subterms of + are f and g, we can take the intersections of their reachable sets:
{f, a, g, expensive} intersect {g, expensive}
giving us
{g, expensive}
which is exactly the set of nodes that we need to perform sharing on. If you topologically sort this set, it gives you the order that you should perform your let bindings.
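A rough sketch of that reachability map, assuming ExprF has a Foldable instance (so toList yields a node’s immediate children, as in the illustrative encoding earlier) and that the graph is acyclic, so the lazy knot-tying terminates:

import Data.Foldable (toList)
import qualified Data.IntMap.Lazy as IM
import qualified Data.IntSet as IS

-- Map every node to the set of nodes reachable from it (including itself).
reachable :: IM.IntMap (ExprF Int) -> IM.IntMap IS.IntSet
reachable graph = result
  where
    result = IM.mapWithKey node graph
    node n expr =
      IS.insert n (IS.unions [ result IM.! child | child <- toList expr ])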
EXCEPT there’s a kink in the whole thing. What happens if one of the terms in this diamond contains free variables? In particular, we might have something like this:
+
/ \
f |
| \ |
a \ /
λx
/ \
\ /
expensive
|
x
This gives us an analogous set of reachable nodes when we look at +, but we obviously can’t lift expensive x above the lambda.
Resolving this problem required giving up on the notion of memoizing the entire reachable set of nodes, and to instead crawl the graph ensuring that everything is well-scoped.
Performance Woes
My algorithm looked fine, and, importantly, got the right answer in a reasonable amount of time on our (small) battery of integration tests. So I shipped it, commended myself on a job well done, and thought nothing more about it. For about a week, until a bug report came in saying that our compiler now seemed to hang on big programs.
Which was something I hadn’t noticed, since we didn’t have any big programs in our integration tests.
Damn!
Upon digging in to what exactly was so slow, I noticed that my algorithm was accidentally quadratic. I needed to fold over every node in the graph, and that required looking at the entire reachable set underneath it. I had put in some of the obvious safeguards, hoping that they would prune the search tree early, but it wasn’t enough sacrifice for the Great God of Asymptotes.
Did I mention that at this point in the story, having this algorithm working fast was on the critical path of the company? Everybody else was blocked on me figuring this out. Talk about pressure!
Anyway. You’ll notice above that in my description of the algorithm, everything sounds fine. But the juice is in the details, as the common saying goes. Computing reachability isn’t quite the right thing to be using here, as it gave us the wrong answer for the lambda example above. Which is unfortunate because reachability is something we can do in linear time.
And then when reachability didn’t work, I just threw away the fast performance and hoped my bespoke algorithm would do the job. My only redemption comes from the fact that at least it got the right answer, even if it did so very slowly.
Finding the Kernel
Back to the drawing board.
Whenever I have graph theory problems, I call up my boy Vikrem. He’s good at nerd stuff like this.
We rubberducked the problem, and tried to reframe it in the language of graph theory. We had a Merkiv–Maguire moment where we independently realized that the goal was somehow related to finding the lowest common ancestor (LCA) of a node.
Which is to say, roughly, that we are looking for forks in the diamond diagram. Which we already knew, but it was nice to have some language for.
Our new problem is that LCA is defined only over trees. There are some extensions to DAGs, but none of them seem to be particularly well founded. However, searching for exactly that brought me to this stackoverflow question, where nestled in the comments is someone suggesting that the poster isn’t looking for LCA, but instead for a related notion: the lowest single common ancestor. LSCA is defined in a 2010 paper New common ancestor problems in trees and directed acyclic graphs.
The standard definition of LCA(x, y) = l is that “l is an ancestor of x and of y, and that no descendent of l has this property.”
But the definition of LSCA(x, y) = l is that “l lies on all root-to-x paths, and that l lies on all root-to-y paths, and that no descendent of l has this property.”
The distinction between the two is easily seen in the following graph:
0
/ \
1 2
| X |
3 4
Under the standard definition, LCA is not uniquely defined for DAGs. That is, LCA(3, 4) = {1, 2}. But neither 1 nor 2 lies on all paths from the root. Under LSCA therefore we get LSCA(3, 4) = 0, which is the obviously-correct place to let-bind 3 and 4.
The paper gives a preprocessing scheme for computing LSCA by building a “lowest single ancestor” (LSA) tree. The LSA of a node is the LSCA of all of its in-edges. This definition cashes out to mean “the most immediate diamond above any node.” Finally! This is exactly what we’re looking for, since this is where we must insert our let-bindings! Even better, the paper gives us an algorithm for computing the LSA tree in linear time!
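To make the definition concrete, here is a small brute-force sketch of my own (an illustration only; it is neither the compiler's code nor the paper's linear-time algorithm). It computes, for each node, the set of nodes lying on every root-to-node path, and then takes the LSA of a node to be the deepest node common to all of its parents' sets.

import qualified Data.Map as M
import qualified Data.Set as S

-- Parents of each node in the DAG pictured above (0 is the root).
parentsOf :: M.Map Int [Int]
parentsOf = M.fromList [(0, []), (1, [0]), (2, [0]), (3, [1, 2]), (4, [1, 2])]

-- For each node, the set of nodes lying on every root-to-node path.
-- Terminates because the graph is acyclic and Map values are lazy.
onAllPaths :: M.Map Int (S.Set Int)
onAllPaths = result
  where
    result = M.mapWithKey ancestors parentsOf
    ancestors v []      = S.singleton v
    ancestors v parents = S.insert v (foldr1 S.intersection [result M.! p | p <- parents])

-- The LSA of a node: the deepest node lying on all paths to every parent.
-- (In this toy graph the labels happen to increase with depth, so findMax works.)
lsa :: Int -> Maybe Int
lsa v = case parentsOf M.! v of
  []      -> Nothing
  parents -> Just (S.findMax (foldr1 S.intersection [onAllPaths M.! p | p <- parents]))

For the graph above, lsa 3 and lsa 4 both come out as Just 0, which is exactly where the let-binding belongs.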
The First Implementer
Of course, I’m lazy and would prefer not to implement this thing. So instead I searched on hackage for lsca, and found nothing. But then I searched for lca and found that, like always, Ed Kmett was 13 years ahead of me.
The lca package implements an \(O(\log n)\) algorithm for computing the LCA of any two nodes in a graph. Which is very convenient for me, since the LSCA algorithm requires being able to do this.
Time to roll up the sleeves and get cracking I suppose.
The paper was surprisingly straightforward, and my first attempt implemented the (imperative) algorithms as given (imperatively). The first step is to do a topological sort on the DAG in order to know in which order one ought to unfold the LSA tree.
But as is so often the case, this topological sort isn't actually relevant to the algorithm; it's just an encoding detail of expressing the algorithm imperatively. But you don't need that when you've got laziness on your side! Instead you can just tie the knot and do something cool like this:
lsaTree :: Ord v => Map v (Set v) -> Map v (Path v)
lsaTree input = fix $ \result -> M.fromList $ do
  (node, parents) <- M.toList input
  let parentResults = fmap (result M.!) parents
  ...
Notice how we use fix to bind the eventual result of the final computation. Then we can chase pointers by looking them up in result—even though it’s not yet “computed.” Who cares what order the computer does it in. Why is that a thing I should need to specify?
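As a self-contained toy version of that trick (my own example, not code from the PR), here is the depth of every node in a DAG computed by tying the knot in the same shape as lsaTree:

import Data.Function (fix)
import Data.Map (Map)
import qualified Data.Map as M

-- Depth of every node, given a map from each node to its parents.
-- fix names the eventual result, so we can look up answers that
-- haven't been "computed yet"; laziness sorts out the order.
depths :: Ord v => Map v [v] -> Map v Int
depths parents = fix $ \result -> M.fromList $ do
  (node, ps) <- M.toList parents
  let parentDepths = map (result M.!) ps
  pure (node, if null ps then 0 else 1 + maximum parentDepths)

For example, depths (M.fromList [(0, []), (1, [0]), (2, [0]), (3, [1, 2])]) evaluates to fromList [(0,0),(1,1),(2,1),(3,2)].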
Anyway. The exact details of implementing LSA are not particularly important for the remainder of this blog post. If you’re interested, you can peep the PR, which is delightfully small.
Tying It All Back Together
Equipped with my LSA tree, I was now ready to go back and solve the original problem of figuring out where to stick let-bindings. It’s easy now. Given the original program graph, find the LSA for each node. The LSA is the place you should insert the let binding.
So given the map of nodes to their LSAs, invert that map and get back a map of nodes to descendents who have this node as an LSA. Now when you go to inline a node, just look up everything in this map and inline it first.
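The bookkeeping for that is tiny; here is a sketch with hypothetical names (not the real code):

import Data.Map (Map)
import qualified Data.Map as M

-- Invert a node -> LSA map into a map from each LSA to the nodes
-- whose let-bindings should be inserted at that point.
bindingsAt :: Ord v => Map v v -> Map v [v]
bindingsAt lsaOf = M.fromListWith (++) [ (l, [n]) | (n, l) <- M.toList lsaOf ]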
It turns out to be a very elegant solution. It's one third of the length of my horrible ad-hoc implementations, and it runs in time linear in the number of nodes in the graph. All in all, very good.
More often than I'm comfortable with, people will ask me how I can have so many good ideas. And what I like about this story is that it's pretty typical of how I actually “have” “good” ideas. I'm reminded of the fact that luck favors the prepared mind. Attentive readers will notice that none of this process was due to brilliance on my part. I happened to know Vikrem who's a genius. Together we pulled at some ancient graph theory strings and remembered a fact that someone else had thought important to teach us. That wasn't actually the right path, but it led us to stackoverflow where someone had linked to a relevant paper. I implemented the paper using a library that someone else had done the heavy lifting on, and simplified the implementation using this knot-tying trick I picked up somewhere along the way.
Also, I’m just really pleased that the solution came from trying to reverse engineer the relevant graph-theory search terms. Maybe that’s the actual takeaway here.
A few days ago I got angry at xargs for the hundredth time, because
for me xargs is one of those "then he had two problems" technologies.
It never does what I want by default and I can never remember how to
use it. This time what I wanted wasn't complicated: I had a bunch of
PDF documents in /tmp and I wanted to use GPG to encrypt some of
them, something like this:
gpg -ac $(ls *.pdf | menupick)
menupick
is a lovely little utility that reads lines from standard input,
presents a menu, prompts on the terminal for a selection from the
items, and then prints the selection to standard output. Anyway, this
didn't work because some of the filenames I wanted had spaces in them,
and the shell sucks. Also because
gpg probably only does one file at a time.
I could have done it this way:
ls *.pdf | menupick | while read f; do gpg -ac "$f"; done
but that's a lot to type. I thought “aha, I'll use xargs.” Then I
had two problems.
ls *.pdf | menupick | xargs gpg -ac
This doesn't work because xargs wants to batch up the inputs to run
as few instances of gpg as possible, and gpg only does one file at
a time. I glanced at the xargs manual looking for the "one at a
time please" option (which should have been the default) but I didn't
see it amongst the forest of other options.
I think now that I needed -n 1 but I didn't find it immediately, and
I was tired of looking it up every time when it was what I wanted
every time. After many years of not remembering how to get xargs to
do what I wanted, I decided the time had come to write a stripped-down
replacement that just did what I wanted and nothing else.
(In hindsight I should perhaps have looked to see if gpg's
--multifile option did what I wanted, but it's okay that I didn't,
this solution is more general and I will use it over and over in
coming years.)
xar is a worse version of xargs, but worse is better (for me)
First I wrote a comment that specified the scope of the project:
# Version of xargs that will be easier to use
#
# 1. Replace each % with the filename, if there are any
# 2. Otherwise put the filename at the end of the line
# 3. Run one command per argument unless there is (some flag)
# 4. On error, continue anyway
# 5. Need -0 flag to allow NUL-termination
There! It will do one thing well, as Brian and Rob commanded us in
the Beginning Times.
I wrote a draft implementation that did not even do all those things,
just items 2 and 4, then I fleshed it out with item 1. I decided that
I would postpone 3 and 5 until I needed them. (5 at least isn't a
YAGNI, because I know I have needed it in the past.)
The result was this:
import subprocess
import sys

def command_has_percent(command):
    for word in command:
        if "%" in word:
            return True
    return False

def substitute_percents(target, replacement):
    return [ s.replace("%", replacement) for s in target ]

def run_command_with_filename(command_template, filename):
    command = command_template.copy()
    if not command_has_percent(command):
        command.append("%")
    res = subprocess.run(substitute_percents(command, filename), check=False)
    return res.returncode == 0

if __name__ == '__main__':
    template = sys.argv[1:]
    ok = True
    for line in sys.stdin:
        if line.endswith("\n"):
            line = line[:-1]
        if not run_command_with_filename(template, line):
            ok = False
    exit(0 if ok else 1)
Short, clean, simple, easy to use. I called it xar, ran
ls *.pdf | menupick | xar gpg -ac
and was content.
Now again, with Claude
The following day I thought this would be the perfect opportunity to
try getting some LLM help with programming. I already had a baseline
version of xar working, and had thought through the problem
specification. Now I could do it over with the LLM and compare the
two results. The program I wanted was small and self-contained. If
the LLM sabotaged me somehow, I would probably notice, and if I
didn't, it wouldn't matter, because I had a working version
already.
So I asked Claude about it. I knew Claude was not the right choice,
because Claude is not really designed for this task, and has no
integration with the coding editor, but Claude is what I am familiar
with so I thought I'd give it a shot.
On the whole it went extremely well.
The complete transcript is here. I
imagine this was pretty much a maximally good experience, that all the
terrible problems of LLM coding arise in larger, more complicated
systems. But this program is small and self-contained, with no
subtleties, so the LLM could be at its best.
Phase 1
First I just told Claude what I wanted and let it write the program
from scratch.
Let's write a program in Python called xar. It will be a
stripped-down version of the standard Unix xargs utility, but it
will do less and be simpler to use.
It will get command-line
arguments that specify a command template. Then it will read lines
from stdin. Each line will be appended to the command template to
produce a command. The program will run the resulting command, one
command for each line of input.
Any questions?
Default Claude is very chatty and I am going to omit most of its
chatty comments, which make me impatient.
The program it wrote was not what I would have written, but it was
good enough. If I had just used it right off the bat, instead of
writing my own, it would have been enough, and it would have taken
somewhere between 2% and 10% as long to produce.
So the one-line summary of this article is: I should stop writing
simple command-line utilities, and just have Claude write them
instead, because Claude's are good enough, and definitely better than
yak-shaving.
Here's Claude's initial version:
#!/usr/bin/env python3

import sys
import subprocess
import shlex

def main():
    if len(sys.argv) < 2:
        print("Usage: xar command [args...]", file=sys.stderr)
        sys.exit(1)

    command_template = sys.argv[1:]

    for line in sys.stdin:
        line = line.rstrip('\n')
        full_command = command_template + [line]
        try:
            result = subprocess.run(full_command, check=True)
        except subprocess.CalledProcessError as e:
            print(f"Command failed with exit code {e.returncode}: {' '.join(full_command)}",
                  file=sys.stderr)
        except FileNotFoundError:
            print(f"Command not found: {full_command[0]}", file=sys.stderr)
            sys.exit(1)

if __name__ == "__main__":
    main()
Claude's version had numerous comments, which I have omitted. I later
told it to stop putting in comments, which it did.
Claude's use of check here was not what I wanted, because that makes
subprocess.run raise an exception when the subcommand fails, and
then the exception has to be immediately caught and handled. My
original control flow had been simpler:
res = subprocess.run(substitute_percents(command, filename), check=False)
return res.returncode == 0
…
if not run_command_with_filename(template, line):
ok = False
Claude's program pulled in shlex without noticing that it was
unused. But Claude did teach me about str.rstrip('\n') which I had
not known about before (or maybe had forgotten), so that was one small
win already.
Argument parsing
The next step was a big win. Python's library for command-line
argument handling is called argparse and it is really nice. If I
were still writing programs in Perl, I would implement a Perl version
of argparse because Perl has 29 available argument parsing libraries
and they are all loathsome. The one problem with argparse is I never
remember off the top of my head how to use it. I think the module is
called argparse but it provides a class called Argparser but I
often get these backward and try to use argparser and Argparse.
Instead of figuring it out every time I usually dig up some previous
Python program and then copy-paste the argument parser from there,
amending it to suit the purpose.
But this time I didn't have to do that. Instead, I just said to
Claude:
This is good, now please add code at the top to handle argument
parsing with the standard Argparse library, even though there are
no options yet.
Claude handed me back pretty much the same program, but with the
argument parser at the top:
def parse_arguments():
    """Parse command-line arguments using argparse."""
    parser = argparse.ArgumentParser(
        description="A simplified version of xargs that appends each line from stdin to a command template.",
        prog="xar"
    )

    # Store all positional arguments as the command to execute
    parser.add_argument(
        "command",
        nargs="+",
        help="The command template to execute for each input line"
    )

    return parser.parse_args()
Let's pause for a moment. Maybe you kids are unimpressed by this.
But if someone had sent this interaction back in time and showed it to
me even as late as five years ago, I would have been stunned. It
would have been completely science-fictional. The code that it
produced is not so impressive, maybe, but that I was able to get it in
response to an unstructured request in plain English is like seeing a
nuclear-powered jetpack in action.
I make some changes
It bothered me that the command-line arguments were being taken
implicitly from sys.argv; that makes the program hard to test. A
better way to do this is to have the list of command-line arguments be an
argument to the main function, and then for the runtime to call
main(sys.argv[1:]) instead of just main(). That way you can test
the program without having to fake up sys.argv. So I changed that,
and while I was making changes I got rid of Claude's comments and the
unnecessary exception handling. I pasted the result back at Claude,
saying:
Good so far. I have adjusted your source code in several ways:
I removed the comments.
I removed the exception handling for the subcommand. If one of the subcommands fails, that no longer terminates the main loop
Command-line arguments are passed to the main() function rather than being accessed directly
Here is the code. Tell me if you see any problems or if you have any questions about it
I had also sabotaged the code with at least one likely error.
Python's subprocess.run returns a “result” structure that includes
the subcommand's output and exit status, and I can never remember the
name of the component that records the exit status. I either look it
up or get it by trial-and-error. This time I didn't bother to either,
I just guessed that it was result.exitcode, which I handed off to
Claude even though I thought it was probably wrong.
Claude's reply said:
Here are a couple of issues I noticed:
There's an error in the error handling code - you're using result.exitcode but in the subprocess.run() function, the attribute is called returncode not exitcode
Awesome, now I don't have to remember that name of that component.
Claude continued:
The parse_arguments() function definition is missing in the pasted code, but it would need to be updated to accept the argv parameter
I had passed sys.argv[1:] to main, and then had main call parse_arguments(argv).
But I hadn't changed parse_arguments to expect an argument list. (I
hadn't included it in my paste at all, in fact.) Claude was exactly
right, and continued with the corrected version:
#!/usr/bin/env python3

import sys
import subprocess
import argparse

def parse_arguments(argv):
    …
    return parser.parse_args(argv)

def main(argv):
    # Parse command-line arguments
    args = parse_arguments(argv)
    …

if __name__ == "__main__":
    ok = main(sys.argv[1:])
    exit(0 if ok else 1)
Handling NUL-separated values
I said
It's great, thanks for catching those errors. Now let's add a -0 command-line option that tells the program to split the input on "\0" characters instead of at line breaks.
Recall that my handwritten xar did not do this yet; I had left that
feature for another day. Claude just added it for me, adding
parser.add_argument(
    "-0", "--null",
    action="store_true",
    help="Input items are terminated by a null character instead of by newline"
)
to the argument parser and changing the main loop to:
# Split input based on the delimiter option
if args.null:
    # Split on null characters
    items = stdin_data.split('\0')
else:
    # Split on newlines
    items = stdin_data.splitlines()

for item in items:
    if not item:
        continue
    …
I was tired of Claude's comments, so I said
Give me the same thing, with no comments
which it did, so I said
From now on leave out the comments. I'm a Python expert and I don't need them. If there is something I don't understand I will ask you.
Claude complied. If I were going to use Claude again in the future
I would include that in the canned instructions that Claude is given
up front. Instead I will probably use a tool better-suited to
programming, and do whatever the analogous thing is.
Template filling
Now I told Claude to add the % feature:
I changed if not item to if item == "". Now let's make the following change:
If the command template includes any % characters, each of these should be replaced with the input item.
Otherwise, if there were no % characters, the input item should be appended to the end of the command as in the current version
Claude did this. It used an explicit loop instead of the list
comprehension that I had used (and preferred), but it did do it correctly:
for arg in command_template:
    if '%' in arg:
        has_placeholder = True
        full_command.append(arg.replace('%', item))
    else:
        full_command.append(arg)

if not has_placeholder:
    full_command.append(item)
Even without the list comprehension, I would have factored
out the common code:
for arg in command_template:
    if '%' in arg:
        has_placeholder = True
    full_command.append(arg.replace('%', item))

if not has_placeholder:
    full_command.append(item)
But I am not going to complain; my code is simpler, but it is doing unnecessary work.
Claude also took my hint to change item == "" even though I didn't
explicitly tell it to change that.
At this point the main loop of the main function was 15 lines long,
because Claude had stuck all the %-processing inline. So I said:
Good, let's extract the command template processing into a subroutine.
It did this right, understanding correctly what code I was referring
to and extracting it into a subroutine called
process_command_template. More science fiction: I can say "command
template processing" and it guesses what I had in mind!
This cut the main loop to 7 lines. That worked so well I tried it
again:
Good, now let's extract the part of main that processes stdin into a subroutine that returns the items array
It pulled the correct code into a function called process_stdin. It
did not make the novice mistake of passing the entire args structure
to this function. In the caller it had process_stdin(args.null) and
inside of process_stdin this parameter was named
use_null_delimiter.
YAGNI?
At this point I was satisfied but I thought I might as well ask if it
should do something else before we concluded:
Can you think of any features I left out that would be useful enough
to warrant inclusion? Remember this program is supposed to be small
and easy to use, in contrast to the existing xargs which is very
complicated.
Claude had four suggestions:
A -p or --parallel option to run commands in parallel
A -n or --max-args option to specify the maximum number of items to pass per command
A simple -v or --verbose flag to show commands as they're executed
A way to replace the command's standard input with the item instead of adding it as an argument
All reasonable suggestions, nothing stupid. (It also supplied code for #3,
which I had not asked for and did not want, but as I said before,
default Claude is very chatty.)
Parallelization
I didn't want any of these, and I knew that #2–4 would be easy to add if I
did want any of them later. But #1 was harder. I've done code like
this in the past, where the program has a worker pool and runs a new
process whenever the worker pool isn't at capacity. It's not even that
hard. In Perl you can play a cute trick and use something like
$workers{spawn()} = 1 while delete $workers{wait()};
where the workers hash maps process IDs to dummy values. A child
exits, wait() awakens and returns the process ID of the completed
child, which is then deleted from the map, and the loop starts another
worker.
I wanted to see how Claude would do it, and the result was an even
bigger win than I had had previously, because Claude wrote this:
with concurrent.futures.ProcessPoolExecutor(max_workers=args.parallel) as executor:
    futures = [executor.submit(execute_command, cmd, args.verbose) for cmd in commands]
    for future in concurrent.futures.as_completed(futures):
        success = future.result()
        if not success:
            ok = False
What's so great about this? What's great is that I hadn't known about
concurrent.futures or ProcessPoolExecutor. And while I might have
suspected that something like them existed, I didn't know what they
were called. But now I do know about them.
If someone had asked me to write the --parallel option, I would have
had to have this conversation with myself:
Python probably has something like this already. But how long will
it take me to track it down? And once I do, will the API
documentation be any good, or will it be spotty and incorrect? And
will there be only one module, or will there be three and I will
have to pick the right one? And having picked module F6, will I
find out an hour later that F6 is old and unmaintained and that
people will tell me “Oh, you should have used A1, it is the new
hotness, everyone knows that.”
When I put all that uncertainty on a balance, and weigh it
against the known costs of doing it myself, which one wins?
The right choice is: I should do the research, find the good module (A1, not
F6), and figure out how to use it.
But one of my biggest weaknesses as a programmer is that I too often
make the wrong choice in this situation. I think “oh, I've done this
before, it will be quicker to just do it myself”, and then I do and it
is.
Let me repeat, it is quicker to do it myself. But that is still
the wrong choice.
Maybe the thing I wrote would be sooner or smaller or faster or more technically
suitable to the project than the canned module would have been. But it
would only have been more technically suitable today. If it
needed a new feature in the future it might have to be changed by
someone who had never seen it before, whereas the canned module could well
already have the needed feature ready to go, already documented, and
perhaps already familiar to whoever had to make the change. My
bespoke version would certainly be unfamiliar to every other
programmer on the project — including perhaps myself six months later — so would be the wrong thing to use.
I'm really good at hacking this stuff up. Which is a problem. It
makes me want to hack stuff up, even when I shouldn't.
Claude tips the balance strongly toward the correct side, which is
that I should use the prepackaged module that someone else wrote and
not hack something up.
And now I know about concurrent.futures.ProcessPoolExecutor! The
world is full of manuals, how can I decide which ones I should read?
One way is to read them all, which used to work back when I was
younger, but now I have more responsibilities and I don't have time to
read the entire Python library reference including all the useless
bits I will never use. But here's Claude pointing out to me that this
is something I need to know about, now, today, and I should read this one. That is
valuable knowledge.
Claude flubs
While testing the program I did
xar --parallel=2 bash -c '...'
which didn't work because the argparse library tried to parse the
-c as a flag to xar, when I wanted it to be a flag to bash.
This was my fault, and there is a standard way around it, which is
that I should have said
xar --parallel=2 -- bash -c '...'
which tells argparse to drop the -- and take everything after it
literally with no interpretation.
I thought I'd see what Claude had to say about this, and it dropped
the ball in a very LLM sort of way.
I asked:
When I did xar --parallel=2 bash -c '...' it misunderstood the
arguments. It thought I was trying to pass -c to xar and failed
with unrecognized arguments. What should I do about that?
Claude replied:
There are two main approaches to handle this:
Use the -- separator to indicate the end of arguments for xar: …
Modify your code to handle this case specifically by looking for a command terminator in the argument parser.
Here's how you could modify your code to implement the first
approach:
And then it followed with the same, unmodified code…
Which was correct, because -- is already handled by argparse, so
no change was actually required to implement the first approach! But
Claude's characterization of it as a modification was misleading,
wrong, and potentially very confusing.
I said:
I don't see what the difference is in that argument parsing code as compared to what I had before.
and Claude got into the LLM tail-chasing mode where it got more and
more confused:
You're right, I apologize for the confusion. There's no difference
in the argument parsing code I provided compared to what you had
before. The issue is with how argparse handles arguments by
default.
To fix the issue with xar --parallel=2 bash -c '...', you need to
actually implement support for the -- separator in your code…
# Find the index of '--' if it exists
try:
    separator_index = argv.index('--')
    xar_args = argv[:separator_index]
    command_args = argv[separator_index+1:]
except ValueError:
    …
Ugh, all wrong in the worst way. The code probably works, but it is
completely unnecessary. Claude's claim that “you need to actually
implement support for the -- separator” is flat wrong. I pointed
this out and Claude got more confused. Oh well, nobody is perfect!
Lessons learned
A long time ago, when syntax-coloring editors were still new, I tried
one and didn't like it, then tried again a few years later and
discovered that I liked it better than I had before, and not for the
reasons that anyone had predicted or that I would have been able to
predict.
(I wrote an article about the surprising reasons to use the syntax coloring.)
This time also. As usual, an actual experiment produced unexpected
results, because the world is complicated and interesting. Some of
the results were unsurprising, but some were not anything I would have
thought of beforehand.
Claude's code is good enough, but it is not a magic oracle
Getting Claude to write most of the code was a lot faster and easier
than writing it myself. This is good! But I was dangerously tempted
to just take Claude's code at face value instead of checking it
carefully. I quickly got used to flying along at great speed, and it
was tough to force myself to slow down and be methodical, looking over
everything as carefully as I would if Claude were a real junior
programmer. It would be easy for me to lapse into bad habits,
especially if I were tired or ill. I will have to be wary.
Fortunately there is already a part of my brain trained to deal with
bright kids who lack experience, and I think perhaps that part of my brain
will be able to deal effectively with Claude.
I did not notice any mistakes on Claude's part — at least this time.
At one point my testing turned up what appeared to be a bug, but it
was not. The testing was still time well-spent.
Claude remembers the manual better than I do
Having Claude remember stuff for me, instead of rummaging the
manual, is great. Having Claude stub out an argument parser,
instead of copying one from somewhere else, was pure win.
Partway along I was writing a test script and I wanted to use that
Bash flag that tells Bash to quit early if any of the subcommands
fails. I can never remember what that flag is called. Normally I
would have hunted for it in one of my own shell scripts, or groveled
over the 378 options in the bash manual. This time I just asked in
plain English “What's the bash option that tells the script to abort
if a command fails?” Claude told me, and we went back to what we were
doing.
Claude can talk about code with me, at least small pieces
Claude easily does simple refactors. At least at this scale, it got
them right. I was not expecting this to work as well as it did.
When I told Claude to stop commenting every line, it did. I
wonder, if I had told it to use if not expr only for Boolean
expressions, would it have complied? Perhaps, at least for a
while.
When Claude wrote code I wasn't sure about, I asked it what it was
doing and at least once it explained correctly. Claude had written
parser.add_argument(
    "-p", "--parallel",
    nargs="?",
    const=5,
    type=int,
    default=1,
    help="Run up to N commands in parallel (default: 5)"
)
Wait, I said, I know what the const=5 is doing, that's so that if
you have --parallel with no number it defaults to 5. But what is
the default=1 doing here? I just asked Claude and it told me:
that's used if there is no --parallel flag at all.
This was much easier than it would have been for me to pick over
the argparse manual to figure out how to do this in the first
place.
More thoughts
On a different project, Claude might have done much worse. It might
have given wrong explanations, or written wrong code. I think that's
okay though. When I work with human programmers, they give wrong
explanations and write wrong code all the time. I'm used to it.
I don't know how well it will work for larger systems. Possibly pretty
well if I can keep the project sufficiently modular that it doesn't get
confused about cross-module interactions. But if the criticism is
“that LLM stuff doesn't work unless you keep the code extremely
modular” that's not much of a criticism. We all need more
encouragement to keep the code modular.
Programmers often write closely-coupled modules knowing that it is bad
and it will cause maintenance headaches down the line, knowing that the
problems will most likely be someone else's to deal with. But what if
writing closely-coupled modules had an immediate cost today, the cost
being that the LLM would be less helpful and more likely to mess up
today's code? Maybe programmers would be more careful about letting
that happen!
Will my programming skill atrophy?
Folks at Recurse Center were discussing this question.
I don't think it will. It will only atrophy if I let it. And I have a
pretty good track record of not letting it. The essence of
engineering is to pay attention to what I am doing and why, to try to
produce a solid product that satisfies complex constraints, to try
to spot problems and correct them. I am not going to stop doing
this. Perhaps the problems will be different ones than they were
before. That is all right.
Starting decades ago I have repeatedly told people
You cannot just paste code with no understanding of
what is going on and expect it to work.
That was true then without Claude and it is true now with Claude. Why
would I change my mind about this? How could Claude change it?
Will I lose anything from having Claude write that complex
parser.add_argument call for me? Perhaps if I had figured it out
on my own, on future occasions I would have remembered the const=5 and default=1
specifications and how they interacted. Perhaps.
But I suspect that I have figured it out on my own in the past, more
than once, and it didn't stick. I am happy with how it went this time.
After I got Claude's explanation, I checked its claimed behavior pretty
carefully with a stub program, as if I had been reviewing a
colleague's code that I wasn't sure about.
The biggest win Claude gave me was that I didn't know about this
ProcessPoolExecutor thing before, and now I do. That is going to
make me a better programmer. Now I know about something useful that
I didn't know before, and I have a pointer to documentation I know I
should study.
My skill at writing ad-hoc process pool managers might atrophy, but if
it does, that is good. I have already written too many ad-hoc
process pool managers. It was a bad habit, I should have stopped long
ago, and this will help me stop.
Conclusion
This works.
Perfectly? No, it's technology, technology never works perfectly.
Have you ever used a computer?
Will it introduce new problems? Probably, it's new technology, and
new technology always introduces new problems.
But is it better than what we had before? Definitely.
I still see some programmers turning up their noses at this technology
as if they were sure it was a silly fad that would burn itself out
once people came to their senses and saw what a terrible idea it was.
I think that is not going to happen, and those nose-turning-up people,
like the people who pointed out all the drawbacks and unknown-unknowns
of automobiles as compared to horse-drawn wagons, are going to look
increasingly foolish.
Suppose a centrifuge has slots, arranged in a circle around the
center, and we have test tubes we wish to place into the slots.
If the tubes are not arranged symmetrically around the center, the
centrifuge will explode.
(By "arranged symmetrically around the center, I mean that if the
center is at , then the sum of the positions of the tubes
must also be at .)
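One standard way to make this precise (my formalization; the post's own notation is not rendered here) is to put slot \(j\) of an \(n\)-slot centrifuge at the unit vector \(e^{2\pi i j/n}\) in the complex plane. A set \(S\) of occupied slots is then balanced exactly when

$$\sum_{j \in S} e^{2\pi i j/n} = 0,$$

and a regular \(k\)-gon of slots \(\{a, a + n/k, a + 2n/k, \ldots\}\), for any \(k > 1\) dividing \(n\), is balanced because its sum is \(e^{2\pi i a/n} \sum_{m=0}^{k-1} e^{2\pi i m/k} = 0\).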
Let's consider the example of a centrifuge with 12 slots. Clearly we can arrange 2,
3, 4, or 6 tubes symmetrically:
Equally clearly
we can't arrange only 1. Also it's easy to see we can do k tubes if
and only if we can also do 12−k tubes, which rules out 11.
From now on I will write ⟨n, k⟩ to mean the problem of balancing
k tubes in a centrifuge with n slots. So ⟨12, 3⟩ and ⟨12, 4⟩ are possible, and ⟨12, 1⟩ and ⟨12, 11⟩ are
not. And ⟨12, k⟩ is solvable if and only if ⟨12, 12−k⟩ is.
It's perhaps a little surprising that ⟨12, 7⟩ is possible.
If you just ask this to someone out of nowhere they might
have a happy inspiration: “Oh, I'll just combine the solutions for
3 and 4, easy.” But that doesn't work because two groups
of the form {i, i+4, i+8} and {j, j+3, j+6, j+9} always overlap.
For example, if your group of 4 is the
slots {0, 3, 6, 9} then you can't also have your group of 3 be
{0, 4, 8}, because slot 0 already has a tube in it.
The
other balanced groups of 3 are blocked in the same way. You
cannot solve the puzzle with 4 + 3; you have to do 7 = 3 + 2 + 2 as
below left.
The best way to approach this is to do ⟨12, 5⟩ = 3 + 2, as below right.
This is easy,
since the triangle only blocks three of the six symmetric pairs.
Then you replace the holes with tubes and the tubes with holes to
turn ⟨12, 5⟩ into ⟨12, 7⟩.
Given n and k, how can we decide whether the centrifuge can be
safely packed?
Clearly you can solve ⟨n, k⟩ when n is a multiple of k, but the example
of ⟨12, 7⟩ (or ⟨12, 5⟩) shows this isn't a necessary condition.
A generalization of this is that ⟨n, jk⟩ is always solvable
if k divides n, since you can easily
balance k tubes at positions 0, n/k, 2n/k, …, then do another k tubes one position over, and
so on. For example, to do ⟨12, 8⟩ you just put the first four tubes
in slots {0, 3, 6, 9} and the next four one position over, in slots
{1, 4, 7, 10}.
An interesting counterexample is that the strategy for ⟨12, 7⟩,
where we did 7 = 3 + 2 + 2, cannot be extended to ⟨10, 7⟩. One
would want to do 7 = 5 + 2, but there is no way to arrange the tubes
so that the group of 2 doesn't conflict with the group of 5,
which blocks one slot from every pair.
But we can see that this must be true without even considering the
geometry. ⟨10, 7⟩ is the reverse of ⟨10, 3⟩, which is
impossible: the only nontrivial divisors of 10 are 2 and
5, so 3 must be a sum of 2s and 5s, and it is not.
You can't fit tubes when , but again the reason is
a bit tricky. When I looked at directly, I did a case analysis
to make sure that the -group and the -group would always
conflict. But again there was an easier way to see this: and
clearly won't work, as is not a sum of s and s.
I wonder if there's an example where both and are not obvious?
For n = 12, every k works except 11 and the always-impossible 1.
What's the answer in general? I don't know.
Addenda
20250502
Now I am amusing myself thinking about the perversity of a centrifuge
with a prime number of slots. If you use it at all, you must
fill every slot. I hope you like explosions!
While I did not explode any centrifuges in university chemistry, I did
once explode an expensive Liebig condenser.
Omar Antolín points out an important consideration I missed:
it may be necessary
to subtract polygons. Consider ⟨30, 9⟩. This is obviously
possible since 9 = 3 + 3 + 3. But there is a more interesting
solution. We can add the pentagon {0, 6, 12, 18, 24} to the
digons {5, 20} and {10, 25} to obtain the solution
$$\{0, 5, 6, 10, 12, 18, 20, 24, 25\}.$$
Then from this we can subtract the triangle {0, 10, 20} to obtain $$\{5, 6, 12, 18, 24, 25\},$$ a solution to ⟨30, 6⟩
which is not a sum of regular polygons:
Thanks to Dave Long for pointing out a small but significant error,
which I have corrected.
The GHC developers are very pleased to announce the availability
of the final release for GHC 9.10.2. Binary distributions, source
distributions, and documentation are available at downloads.haskell.org and
via GHCup.
GHC 9.10.2 is a bug-fix release fixing over 50 issues of a variety of
severities and scopes, including:
Significantly improved performance when dynamically loading Haskell symbols (#23415).
Fixing a bug where the simplifier sometimes destroyed join points during float out, which could impact performance (#24768).
Reduced memory fragmentation in the non-moving GC’s segment allocator, improving resident set size by up to 26% for some applications (#24150).
Added new flags to control speculative evaluation (-fspec-eval and -fspec-eval-dictfun) to work around performance regressions (#25606).
Fixed several platform-specific issues, including segfaults with FFI on PowerPC (#23034) and improved code
generation for AArch64, with multiway branches now using jump tables (#19912).
And many more!
A full accounting of these fixes can be found in the release notes. As
always, GHC’s release status, including planned future releases, can be found on
the GHC Wiki status page.
We would like to thank Well-Typed, Tweag I/O, Juspay, QBayLogic, Channable,
Serokell, SimSpace, the Haskell Foundation, and other anonymous contributors
whose on-going financial and in-kind support has facilitated GHC maintenance
and release management over the years. Finally, this release would not have
been possible without the hundreds of open-source contributors whose work
comprises this release.
As always, do give this release a try and open a ticket if you see
anything amiss.
At work I’ve been researching
how to improve the ergonomics of prompt engineering and I wanted to share
and open source some of what I’ve done. This initial post is about how
I’ve been experimenting with using bidirectional type inference
to streamline prompt chaining.
“Prompt chaining” is a prompt engineering technique that splits a
larger task/prompt into multiple smaller tasks/prompts which are chained
together using code. For example, instead of prompting a model to
generate a poem in one prompt like this:
Write a poem based off this idea:
${idea}
… by following this process:
First think through the form, stanza count, lines per stanza, and
rhyme scheme
Then choose a poetic style (tone, voice, and literary devices) based
on the poem’s form
Then write a complete poem based on that plan
… you can split it into smaller prompts, like this:
structure prompt:
Plan the structure of a new poem based on this idea
${idea}
Describe its form, stanza count, lines per stanza, and rhyme
scheme
style prompt:
Given this poem structure:
Form: ${structure.form}
Stanzas: ${structure.stanzaCount}
Lines per stanza: ${structure.linesPerStanza}
Rhyme scheme: ${structure.rhymeScheme}
Choose a poetic style: tone, voice, and literary devices to
emphasize
poem prompt:
Write a complete poem based on this idea:
${idea}
Structure:
Form: ${structure.form}
Stanzas: ${structure.stanzaCount}
Lines per stanza: ${structure.linesPerStanza}
Rhyme scheme: ${structure.rhymeScheme}
Style:
Tone: ${style.tone}
Voice: ${style.voice}
Literary Devices: ${style.literaryDevices}
Why might you want to do this?
to improve the quality of the results
Models perform better when working on more constrained subproblems.
Splitting a larger prompt into smaller prompts helps the model stay
focused at each step.
to introspect intermediate results
This comes in handy when you want to log, validate, or correct
intermediate results.
to perform actions in between prompts
You might want to take the output of one prompt, use that to call
some tool, then use the output of that tool to decide what the next
prompt should be, which you can’t do with a single prompt.
In other words, prompt chaining unlocks greater accuracy,
control, and flexibility for prompt engineering.
The problem
The main issue with prompt chaining is that it is a huge pain in the
ass; if you start to do anything a little bit complicated you need to start
using structured outputs (i.e. JSON), which adds a whole lot of
boilerplate to the process:
you have to define the schema for each intermediate step of the
process
You typically do this by defining your data model in your host
programming language (e.g. a Pydantic model in Python) or directly
defining your JSON schema
You have to instruct the model to produce JSON and explain the
shape of the expected output
(Depending on the framework) you have to decode the JSON into
your data model
For small prompt chaining pipelines this isn’t too hard, but it
starts to get annoying to define all these schemas when you scale this
up to more sophisticated prompt chaining pipelines.
So as a thought experiment I wanted to create a research prototype
that handled all of that for you so that you didn’t need to specify any
schemas at all. In other words I wanted to build a programming language
that harnessed bidirectional type inference to perform
schema inference for prompts with structured JSON outputs.
Example
I’ll cut to the case by showing the above prompt chain written as a
program in this language:
let concatSep = https://raw.githubusercontent.com/Gabriella439/grace/refs/heads/main/prelude/text/concatSep.ffg

let lines = concatSep "\n"

let generatePoem idea =
      let structure = prompt
            { model: "gpt-4o"
            , text: lines
                [ "Plan the structure of a new poem based on this idea:"
                , ""
                , idea
                , ""
                , "Describe its form, stanza count, lines per stanza, and rhyme scheme."
                ]
            }

      let renderedStructure = lines
            [ "- Form: " + structure.form
            , "- Stanzas: " + Real/show (structure.stanzaCount : Integer)
            , "- Lines per stanza: " + Real/show (structure.linesPerStanza : Integer)
            , "- Rhyme scheme: " + structure.rhymeScheme
            ]

      let style = prompt
            { model: "gpt-4o"
            , text: lines
                [ "Given this poem structure:"
                , renderedStructure
                , ""
                , "Choose a poetic style: tone, voice, and literary devices to emphasize."
                ]
            }

      let renderedStyle = lines
            [ "- Tone: " + style.tone
            , "- Voice: " + style.voice
            , "- Literary Devices: " + concatSep ", " style.literaryDevices
            ]

      let poem : Text = prompt
            { model: "gpt-4o"
            , text: lines
                [ "Write a complete poem based on this idea:"
                , ""
                , idea
                , ""
                , "Structure:"
                , renderedStructure
                , ""
                , "Style:"
                , renderedStyle
                ]
            }

      in  { structure: structure, style: style, poem: poem }

in  generatePoem "The stillness of early morning"
If you run the above example, you might get an output like this:
{"structure":{"form":"Free verse with a sprinkle of rhyme","linesPerStanza":4,"rhymeScheme":"ABCB","stanzaCount":5},"style":{"literaryDevices":["imagery","alliteration","personification","enjambment"],"tone":"nostalgic","voice":"reflective"},"poem":"### The Stillness of Early Morning\n\n**I** \nIn the gentle rise of dawn's embrace, \nA whisper of winds through sleepy skies, \nThe world lies suspended in an echo \nOf yesterday's dreams, reluctantly saying goodbye.\n\n**II** \nSoft orange light graces fields of dew, \nCradling tenacious tendrils of the past. \nMorning breathes upon the waiting world \nWith promises that this serenity will last.\n\n**III** \nWaves of silence surge across the meadow, \nNature's symphony gently cradles rest. \nA chorus of leaves sings its soft refrain, \nComforted by morning's tender caress.\n\n**IV** \nThe silence dips to caress weary trees, \nWhose branches waltz to whispers soft and slow. \nEach tendril holds close the morning's intention, \nBefore the day's demands, the chaos, start to show.\n\n**V** \nEarly rays dance upon a tranquil sea, \nA canvas painted through a dreamy haze. \nTime sits by, savoring the gentle hush, \nAs the world awakens, stirs, and finally plays."}
… and if you’re curious what the formatted poem looks like:
The Stillness of Early
Morning
I
In the gentle rise of dawn’s embrace,
A whisper of winds through sleepy skies,
The world lies suspended in an echo
Of yesterday’s dreams, reluctantly saying goodbye.
II
Soft orange light graces fields of dew,
Cradling tenacious tendrils of the past.
Morning breathes upon the waiting world
With promises that this serenity will last.
III
Waves of silence surge across the meadow,
Nature’s symphony gently cradles rest.
A chorus of leaves sings its soft refrain,
Comforted by morning’s tender caress.
IV
The silence dips to caress weary trees,
Whose branches waltz to whispers soft and slow.
Each tendril holds close the morning’s intention,
Before the day’s demands, the chaos, start to show.
V
Early rays dance upon a tranquil sea,
A canvas painted through a dreamy haze.
Time sits by, savoring the gentle hush,
As the world awakens, stirs, and finally plays.
Type inference
The sample Grace program hardly specifies any types (mainly the final
expected type for the poem: Text). The reason
this works is because Grace supports bidirectional type
inference, which means that Grace can work backwards from how
intermediate results are used to infer their schemas.
I’ll illustrate this with a contrived Grace example:
let numbers = prompt{ text: "Give me two numbers" }

in  { x: numbers.x
    , y: numbers.y
    , sum: numbers.x + numbers.y : Integer
    }
… which might produce an output like this:
$ grace interpret ./numbers.ffg
{"x":7,"y":14,"sum":21}
When Grace analyzes this program the type checker works backwards
from this expression:
numbers.x + numbers.y :Integer
… and reasons about it like this:
the addition produces an Integer, therefore
numbers.x and numbers.y must also be
Integers
therefore numbers is a record with two fields,
x and y, both of which are
Integers
… or using Grace syntax, the inferred type of numbers
is: { x: Integer, y: Integer }
therefore the output of the prompt command must have
the same type
… and then Grace generates a JSON schema for the prompt which looks
like this:
Of course, you can specify types if you want (and they’re
more lightweight than schemas in traditional prompt chaining
frameworks). For example:
$ grace repl
>>> prompt{ text: "Give me a first and last name" } : { first: Text, last: Text }
{"first":"Emily", "last": "Johnson" }
>>> prompt{ text: "Give me a list of names" } : List Text
["Alice","Bob","Charlie","Diana","Ethan","Fiona","George","Hannah","Isaac","Jack"]
However in our original example we don’t need to specify intermediate
types because when the type-checker sees this code:
let structure = prompt
      { model: "gpt-4o"
      , text: lines
          [ "Plan the structure of a new poem based on this idea:"
          , ""
          , idea
          , ""
          , "Describe its form, stanza count, lines per stanza, and rhyme scheme."
          ]
      }

let renderedStructure = lines
      [ "- Form: " + structure.form
      , "- Stanzas: " + Real/show (structure.stanzaCount : Integer)
      , "- Lines per stanza: " + Real/show (structure.linesPerStanza : Integer)
      , "- Rhyme scheme: " + structure.rhymeScheme
      ]
… the compiler can reason backwards from how the
structure value is used to infer that the JSON schema for
the prompt needs to be:
This doesn’t actually run any tools (I haven’t added any
callable tools to my work-in-progress branch yet), but just renders the
tool use as a string for now:
$ grace interpret ./tools.ffg
["curl https://api.example.com/data","ls -l -a"]
However, the idea is that you can model a tool as a sum type with one
constructor per callable tool, and in the above example the type checker
infers that the sum type representing one tool call is:
… but since we List/map the call function
over the output of the prompt the type checker infers that
the prompt needs to generate a List of tool
calls:
prompt{ text:"Call some tools" } :List<HttpRequest: …, ShellCommand: … >
… and then Grace does some magic under the hood to convert that type
to the equivalent JSON schema.
What’s particularly neat about this example is that the prompt is so
incredibly bare (“Call some tools”) because all the information the
model needs is present in the schema.
Schema-driven prompting
We can explore this idea of using the schema to drive the prompt
instead of prose using an example like this:
prompt{ text:"Generate some characters for a story", model:"gpt-4o" }:List { "The character's name":Text , "The most memorable thing about the character":Text , "The character's personal arc":Text }
[{"The character's name":"Aveline Thatcher","The character's personal arc":"Aveline starts as a skeptical journalist who doubts the stories of mythical creatures. Over time, she becomes a firm believer, risking her career to uncover the truth and protect these creatures.","The most memorable thing about the character":"The intricate tattoo of a phoenix on her forearm that seems to glow when she discovers hidden truths."},{"The character's name":"Kelan Frost","The character's personal arc":"A former rogue alchemist who turns hero after he inadvertently creates a dangerous substance. Driven by guilt, Kelan seeks redemption by finding an antidote and saving his village.","The most memorable thing about the character":"His iridescent blue eyes that seem to see into one's soul, a side effect of his alchemical experiments."},{"The character's name":"Luciana Blair","The character's personal arc":"Luciana is a reclusive artist who initially fears the world outside her home. After a mysterious vision rejuvenates her, she sets out on a journey of self-discovery, ultimately finding both her voice and courage.","The most memorable thing about the character":"Her ability to paint scenes before they happen, which she attributes to the visions she sees in her dreams."},{"The character's name":"Ezra Hartman","The character's personal arc":"Once a charismatic but self-centered lawyer, Ezra is confronted with a moral crisis that forces him to reevaluate his values. He chooses a path of integrity, becoming an advocate for justice.","The most memorable thing about the character":"His perfectly tailored suits that slowly become more casual, symbolizing his transformation and shifting priorities."},{"The character's name":"Seraphine Mora","The character's personal arc":"Seraphine is a young music prodigy who loses her hearing after an accident. Battling despair, she learns to embrace a new way of 'hearing' music through vibrations and her other senses.","The most memorable thing about the character":"The ethereal way she 'dances' with the music, using her entire body to express each note's emotion."}]
Grace is a superset of JSON and since JSON supports arbitrary field
names so does Grace! Field names in Grace support arbitrary
capitalization, punctuation, and whitespace as long as you quote them,
and we can use the field names to “smuggle” the description of each
field into the schema.
Conclusion
Hopefully this gives you some idea of why I’ve begun to think of
prompt chaining as a programming languages problem. Type inference is
just the beginning and I think it is possible to use a domain-specific
programming language not just to simplify the code but to ultimately
unlock greater reasoning power.
I’m going to continue to use Grace as a research vehicle for prompt
chaining but my LLM-enabled branch
of Grace (like Grace itself) is not really intended to be used in
production and I created it mainly as a proof-of-concept for where I’d
like prompt chaining frameworks to go. If I do end up eventually
productionizing this research I will create a proper fork with its own
name and the whole works.
Google have stopped supporting the Chart API so all of the mathematics notation below is missing. There is a PDF version of this article at GitHub.
There are many introductions to the Expectation-Maximisation algorithm.
Unfortunately every one I could find uses arbitrary-seeming tricks that appear to be plucked out of a hat by magic.
They can all be justified in retrospect, but I find it more useful to learn from reusable techniques that you can apply to further problems.
Examples of tricks I've seen used are:
Using Jensen's inequality. It's easy to find inequalities that apply in any situation. But there are often many ways to apply them. Why apply it to this way of writing this expression and not that one which is equal?
Substituting in the middle of an expression. Again, you can use just about anywhere. Why choose this at this time? Similarly I found derivations that insert a into an expression.
Majorisation-Minimisation. This is a great technique, but involves choosing a function that majorises another. There are so many ways to do this, it's hard to imagine any general purpose method that tells you how to narrow down the choice.
My goal is to fill in the details of one key step in the derivation of the EM algorithm in a way that makes it inevitable rather than arbitrary.
There's nothing original here, I'm merely expanding on a stackexchange answer.
Generalities about EM
The EM algorithm seeks to construct a maximum likelihood estimator (MLE) with a twist: there are some variables in the system that we can't observe.
First assume no hidden variables.
We assume there is a vector of parameters \(\theta\) that defines some model.
We make some observations \(x\).
We have a probability density \(p(x \mid \theta)\) that depends on \(\theta\).
The likelihood of \(\theta\) given the observations \(x\) is \(L(\theta) = p(x \mid \theta)\).
The maximum likelihood estimator for \(\theta\) is the choice of \(\theta\) that maximises \(L(\theta)\) for the \(x\) we have observed.
Now suppose there are also some variables \(z\) that we didn't get to observe.
We assume a density \(p(x, z \mid \theta)\).
We now have
$$p(x \mid \theta) = \sum_z p(x, z \mid \theta)$$
where we sum over all possible values of \(z\).
The MLE approach says we now need to maximise
$$\sum_z p(x, z \mid \theta).$$
One of the things that is a challenge here is that the components of \(\theta\) might be mixed up among the terms in the sum.
If, instead, each term only referred to its own unique block of \(\theta\), then the maximisation would be easier as we could maximise each term independently of the others.
Here's how we might move in that direction.
Consider instead the log-likelihood
$$\log \sum_z p(x, z \mid \theta).$$
Now imagine that by magic we could commute the logarithm with the sum.
We'd need to maximise
$$\sum_z \log p(x, z \mid \theta).$$
One reason this would be to our advantage is that \(p(x, z \mid \theta)\) often takes the form \(\exp(f(x, z, \theta))\) where \(f\) is a simple function to optimise.
In addition, \(f\) may break up as a sum of terms, each with its own block of \(\theta\)'s.
Moving the logarithm inside the sum would give us something we could easily maximise term by term.
What's more, the \(p(x, z \mid \theta)\) for each \(z\) is often a standard probability distribution whose likelihood we already know how to maximise.
But, of course, we can't just move that logarithm in.
Maximisation by proxy
Sometimes a function is too hard to optimise directly.
But if we have a guess for an optimum, we can replace our function with a proxy function that approximates it in the neighbourhood of our guess and optimise that instead.
That will give us a new guess and we can continue from there.
This is the basis of gradient descent.
Suppose \(f\) is a differentiable function in a neighbourhood of \(x_0\).
Then around \(x_0\) we have
$$f(x) \approx f(x_0) + \nabla f(x_0) \cdot (x - x_0).$$
We can try optimising the right-hand side with respect to \(x\) within a neighbourhood of \(x_0\).
If we pick a small circular neighbourhood then the optimal value will be in the direction of steepest descent.
(Note that picking a circular neighbourhood is itself a somewhat arbitrary step,
but that's another story.)
For gradient descent we're choosing this linear proxy because it matches both the value and derivatives of \(f\) at \(x_0\).
We could go further and optimise a proxy that shares second derivatives too, and that leads to methods based on Newton-Raphson iteration.
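To spell out the standard formula (it is not written out in the original text): the second-order proxy around \(x_0\) is

$$f(x_0) + \nabla f(x_0) \cdot (x - x_0) + \tfrac{1}{2}(x - x_0)^\top H_f(x_0)\,(x - x_0),$$

where \(H_f(x_0)\) is the Hessian, and optimising it exactly is one step of Newton–Raphson.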
We want our logarithm of a sum to be a sum of logarithms.
But instead we'll settle for a proxy function that is a sum of logarithms.
We'll make the derivatives of the proxy match those of the original function
precisely so we're not making an arbitrary choice.
Write
$$\log \sum_z p(x, z \mid \theta) \approx \sum_z a_z \log p(x, z \mid \theta).$$
The \(a_z\) are constants we'll determine.
We want to match the derivatives on either side of the \(\approx\)
at \(\theta = \theta_0\):
$$\frac{\partial}{\partial \theta} \log \sum_z p(x, z \mid \theta)\,\Bigg|_{\theta_0} = \frac{\sum_z \partial p(x, z \mid \theta_0) / \partial \theta}{\sum_z p(x, z \mid \theta_0)}.$$
On the other hand we have
$$\frac{\partial}{\partial \theta} \sum_z a_z \log p(x, z \mid \theta)\,\Bigg|_{\theta_0} = \sum_z a_z \frac{\partial p(x, z \mid \theta_0) / \partial \theta}{p(x, z \mid \theta_0)}.$$
To achieve equality we want to make these expressions match.
We choose
$$a_z = \frac{p(x, z \mid \theta_0)}{\sum_w p(x, w \mid \theta_0)} = p(z \mid x, \theta_0).$$
Our desired proxy function is:
$$Q(\theta) = \sum_z p(z \mid x, \theta_0) \log p(x, z \mid \theta).$$
So the procedure is to take an estimated \(\theta_0\) and obtain a new estimate \(\theta_1\)
by optimising this proxy function with respect to \(\theta\).
This is the standard EM algorithm.
It turns out that this proxy has some other useful properties.
For example, because of the concavity of the logarithm,
the proxy is always smaller than the original likelihood.
This means that when we optimise it we never optimise “too far”
and that progress optimising the proxy is always progress optimising the
original likelihood.
But I don't need to say anything about this as it's all part of the standard literature.
Afterword
As a side effect we have a general purpose optimisation algorithm that has nothing to do with statistics. If your goal is to compute
$$\operatorname{argmax}_\theta \log \sum_i f_i(\theta)$$
you can iterate, at each step computing
$$\theta_{n+1} = \operatorname{argmax}_\theta \sum_i \frac{f_i(\theta_n)}{\sum_j f_j(\theta_n)} \log f_i(\theta)$$
where \(\theta_n\) is the previous iteration.
If the \(f_i\) take a convenient form then this may turn out to be much easier.
Note
This was originally written as a PDF using LaTeX. It'll be available here for a while. Some fidelity was lost when converting it to HTML.
Google have stopped supporting the Chart API so all of the mathematics notation below is missing. There is a PDF version of this article at GitHub.
Preface
Functional programming encourages us to program without mutable state.
Instead we compose functions that can be viewed as state transformers.
It's a change of perspective that can have a big impact on how we reason about our code.
But it's also a change of perspective that can be useful in mathematics and I'd like to give an example: a really beautiful technique that allows you to sample from the infinite limit of a probability distribution without needing an infinite number of operations.
(Unless you're infinitely unlucky!)
Markov Chains
A Markov chain is a sequence of random states where each state is drawn from a random distribution that possibly depends on the previous state, but not on any earlier state.
So it is a sequence of random variables X₀, X₁, X₂, … such that P(Xₙ₊₁ = x | X₀, …, Xₙ) = P(Xₙ₊₁ = x | Xₙ) for all n.
A basic example might be a model of the weather in which each day is either sunny or rainy but where it's more likely to be rainy (or sunny) if the previous day was rainy (or sunny).
(And to be technically correct: having information about two days or earlier doesn't help us if we know yesterday's weather.)
Like imperative code, this description is stateful.
The state at step n depends on the state at step n−1.
Probability is often easier to reason about when we work with independent identically drawn random variables and our Xᵢ aren't of this type.
But we can eliminate the state from our description using the same method used by functional programmers.
Let's choose a Markov chain to play with.
I'll pick one with 3 states called A, B and C, with transition probabilities given by:
from A, stay at A with probability 1/2 or move to B with probability 1/2;
from B, move to A, stay at B, or move to C, each with probability 1/3;
from C, move to B with probability 1/2 or stay at C with probability 1/2.
Here's a diagram illustrating our states:
Implementation
First some imports:
> {-# LANGUAGE LambdaCase #-}
> {-# LANGUAGE TypeApplications #-}
> import Control.Monad.State (MonadState, evalState, state)
> import System.Random (RandomGen, mkStdGen, random)
>
> -- replicateA isn't a standard library function; it's assumed here to be
> -- replicateM under another name, emphasising that only Applicative is needed.
> replicateA :: Applicative f => Int -> f a -> f [a]
> replicateA n x = sequenceA (replicate n x)
> data ABC = A | B | C deriving (Eq, Show, Ord, Enum, Bounded)
We are now in a position to simulate our Markov chain.
First we need some random numbers drawn uniformly from [0, 1]:
> uniform :: (RandomGen gen, MonadState gen m) => m Double
> uniform = state random
And now the code to take a single step in the Markov chain:
> step :: (RandomGen gen, MonadState gen m) => ABC -> m ABC
> step A = do
> a <- uniform
> if a < 0.5
> then return A
> else return B
> step B = do
> a <- uniform
> if a < 1/3.0
> then return A
> else if a < 2/3.0
> then return B
> else return C
> step C = do
> a <- uniform
> if a < 0.5
> then return B
> else return C
Notice how the step function generates a new state at random in a way that depends on the previous state.
The m ABC in the type signature makes it clear that we are generating random states at each step.
We can simulate the effect of taking n steps with a function like this:
> steps :: (RandomGen gen, MonadState gen m) => Int -> ABC -> m ABC
> steps 0 i = return i
> steps n i = do
> i <- steps (n-1) i
> step i
We can run the chain for a few steps, starting with A, with a line like so:
*Main> evalState (steps 3 A) gen
B
The starting state of our random number generator is given by gen.
Consider the distribution of states after taking n steps.
For Markov chains of this type, we know that as n goes to infinity the distribution of the nth state approaches a limiting "stationary" distribution.
There are frequently times when we want to sample from this final distribution.
For a Markov chain as simple as this example, you can solve exactly to find the limiting distribution.
But for real world problems this can be intractable.
Instead, a popular solution is to pick a large n and hope it's large enough.
As n gets larger the distribution gets closer to the limiting distribution.
And that's the problem I want to solve here - sampling from the limit.
It turns out that by thinking about random functions instead of random states we can actually sample from the limiting distribution exactly.
Some random functions
Here is a new version of our random step function:
> step' :: (RandomGen gen, MonadState gen m) => m (ABC -> ABC)
> step' = do
> a <- uniform
> return $ \case
> A -> if a < 0.5 then A else B
> B -> if a < 1/3.0
> then A
> else if a < 2/3.0 then B else C
> C -> if a < 0.5 then B else C
In many ways it's similar to the previous one.
But there's one very big difference: the type signature m (ABC -> ABC) tells us that it's returning a random function, not a random state.
We can simulate the result of taking 10 steps, say, by drawing 10 random functions, composing them, and applying the result to our initial state:
> steps' :: (RandomGen gen, MonadState gen m) => Int -> m (ABC -> ABC)
> steps' n = do
> fs <- replicateA n step'
> return $ foldr (flip (.)) id fs
Notice the use of flip.
We want to compose the functions as fₙ ∘ ⋯ ∘ f₂ ∘ f₁, each time composing on the left with the newest function.
This means that for a fixed seed gen, each time you increase n by 1 you get the next step in a single simulation:
(BTW I used replicateA instead of replicateM to indicate that these are independent random draws.
It may be well known that you can use Applicative instead of Monad to indicate independence but I haven't seen it written down.)
*Main> [f A | n <- [0..10], let f = evalState (steps' n) gen]
[A,A,A,B,C,B,A,B,A,B,C]
When I first implemented this I accidentally forgot the flip.
So maybe you're wondering what effect removing the flip has?
The effect is about as close to a miracle as I've seen in mathematics.
It allows us to sample from the limiting distribution in a finite number of steps!
Here's the code:
> steps_from_past :: (RandomGen gen, MonadState gen m) => Int -> m (ABC -> ABC)
> steps_from_past n = do
> fs <- replicateA n step'
> return $ foldr (.) id fs
We end up building f₁ ∘ f₂ ∘ ⋯ ∘ fₙ.
This is still a composition of independent identically distributed functions and so it's still drawing from exactly the same distribution as steps'.
Nonetheless, there is a difference: for a particular choice of seed, steps_from_past n no longer gives us a sequence of states from a Markov chain.
Running it with argument n draws a random composition of n functions.
But if you increase n by 1 you don't add a new step at the end.
Instead you effectively restart the Markov chain with a new first step generated by a new random seed.
Try it and see:
*Main> [f A | n <- [0..10], let f = evalState (steps_from_past n) gen]
[A, A, A, A, A, A, A, A, A, A]
Maybe that's surprising.
It seems to get stuck in one state.
In fact, we can try applying the resulting function to all three states.
*Main> [fmap f [A, B, C] | n <- [0..10], let f = evalState (steps_from_past n) gen]
[[A,B,C],[A,A,B],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A]]
In other words, for large enough n we get the constant function.
Think of it this way:
If f isn't injective then it's possible that two states get collapsed to the same state.
If you keep picking random f's it's inevitable that you will eventually collapse down to the point where all arguments get mapped to the same state.
Once this happens, we'll get the same result no matter how large we take n.
If we can detect this then we've found the limit of f₁ ∘ f₂ ∘ ⋯ ∘ fₙ as n goes to infinity.
But because we know composing forwards and composing backwards lead to draws from the same distribution, the limiting backward composition must actually be a draw from the same distribution as the limiting forward composition.
That flip can't change what probability distribution we're drawing from - just the dependence on the seed.
So the value the constant function takes is actually a draw from the limiting stationary distribution.
We can code this up:
> all_equal :: (Eq a) => [a] -> Bool
> all_equal [] = True
> all_equal [_] = True
> all_equal (a : as) = all (== a) as
> test_constant :: (Bounded a, Enum a, Eq a) => (a -> a) -> Bool
> test_constant f =
> all_equal $ map f $ enumFromTo minBound maxBound
This technique is called coupling from the past.
It's "coupling" because we've arranged that different starting points coalesce.
And it's "from the past" because we're essentially asking answering the question of what the outcome of a simulation would be if we started infinitely far in the past.
> couple_from_past :: (RandomGen gen, MonadState gen m, Enum a, Bounded a, Eq a) =>
> m (a -> a) -> (a -> a) -> m (a -> a)
> couple_from_past step f = do
> if test_constant f
> then return f
> else do
> f' <- step
> couple_from_past step (f . f')
We can now sample from the limiting distribution a million times, say:
*Main> let samples = map ($ A) $ evalState (replicateA 1000000 (couple_from_past step' id)) gen
We can now count how often A appears:
*Main> fromIntegral (length $ filter (== A) samples)/1000000
0.285748
That's a pretty good approximation to 2/7 ≈ 0.2857, the exact answer that can be found by finding the eigenvector of the transition matrix corresponding to an eigenvalue of 1.
> gen = mkStdGen 669
Notes
The technique of coupling from the past first appeared in a paper by Propp and Wilson.
The paper Iterated Random Functions by Persi Diaconis gave me a lot of insight into it.
Note that the code above is absolutely not how you'd implement this for real.
I wrote the code that way so that I could switch algorithm with the simple removal of a flip.
In fact, with some clever tricks you can make this method work with state spaces so large that you couldn't possibly hope to enumerate all starting states to detect if convergence has occurred.
Or even with uncountably large state spaces.
But I'll let you read the Propp-Wilson paper to find out how.
In those articles I showed how you could build up the Clifford algebras like so:
type Cliff1 = Complex R
type Cliff1' = Split R
type Cliff2 = Quaternion R
type Cliff2' = Matrix R
type Cliff3 = Quaternion Cliff1'
type Cliff3' = Matrix Cliff1
type Cliff4 = Quaternion Cliff2'
type Cliff4' = Matrix Cliff2
type Cliff5 = Quaternion Cliff3'
...
I used CliffN as the Clifford algebra for a negative definite inner product and
CliffN' for the positive definite case.
It's not a completely uniform sequence in the sense that CliffN is built from CliffN' for dimension two lower and you use a mix of Matrix and Quaternion.
The core principle making this work is that for type constructors implemented like Matrix, Quaternion etc. we have the property that F (G Float) is effectively F Float ⊗ G Float,
e.g. Matrix (Quaternion Float) is effectively the same thing as Matrix Float ⊗ Quaternion Float.
But John Baez pointed out to me that you can build up the CliffN algebras much more simply, enabling us to use these definitions:
> type Cliff1 = Complex Float
> type Cliff2 = Complex Cliff1
> type Cliff3 = Complex Cliff2
> type Cliff4 = Complex Cliff3
> type Cliff5 = Complex Cliff4
...
Or even better:
> type family Cliff (n :: Nat) :: * where
> Cliff 0 = Float
> Cliff n = Complex (Cliff (n - 1))
But there's one little catch.
We have to work, not with the tensor product, but the super tensor product.
We define Complex the same way as before:
> data Complex a = C a a deriving (Eq, Show)
Previously we used a definition of multiplication like this:
instance Num a => Num (Complex a) where
C a b * C c d = C (a * c - b * d) (a * d + b * c)
We can think of C a b in Complex R as representing the element a + bi.
The definition of multiplication in a tensor product of algebras is given by (a ⊗ b)(c ⊗ d) = (ac) ⊗ (bd).
So we have (1 ⊗ a + i ⊗ b)(1 ⊗ c + i ⊗ d) = 1 ⊗ (ac − bd) + i ⊗ (ad + bc).
This means that line of code we wrote above defining * for Complex isn't simply a definition of multiplication of complex numbers, it says how to multiply in an algebra tensored with the complex numbers.
Let's go Super!
A superalgebra is an algebra graded by ℤ₂, where ℤ₂ is the ring of integers modulo 2.
What that means is that we have some algebra A that can be broken down as a direct sum A = A₀ ⊕ A₁ (the subscripts live in ℤ₂) with the property that multiplication respects the grading, i.e. if x is in Aᵢ and y is in Aⱼ then xy is in Aᵢ₊ⱼ.
The elements of A₀ are called "even" (or bosonic) and those in A₁ "odd" (or fermionic). Often even elements commute with everything and odd elements anticommute with each other, but this isn't always the case. (The superalgebra is said to be supercommutative when this happens. This is a common pattern: a thing X becomes a super-X if it has odd and even parts and swapping two odd things introduces a sign flip.)
The super tensor product is much like the tensor product but it respects the grading.
This means that if x is in Aᵢ and y is in Bⱼ then x ⊗ y is in (A ⊗ B)ᵢ₊ⱼ.
From now on I'm using ⊗ to mean the super tensor product.
Multiplication in the super tensor product of two superalgebras A and B is now defined by the following modified rule:
if b is in Bⱼ and c is in Aₖ then (a ⊗ b)(c ⊗ d) = (-1)^{jk} (ac) ⊗ (bd).
Note that the sign flip arises when we shuffle an odd c left past an odd b.
The neat fact that John pointed out to me is that
Cliffₙ₊₁ ≅ ℂ ⊗ Cliffₙ, where ⊗ is the super tensor product and ℂ is graded so that 1 is even and i is odd.
We have to modify our definition of * to take into account that sign flip.
I initially wrote a whole lot of code to define a superalgebra as a pair of algebras with four multiplication operations and it got a bit messy.
But I noticed that the only specifically superalgebraic operation I ever performed on an element of a superalgebra was negating the odd part of an element.
So I could define SuperAlgebra like so:
class SuperAlgebra a where
conjugation :: a -> a
where conjugation is the negation of the odd part.
(I'm not sure if this operation corresponds to what is usually called conjugation in this branch of mathematics.)
But there's a little efficiency optimization I want to write.
If I used the above definition, then later I'd often find myself computing a whole lot of negates in a row.
This means applying negate to many elements of large algebraic objects even
though any pair of them cancel each other's effect.
So I add a little flag to my conjugation function that says whether we want an extra negate; that way we can
accumulate flips of a flag rather than negations of lots of elements.
> class SuperAlgebra a where
> conjugation :: Bool -> a -> a
Here's our first instance:
> instance SuperAlgebra Float where
> conjugation False x = x
> conjugation True x = negate x
This is saying that the conjugation is the identity on Float but if we
want to perform an extra flip we can set the flag to True.
Maybe I should call it conjugationWithOptionalExtraNegation.
And now comes the first bit of non-trivial superalgebra:
> instance (Num a, SuperAlgebra a) => SuperAlgebra (Complex a) where
> conjugation e (C a b) = C (conjugation e a) (conjugation (not e) b)
We consider 1 to be even and i to be odd. When we apply the conjugation to C a b we can apply it directly to a, because a is tensored with the even element 1.
But b is tensored with the odd element i, which flips the "parity" of that component (because the super tensor product respects the grading), so we need to flip the flag when we use the conjugation on b.
And that should explain why conjugation is defined the way it is.
Now we can use the modified multiplication rule for ⊗ defined above:
> instance (Num a, SuperAlgebra a) => Num (Complex a) where
> fromInteger n = C (fromInteger n) 0
> C a b + C a' b' = C (a + a') (b + b')
> C a b * C c d = C (a * c - conjugation False b * d)
> (conjugation False a * d + b * c)
> negate (C a b) = C (negate a) (negate b)
> abs = undefined
> signum = undefined
For example, conjugation False is applied to the first b on the RHS because that b * d term comes from expanding (i ⊗ b)(i ⊗ d), where the (odd) i from i ⊗ d is shuffled left past b, picking up the sign. It doesn't get applied to the second b (in b * c) because that term comes from (i ⊗ b)(1 ⊗ c), where only the even 1 crosses b, so no sign arises.
That's it!
Tests
I'll test it with some examples from Cliff3:
> class HasBasis a where
> e :: Integer -> a
> instance HasBasis Float where
> e = undefined
> instance (Num a, HasBasis a) => HasBasis (Complex a) where
> e 0 = C 0 1
> e n = C (e (n - 1)) 0
> make a b c d e f g h =
> C (C (C a b) (C c d))
> (C (C e f) (C g h))
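For instance, each generator of Cliff3 should square to −1 and distinct generators should anticommute. Here are a couple of quick checks along those lines (a minimal sketch; the names check_square and check_anticommute are illustrative):

> -- e 1 squares to -1 in Cliff3
> check_square :: Bool
> check_square = (e 1 * e 1 :: Cliff3) == negate 1

> -- distinct generators anticommute
> check_anticommute :: Bool
> check_anticommute = (e 1 * e 2 :: Cliff3) == negate (e 2 * e 1)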
The implementation of multiplication looks remarkably like it's the Cayley-Dickson construction.
It can't be (because iterating it three times gives you a non-associative algebra but the Clifford algebras are associative).
Nonetheless, I think comparison with Cayley-Dickson may be useful.
Efficiency
As mentioned above, before I realised I just needed the conjugation operation I wrote the above code with an explicit split of a superalgebra into two pieces intertwined by four multiplications.
I think the previous approach may have a big advantage - it may be possible to use variations on the well known "speed-up" of complex multiplication that uses three real multiplications instead of four.
This should lead to a fast implementation of Clifford algebras.
Also be warned: you can kill GHC if you turn on optimization and try to multiply elements of high-dimensional Clifford algebras.
I think it tries to inline absolutely everything and you end up with a block of code that grows exponentially with the dimension.
Note also that this code translates directly into many languages.
From my perspective, one of the biggest open problems in implementing
programming languages is how to add a type system to the language
without significantly complicating the implementation.
For example, in my tutorial Fall-from-Grace
implementation the type checker logic accounts for over half of the
code. In the following lines of code report I’ve highlighted the modules
responsible for type-checking with a ‡:
That’s 2684 lines of code (≈51%) just for type-checking (and believe
me: I tried very hard to simplify the type-checking code).
This is the reason why programming language implementers will be
pretty keen to just not implement a type-checker for their language, and
that’s how we end up with a proliferation of untyped programming
languages (e.g. Godot or Nix), or ones that end up with a type system
bolted on long after the fact (e.g. TypeScript or Python). You can see
why someone would be pretty tempted to skip implementing a type system
for their language (especially given that it’s an optional language
feature) if it’s going to balloon the size of their codebase.
So I’m extremely keen on implementing a “lean” type checker that has
a high power-to-weight ratio. I also believe that a compact type checker
is an important foundational step for functional programming to “go
viral” and displace imperative programming. This post outlines one
approach to this problem that I’ve been experimenting with1.
Unification
The thing that bloats the size of most type-checking implementations
is the need to track unification variables. These variables are
placeholders for storing as-yet-unknown information about something’s
type.
For example, when a functional programming language infers the type
of something like this Grace expression:
(λx → x) true
… the way it typically works is that it will infer the type of the
function (λx → x) which will be:
λx → x : α → α
… where α is a unification variable (an unsolved type).
So you can read the above type annotation as saying "the type of
λx → x is a function from some unknown input type
(α) to the same output type (α)".
Then the type checker will infer the type of the function’s input
argument (true) which will be:
true : Bool
… and finally the type checker will combine those two pieces of
information and reason about the final type like this:
the input to the function (true) is a
Bool
therefore the function’s input type (α) must also be
Bool
therefore the function’s output type (α) must also be
Bool
therefore the entire expression’s type is Bool
… which gives the following conclusion of type inference:
(λx → x) true : Bool
However, managing unification variables like α is a lot
trickier than it sounds. There are multiple unification
algorithms/frameworks in the wild but the problem with all of them is
that you have to essentially implement a bespoke logic programming
language (with all of the complexity that entails). Like, geez, I’m
already implementing a programming language and I don’t want to have to
implement a logic programming language on top of that just to power my
type-checker.
So there are a couple of ways I’ve been brainstorming how to address
this problem and one idea I had was: what if we could get rid of
unification variables altogether?
Deleting unification
Alright, so this is the part of the post that requires some
familiarity/experience with implementing a type-checker. If you’re
somebody new to programming language theory then you can still keep
reading but this is where I have to assume some prior knowledge
otherwise this post will get way too long.
The basic idea is that you start from the “Complete and Easy”
bidirectional type checking algorithm which is a type checking
algorithm that does use unification variables2 but
is simpler than most type checking algorithms. The type checking rules
look like this (you can just gloss over them):
Now, delete all the rules involving unification variables. Yes, all
of them. That means that all of the type-checking judgments from Figures
9 and 10 are gone and also quite a few rules from Figure 11 disappear,
too.
Surprisingly, you can still type check a lot of code with what’s
left, but you lose two important type inference features if you do
this:
you can no longer infer the types of lambda arguments
you can no longer automatically instantiate polymorphic
code
… and I’ll dig into those two issues in more detail.
Inferring lambda argument
types
You lose the ability to infer the type of a function like this one
when you drop support for unification variables:
λx → x == False
Normally, a type checker that supports unification can infer that the
above function has type Bool → Bool, but (in general) a
type checker can no longer infer that when you drop unification
variables from the implementation.
This loss is not too bad (in fact, it’s a pretty common
trade-off proposed in the bidirectional type checking literature)
because you can make up for it in a few ways (all of which are easy and
efficient to implement in a type checker):
You can allow the input type to be inferred if the lambda is
given an explicit type annotation, like this:
λx → x == False : Bool → Bool
More generally, you can allow the input type to be inferred if the
lambda is checked against an expected type (and a type annotation is one
case, but not the only case, where a lambda is checked against an
expected type).
We’re going to lean on this pretty heavily because it’s pretty
reasonable to ask users to provide type annotations for function
definitions and also because there are many situations where we can
infer the expected type of a lambda expression from its immediate
context.
You can allow the user to explicitly supply the type of the
argument
… like this:
λ(x : Bool) → x == False
This is how Dhall works,
although it’s not as ergonomic.
You can allow the input type to be inferred if the lambda is
applied to an argument
This is not that interesting, but I’m mentioning it for completeness.
The reason it’s not interesting is because you won’t often see
expressions of the form (λx → e) y in the wild, because
they can more idiomatically be rewritten as
let x = y in e.
Instantiating polymorphic
code
The bigger issue with dropping support for unification variables is:
all user-defined polymorphic functions now require explicit type
abstraction and explicit type application, which is a
major regression in the type system’s user experience.
For example, in a language with unification variables you can write
the polymorphic identity function as λx → x, whereas without them the user has to write something like λ(a : Type) → λ(x : a) → x and explicitly apply it to a type at each use site.
Most programmers do NOT want to program in a
language where they have to explicitly manipulate type variables in this
way. In particular, they really hate explicit type application. For
example, nobody wants to write:
map { x : Bool, … large record … } Bool (λr → r.x) rs
So we need to figure out some way to work around this limitation.
The trick
However, there is a solution that I believe gives a high
power-to-weight ratio, which I will refer to as “keyword” type
checking:
add a bunch of built-in functions
Specifically, add enough built-in functions to cover most use cases
where users would need a polymorphic function.
add special type-checking rules for those built-in functions when
they’re fully saturated with all of their arguments
These special-cased type-checking rules would not require unification
variables.
still require explicit type abstraction when these built-in
functions are not fully saturated
Alternatively, you can require that built-in polymorphic functions
are fully saturated with their arguments and make it a parsing error if
they’re not.
still require explicit type abstraction and explicit type
application for all user-defined (i.e. non-builtin) polymorphic
functions
optionally, turn these built-in functions into keywords or
language constructs
I’ll give a concrete example: the map function for
lists. In many functional programming languages this map
function is not a built-in function; rather it’s defined within the host
language as a function of the following type:
map : ∀(a b : Type) → (a → b) → List a → List b
What I’m proposing is that the map function would now
become a built-in function within the language and you would now apply a
special type-checking rule when the map function is fully
saturated:
Γ ⊢ xs ⇒ List a Γ ⊢ f ⇐ a → b
───────────────────────────────
Γ ⊢ map f xs ⇐ List b
In other words, we’re essentially treating the map
built-in function like a “keyword” in our language (when it’s fully
saturated). Just like a keyword, it’s a built-in language feature that
has special type-checking rules. Hell, you could even make it an actual
keyword or language construct (e.g. a list comprehension) instead of a
function call.
I would even argue that you should make each of these special-cased
builtin-functions a keyword or a language construct instead of a
function call (which is why I call this “keyword type checking” in the
first place). When viewed through this lens the restrictions that these
polymorphic built-in functions (A) are saturated with their arguments
and (B) have a special type checking judgment are no different than the
restrictions for ordinary keywords or language constructs (which also
must be saturated with their arguments and also require special type
checking judgments).
To make an analogy, in many functional programming languages the
if/then/else construct has this
same “keyword” status. You typically don’t implement it as a user-space
function of this type:
ifThenElse : ∀(a : Type) → Bool → a → a → a
Rather, you define if as a language construct and you
also add a special type-checking rule for if:
Γ ⊢ b ⇐ Bool Γ ⊢ x ⇒ a Γ ⊢ y ⇐ a
────────────────────────────────────
Γ ⊢ if b then x else y ⇒ a
… and what I’m proposing is essentially greatly exploding the number
of “keywords” in the implementation of the language by turning a whole
bunch of commonly-used polymorphic functions into built-in functions (or
keywords, or language constructs) that are given special type-checking
treatment.
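To make that concrete, here is a minimal, self-contained sketch in Haskell (invented names and a deliberately tiny type language; this is not Grace's implementation) of how saturated built-ins can get their own typing rules without any unification variables:

-- A tiny sketch of "keyword type checking": saturated applications of
-- built-ins like `if`, `map` and `replicate` get their own rules.
data Type = TBool | TNatural | TList Type | TFun Type Type
  deriving (Eq, Show)

data Expr
  = Var String
  | BoolLit Bool
  | NaturalLit Integer
  | If Expr Expr Expr        -- if/then/else as a language construct
  | Map Expr Expr            -- a saturated `map f xs` "keyword"
  | Replicate Expr Expr      -- a saturated `replicate n x` "keyword"
  deriving Show

type Context = [(String, Type)]

infer :: Context -> Expr -> Maybe Type
infer ctx (Var x)         = lookup x ctx
infer _   (BoolLit _)     = Just TBool
infer _   (NaturalLit _)  = Just TNatural
infer ctx (If b x y)      = do
  check ctx b TBool                 -- Γ ⊢ b ⇐ Bool
  a <- infer ctx x                  -- Γ ⊢ x ⇒ a
  check ctx y a                     -- Γ ⊢ y ⇐ a
  Just a
infer ctx (Map f xs)      = do
  TList a <- infer ctx xs           -- Γ ⊢ xs ⇒ List a
  TFun a' b <- infer ctx f          -- simplification: infer f instead of checking it against a → b
  if a == a' then Just (TList b) else Nothing
infer ctx (Replicate n x) = do
  check ctx n TNatural              -- Γ ⊢ n ⇐ Natural
  a <- infer ctx x                  -- Γ ⊢ x ⇒ a
  Just (TList a)

-- Checking just falls back on inference in this sketch.
check :: Context -> Expr -> Type -> Maybe ()
check ctx expr expected = do
  actual <- infer ctx expr
  if actual == expected then Just () else Nothing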
For example, suppose the user were to create a polymorphic function
like this one:
let twice = λ(a : Type) → λ(x : a) → [ x, x ]
in  twice (List Bool) (twice Bool true)
That’s not very ergonomic to define and use, but we also can’t
reasonably expect our programming language to provide a
twice built-in function. However, our language could
provide a generally useful replicate builtin function (like
Haskell’s
replicate function):
replicate : ∀(a : Type) → Natural → a → List a
… with the following type-checking judgment:
Γ ⊢ n ⇐ Natural Γ ⊢ x ⇒ a
───────────────────────────
Γ ⊢ replicate n x ⇒ List a
… and then you would tell the user to use replicate
directly instead of defining their own twice function:
replicate 2 (replicate 2 true)
… and if the user were to ask you “How do I define a
twice synonym for replicate 2” you would just
tell them “Don’t do that. Use replicate 2 directly.”
Conclusion
This approach has the major upside that it’s much easier to implement
a large number of keywords than it is to implement a unification
algorithm, but there are other benefits to doing this, too!
It discourages complexity and fragmentation in user-space
code
Built-in polymorphic functions have an ergonomic advantage over
user-defined polymorphic functions because under this framework type
inference works better for built-in functions. This creates an ergonomic
incentive to stick to the “standard library” of built-in polymorphic
functions, which in turn promotes an opinionated coding style across all
code written in that language.
You might notice that this approach is somewhat similar in spirit to
how Go handles polymorphism which is to say: it doesn’t handle
user-defined polymorphic code well. For example, Go provides a few
built-in language features that support polymorphism (e.g. the
map data structure and for loops) but if users ask for any
sort of user-defined polymorphism then the maintainers tell them they’re
wrong for wanting that. The main difference here is that (unlike Go) we
do actually support user-defined polymorphism; it’s not forbidden, but
it is less ergonomic than sticking to the built-in utilities that
support polymorphism.
It improves error messages
When you special-case the type-checking logic you can also
special-case the error messages, too! With general-purpose unification
the error message can often be a bit divorced from the user’s intent,
but with “keyword type checking” the error message is not only more
local to the problem but it can also suggest highly-specific tips or
fixes appropriate for that built-in function (or keyword or language
construct).
It can in some cases more closely match the expectations of
imperative programmers
What I mean is: most programmers coming from an imperative and typed
background are used to languages where (most of the time) polymorphism
is “supported” via built-in language constructs and keywords and
user-defined polymorphism might be supported but considered “fancy”.
Leaning on polymorphism via keywords and language constructs would
actually make them more comfortable using polymorphism instead of trying
to teach them how to produce and consume user-defined polymorphic
functions.
For example, in a lot of imperative languages the idiomatic solution
for how to do anything with a list is “use a for loop” where you can
think of a for loop as a built-in keyword that supports polymorphic
code. The functional programming equivalent of “just use a for loop”
would be something like “just use a list comprehension” (where a list
comprehension is a “keyword” that supports polymorphic code that we can
give special type checking treatment).
That said, this approach is still more brittle than unification and
will require more type annotations in general. The goal here isn’t to
completely recover the full power of unification but rather to get
something that’s not too bad but significantly easier to
implement.
I think this “keyword type checking” can potentially occupy a “low
tech” point in the type checking design space for functional programming
languages that need to have efficient and compact implementations
(e.g. for ease of embedding). Also, this can potentially provide a
stop-gap solution for novice language implementers that want
some sort of a type system but they’re not willing to commit to
implementing a unification-based type system.
There’s also variation on this idea which Verity Scheel has been
exploring, which is to provide userland support for defining new
functions with special type-checking rules and there’s a post from her
outlining how to do that:
The other approach is to create
essentially an “ABNF for type checkers” that would let you write
type-checking judgments in a standard format that could generate the
corresponding type-checking code in multiple languages. That’s still a
work-in-progress, though.↩︎
I believe some people might take
issue with calling these unification variables because they consider
bidirectional type checking as a distinct framework from unification.
Moreover, in the original bidirectional type checking paper they’re
called “unsolved” variables rather than unification variables. However,
I feel that for the purpose of this post it’s still morally correct to
refer to these unsolved variables as unification variables since their
usage and complexity tradeoffs are essentially identical to unification
variables in traditional unification algorithms.↩︎
Since I was a child, I’ve been playing the French
Horn. I still play, and I
take it quite seriously. I’m lucky enough to play with some good
ensembles, and I perform many concerts each year.
When learning difficult music, I often practise with a
metronome. A metronome is a
device that clicks or beeps regularly. You can set how often it
clicks; for example you might set it to click 80 times a minute. The
tempo (or speed) of a piece of music is often specified by the
composer telling you how many beats per minute they want. This is
guidance and not sacrosanct: you don’t normally have to play at
exactly this tempo, and frequently music requires some implicit
variation of tempo for it to be successful. But it certainly is an
important piece of information from the composer, telling you at what
basic speed the piece should be going.
The problem with metronomes is that they can’t change their speed by
themselves. For complex music, the speed could be changing a lot, or,
maybe the number of beats per bar changes. This is annoying because it
means for some pieces of music you have to keep stopping playing,
readjust the metronome, and then continue on. There are also gradual
changes in tempo in music: a part of a piece might gently get faster
or slower. No metronome can cope with this: essentially, metronomes
know nothing about the piece of music you’re playing.
It does nearly all the musical things you would want it to. As this is
mainly a technical blog though, here I’ll focus on those aspects.
It could have been an app; I’ve built apps before. But the faff of
getting onto the play store, or the app store is just not worth
it. The development tools are heavyweight and annoying. Sending out
new versions requires approval processes, and you have to convince
people to install something before they can use it. So I wanted this
to be browser based. Also, modern web browsers are pretty amazing –
great features and well designed APIs. Yep, all the old APIs are
terrible and awful to work with, but everything that’s at all recent
is pretty great.
Drawing music in a browser is quite a challenge though. The way I’m
doing it is I’m building an SVG, client-side. This was the first thing
I started on: trying to figure out how to draw music in a browser, how
to be able to detect clicks, and make it all interactive. The client
side code is all generated from
TypeScript using the plain tsc to do
the translation to JavaScript. I can’t stand complex tool-chains, and
modern browsers are absolutely fine with loading modules (and you can
do some really nice things with import
maps
as we’ll see). I’m not even minimising the JavaScript: I’ve written
the server myself; the modules are sent over the wire gzipped and I
have correct cache-control settings using immutable and
“cache-busting”,
so minimising the source just makes debugging life harder for no real
gain.
A score is essentially a list of blocks. I’m using a
CRDT
(the fugue list
CRDT) to
allow local-first editing (and even offline editing). Dirty blocks get
sent over a websocket and stored on the server, using LMDB which is
all very normal for me.
The server has a neat part of its design: when you compile the server,
all the static assets are embedded into the binary, thus making it a
single self-contained executable. Now those assets (HTML, CSS, images,
JavaScript etc) are just normally named files, but they can also be Go
templates. When the server starts up, it works through these static
assets, building HTTP routes for them. The HTTP routes contain in
their paths the hashcode of the file – this is necessary for the
cache
busting. If
the asset is a template, the server knows how to run the template, and
critically, I provide a url function in the template engine so that
templates can get the URL of some other asset including its
hashcode. So this means that if some HTML file needs to link to some
CSS file, the HTML file as built into the server can be a template. At
start up, this template gets run, it can invoke this url function,
and it can find out the final URL of the CSS file. And of course this
URL now influences the hashcode of the HTML file itself. This also
plays very nicely with the
integrity
attribute you can put on all sorts of things these days.
So it all works out rather nicely: if you consider the tree of file A
importing files B and C, and file B imports file D, then if I make
some change to file D, then it means its hashcode changes, and so its
URL changes. This propagates up to file B, and from there to file A
(but no change to file C). So it’s safe to serve all these static
assets with immutable cache-control headers and rely on this lovely
hashcode chaining. All of this work is done once, each time the server
starts-up. And it’ll all explode and stop working if there’s ever a
cycle in the graph of file imports.
Now in practice, it seems that references between HTML, CSS, images,
or JavaScript don’t seem to create cycles – at least I’ve not had a
problem so far. But between JavaScript modules, it’s much more common,
as you’d likely expect. But here, import
maps
come to the rescue: in my TypeScript/JavaScript, I just import modules
normally. I have a function in the template engine which knows how to
generate an import-map of all my JavaScript modules, which gets
injected into the top HTML page. This import-map provides both the
rewriting of paths (to add hashcodes onto the paths), and also
provides the integrity
section. This
solves the problem of circular imports because it means the JavaScript
itself never needs to contain the hashcode of any module it
imports. Yet, if I change some JavaScript module, then its hashcode
changes, which means the import-map changes, and so again, the browser
is forced into correctly fetching the updated resource.
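For a sense of what gets generated, an import map is a small inline JSON object along these lines (the paths and hashes here are made up for illustration):

<script type="importmap">
{
  "imports": {
    "./score.js": "./score.a1b2c3d4.js",
    "./crdt.js": "./crdt.9f8e7d6c.js"
  },
  "integrity": {
    "./score.a1b2c3d4.js": "sha384-...",
    "./crdt.9f8e7d6c.js": "sha384-..."
  }
}
</script>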
A couple of weekends ago, I was up visiting my parents and I wanted to
demonstrate this thing to them (they’re also musicians). They have
computers running Windows. I tried loading up a demo score, and it
just didn’t work. Their browsers were up to date. Nothing of note in
the server logs, so I opened up the browser console and found errors
from deserialisation of data coming over the websocket: it was
claiming the data was corrupted. I’d never seen this in my own
development and use.
Checking a few other things, and I spotted that the source HTML for
the page had had some additional <script> elements added to it:
something was injecting some JavaScript. And then the penny dropped:
this is MITM
behaviour by some shitty anti-virus software – in this case,
AVG. Some quick web searching, and yep, those products are also known
for dicking around with websocket traffic: if you’re sending binary
messages and you’re compressing the stream, it’s apparently quite
common that the anti-virus software intercepts the traffic, and then
screws up the framing leading your own code to face corrupted
data. Completely ridiculous.
In my case, disabling compression on the websocket was enough to
prevent the corruption, and I then established that even for big
scores, the initial load would be maybe 25kB of data over the
websocket, so not compressing it isn’t terrible.
What made me laugh though was this: the browser console was telling me
both about the corrupted data, and also about the fact the browser was
refusing to run some script, due to it violating
CSP
settings. It took me a moment to realise that the script that wasn’t
being run, was the script that the anti-virus software was injecting
into my pages! Now, import-maps can’t be external files, they have to
be inline in the HTML. But there’s no way I’m serving HTML pages with
a CSP header with script-src 'unsafe-inline'. Instead, I’m issuing a
script-src CSP header with 'self' and also the hashcode of the
import-map itself.
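Concretely, that header looks something like the following, where the hash is the base64-encoded SHA-256 of the inline import-map script (the value shown is a placeholder):

Content-Security-Policy: script-src 'self' 'sha256-<base64 hash of the inline import map>'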
What this says to the browser is that it can trust the import-map
(because hashing it will give a hashcode that matches the CSP header),
the import-map itself has its integrity for every JavaScript module it
needs to load, and the CSP headers also tell the browser that it’s OK
to load JavaScript modules from the same domain (this is the
'self'). But, this does not give permission for the browser to run
arbitrary bits of crap JavaScript that some awful anti-virus thing has
injected! So, by making use of CSP and import-maps, you can defeat
attackers from tampering with your website and code!
New languages are coming out all the time, some experimental, some
industrial, others are purpose built for a specific domain. No single
language has the people-power or scope to try every cool new feature, so
a critical step in designing a new language is to observe how
experimental features have borne themselves out in practice.
As the saying goes, good [language designers] copy, great [language
designers] steal.
If you've heard anything about the Unison Language it's not a
surprise to you that it innovates in many areas. Unison very much tries
to reinvent Human-Compiler interactions for the 21st century, and in
that pursuit has spawned a fully integrated ecosystem between the
compiler, codebase-manager, language server, version control and package
manager.
While some of these features are still too new to have proven their
worth (we have our fingers crossed), there are aspects that I think
new languages should certainly consider as part of their designs.
A Fully
Interactive and Incremental Compiler
With the modern era of language servers and programming assistants,
developers greatly benefit from instant feedback on their work. With
traditional batch compilers it's all too tempting to go for a coffee, or
a walk, or a YouTube binge every time you kick off a big build. The
context-switching induced by switching tasks while compiling wastes
developer time by paging things in and out of their working memory, not
to mention: it just feels bad. After the build finishes, the
developer is left with a giant wall of text, sentenced to dig through a
large list of compiler errors trying to find some root-cause error in
the file they're working on.
Unison has a fully interactive compilation experience. The
language-server is typechecking your scratch-file on every keystroke
providing error feedback right in your editor, and offering helpful
information via hover-hints which use your codebase and typechecking
info to help you orient yourself. It can even partially typecheck the
file to suggest which types or operators you may want to fill into a
given slot.
Once you're happy with a chunk of code, you can check it in to the
codebase and it won't be compiled again unless you want to change it, or
an update is automatically propagated into it from a downstream
change.
While most languages won't adopt Unison's scratch-file and codebase
model; having an interactive compiler with good support for caching of
already-compiled-assets is a huge boon to productivity in any
language.
On the topic of the language server, Unison's language server is
built directly into the compiler. This ensures we avoid the awkward
disagreements between the LSP and compiler that sometimes happen in
other languages. It can also help to avoid duplicate work, many
languages are running the compiler independently and in their LSP at the
same time without sharing any of the work between them, causing
redundant work and a waste of precious resources.
Codebase API
It's the compiler's job to understand your code intimately. It knows
exactly how every definition is linked together, even if you don't! In
many languages it can be frustrating to know that this information
exists deep within the compiler, but not having any access to it
yourself!
Unison stores all your code as structured data within your codebase
and exposes the ability for you to ask it useful questions about your
code, exposing that precious understanding to you as a developer.
Unison allows searching by type, finding the dependencies of a
definition, or inverting that relationship to finding all definitions
which depend on a definition.
Via the UCM CLI you can use utilities like text.find to
search only string constants, or find to search only
definition names.
Some codebase data is provided via an API which is exposed from the
interactive UCM compiler, allowing developers to write tooling to
customize their workflow. For example, check out this VS
Code plugin someone wrote to view codebase definitions in the
sidebar. In other languages you'd typically need to write a scrappy
Regex or re-compile the code in a subprocess in order to achieve
something similar.
It doesn't have to be an API, it could be a parquet file or a SQLite
database or any number of things, the important part is that a language
exposes its one-true-source of information about the codebase in some
structured format for third-party tools to build upon.
Smart docs
It doesn't matter how great your language's package ecosystem is if
nobody can figure out how to use it! Documentation is critical for
helping end users understand and use functionality in your language, but
it has a fatal flaw: documentation isn't compiled and falls out of date
with the code.
In Unison, docs are a data-type within the language itself. This
means that docs can be generated dynamically by running Unison
code! We've leveraged this ability to enable embedding typechecked
runnable code examples into your docs. These examples are compiled
alongside the rest of your program, so they're guaranteed to be
kept up to date, and the output of your example code is re-run
and updated whenever the source definitions change.
You can also write code which generates documentation based
on your real application code. For example, you could write code which
crawls your web-server's implementation and collects all the routes and
parameters the server defines and displays them nicely as
documentation.
Unison goes one step further here by providing special support for
the documentation format on Unison Share, ensuring any definitions
mentioned in docs and code examples are hyper-linked to make for a
seamless package-browsing experience.
As an example of how far this can go, check out this
awesome project by community contributor Alvaro which generates
mermaid graphs in the docs representing the behaviour of simulations.
The graphs are generated from the same underlying library code so they
won't go out of date.
Get stealing
This subset of topics doesn't touch on Unison's ability system,
continuation capturing, or code serialization so I'll probably need at
least a part 2!
Hopefully you learned something 🤞! Did you know I'm currently writing a book? It's all about Lenses and Optics! It takes you all the way from beginner to optics-wizard and it's currently in early access! Consider supporting it, and more posts like this one, by pledging on my Patreon page! It takes quite a bit of work to put
these things together, so if I managed to teach you something or even just entertain you for a minute or two,
maybe send a few bucks my way for a coffee? Cheers!
In this episode Mike Sperber and Niki Vazou talk with Sandy Maguire, lead compiler engineer at Manifold Valley. They talk about the benefits of using Haskell, of course, about all the books Sandy has written, about effects and the problem with monads, and about combinator libraries and programming with laws.
The GHC developers are very pleased to announce the availability
of the release candidate for GHC 9.10.2. Binary distributions, source
distributions, and documentation are available at downloads.haskell.org and
via GHCup.
GHC 9.10.2 is a bug-fix release fixing over 50 issues of a variety of
severities and scopes. A full accounting of these fixes can be found in the
release notes. As always, GHC’s release status, including planned future
releases, can be found on the GHC Wiki status page.
This release candidate will have a two-week testing period. If all goes well
the final release will be available the week of 1 May 2025.
We would like to thank Well-Typed, Tweag I/O, Juspay, QBayLogic, Channable,
Serokell, SimSpace, the Haskell Foundation, and other anonymous contributors
whose on-going financial and in-kind support has facilitated GHC maintenance
and release management over the years. Finally, this release would not have
been possible without the hundreds of open-source contributors whose work
comprises this release.
As always, do give this release a try and open a ticket if you see
anything amiss.
Purely functional list concatenation, xs ++ ys in Haskell syntax, is well known to be linear time
in the length of the first input and constant time in the length of the second, i.e. xs ++ ys is
O(length xs). This leads to quadratic complexity if we have a bunch of left associated uses of
concatenation.
The ancient trick to resolve this is to, instead of producing lists, produce list-to-list functions
a la [a] -> [a] or ShowS = String -> String = [Char] -> [Char]. “Concatenation” of “lists”
represented this way is just function composition which is a constant time operation. We can lift a
list xs to this representation via the section (xs ++). This will still lead to O(length xs)
amount of work to apply this function, but a composition of such functions applied to a list will
always result in a fully right associated expression even if the function compositions aren’t
right associated.
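As a quick illustration (ShowS is the Prelude type mentioned above; the helper names here are just for this example):

-- Lifting strings into the function representation and back.
fromList' :: String -> ShowS
fromList' xs = (xs ++)   -- the section mentioned above; the Prelude calls this showString

toList' :: ShowS -> String
toList' f = f ""

-- However the compositions associate, applying the result performs the
-- appends fully right associated.
example :: String
example = toList' (((fromList' "a" . fromList' "b") . fromList' "c") . fromList' "d")
-- "abcd"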
In the last several years, it has become popular to refer to this technique as “difference lists”.
Often no justification is given for this name. When it is given, it is usually a reference to the
idea of difference lists in logic programming. Unfortunately, other than both techniques giving rise
to efficient concatenation, they have almost no similarities.
Functional Lists
To start, I want to do a deeper analysis of the “functional lists” approach, because I think what it
is doing is a bit misunderstood and, consequently, oversold1. Let’s see how we would model this approach in an OO
language without higher-order functions, such as early Java. I’ll use strings for simplicity, but it
would be exactly the same for generic lists.
This is just a straight, manual implementation of closures for (.) and (++) (specialized to
strings). Other lambdas not of the above two forms would lead to other implementations of
PrependTo. Let’s say, however, these are the only two forms that actually occur, which is mostly
true in Haskell practice, then another view on this OO code (to escape back to FP) is that it is an
OOP encoding of the algebraic data type:
data PrependTo = Compose PrependTo PrependTo | Prepend String

prependTo :: PrependTo -> String -> String
prependTo (Compose left right) end = prependTo left (prependTo right end)
prependTo (Prepend s) end = s ++ end
We could have also arrived at this by defunctionalizing a typical example of the technique. Modulo
some very minor details (that could be resolved by using the Church-encoded version of this), this
does accurately reflect what’s going on in the technique. Compose is clearly constant time. Less
obviously, applying these functional lists requires traversing this tree of closures – made
into an explicit tree here. In fact, this reveals that this representation could require arbitrarily
large amounts of work for a given size of output. This is due to the fact that prepending an empty
string doesn’t increase the output size but still increases the size of the tree. In practice,
it’s a safe assumption that, on average, at least one character will be prepended per leaf of the
tree which makes the overhead proportional to the size of the output.
This tree representation is arguably better than the “functional list” representation. It’s less
flexible for producers, but that’s arguably a good thing because we didn’t really want arbitraryString -> String functions. It’s more flexible for consumers. For example, getting the head of
the list is a relatively efficient operation compared to applying a “functional list” and taking
the head of the result even in an eager language. (Laziness makes both approaches comparably
efficient.) Getting the last element is just the same for the tree version, but, even with laziness,
is much worse for the functional version. More to the point, this concrete representation allows
the concatenation function to avoid adding empty nodes to the tree whereas (.) can’t pattern
match on whether a function is the identity function or not.
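For instance, concatenation on the concrete tree can simply drop empty chunks (a small sketch using the PrependTo type above):

-- Concatenation that skips empty chunks instead of recording them as nodes.
compose :: PrependTo -> PrependTo -> PrependTo
compose (Prepend "") right = right
compose left (Prepend "") = left
compose left right = Compose left right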
This view makes it very clear what the functional version is doing.
Difference Lists in Prolog
List append is the archetypal example of a Prolog program due to the novelty of its “invertibility”.
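Its standard definition is:

append([], Ys, Ys).
append([X|Xs], Ys, [X|Zs]) :- append(Xs, Ys, Zs).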
For our purposes, viewing this as a function of the first two arguments, this is exactly the usual
functional implementation of list concatenation with exactly the same problems. We could, of course,
encode the defunctionalized version of the functional approach into (pure) Prolog. This would
produce:
(I’ll be ignoring the issues that arise due to Prolog’s untyped nature.)
However, this being a logic programming language means we have additional tools available to use
that functional languages lack. Namely, unification variables. For an imperative (destructive)
implementation of list concatenation, the way we’d support efficient append of linked lists is we’d
keep pointers to the start and end of the list. To append two lists, we’d simply use the end
pointer of the first to update the end of the first list to point at the start of the second. We’d
then return a pair consisting of the start pointer of the first and the end pointer of the second.
This is exactly how Prolog difference lists work, except instead of pointers, we use unification
variables which are more principled. Concretely, we represent a list as a pair of lists, but the
second list will be represented by an unbound unification variable and the first list contains
that same unification variable as a suffix. This pair is often represented using the infix
operator (“functor” in Prolog terminology), -, e.g. Xs - Ys. We could use diff(Xs, Ys) or
some other name. - isn’t a built-in operator, it’s just a binary constructor essentially.
At the level of logic, there are no unification variables. The constraints above mean that Xs - Ys
is a list Xs which contains Ys as a suffix.
The name “difference list” is arguably motivated by the definition of concatenation in this
representation.
concat(Xs-Ys,Ys-Zs,Xs-Zs).
This looks a lot like (Xs - Ys) + (Ys - Zs) = Xs - Zs. If the suffix component of the first argument
is unbound, like it’s supposed to be, then this is a constant-time operation of binding that
component to Ys. If it is bound, then we need to unify which, in the worst-case, is O(length Ys)
where the length is up to either nil or an unbound variable tail2.
We also have the unit of concat, i.e. the empty
list via3:
empty(Xs-Xs).
See the footnote, but this does in some way identify Xs - Ys with the “difference” of Xs and
Ys.
We get back to a “normal” list via:
to_list(Xs - [], Xs).
% or more generally,
prepend_to(Xs - Ys, Ys, Xs).
to_list is a constant-time operation, no matter what. Note, to_list binds the suffix component
of the difference list. This means that the first input no longer meets our condition to be a
difference list. In other words, to_list (and prepend_to) consumes the difference list.
More precisely, it constrains the possible suffixes the list could be.
Indeed, any operation that binds the suffix component of a difference list consumes it. For example,
concat consumes its first argument.
Of course, it still makes logical sense to work with the difference list when its suffix component
is bound, it’s just that its operational interpretation is different. More to the point, given a
difference list, you cannot prepend it (via prepend_to or concat) to two different lists to get
two different results.
Converting from a list does require traversing the list since we need to replace the nil node, i.e.
[], with a fresh unbound variable. Luckily, this is exactly what append does.
from_list(Xs,Ys-Zs) :- append(Xs,Zs,Ys).
from_list also suggests this “difference list” idea. If all of Xs, Ys, and Zs are ground
terms, then from_list(Xs, Ys - Zs) holds when append(Xs, Zs, Ys) holds. Exactly when if our
invariants are maintained, i.e. that Zs is a suffix of Ys. Writing these relations more
functionally and writing append as addition, we’d have:
If we did want to “duplicate” a difference list, we’d essentially need to convert it to a (normal)
list with to_list, and then we could use from_list multiple times on that result. This would,
of course, still consume the original difference list. We’d also be paying O(length Xs) for every
duplicate, including to replace the one we just consumed4.
That said, we can prepend to a list to a difference list without consuming it. We can perform
other actions with the risk of (partially) consuming the list, e.g. indexing into the list. Indexing
into the list would force the list to be at least a certain length, but still allow prepending to
any list that will result in a final list at least that long.
Comparison
I’ll start the comparison with a massive discrepancy that we will ignore going forward. Nothing
enforces that a value of type ShowS actually just appends something to its input. We could use
abstract data type techniques or the defunctionalized version to avoid this. To be fair, difference
lists also need an abstraction barrier to ensure their invariants, though their failure modes are
different. A difference list can’t change what it is based on what it is prepended to.
Functional Representation | Difference Lists
--------------------------|-----------------
constant-time concatenation | constant-time concatenation
constant-time conversion from a list (though you pay for it later) | O(n) conversion from a list
persistent | non-persistent, requires linear use
represented by a tree of closures | represented by a pair of a list and a unification variable
O(n) (or worse!) conversion to a list | constant-time conversion to a list
defunctionalized version can be implemented in pretty much any language | requires at least single-assignment variables
unclear connection to being the difference of two lists (which two lists?) | mathematical, if non-obvious, connection to being the difference of two (given) lists
As an illustration of the difference between persistent and non-persistent uses, the function:
double f = f . f
is a perfectly sensible function on ShowS values that behaves exactly as you’d expect. On the
other hand:
double(In, Out) :- concat(In, In, Out).
is nonsense that will fail the occurs check (if it is enabled; otherwise it will create a cyclic
list) except when In is the empty difference list.
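For instance, in SWI-Prolog with the occurs_check flag set to true (an assumption about the system; other Prologs differ), doubling a non-empty difference list simply fails, while at the default flag setting it would build a cyclic list instead:
?- set_prolog_flag(occurs_check, true),
   DL = [1|Ys]-Ys,
   double(DL, Out).         % concat(DL, DL, Out) would need Ys = [1|Ys]
false.

% The empty difference list is the one exception:
?- empty(E), double(E, Out).   % succeeds, with E and Out both a fresh Xs-Xs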
Conclusion
I hope I’ve illustrated that the functional representation is not just not difference lists, but
is, in fact, wildly different from difference lists.
This functional representation is enshrined into Haskell via the ShowS type and related functions,
but I’d argue the concrete tree representation is actually clearer and better. The functional
representation is more of a cute trick that allows us to reuse existing functions. Really, ShowS
should have been an abstract type.
Difference lists are an interesting example of how imperative ideas can be incorporated into a
declarative language. That said, difference lists come with some of the downsides of an imperative
approach, namely the lack of persistence.
As far as I’m aware, there isn’t an unambiguous and widely accepted name for this functional
representation. Calling it “functional lists” or something like that is, in my opinion, very
ambiguous and potentially misleading. I think the lack of a good name for this is why “difference
lists” started becoming popular. As I’ve argued, using “difference list” in this context is even
more misleading and confusing.
If people really want a name, one option might be “delta list”. I don’t think this term is used.
It keeps the intuitive idea that the functional representation represents some “change” to a list,
a collection of deltas that will all be applied at once, but it doesn’t make any false reference to
difference lists. I’m not super into this name; I just want something that isn’t “difference list”
or otherwise misleading.
[1] To be clear, it's still much, much better than using plain concatenation.
[2] Such a length relation couldn't be written in pure Prolog but can in actual Prolog.
[3] For those algebraically minded, this almost makes concat and empty into another monoid,
except concat is partial, but such a partial monoid is just a category! In other words,
we have a category whose objects are lists and whose homsets are, at most, singletons containing
Xs - Ys for Hom(Xs, Ys). If we maintain our invariant that we have Xs - Ys only when Ys is a
suffix of Xs, this thin category is exactly the category corresponding to the reflexive,
transitive “has suffix” relation. We could generalize this to any monoid via a “factors through”
relation, i.e. |\mathrm{Hom}(m, n)| is inhabited if and only if |\exists p. m = pn|, which you can
easily prove is a reflexive, transitive relation given the monoid axioms. However, for a general
monoid, we can have a (potentially) non-thin category by saying |p \in \mathrm{Hom}(m,n)| if and
only if |m = pn|. The category will be thin if and only if the monoid is cancellative. This is
exactly the slice category of the monoid viewed as a one-object category.
[4] Again, in actual Prolog, we could make a duplicate without consuming the original, though it
would still take O(length Xs) time using the notion of length mentioned before.
Do you use an LLM for coding? Do you maintain a personal benchmark based on problems you have posed the LLM? The purpose of this blog post is to convince you that you should do this: you can do so with marginal effort on top of your day-to-day vibe coding, and you will get both short- and long-term benefits from making your own personal benchmark exist.
I started thinking about benchmarks for coding partly out of frustration with the discourse around LLMs in the public squares I frequent (Reddit and Twitter). People often want to know "what's the best model?" or "what's the best coding IDE?" One might imagine that the way to answer this question would be to test the models on a variety of problems from real world uses of the LLM for coding, and then compare how well various systems do on this. Indeed, whenever a new SOTA model releases, the lab will usually tell you about the model's performance against a few well known coding benchmarks. Problem solved?
Of course not! In fact, for the most part, no one really talks about benchmarks when comparing models. Why? I argue the most popular benchmarks measure tasks that are largely different from what a user wants out of an LLM. For example, take the recent Gemini 2.5 Pro release. In their headline table, they test against LiveCodeBench, Aider Polyglot and SWE-bench Verified. LiveCodeBench and Aider Polyglot derive their problems from contest programming and pedagogical exercises, respectively, while SWE-bench assesses bug fixes to preexisting codebases. While useful, this is only a small slice of the things people want to do with LLMs.
Wouldn't it be great if you had your own, personal benchmark, based on problems you actually care about? If you are tweaking your .cursorrules, you could run your benchmark to see if a change you made helped or not. When a new model comes out, you could spend a few bucks to run your eval and make a decision if you should switch your daily driver. And then on social media, if you wanted to stan the new model, instead of asking the model to drop a ball inside a rotating hexagon or vagueposting about how the new model is incredible, you could just post your benchmark results.
My starting point was Carlini's benchmark, which he describes as follows:
It's a collection of nearly 100 tests I've extracted from my actual conversation history with various LLMs.
There are two defining features of this benchmark that make it interesting. Most importantly, I've implemented a simple dataflow domain specific language to make it easy for me (or anyone else!) to add new tests that realistically evaluate model capabilities. This DSL allows for specifying both how the question should be asked and also how the answer should be evaluated. Most questions are evaluated by actually running the code the model writes but the framework supports a bunch of other evaluation methods as well. And then, directly as a result of this, I've written nearly 100 tests for different situations I've actually encountered when working with LLMs as assistants.
I have been working on my own benchmark based off of Carlini's benchmark, and I can confirm that this works well for the traditional style of coding eval, where you have a one-shot task that generates and executes the code against some test cases. My basic strategy is to vibe code as usual, but whenever I give an LLM a task that it isn't able to one shot, I consider adding it to the benchmark. In more detail:
I only add a task if a SOTA LLM failed it. This ensures the benchmark consists of problems of appropriate difficulty: easy enough that I thought an LLM should be able to do it, but hard enough that a SOTA model failed at it. I don't need problems that are too hard (this is already well covered by well known benchmarks like SWE-Bench or SWE-Lancer), and I don't mind if my problems saturate because, hey, that means the models are that much better for my use cases!
After I have added the task to the benchmark, I can use the benchmark runner to tell if changing the model, tweaking the prompt, or even just running the prompt again at nonzero temperature can make it pass. Indeed, it's helpful to find some configuration that makes the eval pass, as this is good for debugging issues in the evaluation function itself... also it means you have working code for whatever task you were working on. Conversely, you can make the task harder by leaving things out from the prompt.
Writing the test is the labor intensive part, but you can always vibe code a test. Importantly, you have a failing implementation (your initial generation) and some way you (manually?) determined that the implementation was wrong, so just turn this into your evaluation function! (And for all you yak shaving aficionados, if the model fails to vibe code your test, well, you have another task for your benchmark!)
For example, the other day I needed to take an asciinema recording and convert it into a sequence of frames rendered as plain text. However, the only project for doing these conversions was agg, which converts recordings into animated gifs. In
agg_to_text, I ask an LLM to take agg's source code and create a new program which dumps the frames as plain text rather than gif images. The reason this task is difficult is that there is some discretion in deciding when to emit a frame, and with my original prompt the LLM didn't precisely replicate the original behavior in agg. While working on the benchmark, I realized that instructing the model specifically about how frame batching worked was enough to get it to preserve the original behavior. But I don't think I should need to do this: thus this task. (P.S. If this test saturates, well, I can always make it harder by removing the agg source code from the prompt.)
The ability to benchmark one shot tasks is here today, but I would like to speculate a bit about what lies beyond them. In particular, most of my LLM coding activity involves asking the LLM to make changes to a pre-existing project, which makes it less amenable to "single prompt creates self contained program". (Also, I usually only ask one-shot questions that the LLM can answer, so most of them would never go in my benchmark.)
In short, how can I extract tasks from my day-to-day work? There seem to be two big extra levers we have:
Codebase tasks. This is the heavy-weight approach: you record the Git commit of your codebase at the time you prompted for some new feature to be added, and then when you want to run an eval on a new model you just check out the codebase at that commit and let the end-to-end system go. You'll typically want to execute the modified code, which means you'll also need a way to reliably setup the runtime environment for the code; things like lockfiles can help a lot here.
Transcript tasks. You don't actually need the entire codebase to be available to ask an LLM for a completion; you only need the conversation transcript up to the point of the critical generation. If the transcript is mostly your agent system reading in files for context, you can end up with a relatively system-generic prompt that can tell you something about other systems. Of course, if you want to actually run the change, you still need the full codebase, which is why this approach is a much better fit if you're going to do some static analysis on the output. For example, if a model keeps adding try: ... except: ... blocks that suppress errors, you can take some transcripts where you've caught the model red-handed doing this and make an eval that checks if the model is still doing this. I suspect testing on transcripts works best for testing whether changing prompts or rules improves performance, since the transcript itself puts the model into a particular latent space; a different model might have made different choices, leading to a different latent space. Transcripts from thinking models are especially susceptible to this!
I have started adapting Carlini's framework to work better for these cases, although I would love to be told someone has already solved this problem for me. In particular, I am very excited about using transcript tasks to evaluate whether the things I add to my prompts / triggered rules are helping. Current SOTA model instruction following isn't great and I regularly catch models doing behaviors that I explicitly told them not to do in the system prompt. I have started some initial analysis over all of my chat logs to find cases where the model misbehaved, although I haven't quite worked out how I want to build an eval out of it.
One word of warning: to make transcript tasks, you need an AI coding system that doesn't obscure how it assembles its underlying prompts (which rules out most of the popular closed source AI code editors.)
I started building evals for a selfish reason: I wanted to be able to tell if modifications to my prompts were doing anything. But I also think there is a broader opportunity that arises if we also publish these benchmarks to the world.
For one, building a real world benchmark on use cases we care about is a way to communicate to the people training AI models whether or not they are doing well. Historical evals have focused on LeetCoding, and consequently we have models that would ace any big tech interview and yet on real world tasks will drive you off a cliff at the first opportunity. And this is not just free labor for the top labs: if you believe in open source models, one of the biggest barriers to good small models is having really high quality data. We, the OSS vibe coding community, can directly help here.
I think there is a tremendous opportunity for the open source community to really push the state of the art in coding evaluations. There's only so many benchmarks that I, personally, can create, but if everyone is making benchmarks I could eventually imagine a universe of benchmarks where you could curate the problems that are relevant to your work and quickly and cheaply judge models in this way: a Wikipedia of Coding Benchmarks.
To summarize: every time an LLM fails to solve a problem you give it, that is a potential new benchmark. As long as there is a way to automate testing whether the LLM has solved the problem, you can turn this into a benchmark. Do this for yourself, and you can quickly have a personal benchmark with which to evaluate new models. Do this at scale, and you can help push the frontier in coding models.