Planet Haskell

August 01, 2025

Lysxia's blog

Twentyseven 1.0.0

Twelve years of Haskell

Twentyseven is a Rubik’s cube solver and one of my earliest projects in Haskell. The first commit dates from January 2014, and version 0.0.0 was uploaded on Hackage in March 2016.

I first heard of Haskell in a course on lambda calculus in 2013. A programming language with lazy evaluation sounded like a crazy idea, so I gave it a try. Since then, I have kept writing in Haskell as my favorite language. For me it is the ideal blend of programming and math. And a Rubik’s cube solver is a great excuse for doing group theory.

Twentyseven 1.0.0 is more of a commemorative release for myself, with the goal of making it compile with the current version of GHC (9.12). There was surprisingly little breakage:

  1. Semigroup has become a superclass of Monoid
  2. A breaking change in the Template Haskell AST

Aside from that, the code is basically just as it was 9 years ago, including design decisions that I would find questionable today. For example, I use unsafePerformIO to read precomputed tables into top-level constants, but the location of the files to read from can be configured by command-line arguments, so I better make sure that the tables are not forced before the location is set…
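
As a sketch of the pattern described above (illustrative only, not the actual Twentyseven code; the file name and table format are made up):

import Data.IORef (IORef, newIORef, readIORef)
import System.IO.Unsafe (unsafePerformIO)

-- Mutable cell holding the (configurable) table location.
{-# NOINLINE tablePath #-}
tablePath :: IORef FilePath
tablePath = unsafePerformIO (newIORef "tables/corners.dat")

-- Top-level "constant" backed by a file read. This is only safe as long as
-- nothing forces cornerTable before the command-line parser has written the
-- real path into tablePath.
{-# NOINLINE cornerTable #-}
cornerTable :: [Int]
cornerTable = unsafePerformIO $ do
  path <- readIORef tablePath
  map read . lines <$> readFile path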

How Twentyseven works

The input of the program is a string enumerating the 54 facelets of a Rubik’s cube, where each character represents one color.

DDDFUDLRB FUFDLLLRR UBLBFDFUD ULBFRULLB RRRLBBRUB UBFFDFDRU

The facelets follow the order pictured below. They are grouped by faces (up, left, front, right, back, down), and within each face they are listed in top-to-bottom, left-to-right order.

                  00 01 02
                  03 04 05
                  06 07 08

        10 11 12  20 21 22  30 31 32  40 41 42
        13 14 15  23 24 25  33 34 35  43 44 45
        16 17 18  26 27 28  36 37 38  46 47 48

                  50 51 52
                  53 54 55
                  56 57 58

The output is a sequence of moves to solve that cube.

U L B' L R2 D R U2 F U2 L2 B2 U B2 D' B2 U' R2 U L2 R2 U

The implementation of Twentyseven is based on Herbert Kociemba’s notes about Cube Explorer, a program written in Pascal!

The search algorithm is iterative deepening A*, or IDA*. Like A*, IDA* finds the shortest path between two vertices in a graph. A conventional A* is not feasible because the state space of a Rubik’s cube is massive (43 252 003 274 489 856 000 states, literally billions of billions). Instead, we run a series of depth-first searches with a maximum allowed number of moves that increases for each search. As it is based on depth-first search, IDA* only needs memory for the current path, which is super cheap.
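
As a rough illustration, here is a generic IDA* sketch in Haskell (not Twentyseven’s actual code): next enumerates moves and successor states, h is a lower-bound estimate of the remaining moves (discussed below), and the search retries with a larger bound until a solution is found.

idaStar :: (s -> [(m, s)]) -> (s -> Int) -> (s -> Bool) -> s -> Maybe [m]
idaStar next h isGoal start = go (h start)
  where
    -- Outer loop: run a bounded DFS; if it fails, retry with the smallest
    -- f-value that exceeded the previous bound.
    go bound = case dfs start 0 bound [] of
      Left path      -> Just (reverse path)
      Right Nothing  -> Nothing            -- nothing exceeded the bound: no solution
      Right (Just b) -> go b
    -- Bounded DFS: Left = solution (moves in reverse order),
    -- Right = smallest f-value that exceeded the bound, if any.
    dfs s g bound path
      | f > bound = Right (Just f)
      | isGoal s  = Left path
      | otherwise = foldr step (Right Nothing) (next s)
      where
        f = g + h s
        step (m, s') acc = case dfs s' (g + 1) bound (m : path) of
          Left p  -> Left p
          Right t -> case acc of
            Left p'  -> Left p'
            Right t' -> Right (minMaybe t t')
    minMaybe Nothing y         = y
    minMaybe x Nothing         = x
    minMaybe (Just a) (Just b) = Just (min a b)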

IDA* relies on an estimate of the number of moves remaining to reach the solved state. We obtain such an estimate by projecting the Rubik’s cube state onto a simpler puzzle. For example, we can consider only the permutation of corners, ignoring their orientation. We can pre-compute a table mapping each corner permutation (there are 8! = 40320 of them) to the minimum number of moves needed to put the corners back in place. This is a lower bound on the number of moves to actually solve a Rubik’s cube. Different projections yield different lower bounds (for example, by looking at the permutation of edges instead, or their orientation), and we can combine lower bounds into their maximum, yielding a more precise lower bound, and thus a faster IDA*.
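
Combining lower bounds is a one-liner: the pointwise maximum of admissible heuristics is still admissible and never weaker than any single one (again, just a sketch):

-- Take the maximum of several lower bounds at a given state.
combineHeuristics :: [s -> Int] -> (s -> Int)
combineHeuristics hs s = maximum (map ($ s) hs)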

Putting all that together, we obtain an optimal solver for Rubik’s cubes. But even with these heuristics, Twentyseven can take hours to solve a random cube optimally. Kociemba’s Cube Explorer is apparently much faster (I’ve never tried it myself). My guess is that the difference is due to a better selection of projections, yielding better heuristics. But I haven’t gotten around to figuring out whether I’ve misinterpreted his notes or whether those improvements can only be found in the code.

A faster alternative is Kociemba’s two phase algorithm. It is suboptimal, but it solves Rubik’s cubes in a fraction of a second (1000 cubes per minute). The first phase puts cubies into a “common orientation” and “separates” the edges into two groups. In other words, we reach a state where the permutation of 12 edges can be decomposed into two disjoint permutations of 4 and 8 edges respectively. In the second phase, we restrict the possible moves: quarter- and half-turns on the top and bottom faces, half-turns only on the other faces. These restricted moves preserve the “common orientation” of edges and corners from phase 1, and the edges in the middle slice stay in their slice. Each phase thus performs an IDA* search in a much smaller space than the full Rubik’s cube state space (2 217 093 120 and 19 508 428 800 states respectively).
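
For concreteness, the restricted move set of phase 2 can be written out in the usual face-turn notation (a small illustrative snippet, not taken from the Twentyseven source):

-- Quarter- and half-turns of the top (U) and bottom (D) faces,
-- half-turns only for the four remaining faces.
phase2Moves :: [String]
phase2Moves = ["U", "U'", "U2", "D", "D'", "D2", "L2", "R2", "F2", "B2"]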

by Lysxia at August 01, 2025 12:00 AM

July 31, 2025

Tweag I/O

Integrating Nix and Buck2

Buck2 is a new open source build system developed by Meta (Facebook) which we have already looked at in some depth; see A Tour Around Buck2, Meta’s New Build System. Since then, Buck2 has gained significant improvements in user experience and language support, making it an increasingly attractive option in the build systems space.

At Tweag, we adhere to high standards for reproducible builds, which Buck2 doesn’t fully uphold in its vanilla configuration. In this post, we will introduce our ruleset that provides integration with Nix. I’ll demonstrate how it can be used, and you will gain insights into how to leverage Nix to achieve more reliable and reproducible builds with Buck2.

Reproducibility, anyone?

In short, Buck2 is a fast, polyglot build tool very similar to Bazel. Notably, it also provides fine-grained distributed caching and even speaks (in its open source variant) the same remote caching and execution protocols used by Bazel. This means you’re able to utilize the same Bazel services available for caching and remote execution.

However, in contrast to Bazel, Buck2 takes a remote-first approach and does not restrict build actions using a sandbox on the local machine. As a result, build actions can be non-hermetic, meaning their outcome might depend on what files or programs happen to be present on the local machine. This lack of hermeticity can lead to non-reproducible builds, which is a critical concern for the effective caching of build artifacts.

Non-hermeticity issues can be elusive, often surfacing unexpectedly for new developers, which affects the onboarding of new team members or open source contributors. If left undetected, they can even cause problems down the line in production, which is why we think reproducible builds are important!

Achieving Reproducibility with Nix

If we want reproducible builds, we must not rely on anything installed on the local machine. We need to precisely control every compiler and build tool used in our project. Although defining each and every one of these inside the Buck2 build itself is possible, it would also be a lot of work. Nix can be the solution to this problem.

Nix is a package manager and build system for Linux and Unix-like operating systems. With nixpkgs, there is a very large and comprehensive collection of software packaged using Nix, which is extensible and can be adapted to one’s needs. Most importantly, Nix already strictly enforces hermeticity for its package builds and the nixpkgs collection goes to great lengths to achieve reproducible builds.

So, using Nix to provide compilers and build tools for Buck2 is a way to benefit from that preexisting work and introduce hermetic toolchains into a Buck2 build.

Let’s first quickly look into the Nix setup and proceed with how we can integrate it into Buck2 later.

Nix with flakes

After installing Nix, the nix command is available, and we can start declaring dependencies on packages from nixpkgs in a nix file. The Nix tool uses the Nix language, a domain-specific, purely functional and lazily evaluated programming language to define packages and declare dependencies. The language has some wrinkles, but don’t worry; we’ll only use basic expressions without delving into the more advanced concepts.

For example, here is a simple flake.nix which provides the Rust compiler as a package output:

{
  inputs = {
    nixpkgs.url = "github:nixos/nixpkgs?ref=nixos-unstable";
  };
  outputs = { self, nixpkgs }:
    {
      packages = {
        aarch64-darwin.rustc = nixpkgs.legacyPackages.aarch64-darwin.rustc;
        x86_64-linux.rustc = nixpkgs.legacyPackages.x86_64-linux.rustc;
      };
    };
}

Note: While flakes have been widely used for a long time, the feature still needs to be enabled explicitly by setting extra-experimental-features = nix-command flakes in the configuration. See the wiki for more information.

In essence, a Nix flake is a Nix expression following a specific schema. It defines its inputs (usually other flakes) and outputs (e.g. packages) which depend on the inputs. In this example the rustc package from nixpkgs is re-used for the output of this flake, but more complex expressions could be used just as well.

Inspecting this flake shows the following output:

$ nix flake show --all-systems
path:/source/project?lastModified=1745857313&narHash=sha256-e1sxfj1DZbRjhHWF7xfiI3wc1BpyqWQ3nLvXBKDya%2Bg%3D
└───packages
    ├───aarch64-darwin
    │   └───rustc: package 'rustc-wrapper-1.86.0'
    └───x86_64-linux
        └───rustc: package 'rustc-wrapper-1.86.0'

In order to build the rustc package output, we can call Nix in the directory of the flake.nix file like this: nix build '.#rustc'. This will either fetch pre-built artifacts of this package from a binary cache if available, or directly build the package if not. The result is the same in both cases: the rustc package output will be available in the local nix store, and from there it can be used just like other software on the system.

$ nix build --print-out-paths '.#rustc'
/nix/store/ssid482a107q5vw18l9millwnpp4rgxb-rustc-wrapper-1.86.0-man
/nix/store/szc39h0qqfs4fvvln0c59pz99q90zzdn-rustc-wrapper-1.86.0

The output displayed above illustrates that a Nix build of a single package can produce multiple outputs. In this case the rustc package was split into a default output and an additional, separate output for the man pages.

The default output contains the main binaries such as the Rust compiler:

$ /nix/store/szc39h0qqfs4fvvln0c59pz99q90zzdn-rustc-wrapper-1.86.0/bin/rustc --version
rustc 1.86.0 (05f9846f8 2025-03-31) (built from a source tarball)

It is also important to note that the output of a Nix package depends on the specific nixpkgs revision stored in the flake.lock file, rather than any changes in the local environment. This ensures that each developer checking out the project at any point in time will receive the exact same (reproducible) output no matter what.

Using Buck2

As part of our work for Mercury, a company providing financial services, we developed rules for Buck2 which can be used to integrate packages provided by a nix flake as part of a project’s build. Recently, we have been able to publish these rules, called buck2.nix, as open source under the Apache 2 license.

To use these rules, you need to make them available in your project first. Add the following configuration to your .buckconfig:

[cells]
  nix = none

[external_cells]
  nix = git

[external_cell_nix]
  git_origin = https://github.com/tweag/buck2.nix.git
  commit_hash = accae8c8924b3b51788d0fbd6ac90049cdf4f45a # change to use a different version

This configures a cell called nix to be fetched from the specified repository on GitHub. Once set up, you can refer to that cell in your BUCK files and load rules from it.

Note: for clarity, I am going to indicate the file name in the topmost comment of a code block when it is not obvious from the context.

To utilize a Nix package from Buck2, we need to introduce a new target that runs nix build inside a build action, producing a symbolic link to the nix store path as the build output. Here is how to do that using buck2.nix:

# BUCK

load("@nix//flake.bzl", "flake")

flake.package(
    name = "rustc",
    binary = "rustc",
    path = "nix", # path to a nix flake
    package = "rustc", # which package to build, default is the value of the `name` attribute
    output = "out", # which output to build, this is the default
)

Note: this assumes the flake.nix and accompanying flake.lock files are found alongside the BUCK file in the nix subdirectory

With this build file in place, a new target called rustc is made available, which builds the out output of the rustc package of the given flake. This target can be used as a dependency of other rules in order to generate an output artifact:

# BUCK

genrule(
   name = "rust-info",
   out = "rust-info.txt",
   cmd = "$(exe :rustc) --version > ${OUT}"
)

Note: Buck2 supports expanding references in string parameters using macros, such as the $(exe ) part in the cmd parameter above which expands to the path of the executable output of the :rustc target

Using Buck2 (from nixpkgs of course!) to build the rust-info target yields:

$ nix run nixpkgs#buck2 -- build --show-simple-output :rust-info
Build ID: f3fec86b-b79f-4d8e-80c7-acea297d4a64
Loading targets.   Remaining     0/10                                                                                    24 dirs read, 97 targets declared
Analyzing targets. Remaining     0/20                                                                                    5 actions, 5 artifacts declared
Executing actions. Remaining     0/5                                                                                     9.6s exec time total
Command: build.    Finished 2 local
Time elapsed: 10.5s
BUILD SUCCEEDED
buck-out/v2/gen/root/904931f735703749/__rust-info__/out/rust-info.txt

$ cat buck-out/v2/gen/root/904931f735703749/__rust-info__/out/rust-info.txt
rustc 1.86.0 (05f9846f8 2025-03-31) (built from a source tarball)

For this one-off command we just ran buck2 from the nixpkgs flake on the current system. This is nice for illustration, but it is also not reproducible, and you’ll probably end up with a different Buck2 version when you try this on your machine.

In order to provide the same Buck2 version consistently, let’s add another Nix flake to our project:

# flake.nix

{
  inputs = {
    nixpkgs.url = "github:nixos/nixpkgs?ref=nixos-unstable";
  };
  outputs = { self, nixpkgs }:
    {
      devShells.aarch64-darwin.default =
        nixpkgs.legacyPackages.aarch64-darwin.mkShellNoCC {
          name = "buck2-shell";
          packages = [ nixpkgs.legacyPackages.aarch64-darwin.buck2 ];
        };

      devShells.x86_64-linux.default =
        nixpkgs.legacyPackages.x86_64-linux.mkShellNoCC {
          name = "buck2-shell";
          packages = [ nixpkgs.legacyPackages.x86_64-linux.buck2 ];
        };
    };

  nixConfig.bash-prompt = "(nix) \\$ "; # visual clue if inside the shell
}

This flake defines a default development environment, or dev shell for short. It uses the mkShellNoCC function from nixpkgs which creates an environment where the programs from the given packages are available in PATH.

After entering the shell by running nix develop in the directory of the flake.nix file, the buck2 command has the exact same version for everyone working on the project, as long as the committed flake.lock file is not changed. For convenience, consider using direnv, which automates entering the dev shell as soon as you change into the project directory.

Hello Rust

With all of that in place, let’s have a look at how to build something more interesting, like a Rust project.

Similar to the genrule above, it would be possible to define custom rules utilizing the :rustc target to compile real-world Rust projects. However, Buck2 already ships with rules for various languages in its prelude, including rules to build Rust libraries and binaries.

In a default project setup with Rust, these rules would simply use whatever Rust compiler is installed on the system, which may cause build failures due to version mismatches.

To avoid this non-hermeticity, we’re going to instruct the Buck2 rules to use our pinned Rust version from nixpkgs.

Let’s start by preparing such a default setup for the infamous “hello world” example in Rust:

# src/hello.rs

fn main() {
    println!("Hello, world!");
}
# src/BUCK

rust_binary(
    name = "hello",
    srcs = ["hello.rs"],
)

Toolchains

What’s left to do to make these actually work is to provide a Rust toolchain. In this context, a toolchain is a configuration that specifies a set of tools for building a project, such as the compiler, the linker, and various command-line tools. In this way, toolchains are decoupled from the actual rule definitions and can be easily changed to suit one’s needs.

In Buck2, toolchains are expected to be available in the toolchains cell under a specific name. Conventionally, the toolchains cell is located in the toolchains directory of a project. For example, all the Rust rules depend on the target toolchains//:rust which is defined in toolchains/BUCK and must provide Rust specific toolchain information.

Luckily, we do not need to define a toolchain rule ourselves but can re-use the nix_rust_toolchain rule from buck2.nix:

# toolchains/BUCK

load("@nix//toolchains:rust.bzl", "nix_rust_toolchain")

flake.package(
    name = "clippy",
    binary = "clippy-driver",
    path = "nix",
)

flake.package(
    name = "rustc",
    binaries = ["rustdoc"],
    binary = "rustc",
    path = "nix",
)

nix_rust_toolchain(
    name = "rust",
    clippy = ":clippy",
    default_edition = "2021",
    rustc = ":rustc",
    rustdoc = ":rustc[rustdoc]",
    visibility = ["PUBLIC"],
)

The rustc target is defined almost identically to before, but the nix_rust_toolchain rule also expects the rustdoc attribute to be present. In this case, the rustdoc binary is available from the rustc Nix package as well and can be referenced using the sub-target syntax :rustc[rustdoc], which refers to the corresponding item of the binaries attribute given to the flake.package rule.

Additionally, we need to pass in the clippy-driver binary, which is available from the clippy package in the nixpkgs collection. Thus, the flake.nix file needs to be changed by adding the clippy package outputs:

# toolchains/nix/flake.nix

{
  inputs = {
    nixpkgs.url = "github:nixos/nixpkgs?ref=nixos-unstable";
  };
  outputs =
    {
      self,
      nixpkgs,
    }:
    {
      packages = {
        aarch64-darwin.rustc = nixpkgs.legacyPackages.aarch64-darwin.rustc;
        aarch64-darwin.clippy = nixpkgs.legacyPackages.aarch64-darwin.clippy;
        x86_64-linux.rustc = nixpkgs.legacyPackages.x86_64-linux.rustc;
        x86_64-linux.clippy = nixpkgs.legacyPackages.x86_64-linux.clippy;
      };
    };
}

At this point we are able to successfully build and run the target src:hello:

(nix) $ buck2 run src:hello
Build ID: 530a4620-bfb2-454d-bae1-e937ae9e764f
Analyzing targets. Remaining     0/53                                                                                    75 actions, 101 artifacts declared
Executing actions. Remaining     0/11                                                                                    1.1s exec time total
Command: run.      Finished 3 local
Time elapsed: 0.7s
BUILD SUCCEEDED
Hello, world!

Building a real-world Rust project would be a bit more involved. Here is an interesting article on how one can do that using Bazel.

Note that buck2.nix currently also provides toolchain rules for C/C++ and Python. Have a look at the example project provided by buck2.nix, which you can directly use as a template to start your own project:

$ nix flake new --template github:tweag/buck2.nix my-project

A big thank you to Mercury for their support and for encouraging us to share these rules as open source! If you’re looking for a different toolchain or have other suggestions, feel free to open a new issue. Pull requests are very welcome, too!

If you’re interested in exploring a more tightly integrated solution, you might want to take a look at the buck2-nix project, which also provides Nix integration. Since it defines an alternative prelude that completely replaces Buck2’s built-in rules, we could not use it in our project but drew good inspiration from it.

Conclusion

With the setup shown, we saw that all that is really needed is Nix (pun intended1):

  • we provide the buck2 binary with Nix as part of a development environment
  • we leverage Nix inside Buck2 to provide build tools such as compilers, their required utilities and third-party libraries in a reproducible way

Consequently, onboarding new team members no longer means following seemingly endless and quickly outdated installation instructions. Installing Nix is easy; entering the dev shell is fast, and you’re up and running in no time!

And using Buck2 gives us fast, incremental builds by only building the minimal set of dependencies needed for a specific target.

Next time, I will delve into how we seamlessly integrated the Haskell toolchain and libraries from Nix, and how we made it fast as well.


  1. The name Nix is derived from the Dutch word niks, meaning nothing; build actions don’t see anything that hasn’t been explicitly declared as an input

July 31, 2025 12:00 AM

July 28, 2025

Monday Morning Haskell

Spiral Matrix: Another Matrix Layer Problem

In last week’s article, we learned how to rotate a 2D Matrix in place using Haskell’s mutable array mechanics. This taught us how to think about a Matrix in terms of layers, starting from the outside and moving in towards the center.

Today, we’ll study one more 2D Matrix problem that uses this layer-by-layer paradigm. For more practice dealing with multi-dimensional arrays, check out our Solve.hs course! In Module 2, you’ll study all kinds of different data structures in Haskell, including 2D Matrices (both mutable and immutable).

The Problem

Today’s problem is Spiral Matrix. In this problem, we receive a 2D Matrix, and we would like to return the elements of that matrix in a 1D list in “spiral order”. This ordering starts from the top left and goes right. When we hit the top right corner, we move down to the bottom. Then we come back across the bottom row to the left, and back up toward the top left. We then continue this process on the inner layers.

So, for example, let’s suppose we have this 4x4 matrix:

1   2  3  4
5   6  7  8
9  10 11 12
13 14 15 16

This should return the following list:

[1,2,3,4,8,12,16,15,14,13,9,5,6,7,11,10]

At first glance, it seems like a lot of our layer-by-layer mechanics from last week will work again. All the numbers in the “first” layer come first, followed by the “second” layer, and so on. The trick though is that for this problem, we have to handle non-square matrices. So we can also have this matrix:

1  2  3  4
5  6  7  8
9 10 11 12

This should yield the list [1,2,3,4,8,12,11,10,9,5,6,7]. This isn’t a huge challenge, but we need a slightly different approach.

The Algorithm

We still want to generally move through the Matrix using a layer-by-layer approach. But instead of tracking the 4 corner points, we’ll just keep track of 4 “barriers”, imaginary lines dictating the “end” of each dimension (up/down/left/right) for us to scan. These barriers will be inclusive, meaning that they refer to the last valid row or column in that direction. We would call these “min row”, “min column”, “max row” and “max column”.

Now the general process for going through a layer will consist of 4 steps. Each step starts in a corner location and proceeds in one direction until the next corner is reached. Then, we can start again with the next layer.

The trick is the end condition. Because we can have rectangular matrices, the final layer can have a shape like 1 x n or n x 1, and this is a problem, because we wouldn’t need 4 steps. Even a square n x n matrix with odd n would have a 1x1 as its final layer, and this is also a problem since it is unclear which “corner” this coordinate belongs to.

Thus we have to handle these edge cases. However, they are easy to both detect and resolve. We know we are in such a case when “min row” and “max row” are equal, or if “min column” and “max column” are equal. Then to resolve the case, we just do one pass instead of 4, including both endpoints.

Rust Solution

For our Rust solution, let’s start by defining important terms, like we always do. For our terms, we’ll mainly be dealing with these 4 “barrier” values, the min and max for the current row and column. These are inclusive, so they are initially 0 and (length - 1). We also make a new vector to hold our result values.

pub fn spiral_order(matrix: Vec<Vec<i32>>) -> Vec<i32> {
    let mut result: Vec<i32> = Vec::new();
    let mut minR: usize = 0;
    let mut maxR: usize = matrix.len() - 1;
    let mut minC: usize = 0;
    let mut maxC: usize = matrix[0].len() - 1;
    ...
}

Now we want to write a while loop where each iteration processes a single layer. We’ll know we are out of layers if either “minimum” exceeds its corresponding “maximum”. Then we can start penciling in the different cases and phases of the loop. The edge cases occur when a minimum is exactly equal to its maximum. And for the normal case, we’ll do our 4-directional scanning.

pub fn spiral_order(matrix: Vec<Vec<i32>>) -> Vec<i32> {
    let mut result: Vec<i32> = Vec::new();
    let mut minR: usize = 0;
    let mut maxR: usize = matrix.len() - 1;
    let mut minC: usize = 0;
    let mut maxC: usize = matrix[0].len() - 1;
    while (minR <= maxR && minC <= maxC) {
        // Edge cases: single row or single column layers
        if (minR == maxR) {
            ...
            break;
        } else if (minC == maxC) {
            ...
            break;
        }

        // Scan TL->TR
        ...
        // Scan TR->BR
        ...
        // Scan BR->BL
        ...
        // Scan BL->TL
        ...
        
        minR += 1;
        minC += 1;
        maxR -= 1;
        maxC -= 1;
    }
    return result;
}

Our “loop update” step comes at the end, when we increase both minimums, and decrease both maximums. This shows we are shrinking to the next layer.

Now we just have to fill in each case. All of these are scans through some portion of the matrix. The only trick is getting the ranges correct for each scan.

We’ll start with the edge cases. For a single row or column scan, we just need one loop. This loop should be inclusive across its dimension. Rust has a similar range syntax to Haskell, but it is less flexible. We can make a range inclusive by using = before the end element.

pub fn spiral_order(matrix: Vec<Vec<i32>>) -> Vec<i32> {
    ...
    while (minR <= maxR && minC <= maxC) {
        // Edge cases: single row or single column layers
        if (minR == maxR) {
            for i in minC..=maxC {
                result.push(matrix[minR][i]);
            }
            break;
        } else if (minC == maxC) {
            for i in minR..=maxR {
                result.push(matrix[i][minC]);
            }
            break;
        }
        ...
    }
    return result;
}

Now let’s fill in the other cases. Again, getting the right ranges is the most important factor. We also have to make sure we don’t mix up our dimensions or directions! We go right along minR, down along maxC, left along maxR, and then up along minC.

To represent a decreasing range, we have to make the corresponding incrementing range and then use .rev() to reverse it. This is a little inconvenient, giving us ranges that don’t look as nice, like for i in ((minC+1)..=maxC).rev(), because we want the decrementing range to include maxC but exclude minC.

pub fn spiral_order(matrix: Vec<Vec<i32>>) -> Vec<i32> {
    ...
    while (minR <= maxR && minC <= maxC) {
        ...
        // Scan TL->TR
        for i in minC..maxC {
            result.push(matrix[minR][i]);
        }
        // Scan TR->BR
        for i in minR..maxR {
            result.push(matrix[i][maxC]);
        }
        // Scan BR->BL
        for i in ((minC+1)..=maxC).rev() {
            result.push(matrix[maxR][i]);
        }
        // Scan BL->TL
        for i in ((minR+1)..=maxR).rev() {
            result.push(matrix[i][minC]);
        }
        minR += 1;
        minC += 1;
        maxR -= 1;
        maxC -= 1;
    }
    return result;
}

But once these cases are filled in, we’re done! Here’s the full solution:

pub fn spiral_order(matrix: Vec<Vec<i32>>) -> Vec<i32> {
    let mut result: Vec<i32> = Vec::new();
    let mut minR: usize = 0;
    let mut maxR: usize = matrix.len() - 1;
    let mut minC: usize = 0;
    let mut maxC: usize = matrix[0].len() - 1;
    while (minR <= maxR && minC <= maxC) {
        // Edge cases: single row or single column layers
        if (minR == maxR) {
            for i in minC..=maxC {
                result.push(matrix[minR][i]);
            }
            break;
        } else if (minC == maxC) {
            for i in minR..=maxR {
                result.push(matrix[i][minC]);
            }
            break;
        }
        // Scan TL->TR
        for i in minC..maxC {
            result.push(matrix[minR][i]);
        }
        // Scan TR->BR
        for i in minR..maxR {
            result.push(matrix[i][maxC]);
        }
        // Scan BR->BL
        for i in ((minC+1)..=maxC).rev() {
            result.push(matrix[maxR][i]);
        }
        // Scan BL->TL
        for i in ((minR+1)..=maxR).rev() {
            result.push(matrix[i][minC]);
        }
        minR += 1;
        minC += 1;
        maxR -= 1;
        maxC -= 1;
    }
    return result;
}

Haskell Solution

Now let’s write our Haskell solution. We don’t need any fancy mutation tricks here. Our function will just take a 2D array, and return a list of numbers.

spiralMatrix :: A.Array (Int, Int) Int -> [Int]
spiralMatrix arr = ...
  where
    ((minR', minC'), (maxR', maxC')) = A.bounds arr

Since we used a while loop in our Rust solution, it makes sense that we’ll want to use a raw recursive function that we’ll just call f. Our loop state was the 4 “barrier” values in each dimension. We’ll also use an accumulator value for our result. Since our barriers are inclusive, we can simply use the bounds of our array for the initial values.

spiralMatrix :: A.Array (Int, Int) Int -> [Int]
spiralMatrix arr = f minR' minC' maxR' maxC' []
  where
    ((minR', minC'), (maxR', maxC')) = A.bounds arr

    f :: Int -> Int -> Int -> Int -> [Int] -> [Int]
    f = undefined

This recursive function has 3 base cases. First, we have the “loop condition” we used in our Rust solution. If a min dimension value exceeds the max, we are done, and should return our accumulated result list.

Then the other two cases are our edge cases of having a single row or a single column for our final layer. In all these cases, we want to reverse the accumulated list. This means that when we put together our ranges, we want to be careful that they are in reverse order! So the edge cases should start at their max value and decrease to the min value (inclusive).

spiralMatrix :: A.Array (Int, Int) Int -> [Int]
spiralMatrix arr = f minR' minC' maxR' maxC' []
  where
    ((minR', minC'), (maxR', maxC')) = A.bounds arr

    f :: Int -> Int -> Int -> Int -> [Int] -> [Int]
    f minR minC maxR maxC acc
      | minR > maxR || minC > maxC = reverse acc
      | minR == maxR = reverse $ [arr A.! (minR, c) | c <- [maxC,maxC - 1..minC]] <> acc
      | minC == maxC = reverse $ [arr A.! (r, minC) | r <- [maxR,maxR - 1..minR]] <> acc
      | otherwise = ...

Now to fill in the otherwise case, we can do our 4 steps: going right from the top left, then going down from the top right, going left from the bottom right, and going up from the bottom left.

Like the edge cases, we make list comprehensions with ranges to pull the new numbers out of our input matrix. And again, we have to make sure we accumulate them in reverse order. Then we append all of them to the existing accumulation.

spiralMatrix :: A.Array (Int, Int) Int -> [Int]
spiralMatrix arr = f minR' minC' maxR' maxC' []
  where
    ((minR', minC'), (maxR', maxC')) = A.bounds arr

    f :: Int -> Int -> Int -> Int -> [Int] -> [Int]
    f minR minC maxR maxC acc
      ...
      | otherwise =
          let goRights = [arr A.! (minR, c) | c <- [maxC - 1, maxC - 2..minC]]
              goDowns = [arr A.! (r, maxC) | r <- [maxR - 1, maxR - 2..minR]]
              goLefts = [arr A.! (maxR, c) | c <- [minC + 1..maxC]]
              goUps = [arr A.! (r, minC) | r <- [minR+1..maxR]]
              acc' = goUps <> goLefts <> goDowns <> goRights <> acc
          in  f (minR + 1) (minC + 1) (maxR - 1) (maxC - 1) acc'

We conclude by making our recursive call with the updated result list, and shifting the barriers to get to the next layer.

Here’s the full implementation:

spiralMatrix :: A.Array (Int, Int) Int -> [Int]
spiralMatrix arr = f minR' minC' maxR' maxC' []
  where
    ((minR', minC'), (maxR', maxC')) = A.bounds arr

    f :: Int -> Int -> Int -> Int -> [Int] -> [Int]
    f minR minC maxR maxC acc
      | minR > maxR || minC > maxC = reverse acc
      | minR == maxR = reverse $ [arr A.! (minR, c) | c <- [maxC,maxC - 1..minC]] <> acc
      | minC == maxC = reverse $ [arr A.! (r, minC) | r <- [maxR,maxR - 1..minR]] <> acc
      | otherwise =
          let goRights = [arr A.! (minR, c) | c <- [maxC - 1, maxC - 2..minC]]
              goDowns = [arr A.! (r, maxC) | r <- [maxR - 1, maxR - 2..minR]]
              goLefts = [arr A.! (maxR, c) | c <- [minC + 1..maxC]]
              goUps = [arr A.! (r, minC) | r <- [minR+1..maxR]]
              acc' = goUps <> goLefts <> goDowns <> goRights <> acc
          in  f (minR + 1) (minC + 1) (maxR - 1) (maxC - 1) acc'
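
As a quick sanity check (assuming the implementation above is in scope and the array library is imported qualified as A, which the code already suggests), we can run spiralMatrix on the 3x4 example from earlier:

import qualified Data.Array as A

-- The 3x4 example from above, built in row-major order.
example :: A.Array (Int, Int) Int
example = A.listArray ((0, 0), (2, 3)) [1 .. 12]

main :: IO ()
main = print (spiralMatrix example)
-- [1,2,3,4,8,12,11,10,9,5,6,7]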

Conclusion

This is the last matrix-based problem we’ll study for now. Next time we’ll start considering some tree-based problems. If you sign up for our Solve.hs course, you’ll learn about both of these kinds of data structures in Module 2. You’ll implement a tree set from scratch, and you’ll get lots of practice working with these and many other structures. So enroll today!

by James Bowen at July 28, 2025 08:30 AM

GHC Developer Blog

GHC 9.10.3-rc1 is now available

wz1000 - 2025-07-28

The GHC developers are very pleased to announce the availability of the release candidate for GHC 9.10.3. Binary distributions, source distributions, and documentation are available at downloads.haskell.org and via GHCup.

GHC 9.10.3 is a bug-fix release fixing over 50 issues of a variety of severities and scopes. A full accounting of these fixes can be found in the release notes. As always, GHC’s release status, including planned future releases, can be found on the GHC Wiki status page.

This release candidate will have a two-week testing period. If all goes well the final release will be available the week of 11 August 2025.

We would like to thank Well-Typed, Tweag I/O, Juspay, QBayLogic, Channable, Serokell, SimSpace, the Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprises this release.

As always, do give this release a try and open a ticket if you see anything amiss.

by ghc-devs at July 28, 2025 12:00 AM

July 24, 2025

Tweag I/O

Introduction to the new LaunchDarkly Svelte SDK

Feature flags reduce deployment risk, enable continuous delivery, and create controlled user experiences. As a Svelte enthusiast, I noticed the absence of official LaunchDarkly support for this growing framework, so I built the LaunchDarkly Svelte SDK to fill this gap. In this post, I’ll introduce the SDK and demonstrate how to implement it in a SvelteKit application.

Feature Flags in Frontend Development

Feature flags (or feature toggles) are runtime-controlled switches that let you enable or disable features without unnecessary deployments.

For example, imagine you are working on a new feature that requires significant changes to the UI. By using feature flags, you can deploy the changes to all the environments but only enable the feature in specific ones (like development or uat), or to a subset of users in a single environment (like users on a Pro subscription). This allows you to test the feature without exposing it to unintended users, reducing the risk of introducing bugs or breaking changes. And if things go bad, say a feature is not working as expected, you can easily disable it without having to roll back the entire deployment.

What is LaunchDarkly?

LaunchDarkly is a feature management platform that provides an easy and scalable way to wrap parts of your code (new features, UI elements, backend changes) in flags so they can be turned on/off without redeploying. It provides a user-friendly dashboard to manage and observe flags, and supports over a dozen SDKs for client/server platforms. In my experience, LaunchDarkly is easier to use — including for non-technical users — and more scalable than most home-grown feature flag solutions.

LaunchDarkly supports targeting and segmentation, so you can control which users see specific features based on things like a user’s location or subscription plan. It also offers solid tooling for running experiments, including A/B testing and progressive rollouts (where a new feature is released to users in stages, rather than all at once). All feature flags can be updated in real-time, meaning that there’s no need for users to refresh the page to see changes.

Those are just my favorites, but if you are interested in learning more about it, LaunchDarkly has a blog post with more information.

Flag Evaluations

LaunchDarkly flags have unique identifiers called flag keys that are defined in the LaunchDarkly dashboard. When you request a flag value, supported client-side SDKs (such as React, iOS, Android, or, now, Svelte) send the flag key along with user information (called the “context”) to LaunchDarkly. LaunchDarkly’s server computes the value of the flag using all the applicable rules (the rules are applied in order) and sends the result back to the app. This process is called flag evaluation. By default, LaunchDarkly uses streaming connections to update flags in real time. This lets you flip flags in the dashboard and see the effect almost instantly in your app.

Svelte in Brief

Svelte is a modern JavaScript framework that I’ve come to appreciate for its performance, simplicity, and excellent developer experience. What I particularly like about Svelte is that it lets you write reactive code directly using standard JavaScript variables, with an intuitive syntax that requires less boilerplate than traditional React or Vue applications. Reactive declarations and stores are built into the framework, so you don’t need Redux or similar external state management libraries for most use cases.

Svelte’s Approach

  • Superior Runtime Performance: Svelte doesn’t rely on virtual DOM. By eliminating the virtual DOM and directly manipulating the real DOM, Svelte can update the UI more quickly and efficiently, resulting in a more responsive application.
  • Faster Load Times: Svelte’s compilation process generates smaller JavaScript bundles and more efficient code, resulting in faster initial page load times compared to frameworks that ship runtime libraries to the browser.

A Simple Example of a Svelte Component

In this example, we define a SimpleCounter component that increments a count when a button is clicked. The count variable is reactive, meaning that any changes to it will automatically update the UI.

// SimpleCounter.svelte
<script lang="ts">
  let count = $state(0);
</script>

<button onclick={() => count++}>
  clicks: {count}
</button>

Now, we can use this component in our application, which is in fact just another Svelte component, for example App.svelte:

// App.svelte
<script lang="ts">
  import SimpleCounter from './SimpleCounter.svelte';
</script>

<SimpleCounter />

After doing this, we can end up with something like this:

Simple Counter Demo

Overview of the LaunchDarkly Svelte SDK

Why Use a Dedicated Svelte SDK?

Although LaunchDarkly’s vanilla JavaScript SDK could be used in a Svelte application, this new SDK aligns better with Svelte’s reactivity model and integrates with Svelte-tailored components, allowing us to use LaunchDarkly’s features more idiomatically in our Svelte projects. I originally developed it as a standalone project and then contributed it upstream to be an official part of the LaunchDarkly SDK.

Introduction to LaunchDarkly Svelte SDK

Here are some basic steps to get started with the LaunchDarkly Svelte SDK:

1. Install the Package: First, install the SDK package in your project.

Note: Since the official LaunchDarkly Svelte SDK has not been released yet, for the purposes of this blog post, I’ve created a temporary package available on npm that contains the same code as the official repo. You can still check the official source code in LaunchDarkly’s official repository.

npm install @nosnibor89/svelte-client-sdk

2. Initialize the SDK: Next, you need to initialize the SDK with your LaunchDarkly client-side ID (you need a LaunchDarkly account). This is done using the LDProvider component, which provides the necessary context for feature flag evaluation. Here is an example of how to set it up:

<script lang="ts">
  import { LDProvider } from '@nosnibor89/svelte-client-sdk';
  import MyLayout from './MyLayout.svelte';
</script>

// Use context relevant to your application. More info in https://docs.launchdarkly.com/home/observability/contexts
const context = {
  user: {
    key: 'user-key',
  },
};

<LDProvider clientID="your-client-side-id" {context}>
  <MyLayout />
</LDProvider>

Let’s clarify the code above:

  1. Notice how I wrapped the MyLayout component with the LDProvider component. Usually, you will wrap a high-level component that encompasses most of your application with LDProvider, although it’s up to you and how you want to structure the app.
  2. You can also notice two parameters provided to our LDProvider. The "your-client-side-id" refers to the LaunchDarkly Client ID and the context object refers to the LaunchDarkly Context used to evaluate feature flags. This is necessary information we need to provide for the SDK to work correctly.

3. Evaluate a flag: The SDK provides the LDFlag component for evaluating your flag1. This component covers a common use case where you want to render different content based on the state of a feature flag. By default, LDFlag takes a boolean flag but can be extended to work with the other LaunchDarkly flag types as well.

<script lang="ts">
 import { LDFlag } from '@nosnibor89/svelte-client-sdk';
</script>

<LDFlag flag={'my-feature-flag'}>
  {#snippet on()}
    <p>renders if flag evaluates to true</p>
  {/snippet}
  {#snippet off()}
    <p>renders if flag evaluates to false</p>
  {/snippet}
</LDFlag>

In this example, the LDFlag component will render the content inside the on snippet2 if the feature flag my-feature-flag evaluates to true. If the flag evaluates to false, the content inside the off snippet will be rendered instead.

Building an application with SvelteKit

Now that we have seen the basics of how to use the LaunchDarkly Svelte SDK, let’s see how we can put everything together in a real application.

For the sake of brevity, I’ll be providing the key source code in this example, but if you are curious or need help, you can check out the full source code on GitHub.

How the app works

This is a simple ‘movies’ app where the main page displays a list of movies in a card format with a SearchBar component at the top. This search bar allows users to filter movies based on the text entered.

App Demo

The scenario we’re simulating is that Product Owners want to replace the traditional search bar with a new AI-powered assistant that helps users get information about specific movies. This creates a perfect use case for feature flags and can be described as follows:

Feature Flag Scenarios

  1. SearchBar vs AI Assistant: We’ll use a boolean feature flag to determine whether to display the classic SearchBar component or the new MoviesSmartAssistant3 component - simulating a simple all-at-once release.

  2. AI Model Selection: We’ll use a JSON feature flag to determine which AI model (GPT or Gemini) the MoviesSmartAssistant will use. This includes details about which model to use for specific users, along with display information like labels. This simulates a progressive rollout where Product Owners can gather insights on which model performs better.

Prerequisites

To follow along, you’ll need:

  1. A LaunchDarkly account
  2. A LaunchDarkly Client ID (Check this guide to get it)
  3. Two feature flags (see the creating new flags guide): a boolean flag (show-movie-smart-assistant) and a JSON flag (smart-assistant-config) looking like this:
    {
      "model": "gpt-4",
      "label": "Ask GPT-4 anything"
    }
  4. A SvelteKit4 application (create with npx sv create my-app)

Integrating the LaunchDarkly Svelte SDK

After creating the project, you will have a scaffolded SvelteKit application, meaning you should have a src directory where your application code resides. Inside this folder, you will find a routes directory, which is where SvelteKit handles routing. More specifically, there are two files, +layout.svelte and +page.svelte, which are the main files we are going to highlight in this post.

Setting up the layout

// src/routes/+layout.svelte
<script lang="ts">
  import "../app.css";
  import { LDProvider } from "@nosnibor89/svelte-client-sdk";
  import { PUBLIC_LD_CLIENT_ID } from '$env/static/public';
  import LoadingSpinner from "$lib/LoadingSpinner.svelte"; // Check source code in Github https://github.com/tweag/blog-resources/blob/master/launchdarkly-svelte-sdk-intro/src/lib/LoadingSpinner.svelte

  let { children } = $props();

  // random between 0 and 1
  const orgId = Math.round(Math.random());

  const orgKey = `sdk-example-org-${orgId}`


  const ldContext = {
    kind: "org",
    key: orgKey,
  };

</script>

<LDProvider clientID={PUBLIC_LD_CLIENT_ID} context={ldContext}>
  {#snippet initializing()}
    <div class="...">
      <LoadingSpinner message={"Loading flags"}/>
    </div>
  {/snippet}

  {@render children()}
</LDProvider>

Let’s analyze this:

  1. We are importing the LDProvider component from the LaunchDarkly Svelte SDK and wrapping our layout with it. In SvelteKit, the layout will act as the entry point for our application, so it’s a good place for us to initialize the SDK allowing us to use other members of the SDK in pages or child components.
  2. We are also importing the PUBLIC_LD_CLIENT_ID variable from the environment variables. You can set this variable in your .env file at the root of the project (this is a SvelteKit feature).
  3. Another thing to notice is that we are using a LoadingSpinner component while the SDK is initializing. This is optional and is a good place to provide feedback to the user while the SDK is loading and feature flags are being evaluated for the first time. Also, don’t worry about the code for LoadingSpinner, you can find it in the source code on Github.

Add the movies page

At this point, we are ready to start evaluating flags, so let’s now go ahead and add our page where the SDK will help us accomplish scenarios 1 and 2.

Movies Page (SearchBar vs AI Assistant)

The movies page is the main and only page of our application. It displays a list of movies along with a search bar. This is the part where we will evaluate our first feature flag to switch between the SearchBar and the MoviesSmartAssistant components.

// src/routes/+page.svelte
<script lang="ts">
  // ...some imports hidden for brevity. Check source code on Github
  import SearchBar from "$lib/SearchBar.svelte";
  import MoviesSmartAssistant from "$lib/MoviesSmartAssistant.svelte";
  import { LD, LDFlag } from "@nosnibor89/svelte-client-sdk";

  let searchQuery = $state("");
  let prompt = $state("");
  const flagKey = "show-movie-smart-assistant";
  const flagValue = LD.watch(flagKey);
  flagValue.subscribe((value) => {
    // remove search query or prompt when flag changes
      searchQuery = "";
      prompt = "";
  });

  // ...rest of the code hidden for brevity. Check source code on Github
  // https://github.com/tweag/blog-resources/blob/master/launchdarkly-svelte-sdk-intro/src/routes/%2Bpage.svelte

</script>

<div class="...">
  <LDFlag flag={flagKey}>
    {#snippet on()}
      <MoviesSmartAssistant
        prompt={prompt}
        onChange={handlePromptChange}
        onSubmit={handleSendPrompt}
      />
    {/snippet}
    {#snippet off()}
      <SearchBar value={searchQuery} onSearch={handleSearch} />
    {/snippet}
  </LDFlag>

  <div
    class="..."
  >
    {#each filteredMovies as movie}
      <MovieCard {movie} />
    {/each}
  </div>
</div>

Again, let’s break this down:

  1. We are using the LDFlag component from the SDK. It will allow us to determine which component to render based on the state of the show-movie-smart-assistant feature flag. When the flag evaluates to true, the on snippet will run, meaning the MoviesSmartAssistant component will be rendered, and when the flag evaluates to false, the off snippet will run, meaning the SearchBar component will be rendered.
  2. We are also using the LD.watch function. This is useful when you need to get the state of a flag and keep track of it. In this case, we are simply resetting the search query or prompt so that the user can start fresh when the flag changes.
  3. The rest of the code you are not seeing is just functionality for the filtering mechanism and the rest of the presentational components. Remember you can find the code for those on Github.

MoviesSmartAssistant Component (AI Model Selection)

Whenever our MoviesSmartAssistant component is rendered, we want to check the value of the smart-assistant-config feature flag to determine which AI model to use for the assistant.

// src/lib/MoviesSmartAssistant.svelte
<script lang="ts">
  import { LD } from "@nosnibor89/svelte-client-sdk";
  import type { Readable } from "svelte/store";

  type MoviesSmartAssistantConfig = { model: string; label: string;};
  const smartAssistantConfig = LD.watch("smart-assistant-config") as Readable<MoviesSmartAssistantConfig>;
  // ... rest of the code hidden for brevity. Check source code on Github
  // https://github.com/tweag/blog-resources/blob/master/launchdarkly-svelte-sdk-intro/src/lib/MoviesSmartAssistant.svelte
</script>

<div class="...">
  <input
    type="text"
    placeholder={$smartAssistantConfig?.label ?? "Ask me anything..."}
    value={prompt}
    oninput={handleInput}
    class="..."
  />
  <button type="button" onclick={handleClick} aria-label="Submit">
    // ...svg code hidden for brevity
  </button>
</div>

As before, I’m hiding some code for brevity, but here are the key points:

  1. We are using the LD.watch method to watch for changes in the smart-assistant-config feature flag which contains information about the AI model. This will allow us to use the proper model for a given user based on the flag evaluation.
  2. Notice how the SDK understands it’s a JSON flag and returns a Javascript object (with a little help5) as we defined in the LaunchDarkly dashboard.

Running the Application

Now that we have everything set up, let’s run the application. Here we are going to use the Client ID and set it as an environment variable.

PUBLIC_LD_CLIENT_ID={your_client_id} npm run dev

Open your browser and navigate to http://localhost:5173 (check your terminal as it may run at a different port). You should see the movies application with either the SearchBar or MoviesSmartAssistant component depending on your feature flag configuration.

Seeing Feature Flags in Action

If you were able to correctly set everything up, you should be able to interact with the application and LaunchDarkly Dashboard by toggling the feature flags and validating the behavior of the application.

I have included this demo video to show you how the application works and how the feature flags are being evaluated.

Conclusion

We just saw how to use the LaunchDarkly Svelte SDK and integrate it into a SvelteKit application using a realistic example. I hope this post gave you an understanding of the features the SDK provides, and also of what it still lacks in these early stages while we await the official release.

For now, my invitation for you is to try the SDK yourself and explore different use cases. For example, change the context with LD.identify to simulate users signing in to an application, or maybe try a different flag type like a string or number flag. Also, stay tuned for updates on the official LaunchDarkly Svelte SDK release.


  1. LDFlag is a key component but there are other ways to evaluate a flag using the SDK.
  2. Snippets are a Svelte feature and can also be named slots. Check out https://svelte.dev/docs/svelte/snippet
  3. The MoviesSmartAssistant component is just a visual representation without actual AI functionality — my focus is on demonstrating how the LaunchDarkly Svelte SDK enables these feature flag implementations.
  4. SvelteKit is the official application framework for Svelte. It comes with out-of-the-box support for TypeScript, server-side rendering, and automatic routing through file-based organization.
  5. Ok, I’m also using TypeScript here to hint the type of the object returned by the LD.watch method. Maybe this is something to fix in the future.

July 24, 2025 12:00 AM

July 23, 2025

Well-Typed.Com

Pure parallelism (Haskell Unfolder #47)

Today, 2025-07-23, at 1830 UTC (11:30 am PDT, 2:30 pm EDT, 7:30 pm GMT, 20:30 CET, …) we are streaming the 47th episode of the Haskell Unfolder live on YouTube.

Pure parallelism (Haskell Unfolder #47)

“Pure parallelism” refers to the execution of pure Haskell functions on multiple CPU cores, (hopefully) speeding up the computation. Since we are still dealing with pure functions, however, we get none of the problems normally associated with concurrent execution: no non-determinism, no need for locks, etc. In this episode we will develop a pure but parallel implementation of linear regression. We will briefly recap how linear regression works, before discussing the two primitive functions that Haskell offers for pure parallelism: par and pseq.
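
As a small taste of what par and pseq look like (a minimal sketch using a naive Fibonacci as a stand-in workload, not the episode’s linear-regression example; par and pseq live in Control.Parallel from the parallel package, and you need to compile with -threaded and run with +RTS -N to actually use several cores):

import Control.Parallel (par, pseq)

-- A deliberately naive Fibonacci, used only to generate CPU work.
fib :: Integer -> Integer
fib n
  | n < 2     = n
  | otherwise = fib (n - 1) + fib (n - 2)

-- a `par` b sparks the evaluation of a in parallel and returns b;
-- a `pseq` b evaluates a before returning b. Together they let the two
-- recursive calls be evaluated at the same time.
parFib :: Integer -> Integer
parFib n
  | n < 20    = fib n   -- below this (arbitrary) threshold, sparking isn't worth it
  | otherwise = a `par` (b `pseq` (a + b))
  where
    a = parFib (n - 1)
    b = parFib (n - 2)

main :: IO ()
main = print (parFib 35)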

About the Haskell Unfolder

The Haskell Unfolder is a YouTube series about all things Haskell hosted by Edsko de Vries and Andres Löh, with episodes appearing approximately every two weeks. All episodes are live-streamed, and we try to respond to audience questions. All episodes are also available as recordings afterwards.

We have a GitHub repository with code samples from the episodes.

And we have a public Google calendar (also available as ICal) listing the planned schedule.

There’s now also a web shop where you can buy t-shirts and mugs (and potentially in the future other items) with the Haskell Unfolder logo.

by andres, edsko at July 23, 2025 12:00 AM

July 21, 2025

Monday Morning Haskell

Image Rotation: Mutable Arrays in Haskell

In last week’s article, we took our first step into working with multi-dimensional arrays. Today, we’ll be working with another Matrix problem that involves in-place mutation. The Haskell solution uses the MArray interface, which takes us out of our usual immutable, purely functional patterns.

The MArray interface is a little tricky to work with. If you want a full overview of the API, you should sign up for our Solve.hs course, where we cover mutable arrays in module 2!

The Problem

Today’s problem is Rotate Image. We’re going to take a 2D Matrix of integer values as our input and rotate the matrix 90 degrees clockwise. We must accomplish this in place, modifying the input value without allocating a new Matrix. The input matrix is always “square” (n x n).

Here are a few examples to illustrate the idea. We can start with a 2x2 matrix:

1  2   |   3  1
3  4   |   4  2

The 4x4 rotation makes it clearer that we’re not just moving numbers one space over. Each corner element will go to a new corner. You can also see how the inside of the matrix is also rotating:

1  2  3  4    |  13  9  5  1
5  6  7  8    |  14 10  6  2
9  10 11 12   |  15 11  7  3
13 14 15 16   |  16 12  8  4

The 3x3 version shows how, with an odd number of rows and columns, the innermost number stays in place.

1  2  3   |   7  4  1
4  5  6   |   8  5  2
7  8  9   |   9  6  3

The Algorithm

While this problem might be a little intimidating at first, we just have to break it into sufficiently small and repeatable pieces. The core step is that we swap four numbers into each other’s positions. It’s easy to see, for example, that the four corners always trade places with one another (1, 4, 13, 16 in the 4x4 example).

What’s important is seeing the other sets of 4. We move clockwise to get the next 4 values:

  1. The value to the right of the top left corner
  2. The value below the top right corner
  3. The value to the left of the bottom right corner
  4. The value above the bottom left corner.

So in the 4x4 example, these would be 2, 8, 15, 9. Then another group is 3, 12, 14, 5.

Those 3 groups are all the rotations we need for the “outer layer”. Then we move to the next layer, where we have a single group of 4: 6, 7, 10, 11.

This should tell us that we have a 3-step process:

  1. Loop through each layer of the matrix
  2. Identify all groups of 4 in this layer
  3. Rotate each group of 4

It helps to put a count on the size of each of these loops. For an n x n matrix, the number of layers to rotate is n / 2, rounded down, because the inner-most layer needs no rotation in an odd-sized matrix.

Then for a layer spanning from column c1 to c2, the number of groups in that layer is just c2 - c1. So for the first layer in a 4x4, we span columns 0 to 3, and there are 3 groups of 4. In the inner layer, we span columns 1 to 2, so there is only 1 group of 4.
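
To make the counting concrete, here’s a small sketch (not part of the solution below) that lists the coordinates of every group of 4 in an n x n matrix, using the layer/offset scheme described above:

-- For each layer, tl is the index of the top-left corner and br the
-- index of the bottom-right corner; j walks along the layer's edge.
groups :: Int -> [[(Int, Int)]]
groups n =
  [ [(tl, tl + j), (tl + j, br), (br, br - j), (br - j, tl)]
  | layer <- [0 .. n `div` 2 - 1]
  , let tl = layer
        br = n - 1 - layer
  , j <- [0 .. br - tl - 1]
  ]

For n = 4 this yields the positions of 1/4/16/13, 2/8/15/9, 3/12/14/5, and finally 6/7/11/10 in the inner layer.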

Rust Solution

As is typical, we’ll see more of a loop structure in our Rust code, and a recursive version of this solution in Haskell. We’ll also start by defining various terms we’ll use. There are multiple ways to approach the details of this problem, but we’ll take an approach that maximizes the clarity of our inner loops.

We’ll define each “layer” using the four corner coordinates of that layer. So for an n x n matrix, these are (0,0), (0, n - 1), (n - 1, n - 1), (n - 1, 0). After we finish looping through a layer, we can simply increment/decrement each of these values as appropriate to get the corner coordinates of the next layer ((1,1), (1, n - 2), etc.).

So let’s start our solution by defining the 8 mutable values for these 4 corners. Each corner (top/left/bottom/right) has a row R and column C value.

pub fn rotate(matrix: &mut Vec<Vec<i32>>) {
    let n = matrix.len();
    let numLayers = n / 2;
    let mut topLeftR = 0;
    let mut topLeftC = 0;
    let mut topRightR = 0;
    let mut topRightC = n - 1;
    let mut bottomRightR = n - 1;
    let mut bottomRightC = n - 1;
    let mut bottomLeftR = n - 1;
    let mut bottomLeftC = 0;
    ...
}

It would be possible to solve the problem without these values, determining coordinates using the layer number. But I’ve found this to be somewhat more error prone, since we’re constantly adding and subtracting from different coordinates in different combinations. We get the number of layers from n / 2.

Now let’s frame the outer loop. We conclude the loop by modifying each coordinate point. Then at the beginning of the loop, we can determine the number of “groups” for the layer by taking the difference between the left and right column coordinates.

pub fn rotate(matrix: &mut Vec<Vec<i32>>) {
    ...
    for i in 0..numLayers {
        let numGroups = topRightC - topLeftC;

        for j in 0..numGroups {
            ...
        }

        topLeftR += 1;
        topLeftC += 1;
        topRightR += 1;
        topRightC -= 1;
        bottomRightR -= 1;
        bottomRightC -= 1;
        bottomLeftR -= 1;
        bottomLeftC += 1;
    }
}

Now we just need the logic for rotating a single group of 4 points. This is a 5-step process:

  1. Save top left value as temp
  2. Move bottom left to top left
  3. Move bottom right to bottom left
  4. Move top right to bottom right
  5. Move temp (original top left) to top right

Unlike the layer number, we’ll use the group variable j for arithmetic here. When you’re writing this yourself, it’s important to go slowly to make sure you’re using the right corner values and adding/subtracting j from the correct dimension.

pub fn rotate(matrix: &mut Vec<Vec<i32>>) {
    ...
    for i in 0..numLayers {
        let numGroups = topRightC - topLeftC;

        for j in 0..numGroups {
            let temp = matrix[topLeftR][topLeftC + j];
            matrix[topLeftR][topLeftC + j] = matrix[bottomLeftR - j][bottomLeftC];
            matrix[bottomLeftR - j][bottomLeftC] = matrix[bottomRightR][bottomRightC - j];
            matrix[bottomRightR][bottomRightC - j] = matrix[topRightR + j][topRightC];
            matrix[topRightR + j][topRightC] = temp;
        }

        ... // (update corners)
    }
}

And then we’re done! We don’t actually need to return a value since we’re just modifying the input in place. Here’s the full solution:

pub fn rotate(matrix: &mut Vec<Vec<i32>>) {
    let n = matrix.len();
    let numLayers = n / 2;
    let mut topLeftR = 0;
    let mut topLeftC = 0;
    let mut topRightR = 0;
    let mut topRightC = n - 1;
    let mut bottomRightR = n - 1;
    let mut bottomRightC = n - 1;
    let mut bottomLeftR = n - 1;
    let mut bottomLeftC = 0;
    for i in 0..numLayers {
        let numGroups = topRightC - topLeftC;

        for j in 0..numGroups {
            let temp = matrix[topLeftR][topLeftC + j];
            matrix[topLeftR][topLeftC + j] = matrix[bottomLeftR - j][bottomLeftC];
            matrix[bottomLeftR - j][bottomLeftC] = matrix[bottomRightR][bottomRightC - j];
            matrix[bottomRightR][bottomRightC - j] = matrix[topRightR + j][topRightC];
            matrix[topRightR + j][topRightC] = temp;
        }

        topLeftR += 1;
        topLeftC += 1;
        topRightR += 1;
        topRightC -= 1;
        bottomRightR -= 1;
        bottomRightC -= 1;
        bottomLeftR -= 1;
        bottomLeftC += 1;
    }
}

Haskell Solution

This is an interesting problem to solve in Haskell because Haskell is a generally immutable language. Unlike Rust, we can’t make values mutable just by putting the keyword mut in front of them.

With arrays, though, we can modify values in place using the MArray type class. We won’t go through all the details of the interface in this article (you can learn about all that in Solve.hs Module 2). But we’ll start with the type signature:

rotateImage :: (MArray array Int m) => array (Int, Int) Int -> m ()

This tells us we are taking a mutable array, where the array type is polymorphic but tied to the monad m. For example, IOArray would work with the IO monad. We don’t return anything, because we’re modifying our input.
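
Here’s a tiny driver (not part of the solution, just a usage sketch) showing how such a signature gets used with an IOArray: we build the array from a list of rows, rotate it in place, and read back the flattened result.

import Data.Array.IO (IOArray, getElems, newListArray)

-- Build a square IOArray from a list of rows, rotate it in place,
-- and return the rotated elements in row-major order.
testRotate :: [[Int]] -> IO [Int]
testRotate rows = do
  let n = length rows
  arr <- newListArray ((0, 0), (n - 1, n - 1)) (concat rows)
           :: IO (IOArray (Int, Int) Int)
  rotateImage arr
  getElems arr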

We still begin our function by defining terms, but now we need to use monadic actions to retrieve even the bounds of our array.

rotateImage :: (MArray array Int m) => array (Int, Int) Int -> m ()
rotateImage arr = do
  ((minR, minC), (maxR, maxC)) <- getBounds arr
  let n = maxR - minR + 1
  let numLayers = n `quot` 2
  ...

Our algorithm has two loop levels. The outer loop goes through the different layers of the matrix. The inner loop goes through each group of 4 within the layer. In Haskell, both of these loops are recursive, monadic functions. Our Rust loops treat the four corner points of the layer as stateful values, so these need to be inputs to our recursive functions. In addition, each function will take the layer/group number as an input.

rotateImage :: (MArray array Int m) => array (Int, Int) Int -> m ()
rotateImage arr = do
  ((minR, minC), (maxR, maxC)) <- getBounds arr
  let n = maxR - minR + 1
  let numLayers = n `quot` 2
  ...
  where
    rotateLayer tl@(tlR, tlC) tr@(trR, trC) br@(brR, brC) bl@(blR, blC) n = ...
    
    rotateGroup (tlR, tlC) (trR, trC) (brR, brC) (blR, blC) j = ...

Now we just have to fill in these functions. For rotateLayer, we use the “layer number” parameter as a countdown. Once it reaches 0, we’ll be done. We just need to determine the number of groups in this layer using the column difference of left and right. Then we’ll call rotateGroup for each group.

We make the first call to rotateLayer with numLayers and the original corners, coming from our dimensions. When we recurse, we add/subtract 1 from the corner dimensions, and subtract 1 from the layer number.

rotateImage :: (MArray array Int m) => array (Int, Int) Int -> m ()
rotateImage arr = do
  ((minR, minC), (maxR, maxC)) <- getBounds arr
  let n = maxR - minR + 1
  let numLayers = n `quot` 2
  rotateLayer (minR, minC) (minR, maxC) (maxR, maxC) (maxR, minC) numLayers
  where
    rotateLayer _ _ _ _ 0 = return ()
    rotateLayer tl@(tlR, tlC) tr@(trR, trC) br@(brR, brC) bl@(blR, blC) n = do
      let numGroups = ([0..(trC - tlC - 1)] :: [Int])
      forM_ numGroups (rotateGroup tl tr br bl)
      rotateLayer (tlR + 1, tlC + 1) (trR + 1, trC - 1) (brR - 1, brC - 1) (blR - 1, blC + 1) (n - 1)
    
    rotateGroup (tlR, tlC) (trR, trC) (brR, brC) (blR, blC) j = ...

And how do we rotate a group? We use the same five steps we took in Rust. We save the top left as temp and then move the values around. We use the monadic functions readArray and writeArray to perform these actions in place on our Matrix.

rotateImage :: (MArray array Int m) => array (Int, Int) Int -> m ()
rotateImage arr = do
  ...
  where
    ...
    
    rotateGroup (tlR, tlC) (trR, trC) (brR, brC) (blR, blC) j = do
      temp <- readArray arr (tlR, tlC + j)
      readArray arr (blR - j, blC) >>= writeArray arr (tlR, tlC + j)
      readArray arr (brR, brC - j) >>= writeArray arr (blR - j, blC)
      readArray arr (trR + j, trC) >>= writeArray arr (brR, brC - j)
      writeArray arr (trR + j, trC) temp

Here’s the full implementation:

rotateImage :: (MArray array Int m) => array (Int, Int) Int -> m ()
rotateImage arr = do
  ((minR, minC), (maxR, maxC)) <- getBounds arr
  let n = maxR - minR + 1
  let numLayers = n `quot` 2
  rotateLayer (minR, minC) (minR, maxC) (maxR, maxC) (maxR, minC) numLayers
  where
    rotateLayer _ _ _ _ 0 = return ()
    rotateLayer tl@(tlR, tlC) tr@(trR, trC) br@(brR, brC) bl@(blR, blC) n = do
      let numGroups = ([0..(trC - tlC - 1)] :: [Int])
      forM_ numGroups (rotateGroup tl tr br bl)
      rotateLayer (tlR + 1, tlC + 1) (trR + 1, trC - 1) (brR - 1, brC - 1) (blR - 1, blC + 1) (n - 1)
    
    rotateGroup (tlR, tlC) (trR, trC) (brR, brC) (blR, blC) j = do
      temp <- readArray arr (tlR, tlC + j)
      readArray arr (blR - j, blC) >>= writeArray arr (tlR, tlC + j)
      readArray arr (brR, brC - j) >>= writeArray arr (blR - j, blC)
      readArray arr (trR + j, trC) >>= writeArray arr (brR, brC - j)
      writeArray arr (trR + j, trC) temp

Conclusion

We’ve got one more Matrix problem to solve next time, and then we’ll move on to some other data structures. To learn more about using Data Structures and Algorithms in Haskell, you can take our Solve.hs course. You’ll get the chance to write a number of data structures from scratch, and you’ll get plenty of practice working with them and using them in algorithms!

by James Bowen at July 21, 2025 08:30 AM

July 18, 2025

Brent Yorgey

Competitive programming in Haskell: sparse tables

Competitive programming in Haskell: sparse tables

Continuing a series of posts on techniques for calculating range queries, today I will present the sparse table data structure, for doing fast range queries on a static sequence with an idempotent combining operation.

Motivation

In my previous post, we saw that if we have a static sequence and a binary operation with a group structure (i.e. every element has an inverse), we can precompute a prefix sum table in \(O(n)\) time, and then use it to answer arbitrary range queries in \(O(1)\) time.

What if we don’t have inverses? We can’t use prefix sums, but can we do something else that still allows us to answer range queries in \(O(1)\)? One thing we could always do would be to construct an \(n \times n\) table storing the answer to every possible range query—that is, \(Q[i,j]\) would store the value of the range \(a_i \diamond \dots \diamond a_j\). Then we could just look up the answer to any range query in \(O(1)\). Naively computing the value of each \(Q[i,j]\) would take \(O(n)\) time, for a total of \(O(n^3)\) time to fill in the entries in the table (we only have to fill in \(Q[i,j]\) where \(i < j\), but this is still about \(n^2/2\) entries), though it’s not too hard to fill in the table in \(O(n^2)\) total time, spending only \(O(1)\) to fill in each entry—I’ll leave this to you as an exercise.

However, \(O(n^2)\) is often too big. Can we do better? More generally, we are looking for a particular subset of range queries to precompute, such that the total number is asymptotically less than \(n^2\), but we can still compute the value of any arbitrary range query by combining some (constant number of) precomputed ranges. In the case of a group structure, we were able to compute the values for only prefix ranges of the form \(1 \dots k\), then compute the value of an arbitrary range using two prefixes, via subtraction.

A sparse table is exactly such a scheme for precomputing a subset of ranges. (In fact, I believe, but do not know for sure, that this is where the name “sparse table” comes from—it is “sparse” in the sense that it only stores a sparse subset of range values.)

Rather than only a linear number of ranges, as with prefix sums, we have to compute \(O(n \lg n)\) of them, but that’s still way better than \(O(n^2)\). Note, however, that a sparse table only works when the combining operation is idempotent, that is, when \(x \diamond x = x\) for all \(x\). For example, we can use a sparse table with combining operations such as \(\max\) or \(\gcd\), but not with \(+\) or \(\times\). Let’s see how it works.
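
For a concrete feel of the requirement, a quick GHCi check shows the difference between an idempotent and a non-idempotent semigroup:

λ> import Data.Semigroup
λ> Max 3 <> Max 3
Max {getMax = 3}
λ> Sum 3 <> Sum 3
Sum {getSum = 6}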

Sparse tables

The basic idea behind a sparse table is that we precompute a series of “levels”, where level \(i\) stores values for ranges of length \(2^i\). So level \(0\) stores “ranges of length \(1\)”—that is, the elements of the original sequence; level \(1\) stores ranges of length \(2\); level \(2\) stores ranges of length \(4\); and so on. Formally, \(T[i,j]\) stores the value of the range of length \(2^i\) starting at index \(j\). That is,

\[T[i,j] = a_j \diamond \dots \diamond a_{j+2^i-1}.\]

We can see that \(i\) only needs to go from \(0\) up to \(\lfloor \lg n \rfloor\); above that and the stored ranges would be larger than the entire sequence. So this table has size \(O(n \lg n)\).

Two important questions remain: how do we compute this table in the first place? And once we have it, how do we use it to answer arbitrary range queries in \(O(1)\)?

Computing the table is easy: each range on level \(i\), of length \(2^i\), is the combination of two length-\(2^{i-1}\) ranges from the previous level. That is,

\[T[i,j] = T[i-1, j] \diamond T[i-1, j+2^{i-1}]\]

The zeroth level just consists of the elements of the original sequence, and we can compute each subsequent level using values from the previous level, so we can fill in the entire table in \(O(n \lg n)\) time, doing just a single combining operation for each value in the table.

Once we have the table, we can compute the value of an arbitrary range \([l,r]\) as follows:

  • Compute the biggest power of two that fits within the range, that is, the largest \(k\) such that \(2^k \leq r - l + 1\). We can compute this simply as \(\lfloor \lg (r - l + 1) \rfloor\).

  • Look up two range values of length \(2^k\), one for the range which begins at \(l\) (that is, \(T[k, l]\)) and one for the range which ends at \(r\) (that is, \(T[k, r - 2^k + 1]\)). These two ranges overlap; but because the combining operation is idempotent, combining the values of the ranges yields the value for our desired range \([l,r]\).

    This is why we require the combining operation to be idempotent: otherwise the values in the overlap would be overrepresented in the final, combined value.

Haskell code

Let’s write some Haskell code! First, a little module for idempotent semigroups. Note that we couch everything in terms of semigroups, not monoids, because we have no particular need of an identity element; indeed, some of the most important examples like \(\min\) and \(\max\) don’t have an identity element. The IdempotentSemigroup class has no methods, since as compared to Semigroup it only adds a law. However, it’s still helpful to signal the requirement. You might like to convince yourself that all the instances listed below really are idempotent.

module IdempotentSemigroup where

import Data.Bits
import Data.Semigroup

-- | An idempotent semigroup is one where the binary operation
--   satisfies the law @x <> x = x@ for all @x@.
class Semigroup m => IdempotentSemigroup m

instance Ord a => IdempotentSemigroup (Min a)
instance Ord a => IdempotentSemigroup (Max a)
instance IdempotentSemigroup All
instance IdempotentSemigroup Any
instance IdempotentSemigroup Ordering
instance IdempotentSemigroup ()
instance IdempotentSemigroup (First a)
instance IdempotentSemigroup (Last a)
instance Bits a => IdempotentSemigroup (And a)
instance Bits a => IdempotentSemigroup (Ior a)
instance (IdempotentSemigroup a, IdempotentSemigroup b) => IdempotentSemigroup (a,b)
instance IdempotentSemigroup b => IdempotentSemigroup (a -> b)

Now, some code for sparse tables. First, a few imports.

{-# LANGUAGE TupleSections #-}

module SparseTable where

import Data.Array (Array, array, (!))
import Data.Bits (countLeadingZeros, finiteBitSize, (!<<.))
import IdempotentSemigroup

The sparse table data structure itself is just a 2D array over some idempotent semigroup m. Note that UArray would be more efficient, but (1) that would make the code for building the sparse table more annoying (more on this later), and (2) it would require a bunch of tedious additional constraints on m.

newtype SparseTable m = SparseTable (Array (Int, Int) m)
  deriving (Show)

We will frequently need to compute rounded-down base-two logarithms, so we define a function for it. A straightforward implementation would be to repeatedly shift right by one bit and count the number of shifts needed to reach zero; however, there is a better way, using Data.Bits.countLeadingZeros. It has a naive default implementation which counts right bit shifts, but in most cases it compiles down to much more efficient machine instructions.

-- | Logarithm base 2, rounded down to the nearest integer.  Computed
--   efficiently using primitive bitwise instructions, when available.
lg :: Int -> Int
lg n = finiteBitSize n - 1 - countLeadingZeros n

Now let’s write a function to construct a sparse table, given a sequence of values. Notice how the sparse table array st is defined recursively. This works because the Array type is lazy in the stored values, with the added benefit that only the array values we end up actually needing will be computed. However, this comes with a decent amount of overhead. If we wanted to use an unboxed array instead, we wouldn’t be able to use the recursive definition trick; instead, we would have to use an STUArray and fill in the values in a specific order. The code for this would be longer and much more tedious, but could be faster if we end up needing all the values in the array anyway.

-- | Construct a sparse table which can answer range queries over the
--   given list in $O(1)$ time.  Constructing the sparse table takes
--   $O(n \lg n)$ time and space, where $n$ is the length of the list.
fromList :: IdempotentSemigroup m => [m] -> SparseTable m
fromList ms = SparseTable st
 where
  n = length ms
  lgn = lg n

  st =
    array ((0, 0), (lgn, n - 1)) $
      zip ((0,) <$> [0 ..]) ms
        ++ [ ((i, j), st ! (i - 1, j) <> st ! (i - 1, j + 1 !<<. (i - 1)))
           | i <- [1 .. lgn]
           , j <- [0 .. n - 1 !<<. i]
           ]
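
If we did want the unboxed version, here is a rough sketch of what it could look like, specialised to Int with an explicit combining function (since UArray can’t store arbitrary semigroups); it reuses lg and the (!<<.) import from above, filling in the table level by level before freezing it:

import Control.Monad (forM_)
import Data.Array.ST (newArray_, readArray, runSTUArray, writeArray)
import Data.Array.Unboxed (UArray)

-- Hypothetical unboxed variant: fill an STUArray level by level and
-- freeze it into a UArray.
fromListU :: (Int -> Int -> Int) -> [Int] -> UArray (Int, Int) Int
fromListU op ms = runSTUArray $ do
  let n = length ms
      lgn = lg n
  st <- newArray_ ((0, 0), (lgn, n - 1))
  -- Level 0 is just the input sequence.
  forM_ (zip [0 ..] ms) $ \(j, m) -> writeArray st (0, j) m
  -- Each level i combines two adjacent length-2^(i-1) ranges from level i-1.
  forM_ [1 .. lgn] $ \i ->
    forM_ [0 .. n - (1 !<<. i)] $ \j -> do
      a <- readArray st (i - 1, j)
      b <- readArray st (i - 1, j + (1 !<<. (i - 1)))
      writeArray st (i, j) (a `op` b)
  return st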

Finally, we can write a function to answer range queries.

-- | $O(1)$. @range st l r@ computes the range query which is the
--   @sconcat@ of all the elements from index @l@ to @r@ (inclusive).
range :: IdempotentSemigroup m => SparseTable m -> Int -> Int -> m
range (SparseTable st) l r = st ! (k, l) <> st ! (k, r - (1 !<<. k) + 1)
 where
  k = lg (r - l + 1)

Applications

Most commonly, we can use a sparse table to find the minimum or maximum values on a range, \(\min\) and \(\max\) being the quintessential idempotent operations. For example, this plays a key role in a solution to the (quite tricky) problem Ograda. (At first it seemed like that problem should be solvable with some kind of sliding window approach, but I couldn’t figure out how to make it work!)

What if we want to find the index of the minimum or maximum value in a given range (see, for example, Worst Weather)? We can easily accomplish this using the semigroup Min (Arg m i) (or Max (Arg m i)), where m is the type of the values and i is the index type. Arg, from Data.Semigroup, is just a pair which uses only the first value for its Eq and Ord instances, and carries along the second value (which is also exposed via Functor, Foldable, and Traversable instances). In the example below, we can see that the call to range st 0 3 returns both the max value on the range (4) and its index (2) which got carried along for the ride:

λ> :m +Data.Semigroup
λ> st = fromList (map Max (zipWith Arg [2, 3, 4, 2, 7, 4, 9] [0..]))
λ> range st 0 3
Max {getMax = Arg 4 2}

Finally, I will mention that being able to compute range minimum queries is one way to compute lowest common ancestors for a (static, rooted) tree. First, walk the tree via a depth-first search and record the depth of each node encountered in sequence, a so-called Euler tour (note that you must record every visit to a node—before visiting any of its children, in between each child, and after visiting all the children). Now the minimum depth recorded between visits to any two nodes will correspond to their lowest common ancestor.
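
Here is a rough sketch of that idea (just an illustration, using a toy rose tree type), building a range-minimum structure over the Euler tour with the SparseTable from above:

import qualified Data.Map as M
import Data.Semigroup (Arg (..), Min (..))

-- A rose tree; each node carries a label.
data Tree a = Node a [Tree a]

-- Record (depth, label) at every visit: before any children, between
-- children, and after the last child.
eulerTour :: Tree a -> [(Int, a)]
eulerTour = go 0
  where
    go d (Node x ts) = (d, x) : concatMap (\t -> go (d + 1) t ++ [(d, x)]) ts

-- Range-minimum structure over the tour, plus each label's first visit index.
lcaTable :: Ord a => Tree a -> (SparseTable (Min (Arg Int a)), M.Map a Int)
lcaTable t = (fromList [Min (Arg d x) | (d, x) <- tour], firstVisit)
  where
    tour = eulerTour t
    firstVisit = M.fromListWith min (zip (map snd tour) [0 ..])

-- The label of minimum depth between the two first visits is the LCA.
lca :: Ord a => (SparseTable (Min (Arg Int a)), M.Map a Int) -> a -> a -> a
lca (st, firstVisit) u v = w
  where
    i = firstVisit M.! u
    j = firstVisit M.! v
    Min (Arg _ w) = range st (min i j) (max i j)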

Here are a few problems that involve computing least common ancestors in a tree, though note there are also other techniques for computing LCAs (such as binary jumping) which I plan to write about eventually.


by Brent Yorgey at July 18, 2025 12:00 AM

July 16, 2025

Stackage Blog

LTS 24 release for ghc-9.10 and Nightly now on ghc-9.12

Stackage LTS 24 has been released

The Stackage team is happy to announce that Stackage LTS version 24 was finally released a couple of days ago, based on GHC stable version 9.10.2.

LTS 24 includes many package changes, and over 3400 packages! Thank you for all your nightly contributions that made this release possible: the initial release was prepared by Mihai Maruseac. The closest nightly snapshot to lts-24.0 is nightly-2025-07-13.

If your package is missing from LTS 24 and can build there, you can easily have it added by opening a PR in lts-haskell to the build-constraints/lts-24-build-constraints.yaml file.

Stackage Nightly updated to ghc-9.12.2

At the same time we are excited to move Stackage Nightly to GHC 9.12.2: the initial snapshot release is nightly-2025-07-15. Current nightly has over 3100 packages, and we expect that number to grow over the coming weeks and months: we welcome your contributions and help with this. This initial release build was made by Jens Petersen (31 commits).

A number of packages have been disabled, with the switch to a new GHC version. You can see all the changes made relative to the preceding last 9.10 nightly snapshot. Apart from trying to build yourself, the easiest way to understand why particular packages are disabled is to look for their < 0 lines in build-constraints.yaml, particularly under the "Library and exe bounds failures" section. We also have some tracking issues still open related to 9.12 core boot libraries.

Thank you to all those who have already done work updating their packages for ghc-9.12.

Adding or enabling your package for Nightly is just a simple pull request to the large build-constraints.yaml file.

If you have questions, you can ask in Stack and Stackage Matrix room (#haskell-stack:matrix.org) or Slack channel.

July 16, 2025 07:00 AM

July 14, 2025

Monday Morning Haskell

Binary Search in a 2D Matrix

Last week, we covered a complex problem that used a binary search. Today, we’ll apply binary search again to solidify our understanding of it. This time, instead of extra algorithmic complexity, we’ll start adding some data structure complexity. We’ll be working with a 2D Matrix instead of basic arrays.

To learn more about data structures and algorithms in Haskell, you should take a look at our Solve.hs course! In particular, you’ll cover multi-dimensional arrays in module 2, and you’ll learn how to write algorithms in Haskell in module 3!

The Problem

Today’s problem is Search a 2D Matrix, and the description is straightforward. We’re given a 2D m x n matrix, as well as a target number. We have to return a boolean for whether or not that number is in the Matrix.

This is trivial with a simple scan, but we have an additional constraint that lets us solve the problem faster. The matrix is essentially ordered. Each row is non-decreasing, and the first element of each successive row is no smaller than the last element of the preceding row.

This allows us to get a solution that is O(log(n + m)), a considerable improvement over a linear scan.

The Algorithm

The algorithm is simple as well. We’ll do two binary searches. First, we’ll search over the rows to identify a row whose first and last values bracket the target. Then we’ll do a binary search of that row to see if the element is present or not.

We’ll have a slightly different form to our searches compared to last time. In last week’s problem, we knew we had to find a valid index for our search. Now, we may find that no valid index exists.

So we’ll structure our search interval in a semi-open fashion. The first index in our search interval is inclusive, meaning that it could still be a valid index. The second index is exclusive, meaning it is the lowest index that we consider invalid.

In mathematical notation, we would represent such an interval with a square bracket on the left and a parenthesis on the right. So if that interval is [0, 4), then 0, 1, 2, 3 are valid values. The interval [2,2) would be considered empty, with no valid values. We’ll see how we apply this idea in practice.

Rust Solution

We don’t have that many terms to define at the start of this solution. We’ll save the size of both dimensions, and then prepare for the first binary search by assigning low as 0 (the first potentially “valid” answer), hi as m (the lowest “invalid” answer), and creating our output rowWithTarget value. We initialize rowWithTarget to m as well, an intentionally invalid row index: if the binary search never reassigns it, we’re left with an easily testable invalid value.

pub fn search_matrix(matrix: Vec<Vec<i32>>, target: i32) -> bool {
    let m = matrix.len();
    let n = matrix[0].len();

    let mut low = 0;
    let mut hi = m;
    let mut rowWithTarget = m;
    ...
}

Now we write our first binary search, looking for a row that could contain our target value. We maintain the typical pattern of binary search, using the loop while (low < hi) and assigning mid = (low + hi) / 2.

pub fn search_matrix(matrix: Vec<Vec<i32>>, target: i32) -> bool {
    ...
    while (low < hi) {
        let mid: usize = (low + hi) / 2;
        if (matrix[mid][0] > target) {
            hi = mid;
        } else if (matrix[mid][n - 1] < target) {
            low = mid + 1;
        } else {
            rowWithTarget = mid;
            break;
        }
    }
    if (rowWithTarget >= m) {
        return false;
     }
    ...
}

If the first element of the row is too large, we know that mid is “invalid”, so we can assign it as hi and continue. If the last element is too small, then we reassign low as mid + 1, as we want low to still be a potentially valid value.

Otherwise, we have found a potential row, so we assign rowWithTarget and break. If, after this search, rowWithTarget has the “invalid” value of m, we can return false, as there are no valid values.

Now we just do the same thing over again, but within rowWithTarget! We reassign low and hi (as n this time) to reset the while loop. And now our comparisons will look at the specific value matrix[rowWithTarget][mid].

pub fn search_matrix(matrix: Vec<Vec<i32>>, target: i32) -> bool {
    ...
    low = 0;
    hi = n;
    while (low < hi) {
        let mid: usize = (low + hi) / 2;
        if (matrix[rowWithTarget][mid] > target) {
            hi = mid;
        } else if (matrix[rowWithTarget][mid] < target) {
            low = mid + 1;
        } else {
            return true;
        }
    }
    return false;
}

Again, we follow the same pattern of re-assigning low and hi. If we don’t hit the return true case in the loop, we’ll end up with return false at the end, because we haven’t found the target.

Here’s the full solution:

pub fn search_matrix(matrix: Vec<Vec<i32>>, target: i32) -> bool {
    let m = matrix.len();
    let n = matrix[0].len();

    let mut low = 0;
    let mut hi = m;
    let mut rowWithTarget = m;

    while (low < hi) {
        let mid: usize = (low + hi) / 2;
        if (matrix[mid][0] > target) {
            hi = mid;
        } else if (matrix[mid][n - 1] < target) {
            low = mid + 1;
        } else {
            rowWithTarget = mid;
            break;
        }
    }
    if (rowWithTarget >= m) {
        return false;
     }

    low = 0;
    hi = n;
    while (low < hi) {
        let mid: usize = (low + hi) / 2;
        if (matrix[rowWithTarget][mid] > target) {
            hi = mid;
        } else if (matrix[rowWithTarget][mid] < target) {
            low = mid + 1;
        } else {
            return true;
        }
    }
    return false;
}

Haskell Solution

In our Haskell solution, the main difference of course will be using recursion for the binary search. However, we’ll also change up the data structure a bit. In the Rust framing of the problem, we had a vector of vectors of values. We could do this in Haskell, but we could also use Array (Int, Int) Int. This lets us map row/column pairs to numbers in a more intuitive way.

import qualified Data.Array as A

search2DMatrix :: A.Array (Int, Int) Int -> Int -> Bool
search2DMatrix matrix target = ...
  where
      ((minR, minC), (maxR, maxC)) = A.bounds matrix

Another unique feature of arrays is that the bounds don’t have to start from 0. We can have totally custom bounding dimensions for our rows and columns. So instead of using m and n, we’ll need to use the min and max of the row and column dimensions.

So now let’s define our first binary search, looking for the valid row. As we did last week, the input to our function will be two Int values, for the low and hi. As in our Rust solution we’ll access the first and last element of the row defined by the “middle” of low and hi, and compare them against the target. We make recursive calls to searchRow if the row isn’t valid.

search2DMatrix :: A.Array (Int, Int) Int -> Int -> Bool
search2DMatrix matrix target = result
  where
      ((minR, minC), (maxR, maxC)) = A.bounds matrix

      searchRow :: (Int, Int) -> Int
      searchRow (low, hi) = if low >= hi then maxR + 1 else
        let mid = (low + hi) `quot` 2
            firstInRow = matrix A.! (mid, minC)
            lastInRow = matrix A.! (mid, maxC)
        in  if firstInRow > target
              then searchRow (low, mid)
              else if lastInRow < target
                then searchRow (mid + 1, hi)
                else mid

      rowWithTarget = searchRow (minR, maxR + 1)
      result = rowWithTarget <= maxR && ...

Instead of m, we have maxR + 1, which we use as the initial hi value, as well as a return value in the base case where low meets hi. We can return a result of False if rowWithTarget comes back larger than maxR.

Now for our second search, we follow the same pattern, but now we’re returning a boolean. The base case returns False, and we return True if we find the value in rowWithTarget at position mid. Here’s what that looks like:

search2DMatrix :: A.Array (Int, Int) Int -> Int -> Bool
search2DMatrix matrix target = result
  where
      ...

      rowWithTarget = searchRow (minR, maxR + 1)

      searchCol :: (Int, Int) -> Bool
      searchCol (low, hi) = low < hi &&
        let mid = (low + hi) `quot` 2
            val = matrix A.! (rowWithTarget, mid)
        in  if val > target
              then searchCol (low, mid)
              else if val < target
                then searchCol (mid + 1, hi)
                else True
      
      result = rowWithTarget <= maxR && searchCol (minC, maxC + 1)

You’ll see we now use the outcome of searchCol for result. And this completes our solution! Here’s the full code:

search2DMatrix :: A.Array (Int, Int) Int -> Int -> Bool
search2DMatrix matrix target = result
  where
      ((minR, minC), (maxR, maxC)) = A.bounds matrix

      searchRow :: (Int, Int) -> Int
      searchRow (low, hi) = if low >= hi then maxR + 1 else
        let mid = (low + hi) `quot` 2
            firstInRow = matrix A.! (mid, minC)
            lastInRow = matrix A.! (mid, maxC)
        in  if firstInRow > target
              then searchRow (low, mid)
              else if lastInRow < target
                then searchRow (mid + 1, hi)
                else mid

      rowWithTarget = searchRow (minR, maxR + 1)

      searchCol :: (Int, Int) -> Bool
      searchCol (low, hi) = low < hi &&
        let mid = (low + hi) `quot` 2
            val = matrix A.! (rowWithTarget, mid)
        in  if val > target
              then searchCol (low, mid)
              else if val < target
                then searchCol (mid + 1, hi)
                else True
      
      result = rowWithTarget <= maxR && searchCol (minC, maxC + 1)
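
As a quick usage check (not part of the solution itself), we can build a small sample matrix with listArray and query it:

-- Rows are non-decreasing, and each row starts above the previous
-- row's last element.
sample :: A.Array (Int, Int) Int
sample = A.listArray ((0, 0), (2, 3))
  [ 1,  3,  5,  7
  , 10, 11, 16, 20
  , 23, 30, 34, 60 ]

-- search2DMatrix sample 3  == True
-- search2DMatrix sample 13 == False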

Conclusion

Next week, we’ll stay on the subject of 2D matrices, but we’ll learn about array mutation. This is a very tricky subject in Haskell, so make sure to come back for that article!

To learn how these data structures work in Haskell, read about Solve.hs, our Haskell Data Structures & Algorithms course!

by James Bowen at July 14, 2025 08:30 AM

July 10, 2025

Tweag I/O

Publish all your crates everywhere all at once

Cargo is the native package manager and build system for Rust, allowing you to easily bring in dependencies from the global crates.io registry,1 or to publish your own crates to crates.io. Tor Hovland and I recently contributed a long-requested feature to Cargo, allowing you to package many interdependent packages in one go. That might not sound like a big deal, but there were a few tricky parts; there’s a reason the original feature request was open for more than 10 years! In this post, I’ll walk you through the feature and — if you’re a Rust developer — tell you how you can try it out.

Workspaces

The Rust unit of packaging — like a gem in Ruby or a module in Go — is called a “crate”, and it’s pretty common for a medium-to-large Rust project to be divided into several of them. This division helps keep code modular and interfaces well-defined, and also allows you to build and test components individually. Cargo supports multi-crate workflows using “workspaces”: a workspace is just a bunch of crates that Cargo handles “together”, sharing a common dependency tree, a common build directory, and so on. A basic workspace might look like this:

.
├── Cargo.toml
├── Cargo.lock
├── taco
│   ├── Cargo.toml
│   └── src
│       ├── lib.rs
│       └── ... more source files
└── tortilla
    ├── Cargo.toml
    └── src
        ├── lib.rs
        └── ... more source files

The top-level Cargo.toml just tells Cargo where the crates in the workspace live.2

# ./Cargo.toml
workspace.members = ["taco", "tortilla"]

The crate-level Cargo.toml files tell us about the crates (surprise!). Here’s taco’s Cargo.toml:

# ./taco/Cargo.toml
[package]
name = "taco"
version = "2.0"
dependencies.tortilla = { path = "../tortilla", version = "1.3" }

The dependency specification is actually pretty interesting. First, it tells us that the tortilla package is located at ../tortilla (relative to taco). When you’re developing locally, Cargo uses this local path to find the tortilla crate. But when you publish the taco crate for public consumption, Cargo strips out the path = "../tortilla" setting because it’s only meaningful within your local workspace. Instead, the published taco crate will depend on version 1.3 of the published tortilla crate. This doubly-specified dependency gives you the benefits of a monorepo (for example, you get to work on tortilla and taco simultaneously and be sure that they stay compatible) without leaking that local setup to downstream users of your crates.

If you’ve been hurt by packaging incompatibilities before, the previous paragraph might have raised some red flags: allowing a dependency to come from one of two places could lead to problems if they get out-of-sync. Like, couldn’t you accidentally make a broken package by locally updating both your crates and then only publishing taco? You won’t see the breakage when building locally, but the published taco will be incompatible with the previously published tortilla. To deal with this issue, Cargo verifies packages before you publish them. When you type cargo publish --package taco, it packages up the taco crate (removing the local ../tortilla dependency) and then unpackages the new package in a temporary location and attempts to build it from scratch. This rebuild-from-scratch sees the taco crate exactly as a downstream user would, and so it will catch any incompatibilities between the existing, published tortilla and the about-to-be-published taco.

Cargo’s crate verification is not completely fool-proof because it only checks that the package compiles.3 In practice, I find that checking compilation is already pretty useful, but I also like to run other static checks.

Publish all my crates

Imagine you’ve been working in your workspace, updating your crates in backwards-incompatible ways. Now you want to bump tortilla to version 2.0 and taco to version 3.0 and publish them both. This isn’t too hard:

  1. Edit tortilla/Cargo.toml to increase the version to 2.0.
  2. Run cargo publish --package tortilla, and wait for it to appear on crates.io.
  3. Edit taco/Cargo.toml to increase its version to 3.0, and change its tortilla dependency to 2.0.
  4. Run cargo publish --package taco.

The ordering is important here. You can’t publish the new taco before tortilla 2.0 is publicly available: if you try, the verification step will fail.

This multi-crate workflow works, but it has two problems:

  1. It can get tedious. With two crates it’s manageable, but what about when the dependency graph gets complicated? I worked for a client whose CI had custom Python scripts for checking versions, bumping versions, publishing things in the right order, and so on. It worked, but it wasn’t pretty.4
  2. It’s non-atomic: if in the process of verifying and packaging dependent crates you discover some problems with the dependencies then you’re out of luck because you’ve already published them. crates.io doesn’t allow deleting packages, so you’ll just have to yank5 the broken packages, increase the version number some more, and start publishing again. This one can’t be solved by scripts or third-party tooling: verifying the dependent crate requires the dependencies to be published.

Starting in mid-2024, my colleague Tor Hovland and I began working on native support for this in Cargo. A few months and dozens of code-review comments later, our initial implementation landed in Cargo 1.83.0. By the way, the Cargo team are super supportive of new contributors — I highly recommend going to their office hours if you’re interested.

How it works

In our implementation, we use a sort of registry “overlay” to verify dependent crates before their dependencies are published. This overlay wraps an upstream registry (like crates.io), allowing us to add local crates to the overlay without actually publishing them upstream. This kind of registry overlay is an interesting topic on its own. The “virtualization” of package sources is an often-requested feature that hasn’t yet been implemented in general because it’s tricky to design without exposing users to dependency confusion attacks: the more flexible you are about where dependencies come from, the easier it is for an attacker to sneak their way into your dependency tree. Our registry overlay passed scrutiny because it’s only available to Cargo internally, and only gets used for workspace-local packages during workspace publishing.

The registry overlay was pretty simple to implement, since it’s just a composition of two existing Cargo features: local registries and abstract sources. A local registry in Cargo is just a registry (like crates.io) that lives on your local disk instead of in the cloud. Cargo has long supported them because they’re useful for offline builds and integration testing. When packaging a workspace we create a temporary, initially-empty local registry for storing the new local packages as we produce them.

Our second ingredient is Cargo’s Source trait: since Cargo can pull dependencies from many different kinds of places (crates.io, private registries, git repositories, etc.), they already have a nice abstraction that encapsulates how to query availability, download, and cache packages from different places. So our registry overlay is just a new implementation of the Source trait that wraps two other Sources: the upstream registry (like crates.io) that we want to publish to, and the local registry that we put our local packages in. When someone queries our overlay source for a package, we check in the local registry first, and fall back to the upstream registry.

A diagram showing crates.io and a local registry feeding into an overlay

Now that we have our local registry overlay, the workspace-publishing workflow looks like this:

  1. Gather all the to-be-published crates and figure out any inter-dependencies. Sort them in a “dependency-compatible” order, meaning that every crate will be processed after all its dependencies.
  2. In that dependency-compatible order, package and verify each crate. For each crate:
    • Package it up, removing any mention of local path dependencies.
    • Unpackage it in a temporary location and check that it builds. This build step uses the local registry overlay, so that it thinks all the local dependencies that were previously added to the local overlay are really published.
    • “Publish” the crate in the local registry overlay.
  3. In the dependency-compatible order, actually upload all the crates to crates.io. This is done in parallel as much as possible. For example, if tortilla and carnitas don’t depend on one another but taco depends on them both, then tortilla and carnitas can be uploaded simultaneously.

It’s possible for the final upload to fail (if your network goes down, for example) and for some crates to remain unpublished; in that sense, the new workspace publishing workflow is not truly atomic. But because all of the new crates have already been verified with one another, you can just retry publishing the ones that failed to upload.

How to try it

Cargo, as critical infrastructure for Rust development, is pretty conservative about introducing new features. Multi-package publishing was recently promoted to a stable feature, but it is currently only available in nightly builds. If you’re using a recent nightly build of Cargo 1.90.0 or later, running cargo publish in a workspace will work as described in this blog post. If you don’t want to publish everything in your workspace, the usual package-selection arguments should work as expected: cargo publish --package taco --package tortilla will publish just taco and tortilla, while correctly managing any dependencies between them. Or you can exclude packages like cargo publish --exclude onions.

If you’re using a stable Rust toolchain, workspace publishing will be available in Cargo 1.90 in September 2025.


  1. If you use Node.js, Cargo is like the npm command and crates.io is like the NPM registry. If you use Python, Cargo is like pip (or Poetry, or uv) and crates.io is like PyPI.
  2. It can also contain lots of other useful workspace-scoped information, like dependencies that are common between crates or global compiler settings.
  3. To be even more precise, it only checks that the package compiles against the dependencies that are locked in your Cargo.lock file, which gets included in the package. If you or someone in your dependency tree doesn’t correctly follow semantic versioning, downstream users could still experience compilation problems. In practice, we’ve seen this cause binary packages to break because cargo install ignores the lock file by default.
  4. There are also several third-party tools (for example, cargo-release, cargo-smart-release, and release-plz) to help automate multi-crate releases. If one of these meets your needs, it might be better than a custom script.
  5. “Yanking” is Cargo’s mechanism for marking packages as broken without actually deleting their contents and breaking everyone’s builds.

July 10, 2025 12:00 AM

July 09, 2025

Well-Typed.Com

Developing an application from scratch (Haskell Unfolder #46)

Today, 2025-07-09, at 1830 UTC (11:30 am PDT, 2:30 pm EDT, 7:30 pm BST, 20:30 CEST, …) we are streaming the 46th episode of the Haskell Unfolder live on YouTube.

Developing an application from scratch (Haskell Unfolder #46)

In this episode targeted at beginners, we show the end-to-end application development process, starting from an empty directory. We’ll consider package configuration, taking advantage of editor integration, how to deal with dependencies, organizing code into modules, and parsing command line arguments. We will use this to write a simple but useful application.

About the Haskell Unfolder

The Haskell Unfolder is a YouTube series about all things Haskell hosted by Edsko de Vries and Andres Löh, with episodes appearing approximately every two weeks. All episodes are live-streamed, and we try to respond to audience questions. All episodes are also available as recordings afterwards.

We have a GitHub repository with code samples from the episodes.

And we have a public Google calendar (also available as ICal) listing the planned schedule.

There’s now also a web shop where you can buy t-shirts and mugs (and potentially in the future other items) with the Haskell Unfolder logo.

by andres, edsko at July 09, 2025 12:00 AM

July 07, 2025

Haskell Interlude

67: Alex McLean

Mike and Andres speak to Alex McLean who created the TidalCycles system for electronic music - implemented in Haskell of course. We talk about how Alex got into Haskell coming from Perl, how types helped him think about the structure of music and patterns, the architecture and evolution of TidalCycles, about art, community and making space for new ideas, and lots of things in between.

by Haskell Podcast at July 07, 2025 02:00 PM

Monday Morning Haskell

Binary Search in Haskell and Rust

This week we’ll be continuing our series of problem solving in Haskell and Rust. But now we’re going to move beyond the terrain of “basic” problem solving techniques with strings, lists and arrays, and head in the direction of more complicated data structures and algorithms. Today we’ll explore a problem that is still array-based, but uses a tricky algorithm that involves binary search!

You’ll learn more about Data Structures and Algorithms in our Solve.hs course! The last 7 weeks or so of blog articles have focused on the types of problems you’ll see in Module 1 of that course, but now we’re going to start encountering ideas from Modules 2 & 3, which look extensively at essential data structures and algorithms you need to know for problem solving.

The Problem

Today’s problem is Median of Two Sorted Arrays. In this problem, we receive two arrays of numbers as input, each of them in sorted order. The arrays are not necessarily of the same size. Our job is to find the median of the cumulative set of numbers.

Now there’s a conceptually easy approach to this. We could simply scan through the two arrays, keeping track of one index for each one. We would increase the index for whichever number is currently smaller, and stop once we have passed by half of the total numbers. This approach is essentially the “merge” part of merge sort, and it would take O(n) time, since we are scanning half of all the numbers.

However, there’s a faster approach! And if you are asked this question in an interview for anything other than a very junior position, your interviewer will expect you to find this faster approach. Because the arrays are sorted, we can leverage binary search to find the median in O(log n) time. The approach isn’t easy to see though! Let’s go over the algorithm before we get into any code.

The Algorithm

This algorithm is a little tricky to follow (this problem is rated as “hard” on LeetCode). So we’re going to treat this a bit like a mathematical proof, and begin by defining useful terms. Then it will be easy to describe the coding concepts behind the algorithm.

Defining our Terms

Our input consists of 2 arrays, arr1 and arr2 with potentially different sizes n and m, respectively. Without loss of generality, let arr1 be the “shorter” array, so that n <= m. We’ll also define t as the total number of elements, n + m.

It is worthwhile to note right off the bat that if t is odd, then a single element from one of the two lists will be the median. If t is even, then we will average two elements together. Even though we won’t actually create the final merged array, we can imagine that it consists of 3 parts:

  1. The “prior” portion - all numbers before the median element(s)
  2. The median element(s), either 1 or 2.
  3. The “latter” portion - all numbers after the median element(s)

The total number of elements in the “prior” portion will end up being (t - 1) / 2, bearing in mind how integer division works. For example, whether t is 15 or 16, we get 7 elements in the “prior” portion. We’ll use p for this number.

Finally, let’s imagine p1, the number of elements from arr1 that will end up in the prior portion. If we know p1, then p2, the number of elements from arr2 in the prior portion is fixed, because p1 + p2 = p. We can then think of p1 as an index into arr1, the index of the first element that is not in the prior portion. The only trick is that this index could be n indicating that all elements of arr1 are in the prior portion.

Getting the Final Answer from our Terms

If we have the “correct” values for p1 and p2, then finding the median is easy. If t is odd, then the lower number between arr1[p1] and arr2[p2] is the median. If t is even, then we average the two smallest numbers among (arr1[p1], arr2[p2], arr1[p1 + 1], arr2[p2 + 1]).
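
As a quick worked example: with arr1 = [1, 3] and arr2 = [2, 4, 5], we have t = 5 and p = 2. The correct split is p1 = 1 and p2 = 1, so the “prior” portion is {1, 2}, and since t is odd the median is the lower of arr1[1] = 3 and arr2[1] = 4, giving 3. That matches the merged array [1, 2, 3, 4, 5].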

So we’ve reduced this problem to a matter of finding p1, since p2 can be easily derived from it. How do we know we have the “correct” value for p1, and how do we search for it efficiently?

Solving for p1

The answer is that we will conduct a binary search on arr1 in order to find the correct value of p1. For any particular choice of p1, we determine the corresponding value of p2. Then we make two comparisons:

  1. Compare arr1[p1 - 1] to arr2[p2]
  2. Compare arr2[p2 - 1] to arr1[p1]

If both comparisons are less-than-or-equals, then our two p values are correct! The slices arr1[0..p1-1] and arr2[0..p2-1] always constitute a total of p values, and if these values are no larger than arr1[p1] and arr2[p2], then they constitute the entire “prior” set.

If, on the other hand, the first comparison yields “greater than”, then we have too many values for arr1 in our prior set. This means we need to recursively do the binary search on the left side of arr1, since p1 should be smaller.

Then if the second comparison yields “greater than”, we have too few values from arr1 in the “prior” set. We should increase p1 by searching the right half of our array.

This provides a complete algorithm for us to follow!

Rust Implementation

Our algorithm description was quite long, but the advantage of having so many details is that the code starts to write itself! We’ll start with our Rust implementation. Stage 1 is to define all of the terms using our input values. We want to define our sizes and array references generically so that arr1 is the shorter array:

pub fn find_median_sorted_arrays(nums1: Vec<i32>, nums2: Vec<i32>) -> f64 {
    let mut n = nums1.len();
    let mut m = nums2.len();
    let mut arr1: &Vec<i32> = &nums1;
    let mut arr2: &Vec<i32> = &nums2;
    if (m < n) {
        n = nums2.len();
        m = nums1.len();
        arr1 = &nums2;
        arr2 = &nums1;
    }
    let t = n + m;
    let p: usize = (t - 1) / 2;

    ...
}

Anatomy of a Binary Search

The next stage is the binary search, so we can find p1 and p2. Now a binary search is a particular kind of loop pattern. Like many of the loop patterns we worked with in the previous weeks, we can express it recursively, or with a loop construct like for or while. We’ll start with a while loop solution for Rust, and then show the recursive solution with Haskell.

All loops maintain some kind of state. For a binary search, the primary state is the two endpoints representing our “interval of interest”. This starts out as the entire interval, and shrinks by half each time until we’ve narrowed to a single element (or no elements). We’ll represent these interval endpoints with low and hi. Our loop concludes once low is as large as hi.

let mut low = 0;
// Use the shorter array size!
let mut hi = n;
while (low < hi) {
    ...
}

In our particular case, we are also trying to determine the values for p1 and p2. Each time we specify an interval, we’ll see if the midpoint of that interval (between low and hi) is the correct value of p1:

...

let mut low = 0;
let mut hi = n;
let mut p1 = 0;
let mut p2 = 0;
while (low < hi) {
    p1 = (low + hi) / 2;
    p2 = p - p1;
    ...
}

Now we evaluate this p1 value using the two conditions we specified in our algorithm. These are self-explanatory, except we do need to cover some edge cases where one of our values is at the edge of the array bounds.

For example, if p1 is 0, the first condition is always “true”. If this condition is negated, this means we want fewer elements from arr1, but this is impossible if p1 is 0.

...

let mut low = 0;
let mut hi = n;
let mut p1 = 0;
let mut p2 = 0;
while (low < hi) {
    p1 = (low + hi) / 2;
    p2 = p - p1;
    let cond1 = p1 == 0 || arr1[p1 - 1] <= arr2[p2];
    let cond2 = p1 == n || p2 == 0 || arr2[p2 - 1] <= arr1[p1];
    if (cond1 && cond2) {
        break;
    } else if (!cond1) {
        p1 -= 1;
        hi = p1;
    } else {
        p1 += 1;
        low = p1;
    }
}
p2 = p - p1;

...

If both conditions are met, you’ll see we break, because we’ve found the right value for p1! Otherwise, we know p1 is invalid. This means we want to exclude the existing p1 value from further consideration by changing either low or hi to remove it from the interval of interest.

So if cond1 is false, hi becomes p1 - 1, and if cond2 is false, low becomes p1 + 1. In both cases, we also modify p1 itself first so that our loop does not conclude with p1 in an invalid location.

Getting the Final Answer

Now that we have p1 and p2, we have to do a couple final tricks to get the final answer. We want to take the smaller value between arr1[p1] and arr2[p2], but we have to handle the edge case where p1 might be n, and we also want to increment the index for whichever array we take from. Note that p2 cannot be out of bounds at this point!

let mut median = arr2[p2];
if (p1 < n && arr1[p1] < arr2[p2]) {
    median = arr1[p1];
    p1 += 1;
} else {
    p2 += 1;
}

If the total number of elements is odd, we can simply return this number (converting to a float). However, in the even case we need one more number to take an average. So we’ll compare the values at the indices again, but now accounting for the fact that either (but not both) could be out of bounds.

let mut median = arr2[p2];
if (p1 < n && arr1[p1] < arr2[p2]) {
    median = arr1[p1];
    p1 += 1;
} else {
    p2 += 1;
}

if (t % 2 == 0) {
    if (p1 >= n) {
        median += arr2[p2];
    } else if (p2 >= m) {
        median += arr1[p1];
    } else {
        median += cmp::min(arr1[p1], arr2[p2]);
    }
    let medianF: f64 = median.into();
    return medianF / 2.0;
} else {
    return median.into();
}

Here’s the complete implementation:

pub fn find_median_sorted_arrays(nums1: Vec<i32>, nums2: Vec<i32>) -> f64 {
    let mut n = nums1.len();
    let mut m = nums2.len();
    let mut arr1: &Vec<i32> = &nums1;
    let mut arr2: &Vec<i32> = &nums2;
    if (m < n) {
        n = nums2.len();
        m = nums1.len();
        arr1 = &nums2;
        arr2 = &nums1;
    }
    let t = n + m;
    let p: usize = (t - 1) / 2;

    let mut low = 0;
    let mut hi = n;
    let mut p1 = 0;
    let mut p2 = 0;
    while (low < hi) {
        p1 = (low + hi) / 2;
        p2 = p - p1;
        let cond1 = p1 == 0 || arr1[p1 - 1] <= arr2[p2];
        let cond2 = p1 == n || p2 == 0 || arr2[p2 - 1] <= arr1[p1];
        if (cond1 && cond2) {
            break;
        } else if (!cond1) {
            p1 -= 1;
            hi = p1;
        } else {
            p1 += 1;
            low = p1;
        }
    }
    p2 = p - p1;

    let mut median = arr2[p2];
    if (p1 < n && arr1[p1] < arr2[p2]) {
        median = arr1[p1];
        p1 += 1;
    } else {
        p2 += 1;
    }

    if (t % 2 == 0) {
        if (p1 >= n) {
            median += arr2[p2];
        } else if (p2 >= m) {
            median += arr1[p1];
        } else {
            median += cmp::min(arr1[p1], arr2[p2]);
        }
        let medianF: f64 = median.into();
        return medianF / 2.0;
    } else {
        return median.into();
    }
}

Haskell Implementation

Now let’s examine the Haskell implementation. Unlike the LeetCode version, we’ll just assume our inputs are Double already instead of doing a conversion. Once again, we start by defining the terms:

medianSortedArrays :: V.Vector Double -> V.Vector Double -> Double
medianSortedArrays input1 input2 = ...
  where
    n' = V.length input1
    m' = V.length input2
    t = n' + m'
    p = (t - 1) `quot` 2
    (n, m, arr1, arr2) = if V.length input1 <= V.length input2
      then (n', m', input1, input2) else (m', n', input2, input1)

    ...

Now we’ll implement the binary search, this time using a recursive function. We’ll do this in two parts, starting with a helper function. This helper will simply tell us whether a particular index is the correct value for p1. The trick is that we’ll return an Ordering instead of just a Bool:

-- data Ordering = LT | EQ | GT
f :: Int -> Ordering

This lets us signal 3 possibilities. If we return EQ, this means the index is valid. If we return LT, this will mean we want fewer values from arr1. And then GT means we want more values from arr1.

With this framing it’s easy to see the implementation of this helper now. We determine the appropriate p2, figure out our two conditions, and return the value for each condition:

medianSortedArrays :: V.Vector Double -> V.Vector Double -> Double
medianSortedArrays input1 input2 = ...
  where
    ...
    f :: Int -> Ordering
    f pi1 =
      let pi2 = p - pi1
          cond1 = pi1 == 0 || arr1 V.! (pi1 - 1) <= arr2 V.! pi2
          cond2 = pi1 == n || pi2 == 0 || (arr2 V.! (pi2 - 1) <= arr1 V.! pi1)
      in  if cond1 && cond2 then EQ else if (not cond1) then LT else GT

Now we can use this helper in a recursive binary search. The binary search tracks the two endpoints of our interval ((Int, Int)), and it returns the correct value for p1. The implementation handles the base case (return low if low >= hi), determines the midpoint, calls our helper, and then recurses appropriately based on the helper result.

medianSortedArrays :: V.Vector Double -> V.Vector Double -> Double
medianSortedArrays input1 input2 = ...
  where
    ...
    f :: Int -> Ordering
    f pi1 = ...
    
    search :: (Int, Int) -> Int
    search (low, hi) = if low >= hi then low else
      let mid = (low + hi) `quot` 2
      in  case f mid of
            EQ -> mid
            LT -> search (low, mid - 1)
            GT -> search (mid + 1, hi)

    p1 = search (0, n)
    p2 = p - p1

    ...

For the final part of the problem, we’ll define a helper. Given p1 and p2, it will emit the “lower” of the values at those two indices (accounting for edge cases), as well as the two new indices (since one of them will be incremented).

This is a matter of lazily defining the “next” value for each array, the “end” condition of each array, and the “result” if that array’s value is chosen:

medianSortedArrays :: V.Vector Double -> V.Vector Double -> Double
medianSortedArrays input1 input2 = ...
  where
    ...

    findNext pi1 pi2 =
      let next1 = arr1 V.! pi1
          next2 = arr2 V.! pi2
          end1 = pi1 >= n
          end2 = pi2 >= m
          res1 = (next1, pi1 + 1, pi2)
          res2 = (next2, pi1, pi2 + 1)
      in  if end1 then res2
            else if end2 then res1
            else if next1 <= next2 then res1 else res2

Now we just apply this either once or twice to get our result!

medianSortedArrays :: V.Vector Double -> V.Vector Double -> Double
medianSortedArrays input1 input2 = result
  where
    ...

    tIsEven = even t
    (median1, nextP1, nextP2) = findNext p1 p2
    (median2, _, _) = findNext nextP1 nextP2
    result = if tIsEven
      then (median1 + median2) / 2.0
      else median1

Here’s the complete implementation:

import qualified Data.Vector as V

medianSortedArrays :: V.Vector Double -> V.Vector Double -> Double
medianSortedArrays input1 input2 = result
  where
    n' = V.length input1
    m' = V.length input2
    t = n' + m'
    p = (t - 1) `quot` 2
    (n, m, arr1, arr2) = if V.length input1 <= V.length input2
      then (n', m', input1, input2) else (m', n', input2, input1)

    -- Evaluate the index in arr1
    -- If it indicates the index can be part of the median, return EQ
    -- If it indicates we want fewer elements from arr1 (move left), return LT
    -- If it indicates we want more elements from arr1 (move right), return GT
    -- Precondition: pi1 <= n
    f :: Int -> Ordering
    f pi1 =
      let pi2 = p - pi1
          cond1 = pi1 == 0 || arr1 V.! (pi1 - 1) <= arr2 V.! pi2
          cond2 = pi1 == n || pi2 == 0 || (arr2 V.! (pi2 - 1) <= arr1 V.! pi1)
      in  if cond1 && cond2 then EQ else if (not cond1) then LT else GT
    
    search :: (Int, Int) -> Int
    search (low, hi) = if low >= hi then low else
      let mid = (low + hi) `quot` 2
      in  case f mid of
            EQ -> mid
            LT -> search (low, mid - 1)
            GT -> search (mid + 1, hi)
    
    findNext pi1 pi2 =
      let next1 = arr1 V.! pi1
          next2 = arr2 V.! pi2
          end1 = pi1 >= n
          end2 = pi2 >= m
          res1 = (next1, pi1 + 1, pi2)
          res2 = (next2, pi1, pi2 + 1)
      in  if end1 then res2
            else if end2 then res1
            else if next1 <= next2 then res1 else res2

    p1 = search (0, n)
    p2 = p - p1

    tIsEven = even t
    (median1, nextP1, nextP2) = findNext p1 p2
    (median2, _, _) = findNext nextP1 nextP2
    result = if tIsEven
      then (median1 + median2) / 2.0
      else median1
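
As a quick sanity check, here is what a GHCi session might look like with the definition above in scope (and Data.Vector imported qualified as V):

ghci> medianSortedArrays (V.fromList [1, 3]) (V.fromList [2])
2.0
ghci> medianSortedArrays (V.fromList [1, 2]) (V.fromList [3, 4])
2.5

The second call exercises the even case, averaging the two middle values 2 and 3.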

Conclusion

If you want to learn more about these kinds of problem solving techniques, you should take our course Solve.hs! In the coming weeks, we’ll see more problems related to data structures and algorithms, which are covered extensively in Modules 2 and 3 of that course!

by James Bowen at July 07, 2025 08:30 AM

GHC Developer Blog

GHC LTS Releases

GHC LTS Releases

Andreas Klebinger - 2025-07-07

GHC will start maintaining an LTS release/branch in the near future.

A release being designated LTS (Long Term Support) in this case means we plan to support it over a longer timeframe than usual.

Concretely, the plan is to provide updates for an LTS release for at least two years. Most likely we will support LTS releases for even longer than that; currently we are aiming for a support window of three years.

During this time we will provide minor releases fixing bugs, as with any other release; the main difference is that we will do so for a longer period of time.

There are no plans to backport any new features to LTS releases after their initial release.

In terms of frequency, we plan for the support windows of consecutive LTS series to overlap by six months.

A potential timeline might then look like this:

2025 Aug    - LTS 9.14 released
2028 Spring - LTS 9.22 released
2028 Summer - LTS 9.14.X - last 9.14 point release
2031 Spring - LTS 9.X released
2031 Summer - Last 9.22 point release
...

Non-LTS releases

GHC will continue to release new major non-LTS releases on a ~6 month cadence. We expect to cut back slightly on the lifetime of these releases, dedicating the resources freed up this way to a longer support window for the LTS releases.

Why LTS releases?

In practice, some releases have always seen more adoption by users than others. The GHC Team has not been blind to this fact and has at times informally extended the life of a certain release based on this as well.

This resulted in a sort of informal “post-hoc LTS” status for some releases, at times with support windows not much shorter than our proposed minimum of two years.

This worked reasonably well for people who were confident to stay on a fairly old release, only upgrading to a newer “post-hoc LTS” once the dust settled. It also worked out for those who picked one of those “post-hoc LTS” releases by happenstance, before it was clear the release would end up as “post-hoc LTS”.

However, users who adopted major releases which did not end up as “post-hoc LTS” often had to choose between upgrading earlier than expected, or risking a show-stopping bug after the support window of the release had already ended. Similarly, much of this was based on informal community sentiment and rarely written down explicitly, making this information hard to access for people not deeply involved in the day-to-day of the Haskell community.

By designating a major release as LTS ahead of time, we hope that users can make an informed decision about which GHC version they pick, with the tradeoffs made clear: a longer support window or the newest features.

Why not make post-hoc LTS releases official instead?

This is a question that has come up a lot in discussion. The major downsides of this are a lack of predictability, and that a lot of time might be lost between the initial release and any such decision. If we declare a release as LTS 9 months after its .1 release, we have essentially shaved months off the LTS support window.

On the flip side, if we announce it ahead of time, everyone knows that a given release will be the new LTS. The hope is that this encourages more and quicker support for the release by the community, compressing the timeline of bug fixing, testing, and eventual widespread adoption.

Overall I’m hopeful that making LTS releases explicit will remove a lot of ambiguity around GHC versions. And while the guaranteed LTS support window might not be as long as one might hope, having LTS releases with a longer guaranteed support window should still be helpful to people working on long-running Haskell projects.

Next steps

The first LTS release will be GHC 9.14, which will be released this summer!

by ghc-devs at July 07, 2025 12:00 AM

July 03, 2025

Tweag I/O

Bazel workshop made public

As part of our consulting business we are often invited to solve problems that our clients cannot tackle on their own. It is not uncommon for us to collaborate with a client for extended periods of time, during which many opportunities for knowledge transfer present themselves, be it in the form of documentation, discussions, or indeed, when the client finds it desirable, in the form of specialized workshops.

In this post we’d like to talk about a workshop that we developed and delivered (so far) five times to different groups of people at the same client. We received positive feedback for it and we believe it was helpful for those who attended it.

The workshop intends to give a principled introduction to the Bazel build system for people who have little or no knowledge of Bazel, but who are software developers and have used a build system before. It is definitely a workshop for a technical audience, and as such it was presented to (among others) dedicated DevOps and DevX teams of the client.

We are happy to announce that the materials of this workshop are now publicly available in the form of:

The original intended duration of the workshop was three days. However, one of these days was dedicated almost entirely to a case study that we cannot share publicly; therefore, the public version is shorter and should amount to approximately two days.

Here are a couple of the introductory slides to give you an impression of the scope, structure, and expected knowledge in this workshop:

Slides with the workshop's outline, and our expectations of the participants

It must be pointed out that the workshop was developed in 2024, when the WORKSPACE-based approach to dependency management was still the default choice and so, given that we were time-constrained both at the authoring and presentation stages, we chose not to cover Bzlmod. We are still convinced that familiarity with WORKSPACE and simple repository rules is a prerequisite for understanding Bzlmod. Some newer features like symbolic macros are also not covered. Learning materials for Bazel go out of date quickly, but even so, we believe that the workshop, now public, is still relevant and can be of use for people who are about to embark on their Bazel journey.

July 03, 2025 12:00 AM

June 30, 2025

Monday Morning Haskell

Buffer & Save with a Challenging Example

Welcome back to our series comparing LeetCode problems in Haskell and Rust. Today we’ll learn a new paradigm that I call “Buffer and Save”. This will also be the hardest problem we’ve done so far! The core loop structure isn’t that hard, but there are a couple layers of tricks to massage our data to get the final answer.

This will be the last problem we do that focuses strictly on string and list manipulation. The next set of problems we do will all rely on more advanced data structures or algorithmic ideas.

For more complete practice on problem solving in Haskell, check out Solve.hs, our newest course. This course will teach you everything you need to know about problem solving, data structures, and algorithms in Haskell. You’ll get loads of practice building structures and algorithms from scratch, which is very important for understanding and remembering how they work.

The Problem

Today’s problem is Text Justification. The idea here is that we are taking a list of words and a “maximum width” and printing out the words grouped into equal-width lines that are evenly spaced. Here’s an example input and output:

Example Input (list of 9 strings):
[“Study”, “Haskell”, “with”, “us”, “every”, “Monday”, “Morning”, “for”, “fun”]
Max Width: 16

Output (list of 4 strings):
“Study    Haskell”
“with   us  every”
“Monday   Morning”
“for fun         ”

There are a few notable rules, constraints, and edge cases. Here’s a list to summarize them:

  1. There is at least one word
  2. No word is larger than the max width
  3. All output strings must have max width as their length (including spaces)
  4. The first word of every line is set to the left
  5. The last line always has 1 space between words, and then enough spaces after the last word to reach the max width.
  6. All other lines with multiple words will align the final word all the way to the right
  7. The spaces in non-final lines are distributed as evenly as possible, but any extra spaces go in the blanks furthest to the left.

The final point is potentially the trickiest to understand. Consider the second line above, with us every. The max width is 16, and we have 3 words with a total of 11 characters. This leaves us 5 spaces. Having 3 words means 2 blanks, so the “left” blank gets 3 spaces and the “right” blank gets 2 spaces.

If you had a line with 5 words, a max width of 30, and 16 characters, you would place 4 spaces in the left two blanks, and 3 spaces in the right two blanks. The relative length of the words does not matter.

Words in Line: [“A”, “good”, “day”, “to”, “endure”]

Output Line:
“A    good    day   to   endure”

The Algorithm

As mentioned above, our main algorithmic idea could be called “buffer and save”. We’ve been defining all of our loops based on the state we must maintain between iterations of the loop. The buffer and save approach highlights two pieces of state for us:

  1. The strings we’ve accumulated for our answer so far (the “result”)
  2. A buffer of the strings in the “current” line we’re building.

So we’ll loop through the input words one at a time. We’ll consider if the next word can be added to the “current” line. If it would cause our current line to exceed the maximum width, we’ll “save” our current line and write it out to the “result” list, adding the required spaces.

To help our calculations, we’ll also include two other pieces of state in our loop:

  1. The number of characters in our “current” line
  2. The number of words in our “current” line

Finally, there’s the question of how to construct each output line. Combining the math with list mechanics is a little tricky. But the central idea consists of 4 simple steps (a small worked example follows the list):

  1. Find the number of spaces (subtract number of characters from max width)
  2. Divide the number of spaces by the number of “blanks” (number of words - 1)
  3. The quotient is the “base” number of spaces per blank
  4. The remainder is the number of blanks (starting from the left) that get an extra space
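
To make the arithmetic concrete, here is a small sketch; spaceWidths is a throwaway helper just for illustration (it lists the gap sizes left to right, whereas the real solutions below build lines from reversed word lists):

spaceWidths :: Int -> Int -> Int -> [Int]
spaceWidths maxWidth charsInLine wordsInLine =
  let (base, extra) = quotRem (maxWidth - charsInLine) (wordsInLine - 1)
  in  replicate extra (base + 1) ++ replicate (wordsInLine - 1 - extra) base

-- For "with   us  every": 16 - 11 = 5 spaces over 3 - 1 = 2 blanks.
-- ghci> spaceWidths 16 11 3
-- [3,2]
-- For the 5-word, width-30, 16-character example above:
-- ghci> spaceWidths 30 16 5
-- [4,4,3,3]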

The exact implementation of this idea differs between Haskell and Rust. Again this rests a lot on the “reverse” differences between Rust vectors and Haskell lists.

The final line has a slightly different (but easier) process. And we should note that the final line will still be in our buffer when we exit the loop! So we shouldn’t forget to add it to the result.

Haskell Solution

We know enough now to jump into our Haskell solution. Our solution should be organized around a loop. Since we go through the input word-by-word, this should follow a fold pattern. So here’s our outline:

justifyText :: [String] -> Int -> [String]
justifyText inputWords maxWidth = ...
  where
    -- f = ‘final’
    (fLine, fWordsInLine, fCharsInLine, result) = foldl loop ([], 0, 0, []) inputWords

    loop :: ([String], Int, Int, [String]) -> String -> ([String], Int, Int, [String])
    loop (currentLine, wordsInLine, charsInLine, currResult) newWord = ...

Let’s focus in on the choice we have to make in the loop. We need to determine if this new word fits in our current line. So we’ll get its length and add it to the number of characters in the line AND consider the number of words in the line. We count the words too since each word we already have requires at least one space!

-- (maxWidth is still in scope here)
loop :: ([String], Int, Int, [String]) -> String -> ([String], Int, Int, [String])
loop (currentLine, wordsInLine, charsInLine, currResult) newWord =
  let newWordLen = length newWord
  in  if newWordLen + charsInLine + wordsInLine > maxWidth
        then ...
        else ...

How do we fill in these choices? If we don’t overflow the line, we just append the new word, bump the count of the words, and add the new word’s length to the character count.

loop :: ([String], Int, Int, [String]) -> String -> ([String], Int, Int, [String])
loop (currentLine, wordsInLine, charsInLine, currResult) newWord =
  let newWordLen = length newWord
  in  if newWordLen + charsInLine + wordsInLine > maxWidth
        then ...
        else (newWord : currentLine, wordsInLine + 1, charsInLine + newWordLen, currResult)

The overflow case isn’t hard, but it does require us to have a function that can convert our current line into the final string. This function will also take the number of words and characters in this line. Assuming this function exists, we just make this new line, append it to result, and then reset our other stateful values so that they only reflect the “new word” as part of our current line.

loop :: ([String], Int, Int, [String]) -> String -> ([String], Int, Int, [String])
loop (currentLine, wordsInLine, charsInLine, currResult) newWord =
  let newWordLen = length newWord
      resultLine = makeLine currentLine wordsInLine charsInLine
  in  if newWordLen + charsInLine + wordsInLine > maxWidth
        then ([newWord], 1, newWordLen, resultLine : currResult)
        else (newWord : currentLine, wordsInLine + 1, charsInLine + newWordLen, currResult)

makeLine :: [String] -> Int -> Int -> String
makeLine = ...

Before we think about the makeLine implementation though, we just about have enough to fill in the rest of the “top” of our function definition. We’d just need another function for making the “final” line, since this is different from other lines. Then when we get our “final” state values, we’ll plug them into this function to get our final line, append this to the result, and reverse it all.

justifyText :: [String] -> Int -> [String]
justifyText inputWords maxWidth = 
  reverse (makeLineFinal fLine fWordsInLine fCharsInLine : result)
  where
    (fLine, fWordsInLine, fCharsInLine, result) = foldl loop ([], 0, 0, []) inputWords

    loop :: ([String], Int, Int, [String]) -> String -> ([String], Int, Int, [String])
    loop (currentLine, wordsInLine, charsInLine, currResult) newWord =
      let newWordLen = length newWord
          resultLine = makeLine currentLine wordsInLine charsInLine
      in  if newWordLen + charsInLine + wordsInLine > maxWidth
            then ([newWord], 1, newWordLen, resultLine : currResult)
            else (newWord : currentLine, wordsInLine + 1, charsInLine + newWordLen, currResult)

    makeLine :: [String] -> Int -> Int -> String
    makeLine = ...

    makeLineFinal :: [String] -> Int -> Int -> String
    makeLineFinal = ...

Now let’s discuss forming these lines, starting with the general case. We can start with a couple edge cases. This should never be called with an empty list. And with a singleton, we just left-align the word and add the right number of spaces:

makeLine :: [String] -> Int -> Int -> String
makeLine [] _ _ = error "Cannot makeLine with empty string!"
makeLine [onlyWord] _ charsInLine =
  let extraSpaces = replicate (maxWidth - charsInLine) ' '
  in  onlyWord <> extraSpaces
makeLine (first : rest) wordsInLine charsInLine = ...

Now we’ll calculate the quotient and remainder to get the spacing sizes, as mentioned in our algorithm section. But how do we combine them? There are multiple ways, but the idea I thought of was to zip the tail of the list with the number of spaces that should follow each word. Then we can fold that into the resulting string using a function like this:

-- (String, Int) is the next string and the number of spaces after it
combine :: String -> (String, Int) -> String
combine suffix (nextWord, numSpaces) =
  nextWord <> replicate numSpaces ' ' <> suffix

Remember while doing this that we’ve accumulated the words for each line in reverse order. So we want to prepend each one in turn, together with the number of spaces that come after it.

To use this function, we can “fold” over the “tail” of our current line, while using the first word in our list as the base of the fold! Don’t forget the quotRem math going on in here!

makeLine :: [String] -> Int -> Int -> String
makeLine [] _ _ = error "Cannot makeLine with empty string!"
makeLine [onlyWord] _ charsInLine =
  let extraSpaces = replicate (maxWidth - charsInLine) ' '
  in  onlyWord <> extraSpaces
makeLine (first : rest) wordsInLine charsInLine =
  let (baseNumSpaces, numWithExtraSpace) = quotRem (maxWidth - charsInLine) (wordsInLine - 1)
      baseSpaces = replicate (wordsInLine - 1 - numWithExtraSpace) baseNumSpaces
      extraSpaces = replicate numWithExtraSpace (baseNumSpaces + 1)
      wordsWithSpaces = zip rest (baseSpaces <> extraSpaces)
  in  foldl combine first wordsWithSpaces

combine :: String -> (String, Int) -> String
combine suffix (nextWord, numSpaces) =
  nextWord <> replicate numSpaces ' ' <> suffix
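
As a quick check of the ordering, a GHCi session might look like this, reproducing the second example line from its reversed tail:

ghci> foldl combine "every" [("us", 2), ("with", 3)]
"with   us  every"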

To make the final line, we can also leverage our combine function! It’s just a matter of combining each word in our input with the appropriate number of spaces. In this case, every word gets 1 space except for the last word of the line (which comes first in our reversed list); that one gets however many trailing spaces we need!

makeLineFinal :: [String] -> Int -> Int -> String
makeLineFinal [] _ _ = error "Cannot makeLine with empty string!"
makeLineFinal strs wordsInLine charsInLine =
  let trailingSpaces = maxWidth - charsInLine - (wordsInLine - 1)
  in  foldl combine "" (zip strs (trailingSpaces : repeat 1))

Putting all these pieces together, we have our complete solution!

justifyText :: [String] -> Int -> [String]
justifyText inputWords maxWidth = 
  reverse (makeLineFinal fLine fWordsInLine fCharsInLine : result)
  where
    (fLine, fWordsInLine, fCharsInLine, result) = foldl loop ([], 0, 0, []) inputWords

    loop :: ([String], Int, Int, [String]) -> String -> ([String], Int, Int, [String])
    loop (currentLine, wordsInLine, charsInLine, currResult) newWord =
      let newWordLen = length newWord
          resultLine = makeLine currentLine wordsInLine charsInLine
      in  if newWordLen + charsInLine + wordsInLine > maxWidth
            then ([newWord], 1, newWordLen, resultLine : currResult)
            else (newWord : currentLine, wordsInLine + 1, charsInLine + newWordLen, currResult)

    makeLine :: [String] -> Int -> Int -> String
    makeLine [] _ _ = error "Cannot makeLine with empty string!"
    makeLine [onlyWord] _ charsInLine =
      let extraSpaces = replicate (maxWidth - charsInLine) ' '
      in  onlyWord <> extraSpaces
    makeLine (first : rest) wordsInLine charsInLine =
      let (baseNumSpaces, numWithExtraSpace) = quotRem (maxWidth - charsInLine) (wordsInLine - 1)
          baseSpaces = replicate (wordsInLine - 1 - numWithExtraSpace) baseNumSpaces
          extraSpaces = replicate numWithExtraSpace (baseNumSpaces + 1)
          wordsWithSpaces = zip rest (baseSpaces <> extraSpaces)
      in  foldl combine first wordsWithSpaces

    makeLineFinal :: [String] -> Int -> Int -> String
    makeLineFinal [] _ _ = error "Cannot makeLine with empty string!"
    makeLineFinal strs wordsInLine charsInLine =
      let trailingSpaces = maxWidth - charsInLine - (wordsInLine - 1)
      in  foldl combine "" (zip strs (trailingSpaces : repeat 1))

    combine :: String -> (String, Int) -> String
    combine suffix (nextWord, numSpaces) = nextWord <> replicate numSpaces ' ' <> suffix
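
As a quick check against the example from the problem statement, a GHCi session with the code above loaded might look like this:

ghci> justifyText ["Study","Haskell","with","us","every","Monday","Morning","for","fun"] 16
["Study    Haskell","with   us  every","Monday   Morning","for fun         "]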

Rust Solution

Now let’s put together our Rust solution. Since we have a reasonable outline from writing this in Haskell, let’s start with the simpler elements, makeLine and makeLineFinal. We’ll use library functions as much as possible for the string manipulation. For example, we can start makeLineFinal by using join on our input vector of strings.

pub fn make_line_final(
        currentLine: &Vec<&str>,
        max_width: usize,
        charsInLine: usize) -> String {
    let mut result = currentLine.join(" ");
    ...
}

Now we just need to calculate the number of trailing spaces by subtracting the length of the joined string from the max width. We append these to the end by taking a single space and repeating it the correct number of times.

pub fn make_line_final(
        currentLine: &Vec<&str>,
        max_width: usize,
        charsInLine: usize) -> String {
    let mut result = currentLine.join(" ");
    let trailingSpaces = max_width - result.len();
    result.push_str(&" ".repeat(trailingSpaces));
    return result;
}

For those unfamiliar with Rust, the type of our input vector might seem odd. When we have &Vec<&str>, this means a reference to a vector of string slices. String slices are portions of a String that we hold a reference to, but they aren’t copied. However, when we join them, we make a new String result.

Also note that we aren’t passing wordsInLine as a separate parameter. We can get this value using .len() in constant time in Rust. In Haskell, length is O(n) so we don’t want to always do that.

Now for the general make_line function, we have the same type signature, but we start with our base case, where we only have one string in our current line. Again, we use repeat with the number of spaces.

pub fn make_line(
        currentLine: &Vec<&str>,
        max_width: usize,
        charsInLine: usize) -> String {
    let mut result = String::new();
    let n = currentLine.len();
    if (n == 1) {
        result.push_str(currentLine[0]);
        result.push_str(&" ".repeat(max_width - charsInLine));
        return result;
    }
    ...
}

Now we do the “math” portion of this. Rust doesn’t have a single quotRem function in its base library, so we calculate these values separately.

pub fn make_line(
        currentLine: &Vec<&str>,
        max_width: usize,
        charsInLine: usize) -> String {
    let mut result = String::new();
    let n = currentLine.len();
    if (n == 1) {
        result.push_str(currentLine[0]);
        result.push_str(&" ".repeat(max_width - charsInLine));
        return result;
    }
    let numSpaces = (max_width - charsInLine);
    let baseNumSpaces = numSpaces / (n - 1);
    let numWithExtraSpace = numSpaces % (n - 1);
    let mut i = 0;
    while i < n {
        ...
    }
    return result;
}

The while loop we’ll write here is instructive. We use an index instead of a for-each pattern because the index tells us how many spaces to use. If our index is smaller than numWithExtraSpace, we add 1 to the base number of spaces. Otherwise we use the base number, up until index n - 1. The last word gets no spaces after it, so we’re done at that point!

pub fn make_line(
        currentLine: &Vec<&str>,
        max_width: usize,
        charsInLine: usize) -> String {
    let mut result = String::new();
    let n = currentLine.len();
    if (n == 1) {
        result.push_str(currentLine[0]);
        result.push_str(&" ".repeat(max_width - charsInLine));
        return result;
    }
    let numSpaces = (max_width - charsInLine);
    let baseNumSpaces = numSpaces / (n - 1);
    let numWithExtraSpace = numSpaces % (n - 1);
    let mut i = 0;
    while i < n {
        result.push_str(currentLine[i]);
        if i < numWithExtraSpace {
            result.push_str(&" ".repeat(baseNumSpaces + 1));
        } else if i < n - 1 {
            result.push_str(&" ".repeat(baseNumSpaces));
        }
        i += 1;
    }
    return result;
}

Now we frame our solution. Let’s start by setting up our state variables (again, omitting wordsInLine, since we can get it from currentLine.len()). We’ll also redefine max_width as a usize value for ease of comparison later.

pub fn full_justify(words: Vec<String>, max_width: i32) -> Vec<String> {
    let mut currentLine = Vec::new();
    let mut charsInLine = 0;
    let mut result = Vec::new();
    let mw = max_width as usize;
    ...
}

Now we’d like to frame our solution as a “for each” loop. However, this doesn’t work, for Rust-related reasons we’ll describe after the solution! Instead, we’ll use an index loop.

pub fn full_justify(words: Vec<String>, max_width: i32) -> Vec<String> {
    let mut currentLine = Vec::new();
    let mut charsInLine = 0;
    let mut result = Vec::new();
    let mw = max_width as usize;
    let mut i = 0;
    let n = words.len();
    for i in 0..n {
        ...
    }
}

We’ll get the word by index on each iteration, and use its length to see if we’ll exceed the max width. If not, we can safely push it onto currentLine and increase the character count:

pub fn full_justify(words: Vec<String>, max_width: i32) -> Vec<String> {
    let mut currentLine = Vec::new();
    let mut charsInLine = 0;
    let mut result = Vec::new();
    let mw = max_width as usize;
    let mut i = 0;
    let n = words.len();
    for i in 0..n {
        let word = &words[i];
        if word.len() + charsInLine + currentLine.len() > mw {
            ...
        } else {
            currentLine.push(&words[i]);
            charsInLine += word.len();
        }
    }
}

Now when we do exceed the max width, we have to push our current line onto result (calling make_line). We clear the current line, push our new word, and use its length for charsInLine.

pub fn full_justify(words: Vec<String>, max_width: i32) -> Vec<String> {
    let mut currentLine = Vec::new();
    let mut charsInLine = 0;
    let mut result = Vec::new();
    let mw = max_width as usize;
    let mut i = 0;
    let n = words.len();
    for i in 0..n {
        let word = &words[i];
        if word.len() + charsInLine + currentLine.len() > mw {
            result.push(make_line(&currentLine, mw, charsInLine));
            currentLine.clear();
            currentLine.push(&words[i]);
            charsInLine = word.len();
        } else {
            currentLine.push(&words[i]);
            charsInLine += word.len();
        }
    }
    ...
}

After our loop, we’ll just call make_line_final on whatever is left in our currentLine! Here’s our complete full_justify function that calls make_line and make_line_final as we wrote above:

pub fn full_justify(words: Vec<String>, max_width: i32) -> Vec<String> {
    let mut currentLine = Vec::new();
    let mut charsInLine = 0;
    let mut result = Vec::new();
    let mw = max_width as usize;
    let mut i = 0;
    let n = words.len();
    for i in 0..n {
        let word = &words[i];
        if word.len() + charsInLine + currentLine.len() > mw {
            result.push(make_line(&currentLine, mw, charsInLine));
            currentLine.clear();
            currentLine.push(&words[i]);
            charsInLine = word.len();
        } else {
            currentLine.push(&words[i]);
            charsInLine += word.len();
        }
    }
    result.push(make_line_final(&currentLine, mw, charsInLine));
    return result;
}

Why an Index Loop?

Inside our Rust loop, we have an odd pattern in getting the “word” for this iteration. We first assign word = &words[i], and then later on, when we push that word, we reference words[i] again, using currentLine.push(&words[i]).

Why do this? Why not currentLine.push(word)? And then, why can’t we just do for word in words as our loop?

If we write our loop as for word in words, then we cannot reference the value word after the loop. It is “scoped” to the loop. However, currentLine “outlives” the loop! We have to reference currentLine at the end when we make our final line.

To get around this, we would basically have to copy the word instead of using a string reference &str, but this is unnecessarily expensive.

These are the sorts of odd “lifetime” quirks you have to learn to deal with in Rust. Haskell is easier in that it spares us from thinking about this. But Rust gains a significant performance boost with these sorts of ideas.

Conclusion

This was definitely the most involved problem we’ve dealt with so far. We learned a new paradigm (buffer and save), and got some experience dealing with some of the odd quirks and edge cases of string manipulation, especially in Rust. It was a fairly tricky problem, as far as list manipulation goes. For an easier example of a buffer and save problem, try solving Merge Intervals.

If you want to level up your Haskell problem solving skills, you need to take our course Solve.hs. This course will teach you everything you need to know about problem solving, data structures, and algorithms in Haskell. After this course, you’ll be in great shape to deal with these sorts of LeetCode style problems as they come up in your projects.

by James Bowen at June 30, 2025 08:30 AM

June 28, 2025

Magnus Therning

Reading Redis responses

When I began experimenting with writing a new Redis client package I decided to use lazy bytestrings, because:

  1. aeson seems to prefer it – the main encoding and decoding functions use lazy byte strings, though there are strict variants too.
  2. the Builder type in bytestring produces lazy bytestrings.

At the time I was happy to see that attoparsec seemed to support strict and lazy bytestrings equally well.

To get on with things I also wrote the simplest function I could come up with for sending and receiving data over the network – I used send and recv from Network.Socket.ByteString.Lazy in network. The function was really simple

import Network.Socket.ByteString.Lazy qualified as SB

sendCmd :: Conn -> Command r -> IO (Result r)
sendCmd (Conn p) (Command k cmd) = withResource p $ \sock -> do
    _ <- SB.send sock $ toWireCmd cmd
    resp <- SB.recv sock 4096
    case decode resp of
        Left err -> pure $ Left $ RespError "decode" (TL.pack err)
        Right r -> pure $ k <$> fromWireResp cmd r

with decode defined like this

decode :: ByteString -> Either String Resp
decode = parseOnly resp

I knew I'd have to revisit this function; it was naïve to believe that a call to recv would always result in a single complete response. It was however good enough to get going. When I got to improving sendCmd I was a little surprised to find that I'd also have to switch to using strict bytestrings in the parser.

Interlude on the Redis serialisation protocol (RESP3)

The Redis protocol has some defining attributes

  • It's somewhat of a binary protocol. If you stick to keys and values that fall within the set of ASCII strings, then the protocol is humanly readable and you can rather easily use netcat or telnet as a client. However, you aren't limited to storing only readable strings.
  • It's somewhat of a request-response protocol. A notable exception is the publish-subscribe subset, but it's rather small and I reckon most Redis users don't use it.
  • It's somewhat of a type-length-value style protocol. Some of the data types include their length in bytes, e.g. bulk strings and verbatim strings. Other types include the number of elements, e.g. arrays and maps. A large number of them have no length at all, e.g. simple strings, integers, and doubles.

I suspect there are good reasons; I gather a lot of it has to do with speed. It does however cause one issue when writing a client: it's not possible to read a whole response without parsing it.
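
As a rough illustration (not code from the client package), here is what GET foo looks like on the wire: the request is an array of bulk strings, and a reply's length prefixes must be parsed before you know how many more bytes to read.

{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString (ByteString)

-- The request: an array (*2) of two bulk strings ($3 ...).
request :: ByteString
request = "*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n"

-- A bulk string reply: only after parsing "$3" do we know that 3 more bytes
-- (plus the trailing "\r\n") are coming.
reply :: ByteString
reply = "$3\r\nbar\r\n"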

Rewriting sendCmd

With that extra information about the RESP3 protocol the naïve implementation above falls short in a few ways

  • The read buffer may contain more than one full message, and given the definition of decode above, any remaining bytes are simply dropped.1
  • The read buffer may contain less than one full message, and then decode will return an error.2

Surely this must be solvable, because in my mind running the parser results in one of three things:

  1. Parsing is done and the result is returned, together with any input that wasn't consumed.
  2. The parsing is not done due to lack of input, this is typically encoded as a continuation.
  3. The parsing failed so the error is returned, together with input that wasn't consumed.

So, I started looking in the documentation for the module Data.Attoparsec.ByteString.Lazy in attoparsec. I was a little surprised to find that the Result type lacked a way to feed more input to a parser – it only has two constructors, Done and Fail:

data Result r
    = Fail ByteString [String] String
    | Done ByteString r

I'm guessing the idea is that the function producing the lazy bytestring in the first place should be able to produce more chunks of data on demand. That's likely what the lazy variant of recv does, but at the same time it also requires choosing a maximum length and that doesn't rhyme with RESP3. The lazy recv isn't quite lazy in the way I needed it to be.

When looking at the parser for strict bytestrings I calmed down. This parser follows what I've learned about parsers (it's not defined exactly like this; it's parameterised in its input but for the sake of simplicity I show it with ByteString as input):

data Result r
    = Fail ByteString [String] String
    | Partial (ByteString -> Result r)
    | Done ByteString r

Then to my delight I found that there's already a function for handling exactly my problem

parseWith :: Monad m => (m ByteString) -> Parser a -> ByteString -> m (Result a)

I only needed to rewrite the existing parser to work with strict bytestrings and work out how to write a function using recv (for strict bytestrings) that fulfils the requirements to be used as the first argument to parseWith. The first part wasn't very difficult due to the similarity between attoparsec's APIs for lazy and strict bytestrings. The second only had one complication. It turns out recv is blocking, but of course that doesn't work well with parseWith. I wrapped it in timeout based on the idea that timing out means there's no more data and the parser should be given an empty string so it finishes. I also decided to pass the parser as an argument, so I could use the same function for receiving responses for individual commands as well as for pipelines. The full receiving function is

import Data.ByteString qualified as BS
import Data.Text qualified as T
import Network.Socket qualified as S
import Network.Socket.ByteString qualified as SB
import System.Timeout (timeout)

recvParse :: S.Socket -> BS.ByteString -> Parser r -> IO (Either Text (BS.ByteString, r))
recvParse sock initial parser = do
    -- Start parsing from any bytes left over from a previous call.
    parseWith receive parser initial >>= \case
        Fail _ [] err -> pure $ Left (T.pack err)
        Fail _ ctxs err -> pure $ Left $ T.intercalate " > " (T.pack <$> ctxs) <> ": " <> T.pack err
        Partial _ -> pure $ Left "impossible error"
        Done rem result -> pure $ Right (rem, result)
  where
    receive =
        timeout 100_000 (SB.recv sock 4096) >>= \case
            Nothing -> pure BS.empty
            Just bs -> pure bs

Then I only needed to rewrite sendCmd, and I wanted to do it in such a way that any remaining input data could be used by the next call to sendCmd.3 I settled for modifying the Conn type to hold an IORef ByteString together with the socket, and then the function ended up looking like this

sendCmd :: Conn -> Command r -> IO (Result r)
sendCmd (Conn p) (Command k cmd) = withResource p $ \(sock, remRef) -> do
    _ <- SBL.send sock $ toWireCmd cmd
    rem <- readIORef remRef
    recvParse sock rem resp >>= \case
        Left err -> pure $ Left $ RespError "recv/parse" err
        Right (newRem, r) -> do
            writeIORef remRef newRem
            pure $ k <$> fromWireResp cmd r

What's next?

I've started looking into pub/sub, and basically all of the work described in this post is a prerequisite for that. It's not very difficult on the protocol level, but I think it's difficult to come up with a design that allows maximal flexibility. I'm not even sure it's worthwhile the complexity.

Footnotes:

1

This isn't that much of a problem when sticking to the request-response commands, I think. It most certainly becomes a problem with pub/sub though.

2

I'm sure that whatever size of buffer I choose to use there'll be someone out there who's storing values that are larger. Then there's pipelining that makes it even more of an issue.

3

To be honest I'm not totally convinced there'll ever be any remaining input. Unless a single Conn is used by several threads – which would lead to much pain with the current implementation – or pub/sub is used – which isn't supported yet.

June 28, 2025 10:41 AM

June 27, 2025

Brent Yorgey

Competitive programming in Haskell: prefix sums

Competitive programming in Haskell: prefix sums

Posted on June 27, 2025

In a previous blog post I categorized a number of different techniques for calculating range queries. Today, I will discuss one of those techniques which is simple but frequently useful.

Precomputing prefix sums

Suppose we have a static sequence of values \(a_1, a_2, a_3, \dots, a_n\) drawn from some group (that is, there is an associative binary operation with an identity element, and every element has an inverse), and want to be able to compute the total value (according to the group operation) of any contiguous subrange. That is, given a range \([i,j]\), we want to compute \(a_i \diamond a_{i+1} \diamond \dots \diamond a_j\) (where \(\diamond\) is the group operation). For example, we might have a sequence of integers and want to compute the sum, or perhaps the bitwise xor (but not the maximum), of all the values in any particular subrange.

Of course, we could simply compute \(a_i \diamond \dots \diamond a_j\) directly, but that takes \(O(n)\) time. With some simple preprocessing, it’s possible to compute the value of any range in constant time.

The key idea is to precompute an array \(P\) of prefix sums, so \(P_i = a_1 \diamond \dots \diamond a_i\). This can be computed in linear time via a scan; for example:

import Data.Array
import Data.List (scanl')

prefix :: Monoid a => [a] -> Array Int a
prefix a = listArray (0, length a) $ scanl' (<>) mempty a

Actually, I would typically use an unboxed array, which is faster but slightly more limited in its uses: import Data.Array.Unboxed, use UArray instead of Array, and add an IArray UArray a constraint.

Note that we set \(P_0 = 0\) (or whatever the identity element is for the group); this is why I had the sequence of values indexed starting from \(1\), so \(P_0\) corresponds to the empty sum, \(P_1 = a_1\), \(P_2 = a_1 \diamond a_2\), and so on.

Now, for the value of the range \([i,j]\), just compute \(P_j \diamond P_{i-1}^{-1}\)—that is, we start with a prefix that ends at the right place, then cancel or “subtract” the prefix that ends right before the range we want. For example, to find the sum of the integers \(a_5 + \dots + a_{10}\), we can compute \(P_{10} - P_4\).

range :: Group a => Array Int a -> Int -> Int -> a
range p i j = p!j <> inv (p!(i-1))

That’s why this only works for groups but not for general monoids: only in a group can we cancel unwanted values. So, for example, this works for finding the sum of any range, but not the maximum.
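
As a small usage sketch (not from the original post), we can specialize to sums of Ints and inline the group inverse as ordinary subtraction:

import Data.Array
import Data.Monoid (Sum (..))

-- Range sum over the prefix table from above, inlining the inverse for Sum.
rangeSum :: Array Int (Sum Int) -> Int -> Int -> Int
rangeSum p i j = getSum (p ! j) - getSum (p ! (i - 1))

-- ghci> let p = prefix (map Sum [3, 1, 4, 1, 5, 9, 2, 6])
-- ghci> rangeSum p 2 5    -- a_2 + a_3 + a_4 + a_5 = 1 + 4 + 1 + 5
-- 11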

Practice problems

Want to practice? Here are a few problems that can be solved using techniques discussed in this post:

It is possible to generalize this scheme to 2D—that is, to compute the value of any subrectangle of a 2D grid of values from some group in only \(O(1)\) time. I will leave you the fun of figuring out the details.

If you’re looking for an extra challenge, here are a few harder problems which use techniques from this post as an important component, but require some additional nontrivial ingredients:


by Brent Yorgey at June 27, 2025 12:00 AM

June 25, 2025

Well-Typed.Com

Haskell records in 2025 (Haskell Unfolder #45)

Today, 2025-06-25, at 1830 UTC (11:30 am PDT, 2:30 pm EDT, 7:30 pm GMT, 20:30 CET, …) we are streaming the 45th episode of the Haskell Unfolder live on YouTube.

Haskell records in 2025 (Haskell Unfolder #45)

Haskell records as originally designed have had a reputation of being somewhat weird or, at worst, useless. A lot of features and modifications have been proposed over the years to improve the situation. But not all of these got implemented or gained widespread adoption. The result is that the situation now is quite different from what it was in the old days, and additional changes are in the works. But the current state can be a bit confusing. Therefore, in this episode, we are going to look at how to make the best use of Haskell records right now, discussing extensions such as DuplicateRecordFields, NoFieldSelectors, OverloadedRecordDot and OverloadedRecordUpdate, and we’ll take a brief look at optics.

About the Haskell Unfolder

The Haskell Unfolder is a YouTube series about all things Haskell hosted by Edsko de Vries and Andres Löh, with episodes appearing approximately every two weeks. All episodes are live-streamed, and we try to respond to audience questions. All episodes are also available as recordings afterwards.

We have a GitHub repository with code samples from the episodes.

And we have a public Google calendar (also available as ICal) listing the planned schedule.

There’s now also a web shop where you can buy t-shirts and mugs (and potentially in the future other items) with the Haskell Unfolder logo.

by andres, edsko at June 25, 2025 12:00 AM

June 24, 2025

Haskell Interlude

66: Daniele Micciancio

Niki and Mike talked to Daniele Micciancio who is a professor at UC San Diego. He's been using Haskell for 20 years, and works in lattice cryptography. We talked to him about how he got into Haskell, using Haskell for teaching theoretical computer science and of course for his research and the role type systems and comonads could play in the design of cryptographic algorithms. Along the way, he gave an accessible introduction to post-quantum cryptography which we really enjoyed. We hope you do, too. 

by Haskell Podcast at June 24, 2025 02:00 PM

June 23, 2025

Brent Yorgey

Competitive programming in Haskell: range queries, classified

Competitive programming in Haskell: range queries, classified

Posted on June 23, 2025

Static range queries

Suppose we have a sequence of values, which is static in the sense that the values in the sequence will never change, and we want to perform range queries, that is, for various ranges we want to compute the total of all consecutive values in the range, according to some binary combining operation. For example, we might want to compute the maximum, sum, or product of all the consecutive values in a certain subrange. We have various options depending on the kind of ranges we want and the algebraic properties of the operation.

  • If we want ranges corresponding to a sliding window, we can use an amortized queue structure to find the total of each range in \(O(1)\), for an arbitrary monoid.

  • If we want arbitrary ranges but the operation is a group, the solution is relatively straightforward: we can precompute all prefix sums, and subtract to find the result for an arbitrary range in \(O(1)\).

  • If the operation is an idempotent semigroup (that is, it has the property that \(x \diamond x = x\) for all \(x\)), we can use a sparse table, which takes \(O(n \lg n)\) time and space for precomputation, and then allows us to answer arbitrary range queries in \(O(1)\).

  • If the operation is an arbitrary monoid, we can use a sqrt tree, which uses \(O(n \lg \lg n)\) precomputed time and space, and allows answering arbitrary range queries in \(O(\lg \lg n)\). I will write about this in a future post.

Dynamic range queries

What if we want dynamic range queries, that is, we want to be able to interleave range queries with arbitrary updates to the values of the sequence?

  • If the operation is an arbitrary monoid, we can use a segment tree.
  • If the operation is a group, we can use a Fenwick tree.

I published a paper about Fenwick trees, which also discusses segment trees, but I should write more about them here!

Table

Here’s a table summarizing the above classification scheme. I plan to fill in links as I write blog posts about each row.

Sequence  Ranges          Operation             Solution          Precomputation      Queries
Static    Sliding window  Monoid                Amortized queue   \(O(1)\)            \(O(1)\)
Static    Arbitrary       Group                 Prefix sum table  \(O(n)\)            \(O(1)\)
Static    Arbitrary       Idempotent semigroup  Sparse table      \(O(n \lg n)\)      \(O(1)\)
Static    Arbitrary       Monoid                Sqrt tree         \(O(n \lg \lg n)\)  \(O(\lg \lg n)\)
Dynamic   Arbitrary       Group                 Fenwick tree      \(O(n)\)            \(O(\lg n)\)
Dynamic   Arbitrary       Monoid                Segment tree      \(O(n)\)            \(O(\lg n)\)

by Brent Yorgey at June 23, 2025 12:00 AM

June 22, 2025

Philip Wadler

How to market Haskell to a mainstream programmer

An intriguing talk by Gabriella Gonzalez, delivered at Haskell Love 2020. Based largely on the famous marketing book, Crossing the Chasm. Gonzalez argues that marketing is not about hype, it is about setting priorities: what features and markets are you going to ignore? The key to adoption is to be able to solve a problem that people need solved today and where existing mainstream tools are inadequate. Joe Armstrong will tell you that the key to getting Erlang used was to approach failing projects and ask "Would you like us to build you a prototype?" Gonzalez makes a strong case that Haskell should first aim to capture the interpreters market. He points out that the finance/blockchain market may be another possibility. Recommended to me at Lambda Days by Pedro Abreu, host of the Type Theory Forall podcast.



by Philip Wadler (noreply@blogger.com) at June 22, 2025 07:07 PM

What is happening in Gaza is an injury to our collective conscience. We must be allowed to speak out


A powerful op-ed by Gabor Maté in the Toronto Star.

Just as nothing justifies the atrocities of October 7, nothing about October 7 justifies Israeli atrocities against the Palestinians, either before or since October 7. Recently, I listened to orthopedic surgeon Dr. Deirdre Nunan, like me a graduate of UBC’s Faculty of Medicine, recount her harrowing experiences serving in a Gaza hospital under the siege that followed Israel’s breaking of the ceasefire in March. Her depictions of unspeakable horror, enacted as policy by one of the world’s most sophisticated militaries, were soul shattering. Many other physicians — Canadian, American, Jewish, Muslim, Christian — who have worked in Gaza speak in similar terms. British doctors describe witnessing “a slaughterhouse.” All their testimonies are widely accessible. The leading medical journal Lancet editorialized that in its assault on health care facilities and personnel in Gaza, “the Israeli Government has acted with impunity … Many medical academies and health professional organizations that claim a commitment to social justice have failed to speak out.” ...

It may be true that antisemitic animus can lurk behind critiques of Zionism. But in my decades of advocacy for Palestinian rights including medical visits to Gaza and the West Bank, I have rarely witnessed it. When present, it has a certain tone that one can feel is directed at Jewishness itself, rather than at the theory and practice of Zionism or at Israel’s actions. What is far more common and genuinely confusing for many is that Israel and its supporters, Jews and non-Jews, habitually confound opposition to Israeli policy with antisemitism. This is akin to Vietnam War protesters being accused of anti-Americanism. How is opposing the napalming of human beings anti-American or, say, deploring Israel’s use of mass starvation as a weapon of war in any sense anti-Jewish? ...

People deserve the right to experience as much liberty to publicly mourn, question, oppose, deplore, denounce what they perceive as the perpetration of injustice and inhumanity as they are, in this country, to advocate for the aims and actions of the Israeli government and its Canadian abettors amongst our political leadership, academia, and media.

Even if we feel powerless to stop the first genocide we have ever watched on our screens in real time, allow at least our hearts to be broken openly, as mine is. And more, let us be free to take democratic, non-hateful action without fear of incurring the calumny of racism.

Thanks to a colleague in the Scottish Universities Jewish Staff Network for bringing it to my attention.

by Philip Wadler (noreply@blogger.com) at June 22, 2025 05:03 PM

The Provocateurs: Brave New Bullshit

[Reposting with update.]

Following two sell-out shows at the Fringe last year, I'm on at the Fringe again:

11.25 Monday 4 August, Stand 2 w/Lucy Remnant and Susan Morrison
17.40 Sunday 17 August, Stand 4 w/Smita Kheria and Sarah-Jane Judge
17.40 Tuesday 19 August, Stand 4 w/Cameron Wyatt and Susan Morrison

Shows are under the banner of The Provocateurs (formerly Cabaret of Dangerous Ideas). Tickets go on sale Wednesday 7 May, around noon. The official blurb is brief:

Professor Philip Wadler (The University of Edinburgh) separates the hopes and threats of AI from the chatbot bullshit.

Here is a longer blurb, from my upcoming appearance at Curious, run by the RSE, in September.
Brave New Bullshit
In an AI era, who wins and who loses?

Your future workday might look like this: 
  • You write bullet points.
  • You ask a chatbot to expand them into a report.
  • You send it to your boss ...
  • Who asks a chatbot to summarise it to bullet points.
Will AI help you to do your job or take it from you? Is it fair for AI to be trained on copyrighted material? Will any productivity gains benefit everyone or only a select few?
 
Join Professor Philip Wadler’s talk as he looks at the hopes and threats of AI, exploring who wins and who loses.

by Philip Wadler (noreply@blogger.com) at June 22, 2025 04:40 PM

June 20, 2025

Magnus Therning

Finding a type for Redis commands

Arriving at a type for Redis commands required a bit of exploration. I had some ideas early on that I ended up dropping along the way for various reasons. This is a post about my travels; hopefully someone finds it worth reading.

The protocol

The Redis Serialization Protocol (RESP) initially reminded me of JSON and I thought that following the pattern of aeson might be a good idea. I decided up-front that I'd only support the latest version of RESP, i.e. version 3. So, I thought of a data type, Resp with a constructor for each RESP3 data type, and a pair of type classes, FromResp and ToResp for converting between Haskell types and RESP3. Then after some more reflection I realised that converting to RESP is largely pointless. The main reason to convert anything to RESP3 is to assemble a command, with its arguments, to send to Redis, but all commands are arrays of bulk strings so it's unlikely that anyone will actually use ToResp.1 So I scrapped the idea of ToResp. FromResp looked like this

class FromResp a where
    fromResp :: Resp -> Either FromRespError a
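
For orientation, here is a rough sketch of what such a Resp type could look like (my sketch, based on the RESP3 data types and the constructors used later in this post, not necessarily the package's actual definition):

data Resp
    = SimpleString ByteString
    | SimpleError ByteString ByteString  -- error code and description
    | Number Int64                       -- Int64 from Data.Int
    | BulkString ByteString
    | Null
    | Array [Resp]
    -- plus constructors for the remaining RESP3 types
    -- (booleans, doubles, big numbers, maps, sets, pushes)
    deriving (Show, Eq)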

When I started defining commands I didn't like the number of ByteString arguments that this resulted in, so I defined a data type, Arg, and an accompanying type class for arguments, ToArg:

newtype Arg = Arg {unArg :: [ByteString]}
    deriving (Show, Semigroup, Monoid)

class ToArg a where
    toArg :: a -> Arg

Later on I saw that it might also be nice to have a type class specifically for keys, ToKey, together with a Key type that wraps a single ByteString.
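
To make that concrete, here is a rough sketch of how a couple of ToArg instances and the key wrapper could look (my illustration, not necessarily the package's actual definitions; Char8 is assumed to be Data.ByteString.Char8 imported qualified):

instance ToArg ByteString where
    toArg = Arg . (: [])

instance ToArg Int where
    toArg = Arg . (: []) . Char8.pack . show

newtype Key = Key {unKey :: ByteString}

class ToKey a where
    toKey :: a -> Key

instance ToKey ByteString where
    toKey = Key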

Implementing the functions to encode/decode the protocol was a straightforward application of attoparsec and bytestring (using its Builder).
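
As an illustration, here is a minimal sketch (mine, not the package's code) of what such encoding and decoding could look like for just bulk strings and arrays, building on the Resp sketch above:

{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ((<|>))
import Control.Monad (replicateM)
import Data.Attoparsec.ByteString.Char8 (Parser, char, decimal, string)
import qualified Data.Attoparsec.ByteString.Char8 as A
import qualified Data.ByteString as BS
import Data.ByteString.Builder (Builder, byteString, intDec)

-- Encoding: "$<len>\r\n<bytes>\r\n" for bulk strings, "*<n>\r\n<items>" for arrays.
encodeResp :: Resp -> Builder
encodeResp (BulkString bs) = byteString "$" <> intDec (BS.length bs) <> crlf <> byteString bs <> crlf
encodeResp (Array xs) = byteString "*" <> intDec (length xs) <> crlf <> foldMap encodeResp xs
encodeResp _ = error "sketch: remaining constructors omitted"

crlf :: Builder
crlf = byteString "\r\n"

-- Decoding mirrors the encoder (attoparsec backtracks on failure, so <|> is fine here).
respParser :: Parser Resp
respParser = bulkString <|> array
  where
    bulkString = do
        n <- char '$' *> decimal <* string "\r\n"
        BulkString <$> A.take n <* string "\r\n"
    array = do
        n <- char '*' *> decimal <* string "\r\n"
        Array <$> replicateM n respParser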

A command is a function in need of a sender

Even though supporting pipelining was one of the goals, I felt a need to make sure I'd understood the protocol, so I started off with single commands. The protocol is, at its core, a simple request/response protocol, so I settled on this type for commands

type Cmd a = forall m. (Monad m) => (ByteString -> m ByteString) -> m (Either FromRespError a)

that is, a command is a function that accepts a sender and returns either an error or a value of type a.

I wrote a helper function for defining commands, sendCmd

sendCmd :: (Monad m, FromResp a) => [ByteString] -> (ByteString -> m ByteString) -> m (Either FromRespError a)
sendCmd cmdArgs send = do
    let cmd = encode $ Array $ map BulkString cmdArgs
    send cmd <&> decode >>= \case
        Left desc -> pure $ Left $ FromRespError "Decode" (Text.pack desc)
        Right v -> pure $ fromResp v

which made it easy to define commands. Here are two examples, append and mget:

append :: (ToArg a, ToArg b) => a -> b -> Cmd Int
append key val = sendCmd $ ["APPEND"] <> unArg (toArg key <> toArg val)

-- | https://redis.io/docs/latest/commands/mget/
mget :: (ToArg a, FromResp b) => NE.NonEmpty a -> Cmd (NE.NonEmpty b)
mget ks = sendCmd $ ["MGET"] <> unArg (foldMap1 toArg ks)

The function to send off a command and receive its response, sendAndRecieve, was just a call to send followed by a call to recv in network (the variants for lazy bytestrings).
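
Putting it together, running a single command is then just a matter of applying it to a sender, along these lines (a sketch; runCmd is my own name, and the exact shapes of Connection and sendAndRecieve are assumptions):

-- Needs RankNTypes, since Cmd is a rank-2 type.
runCmd :: Connection -> Cmd a -> IO (Either FromRespError a)
runCmd conn cmd = cmd (sendAndRecieve conn)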

I sort of liked this representation – there's always something pleasant about finding a way to represent something as a function. There's a very big problem with it though: it's difficult to implement pipelining!

Yes, Cmd is a functor since (->) r is a functor, and thus it's possible to make it an Applicative, e.g. using free. However, to implement pipelining it's necessary to

  1. encode all commands, then
  2. concatenate them all into a single bytestring and send it
  3. read the response, which is a concatenation of the individual commands' responses, and
  4. convert each separate response from RESP3.

That isn't easy when each command contains its own encoding and decoding. The sender function would have to relinquish control after encoding the command, and then be resumed later to decode the response. I suspect it's doable using continuations, or monad-coroutine, but it felt complicated, and rather than travelling down that road I asked for ideas on the Haskell Discourse. The replies led me to a paper, Free delivery, and a bit later to a package, monad-batcher. When I got the pointer to the package I'd already read the paper and started implementing the ideas in it, so I decided to save exploring monad-batcher for later.

A command for free delivery

The paper Free delivery is a perfect match for pipelining in Redis, and my understanding is that it proposes a solution where

  1. Commands are defined as a GADT, Command a.
  2. Two functions are defined to serialise and deserialise a Command a. In the paper they use String as the serialisation, so show and read are used.
  3. A type, ActionA a, is defined that combines a command with a modification of its a result. It implements Functor.
  4. A free type, FreeA f a is defined, and made into an Applicative with the constraint that f is a Functor.
  5. A function, serializeA, is defined that traverses a FreeA ActionA a serialising each command.
  6. A function, deserializeA, is defined that traverses a FreeA ActionA a deserialising the response for each command.

I defined a command type, Command a, with only three commands in it, echo, hello, and ping. I then followed the recipe above to verify that I could get it working at all. The Haskell used in the paper is showing its age, and there seems to be a Functor instance missing, but it was still straightforward and I could verify that it worked against a locally running Redis.

Then I made a few changes…

I renamed the command type to Cmd so I could use Command for what the paper calls ActionA.

data Cmd r where
    Echo :: Text -> Cmd Text
    Hello :: Maybe Int -> Cmd ()
    Ping :: Maybe Text -> Cmd Text

data Command a = forall r. Command !(r -> a) !(Cmd r)

instance Functor Command where
    fmap f (Command k c) = Command (f . k) c

toWireCmd :: Cmd r -> ByteString
toWireCmd (Echo msg) = _
toWireCmd (Hello ver) = _
toWireCmd (Ping msg) = _

fromWireResp :: Cmd r -> Resp -> Either RespError r
fromWireResp (Echo _) = fromResp
fromWireResp (Hello _) = fromResp
fromWireResp (Ping _) = fromResp

(At this point I was still using FromResp.)

I also replaced the free applicative defined in the paper and started using free. A couple of type aliases make it a little easier to write nice signatures

type Pipeline a = Ap Command a

type PipelineResult a = Validation [RespError] a

and defining individual pipeline commands turned into something rather mechanical. (I also swapped the order of the arguments to build a Command so I can use point-free style here.)

liftPipe :: (FromResp r) => Cmd r -> Pipeline r
liftPipe = liftAp . Command id

echo :: Text -> Pipeline Text
echo = liftPipe . Echo

hello :: Maybe Int -> Pipeline ()
hello = liftPipe . Hello

ping :: Maybe Text -> Pipeline Text
ping = liftPipe . Ping

One nice thing about switching to free was that serialisation became very simple

toWirePipeline :: Pipeline a -> ByteString
toWirePipeline = runAp_ $ \(Command _ c) -> toWireCmd c

On the other hand deserialisation became a little more involved, but it's not too bad

fromWirePipelineResp :: Pipeline a -> [Resp] -> PipelineResult a
fromWirePipelineResp (Pure a) _ = pure a
fromWirePipelineResp (Ap (Command k c) p) (r : rs) = fromWirePipelineResp p rs <*> (k <$> liftError singleton (fromWireResp c r))
fromWirePipelineResp _ _ = Failure [RespError "fromWirePipelineResp" "Unexpected wire result"]
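
To see how the pieces fit together, here is a hypothetical usage sketch (sendRaw, a function that sends the bytes and returns the already-split list of RESP3 replies, is an assumption and not part of the library; OverloadedStrings assumed):

greetings :: Pipeline (Text, Text)
greetings = hello (Just 3) *> ((,) <$> echo "hello" <*> ping Nothing)

runGreetings :: (ByteString -> IO [Resp]) -> IO (PipelineResult (Text, Text))
runGreetings sendRaw = fromWirePipelineResp greetings <$> sendRaw (toWirePipeline greetings)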

Everything was working nicely and I started adding support for more commands. I used the small service from work to guide my choice of what commands to add. First out was del, then get and set. After adding lpush I was pretty much ready to try to replace hedis in the service from work.

data Cmd r where
    -- echo, hello, ping
    Del :: (ToKey k) => NonEmpty k -> Cmd Int
    Get :: (ToKey k, FromResp r) => k -> Cmd r
    Set :: (ToKey k, ToArg v) => k -> v -> Cmd Bool
    Lpush :: (ToKey k, ToArg v) => k -> NonEmpty v -> Cmd Int

However, when looking at the above definition I started thinking.

  • Was it really a good idea to litter Cmd with constraints like that?
  • Would it make sense to keep the Cmd type a bit closer to the actual Redis commands?
  • Also, maybe FromResp wasn't such a good idea after all, what if I remove it?

That brought me to the third version of the type for Redis commands.

Converging and simplifying

While adding new commands and writing instances of FromResp I slowly realised that my initial thinking of RESP3 as somewhat similar to JSON didn't really pan out. I had quickly dropped ToResp, and now the instances of FromResp didn't sit right with me either. They obviously had to "follow the commands", so to speak, but at the same time allow users to bring their own types. For instance, LPUSH returns the length of the list after the push, but at the same time GET should be able to return an Int too. This led to Int's FromResp looking like this

instance FromResp Int where
    fromResp (BulkString bs) =
        case parseOnly (AC8.signed AC8.decimal) bs of
            Left s -> Left $ RespError "FromResp" (TL.pack s)
            Right n -> Right n
    fromResp (Number n) = Right $ fromEnum n
    fromResp _ = Left $ RespError "FromResp" "Unexpected value"

I could see this becoming worse. Take the instance for Bool (sketched below): I'd have to consider that

  • for MOVE Integer 1 means True and Integer 0 means False
  • for SET SimpleString "OK" means True
  • users would justifiably expect a bunch of bytestrings to be True, e.g. BulkString "true", BulkString "TRUE", BulkString "1", etc
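
A sketch of roughly what such an instance would have to look like (my illustration, not package code; OverloadedStrings assumed):

instance FromResp Bool where
    fromResp (Number 1) = Right True           -- e.g. MOVE
    fromResp (Number 0) = Right False
    fromResp (SimpleString "OK") = Right True  -- e.g. SET
    fromResp (BulkString bs)
        | bs `elem` ["1", "true", "TRUE"] = Right True
        | bs `elem` ["0", "false", "FALSE"] = Right False
    fromResp _ = Left $ RespError "FromResp" "Unexpected value"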

However, it's impossible to cover all the ways users can encode a Bool in a ByteString, so no matter what I do, users will end up having to wrap their Bool in a newtype and implement a fitting FromResp. On top of that, even though I haven't found an example of it yet, I fully expect that somewhere in the large set of Redis commands there are at least two commands each wanting an instance of a basic type that simply can't be combined into a single instance, meaning that the client library would need to do some newtype wrapping too.

No, I really didn't like it! So, could I get rid of FromResp and still offer users an API where they can use their own types as the result of commands?

To be concrete I wanted this

data Cmd r where
    -- other commands
    Get :: (ToKey k) => k -> Cmd (Maybe ByteString)

and I wanted the user to be able to conveniently turn a Cmd r into a Cmd s. In other words, I wanted a Functor instance. Making Cmd itself a functor isn't necessary and I just happened to already have a functor type that wraps Cmd, the Command type I used for pipelining. If I were to use that I'd need to write wrapper functions for each command though, but if I did that then I could also remove the ToKey/ToArg constraints from the constructors of Cmd r and put them on the wrapper instead. I'd get

data Cmd r where
    -- other commands
    Get :: Key -> Cmd (Maybe ByteString)

get :: (ToKey k) => k -> Command (Maybe ByteString)
get = Command id . Get . toKey

I'd also have to rewrite fromWireResp so it's more specific for each command. Instead of

fromWireResp :: Cmd r -> Resp -> Either RespError r
fromWireResp (Get _) = fromResp
...

I had to match up exactly on the possible replies to GET

fromWireResp :: Cmd r -> Resp -> Either RespError r
fromWireResp _ (SimpleError err desc) = Left $ RespError (T.decodeUtf8 err) (T.decodeUtf8 desc)
fromWireResp (Get _) (BulkString bs) = Right $ Just bs
fromWireResp (Get _) Null = Right Nothing
...
fromWireResp _ _ = Left $ RespError "fromWireResp" "Unexpected value"

Even though it was more code I liked it better than before, and I think it's slightly simpler code. I also hope it makes use of the API a bit simpler and clearer.

Here's an example from the code for the service I wrote for work. It reads a UTC timestamp stored in timeKey; the timestamp is a JSON string, so it needs to be decoded.

readUTCTime :: Connection -> IO (Maybe UTCTime)
readUTCTime conn =
    sendCmd conn (maybe Nothing decode <$> get timeKey) >>= \case
        Left _ -> pure Nothing
        Right datum -> pure datum

What's next?

I'm pretty happy with the command type for now, though I have a feeling I'll have to revisit Arg and ToArg at some point.

I've just turned the Connection type into a pool using resource-pool, and I started looking at pub/sub. The latter thing, pub/sub, will require some thought and experimentation I think. Quite possibly it'll end up in a post here too.

I also have a lot of commands to add.

Footnotes:

1

Of course one could use RESP3 as the serialisation format for storing values in Redis. Personally I think I'd prefer using something more widely used, and easier to read, such as JSON or BSON.

June 20, 2025 09:40 PM

Well-Typed.Com

GHC activities report: March–May 2025

This is the twenty-seventh edition of our GHC activities report, which describes the work Well-Typed are doing on GHC, Cabal, HLS and other parts of the core Haskell toolchain. The current edition covers roughly the months of March 2025 to May 2025. You can find the previous editions collected under the ghc-activities-report tag.

Sponsorship

We offer Haskell Ecosystem Support Packages to provide commercial users with support from Well-Typed’s experts, while investing in the Haskell community and its technical ecosystem including through the work described in this report. To find out more, read our recent announcement of these packages in partnership with the Haskell Foundation. We need funding to continue this essential maintenance work!

Many thanks to our Haskell Ecosystem Supporters: Channable and QBayLogic; to our existing clients who also contribute to making this work possible: Anduril, Juspay and Mercury; and to the HLS Open Collective for supporting HLS release management.

Team

The Haskell toolchain team at Well-Typed currently includes:

In addition, many others within Well-Typed contribute to GHC, Cabal and HLS occasionally, or contribute to other open source Haskell libraries and tools.

GHC

Highlights

Explicit level imports

Following on from our best paper prize at TFP 2025, Matthew implemented Explicit Level Imports (GHC proposal #682, !14241).

This feature allows one to specify whether imports are needed for running Template Haskell splices, or for generating Template Haskell quotes. This cleanly separates which modules are required at compile-time vs those that are required at runtime. For example, the pandoc package uses the Template Haskell deriveJSON function from the aeson package. This function can be imported using a splice import:

{-# LANGUAGE ExplicitLevelImports #-}
{-# LANGUAGE TemplateHaskell #-}
module Text.Pandoc.App.Opt where
import splice Data.Aeson.TH (deriveJSON, defaultOptions)
-- + many other non-splice imports

data XYZ = ...
$(deriveJSON defaultOptions ''XYZ)

Declaring the Data.Aeson.TH import as a splice import informs GHC that this module is required only at compile-time, and (crucially) that other, non-splice, imports are not needed at compile time. This hugely improves the performance of tools that use -fno-code (such as HLS), as GHC is no longer required to pessimistically assume that all modules imported in a module enabling TemplateHaskell are required at compile-time.

GHCi support for primops

Andreas significantly improved GHCi performance by implementing certain GHC primops (such as integer arithmetic operations) directly in the bytecode interpreter (!13978).

Reductions in runtime of up to 50% have been observed, with GHC-in-GHCi speeding up by about 15%.

Improvements to the debugger

Rodrigo has made numerous improvements to the GHCi debugger, which had accumulated many bugs over the years due to lack of maintenance (!14246, !14195, !14160, !14106, !14196, !14195, !13997). Usability is improved across the board, with quality-of-life fixes ranging from adding breakpoints to all statements in a do block to make debugging more predictable (#25932), to significant performance improvements to :steplocal (#25779).

Rodrigo also published the ghc-debugger package including an executable ghc-debug-adapter. This implements the Debug Adapter Protocol, enabling Haskell programs to be stepped-through and debugged from editors such as Visual Studio Code. ghc-debug-adapter depends on many recent changes to GHC, so it is compatible only with the upcoming GHC 9.14.

Expressions in SPECIALISE pragmas

Sam worked with Simon Peyton Jones to finalise MR !12319 “Expressions in SPECIALISE pragmas”. This change means that a SPECIALISE pragma is no longer required to be simply a type signature; it can be an arbitrary expression. For full details, see GHC proposal #493, but two particular idioms are worth noting. Firstly, the type at which to specialise can now be specified by a type application, e.g.

myFunction :: forall a. Num a => a -> Maybe a -> (a, a)
myFunction = ...
{-# SPECIALISE myFunction @Int #-}

This specialise pragma is much more concise than:

{-# SPECIALISE myFunction :: Int -> Maybe Int -> (Int, Int) #-}

and less prone to breakage when the type of myFunction changes.

Secondly, the syntax enables value specialisation, for example:

mainFunction :: Bool -> ...
mainFunction debug = if debug then ... else ...
{-# SPECIALISE mainFunction False #-}

This tells GHC to optimise the non-debug code path, without the debug logic potentially getting in the way.

Multiple Home Units support in GHCi

With GHC 9.14, GHCi is fully compatible with multiple home units, including all GHCi commands and the GHCi debugger, thanks to work by Hannes about which we recently published a blog post (!14231). Our new design generalises the architecture of GHCi so that multi-unit and single-unit sessions are handled in the same way. The uniform handling will make sure that multi-unit sessions work correctly as GHCi evolves.

GHC Releases

Frontend

  • Sam fixed a regression in the implementation of QuickLook in GHC 9.12 that would cause valid programs to be rejected (#26030, #25950, !14235).

  • Sam fixed a problem in which HasCallStack evidence was incorrectly cached in GHC, causing GHC to bogusly report identical call stacks (#25529, !14084).

  • Sam rectified several oversights in the initial implementation of the NamedDefaults language extension laid out in GHC proposal #409:

    • an issue with exporting named defaults (#25857, !14142),
    • lack of support for named default declarations for poly-kinded typeclasses such as Typeable (#25882, !14143),
    • an oversight in which NamedDefaults changed the behaviour of existing programs (#25775, !14075, ghc-proposals#694).
  • Sam fixed duplicate record fields sometimes being reported as unused when they are actually used (#24035, !14066).

  • Sam improved the error message emitted by GHC when one attempts to write a non-class at the head of a typeclass instance (#22688, !14105).

  • Sam fixed several issues with the renaming of export lists:

    • one issue involved the TypeData extension (#24027, !14119),
    • another was to do with bundled pattern synonyms (#25892, !14154).
  • Sam made “illegal term-level use” error messages more user friendly (#23982, !14122). That MR also improved the way GHC reports name qualification to the user, preferring to display the user-written qualification in error messages.

  • Sam fixed GHC creating unnecessary cycle-breaker variables, which could cause problems for type-checking plugins that weren’t expecting them (#25933, !14206).

  • Sam implemented the deprecation described in GHC proposal #448: the combination of ScopedTypeVariables and TypeApplications no longer enables the use of type applications in constructor patterns, requiring instead the TypeAbstractions extension (!13551).

  • Sam fixed an issue in which equal types compared non-equal under TypeRep-equality by implementing a suggestion by Krzysztof Gogolewski (#25998, !14281).

  • Sam improved the documentation surrounding defaulting in the user’s guide, providing a high-level overview of the different mechanisms in GHC for defaulting ambiguous type variables (#25807, !14057).

Backend

  • Ben and Sam investigated testsuite failures in the LLVM backend (#25769). They identified many different issues:

    • #25730 concerned incorrect type annotations in the generated LLVM, fixed in !13936.
    • #25770, #25773 were symptoms of a serious bug in the implementation of floating-point register padding (fixed in !14134),
    • !14129 fixed incorrect type annotations in the LLVM for atomic operations, adding new tests to Cmm Lint to avoid similar bugs in the future.
    • Most of the other bugs involved initializers/finalizers, which were due to incorrect linkage annotation for builtin arrays (fixed in !14157).
  • Rodrigo worked with Simon Peyton Jones to fix an issue in which the presence or absence of unrelated RULES could affect compilation, leading to non-deterministic compilation (#25170, !13884).

  • Andreas fixed a bug in which GHC would construct over-saturated constructor applications, which caused a panic when building the xmonad-contrib package (#23865, !14036).

  • Andreas made GHC constant-fold away invalid tagToEnum# calls to a particular error expression, which unlocks dead-code elimination opportunities and makes it easier to debug issues that arise from invalid use of tagToEnum# (#25976, !14254)

  • Andreas added -fhuge-code-sections, an off-by-default flag that provides a workaround for AArch64 users running into bug #24648.

  • Matthew overhauled the driver to bring one-shot compilation and make mode in line with each other, by consistently using the module graph to answer queries related to the module import structure (!14198, !14209). This was partly motivated by implementation requirements of the “Explicit Splice Imports” proposal, for which module graph queries are a central component.

  • Matthew added support for “fixed” nodes in the module graph, which can be used for modules without corresponding source-files that are e.g. generated via the GHC API (#25920, !14187).

  • Rodrigo moved some DynFlags consistency checks in order to consolidate the logic into the core makeDynFlagsConsistent function.

  • Ben changed how GHC prints Uniques to the user to avoid NULL characters (#25989, !14265).

Compiler performance

  • Matthew improved the performance of the bytecode assembler by ensuring the code is properly specialised (!13983).

  • Matthew made sure that forceModIface properly forced all fields of ModIface in order to avoid space leaks (!14078).

  • Matthew removed unused mi_used_th and mi_hpc fields from interfaces, which were needlessly bloating interface files (!14073).

  • Matthew avoided allocation of intermediate ByteStrings when serialising FastStrings (#25861, !14107).

Recompilation checking

  • Matthew overhauled the ModIface datatype, splitting it up in a more logical way which makes it easier to identify which parts contribute to recompilation checking (!14102). This allowed fixing several issues with recompilation checking in !14118, such as:

    • it ignored changes in exported named default declarations (#25855),
    • it did not take into account changes to COMPLETE pragmas (#25854).
  • Matthew added the -fwrite-if-self-recomp flag, which controls whether interface files include self-recompilation information; omitting it avoids writing recompilation information in cases where it is not a concern, such as producing binary distributions (#10424, #22188, !8604).

  • Matthew refactored the implementation of recompilation-checking to ensure that all flags that influence recompilations are correctly taken into account (#25837, !14085).

  • Sam improved recompilation checking for export lists in !14178 (#25881). In practice, this means that a module with an explicit import list will no longer always be recompiled when the export list of a module it depends on changes, as long as the explicitly imported items are preserved.

  • Matthew improved the output of -dump-hi-diff to properly display the precise change in flags which caused recompilation (#25571, !13792).

Runtime system

  • Ben fixed a bug in which the WinIO I/O manager was being inconsistently selected (#25838, !14088).

  • Ben diagnosed and fixed a linking issue affecting global offset table usage on macOS that manifested in incorrect runtime results when using the GHC API (#25577, !13991).

  • Ben fixed an issue in which GHC’s RTS linker was too eager to load shared objects which refer to undefined symbols (#25943, !14290).

  • Ben significantly improved the performance of the RTS linker, culminating in a reduction in GHCi startup time from 2.5s to 250ms on Windows (#26052, #26009, !14339).

GHCi & bytecode interpreter

  • Andreas fixed several endianness issues in the interpreter (#25791, !14172).

  • Matthew implemented a fix for the mishandling of stack underflow frames (#25750, !13957). A remaining issue was subsequently identified (#25865) and fixed by Andreas’ work on the interpreter (!13978).

  • Matthew ensured that all top-level functions are visible when loading a module in the interpreter, not only exported functions (!14032).

  • Matthew fixed a bug in the simplifier that caused Core Lint failures when compiling certain programs (#25790, !14019).

  • Matthew fixed a regression in the way that GHCi would import modules that involved Cabal mixins stanzas (#25951, !14222).

Libraries

  • Ben exposed the constructors and fields of the Backtrace datatype in base (#26049, !14351).

  • Ben brought base changelog entries up to date in !14320.

Build system & packaging

  • Sam fixed GHC not working properly if the installation path contains spaces on Windows (#25204, !14137).

  • Ben fixed a couple of issues relating to the llvm-as flag:

    • the value of the field was incorrectly set (#25856, !14104),
    • the information in the field was passed incorrectly to clang (#25793, !14025).

Testsuite

  • Andreas fixed a bug in which tests requiring the interpreter would be run even if the compiler didn’t support it (#25533, !14201).

  • Matthew fixed an issue with tests that used Template Haskell in the profiled dynamic way (#25947, !14215).

Cabal

  • Mikolaj prepared the 3.14.2.0 bugfix release to the Cabal package suite (including the Cabal library and cabal-install).

  • Matthew fixed all known regressions in the 3.14.1.0 release of cabal-install:

    • An issue to do with picking up unwanted environment files (#10759, #10828).
    • Duplication of environment variables (#10718, #10827).
    • Interaction of multi-repl with internal dependencies (#10775, #10841).
    • A working directory oversight (#10772, #10800).
    • The pkgname_datadir environment variable incorrectly using a relative path (#10717, #10830).
  • Matthew updated the outdated and gen-bounds commands to work with the v2- project infrastructure (#10878, #10840).

  • Matthew ensured that C++ environment variables are passed to configure scripts (#10797, #10844).

  • Matthew added a module name validity check to the cabal check command (#10295, #10816).

  • Matthew updated the Cabal CI to use GHC 9.12.2 and GHC 9.6.7 (#10893).

  • Matthew improved the testsuite output to make it more readable (#8419, #10837).

  • Matthew fixed an issue in which changes to the PATH environment variable would incorrectly not trigger recompilation (#2015, #10817).

HLS

  • Hannes prepared the HLS release 2.10.0.0 (#4448)

  • Zubin prepared the HLS release 2.11.0.0 (#4585)

  • Zubin added support for GHC 9.12.2 in HLS (#4527)

  • Zubin reworked the HLS release CI infrastructure (#4481)

Haskell.org infrastructure

Ben worked to refactor and migrate a variety of core haskell.org services from Equinix Metal to new infrastructure at OpenCape:

  • hoogle.haskell.org has been Nixified and now periodically reindexes automatically.

  • Haskell.org’s primary mail server, mail.haskell.org, has been Nixified and updated.

  • Haskell.org’s many mailing lists have been migrated to Mailman 3.

  • gitlab.haskell.org has been migrated to OpenCape and updated.

  • The Hackage documentation builder has been completely revamped with a more maintainable deployment strategy and a broader set of native packages available, enabling more Hackage packages to benefit from automatically-built documentation.

With these maintainability improvements we hope that haskell.org’s core infrastructure team can be more easily grown in the future.

by adam, andreask, ben, hannes, matthew, mikolaj, rodrigo, sam, zubin at June 20, 2025 12:00 AM

June 17, 2025

Magnus Therning

Why I'm writing a Redis client package

A couple of weeks ago I needed a small, hopefully temporary, service at work. It bridges a gap between the functionality provided by a legacy system and the functionality desired by a new system. The legacy system is cumbersome to work with, so we tend to prefer building anti-corruption layers rather than changing it directly, and sometimes we implement those layers as separate services.

This time it was good enough to run the service as a cronjob, but it did need to keep track of when it last ran. It felt silly to spin up a separate DB just to keep a timestamp, and using another service's DB is something I really dislike and avoid.1 So, I ended up using the Redis instance that's used as a cache by an OSS service we host.

The last time I had a look at the options for writing a Redis client in Haskell I found two candidates, hedis and redis-io. At the time I wrote a short note about them. This time around I found that not much has changed: they are still the only two contenders and they still suffer from the same issues

  • hedis still has the same API, and I still find it awkward.
  • redis-io still requires a logger.

I once again decided to use hedis and wrote the service for work in a couple of days, but this time I thought I'd see what it would take to remove the requirement on tinylog from redis-io. I spent a few evenings on it, though I spent most time on "modernising" the dev setup, using Nix to build, re-format using fourmolu, etc. I did the same for redis-resp, the main dependency of redis-io. The result of that can be found on my gitlab account:

At the moment I won't take that particular experiment any further and given that the most recent change to redis-io was in 2020 (according to its git repo) I don't think there's much interest upstream either.

Making the changes to redis-io and redis-resp made me a little curious about the Redis protocol so I started reading about it. It made me start thinking about implementing a client lib myself. How hard could it be?

I'd also asked a question about Redis client libs on r/haskell and a response led me to redis-schema. It has a very good README, and its section on transactions stood out with its observation that Redis transactions are a perfect match for Applicative. This pushed me even closer to writing a client lib. What pushed me over the edge was the realisation that pipelining is also a perfect match for Applicative.

For the last few weeks I've spent some of my free time reading and experimenting, and I'm enjoying it very much. We'll see where it leads, but hopefully I'll at least have a bit more to write about it.

Footnotes:

1

One definition of a microservice I find very useful is "a service that owns its own DB schema."

June 17, 2025 08:43 PM

June 16, 2025

Well-Typed.Com

Making GHCi compatible with multiple home units

GHC’s support for compiling multiple units in a single invocation is essential for tooling to work well with real-world Haskell projects. Loading your whole project into a single GHCi session allows you to get feedback quickly on changes to any part of your project, without having to restart the REPL. Until now, not all of GHCi worked with multiple home units, and this was a source of confusion for many users.

We’re now happy to announce that in 9.14.1, GHCi will fully support multiple home units. This post contains a brief overview of the changes.

Multiple Home Units

Work on multiple home units has been ongoing for a while. This is the latest chapter in our efforts to update the ecosystem to support this feature.

The main way to start a multi-unit GHCi session is by using cabal repl --enable-multi-repl with a selector that selects multiple components in the project, such as all:

> cabal repl --enable-multi-repl all

This will start a GHCi session with a home unit for each selected component. Until now, support in the REPL was essentially limited to reloading modules to get feedback about changes. Almost all other commands were unsupported when using multiple home units.

GHCi Supports Multiple Home Units

Following our changes, GHCi now fully supports multiple home units in its REPL. The experience of a user is now the same whether they are using a single home unit or multiple home units. In particular, the following features have been fixed or enabled:

  • Usual REPL usage such as evaluating expressions
  • All GHCi commands
    • :seti/:set
    • :browse
    • :module [+/-] [*]Mod1 ...
    • … and many more!
  • The GHCi debugger
    • :break, :steplocal, :continue, etc…

Implementing Multi Unit Support in GHCi

To fully support multiple home units, GHCi needed a new internal model of how different contexts interact during a session. There are three key contexts:

  • the prompt (the context in which expressions are evaluated),
  • the script context (in which scripts loaded by :load are executed), and
  • the unit context (the home units specified on the command line, e.g. the components of the Cabal packages being loaded).

Distinguishing these three different contexts is the key to our design. Before, each GHCi session only had a single home unit, and so commands would always be interpreted relative to that unit. In a multi-unit session, one of the units was chosen as the “active” unit, and commands would be interpreted relative to that unit. Now since it is possible to talk precisely about the different contexts, the dependencies between them and where commands should be interpreted, we can properly implement all GHCi commands.

Virtual home units

Our design adds virtual home units for the prompt and script contexts. Therefore, every GHCi session is a multi-unit session, and all commands are modified to support this.

This virtual home unit for the prompt is called interactive-ghci. All user input is interpreted in the context of interactive-ghci (it is the “active” unit). Since it always depends on all user-given home units (i.e. those given on the command line), we can import modules, run code, and execute GHCi commands as usual.

The virtual home unit for scripts is called interactive-session. It is similar in structure to interactive-ghci, namely that it depends on all user-given home units. This allows scripts to use packages from the current GHCi REPL session. Additionally, interactive-ghci depends on interactive-session, allowing the user to load and execute the script modules from the prompt.

Why do we need two virtual home units? When a script is loaded via :load Mod.hs, this Mod.hs needs to be interpreted relative to some home unit. We do not want to guess which home unit Mod.hs should be added to, since the behaviour is hard to predict in a multiple home unit session. However, we also can’t add Mod.hs to the interactive-ghci home unit, as we want to be able to maintain a different set of GHC options for the prompt (i.e. interactive-ghci) and scripts.

Adding these two virtual home units to the GHCi REPL session yields the following Home Unit Graph. We mark interactive-ghci to indicate that it is the “active” context of the GHCi prompt.

GHCi’s Home Unit Graph, showing two virtual units interactive-ghci and interactive-session, where the former depends on the latter. Both of these depend on any number of user-given home units, indicated by the names pkg1 … pkgN.

Examples

Now that we know how the GHCi session will work, let’s show a couple of concrete examples.

We assume a regular cabal project, initialised via the command:

> mkdir mhu-example && cd mhu-example
> cabal init -n --tests --libandexe

This creates a cabal project with three components:

  • lib:mhu-example: The main library.
  • exe:mhu-example: An executable.
  • test:mhu-example-test: A test-suite.

From the perspective of GHC, a unit is essentially identical to a single component (with some hand-waving).

Example of a cabal project with multiple components. GHC treats each component as a separate unit.

When we load only the library into a GHCi session, then the library is the single user-specified home unit in the GHCi session. For example, the cabal invocation

cabal repl lib:mhu-example

invokes the following GHC command:

ghc --interactive -this-unit-id lib-mhu-example -package base -package containers ...

This creates a home unit graph with three home units: interactive-ghci, interactive-session and lib:mhu-example.

Home Unit Graph with a single user-specified Home Unit. There are three units, interactive-ghci, interactive-session and lib:mhu-example. interactive-ghci depends on interactive-session and lib:mhu-example, while interactive-session depends on lib:mhu-example.

In the case of more than one user-specified home unit, the graph is extended in an intuitive way. For example, the cabal invocation

cabal repl --enable-multi-repl lib:mhu-example exe:mhu-example test:mhu-example-test

will result in the following GHC invocation:1

ghc --interactive -unit @lib-mhu-example -unit @exe-mhu-example -unit @test-mhu-example-test

GHCi internally structures this as the following:

Home Unit Graph with multiple user-specified home units. There are five units, called interactive-ghci, interactive-session, lib:mhu-example, exe:mhu-example and test:mhu-example-test.

Naturally, home units can have dependencies on other home units, e.g. test:mhu-example-test and exe:mhu-example both depend on lib:mhu-example.

Setting REPL Options

The GHCi commands :set and :seti are used to change the GHC options of the home units and the GHC options for the prompt, respectively. In the new architecture, the :set command applies the new options to all home units except interactive-ghci. :seti, on the other hand, applies changes only to the interactive-ghci home unit.
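
For example (illustrative only), :seti -XOverloadedStrings changes the options used at the prompt (the interactive-ghci unit), while :set -Wall applies to every other home unit:

ghci> :seti -XOverloadedStrings
ghci> :set -Wall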

In the future, we may want to extend the capabilities of the :set command to change the GHC options only for certain home units.

Summary

GHCi is now fully compatible with multiple home units, including all GHCi commands and the GHCi debugger. Our new design generalises the architecture of GHCi so that multi-unit and single-unit sessions are handled in the same way. The uniform handling will make sure that multi-unit sessions work correctly as GHCi evolves.

This work has been performed in collaboration with Mercury, who have a long-term commitment to the scalability and robustness of the Haskell ecosystem. Well-Typed are always interested in projects and looking for funding to improve GHC and other Haskell tools. Please contact info@well-typed.com if we might be able to work with you!


  1. The unit arguments are passed using response files. The file exe-mhu-example contains the arguments for the exe:mhu-example home unit, and similarly for the other files.↩︎

by hannes, matthew at June 16, 2025 12:00 AM

Brent Yorgey

Monads are not like burritos

Posted on June 16, 2025

In January 2009, while just a baby first-year PhD student, I wrote a blog post titled Abstraction, intuition, and the “monad tutorial fallacy”. In it, I made the argument that humans tend to learn best by first grappling with concrete examples, and only later proceeding to higher-level intuition and analogies; hence, it’s a mistake to think that clearly presenting your intuition for a topic will help other people understand it. Analogies and intuition can help, but only when accompanied by concrete examples and active engagement. To illustrate the point, I made up a fictitious programmer with a fictitious analogy.

But now Joe goes and writes a monad tutorial called “Monads are Burritos,” under the well-intentioned but mistaken assumption that if other people read his magical insight, learning about monads will be a snap for them. “Monads are easy,” Joe writes. “Think of them as burritos.” Joe hides all the actual details about types and such because those are scary, and people will learn better if they can avoid all that difficult and confusing stuff. Of course, exactly the opposite is true, and all Joe has done is make it harder for people to learn about monads…

My intention was to choose a fictitious analogy which was obviously ridiculous and silly, as a parody of many of the monad tutorials which existed at the time (and still do). Mark Jason Dominus then wrote a blog post, Monads are like burritos, pointing out that actually, monads are kinda like burritos. It’s really funny, though I don’t think it’s actually a very good analogy, and my guess is that Mark would agree: it was clearly written as a silly joke and not as a real way to explain monads.

In any case, from that point the “monads are burritos” meme took on a life of its own. For example:

I even joined in the fun and made this meme image about bad monad tutorials:

Of course there are lots of people who still understand that it was all just a silly joke. Recently, however, I’ve seen several instances where people apparently believe “monads are burritos” is a real, helpful thing and not just a joke meme. For example, see this thread on lobste.rs, or this Mastodon post.

So, to set the record straight: "monads are burritos" is not a helpful analogy! (Yes, I am writing a blog post because People Are Wrong On The Internet, and I know it probably won't make any difference, but here we are.)

Why not, you ask? To expand on my reasons from a 10-year-old Reddit comment:

  • The burrito analogy strongly implies that a value of type m a somehow “contains” a value (or values) of type a. But that is not true for all monads (e.g. there is no sense in which a value of type IO String contains a String).
  • Relatedly, the analogy also implies that a value of type m a can be “unwrapped” to get an a, but this is impossible for many monads.
  • It is not actually very easy to take a burrito containing a burrito and merge it into a single-level burrito. At least this is not in any sense a natural operation on burritos. Perhaps you could argue that it is always easy to remove outer tortilla layers (but not the innermost one since the food will all fall out), but this is a bad analogy, since in general join does not just “remove” an outer layer, but somehow merges the effects of two layers into one.
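
For instance, a minimal illustration of that last point about join (my example, not from the original post):

import Control.Monad (join)

-- join does not just peel off a wrapper: it merges the effects of two layers into one.
twoLayers :: IO (IO String)
twoLayers = putStrLn "outer effect" >> pure (putStrLn "inner effect" >> pure "result")

merged :: IO String
merged = join twoLayers   -- running merged performs both effects, in order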

Actually, burritos are a great analogy for the Identity monad! …but not much beyond that.

On a more positive note, my sense is that the average pedagogical quality of Haskell materials, and monad tutorials in particular, has indeed gone up significantly since 2009. I’d love to think this can be at least partially attributed to my original blog post, though of course it’s impossible to know that for sure.


by Brent Yorgey at June 16, 2025 12:00 AM

June 15, 2025

Chris Reade

PenroseKiteDart User Guide

Introduction

(Updated June 2025 for PenroseKiteDart version 1.4)

PenroseKiteDart is a Haskell package with tools to experiment with finite tilings of Penrose’s Kites and Darts. It uses the Haskell Diagrams package for drawing tilings. As well as providing drawing tools, this package introduces tile graphs (Tgraphs) for describing finite tilings. (I would like to thank Stephen Huggett for suggesting planar graphs as a way to represent the tilings).

This document summarises the design and use of the PenroseKiteDart package.

PenroseKiteDart package is now available on Hackage.

The source files are available on GitHub at https://github.com/chrisreade/PenroseKiteDart.

There is a small art gallery of examples created with PenroseKiteDart here.

Index

  1. About Penrose’s Kites and Darts
  2. Using the PenroseKiteDart Package (initial set up).
  3. Overview of Types and Operations
  4. Drawing in more detail
  5. Forcing in more detail
  6. Advanced Operations
  7. Other Reading

1. About Penrose’s Kites and Darts

The Tiles

In figure 1 we show a dart and a kite. All angles are multiples of 36° (a tenth of a full turn). If the shorter edges are of length 1, then the longer edges are of length φ, where φ = (1 + √5)/2 is the golden ratio.

Figure 1: The Dart and Kite Tiles

Aperiodic Infinite Tilings

What is interesting about these tiles is:

It is possible to tile the entire plane with kites and darts in an aperiodic way.

Such a tiling is non-periodic and does not contain arbitrarily large periodic regions or patches.

The possibility of aperiodic tilings with kites and darts was discovered by Sir Roger Penrose in 1974. There are other shapes with this property, including a chiral aperiodic monotile discovered in 2023 by Smith, Myers, Kaplan, Goodman-Strauss. (See the Penrose Tiling Wikipedia page for the history of aperiodic tilings)

This package is entirely concerned with Penrose’s kite and dart tilings also known as P2 tilings.

In figure 2 we add a temporary green line marking purely to illustrate a rule for making legal tilings. The purpose of the rule is to exclude the possibility of periodic tilings.

If all tiles are marked as shown, then whenever tiles come together at a point, they must all be marked or must all be unmarked at that meeting point. So, for example, each long edge of a kite can be placed legally on only one of the two long edges of a dart. The kite wing vertex (which is marked) has to go next to the dart tip vertex (which is marked) and cannot go next to the dart wing vertex (which is unmarked) for a legal tiling.

Figure 2: Marked Dart and Kite

Correct Tilings

Unfortunately, having a finite legal tiling is not enough to guarantee you can continue the tiling without getting stuck. Finite legal tilings which can be continued to cover the entire plane are called correct and the others (which are doomed to get stuck) are called incorrect. This means that decomposition and forcing (described later) become important tools for constructing correct finite tilings.

2. Using the PenroseKiteDart Package

You will need the Haskell Diagrams package (See Haskell Diagrams) as well as this package (PenroseKiteDart). When these are installed, you can produce diagrams with a Main.hs module. This should import a chosen backend for diagrams such as the default (SVG) along with Diagrams.Prelude.

    module Main (main) where
    
    import Diagrams.Backend.SVG.CmdLine
    import Diagrams.Prelude

For Penrose’s Kite and Dart tilings, you also need to import the PKD module and (optionally) the TgraphExamples module.

    import PKD
    import TgraphExamples

Then to output the someExample figure

    fig::Diagram B
    fig = someExample

    main :: IO ()
    main = mainWith fig

Note that the token B is used in the diagrams package to represent the chosen backend for output. So a diagram has type Diagram B. In this case B is bound to SVG by the import of the SVG backend. When the compiled module is executed it will generate an SVG file. (See Haskell Diagrams for more details on producing diagrams and using alternative backends).

3. Overview of Types and Operations

Half-Tiles

In order to implement operations on tilings (decompose in particular), we work with half-tiles. These are illustrated in figure 3 and labelled RD (right dart), LD (left dart), LK (left kite), RK (right kite). The join edges where left and right halves come together are shown with dotted lines, leaving one short edge and one long edge on each half-tile (excluding the join edge). We have shown a red dot at the vertex we regard as the origin of each half-tile (the tip of a half-dart and the base of a half-kite).

Figure 3: Half-Tile pieces showing join edges (dashed) and origin vertices (red dots)

The labels are actually data constructors introduced with type operator HalfTile which has an argument type (rep) to allow for more than one representation of the half-tiles.

    data HalfTile rep 
      = LD rep -- Left Dart
      | RD rep -- Right Dart
      | LK rep -- Left Kite
      | RK rep -- Right Kite
      deriving (Show,Eq)

Tgraphs

We introduce tile graphs (Tgraphs) which provide a simple planar graph representation for finite patches of tiles. For Tgraphs we first specialise HalfTile with a triple of vertices (positive integers) to make a TileFace such as RD(1,2,3), where the vertices go clockwise round the half-tile triangle starting with the origin.

    type TileFace  = HalfTile (Vertex,Vertex,Vertex)
    type Vertex    = Int  -- must be positive

The function

    makeTgraph :: [TileFace] -> Tgraph

then constructs a Tgraph from a TileFace list after checking the TileFaces satisfy certain properties (described below). We also have

    faces :: Tgraph -> [TileFace]

to retrieve the TileFace list from a Tgraph.

As an example, the fool (short for fool’s kite and also called an ace in the literature) consists of two kites and a dart (= 4 half-kites and 2 half-darts):

    fool :: Tgraph
    fool = makeTgraph [RD (1,2,3), LD (1,3,4)   -- right and left dart
                      ,LK (5,3,2), RK (5,2,7)   -- left and right kite
                      ,RK (5,4,3), LK (5,6,4)   -- right and left kite
                      ]

To produce a diagram, we simply draw the Tgraph

    foolFigure :: Diagram B
    foolFigure = draw fool

which will produce the diagram on the left in figure 4.

Alternatively,

    foolFigure :: Diagram B
    foolFigure = labelled drawj fool

will produce the diagram on the right in figure 4 (showing vertex labels and dashed join edges).

Figure 4: Diagram of fool without labels and join edges (left), and with (right)

When any (non-empty) Tgraph is drawn, a default orientation and scale are chosen based on the lowest numbered join edge. This is aligned on the positive x-axis with length 1 (for darts) or length φ (for kites).

Tgraph Properties

Tgraphs are actually implemented as

    newtype Tgraph = Tgraph [TileFace]
                     deriving (Show)

but the data constructor Tgraph is not exported to avoid accidentally by-passing checks for the required properties. The properties checked by makeTgraph ensure the Tgraph represents a legal tiling as a planar graph with positive vertex numbers, and that the collection of half-tile faces are both connected and have no crossing boundaries (see note below). Finally, there is a check to ensure two or more distinct vertex numbers are not used to represent the same vertex of the graph (a touching vertex check). An error is raised if there is a problem.

Note: If the TileFaces are faces of a planar graph, there will also be exterior (untiled) regions, and in graph theory these would also be called faces of the graph. To avoid confusion, we will refer to these only as exterior regions, and unless otherwise stated, face will mean a TileFace. We can then define the boundary of a list of TileFaces as the edges of the exterior regions. There is a crossing boundary if the boundary crosses itself at a vertex. We exclude crossing boundaries from Tgraphs because they prevent us from calculating relative positions of tiles locally and create touching vertex problems.

For convenience, in addition to makeTgraph, we also have

    makeUncheckedTgraph :: [TileFace] -> Tgraph
    checkedTgraph   :: [TileFace] -> Tgraph

The first of these (performing no checks) is useful when you know the required properties hold. The second performs the same checks as makeTgraph except that it omits the touching vertex check. This could be used, for example, when making a Tgraph from a sub-collection of TileFaces of another Tgraph.

Main Tiling Operations

There are three key operations on finite tilings, namely

    decompose :: Tgraph -> Tgraph
    force     :: Tgraph -> Tgraph
    compose   :: Tgraph -> Tgraph

Decompose

Decomposition (also called deflation) works by splitting each half-tile into either 2 or 3 new (smaller scale) half-tiles, to produce a new tiling. The fact that this is possible is used to establish the existence of infinite aperiodic tilings with kites and darts. Since our Tgraphs have abstracted away from scale, the result of decomposing a Tgraph is just another Tgraph. However, if we wish to compare before and after with a drawing, the latter should be scaled by a factor 1/φ = φ − 1 times the scale of the former, to reflect the change in scale.

Figure 5: fool (left) and decompose fool (right)

We can, of course, iterate decompose to produce an infinite list of finer and finer decompositions of a Tgraph

    decompositions :: Tgraph -> [Tgraph]
    decompositions = iterate decompose
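
For example, a comparison figure like figure 5 could be produced along these lines (a sketch, with phi defined locally; hsep comes from Diagrams.Prelude):

    foolDecompFig :: Diagram B
    foolDecompFig = hsep 1 [draw fool, scale (phi-1) (draw (decompose fool))]
      where
        phi = (1 + sqrt 5) / 2  -- scale factor phi - 1 = 1/phi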

Force

Force works by adding any TileFaces on the boundary edges of a Tgraph which are forced. That is, where there is only one legal choice of TileFace addition consistent with the seven possible vertex types. Such additions are continued until either (i) there are no more forced cases, in which case a final (forced) Tgraph is returned, or (ii) the process finds the tiling is stuck, in which case an error is raised indicating an incorrect tiling. [In the latter case, the argument to force must have been an incorrect tiling, because the forced additions cannot produce an incorrect tiling starting from a correct tiling.]

An example is shown in figure 6. When forced, the Tgraph on the left produces the result on the right. The original is highlighted in red in the result to show what has been added.

Figure 6: A Tgraph (left) and its forced result (right) with the original shown red
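
In code, drawing the forced version of a (correct) Tgraph is direct, for example (a small sketch):

    forcedFig :: Diagram B
    forcedFig = draw (force (decompose fool))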

Compose

Composition (also called inflation) is an opposite to decompose but this has complications for finite tilings, so it is not simply an inverse. (See Graphs, Kites and Darts and Theorems for more discussion of the problems). Figure 7 shows a Tgraph (left) with the result of composing (right) where we have also shown (in pale green) the faces of the original that are not included in the composition – the remainder faces.

Figure 7: A Tgraph (left) and its (part) composed result (right) with the remainder faces shown pale green

Under some circumstances composing can fail to produce a Tgraph because there are crossing boundaries in the resulting TileFaces. However, we have established that

  • If g is a forced Tgraph, then compose g is defined and it is also a forced Tgraph.

Try Results

It is convenient to use types of the form Try a for results where we know there can be a failure. For example, compose can fail if the result does not pass the connected and no crossing boundary check, and force can fail if its argument is an incorrect Tgraph. In situations when you would like to continue some computation rather than raise an error when there is a failure, use a try version of a function.

    tryCompose :: Tgraph -> Try Tgraph
    tryForce   :: Tgraph -> Try Tgraph

We define Try as a synonym for Either ShowS (which is a monad) in module Tgraph.Try.

type Try a = Either ShowS a

(Note ShowS is String -> String). Successful results have the form Right r (for some correct result r) and failure results have the form Left (s<>) (where s is a String describing the problem as a failure report).

The function

    runTry:: Try a -> a
    runTry = either error id

will retrieve a correct result but raise an error for failure cases. This means we can always derive an error raising version from a try version of a function by composing with runTry.

    force = runTry . tryForce
    compose = runTry . tryCompose
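
For example, a minimal sketch of continuing with a fallback rather than raising an error:

    -- keep the original Tgraph if forcing fails, instead of raising an error
    forceOrKeep :: Tgraph -> Tgraph
    forceOrKeep g = either (const g) id (tryForce g)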

Elementary Tgraph and TileFace Operations

The module Tgraph.Prelude defines elementary operations on Tgraphs relating vertices, directed edges, and faces. We describe a few of them here.

When we need to refer to particular vertices of a TileFace we use

    originV :: TileFace -> Vertex -- the first vertex - red dot in figure 2
    oppV    :: TileFace -> Vertex -- the vertex at the opposite end of the join edge from the origin
    wingV   :: TileFace -> Vertex -- the vertex not on the join edge

A directed edge is represented as a pair of vertices.

    type Dedge = (Vertex,Vertex)

So (a,b) is regarded as a directed edge from a to b.

When we need to refer to particular edges of a TileFace we use

    joinE  :: TileFace -> Dedge  -- shown dotted in figure 2
    shortE :: TileFace -> Dedge  -- the non-join short edge
    longE  :: TileFace -> Dedge  -- the non-join long edge

which are all directed clockwise round the TileFace. In contrast, joinOfTile is always directed away from the origin vertex, so is not clockwise for right darts or for left kites:

    joinOfTile:: TileFace -> Dedge
    joinOfTile face = (originV face, oppV face)

In the special case that a list of directed edges is symmetrically closed [(b,a) is in the list whenever (a,b) is in the list] we can think of this as an edge list rather than just a directed edge list.
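
As a concrete reading of this definition, such a check could be written as follows (a small sketch; isEdgeList is an illustrative name, not a library function):

    isEdgeList :: [Dedge] -> Bool
    isEdgeList des = all (\(a,b) -> (b,a) `elem` des) des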

For example,

    internalEdges :: Tgraph -> [Dedge]

produces an edge list, whereas

    boundary :: Tgraph -> [Dedge]

produces single directions. Each directed edge in the resulting boundary will have a TileFace on the left and an exterior region on the right. The function

    dedges :: Tgraph -> [Dedge]

produces all the directed edges obtained by going clockwise round each TileFace so not every edge in the list has an inverse in the list.

Note: There is now a class HasFaces (introduced in version 1.4) which includes instances for both Tgraph and [TileFace] and others. This allows some generalisations. In particular the more general types of the above three functions are now

    internalEdges :: HasFaces a => a -> [Dedge]
    boundary      :: HasFaces a => a -> [Dedge] 
    dedges        :: HasFaces a => a -> [Dedge]   

Patches (Scaled and Positioned Tilings)

Behind the scenes, when a Tgraph is drawn, each TileFace is converted to a Piece. A Piece is another specialisation of HalfTile using a two dimensional vector to indicate the length and direction of the join edge of the half-tile (from the originV to the oppV), thus fixing its scale and orientation. The whole Tgraph then becomes a list of located Pieces called a Patch.

    type Piece = HalfTile (V2 Double)
    type Patch = [Located Piece]

Piece drawing functions derive vectors for other edges of a half-tile piece from its join edge vector. In particular (in the TileLib module) we have

    drawPiece :: Piece -> Diagram B
    dashjPiece :: Piece -> Diagram B
    fillPieceDK :: Colour Double -> Colour Double -> Piece -> Diagram B

where the first draws the non-join edges of a Piece, the second does the same but adds a dashed line for the join edge, and the third takes two colours (one for darts and one for kites) which are used to fill the piece in addition to drawing its non-join edges as drawPiece does.

Patch is an instance of class Transformable, so a Patch can be scaled, rotated, and translated.
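
For instance (the scale factor and angle here are chosen arbitrarily):

    shrinkAndTurn :: Patch -> Patch
    shrinkAndTurn = rotate (36 @@ deg) . scale 0.5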

Vertex Patches

It is useful to have an intermediate form between Tgraphs and Patches, that contains information about both the location of vertices (as 2D points), and the abstract TileFaces. This allows us to introduce labelled drawing functions (to show the vertex labels) which we then extend to Tgraphs. We call the intermediate form a VPatch (short for Vertex Patch).

    type VertexLocMap = IntMap.IntMap (Point V2 Double)
    data VPatch = VPatch {vLocs :: VertexLocMap,  vpFaces::[TileFace]} deriving Show

and

    makeVP :: Tgraph -> VPatch

calculates vertex locations using a default orientation and scale.

VPatch is made an instance of class Transformable so a VPatch can also be scaled and rotated.

One essential use of this intermediate form is to be able to draw a Tgraph with labels, rotated but without the labels themselves being rotated. We can simply convert the Tgraph to a VPatch, and rotate that before drawing with labels.

    labelled draw (rotate someAngle (makeVP g))

We can also align a VPatch using vertex labels.

    alignXaxis :: (Vertex, Vertex) -> VPatch -> VPatch 

So if g is a Tgraph with vertex labels a and b we can align it on the x-axis with a at the origin and b on the positive x-axis (after converting to a VPatch), instead of accepting the default orientation.

    labelled draw (alignXaxis (a,b) (makeVP g))

Another use of VPatches is to share the vertex location map when drawing only subsets of the faces (see Overlaid examples in the next section).

4. Drawing in More Detail

Class Drawable

There is a class Drawable with instances Tgraph, VPatch, and Patch. When the token B is in scope, standing for a fixed backend, we can assume

    draw   :: Drawable a => a -> Diagram B  -- draws non-join edges
    drawj  :: Drawable a => a -> Diagram B  -- as with draw but also draws dashed join edges
    fillDK :: Drawable a => Colour Double -> Colour Double -> a -> Diagram B -- fills with colours

where fillDK clr1 clr2 will fill darts with colour clr1 and kites with colour clr2 as well as drawing non-join edges.

These are the main drawing tools. However they are actually defined for any suitable backend b so have more general types.

(Update Sept 2024) As of version 1.1 of PenroseKiteDart, these will be

    draw ::   (Drawable a, OKBackend b) =>
              a -> Diagram b
    drawj ::  (Drawable a, OKBackend b) =>
              a -> Diagram b
    fillDK :: (Drawable a, OKBackend b) =>
              Colour Double -> Colour Double -> a -> Diagram b

where the class OKBackend is a check to ensure a backend is suitable for drawing 2D tilings with or without labels.

In these notes we will generally use the simpler description of types using B for a fixed chosen backend for the sake of clarity.

The drawing tools are each defined via the class function drawWith using Piece drawing functions.

    class Drawable a where
        drawWith :: (Piece -> Diagram B) -> a -> Diagram B
    
    draw = drawWith drawPiece
    drawj = drawWith dashjPiece
    fillDK clr1 clr2 = drawWith (fillPieceDK clr1 clr2)

To design a new drawing function, you only need to implement a function to draw a Piece (let us call it newPieceDraw):

    newPieceDraw :: Piece -> Diagram B

This can then be elevated to draw any Drawable (including Tgraphs, VPatches, and Patches) by applying the Drawable class function drawWith:

    newDraw :: Drawable a => a -> Diagram B
    newDraw = drawWith newPieceDraw

Class DrawableLabelled

Class DrawableLabelled is defined with instances Tgraph and VPatch, but Patch is not an instance (because this does not retain vertex label information).

    class DrawableLabelled a where
        labelColourSize :: Colour Double -> Measure Double -> (Patch -> Diagram B) -> a -> Diagram B

So labelColourSize c m modifies a Patch drawing function to add labels (of colour c and size measure m). Measure is defined in Diagrams.Prelude with pre-defined measures tiny, verySmall, small, normal, large, veryLarge, huge. For most of our diagrams of Tgraphs, we use red labels and we also find small is a good default size choice, so we define

    labelSize :: DrawableLabelled a => Measure Double -> (Patch -> Diagram B) -> a -> Diagram B
    labelSize = labelColourSize red

    labelled :: DrawableLabelled a => (Patch -> Diagram B) -> a -> Diagram B
    labelled = labelSize small

and then labelled draw, labelled drawj, labelled (fillDK clr1 clr2) can all be used on both Tgraphs and VPatches as well as (for example) labelSize tiny draw, or labelColourSize blue normal drawj.

Further drawing functions

There are a few extra drawing functions built on top of the above ones. The function smart is a modifier to add dashed join edges only when they occur on the boundary of a Tgraph

    smart :: (VPatch -> Diagram B) -> Tgraph -> Diagram B

So smart vpdraw g will draw dashed join edges on the boundary of g before applying the drawing function vpdraw to the VPatch for g. For example the following all draw dashed join edges only on the boundary for a Tgraph g

    smart draw g
    smart (labelled draw) g
    smart (labelSize normal draw) g

When using labels, the function rotateBefore allows a Tgraph to be drawn rotated without rotating the labels.

    rotateBefore :: (VPatch -> a) -> Angle Double -> Tgraph -> a
    rotateBefore vpdraw angle = vpdraw . rotate angle . makeVP

So for example,

    rotateBefore (labelled draw) (90@@deg) g

makes sense for a Tgraph g. Of course if there are no labels we can simply use

    rotate (90@@deg) (draw g)

Similarly alignBefore allows a Tgraph to be aligned on the X-axis using a pair of vertex numbers before drawing.

    alignBefore :: (VPatch -> a) -> (Vertex,Vertex) -> Tgraph -> a
    alignBefore vpdraw (a,b) = vpdraw . alignXaxis (a,b) . makeVP

So, for example, if Tgraph g has vertices a and b, both

    alignBefore draw (a,b) g
    alignBefore (labelled draw) (a,b) g

make sense. Note that the following examples are wrong. Even though they type check, they re-orient g without repositioning the boundary joins.

    smart (labelled draw . rotate angle) g      -- WRONG
    smart (labelled draw . alignXaxis (a,b)) g  -- WRONG

Instead use

    smartRotateBefore (labelled draw) angle g
    smartAlignBefore (labelled draw) (a,b) g

where

    smartRotateBefore :: (VPatch -> Diagram B) -> Angle Double -> Tgraph -> Diagram B
    smartAlignBefore  :: (VPatch -> Diagram B) -> (Vertex,Vertex) -> Tgraph -> Diagram B

are defined using

    restrictSmart :: Tgraph -> (VPatch -> Diagram B) -> VPatch -> Diagram B

Here, restrictSmart g vpdraw vp uses the given vp for drawing boundary joins and drawing faces of g (with vpdraw) rather than converting g to a new VPatch. This assumes vp has locations for vertices in g.

Overlaid examples (location map sharing)

The function

    drawForce :: Tgraph -> Diagram B

will (smart) draw a Tgraph g in red overlaid (using <>) on the result of force g as in figure 6. Similarly

    drawPCompose  :: Tgraph -> Diagram B

applied to a Tgraph g will draw the result of a partial composition of g as in figure 7. That is, a drawing of compose g overlaid with a drawing of the remainder faces of g shown in pale green.

Both these functions make use of sharing a vertex location map to get correct alignments of overlaid diagrams. In the case of drawForce g, we know that a VPatch for force g will contain all the vertex locations for g since force only adds to a Tgraph (when it succeeds). So when constructing the diagram for g we can use the VPatch created for force g instead of starting afresh. Similarly, for drawPCompose g, the VPatch for g contains locations for all the vertices of compose g, so compose g is drawn using the VPatch for g instead of starting afresh.

The location map sharing is done with

    subVP :: VPatch -> [TileFace] -> VPatch

so that subVP vp fcs is a VPatch with the same vertex locations as vp, but replacing the faces of vp with fcs. [Of course, this can go wrong if the new faces have vertices not in the domain of the vertex location map so this needs to be used with care. Any errors would only be discovered when a diagram is created.]
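
To give a flavour of how this is used, a drawForce-like overlay could be assembled along these lines (a sketch only, not the library's actual definition; it assumes a function faces :: Tgraph -> [TileFace] and uses lc red from Diagrams to colour the original):

    drawForceSketch :: Tgraph -> Diagram B
    drawForceSketch g = (lc red . draw) (subVP vp (faces g)) <> draw vp
      where
        vp = makeVP (force g)  -- contains locations for all vertices of g as well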

For cases where labels are only going to be drawn for certain faces, we need a version of subVP which also gets rid of vertex locations that are not relevant to the faces. For this situation we have

    restrictVP:: VPatch -> [TileFace] -> VPatch

which filters out un-needed vertex locations from the vertex location map. Unlike subVP, restrictVP checks for missing vertex locations, so restrictVP vp fcs raises an error if a vertex in fcs is missing from the keys of the vertex location map of vp.

5. Forcing in More Detail

The force rules

The rules used by our force algorithm are local and derived from the fact that there are seven possible vertex types as depicted in figure 8.

Figure 8: Seven vertex types

Our rules are shown in figure 9 (omitting mirror symmetric versions). In each case the TileFace shown yellow needs to be added in the presence of the other TileFaces shown.

Figure 9: Rules for forcing

Main Forcing Operations

To make forcing efficient we convert a Tgraph to a BoundaryState to keep track of boundary information of the Tgraph, and then calculate a ForceState which combines the BoundaryState with a record of awaiting boundary edge updates (an update map). Then each face addition is carried out on a ForceState, converting back when all the face additions are complete. It makes sense to apply force (and related functions) to a Tgraph, a BoundaryState, or a ForceState, so we define a class Forcible with instances Tgraph, BoundaryState, and ForceState.

This allows us to define

    force :: Forcible a => a -> a
    tryForce :: Forcible a => a -> Try a

The first will raise an error if a stuck tiling is encountered. The second uses a Try result which produces a Left failure report on failure and a Right a for a successful result a.

There are several other operations related to forcing including

    stepForce :: Forcible a => Int -> a -> a
    tryStepForce  :: Forcible a => Int -> a -> Try a

    addHalfDart, addHalfKite :: Forcible a => Dedge -> a -> a
    tryAddHalfDart, tryAddHalfKite :: Forcible a => Dedge -> a -> Try a

The first two force (up to) a given number of steps (=face additions) and the other four add a half dart/kite on a given boundary edge.
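
For instance (illustrative only; d is assumed to be a boundary edge of g, and growFrom is not a library function):

    growFrom :: Dedge -> Tgraph -> Tgraph
    growFrom d g = stepForce 5 (addHalfKite d g)  -- add a half kite on d, then force 5 steps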

Update Generators

An update generator is used to calculate which boundary edges can have a certain update. There is an update generator for each force rule, but also a combined (all update) generator. The force operations mentioned above all use the default all update generator (defaultAllUGen) but there are more general (with) versions that can be passed an update generator of choice. For example

    forceWith :: Forcible a => UpdateGenerator -> a -> a
    tryForceWith :: Forcible a => UpdateGenerator -> a -> Try a

In fact we defined

    force = forceWith defaultAllUGen
    tryForce = tryForceWith defaultAllUGen

We can also define

    wholeTiles :: Forcible a => a -> a
    wholeTiles = forceWith wholeTileUpdates

where wholeTileUpdates is an update generator that just finds boundary join edges to complete whole tiles.

In addition to defaultAllUGen there is also allUGenerator which does the same thing apart from how failures are reported. The reason for keeping both is that they were constructed differently and so are useful for testing.

In fact UpdateGenerators are functions that take a BoundaryState and a focus (list of boundary directed edges) to produce an update map. Each Update is calculated as either a SafeUpdate (where two of the new face edges are on the existing boundary and no new vertex is needed) or an UnsafeUpdate (where only one edge of the new face is on the boundary and a new vertex needs to be created for a new face).

    type UpdateGenerator = BoundaryState -> [Dedge] -> Try UpdateMap
    type UpdateMap = Map.Map Dedge Update
    data Update = SafeUpdate TileFace 
                | UnsafeUpdate (Vertex -> TileFace)

Completing (executing) an UnsafeUpdate requires a touching vertex check to ensure that the new vertex does not clash with an existing boundary vertex. Using an existing (touching) vertex would create a crossing boundary so such an update has to be blocked.

Forcible Class Operations

The Forcible class operations are higher order and designed to allow for easy additions of further generic operations. They take care of conversions between Tgraphs, BoundaryStates and ForceStates.

    class Forcible a where
      tryFSOpWith :: UpdateGenerator -> (ForceState -> Try ForceState) -> a -> Try a
      tryChangeBoundaryWith :: UpdateGenerator -> (BoundaryState -> Try BoundaryChange) -> a -> Try a
      tryInitFSWith :: UpdateGenerator -> a -> Try ForceState

For example, given an update generator ugen and any f :: ForceState -> Try ForceState, f can be generalised to work on any Forcible using tryFSOpWith ugen f. This is used to define both tryForceWith and tryStepForceWith.

We also specialize tryFSOpWith to use the default update generator

    tryFSOp :: Forcible a => (ForceState -> Try ForceState) -> a -> Try a
    tryFSOp = tryFSOpWith defaultAllUGen

Similarly, given an update generator ugen and any f :: BoundaryState -> Try BoundaryChange, f can be generalised to work on any Forcible using tryChangeBoundaryWith ugen f. This is used to define tryAddHalfDart and tryAddHalfKite.

We also specialize tryChangeBoundaryWith to use the default update generator

    tryChangeBoundary :: Forcible a => (BoundaryState -> Try BoundaryChange) -> a -> Try a
    tryChangeBoundary = tryChangeBoundaryWith defaultAllUGen

Note that the type BoundaryChange contains a resulting BoundaryState, the single TileFace that has been added, a list of edges removed from the boundary (of the BoundaryState prior to the face addition), and a list of the (3 or 4) boundary edges affected around the change that require checking or re-checking for updates.
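
As a rough picture of that shape (the field names here are purely illustrative and not necessarily those used in the library):

    data BoundaryChange = BoundaryChange
        { changedBState :: BoundaryState -- the resulting boundary state
        , addedFace     :: TileFace      -- the single TileFace that was added
        , removedDedges :: [Dedge]       -- edges no longer on the boundary
        , revisedDedges :: [Dedge]       -- the 3 or 4 affected boundary edges to (re)check
        }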

The class function tryInitFSWith will use an update generator to create an initial ForceState for any Forcible. If the Forcible is already a ForceState it will do nothing. Otherwise it will calculate updates for the whole boundary. We also have the special case

    tryInitFS :: Forcible a => a -> Try ForceState
    tryInitFS = tryInitFSWith defaultAllUGen

Efficient chains of forcing operations

Note that (force . force) does the same as force, but we might want to chain other force related steps in a calculation.

For example, consider the following combination which, after decomposing a Tgraph, forces, then adds a half dart on a given boundary edge (d) and then forces again.

    combo :: Dedge -> Tgraph -> Tgraph
    combo d = force . addHalfDart d . force . decompose

Since decompose :: Tgraph -> Tgraph, the instances of force and addHalfDart d will have type Tgraph -> Tgraph, so each of these operations will begin and end with conversions between Tgraph and ForceState. We would do better to avoid these wasted intermediate conversions, working only with ForceStates and keeping the necessary conversions to the beginning and end of the whole sequence.

This can be done using tryFSOp. To see this, let us first re-express the forcing sequence using the Try monad, so

    force . addHalfDart d . force

becomes

    tryForce <=< tryAddHalfDart d <=< tryForce

Note that (<=<) is the Kleisli arrow which replaces composition for Monads (defined in Control.Monad). (We could also have expressed this right-to-left sequence with the left-to-right version tryForce >=> tryAddHalfDart d >=> tryForce). The definition of combo becomes

    combo :: Dedge -> Tgraph -> Tgraph
    combo d = runTry . (tryForce <=< tryAddHalfDart d <=< tryForce) . decompose

This has no performance improvement, but now we can pass the sequence to tryFSOp to remove the unnecessary conversions between steps.

    combo :: Dedge -> Tgraph -> Tgraph
    combo d = runTry . tryFSOp (tryForce <=< tryAddHalfDart d <=< tryForce) . decompose

The sequence actually has type Forcible a => a -> Try a but when passed to tryFSOp it specialises to type ForceState -> Try ForceState. This ensures the sequence works on a ForceState and any conversions are confined to the beginning and end of the sequence, avoiding unnecessary intermediate conversions.

A limitation of forcing

To avoid creating touching vertices (or crossing boundaries) a BoundaryState keeps track of locations of boundary vertices. At around 35,000 face additions in a single force operation the calculated positions of boundary vertices can become too inaccurate to prevent touching vertex problems. In such cases it is better to use

    recalibratingForce :: Forcible a => a -> a
    tryRecalibratingForce :: Forcible a => a -> Try a

These work by recalculating all vertex positions at 20,000 step intervals to get more accurate boundary vertex positions. For example, decomposing the kingGraph 6 times produces 2,906 faces. Applying force to this should result in 53,574 faces but will go wrong before it reaches that. This can be fixed by calculating either

    recalibratingForce (decompositions kingGraph !!6)

or using an extra force before the decompositions

    force (decompositions (force kingGraph) !!6)

In the latter case, the final force only needs to add 17,864 faces to the 35,710 produced by decompositions (force kingGraph) !!6.

6. Advanced Operations

Guided comparison of Tgraphs

Asking if two Tgraphs are equivalent (the same apart from choice of vertex numbers) is an NP-complete problem. However, we do have an efficient guided way of comparing Tgraphs. In the module Tgraph.Relabelling we have

    sameGraph :: (Tgraph,Dedge) -> (Tgraph,Dedge) -> Bool

The expression sameGraph (g1,d1) (g2,d2) asks if g2 can be relabelled to match g1 assuming that the directed edge d2 in g2 is identified with d1 in g1. Hence the comparison is guided by the assumption that d2 corresponds to d1.

It is implemented using

    tryRelabelToMatch :: (Tgraph,Dedge) -> (Tgraph,Dedge) -> Try Tgraph

where tryRelabelToMatch (g1,d1) (g2,d2) will either fail with a Left report if a mismatch is found when relabelling g2 to match g1 or will succeed with Right g3 where g3 is a relabelled version of g2. The successful result g3 will match g1 in a maximal tile-connected collection of faces containing the face with edge d1 and have vertices disjoint from those of g1 elsewhere. The comparison tries to grow a suitable relabelling by comparing faces one at a time starting from the face with edge d1 in g1 and the face with edge d2 in g2. (This relies on the fact that Tgraphs are connected with no crossing boundaries, and hence tile-connected.)

The above function is also used to implement

    tryFullUnion:: (Tgraph,Dedge) -> (Tgraph,Dedge) -> Try Tgraph

which tries to find the union of two Tgraphs guided by a directed edge identification. However, there is an extra complexity arising from the fact that Tgraphs might overlap in more than one tile-connected region. After calculating one overlapping region, the full union uses some geometry (calculating vertex locations) to detect further overlaps.

Finally we have

    commonFaces:: (Tgraph,Dedge) -> (Tgraph,Dedge) -> [TileFace]

which will find common regions of overlapping faces of two Tgraphs guided by a directed edge identification. The resulting common faces will be a sub-collection of faces from the first Tgraph. These are returned as a list as they may not be a connected collection of faces and therefore not necessarily a Tgraph.

Empires and SuperForce

In Empires and SuperForce we discussed forced boundary coverings which were used to implement both a superForce operation

    superForce:: Forcible a => a -> a

and operations to calculate empires.

We will not repeat the descriptions here other than to note that

    forcedBoundaryECovering:: Tgraph -> [Tgraph]

finds boundary edge coverings after forcing a Tgraph. That is, forcedBoundaryECovering g will first force g, then (if forcing succeeds) find a collection of (forced) extensions to force g such that

  • each extension has the whole boundary of force g as internal edges.
  • each possible addition to a boundary edge of force g (kite or dart) has been included in the collection.

(Possible here means not leading to a stuck Tgraph when forced.) There is also

    forcedBoundaryVCovering:: Tgraph -> [Tgraph]

which does the same except that the extensions have all boundary vertices internal rather than just the boundary edges.

Combinations and Explicitly Forced

We introduced a new type Forced (in v 1.3) to enable a forcible to be explicitly labelled as being forced. For example

    forceF    :: Forcible a => a -> Forced a 
    tryForceF :: Forcible a => a -> Try (Forced a)
    forgetF   :: Forced a -> a

This allows us to restrict certain functions which expect a forced argument by making this explicit.

    composeF :: Forced Tgraph -> Forced Tgraph

The definition makes use of theorems established in Graphs,Kites and Darts and Theorems that composing a forced Tgraph does not require a check (for connectedness and no crossing boundaries) and the result is also forced. This can then be used to define efficient combinations such as

    compForce:: Tgraph -> Forced Tgraph      -- compose after forcing
    compForce = composeF . forceF

    allCompForce:: Tgraph -> [Forced Tgraph] -- iterated (compose after force) while not emptyTgraph
    maxCompForce:: Tgraph -> Forced Tgraph   -- last item in allCompForce (or emptyTgraph)

Tracked Tgraphs

The type

    data TrackedTgraph = TrackedTgraph
       { tgraph  :: Tgraph
       , tracked :: [[TileFace]] 
       } deriving Show

has proven useful in experimentation as well as in producing artwork with darts and kites. The idea is to keep a record of sub-collections of faces of a Tgraph when doing both force operations and decompositions. A list of the sub-collections forms the tracked list associated with the Tgraph. We make TrackedTgraph an instance of class Forcible by having force operations only affect the Tgraph and not the tracked list. The significant idea is the implementation of

    decomposeTracked :: TrackedTgraph -> TrackedTgraph

Decomposition of a Tgraph involves introducing a new vertex for each long edge and each kite join. These are then used to construct the decomposed faces. For decomposeTracked we do the same for the Tgraph, but when it comes to the tracked collections, we decompose them re-using the same new vertex numbers calculated for the edges in the Tgraph. This keeps a consistent numbering between the Tgraph and tracked faces, so each item in the tracked list remains a sub-collection of faces in the Tgraph.

The function

    drawTrackedTgraph :: [VPatch -> Diagram B] -> TrackedTgraph -> Diagram B

is used to draw a TrackedTgraph. It uses a list of functions to draw VPatches. The first drawing function is applied to a VPatch for any untracked faces. Subsequent functions are applied to VPatches for the tracked list in order. Each diagram is beneath later ones in the list, with the diagram for the untracked faces at the bottom. The VPatches used are all restrictions of a single VPatch for the Tgraph, so will be consistent in vertex locations. When labels are used, there are also drawTrackedTgraphRotated and drawTrackedTgraphAligned for rotating or aligning the VPatch prior to applying the drawing functions.
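
For example, with two tracked subsets, one might use (the colours and choice of drawing functions here are arbitrary):

    exampleDraw :: TrackedTgraph -> Diagram B
    exampleDraw = drawTrackedTgraph [draw, fillDK darkmagenta gold, labelled draw]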

Note that the result of calculating empires (see Empires and SuperForce ) is represented as a TrackedTgraph. The result is actually the common faces of a forced boundary covering, but a particular element of the covering (the first one) is chosen as the background Tgraph with the common faces as a tracked sub-collection of faces. Hence we have

    empire1, empire2 :: Tgraph -> TrackedTgraph
    
    drawEmpire :: TrackedTgraph -> Diagram B

Figure 10 was also created using TrackedTgraphs.

Figure 10: Using a TrackedTgraph for drawing

7. Other Reading

Previous related blogs are:

  • Diagrams for Penrose Tiles – the first blog introduced drawing Pieces and Patches (without using Tgraphs) and provided a version of decomposing for Patches (decompPatch).
  • Graphs, Kites and Darts introduced Tgraphs. This gave more details of implementation and results of early explorations. (The class Forcible was introduced subsequently).
  • Empires and SuperForce – these new operations were based on observing properties of boundaries of forced Tgraphs.
  • Graphs,Kites and Darts and Theorems established some important results relating force, compose, decompose.

by readerunner at June 15, 2025 03:32 PM

June 11, 2025

Simon Marlow

Browsing Stackage with VS Code and Glean


Have you ever wished you could browse all the Haskell packages together in your IDE, with full navigation using go-to-definition and find-references? Here’s a demo of something I hacked together while at ZuriHac 2025 over the weekend:

In the previous post I talked about how to index all of Hackage (actually Stackage, strictly speaking, because it’s not in general possible to build all of Hackage together) using Glean. Since that post I made some more progress on the indexer:

  • The indexer now indexes types. You can see type-on-hover working in the demo. The types are similar to what you see in the Haddock-generated hyperlinked source, except that here it’s always using the type of the definition and not the type at the usage site, which might be more specific. That’s a TODO for later.

  • Fixed a bunch of things, enriched the index with details about constructors, fields and class methods, and made indexing more efficient.

The DB size including types is now about 850MB, and it takes just under 8 minutes on my 9-year-old laptop to index the nearly 3000 packages in my stackage LTS 21.21 snapshot. (Note: the figures here were updated on 12-06-2025 when I redid the measurements).

Hooking it up to VS Code

The architecture looks like this:

The LSP server is a modified version of static-ls, which is already designed to provide an LSP service based on static information. I just reimplemented a few of its handlers to make calls to Glass instead of the existing hie/hiedb implementations. You can see the changes on my fork of static-ls. Of course, these changes are still quite hacky and not suitable for upstreaming.

Glass is a “Language-agnostic Symbol Server”. Essentially it provides an API abstraction over Glean with operations that are useful for code navigation and search.

Where to next?

There remain a few issues to solve before this can be useful.

  • Make Glean more easily installable. There’s a general consensus that cabal install glean would lower the barrier to entry significantly; in order to do this we need to build the folly dependency using Cabal.

  • Clean up and ship the LSP server, somehow. Once Glean is cabal-installable, we can depend on it from an LSP server package.

  • Think about continuous integration to build the Glean DB. Perhaps this can piggyback off the stackage CI infra? If we can already build a complete stackage snapshot, and Glean is easily installable, then indexing would be fairly straightforward. I’d love to hear suggestions on how best to do this.

And looking forwards a bit further:

  • Think about how to handle multiple package versions. There’s no fundamental problem with indexing multiple package versions, except that Glass’s SymbolID format currently doesn’t include the package version, but that’s easily fixable. We could for example build multiple stackage LTS instances and index them all in a single Glean DB. There would be advantages to doing this: if for instance there were packages in common between two Stackage instances then the Glean DB would only contain a single copy. A lot of the type structure would be shared too.

  • Provide search functionality in the LSP. Glean can provide simple textual search for names, and with some work could also provide Hoogle-like type search.

  • Think about how to index local projects and local changes. Glean supports stacked and incremental DBs, so we could build a DB for a local project stacked on top of the full Stackage DB. You would be able to go-to-definition directly from a file in your project to the packages it depends on in Stackage. We could re-index new .hie files as they are generated, rather like how static-ls currently handles changes.

  • Integrate with HLS? Perhaps Glean could be used to handle references outside of the current project, switching seamlessly from GHC-based navigation to Glean-based navigation if you jump into a non-local package.

More use cases?

I talked with a few people at ZuriHac about potential use cases for Glean within the Haskell ecosystem. Using it in haskell.org came up a few times, as a way to power search, navigation and analysis. Also mentioned was the possibility of using it as a Hoogle backend. Potentially we could replace the Haddock-generated hyperlinked sources on haskell.org with a Glean-based browser, which would allow navigating links between packages and find-references.

Another use case that came up was the possibility of doing impact analysis for core library changes (or any API changes really). Some of this is already possible using find-references, but more complex cases, such as finding instances that override certain methods, won’t be possible until we extend the indexer to capture richer information.

If you’re interested in using Glean for something, why not jump on the Glean discord server and tell us about it!

June 11, 2025 12:00 AM

June 10, 2025

Brent Yorgey

Introduction to competitive programming in Haskell


Posted on June 10, 2025

A few days ago I gave a talk at ZuriHac 2025 entitled Haskell for Competitive Programming, a basic introduction to competitive programming in general, and the joy of using Haskell for competitive programming in particular. This is an expanded version of my talk in blog post form. (For an even gentler introduction to competitive programming in Haskell, see this old blog post from 2019.)

Competitive Programming

First of all, what is competitive programming? It’s a broad term, but when I talk about competitive programming I have something in mind along the following lines:

  • There are well-specified input and output formats, usually with a few examples, and a precise specification of what the output should be for a given input.
  • Your job is to write a program which transforms input meeting the specification into a correct output.
  • You submit your program, which is tested on a number of inputs and declared correct if and only if it yields the correct output for all the tested inputs.
  • There is often time pressure involved—that is, you have a limited amount of time in which to write your program. However, it is also possible to participate “recreationally”, simply for the joy of problem-solving, without time pressure (in fact, the vast majority of the competitive programming I do is of this form, though I have occasionally participated in timed contests).

There are many variations: whether you are allowed to use code libraries prepared ahead of time, or must type everything from scratch; outputs can be scored according to some criteria rather than simply being judged right or wrong; and so on.

There are many sites which allow you to participate in contests and/or solve competitive programming problems recreationally. My favorite is Open Kattis; I mention some others at the end of this post.

Pot: a first example

As an introductory example, let’s look at Pot. As usual, there’s a silly story, but what it boils down to is that we will be given a sequence of numbers, and we should interpret the last digit of each number as an exponent, then sum the results. For example, if given 125, we should interpret it as \(12^5\), and so on.

Dealing with I/O via interact

An imperative approach to such a problem would involve doing a sequence of input commands, some computation, and a sequence of output commands—possibly interleaved with one another—and we might immediately think to start using functions like getLine and putStrLn to do the required I/O in Haskell. However, there is a much more fruitful functional perspective: we are simply being asked to implement a particular (partial) function of type String -> String. The fact that the function’s input and output should be hooked up to the program’s standard input and output is just an implementation detail. Competitive programming is functional at heart!

It turns out that Haskell’s standard library already has the perfect built-in function for this scenario:

interact :: (String -> String) -> IO ()

interact takes a pure String -> String function and turns it into an IO action which reads from standard input, passes the input to the given String -> String function, and prints the result to standard output. It even does this using lazy I/O—that is, the input is read lazily, as demanded by the function, so that the output and input can be automatically interleaved depending on which parts of the output depend on which parts of the input. In particular, this means that the entire input need not be stored in memory at once. If the inputs can be processed into outputs in a streaming fashion—as is the case in the example problem we are currently considering—then the input and output will be interleaved. In general, this kind of lazy I/O is problematic and even unsafe, but it’s perfect for this scenario.

Solving the problem with a pipeline

So interact does all the IO for us, and all we have to do is write a pure String -> String function which transforms the input to the output. In this case, we can split the input into lines, drop the first line (we don’t need to know how many lines of input there are—we just get a list of all of them, since interact will read until EOF), read each number and turn it into its leading digits raised to the power of its last digit, then sum the results and show the total. The full solution is below. Notice how I use the “backwards composition” operator (>>>), since I find it more convenient to type from left to right as I’m thinking about transforming from input to output.

import Control.Category ((>>>))

main = interact $
  lines >>> drop 1 >>> map (read >>> process) >>> sum >>> show

process :: Integer -> Integer
process n = (n `div` 10) ^ (n `mod` 10)

I use Integer here since raw performance doesn’t matter much for this easy problem, and Integer avoids any potential problems with overflow. However, using Int instead of Integer can make a big difference for some compute-intensive problems. On Kattis, Int will always be 64 bits, but last time I checked Int can be 32 bits on Codeforces.
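
To get a feel for the wrap-around that Integer avoids, here is a quick GHCi check (assuming GHC's usual 64-bit Int with wrapping overflow):

ghci> (2 :: Int) ^ (63 :: Int)
-9223372036854775808
ghci> (2 :: Integer) ^ 63
9223372036854775808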

Shopping List: wholemeal programming and ByteString

Let’s consider Shopping List as a second example. In this problem, we are given a list of shopping lists, where each shopping list consists of a list of space-separated items on a single line. We are asked to find the items which are common to all the shopping lists, and print them in alphabetical order.

Wholemeal programming with standard data structures

This problem is very amenable to a “wholemeal programming” approach, where we work entirely at the level of whole data structure transformations rather than looping over individual elements. We can turn each shopping list into a set, then find the intersection of all the sets. Moreover, if we use Data.Set, which uses an ordering on the elements, we will get the result in alphabetical order “for free” (“free” as in the amount of code we have to write, not necessarily runtime cost). Haskell has a decent collection of data structures in the containers library ((Int)Set, (Int)Map, Seq, Tree, and even Graph) with a large collection of standard methods to construct and manipulate them, which are bread and butter for many competitive programming problems.

{-# LANGUAGE ImportQualifiedPost #-}

import Control.Category ((>>>))
import Data.Set (Set)
import Data.Set qualified as S

main = interact $
  lines >>> drop 1 >>> map (words >>> S.fromList) >>>
  foldr1 S.intersection >>>
  (\s -> show (S.size s) : S.toList s) >>> unlines

ByteString vs String

Unfortunately, when we try submitting this code, we get a Time Limit Exceeded error! What’s wrong?

The issue is our use of String, which is an actual linked list of characters and is very slow, especially when we have many short strings, as in this problem. In the worst case, we could have 100 shopping lists, each with 5000 items of length 10, for a total of up to 5 MB of input; with that much input data to read, any overhead associated with reading and parsing the input can make a significant difference.

Switching to ByteString is much faster. Why not Text, you ask? Well, Text has to do a bunch of extra work to deal properly with Unicode encodings, but in 99.99% of all competitive programming problems I’ve ever seen, the input is guaranteed to be ASCII. So not only do we not need Text, we can get away with a version of ByteString that simply assumes every character is a single 8-bit byte!

Once we import it, all we need to do is replace a bunch of String operations with corresponding ByteString ones.

{-# LANGUAGE ImportQualifiedPost #-}

import Control.Category ((>>>))
import Data.Set (Set)
import Data.Set qualified as S
import Data.ByteString.Lazy.Char8 qualified as BS

main = BS.interact $
  BS.lines >>> drop 1 >>> map (BS.words >>> S.fromList) >>>
  foldr1 S.intersection >>>
  (\s -> BS.pack (show (S.size s)) : S.toList s) >>> BS.unlines

A Favourable Ending: input parsing and lazy recursive structures

As a last example, let’s look at A Favourable Ending. This problem consists of a number of test cases; each test case describes a choose-your-own-adventure book with a number of sections, where each section is either an ending (either good or bad), or allows the reader to choose among three sections to proceed to next. For each test case, we are asked how many distinct stories there are with good endings.

More abstractly, since we are guaranteed that there are no loops, the sections of the book form a DAG, and we are asked to count the number of distinct paths in a DAG from a distinguished start node to any of a distinguished set of “good” leaves.

Parsing with Scanner

Parsing the input for this problem is trickier than the other examples so far. In theory, we could still ignore the first number specifying the number of test cases, and just continue reading test cases until EOF. However, each test case begins with a number specifying the number of sections in the book, and we cannot ignore this number: we need to know how many lines to read before the start of the next test case. Doing this manually involves pattern-matching on a list of lines, using splitAt to split off the lines for each test case, and manually passing around the list of the remaining lines: tedious.

Fortunately, Haskell is great at building abstractions to insulate us from such tedium. I’ve developed a simple Scanner abstraction which works well in this context.
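
Under the hood, a Scanner like this is essentially a state monad over the list of remaining input tokens. Here is a rough reconstruction of the combinators used below (my own sketch; the actual library will differ in details):

import Control.Monad (replicateM)
import Control.Monad.State (State, evalState, gets, state)
import qualified Data.ByteString.Lazy.Char8 as BS

-- A Scanner consumes a list of whitespace-separated tokens.
type Scanner = State [BS.ByteString]

runScanner :: Scanner a -> BS.ByteString -> a
runScanner s = evalState s . BS.words

str :: Scanner BS.ByteString              -- consume the next token
str = state (\ts -> (head ts, tail ts))

peek :: Scanner BS.ByteString             -- look at the next token without consuming it
peek = gets head

int :: Scanner Int                        -- consume the next token as an Int
int = maybe 0 fst . BS.readInt <$> str

(><) :: Int -> Scanner a -> Scanner [a]   -- run a scanner a fixed number of times
n >< s = replicateM n s

numberOf :: Scanner a -> Scanner [a]      -- a count followed by that many items
numberOf s = int >>= (>< s)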

We begin by creating some data types to represent the input in structured form:

type Book = Map Int Section

data Section = End Disposition | Choice [Int]
  deriving (Eq, Show)

data Disposition = Favourably | Catastrophically
  deriving (Eq, Show, Read)

Now we can write a Scanner to read a Book:

book :: Scanner Book
book = do
  s <- int
  M.fromList <$> s >< ((,) <$> int <*> section)

section :: Scanner Section
section = do
  t <- peek
  if isDigit (BS.head t)
    then Choice <$> (3 >< int)
    else End . readLower . BS.unpack <$> str

readLower :: Read a => String -> a
readLower = read . onHead toUpper

onHead :: (a -> a) -> [a] -> [a]
onHead _ [] = []
onHead f (x : xs) = f x : xs

(readLower and onHead are functions in my personal competitive programming template, included here for completeness).

One more piece of boilerplate we can write at this point is the main function, which simply consists of running the Scanner to read all the test cases, solving each test case, and formatting the output.

main = BS.interact $ runScanner (numberOf book) >>> map (solve >>> showB) >>> BS.unlines

DP + topsort with a lazy recursive map

With all that framework out of the way, we can turn to actually solving the problem. And here is where something really fun happens. In a typical imperative language, we would have to first topologically sort the book sections, then use dynamic programming to compute the number of good stories beginning at each section, starting with the leaves and proceeding backwards through the topological sort to the start—dozens of lines of code. However, in Haskell we can get all of this for free, just by defining a lazy, recursive map!

solve :: Book -> Int
solve book = endings ! 1
  where
    endings = M.fromList [(p, endingsFrom (book!p)) | p <- M.keys book]
    endingsFrom (End d) = if d == Favourably then 1 else 0
    endingsFrom (Choice ps) = sum $ map (endings !) ps

endings is a Map from each book section to the number of favorable stories starting with that section. Notice how its values are defined via the endingsFrom function, which is in turn defined, in the Choice case, by looking up the values of the choices in the endings map and summing them. endings is thus defined recursively, which works because it is lazy in the values. When we demand the value of endings ! 1, the runtime system starts evaluating thunks in the map as needed, implicitly doing a topological sort for us.

Here’s another way to think about this: what we really want is the function endingsFrom : Section -> Int, which tells us how many good endings there are starting at a given section. It can be defined via a recurrence; however, if we were to literally implement it as a recursive function, our program would spend a ridiculous amount of time recomputing the same values over and over again. So, we insert a lazy map in the middle to memoize it (there are other data structures that can be used for this purpose as well).
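
For instance, a lazy immutable array indexed by section number works just as well as the Map here (a sketch, assuming sections are numbered 1..n as the input format suggests, and with Data.Map imported qualified as M as in the solution above; solveArr is not from the original post):

import Data.Array (listArray, (!))

solveArr :: Book -> Int
solveArr book = memo ! 1
  where
    n = M.size book
    -- listArray builds a lazy array: elements are only evaluated on demand,
    -- so the recursive references to memo below are fine.
    memo = listArray (1, n) [endingsFrom (book M.! p) | p <- [1 .. n]]
    endingsFrom (End d)     = if d == Favourably then 1 else 0
    endingsFrom (Choice ps) = sum [memo ! p | p <- ps]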

Resources

Here are some resources in case you’re interested in exploring more.

  • Open Kattis has a collection of thousands of high-quality problems which can be solved in Haskell (or many other languages). If you just want to try solving some problems for fun, it’s a great place to start.
  • There are also other sites which accept Haskell, such as Codeforces. Check these out if you want to actually participate in timed contests.
  • My public listing of Kattis problems I have solved, with my own personal rating system.
  • I’ve written a series of blog posts about competitive programming in Haskell, on a variety of topics.
  • I also have a repository of modules I’ve developed specifically for competitive programming. Many of the modules are documented in one or more blog posts.
  • Soumik Sarkar has an even larger collection of Haskell libraries for competitive programming.

by Brent Yorgey at June 10, 2025 12:00 AM

June 02, 2025

Edward Z. Yang

Vibe coding case study: ScubaDuck

A lot of strong engineers that I know haven't really taken a serious look at AI coding; they've used LLMs to ask questions or write simple scripts and appreciate that it is a useful tool, but haven't actually tried building a nontrivial application entirely from scratch in vibe coding style (here, I use the term in its original meaning: when you do AI coding without carefully reviewing the output). This is understandable: if you're not working on a green field project, there aren't that many opportunities to write code in this style--standard practice for established projects is that someone else needs to review all of the code you write: this is a bad match for vibe coding! So in this post, I want to give a concrete case study of a nontrivial system that was entirely vibe coded (ScubaDuck), to argue the following claims:

  1. AI coding can be done on a manager's schedule: you don't need continuous blocks of coding time and context-switching is considerably less harmful. ScubaDuck was implemented in three days of part time work, where all of the work happened when the baby was napping.
  2. AI coding substantially lowers the cost of doing projects in tech stacks you are less familiar with. ScubaDuck is mostly JavaScript UI code, which is not something I write on a day-to-day basis.
  3. AI coding is an unlock for "sidequests": support software that's ancillary to your main task that is nice to have, but not essential. If previously you would have decided the cost outweighed the benefit, AI coding reducing the cost means you should redo these calculations.
  4. Vibe coding works and can produce working software. ScubaDuck is an existence proof that vibe coding is a viable strategy for generating JavaScript UI code (NB: I don't claim vibe coding will work for all domains, nor do I claim this is the only domain for which it works. Hopefully you can also build some intuition for where it is more or less likely to work). You will not one-shot it (ScubaDuck was 150 prompts in the end), but if you are prompting the LLM to also generate tests, you can reliably fix issues without causing regressions to existing code.
  5. Vibe coding is good for situations where buggy software is low impact; be on the lookout for ways to engineer this sort of situation. ScubaDuck is a read-only interface, where the only downside to being buggy is you can't issue the queries you want to issue.

Update: You can see all of my prompts and the resulting agent trajectories at scubaduck-prompts.

What is ScubaDuck?

ScubaDuck is a discount implementation of Meta's internal Scuba realtime database system. You can read more about what exactly this is on GitHub, but it's not so important for the purposes of this post: the key details you need to know about ScubaDuck is that it consists of a Python server that exposes an API to perform queries against a DuckDB database, and an HTML and JavaScript frontend application which implements the forms for building these queries and rendering of the output data. Both the forms and output data rendering have nontrivial JavaScript enhancements: some form inputs are chip inputs and support autocomplete, and the time series view is an SVG chart. All of these components were coded from scratch, so the project has no third-party JavaScript dependencies.

So on the one hand, this project is pretty simple. There are no stringent performance or uptime requirements, it's a pretty standard server-client program that the LLM has seen millions of times before (this is good!) On the other hand, the exact behavior of the frontend UI is quite intricate and would be very difficult to one-shot in a single prompt. Indeed, as I was coding and testing the application, I frequently ran into situations that I didn't anticipate in my original specification, and that I had to ask Codex to refine. Another way to put it is that ScubaDuck is a relatively simple functional specification (although this too was not one shot), but I did a lot of polishing of small behaviors so that the interface behaved in the way that I expected Scuba to behave. Here, it was helpful that I had a very clear idea of what I wanted (since I've used Scuba quite a lot at work).

Going into ScubaDuck, I had a pretty good sense that this project should be a good fit for LLMs. HTML, JavaScript and Python are all extremely high resource languages, and I'd heard lots of people raving about how good LLMs were at transforming wireframes and mockups into fully functional websites. It is also fully self contained and straightforward-ish to test (only "ish" because you do have to use something like Playwright to actually test the frontend UI, which honestly is a slog. But fortunately, the LLM can write the tests for you!) One design decision I made, which I didn't originally anticipate but worked out in the end, was the decision to not use any third-party JavaScript libraries. This was by accident: Python has no native way of bundling third-party JavaScript, but I wanted the tool to work offline. I wasn't sure if you could vibe code an SVG charting library from scratch, but apparently you can and it's quite easy!

Agent setup

ScubaDuck was implemented with OpenAI Codex in the cloud (not the CLI tool). Codex's cloud offering requires you to initialize a hermetic environment which the coding agent can execute commands in. It's pretty well known now that AI coding agents work much better if they are able to run the code they write and see if it worked or not, so this is quite an important part of the process. Unfortunately, this took some time-consuming trial and error to set up. I had a fairly detailed initial prompt, and what I would do was submit it to Codex, watch it fail, read over the trajectory (the agent logs) to see what happened (Codex wanted to use npm! Codex couldn't download something from the internet! Codex tried to use a package that wasn't available!) and then fix whatever environment misconfiguration had caused it to fail, or edit AGENTS.md to instruct it not to do some behavior. According to my history, the first day of the project was spent unsuccessfully trying to get the project set up, and my first successful Codex PR only happened on May 19.

At the end of setup, I had the following:

  1. A pyproject.toml with exactly the dependencies I wanted to be used (duckdb, flask and python-dateutil), a lockfile for it (since I was using uv) and my preferred configuration for various tools (pytest, ruff). I'm a big fan of pytest-xdist for vibe coded projects, since you can prompt the LLM to write tests that will work when run in parallel and it does a pretty good job at this. Later I'd also add a pyright configuration, though initially I left it out because I saw Codex doing some strange things on account of duckdb being untyped, and I didn't want to debug it at the time (the fix, by the way, is instructing the LLM to define stubs as necessary in this case.)
  2. An AGENTS.md file with some basic instructions to try to get Codex to stop doing things I saw it doing in the initial trajectories that I didn't want it to do. Nothing fancy, just if you see Codex do something bad, tell it not to do it in AGENTS.md. A good example of this is the "There are no nested AGENTS.md files, this is the only agents file": Codex is post-trained to look for nested AGENTS.md files, but you can save a few tool calls if you tell it there aren't any. (Note: folklore for Claude 3.7 is that instruction following for this sort of rules following was not great. Word on the street is that both Codex and Claude 4 are substantially better at this. Extra note: For uv users, another notable instruction in AGENTS.md is how to activate the venv, since at time of writing I couldn't get Codex to make this happen automatically.)
  3. A setup script for the environment. This took the most debugging, because Codex runs all Internet access through a proxy and sometimes it works imperfectly.

After I got my initial prompt to generate a first draft of the application, I was able to begin vibe coding in earnest.

The Human-Agent loop

The basic vibe coding loop works like this:

  1. Interact with the application and find things that are broken
  2. Prompt the LLM to fix them
  3. Repeat

For example, after the very first PR, some very mild poking around immediately revealed the bugs fixed in #2:

There's a race condition in the current test logic for matching against table contents in run_query. Specifically, if there were previously valid results in lastResults, and for some reason Dive doesn't do anything, then we will still see the old results. The testing framework should explicitly clear lastResults before attempting an interaction.

...and #3:

Filter functionality does not work. We will first add a failing test, and then fix it. The failing test should click "Add Filter", then select "user" as the field, and then add an "alice" chip (by typing alice in the text box and pressing ENTER). Then when we dive, we should see two alice rows. Right now, NO request is issued at all when we click Dive. Diagnose and then fix the problem.

Prompt the agent to write tests. It's very helpful to prompt the agent to generate tests for whatever bugs it's fixing. For frontend code, I decided to use playwright to write these tests. An example in #11:

def test_header_and_tabs(page: Any, server_url: str) -> None:
    page.goto(server_url)
    page.wait_for_selector("#order_by option", state="attached")

    header = page.text_content("#header")
    assert "sample.csv" in header
    assert "events" in header

    assert page.is_visible("#settings")
    assert page.is_hidden("#columns")
    page.click("text=Columns")
    assert page.is_visible("#columns")
    cols = page.locator("#column_list li").all_inner_texts()
    assert "timestamp" in cols
    assert "event" in cols
    page.click("text=View Settings")
    assert page.is_visible("#settings")

    btn_color = page.evaluate("getComputedStyle(document.querySelector('#dive')).backgroundColor")
    assert "rgb(0, 128, 0)" == btn_color

    sidebar_overflow = page.evaluate("getComputedStyle(document.querySelector('#sidebar')).overflowY")
    view_overflow = page.evaluate("getComputedStyle(document.querySelector('#view')).overflowY")
    assert sidebar_overflow == 'auto'
    assert view_overflow == 'auto'

This is kind of a very manual and somewhat brittle test. I doubt I would have had the patience to manually write and debug this. But the LLM will happily do it. There is also a fear that the test might not actually be testing anything for real. But because these tests are all generated from bug reports, you can prompt the model to write the test first, check that it's failing, and then fix the problem. I didn't instruct this in the above prompt, but I did instruct it for #155, for example. The trajectory shows that the LLM demonstrates it can repro the bug before fixing the issue:

I’ll add a case-insensitive test for "samples" to check server behavior with lowercase input.

(scubaduck) root@5b69743466e4:/workspace/scubaduck# python - <<'EOF'
> import json
> from scubaduck import server
>
> app = server.app
> client = app.test_client()
> payload = {
>     "table": "events",
>     "start": "2024-01-01 00:00:00",
>     "end": "2024-01-03 00:00:00",
>     "graph_type": "table",
>     "order_by": "samples",  # lowercase
>     "order_dir": "DESC",
>     "limit": 10,
>     "columns": [],
>     "group_by": ["user"],
> }
> rv = client.post("/api/query", data=json.dumps(payload), content_type="application/json")
> print('status', rv.status_code)
> print('data', rv.get_json())
> EOF
status 400
data {'error': 'Unknown column: samples'}
(scubaduck) root@5b69743466e4:/workspace/scubaduck#

I’ve found the bug happens when "Samples" is lowercase. I’ll adjust the query to handle case insensitivity and add a test for it. Time to modify server.py to fix the issue!
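A repro script like that translates naturally into a pytest regression test. Here is a hedged sketch: the payload is the one from the trajectory above, but the test name and the expectation of a 200 response after the fix are my own additions, not code from the repository.

``` python
import json

from scubaduck import server


def test_order_by_samples_is_case_insensitive() -> None:
    client = server.app.test_client()
    payload = {
        "table": "events",
        "start": "2024-01-01 00:00:00",
        "end": "2024-01-03 00:00:00",
        "graph_type": "table",
        "order_by": "samples",  # lowercase, previously rejected
        "order_dir": "DESC",
        "limit": 10,
        "columns": [],
        "group_by": ["user"],
    }
    rv = client.post(
        "/api/query", data=json.dumps(payload), content_type="application/json"
    )
    # Before the fix this returned 400 with "Unknown column: samples".
    assert rv.status_code == 200
```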

Beware BC-breaking changes. The LLM generated tests worked well when I didn't need to change behavior in a BC-breaking way. Example prompt:

Change time column to default to none unconditionally. (I want to default it to a time column if we find a good candidate but this threshold needs tuning and I don't have the tuning right now.)

This hit a Codex timeout:

This attempt to update the default time column led to extensive frontend test failures that require deeper changes. The environment made it difficult to fully diagnose the Playwright test behaviors within the time available, so the work could not be completed.

In standard software engineering practice, when this happens, decouple the BC compatible and BC breaking changes!

Make it so that Time Column can be set to (none). When it is set this way, the Start/End fields are hidden and we don't apply a filter on time range. (#115)

and then later, instead of defaulting the time column to none, I added a heuristic to pick a column that looked like time, which happened to pick the same column that all of the existing tests already expected.

Refactors have to be split up. Codex's timeout means that you can't ask it to do too much in one go. Here's a prompt that timed out:

scubaduck/index.html has gotten a bit long. Let's split out some of the JS code into dedicated JS files for their functionality. Also setup the necessary Flask scaffolding to serve these JS files. I think splitting out these specific components would be good:

  • Dropdown implementation
  • Sidebar resizing
  • JS controlling the View Settings (e.g., updateDisplayTypeUI, as well as one off interactions on form elements, columns handling, filter handling, the actual Dive implementation (including query updating), reading in defaults from query string)
  • Table rendering (e.g., formatNumber, sorting)
  • Chip input implementation
  • Chart rendering (showTimeSeries)

Make changes to AGENTS.md or README.md describing the structure so you can quickly find where the components you need are

I eventually did manage the refactor by prompting Codex to individually move out the pieces I wanted to extract one-by-one. This is a place where I think Claude Code probably would have performed better.

Parallelizing tasks. As you can see from the lengths of my prompts, it does take a while to write a good prompt; you're basically writing a bug report with enough detail that the LLM can repro it and then fix it. So sometimes I would be bottlenecked on prompt writing. However, sometimes the prompts were quite short. In those cases, Codex encourages you to submit more tasks that can run in parallel. I found this worked well, and I'd sometimes have as many as five instances going (once again, rate limited by discovering problems, making designs and typing prompts!) One irritation is when the tasks end up conflicting with each other. Sometimes the conflicts are easy to fix, but if it feels nontrivial, it's often better to just ask Codex to redo one of the PRs on latest main after the other has landed. To avoid merge conflicts, it helps to have only one "main feature" agent going at any time, and then ask the agent to do random bugfixes in parallel with it. Once you have no more tasks to get running, you can go do something else while you wait for the agents to finish (manager schedule!)

Prompting

As a reminder, I've posted all of my prompts (including the ones that failed) at scubaduck-prompts, and I think it's helpful to skim through them to get a flavor of what I was asking the LLM. But to summarize, what did I spend most of my time prompting Codex to do? My general vibe (ahem) is that I spent most of my time on minor enhancements, where I instructed Codex to make some part of the program work slightly differently, in a way that the previous prompt had left unspecified. The metaphor I had in my head while working on the project was that of a sculptor chiseling away marble: in the beginning, anything is possible, but as I kept prompting, I continuously narrowed down the space of possible programs until I had exactly the one I wanted. One big thing I want to note is that Codex rarely needed to make updates to my tests; for the most part, tests that were added never got taken away, because I never "changed my mind". I suspect the vibe coding process would have been rockier if I had been changing behavior frequently.

One of the things that surprised me the most about the process was how easy it was to implement a line chart in SVG with Codex. My first prompt resulted in a chart that looked broken on the test data:

We're going to add a new View type, to go along with Samples and Table: Time Series. Time Series supports all the fields that Table supports, and a few more:

  • X-axis: Main group by dimension, e.g., the x-axis on time series view. This is our custom dropdown selector, but only time columns are populated here. It should prefer a default setting from the following list, most preferred first: "time", "timestamp"
  • Granularity: Choose the time interval between data points on the chart. For example, a granularity of 1 hour means there will be a data point every 60 minutes that is aggregated with the chosen Aggregate function over the data for the granularity period before the point. This is a plain drop down. The valid values are: Auto, Fine, 1 second, 5 seconds, 10 seconds, 30 seconds, 1 minute, 4 minutes, 5 minutes, 10 minutes, 15 minutes, 30 minutes, 1 hour, 3 hours, 6 hours, 1 day, 1 week, 30 days. The semantics of the Auto setting is that it sets the interval to whatever would result in maximum 100 buckets (if there are not enough data points for that many buckets, it just picks the finest time interval that makes sense), and Fine, which sets the interval to 500 buckets.
  • Fill Missing Buckets: This is a dropdown. For now, it has the settings "Fill with 0 (Per Series)" (default), "Connect (Per Series)" and "Leave blank".

Additionally, the default setting of Limit is 7, as it controls how many elements from group by will be plotted (the actual number of lines plotted could be a multiple of this, as we will plot every selected Column).

Unlike Samples and Table, we will instead display a line chart in the right panel. To plot the line chart, we will implement it by hand with JS and SVG, similar to how highcharts implements it. We will not use any third party dependencies. Lines will be plotted as paths, no smoothing, no dots for individual data points. Each series (as generated by group by) should be plotted with a different color, assigned using a best practices color palette for graph design. There should be a rendering of x-axis and y-axis; the x-axis should have slanted labels to aid readability. When we mouse over the chart, a vertical line should snap to the center of the time bucket that we are closest to. We should also display a crosshair on all of the series showing us their values at that data point, and highlight the closest point we are on, and increase the thickness of the series that point is on. To the left of the graph (still in the right panel), there should be a legend. The legend looks like this:

[GROUP BY VALUE] [AGGREGATE]
[First Column name, with series color]
[Number of samples for the first column]
[Second Column name, with series color]
[Number of samples for the second column]
... for all columns
----
... for all group by values (up to the limit)

So for example, if I group by user, I might see:

Alice AVG
value
4 (samples)

The highlighted series (which has a thicker line) should also be highlighted in the legend.

This was kind of terrifying, because I initially thought I didn't have a good way to test the SVG outputs. But after doing some regular old-fashioned debugging and reading the code (yes, this part not vibe coded), I figured out the problem, and also realized that Playwright can test that an SVG path is not just entirely straight. After the initial bugs were fixed, I mostly had to add missing features like x-axis/y-axis and interactivity features (amusingly, Codex ignored most of the instructions in the latter half of the prompt, giving only the barest bones legend. I suspect this was because I had some files which were too long). My general take after this was that JS chart libraries are going to become obsolete: it's much easier to vibe code a bespoke implementation and then customize the heck out of it.
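To make the "not entirely straight" check concrete, here is a hedged sketch of the kind of Playwright assertion involved. The #chart selector, the button labels, and the path-data format are my assumptions for illustration, not ScubaDuck's actual markup.

``` python
import re
from typing import Any


def test_time_series_path_is_not_flat(page: Any, server_url: str) -> None:
    page.goto(server_url)
    page.click("text=Time Series")
    page.click("text=Dive")
    page.wait_for_selector("#chart path", state="attached")

    d = page.get_attribute("#chart path", "d") or ""
    # Pull the y coordinate out of each "x y" (or "x,y") pair in the path data.
    ys = {y for _, y in re.findall(r"([-\d.]+)[ ,]([-\d.]+)", d)}
    # A flat (or broken) chart collapses to a single y value.
    assert len(ys) > 1
```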

Conclusion

ScubaDuck was implemented in about 150 Codex prompts. As you can see from the sample prompts above, the prompts are recognizably programming; they just happen to be written in plain English. This is a big help, because I never had to keep track of the nest of callbacks and state machines for implementing complex UI elements in JavaScript. I had to be fluent in what I wanted my program to do, and be a good QA tester for the application to discover new problems that needed to be fixed, but I did not have to worry at all about the vagaries of SVG DOM elements or pixel position computation minutiae. It's hard to say how long it would have taken to code this by hand, but I think reproducing a UI that's been in production for years at Meta in three (part-time) days is pretty good!

Despite having done a bit of AI coding before, I also learned a few things from working with Codex. Codex made it blindingly clear that the parallel modality (and subsequent conflict resolution) is important. It made me adjust up my estimation of the capability of LLMs to write raw HTML/JS and evoked a future where people vibe code components in place of taking on a third party dependency. I was very appreciative of no-rate-limit Codex (though I doubt it's going to last). It also reminded me how difficult it will be to set up agent environments for "real" projects (like PyTorch).

Hopefully, this case study has given you some ideas for things to try. Go forth and vibe code, responsibly!

by Edward Z. Yang at June 02, 2025 04:31 AM

Chris Penner

Building Industrial Strength Software without Unit Tests

Building Industrial Strength Software without Unit Tests

I don't know about you, but testing isn't my favourite part of software development.

It's usually the last thing standing between me and shipping a shiny new feature, and writing tests is often an annoying process with a lot of boilerplate: fighting against your system to get your app into a good starting state for the test, or mocking out whichever services your app depends on.

Much ink has been spilled about how to organize your code in order to make this easier, but the fact that so many blog posts and frameworks exist for this express purpose suggests to me that we as a community of software developers haven't quite solved this issue yet.

Keep reading to see how I've solved this problem for myself by simply avoiding unit testing altogether.

An alternative testing method

When I first started at Unison Computing I was submitting my first feature when I learned there were precious few unit tests. I found that rather surprising for the codebase of a compiler for a programming language! How do you prevent regressions without unit tests?

The answer is what the Unison team has dubbed transcript tests. These are a variation on the concept of golden-file tests.

A Unison transcript is a markdown file which explains in prose what behaviour it is going to test, then intersperses code-blocks which outline the steps involved in testing that feature using a mix of Unison code and UCM commands (UCM is Unison's CLI tool). After that comes the magic trick: UCM itself can understand and run these transcript files directly and record the results of each block.

When running a transcript file with the ucm transcript command, UCM produces a deterministic output file containing the result of processing each code block. Unless the behaviour of UCM has changed since the last time it was run, the resulting file will always be the same.

Each block in the markdown file is either a command, which is sent to the UCM shell tool, or it represents an update to a file on the (virtual) file-system, in which case it will be typechecked against the state of the codebase.

Here's a quick example of a transcript for testing UCM's view command so you can get a feel for it.

# Testing the `view` command

First, let's write a simple definition to view:

``` unison
isZero = cases
  0 -> true
  _ -> false
```

Now we add the definition to the codebase, and view it.

``` ucm
scratch/main> update
scratch/main> view isZero
```

We run this transcript file with ucm transcript my-transcript.md which produces the my-transcript.output.md file.

Notice how compiler output is added inline. Ignore the hashed names; they appear because I'm skipping the step which adds names for Unison's builtins.

# Testing the `view` command

First, let's write a simple definition to view:

``` unison
isZero = cases
  0 -> true
  _ -> false
```

``` ucm :added-by-ucm
  Loading changes detected in scratch.u.

  I found and typechecked these definitions in scratch.u. If you
  do an `add` or `update`, here's how your codebase would
  change:

    ⍟ These new definitions are ok to `add`:
    
      isZero : ##Nat -> ##Boolean
```

Now we add the definition to the codebase, and view it.

``` ucm
scratch/main> update

  Done.

scratch/main> view isZero

  isZero : ##Nat -> ##Boolean
  isZero = cases
    0 -> true
    _ -> false
```

Feel free to browse through the collection of transcripts we test in CI to keep UCM working as expected.

Testing in CI

Running transcript tests in CI is pretty trivial; we discover all markdown files within our transcript directory and run them all. After the outputs have been written we can use git diff --exit-code, which will fail with a non-zero code if any of the outputs have changed from what was committed. Conveniently, git will also report exactly what changed, and what the old output was.
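As a minimal sketch of what such a CI step could look like (my own illustration, not UCM's actual CI script; the transcripts/ directory name is an assumption):

``` python
import pathlib
import subprocess
import sys

TRANSCRIPT_DIR = pathlib.Path("transcripts")  # hypothetical location

# Run every transcript, skipping the committed *.output.md files it generates.
for transcript in sorted(TRANSCRIPT_DIR.glob("*.md")):
    if transcript.name.endswith(".output.md"):
        continue
    subprocess.run(["ucm", "transcript", str(transcript)], check=True)

# git prints exactly what changed and exits non-zero if any output differs
# from what was committed.
result = subprocess.run(["git", "diff", "--exit-code", str(TRANSCRIPT_DIR)])
sys.exit(result.returncode)
```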

This failure method allows the developer to know exactly which file has unexpected behaviour so they can easily re-run that file or recreate the state in their own codebase if they desire.

Transcript tests in other domains

I liked the transcript tests in UCM so much that when I was tasked with building out the Unison Share webapp I decided to use transcript-style testing for that too. Fast forward a few years and Unison Share is now a fully-featured package repository and code collaboration platform running in production without a single unit test.

If you're interested in how I've adapted transcript tests to work well for a webapp, I'll leave a few notes at the end of the post.

Benefits of transcript tests

Here's a shortlist of benefits I've found working with transcript tests over alternatives like unit tests.

You write a transcript using the same syntax as you'd interact with UCM itself.

This allows all your users to codify any buggy behaviour they've encountered into a deterministic transcript. Knowing exactly how to reproduce the behaviour your users are seeing is a huge boon, and having a single standardized format for accepting bug reports helps reduce a lot of the mental work that usually goes into reproducing bug reports from a variety of sources. This also means that the bug report itself can go directly into the test suite if we so desire.

All tests are written against the tool's external interface.

The tests use the same interface that the users of your software will employ, which means that internal refactors won't ever break tests unless there's a change in behaviour that's externally observable.

This has been a huge benefit for me personally. I'd often find myself hesitant to re-work code because I knew that at the end I'd be rewriting thousands of lines of tests. If you always have to rewrite your tests at the same time you've rewritten your code, how do you have any confidence that the tests still work as intended?

Updating tests is trivial

In the common case where transcripts are mismatched because some help message was altered, or perhaps the behaviour has changed but the change is intended, you don't need to rewrite any complex assertions, or mock out any new dependencies. You can simply look at the new output, and if it's reasonable you commit the changed transcript output files.

It can't be overstated how convenient this is when making sweeping changes, e.g. changes to Unison's pretty printer. We don't need to manually update test-cases, we just run the transcripts locally and commit the output if it all looks good!

Transcript changes appear in PR reviews

Since all transcript outputs are committed, any change in behaviour will show up in the PR diff in an easy-to-read form. This allows reviewers to trivially see the old and new behaviour for each relevant feature.

Transcript tests are documentation

Each transcript shows how a feature is intended to be used by end-users.

Transcripts as a collaboration tool

When I'm implementing new features in Unison Share I need to communicate the shape of a JSON API with our Frontend designer Simon. Typically I'll just write a transcript test which exercises all possible variants of the new feature, then I can just point at the transcript output as the interface for those APIs.

It's beneficial for both of us since I don't need to keep an example up-to-date for him, and he knows that the output is actually accurate since it's generated from an execution of the service itself.

Transcript testing for Webapps

I've adapted transcript testing a bit for the Unison Share webapp. I run the standard Share executable locally with its dependencies mocked out via docker-compose. I've got a SQL file which resets the database with a known set of test fixtures, then use a zsh script to reset my application state in between running each transcript.

Each transcript file is just a zsh script that interacts with the running server using a few bash functions which wrap curl commands, but save the output to json files, which serve as the transcript output.

I've also got helpers for capturing specific fields from an API call into local variables which I can then interpolate into future queries. This is handy if you need to, for example, create a project, switch it from private to public, then fetch that project via the API.

Here's a small snippet from one of my transcripts for testing Unison Share's project APIs:

#!/usr/bin/env zsh

# Fail the transcript if any command fails
set -e

# Load utility functions and variables for user credentials
source "../../transcript_helpers.sh"

# Run a UCM transcript to upload some code to load in projects.
transcript_ucm transcript prelude.md

# I should be able to see the fixture project as an unauthenticated user.
fetch "$unauthenticated_user" GET project-get-simple '/users/test/projects/publictestproject'

# I should be able to create a new project as an authenticated user.
fetch "$transcripts_user" POST project-create '/users/transcripts/projects/containers' '{
    "summary": "This is my project",
    "visibility": "private",
    "tags": []
}'

fetch "$transcripts_user" GET project-list '/users/transcripts/projects'

You can see the output files generated by the full transcript in this directory.

Requirements of a good transcript testing tool

After working with two different transcript testing tools across two different apps I've got a few criteria for what makes a good transcript testing tool. If you're thinking of adding transcript tests to your app, consider the following:

Transcripts should be deterministic

This is critical. Transcripts are only useful if they produce the same result on every run, on every operating system, at every time of day.

You may need to make a few changes in your app to adapt or remove randomness, at least when in the context of a transcript test.

In Share there were a lot of timestamps, random IDs, and JWTs (which contain a timestamp). The actual values of these weren't important for the tests themselves, so I solved the issue by piping the curl output through a sed script before writing to disk. The script matches timestamps, UUIDs, and JWTs and replaces them with placeholders like <TIMESTAMP>, <UUID>, and <JWT> accordingly.
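The same idea, sketched in Python rather than sed (this is my own illustration of the technique, not Share's actual script; the exact patterns are assumptions):

``` python
import re
import sys

# Replace run-to-run values with stable placeholders so transcript output
# stays deterministic.
PATTERNS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:?\d{2})?"), "<TIMESTAMP>"),
    (re.compile(r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"), "<UUID>"),
    (re.compile(r"\beyJ[\w-]+\.[\w-]+\.[\w-]+"), "<JWT>"),
]


def sanitize(text: str) -> str:
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


if __name__ == "__main__":
    # Use as a filter: curl ... | python sanitize.py > response.json
    sys.stdout.write(sanitize(sys.stdin.read()))
```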

A special mode in your app for transcript testing which avoids randomness can be useful, but use custom modes sparingly lest your app's behaviour differ too much during transcripts and you can't test the real thing.

I also make sure that the data returned by APIs is always sorted by something other than randomized IDs. It's a small price to pay, and it reduces randomness and heisenbugs in the app as a helpful byproduct.

Transcripts should be isolated

Each individual transcript should be run in its own pristine environment. Databases should be reset to a known state; if the file-system is used, it should be cleared or, even better, a virtual file-system should be used.

Transcripts should be self-contained

Everything that pertains to a given test-case's state or configuration should be evident from within the transcript file itself. I've found that changes in behaviour from the file's location or name can just end up being confusing.

Difficulties working with Transcripts

Transcripts often require custom tooling

In UCM's case the transcript tooling has evolved slowly over many years; it has its own parser, and you can even test UCM's API server by using special code blocks.

Share has a variety of zsh utility scripts which provide helpers for fetching endpoints using curl, and filtering output to capture data for future calls. It also has a few tools for making database calls and assertions.

Don't shy away from investing a bit of time into making transcript testing sustainable and pleasant; it will pay dividends down the road.

Intensive Setup

As opposed to unit tests, which are generally pretty lightweight, transcript tests are full integration tests, and require setting up data, and sometimes executing entire flows, so that we can get the system into a good state for testing each feature.

You can mitigate the setup time by testing multiple features with each transcript.

I haven't personally found transcript tests to take too much time in CI, largely because I think transcript testing tends to produce fewer tests, but of higher value than unit testing. I've seen many unit test suites bogged down by particular unit tests which generate hundreds of test cases that aren't actually providing real value. Also, any setup/teardown is going to be more costly on thousands of unit-tests as compared to dozens or hundreds of transcript tests.

Service Mocking

Since transcript tests run against the system-under-test's external interface, you won't have traditional mocking/stubbing frameworks available to you. Instead, you'll mock out the system's dependencies by specifying custom services using environment variables, or wiring things up in docker-compose.

Most systems have a setup for local development anyways, so integrating transcript tests against it has the added benefit that they'll ensure your local development setup is tested in CI, is consistent for all members of your team, and continues to work as expected.

In Summary

Hopefully this post has helped you to consider your relationship with unit tests and perhaps think about whether other testing techniques may work better for your app.

Transcript tests surely aren't ideal for all possible apps or teams, but my last few years at Unison have proven to me that tests can be more helpful, efficient, and readable than I'd previously thought possible.

Let me know how it works out for you!

Hopefully you learned something! Did you know I'm currently writing a book? It's all about Lenses and Optics! It takes you all the way from beginner to optics-wizard and it's currently in early access! Consider supporting it, and more posts like this one, by pledging on my Patreon page! It takes quite a bit of work to put these things together; if I managed to teach you something or even just entertain you for a minute or two, maybe send a few bucks my way for a coffee? Cheers!

Become a Patron!

June 02, 2025 12:00 AM

May 30, 2025

Haskell Interlude

65: Andy Gordon

Andy Gordon from Cogna is interviewed by Sam and Matti. We learn about Andy’s influential work, including the origins of the bind symbol in Haskell and the introduction of lambdas in Excel. We go on to discuss his current work at Cogna on using AI to allow non-programmers to write apps using natural language. We delve deeper into the ethics of AI and consider the most likely AI apocalypse.

by Haskell Podcast at May 30, 2025 02:00 PM

May 29, 2025

Tweag I/O

Quantifying the impact of an AI tool in our company

As software engineers at Modus Create, we are always on the lookout for tools that can enhance our productivity and code quality. The advent of AI-powered coding assistants such as GitHub Copilot has sparked excitement in the development community. Copilot code completions propose snippets at the current cursor that the user can quickly insert, while Copilot Chat allows users to discuss their code with an AI.

These tools promise to revolutionize software development, allowing engineers to focus on higher-level tasks while delegating implementation details to machines. However, their adoption also raises questions:

  • Do they genuinely improve developer productivity?
  • How do they affect code quality and maintainability?
  • Which users and tasks benefit the most from these AI-driven coding assistants?

This blog post explores the challenges of measuring the impact of AI tools in our software engineering practices, with a focus on GitHub Copilot. Note that the data discussed in the post was collected in Q2 2024. We expect that GitHub Copilot has improved since then; we have also not yet had the opportunity to quantitatively investigate newer interfaces to AI development, like Cursor or Windsurf.

“Developer Productivity”

At Modus Create, we’re passionate about improving the experience of developers, both for our own teams and those at clients. We have been working for years on tools that we think improve developer productivity, for instance with Nix, Bazel, Python, and many more. But measuring developer productivity is a notoriously difficult task.

At the heart of this question lies the nature of software development itself. Is it a productive activity that can fit scientific management, be objectively measured, and be optimized? Part of the research on developer productivity goes down this path, trying to measure things like the time it takes to complete standardized tasks. Another trend suggests that developers themselves can be their own assessors of productivity, where frameworks like SPACE are used to guide self-assessment. Each of these angles has strengths and weaknesses. To get as broad a picture as possible, we tried to use a bit of both. We found, though, that data collection issues made our task timings unusable (more on this below). Therefore, all our conclusions are drawn from self-assessments.

Our in-house experiment

To gain a deeper understanding of the impact of GitHub Copilot at Modus Create, we designed and conducted an in-house experiment.

We managed to recruit 22 participants in total, ranging from Junior to Principal software engineers. They had a wide range of programming experience.

Plot: Distribution of participants per years of programming

The experiment consisted of four coding tasks that participants needed to complete using Python within an existing codebase. The tasks were designed to evaluate different aspects of software development:

  1. Data ingestion: Loading and parsing data from a file into a Pandas DataFrame
  2. Data analysis: Performing statistical computations and aggregations using Pandas’ groupby operations
  3. Test development: Writing tests using Python’s unittest framework
  4. Data visualization: Creating interactive plots using the Streamlit library

Participants had varied levels of experience with the required tools. Most participants had at least a little bit of Python experience, but Pandas experience was less common and hardly anyone had used Streamlit before.1

Plot: Distribution of participants per experience with tools

Upon completion of the assigned tasks, all participants completed a comprehensive survey to provide detailed feedback on their experience. The survey consisted of approximately 50 questions designed to assess multiple dimensions of the development process, including:

  • Assessment of participant expertise levels regarding task requirements, AI tooling and GitHub Copilot proficiency
  • Evaluation of task-specific perceived productivity
  • Analysis of the impact on learning and knowledge acquisition
  • Insights into potential future GitHub Copilot adoption

Perceived productivity gains

We asked participants the following questions.

Question: If you didn't have Copilot, reaching the answer for task X would have taken...

Choices:
  • Less time
  • About the same time
  • More time

This question was core to our study, as it allowed us to directly measure the perceived productivity gain of using Copilot versus not using it.

The result was clear: almost every Copilot user felt more productive using Copilot on every task.

Plot: Productivity gain per task using Copilot

We also broke out the same data by Python experience level, and found that more experienced Python users reported less productivity gain than less experienced users. In this plot, we grouped the “no Python experience” and “beginner” users into the “less experienced” group, with the rest of the users in the “more experienced” group.

Plot: Productivity gain per experience level using Copilot

To better understand how participants tackled these tasks, we collected information by asking for each task:

Question: Which of the following have you used to complete task X?

Choices:
  • Copilot code completions
  • Copilot Chat
  • Google search
  • Library documentation
  • My knowledge

We were also interested in comparing these usages across profiles of developers, so we asked this question as well:

Question: How would you describe your Python level?

Choices:
  • No Python experience
  • Beginner
  • Intermediate
  • Advanced

We could then visualize how participants who felt more productive with Copilot solved each problem, and see if there were variations depending on their profile. Since each participant could choose multiple options, sometimes there are more responses than participants.

Plot: Sources of information used to be more productive per Python experience for task 1 Plot: Sources of information used to be more productive per Python experience for task 2

Apparently, people don’t like library documentation. Also, we thought it was strange that the most experienced Python users never reported using their own knowledge. It would be interesting to dig more into this, but we don’t have the data available. One theory is that when reviewing AI suggestions everyone relied on their own Python knowledge, but experienced users took that knowledge for granted and so didn’t report using it.

Plot: Sources of information used to be more productive per Python experience for task 3 Plot: Sources of information used to be more productive per Python experience for task 4

Among people who felt more productive on tasks “Write unit tests” and “Plot with Streamlit”, we really see more usage of Copilot Chat than other sources.

Our hypothesis is that these tasks typically require making more global changes to the code or adding code in places that are not obvious at first. In these scenarios, Copilot Chat is more useful because it will tell you where and what code to add. In other tasks, it was clearer where to add code, so participants could likely place their cursor and prompt Copilot for a suggestion.

This is supported by the questions we asked:

Question: Which of the following do you think is true?

Choices:
  • Copilot is better with acceleration: it helps accelerate work that I already know how to do.
  • Copilot is better with exploration: it helps me explore the problem and how to solve it when I am not sure how to proceed.

This question uses checkboxes, so respondents were not restricted to a single answer.

Plot: Is Copilot suited for acceleration and/or exploration?

On average, participants thought Copilot was suited for both acceleration and exploration, but with some notable differences depending on experience level: experienced Pythonistas strongly favored Copilot for acceleration, while less experienced users thought it was better for exploration.

We also found that the participants’ perspective on acceleration versus exploration seems related to the usage of Copilot Chat.

Plot: average usage of source per opinion

The most interesting part of this chart is that participants who think Copilot is good for exploration or bad for acceleration relied most heavily on Copilot Chat. This suggests that users find the autocomplete features more useful for acceleration, while the chat features — which allow general questions, divorced from a specific code location — are useful for exploration. But it is interesting to note how usage of Copilot Chat versus autocomplete is correlated with how users perceive Copilot as a whole.

For more on acceleration versus exploration with Copilot, this OOPSLA23 talk, which inspired us to ask this question, is worth watching.

Copilot will make code flow

The SPACE framework mentions “flow” as an important aspect of productivity.

Some research associates productivity with the ability to get complex tasks done with minimal distractions or interruptions. This conceptualization of productivity is echoed by many developers when they talk about “getting into the flow” […].

This concept of flow is really interesting, because it is a way to measure productivity that is not based on outputs, but rather on the experience of the developers themselves. And although “flow” might be subjective and perceptual, studies have linked it to higher productivity and reduced stress; see this open-access book chapter for a readable overview of the research.

To get an idea of Copilot’s impact on flow, we asked the following questions:

Question: Did Copilot decrease your need to switch out of your IDE (for example to search for answers or check the documentation)?

Choices:
  • Significantly
  • A bit
  • No

Question: Did Copilot enhance your capacity to stay in your development flow?

Choices:
  • Significantly
  • A bit
  • No

The results were unambiguous: most users found that Copilot helped significantly, and a strong majority found that it helped at least a little.

Plot: Does Copilot help flow

Learnings from organizing the experiment

Although the experiment went well overall, we noted a few challenges worth sharing.

First, ensuring active participation in the experiment required a collective effort within the company. Spreading the word internally about the experiment and looking for participants is an effort not to be underestimated. In our case, we benefited from great support from internal leaders and managers who helped communicate with and recruit participants. Even so, we would have liked to have more participants. It turns out that engineers are sometimes just too busy!

Second, keeping participants focused on the experiment was harder than expected. We had asked participants to make a git commit at the end of each task, thinking that we could use this data to quantify the time it took for each participant to complete their tasks. When looking at the data, we were surprised to see that the time between commits varied widely and was often much longer than expected. When asked, several participants reported that they had to interrupt our experiment to deal with higher-priority tasks. In the end, we discarded the timing data: they were too limited and too heavily influenced by external factors to provide useful conclusions. For the same reason, we haven’t even mentioned yet that our study had a control group: since the timing data wasn’t useful, we’ve omitted the control group entirely from the data presented here.

The ideal scenario of securing dedicated, uninterrupted time from a large pool of engineers proved impractical within our organizational context. Nevertheless, despite these limitations, we successfully gathered a meaningful dataset that contributes valuable perspectives to the existing body of research on AI-assisted development.

Further references

Speaking of other work out there, there’s a lot of it! It turns out that many people are excited by the potential of code assistants and want to understand them better. Who knew? Here is some further reading that we found particularly interesting:

  • Experiments at Microsoft and Accenture introduced Copilot into engineers’ day-to-day workflow and measured the impact on various productivity metrics, like the number of opened pull requests; they found that Copilot usage significantly increased the number of successful builds. They had a much larger sample size than we did — Microsoft and Accenture have a lot of engineers — but unlike us they didn’t specifically consider the uptake of unfamiliar tools and libraries.

  • A research team from Microsoft and MIT recruited developers from Upwork, gave them a task, and measured the time it took with and without Copilot’s help; they found that Copilot users were about 50% faster. They did a better job than we did at measuring completion time (they used GitHub Classroom), but we think our exit survey asked more interesting questions.

  • The Pragmatic Engineer ran a survey about how engineers are using AI tooling, covering popular tools and their perceived impact on development.

Conclusion

Our experiment provided valuable insights into the impact of GitHub Copilot on developer experiences at Modus Create. Overall, developers reported increased productivity and a more seamless workflow. Participants used Copilot extensively in specific coding scenarios, such as automated testing and modifying code that used libraries they were unfamiliar with, and they felt more productive in those cases.

It was particularly interesting to see how the interface to the AI assistant (chat vs. completion) affected participants’ opinions on what the assistant was useful for, with chat-heavy users prioritizing exploration over acceleration and completion-heavy users the other way around. As interfaces and tooling continue to evolve — faster than we can design and run experiments to test them — we expect them to play a huge role in the success of AI-powered code assistants.


  1. We made a small mistake with the wording in Pandas and Streamlit questions: we gave them the options “I have never used it”, “I have heard of it”, “I have used it before in a limited way”, “I am comfortable with it”, and “I am an advanced user”. The problem, of course, is that these responses aren’t mutually exclusive. Given the order the responses were presented in, we think it’s reasonable to interpret “I have never used it” responses to mean that they’d heard of it but never used it. For the plot, we’ve combined “I have never used it” and “I have heard of it” into “Never used it”.

May 29, 2025 12:00 AM

May 28, 2025

Chris Smith 2

Threshold Strategy in Approval and Range Voting

How to turn polling insight into an optimal ballot — and why anything else is wasted.

“approve of�? What does that mean anyway?

I have written previously about how approval and range voting methods are intrinsically tactical. This doesn’t mean that they are more tactical than other election systems (nearly all of which are shown to sometimes be tactical by Gibbard’s Theorem when there are three or more options). Rather, it means that tactical voting is unavoidable. Voting in such a system requires answering the question of where to set your approval threshold or how to map your preferences to a ranged voting scale. These questions don’t have more or less “honest” answers. They are always tactical choices.

But I haven’t dug deeper into what these tactics look like. Here, I’ll do the mathematical analysis to show what effective voting looks like in these systems, and make some surprising observations along the way.

Mathematical formalism for approval voting

We’ll start by assuming an approval election, so the question is where to put your threshold. At what level of approval do you switch from voting not to approve a candidate to approving them?

We’ll keep the notation minimal:

  • As is standard in probability, I’ll write ℙ[X] for the probability of an event X, and 𝔼[X] for the expected value of a (numerical) random variable X.
  • I will use B to refer to a random collection (multiset) of ballots, drawn from some probability distribution reflecting what we know from polling and other information sources on other voters. B will usually not include the approval vote that you’re considering casting, and to include that approval, we’ll write B ∪ {c}, where c is the candidate you contemplate approving.
  • I’ll write W(·) to indicate the winner of an election with a given set of ballots. This is the candidate with the most approvals. We’ll assume some tiebreaker is in place that’s independent of individual voting decisions; for instance, candidates could be shuffled into a random order before votes are cast, and in the event of a tie for number of approvals, we’ll pick the candidate who comes first in that shuffled order.
  • U(·) will be your utility function, so U(c) is the utility (i.e., happiness, satisfaction, or perceived social welfare) that you personally will get from candidate c winning the election. This doesn’t mean you have to be selfish, per se, as accomplishing some altruistic goal is still a form of utility, but we evaluate that utility from your point of view even though other voters may disagree.

With this notation established, we can clearly state, almost tautologically, when you should approve of a candidate c. You should approve of c whenever:

𝔼[U(W(B ∪ {c}))] > 𝔼[U(W(B))]

That’s just saying you should approve of c if your expected utility from the election with your approval of c is more than your utility without it.

The role of pivotal votes and exact strategy

This inequality can be made more useful by isolating the circumstances in which your vote makes a difference in the outcome. That is, W(B ∪ {c}) ≠ W(B). Non-pivotal votes contribute zero to the net expectation, and can be ignored.

In approval voting, approving a candidate can only change the outcome by making that candidate the winner. This means a pivotal vote is equivalent to both of:

  • W(B ∪ {c}) = c
  • W(B) ≠ c

It’s useful to have notation for this, so we’ll define V(B, c) to mean that W(B ∪ {c}) ≠ W(B), or equivalently, that W(B ∪ {c}) = c and W(B) ≠ c. To remember this notation, recall that V is the pivotal letter in the word “pivot”, and also visually resembles a pivot.

With this in mind, the expected gain in utility from approving c is:

  • 𝔼[U(W(B ∪ {c}))] - 𝔼[U(W(B))]. But since the utility gain is zero except for pivotal votes, this is the same as
  • ℙ[V(B, c)] · (𝔼[U(W(B ∪ {c})) | V(B, c)] - 𝔼[U(W(B)) | V(B, c)]). Since V(B, c) implies that W(B ∪ {c}) = c, this simplifies to
  • ℙ[V(B, c)] · (U(c) - 𝔼[U(W(B)) | V(B, c)])

Therefore, you ought to approve of a candidate c whenever

U(c) > 𝔼[U(W(B)) | V(B, c)]

This is much easier to interpret. You should approve of a candidate c precisely when the utility you obtain from c winning is greater than the expected utility in cases where c is right on the verge of winning (but someone else wins instead).

There are a few observations worth making about this:

  • The expectation clarifies why the threshold setting part of approval voting is intrinsically tactical. It involves evaluating how likely each other candidate is to win, and using that information to compute an expectation. That means advice to vote only based on internal feelings like whether you consider a candidate acceptable is always wrong. An effective vote takes into account external information about how others are likely to vote, including polling and understanding of public opinion and mood.
  • The conditional expectation, assuming V(B, c), tells us that the optimal strategy for whether to approve of some candidate c depends on the very specific situation where c is right on the verge of winning the election. If c is a frontrunner in the election, this scenario isn’t likely to be too different from the general case, and the conditional probability doesn’t change much. However, if c is a long-shot candidate from some minor party, but somehow nearly ties for a win, we’re in a strange situation indeed: perhaps a major last-minute scandal, a drastic polling error, or a fundamental misunderstanding of the public mood. Here, the conditional expected utility of an alternate winner might be quite different from your unconditional expectation. If, say, voters prove to have an unexpected appetite for extremism, this can affect the runners-up, as well.
  • Counter-intuitively, an optimal strategy might even involve approving some candidates that you like less than some that you don’t approve! This can happen because different candidates are evaluated against different thresholds. Therefore, a single voter’s best approval ballot isn’t necessarily monotonic in their utility rankings. This adds a level of strategic complexity I hadn’t anticipated in my earlier writings on strategy in approval voting.

Approximate strategy

The strategy described above is rigorously optimal, but not at all easy to apply. Imagining the bizarre scenarios in which each candidate, no matter how minor, might tie for a win, is challenging to do well. We’re fortunate, then, that there’s a good approximation. Remember that the utility gain from approving a candidate was equal to

ℙ[V(B, c)] · (U(c) - 𝔼[U(W(B)) | V(B, c)])

In precisely the cases where V(B, c) is a bizarre assumption that’s difficult to imagine, we’re also multiplying by ℙ[V(B, c)], which is vanishingly small, so this vote is very unlikely to make a difference in the outcome. For front-runners, who are relatively much more likely to be in a tie for the win, the conditional probability changes a lot less: scenarios that end in a near-tie are not too different from the baseline expectation.

This happens because ℙ[V(B, c)] falls off quite quickly indeed as the popularity of c decreases, especially for large numbers of voters. For a national scale election (say, about 10 million voters), if c expects around 45% of approvals, then ℙ[V(B, c)] is around one in a million. That’s a small number, telling us that very large elections aren’t likely to be decided by a one-vote margin anyway. But it’s gargantuan compared to the number if c expects only 5% of approvals. Then ℙ[V(B, c)] is around one in 10^70. That’s about one in a quadrillion-vigintillion, if you want to know, and near the scale of possibly picking one atom at random from the entire universe! The probability of casting a pivotal vote drops off exponentially, and by this point it’s effectively zero.
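Here is a toy Monte Carlo illustration of that falloff (my own sketch, not part of the original analysis). It uses an electorate of only 10,000 voters and two hypothetical frontrunners, because probabilities on the order of 10^-70 cannot be sampled directly; a printed zero just means the probability is below what this many trials can resolve.

``` python
import numpy as np

rng = np.random.default_rng(0)


def pivot_probability(n_voters, p_c, p_others=(0.45, 0.40), trials=200_000):
    """Estimate the chance that one extra approval for c changes the winner."""
    others = np.column_stack([rng.binomial(n_voters, p, trials) for p in p_others])
    c = rng.binomial(n_voters, p_c, trials)
    leader = others.max(axis=1)
    # Roughly: the extra approval matters only when c is exactly tied with the
    # current leader (tiebreak details are glossed over).
    return (c == leader).mean()


for p_c in (0.45, 0.40, 0.30):
    print(f"approval rate {p_c:.0%}: pivot probability ~ {pivot_probability(10_000, p_c):.1e}")
```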

With that in mind, we can drop the condition on the probability in the second term, giving us a new rule: Approve of a candidate c any time that:

U(c) > 𝔼[U(W(B))]

That is, approve of any candidate whose win you would like better than you expect to like the outcome of the election. In other words, imagine you have no other information on election night, and hear that this candidate has won. If this would be good news, approve of the candidate on your ballot. If it would be bad news, don’t.

  • This rule is still tactical. To determine how much you expect to like the outcome of the election, you need to have beliefs about who else is likely to win, which still requires an understanding of polling and public opinion and mood.
  • However, there is one threshold, derived from real polling data in realistic scenarios, and you can cast your approval ballot monotonically based on that single threshold.

This is no longer a true optimal strategy, but with enough voters, the exponential falloff in ℙ[V(B, c)] as c becomes less popular is a pretty good assurance that the incorrect votes you might cast by using this strategy instead of the optimal ones are extremely unlikely to matter. In practice, this is probably the best rule to communicate to voters in an approval election with moderate to large numbers of voters.

We can get closer with the following hypothetical: imagine that on election night, you have no information on the results except for a headline that proclaims: Election Too Close To Call. With that as your prior, you ask of each candidate: is it good or bad news to hear now that this candidate has won? If it would be good news, then you approve of them. This still leaves one threshold, but we’re no longer making the leap that the pivotal condition for front-runners is unnecessary; we’re imagining a world in which at least some candidates, almost surely the front-runners, are tied. If this changes your decision (which it likely would only in very marginal cases), you can use this more accurate approximation.
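As a concrete sketch of the large-electorate shortcut (my own illustration, with made-up numbers): compute your expected utility of the election from your subjective win probabilities, then approve every candidate whose victory would beat it.

``` python
def approval_ballot(win_prob: dict[str, float], utility: dict[str, float]) -> set[str]:
    """Approve each candidate whose win beats the expected outcome.

    win_prob: your subjective probability of each candidate winning (from
              polling and your reading of the public mood); sums to roughly 1.
    utility:  how much you would like each candidate to win.
    """
    expected_outcome = sum(win_prob[c] * utility[c] for c in win_prob)
    return {c for c in utility if utility[c] > expected_outcome}


# Made-up example: two frontrunners (A, B) and a long shot (C).
win_prob = {"A": 0.55, "B": 0.44, "C": 0.01}
utility = {"A": 0.9, "B": 0.2, "C": 1.0}
print(sorted(approval_ballot(win_prob, utility)))  # ['A', 'C']
```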

Reducing range to approval voting

I promised to look at strategy for range voting, as well. Armed with an appreciation of approval strategy, it’s easy to extend this to an optimal range strategy, as well, for large-scale elections.

The key is to recognize that a range voting election with options 0, 1, 2, …, n is mathematically equivalent to an approval election where everyone is just allowed to vote n times. The number you mark on the range ballot can be interpreted as saying how many of your approval ballots you want to mark as approving that candidate.

Looking at it this way presents the obvious question: why would you vote differently on some ballots than others? In what situation could that possibly be the right choice?

  • For small elections, say if you’re voting on places to go out and eat with your friends or coworkers, it’s possible that adding in a handful of approvals materially changes the election so that the optimal vote is different. Then it may well be optimal to cast a range ballot using some intermediate number.
  • For large elections, though, you’re presented with pretty much exactly the same question each time, and you may as well give the same answer. Therefore, in large-scale elections, the optimal way to vote with a range ballot is always to rate everyone either the minimum or maximum possible score. This reduces a range election exactly to an approval election. The additional expressiveness of a range ballot is a siren call: by using it, you always vote less effectively than you would have by ignoring it and using only the two extreme choices.

Since we’re discussing political elections, which have relatively large numbers of voters, this answers the question for range elections, as well: Rate a candidate the maximum score if you like them better than you expect to like the outcome of the election. Otherwise, rate them the minimum score.
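To make the reduction concrete, here is a trivial follow-on to the earlier sketch: in a large election, the effective range ballot is just the approval ballot pushed to the extremes.

``` python
def range_ballot(approved: set[str], candidates: list[str], max_score: int = 10) -> dict[str, int]:
    """Maximum score for approved candidates, minimum for everyone else."""
    return {c: (max_score if c in approved else 0) for c in candidates}


# Reusing the approval decision from the previous sketch:
print(range_ballot({"A", "C"}, ["A", "B", "C"]))  # {'A': 10, 'B': 0, 'C': 10}
```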

Summing it up

What we’ve learned, then, is that optimal voting in approval or range systems boils down to two nested rules.

  • Exact rule (for the mathematically fearless): approve c iff U(c) > 𝔼[ U(W(B)) | your extra vote for c is pivotal ]. This Bayesian test weighs each candidate against the expected utility in the razor-thin worlds where they tie for first.
  • Large-electorate shortcut (for everyone else): because those pivotal worlds become astronomically rare as the field grows, the condition shrinks to a single cutoff: approve (or give a maximum score) to every candidate whose victory you expect to enjoy more than you expected to like the result. (If you can, imagine only cases where you know the election is close.)

We’ve seen why the first rule is the gold standard; but the second captures virtually all of its benefit when millions are voting. Either way, strategy is inseparable from sincerity: you must translate beliefs about polling into a utility threshold, and then measure every candidate against it. We’ve also seen by a clear mathematical equivalence why range ballots add no real leverage in large-scale elections, instead only offering false choices that are always wrong.

The entire playbook fits on a sticky note: compute the threshold, vote all-or-nothing, and let the math do the rest.

by Chris Smith at May 28, 2025 08:25 PM

Philip Wadler

The Provocateurs: Brave New Bullshit

Following two sell-out shows at the Fringe last year, I'm on at the Fringe again:

11.25 Monday 4 August, Stand 2 w/Lucy Remnant and Susan Morrison
17.40 Sunday 17 August, Stand 4 w/Smita Kheria and Sarah-Jane Judge
17.40 Tuesday 19 August, Stand 4 w/Cameron Wyatt and Susan Morrison

Shows are under the banner of The Provocateurs (formerly Cabaret of Dangerous Ideas). Tickets go on sale Wednesday 7 May, around noon. The official blurb is brief:

Professor Philip Wadler (The University of Edinburgh) separates the hopes and threats of AI from the chatbot bullshit.

by Philip Wadler (noreply@blogger.com) at May 28, 2025 11:16 AM

May 25, 2025

Mark Jason Dominus

Mystery of the quincunx's missing quincunx

A quincunx is the X-shaped pattern of pips on the #5 face of a die.

A square with five dots arranged in an X

It's so-called because the Romans had a common copper coin called an as, and it was divided (monetarily, not physically) into twelve uncia. There was a bronze coin worth five uncia called a quīncunx, which is a contraction of quīnque (“five”) + uncia, and the coin had that pattern of dots on it to indicate its value.

Uncia generally meant a twelfth of something. It was not just a twelfth of an as, but also a twelfth of a pound, which is where we get the word “ounce”, and a twelfth of a foot, which is where we get the word “inch”.

The story I always heard about the connection between the coin and the X-shaped pattern of dots was the one that is told by Wikipedia:

Its value was sometimes represented by a pattern of five dots arranged at the corners and the center of a square, like the pips of a die. So, this pattern also came to be called quincunx.

Or the Big Dictionary:

… [from a] coin of this value (occasionally marked with a pattern resembling the five spots on a dice cube),…

But today I did a Google image search for quincunxes. And while most had five dots, I found not even one that had the dots arranged in an X pattern.

Pictures of the two sides of an ancient, corroded, worn, weathered coin. Each one has a four-armed cross whose arms have crossbars at the ends, and the one on the right also has five dots. The dots are in a cluster in the space between the cross's lower and left arms, and are arranged in a row of three and, closer to the center, a row of two.

Another cruddy coin. The obverse shows the head of a person, probably Minerva, wearing a plumed helmet. Above the head is a row of five dots.

This coin is covered with green oxide.  The obverse is another helmeted Minerva, surmounted by a horizontal row of five dots.  The reverse has a picture of an owl, and, on the right, a column of five dots.

(I believe the heads here are Minerva, goddess of wisdom. The owl is also associated with Minerva.)

Where's the quincunx that actually has a quincuncial arrangement of dots? Nowhere to be found, it seems. But everyone says it, so it must be true.

Addenda

  • The first common use of “quincunx” as an English word was to refer to trees that were planted in a quincuncial pattern, although not necessarily in groups of exactly five, in which each square of four trees had a fifth at its center.

  • Similarly, the Galton Box has a quincuncial arrangement of little pegs. Galton himself called it a “quincunx”.

  • The OED also offers this fascinating aside:

    Latin quincunx occurs earlier in an English context. Compare the following use apparently with reference to a v-shaped figure:

    1545 Decusis, tenne hole partes or ten Asses...It is also a fourme in any thynge representyng the letter, X, whiche parted in the middel, maketh an other figure called Quincunx, V.

    which shows that for someone, a quincuncial shape was a V and not an X, presumably because V is the Roman numeral for five.

    A decussis was a coin worth not ten uncia but ten asses, and it did indeed have an X on the front. A five-as coin was a quincussis and it had a V. I wonder if the author was confused?

    The source is Bibliotheca Eliotæ. The OED does not provide a page number.

  • It wasn't until after I published this that I realized that today's date was the extremely quincuncial 2025-05-25. I thank the gods of chance and fortune for this little gift.

by Mark Dominus (mjd@plover.com) at May 25, 2025 11:00 PM

May 24, 2025

Mark Jason Dominus

The fivefold symmetry of the quince

The quince is so-named because, like other fruits in the apple family, it has a natural fivefold symmetry:

several greenish-yellow quinces. They are like shiny pears, but less elongated.  In the foreground, one is cut in half, to reveal five wedge-shaped hollows arranged symmetrically to form a circle, each filled with shiny brown seeds.

This is because their fruits develop from five-petaled flowers, and the symmetry persists through development. These are pear blossoms:

A small branch from a pear tree, with green leaves and white pear blossoms.  The blossoms have five petals each, against which a cluster of dark-tipped stamens contrasts.

You can see this in most apples if you cut them into equatorial slices:

Apple slices on a cutting board, each with a hole in the middle from the seed capsule in the center of the core, in the shape of a five-pointed star.

The fivefold symmetry isn't usually apparent from the outside once the structure leaves the flowering stage. But perfect Red Delicious specimens do have five little feet:

A dozen Red Delicious apples, bottoms up to show that each does have five little bumps arranged around the blossom end.

P.S.: I was just kidding about the name of the quince, which actually has nothing to do with any of this. It is a coincidence.

by Mark Dominus (mjd@plover.com) at May 24, 2025 03:29 AM

May 22, 2025

Simon Marlow

Indexing Hackage: Glean vs. hiedb

I thought it might be fun to try to use Glean to index as much of Hackage as I could, and then do some rough comparisons against hiedb and also play around to see what interesting queries we could run against a database of all the code in Hackage.

This project was mostly just for fun: Glean is not going to replace hiedb any time soon, for reasons that will become clear. Neither are we ready (yet) to build an HLS plugin that can use Glean, but hopefully this at least demonstrates that such a thing should be possible, and Glean might offer some advantages over hiedb in performance and flexibility.

A bit of background:

  • Glean is a code-indexing system that we developed at Meta. It’s used internally at Meta for a wide range of use cases, including code browsing, documentation generation and code analysis. You can read about the ways in which Glean is used at Meta in Indexing Code At Scale with Glean.

  • hiedb is a code-indexing system for Haskell. It takes the .hie files that GHC produces when given the option -fwrite-ide-info and writes the information to a SQLite database in various tables. The idea is that putting the information in a DB allows certain operations that an IDE needs to do, such as go-to-definition, to be fast.

You can think of Glean as a general-purpose system that does the same job as hiedb, but for multiple languages and with a more flexible data model. The open-source version of Glean comes with indexers for ten languages or so, and moreover Glean supports SCIP which has indexers for various languages available from SourceGraph.

Since a hiedb is just a SQLite DB with a few tables, if you want you can query it directly using SQL. However, most users will access the data through either the command-line hiedb tool or through the API, which provide the higher-level operations such as go-to-definition and find-references. Glean has a similar setup: you can make raw queries using Glean’s query language (Angle) using the Glean shell or the command-line tool, while the higher-level operations that know about symbols and references are provided by a separate system called Glass which also has a command-line tool and API. In Glean the raw data is language-specific, while the Glass interface provides a language-agnostic view of the data in a way that’s useful for tools that need to navigate or search code.

An ulterior motive

In part all of this was an excuse to rewrite Glean’s Haskell indexer. We built a Haskell indexer a while ago but it’s pretty limited in what information it stores, only capturing enough information to do go-to-definition and find-references and only for a subset of identifiers. Furthermore the old indexer works by first producing a hiedb and consuming that, which is both unnecessary and limits the information we can collect. By processing the .hie files directly we have access to richer information, and we don’t have the intermediate step of creating the hiedb which can be slow.

The rest of this post

The rest of the post is organised as follows; feel free to jump around:

  • Performance: a few results comparing hiedb with Glean on an index of all of Hackage

  • Queries: A couple of examples of queries we can do with a Glean index of Hackage: searching by name, and finding dead code.

  • Apparatus: more details on how I set everything up and how it all works.

  • What’s next: some thoughts on what we still need to add to the indexer.

Performance

All of this was performed on a build of 2900+ packages from Hackage; for more details, see Building all of Hackage below.

Indexing performance

I used this hiedb command:

hiedb index -D /tmp/hiedb . --skip-types

I’m using --skip-types because at the time of writing I haven’t implemented type indexing in Glean’s Haskell indexer, so this should hopefully give a more realistic comparison.

This was the Glean command:

glean --service localhost:1234 \
  index haskell-hie --db stackage/0 \
  --hie-indexer $(cabal list-bin hie-indexer) \
  ~/code/stackage/dist-newstyle/build/x86_64-linux/ghc-9.4.7 \
  --src '$PACKAGE'

Time to index:

  • hiedb: 1021s
  • Glean: 470s

I should note that in the case of Glean the only parallelism is between the indexer and the server that is writing to the DB. We didn’t try to index multiple .hie files in parallel, although that would be fairly trivial to do. I suspect hiedb is also single-threaded just going by the CPU load during indexing.

Size of the resulting DB

  • hiedb: 5.2GB
  • Glean: 0.8GB

It’s quite possible that hiedb is simply storing more information, but Glean does have a rather efficient storage system based on RocksDB.

Performance of find-references

Let’s look up all the references of Data.Aeson.encode:

hiedb -D /tmp/hiedb name-refs encode Data.Aeson

This is the query using Glass:

cabal run glass-democlient -- --service localhost:12345 \
  references stackage/hs/aeson/Data/Aeson/var/encode

This is the raw query using Glean:

glean --service localhost:1234 --db stackage/0 \
  '{ Refs.file, Refs.uses[..] } where Refs : hs.NameRefs; Refs.target.occ.name = "encode"; Refs.target.mod.name = "Data.Aeson"'

Times:

  • hiedb: 2.3s
  • glean (via Glass): 0.39s
  • glean (raw query): 0.03s

(side note: hiedb found 416 references while Glean found 415. I haven’t yet checked where this discrepancy comes from.)

But these results don’t really tell the whole story.

In the case of hiedb, name-refs does a full table scan so it’s going to take time proportional to the number of refs in the DB. Glean meanwhile has indexed the references by name, so it can serve this query very efficiently. The actual query takes a few milliseconds, the main overhead is encoding and decoding the results.

The reason the Glass query takes longer than the raw Glean query is because Glass also fetches additional information about each reference, so it performs a lot more queries.

We can also do the raw hiedb query using the sqlite shell:

sqlite> select count(*) from refs where occ = "v:encode" AND mod = "Data.Aeson";
417
Run Time: real 2.038 user 1.213905 sys 0.823001

Of course hiedb could index the refs table to make this query much faster, but it’s interesting to note that Glean has already done that and it was still quicker to index and produced a smaller DB.

Performance of find-definition

Let’s find the definition of Data.Aeson.encode, first with hiedb:

$ hiedb -D /tmp/hiedb name-def encode Data.Aeson
Data.Aeson:181:1-181:7

Now with Glass:

$ cabal run glass-democlient -- --service localhost:12345 \
  describe stackage/hs/aeson/Data/Aeson/var/encode
stackage@aeson-2.1.2.1/src/Data/Aeson.hs:181:1-181:47

(worth noting that hiedb is giving the span of the identifier only, while Glass is giving the span of the whole definition. This is just a different choice; the .hie file contains both.)

And the raw query using Glean:

$ glean --service localhost:1234 query --db stackage/0 --recursive \
  '{ Loc.file, Loc.span } where Loc : hs.DeclarationLocation; N : hs.Name; N.occ.name = "encode"; N.mod.name = "Data.Aeson"; Loc.name = N' | jq
{
  "id": 18328391,
  "key": {
    "tuplefield0": {
      "id": 9781189,
      "key": "aeson-2.1.2.1/src/Data/Aeson.hs"
    },
    "tuplefield1": {
      "start": 4136,
      "length": 46
    }
  }
}

Times:

  • hiedb: 0.18s
  • Glean (via Glass): 0.05s
  • Glean (raw query): 0.01s

In fact there’s a bit of overhead when using the Glean CLI; we can get a better picture of the real query time using the shell:

stackage> { Loc.file, Loc.span } where Loc : hs.DeclarationLocation; N : hs.Name; N.occ.name = "encode"; N.mod.name = "Data.Aeson"; Loc.name = N
{
  "id": 18328391,
  "key": {
    "tuplefield0": { "id": 9781189, "key": "aeson-2.1.2.1/src/Data/Aeson.hs" },
    "tuplefield1": { "start": 4136, "length": 46 }
  }
}

1 results, 2 facts, 0.89ms, 696176 bytes, 2435 compiled bytes

The query itself takes less than 1ms.

Again, the issue with hiedb is that its data is not indexed in a way that makes this query efficient: the defs table is indexed by the pair (hieFile,occ) not occ alone. Interestingly, when the module is known it ought to be possible to do a more efficient query with hiedb by first looking up the hieFile and then using that to query defs.

What other queries can we do with Glean?

I’ll look at a couple of examples here, but really the possibilities are endless. We can collect whatever data we like from the .hie file, and design the schema around whatever efficient queries we want to support.

Search by case-insensitive prefix

Let’s search for all identifiers that start with the case-insensitive prefix "withasync":

$ glass-democlient --service localhost:12345 \
  search stackage/withasync -i | wc -l
55

In less than 0.1 seconds we find 55 such identifiers in Hackage. (The output isn’t very readable so I didn’t include it here, but for example this finds results not just in async but in a bunch of packages that wrap async too.)

Case-insensitive prefix search is supported by an index that Glean produces when the DB is created. It works in the same way as efficient find-references; more details on that below.

Why only prefix and not suffix or infix? What about fuzzy search? We could certainly provide a suffix search too; infix gets more tricky and it’s not clear that Glean is the best tool to use for infix or fuzzy text search: there are better data representations for that kind of thing. Still, case-insensitive prefix search is a useful thing to have.

Could we support Hoogle using Glean? Absolutely. That said, Hoogle doesn’t seem too slow. Also we need to index types in Glean before it could be used for type search.

Identify dead code

Dead code is, by definition, code that isn’t used anywhere. We have a handy way to find that: any identifier with no references isn’t used. But it’s not quite that simple: we want to ignore references in imports and exports, and from the type signature.

Admittedly finding unreferenced code within Hackage isn’t all that useful, because the libraries in Hackage are consumed by end-user code that we haven’t indexed so we can’t see all the references. But you could index your own project using Glean and use it to find dead code. In fact, I did that for Glean itself and identified one entire module that was dead, amongst a handful of other dead things.

Here’s a query to find dead code:

N where
  N = hs.Name _;
  N.sort.external?;
  hs.ModuleSource { mod = N.mod, file = F };
  !(
    hs.NameRefs { target = N, file = RefFile, uses = R };
    RefFile != F;
    coderef = (R[..]).kind
  )

Without going into all the details, here’s roughly how it works:

  • N = hs.Name _; declares N to be a fact of hs.Name
  • N.sort.external?; requires N to be external (i.e. exported), as opposed to a local variable
  • hs.ModuleSource { mod = N.mod, file = F }; finds the file F corresponding to this name’s module
  • The last part is checking to see that there are no references to this name that are (a) in a different file and (b) are in code, i.e. not import/export references. Restricting to other files isn’t exactly what we want, but it’s enough to exclude references from the type signature. Ideally we would be able to identify those more precisely (that’s on the TODO list).

You can try this on Hackage and it will find a lot of stuff. It might be useful to focus on particular modules to find things that aren’t used anywhere, for example I was interested in which identifiers in Control.Concurrent.Async aren’t used:

N where
  N = hs.Name _;
  N.mod.name = "Control.Concurrent.Async";
  N.mod.unit = "async-2.2.4-inplace";
  N.sort.external?;
  hs.ModuleSource { mod = N.mod, file = F };
  !(
    hs.NameRefs { target = N, file = RefFile, uses = R };
    RefFile != F;
    coderef = (R[..]).kind
  )

This finds 21 identifiers, which I can use to decide what to deprecate!

Apparatus

Building all of Hackage

The goal was to build as much of Hackage as possible and then to index it using both hiedb and Glean, and see how they differ.

To avoid problems with dependency resolution, I used a Stackage LTS snapshot of package versions. Using LTS-21.21 and GHC 9.4.7, I was able to build 2922 packages. About 50 failed for some reason or other.

I used this cabal.project file:

packages: */*.cabal
import: https://www.stackage.org/lts-21.21/cabal.config

package *
    ghc-options: -fwrite-ide-info

tests: False
benchmarks: False

allow-newer: *

And did a large cabal get to fetch all the packages in LTS-21.21.

Then

cabal build all --keep-going

After a few retries to install any required RPMs to get the dependency resolution phase to pass, and to delete a few packages that weren’t going to configure successfully, I went away for a few hours to let the build complete.

It’s entirely possible there’s a better way to do this that I don’t know about - please let me know!

Building Glean

The Haskell indexer I’m using is in this pull request which at the time of writing isn’t merged yet. (Since I’ve left Meta I’m just a regular open-source contributor and have to wait for my PRs to be merged just like everyone else!).

Admittedly Glean is not the easiest thing in the world to build, mainly because it has a couple of troublesome dependencies: folly (Meta’s library of highly-optimised C++ utilities) and RocksDB. Glean depends on very up-to-date versions of these libraries, so we can’t use any distro-packaged versions.

Full instructions for building Glean are here but roughly it goes like this on Linux:

  • Install a bunch of dependencies with apt or yum
  • Build the C++ dependencies with ./install-deps.sh and set some env vars
  • make

The Makefile is needed because there are some codegen steps that would be awkward to incorporate into the Cabal setup. After the first make you can usually just switch to cabal for rebuilding stuff unless you change something (e.g. a schema) that requires re-running the codegen.

Running Glean

I’ve done everything here with a running Glean server, which was started like this:

cabal run exe:glean-server -- \
  --db-root /tmp/db \
  --port 1234 \
  --schema glean/schema/source

While it’s possible to run Glean queries directly on the DB without a server, running a server is the normal way because it avoids the latency from opening the DB each time, and it keeps an in-memory cache which significantly speeds up repeated queries.

The examples that use Glass were done using a running Glass server, started like this:

cabal run glass-server -- --service localhost:1234 --port 12345

How does it work?

The interesting part of the Haskell indexer is the schema in hs.angle. Every language that Glean indexes needs a schema, which describes the data that the indexer will store in the DB. Unlike an SQL schema, a Glean schema looks more like a set of datatype declarations, and it really does correspond to a set of (code-generated) types that you can work with when programmatically writing data, making queries, or inspecting results. For more about Glean schemas, see the documentation.

Being able to design your own schema means that you can design something that is a close match for the requirements of the language you’re indexing. In our Glean schema for Haskell, we use a Name, OccName, and Module structure that’s similar to the one GHC uses internally and is stored in the .hie files.

The indexer itself just reads the .hie files and produces Glean data using datatypes that are generated from the schema. For example, here’s a fragment of the indexer that produces Module facts, which contain a ModuleName and a UnitName:

mkModule :: Glean.NewFact m => GHC.Module -> m Hs.Module
mkModule mod = do
  modname <- Glean.makeFact @Hs.ModuleName $
    fsToText (GHC.moduleNameFS (GHC.moduleName mod))
  unitname <- Glean.makeFact @Hs.UnitName $
    fsToText (unitFS (GHC.moduleUnit mod))
  Glean.makeFact @Hs.Module $
    Hs.Module_key modname unitname

Also interesting is how we support fast find-references. This is done using a stored derived predicate in the schema:

predicate NameRefs:
  {
    target: Name,
    file: src.File,
    uses: [src.ByteSpan]
  } stored {Name, File, Uses} where
  FileXRefs {file = File, refs = Refs};
  {name = Name, spans = Uses} = Refs[..];

Here NameRefs is a predicate—which you can think of as a datatype, or a table in SQL—defined in terms of another predicate, FileXRefs. The facts of the predicate NameRefs (rows of the table) are derived automatically using this definition when the DB is created. If you’re familiar with SQL, a stored derived predicate in Glean is rather like a materialized view in SQL.
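
In plain Haskell terms, the derivation is just a regrouping of the per-file cross-references by name. Here is a rough sketch of the same reshaping over ordinary data structures (the type names are made up for illustration; they are not the generated Glean types):

import qualified Data.Map as M

type File = FilePath
type Name = String
type Span = (Int, Int)

-- Per-file xrefs: for each file, which names are referenced at which spans.
type FileXRefs = M.Map File (M.Map Name [Span])

-- The "stored derived" view: for each name, the files and spans that use it.
nameRefs :: FileXRefs -> M.Map Name (M.Map File [Span])
nameRefs xrefs = M.fromListWith (M.unionWith (++))
  [ (name, M.singleton file spans)
  | (file, refs) <- M.toList xrefs
  , (name, spans) <- M.toList refs
  ]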

What’s next?

As I mentioned earlier, the indexer doesn’t yet index types, so that would be an obvious next step. There are a handful of weird corner cases that aren’t handled correctly, particularly around record selectors, and it would be good to iron those out.

Longer term ideally the Glean data would be rich enough to produce the Haddock docs. In fact Meta’s internal code browser does produce documentation on the fly from Glean data for some languages - Hack and C++ in particular. Doing it for Haskell is a bit tricky because while I believe the .hie file does contain enough information to do this, it’s not easy to reconstruct the full ASTs for declarations. Doing it by running the compiler—perhaps using the Haddock API—would be an option, but that involves a deeper integration with Cabal so it’s somewhat more awkward to go that route.

Could HLS use Glean? Perhaps it would be useful to have a full Hackage index to be able to go-to-definition from library references? As a plugin this might make sense, but there are a lot of things to fix and polish before it’s really practical.

Longer term, should we be thinking about replacing hiedb with Glean? Again, we’re some way off from that. The issue of incremental updates is an interesting one: Glean does support incremental indexing, but so far it’s been aimed at speeding up whole-repository indexing rather than supporting IDE features.

May 22, 2025 12:00 AM

May 08, 2025

Mark Jason Dominus

A descriptive theory of seasons in the Mid-Atlantic

[ I started thinking about this about twenty years ago, and then writing it down in 2019, but it seems to be obsolete. I am publishing it anyway. ]

The canonical division of the year into seasons in the northern temperate zone goes something like this:

  • Spring: March 21 – June 21
  • Summer: June 21 – September 21
  • Autumn: September 21 – December 21
  • Winter: December 21 – March 21

Living in the mid-Atlantic region of the northeast U.S., I have never been happy with this. It is just not a good description of the climate.

I begin by observing that the year is not equally partitioned between the four seasons. The summer and winter are longer, and spring and autumn are brief and happy interludes in between.

I have no problem with spring beginning in the middle of March. I think that is just right. March famously comes in like a lion and goes out like a lamb. The beginning of March is crappy, like February, and frequently has snowstorms and freezes. By the end of March, spring is usually skipping along, with singing birds and not just the early flowers (snowdrops, crocuses, daffodils) but many of the later ones also.

By the middle of May the spring flowers are over and the weather is getting warm, often uncomfortably so. Summer continues through the beginning of September, which is still good for swimming and lightweight clothes. In late September it finally gives way to autumn.

Autumn is jacket weather but not overcoat weather. Its last gasp is in the middle of November. By this time all the leaves have changed, and the ones that are going to fall off the trees have done so. The cool autumn mist has become a chilly winter mist. The cold winter rains begin at the end of November.

So my first cut would look something like this:

  • Spring: mid-March – mid-May
  • Summer: mid-May – late September
  • Autumn: late September – late November
  • Winter: late November – mid-March

Note that this puts Thanksgiving where it belongs at the boundary between autumn (harvest season) and winter (did we harvest enough to survive?). Also, it puts the winter solstice (December 21) about one quarter of the way through the winter. This is correct. By the solstice the days have gotten short, and after that the cold starts to kick in. (“As the days begin to lengthen, the cold begins to strengthen”.) The conventional division takes the solstice as the beginning of winter, which I just find perplexing. December 1 is not the very coldest part of winter, but it certainly isn't autumn.

There is something to be said for it though. I think I can distinguish several subseasons — ten in fact:

Dominus Seasonal Calendar (January through December):

  • Winter: Midwinter, Late winter
  • Spring: Early spring, Late spring
  • Summer: Early summer, Midsummer, Late summer
  • Autumn: Early autumn, Late autumn
  • Winter: Early winter, Midwinter

Midwinter, beginning around the solstice, is when the really crappy weather arrives, day after day of bitter cold. In contrast, early and late winter are typically much milder. By late February the snow is usually starting to melt. (March, of course, is always unpredictable, and usually has one nasty practical joke hiding up its sleeve. Often, March is pleasant and springy in the second week, and then mocks you by turning back into January for the third week. This takes people by surprise almost every year and I wonder why they never seem to catch on.)

Similarly, the really hot weather is mostly confined to midsummer. Early and late summer may be warm but you do not get blazing sun and you have to fry your eggs indoors, not on the pavement.

Why the seasons seem to turn in the middle of each month, and not at the beginning, I can't say. Someone messed up, but who? Probably the Romans. I hear that the Persians and the Baha’i start their year on the vernal equinox. Smart!

Weather in other places is very different, even in the temperate zones. For example, in southern California they don't have any of the traditional seasons. They have a period of cooler damp weather in the winter months, and then instead of summer they have a period of gloomy haze from June through August.

However

I may have waited too long to publish this article, as climate change seems to have rendered it obsolete. In recent years, we have barely had midwinter, and instead of the usual two to three annual snows we have zero. Midsummer has grown from two to four months, and summer now lasts into October.

by Mark Dominus (mjd@plover.com) at May 08, 2025 10:39 PM

May 05, 2025

Matthew Sackman

Payslips and tax: calculating your own

In the UK, it’s very common that your employer pays you once a month. When this happens, they give you a document called a payslip, that has some numbers on it, such as how much your salary is, how much they paid you this month, how much went to HMRC in tax, how much went to your pension, and a few other numbers. But they never show any workings, so you really have no way to check whether any of these numbers are correct. There are plenty of online take-home-pay calculators, but these all focus on the full year; they have no facility to calculate your next payslip.

About half way through April 2024, I stopped working for one company. Everything was wrapped up – I received my final payslip from them, along with my P45. I then had a few months off, and started a new job in July 2024. When you start a new job it always takes a while for money things to get sorted out, for example pension enrolment and sorting out pension contributions, so it’s really worthwhile to keep a close eye on your payslips particularly for these first few months. Mine were arriving and some numbers looked right, but other numbers, such as the amount of tax I was paying, were changing dramatically, month to month. I had no idea why; whether they should be changing like that; whether they were going to keep changing or would eventually settle down. I had no way to check any of these numbers. Was I going to get in trouble with HMRC and get investigated?

I was also a little on edge because this was the first job where my pension contributions were using a thing called Qualifying Earnings. In all my previous jobs, if I chose for 10% of my salary to go into my pension, then that’s what would happen. But now there was this thing called Qualifying Earnings, which is (numbers correct at time of writing) a band from £6240 to £50,270. If you’re earning, say £30k, then your x% contribution is actually x% of £30,000-£6240. If you’re earning above £50,270, then any further increase to your salary will not result in any extra contributions to your pension because you’re above the band. The 2008 Pensions Act, which created the legal requirement for all employees to have workplace pensions and for automatic enrolment (with a minimum 8% combined contribution from the employer and employee), also created this concept of Qualifying Earnings. I consider this a pretty scummy way of reducing employer pension contributions for large firms. It complicates the maths and no doubt adds confusion for people trying to check their own payslips. Given that 74% of the population have pensions that are too small to retire on, this whole concept of Qualifying Earnings seems amoral at best.
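
To make the band arithmetic concrete, here is a back-of-the-envelope sketch in Haskell of just the clamping described above, using the band figures quoted in the previous paragraph (this is not part of the wellquite.org/tax library):

-- Pensionable pay under Qualifying Earnings: annual salary clamped to the
-- £6,240 – £50,270 band quoted above.
qualifyingEarnings :: Double -> Double
qualifyingEarnings salary = max 0 (min salary 50270 - 6240)

-- e.g. qualifyingEarnings 30000 == 23760, so a 10% contribution is £2,376 a year,
-- and qualifyingEarnings 60000 == qualifyingEarnings 50270 == 44030.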

These days, a lot of smaller companies outsource their payroll processing. In my case, I was officially working for an international Employer of Record and they were then outsourcing payroll processing to local firms with country-specific expertise. So when I started asking questions, there was no ability to go and sit with someone and work through it. Or have a call. It was all messages passed across multiple different systems, and partial answers at best would come back several days later. Even if your payroll is done in-house, I strongly suspect that a lot of the time, some software package will be being used that does all the calculations and quite likely no one will actually understand or be able to explain the maths that’s going on.

After a while of getting nowhere, and after uncovering some substantial mistakes that had been made that affected me, I decided to spend some weekends actually figuring out how PAYE works, and writing some code that can calculate my next payslip. This library is available for anyone to use. There’s a README that hopefully explains the basic principles of how the calculations are done. This only works if your tax-code ends in an L, and it only works if you’re in National Insurance category A. All the code can do is use some details you provide to predict your next payslips. Also, I’m not a trained accountant or financial adviser, and even for my own payslips, every month, the numbers don’t quite match up (but they’re within £1). So please treat this as a toy, rather than the basis for building a payroll processor!

Getting started

The library is written in Go so you’ll need Go installed. Then, in a terminal do:

$ mkdir payslips
$ cd payslips
$ go mod init mypayslips
$ go get wellquite.org/tax@latest

Now we need to write a tiny amount of code. In your new payslips directory, create a main.go file, and open it in your editor. You want something like this:

package main

import (
   "fmt"
   "wellquite.org/tax"
)

func main() {
   payslips := tax.Payslips{
      {
         Year:                            2024,
         TaxCode:                         "1257L",
         Salary:                          tax.Yearly(50000),
         PensionType:                     tax.Salary,
         EmployeePensionContributionRate: 0.05,
         EmployerPensionContributionRate: 0.03,
      },
      {
         Salary:                          tax.Yearly(50000),
         PensionType:                     tax.Salary,
         EmployeePensionContributionRate: 0.05,
         EmployerPensionContributionRate: 0.03,
      },
      {},
      {
         Salary:                          tax.Yearly(60000),
         PensionType:                     tax.QualifyingEarnings,
         EmployeePensionContributionRate: 0.05,
         EmployerPensionContributionRate: 0.03,
      },
      {
         Salary:                          tax.Yearly(60000),
         PensionType:                     tax.QualifyingEarnings,
         EmployeePensionContributionRate: 0.15,
         EmployerPensionContributionRate: 0.03,
      },
      {
         Salary:                          tax.Yearly(60000),
         PensionType:                     tax.QualifyingEarnings,
         Expenses:                        116.08,
         EmployeePensionContributionRate: 0.15,
         EmployerPensionContributionRate: 0.03,
      },
   }

   payslips.Complete()
   fmt.Println(payslips)
}

We create a list of Payslips. The first payslip must specify a year, and your tax-code. These details are automatically applied to the payslips that follow, if not explicitly provided. Many of the calculations rely on year-to-date totals, and so we must have a complete record of your payslips from the start of the tax year. So that means the first payslip is month 1 (in this example, April 2024), then month 2 (May 2024) and so on. If you have no income for a month then you can just put in an empty payslip ({}). The above example describes being paid in April and May 2024, then nothing in June, and then being paid (with a higher salary) in July, August and September.

Save this main.go file. Then, back in your terminal, in your payslips directory, just do:

go run main.go

You should get some output showing all sorts of calculations, including income tax, and personal allowance. With a little luck, if you change the numbers to match your own salary and other details, the numbers produced should match quite closely your own payslips, provided nothing you’re doing is too exotic.

There is documentation for all the different fields that you can provide in each payslip. In general, the code will try to fill in missing values. It should be able to cope with things like salary-sacrifice, or, if you change job within a month and have several payslips for the same month, this should work too. Everything is run locally on your computer: please feel free to check the source – there are no 3rd party libraries at all, and nothing imports the net package. It’ll work just the same if you yank out your network cable or disable your WiFi.

Note however, this code is lightly tested. Whilst it works for me (and one or two friends), I make no claims that it correctly models the entirety of PAYE, so it may very well not work for you. Feedback, contributions, corrections, and patches are all very welcome!

May 05, 2025 02:30 PM

Sandy Maguire

Analyzing API Design via Algebraic Laws

The other day, someone asked:

Why doesn’t [the Data.Map function] unionWith :: (a -> a -> a) -> Map k a -> Map k a -> Map k a allow for different value types the way intersectionWith :: (a -> b -> c) -> Map k a -> Map k b -> Map k c does?

This is a very reasonable question, and it led down an interesting rabbit hole at the intersection of API design and efficient implementation.

To answer the original question, what would the type of a different value type of unionWith look like? It would be something in the flavor of:

unionWith :: (Maybe a -> Maybe b -> c) -> Map k a -> Map k b -> Map k c

But this new Maybe a -> Maybe b -> c parameter is somewhat lossy, in that it gives the impression that it could be called with Nothing Nothing as parameters, which doesn’t fit into the vibe of being a “union.”

So instead we could restrict that possibility by using These a b:

data These a b = This a | That b | These a b

unionWith :: (These a b -> c) -> Map k a -> Map k b -> Map k c

which seems reasonable enough.


But let’s take reasonableness out of the picture and start again from first principles. Instead let’s ask ourselves the deep philosophical question of what even IS a map?

A Map k v is a particularly efficient implementation of functions with type k -> Maybe v. But why is this Maybe here? It’s really only to encode the “default” value of performing a lookup. Nothing goes wrong if we generalize this to be Monoid v => k -> v. In fact, it helps us make sense of the left bias present in Data.Map, where we see:

lookup k (singleton k v1 <> singleton k v2) = Just v1

This equality is hard to justify under the normal understanding of Map k v being an encoding of a function k -> Maybe v. But under the general monoid interpretation, we get a nice semigroup homomorphism:

lookup k (m1 <> m2) = lookup k m1 <> lookup k m2

where the monoid in question has been specialized to be First.

Of course, we also have a monoid homomorphism:

lookup k mempty = mempty

Let’s re-evaluate the original question in terms of this newly-generalized Map. Now that we’ve removed all of the unnecessary baggage of Maybe, we can again think about the desired type of unionWith:

unionWith
    :: (a -> b -> c)
    -> Map k a
    -> Map k b
    -> Map k c

which looks awfully familiar. This new type signature automatically resolves our original concerns about “what should we do if the key isn’t present?”—just call the function with mempty as a parameter!

We can give some semantics as to what unionWith ought to do again by relating it to the observation lookup. The relevant law here seems like it ought to be:

lookup k (unionWith f m n) = f (lookup k m) (lookup k n)

By choosing a degenerate function f, say, \_ _ -> nontrivial, where nontrivial is some value that is not mempty, we can see the beginnings of a problem:

  lookup k (unionWith f m n)
=
  f (lookup k m) (lookup k n)
= <let f = \_ _ -> nontrivial>
  nontrivial

Regardless of the key we lookup in our unionWithed Map, we need to get back nontrivial. How can we implement such a thing? I see only two ways:

  1. explicitly associate every key in the map with nontrivial, or
  2. keep nontrivial around as a default value in the map

#1 is clearly a non-starter, given that we want our Maps to be efficient encodings of functions, which leaves us with only #2. This is actually a pretty common construction, which stems immediately from the fact that a pair of monoids is itself a monoid. The construction would look something like this:

data Map k v = Map
  { defaultValue :: v
  , implementation :: Data.Map.Map k v
  }
  deriving stock Generic
  deriving (Semigroup, Monoid) via (Generically (Map k v))

unionWith
    :: (a -> b -> c)
    -> Map k a
    -> Map k b
    -> Map k c
unionWith f (Map def1 imp1) (Map def2 imp2) =
  Map (f def1 def2) (liftA2 f imp1 imp2)
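
Under this representation, the generalized lookup from earlier just falls back to the stored default. A minimal sketch, assuming a qualified import of Data.Map:

-- Look in the underlying map, falling back to the map's default value.
lookup :: Ord k => k -> Map k v -> v
lookup k (Map def imp) = Data.Map.findWithDefault def k imp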

Seems fine, right? The nail in the coffin comes from when we reintroduce our semigroup homomorphism:

lookup k (m1 <> m2) = lookup k m1 <> lookup k m2

Without loss of generality, take m2 = pure nontrivial (where pure is just unionWith with a constant function). This gives us:

lookup k (m1 <> pure nontrivial) = lookup k m1 <> nontrivial

Making this thing efficient is a further complication! We again have two options:

  1. modify the value at every key by multiplying in nontrivial, or
  2. finding a way of suspending this computation

#1 clearly requires \(O(n)\) work, which again forces us to look at #2. But #2 seems very challenging, because the monoidal values we need to suspend need not span the entire Map. For example, consider a Map constructed a la:

((pure prefix1 <> ((pure prefix2 <> m) <> n)) <> (p <> pure suffix))

Representing this thing efficiently certainly isn’t impossible, but you’re not going to be able to do it on the balanced binary search trees that underlie the implementation of Data.Map.Map.


I find this quite an interesting result. I always assumed that Data.Map.Map (or at least, Data.Map.Monoidal.MonoidalMap) didn’t have an Applicative instance because it would require a Monoid constraint on its output—but that’s not the sort of thing we can express in Haskell.

But the analysis above says that’s not actually the reason! It’s that there can be no efficient implementation of Applicative, even if we could constrain the result.

What I find so cool about this style of analysis is that we didn’t actually write any code, nor did we peek into the implementation of Data.Map (except to know that it’s implemented as a balanced BST.) All we did was look at the obvious laws, instantiate them with degenerate inputs, and think about what would be required to efficiently get the right answer.

May 05, 2025 07:33 AM

May 04, 2025

Sandy Maguire

Using Obscure Graph Theory to solve PL Problems

Usually I write about solutions to problems I’ve worked out, but I’ve found myself increasingly becoming interested in where solutions come from. Maybe it’s because I’ve been reading Boorstin’s excellent The Discoverers, which I’d strongly recommend.

Regardless of why, I thought I’d switch up the usual dance step today, and discuss what solving my most-recent-big-problem actually looked like, in terms of what I tried, where I looked, and what the timeline was.

The Problem

The problem is to serialize a program graph into a series of let-bindings. For example, given the following graph:

      +
    /   \
  f ---> g
  |     / \
  a     \ /
      expensive

which represents the program:

f a (g expensive expensive) + g expensive expensive

Unfortunately, this is a naive representation of the program, since it duplicates the work required to compute expensive four times, and g expensive expensive twice. Instead, we would prefer to generate the equivalent-but-more-efficient program:

let $0 = expensive
    $1 = g $0 $0
 in f a $1 + $1

This transformation is affectionately known as sharing, since it shares the computed answer whenever there is repeated work to be done.

So this is what we’re trying to do. Given the original graph, determine the best place to insert these let-bindings, for some reasonable definition of “best.” We can assume there are no side effects involved, so any place that an expression is well-scoped is an acceptable solution.

In order to understand some of my attempted solutions, it’s worth noting that our final solution should build something of type Expr, and the original graph is represented as an IntMap (ExprF Int). ExprF is the Base functor of Expr, with all of its self-references replaced by some type variable, in this case Int. Thus, the graph above looks much more like:

_ : IntMap (ExprF Int)
_ = IM.fromList
  [ (0, Apply "+" [1, 3])
  , (1, Apply "f" [2, 3])
  , (2, ...)  -- a
  , (3, Apply "g" [4, 4])
  , (4, ...)  -- expensive
  ]

The Original Solution

I spent over a year trying to solve this problem, with various mostly-working solutions during that time. My strategy here was to think really hard, write up some algorithm that seemed plausible, and then run it against our (small) battery of integration tests to make sure it got the same answer as before.

Why not property test it? I tried, but found it very challenging to implement well-typed generators that would reliably introduce shared thunks. But maybe there’s a different lesson to be learned here about writing good generators.

Anyway. For eight months, one of these think-really-hard algorithms fit the bill and didn’t give us any problems. It was a weird, bespoke solution to the problem that independently kept track of all of the free variables in every graph fragment, and tried to let-bind a fragment as soon as we landed in a context where all of the free variables were in scope. It seemed to work, but it was extremely messy and unmaintainable.

At the time of writing, this sharing algorithm was the only source of let-binds in our entire language, which meant that it didn’t need to account for let-binds in the program.

Of course, that invariant eventually changed. We added a way in the source language to introduce lets, which meant my algorithm was wrong. And I had written it sufficiently long ago that I no longer remembered exactly why it worked. Which meant the theory of my program was lost, and thus that we ought to rewrite it.

Unfolding a Solution

I went back to the problem statement, and stared at it for a long time (back to the think-really-hard algorithm!) Upon staring at the problem, I realized that what I was really trying to do was determine where diamond patterns arose in the program graph.

Recall our original graph:

      +
    /   \
  f ---> g
  |     / \
  a     \ /
      expensive

If we redraw it such that g is on a different rank than f, then the two diamond patterns become much clearer:

      +
    /  \
  f     |
  | \   |
  a  \ /
      g
     / \
     \ /
   expensive

The insight I came up with is that if a node n is the source of a diamond, then we must let-bind the sink of the diamond immediately before inlining the definition of n.

This gives rise to the question of “how do we identify a diamond?” What we can do is give a mapping from each node to its reachable set of nodes. For example, in the above, we’d compute the map:

+         -> {+, f, a, g, expensive}
f         -> {f, a, g, expensive}
a         -> {a}
g         -> {g, expensive}
expensive -> {expensive}

Then when we go to inline a node, say, +, we can look for any nodes that are reachable via more than one of its immediate subterms. Since the immediate subterms of + are f and g, we can take the intersection of their reachable sets:

{f, a, g, expensive} ∩ {g, expensive}

giving us

{g, expensive}

which is exactly the set of nodes that we need to perform sharing on. If you topologically sort this set, it gives you the order that you should perform your let bindings.
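
As a rough sketch, the reachable-set map itself can be built in one lazy pass over the IntMap (ExprF Int) representation from earlier. Here children is a stand-in for listing a node’s immediate subterm indices (e.g. Data.Foldable.toList, if ExprF is Foldable):

import qualified Data.IntMap as IM
import qualified Data.IntSet as IS

reachable :: (ExprF Int -> [Int]) -> IM.IntMap (ExprF Int) -> IM.IntMap IS.IntSet
reachable children graph = result
  where
    -- A node reaches itself plus everything its subterms reach; the lazy
    -- self-reference to result terminates because the graph is acyclic.
    result = IM.mapWithKey go graph
    go n expr = IS.insert n (foldMap (result IM.!) (children expr))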

EXCEPT there’s a kink in the whole thing. What happens if one of the terms in this diamond contains free variables? In particular, we might have something like this:

      +
    /  \
  f     |
  | \   |
  a  \ /
      λx
     / \
     \ /
   expensive
      |
      x

This gives us an analogous set of reachable nodes when we look at +, but we obviously can’t lift expensive x above the lambda.

Resolving this problem required giving up on the notion of memoizing the entire reachable set of nodes, and to instead crawl the graph ensuring that everything is well-scoped.

Performance Woes

My algorithm looked fine, and, importantly, got the right answer in a reasonable amount of time on our (small) battery of integration tests. So I shipped it, commended myself on a job well done, and thought nothing more about it. For about a week, until a bug report came in saying that our compiler now seemed to hang on big programs.

Which was something I hadn’t noticed, since we didn’t have any big programs in our integration tests.

Damn!

Upon digging in to what exactly was so slow, I noticed that my algorithm was accidentally quadratic. I needed to fold over every node in the graph, and that required looking at the entire reachable set underneath it. I had put in some of the obvious safeguards, hoping that they would prune the search tree early, but it wasn’t enough sacrifice for the Great God of Asymptotes.

Did I mention that at this point in the story, having this algorithm working fast was on the critical path of the company? Everybody else was blocked on me figuring this out. Talk about pressure!

Anyway. You’ll notice above that in my description of the algorithm, everything sounds fine. But the juice is in the details, as the common saying goes. Computing reachability isn’t quite the right thing to be using here, as it gave us the wrong answer for the lambda example above. Which is unfortunate because reachability is something we can do in linear time.

And then when reachability didn’t work, I just threw away the fast performance and hoped my bespoke algorithm would do the job. My only redemption comes from the fact that at least it got the right answer, even if it did so very slowly.

Finding the Kernel

Back to the drawing board.

Whenever I have graph theory problems, I call up my boy Vikrem. He’s good at nerd stuff like this.

We rubberducked the problem, and tried to reframe it in the language of graph theory. We had a Merkiv–Maguire moment where we independently realized that the goal was somehow related to finding the lowest common ancestor (LCA) of a node.

Which is to say, roughly, that we are looking for forks in the diamond diagram. Which we already knew, but it was nice to have some language for.

Our new problem is that LCA is defined only over trees. There are some extensions to DAGs, but none of them seem to be particularly well founded. However, searching for exactly that brought me to this stackoverflow question, where nestled in the comments is someone suggesting that the poster isn’t looking for LCA, but instead for a related notion, the lowest single common ancestor. LSCA is defined in a 2010 paper New common ancestor problems in trees and directed acyclic graphs.

The standard definition of LCA(x, y) = l is that “l is an ancestor of x and of y, and that no descendent of l has this property.”

But the definition of LSCA(x, y) = l is that “l lies on all root-to-x paths, and that l lies on all root-to-y paths, and that no descendent of l has this property.”

The distinction between the two is easily seen in the following graph:

  0
 / \
1   2
| X |
3   4

Under the standard definition, LCA is not uniquely defined for DAGs. That is, LCA(3, 4) = {1, 2}. But neither 1 nor 2 lies on all paths from the root. Under LSCA therefore we get LSCA(3, 4) = 0, which is the obviously-correct place to let-bind 3 and 4.

The paper gives a preprocessing scheme for computing LSCA by building a “lowest single ancestor” (LSA) tree. The LSA of a node is the LSCA of all of its in-edges. This definition cashes out to mean “the most immediate diamond above any node.” Finally! This is exactly what we’re looking for, since this is where we must insert our let-bindings! Even better, the paper gives us an algorithm for computing the LSA tree in linear time!

The First Implementer

Of course, I’m lazy and would prefer not to implement this thing. So instead I searched on hackage for lsca, and found nothing. But then I searched for lca and found that, like always, Ed Kmett was 13 years ahead of me.

The lca package implements an \(O(\log n)\) algorithm for computing the LCA of any two nodes in a graph. Which is very convenient for me, since the LSCA algorithm requires being able to do this.

Time to roll up the sleeves and get cracking I suppose.

The paper was surprisingly straightforward, and my first attempt implemented the (imperative) algorithms as given (imperatively.) The first step is to do a topological sort on the DAG in order to know in which order one ought to unfold the LSA tree.

But as is so often the case, this topological sort isn’t actually relevant to the algorithm; it’s just an encoding detail of expressing the algorithm imperatively. But you don’t need that when you’ve got laziness on your side! Instead you can just tie the knot and do something cool like this:

lsaTree :: Ord v => Map v (Set v) -> Map v (Path v)
lsaTree input = fix $ \result -> M.fromList $ do
  (node, parents) <- M.toList input
  let parentResults = fmap (result M.!) parents
  ...

Notice how we use fix to bind the eventual result of the final computation. Then we can chase pointers by looking them up in result—even though it’s not yet “computed.” Who cares what order the computer does it in. Why is that a thing I should need to specify?
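
The same trick in miniature, on a toy example unrelated to the LSA paper: computing the depth of every node in a forest given only parent pointers, in one lazy pass.

import qualified Data.Map as M
import Data.Function (fix)

depths :: Ord v => M.Map v (Maybe v) -> M.Map v Int
depths parents = fix $ \result ->
  flip fmap parents $ \mparent -> case mparent of
    Nothing -> 0                  -- a root
    Just p  -> 1 + result M.! p   -- chase the pointer into the eventual result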

Anyway. The exact details of implementing LSA are not particularly important for the remainder of this blog post. If you’re interested, you can peep the PR, which is delightfully small.

Tying It All Back Together

Equipped with my LSA tree, I was now ready to go back and solve the original problem of figuring out where to stick let-bindings. It’s easy now. Given the original program graph, find the LSA for each node. The LSA is the place you should insert the let binding.

So given the map of nodes to their LSAs, invert that map and get back a map of nodes to descendents who have this node as an LSA. Now when you go to inline a node, just look up everything in this map and inline it first.
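
The inversion is just a grouping. A minimal sketch, assuming a Map from each node to its LSA:

import qualified Data.Map as M

-- node -> lsa becomes lsa -> [nodes whose let-bindings go there]
invertLsa :: Ord v => M.Map v v -> M.Map v [v]
invertLsa lsa = M.fromListWith (++)
  [ (ancestor, [node]) | (node, ancestor) <- M.toList lsa ]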

It turns out to be a very elegant solution. It’s one third of the length of my horrible ad-hoc implementations, and it runs in linear time of the number of nodes in the graph. All in all, very good.

More often than I’m comfortable about, people will ask me how I can have so many good ideas. And what I like about this story is that it’s pretty typical of how I actually “have” “good” ideas. I’m reminded of the fact that luck favors the prepared mind. Attentive readers will notice that none of this process was due to brilliance on my part. I happened to know Vikrem who’s a genius. Together we pulled at some ancient graph theory strings and remembered a fact that someone else had thought important to teach us. That wasn’t actually the right path, but it led us to stackoverflow where someone had linked to a relevant paper. I implemented the paper using a library that someone else had done the heavy lifting on, and simplified the implementation using this knot-tying trick I picked up somewhere along the way.

Also, I’m just really pleased that the solution came from trying to reverse engineer the relevant graph-theory search terms. Maybe that’s the actual takeaway here.

May 04, 2025 08:05 AM

May 03, 2025

Mark Jason Dominus

Claude and I write a utility program

Then I had two problems…

A few days ago I got angry at xargs for the hundredth time, because for me xargs is one of those "then he had two problems" technologies. It never does what I want by default and I can never remember how to use it. This time what I wanted wasn't complicated: I had a bunch of PDF documents in /tmp and I wanted to use GPG to encrypt some of them, something like this:

    gpg -ac $(ls *.pdf | menupick)

menupick is a lovely little utility that reads lines from standard input, presents a menu, prompts on the terminal for a selection from the items, and then prints the selection to standard output. Anyway, this didn't work because some of the filenames I wanted had spaces in them, and the shell sucks. Also because gpg probably only does one file at a time.

I could have done it this way:

    ls *.pdf | menupick | while read f; do gpg -ac "$f"; done

but that's a lot to type. I thought “aha, I'll use xargs.” Then I had two problems.

    ls *.pdf | menupick | xargs gpg -ac

This doesn't work because xargs wants to batch up the inputs to run as few instances of gpg as possible, and gpg only does one file at a time. I glanced at the xargs manual looking for the "one at a time please" option (which should have been the default) but I didn't see it amongst the forest of other options.

I think now that I needed -n 1 but I didn't find it immediately, and I was tired of looking it up every time when it was what I wanted every time. After many years of not remembering how to get xargs to do what I wanted, I decided the time had come to write a stripped-down replacement that just did what I wanted and nothing else.

(In hindsight I should perhaps have looked to see if gpg's --multifile option did what I wanted, but it's okay that I didn't, this solution is more general and I will use it over and over in coming years.)

xar is a worse version of xargs, but worse is better (for me)

First I wrote a comment that specified the scope of the project:

    # Version of xargs that will be easier to use
    #
    # 1. Replace each % with the filename, if there are any
    # 2. Otherwise put the filename at the end of the line
    # 3. Run one command per argument unless there is (some flag)
    # 4. On error, continue anyway
    # 5. Need -0 flag to allow NUL-termination

There! It will do one thing well, as Brian and Rob commanded us in the Beginning Times.

I wrote a draft implementation that did not even do all those things, just items 2 and 4, then I fleshed it out with item 1. I decided that I would postpone 3 and 5 until I needed them. (5 at least isn't a YAGNI, because I know I have needed it in the past.)

The result was this:

    import subprocess
    import sys

    def command_has_percent(command):
        for word in command:
            if "%" in word:
                return True
        return False

    def substitute_percents(target, replacement):
        return [ s.replace("%", replacement) for s in target ]

    def run_command_with_filename(command_template, filename):
        command = command_template.copy()
        if not command_has_percent(command):
            command.append("%")

        res = subprocess.run(substitute_percents(command, filename), check=False)
        return res.returncode == 0

    if __name__ == '__main__':
        template = sys.argv[1:]
        ok = True
        for line in sys.stdin:
            if line.endswith("\n"):
                line = line[:-1]
            if not run_command_with_filename(template, line):
                ok = False
        exit(0 if ok else 1)

Short, clean, simple, easy to use. I called it xar, ran

    ls *.pdf | menupick | xar gpg -ac

and was content.

Now again, with Claude

The following day I thought this would be the perfect opportunity to try getting some LLM help with programming. I already had a baseline version of xar working, and had thought through the problem specification. Now I could do it over with the LLM and compare the two results. The program I wanted was small and self-contained. If the LLM sabotaged me somehow, I would probably notice, and if I didn't, it wouldn't matter, because I had a working version already.

So I asked Claude about it. I knew Claude was not the right choice, because Claude is not really designed for this task, and has no integration with the coding editor, but Claude is what I am familiar with so I thought I'd give it a shot.

On the whole it went extremely well. The complete transcript is here. I imagine this was pretty much a maximally good experience, that all the terrible problems of LLM coding arise in larger, more complicated systems. But this program is small and self-contained, with no subtleties, so the LLM could be at its best.

Phase 1

First I just told Claude what I wanted and let it write the program from scratch.

Let's write a program in Python called xar. It will be a stripped-down version of the standard Unix xargs utility, but it will do less and be simpler to use.

It will get command-line arguments that specify a command template. Then it will read lines from stdin. Each line will be appended to the command template to produce a command. The program will run the resulting command, one command for each line of input.

Any questions?

Default Claude is very chatty and I am going to omit most of its chatty comments, which make me impatient.

The program it wrote was not what I would have written, but it was good enough. If I had just used it right off the bat, instead of writing my own, it would have been enough, and it would have taken somewhere between 2% and 10% as long to produce.

So the one-line summary of this article is: I should stop writing simple command-line utilities, and just have Claude write them instead, because Claude's are good enough, and definitely better than yak-shaving.

Here's Claude's initial version:

    #!/usr/bin/env python3

    import sys
    import subprocess
    import shlex

    def main():
        if len(sys.argv) < 2:
            print("Usage: xar command [args...]", file=sys.stderr)
            sys.exit(1)

        command_template = sys.argv[1:]

        for line in sys.stdin:
            line = line.rstrip('\n')

            full_command = command_template + [line]

            try:
                result = subprocess.run(full_command, check=True)
            except subprocess.CalledProcessError as e:
                print(f"Command failed with exit code {e.returncode}: {' '.join(full_command)}", 
                      file=sys.stderr)
            except FileNotFoundError:
                print(f"Command not found: {full_command[0]}", file=sys.stderr)
                sys.exit(1)

    if __name__ == "__main__":
        main()

Claude's version had numerous comments, which I have omitted. I later told it to stop putting in comments, which it did.

Claude's use of check here was not what I wanted, because that makes subprocess.run raise an exception when the subcommand fails, and then the exception has to be immediately caught and handled. My original control flow had been simpler:

        res = subprocess.run(substitute_percents(command, filename), check=False)
        return res.returncode == 0

        …

        if not run_command_with_filename(template, line):
            ok = False

Claude's program pulled in shlex without noticing that it was unused. But Claude did teach me about str.rstrip('\n') which I had not known about before (or maybe had forgotten), so that was one small win already.

Argument parsing

The next step was a big win. Python's library for command-line argument handling is called argparse and it is really nice. If I were still writing programs in Perl, I would implement a Perl version of argparse because Perl has 29 available argument parsing libraries and they are all loathsome. The one problem with argparse is I never remember off the top of my head how to use it. I think the module is called argparse but it provides a class called Argparser but I often get these backward and try to use argparser and Argparse.

Instead of figuring it out every time I usually dig up some previous Python program and then copy-paste the argument parser from there, amending it to suit the purpose.

But this time I didn't have to do that. Instead, I just said to Claude:

This is good, now please add code at the top to handle argument parsing with the standard Argparse library, even though there are no options yet.

Claude handed me back pretty much the same program, but with the argument parser at the top:

    def parse_arguments():
        """Parse command-line arguments using argparse."""
        parser = argparse.ArgumentParser(
            description="A simplified version of xargs that appends each line from stdin to a command template.",
            prog="xar"
        )

        # Store all positional arguments as the command to execute
        parser.add_argument(
            "command", 
            nargs="+", 
            help="The command template to execute for each input line"
        )

        return parser.parse_args()

Let's pause for a moment. Maybe you kids are unimpressed by this. But if someone had sent this interaction back in time and showed it to me even as late as five years ago, I would have been stunned. It would have been completely science-fictional. The code that it produced is not so impressive, maybe, but that I was able to get it in response to an unstructured request in plain English is like seeing a nuclear-powered jetpack in action.

I make some changes

It bothered me that the command-line arguments were being taken implicitly from sys.argv; that makes the program hard to test. A better way to do this is to have the list of command-line arguments be an argument to the main function, and then for the runtime to call main(sys.argv[1:]) instead of just main(). That way you can test the program without having to fake up sys.argv. So I changed that, and while I was making changes I got rid of Claude's comments and the unnecessary exception handling. I pasted the result back at Claude, saying:

Good so far. I have adjusted your source code in several ways:

  1. I removed the comments.
  2. I removed the exception handling for the subcommand. If one of the subcommands fails, that no longer terminates the main loop
  3. Command-line arguments are passed to the main() function rather than being accessed directly

Here is the code. Tell me if you see any problems or if you have any questions about it

I had also sabotaged the code with at least one likely error. Python's subprocess.run returns a “result” structure that includes the subcommand's output and exit status, and I can never remember the name of the component that records the exit status. I either look it up or get it by trial-and-error. This time I didn't bother to do either, I just guessed that it was result.exitcode, which I handed off to Claude even though I thought it was probably wrong.

Claude's reply said:

Here are a couple of issues I noticed:

There's an error in the error handling code - you're using result.exitcode but in the subprocess.run() function, the attribute is called returncode not exitcode

Awesome, now I don't have to remember that name of that component.

Claude continued:

The parse_arguments() function definition is missing in the pasted code, but it would need to be updated to accept the argv parameter

I had passed sys.argv[1:] to main, and then had main call parse_arguments(argv). But I hadn't changed parse_arguments to expect an argument list. (I hadn't included it in my paste at all, in fact.) Claude was exactly right, and continued with the corrected version:

    #!/usr/bin/env python3

    import sys
    import subprocess
    import argparse

    def parse_arguments(argv):
        …
        return parser.parse_args(argv)

    def main(argv):
        # Parse command-line arguments
        args = parse_arguments(argv)
        …

    if __name__ == "__main__":
        ok = main(sys.argv[1:])
        exit(0 if ok else 1)

Handling NUL-separated values

I said

It's great, thanks for catching those errors. Now let's add a -0 command-line option that tells the program to split the input on "\0" characters instead of at line breaks.

Recall that my handwritten xar did not do this yet; I had left that feature for another day. Claude just added it for me, adding

    parser.add_argument(
        "-0", "--null",
        action="store_true",
        help="Input items are terminated by a null character instead of by newline"
    )

to the argument parser and changing the main loop to:

    # Split input based on the delimiter option
    if args.null:
        # Split on null characters
        items = stdin_data.split('\0')
    else:
        # Split on newlines
        items = stdin_data.splitlines()

    for item in items:
        if not item:
            continue

        …

I was tired of Claude's comments, so I said

Give me the same thing, with no comments

which it did, so I said

From now on leave out the comments. I'm a Python expert and I don't need them. If there is something I don't understand I will ask you.

Claude complied. If I were going to use Claude again in the future I would include that in the canned instructions that Claude is given up front. Instead I will probably use a tool better-suited to programming, and do whatever the analogous thing is.

Template filling

Now I told Claude to add the % feature:

I changed if not item to if item == "". Now let's make the following change:

  • If the command template includes any % characters, each of these should be replaced with the input item.
  • Otherwise, if there were no % characters, the input item should be appended to the end of the command as in the current version

Claude did this. It used an explicit loop instead of the list comprehension that I had used (and preferred), but it did do it correctly:

    for arg in command_template:
        if '%' in arg:
            has_placeholder = True
            full_command.append(arg.replace('%', item))
        else:
            full_command.append(arg)

    if not has_placeholder:
        full_command.append(item)

Even without the list comprehension, I would have factored out the common code:

    for arg in command_template:
        if '%' in arg:
            has_placeholder = True
        full_command.append(arg.replace('%', item))

    if not has_placeholder:
        full_command.append(item)

But I am not going to complain, my code is simpler but is doing unnecessary work.

Claude also took my hint to change item == "" even though I didn't explicitly tell it to change that.

At this point the main loop of the main function was 15 lines long, because Claude had stuck all the %-processing inline. So I said:

Good, let's extract the command template processing into a subroutine.

It did this right, understanding correctly what code I was referring to and extracting it into a subroutine called process_command_template. More science fiction: I can say "command template processing" and it guesses what I had in mind!

This cut the main loop to 7 lines. That worked so well I tried it again:

Good, now let's extract the part of main that processes stdin into a subroutine that returns the items array

It pulled the correct code into a function called process_stdin. It did not make the novice mistake of passing the entire args structure to this function. In the caller it had process_stdin(args.null) and inside of process_stdin this parameter was named use_null_delimiter.
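
For reference, a function with that shape looks roughly like this (a sketch of the idea, not the exact code from the session):

    import sys

    def process_stdin(use_null_delimiter):
        # Read all of stdin, then split on NUL or on newlines depending on the flag.
        data = sys.stdin.read()
        if use_null_delimiter:
            return data.split("\0")
        return data.splitlines()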

YAGNI?

At this point I was satisfied but I thought I might as well ask if it should do something else before we concluded:

Can you think of any features I left out that would be useful enough to warrant inclusion? Remember this program is supposed to be small and easy to use, in contrast to the existing xargs which is very complicated.

Claude had four suggestions:

  1. A -p or --parallel option to run commands in parallel

  2. A -n or --max-args option to specify the maximum number of items to pass per command

  3. A simple -v or --verbose flag to show commands as they're executed

  4. A way to replace the command's standard input with the item instead of adding it as an argument

All reasonable suggestions, nothing stupid. (It also supplied code for #3, which I had not asked for and did not want, but as I said before, default Claude is very chatty.)

Parallelization

I didn't want any of these, and I knew that #2–4 would be easy to add if I did want any of them later. But #1 was harder. I've done code like this in the past, where the program has a worker pool and runs a new process whenever the worker pool isn't at capacity. It's not even that hard. In Perl you can play a cute trick and use something like

    $workers{spawn()} = 1 while delete $workers{wait()};

where the workers hash maps process IDs to dummy values. A child exits, wait() awakens and returns the process ID of the completed child, which is then deleted from the map, and the loop starts another worker.

I wanted to see how Claude would do it, and the result was an even bigger win than I had had previously, because Claude wrote this:

   with concurrent.futures.ProcessPoolExecutor(max_workers=args.parallel) as executor:
        futures = [executor.submit(execute_command, cmd, args.verbose) for cmd in commands]

        for future in concurrent.futures.as_completed(futures):
            success = future.result()
            if not success:
                ok = False

What's so great about this? What's great is that I hadn't known about concurrent.futures or ProcessPoolExecutor. And while I might have suspected that something like them existed, I didn't know what they were called. But now I do know about them.
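
Stripped of the xar specifics, the pattern looks like this (a minimal, self-contained sketch; the run_one helper and the toy command list are just for illustration):

    import subprocess
    import concurrent.futures

    def run_one(command):
        # Run a single command and report success as a boolean.
        return subprocess.run(command, check=False).returncode == 0

    if __name__ == "__main__":
        commands = [["echo", "one"], ["echo", "two"], ["false"]]
        with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
            futures = [executor.submit(run_one, cmd) for cmd in commands]
            ok = all(f.result() for f in concurrent.futures.as_completed(futures))
        exit(0 if ok else 1)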

If someone had asked me to write the --parallel option, I would have had to have this conversation with myself:

Python probably has something like this already. But how long will it take me to track it down? And once I do, will the API documentation be any good, or will it be spotty and incorrect? And will there be only one module, or will there be three and I will have to pick the right one? And having picked module F6, will I find out an hour later that F6 is old and unmaintained and that people will tell me “Oh, you should have used A1, it is the new hotness, everyone knows that.”

When I put all that uncertainty on a balance, and weigh it against the known costs of doing it myself, which one wins?

The right choice is: I should do the research, find the good module (A1, not F6), and figure out how to use it.

But one of my biggest weaknesses as a programmer is that I too often make the wrong choice in this situation. I think “oh, I've done this before, it will be quicker to just do it myself”, and then I do and it is.

Let me repeat, it is quicker to do it myself. But that is still the wrong choice.

Maybe the thing I wrote would be sooner or smaller or faster or more technically suitable to the project than the canned module would have been. But it would only have been more technically suitable today. If it needed a new feature in the future it might have to be changed by someone who had never seen it before, whereas the canned module could well already have the needed feature ready to go, already documented, and perhaps already familiar to whoever had to make the change. My bespoke version would certainly be unfamiliar to every other programmer on the project — including perhaps myself six months later — so would be the wrong thing to use.

I'm really good at hacking this stuff up. Which is a problem. It makes me want to hack stuff up, even when I shouldn't.

Claude tips the balance strongly toward the correct side, which is that I should use the prepackaged module that someone else wrote and not hack something up.

And now I know about concurrent.futures.ProcessPoolExecutor! The world is full of manuals, how can I decide which ones I should read? One way is to read them all, which used to work back when I was younger, but now I have more responsibilities and I don't have time to read the entire Python library reference including all the useless bits I will never use. But here's Claude pointing out to me that this is something I need to know about, now, today, and I should read this one. That is valuable knowledge.

Claude flubs

While testing the program I did

    xar --parallel=2 bash -c '...'

which didn't work because the argparse library tried to parse the -c as a flag to xar, when I wanted it to be a flag to bash. This was my fault, and there is a standard way around it, which is that I should have said

    xar --parallel=2 -- bash -c '...'

which tells argparse to drop the -- and take everything after it literally with no interpretation.

I thought I'd see what Claude had to say about this, and it dropped the ball in a very LLM sort of way.

I asked:

When I did xar --parallel=2 bash -c '...' it misunderstood the arguments. It thought I was trying to pass -c to xar and failed with unrecognized arguments. What should I do about that?

Claude replied:

There are two main approaches to handle this:

  1. Use the -- separator to indicate the end of arguments for xar: …
  2. Modify your code to handle this case specifically by looking for a command terminator in the argument parser.

Here's how you could modify your code to implement the first approach:

And then it followed with the same, unmodified code…

Which was correct, because -- is already handled by argparse, so no change was actually required to implement the first approach! But Claude's characterization of it as a modification was misleading, wrong, and potentially very confusing.

I said:

I don't see what the difference is in that argument parsing code as compared to what I had before.

and Claude got into the LLM tail-chasing mode where it got more and more confused:

You're right, I apologize for the confusion. There's no difference in the argument parsing code I provided compared to what you had before. The issue is with how argparse handles arguments by default.

To fix the issue with xar --parallel=2 bash -c '...', you need to actually implement support for the -- separator in your code…

   # Find the index of '--' if it exists
   try:
       separator_index = argv.index('--')
       xar_args = argv[:separator_index]
       command_args = argv[separator_index+1:]
   except ValueError: …

Ugh, all wrong in the worst way. The code probably works, but it is completely unnecessary. Claude's claim that “you need to actually implement support for the -- separator” is flat wrong. I pointed this out and Claude got more confused. Oh well, nobody is perfect!

Lessons learned

A long time ago, when syntax-coloring editors were still new, I tried one and didn't like it, then tried again a few years later and discovered that I liked it better than I had before, and not for the reasons that anyone had predicted or that I would have been able to predict. (I wrote an article about the surprising reasons to use the syntax coloring.)

This time also. As usual, an actual experiment produced unexpected results, because the world is complicated and interesting. Some of the results were unsurprising, but some were not anything I would have thought of beforehand.

Claude's code is good enough, but it is not a magic oracle

Getting Claude to write most of the code was a lot faster and easier than writing it myself. This is good! But I was dangerously tempted to just take Claude's code at face value instead of checking it carefully. I quickly got used to flying along at great speed, and it was tough to force myself to slow down and be methodical, looking over everything as carefully as I would if Claude were a real junior programmer. It would be easy for me to lapse into bad habits, especially if I were tired or ill. I will have to be wary.

Fortunately there is already a part of my brain trained to deal with bright kids who lack experience, and I think perhaps that part of my brain will be able to deal effectively with Claude.

I did not notice any mistakes on Claude's part — at least this time.

At one point my testing turned up what appeared to be a bug, but it was not. The testing was still time well-spent.

Claude remembers the manual better than I do

Having Claude remember stuff for me, instead of rummaging the manual, is great. Having Claude stub out an argument parser, instead of copying one from somewhere else, was pure win.

Partway along I was writing a test script and I wanted to use that Bash flag that tells Bash to quit early if any of the subcommands fails. I can never remember what that flag is called. Normally I would have hunted for it in one of my own shell scripts, or groveled over the 378 options in the bash manual. This time I just asked in plain English “What's the bash option that tells the script to abort if a command fails?” Claude told me, and we went back to what we were doing.

Claude can talk about code with me, at least small pieces

Claude easily does simple refactors. At least at this scale, it got them right. I was not expecting this to work as well as it did.

When I told Claude to stop commenting every line, it did. I wonder, if I had told it to use if not expr only for Boolean expressions, would it have complied? Perhaps, at least for a while.

When Claude wrote code I wasn't sure about, I asked it what it was doing and at least once it explained correctly. Claude had written

     parser.add_argument(
        "-p", "--parallel",
        nargs="?",
        const=5,
        type=int,
        default=1,
        help="Run up to N commands in parallel (default: 5)"
    )

Wait, I said, I know what the const=5 is doing, that's so that if you have --parallel with no number it defaults to 5. But what is the --default doing here? I just asked Claude and it told me: that's used if there is no --parallel flag at all.

This was much easier than it would have been for me to pick over the argparse manual to figure out how to do this in the first place.
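
A throwaway stub is enough to confirm that behavior (this is just an experiment with the three cases, not the xar parser):

    import argparse

    parser = argparse.ArgumentParser(prog="xar")
    parser.add_argument("-p", "--parallel", nargs="?", const=5, type=int, default=1)

    print(parser.parse_args([]).parallel)           # no flag at all: default, so 1
    print(parser.parse_args(["-p"]).parallel)       # bare flag with no number: const, so 5
    print(parser.parse_args(["-p", "8"]).parallel)  # an explicit value wins: 8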

More thoughts

On a different project, Claude might have done much worse. It might have given wrong explanations, or written wrong code. I think that's okay though. When I work with human programmers, they give wrong explanations and write wrong code all the time. I'm used to it.

I don't know how well it will work for larger systems. Possibly pretty well if I can keep the project sufficiently modular that it doesn't get confused about cross-module interactions. But if the criticism is “that LLM stuff doesn't work unless you keep the code extremely modular” that's not much of a criticism. We all need more encouragement to keep the code modular.

Programmers often write closely-coupled modules knowing that it is bad and it will cause maintenance headaches down the line, knowing that the problems will most likely be someone else's to deal with. But what if writing closely-coupled modules had an immediate cost today, the cost being that the LLM would be less helpful and more likely to mess up today's code? Maybe programmers would be more careful about letting that happen!

Will my programming skill atrophy?

Folks at Recurse Center were discussing this question.

I don't think it will. It will only atrophy if I let it. And I have a pretty good track record of not letting it. The essence of engineering is to pay attention to what I am doing and why, to try to produce a solid product that satisfies complex constraints, to try to spot problems and correct them. I am not going to stop doing this. Perhaps the problems will be different ones than they were before. That is all right.

Starting decades ago I have repeatedly told people

You cannot just paste code with no understanding of what is going on and expect it to work.

That was true then without Claude and it is true now with Claude. Why would I change my mind about this? How could Claude change it?

Will I lose anything from having Claude write that complex parser.add_argument call for me? Perhaps if I had figured it out on my own, on future occasions I would have remembered the const=5 and default=1 specifications and how they interacted. Perhaps.

But I suspect that I have figured it out on my own in the past, more than once, and it didn't stick. I am happy with how it went this time. After I got Claude's explanation, I checked its claimed behavior pretty carefully with a stub program, as if I had been reviewing a colleague's code that I wasn't sure about.

The biggest win Claude gave me was that I didn't know about this ProcessPoolExecutor thing before, and now I do. That is going to make me a better programmer. Now I know something useful that I didn't know before, and I have a pointer to documentation I know I should study.

My skill at writing ad-hoc process pool managers might atrophy, but if it does, that is good. I have already written too many ad-hoc process pool managers. It was a bad habit, I should have stopped long ago, and this will help me stop.

Conclusion

This works.

Perfectly? No, it's technology, technology never works perfectly. Have you ever used a computer?

Will it introduce new problems? Probably, it's new technology, and new technology always introduces new problems.

But is it better than what we had before? Definitely.

I still see some programmers turning up their noses at this technology as if they were sure it was a silly fad that would burn itself out once people came to their senses and saw what a terrible idea it was.

I think that is not going to happen, and those nose-turning-up people, like the people who pointed out all the drawbacks and unknown-unknowns of automobiles as compared to horse-drawn wagons, are going to look increasingly foolish.

Because it works.

by Mark Dominus (mjd@plover.com) at May 03, 2025 04:46 PM

A puzzle about balancing test tubes in a centrifuge

Suppose a centrifuge has n slots, arranged in a circle around the center, and we have k test tubes we wish to place into the slots. If the tubes are not arranged symmetrically around the center, the centrifuge will explode.

(By "arranged symmetrically around the center, I mean that if the center is at , then the sum of the positions of the tubes must also be at .)

Let's consider the example of n = 12. Clearly we can arrange 2, 3, 4, or 6 tubes symmetrically:

[Four diagrams of twelve small circles arranged around a central point, like a clock. In the first, circles 5 and 11 are filled in; in the second, circles 1, 5, and 9; in the third, circles 0, 3, 6, and 9; in the fourth, circles 1, 3, 5, 7, 9, and 11.]

Equally clearly we can't arrange only 1. Also it's easy to see we can do k tubes if and only if we can also do 12 − k tubes, which rules out 11.

From now on I will write (n, k) to mean the problem of balancing k tubes in a centrifuge with n slots. So (12, 2) and (12, 3) are possible, and (12, 1) and (12, 11) are not. And (n, k) is solvable if and only if (n, n − k) is.

It's perhaps a little surprising that (12, 7) is possible. If you just ask this to someone out of nowhere they might have a happy inspiration: “Oh, I'll just combine the solutions for (12, 3) and (12, 4), easy.” But that doesn't work because two groups of the form {i, i+4, i+8} and {j, j+3, j+6, j+9} always overlap.

For example, if your group of 4 is the slots {0, 3, 6, 9} then you can't also have your group of 3 be {1, 5, 9}, because slot 9 already has a tube in it.

[Diagram: the k=3 and k=4 diagrams from before, superimposed; there is a small explosion symbol at 9 o'clock where they interfere.]

The other balanced groups of 3 are blocked in the same way. You cannot solve the puzzle with 4 + 3; you have to do 3 + 2 + 2, as below left. The best way to approach this is to do (12, 5) as 3 + 2, as below right. This is easy, since the triangle only blocks three of the six symmetric pairs. Then you replace the holes with tubes and the tubes with holes to turn (12, 5) into (12, 7).

[Two diagrams: on the left, a triangle filling slots 1, 5, and 9, plus a pair at 0, 6 and another pair at 2, 8; on the right, the opposite of that diagram, consisting of a triangle filling slots 3, 7, 11 and a pair at 4, 10.]

Given n and k, how can we decide whether the centrifuge can be safely packed?

Clearly you can solve (n, k) when n is a multiple of k, but the example of (12, 7) (or (12, 5)) shows this isn't a necessary condition.

A generalization of this is that (n, jk) is always solvable if k divides n, since you can easily balance k tubes at positions 0, n/k, 2n/k, …, then do another k tubes one position over, and so on. For example, to do (12, 8) you just put the first four tubes in slots {0, 3, 6, 9} and the next four one position over, in slots {1, 4, 7, 10}.

[Diagram: two squares this time, as described in the previous paragraph.]

An interesting counterexample is that the strategy for (12, 7), where we did 3 + 2 + 2, cannot be extended to (14, 9). One would want to do 7 + 2, but there is no way to arrange the tubes so that the group of 2 doesn't conflict with the group of 7, which blocks one slot from every pair.

But we can see that this must be true without even considering the geometry. (14, 9) is the reverse of (14, 5), which is impossible: the only nontrivial divisors of 14 are 2 and 7, so 5 must be a sum of 2s and 7s, and it is not.

You can't fit 8 tubes when n = 15, but again the reason is a bit tricky. When I looked at (15, 8) directly, I did a case analysis to make sure that the 5-group and the 3-group would always conflict. But again there was an easier way to see this: (15, 8) is the reverse of (15, 7), and (15, 7) clearly won't work, as 7 is not a sum of 3s and 5s. I wonder if there's an example where both (n, k) and (n, n − k) are not obvious?

For n = 12, every k works except 1 and the always-impossible 11.

What's the answer in general? I don't know.
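
For small n you can at least settle individual cases by brute force: check whether some set of k of the n slot positions, treated as unit vectors around the circle, sums to zero. Here is a quick sketch of that check (a floating-point tolerance stands in for an exact test):

    from itertools import combinations
    from math import cos, sin, tau

    def balanced(n, k, eps=1e-9):
        # Can k tubes be placed in n slots with their center of mass at the center?
        slots = [(cos(tau * i / n), sin(tau * i / n)) for i in range(n)]
        for combo in combinations(range(n), k):
            x = sum(slots[i][0] for i in combo)
            y = sum(slots[i][1] for i in combo)
            if abs(x) < eps and abs(y) < eps:
                return True
        return False

    # For n = 12 this reports every k from 0 to 12 as solvable except 1 and 11.
    print([k for k in range(13) if balanced(12, k)])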

Addenda

20250502

Now I am amusing myself thinking about the perversity of a centrifuge with a prime number of slots. If you use it at all, you must fill every slot. I hope you like explosions!

While I did not explode any centrifuges in university chemistry, I did once explode an expensive Liebig condenser.

[Photograph: a chemistry lab apparatus. In the middle is the Liebig condenser, a pair of long concentric glass tubes, one inside the other, supported by a metal ring stand. At left a heater heats a flask whose top is connected to the upper end of the inner tube of the condenser. The condensate collects in a flask at right. Two rubber tubes connect to the top and bottom of the outer tube, carrying water through it.]

Condenser setup by Mario Link from an original image by Arlen on Flickr. Licensed cc-by-2.0, provided via Wikimedia Commons.

20250503

  • Michael Lugo informs me that a complete solution may be found on Matt Baker's math blog. I have not yet looked at this myself.

  • Omar Antolín points out an important consideration I missed: it may be necessary to subtract polygons. Consider (30, 6). This is obviously possible, since 6 divides 30. But there is a more interesting solution. We can add the pentagon {0, 6, 12, 18, 24} to the digons {5, 20} and {10, 25} to obtain the solution {0, 5, 6, 10, 12, 18, 20, 24, 25}.

    Then from this we can subtract the triangle {0, 10, 20} to obtain {5, 6, 12, 18, 24, 25}, a solution to (30, 6) which is not a sum of regular polygons.

  • Thanks to Dave Long for pointing out a small but significant error, which I have corrected.

20250505

by Mark Dominus (mjd@plover.com) at May 03, 2025 03:25 PM

GHC Developer Blog

GHC 9.10.2 is now available

GHC 9.10.2 is now available

Zubin Duggal - 2025-05-03

The GHC developers are very pleased to announce the availability of the final release for GHC 9.10.2. Binary distributions, source distributions, and documentation are available at downloads.haskell.org and via GHCup.

GHC 9.10.2 is a bug-fix release fixing over 50 issues of a variety of severities and scopes, including:

  • Significantly improved performance when dynamically loading Haskell symbols (#23415).
  • Fixing a bug where the simplifier sometimes destroyed join points during float out, which could impact performance (#24768).
  • Reduced memory fragmentation in the non-moving GC’s segment allocator, improving resident set size by up to 26% for some applications (#24150).
  • Added new flags to control speculative evaluation (-fspec-eval and -fspec-eval-dictfun) to work around performance regressions (#25606).
  • Fixed several platform-specific issues, including segfaults with FFI on PowerPC (#23034) and improved code generation for AArch64 with multiway branches now using jump tables (#19912)
  • And many more!

A full accounting of these fixes can be found in the release notes. As always, GHC’s release status, including planned future releases, can be found on the GHC Wiki status page.

We would like to thank Well-Typed, Tweag I/O, Juspay, QBayLogic, Channable, Serokell, SimSpace, the Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprise this release.

As always, do give this release a try and open a ticket if you see anything amiss.

by ghc-devs at May 03, 2025 12:00 AM

May 02, 2025

Gabriella Gonzalez

Prompt chaining reimagined with type inference

Prompt chaining reimagined with type inference

At work I’ve been researching how to improve the ergonomics of prompt engineering and I wanted to share and open source some of what I’ve done. This initial post is about how I’ve been experimenting with using bidirectional type inference to streamline prompt chaining.

“Prompt chaining” is a prompt engineering technique that splits a larger task/prompt into multiple smaller tasks/prompts which are chained together using code. For example, instead of prompting a model to generate a poem in one prompt like this:

Write a poem based off this idea:

${idea}

… by following this process:

  • First think through the form, stanza count, lines per stanza, and rhyme scheme
  • Then choose a poetic style (tone, voice, and literary devices) based on the poem’s form
  • Then write a complete poem based on that plan

… you can split it into smaller prompts, like this:

structure prompt:

Plan the structure of a new poem based on this idea

${idea}

Describe its form, stanza count, lines per stanza, and rhyme scheme

style prompt:

Given this poem structure:

  • Form: ${structure.form}
  • Stanzas: ${structure.stanzaCount}
  • Lines per stanza: ${structure.linesPerStanza}
  • Rhyme scheme: ${structure.rhymeScheme}

Choose a poetic style: tone, voice, and literary devices to emphasize

poem prompt:

Write a complete poem based on this idea:

${idea}

Structure:

  • Form: ${structure.form}
  • Stanzas: ${structure.stanzaCount}
  • Lines per stanza: ${structure.linesPerStanza}
  • Rhyme scheme: ${structure.rhymeScheme}

Style:

  • Tone: ${style.tone}
  • Voice: ${style.voice}
  • Literary Devices: ${style.literaryDevices}

Why might you want to do this?

  • to improve the quality of the results

    Models perform better when working on more constrained subproblems. Splitting a larger prompt into smaller prompts helps the model stay focused at each step.

  • to introspect intermediate results

    This comes in handy when you want to log, validate, or correct intermediate results.

  • to perform actions in between prompts

    You might want to take the output of one prompt, use that to call some tool, then use the output of that tool to decide what the next prompt should be, which you can’t do with a single prompt.

In other words, prompt chaining unlocks greater accuracy, control, and flexibility for prompt engineering.

The problem

The main issue with prompt chaining is that it is a huge pain in the ass; if you start to do anything a little bit complicated you need to start using structured outputs (i.e. JSON), which adds a whole lot of boilerplate to the process:

  • you have to define the schema for each intermediate step of the process

    You typically do this by defining your data model in your host programming language (e.g. a Pydantic model in Python) or directly defining your JSON schema

  • You have to instruct the model to produce JSON and explain the shape of the expected output

  • (Depending on the framework) you have to decode the JSON into your data model

For small prompt chaining pipelines this isn’t too hard, but it starts to get annoying to define all these schemas when you scale this up to more sophisticated prompt chaining pipelines.
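
To make the boilerplate concrete, one step of a conventional pipeline might look roughly like this in Python (a sketch for illustration: the PoemStructure model, the prompt text, and the parse_structure helper are all invented, and the actual call to the model is left out):

    import json
    from pydantic import BaseModel

    # The schema for this one intermediate step has to be written out by hand.
    class PoemStructure(BaseModel):
        form: str
        stanzaCount: int
        linesPerStanza: int
        rhymeScheme: str

    PROMPT = (
        "Plan the structure of a new poem based on this idea: ...\n"
        "Respond with a JSON object with the keys form, stanzaCount, "
        "linesPerStanza, and rhymeScheme."
    )

    def parse_structure(raw_json: str) -> PoemStructure:
        # Decode the model's output and validate it against the hand-written schema.
        return PoemStructure(**json.loads(raw_json))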

So as a thought experiment I wanted to create a research prototype that handled all of that for you so that you didn’t need to specify any schemas at all. In other words I wanted to build a programming language that harnessed bidirectional type inference to perform schema inference for prompts with structured JSON outputs.

Example

I’ll cut to the case by showing the above prompt chain written as a program in this language:

let concatSep =
      https://raw.githubusercontent.com/Gabriella439/grace/refs/heads/main/prelude/text/concatSep.ffg

let lines = concatSep "\n"

let generatePoem idea =
        let structure = prompt
                { model: "gpt-4o"
                , text: lines
                    [ "Plan the structure of a new poem based on this idea:"
                    , ""
                    , idea
                    , ""
                    , "Describe its form, stanza count, lines per stanza, and rhyme scheme."
                    ]
                }

        let renderedStructure = lines
                [ "- Form: " + structure.form
                , "- Stanzas: " + Real/show (structure.stanzaCount : Integer)
                , "- Lines per stanza: " + Real/show (structure.linesPerStanza : Integer)
                , "- Rhyme scheme: " + structure.rhymeScheme
                ]

        let style = prompt
                { model: "gpt-4o"
                , text: lines
                    [ "Given this poem structure:"
                    , renderedStructure
                    , ""
                    , "Choose a poetic style: tone, voice, and literary devices to emphasize."
                    ]
                }

        let renderedStyle = lines
                [ "- Tone: " + style.tone
                , "- Voice: " + style.voice
                , "- Literary Devices: " + concatSep ", " style.literaryDevices
                ]

        let poem : Text = prompt
                { model: "gpt-4o"
                , text: lines
                    [ "Write a complete poem based on this idea:"
                    , ""
                    , idea
                    , ""
                    , "Structure:"
                    , renderedStructure
                    , ""
                    , "Style:"
                    , renderedStyle
                    ]
                }

        in { structure: structure, style: style, poem: poem }

in  generatePoem "The stillness of early morning"

The above prompt chain is a real program in a research programming language named Fall-from-Grace (or “Grace” for short). Specifically, this is implemented in a branch of the language that adds a magic prompt keyword that works like in the above example.

If you run the above example, you might get an output like this:

{
  "structure": {
    "form": "Free verse with a sprinkle of rhyme",
    "linesPerStanza": 4,
    "rhymeScheme": "ABCB",
    "stanzaCount": 5
  },
  "style": {
    "literaryDevices": [
      "imagery",
      "alliteration",
      "personification",
      "enjambment"
    ],
    "tone": "nostalgic",
    "voice": "reflective"
  },
  "poem": "### The Stillness of Early Morning\n\n**I**  \nIn the gentle rise of dawn's embrace,  \nA whisper of winds through sleepy skies,  \nThe world lies suspended in an echo  \nOf yesterday's dreams, reluctantly saying goodbye.\n\n**II**  \nSoft orange light graces fields of dew,  \nCradling tenacious tendrils of the past.  \nMorning breathes upon the waiting world  \nWith promises that this serenity will last.\n\n**III**  \nWaves of silence surge across the meadow,  \nNature's symphony gently cradles rest.  \nA chorus of leaves sings its soft refrain,  \nComforted by morning's tender caress.\n\n**IV**  \nThe silence dips to caress weary trees,  \nWhose branches waltz to whispers soft and slow.  \nEach tendril holds close the morning's intention,  \nBefore the day's demands, the chaos, start to show.\n\n**V**  \nEarly rays dance upon a tranquil sea,  \nA canvas painted through a dreamy haze.  \nTime sits by, savoring the gentle hush,  \nAs the world awakens, stirs, and finally plays."
}

… and if you’re curious what the formatted poem looks like:

The Stillness of Early Morning

I
In the gentle rise of dawn’s embrace,
A whisper of winds through sleepy skies,
The world lies suspended in an echo
Of yesterday’s dreams, reluctantly saying goodbye.

II
Soft orange light graces fields of dew,
Cradling tenacious tendrils of the past.
Morning breathes upon the waiting world
With promises that this serenity will last.

III
Waves of silence surge across the meadow,
Nature’s symphony gently cradles rest.
A chorus of leaves sings its soft refrain,
Comforted by morning’s tender caress.

IV
The silence dips to caress weary trees,
Whose branches waltz to whispers soft and slow.
Each tendril holds close the morning’s intention,
Before the day’s demands, the chaos, start to show.

V
Early rays dance upon a tranquil sea,
A canvas painted through a dreamy haze.
Time sits by, savoring the gentle hush,
As the world awakens, stirs, and finally plays.

Type inference

The sample Grace program hardly specifies any types (mainly the final expected type for the poem: Text). The reason this works is because Grace supports bidirectional type inference, which means that Grace can work backwards from how intermediate results are used to infer their schemas.

I’ll illustrate this with a contrived Grace example:

let numbers = prompt{ text: "Give me two numbers" }

in  { x: numbers.x
    , y: numbers.y
    , sum: numbers.x + numbers.y : Integer
    }

… which might produce an output like this:

$ grace interpret ./numbers.ffg
{ "x": 7, "y": 14, "sum": 21 }

When Grace analyzes this program the type checker works backwards from this expression:

numbers.x + numbers.y : Integer

… and reasons about it like this:

  • the addition produces an Integer, therefore numbers.x and numbers.y must also be Integers

  • therefore numbers is a record with two fields, x and y, both of which are Integers

    … or using Grace syntax, the inferred type of numbers is: { x: Integer, y: Integer }

  • therefore the output of the prompt command must have the same type

… and then Grace generates a JSON schema for the prompt which looks like this:

{ "type": "object",
  "properties": {
    "x": { "type": "integer" },
    "y": { "type": "integer" }
  },
  "required": [ "x", "y" ],
  "additionalProperties": false
}

Of course, you can specify types if you want (and they’re more lightweight than schemas in traditional prompt chaining frameworks). For example:

$ grace repl
>>> prompt{ text: "Give me a first and last name" } : { first: Text, last: Text }
{ "first": "Emily", "last": "Johnson" }
>>> prompt{ text: "Give me a list of names" } : List Text
[ "Alice"
, "Bob"
, "Charlie"
, "Diana"
, "Ethan"
, "Fiona"
, "George"
, "Hannah"
, "Isaac"
, "Jack"
]

However in our original example we don’t need to specify intermediate types because when the type-checker sees this code:

let structure = prompt
        { model: "gpt-4o"
        , text: lines
            [ "Plan the structure of a new poem based on this idea:"
            , ""
            , idea
            , ""
            , "Describe its form, stanza count, lines per stanza, and rhyme scheme."
            ]
        }

let renderedStructure = lines
        [ "- Form: " + structure.form
        , "- Stanzas: " + Real/show (structure.stanzaCount : Integer)
        , "- Lines per stanza: " + Real/show (structure.linesPerStanza : Integer)
        , "- Rhyme scheme: " + structure.rhymeScheme
        ]

… the compiler can reason backwards from how the structure value is used to infer that the JSON schema for the prompt needs to be:

{ "type": "object",
  "properties": {
    "form": { "type": "string" },
    "stanzaCount": { "type": "integer" },
    "linesPerStanza": { "type": "integer" },
    "rhymeScheme": { "type": "string" }
  },
  "required": [
    "form",
    "stanzaCount",
    "linesPerStanza",
    "rhymeScheme"
    ],
  "additionalProperties": false
}

Tool use

Grace also supports generating sum types (a.k.a. tagged unions), and you can imagine using this to subsume traditional tool use frameworks.

For example, consider this Grace program:

let concatSep = https://raw.githubusercontent.com/Gabriella439/grace/refs/heads/main/prelude/text/concatSep.ffg

let call = merge
      { HttpRequest: \x -> "curl " + x.url
      , ShellCommand: \x -> concatSep " " ([ x.executable ] + x.arguments)
      }

in  List/map call (prompt{ text: "Call some tools" })

This doesn’t actually run any tools (I haven’t added any callable tools to my work-in-progress branch yet), but just renders the tool use as a string for now:

$ grace interpret ./tools.ffg
[ "curl https://api.example.com/data", "ls -l -a" ]

However, the idea is that you can model a tool as a sum type with one constructor per callable tool, and in the above example the type checker infers that the sum type representing one tool call is:

< HttpRequest: { url: Text }
| ShellCommand: { executable: Text, arguments: List Text }
>

… so the inferred type of call is:

call : < HttpRequest: …, ShellCommand: … > -> Text

… but since we List/map the call function over the output of the prompt the type checker infers that the prompt needs to generate a List of tool calls:

prompt{ text: "Call some tools" } : List < HttpRequest: …, ShellCommand:>

… and then Grace does some magic under the hood to convert that type to the equivalent JSON schema.

What’s particularly neat about this example is that the prompt is so incredibly bare (“Call some tools”) because all the information the model needs is present in the schema.

Schema-driven prompting

We can explore this idea of using the schema to drive the prompt instead of prose using an example like this:

prompt{ text: "Generate some characters for a story", model: "gpt-4o" }
  : List
    { "The character's name": Text
    , "The most memorable thing about the character": Text
    , "The character's personal arc": Text
    }
[ { "The character's name": "Aveline Thatcher"
  , "The character's personal arc":
      "Aveline starts as a skeptical journalist who doubts the stories of mythical creatures. Over time, she becomes a firm believer, risking her career to uncover the truth and protect these creatures."
  , "The most memorable thing about the character":
      "The intricate tattoo of a phoenix on her forearm that seems to glow when she discovers hidden truths."
  }
, { "The character's name": "Kelan Frost"
  , "The character's personal arc":
      "A former rogue alchemist who turns hero after he inadvertently creates a dangerous substance. Driven by guilt, Kelan seeks redemption by finding an antidote and saving his village."
  , "The most memorable thing about the character":
      "His iridescent blue eyes that seem to see into one's soul, a side effect of his alchemical experiments."
  }
, { "The character's name": "Luciana Blair"
  , "The character's personal arc":
      "Luciana is a reclusive artist who initially fears the world outside her home. After a mysterious vision rejuvenates her, she sets out on a journey of self-discovery, ultimately finding both her voice and courage."
  , "The most memorable thing about the character":
      "Her ability to paint scenes before they happen, which she attributes to the visions she sees in her dreams."
  }
, { "The character's name": "Ezra Hartman"
  , "The character's personal arc":
      "Once a charismatic but self-centered lawyer, Ezra is confronted with a moral crisis that forces him to reevaluate his values. He chooses a path of integrity, becoming an advocate for justice."
  , "The most memorable thing about the character":
      "His perfectly tailored suits that slowly become more casual, symbolizing his transformation and shifting priorities."
  }
, { "The character's name": "Seraphine Mora"
  , "The character's personal arc":
      "Seraphine is a young music prodigy who loses her hearing after an accident. Battling despair, she learns to embrace a new way of 'hearing' music through vibrations and her other senses."
  , "The most memorable thing about the character":
      "The ethereal way she 'dances' with the music, using her entire body to express each note's emotion."
  }
]

Grace is a superset of JSON and since JSON supports arbitrary field names so does Grace! Field names in Grace support arbitrary capitalization, punctuation, and whitespace as long as you quote them, and we can use the field names to “smuggle” the description of each field into the schema.

Conclusion

Hopefully this gives you some idea of why I’ve begun to think of prompt chaining as a programming languages problem. Type inference is just the beginning and I think it is possible to use a domain-specific programming language not just to simplify the code but to ultimately unlock greater reasoning power.

I’m going to continue to use Grace as a research vehicle for prompt chaining but my LLM-enabled branch of Grace (like Grace itself) is not really intended to be used in production and I created it mainly as a proof-of-concept for where I’d like prompt chaining frameworks to go. If I do end up eventually productionizing this research I will create a proper fork with its own name and the whole works.

by Gabriella Gonzalez (noreply@blogger.com) at May 02, 2025 05:56 PM

April 29, 2025

Dan Piponi (sigfpe)

Expectation-Maximization with Less Arbitrariness

Introduction

Google have stopped supporting the Chart API so all of the mathematics notation below is missing. There is a PDF version of this article at GitHub.

There are many introductions to the Expectation-Maximisation algorithm. Unfortunately every one I could find uses arbitrary seeming tricks that seem to be plucked out of a hat by magic. They can all be justified in retrospect, but I find it more useful to learn from reusable techniques that you can apply to further problems. Examples of tricks I've seen used are:

  1. Using Jensen's inequality. It's easy to find inequalities that apply in any situation. But there are often many ways to apply them. Why apply it to this way of writing this expression and not that one which is equal?
  2. Substituting a cleverly chosen expression in the middle of another expression. Again, you can make such a substitution just about anywhere. Why choose this one at this point? Similarly I found derivations that insert a carefully chosen factor into an expression.
  3. Majorisation-Minimisation. This is a great technique, but involves choosing a function that majorises another. There are so many ways to do this, it's hard to imagine any general purpose method that tells you how to narrow down the choice.

My goal is to fill in the details of one key step in the derivation of the EM algorithm in a way that makes it inevitable rather than arbitrary. There's nothing original here, I'm merely expanding on a stackexchange answer.


Generalities about EM

The EM algorithm seeks to construct a maximum likelihood estimator (MLE) with a twist: there are some variables in the system that we can't observe.


First assume no hidden variables. We assume there is a vector of parameters θ that defines some model. We make some observations x. We have a probability density p(x | θ) that depends on θ. The likelihood of θ given the observations x is L(θ) = p(x | θ). The maximum likelihood estimator for θ is the choice of θ that maximises L(θ) for the x we have observed.


Now suppose there are also some variables z that we didn't get to observe. We assume a density p(x, z | θ). We now have

p(x | θ) = Σ_z p(x, z | θ)

where we sum over all possible values of z. The MLE approach says we now need to maximise

L(θ) = Σ_z p(x, z | θ).

One of the things that is a challenge here is that the components of θ might be mixed up among the terms in the sum. If, instead, each term only referred to its own unique block of θ, then the maximisation would be easier as we could maximise each term independently of the others. Here's how we might move in that direction. Consider instead the log-likelihood

log Σ_z p(x, z | θ).

Now imagine that by magic we could commute the logarithm with the sum. We'd need to maximise

Σ_z log p(x, z | θ).

One reason this would be to our advantage is that p(x, z | θ) often takes the form exp(s(x, z, θ)) where s is a simple function to optimise. In addition, s may break up as a sum of terms, each with its own block of θ's. Moving the logarithm inside the sum would give us something we could easily maximise term by term. What's more, the p(x, z | θ) for each z is often a standard probability distribution whose likelihood we already know how to maximise. But, of course, we can't just move that logarithm in.


Maximisation by proxy

Sometimes a function is too hard to optimise directly. But if we have a guess for an optimum, we can replace our function with a proxy function that approximates it in the neighbourhood of our guess and optimise that instead. That will give us a new guess and we can continue from there. This is the basis of gradient descent. Suppose f is a differentiable function in a neighbourhood of x₀. Then around x₀ we have

f(x) ≈ f(x₀) + (x − x₀)·∇f(x₀).

We can try optimising the right hand side with respect to x within a neighbourhood of x₀. If we pick a small circular neighbourhood then the optimal value will be in the direction of steepest descent. (Note that picking a circular neighbourhood is itself a somewhat arbitrary step, but that's another story.) For gradient descent we're choosing the linear approximation because it matches both the value and derivatives of f at x₀. We could go further and optimise a proxy that shares second derivatives too, and that leads to methods based on Newton-Raphson iteration.


We want our logarithm of a sum to be a sum of logarithms. But instead we'll settle for a proxy function that is a sum of logarithms. We'll make the derivatives of the proxy match those of the original function precisely so we're not making an arbitrary choice.


Write

Q(θ) = Σ_z a_z log p(x, z | θ).

The a_z are constants we'll determine. We want to match the derivatives on either side of the ≈ at θ = θ₀:

∂/∂θ Q(θ) at θ₀ = Σ_z a_z [∂/∂θ p(x, z | θ) at θ₀] / p(x, z | θ₀).

On the other hand we have

∂/∂θ log Σ_z p(x, z | θ) at θ₀ = [Σ_z ∂/∂θ p(x, z | θ) at θ₀] / p(x | θ₀).

To achieve equality we want to make these expressions match. We choose

a_z = p(x, z | θ₀) / p(x | θ₀) = p(z | x, θ₀).

Our desired proxy function is:

Q(θ | θ₀) = Σ_z p(z | x, θ₀) log p(x, z | θ).

So the procedure is to take an estimated θ₀ and obtain a new estimate θ₁ by optimising this proxy function with respect to θ. This is the standard EM algorithm.
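
As a concrete illustration (a toy example, not part of the original derivation): for a two-component unit-variance Gaussian mixture the weights a_z are the usual responsibilities, and the proxy can be maximised in closed form, giving the familiar update rules:

    import math
    import random

    def normal_pdf(x, mu):
        # Unit-variance normal density.
        return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

    def em_step(xs, w, mu1, mu2):
        # E-step: a_z = p(z | x, current parameters) for each observation.
        r = [w * normal_pdf(x, mu1) /
             (w * normal_pdf(x, mu1) + (1 - w) * normal_pdf(x, mu2)) for x in xs]
        # M-step: maximise the proxy sum_z a_z log p(x, z | theta) in closed form.
        n1 = sum(r)
        n2 = len(xs) - n1
        mu1_new = sum(ri * x for ri, x in zip(r, xs)) / n1
        mu2_new = sum((1 - ri) * x for ri, x in zip(r, xs)) / n2
        return n1 / len(xs), mu1_new, mu2_new

    if __name__ == "__main__":
        random.seed(0)
        # Synthetic data: a 30/70 mixture of unit-variance normals at -2 and 3.
        xs = [random.gauss(-2, 1) if random.random() < 0.3 else random.gauss(3, 1)
              for _ in range(2000)]
        w, mu1, mu2 = 0.5, -1.0, 1.0
        for _ in range(50):
            w, mu1, mu2 = em_step(xs, w, mu1, mu2)
        print(w, mu1, mu2)   # the estimates should land near 0.3, -2 and 3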


It turns out that this proxy has some other useful properties. For example, because of the concavity of the logarithm, the proxy is always smaller than the original likelihood. This means that when we optimise it we never optimise "too far" and that progress optimising the proxy is always progress optimising the original likelihood. But I don't need to say anything about this as it's all part of the standard literature.


Afterword

As a side effect we have a general purpose optimisation algorithm that has nothing to do with statistics. If your goal is to compute

argmax_θ Σ_z f(z, θ)

(for non-negative f) you can iterate, at each step computing

θ_{n+1} = argmax_θ Σ_z [f(z, θ_n) / Σ_{z'} f(z', θ_n)] log f(z, θ)

where θ_n is the previous iteration. If the f(z, θ) take a convenient form then this may turn out to be much easier.


Note

This was originally written as a PDF using LaTeX. It'll be available here for a while. Some fidelity was lost when converting it to HTML.

by sigfpe (noreply@blogger.com) at April 29, 2025 09:12 PM

Running from the past


Important Note

Google have stopped supporting the Chart API so all of the mathematics notation below is missing. There is a PDF version of this article at GitHub.


Preface

Functional programming encourages us to program without mutable state. Instead we compose functions that can be viewed as state transformers. It's a change of perspective that can have a big impact on how we reason about our code. But it's also a change of perspective that can be useful in mathematics and I'd like to give an example: a really beautiful technique that allows you to sample from the infinite limit of a probability distribution without needing an infinite number of operations. (Unless you're infinitely unlucky!)



Markov Chains

A Markov chain is a sequence of random states where each state is drawn from a random distribution that possibly depends on the previous state, but not on any earlier state. So it is a sequence X_0, X_1, X_2, … such that P(X_{n+1} = x | X_0 = x_0, …, X_n = x_n) = P(X_{n+1} = x | X_n = x_n) for all n. A basic example might be a model of the weather in which each day is either sunny or rainy but where it's more likely to be rainy (or sunny) if the previous day was rainy (or sunny). (And to be technically correct: having information about two days or earlier doesn't help us if we know yesterday's weather.)


Like imperative code, this description is stateful. The state at step n+1 depends on the state at step n. Probability is often easier to reason about when we work with independent identically drawn random variables and our X_n aren't of this type. But we can eliminate the state from our description using the same method used by functional programmers.


Let's choose a Markov chain to play with. I'll pick one with 3 states called A, B and C and with transition probabilities given by P(X_{n+1} = j | X_n = i) = T_{ij} where (with rows and columns ordered A, B, C)

    T = [ 1/2  1/2   0  ]
        [ 1/3  1/3  1/3 ]
        [  0   1/2  1/2 ]


Here's a diagram illustrating our states:




Implementation

First some imports:


> {-# LANGUAGE LambdaCase #-}
> {-# LANGUAGE TypeApplications #-}


> import Data.Sequence(replicateA) > import System.Random > import Control.Monad.State > import Control.Monad > import Data.List > import Data.Array


And now the type of our random variable:


> data ABC = A | B | C deriving (Eq, Show, Ord, Enum, Bounded)


We are now in a position to simulate our Markov chain. First we need some random numbers drawn uniformly from [0, 1]:


> uniform :: (RandomGen gen, MonadState gen m) => m Double
> uniform = state random


And now the code to take a single step in the Markov chain:


> step :: (RandomGen gen, MonadState gen m) => ABC -> m ABC
> step A = do
>     a <- uniform
>     if a < 0.5
>         then return A
>         else return B
> step B = do
>     a <- uniform
>     if a < 1/3.0
>         then return A
>         else if a < 2/3.0
>             then return B
>             else return C
> step C = do
>     a <- uniform
>     if a < 0.5
>         then return B
>         else return C


Notice how the step function generates a new state at random in a way that depends on the previous state. The m ABC in the type signature makes it clear that we are generating random states at each step.


We can simulate the effect of taking n steps with a function like this:


> steps :: (RandomGen gen, MonadState gen m) => Int -> ABC -> m ABC
> steps 0 i = return i
> steps n i = do
>     i <- steps (n-1) i
>     step i


We can run for 3 steps, starting with A, with a line like so:


*Main> evalState (steps 3 A) gen
B


The starting state of our random number generator is given by gen.


Consider the distribution of states after taking n steps. For Markov chains of this type, we know that as n goes to infinity the distribution of the nth state approaches a limiting "stationary" distribution. There are frequently times when we want to sample from this final distribution. For a Markov chain as simple as this example, you can solve exactly to find the limiting distribution. But for real world problems this can be intractable. Instead, a popular solution is to pick a large n and hope it's large enough. As n gets larger the distribution gets closer to the limiting distribution. And that's the problem I want to solve here - sampling from the limit. It turns out that by thinking about random functions instead of random states we can actually sample from the limiting distribution exactly.



Some random functions


Here is a new version of our random step function:


> step' :: (RandomGen gen, MonadState gen m) => m (ABC -> ABC)
> step' = do
>     a <- uniform
>     return $ \case
>         A -> if a < 0.5 then A else B
>         B -> if a < 1/3.0
>                 then A
>                 else if a < 2/3.0 then B else C
>         C -> if a < 0.5 then B else C


In many ways it's similar to the previous one. But there's one very big difference: the type signature m (ABC -> ABC) tells us that it's returning a random function, not a random state. We can simulate the result of taking 10 steps, say, by drawing 10 random functions, composing them, and applying the result to our initial state:


> steps' :: (RandomGen gen, MonadState gen m) => Int -> m (ABC -> ABC)
> steps' n = do
>   fs <- replicateA n step'
>   return $ foldr (flip (.)) id fs


Notice the use of flip. We want to compose functions fₙ ∘ … ∘ f₂ ∘ f₁, each time composing on the left by the new f. This means that for a fixed seed gen, each time you increase n by 1 you get the next step in a single simulation: (BTW I used replicateA instead of replicateM to indicate that these are independent random draws. It may be well known that you can use Applicative instead of Monad to indicate independence but I haven't seen it written down.)


*Main> [f A | n <- [0..10], let f = evalState (steps' n) gen]
[A,A,A,B,C,B,A,B,A,B,C]


When I first implemented this I accidentally forgot the flip. So maybe you're wondering what effect removing the flip has? The effect is about as close to a miracle as I've seen in mathematics. It allows us to sample from the limiting distribution in a finite number of steps!


Here's the code:


> steps_from_past :: (RandomGen gen, MonadState gen m) => Int -> m (ABC -> ABC)
> steps_from_past n = do
>   fs <- replicateA n step'
>   return $ foldr (.) id fs


We end up building f₁ ∘ f₂ ∘ … ∘ fₙ. This is still a composition of independent identically distributed functions and so it's still drawing from exactly the same distribution as steps'. Nonetheless, there is a difference: for a particular choice of seed, steps_from_past n no longer gives us a sequence of states from a Markov chain. Running with argument n draws a random composition of n functions. But if you increase n by 1 you don't add a new step at the end. Instead you effectively restart the Markov chain with a new first step generated by a new random seed.
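
To spell out the difference in composition order (a note of mine, not from the original post):

-- With three draws f, g and h, generated in that order from the seed:
--   foldr (flip (.)) id [f, g, h]  ==  h . g . f   -- steps':          newest draw applied last
--   foldr (.)        id [f, g, h]  ==  f . g . h   -- steps_from_past: newest draw applied first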


Try it and see:


*Main> [f A | n <- [0..10], let f = evalState (steps_from_past n) gen]
[A,A,A,A,A,A,A,A,A,A,A]


Maybe that's surprising. It seems to get stuck in one state. In fact, we can try applying the resulting function to all three states.


*Main> [fmap f [A, B, C] | n <- [0..10], let f = evalState (steps_from_past n) gen]
[[A,B,C],[A,A,B],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A]]


In other words, for large enough n we get the constant function.


Think of it this way: If f isn't injective then it's possible that two states get collapsed to the same state. If you keep picking random f's it's inevitable that you will eventually collapse down to the point where all arguments get mapped to the same state. Once this happens, we'll get the same result no matter how large we take n. If we can detect this then we've found the limit of f₁ ∘ f₂ ∘ … ∘ fₙ as n goes to infinity. But because we know composing forwards and composing backwards lead to draws from the same distribution, the limiting backward composition must actually be a draw from the same distribution as the limiting forward composition. That flip can't change what probability distribution we're drawing from - just the dependence on the seed. So the value the constant function takes is actually a draw from the limiting stationary distribution.


We can code this up:


> all_equal :: (Eq a) => [a] -> Bool
> all_equal [] = True
> all_equal [_] = True
> all_equal (a : as) = all (== a) as


> test_constant :: (Bounded a, Enum a, Eq a) => (a -> a) -> Bool
> test_constant f =
>     all_equal $ map f $ enumFromTo minBound maxBound


This technique is called coupling from the past. It's "coupling" because we've arranged that different starting points coalesce. And it's "from the past" because we're essentially answering the question of what the outcome of a simulation would be if we started infinitely far in the past.


> couple_from_past :: (RandomGen gen, MonadState gen m, Enum a, Bounded a, Eq a) =>
>                   m (a -> a) -> (a -> a) -> m (a -> a)
> couple_from_past step f = do
>     if test_constant f
>         then return f
>         else do
>             f' <- step
>             couple_from_past step (f . f')


We can now sample from the limiting distribution a million times, say:


*Main> let samples = map ($ A) $ evalState (replicateA 1000000 (couple_from_past step' id)) gen


We can now count how often A appears:


*Main> fromIntegral (length $ filter (== A) samples)/1000000
0.285748


That's a pretty good approximation to 2/7 ≈ 0.2857, the exact answer that can be found by finding the eigenvector of the transition matrix corresponding to an eigenvalue of 1.
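
As a cross-check (my own sketch, not part of the original literate source), you can recover that exact value by power iteration on the transition matrix read off from the step function:

-- Transition matrix read off from `step` (rows are the current state A, B, C).
p :: [[Double]]
p = [ [1/2, 1/2, 0  ]
    , [1/3, 1/3, 1/3]
    , [0  , 1/2, 1/2] ]

-- One step of the distribution: (d P)_j = sum_i d_i * p_ij
stepDist :: [Double] -> [Double]
stepDist d = [ sum [ d !! i * p !! i !! j | i <- [0..2] ] | j <- [0..2] ]

-- Iterate until effectively converged; the first component tends to 2/7.
stationary :: [Double]
stationary = iterate stepDist [1, 0, 0] !! 1000

head stationary is 0.2857…, i.e. 2/7, agreeing with the sample frequency above.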


> gen = mkStdGen 669



Notes

The technique of coupling from the past first appeared in a paper by Propp and Wilson. The paper Iterated Random Functions by Persi Diaconis gave me a lot of insight into it. Note that the code above is absolutely not how you'd implement this for real. I wrote the code that way so that I could switch algorithm with the simple removal of a flip. In fact, with some clever tricks you can make this method work with state spaces so large that you couldn't possibly hope to enumerate all starting states to detect if convergence has occurred. Or even with uncountably large state spaces. But I'll let you read the Propp-Wilson paper to find out how.

by sigfpe (noreply@blogger.com) at April 29, 2025 09:10 PM

Constructing Clifford Algebras using the Super Tensor Product

Google have stopped supporting the Chart API so all of the mathematics notation below is missing. There is a PDF version of this article at GitHub.

Some literate Haskell but little about this code is specific to Haskell...


> {-# LANGUAGE DataKinds #-}
> {-# LANGUAGE TypeFamilies #-}
> {-# LANGUAGE TypeOperators #-}
> {-# LANGUAGE UndecidableInstances #-}
> 
> import GHC.TypeLits



Introduction

This is a followup to Geometric Algebra for Free and More Low Cost Geometric Algebra.


In those articles I showed how you could build up the Clifford algebras like so:


type Cliff1  = Complex R
type Cliff1' = Split R
type Cliff2  = Quaternion R
type Cliff2' = Matrix R
type Cliff3  = Quaternion Cliff1'
type Cliff3' = Matrix Cliff1
type Cliff4  = Quaternion Cliff2'
type Cliff4' = Matrix Cliff2
type Cliff5  = Quaternion Cliff3'
...


I used CliffN as the Clifford algebra for a negative definite inner product and CliffN' for the positive definite case. It's not a completely uniform sequence in the sense that CliffN is built from CliffN' for dimension two lower and you use a mix of Matrix and Quaternion.


The core principle making this work is that for type constructors implemented like Matrix, Quaternion etc. we have the property that

A (B Float) ≅ A Float ⊗ B Float

eg. Matrix (Quaternion Float) is effectively the same thing as Matrix Float ⊗ Quaternion Float.


But John Baez pointed out to me that you can build up the CliffN algebras much more simply enabling us to use these definitions:


> type Cliff1 = Complex Float
> type Cliff2 = Complex Cliff1
> type Cliff3 = Complex Cliff2
> type Cliff4 = Complex Cliff3
> type Cliff5 = Complex Cliff4


...


Or even better:


> type family Cliff (n :: Nat) :: * where
>   Cliff 0 = Float
>   Cliff n = Complex (Cliff (n - 1))
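
For reference, unfolding the family above gives nested applications of Complex, e.g.

Cliff 3 = Complex (Cliff 2)
        = Complex (Complex (Cliff 1))
        = Complex (Complex (Complex Float))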


But there's one little catch. We have to work, not with the tensor product, but the super tensor product.


We define Complex the same way as before:


> data Complex a = C a a deriving (Eq, Show)


Previously we used a definition of multiplication like this:


instance Num a => Num (Complex a) where
  C a b * C c d = C (a * c - b * d) (a * d + b * c) 


We can think of C a b in Complex R as representing the element 1⊗a + i⊗b of ℂ⊗R. Multiplication in a tensor product of algebras is defined by (x⊗y)(z⊗w) = (xz)⊗(yw). So we have (1⊗a + i⊗b)(1⊗c + i⊗d) = 1⊗(ac − bd) + i⊗(ad + bc).


This means that line of code we wrote above defining * for Complex isn't simply a definition of multiplication of complex numbers, it says how to multiply in an algebra tensored with the complex numbers.



Let's go Super!

A superalgebra is an algebra graded by ℤ₂, where ℤ₂ is the ring of integers modulo 2. What that means is that we have some algebra A that can be broken down as a direct sum A = A₀ ⊕ A₁ (the subscripts live in ℤ₂) with the property that multiplication respects the grading, ie. if x is in Aᵢ and y is in Aⱼ then xy is in Aᵢ₊ⱼ.


The elements of A₀ are called "even" (or bosonic) and those in A₁ "odd" (or fermionic). Often even elements commute with everything and odd elements anticommute with each other but this isn't always the case. (The superalgebra is said to be supercommutative when this happens. This is a common pattern: a thing X becomes a superX if it has odd and even parts and swapping two odd things introduces a sign flip.)


The super tensor product is much like the tensor product but it respects the grading. This means that if x is in Aᵢ and y is in Bⱼ then x⊗y is in (A⊗B)ᵢ₊ⱼ. From now on I'm using ⊗ to mean super tensor product.


Multiplication in the super tensor product of two superalgebras A and B is now defined by the following modified rule: if b is in Bᵢ and c is in Aⱼ then (a⊗b)(c⊗d) = (-1)^(ij) (ac)⊗(bd). Note that the sign flip arises when we shuffle an odd c left past an odd b.


The neat fact that John pointed out to me is that Cliffₙ₊₁ ≅ Cliffₙ ⊗ Cliff₁, where ⊗ is the super tensor product.


We have to modify our definition of * to take into account that sign flip.


I initially wrote a whole lot of code to define a superalgebra as a pair of algebras with four multiplication operations and it got a bit messy. But I noticed that the only specifically superalgebraic operation I ever performed on an element of a superalgebra was negating the odd part of an element.


So I could define SuperAlgebra like so:


class SuperAlgebra a where
  conjugation :: a -> a


where conjugation is the negation of the odd part.


(I'm not sure if this operation corresponds to what is usually called conjugation in this branch of mathematics.)


But there's a little efficiency optimization I want to write. If I used the above definition, then later I'd often find myself computing a whole lot of negates in a row. This means applying negate to many elements of large algebraic objects even though any pair of them cancel each other's effect. So I add a little flag to my conjugation function that is used to say we want an extra negate and we can accumulate flips of a flag rather than flips of lots of elements.


> class SuperAlgebra a where
>   conjugation :: Bool -> a -> a


Here's our first instance:


> instance SuperAlgebra Float where
>   conjugation False x = x
>   conjugation True x = negate x


This is saying that the conjugation is the identity on Float but if we want to perform an extra flip we can set the flag to True. Maybe I should call it conjugationWithOptionalExtraNegation.


And now comes the first bit of non-trivial superalgebra:


> instance (Num a, SuperAlgebra a) => SuperAlgebra (Complex a) where
>   conjugation e (C a b) = C (conjugation e a) (conjugation (not e) b)


We consider 1 to be even and i to be odd, so in C a b the component a sits in the even slot 1⊗a and the component b sits in the odd slot i⊗b. When we apply the conjugation to 1⊗a we can just apply it directly to a. But tensoring with the odd i flips the "parity" of b (because the super tensor product respects the grading) so we need to flip the flag when we apply the conjugation to b. And that should explain why conjugation is defined the way it is.
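
As a quick sanity check (my own example, using the two instances above):

-- In Complex Float the first component is the even part and the second
-- the odd part, so conjugation negates the second component:
--   conjugation False (C 1 2 :: Complex Float)  ==  C 1.0 (-2.0)
--   conjugation True  (C 1 2 :: Complex Float)  ==  C (-1.0) 2.0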


Now we can use the modified rule for ⊗ defined above:


> instance (Num a, SuperAlgebra a) => Num (Complex a) where
>   fromInteger n = C (fromInteger n) 0
>   C a b + C a' b' = C (a + a') (b + b')
>   C a b * C c d = C (a * c - conjugation False b * d)
>                     (conjugation False a * d + b * c) 
>   negate (C a b) = C (negate a) (negate b)
>   abs = undefined
>   signum = undefined


For example, conjugation False is applied to the first b on the RHS because d implicitly represents an i⊗d term and, when expanding out the product, we shuffle the (odd) i in i⊗d left past b, which costs a sign when b is odd. It doesn't get applied to the second b because in the b * c term nothing odd moves past b, so b and c remain in the same order with no sign. (The conjugation applied to a in the other component arises the same way: the i in i⊗d gets shuffled left past a.)


That's it!



Tests

I'll test it with some examples from Cliff3:


> class HasBasis a where
>   e :: Integer -> a


> instance HasBasis Float where
>   e = undefined


> instance (Num a, HasBasis a) => HasBasis (Complex a) where
>   e 0 = C 0 1
>   e n = C (e (n - 1)) 0


> make a b c d e f g h =
>     C (C (C a b) (C c d))
>       (C (C e f) (C g h))


> e1, e2, e3, e21, e31, e32, e321 :: Cliff 3
> e1 = e 0
> e2 = e 1
> e21 = e2 * e1
> e3 = e 2
> e31 = e3 * e1
> e32 = e3 * e2
> e321 = e3 * e2 * e1


> main = do
>   print (e1 * e1 + 1 == 0)
>   print (e31 * e31 + 1 == 0)
>   print (e3 * e3 + 1 == 0)
>   print (e21 * e21 + 1 == 0)
>   print (e2 * e2 + 1 == 0)
>   print (e32 * e32 + 1 == 0)
>   print (e321 * e321 - 1 == 0)
>   print (e3 * e2 * e1 - e321 == 0)
>   print (e2 * e1 - e21 == 0)
>   print (e3 * e1 - e31 == 0)
>   print (e3 * e2 - e32 == 0)
>   print (e21 * e32 - e31 == 0)



Observation

The implementation of multiplication looks remarkably like it's the Cayley-Dickson construction. It can't be (because iterating it three times gives you a non-associative algebra but the Clifford algebras are associative). Nonetheless, I think comparison with Cayley-Dickson may be useful.



Efficiency

As mentioned above, before I realised I just needed the conjugation operation I wrote the above code with an explicit split of a superalgebra into two pieces intertwined by four multiplications. I think the previous approach may have a big advantage - it may be possible to use variations on the well known "speed-up" of complex multiplication that uses three real multiplications instead of four. This should lead to a fast implementation of Clifford algebras.


Also be warned: you can kill GHC if you turn on optimization and try to multiply elements of high-dimensional Clifford algebras. I think it tries to inline absolutely everything and you end up with a block of code that grows exponentially with the dimension.


Note also that this code translates directly into many languages.

by sigfpe (noreply@blogger.com) at April 29, 2025 09:08 PM

April 26, 2025

Gabriella Gonzalez

Unification-free ("keyword") type checking


From my perspective, one of the biggest open problems in implementing programming languages is how to add a type system to the language without significantly complicating the implementation.

For example, in my tutorial Fall-from-Grace implementation the type checker logic accounts for over half of the code. In the following lines of code report I’ve highlighted the modules responsible for type-checking with a ‡:

$ cloc --by-file src/Grace/*.hs       

--------------------------------------------------------------------------------
File                                    blank        comment           code
--------------------------------------------------------------------------------
src/Grace/Infer.hs        ‡               499            334           1696
src/Grace/Type.hs         ‡                96             91            633
src/Grace/Syntax.hs                        61            163            543
src/Grace/Parser.hs                       166             15            477
src/Grace/Lexer.hs                         69             25            412
src/Grace/Normalize.hs                     47             48            409
src/Grace/Context.hs      ‡                72            165            249
src/Grace/Import.hs                        38              5            161
src/Grace/REPL.hs                          56              4            148
src/Grace/Interpret.hs                     30             28            114
src/Grace/Pretty.hs                        25             25            108
src/Grace/Monotype.hs     ‡                11             48             61
src/Grace/Location.hs                      16             15             60
src/Grace/TH.hs                            23             32             53
src/Grace/Value.hs                         12             53             53
src/Grace/Input.hs                         10              8             43
src/Grace/Compat.hs                         9              2             32
src/Grace/Existential.hs  ‡                12             23             25
src/Grace/Domain.hs       ‡                 4              7             20
--------------------------------------------------------------------------------
SUM:                                     1256           1091           5297
--------------------------------------------------------------------------------

That’s 2684 lines of code (≈51%) just for type-checking (and believe me: I tried very hard to simplify the type-checking code).

This is the reason why programming language implementers will be pretty keen to just not implement a type-checker for their language, and that’s how we end up with a proliferation of untyped programming languages (e.g. Godot or Nix), or ones that end up with a type system bolted on long after the fact (e.g. TypeScript or Python). You can see why someone would be pretty tempted to skip implementing a type system for their language (especially given that it’s an optional language feature) if it’s going to balloon the size of their codebase.

So I’m extremely keen on implementing a “lean” type checker that has a high power-to-weight ratio. I also believe that a compact type checker is an important foundational step for functional programming to “go viral” and displace imperative programming. This post outlines one approach to this problem that I’ve been experimenting with1.

Unification

The thing that bloats the size of most type-checking implementations is the need to track unification variables. These variables are placeholders for storing as-yet-unknown information about something’s type.

For example, when a functional programming language infers the type of something like this Grace expression:

(λx → x) true

… the way it typically works is that it will infer the type of the function (λx → x) which will be:

λx → x : α → α

… where α is a unification variable (an unsolved type). So you can read the above type annotation as saying “the type of λx → x is a function from some unknown input type (α) to the same output type (α)”.

Then the type checker will infer the type of the function’s input argument (true) which will be:

true : Bool

… and finally the type checker will combine those two pieces of information and reason about the final type like this:

  • the input to the function (true) is a Bool
  • therefore the function’s input type (α) must also be Bool
  • therefore the function’s output type (α) must also be Bool
  • therefore the entire expression’s type is Bool

… which gives the following conclusion of type inference:

(λx → x) true : Bool

However, managing unification variables like α is a lot trickier than it sounds. There are multiple unification algorithms/frameworks in the wild but the problem with all of them is that you have to essentially implement a bespoke logic programming language (with all of the complexity that entails). Like, geez, I’m already implementing a programming language and I don’t want to have to implement a logic programming language on top of that just to power my type-checker.

So there are a couple of ways I’ve been brainstorming how to address this problem and one idea I had was: what if we could get rid of unification variables altogether?

Deleting unification

Alright, so this is the part of the post that requires some familiarity/experience with implementing a type-checker. If you’re somebody new to programming language theory then you can still keep reading but this is where I have to assume some prior knowledge otherwise this post will get way too long.

The basic idea is that you start from the “Complete and Easy” bidirectional type checking algorithm which is a type checking algorithm that does use unification variables2 but is simpler than most type checking algorithms. The type checking rules look like this (you can just gloss over them):

Now, delete all the rules involving unification variables. Yes, all of them. That means that all of the type-checking judgments from Figures 9 and 10 are gone and also quite a few rules from Figure 11 disappear, too.

Surprisingly, you can still type check a lot of code with what’s left, but you lose two important type inference features if you do this:

  • you can no longer infer the types of lambda arguments

  • you can no longer automatically instantiate polymorphic code

… and I’ll dig into those two issues in more detail.
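
To give a concrete feel for what's left, here is a minimal toy sketch of a unification-free bidirectional core. This is my own illustration with made-up Expr and Type constructors, not the Grace implementation:

data Type = TBool | TNatural | TList Type | TFun Type Type deriving (Eq, Show)

data Expr
  = Var String        -- variable reference
  | Lam String Expr   -- λx → body, with no argument annotation
  | App Expr Expr     -- function application
  | Ann Expr Type     -- e : t
  | LitBool Bool

type Ctx = [(String, Type)]

-- Inference mode (⇒): note there are no unification variables anywhere.
infer :: Ctx -> Expr -> Either String Type
infer ctx (Var x)     = maybe (Left ("unbound variable: " ++ x)) Right (lookup x ctx)
infer _   (LitBool _) = Right TBool
infer ctx (Ann e t)   = t <$ check ctx e t
infer ctx (App f x)   = do
    tf <- infer ctx f
    case tf of
        TFun a b -> b <$ check ctx x a
        _        -> Left "applying something that is not a function"
infer _   (Lam _ _)   = Left "cannot infer a bare lambda; annotate it or use it where a type is expected"

-- Checking mode (⇐): a lambda is fine here because the expected type
-- supplies its argument type.
check :: Ctx -> Expr -> Type -> Either String ()
check ctx (Lam x body) (TFun a b) = check ((x, a) : ctx) body b
check ctx e t = do
    t' <- infer ctx e
    if t' == t then Right () else Left ("expected " ++ show t ++ " but got " ++ show t')

The checking mode is what lets an expected type push an argument type into a lambda, which is exactly the trade-off discussed next.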

Inferring lambda argument types

You lose the ability to infer the type of a function like this one when you drop support for unification variables:

λx → x == False

Normally, a type checker that supports unification can infer that the above function has type Bool → Bool, but (in general) a type checker can no longer infer that when you drop unification variables from the implementation.

This loss is not too bad (in fact, it’s a pretty common trade-off proposed in the bidirectional type checking literature) because you can make up for it in a few ways (all of which are easy and efficient to implement in a type checker):

  • You can allow the input type to be inferred if the lambda is given an explicit type annotation, like this:

    λx → x == False : Bool → Bool

    More generally, you can allow the input type to be inferred if the lambda is checked against an expected type (and a type annotation is one case, but not the only case, where a lambda is checked against an expected type).

    We’re going to lean on this pretty heavily because it’s pretty reasonable to ask users to provide type annotations for function definitions and also because there are many situations where we can infer the expected type of a lambda expression from its immediate context.

  • You can allow the user to explicitly supply the type of the argument

    … like this:

    λ(x : Bool) → x == False

    This is how Dhall works, although it’s not as ergonomic.

  • You can allow the input type to be inferred if the lambda is applied to an argument

    This is not that interesting, but I’m mentioning it for completeness. The reason it’s not interesting is because you won’t often see expressions of the form (λx → e) y in the wild, because they can more idiomatically be rewritten as let x = y in e.

Instantiating polymorphic code

The bigger issue with dropping support for unification variables is: all user-defined polymorphic functions now require explicit type abstraction and explicit type application, which is a major regression in the type system’s user experience.

For example, in a language with unification variables you can write the polymorphic identity function as:

λx → x

… and use it like this3:

let id = λx → x
in  (id true, id 1)

… but when you drop support for unification variables then you have to do something like this:

let id = λ(a : Type) → λ(x : a) → x
in  (id Bool true, id Natural 1)

Most programmers do NOT want to program in a language where they have to explicitly manipulate type variables in this way. In particular, they really hate explicit type application. For example, nobody wants to write:

map { x : Bool, … large record … } Bool (λr → r.x) rs

So we need to figure out some way to work around this limitation.

The trick

However, there is a solution that I believe gives a high power-to-weight ratio, which I will refer to as “keyword” type checking:

  • add a bunch of built-in functions

    Specifically, add enough built-in functions to cover most use cases where users would need a polymorphic function.

  • add special type-checking rules for those built-in functions when they’re fully saturated with all of their arguments

    These special-cased type-checking rules would not require unification variables.

  • still require explicit type abstraction when these built-in functions are not fully saturated

    Alternatively, you can require that built-in polymorphic functions are fully saturated with their arguments and make it a parsing error if they’re not.

  • still require explicit type abstraction and explicit type application for all user-defined (i.e. non-builtin) polymorphic functions

  • optionally, turn these built-in functions into keywords or language constructs

I’ll give a concrete example: the map function for lists. In many functional programming languages this map function is not a built-in function; rather it’s defined within the host language as a function of the following type:

map : ∀(a b : Type) → (a → b) → List a → List b

What I’m proposing is that the map function would now become a built-in function within the language and you would now apply a special type-checking rule when the map function is fully saturated:

Γ ⊢ xs ⇒ List a   Γ ⊢ f ⇐ a → b
───────────────────────────────
Γ ⊢ map f xs ⇐ List b

In other words, we’re essentially treating the map built-in function like a “keyword” in our language (when it’s fully saturated). Just like a keyword, it’s a built-in language feature that has special type-checking rules. Hell, you could even make it an actual keyword or language construct (e.g. a list comprehension) instead of a function call.

I would even argue that you should make each of these special-cased builtin-functions a keyword or a language construct instead of a function call (which is why I call this “keyword type checking” in the first place). When viewed through this lens the restrictions that these polymorphic built-in functions (A) are saturated with their arguments and (B) have a special type checking judgment are no different than the restrictions for ordinary keywords or language constructs (which also must be saturated with their arguments and also require special type checking judgments).

To make an analogy, in many functional programming languages the if/then/else construct has this same “keyword” status. You typically don’t implement it as a user-space function of this type:

ifThenElse : ∀(a : Type) → Bool → a → a → a

Rather, you define if as a language construct and you also add a special type-checking rule for if:

Γ ⊢ b ⇐ Bool   Γ ⊢ x ⇒ a   Γ ⊢ y ⇐ a
────────────────────────────────────
Γ ⊢ if b then x else y ⇒ a

… and what I’m proposing is essentially greatly exploding the number of “keywords” in the implementation of the language by turning a whole bunch of commonly-used polymorphic functions into built-in functions (or keywords, or language constructs) that are given special type-checking treatment.

For example, suppose the user were to create a polymorphic function like this one:

let twice = λ(a : Type) → λ(x : a) → [ x, x ]

in  twice (List Bool) (twice Bool true)

That’s not very ergonomic to define and use, but we also can’t reasonably expect our programming language to provide a twice built-in function. However, our language could provide a generally useful replicate builtin function (like Haskell’s replicate function):

replicate : ∀(a : Type) → Natural → a → List a

… with the following type-checking judgment:

Γ ⊢ n ⇐ Natural   Γ ⊢ x ⇒ a
───────────────────────────
Γ ⊢ replicate n x ⇒ List a

… and then you would tell the user to use replicate directly instead of defining their own twice function:

replicate 2 (replicate 2 true)

… and if the user were to ask you “How do I define a twice synonym for replicate 2” you would just tell them “Don’t do that. Use replicate 2 directly.”
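
To make the special-casing concrete, here is how saturated map and replicate forms might look as extra rules bolted onto the toy checker sketched earlier. The MapE and ReplicateE constructors and the helper names are hypothetical illustrations, not Grace's actual code:

-- Assumes the Type, Ctx, infer and check definitions from the earlier sketch,
-- plus two fully saturated "keyword" constructors added to Expr:
--   | MapE Expr Expr          -- map f xs
--   | ReplicateE Expr Expr    -- replicate n x

-- Γ ⊢ xs ⇒ List a    Γ ⊢ f ⇐ a → b
-- ──────────────────────────────────
-- Γ ⊢ map f xs ⇐ List b
checkMap :: Ctx -> Expr -> Expr -> Type -> Either String ()
checkMap ctx f xs (TList b) = do
    txs <- infer ctx xs
    case txs of
        TList a -> check ctx f (TFun a b)
        other   -> Left ("map expects a list, but its argument has type " ++ show other)
checkMap _ _ _ other = Left ("map produces a list, but " ++ show other ++ " was expected")

-- Γ ⊢ n ⇐ Natural    Γ ⊢ x ⇒ a
-- ─────────────────────────────
-- Γ ⊢ replicate n x ⇒ List a
inferReplicate :: Ctx -> Expr -> Expr -> Either String Type
inferReplicate ctx n x = do
    check ctx n TNatural
    TList <$> infer ctx x

These would be wired in as one extra clause each on check and infer (e.g. check ctx (MapE f xs) t = checkMap ctx f xs t), and the error messages can be as specific as you like because each keyword gets its own rule.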

Conclusion

This approach has the major upside that it’s much easier to implement a large number of keywords than it is to implement a unification algorithm, but there are other benefits to doing this, too!

  • It discourages complexity and fragmentation in user-space code

    Built-in polymorphic functions have an ergonomic advantage over user-defined polymorphic functions because under this framework type inference works better for built-in functions. This creates an ergonomic incentive to stick to the “standard library” of built-in polymorphic functions, which in turn promotes an opinionated coding style across all code written in that language.

    You might notice that this approach is somewhat similar in spirit to how Go handles polymorphism which is to say: it doesn’t handle user-defined polymorphic code well. For example, Go provides a few built-in language features that support polymorphism (e.g. the map data structure and for loops) but if users ask for any sort of user-defined polymorphism then the maintainers tell them they’re wrong for wanting that. The main difference here is that (unlike Go) we do actually support user-defined polymorphism; it’s not forbidden, but it is less ergonomic than sticking to the built-in utilities that support polymorphism.

  • It improves error messages

    When you special-case the type-checking logic you can also special-case the error messages, too! With general-purpose unification the error message can often be a bit divorced from the user’s intent, but with “keyword type checking” the error message is not only more local to the problem but it can also suggest highly-specific tips or fixes appropriate for that built-in function (or keyword or language construct).

  • It can in some cases more closely match the expectations of imperative programmers

    What I mean is: most programmers coming from an imperative and typed background are used to languages where (most of the time) polymorphism is “supported” via built-in language constructs and keywords and user-defined polymorphism might be supported but considered “fancy”. Leaning on polymorphism via keywords and language constructs would actually make them more comfortable using polymorphism instead of trying to teach them how to produce and consume user-defined polymorphic functions.

    For example, in a lot of imperative languages the idiomatic solution for how to do anything with a list is “use a for loop” where you can think of a for loop as a built-in keyword that supports polymorphic code. The functional programming equivalent of “just use a for loop” would be something like “just use a list comprehension” (where a list comprehension is a “keyword” that supports polymorphic code that we can give special type checking treatment).

That said, this approach is still more brittle than unification and will require more type annotations in general. The goal here isn’t to completely recover the full power of unification but rather to get something that’s not too bad but significantly easier to implement.

I think this “keyword type checking” can potentially occupy a “low tech” point in the type checking design space for functional programming languages that need to have efficient and compact implementations (e.g. for ease of embedding). Also, this can potentially provide a stop-gap solution for novice language implementers that want some sort of a type system but they’re not willing to commit to implementing a unification-based type system.

There’s also a variation on this idea which Verity Scheel has been exploring, which is to provide userland support for defining new functions with special type-checking rules and there’s a post from her outlining how to do that:

User Operators with Implicits & Overloads


  1. The other approach is to create essentially an “ABNF for type checkers” that would let you write type-checking judgments in a standard format that could generate the corresponding type-checking code in multiple languages. That’s still a work-in-progress, though.↩︎

  2. I believe some people might take issue with calling these unification variables because they consider bidirectional type checking as a distinct framework from unification. Moreover, in the original bidirectional type checking paper they’re called “unsolved” variables rather than unification variables. However, I feel that for the purpose of this post it’s still morally correct to refer to these unsolved variables as unification variables since their usage and complexity tradeoffs are essentially identical to unification variables in traditional unification algorithms.↩︎

  3. … assuming let expressions are generalized.↩︎

by Gabriella Gonzalez (noreply@blogger.com) at April 26, 2025 07:49 PM

Matthew Sackman

Rallentando, and the awfulness of anti-virus software

Since I was a child, I’ve been playing the French Horn. I still play, and I take it quite seriously. I’m lucky enough to play with some good ensembles, and I perform many concerts each year.

When learning difficult music, I often practise with a metronome. A metronome is a device that clicks or beeps regularly. You can set how often it clicks; for example you might set it to click 80 times a minute. The tempo (or speed) of a piece of music is often specified by the composer telling you how many beats per minute they want. This is guidance and not sacrosanct: you don’t normally have to play at exactly this tempo, and frequently music requires some implicit variation of tempo for it to be successful. But it certainly is an important piece of information from the composer, telling you at what basic speed the piece should be going.

The problem with metronomes is that they can’t change their speed by themselves. For complex music, the speed could be changing a lot, or, maybe the number of beats per bar changes. This is annoying because it means for some pieces of music you have to keep stopping playing, readjust the metronome, and then continue on. There are also gradual changes in tempo in music: a part of a piece might gently get faster or slower. No metronome can cope with this: essentially, metronomes know nothing about the piece of music you’re playing.

So in some spare time this year, I built one that does understand music.

It does nearly all the musical things you would want it to. As this is mainly a technical blog though, here I’ll focus on those aspects.

It could have been an app; I’ve built apps before. But the faff of getting onto the play store, or the app store is just not worth it. The development tools are heavyweight and annoying. Sending out new versions requires approval processes, and you have to convince people to install something before they can use it. So I wanted this to be browser based. Also, modern web browsers are pretty amazing – great features and well designed APIs. Yep, all the old APIs are terrible and awful to work with, but everything that’s at all recent is pretty great.

Drawing music in a browser is quite a challenge though. The way I’m doing it is I’m building an SVG, client-side. This was the first thing I started on: trying to figure out how to draw music in a browser, how to be able to detect clicks, and make it all interactive. The client side code is all generated from TypeScript using the plain tsc to do the translation to JavaScript. I can’t stand complex tool-chains, and modern browsers are absolutely fine with loading modules (and you can do some really nice things with import maps as we’ll see). I’m not even minimising the JavaScript: I’ve written the server myself; the modules are sent over the wire gzipped and I have correct cache-control settings using immutable and “cache-busting”, so minimising the source just makes debugging life harder for no real gain.

A score is essentially a list of blocks. I’m using a CRDT (the fugue list CRDT) to allow local-first editing (and even offline editing). Dirty blocks get sent over a websocket and stored on the server, using LMDB which is all very normal for me.

The server has a neat part of its design: when you compile the server, all the static assets are embedded into the binary, thus making it a single self-contained executable. Now those assets (HTML, CSS, images, JavaScript etc) are just normally named files, but they can also be Go templates. When the server starts up, it works through these static assets, building HTTP routes for them. The HTTP routes contain in their paths the hashcode of the file – this is necessary for the cache busting. If the asset is a template, the server knows how to run the template, and critically, I provide a url function in the template engine so that templates can get the URL of some other asset including its hashcode. So this means that if some HTML file needs to link to some CSS file, the HTML file as built into the server can be a template. At start up, this template gets run, it can invoke this url function, and it can find out the final URL of the CSS file. And of course this URL now influences the hashcode of the HTML file itself. This also plays very nicely with the integrity attribute you can put on all sorts of things these days.

So it all works out rather nicely: if you consider the tree of file A importing files B and C, and file B imports file D, then if I make some change to file D, then it means its hashcode changes, and so its URL changes. This propagates up to file B, and from there to file A (but no change to file C). So it’s safe to serve all these static assets with immutable cache-control headers and rely on this lovely hashcode chaining. All of this work is done once, each time the server starts-up. And it’ll all explode and stop working if there’s ever a cycle in the graph of file imports.

Now in practice, it seems that references between HTML, CSS, images, or JavaScript don’t seem to create cycles – at least I’ve not had a problem so far. But between JavaScript modules, it’s much more common, as you’d likely expect. But here, import maps come to the rescue: in my TypeScript/JavaScript, I just import modules normally. I have a function in the template engine which knows how to generate an import-map of all my JavaScript modules, which gets injected into the top HTML page. This import-map provides both the rewriting of paths (to add hashcodes onto the paths), and also provides the integrity section. This solves the problem of circular imports because it means the JavaScript itself never needs to contain the hashcode of any module it imports. Yet, if I change some JavaScript module, then its hashcode changes, which means the import-map changes, and so again, the browser is forced into correctly fetching the updated resource.


A couple of weekends ago, I was up visiting my parents and I wanted to demonstrate this thing to them (they’re also musicians). They have computers running Windows. I tried loading up a demo score, and it just didn’t work. Their browsers were up to date. Nothing of note in the server logs, so I opened up the browser console and found errors from deserialisation of data coming over the websocket: it was claiming the data was corrupted. I’d never seen this in my own development and use.

Checking a few other things, and I spotted that the source HTML for the page had had some additional <script> elements added to it: something was injecting some JavaScript. And then the penny dropped: this is MITM behaviour by some shitty anti-virus software – in this case, AVG. Some quick web searching, and yep, those products are also known for dicking around with websocket traffic: if you’re sending binary messages and you’re compressing the stream, it’s apparently quite common that the anti-virus software intercepts the traffic, and then screws up the framing leading your own code to face corrupted data. Completely ridiculous.

In my case, disabling compression on the websocket was enough to prevent the corruption, and I then established that even for big scores, the initial load would be maybe 25kB of data over the websocket, so not compressing it isn’t terrible.

What made me laugh though was this: the browser console was telling me both about the corrupted data, and also about the fact the browser was refusing to run some script, due to it violating CSP settings. It took me a moment to realise that the script that wasn’t being run, was the script that the anti-virus software was injecting into my pages! Now, import-maps can’t be external files, they have to be inline in the HTML. But there’s no way I’m serving HTML pages with a CSP header with script-src 'unsafe-inline'. Instead, I’m issuing a script-src CSP header with 'self' and also the hashcode of the import-map itself.

What this says to the browser is that it can trust the import-map (because hashing it will give a hashcode that matches the CSP header), the import-map itself has its integrity for every JavaScript module it needs to load, and the CSP headers also tell the browser that it’s OK to load JavaScript modules from the same domain (this is the 'self'). But, this does not give permission for the browser to run arbitrary bits of crap JavaScript that some awful anti-virus thing has injected! So, by making use of CSP and import-maps, you can defeat attackers from tampering with your website and code!

April 26, 2025 05:30 PM

April 24, 2025

Chris Penner

3 things other languages should steal from Unison


New languages are coming out all the time, some experimental, some industrial, others are purpose built for a specific domain. No single language has the people-power or scope to try every cool new feature, so a critical step in designing a new language is to observe how experimental features have borne themselves out in practice.

As the saying goes, good [language designers] copy, great [language designers] steal.

If you've heard anything about the Unison Language it's not a surprise to you that it innovates in many areas. Unison very much tries to reinvent Human-Compiler interactions for the 21st century, and in that pursuit has spawned a fully integrated ecosystem between the compiler, codebase-manager, language server, version control and package manager.

While some of these features are still too new to have proven their worth (but we have our fingers crossed); there are aspects that I think new languages should certainly consider as part of their designs.

A Fully Interactive and Incremental Compiler

With the modern era of language servers and programming assistants, developers greatly benefit from instant feedback on their work. With traditional batch compilers it's all too tempting to go for a coffee, or a walk, or a YouTube binge every time you kick off a big build. The context-switching induced by switching tasks while compiling wastes developer time by paging things in and out of their working memory, not to mention: it just feels bad. After the build finishes, the developer is left with a giant wall of text, sentenced to dig through a large list of compiler errors trying to find some root-cause error in the file they're working on.

Unison has a fully interactive compilation experience. The language-server is typechecking your scratch-file on every keystroke providing error feedback right in your editor, and offering helpful information via hover-hints which use your codebase and typechecking info to help you orient yourself. It can even partially typecheck the file to suggest which types or operators you may want to fill into a given slot.

Once you're happy with a chunk of code, you can check it in to the codebase and it won't be compiled again unless you want to change it, or an update is automatically propagated into it from a downstream change.

While most languages won't adopt Unison's scratch-file and codebase model; having an interactive compiler with good support for caching of already-compiled-assets is a huge boon to productivity in any language.

On the topic of the language server, Unison's language server is built directly into the compiler. This ensures we avoid the awkward disagreements between the LSP and compiler that sometimes happen in other languages. It can also help to avoid duplicate work, many languages are running the compiler independently and in their LSP at the same time without sharing any of the work between them, causing redundant work and a waste of precious resources.

Codebase API

It's the compiler's job to understand your code intimately. It knows exactly how every definition is linked together, even if you don't! In many languages it can be frustrating to know that this information exists deep within the compiler, but not having any access to it yourself!

Unison stores all your code as structured data within your codebase and exposes the ability for you to ask it useful questions about your code, exposing that precious understanding to you as a developer.

Unison allows searching by type, finding the dependencies of a definition, or inverting that relationship to finding all definitions which depend on a definition.

Via the UCM CLI you can use utilities like text.find to search only string constants, or find to search only definition names.

Some codebase data is provided via an API which is exposed from the interactive UCM compiler, allowing developers to write tooling to customize their workflow. For example, check out this VS Code plugin someone wrote to view codebase definitions in the sidebar. In other languages you'd typically need to write a scrappy Regex or re-compile the code in a subprocess in order to achieve something similar.

It doesn't have to be an API, it could be a parquet file or a SQLite database or any number of things, the important part is that a language exposes its one-true-source of information about the codebase in some structured format for third-party tools to build upon.

Smart docs

It doesn't matter how great your language's package ecosystem is if nobody can figure out how to use it! Documentation is critical for helping end users understand and use functionality in your language, but it has a fatal flaw: documentation isn't compiled and falls out of date with the code.

In Unison, docs are a data-type within the language itself. This means that docs can be generated dynamically by running Unison code! We've leveraged this ability to enable embedding typechecked runnable code examples into your docs. These examples are compiled alongside the rest of your program, so they're guaranteed to be kept up to date, and the outputs from your example code are run and updated whenever the source definitions change.

You can also write code which generates documentation based on your real application code. For example, you could write code which crawls your web-server's implementation and collects all the routes and parameters the server defines and displays them nicely as documentation.

Unison goes one step further here by providing special support for the documentation format on Unison Share, ensuring any definitions mentioned in docs and code examples are hyper-linked to make for a seamless package-browsing experience.

As an example of how far this can go, check out this awesome project by community contributor Alvaro which generates mermaid graphs in the docs representing the behaviour of simulations. The graphs are generated from the same underlying library code so they won't go out of date.

Get stealing

This subset of topics doesn't touch on Unison's ability system, continuation capturing, or code serialization so I'll probably need at least a part 2!

Hopefully you learned something 🤞! Did you know I'm currently writing a book? It's all about Lenses and Optics! It takes you all the way from beginner to optics-wizard and it's currently in early access! Consider supporting it, and more posts like this one, by pledging on my Patreon page! It takes quite a bit of work to put these things together; if I managed to teach you something or even just entertain you for a minute or two maybe send a few bucks my way for a coffee? Cheers!

Become a Patron!

April 24, 2025 12:00 AM

April 23, 2025

Haskell Interlude

64: Sandy Maguire

In this episode Mike Sperber and Niki Vazou talk with Sandy Maguire, lead compiler engineer at Manifold Valley. They talk about the benefits of using Haskell of course, about all the books Sandy has written, on effects and the problem with monads, on combinator libraries and programming with laws.

by Haskell Podcast at April 23, 2025 10:00 PM

April 18, 2025

GHC Developer Blog

GHC 9.10.2-rc1 is now available


bgamari - 2025-04-18

The GHC developers are very pleased to announce the availability of the release candidate for GHC 9.10.2. Binary distributions, source distributions, and documentation are available at downloads.haskell.org and via GHCup.

GHC 9.10.2 is a bug-fix release fixing over 50 issues of a variety of severities and scopes. A full accounting of these fixes can be found in the release notes. As always, GHC’s release status, including planned future releases, can be found on the GHC Wiki status page.

This release candidate will have a two-week testing period. If all goes well the final release will be available the week of 1 May 2025.

We would like to thank Well-Typed, Tweag I/O, Juspay, QBayLogic, Channable, Serokell, SimSpace, the Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprise this release.

As always, do give this release a try and open a ticket if you see anything amiss.

by ghc-devs at April 18, 2025 12:00 AM

April 12, 2025

Derek Elkins

What difference lists actually are

Introduction

Purely functional list concatenation, xs ++ ys in Haskell syntax, is well known to be linear time in the length of the first input and constant time in the length of the second, i.e. xs ++ ys is O(length xs). This leads to quadratic complexity if we have a bunch of left associated uses of concatenation.

The ancient trick to resolve this is to, instead of producing lists, produce list-to-list functions a la [a] -> [a] or ShowS = String -> String = [Char] -> [Char]. “Concatenation” of “lists” represented this way is just function composition which is a constant time operation. We can lift a list xs to this representation via the section (xs ++). This will still lead to O(length xs) amount of work to apply this function, but a composition of such functions applied to a list will always result in a fully right associated expression even if the function compositions aren’t right associated.
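
Concretely, the technique looks like this in Haskell; this is a minimal sketch and the names are mine:

newtype DString = DString (String -> String)

fromString :: String -> DString
fromString s = DString (s ++)

toString :: DString -> String
toString (DString f) = f ""

-- "Concatenation" is just function composition, a constant-time operation.
append :: DString -> DString -> DString
append (DString f) (DString g) = DString (f . g)

-- toString (fromString "foo" `append` fromString "bar")  ==  "foobar"

append is constant time no matter how the appends associate; the O(length) work is only paid once, when toString finally applies the composed function to the empty string.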

In the last several years, it has become popular to refer to this technique as “difference lists”. Often no justification is given for this name. When it is given, it is usually a reference to the idea of difference lists in logic programming. Unfortunately, other than both techniques giving rise to efficient concatenation, they have almost no similarities.

Functional Lists

To start, I want to do a deeper analysis of the “functional lists” approach, because I think what it is doing is a bit misunderstood and, consequently, oversold1. Let’s see how we would model this approach in an OO language without higher-order functions, such as early Java. I’ll use strings for simplicity, but it would be exactly the same for generic lists.

interface PrependTo {
  String prependTo(String end);
}

class Compose implements PrependTo {
  private PrependTo left;
  private PrependTo right;
  public Compose(PrependTo left, PrependTo right) {
    this.left = left; this.right = right;
  }
  public String prependTo(String end) {
    return this.left.prependTo(this.right.prependTo(end));
  }
}

class Prepend implements PrependTo {
  private String s;
  public Prepend(String s) { this.s = s; }
  public String prependTo(String end) {
    return this.s + end;
  }
}

This is just a straight, manual implementation of closures for (.) and (++) (specialized to strings). Other lambdas not of the above two forms would lead to other implementations of PrependTo. Let's say, however, that these are the only two forms that actually occur (which is mostly true in Haskell practice). Then another view on this OO code (to escape back to FP) is that it is an OOP encoding of the algebraic data type:

data PrependTo = Compose PrependTo PrependTo | Prepend String

prependTo :: PrependTo -> String -> String
prependTo (Compose left right) end = prependTo left (prependTo right end)
prependTo (Prepend s) end = s ++ end

We could have also arrived at this by defunctionalizing a typical example of the technique. Modulo some very minor details (that could be resolved by using the Church-encoded version of this), this does accurately reflect what’s going on in the technique. Compose is clearly constant time. Less obviously, applying these functional lists requires traversing this tree of closures – made into an explicit tree here. In fact, this reveals that this representation could require arbitrarily large amounts of work for a given size of output. This is due to the fact that prepending an empty string doesn’t increase the output size but still increases the size of the tree. In practice, it’s a safe assumption that, on average, at least one character will be prepended per leaf of the tree which makes the overhead proportional to the size of the output.

This tree representation is arguably better than the “functional list” representation. It’s less flexible for producers, but that’s arguably a good thing because we didn’t really want arbitrary String -> String functions. It’s more flexible for consumers. For example, getting the head of the list is a relatively efficient operation compared to applying a “functional list” and taking the head of the result even in an eager language. (Laziness makes both approaches comparably efficient.) Getting the last element is just the same for the tree version, but, even with laziness, is much worse for the functional version. More to the point, this concrete representation allows the concatenation function to avoid adding empty nodes to the tree whereas (.) can’t pattern match on whether a function is the identity function or not.
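
For example, a concatenation on this concrete representation can simply skip empty chunks, which (.) on opaque functions cannot do. A sketch, reusing the PrependTo type above:

catPrependTo :: PrependTo -> PrependTo -> PrependTo
catPrependTo (Prepend "") r = r
catPrependTo l (Prepend "") = l
catPrependTo l r            = Compose l r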

This view makes it very clear what the functional version is doing.

Difference Lists in Prolog

List append is the archetypal example of a Prolog program due to the novelty of its “invertibility”.

append([], Ys, Ys).
append([X|Xs], Ys, [X|Zs]) :- append(Xs, Ys, Zs).

For our purposes, viewing this as a function of the first two arguments, this is exactly the usual functional implementation of list concatenation with exactly the same problems. We could, of course, encode the defunctionalized version of the functional approach into (pure) Prolog. This would produce:

prepend_to(compose(Xs, Ys), End, Zs) :- prepend_to(Ys, End, End2), prepend_to(Xs, End2, Zs).
prepend_to(prepend(Xs), End, Zs) :- append(Xs, End, Zs).

(I’ll be ignoring the issues that arise due to Prolog’s untyped nature.)

However, this being a logic programming language means we have additional tools available to use that functional languages lack. Namely, unification variables. For an imperative (destructive) implementation of list concatenation, the way we’d support efficient append of linked lists is we’d keep pointers to the start and end of the list. To append two lists, we’d simply use the end pointer of the first to update the end of the first list to point at the start of the second. We’d then return a pair consisting of the start pointer of the first and the end pointer of the second.

This is exactly how Prolog difference lists work, except instead of pointers, we use unification variables, which are more principled. Concretely, we represent a list as a pair of lists, where the second list is an unbound unification variable and the first list contains that same unification variable as a suffix. This pair is conventionally written with the infix operator (“functor” in Prolog terminology) -, e.g. Xs - Ys; we could just as well use diff(Xs, Ys) or some other name. The - carries no special meaning here; it’s essentially just a binary constructor.

At the level of logic, there are no unification variables. The constraints above mean that Xs - Ys is a list Xs which contains Ys as a suffix.

The name “difference list” is arguably motivated by the definition of concatenation in this representation.

concat(Xs - Ys, Ys - Zs, Xs - Zs).

This looks a lot like \((Xs - Ys) + (Ys - Zs) = Xs - Zs\). If the suffix component of the first argument is unbound, as it’s supposed to be, then this is a constant-time operation: we just bind that component to Ys. If it is bound, then we need to unify, which, in the worst case, is O(length Ys), where the length is measured up to either nil or an unbound variable tail [2].

We also have the unit of concat, i.e. the empty list, via [3]:

empty(Xs - Xs).

See the footnote, but this does in some way identify Xs - Ys with the “difference” of Xs and Ys.

We get back to a “normal” list via:

to_list(Xs - [], Xs).

% or more generally,
prepend_to(Xs - Ys, Ys, Xs).

to_list is a constant-time operation, no matter what. Note that to_list binds the suffix component of the difference list. This means that the first input no longer meets our condition to be a difference list. In other words, to_list (and prepend_to) consumes the difference list. More precisely, it constrains which suffixes the list can have. Indeed, any operation that binds the suffix component of a difference list consumes it. For example, concat consumes its first argument.

Of course, it still makes logical sense to work with the difference list when its suffix component is bound, it’s just that its operational interpretation is different. More to the point, given a difference list, you cannot prepend it (via prepend_to or concat) to two different lists to get two different results.

Converting from a list does require traversing the list since we need to replace the nil node, i.e. [], with a fresh unbound variable. Luckily, this is exactly what append does.

from_list(Xs, Ys - Zs) :- append(Xs, Zs, Ys).

from_list also suggests this “difference list” idea. If all of Xs, Ys, and Zs are ground terms, then from_list(Xs, Ys - Zs) holds whenever append(Xs, Zs, Ys) holds, and exactly when it holds if our invariant is maintained, i.e. that Zs is a suffix of Ys. Writing these relations more functionally and writing append as addition, we’d have:

\[\mathtt{from\_list}(Xs) = Ys - Zs \iff Xs + Zs = Ys\]

If we did want to “duplicate” a difference list, we’d essentially need to convert it to a (normal) list with to_list, and then we could use from_list multiple times on that result. This would, of course, still consume the original difference list. We’d also be paying O(length Xs) for every duplicate, including the one needed to replace the difference list we just consumed [4].

That said, we can prepend a list to a difference list without consuming the difference list. We can perform other actions at the risk of (partially) consuming the list, e.g. indexing into it. Indexing into the list would force the list to be at least a certain length, but still allow prepending to any list that will result in a final list at least that long.

Comparison

I’ll start the comparison with a massive discrepancy that we will ignore going forward. Nothing enforces that a value of type ShowS actually just appends something to its input. We could use abstract data type techniques or the defunctionalized version to avoid this. To be fair, difference lists also need an abstraction barrier to ensure their invariants, though their failure modes are different. A difference list can’t change what it is based on what it is prepended to.
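
To make the abstraction-barrier point concrete, here is a minimal sketch of hiding the functional representation behind a newtype whose constructor is not exported (the module and names are hypothetical, mine rather than the post’s; the dlist package on Hackage does essentially this, with many more operations). Clients can then only build values that genuinely prepend a fixed string:

-- Because the S constructor stays inside this module, every value a client
-- can construct is of the form S (xs ++) for some fixed xs.
module AbstractShowS (S, fromString, append, toString) where

newtype S = S (String -> String)

fromString :: String -> S
fromString xs = S (xs ++)

append :: S -> S -> S
append (S f) (S g) = S (f . g)

toString :: S -> String
toString (S f) = f ""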

Functional Representation | Difference Lists
constant-time concatenation | constant-time concatenation
constant-time conversion from a list (though you pay for it later) | O(n) conversion from a list
persistent | non-persistent, requires linear use
represented by a tree of closures | represented by a pair of a list and a unification variable
O(n) (or worse!) conversion to a list | constant-time conversion to a list
defunctionalized version can be implemented in pretty much any language | requires at least single-assignment variables
unclear connection to being the difference of two lists (which two lists?) | mathematical, if non-obvious, connection to being the difference of two (given) lists


As an illustration of the difference between persistent and non-persistent uses, the function:

double f = f . f

is a perfectly sensible function on ShowS values that behaves exactly as you’d expect. On the other hand:

double(In, Out) :- concat(In, In, Out).

is nonsense that will fail the occurs check (if it is enabled; otherwise it will create a cyclic list), except when In is the empty difference list.

Conclusion

I hope I’ve illustrated that the functional representation is not just not difference lists, but is, in fact, wildly different from difference lists.

This functional representation is enshrined in Haskell via the ShowS type and related functions, but I’d argue the concrete tree representation is actually clearer and better. The functional representation is more of a cute trick that allows us to reuse existing functions. Really, ShowS should have been an abstract type.

Difference lists are an interesting example of how imperative ideas can be incorporated into a declarative language. That said, difference lists come with some of the downsides of an imperative approach, namely the lack of persistence.

As far as I’m aware, there isn’t an unambiguous and widely accepted name for this functional representation. Calling it “functional lists” or something like that is, in my opinion, very ambiguous and potentially misleading. I think the lack of a good name for this is why “difference lists” started becoming popular. As I’ve argued, using “difference list” in this context is even more misleading and confusing.

If people really want a name, one option might be “delta list”. I don’t think this term is used. It keeps the intuitive idea that the functional representation represents some “change” to a list, a collection of deltas that will all be applied at once, but it doesn’t make any false reference to difference lists. I’m not super into this name; I just want something that isn’t “difference list” or otherwise misleading.


  1. To be clear, it’s still much, much, better than using plain concatenation.↩︎

  2. Such a length relation couldn’t be written in pure Prolog but can in actual Prolog.↩︎

  3. For those algebraically minded, this almost makes concat and empty into another monoid except concat is partial, but such a partial monoid is just a category! In other words, we have a category whose objects are lists and whose homsets are, at most, singletons containing Xs - Ys for Hom(Xs, Ys). If we maintain our invariant that we have Xs - Ys only when Ys is a suffix of Xs, this thin category is exactly the category corresponding to the reflexive, transitive “has suffix” relation. We could generalize this to any monoid via a “factors through” relation, i.e. \(\mathrm{Hom}(m, n)\) is inhabited if and only if \(\exists p.\, m = pn\), which you can easily prove is a reflexive, transitive relation given the monoid axioms. However, for a general monoid, we can have a (potentially) non-thin category by saying \(p \in \mathrm{Hom}(m, n)\) if and only if \(m = pn\). The category will be thin if and only if the monoid is cancellative. This is exactly the slice category of the monoid viewed as a one-object category.↩︎

  4. Again, in actual Prolog, we could make a duplicate without consuming the original, though it would still take O(length Xs) time using the notion of length mentioned before.↩︎

April 12, 2025 11:25 PM

April 04, 2025

Edward Z. Yang

Why you should maintain a personal LLM coding benchmark

Do you use an LLM for coding? Do you maintain a personal benchmark based on problems you have posed the LLM? The purpose of this blog post is to convince you that you should: you can do so with marginal effort on top of your day-to-day vibe coding, and you will get both short- and long-term benefits from making your own personal benchmark exist.


I started thinking about coding benchmarks in part out of frustration with the discourse around LLMs in the public squares I frequent (Reddit and Twitter). People often want to know "what's the best model?" or "what's the best coding IDE?" One might imagine that the way to answer these questions would be to test the models on a variety of problems drawn from real-world uses of LLMs for coding, and then compare how well various systems do. Indeed, whenever a new SOTA model is released, the lab will usually tell you about the model's performance against a few well-known coding benchmarks. Problem solved?

https://blog.ezyang.com/wp-content/uploads/2025/03/Screenshot-2025-03-31-at-10.10.14%E2%80%AFAM.png

Of course not! In fact, for the most part, no one really talks about benchmarks when comparing models. Why? I argue that the most popular benchmarks measure tasks that are largely different from what a user wants out of an LLM. For example, take the recent Gemini 2.5 Pro release. In their headline table, they test against LiveCodeBench, Aider Polyglot and SWE-bench Verified. LiveCodeBench and Aider Polyglot derive their problems from contest programming and pedagogical exercises, respectively, while SWE-bench assesses bug fixes to preexisting codebases. These are useful, but they cover only a small slice of the things people want to do with LLMs.

Wouldn't it be great if you had your own, personal benchmark, based on problems you actually care about? If you are tweaking your .cursorrules, you could run your benchmark to see if a change you made helped or not. When a new model comes out, you could spend a few bucks to run your eval and make a decision if you should switch your daily driver. And then on social media, if you wanted to stan the new model, instead of asking the model to drop a ball inside a rotating hexagon or vagueposting about how the new model is incredible, you could just post your benchmark results.


Nicholas Carlini's Yet Another Applied LLM Benchmark is an existence proof that this playbook can work. As Nicholas describes it:

It's a collection of nearly 100 tests I've extracted from my actual conversation history with various LLMs.

There are two defining features of this benchmark that make it interesting. Most importantly, I've implemented a simple dataflow domain specific language to make it easy for me (or anyone else!) to add new tests that realistically evaluate model capabilities. This DSL allows for specifying both how the question should be asked and also how the answer should be evaluated. Most questions are evaluated by actually running the code the model writes but the framework supports a bunch of other evaluation methods as well. And then, directly as a result of this, I've written nearly 100 tests for different situations I've actually encountered when working with LLMs as assistants.

I have been working on my own benchmark based on Carlini's benchmark, and I can confirm that this works well for the traditional style of coding eval, where you have a one-shot task that generates and executes the code against some test cases (a minimal sketch of such a harness follows the list below). My basic strategy is to vibe code as usual, but whenever I give an LLM a task that it isn't able to one-shot, I consider adding it to the benchmark. In more detail:

  • I only add a task if a SOTA LLM failed it. This ensures the benchmark consists of all appropriate difficulty problems: easy enough that I thought an LLM should be able to do it, but hard enough that a SOTA model failed on it. I don't need problems that are too hard (this is already well covered by well known benchmarks like SWE-Bench or SWE-Lancer), and I don't mind if my problems saturate because, hey, that means the models are that much better for my use cases!
  • After I have added the task to the benchmark, I can use the benchmark runner to tell if changing the model, tweaking the prompt, or even just running the prompt again at nonzero temperature can make it pass. Indeed, it's helpful to find some configuration that makes the eval pass, as this is good for debugging issues in the evaluation function itself... also it means you have working code for whatever task you were working on. Conversely, you can make the task harder by leaving things out from the prompt.
  • Writing the test is the labor intensive part, but you can always vibe code a test. Importantly, you have a failing implementation (your initial generation) and some way you (manually?) determined that the implementation was wrong, so just turn this into your evaluation function! (And for all you yak shaving aficionados, if the model fails to vibe code your test, well, you have another task for your benchmark!)
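
To make the shape of this concrete: stripped of the DSL conveniences, a harness like this boils down to a list of tasks, each a prompt plus an automated check, and a runner that maps a model over them. The following is a minimal Haskell sketch of my own (this is not Carlini's framework; Model, Task and runBenchmark are names I'm inventing here):

import Control.Monad (forM_)

-- However you actually call a model (HTTP client, CLI wrapper, ...), it
-- boils down to turning a prompt into a completion.
type Model = String -> IO String

data Task = Task
  { taskName :: String
  , prompt   :: String
  , check    :: String -> IO Bool  -- e.g. run the generated code against tests
  }

runBenchmark :: Model -> [Task] -> IO ()
runBenchmark model tasks = forM_ tasks $ \t -> do
  completion <- model (prompt t)
  ok         <- check t completion
  putStrLn ((if ok then "PASS " else "FAIL ") ++ taskName t)

Comparing models, prompt tweaks, or sampling settings then just means calling runBenchmark with a different Model while keeping the same list of Tasks.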

For example, the other day I needed to take an asciinema recording and convert it into a sequence of frames rendered as plain text. However, the only project for doing these conversions was agg, which converts recordings into animated gifs. In agg_to_text, I ask an LLM to take agg's source code and create a new program which dumps the frames as plain text rather than gif images. This task is difficult because there is some discretion in deciding when to emit a frame, and with my original prompt the LLM didn't precisely replicate the original behavior of agg. While working on the benchmark, I realized that instructing the model specifically about how frame batching worked was enough to get it to preserve the original behavior. But I don't think I should need to do this: thus this task. (P.S. If this test saturates, well, I can always make it harder by removing the agg source code from the prompt.)


The ability to benchmark one shot tasks is here today, but I would like to speculate a bit about what lies beyond them. In particular, most of my LLM coding activity involves asking the LLM to make changes to a pre-existing project, which makes it less amenable to "single prompt creates self contained program". (Also, I usually only ask one-shot questions that the LLM can answer, so most of them would never go in my benchmark.)

In short, how can I extract tasks from my day-to-day work? There seem to be two big extra levers available:

  • Codebase tasks. This is the heavy-weight approach: you record the Git commit of your codebase at the time you prompted for some new feature to be added, and then when you want to run an eval on a new model you just check out the codebase at that commit and let the end-to-end system go. You'll typically want to execute the modified code, which means you'll also need a way to reliably setup the runtime environment for the code; things like lockfiles can help a lot here.
  • Transcript tasks. You don't actually need the entire codebase to be available to ask an LLM for a completion; you only need the conversation transcript up to the point of the critical generation. If the transcript is mostly your agent system reading in files for context, you can end up with a relatively system-generic prompt that can tell you something about other systems. Of course, if you want to actually run the change, you still need the full codebase, which is why this approach is much more amenable if you're going to do some static analysis on the output. For example, if a model keeps adding try: ... except: ... blocks that are suppressing errors, you can take some transcripts where you've caught the model red-handed doing this and make an eval that checks if the model is still doing this (see the sketch after this list). I suspect testing on transcripts works best for testing whether changing prompts or rules improves performance, since the transcript itself will put the model into some particular latent space, and a different model might have made different choices leading to a different latent space. Transcripts from thinking models are especially susceptible to this!
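
Reusing the Task record from the harness sketch above, a transcript-style eval for the try/except example might look like the following (again just a sketch of mine; a serious version would parse the generated Python rather than scan lines):

import Data.Char (isSpace)
import Data.List (isPrefixOf)

-- Flag completions that add a bare "except:" whose body is just "pass",
-- i.e. silently swallowed errors. Purely a line-based heuristic.
suppressesErrors :: String -> Bool
suppressesErrors completion = any bad (zip stripped (drop 1 stripped))
  where
    stripped   = map (dropWhile isSpace) (lines completion)
    bad (a, b) = "except:" `isPrefixOf` a && b == "pass"

-- A transcript task: the prompt is the stored conversation up to the
-- critical generation, and the check is a static property of the model's
-- next completion rather than the result of running anything.
noBareExcept :: String -> Task
noBareExcept storedTranscript = Task
  { taskName = "no-bare-except"
  , prompt   = storedTranscript
  , check    = pure . not . suppressesErrors
  }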

I have started adapting Carlini's framework to work better for these cases, although I would love to be told that someone has already solved this problem for me. In particular, I am very excited about using transcript tasks to evaluate whether the things I add to my prompts / triggered rules are actually helping. Current SOTA model instruction following isn't great, and I regularly catch models doing behaviors that I explicitly told them not to do in the system prompt. I have started some initial analysis over all of my chat logs to find cases where the model misbehaved, although I haven't quite worked out how I want to build an eval out of it.

One word of warning: to make transcript tasks, you need an AI coding system that doesn't obscure how it assembles its underlying prompts (which rules out most of the popular closed source AI code editors.)


I started building evals for a selfish reason: I wanted to be able to tell if modifications to my prompts were doing anything. But I also think there is a broader opportunity that arises if we also publish these benchmarks to the world.

For one, building a real-world benchmark on use cases we care about is a way to communicate to the people training AI models whether or not they are doing well. Historical evals have focused on LeetCoding, and consequently we have models that would ace any big tech interview and yet, on real-world tasks, will drive you off a cliff at the first opportunity. And this is not just free labor for the top labs: if you believe in open source models, one of the biggest barriers to good small models is having really high quality data. We, the OSS vibe coding community, can directly help here.

I think there is a tremendous opportunity for the open source community to really push the state of the art in coding evaluations. There's only so many benchmarks that I, personally, can create, but if everyone is making benchmarks I could eventually imagine a universe of benchmarks where you could curate the problems that are relevant to your work and quickly and cheaply judge models in this way: a Wikipedia of Coding Benchmarks.

To summarize: every time an LLM fails to solve a problem you give it, that is a potential new benchmark. As long as there is a way to automate testing whether the LLM has solved the problem, you can turn it into a benchmark. Do this for yourself, and you can quickly have a personal benchmark with which to evaluate new models. Do this at scale, and you can help push the frontier in coding models.

by Edward Z. Yang at April 04, 2025 07:05 AM