Planet Haskell

July 17, 2017

FP Complete

Announcing: the new unliftio library

For the past few years, Francesco Mazzoli and I have discussed issues around monad transformers—and the need to run their actions in IO—on a fairly regular basis. I wrote the monad-unlift library a while ago to try and address these concerns. But recent work I did in Stack on the extensible snapshots branch demonstrated some of the shortcomings Francesco had mentioned to me. This is also in line with conclusions I was reaching from code review and training I've been doing, as I've mentioned recently.

Putting that all together: last week we finally bit the bullet and put together a new pair of libraries:

  • unliftio-core defines the MonadUnliftIO typeclass, provides instances for base and transformers, and provides a few helper functions, with no additional dependencies.
  • unliftio provides a "batteries included" set of unlifted functions for exceptions, timeouts, async, and more.

This should be considered an experimental release, with some changes already planned. Instead of repeating myself, I'm going to copy in the README from unliftio for the remainder of this post, which includes more details on using these libraries, comparison with alternatives, and plans for future changes.

NOTE If you're reading this in the future, please check out the README from the packages themselves in the links above. The content below will not be updated with changes to the libraries.


Provides the core MonadUnliftIO typeclass, a number of common instances, and a collection of common functions working with it. Not sure what the MonadUnliftIO typeclass is all about? Read on!

NOTE This library is young, and will likely undergo some serious changes over time. It's also very lightly tested. That said: the core concept of MonadUnliftIO has been refined for years and is pretty solid, and even though the code here is lightly tested, the vast majority of it simply applies withUnliftIO to existing functionality. Caveat emptor and all that.


Quickstart

  • Replace imports like Control.Exception with UnliftIO.Exception. Yay, your catch and finally are more powerful and safer!
  • Similarly, replace Control.Concurrent.Async with UnliftIO.Async
  • Or go all in and import UnliftIO
  • Naming conflicts: let unliftio win
  • Drop the deps on monad-control, lifted-base, and exceptions
  • Compilation failures? You may have just avoided subtle runtime bugs

Sound like magic? It's not. Keep reading!

Unlifting in 2 minutes

Let's say I have a function:

readFile :: FilePath -> IO ByteString

But I'm writing code inside a function that uses ReaderT Env IO, not just plain IO. How can I call my readFile function in that context? One way is to manually unwrap the ReaderT data constructor:

myReadFile :: FilePath -> ReaderT Env IO ByteString
myReadFile fp = ReaderT $ \_env -> readFile fp

But having to do this regularly is tedious, and ties our code to a specific monad transformer stack. Instead, many of us would use MonadIO:

myReadFile :: MonadIO m => FilePath -> m ByteString
myReadFile = liftIO . readFile

But now let's play with a different function:

withBinaryFile :: FilePath -> IOMode -> (Handle -> IO a) -> IO a

We want a function with signature:

    :: FilePath
    -> IOMode
    -> (Handle -> ReaderT Env IO a)
    -> ReaderT Env IO a

If I squint hard enough, I can accomplish this directly with the ReaderT constructor via:

myWithBinaryFile fp mode inner =
  ReaderT $ \env -> withBinaryFile fp mode
    (\h -> runReaderT (inner h) env)

I dare you to try and accomplish this with MonadIO and liftIO. It simply can't be done. (If you're looking for the technical reason, it's because IO appears in negative/argument position in withBinaryFile.)

However, with MonadUnliftIO, this is possible:

import Control.Monad.IO.Unlift

myWithBinaryFile
    :: MonadUnliftIO m
    => FilePath
    -> IOMode
    -> (Handle -> m a)
    -> m a
myWithBinaryFile fp mode inner =
  withRunInIO $ \runInIO ->
    withBinaryFile fp mode (\h -> runInIO (inner h))

That's it, you now know the entire basis of this library.

How common is this problem?

This pops up in a number of places. Some examples:

  • Proper exception handling, with functions like bracket, catch, and finally
  • Working with MVars via modifyMVar and similar
  • Using the timeout function
  • Installing callback handlers (e.g., do you want to do logging in a signal handler?).

This also pops up when working with libraries which are monomorphic on IO, even if they could be written more extensibly.


Usage

Reading through the codebase here is likely the best example to see how to use MonadUnliftIO in practice. And for many cases, you can simply add the MonadUnliftIO constraint and then use the pre-unlifted versions of functions (like UnliftIO.Exception.catch). But ultimately, you'll probably want to use the typeclass directly. The type class has only one method -- askUnliftIO:

newtype UnliftIO m = UnliftIO { unliftIO :: forall a. m a -> IO a }

class MonadIO m => MonadUnliftIO m where
  askUnliftIO :: m (UnliftIO m)

askUnliftIO gives us a function to run arbitrary computation in m in IO. Thus the "unlift": it's like liftIO, but the other way around.

Here are some sample typeclass instances:

instance MonadUnliftIO IO where
  askUnliftIO = return (UnliftIO id)
instance MonadUnliftIO m => MonadUnliftIO (IdentityT m) where
  askUnliftIO = IdentityT $
                withUnliftIO $ \u ->
                return (UnliftIO (unliftIO u . runIdentityT))
instance MonadUnliftIO m => MonadUnliftIO (ReaderT r m) where
  askUnliftIO = ReaderT $ \r ->
                withUnliftIO $ \u ->
                return (UnliftIO (unliftIO u . flip runReaderT r))

Note that:

  • The IO instance does not actually do any lifting or unlifting, and therefore it can use id
  • IdentityT is essentially just wrapping/unwrapping its data constructor, and then recursively calling withUnliftIO on the underlying monad.
  • ReaderT is just like IdentityT, but it captures the reader environment when starting.

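The same pattern extends to an application-specific monad. Below is a self-contained sketch: it re-declares minimal versions of UnliftIO and MonadUnliftIO from above so it compiles without the packages themselves, and Env, App, and envName are illustrative names, not part of the library.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE RankNTypes #-}
import Control.Monad.IO.Class (MonadIO (..))
import Control.Monad.Trans.Reader (ReaderT (..))

-- Minimal stand-ins for the definitions shown above, so this sketch
-- compiles without depending on unliftio-core itself.
newtype UnliftIO m = UnliftIO { unliftIO :: forall a. m a -> IO a }

class MonadIO m => MonadUnliftIO m where
  askUnliftIO :: m (UnliftIO m)

-- A typical application monad: a newtype around ReaderT over IO.
data Env = Env { envName :: String }

newtype App a = App { unApp :: ReaderT Env IO a }
  deriving (Functor, Applicative, Monad, MonadIO)

-- Capture the environment, then unwrap the newtype and run in IO,
-- exactly like the ReaderT instance above.
instance MonadUnliftIO App where
  askUnliftIO = App $ ReaderT $ \env ->
    pure (UnliftIO (\(App m) -> runReaderT m env))
```
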
We can use askUnliftIO to unlift a function:

timeout :: MonadUnliftIO m => Int -> m a -> m (Maybe a)
timeout x y = do
  u <- askUnliftIO
  System.Timeout.timeout x $ unliftIO u y

or more concisely using withRunInIO:

timeout :: MonadUnliftIO m => Int -> m a -> m (Maybe a)
timeout x y = withRunInIO $ \run -> System.Timeout.timeout x $ run y

This is a common pattern: use withRunInIO to capture a run function, and then call the original function with the user-supplied arguments, applying run as necessary. withRunInIO takes care of invoking unliftIO for us.

However, if we want to use the run function with different types, we must use askUnliftIO:

race :: MonadUnliftIO m => m a -> m b -> m (Either a b)
race a b = do
  u <- askUnliftIO
  liftIO (A.race (unliftIO u a) (unliftIO u b))

or more idiomatically withUnliftIO:

race :: MonadUnliftIO m => m a -> m b -> m (Either a b)
race a b = withUnliftIO $ \u -> A.race (unliftIO u a) (unliftIO u b)

This works just like withRunInIO, except we use unliftIO u instead of run, which is polymorphic. You could get away with multiple withRunInIO calls here instead, but this approach is idiomatic and may be more performant (depending on optimizations).

And finally, a more complex usage: unlifting the mask function. This function needs to unlift values to be passed into the restore function, and then liftIO the result of the restore function.

mask :: MonadUnliftIO m => ((forall a. m a -> m a) -> m b) -> m b
mask f = withUnliftIO $ \u -> Control.Exception.mask $ \unmask ->
  unliftIO u $ f $ liftIO . unmask . unliftIO u


Limitations

Not all monads which can be an instance of MonadIO can be instances of MonadUnliftIO, due to the MonadUnliftIO laws (described in the Haddocks for the typeclass). This prevents instances for a number of classes of transformers:

  • Transformers using continuations (e.g., ContT, ConduitM, Pipe)
  • Transformers with some monadic state (e.g., StateT, WriterT)
  • Transformers with multiple exit points (e.g., ExceptT and its ilk)

In fact, there are two specific classes of transformers that this approach does work for:

  • Transformers with no context at all (e.g., IdentityT, NoLoggingT)
  • Transformers with a context but no state (e.g., ReaderT, LoggingT)

This may sound restrictive, but this restriction is fully intentional. Trying to unlift actions in stateful monads leads to unpredictable behavior. For a long and exhaustive example of this, see A Tale of Two Brackets, which was a large motivation for writing this library.
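
To see the ambiguity concretely, here is a hand-rolled finally for StateT (a hypothetical sketch, not something these packages provide). Two copies of the state are live at once, and one of them must be discarded; below, the cleanup's state changes silently vanish, which happens to be the behavior lifted-base picks:

```haskell
import Control.Exception (onException)
import Control.Monad.Trans.State (StateT (..))

-- Which state should survive: the action's or the cleanup's?
-- This version discards whatever state the cleanup produced.
finallyStateT :: StateT s IO a -> StateT s IO b -> StateT s IO a
finallyStateT action cleanup = StateT $ \s0 -> do
  (a, s1) <- runStateT action s0 `onException` runStateT cleanup s0
  _ <- runStateT cleanup s1  -- cleanup's final state is thrown away
  pure (a, s1)
```
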

Comparison to other approaches

You may be thinking "Haven't I seen a way to do catch in StateT?" You almost certainly have. Let's compare this approach with alternatives. (For an older but more thorough rundown of the options, see Exceptions and monad transformers.)

There are really two approaches to this problem:

  • Use a set of typeclasses for the specific functionality we care about. This is the approach taken by the exceptions package with MonadThrow, MonadCatch, and MonadMask. (Earlier approaches include MonadCatchIO-mtl and MonadCatchIO-transformers.)
  • Define a generic typeclass that allows any control structure to be unlifted. This is the approach taken by the monad-control package. (Earlier approaches include monad-peel and neither.)

The first style gives extra functionality in allowing instances that have nothing to do with runtime exceptions (e.g., a MonadCatch instance for Either). This is arguably a good thing. The second style gives extra functionality in allowing more operations to be unlifted (like threading primitives, not supported by the exceptions package).

Another distinction within the generic typeclass family is whether we unlift to just IO, or to arbitrary base monads. For those familiar, this is the distinction between the MonadIO and MonadBase typeclasses.

This package's main objection to all of the above approaches is that they work for too many monads, and provide difficult-to-predict behavior for a number of them (arguably: plain wrong behavior). For example, in lifted-base (built on top of monad-control), the finally operation will discard mutated state coming from the cleanup action, which is usually not what people expect. exceptions has different behavior here, which is arguably better. But we're arguing here that we should disallow all such ambiguity at the type level.

So comparing to other approaches:


monad-unlift

Throwing this one out there now: the monad-unlift library is built on top of monad-control, and uses fairly sophisticated type level features to restrict it to only the safe subset of monads. The same approach is taken by Control.Concurrent.Async.Lifted.Safe in the lifted-async package. Two problems with this:

  • The complicated type level functionality can confuse GHC in some cases, making it difficult to get code to compile.
  • We don't have an ecosystem of functions like lifted-base built on top of it, making it likely people will revert to the less safe cousin functions.


monad-control

The main contention until now is that unlifting in a transformer like StateT is unsafe. This is not universally true: if only one action is being unlifted, no ambiguity exists. So, for example, try :: IO a -> IO (Either e a) can safely be unlifted in StateT, while finally :: IO a -> IO b -> IO a cannot.
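
For instance, a hand-written unlift of try for StateT is unambiguous (this is a sketch, not something either library exports, and it is specialized to IOException for simplicity): on an exception there is exactly one sensible state to resume with, the one from before the action.

```haskell
import Control.Exception (IOException, try)
import Control.Monad.Trans.State (StateT (..))

-- Only one action is unlifted, so no state ambiguity arises:
-- on failure we simply keep the state from before the action.
tryStateT :: StateT s IO a -> StateT s IO (Either IOException a)
tryStateT action = StateT $ \s -> do
  result <- try (runStateT action s)
  pure $ case result of
    Left e        -> (Left e, s)   -- exception: prior state survives
    Right (a, s') -> (Right a, s')
```
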

monad-control allows us to unlift both styles. In theory, we could write a variant of lifted-base that never does state discards, and let try be more general than finally. In other words, this is an advantage of monad-control over MonadUnliftIO. We've avoided providing any such extra typeclass in this package though, for two reasons:

  • MonadUnliftIO is a simple typeclass, easy to explain. We don't want to complicate matters (MonadBaseControl is a notoriously difficult-to-understand typeclass). This simplicity is captured by the laws for MonadUnliftIO, which make the behavior of the run functions close to that of the already familiar lift and liftIO.
  • Having this kind of split would be confusing in user code, when suddenly finally is not available to us. We would rather encourage good practices from the beginning.

Another distinction is that monad-control uses the MonadBase style, allowing unlifting to arbitrary base monads. In this package, we've elected to go with MonadIO style. This limits what we can do (e.g., no unlifting to STM), but we went this way because:

  • In practice, we've found that the vast majority of cases are dealing with IO
  • The split in the ecosystem between constraints like MonadBase IO and MonadIO leads to significant confusion, and MonadIO is by far the more common constraint (with the typeclass existing in base)


exceptions

One thing we lose by leaving the exceptions approach is the ability to model both pure and side-effecting (via IO) monads with a single paradigm. For example, it can be pretty convenient to have MonadThrow constraints for parsing functions, which will either return an Either value or throw a runtime exception. That said, there are detractors of that approach:

  • You lose type information about which exception was thrown
  • There is ambiguity about how the exception was returned in a constraint like (MonadIO m, MonadThrow m)

The latter could be addressed by defining a law such as throwM = liftIO . throwIO. However, we've decided in this library to go the route of encouraging Either return values for pure functions, and using runtime exceptions in IO otherwise. (You're of course free to also return IO (Either e a).)

By losing MonadCatch, we lose the ability to define a generic way to catch exceptions in continuation based monads (such as ConduitM). Our argument here is that those monads can freely provide their own catching functions. And in practice, long before the MonadCatch typeclass existed, conduit provided a catchC function.

In exchange for the MonadThrow typeclass, we provide helper functions to convert Either values to runtime exceptions in this package. And the MonadMask typeclass is now replaced fully by MonadUnliftIO, which, like the monad-control case, limits which monads we can work with.
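
A minimal sketch of such a helper (unliftio exports one along these lines; the definition here is illustrative): on Left, the value is rethrown as a runtime exception in any MonadIO monad.

```haskell
import Control.Exception (Exception, throwIO)
import Control.Monad.IO.Class (MonadIO, liftIO)

-- Promote a pure Either into either a result or a runtime exception.
fromEither :: (Exception e, MonadIO m) => Either e a -> m a
fromEither = either (liftIO . throwIO) pure
```
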

Async exception safety

The safe-exceptions package builds on top of the exceptions package and provides intelligent behavior for dealing with asynchronous exceptions, a common pitfall. This library provides a set of exception handling functions with the same async exception behavior as that library. You can consider this library a drop-in replacement for safe-exceptions. In the future, we may reimplement safe-exceptions to use MonadUnliftIO instead of MonadCatch and MonadMask.

Package split

The unliftio-core package provides just the typeclass with minimal dependencies (just base and transformers). If you're writing a library, we recommend depending on that package to provide your instances. The unliftio package is a "batteries loaded" library providing a plethora of pre-unlifted helper functions. It's a good choice for importing, or even for use in a custom prelude.


Orphan instances

The unliftio package currently provides orphan instances for types from the resourcet and monad-logger packages. This is not intended as a long-term solution; once unliftio is deemed more stable, the plan is to move those instances into the respective libraries and remove the dependency on them here.

If there are other temporary orphans that should be added, please bring it up in the issue tracker or send a PR, but we'll need to be selective about adding dependencies.

Future questions

  • Should we extend the set of functions exposed in UnliftIO.IO to include things like hSeek?
  • Are there other libraries that deserve to be unlifted here?

July 17, 2017 05:30 AM

Gabriel Gonzalez

Demystifying Haskell assignment


This post clarifies the distinction between <- and = in Haskell, which sometimes mystifies newcomers to the language. For example, consider the following contrived code snippet:

main = do
    input <- getLine
    let output = input ++ "!"
    putStrLn output
    putStrLn (input ++ "!")

The above program reads one line of input, and then prints that line twice with an exclamation mark at the end, like this:

$ ./example

Why does the first line use the <- symbol to assign a value to input while the second line uses the = symbol to define output? Most languages use only one symbol to assign values (such as = or :=), so why does Haskell use two?


Haskell bucks the trend because the = symbol does not mean assignment and instead means something stronger than in most programming languages. Whenever you see an equality sign in a Haskell program that means that the two sides are truly equal. You can substitute either side of the equality for the other side and this substitution works in both directions.

For example, we define output to be equal (i.e. synonymous) with the expression input ++ "!" in our original program. This means that anywhere we see output in our program we can replace output with input ++ "!" instead, like this:

main = do
    input <- getLine
    let output = input ++ "!"
    putStrLn (input ++ "!")
    putStrLn (input ++ "!")

Vice versa, anywhere we see input ++ "!" in our program we can reverse the substitution and replace the expression with output instead, like this:

main = do
    input <- getLine
    let output = input ++ "!"
    putStrLn output
    putStrLn output

The language enforces that these sorts of substitutions do not change the behavior of our program (with caveats, but this is mostly true). All three of the above programs have the same behavior because we always replace one expression with another equal expression. In Haskell, the equality symbol denotes true mathematical equality.


Once we understand equality we can better understand why Haskell uses a separate symbol for assignment: <-. For example, let's revisit this assignment in our original program:

main = do
    -- Assign result of `getLine` to `input`
    input <- getLine

input and getLine are not equal in any sense of the word. They don't even have the same type!

The type of input is String:

input :: String

... whereas the type of getLine is IO String:

getLine :: IO String

... which you can think of as "a subroutine whose return value is a String". We can't substitute either one for the other because we would get a type error. For example, if we substitute all occurrences of input with getLine we would get an invalid program which does not type check:

main = do
    let output = getLine ++ "!" -- Type error!
    putStrLn output
    putStrLn (getLine ++ "!") -- Type error!

However, suppose we gloss over the type error and accept values of type IO String where the program expected just a String. Even then this substitution would still be wrong because our new program appears to request user input twice:

main = do
    let output = getLine ++ "!" -- ← 1st request for input
    putStrLn output
    putStrLn (getLine ++ "!") -- ← 2nd request for input

Contrast this with our original program, which only asks for a single line of input and reuses the line twice:

main = do
    input <- getLine -- ← 1st and only request for input
    let output = input ++ "!"
    putStrLn output
    putStrLn (input ++ "!")

We cannot substitute the left-hand side of an assignment for the right-hand side of the assignment without changing the meaning of our program. This is why Haskell uses a separate symbol for assignment, because assignment does not denote equality.

Also, getLine and input are not even morally equal. getLine is a subroutine whose result may change every time, and to equate getLine with the result of any particular run doesn't make intuitive sense. That would be like calling the Unix ls command "a list of files".


Haskell has two separate symbols for <- and = because assignment and equality are not the same thing. Haskell just happens to be the first mainstream language that supports mathematical equality, which is why the language requires this symbolic distinction.

Language support for mathematical equality unlocks another useful language feature: equational reasoning. You can use more sophisticated equalities to formally reason about the behavior of larger programs, the same way you would reason about algebraic expressions in math.


by Gabriel Gonzalez at July 17, 2017 01:02 AM

Jasper Van der Jeugt

ZuriHac plays


This is a small write-up of a fun Haskell project that Andras Slemmer, Francesco Mazzoli and I worked on during ZuriHac 2017.

I work with Haskell professionally, and it can be hard to motivate myself to work on similar stuff during Hackathons. This year I also had a large part in organising the Hackathon, so there was little room to take on a serious project.

This is why I joined in the fun of creating a deliberately silly thing during this Hackathon (aside from still trying to help people out as much as possible).

This year, we decided to implement something in the style of Twitch Plays Pokémon. Rather than picking a slow, turn-based game such as Pokémon, however, we wanted to try the same thing for a fast game, such as a platformer.

For the impatient, here is a quick preview:

The core design

The core design question of the project is how to handle keypresses and aggregate them when you have many concurrent users. Twitch Plays Pokémon solved this problem in two distinct modes:

  1. Anarchy mode (the default): any user keypress is sent directly to the game.

  2. Democracy mode: there is a voting window, after which the most popular user keypress is selected and sent to the game.

With a majority vote, players could switch between the modes.

There are a bunch of reasons why this does not work great for faster games:

  • Action games typically need you to hold a key for a certain amount of time, rather than just pressing and then releasing the key (e.g. Mario jumps higher if you hold jump for longer).

  • There’s little time to switch between modes in a fast-paced game.

  • Many games require you to press more than one key at the same time (e.g. jump and right in Mario).

We solved this by putting a key voting algorithm in place to aggregate the key events from the users. We think our algorithm should work well with most games, if the parameters are tweaked a bit.

First, imagine that we are looking at every key independently. For a given key, we might receive the following input from users, where a block means that the key is pressed:

We divide time into sample intervals. The length of the sample interval can be tweaked per game. Let’s imagine it is 10ms for our example.

Every key press is expanded to match the sample interval first. This gives us something like:

We can aggregate them according to a threshold. This is another parameter that can be tweaked per game. In our example, we can set this threshold to 0.5. This means that 50% of users must be pressing a key before we consider it pressed. Concretely, for our 3 users, that means that at least two people must be pressing the key. This gives us the following aggregates:

After we’ve aggregated the key presses, we can send the result to the game. It’s important to note that this happens one sample interval after the actual user keypresses, since we can’t draw any conclusions before the interval has ended. This adds some latency, but we didn’t find this a problem in practice for the games we tried.

Apart from that, we added two more complications to make the experience smoother:

  1. We look at all keys independently using the algorithm above, but before we decide on the final output, we take key groups into account.

    In Super Mario World, if you press left and right at the same time, Mario does not move. That is a problem: if the threshold is set to 0.2, 30% of people are pressing left, and 40% of people are pressing right, we would expect Mario to move right. However, using our naive algorithm, nothing happens.

    This is why we added key groups. A key group is a set of keys out of which at most one can be pressed. For example, {left, right} forms such a key group for Mario. We select the most popular key if there are multiple candidates within a group (right in the example).

  2. There is a timeout timer for activity per user. If a user does not press any key for a while, they are considered inactive and not counted towards the total number of users. This prevents people who load the page but don’t participate from influencing the game too much.
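
The per-interval decision described above can be sketched as follows (illustrative types and names, not the actual project code): count votes per key, apply the threshold, and resolve key groups by popularity.

```haskell
import Data.List (maximumBy)
import qualified Data.Map.Strict as Map
import Data.Ord (comparing)

type Key = String

-- Decide which keys count as pressed for one sample interval.
aggregateKeys
  :: Double           -- threshold, e.g. 0.4 means 40% of users
  -> Int              -- number of active users
  -> [[Key]]          -- key groups, e.g. [["left", "right"]]
  -> Map.Map Key Int  -- votes per key during this interval
  -> [Key]
aggregateKeys threshold users groups votes =
    concatMap winner groups ++ ungrouped
  where
    passes v = users > 0
      && fromIntegral v >= threshold * fromIntegral users
    -- Keys in a group that individually meet the threshold...
    candidates grp =
      [ (k, v) | k <- grp, Just v <- [Map.lookup k votes], passes v ]
    -- ...of which only the most popular one is kept.
    winner grp = case candidates grp of
      []  -> []
      kvs -> [fst (maximumBy (comparing snd) kvs)]
    grouped = concat groups
    ungrouped =
      [ k | (k, v) <- Map.toList votes, k `notElem` grouped, passes v ]
```

With the numbers from the Mario example (threshold 0.2, 30% left, 40% right), the group resolution makes right win instead of the two keys cancelling out.
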

The setup

That takes care of the key logic component, so now let’s look at the stack.

It’s all pretty self-explanatory:

  • Users open an HTML page which contains a JavaScript keylogger. We send the KeyPress and KeyRelease events to the server. On mobile, people can use a touchscreen interface which sends the same events.

  • The server runs the key voting algorithm we discussed before and sends the aggregated KeyPress and KeyRelease events to any connected sinks.

  • The main sink we implemented just executes the events using the XTestFakeInput call from xtest. In our case, the sink ran on the same machine as the server (my laptop).

We played Super Mario World with around 60 people on local Wi-Fi. We required 40% of people to press a key for the voting, in 10ms sampling windows. The system performed very smoothly, although the same cannot be said about the collaboration between users.

Thanks for joining in the fun! The code for our project can be found here.


by Jasper Van der Jeugt at July 17, 2017 12:00 AM

July 16, 2017

Michael Snoyman

Some Upcoming Crazy Thoughts

I didn't mention it on my blog, but I put it on Twitter, so it's probably not a surprise to most. About five months ago, we had a baby boy (yay!). As you can imagine, new babies take a lot of energy, especially when it's the first of your four babies to have colic. Many nights were spent walking Lavi around the block singing. I'm actually pretty lucky none of the neighbors called the cops; my singing definitely counts as disturbing the peace.

Pro tip to any new parents even a little bit superstitious: never say “We've been through all of this before, this baby can't surprise us.”

Anyway, I'm not wont to share personal anecdotes on this blog, but I mention this because I've obviously been pretty distracted with baby things. Fortunately, the baby is just about done with colic (just in time to start teething of course). Between that extra energy drain evaporating, having had lots of time to let my mind wander while walking a crying baby, and a few other things I'll detail at the end of this post, I've gotten to mentally explore some crazier ideas.

I've already been blogging a bit about monad transformers. Expect some similar things on streaming data and polymorphism (perhaps) in the next few weeks. Also, I'll probably talk more about exceptions, though the thoughts there are less crazy and more a reaffirmation of previous things.

A good question is why am I bothering with this blog post at all. I actually drafted most of it and then decided not to publish it for about a week. My thinking here is I don't want anyone taking my crazy thoughts too seriously. I like to explore ideas, and I explore ideas best by actually writing libraries and blog posts about them. In other words, I throw things at the wall and see what sticks. I usually buy into the idea completely for a bit to avoid second-guessing derailing an idea, and then take a step back afterwards to see if I like it.

Besides having reduced keyboard time for the past five months, here are some of the other stimuli leading to some of the ideas I'll be sharing:

  • I've spent considerably more effort on training. I've been doing documentation and tutorial writing for a while, but I've had multiple opportunities recently to train in a more direct setting. This has helped remind me of some of the newcomer experiences I've forgotten.

  • Similar to this, my time at LambdaConf earlier this year was great. My conference experiences usually are either non-functional programming conferences where I'm the Haskell anomaly, or advanced functional crowds. The huge mixture of experience levels with FP and Haskell at LambdaConf was wonderful and eye-opening (or perhaps reopening).

  • I've been working on a few projects where my major focus is on review and debugging, which forces me to focus less on making it easy to write code the first time, and more on writing code for maintainability and robustness (yeah, vague terms, don't beat me up over it).

  • Most recently, I did a major 10-day-straight hacking fest on the Stack code base, after not having seriously touched it for months (and the parts in question for over a year). I got to play with major refactorings and focuses on readability and future extensibility.

  • And in addition to all of this Haskell stuff, I've finally forced myself to start learning a new language for the first time in ten years. I went through quite the journey through programming languages before I hit on Haskell, and since then I've been so happy with it that I haven't wanted to touch anything else. But in the past half year, I've gotten into two languages to various extents:

    • PureScript: This honestly wasn't much of a learning experience, since it's close enough to Haskell. I think GHCJS is a great project, and have enjoyed both Reflex and various React layers in it. But the smaller output and strict nature of PureScript make it something I wanted to experience for front-end development.

    • Rust: As I said on Twitter: "Rust is the first language I've learned in ten years (since Haskell) that both teaches new concepts and does stuff Haskell can't." Rust is an interesting language, promotes safety in a way that I like (the main reason I love Haskell to be honest), and has a really well designed community experience around it.

July 16, 2017 01:30 PM

July 15, 2017

Dan Piponi (sigfpe)

Self-referential logic via self-referential circuits


TL;DR The behaviour of a certain kind of delay component has a formal similarity to Löb's theorem which gives a way to embed part of provability logic into electronic circuits.

Here's a famous paradoxical sentence:

This sentence is false

If it's false then it's true and if it's true then it's false.

Here's a paradoxical electronic circuit:

The component in the middle is an inverter. If the output of the circuit is high then its input is high and then its output must be low, and vice versa.

There's a similarity here. But with a bit of tweaking you can turn the similarity into an isomorphism of sorts.

In the first case we avoid paradox by noting that in the mathematical frameworks commonly used by mathematicians it's impossible, in general, for a statement to assert its own falsity. Instead, a statement can assert its own unprovability and then we get Gödel's incompleteness theorems and a statement that is apparently true and yet can't be proved.

In the second case we can't model the circuit straightforwardly as a digital circuit. In practice it might settle down to a voltage that lies between the official high and low voltages so we have to model it as an analogue circuit. Or instead we can introduce a clock and arrange that the feedback in the circuit is delayed. We then get an oscillator circuit that can be thought of as outputting a stream of bits.
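The clocked version of the inverter loop is easy to model with lazy streams. Here's a minimal Haskell sketch of my own (the names delay and oscillator, and the power-on value 0, are my choices, not from the post):

```haskell
-- A wire is an infinite list of bits, oldest bit first; a one-clock
-- delay element conses its power-on value onto its input stream.
delay :: Int -> [Int] -> [Int]
delay v xs = v : xs

-- An inverter whose output is fed back to its input through a delay:
oscillator :: [Int]
oscillator = out
  where out = delay 0 (map (1 -) out)

-- take 6 oscillator == [0,1,0,1,0,1]
```

The delay breaks the paradox: each output bit depends only on earlier bits, so the recursive definition is productive and we get the expected alternating stream.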

The observation I want to make is that if the feedback delay is defined appropriately, these two scenarios are in some sense isomorphic. This means that we can model classic results about provability, like Gödel's incompleteness theorems, using electronic circuits. We can even use such circuits to investigate what happens when logicians or robots play games like Prisoner's Dilemma. I'll be making use of results found in Boolos' book on The Logic of Provability and some ideas I borrowed from Smoryński's paper on Fixed Point Algebras. I'll be assuming the reader has at least a slight acquaintance with the ideas behind provability logic.

Provability Logic

There are many descriptions of provability logic (aka GL) available online, so I'm not going to repeat it all here. However, I've put some background material in the appendix below and I'm going to give a very brief reminder now.

Start with (classical) propositional calculus which has a bunch of variables with names like p and q and connectives like ∧ for AND, ∨ for OR, ¬ for NOT and → for implication. (Note that p → q = ¬p ∨ q.)

Provability logic extends propositional calculus by adding a unary operator □. The idea is that □p asserts that p is provable in Peano Arithmetic, aka PA. In addition to the axioms of propositional calculus we have

□p → □□p and □(p → q) → □p → □q

as well as a rule that allows us to deduce □p from p.

We also have this fixed point property:

Let F(p) be any predicate we can write in the language of GL involving the variable p, and suppose that every appearance of p in F(p) is inside a □, e.g. ¬□p. Then there is a fixed point, i.e. a proposition q that makes no mention of p such that q ↔ F(q) is a theorem. In effect, for any such F, q is a proposition that asserts F(q).

See the appendix for a brief mention of why we should expect this to be true.

From the fixed point property we can deduce Löb's theorem: □(□p → p) → □p. There is a proof at wikipedia that starts from the fixed point property.

We can also deduce the fixed point property from Löb's theorem so it's more usual to take Löb's theorem as an axiom of GL and show that the fixed point property follows. You can think of Löb's theorem as a cunning way to encode the fixed point property. In fact you can argue that it's a sort of Y-combinator, the function that allows the formation of recursive fixed points in functional programming languages. (That's also, sort of, the role played by the loeb function I defined way back. But note that loeb isn't really a proof of Löb's theorem, it just has formal similarities.)

Back to electronic circuits

In order to make digital circuits with feedback loops well-behaved I could introduce a circuit element that results in a delay of one clock cycle. If you insert one of these into the inverter circuit I started with you'll end up with an oscillator that flips back and forth between 0 and 1 on each clock cycle. But I want to work with something slightly stricter. I'd like my circuits to eventually stop oscillating. (I have an ulterior motive for studying these.) Let me introduce this component:

It is intended to serve as a delayed latch and I'll always have the flow of data being from left to right. The idea is that when it is switched on it outputs 1. It keeps outputting 1 until it sees a 0 input. When that happens, its output drops to 0 on the next clock cycle and never goes back up to 1 until reset.

Because the output of our delay-latch isn't a function of its current input, we can't simply describe its operation as a mathematical function from {0,1} to {0,1}. Instead let's think of electronic components as binary operators on bitstreams, i.e. infinite streams of binary digits like ...00111010 with the digits emerging over time starting with the one written on the right and working leftwards. The ordinary logic gates perform bitwise operations which I'll represent using the operators in the C programming language. For example,

...001110 & ...101010 = ...001010
~...101 = ...010
and so on. Let's use □ to represent the effect of the delay-latch on a bitstream. We have, for example,
□...000 = ...001
□...11101111 = ...00011111.
The operator □ takes the (possibly empty) contiguous sequence of 1's at the end of the bitstream, extends it by one 1, and sets everything further to the left to 0. If we restrict ourselves to bitstreams that eventually become all 0's or all 1's on the left, then bitstreams are in one-to-one correspondence with the integers using the two's complement representation. For example ...111111, all 1's, represents the number -1. I'll simply call the bitstreams that represent integers integers. With this restriction we can use a classic C hacker trick to write □p=p^(p+1) where ^ is the C XOR operator. The operator □ outputs the bits that get flipped when you add one.
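For concreteness, here's that trick as runnable Haskell (a sketch of my own; box is the name I've chosen for □):

```haskell
import Data.Bits (xor)

-- The delay-latch on integer-coded bitstreams: flip exactly the
-- bits that change when you add one, i.e. extend the trailing run
-- of 1's by one and clear everything above it.
box :: Int -> Int
box p = p `xor` (p + 1)

-- box 0     == 1    (□...000      = ...001)
-- box (-17) == 31   (□...11101111 = ...00011111)
-- box (-1)  == -1   (□...111      = ...111)
```

The second example is the □...11101111 = ...00011111 case from the text: -17 is ...11101111 in two's complement and 31 is ...00011111.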

Let's use the symbol → so that a → b is shorthand for ~a|b. Here are some properties of □:

1. □(-1) = -1

2. □p → □□p = -1

3. □(p → q) → □p → □q = -1

In addition we have the fixed point property:

Let F(p) be any function of p we can write using □ and the bitwise logical operators, such that all occurrences of p occur inside □. Then there is a unique bitstream q such that q=F(q).

We can make this clearer if we return to circuits. F(p) can be thought of as a circuit that takes p as input and outputs some value. We build the circuit using only boolean logic gates and delay-latch. We allow feedback loops, but only ones that go through delay-latches. With these restrictions it's pretty clear that the circuit is well-behaved and deterministically outputs a bitstream.

We also have the Löb property:

4. □(□p → p) → □p = -1

We can see this by examining the definition of □. Intuitively it says something like "once □ has seen a 0 input then no amount of setting input bits to 1 later in the stream makes any difference to its output".
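Since properties 1-4 each claim that some expression equals -1 for every integer, they're easy to spot-check mechanically. A small Haskell sketch of my own (box and --> stand for □ and →; the test range is arbitrary):

```haskell
import Data.Bits (xor, complement, (.|.))

box :: Int -> Int
box p = p `xor` (p + 1)

-- a --> b is shorthand for ~a | b, bitwise
(-->) :: Int -> Int -> Int
a --> b = complement a .|. b
infixr 4 -->

-- Each property should evaluate to -1 = ...111 for all integers;
-- here we spot-check a small range.
propertiesHold :: Bool
propertiesHold =
     box (-1) == -1
  && and [ (box p --> box (box p)) == -1               | p <- r ]
  && and [ (box (p --> q) --> (box p --> box q)) == -1 | p <- r, q <- r ]
  && and [ (box (box p --> p) --> box p) == -1         | p <- r ]  -- Löb
  where r = [-40 .. 40]
```

A finite range is of course no proof, but it makes the analogy with the GL axioms tangible.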

I hope you've noticed something curious. These properties are extremely close to the properties of □ in GL. In fact, these electronic circuits form a model of the part of GL that doesn't involve variable names, i.e. what's known as letterless GL. We can formalise this:

1. Map ⊥ to a wire set to 0, which outputs ...000 = 0.

2. Map ⊤ to a wire set to 1, which outputs ...111 = -1.

3. Map p ∘ q, where ∘ is a binary connective, by creating a circuit that takes the outputs from the circuits for p and q and passes them into the corresponding boolean logic gate.

4. Map □p to the circuit for p piped through a delay-latch.

For example, let's convert □(□⊥ → ⊥) → □⊥ into a circuit. I'm translating a → b to the circuit for ~a|b.

I'm using red wires to mean wires carrying the value 1 rather than 0. I hope you can see that this circuit eventually settles into a state that outputs nothing but 1s.

We have this neat result:

Because the delay-latch satisfies the same equations as □ in provability logic, any theorem, translated into a circuit, will produce a bitstream of just 1s, i.e. -1.

But here's a more surprising result: the converse is true.

If the circuit corresponding to a letterless GL proposition produces a bitstream of just 1s then the proposition is actually a theorem of GL.
I'm not going to prove this. (It's actually a disguised form of lemma 7.4 on p.95 of Boolos' book.) In the pictured example we got ...1111, so the circuit represents a theorem. As it represents Löb's theorem for the special case p = ⊥, we should hope so. More generally, any bitstream that represents an integer can be converted back into a proposition that is equivalent to the original proposition. This means that bitstreams faithfully represent propositions of letterless GL. Call the translation from propositions to bitstreams via circuits that I described above the forward translation, and the translation of bitstreams back into propositions the reverse translation: composing the two turns a proposition into a provably equivalent one. But I haven't given a full description of the reverse translation and I haven't proved here that it has this property.

Circuits with feedback

In the previous section I considered letterless propositions of GL. When these are translated into circuits they don't have feedback loops. But we can also "solve equations" in GL using circuits with feedback. The GL fixed point theorem above says that we can "solve" the equation p ↔ F(p), with one letter p, to produce a letterless proposition q such that q ↔ F(q). Note here that p is a letter in the language of GL, but q is a proposition in letterless GL. If we build a circuit to represent F, and feed its output back into where p appears, then the output bitstream represents the fixed point. Here's a translation of the equation p ↔ ¬(□p ∨ □□□p):

I'll let you try to convince yourself that such circuits always eventually output all 0's or all 1's. When we run the circuit we get the output ...1111000 = -8. As this is not -1 we know that the fixed point isn't a theorem. If I'd defined the reverse translation above you could use it to turn the bitstream back into a proposition.
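We can also compute this fixed point numerically instead of running the circuit. Bit n of □p depends only on bits 0..n-1 of p, so for any F whose uses of p all pass through □, iterating F from an arbitrary start pins down at least one more low bit per pass. A sketch of my own (box, fixpoint and f are my names; f is the F from the Haskell snippet below):

```haskell
import Data.Bits (xor, complement, (.|.))

box :: Int -> Int
box p = p `xor` (p + 1)

-- Iterate f from 0 until two consecutive values agree.  Each pass
-- fixes at least one more low bit, so on a 64-bit Int this stops
-- after at most ~64 passes.
fixpoint :: (Int -> Int) -> Int
fixpoint f = go (iterate f 0)
  where go (x:y:rest) | x == y    = x
                      | otherwise = go (y : rest)

-- F(p) = ~(box p | box (box (box p))), the example with feedback:
f :: Int -> Int
f p = complement (box p .|. box (box (box p)))

-- fixpoint f == -8, i.e. ...1111000, matching the circuit's output
```

The result -8 is not -1, which is the numeric way of seeing that this fixed point isn't a theorem.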

The same, syntactically (optional section)

I have a Haskell library on github for working with GL: provability. This uses a syntactic approach and checks propositions for theoremhood using a tableau method. We can use it to analyse the above example with feedback. I have implemented a function, currently called value', to perform the evaluation of the bitstream for a proposition. However, in this case the fixedpoint function computes the fixed point proposition first and then converts to a bitstream rather than computing the bitstream directly from the circuit for F:

> let f p = Neg (Box p \/ Box (Box (Box p)))
> let Just p = fixedpoint f
> p
Dia T /\ Dia (Dia T /\ Dia (Dia T /\ Dia T))
> value' p

(Note that Dia p means ⋄p, i.e. ¬□¬p.)

The function fixedpoint does a lot of work under the hood. (It uses a tableau method to carry out Craig interpolation.) The circuit approach requires far less work.


1. Programs that reason about themselves

In principle we can write a program that enumerates all theorems of PA. That means we can use a quine trick to write a computer program that searches for a proof, in PA, of its own termination. Does such a program terminate?

We can answer this with Löb's theorem. Let p = "The program terminates". The program terminates if it can prove its termination. Formally this means we assume □p → p. Using one of the derivation rules of GL we get □(□p → p). Löb's theorem now gives us □p. Feed that back into our original hypothesis and we get p. In other words, we deduce that our program does in fact terminate. (Thanks to Sridhar Ramesh for pointing this out to me.)

But we can deduce this using a circuit. We want a solution to p = □p. Here's the corresponding circuit:

It starts by outputting 1's and doesn't stop. In other words, the fixed point is a theorem. And that tells us p is a theorem. And hence that the program terminates.
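The same numeric iteration works here: the circuit is just a delay-latch with its output fed back to its input, i.e. we iterate □ itself. A sketch of my own (it leans on two's-complement wrap-around at the top bit of a 64-bit Int to mimic the circuit's run of 1's never dropping back to 0):

```haskell
import Data.Bits (xor)

box :: Int -> Int
box p = p `xor` (p + 1)

-- iterate box 0 = [0, 1, 3, 7, 15, ...]: each clock cycle extends
-- the trailing run of 1's and never shrinks it, so the limit is
-- ...111 = -1.  On a 64-bit Int (with wrap-around at maxBound) the
-- run has filled the whole word after 64 steps:
termination :: Bool
termination = iterate box 0 !! 64 == -1
```

Getting -1 is the numeric counterpart of the circuit settling on all 1's: the fixed point is a theorem.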

2. Robots who reason about each others play in Prisoner's Dilemma

For the background to this problem see Robust Cooperation in the Prisoner's Dilemma at LessWrong. We have two robot participants A and B playing Prisoner's Dilemma. Each can examine the other's source code and can search for proofs that the opponent will cooperate. Suppose each robot is programmed to enumerate all proofs of PA and cooperate if it finds a proof that its opponent will cooperate. Here we have a = "A will cooperate" and b = "B will cooperate". Our assumptions about the behaviour of the robots are a ↔ □b and b ↔ □a, and hence that a ↔ □□a. This corresponds to the circuit:

This outputs ...1111 = -1 so we can conclude a, and hence that these programs will cooperate. (Note that this doesn't work out nicely if robot B has a program that doesn't terminate but whose termination isn't provable in the formal system A is using. That means this approach is only good for robots that want to cooperate and want to confirm such cooperation. See the paper for more on this.)

At this point I really must emphasise that these applications are deceptively simple. I've shown how these simple circuits can answer some tricky problems about provability. But these aren't simply the usual translations from boolean algebra to logic gates. They work because circuits with delay-latch provide a model for letterless provability logic and that's only the case because of a lot of non-trivial theorem proving in Boolos that I haven't reproduced here. You're only allowed to use these simple circuits once you've seen the real proofs :-)

Things I didn't say above

1. I described the translation from propositions to circuits above. But I didn't tell you what the reverse translation, from bitstreams back to propositions, looks like. I'll leave this as an exercise. (Hint: consider the outputs of the circuits for propositions of the form □□⋯□⊥.)

2. The integers, considered as bitstreams, with the bitwise operators, and the unary operator □p=p^(p+1), form an algebraic structure. For example, if we define ⋄p=~□~p we have a Magari algebra. Structures like these are intended to capture the essential parts of self-referential arguments in an algebraic way.

3. Because of the interpretation of □ as a delayed latch in a circuit you could view it as saying "my input was always true until a moment ago". This surely embeds provability logic in a temporal logic of some sort.

4. (Deleted speculations about tit-for-tat that need rethinking.)

5. For even the most complex letterless proposition in Boolos you could check its theoremhood with a pretty small circuit. You could even consider doing this with a steam powered pneumatic circuit. I had to say that to fulfil a prophecy and maintain the integrity of the timeline.

Appendix on provability

The modern notion of a proof is that it is a string of symbols generated from some initial strings called "axioms" and some derivation rules that make new strings from both axioms and strings you've derived previously. Usually we pick axioms that represent "self-evident" truths and we pick derivation rules that are "truth-preserving" so that every proof ends at a true proposition of which it is a proof. The derivation rules are mechanical in nature: things like "if you have this symbol here and that symbol there then you can replace this symbol with that string you derived earlier" etc.

You can represent strings of symbols using numbers, so-called Gödel numbers. Let's pick a minimal mathematical framework for working with numbers: Peano Arithmetic, aka PA. Let's assume we've made some choice of Gödel numbering scheme and, when p is a proposition, write ⌜p⌝ for the number representing p. You can represent the mechanical derivation rules as operations on numbers. And that makes it possible to define a mathematical predicate Prov that is true if and only if its argument represents a provable proposition.

In other words, we can prove Prov(⌜p⌝) using PA if and only if p is a proposition provable in PA.

The predicate Prov has some useful properties:

1. If we can prove p, then we can prove Prov(⌜p⌝).

We take the steps we used to prove p, and convert everything to propositions about numbers. If Prov is defined correctly then we can convert that sequence of numbers into a sequence of propositions about those numbers that makes up a proof of Prov(⌜p⌝).

2. Prov(⌜p → q⌝) and Prov(⌜p⌝) imply Prov(⌜q⌝)

A fundamental step in any proof is modus ponens, i.e. that p and p → q imply q. If Prov does its job correctly then it had better know about this.

3. Prov(⌜p⌝) implies Prov(⌜Prov(⌜p⌝)⌝)

One way to prove this is to use Löb's theorem.


4. Prov(⌜⊤⌝)

The trivially true statement ⊤ had better be provable or Prov is broken.

Constructing Prov is conceptually straightforward but hard work. I'm definitely not going to do it here.

And there's one last thing we need: self-reference. If p is a proposition, how can we possibly have p assert something about ⌜p⌝ without squeezing a copy of p inside p? I'm not going to do that here either - just mention that we can use a variation of quining to achieve this. That allows us to form a proposition p for which we can prove p ↔ ¬Prov(⌜p⌝). In fact, we can go further. We can find propositions that solve p ↔ F(⌜p⌝) for any predicate F built from the usual boolean operations and Prov, as long as all of the occurrences of ⌜p⌝ are inside the appearances of Prov. Even though we can't form a proposition that directly asserts its own falsity, we can form one that asserts that it is unprovable, or one that asserts that you can't prove that you can't prove that you can prove it, or anything along those lines.

Anyway, all that Gödel numbering and Prov business is a lot of hassle. Provability logic, also known as GL, is intended to capture specifically the parts of PA that relate to provability. GL is propositional calculus extended with the provability operator □. The intention is that if p is a proposition, □p is a proposition in GL that represents Prov(⌜p⌝) in PA. The properties of Prov above become the axioms and derivation rules of GL in the main text.

by Dan Piponi at July 15, 2017 04:09 PM

July 13, 2017

FP Complete

Stack's New Extensible Snapshots

NOTE This blog post made the rounds last week before the branch was actually merged and the post was still on a review server. I'm officially publishing it as the pull request is now merged.

There is a collection of features in Stack that have been added in bit by bit, as opposed to being designed into a cohesive whole from the start. The features work, but could be a bit better. We've known for a while that, instead of putting in place strategic fixes, a more general refactoring of the core dependency management logic was in order. I'm happy to announce that these changes have landed in the master branch, and will be part of the next major release of Stack.

I'd like to motivate the limitations in Stack that encouraged this change, discuss the new system, mention some potential future changes, and share a few thoughts on the (very pleasant) Haskell refactoring process itself.

NOTE These features have not currently been released, so don't try using them in a stable Stack executable. If you'd like to test them out (and I'd certainly appreciate the extra testing), you can run stack upgrade --git to build a Stack executable from the master branch.


Consider this fairly standard snippet of a stack.yaml file:

resolver: lts-8.12
packages:
- ./site1
- ./site2
- location:
    git: https://github.com/yesodweb/yesod
    commit: 7038ae6317cb3fe4853597633ba7a40804ca9a46
  extra-dep: true
  subdirs:
  - yesod-core
  - yesod-bin
extra-deps:
- html-conduit-

This is leveraging a number of features of Stack right off the bat:

  • Using an LTS Haskell snapshot to capture a consistent set of dependencies
  • Specify multiple project packages, in this case site1 and site2.
  • Specify an extra dependency from a Git repository.
  • Specify multiple subdirectories for a Git repository to directly support "megarepos".
  • Specify an extra dependency from the upstream package index (which is Hackage unless you've done something weird).

This is great, but there's a bit of pain involved in this:

  • My personal pet peeve: that extra-dep: true for the Git repo. Because of how features were added, we included Git repos together with "project packages," and then added a hack to explicitly state that they should be treated as dependencies (so that stack test, for example, won't build their dependencies). This feels weird.
  • We also have this extra-deps stanza, which accepts package name/version combos, but doesn't accept Git repos (or HTTP(S) tarballs, which are also supported in the packages section).
  • As Hackage cabal file revisions become more common, we lose a level of reproducibility: extra-deps cannot specify the exact revision of a package we'd like to use.
  • The subdirs feature is nice, but it's weird that it's not more directly connected to the Git repo information (notice how it's a level up).
  • I've specified yesod-bin as an extra-dep, which provides an executable named yesod. Ideally, if one of my packages specified a dependency on that executable, Stack would automatically build the yesod-bin package. While this logic works for LTS Haskell and Stackage Nightly snapshots, it doesn't work for these extra-deps.
  • Specifying dependencies like this inside the stack.yaml file makes them local to just the project I'm working on. There are advantages to this approach regarding disk space (which I won't get into here), but there's a big downside: I can't share precompiled libraries between projects that are defined this way. I'd like to be able to recapture that sharing ability.
  • And more generally: why can't I just define this customized version of lts-8.12 and share it among multiple projects, possibly even from an immutable URL?

As you can see, the problems aren't insurmountable, but they are annoyances, and they seem to overlap quite a bit.

Updated syntax

Let's rewrite that stack.yaml file to be a little bit more straightforward:

resolver: lts-8.12
packages:
- ./site1
- ./site2
extra-deps:
- html-conduit-@sha256:...
- git: https://github.com/yesodweb/yesod
  commit: 7038ae6317cb3fe4853597633ba7a40804ca9a46
  subdirs:
  - yesod-core
  - yesod-bin

The first thing to notice is that the packages value is now just a list of the actual code in our project, not the dependencies.

Next, we still have html-conduit- coming from Hackage. But we have this funny @sha256:... bit at the end. This is a hash of the cabal file contents we want to use. This gives us much stronger guarantees of reproducibility than we had previously. Instead of getting whatever most recent version happens to be available, you'll get an exact cabal file. This feature has been present for a while in Stackage snapshots, but hasn't been accessible for local dependencies.

Next, we've moved the Git repo information out of packages and into extra-deps where it logically belongs. We also no longer need that extra location key. We had that so that we could also define extra-dep and subdirs keys. We now instead put the subdirs key next to the git and commit keys, and don't need extra-dep at all (since it's implied by being within extra-deps).

Behind the scenes, the code managing these things has changed drastically. Most importantly for our discussion here, Stack now uses the same code paths for loading up snapshots and loading up package information within the stack.yaml file. In addition to just being a good practice for keeping us sane, this means that build tool detection now works for project packages and dependencies too.

This answers a good deal of our points above (hold off for the last two when we get to custom snapshots).

Four package locations

That probably seemed like a bit of a jumble, so let's start over. Every package has a location, which tells Stack where to get it from. Stack supports four different package locations:

  • Local file path, like ./site1 above
  • Package index, specified via a package name, version number, and optional cabal file hash. You're probably wondering "why not just call it Hackage?" The reason is that you can extend your list of package indices to augment or override Hackage (such as to use a corporate package repository). That's why we use the general term.
  • Git or Mercurial repo, specified by a repo URL, a commit, and an optional list of subdirectories within the repo to look for packages. If omitted, Stack will look in the root of the repo.
  • HTTP(S) URL, which isn't demonstrated above, but which is just a standard http(s):// URL pointing to a tarball.

All four of these have been supported in Stack since (almost) its inception. The differences now are that:

  • extra-deps now supports all four forms
  • packages still supports local file paths, Git repos, and HTTP(S) URLs, but for the latter two requires you to explicitly state extra-dep: as either true or false. We'll discuss this a bit more below. There are two reasons package index location isn't supported here:

    • It wasn't supported previously, and is an illogical thing to do: you would never have a situation of working on a package pulled from the index. If you want to do that, you should probably clone the source of that package (such as with a Git submodule).
    • It could introduce ambiguities in parsing between a package index and a filepath (imagine you have a local directory foo-1.2.3). We've allowed this ambiguity to exist in extra-deps; if you have such a filepath, you can always preface it with ./.

This is all well and good, but isn't much more than a cosmetic improvement (though, in my opinion, it's a very nice cosmetic improvement). But this gets much cooler with custom snapshots.

Custom snapshots

Stack has had some support for custom snapshots for a while, but it's never been fully implemented, since we've been waiting for this extensible snapshot concept to land. Since most people aren't very familiar with custom snapshots today, I'm not going to compare and contrast, but instead just jump in to explaining how they work now.

Stack configurations always discuss a resolver, which specifies a GHC version, a set of additional packages, build flags, and some other pieces of metadata. You've probably seen a few kinds of resolvers until now:

  • lts-8.12, using LTS Haskell
  • nightly-2017-07-01, using Stackage Nightly
  • ghc-8.0.2, using a specific GHC version without any extra packages available

Custom snapshots answer a simple question: what if I want to define my own snapshot which isn't LTS Haskell or Stackage Nightly? And that's really all they are: a format for defining your own snapshots like Stackage does. However, they've got a number of cool features that Stackage snapshots don't:

  • They are extensible (thus the whole blog post name): you can define a parent snapshot for any snapshot and inherit its settings.
  • You can use all four types of package locations when defining a custom snapshot. Say you have a package that isn't ready to be released to Hackage, or is only for your internal team to use. You can stick it in a tarball on a webserver, or in a Git repo, and refer to it just as you would from extra-deps in your stack.yaml file.
  • Unlike defining packages in stack.yaml, packages built in a custom snapshot can be shared in the package cache and reused between projects.

Let's see how this would modify our stack.yaml from above. First, I'm going to define a my-snapshot.yaml file:

resolver: lts-8.12
name: my-snapshot # For user display only
packages:
- html-conduit-@sha256:...
- git: https://github.com/yesodweb/yesod
  commit: 7038ae6317cb3fe4853597633ba7a40804ca9a46
  subdirs:
  - yesod-core
  - yesod-bin

Notice how I've kept the same resolver value here. What I'm stating is that I'd like my snapshot to start off with the same GHC version and package set defined in lts-8.12, and then add new packages. Next, I've copied my entire extra-deps section in here, and called it packages instead (since these are the packages that actually make up the snapshot, not some extra dependencies added on top).

Note that, because a custom snapshot is intended to contain immutable package data, it does not support local filepaths as package location, as these are expected to change over time.

Now the stack.yaml file:

resolver: my-snapshot.yaml
packages:
- ./site1
- ./site2

Instead of a Stackage snapshot or compiler version, my resolver now gives the path to the snapshot config file. This can be a file path, or an HTTP(S) URL. The packages section stayed the same, but my extra-deps is no longer necessary: all of my dependencies are now defined within the custom snapshot.

And in case anyone wants to get cheeky: yes, a custom snapshot can put another custom snapshot in its resolver field. You can layer these things up as many layers as you'd like. Have fun!
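To make the layering concrete, here's a hypothetical second-level snapshot; the file name, repo URL, commit, and package below are made up purely for illustration:

```yaml
# company-snapshot.yaml: builds on the custom snapshot above
resolver: my-snapshot.yaml # itself a custom snapshot file
name: company-snapshot
packages:
- git: https://git.example.com/internal-tools.git
  commit: 0123456789abcdef0123456789abcdef01234567
  subdirs:
  - tools-core
```

A project pointing its resolver at company-snapshot.yaml (or at an HTTP(S) URL serving it) would then inherit everything from lts-8.12 plus both layers of additions.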

Just to summarize, Stack supports three different kinds of resolver values:

  • A compiler name
  • A Stackage snapshot name (LTS or Nightly)
  • A custom snapshot

Where's the global package information?

Global packages are those packages which are shipped with GHC itself (or at least end up in its global database). A funny question I bet many people never thought about is: where are global packages defined? Are they in the snapshot, or does Stack look them up from GHC itself? There are advantages both ways:

  • If it comes from the snapshot, it may be wrong. For example, do you have the Win32 or unix package?
  • If it comes from GHC itself, then Stack can't do a stack init without first installing every GHC version it wants to test compatibility with.

Previously, global information came from the Stackage snapshots. But both because of the "possibly incorrect" reason, and because it would be a royal pain to define all of the global packages in each custom snapshot file, Stack now does something different:

  • When you use stack init or stack new (which implicitly calls stack init), Stack will rely on global hints present in a snapshot as a good guess about which packages GHC provides.
  • When you're ready to start building your code against a specific GHC version, Stack will query GHC's global database.

Choosing between stack.yaml and custom snapshot

You may be conflicted about whether you should add extra dependencies into a stack.yaml file (as you've probably done until now) or define a custom snapshot. My answers may change over time with experience, but here are some good guesses:

  • If you're making lots of rapid iterations over the dependencies (e.g., testing five different versions of a package), use extra-deps. You will avoid Stack having to create separate snapshot databases and do a bunch of "copying precompiled package" stuff.
  • If you're using the same modified sets in many projects, use a custom snapshot
  • If you're changing a package deep in the dependency hierarchy (like mtl or stm), make a custom snapshot to try and save work (though I'm not sure if my logic is sound here)

Things we've lost

Besides the potential for some kind of breaking change in behavior to have crept in (NOTE please help me by testing against your projects!), the only lost features I'm aware of are:

  • The custom snapshot configuration has changed significantly. Since very few people are using it, and it was always an experimental feature, this is probably an acceptable trade-off.
  • We no longer have support for subdirs on tarballs and index packages. This is because the feature just doesn't make sense, and was only accidentally present. (We do have support for filepath locations, which is equally silly, but was also trivial to add support for and made one of the integration tests happy.)

Future changes

Here's my biggest feature for consideration: making the project packages only support filepaths. I can think of no logical case where we'd want to support HTTP(S) or Git repos as "project packages" (meaning things that run tests, for instance). In my ideal world:

  • The packages key would accept a list of filepaths
  • The extra-deps key would accept exactly what it does now
  • Everyone could transition from their current extra-dep: true syntax over the next few versions before we remove the old support

I'm not normally in favor of breaking backwards compatibility in Stack, but miscategorized extra deps has resulted in much confusion, so I'd be happy to see it go, even if it requires rewriting stack.yaml files over time.

July 13, 2017 05:30 AM

July 12, 2017

Philip Wadler

Today's the day: Fight for Net Neutrality


Today is an internet-wide day of action to support Net Neutrality.

Net Neutrality is under severe attack by the FCC and the Trump Administration. Only sustained action will save it. And if it falls in the US, will it be long before the rest of the world follows?

Net Neutrality is already being eroded. Virgin proudly tells me that I'm not charged mobile bandwidth when I use Twitter; other providers offer similar services for Facebook or Netflix. Seemingly a bonus, these offers are really a minus: they lock in the present winners, and make it difficult for the next generation of innovations to emerge.

Unless we act now, people will look back on our days as 'The Golden Age of the Internet'.

Do something now! It takes less than ten minutes. Details here.

by Philip Wadler ( at July 12, 2017 09:49 AM

July 10, 2017

FP Complete

Iterators and Streams in Rust and Haskell

Streaming data is a problem domain I've played with a lot in Haskell. In Haskell, the closest we come to built-in streaming data support is laziness-by-default, which doesn't fully capture streaming data. (I'm not going into those details today, but if you want to understand this better, there's plenty of information in the conduit tutorial.) Real streaming data is handled at the library level in Haskell, with many different options available.

Rust does things differently: it bakes a concept called iterators not only into the standard library, but into the language itself: for loops are built-in syntax for iterators. There are some interesting trade-offs to discuss regarding solving a problem in the language itself versus a library, which I'm not going to get into.

Also, Rust's approach follows a state machine design, as opposed to many Haskell libraries which use coroutines. That choice turns out to be pretty crucial to getting good performance, and applies in the Haskell world as well. In fact, I've already blogged about this concept with my aptly-named Vegito concept. For those familiar with it: you'll see some crossovers in this blog post, but prior knowledge isn't necessary.

While digging into the implementation of iterators in Rust, I found it very enlightening how the design differed from what idiomatic Haskell would look like. Trying to mirror the design from one language in the other really demonstrates some profound differences in the languages, which is what I'm going to try and dive in on today.

To motivate the examples here, we're going to try to perform the same computation many times: stream the numbers from 1 to 1,000,000, filter to just the evens, multiply every number by 2, and then sum them up. You can find all of the code in a Gist. Here's an overview of the benchmark results, with many more details below:

Benchmark results

Also, each function takes an integer argument to tell it the highest value it should count to (which is always 1,000,000). Criterion requires this kind of argument be present to ensure that the Haskell compiler (GHC) doesn't optimize away our function call and give us bogus benchmarking results.

Baseline and cheating

The Gist includes code in Haskell, C, and Rust, with many different implementations of the same kind of function. The Haskell code foreign imports both Rust and C and uses the Criterion benchmarking library to benchmark them. To start off, I implemented a cheating version of each benchmark. Instead of actually filtering and doubling, it just increments the counter by 4 each time and adds it to a total. For example, in C this looks like:

int c_cheating(int high) {
  int total = 0;
  int i;
  high *= 2;
  for (i = 4; i <= high; i += 4) {
    total += i;
  }
  return total;
}

By contrast, the non-cheating loop version in C is:

int c_loop(int high) {
  int total = 0;
  int i;
  for (i = 1; i <= high; ++i) {
    if (i % 2 == 0) {
      total += i * 2;
    }
  }
  return total;
}

Similarly, we have cheating and loop implementations in Rust:

pub extern fn rust_cheating(high: isize) -> isize {
    let mut total = 0;
    let mut i = 4;
    let high = high * 2;
    while i <= high {
        total += i;
        i += 4;
    }
    total
}

pub extern fn rust_loop(mut high: isize) -> isize {
    let mut total = 0;
    while high > 0 {
        if high % 2 == 0 {
            total += high << 1;
        }
        high -= 1;
    }
    total
}

And in Haskell. Haskell uses recursion in place of looping, but under the surface the compiler turns it into a loop at the assembly level.

haskellCheating :: Int -> Int
haskellCheating high' =
  loop 0 4
  where
    loop !total !i
      | i <= high = loop (total + i) (i + 4)
      | otherwise = total
    high = high' * 2

recursion :: Int -> Int
recursion high =
  loop 1 0
  where
    loop !i !total
      | i > high = total
      | even i = loop (i + 1) (total + i * 2)
      | otherwise = loop (i + 1) total

These two sets of tests give us some baseline numbers to compare everything else we're going to look at. First, the cheating results:

benchmarking C cheating
time                 87.13 ns   (86.26 ns .. 87.99 ns)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 86.87 ns   (86.08 ns .. 87.57 ns)
std dev              2.369 ns   (1.909 ns .. 3.127 ns)
variance introduced by outliers: 41% (moderately inflated)

benchmarking Rust cheating
time                 174.7 μs   (172.8 μs .. 176.9 μs)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 175.2 μs   (173.3 μs .. 177.3 μs)
std dev              6.869 μs   (5.791 μs .. 8.762 μs)
variance introduced by outliers: 37% (moderately inflated)

benchmarking Haskell cheating
time                 175.2 μs   (172.2 μs .. 178.9 μs)
                     0.998 R²   (0.995 R² .. 0.999 R²)
mean                 174.6 μs   (172.9 μs .. 176.8 μs)
std dev              6.427 μs   (4.977 μs .. 9.365 μs)
variance introduced by outliers: 34% (moderately inflated)

You may be surprised that C is about twice as fast as Rust and Haskell. But look again: C is taking 87 nanoseconds, while Rust and Haskell both take about 175 microseconds. It turns out that GCC is able to optimize this into a downward-counting loop, which drastically improves the performance. We can do similar things in Rust and Haskell to get down to nanosecond-level performance, but that's not our goal today. I do have to say: well done GCC.

The non-cheating results still favor C, but not to the same extent:

benchmarking C loop
time                 636.3 μs   (631.8 μs .. 640.5 μs)
                     0.999 R²   (0.998 R² .. 0.999 R²)
mean                 636.3 μs   (629.9 μs .. 642.9 μs)
std dev              22.67 μs   (18.76 μs .. 27.87 μs)
variance introduced by outliers: 27% (moderately inflated)

benchmarking Rust loop
time                 632.8 μs   (623.8 μs .. 640.4 μs)
                     0.999 R²   (0.998 R² .. 0.999 R²)
mean                 626.9 μs   (621.4 μs .. 631.9 μs)
std dev              18.45 μs   (14.97 μs .. 23.18 μs)
variance introduced by outliers: 20% (moderately inflated)

benchmarking Haskell recursion
time                 741.9 μs   (733.1 μs .. 755.0 μs)
                     0.998 R²   (0.996 R² .. 0.999 R²)
mean                 748.7 μs   (739.8 μs .. 762.8 μs)
std dev              36.37 μs   (29.18 μs .. 52.40 μs)
variance introduced by outliers: 41% (moderately inflated)

EDIT Originally this article listed a performance number for the C loop as being faster than Rust. However, as pointed out on Reddit, the code in question was mistakenly using int instead of int64_t to match the Rust and Haskell behavior. The numbers have been updated.

All of the results are the same order of magnitude. C and Rust are neck and neck, with Haskell lagging by about 15%. Understanding the differences between the languages' performance would be an interesting topic in and of itself, but our goal today is to compare the higher-level APIs and see how they affect performance within each language. So for the rest of this post, we'll focus on comparing intra-language performance numbers.

Rust's iterators

OK, with that out of the way, let's look at Rust's implementation using iterators. The code is concise, readable, and elegant:

pub extern fn rust_iters(high: isize) -> isize {
    (1..high + 1)
        .filter(|x| x % 2 == 0)
        .map(|x| x * 2)
        .sum()
}

We can compare this pretty directly with a Haskell implementation using lists or vectors:

list :: Int -> Int
list high =
  sum $ map (* 2) $ filter even $ enumFromTo 1 high

unboxedVector :: Int -> Int
unboxedVector high =
  VU.sum $ VU.map (* 2) $ VU.filter even $ VU.enumFromTo 1 high

This is the first interesting API difference between Haskell and Rust. With Haskell, sum, map, and filter are each functions which are applied to an existing list or vector. You'll notice that, in the vector case, we need to use a qualified import VU. to ensure we're getting the correct version of the function. By contrast, in Rust, we're simply calling methods on the Iterator trait. This means that no namespacing is necessary, but on the other hand adding a new iterator adapter would mean the new function would not follow the same function call syntax as the others. (To me, this seems like a watered down version of the expression problem.)

EDIT As pointed out on Reddit, an extension trait can allow new methods to be added to all iterators.

This doesn't seem like a big deal, but it does show an inherent difference in how namespacing is handled in the two languages, and the impact is fairly ubiquitous. I'd argue that this is a fairly surface-level distinction, but an important one to note.

benchmarking Rust iters
time                 919.5 μs   (905.5 μs .. 936.0 μs)
                     0.998 R²   (0.998 R² .. 0.999 R²)
mean                 919.1 μs   (910.4 μs .. 926.7 μs)
std dev              28.63 μs   (24.52 μs .. 33.91 μs)
variance introduced by outliers: 21% (moderately inflated)

benchmarking Haskell unboxed vector
time                 733.3 μs   (722.6 μs .. 745.2 μs)
                     0.998 R²   (0.996 R² .. 0.999 R²)
mean                 742.4 μs   (732.2 μs .. 752.8 μs)
std dev              33.42 μs   (28.01 μs .. 41.24 μs)
variance introduced by outliers: 36% (moderately inflated)

benchmarking Haskell list
time                 714.0 μs   (707.0 μs .. 720.8 μs)
                     0.999 R²   (0.998 R² .. 0.999 R²)
mean                 710.4 μs   (702.7 μs .. 719.4 μs)
std dev              26.49 μs   (21.79 μs .. 33.72 μs)
variance introduced by outliers: 29% (moderately inflated)

Interesting. While the Haskell benchmarks are about the same as the lower-level recursion approach, the Rust iterator implementation is noticeably slower than the low level loop. I have my own theory on what's going on there, and I'll share it below. Unfortunately, my Rust skills are not strong enough to properly test my hypothesis.

Implementing iterators in Haskell

In Rust, there is an Iterator trait with an associated type Item and a method next. Eliding some extra methods we don't care about here, it looks like this:

pub trait Iterator {
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
}

Let's translate this directly to Haskell:

class Iterator iter where
  type Item iter
  next :: iter -> Maybe (Item iter)

That looks remarkably similar. Some basic differences worth noting:

  • In Rust, we use pub to indicate if something is publicly exposed. In Haskell, we use export lists on the module.
  • Instead of Self, Haskell uses type variables (I called it iter here)
  • Function signature syntax is different
  • Rust tracks information about mutability and references. This is a big difference, and will play out a lot in this post, so I won't detail it too much here
  • Rust says Option, Haskell says Maybe

Let's do a simple implementation in Rust:

struct PowerUp {
    curr: u32,
}

impl Iterator for PowerUp {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.curr > 9000 {
            None
        } else {
            let res = self.curr;
            self.curr += 1;
            Some(res)
        }
    }
}

This will iterate through all of the numbers between the starting value and 9000. But there's one line in particular I want to draw your attention to:

self.curr += 1;

That is mutation, and for anyone familiar with Haskell, you know we don't like it very much. In fact, our Iterator typeclass above doesn't work at all, since it has no way of mutating a variable. In order to make this work, we'll need to modify our class. Since we'll have lots of these, I'm going to start numbering them:

class Iterator1 iter where
  type Item1 iter
  next1 :: iter -> IO (Maybe (Item1 iter))

The point is that, each time we iterate our value, it can have some side-effect of mutating a variable. This is a crucial distinction between Rust and Haskell. Rust tracks whether individual values can be mutated or not. And it even defaults to (IMO) the right behavior of immutability. Nonetheless, there is no indication in the type signature of a function that it performs side effects.

Let's power up in Haskell:

data PowerUp = PowerUp (IORef Int)

instance Iterator1 PowerUp where
  type Item1 PowerUp = Int
  next1 (PowerUp ref) = do
    curr <- readIORef ref
    if curr > 9000
      then return Nothing
      else do
        writeIORef ref $! curr + 1
        return $ Just curr

Ignoring unimportant syntax differences:

  • In Haskell, we explicitly need to wrap any mutable field with an IORef (or similar mutable variable)
  • Similarly, we need to use explicit readIORef and writeIORef functions to access the value, as opposed to getting and modifying values directly in Rust.
  • You may have noticed the $! before curr + 1. If you were paying close attention above, in the recursion function, I had something similar with loop !i !total. These are special operators and syntax in Haskell to force evaluation. This is because Haskell is lazy by default, whereas Rust is strict by default.

Alright, so I went ahead and implemented everything with this Iterator1 class and ended up with:

iterator1 :: Int -> Int
iterator1 high =
  unsafePerformIO $
  enumFromTo1 1 high >>=
  filter1 even >>=
  map1 (* 2) >>=
  sum1

We're using unsafePerformIO here, since we want to run this function purely, but it's performing side-effects. A better approach in Haskell is using the ST type, but I'm going for simplicity here. I'm not going to copy the implementation of the types here; please take a look at the Gist if you're curious.

Now let's look at performance:

benchmarking Haskell iterator 1
time                 5.181 ms   (5.108 ms .. 5.241 ms)
                     0.997 R²   (0.993 R² .. 0.999 R²)
mean                 5.192 ms   (5.140 ms .. 5.267 ms)
std dev              179.5 μs   (125.3 μs .. 294.5 μs)
variance introduced by outliers: 16% (moderately inflated)

That's 5 milliseconds, or 5000 microseconds. Meaning, a hell of a lot slower than recursion, lists, and vectors. So we've hit three hurdles:

  • The code looks less clean than the list/vector version
  • We've had to pull out unsafePerformIO
  • And performance sucks

I guess idiomatic Rust isn't so idiomatic in Haskell.

Boxed vs unboxed

Haskell aficionados may have noticed one major performance bottleneck in what I've presented. IORefs are boxed data structures. Meaning: the data they contain is actually a pointer to a heap object. This means that, each time we write a new Int to an IORef, we have to:

  • Allocate a new heap object to hold it. That heap object needs to be big enough to hold the payload (one machine word) and the data constructor (another machine word).
  • Update the pointer inside the IORef to the new heap object.
  • Garbage collect the old heap object. This won't happen immediately, but is an overhead incurred for each iteration.

Fortunately, there's a workaround for this: unboxed references. There's a library providing them, and switching over to them in our implementation drops the runtime to:

benchmarking Haskell iterator 1 unboxed
time                 2.966 ms   (2.938 ms .. 2.995 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 2.974 ms   (2.952 ms .. 3.007 ms)
std dev              84.05 μs   (57.67 μs .. 145.2 μs)
variance introduced by outliers: 13% (moderately inflated)

Better, but still not great. The simple fact is that Haskell is not optimized for dealing with mutable data. There are some use cases that still work well for mutable data in Haskell, but this kind of low level, tight inner loop isn't one of them.

As a side note: as much as I'm implying that boxed references are a terrible thing in Haskell, they have some great advantages. The biggest is atomic operations: an operation like atomicModifyIORef or the entirety of Software Transactional Memory (STM) leverage the fact that they can create a new multi-machine-word data structure on the heap, and then atomically update the one-machine-word pointer. That's pretty nifty.


Alright, the mutable variable approach seems like a dead end. Let's get more idiomatic with Haskell: immutable values. We'll take in an iterator state, and return an updated state:

data Step2 iter
  = Done2
  | Yield2 !iter !(Item2 iter)

class Iterator2 iter where
  type Item2 iter
  next2 :: iter -> Step2 iter

We've added a helper data type to capture what's going on here. At each iteration, we can either be done, or yield both a new value and a new state for our iterator. The IO bit has disappeared, since there are no mutation side-effects occurring. This implementation turned out to be wonderfully inefficient:

benchmarking Haskell iterator 2
time                 15.80 ms   (15.06 ms .. 16.64 ms)
                     0.992 R²   (0.987 R² .. 0.998 R²)
mean                 15.16 ms   (14.99 ms .. 15.41 ms)
std dev              561.5 μs   (363.6 μs .. 934.6 μs)
variance introduced by outliers: 11% (moderately inflated)

Why the hate? It turns out that we've just exacerbated our previous problem. Before, each iteration caused a new Int heap object to be created and a pointer to be updated. Now, each iteration causes a bunch of new heap objects, namely all of our data types representing the various functions:

data EnumFromTo2 a = EnumFromTo2 !a !a
data Filter2 a iter = Filter2 !(a -> Bool) !iter
data Map2 a b iter = Map2 !(a -> b) !iter

These are built up and torn down each time we iterate, which is pretty pathetic performance wise. If Haskell could inline the embedded iter fields in each of these data constructors (via the UNPACK pragma), life would be better, but GHC can't unpack polymorphic fields. So we're creating 3 new heap objects each time.

I've included Iterator3 in the Gist, which monomorphizes things a whole bunch to allow inlining. As expected, it improves performance significantly:

benchmarking Haskell iterator 3
time                 8.391 ms   (8.161 ms .. 8.638 ms)
                     0.996 R²   (0.994 R² .. 0.999 R²)
mean                 8.397 ms   (8.301 ms .. 8.517 ms)
std dev              300.0 μs   (218.4 μs .. 443.9 μs)
variance introduced by outliers: 14% (moderately inflated)

But it's still bad. Something more fundamental is wrong here.

Functions are data

Until now in Haskell, we've stuck with the Rust approach of:

  • Declare a trait (type class)
  • Define a data type for each operation
  • Implement the trait/instantiate the class for each data type

This seems to work really well for Rust (more on that below). But it's neither idiomatic Haskell code, nor does it play nicely with Haskell's runtime behavior and garbage collector. Let's remember that, in Haskell, functions are data, and completely bypass the typeclass:

data Step4 s a
  = Done4
  | Yield4 s a

data Iterator4 a = forall s. Iterator4 s (s -> Step4 s a)

Our Step4 data type has two type variables: s is the internal state of the iterator, and a is the next value to be yielded. Now the cool part: Iterator4 says "well, the outside world cares about the a type variable, but the internal state is irrelevant." So it uses an existential to say "this works for all possible internal states."

We then have two fields: the current value of the state, and a function that gets the next step from the current state. To really drive this home, we'll need to look at some implementations:

enumFromTo4 :: (Ord a, Num a) => a -> a -> Iterator4 a
enumFromTo4 start high =
  Iterator4 start f
  where
    f i
      | i > high  = Done4
      | otherwise = Yield4 (i + 1) i

We've defined a helper function f. This f function remains constant throughout the entire lifetime of enumFromTo4. Only the i value it is passed gets updated. And let's see how we would call one of these Iterator4s:

sum4 :: Num a => Iterator4 a -> a
sum4 (Iterator4 s0 next) =
  loop 0 s0
  where
    loop !total !s1 =
      case next s1 of
        Done4 -> total
        Yield4 s2 x -> loop (total + x) s2

We capture the next function once and then use it throughout the loop. This may not seem too different from previous code: we're still going to need to create a new state value and destroy it each time we iterate. However, that's not the case: GHC is smart enough to realize that our state is just a single machine Int, and ends up storing it in a machine register, bypassing heap allocations entirely.

Don't get too excited yet though. While we've decimated iterator 2 and 3, our performance is still bad:

benchmarking Haskell iterator 4
time                 3.614 ms   (3.559 ms .. 3.669 ms)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 3.590 ms   (3.542 ms .. 3.641 ms)
std dev              151.4 μs   (116.4 μs .. 192.4 μs)
variance introduced by outliers: 24% (moderately inflated)

We've got one more trick up our performance sleeve, after a message from our sponsors.

Why Rust likes data types

We've seen that the Rust implementation uses individual data types for each operation. But surely with its first class functions, it should be able to do the same thing as Haskell, right? I'm not a Rust expert, but I believe the answer is: yes, but the performance will suffer greatly.

To explain why, consider the type of the expression (1..high + 1).filter(|x| x % 2 == 0).map(|x| x * 2):

Map<Filter<Range<isize>, [closure@... 7:31]>, [closure@... 8:23]>

As a Haskeller, when I first realized that this type was being employed, I was pretty confused. By contrast, the type of the equivalent Haskell expression map4 (* 2) $ filter4 even $ enumFromTo4 1 high is just Iterator4 Int, which seems much more direct.

Here's the rub: Haskell is happy—perhaps even too happy—to just stick data on the heap and forget about it. We don't care about the size of our data as much, since we'll just stick a pointer into a data structure. Rust, by contrast, does really well when data is stored on the stack. In order to make that happen, it needs to know the exact size of the data in question. And therefore, instead of the Filter data structure getting to say "yeah, I just work on any data structure that implements Iterator," it is generic over all possible implementations.

This is similar to the lack of polymorphic unpacking in Haskell that I mentioned above, and leads to inherently different coding styles in many cases, including this one. Also, this behavior of Rust is in direct contradiction to the existential we used above to explicitly hide internal state from our type signature, whereas in Rust we're flaunting it.

A single loop

Alright, back to our pessimal performance problems. We could do a bunch of performance analyses right now, look at GHC generated core and assembly, and spend months writing up a paper on how to improve performance. Fortunately, someone else already did it. To understand the problem, let's look at our Iterator4 again. We already saw that there's a loop in the implementation of sum4, as you'd expect. Let's see filter4:

filter4 :: (a -> Bool) -> Iterator4 a -> Iterator4 a
filter4 predicate (Iterator4 s0 next) =
  Iterator4 s0 loop
  where
    loop s1 =
      case next s1 of
        Done4 -> Done4
        Yield4 s2 x
          | predicate x -> Yield4 s2 x
          | otherwise   -> loop s2

Notice the loop here as well: if the predicate fails, we need to drop a value, and therefore need to call next again. It turns out that GHC is really good at optimizing code that has a single loop, but performance degrades terribly when you have two nested loops, like we do here.

The stream fusion paper provides the solution to this problem: extend our Step datatype with a Skip constructor, which indicates "loop again with a new state, but I don't have any new data available."

data Step5 s a
  = Done5
  | Skip5 s
  | Yield5 s a

Then our implementations change a bit. filter5 becomes:

filter5 :: (a -> Bool) -> Iterator5 a -> Iterator5 a
filter5 predicate (Iterator5 s0 next) =
  Iterator5 s0 noloop
  where
    noloop s1 =
      case next s1 of
        Done5 -> Done5
        Skip5 s2 -> Skip5 s2
        Yield5 s2 x
          | predicate x -> Yield5 s2 x
          | otherwise   -> Skip5 s2

Notice the total lack of a loop. If the predicate fails, we simply Skip5. The implementation of sum5 has to change as well:

sum5 :: Num a => Iterator5 a -> a
sum5 (Iterator5 s0 next) =
  loop 0 s0
  where
    loop !total !s1 =
      case next s1 of
        Done5 -> total
        Skip5 s2 -> loop total s2
        Yield5 s2 x -> loop (total + x) s2

Cue the drumroll... and our performance is now:

benchmarking Haskell iterator 5
time                 744.5 μs   (732.1 μs .. 761.7 μs)
                     0.996 R²   (0.994 R² .. 0.998 R²)
mean                 768.6 μs   (757.9 μs .. 780.8 μs)
std dev              38.18 μs   (31.22 μs .. 48.98 μs)
variance introduced by outliers: 41% (moderately inflated)

Whew, we've gone all the way back to recursion-level performance. An astute reader may be wondering why we bothered at all, when lists and vectors got similar performance. A few things:

  • Vectors use this stream fusion technique under the surface
  • Lists use a different fusion framework (build/foldr) which has different tradeoffs than stream fusion, and therefore handles some other cases much worse
  • We can extend this Iterator5 approach to include IO and perform side effects (like reading from a file) between actions, which can't be done with lists

We've ended up with idiomatic Haskell code, involving no unnecessary data types or type classes, leveraging first-class functions, and dealing in immutable data. We've added an optimization specifically tailored for GHC's preferred code structure. And we get relatively simple high level code with great performance.

And finally, along the way, we got to see some places where Rust and Haskell take very different approaches to the same problem. My personal takeaway is that it's pretty astounding that, with its heap-friendly, garbage collected nature, the performance of the Haskell code is competitive with Rust's.

Why are Rust iterators slower than looping?

If you remember, there was a substantial slowdown when going from Rust loops to Rust iterators. This was a bit disappointing to me. I'd like to understand why. Unfortunately, I don't have an answer right now, only a hunch. And that hunch is that the double-inner-loop problem is kicking in. This is just conjecture right now.

I tried implementing a "stream fusion" style implementation in Rust that looks like this:

enum Step<T> {
    Done,
    Skip,
    Yield(T),
}

trait Stream {
    type Item;
    fn next(&mut self) -> Step<Self::Item>;
}

Almost identical to Iterator, except it uses Step instead of Option, allowing the possibility of Skipping. Unfortunately I saw a slowdown there:

benchmarking Rust stream
time                 958.7 μs   (931.2 μs .. 1.007 ms)
                     0.968 R²   (0.925 R² .. 0.999 R²)
mean                 968.0 μs   (944.3 μs .. 1.019 ms)
std dev              124.1 μs   (45.79 μs .. 212.7 μs)
variance introduced by outliers: 82% (severely inflated)

This could be for many reasons, including better optimizations for Option versus my Step enum, or simply my inability to write performant Rust code. (Or that my theory is just dead wrong, and skip only gets in the way.)

Then I decided to try a similar approach, using immutable state values instead of mutable ones, which looked like:

enum StepI<S, T> {
    Done,
    Skip(S),
    Yield(S, T),
}

trait StreamI where Self: Sized {
    type Item;
    fn next(self) -> StepI<Self, Self::Item>;
}

This implementation was a bit faster than the mutable one, most likely due to user error on my part:

benchmarking Rust stream immutable
time                 878.4 μs   (866.9 μs .. 888.9 μs)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 889.1 μs   (878.7 μs .. 906.0 μs)
std dev              44.75 μs   (27.17 μs .. 86.29 μs)
variance introduced by outliers: 41% (moderately inflated)

A big takeaway from me here was the impact of move semantics in Rust. The ability to fully "consume" an input value and prevent it from being used again is the kind of thing I often want to state in Haskell, but am unable to. On the other hand: dealing with moved values feels tricky in Rust, but that's likely just lack of experience speaking.

The final implementation I tried out in Rust was explicitly passing closures around like we do in Haskell (though including mutable variables). I'm not sure I chose the best representation, but ended up with:

struct NoTrait<A> {
    next: Box<(FnMut() -> Option<A>)>,
}

As an example, the range function looked like this:

fn range_nt(mut low: isize, high: isize) -> NoTrait<isize> {
    NoTrait {
        next: Box::new(move || {
            if low >= high {
                None
            } else {
                let res = low;
                low += 1;
                Some(res)
            }
        }),
    }
}

This is pretty close in spirit to how we do things in Haskell, and could be modified to be completely non-mutating if desired with explicit state passing. Anyway, the performance turned out to be (as I'd expected) pretty bad:

benchmarking Rust no trait
time                 4.206 ms   (4.148 ms .. 4.265 ms)
                     0.998 R²   (0.998 R² .. 0.999 R²)
mean                 4.197 ms   (4.155 ms .. 4.237 ms)
std dev              134.4 μs   (109.6 μs .. 166.0 μs)
variance introduced by outliers: 15% (moderately inflated)

GHC is optimized for these kinds of cases, since passing around closures and partially applied functions is standard practice in Haskell. In our Iterator5 implementation, GHC will end up inlining all of the intermediate functions, and then see through all of the closure magic to turn our code into a tight inner loop. This is non-idiomatic Rust, and therefore (AFAICT) the compiler is not performing any such optimizations.

Considering that it has to perform explicit function calls at each step of the iteration, I'd say the fact that this closure-based Rust implementation is only a few times slower than the iterator version is pretty impressive.


I find the contrasts between these two languages to be very informative. I definitely walked away with a better understanding of Rust after performing this analysis. And at a higher level, I think the Haskell ecosystem can learn from Rust's focus on zero-cost abstractions in our library design a bit more.

I'd love to hear from Rustaceans about why the iterator version of the code is slower than the loop. I'd be especially interested if some of the ideas from stream fusion could be used to help that speed difference disappear.

And finally: GCC deserves a shoutout for optimizing the hell out of its code and confusing me with crazy assembly until Chris Done helped me work through it :).

July 10, 2017 01:20 PM

Tweag I/O

R and Haskell: best of both worlds with HaskellR

Sigrid Keydana

A guest post by Sigrid Keydana telling us the backstory behind the very cool trading app notebook she put together... Keras, ggplot2, Haskell and R all in one Jupyter notebook! Post originally appeared here.

Earlier today, I presented at UseR! 2017 about HaskellR: a great piece of software, developed by Tweag I/O, that allows you to seamlessly use R from Haskell.

It was my first UseR!, and it was a great experience; if I had the time, I’d like to write a separate blog post about it, as there were things that did not quite align with my prior expectations… Food for thought, but not the topic of this post. (Mainly this would be about how the academic talks compared to the non-academic ones.)

So, why HaskellR? If you allow me one personal note… For the ex-psychologist, ex-software-developer, ex-database administrator, now “in over my head” data scientist and machine learning/deep learning person that I am (see this post for that story), there has always been some fixed point of interest (ideal, you could say), and that is the elegance of functional programming. It all started with SICP, which I first read as a (Java) programmer and recently read again (partly) when preparing R 4 hackers, a talk focused to a great part on the functional programming features of R.

For a database administrator, unless you’re very lucky, it’s hard to integrate use of a functional programming language into your work. How about deep learning and/or data science? For deep learning, there’s Chris Olah’s wonderful blog post linking deep networks to functional programs, but the reality (of widely used frameworks) looks different: TensorFlow, Keras, PyTorch… it’s mostly Python around there, and whatever the niceties of Python (readability, list comprehensions…) writing Python certainly does not feel like writing FP code at all (much less than writing R!).

So in practice, the connections between data science/machine learning/deep learning and functional programming are scarce. If you look for connections, you will quickly stumble upon the Tweag I/O guys’ work: They’ve not just created HaskellR, they’ve also made Haskell run on Spark, thus enabling Haskell applications to use Spark’s MLLib for large-scale machine learning.

What, then, is HaskellR? It’s a way to seamlessly mix R code and Haskell code, with full interoperability in both directions. You can do that in source files, of course, but you can also quickly play around in the interpreter, appropriately called H (no, I was not thinking of its addictive potential here ;-)), and even use Jupyter notebooks with HaskellR! In fact, that’s what I did in the demos.

If you’re interested in the technicalities of the implementation, you’ll find that documented in great detail on the HaskellR website (and even more, in their IFL 2014 paper), but otherwise I suggest you take a look at the demos from my talk: First, there’s a notebook showing how to use HaskellR, how to get values from Haskell to R and vice versa, and then, there’s the trading app scenario notebook: Suppose you have a trading app written in Haskell – it’s gotta be lightning fast and as bug-free as possible, right? But, how about nice visualizations, time series diagnostics, all kinds of sophisticated statistical and machine learning algorithms… Chances are, someone’s implemented that algorithm in R already! Let’s take ARIMA – one line of code with auto.arima from Rob J. Hyndman’s forecast package! Visualization? ggplot2, of course! And last but not least, an easy way to do deep learning with R’s keras package (interfacing to Python Keras).

Besides the notebooks, you might also want to check out the slides, especially if you’re an R user who hasn’t had much contact with Haskell. Ever wondered why the pipe looks the way it looks, or what the partial and compose functions are doing?

Last but not least, a thousand thanks to the guys over at Tweag I/O, who’ve been incredibly helpful in getting the whole setup to run (the best way to get it up and running on Fedora is using nix, which I didn’t have any prior experience with… just at a second level of parentheses, I think I’d like to know more about nix, the package manager and the OS, now too ;-)). This is really the great thing about open source, the cool stuff people build and how helpful they are! So thanks again, guys – I hope to be doing things “at the interface” of ML/DL and FP more often in the future!

July 10, 2017 12:00 AM

July 08, 2017

Holden Karau

Spark Testing Base new version and g8 template for Spark!

My yak shaving today led me to release a new version of spark-testing-base (0.7.0) with Spark 2.1.1 support and improved Python support, as well as a g8 template for Spark projects, so you don't always have to copy the build file from the last one and tweak it a bit.

On a happy note, RC6 for Spark 2.2 also passed today, so I'll be publishing a new version of spark-testing-base next week after the release is finished.

by Holden Karau at July 08, 2017 02:17 AM

July 07, 2017

Wolfgang Jeltsch

Haskell in Leipzig 2017 seeking contributions

Haskell in Leipzig (HaL) is taking place again from October 26 to October 28, 2017 at HTWK Leipzig. If you have any interesting Haskell-related material to share, please consider submitting an extended abstract.


Haskell is a modern functional programming language that allows rapid development of robust and correct software. It is renowned for its expressive type system, its unique approaches to concurrency and parallelism, and its excellent refactoring capabilities. Haskell is both the playing field of cutting-edge programming language research and a reliable base for commercial software development.

The workshop series Haskell in Leipzig (HaL), now in its 12th year, brings together Haskell developers, Haskell researchers, Haskell enthusiasts, and Haskell beginners to listen to talks, take part in tutorials, join in interesting conversations, and hack together. To support the latter, HaL will include a one-day hackathon this year. The workshop will have a focus on functional reactive programming (FRP) this time, while continuing to be open to all aspects of Haskell. As in the previous year, the workshop will be in English.


Everything related to Haskell is on topic, whether it is about current research, practical applications, interesting ideas off the beaten track, education, or art, and topics may extend to functional programming in general and its connections to other programming paradigms.

Contributions can take the form of

  • talks (about 30 minutes),
  • tutorials (about 90 minutes),
  • demonstrations, artistic performances, or other extraordinary things.

Please submit an abstract that describes the content and form of your presentation, the intended audience, and required previous knowledge. We recommend a length of 2 pages, so that the program committee and the audience get a good idea of your contribution, but this is not a hard requirement.

Please submit your abstract as a PDF document via EasyChair by Friday, August 4, 2017. You will be notified by Friday, August 25, 2017.

Hacking Projects

Projects for the hackathon can be presented during the workshop. A prior submission is not needed for this.

Invited Speaker

  • Ivan Perez, University of Nottingham, UK

Program Committee

  • Edward Amsden, Plow Technologies, USA
  • Heinrich Apfelmus, Germany
  • Jurriaan Hage, Utrecht University, The Netherlands
  • Petra Hofstedt, BTU Cottbus-Senftenberg, Germany
  • Wolfgang Jeltsch, Tallinn University of Technology, Estonia (chair)
  • Andres Löh, Well-Typed LLP, Germany
  • Keiko Nakata, SAP SE, Germany
  • Henrik Nilsson, University of Nottingham, UK
  • Ertuğrul Söylemez, Intelego GmbH, Germany
  • Henning Thielemann, Germany
  • Niki Vazou, University of Maryland, USA
  • Johannes Waldmann, HTWK Leipzig, Germany

Tagged: conference, FRP, functional programming, Haskell

by Wolfgang Jeltsch at July 07, 2017 11:51 PM

Douglas M. Auclair (geophf)

June 2017 1HaskellADay 1Liners

  • June 17th, 2017:
    f :: (a, [a]) -> [a] -> [a]
    f (c, w1) w2 = c:w1 ++ w2

    Define f points-free
    • bazzargh @bazzargh (++).uncurry(:)
      • Felt there must be a nicer way to exploit symmetry of mappend.uncurry(mappend.pure) but can't find it

by geophf at July 07, 2017 12:43 PM

July 06, 2017

Neil Mitchell

HaskellX Bytes talk next Tuesday

I'm talking about "Static Analysis in Haskell" at HaskellX Bytes next Tuesday (11th July), in London UK. Registration is free. The abstract is:

Haskell is a strongly typed programming language, which should be well suited to static analysis - specifically any insights about the program which don't require running the program. Alas, while type systems are becoming increasingly powerful, other forms of static analysis aren't advancing as fast. In this talk we'll give an overview of some of the forms of non-type-based static analysis that do exist, from the practical (GHC warnings, HLint, Weeder) to the research (LiquidHaskell, Catch).

I'm a big fan of static analysis, so this will be part summary, part sales pitch, and part call to arms. Followed by beer.

Clarification: I originally got the day wrong, and the url persists the original incorrect day. The talk is on Tuesday 11th July.

by Neil Mitchell at July 06, 2017 08:09 PM

Joachim Breitner

The Micro Two Body Problem

Inspired by recent PhD comic “Academic Travel” and not-so-recent xkcd comic “Movie Narrative Charts”, I created the following graphics, which visualizes the travels of an academic couple over the course of 10 months (place names anonymized).

Two bodies traveling the world


by Joachim Breitner at July 06, 2017 03:27 PM

July 05, 2017

Functional Jobs

Full-Stack Developer (Clojure) at Privacy Company (Full-time)

About the job

Full time position in our office in The Hague, The Netherlands

Privacy Company helps businesses solve the privacy compliance problem, with a focus on bringing privacy awareness within reach of those who are not experts or lawyers. Our web application offers a very practical approach to reach this goal and has been designed to make privacy management easier, more efficient and more fun.

The development team currently consists of two developers and one designer, and we are looking for a new developer who can help us expand the product further. Given the small size of the team, there are countless opportunities to make an impact on both the product and our internal culture.

Our ideal colleague shares our passion for writing clean and modular code, enjoys (or at least is interested in learning) Functional Programming, and has been writing code for fun for years. Our technology stack of choice is Clojure for the backend and ClojureScript for the frontend (with React). We find that Clojure’s simplicity, immutability and data-first approach results in clearer code that is easy to reason about and to test. In contrast, we're not fans of OOP, XML and Design Patterns.

What we expect from you

  • Experience with Clojure gets you major bonus points.
  • Experience with other functional programming languages (like Haskell, Common Lisp, Erlang, etc.) is always appreciated.
  • You understand practical demands, but still strive to do things the right way.
  • You care about understanding problems at their root, with all the attention and dedication it requires.
  • You are a "full-stack" developer: you like to work on anything from database queries and backend code to fine tuning CSS and building React components.
  • You are familiar with command line tools: you know your way around git, bash/zsh/etc, grep (and perhaps, why not, the occasional perl one-liner).
  • You are comfortable working with a database without an ORM.
  • Formal Computer Science education is not a hard requirement, relevant education can be a plus, but we are more interested in what you have built than what you have learned.

What we offer

  • A challenging, fun environment with lots of autonomy and self-direction
  • No hierarchy or project managers (we prefer self-organising)
  • Flexibility about when and where you work (however this is not a fully remote position, you should spend at least 1 day / week in the office with us)
  • You get to choose if you want a Mac or Linux laptop (or Windows, but why?)
  • Salary commensurate with your experience (32-40 hours/week)
  • Long term commitment is intended

Note to International Applicants: Unfortunately we cannot sponsor visas to the Netherlands; please apply only if you have a valid EU work permit.

Get information on how to apply for this position.

July 05, 2017 01:38 PM

Michael Snoyman

The Spiderman Principle

With great power comes great responsibility

I was recently reminded of a bit of a mantra that I had at LambdaConf this year in discussions, and I decided to share it here. I received a bunch of questions like these (I'd share the originals, but I have a terrible memory):

  • Why is there no tutorial for doing X?
  • Why doesn't a library like Y exist?
  • Why has no one created a mailing list/discussion forum/etc for topic Z?

The answer to all of these is the same: because you haven't done it! And I don't mean that in the plural "you" or figuratively. The one and only reason why things don't get done is because you, personally, individually, have not done them.

This of course isn't literally true. There's a possibility that someone else will step up to the plate first. And there are limited numbers of hours in the day, and no one person can accomplish everything. But this mindset is, in my opinion, the only correct one to adopt if you want things to happen. It's your responsibility to do it; don't wait for others to do it.

You may have some legitimate objections to this philosophy:

  • How can I write a tutorial, I don't understand how to accomplish this?

    • Go ahead and write it as best you can, and ask people to review it. People are usually quite happy to provide corrections and advice.
    • A streamlined way of doing this is to send a pull request to an existing repo holding documentation (e.g., haskell-lang).
    • Worst case scenario: ask questions. Encourage people to write up answers. Volunteer to compose the answers into a coherent document at the end. Even people not willing to participate in writing a full tutorial themselves may be quite happy to answer direct questions, especially knowing their work will be preserved for others.
  • How can I write such a library, it's beyond my capabilities?

    • You'd be surprised about that. Give it a shot. Worst case scenario: it'll be a learning experience and otherwise an epic failure. Best case scenario: you succeed. Either way, it's a win-win situation.
    • Maybe your desired functionality fits into an existing library. Library authors tend to be quite happy to review and accept pull requests, and contributing an extra function can be less intimidating than writing a new library. (But please consider opening an issue first.)
    • And if you're certain you're not up to the task: try to encourage others. You may not succeed. But try to make the case for why this project is useful, interesting, necessary, or whatever other adjectives you believe apply. Motivate people.
  • I'm not a community leader, how can I encourage discussions?

    • There's no such thing as an "official" community leader. There are people with moderator access on some forums or control over certain websites. But that's not what makes someone a leader. If people want to hear what you have to say and join the conversation, you're leading a conversation.
    • Besides, you don't need to be a leader to start a discussion.
    • A slight retraction to all of this: if a topic has already been beaten to death, it's almost certainly not worth rehashing it. Reraising controversial points constantly doesn't help anyone.
  • It doesn't seem like the community agrees on this point, how can I advocate it?

    • Just because many people seem to be advocating X does not mean that it is universally held. There are many reasons why X seems to be the dominant viewpoint:

      • People may be legitimately unaware of alternatives
      • The people who disagree with X all think it's not worth speaking against the "dominant" opinion
      • The people who believe X are simply more passionate about it than those that don't.
    • So what if people disagree? Having healthy technical debate is a good thing. There are at least three outcomes I can see from such a debate:

      • You realize you were wrong
      • People disagreeing with you realize they were wrong
      • Both sides continue with their beliefs, but have a deeper understanding of both their positions and the alternatives
    • But again, try to avoid beating a topic to death

I don't know if people outside the Haskell world experience this as much as we do. But I've certainly seen a strong sentiment of "not being worthy" or some other such idea. It's rubbish. Join the conversation, lead the way, make things happen. The world will be better for it.

July 05, 2017 03:00 AM

July 04, 2017

Douglas M. Auclair (geophf)

June 2017 1HaskellADay Problems and Solutions

by geophf at July 04, 2017 02:29 AM

July 03, 2017

Ken T Takusagawa

[zwhnekne] A generated multiplication table of the icosahedral rotation group

William Rowan Hamilton defined his Icosian Calculus with the following 3 relations on generators (a presentation of a group, a novel idea at the time): i^2=1, k^3=1, (ik)^5=1.

Incidentally, "discovering the Icosian generators" sounds like a phrase out of science fiction.

We computationally generated 60 elements from the relations above by pure algebraic manipulation of symbols, not relating strings of symbols to geometric meanings corresponding to rotations of a regular icosahedron.

The following are the 60 "minimal" elements.  Minimality, our way of choosing a canonical string, first prefers shorter strings, then orders lexicographically with i before k.  The identity is denoted by 1.

1 i k ik ki kk iki ikk kik kki ikik ikki kiki kikk kkik ikiki ikikk ikkik kikik kikki kkiki kkikk ikikik ikikki ikkiki ikkikk kikikk kikkik kkikik ikikikk ikikkik ikkikik kikikki kikkiki kikkikk kkikikk ikikikki ikikkiki ikikkikk ikkikikk kikikkik kikkikik kkikikki ikikikkik ikikkikik ikkikikki kikikkiki kikikkikk kikkikikk kkikikkik ikikikkiki ikikikkikk ikikkikikk ikkikikkik kikkikikki kkikikkiki ikikkikikki ikkikikkiki kikkikikkik ikikkikikkik

We give the entire 60 by 60 multiplication table of these elements at the very bottom.  First, some highlights.

The following are the inverse relations.  Although the group is not commutative, the inverse relations are; that is, the left inverses are always equal to the right inverses.  (This is apparently true of all groups, or more precisely, if the group axioms are (seemingly) weakened to only require left inverses, it can be proved that a left inverse is always a right inverse and vice versa using only the weakened axioms, according to Wikipedia, citing "Algebra" by Serge Lang.)

1*1=1 i*i=1 k*kk=1 ik*kki=1 ki*ikk=1 kk*k=1 iki*ikki=1 ikk*ki=1 kik*kkikk=1 kki*ik=1 ikik*ikikik=1 ikki*iki=1 kiki*ikkikk=1 kikk*kikk=1 kkik*kkik=1 ikiki*kikik=1 ikikk*kikki=1 ikkik*kkiki=1 kikik*ikiki=1 kikki*ikikk=1 kkiki*ikkik=1 kkikk*kik=1 ikikik*ikik=1 ikikki*ikikki=1 ikkiki*ikkiki=1 ikkikk*kiki=1 kikikk*kikkikk=1 kikkik*kkikikk=1 kkikik*ikikikk=1 ikikikk*kkikik=1 ikikkik*kkikikki=1 ikkikik*ikikikki=1 kikikki*ikikkikk=1 kikkiki*ikkikikk=1 kikkikk*kikikk=1 kkikikk*kikkik=1 ikikikki*ikkikik=1 ikikkiki*ikkikikki=1 ikikkikk*kikikki=1 ikkikikk*kikkiki=1 kikikkik*ikikikkiki=1 kikkikik*ikikikkikk=1 kkikikki*ikikkik=1 ikikikkik*ikikikkik=1 ikikkikik*ikikkikik=1 ikkikikki*ikikkiki=1 kikikkiki*kikikkiki=1 kikikkikk*kikikkikk=1 kikkikikk*kikkikikk=1 kkikikkik*kkikikkik=1 ikikikkiki*kikikkik=1 ikikikkikk*kikkikik=1 ikikkikikk*kikkikikki=1 ikkikikkik*kkikikkiki=1 kikkikikki*ikikkikikk=1 kkikikkiki*ikkikikkik=1 ikikkikikki*ikikkikikki=1 ikkikikkiki*ikkikikkiki=1 kikkikikkik*kikkikikkik=1 ikikkikikkik*ikikkikikkik=1
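For the curious, the standard argument mentioned above can be sketched in a few lines.  Assume only the weakened axioms: a left identity e with ex = x, and a left inverse x' with x'x = e, for every x.  Then for any a:

a a' = e (a a') = ((a')' a') (a a') = (a')' ((a' a) a') = (a')' (e a') = (a')' a' = e

so every left inverse is also a right inverse, which is why the table of inverses above is symmetric.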

These are the 16 square roots of unity (identity):

1 i kikk kkik ikikki ikkiki ikikikkik ikikkikik kikikkiki kikikkikk kikkikikk kkikikkik ikikkikikki ikkikikkiki kikkikikkik ikikkikikkik

The algorithm to find the 60 minimal elements was as follows: start with {1,i,k} and multiply all pairs and simplify each product as much as possible.  Collect the distinct products into a new set and multiply all pairs again, simplify, and repeat until the set converges onto a fixed point.  If the simplification technique is not powerful enough, it does not converge; instead the sets grow in size without bound.  We found a simplification technique ("repeat_best" in the source code) that converged to a set of 62 elements.
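The closure step can be sketched as follows.  This is a toy version only: simplify here applies just the three defining relations left-to-right, which, as noted, is far too weak to reach a fixed point on its own; the names rules, simplify, and closureStep are hypothetical, not taken from the actual source ("repeat_best").

```haskell
-- Toy sketch of the generate-and-simplify loop.  The real "repeat_best"
-- simplifier is much stronger; this one only erases ii, kkk and (ik)^5
-- wherever they occur.
import Data.List (isPrefixOf, nub, sort)

-- Rewrite rules from the relations i^2 = 1, k^3 = 1, (ik)^5 = 1.
rules :: [(String, String)]
rules = [("ii", ""), ("kkk", ""), ("ikikikikik", "")]

-- Apply the first matching rule at the leftmost position, if any.
rewriteOnce :: String -> Maybe String
rewriteOnce [] = Nothing
rewriteOnce t@(c:cs) =
  case [ rhs ++ drop (length lhs) t
       | (lhs, rhs) <- rules, lhs `isPrefixOf` t ] of
    (r:_) -> Just r
    []    -> (c :) <$> rewriteOnce cs

-- Rewrite to a normal form (terminates: every rule shortens the string).
simplify :: String -> String
simplify s = maybe s simplify (rewriteOnce s)

-- One closure step: multiply all pairs (concatenation) and simplify.
-- Iterating this from ["", "i", "k"] is the loop described above.
closureStep :: [String] -> [String]
closureStep xs = sort (nub [ simplify (a ++ b) | a <- xs, b <- xs ])
```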

In the course of the calculations, we found the following 45 identities useful.  These were derived by starting with the third relation, ikikikikik=1, then left- or right-multiplying successively by k or i, gradually making the left-hand side smaller by annihilating i's and k's by the first and second relations.

ikikikikik=1 ikikikikikk=k ikikikiki=kk ikikikik=kki ikikikikk=kkik ikikiki=kkikk ikikik=kkikki ikikikk=kkikkik ikiki=kkikkikk ikik=kkikkikki ikikk=kkikkikkik iki=kkikkikkikk ik=kkikkikkikki ikk=kkikkikkikkik i=kkikkikkikkikk 1=kkikkikkikkikki 1=ikkikkikkikkikk kk=ikkikkikkikkik 1=kikkikkikkikkik k=ikkikkikkikki kk=kikkikkikkikki ki=ikkikkikkikk kki=kikkikkikkikk kikk=ikkikkikkik kkikk=kikkikkikkik kik=ikkikkikki kkik=kikkikkikki kiki=ikkikkikk kkiki=kikkikkikk kikikk=ikkikkik kkikikk=kikkikkik kikik=ikkikki kkikik=kikkikki kikiki=ikkikk kkikiki=kikkikk kikikikk=ikkik kkikikikk=kikkik kikikik=ikki kkikikik=kikki kikikiki=ikk kkikikiki=kikk kikikikikk=ik kkikikikikk=kik kikikikik=i kkikikikik=ki

One of the steps in developing simplification techniques was working on just the family of strings (kikki)^n, trying to simplify that infinite set down to a finite set.

We pruned the set of 62 down to 60 by applying a much more powerful, much slower simplification technique.  We applied up to N substitutions of the 45 identities above.  (This is roughly automatic theorem proving.)  The 45 identities can be applied in either direction, yielding 86 possibilities, not allowing a substitution from an empty string.  As a heuristic, we also excluded substitutions in which the left hand side was a single or two-character string.  This left 75 possibilities.  In the worst case, the running time is O(75^N), but on the average not that bad because most substitutions are not possible.  (Actually, the worst case running time is worse than O(75^N) because substitutions can make the string longer, providing more possible locations for the next substitution.)  N=6 was sufficient to find the 2 redundant elements among the 62.  We tried all the way up to N=9 to see if there were any more minimal versions of the 60 strings discovered at N=6, but there were none.  Source code in Haskell here.  (Incidentally, we originally started implementing this in Perl because Perl's regular expressions are nice for string matching and substitution, but we switched over to Haskell when the code became complicated.)
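The brute-force equivalence check can be sketched like so.  This is a hypothetical reconstruction, not the actual source: equalWithin does a breadth-first search, applying each identity in either direction (never substituting from an empty string) up to a depth bound N.

```haskell
-- Sketch of a depth-bounded word-problem search: two strings are deemed
-- equal if one rewrites to the other within n bidirectional substitutions.
import Data.List (inits, isPrefixOf, tails)
import qualified Data.Set as Set

-- Substitute a rule, in either direction, at every position.
step :: [(String, String)] -> String -> [String]
step rules s =
  [ pre ++ rhs ++ drop (length lhs) suf
  | (pre, suf) <- zip (inits s) (tails s)
  , (l, r) <- rules
  , (lhs, rhs) <- [(l, r), (r, l)]
  , not (null lhs)          -- disallow substituting from an empty string
  , lhs `isPrefixOf` suf
  ]

-- Breadth-first search from a, succeeding if b is reached within n steps.
equalWithin :: Int -> [(String, String)] -> String -> String -> Bool
equalWithin n rules a b = go n (Set.singleton a)
  where
    go d seen
      | b `Set.member` seen = True
      | d <= 0 = False
      | otherwise =
          let new = Set.fromList (concatMap (step rules) (Set.toList seen))
                      `Set.difference` seen
          in not (Set.null new) && go (d - 1) (seen `Set.union` new)
```

As in the post, the frontier can grow at each depth, so the worst-case cost is exponential in the depth bound.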

The task of proving two strings equal given a set of allowed substitutions is known as the "word problem".  This problem is in general Turing-undecidable.

Incidentally, if we had allowed ourselves to associate with each string the geometric meaning of a sequence of rotations of an icosahedron, it would have been easy to discover that two strings were equivalent, because they would result in the same final orientation of the icosahedron.

One wonders how Hamilton proved that the order of his icosian calculus was actually 60, and how he proved it was isomorphic to the rotations of an icosahedron.  Incidentally, the calculations presented here do not prove that the order of the group generated by the 3 relations above is 60; they only prove that 60 is an upper bound.

Given the multiplication table for a group, we can construct a field.  Division might require linear algebra on 60x60 matrices.

The following is the multiplication table of the 60 elements.  We present it in this form, as straight text, not as an HTML table, because this is likely the easiest to parse by machine, and the only scenarios I can conceive of such a large table being useful all require first loading it into a machine.

1*1=1 1*i=i 1*k=k 1*ik=ik 1*ki=ki 1*kk=kk 1*iki=iki 1*ikk=ikk 1*kik=kik 1*kki=kki 1*ikik=ikik 1*ikki=ikki 1*kiki=kiki 1*kikk=kikk 1*kkik=kkik 1*ikiki=ikiki 1*ikikk=ikikk 1*ikkik=ikkik 1*kikik=kikik 1*kikki=kikki 1*kkiki=kkiki 1*kkikk=kkikk 1*ikikik=ikikik 1*ikikki=ikikki 1*ikkiki=ikkiki 1*ikkikk=ikkikk 1*kikikk=kikikk 1*kikkik=kikkik 1*kkikik=kkikik 1*ikikikk=ikikikk 1*ikikkik=ikikkik 1*ikkikik=ikkikik 1*kikikki=kikikki 1*kikkiki=kikkiki 1*kikkikk=kikkikk 1*kkikikk=kkikikk 1*ikikikki=ikikikki 1*ikikkiki=ikikkiki 1*ikikkikk=ikikkikk 1*ikkikikk=ikkikikk 1*kikikkik=kikikkik 1*kikkikik=kikkikik 1*kkikikki=kkikikki 1*ikikikkik=ikikikkik 1*ikikkikik=ikikkikik 1*ikkikikki=ikkikikki 1*kikikkiki=kikikkiki 1*kikikkikk=kikikkikk 1*kikkikikk=kikkikikk 1*kkikikkik=kkikikkik 1*ikikikkiki=ikikikkiki 1*ikikikkikk=ikikikkikk 1*ikikkikikk=ikikkikikk 1*ikkikikkik=ikkikikkik 1*kikkikikki=kikkikikki 1*kkikikkiki=kkikikkiki 1*ikikkikikki=ikikkikikki 1*ikkikikkiki=ikkikikkiki 1*kikkikikkik=kikkikikkik 1*ikikkikikkik=ikikkikikkik i*1=i i*i=1 i*k=ik i*ik=k i*ki=iki i*kk=ikk i*iki=ki i*ikk=kk i*kik=ikik i*kki=ikki i*ikik=kik i*ikki=kki i*kiki=ikiki i*kikk=ikikk i*kkik=ikkik i*ikiki=kiki i*ikikk=kikk i*ikkik=kkik i*kikik=ikikik i*kikki=ikikki i*kkiki=ikkiki i*kkikk=ikkikk i*ikikik=kikik i*ikikki=kikki i*ikkiki=kkiki i*ikkikk=kkikk i*kikikk=ikikikk i*kikkik=ikikkik i*kkikik=ikkikik i*ikikikk=kikikk i*ikikkik=kikkik i*ikkikik=kkikik i*kikikki=ikikikki i*kikkiki=ikikkiki i*kikkikk=ikikkikk i*kkikikk=ikkikikk i*ikikikki=kikikki i*ikikkiki=kikkiki i*ikikkikk=kikkikk i*ikkikikk=kkikikk i*kikikkik=ikikikkik i*kikkikik=ikikkikik i*kkikikki=ikkikikki i*ikikikkik=kikikkik i*ikikkikik=kikkikik i*ikkikikki=kkikikki i*kikikkiki=ikikikkiki i*kikikkikk=ikikikkikk i*kikkikikk=ikikkikikk i*kkikikkik=ikkikikkik i*ikikikkiki=kikikkiki i*ikikikkikk=kikikkikk i*ikikkikikk=kikkikikk i*ikkikikkik=kkikikkik i*kikkikikki=ikikkikikki i*kkikikkiki=ikkikikkiki i*ikikkikikki=kikkikikki i*ikkikikkiki=kkikikkiki 
i*kikkikikkik=ikikkikikkik i*ikikkikikkik=kikkikikkik k*1=k k*i=ki k*k=kk k*ik=kik k*ki=kki k*kk=1 k*iki=kiki k*ikk=kikk k*kik=kkik k*kki=i k*ikik=kikik k*ikki=kikki k*kiki=kkiki k*kikk=kkikk k*kkik=ik k*ikiki=ikkikk k*ikikk=kikikk k*ikkik=kikkik k*kikik=kkikik k*kikki=ikikik k*kkiki=iki k*kkikk=ikk k*ikikik=ikki k*ikikki=kikikki k*ikkiki=kikkiki k*ikkikk=kikkikk k*kikikk=kkikikk k*kikkik=ikikikk k*kkikik=ikik k*ikikikk=ikkik k*ikikkik=kikikkik k*ikkikik=kikkikik k*kikikki=kkikikki k*kikkiki=ikikikki k*kikkikk=ikiki k*kkikikk=ikikk k*ikikikki=ikkiki k*ikikkiki=kikikkiki k*ikikkikk=kikikkikk k*ikkikikk=kikkikikk k*kikikkik=kkikikkik k*kikkikik=ikikikkik k*kkikikki=ikikki k*ikikikkik=ikkikik k*ikikkikik=ikkikikki k*ikkikikki=kikkikikki k*kikikkiki=kkikikkiki k*kikikkikk=ikikikkiki k*kikkikikk=ikikikkikk k*kkikikkik=ikikkik k*ikikikkiki=ikikkikk k*ikikikkikk=ikkikikk k*ikikkikikk=ikkikikkik k*ikkikikkik=kikkikikkik k*kikkikikki=ikikkikik k*kkikikkiki=ikikkiki k*ikikkikikki=ikkikikkiki k*ikkikikkiki=ikikkikikkik k*kikkikikkik=ikikkikikk k*ikikkikikkik=ikikkikikki ik*1=ik ik*i=iki ik*k=ikk ik*ik=ikik ik*ki=ikki ik*kk=i ik*iki=ikiki ik*ikk=ikikk ik*kik=ikkik ik*kki=1 ik*ikik=ikikik ik*ikki=ikikki ik*kiki=ikkiki ik*kikk=ikkikk ik*kkik=k ik*ikiki=kkikk ik*ikikk=ikikikk ik*ikkik=ikikkik ik*kikik=ikkikik ik*kikki=kikik ik*kkiki=ki ik*kkikk=kk ik*ikikik=kki ik*ikikki=ikikikki ik*ikkiki=ikikkiki ik*ikkikk=ikikkikk ik*kikikk=ikkikikk ik*kikkik=kikikk ik*kkikik=kik ik*ikikikk=kkik ik*ikikkik=ikikikkik ik*ikkikik=ikikkikik ik*kikikki=ikkikikki ik*kikkiki=kikikki ik*kikkikk=kiki ik*kkikikk=kikk ik*ikikikki=kkiki ik*ikikkiki=ikikikkiki ik*ikikkikk=ikikikkikk ik*ikkikikk=ikikkikikk ik*kikikkik=ikkikikkik ik*kikkikik=kikikkik ik*kkikikki=kikki ik*ikikikkik=kkikik ik*ikikkikik=kkikikki ik*ikkikikki=ikikkikikki ik*kikikkiki=ikkikikkiki ik*kikikkikk=kikikkiki ik*kikkikikk=kikikkikk ik*kkikikkik=kikkik ik*ikikikkiki=kikkikk ik*ikikikkikk=kkikikk ik*ikikkikikk=kkikikkik 
ik*ikkikikkik=ikikkikikkik ik*kikkikikki=kikkikik ik*kkikikkiki=kikkiki ik*ikikkikikki=kkikikkiki ik*ikkikikkiki=kikkikikkik ik*kikkikikkik=kikkikikk ik*ikikkikikkik=kikkikikki ki*1=ki ki*i=k ki*k=kik ki*ik=kk ki*ki=kiki ki*kk=kikk ki*iki=kki ki*ikk=1 ki*kik=kikik ki*kki=kikki ki*ikik=kkik ki*ikki=i ki*kiki=ikkikk ki*kikk=kikikk ki*kkik=kikkik ki*ikiki=kkiki ki*ikikk=kkikk ki*ikkik=ik ki*kikik=ikki ki*kikki=kikikki ki*kkiki=kikkiki ki*kkikk=kikkikk ki*ikikik=kkikik ki*ikikki=ikikik ki*ikkiki=iki ki*ikkikk=ikk ki*kikikk=ikkik ki*kikkik=kikikkik ki*kkikik=kikkikik ki*ikikikk=kkikikk ki*ikikkik=ikikikk ki*ikkikik=ikik ki*kikikki=ikkiki ki*kikkiki=kikikkiki ki*kikkikk=kikikkikk ki*kkikikk=kikkikikk ki*ikikikki=kkikikki ki*ikikkiki=ikikikki ki*ikikkikk=ikiki ki*ikkikikk=ikikk ki*kikikkik=ikkikik ki*kikkikik=ikkikikki ki*kkikikki=kikkikikki ki*ikikikkik=kkikikkik ki*ikikkikik=ikikikkik ki*ikkikikki=ikikki ki*kikikkiki=ikikkikk ki*kikikkikk=ikkikikk ki*kikkikikk=ikkikikkik ki*kkikikkik=kikkikikkik ki*ikikikkiki=kkikikkiki ki*ikikikkikk=ikikikkiki ki*ikikkikikk=ikikikkikk ki*ikkikikkik=ikikkik ki*kikkikikki=ikkikikkiki ki*kkikikkiki=ikikkikikkik ki*ikikkikikki=ikikkikik ki*ikkikikkiki=ikikkiki ki*kikkikikkik=ikikkikikki ki*ikikkikikkik=ikikkikikk kk*1=kk kk*i=kki kk*k=1 kk*ik=kkik kk*ki=i kk*kk=k kk*iki=kkiki kk*ikk=kkikk kk*kik=ik kk*kki=ki kk*ikik=kkikik kk*ikki=ikikik kk*kiki=iki kk*kikk=ikk kk*kkik=kik kk*ikiki=kikkikk kk*ikikk=kkikikk kk*ikkik=ikikikk kk*kikik=ikik kk*kikki=ikki kk*kkiki=kiki kk*kkikk=kikk kk*ikikik=kikki kk*ikikki=kkikikki kk*ikkiki=ikikikki kk*ikkikk=ikiki kk*kikikk=ikikk kk*kikkik=ikkik kk*kkikik=kikik kk*ikikikk=kikkik kk*ikikkik=kkikikkik kk*ikkikik=ikikikkik kk*kikikki=ikikki kk*kikkiki=ikkiki kk*kikkikk=ikkikk kk*kkikikk=kikikk kk*ikikikki=kikkiki kk*ikikkiki=kkikikkiki kk*ikikkikk=ikikikkiki kk*ikkikikk=ikikikkikk kk*kikikkik=ikikkik kk*kikkikik=ikkikik kk*kkikikki=kikikki kk*ikikikkik=kikkikik kk*ikikkikik=kikkikikki kk*ikkikikki=ikikkikik 
kk*kikikkiki=ikikkiki kk*kikikkikk=ikikkikk kk*kikkikikk=ikkikikk kk*kkikikkik=kikikkik kk*ikikikkiki=kikikkikk kk*ikikikkikk=kikkikikk kk*ikikkikikk=kikkikikkik kk*ikkikikkik=ikikkikikk kk*kikkikikki=ikkikikki kk*kkikikkiki=kikikkiki kk*ikikkikikki=ikikkikikkik kk*ikkikikkiki=ikikkikikki kk*kikkikikkik=ikkikikkik kk*ikikkikikkik=ikkikikkiki iki*1=iki iki*i=ik iki*k=ikik iki*ik=ikk iki*ki=ikiki iki*kk=ikikk iki*iki=ikki iki*ikk=i iki*kik=ikikik iki*kki=ikikki iki*ikik=ikkik iki*ikki=1 iki*kiki=kkikk iki*kikk=ikikikk iki*kkik=ikikkik iki*ikiki=ikkiki iki*ikikk=ikkikk iki*ikkik=k iki*kikik=kki iki*kikki=ikikikki iki*kkiki=ikikkiki iki*kkikk=ikikkikk iki*ikikik=ikkikik iki*ikikki=kikik iki*ikkiki=ki iki*ikkikk=kk iki*kikikk=kkik iki*kikkik=ikikikkik iki*kkikik=ikikkikik iki*ikikikk=ikkikikk iki*ikikkik=kikikk iki*ikkikik=kik iki*kikikki=kkiki iki*kikkiki=ikikikkiki iki*kikkikk=ikikikkikk iki*kkikikk=ikikkikikk iki*ikikikki=ikkikikki iki*ikikkiki=kikikki iki*ikikkikk=kiki iki*ikkikikk=kikk iki*kikikkik=kkikik iki*kikkikik=kkikikki iki*kkikikki=ikikkikikki iki*ikikikkik=ikkikikkik iki*ikikkikik=kikikkik iki*ikkikikki=kikki iki*kikikkiki=kikkikk iki*kikikkikk=kkikikk iki*kikkikikk=kkikikkik iki*kkikikkik=ikikkikikkik iki*ikikikkiki=ikkikikkiki iki*ikikikkikk=kikikkiki iki*ikikkikikk=kikikkikk iki*ikkikikkik=kikkik iki*kikkikikki=kkikikkiki iki*kkikikkiki=kikkikikkik iki*ikikkikikki=kikkikik iki*ikkikikkiki=kikkiki iki*kikkikikkik=kikkikikki iki*ikikkikikkik=kikkikikk ikk*1=ikk ikk*i=ikki ikk*k=i ikk*ik=ikkik ikk*ki=1 ikk*kk=ik ikk*iki=ikkiki ikk*ikk=ikkikk ikk*kik=k ikk*kki=iki ikk*ikik=ikkikik ikk*ikki=kikik ikk*kiki=ki ikk*kikk=kk ikk*kkik=ikik ikk*ikiki=ikikkikk ikk*ikikk=ikkikikk ikk*ikkik=kikikk ikk*kikik=kik ikk*kikki=kki ikk*kkiki=ikiki ikk*kkikk=ikikk ikk*ikikik=ikikki ikk*ikikki=ikkikikki ikk*ikkiki=kikikki ikk*ikkikk=kiki ikk*kikikk=kikk ikk*kikkik=kkik ikk*kkikik=ikikik ikk*ikikikk=ikikkik ikk*ikikkik=ikkikikkik ikk*ikkikik=kikikkik ikk*kikikki=kikki 
ikk*kikkiki=kkiki ikk*kikkikk=kkikk ikk*kkikikk=ikikikk ikk*ikikikki=ikikkiki ikk*ikikkiki=ikkikikkiki ikk*ikikkikk=kikikkiki ikk*ikkikikk=kikikkikk ikk*kikikkik=kikkik ikk*kikkikik=kkikik ikk*kkikikki=ikikikki ikk*ikikikkik=ikikkikik ikk*ikikkikik=ikikkikikki ikk*ikkikikki=kikkikik ikk*kikikkiki=kikkiki ikk*kikikkikk=kikkikk ikk*kikkikikk=kkikikk ikk*kkikikkik=ikikikkik ikk*ikikikkiki=ikikikkikk ikk*ikikikkikk=ikikkikikk ikk*ikikkikikk=ikikkikikkik ikk*ikkikikkik=kikkikikk ikk*kikkikikki=kkikikki ikk*kkikikkiki=ikikikkiki ikk*ikikkikikki=kikkikikkik ikk*ikkikikkiki=kikkikikki ikk*kikkikikkik=kkikikkik ikk*ikikkikikkik=kkikikkiki kik*1=kik kik*i=kiki kik*k=kikk kik*ik=kikik kik*ki=kikki kik*kk=ki kik*iki=ikkikk kik*ikk=kikikk kik*kik=kikkik kik*kki=k kik*ikik=ikki kik*ikki=kikikki kik*kiki=kikkiki kik*kikk=kikkikk kik*kkik=kk kik*ikiki=ikk kik*ikikk=ikkik kik*ikkik=kikikkik kik*kikik=kikkikik kik*kikki=kkikik kik*kkiki=kki kik*kkikk=1 kik*ikikik=i kik*ikikki=ikkiki kik*ikkiki=kikikkiki kik*ikkikk=kikikkikk kik*kikikk=kikkikikk kik*kikkik=kkikikk kik*kkikik=kkik kik*ikikikk=ik kik*ikikkik=ikkikik kik*ikkikik=ikkikikki kik*kikikki=kikkikikki kik*kikkiki=kkikikki kik*kikkikk=kkiki kik*kkikikk=kkikk kik*ikikikki=iki kik*ikikkiki=ikikkikk kik*ikikkikk=ikkikikk kik*ikkikikk=ikkikikkik kik*kikikkik=kikkikikkik kik*kikkikik=kkikikkik kik*kkikikki=ikikik kik*ikikikkik=ikik kik*ikikkikik=ikikki kik*ikkikikki=ikkikikkiki kik*kikikkiki=ikikkikikkik kik*kikikkikk=kkikikkiki kik*kikkikikk=ikikikkiki kik*kkikikkik=ikikikk kik*ikikikkiki=ikiki kik*ikikikkikk=ikikk kik*ikikkikikk=ikikkik kik*ikkikikkik=ikikkikikki kik*kikkikikki=ikikikkik kik*kkikikkiki=ikikikki kik*ikikkikikki=ikikkiki kik*ikkikikkiki=ikikkikikk kik*kikkikikkik=ikikikkikk kik*ikikkikikkik=ikikkikik kki*1=kki kki*i=kk kki*k=kkik kki*ik=1 kki*ki=kkiki kki*kk=kkikk kki*iki=i kki*ikk=k kki*kik=kkikik kki*kki=ikikik kki*ikik=ik kki*ikki=ki kki*kiki=kikkikk kki*kikk=kkikikk kki*kkik=ikikikk kki*ikiki=iki kki*ikikk=ikk 
kki*ikkik=kik kki*kikik=kikki kki*kikki=kkikikki kki*kkiki=ikikikki kki*kkikk=ikiki kki*ikikik=ikik kki*ikikki=ikki kki*ikkiki=kiki kki*ikkikk=kikk kki*kikikk=kikkik kki*kikkik=kkikikkik kki*kkikik=ikikikkik kki*ikikikk=ikikk kki*ikikkik=ikkik kki*ikkikik=kikik kki*kikikki=kikkiki kki*kikkiki=kkikikkiki kki*kikkikk=ikikikkiki kki*kkikikk=ikikikkikk kki*ikikikki=ikikki kki*ikikkiki=ikkiki kki*ikikkikk=ikkikk kki*ikkikikk=kikikk kki*kikikkik=kikkikik kki*kikkikik=kikkikikki kki*kkikikki=ikikkikik kki*ikikikkik=ikikkik kki*ikikkikik=ikkikik kki*ikkikikki=kikikki kki*kikikkiki=kikikkikk kki*kikikkikk=kikkikikk kki*kikkikikk=kikkikikkik kki*kkikikkik=ikikkikikk kki*ikikikkiki=ikikkiki kki*ikikikkikk=ikikkikk kki*ikikkikikk=ikkikikk kki*ikkikikkik=kikikkik kki*kikkikikki=ikikkikikkik kki*kkikikkiki=ikikkikikki kki*ikikkikikki=ikkikikki kki*ikkikikkiki=kikikkiki kki*kikkikikkik=ikkikikkiki kki*ikikkikikkik=ikkikikkik ikik*1=ikik ikik*i=ikiki ikik*k=ikikk ikik*ik=ikikik ikik*ki=ikikki ikik*kk=iki ikik*iki=kkikk ikik*ikk=ikikikk ikik*kik=ikikkik ikik*kki=ik ikik*ikik=kki ikik*ikki=ikikikki ikik*kiki=ikikkiki ikik*kikk=ikikkikk ikik*kkik=ikk ikik*ikiki=kk ikik*ikikk=kkik ikik*ikkik=ikikikkik ikik*kikik=ikikkikik ikik*kikki=ikkikik ikik*kkiki=ikki ikik*kkikk=i ikik*ikikik=1 ikik*ikikki=kkiki ikik*ikkiki=ikikikkiki ikik*ikkikk=ikikikkikk ikik*kikikk=ikikkikikk ikik*kikkik=ikkikikk ikik*kkikik=ikkik ikik*ikikikk=k ikik*ikikkik=kkikik ikik*ikkikik=kkikikki ikik*kikikki=ikikkikikki ikik*kikkiki=ikkikikki ikik*kikkikk=ikkiki ikik*kkikikk=ikkikk ikik*ikikikki=ki ikik*ikikkiki=kikkikk ikik*ikikkikk=kkikikk ikik*ikkikikk=kkikikkik ikik*kikikkik=ikikkikikkik ikik*kikkikik=ikkikikkik ikik*kkikikki=kikik ikik*ikikikkik=kik ikik*ikikkikik=kikki ikik*ikkikikki=kkikikkiki ikik*kikikkiki=kikkikikkik ikik*kikikkikk=ikkikikkiki ikik*kikkikikk=kikikkiki ikik*kkikikkik=kikikk ikik*ikikikkiki=kiki ikik*ikikikkikk=kikk ikik*ikikkikikk=kikkik ikik*ikkikikkik=kikkikikki ikik*kikkikikki=kikikkik 
ikik*kkikikkiki=kikikki ikik*ikikkikikki=kikkiki ikik*ikkikikkiki=kikkikikk ikik*kikkikikkik=kikikkikk ikik*ikikkikikkik=kikkikik ikki*1=ikki ikki*i=ikk ikki*k=ikkik ikki*ik=i ikki*ki=ikkiki ikki*kk=ikkikk ikki*iki=1 ikki*ikk=ik ikki*kik=ikkikik ikki*kki=kikik ikki*ikik=k ikki*ikki=iki ikki*kiki=ikikkikk ikki*kikk=ikkikikk ikki*kkik=kikikk ikki*ikiki=ki ikki*ikikk=kk ikki*ikkik=ikik ikki*kikik=ikikki ikki*kikki=ikkikikki ikki*kkiki=kikikki ikki*kkikk=kiki ikki*ikikik=kik ikki*ikikki=kki ikki*ikkiki=ikiki ikki*ikkikk=ikikk ikki*kikikk=ikikkik ikki*kikkik=ikkikikkik ikki*kkikik=kikikkik ikki*ikikikk=kikk ikki*ikikkik=kkik ikki*ikkikik=ikikik ikki*kikikki=ikikkiki ikki*kikkiki=ikkikikkiki ikki*kikkikk=kikikkiki ikki*kkikikk=kikikkikk ikki*ikikikki=kikki ikki*ikikkiki=kkiki ikki*ikikkikk=kkikk ikki*ikkikikk=ikikikk ikki*kikikkik=ikikkikik ikki*kikkikik=ikikkikikki ikki*kkikikki=kikkikik ikki*ikikikkik=kikkik ikki*ikikkikik=kkikik ikki*ikkikikki=ikikikki ikki*kikikkiki=ikikikkikk ikki*kikikkikk=ikikkikikk ikki*kikkikikk=ikikkikikkik ikki*kkikikkik=kikkikikk ikki*ikikikkiki=kikkiki ikki*ikikikkikk=kikkikk ikki*ikikkikikk=kkikikk ikki*ikkikikkik=ikikikkik ikki*kikkikikki=kikkikikkik ikki*kkikikkiki=kikkikikki ikki*ikikkikikki=kkikikki ikki*ikkikikkiki=ikikikkiki ikki*kikkikikkik=kkikikkiki ikki*ikikkikikkik=kkikikkik kiki*1=kiki kiki*i=kik kiki*k=kikik kiki*ik=kikk kiki*ki=ikkikk kiki*kk=kikikk kiki*iki=kikki kiki*ikk=ki kiki*kik=ikki kiki*kki=kikikki kiki*ikik=kikkik kiki*ikki=k kiki*kiki=ikk kiki*kikk=ikkik kiki*kkik=kikikkik kiki*ikiki=kikkiki kiki*ikikk=kikkikk kiki*ikkik=kk kiki*kikik=i kiki*kikki=ikkiki kiki*kkiki=kikikkiki kiki*kkikk=kikikkikk kiki*ikikik=kikkikik kiki*ikikki=kkikik kiki*ikkiki=kki kiki*ikkikk=1 kiki*kikikk=ik kiki*kikkik=ikkikik kiki*kkikik=ikkikikki kiki*ikikikk=kikkikikk kiki*ikikkik=kkikikk kiki*ikkikik=kkik kiki*kikikki=iki kiki*kikkiki=ikikkikk kiki*kikkikk=ikkikikk kiki*kkikikk=ikkikikkik kiki*ikikikki=kikkikikki kiki*ikikkiki=kkikikki 
kiki*ikikkikk=kkiki kiki*ikkikikk=kkikk kiki*kikikkik=ikik kiki*kikkikik=ikikki kiki*kkikikki=ikkikikkiki kiki*ikikikkik=kikkikikkik kiki*ikikkikik=kkikikkik kiki*ikkikikki=ikikik kiki*kikikkiki=ikiki kiki*kikikkikk=ikikk kiki*kikkikikk=ikikkik kiki*kkikikkik=ikikkikikki kiki*ikikikkiki=ikikkikikkik kiki*ikikikkikk=kkikikkiki kiki*ikikkikikk=ikikikkiki kiki*ikkikikkik=ikikikk kiki*kikkikikki=ikikkiki kiki*kkikikkiki=ikikkikikk kiki*ikikkikikki=ikikikkik kiki*ikkikikkiki=ikikikki kiki*kikkikikkik=ikikkikik kiki*ikikkikikkik=ikikikkikk kikk*1=kikk kikk*i=kikki kikk*k=ki kikk*ik=kikkik kikk*ki=k kikk*kk=kik kikk*iki=kikkiki kikk*ikk=kikkikk kikk*kik=kk kikk*kki=kiki kikk*ikik=kikkikik kikk*ikki=kkikik kikk*kiki=kki kikk*kikk=1 kikk*kkik=kikik kikk*ikiki=kikikkikk kikk*ikikk=kikkikikk kikk*ikkik=kkikikk kikk*kikik=kkik kikk*kikki=i kikk*kkiki=ikkikk kikk*kkikk=kikikk kikk*ikikik=kikikki kikk*ikikki=kikkikikki kikk*ikkiki=kkikikki kikk*ikkikk=kkiki kikk*kikikk=kkikk kikk*kikkik=ik kikk*kkikik=ikki kikk*ikikikk=kikikkik kikk*ikikkik=kikkikikkik kikk*ikkikik=kkikikkik kikk*kikikki=ikikik kikk*kikkiki=iki kikk*kikkikk=ikk kikk*kkikikk=ikkik kikk*ikikikki=kikikkiki kikk*ikikkiki=ikikkikikkik kikk*ikikkikk=kkikikkiki kikk*ikkikikk=ikikikkiki kikk*kikikkik=ikikikk kikk*kikkikik=ikik kikk*kkikikki=ikkiki kikk*ikikikkik=ikkikikki kikk*ikikkikik=ikkikikkiki kikk*ikkikikki=ikikikkik kikk*kikikkiki=ikikikki kikk*kikikkikk=ikiki kikk*kikkikikk=ikikk kikk*kkikikkik=ikkikik kikk*ikikikkiki=ikkikikk kikk*ikikikkikk=ikkikikkik kikk*ikikkikikk=ikikkikikki kikk*ikkikikkik=ikikikkikk kikk*kikkikikki=ikikki kikk*kkikikkiki=ikikkikk kikk*ikikkikikki=ikikkikikk kikk*ikkikikkiki=ikikkikik kikk*kikkikikkik=ikikkik kikk*ikikkikikkik=ikikkiki kkik*1=kkik kkik*i=kkiki kkik*k=kkikk kkik*ik=kkikik kkik*ki=ikikik kkik*kk=kki kkik*iki=kikkikk kkik*ikk=kkikikk kkik*kik=ikikikk kkik*kki=kk kkik*ikik=kikki kkik*ikki=kkikikki kkik*kiki=ikikikki kkik*kikk=ikiki kkik*kkik=1 kkik*ikiki=kikk 
kkik*ikikk=kikkik kkik*ikkik=kkikikkik kkik*kikik=ikikikkik kkik*kikki=ikik kkik*kkiki=i kkik*kkikk=k kkik*ikikik=ki kkik*ikikki=kikkiki kkik*ikkiki=kkikikkiki kkik*ikkikk=ikikikkiki kkik*kikikk=ikikikkikk kkik*kikkik=ikikk kkik*kkikik=ik kkik*ikikikk=kik kkik*ikikkik=kikkikik kkik*ikkikik=kikkikikki kkik*kikikki=ikikkikik kkik*kikkiki=ikikki kkik*kikkikk=iki kkik*kkikikk=ikk kkik*ikikikki=kiki kkik*ikikkiki=kikikkikk kkik*ikikkikk=kikkikikk kkik*ikkikikk=kikkikikkik kkik*kikikkik=ikikkikikk kkik*kikkikik=ikikkik kkik*kkikikki=ikki kkik*ikikikkik=kikik kkik*ikikkikik=kikikki kkik*ikkikikki=ikikkikikkik kkik*kikikkiki=ikikkikikki kkik*kikikkikk=ikikkiki kkik*kikkikikk=ikikkikk kkik*kkikikkik=ikkik kkik*ikikikkiki=ikkikk kkik*ikikikkikk=kikikk kkik*ikikkikikk=kikikkik kkik*ikkikikkik=ikkikikkiki kkik*kikkikikki=ikkikik kkik*kkikikkiki=ikkiki kkik*ikikkikikki=kikikkiki kkik*ikkikikkiki=ikkikikkik kkik*kikkikikkik=ikkikikk kkik*ikikkikikkik=ikkikikki ikiki*1=ikiki ikiki*i=ikik ikiki*k=ikikik ikiki*ik=ikikk ikiki*ki=kkikk ikiki*kk=ikikikk ikiki*iki=ikikki ikiki*ikk=iki ikiki*kik=kki ikiki*kki=ikikikki ikiki*ikik=ikikkik ikiki*ikki=ik ikiki*kiki=kk ikiki*kikk=kkik ikiki*kkik=ikikikkik ikiki*ikiki=ikikkiki ikiki*ikikk=ikikkikk ikiki*ikkik=ikk ikiki*kikik=1 ikiki*kikki=kkiki ikiki*kkiki=ikikikkiki ikiki*kkikk=ikikikkikk ikiki*ikikik=ikikkikik ikiki*ikikki=ikkikik ikiki*ikkiki=ikki ikiki*ikkikk=i ikiki*kikikk=k ikiki*kikkik=kkikik ikiki*kkikik=kkikikki ikiki*ikikikk=ikikkikikk ikiki*ikikkik=ikkikikk ikiki*ikkikik=ikkik ikiki*kikikki=ki ikiki*kikkiki=kikkikk ikiki*kikkikk=kkikikk ikiki*kkikikk=kkikikkik ikiki*ikikikki=ikikkikikki ikiki*ikikkiki=ikkikikki ikiki*ikikkikk=ikkiki ikiki*ikkikikk=ikkikk ikiki*kikikkik=kik ikiki*kikkikik=kikki ikiki*kkikikki=kkikikkiki ikiki*ikikikkik=ikikkikikkik ikiki*ikikkikik=ikkikikkik ikiki*ikkikikki=kikik ikiki*kikikkiki=kiki ikiki*kikikkikk=kikk ikiki*kikkikikk=kikkik ikiki*kkikikkik=kikkikikki ikiki*ikikikkiki=kikkikikkik 
ikiki*ikikikkikk=ikkikikkiki ikiki*ikikkikikk=kikikkiki ikiki*ikkikikkik=kikikk ikiki*kikkikikki=kikkiki ikiki*kkikikkiki=kikkikikk ikiki*ikikkikikki=kikikkik ikiki*ikkikikkiki=kikikki ikiki*kikkikikkik=kikkikik ikiki*ikikkikikkik=kikikkikk ikikk*1=ikikk ikikk*i=ikikki ikikk*k=iki ikikk*ik=ikikkik ikikk*ki=ik ikikk*kk=ikik ikikk*iki=ikikkiki ikikk*ikk=ikikkikk ikikk*kik=ikk ikikk*kki=ikiki ikikk*ikik=ikikkikik ikikk*ikki=ikkikik ikikk*kiki=ikki ikikk*kikk=i ikikk*kkik=ikikik ikikk*ikiki=ikikikkikk ikikk*ikikk=ikikkikikk ikikk*ikkik=ikkikikk ikikk*kikik=ikkik ikikk*kikki=1 ikikk*kkiki=kkikk ikikk*kkikk=ikikikk ikikk*ikikik=ikikikki ikikk*ikikki=ikikkikikki ikikk*ikkiki=ikkikikki ikikk*ikkikk=ikkiki ikikk*kikikk=ikkikk ikikk*kikkik=k ikikk*kkikik=kki ikikk*ikikikk=ikikikkik ikikk*ikikkik=ikikkikikkik ikikk*ikkikik=ikkikikkik ikikk*kikikki=kikik ikikk*kikkiki=ki ikikk*kikkikk=kk ikikk*kkikikk=kkik ikikk*ikikikki=ikikikkiki ikikk*ikikkiki=kikkikikkik ikikk*ikikkikk=ikkikikkiki ikikk*ikkikikk=kikikkiki ikikk*kikikkik=kikikk ikikk*kikkikik=kik ikikk*kkikikki=kkiki ikikk*ikikikkik=kkikikki ikikk*ikikkikik=kkikikkiki ikikk*ikkikikki=kikikkik ikikk*kikikkiki=kikikki ikikk*kikikkikk=kiki ikikk*kikkikikk=kikk ikikk*kkikikkik=kkikik ikikk*ikikikkiki=kkikikk ikikk*ikikikkikk=kkikikkik ikikk*ikikkikikk=kikkikikki ikikk*ikkikikkik=kikikkikk ikikk*kikkikikki=kikki ikikk*kkikikkiki=kikkikk ikikk*ikikkikikki=kikkikikk ikikk*ikkikikkiki=kikkikik ikikk*kikkikikkik=kikkik ikikk*ikikkikikkik=kikkiki ikkik*1=ikkik ikkik*i=ikkiki ikkik*k=ikkikk ikkik*ik=ikkikik ikkik*ki=kikik ikkik*kk=ikki ikkik*iki=ikikkikk ikkik*ikk=ikkikikk ikkik*kik=kikikk ikkik*kki=ikk ikkik*ikik=ikikki ikkik*ikki=ikkikikki ikkik*kiki=kikikki ikkik*kikk=kiki ikkik*kkik=i ikkik*ikiki=ikikk ikkik*ikikk=ikikkik ikkik*ikkik=ikkikikkik ikkik*kikik=kikikkik ikkik*kikki=kik ikkik*kkiki=1 ikkik*kkikk=ik ikkik*ikikik=iki ikkik*ikikki=ikikkiki ikkik*ikkiki=ikkikikkiki ikkik*ikkikk=kikikkiki ikkik*kikikk=kikikkikk 
ikkik*kikkik=kikk ikkik*kkikik=k ikkik*ikikikk=ikik ikkik*ikikkik=ikikkikik ikkik*ikkikik=ikikkikikki ikkik*kikikki=kikkikik ikkik*kikkiki=kikki ikkik*kikkikk=ki ikkik*kkikikk=kk ikkik*ikikikki=ikiki ikkik*ikikkiki=ikikikkikk ikkik*ikikkikk=ikikkikikk ikkik*ikkikikk=ikikkikikkik ikkik*kikikkik=kikkikikk ikkik*kikkikik=kikkik ikkik*kkikikki=kki ikkik*ikikikkik=ikikik ikkik*ikikkikik=ikikikki ikkik*ikkikikki=kikkikikkik ikkik*kikikkiki=kikkikikki ikkik*kikikkikk=kikkiki ikkik*kikkikikk=kikkikk ikkik*kkikikkik=kkik ikkik*ikikikkiki=kkikk ikkik*ikikikkikk=ikikikk ikkik*ikikkikikk=ikikikkik ikkik*ikkikikkik=kkikikkiki ikkik*kikkikikki=kkikik ikkik*kkikikkiki=kkiki ikkik*ikikkikikki=ikikikkiki ikkik*ikkikikkiki=kkikikkik ikkik*kikkikikkik=kkikikk ikkik*ikikkikikkik=kkikikki kikik*1=kikik kikik*i=ikkikk kikik*k=kikikk kikik*ik=ikki kikik*ki=kikikki kikik*kk=kiki kikik*iki=ikk kikik*ikk=ikkik kikik*kik=kikikkik kikik*kki=kik kikik*ikik=i kikik*ikki=ikkiki kikik*kiki=kikikkiki kikik*kikk=kikikkikk kikik*kkik=kikk kikik*ikiki=1 kikik*ikikk=ik kikik*ikkik=ikkikik kikik*kikik=ikkikikki kikik*kikki=kikkikik kikik*kkiki=kikki kikik*kkikk=ki kikik*ikikik=k kikik*ikikki=iki kikik*ikkiki=ikikkikk kikik*ikkikk=ikkikikk kikik*kikikk=ikkikikkik kikik*kikkik=kikkikikk kikik*kkikik=kikkik kikik*ikikikk=kk kikik*ikikkik=ikik kikik*ikkikik=ikikki kikik*kikikki=ikkikikkiki kikik*kikkiki=kikkikikki kikik*kikkikk=kikkiki kikik*kkikikk=kikkikk kikik*ikikikki=kki kikik*ikikkiki=ikiki kikik*ikikkikk=ikikk kikik*ikkikikk=ikikkik kikik*kikikkik=ikikkikikki kikik*kikkikik=kikkikikkik kikik*kkikikki=kkikik kikik*ikikikkik=kkik kikik*ikikkikik=ikikik kikik*ikkikikki=ikikkiki kikik*kikikkiki=ikikkikikk kikik*kikikkikk=ikikkikikkik kikik*kikkikikk=kkikikkiki kikik*kkikikkik=kkikikk kikik*ikikikkiki=kkiki kikik*ikikikkikk=kkikk kikik*ikikkikikk=ikikikk kikik*ikkikikkik=ikikkikik kikik*kikkikikki=kkikikkik kikik*kkikikkiki=kkikikki kikik*ikikkikikki=ikikikki kikik*ikkikikkiki=ikikikkikk 
kikik*kikkikikkik=ikikikkiki kikik*ikikkikikkik=ikikikkik kikki*1=kikki kikki*i=kikk kikki*k=kikkik kikki*ik=ki kikki*ki=kikkiki kikki*kk=kikkikk kikki*iki=k kikki*ikk=kik kikki*kik=kikkikik kikki*kki=kkikik kikki*ikik=kk kikki*ikki=kiki kikki*kiki=kikikkikk kikki*kikk=kikkikikk kikki*kkik=kkikikk kikki*ikiki=kki kikki*ikikk=1 kikki*ikkik=kikik kikki*kikik=kikikki kikki*kikki=kikkikikki kikki*kkiki=kkikikki kikki*kkikk=kkiki kikki*ikikik=kkik kikki*ikikki=i kikki*ikkiki=ikkikk kikki*ikkikk=kikikk kikki*kikikk=kikikkik kikki*kikkik=kikkikikkik kikki*kkikik=kkikikkik kikki*ikikikk=kkikk kikki*ikikkik=ik kikki*ikkikik=ikki kikki*kikikki=kikikkiki kikki*kikkiki=ikikkikikkik kikki*kikkikk=kkikikkiki kikki*kkikikk=ikikikkiki kikki*ikikikki=ikikik kikki*ikikkiki=iki kikki*ikikkikk=ikk kikki*ikkikikk=ikkik kikki*kikikkik=ikkikikki kikki*kikkikik=ikkikikkiki kikki*kkikikki=ikikikkik kikki*ikikikkik=ikikikk kikki*ikikkikik=ikik kikki*ikkikikki=ikkiki kikki*kikikkiki=ikkikikk kikki*kikikkikk=ikkikikkik kikki*kikkikikk=ikikkikikki kikki*kkikikkik=ikikikkikk kikki*ikikikkiki=ikikikki kikki*ikikikkikk=ikiki kikki*ikikkikikk=ikikk kikki*ikkikikkik=ikkikik kikki*kikkikikki=ikikkikikk kikki*kkikikkiki=ikikkikik kikki*ikikkikikki=ikikki kikki*ikkikikkiki=ikikkikk kikki*kikkikikkik=ikikkiki kikki*ikikkikikkik=ikikkik kkiki*1=kkiki kkiki*i=kkik kkiki*k=kkikik kkiki*ik=kkikk kkiki*ki=kikkikk kkiki*kk=kkikikk kkiki*iki=ikikik kkiki*ikk=kki kkiki*kik=kikki kkiki*kki=kkikikki kkiki*ikik=ikikikk kkiki*ikki=kk kkiki*kiki=kikk kkiki*kikk=kikkik kkiki*kkik=kkikikkik kkiki*ikiki=ikikikki kkiki*ikikk=ikiki kkiki*ikkik=1 kkiki*kikik=ki kkiki*kikki=kikkiki kkiki*kkiki=kkikikkiki kkiki*kkikk=ikikikkiki kkiki*ikikik=ikikikkik kkiki*ikikki=ikik kkiki*ikkiki=i kkiki*ikkikk=k kkiki*kikikk=kik kkiki*kikkik=kikkikik kkiki*kkikik=kikkikikki kkiki*ikikikk=ikikikkikk kkiki*ikikkik=ikikk kkiki*ikkikik=ik kkiki*kikikki=kiki kkiki*kikkiki=kikikkikk kkiki*kikkikk=kikkikikk kkiki*kkikikk=kikkikikkik 
kkiki*ikikikki=ikikkikik kkiki*ikikkiki=ikikki kkiki*ikikkikk=iki kkiki*ikkikikk=ikk kkiki*kikikkik=kikik kkiki*kikkikik=kikikki kkiki*kkikikki=ikikkikikkik kkiki*ikikikkik=ikikkikikk kkiki*ikikkikik=ikikkik kkiki*ikkikikki=ikki kkiki*kikikkiki=ikkikk kkiki*kikikkikk=kikikk kkiki*kikkikikk=kikikkik kkiki*kkikikkik=ikkikikkiki kkiki*ikikikkiki=ikikkikikki kkiki*ikikikkikk=ikikkiki kkiki*ikikkikikk=ikikkikk kkiki*ikkikikkik=ikkik kkiki*kikkikikki=kikikkiki kkiki*kkikikkiki=ikkikikkik kkiki*ikikkikikki=ikkikik kkiki*ikkikikkiki=ikkiki kkiki*kikkikikkik=ikkikikki kkiki*ikikkikikkik=ikkikikk kkikk*1=kkikk kkikk*i=ikikik kkikk*k=kki kkikk*ik=ikikikk kkikk*ki=kk kkikk*kk=kkik kkikk*iki=ikikikki kkikk*ikk=ikiki kkikk*kik=1 kkikk*kki=kkiki kkikk*ikik=ikikikkik kkikk*ikki=ikik kkikk*kiki=i kkikk*kikk=k kkikk*kkik=kkikik kkikk*ikiki=ikikikkiki kkikk*ikikk=ikikikkikk kkikk*ikkik=ikikk kkikk*kikik=ik kkikk*kikki=ki kkikk*kkiki=kikkikk kkikk*kkikk=kkikikk kkikk*ikikik=kkikikki kkikk*ikikki=ikikkikik kkikk*ikkiki=ikikki kkikk*ikkikk=iki kkikk*kikikk=ikk kkikk*kikkik=kik kkikk*kkikik=kikki kkikk*ikikikk=kkikikkik kkikk*ikikkik=ikikkikikk kkikk*ikkikik=ikikkik kkikk*kikikki=ikki kkikk*kikkiki=kiki kkikk*kikkikk=kikk kkikk*kkikikk=kikkik kkikk*ikikikki=kkikikkiki kkikk*ikikkiki=ikikkikikki kkikk*ikikkikk=ikikkiki kkikk*ikkikikk=ikikkikk kkikk*kikikkik=ikkik kkikk*kikkikik=kikik kkikk*kkikikki=kikkiki kkikk*ikikikkik=kikkikikki kkikk*ikikkikik=ikikkikikkik kkikk*ikkikikki=ikkikik kkikk*kikikkiki=ikkiki kkikk*kikikkikk=ikkikk kkikk*kikkikikk=kikikk kkikk*kkikikkik=kikkikik kkikk*ikikikkiki=kikkikikk kkikk*ikikikkikk=kikkikikkik kkikk*ikikkikikk=ikkikikkiki kkikk*ikkikikkik=ikkikikk kkikk*kikkikikki=kikikki kkikk*kkikikkiki=kikikkikk kkikk*ikikkikikki=ikkikikkik kkikk*ikkikikkiki=ikkikikki kkikk*kikkikikkik=kikikkik kkikk*ikikkikikkik=kikikkiki ikikik*1=ikikik ikikik*i=kkikk ikikik*k=ikikikk ikikik*ik=kki ikikik*ki=ikikikki ikikik*kk=ikiki ikikik*iki=kk ikikik*ikk=kkik 
ikikik*kik=ikikikkik ikikik*kki=ikik ikikik*ikik=1 ikikik*ikki=kkiki ikikik*kiki=ikikikkiki ikikik*kikk=ikikikkikk ikikik*kkik=ikikk ikikik*ikiki=i ikikik*ikikk=k ikikik*ikkik=kkikik ikikik*kikik=kkikikki ikikik*kikki=ikikkikik ikikik*kkiki=ikikki ikikik*kkikk=iki ikikik*ikikik=ik ikikik*ikikki=ki ikikik*ikkiki=kikkikk ikikik*ikkikk=kkikikk ikikik*kikikk=kkikikkik ikikik*kikkik=ikikkikikk ikikik*kkikik=ikikkik ikikik*ikikikk=ikk ikikik*ikikkik=kik ikikik*ikkikik=kikki ikikik*kikikki=kkikikkiki ikikik*kikkiki=ikikkikikki ikikik*kikkikk=ikikkiki ikikik*kkikikk=ikikkikk ikikik*ikikikki=ikki ikikik*ikikkiki=kiki ikikik*ikikkikk=kikk ikikik*ikkikikk=kikkik ikikik*kikikkik=kikkikikki ikikik*kikkikik=ikikkikikkik ikikik*kkikikki=ikkikik ikikik*ikikikkik=ikkik ikikik*ikikkikik=kikik ikikik*ikkikikki=kikkiki ikikik*kikikkiki=kikkikikk ikikik*kikikkikk=kikkikikkik ikikik*kikkikikk=ikkikikkiki ikikik*kkikikkik=ikkikikk ikikik*ikikikkiki=ikkiki ikikik*ikikikkikk=ikkikk ikikik*ikikkikikk=kikikk ikikik*ikkikikkik=kikkikik ikikik*kikkikikki=ikkikikkik ikikik*kkikikkiki=ikkikikki ikikik*ikikkikikki=kikikki ikikik*ikkikikkiki=kikikkikk ikikik*kikkikikkik=kikikkiki ikikik*ikikkikikkik=kikikkik ikikki*1=ikikki ikikki*i=ikikk ikikki*k=ikikkik ikikki*ik=iki ikikki*ki=ikikkiki ikikki*kk=ikikkikk ikikki*iki=ik ikikki*ikk=ikik ikikki*kik=ikikkikik ikikki*kki=ikkikik ikikki*ikik=ikk ikikki*ikki=ikiki ikikki*kiki=ikikikkikk ikikki*kikk=ikikkikikk ikikki*kkik=ikkikikk ikikki*ikiki=ikki ikikki*ikikk=i ikikki*ikkik=ikikik ikikki*kikik=ikikikki ikikki*kikki=ikikkikikki ikikki*kkiki=ikkikikki ikikki*kkikk=ikkiki ikikki*ikikik=ikkik ikikki*ikikki=1 ikikki*ikkiki=kkikk ikikki*ikkikk=ikikikk ikikki*kikikk=ikikikkik ikikki*kikkik=ikikkikikkik ikikki*kkikik=ikkikikkik ikikki*ikikikk=ikkikk ikikki*ikikkik=k ikikki*ikkikik=kki ikikki*kikikki=ikikikkiki ikikki*kikkiki=kikkikikkik ikikki*kikkikk=ikkikikkiki ikikki*kkikikk=kikikkiki ikikki*ikikikki=kikik ikikki*ikikkiki=ki ikikki*ikikkikk=kk 
ikikki*ikkikikk=kkik ikikki*kikikkik=kkikikki ikikki*kikkikik=kkikikkiki ikikki*kkikikki=kikikkik ikikki*ikikikkik=kikikk ikikki*ikikkikik=kik ikikki*ikkikikki=kkiki ikikki*kikikkiki=kkikikk ikikki*kikikkikk=kkikikkik ikikki*kikkikikk=kikkikikki ikikki*kkikikkik=kikikkikk ikikki*ikikikkiki=kikikki ikikki*ikikikkikk=kiki ikikki*ikikkikikk=kikk ikikki*ikkikikkik=kkikik ikikki*kikkikikki=kikkikikk ikikki*kkikikkiki=kikkikik ikikki*ikikkikikki=kikki ikikki*ikkikikkiki=kikkikk ikikki*kikkikikkik=kikkiki ikikki*ikikkikikkik=kikkik ikkiki*1=ikkiki ikkiki*i=ikkik ikkiki*k=ikkikik ikkiki*ik=ikkikk ikkiki*ki=ikikkikk ikkiki*kk=ikkikikk ikkiki*iki=kikik ikkiki*ikk=ikki ikkiki*kik=ikikki ikkiki*kki=ikkikikki ikkiki*ikik=kikikk ikkiki*ikki=ikk ikkiki*kiki=ikikk ikkiki*kikk=ikikkik ikkiki*kkik=ikkikikkik ikkiki*ikiki=kikikki ikkiki*ikikk=kiki ikkiki*ikkik=i ikkiki*kikik=iki ikkiki*kikki=ikikkiki ikkiki*kkiki=ikkikikkiki ikkiki*kkikk=kikikkiki ikkiki*ikikik=kikikkik ikkiki*ikikki=kik ikkiki*ikkiki=1 ikkiki*ikkikk=ik ikkiki*kikikk=ikik ikkiki*kikkik=ikikkikik ikkiki*kkikik=ikikkikikki ikkiki*ikikikk=kikikkikk ikkiki*ikikkik=kikk ikkiki*ikkikik=k ikkiki*kikikki=ikiki ikkiki*kikkiki=ikikikkikk ikkiki*kikkikk=ikikkikikk ikkiki*kkikikk=ikikkikikkik ikkiki*ikikikki=kikkikik ikkiki*ikikkiki=kikki ikkiki*ikikkikk=ki ikkiki*ikkikikk=kk ikkiki*kikikkik=ikikik ikkiki*kikkikik=ikikikki ikkiki*kkikikki=kikkikikkik ikkiki*ikikikkik=kikkikikk ikkiki*ikikkikik=kikkik ikkiki*ikkikikki=kki ikkiki*kikikkiki=kkikk ikkiki*kikikkikk=ikikikk ikkiki*kikkikikk=ikikikkik ikkiki*kkikikkik=kkikikkiki ikkiki*ikikikkiki=kikkikikki ikkiki*ikikikkikk=kikkiki ikkiki*ikikkikikk=kikkikk ikkiki*ikkikikkik=kkik ikkiki*kikkikikki=ikikikkiki ikkiki*kkikikkiki=kkikikkik ikkiki*ikikkikikki=kkikik ikkiki*ikkikikkiki=kkiki ikkiki*kikkikikkik=kkikikki ikkiki*ikikkikikkik=kkikikk ikkikk*1=ikkikk ikkikk*i=kikik ikkikk*k=ikki ikkikk*ik=kikikk ikkikk*ki=ikk ikkikk*kk=ikkik ikkikk*iki=kikikki ikkikk*ikk=kiki ikkikk*kik=i 
ikkikk*kki=ikkiki ikkikk*ikik=kikikkik ikkikk*ikki=kik ikkikk*kiki=1 ikkikk*kikk=ik ikkikk*kkik=ikkikik ikkikk*ikiki=kikikkiki ikkikk*ikikk=kikikkikk ikkikk*ikkik=kikk ikkikk*kikik=k ikkikk*kikki=iki ikkikk*kkiki=ikikkikk ikkikk*kkikk=ikkikikk ikkikk*ikikik=ikkikikki ikkikk*ikikki=kikkikik ikkikk*ikkiki=kikki ikkikk*ikkikk=ki ikkikk*kikikk=kk ikkikk*kikkik=ikik ikkikk*kkikik=ikikki ikkikk*ikikikk=ikkikikkik ikkikk*ikikkik=kikkikikk ikkikk*ikkikik=kikkik ikkikk*kikikki=kki ikkikk*kikkiki=ikiki ikkikk*kikkikk=ikikk ikkikk*kkikikk=ikikkik ikkikk*ikikikki=ikkikikkiki ikkikk*ikikkiki=kikkikikki ikkikk*ikikkikk=kikkiki ikkikk*ikkikikk=kikkikk ikkikk*kikikkik=kkik ikkikk*kikkikik=ikikik ikkikk*kkikikki=ikikkiki ikkikk*ikikikkik=ikikkikikki ikkikk*ikikkikik=kikkikikkik ikkikk*ikkikikki=kkikik ikkikk*kikikkiki=kkiki ikkikk*kikikkikk=kkikk ikkikk*kikkikikk=ikikikk ikkikk*kkikikkik=ikikkikik ikkikk*ikikikkiki=ikikkikikk ikkikk*ikikikkikk=ikikkikikkik ikkikk*ikikkikikk=kkikikkiki ikkikk*ikkikikkik=kkikikk ikkikk*kikkikikki=ikikikki ikkikk*kkikikkiki=ikikikkikk ikkikk*ikikkikikki=kkikikkik ikkikk*ikkikikkiki=kkikikki ikkikk*kikkikikkik=ikikikkik ikkikk*ikikkikikkik=ikikikkiki kikikk*1=kikikk kikikk*i=kikikki kikikk*k=kiki kikikk*ik=kikikkik kikikk*ki=kik kikikk*kk=kikik kikikk*iki=kikikkiki kikikk*ikk=kikikkikk kikikk*kik=kikk kikikk*kki=ikkikk kikikk*ikik=ikkikikki kikikk*ikki=kikkikik kikikk*kiki=kikki kikikk*kikk=ki kikikk*kkik=ikki kikikk*ikiki=ikkikikk kikikk*ikikk=ikkikikkik kikikk*ikkik=kikkikikk kikikk*kikik=kikkik kikikk*kikki=k kikikk*kkiki=ikk kikikk*kkikk=ikkik kikikk*ikikik=ikkiki kikikk*ikikki=ikkikikkiki kikikk*ikkiki=kikkikikki kikikk*ikkikk=kikkiki kikikk*kikikk=kikkikk kikikk*kikkik=kk kikikk*kkikik=i kikikk*ikikikk=ikkikik kikikk*ikikkik=ikikkikikki kikikk*ikkikik=kikkikikkik kikikk*kikikki=kkikik kikikk*kikkiki=kki kikikk*kikkikk=1 kikikk*kkikikk=ik kikikk*ikikikki=ikikkikk kikikk*ikikkiki=ikikkikikk kikikk*ikikkikk=ikikkikikkik kikikk*ikkikikk=kkikikkiki 
kikikk*kikikkik=kkikikk kikikk*kikkikik=kkik kikikk*kkikikki=iki kikikk*ikikikkik=ikikki kikikk*ikikkikik=ikikkiki kikikk*ikkikikki=kkikikkik kikikk*kikikkiki=kkikikki kikikk*kikikkikk=kkiki kikikk*kikkikikk=kkikk kikikk*kkikikkik=ikik kikikk*ikikikkiki=ikikk kikikk*ikikikkikk=ikikkik kikikk*ikikkikikk=ikikkikik kikikk*ikkikikkik=ikikikkiki kikikk*kikkikikki=ikikik kikikk*kkikikkiki=ikiki kikikk*ikikkikikki=ikikikkikk kikikk*ikkikikkiki=ikikikkik kikikk*kikkikikkik=ikikikk kikikk*ikikkikikkik=ikikikki kikkik*1=kikkik kikkik*i=kikkiki kikkik*k=kikkikk kikkik*ik=kikkikik kikkik*ki=kkikik kikkik*kk=kikki kikkik*iki=kikikkikk kikkik*ikk=kikkikikk kikkik*kik=kkikikk kikkik*kki=kikk kikkik*ikik=kikikki kikkik*ikki=kikkikikki kikkik*kiki=kkikikki kikkik*kikk=kkiki kikkik*kkik=ki kikkik*ikiki=kikikk kikkik*ikikk=kikikkik kikkik*ikkik=kikkikikkik kikkik*kikik=kkikikkik kikkik*kikki=kkik kikkik*kkiki=k kikkik*kkikk=kik kikkik*ikikik=kiki kikkik*ikikki=kikikkiki kikkik*ikkiki=ikikkikikkik kikkik*ikkikk=kkikikkiki kikkik*kikikk=ikikikkiki kikkik*kikkik=kkikk kikkik*kkikik=kk kikkik*ikikikk=kikik kikkik*ikikkik=ikkikikki kikkik*ikkikik=ikkikikkiki kikkik*kikikki=ikikikkik kikkik*kikkiki=ikikik kikkik*kikkikk=kki kikkik*kkikikk=1 kikkik*ikikikki=ikkikk kikkik*ikikkiki=ikkikikk kikkik*ikikkikk=ikkikikkik kikkik*ikkikikk=ikikkikikki kikkik*kikikkik=ikikikkikk kikkik*kikkikik=ikikikk kikkik*kkikikki=i kikkik*ikikikkik=ikki kikkik*ikikkikik=ikkiki kikkik*ikkikikki=ikikkikikk kikkik*kikikkiki=ikikkikik kikkik*kikikkikk=ikikikki kikkik*kikkikikk=ikiki kikkik*kkikikkik=ik kikkik*ikikikkiki=ikk kikkik*ikikikkikk=ikkik kikkik*ikikkikikk=ikkikik kikkik*ikkikikkik=ikikkiki kikkik*kikkikikki=ikik kikkik*kkikikkiki=iki kikkik*ikikkikikki=ikikkikk kikkik*ikkikikkiki=ikikkik kikkik*kikkikikkik=ikikk kikkik*ikikkikikkik=ikikki kkikik*1=kkikik kkikik*i=kikkikk kkikik*k=kkikikk kkikik*ik=kikki kkikik*ki=kkikikki kkikik*kk=kkiki kkikik*iki=kikk kkikik*ikk=kikkik kkikik*kik=kkikikkik 
kkikik*kki=kkik kkikik*ikik=ki kkikik*ikki=kikkiki kkikik*kiki=kkikikkiki kkikik*kikk=ikikikkiki kkikik*kkik=kkikk kkikik*ikiki=k kkikik*ikikk=kik kkikik*ikkik=kikkikik kkikik*kikik=kikkikikki kkikik*kikki=ikikikkik kkikik*kkiki=ikikik kkikik*kkikk=kki kkikik*ikikik=kk kkikik*ikikki=kiki kkikik*ikkiki=kikikkikk kkikik*ikkikk=kikkikikk kkikik*kikikk=kikkikikkik kkikik*kikkik=ikikikkikk kkikik*kkikik=ikikikk kkikik*ikikikk=1 kkikik*ikikkik=kikik kkikik*ikkikik=kikikki kkikik*kikikki=ikikkikikkik kkikik*kikkiki=ikikkikik kkikik*kikkikk=ikikikki kkikik*kkikikk=ikiki kkikik*ikikikki=i kkikik*ikikkiki=ikkikk kkikik*ikikkikk=kikikk kkikik*ikkikikk=kikikkik kkikik*kikikkik=ikkikikkiki kkikik*kikkikik=ikikkikikk kkikik*kkikikki=ikik kkikik*ikikikkik=ik kkikik*ikikkikik=ikki kkikik*ikkikikki=kikikkiki kkikik*kikikkiki=ikkikikkik kkikik*kikikkikk=ikikkikikki kkikik*kikkikikk=ikikkiki kkikik*kkikikkik=ikikk kkikik*ikikikkiki=iki kkikik*ikikikkikk=ikk kkikik*ikikkikikk=ikkik kkikik*ikkikikkik=ikkikikki kkikik*kikkikikki=ikikkik kkikik*kkikikkiki=ikikki kkikik*ikikkikikki=ikkiki kkikik*ikkikikkiki=ikkikikk kkikik*kikkikikkik=ikikkikk kkikik*ikikkikikkik=ikkikik ikikikk*1=ikikikk ikikikk*i=ikikikki ikikikk*k=ikiki ikikikk*ik=ikikikkik ikikikk*ki=ikik ikikikk*kk=ikikik ikikikk*iki=ikikikkiki ikikikk*ikk=ikikikkikk ikikikk*kik=ikikk ikikikk*kki=kkikk ikikikk*ikik=kkikikki ikikikk*ikki=ikikkikik ikikikk*kiki=ikikki ikikikk*kikk=iki ikikikk*kkik=kki ikikikk*ikiki=kkikikk ikikikk*ikikk=kkikikkik ikikikk*ikkik=ikikkikikk ikikikk*kikik=ikikkik ikikikk*kikki=ik ikikikk*kkiki=kk ikikikk*kkikk=kkik ikikikk*ikikik=kkiki ikikikk*ikikki=kkikikkiki ikikikk*ikkiki=ikikkikikki ikikikk*ikkikk=ikikkiki ikikikk*kikikk=ikikkikk ikikikk*kikkik=ikk ikikikk*kkikik=1 ikikikk*ikikikk=kkikik ikikikk*ikikkik=kikkikikki ikikikk*ikkikik=ikikkikikkik ikikikk*kikikki=ikkikik ikikikk*kikkiki=ikki ikikikk*kikkikk=i ikikikk*kkikikk=k ikikikk*ikikikki=kikkikk ikikikk*ikikkiki=kikkikikk ikikikk*ikikkikk=kikkikikkik 
ikikikk*ikkikikk=ikkikikkiki ikikikk*kikikkik=ikkikikk ikikikk*kikkikik=ikkik ikikikk*kkikikki=ki ikikikk*ikikikkik=kikki ikikikk*ikikkikik=kikkiki ikikikk*ikkikikki=ikkikikkik ikikikk*kikikkiki=ikkikikki ikikikk*kikikkikk=ikkiki ikikikk*kikkikikk=ikkikk ikikikk*kkikikkik=kik ikikikk*ikikikkiki=kikk ikikikk*ikikikkikk=kikkik ikikikk*ikikkikikk=kikkikik ikikikk*ikkikikkik=kikikkiki ikikikk*kikkikikki=kikik ikikikk*kkikikkiki=kiki ikikikk*ikikkikikki=kikikkikk ikikikk*ikkikikkiki=kikikkik ikikikk*kikkikikkik=kikikk ikikikk*ikikkikikkik=kikikki ikikkik*1=ikikkik ikikkik*i=ikikkiki ikikkik*k=ikikkikk ikikkik*ik=ikikkikik ikikkik*ki=ikkikik ikikkik*kk=ikikki ikikkik*iki=ikikikkikk ikikkik*ikk=ikikkikikk ikikkik*kik=ikkikikk ikikkik*kki=ikikk ikikkik*ikik=ikikikki ikikkik*ikki=ikikkikikki ikikkik*kiki=ikkikikki ikikkik*kikk=ikkiki ikikkik*kkik=iki ikikkik*ikiki=ikikikk ikikkik*ikikk=ikikikkik ikikkik*ikkik=ikikkikikkik ikikkik*kikik=ikkikikkik ikikkik*kikki=ikkik ikikkik*kkiki=ik ikikkik*kkikk=ikik ikikkik*ikikik=ikiki ikikkik*ikikki=ikikikkiki ikikkik*ikkiki=kikkikikkik ikikkik*ikkikk=ikkikikkiki ikikkik*kikikk=kikikkiki ikikkik*kikkik=ikkikk ikikkik*kkikik=ikk ikikkik*ikikikk=ikikik ikikkik*ikikkik=kkikikki ikikkik*ikkikik=kkikikkiki ikikkik*kikikki=kikikkik ikikkik*kikkiki=kikik ikikkik*kikkikk=ikki ikikkik*kkikikk=i ikikkik*ikikikki=kkikk ikikkik*ikikkiki=kkikikk ikikkik*ikikkikk=kkikikkik ikikkik*ikkikikk=kikkikikki ikikkik*kikikkik=kikikkikk ikikkik*kikkikik=kikikk ikikkik*kkikikki=1 ikikkik*ikikikkik=kki ikikkik*ikikkikik=kkiki ikikkik*ikkikikki=kikkikikk ikikkik*kikikkiki=kikkikik ikikkik*kikikkikk=kikikki ikikkik*kikkikikk=kiki ikikkik*kkikikkik=k ikikkik*ikikikkiki=kk ikikkik*ikikikkikk=kkik ikikkik*ikikkikikk=kkikik ikikkik*ikkikikkik=kikkiki ikikkik*kikkikikki=kik ikikkik*kkikikkiki=ki ikikkik*ikikkikikki=kikkikk ikikkik*ikkikikkiki=kikkik ikikkik*kikkikikkik=kikk ikikkik*ikikkikikkik=kikki ikkikik*1=ikkikik ikkikik*i=ikikkikk ikkikik*k=ikkikikk 
[Machine-generated multiplication table: products w1*w2 = w3 of reduced words over the alphabet {i, k}, with 1 as the identity element.]
ikikkikikkik*i=kikkikikkik ikikkikikkik*k=ikkikikkiki ikikkikikkik*ik=kkikikkiki ikikkikikkik*ki=ikkikikkik ikikkikikkik*kk=ikikkikikki ikikkikikkik*iki=kkikikkik ikikkikikkik*ikk=kikkikikki ikikkikikkik*kik=kikikkiki ikikkikikkik*kki=ikikkikikk ikikkikikkik*ikik=ikikikkiki ikikkikikkik*ikki=kikkikikk ikikkikikkik*kiki=kikikkik ikikkikikkik*kikk=ikkikikki ikikkikikkik*kkik=ikikkiki ikikkikikkik*ikiki=ikikikkik ikikkikikkik*ikikk=kkikikki ikikkikikkik*ikkik=kikkiki ikikkikikkik*kikik=kikikkikk ikikkikikkik*kikki=ikkikikk ikikkikikkik*kkiki=ikikkik ikikkikikkik*kkikk=ikikkikik ikikkikikkik*ikikik=ikikikkikk ikikkikikkik*ikikki=kkikikk ikikkikikkik*ikkiki=kikkik ikikkikikkik*ikkikk=kikkikik ikikkikikkik*kikikk=kikikki ikikkikikkik*kikkik=ikkiki ikikkikikkik*kkikik=ikikkikk ikikkikikkik*ikikikk=ikikikki ikikkikikkik*ikikkik=kkiki ikikkikikkik*ikkikik=kikkikk ikikkikikkik*kikikki=kikikk ikikkikikkik*kikkiki=ikkik ikikkikikkik*kikkikk=ikkikik ikikkikikkik*kkikikk=ikikki ikikkikikkik*ikikikki=ikikikk ikikkikikkik*ikikkiki=kkik ikikkikikkik*ikikkikk=kkikik ikikkikikkik*ikkikikk=kikki ikikkikikkik*kikikkik=kiki ikikkikikkik*kikkikik=ikkikk ikikkikikkik*kkikikki=ikikk ikikkikikkik*ikikikkik=ikiki ikikkikikkik*ikikkikik=kkikk ikikkikikkik*ikkikikki=kikk ikikkikikkik*kikikkiki=kik ikikkikikkik*kikikkikk=kikik ikikkikikkik*kikkikikk=ikki ikikkikikkik*kkikikkik=iki ikikkikikkik*ikikikkiki=ikik ikikkikikkik*ikikikkikk=ikikik ikikkikikkik*ikikkikikk=kki ikikkikikkik*ikkikikkik=ki ikikkikikkik*kikkikikki=ikk ikikkikikkik*kkikikkiki=ik ikikkikikkik*ikikkikikki=kk ikikkikikkik*ikkikikkiki=k ikikkikikkik*kikkikikkik=i ikikkikikkik*ikikkikikkik=1

by Ken at July 03, 2017 08:21 PM

June 30, 2017

Functional Jobs

OCaml server-side developer at Ahrefs (Full-time)

What we need

Ahrefs is looking for a backend developer with a deep understanding of networks, distributed systems, OS fundamentals, and a taste for simple and efficient architectural designs. Our backend is implemented mostly in OCaml, with some C++, so proficiency in OCaml is very much appreciated; otherwise, a strong inclination to learn OCaml intensively in a short time frame will be required. Understanding of functional programming in general and/or experience with other FP languages (F#, Haskell, Scala, Scheme, etc.) will help a lot. Knowledge of C++ and/or Rust is a plus.

Every day the candidate will have to deal with:

  • 10+ petabytes of live data
  • OCaml
  • linux
  • git

The ideal candidate is expected to:

  • Independently deal with bugs, schedule tasks, and investigate code
  • Make well-argued technical choices and take responsibility for them
  • Understand the whole technology stack at all levels: from network and userspace code to OS internals and hardware
  • Handle the full development cycle of a single component, i.e. formalize the task, write code and tests, set up and support production (devops), and resolve user requests
  • Approach problems with a practical mindset and suppress perfectionism when time is a priority
  • Write flexible, maintainable code and adapt to post-launch requirement tweaks

These requirements stem naturally from our approach to development with fast feedback cycle, highly-focused personal areas of responsibility and strong tendency to vertical component splitting.

Who we are

Ahrefs runs an internet-scale bot that crawls the whole Web 24/7, storing huge volumes of information to be indexed and structured in a timely fashion. The backend system is powered by a custom petabyte-scale distributed key-value storage to accommodate all that data coming in at high speed. The storage system is implemented in OCaml with a thin performance-critical low-level part in C++. On top of that, Ahrefs is building various analytical services for end-users.

We are a small team and strongly believe in better technology leading to better solutions for real-world problems. We worship functional languages and static typing, extensively employ code generation and meta-programming, value code clarity and predictability, and are constantly seeking to automate repetitive tasks and eliminate boilerplate, guided by DRY and following KISS. If there is any new technology that will make our life easier - no doubt, we'll give it a try. We rely heavily on opensource code (as the only viable way to build a maintainable system) and contribute back. It goes without saying that our team is all passionate and experienced OCaml programmers, ready to lend a hand and explain that intricate ocamlbuild rule or track down a CPU bug.

Our motto is "first do it, then do it right, then do it better".

What you get

We provide:

  • Competitive salary
  • Informal and thriving atmosphere
  • First-class workplace equipment (hardware, tools)
  • Medical insurance


Singapore : modern office in CBD

USA : cozy loft in San Francisco downtown

Get information on how to apply for this position.

June 30, 2017 07:59 PM

June 26, 2017

FP Complete

A Tale of Two Brackets

This is a debugging story told completely out of order. In order to understand the ultimate bug, why it seemed to occur arbitrarily, and the ultimate resolution, there's lots of backstory to cover. If you're already deeply familiar with the inner workings of the monad-control package, you can probably look at a demonstration of the bad instance and move on. Otherwise, prepare for a fun ride!

As usual, if you want to play along, we're going to be using Stack's script interpreter feature. Just save the snippet's contents to a file and run with stack filename.hs. (It works with any snippet that begins with #!/usr/bin/env stack.)

Oh, and also: the confusion that this blog post demonstrates is one of the reasons why I strongly recommend sticking to a ReaderT env IO monad transformer stack.

Trying in StateT

Let's start with some broken code (my favorite kind). It uses the StateT transformer and a function which may throw a runtime exception.

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
import Control.Monad.State.Strict
import Control.Exception
import Data.Typeable

data OddException = OddException !Int -- great name :)
  deriving (Show, Typeable)
instance Exception OddException

mayThrow :: StateT Int IO Int
mayThrow = do
  x <- get
  if odd x
    then lift $ throwIO $ OddException x
    else do
      put $! x + 1
      return $ x `div` 2

main :: IO ()
main = runStateT (replicateM 2 mayThrow) 0 >>= print

Our problem is that we'd like to be able to recover from a thrown exception. Easy enough, we think: we'll just use Control.Exception.try to attempt to run the mayThrow action. Unfortunately, if I wrap up mayThrow with a try, I get this highly informative error message:

Main.hs:21:19: error:
    • Couldn't match type ‘IO’ with ‘StateT Integer IO’
      Expected type: StateT Integer IO ()
        Actual type: IO ()
    • In the first argument of ‘runStateT’, namely
        ‘(replicateM 2 (try mayThrow))’
      In the first argument of ‘(>>=)’, namely
        ‘runStateT (replicateM 2 (try mayThrow)) 0’
      In the expression:
        runStateT (replicateM 2 (try mayThrow)) 0 >>= print

Oh, that makes sense: try is specialized to IO, and our function is StateT Int IO. Our first instinct is probably to keep throwing lift calls into our program until it compiles, since lift seems to always fix monad transformer compilation errors. However, try as you might, you'll never succeed. To understand why, let's look at the (slightly specialized) type signature for try:

try :: IO a -> IO (Either OddException a)

If I apply lift to this, I could end up with:

try :: IO a -> StateT Int IO (Either OddException a)

But there's no way to use lift to modify the type of the IO a input. This is generally the case with the lift and liftIO functions: they can deal with monad values that are the output of a function, but not the input to the function. (More precisely: the functions are covariant and work on values in positive positions. We'd need something contravariant to work on values in negative positions. You can read more on this nomenclature in another blog post.)

Huh, I guess we're stuck. But then I remember that StateT is just defined as newtype StateT s m a = StateT { runStateT :: s -> m (a,s)}. So maybe I can write a version of try that works for a StateT using the internals of the type.

tryStateT :: StateT Int IO a -> StateT Int IO (Either OddException a)
tryStateT (StateT f) = StateT $ \s0 -> do
  eres <- try (f s0)
  return $ case eres of
    Left e -> (Left e, s0)
    Right (a, s1) -> (Right a, s1)

Go ahead and plug that into our previous example, and you should get the desired output:

([Right 0,Left (OddException 1)],1)

Let's break down in nauseating detail what that tryStateT function did:

  1. Unwrap the StateT data constructor from the provided action to get a function f :: Int -> IO (a, Int)
  2. Construct a new StateT value on the right hand side by using the StateT data constructor, and capturing the initial state in the value s0 :: Int.
  3. Pass s0 to f to get an action of type IO (a, Int), which will give the result and the new, updated state.
  4. Wrap f s0 with try to allow us to detect and recover from a runtime exception.
  5. eres has type Either OddException (a, Int), and we pattern match on it.
  6. If we receive a Right/success value, we simply wrap up the a value in a Right constructor together with the updated state.
  7. If we receive a Left/exception value, we wrap up the exception with a Left. However, we need to return some new state. Since we have no such state available to us from the action, we return the only thing we can: the initial s0 state value.

Lesson learned We can use try in a StateT with some difficulty, but we need to be aware of what happens to our monadic state.

Catching in StateT

It turns out that it's trivial to implement the try function in terms of catch, and the catch function in terms of try, at least when sticking to the IO-specialized versions:

try' :: Exception e => IO a -> IO (Either e a)
try' action = (Right <$> action) `catch` (return . Left)

catch' :: Exception e => IO a -> (e -> IO a) -> IO a
catch' action onExc = do
  eres <- try action
  case eres of
    Left e -> onExc e
    Right a -> return a
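
A quick sanity check (my addition, not from the original post) that try' really does catch a runtime exception:

```haskell
import Control.Exception

-- try implemented in terms of catch, as above
try' :: Exception e => IO a -> IO (Either e a)
try' action = (Right <$> action) `catch` (return . Left)

-- evaluate forces the division so the ArithException is thrown in IO
demo :: IO (Either ArithException Int)
demo = try' (evaluate (1 `div` 0))

main :: IO ()
main = demo >>= print  -- prints: Left divide by zero
```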

It turns out that by just changing the type signatures and replacing try with tryStateT, we can do the same thing for StateT:

catchStateT :: Exception e
            => StateT Int IO a
            -> (e -> StateT Int IO a)
            -> StateT Int IO a
catchStateT action onExc = do
  eres <- tryStateT action
  case eres of
    Left e -> onExc e
    Right a -> return a

NOTE Pay close attention to that type signature, and think about how monadic state is being shuttled through this function.
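
To see those semantics concretely, here's a small standalone demonstration (my sketch, generalizing the tryStateT from above to any exception type): the handler starts from the state as it was before the failing action, so the put 5 is rolled back, and the handler's own modify becomes the final state.

```haskell
import Control.Exception
import Control.Monad.State.Strict
import Control.Monad.Trans.Class (lift)

-- Generalized from the OddException-specific version above
tryStateT :: Exception e => StateT Int IO a -> StateT Int IO (Either e a)
tryStateT (StateT f) = StateT $ \s0 -> do
  eres <- try (f s0)
  return $ case eres of
    Left e        -> (Left e, s0)   -- exception: fall back to the initial state
    Right (a, s1) -> (Right a, s1)  -- success: keep the updated state

catchStateT :: Exception e
            => StateT Int IO a
            -> (e -> StateT Int IO a)
            -> StateT Int IO a
catchStateT action onExc = tryStateT action >>= either onExc return

-- The handler can read and modify the monadic state
handler :: ArithException -> StateT Int IO String
handler _ = modify (+ 100) >> return "recovered"

demo :: IO (String, Int)
demo = runStateT ((put 5 >> lift (throwIO DivideByZero)) `catchStateT` handler) 0

main :: IO ()
main = demo >>= print  -- prints: ("recovered",100) -- the put 5 was rolled back
```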

Well, if we can implement catchStateT in terms of tryStateT, surely we can implement it directly as well. Let's do the most straightforward thing I can think of (or at least the thing that continues my narrative here):

catchStateT :: Exception e
            => StateT Int IO a
            -> (e -> IO a)
            -> StateT Int IO a
catchStateT (StateT action) onExc = StateT $ \s0 ->
  action s0 `catch` \e -> do
    a <- onExc e
    return (a, s0)

Here, we're basing our implementation on top of the catch function instead of the try function. We do the same unwrap-the-StateT, capture-the-s0 trick we did before. Now, in the lambda we've created for the catch call, we pass the e exception value to the user-supplied onExc function, and then like tryStateT wrap up the result in a tuple with the initial s0.

Who noticed the difference in type signature? Instead of e -> StateT Int IO a, our onExc handler has type e -> IO a. I told you to pay attention to how the monadic states were being shuttled around; let's analyze it:

  • In the first function, we use tryStateT, which as we mentioned will reconstitute the original s0 state when it returns. If the action succeeded, nothing else happens. But in the exception case, that original s0 is now passed into the onExc function, and the final monadic state returned will be the result of the onExc function.
  • In the second function, we never give the onExc function a chance to play with monadic state, since it just lives in IO. So we always return the original state at the end if an exception occurred.

Which behavior is best? I think most people would argue that the first function is better: it's more general in allowing onExc to access and modify the monadic state, and there's not really any chance for confusion. Fair enough, I'll buy that argument (that I just made on behalf of all of my readers).

Bonus exercise Modify this implementation of catchStateT to have the same type signature as the original one.


This is fun, let's keep reimplementing functions from Control.Exception! This time, let's do finally, which will ensure that some action (usually a cleanup action) is run after an initial action, regardless of whether an exception was thrown.

finallyStateT :: StateT Int IO a
              -> IO b
              -> StateT Int IO a
finallyStateT (StateT action) cleanup = StateT $ \s0 ->
  action s0 `finally` cleanup

That was really easy. Ehh, but one problem: look at that type signature! We just agreed (or I agreed for you) that in the case of catch, it was better to have the second argument also live in StateT Int IO. Here, our argument lives in IO. Let's fix that:

finallyStateT :: StateT Int IO a
              -> StateT Int IO b
              -> StateT Int IO a
finallyStateT (StateT action) (StateT cleanup) = StateT $ \s0 ->
  action s0 `finally` cleanup s0

Huh, also pretty simple. Let's analyze the monadic state behavior here: our cleanup action is given the initial state, regardless of the result of action s0. That means that, even if the action succeeded, we'll ignore the updated state. Furthermore, because finally ignores the result of the second argument, we will ignore any updated monadic state. Want to see what I mean? Try this out:

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
import Control.Exception
import Control.Monad.State.Strict

finallyStateT :: StateT Int IO a
              -> StateT Int IO b
              -> StateT Int IO a
finallyStateT (StateT action) (StateT cleanup) = StateT $ \s0 ->
  action s0 `finally` cleanup s0

action :: StateT Int IO ()
action = modify (+ 1)

cleanup :: StateT Int IO ()
cleanup = do
  get >>= lift . print
  modify (+ 2)

main :: IO ()
main = execStateT (action `finallyStateT` cleanup) 0 >>= print

You may expect the output of this to be the numbers 1 and 3, but in fact the output is 0 and 1: cleanup looks at the initial state value of 0, and its + 2 modification is thrown away. So can we implement a version of our function that keeps the state? Sure (slightly simplified to avoid async exception/mask noise):

finallyStateT :: StateT Int IO a
              -> StateT Int IO b
              -> StateT Int IO a
finallyStateT (StateT action) (StateT cleanup) = StateT $ \s0 -> do
  (a, s1) <- action s0 `onException` cleanup s0
  (_b, s2) <- cleanup s1
  return (a, s2)

This has the expected output of 1 and 3. Looking at how it works: we follow our same tricks, and pass in s0 to action. If an exception is thrown there, we once again pass in s0 to cleanup and ignore its updated state (since we have no choice). However, in the success case, we now pass in the updated state (s1) to cleanup. And finally, our resulting state is the result of cleanup (s2) instead of the s1 produced by action.
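
As a self-contained check (mine, reusing the action and cleanup from the earlier snippet), the state-preserving version does indeed print 1 and then 3:

```haskell
import Control.Exception
import Control.Monad.State.Strict
import Control.Monad.Trans.Class (lift)

finallyStateT :: StateT Int IO a
              -> StateT Int IO b
              -> StateT Int IO a
finallyStateT (StateT action) (StateT cleanup) = StateT $ \s0 -> do
  (a, s1) <- action s0 `onException` cleanup s0  -- exception path: cleanup sees s0
  (_b, s2) <- cleanup s1                         -- success path: cleanup sees s1
  return (a, s2)                                 -- final state comes from cleanup

main :: IO ()
main = execStateT (modify (+ 1) `finallyStateT` cleanup) 0 >>= print
  where
    cleanup = do
      get >>= lift . print  -- prints 1: the state as updated by the action
      modify (+ 2)          -- preserved: final state is 3
```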

We have three different implementations of finallyStateT and two different type signatures. Let's compare them:

  • The first one (the IO version) has the advantage that its type tells us exactly what's happening: the cleanup has no access to the state at all. However, you can argue like we did with catchStateT that this is limiting and not what people would expect the type signature to be.
  • The second one (use the initial state for cleanup and then throw away its modified state) has the advantage that it's logically consistent: whether cleanup is called from a success or exception code path, it does the exact same thing. On the other hand, you can argue that it is surprising behavior that state updates that can be preserved are being thrown away.
  • The third one (keep the state) has the reverse trade-offs of the second one: state updates from cleanup are preserved, but the success and exception code paths no longer behave identically.

So unlike catchStateT, I would argue that there's not nearly as clear a winner with finallyStateT. Each approach has its relative merits.

One final point that seems almost not worth mentioning (hint: epic foreshadowment incoming). The first version (IO specialized) has an additional benefit of being ever-so-slightly more efficient than the other two, since it doesn't need to deal with the additional monadic state in cleanup. With a simple monad transformer like StateT this performance difference is hardly even worth thinking about. However, if we were in a tight inner loop, and our monad stack was significantly more complicated, you could imagine a case where the performance difference was significant.

Implementing for other transformers

It's great that we understand StateT so well, but can we do anything for other transformers? It turns out that, yes, we can for many transformers. (An exception is continuation-based transformers, which you can read a bit about in passing in my ResourceT blog post from last week.) Let's look at a few other examples of finally:

import Control.Exception
import Control.Monad.Writer
import Control.Monad.Reader
import Control.Monad.Except
import Data.Monoid

finallyWriterT :: Monoid w
               => WriterT w IO a
               -> WriterT w IO b
               -> WriterT w IO a
finallyWriterT (WriterT action) (WriterT cleanup) = WriterT $ do
  (a, w1) <- action `onException` cleanup
  (_b, w2) <- cleanup
  return (a, w1 <> w2)

finallyReaderT :: ReaderT r IO a
               -> ReaderT r IO b
               -> ReaderT r IO a
finallyReaderT (ReaderT action) (ReaderT cleanup) = ReaderT $ \r -> do
  a <- action r `onException` cleanup r
  _b <- cleanup r
  return a

finallyExceptT :: ExceptT e IO a
               -> ExceptT e IO b
               -> ExceptT e IO a
finallyExceptT (ExceptT action) (ExceptT cleanup) = ExceptT $ do
  ea <- action `onException` cleanup
  eb <- cleanup
  return $ case (ea, eb) of
    (Left e, _) -> Left e
    (Right _a, Left e) -> Left e
    (Right a, Right _b) -> Right a

The WriterT case is very similar to the StateT case, except (1) there's no initial state s0 to contend with, and (2) instead of receiving an updated s2 state from cleanup, we need to monoidally combine the w1 and w2 values. The ReaderT case is also very similar to StateT, but in the opposite way: we receive an immutable environment r which is passed into all functions, but there is no updated state. To put this in other words: WriterT has no context but has mutable monadic state, whereas ReaderT has a context but no mutable monadic state. StateT, by contrast, has both. (This is important to understand, so reread it a few times to get comfortable with the concept.)
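
To make the WriterT behavior concrete, here's a small runnable check (mine, not from the post) of finallyWriterT on the success path: the logs from the action and the cleanup are combined monoidally.

```haskell
import Control.Exception
import Control.Monad.Writer

finallyWriterT :: Monoid w
               => WriterT w IO a
               -> WriterT w IO b
               -> WriterT w IO a
finallyWriterT (WriterT action) (WriterT cleanup) = WriterT $ do
  (a, w1) <- action `onException` cleanup
  (_b, w2) <- cleanup
  return (a, w1 <> w2)  -- combine the two logs monoidally

demo :: IO [String]
demo = execWriterT $ tell ["action"] `finallyWriterT` tell ["cleanup"]

main :: IO ()
main = demo >>= print  -- prints: ["action","cleanup"]
```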

The ExceptT case is interesting: it has no context (like WriterT), and it does have mutable monadic state, though of a different shape than StateT and WriterT. Instead of returning an extra value with each result (as a product), ExceptT returns either a result value or an e value (as a sum). The case expression at the end of finallyExceptT is very informative: we need to figure out how to combine the various monadic states together. Our implementation here says that if action returns a Left e, we take that result. Otherwise, if cleanup fails, we take its Left value. And if they both return Right values, then we use action's result. But there are at least two other valid choices:

  • Prefer cleanup's e value to action's e value, if both are available.
  • Completely ignore the e value returned by cleanup, and just use action's result.

There's also a fourth, invalid option: if action returns a Left, return that immediately and don't call cleanup. This has been a perennial source of bugs in many libraries dealing with exceptions in monad transformers like ErrorT, ExceptT, and EitherT. It invalidates the contract of finally, namely that cleanup will always be run. I've seen some arguments for why this can make sense, but I consider it nothing more than a buggy implementation.
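
To see why it's a bug, here's a sketch of that invalid implementation (hypothetical code, not taken from any particular library) along with a check showing the cleanup being silently skipped:

```haskell
import Control.Exception
import Control.Monad.Except
import Control.Monad.Trans.Class (lift)
import Data.IORef

-- The buggy variant: a Left from the action short-circuits past cleanup.
buggyFinallyExceptT :: ExceptT e IO a -> ExceptT e IO b -> ExceptT e IO a
buggyFinallyExceptT (ExceptT action) (ExceptT cleanup) = ExceptT $ do
  ea <- action `onException` cleanup
  case ea of
    Left e -> return (Left e)  -- cleanup silently skipped: contract violated!
    Right a -> do
      _b <- cleanup
      return (Right a)

demo :: IO Bool
demo = do
  ref <- newIORef False
  _ <- runExceptT $
    (throwError "failed" :: ExceptT String IO ())
      `buggyFinallyExceptT` lift (writeIORef ref True)
  readIORef ref  -- did the cleanup run?

main :: IO ()
main = demo >>= print  -- prints: False
```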

And finally, like with StateT, we could avoid all of these questions for ExceptT if we just modify our type signature to use IO b for cleanup:

finallyExceptT :: ExceptT e IO a
               -> IO b
               -> ExceptT e IO a
finallyExceptT (ExceptT action) cleanup = ExceptT $ do
  ea <- action `onException` cleanup
  _b <- cleanup
  return ea

So our takeaway: we can implement finally for various monad transformers. In some cases this leads to questions of semantics, just like with StateT. And all of these transformers fall into a pattern of optionally capturing some initial context, and optionally shuttling around some monadic state.

(And no, I haven't forgotten that the title of this blog post talks about bracket. We're getting there, ever so slowly. I hope I've piqued your curiosity.)

Generalizing the pattern

It's wonderful that we can implement all of these functions that take monad transformers as arguments. But do any of us actually want to go off and implement catch, try, finally, forkIO, timeout, and a dozen other functions for every possible monad transformer stack imaginable? I doubt it. So just as we have MonadTrans and MonadIO for dealing with transformers in output/positive position, we can construct some kind of typeclass that handles the two concepts we mentioned above: capture the context, and deal with the monadic state.

Let's start by playing with this for just StateT.

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Exception
import Control.Monad.State.Strict

type Run s = forall b. StateT s IO b -> IO (b, s)

capture :: forall s a.
           (Run s -> IO a)
        -> StateT s IO a
capture withRun = StateT $ \s0 -> do
  let run :: Run s
      run (StateT f) = f s0
  a <- withRun run
  return (a, s0)

restoreState :: (a, s) -> StateT s IO a
restoreState stateAndResult = StateT $ \_s0 -> return stateAndResult

finally1 :: StateT s IO a
         -> IO b
         -> StateT s IO a
finally1 action cleanup = do
  x <- capture $ \run -> run action `finally` cleanup
  restoreState x

finally2 :: StateT s IO a
         -> StateT s IO b
         -> StateT s IO a
finally2 action cleanup = do
  x <- capture $ \run -> run action `finally` run cleanup
  restoreState x

-- Not async exception safe!
finally3 :: StateT s IO a
         -> StateT s IO b
         -> StateT s IO a
finally3 action cleanup = do
  x <- capture $ \run -> run action `onException` run cleanup
  a <- restoreState x
  _b <- cleanup
  return a

main :: IO ()
main = do
  flip evalStateT () $ lift (putStrLn "here1") `finally1`
                       putStrLn "here2"
  flip evalStateT () $ lift (putStrLn "here3") `finally2`
                       lift (putStrLn "here4")
  flip evalStateT () $ lift (putStrLn "here5") `finally2`
                       lift (putStrLn "here6")

That's a lot, let's step through it slowly:

type Run s = forall b. StateT s IO b -> IO (b, s)

This is a helper type to make the following bit simpler. It represents the concept of capturing the initial state in a general manner. Given an action living in our transformer, it turns it into an action in our base monad, returning the entire monadic state along with the return value (i.e., (b, s) instead of just b). This allows us to define our capture function:

capture :: forall s a.
           (Run s -> IO a)
        -> StateT s IO a
capture withRun = StateT $ \s0 -> do
  let run :: Run s
      run (StateT f) = f s0
  a <- withRun run
  return (a, s0)

This function says "you give me some function that needs to be able to run monadic actions with the initial context, and I'll give it that initial context running function (Run s)." The implementation isn't too bad: we just capture the s0, create a run function out of it, pass that into the user-provided argument, and then return the result with the original state.

Now we need some way to update the monadic state based on a result value. We call it restoreState:

restoreState :: (a, s) -> StateT s IO a
restoreState stateAndResult = StateT $ \_s0 -> return stateAndResult

Pretty simple too: we ignore our original monadic state and replace it with the state contained in the argument. Next we use these two functions to implement three versions of finally. The first two are able to reuse the finally from Control.Exception. However, both of them suffer from the inability to retain monadic state. Our third implementation fixes that, at the cost of having to reimplement the logic of finally. And as my comment there mentions, our implementation is not in fact async exception safe.

So all of our original trade-offs apply from our initial StateT discussion, but now there's an additional downside to option 3: it's significantly more complicated to implement correctly.

The MonadIOControl type class

Alright, we've established that it's possible to capture this idea for StateT. Let's generalize to a typeclass. We'll need three components:

  • A capture function. We'll call it liftIOWith, to match nomenclature in monad-control.
  • A restore function, which we'll call restoreM.
  • An associated type (type family) to represent what the monadic state for the given monad stack is.

We end up with:

type RunInIO m = forall b. m b -> IO (StM m b)

class MonadIO m => MonadIOControl m where
  type StM m a

  liftIOWith :: (RunInIO m -> IO a) -> m a
  restoreM :: StM m a -> m a

Let's write an instance for IO:

instance MonadIOControl IO where
  type StM IO a = a

  liftIOWith withRun = withRun id
  restoreM = return

The type StM IO a = a says that, for an IO action returning a, the full monadic state is just a. In other words, there is no additional monadic state hanging around. That's good, as we know that there isn't. liftIOWith is able to just use id as the RunInIO function, since you can run an IO action in IO directly. And finally, since there is no monadic state to update, restoreM just wraps up the result value in IO via return. (More foreshadowment: what this instance is supposed to look like is actually at the core of the bug this blog post will eventually talk about.)

Alright, let's implement this instance for StateT s IO:

instance MonadIOControl (StateT s IO) where
  type StM (StateT s IO) a = (a, s)

  liftIOWith withRun = StateT $ \s0 -> do
    a <- withRun $ \(StateT f) -> f s0
    return (a, s0)

  restoreM stateAndResult = StateT $ \_s0 -> return stateAndResult

This is basically identical to the functions we defined above, so I won't dwell on it here. But here's an interesting observation: the same way we define MonadIO instance as instance MonadIO m => MonadIO (StateT s m), it would be great to do the same thing for MonadIOControl. And, in fact, we can do just that!

instance MonadIOControl m => MonadIOControl (StateT s m) where
  type StM (StateT s m) a = StM m (a, s)

  liftIOWith withRun = StateT $ \s0 -> do
    a <- liftIOWith $ \run -> withRun $ \(StateT f) -> run $ f s0
    return (a, s0)

  restoreM x = StateT $ \_s0 -> restoreM x

We use the underlying monad's liftIOWith and restoreM functions within our own definitions, and thereby get context and state passed up and down the stack as needed. Alright, let's go ahead and do this for all of the transformers we've been discussing:

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE UndecidableInstances #-}
import Control.Exception
import Control.Monad.State.Strict
import Control.Monad.Writer
import Control.Monad.Reader
import Control.Monad.Except
import Data.Monoid
import Data.IORef

type RunInIO m = forall b. m b -> IO (StM m b)

class MonadIO m => MonadIOControl m where
  type StM m a

  liftIOWith :: (RunInIO m -> IO a) -> m a
  restoreM :: StM m a -> m a

instance MonadIOControl IO where
  type StM IO a = a

  liftIOWith withRun = withRun id
  restoreM = return

instance MonadIOControl m => MonadIOControl (StateT s m) where
  type StM (StateT s m) a = StM m (a, s)

  liftIOWith withRun = StateT $ \s0 -> do
    a <- liftIOWith $ \run -> withRun $ \(StateT f) -> run $ f s0
    return (a, s0)

  restoreM x = StateT $ \_s0 -> restoreM x

instance (MonadIOControl m, Monoid w) => MonadIOControl (WriterT w m) where
  type StM (WriterT w m) a = StM m (a, w)

  liftIOWith withRun = WriterT $ do
    a <- liftIOWith $ \run -> withRun $ \(WriterT f) -> run f
    return (a, mempty)

  restoreM x = WriterT $ restoreM x

instance MonadIOControl m => MonadIOControl (ReaderT r m) where
  type StM (ReaderT r m) a = StM m a

  liftIOWith withRun = ReaderT $ \r ->
    liftIOWith $ \run -> withRun $ \(ReaderT f) -> run $ f r

  restoreM x = ReaderT $ \r -> restoreM x

instance MonadIOControl m => MonadIOControl (ExceptT e m) where
  type StM (ExceptT e m) a = StM m (Either e a)

  liftIOWith withRun = ExceptT $ do
    a <- liftIOWith $ \run -> withRun $ \(ExceptT f) -> run f
    return $ Right a

  restoreM x = ExceptT $ restoreM x

control :: MonadIOControl m => (RunInIO m -> IO (StM m a)) -> m a
control f = do
  x <- liftIOWith f
  restoreM x

checkControl :: MonadIOControl m => m ()
checkControl = control $ \run -> do
  ref <- newIORef (0 :: Int)
  let ensureIs :: MonadIO m => Int -> m ()
      ensureIs expected = liftIO $ do
        putStrLn $ "ensureIs " ++ show expected
        curr <- atomicModifyIORef ref $ \curr -> (curr + 1, curr)
        unless (curr == expected) $ error $ show ("curr /= expected", curr, expected)

  ensureIs 0
  Control.Exception.mask $ \restore -> do
    ensureIs 1
    res <- restore (ensureIs 2 >> run (ensureIs 3) `finally` ensureIs 4)
    ensureIs 5
    return res

main :: IO ()
main = do
  runStateT checkControl () >>= print
  runWriterT checkControl >>= (print :: ((), ()) -> IO ())
  runReaderT checkControl ()
  runExceptT checkControl >>= (print :: Either () () -> IO ())

I encourage you to inspect each of the instances above and make sure you're comfortable with their implementation. I've added a function here, checkControl, as a basic sanity check of our implementation. We start with the control helper function, which runs some action with a RunInIO argument, and then restores the monadic state. Then we use this function in checkControl to ensure that a series of actions are all run in the correct order. As you can see, all of our test monads pass (again, foreshadowing).

The real monad-control package looks pretty similar to this, except:

  • Instead of MonadIOControl, which hard-codes IO as the base monad, it provides a MonadBaseControl typeclass, which allows arbitrary base monads (like ST or STM).
  • Just as MonadBaseControl is an analogue of MonadIO, the package provides MonadTransControl as an analogue of MonadTrans, allowing you to unwrap one layer in a monad stack.

With all of this exposition out of the way—likely the longest exposition I've ever written in any blog post—we can start dealing with the actual bug. I'll show you the full context eventually, but I was asked to help debug a function that looked something like this:

fileLen1 :: (MonadThrow m, MonadBaseControl IO m, MonadIO m)
         => FilePath
         -> m Int
fileLen1 fp = runResourceT
           $ runConduit
           $ sourceFile fp
          .| lengthCE

This is fairly common in Conduit code. We're going to use sourceFile, which needs to allocate some resources. Since we can't safely allocate resources from within a Conduit pipeline, we start off with runResourceT to allow Conduit to register cleanup actions. (This combination is so common that we have a helper function runConduitRes = runResourceT . runConduit.)

Unfortunately, this innocuous-looking line of code was generating an error message:

Control.Monad.Trans.Resource.register': The mutable state is being accessed after cleanup. Please contact the maintainers.

The "Please contact the maintainers." line should probably be removed from the resourcet package; it was from back in a time when we thought this bug was most likely to indicate an implementation bug within resourcet. That's no longer the case... which hopefully this debugging adventure will help demonstrate.

Anyway, as last week's blog post on ResourceT explained, runResourceT creates a mutable variable to hold a list of cleanup actions, allows the inner action to register cleanup values into that mutable variable, and then when runResourceT is exiting, it calls all those cleanup actions. And as a last sanity check, it replaces the value inside that mutable variable with a special value indicating that the state has already been closed, and it is therefore invalid to register further cleanup actions.
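That behavior can be sketched with a tiny self-contained analogue. The names here are hypothetical, and the real resourcet uses atomic operations, reference counting, and distinguishes exception from normal cleanup; this is just the shape described above:

```haskell
import Control.Exception (onException)
import Data.IORef

-- A sketch of the mutable cleanup state: either open (holding a list of
-- cleanup actions) or closed with a sentinel value.
data CleanupState = Open [IO ()] | Closed

registerSketch :: IORef CleanupState -> IO () -> IO ()
registerSketch ref cleanup = do
  st <- readIORef ref
  case st of
    Open cs -> writeIORef ref (Open (cleanup : cs))
    Closed  -> error "The mutable state is being accessed after cleanup."

runSketch :: (IORef CleanupState -> IO a) -> IO a
runSketch inner = do
  ref <- newIORef (Open [])
  res <- inner ref `onException` close ref
  close ref
  return res
  where
    close ref = do
      st <- readIORef ref
      writeIORef ref Closed        -- further registration is now invalid
      case st of
        Open cleanups -> sequence_ cleanups
        Closed        -> return ()

main :: IO ()
main = runSketch $ \ref -> do
  registerSketch ref (putStrLn "cleanup ran")
  putStrLn "doing work"
```

Calling registerSketch after runSketch has returned would trigger the same "accessed after cleanup" error discussed above.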

In well-behaved code, the structure of our runResourceT function should prevent the mutable state from being accessible after it's closed, though I mentioned some cases last week that could cause that to happen (specifically, misuse of concurrency and the transPipe function). However, after thoroughly exploring the codebase, I could find no indication that either of these common bugs had occurred.

Internally, runResourceT is essentially a bracket call, using the createInternalState function to allocate the mutable variable, and closeInternalState to clean it up. So I figured I could get a bit more information about this bug by using the bracket function from Control.Exception.Lifted and implementing:

fileLen2 :: (MonadThrow m, MonadBaseControl IO m, MonadIO m)
         => FilePath
         -> m Int
fileLen2 fp = Lifted.bracket
  createInternalState
  closeInternalState
  $ runInternalState
  $ runConduit
  $ sourceFile fp
 .| lengthCE

Much to my chagrin, the bug disappeared! Suddenly the code worked perfectly. Beginning to question my sanity, I decided to look at the implementation of runResourceT, and found this:

runResourceT :: MonadBaseControl IO m => ResourceT m a -> m a
runResourceT (ResourceT r) = control $ \run -> do
    istate <- createInternalState
    E.mask $ \restore -> do
        res <- restore (run (r istate)) `E.onException`
            stateCleanup ReleaseException istate
        stateCleanup ReleaseNormal istate
        return res

Ignoring the fact that we differentiate between exception and normal cleanup in the stateCleanup function, I was struck by one question: why did I decide to implement this with control in a manual, error-prone way instead of using the bracket function directly? I began to worry that there was a bug in this implementation leading to all of the problems.

However, after reading through this implementation many times, I convinced myself that it was, in fact, correct. And then I realized why I had done it this way. Both createInternalState and stateCleanup are functions that can live in IO directly, without any need of a monad transformer state. The only function that needed the monad transformer logic was that contained in the ResourceT itself.

If you remember our discussion above, there were two major advantages of the implementation of finally which relied upon IO for the cleanup function instead of using the monad transformer state:

  • It was much more explicit about how monadic state was going to be handled.
  • It gave a slight performance advantage.

With the downside being that the type signature wasn't quite what people normally expected. Well, that downside didn't apply in my case: I was working on an internal function in a library, so I was free to ignore what a user-friendly API would look like. The advantage of explicitness around monadic state certainly appealed in a library that was so sensitive to getting things right. And given how widely used this function is, and the deep monadic stacks it was sometimes used in, any performance advantage was worth pursuing.

Alright, I felt good about the fact that runResourceT was implemented correctly. Just to make sure I wasn't crazy, I reimplemented fileLen to use an explicit control instead of Lifted.bracket, and the bug reappeared:

-- I'm ignoring async exception safety. This needs mask.
fileLen3 :: forall m.
            (MonadThrow m, MonadBaseControl IO m, MonadIO m)
         => FilePath
         -> m Int
fileLen3 fp = control $ \run -> do
  istate <- createInternalState
  res <- run (runInternalState inner istate)
          `onException` closeInternalState istate
  closeInternalState istate
  return res
  where
    inner :: ResourceT m Int
    inner = runConduit $ sourceFile fp .| lengthCE

And as one final sanity check, I implemented fileLen4 to use the generalized style of bracket, where the allocation and cleanup functions live in the monad stack instead of just IO, and as expected the bug disappeared again. (Actually, I didn't really do this. I'm doing it now for the purpose of this blog post.)

fileLen4 :: forall m.
            (MonadThrow m, MonadBaseControl IO m, MonadIO m)
         => FilePath
         -> m Int
fileLen4 fp = control $ \run -> bracket
  (run createInternalState)
  (\st -> run $ restoreM st >>= closeInternalState)
  (\st -> run $ restoreM st >>= runInternalState inner)
  where
    inner :: ResourceT m Int
    inner = runConduit $ sourceFile fp .| lengthCE

Whew, OK! So it turns out that my blog post title was correct: this is a tale of two brackets. And somehow, one of them triggers a bug, and one of them doesn't. But I still didn't know quite how that happened.

The culprit

Another member of the team tracked down the ultimate problem to a datatype that looked like this (though not actually named Bad, that would have been too obvious):

newtype Bad a = Bad { runBad :: IO a }
  deriving (Functor, Applicative, Monad, MonadIO, MonadThrow, MonadBase IO)
instance MonadBaseControl IO Bad where
  type StM Bad a = IO a

  liftBaseWith withRun = Bad $ withRun $ return . runBad
  restoreM = Bad

That's the kind of code that can easily pass a code review without anyone noticing a thing. With all of the context from this blog post, you may be able to understand why I've called this type Bad. Go ahead and give it a few moments to try and figure it out.

OK, ready to see how this plays out? The StM Bad a associated type is supposed to contain the result value of the underlying monad, together with any state introduced by this monad. Since we just have a newtype around IO, there should be no monadic state, and we should just have a. However, we've actually defined it as IO a, which means "my monadic state for a value a is an IO action which will return an a." The implementations of liftBaseWith and restoreM simply follow from making the types work out.

Let's look at fileLen3 understanding that this is the instance in question. I'm also going to expand the control function to make it easier to see what's happening.

res <- liftBaseWith $ \run -> do
  istate <- createInternalState
  res <- run (runInternalState inner istate)
          `onException` closeInternalState istate
  closeInternalState istate
  return res
restoreM res

If we play it a little loose with newtype wrappers, we can substitute in the implementations of liftBaseWith and restoreM to get:

res <- Bad $ do
  let run = return . runBad
  istate <- createInternalState
  res <- run (runInternalState inner istate)
          `onException` closeInternalState istate
  closeInternalState istate
  return res
Bad res

Let's go ahead and substitute in our run function in the one place it's used:

res <- Bad $ do
  istate <- createInternalState
  res <- return (runBad (runInternalState inner istate))
          `onException` closeInternalState istate
  closeInternalState istate
  return res
Bad res

If you look at the code return x `onException` foo, it's pretty easy to establish that return itself will never throw an exception in IO, and therefore the onException is useless. In other words, the code is equivalent to just return x. So again substituting:

res <- Bad $ do
  istate <- createInternalState
  res <- return (runBad (runInternalState inner istate))
  closeInternalState istate
  return res
Bad res
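Incidentally, the claim that return x `onException` foo can never run foo is easy to check directly with a tiny base-only program:

```haskell
import Control.Exception (onException)

main :: IO ()
main = do
  -- return never throws in IO, so the exception handler can never fire
  x <- return (42 :: Int) `onException` putStrLn "never printed"
  print x
```

This prints only 42; the handler is dead code.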

And since foo <- return x is just let foo = x, we can turn this into:

res <- Bad $ do
  istate <- createInternalState
  closeInternalState istate
  return (runBad (runInternalState inner istate))
Bad res

And then:

Bad $ do
  istate <- createInternalState
  closeInternalState istate
Bad (runBad (runInternalState inner istate))

And finally, just to drive the point home:

istate <- Bad createInternalState
Bad $ closeInternalState istate
runInternalState inner istate

So who wants to take a guess why the mutable variable was closed before we ever tried to register? Because that's exactly what our MonadBaseControl instance said! The problem is that instead of our monadic state just being some value, it was the entire action we needed to run, which was now being deferred until after we called closeInternalState. Oops.
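The mechanism is easy to see in isolation. Here is a tiny analogue (not the real instance; run here just plays the role of return . runBad): it wraps the action in return instead of executing it, so the real work is deferred until after the cleanup:

```haskell
main :: IO ()
main = do
  let run = return               -- Bad-style run: wrap the action, don't execute it
  deferred <- run (putStrLn "register (state already closed!)")
  putStrLn "closeInternalState"  -- the cleanup happens first...
  deferred                       -- ...and the real work only happens now
```

The output order shows the registration happening after the state was closed, exactly the error we saw.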

What about the other bracket?

Now let's try to understand why fileLen4 worked, despite the broken MonadBaseControl instance. Again, starting with the original code after replacing control with liftBaseWith and restoreM:

res <- liftBaseWith $ \run -> bracket
  (run createInternalState)
  (\st -> run $ restoreM st >>= closeInternalState)
  (\st -> run $ restoreM st >>= runInternalState inner)
restoreM res

This turns into:

res <- Bad $ bracket
  (return $ runBad createInternalState)
  (\st -> return $ runBad $ Bad st >>= closeInternalState)
  (\st -> return $ runBad $ Bad st >>= runInternalState inner)
Bad res

Since this case is a bit more involved than the previous one, let's strip off the noise of Bad and runBad calls, since they're just wrapping/unwrapping a newtype:

res <- bracket
  (return createInternalState)
  (\st -> return $ st >>= closeInternalState)
  (\st -> return $ st >>= runInternalState inner)

To decompose this mess, let's look at the actual implementation of bracket from base:

bracket before after thing =
  mask $ \restore -> do
    a <- before
    r <- restore (thing a) `onException` after a
    _ <- after a
    return r

We're going to ignore async exceptions for now, and therefore just mentally delete the mask $ \restore bit. We end up with:

res <- do
  a <- return createInternalState
  r <- return (a >>= runInternalState inner) `onException`
    return (a >>= closeInternalState)
  _ <- return (a >>= closeInternalState)
  return r

As above, we know that our return x `onException` foo will never actually trigger the exception case. Also, a <- return x is the same as let a = x. So we can simplify to:

res <- do
  let a = createInternalState
  let r = a >>= runInternalState inner
  _ <- return (a >>= closeInternalState)
  return r

Also, _ <- return x has absolutely no impact at all, so we can delete that line (and any mention of closeInternalState):

res <- do
  let a = createInternalState
  let r = a >>= runInternalState inner
  return r

And then with a few more simple conversions, we end up with:

createInternalState >>= runInternalState inner

No wonder this code "worked": it never bothered trying to clean up! This could easily have led to wholesale leaking of resources in the application. It was only because our runResourceT function happened to stress the code in a different way that the problem was revealed at all.

What's the right instance?

It's certainly possible to define a correct newtype wrapper around IO:

newtype Good a = Good { runGood :: IO a }
  deriving (Functor, Applicative, Monad, MonadIO, MonadThrow, MonadBase IO)
instance MonadBaseControl IO Good where
  type StM Good a = a

  liftBaseWith withRun = Good $ withRun runGood
  restoreM = Good . return

Unfortunately we can't simply use GeneralizedNewtypeDeriving to make this instance due to the associated type family. But the explicitness here helps us understand what we did wrong before. Note that our type StM Good a is just a, not IO a. We then implement the helper functions in terms of that. If you go through the same substitution exercise I did above, you'll see that—instead of passing around values which contain the actions to actually perform—our fileLen3 and fileLen4 functions will be performing the actions at the appropriate time.
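With StM Good a = a, the run function must actually execute the action in order to produce a plain result value. A tiny analogue (not the real instance; run here plays the role of runGood) makes the contrast concrete:

```haskell
main :: IO ()
main = do
  let run = id                   -- Good-style run: actually execute the action
  a <- run (putStrLn "register (state still open)" >> return (1 :: Int))
  putStrLn "closeInternalState"  -- cleanup correctly happens afterwards
  print a
```

Here the registration runs before the cleanup, as it should.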

I'm including the full test program at the end of this post for you to play with.


So that blog post was certainly all over the place. I hope the primary thing you take away from it is a deeper understanding of how monad transformer stacks interact with operations in the base monad, and how monad-control works in general. In particular, next time you call finally on some five-layer-deep stack, maybe you'll think twice about the implication of calling modify or tell in your cleanup function.
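To make that finally warning concrete, here is a hand-rolled sketch (finallyDiscard is a hypothetical helper, transformers only) of the behavior a monad-control-lifted finally gives you on StateT: the cleanup runs, but its state modifications are thrown away.

```haskell
import Control.Exception (onException)
import Control.Monad.Trans.State.Strict

-- Run an action, then a cleanup, but discard the cleanup's final state,
-- mirroring how a lifted finally treats monadic state.
finallyDiscard :: StateT Int IO a -> StateT Int IO () -> StateT Int IO a
finallyDiscard action cleanup = StateT $ \s0 -> do
  res@(_, s1) <- runStateT action s0 `onException` runStateT cleanup s0
  _ <- runStateT cleanup s1  -- the cleanup runs, but its state is discarded
  return res

main :: IO ()
main = do
  (_, s) <- runStateT (modify (+ 1) `finallyDiscard` modify (+ 100)) 0
  print s  -- the +100 from the cleanup never shows up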

Another possible takeaway you may have is "Haskell's crazy complicated, this bug could happen to anyone, and it's almost undetectable." It turns out that there's a really simple workaround for that: stick to standard monad transformers whenever possible. monad-control is a phenomenal library, but I don't think most people should ever have to interact with it directly. Like async exceptions and unsafePerformIO, there are parts of our library ecosystem that require them, but you should stick to higher-level libraries that hide that insanity from you, the same way we use higher-level languages to avoid having to write assembly.

Finally, having to think about all of the monadic state stuff in my code gives me a headache. It's possible to have a library like lifted-base, but one that constrains functions to take only one argument in the m monad, with the rest in IO, to avoid the multiple-state problems. However, my preferred solution is to avoid, wherever possible, monad transformers that introduce monadic state, and stick to ReaderT-like things for the majority of my application. (Yes, this is another pitch for my ReaderT design pattern.)

Full final source code

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Monad.Trans.Control
import Control.Monad.Trans.Resource
import Control.Exception.Safe
import qualified Control.Exception.Lifted as Lifted
import Conduit

newtype Bad a = Bad { runBad :: IO a }
  deriving (Functor, Applicative, Monad, MonadIO, MonadThrow, MonadBase IO)
instance MonadBaseControl IO Bad where
  type StM Bad a = IO a

  liftBaseWith withRun = Bad $ withRun $ return . runBad
  restoreM = Bad

newtype Good a = Good { runGood :: IO a }
  deriving (Functor, Applicative, Monad, MonadIO, MonadThrow, MonadBase IO)
instance MonadBaseControl IO Good where
  type StM Good a = a

  liftBaseWith withRun = Good $ withRun runGood
  restoreM = Good . return

fileLen1 :: (MonadThrow m, MonadBaseControl IO m, MonadIO m)
         => FilePath
         -> m Int
fileLen1 fp = runResourceT
           $ runConduit
           $ sourceFile fp
          .| lengthCE

fileLen2 :: (MonadThrow m, MonadBaseControl IO m, MonadIO m)
         => FilePath
         -> m Int
fileLen2 fp = Lifted.bracket
  createInternalState
  closeInternalState
  $ runInternalState
  $ runConduit
  $ sourceFile fp
 .| lengthCE

-- I'm ignoring async exception safety. This needs mask.
fileLen3 :: forall m.
            (MonadThrow m, MonadBaseControl IO m, MonadIO m)
         => FilePath
         -> m Int
fileLen3 fp = control $ \run -> do
  istate <- createInternalState
  res <- run (runInternalState inner istate)
          `onException` closeInternalState istate
  closeInternalState istate
  return res
  where
    inner :: ResourceT m Int
    inner = runConduit $ sourceFile fp .| lengthCE

fileLen4 :: forall m.
            (MonadThrow m, MonadBaseControl IO m, MonadIO m)
         => FilePath
         -> m Int
fileLen4 fp = control $ \run -> bracket
  (run createInternalState)
  (\st -> run $ restoreM st >>= closeInternalState)
  (\st -> run $ restoreM st >>= runInternalState inner)
  where
    inner :: ResourceT m Int
    inner = runConduit $ sourceFile fp .| lengthCE

main :: IO ()
main = do
  putStrLn "fileLen1"
  tryAny (fileLen1 "/usr/share/dict/words") >>= print
  tryAny (runBad (fileLen1 "/usr/share/dict/words")) >>= print
  tryAny (runGood (fileLen1 "/usr/share/dict/words")) >>= print

  putStrLn "fileLen2"
  tryAny (fileLen2 "/usr/share/dict/words") >>= print
  tryAny (runBad (fileLen2 "/usr/share/dict/words")) >>= print
  tryAny (runGood (fileLen2 "/usr/share/dict/words")) >>= print

  putStrLn "fileLen3"
  tryAny (fileLen3 "/usr/share/dict/words") >>= print
  tryAny (runBad (fileLen3 "/usr/share/dict/words")) >>= print
  tryAny (runGood (fileLen3 "/usr/share/dict/words")) >>= print

  putStrLn "fileLen4"
  tryAny (fileLen4 "/usr/share/dict/words") >>= print
  tryAny (runBad (fileLen4 "/usr/share/dict/words")) >>= print
  tryAny (runGood (fileLen4 "/usr/share/dict/words")) >>= print

Bonus exercise Take the checkControl function I provided above, and use it in the Good and Bad monads. See what the result is, and whether you can understand why that's the case.

June 26, 2017 08:52 AM

June 25, 2017

Philip Wadler

PLDI and PACMPL - have your say!

Proceedings of the ACM on Programming Languages (PACMPL) is a new, open-access journal that will archive the results of major programming language conferences sponsored by SIGPLAN and ACM. So far, ICFP, OOPSLA, and POPL have signed on. There is, to my surprise, a raging debate as to whether PLDI should do so. The issues are blogged here, and there is a survey here.

As Editor-in-Chief of PACMPL, I may be prejudiced, but it seems to me the case for PLDI to join is a no-brainer.  Programming languages are unusual in a heavy reliance on conferences over journals. In many universities and to many national funding bodies, journal publications are the only ones that count. Other fields within computing are sorting this out by moving to journals; we should too. Journals cover a wide range of different publications, and our better conferences sit toward the high-quality end of this range. ICFP, OOPSLA, and POPL were all enthusiastic to join; is PLDI that different?

Becoming a journal requires a slight change to procedure: an extra round for referees to ensure necessary changes have been made. The extra round increases reliability of our archival publication—good, as we don't want to build our field on sand!—and may permit the PC to be more adventurous in accepting borderline papers.

Most importantly, all papers in PACMPL will be open access, thanks to generous underwriting by SIGPLAN. The price ACM is charging is too high, and we will continue to press them to reduce it. But it is only by going to open access that SIGPLAN can survive—the alternative is that our conferences, including PLDI, will wither, to be replaced by others that are open access.

I urge you to fill out the survey, as it is your opinion that could tilt the balance. Though the survey is non-binding, it will powerfully influence the PLDI Steering Committee when they vote on the issue next month. It just takes a minute, do it now!

by Philip Wadler at June 25, 2017 01:42 PM

DSLDI 2017

DSLDI 2017, colocated with SPLASH in Vancouver, October 2017.
Please submit to
DSLDI is a single-day workshop and will consist of an invited speaker followed by moderated audience discussions structured around a series of short talks. The role of the talks is to facilitate interesting and substantive discussion. Therefore, we welcome and encourage talks that express strong opinions, describe open problems, propose new research directions, and report on early research in progress.
Proposed talks should be on topics within DSLDI’s area of interest, which include but are not limited to:
  • solicitation and representation of domain knowledge
  • DSL design principles and processes
  • DSL implementation techniques and language workbenches
  • domain-specific optimizations
  • human factors of DSLs
  • tool support for DSL users
  • community and educational support for DSL users
  • applications of DSLs to existing and emerging domains
  • studies of usability, performance, or other benefits of DSLs
  • experience reports of DSLs deployed in practice

by Philip Wadler at June 25, 2017 12:59 PM

June 23, 2017

Joachim Breitner

The perils of live demonstrations

Yesterday, I was giving a talk at The South SF Bay Haskell User Group about how implementing lock-step simulation is trivial in Haskell and how Chris Smith and I are using this to make CodeWorld even more attractive to students. I had given the talk before, at Compose::Conference in New York City earlier this year, so I felt well prepared. On the flight to the West Coast I slightly extended the slides, and as I was too cheap to buy in-flight WiFi, I tested them only locally.

So I arrived at the offices of Target1 in Sunnyvale, got on the WiFi, uploaded my slides, which are in fact one large interactive CodeWorld program, and tried to run it. But I got a type error…

Turns out that the API of CodeWorld was changed just the day before:

commit 054c811b494746ec7304c3d495675046727ab114
Author: Chris Smith <>
Date:   Wed Jun 21 23:53:53 2017 +0000

    Change dilated to take one parameter.
    Function is nearly unused, so I'm not concerned about breakage.
    This new version better aligns with standard educational usage,
    in which "dilation" means uniform scaling.  Taken as a separate
    operation, it commutes with rotation, and preserves similarity
    of shapes, neither of which is true of scaling in general.

Ok, that was quick to fix, and the CodeWorld server started to compile my code, and compiled, and aborted. It turned out that my program, presumably the largest CodeWorld interaction out there, hit the time limit of the compiler.

Luckily, Chris Smith just arrived at the venue, and he emergency-bumped the compiler time limit. The program compiled and I could start my presentation.

Unfortunately, the biggest blunder was still awaiting me. I came to the slide where two instances of pong are played over a simulated network, and my point was that the two instances are perfectly in sync. Unfortunately, they were not. I guess it did support my point that lock-step simulation can easily go wrong, but it really left me out in the rain there, and I could not explain it – I had not modified this code since New York, and there it worked flawlessly2. In the end, I could save face a bit by running the real pong game against an attendee over the network, and no desynchronisation could be observed there.

Today I dug into it, and it took me a while, but it turned out that the problem was not in CodeWorld, or in the lock-step simulation code discussed in our paper about it, but in the code in my presentation that simulated the delayed network messages; in some instances it would deliver the UI events in a different order to the two simulated players, and hence cause them to do something different. Phew.

  1. Yes, the retail giant. Turns out that they have a small but enthusiastic Haskell-using group in their IT department.

  2. I hope the video is going to be online soon, then you can check for yourself.

by Joachim Breitner at June 23, 2017 11:54 PM

wren gayle romano

"Spring semester" in review

Hi all, long time no post. A lot has been going on, but I’m finally starting to get on top of things again. I’ve been meaning to write in a bit more depth about some of this, but that want for perfection has been the enemy of writing anything at all. So, here’s a quick synopsis of what’s been going on in my neck of the woods.

Both of L’s parents passed away. We’ve known this was coming, but it’s still hard of course. L was out there for a bit over a month taking care of her mom. They died very close together, so we ended up having a single combined service. I was out there for about a week helping to wrap things up before whisking L back home.

I finally got back the results of the genetics test. Turns out I don’t have Loeys–Dietz, or at least not the same genetic variant my mother did. But I definitely have something. So it’s back to the diagnostic swamp trying to figure out how to give it a name so that doctors’ll take it seriously. Current working hypothesis is hypermobility-type Ehlers–Danlos. Alas, “hypermobility-type” is medical jargon for “we have no idea what this is, but it kinda looks similar to the forms of Ehlers–Danlos we do know stuff about, so let’s call it that.” So, yeah, no medical tests to “prove” that’s what it is; just your usual game of convincing folks you have enough of the symptoms to match the syndrome.

I’ve been getting used to paying attention to my ADHD and working with it rather than trying to plow through it. It helps a lot to recognize that it’s not a failing on my part (e.g., that I can’t focus on boring things for as long as other people) but rather just part of how I’m wired. That makes it a lot easier to stop beating myself up over things, and instead figure out better ways to work with my brain rather than trying to force it into a shape it won’t take. As I’ve gotten better at this I’ve finally started getting caught up on a bunch of things that’ve fallen to the wayside over the past few years.

For example, I’m slowly getting caught up on the backlog of bug reports and feature requests for my various Haskell packages. Mostly been focusing on logfloat and unification-fd so far, but will make it around to the others in time. So, if you sent me an email about some bug or feature over the past few years and it seems to have fallen into the void, consider filing a ticket.

Still working on getting caught up to where I should be on my dissertation.

Work has also been going excellently. It’s all seekrit and nonsense, so I can’t say too much about it. But lately I’ve been doing a bunch of work on characterizing families of mathematical objects, and discovering their symmetries so we can exploit them to simplify and optimize things. So lots of mathy goodness going on. It’s a bit more geometric and combinatorial than my usual algebraic fare, but it’s the sort of stuff that arises from algebraic structures so it’s not too far from home base. (If that doesn’t make sense to you, maybe take a look at Brent Yorgey’s thesis to see an example of the connection between combinatorics and algebraic data types.) Plus, it helps that I’ve been getting to know some of the hella queer ladies who work in my building :)

In other health-y news, round about the time I got officially diagnosed with ADHD I had a bunch of friends going on about what the symptoms of allism (aka non-autism) are. Though I have a bunch of autistic friends, I’ve never really known much about what autism’s really like because all the literature is written by allistic folks, for allistic folks, so they’re all “patient has underdeveloped/insufficient blah” and I’m like “according to what baseline? How much blah does it take to count as having ‘sufficient’ blah? What are diagnostic details for measuring how much blah you really have?” So I finally got to hear some details from the autistic side of the fence, where people actually explain shit and elucidate the differences. And based on that: I’m hella not allistic. I can (and should! and have been meaning to!) write a whole separate post on this topic. I’m still not entirely sure I feel comfortable adopting “autistic” label (for reasons which are, themselves, further symptoms of autism), because my experiences don’t match up perfectly with some of the parts of what is traditionally called “autism”, but I’m absolutely non-allistic. I think the spectrum of non-allism is far larger and more diverse than allistic people currently believe, but —again— a post for another time.

June 23, 2017 05:37 AM

June 22, 2017

Philip Wadler


Please submit to RADICAL 2017, Recent Advances in Concurrency and Logic, a workshop co-located with QONFEST (CONCUR, QEST, FORMATS, and EPEW), Berlin (Germany), September 4, 2017.
As you know, submissions to RADICAL could be, for instance:

  • reports of ongoing work and/or preliminary results;
  • summaries of an already published paper (even at CONCUR'17 - see below);
  • overviews of (recent) PhD theses;
  • descriptions of research projects and consortia;
  • manifestos, calls to action, personal views on current and future challenges;
  • overviews of interesting yet underrepresented problems.
Many thanks for your cooperation!

Julian and Jorge

by Philip Wadler ( at June 22, 2017 02:26 PM

June 21, 2017

Keegan McAllister

A Rust view on Effective Modern C++

Recently I've been reading Effective Modern C++ by Scott Meyers. It's a great book that contains tons of practical advice, as well as horror stories to astound your friends and confuse your enemies. Since Rust shares many core ideas with modern C++, I thought I'd describe how some of the C++ advice translates to Rust, or doesn't.

This is not a general-purpose Rust / C++ comparison. Honestly, it might not make a lot of sense if you haven't read the book I'm referencing. There are a number of C++ features missing in Rust, for example integer template arguments and advanced template metaprogramming. I'll say no more about those because they aren't new to modern C++.

I may have a clear bias here because I think Rust is a better language for most new development. However, I massively respect the effort the C++ designers have put into modernizing the language, and I think it's still the best choice for many tasks.

There's a common theme that I'll avoid repeating: most of the C++ pitfalls that result in undefined behavior will produce compiler or occasionally runtime errors in Rust.

Chapters 1 & 2: Deducing Types / auto

This is what Rust and many other languages call "type inference". C++ has always had it for calls to function templates, but it became much more powerful in C++11 with the auto keyword.

Rust's type inference seems to be a lot simpler. I think the biggest reason is that Rust treats references as just another type, rather than the weird quasi-transparent things that they are in C++. Also, Rust doesn't require the auto keyword — whenever you want type inference, you just don't write the type. Rust also lacks std::initializer_list, which simplifies the rules further.

The main disadvantage in Rust is that there's no support to infer return types for fn functions, only for lambdas. Mostly I think it's good style to write out those types anyway; GHC Haskell warns when you don't. But it does mean that returning a closure without boxing is impossible, and returning a complex iterator chain without boxing is extremely painful. Rust is starting to improve the situation with -> impl Trait.
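As a quick illustrative sketch (my example, not from the book): `-> impl Trait` lets you return a closure or an iterator chain by value, without naming the unnameable concrete type and without boxing.

```rust
// Return a closure without boxing: the concrete closure type can't be
// written down, but `impl Fn` lets the caller use it anyway.
fn adder(n: i32) -> impl Fn(i32) -> i32 {
    move |x| x + n
}

// The same trick for a complex iterator chain.
fn evens() -> impl Iterator<Item = i32> {
    (0..).filter(|x| x % 2 == 0)
}

fn main() {
    let add5 = adder(5);
    assert_eq!(add5(2), 7);

    let first: Vec<i32> = evens().take(3).collect();
    assert_eq!(first, vec![0, 2, 4]);
}
```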

Rust lacks decltype and this is certainly a limitation. Some of the uses of decltype are covered by trait associated types. For example,

template<typename Container, typename Index>
auto get(Container& c, Index i)
-> decltype(c[i])
{ … }


fn get<Container, Index, Output>(c: &Container, i: Index) -> &Output
where Container: ops::Index<Index, Output=Output>
{ … }

The advice to see inferred types by intentionally producing a type error applies equally well in Rust.

Chapter 3: Moving to Modern C++

Initializing values in Rust is much simpler. Constructors are just static methods named by convention, and they take arguments in the ordinary way. For good or for ill, there's no std::initializer_list.

nullptr is not an issue in Rust. &T and &mut T can't be null, and you can make null raw pointers with ptr::null() or ptr::null_mut(). There are no implicit conversions between pointers and integral types.
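To make that concrete, a small sketch (illustrative code): "maybe missing" is spelled `Option<&T>`, and null only exists for raw pointers, constructed explicitly.

```rust
use std::ptr;

fn main() {
    // A possibly-absent reference is an Option, never a null reference.
    let x = 5;
    let maybe: Option<&i32> = Some(&x);
    assert_eq!(maybe.copied(), Some(5));

    // Null raw pointers must be asked for by name.
    let p: *const i32 = ptr::null();
    assert!(p.is_null());
}
```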

Regarding aliases vs. typedefs, Rust also supports two syntaxes:

use foo::Bar as Baz;
type Baz = foo::Bar;

type is a lot more common, and it supports type parameters.

Rust enums are always strongly typed. They are scoped unless you explicitly use MyEnum::*;. A C-like enum (one with no data fields) can be cast to an integral type.
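For example (`Color` is an invented name for illustration):

```rust
#[derive(Debug, PartialEq)]
enum Color {
    Red,
    Green,
    Blue,
}

fn main() {
    // Variants live in the enum's namespace unless you `use Color::*;`.
    let c = Color::Green;
    assert_eq!(c, Color::Green);

    // A data-free enum casts to an integer explicitly, never implicitly.
    assert_eq!(Color::Blue as i32, 2);
}
```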

f() = delete; has no equivalent in Rust, because Rust doesn't implicitly define functions for you in the first place.

Similar to the C++ override keyword, Rust requires a default keyword to enable trait specialization. Unlike in C++, it's mandatory.

As in C++, Rust methods can be declared to take self either by reference or by move. Unlike in C++, you can't easily overload the same method to allow either.

Rust supports const iterators smoothly. It's up to the iterator whether it yields T, &T, or &mut T (or even something else entirely).

The IntoIterator trait takes the place of functions like std::begin that produce an iterator from any collection.
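A brief sketch of how this plays out in practice (my example): the item type follows from how you hand the collection to `IntoIterator`.

```rust
fn main() {
    let v = vec![1, 2, 3];

    // Iterating a borrow yields &i32, leaving `v` usable afterwards.
    let sum: i32 = (&v).into_iter().sum();
    assert_eq!(sum, 6);

    // Iterating by value yields i32 and consumes the Vec.
    let doubled: Vec<i32> = v.into_iter().map(|x| x * 2).collect();
    assert_eq!(doubled, vec![2, 4, 6]);
}
```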

Rust has no equivalent to noexcept. Any function can panic, unless panics are disabled globally. This is pretty unfortunate when writing unsafe code to implement data types that have to be exception-safe. However, recoverable errors in Rust use Result, which is part of the function's type.

Rust supports a limited form of compile-time evaluation, but it's not yet nearly as powerful as C++14 constexpr. This is set to improve with the introduction of miri.

In Rust you mostly don't have to worry about "making const member functions thread safe". If something is shared between threads, the compiler will ensure it's free of thread-related undefined behavior. (This to me is one of the coolest features of Rust!) However, you might run into higher-level issues such as deadlocks that Rust's type system can't prevent.

There are no special member functions in Rust, e.g. copy constructors. If you want your type to be Clone or Copy, you have to opt-in with a derive or a manual impl.

Chapter 4: Smart Pointers

Smart pointers are very important in Rust, as in modern C++. Much of the advice in this chapter applies directly to Rust.

std::unique_ptr corresponds directly to Rust's Box type. However, Box doesn't support custom deallocation code. If you need that, you have to either make it part of impl Drop on the underlying type, or write your own smart pointer. Box also does not support custom allocators.

std::shared_ptr corresponds to Rust's Arc type. Both provide thread-safe reference counting. Rust also supports much faster thread-local refcounting with the Rc type. Don't worry, the compiler will complain if you try to send an Rc between threads.
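A minimal sketch of the `Rc` API (illustrative code): cloning bumps the refcount rather than copying the data.

```rust
use std::rc::Rc;

fn main() {
    let a = Rc::new(vec![1, 2, 3]);
    let b = Rc::clone(&a); // bump the count, share the allocation
    assert_eq!(Rc::strong_count(&a), 2);

    drop(b);
    assert_eq!(Rc::strong_count(&a), 1);

    // For cross-thread sharing, swap Rc for std::sync::Arc; trying to
    // send an Rc to another thread is a compile error (Rc is !Send).
}
```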

C++ standard libraries usually implement shared_ptr as a "fat pointer" containing both a pointer to the underlying value and a pointer to a refcount struct. Rust's Rc and Arc store the refcounts directly before the value in memory. This means that Rc and Arc are half the size of shared_ptr, and may perform better due to fewer indirections. On the downside, it means you can't upgrade Box to Rc/Arc without a reallocation and copy. It could also introduce performance problems on certain workloads, due to cache line sharing between the refcounts and the data. (I would love to hear from anyone who has run into this!) Boost supports intrusive_ptr which should perform very similarly to Rust's Arc.

Like Box, Rc and Arc don't support custom deleters or allocators.

Rust supports weak pointer variants of both Rc and Arc. Rather than panicking or returning NULL, the "upgrade" operation returns None, as you'd expect in Rust.
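For instance (a small sketch of my own):

```rust
use std::rc::Rc;

fn main() {
    let strong = Rc::new(42);
    let weak = Rc::downgrade(&strong);

    // While a strong reference exists, upgrade yields Some(Rc<_>).
    assert_eq!(weak.upgrade().map(|r| *r), Some(42));

    // Once the last strong reference is gone, upgrade yields None.
    drop(strong);
    assert!(weak.upgrade().is_none());
}
```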

Chapter 5: Rvalue References, Move Semantics, and Perfect Forwarding

This is a big one. Move semantics are rare among programming languages, but they're key in both Rust and C++. However, the two languages take very different approaches, owing to the fact that Rust was designed around moves whereas they're a late addition to C++.

There's no std::move in Rust. Moves are the default for non-Copy types. The behavior of a move or copy is always a shallow bit-wise copy; there is no way to override it. This can greatly improve performance. For example, when a Rust Vec changes address due to resizing, it will use a highly optimized memcpy. In comparison, C++'s std::vector has to call the move constructor on every element, or the copy constructor if there's no noexcept move constructor.

However the inability to hook moves and the difficulty of creating immovable types is an obstacle for certain kinds of advanced memory management, such as intrusive pointers and interacting with external garbage collectors.

Moves in C++ leave the source value in an unspecified but valid state — for example, an empty vector or a NULL unique pointer. This has several weird consequences:

  • A move counts as mutating a source variable, so "Move requests on const objects are silently transformed into copy operations". This is a surprising performance leak.
  • The moved-out-of variable can still be used after the move, and you don't necessarily know what you'll get.
  • The destructor will still run and must take care not to invoke undefined behavior.

The first two points don't apply in Rust. You can move out of a non-mut variable. The value isn't considered mutated, it's considered gone. And the compiler will complain if you try to use it after the move.
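A tiny sketch of both points (illustrative code; the commented line shows the compile error you'd get):

```rust
fn main() {
    // `s` isn't declared `mut`, yet moving out of it is fine:
    // the value isn't mutated, it's gone.
    let s = String::from("hello");
    let t = s; // shallow bitwise copy of the String's (ptr, len, cap)

    // println!("{}", s); // error[E0382]: borrow of moved value: `s`

    assert_eq!(t, "hello");
}
```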

The third point is somewhat similar to old Rust, where types with a destructor would contain an implicit "drop flag" indicating whether they had already been moved from. As of Rust 1.12 (September 2016), these hidden struct fields are gone, and good riddance! If a variable has been moved from, the compiler simply omits a call to its destructor. In the situations where a value may or may not have been moved (e.g. move in an if branch), Rust uses local variables on the stack.

Rust doesn't have a feature for perfect forwarding. There's no need to treat references specially, as they're just another type. Because there are no rvalue references in Rust, there's also no need for universal / forwarding references, and no std::forward.

However, Rust lacks variadic generics, so you can't do things like "factory function that forwards all arguments to constructor".

Item 29 says "Assume that move operations are not present, not cheap, and not used". I find this quite dispiriting! There are so many ways in C++ to think that you're moving a value when you're actually calling an expensive copy constructor — and compilers won't even warn you!

In Rust, moves are always available, always as cheap as memcpy, and always used when passing by value. Copy types don't have move semantics, but they act the same at runtime. The only difference is whether the static checks allow you to use the source location afterwards.

All in all, moves in Rust are more ergonomic and less surprising. Rust's treatment of moves should also perform better, because there's no need to leave the source object in a valid state, and there's no need to call move constructors on individual elements of a collection. (But can we benchmark this?)

There's a bunch of other stuff in this chapter that doesn't apply to Rust. For example, "The interaction among perfect-forwarding constructors and compiler-generated copy and move operations develops even more wrinkles when inheritance enters the picture." This is the kind of sentence that will make me run away screaming. Rust doesn't have any of those features, gets by fine without them, and thus avoids such bizarre interactions.

Chapter 6: Lambda Expressions

C++ allows closures to be copied; Rust doesn't.

In C++ you can specify whether a lambda expression's captures are taken into the closure by reference or by value, either individually or for all captures at once. In Rust this is mostly inferred by how you use the captures: whether they are mutated, and whether they are moved from. However, you can prefix the move keyword to force all captures to be taken by value. This is useful when the closure itself will outlive its environment, common when spawning threads for example.

Rust uses this inference for another purpose: determining which Fn* traits a closure will implement. If the lambda body moves out of a capture, it can only implement FnOnce, whose "call" operator takes self by value. If it doesn't move but does mutate captures, it will implement FnOnce and FnMut, whose "call" takes &mut self. And if it neither moves nor mutates, it will implement all of FnOnce, FnMut, and Fn. C++ doesn't have traits (yet) and doesn't distinguish these cases. If your lambda moves from a capture, you can call it again and you'll see whatever "empty" value was left behind by the move constructor.
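Here's an illustrative sketch (my example, not Meyers'): three closures, each implementing a different subset of the Fn* traits, accepted by bounds that demand exactly what each closure can do.

```rust
fn call_fn<F: Fn() -> usize>(f: F) -> usize { f() }
fn call_mut<F: FnMut() -> usize>(mut f: F) -> usize { f() }
fn call_once<F: FnOnce() -> String>(f: F) -> String { f() }

fn main() {
    let name = String::from("world");

    // Only reads its capture: implements Fn (and FnMut, FnOnce).
    let read = || name.len();
    assert_eq!(call_fn(read), 5);

    // Mutates a capture: implements FnMut (and FnOnce), but not Fn.
    let mut count = 0;
    let bump = || { count += 1; count };
    assert_eq!(call_mut(bump), 1);

    // Moves out of a capture: implements FnOnce only.
    let consume = move || name;
    assert_eq!(call_once(consume), "world");
}
```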

Rust doesn't support init capture; however, move capture is supported natively. You can do whatever init you like outside the lambda and then move the result in.

Like C++, Rust allows inference of closure parameter types. Unlike C++, an individual closure cannot be generic.

Chapter 7: The Concurrency API

Rust doesn't have futures in the standard library; they're part of an external library maintained by a core Rust developer. They're also used for async I/O.

In C++, dropping a std::thread that is still running terminates the program, which certainly seems un-fun to me. The behavior is justified by the possibility that the thread captures by reference something from its spawning context. If the thread then outlived that context, it would result in undefined behavior. In Rust, this can't happen because thread::spawn(f) has a 'static bound on the type of f. So, when a Rust JoinHandle falls out of scope, the thread is safely detached and continues to run.
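A short sketch of that contract (illustrative code): the `move` closure satisfies the `'static` bound, and dropping the handle would simply detach the thread rather than aborting.

```rust
use std::thread;

fn main() {
    // The closure must be 'static (hence `move`), so the spawned thread
    // can never outlive borrowed data from this scope.
    let data = vec![1, 2, 3];
    let handle = thread::spawn(move || data.iter().sum::<i32>());

    // Dropping `handle` would detach the thread; joining waits for it.
    let total = handle.join().unwrap();
    assert_eq!(total, 6);
}
```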

The other possibility, in either language, is to join threads on drop, waiting for the thread to finish. However this has surprising performance implications and still isn't enough to allow threads to safely borrow from their spawning environment. Such "scoped threads" are provided by libraries in Rust and use a different technique to ensure safety.

C++ and Rust both provide atomic variables. In C++ they support standard operations such as assignment, ++, and atomic reads by conversion to the underlying type. These all use the "sequentially consistent" memory ordering, which provides the strongest guarantees. Rust is more explicit, using dedicated methods like fetch_add which also specify the memory ordering. (This kind of API is also available in C++.)
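For example, a minimal sketch of Rust's explicit style (my example): every operation names its memory ordering.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    let counter = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let c = Arc::clone(&counter);
            // No overloaded `++` here: the method and the ordering are spelled out.
            thread::spawn(move || c.fetch_add(1, Ordering::SeqCst))
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }

    assert_eq!(counter.load(Ordering::SeqCst), 4);
}
```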

This chapter also talks about the C++ type qualifier volatile, even though it has to do with stuff like memory-mapped I/O and not threads. Rust doesn't have volatile types; instead, a volatile read or write is done using an intrinsic function.

Chapter 8: Tweaks

Rust containers don't have methods like emplace_back. You can however use the experimental placement-new feature.


Rust and C++ share many features, allowing a detailed comparison between them. Rust is a much newer design that isn't burdened with 20 years of backwards compatibility. This I think is why Rust's versions of these core features tend to be simpler and easier to reason about. On the other hand, Rust gains some complexity by enforcing strong static guarantees.

There are of course some differences of principle, not just historical quirks. C++ has an object system based on classes and inheritance, even allowing multiple inheritance. There's no equivalent in Rust. Rust also prefers simple and explicit semantics, while C++ allows a huge amount of implicit behavior. You see this for example with implicit copy construction, implicit conversions, ad-hoc function overloading, quasi-transparent references, and the operators on atomic values. There are still some implicit behaviors in Rust, but they're carefully constrained. Personally I prefer Rust's explicit style; I find there are too many cases where C++ doesn't "do what I mean". But other programmers may disagree, and that's fine.

I hope and expect that C++ and Rust will converge on similar feature-sets. C++ is scheduled to get a proper module system, a "concepts" system similar to traits, and a subset with statically-checkable memory safety. Rust will eventually have integer generics, variadic generics, and more powerful const fn. It's an exciting time for both languages :)

by keegan ( at June 21, 2017 07:58 PM

June 20, 2017

Neil Mitchell

Announcing Weeder: dead export detection

Most projects accumulate code over time. To combat that, I've written Weeder which detects unused Haskell exports, allowing dead code to be removed (pulling up the weeds). When used in conjunction with GHC -fwarn-unused-binds -fwarn-unused-imports and HLint it will enable deleting unused definitions, imports and extensions.

Weeder piggy-backs off files generated by stack, so first obtain stack, then:

  • Install weeder by running stack install weeder --resolver=nightly.
  • Ensure your project has a stack.yaml file. If you don't normally build with stack then run stack init to generate one.
  • Run weeder . --build, which builds your project with stack and reports any weeds.

What does Weeder detect?

Weeder detects a bunch of weeds, including:

  • You export a function helper from module Foo.Bar, but nothing else in your package uses helper, and Foo.Bar is not an exposed-module. Therefore, the export of helper is a weed. Note that helper itself may or may not be a weed - once it is no longer exported -fwarn-unused-binds will tell you if it is entirely redundant.
  • Your package depends on another package but doesn't use anything from it - the dependency should usually be deleted. This functionality is quite like packunused, but implemented quite differently.
  • Your package has entries in the other-modules field that are either unused (and thus should be deleted), or are missing (and thus should be added). The stack tool warns about the latter already.
  • A source file is used between two different sections in a .cabal file - e.g. in both the library and the executable. Usually it's better to arrange for the executable to depend on the library, but sometimes that would unnecessarily pollute the interface. Useful to be aware of, and sometimes worth fixing, but not always.
  • A file has not been compiled despite being mentioned in the .cabal file. This situation can be because the file is unused, or the stack compilation was incomplete. I recommend compiling both benchmarks and tests to avoid this warning where possible - running weeder . --build will use a suitable command line.

Beware of conditional compilation (e.g. CPP and the Cabal flag mechanism), as these may mean that something is currently a weed, but in different configurations it is not.

I recommend fixing the warnings relating to other-modules and files not being compiled first, as these may cause other warnings to disappear.

Ignoring weeds

If you want your package to be detected as "weed free", but it has some weeds you know about but don't consider important, you can add a .weeder.yaml file adjacent to the stack.yaml with a list of exclusions. To generate an initial list of exclusions run weeder . --yaml > .weeder.yaml.

You may wish to generalise/simplify the .weeder.yaml by removing anything above or below the interesting part. As an example of the .weeder.yaml file from ghcid:

- message: Module reused between components
- message:
  - name: Weeds exported
  - identifier: withWaiterPoll

This configuration declares that I am not interested in the message about modules being reused between components (that's the way ghcid works, and I am aware of it). It also says that I am not concerned about withWaiterPoll being a weed - it's a simplified method of file change detection I use for debugging, so even though it's dead now, I sometimes do switch to it.

Running with Continuous Integration

Before running Weeder on your continuous integration (CI) server, you should first ensure there are no existing weeds. One way to achieve that is to ignore existing hints by running weeder . --yaml > .weeder.yaml and checking in the resulting .weeder.yaml.

On the CI you should then run weeder . (or weeder . --build to compile as well). To avoid the cost of compilation you may wish to fetch the latest Weeder binary release. For certain CI environments there are helper scripts to do that.

Travis: Execute the following command:

curl -sL | sh -s .

The arguments after -s are passed to weeder, so modify the final . if you want other arguments.

Appveyor: Add the following statement to .appveyor.yml:

- ps: Invoke-Command ([Scriptblock]::Create((Invoke-WebRequest '').Content)) -ArgumentList @('.')

The arguments inside @() are passed to weeder, so add new arguments surrounded by ', comma separated - e.g. @('.', '--build').

What about Cabal users?

Weeder requires the textual .hi file for each source file in the project. Stack generates that already, so it was easy to integrate in to. There's no reason that information couldn't be extracted by either passing flags to Cabal, or converting the .hi files afterwards. I welcome patches to do that integration.

by Neil Mitchell ( at June 20, 2017 09:33 PM

June 19, 2017

FP Complete

Understanding ResourceT

This blog post came out of two unrelated sets of questions I received last week about usage of the resourcet library. For those unfamiliar with it, the library is often used in combination with the Conduit streaming data library; basically every conduit tutorial will quickly jump into usage of the resourcet library.

Instead of just teaching you how to use the library, this post will demonstrate why you need it and how it works internally, to help you avoid some of the potential pitfalls of the library. And stay tuned in the next week or two for a fun debugging story around resourcet, bracket, and monad-control.

Anyway, back to our topic. To start off, consider some code to read a file and print its size:

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
import qualified Data.ByteString as B
import qualified System.IO as IO

main :: IO ()
main = do
  bs <- myReadFile "/usr/share/dict/words"
  print $ B.length bs

myReadFile :: FilePath -> IO B.ByteString
myReadFile fp = IO.withBinaryFile fp IO.ReadMode $ \h ->
  -- Highly inefficient, use a builder instead
  let loop front = do
        next <- B.hGetSome h 4096
        if B.null next
          then return front
          else loop $ B.append front next
   in loop B.empty

However, this is highly inefficient: it reads the entire contents of the file into memory at once, when we don't need that. Instead, let's calculate that in a streaming fashion:

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as B
import qualified System.IO as IO

main :: IO ()
main = do
  len <- myFileLength "/usr/share/dict/words"
  print len

-- Yes, there's hFileSize... ignore that
myFileLength :: FilePath -> IO Int
myFileLength fp = IO.withBinaryFile fp IO.ReadMode $ \h ->
  let loop !total = do
        next <- B.hGetSome h 4096
        if B.null next
          then return total
          else loop $ total + B.length next
   in loop 0

Notice that in both of these implementations, we've used withBinaryFile to open the file in such a way that the handle will be closed when we're done with it, regardless of whether an exception is thrown.

Introduce continuations

But it's pretty unfortunate that we've coupled together our file read logic with the logic that consumes the file. Let's make an abstraction similar to conduit to address that. We'll have an action which returns the next chunk of data from the file, and the next action to perform.

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as B
import qualified System.IO as IO

data IOSource a
  = IOChunk a (IO (IOSource a))
  | IODone

sourceHandle :: IO.Handle -> IO (IOSource B.ByteString)
sourceHandle h = do
  next <- B.hGetSome h 4096
  return $
    if B.null next
      then IODone
      else IOChunk next (sourceHandle h)

sourceFile :: FilePath -> IO (IOSource B.ByteString)
sourceFile fp = IO.withBinaryFile fp IO.ReadMode sourceHandle

sourceLength :: IO (IOSource B.ByteString) -> IO Int
sourceLength =
    loop 0
  where
    loop !total mnext = do
      next <- mnext
      case next of
        IOChunk bs mnext' -> loop (total + B.length bs) mnext'
        IODone -> return total

main :: IO ()
main = do
  len <- sourceLength $ sourceFile "/usr/share/dict/words"
  print len

Our IOSource is essentially a slimmed-down conduit which can't consume any input, only produce output. That's good enough for proving our point. The sourceHandle function has the same basic structure as our first two code examples: read a chunk of data, see if it's null, and if not, return that chunk and keep going. We then trivially wrap sourceHandle with sourceFile, which uses the same withBinaryFile we had before. Finally, sourceLength just grabs the successive chunks from a given IOSource and counts the total bytes.

There's a major bug in this program. Try to spot it. Think through the control flow of this program. I encourage you to actually figure it out for yourself instead of just continuing to my explanation below.

Hint 1 This isn't a subtle exception-handling bug, it makes the program above completely broken in all cases (except, interestingly, the case of an empty file). You will never get a valid result, besides the empty file case.

Hint 2 The output when I run this program is /usr/share/dict/words: hGetBufSome: illegal operation (handle is closed).

Explanation When we enter the sourceFile function, we first call withBinaryFile. This opens up a file handle. We hand this file handle to sourceHandle, which reads the first chunk of data from the file, and returns an IOChunk value containing that chunk and a continuation, or instruction on what to do next. This continuation is an IO action, and it refers to that file handle we were given by sourceFile. (This bit is vital.) We then return this IOChunk value from sourceHandle to sourceFile. Inside sourceFile, we now trigger the cleanup bit of withBinaryFile, which closes the handle, and then return the IOChunk value back to the caller.

When we consume that IOChunk value, we will proceed to perform that continuation we were handed back. That continuation refers to the previously opened file handle, and will try to read from it. See the problem? We've already closed it! There is nothing we can do with it anymore.

Explicit close

Let's try rewriting this to delay the closing of the file handle until the handle is fully consumed. Also, let's replace our sourceLength function with a new function: it tells us what the first byte in the file is. I've also added a putStrLn to tell us when we're closing the file handle.

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as B
import qualified System.IO as IO
import Data.Word (Word8)

data IOSource a
  = IOChunk a (IO (IOSource a))
  | IODone

sourceHandle :: IO.Handle -> IO (IOSource B.ByteString)
sourceHandle h = do
  next <- B.hGetSome h 4096
  if B.null next
    then do
      putStrLn "Closing file handle"
      IO.hClose h
      return IODone
    else return $ IOChunk next (sourceHandle h)

sourceFile :: FilePath -> IO (IOSource B.ByteString)
sourceFile fp = do
  h <- IO.openBinaryFile fp IO.ReadMode
  sourceHandle h

firstByte :: IO (IOSource B.ByteString) -> IO (Maybe Word8)
firstByte mnext = do
  next <- mnext
  return $
    case next of
      IOChunk bs _mnext' -> Just $ B.head bs
      IODone             -> Nothing

main :: IO ()
main = do
  mbyte <- firstByte $ sourceFile "/usr/share/dict/words"
  print mbyte

OK, take a guess at the output. In particular, will our file handle be closed, and why?

It turns out that, when dealing with continuations, there is no way to guarantee that your continuation will ever get called. In our case, we're only interested in reading the first chunk of data from the file, and want to ignore the rest. As a result, our cleanup code will never get called. This doesn't even get into the fact that, if an exception is thrown, we have no exception handler in place to perform cleanup. The moral of the story:

Continuation based approaches, like conduit or ContT, cannot guarantee that cleanup code will be run.

(Side note: conduit actually adds a concept called finalizers to address the non-exception case and to ensure cleanup happens promptly. But that's not our topic today.)

So what's the right way to write this code? You have to use withBinaryFile outside of your sourceHandle call entirely, like this:

main :: IO ()
main = do
  mbyte <- IO.withBinaryFile "/usr/share/dict/words" IO.ReadMode
         $ \h -> firstByte $ sourceHandle h
  print mbyte

Why this is bad

Firstly, there's an aesthetic argument against the above code. A function like sourceFile is convenient, elegant, and simple to teach. Telling people that they need to open their file handles first can be confusing. But this isn't the only problem. Let's consider a few more complicated cases:

  1. I want to create an IOSource that reads from two files, not just one. Ideally, we would only keep one file handle open at a time. If you follow through on the withBinaryFile approach above, you'd realize you need to open up both files before you get started. This is a performance problem of using too many resources.
  2. Suppose you want to read a file, and each line in that file will tell you a new file to open and stream from. In this case, we won't know statically how many files to open, or even which files to open. Since these facts are dynamically determined, our withBinaryFile approach won't work at all.
  3. If the previous example seems a bit far-fetched, that's exactly the case when doing a deep directory traversal. We start with a top level directory, and for each entry, may or may not need to open up a new directory handle, depending on whether it's a directory or not.

In other words: this approach is a bit cumbersome to use, resource-inefficient, and prevents some programs from being written at all. We need something better.

Why withBinaryFile works

The reason that withBinaryFile solves our problems is that it lives outside of our continuation framework. It is not subject to the whims of whether a specific continuation will or will not be called. It lives in IO directly, and we know how to install a cleanup function which will always be called, regardless of whether an exception is thrown or not. Specifically: we can just use bracket.

We need some way to pair the control that bracket provides from outside our continuation with the dynamic allocations we want to perform inside our continuations.

A simplified ResourceT

In order to make this work, we'll implement a simplified version of ResourceT. We'll keep a list of file handles that need to be closed. But since we need to be able to update that list dynamically from within our continuation code, this will be a mutable list (wrapped in an IORef). Also, for simplicity, we'll make it ResourceIO instead of a proper monad transformer.

Note that, by sticking to just a list of file handles, we've simplified our work significantly. File handles can be closed multiple times, and closing a file handle is not supposed to throw an exception itself (though it can in some corner cases; we're ignoring that). The actual code for ResourceT ensures that cleanups only happen one time and explicitly deals with exceptions from cleanup code.

{-# LANGUAGE DeriveFunctor #-}
module ResourceIO
  ( ResourceIO
  , runResourceIO
  , openBinaryFile
  ) where

import Data.IORef
import qualified System.IO as IO
import Control.Exception
import Control.Monad
import Control.Monad.IO.Class

newtype ResourceIO a = ResourceIO (IORef [IO.Handle] -> IO a)
  deriving Functor

instance Applicative ResourceIO where
  pure x = ResourceIO $ \_ -> return x
  (<*>) = ap
instance Monad ResourceIO where
  return = pure
  ResourceIO f >>= g = ResourceIO $ \ref -> do
    x <- f ref
    let ResourceIO g' = g x
    g' ref
instance MonadIO ResourceIO where
  liftIO m = ResourceIO $ \_ref -> m

runResourceIO :: ResourceIO a -> IO a
runResourceIO (ResourceIO inner) = bracket
  (newIORef [])
  cleanup
  inner
  where
    cleanup ref = do
      handles <- readIORef ref
      mapM_ IO.hClose handles

openBinaryFile :: FilePath -> IO.IOMode -> ResourceIO IO.Handle
openBinaryFile fp mode = ResourceIO $ \ref -> mask $ \restore -> do
  h <- restore $ IO.openBinaryFile fp mode
  atomicModifyIORef' ref $ \hs -> (h:hs, ())
  return h

Most of the code here is involved in implementing a Monad/MonadIO interface for ResourceIO. If you focus on runResourceIO, you'll see that, as promised, we're using bracket. We create our shared mutable reference, ensure that cleanup is called regardless of exceptions, and then run the user-provided action.

openBinaryFile demonstrates how we would allocate resources. We open the file, and immediately modify our list of open handles to include the newly opened handle. In the real ResourceT, this is generalized to IO () actions to perform arbitrary cleanup.

Side note: if you're confused about the usage of mask here, it's to deal with the possibility of asynchronous exceptions, and to make sure an exception is not thrown between the call to openBinaryFile and atomicModifyIORef'. Proper async exception handling is a complicated topic, which is why it's best to stick to library functions like bracket and libraries like safe-exceptions that are designed to handle them.
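As a point of comparison, here is the guarantee bracket gives for a single, statically scoped resource (a sketch; fileSize is a made-up helper, not from the article's code):

```haskell
import Control.Exception (bracket)
import System.IO

-- bracket masks async exceptions around acquire and release for us,
-- so the handle is closed whether hFileSize returns normally or throws.
fileSize :: FilePath -> IO Integer
fileSize fp = bracket
  (openBinaryFile fp ReadMode) -- acquire
  hClose                       -- release: always runs
  hFileSize                    -- use
```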

Using it

We need to make some minor modifications to our program in order to use this. Firstly, we specialized IOSource to using IO actions only. We're now going to want this thing to run in ResourceIO, so let's add a type parameter to indicate the base monad (just like ConduitM has). And let's also call a spade a spade, and rename from IOSource to ListT. This is, after all, the correctly implemented list monad transformer. (Ignore the one from the transformers package, it's completely broken.)

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as B
import qualified System.IO as IO
import Data.Word (Word8)
import ResourceIO
import Control.Monad.IO.Class

data ListT m a
  = ConsT a (m (ListT m a))
  | NilT

sourceHandle :: MonadIO m => IO.Handle -> m (ListT m B.ByteString)
sourceHandle h = liftIO $ do
  next <- B.hGetSome h 4096
  if B.null next
    then do
      IO.hClose h
      return NilT
    else return $ ConsT next (sourceHandle h)

sourceFile :: FilePath -> ResourceIO (ListT ResourceIO B.ByteString)
sourceFile fp = do
  h <- openBinaryFile fp IO.ReadMode
  sourceHandle h

firstByte :: Monad m => m (ListT m B.ByteString) -> m (Maybe Word8)
firstByte mnext = do
  next <- mnext
  return $
    case next of
      ConsT bs _mnext' -> Just $ B.head bs
      NilT             -> Nothing

main :: IO ()
main = do
  mbyte <- runResourceIO $ firstByte $ sourceFile "/usr/share/dict/words"
  print mbyte

Note that there's no longer any call to withBinaryFile, and we have all of the exception safety guarantees we want. We can even implement something which reads two files in sequence, and have the desired behavior of only having one file open at a time:

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as B
import qualified System.IO as IO
import Data.Word (Word8)
import ResourceIO
import Control.Monad.IO.Class

data ListT m a
  = ConsT a (m (ListT m a))
  | NilT

appendListT :: Monad m
            => m (ListT m a)
            -> m (ListT m a)
            -> m (ListT m a)
appendListT left0 right =
    loop left0
  where
    loop mnext = do
      next <- mnext
      case next of
        ConsT x mnext' -> return $ ConsT x $ loop mnext'
        NilT           -> right

sourceHandle :: MonadIO m => IO.Handle -> m (ListT m B.ByteString)
sourceHandle h = liftIO $ do
  next <- B.hGetSome h 4096
  if B.null next
    then do
      IO.hClose h
      return NilT
    else return $ ConsT next (sourceHandle h)

sourceFile :: FilePath -> ResourceIO (ListT ResourceIO B.ByteString)
sourceFile fp = do
  h <- openBinaryFile fp IO.ReadMode
  sourceHandle h

sourceLength :: Monad m => m (ListT m B.ByteString) -> m Int
sourceLength =
    loop 0
  where
    loop !total mnext = do
      next <- mnext
      case next of
        ConsT bs mnext' -> loop (total + B.length bs) mnext'
        NilT            -> return total

main :: IO ()
main = do
  len <- runResourceIO $ sourceLength $ appendListT
    (sourceFile "/usr/share/dict/words")
    (sourceFile "/usr/share/dict/words")
  print len


If you look at the code above, you'll see that I used atomicModifyIORef' to add each new file handle to the cleanup queue. You may think that this means we're concurrency-friendly. However, we aren't at all. Let's start by adding a new function to our ResourceIO interface:

asyncResourceIO :: ResourceIO a -> ResourceIO (Async a)
asyncResourceIO (ResourceIO f) = ResourceIO $ \ref -> async $ f ref

This uses the async library to fork a thread and provides an Async value to retrieve the value from that thread when it completes. Now let's naively use it in our main function:

main :: IO ()
main = do
  alen <- runResourceIO $ asyncResourceIO $ sourceLength $
    (sourceFile "/usr/share/dict/words")
  putStrLn "Do some other work in the main thread, may take a while..."
  threadDelay 100000
  len <- wait alen
  print len

With the ominous introduction I gave this, answer this question: do you think this is going to work? And why or why not?

Let's step through what's going to happen here:

  1. runResourceIO creates a mutable reference to hold onto file handles to be closed
  2. asyncResourceIO forks a child thread
  3. Child thread opens up a file handle and adds it to the mutable reference of things to clean up
  4. Parent thread finishes forking the child thread, and (from within runResourceIO) calls the cleanup action, closing the file handle
  5. Child thread continues to do work, but throws an exception trying to read from the (now closed) file handle

Actually, that's just one possible scenario. Another possibility is that the parent thread will call cleanup before the child thread grabs the file handle. In which case, the reads will succeed, but we'll have no guarantee that the file handle will be cleaned up. In other words, we have a race condition.

This should stress the importance of getting concurrency and ResourceT correct. We need to make sure that runResourceT does not close any resources that are still being consumed by child threads. One way to do that is to use the resourceForkIO function, which introduces a reference counting scheme to ensure that resources are only closed when all threads are done with them.

Unfortunately, due to how the monad-control instances for ResourceT work, using concurrency functions from lifted-base or lifted-async will not use this reference counting behavior. Overall, my recommendation is: don't fork threads when inside ResourceT if you can avoid it.

Other ways to abuse ResourceT

There is no actual scoping of the resources you get from ResourceT to ensure that they are still alive. Such techniques do exist (e.g., regions), but the types are significantly more complicated, which is why the conduit ecosystem sticks to ResourceT.

The simplest demonstration of breaking this is:

main :: IO ()
main = do
  h <- runResourceIO $ openBinaryFile "/usr/share/dict/words" IO.ReadMode
  len <- sourceLength $ sourceHandle h
  print len

The handle we get back from openBinaryFile will be closed before we ever get a chance to pass it to sourceHandle. This code is just as broken as:

main :: IO ()
main = do
  h <- IO.withBinaryFile "/usr/share/dict/words" IO.ReadMode return
  len <- sourceLength $ sourceHandle h
  print len

But for many, the latter is more obviously wrong. The rule: make sure that your runResourceIO call lives around the entire scope that the resources will be used in.
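A fixed version of the broken example (a sketch, reusing the ResourceIO module and the sourceHandle/sourceLength definitions from the scripts above) keeps runResourceIO around the whole pipeline:

```haskell
main :: IO ()
main = do
  len <- runResourceIO $ do
    -- The handle is allocated and consumed inside the same
    -- runResourceIO scope, so it stays open until we're done with it.
    h <- openBinaryFile "/usr/share/dict/words" IO.ReadMode
    sourceLength $ sourceHandle h
  print len
```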

As a more real-world example taken from a Twitter discussion, consider the following code that you might achieve by playing Type Tetris with Conduit:

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
import Conduit

main :: IO ()
main = do
  len <- runConduit
       $ transPipe runResourceT (sourceFile "/usr/share/dict/words")
      .| lengthCE
  print len

transPipe applies some kind of a monad transformation at each step of the running of the given conduit. So each time we try to perform some action in sourceFile, we'll create a new mutable reference of cleanup actions, perform the action, and then immediately clean up the resources we allocated. In reality, we want those resources to persist through later continuations within the sourceFile. We would rewrite the code above to:

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
import Conduit

main :: IO ()
main = do
  len <- runResourceT
       $ runConduit
       $ sourceFile "/usr/share/dict/words"
      .| lengthCE
  print len

Or, since runConduitRes = runResourceT . runConduit:

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
import Conduit

main :: IO ()
main = do
  len <- runConduitRes
       $ sourceFile "/usr/share/dict/words"
      .| lengthCE
  print len

June 19, 2017 08:52 AM

Mark Jason Dominus

Git's rejected push error

On Saturday I posted an article explaining how remote branches and remote-tracking branches work in Git. That article is a prerequisite for this one. But here's the quick summary:

When dealing with a branch (say, master) copied from a remote repository (say, remote), there are three branches one must consider:
  1. The copy of master in the local repository
  2. The copy of master in the remote repository
  3. The local branch origin/master that records the last known position of the remote branch
Branch 3 is known as a “remote-tracking branch”. This is because it tracks the remote branch, not because it is itself a remote branch. Actually it is a local copy of the remote branch. From now on I will just call it a “tracking branch”.

The git-fetch command (green) copies branch (2) to (3).

The git-push command (red) copies branch (1) to (2), and incidentally updates (3) to match the new (2).

The diagram at right summarizes this.

We will consider the following typical workflow:

  1. Fetch the remote master branch and check it out.
  2. Do some work and commit it on the local master.
  3. Push the new work back to the remote.

But step 3 fails, saying something like:

    ! [rejected]        master -> master (fetch first)
    error: failed to push some refs to '../remote/'
    hint: Updates were rejected because the remote contains work that you do
    hint: not have locally. This is usually caused by another repository pushing
    hint: to the same ref. You may want to first integrate the remote changes
    hint: (e.g., 'git pull ...') before pushing again.
    hint: See the 'Note about fast-forwards' in 'git push --help' for details.

In older versions of Git the hint was a little shorter:

    hint: Updates were rejected because the tip of your current branch is behind
    hint: its remote counterpart. Merge the remote changes (e.g. 'git pull')
    hint: before pushing again.
    hint: See the 'Note about fast-forwards' in 'git push --help' for details.

Everyone at some point gets one of these messages, and in my experience it is one of the most confusing and distressing things for beginners. It cannot be avoided, worked around, or postponed; it must be understood and dealt with.

Not everyone gets a clear explanation. (Reading it over, the actual message seems reasonably clear, but I know many people find it long and frightening and ignore it. It is tough in cases like this to decide how to trade off making the message shorter (and perhaps thereby harder to understand) or longer (and frightening people away). There may be no good solution. But here we are, and I am going to try to explain it myself, with pictures.)

In a large project, the remote branch is always moving, as other people add to it, and they do this without your knowing about it. Immediately after you do the fetch in step 1 above, the tracking branch origin/master reflects the state of the remote branch. Ten seconds later, it may not; someone else may have come along and put some more commits on the remote branch in the interval. This is a fundamental reality that new Git users must internalize.

Typical workflow

We were trying to do this:

  1. Fetch the remote master branch and check it out.
  2. Do some work and commit it on the local master.
  3. Push the new work back to the remote.

and the failure occurred in step 3. Let's look at what each of these operations actually does.

1. Fetch the remote master branch and check it out.

git fetch origin master
git checkout master

The black circles at the top represent some commits that we want to fetch from the remote repository. The fetch copies them to the local repository, and the tracking branch origin/master points to the local copy. Then we check out master and the local branch master also points to the local copy.

Branch names like master or origin/master are called “refs”. At this moment all three refs refer to the same commit (although there are separate copies in the two repositories) and the three branches have identical contents.

2. Do some work and commit it on the local master.

git add …
git commit …

The blue dots on the local master branch are your new commits. This happens entirely inside your local repository and doesn't involve the remote one at all.

But unbeknownst to you, something else is happening where you can't see it. Your collaborators or co-workers are doing their own work in their own repositories, and some of them have published this work to the remote repository. These commits are represented by the red dots in the remote repository. They are there, but you don't know it yet because you haven't looked at the remote repository since they appeared.

3. Push the new work back to the remote.

git push origin master

Here we are trying to push our local master, which means that we are asking the remote repo to overwrite its master with our local one. If the remote repo agreed to this, the red commits would be lost (possibly forever!) and would be completely replaced by the blue commits. The error message that is the subject of this article is Git quite properly refusing to fulfill your request:

    ! [rejected]        master -> master (fetch first)
    error: failed to push some refs to '../remote/'
    hint: Updates were rejected because the remote contains work that you do
    hint: not have locally. This is usually caused by another repository pushing
    hint: to the same ref. You may want to first integrate the remote changes
    hint: (e.g., 'git pull ...') before pushing again.
    hint: See the 'Note about fast-forwards' in 'git push --help' for details.

Let's read through that slowly:

Updates were rejected because the remote contains work that you do not have locally.

This refers specifically to the red commits.

This is usually caused by another repository pushing to the same ref.

In this case, the other repository is your co-worker's repo, not shown in the diagram. They pushed to the same ref (master) before you did.

You may want to first integrate the remote changes (e.g., 'git pull ...') before pushing again.

This is a little vague. There are many ways one could conceivably “integrate the remote changes” and not all of them will solve the problem.

One alternative (which does not integrate the changes) is to use git push -f. The -f is for “force”, and instructs the remote repository that you really do want to discard the red commits in favor of the blue ones. Depending on who owns it and how it is configured, the remote repository may agree to this and discard the red commits, or it may refuse. (And if it does agree, the coworker whose commits you just destroyed may try to feed you poisoned lemonade, so use -f with caution.)

See the 'Note about fast-forwards' in 'git push --help' for details.

To “fast-forward” the remote ref means that your local branch is a direct forward extension of the remote branch, containing everything that the remote branch does, in exactly the same order. If this is the case, overwriting the remote branch with the local branch is perfectly safe. Nothing will be lost or changed, because the local branch contains everything the remote branch already had. The only change will be the addition of new commits at the end.

There are several ways to construct such a local branch, and choosing between them depends on many factors including personal preference, your familiarity with the Git tool set, and the repository owner's policies. Discussing all of this is outside the scope of the article, so I'll just use one as an example: We are going to rebase the blue commits onto the red ones.

4. Refresh the tracking branch.

git fetch origin master

The first thing to do is to copy the red commits into the local repo; we haven't even seen them yet. We do that as before, with git-fetch. This updates the tracking branch with a copy of the remote branch just as it did in step 1.

If instead of git fetch origin master we did git pull --rebase origin master, Git would do exactly the same fetch, and then automatically do a rebase as described in the next section. If we did git pull origin master without --rebase, it would do exactly the same fetch, and then instead of a rebase it would do a merge, which I am not planning to describe. The point to remember is that git pull is just a convenient way to combine the commands of this section and the next one, nothing more.

5. Rewrite the local changes.

git rebase origin/master

Now is the moment when we “integrate the remote changes” with our own changes. One way to do this is git rebase origin/master. This tells Git to try to construct new commits that are just like the blue ones, but instead of starting from the last black commit, they will start from the last red one. (For more details about how this works, see my talk slides about it.) There are many alternatives here to rebase, some quite elaborate, but that is a subject for another article, or several other articles.

If none of the files modified in the blue commits have also been modified in any of the red commits, there is no issue and everything proceeds automatically. And if some of the same files are modified, but only in non-overlapping portions, Git can automatically combine them. But if some of the files are modified in incompatible ways, the rebase process will stop in the middle and ask how to proceed, which is another subject for another article. This article will suppose that the rebase completed automatically. In this case the blue commits have been “rebased onto” the red commits, as in the diagram at right.

The diagram is a bit misleading here: it looks as though those black and red commits appear in two places in the local repository, once on the local master branch and once on the tracking branch. They don't. The two branches share those commits, which are stored only once.

Notice that the command is git rebase origin/master. This is different in form from git fetch origin master or git push origin master. Why a slash instead of a space? Because with git-fetch or git-push, we tell it the name of the remote repo, origin, and the name of the remote branch we want to fetch or push, master. But git-rebase operates locally and has no use for the name of a remote repo. Instead, we give it the name of the branch onto which we want to rebase the new commits. In this case, the target branch is the tracking branch origin/master.

6. Try the push again.

git push origin master

We try the exact same git push origin master that failed in step 3, and this time it succeeds, because this time the operation is a “fast-forward”. Before, our blue commits would have replaced the red commits. But our rewritten local branch does not have that problem: it includes the red commits in exactly the same places as they are already on the remote branch. When the remote repository replaces its master with the one we are pushing, it loses nothing, because the red commits are identical. All it needs to do is to add the blue commits onto the end and then move its master ref forward to point to the last blue commit instead of to the last red commit. This is a “fast-forward”.

At this point, the push is successful, and the git-push command also updates the tracking branch to reflect that the remote branch has moved forward. I did not show this in the illustration.

But wait, what if someone else had added yet more commits to the remote master while we were executing steps 4 and 5? Wouldn't our new push attempt fail just like the first one did? Yes, absolutely! We would have to repeat steps 4 and 5 and try a third time. It is possible, in principle, to be completely prevented from pushing commits to a remote repo because it is always changing so quickly that you never get caught up on its current state. Repeated push failures of this type are a sign that the project is large enough that the repository's owner needs to set up a more structured code release mechanism than “everyone lands stuff on master whenever they feel like it”.

An earlier draft of this article ended at this point with “That is all I have to say about this.” Ha!

Unavoidable problems

Everyone suffers through this issue at some point or another. It is tempting to wonder if Git couldn't somehow make it easier for people to deal with. I think the answer is no. Git has multiple, distributed repositories. To abandon that feature would be to go back to the dark ages of galley slaves, smallpox, and SVN. But if you have multiple distributed anythings, you must face the issue of how to synchronize them. This is intrinsic to distributed systems: two components receive different updates at the same time, and how do you reconcile them?

For reasons I have discussed before, it does not appear possible to automate the reconciliation in every case in a source code control system, because sometimes the reconciliation may require going over to a co-worker's desk and arguing for two hours, then calling in three managers and the CTO and making a strategic decision which then has to be approved by a representative of the legal department. The VCS is not going to do this for you.

I'm going to digress a bit and then come back to the main point. Twenty-five years ago I taught an introductory programming class in C. The previous curriculum had tried hard to defer pointers to the middle of the semester, as K&R does (chapter 7, I think). I decided this was a mistake. Pointers are everywhere in C and without them you can't call scanf or pass an array to a function (or access the command-line arguments or operate on strings or use most of the standard library or return anything that isn't a number…). Looking back a few years later I wrote:

Pointers are an essential part of [C's] solution to the data hiding problem, which is an essential issue. Therefore, they cannot be avoided, and in fact should be addressed as soon as possible. … They presented themselves in the earliest parts of the material not out of perversity, but because they were central to the topic.

I developed a new curriculum that began treating pointers early on, as early as possible, and which then came back to them repeatedly, each time elaborating on the idea. This was a big success. I am certain that it is the right way to do it.

(And I've been intending since 2006 to write an article about K&R's crappy discussion of pointers and how its deficiencies and omissions have been replicated down the years by generation after generation of C programmers.)

I think there's an important pedagogical principle here. A good teacher makes the subject as simple as possible, but no simpler. Many difficult issues, perhaps most, can be ignored, postponed, hidden, prevaricated, fudged, glossed over, or even solved. But some must be met head-on and dealt with, and for these I think the sooner they are met and dealt with, the better.

Push conflicts in Git, like pointers in C, are not minor or peripheral; they are an intrinsic and central issue. Almost everyone is going to run into push conflicts, not eventually, but right away. They are going to be completely stuck until they have dealt with it, so they had better be prepared to deal with it right away.

If I were to write a book about Git, this discussion would be in chapter 2. Dealing with merge conflicts would be in chapter 3. All the other stuff could wait.

That is all I have to say about this. Thank you for your kind attention, and thanks to Sumana Harihareswara and AJ Jordan for inspiration.

by Mark Dominus at June 19, 2017 04:34 AM

June 18, 2017

Holden Karau

SF Queer Trans Focused Scooter Group

Outside of the wonderful world of distributed systems, open source software, and being kind of emo about that, I help organize a queer & trans focused scooter group in San Francisco called the Sparkling Pink Pandas. Many of our members ride rental scoots (no motorcycle license required) and our most frequent destinations are the 7-11 and the cookie shop. Every year around pride we try to recruit some new members, since that's when we realize how much more awesome it would be if there were more of us. If this sounds like fun to you, and you are not an asshole, come join us at

by Holden Karau at June 18, 2017 05:41 PM

Gabriel Gonzalez

Dhall is now a template engine


Dhall is a typed and programmable configuration language which you can:

... and now you can also use Dhall as a template engine with the newly released dhall-text library which provides a dhall-to-text executable for templating text.

This executable actually does not do very much: all the code does is check that the Dhall expression has type Text and then renders the Text. Most of the work to support template engine features actually consists of improvements to the core Dhall language. That means that all the features I'm highlighting in this post also benefit the other Dhall integrations.

You can learn more about Dhall by reading the official tutorial but I can also illustrate how dhall-to-text works by comparing to Mustache, which is one of the more widely used template engines. All of the following examples come from the Mustache manual for the Ruby library.

Initial example

Mustache is a text templating engine that subdivides the work of templating into two parts:

  • The text to template
  • The data to template the text with

For example, given the following template:

Hello {{name}}
You have just won {{value}} dollars!
Well, {{taxed_value}} dollars, after taxes.

... and the following data:

"name": "Chris",
"value": 10000,
"taxed_value": 10000 - (10000 * 0.4),
"in_ca": true

... we get the following output when we combine the two:

Hello Chris
You have just won 10000 dollars!
Well, 6000.0 dollars, after taxes.

In Dhall, there is no distinction between the template and the data. They are both Dhall expressions. A template is just a Dhall function and the data is just an argument that we pass to that function.

For example, the above template translates to this Dhall file:

$ cat function
    \(record : { name        : Text
               , value       : Double
               , taxed_value : Double
               , in_ca       : Bool
               }
    ) -> ''
Hello ${record.name}
You have just won ${Double/show record.value} dollars!
${ if record.in_ca
   then "Well, ${Double/show record.taxed_value} dollars, after taxes"
   else ""
}
''

... and the above data payload translates to this Dhall file:

$ cat value
{ name        = "Chris"
, value       = 10000.0
, taxed_value = 6000.0
, in_ca       = True
}
... and we can combine the two using the dhall-to-text executable by applying the function to the argument:

$ dhall-to-text <<< './function ./value'

Hello Chris
You have just won 10000.0 dollars!
Well, 6000.0 dollars, after taxes

This example already highlights several features of Dhall which the next section will walk through

Dhall basics

Dhall is a functional programming language and supports anonymous functions of the form:

\(functionArgumentName : functionArgumentType) -> functionResult

For example, this template:

    \(record : { name        : Text
               , value       : Double
               , taxed_value : Double
               , in_ca       : Bool
               }
    ) -> ''
Hello ${record.name}
You have just won ${Double/show record.value} dollars!
${ if record.in_ca
   then "Well, ${Double/show record.taxed_value} dollars, after taxes"
   else ""
}
''

... is just one large function where:

  • the function argument name is record

  • the function argument type is the following anonymous record type:

    { name        : Text
    , value       : Double
    , taxed_value : Double
    , in_ca       : Bool
    }
  • the function result is a multiline string literal

    ''
    Hello ${record.name}
    You have just won ${Double/show record.value} dollars!
    ${ if record.in_ca
       then "Well, ${Double/show record.taxed_value} dollars, after taxes"
       else ""
    }
    ''

Multiline string literals use the same syntax as The Nix language: two single quotes to open and close the string. Dhall also supports the ordinary string literals you know and love using double quotes, such as:

"Well, ${Double/show record.taxed_value} dollars, after taxes"

We can interpolate any Dhall expression of type Text into a string literal using ${...} syntax (another newly added Dhall feature). We cannot automatically interpolate other types of values like Doubles, so we have to explicitly convert them with a function like Double/show.
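For instance (an illustrative snippet, not from the original post), a Double must pass through Double/show before it can appear inside a Text literal:

```dhall
    let taxRate = 0.4
in  "The tax rate is ${Double/show taxRate}"
```

Leaving out the Double/show conversion is a type error, not a silent coercion.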

Interpolation works for arbitrarily long Dhall expressions as long as they have type Text. This is why we can interpolate an if expression, like this:

${ if record.in_ca
   then "Well, ${Double/show record.taxed_value} dollars, after taxes"
   else ""
}

Dhall lets us import other Dhall expressions by their file path, URL, or even via environment variables. For example, we were already using this feature when evaluating our template:

$ dhall-to-text <<< './function ./value'

./function ./value is yet another valid Dhall expression that replaces ./function and ./value with the corresponding expression stored within each respective file.


Dhall is typed and will catch errors in our template files. If our record is missing any fields then that's a type error. For example:

$ dhall-to-text <<< './function { name = "Chris" }'
Error: Wrong type of function argument

./example0 { name = "Chris" }


We can also obtain more detailed information by adding the --explain flag:

$ dhall-to-text --explain <<< './function { name = "Chris" }'

Error: Wrong type of function argument

Explanation: Every function declares what type or kind of argu...

For example:

│ λ(x : Bool) → x : Bool → Bool │ This anonymous function...
└───────────────────────────────┘ arguments that have typ...

The function's input type


{ Lots of helpful explanation that I'm cutting out for brevity }


You tried to invoke the following function:

↳ λ(record : { in_ca : Bool, name : Text, taxed_value : Double...

... which expects an argument of type or kind:

↳ { in_ca : Bool, name : Text, taxed_value : Double, value : D...

... on the following argument:

↳ { name = "Chris" }

... which has a different type or kind:

↳ { name : Text }


./example0 { name = "Chris" }


These type safety guarantees protect us against unintentional templating errors.

Dhall does support optional fields and values, but you have to explicitly opt into them because all values are required by default. The next section covers how to produce and consume optional values.

Optional fields

In Mustache, if we provide a template like this:

* {{name}}
* {{age}}
* {{company}}
* {{{company}}}

... and we don't supply all the fields:

{
  "name": "Chris",
  "company": "<b>GitHub</b>"
}

... then by default any missing fields render as empty text (although this behavior is configurable in Mustache):

* Chris
* &lt;b&gt;GitHub&lt;/b&gt;
* <b>GitHub</b>

Mustache also provides support for escaping HTML (and Dhall does not), as the above example illustrates.

If we ignore the ability to escape HTML, then the corresponding Dhall template would be:

$ cat function1
    \(record : { name    : Text
               , age     : Optional Integer
               , company : Text
               }
    )   ->  ''
* ${record.name}
* ${Optional/fold Integer record.age Text Integer/show ""}
* ${record.company}
''

... and the corresponding data would be:

$ cat value1
{ name    = "Chris"
, age     = []  : Optional Integer
, company = "<b>GitHub</b>"
}

... which renders like this:

$ dhall-to-text <<< './function1 ./value1'

* Chris
* 
* <b>GitHub</b>

Dhall forces us to declare which values are Optional (such as age) and which values are required (such as name). However, we do have the luxury of specifying that individual values are Optional, whereas Mustache requires us to specify globally whether all values are optional or required.

We also still have to supply an Optional field, even if the field is empty. We can never omit a record field in Dhall, since that changes the type of the record.

We cannot interpolate record.age directly into the string because the type of record.age is Optional Integer and not Text. We have to explicitly convert to Text, like this:

Optional/fold Integer record.age Text Integer/show ""

Informally, you can read this code as saying:

  • If the record.age value is present, then use Integer/show to render the value
  • If the record.age value is absent, then return the empty string

Optional/fold is a builtin function that provides the most general function to consume an Optional value. However, the type is a bit long:

Optional/fold
    :   ∀(a : Type)   -- The element type of the `Optional` value
    →   Optional a    -- The `Optional` value to consume
    →   ∀(r : Type)   -- The type of result we will produce
    →   (a → r)       -- Function to produce the result if the value is present
    →   r             -- Result if the value is absent
    →   r

We can work through this large type by seeing what is the inferred type of Optional/fold applied to successively more arguments:

Optional/fold
    : ∀(a : Type) → Optional a → ∀(r : Type) → (a → r) → r → r

Optional/fold Integer
    : Optional Integer → ∀(r : Type) → (Integer → r) → r → r

Optional/fold Integer record.age
    : ∀(r : Type) → (Integer → r) → r → r

Optional/fold Integer record.age Text
    : (Integer → Text) → Text → Text

Optional/fold Integer record.age Text Integer/show
    : Text → Text

Optional/fold Integer record.age Text Integer/show ""
    : Text
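To make the two branches concrete, here is a hypothetical pair of evaluations (not from the original post), using the era's `[x] : Optional a` syntax for optional literals:

```dhall
-- A present value goes through `Integer/show`:
Optional/fold Integer ([30] : Optional Integer) Text Integer/show ""  -- "30"

-- An absent value falls through to the default:
Optional/fold Integer ([]   : Optional Integer) Text Integer/show ""  -- ""
```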

We could also make every field of the record optional:

    \(record : { name    : Optional Text
               , age     : Optional Integer
               , company : Optional Text
               }
    )   ->  let id = \(t : Text) -> t
        in  ''
* ${Optional/fold Text record.name Text id ""}
* ${Optional/fold Integer record.age Text Integer/show ""}
* ${Optional/fold Text record.company Text id ""}
''

... which would also require matching changes in the data:

{ name    = ["Chris"]         : Optional Text
, age     = []                : Optional Integer
, company = ["<b>GitHub</b>"] : Optional Text
}

This is quite verbose, but we can take advantage of the fact that Dhall is a real programming language and define helper functions to reduce repetition. For example, we could save the following two files:

$ cat optionalText
    \(x : Optional Text)
->  Optional/fold Text x Text
    (\(t : Text) -> t)  -- What to do if the value is present
    ""                  -- What to do if the value is absent
$ cat optionalInteger
    \(x : Optional Integer)
->  Optional/fold Integer x Text
    Integer/show        -- What to do if the value is present
    ""                  -- What to do if the value is absent

... and then use those two functions to reduce the boilerplate of our template:

    \(record : { name    : Optional Text
               , age     : Optional Integer
               , company : Optional Text
               }
    )   ->  ''
* ${./optionalText    record.name   }
* ${./optionalInteger record.age    }
* ${./optionalText    record.company}
''

However, we might not even want to render the bullet at all if the value is missing. We could instead define the following two utilities:

$ cat textBullet
    \(x : Optional Text)
->  Optional/fold Text x Text
    (\(t : Text) -> "* ${t}\n")
    ""
$ cat integerBullet
    \(x : Optional Integer)
->  Optional/fold Integer x Text
    (\(t : Integer) -> "* ${Integer/show t}\n")
    ""

... and then we could write our template like this:

    \(record : { name    : Optional Text
               , age     : Optional Integer
               , company : Optional Text
               }
    )   ->      ./textBullet    record.name
            ++  ./integerBullet record.age
            ++  ./textBullet    record.company

... which would render like this:

* Chris
* <b>GitHub</b>

This illustrates how Dhall gives you greater precision in controlling the layout of your template. A template language like Mustache is limited by the fact that the templating logic must be expressed inline within the templated file itself. With Dhall you can separate the template from the logic if you want to avoid accidentally introducing superfluous newlines or whitespace.


Mustache lets you guard a section of text to only display if a boolean value is True:

{{#person}}
Never shown!
{{/person}}

If you render that with this data:

{ "person": false }

... then you get this result:


The literal translation of that template to Dhall would be:

    \(record : { person : Bool })
->  ''
${ if record.person
   then "Never shown!"
   else ""
 }
''

However, Dhall does not have to wrap everything in a record like Mustache does. We could just provide a naked Bool argument to our function directly:

-- ./function2

    \(person : Bool)
->  ''
${ if person
   then "Never shown!"
   else ""
 }
''

We also don't need to separate the argument out into a separate file. We can just apply the function directly to the argument like this:

$ dhall-to-text <<< './function2 False'


... or we could combine both of them into the same file if we never intended to change the data:

    let person = False
in  ''
${ if person
   then "Never shown!"
   else ""
 }
''

Mustache also has a notion of "truthiness", meaning that you can use other types of values in place of boolean values. For example, the Mustache template permits person to also be a List or an Optional value, and Mustache would treat the absence of a value as equivalent to False and the presence of at least one value as equivalent to True.

Dhall does not automatically treat Bool/List/Optional as interchangeable. You have to explicitly convert between them in order to avoid type errors.


Mustache uses a similar syntax to render a list of values. For example, if you template this file:

{{#repo}}
  <b>{{name}}</b>
{{/repo}}

... with this data:

{
  "repo": [
    { "name": "resque" },
    { "name": "hub" },
    { "name": "rip" }
  ]
}
... then you would get this result:

<b>resque</b>
<b>hub</b>
<b>rip</b>

The equivalent Dhall template is:

    let concatMap =
in \(repo : List Text)
-> concatMap Text (\(name : Text) -> "<b>${name}</b>\n") repo

... and the equivalent Dhall payload is:

[ "resque"
, "hub"
, "rip"
]
Again, we don't need to wrap each value of the list in a one-field record like we do with Mustache. That's why we can get away with passing a list of naked Text values (i.e. List Text) instead of a list of one-field records (i.e. List { name : Text }).

This example also illustrates how Dhall can import expressions by URL. Dhall hosts a Prelude of utilities online that you can use anywhere within your program by pasting their URL. The web is Dhall's "package system", except that instead of distributing code grouped in modules or packages you distribute code at the granularity of individual expressions.

The above example retrieves Dhall's concatMap function from a URL hosted on IPFS (a distributed hashtable for the web). You don't have to use IPFS to distribute Dhall expressions, though; you can host code anywhere that can serve raw text, such as a pastebin, GitHub, or your own server.


Mustache also lets you supply user-defined functions, using the same syntax as for boolean values and lists. For example, you can template this file:

{{#wrapped}}
{{name}} is awesome.
{{/wrapped}}

... with this data:

{
  "name": "Willy",
  "wrapped": function() {
    return function(text, render) {
      return "<b>" + render(text) + "</b>"
    }
  }
}
... and Mustache will call the function on the block wrapped with the function's name:

<b>Willy is awesome.</b>

Dhall makes no distinction between functions and data because Dhall is a functional language where functions are first-class values. We can translate the above template to Dhall like this:

    \(record : { wrapped : Text -> Text, name : Text })
->  record.wrapped "${record.name} is awesome."

... and translate the data to:

{ name    = "Willy"
, wrapped = \(text : Text) -> "<b>${text}</b>"
}

Additional examples

We can translate the remaining examples from the Mustache manual fairly straightforwardly using the concepts introduced above.

Optional records in Mustache:

{{#person?}}
  Hi {{name}}!
{{/person?}}

... translate to Optional values in Dhall consumed using Optional/fold:

    \(person : Optional Text)
->  Optional/fold Text person Text
    (\(name : Text) -> "Hi ${name}!")
    ""

The following inverted section in Mustache:

{{#repo}}
  <b>{{name}}</b>
{{/repo}}
{{^repo}}
  No repos :(
{{/repo}}

... is also just a special case of Optional/fold in Dhall:

    \(repo : Optional Text)
-> Optional/fold Text repo Text
(\(name : Text) -> "<b>${name}</b>")
"No repos :("

Inline template comments in Mustache:

<h1>Today{{! ignore me }}.</h1>

... are more verbose in Dhall:

"<h1>Today${"" {- ignore me -}}.</h1>"

What Mustache calls "partial" values:

$ cat base.mustache
{{#names}}
{{> user}}
{{/names}}
$ cat user.mustache
<strong>{{name}}</strong>

... correspond to Dhall's support for importing paths as expressions:

$ cat base
let concatMap =
in \(names : List Text)
-> ''
${concatMap Text ./user names}''
$ cat user
\(name : Text) -> "<strong>${name}</strong>"


If this interests you then you can test drive dhall-to-text by installing the executable from Hackage or by building from source on GitHub.

People most commonly adopt Dhall when they prefer to use a programming language without sacrificing safety. Dhall is a total (i.e. non-Turing-complete) programming language, meaning that evaluation never crashes, hangs, throws exceptions, or otherwise fails.

Dhall also supports other programming features besides the ones introduced in this post. Read the Dhall tutorial if you would like to learn about the full set of features that Dhall supports.


by Gabriel Gonzalez at June 18, 2017 01:11 PM

June 17, 2017

Mark Jason Dominus

Git remote branches and Git's missing terminology

Beginning and even intermediate Git users have several common problem areas, and one of these is the relationship between remote and local branches. I think the basic confusion is that it seems like there ought to be two things, the remote branch and the local one, and you copy back and forth between them. But there are not two but three, and the Git documentation does not clearly point this out or adopt clear terminology to distinguish between the three.

Let's suppose we have a remote repository, which could be called anything, but is typically named origin. And we have a local repository which has no name; it's just the local repo. And let's suppose we're working on a branch named master, as one often does.

There are not two but three branches of interest, and they might all be pointing to different commits:

  1. The branch named master in the local repo. This is where we do our work and make our commits. This is the local branch. It is at the lower left in the diagram.

  2. The branch named master in the remote repo. This is the remote branch, at the top of the diagram. We cannot normally see this at all because it is (typically) on another computer and (typically) requires a network operation to interact with it. So instead, we mainly deal with…

  3. The branch named origin/master in the local repo. This is the tracking branch, at the lower right in the diagram.

    We never modify the tracking branch ourselves. It is automatically maintained for us by Git. Whenever Git communicates with the remote repo and learns something about the disposition of the remote master branch, it updates the local branch origin/master to reflect what it has learned.

I think this triangle diagram is the first thing one ought to see when starting to deal with remote repositories and with git-fetch and git-push.

The Git documentation often calls the tracking branch the “remote-tracking branch”. It is important to understand that the remote-tracking branch is a local branch in the local repository. It is called the “remote-tracking” branch because it tracks the state of the remote branch, not because it is itself remote. From now on I will just call it the “tracking branch”.

Now let's consider a typical workflow:

  1. We use git fetch origin master. This copies the remote branch master from the remote repo to the tracking branch origin/master in the local repo. This is the green arrow in the diagram.

    If other people have added commits to the remote master branch since our last fetch, now is when we find out what they are. We can compare the local branch master with the tracking branch origin/master to see what is new. We might use git log origin/master to see the new commits, or git diff origin/master to compare the new versions of the files with the ones we had before. These commands do not look at the remote branch! They look at the copy of the remote branch that Git retrieved for us. If a long time elapses between the fetch and the compare, the actual remote branch might be in a completely different place than when we fetched it.

    (Maybe you use pull instead of fetch. But pull is exactly like fetch except that it does a merge or rebase after the fetch completes. So the process is the same; it merely combines this step and the next step into one command.)

  2. We decide how to combine our local master with origin/master. We might use git merge origin/master to merge the two branches, or we might use git rebase origin/master to copy our new local commits onto the commits we just fetched. Or we could use git reset --hard origin/master to throw away our local commits (if any) and just take the ones on the tracking branch. There are a lot of things that could happen here, but the blue arrow in the diagram shows the general idea: we see new stuff in origin/master and update the local master to include that new stuff in some way.

  3. After doing some more work on the local master, we want to publish the new work. We use git push origin master. This is the red arrow in the diagram. It copies the local master to the remote master, updating the remote master in the process. If it is successful, it also updates the tracking branch origin/master to reflect the new position of the remote master.
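The three numbered steps can be replayed end to end against a throwaway remote. Here is a sketch (repository and directory names are arbitrary; the toy remote is seeded with one pushed commit so that the fetch has something to fetch):

```shell
# A complete toy session demonstrating the triangle: a bare "remote",
# a clone of it, and the fetch / compare / merge / push cycle.
set -e
dir=$(mktemp -d); cd "$dir"
git init -q --bare remote.git              # the remote repo, called "origin" below
git clone -q remote.git work 2>/dev/null
cd work
git config user.email you@example.com; git config user.name you
git symbolic-ref HEAD refs/heads/master    # make sure our branch is named master
echo hello > file; git add file; git commit -q -m 'first commit'
git push -q origin master       # red arrow: local master -> remote master
git fetch -q origin master      # green arrow: remote master -> tracking branch origin/master
git log master..origin/master   # compare local master with the tracking branch (empty here)
git merge -q origin/master      # blue arrow: fold any fetched commits into local master
```

After the merge, the local branch and the tracking branch point at the same commit.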

In the last step, why is there no slash in git push origin master? Because origin/master is the name of the tracking branch, and the tracking branch is not involved. The push command gets two arguments: the name of the remote (origin) and the branch to push (master) and then it copies the local branch to the remote one of the same name.

Deleting a branch

How do we delete branches? For the local branch, it's easy: git branch -d master does it instantly.

For the tracking branch, we include the -r flag: git branch -d -r origin/master. This deletes the tracking branch, and has no effect whatever on the remote repo. This is a very unusual thing to do.

To delete the remote branch, we have to use git-push because that is the only way to affect the remote repo. We use git push origin :master. As is usual with a push, if this is successful Git also deletes the tracking branch origin/master.
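The three deletions, in command form against a throwaway remote. The sketch uses a side branch named topic rather than master, because servers usually refuse to delete the branch that the remote's HEAD points at:

```shell
# Set up a toy remote with a published side branch "topic", then delete
# topic in all three places: local branch, tracking branch, remote branch.
set -e
dir=$(mktemp -d); cd "$dir"
git init -q --bare remote.git
git clone -q remote.git work 2>/dev/null
cd work
git config user.email you@example.com; git config user.name you
git symbolic-ref HEAD refs/heads/master
echo hello > file; git add file; git commit -q -m 'first commit'
git push -q origin master
git checkout -q -b topic && git push -q origin topic  # create and publish topic
git checkout -q master
git branch -d topic              # delete the local branch
git branch -d -r origin/topic    # delete only the tracking branch; the remote is untouched
git push -q origin :topic        # delete the branch in the remote repo itself
```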

This section has glossed over an important point: git branch -d master does not delete the master branch. It only deletes the ref, which is the name for the branch. The branch itself remains. If there are other refs that refer to it, it will remain as long as they do. If there are no other refs that point to it, it will be deleted in due course, but not immediately. Until the branch is actually deleted, its contents can be recovered.


Another way to delete a local ref (whether tracking or not) is just to go into the repository and remove it. The repository is usually in a subdirectory .git of your working tree, and if you cd .git/refs you can see where Git records the branch names and what they refer to. The master branch is nothing more nor less than a file heads/master in this directory, and its contents are the commit ID of the commit to which it refers. If you edit this commit ID, you have pointed the ref at a different commit. If you remove the file, the ref is gone. It is that simple.

Tracking branches are similar. The origin/master ref is in .git/refs/remotes/origin/master.

The remote master branch, of course, is not in your repository at all; it's in the remote repository.

Poking around in Git's repository is fun and rewarding. (If it worries you, make another clone of the repo, poke around in the clone, and throw it away when you are finished poking.) Tinkering with the refs is a good place to start Git repo hacking: create a couple of branches, move them around, examine them, delete them again, all without using git-branch. Git won't know the difference. Bonus fun activity: HEAD is defined by the file .git/HEAD. When you make a new commit, HEAD moves forward. How does that work?
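For example, in a fresh repository the loose ref files can be read directly. (Git may later move refs into a single .git/packed-refs file, at which point the loose files disappear, so this works most reliably right after a commit.)

```shell
# Create a toy repository and look at the refs on disk.
set -e
dir=$(mktemp -d); cd "$dir"
git init -q repo && cd repo
git config user.email you@example.com; git config user.name you
git symbolic-ref HEAD refs/heads/master
echo hello > file; git add file; git commit -q -m 'first commit'
cat .git/HEAD                # ref: refs/heads/master
cat .git/refs/heads/master   # the commit ID that the master ref points to
git rev-parse master         # the same commit ID, reported by git itself
```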

There is a gitrepository-layout manual that says what else you can find in the repository.

Failed pushes

We're now in a good position to understand one of the most common problems that Git beginners face: they have committed some work, and they want to push it to the remote repository, but Git says

      ! [rejected]        master -> master (fetch first)
      error: failed to push some refs to 'remote'
      something something fast-forward, whatever that is

My article explaining this will appear here on Monday. (No, I really mean it.)

Terminology problems

I think one of the reasons this part of Git is so poorly understood is that there's a lack of good terminology in this area. There needs to be a way to say "the local branch named master” and “the branch named master in the remote named origin” without writing a five- or nine-word phrase every time. The name origin/master looks like it might be the second of these, but it isn't. The documentation uses the descriptive but somewhat confusing term “remote-tracking branch” to refer to it. I think abbreviating this to “tracking branch” would tend to clear things up more than otherwise.

I haven't thought of a good solution to the rest of it yet. It's tempting to suggest that we should abbreviate “the branch named master in the remote named origin” to something like “origin:master” but I think that would be a disaster. It would be too easy to confuse with origin/master and also with the use of the colon in the refspec arguments to git-push. Maybe something like origin -> master that can't possibly be mistaken for part of a shell command and that looks different enough from origin/master to make clear that it's related but not the same thing.

Git piles yet another confusion on this:

    $ git checkout master 
    Branch master set up to track remote branch master from origin.

This sounds like it has something to do with the remote-tracking branch, but it does not! It means that the local branch master has been associated with the remote origin so that fetches and pushes that pertain to it will default to using that remote.
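That association is ordinary configuration stored in .git/config, keyed by the branch name, and can be inspected directly (sketched here against a toy remote, with the branch and remote names from the example):

```shell
# Set up a toy remote, push with -u to record the association,
# then read the association back out of the configuration.
set -e
dir=$(mktemp -d); cd "$dir"
git init -q --bare remote.git
git clone -q remote.git work 2>/dev/null
cd work
git config user.email you@example.com; git config user.name you
git symbolic-ref HEAD refs/heads/master
echo hello > file; git add file; git commit -q -m 'first commit'
git push -q -u origin master       # -u records the association
git config branch.master.remote    # origin
git config branch.master.merge     # refs/heads/master
```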

I will think this over and try to come up with something that sucks a little less. Suggestions are welcome.

by Mark Dominus at June 17, 2017 10:26 PM

Roman Cheplyaka

Generic unification

The unification-fd package by wren gayle romano is the de-facto standard way to do unification in Haskell. You’d use it if you need to implement type inference for your DSL, for example.

To use unification-fd, we first need to express our Type type as a fixpoint of a functor, a.k.a. an initial algebra.

For instance, let’s say we want to implement type inference for the simply typed lambda calculus (STLC). The types in STLC can be represented by a Haskell type

data Type = BaseType String
          | Fun Type Type

Note that Type cannot represent type variables.

Type can be equivalently represented as a fixpoint of a functor, TypeF:

-- defined in Data.Functor.Fixedpoint in unification-fd
newtype Fix f = Fix { unFix :: f (Fix f) }

data TypeF a = BaseType String
             | Fun a a
  deriving (Functor, Foldable, Traversable)

type Type = Fix TypeF

So Fix TypeF still cannot represent any type variables, but UTerm TypeF can. UTerm is another type defined in unification-fd that is similar to Fix except it includes another constructor for type variables:

-- defined in Control.Unification in unification-fd
data UTerm t v
    = UVar  !v               -- ^ A unification variable.
    | UTerm !(t (UTerm t v)) -- ^ Some structure containing subterms.

type PolyType = UTerm TypeF IntVar

UTerm, by the way, is the free monad over the functor t.
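To make that concrete, here is a hand-written sketch of bind for UTerm (not the library's actual Monad instance, though it is equivalent): binding is substitution of terms for variables.

```haskell
-- UVar plays the role of "pure"; binding walks the term and replaces
-- each variable with whatever term the function maps it to.
bindUTerm :: Functor t => UTerm t v -> (v -> UTerm t w) -> UTerm t w
bindUTerm (UVar v)  f = f v
bindUTerm (UTerm t) f = UTerm (fmap (\sub -> bindUTerm sub f) t)
```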


The Control.Unification module exposes several algorithms (unification, alpha equivalence) that work on any UTerm, provided that the underlying functor t (TypeF in our example) implements a zipMatch function:

class (Traversable t) => Unifiable t where
    -- | Perform one level of equality testing for terms. If the
    -- term constructors are unequal then return @Nothing@; if they
    -- are equal, then return the one-level spine filled with
    -- resolved subterms and\/or pairs of subterms to be recursively
    -- checked.
    zipMatch :: t a -> t a -> Maybe (t (Either a (a,a)))

zipMatch essentially tells the algorithms which constructors of our TypeF functor are the same, which are different, and which fields correspond to variables. So for TypeF it could look like

instance Unifiable TypeF where
  zipMatch (BaseType a) (BaseType b) =
    if a == b
      then Just $ BaseType a
      else Nothing
  zipMatch (Fun a1 a2) (Fun b1 b2) =
    Just $ Fun (Right (a1, b1)) (Right (a2, b2))
  zipMatch _ _ = Nothing

Now, I prefer the following style instead:

instance Unifiable TypeF where
  zipMatch a b =
    case a of
      BaseType a' -> do
        BaseType b' <- return b
        guard $ a' == b'
        return $ BaseType a'
      Fun a1 a2 -> do
        Fun b1 b2 <- return b
        return $ Fun (Right (a1, b1)) (Right (a2, b2))

Why? First, I really don’t like multi-clause definitions. But the main reason is that the second definition behaves more reliably when we add new constructors to TypeF. Namely, if we enable ghc warnings (-Wall) and extend TypeF to include tuples:

data TypeF a = BaseType String
             | Fun a a
             | Tuple a a
  deriving (Functor, Foldable, Traversable)

… we’ll get a warning telling us not to forget to implement zipMatch for tuples:

warning: [-Wincomplete-patterns]
    Pattern match(es) are non-exhaustive
    In a case alternative: Patterns not matched: (Tuple _ _)

If we went with the first version, however, we would get no warning, because it contains a catch-all clause

  zipMatch _ _ = Nothing

As a result, it is likely that we forget to update zipMatch, and our tuples will never unify.

This is a common mistake people make when implementing binary operations in Haskell, so I just wanted to point it out. But other than that, both definitions are verbose and boilerplate-heavy.

And it goes without saying that in real-life situations, the types we want to unify tend to be bigger, and the boilerplate becomes even more tedious.

For instance, I’ve been working recently on implementing type inference for the nstack DSL, which includes tuples, records, sum types, optionals, arrays, the void type, and many primitive types. Naturally, I wasn’t eager to write zipMatch by hand.

Generic Unifiable

Generic programming is a set of techniques to avoid writing boilerplate such as our implementation of zipMatch above.

Over the years, Haskell has acquired a lot of different generic programming libraries.

For most of my generic programming needs, I pick uniplate. Uniplate is very simple to use and reasonably efficient. Occasionally I have a problem that requires something more sophisticated, like generics-sop to parse YAML or traverse-with-class to resolve names in a Haskell AST.

But none of these libraries can help us to implement a generic zipMatch.

Consider the following type:

data TypeF a = Foo a a
             | Bar Int String

A proper zipMatch implementation works very differently for Foo and Bar: Foo has two subterms to unify whereas Bar has none.

But most generics libraries don’t see this difference between Foo and Bar. They don’t distinguish between polymorphic and non-polymorphic fields. Instead, they treat all fields as non-polymorphic. From their point of view, TypeF Bool is exactly equivalent to

data TypeF = Foo Bool Bool
           | Bar Int String

Luckily, there is a generic programming library that lets us “see” type parameters. Well, just one type parameter, but that’s exactly enough for zipMatch. In other words, this library provides a generic representation for type constructors of kind * -> *, whereas most other libraries only concern themselves with ordinary types of kind *.

What is that library called? base.

Seriously, starting from GHC 7.6 (released in 2012), the base library includes a module GHC.Generics. The module consists of:

  1. Several types (constants K1, parameters Par1, sums :+:, products :*:, compositions :.:) out of which we can build different algebraic types of kind * -> *.
  2. A class for representable algebraic data types, Generic1:

    class Generic1 f where
      type Rep1 f :: * -> *
      from1  :: f a -> (Rep1 f) a
      to1    :: (Rep1 f) a -> f a

    The associated type synonym Rep1 maps an algebraic data type like TypeF to an isomorphic type composed out of the primitives like K1 and :*:. The functions from1 and to1 allow converting between the two.

The compiler itself knows how to derive the Generic1 instance for eligible types. Here is what it looks like:

{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}
{-# LANGUAGE DeriveGeneric #-}

import GHC.Generics (Generic1)

data TypeF a = BaseType String
             | Fun a a
             | Tuple a a
  deriving (Functor, Foldable, Traversable, Generic1)

So, in order to have a generic Unifiable instance, all I had to do was:

  1. Implement Unifiable for the primitive types in GHC.Generics.
  2. Add a default zipMatch implementation to the Unifiable class.

You can see the details in the pull request.

Complete example

Here is a complete example that unifies a -> (c, d) with c -> (a, b -> a).

{-# OPTIONS_GHC -Wall #-}
{-# LANGUAGE DeriveFunctor, DeriveFoldable,
             DeriveTraversable, DeriveGeneric,
             DeriveAnyClass, FlexibleContexts #-}

import Data.Functor.Identity
import Control.Unification
import Control.Unification.IntVar
import Control.Unification.Types
import Control.Monad.Trans.Except
import Control.Monad.Trans.Class
import qualified Data.Map as Map
import GHC.Generics

data TypeF a = BaseType String
             | Fun a a
             | Tuple a a
  deriving (Functor, Foldable, Traversable, Show, Generic1, Unifiable)
  --                                              ^^^^^^^^^^^^^^^^^^^
  --                                           the magic happens here

unified :: IntBindingT TypeF Identity
    (Either
       (UFailure TypeF IntVar)
       (UTerm TypeF String))
unified = runExceptT $ do
  a_var <- lift freeVar
  b_var <- lift freeVar
  c_var <- lift freeVar
  d_var <- lift freeVar
  let a = UVar a_var
      b = UVar b_var
      c = UVar c_var
      d = UVar d_var
      term1 = UTerm (Fun a (UTerm $ Tuple c d))
      term2 = UTerm (Fun c (UTerm $ Tuple a (UTerm $ Fun b a)))

  result <- applyBindings =<< unify term1 term2

  -- replace integer variable identifiers with variable names
  let all_vars = Map.fromList
        [ (getVarID a_var, "a")
        , (getVarID b_var, "b")
        , (getVarID c_var, "c")
        , (getVarID d_var, "d")
        ]

  return $ fmap ((all_vars Map.!) . getVarID) result

main :: IO ()
main = print . runIdentity $ evalIntBindingT unified


The output:

Right (Fun "c" (Tuple "c" (Fun "b" "c")))

June 17, 2017 08:00 PM

Douglas M. Auclair (geophf)

May 2017 1Liners 1HaskellADay

  • May 10th, 2017:
    Define (^) :: (a -> a) -> Int -> (a -> a)
    The 'power'-function where f ^ 3 = f . f . f
    • Conor McBride @pigworker flip ((ala Endo foldMap .) . replicate)
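For comparison, a direct definition of the same function, under a hypothetical name pow so it does not clash with the numeric (^):

```haskell
-- Compose a function with itself n times, so that pow f 3 = f . f . f
pow :: (a -> a) -> Int -> (a -> a)
pow f n = foldr (.) id (replicate n f)

-- pow (+1) 3 0 evaluates to 3
```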

by geophf at June 17, 2017 04:26 AM

June 16, 2017

Russell O'Connor

Some Random Thoughts of an Advanced Haskeller

Recently I was thinking about a programming problem that would need access to random values. I thought it might be fun to write up my thought process as an advanced Haskeller while working through this particular problem.

In Haskell, one would write such a program by using a random monad to access an oracle providing random numbers. The traditional way to implement MonadRandom is using the state monad. The Gen type holds the state of the (pseudo-)random number generator, and the randomInt function returns a new random number and updates the state of the generator.

type Random a = State Gen a

randomInt :: Gen -> (Int,Gen)

randomOracle :: Random Int
randomOracle = state randomInt

Then I write my program inside the Random monad, making calls to the randomOracle as needed

myProg :: Random Result
myProg = do
  {- ... -}
  x <- randomOracle
  {- ... -}
  y <- randomOracle
  {- ... -}

In order to run my program, I need to provide it with a random seed.

evalRandom :: Random result -> Gen -> result
evalRandom = evalState

For deterministic testing, we can pass fixed generators to evalRandom. If we use StdGen, we can map our Random program to IO and use the system random number generator.

type Gen = System.Random.StdGen
randomInt = System.Random.random

evalRandomIO :: Random result -> IO result
evalRandomIO = getStdRandom . runState

For the most general possible random number generator, the type for the generator state is simply an infinite stream of random values.

data Stream a = Cons a (Stream a)

unfoldStream :: (g -> (a, g)) -> g -> Stream a
unfoldStream next = go
 where
  go seed = Cons value (go nextSeed)
   where
    (value, nextSeed) = next seed

type Gen = Stream Int
randomInt (Cons hd tl) = (hd, tl)

evalRandomStdGen :: Random result -> StdGen -> result
evalRandomStdGen rr = evalRandom rr . unfoldStream System.Random.random

evalRandomIO :: Random result -> IO result
evalRandomIO rr = evalRandomStdGen rr <$> newStdGen
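
To make the Stream-backed generator concrete, here is a base-only sketch. The names takeStream and lcgStep are mine, and lcgStep is a toy linear congruential step standing in for System.Random.random, so the example runs without the random package:

```haskell
-- A self-contained sketch using only base. `lcgStep` is a hypothetical
-- stand-in for System.Random.random (glibc-style constants).
data Stream a = Cons a (Stream a)

unfoldStream :: (g -> (a, g)) -> g -> Stream a
unfoldStream next = go
  where
    go seed = Cons value (go nextSeed)
      where
        (value, nextSeed) = next seed

-- toy pseudo-random step: next value and next seed are the same number
lcgStep :: Int -> (Int, Int)
lcgStep s = let s' = (1103515245 * s + 12345) `mod` 2147483648 in (s', s')

-- take the first n values off the infinite stream
takeStream :: Int -> Stream a -> [a]
takeStream 0 _           = []
takeStream n (Cons x xs) = x : takeStream (n - 1) xs
```

The generator is deterministic by construction: unfolding from the same seed always yields the same stream, which is exactly what makes this representation convenient for testing.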

Before, when I was an intermediate Haskeller, I would probably have stopped at this point, pretty satisfied. And let me be clear that this is a fine solution. However, now that I am an advanced Haskeller, I cannot help but feel a little dissatisfied with it. The problem with this implementation of the Random monad is that the type is too broad. Since the Random type is the State monad, there are operations allowed by the type that should not be allowed for a Random program. For instance, within the Random type, someone could store the state of the generator and restore it later, causing random values to be replayed, or someone might completely replace the state of the generator with their own value.

One way to solve this problem is to use Haskell’s module system to abstract the monad and only expose the randomOracle operation. While this is a reasonable solution (in fact it is a very good solution as we will see), it would be nicer if we could instead use the type system to create a monad that is only capable of representing the programs we want to allow, and disallows other programs that would try to manipulate the generator state in ways we do not want. Essentially we want our Random programs to only be able to query the random oracle, and that is it. After reflecting on this problem and the various kinds of monads I know about, I realized that a suitable free monad captures exactly this notion of providing only an operation to query the random oracle. Specifically we want Control.Monad.Free.Free (Reader Int), or more directly (but more obtusely) written as Control.Monad.Free.Free ((->) Int). We truly want a free monad because any sequence of responses from the random oracle is valid.

One problem with this Free monad is that the bind operation can be slow, because it needs to traverse a possibly long data structure. There are several solutions to this, but for this particular free monad, I happen to know that the Van Laarhoven free monad representation is possible: the type forall m. Monad m => m Int -> m a is isomorphic to Control.Monad.Free.Free ((->) Int) a.

newtype Random a = Random { instantiateRandom :: forall m. Monad m => m Int -> m a }

instance Monad Random where
  return a = Random $ \_ -> return a
  ma >>= mf = Random . runReaderT
            $ ReaderT (instantiateRandom ma) >>= ReaderT . instantiateRandom . mf
instance Applicative Random where
  pure = return
  (<*>) = ap

instance Functor Random where
  fmap = liftM

In this representation, the random oracle function just fetches the argument.

randomOracle :: Random Int
randomOracle = Random id
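
To see the Van Laarhoven representation in action, here is a base-only sketch that restates the definitions above and instantiates a program directly at IO, with the oracle reading canned answers from an IORef. The names demo and runWithList are my own:

```haskell
{-# LANGUAGE RankNTypes #-}
-- Self-contained sketch: the Van Laarhoven Random monad, interpreted
-- at IO against a fixed list of oracle answers (deterministic testing
-- without any State monad machinery).
import Data.IORef

newtype Random a = Random { instantiateRandom :: forall m. Monad m => m Int -> m a }

instance Functor Random where
  fmap f (Random g) = Random $ \oracle -> fmap f (g oracle)

instance Applicative Random where
  pure a = Random $ \_ -> pure a
  Random gf <*> Random ga = Random $ \oracle -> gf oracle <*> ga oracle

instance Monad Random where
  Random ga >>= f = Random $ \oracle -> ga oracle >>= \a -> instantiateRandom (f a) oracle

randomOracle :: Random Int
randomOracle = Random id

-- a tiny example program: the sum of two oracle queries
demo :: Random Int
demo = (+) <$> randomOracle <*> randomOracle

-- interpret a Random program against a fixed list of oracle answers
runWithList :: Random a -> [Int] -> IO a
runWithList rr answers = do
  ref <- newIORef answers
  let oracle = do
        ys <- readIORef ref
        case ys of
          (h:t) -> writeIORef ref t >> pure h
          []    -> error "oracle exhausted"
  instantiateRandom rr oracle
```

For example, `runWithList demo [3, 4]` answers the two oracle queries with 3 and 4, so it returns 7.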

For deterministic testing purposes we can evaluate the random monad by instantiating it with our state monad from before.

evalRandom :: Random result -> Stream Int -> result
evalRandom rr = evalState . instantiateRandom rr . state $ \(Cons hd tl) -> (hd, tl)

However, we can also directly instantiate it with the IO monad for production purposes.

evalRandomIO :: Random result -> IO result
evalRandomIO rr = instantiateRandom rr System.Random.randomIO

This is all very good; however, the advanced Haskeller in me still thinks that I ought to be able to write evalRandom without the need to invoke the State monad. This is because if we were actually using the Free ((->) Int) monad, I would be writing evalRandom using iterA.

iterA :: (Applicative p, Functor f) => (f (p a) -> p a) -> Free f a -> p a 

evalFreeRandom :: Free ((->) Int) result -> Stream Int -> result
evalFreeRandom = iterA (\fpa (Cons hd tl) -> fpa hd tl)

No need for any state monad business in evalFreeRandom. How can I get that elegant solution with the Van Laarhoven free monad? One way would be to convert to the Free ((->) Int) representation

freeRandom :: Random result -> Free ((->) Int) result
freeRandom rr = instantiateRandom rr (liftF id)

evalRandom :: Random result -> Stream Int -> result
evalRandom = evalFreeRandom . freeRandom

Surely there must be a way to do this directly?

Before solving this I turned to another interesting problem. The iterA function recurses over the free monad structure to do its evaluation. The Free monad comes with its own general recursive elimination function called foldFree

foldFree :: Monad m => (forall x. f x -> m x) -> Free f a -> m a

This foldFree function captures everything about the free monad, so it should be possible to write iterA by using foldFree to do all the recursion. But how is that even possible? The types of foldFree and iterA look too far apart: foldFree requires a natural transformation as input, and the arguments to iterA provide nothing resembling one.

To solve this I turned to the #haskell IRC channel for help. With assistance from glguy, ReinH, and byorgey, I applied the well-known idea that if you want to turn something to or from a natural transformation, you use some sort of Yoneda / continuation-like construction. In this particular case, the Cont (p a) monad seems to capture what we need. Following the types (and forgetting about the semantics), we end up with the following.

iterA :: (Applicative p, Functor f) => (f (p a) -> p a) -> Free f a -> p a 
iterA h ff = runCont (foldFree (\fx -> Cont $ \k -> h (k <$> fx)) ff) pure
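
To convince ourselves this is really the same function, here is a self-contained check with a hand-rolled Free and Cont (mirroring the versions in the free and transformers packages, so only base is needed). The names iterA', prog, and consume are mine:

```haskell
{-# LANGUAGE RankNTypes #-}
-- Hand-rolled Free and Cont, plus iterA written two ways: by direct
-- recursion, and by delegating all recursion to foldFree via Cont.
data Free f a = Pure a | Wrap (f (Free f a))

instance Functor f => Functor (Free f) where
  fmap g (Pure a)  = Pure (g a)
  fmap g (Wrap fx) = Wrap (fmap (fmap g) fx)

instance Functor f => Applicative (Free f) where
  pure = Pure
  Pure g  <*> x = fmap g x
  Wrap fg <*> x = Wrap (fmap (<*> x) fg)

instance Functor f => Monad (Free f) where
  Pure a  >>= k = k a
  Wrap fx >>= k = Wrap (fmap (>>= k) fx)

foldFree :: (Functor f, Monad m) => (forall x. f x -> m x) -> Free f a -> m a
foldFree _  (Pure a)  = pure a
foldFree nt (Wrap fx) = nt fx >>= foldFree nt

newtype Cont r x = Cont { runCont :: (x -> r) -> r }

instance Functor (Cont r) where
  fmap g (Cont c) = Cont $ \k -> c (k . g)

instance Applicative (Cont r) where
  pure a = Cont ($ a)
  Cont cf <*> Cont ca = Cont $ \k -> cf (\g -> ca (k . g))

instance Monad (Cont r) where
  Cont ca >>= f = Cont $ \k -> ca (\a -> runCont (f a) k)

-- direct recursion over the Free structure
iterA :: (Functor f, Applicative p) => (f (p a) -> p a) -> Free f a -> p a
iterA _ (Pure a)  = pure a
iterA h (Wrap fx) = h (fmap (iterA h) fx)

-- the same function, with all recursion delegated to foldFree
iterA' :: (Functor f, Applicative p) => (f (p a) -> p a) -> Free f a -> p a
iterA' h ff = runCont (foldFree (\fx -> Cont $ \k -> h (k <$> fx)) ff) pure

-- a toy oracle program: ask for two Ints and add them
prog :: Free ((->) Int) Int
prog = Wrap (\i -> Wrap (\j -> Pure (i + j)))

-- interpret queries by consuming a list of answers (here p = (->) [Int])
consume :: (Int -> ([Int] -> a)) -> [Int] -> a
consume fpa (x:xs) = fpa x xs
consume _   []     = error "answers exhausted"
```

Running both `iterA consume prog` and `iterA' consume prog` on the answer list [3, 4] yields 7, as expected.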

As an aside, glguy has a more “natural” solution, but it technically has a less general type, so I will not talk about it here, even if I do feel it is better.

Turning back to our original problem of directly writing evalRandom without using the state monad, we can try to see if Cont will solve our problem.

evalRandom :: Random result -> Stream Int -> result
evalRandom rr = runCont (instantiateRandom rr (Cont $ \k (Cons hd tl) -> k hd tl)) const

We can compare the Cont solution to the State solution and see that the code is pretty similar.

evalRandom :: Random result -> Stream Int -> result
evalRandom rr = evalState (instantiateRandom rr (state $ \(Cons hd tl) -> (hd, tl)))

The inner Cont construction looks very similar to the inner state construction. The outer call to const in the Cont solution discards the final "state" which captures the same functionality that evalState has for the State solution. Now we can ask, which has better performance, the State solution, with its tupling and untupling of values, or the Cont solution which uses continuations instead? I will leave it to the GHC experts to figure that one out.

Arguably most of this is an exercise in academics, but it only took me an hour or three to go through this whole thought process. As an advanced Haskeller, I have slowly gathered, over many years, experience with these sorts of abstractions so that it starts becoming easy to do this sort of reasoning. While it may or may not matter much for my particular application, eventually this kind of reasoning becomes important. For example, the modern stream fusion in GHC exploits constructions that resemble this kind of reasoning, and that has had a big impact on performance.

For the non-advanced Haskellers out there, do not be deterred. Keep practicing your craft; keep reading about new abstractions, even if you do not fully get it; keep looking out for potential applications of those abstractions to solidify your understanding. Eventually you will have lots of very powerful problem solving tools at your disposal for making safe software.

June 16, 2017 11:45 AM

June 15, 2017

Michael Snoyman

A Very Naive Overview of Exercise (Part 3)

This blog post is part 3 of a series on nutrition and exercise. If you haven't seen them already, I recommend reading part 1 now, which provides a general overview, and part 2 detailing nutrition. This blog post will go into more details on exercise.

I'm going to break down exercise into three broad categories:

  • Resistance training
  • Cardio
  • Mobility/flexibility

These categories can overlap. For example, a weighted squat could be seen as both resistance training and mobility work. Circuit training could be seen as cardio and resistance. But typically there are distinct benefits for each category, and fairly distinct activities that achieve those goals.

For the completely impatient, here are my recommendations on where you should get started. I strongly encourage reading the rest of the post so that these recommendations make sense and you can tweak them for your own personal needs:

  1. Perform bodyweight exercises three days a week. A simple routine will include exercises from the squat, pushup, pullup, and leg raise progressions.
  2. Run at least twice a week. I would focus on high-intensity sprinting, such as running as fast as you can for 20 seconds, resting for 40 seconds, and repeating for 5 sprints.
  3. Stay active regularly. Try to find excuses to get out and walk, take a bike ride, go for a swim, or just play with your kids.

Health vs fitness

Before diving into the details, I want to talk about two related but distinct terms. Definitions on these two terms vary quite a bit, but I'd like to give my own simplified definitions based on the input of many other sources:

  • Health is a measure of your ability to live life without sickness, crippling weakness, premature death, or other debilitating conditions.
  • Fitness is a measure of your ability to perform tasks. In our context, we're talking about the ability to perform specific physical feats, such as running a mile in a certain amount of time, bench pressing a certain amount of weight, etc.

What I'm trying to get across in these definitions is that health is about reaching a baseline where your body is not working against you. By contrast, fitness lets you push the boundaries of what you're capable of.

Often times, these go hand in hand. Being able to run a mile in 15 minutes, for instance, is a good indication that you are not suffering from any respiratory conditions, your bones are strong enough to withstand the impact of running, you have decent lower body muscle mass, and so on.

However, these two concepts can and do diverge. The ability to deadlift 300kg (660lbs) is not by any reasonable standard a prerequisite for a healthy body, but certainly measures fitness. Running a 4 minute mile is an amazing feat of prowess in fitness, but doesn't really tell me you're healthier than the person running an 8 minute mile.

I point this distinction out here because this series of posts is intended to cover health, and using nutrition and exercise to achieve it. It is very tempting to get caught up in numbers and goals that measure fitness, while throwing health to the wind. For the most trivial example of this: taking steroids to improve your powerlifting numbers will certainly improve your fitness. However, I'd argue pretty strongly against it, since it's bad for your health.

All that said, there's nothing wrong with pursuing fitness goals, and as I mentioned in why I lift, doing so can be a lot of fun. Having something to compete against—even yourself—is a huge motivator. Just make sure you're not sacrificing your health in the process.

Resistance training

This is also known as strength training. Let's rip off the definition straight from Wikipedia:

Strength training is a type of physical exercise specializing in the use of resistance to induce muscular contraction which builds the strength, anaerobic endurance, and size of skeletal muscles.

The term strength training tells us the why, whereas resistance training hints more at how we achieve these goals. Resistance training involves exerting your muscles against some external resistance. Probably the most emblematic version of this is resisting against gravity in the form of lifting weights, but we'll see that there are many other approaches available.


This may just be my own personal experience that others have not shared, but growing up I always had the impression that training for strength was somehow bad. Lifting weights was a vain pursuit of bigger muscles, and real health benefits only came from cardio like jogging.

If you never had these misconceptions, congratulations. I certainly did. And in case others do as well, let me dispel them:

  • Muscle mass has a protective effect on your body. For example, if you have more muscle, you can withstand a larger impact.
  • If you're capable of moving larger weights, then day to day activities are easier. For example, if you can deadlift 100kg, then picking up your 30kg child is a much easier activity, and won't exhaust you as quickly.
  • Strength training doesn't just increase muscle mass; it also increases your bone density and strengthens your tendons. This makes strength training a great way to fight off osteoporosis, making it a vital activity for older people, and especially older women. (Unfortunately, this is the group most likely to not bother strength training.)
  • While strength training doesn't burn as many calories as cardio, it does encourage your body to use calories consumed to build and maintain muscle mass instead of fat mass. This means you can get away with eating some level of extra calories without gaining fat.
  • Because strength training uses up muscle glycogen, it can be a great way to help control blood glucose levels. After a heavy training session, your muscles will be primed to absorb glucose to rebuild glycogen, instead of leaving the glucose in your blood to convert into fat or (in the case of diabetics) simply harm your body with toxic glucose levels.
  • Increased strength can help avoid injuries. Prior to 2016, despite no longer being overweight and having a decent strength base, I was constantly throwing out my back from normal day-to-day activities (like sitting at a computer for too long). This was my biggest motivation for getting into weight lifting that year, and my back has been much happier since.
  • Strength training helps improve many health markers, like blood lipid profiles (cholesterol) and hormone levels.

That's a lot of benefits, and it's far from a complete list. You may not relate to all of the points above, but hopefully it makes the point that strength training is not just for young guys wanting to impress people with their biceps. Strength training is a vital component of good health for everyone, regardless of age or gender.


All strength/resistance training fits into the same basic idea. You want to move some part of your body by contracting a muscle. You want to use some technique to make that contraction difficult so that your muscle has to struggle. By challenging the muscle, you trigger—through various pathways—your body to:

  • Make the muscle stronger
  • Increase toughness of the tendons
  • Increase bone density

These benefits occur during recovery, or the time after you stop exercising. This is important: if you keep exercising non-stop for days on end, you will get weaker, not stronger. The formula then is:

  • Perform exercise against resistance
  • Rest/recover
  • Repeat

This kind of exercise is anaerobic, meaning "without air." Because resistance training is short bursts of heavy intensity, it mostly relies upon glycogen for energy, which can be burned without oxygen. This may seem to imply that resistance training has no benefits on the cardiovascular (heart and lung) system, and doesn't help burn fat (which requires oxygen to break down). Neither of these is true, however. During the recovery phase, your body will need to rely on your fat stores to provide energy to rebuild muscles, which will put demands on the cardiovascular system to provide additional oxygen.


Last bit of theory here before we dive into how to do all of this. Another way of looking at exercise is a stress we are applying to our body. Stress has a bad rap, and for good reason: chronic stress, such as we experience in our daily life from work and continual electronic stimulation, is damaging. However, in small doses, stress is wonderful for our body.

When we temporarily stress our body, it provides a stimulus for our body to get better, so it is able to more easily handle the stress in the future. Stressing our muscles causes them to get stronger. Stressing our bones makes them more dense. And stressing our cardiovascular system with extra oxygen demands makes our heart and lungs more efficient.

Temporary stress with proper recovery is the very heart of exercise, and will carry through to everything in this post.


OK, let's actually talk about some exercises! The most easily accessible form of resistance training is body weight exercises, or bodyweights. The concept here is simple: use your own body and gravity to provide a resistance for your muscles to exert against.

Probably the most famous example of this is the pushup. You are pushing against the ground with your arm, shoulder, and chest muscles to create enough force to move your body against gravity. Your own body weight is working against your muscles.

If you read the word "pushup" and thought "I can't do that," no need to worry. Bodyweight exercises usually follow some form of progression, where you can start with easier forms of the exercise and gradually move to more difficult versions. Taking a pushup as an example, a progression may look something like:

  1. Stand in front of a wall and push your body away from it
  2. Put your hands on a table and push up from that position
  3. Do pushups with your knees on the ground
  4. A standard pushup, with only your feet and hands touching the ground
  5. Put your feet on a stool and push up
  6. Put your feet high on a wall and perform a vertical pushup

There are other variations you can perform: changing the width of your grip by putting your hands closer or farther apart to focus on different muscles. You can also follow a one-arm pushup progression instead of a vertical pushup progression. Vertical pushups put more stress on your shoulder muscles, while one-arm pushups put more focus on your chest muscles.

If all of this sounds confusing and a bit daunting, don't worry. Some very helpful people online have already created programs around bodyweights. Some references:

All of these routines follow the same basic principles: use compound movements to target all of your major muscle groups, progressively overload those muscles, and provide ample time for recovery. If those terms are confusing, don't worry, there are sections below dedicated to explaining them.

If you're feeling overwhelmed or confused, let me remind you of something from the first post in this series: don't let confusion get in your way! These are all great routines, and doing something is better than doing nothing. Pick something and do it for a few weeks, and after you get comfortable, you'll be ready to make a more informed decision about how you want to proceed.

Let's see how bodyweight exercises stack up against the alternatives:

Advantages:

  • Requires little to no equipment, making it an easy method to start with or use on the road
  • Less risk of injury vs free weights, since there's no barbell trying to crush you. (Notice I said less, not none. Be careful.)
  • Because you are working against your own body weight, reducing your body fat makes your bodyweight exercises more successful. Typically, practitioners of bodyweight routines will be leaner than weight lifters.

Disadvantages:

  • Increasing intensity is more complicated than simply adding more weight to a bar
  • Some muscles groups are difficult to properly stress. While you can get a pretty good shoulder workout with vertical pushups, it's difficult to develop your posterior chain (hamstrings, glutes, and lower back) with bodyweights. This was the reason I started weight lifting in the first place.

Weight lifting

Weight lifting is the act of moving some external weight against gravity (or sometimes against friction and inertia). The category breaks down broadly into machines and free weights. Free weights are things like barbells, dumbbells, and kettlebells. For those unfamiliar with these terms:

  • A barbell is a long metal bar (about 2 meters or 6 feet) that you usually hold with both hands.
  • A dumbbell is a shorter metal bar, usually held in one hand.
  • A kettlebell is a weight with a handle on the top.
  • A machine is some kind of, well, machine.

Free weights have an advantage over machines in that they are unstable. This means you need to use more muscle groups to keep control of the weight. By contrast, a machine keeps the weight in more or less a straight line, which takes some of the stress off of your body. Additionally, machines are usually easier to learn to use and less dangerous.

If you're too intimidated by free weights, by all means start right away with machines. But if you avoid free weights indefinitely, you're limiting yourself significantly. I strongly recommend you get comfortable with using a barbell. Start with low weights and build up slowly. Focus on getting the movements correct (aka good form), and slowly build up to heavy weights (where heavy is a personal assessment of what is difficult for you).

If you're going to pursue a machine-based routine, I'd recommend speaking with one of the trainers at the gym you're attending. I'm not familiar with good machine-based routines available online, and it will depend a lot on what equipment you have available.

If you want to get started with free weights, there are two very popular routines to consider: Starting Strength and StrongLifts.

If you go to popular weight lifting forums, you'll see a lot of flamewars between these two routines. To cut through some of this: Starting Strength was the original program, is designed by a coach (Mark Rippetoe) with a huge amount of experience training individuals, and was groundbreaking when first released. StrongLifts is basically a variation of Starting Strength and doesn't have as much experience to back it up.

Based on that, it would seem that Starting Strength is the way to go. I personally decided to go with StrongLifts, and my reasons were:

  • It has a really nice smartphone app. Yes, I'm that shallow, but it makes it dead simple to get started
  • StrongLifts uses a barbell row in place of a power clean. I agree with the StrongLifts creator (Mehdi) that the latter is more complicated to learn, and that the former is a great upper back exercise missing from Starting Strength.

I'm sure these reasons sound shallow, and some people will be upset with this. But the reality is: do whichever routine you want (or a completely different one). As long as you're lifting, you're better off.

And one word of warning from my own experience: don't become so obsessed with progressing through the program that you ignore your body's complaints. I trained to injury a few times because I ignored pain and put on extra weight when I shouldn't have. Don't be stupid!

Resistance bands

I'm not going to say much about these, since I haven't really used them. But I wanted to make it clear that there are drastically different approaches to resistance training. Resistance bands are pieces of rubber which you can stretch, and which become harder to stretch the further you've pulled them. You can use them in place of weights for many kinds of workouts. Your body doesn't care what's causing the resistance. It just wants something to be resisting it.

There's a YouTube channel which I find very beginner-friendly, called "Picture Fit." Here are two videos I recommend watching that summarize the three categories mentioned:

Combine them!

I've presented the information so far as a choice among competitors. This is far from the case. Many of these techniques can be combined to gain the advantages of each. For example, consider a workout routine consisting of:

  • Bench press (free weight)
  • Pushups (body weight)
  • Seated press machine (machine, duh)
  • Overhead band (resistance bands)

There's no reason to avoid mixing and matching. However, building your own routine is a more advanced activity. When you're getting started, I recommend choosing one of the routines I linked to above and sticking to it until you get comfortable with the exercises.

Sets and reps

Let's talk nomenclature. A rep is short for a repetition, and it describes performing one complete exercise. For example, with a pushup, a repetition consists of lowering your body to the ground and raising yourself back up to the starting position.

A set is a collection of repetitions performed without rest. For example, a set may consist of 8 reps.

Often times, workout programs will be given in terms of sets and reps like so:

  • Pushups 3x8
  • Bench press 3x5
  • Overhead press 1xF

This means:

  • Perform three sets of eight repetitions of pushups
  • Perform three sets of five repetitions of bench press
  • Perform one set of overhead press to failure (as many reps as you can do)

You'll also need to consider how long to rest between sets. Usually your program will tell you this. Valid answers here can be as little as 30 seconds and as much as 5 minutes. Typically different rest periods will work your body in different ways: shorter rest gives more endurance training, whereas longer rest gives more strength gains.

Compound vs isolation

Think of a bench press: you're lying on your back with a barbell over you. You bend your elbows, your wrists bend, and your shoulder joints activate. You push back up using your chest muscles, your shoulder muscles, and your arm muscles (triceps in particular).

Now think of a bicep curl: you hold a dumbbell in your hand and you bend your elbow.

The former is called a compound movement: it involves multiple muscle groups moving multiple joints in your body. The latter is an isolation exercise: it targets just one muscle group via one joint.

Generally speaking, you'll get better results by focusing on compound movements. They stress the body more, and in more natural ways. They lead to more balanced development of muscles. And they are more time efficient: you work more muscles in less time.

That's not to say you should never use isolation exercises, but in my opinion they should be considered accessories to the main compound movements. Use them to help develop weak spots in your strength.

You'll notice that the routines I listed above all focus on compound movements. That's not by chance.

Progressive overload

If you do 10 pushups a day for the rest of your life, after a certain point you aren't going to get stronger. In order to reap the full benefits of strength training, you need to progressively overload your muscles by increasing the stress/stimulus. You can do this in multiple ways:

  • Adding more weight to the bar/machine
  • Doing more reps
  • Doing more sets
  • Changing the tempo (slower exercises are harder)
  • Changing the exercise you're doing (full pushups vs knee pushups)

A good program will build in this kind of progressive overload, as do the programs I linked to above. The basic idea is to avoid stagnating by constantly challenging yourself to improve.

Plate math

In order to modify the weight of a barbell, we can add extra weight to it. These weights come in the form of plates, circular pieces of metal—sometimes rubberized—that are put on the sides of the bar.

If you're going to be doing barbell exercises, it's important to get comfortable with adding up weights, also known as plate math. I'll start with the metric system, since it's easier to handle, and what I use.

A standard barbell weighs 20kg. The plates you'll put on the barbell must be balanced: you put the same amount on the left and right side. If you put a 10kg and 5kg weight on each side, you'll end up with:

  • 20kg bar
  • 10kg times 2 (one per side) = 20kg
  • 5kg times 2 (one per side) = 10kg
  • Total 20+20+10=50kg

I find it easiest in most cases to add up the weight per side of the bar, double it, and add 20. So in the above example, I'd do "10 + 5 = 15, 15 * 2 = 30, 30 + 20 = 50." This is just arithmetic, so don't get too hung up on it, and do what's comfortable.

Now let's do this in reverse. Suppose you're planning on benching 70kg. In order to figure out what to put on the bar, you would do this:

  • 70kg - 20kg for the bar = 50kg in plates
  • 50kg total plates / 2 = 25kg in plates per side
  • Start finding the largest plates that will add up to your number. In this case, you're probably looking at a 20kg and 5kg.

Try not to just match the total weight, but also the plate distribution. In other words, don't put a 20kg on one side of the bar and 4 5kg plates on the other. That will feel unbalanced. Most gyms will have plates of size 20kg, 10kg, 5kg, 2.5kg, and 1.25kg. Some may also have 25kg and 15kg.
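
The forward and reverse calculations above can be captured in a few lines of Haskell. This is a sketch with names of my own choosing; the greedy search assumes the plate sizes are listed largest-first, matching the "find the largest plates first" rule:

```haskell
-- barWeight: weight of the empty bar (20 for a metric barbell)
-- target:    total weight you want on the bar
-- sizes:     available plate sizes, largest first
-- Returns the plates to load on EACH side, or Nothing if the target
-- cannot be hit exactly with the given sizes.
platesPerSide :: Double -> Double -> [Double] -> Maybe [Double]
platesPerSide barWeight target sizes = go ((target - barWeight) / 2) sizes
  where
    go 0 _ = Just []
    go need (p:ps)
      | p <= need = (p :) <$> go (need - p) (p : ps)  -- reuse the same size
      | otherwise = go need ps                        -- plate too big, try smaller
    go _ [] = Nothing

-- the reverse direction: total weight from the per-side loading
totalWeight :: Double -> [Double] -> Double
totalWeight barWeight perSide = barWeight + 2 * sum perSide

metricSizes :: [Double]
metricSizes = [20, 10, 5, 2.5, 1.25]
```

For the 70kg bench example, `platesPerSide 20 70 metricSizes` gives `Just [20, 5]` per side, and `totalWeight 20 [10, 5]` recovers the 50kg total from the first example.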

You may also hear people say things like "squatting 2 plate," or on stranger parts of the internet, "2pl8." This means you have 2 20kg plates per side of the barbell. Why 20kg? Convention. Do the math, I'll give you the total weight for this at the end of this section.

For you Americans, the numbers are slightly different. Instead of weighing 20kg, a standard barbell weighs 45lbs, which is just slightly more than 20kg (20.4kg). And the plates come in sizes of 45lbs, 35lbs, 25lbs, 10lbs, 5lbs, and 2.5lbs. As a developer, I love the power-of-2 system employed by the metric plates, but if you have to use imperial measurements, just get used to doing the math.

This has the funny side-effect that if you say "I squatted 2 plate," it means something different between America and the rest of the world. (Go ahead and figure out what that total pound value is.) The numbers are close, but not exactly the same.

Answer: 2 plate is 100kg, or 225lbs.

Importance of proper form

You'll read this just about everywhere that discusses weight lifting, but I'll say it here too: using proper form on your lifts is absolutely crucial. Using proper form will:

  • Ensure you are getting the full value from your workout
  • Help you avoid injuries
  • Make sure you don't end up in an embarrassing video on YouTube

There are two particular points of proper form that I want to point out:

  • The act of lowering the weight is known as the eccentric portion of the exercise. It is common to see people lose control of the weight during this portion. If you do this, you are hindering your progress dramatically! Most of the muscle tearing that leads to muscle regrowth occurs during the eccentric portion. Lowering the weight in a controlled, steady pace is difficult, but well worth it.
  • Be sure to follow full range of motion. You'll often hear people say they don't want to squat to parallel because it will injure their knees. This is in fact a myth: squatting with insufficient depth leads to muscular imbalances and injuries. (I'm well aware that I haven't actually described how to do a squat in this post; please see the linked routines above, which describe how to squat properly.)

Full body vs splits

A full body workout is a routine that exercises all (or most) muscle groups each day you train. A split routine somehow splits up days of the week to specific muscle groups. There are many tradeoffs between these two approaches, and I won't be able to cover them all here. But here's a basic idea: you should always have a day of rest between training a specific muscle group. But having too many rest days in between is limiting your growth potential.

If you're going to work out three days a week, you can do a full body routine each of those days and have 1 or 2 days of rest in between. By contrast, if you're going to work out 6 days a week, doing a full body routine each day won't give you any time to rest and recover.

The routines above are all full body routines. That's probably the right place to start; I would highly advise against strength training for more than three days a week as a beginner. If you later want to progress to more days of working out a week, you can consider some kind of split. There are many preexisting routines based on splits, and you can of course make your own.

Personally, I've found the PPL (Push/Pull/Leg) split approach to be pretty good. The idea is to first separate out all lower-body/leg exercises to their own day. Then, of upper body exercises, break them up by whether they push the weight away from your body (like a bench press) or are pulling the weight toward your body (like a curl or barbell row). This ends up pretty cleanly dividing up the upper body muscle groups.

How to eat

If you're just getting started with strength training, you don't need to worry too much about eating. Follow the nutrition advice from the previous post. If you're trying to lose fat, eat at a caloric deficit. When you're initially going from untrained to trained, you get to experience what are known as "noob gains": what lifters treat as the magical ability of your body to get stronger and leaner at the same time.

Once you're past that initial beginner phase, it gets harder to pull this off. You'll hear people talk about bulking and cutting, on the premise that you need to eat extra food to fuel muscle growth (bulk), and then go through a period of caloric deficit to burn off the extra fat you gained (cut). Other approaches believe in trying for a recomp, or body recomposition, consisting of careful balancing of calories to get just enough to gain muscle and burn fat. Still other approaches, like Lean Gains, believe in carb and calorie cycling: eating more carbs and calories on training days, and fewer on rest days.

Compared to what we're discussing here, this is all rocket science. I'm mentioning it so that you know you don't need to freak out about it. Remember, your goal is to get used to training, enjoy it, nail down your form, and get basic strength gains. If you decide to pursue strength training more aggressively (like I have), there will be plenty of time to read hundreds of articles on the right way to eat. For now: eat healthy and lift heavy things.

Final note: be sure to get plenty of protein while strength training. You'll be using protein to rebuild your muscles after working them in the gym. If you don't have enough protein in your diet, your body will be unable to recover.

Muscle groups

There are many different muscles in your body. However, when talking about weight lifting, we usually break the body down into major muscle groups. The basic breakdown often discussed is:

  • Trapezius, or traps: muscles between shoulders and neck
  • Deltoids, or delts: shoulder muscles
  • Triceps: back of the arm muscles (used to extend your elbow)
  • Biceps: front of the arm muscles (used to bend/flex your elbow)
  • Pectoralis, or pecs: chest muscles
  • Latissimus, or lats: upper back
  • Core: stomach and lower back stabilizing muscles. This includes your abs
  • Gluteus, or glutes: your butt muscles
  • Quadriceps, or quads: front of the leg muscles (used to extend your knee)
  • Hamstrings: back of the leg muscles (used to bend/flex your knee)

You should get comfortable with identifying these muscle groups, and with flexing each of them. Some exercises will say things like "activate your glutes" or "stabilize with your lats." Don't worry if you're having trouble feeling your pecs or lats; working them out will help.

Make sure that, with whatever exercise routine you're following, you're hitting all of these muscle groups at least once per week (and ideally 2-3 times).

Summary of resistance training

Wow, that was a lot! I honestly didn't realize I had that much to say on the subject of resistance training, and there's still a lot more worth saying. But hopefully this gives you a good place to start. In sum:

  • Strength training is for everyone
  • Don't forget to focus on health, not just pushing some numbers
  • Body weight exercises are an easy way to get started and require little equipment
  • If you have access to a gym and/or weights, a weight lifting routine (such as StrongLifts) can be a great approach
  • Start light, get your form down, and progressively increase the load
  • Focus on compound movements, adding in isolation movements as desired
  • Eat healthy, and be sure to get plenty of protein


I'll say right now that I know more about resistance training than cardio and mobility, so these two sections will not be as detailed as the resistance training one. (And after everything you just read through, you may be relieved to hear that.)

Cardio is also known as aerobic exercise. Aerobic means "with oxygen," and describes the energy system used during typical cardio workouts. When you go for a 30 minute jog, you'll end up using fat as a major energy source, which requires oxygen to break down. This energy production is not as fast as burning glycogen, but we don't need the same level of explosive energy as we do with weight lifting.

Advantages of cardio:

  • It increases the efficiency of your respiratory system in order to provide sufficient oxygen to your body
  • It increases the efficiency of your circulatory system, also in order to provide sufficient oxygen to your body
  • It's good for burning fat

    • Because you can sustain cardio exercise for a longer period of time than intense weight lifting, you can cumulatively burn more calories
    • Since the primary energy source for cardio is fat, you'll burn fat directly, which you won't do with weight lifting
    • Both of these points are more nuanced than I've implied; keep reading for more
  • Improvements to blood lipids (cholesterol)
  • Numerous other, less tangible benefits, like decreased chronic stress

There are also some downsides:

  • Many forms of cardio (like jogging) put strain on our bones and joints, which can lead to injury over time
  • You may have heard the meme "cardio kills your gains," implying that cardio destroys muscle mass. While the meme is certainly overplayed, there's no question that 30 minutes of cardio will not result in as much muscle synthesis stimulation as 30 minutes of weight lifting.
  • Subjectively: it's boring. Some people really love running or biking. Others (like me) find it difficult to stay motivated for longer cardio sessions. If you love cardio: great, keep doing it. If you find it boring, I'll present an alternative below.

There are many different ways you can perform cardio. Some of the most popular are:

  • Running/jogging
  • Cycling
  • Swimming
  • Elliptical (my personal favorite, due to significantly lower joint impact)
  • Jumping rope
  • Stair climbing

Cardio can be performed on a daily basis. There is far less concern about overtraining than with weight training, since the exercise will not break down your muscles to the same extent. Common durations for a session range from 15 minutes to an hour. My recommendation: start off with something you can manage easily, get used to the activity, and then ramp it up over time.

I haven't personally done this program, but I've heard good reviews of the Couch to 5k program, which trains you to be able to run 5 kilometers (or just over 3 miles) in 9 weeks.

High Intensity Interval Training

It may be slightly incorrect to include High Intensity Interval Training, or HIIT, as a subheading within cardio, but I'll explain my motivation shortly. Cardio as described above is also known as Low Intensity Steady State (LISS), where you keep to a mostly-fixed level of exertion which can be maintained for a significant period of time. By contrast, HIIT uses short bursts of high intensity exertion for a shorter period of time.

A typical HIIT protocol may look like: perform a cycle of 8 sprints. For each sprint, run as fast as you possibly can for 20 seconds, and then rest for 10 seconds. (This specific protocol is known as the Tabata protocol.) This full workout will take only 4 minutes, but as I saw someone once describe it, "it's 4 minutes of suck." Also, since HIIT is more physically taxing than LISS, you should take at least one rest day between sessions.

Before getting into the physical comparison, I want to point out why HIIT is appealing. HIIT is anything but boring, and it's incredibly time efficient (imagine replacing a daily 30 minute run with a 4 minute sprint 3 days a week). But it's a hard workout. In fact, it's hard enough that I'd encourage people not to start exercising with regular HIIT sessions, as it may encourage you to give up. Instead, try a HIIT session interspersed with other workouts, and only consider making it part of your routine when you're confident that you won't give up. Remember, any exercise is better than no exercise.

So, if HIIT is so different from normal cardio, why did I include it here? Because research indicates that it can deliver the same benefits people try to get from LISS cardio:

  • While you burn less energy during the workout than with LISS, HIIT triggers something known as Excess Post-exercise Oxygen Consumption (EPOC), also known as the afterburn effect thanks to some spammy infomercials. What this means is that, for about 48 hours after a HIIT session, you continue to burn energy at a higher rate in order to recover.
  • Since this EPOC involves increased oxygen usage, it puts a stress on the respiratory and cardiovascular system, providing similar health benefits to those systems as LISS. (I encourage you to do the research yourself on which form actually causes better adaptations.)
  • While you will use glycogen more than fat during a HIIT session, the recovery period will burn more fat, resulting in plenty of fat loss. (Again, please check out the research yourself.)

In addition, HIIT claims some advantages over LISS, like more favorable hormonal responses and possibly better blood glucose control.

Short story: there is a lot of positive to be said about HIIT, but the science is not conclusive yet. If you want to try HIIT, and you don't believe you'll be discouraged by the intensity, go for it.

To make my biases clear: I almost never do dedicated LISS cardio sessions, but instead rely on HIIT for cardiovascular health. It's worked well for me, with improvements in my blood pressure, pulse, and respiratory system (far fewer symptoms of asthma). But given that HIIT is still considered somewhat less established than LISS, I want to be clear that I am not advocating for anyone to stop standard cardio workouts.

You can do HIIT with lots of different exercises:

  • Running (sprinting)
  • Cycling
  • Elliptical (again, my favorite)
  • Swimming

There are also similar programs, like circuit training, which combine high intensity work with weight lifting.

Weight lifting for cardio health?

One other very interesting approach for overall strength and cardiovascular health is presented in the book "Body by Science." I'm throwing this in here just to give a taste of how varied theories of good exercise are, and to encourage you to continue research beyond this naive overview.

Body by Science makes the bold claim that you can get "strength training, body building, and complete fitness in 12 minutes a week." I'll present a massively simplified version of what they claim, and encourage you to read the book itself if you're interested in more.

  • We can use just 5 big, compound weight lifting movements to target all of the major muscle groups in the body.
  • It's possible to perform each of these 5 movements for 90 seconds continuously to fully exhaust the muscles and deplete their glycogen stores. (5 * 90 seconds plus rest time is where the 12 minute claim comes from.)
  • It takes approximately a week for your body to fully recover from such an ordeal.
  • By fully exhausting the muscles, you send a trigger to your body to increase your muscle mass so you're more well prepared for the next time this happens. This is because your body reads this event as a fight-or-flight, life-or-death situation.
  • In order to provide energy to replenish glycogen and rebuild the muscles, your body will have significant respiratory and cardiovascular demands, which will cause improvements in those systems (like HIIT).

I've never done this program myself, but that's mostly because I actually enjoy my time in the gym, and don't want to reduce it to just 12 minutes a week. At the very least, the book is a great read with lots of valuable information.

Undoing your workout with food

This is a very common problem with people doing cardio: get on the treadmill for 45 minutes, walk at a decent (but not particularly strenuous) pace, and then get some kind of recovery smoothie (or insert other food item here). Take a guess: how many calories did the treadmill burn, and how many are in the smoothie?

Unfortunately, for many people, the smoothie completely outweighs the workout itself. Don't fall into this trap! Figure out your nutrition, and stick to it. Don't convince yourself that you're free to eat whatever you want because you went for a run today. You'll be undoing all of your hard work.

Move slowly, often

Another idea to throw in: outside of "exercise," it's a good idea to simply be more active. Taking a nightly walk, taking the stairs instead of the elevator, playing some easy sports, taking a break at the office to step outside, or a dozen other tweaks you can make throughout your day all make you less sedentary. Sure, these activities help you burn a few more calories. But I would argue—as would many better authorities—that simply being more active is a reward in and of itself.


Flexibility measures the range of movement of a joint. Flexibility can be improved with stretching. Given the sedentary lifestyles most of us live today, we end up having reduced flexibility. While flexibility and stretching typically have to do with the static range of motion of our joints, mobility refers to our ability to effectively move our joints.

An important distinction to make in these kinds of routines is dynamic vs static. Dynamic movements will involve moving a joint constantly. These are good to warm up before another exercise session. By contrast, static stretches will hold your joints in a fixed position. These can increase overall flexibility, but are generally best saved for after a workout.

This is the area in this post I am least familiar with, so I'm not going to go into much detail. Probably the most popular technique out there right now for improving your flexibility and mobility is Yoga. Many other people can give better advice than I can for getting started with it.

One pair of programs I followed (for less time than I should have) for mobility and flexibility is Molding Mobility and Starting Stretching. I found them much easier to grasp after watching a set of YouTube videos demonstrating them.

The idea with this order is to perform the dynamic mobility routine first, perform any resistance training next, and then finally perform static stretches at the end.


Thank you for making it through these three posts, I know I didn't make it easy. Hopefully they have provided you with lots of information, a good idea of the terms at play, and encouragement to go read more from better sources. And, of course, I hope you don't just make this an intellectual endeavor, but start taking control of your health!

My recommendation for getting started with this: get your nutrition improved, and to a place where you're comfortable with your daily eating routine. Try not to focus on a scale goal; focus on eating better. Experiment, and find what works. Introduce some exercise. Make sure you're ultimately getting in exercise that both improves your strength level, and improves your cardiovascular system.

I hope this was useful. If you have questions, please send them to me. I still haven't decided if I'll be making more health-related posts. If this is something you'd like to see from me, please say so; it's more likely to happen with such feedback.

June 15, 2017 03:00 PM

Functional Jobs

Software Engineer (Haskell — Full Stack) at Capital Match Holdings Pte. Ltd. (Full-time)

At Capital Match, we are looking for a software engineer primarily using Haskell to develop features and integrations of the platform with the financial system in Southeast Asia.

Capital Match is the leading peer-to-peer lending company in Singapore, and we are in the process of expanding to the rest of Southeast Asia.

The candidate should be interested in all aspects of the creation, growth and operations of a secure web-based platform: front-to-back feature development, distributed deployment and automation in the cloud, build and test automation.

We are inviting developers with at least 5 years of coding experience. She/he should have been involved in development of multiple web-application products. Experience using Haskell or a functional language is strongly preferred, but we also welcome those who don't have Haskell experience but can learn very quickly.

We practice agile development with rapid iteration and frequent deployments, and we are constantly looking to improve our development practices too.

Capital Match is a successful fin-tech startup, founded in 2014 in Singapore. We have already funded over S$40 million in loans to date.

We offer competitive compensation, depending on experience.

You'll work in Singapore, Bangkok, or remotely. If you are relocating to Singapore, visa sponsorship will be provided.

Get information on how to apply for this position.

June 15, 2017 10:20 AM

Software Engineer (Haskell - Full Stack) at Capital Match (Full-time)

Capital Match is a leading marketplace lending platform in Southeast Asia, headquartered in Singapore.

We are looking for a Full Stack Software Engineer primarily using Haskell to develop features and integrations with the financial system in Singapore and other countries in the region.

The candidate should be interested in all aspects of the creation, growth and operations of a secure web-based platform: Front-to-back feature development, distributed deployment and automation in the cloud, build and test automation.

We are inviting developers with at least 5 years of coding experience. She/he should have been involved in development of multiple web-application products. Experience using Haskell or a functional language is strongly preferred, but we also welcome those who don't have Haskell experience but can learn very quickly.

Senior applicants might be considered for a Head of Engineering position.

We practice agile development with rapid iteration and frequent deployments, and we are constantly improving our development practices.

Capital Match is a successful fintech venture having already processed over S$40 million loans to date. The company was founded in 2014 and has since attracted substantial VC funding.

We offer competitive compensation, including equity and depending on experience.

You can work in Singapore, Bangkok or remotely. If you are relocating to Singapore, visa sponsorship will be provided.

Get information on how to apply for this position.

June 15, 2017 10:20 AM

Software Engineer (Haskell - Full Stack) at Capital Match Holdings Pte. Ltd. (Full-time)

The Software Engineer (Haskell - Full Stack) will be responsible for developing features and integrations of the platform with the financial system in Southeast Asia.

The candidate should be interested in all aspects of the creation, growth and operations of a secure web-based platform: front-to-back feature development, distributed deployment and automation in the cloud, build and test automation.

We are inviting developers with a minimum of 5 years coding experience. She/he should have been involved in development of multiple web-application products.

Capital Match is a successful fintech start-up, founded in 2014 in Singapore, and is making its way across Southeast Asia.

Competitive compensation, depending on experience.

Location in Singapore or remotely.

Get information on how to apply for this position.

June 15, 2017 10:20 AM


Binary instances for GADTs
(or: RTTI in Haskell)

In this blog post we consider the problem of defining Binary instances for GADTs such as

data Val :: * -> * where
  VI :: Int    -> Val Int
  VD :: Double -> Val Double

If you want to play along the full source code for the examples in this blog post can be found on github.

Failed attempt

The “obvious” way in which you might attempt to serialize and deserialize Val could look something like

instance Binary (Val a) where
  put (VI i) = putWord8 0 >> put i
  put (VD d) = putWord8 1 >> put d

  get = do
    tag <- getWord8
    case tag of
      0 -> VI <$> get -- Couldn't match type ‘a’ with ‘Int’
      1 -> VD <$> get -- Couldn't match type ‘a’ with ‘Double’
      _ -> error "invalid tag"

However, this does not work. The definition of put is type correct (but dubious), but the definition of get is not type correct. And actually this makes sense: we are claiming that we can define Binary (Val a) for any a; but if the tag is 0, then that a can only be Int, and if the tag is 1, then that a can only be Double.

One option is to instead give a Binary (Some Val) instance with Some defined as

data Some :: (* -> *) -> * where
  Exists :: forall f x. f x -> Some f

That is often independently useful, but is a different goal: in such a case we are discovering type information when we deserialize. That’s not what we’re trying to achieve in this blog post; we want to write a Binary instance that can be used when we know from the context what the type must be.
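For illustration, here is a minimal, self-contained sketch of such a Some-based instance; the extension list and exact imports are my own choices, not from the post:

```haskell
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE GADTs             #-}
{-# LANGUAGE KindSignatures    #-}
module Main where

import Data.Binary (Binary (..), decode, encode)
import Data.Binary.Get (getWord8)
import Data.Binary.Put (putWord8)

data Val :: * -> * where
  VI :: Int    -> Val Int
  VD :: Double -> Val Double

data Some :: (* -> *) -> * where
  Exists :: f x -> Some f

-- The tag byte written by put is what lets get recover the type.
instance Binary (Some Val) where
  put (Exists (VI i)) = putWord8 0 >> put i
  put (Exists (VD d)) = putWord8 1 >> put d
  get = do
    tag <- getWord8
    case tag of
      0 -> Exists . VI <$> get  -- tag 0: the index must be Int
      1 -> Exists . VD <$> get  -- tag 1: the index must be Double
      _ -> fail "invalid tag"
```

Decoding then yields a Some Val, and we discover the index by pattern matching on the wrapped constructor.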

Working, but inconvenient

The next thing we might try is to introduce Binary instances for the specific instantiations of that a type variable:

instance Binary (Val Int) where
  put (VI i) = put i
  get = VI <$> get

instance Binary (Val Double) where
  put (VD d) = put d
  get = VD <$> get

Note that there is no need to worry about any tags in the encoded bytestring; we always know the type. Although this works, it’s not very convenient; for example, we cannot define

encodeVal :: Val a -> ByteString
encodeVal = encode

because we don’t have a polymorphic instance Binary (Val a). Instead we’d have to define

encodeVal :: Binary (Val a) => Val a -> ByteString
encodeVal = encode

but that’s annoying: we know that that a can only be Int or Double, and we have Binary instances for both of those cases. Can’t we do better?

Introducing RTTI

Although we know that a can only be Int or Double, we cannot take advantage of this information in the code. Haskell types are erased at compile time, and hence we cannot do any kind of pattern matching on them. The key to solving this problem then is to introduce some explicit runtime type information (RTTI).

We start by introducing a data family associating with each indexed datatype a corresponding datatype with RTTI:

data family RTTI (f :: k -> *) :: (k -> *)

For the example Val this runtime type information tells us whether we’re dealing with Int or Double:

data instance RTTI Val a where
  RttiValInt    :: RTTI Val Int
  RttiValDouble :: RTTI Val Double

For serialization we don’t need to make use of this:

putVal :: Val a -> Put
putVal (VI i) = put i
putVal (VD d) = put d

but for deserialization we can now pattern match on the RTTI to figure out what kind of value we’re expecting:

getVal :: RTTI Val a -> Get (Val a)
getVal RttiValInt    = VI <$> get
getVal RttiValDouble = VD <$> get

We’re now almost done: the last thing we need to express is that if we know at the type level that we have some RTTI available, then we can serialize. For this purpose we introduce a type class that returns the RTTI:

class HasRTTI f a where
  rtti :: RTTI f a

which we can use as follows:

instance HasRTTI Val a => Binary (Val a) where
  put = putVal
  get = getVal rtti

This states precisely what we described in words above: as long as we have some RTTI available, we can serialize and deserialize any kind of Val value.

The last piece of the puzzle is to define some instances for HasRTTI; right now, if we try to do encode (VI 1234) ghc will complain

No instance for (HasRTTI Val Int)

Fortunately, these instances are easily defined:

instance HasRTTI Val Int    where rtti = RttiValInt
instance HasRTTI Val Double where rtti = RttiValDouble

and the good news is that this means that whenever we construct specific Vals we never have to construct the RTTI by hand; ghc’s type class resolution takes care of it for us.
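Putting the pieces so far together, here is a self-contained sketch of the Val machinery as described above; the exact set of language extensions is my guess at what is needed:

```haskell
{-# LANGUAGE FlexibleContexts      #-}
{-# LANGUAGE FlexibleInstances     #-}
{-# LANGUAGE GADTs                 #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE PolyKinds             #-}
{-# LANGUAGE TypeFamilies          #-}
{-# LANGUAGE UndecidableInstances  #-}
module Main where

import Data.Binary (Binary (..), Get, Put, decode, encode)

data Val :: * -> * where
  VI :: Int    -> Val Int
  VD :: Double -> Val Double

-- Data family associating RTTI with each indexed datatype.
data family RTTI (f :: k -> *) :: k -> *

data instance RTTI Val a where
  RttiValInt    :: RTTI Val Int
  RttiValDouble :: RTTI Val Double

class HasRTTI f a where
  rtti :: RTTI f a

instance HasRTTI Val Int    where rtti = RttiValInt
instance HasRTTI Val Double where rtti = RttiValDouble

putVal :: Val a -> Put
putVal (VI i) = put i
putVal (VD d) = put d

-- Pattern matching on the RTTI tells us which value to expect.
getVal :: RTTI Val a -> Get (Val a)
getVal RttiValInt    = VI <$> get
getVal RttiValDouble = VD <$> get

instance HasRTTI Val a => Binary (Val a) where
  put = putVal
  get = getVal rtti
```

With this in scope, encode (VI 1234) type checks, because ghc resolves the HasRTTI Val Int instance for us.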

Taking stock

Instead of writing

encodeVal :: Binary (Val a) => Val a -> ByteString
encodeVal = encode

we can now write

encodeVal :: HasRTTI Val a => Val a -> ByteString
encodeVal = encode

While it may seem we haven’t gained very much, HasRTTI is a much more fine-grained constraint than Binary; from HasRTTI we can derive Binary constraints, like we have done here, but also other constraints that rely on RTTI. So while we do still have to carry these RTTI constraints around, those are – ideally – the only constraints that we still need to carry around. Moreover, as we shall see a little bit further down, RTTI also scales nicely to composite type-level structures such as type-level lists.

Another example: heterogeneous lists

As a second—slightly more involved—example, let's consider heterogeneous lists or n-ary products:

data NP (f :: k -> *) (xs :: [k]) where
  Nil  :: NP f '[]
  (:*) :: f x -> NP f xs -> NP f (x ': xs)

An example of such a heterogeneous list is

VI 12 :* VD 34.56 :* Nil :: NP Val '[Int, Double]

The type here says that this is a list of two Vals, the first Val being indexed by Int and the second Val being indexed by Double. If that makes zero sense to you, you may wish to study Well-Typed’s Applying Type-Level and Generic Programming in Haskell lecture notes.

As was the case for Val, we always statically know how long such a list is, so there should be no need to include any kind of length information in the encoded bytestring. Again, for serialization we don’t need to do anything very special:

putNP :: All Binary f xs => NP f xs -> Put
putNP Nil       = return ()
putNP (x :* xs) = put x >> putNP xs

The only minor complication here is that we need Binary instances for all the elements of the list; we guarantee this using the All type family (which is a minor generalization of the All type family explained in the same set of lecture notes linked above):

type family All p f xs :: Constraint where
  All p f '[]       = ()
  All p f (x ': xs) = (p (f x), All p f xs)
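To see All doing some work outside of serialization, here is a small self-contained sketch; showNP is a hypothetical helper of my own, not from the post. As we pattern match on the list, the All constraint unfolds one element at a time, giving us Show (f x) for each head:

```haskell
{-# LANGUAGE ConstraintKinds  #-}
{-# LANGUAGE DataKinds        #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE GADTs            #-}
{-# LANGUAGE PolyKinds        #-}
{-# LANGUAGE TypeFamilies     #-}
{-# LANGUAGE TypeOperators    #-}
module Main where

import Data.Kind (Constraint)

data NP (f :: k -> *) (xs :: [k]) where
  Nil  :: NP f '[]
  (:*) :: f x -> NP f xs -> NP f (x ': xs)
infixr 5 :*

type family All (p :: * -> Constraint) (f :: k -> *) (xs :: [k]) :: Constraint where
  All p f '[]       = ()
  All p f (x ': xs) = (p (f x), All p f xs)

-- Matching on Nil/(:*) refines xs, so ghc can reduce the
-- All constraint and find a Show instance for each element.
showNP :: All Show f xs => NP f xs -> [String]
showNP Nil       = []
showNP (x :* xs) = show x : showNP xs
```

For example, showNP (Just (1 :: Int) :* Just "hi" :* Nil) shows each element of an NP Maybe '[Int, String], even though the elements have different types.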

Deserialization, however, needs to make use of RTTI again. This means we need to define what we mean by RTTI for these heterogeneous lists:

data instance RTTI (NP f) xs where
  RttiNpNil  :: RTTI (NP f) '[]
  RttiNpCons :: (HasRTTI f x, HasRTTI (NP f) xs)
             => RTTI (NP f) (x ': xs)

instance HasRTTI (NP f) '[] where
  rtti = RttiNpNil
instance (HasRTTI f x, HasRTTI (NP f) xs)
      => HasRTTI (NP f) (x ': xs) where
  rtti = RttiNpCons

In this case the RTTI gives us the shape of the list. We can take advantage of this during deserialization:

getNP :: All Binary f xs => RTTI (NP f) xs -> Get (NP f xs)
getNP RttiNpNil  = return Nil
getNP RttiNpCons = (:*) <$> get <*> getNP rtti

allowing us to give the Binary instance as follows:

instance (All Binary f xs, HasRTTI (NP f) xs)
      => Binary (NP f xs) where
  put = putNP
  get = getNP rtti

Serializing lists of Vals

If we use this Binary instance to serialize a list of Vals, we would end up with a type such as

decodeVals :: (HasRTTI (NP Val) xs, All Binary Val xs)
           => ByteString -> NP Val xs
decodeVals = decode

This All Binary Val xs constraint however is unfortunate, because we know that all Vals can be deserialized! Fortunately, we can do better. The RTTI for the (:*) case (RttiNpCons) includes RTTI for the elements of the list. We made no use of it above, but we can when giving a specialized instance for lists of Vals:

putNpVal :: NP Val xs -> Put
putNpVal Nil       = return ()
putNpVal (x :* xs) = putVal x >> putNpVal xs

getNpVal :: RTTI (NP Val) xs -> Get (NP Val xs)
getNpVal RttiNpNil  = return Nil
getNpVal RttiNpCons = (:*) <$> get <*> getNpVal rtti

instance {-# OVERLAPPING #-} HasRTTI (NP Val) xs
      => Binary (NP Val xs) where
  put = putNpVal
  get = getNpVal rtti

This allows us to define

decodeVals :: HasRTTI (NP Val) xs => ByteString -> NP Val xs
decodeVals = decode

Note that this use of overlapping type classes instances is perfectly safe: the overlapping instance is fully compatible with the overlapped instance, so it doesn’t make a difference which one gets picked. The overlapped instance just allows us to be more economical with our constraints.

Here we can appreciate the choice of RTTI being a data family indexed by f; indeed, the constraint HasRTTI f x in RttiNpCons is as generic as possible. Concretely, decodeVals requires only a single HasRTTI constraint, as promised above. It is this compositionality, along with the fact that we can derive many type classes from just having RTTI around, that gives this approach its strength.

Advanced example

To show how all this might work in a more advanced example, consider the following EDSL describing simple functions:

data Fn :: (*,*) -> * where
  Exp   :: Fn '(Double, Double)
  Sqrt  :: Fn '(Double, Double)
  Mod   :: Int -> Fn '(Int, Int)
  Round :: Fn '(Double, Int)
  Comp  :: (HasRTTI Fn '(b,c), HasRTTI Fn '(a,b))
        => Fn '(b,c) -> Fn '(a,b) -> Fn '(a,c)

If you are new to EDSLs (embedded domain-specific languages) in Haskell, you may wish to watch the Well-Typed talk Haskell for embedded domain-specific languages. However, hopefully the intent behind Fn is not too difficult to see: we have a datatype that describes functions: exponentiation, square root, integer modulus, rounding, and function composition. The two type indices of Fn describe the function's input and output types. A simple interpreter for Fn would be

eval :: Fn '(a,b) -> a -> b
eval Exp          = exp
eval Sqrt         = sqrt
eval (Mod m)      = (`mod` m)
eval Round        = round
eval (g `Comp` f) = eval g . eval f
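To get a feel for the DSL, here is a trimmed, runnable sketch; I have dropped the HasRTTI constraints from Comp, since they only matter for serialization:

```haskell
{-# LANGUAGE DataKinds      #-}
{-# LANGUAGE GADTs          #-}
{-# LANGUAGE KindSignatures #-}
module Main where

-- Fn without the HasRTTI constraints on Comp.
data Fn :: (*, *) -> * where
  Exp   :: Fn '(Double, Double)
  Sqrt  :: Fn '(Double, Double)
  Mod   :: Int -> Fn '(Int, Int)
  Round :: Fn '(Double, Int)
  Comp  :: Fn '(b, c) -> Fn '(a, b) -> Fn '(a, c)

-- Interpret a description of a function as the function itself.
eval :: Fn '(a, b) -> a -> b
eval Exp          = exp
eval Sqrt         = sqrt
eval (Mod m)      = (`mod` m)
eval Round        = round
eval (g `Comp` f) = eval g . eval f
```

For example, eval (Round `Comp` Sqrt) :: Double -> Int takes a square root and then rounds; the "type in the middle" (here Double) never appears in the type of the composition.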

In the remainder of this blog post we will consider how we can define a Binary instance for Fn. Compared to the previous examples, Fn poses two new challenges:

  • The type index does not uniquely determine which constructor is used; if the type is (Double, Double) then it could be Exp, Sqrt or indeed the composition of some functions.
  • Trickier still, Comp actually introduces an existential type: the type “in the middle” b. This means that when we serialize and deserialize we do need to include some type information in the encoded bytestring.

RTTI for Fn

To start with, let’s define the RTTI for Fn:

data instance RTTI Fn ab where
  RttiFnDD :: RTTI Fn '(Double, Double)
  RttiFnII :: RTTI Fn '(Int, Int)
  RttiFnDI :: RTTI Fn '(Double, Int)

instance HasRTTI Fn '(Double, Double) where rtti = RttiFnDD
instance HasRTTI Fn '(Int, Int)       where rtti = RttiFnII
instance HasRTTI Fn '(Double, Int)    where rtti = RttiFnDI

For our DSL of functions, we only have functions from Double to Double, from Int to Int, and from Double to Int (and this is closed under composition).

Serializing type information

The next question is: when we serialize a Comp constructor, how much information do we need to serialize about that existential type? To bring this into focus, let’s consider the type information we have when we are dealing with composition:

data RttiComp :: (*,*) -> * where
  RttiComp :: RTTI Fn '(b,c) -> RTTI Fn '(a,b) -> RttiComp '(a,c)

Whenever we are deserializing a Fn, if that Fn happens to be the composition of two other functions we know RTTI about the composition; but since the “type in the middle” is unknown, we have no information about that at all. So what do we need to store? Let’s start with serialization:

putRttiComp :: RTTI Fn '(a,c) -> RttiComp '(a,c) -> Put

The first argument here is the RTTI about the composition as a whole, and sets the context. We can look at that context to determine what we need to output:

putRttiComp :: RTTI Fn '(a,c) -> RttiComp '(a,c) -> Put
putRttiComp rac (RttiComp rbc rab) = go rac rbc rab
  where
    go :: RTTI Fn '(a,c) -> RTTI Fn '(b,c) -> RTTI Fn '(a,b) -> Put
    go RttiFnDD RttiFnDD RttiFnDD = return ()

    go RttiFnII RttiFnII RttiFnII = return ()
    go RttiFnII RttiFnDI rAB      = case rAB of {}

    go RttiFnDI RttiFnII RttiFnDI = putWord8 0
    go RttiFnDI RttiFnDI RttiFnDD = putWord8 1

Let’s take a look at what’s going on here. When we know from the context that the composition has type Double -> Double, then we know that the types of both functions in the composition must also be Double -> Double, and hence we don’t need to output any type information. The same goes when the composition has type Int -> Int, although we need to work a bit harder to convince ghc in this case. However, when the composition has type Double -> Int then the first function might be Double -> Int and the second might be Int -> Int, or the first function might be Double -> Double and the second might be Double -> Int. Thus, we need to distinguish between these two cases (in principle a single bit would suffice).

Having gone through this thought process, deserialization is now easy: remember that we know the context (the RTTI for the composition):

getRttiComp :: RTTI Fn '(a,c) -> Get (RttiComp '(a,c))
getRttiComp RttiFnDD = return $ RttiComp RttiFnDD RttiFnDD
getRttiComp RttiFnII = return $ RttiComp RttiFnII RttiFnII
getRttiComp RttiFnDI = do
    tag <- getWord8
    case tag of
      0 -> return $ RttiComp RttiFnII RttiFnDI
      1 -> return $ RttiComp RttiFnDI RttiFnDD
      _ -> fail "invalid tag"

Binary instance for Fn

The hard work is now mostly done. Although it is probably not essential, during serialization we can clarify the code by looking at the RTTI context to know which possibilities we need to consider at each type index. For example, if we are serializing a function of type Double -> Double, there are three possibilities (Exp, Sqrt, Comp). We did something similar in the previous section.

putAct :: RTTI Fn a -> Fn a -> Put
putAct = go
  where
    go :: RTTI Fn a -> Fn a -> Put
    go r@RttiFnDD fn =
      case fn of
        Exp      -> putWord8 0
        Sqrt     -> putWord8 1
        Comp g f -> putWord8 255 >> goComp r (rtti, g) (rtti, f)
    go r@RttiFnII fn =
      case fn of
        Mod m    -> putWord8 0   >> put m
        Comp g f -> putWord8 255 >> goComp r (rtti, g) (rtti, f)
    go r@RttiFnDI fn =
      case fn of
        Round    -> putWord8 0
        Comp g f -> putWord8 255 >> goComp r (rtti, g) (rtti, f)

    goComp :: RTTI Fn '(a,c)
           -> (RTTI Fn '(b,c), Fn '(b,c))
           -> (RTTI Fn '(a,b), Fn '(a,b))
           -> Put
    goComp rAC (rBC, g) (rAB, f) = do
      putRttiComp rAC (RttiComp rBC rAB)
      go rBC g
      go rAB f

Deserialization proceeds along very similar lines; the only difficulty is that when we deserialize RTTI using getRttiComp we somehow need to reflect that to the type level; for this purpose we can provide a function

reflectRTTI :: RTTI f a -> (HasRTTI f a => b) -> b

Its definition is beyond the scope of this blog post; refer to the source code on GitHub instead. With this function in hand, however, deserialization is no longer difficult:

getAct :: RTTI Fn a -> Get (Fn a)
getAct = go
  where
    go :: RTTI Fn a -> Get (Fn a)
    go r@RttiFnDD = do
      tag <- getWord8
      case tag of
        0   -> return Exp
        1   -> return Sqrt
        255 -> goComp r
        _   -> error "invalid tag"
    go r@RttiFnII = do
      tag <- getWord8
      case tag of
        0   -> Mod <$> get
        255 -> goComp r
        _   -> error "invalid tag"
    go r@RttiFnDI = do
      tag <- getWord8
      case tag of
        0   -> return Round
        255 -> goComp r
        _   -> error "invalid tag"

    goComp :: RTTI Fn '(a,c) -> Get (Fn '(a,c))
    goComp rAC = do
      RttiComp rBC rAB <- getRttiComp rAC
      reflectRTTI rBC $ reflectRTTI rAB $
        Comp <$> go rBC <*> go rAB

We can define the corresponding Binary instance for Fn simply using

instance HasRTTI Fn a => Binary (Fn a) where
  put = putAct rtti
  get = getAct rtti

If desired, a specialized instance for HList Fn can be defined that relies only on RTTI, just like we did for Val (left as an exercise for the reader).


Giving type class instances for GADTs, in particular for type classes that produce values of these GADTs (deserialization, translation from Java values, etc.), can be tricky. If not kept in check, this can result in a code base with a lot of unnecessarily complicated function signatures or frequent use of explicit computation of evidence of type class instances. By using run-time type information we can avoid this, keeping the code clean and allowing programmers to focus on the problems at hand rather than worry about type class instances.

PS: Singletons

RTTI looks a lot like singletons, and indeed things can be set up in such a way that singletons would do the job. The key here is to define a new kind for the type indices; for example, instead of

data Val :: * -> * where
  VI :: Int    -> Val Int
  VD :: Double -> Val Double

we’d write something like

data U = Int | Double

data instance Sing (u :: U) where
  SI :: Sing 'Int
  SD :: Sing 'Double

data Val :: U -> * where
  VI :: Int    -> Val 'Int
  VD :: Double -> Val 'Double

instance SingI u => Binary (Val u) where
  put (VI i) = put i
  put (VD d) = put d

  get = case sing :: Sing u of
          SI -> VI <$> get
          SD -> VD <$> get

In such a setup singletons can be used as RTTI. Which approach is preferable depends on questions such as whether singletons are already in use in the project, how much of their infrastructure can be reused, etc. A downside of using singletons rather than the more direct RTTI encoding I’ve presented in this blog post is that singletons probably require introducing some kind of type-level decoding (in this example, a type family U -> *); on the other hand, having specific kinds for specific purposes may also clarify the code. Either way the main ideas are the same.

by edsko at June 15, 2017 09:48 AM

Mark Jason Dominus

Base-4 fractions in Telugu

Rik Signes brought to my attention that since version 5.1 Unicode has contained the following excitingly-named characters:

౸ ౹ ౺ ౻ ౼ ౽ ౾ (the TELUGU FRACTION DIGITS for odd and even powers of four)

I looked into this a little and found out what they are for. It makes a lot of sense! The details were provided by “Telugu Measures and Arithmetic Marks” by Nāgārjuna Venna.

Telugu is the third-most widely spoken language in India, spoken mostly in the southeast part of the country. Traditional Telugu units of measurement are often divided into four or eight subunits. For example, the tūmu is divided into four kuṁcamulu, the kuṁcamulu, into four mānikalu, and the mānikalu into four sōlalu.

These days they mainly use liters like everyone else. But the traditional measurements are mostly divided into fours, so amounts are written with a base-10 integer part and a base-4 fractional part. The characters above are the base-4 fractional digits.

To make the point clearer, I hope, let's imagine that we are using the Telugu system, but with the familiar western-style symbols 0123456789 instead of the Telugu digits ౦౧౨౩౪౫౬౭౮౯. (The Telugu had theirs first of course.) And let's use 0-=Z as our base-four fractional digits, analogous to Telugu ౦౼౽౾. (As in Telugu, we'll use the same zero symbol for both the integer and the fractional parts.) Then to write the number of gallons (7.4805195) in a cubic foot, we say

7.-Z=Z0

which is 7 gallons plus one (-) quart plus three (Z) cups plus two (=) quarter-cups plus three (Z) tablespoons plus zero (0) drams, a total of 7660 drams almost exactly. Or we could just round off to 7.=, seven and a half gallons.

(For the benefit of readers who might be a bit rusty on the details of these traditional European measurements, I should mention that there are four drams in a tablespoon, four tablespoons in a quarter cup, four quarter cups in a cup, four cups in a quart, and four quarts in a gallon, so 4⁵ = 1024 drams in a gallon and 7.4805195·4⁵ = 7660.052 drams in a cubic foot. Note also that these are volume (fluid) drams, not mass drams, which are different.)
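If you'd like to check the digits yourself, the conversion can be sketched in a few lines of Haskell (base4Digits is a name I made up for this post):

```haskell
-- Peel off base-4 fractional digits by repeatedly multiplying the
-- fractional part by four and taking the integer part.
base4Digits :: Int -> Double -> [Int]
base4Digits 0 _ = []
base4Digits n x = d : base4Digits (n - 1) (y - fromIntegral d)
  where
    y = x * 4
    d = floor y

-- base4Digits 5 0.4805195 = [1,3,2,3,0], i.e. the digits -Z=Z0 above.
```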

We can omit the decimal point (as the Telugu did) and write

7-Z=Z0

and it is still clear where the integer part leaves off and the fraction begins, because we are using special symbols for the fractional part. But no, this isn't quite enough, because if we wrote 20ZZ= it might not be clear whether we meant 20.ZZ= or 2.0ZZ=.

So the system has an elaboration. In the odd positions, we don't use the 0-=Z symbols; we use Q|HN instead. And we don't write 7-Z=Z0, we write

7|ZHZQ

This is always unambiguous: 20.ZZ= is actually written 20NZH and 2.0ZZ= is written 2QZN=, quite different.
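Continuing the sketch, the alternating odd/even symbol scheme is a simple positional lookup (renderFrac is again a made-up name):

```haskell
-- Render base-4 fractional digits with the alternating symbol sets:
-- odd positions use Q|HN, even positions use 0-=Z.
renderFrac :: [Int] -> String
renderFrac = zipWith pick [1 :: Int ..]
  where
    pick i d
      | odd i     = "Q|HN" !! d
      | otherwise = "0-=Z" !! d

-- renderFrac [1,3,2,3,0] = "|ZHZQ" and renderFrac [3,3,2] = "NZH",
-- matching the examples in the text.
```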

This is all fanciful in English, but Telugu actually did this. Instead of 0-=Z they had ౦౼౽౾ as I mentioned before. And instead of Q|HN they had ౸౹౺౻. So if the Telugu were trying to write 7.4805195, where we had 7|ZHZQ they might have written ౭౹౾౺౾౸. Like us, they then appended an abbreviation for the unit of measurement. Instead of “gal.” for gallon they might have put ఘ (letter “gha”), so ౭౹౾౺౾౸ఘ. It's all reasonably straightforward, and also quite sensible. If you have ౭౹౾౺ tūmu, you can read off instantly that there are ౺ (two) sōlalu left over, just as you can see that $7.43 has three pennies left over.

Notice that both sets of Telugu fraction digits are easy to remember: the digits for 3 have either three horizontal strokes ౾ or three vertical strokes ౻, and the others similarly.

I have an idea that the alternating vertical-horizontal system might have served as an error-detection mechanism: if a digit is omitted, you notice right away because the next symbol is wrong.

I find this delightful. A few years back I read all of The Number Concept: Its Origin and Development (1931) by Levi Leonard Conant, hoping to learn something really weird, and I was somewhat disappointed. Conant spends most of his book describing the number words and number systems used by dozens of cultures and almost all of them are based on ten, and a few on five or twenty. (“Any number system which passes the limit 10 is reasonably sure to have either a quinary, a decimal, or a vigesimal structure.”) But he does not mention Telugu!

by Mark Dominus at June 15, 2017 07:02 AM

June 14, 2017

Michael Snoyman

A Very Naive Overview of Nutrition (Part 2)

This blog post is part 2 of a series on nutrition and exercise. If you haven't seen it already, I recommend reading part 1 now. This blog post will go into more details on nutrition.

For the completely impatient, here are my recommendations on where you should get started, in a priority-sorted list (start with #1, and add more recommendations as you're ready):

  1. Avoid eating processed foods. For example: sweet potato with butter? OK. Potato chips? Avoid.
  2. Eat protein at each meal. Protein helps you feel full longer, helping avoid overeating.
  3. Reduce your sugar intake. Sugar is addictive, has significantly negative health impacts, and encourages you to eat more than you should at each meal.
  4. Pay attention to hunger cues. Stop eating before you feel "stuffed."

Of course, I strongly recommend you read the rest of this blog post for more details.


We need to get two different things from our food:

  • Essential nutrients
  • Energy

Essential nutrients are things that our body requires to live, and cannot make itself. Energy is what powers us. Without either of these, we die. You've probably heard of calories before. A calorie is a unit of measurement for energy. Each person has different requirements for both essential nutrients and calories, which we'll get to shortly.

The thing is that these two requirements overlap significantly. For example, Omega 3 fatty acids are an essential nutrient, but they also provide energy. Therefore, it's impossible to say something like "I'm going to get all of my energy from carbohydrates," since you'll be required to eat protein and fat as well.

Alright, let's break down nutrients:

  • Macronutrients, aka macros, are either protein, carbohydrates (carbs), or fat. All three of these provide some level of energy (more on that later). As far as the essential aspects of these are concerned:

    • Protein is made up of amino acids. There are 21 different amino acids, of which 9 are essential. Amino acids are used by your body for building most of its structure (muscles, organs, bones).

    • There are two essential fatty acids: Omega 3 and Omega 6. You've probably heard a lot about Omega 3. That's because our modern diets (for reasons I won't get into) have a much higher level of Omega 6 relative to Omega 3, which is theorized to be a cause of many diseases via inflammation. That means you likely don't need to worry about getting enough Omega 6, but may want to supplement Omega 3 (such as with fish oil pills).

      Other than that, you don't need to eat any fats. Your body can create its own fat (via de novo lipogenesis) for fat storage.

    • There are no essential carbs. Fiber is a form of carbs that our bodies don't break down well, and it helps with digestion. Fiber also helps us feel full. But by saying it is non-essential, my point is: you can eat a diet without any carbs at all and survive. (Whether you should is a different issue.)

  • Micronutrients are vitamins and minerals. There are many of these, and I'm not going to be getting into too many details here, because it's complicated, and I'm not all that familiar on the details. You can supplement these with multivitamins. But much better in my opinion is to eat real foods (as opposed to processed foods) that give you a good variety of micronutrients. A good general rule when choosing foods is: prefer foods which are dense in micronutrients, meaning lots of vitamins and minerals per calorie of food.

NOTE You also get calories from alcohol. I'm not going to discuss that here; alcohol is completely unnecessary in your diet, and has many negative impacts on health. I certainly enjoy a drink from time to time, but if you're drinking enough that the calorie impact of the alcohol is meaningful, you're sabotaging your health significantly.


Unimportant side note: One calorie is the amount of energy needed to raise one gram of water one degree Celsius. When you read calories on food, it's actually talking about kilo-calories, or Calories (capital C), or food calories. The point is: there are a thousand "real" calories in a food calorie. I only mention this because it can be a point of confusion. We'll in general be talking about food calories, and just referring to them as calories.

Each of the macronutrients provides a different amount of calories:

  • Fat: 9 calories/gram
  • Carbs: 4 calories/gram
  • Protein: 4 calories/gram

But these numbers don't add up exactly as you'd expect. For example, protein is harder to convert into usable energy than the other two, and therefore it takes more energy to perform the breakdown. This is called the thermic effect of food, and means that you'll get less net energy from 9 grams of protein than from 4 grams of fat or 9 grams of carbs, even though in theory they should be the same.
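As a quick sketch of the arithmetic (calories is a hypothetical helper, using the 4/4/9 values above and ignoring the thermic effect):

```haskell
-- Gross calories from macronutrient grams, using the standard
-- 4/4/9 calories-per-gram values (ignores the thermic effect).
calories :: Double -> Double -> Double -> Double
calories proteinG carbsG fatG = 4 * proteinG + 4 * carbsG + 9 * fatG

-- For example, 170g protein, 200g carbs, 80g fat:
--   calories 170 200 80 = 680 + 800 + 720 = 2200
```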

This brings us to our first important point: during digestion, each macronutrient follows a different metabolic pathway, and therefore can have different effects on the body. We'll cover the difference between carbs and fat in a later section. For now, I want to point out that protein is a suboptimal energy source. This greatly affects how we want to consider protein as part of our diet (also in an upcoming section).

Total Daily Energy Expenditure

Your body needs energy to operate. The total energy it needs on a daily basis is the TDEE, or Total Daily Energy Expenditure. If you eat more energy than this number, the excess will be stored as fat. If you eat less, the difference will be taken from fat. This is known as calories-in/calories-out.

You'll see lots of debates online about this point. Here's my personal take: it's a truism, but misses a lot of the point. Yes, if you eat a lot more food, you'll put on weight. But the situation is quite a bit more complicated than this. The amount and type of food you eat affects hormone levels that influence your energy expenditure and hunger levels. And while my simplified model talks about adding and losing fat, we have other body mass (glycogen and muscle) which will be affected as well.

What's my point in all of this? Yes, you should be aware of your TDEE. Let it be a general guide (in addition to hunger signals) to how much you should eat. But realize it's an estimate, and that trying to change it (such as by eating only 500 calories a day) will not immediately result in losing the amount of fat you expect. Your body may slow down its metabolism to compensate, you may cheat more often, etc.

You can find lots of TDEE calculators online, here's one I found with a quick search. Also, one pound of body fat contains 3500 calories (7700 per kilogram), so in theory, you'd need to eat at a calorie deficit of 500 calories per day for a week to lose one pound of fat.
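That back-of-the-envelope arithmetic looks like this (daysToLose is a made-up helper; all the caveats above about metabolic adaptation still apply):

```haskell
-- One pound of body fat is roughly 3500 calories, so a fixed daily
-- deficit gives a (naive) estimate of days needed to lose it.
daysToLose :: Double -> Double -> Double
daysToLose pounds dailyDeficit = pounds * 3500 / dailyDeficit

-- daysToLose 1 500 = 7: a 500-calorie daily deficit for a week
-- to lose one pound of fat.
```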

Protein Requirements

Since, as we said above, protein isn't a great source for energy, we primarily want to include protein in our diet for its non-energy aspects. This involves the "essential" bit about providing amino acids. However, there's another big benefit that comes from eating protein: you tend to stay full longer when you eat protein. One recommendation that I like to follow is to include a protein source in every meal.

So then the question is: how much protein do you need? You'll see lots of values thrown around for this. For example, 1 gram of protein per pound of bodyweight. That means, if you weigh 170 pounds (77kg for those of us outside the US), you'd target 170 grams of protein per day. But numbers really vary all over the place. Some standards place this as a certain number of grams per pound of lean body weight (meaning, ignoring your body fat). How much you need also varies with what kind of activity you're doing: if you're trying to build muscle, you'll usually want to eat more protein.

I'd recommend doing some research yourself on how much protein you need to get per day, I'm going to avoid making a recommendation. I will, instead, try to debunk some myths:

  • If you eat only 100% protein all day, you're not going to grow super muscles. Eventually, you'll die from something known as rabbit starvation.
  • That said, eating a high-protein diet, above the Recommended Daily Allowance, isn't going to send you into renal (kidney) failure. Unless you have some preexisting condition, you'll be able to handle a fairly high protein level without issue.

One of the biggest downsides with protein is that it tends to be relatively expensive (compare the cost of a steak vs a loaf of bread). Also, different protein sources have different absorption rates in the body. Finally, referring back to the essential amino acids, not all protein sources are complete, especially not vegan ones. (Complete here means it contains all 9 essential amino acids.) If you're eating animal products, you're probably fine. With vegan products, do a little more research on what you're eating (hemp seed and quinoa are both complete proteins).

Summary Get enough protein, and eat it at each meal to help you stay full longer.

Carbs vs fat

Alright, once you're done putting protein into your diet, you'll be filling up the rest of your calories from carbs and fat. This is probably one of the biggest areas where that issue of complication I mentioned comes into play. If you want my simple recommendation: start off by getting adequate protein and avoiding processed foods. In my opinion, you'll be getting 80% of the way to a great diet with just those steps.

OK, you want to get into the details of carbs vs fat? I would say that, first and foremost, a lot of the most sensational claims out there are simply not true. Fat doesn't clog your arteries. Carbs don't magically make you fat. Things are far more nuanced. I'm going to give a list of benefits for each of these macronutrients.

Benefits of carbs

  • Since they are less calorically dense than fat, you can eat more of them and get the same amount of calories
  • Carbs are part of what people often consider healthy foods, like fruits, vegetables, legumes, and grains. (I encourage you to especially research whether fruits and grains should be considered healthy in general. I'd recommend moderating fruit intake due to high sugar, and especially fructose, levels.)
  • Carbs tend to be the cheapest macronutrient available
  • Many high carb foods are also high fiber foods, which is good for digestion and satiety
  • Carbs are broken down into glucose in the body, and stored in the body as glycogen, which is a faster burning energy source than fat. This makes carbs good for explosive activity (like weight lifting or sprinting).
  • Unlike fats, carbs cannot be stored directly in the body as fat. They need to first be converted to fat via a process called de novo lipogenesis, which loses some energy in the process. In other words, 500 calories of excess carbs will result in less body fat than 500 calories of excess fat. That said, if you eat both fat and carbs in your diet, your body will prefer to burn the carbs and store the fat, so given a fair mix of both macronutrients, this won't matter too much.

Benefits of fats

  • Fat tends to leave you feeling fuller longer, since digestion of fat is slower. This is very likely the primary mechanism by which low-carb diets help you lose weight.
  • If you almost completely eliminate carbs, your body will enter a state called ketosis, where your liver generates ketone bodies for your brain and other organs to run off of. This can have great fat burning results, and can be used for treating some neurological conditions (like epilepsy).
  • Eating insufficient fat can lead to hormonal imbalances, and the so-called "starvation mode." Having a high-fat low-carb diet can allow you to eat fewer total calories without having your appetite ramped up or your metabolism turned down.
  • If you eat primarily fat, your body gets better at turning fat into usable energy. This doesn't just apply to dietary fat, but to your body fat too. This is sometimes referred to as being a "fat burner."
  • Glycogen (stored carbs) is very limited in capacity in the body. By contrast, even extremely lean people have many tens of thousands of calories available in fat. If your body is good at burning fat, it can be a big advantage for endurance activities like marathon running or cycling.
  • Fats taste good. Carbs can taste good too, but that usually depends on the presence of sugar. Most people agree today that sugar is a pretty dangerous substance for the body and should be avoided.

There are clearly arguments in favor of both macronutrients. I'd argue that throughout human history people have obviously eaten diets high in carbs, high in fats, and high in both, and we can survive well on any of them. I've personally used all kinds of diets with good results.

There is one thing I've seen claimed that I think has a lot of logic to it. Some of the most successful diets today seem to be based around banning either carbs or fat. Perhaps the reason they work is that the biggest reward foods—ice cream, potato chips, chocolate, etc—are high in both carbs and fat. By allowing yourself large quantities of food, but naturally avoiding these highly tempting and easy-to-binge reward foods, it becomes much easier to adhere to a diet.

My recommendation Unless you have some ethical or religious reason guiding your eating, try out whatever popular diet plan appeals to you. Give it a few weeks at least, ideally a few months, and see how you respond. If you find that you're constantly fighting cravings even after trying the diet for a few weeks, consider trying something else. And if you are not losing body fat, either the diet's a bad one (don't fall for the ice cream diet!) or you're not following it well.

Glycogen and water weight

I mentioned above that carbs get stored as glycogen. When your body stores glycogen, it stores some water to go along with it. This is one of the reasons why low carb diets have such amazing short term results: when you first become fat adapted, you burn up your glycogen stores quickly, and flush out that extra water (in your urine) at the same time. You can lose a few pounds/kilos in a few short days.

Don't fall into this all-too-common trap:

Wow, I lost 3 pounds in my first week alone! This is great! If I just continue like this for the next 2 months, I'll lose 25 pounds in no time!

Then, when you of course can't continue peeing out 2.5 pounds of water per week and you eventually hit a weight loss plateau, you decide your diet isn't working and give up. In other words:

Be wary of the scale, it will lie to you!

Intermittent fasting

Something popping up much more recently is intermittent fasting, where you spend a certain number of hours per day not eating. Perhaps the most common is the 16-8 fast: you fast 16 hours and only eat for 8. That might sound rough, but when you realize that sleep is part of this, and the schedule is "fit all of your eating into 11am-7pm or similar", it's not too bad.

There are some theoretical health benefits of fasting on its own. Our bodies can swing between catabolic (breaking down) and anabolic (building up) phases, and there are advantages to both. If we're constantly stuffing our faces, our body never has to enter catabolism, which can be detrimental.

But intermittent fasting has a much simpler motivator: it makes it easier to eat within your TDEE if you don't spend all day eating. And during the part of the day you're not eating, it's much easier to control yourself. At least for me, a simple binary on/off switch for "am I allowed to eat" is easy.

Do you have to do this? Absolutely not. But if you're feeling like trying something, go for it. If nothing else, convincing yourself that you're strong enough to go regularly without eating is a good psychological barrier to overcome.

Different types of fat

Saturated. Unsaturated. Monounsaturated. Polyunsaturated. Omegas. Trans. What's up with all of this? Well, it's just chemistry. Fats are chains of carbons. Each carbon can form four bonds, and hydrogen can form one bond. So in theory, each carbon can bond to the carbon to its left, the carbon to its right, and two hydrogens. If that happens, you have a saturated fat. This is saturated because each carbon is fully saturated by two hydrogens.

However, sometimes we'll be missing hydrogens. Instead of binding to two hydrogens, two carbons can form a double bond. Each of those carbons will bond with one hydrogen and one other neighboring carbon. When such a double bond forms, we have an unsaturated fat. Because double bonds are more flexible, unsaturated fats melt (turn liquid) at lower temperatures. That's why saturated fats (like butter) tend to be solid at room temperature, but unsaturated fats (like olive oil) are liquid.

If a fat has just one double bond in it, it's monounsaturated. If it has more than one, it's polyunsaturated. Two of these polyunsaturated fats are special: omega 3 and omega 6 are differentiated by the distance between the tail of the carbon chain and the first double bond.

Trans fats are unsaturated fats which have been chemically altered to make them solid at higher temperatures. This is done by hydrogenating them. Because trans fats occur very rarely naturally, it seems that our bodies are not particularly good at digesting them, with the result being that they're bad for our health. Basically: avoid trans fats.

As mentioned above, both omega 3 and omega 6 are essential fatty acids. We get plenty of omega 6, so you should try to get more omega 3.

Beyond that, what kind of fats should you go for? That's a topic of much debate. Up until recently, the answer would be to prefer polyunsaturated vegetable oils. However, newer evidence points to saturated fat not being the villain it was thought to be, and vegetable oil in fact being dangerous. Monounsaturated fats—especially olive oil—seem to be pretty well accepted as being good for us.

Personally, I avoid vegetable oils and don't avoid saturated fats. But you'll get lots of conflicting advice on this area. I recommend reading up.

Different types of carbs

Simple. Complex. Sugar. Glucose. Fructose. Lactose. Starch. What exactly are carbs? Time for some more chemistry!

Saccharide is another term for carbohydrate. The monosaccharides and disaccharides make up what we call the sugars. The most common monosaccharides are:

  • Glucose
  • Fructose
  • Galactose

Disaccharides are pairs of monosaccharides, such as:

  • Sucrose (table sugar) = glucose + fructose
  • Lactose (milk sugar) = galactose + glucose
  • Maltose = glucose + glucose

Longer chains of saccharides form polysaccharides, such as starch (as you'd find in potatoes or rice) and cellulose. Cellulose gives plants their structure and is indigestible (for the most part) to humans; you've already seen it referred to here as dietary fiber. However, some gut bacteria can digest fiber and generate molecules we can digest.

When digesting, our body will break down carbohydrates into monosaccharides so they can be absorbed in the small intestine. Because this breakdown takes time, the more complex the carbohydrate (meaning the more saccharides are bound together), the slower the digestion. This will leave you feeling full longer and avoid a blood sugar spike.

When your blood sugar spikes, your body releases insulin to remove the toxic levels of sugar from the blood and store it as glycogen and fat. One working theory is that, when you eat a diet filled with simple sugars, you bounce between sugar highs and sugar crashes, the latter leaving you hungry and irritable, and reaching for that next sugary snack. All this is to say: avoid simple sugars!

One method for measuring how quickly carbs are absorbed is the glycemic index (GI), where a higher value means the food is more quickly absorbed. By this standard, you should probably stick to low GI foods, unless you have a specific reason to do otherwise (such as some kind of athletic competition or muscle recovery... but that's complicated and you should do research on it before trying it out).

Of the three monosaccharides, glucose is the one that our body cells can use directly. Fructose and galactose must be processed first by the liver. There are some claims that having a high-fructose diet can put undue strain on the liver, giving one reason why High Fructose Corn Syrup has such a bad rap. This is also a reason why binge-eating fruit—which is high in fructose—may not be a great idea.


Salt

I'm only putting this section in because people will ask. The story with salt is, in my opinion, completely unclear. There are many contradictory studies. If you have hypertension, the general consensus is to reduce salt. Beyond that, conventional wisdom says reducing salt is a good thing, but many newer studies show that it has no benefit. Also, if you're going for a ketogenic diet, make sure to get plenty of electrolytes, including salt, potassium, and magnesium.

Summary of Nutrition

Whew, that's a lot of information! Let me try to simplify all of that down into some practical advice.

  • Avoid processed foods. They're made up of the worst combination of foods that basically everyone agrees will kill you: processed oils, simple sugars and starches, chemicals, and excess salt. Honestly, just following this one piece of advice is in my opinion the best thing you can do for your health.
  • Eat plenty of protein, and try to get it with each meal.
  • Don't eat too many calories in the course of a day.
  • Balance your carbs and fats based on your calorie needs. Try out variations of that balance and see what works for you.
  • Get sufficient omega 3s.
  • If necessary, supplement vitamins and minerals.

I'll tie up this series in my next post, which will go into details on exercise.

June 14, 2017 03:00 AM

June 13, 2017

Michael Snoyman

A Very Naive Overview of Nutrition and Exercise (Part 1)

Some family and friends have been asking me to write up my thoughts on the topic of nutrition and exercise. To give proper warning, I want to say right from the beginning of this that I am not in any way a qualified expert. I'm a computer programmer who was overweight and unhealthy for most of my life until my mid-twenties, when I decided to take control, did a bunch of reading, and have been (mostly) in shape and far healthier since.

I don't want you to take anything I say as gospel; it's not. Hopefully this will give you ideas of where to start, topics worth researching, and short-circuit some of the very self-defeating confusion that I think most of us have suffered through. I'm not providing sources for what I'm writing, partly because I want you to read up on topics yourself, and mostly because I'm too lazy :).

This is something of a continuation of my post on why I lift, though in reality I started on this post first. I had originally intended to make one massive post covering nutrition and exercise; instead, I'm breaking it up into three parts. This post will set the tone and give some background information, and the following two posts will dive into nutrition and exercise in more detail (though still as a "naive overview").

This post series is well off the beaten track for me, and I'm still unsure if I'll write more like it. If you do like it and want to see more, or have specific questions, please say so in the comments and I'll be more likely to write future posts on these topics.


My philosophy

I've come up with the following philosophical points about health and fitness, which guide my own decisions a lot:

  • Overcomplication is a major enemy. Should you follow a vegan diet, a paleo diet, go ketogenic, or respect GI values? Should you run, jog, sprint, lift weights, do bodyweights? This abundance of seemingly contradictory advice is the most demotivating thing out there, and prevents so many of us from getting healthy.
  • While these complications are real, you can get the vast majority of benefits by following many simpler guidelines (I'll talk about those later) that almost everyone agrees on. Do the simple stuff first, worry about the rocket science later.
  • If you read any nutrition study, odds are pretty high there's another study that shows the opposite result. Nutrition science is greatly lacking in replication studies, so take everything you read with a grain of salt (and yes, studies on salt are contradictory too).
  • You'll be best served by following basic guidelines, getting comfortable with those, and then experimenting with different approaches from that baseline. If you're motivated to, go ahead and spend a week or three on a vegan diet, on a keto diet, and anything else you believe has a chance of working. Pay attention to how you respond to it.

Who am I?

I mentioned this a bit in the why I lift post, but I want to give a little more background here. Odds are pretty good that my baseline level of health and fitness is lower than yours. As a child and young adult, I was overweight. I ate junk food constantly. I hardly exercised. I had a few brief bouts where I lost some weight, but it always came back within a year, and with a vengeance.

I've been programming since I was 10 years old. I spent hours on end almost every day since then on a computer or playing video games. I wasn't quite at the stereotype of sitting in a darkened room eating Cheetos and Mountain Dew, but I was pretty close.

Around the age of 25 (give or take a few years), I decided I had enough. I was tired of being overweight. I was scared of developing diabetes. I could barely sit at my desk for 10 minutes without back pain. I woke up in the morning and had trouble getting out of bed. I finally decided that bad health—at least in my case—wasn't a curse of genetics, but something I'd brought on myself, and only I would be able to fix it.

So as you read these posts, I don't want you to become discouraged and think "well, this guy can do this, but I never could, I'm just your average office worker." It's quite the opposite. If I've been able to overcome a lifetime of bad habits and genetic predispositions to negative health conditions, you can too.


Goals

It's useless to talk about "getting healthy" or "getting fit" without some definition of what that means. Some people are going to have very specific goals; for example, a power lifter may want to deadlift as much weight as possible, even if the process shortens his/her lifespan by 10 years. If you have such specific goals, odds are this post isn't for you.

I'm going to guess that most people reading this will probably have the same three goals, though their priorities among the goals will differ:

  • Lose fat
  • Gain muscle
  • Improve general health/increase longevity/feel better. This would include improvements in things like:

    • Cardiovascular function
    • Cholesterol levels

I was specific in my wording on those first two bullets. You may think you want to lose weight, but you won't be happy if you lose weight in the form of muscle mass or (worse) organs. Similarly, you may not think you want to gain muscle, but I'd argue that you do:

  • More muscle = more calories burned, making fat loss easier
  • More muscle makes moving around in day to day life easier
  • You'll look better (both men and women) with more muscle

Caveat: I'm not talking about bodybuilder levels here.

Nutrition and Exercise

Nutrition is what food you put into your body. Exercise is what activities you do with your body. Given the goals above, we have to acknowledge that you need to work on both nutrition and exercise to achieve them. This is the first big mistake I'll address in this post.

  • If you eat a bunch of junk food, almost no level of exercise you perform will burn off the extra fat you're gaining.
  • If you don't do any exercise, your body will get weaker, regardless of what you're eating.

So this is important: you need to do both. Period. If you're going to pick one of them to start off with... I guess I'd say start with nutrition, but it's really a personal call. I'd recommend starting with whatever you believe you're more likely to stick with.

Up next

My next post will dive into details on the nutrition half of the equation, and the following post will dive into exercise. If there are enough questions raised in the comments in these three posts, I'll likely add a fourth Q&A post to this series.

And if you're just desperate to read more now, don't forget about my why I lift post.

June 13, 2017 03:00 AM

June 12, 2017

Roman Cheplyaka

On friendly contributing policies

Neil Mitchell gave a talk at ZuriHac about drive-by contributions. I did not attend ZuriHac this year, but Neil published the slides of his talk. While I was skimming through them, one slide caught my attention:

<figure> </figure>

The quote on the bottom-right is taken from the haskell-src-exts contributing policy, which I wrote back when I was its maintainer.

As I said, I didn’t have the chance to attend the talk, and the video does not seem to be released yet, so I can only guess the message of this slide (perhaps Neil or someone who was present will clarify) — but it looks to me that this quote is an example of an unfriendly message that is contrasted to the first, welcoming, message.

If that’s the case, this doesn’t seem fair to me: the quote is cherry-picked from a section that starts with

So, you’ve fixed a bug or implemented an extension. Awesome!

We strive to get every such pull request reviewed and merged within a month.

For best results, please follow these guidelines:

You could argue whether this is friendly enough or not, but you have to admit that it is at least considerate of contributors’ time and interests.

But what about those two particular sentences on the slide? Are they rude? Is it because they don’t have “please” in them?

I guess that, taken in isolation, they do seem rude for that reason, but they are part of a list of instructions, and you can only make a list of instructions so polite. In particular, I don't think that simply adding "please" in front of every imperative would make it any nicer, but I am not a native English speaker and could be wrong.

On the other hand, if I wrote that policy as a list of polite requests ("Could you please not put multiple unrelated changes in a single pull request?"), I feel that it would take more time and effort for a potential contributor to comprehend and extract the essence of what's written. That would be disrespectful of the contributor's time. Besides, if you have 10+ polite requests in a row, it starts to look like a caricature.

That said, I am open to constructive criticism. If you have ideas about how I could have written it better, please drop me a line.

Update. Neil has written a nice detailed response, which I greatly appreciate.

June 12, 2017 08:00 PM

Gabriel Gonzalez

Translating a C++ parser to Haskell


Recently I translated Nix's derivation parser to Haskell and I thought this would make an instructive example for how C++ idioms map to Haskell idioms. This post targets people who understand Haskell's basic syntax but perhaps have difficulty translating imperative style to a functional style. I will also throw in some benchmarks at the end, too, comparing Haskell performance to C++.

Nix derivations

Nix uses "derivations" to store instructions for how to build something. The corresponding C++ type is called a Derivation, which is located here:

struct Derivation : BasicDerivation
{
    DerivationInputs inputDrvs; /* inputs that are sub-derivations */

    /* Print a derivation. */
    std::string unparse() const;
};

... which in turn references this BasicDerivation type:

struct BasicDerivation
{
    DerivationOutputs outputs; /* keyed on symbolic IDs */
    PathSet inputSrcs; /* inputs that are sources */
    string platform;
    Path builder;
    Strings args;
    StringPairs env;

    virtual ~BasicDerivation() { };

    /* Return the path corresponding to the output identifier `id' in
       the given derivation. */
    Path findOutput(const string & id) const;

    bool willBuildLocally() const;

    bool substitutesAllowed() const;

    bool isBuiltin() const;

    bool canBuildLocally() const;

    /* Return true iff this is a fixed-output derivation. */
    bool isFixedOutput() const;

    /* Return the output paths of a derivation. */
    PathSet outputPaths() const;
};


We can translate the above C++ types to Haskell, even though Haskell is not an object-oriented language.

First, we can translate inheritance to Haskell by either (A) using composition instead of inheritance, like this:

struct Derivation
{
    BasicDerivation basicDrv;

    DerivationInputs inputDrvs; /* inputs that are sub-derivations */

    /* Print a derivation. */
    std::string unparse() const;
};

... or (B) flattening the class hierarchy by combining both classes into an equivalent single-class definition, like this:

struct Derivation
{
    DerivationOutputs outputs; /* keyed on symbolic IDs */
    PathSet inputSrcs; /* inputs that are sources */
    string platform;
    Path builder;
    Strings args;
    StringPairs env;
    DerivationInputs inputDrvs; /* inputs that are sub-derivations */

    virtual ~Derivation() { };

    /* Return the path corresponding to the output identifier `id' in
       the given derivation. */
    Path findOutput(const string & id) const;

    bool willBuildLocally() const;

    bool substitutesAllowed() const;

    bool isBuiltin() const;

    bool canBuildLocally() const;

    /* Return true iff this is a fixed-output derivation. */
    bool isFixedOutput() const;

    /* Return the output paths of a derivation. */
    PathSet outputPaths() const;

    /* Print a derivation. */
    std::string unparse() const;
};

This post will flatten the class hierarchy for simplicity, but in general composition is the more flexible approach for translating inheritance to a functional style.
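To make the composition approach concrete, here is a minimal Haskell sketch (with hypothetical, cut-down field names) of how the "base class" becomes an ordinary field of the wrapper record, and how an "inherited" field is reached through that field:

```haskell
-- A toy model of composition-instead-of-inheritance. The field names
-- here are illustrative, not the post's final types.
data BasicDrv = BasicDrv
    { platform :: String
    , builder  :: String
    }

data Drv = Drv
    { basicDrv  :: BasicDrv   -- the "base class" is just a field
    , inputDrvs :: [String]
    }

-- Accessing an "inherited" field composes two record accessors:
platformOf :: Drv -> String
platformOf = platform . basicDrv
```

The dot here is ordinary function composition, which is all that "going through the base class" amounts to in this encoding.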

Second, we separate out all methods into standalone functions that take an object of that class as their first argument:

struct Derivation
{
    DerivationOutputs outputs; /* keyed on symbolic IDs */
    PathSet inputSrcs; /* inputs that are sources */
    string platform;
    Path builder;
    Strings args;
    StringPairs env;
    DerivationInputs inputDrvs; /* inputs that are sub-derivations */

    virtual ~Derivation() { };
};

/* Return the path corresponding to the output identifier `id' in
   the given derivation. */
Path findOutput(Derivation drv, const string & id);

bool willBuildLocally(Derivation drv);

bool substitutesAllowed(Derivation drv);

bool isBuiltin(Derivation drv);

bool canBuildLocally(Derivation drv);

/* Return true iff this is a fixed-output derivation. */
bool isFixedOutput(Derivation drv);

/* Return the output paths of a derivation. */
PathSet outputPaths(Derivation drv);

/* Print a derivation. */
std::string unparse(Derivation drv);

This is how people encoded object-oriented programming before object-oriented languages existed, and the pattern is still common in functional languages. The disadvantage is that it leads to an import-heavy programming style.
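The same "method becomes a function taking the object first" move looks like this in miniature Haskell (the `Counter` type is a hypothetical example, not part of the post's code):

```haskell
-- A toy "class" with one field.
data Counter = Counter { count :: Int } deriving (Show)

-- What would be `void increment();` in C++ becomes a top-level function
-- whose first (and here only) argument is the object. Since Haskell data
-- is immutable, it returns an updated copy instead of mutating in place.
increment :: Counter -> Counter
increment c = c { count = count c + 1 }
```

Method-call syntax like `c.increment()` then corresponds to plain application: `increment c`.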

We can now translate this C++ to Haskell now that we've reduced the code to simple data types and functions on those types:

data Derivation = Derivation
    { outputs   :: DerivationOutputs
      -- ^ keyed on symbolic IDs
    , inputSrcs :: PathSet
      -- ^ inputs that are sources
    , platform  :: String
    , builder   :: String
    , args      :: Strings
    , env       :: StringPairs
    , inputDrvs :: DerivationInputs
    }

-- | Return the path corresponding to the output identifier `id' in
--   the given derivation.
findOutput :: Derivation -> String -> Path

willBuildLocally :: Derivation -> Bool

substitutesAllowed :: Derivation -> Bool

isBuiltin :: Derivation -> Bool

canBuildLocally :: Derivation -> Bool

-- | Return true iff this is a fixed-output derivation.
isFixedOutput :: Derivation -> Bool

-- | Return the output paths of a derivation.
outputPaths :: Derivation -> PathSet

-- | Print a derivation.
unparse :: Derivation -> String

Since this post is all about parsing we won't be defining or using any of these methods, so we'll throw them away for now and stick to the datatype definition:

data Derivation = Derivation
    { outputs   :: DerivationOutputs -- ^ keyed on symbolic IDs
    , inputSrcs :: PathSet           -- ^ inputs that are sources
    , platform  :: String
    , builder   :: String
    , args      :: Strings
    , env       :: StringPairs
    , inputDrvs :: DerivationInputs
    }

This isn't valid Haskell code, yet, because we haven't defined any of these other types, like DerivationOutputs or PathSet. We'll need to translate their respective C++ definitions to Haskell, too.

The DerivationOutput class resides in the same file:

struct DerivationOutput
{
    Path path;
    string hashAlgo; /* hash used for expected hash computation */
    string hash; /* expected hash, may be null */
    DerivationOutput(Path path, string hashAlgo, string hash)
    {
        this->path = path;
        this->hashAlgo = hashAlgo;
        this->hash = hash;
    }
    void parseHashInfo(bool & recursive, Hash & hash) const;
};

When we strip the methods, that translates to Haskell as:

data DerivationOutput = DerivationOutput
    { path     :: Path
    , hashAlgo :: String -- ^ hash used for expected hash computation
    , hash     :: String -- ^ expected hash, may be null
    }

All of the other C++ types are typedefs which reside in either the same file or in Nix's types.hh file:

I'll consolidate all the relevant typedefs here:

typedef string Path;
typedef set<Path> PathSet;
typedef list<string> Strings;
typedef set<string> StringSet;
typedef std::map<string, string> StringPairs;
typedef std::map<Path, StringSet> DerivationInputs;
typedef std::map<string, DerivationOutput> DerivationOutputs;

The Haskell analog of a C++ typedef is a type synonym, and the above C++ typedefs translate to the following type synonyms:

import Data.Map (Map)
import Data.Set (Set)

type Path = String
type PathSet = Set Path
type Strings = [String] -- [a] is Haskell syntax for "list of `a`s"
type StringSet = Set String
type StringPairs = Map String String
type DerivationInputs = Map Path StringSet
type DerivationOutputs = Map String DerivationOutput

Note that Haskell type synonyms reverse the order of the types compared to C++. The new type that you define goes on the left and the body of the definition goes on the right. Haskell's order makes more sense to me, since I'm used to the same order when defining values like x = 5.
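A type synonym is fully interchangeable with the type it names, just like a C++ `typedef`. This tiny sketch (the `render` function is a hypothetical example) shows a `Path` being used anywhere a `String` is expected:

```haskell
-- Mirrors `typedef string Path;` from the C++ code.
type Path = String

-- `Path` and `String` are the same type, so we can freely mix them:
render :: Path -> String
render p = "path: " ++ p
```

The compiler expands the synonym away; `render` has exactly the same type as a `String -> String` function.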

There are a few more changes that I'd like to make before we proceed to the parsing code:

First, Haskell's String type and default list type are inefficient for both performance and space utilization, so we will replace them with Text and Vector, respectively. The latter types are more compact and provide better performance:

import Data.Map (Map)
import Data.Set (Set)
import Data.Text (Text)
import Data.Vector (Vector)

type Path = Text
type PathSet = Set Path
type Strings = Vector Text
type StringSet = Set Text
type StringPairs = Map Text Text
type DerivationInputs = Map Path StringSet
type DerivationOutputs = Map Text DerivationOutput

Second, I prefer to use a separate type for Paths that is not synonymous with Text in order to avoid accidentally conflating the two:

import Filesystem.Path.CurrentOS (FilePath)
import Data.Map (Map)
import Data.Set (Set)
import Data.Text (Text)
import Data.Vector (Vector)
-- The Prelude `FilePath` is a synonym for `String`
import Prelude hiding (FilePath)

type Path = FilePath
type PathSet = Set Path
type Strings = Vector Text
type StringSet = Set Text
type StringPairs = Map Text Text
type DerivationInputs = Map Path StringSet
type DerivationOutputs = Map Text DerivationOutput

Third, I prefer to avoid using type synonyms, since I believe they make Haskell code harder to read. Instead, I fully inline all types, like this:

import Filesystem.Path.CurrentOS (FilePath)
import Data.Map (Map)
import Data.Set (Set)
import Data.Text (Text)
import Data.Vector (Vector)
import Prelude hiding (FilePath)

data Derivation = Derivation
    { outputs   :: Map Text DerivationOutput -- ^ keyed on symbolic IDs
    , inputSrcs :: Set FilePath              -- ^ inputs that are sources
    , platform  :: Text
    , builder   :: Text
    , args      :: Vector Text
    , env       :: Map Text Text
    , inputDrvs :: Map FilePath (Set Text)
    }

data DerivationOutput = DerivationOutput
    { path     :: FilePath
    , hashAlgo :: Text -- ^ hash used for expected hash computation
    , hash     :: Text -- ^ expected hash, may be null
    }

Fourth, Haskell lets you auto-generate code to render the data type, which is useful for debugging purposes. All you have to do is add deriving (Show) to the end of the datatype definition, like this:

import Filesystem.Path.CurrentOS (FilePath)
import Data.Map (Map)
import Data.Set (Set)
import Data.Text (Text)
import Data.Vector (Vector)
import Prelude hiding (FilePath)

data Derivation = Derivation
    { outputs   :: Map Text DerivationOutput -- ^ keyed on symbolic IDs
    , inputSrcs :: Set FilePath              -- ^ inputs that are sources
    , platform  :: Text
    , builder   :: Text
    , args      :: Vector Text
    , env       :: Map Text Text
    , inputDrvs :: Map FilePath (Set Text)
    } deriving (Show)

data DerivationOutput = DerivationOutput
    { path     :: FilePath
    , hashAlgo :: Text -- ^ hash used for expected hash computation
    , hash     :: Text -- ^ expected hash, may be null
    } deriving (Show)
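To see what `deriving (Show)` buys us, here is a cut-down hypothetical record (not the post's actual type) and the rendering the derived instance produces:

```haskell
-- `deriving (Show)` auto-generates a `show` function that renders the
-- record with its field names, which is handy when debugging in GHCi.
data Output = Output
    { name     :: String
    , hashAlgo :: String
    } deriving (Show)
```

For example, `show (Output "out" "")` renders as `Output {name = "out", hashAlgo = ""}`.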

Finally, we'll change the order of the Derivation fields to match the order that they are stored when serialized to disk:

import Filesystem.Path.CurrentOS (FilePath)
import Data.Map (Map)
import Data.Set (Set)
import Data.Text (Text)
import Data.Vector (Vector)
import Prelude hiding (FilePath)

data Derivation = Derivation
    { outputs   :: Map Text DerivationOutput -- ^ keyed on symbolic IDs
    , inputDrvs :: Map FilePath (Set Text)
    , inputSrcs :: Set FilePath              -- ^ inputs that are sources
    , platform  :: Text
    , builder   :: Text
    , args      :: Vector Text
    , env       :: Map Text Text
    } deriving (Show)

data DerivationOutput = DerivationOutput
    { path     :: FilePath
    , hashAlgo :: Text -- ^ hash used for expected hash computation
    , hash     :: Text -- ^ expected hash, may be null
    } deriving (Show)

Derivation format

Nix stores derivations as *.drv files underneath the /nix/store directory. For example, here is what one such file looks like:

$ cat /nix/store/zzhs4fb83x5ygvjqn5rdpmpnishpdgy6-perl-MIME-Types-2.13.drv
t devdoc"),("propagatedBuildInputs",""),("propagatedNativeBuildInputs",""),("src

This corresponds to the following Haskell value using the types we just defined:

{ outputs =
[ ( "devdoc"
, DerivationOutput
{ path =
, hashAlgo = ""
, hash = ""
, ( "out"
, DerivationOutput
{ path =
, hashAlgo = ""
, hash = ""
, inputDrvs =
[ ( "/nix/store/57h2hjsdkdiwbzilcjqkn46138n1xb4a-perl-5.22.3.drv"
, Data.Set.fromList [ "out" ]
, ( "/nix/store/cvdbbvnvg131bz9bwyyk97jpq1crclqr-MIME-Types-2.13.tar.gz.drv"
, Data.Set.fromList [ "out" ]
, ( "/nix/store/p5g31bc5x92awghx9dlm065d7j773l0r-stdenv.drv"
, Data.Set.fromList [ "out" ]
, ( "/nix/store/x50y5qihwsn0lfjhrf1s81b5hgb9w632-bash-4.4-p5.drv"
, Data.Set.fromList [ "out" ]
, inputSrcs =
[ "/nix/store/"
, platform = "x86_64-linux"
, builder =
, args =
[ "-e" , "/nix/store/" ]
, env =
, ( "PERL_AUTOINSTALL" , "--skipdeps" )
, ( "buildInputs" , "" )
, ( "builder"
, "/nix/store/fi3mbd2ml4pbgzyasrlnp0wyy6qi48fh-bash-4.4-p5/bin/bash"
, ( "checkTarget" , "test" )
, ( "devdoc"
, "/nix/store/15x9ii8c3n5wb5lg80cm8x0yk6zy7rha-perl-MIME-Types-2.13-devdoc"
, ( "doCheck" , "1" )
, ( "installTargets" , "pure_install" )
, ( "name" , "perl-MIME-Types-2.13" )
, ( "nativeBuildInputs"
, "/nix/store/nsa311yg8h93wfaacjk16c96a98bs09f-perl-5.22.3"
, ( "out"
, "/nix/store/93d75ghjyibmbxgfzwhh4b5zwsxzs44w-perl-MIME-Types-2.13"
, ( "outputs" , "out devdoc" )
, ( "propagatedBuildInputs" , "" )
, ( "propagatedNativeBuildInputs" , "" )
, ( "src"
, "/nix/store/5smhymz7viq8p47mc3jgyvqd003ab732-MIME-Types-2.13.tar.gz"
, ( "stdenv"
, "/nix/store/s3rlr45jzlzx0d6k2azlpxa5zwzr7xyy-stdenv"
, ( "system" , "x86_64-linux" )

We can express the serialization format using the following Extended Backus-Naur Form:

derivation = 'Derive('
, outputs
, ','
, inputDrvs
, ','
, inputSrcs
, ','
, platform
, ','
, builder
, ','
, args
, ','
, env
, ')'

outputs = '[]' | '[', output, { ',', output }, ']'

output = '(', string, ',', path, ',', string, ',', string, ')'

inputDrvs = '[]' | '[', inputDrv, { ',', inputDrv }, ']'

inputDrv = '(', path, ',', strings, ')'

inputSrcs = paths

platform = string

builder = string

args = strings

env = '[]' | '[', stringPair, { ',', stringPair }, ']'

stringPair = '(', string, ',', string, ')'

strings = '[]' | '[', string, { ',', string }, ']'

paths = '[]' | '[', path, { ',', path }, ']'

string = '"', { char }, '"'

path = '"/', { char }, '"'

char = ( '\', <any character> ) | <character other than '"' or '\'>

Now we just need a way to convert from Nix's serialization format to the Derivation type.

Parsing derivations

You can find Nix's parseDerivation function here:

... which is what we will translate to Haskell. If you would like to follow along you can find the completed parser code in Appendix A.

Let's start from the top:

static Derivation parseDerivation(const string & s)
{
    Derivation drv;
    istringstream_nocopy str(s);
    expect(str, "Derive([");

    /* Parse the list of outputs. */
    while (!endOfList(str)) {
        DerivationOutput out;
        expect(str, "("); string id = parseString(str);
        expect(str, ","); out.path = parsePath(str);
        expect(str, ","); out.hashAlgo = parseString(str);
        expect(str, ","); out.hash = parseString(str);
        expect(str, ")");
        drv.outputs[id] = out;
    }

    ...
}


static bool endOfList(std::istream & str)
{
    if (str.peek() == ',') {
        str.get();
        return false;
    }
    if (str.peek() == ']') {
        str.get();
        return true;
    }
    return false;
}

The first thing the C++ code parses is the string "Derive(" followed by a list of DerivationOutputs. The code consolidates the first '[' character of the list with the string "Derive(", which is why it actually matches "Derive([".

This code corresponds to the outputs field of our Derivation type:

data Derivation = Derivation
    { outputs :: Map Text DerivationOutput -- ^ keyed on symbolic IDs
    } deriving (Show)

data DerivationOutput = DerivationOutput
    { path     :: FilePath
    , hashAlgo :: Text -- ^ hash used for expected hash computation
    , hash     :: Text -- ^ expected hash, may be null
    } deriving (Show)

Derivation files store the outputs field of our Derivation type as a list of 4-tuples. The first field of each 4-tuple is a key in our Map and the remaining three fields are the corresponding value, which is marshalled into a DerivationOutput.
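The marshalling step, in isolation, is just `Data.Map.fromList` applied to the 4-tuples with the first field pulled out as the key. A minimal sketch (with plain `String` fields standing in for the real types):

```haskell
import qualified Data.Map as Map

-- Turn a list of parsed 4-tuples into a Map keyed on the first field.
-- The 3-tuple value stands in for the DerivationOutput record.
toOutputs
    :: [(String, String, String, String)]
    -> Map.Map String (String, String, String)
toOutputs entries = Map.fromList [ (k, (p, a, h)) | (k, p, a, h) <- entries ]
```

If the same key appears twice, `Map.fromList` keeps the later entry, which matches the C++ `drv.outputs[id] = out;` overwrite behavior.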

The C++ code interleaves the logic for parsing the list structure and parsing each element but our Haskell code will separate the two for clarity:

{-# LANGUAGE RecordWildCards   #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.Text.Lazy (Parser)
import Data.Map (Map)
import Data.Text (Text)

parseDerivation :: Parser Derivation
parseDerivation = do
    "Derive("

    let keyValue0 :: Parser (Text, DerivationOutput)
        keyValue0 = do
            "("
            key <- string
            ","
            path <- filepath
            ","
            hashAlgo <- string
            ","
            hash <- string
            ")"
            return (key, DerivationOutput {..})
    outputs <- mapOf keyValue0

    ...


-- We will fill these in later

mapOf :: Ord k => Parser (k, v) -> Parser (Map k v)
mapOf = ???

string :: Parser Text
string = ???

filepath :: Parser FilePath
filepath = ???

You can read the Haskell code as saying:

  • First match the string "Derive("
  • Now define a parser for a key-value pair called keyValue0, which will:
    • Match the string "("
    • Parse a string and store the result as a value named key
    • Match the string ","
    • Parse a path and store the result as a value named path
    • Match the string ","
    • Parse a string and store the result as a value named hashAlgo
    • Match the string ","
    • Parse a string and store the result as a value named hash
    • Match the string ")"
    • Return a key-value pair:
      • The key is key
      • The value is a DerivationOutput built from path/hashAlgo/hash
        • The {..} populates record fields with values of the same name
  • Use the mapOf utility to parse a list of key-value pairs as a Map

Also, the OverloadedStrings extension is what lets us use bare string literals as parsers that match the given literal.
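The mechanism behind OverloadedStrings is the IsString class: the extension desugars every string literal to `fromString "..."`, so any type with an IsString instance can stand in for a literal. A self-contained toy (the `Expect` type is hypothetical, standing in for attoparsec's `Parser`):

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.String (IsString (..))

-- A stand-in for a parser that matches an exact string.
newtype Expect = Expect String deriving (Show, Eq)

instance IsString Expect where
    fromString = Expect

-- With OverloadedStrings, this literal elaborates to
-- `fromString "Derive("`, i.e. `Expect "Derive("`:
literal :: Expect
literal = "Derive("
```

attoparsec's `Parser` has an IsString instance that works the same way, which is why a naked `"Derive("` in a do-block acts as a match.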

If we really wanted to be like the C++ code we could put more than one statement on each line using semicolons, like this:

    let keyValue0 :: Parser (Text, DerivationOutput)
        keyValue0 = do
            "("; key      <- string
            ","; path     <- filepath
            ","; hashAlgo <- string
            ","; hash     <- string
            ")"
            return (key, DerivationOutput {..})

... but I prefer to keep them on separate lines for readability.

The code has placeholders for three utilities we haven't defined yet with the following types:

-- This is a utility function that transforms a parser of key-value pairs into a
-- parser for a `Map`
mapOf
    :: Ord k
    -- ^ This is a "constraint" and not a function argument. This constraint
    --   says that `k` can be any type as long as we can compare two values of
    --   type `k`
    => Parser (k, v)
    -- ^ This is the actual first function argument: a parser of key-value
    --   pairs. The type of the key (which we denote as `k`) can be any type as
    --   long as `k` is comparable (due to the `Ord k` constraint immediately
    --   preceding this). The type of the value (which we denote as `v`) can be
    --   any type
    -> Parser (Map k v)
    -- ^ This is the function output: a parser of a `Map k v` (i.e. a map from
    --   keys of type `k` to values of type `v`)

-- This is a utility which parses a string literal according to the EBNF rule
-- named `string`
string :: Parser Text

-- This is a utility which parses a string literal according to the EBNF rule
-- named `path`
filepath :: Parser FilePath

mapOf is fairly simple to define:

import qualified Data.Attoparsec.Text.Lazy
import qualified Data.Map

mapOf :: Ord k => Parser (k, v) -> Parser (Map k v)
mapOf keyValue = do
    keyValues <- listOf keyValue
    return (Data.Map.fromList keyValues)

-- | Given a parser for an element, return a parser for a list of elements
listOf :: Parser a -> Parser [a]
listOf element = do
    "["
    es <- Data.Attoparsec.Text.Lazy.sepBy element ","
    "]"
    return es

mapOf uses a helper function named listOf which parses a list of values. This parser takes advantage of the handy sepBy utility (short for "separated by") provided by Haskell's attoparsec library. You can read the implementation of listOf as saying:

  • Match the string "["
  • Match 0 or more elements separated by commas
  • Match the string "]"

Then you can read the implementation of mapOf as saying:

  • Parse a list of keyValue pairs
  • Use Data.Map.fromList to transform that into the corresponding Map
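The same listOf/mapOf structure can be sketched with base's ReadP combinators instead of attoparsec, which keeps the example dependency-free (assumption: this is illustrative only; the post's real parser uses Data.Attoparsec.Text.Lazy, and the `pair` parser below is a toy):

```haskell
import Data.Char (isDigit)
import qualified Data.Map as Map
import Text.ParserCombinators.ReadP

-- Match '[', zero or more elements separated by commas, then ']'.
listOf :: ReadP a -> ReadP [a]
listOf element = between (char '[') (char ']') (sepBy element (char ','))

-- Parse a list of key-value pairs and collect them into a Map.
mapOf :: Ord k => ReadP (k, v) -> ReadP (Map.Map k v)
mapOf keyValue = fmap Map.fromList (listOf keyValue)

-- A toy key-value parser for input like `(out,1)`:
pair :: ReadP (String, Int)
pair = do
    _ <- char '('
    key <- munch1 (\c -> c /= ',' && c /= ')')
    _ <- char ','
    val <- munch1 isDigit
    _ <- char ')'
    return (key, read val)
```

Running `readP_to_S (mapOf pair <* eof) "[(a,1),(b,2)]"` yields the map from `"a"` to `1` and `"b"` to `2`.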

We can now use mapOf and listOf to transform the next block of parsing code, too:

static Derivation parseDerivation(const string & s)
{
    ...

    /* Parse the list of input derivations. */
    expect(str, ",[");
    while (!endOfList(str)) {
        expect(str, "(");
        Path drvPath = parsePath(str);
        expect(str, ",[");
        drv.inputDrvs[drvPath] = parseStrings(str, false);
        expect(str, ")");
    }

    ...
}


static StringSet parseStrings(std::istream & str, bool arePaths)
{
    StringSet res;
    while (!endOfList(str))
        res.insert(arePaths ? parsePath(str) : parseString(str));
    return res;
}

The corresponding Haskell code is:

import qualified Data.Set

parseDerivation :: Parser Derivation
parseDerivation = do
    ...

    ","

    let keyValue1 = do
            "("
            key <- filepath
            ","
            value <- setOf string
            ")"
            return (key, value)
    inputDrvs <- mapOf keyValue1

    ...


setOf :: Ord a => Parser a -> Parser (Set a)
setOf element = do
    es <- listOf element
    return (Data.Set.fromList es)

The only difference is that the Haskell code doesn't define a parser for a set of strings. Instead, the Haskell code defines a more general parser for a set of any type of value.
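At its core, setOf just hands the parsed list to `Data.Set.fromList`, which works for any element type with an Ord instance. A small standalone sketch of that final step (toy input, not a derivation file):

```haskell
import qualified Data.Set as Set

-- Data.Set.fromList deduplicates and orders the elements, so repeated
-- entries in the parsed list collapse into one:
outputsSet :: Set.Set String
outputsSet = Set.fromList ["out", "out", "devdoc"]
```

`Set.toList outputsSet` gives back the sorted, deduplicated `["devdoc", "out"]`.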

The remaining parsing logic is fairly straightforward to translate. This C++ code:

static Derivation parseDerivation(const string & s)
{
    ...

    expect(str, ",["); drv.inputSrcs = parseStrings(str, true);
    expect(str, ","); drv.platform = parseString(str);
    expect(str, ","); drv.builder = parseString(str);

    /* Parse the builder arguments. */
    expect(str, ",[");
    while (!endOfList(str))
        drv.args.push_back(parseString(str));

    /* Parse the environment variables. */
    expect(str, ",[");
    while (!endOfList(str)) {
        expect(str, "("); string name = parseString(str);
        expect(str, ","); string value = parseString(str);
        expect(str, ")");
        drv.env[name] = value;
    }

    expect(str, ")");

    ...
}


... becomes this Haskell code:

import qualified Data.Vector

parseDerivation :: Parser Derivation
parseDerivation = do
    ...

    inputSrcs <- setOf filepath

    ","

    platform <- string

    ","

    builder <- string

    ","

    args <- vectorOf string

    ","

    let keyValue2 = do
            "("
            key <- string
            ","
            value <- string
            ")"
            return (key, value)
    env <- mapOf keyValue2

    ...

vectorOf :: Parser a -> Parser (Vector a)
vectorOf element = do
    es <- listOf element
    return (Data.Vector.fromList es)

The only thing missing is to translate the C++ code for parsing strings and paths to Haskell. The original C++ code is:

/* Read a C-style string from stream `str'. */
static string parseString(std::istream & str)
{
    string res;
    expect(str, "\"");
    int c;
    while ((c = str.get()) != '"')
        if (c == '\\') {
            c = str.get();
            if (c == 'n') res += '\n';
            else if (c == 'r') res += '\r';
            else if (c == 't') res += '\t';
            else res += c;
        }
        else res += c;
    return res;
}

However, we won't naively translate that to Haskell because this is on our parser's critical path for performance. Haskell's attoparsec library only guarantees good performance if you use bulk parsing primitives when possible instead of character-at-a-time parsing loops.

Our Haskell string literal parser will be a loop, but each iteration of the loop will parse a string block instead of a single character:

import qualified Data.Text.Lazy

string :: Parser Text
string = do
    "\""
    let predicate c = not (c == '"' || c == '\\')
    let loop = do
            text0 <- Data.Attoparsec.Text.Lazy.takeWhile predicate
            char0 <- Data.Attoparsec.Text.Lazy.anyChar
            text2 <- case char0 of
                '"' -> return ""
                _   -> do
                    char1 <- Data.Attoparsec.Text.Lazy.anyChar
                    char2 <- case char1 of
                        'n' -> return '\n'
                        'r' -> return '\r'
                        't' -> return '\t'
                        _   -> return char1
                    text1 <- loop
                    return (Data.Text.Lazy.cons char2 text1)
            return (Data.Text.Lazy.fromStrict text0 <> text2)
    text <- loop
    return (Data.Text.Lazy.toStrict text)

In Haskell, loops become recursive definitions such as the above loop. You can read the above parser as saying:

  • Match a double quote character: "\""
  • Now, define a function named predicate
    • predicate takes a single character c as input
    • predicate returns True if c is neither a quote nor a backslash
  • Now define a loop named loop, which will:
    • Consume consecutive characters up to first quote or backslash (text0)
    • Consume the next character (char0) and branch on its value:
      • If char0 is a double quote, then text2 is the empty string
      • If char0 is a backslash then:
        • Consume the next character (char1) and branch on its value:
          • If char1 is n/r/t then char2 is the matching escape code
          • Otherwise, char2 is just char1
        • Run loop again to parse the rest of the string (text1)
        • text2 is char2 prepended onto text1
    • Return a lazy text0 concatenated with text2
      • Concatenation is more efficient for lazy Text than strict Text
  • Run our recursive loop and store the result as text
  • Transform our lazy Text back into a strict Text result
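Here is that same parser packaged as a runnable sketch, applied to a sample input containing an escaped newline. The sample input is invented; `parseOnly` works on strict Text, so the lazy-Text plumbing described above is kept intact:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.Text (Parser, anyChar, char, parseOnly, takeWhile)
import Data.Monoid ((<>))
import Data.Text (Text)
import qualified Data.Text.Lazy
import Prelude hiding (takeWhile)

-- Bulk `takeWhile`, then branch on the next character,
-- recursing only at escape sequences
string :: Parser Text
string = do
    _ <- char '"'
    let predicate c = not (c == '"' || c == '\\')
    let loop = do
            text0 <- takeWhile predicate
            char0 <- anyChar
            text2 <- case char0 of
                '"' -> return ""
                _   -> do
                    char1 <- anyChar
                    char2 <- case char1 of
                        'n' -> return '\n'
                        'r' -> return '\r'
                        't' -> return '\t'
                        _   -> return char1
                    text1 <- loop
                    return (Data.Text.Lazy.cons char2 text1)
            return (Data.Text.Lazy.fromStrict text0 <> text2)
    text <- loop
    return (Data.Text.Lazy.toStrict text)

main :: IO ()
main = print (parseOnly string "\"a\\nb\"")
-- prints: Right "a\nb"
```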

Once we have a string parser we can then implement the filepath parser. The C++ version is:

static Path parsePath(std::istream & str)
{
    string s = parseString(str);
    if (s.size() == 0 || s[0] != '/')
        throw FormatError(format("bad path ‘%1%’ in derivation") % s);
    return s;
}

The corresponding Haskell code is:

filepath :: Parser FilePath
filepath = do
    text <- string
    case Data.Text.uncons text of
        Just ('/', _) -> do
            return (Filesystem.Path.CurrentOS.fromText text)
        _ -> do
            fail ("bad path ‘" <> Data.Text.unpack text <> "’ in derivation")

You can read that as saying:

  • Parse a string and store the result as a value named text
  • Inspect the beginning of the string and branch on the result:
    • If the beginning of the string is '/', then convert the string to a path
    • If the string is empty or does not begin with '/', then die

If we wanted to more closely match the C++ version, we could have done something like this:

import Prelude hiding (FilePath)
import Filesystem.Path.CurrentOS (FilePath)

import qualified Data.Text
import qualified Filesystem.Path.CurrentOS

filepath :: Parser FilePath
filepath = do
    text <- string
    case Data.Text.uncons text of
        Just ('/', _) -> do
            return ()
        _ -> do
            fail ("bad path ‘" <> Data.Text.unpack text <> "’ in derivation")
    return (Filesystem.Path.CurrentOS.fromText text)

This works because Haskell's return is not the same as return in C/C++/Java: the Haskell return does not exit from the surrounding subroutine. Indeed, there is no such thing as a "surrounding subroutine" in Haskell, and that's a good thing!

In this context the return function is like a Parser that does not parse anything and returns any value that you want. More generally, return is used to denote a subroutine that does nothing and produces a value that can be stored just like any other command.
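A quick made-up example of that difference: a `return` in the middle of a do block merely produces a value for the next step; the rest of the block still runs:

```haskell
example :: IO Int
example = do
    x <- return 1      -- `return` just wraps 1; nothing "exits" here
    print x            -- this line still runs
    return (x + 1)     -- the block's final result

main :: IO ()
main = do
    result <- example
    print result
-- prints:
-- 1
-- 2
```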


Let's test the performance of our parser on a sample derivation file:

import Criterion (Benchmark)

import qualified Criterion
import qualified Criterion.Main
import qualified Data.Attoparsec.Text.Lazy
import qualified Data.Text.Lazy.IO
import qualified Nix.Derivation

main :: IO ()
main = Criterion.Main.defaultMain benchmarks

benchmarks :: [Benchmark]
benchmarks =
    [ Criterion.Main.env
        (Data.Text.Lazy.IO.readFile "/nix/store/zx3rshaya690y0xlc64jb8i12ljr8nyp-ghc-8.0.2-with-packages.drv")
        bench0
    ]
  where
    bench0 example = Criterion.bench "example" (Criterion.whnf parseExample example)

    parseExample =
        Data.Attoparsec.Text.Lazy.parse Nix.Derivation.parseDerivation

... where /nix/store/zx3rshaya690y0xlc64jb8i12ljr8nyp-ghc-8.0.2-with-packages.drv is a 15 KB file that you can find in Appendix B of this post. This benchmark gives the following results:

Running 1 benchmarks...
Benchmark benchmark: RUNNING...
benchmarking example
time 3.230 ms (3.215 ms .. 3.244 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 3.265 ms (3.251 ms .. 3.285 ms)
std dev 54.87 μs (41.41 μs .. 74.99 μs)

Benchmark benchmark: FINISH

Our derivation file is 15,210 characters long, so that comes out to about 200 nanoseconds per character to parse. That's not bad, but could still use improvement. However, I stopped optimizing at this point because I did some experiments that showed that parsing was no longer the bottleneck for even a trivial program.

I compared the performance of an executable written in Haskell to the nix-store executable (written in C++) to see how fast each one could display the outputs of a list of derivations. I ran them on 169 derivations all beginning with the letter z in their hash:

$ ls -d /nix/store/z*.drv | wc -l
169

The nix-store command lets you do this with nix-store --query --outputs:

$ nix-store --query --outputs /nix/store/z*.drv

I compared that to the following Haskell program, which parses a list of paths from the command line and then displays their outputs:

{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.Text.Lazy (Result(..))

import qualified Data.Attoparsec.Text.Lazy
import qualified Data.Text.Lazy.IO
import qualified Nix.Derivation
import qualified Options.Generic

main :: IO ()
main = do
    paths <- Options.Generic.getRecord "Get the outputs of a Nix derivation"
    mapM_ process (paths :: [FilePath])

process :: FilePath -> IO ()
process path = do
    text <- Data.Text.Lazy.IO.readFile path
    case Data.Attoparsec.Text.Lazy.parse Nix.Derivation.parseDerivation text of
        Fail _ _ string -> fail string
        Done _ derivation -> do
            let printOutput output = print (Nix.Derivation.path output)
            mapM_ printOutput (Nix.Derivation.outputs derivation)

... which gives this output:

$ query-outputs /nix/store/z*.drv
FilePath "/nix/store/qq46wcgwk7lh7v5hvlsbr3gi30wh7a81-ansi-wl-pprint-"
FilePath "/nix/store/sn0v9rkg0q5pdhm6246c7sigrih22k9h-tagged-0.8.5"
FilePath "/nix/store/zsryzwadshszfnkm740b2412v88iqgi4-semigroups-0.18.2"
FilePath "/nix/store/mxl1p0033xf8yd6r5i6h3jraz40akqyb-perl-DBIx-Class-0.082840-devdoc"

I benchmarked both of these executables using my bench utility. Benchmarks show that both executables take the same amount of time to process all 169 derivation files:

$ bench 'nix-store --query --outputs /nix/store/z*.drv'
benchmarking nix-store --query --outputs /nix/store/z*.drv
time 84.19 ms (83.16 ms .. 85.40 ms)
1.000 R² (0.999 R² .. 1.000 R²)
mean 84.33 ms (83.92 ms .. 84.84 ms)
std dev 781.0 μs (581.5 μs .. 1.008 ms)

$ bench 'query-outputs /nix/store/z*.drv'
benchmarking query-outputs /nix/store/z*.drv
time 83.52 ms (82.88 ms .. 83.85 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 84.12 ms (83.83 ms .. 84.67 ms)
std dev 606.0 μs (161.1 μs .. 849.9 μs)

Also, note that 9 milliseconds are due to the overhead of the benchmark tool running a subprocess:

$ bench true
benchmarking true
time 9.274 ms (9.161 ms .. 9.348 ms)
0.998 R² (0.995 R² .. 1.000 R²)
mean 9.324 ms (9.233 ms .. 9.502 ms)
std dev 333.5 μs (183.1 μs .. 561.9 μs)
variance introduced by outliers: 15% (moderately inflated)

... so if you factor in that overhead, both tools process derivations at a rate of about 440 microseconds per file. Given that the Haskell executable is just as efficient as the C++ one, I figured that there was no point in further optimizing the code. The first draft is simple, clear, and efficient enough.


Hopefully this helps people see that you can translate C++ parsing code to Haskell. The main difference is that Haskell parsing libraries provide some higher-level abstractions and Haskell programs tend to define loops via recursion instead of iteration.

The Haskell code is simpler than the C++ code and efficient, too! This is why I recommend Haskell to people who want a high-level programming language without sacrificing performance.

I also released the above Haskell parser as part of the nix-derivation library in case people were interested in using this code. You can find the library on Hackage or on GitHub.

Appendix A: Completed parser

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE RecordWildCards #-}

import Data.Attoparsec.Text.Lazy (Parser)
import Data.Map (Map)
import Data.Monoid ((<>))
import Data.Set (Set)
import Data.Text (Text)
import Data.Vector (Vector)
import Nix.Derivation.Types (Derivation(..), DerivationOutput(..))
import Prelude hiding (FilePath)
import Filesystem.Path.CurrentOS (FilePath)

import qualified Data.Attoparsec.Text.Lazy
import qualified Data.Map
import qualified Data.Set
import qualified Data.Text
import qualified Data.Text.Lazy
import qualified Data.Vector
import qualified Filesystem.Path.CurrentOS

-- | Parse a derivation
parseDerivation :: Parser Derivation
parseDerivation = do
    "Derive("

    let keyValue0 = do
            "("
            key <- string
            ","
            path <- filepath
            ","
            hashAlgo <- string
            ","
            hash <- string
            ")"
            return (key, DerivationOutput {..})
    outputs <- mapOf keyValue0

    ","

    let keyValue1 = do
            "("
            key <- filepath
            ","
            value <- setOf string
            ")"
            return (key, value)
    inputDrvs <- mapOf keyValue1

    ","

    inputSrcs <- setOf filepath

    ","

    platform <- string

    ","

    builder <- string

    ","

    args <- vectorOf string

    ","

    let keyValue2 = do
            "("
            key <- string
            ","
            value <- string
            ")"
            return (key, value)
    env <- mapOf keyValue2

    ")"

    return (Derivation {..})

string :: Parser Text
string = do
    "\""
    let predicate c = not (c == '"' || c == '\\')
    let loop = do
            text0 <- Data.Attoparsec.Text.Lazy.takeWhile predicate
            char0 <- Data.Attoparsec.Text.Lazy.anyChar
            text2 <- case char0 of
                '"' -> return ""
                _   -> do
                    char1 <- Data.Attoparsec.Text.Lazy.anyChar
                    char2 <- case char1 of
                        'n' -> return '\n'
                        'r' -> return '\r'
                        't' -> return '\t'
                        _   -> return char1
                    text1 <- loop
                    return (Data.Text.Lazy.cons char2 text1)
            return (Data.Text.Lazy.fromStrict text0 <> text2)
    text <- loop
    return (Data.Text.Lazy.toStrict text)

filepath :: Parser FilePath
filepath = do
    text <- string
    case Data.Text.uncons text of
        Just ('/', _) -> do
            return (Filesystem.Path.CurrentOS.fromText text)
        _ -> do
            fail ("bad path ‘" <> Data.Text.unpack text <> "’ in derivation")

listOf :: Parser a -> Parser [a]
listOf element = do
    "["
    es <- Data.Attoparsec.Text.Lazy.sepBy element ","
    "]"
    return es

setOf :: Ord a => Parser a -> Parser (Set a)
setOf element = do
    es <- listOf element
    return (Data.Set.fromList es)

vectorOf :: Parser a -> Parser (Vector a)
vectorOf element = do
    es <- listOf element
    return (Data.Vector.fromList es)

mapOf :: Ord k => Parser (k, v) -> Parser (Map k v)
mapOf keyValue = do
    keyValues <- listOf keyValue
    return (Data.Map.fromList keyValues)

Appendix B: Example derivation

qi48fh-bash-4.4-p5/bin/bash",["-e","/nix/store/9krlzvny65gdc8s7kpb6lkx8cd02c25b-"],[("allowSubstitutes",""),("buildCommand","mkdir -p $out\nfo
r i in $paths; do\n /nix/store/lnai0im3lcpb03arxfi0wx1dm7anf4f8-lndir-1.0.3/bin
/lndir $i $out\ndone\n. /nix/store/plmya6mkfvq658ba7z6j6n36r5pdbxk5-hook/nix-sup
port/setup-hook\n\n# wrap compiler executables with correct env variables\n\nfor
prg in ghc ghci ghc-8.0.2 ghci-8.0.2; do\n if [[ -x \"/nix/store/s0hpng652hsn4
0jy4kjdh1x0jm86dx9l-ghc-8.0.2/bin/$prg\" ]]; then\n rm -f $out/bin/$prg\n
makeWrapper /nix/store/s0hpng652hsn40jy4kjdh1x0jm86dx9l-ghc-8.0.2/bin/$prg $out/
bin/$prg \\\n --add-flags '\"-B$NIX_GHC_LIBDIR\"'
\\\n --set \"NIX_GHC\" \"$out/bin/ghc\" \\\n
--set \"NIX_GHCPKG\" \"$out/bin/ghc-pkg\" \\\n --set \"NIX_GHC_DOC
DIR\" \"$out/share/doc/ghc/html\" \\\n --set \"NIX_GHC_LIB
DIR\" \"$out/lib/ghc-8.0.2\" \\\n \n fi\ndone\n\nfor prg
in runghc runhaskell; do\n if [[ -x \"/nix/store/s0hpng652hsn40jy4kjdh1x0jm86dx
9l-ghc-8.0.2/bin/$prg\" ]]; then\n rm -f $out/bin/$prg\n makeWrapper /nix/
store/s0hpng652hsn40jy4kjdh1x0jm86dx9l-ghc-8.0.2/bin/$prg $out/bin/$prg
\\\n --add-flags \"-f $out/bin/ghc\"
\\\n --set \"NIX_GHC\" \"$out/bin/ghc\" \\\n --set \"
NIX_GHCPKG\" \"$out/bin/ghc-pkg\" \\\n --set \"NIX_GHC_DOCDIR\" \"$out/
share/doc/ghc/html\" \\\n --set \"NIX_GHC_LIBDIR\" \"$out/
lib/ghc-8.0.2\"\n fi\ndone\n\nfor prg in ghc-pkg ghc-pkg-8.0.2; do\n if [[ -x
\"/nix/store/s0hpng652hsn40jy4kjdh1x0jm86dx9l-ghc-8.0.2/bin/$prg\" ]]; then\n
rm -f $out/bin/$prg\n makeWrapper /nix/store/s0hpng652hsn40jy4kjdh1x0jm86dx9
l-ghc-8.0.2/bin/$prg $out/bin/$prg --add-flags \"--global-package-db=$out/lib/gh
c-8.0.2/package.conf.d\"\n fi\ndone\n$out/bin/ghc-pkg recache\n\n$out/bin/ghc-p
kg check\n\n"),("buildInputs",""),("builder","/nix/store/fi3mbd2ml4pbgzyasrlnp0w
yy6qi48fh-bash-4.4-p5/bin/bash"),("extraOutputsToInstall","out doc"),("ignoreCol
ing-compat-0.3.6 /nix/store/v2qsqznrik64f46msahvgg7dmaiag18k-hnix-0.3.4 /nix/sto
re/vbkqj8zdckqqiyjh08ykx75fwc90gwg4-optparse-applicative- /nix/store/6m7
qia8q0rkdkzvmiak38kdscf27malf-optparse-generic-1.1.5 /nix/store/r687llig7vn9x15h
hkmfak01ff7082n6-utf8-string- /nix/store/j6gvad67dav8fl3vdbqmar84kgmh5gar
-reducers-3.12.1 /nix/store/i8wf08764lknc0f9ja12miqvg509jn1k-fingertree-
/nix/store/301hq4fabrpbi3l47n908gvakkzq1s88-blaze-markup- /nix/store/055m
hi44s20x5xgxdjr82vmhnyv79pzl-blaze-html- /nix/store/vnc1yyig90skcwx3l1xrb
p1jqwmmb9xv-trifecta- /nix/store/vraffi24marw5sks8b78xrim6c8i1ng6-double-
conversion- /nix/store/kwdk03p0lyk5lyll1fp7a6z20j17b3sx-text-format-0.3.1
.1 /nix/store/zn5hlw3y94sbli4ssygr2w04mpb396zs-system-filepath- /nix/sto
re/jn7lbnk0gsirj8kb02an31v8idy7ym3c-system-fileio- /nix/store/9frfci9ywf
9lc216ci9nwc1yy0qwrn1b-integer-logarithms-1.0.1 /nix/store/rps46jwa7yyab629p27la
r094gk8dal2-scientific- /nix/store/c4a3ynvnv3kdxgd7ngmnjhka4mvfk8ll-atto
parsec- /nix/store/kc34l1gpzh65y4gclmv4dgv6agpmagdi-parsers-0.12.4 /nix/
store/1kf78yxf3lliagb5rc5din24iq40g96y-base-prelude- /nix/store/hi868d12p
kzcbzyvp7a7cigc58mp2lmg-neat-interpolation- /nix/store/h00jrbdvzj4yfy796j
8vq00lkd1gxr6w-primitive- /nix/store/vys8qsf317rn8qwy00p80zlywb47lqwz-vec
tor- /nix/store/wchch11312m3lxkwl8rad04x02svcs3i-reflection-2.1.2 /nix/s
tore/jj1kfv52mjxp54flz8v5ba64va3hvy22-parallel- /nix/store/jwj23y7vfvs14j
drkw1py9q7lm9fyhy4-adjunctions-4.3 /nix/store/px4979la9b98knwv36551zg3p5jb69lw-k
an-extensions-5.0.1 /nix/store/2cp1ar0f73jrcn231ai07zpwayy735j2-semigroupoids-5.
1 /nix/store/3nkxw5wdadckz28laijrvwdkkfqp07sb-profunctors-5.2 /nix/store/bd3njvy
0ahcsqw47vaz5zayhx34hari7-prelude-extras- /nix/store/zdp7zqasz1l1wifpngbg
6ngq189gbbqh-free-4.12.4 /nix/store/n7c5ynfqc6j570bbyaajqx34c3pvfvph-tagged-0.8.
5 /nix/store/xdkhd7mkqj2mmcami8ycmf7j0valwp5h-distributive-0.5.2 /nix/store/9dxb
a4g9x0xjj21r3vchqnh4rdwbc31b-void-0.7.2 /nix/store/dahah2ivrn4hc5gjygnlvxlad2399
zqh-StateVar- /nix/store/f2rdi1bx46fs165n1j316k5w90ab6lwy-contravariant-1
.4 /nix/store/mgg9rsvhvn4dd4qzv559nn24iqvspjnb-comonad-5.0.1 /nix/store/18n8i570
pf4gpszdyc0bki9qxm1p9xd7-bifunctors-5.4.2 /nix/store/d8ys5wq4wrvdjqw0bzv3y23zqpr
khjs2-base-orphans-0.5.4 /nix/store/j4hbyhnj4a2z4z4vb1437vk7ha0b287a-lens-4.15.1
/nix/store/ra3jh12mbyz82n4gvj2bam77vl8aabbq-x509-system-1.6.4 /nix/store/ps8915
q1047frp891jg1anp85ads0s9b-x509-validation-1.6.5 /nix/store/5vrgrls6l1cdsbbznis3
9chx8scq2r98-x509-store-1.6.2 /nix/store/7vvg8y8fp0s50qiciq11irfvh31f1q58-pem-0.
2.2 /nix/store/myv75wk9s19f8vms2dcy6sl773288zy4-asn1-parse-0.9.4 /nix/store/kwyc
1jdz09lazw21qpc96wyamxalcg11-x509-1.6.5 /nix/store/gadc7c6d1lqn0wqk29bhn56is67x0
r45-cryptonite-0.21 /nix/store/ix26y5rpidwpgjzrsixz0ff59j1p1swr-foundation-0.0.7
/nix/store/n784p4qh18zx9v8ag3n3ypszq1kifjjr-memory-0.14.3 /nix/store/h3qq6m5ahd
b4kw784gcvx2skil8ilks8-hourglass-0.2.10 /nix/store/dn65dl65spk4j0sky2zpdig75c42y
cj1-asn1-types-0.3.2 /nix/store/s5jklkk0y6i7d8h3akgsciv1kv2js786-asn1-encoding-0
.9.5 /nix/store/g5qjgns5cyz9c5xw4w5s2iji1kbhg47z-tls-1.3.10 /nix/store/iyllk46by
75f428pwis9v74jpr1rmk4x-cereal- /nix/store/b22wyyl3wdl6kb7gkpk3yxnynk340l
ya-socks-0.5.5 /nix/store/05r3i8w2n7hbxqyb4w8rina9rldyacd3-byteable-0.1.1 /nix/s
tore/xjbl6w60czyfqlfwwfs5q93by144yr1n-connection-0.2.8 /nix/store/j10yqzk323rvnw
gsk3nj7rgmvqlv035a-http-client-tls- /nix/store/vf84v2398g55mai2gjh2d9gipw
izhhzd-zlib- /nix/store/7h7vy3mi603y536dgvxwfglaacxw5ra8-async- /n
ix/store/y6hh2ifv35afw1j5phpzp1y72x532izn-streaming-commons-0.1.17 /nix/store/f5
jdarp8djisa1wrv4bv1saimrabcb3f-random-1.1 /nix/store/18vpnmd28bnjib6andw8bx522wc
b3zwa-parsec-3.1.11 /nix/store/i3ra66pcpj0v9wq3m00gh9i72br2bki3-network-uri-2.6.
1.0 /nix/store/2ck9avbwacfpi16p2ib2shw951mx33pz-network- /nix/store/rz022
7nv8n8kdrxjg3arya6r2ixxjh4h-mime-types- /nix/store/rx71j4kg0l02dginiswnmw
swdq9i9msv-http-types-0.9.1 /nix/store/y2ca4scn0n2f9qsmvsiixcnx11793jlf-transfor
mers-compat- /nix/store/bzicr83ibzzzbab6cjkb3i95sc8cvxy9-stm- /nix
/store/qk5pl6r2h0vfkhhwjgrv8x1ldf8dyj5a-mtl-2.2.1 /nix/store/0d6k71ljl108dgq1l7l
3pz12bfwv0z4h-exceptions-0.8.3 /nix/store/z5k23ymwjhhpd670a7mcsm1869hlpncf-old-l
ocale- /nix/store/k4an783d4j3m48fqhx7gpnizqg2ns38j-data-default-class-0.1
.2.0 /nix/store/p5867jsig02zi0ynww9w4916nm0k527s-cookie- /nix/store/wy7j4
2kqlw1sskagmyc1bzb0xv04s2na-case-insensitive- /nix/store/j35339b0nk7k3qaq
3m75nl3i4x603rqf-blaze-builder- /nix/store/33mip0ql9x1jjbhi34kf8izh4ilyf2
k0-base64-bytestring- /nix/store/29a73kd2jkwvfdcrhysmi5xjr7nysrxf-http-cl
ient- /nix/store/d2hy666g79qvhmbh520x5jclwvnr1gk2-text- /nix/store
/2bdzia66lg08d5zngmllcjry2c08m96j-hashable- /nix/store/7kdgc6c0b21s9j5qgg
0s0gxj7iid2wk5-unordered-containers- /nix/store/zsryzwadshszfnkm740b2412v
88iqgi4-semigroups-0.18.2 /nix/store/h2c0kz3m83x6fkl2jzkmin8xvkmfgs7s-charset-0.
3.7.1 /nix/store/gapj6j0ya5bi9q9dxspda15k50gx8f1v-ansi-terminal- /nix/sto
re/l46769n2p6rlh936zrbwznq3zxxa6mjd-ansi-wl-pprint- /nix/store/p7zmpgz0sq
5pamgrf1xvhvidc3m4cfmk-dhall-1.3.0 /nix/store/938ndd0mqfm148367lwhl6pk5smv5bm0-d
ata-fix-0.0.4 /nix/store/s0hpng652hsn40jy4kjdh1x0jm86dx9l-ghc-8.0.2"),("preferLo

by Gabriel Gonzalez at June 12, 2017 02:21 AM

June 11, 2017

Neil Mitchell

Haskell Website Working Group - Update

Summary: We have agreed a set of principles for the website and are collecting information.

I'm writing this partly in my capacity as the chair of the Haskell Website Working Group, and partly as an individual (so blame me rather than the committee). It's fair to say that the original goal of the committee was to make sure everyone agrees on the download page. Discussions amongst the committee led to a shared goal that the download page itself should clearly direct users along a precise path, without requiring beginners to make decisions requiring judgement. That probably means the download page should only describe one installation path, pushing alternatives onto a separate page.

To decide what should go on the download page, our first step was to evaluate what was currently available, and what experience a beginner might have. We've started that on this page. As an example, it says how to install Haskell, how to open ghci, how to install a tool, etc., all using the different options.

When I actually tried installing and using the various options listed on the current download page, they all had confusing steps, unintuitive behaviour and problems that I had to overcome. As an example, Stack on Windows recommends using the 32bit version, while noting that only the 64bit version works. At the same time, Core Platform starts by telling me to edit a global Cabal config file.

I invite everyone to help contribute to that page, via pull requests. At the same time, it would be great if the issues raised could be ironed out, leading to a smooth beginner experience (I'm talking to maintainers in person and raising GitHub tickets). Once the information is collected, and ideally the tools have improved, it will be time to make a decision between the options. When the decision comes, it will be technically motivated, and hopefully unambiguous.

by Neil Mitchell at June 11, 2017 05:17 AM

June 07, 2017

Mark Jason Dominus

Annual self-evaluation time, woo-hoo!

It's annual performance evaluation time at my employer, ZipRecruiter, and as part of that I have to write a self-evaluation. I know many people dread these, and I used to dread them, but these days I like doing it. Instead of being a torture or even a chore, for the last couple of years I have come out of it feeling much better about my job and my performance than I went in.

I think that is about 20% because my company does it in a good way, 30% because it suits my personality, and 50% because I have learned how to handle it well. The first half of that might not help you much, but if you're an evaluation loather, you might be able to transfer some of the second half and make it a little less horrible for yourself.

How ZipRecruiter does self-evaluations

I will get this out of the way because it's quick. ZipRecruiter does these evaluations in a way that works well for me. They do not pester me with a million questions. They ask only four, which are roughly:

  1. What were your main accomplishments this year?
  2. Describe areas you feel require improvement.
  3. What do you hope to accomplish in the coming year?
  4. How can your manager help you?

I very much appreciate this minimalist approach. It gets right to the point, covers all the important issues and nothing more. None of these questions feels to me like a meaningless bureaucratism or a waste of time.

Answering the questions thoroughly takes (only) two or three hours, but would take less if I didn't write such detailed answers; I'm sure I could write an acceptable report in an hour. I can see going in that it will be a finite process.

Why this suits my personality well

If you have followed this blog for a while, you may have noticed that I like writing essays, particularly essays about things I have been working on or thinking about. ZipRecruiter's self-evaluation process invites me to write a blog post about my year's work. This is not everyone's cup of tea, but it is right up my alley. Tea alley. Hee hee.

My brain has problems

My big problems with writing a self-evaluation are first, that I have a very poor memory, and second, that I think of myself as a lazy slacker who spends a lot of time goofing off and who accomplishes very little. These combine badly at evaluation time.

In the past, I would try to remember what work I did in the previous year so I could write it up. My memory is poor, so I wouldn't remember most of what I had done, and then it was easy to come to the conclusion that I had not done very much, probably because I was a lazy slacker who spent a lot of time goofing off. I would go through several iterations of this, until, tormented by guilt and self-hatred, I would write that into the self-evaluation. This is not a practice I would recommend.

If there were two projects, A and B, and I promptly finished A but B dragged on and was running late, which one would I be more likely to remember when the time came to write the self-evaluation report? B, of course. It was still on my mind because I spent so long thinking about it and because it was still in progress. But I had forgotten about A immediately after putting it to rest. Since I could remember only the unfinished projects, I would conclude that I was a lazy slacker who never finished anything, and write that into the self-evaluation. This is also a practice I recommend you avoid.

The ticketing system is my bionic brain

The way I have been able to escape this horrible trap is by tracking every piece of work I do, every piece, as a ticket in our ticketing system. People often come to me and ask me to do stuff for them, and I either write up a ticket or I say “sure, write me a ticket”. If they ask why I insist on the ticket (they usually don't), I say it's because when self-evaluation time comes around I want to be able to take credit for working on their problem. Everyone seems to find this reasonable.

Then, when it's time to write the self-evaluation, the first thing I do is visit the ticket website, select all my tickets from the past year, and sort them into chronological order. I look over the list of ticket titles and make a list of stuff that might be worth mentioning on the evaluation. I will have forgotten about three-fourths of it. If I didn't have the list in the ticketing system, I would only remember the most recent one-fourth and conclude that I spent three-fourths of my time goofing off because I am a lazy slacker. Instead, there is this long list of the year's accomplishments, too many to actually include in the report.

Well, this is not rocket science. One is called upon to describe the year's accomplishments. Even people with better memory than me surely cannot remember all this without keeping records, can they? Anyway I surely cannot, so I must keep records and then consult them when the time comes. Put that way, it seems obvious. Why did it take so long to figure out? But there are a lot of happy little details that were not so obvious.

  • Instead of thinking “Why didn't I finish big project X? I must have been goofing off. What a lazy slacker I am” I think “holy cow, I resolved 67 tickets related to big project X! That is great progress! No wonder I got hardly anything else done last fall” and also “holy cow, X has 78 resolved tickets and 23 still open. It is huge! No wonder it is not finished yet.”

    Writing “I completed 67 tickets related to X” is a lot more concrete than “I worked hard on X”. If you are neurotic in the way I am, and concerned that you might be a lazy slacker, it feels much more persuasive. I have an idea that it sounds better to my boss also, particularly if he were to be called upon to discuss it with his manager. (“Under my leadership, Mark completed 67 tickets related to X!”) Andy Lester says that your job is to make your boss happy, and that means making it easy for him to do his job, which is to make his boss happy. So this is no small thing.

  • Instead of thinking “Gee, the CTO declared top priority initiative Y, and while everyone else was working hard on it I mostly ignored it because I am a lazy slacker” I might see that I have tagged 9 tickets “top priority initiative Y”. Then on the report, I proudly boast “I completed 9 tickets in support of the CTO's mandate, including (most important one) and (most impressive one).” This also comes under the heading of “make it easy for your boss to do his job”.

  • Instead of completely forgetting that I did project Z, I see the tickets and can put it in my report.

  • Instead of remembering awful project W, which dragged on for months, and thinking what a lazy slacker I was because I couldn't get it done, I have a progress record in the ticket and the details might suggest a different interpretation: Project W sucked, but I nevertheless pursued it doggedly to completion, even though it took months.

  • I might remember that I once helped Jones, but what did I help him with? Did I really spend much time on him? Without looking at the ticket list, I might not realize that I helped Jones every few weeks all year long. This sort of pattern is often apparent only in the retrospective summary. With the ticket system, instead of “oh, Jones sometimes asks me questions, I guess” I can see that supporting Jones was an ongoing thing and he kept coming back. This goes into the report: “I provided ongoing support to Jones, including (some cherry-picked example that makes me look especially good).”

  • One question (#2) on the report form is “Describe areas you feel require improvement”. If I wrote in last year's report that I would like to improve at doing X, I can look in the ticket system for specific evidence that I might have improved, even if I wasn't consciously trying to improve X at the time. Probably there is something somewhere that can at least be spun as an attempt to improve at X. And if it didn't actually improve X, I can still ask myself why it didn't and what might work instead, and put that in the report as something to try next time, which is question #3.

    Hey, look at that, I am evaluating my prior performance and making informed corrections. That might be a useful practice. Wouldn't it be great if I took time every so often to do that? Some sort of periodic self-evaluation perhaps?

  • Another question (#3) is “What would you like to do in the coming year?” If last year's report said “I would like to do more of X”, I can look for evidence that I did do that, and then write it into this year's report: “Last year I said I would try to do more of X, and I did.”

  • Even if I were having a bad year and got very little done—and this has happened—having a list of the stuff I did get done leaves me in a much better position to write the report than not having such a list.

None of this good stuff would be possible without an actual record of what I did. If there weren't a ticketing system, I would have to invent one or maybe tattoo it on my chest like the guy in Memento. Even aside from its use in writing annual self-evaluations, keeping a work diary is crucial for me, because without it I can't remember day-to-day what I am working on and what needs to happen next. And even for people with better memory than me, are they really going to remember all 317 things they did for work this year, or that 67 of them pertained to big project X? If they can that's great but I doubt it.

Keeping a work record is part of my job

I think it is important to maintain the correct attitude to this. It would be easy to imagine ticket management as unproductive time that I wasted instead of accomplishing something useful. This is wrong. The correct attitude is to consider ticket updates to be part of my work product: I produce code. I produce bug fixes. I produce documentation, reports, and support interactions. And I also produce ticket updates. This is part of my job and while I am doing it I am not goofing off, I am not procrastinating, I am doing my job and earning my salary. If I spent the whole day doing nothing but updating tickets, that would be a day well-spent.

Compare “I produce ticket updates” with “I produce unit tests”. The attitude for ticket updates is the same as for testing. When something happens in a project, I update the ticket, because keeping the tickets updated is part of the project, just like writing tests is. A day spent doing nothing but writing tests is a day well-spent. An organization that fails to support ticket updates is broken in the same way as one that fails to support test development.

My boss gets email every time I update a ticket. I don't know if he reads these, but he has the option to, and I don't need to worry as much that maybe he thinks I am a lazy slacker who is goofing off, because he is getting a stream of automatic notifications about what I am up to. I'm not my boss but if I were I would appreciate this very much.

Maybe some of this can help you?

There might be some use in this even for people who aren't already in the habit of writing self-absorbed blog posts.

If doing the annual self-evaluation makes you suffer, maybe it would help to practice writing some blog posts. You don't have to publish them or show anyone. Next time you finish a project, set aside thirty or sixty minutes to try to write up a little project report: What worked, what didn't, what are you pleased about, what was surprising, what was fun, what was annoying? I'm not certain this will help but it seems like this is a skill that might get easier with practice, and then when you have to write your annual self-evaluation it might be easier because you have more practice doing it. Also, you would have a little file of material to draw on and would not have to start from zero.

If your employer's process requires you to fill in some giant questionnaire, it might be easier to do if you go into it with answers to the four basic questions prepared ahead of time. (I imagine that it's even possible that if you address the four big questions and ignore everything on the giant questionnaire that doesn't pertain to them, everyone will be perfectly satisfied and nobody will notice the omissions.)

And keep a work diary! Tattoo it on your chest if you have to. If it seems daunting, realize that you don't have to do it all at once. Keeping a work diary of some sort is forgiving in the same way as writing unit tests:

  • It's not all-or-nothing, you don't have to test every piece of code to get any benefit from testing. If you write tests for 1% of the code, you get about 1% of the benefit, and you can ramp up.

  • If you break your test-writing streak you don't have to start over from zero. If you didn't write any tests for the code you wrote last week, that's a shame, but it doesn't affect the benefit you can get from writing a unit test for whatever you're working on today.

The work diary is similar. When time comes to write your evaluation, a small and incomplete record is better than no record at all. If you forget to write in the diary for a month, that's a shame, but it doesn't affect the benefit you can get from writing down today what you did today.

Our ticketing system

This isn't important, but I expect someone will want to know: At work we use FogBugz. Some of my co-workers dislike it but I have no major complaints. If you want to try it on your own, they have a seven-day free trial offer, after which you can sign up for a permanent free account that supports up to two users. I am experimenting with using a free tier account to manage my personal to-do list.

Coming soon

I wrote another 2,000 words about my specific practices for managing tickets. I hope it'll be along in a few days.

by Mark Dominus ( at June 07, 2017 07:54 PM

Brent Yorgey

The Typeclassopedia is now up-to-date

The title pretty much says it all: I have finally finished (I hope) updating the Typeclassopedia to reflect various recent changes to the language and standard libraries (such as the AMP and BBP/FTP). Along the way I also added more links to related reading as well as more exercises.

How you can help

I am always on the lookout for more exercises to add and for more links to interesting further reading. If you know of a cool exercise or a cool paper or blog post that helps explain/illustrate/apply a standard Haskell type class, please let me know (or just add it yourself, it’s a wiki!). And, of course, the same goes if you notice any errors or confusing bits.

Happy Haskelling!

by Brent at June 07, 2017 07:47 PM

Robert Harper

What, if anything, is a programming paradigm?

Just out, an essay on the Cambridge University Press author’s blog about “programming paradigms”, and why I did not structure Practical Foundations for Programming Languages around them.



Filed under: Programming, Teaching Tagged: programming languages

by Robert Harper at June 07, 2017 05:12 PM

Douglas M. Auclair (geophf)

May 2017 1HaskellADay problems and solutions

by geophf ( at June 07, 2017 12:41 PM

Yesod Web Framework

Updated Yesod Scaffolding

A few days ago I released an update to the Yesod scaffolding. It's nothing major, but it has some new niceness I thought people would be interested in:

  1. I've (finally) moved the Haskell source files into a src directory. I rejected some moves in the past. But since then, this style has become the dominant style in the Haskell world, and it makes sense to embrace it.
  2. Instead of putting language extensions in the default-extensions field of the cabal file, they are now in LANGUAGE pragmas in each source file. This was not an obvious decision to make, and there are still people (myself included) who are conflicted on it. You can see some of the discussion of this on Twitter.
  3. We've moved from a cabal file to an hpack package.yaml file. I only started using hpack a few months back, but it's completely won me over already. For those not familiar, check out the hpack repo. Note that hpack generates a cabal file, so there is full compatibility with cabal-the-build-system. We just get some niceties, like leaving off exposed-modules.
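For those who haven't used hpack, here is a rough sketch of what a minimal package.yaml might look like (field names are from hpack's documentation; the actual scaffolding's file is considerably more involved):

```yaml
# Minimal illustrative package.yaml. hpack generates the .cabal file
# from this, inferring exposed-modules from the files in source-dirs.
name:    mysite
version: 0.1.0.0

dependencies:
  - base

library:
  source-dirs: src
```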

Next time you create a scaffolded Yesod project (by running, e.g. stack new mysite yesod-postgres), you'll automatically get this updated scaffolding.

June 07, 2017 05:15 AM

Dan Piponi (sigfpe)

A relaxation technique


Sometimes you want to differentiate the expected value of something. I've written about some tools that can help with this. For example you can use Automatic Differentiation for the derivative part and probability monads for the expectation. But the probability monad I described in that article computes the complete probability distribution for your problem. Frequently this is intractably large. Instead people often use Monte Carlo methods. They'll compute the "something" many times, substituting pseudo-random numbers for the random variables, and then average the results. This provides an estimate of the expected value and is ubiquitous in many branches of computer science. For example it's the basis of ray-tracing and path-tracing algorithms in 3D rendering, and plays a major role in machine learning when used in the form of stochastic gradient descent.

But there's a catch. Suppose we want to compute the expectation of an expression in random variables X1, ..., Xk, where each of the Xi belongs to the Bernoulli distribution B(p). I.e. each Xi has a probability p of being 1 and probability 1-p of being 0. If we compute this using a Monte Carlo approach we'll repeatedly generate pseudo-random numbers for each of the Xi. Each one will be 0 or 1. This means that our estimate depends on p via subexpressions that can't meaningfully be differentiated with respect to p. So how can we use automatic differentiation with the Monte Carlo method? I'm proposing an approach that may or may not already be in the literature. Whether it is or not, I think it's fun to get there by combining many of the things I've previously talked about here, such as free monads, negative probabilities and automatic differentiation. I'm going to assume you're familiar with using dual numbers to compute derivatives as I've written about this before and wikipedia has the basics.

A probability monad

I want to play with a number of different approaches to using monads with probability theory. Rather than define lots of monads I think that the easiest thing is to simply work with one free monad and then provide different interpreters for it.

First some imports:

> import Control.Monad
> import qualified System.Random as R
> import qualified Data.Map.Strict as M

I'm going to use a minimal free monad that effectively gives us a DSL with a new function that allows us to talk about random Bernoulli variables:

> data Random p a = Pure a | Bernoulli p (Int -> Random p a)

The idea is that Pure a represents the value a and Bernoulli p f is used to say "if we had a random value x, f x is the value we're interested in". The Random type isn't going to do anything other than represent these kinds of expressions. There's no implication that we actually have a random value for x yet.

> instance Functor (Random p) where
>     fmap f (Pure a) = Pure (f a)
>     fmap f (Bernoulli p g) = Bernoulli p (fmap f . g)

> instance Applicative (Random p) where
>     pure = return
>     (<*>) = ap

> instance Monad (Random p) where
>     return = Pure
>     Pure a >>= f = f a
>     Bernoulli p g >>= f = Bernoulli p (\x -> g x >>= f)

We'll use bernoulli p to represent a random Bernoulli variable drawn from B(p).

> bernoulli :: p -> Random p Int
> bernoulli p = Bernoulli p return

So let's write our first random expression:

> test1 :: Random Float Float
> test1 = do
>     xs <- replicateM 4 (bernoulli 0.75)
>     return $ fromIntegral $ sum xs

It sums 4 Bernoulli random variables from B(0.75) and converts the result to a Float. The expected value is 3.

We don't yet have a way to do anything with this expression. So let's write an interpreter that can substitute pseudo-random values for each occurrence of bernoulli p:

It's essentially interpreting our free monad as a state monad where the state is the random number seed:

> interpret1 :: (Ord p, R.Random p, R.RandomGen g) => Random p a -> g -> (a, g)
> interpret1 (Pure a) seed = (a, seed)
> interpret1 (Bernoulli prob f) seed =
>     let (r, seed') = R.random seed
>         b = if r <= prob then 1 else 0
>     in interpret1 (f b) seed'

You can use the expression R.getStdRandom (interpret1 test1) if you want to generate some random samples for yourself.

We're interested in the expected value, so here's a function to compute that:

> expect1 :: (Fractional p, Ord p, R.Random p, R.RandomGen g) => Random p p -> Int -> g -> (p, g)
> expect1 r n g =
>     let (x, g') = sum1 0 r n g
>     in (x/fromIntegral n, g')

> sum1 :: (Ord p, Num p, R.Random p, R.RandomGen g) => p -> Random p p -> Int -> g -> (p, g)
> sum1 t r 0 g = (t, g)
> sum1 t r n g =
>     let (a, g') = interpret1 r g
>     in sum1 (t+a) r (n-1) g'

You can test it out with R.getStdRandom (expect1 test1 1000). You should get values around 3.

We can try completely different semantics for Random. This time we compute the entire probability distribution:

> interpret2 :: (Num p) => Random p a -> [(a, p)]
> interpret2 (Pure a) = [(a, 1)]
> interpret2 (Bernoulli p f) =
>     scale p (interpret2 (f 1)) ++ scale (1-p) (interpret2 (f 0))

> scale :: Num p => p -> [(a, p)] -> [(a, p)]
> scale s = map (\(a, p) -> (a, s*p))

You can try it with interpret2 test1.

Unfortunately, as it stands it doesn't collect together multiple occurrences of the same value. We can do that with this function:

> collect :: (Ord a, Num b) => [(a, b)] -> [(a, b)]
> collect = M.toList . M.fromListWith (+)

And now you can use collect (interpret2 test1).

Let's compute some expected values:

> expect2 :: (Num p) => Random p p -> p
> expect2 r = sum $ map (uncurry (*)) (interpret2 r)

The value of expect2 test1 should be exactly 3. One nice thing about interpret2 is that it is differentiable with respect to the Bernoulli parameter when this is meaningful. Unfortunately it has one very big catch: the value of interpret2 can be a very long list. Even a small simulation can result in lists too big to store in the known universe. But interpret1 doesn't produce differentiable results. Is there something in-between these two interpreters?

Importance sampling

Frequently in Monte Carlo sampling it isn't convenient to sample from the distribution you want. For example it might be intractably hard to do so, or you might have proven that the resulting estimate has a high variance. So instead you can sample from a different, but possibly related distribution. This is known as importance sampling. Whenever you do this you must keep track of how "wrong" your probability was and patch up your expectation estimate at the end. For example, suppose a coin comes up heads 3/4 of the time. Instead of simulating a coin toss that comes up 3/4 of the time you could simulate one that comes up heads half of the time. Suppose at one point in the simulation it does come up heads. Then you used a probability of 1/2 when you should have used 3/4. So when you compute the expectation you need to scale the contribution from this sample by (3/4)/(1/2) = 3/2. You need to scale appropriately for every random variable used. A straightforward way to see this for the case of a single Bernoulli variable is to note that

p·f(1) + (1-p)·f(0) = q·(p/q)·f(1) + (1-q)·((1-p)/(1-q))·f(0).

We've replaced probabilities p and 1-p with q and 1-q, but we had to scale appropriately in each of the cases x = 1 and x = 0 to keep the final value the same. I'm going to call the scale value the importance. If we generate several random numbers in a row we need to multiply all of the importance values that we generate. This is a perfect job for the Writer monad using the Product monoid. (See Eric Kidd's paper for some discussion about the connection between Writer and importance sampling.) However I'm just going to write an explicit interpreter for our free monad to make it clear what's going where.
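As a quick sanity check of that identity, here is a small standalone sketch (the payoff function f and the numbers p = 3/4, q = 1/2 are made up for illustration): sampling with the "wrong" probability q but weighting each outcome by the ratio of the true probability to the sampling probability leaves the expectation unchanged.

```haskell
-- Exact check of the importance-sampling identity for one Bernoulli
-- variable: E[f] computed with the true probability p equals the
-- q-sampled sum with outcomes reweighted by p/q and (1-p)/(1-q).
p, q :: Double
p = 0.75   -- true probability of heads
q = 0.5    -- probability we actually sample with

f :: Int -> Double
f 1 = 2.0  -- arbitrary payoff for heads
f _ = 1.0  -- arbitrary payoff for tails

direct, importance :: Double
direct     = p * f 1 + (1 - p) * f 0
importance = q * (p / q) * f 1 + (1 - q) * ((1 - p) / (1 - q)) * f 0

main :: IO ()
main = print (direct, importance)  -- both 1.75
```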

This interpreter is going to take an additional argument as input. It'll be a rule saying what probability we should sample with when handling a variable drawn from B(p). The probability should be a real number in the interval [0,1].

> interpret3 :: (Fractional p, R.RandomGen g) =>
>     (p -> Float) -> Random p a -> g -> ((a, p), g)
> interpret3 rule (Pure a) g = ((a, 1), g)
> interpret3 rule (Bernoulli p f) g =
>     let (r, g') = R.random g
>         prob = rule p
>         (b, i) = if (r :: Float) <= prob
>                      then (1, p/realToFrac prob)
>                      else (0, (1-p)/realToFrac (1-prob))
>         ((a, i'), g'') = interpret3 rule (f b) g'
>     in ((a, i*i'), g'')

Here's the accompanying code for the expectation:

> expect3 :: (Fractional p, R.RandomGen g) =>
>     (p -> Float) -> Random p p -> Int -> g -> (p, g)
> expect3 rule r n g =
>     let (x, g') = sum3 rule 0 r n g
>     in (x/fromIntegral n, g')

> sum3 :: (Fractional p, R.RandomGen g) =>
>     (p -> Float) -> p -> Random p p -> Int -> g -> (p, g)
> sum3 rule t r 0 g = (t, g)
> sum3 rule t r n g =
>     let ((a, imp), g') = interpret3 rule r g
>     in sum3 rule (t+a*imp) r (n-1) g'

For example, you can estimate the expectation of test1 using unbiased coin tosses by evaluating R.getStdRandom (expect3 (const 0.5) test1 1000).

Generalising probability

Did you notice I made my code slightly more general than seems to be needed? Although I use probabilities of type Float to generate my Bernoulli samples, the argument to the function bernoulli can be of a more general type. This means that we can use importance sampling to compute expected values for generalised measures that take values in a more general algebraic structure than the interval [0,1]. For example, we could use negative probabilities. An Operational Interpretation of Negative Probabilities and No-Signalling Models by Abramsky and Brandenburger gives a way to interpret expressions involving negative probabilities. We can implement it using interpret3 and the rule \p -> abs p/(abs p+abs (1-p)). Note that it is guaranteed to produce values in the range [0,1] (if you start with dual numbers with real parts that are ordinary probabilities) and reproduces the usual behaviour when given ordinary probabilities.

Here's a simple expression using a sample from "B(2)":

> test2 = do
>     a <- bernoulli 2
>     return $ if a==1 then 2.0 else 1.0

Its expected value is 3. We can get this exactly using expect2 test2. For a Monte Carlo estimate use

R.getStdRandom (expect3 (\p -> abs p/(abs p+abs (1-p))) test2 1000)

Note that estimates involving negative probabilities can have quite high variances so try a few times until you get something close to 3 :-)

We don't have to stick with real numbers. We can use this approach to estimate with complex probabilities (aka quantum mechanics) or other algebraic structures.

Discrete yet differentiable

And now comes the trick: automatic differentiation uses the algebra of dual numbers. It's not obvious at all what a probability like p + ε means when ε is infinitesimal. However, we can use interpret3 to give it meaningful semantics.

Let's define the duals in the usual way first:

> data Dual a = D { real :: a, infinitesimal :: a }

> instance (Ord a, Num a) => Num (Dual a) where
>     D a b + D a' b' = D (a+a') (b+b')
>     D a b * D a' b' = D (a*a') (a*b'+a'*b)
>     negate (D a b) = D (negate a) (negate b)
>     abs (D a b) = if a > 0 then D a b else D (-a) (-b)
>     signum (D a b) = D (signum a) 0
>     fromInteger a = D (fromInteger a) 0

> instance (Ord a, Fractional a) => Fractional (Dual a) where
>     fromRational a = D (fromRational a) 0
>     recip (D a b) = let ia = 1/a in D ia (-b*ia*ia)

> instance Show a => Show (Dual a) where
>     show (D a b) = show a ++ "[" ++ show b ++ "]"
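As a quick check that dual arithmetic like the above really computes derivatives: squaring x + ε gives x² + 2xε, so the infinitesimal part of (D 3 1)² should be 6. This standalone sketch re-declares Dual with a derived Show for brevity:

```haskell
-- Dual-number arithmetic computes derivatives: the infinitesimal part
-- of f (D x 1) is f'(x). Here f x = x * x, so f' 3 = 6.
data Dual a = D { real :: a, infinitesimal :: a } deriving Show

instance (Ord a, Num a) => Num (Dual a) where
    D a b + D a' b' = D (a + a') (b + b')
    D a b * D a' b' = D (a * a') (a * b' + a' * b)
    negate (D a b)  = D (negate a) (negate b)
    abs (D a b)     = if a > 0 then D a b else D (-a) (-b)
    signum (D a b)  = D (signum a) 0
    fromInteger n   = D (fromInteger n) 0

main :: IO ()
main = print (D 3 1 * D 3 1 :: Dual Int)  -- D {real = 9, infinitesimal = 6}
```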

Now we can use the rule real to extract a real-valued probability from a dual number. The function expect3 will push the infinitesimal part into the importance value so it doesn't get forgotten about. And now expect3 gives us an estimate that is differentiable despite the fact that our random variables are discrete.

Let's try an expression:

> test3 p = do
>     a <- bernoulli p
>     b <- bernoulli p
>     return $ if a == 1 && b == 1 then 1.0 else 0.0

The expected value is p^2 and the derivative is 2p. We can evaluate both at p = 0.5 with expect2 (test3 (D 0.5 1)). And we can estimate it with

R.getStdRandom (expect3 real (test3 (D 0.5 1)) 1000)

What's neat is that we can parameterise our distributions in a more complex way and we can freely mix with conventional expressions in our parameter. Here's an example:

> test4 p = do
>     a <- bernoulli p
>     b <- bernoulli (p*p)
>     return $ p*fromIntegral a*fromIntegral b

Try evaluating expect2 (test4 (D 0.5 1)) and

R.getStdRandom (expect3 real (test4 (D 0.5 1)) 1000)

I've collected the above examples together here:

> main = do
>     print =<< R.getStdRandom (interpret1 test1)
>     print $ collect $ interpret2 test1
>     print =<< R.getStdRandom (expect1 test1 1000)
>     print (expect2 test1)
>     print =<< R.getStdRandom (expect3 id test1 1000)
>     print =<< R.getStdRandom (expect3 (const 0.5) test1 1000)
>     print "---"
>     print $ expect2 test2
>     print =<< R.getStdRandom (expect3 (\p -> abs p/(abs p+abs (1-p))) test2 1000)
>     print "---"
>     print $ expect2 (test3 (D 0.5 1))
>     print =<< R.getStdRandom (expect3 real (test3 (D 0.5 1)) 1000)
>     print "---"
>     print $ expect2 (test4 (D 0.5 1))
>     print =<< R.getStdRandom (expect3 real (test4 (D 0.5 1)) 1000)

What just happened?

You can think of a dual number as a real number that has been infinitesimally slightly deformed. To differentiate something we need to deform something. But we can't deform 0 or 1 and have them stay 0 or 1. So the trick is to embed probability sampling in something "bigger", namely importance sampling, where samples carry around an importance value. This bigger thing does allow infinitesimal deformations. And that allows differentiation. This process of turning something discrete into something continuously "deformable" is generally called relaxation.

Implementation details

I've made no attempt to make my code fast. However I don't think there's anything about this approach that's incompatible with performance. There's no need to use a monad. Instead you can track the importance value through your code by hand and implement everything in C. Additionally, I've previously written about the fact that for any trick involving forward mode AD there is another corresponding trick you can use with reverse mode AD. So this method is perfectly compatible with back-propagation. Note also that the dual number importances always have real part 1 which means you don't actually need to store them.

The bad news is that the derivative estimate can sometimes have a high variance. Nonetheless, I've used it successfully for some toy optimisation problems. I don't know if this approach is effective for industrial strength problems. Your mileage may vary :-)


Sometimes you may find that it is acceptable to deform the samples from your discrete distribution. In that case you can use the concrete relaxation.

Continuous variables

The above method can be adapted to work with continuous variables. There is a non-trivial step which I'll leave as an exercise but I've tested it in some Python code. I think it reproduces a standard technique and it gives an alternative way to think about that trick. That article is also useful for ways to deal with the variance issues. Note also that importance sampling is normally used itself as a variance reduction technique. So there are probably helpful ways to modify the rule argument to interpret3 to simultaneously estimate derivatives and keep the variance low.

Personal note

I've thought about this problem a couple of times over the years. Each time I've ended up thinking "there's no easy way to extend AD to work with random variables so don't waste any more time thinking about it". So don't listen to anything I say. Also, I like that this method sort of comes "for free" once you combine methods I've described previously.


I think it was Eric Kidd's paper on building probability monads that first brought to my attention that there are many kinds of semantics you can use with probability theory - i.e. there are many interpreters you can write for the Random monad. I think there is an interesting design space worth exploring here.

Answer to exercise

I set the continuous case as an exercise above. Here is a solution.

Suppose you're sampling from a distribution parameterised by θ with pdf f(θ, x). To compute the derivative with respect to θ you need to consider sampling from f(θ + ε, x) where ε is an infinitesimal.

As we don't know how to sample from a pdf with infinitesimals in it, we instead sample using f(θ, x) as usual, but use an importance of

f(θ + ε, x)/f(θ, x) = 1 + ε·∂/∂θ log f(θ, x).

The coefficient of the ε gives the derivative. So we need to compute the expectation, scaling each sample with this coefficient. In other words, to estimate ∂/∂θ E[g(x)] we use the average of the quantities g(x_i)·∂/∂θ log f(θ, x_i), where the x_i are drawn from the original distribution. This is exactly what is described at Shakir Mohamed's blog.

Final word

I managed to find the method in the literature. It's part of the REINFORCE method. For example, see equation (5) there.

by Dan Piponi ( at June 07, 2017 03:32 AM

June 06, 2017

Philip Wadler

Monbiot: I’ve never voted with hope before. Jeremy Corbyn has changed that

Leave it to George Monbiot to make the most effective case for Labour.
On policy after policy, the Labour manifesto accords with what people say they want. It offers a strong and stable National Health Service, in which privatisation is reversed, clinical budgets rise and staff are properly paid. It promises more investment in schools, smaller class sizes, and an end to the stifling micromanagement driving teachers out of the profession. It will restore free education at universities. It will ensure that railways, water, energy and the postal service are owned for the benefit of everyone, rather than only the bosses and shareholders. It will smoke out tax avoidance, and bring the banks under control.
While Theresa May will use Brexit as a wrecking ball to be swung at workers’ rights, environmental laws and other regulations the Conservative party has long wanted to destroy, Labour has promised to enhance these public protections. It will ban zero-hours contracts, prevent companies from forcing their staff into bogus self-employment, and give all workers – whether temporary or permanent – equal rights. The unemployed will be treated with respect. Both carers and people with disabilities will be properly supported. Those who need homes will find them, and tenants will be protected from the new generation of rack-renting slumlords. Who, apart from the richest beneficiaries of the current regime, would not wish to live in such a nation?  ...
[May] won’t stand up to anyone who wields power. She will say nothing against Donald Trump, even when he peddles blatant falsehoods in the aftermath of terrorist attacks in this nation, exploiting our grief to support his disgusting prejudices; even when he pulls out of the global agreement on climate change.
She is even more sycophantic towards this revolting man than Tony Blair was to George W Bush. She won’t confront Saudi Arabia over terrorism or Yemen or anything else. ...
She won’t stand up to the polluters lavishly funding the Conservative party, whose role explains both her weakness on climate change and her miserable failure to address our air pollution crisis. She won’t stand up to the fanatics in her party who call for the hardest of possible Brexits. She won’t stand up on television to debate these policies because she knows that the more we see, the less we like. The party machine’s attempt to build a personality cult around her fell at an obvious hurdle: first, you need a personality.  ...
The election now hangs on whether the young people who claim they will vote Labour are prepared to act on this intention. We know that older Conservative voters will make good their promise: they always do. Will the young electors, who will lose most from another five years of unresponsive government, walk a couple of hundred metres to their polling stations? Or will they let this unprecedented chance to change the nation slip through their fingers? The world belongs to those who turn up.

by Philip Wadler ( at June 06, 2017 09:05 PM

Wolfgang Jeltsch

Generic programming in Haskell

Generic programming is a powerful way to define a function that works in an analogous way for a class of types. In this article, I describe the latest approach to generic programming that is implemented in GHC. This approach goes back to the paper A Generic Deriving Mechanism for Haskell by José Pedro Magalhães, Atze Dijkstra, Johan Jeuring, and Andres Löh.

This article is a writeup of a Theory Lunch talk I gave on 4 February 2016. As usual, the source of this article is a literate Haskell file, which you can download, load into GHCi, and play with.


Parametric polymorphism allows you to write functions that deal with values of any type. An example of such a function is the reverse function, whose type is [a] -> [a]. You can apply reverse to any list, no matter what types the elements have.

However, parametric polymorphism does not allow your functions to depend on the structure of the concrete types that are used in place of type variables. So values of these types are always treated as black boxes. For example, the reverse function only reorders the elements of the given list. A function of type [a] -> [a] could also drop elements (like the tail function does) or duplicate elements (like the cycle function does), but it could never invent new elements (except for ⊥) or analyze elements.

Now there are situations where a function is suitable for a class of types that share certain properties. For example, the sum function works for all types that have a notion of binary addition. Haskell uses type classes to support such functions. For example, the Num class provides the method (+), which is used in the definition of sum, whose type Num a => [a] -> a contains a respective class constraint.

The methods of a class have to be implemented separately for every type that is an instance of the class. This is reasonable for methods like (+), where the implementations for the different instances differ fundamentally. However, it is unfortunate for methods that are implemented in an analogous way for most of the class instances. An example of such a method is (==), since there is a canonical way of checking values of algebraic data types for equality. It works by first comparing the outermost data constructors of the two given values and if they match, the individual fields. Only when the data constructors and all the fields match are the two values considered equal.
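As a sketch of what that canonical equality looks like when written out by hand (the Shape type here is invented for illustration; deriving Eq would generate equivalent code):

```haskell
-- Hand-written canonical equality: compare constructors first, then
-- fields; values built from different constructors are never equal.
data Shape = Circle Double | Rect Double Double

instance Eq Shape where
    Circle r == Circle r'  = r == r'
    Rect w h == Rect w' h' = w == w' && h == h'
    _        == _          = False

main :: IO ()
main = print (Circle 1 == Circle 1, Circle 1 == Rect 1 1)  -- (True,False)
```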

For several standard classes, including Eq, Haskell provides the deriving mechanism to generate instances with default method implementations whose precise functionality depends on the structure of the type. Unfortunately, there is no possibility in standard Haskell to extend this deriving mechanism to user-defined classes. Generic programming is a way out of this problem.


For generic programming, we need several language extensions. The good thing is that only one of them, DeriveGeneric, is specific to generic programming. The other ones have uses in other areas as well. Furthermore, DeriveGeneric is a very small extension. So the generic programming approach we describe here can be considered very lightweight.

We state the full set of necessary extensions with the following pragma:

{-# LANGUAGE DefaultSignatures,
             DeriveGeneric,
             TypeOperators #-}

Apart from these language extensions, we need the module GHC.Generics:

import GHC.Generics

Our running example

As our running example, we pick serialization and deserialization of values. Serialization means converting a value into a bit string, and deserialization means parsing a bit string in order to get back a value.

We introduce a type Bit for representing bits:

data Bit = O | I deriving (Eq, Show)

Furthermore, we define the class of all types that support serialization and deserialization as follows:

class Serializable a where

    put :: a -> [Bit]

    get :: [Bit] -> (a, [Bit])

There is a canonical way of serializing values of algebraic data types. It works by first encoding the data constructor of the given value as a sequence of bits and then serializing the individual fields. To show this approach in action, we define an algebraic data type Tree, which is a type of labeled binary trees:

data Tree a = Leaf | Branch (Tree a) a (Tree a) deriving Show

An instantiation of Serializable for Tree that follows the canonical serialization approach can be carried out as follows:

instance Serializable a => Serializable (Tree a) where

    put Leaf                     = [O]
    put (Branch left root right) = [I]       ++
                                   put left  ++
                                   put root  ++
                                   put right

    get (O : bits) = (Leaf, bits)
    get (I : bits) = (Branch left root right, bits''') where

        (left,  bits')   = get bits
        (root,  bits'')  = get bits'
        (right, bits''') = get bits''

Of course, it quickly becomes cumbersome to provide such an instance declaration for every algebraic data type that should use the canonical serialization approach. So we want to implement the canonical approach once and for all and make it easily usable for arbitrary types that are amenable to it. Generic programming makes this possible.
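To see the instance in action, here is a standalone round trip. It repeats the definitions above, adds an illustrative Serializable instance for Bool as the element type, and derives Eq on Tree (the article derives only Show) so the result can be compared:

```haskell
data Bit = O | I deriving (Eq, Show)

class Serializable a where
    put :: a -> [Bit]
    get :: [Bit] -> (a, [Bit])

data Tree a = Leaf | Branch (Tree a) a (Tree a) deriving (Eq, Show)

instance Serializable a => Serializable (Tree a) where
    put Leaf                     = [O]
    put (Branch left root right) = [I] ++ put left ++ put root ++ put right
    get (O : bits) = (Leaf, bits)
    get (I : bits) = (Branch left root right, bits''')
      where
        (left,  bits')   = get bits
        (root,  bits'')  = get bits'
        (right, bits''') = get bits''

-- Element instance, added here just for the demo
instance Serializable Bool where
    put b = [if b then I else O]
    get (I : bits) = (True,  bits)
    get (O : bits) = (False, bits)

main :: IO ()
main = do
    let tree = Branch Leaf True (Branch Leaf False Leaf)
    print (put tree)                      -- [I,O,I,I,O,O,O]
    print (get (put tree) == (tree, []))  -- True: get inverts put
```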


An algebraic data type is essentially a sum of products where the terms “sum” and “product” are understood as follows:

  • A sum is a variant type. In Haskell, Either is the canonical type constructor for binary sums, and the empty type Void from the void package is the nullary sum.

  • A product is a tuple type. In Haskell, (,) is the canonical type constructor for binary products, and () is the nullary product.

The key idea of generic programming is to map types to representations that make the sum-of-products structure explicit and to implement canonical behavior based on these representations instead of the actual types.
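For instance, Maybe a can be viewed as a binary sum of the nullary product and a single field. The helper names below are made up for illustration and are not part of GHC.Generics:

```haskell
-- Maybe a as a sum of products: Nothing corresponds to the nullary
-- product (), Just to a one-field product.
toSOP :: Maybe a -> Either () a
toSOP Nothing  = Left ()
toSOP (Just x) = Right x

fromSOP :: Either () a -> Maybe a
fromSOP (Left ()) = Nothing
fromSOP (Right x) = Just x

main :: IO ()
main = print (fromSOP (toSOP (Just 'x')), fromSOP (toSOP (Nothing :: Maybe Char)))
```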

The GHC.Generics module defines a number of type constructors for constructing representations:

data V1 p

infixr 5 :+:
data (:+:) f g p = L1 (f p) | R1 (g p)

data U1 p = U1

infixr 6 :*:
data (:*:) f g p = f p :*: g p

newtype K1 i a p = K1 { unK1 :: a }

newtype M1 i c f p = M1 { unM1 :: f p }

All of these type constructors take a final parameter p. This parameter is relevant only when dealing with higher-order classes. In this article, however, we only discuss generic programming with first-order classes. In this case, the parameter p is ignored. The different type constructors play the following roles:

  • V1 is for the nullary sum.

  • (:+:) is for binary sums.

  • U1 is for the nullary product.

  • (:*:) is for binary products.

  • K1 is a wrapper for fields of algebraic data types. Its parameter i used to provide some information about the field at the type level, but is now obsolete.

  • M1 is a wrapper for attaching meta information at the type level. Its parameter i denotes the kind of the language construct the meta information refers to, and its parameter c provides access to the meta information.

The GHC.Generics module furthermore introduces a class Generic, whose instances are the types for which a representation exists. Its definition is as follows:

class Generic a where

  type Rep a :: * -> *

  from :: a -> (Rep a) p

  to :: (Rep a) p -> a

A type Rep a is the representation of the type a. The methods from and to convert from values of the actual type to values of the representation type and vice versa.

To see all this in action, we make Tree a an instance of Generic:

instance Generic (Tree a) where

    type Rep (Tree a) =
        M1 D D1_Tree (
            M1 C C1_Tree_Leaf U1
            :+:
            M1 C C1_Tree_Branch (
                M1 S NoSelector (K1 R (Tree a))
                :*:
                M1 S NoSelector (K1 R a)
                :*:
                M1 S NoSelector (K1 R (Tree a))
            )
        )

    from Leaf                     = M1 (L1 (M1 U1))
    from (Branch left root right) = M1 (
                                        R1 (
                                        M1 (
                                            M1 (K1 left)
                                            :*:
                                            M1 (K1 root)
                                            :*:
                                            M1 (K1 right)
                                        )))

    to (M1 (L1 (M1 U1)))      = Leaf
    to (M1 (
            R1 (
            M1 (
                M1 (K1 left)
                :*:
                M1 (K1 root)
                :*:
                M1 (K1 right)
        ))))                  = Branch left root right

The types D1_Tree, C1_Tree_Leaf, and C1_Tree_Branch are type-level representations of the type constructor Tree, the data constructor Leaf, and the data constructor Branch, respectively. We declare them as empty types:

data D1_Tree
data C1_Tree_Leaf
data C1_Tree_Branch

We need to make these types instances of the classes Datatype and Constructor, which are part of GHC.Generics as well. These classes provide a link between the type-level representations of type and data constructors and the meta information related to them. This meta information particularly covers the identifiers of the type and data constructors, which are needed when implementing canonical implementations for methods like show and read. The instance declarations for the Tree-related types are as follows:

instance Datatype D1_Tree where

  datatypeName _ = "Tree"

  moduleName _ = "Main"

instance Constructor C1_Tree_Leaf where

  conName _ = "Leaf"

instance Constructor C1_Tree_Branch where

  conName _ = "Branch"

Instantiating the Generic class as shown above is obviously an extremely tedious task. However, it is possible to instantiate Generic completely automatically for any given algebraic data type, using the deriving syntax. This is what the DeriveGeneric language extension makes possible.

So instead of making Tree a an instance of Generic by hand, as we have done above, we could have declared the Tree type as follows in the first place:

data Tree a = Leaf | Branch (Tree a) a (Tree a)
              deriving (Show, Generic)

Implementing canonical behavior

As mentioned above, we implement canonical behavior based on representations. Let us see how this works in the case of the Serializable class.

We introduce a new class Serializable' whose methods provide serialization and deserialization for representation types:

class Serializable' f where

    put' :: f p -> [Bit]

    get' :: [Bit] -> (f p, [Bit])

We instantiate this class for all the representation types:

instance Serializable' U1 where

    put' U1 = []

    get' bits = (U1, bits)

instance (Serializable' r, Serializable' s) =>
         Serializable' (r :*: s) where

    put' (rep1 :*: rep2) = put' rep1 ++ put' rep2

    get' bits = (rep1 :*: rep2, bits'') where

        (rep1, bits')  = get' bits
        (rep2, bits'') = get' bits'

instance Serializable' V1 where

    put' _ = error "attempt to put a void value"

    get' _ = error "attempt to get a void value"

instance (Serializable' r, Serializable' s) =>
         Serializable' (r :+: s) where

    put' (L1 rep) = O : put' rep
    put' (R1 rep) = I : put' rep

    get' (O : bits) = let (rep, bits') = get' bits in
                      (L1 rep, bits')
    get' (I : bits) = let (rep, bits') = get' bits in
                      (R1 rep, bits')

instance Serializable' r => Serializable' (M1 i a r) where

    put' (M1 rep) = put' rep

    get' bits = (M1 rep, bits') where

        (rep, bits') = get' bits

instance Serializable a => Serializable' (K1 i a) where

    put' (K1 val) = put val

    get' bits = (K1 val, bits') where

        (val, bits') = get bits

Note that in the case of K1, the context mentions Serializable, not Serializable', and the methods put' and get' call put and get, not put' and get'. The reason is that the value wrapped in K1 has an ordinary type, not a representation type.

Accessing canonical behavior

We can now apply canonical behavior to ordinary types using the methods from and to from the Generic class. For example, we can implement functions defaultPut and defaultGet that provide canonical serialization and deserialization for all instances of Generic:

defaultPut :: (Generic a, Serializable' (Rep a)) =>
              a -> [Bit]
defaultPut = put' . from

defaultGet :: (Generic a, Serializable' (Rep a)) =>
              [Bit] -> (a, [Bit])
defaultGet bits = (to rep, bits') where

    (rep, bits') = get' bits

We can use these functions in instance declarations for Serializable. For example, we can make Tree a an instance of Serializable in the following way:

instance Serializable a => Serializable (Tree a) where
    put = defaultPut

    get = defaultGet

Compared to the instance declaration we had initially, this one is a real improvement, since we do not have to implement the desired behavior of put and get by hand anymore. However, it still contains boilerplate code in the form of the trivial method declarations. It would be better to establish defaultPut and defaultGet as defaults in the class declaration:

class Serializable a where

    put :: a -> [Bit]
    put = defaultPut

    get :: [Bit] -> (a, [Bit])
    get = defaultGet

However, this is not possible, since the types of defaultPut and defaultGet are less general than the types of put and get, as they put additional constraints on the type a. Luckily, GHC supports the language extension DefaultSignatures, which allows us to give default implementations that have less general types than the actual methods (and consequently work only for those instances that are compatible with these less general types). Using DefaultSignatures, we can declare the Serializable class as follows:

class Serializable a where

    put :: a -> [Bit]
    default put :: (Generic a, Serializable' (Rep a)) =>
                   a -> [Bit]
    put = defaultPut

    get :: [Bit] -> (a, [Bit])
    default get :: (Generic a, Serializable' (Rep a)) =>
                   [Bit] -> (a, [Bit])
    get = defaultGet

With this class declaration in place, we can make Tree a an instance of Serializable as follows:

instance Serializable a => Serializable (Tree a)

With the minor extension DeriveAnyClass, which is provided by GHC starting from Version 7.10, we can even use the deriving keyword to instantiate Serializable for Tree a. As a result, we only have to write the following in order to define the Tree type and make it an instance of Serializable:

data Tree a = Leaf | Branch (Tree a) a (Tree a)
              deriving (Show, Generic, Serializable)

So finally, our own classes can be used in deriving clauses just like the Haskell standard classes, except that we additionally have to derive an instance of Generic.
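Putting all the pieces together, a complete, compilable module might look like the following sketch. Bit, the unary Int instance, and the Serializable' instances are taken from this article; only the module plumbing is ours:

```haskell
{-# LANGUAGE DefaultSignatures, DeriveAnyClass, DeriveGeneric,
             TypeOperators #-}

import GHC.Generics

data Bit = O | I deriving (Show, Eq)

class Serializable a where
    put :: a -> [Bit]
    default put :: (Generic a, Serializable' (Rep a)) => a -> [Bit]
    put = put' . from

    get :: [Bit] -> (a, [Bit])
    default get :: (Generic a, Serializable' (Rep a)) => [Bit] -> (a, [Bit])
    get bits = let (rep, bits') = get' bits in (to rep, bits')

class Serializable' f where
    put' :: f p -> [Bit]
    get' :: [Bit] -> (f p, [Bit])

instance Serializable' U1 where
    put' U1   = []
    get' bits = (U1, bits)

instance (Serializable' r, Serializable' s) => Serializable' (r :*: s) where
    put' (rep1 :*: rep2) = put' rep1 ++ put' rep2
    get' bits = (rep1 :*: rep2, bits'')
      where (rep1, bits')  = get' bits
            (rep2, bits'') = get' bits'

instance (Serializable' r, Serializable' s) => Serializable' (r :+: s) where
    put' (L1 rep) = O : put' rep
    put' (R1 rep) = I : put' rep
    get' (O : bits) = let (rep, bits') = get' bits in (L1 rep, bits')
    get' (I : bits) = let (rep, bits') = get' bits in (R1 rep, bits')

instance Serializable' r => Serializable' (M1 i c r) where
    put' (M1 rep) = put' rep
    get' bits     = let (rep, bits') = get' bits in (M1 rep, bits')

instance Serializable a => Serializable' (K1 i a) where
    put' (K1 val) = put val
    get' bits     = let (val, bits') = get bits in (K1 val, bits')

-- a hand-written instance, using the article's unary coding for Int
instance Serializable Int where
    put val  = replicate val I ++ [O]
    get bits = (length is, bits')
      where (is, O : bits') = span (== I) bits

-- the payoff: Serializable comes for free via DeriveAnyClass
data Tree a = Leaf | Branch (Tree a) a (Tree a)
              deriving (Show, Eq, Generic, Serializable)

main :: IO ()
main = do
    let tree = Branch Leaf (2 :: Int) (Branch Leaf 0 Leaf)
    print (put tree)
    print (fst (get (put tree) :: (Tree Int, [Bit])) == tree)
```

The derived generic code produces the same bit patterns as the hand-written instance from the beginning of the article: O for Leaf, I followed by the three fields for Branch.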

Specialized implementations

Usually, not all instances of a class should or even can be generated by means of generic programming, but some instances have to be crafted by hand. For example, making Int an instance of Serializable requires manual work, since Int is not an algebraic data type.

However, there is no problem with this, since we still have the opportunity to write explicit instance declarations, despite the presence of a generic solution. This is in line with the standard deriving mechanism: you can make use of it, but you are not forced to do so. So we can have the following instance declaration, for example:

instance Serializable Int where

    put val = replicate val I ++ [O]

    get bits = (length is, bits') where

        (is, O : bits') = span (== I) bits

Of course, the serialization approach we use here is not very efficient, but the instance declaration illustrates the point we want to make.
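For concreteness, here is the unary coding in action, in a small self-contained sketch (Bit and the class declaration repeated from the article):

```haskell
data Bit = O | I deriving (Show, Eq)

class Serializable a where
    put :: a -> [Bit]
    get :: [Bit] -> (a, [Bit])

-- unary coding: n is encoded as n copies of I, terminated by O
instance Serializable Int where
    put val  = replicate val I ++ [O]
    get bits = (length is, bits')
      where (is, O : bits') = span (== I) bits

main :: IO ()
main = do
    print (put (3 :: Int))                 -- [I,I,I,O]
    print (get [I,I,O,I] :: (Int, [Bit]))  -- (2,[I])
```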

Tagged: functional programming, generic programming, GHC, Haskell, Institute of Cybernetics, literate programming, parametric polymorphism, talk, Theory Lunch, type class, type family, void (Haskell package)

by Wolfgang Jeltsch at June 06, 2017 12:11 AM

June 05, 2017

Roman Cheplyaka

Word vs Int

When dealing with bounded non-negative integer quantities in Haskell, should we use Word or Int to represent them?

Some argue that we should use Word because then we automatically know that our quantities are non-negative.

There is a maxim, famous in typed functional programming circles, that says “make illegal states unrepresentable”. Following this maxim generally leads to code that is more likely to be correct, but in each specific instance we should check that we are indeed getting closer to our goal (writing correct code) instead of following a cargo cult.

So let’s examine whether avoiding unrepresentable negative states serves us well here.

If our program is correct and never arrives at a negative result, it does not matter whether we use Int or Word. (I’m setting overflow issues aside for now.)

Thus, we only need to consider a case when we subtract a bigger number from a smaller number because of a logic flaw in our program.

Here is what happens depending on what type we use:

> 2 - 3 :: Int
-1
> 2 - 3 :: Word
18446744073709551615

Which answer would you prefer?

Even though technically -1 doesn’t satisfy our constraints and 18446744073709551615 does, I would choose -1 over 18446744073709551615 any day, for two reasons:

  1. There is a chance that some downstream computation will recognize a negative number and report an error.

    A stock exchange won’t let me buy -1 shares, and the engine won’t let me set the speed to -1 km/h. (These are terrible examples, I know, but hopefully they illustrate my point.)

    Will those systems also reject 18446744073709551615 shares or km/h? If they are well designed, yes, but I’d rather not test this in production.

  2. For a human, it is easier to notice the mistake if the answer does not make any sense at all than if the answer kinda makes sense.

    If an experienced programmer sees an unexpectedly huge number like 18446744073709551615, she will easily connect it to an underflow, although it’s an extra logical step she has to make. A less experienced programmer might spend quite a bit of time figuring this out.

In any case, I don’t see any advantage of replacing an invalid number such as -1 with a technically-valid-but-meaningless number like 18446744073709551615.

Ben Millwood said it well:

I moreover feel like, e.g. length :: [a] -> Word (or things of that ilk) would be even more of a mistake, because type inference will spread that Word everywhere, and 2 - 3 :: Word is catastrophically wrong. Although it seems nice to have an output type for non-negative functions that only has non-negative values, in fact Word happily supports subtraction, conversion from Integer, even negation (!) without a hint that anything has gone amiss. So I just don’t believe that it is a type suitable for general “positive integer” applications.


There is a way to get the best of both worlds, though: the Natural type from Numeric.Natural.

  1. It is an arbitrary-precision integer, so we don’t have to worry about overflow.
  2. It is a non-negative type, so the invalid state is not representable.
  3. It raises an exception upon underflow, making the errors in the code prominent:

    > 2 - 3 :: Natural
    *** Exception: arithmetic underflow

There are perhaps a couple of valid use cases for Word that I can think of, but they are fairly exotic. (I am not talking here about types such as Word32 or Word64, which are indispensable for bit manipulation.) Most of the time we should prefer either Int or Natural to Word.
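The underflow above is an ArithException (its Show instance produces the “arithmetic underflow” message), so code that wants to handle it rather than crash can catch it. A minimal sketch using only base:

```haskell
import Control.Exception (ArithException (..), evaluate, try)
import Numeric.Natural   (Natural)

main :: IO ()
main = do
    -- evaluate forces the subtraction inside IO so that try can catch
    -- the Underflow exception thrown by Natural's (-)
    result <- try (evaluate (2 - 3 :: Natural))
    case result of
        Left Underflow -> putStrLn "caught: arithmetic underflow"
        Left e         -> putStrLn ("caught: " ++ show e)
        Right n        -> print n
```

This keeps the error prominent at the point of the logic flaw, instead of letting a meaningless huge number propagate downstream.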

June 05, 2017 08:00 PM

June 04, 2017

Roman Cheplyaka

Universally stateless monads

Background: monad-control and stateless monads

The monad-control package makes it possible to lift IO functions such as

forkIO :: IO () -> IO ThreadId

catch :: Exception e => IO a -> (e -> IO a) -> IO a

allocate :: MonadResource m => IO a -> (a -> IO ()) -> m (ReleaseKey, a)

to IO-based monad stacks such as StateT Int (ReaderT Bool IO).

The core idea of the package is the associated type StM, which, for a given monad m and result type a, calculates the “state” of m at a.

The “state” of a monad is whatever the “run” function for this monad returns.

For instance, for StateT Int IO Char, we have

runStateT :: StateT Int IO Char -> Int -> IO (Char, Int)

The result type of this function (minus the outer monad constructor, IO, which is always there) is (Char, Int), and that is what StM (StateT Int IO) Char should expand to:

> :kind! StM (StateT Int IO) Char
StM (StateT Int IO) Char :: *
= (Char, Int)

In this case, StM m a is not the same as a; it contains a plus some extra information.

In other cases, StM m a may not contain an a at all; for instance

> :kind! StM (ExceptT Text IO) Char
StM (ExceptT Text IO) Char :: *
= Either Text Char

and we cannot always extract a Char from Either Text Char.

For some monads, though, StM m a reduces precisely to a. I call such monads “stateless”. A notable example is the reader monad:

> :kind! StM (ReaderT Int IO) Bool
StM (ReaderT Int IO) Bool :: *
= Bool

Note that a monad like ReaderT (IORef Int) IO is also stateless, even though one can use it to implement stateful programs.
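The following sketch illustrates that point, using transformers (a GHC boot library): the StM of ReaderT reduces to the plain result type, yet an IORef in the environment gives us real mutable state.

```haskell
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Reader (ReaderT, ask, runReaderT)
import Data.IORef

-- a "stateless" monad (StM (ReaderT (IORef Int) IO) a ~ a) that
-- nevertheless implements stateful behaviour through its environment
increment :: ReaderT (IORef Int) IO ()
increment = do
    ref <- ask
    liftIO $ modifyIORef' ref (+ 1)

main :: IO ()
main = do
    ref <- newIORef 0
    runReaderT (increment >> increment >> increment) ref
    readIORef ref >>= print  -- prints 3
```

The "state" lives in the IORef, not in the monadic value itself, which is why forking or terminating such a computation never duplicates or loses anything.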

The important feature of stateless monads is that we can fork them without duplicating the state and terminate them without losing the state. The monad-control package works best with stateless monads: it is less tricky to understand, and you can do some things with stateless monads that are hard or impossible to do with arbitrary MonadBaseControl monads.

Universally stateless monads

When both the monad m and the result type a are known, the compiler can expand the associated synonym StM m a and decide whether StM m a ~ a.

However, there are good reasons to keep the monad m polymorphic and instead impose the constraints (e.g. MonadReader Int m) that m must satisfy.

In this case, the compiler cannot know a priori that m is stateless, and we need to explicitly state that in the function signature. In Taking advantage of type synonyms in monad-control, I showed one such example: running a web application with access to the database. In order to convince the compiler that m is stateless, I needed to add the constraint

StM m ResponseReceived ~ ResponseReceived

to the type signature.

As you can see, this doesn’t quite say “monad m is stateless”; instead it says “monad m is stateless at type a” (where a is ResponseReceived in the above example).

This is fine if we only use monad-control at one result type. But if we use monad-control functions at many different types, the number of constraints required quickly gets out of hand.

As an example, consider the allocate function from resourcet’s Control.Monad.Trans.Resource:

allocate :: MonadResource m => IO a -> (a -> IO ()) -> m (ReleaseKey, a)

As the module’s documentation says,

One point to note: all register cleanup actions live in the IO monad, not the main monad. This allows both more efficient code, and for monads to be transformed.

In practice, it is often useful for the register and cleanup actions to live in the main monad. monad-control lets us lift the allocate function:

{-# LANGUAGE FlexibleContexts, TypeFamilies, ScopedTypeVariables #-}
import Control.Monad.Trans.Control
import Control.Monad.Trans.Resource

allocateM
  :: forall m a . (MonadBaseControl IO m, MonadResource m,
                   StM m a ~ a, StM m () ~ (), StM m (ReleaseKey, a) ~ (ReleaseKey, a))
  => m a -> (a -> m ()) -> m (ReleaseKey, a)
allocateM acquire release =
    control (\runInIO -> runInIO $ allocate
      (runInIO acquire)
      (runInIO . release))

This small function requires three different stateless constraints — constraints of the form StM m x ~ x — and two additional stateless constraints for each additional type a at which we use it.

These constraints are artefacts of the way type synonyms work in Haskell; StM m a is not really supposed to depend on a. If a monad is stateless, it is stateless at every result type.

In order to capture this universal statelessness in a single constraint, we can use Forall from the constraints package.

First, we need to transform the constraint StM m a ~ a to a form in which it can be partially applied, as we want to abstract over a. Simply saying

type StatelessAt m a = StM m a ~ a

won’t do because type synonyms need to be fully applied: StatelessAt m is not a valid type.

We need to use a trick to create a class synonym:

class    StM m a ~ a => StatelessAt m a
instance StM m a ~ a => StatelessAt m a

Now StatelessAt is a legitimate class constraint (not a type synonym), and so we can abstract over its second argument with Forall:

type Stateless m = Forall (StatelessAt m)

Now we only need to include the Stateless m constraint, and every time we need to prove that StM m a ~ a for some result type a, we wrap the monadic computation in assertStateless @a (...), where assertStateless is defined as follows:

assertStateless :: forall a m b . Stateless m => (StM m a ~ a => m b) -> m b
assertStateless action = action \\ (inst :: Stateless m :- StatelessAt m a)

The type signature of assertStateless is crafted in such a way that we only need to specify a, and m is inferred to be the “current” monad. We could have given assertStateless a more general type

assertStateless :: forall m a r . Stateless m => (StM m a ~ a => r) -> r

but now we have to apply it to both m and a.

As an example of using assertStateless, let’s rewrite the lifted allocate function to include a single stateless constraint:

allocateM
  :: forall m a . (MonadBaseControl IO m, MonadResource m, Stateless m)
  => m a -> (a -> m ()) -> m (ReleaseKey, a)
allocateM acquire release =
  assertStateless @a $
  assertStateless @() $
  assertStateless @(ReleaseKey, a) $
  control (\runInIO -> runInIO $ allocate
    (runInIO acquire)
    (runInIO . release))