In a recent comment (that I sadly cannot find any longer) in https://www.reddit.com/r/math/, someone mentioned the following game. There are n players, and they each independently choose a natural number. The player with the lowest unique number wins the game. So if two people choose 1, a third chooses 2, and a fourth chooses 5, then the third player wins: the 1s were not unique, so 2 was the least among the unique numbers chosen. (Presumably, though this wasn’t specified in the comment, if there is no unique number among all players, then no one wins).
I got nerd-sniped, so I’ll share my investigation.
For me, since the solution to the general problem wasn’t obvious, it made sense to specialize. Let’s say there are n players, and just to make the game finite, let’s say that instead of choosing any natural number, you choose a number from 1 to m. Choosing very large numbers is surely a bad strategy anyway, so intuitively I expect any reasonably large choice of m to give very similar results.
n = 2
Let’s start with the case where n = 2. This one turns out to be easy: you should always pick 1, daring your opponent to pick 1, as well. We can induct on m to prove this. If m = 1, then you are required to pick 1 by the rules. But if m > 1, suppose you pick m. Either your opponent also picks m and you both lose, or your opponent picks a number smaller than m and you still lose. Clearly, this is a bad strategy, and you always do at least as well choosing one of the first m - 1 options instead. This reduces the game to one where we already know the best strategy is to pick 1.
That wasn’t very interesting, so let’s try more players.
n = 3, m = 2
Suppose there are three players, each choosing either 1 or 2. It’s impossible for all three players to choose a different number! If you do manage to pick a unique number, you will be the only player to do so, and yours will be the least unique number simply because it’s the only one!
If you don’t think your opponents will have figured this out, you might be tempted to pick 2, in hopes that your opponents go for 1 to try to get the least number, and you’ll be the only one choosing 2. But this makes you predictable, so the other players can try to take advantage. And if one of the other players reasons the same way, you are both guaranteed to lose! What we want here is a Nash equilibrium: a strategy for all players such that no single player can do better by deviating from that strategy.
It’s not hard to see that all players should flip a coin, choosing either 1 or 2 with equal probability. Each player then has a 25% chance of picking the unique number and winning, and there’s a 25% chance that all three choose the same number and all lose. Regrettable, but anything you do to try to avoid that outcome just makes your play more predictable, so that the other players could exploit it.
It’s interesting to look at the actual computation. When computing a Nash equilibrium, we generally rely on the indifference principle: a player should always be indifferent between any choice that they make at random, since otherwise, they would take the one with the better outcome and always play that instead.
This is a bit counter-intuitive! Naively, you might think that the optimal strategy is the one that gives the best expected result, but when a Nash equilibrium involves a random choice — known as a mixed strategy — any single player actually does equally well against other optimal players no matter which mix of those random choices they make! In this game, though, predictability is a weakness. Just as a poker player tries to avoid ‘tells’ that give away the strength of their hand, players in this number-choosing game need to be unpredictable. The reason for playing the Nash equilibrium isn’t that it gives the best expected result against optimal opponents, but rather that it can’t be exploited by an opponent.
Let’s apply this indifference principle. This game is completely symmetric — there’s no order of turns, and all players have the same choices and payoffs available — so an optimal strategy ought to be the same for any player. Let’s say p is the probability that any single player will choose 1. Then if you choose 1, you will win with probability (1 - p)², while if you choose 2, you’ll win with probability p². If you set these equal to each other as per the indifference principle and solve the equation, you get p = 0.5, as we reasoned above.
n = 3, m = 3
Things get more interesting if each player can choose 1, 2, or 3. Now it’s possible for each player to choose uniquely, so it starts to matter which unique number you pick. Let’s say each player chooses 1, 2, and 3 with the probabilities p, q, and r respectively. We can analyze the probability of winning with each choice.
If you pick 1, then you always win unless someone else also picks a 1. Your chance of winning, then, is (q + r)².
If you pick 2, then for you to win, either both other players need to pick 1 (eliminating each other because of uniqueness and leaving you to win by default), or both other players need to pick 3, so that you’ve picked the least number. Your chance of winning is p² + r².
If you pick 3, then you need both your opponents to pick the same number, either 1 or 2, so that they eliminate each other. Your chance of winning is p² + q².
Setting these equal to each other immediately shows us that since p² + q² = p² + r², we must conclude that q = r. Then p² + q² = (q + r)² = 4q², so p² = 3q² = 3r², i.e. p = √3·q. Together with p + q + r = 1, this gives q(2 + √3) = 1, so we can conclude that p = 2√3 - 3 ≈ 0.464, while q = r = 2 - √3 ≈ 0.268.
This is our first really interesting result. Can we generalize?
n = 3, in general
The reasoning above generalizes well. If there are three players, and you pick a number k, you are betting that either the other two players will pick the same number less than k, or they will each pick numbers greater than k (regardless of whether they are the same one).
I’ll switch notation here for convenience. Let X be a random variable representing a choice by a player from the Nash equilibrium strategy. Then if you choose k, your probability of winning is P(X=1)² + … + P(X=k-1)² + P(X>k)². The indifference principle tells us that this should be equal for any choice of k. Equivalently, for any k from 1 to m - 1, the probability of winning when choosing k is the same as the probability when choosing k + 1. So:

P(X=1)² + … + P(X=k-1)² + P(X>k)² = P(X=1)² + … + P(X=k)² + P(X>k+1)²
Cancelling the common terms: P(X>k)² = P(X=k)² + P(X>k+1)²
Rearranging: P(X=k) = √(P(X≥k+1)² - P(X>k+1)²)
This gives us a recursive formula that we can use (in reverse) to compute P(X=k), if only we knew P(X=m) to get started. If we just pick something arbitrary, though, it turns out that all the results are just multiples of that choice. We can then divide by the sum of them all to normalize the probabilities to sum to 1.
nashEquilibriumTo :: Integer -> Distribution Double Integer
nashEquilibriumTo m = categorical (zip allPs [1 ..])
  where
    allPs = go m 1 0 []
    go 1 pEqual pGreater ps = (/ (pEqual + pGreater)) <$> (pEqual : ps)
    go k pEqual pGreater ps =
      let pGreaterEqual = pEqual + pGreater
       in go
            (k - 1)
            (sqrt (pGreaterEqual * pGreaterEqual - pGreater * pGreater))
            pGreaterEqual
            (pEqual : ps)
main :: IO ()
main = print (probabilities (nashEquilibriumTo 100))
I’ve used a probability library from https://github.com/cdsmith/prob that I wrote with Shae Erisson during a fun hacking session a few years ago. It doesn’t help yet, but we’ll play around with some of its further features below.
Trying a few large values for m confirms my suspicion that any reasonably large choice of m gives effectively the same result.
By inspection, this appears to be a geometric distribution, parameterized by the probability 0.4563109873079237. We can check that the distribution is geometric, which just means that for all k < m - 1, the ratio P(X > k) / P(X ≥ k) is the same as P(X > k + 1) / P(X ≥ k + 1). This is the defining property of a geometric distribution, and some simple algebra confirms that it holds in this case.
But what is this bizarre number? A few Google queries get us to an answer of sorts. A 2002 Ph.D. dissertation by Joseph Myers seems to arrive at the same number in the solution to a question about graph theory, where it’s identified as the real root of the polynomial x³ - 4x² + 6x - 2. We can check that this is right for a geometric distribution. Starting with P(X=k) = √(P(X≥k+1)² - P(X>k+1)²) where k = 1, we get P(X=1) = √(P(X ≥ 2)² - P(X > 2)²). If P(X=1) = p, then P(X ≥ 2) = 1 - p, and P(X > 2) = (1 - p)², so we have p = √((1-p)² - ((1 - p)²)²), which indeed expands to p⁴ - 4p³ + 6p² - 2p = 0, so either p = 0 (which is impossible for a geometric distribution), or p³ - 4p² + 6p - 2 = 0, giving the probability seen above. (How, or whether, this is connected to the graph theory question investigated in that dissertation is certainly beyond my comprehension.)
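To double-check all of this numerically, here is a minimal, self-contained sketch (no probability library needed; equilibriumPs, winProb3, and cubicRoot are ad hoc names of mine) that rebuilds the equilibrium from the recurrence above, compares P(X=1) against a bisection root of x³ - 4x² + 6x - 2, and verifies the indifference principle directly:

equilibriumPs :: Int -> [Double]
equilibriumPs m = map (/ sum ps) ps
  where
    -- unnormalized weights from P(X=k) = sqrt(P(X≥k+1)² - P(X>k+1)²),
    -- seeded with an arbitrary weight of 1 for choosing m
    ps = go m 1 0 []
    go 1 pEq _ acc = pEq : acc
    go k pEq pGt acc =
      let pGE = pEq + pGt
       in go (k - 1) (sqrt (pGE * pGE - pGt * pGt)) pGE (pEq : acc)

-- win probability for choosing k against two opponents playing ps;
-- at the equilibrium this should not depend on k
winProb3 :: [Double] -> Int -> Double
winProb3 ps k = sum (map (^ 2) (take (k - 1) ps)) + sum (drop k ps) ^ 2

-- real root of x³ - 4x² + 6x - 2, found by bisection on (0, 1)
cubicRoot :: Double
cubicRoot = go 0 1
  where
    f x = x ^ 3 - 4 * x ^ 2 + 6 * x - 2
    go lo hi
      | hi - lo < 1e-12 = lo
      | f mid <= 0 = go mid hi
      | otherwise = go lo mid
      where
        mid = (lo + hi) / 2

main :: IO ()
main = do
  let ps = equilibriumPs 100
  print (head ps)                        -- ~0.4563109873079237
  print cubicRoot                        -- the same number
  print [winProb3 ps k | k <- [1 .. 5]]  -- all (nearly) equal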
You may wonder, in these large limiting cases, how often it turns out that no one wins, or that we see wins with each number. Answering questions like this is why I chose to use my probability library. We can first define a function to implement the game’s basic rule:
import Data.List (group, sort)
import Data.Maybe (listToMaybe)

leastUnique :: (Ord a) => [a] -> Maybe a
leastUnique xs = listToMaybe [x | [x] <- group (sort xs)]
And then we can define the whole game using the strategy above for each player:
gameTo :: Integer -> Distribution Double (Maybe Integer)
gameTo m = do
  ns <- replicateM 3 (nashEquilibriumTo m)
  return (leastUnique ns)
Then we can update main to tell us the distribution of game outcomes, rather than plays:
main :: IO ()
main = print (probabilities (gameTo 100))
And get these probabilities:
Nothing -> 0.11320677243374572
Just 1 -> 0.40465349320873445
Just 2 -> 0.22000565820506113
Just 3 -> 0.11961465909617276
Just 4 -> 6.503317590749513e-2
Just 5 -> 3.535782320137907e-2
Just 6 -> 1.9223659987298684e-2
Just 7 -> 1.0451692718822408e-2
An 11% probability of no winner for large m is an improvement over the 25% we computed for m = 2. Once again, a least unique number greater than 7 has less than 1% probability, and the probabilities drop even more rapidly from there.
More than three players?
With an arbitrary number of players, the expressions for the probability of winning grow rather more involved, since you must consider the possibility that some other players have chosen numbers greater than yours, while others have chosen smaller numbers that are duplicated, possibly in twos or in threes.
For the four-player case, this isn’t too bad. The three winning possibilities (sketched in code below) are:
All three other players choose the same smaller number. This has probability P(X=1)³ + … + P(X=k-1)³
All three other players choose larger numbers, though not necessarily the same one. This has probability P(X > k)³
Two of the three other players choose the same smaller number, and the third chooses a larger number. This has probability 3 P(X > k) (P(X=1)² + … + P(X=k-1)²)
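Extending the winProb3 sketch from the aside above to four players (again assuming ps is the list [P(X=1), …, P(X=m)]), the three cases translate directly:

winProb4 :: [Double] -> Int -> Double
winProb4 ps k =
  sum (map (^ 3) smaller)                    -- all three collide on one smaller number
    + greater ^ 3                            -- all three choose larger numbers
    + 3 * greater * sum (map (^ 2) smaller)  -- two collide below, one goes above
  where
    smaller = take (k - 1) ps  -- P(X=1) .. P(X=k-1)
    greater = sum (drop k ps)  -- P(X>k)

The indifference principle again demands that this be equal for every k in the support.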
You could possibly work out how to compute this one without too much difficulty. The algebra gets harder, though, and I dug deep enough to determine that the Nash equilibrium is no longer a geometric distribution. If you assume the equilibrium is geometric, then numerically, the probability of choosing 1 that gives choices 1 and 2 equal rewards would need to be about 0.350788, but that choice gives too small a reward for choosing 3 or more, implying those numbers ought to be chosen less often than a geometric distribution prescribes.
For larger n, even stating the equations turns into a nontrivial problem of accurately counting the possible ways to win. I’d certainly be interested if there’s a nice-looking result here, but I do not yet know what it is.
Numerical solutions
We can solve this numerically, though. Using the probability library mentioned above, one can easily compute, for any finite game and any strategy (as a probability distribution of moves), the expected benefit for each choice.
expectedOutcomesTo :: Int -> Int -> Distribution Double Int -> [Double]
expectedOutcomesTo n m dist =
  [ probability (== Just i) $ leastUnique . (i :) <$> replicateM (n - 1) dist
  | i <- [1 .. m]
  ]
We can then iteratively adjust the probability of each choice slightly, based on how its expected outcome compares to other expected outcomes in the distribution. It turns out to be good enough to compare with an immediate neighbor. Just so that all of our distributions remain valid, instead of working with the global probabilities P(X=k), we’ll do the computation with conditional probabilities P(X = k | X ≥ k), so that any sequence of probabilities is valid, without worrying about whether they sum to 1. Given this list of conditional probabilities, we can produce a probability distribution like this:
distFromConditionalStrategy :: [Double] -> Distribution Double Int
distFromConditionalStrategy = go 1
  where
    go i [] = pure i
    go i (q : qs) = do
      choice <- bernoulli q
      if choice then pure i else go (i + 1) qs
Then we can optimize numerically, using the difference of each choice’s win probability from its neighbor as a diff to add to the conditional probability of that choice.
refine :: Int -> Int -> [Double] -> Distribution Double Int
refine n iters strategy
  | iters == 0 = equilibrium
  | otherwise =
      let ps = expectedOutcomesTo n m equilibrium
          delta = zipWith subtract (drop 1 ps) ps
          adjs = zipWith (+) strategy delta
       in refine n (iters - 1) adjs
  where
    m = length strategy + 1
    equilibrium = distFromConditionalStrategy strategy
It works well enough to run this for 10,000 iterations at n = 4, m = 10.
main :: IO ()
main = do
  let n = 4
      m = 10
      d = refine n 10000 (replicate (m - 1) 0.3)
  print $ probabilities d
  print $ expectedOutcomesTo n m d
The resulting probability distribution is, to me at least, quite surprising! I would have expected that more players would incentivize you to choose a higher number, since the additional players make collisions on low numbers more likely. But it seems the opposite is true. While three players at least occasionally (with 1% or more probability) should choose numbers up to 7, four players should apparently stop at 3.
Nash equilibrium strategy for n = 4, m = 10
Huh. I’m not sure why this is true, but I’ve checked the computation in a few ways, and it seems to be a real phenomenon. Please leave a comment if you have a better intuition for why it ought to be so!
With five players, at least, we see some larger numbers again in the Nash equilibrium, lending support to the idea that there was something unusual going on with the four player case. Here’s the strategy for five players:
Nash equilibrium strategy for n = 5, m = 10
The six player variant retracts the distribution a little, reducing the probabilities of choosing 5 or 6, but then 7 players expands the choices a bit, and it’s starting to become a pattern that even numbers of players lend themselves to a tighter style of play, while odd numbers open up the strategy.
Nash equilibrium strategy for n = 6, m = 10
Nash equilibrium strategy for n = 7, m = 10
Nash equilibrium strategy for n = 8, m = 10
In general, it looks like this is converging to something. The computations are also getting progressively slower, so let’s stop there.
Game variants
There is plenty of room for variation in the game, which would change the analysis. If you’re looking for a variant to explore on your own, in addition to expanding the game to more players, you might try these:
What if a tie awards each player an equal fraction of the reward for a full win, instead of nothing at all? (This actually simplifies the analysis a bit!)
What if, instead of all wins being equal, we found the least unique number, and paid that player an amount equal to the number itself? Now there’s somewhat less of an incentive for players to choose small numbers, since a larger number gives a larger payoff! This gives the problem something like a prisoner’s dilemma flavor, where players could coordinate to make more money, but leave themselves open to being undercut by someone willing to make a small profit by betraying the coordinated strategy.
What other variants might be interesting?
Addendum (Sep 26): Making it faster
As is often the case, the naive code I originally wrote can be significantly improved. In this case, the code was evaluating probabilities by enumerating all the ways players might choose numbers, and then computing the winner for each one. For large values of m and n this is a lot, and it grows exponentially.
There’s a better way. We don’t need to remember each individual choice to determine the outcome of the game in the presence of further choices. Instead, we need only determine which numbers have been chosen once, and which have been chosen more than once.
data GameState = GameState
  { dups :: Set Int
  , uniqs :: Set Int
  }
  deriving (Eq, Ord)
To add a new choice to a GameState requires checking whether it’s one of the existing unique or duplicate choices:
addToState :: Int -> GameState -> GameState
addToState n gs@(GameState dups uniqs)
  | Set.member n dups = gs
  | Set.member n uniqs = GameState (Set.insert n dups) (Set.delete n uniqs)
  | otherwise = GameState dups (Set.insert n uniqs)
We can now directly compute the distribution of GameState corresponding to a set of n players playing moves with a given distribution. The use of simplify from the probability library here is crucial: it combines all the different paths that lead to the same outcome into a single case, avoiding the exponential explosion.
stateDist :: Int -> Distribution Double Int -> Distribution Double GameState
stateDist n moves = go n (pure (GameState mempty mempty))
  where
    go 0 states = states
    go i states = go (i - 1) (simplify $ addToState <$> moves <*> states)
Now it remains to determine whether a certain move can win, given the game state resulting from the remaining moves.
win :: Int -> GameState -> Bool
win n (GameState dups uniqs) =
  not (Set.member n dups) && maybe True (> n) (Set.lookupMin uniqs)
Finally, we update the function that computes win probabilities to use this new code.
expectedOutcomesTo :: Int -> Int -> Distribution Double Int -> [Double]
expectedOutcomesTo n m dist = [probability (win i) states | i <- [1 .. m]]
  where
    states = stateDist (n - 1) dist
The result is that while I previously had to leave the code running overnight to compute the n = 8 case, I can now easily compute cases up to 15 players with enough patience. This would involve computing the winner for about a quadrillion games in the naive code, making it hopeless, but the simplification reduces that to something feasible.
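A rough way to see the gap: a GameState records, for each of the m numbers, only whether it is so far unchosen, unique, or duplicated, so there are at most 3ᵐ = 59,049 distinct states when m = 10, no matter how many players have moved, versus the mⁿ raw move sequences the naive code enumerated.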
Nash equilibria for 2 through 15 players
It seems that once you leave behind small numbers of players where odd combinatorial things happen, the equilibrium eventually follows a smooth pattern. I suppose with enough players, the probability for every number would peak and then decline, just as we see for 4 and 5 here, as it becomes worthwhile to spread your choices even further to avoid duplicates. That’s a nice confirmation of my intuition.
Recently, I published The Monospace Web, a minimalist design
exploration. It all started with this innocent post, yearning for a
simpler web. Perhaps too typewriter-nostalgic, but it was an interesting
starting point. After some hacking and sharing early screenshots,
@noteed asked for grid alignment, and down the rabbit hole I went.
The Python programming language, and its huge ecosystem (there are
more than 500,000 projects hosted on the main Python repository,
PyPI), is used both for software engineering and
scientific research. Both have similar requirements for
reproducibility. But, as we will see, the practices are quite
different.
In fact, the Python ecosystem and community is notorious for the countless ways it offers to declare dependencies.
As we were developing FawltyDeps,
a tool to ensure that declared dependencies match the actual imports
in the code, we had to accommodate many of these ways.
This got us thinking:
Could FawltyDeps be used
to gain insights into how packaging is done across Python ecosystems?
In this blog post, we look at project structures and dependency declarations across Python projects,
both from biomedical scientific papers (as an example of scientific usage of Python) as well as from more general and widely used Python packages.
We’ll try to answer the following questions:
What practices does the community actually follow? And how do they
differ between software engineering and scientific research?
Could such differences be related to why it’s often hard to reproduce results from scientific notebooks published in the data science community?
Experiment setup
In the following, we discuss the experimental setup — how we decided which data to use, where to get this data from, and what tools we use to analyze it, before we discuss our results in depth.
Data
First, we need to collect the names and source code locations of projects that we want to include in the analysis. Now, where did we find these projects?
We selected projects for analysis based on two key areas: impactful real-world
applications and broad community adoption.
Biomedical data analysis repositories:
biomedical data plays a vital role in healthcare and research.
To capture its significance, we focused on packages directly linked to
biomedical data, sourced from repositories supported or referenced by
scientific biomedical articles. This criterion anchored our experiment in
real-world scientific applications.
To analyze software engineering practices, we chose the most popular PyPI
packages: acknowledging the importance of widely adopted packages, we scanned
the most downloaded and most frequently used packages on PyPI.
Biomedical data
We leverage a recent study by Samuel, S., & Mietchen, D. (2024):
Computational reproducibility of Jupyter notebooks from biomedical
publications. This study analyzed
2,177 GitHub repositories associated with publications indexed in
PubMed Central to assess computational reproducibility. Specifically,
we reused the dataset they generated (found
here) for our own analyses.
PyPI data
In order to start analyzing actual projects published to PyPI, we still needed
to access some basic metadata about these projects: the project’s name, source URL,
and any extra metadata which could be useful for further analysis such as project tags.
While this information is available via the PyPI REST API, this API is subject
to rate limiting and is not really designed for bulk analyses such as ours.
Conveniently, Google maintains a public BigQuery dataset of PyPI
download statistics and project metadata which we leveraged instead. As a
starting point for our analysis, we produced a CSV with relevant metadata for
top packages downloaded in 2023 using a simple SQL query.
Since the above-mentioned biomedical database contains 2,177 projects, we conducted a scan of the
first 2,000 PyPI packages to create a dataset of comparable size.
Using FawltyDeps to analyze the source code data
Now that we have the source URLs of our projects of interest, we downloaded all sources and ran an analysis script that wraps around FawltyDeps on the packages. For safety, all of this happened in a virtual machine.
Post-processing and filtering of FawltyDeps analysis results
While the data we collected from PyPI was quite clean (modulo broken or inaccessible
project URLs), the biomedical dataset contained some projects written in R and some
projects written in Python 2.X, which are outside of our scope.
To further filter for relevant projects that are written in Python 3.X, we applied the following rules:
there should be .py or .ipynb files in the source code directory of the data.
If there are only .ipynb files and no imports, then it is most likely an R project and is not taken into account.
we are also only interested in Python projects that have 3rd-party imports,
as these are the projects we would expect to declare their dependencies.
After these filtering steps, we have 1,260 biomedical projects and 1,118 PyPI packages
to be analyzed.
Results
Now that we had crunched thousands of Python packages, we were curious to see what secrets the data produced by FawltyDeps would reveal!
Dependency declaration patterns
First, we investigated which dependency declaration file choices were made in both samples.
The following pie charts show the proportion of projects with and without dependency
declaration files, and whether these files actually contain dependency declarations.
Figure 1. Percent of projects with dependency declaration files and actual dependency(ies) declared.
We find that about 60% of biomedical projects have dependency declaration files, while for PyPI packages, that number is almost 100%.
That is expected, as the top PyPI projects are written to be reproducible: they are downloaded by
a large group of people, and if they did not work due to missing dependency declarations, users would notice immediately.
Interestingly, we found that some biomedical projects (6.8%) and PyPI packages
(16.0%) have dependency declaration files with no dependencies listed inside
them. This might be because they genuinely have no third-party dependencies,
but more commonly it is a symptom of either:
setup.py files with complex dependency calculations: although FawltyDeps supports
parsing simple setup.py files with a single setup() call and no computation
involved for setting the install_requires and extras_require arguments,
it is currently not able to analyze more complex scenarios.
pyproject.toml might be used to configure tools with sections like
[tool.black] or [tool.isort], and declaring dependencies (and other project metadata)
in the same file is not strictly required.
For the remainder of the analysis, we do not take these cases into account.
We then examined how different package types utilize various dependency declaration
methods. The following chart shows the distribution of requirements.txt,
pyproject.toml, and setup files across biomedical projects and PyPI
packages (note that these three categories are not exclusive):
Figure 2. Percent of projects with dependencies declared in `requirements.txt`, `pyproject.toml` and setup files.
For biomedical projects, requirements.txt and setup.py/setup.cfg files are a majority of declaration files. In contrast, PyPI projects show a higher occurrence of pyproject.toml compared to biomedical projects.
pyproject.toml is the suggested modern way of declaring dependencies. This result should not come as a surprise: top PyPI projects are actively maintained
and are more likely to follow best practices. A requirements.txt file, on the other hand, is easier to add,
and if you do not need to package your project, it is a simpler option.
Now let’s have a more detailed view in which the categories are mutually exclusive:
Figure 3. Distribution of mutually exclusive dependency file choices.
For biomedical data there are a lot of projects that have either requirements.txt or setup.py/setup.cfg
files (or a combination of both) present. The traditional method of using setup files utilizing
setuptools to create Python packages has been around for a while and is still heavily relied
upon in the scientific community.
On the PyPI side, no single method for declaring dependencies stood out, as
different approaches were used with similar frequency across all projects.
However, when it comes to using pyproject.toml,
PyPI packages were about five times more likely to adopt this method compared to biomedical projects, suggesting that PyPI package authors tend to favor pyproject.toml significantly more often for dependency management.
Also, almost no biomedical projects (only 2 out of 1,260) and very few PyPI packages (only 25 out of 1,118) used
pyproject.toml and setup files together: it seems that projects rarely mix the older method (setup files) with the more modern one (pyproject.toml).
A different way of visualizing the subset of results pertaining to requirements.txt, pyproject.toml, and setup.py/setup.cfg files is a Venn diagram:
Figure 4. Venn diagram of projects with dependencies declared with categories including combination of dependency files.
While these diagrams don’t contain new insights, they show clearly how much more common pyproject.toml usage is for PyPI packages.
Source code directories
We next examined where projects store their source code, which we refer to as the
“source code directory”. In the following analysis, we defined this directory as the directory that contains the highest number of Python code files and does not have a name like “test”, “example”, “sample”, “doc”, or “tutorial”.
Figure 5. Source code directories choices.
We can make some interesting observations: over
half (53%) of biomedical projects store their main source code in a directory with a name different
from the project itself, and source code is not commonly stored in directories named src or src-python (7%).
For PyPI projects, the numbers are lower, with 37% storing their main code in a directory that matches
the project name. However, naming the source code directory differently from the package name is still fairly common for PyPI projects,
appearing in 36% of cases. A somewhat surprising finding: the src layout, recommended by the Python packaging user guide, appears in only 14% of cases.
Another noteworthy observation is that 23% of biomedical projects store all their source code in the
root directory of the project. In contrast, only 12% of PyPI projects follow this pattern. This
difference makes sense, as scientists working on biomedical projects might be less concerned about
maintaining a strict code structure compared to developers on PyPI. Additionally, a lot of biomedical projects might be a loose collection of notebooks/scripts not intended to be packaged/importable, and thus will typically not need to add any subdirectories at all.
On the other hand, everything from the PyPI data set is an importable package. Even in the “flat” layout (according to discussion), related modules are collected in a subdirectory named after the package.
The top PyPI projects that keep their code in the root directory are often small Python modules or plugins, like “python-json-patch”, “appdirs”, and
“python-json-pointer”. These projects usually have all their source code in a single file, so
storing it in the root directory makes sense.
Key results
Many people have preconceptions about how a Python project should look, but the
reality can be quite different.
Our analysis reveals distinct differences between top PyPI projects and biomedical
projects:
PyPI projects tend to use modern tools like pyproject.toml more
frequently, reflecting better overall project structure and dependency management
practices.
In contrast, biomedical projects display a wide variety of practices;
some store code in the root directory and fail to declare dependencies altogether.
This discrepancy is partially explained by the selection criteria: popular PyPI
packages, by necessity, must be usable and thus correctly declare their
dependencies, while biomedical projects accompanying scientific papers do not face
such stringent requirements.
Conclusion
We found that biomedical projects are written with less attention to coding best practices, which compromises
their reproducibility. There are many projects without declared dependencies. The use of
pyproject.toml, which is the
current state-of-the-art way to declare dependencies, is less frequent in biomedical
packages.
In our opinion, though, it’s essential for any package to adhere to the same high standards of reproducibility as top PyPI packages.
This includes implementing robust dependency management practices and embracing modern packaging standards.
Enhancing these practices will not only improve reproducibility but also foster greater trust and adoption within the scientific community.
While our initial analysis revealed some interesting insights, we feel that
there might be some more interesting treasures to be found within this dataset; you can check for yourself in
our FawltyDeps-analysis repository! We invite you
to join the discussion on FawltyDeps and reproducibility in package management on our
Discord channel.
Finally, this experiment also served as a real-world stress test for FawltyDeps itself and identified several edge cases we had not yet accounted for, suggesting avenues of further development for FawltyDeps:
One of the main challenges was to parse unconventional install_requires and extras_require sections in
setup.py files.
This issue has been addressed by the FawltyDeps project, specifically through the improvements made in FawltyDeps PR #440.
Furthermore, it was also not trivial to handle projects with multiple packages declared in a single repository.
Addressing these issues will be a focus as we continue to refine and improve FawltyDeps.
Stay tuned as we will drill deeper into the data we’ve collected. So far, we’ve
reused part of FawltyDeps’ code for our analysis, but the next step will be to run
the full FawltyDeps tool on a large number of packages. Join us as we examine how
FawltyDeps performs under rigorous testing and what improvements can be made to
enhance its capabilities!
Today, 2024-09-18, at 1830 UTC (11:30 am PDT, 2:30 pm EDT, 7:30 pm BST, 20:30 CEST, …)
we are streaming the 32nd episode of the Haskell Unfolder live on YouTube.
In this episode, which is suitable for Haskell beginners, we don’t focus on a specific Haskell programming technique, but instead try to develop an implementation of a simple game from scratch: tic-tac-toe. After having implemented the rules, we will show how to actually solve the game and allow optimal play by producing a complete game tree and using a naive minimax algorithm for evaluating states.
About the Haskell Unfolder
The Haskell Unfolder is a YouTube series about all things Haskell hosted by
Edsko de Vries and Andres Löh, with episodes appearing approximately every two
weeks. All episodes are live-streamed, and we try to respond to audience
questions. All episodes are also available as recordings afterwards.
In the classic Star Trek episode Errand of Mercy, Spock computes the chance of success:
CAPTAIN JAMES T. KIRK : What would you say the odds are on our getting out of here?
MR. SPOCK : Difficult to be precise, Captain. I should say, approximately 7,824.7 to 1.
And yet they get out of there. Are Spock’s probability computations
unreliable? Think of it another way. The Galaxy is a large place. There
must be tens of thousands of Spocks, and Grocks, and Plocks out there
on various missions. But we won’t hear (or don’t want to hear) about
the failures. So they may all be perfectly good at probability theory, but
we’re only hearing about the lucky ones. This is an example of survivor
bias.
2 Simulation
We can model this. I’ve written a small battle simulator for a super-simple
made-up role-playing game...
And the rest of this article can be found on GitHub
(Be sure to download the actual PDF if you want to be able to follow links.)
We’ve all been there: wasting a couple of days on a silly bug.
Good news for you: formal methods have never been easier to leverage.
In this post, I will discuss the contributions I made during my internship to
Liquid Haskell (LH), a tool that makes proving that your
Haskell code is correct a piece of cake.
LH lets you write contracts for your functions inside your Haskell code. In other
words, you write pre-conditions (what must be true when you call it)
and post-conditions (what must always be true when you leave the
function). These are then fed
into an SMT solver that proves your code satisfies them! You may have to
write a few lemmas to guide LH, but it makes verification easier than
proving them completely in a proof assistant.
My contributions enhance the reflection mechanism, which allows LH to unfold
function definitions in logic formulas when verifying a program.
I have explored three approaches that are described in what follows.
The problem
Imagine that, in the course of your work, you wanted to define a
function that inserts into an association list.
{-@
smartInsert
:: k:String
-> v:Int
-> l:[(String, Int)]
-> {res : [(String, Int)] |
lookup k l = Just v || head res = (k , v)
}
@-}
smartInsert :: String -> Int -> [(String, Int)] -> [(String, Int)]
smartInsert k v l
  | lookup k l == Just v = l
  | otherwise = (k, v) : l
LH runs as a compiler plugin. While the bulk of the compiler ignores the
special comments {-@ ... @-}, LH processes the annotations therein.
The annotation that you see in the first snippet is the specification of smartInsert,
with the post-condition establishing that the result of the function must have the
pair (k, v) at the front, or the pair must be already present in the
original list.
Let us say that you also want to use that smartInsert function later in the logic or
proofs, so you want to reflect it to the logic. For that, you will introduce another
annotation:
{-@ reflect smartInsert @-}
This annotation is telling LH that the equations of the Haskell definition of
smartInsert can be used to unfold calls to smartInsert in logic formulas.
As a human, you may agree that the specification is valid for this
implementation, but you get this error from the machine:
error:
Illegal type specification for `Test.smartInsert`
[...]
Unbound symbol GHC.Internal.List.lookup --- perhaps you meant: GHC.Internal.Base.. ?
Do not despair! This tells you that lookup is not defined in the logic.
Despite lookup being a respectable function in Haskell, defined in GHC.List,
LH knows nothing about it. Not all
functions in Haskell can simply be used in the logic, at least not without
reflecting them first. Far from being discouraged, you decide to reflect it
like the others, but you realize that lookup isn’t defined in your own
module; it comes from the Prelude! This makes reflection impossible, as LH
points out:
error:
Cannot lift Haskell function `lookup` to logic
"lookup" is not in scope
If you think about it for a moment, LH needs the definition of the function
in order to reflect it. So it can only complain when it is asked to reflect a
function whose definition is not available because it was defined in
some library dependency.
This is a recurring problem, especially when working with
dependencies, and this is exactly what I have been working on during this
internship at Tweag, in three different ways, as described below.
Idea #1: Define our own reflection of the function
Your first thought might be: “if I cannot reflect lookup because it comes
from a foreign library, I
will just define my own version of it”. Even better would be
if you could still link your custom definition of lookup to the original symbol.
Creating this link was my first contribution.
Step one is to define the pretend function. For this to work out correctly
in the end, its definition must be equivalent to the original definition
of the imported function.
The definition of the pretend function might look like this:
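myLookup :: Eq a => a -> [(a, b)] -> Maybe b
myLookup _ [] = Nothing
myLookup key ((x, y) : xys)
  | key == x = Just y
  | otherwise = myLookup key xys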
So far, so good. Of course, we give it a different name from the actual
function, as they refer to different definitions, and we want to be able to
refer to both so that we can link them together later.
Now, we reflect this myLookup function, which LH has no problem doing,
since this reflect command is located in the same module as its definition.
{-@ reflect myLookup @-}
Then, the magic happens with this annotation that links the two lookups
together:
{-@ assume reflect lookup as myLookup @-}
Read it as “reflect lookup, assuming that its definition is the same as myLookup”.
This is enough to get the smartInsert function verified. Just for the record,
here is the working snippet:
{-@ reflect myLookup @-}
myLookup :: Eq a => a -> [(a, b)] -> Maybe b
myLookup _ [] = Nothing
myLookup key ((x, y) : xys)
  | key == x = Just y
  | otherwise = myLookup key xys

{-@ assume reflect lookup as myLookup @-}

{-@ reflect smartInsert @-}
{-@
smartInsert
  :: k:String
  -> v:Int
  -> l:[(String, Int)]
  -> {res : [(String, Int)] |
       lookup k l = Just v || head res = (k , v)
     }
@-}
smartInsert :: String -> Int -> [(String, Int)] -> [(String, Int)]
smartInsert k v l
  | lookup k l == Just v = l
  | otherwise = (k, v) : l
The question you may be asking at this point is: why does it work?
In order to verify the code, LH has to prove side-conditions (called subtyping
relations) between the actual output and the post-condition to be verified.
For the first equation of smartInsert, it needs to be proved that
lookup k l = Just v && res = l
=>
lookup k l = Just v || head res = (k , v)
For the second equation, it needs to be proved that
res = (k, v) : l
=>
lookup k l = Just v || head res = (k , v)
Because we started with such a simple example,
the reflection of lookup is actually unused here (even though LH conservatively insists on it).
But that’s just a coincidence; in fact, we can use a more direct
post-condition that does actually use the reflection:
{-@
smartInsert
:: k:String
-> v:Int
-> l:[(String, Int)]
-> {res : [(String, Int)] | lookup k res = Just v}
@-}
This time, the subtyping constraints require proving:
-- constraint for the first equation
lookup k l = Just v && res = l
=>
lookup k res = Just v

-- constraint for the second equation
res = (k, v) : l
=>
lookup k res = Just v
The first constraint can still be solved without going into the definition of
lookup. But the second constraint isn’t something that we can prove for any
definition of lookup. Thanks to reflection, we have the following unfoldings
at our disposal:
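lookup k res
  = myLookup k res              -- by the assume-reflect equation
  = myLookup k ((k, v) : l)     -- since res = (k, v) : l
  = Just v                      -- first guard of myLookup, as k == k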
Q.E.D. Furthermore, you notice that the equation connecting lookup and
myLookup was crucial. That is the gist of what we added to LH to make the
proof work.
In addition to the implementation, I contributed a specification of
assume-reflection that spells out the validation of the new annotation and
the resolution rules when the same function is assume-reflected at different
locations. It is worth noting that if there exist two assume-reflections in your imports that contradict
each other, then one of them must be false, so your axiom environment will not be sound.
Idea #2: opaque reflection
We noted already that we didn’t truly need
to know what lookup was about to prove the first, simpler specification,
namely:
{-@
smartInsert
:: k:String
-> v:Int
-> l:[(String, Int)]
-> {res : [(String, Int)] |
lookup k l = Just v || head res = (k, v)
}
@-}
The only issue we had was that lookup was not defined in the logic.
Similarly, it is possible that our own functions to be reflected use imported,
unreflected functions whose content is irrelevant. We want to reflect the
expressions of our functions, but do not care about the expression of some of
the functions that appear inside them. Here, we want to reflect smartInsert,
which contains lookup, but we don’t need to know exactly what lookup is
about to prove our lemmas. Either lookup comes from a dependency, or it has
a non-trivial implementation, or it uses primitives not implemented in Haskell.
We allowed this through what we call opaque reflection. Opaque reflection
introduces a symbol, without any equation, for all the symbols in your
reflections that aren’t defined yet in the logic.
For instance, when reflecting the definition of smartInsert,
LH looks for any free symbols in there that are not present in the logic.
Here, it will see that lookup is something new to the logic, and it will
introduce an uninterpreted function for it.
Uninterpreted functions are symbols used by the SMT solver, for which it only
knows it satisfies function congruence, i.e. that if two values are equal
v = w, then when the function is applied to them, the result is still the
same f v = f w.
As it turns out, we could also do that manually using the measure
annotation. These annotations let you introduce an uninterpreted function
in the logic yourself, and specify the refinement type of it.
For instance, we could define a measure like this:
{-@
measure GHC.Internal.List.lookup :: k:a -> xs:[(a, b)] -> Maybe b
GHC.Internal.List.lookup
:: k:a
-> xs:[(a, b)]
-> {VV : Maybe b | VV == GHC.Internal.List.lookup k xs}
@-}
The measure annotation creates an uninterpreted function with the same name as the
function in the Haskell code. The second line links both the uninterpreted and
Haskell functions by strengthening the post-condition of the Haskell function
with the uninterpreted function from the logic.
The new opaque reflection does all that for you automatically! It’s even more
powerful when you think about imports. If two modules are opaque-reflecting the
same function from some common import, the uninterpreted symbols are considered
the same because they refer to the same thing.
Whereas, if you were to use measure annotations in both
imports for the same external functions (say, lookup), and then to import those in
another module, LH would complain about it. Indeed, there can not be two measures
with identical names in scope. Since LH
doesn’t know what you’re using those measures for, or whether they actually
stand for the same uninterpreted function, it cannot resolve the ambiguity.
The full specification is here.
Idea #3: Using the unfoldings
At this point, someone might object that Haskell can inline even imported
functions when optimizing the code, so it must have access to the original
definitions. As such, there is no need for assume-reflection or opaque-reflection, if
we could just reflect the function definition wherever the optimizer finds
it.
It is indeed the case for some functions, and under some circumstances (note the
precautions I’m taking here), that some information about the implementation of
functions is passed in interface files.
What are interface files? These are the files that contain the information that
the other modules need to know. Part of this information is the unfoldings of
the exported functions, in a syntax that is slightly different from GHC’s
CoreExprs, but can easily be converted to it.
After some experimentation, I observed that the unfoldings of many functions are
available in interface files, unless prevented by the -fignore-interface-pragmas or -fomit-interface-pragmas flags
(note that -O0 implies those flags, but -O1 does not).
Since most packages are compiled with at least -O1, the unfoldings of many functions are
available without any further tuning. In particular, those functions that are small
enough to be included in the interface files are available.
Once implemented, it suffices to use the same reflect annotation as before,
but this time even for imported functions!
{-@ reflect flip @-}
LH will automatically detect if this function is defined in the current module
or in the dependencies, and in the latter case it will look for possible
unfoldings.
Unfortunately, these unfoldings turned out to have some drawbacks.
The presence of these unfoldings depends on some GHC flags, and heuristics
from GHC. As such, it’s possible for a new version of a library to suddenly
exclude an unfolding without the library author realizing it. This predicament
is akin to that of the HERMIT tool, and it is difficult to solve without
rebuilding the dependencies with custom configuration.
The unfoldings are based on the optimized version of the functions,
which is sometimes harder to reason about. Also, it is subject to change if
the GHC optimizations change, which means that any proof based on these
unfoldings could be broken by a change to those optimizations.
Many functions cannot be reflected as they are. If
they use local recursive definitions, or lambda abstractions, LH cannot
reflect them at the moment.
If the unfolding of a function depends on non-exported definitions, LH does
not offer a mechanism to request these definitions to be reflected.
Even if it did, this breaks encapsulation to some point,
and makes our code dependent on internal implementation details of imported
code, to the point where even a dot release could break the verification.
Reflections are still limited in their capabilities. At the time of writing,
reflected functions cannot contain lambda abstractions or local recursive bindings.
Recursive bindings are allowed, but local ones are not, since LH has no sense of
locality (yet). Because unfoldings tend to have a lot of these, we cannot reflect
them (yet).
For these reasons, further work and experimentation will be needed to make this
approach truly useful. Nevertheless, we have included the implementation in a PR
in the hope that it may be helpful in some cases, and that improving the capabilities
of reflections in general will make it more and more valuable.
Conclusion
Liquid Haskell’s reflection is handy and powerful, but if your
function used dependencies that were not yet reflected, you were stuck. We
presented three ways to proceed: assert an equivalence between the imported function
and a definition in the current module (ideally copy-pasted from the original source file),
introduce some uninterpreted function in the logic for dependencies, or try to find the
unfoldings of those dependencies in interface files.
All of these features have been implemented and pulled into Liquid Haskell. The
implementation fits well into LH’s machinery, reusing the existing
pipeline for uninterpreted symbols and reflections. We also added tests,
especially for module imports, and checked the implementation against the
numerous regression tests already in place. An enticing next step would be to
improve the capabilities of reflection, which would also allow diving deeper
into the reflection of unfoldings in interface files.
I hope this will improve the ease of proof-writing in LH, and that reading this
post will encourage you to write more specifications and proofs about your
code, seeing how much of a breeze it can be!
I would like to thank Tweag for this wonderful opportunity to work on
Liquid Haskell; it has been an enriching internship that has allowed me to grow in
Haskell experience and in contributing to large codebases. In particular, I’d
like to express my heartfelt thanks to my supervisor, Facundo Domínguez, for
his constant support, guidance, and invaluable assistance.
I got the following question on my post on how I handle secrets in my work notes:
Sounds like a nice approach for other secrets but how about :dbconnection for
Orgmode and sql-connection-alist?
I have to admit I'd never come across the variable sql-connection-alist
before. I've never really used sql-mode for more than editing SQL queries and
setting up code blocks for running them was one of the first things I used
yasnippet for.
I did a little reading and unfortunately it looks like sql-connection-alist
can only handle string values. However, there is a variable
sql-password-search-wallet-function, with the default value of
sql-auth-source-search-wallet, so using auth-source is already supported for
the password itself.
There seems to be a lack of good tutorials for setting up sql-mode in a secure
way – all articles I found place the password in clear-text in the config –
filling that gap would be a nice way to contribute to the Emacs community. I'm
sure it'd prompt me to re-evaluate incorporating sql-mode in my workflow.
This is just a “personal life update” kind of post, but I recently found out
a couple of cool things about my academic history that I thought were neat
enough to write down so that I don’t forget them.
Oppenheimer
When the Christopher Nolan
Biopic about the life of J. Robert
Oppenheimer was about to come out, it was billed as an “Avengers of
Physics”, where every major physicist working in the US in the early and middle 20th
century would be featured. I had the thought of tracing my “academic family tree” to
see if my PhD advisor’s advisor’s advisor’s advisor was involved in any of the
major physics projects depicted in the movie, to see if I could spot them
portrayed in the movie as a nice personal connection.
If you’re not familiar with the concept, the relationship between a PhD
candidate and their doctoral advisor is a very personal and individual one: they
personally direct and guide the candidate’s research and thesis. To an extent,
they are like an academic parent.
I was able to find my academic
family tree and, to my surprise, my academic lineage actually traces
directly back to a key figure in the movie!
Dr. Kafatos received his PhD under the advisory of Philip Morrison at the
Massachusetts Institute of Technology.
Dr. Morrison received his PhD in 1940 at University of California, Berkeley
under the advisory of none other than J. Robert
Oppenheimer himself!
So, I started this out on a quest to figure out if I was “academically
descended” from anyone in the movie, and I ended up finding out I was
Oppenheimer’s advisee’s advisee’s advisee’s advisee! I ended up being able to
watch the movie and identify my great-great-grand advisor no problem, and I
think even my great-grand advisor. A fun little unexpected surprise and a cool
personal connection to a movie that I enjoyed a lot.
Erdos
As an employee at Google, you can customize your directory page with
“badges”, which are little personalized accomplishments or achievements, usually
unrelated to any actual work you do. I noticed that some people had an “Erdos
Number N” badge (1, 2, 3, etc.). I had never given any thought into my own
personal Erdos number (it was probably really high, in my mind) but I thought
maybe I could look into it in order to get a shiny worthless badge.
In academia, Paul
Erdos is someone who wrote so many papers and
collaborated with so many people that it became a joking
“non-accomplishment” to say that you wrote a paper with him. Then after a while
it became a joking non-accomplishment to say that you wrote a paper with
someone who wrote a paper with him (because, who hasn’t?). And then it became an
even more joking non-accomplishment to say you had an Erdos Number of 3
(you wrote a paper with someone who wrote a paper with someone who wrote a paper
with Dr. Erdos).
Anyway, I just wanted to get that badge, so I tried to figure it out. It turns
out my most direct trace runs through:
Dr. Straus collaborated with many people, including Einstein, Graham,
Goldberg, and 20 papers with Erdos.
So I guess my Erdos number is 4? The median number for mathematicians today
seems to be 5, so mine is one step better than that. Not really a noteworthy
accomplishment, but still neat enough that I want a place to put the work
tracking this down the next time I am curious again.
Anyways I submitted the information above and they gave me that sweet Erdos 4
badge! It was nice to have for about a month before quitting the company.
That’s It
Thanks for reading and I hope you have a nice rest of your day!
Recently, as part of a larger project, I wanted to define decidable
equality for an indexed data type in Agda. I struggled quite a bit to
figure out the right way to encode it to make Agda happy, and wasn’t
able to find much help online, so I’m recording the results here.
The tl;dr is to use mutual recursion to define the indexed data
type along with a sigma type that hides the index, and to use the
sigma type in any recursive positions where we don’t care about the
index! Read on for more motivation and details (and wrong turns I
took along the way).
This post is literate Agda; you can download it here if you want to play along. I tested everything here with Agda version 2.6.4.3 and version 2.0 of the standard library.
Background
First, some imports and a module declaration. Note that the entire
development is parameterized by some abstract set B of base types,
which must have decidable equality.
We’ll work with a simple type system containing base types, function
types, and some distinguished type constructor □. So far, this is
just to give some context; it is not the final version of the code we
will end up with, so we stick it in a local module so it won’t end up
in the top-level namespace.
module Unindexed where

  data Ty : Set where
    base : B → Ty
    _⇒_  : Ty → Ty → Ty
    □_   : Ty → Ty
For example, if \(X\) and \(Y\) are base types, then we could write down a
type like \(\square ((\square \square X \to Y) \to \square Y)\):
  infixr 2 _⇒_
  infix 30 □_

  postulate BX BY : B

  X : Ty
  X = base BX

  Y : Ty
  Y = base BY

  example : Ty
  example = □ ((□ □ X ⇒ Y) ⇒ □ Y)
However, for reasons that would take us too far afield in this blog
post, I don’t want to allow immediately nested boxes, like \(\square \square X\). We can still have multiple boxes in a type, and even
boxes nested inside of other boxes, as long as there is at least one
arrow in between. In other words, I only want to rule out boxes
immediately applied to another type with an outermost box. So we
don’t want to allow the example type given above (since it contains
\(\square \square X\)), but, for example, \(\square ((\square X \to Y) \to \square Y)\) would be OK.
Encoding invariants
How can we encode this invariant so it holds by construction? One way
would be to have two mutually recursive data types, like so:
module Mutual where

  data Ty : Set
  data UTy : Set

  data Ty where
    □_ : UTy → Ty
    ∙_ : UTy → Ty

  data UTy where
    base : B → UTy
    _⇒_  : Ty → Ty → UTy
UTy consists of types which have no top-level box; the constructors
of Ty just inject UTy into Ty by adding either one or zero
boxes. This works, and defining decidable equality for Ty and UTy
is relatively straightforward (again by mutual recursion). However,
it seemed to me that having to deal with Ty and UTy everywhere
through the rest of the development was probably going to be super
annoying.
The other option would be to index Ty by values indicating whether a
type has zero or one top-level boxes; then we can use the indices to
enforce the appropriate rules. First, we define a data type Boxity
to act as the index for Ty, and show that it has decidable equality:
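A sketch of what that looks like (the constructor names [0] and [1] match their use below; DecidableEquality comes from the standard library):

data Boxity : Set where
  [0] : Boxity
  [1] : Boxity

Boxity-≟ : DecidableEquality Boxity
Boxity-≟ [0] [0] = yes refl
Boxity-≟ [0] [1] = no λ ()
Boxity-≟ [1] [0] = no λ ()
Boxity-≟ [1] [1] = yes refl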
My first attempt to write down a version of Ty indexed by Boxity
looked like this:
module IndexedTry1 where
  data Ty : Boxity → Set where
    base : B → Ty [0]
    _⇒_ : {b₁ b₂ : Boxity} → Ty b₁ → Ty b₂ → Ty [0]
    □_ : Ty [0] → Ty [1]
base always introduces a type with no top-level box; the □
constructor requires a type with no top-level box, and produces a type
with one (this is what ensures we cannot nest boxes); and the arrow
constructor does not care how many boxes its arguments have, but
constructs a type with no top-level box.
This is logically correct, but I found it very difficult to work with.
The sticking point for me was injectivity of the arrow constructor.
When defining decidable equality we need to prove lemmas that each of
the constructors are injective, but I was not even able to write down
the type of injectivity for _⇒_. We would want something like this:
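Something like this sketch, with the four boxity indices quantified separately:

⇒-inj : {b₁ b₂ b₃ b₄ : Boxity} {σ₁ : Ty b₁} {σ₂ : Ty b₂} {τ₁ : Ty b₃} {τ₂ : Ty b₄}
  → (σ₁ ⇒ σ₂) ≡ (τ₁ ⇒ τ₂) → (σ₁ ≡ τ₁) × (σ₂ ≡ τ₂)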
but this does not even typecheck! The problem is that, for example,
σ₁ and τ₁ have different types, so the equality proposition σ₁ ≡ τ₁ is not well-typed.
At this point I tried turning to heterogeneous
equality,
but it didn’t seem to help. I won’t record here all the things I
tried, but the same issues seemed to persist, just pushed around to
different places (for example, I was not able to pattern-match on
witnesses of heterogeneous equality because of types that didn’t
match).
Sigma types to the rescue
At ICFP last week I asked Jesper Cockx
for advice (which felt a bit like asking Rory McIlroy to give some
tips on your mini-golf game), and he suggested trying to prove
decidable equality for the sigma type pairing an index with a type
having that index, like this:
ΣTy : Set
ΣTy = Σ Boxity Ty
This turned out to be the key idea, but it still took me a long time
to figure out the right way to make it work. Given the above
definitions, if we go ahead and try to define decidable equality for
ΣTy, injectivity of the arrow constructor is still a problem.
After days of banging my head against this off and on, I finally
realized that the way to solve this is to define Ty and ΣTy by
mutual recursion: the arrow constructor should just take two ΣTy
arguments! This perfectly captures the idea that we don’t care
about the indices of the arrow constructor’s argument types, so we
hide them by bundling them up in a sigma type.
ΣTy : Set

data Ty : Boxity → Set

ΣTy = Σ Boxity Ty

data Ty where
  □_ : Ty [0] → Ty [1]
  base : B → Ty [0]
  _⇒_ : ΣTy → ΣTy → Ty [0]

infixr 2 _⇒_
infix 30 □_
Now we’re cooking! We now make quick work of the required injectivity
lemmas, which all go through trivially by matching on refl:
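They look something like this (a sketch consistent with their use in Ty-≟ below):

□-inj : {σ τ : Ty [0]} → (□ σ) ≡ (□ τ) → σ ≡ τ
□-inj refl = refl

base-inj : {x y : B} → base x ≡ base y → x ≡ y
base-inj refl = refl

⇒-inj : {σ₁ σ₂ τ₁ τ₂ : ΣTy} → (σ₁ ⇒ σ₂) ≡ (τ₁ ⇒ τ₂) → (σ₁ ≡ τ₁) × (σ₂ ≡ τ₂)
⇒-inj refl = refl , refl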
Notice how the type of ⇒-inj is now perfectly fine: we just have a
bunch of ΣTy values that hide their indices, so we can talk about
propositional equality between them with no trouble.
Finally, we can define decidable equality for Ty and ΣTy by mutual
recursion.
ΣTy-≟ : DecidableEquality ΣTy

{-# TERMINATING #-}
Ty-≟ : ∀ {b} → DecidableEquality (Ty b)
Sadly, I had to reassure Agda that the definition of Ty-≟ is terminating—more on this later.
To define ΣTy-≟ we can just use a lemma from
Data.Product.Properties which derives decidable equality for a sigma
type from decidable equality for both components.
ΣTy-≟ = ≡-dec Boxity-≟ Ty-≟
The only thing left is to define decidable equality for any two values
of type Ty b (given a specific boxity b), making use of our
injectivity lemmas; now that we have the right definitions, this falls
out straightforwardly.
Ty-≟ (□ σ) (□ τ) with Ty-≟ σ τ
... | no σ≢τ = no (σ≢τ ∘ □-inj)
... | yes refl = yes refl
Ty-≟ (base x) (base y) with ≟B x y
... | no x≢y = no (x≢y ∘ base-inj)
... | yes refl = yes refl
Ty-≟ (σ₁ ⇒ σ₂) (τ₁ ⇒ τ₂) with ΣTy-≟ σ₁ τ₁ | ΣTy-≟ σ₂ τ₂
... | no σ₁≢τ₁ | _ = no (σ₁≢τ₁ ∘ proj₁ ∘ ⇒-inj)
... | yes _ | no σ₂≢τ₂ = no (σ₂≢τ₂ ∘ proj₂ ∘ ⇒-inj)
... | yes refl | yes refl = yes refl
Ty-≟ (base _) (_ ⇒ _) = no λ ()
Ty-≟ (_ ⇒ _) (base _) = no λ ()
Final thoughts
First, the one remaining infelicity is that Agda could not tell that
Ty-≟ is terminating. I am not entirely sure why, but I think it may
be that the way the recursion works is just too convoluted for it to
analyze properly: Ty-≟ calls ΣTy-≟ on structural subterms of its
inputs, but then ΣTy-≟ works by providing Ty-≟ as a higher-order
parameter to ≡-dec. If you look at the definition of ≡-dec, all
it does is call its function parameters on structural subterms of its
input, so everything should be nicely terminating, but I guess I am
not surprised that Agda is not able to figure this out. If anyone has
suggestions on how to make this pass the termination checker without
using a TERMINATING pragma, I would love to hear it!
As a final aside, I note that converting back and forth between Ty
(with ΣTy arguments to the arrow constructor) and IndexedTry1.Ty
(with expanded-out Boxity and Ty arguments to arrow) is trivial.
I expect it is also trivial to prove this is an isomorphism, though
I’m not particularly motivated to do it. The point is that, as anyone
who has spent any time proving things with proof assistants knows, two
types can be completely isomorphic, and yet one can be vastly easier
to work with than the other in certain contexts. Often when I’m
trying to prove something in Agda it feels like at least half the
battle is just coming up with the right representation that makes the
proofs go through easily.
A more stable version of this article can be found on github.
The Problem
Since the early days of role-playing games there has been debate over which rolls the GM should make and which are the responsibility of the players. But I think that for “perception” checks it doesn’t really make sense for a player to roll. If, as a player, you roll to hear behind a door and succeed, but you’re told there is no sound, then you know there is nothing to be heard, when you really ought to be left in suspense.
If you play a solo RPG the situation is more challenging. If there is a probability p of a room being occupied, and probability q of you hearing the occupant if you listen at the door, how can you simulate listening without making a decision about whether the room is occupied before opening the door? I propose a little mathematical trick.
Helena Listening, by Arthur Rackham
Simulating conditional probabilities
Suppose P(M) = p and P(H|M) = q (and P(H|not M) = 0). Then P(H) = pq. So to simulate the probability of hearing something at a new door: roll to see if a monster is present, and then roll to hear it. If both come up positive then you hear a noise.
But...but...you object, if the first roll came up positive you know there is a monster, removing the suspense if the second roll fails. Well, this process does produce the correct (marginal) probability of hearing a noise at a fresh door. So you reinterpret the first roll not as determining whether a monster is present, but as just the first step in a two-step process to determine whether a sound is heard.
But what if no sound is heard and we decide to open the door? We need to reduce the probability that we find a monster behind the door. In fact we need to sample P(M|not H). We could use Bayes’ theorem to compute this but chances are you won’t have any selection of dice that will give the correct probability. And anyway, you don’t want to be doing mathematics in the middle of a game, do you?
There’s a straightforward trick. In the event that you heard no noise at the door and want to now open the door: roll (again) to see if there is a monster behind the door, and then roll to listen again. If the outcome of the two rolls matches the information that you know, i.e. it predicts you hear nothing, then you can now accept the first roll as determining whether the monster is present. In that case the roll is, by construction, distributed according to P(M|not H). If the two rolls disagree with what you know, i.e. they predict you hear something, then repeat the roll of two dice. Keep repeating until it agrees with what you know.
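The whole trick fits in a few lines of code. Here is a sketch in Haskell (the function names are mine, and it assumes the random package):

import System.Random (randomRIO)

-- roll a "die" that succeeds with probability p
bernoulli :: Double -> IO Bool
bernoulli p = (< p) <$> randomRIO (0, 1)

-- the two-step roll at a fresh door: (monster present?, noise heard?)
rollDoor :: Double -> Double -> IO (Bool, Bool)
rollDoor p q = do
  monster <- bernoulli p        -- P(M) = p
  hears   <- bernoulli q        -- P(H|M) = q
  pure (monster, monster && hears)

-- sample P(M | not H): rerun the whole two-step roll until it
-- reproduces the silence we actually observed, then keep that roll
monsterGivenSilence :: Double -> Double -> IO Bool
monsterGivenSilence p q = do
  (monster, heard) <- rollDoor p q
  if heard then monsterGivenSilence p q else pure monster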
In general
There is a general method here though it’s only practical for simple situations. If you need to generate some hidden variables as part of a larger procedure, just generate them as usual, keep the variables you observe, and discard the hidden part. If you ever need to generate those hidden variables again, and remain consistent with previous rolls, resimulate from the beginning, restarting the rolls if they ever disagree with your previous observations.
In principle you could even do something like simulate an entire fight against a creature whose hit points remain unknown to you. But you’ll spend a lot of time rerolling the entire fight from the beginning. So it’s better suited to situations that only have a small number of steps, like listening at a door.
Our Nickel language is a configuration language. It’s also a
functional programming language. Functional programming isn’t a well-defined
term: it can encompass anything from being vaguely able to pass functions as
arguments and to call them (in that respect, C and JavaScript are functional) to
being a statically typed, pure and immutable language based on the
lambda-calculus, like Haskell.
However, if you ask a random developer, I can guarantee that one aspect will be
mentioned every time: algebraic data types (ADTs) and pattern matching. They are
the bread and butter of typed functional languages. ADTs are relatively easy to
implement (for language maintainers) and easy to use. They’re part of the 20% of
the complexity that makes for 80% of the joy of functional programming.
But Nickel didn’t have ADTs until recently. In this post, I’ll tell the story of
Nickel and ADTs, starting from why they were initially lacking, the exploration
of different possible solutions and the final design leading to the eventual
retro-fitting of proper ADTs in Nickel. This post is intended for Nickel users,
for people interested in configuration management, but also for anyone interested
in programming language design and functional programming. It doesn’t require
prior Nickel knowledge.
A quick primer on Nickel
Nickel is a gradually typed, functional, configuration language. From this
point, we’ll talk about Nickel before the introduction of ADTs in the 1.5
release, unless stated otherwise. The core language features:
let-bindings: let extension = ".ncl" in "file.%{extension}"
first-class functions: let add = fun x y => x + y in add 1 2
records (JSON objects): {name = "Alice", age = 42}
static typing: let mult : Number -> Number -> Number = fun x y => x * y. By
default, expressions are dynamically typed; a static type annotation causes a
definition or an inline expression to be typechecked statically.
contracts look and act almost like types but are evaluated at runtime:
{ port | Port = 80 }. They are used to validate configurations against
potentially complex schemas.
The lifecycle of a Nickel configuration is to be 1) written, 2) evaluated and 3) serialized, typically to JSON, YAML or TOML. An important guideline that we set
first was that every native data structure (record, array, enum, etc.) should
be trivially and straightforwardly serializable to JSON. In consequence, Nickel
started with the JSON data model: records (objects), arrays, booleans, numbers
and strings.
There’s one last primitive value: enums. As in C or in JavaScript, an enum in
Nickel is just a tag. An enum value is an identifier with a leading ', such as
in {protocol = 'http, server = "tweag.io"}. An enum is serialized as a string:
the previous expression is exported to JSON as {"protocol": "http", "server": "tweag.io"}.
So why not just use strings? Because enums can better represent a finite set
of alternatives. For example, the enum type [| 'http, 'ftp, 'sftp |] is the
type of values that are either 'http, 'ftp or 'sftp. Writing protocol : [| 'http, 'ftp, 'sftp |] will statically (at typechecking time) ensure that
protocol doesn’t take forbidden values such as 'https. Even without static
typing, using an enum conveys to the reader that a field isn’t a free-form
string.
Nickel has a match which corresponds to C or JavaScript’s switch:
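For example, something like this (a sketch; the function name and port numbers are made up):

let default_port = fun protocol =>
  protocol |> match {
    'http => 80,
    'ftp => 21,
    _ => 8181
  }
in default_port 'ftp  # evaluates to 21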
As you might notice, there are no ADTs in sight yet.
ADTs in a configuration language
While Nickel is a functional language, it’s first and foremost a configuration
language, which comes with specific design constraints.
Because we’re telling the story of ADTs before they landed in Nickel, we can’t really
use a proper Nickel syntax yet to provide examples. In what follows, we’ll use a
Rust-like syntax to illustrate the examples: enum Foo<T> { Bar(i32), Baz(bool, T) } is an ADT parametrized by a generic type T with two constructors Bar
and Baz, where the first one takes an integer as an argument and the other
takes a pair of a boolean and a T. Concrete values are written as Bar(42) or
Baz(true, "hello").
An unexpected obstacle: serialization
As said earlier, we want values to be straightforwardly serializable to the JSON
data model.
Now, take a simple ADT such as enum Foo<T,U> = { SomePair(T,U), Nothing }. You
can find reasonable serializations for SomePair(1,2), such as {"tag": "SomePair", "a": 1, "b": 2}. But why not {"flag": "SomePair", "0": 1, "1": 2} or {"mark": "SomePair", "data": [1, 2]}? While those representations are isomorphic, it’s hard to know
the right choice for the right use-case beforehand, as it depends on the
consumer of the resulting JSON. We really don’t want to make an arbitrary choice
on behalf of the user.
Additionally, while ADTs are natural for a classical typed functional language,
they might not entirely fit the configuration space. A datatype like enum Literal { String(String), Number(Number) } that can store either a string or a
number is usually represented directly as an untagged union in a
configuration, that is, {"literal": 5} or {"literal": "hello"}, instead of the
less natural tagged union (another name for ADTs) {"literal": {"tag": "Number", "value": 5}}.
This led us to look at (untagged) union types instead. Untagged unions have the
advantage of not making any choice about the serialization: they aren’t a new
data structure, as are ADTs, but rather new types (and contracts) to classify
values that are already representable.
The road of union types
A union type is a type that accepts different alternatives. We’ll use the
fictitious \/ type combinator to write a union in Nickel (| is commonly used
elsewhere but it’s already taken in Nickel). Our previous example of a literal
that can be either a string or a number would be {literal: Number \/ String}.
Those types are broadly useful independently of ADTs. For example, JSON Schema
features unions through the core combinator anyOf.
Our hope was to kill two birds with one stone by adding unions both as a way to
better represent existing configuration schemas and as a way to emulate
ADTs. Using unions lets users represent ADTs directly as plain records using
their preferred serialization scheme. Together with flow-sensitive
typing, we can get as expressive as ADTs while letting
the user decide on the encoding. Here is an example in a hypothetical Nickel
enhanced with unions and flow-sensitive typing.
In Nickel, any type must have a contract counterpart. Alas, union and
intersection contracts are hard (in fact, union
types alone are also not a trivial feat to implement!). In the linked blog post,
we hint at possible pragmatic solutions for union contracts that we finally got
to implement for Nickel 1.8. While sufficient for
practical union contracts, this is far from the general union types that could
subsume ADTs. This puts a serious stop to the idea of using union types to
represent ADTs.
What are ADTs really good for?
As we have been writing more and more Nickel, we realized that we had been missing ADTs a
lot for library functions - typically the types enum Option<T> { Some(T), None } and enum Result<T,E> { Ok(T), Error(E) } - where we don’t care about
serialization. Those ADTs are “internal” markers that wouldn’t leak out to the
final exported configuration.
Here are a few motivating use-cases.
std.string.find
std.string.find is a function that searches for a substring in a string. Its
current type is:
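It is presumably along these lines (a sketch based on the record described next):

std.string.find : String -> String -> { matched : String, index : Number, groups : Array String }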
If the substring isn’t found, {matched = "", index = -1, groups = []} is
returned, which is error-prone if the consumer doesn’t defend against such
values. We would like to return a proper ADT instead, such as Found {matched : String, index : Number, groups : Array String} or NotFound, which
would make for a better and a safer interface1.
Contract definition
Contracts are a powerful validation system in Nickel. The ability to plug in
your own custom contracts is crucial.
However, the general interface to define custom contracts can seem bizarre.
Custom contracts need to set error reporting data on a special label value and
use the exception-throwing-like std.contract.blame function. Here is a
simplified definition of std.number.Nat which checks that a value is natural
number:
fun label value =>
  if std.typeof value == 'Number then
    if value % 1 == 0 && value >= 0 then
      value
    else
      let label = std.contract.label.with_message "not a natural" label in
      std.contract.blame label
  else
    let label = std.contract.label.with_message "not a number" label in
    std.contract.blame label
There are good (and bad) reasons for this situation, but if we had ADTs, we
could cover most cases with an alternative interface where custom contracts
return a Result<T,E>, which is simpler and more natural:
fun value =>
  if std.typeof value == 'Number then
    if value % 1 == 0 && value >= 0 then
      Ok
    else
      Error("not a natural")
  else
    Error("not a number")
Of course, we could just encode this using a record, but it’s just not as nice.
Let it go, let it go!
The list of other examples of using ADTs to make libraries nicer is endless.
Thus, for the first time, we decided to introduce a native data
structure that isn’t serializable.
Note that this doesn’t break any existing code and is forward-compatible with
making ADTs serializable in the future, should we change our mind and settle on
one particular encoding. Besides, another feature is being
independently explored to make serialization more customizable through metadata,
which would let users use custom (de)serializers for ADTs easily.
Ok, let’s add the good old-fashioned ADTs to Nickel!
The design
Structural vs nominal
In fact, we won’t exactly add the old-fashioned version. ADTs are traditionally
implemented in their nominal form.
A nominal type system (such as C, Rust, Haskell, Java, etc.) decides if two
types are equal based on their name and definition. For example, values of enum Alias1 { Value(String) } and enum Alias2 { Value(String) } are entirely
interchangeable in practice, but Rust still doesn’t accept Alias1::Value(s)
where an Alias2 is expected, because those types have distinct definitions.
Similarly, you can’t swap a class for another in Java just because they have
exactly the same fields and methods.
A structural type system, on the other hand, only cares about the shape of data.
TypeScript has a structural type system. For example, the types interface Ball { diameter: number; } and interface Sphere { diameter: number; } are entirely
interchangeable, and {diameter: 42} is both a Ball and a Sphere. Some
languages, like OCaml2 or Go3,
mix both.
Nickel’s current type system is structural because it’s better equipped to
handle arbitrary JSON-like data. Because ADTs aren’t serializable, this
consideration doesn’t weigh as much for our motivating use-cases, meaning ADTs
could still be either nominal or structural.
However, nominal types aren’t really usable without some way of exporting and
importing type definitions, which Nickel currently lacks. It sounds more natural
to go for structural ADTs, which seamlessly extend the existing enums and would
overall fit better with the rest of the type system.
Structural ADTs look like the better choice for Nickel. We can build,
typecheck, and match on ADTs locally without having to know or to care about any
type declaration. Structural ADTs are a natural extension of Nickel (structural)
enums, syntactically, semantically, and on the type level, as we will see.
While less common, structural ADTs do exist in the wild and they are pretty
cool. OCaml has both nominal ADTs and structural ADTs, the latter being known as
polymorphic variants. They are an especially powerful way to represent a
non-trivial hierarchy of data types with overlapping cases, such as abstract syntax
trees or sets of error values.
Syntax
C-style enums are just a special case of ADTs, namely ADTs where constructors
don’t have any argument. The dual conclusion is that ADTs are enums with
arguments. We thus write the ADT Some("hello") as an enum with an argument in
Nickel: 'Some "hello".
We apply the same treatment to types. [| 'Some, 'None |] was a valid enum
type, and now [| 'Some String, 'None |] is also a valid type (which would
correspond to Rust’s Option<String>).
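For example (a sketch; the binding name opt is made up):

let opt : [| 'Some String, 'None |] = 'Some "hello" in
opt |> match {
  'Some s => s,
  'None => ""
}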
There is a subtlety here: what should be the type inferred for 'Some now? In
a structural type system, 'Some is just a free-standing symbol. The typechecker
can’t know if it’s a constant that will stay as it is - and thus has the type
[| 'Some |] - or a constructor that will be eventually applied, of type a -> [| 'Some a |]. This difficulty just doesn’t exist in a nominal type system like
Rust: there, Option::Some refers to a unique, known and fixed ADT constructor
that is known to require precisely one argument.
To make it work, 'Ok 42 isn’t actually a normal function application in
Nickel: it’s an ADT constructor application, and it’s parsed differently. We
just repurpose the function application syntax4
in this special case. 'Ok isn’t a function, and let x = 'Ok in x 42 is an
error (applying something that isn’t a function).
You can still recover Rust-style constructors that can be applied by defining a
function (eta-expanding, in the functional jargon): let ok = fun x => 'Ok x.
We restrict ADTs to a single argument. You can use a record to emulate multiple
arguments: 'Point {x = 1, y = 2}.
ADTs also come with pattern matching. The basic switch that match used to be is now a
powerful pattern matching construct, with support for ADTs but also arrays,
records, constants, wildcards, or-patterns and guards (if side-conditions).
Typechecking
Typechecking structural ADTs is a bit different from nominal ADTs. Take the
simple example (the enclosing : _ annotation is required to make the example
statically typed in Nickel)
process is inferred to have type [| 'Ok Number, 'Error |] -> Number. What
type should we infer for data = 'Ok 42? The most obvious one is [| 'Ok Number |]. But then [| 'Ok Number |] and [| 'Ok Number, 'Error |] don’t match and
process data doesn’t typecheck! This is silly, because this example should
be perfectly valid.
One possible solution is to introduce subtyping, which is able to express this
kind of inclusion relation: here, that [| 'Ok Number |] is included in [| 'Ok Number, 'Error |]. However, subtyping has some defects and is a whole can of
worms when mixed with polymorphism (which Nickel has).
Nickel rather relies on another approach called row polymorphism, which is the
ability to abstract over not just a type, as in classical polymorphism, but a
whole piece of an enum type. Row polymorphism is well studied in the literature,
and is for example implemented in PureScript. Nickel already features row
polymorphism for basic enum types and for records types.
Because there’s a catch-all case _ => -1, the type of process is
polymorphic, expressing that it can handle any other variant beside 'Ok Number
and 'Error (this isn’t entirely true: 'Ok String is forbidden for example, because it can’t be distinguished from 'Ok Number). Here, a can be substituted for a subsequence of an enum type,
such as 'Foo Bool, 'Bar {x : Number}.
Equipped with row polymorphism, we can infer the type forall a. [| 'Ok Number; a |]5 for 'Ok 42. When typechecking process data in the
original example, a will be instantiated to the single row 'Error and the
example typechecks. You can learn more about structural ADTs and row
polymorphism in the corresponding section of the Nickel user
manual.
Conclusion
While ADTs are part of the basic package of functional languages, Nickel didn’t
have them until relatively recently because of peculiarities of the design of a
configuration language. After exploring the route of union types, which came to
a dead-end, we settled on a structural version of ADTs that turns out to be a
natural extension of the language and didn’t require too much new syntax or
concepts.
ADTs already prove useful to write cleaner and more concise code, and to improve
the interface of libraries, even in a gradually typed configuration language.
Some concrete usages can be found in try_fold_left and validators already.
Unfortunately, we can’t change the type of
std.string.find without breaking existing programs (at least not until a
Nickel 2.0), but this use-case still applies to external libraries or future
stdlib functions.
In OCaml, Objects, polymorphic variants and modules
are structural while records and ADTs are nominal.
In Go, interfaces are structural while structs are
nominal.
Repurposing application is theoretically backward
incompatible because 'Ok 42 was already valid Nickel syntax before 1.5,
but it was meaningless (an enum applied to a constant) and would always error out
at runtime, so it’s ok.
In practice, we infer a simpler type [| 'Ok Number; ?a |]
where ?a is a unification variable which can still have limitations.
Interestingly, we decided early on to not perform automatic generalization,
as opposed to the ML tradition, for reasons similar to the ones exposed
here. Doing so, we get (predicative)
higher-rank polymorphism almost for free, while it’s otherwise quite tricky
to combine with automatic generalization. It turned out to pay off in the
case of structural ADTs, because it makes it possible to side-step those
usual enum types inclusion issues (widening) by having the user add more
polymorphic annotations. Or we could even actually infer the polymorphic
type [| forall a. 'Ok Number; a |] for literals.
One thing I always appreciate about Haskell is that you can often choose the
level of type-safety you want to work at. Haskell offers tools to be able to
work at both extremes, whereas most languages only offer some limited
part of the spectrum. Picking the right level often comes down to being
consciously aware of the benefits/drawbacks/unique advantages to each.
So, here is a rundown of seven “levels” of type safety that you can operate
at when working with the ubiquitous linked list data type, and how to use them!
I genuinely believe all of these are useful (or useless) in their own different
circumstances, even though the “extremes” at both ends are definitely pushing
the limits of the language.
This post is written for an intermediate Haskeller, who is already familiar
with ADTs and defining their own custom list type like
data List a = Nil | Cons a (List a). But, be advised that
most of the techniques discussed in this post (especially at both
extremes) are considered esoteric at best and harmful at worst for most actual
real-world applications. The point of this post is more to inspire the
imagination and demonstrate principles that could be useful to apply in actual
code, and not to present actual useful data structures.
All of the code here is available
online here, and if you check out the repo and run nix develop
you should be able to load them all in ghci as well:
$ cd code-samples/type-levels
$ nix develop
$ ghci
ghci> :load Level1.hs
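This level starts with a wrapper that hides its contents completely; a sketch of such an Any type (reconstructed around the MkAny constructor described next, using Data.Kind.Type):

data Any :: Type where
  MkAny :: a -> Any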
A value of any type can be given to MkAny, and the resulting
value will have type Any.
However, this type is truly a black hole; you can’t really do
anything with the values inside it because of parametric polymorphism: you must
treat any value inside it in a way that is compatible with a value of
any type. But there aren’t too many useful things you can do
with something in a way that is compatible with a value of any type (things
like, id :: a -> a, const 3 :: a -> Int). In the
end, it’s essentially isomorphic to unit ().
However, this isn’t really how dynamic types work. In other languages, we are
at least able to query and interrogate a type for things we can do with it using
runtime reflection. To get there, we can instead allow some sort of witness on
the type of the value. Here’s Sigma, where Sigma p is
a value a paired with some witness p a:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level1.hs#L24-L25
data Sigma :: (Type -> Type) -> Type where
  MkSigma :: p a -> a -> Sigma p
And the most classic witness is TypeRep
from base, which is a witness that lets you “match” on the type.
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level1.hs#L27-L32
showIfBool :: Sigma TypeRep -> String
showIfBool (MkSigma tr x) = case testEquality tr (typeRep @Bool) of
  Just Refl -> case x of
    -- in this branch, we know x is a Bool
    False -> "False"
    True  -> "True"
  Nothing -> "Not a Bool"
This uses type application syntax, @Bool, that lets us
pass in the type Bool to the function
typeRep :: Typeable a => TypeRep a.
Now we can use TypeRep’s interface to “match” (using
testEquality) on if the value inside is a Bool. If the
match works (and we get Just Refl) then we can treat x
as a Bool in that case. If it doesn’t (and we get
Nothing), then we do what we would want to do otherwise.
ghci> let x = MkSigma typeRep True
ghci> let y = MkSigma typeRep (4 :: Int)
ghci> showIfBool x
"True"
ghci> showIfBool y
"Not a Bool"
This pattern is common enough that there’s the Data.Dynamic
module in base that is Sigma TypeRep, and testEquality
is replaced with that module’s fromDynamic:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level1.hs#L40-L45
showIfBoolDynamic :: Dynamic -> String
showIfBoolDynamic dyn = case fromDynamic dyn of
  Just x -> case x of
    -- in this branch, we know x is a Bool
    False -> "False"
    True  -> "True"
  Nothing -> "Not a Bool"
To make our life easier in the future, let’s write a version of
fromDynamic for our Sigma TypeRep:
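A sketch of such a function (the name castSigma is my own):

castSigma :: TypeRep a -> Sigma TypeRep -> Maybe a
castSigma tr (MkSigma tr' x) = case testEquality tr' tr of
  Just Refl -> Just x
  Nothing   -> Nothing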
But the reason why I’m presenting the more generic Sigma instead
of the specific type Dynamic = Sigma TypeRep is that you can swap
out TypeRep to get other interesting types. For example, if you had
a witness of showability:
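A sketch, with the constructor name WitShowable matching its use below:

data Showable :: Type -> Type where
  WitShowable :: Show a => Showable a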
(This type is related to Dict Show from the constraints
library; it’s technically Compose Dict Show)
And now we have a type Sigma Showable that’s kind of
“not-so-black”: we can at least use show on it:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level1.hs#L64-L65
showSigma :: Sigma Showable -> String
showSigma (MkSigma WitShowable x) = show x  -- here, we know x is Show
ghci> let x = MkSigma WitShowable True
ghci> let y = MkSigma WitShowable 4
ghci> showSigma x
"True"
ghci> showSigma y
"4"
This is the “existential
typeclass antipattern”1, but since we are talking about different
ways we can push the type system, it’s probably worth mentioning. In particular,
Show is a silly typeclass to use in this context because a
Sigma Showable is equivalent to just a String: once
you match on the constructor to get the value, the only thing you can do with
the value is show it anyway.
One fun thing we can do is provide a “useless witness”, like
Proxy:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level1.hs#L67-L70
data Proxy a = Proxy

uselessBool :: Sigma Proxy
uselessBool = MkSigma Proxy True
So a value like MkSigma Proxy True :: Sigma Proxy is truly a
useless data type (basically our Any from before), since we know
that MkSigma contains some value of some type,
but there’s no witness to give us any clue on how we can use it. A
Sigma Proxy is isomorphic to ().
On the other extreme, we can use a witness to constrain the value to only be
a specific type, like IsBool:
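A sketch, matching the ItsABool constructor used below:

data IsBool :: Type -> Type where
  ItsABool :: IsBool Bool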
So you can have a value of type
MkSigma ItsABool True :: Sigma IsBool, or
MkSigma ItsABool False, but MkSigma ItsABool 2 will
not typecheck — remember, to make a Sigma, you need a
p a and an a. ItsABool :: IsBool Bool, so
the a you put in must be Bool to match.
Sigma IsBool is essentially isomorphic to Bool.
There’s a general version of this too, (:~:) a (from Data.Type.Equality
in base). (:~:) Bool is our IsBool earlier.
Sigma ((:~:) a) is essentially exactly a…basically
bringing us incidentally back to complete type safety? Weird. Anyway.
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level1.hs#L78-L79
justAnInt :: Sigma ((:~:) Int)
justAnInt = MkSigma Refl 10  -- Refl :: Int :~: Int
I think one interesting thing to see here is that being “type-unsafe” in
Haskell can be much less convenient than doing something similar in a
dynamically typed language like python. The python ecosystem is designed around
runtime reflection and inspection for properties and interfaces, whereas the
dominant implementation of interfaces in Haskell (typeclasses) doesn’t gel with
this. There’s no runtime typeclass instantiation: we can’t pattern match on a
TypeRep and check if it’s an instance of Ord or
not.
That’s why I don’t fancy those memes/jokes about how dynamically typed
languages are just “static types with a single type”. The actual way you use
those types (and the ecosystem built around them) lend themselves to different
ergonomics, and the reductionist take doesn’t quite capture that nuance.
The lowest level of safety in which a list might be useful is the dynamically
heterogeneous list. This is the level where lists (or “arrays”) live in most
dynamic languages.
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level2.hs#L12-L12
type HList p = [Sigma p]
We tag values with a witness p for the same reason as before: if
we don’t provide some type of witness, our type is useless.
The “dynamically heterogeneous list of values of any type” is
HList TypeRep. This is somewhat similar to how functions with
positional arguments work in a dynamic language like javascript. For example,
here’s a function that connects to a host (String), optionally
taking a port (Int) and a method (Method).
Of course, this would probably be better expressed in Haskell as a
function of type
Maybe String -> Maybe Int -> Maybe Method -> IO (). But
maybe this could be useful in a situation where you would want to offer the
ability to take arguments in any order? We could “find” the first value of a
given type:
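A sketch, reusing the castSigma helper from above (listToMaybe and mapMaybe are from Data.Maybe):

findValueOfType :: Typeable a => HList TypeRep -> Maybe a
findValueOfType = listToMaybe . mapMaybe (castSigma typeRep)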
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level2.hs#L39-L47
mkConnectionAnyOrder :: HList TypeRep -> IO ()
mkConnectionAnyOrder args = doTheThing host port method
  where
    host :: Maybe String
    host = findValueOfType args
    port :: Maybe Int
    port = findValueOfType args
    method :: Maybe Method
    method = findValueOfType args
But is this a good idea? Probably not.
Anyway, one very common usage of this type is for “extensible” systems that
let you store components of different types in a container, as long as they all
support some common interface (ie, the widgets system from the Luke
Palmer post).
For example, we could have a list of any item as long as the item is an
instance of Show: that’s HList Showable!
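For example, a hypothetical showAll helper (assuming LambdaCase):

showAll :: HList Showable -> [String]
showAll = map (\case MkSigma WitShowable x -> show x)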
Again, Show is a bad typeclass to use for this because we might
as well be storing [String]. But for fun, let’s imagine some other
things we could fill in for p. If we use HList Proxy,
then we basically don’t have any witness at all. We can’t use the values in the
list in any meaningful way; HList Proxy is essentially the same as
Natural, since the only information is the length.
If we use HList IsBool, we basically have [Bool],
since every item must be a Bool! In general,
HList ((:~:) a) is the same as [a].
A next level of type safety we can add is to ensure that all elements in the
list are of the same type. This adds a layer of usefulness because there are a
lot of things we might want to do with the elements of a list that are only
possible if they are all of the same type.
First of all, let’s clarify a subtle point here. It’s very easy in Haskell to
consume lists where all elements are of the same (but not necessarily
known) type. Functions like sum :: Num a => [a] -> a and
sort :: Ord a => [a] -> [a] do that. This is “polymorphism”,
where the function is written to not worry about the type, and the ultimate
caller of the function must pick the type they want to use with it. For
the sake of this discussion, we aren’t talking about consuming values —
we’re talking about producing and storing values where the
producer (and not the consumer) controls the type variable.
To do this, we can flip the witness to outside the list:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level3.hs#L17-L18
data SomeList :: (Type -> Type) -> Type where
  MkSomeList :: p a -> [a] -> SomeList p
We can write some meaningful predicates on this list — for example, we can
check if it is monotonic (the items increase in order)
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level3.hs#L21-L32
data Comparable :: Type -> Type where
  WitOrd :: Ord a => Comparable a

monotonic :: Ord a => [a] -> Bool
monotonic [] = True
monotonic (x : xs) = go x xs
  where
    go y [] = True
    go y (z : zs) = (y <= z) && go z zs

monotonicSomeList :: SomeList Comparable -> Bool
monotonicSomeList (MkSomeList WitOrd xs) = monotonic xs
This is fun, but, as mentioned before, monotonicSomeList doesn’t
have any advantage over monotonic, because the caller determines
the type. What would be more motivating here is a function that produces “any
sortable type”, and the caller has to use it in a way generic over all sortable
types. For example, a database API might let you query a database for a column
of values, but you don’t know ahead of time what the exact type of that
column is. You only know that it is “some sortable type”. In that case,
a SomeList could be useful.
For a contrived one, let’s think about pulling such a list from IO:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level3.hs#L34-L54
getItems :: IO (SomeList Comparable)
getItems = do
  putStrLn "would you like to provide int or bool or string?"
  ans <- getLine
  case map toLower ans of
    "int" -> MkSomeList WitOrd <$> replicateM 3 (readLn @Int)
    "bool" -> MkSomeList WitOrd <$> replicateM 3 (readLn @Bool)
    "string" -> MkSomeList WitOrd <$> replicateM 3 getLine
    _ -> throwIO $ userError "no"

getAndAnalyze :: IO ()
getAndAnalyze = do
  MkSomeList WitOrd xs <- getItems
  putStrLn $ "Got " ++ show (length xs) ++ " items."
  let isMono = monotonic xs
      isRevMono = monotonic (reverse xs)
  when isMono $
    putStrLn "The items are monotonic."
  when (isMono && isRevMono) $ do
    putStrLn "The items are monotonic both directions."
    putStrLn "This means the items are all identical."
Consider also an example where we process items differently based on what type
they have:
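Something like this sketch (the function and its per-type behavior are made up for illustration):

summarize :: SomeList TypeRep -> String
summarize (MkSomeList tr xs)
  | Just Refl <- testEquality tr (typeRep @Bool)   = "all true? " ++ show (and xs)
  | Just Refl <- testEquality tr (typeRep @Int)    = "sum: " ++ show (sum xs)
  | Just Refl <- testEquality tr (typeRep @Double) = "sum: " ++ show (sum xs)
  | Just Refl <- testEquality tr (typeRep @String) = "concatenated: " ++ concat xs
  | otherwise                                      = "some other type"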
(That’s pattern guard
syntax, if you were wondering)
In this specific situation, using a closed ADT of all the types you’d
actually want is probably preferred (like
data Value = VBool Bool | VInt Int | VDouble Double | VString String),
since we only ever get one of four different types. Using
Comparable like this gives you a completely open type that
can take any instance of Ord, and using
TypeRep gives you a completely open type that can take
literally anything.
This pattern is overall similar to how lists are often used in practice for
dynamic languages: often when we use lists in dynamically typed situations, we
expect them all to have items of the same type or interface. However, using
lists this way (in a language without type safety) makes it really tempting to
hop down into Level 2, where you start throwing “alternatively typed” things
into your list, as well, for convenience. And then the temptation comes to also
hop down to Level 1 and throw a null in every once in a while. All
of a sudden, any consumers must now check the type of every item, and a
lot of things are going to start needing unit tests.
Now, let’s talk a bit about ascending and descending between the levels. In
the general case we don’t have much to work with, but let’s assume our
constraint is TypeRep here, so we can match for type equality.
We can move from Level 3 to Level 2 by moving the TypeRep into
the values of the list, and we can move from Level 3 to Level 1 by converting
our TypeRep a into a TypeRep [a]:
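Sketches of these two conversions (the function names are mine; App is the TypeRep application constructor explained next):

someListToHList :: SomeList TypeRep -> HList TypeRep
someListToHList (MkSomeList tr xs) = map (MkSigma tr) xs

someListToSigma :: SomeList TypeRep -> Sigma TypeRep
someListToSigma (MkSomeList tr xs) = MkSigma (App (typeRep @[]) tr) xs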
App here as a constructor lets us combine TypeReps:
App :: TypeRep f -> TypeRep a -> TypeRep (f a).
Going the other way around is trickier. For HList, we don’t even
know if every item has the same type, so we can only successfully move up if
every item has the same type. So, first we get the typeRep for the
first value, and then cast the other values to be the same type if possible:
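A sketch, again reusing castSigma (an empty list is rejected, since there is no type to pick for it):

hlistToSomeList :: HList TypeRep -> Maybe (SomeList TypeRep)
hlistToSomeList []                  = Nothing
hlistToSomeList (MkSigma tr x : xs) =
  MkSomeList tr . (x :) <$> traverse (castSigma tr) xs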
To go from Sigma TypeRep, we first need to match the
TypeRep as some f a application using the
App pattern…then we can check if f is []
(list), then we can create a SomeList with the
TypeRep a. But, testEquality can only be
called on things of the same kind, so we have to verify that f has
kind Type -> Type first, so that we can even call
testEquality on f and []! Phew! Dynamic
types are hard!
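Concretely, the whole dance might look like this sketch (the name is mine; typeKind comes from Type.Reflection):

sigmaToSomeList :: Sigma TypeRep -> Maybe (SomeList TypeRep)
sigmaToSomeList (MkSigma tr xs) = do
  App f a <- Just tr                                           -- match tr as an application f a
  Refl <- testEquality (typeKind f) (typeRep @(Type -> Type))  -- f must have kind Type -> Type
  Refl <- testEquality f (typeRep @[])                         -- f must be the list constructor
  pure (MkSomeList a xs)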
Ahh, now right in the middle, we’ve reached Haskell’s ubiquitous list type!
It is essentially:
data List :: Type -> Type where
  Nil :: List a
  Cons :: a -> List a -> List a
I don’t have too much to say here, other than to acknowledge that this is
truly a “sweet spot” in terms of safety vs. unsafety and usability. This simple
List a / [a] type has so many benefits from
type-safety:
It lets us write functions that can meaningfully say that the input and
result types are the same, like
take :: Int -> [a] -> [a]
It lets us write functions that can meaningfully link lists and the items in
the list, like head :: [a] -> a and
replicate :: Int -> a -> [a].
It lets us write functions that can meaningfully state relationships between
input and results, like map :: (a -> b) -> [a] -> [b]
We can require two input lists to have the same type of items, like
(++) :: [a] -> [a] -> [a]
We can express complex relationships between inputs and outputs, like
zipWith :: (a -> b -> c) -> [a] -> [b] -> [c].
The property of being able to state and express relationships between the
values of input lists and output lists and the items in those lists is extremely
powerful, and also extremely ergonomic to use in Haskell. It can be argued that
Haskell, as a language, was tuned explicitly to be used with the least friction
at this exact level of type safety. Haskell is a “Level 4
language”.
From here on, we aren’t going to be “building up” linearly on safety, but
rather showing three structural type safety mechanisms of increasing strength and
complexity.
For Level 5, we’re not going to try to enforce anything on the contents of
the list, but we can try to enforce something on the spine of the
list: the number of items!
To me, this level still feels very natural in Haskell to write in, although
in terms of usability we are starting to bump into some of the things Haskell is
lacking for higher type safety ergonomics. I’ve talked about fixed-length
vector types in depth before, so this is going to be a high-level view
contrasting this level with the others.2
The essential concept is to introduce a phantom type, a type
parameter that doesn’t do anything other than indicate something that we can use
in user-space. Here we will create a type that structurally encodes the natural
numbers 0, 1, 2…:
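A minimal sketch of that type:

data Nat = Z | S Nat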
So, Z will represent zero, S Z will represent one,
S (S Z) will represent two, etc. We want to create a type
Vec n a, where n will be a type of kind
Nat (promoted using DataKinds, which lets us use Z and
S as type constructors), representing a linked list with
n elements of type a.
We can define Vec in a way that structurally matches how
Nat is constructed, which is the key to making things work
nicely:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level5.hs#L17-L21
data Vec :: Nat -> Type -> Type where
  VNil :: Vec Z a
  (:+) :: a -> Vec n a -> Vec (S n) a

infixr 5 :+
This is offered in the vec library. Here are
some example values:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level5.hs#L23-L33
zeroItems :: Vec Z Int
zeroItems = VNil

oneItem :: Vec (S Z) Int
oneItem = 1 :+ VNil

twoItems :: Vec (S (S Z)) Int
twoItems = 1 :+ 2 :+ VNil

threeItems :: Vec (S (S (S Z))) Int
threeItems = 1 :+ 2 :+ 3 :+ VNil
Note two things:
1 :+ 2 :+ VNil gets automatically type-inferred to be a
Vec (S (S Z)) a, because every application of :+
adds an S to the phantom type.
There is only one way to construct a Vec (S (S Z)) a:
by using :+ twice. That means that such a value is a list of
exactly two items.
However, the main benefit of this system is not so you can create a
two-item list…just use tuples or data V2 a = V2 a a from linear for that. No,
the main benefit is that you can now encode how arguments in your functions
relate to each other with respect to length.
For example, the type alone of
map :: (a -> b) -> [a] -> [b] does not tell you
that the length of the result list is the same as the length of the input list.
However, consider vmap :: (a -> b) -> Vec n a -> Vec n b.
Here we see that the output list must have the same number of items as the input
list, and it’s enforced right there in the type signature!
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level5.hs#L35-L38
vmap :: (a -> b) -> Vec n a -> Vec n b
vmap f = \case
  VNil -> VNil
  x :+ xs -> f x :+ vmap f xs
And how about
zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]? It’s
not clear or obvious at all how the final list’s length depends on the input
lists’ lengths. However, a vzipWith would ensure the input lengths
are the same size and that the output list is also the same length:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level5.hs#L40-L45
vzipWith :: (a -> b -> c) -> Vec n a -> Vec n b -> Vec n c
vzipWith f = \case
  VNil -> \case
    VNil -> VNil
  x :+ xs -> \case
    y :+ ys -> f x y :+ vzipWith f xs ys
Note that both of the inner pattern matches are known by GHC to be
exhaustive: if it knows that the first list is VNil, then it knows
that n ~ Z, so the second list has to also be
VNil. Thanks GHC!
From here on out, we’re now always going to assume that GHC’s exhaustiveness
checker is on, so we always handle every branch that GHC tells us is necessary,
and skip handling branches that GHC tells us is unnecessary (through compiler
warnings).
We can even express more complicated relationships with type families
(type-level “functions”):
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level5.hs#L47-L63
type family Plus (x :: Nat) (y :: Nat) where
  Plus Z y = y
  Plus (S z) y = S (Plus z y)

type family Times (x :: Nat) (y :: Nat) where
  Times Z y = Z
  Times (S z) y = Plus y (Times z y)

vconcat :: Vec n a -> Vec m a -> Vec (Plus n m) a
vconcat = \case
  VNil -> id
  x :+ xs -> \ys -> x :+ vconcat xs ys

vconcatMap :: (a -> Vec m b) -> Vec n a -> Vec (Times n m) b
vconcatMap f = \case
  VNil -> VNil
  x :+ xs -> f x `vconcat` vconcatMap f xs
Note that all of these only work in GHC because the structure of the
functions themselves match exactly the structure of the type families. If you
follow the pattern matches in the functions, note that they match exactly with
the different equations of the type family.
Famously, we can totally index into fixed-length lists, in a way that
indexing will not fail. To do that, we have to define a type Fin n,
which represents an index into a list of length n. So,
Fin (S (S (S Z))) will be either 0, 1, or 2, the three possible
indices of a three-item list.
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level5.hs#L65-L76
data Fin :: Nat -> Type where
  -- | if z is non-zero, FZ :: Fin z gives you the first item
  FZ :: Fin ('S n)
  -- | if i indexes into length z, then (i+1) indexes into length (z+1)
  FS :: Fin n -> Fin ('S n)

vindex :: Fin n -> Vec n a -> a
vindex = \case
  FZ -> \case
    x :+ _ -> x
  FS i -> \case
    _ :+ xs -> vindex i xs
Fin takes the place of Int in
index :: Int -> [a] -> a. You can use FZ in any
non-empty list, because FZ :: Fin (S n) will match any
Vec (S n) (which is necessarily of length greater than 0). You can
use FS FZ only on something that matches
Vec (S (S n)). This is the type-safety.
We can also specify non-trivial relationships between lengths of lists, like
making a more type-safe take :: Int -> [a] -> [a]. We want to
make sure that the result list has a length less than or equal to the input
list. We need another “int” that can only be constructed in the case that the
result length is less than or equal to the first length. These are called “proofs” or
“witnesses”, and they act in the same role as TypeRep,
(:~:), etc. did above for our Sigma examples.
We want a type LTE n m that is a “witness” that n
is less than or equal to m. It can only be constructed if
n is less than or equal to m. For example, you can
create a value of type LTE (S Z) (S (S Z)), but not of type
LTE (S (S Z)) Z.
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level5.hs#L78-L87
data LTE :: Nat -> Nat -> Type where
  -- | Z is less than or equal to any number
  LTEZ :: LTE Z m
  -- | if n <= m, then (n + 1) <= (m + 1)
  LTES :: LTE n m -> LTE ('S n) ('S m)

vtake :: LTE n m -> Vec m a -> Vec n a
vtake = \case
  LTEZ -> \_ -> VNil
  LTES l -> \case
    x :+ xs -> x :+ vtake l xs
Notice the similarity to how we would define
take :: Int -> [a] -> [a]. We just spiced up the
Int argument with type safety.
Another thing we would like to do is to be able to create lists of
arbitrary length. We can look at
replicate :: Int -> a -> [a], and create a new “spicy int”
SNat n, so
vreplicate :: SNat n -> a -> Vec n a
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level5.hs#L89-L96
data SNat :: Nat -> Type where
  SZ :: SNat Z
  SS :: SNat n -> SNat (S n)

vreplicate :: SNat n -> a -> Vec n a
vreplicate = \case
  SZ -> \_ -> VNil
  SS n -> \x -> x :+ vreplicate n x
Notice that this type has a lot more guarantees than replicate.
For replicate :: Int -> a -> [a], we can’t guarantee (as the
caller) that the returned list actually has the length we give it. But for
vreplicate :: SNat n -> a -> Vec n a, it does!
SNat n is actually kind of special. We call it a
singleton, and it’s useful because it perfectly reflects the structure
of the type n, as a value…nothing more and nothing less. By pattern
matching on SNat n, we can exactly determine what n
is. SZ means n is Z, SS SZ
means n is S Z, etc. This is useful because we can’t
directly pattern match on types at runtime in Haskell (because of type erasure),
but we can pattern match on singletons at runtime.
We actually encountered singletons before in this post!
TypeRep a is a singleton for the type a: by pattern
matching on it (like with App earlier), we can essentially “pattern
match” on the type a itself.
In practice, we often write typeclasses to automatically generate singletons,
similar to Typeable from before:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level5.hs#L98-L108
class KnownNat n where
  nat :: SNat n

instance KnownNat Z where
  nat = SZ

instance KnownNat n => KnownNat (S n) where
  nat = SS nat

vreplicate' :: KnownNat n => a -> Vec n a
vreplicate' = vreplicate nat
One last thing: moving back and forth between the different levels. We can’t
really write a [a] -> Vec n a, because in Haskell, the type
variables are determined by the caller. We want n to be
determined by the list, and the function itself. And now suddenly we run into
the same issue that we ran into before, when moving between levels 2 and 3.
We can do the same trick before and write an existential wrapper:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level5.hs#L110-L116
data SomeVec a = forall n. MkSomeVec (SNat n) (Vec n a)

toSomeVec :: [a] -> SomeVec a
toSomeVec = \case
  [] -> MkSomeVec SZ VNil
  x : xs -> case toSomeVec xs of
    MkSomeVec n ys -> MkSomeVec (SS n) (x :+ ys)
It is common practice (and a good habit) to always include a singleton (or a
singleton-like typeclass constraint) to the type you are “hiding” when you
create an existential type wrapper, even when it is not always necessary. That’s
why we included TypeRep in HList and
SomeList earlier.
SomeVec a is essentially isomorphic to [a], except
you can pattern match on it and get the length n as a type you can
use.
There’s a slightly more light-weight method of returning an existential type:
by returning it in a continuation.
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level5.hs#L118-L121
withVec :: [a] -> (forall n. SNat n -> Vec n a -> r) -> r
withVec = \case
  [] -> \f -> f SZ VNil
  x : xs -> \f -> withVec xs \n ys -> f (SS n) (x :+ ys)
That way, you can use the type variable within the continuation. Doing
withVec xs \n v -> .... is identical to
case toSomeVec xs of MkSomeVec n v -> ....
However, since you don’t get the n itself until runtime, you
might find yourself struggling to use concepts like Fin and
LTE. To use them comfortably, you have to write functions to
“check” if your LTE is even possible, known as “decision
functions”:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level5.hs#L123-L128
isLTE :: SNat n -> SNat m -> Maybe (LTE n m)
isLTE = \case
  SZ -> \_ -> Just LTEZ
  SS n -> \case
    SZ -> Nothing
    SS m -> LTES <$> isLTE n m
This was a very whirlwind introduction, and I definitely recommend reading this
post on fixed-length lists for a more in-depth guide and tour of the
features. In practice, fixed-length lists are not that useful because the
situations where you want lazily linked lists and the situations where you want
them to be statically sized have very little overlap. But you will often see fixed-length vectors
in real life code — mostly numerical code.
Overall as you can see, at this level we gain some powerful guarantees and
tools, but we also run into some small inconveniences (like manipulating
witnesses and singletons). This level is fairly comfortable to work with in
modern Haskell tooling. However, if you live here long enough, you’re going to
eventually be tempted to wander into…
For our next level let’s jump back into constraints on the
contents of the list. Let’s imagine a priority queue on top of
a list. Each value in the list will be a (priority, value) pair. To
make the pop operation (pop out the value of lowest priority)
efficient, we can enforce that the list is always sorted by priority:
the lowest priority is always first, the second lowest is second, etc.
If we didn’t care about type safety, we could do this by always inserting a
new item so that it is sorted:
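For instance, on plain lists of (priority, value) pairs, a sketch of insertSortedList might be:

insertSortedList :: (Int, a) -> [(Int, a)] -> [(Int, a)]
insertSortedList (p, x) = \case
  [] -> [(p, x)]
  (q, y) : zs
    | p <= q    -> (p, x) : (q, y) : zs
    | otherwise -> (q, y) : insertSortedList (p, x) zs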
This method enforces a local structure: between every item
x and the next item y in x:y:zs, the
priority of x has to be less than the priority of y.
Keeping our structure local means we only need to enforce local invariants.
Writing it all willy-nilly type-unsafe like this could be fine for a single
function, but we’re also going to need some more complicated functions. What if
we wanted to “combine” (merge) two sorted lists together. Using a normal list,
we don’t have any assurances that we have written it correctly, and it’s very
easy to mess up. How about we leverage type safety to ask GHC to ensure that our
functions are always correct, and always preserve this local structure? Now
you’re thinking in types!
Introducing level 6: enforcing local structure!
But, first, a quick note before we dive in: for the rest of this post, for
the sake of simplicity, let’s switch from inductively defined types (like
Nat above) to GHC’s built in opaque
Nat type. You can think of it as essentially the same as the
Nat we wrote above, but opaque and provided by the
compiler. Under the hood, it’s implemented using machine integers for
efficiency. And, instead of using concrete S (S (S Z)) syntax,
you’d use abstract numeric literals, like 3. There’s a trade-off:
because it’s opaque, we can’t pattern match on it and create or manipulate our
own witnesses — we are at the mercy of the API that GHC gives us. We get
+, <=, Min, etc., but in total it’s
not that extensive. That’s why I never use these without also bringing
typechecker plugins (ghc-typelits-natnormalise
and ghc-typelits-knownnat)
to help automatically bring witnesses and equalities and relationships into
scope for us. Everything here could be done using hand-defined witnesses and
types, but we’re using TypeNats here just for the sake of example.
With that disclaimer out of the way, let’s create our types! Let’s make an
Entry n a type that represents a value of type a with
priority n.
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level6.hs#L28-L28
newtype Entry (n :: Nat) a = Entry a
We’d construct this like Entry @3 "hello", which produces
Entry 3 String. Again this uses type application syntax,
@3, that lets us pass in the type 3 to the
constructor Entry :: forall n a. a -> Entry n a.
Now, let’s think about what phantom types we want to include in our list. The
fundamental strategy in this, as I learned from Conor McBride’s great writings on this
topic, is:
Think about what “type safe operations” you want to have for your
structure
Add just enough phantom types to perform those operations.
In our case, we want to be able to cons an Entry n a to the
start of a sorted list. To ensure this, we need to know that n is less than or
equal to the list’s current minimum priority. So, we need our list type
to be Sorted n a, where n is the current minimum
priority.
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level6.hs#L33-L35
data Sorted :: Nat -> Type -> Type where
  SSingle :: Entry n a -> Sorted n a
  SCons :: (KnownNat m, n <= m) => Entry n a -> Sorted m a -> Sorted n a
To keep things simple, we are only going to talk about non-empty lists, so
the minimum priority is always defined.
So, a Sorted n a is either
SSingle (x :: Entry n a), where the single item is a value of
priority n, or SCons x xs, where x has
priority n and xs :: Sorted m a, where
n <= m. In our previous inductive Nat, you could
imagine this as
SCons :: SNat m -> LTE n m -> Entry n a -> Sorted m a -> Sorted n a,
but here we will use GHC’s built-in <= typeclass-based witness
of less-than-or-equal-to-ness.
This creates a valid list where the priorities are all sorted from lowest to
highest. You can now pop using pattern matching, which gives you the lowest
element by construction. If you match on SCons x xs, you
know that no entry in xs has a priority lower than
x.
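For example, a minimal sketch of a pop function (the name popMin is assumed
here, not from the original source):

-- the head is the minimum by construction
popMin :: Sorted n a -> Entry n a
popMin = \case
  SSingle x -> x
  SCons x _ -> x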
Critically, note that creating something out-of-order like the following
would be a compiler error:
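-- a sketch of an ill-typed value: SCons here requires 10 <= 5,
-- so GHC rejects it
badSorted :: Sorted 10 Int
badSorted = SCons (Entry @10 1) (SSingle (Entry @5 2))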
Now, the users of our priority queue probably won’t often care about
having the minimum priority in the type. In this case, we are using the phantom
type to ensure that our data structure is correct by construction, for our own
sake, and also to help us write internal functions in a correct way. So, for
practical end-user usage, we want to existentially wrap the n.
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level6.hs#L103-L120
data SomeSorted a = forall n. KnownNat n => SomeSorted (Sorted n a)

popSomeSorted :: Sorted n a -> (Entry n a, Maybe (SomeSorted a))
popSomeSorted = \case
  SSingle x -> (x, Nothing)
  SCons x xs -> (x, Just (SomeSorted xs))
popSomeSorted takes a Sorted n a and returns the
Entry n a promised at the start of it, and then the rest of the
list if there is anything left, eliding the phantom parameter.
Now let’s get to the interesting parts where we actually leverage
n: let’s write insertSortedList, but the type-safe
way!
First of all, what should the type be if we insert an Entry n a
into a Sorted m a? If n <= m, it would be
Sorted n a. If n > m, it should be
Sorted m a. GHC gives us a type family Min n m, which
returns the minimum between n and m. So our type
should be:
insertSorted :: Entry n a -> Sorted m a -> Sorted (Min n m) a
To write this, we can use some helper functions: first, to decide if
we are in the n <= m or the n > m case:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level6.hs#L41-L51
data DecideInsert :: Nat -> Nat -> Type where
  DIZ :: (n <= m, Min n m ~ n) => DecideInsert n m
  DIS :: (m <= n, Min n m ~ m) => DecideInsert n m

decideInsert :: forall a b. (KnownNat a, KnownNat b) => DecideInsert a b
decideInsert = case cmpNat (Proxy @a) (Proxy @b) of
  LTI -> DIZ -- if a < b, DIZ
  EQI -> DIZ -- if a == b, DIZ
  GTI -> case cmpNat (Proxy @b) (Proxy @a) of
    LTI -> DIS -- if a > b, DIS, except GHC isn't smart enough to know this
    GTI -> error "absurd, we can't have both a > b and b > a"
We can use decideInsert to branch on if we are in the case where
we insert the entry at the head or the case where we have to insert it deeper.
DecideInsert here is our witness, and decideInsert
constructs it using cmpNat, provided by GHC to compare two
Nats. We use Proxy :: Proxy n to tell it what nats we
want to compare. KnownNat is the equivalent of our
KnownNat class we wrote from scratch, but with GHC’s TypeNats
instead of our custom inductive Nats.
cmpNat :: (KnownNat a, KnownNat b) => p a -> p b -> OrderingI a b

data OrderingI :: k -> k -> Type where
  LTI :: -- in this branch, a < b
  EQI :: -- in this branch, a ~ b
  GTI :: -- in this branch, a > b
Note that GHC and our typechecker plugins aren’t smart enough to know we can
rule out b > a if a > b is true, so we have to
leave an error that we know will never be called. Oh well. If we
were writing our witnesses by hand using inductive types, we could write this
ourselves, but since we are using GHC’s Nat, we are limited by what its API
can prove.
Let’s start writing our insertSorted:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level6.hs#L64-L76
insertSorted
    :: forall n m a. (KnownNat n, KnownNat m)
    => Entry n a
    -> Sorted m a
    -> Sorted (Min n m) a
insertSorted x = \case
  SSingle y -> case decideInsert @n @m of
    DIZ -> SCons x (SSingle y)
    DIS -> SCons y (SSingle x)
  SCons @q y ys -> case decideInsert @n @m of
    DIZ -> SCons x (SCons y ys)
    DIS -> sConsMin @n @q y (insertSorted x ys)
The structure is more or less the same as insertSortedList, but
now type safe! We basically use our handy helper function
decideInsert to dictate where we go. I also used a helper function
sConsMin to insert into the recursive case:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level6.hs#L53-L62
sConsMin
    :: forall q r n a. (KnownNat q, KnownNat r, n <= q, n <= r)
    => Entry n a
    -> Sorted (Min q r) a
    -> Sorted n a
sConsMin = case cmpNat (Proxy @q) (Proxy @r) of
  LTI -> SCons :: Entry n a -> Sorted q a -> Sorted n a
  EQI -> SCons :: Entry n a -> Sorted q a -> Sorted n a
  GTI -> SCons :: Entry n a -> Sorted r a -> Sorted n a
sConsMin isn’t strictly necessary, but it saves a lot of
unnecessary pattern matching. We need it because we
want to write SCons y (insertSorted x ys) in the last line
of insertSorted. However, in this case, SCons does not
have a well-defined type. It can either be
Entry n a -> Sorted q a -> Sorted n a or
Entry n a -> Sorted r a -> Sorted n a. Haskell requires
functions to be specialized at the place we actually use them, so this
is no good. We would have to pattern match on cmpNat and
LTI/EQI/GTI in order to know how to
specialize SCons. So, we use sConsMin to wrap this up
for clarity.
How did I know this? I basically tried writing it out the full messy way,
bringing in as many witnesses and as much pattern matching as I could, until I got it to
compile. Then I spent time factoring out the common parts until I got what we
have now!
Note that we use a feature called “Type Abstractions” to “match on” the
existential type variable q in the pattern
SCons @q y ys. Recall from the definition of SCons
that the first type variable is the minimum priority of the tail.
And just like that, we made our insertSortedList type-safe! We can no longer return an unsorted list: it always inserts
sortedly, by construction, enforced by GHC. We did cheat a little with
error, but that was only because we used GHC’s TypeNats… if we used our
own inductive types, all unsafety could be avoided.
Let’s write the function to merge two sorted lists together. This is
essentially the merge step of a merge sort: take two lists, look at the head of
each one, cons the smaller of the two heads, then recurse.
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level6.hs#L78-L92
mergeSorted
    :: forall n m a. (KnownNat n, KnownNat m)
    => Sorted n a
    -> Sorted m a
    -> Sorted (Min n m) a
mergeSorted = \case
  SSingle x -> insertSorted x
  SCons @q x xs -> \case
    SSingle y -> case decideInsert @n @m of
      DIZ -> sConsMin @q @m x (mergeSorted xs (SSingle y))
      DIS -> SCons y (SCons x xs)
    SCons @r y ys -> case decideInsert @n @m of
      DIZ -> sConsMin @q @m x (mergeSorted xs (SCons y ys))
      DIS -> sConsMin @n @r y (mergeSorted (SCons x xs) ys)
Again, this looks a lot like how you would write the normal function to merge
two sorted lists…except this time, it’s type-safe! You can’t return an
unsorted list because the result list has to be sorted by
construction.
To wrap it all up, let’s write our conversion functions. First, an
insertionSort function that takes a normal non-empty list of
priority-value pairs and throws them all into a Sorted, which (by
construction) is guaranteed to be sorted:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level6.hs#L107-L135
insertionSort :: forall a. NonEmpty (Natural, a) -> SomeSorted a
insertionSort ((k0, x0) :| xs0) = withSomeSNat k0 \(SNat @k) ->
    go xs0 (SomeSorted (SSingle (Entry @k x0)))
  where
    go :: [(Natural, a)] -> SomeSorted a -> SomeSorted a
    go [] = id
    go ((k, x) : xs) = \case
      SomeSorted @_ @n ys -> withSomeSNat k \(SNat @k) ->
        go xs $ someSortedMin @k @n $ insertSorted (Entry @k x) ys

someSortedMin
    :: forall n m a. (KnownNat n, KnownNat m)
    => Sorted (Min n m) a
    -> SomeSorted a
someSortedMin = case cmpNat (Proxy @n) (Proxy @m) of
  LTI -> SomeSorted
  EQI -> SomeSorted
  GTI -> SomeSorted
Some things to note:
We’re using the nonempty
list type from base, because Sorted always has at
least one element.
We use withSomeSNat to turn a Natural into the
type-level n :: Nat, the same way we wrote withVec
earlier. This is just the function that GHC offers to reify a
Natural (non-negative Integer) to the type level.
someSortedMin is used to clean up the implementation, doing the
same job that sConsMin did.
For our final level, let’s imagine a “weighted list” of (Int, a)
pairs, where each item a has an associated weight or cost. Then,
imagine a “bounded weighted list”, where the total cost must not exceed
some limit value. Think of it as a list of files and their sizes and a maximum
total file size, or a backpack for a character in a video game with a maximum
total carrying weight.
There is a fundamental difference here between this type and our last type:
we want to enforce a global invariant (total cannot exceed a limit),
and we can’t “fake” this using local invariants like last time.
Introducing level 7: enforcing global structure! This brings some
extra complexities, similar to the ones we encountered in Level 5 with our
fixed-length lists: whatever phantom type we use to enforce this “global”
invariant now becomes entangled with the overall structure of our data type
itself.
Let’s re-use our Entry type, but interpret an
Entry n a as a value of type a with a weight
n. Now, we’ll again “let McBride be our guide” and ask the same
question we asked before: what “type-safe” operation do we want, and what
minimal phantom types do we need to allow this type-safe operation? In our case,
we want to insert into our bounded weighted list in a safe way, to
ensure that there is enough room. So, we need two phantom types:
One phantom type lim to establish the maximum weight of our
container
Another phantom type n to establish the current used capacity
of our container.
We want Bounded lim n a:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level7.hs#L24-L31
data Bounded :: Nat -> Nat -> Type -> Type where
  BNil :: Bounded lim 0 a
  BCons
      :: forall n m lim a. (KnownNat m, n + m <= lim)
      => Entry n a
      -> Bounded lim m a
      -> Bounded lim (n + m) a
The empty bounded container BNil :: Bounded lim 0 a can satisfy
any lim, and has weight 0.
If we have a Bounded lim m a, then we can add an
Entry n a to get a Bounded lim (n + m) a, provided that
n + m <= lim, using BCons.
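For example, a small sketch of a valid value (myBag is a hypothetical name):

-- a bag with limit 10 currently holding total weight 4 + 3 = 7
myBag :: Bounded 10 7 Int
myBag = BCons (Entry @4 1) (BCons (Entry @3 2) BNil)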
Let’s try this out by seeing how the end user would “maybe insert” into a
bounded list if it had enough capacity:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level7.hs#L133-L145
data SomeBounded :: Nat -> Type -> Type where
  SomeBounded :: KnownNat n => Bounded lim n a -> SomeBounded lim a

insertSomeBounded
    :: forall lim n a. (KnownNat lim, KnownNat n)
    => Entry n a
    -> SomeBounded lim a
    -> Maybe (SomeBounded lim a)
insertSomeBounded x (SomeBounded @m xs) = case cmpNat (Proxy @(n + m)) (Proxy @lim) of
  LTI -> Just $ SomeBounded (BCons x xs)
  EQI -> Just $ SomeBounded (BCons x xs)
  GTI -> Nothing
First we match on the SomeBounded to see what the current
capacity m is. Then we check using cmpNat to see if
the Bounded can hold m + n. If it can, we can return
successfully. Note that we define SomeBounded using GADT syntax so
we can precisely control the order of the type variables, so
SomeBounded @m xs binds m to the capacity of the inner
list.
Remember in this case that the end user here isn’t necessarily using
the phantom types to their advantage (except for lim, which could
be useful). Instead, it’s us who is going to be using n to
ensure that if we ever create any Bounded (or
SomeBounded), it will always be within capacity by
construction.
Now that the usage makes sense, let’s jump in and write some type-safe
functions using our fancy phantom types!
First, let’s notice that we can always “resize” our
Bounded lim n a to a Bounded lim' n a as long as the
total usage n fits within the new carrying capacity:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level7.hs#L35-L38
reBounded :: forall lim lim' n a. n <= lim' => Bounded lim n a -> Bounded lim' n a
reBounded = \case
  BNil -> BNil
  BCons x xs -> BCons x (reBounded xs)
Note that we have full type safety here! GHC will prevent us from using
reBounded if we pick a new lim that is less
than what the bag currently weighs! You’ll also see the general pattern here:
changing any “global” property of our type requires recursing over the entire
structure to adjust it.
How about a function to combine two bags with the same weight limit? Well, this
should be legal as long as the new combined weight is still within the
limit:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level7.hs#L48-L56
concatBounded
    :: forall n m lim a. (KnownNat n, KnownNat m, KnownNat lim, n + m <= lim)
    => Bounded lim n a
    -> Bounded lim m a
    -> Bounded lim (n + m) a
concatBounded = \case
  BNil -> id
  BCons @x @xs x xs -> BCons x . concatBounded xs
Aside
This is completely unrelated to the topic at hand, but if you’re a big nerd
like me, you might enjoy the fact that this function makes
Bounded lim n a the arrows of a Category whose
objects are the natural numbers less than or equal to lim,
the identity arrow is BNil, and arrow composition is
concatBounded. Between object n and m, if
n <= m, its arrows are values of type
Bounded lim (m - n) a. Actually wait, it’s the same thing with
Vec and vconcat above isn’t it? I guess we were moving
so fast that I didn’t have time to realize it.
Anyway this is related to the preorder category, but not
thin. A thicc preorder category, if you will. Always nice to spot a category out
there in the wild.
It should be noted that the reason that reBounded and
concatBounded look so clean and fresh is that we are heavily
leveraging typechecker plugins. But, these are all still possible with normal
functions if we construct the witnesses explicitly.
Now for a function within our business logic, let’s write
takeBounded, which constricts a
Bounded lim n a to a Bounded lim' q a with a smaller
limit lim', where q is the weight of all of the
elements that fit in the new limit. For example, if we had a bag of limit
15 containing items weighing 4, 3, and 5 (total 12), but we wanted to
takeBounded with a new limit 10, we would take the 4 and 3 items,
but leave behind the 5 item, to get a new total weight of 7.
It’d be nice to have a helper data type to existentially wrap the new
q weight in our return type:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level7.hs#L113-L118
data TakeBounded :: Nat -> Nat -> Type -> Type where
  TakeBounded
      :: forall q lim n a. (KnownNat q, q <= n)
      => Bounded lim q a
      -> TakeBounded lim n a
So the type of takeBounded would be:
takeBounded :: (KnownNat lim, KnownNat lim', KnownNat n) => Bounded lim n a -> TakeBounded lim' n a
Again I’m going to introduce some helper functions that will make sense
soon:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level7.hs#L40-L46
bConsExpand :: KnownNat m => Entry n a -> Bounded lim m a -> Bounded (n + lim) (n + m) a
bConsExpand x xs = withBoundedWit xs $ BCons x (reBounded xs)

withBoundedWit :: Bounded lim n a -> (n <= lim => r) -> r
withBoundedWit = \case
  BNil -> \x -> x
  BCons _ _ -> \x -> x
From the type, we can see that bConsExpand adds a new item while also
increasing the limit:
bConsExpand :: Entry n a -> Bounded lim m a -> Bounded (n + lim) (n + m) a.
This is always safe conceptually because we can always add a new item into any
bag if we increase the limit of the bag:
Entry 100 a -> Bounded 5 3 a -> Bounded 105 103 a, for
instance.
Next, you’ll notice that if we write this as
BCons x (reBounded xs) alone, we’ll get a GHC error complaining
that this requires m <= lim. This is something that we
know has to be true (by construction), since there isn’t any
constructor of Bounded that will give us a total weight
m bigger than the limit lim. However, this requires a
bit of witness manipulation for GHC to know this: we have to
essentially enumerate over every constructor, and within each constructor GHC
knows that m <= lim holds. This is what
withBoundedWit does. We “know” n <= lim, we just
need to enumerate over the constructors of Bounded lim n a so GHC
is happy in every case.
withBoundedWit’s type might be a little confusing if this is the
first time you’ve seen an argument of the form
(constraint => r): it takes a Bounded lim n a and a
“value that is only possible if n <= lim”, and then gives you
that value.
With that, we’re ready:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level7.hs#L120-L131
takeBounded
    :: forall lim lim' n a. (KnownNat lim, KnownNat lim', KnownNat n)
    => Bounded lim n a
    -> TakeBounded lim' n a
takeBounded = \case
  BNil -> TakeBounded BNil
  BCons @x @xs x xs -> case cmpNat (Proxy @x) (Proxy @lim') of
    LTI -> case takeBounded @lim @(lim' - x) xs of
      TakeBounded @q ys -> TakeBounded @(x + q) (bConsExpand x ys)
    EQI -> TakeBounded (BCons x BNil)
    GTI -> TakeBounded BNil
Thanks to the types, we ensure that the returned bag must contain at
most lim’!
As an exercise, try writing splitBounded, which is like
takeBounded but also returns the items that were leftover. Solution
here.
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level7.hs#L91-L103
data SplitBounded :: Nat -> Nat -> Nat -> Type -> Type where
  SplitBounded
      :: forall q lim lim' n a. (KnownNat q, q <= n)
      => Bounded lim' q a
      -> Bounded lim (n - q) a
      -> SplitBounded lim lim' n a

splitBounded
    :: forall lim lim' n a. (KnownNat lim, KnownNat lim', KnownNat n)
    => Bounded lim n a
    -> SplitBounded lim lim' n a
One final example: how about a function that reverses the
Bounded lim n a? We’re going to write a “single-pass reverse”,
similar to how it’s often written for lists:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level7.hs#L68-L73
reverseList :: [a] -> [a]
reverseList = go []
  where
    go res = \case
      [] -> res
      x : xs -> go (x : res) xs
Now, reversing a Bounded should be legal, because reversing the
order of the items shouldn’t change the total weight. However, we basically
“invert” the structure of the Bounded type, which, depending on how
we set up our phantom types, could mean a lot of witness reshuffling. Luckily,
our typechecker plugin handles most of it for us in this case, but it exposes
one gap:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/type-levels/Level7.hs#L58-L89
reverseBounded
    :: forall lim n a. (n <= lim, KnownNat lim, KnownNat n)
    => Bounded lim n a
    -> Bounded lim n a
reverseBounded = go BNil
  where
    go
        :: forall m q. (KnownNat m, KnownNat q, m <= lim, m + q <= lim)
        => Bounded lim m a
        -> Bounded lim q a
        -> Bounded lim (m + q) a
    go res = \case
      BNil -> res
      BCons @x @xs x xs ->
        solveLte @m @q @x @lim $
          go @(x + m) @xs (BCons @x @m x res) xs

solveLte
    :: forall a b c n r. (KnownNat a, KnownNat c, KnownNat n, a + b <= n, c <= b)
    => (a + c <= n => r)
    -> r
solveLte x = case cmpNat (Proxy @(a + c)) (Proxy @n) of
  LTI -> x
  EQI -> x
  GTI -> error "absurd: if a + b <= n and c <= b, then a + c can't be > n"
Due to how everything gets exposed, we need to prove that if
a + b <= n and c <= b, then
a + c <= n. This is always true, but the typechecker plugin
needs a bit of help, and we have to resort to an unsafe operation to get this to
work. However, if we were using our manually constructed inductive types instead
of GHC’s opaque ones, we could write this in a type-safe and total way. We run
into these kinds of issues a lot more often with global invariants than we do
with local invariants, because the invariant phantom becomes so entangled with
the structure of our data type.
And…that’s about as far as we’re going to go with this final level! If this
type of programming with structural invariants is appealing to you, check out
Conor McBride’s famous type-safe
red-black trees in Haskell paper, or Edwin Brady’s Type-Driven
Development in Idris for how to structure entire programs around these
principles.
From the fact that Conor’s work is in Agda and Brady’s is in Idris, you
can tell that in doing this, we are definitely pushing the boundaries of what is
ergonomic to write in Haskell. Well, depending on who you ask, we zipped past
that boundary long ago. Still, there’s definitely a certain kind of joy to
defining invariants in your data types and then essentially proving to
the compiler that you’ve followed them. But, most people will be happier just
writing a property test to fuzz the implementation on a non type-safe structure.
And some will be happy with…unit tests. Ha ha ha ha. Good joke right?
Anyway, hope you enjoyed the ride! I hope you found some new ideas for ways
to write your code in the future, or at least found them interesting or
eye-opening. Again, none of the data structures here are presented to be
practically useful as-is — the point is more to present these typing principles
and mechanics in a fun manner and to inspire a sense of wonder.
Which level is your favorite, and what level do you wish you could
work at if things got a little more ergonomic?
Special Thanks
I am very humbled to be supported by an amazing community, who make it
possible for me to devote time to researching and writing these posts. Very
special thanks to my supporter at the “Amazing” level on patreon, Josh Vera! :)
Luke’s blog has been known to switch back and forth from private
to non-private, so I will link to the official post and respect the decision of
the author on whether or not it should be visible. However, the term itself is
quite commonly used and if you search for it online you will find much
discussion about it.↩︎
Note that I don’t really like calling these “vectors” any more,
because in a computer science context the word vector carries implications of
contiguous-memory storage. “Lists” of fixed length is the more appropriate
description here, in my opinion. The term “vector” for this concept arises from
linear algebra, where a vector is inherently defined by its vector
space, which does
have an inherent dimensionality. But we are talking about computer science
concepts here, not mathematical concepts, so we should pick the name that
provides the most useful implicit connotations.↩︎
I've been getting some questions from people about how to use Diesel and particularly diesel-async for interacting with SQL databases in Rust. I thought I'd write up a quick post with some patterns and examples.
At work I use org-mode to keep notes about useful ways to query our systems,
mostly that involves using the built-in SQL support to access DBs and ob-http to
send HTTP requests. In both cases I often need to provide credentials for the
systems. I'm embarrassed to admit it, but for a long time I've taken the easy
path and kept all credentials in clear text. Every time I've used one of those
code blocks I've thought I really ought to find a better way of handling these
secrets one of these days. Yesterday was that day.
I ended up with two functions that use auth-source and its ~/.authinfo.gpg
file.
(defun mes/auth-get-pwd (host)
  "Get the password for a host (authinfo.gpg)"
  (-> (auth-source-search :host host)
      car
      (plist-get :secret)
      funcall))

(defun mes/auth-get-key (host key)
  "Get a key's value for a host (authinfo.gpg)

Not usable for getting the password (:secret), use 'mes/auth-get-pwd'
for that."
  (-> (auth-source-search :host host)
      car
      (plist-get key)))
It turns out that the library can handle more keys than the documentation
suggests, so for DB entries I'm using a machine (:host) that's a bit shorter
and easier to remember than the full AWS hostname. Then I keep the DB host and
name in dbhost (:dbhost) and dbname (:dbname) respectively. That makes
an entry look like this:
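machine mydb login myuser password mysecretpassword dbhost mydb.abc123.eu-west-1.rds.amazonaws.com dbname mydbname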
I live in southeastern Pennsylvania, so the Pennsylvania-New Jersey-Delaware triple point must be somewhere nearby. I sat up and got my phone so I could look at the map, and felt foolish.
As you can see, the triple point is in the middle of the Delaware River, as of course it must be; the entire border between Pennsylvania and New Jersey, all the hundreds of miles from its northernmost point (near Port Jervis) to its southernmost (shown above), runs right down the middle of the Delaware.
I briefly considered making a trip to get as close as possible, and photographing the point from land. That would not be too inconvenient. Nearby Marcus Hook is served by commuter rail. But Marcus Hook is not very attractive as a destination. Having been to Marcus Hook, it is hard for me to work up much enthusiasm for a return visit.
I was recently passing by Marcus Hook on the way back from Annapolis,
so I thought what the heck, I'd stop in and see if I could get a look
in the direction of the tripoint. As you can see from this screencap,
I was at least standing in the right place, pointed in the right direction.
I didn't quite see the tripoint itself because this buoyancy-operated
aquatic transport was in the way. I don't mind, it was more
interesting to look at than open water would have been.
Thanks to
the Wonders of the Internet,
I have learned that this is an LPG tanker. Hydrocarbons from hundreds
of miles away are delivered to the refinery in Marcus Hook
via rail, road, and pipeline, and then shipped out on vessels like
this one.
Infrastructure fans should check it out.
I was pleased to find that Marcus Hook wasn't as dismal as I
remembered; it's just a typical industrial small town. I thought
maybe I should go back and look around some more. If you hoped I
might have something more interesting or even profound to say here,
sorry.
Oh, I know. Here, I took this picture in Annapolis:
Perhaps he who is worthy of honor does not die. But fame is fleeting.
Even if he who is worthy of honor does get a plinth, the grateful
populace may not want to shell out for a statue.
This is the twenty-fourth edition of our GHC activities report, which describes
the work Well-Typed are doing on GHC, Cabal, HLS and other parts of the core Haskell toolchain.
The current edition covers roughly the months of June to August 2024.
You can find the previous editions collected under the
ghc-activities-report tag.
Sponsorship
We are delighted to offer new Haskell Ecosystem Support Packages to provide
commercial users with access to Well-Typed’s experts while investing in the
Haskell community and its technical ecosystem. If your company is using Haskell,
read more about our offer, or
get in touch with us today, so we can help you get
the most out of the toolchain. We need more funding to continue our essential maintenance work!
Matthew has added support for building executables with the combination
of dynamic linking and profiling, building on top of work by Ian-Woo Kim.
On the GHC side, we need to build and distribute libraries built in the profiled
dynamic way. This required a few small changes in Hadrian (!12595).
The bulk of the work was adding support in Cabal (#9900).
With Cabal >= 3.14 and GHC >= 9.12, libraries can be compiled in the
profiled dynamic way by passing --profiling-shared. Passing --enable-profiling
together with --enable-executable-dynamic will then allow one to build
executables with profiling which are dynamically linked against the appropriate
p_dyn archives.
Object determinism
Matthew and Rodrigo have been working on making it so that the object code that
GHC produces is deterministic (#12935). One main benefit of determinism is
improved caching of build products.
Matthew and Rodrigo have identified several sources of non-determinism (not
just in object files):
Unique identifiers in Cmm are generated non-deterministically. Making these
deterministic is the main challenge; it can be achieved by a combination of:
Threading through a unique supply to ensure that the uniques are generated
in a deterministic order,
Where that isn’t possible (e.g. due to performance implications), recover
determinism in a subsequent renaming pass.
This is the work of the main MR !12680, which is still in progress at the
time of writing.
Re-exports of entire modules can cause non-determinism in interface
files (#13093). This is fixed by using a stable sort of the exports (!13093).
Files added by addDependentFile can be added in a non-deterministic order
(#25131); this is fixed by using a deterministic module environment in GHC (!13099),
and avoiding temporary file names which use randomly generated names inside
Template Haskell.
The existence of rules can impact inlining decisions, and a bug in GHC
(#19725) means that sometimes rules that are not in the dependency closure
of a given module can impact inlining decisions in that module (#25170);
as these rules may or may not be loaded (depending e.g. on parallel compilation),
this causes non-determinism.
Build paths could leak into object files, e.g. a preprocessor could introduce
a LINE pragma that referred to a build path, due to a regression in Cabal
(#10242).
Matthew addressed a plethora of issues surrounding the treatment of working
directories in Cabal in #10256,
putting this problem to rest.
SIMD
SIMD (“Single Instruction, Multiple Data”) refers to CPU support for instructions
that operate on short vectors of data in parallel. For example, on X86-64 with
SSE instructions, one can use a single mulps
instruction to multiply two vectors containing 4 32-bit floating point values,
element by element. Making good use of these instructions can often be critical
in high-performance applications, in particular those involving array processing.
In Haskell, SIMD is exposed through GHC.Exts, with e.g. FloatX4#
representing a 4-element vector of Float#s, and timesFloatX4# the corresponding elementwise multiplication operation
that lowers to mulps on X86-64.
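For example, here is a minimal sketch of these primops in use (mulV4 is a
made-up name, not from the report):

{-# LANGUAGE MagicHash #-}

import GHC.Exts

-- elementwise product of two 4-wide float vectors; lowers to a
-- single SIMD multiply (e.g. mulps) on X86-64
mulV4 :: FloatX4# -> FloatX4# -> FloatX4#
mulV4 x y = timesFloatX4# x y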
Historically, these types and operations were only supported in GHC’s LLVM backend
(i.e. one had to use -fllvm), and not GHC’s native code generator (NCG).
However, Sam has been working to add SIMD support to GHC’s X86 native code generator.
The first challenge was fixing register allocation, which takes in Cmm code and
assigns variables to registers. GHC didn’t previously keep track of how registers
were used, but SIMD instructions can use registers such as xmm0 to store many
different things, such as f64 (Double#) and v4f32 (FloatX4#).
The register allocator needs to know precisely what is stored in the register,
so that if we need to spill the register to the stack we know how much to spill
(and conversely how much to load when reloading from the stack).
Tracking the usage of registers was made possible in Cmm with previous work (!9167);
it was then a case of propagating this information through the register allocator
and emitting the correct store and load instructions.
However, work in this area revealed further issues with the existing SIMD
support in GHC, thus affecting both the NCG and LLVM backends:
The generated Cmm code for unknown function calls
involving SIMD vectors was plain broken (#25062).
GHC is liable to drop the upper part of vector registers
around hand-written Cmm code in the RTS (#25169).
There were a few other tricky aspects to the implementation, in particular to
account for the fact that the Windows X86_64 C calling convention dictates that
vector values must be passed by reference and cannot be passed directly in
registers.
Haddock
The Haddock tool for generating Haskell documentation has been merged into the
GHC tree. This greatly simplifies the contribution workflow, especially as
GHC patches that required Haddock changes used to require a delicate dance with
git submodules.
Zubin finalised the merge, restoring missing commits that were
scattered across various branches (!12720). He then integrated the building and testing
of Haddock with GHC’s build system, Hadrian (!12743).
Zubin then fixed many other outstanding issues with Haddock, including:
Handling of non-hs files, so that haddock can generate documentation for
modules with foreign imports (!13008)
Several issues with incorrect cross-package references (!12980)
Bindist testing
Historically, the GHC project did not substantively test the deployment of
produced GHC binary distributions. After several releases in which the CI
of the ghcup project revealed issues, Matt implemented
a more robust suite of tests for GHC bindists, testing both ghcup usage and
manual installation.
This culminated in the creation of the
ghcup-ci repository,
which will be used henceforth as part of the GHC release process.
Semaphores for faster parallel builds
We previously introduced a new feature to GHC and Cabal allowing them to share
compute resources more effectively using a semaphore: see our previous blog post
on Reducing Haskell parallel build times using semaphores.
An issue with this feature was reported on the Cabal issue tracker (#9993).
Zubin investigated and discovered that this was due to cabal-install and
ghc being compiled and linked against inconsistent implementations of libc.
For example, if cabal-install is built against musl, it will create
a POSIX semaphore named /dev/shm/cabal_semaphore_k, but ghc built against glibc
would instead attempt to read /dev/shm/sem.cabal_semaphore_k.
This is not simply a naming issue, as the semaphore implementations can be
genuinely incompatible.
To address this, Zubin has been rewriting the implementation used
on POSIX-like systems to use sockets instead of semaphores, which don’t
suffer from this problem, and has
proposed to amend the GHC Proposal accordingly.
Other work
Frontend
Sam allowed the renamer to take COMPLETE pragmas into consideration when
deciding whether a do block should require MonadFail (!10565),
fixing a long standing feature request (#15681).
Sam fixed an issue where incomplete pattern match suggestions included
out-of-scope patterns (#25115, !13089).
Sam updated the GHC diagnostic code infrastructure to allow it to be
extensible for other tools. This enabled another GHC contributor, Jade,
to re-use that infrastructure for GHCi error messages (#23338).
Matthew fixed an issue where -Wmissing-home-modules gave incorrect warnings
when multiple units have the same module name (!13085).
Matthew extended the -reexported-module flag to support module renaming
(!13117), unblocking Cabal issue #10181.
Zubin fixed the pretty printing of ticked prefix constructors (#24237, !13115).
Code generator
Andreas fixed a bug in the graph-colouring register allocator on AArch64
(#24941, !12990).
Rodrigo helped land work by contributor Alex Mason that improves the lowering of the byte-swap primitive in the AArch64 NCG
using the REV instruction (!12814).
Compiler modularity
Rodrigo introduced a ‘one-shot’ strict state monad and used it to replace
boilerplate monad instances in the GHC code base (!12978).
At Zurihac, Rodrigo oversaw a collaboration on the trees-that-grow infrastructure
in GHC, aiming to make the Haskell AST separate from GHC in the hopes of splitting
it up into a separate library. This culminated in a collaborative MR !12830 with
a wide range of contributions from several new contributors.
GHCi
Andreas fixed tagging issues causing segfaults in GHCi (!12773).
Sam fixed an issue with the GHCi debugger’s treatment of record fields
(#25109, !13091).
Build system and CI
Andreas investigated and fixed issues revealed by the
test-primops suite.
Matt migrated the CI away from Debian 10 and onto Debian 12 (!13033),
and addressed subsequent issues with ghcup metadata generation (!13044, !13078).
Sam updated the testsuite driver to use py-cpuinfo to query for available
CPU features, fixing feature detection on Windows (!12971).
Matt added file-io as a boot package, to allow upgrading the version of directory (!13122).
This post introduces some tricks for jailbreaking hosts behind
“secure” enterprise firewalls in order to enable arbitrary inbound and
outbound requests over any protocol. You’ll probably find the tricks
outlined in the post useful if you need to deploy software in a hostile
networking environment.
The motivation for these tricks is that you might be a vendor that
sells software that runs in a customer’s datacenter (a.k.a. on-premises
software), so your software has to run inside of a restricted
network environment. You (the vendor) can ask the customer to open their
firewall for your software to communicate with the outside world
(e.g. your own datacenter or third party services), but customers will
usually be reluctant to open their firewall more than necessary.
For example, you might want to ssh into your host so
that you can service, maintain, or upgrade the host, but if you ask the
customer to open their firewall to let you ssh in they’ll
usually push back on or outright reject the request. Moreover, this
isn’t one of those situations where you can just ask for forgiveness
instead of permission because you can’t begin to do anything without
explicitly requesting some sort of firewall change on their
part.
So I’m about to teach you a bunch of tricks for efficiently tunneling
whatever you want over seemingly innocuous openings in a customer’s
firewall. These tricks will culminate with the most cursed trick of all,
which is tunneling inbound SSH connections inside of
outbound HTTPS requests. This will grant you full
command-line access to your on-premises hosts using the most benign
firewall permission that a customer can grant. Moreover, this post is
accompanied by a repository named
holepunch containing NixOS modules automating this ultimate
trick which you can either use directly or consult as a working
proof-of-concept for how the trick works.
Overview
Most of the tricks outlined in this post assume that you control the
hosts on both ends of the network request. In other words, we’re going
to assume that there is some external host in your datacenter and some
internal host in the customer’s datacenter and you control the software
running on both hosts.
There are four tricks in our arsenal that we’re going to use to
jailbreak internal hosts behind a restrictive customer firewall:
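Forward proxies
Reverse proxies
Reverse tunnels
corkscrew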
Once you master these four tools you will typically be able to do
basically anything you want using the slimmest of firewall
permissions.
You might also want to read another post of mine: Forward
and reverse proxies explained. It’s not required reading for this
post, but you might find it helpful or interesting if you like this
post.
Proxies
We’re going to start with proxies since that’s the easiest thing to
explain which requires no other conceptual dependencies.
A proxy is a host that can connect to other hosts on
a client’s behalf (instead of the client making a direct connection to
those other hosts). We will call these other hosts “upstream hosts”.
One of the most common tricks when jailbreaking an internal host (in
the customer’s datacenter) is to create an external host (in your
datacenter) that is a proxy. This is really effective
because the customer has no control over traffic between the proxy
and upstream hosts. The customer’s firewall can only see, manage,
and intercept traffic between the internal host and the proxy, but
everything else is invisible to them.
There are two types of proxies, though: forward proxies and reverse
proxies. Both types of proxies are going to come in handy for
jailbreaking our internal host.
Forward proxy
A forward proxy is a proxy that lets the client
decide which upstream host to connect to. In our case, the “client” is
the internal host that resides in the customer datacenter that is trying
to bypass the firewall.
Forward proxies come in handy when the customer restricts which hosts
that you’re allowed to connect to. For example, suppose that your
external host’s address is external.example.com and your
internal hosts’s address is internal.example.com. Your
customer might have a firewall rule that prevents
internal.example.com from connecting to any host other than
external.example.com. The intention here is to prevent your
machine from connecting to other (potentially malicious) machines.
However, this firewall rule is quite easy for a vendor to subvert.
All you have to do is host a forward proxy at
external.example.com and then any time
internal.example.com wants to connect to any other domain
(e.g. google.com) it can just route the request through the
forward proxy hosted at external.example.com. For example,
squid is one example of a forward proxy that you can use
for this purpose, and you could configure it like this:
acl internal src ${SUBNET OF YOUR INTERNAL SERVER(S)}
http_access allow internal
http_access deny all
… and then squid will let any program on
internal.example.com connect to any host reachable from
external.example.com so long as the program configured
http://external.example.com:3128 as the forward proxy. For
example, you’d be able to run this command on
internal.example.com:
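$ curl --proxy http://external.example.com:3128 https://www.google.com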
… and the request would succeed despite the firewall because from the
customer’s point of view they can’t tell that you’re using a forward
proxy. Or can they?
Reverse proxy
Well, actually the customer can tell that you’re doing
something suspicious. The connection to squid isn’t
encrypted (note that the scheme for our forward proxy URI is
http and not https), and most modern firewalls
will be smart enough to monitor unencrypted traffic and notice that
you’re trying to evade the firewall by using a forward proxy (and they
will typically block your connection if you try this). Oops!
Fortunately, there’s a very easy way to evade this: encrypt the
traffic to the proxy! There are quite a few ways to do this, but the
most common approach is to put a “TLS-terminating reverse proxy” in
front of any service that needs to be encrypted.
So what’s a “reverse proxy”? A reverse proxy is a
proxy where the proxy decides which upstream host to connect to (instead
of the client deciding). A TLS-terminating reverse
proxy is one whose sole purpose is to provide an encrypted endpoint that
clients can connect to and then it forwards unencrypted traffic to some
(fixed) upstream endpoint (e.g. squid running on
external.example.com:3128 in this example).
There are quite a few services created for doing this sort of thing,
but the three I’ve personally used the most throughout my career
are:
nginx
haproxy
stunnel
For this particular case, I actually will be using
stunnel to keep things as simple as possible
(nginx and haproxy require a bit more
configuration to get working for this).
You would run stunnel on
external.example.com with a configuration that would look
something like this:
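[squid]
accept  = 443
connect = 127.0.0.1:3128
cert    = /etc/stunnel/external.example.com.pem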
… and now connections to https://external.example.com
are encrypted and handled by stunnel, which will decrypt
the traffic and route those requests to squid running on
port 3128 of the same machine.
In order for this to work you’re going to need a valid certificate
for external.example.com, which you can obtain for free
using Let’s Encrypt. Then you
staple the certificate public key and private key to generate the final
PEM file that you reference in the above stunnel
configuration.
So if you’ve gotten this far your server can now access any publicly
reachable address despite the customer’s firewall restriction. Moreover,
the customer can no longer detect that anything is amiss because all of
your connections to the outside world will appear to the customer’s
firewall as encrypted HTTPS connections to
external.example.com:443, which is an extremely innocuous
type of connection.
Reverse tunnel
We’re only getting started, though! By this point we can make
whatever outbound connections we want, but WHAT ABOUT INBOUND
CONNECTIONS?
As it turns out, there is a trick known as a reverse
tunnel which lets you tunnel inbound connections over outbound
connections. Most reverse tunnels exploit two properties of TCP
connections:
TCP connections may be long-lived (sometimes very long-lived)
TCP connections must necessarily support network traffic in both
directions
Now, in the common case a lot of TCP connections are short-lived. For
example, when you open https://google.com in your browser that is an
HTTPS request which is layered on top of a TCP connection. The HTTP
request message is data sent in one direction over the TCP connection
and the HTTP response message is data sent in the other direction over
the TCP connection and then the TCP connection is closed.
But TCP is much more powerful than that and reverse tunnels exploit
that latent protocol power. To illustrate how that works I’ll use the
most widely known type of reverse tunnel: the SSH reverse tunnel.
You typically create an SSH reverse tunnel by running a command like
this from the internal machine
(e.g. internal.example.com):
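$ ssh -R "${EXTERNAL_PORT}:localhost:${INTERNAL_PORT}" external.example.com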
In an SSH reverse tunnel, the internal machine
(e.g. internal.example.com) initiates an outbound TCP
request to the SSH daemon (sshd) listening on the external
machine (e.g. external.example.com). When sshd
receives this TCP request it keeps the TCP connection alive and
then listens for inbound requests on EXTERNAL_PORT of the
external machine. sshd forwards all requests received on
that port through the still-alive TCP connection back to the
INTERNAL_PORT on the internal machine. This works fine
because TCP connections permit arbitrary data flow both ways and the
protocol does not care if the usual request/response flow is suddenly
reversed.
In fact, an SSH reverse tunnel doesn’t just let you make inbound
connections to the internal machine; it lets you make inbound
connections to any machine reachable from the internal
machine (e.g. other machines inside the customer’s datacenter).
However, those kinds of connections to other internal hosts can be
noticed and blocked by the customer’s firewall.
From the point of view of the customer’s firewall, our internal
machine has just made a single long-lived outbound
connection to external.example.com and they cannot easily
tell that the real requests are coming in the other direction
(inbound) because those requests are being tunneled
inside of the outbound request.
However, this is not foolproof, for two reasons:
A customer’s firewall can notice (and ban) a long-lived
connection
I believe it is possible to disguise a long-lived connection as a
series of shorter-lived connections, but I’ve never personally done that
before so I’m not equipped to explain how to do that.
A customer’s firewall will notice that you’re making an
SSH connection of some sort
Even when the SSH connection is encrypted it is still possible for a
firewall to detect that the SSH protocol is being used. A lot of
firewalls will be configured to ban SSH traffic by default unless
explicitly approved.
However, there is a great solution to that latter problem, which is
…
corkscrew
corkscrew is an extremely simple tool that wraps an SSH
connection in an HTTP connection. This lets us disguise SSH traffic as
HTTP traffic (which we can then further disguise as HTTPS traffic by
encrypting the connection using stunnel).
Normally, the only thing we’d need to do is to extend our
ssh -R command to add this option:
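-o 'ProxyCommand corkscrew external.example.com 443 %h %p'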
… but this doesn’t work because corkscrew doesn’t
support HTTPS connections (it’s an extremely simple program written in
just a couple hundred lines of C code). So in order to work around that
we’re going to use stunnel again, but this time we’re going
to run stunnel in “client mode” on
internal.example.com so that it can handle the HTTPS logic
on behalf of corkscrew.
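The client-side stunnel configuration might look something like this (the
local port 8443 is an arbitrary choice):

[https-tunnel]
client  = yes
accept  = 127.0.0.1:8443
connect = external.example.com:443

Then corkscrew can point at the local stunnel endpoint instead:

-o 'ProxyCommand corkscrew 127.0.0.1 8443 %h %p'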
… and now you are able to disguise an outbound SSH request as an
outbound HTTPS request.
MOREOVER, you can use that disguised outbound SSH
request to create an SSH reverse tunnel which you can use to forward
inbound traffic from external.example.com to any
INTERNAL_PORT on internal.example.com. Can you
guess what INTERNAL_PORT we’re going to pick?
That’s right, we’re going to forward inbound traffic to port 22:
sshd. Also, we’re going to arbitrarily set
EXTERNAL_PORT to 17705:
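$ ssh -R 17705:localhost:22 \
    -o 'ProxyCommand corkscrew 127.0.0.1 8443 %h %p' \
    external.example.com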
Now, (separately from the above command) we can ssh into
our internal server via our external server like this:
$ ssh -p 17705 external.example.com
… and we have complete command-line access to our internal server and
the customer is none the wiser.
From the customer’s perspective, we just ask them for an
innocent-seeming firewall rule permitting outbound HTTPS traffic from
internal.example.com to external.example.com.
That is the most innocuous firewall change we can possibly request
(short of not opening the firewall at all).
Conclusion
I don’t think all firewall rules are ineffective or bad, but if the
same person or organization controls both ends of a connection then
typically anything short of completely disabling internet access can be
jailbroken in some way with off-the-shelf open source tools. It does
require some work, but as you can see with the associated
holepunch repository even moderately sophisticated
firewall escape hatches can be neatly packaged for others to reuse.
We have shown the benefits of using a shared build cache
as well as using remote build execution (RBE) to offload builds to a remote
build farm. Our customers are interested in leveraging RBE to improve developer
experience and reduce continuous integration (CI) run times, giving us an
opportunity to learn all aspects of deploying different RBE solutions. I would
like to share how one can deploy one of them, Buildbarn, and
secure all communications in it.
What is it and why do we care?
We want developers to be productive. Being productive requires spending as
little time as possible waiting for build/test feedback, not having to switch
to a different task while the build is running.
Remote caching
One part of achieving this is to never build the same thing twice. Tools like
Bazel support caching the result of every action, every tool
execution. While many tools support storing results in a local directory, Bazel
tracks the actions and their inputs with high granularity, resulting in more
frequent “cache hits”. This is already a good gain for a single developer
working on one machine. However Bazel also supports conducting builds in a
controlled environment with identical tooling and using a remote cache that can
be shared between team members and CI, taking things a significant step
further. You won’t have to rebuild anything that has been built by your
colleagues or by CI, which means starting up on a new machine, onboarding a new
team member or reproducing issues becomes faster.
Remote build execution
The second part of keeping developers productive is allowing them to use the
right tools for the job. They still often need to build new things, and their
local machine may not be the fastest, not have enough charge, or have the
wrong architecture or OS. Remote build execution extends remote caching by
executing actions on shared builders when their results are not cached already.
This allows setting up a shared pool of necessary hardware or virtual compute
for both developers and CI. In Bazel this was implemented using RBE
API.
RBE implementations
Since the last post, RBE for Google Cloud Platform (GCP)
has disappeared, and several new self-service and
commercial services have been created. The RBE API has also gained popularity with
different build systems, including Bazel (where it started),
Buck2, and BuildStream. It is also used in
projects that cannot change their build systems easily, but can use
reclient to wrap all build actions and forward them to an RBE
service. Examples of such setup include Android,
Fuchsia and Chromium.
We’ll focus on one of the open-source RBE API servers, Buildbarn.
Securing remote cache and builds
Any shared infrastructure implies some security risks. When sending code to be
built remotely we expose it on the network, where it can be intercepted or
altered. When reading from the cache, we trust it to contain valid, unaltered
results. When setting up a pool of compute resources, we expect them to be used
only for building our code, and not for enriching third parties. All these
expectations mean that we require all communications with remote infrastructure
and within it to be encrypted and authenticated. The industry standard for
achieving this is mTLS: Transport Layer Security (TLS) protocol with
mutual authentication. It uses public key infrastructure (PKI) to
allow both clients and servers to verify each other’s identities before sending
any data, and makes sure that the data sent on one side matches the data
received on the other side.
Overview
In this extended blog post we’ll start by showing how to
deploy Buildbarn on a Kubernetes cluster running in a local
VM and configure a simple Bazel example to use it.
Then we’ll turn on mTLS with the help of cert-manager for all
Buildbarn pieces communicating with one another, and, finally, configure Bazel
on a developer or CI machine to authenticate over the RBE API with a
certificate and verify the one presented by the build server.
This blog post contains a lot of code snippets that let you follow the
installation process step by step. If you copy each command into your terminal
in order, you should see the same results as described. If you prefer to jump
to the final result and look at the complete picture, you can check out our
fork of the upstream
buildbarn/bb-deployments repository and follow the
instructions there.
Deploying Buildbarn
In this section we’ll create a local Buildbarn deployment on a Kubernetes
cluster running in a VM. We’ll create a local VM with Kubernetes using an
example config provided by lima. Then we’ll configure
persistent volumes for Buildbarn storage inside that VM. After that we’ll use
the Kubernetes example from a repository provided by Buildbarn
to deploy Buildbarn itself.
Setting up a Kubernetes instance
If you already have access to a Kubernetes cluster that you can use, you can
skip this section. Here we’ll deploy a local VM with Kubernetes running in it.
In subsequent steps below it’s assumed that you’re using a local VM, so you’ll
have to adjust some parameters accordingly if you use different means.
I’ve found that the easiest and most portable way to get Kubernetes running
locally is using the lima (Linux Machines) project. You can follow the
official docs to install it. I prefer using Nix and
direnv, so I’ve created a .envrc file with one line use nix and
shell.nix with the following contents:
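# a minimal sketch; the exact package list is an assumption
# (lima for the VM, kubectl to talk to the cluster); we'll add
# more tools to this list as we go
{ pkgs ? import <nixpkgs> { } }:

pkgs.mkShell {
  packages = with pkgs; [
    lima
    kubectl
  ];
}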
Then you just need to run direnv allow and it will fetch the necessary
packages and make them available in your shell.
Now we can create a Lima VM from the k8s template. We remove mounts from
the template to specify our own later. We also need to add some special options
for running on macOS:
limactl create template://k8s --name k8s --tty=false \
  --set '.provision |= . + {"mode":"system","script":"#!/bin/bash
for d in /mnt/fast-disks/vol{0,1,2,3}; do sudo mkdir -p $d; sudo mount --bind $d $d; done"}' \
  $([ "$(uname -s)" = "Darwin" ] && { echo "--vm-type vz"; [ "$(uname -m)" = "arm64" ] && echo "--rosetta"; })
Here arguments are:
--name k8s sets a name for the new VM; it defaults to the template name,
but let’s keep it explicit
--set '.provision ...' uses a jq expression to add an additional provision
step to the resulting YAML file creating necessary mountpoints for persistent
volumes
--tty=false disables console prompts and confirmations
for macOS we also add --vm-type vz to use the native macOS Virtualization
framework instead of QEMU for a faster VM
for Apple Silicon we also add --rosetta to enable the translation layer,
allowing us to run x86_64 containers in the VM with little overhead
You can start the final VM and check if it is ready with:
limactl start k8s
exportKUBECONFIG=~/.lima/k8s/copied-from-guest/kubeconfig.yaml
kubectl get node
It will take some time to bootstrap Kubernetes, after which it should show you
one node called lima-k8s with Ready status:
NAME STATUS ROLES AGE VERSION
lima-k8s Ready control-plane 4m54s v1.29.2
Buildbarn will need some PersistentVolumes to store data. Let’s teach it to use
the mounts that we created earlier for that. First, configure a storage class:
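The complete manifests are in the fork linked above. The core of the setup is a StorageClass backed by the pre-created local volumes; a sketch (the class name is an assumption and must match what the bb-deployments volume claims request, and the local-volume-provisioner DaemonSet it relies on comes from the sig-storage local static provisioner project):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer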
Run kubectl get pv to see that it created four volumes. They may take several
seconds to appear. You can check the provisioner’s logs for any errors with
kubectl logs daemonset/local-volume-provisioner.
Deploying Buildbarn
bb-deployments provides a Kustomize template to deploy Buildbarn.
Let’s clone it, patch one service so that we can run it locally, and deploy:
git clone https://github.com/buildbarn/bb-deployments.git
pushd bb-deployments/kubernetes
cat >> kustomization.yaml <<EOF
# patch frontend service to not require external load balancers
patches:
- target:
    kind: Service
    name: frontend
  patch: |
    - op: replace
      path: /spec/type
      value: NodePort
    - op: add
      path: /spec/ports/0/nodePort
      value: 30080
EOF
kubectl apply -k .
kubectl rollout status -k . 2>&1 | grep -Ev "no status|unable to decode"
The last command will wait for everything to start. We’ve filtered out all
messages about resources that it doesn’t know how to wait for.
To check that the Buildbarn frontend is accessible, we can use
grpc-client-cli. Add it to the list in shell.nix, save it and run:
grpc-client-cli -a 127.0.0.1:30080 health
It should report that it is SERVING:
{
"status": "SERVING"
}
We can exit the bb-deployments directory now:
popd
In this section we’ve deployed Buildbarn and verified that its API is
accessible. Now we’ll move on to setting up a small Bazel project to use it.
Then we’ll configure mTLS on Buildbarn, and finally configure Bazel to work
with mTLS.
Using Buildbarn
Let’s set up a small Bazel project to use our Buildbarn instance. In this
section we’ll use the Bazel examples repo and show how to build
it using Bazel locally and with RBE. We’ll also see how remote caching speeds
up builds by caching intermediate results.
We will be using Bazelisk to fetch and run the upstream distribution of
Bazel. First we’ll need to install Bazelisk by adding bazelisk to shell.nix.
If you are running NixOS, you will have to create an FHS
environment to run Bazel. If you are running macOS and don’t
have the Xcode command line tools installed, you also need to provide the
necessary libraries to the bazel invocation. Add this to your shell.nix:
pkgs.mkShell {
  packages = with pkgs; [
    ...
    bazelisk
  ];
  env = pkgs.lib.optionalAttrs pkgs.stdenv.isDarwin {
    BAZEL_LINKOPTS = with pkgs.darwin.apple_sdk; "-F${frameworks.Foundation}/Library/Frameworks:-L${objc4}/lib";
    BAZEL_CXXOPTS = "-I${pkgs.libcxx.dev}/include/c++/v1";
  };
  # fhs is only used on NixOS
  passthru.fhs = (pkgs.buildFHSUserEnv {
    name = "bazel-userenv";
    runScript = "zsh"; # replace with your shell of choice
    targetPkgs = pkgs: with pkgs; [
      libz # required for bazelisk to unpack Bazel itself
    ];
  }).env;
}
Then on NixOS you can run nix-shell -A fhs to enter an environment where
directories like /bin, /usr and /lib are set up as tools made for other
Linux distributions expect.
Now we can clone the Bazel examples repo and enter the simple
C++ example in it:
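A sketch of these steps (the exact example directory is an assumption; adjust the path to the example you use):

git clone https://github.com/bazelbuild/examples.git
pushd examples/cpp-tutorial/stage1
bazelisk run //main:hello-world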
Starting local Bazel server and connecting to it...
INFO: Analyzed target //main:hello-world (38 packages loaded, 165 targets configured).
INFO: Found 1 target...
Target //main:hello-world up-to-date:
bazel-bin/main/hello-world
INFO: Elapsed time: 7.545s, Critical Path: 0.94s
INFO: 8 processes: 6 internal, 2 processwrapper-sandbox.
INFO: Build completed successfully, 8 total actions
INFO: Running command line: bazel-bin/main/hello-world
Hello world
Note that if we run bazelisk run //main:hello-world again, it’ll be much
faster, because Bazel only spends a fraction of a second on computing the
action graph and making sure that nothing needs to be rebuilt:
Now we can build it with
bazelisk build --config=linux --config=remote //main:hello-world. Note that
it will take some time to extract the Linux compiler and supplemental files
first:
As you can see, two actions were executed remotely: compilation and linking. But
we can find the result locally in bazel-bin/main/hello-world (and run it if
we’re on an appropriate platform):
% file bazel-bin/main/hello-world
bazel-bin/main/hello-world: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 4.9.0, not stripped
Now if we clean local caches and rebuild, we can see that it reuses results
already stored in Buildbarn (remote cache hits):
% bazelisk clean
INFO: Invocation ID: d655d3f2-071d-48ff-b3e9-e0b1c61ae5fb
INFO: Starting clean (this may take a while). Consider using --async if the clean takes more than several minutes.
% bazelisk build --config=linux --config=remote //main:hello-world
INFO: Invocation ID: d38526d8-0242-4b91-92da-20ddd110d3ae
INFO: Analyzed target //main:hello-world (41 packages loaded, 6315 targets configured).
INFO: Found 1 target...
Target //main:hello-world up-to-date:
bazel-bin/main/hello-world
INFO: Elapsed time: 0.663s, Critical Path: 0.07s
INFO: 5 processes: 2 remote cache hit, 3 internal.
INFO: Build completed successfully, 5 total actions
We can exit the examples directory now:
popd
In this section we’ve configured a Bazel project to be built using our
Buildbarn instance. Now we’ll configure mTLS on Buildbarn and then finally
reconfigure this Bazel project to access Buildbarn using mTLS.
Configuring TLS in Buildbarn
We want each component of Buildbarn to have its own automatically generated
certificate and use it to connect to other components. On the other side, each
component that accepts connections should verify that the incoming connection
is accompanied by a valid certificate as well. In this section we’ll use
cert-manager to generate certificates and a more secure CSI
driver to request certificates and propagate them to Buildbarn
components. Then we’ll configure Buildbarn components to verify both sides of
each connection. Here’s how this process looks for the frontend and
storage containers, for example:
CSI driver sees a CSI volume and generates a key in tls.key in there.
CSI driver uses the key from tls.key to generate a Certificate Signing Request
(CSR) and creates a CertificateRequest resource in the Kubernetes API with it.
cert-manager signs the CertificateRequest with the CA certificate and puts both
the resulting certificate and the CA certificate in the CertificateRequest’s status.
CSI driver stores them in tls.crt and ca.crt respectively in the CSI volume.
bb-storage process in the frontend pod uses the certificate and key from
tls.crt and tls.key to establish a TLS connection to the storage pod,
verifying that the latter presents a valid certificate signed by the CA
certificate from ca.crt.
On the storage side, tls.key, tls.crt and ca.crt are filled out in a
similar manner.
bb-storage process in the storage pod verifies the incoming certificate with
the CA certificate from ca.crt and presents the certificate from tls.crt to the
frontend.
Notice how with this approach secret keys never leave the node where they are
generated and used, and the connection between frontend and storage pods is
authenticated on both ends.
Installing cert-manager
To generate certificates for our Buildbarn we need to install and configure
cert-manager itself and its CSI
driver. cert-manager is responsible for generating and
updating certificates requested via Kubernetes API objects. The CSI driver lets
users create special volumes in pods where private keys are generated locally
and certificates are requested from cert-manager and provided to the pod.
First, let’s fetch all necessary manifests and add them to our deployment. The
cert-manager project publishes a ready-to-use Kubernetes manifest, so we can
manually fetch it:
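For example (the release version here is an assumption; pick the one you want):

curl -LO https://github.com/cert-manager/cert-manager/releases/download/v1.14.4/cert-manager.yaml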
And then add it to the resources section of our kustomization.yaml:
resources:
- ...
- cert-manager.yaml
Unfortunately, the cert-manager CSI driver doesn’t directly provide a k8s
manifest, but rather a Helm chart. Add kubernetes-helm to your shell.nix
and then run:
helm template -n cert-manager -a storage.k8s.io/v1/CSIDriver https://charts.jetstack.io/charts/cert-manager-csi-driver-v0.7.1.tgz > cert-manager-csi-driver.yaml
-a storage.k8s.io/v1/CSIDriver makes sure that the chart uses the latest version
of the Kubernetes API to register itself.
Then we can add it to the resources section of our kustomization.yaml:
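# as with cert-manager.yaml above, appended to kustomization.yaml
resources:
- ...
- cert-manager.yaml
- cert-manager-csi-driver.yaml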
Let’s deploy and wait for everything to start. We will use cmctl to check
that cert-manager is working correctly, so you’ll need to add it to shell.nix.
kubectl apply -k .
kubectl rollout status -k . 2>&1 | grep -Ev "no status|unable to decode"
cmctl check api --wait 10m
kubectl get csinode -o yaml
cmctl should report The cert-manager API is ready, and the last command
should output your only node with one driver called csi.cert-manager.io
installed:
namespace/buildbarn unchanged
namespace/cert-manager created
...
mutatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created
validatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created
...
The cert-manager API is ready
apiVersion: v1
items:
- apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
...
name: lima-k8s
...
spec:
drivers:
- name: csi.cert-manager.io
nodeID: lima-k8s
topologyKeys: null
kind: List
metadata:
resourceVersion: ""
If it says drivers: null, re-run kubectl get csinode -o yaml a bit later to
allow more time for driver deployment and startup.
Creating CA certificate
First we need to create a CA certificate and an Issuer that cert-manager will
use to generate certificates for our needs. Note that to generate a self-signed
certificate we’ll also need to create another issuer. Put this in ca.yaml:
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned
  namespace: buildbarn
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: ca
  namespace: buildbarn
spec:
  isCA: true
  commonName: ca
  secretName: ca
  privateKey:
    algorithm: ECDSA
    size: 256
  issuerRef:
    name: selfsigned
    kind: Issuer
    group: cert-manager.io
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: ca
  namespace: buildbarn
spec:
  ca:
    secretName: ca
Then add it to the resources section of our kustomization.yaml:
resources:
- ...
- ca.yaml
Apply it and check the status of the issuers:
kubectl apply -k .
kubectl -n buildbarn get issuers -o wide
Both issuers should be there, and the ca issuer should have the Signing CA verified status:
NAME READY STATUS AGE
ca True Signing CA verified 14s
selfsigned True 14s
If it says something like secrets "ca" not found, it means it needs some time
to generate the certificate. Re-run kubectl -n buildbarn get issuers -o wide.
Generating certificates for Buildbarn components
As mentioned before, we will be generating certificates for each component
using cert-manager’s CSI driver. To do this, we need to add a volume to each
pod and mount it into the main container so that the service can read it. We
also need to pass the CA certificate into all these containers to verify the
other side of each connection. Unfortunately, Buildbarn doesn’t support reading
it from a file, so we’ll have to pass it statically via config.
Let’s prepare this config file using this command that reads the CA certificate
via the Kubernetes API and formats it using jq into a JSON string:
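A sketch of that command (the jq filter and the output path are assumptions; the same kubectl query reappears later when extracting ca.crt for the client):

kubectl -n buildbarn get certificaterequests ca-1 -o jsonpath='{.status.ca}' \
  | base64 -d | jq --raw-input --slurp '.' > config/ca-cert.jsonnet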
To avoid repetition, the first patch is applied to all Deployment objects, and
consecutive patches only add the proper list of DNS names for each certificate.
Note that many of those DNS names will not be used as only some of these
services actually accept connections. For the frontend Deployment we also add
127.0.0.1 IP so that it can be accessed via a port forwarded to localhost as
we currently use it on the host machine. For the storage StatefulSet we
configure a unique DNS name for each Pod because Pods are contacted directly and
not through a common service. For each of these we also add ca-cert.jsonnet
to the list of files used from the configuration ConfigMap. We also need to add
it to the ConfigMap itself by adding it to the list in
config/kustomization.yaml:
kubectl apply -k .
kubectl rollout status -k . 2>&1 | grep -Ev "no status|unable to decode"
Now you can fetch the list of CertificateRequest objects to see their statuses:
kubectl -n buildbarn get certificaterequest
It will output one request for the ca certificate named ca-1 and a bunch of
requests generated for each pod:
NAME APPROVED DENIED READY ISSUER REQUESTOR AGE
14468f64-909f-43d1-b67d-07b0844c0683 True True ca system:serviceaccount:cert-manager:cert-manager-csi-driver 5m
1d9e41a6-e58f-4c13-b9e6-0b1ba1d5a4f6 True True ca system:serviceaccount:cert-manager:cert-manager-csi-driver 5m1s
2c2f1177-81fc-45e5-8487-9b66bc0d6f73 True True ca system:serviceaccount:cert-manager:cert-manager-csi-driver 5m1s
31fdb0ef-0c0b-4a06-94af-fb17875ee05d True True ca system:serviceaccount:cert-manager:cert-manager-csi-driver 5m1s
376d0933-c0e9-4d39-b5c6-b76071c65966 True True ca system:serviceaccount:cert-manager:cert-manager-csi-driver 4m58s
3967cdd6-7d48-4814-8cec-542041182dd0 True True ca system:serviceaccount:cert-manager:cert-manager-csi-driver 5m1s
464a1f35-f0ba-4236-aeec-294f880d9675 True True ca system:serviceaccount:cert-manager:cert-manager-csi-driver 4m57s
5181e602-276e-413e-8888-76c4bd1ede21 True True ca system:serviceaccount:cert-manager:cert-manager-csi-driver 4m57s
6f02092d-b8a3-4eb7-8ff2-5e4a433d59bb True True ca system:serviceaccount:cert-manager:cert-manager-csi-driver 5m1s
710a458e-6ba0-4a44-87ab-5115b5a2c213 True True ca system:serviceaccount:cert-manager:cert-manager-csi-driver 4m58s
753c4653-71ae-447e-bbe5-022ce35cee9d True True ca system:serviceaccount:cert-manager:cert-manager-csi-driver 5m1s
8bcbb5a0-4575-40ad-b842-9c86bde8fdb8 True True ca system:serviceaccount:cert-manager:cert-manager-csi-driver 4m56s
8df59bf5-ed23-47af-bfcc-3cf8a9053b9b True True ca system:serviceaccount:cert-manager:cert-manager-csi-driver 5m1s
b47fff23-40b4-43ed-8e34-35d988eb434d True True ca system:serviceaccount:cert-manager:cert-manager-csi-driver 4m56s
be72bdc6-c61d-4f1b-928e-f743df0f6188 True True ca system:serviceaccount:cert-manager:cert-manager-csi-driver 4m57s
c14a52d5-dc20-4626-afe6-975442103d8b True True ca system:serviceaccount:cert-manager:cert-manager-csi-driver 5m
ca-1 True True selfsigned system:serviceaccount:cert-manager:cert-manager 3d22h
ceabf1ab-06a7-47c0-855a-2009bbbd2418 True True ca system:serviceaccount:cert-manager:cert-manager-csi-driver 5m
Using certificates
Now that we’ve generated all necessary certificates and made them available to
all pods, we can configure all components to use them. We’ll use similar
stanzas for each service, so let’s first add some helper functions to the top
of config/common.libsonnet:
Note that local certificate and key files will be reloaded every hour per the
refresh_interval setting, but the CA certificate will need to be reconfigured
manually every time it refreshes.
Also note that we accept all valid certificates by setting
validation_jmespath_expression to `true`. This expression can be
configured later for each service if needed.
Now we’re ready to configure the Buildbarn services.
Storage
Let’s start with storage. The client side configuration is the same for all
services that connect to it and is stored in config/common.libsonnet. Replace
lines like this one:
Scheduler
The scheduler exposes at least four gRPC endpoints, but we’ll cover only the
client (frontend) and worker sides as we don’t use other endpoints yet. Just
like with storage, you should replace clientGrpcServers and
workerGrpcServers settings with calls to oneListenAddressWithTLS in
config/scheduler.jsonnet, passing the addresses themselves as an argument:
The scheduler itself only connects to storage, and that part has already been
configured in config/common.libsonnet.
Workers
Workers only connect to the scheduler and storage. With the latter already
configured, we only need to change the scheduler setting in
config/worker-ubuntu22-04.jsonnet:
Frontend
The frontend listens for incoming connections from clients and fans them out, either
to storage or to the scheduler. Storage access has already been covered, so we
only need to replace grpcServers and schedulers settings in
config/frontend.jsonnet:
Note that we preserve all addresses and keep the additional
addMetadataJmespathExpression field that augments requests to the scheduler.
Applying it all
Now we can apply all these settings with:
kubectl apply -k .
kubectl rollout status -k . 2>&1 | grep -Ev "no status|unable to decode"
All deployments should eventually roll out and work. This means that all
internal communications between Buildbarn components are encrypted and
authenticated.
In this section we’ve achieved our goal of securing the Buildbarn deployment
with mTLS. Now all that’s left is to reconfigure Bazel to use and verify
certificates when accessing Buildbarn’s RBE API endpoint.
Configuring certificates on client
So far we’ve configured Buildbarn to always use TLS-encrypted connections. This
means that our current client setup will no longer work, because it doesn’t
expect TLS. In this section we’ll generate a client certificate for it
using the cmctl tool, configure Bazel to both validate the server certificate
and use this new client certificate when communicating with Buildbarn, and show
the final complete example.
First, note that, as mentioned, if we run Bazel with the current client
configuration, it will fail because it uses an unencrypted connection to an
encrypted endpoint:
To address that, we need to generate client certificates and configure Bazel to
use them.
Generating the client certificate
We will use cert-manager and its CLI client cmctl to generate a certificate
for our client. First, we need to create a Certificate object template in
cert-template.yaml:
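A sketch of such a template, reusing the ca issuer from before (the names and key parameters are assumptions):

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: client
  namespace: buildbarn
spec:
  commonName: client
  secretName: client
  privateKey:
    algorithm: ECDSA
    size: 256
  issuerRef:
    name: ca
    kind: Issuer
    group: cert-manager.io

Then cmctl can act on it with something like:

cmctl create certificaterequest -n buildbarn client \
  --from-certificate-file cert-template.yaml --fetch-certificate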
It will use this certificate template as if it was created in Kubernetes: it
will generate a key in client.key, create a Certificate Signing Request (CSR)
from it, embed that in a cert-manager CertificateRequest and send it, wait for
the server to sign it, and finally retrieve the resulting certificate to
client.crt.
We also need a CA certificate to verify server certificates. We can use the
same command we used for Buildbarn configuration here:
kubectl -n buildbarn get certificaterequests ca-1 -o jsonpath='{.status.ca}' | base64 -d > ca.crt
You can make sure that the client certificate is signed with this CA certificate by
adding openssl to shell.nix and running:
openssl verify -CAfile ca.crt client.crt
It will output client.crt: OK if everything is correct.
Building with certificates
All that’s left is to tell Bazel to use these certificates to connect to
Buildbarn. We’ll need to convert the private key to PKCS#8 format for it and
add these settings to .bazelrc:
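A sketch of both steps (the file paths and endpoint carry over from this setup; the flags are Bazel’s standard TLS options):

openssl pkcs8 -topk8 -nocrypt -in client.key -out client.pem

# .bazelrc
build:remote --remote_executor=grpcs://127.0.0.1:30080
build:remote --tls_certificate=ca.crt
build:remote --tls_client_certificate=client.crt
build:remote --tls_client_key=client.pem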
We’ve shown how to deploy Buildbarn on Kubernetes, how to configure mTLS
between all its components, and how to use TLS authentication with RBE API
clients using Bazel as an example. This is a starting configuration that can
be improved in several aspects not covered here:
The Buildbarn browser and the scheduler web UIs are neither exposed nor
encrypted;
cert-manager is not configured to limit access to certificate generation,
meaning that anyone with access to Kubernetes API has access to all its
capabilities;
no limits are imposed on client certificates, they only need to be valid;
there is no automation for client certificate renewal;
and only certificates are used for authentication, which is secure but can be
enhanced or replaced with OAuth, which is more flexible and provides better
control.
All these are interesting topics that would each deserve their own blog post.
So, you’ve heard of the new hotness that is Nix, for creating reproducible and isolated development environments, and want to use it for your new Haskell project? But you are unclear about how to get started? Then this is the guide you are looking for.
Nix is notoriously hard to get started with. If you are familiar with Haskell, you may have an easier time learning the Nix language, but it is still difficult to figure out the various toolchains and library functions needed to put your knowledge of the Nix language to use. There are some frameworks for setting up Haskell projects with Nix, but again, they are hard to understand because of their large feature scopes. So, in this post, I’m going to show a really easy way for you to get started.
Nix for Haskell
But first, what does it mean to use Nix for a Haskell project? It means that all the dependencies of our projects — Haskell packages, and non-Haskell ones too — come from Nixpkgs, a repository of software configured and managed using Nix1. It also means that all the tools we use for development, such as builders, linters, style checkers, LSP servers, and everything else, also come from Nixpkgs2. And all of this happens by writing some configuration files in the Nix language.
Start with creating a new directory for the project. For the purpose of this post, we name this project ftr:
$ mkdir ftr
$ cd ftr
The first thing to do is to set up the project to point to the Nixpkgs repo — more specifically, a particular fixed version of the repo — so that our builds are reproducible3. We do this by using Niv.
Niv is a tool for pinning/locking down the version of the Nixpkgs repo, much like cabal freeze or npm freeze. But instead of pinning each dependency at some version, we pin the entire repo (from which all the dependencies come) at a version.
Run the following commands:
$ nix-shell -p niv
$ niv init
Running nix-shell -p niv drops us into a nested shell in which the niv executable is available. Running niv init sets up Niv for our project, creating nix/sources.{json|nix} files. The nix/sources.json file is where the Nixpkgs repo version is pinned4. If we open it now, it may look something like this:
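Something along these lines (the hashes are placeholders, not real values):

{
  "nixpkgs": {
    "branch": "nixpkgs-unstable",
    "owner": "NixOS",
    "repo": "nixpkgs",
    "rev": "<commit hash>",
    "sha256": "<sha256 hash>",
    "type": "tarball",
    "url_template": "https://github.com/<owner>/<repo>/archive/<rev>.tar.gz"
  }
}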
By default, Niv sets up the Nixpkgs repo, pinned to some version. Let’s pin it to the latest stable version as of the time of writing this post: 24.05. Run:
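$ niv update nixpkgs -b nixos-24.05

With Nixpkgs pinned, create a shell.nix; a sketch matching the description below (the devTools flag and the use of shellFor follow the text, the rest is an assumption):

{ system ? builtins.currentSystem, devTools ? true }:
let
  sources = import ./nix/sources.nix;
  pkgs = import sources.nixpkgs { inherit system; };
in
pkgs.haskellPackages.shellFor {
  # no local packages yet; our own package gets added here later
  packages = p: [ ];
  nativeBuildInputs = with pkgs; [
    ghc
    cabal-install
  ] ++ pkgs.lib.optionals devTools [
    niv
    hlint
    ormolu
    haskell-language-server
  ];
}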
Ah! Now, the Nix magic is shining through. What shell.nix does is, it creates a custom Nix shell with the things we mention already available in the shell. pkgs.haskellPackages.shellFor is how we create the custom shell, and nativeBuildInputs are the tools we want available.
We make ghc and cabal-install mandatorily available, because they are necessary for doing any Haskell development; and niv, hlint, ormolu and haskell-language-server67 optionally available (depending on the passed devTools flag), because we need them only when writing code.
Exit the previous Nix shell, and start a new one to start working on the project8:
$ nix-shell --arg devTools false
Okay, I lied again, we are still setting up. In this new shell, hlint, ormolu etc. are not available, but we can run cabal now. We use it to initialize the Haskell project:
$ cabal init -p ftr
After answering all the questions Cabal asks us, we are left with a ftr.cabal file, along with some starter Haskell code in the right directories. Let’s build and run the starter code:
$ cabal run
Hello, Haskell!
It works!
Edit the ftr.cabal file now to add some new Haskell dependency (without a version), such as extra. If we run cabal build now, Cabal will start downloading the extra package. Cancel that! We want our dependencies to come from Nixpkgs, not Hackage. For that we need to tell Nix about our Haskell project.
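Create a package.nix; a sketch consistent with the next paragraph (cabal2nix does the heavy lifting via callCabal2nix):

{ system ? builtins.currentSystem }:
let
  sources = import ./nix/sources.nix;
  pkgs = import sources.nixpkgs { inherit system; };
  hlib = pkgs.haskell.lib;
in
hlib.dontHaddock (pkgs.haskellPackages.callCabal2nix "ftr" ./. { })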
The package.nix file is the Nix representation of the Cabal package for our project. We use cabal2nix here, a tool that makes Nix aware of Cabal files, making it capable of pulling the right Haskell dependencies from Nixpkgs. We also configure Nix to not run Haddock on our code by setting the hlib.dontHaddock option9, since we are not going to write any doc for this demo project.
Now, edit shell.nix to make it aware of our new Nix package:
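A sketch of the updated shell.nix (the extend pattern is an assumption; the point is that ftr joins the package set and the previously empty packages list):

{ system ? builtins.currentSystem, devTools ? true }:
let
  sources = import ./nix/sources.nix;
  pkgs = import sources.nixpkgs { inherit system; };
  # extend the package set from Nixpkgs with our own package
  haskellPackages = pkgs.haskellPackages.extend (final: prev: {
    ftr = final.callCabal2nix "ftr" ./. { };
  });
in
haskellPackages.shellFor {
  packages = p: [ p.ftr ];
  nativeBuildInputs = with pkgs; [
    cabal-install
  ] ++ pkgs.lib.optionals devTools [
    niv
    hlint
    ormolu
    haskell-language-server
  ];
}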
We extend Haskell packages from Nixpkgs with our own package ftr, and add an entry in the previously empty packages list. This makes all the Haskell dependencies we mention in ftr.cabal available in the Nix shell. Exit the Nix shell now, and restart it by running:
$ nix-shell --arg devTools false
We can run cabal build now. Notice that nothing is downloaded from Hackage this time.
Even better, we can now build our project using Nix:
$ nix-build package.nix
This builds our project in a truly isolated environment outside the Nix shell, and puts the results in the result directory. Go ahead and try running it:
$ result/bin/ftr
Hello, Haskell!
Great! Now we can quit and restart the Nix shell without the --arg devTools false option. This will download and set up all the fancy dev tools we configured. Then we can start our favorite editor from the terminal and have access to all of them in it10.
This is all we need to get started on a Haskell project with Nix. There are some inconveniences in this setup, like needing to restart the Nix shell and the editor every time we modify our project dependencies, but these days most editors come with extensions to do this automatically, without needing restarts. For a more seamless experience in the terminal, we could install direnv and nix-direnv to refresh the Nix shell automatically11.
Bonus Round: Flakes
As a bonus, I’m going to show how to easily set up a Nix Flake for this project. Simply create a flake.nix file:
{
  description = "ftr is demo project for using Nix to manage Haskell projects";
  inputs.flake-utils.url = "github:numtide/flake-utils";
  outputs = { self, flake-utils }:
    flake-utils.lib.eachDefaultSystem (system:
      let
        ftr = import ./package.nix { inherit system; };
      in rec {
        devShells.default = import ./shell.nix { inherit system; };
        packages.default = ftr;
        apps.default = {
          type = "app";
          program = "${ftr}/bin/ftr";
        };
      });
}
flake.nix
We reuse the package and shell Nix files we created earlier. We have to commit everything to our VCS at this point. After that, we can run the newfangled Nix commands such as12:
$ nix develop          # same as: nix-shell
$ nix build            # same as: nix-build package.nix
$ nix shell            # builds the package and starts a shell with the built executable available
$ nix run              # builds the package and runs the built executable
$ nix profile install  # builds the package and installs the built executable in our Nix profile
If we upload the project to a public Github repo, anyone with Nix set up can run and/or install our package executable by running single commands:
$ nix run github:username/ftr             # downloads, builds and runs without installing
$ nix profile install github:username/ftr # downloads, builds and installs
If that’s not super cool, then I don’t know what is.
Bonus Round 2: Statically Linked Executable
Create a file package-static.nix and nix-build it to create a statically linked executable on Linux13, which can be run on any Linux machine without installing any dependency libraries or even Nix14:
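A sketch of package-static.nix (the pkgsStatic wiring is an assumption; per the footnote, the real setup also builds GHC with enableDwarf = false, which needs GHC >= 9.6):

{ system ? builtins.currentSystem }:
let
  sources = import ./nix/sources.nix;
  pkgs = import sources.nixpkgs { inherit system; };
  hlib = pkgs.haskell.lib;
  # pkgsStatic builds everything against musl, yielding static binaries
  staticHaskellPackages = pkgs.pkgsStatic.haskellPackages;
in
hlib.justStaticExecutables
  (hlib.dontHaddock (staticHaskellPackages.callCabal2nix "ftr" ./. { }))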
This post shows a quick and easy way to get started with using Nix for managing simple Haskell projects. Unfortunately, if we have any complex requirements, such as custom dependency versions, patched dependencies, custom non-Haskell dependencies, custom configuration for Nixpkgs, multi-component Haskell projects, using a different GHC version, custom build scripts etc, this setup does not scale. In such case you can either grow this setup by learning Nix in more depth with the help of the official Haskell with Nix docs and this great tutorial, or switch to using a framework like Nixkell or haskell-flake.
This post only scratches the surface of all things possible to do with Nix. I hope I was able to showcase some benefits of Nix, and help you get started. Happy Haskelling and happy Nixing!
One big advantage that Nix has over using Cabal for managing Haskell projects is the Nix binary cache that provides pre-built libraries and executables for download. That means no more waiting for Cabal to build scores of dependencies from source.↩︎
I’m assuming that you’ve already set up Nix at this point. If you have not, follow this guide.↩︎
Of course, we can use Niv to manage any number of source repos, not just Nixpkgs. But we don’t need any other for this post.↩︎
We could do all sort of interesting and useful things here, like patching some Nixpkgs packages with our own patches, reconfiguring the build flags of some packages, etc.↩︎
hlint is a Haskell linter, ormolu is a Haskell file formatter, and haskell-language-server is an LSP server for Haskell. Other tools that I find useful are stan, the Haskell static analyzer, just, the command runner, and nixfmt, the Nix file formatter. All of them and more are available through Nixpkgs. You can start using them by adding them to nativeBuildInputs.↩︎
If you are wondering why we need to wrap only haskell-language-server with all the ghc stuff, that’s because, to work correctly, haskell-language-server must be compiled with the same version of ghc that your project uses. The other tools do not have this restriction.↩︎
You may notice Nix downloading a lot of stuff from Nixpkgs. It may occasionally need to build a few things as well, if they are not available in the binary cache.
You may need to tweak the connect-timeout and download-attempts settings in the nix.conf file if you are on a slow network.↩︎
There are many more options that we can set here. These options roughly correspond to the command line options for the cabal command. See a comprehensive list here.↩︎
To update the tools and dependencies of the project, run niv update nixpkgs, and restart the Nix shell.↩︎
Use this .envrc file to configure direnv for automatic refreshes for this project:
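A sketch (watch_file makes direnv reload when the Nix files change):

watch_file shell.nix
watch_file package.nix
watch_file nix/sources.json
use nix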
This might take several hours to finish when run for the first time. Also, the enableDwarf = false config requires GHC >= 9.6.↩︎
Another benefit of statically linked executables is, if you package them in Docker/OCI containers, the container sizes are much smaller than ones created for dynamically linked executables.↩︎
Let's write a simple program to manage purchases at a small convenience store. The store only sells two items: eggs and apples. We know the price of each item, and we need to set aside 5% of every purchase for taxes. We should really use a decimal type instead of floats for handling currency, but we'll simplify things a bit here for convenience.
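A sketch of that starting point, reconstructed from the description (the prices are made up; the names match the code later in the post):

const PRICE_EGGS: f64 = 0.25;
const PRICE_APPLES: f64 = 0.50;
const TAX_RATE: f64 = 0.05;

#[derive(Debug, Default)]
struct Accounts {
    company_balance: f64,
    taxes_paid: f64,
}

impl Accounts {
    fn log_purchase(&mut self, money: f64) {
        // set aside 5% of every purchase for taxes
        let taxes = money * TAX_RATE;
        self.taxes_paid += taxes;
        self.company_balance += money - taxes;
    }

    fn buy_eggs(&mut self, quantity: u64) {
        self.log_purchase(quantity as f64 * PRICE_EGGS);
    }

    fn buy_apples(&mut self, quantity: u64) {
        self.log_purchase(quantity as f64 * PRICE_APPLES);
    }
}

fn main() {
    let mut accounts = Accounts::default();
    accounts.buy_eggs(6);
    accounts.buy_apples(10);
    println!("{accounts:#?}");
}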
We now have a highly sophisticated and bullet-proof accounting system for our store; no tax auditor could ever object to such pristine bookkeeping! We continue to run our successful little business and soon make enough money to open a second location. Let's say our first business was in Arizona, and now we want to expand into the Nevada market.
All good... except that the tax rates in the two states are different! While Arizona is 5%, Nevada is 8%. How can we model this in our code?
One possibility would be to pass in the tax rate as a parameter to log_purchase. Let's give that a shot:
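A sketch of that version, with the bug described next (the Arizona and Nevada rates come from the text):

const TAX_RATE_ARIZONA: f64 = 0.05;
const TAX_RATE_NEVADA: f64 = 0.08;

impl Accounts {
    fn log_purchase(&mut self, money: f64, tax_rate: f64) {
        let taxes = money * tax_rate;
        self.taxes_paid += taxes;
        self.company_balance += money - taxes;
    }

    fn buy_eggs(&mut self, quantity: u64, tax_rate: f64) {
        self.log_purchase(quantity as f64 * PRICE_EGGS, tax_rate);
    }

    fn buy_apples(&mut self, quantity: u64, tax_rate: f64) {
        // Bug: the two f64 arguments are swapped!
        self.log_purchase(tax_rate, quantity as f64 * PRICE_APPLES);
    }
}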
That's not too bad... until you realize that there's a bug in the code above. Look at the implementation of buy_apples. We've accidentally provided the tax_rate as the amount of money the apples cost! Easy mistake to make, and thankfully easy enough to fix:
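A sketch of the fix, swapping the arguments back:

fn buy_apples(&mut self, quantity: u64, tax_rate: f64) {
    self.log_purchase(quantity as f64 * PRICE_APPLES, tax_rate);
}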
"Huh," some vague part of my brain screams out. "It was way too easy to write buggy code. Can we fix that?" At this point, I think that proponents of dynamic typing can (rightfully) claim a small victory here. I've written some reasonable code in Rust, a statically typed language, and the compiler couldn't stop me from making a silly mistake. As a proponent of types, I begin to question the fabric of reality and my entire stance on programming. But no time for that, I'm too busy expanding my store to other states!
Soon enough, we're ready to expand further into Utah. Utah also has a sales tax, but they exempt eggs from their sales tax because it's an essential good. (And if anyone's about to fact check me: I've completely made up all the tax rates and rules in this post.) Anyway, our existing Accounts struct and its API is totally up to the challenge here, and we can easily implement this correctly:
fn main() {
let mut accounts = Accounts::default();
accounts.buy_eggs(6, TAX_RATE_ARIZONA);
accounts.buy_apples(10, TAX_RATE_NEVADA);
accounts.buy_eggs(12, TAX_RATE_UTAH);
accounts.buy_apples(2, 0.0); // essential goods have no taxes in Utah
println!("{accounts:#?}");
}
Easy peasy... and broken! Once again, I've made a simple mistake, and the type system and my APIs have done nothing to protect me. I've set the tax rate in Utah at 0%... but for the purchase of apples, not eggs! Once again, it's an easy fix:
accounts.buy_eggs(12, 0.0); // essential goods have no taxes in Utah
accounts.buy_apples(2, TAX_RATE_UTAH);
But these recurring bugs are frustrating, and frankly the code structure is completely unsatisfactory. I've needed to put some of the logic for tax collection into our main function, while other parts live in log_purchase. And the types do nothing to protect us. Is there anything we can do about this?
Strong types, local logic
I want to bash apart the code above using two principles:
Use strong types when possible. This isn't the same as static types. Static typing simply means that all variables have a known type. Strong typing is about making those types meaningful. In our log_purchase method, we currently have weak typing. We take two parameters, money and tax_rate. They're both f64s, and nothing prevents us from swapping the argument order by mistake.
Keep logic as local as possible. We're currently making decisions on the taxes in two places: determining the tax rate in main, and calculating the taxes incurred in log_purchase. We also need to pass that logic through the buy_eggs and buy_apples methods.
Let's start with trying to address the second point. I'd like to have all tax logic present in log_purchase. That means I need to know if the purchase is taxable or not. One possibility would be adding a new parameter to indicate if taxes should be collected:
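A sketch of that version (the boolean parameter comes last):

fn log_purchase(&mut self, money: f64, tax_rate: f64, collect_taxes: bool) {
    let taxes = if collect_taxes { money * tax_rate } else { 0.0 };
    self.taxes_paid += taxes;
    self.company_balance += money - taxes;
}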
We've added in a new parameter, but it's just as weakly typed as an f64. (In this case, I'd call it boolean blindness.) While we don't have to worry about accidentally swapping parameters, who's to say if true means "collect taxes" versus "exempt from taxes?" Sure, you can look at the code or read the docs... but who's going to do that? I want my compiler to save me!
It still requires performing logic in the caller to determine if this particular purchase is required to pay taxes, which still keeps our logic split up.
Instead of this slapdash approach, let's try to think of it from the bottom up.
Data driven
What information do we need to know to determine if taxes can be charged? Two things:
Which state the purchase took place in
What item was purchased
With that stated, it's easy enough to create some helper data types to begin modeling this more appropriately:
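A sketch of those types and the reworked method (the variants follow the states and items in the story):

#[derive(Clone, Copy, Debug)]
enum State {
    Arizona,
    Nevada,
    Utah,
}

#[derive(Clone, Copy, Debug)]
enum Item {
    Eggs,
    Apples,
}

impl Accounts {
    fn log_purchase(&mut self, money: f64, tax_rate: f64, state: State, item: Item) {
        // the "essential goods" exemption now lives here, not in the caller
        let collect_taxes = match (state, item) {
            (State::Utah, Item::Eggs) => false,
            _ => true,
        };
        let taxes = if collect_taxes { money * tax_rate } else { 0.0 };
        self.taxes_paid += taxes;
        self.company_balance += money - taxes;
    }
}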
Now we're fully implementing our "essential goods" check within log_purchase, with none of the logic leaking out. And our new types are properly strong types; it's impossible to accidentally swap the State and Item with one of the f64 parameters, since they have totally different types.
It's not like everything is perfect yet. We can still easily write this incorrect code:
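For example (a sketch): nothing stops us from pairing the Nevada rate with an Arizona purchase:

accounts.log_purchase(3.0, TAX_RATE_NEVADA, State::Arizona, Item::Apples);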
But this is also easily rectified. Now that we're passing in a State parameter to log_purchase, we can determine the tax rate ourselves within that function. And passing in a State value instead of an f64 prevents us from accidentally providing the parameters in the wrong order.
But you may have noticed something else: the tax_rate parameter is now redundant! Thanks to providing more information to log_purchase, it can be more intelligent in its own functioning, reducing burden on callers and removing a potential mismatch such as this code:
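For instance, a rate that contradicts the state (sketch):

accounts.log_purchase(5.0, TAX_RATE_ARIZONA, State::Nevada, Item::Eggs);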
And just like that, log_purchase doesn't require any outside logic to determine how to collect taxes. You simply, declaratively, and in a strongly-typed manner, provide it the information necessary for it to do its job, and the method carries out all the logic.
We could even go a step farther if we wanted, and have log_purchase handle the calculation of the cost of the goods too:
fn log_purchase(&mut self, quantity: u64, state: State, item: Item) {
let collect_taxes = match (state, item) {
(State::Utah, Item::Eggs) => false,
_ => true,
};
let money = quantity as f64 * item.price();
let taxes = if collect_taxes {
money * state.tax_rate()
} else {
0.0
};
self.taxes_paid += taxes;
self.company_balance += money - taxes;
}
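This version leans on two helper methods; a sketch of what they might look like (Utah's rate is made up, like the others in this post):

impl State {
    fn tax_rate(self) -> f64 {
        match self {
            State::Arizona => 0.05,
            State::Nevada => 0.08,
            State::Utah => 0.06,
        }
    }
}

impl Item {
    fn price(self) -> f64 {
        match self {
            Item::Eggs => 0.25,
            Item::Apples => 0.50,
        }
    }
}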
And with that in place, you may even decide that helper methods like buy_eggs and buy_apples aren't worth it:
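A sketch of the resulting main, calling log_purchase directly:

fn main() {
    let mut accounts = Accounts::default();
    accounts.log_purchase(6, State::Arizona, Item::Eggs);
    accounts.log_purchase(10, State::Nevada, Item::Apples);
    accounts.log_purchase(12, State::Utah, Item::Eggs);
    accounts.log_purchase(2, State::Utah, Item::Apples);
    println!("{accounts:#?}");
}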
OK, so we moved some code around, centralized some logic, and now everything is nicer. We have some type safety in place too. You may be looking at this as small gains for introducing a lot of type complexity. But here are my closing thoughts:
Sure, this silly example may not warrant the type machinery for protection. But it's very easy to scale up from such a simple example to real-world use cases where the type safety prevents far more complex and insidious bugs.
I'd argue that there's not really any complexity here. We introduced two new data types and a new method on each of them, but also removed two helper functions and five constants. I'd take that trade in complexity any day.
The next set of features we want to implement will become even easier to make. For example, take both the original weakly typed version and the new strongly typed version, and try implementing these changes:
In Arizona only, reduce the cost of apples to 0.45 per apple when you purchase 12 or more.
Allow the price of the goods to change during the course of execution. In other words, don't hard-code in all the prices.
In my opinion, the strongly typed version makes both of these tasks much easier and safer.
So what's the overarching lesson to be learned here? I'd put it this way:
Identify the inputs needed for your functions to perform all their logic, avoiding splitting up that logic into multiple parts of your code base. Use well defined, strong types to represent that input cleanly.
It may sound simple, and perhaps obvious. But the next time you feel yourself succumbing to writing yet-another-weird-hack to address an unexpected business requirement, see if reframing the question from "how can I quickly add this feature" to "what's the best way to model the requirements as inputs and outputs" helps you come up with a better design.
This may remind you of an episode from Huckleberry Finn:
Well, then I happened to think how they always put quicksilver in
loaves of bread and float them off, because they always go right to
the drownded carcass and stop there.
When I first read this I assumed it was a local Southern superstition,
characteristic of that place and time. But it seems not! According
to
this article by Dan Rolph of the Historical Society of Pennsylvania,
the belief was longstanding and widespread, lasting from at least 1767 to
1872, and appearing also in London and in Pennsylvania.
Details of the dancing bread trick are lacking. I guess the
quicksilver stays inside the stopped-up quill. (Otherwise, there
would be no need to “stop it close”.) Then perhaps on being heated by
the bread, the quicksilver expands lengthwise as in a thermometer, and
then… my imagination fails me.
The procedure for making drowned-body-finding bread is quite
different. Rolph's sources all agree: you poke in your finger and
scoop out a bit of the inside, pour the quicksilver into the cavity,
and then plug up the hole. So there's no quill; the quicksilver is
just sloshing around loose in there. Huckleberry Finn agrees:
I took out the plug and shook out the little dab of quicksilver…
Does anyone have more information about this? Does hot bread filled
with mercury really dance on the table, and if so why? Is the
superstition about bread finding drowned bodies related to this, or is
it a coincidence?
Also, what song did the sirens sing, and by what name was Achilles
called when he hid among women?
This post is about the bottom center panel, “Game Theory final exam”.
I don't know much about game theory and I haven't seen any other
discussion of this question. But I have a strategy I think is
plausible and I'm somewhat pleased with.
(I assume that answers to the exam question must be real numbers — not
$\infty$ or $-\infty$ — and
that “average” here is short for ‘arithmetic mean’.)
First, I believe the other players and I must find a way to agree on
what the average will be, or else we are all doomed. We can't
communicate, so we should choose a Schelling point and hope that
everyone else chooses the same one. Fortunately, there is only one
distinguished choice: zero. So I will try to make the average zero
and I will hope that others are trying to do the same.
If we succeed in doing this, any winning entry will therefore be
positive, say $10$. Not all players can win because the average must be
$0$. But $n-1$ players can win, if the one other player writes
$-10(n-1)$. So my job is to decide whether I will be the loser. I
should select a random integer between $0$ and $n-1$. If it is
zero, I have drawn a short straw, and will write
$-10(n-1)$; otherwise I write $10$.
(The straw-drawing analogy is perhaps misleading. Normally, exactly
one straw is short. Here, any or all of the straws might be short.)
If everyone follows this strategy, then I will win if exactly one
person draws a short straw and if that one person isn't me. The
former has a probability that rapidly approaches $\frac1e$ as $n$
increases, and the latter is $\frac{n-1}n$. In an $n$-person class,
the probability of my winning is $$\left(\frac{n-1}n\right)^n$$ which
is already better than $\frac13$ when $n = 6$, and it increases slowly
toward $\frac1e$ after that.
Some miscellaneous thoughts:
The whole thing depends on my idea that everyone will agree on
$0$ as a Schelling point. Is that even how Schelling points
work? Maybe I don't understand Schelling points.
I like that the probability $\frac1e$ appears. It's surprising
how often this comes up, often when multiple
agents try to coordinate without communicating. For example, in
ALOHAnet a number of ground
stations independently try to send packets to a single satellite
transceiver, but if more than one tries to send a packet at a
particular time, the packets are garbled and must be retransmitted.
At most $\frac1e$ of the available bandwidth can be used, the
rest being lost to packet collisions.
The first strategy I thought of was plausible but worse: flip a
coin, and write down $10$ if it is heads and $-10$ if it is
tails. With this strategy I win if exactly half of the
class flips heads and if I do too. The probability of this
happening is only $$\frac{n\choose n/2}{2^n}\cdot \frac12 \approx
\frac1{\sqrt{2\pi n}}.$$ Unlike the other strategy, this decreases to
zero as $n$ increases, and in no case is it better than the
first strategy. It also fails badly if the class contains an odd
number of people.
Just because this was the best strategy I could think of in no way
means that it is the best there is. There might have been
something much smarter that I did not think of, and if there is
then my strategy will sabotage everyone else.
Going in the other direction, even if $n-1$ of the smartest
people all agree on the smartest possible strategy, if the $n$th
person is Leeroy Jenkins, he is going to ruin it for everyone.
If I were grading this exam, I might give full marks to anyone who
wrote down either $10$ or $-10(n-1)$, even if the average came
out to something else.
For a similar and also interesting but less slippery question, see
Wikipedia's article on
Guess ⅔ of the average. Much of
the discussion there is directly relevant. For example, “For Nash
equilibrium to be played, players would need to assume both that
everyone else is rational and that there is common knowledge of
rationality. However, this is a strong assumption.” LEEROY
JENKINS!!
[ Addendum: suppose one player — call him Vidkun — skips the die roll and
always writes $10$. The other $n-1$ players follow the strategy above, and
all $n$ players (including Vidkun) win if exactly one of
them rolls zero. Vidkun's chance of winning increases. Intuitively,
the other players' chances of winning ought to decrease. But by
how much? I think I keep messing up the calculation because I keep
getting zero. If this were actually correct, it would be a
fascinating paradox! ]
One core value of Tweag is its dedication to the open-source community. Although
our interests and expertise have become significantly broader over the years, our love
for immutable, composable and typed architecture has made functional
programming and programming languages in general an important part of our DNA.
This long-standing activity was formalized last year as the Programming
Languages & Compilers Group. The PL&C group has been busy in the
second quarter of 2024, and this post is a summary of what we’ve been doing.
Our involvement varies depending on the availability of each team member and
client engagements; if some projects might seem idle, this is usually just
temporary. All projects appearing below are actively developed.
Rust
We have a bunch of Rust engineers at Tweag, and some of them have recently
started to contribute to the Rust package manager cargo, which is a key part of
the ecosystem. This is a choice motivated by our interest and expertise in build systems,
our love for cargo and the need for contributions there.
Currently, in a cargo workspace spanning multiple crates, we can only publish
one crate at a time, in dependency order. It’s been a
long-standing issue
as well as a
focus area
to be able to publish all the crates at once. To get there:
Joe Neeman implemented local registry overlays, which make it possible to
package a crate even if its dependencies aren’t published yet.
(#13926)
Joe Neeman and Tor Hovland added support to package all the crates in the
workspace in a single command. Crates must still be published one at a time.
(#13947,
#14074)
Typically, cargo update is used to update dependencies to the latest versions
that satisfy the version requirements defined in Cargo.toml. If you wanted to
update the version requirements themselves to the latest available versions, you
might use the 3rd party command cargo upgrade from cargo-edit. Another focus
area for cargo is to bring this capability into cargo update.
Tor Hovland implemented cargo update --breaking, which will upgrade the
version requirements in Cargo.toml if there are breaking changes.
(#13979,
#14049,
#14259)
Tor Hovland implemented support for making breaking upgrades when doing a
specific version update with cargo update --precise. At the time of writing,
this isn’t merged yet.
(#14140)
Furthermore, we have contributed with various smaller fixes:
(#13874,
#13886,
#13960)
Haskell
GHC
GHC is the de facto standard Haskell compiler. Several of our GHC engineers are
currently working on making it easier to seamlessly interface GHC with external
build tools, such as Buck2. Most of our work on GHC is on
behalf of Mercury.
Torsten Schmits and Cheng Shao have been working on supporting bytecode linking for Template Haskell dependencies in single-file compilation mode, which allows external build systems to take advantage of the performance improvements for Template Haskell that Cabal builds have been enjoying for some time now. (GHC MR 13042)
Torsten Schmits and Cheng Shao have implemented a way to print dependency metadata for a set of Haskell modules as JSON, for which Buck2 had to parse GHC’s generated Makefiles before. (GHC MR 11994)
Sjoerd Visscher added single-file processing support to Haddock, allowing external build tools to incrementally (re-)build documentation for individual modules without compilation. (GHC MR 12707)
Torsten Schmits has been working on performance improvements for
dependency analysis. He
wrote a patch (for -Wmissing-home-modules)
that replaced a quadratic algorithm, reducing the startup time in
a project with 10,000 modules by over a minute. He also wrote a WIP patch that introduces parallelism into the first phase of dependency graph computation, promising a reduction of the duration of this phase by a factor of 4 in some of our projects.
Cheng Shao looked into GHC’s ARM64 Windows support (GHC issue) and made an MVP that can cross-compile simple Haskell programs to ARM64 Windows executable from Linux.
Cheng Shao performed GHC housecleaning and removed legacy code paths related to 32-bit Darwin/Windows (GHC announcement).
Cheng Shao has been working on Template Haskell support in GHC’s WASM backend (GHC issue). A WASM dynamic code loader based on LLVM’s WASM shared library ABI is being prototyped at the moment; once it’s finished, remaining Template Haskell support should be straightforward.
Joseph Fourment joined us for an internship, during which he researched and implemented the initial steps towards more general and flexible let-bound types, which he wrote about on this blog. This effort introduces the capability for GHC to reuse in-memory data structures for type subexpressions that are shared by multiple larger types, promising substantial performance improvements when compiling programs with complex type-level computation.
Liquid Haskell
The Liquid Haskell contributions from Tweag are spearheaded by
Facundo Domínguez. Liquid Haskell is a verification tool that allows you to write additional
lightweight formal specifications for your Haskell programs. These specifications
are then checked by the tool which discharges the proofs to SMT solvers, so that
you don’t have to do it yourself.
Facundo Domínguez released a new version of smtlib-backends with updates to
documentation. smtlib-backends is a library to interface with SMT solvers
via SMTLIB.
Nickel
Nickel is a configuration programming language developed by Tweag
aimed at infrastructure-as-code, build systems, Nix, or any complex system that
needs to be configured and where YAML, JSON, TOML and the like aren’t sufficient.
Yann Hamdaoui revived the previously stale nickel-kubernetes
repository. In combination with updates to json-schema-to-nickel,
we are able to auto-generate Nickel contracts (think “schemas”) for all Kubernetes
resources at any given version.
Yann Hamdaoui reworked the contract system quite a bit to better support
boolean operations that were previously missing on contracts, such as JSON Schema’s any_of and not.
Beside boolean operators, this rework also made it much more ergonomic to
write and compose custom contracts with custom error reporting.
(#1964,
#1970,
#1975,
#1987,
#1995)
Yann Hamdaoui added span information for data imported from TOML to make validation
errors more precise. (#1949)
Topiary
Topiary is a lightweight universal code formatter that relies on
Tree-sitter grammars to handle a variety of languages. It has been developed by
Tweag, and is used under the hood by the Nickel language to format code,
but is also a standalone tool.
The Topiary team released version 0.4.0, “Exquisite Elm”.
Highlights include improved Nickel formatting support and new CSS formatting support
from external contributor Eric Lavigne.
Erin van der Veen moved all of Topiary’s dependencies to either
published or vendored crates; that is, either those available on
crates.io, or subsumed directly into our codebase. This prepares the
ground for future releases of Topiary to crates.io, where projects
with ad hoc dependencies (such as those direct from GitHub) are
forbidden.
(#672)
Christopher Harrison feature gated language support, mainly to allow the development
of experimental language formatters without impinging on the supported
formatters. This ties in with his development (still in progress) of
formatting rules for Pact, the smart contract language for the Kadena
blockchain.
(#711,
#713)
Erin van der Veen made a number of background, “quality of life” improvements, such
as transitioning from TOML to Nickel for Topiary’s configuration.
This allows for less complicated merging using Nickel’s record merging,
especially in the future when Nickel implements custom merge functions.
Another goal of this PR was to evaluate the use of Nickel as a library,
which was a great success!
(#703)
Closing words
The PL&C group will continue to contribute to the projects mentioned above in
the near future. Stay tuned for the next quarterly report! In the meantime, you
can find Tweag’s open source portfolio on Github and come chat
with us on our Discord dedicated to our open-source activity, be it
as a user, as a potential contributor, or simply to satisfy your own curiosity.
If you're an annoying know-it-all like me, I suggest that you try playing the
following game when you attend a conference or a user group meetup or
even a work meeting. The game is:
If someone asks you a question, and you say
“I don't know”, you score a point.
That's it. That's the game. “I don't know” doesn't have to be
perfectly truthful, only approximately truthful.
I forgot, there is one other rule:
If you follow up with something like
“But if I had to guess…” you lose your point again.
I am delighted and horrified to announce a new graphical programming language
called Turnstyle. You can see an example below (click to run).
In the time leading up to ZuriHac 2024 earlier this year, I had been thinking
about Piet a little. We ended up working on something else during the
Hackathon, but this was still in the back of my mind.
Some parts of Piet’s design are utter genius (using areas for number literals,
using hue/lightness as cycles). There are also things I don’t like, such as the
limited amount of colors, the difficulty reusing code, and the lack of a
way to extend it with new primitive operations. I suspect these are part of the
reason nobody has yet tried to write, say, an RDBMS or a web browser in Piet.
Given the amount of attention going to programming languages in the functional
programming community, I was quite surprised nobody had ever tried to do a
functional variant of it (as far as I could find).
I wanted to create something based on Lambda Calculus. It forms a nice basis
for a minimal specification, and I knew that while code would still be somewhat
frustrating to write, there is the comforting thought of being able to reuse
almost everything once it’s written.
After playing around with different designs this is what I landed on. The
guiding principle was to search for a specification that was as simple as
possible, while still covering lambda calculus extended with primitives that,
you know, allow you to interact with computers.
One interesting aspect that I discovered (not invented) is that it’s actually
somewhat more expressive than Lambda Calculus, since you can build Abstract
Syntax Graphs (rather than just Trees). This is illustrated in the loop example
above, which recurses without the need for a fixed-point combinator.
For the full specification and more examples take a look at the Turnstyle
website and feel free to play around with the sources on GitHub.
Thanks to Francesco Mazzoli for useful feedback on the
specification and website.
In this episode, Niki and Andres talk with Sebastian, one of the main developers of Lean, currently working at the Lean Focused Research Organization. Today we talk about the addictive nature of theorem provers, what the sweet spot is between dependent types and simple programming, and how Lean is both a theorem prover and an efficient general-purpose programming language.
Benjamin Franklin wrote and published
Poor Richard's Almanack annually from 1732 to 1758. Paper was
expensive and printing difficult and time-consuming. The type would
be inked, the sheet of paper laid on the press, the apprentices would
press the sheet, by turning a big screw. Then the sheet was removed
and hung up to dry. Then you can do another printing of the same
page. Do this ten thousand times and you have ten thousand prints of
a sheet. Do it ten thousand more to print a second sheet.
Then print the second side of the first sheet ten
thousand times and print the second side of the second sheet ten
thousand times. Fold 20,000 sheets into eighths, cut and bind
them into 10,000 thirty-two page pamphlets and you have your
Almanacks.
As a youth, Franklin was apprenticed to his brother James, also a
printer, in Boston. Franklin liked the work, but James drank and beat
him, so he ran away to Philadelphia. When James died, Benjamin sent
his widowed sister-in-law Ann five hundred copies of the Almanack to sell.
When I first heard that I thought it was a mean present but I was
being a twenty-first-century fool. The pressing of five hundred
almanacks is no small feat of toil. Ann would have been able to sell
those Almanacks in her print shop for fivepence each, or ₤10 8s. 4d. That
was a lot of money in 1735.
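(The arithmetic: 500 copies at fivepence each is 2500d.; at 12 pence to the shilling that is 208s. 4d., and at 20 shillings to the pound, ₤10 8s. 4d.)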
In 1748 Franklin increased the size and the price. Here's a typical
page from the 1748 Almanack:
Wow, there's a lot of stuff going on there. Here's a smaller excerpt,
this time from November 1753:
The leftmost column is the day of the month, and then the next column
is the day of the week, with 2–7 being Monday through Saturday.
Sunday is denoted with a letter “G”. I thought this was G for God, but
I see that in 1748 Franklin used “C” and in 1752 he used “A”, so I
don't know.
The third column combines a weather forecast and a calendar. The
weather forecast is in italic type, over toward the right: “Clouds and
threatens cold rains and snow” in the early part of the month. Sounds
like November in Philadelphia. The roman type gives important days.
For example, November 1 is All Saints Day and
November 5 is the anniversary of the
Gunpowder Plot. November 10 is given as the
birthday of King George II, then still the King of Great Britain.
The Sundays are marked with some description in the Christian
liturgical calendar. For example, “20 past Trin.” means it's the
start of the 20th week past Trinity Sunday.
This column also has notations like “Days dec. 4 32” and
“Days dec. 5 h.” that I haven't been able to figure out. Something
about the decreasing length of the day in November maybe?
[ Addendum: Yes. See below. ]
The notation on November 6 says “Day 10 10 long” which is consistent
with the sunrise and sunset times Franklin gives for that day. The
fourth and fifth columns, labeled “☉ ris” and “☉ set” are the times of
sunrise and sunset, 6:55 (AM) and 5:05 (PM) respectively for November
6, ten hours and ten minutes apart as Franklin says.
“☽ pl.” is the position of the moon in the sky. (I guess “pl.” is
short for “place”.) The sky is divided into twelve “houses” of 30
degrees each, and when it says that the “☽ pl.” on November 6 is
“♓ 25” I think it means the moon is 25/30 of the way
along in the house of Pisces on its way to the house of Aries ♈. If
you look at the January 1748 page above you can see the moon making
its way through the whole sky in 29 days, as it does.
The last column, “Aspects, &c.” contains more astronomy. “♂ rise 6 13”
means that Mars will rise at 6:13 that day. (But in the morning or
the evening?) ⚹♃♀ on the 12th says that Jupiter is in sextile aspect
to Venus, which means that they are in the sky 60 degrees apart.
Similarly □☉♃ means that the Sun and Jupiter are in Square aspect, 90
degrees apart in the sky.
Also mixed into that last column, taking up the otherwise empty space,
are the famous wise sayings of Poor Richard. Here we see:
Serving God is Doing Good to Man,
but Praying is thought an easier Service,
and therefore more generally chosen.
Back on the January page you can see one of the more famous ones,
Lost Time is never found again.
Franklin published an Almanack in 1752, the year that the British
Calendar Act of 1751 updated
the calendar from Julian to Gregorian reckoning. To bring the
calendar into line with Gregorian, eleven days were dropped from
September that year. I wondered what Franklin's calendar looked like
that month. Here it is with the eleven days clearly missing:
The leftmost day-of-the-month column skips right from September 2 to
September 14, as the law required. On this copy someone has added the
old dates in the margin. Notice that St. Michael's Day,
which would have been on Friday September 18th in the old calendar,
has been moved up to September 29th.
In most years Poor Richard's Almanack featured an essay by
Poor Richard, little poems, and other reference material. The 1752
Almanack omitted most of this so that Franklin could use the space to
instead reprint the entire text of the Calendar Act.
This page also commemorates the
Great Fire of London, which began
September 2, 1666.
Wikipedia tells me that Franklin may have gotten the King's birthday
wrong. Franklin says November 10, but
Wikipedia says November 9,
and:
Over the course of George's life, two calendars were used: the Old
Style Julian calendar and the New Style Gregorian calendar. Before
1700, the two calendars were 10 days apart. Hanover switched from
the Julian to the Gregorian calendar on 19 February (O.S.) / 1 March
(N.S.) 1700. Great Britain switched on 3/14 September 1752. George
was born on 30 October Old Style, which was 9 November New Style,
but because the calendar shifted forward a further day in 1700, the
date is occasionally miscalculated as 10 November.
Several people have pointed out that the mysterious letters G, C, A on
Sundays are the so-called dominical letters,
used in remembering the correspondence between days of the month and
days of the week, and important in the determination of the dates of
Easter and other moveable feasts.
Why Franklin included them in the Almanack is not clear to me, as one of
the main purposes of the almanac itself is so that you do not have
to remember or calculate those things; you can just look them up in
the almanac.
Mikkel Paulson explained the 'days dec.' and 'days inc.' notations:
they describe the length of the day, but reported relative
to the length of the day at the most recent solstice. For example, the November
1753 excerpt for November 2 says "Days dec. 4 32". Going by the times
of sunrise and sunset on that day, the day was 10 hours 18 minutes
long. Adding the 4 hours 32 minutes from the notation we have 14
hours 50 minutes, which is indeed the length of the day on the summer
solstice in Philadelphia, or close to it.
Similarly the notation on November 14 says "Days dec. 5 h" for a day
that is 9 hours 50 minutes between sunrise and sunset, five hours
shorter than on the summer solstice, and the January 3 entry says
"Days inc. 18 m." for a 9h 28m day which is 18 minutes longer than the
9h 10m day one would have on the winter solstice.
Today, 2024-08-14, at 1830 UTC (11:30 am PDT, 2:30 pm EDT, 7:30 pm BST, 20:30 CEST, …)
we are streaming the 31st episode of the Haskell Unfolder live on YouTube.
Debugging space leaks can be one of the more difficult aspects of writing professional Haskell code. An important source of space leaks are unevaluated thunks in long-lived application data; in this episode of the Haskell Unfolder, we will see how we can take advantage of the nothunks library to make debugging and preventing these kinds of leaks significantly easier.
About the Haskell Unfolder
The Haskell Unfolder is a YouTube series about all things Haskell hosted by
Edsko de Vries and Andres Löh, with episodes appearing approximately every two
weeks. All episodes are live-streamed, and we try to respond to audience
questions. All episodes are also available as recordings afterwards.
Hello! Today we'll be looking into type-based search, what it is, how
it helps, and how to build one for the Unison programming language at
production scale.
Motivating type-directed search
If you've never used a type-directed code search like Hoogle, it's tough to fully
understand how useful it can be. Before starting on my journey to learn
Haskell I had never even thought to ask for a tool like it; now I reach
for it every day.
Many languages offer some form of code search, or at the very least a
package search. This allows users to find code which is relevant for the
task they're trying to accomplish. Typically you'd use these searches by
querying for a natural language phrase describing what you want, e.g.
"Markdown Parser".
This works great for finding entire packages, but searching at the
package level is often far too coarse-grained for the problem at hand. If
I just want to very quickly remember the name of the function which
lifts a Char into a Text, I already know it's
probably in the Text package, but I can save some time
digging through the package by asking a search precisely for definitions
of this type. Natural languages are quite imprecise, so a more
specialized query-by-type language allows us to get better results
faster.
If I search using google for "javascript function to group elements
of a list using a predicate" I find many different functions which do
some form of grouping, but none of them quite match the shape I
had in mind, and I need to read through blogs, stack-overflow answers,
and package documentation to determine whether the provided functions
actually do what I'd like them to.
In Haskell I can instead express that question using a type! If I
enter the type [a] -> (a -> Bool) -> ([a], [a])
into Hoogle I get a list of functions which match that type exactly;
there are few other operations with a matching signature, so I can
quickly open those definitions on Hackage and determine that
partition is exactly what I was looking for.
Hopefully this helps to convince you of the utility of a
type-directed search, though it does raise a question: if type-directed
search is so useful, why isn't it more
ubiquitous?
Here are a few possible reasons why that might be:
Some languages lack a sufficiently sophisticated type-system with
which to express useful queries
Some languages don't have a centralized package repository
Indexing every function ever written in your language can be
computationally expensive
It's not immediately obvious how to implement such a system
Read on and I'll do my best to help with the last
limitation.
Establishing our goals
Before we start building anything, we should nail down what our
search problem actually is.
Here are a few goals we had for the Unison type-directed search:
It should be able to find functions based on a partial
type-signature
The names given to type variables shouldn't matter.
The ordering of arguments to a function shouldn't matter.
It should be fast
It should scale
It should be good...
The last criterion is a bit subjective of course, but you know it
when you see it.
The method
It's easy to imagine search methods which match some of the
required characteristics. E.g. we can imagine iterating through every
definition and running the typechecker to see if the query unifies with
the definition's signature, but this would be far too
slow, and wouldn't allow partial matches or mismatched argument
orders.
Alternatively we could perform a plain-text search over rendered type
signatures, but this would be very imprecise and would break our
requirement that type variable names are unimportant.
Investigating prior art, Neil Mitchell's excellent Hoogle uses a
linear scan over a set of pre-built function-fingerprints for whittling
down potential matches. The level of speed accomplished with this method
is quite impressive!
In our case, Unison Share, the code-hosting platform and package
manager for Unison is backed by a Postgres database where all the code
is stored. I investigated a few different Postgres index variants and
landed on a GIN (Generalized inverted index).
If you're unfamiliar with GIN indexes the gist of it is that it
allows us to quickly find rows which are associated with any given
combination of search tokens. They're typically useful when implementing
full-text searches; for instance, we may choose to index a text document
like the following:
postgres=# select to_tsvector('And what is the use of a book without pictures or conversations?');
to_tsvector
-------------------------------------------------------
'book':8 'convers':12 'pictur':10 'use':5 'without':9
(1 row)
The generated lexemes represent a fingerprint of the text file which
can be used to quickly and efficiently determine a subset of stored
documents which can then be filtered using other more precise methods.
So for instance we could search for book & pictur to
very efficiently find all documents which contain at least one word that
tokenizes as book AND any word that tokenizes as
pictur.
I won't go too in-depth here on how GIN indexes work as you can
consult the excellent Postgres
documentation if you'd like a deeper dive into that area.
Although our problem isn't exactly full-text search, we can leverage
GIN indexes to do something similar: search type signatures by a set of
attributes.
The attributes we want to search for can be distilled from our
requirements; we need to know which types are mentioned in the
signature, and we need some way to normalize type variables and
argument position.
Let's come up with a way to tokenize type signatures into
the attributes we care about.
Computing Tokens for type signature search
Mentions of concrete types
If the user mentions a concrete type in their query, we'll need to
find all type signatures which mention it.
Consider the following signature:
Text.take : Nat -> Text -> Text
We can boil down the info here into the following data:
A type called Nat is mentioned once, and it
does NOT appear in the return type of the function.
A type called Text is mentioned twice, and it
does appear in the return type of the function.
There really aren't any rules on how to represent lexemes in a GIN
index; it's just a set of string tokens. Earlier we saw how
Postgres used an English language tokenizer to distill down the essence
of a block of text into a set of tokens; we can just as easily devise
our own token format for the information we care about.
Here's the format I went with for our search tokens:
<token-kind>,<number-of-occurrences>,<name|hash|variable-id>
So for the mentions of Nat in Text.take's
signature we can build the token mn,1,Nat. (the trailing dot is part
of the reversed path). It starts with
the token's kind (mn for Mention by Name), which prevents
conflicts between tokens even though they'll all be stored in the same
tsvector column. Next I include the number of times it's
mentioned in the signature, followed by its fully qualified name with
the path reversed.
In this case Nat is a single segment, but if the type
were named data.JSON.Array it would be encoded as
Array.JSON.data..
Why? Postgres allows us to do prefix matches over tokens in
GIN indexes. This allows us to search for matches for any valid suffix
of the query's path, e.g. mn,1,Array.*,
mn,1,Array.JSON.* or mn,1,Array.JSON.data.*
would all match a single mention of this type.
Users don't always know all of the arguments of a
function they're looking for, so we'd love for partial type matches to
still return results. This also helps us to start searching for and
displaying potentially relevant results while the user is still typing
out their query.
For instance Nat -> Text should still find
Text.take, so to facilitate that, when we have more than a
single mention we make a separate token for each of the
1..n mentions. E.g. in
Text.take : Nat -> Text -> Text we'd store both
mn,1,Text AND mn,2,Text in our set of
tokens.
We can't perform arithmetic in our GIN lookup, so this method is a
workaround which allows us to find any type where the number of mentions
is greater than or equal to the number of mentions in the query.
Type mentions by hash
This is Unison after all, so if there's a specific type you care
about but you don't care what the particular package has named that
type, or if there's even a specific version of a type
you care about, you can search for it by hash: E.g.
#abcdef -> #ghijk. This will tokenize into
mh,1,#abcdef and mh,1,#ghijk. Similar to name
mentions this allows us to search using only a prefix of the actual
hash.
Handling return types
Although we don't care about the order of arguments to a
given function, the return-type is a very high value piece of
information. We can add additional tokens to track every type which is
mentioned in the return type of a function by simply adding an
additional token with an r in the 'mentions' place, e.g.
mn,r,Text
We'll use this later to improve the scoring of returned results, and
may in the future allow performing more advanced searches like "Show me
all functions which produce a value of this type", a.k.a. functions
which return that type but don't accept it as an argument, or perhaps
"Show me all handlers of this ability", which corresponds to all
functions which accept that ability as an argument but don't
return it, e.g. 'mn,1,Stream' & (! 'mn,r,Stream').
A note on higher-kinded types and abilities: for types like
Map Text Nat and a -> {Exception} b, we
simply treat each of these as its own concrete type mention. The system
could be expanded to include more token types for each of these, but one
has to be wary of an explosion in the number of generated tokens and in
initial testing the search seems to work quite well despite no special
treatment.
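To make that concrete, here's an illustrative sketch of how the concrete-type
tokens for a signature could be generated, including the reversed-path
encoding, the 1..n occurrence tokens, and the return-type tokens. This is my
own toy code, not the actual Share implementation; the names Mention,
reversedName and concreteTokens are made up for illustration.
import qualified Data.Map as Map
-- A concrete type mention: the fully qualified name split into
-- segments, and whether this mention occurs in the return type.
data Mention = Mention { segments :: [String], inReturn :: Bool }
-- Reverse the path segments and add a trailing separator, so that a
-- prefix query can match any suffix of the original name.
reversedName :: [String] -> String
reversedName segs = concatMap (++ ".") (reverse segs)
-- Build the "mn" tokens for a signature: one token for each of the
-- 1..n occurrence counts of a name, plus an "mn,r,..." token for
-- names mentioned in the return type.
concreteTokens :: [Mention] -> [String]
concreteTokens mentions =
  [ "mn," ++ show k ++ "," ++ name
  | (name, n) <- Map.toList counts
  , k <- [1 .. n]
  ]
  ++ [ "mn,r," ++ reversedName (segments m) | m <- mentions, inReturn m ]
  where
    counts :: Map.Map String Int
    counts = Map.fromListWith (+)
      [ (reversedName (segments m), 1) | m <- mentions ]
-- For Text.take : Nat -> Text -> Text this yields
-- ["mn,1,Nat.","mn,1,Text.","mn,2,Text.","mn,r,Text."]
textTakeTokens :: [String]
textTakeTokens = concreteTokens
  [ Mention ["Nat"] False
  , Mention ["Text"] False
  , Mention ["Text"] True
  ]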
Mentions of type variables
Concrete types are covered, but what about type variables? Consider
the type signature const : b -> a -> b.
This type contains a and b which are
type variables. The names of type variables are not important
on their own; you can rename any type variable to anything you like as
long as you consider its scope and rename all the mentions of the same
variable within its scope.
To normalize the names of type variables I assign each variable a
numerical ID instead. In this example we may choose to assign
b the number 1 and a the number
2. However, we have to be careful because we also
want to be indifferent with regard to argument order. A search for
a -> b -> b should still find const! If
we assigned a to 1 and b to
2 according to the order of their appearance, we wouldn't
have a match.
To fix this issue we can simply sort the type variables according to
their number of occurrences: in this example a
has fewer occurrences than b, so it gets the lower variable
ID.
This means that both a -> b -> b and
b -> a -> b will tokenize to the same set of tokens:
v,1,1 for a, and v,1,2,
v,2,2, and v,r,2 for b.
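Here's a small sketch of that normalisation step (again illustrative rather
than the real implementation; normalizeVars is a made-up name): count
occurrences per variable name, sort ascending by count, and hand out IDs in
that order, so both a -> b -> b and b -> a -> b produce the same assignment.
import Data.List (sortOn)
import qualified Data.Map as Map
-- Assign each type variable a numeric ID based on its number of
-- occurrences (fewest first), so the result is stable under renaming
-- variables and reordering arguments.
normalizeVars :: [String] -> Map.Map String Int
normalizeVars occurrences =
  Map.fromList
    [ (var, ident)
    | (ident, (var, _count)) <- zip [1 ..] (sortOn snd (Map.toList counts))
    ]
  where
    counts :: Map.Map String Int
    counts = Map.fromListWith (+) [ (v, 1) | v <- occurrences ]
-- For const : b -> a -> b the occurrence list is ["b","a","b"]:
-- a occurs once and gets ID 1, b occurs twice and gets ID 2.
-- >>> normalizeVars ["b","a","b"]
-- fromList [("a",1),("b",2)]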
Parsing the search query
We could require that all queries are properly formed
type-signatures, but that's quite restrictive and we'd much rather allow
the user to be a bit sloppy in their search.
To that end I wrote a custom version of our type-parser that is
extremely lax in what it accepts: it will attempt to determine the arity
and return type of the query, but will also happily accept just a list
of type names. Searching for Nat Text Text and
Nat -> Text -> Text are both valid queries, but the
latter will return better results since we have information about both
the arity of the desired function and the return type. Once we've parsed
the query we can convert it into the same set of tokens we generated
from the type signatures in the codebase.
Performing the search
After we've indexed all the code in our system (in Unison this takes
only a few minutes) we can start searching!
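Mechanically, the lookup itself can then be as simple as ANDing a prefix match
for every token we derived from the query. Here's a sketch of rendering that
tsquery string (a simplification of whatever Share actually sends to
Postgres; toTsQuery is a made-up helper):
import Data.List (intercalate)
-- Render search tokens as a tsquery string requiring a prefix match
-- for each one, e.g. ["mn,1,Nat.","mn,2,Text."] becomes
-- "'mn,1,Nat.':* & 'mn,2,Text.':*"
toTsQuery :: [String] -> String
toTsQuery tokens = intercalate " & " [ "'" ++ t ++ "':*" | t <- tokens ]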
For Unison's search I've opted to require that each occurrence in the
query MUST be present in each match. However, for better partial
type-signature support I do include results which are missing specified
return types, but rank them lower than results with matching return
types.
Other criteria used to score matches include:
Types with an arity closer to the user's query are ranked higher.
How complex the type signature is: types with more tokens are ranked lower.
We give a slight boost to some core projects, e.g. Unison's standard library
base will show up higher in search results if it matches.
You can include a text search along with your type search to further
filter results, e.g. map (a -> b) -> [a] -> [b]
will prefer finding definitions with map somewhere in the name.
Queries can include a specific user or project to search within
to further filter results, e.g. @unison/cloud Remote
Summary
I hope that helps shed some light on how it all works, and perhaps
will help others in implementing their own type-directed-search down the
road!
If you're interested in digging deeper, Unison Share, and by proxy
the entire type-directed search implementation, is all open-source, so
go check it out! It's changing and improving all the time, but this
module would be a good place to start digging.
Let us know in the Unison
Discord if you've got any suggested improvements or run into any
bugs. Cheers!
Hopefully you learned something 🤞! Did you know I'm currently writing a book? It's all about Lenses and Optics! It takes you all the way from beginner to optics-wizard and it's currently in early access! Consider supporting it, and more posts like this one, by pledging on my Patreon page! It takes quite a bit of work to put
these things together; if I managed to teach you something or even just entertain you for a minute or two,
maybe send a few bucks my way for a coffee? Cheers!
I've been playing around with building a Sudoku solver circuit
on an FPGA: you connect to it via a serial port, send it a
Sudoku grid with some unknown cells, and after solving it, you
get back the solved (fully filled-in) grid. I wanted the output
to be nicely human-readable, for example for a 3,3-Sudoku
(i.e. the usual Sudoku size where the grid is made up of a 3 ⨯ 3
matrix of 3 ⨯ 3 boxes):
This post is about how I structured the stream transformer that
produces all the right spaces and newlines, yielding a clash-protocols
based circuit.
With clash-protocols, the type of a circuit that transforms a
serial stream of a values into a stream of
b values is Circuit (Df dom a) (Df dom
b), with the Df type constructor taking care
of representing acknowledgement signals. So for our formatter,
we are looking to implement it as a Circuit (Df dom a) (Df
dom (Either Char a)): the output is a mixed stream of
forwarded data a and punctuation characters.
If we were writing normal (software) Haskell, we could write the
corresponding [a] -> [Either Char a] in many
different ways, but since we want to describe a hardware
circuit, there are a couple more constraints:
There is limited control over our input and output. The
Circuit/Df abstraction provides a
way to exert backpressure upstream, but that's a double edged
sword: it also means whatever circuit we write has to be ready
to handle backpressure from further downstream. And upstream
can stall us: sometimes there is no input available.
Everything, of course, has to be finite. This includes both
data and recursion depth.
We don't want to waste parts on our FPGA. If a counter can be
just 11 bits, using an 11-bit data type instead of a
Word16 can translate to actual savings.
Luckily, clash-protocols takes care of the first constraint via
the expander function:
expander
:: forall dom i o s. (HiddenClockResetEnable dom, NFDataX s)
=> s
-> (s -> i -> (s, o, Bool)) -- ^ Return `True` when you're finished
-- with the current input value and are ready for the next one.
-> Circuit (Df dom i) (Df dom o)
So basically expander reduces the problem to just
writing a "normal" pure Haskell function s -> i -> (s, o,
Bool). This is reassuring since I am otherwise not too
familiar with clash-protocols but this take mes back to
the terra firma of Haskell.
A simpler formatter
Let's warm up by writing a state machine for
expander that just puts two spaces between each input:
data DoubleSpacedState
= CopyTheInput
| FirstSpace
| SecondSpace
deriving (Generic, NFDataX) -- So that Clash can store the state in a register
-- | Transform the stream 'a','b',... to Right 'a', Left ' ', Left ' ', Right 'b', Left ' ', Left ' ', ...
doubleSpaced :: (HiddenClockResetEnable dom) => Circuit (Df dom a) (Df dom (Either Char a))
doubleSpaced = expander CopyTheInput \s x -> case s of
CopyTheInput -> (FirstSpace, Right x, True)
FirstSpace -> (SecondSpace, Left ' ', False)
SecondSpace -> (CopyTheInput, Left ' ', False)
We can try it out using the Circuit simulator
simulateCSE which on its own has an intimidating
type because clash-protocols supports more protocols
than just Df:
λ» :t simulateCSE
simulateCSE
:: (Protocols.Internal.Drivable a, Protocols.Internal.Drivable b, KnownDomain dom)
=> (Clock dom -> Reset dom -> Enable dom -> Circuit a b)
-> Protocols.Internal.ExpectType a
-> Protocols.Internal.ExpectType b
But once we partially apply it on doubleSpaced with
the Clash clock, reset and enable lines made explicit, the type
suddenly makes a ton of sense:
λ» :t simulateCSE @System (exposeClockResetEnable doubleSpaced)
simulateCSE @System (exposeClockResetEnable doubleSpaced)
:: (NFDataX a, ShowX a, Show a) => [a] -> [Either Char a]
Let's try applying it on a string like "Hello" to
make sure it works correctly:
λ» simulateCSE @System (exposeClockResetEnable doubleSpaced) "Hello"
[Right 'H',Left ' ',Left ' ',
Right 'e',Left ' ',Left ' ',
Right 'l',Left ' ',Left ' ',
Right 'l',Left ' ',Left ' ',
Right 'o'
The simulation seemingly hangs because simulateCSE
starts feeding "input is not yet available" signals to the
simulated circuit after it runs out of the user-specified
input. So it's not hanging, our circuit is just not producing
any more output. And when we look at the type of
expander, we see that this has to be how it works,
since if there is no input, there is no i argument
to pass to the state machine function. But this is not exactly
what we want, since we want the two spaces to be sent as soon as
possible, after the preceding character, not before
the next character. So we change doubleSpaced
to only consume the input after all spaces are output:
doubleSpaced :: (HiddenClockResetEnable dom) => Circuit (Df dom a) (Df dom (Either Char a))
doubleSpaced = expander CopyTheInput \s x -> case s of
CopyTheInput -> (FirstSpace, Right x, False) -- Here...
FirstSpace -> (SecondSpace, Left ' ', False)
SecondSpace -> (CopyTheInput, Left ' ', True) -- ... and here
λ» simulateCSE @System (exposeClockResetEnable doubleSpaced) "Hello"
[Right 'H',Left ' ',Left ' ',
Right 'e',Left ' ',Left ' ',
Right 'l',Left ' ',Left ' ',
Right 'l',Left ' ',Left ' ',
Right 'o',Left ' ',Left ' '
Getting rid of the bespoke state datatype
If we want to write similar state machines for more complex
formats, we should start thinking about composability. For
example, we can think of our simple double-spaced formatter as a
combination of a formatter that just forwards the input with
another that outputs two spaces.
At any given moment, we are either in the forwarding state or
the space-printing state:
data Forward = Forward deriving ...
data Spaces = FirstSpace | SecondSpace deriving ...
type DoubleSpacedState = Either Forward Spaces
Or using even more generic types, we can say that there is just
one Forwarding state, and two states for the
Spaces:
type DoubleSpacedState = Either (Index 1) (Index 2)
Here, Index :: Nat -> Type is a Clash-provided type
with an exact number of distinct values 0, 1, ..., n-1,
with Index n represented as ⌈log₂ n⌉ bits.
One nice thing about using only Either,
Index and tuples for our state representation is
that we can then use Clash's Counter
class to iterate through the states:
doubleSpaced :: (HiddenClockResetEnable dom) => Circuit (Df dom a) (Df dom (Either Char a))
doubleSpaced = expander (Left 0 :: Either (Index 1) (Index 2)) \s x ->
let output = case s of
Left 0 -> Right x
Right 0 -> Left ' '
Right 1 -> Left ' '
s' = countSucc s
consume = case s' of
Left 0 -> True
_ -> False
in (s', output, consume)
Here, I've changed the code computing whether the current input
should be consumed so that it looks at the
next state instead of the current one. Because this is
really what we are doing – we want to go through as many states
as we can, until we get to the point that the next time around
we will need new input.
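As a quick sanity check of the counting behaviour, here's what I'd expect the
Counter instance to do to our state in ghci, including the wrap-around from
Right 1 back to Left 0 (countMin and countSucc come from Clash.Class.Counter):
λ» take 4 $ iterate countSucc (countMin :: Either (Index 1) (Index 2))
[Left 0,Right 0,Right 1,Left 0]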
Declarative formatting
Compared to the initial version with three distinct, named
constructors, we have gained generality, in that we can now
imagine what the state would need to look like for our original
formatting example. But already at this simplified example, it
has cost us legibility: looking at the latest definition of
doubleSpaced, it is not immediately obvious what
format it corresponds to.
So of course the next thing we want to do is use a declarative
syntax for the format, and derive everything else from that. We
can take a page out of Servant and give
users a library of type-level combinators corresponding to
regular expressions without alternatives. Our end goal is to be
able to write our Sudoku example as just a single type
definition, using :++ for concatenation,
:* for repetition, and type-level strings for
literals:
type GridFormat n m = ((((Forward :++ " ") :* n :++ " ") :* m :++ "\r\n") :* m :++ "\r\n") :* n
So we Forward the data and follow it up with a
single space, then after each nth repetition,
we insert one more space. Do this whole thing m times,
and end the line (using the old serial format instead of the
"modern" newline-only Unix format), then after this is done
m times, we insert the extra newline between the blocks.
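As a sanity check of my reading of those combinators: for a 2,2-Sudoku
(GridFormat 2 2), with the forwarded cells shown as the letters A to P, the
output should look like
A B  C D
E F  G H

I J  K L
M N  O P

except that each row actually ends with its cell and group separator spaces
before the "\r\n" terminator.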
Implementing this idea starts with capturing the essence of a
format specifier: it needs to be associated with a counter type
used for the given formatter's state, and we need to know how to
produce the next single formatting token in any given state.
data PunctuatedBy c
= Literal c
| ForwardData
deriving (Generic, NFDataX)
class (Counter (State fmt), NFDataX (State fmt)) => Format (fmt :: k) where
type State fmt
format1 :: proxy fmt -> State fmt -> PunctuatedBy Char
Then, using this interface, we can write a generic formatter
using expander, similar to our earlier attempt:
format
:: (HiddenClockResetEnable dom, Format fmt)
=> Proxy fmt
-> Circuit (Df dom a) (Df dom (Either Word8 a))
format fmt = Df.expander countMin \s x ->
let output = case format1 fmt s of
ForwardData -> Right x
Literal sep -> Left (ascii sep)
s' = countSucc s
consume = case format1 fmt s' of
ForwardData -> True
_ -> False
in (s', output, consume)
The easy formatters
Let's get all the easy cases out of the way. These are the
formatters where we can either directly write
format1, or it can be delegated to other formatters:
-- | Consume one token of input and forward it to the output
data Forward
instance Format Forward where
type State Forward = Index 1
-- Here, `Index 1` stands in for `()` but with a (trivial) `Counter` instance
format1 _ _ = ForwardData
-- | Concatenation
data a :++ b
instance (Format a, Format b) => Format (a :++ b) where
type State (a :++ b) = Either (State a) (State b)
-- The order is important, since `countMin @(Either a b) = Left countMin`
format1 _ = either (format1 (Proxy @a)) (format1 (Proxy @b))
-- | Repetition
data a :* (rep :: Nat)
instance (Format a, KnownNat rep, 1 ≤ rep) => Format (a :* rep) where
type State (a :* rep) = (Index rep, State a)
-- The order is important, since that's how the `Counter` instance for tuples cascades increments
format1 _ (_, fmt) = format1 (Proxy @a) fmt
Reflecting symbols character by character
What we want to do
for the Format (sep :: Symbol) instance
is to use Index n as the state, where
n is the length of the symbol, and then
format1 _ i would return the ith
character of our separator sep.
Unfortunately, this requires considerably more elbow grease than
the previous instances. Currently, there aren't many
type-level
functions over Symbol in base
so we have to implement it all ourselves based on just the
UnconsSymbol type family.
type SymbolLength s = SymbolLength' (UnconsSymbol s)
type IndexableSymbol s = IndexableSymbol' (UnconsSymbol s)
class (KnownNat (SymbolLength' s)) => IndexableSymbol' (s :: Maybe (Char, Symbol)) where
type SymbolLength' s :: Nat
symbolAt :: proxy s -> Index (SymbolLength' s) -> Char
instance IndexableSymbol' Nothing where
type SymbolLength' Nothing = 0
symbolAt _ i = error "impossible"
instance (IndexableSymbol s, KnownChar c) => IndexableSymbol' (Just '(c, s)) where
type SymbolLength' (Just '(c, s)) = 1 + SymbolLength s
{-# INLINE symbolAt #-}
symbolAt _ i
| i == 0
= charVal (Proxy @c)
| otherwise
= symbolAt (Proxy @(UnconsSymbol s)) (fromIntegral (i - 1))
With the help of these utility classes, we can now write the
formatter for Symbols. The lower bound on
SymbolLength is needed because the degenerate type
Index 0 (isomorphic to Void) would
just screw everything up.
-- | Literal
instance (IndexableSymbol sep, KnownNat (SymbolLength sep), 1 ≤ SymbolLength sep) => Format sep where
type State sep = Index (SymbolLength sep)
format1 _ i = Literal $ symbolAt (Proxy @(UnconsSymbol sep)) i
Note that the indirection between a format fmt and
its state type State fmt was only needed in the
first place because we wanted Symbols to be valid
formatters without wrapping them in an extra layer. If we were
content with extra noise like Literal "\r\n"
instead of just "\r\n", we could collapse the two
types.
In my real code, I ended up having to change format
slightly so that it directly produces 8-bit ASCII values instead
of Chars, because I found that composing it with
another Circuit that does the conversion wasn't
getting inlined enough to produce Verilog that avoids the large
21-bit-wide multiplexers for Char.
In the eternal search for a better text editor, I’ve recently gone back
to Neovim. I’ve taken the time to configure it myself, with as few
plugins and other cruft as possible. My goal is a minimalist editing
experience, tailored for exactly those tasks that I do regularly, and
nothing more. In this post, I’ll give a brief tour of my setup and its
motivations.
I'm not sure why, but all of a sudden I started getting this question every time
emacs starts
Symbolic link to Git-controlled source file; follow link?
After some searching I found out that it's VC asking. I'm guessing this comes
from straight's very liberal use of symlinks. Though I'm still a little
surprised at VC kicking in when reading the config.
Anyway, there are two variables to consider, vc-follow-symlinks and
vc-handled-backends. I opted to modify the latter one, and since I don't use
VC at all I'm turning it off completely.
In a previous
post
I discussed the first half of my solution to Factor-Full
Tree. In this post,
I will demonstrate how to decompose a tree into disjoint paths.
Technically, we should clarify that we are looking for directed
paths in a rooted tree, that is, paths that only proceed down the
tree. One could also ask about decomposing an unrooted tree into
disjoint undirected paths; I haven’t thought about how to do that in
general but intuitively I expect it is not too much more difficult.
For
this particular problem, we want to decompose a tree into
maximum-length paths (i.e. we start by taking the longest possible
path, then take the longest path from what remains, and so on); I will call
this the max-chain decomposition (I don’t know if there is a
standard term). However, there are other types of path
decomposition, such as heavy-light decomposition, so we will try to
keep the decomposition code somewhat generic.
Remember, our goal is to split up a tree into a collection of linear
paths; that is, in general, something like this:
What do we need in order to specify a decomposition of a
tree into disjoint paths this way? Really, all we need is to choose at most
one linked child for each node. In other words, at every node we can
choose to continue the current path into a single child node (in which
case all the other children will start their own new paths), or we
could choose to terminate the current path (in which case every child
will be the start of its own new path). We can represent such a
choice with a function of type
type SubtreeSelector a = a -> [Tree a] -> Maybe (Tree a, [Tree a])
which takes as input the value at a node and the list of all the
subtrees, and possibly returns a selected subtree along with the list of remaining
subtrees. Of course, there is nothing in the
type that actually requires a SubtreeSelector to return one of the
trees from its input paired with the rest, but nothing we will do
depends on this being true. In fact, I expect there may be some
interesting algorithms obtainable by running a “path decomposition”
with a “selector” function that actually makes up new trees instead of just
selecting one, similar to the chop function.
Given such a subtree selection function, a generic path decomposition
function will then take a tree and turn it into a list of non-empty
paths (we could also imagine wanting information about the parent of each
path, and a mapping from tree nodes to some kind of path ID, but we
will keep things simple for now):
pathDecomposition :: SubtreeSelector a -> Tree a -> [NonEmpty a]
Implementing pathDecomposition is a nice exercise; you might like to
try it yourself! You can find my implementation at the end of this
blog post.
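If you'd like to compare notes before peeking at the end of the post, here is
one way the implementation might go (a sketch under the definitions above,
not necessarily identical to the author's version):
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NE
import Data.Tree (Tree (..))

pathDecomposition :: SubtreeSelector a -> Tree a -> [NonEmpty a]
pathDecomposition select = go
  where
    -- Invariant: the first path in the output is the one that starts
    -- at the root of the given tree.
    go (Node a ts) = case select a ts of
      -- No child selected: the current path ends here, and every
      -- child starts a fresh path.
      Nothing -> (a :| []) : concatMap go ts
      -- A child selected: prepend the current node to the path that
      -- starts at that child; all other children start fresh paths.
      Just (t, rest) ->
        let p : ps = go t
        in NE.cons a p : ps ++ concatMap go rest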
Max-chain decomposition
Now, let’s use our generic path decomposition to implement a max-chain
decomposition. At each node we want to select the tallest subtree;
in order to do this efficiently, we can first annotate each tree node with
its height, via a straightforward tree fold:
type Height = Int

labelHeight :: Tree a -> Tree (Height, a)
labelHeight = foldTree node
  where
    node a ts = case ts of
      [] -> Node (0, a) []
      _  -> Node (1 + maximum (map (fst . rootLabel) ts), a) ts
Our subtree selection function can now select the subtree with the
largest Height annotation. Instead of implementing this directly,
we might as well make a generic function for selecting the “best”
element from a list (we will reuse it later):
selectMaxBy :: (a -> a -> Ordering) -> [a] -> Maybe (a, [a])
selectMaxBy _   []       = Nothing
selectMaxBy cmp (a : as) = case selectMaxBy cmp as of
  Nothing      -> Just (a, [])
  Just (b, bs) -> case cmp a b of
    LT -> Just (b, a : bs)
    _  -> Just (a, b : bs)
We can now put the pieces together to implement max-chain
decomposition. We first label the tree by height, then do a path
decomposition that selects the tallest subtree at each node. We leave
the height annotations in the final output since they might be
useful—for example, we can tell how long each path is just by
looking at the Height annotation on the first element. If we don’t
need them we can easily get rid of them later. We also sort by
descending Height, since getting the longest chains first was kind
of the whole point.
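Putting the pieces together, the max-chain decomposition can look something
like this (my sketch of the scheme just described):
import Control.Arrow ((>>>))
import Data.List (sortBy)
import Data.List.NonEmpty (NonEmpty)
import qualified Data.List.NonEmpty as NE
import Data.Ord (Down (..), comparing)
import Data.Tree (Tree, rootLabel)

maxChainDecomposition :: Tree a -> [NonEmpty (Height, a)]
maxChainDecomposition =
  labelHeight
    >>> pathDecomposition (const (selectMaxBy (comparing (fst . rootLabel))))
    >>> sortBy (comparing (Down . fst . NE.head))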
To flesh this out into a full solution to Factor-Full
Tree, after
computing the chain decomposition we need to assign prime factors to
the chains. From those, we can compute the value for each node if we
know which chain it is in and the value of its parent. To this end,
we will need one more function which computes a Map recording the
parent of each node in a tree. Note that if we already know all the
edges in a given edge list are oriented the same way, we can build
this much more simply as e.g. map swap >>> M.fromList; but when
(as in general) we don’t know which way the edges should be oriented
first, we might as well first build a Tree a via DFS with
edgesToTree and then construct the parentMap like this afterwards.
parentMap :: Ord a => Tree a -> Map a a
parentMap = foldTree node >>> snd
  where
    node :: Ord a => a -> [(a, Map a a)] -> (a, Map a a)
    node a b = (a, M.fromList (map (,a) as) <> mconcat ms)
      where
        (as, ms) = unzip b
Finally, we can solve Factor-Full tree. Note that some code from my
previous blog
post
is needed as well, and is included at the end of the post for
completeness. Once we compute the max chain decomposition and the
prime factor for each node, we use a lazy recursive
Map
to compute the value assigned to each node.
solve :: TC -> [Int]
solve TC{..} = M.elems assignment
  where
    -- Build the tree and compute its parent map
    t = edgesToTree Node edges 1
    parent = parentMap t

    -- Compute the max chain decomposition, and use it to assign a prime factor
    -- to each non-root node
    paths :: [[Node]]
    paths = map (NE.toList . fmap snd) $ maxChainDecomposition t

    factor :: Map Node Int
    factor = M.fromList . concat $ zipWith (\p -> map (,p)) primes paths

    -- Compute an assignment of each node to a value, using a lazy map
    assignment :: Map Node Int
    assignment = M.fromList $
      (1,1) : [(v, factor!v * assignment!(parent!v)) | v <- [2..n]]
In this episode, Wouter and Sam interview Dominic Orchard. Dominic has many roles, including: senior lecturer at the University of Kent, co-director of the Institute of Computing for Climate Science, and bye-fellow of Queen’s College in Cambridge. We will not only discuss his work on Granule - graded monads, coeffects, and linear types - but also his collaboration with actual scientists to improve the languages with which they work.
Haskell diagrams allowed us to render finite patches of tiles easily as discussed in Diagrams for Penrose tiles. Following a suggestion of Stephen Huggett, we found that the description and manipulation of such tilings is greatly enhanced by using planar graphs. In Graphs, Kites and Darts we introduced a specialised planar graph representation for finite tilings of kites and darts which we called Tgraphs (tile graphs). These enabled us to implement operations that use neighbouring tile information, in particular the operations decompose, force, and compose.
For ease of reference, we reproduce the half-tiles we are working with here.
Figure 1: Half-tile faces
Figure 1 shows the right-dart (RD), left-dart (LD), left-kite (LK) and right-kite (RK) half-tiles. Each has a join edge (shown dotted) and a short edge and a long edge. The origin vertex is shown red in each case. The vertex at the opposite end of the join edge from the origin we call the opp vertex, and the remaining vertex we call the wing vertex.
If the short edges have unit length then the long edges have length φ (the golden ratio) and all angles are multiples of 36° (a tenth turn), with kite halves having two 2s and a 1, and dart halves having a 3 and two 1s (counting in tenth turns). This geometry of the tiles is abstracted away at the graph representation level but used when checking validity of tile additions and by the drawing functions.
There are rules for how the tiles can be put together to make a legal tiling (see e.g. Diagrams for Penrose tiles). We defined a Tgraph (in Graphs, Kites and Darts) as a list of such half-tiles which are constrained to form a legal tiling but must also be connected with no crossing boundaries (see below).
As a simple example consider kingGraph (2 kites and 3 darts round a king vertex). We represent each half-tile as a TileFace with three vertex numbers, then apply makeTgraph to the list of ten TileFaces. The function makeTgraph :: [TileFace] -> Tgraph performs the necessary checks to ensure the result is a valid Tgraph.
To view the Tgraph we simply form a diagram (in this case 2 diagrams horizontally separated by 1 unit)
hsep 1 [labelled drawj kingGraph, draw kingGraph]
and the result is shown in figure 2 with labels and dashed join edges (left) and without labels and join edges (right).
Figure 2: kingGraph with labels and dashed join edges (left) and without (right).
The boundary of the Tgraph consists of the edges of half-tiles which are not shared with another half-tile, so they go round untiled/external regions. The no crossing boundary constraint (equivalently, locally tile-connected) means that a boundary vertex has exactly two incident boundary edges and therefore has a single external angle in the tiling. This ensures we can always locally determine the relative angles of tiles at a vertex. We say a collection of half-tiles is a validTgraph if it constitutes a legal tiling but also satisfies the connectedness and no crossing boundaries constraints.
Our key operations on Tgraphs are decompose, force, and compose, which are illustrated in figure 3.
Figure 3: decompose, force, and compose
Figure 3 shows the kingGraph with its decomposition above it (left), the result of forcing the kingGraph (right) and the composition of the forced kingGraph (bottom right).
Decompose
An important property of Penrose dart and kite tilings is that it is possible to divide the half-tile faces of a tiling into smaller half-tile faces, to form a new (smaller scale) tiling.
Figure 4: Decomposition of left half-tiles
Figure 4 illustrates the decomposition of a left-dart (top row) and a left-kite (bottom row). With our Tgraph representation we simply introduce new vertices for dart and kite long edges and kite join edges and then form the new faces using these. This does not involve any geometry, because that is taken care of by drawing operations.
Force
Figure 5 illustrates the rules used by our force operation (we omit a mirror-reflected version of each rule).
Figure 5: Force rules
In each case the yellow half-tile is added in the presence of the other half-tiles shown. The yellow half-tile is forced because, by the legal tiling rules and the seven possible vertex types, there is no choice for adding a different half-tile on the edge where the yellow tile is added.
We call a Tgraph correct if it represents a tiling which can be continued infinitely to cover the whole plane without getting stuck, and incorrect otherwise. Forcing involves adding half-tiles by the illustrated rules round the boundary until either no more rules apply (in which case the result is a forced Tgraph) or a stuck tiling is encountered (in which case an incorrect Tgraph error is raised). Hence force is a partial function but total on correct Tgraphs.
Compose: This is discussed in the next section.
2. Composition Problems and a Theorem
Compose Choices
For an infinite tiling, composition is a simple inverse to decomposition. However, for a finite tiling with boundary, composition is not so straightforward. Firstly, we may need to leave half-tiles out of a composition because the necessary parts of a composed half-tile are missing. For example, a half-dart with a boundary short edge or a whole kite with both short edges on the boundary must necessarily be excluded from a composition. Secondly, on the boundary, there can sometimes be a problem of choosing whether a half-dart should compose to become a half-dart or a half-kite. This choice in composing only arises when there is a half-dart with its wing on the boundary but insufficient local information to determine whether it should be part of a larger half-dart or a larger half-kite.
In the literature (see for example 1 and 2) there is an often repeated method for composing (also called inflating). This method always makes the kite choice when there is a choice. Whilst this is a sound method for an unbounded tiling (where there will be no choice), we show that it is an unsound method for finite tilings as follows.
Clearly composing should preserve correctness. However, figure 6 (left) shows a correct Tgraph which is a forced queen, but the kite-favouring composition of the forced queen produces the incorrect Tgraph shown in figure 6 (centre). Applying our force function to this reveals a stuck tiling and reports an incorrect Tgraph.
Figure 6: An erroneous and a safe composition
Our algorithm (discussed in Graphs, Kites and Darts) detects dart wings on the boundary where there is a choice and classifies them as unknowns. Our composition refrains from making a choice by not composing a half dart with an unknown wing vertex. The rightmost Tgraph in figure 6 shows the result of our composition of the forced queen with the half-tile faces left out of the composition (the remainder faces) shown in green. This avoidance of making a choice (when there is a choice) guarantees our composition preserves correctness.
Compose is a Partial Function
A different composition problem can arise when we consider Tgraphs that are not decompositions of Tgraphs. In general, compose is a partial function on Tgraphs.
Figure 7: Composition may fail to produce a Tgraph
Figure 7 shows a Tgraph (left) with its successful composition (centre) and the half-tile faces that would result from a second composition (right), which do not form a valid Tgraph because of a crossing boundary (at vertex 6). Thus composition of a Tgraph may fail to produce a Tgraph when the resulting faces are disconnected or have a crossing boundary.
However, we claim that compose is a total function on forced Tgraphs.
Compose Force Theorem
Theorem: Composition of a forced Tgraph produces a valid Tgraph.
We postpone the proof (outline) for this theorem to section 5. Meanwhile we use the result to establish relationships between compose, force, and decompose in the next section.
3. Perfect Composition Theorem
In Graphs, Kites and Darts we produced a diagram showing relationships between multiple decompositions of a dart and the forced versions of these Tgraphs. We reproduce this here along with a similar diagram for multiple decompositions of a kite.
Figure 8: Commuting Diagrams
In figure 8 we show separate (apparently) commuting diagrams for the dart and for the kite. The bottom rows show the decompositions, the middle rows show the result of forcing the decompositions, and the top rows illustrate how the compositions of the forced Tgraphs work by showing both the composed faces (black edges) and the remainder faces (green edges) which are removed in the composition. The diagrams are examples of some commutativity relationships concerning compose, force, and decompose which we will prove.
It should be noted that these diagrams break down if we consider only half-tiles as the starting points (bottom right of each diagram). The decomposition of a half-tile does not recompose to its original, but produces an empty composition. So we do not even have compose (decompose g) ≈ g in these cases. Forcing the decomposition also results in an empty composition. Clearly there is something special about the depicted cases and it is not merely that they are wholetile complete because the decompositions are not wholetile complete. [Wholetile complete means there are no join edges on the boundary, so every half-tile has its other half.]
Below we have captured the properties that are sufficient for the diagrams to commute as in figure 8. In the proofs we use a partial ordering on Tgraphs (modulo vertex relabelling) which we define next.
Partial ordering of Tgraphs
If g and g′ are both valid Tgraphs and g consists of a subset of the (half-tile) faces of g′ we have

g ⊆ g′

which gives us a partial order on Tgraphs. Often, though, g is only isomorphic to a subset of the faces of g′, requiring a vertex relabelling to become a subset. In that case we write

g ⊑ g′

which is also a partial ordering and induces an equivalence of Tgraphs defined by

g ≈ g′ if and only if g ⊑ g′ and g′ ⊑ g

in which case g and g′ are isomorphic as Tgraphs.
Both decompose and compose are monotonic with respect to ⊑, meaning:

g ⊑ g′ implies decompose g ⊑ decompose g′ and compose g ⊑ compose g′

We also have force is monotonic, but only when restricted to correct Tgraphs. Also, when restricted to correct Tgraphs, we have force is non-decreasing because it only adds faces:

g ⊑ force g

and force is idempotent (forcing a forced correct Tgraph leaves it the same):

force (force g) ≈ force g
Composing perfectly and perfect compositions
Definition: A Tgraph g composes perfectly if all faces of g are composable (i.e. there are no remainder faces of g when composing).

We note that the composed faces must be a valid Tgraph (connected with no crossing boundaries) if all faces are included in the composition because g has those properties. Clearly, if g composes perfectly then

decompose (compose g) ≈ g

In general, for arbitrary g where the composition is defined, we only have

decompose (compose g) ⊑ g

Definition: A Tgraph g is a perfect composition if decompose g composes perfectly.

Clearly if g is a perfect composition then

compose (decompose g) ≈ g

(We could use equality here because any new vertex labels introduced by decompose will be removed by compose). In general, for arbitrary g,

compose (decompose g) ⊑ g
Lemma 1: g is a perfect composition if and only if g has the following 2 properties:
every half-kite with a boundary join has either a half-dart or a whole kite on the short edge, and
every half-dart with a boundary join has a half-kite on the short edge,
(Proof outline:) Firstly note that unknowns in decompose g can only come from boundary joins in g. The properties 1 and 2 guarantee that decompose g has no unknowns. Since every face of decompose g has come from a decomposed face in g, there can be no faces in decompose g that will not recompose, so decompose g will compose perfectly to g. Conversely, if g is a perfect composition, its decomposition can have no unknowns. This implies boundary joins in g must satisfy properties 1 and 2.
(Note: a perfect composition may have unknowns even though its decomposition has none.)
It is easy to see two special cases:
If g is wholetile complete then g is a perfect composition.
Proof: Wholetile complete implies no boundary joins which implies properties 1 and 2 in lemma 1 which implies g is a perfect composition.
If g is a decomposition then g is a perfect composition.
Proof: If g is a decomposition, then every half-dart in g has a half-kite on the short edge which implies property 2 of lemma 1. Also, any half-kite with a boundary join in g must have come from a decomposed half-dart since a decomposed half-kite produces a whole kite with no boundary kite join. So the half-kite must have a half-dart on the short edge which implies property 1 of lemma 1. The two properties imply g is a perfect composition.
We note that these two special cases cover all the Tgraphs in the bottom rows of the diagrams in figure 8. So the Tgraphs in each bottom row are perfect compositions, and furthermore, they all compose perfectly except for the rightmost Tgraphs which have empty compositions.
In the following results we make the assumption that a Tgraph is correct, which guarantees that when force is applied, it terminates with a correct Tgraph. We also note that decompose preserves correctness as does compose (provided the composition is defined).
Lemma 2: If g is a forced, correct Tgraph then

compose (force (decompose g)) ≈ g

(Proof outline:) The proof uses a case analysis of boundary and internal vertices of g. For internal vertices we just check there is no change at the vertex (after decomposing, forcing, and recomposing) using figure 12 (plus an extra case for the forced star). For boundary vertices we check the local context cases shown in figure 9.
Figure 9: Local contexts for boundary vertices of a forced Tgraph
This shows two cases for a kite origin, two cases for a kite opp, four cases for a kite wing, and four cases for a dart origin. The only case for a dart wing is one of the two kite opp cases and there are no dart opp cases as these cannot be on the boundary of a forced Tgraph. Actually figure 9 has a repeated case which is both a dart origin and a kite wing, but there are also 3 additional cases when we consider mirror images of those shown. Since there is no local change of the context in each case, and since this is true for all boundary vertices in any forced Tgraph, there can be no non-local change either. (We omit the full details).
Lemma 3: If g is a perfect composition and a correct Tgraph, then

force g ⊑ compose (force (decompose g))

(Proof outline:) The proof is by analysis of each possible force rule applicable on a boundary edge of g and checking local contexts to establish that (i) the result of applying decompose, then force, then compose to the local context must include the added half-tile, and (ii) if the added half-tile has a new boundary join, then the result must include both halves of the new half-tile. The two properties of perfect compositions mentioned in lemma 1 are critical for the proof. However, since the result of adding a single half-tile may break the condition of the Tgraph being a perfect composition, we need to arrange that half-tiles are completed first then each subsequent half-tile addition is paired with its wholetile completion. This ensures the perfect composition condition holds at each step for a proof by induction. [A separate proof is needed to show that the ordering of applying force rules makes no difference to a final correct Tgraph (apart from vertex relabelling)].
Lemma 4: If g composes perfectly and is a correct Tgraph then

force g ≈ force (decompose (force (compose g)))

Proof: Assume g composes perfectly and is a correct Tgraph. Since force is non-decreasing (with respect to ⊑ on correct Tgraphs)

compose g ⊑ force (compose g)

and since decompose is monotonic

decompose (compose g) ⊑ decompose (force (compose g))

Since g composes perfectly, the left hand side is just g, so

g ⊑ decompose (force (compose g))

and since force is monotonic (with respect to ⊑ on correct Tgraphs)

force g ⊑ force (decompose (force (compose g)))    (*)

For the opposite direction, we substitute compose g for g in lemma 3 (noting that compose g is a perfect composition, since decompose (compose g) ≈ g composes perfectly) to get

force (compose g) ⊑ compose (force (decompose (compose g)))

Then, since decompose (compose g) ≈ g, we have

force (compose g) ⊑ compose (force g)

Apply decompose to both sides (using monotonicity)

decompose (force (compose g)) ⊑ decompose (compose (force g))

For any g′ for which the composition is defined we have decompose (compose g′) ⊑ g′, so we get

decompose (force (compose g)) ⊑ force g

Now apply force to both sides and note force (force g) ≈ force g to get

force (decompose (force (compose g))) ⊑ force g

Combining this with (*) above proves the required equivalence.
Theorem (Perfect Composition): If g composes perfectly and is a correct Tgraph then

compose (force g) ≈ force (compose g)

Proof: Assume g composes perfectly and is a correct Tgraph. By lemma 4 we have

force g ≈ force (decompose (force (compose g)))

Applying compose to both sides gives

compose (force g) ≈ compose (force (decompose (force (compose g))))

Now by lemma 2, with force (compose g) in place of g (it is forced and correct), the right hand side is equivalent to

force (compose g)

which establishes the result.
Corollaries (of the perfect composition theorem):
If g is a perfect composition and a correct Tgraph then

compose (force (decompose g)) ≈ force g

Proof: Let decompose g take the place of g (so compose (decompose g) ≈ g) in the theorem.

[This result generalises lemma 2 because any correct forced Tgraph is necessarily wholetile complete and therefore a perfect composition, and force g ≈ g for a forced g.]
If g is a perfect composition and a correct Tgraph then

decompose (force g) ⊑ force (decompose g)

Proof: Apply decompose to both sides of the previous corollary and note that

decompose (compose g′) ⊑ g′

provided the composition is defined, which it must be for a forced Tgraph (here g′ = force (decompose g)) by the Compose Force theorem.
If g is a perfect composition and a correct Tgraph then

force (decompose (force g)) ≈ force (decompose g)

Proof: Apply force to both sides of the previous corollary, noting force is monotonic and idempotent for correct Tgraphs

force (decompose (force g)) ⊑ force (decompose g)

From the fact that force is non-decreasing and decompose and force are monotonic, we also have

force (decompose g) ⊑ force (decompose (force g))

Hence combining these two sub-Tgraph results we have

force (decompose (force g)) ≈ force (decompose g)
It is important to point out that if g is a correct Tgraph and g is a perfect composition then this is not the same as g composes perfectly. It could be the case that g has more faces than decompose (compose g) (that is, g has remainder faces) and so g could have unknowns. In this case we can only prove that

force (compose g) ⊑ compose (force g)

As an example where this is not an equivalence, choose g to be a star. Then its composition is the empty Tgraph (which is still a perfect composition) and so the left hand side is the empty Tgraph, but the right hand side is a sun.
Perfectly composing generators
The perfect composition theorem and lemmas and the three corollaries justify all the commuting implied by the diagrams in figure 8. However, one might ask more general questions like: under what circumstances do we have (for a correct forced Tgraph g)

force (decompose (compose g)) ≈ g ?

Definition: A generator of a correct forced Tgraph g is any Tgraph g′ such that g′ is correct and g ≈ force g′.
We can now state that
Corollary: If a correct forced Tgraph g has a generator which composes perfectly, then

force (decompose (compose g)) ≈ g

Proof: This follows directly from lemma 4 and the perfect composition theorem.
As an example where the required generator does not exist, consider the rightmost Tgraph of the middle row in figure 10. It is generated by the Tgraph directly below it, but it has no generator with a perfect composition. The Tgraph directly above it in the top row is the result of applying which has lost the leftmost dart of the Tgraph.
Figure 10: A Tgraph without a perfectly composing generator
We could summarise this section by saying that can lose information which cannot be recovered by a subsequent and, similarly, can lose information which cannot be recovered by a subsequent . We have defined perfect compositions, which are the Tgraphs that do not lose information when decomposed, and Tgraphs which compose perfectly, which are those that do not lose information when composed. Forcing does the same thing at each level of composition (that is, it commutes with composition) provided information is not lost when composing.
4. Multiple Compositions
We know from the Compose Force theorem that the composition of a Tgraph that is forced is always a valid Tgraph. In this section we use this and the results from the last section to show that composing a forced, correct Tgraph produces a forced Tgraph.
First we note that:
Lemma 5: The composition of a forced, correct Tgraph is wholetile complete.
Proof: Let where is a forced, correct Tgraph. A boundary join in implies there must be a boundary dart wing of the composable faces of . (See for example figure 4 where this would be vertex 2 for the half-dart case, and vertex 5 for the half-kite face). This dart wing cannot be an unknown as the half-dart is in the composable faces. However, a known dart wing must be either a large kite centre or a large dart base and therefore internal in the composable faces of (because of the force rules) and therefore not on the boundary in . This is a contradiction showing that can have no boundary joins and is therefore wholetile complete.
Theorem: The composition of a forced, correct Tgraph is a forced Tgraph.
Proof: Let for some forced, correct Tgraph, then is wholetile complete (by lemma 5) and therefore a perfect composition. Let , so composes perfectly (). By the perfect composition theorem we have
We also have
Applying to both sides, noting that is monotonic and the identity on forced Tgraphs, we have
Applying to both sides, noting that is monotonic, we have
By (**) above, the left hand side is equivalent to so we have
but since we also have ( being non-decreasing)
we have established that
which means is a forced Tgraph.
This result means that after forcing once we can repeatedly compose, creating valid Tgraphs, until we reach the empty Tgraph.
We can also use lemma 5 to establish the converse to a previous corollary:
Corollary If a correct forced Tgraph satisfies:
then has a generator which composes perfectly.
Proof: By lemma 5, is wholetile complete and hence a perfect composition. This means that composes perfectly and it is also a generator for because
5. Proof of the Compose Force theorem
Theorem (Compose Force): Composition of a forced Tgraph produces a valid Tgraph.
Proof: For any forced Tgraph we can construct the composed faces. For the result to be a valid Tgraph we need to show no crossing boundaries and connectedness for the composed faces. These are proved separately by case analysis below.
Proof of no crossing boundaries
Assume is a forced Tgraph and that it has a non-empty set of composed faces (we can ignore cases where the composition is empty as the empty Tgraph is valid). Consider a vertex v in the composed faces of and first take the case that v is on the boundary of . We consider local contexts for a vertex v on a forced Tgraph boundary where the composition is non-empty as shown in figure 11.
In each case v is shown as a red dot, and the composition is shown filled yellow. The cases for v are shown in rows: the first row is for dart origins, the second row is for kite origins, the third row is for kite wings, and the last row is for kite opps. The dart wing cases are a subset of the kite opp cases, so not repeated, and dart opp vertices are excluded because they cannot be on the boundary of a forced Tgraph. We only show left-hand versions, so there is a mirror symmetric set for right-hand versions.
It is easy to see that there are no crossing boundaries of the composed faces at v in each case. Since any boundary vertex of any forced Tgraph (with a non-empty composition) must match one of these local context cases around the vertex, we can conclude that a boundary vertex of cannot become a crossing boundary in .
Next take the case where v is an internal vertex of .
Figure 12: Vertex types and their relationships
Figure 12 shows relationships between the forced Tgraphs of the 7 (internal) vertex types (plus a kite at the top right). The red faces are those around the vertex type and the black faces are those produced by forcing (if any). Each forced Tgraph has its composition directly above with empty compositions for the top row. We note that a (forced) star, jack, king, and queen vertex remains an internal vertex in the respective composition so cannot become a crossing boundary vertex. A deuce vertex becomes the centre of a larger kite and is no longer present in the composition (top right). That leaves cases for the sun vertex and ace vertex (=fool vertex). The sun Tgraph (sunGraph) and fool Tgraph (fool) consist of just the red faces at the respective vertex (shown top left and top centre). These both have empty compositions when there is no surrounding context. We thus need to check possible forced local contexts for sunGraph and fool.
The fool case is simple and similar to a deuce vertex in that it is never part of a composition. [To see this consider inverting the decomposition arrows shown in figure 4. In both cases we see the half-dart opp vertex (labelled 4 in figure 4) is removed].
For the sunGraph there are only 7 local forced context cases to consider where the sun vertex is on the boundary of the composition.
Figure 13: Forced Contexts for a sun vertex v where v is on the composition boundary
Six of these are shown in figure 13 (the missing one is just a mirror reflection of the fourth case). Again, the relevant vertex v is shown as a red dot and the composed faces are shown filled yellow, so it is easy to check that there is no crossing boundary of the composed faces at v in each case. Every forced Tgraph containing an internal sun vertex where the vertex is on the boundary of the composition must match one of the 7 cases locally round the vertex.
Thus no vertex from can become a crossing boundary vertex in the composed faces and since the vertices of the composed faces are a subset of those of , we can have no crossing boundary vertex in the composed faces.
Proof of Connectedness
Assume is a forced Tgraph as before. We refer to the half-tile faces of that get included in the composed faces as the composable faces and the rest as the remainder faces. We want to prove that the composable faces are connected as this will imply the composed faces are connected.
As before we can ignore cases where the set of composable faces is empty, and assume this is not the case. We study the nature of the remainder faces of . Firstly, we note:
Lemma (remainder faces)
The remainder faces of are made up entirely of groups of half-tiles which are either:
Half-fools (= a half dart and both halves of the kite attached to its short edge) where the other half-fool is entirely composable faces, or
Both halves of a kite with both short edges on the boundary (so they are not part of a half-fool), where (at most) the origin is in common with composable faces, or
Whole fools, where (at most) the shared kite origin is in common with composable faces.
Figure 14: Remainder face groups (cases 1,2, and 3)
These 3 cases of remainder face groups are shown in figure 14. In each case the possible border in common with composable faces is shown yellow and the red edges are necessarily on the boundary of (the black boundary could be on the boundary of or shared with another remainder face group). [A mirror symmetric version for the first group is not shown.] Examples can be seen in e.g. figure 13 where the first Tgraph has four examples of case 1, and two of case 2, the second has six examples of case 1 and two of case 2, and the fifth Tgraph has an example of case 3 as well as four of case 1. [We omit the detailed proof of this lemma which reasons about what gets excluded in a composition after forcing. However, all the local context cases are included in figure 15 (left-hand versions), where we only show those contexts where there is a non-empty composition.]
We note from the (remainder faces) lemma that the common boundary of the group of remainder faces with the composable faces (shown yellow in figure 14) is at most a single vertex in cases 2 and 3. In case 1, the common boundary is just a single edge of the composed faces which is made up of 2 adjacent edges of the composable faces that constitute the join of two half-fools.
This means each (remainder face) group shares boundary with exactly one connected component of the composable faces.
Next we establish that if two (remainder face) groups are connected they must share boundary with the same connected component of the composable faces. We need to consider how each (remainder face) group can be connected with a neighbouring such group. It is enough to consider forced contexts of boundary dart long edges (for cases 1 and 3) and boundary kite short edges (for case 2). The cases where the composition is non-empty all appear in figure 15 (left-hand versions) along with boundary kite long edges (middle two rows) which are not relevant for the proof.
Figure 15: Forced contexts for boundary edges
We note that, whenever one group of the remainder faces (half-fool, whole-kite, whole-fool) is connected to a neighbouring group of the remainder faces, any common boundary (shared edges and vertices) with the composable faces is also connected. The combined common boundary forms either 2 adjacent composed face boundary edges (= 4 adjacent edges of the composable faces), or a composed face boundary edge and one of its end vertices, or a single composed face boundary vertex.
It follows that any connected collection of the remainder face groups shares boundary with a unique connected component of the composable faces. Since the collection of composable and remainder faces together is connected ( is connected) the removal of the remainder faces cannot disconnect the composable faces. For such a disconnection to happen, at least one connected collection of remainder face groups would have to be connected to more than one connected component of composable faces.
This establishes connectedness of any composition of a forced Tgraph, and this completes the proof of the Compose Force theorem.
Today, 2024-07-31, at 1830 UTC (11:30 am PDT, 2:30 pm EDT, 7:30 pm BST, 20:30 CEST, …)
we are streaming the 30th episode of the Haskell Unfolder live on YouTube.
In Haskell, the ST type offers a restricted subset of the IO functionality: it provides mutable variables, but nothing else. The advantage is that we can use mutable storage locally, because unlike IO, ST allows us to escape from its realm via the function runST. However, runST has a so-called rank-2 type. In this episode, we will discuss why this seemingly complicated type is necessary to preserve the safety of the operation.
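To make this concrete, here is the type in question along with a tiny example of the kind of local mutation it safely permits (a minimal sketch, not taken from the episode):

import Control.Monad.ST (runST)
import Data.STRef (newSTRef, modifySTRef', readSTRef)

-- runST :: (forall s. ST s a) -> a
-- The rank-2 quantification over s is what prevents any STRef s from
-- escaping the scope of runST.

-- Sum a list using a local mutable accumulator; the mutation is invisible
-- from the outside, so sumST is an ordinary pure function.
sumST :: [Int] -> Int
sumST xs = runST $ do
  acc <- newSTRef 0
  mapM_ (\x -> modifySTRef' acc (+ x)) xs
  readSTRef acc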
About the Haskell Unfolder
The Haskell Unfolder is a YouTube series about all things Haskell hosted by
Edsko de Vries and Andres Löh, with episodes appearing approximately every two
weeks. All episodes are live-streamed, and we try to respond to audience
questions. All episodes are also available as recordings afterwards.
It might be obvious from my last two articles that I’ve been thinking about the Collatz conjecture a bit. Here, I can tie some of these ideas together in a surprising and really striking way.
Some of this I covered in earlier posts, but I’m going to construct things a little differently, so I’ll start from scratch. The Collatz conjecture is about the function f(n) defined to be n/2 if n is even, or 3n+1 if n is odd. Starting with some number (say, 7, for example) we can apply this function repeatedly to get 7, then 22, then 11, then 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1, and then we’ll repeat 4, 2, 1, 4, 2, 1, and so on forever. The conjecture is that no matter which positive integer you start with, you’ll always end up in that same loop of 4, 2, and 1.
For reference, it’s going to be more convenient for us to work with something called the shortcut Collatz map. The idea here is that when n is odd, we already know that 3n+1 will be even. So we can shortcut one iteration by jumping straight to (3n+1)/2, just avoiding a separate pass for the division by two that we already know will be necessary.
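In code, the shortcut map is simply (a small helper of my own, for concreteness):

-- The shortcut Collatz map: halve even numbers; for odd n, 3n+1 is
-- guaranteed even, so jump straight to (3n+1)/2.
shortcutCollatz :: Integer -> Integer
shortcutCollatz n
  | even n = n `div` 2
  | otherwise = (3 * n + 1) `div` 2

Iterating it from 7 gives 11, 17, 26, 13, 20, 10, 5, 8, 4, 2, 1, 2, 1, and so on: the same trajectory as before, with the halving steps after each 3n+1 folded in.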
We tend to work in base 10 as a society, but the question I asked in an article a couple of weeks ago is what happens if you perform this computation in base 2 or 3 instead.
In base 2, it’s trivial to decide if a number is even or odd, and if it’s even, to divide by two. You just look at the least significant bit, and drop it if it’s a zero!
In base 3, it’s trivial to compute 3n+1. You just add a 1 digit to the end of the number!
We could go either way, really, and in my original article I explored both computations to see what they looked like. This time, we’ll first head deep into the base 2 side, and see where it leads us.
Collatz in Base 2
When computing the Collatz function in base 2, the computationally significant part is to multiply a base 2 number by 3. We can work this out with the standard algorithm we all learned in elementary school, working from right to left and keeping track of a carry at each step.
We can even enumerate the rules:
If the next bit is a 0 and the carry is a 0, then output a 0 and carry a 0.
If the next bit is a 1 and the carry is a 0, then output a 1 and carry a 1.
If the next bit is a 0 and the carry is a 1, then output a 1 and carry a 0.
If the next bit is a 1 and the carry is a 1, then output a 0 and carry a 2.
If the next bit is a 0 and the carry is a 2, then output a 0 and carry a 1.
If the next bit is a 1 and the carry is a 2, then output a 1 and carry a 2.
and represent this using a finite state machine with a transition diagram.
This machine isn’t too hard to understand, really. When you see a 0, move up one state; when you see a 1, move down one state. When the carry is 1 (before moving), output the opposite bit; otherwise, output the same bit. That’s all there is to it.
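As a quick illustration, the six rules transcribe directly into code (a sketch of my own; bits are listed least-significant first):

-- carry -> bit -> (output bit, new carry); each step computes 3*b + c.
times3Step :: Int -> Int -> (Int, Int)
times3Step c b = ((3 * b + c) `mod` 2, (3 * b + c) `div` 2)

-- Multiply a right-to-left list of bits by 3, flushing the final carry.
times3 :: [Int] -> [Int]
times3 = go 0
  where
    go 0 [] = []
    go c [] = c `mod` 2 : go (c `div` 2) []  -- feed in leading zeros to flush
    go c (b : bs) = let (o, c') = times3Step c b in o : go c' bs

For example, times3 [1, 1, 1] (that is, 7) gives [1, 0, 1, 0, 1] (that is, 21).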
We will make three small modifications to this simple state machine:
In the Collatz map, we want to compute 3n+1. That just amounts to starting with a carry of 1, rather than 0.
Before computing 3n+1, we should divide by two until the number is odd. That amounts to adding a new “Start” state, or S for short, that ignores zeros on the right, and then acts like carry 1 when it encounters the first 1 bit. (Recall that we’re working from right to left!)
Finally, let’s compute the shortcut map as well: as discussed above, when we compute 3n+1, it will always be even. (We will always move the start state that acts like carry 1 into carry 2, and the arrow there emits a 0 in the least significant bit.) We do not emit the zero when moving from the start state to carry 2, so the bits that come out represent (3n+1)/2.
The resulting machine looks like this.
When fed the right-to-left bits of a non-zero number, this machine will compute what we might call a compressed Collatz map: dividing by 2 as long as the number remains even, and then computing (3n+1)/2 just like the shortcut Collatz map does.
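A sketch of that machine in code (my own transcription, with illustrative names):

-- States: S (start, still skipping trailing zeros) or a carry of 0, 1, or 2.
data MState = Start | Carry Int deriving (Eq, Show)

-- Feed one bit, least-significant first; Nothing means no bit is emitted.
stepCompressed :: MState -> Int -> (Maybe Int, MState)
stepCompressed Start 0 = (Nothing, Start)    -- drop trailing zeros: divide by 2
stepCompressed Start 1 = (Nothing, Carry 2)  -- act like carry 1 on the first 1,
                                             -- but swallow the 0 output so the
                                             -- result is (3n+1)/2
stepCompressed (Carry c) b =
  (Just ((3 * b + c) `mod` 2), Carry ((3 * b + c) `div` 2))

Feeding in the bits of 7 (1, 1, 1, then zeros forever) emits 1, 1, 0, 1, 0, 0, and so on, which read least-significant first is binary 1011, or 11, as expected.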
Iterating the Map
The Collatz conjecture isn’t about a single application of this map, though, but rather about the trajectory of a number when the map is applied many times in succession. To simulate this, we’ll want a whole infinite array of these machines connected end to end, so the bits that leave each one arrive at the one after. Something like this:
This is starting to get out of hand! So let’s simplify. Two things:
Because their state transition diagrams are all the same, the only information we need about each machine is what state it’s in.
The S state never emits any bits, and you can never get back to S once you leave it, so we know that as soon as we see an S, the entire rest of the machines, the whole infinite tail, is still sitting in the S state waiting for bits. We need not worry about these states at all.
Once we’re done feeding in the non-zero digits of the input number, any machines in state 0 also become uninteresting. The rest of the inputs will all be zero, they will remain in state 0, and they will pass on that 0 bit of input unchanged. Again, we need not worry about these machines.
With that in mind, we can trace what happens when we feed this array of cascading machines the bits of a number. Let’s try 7, since we saw its sequence already earlier on.
The output of each machine feeds into the next machine below it, and I’ve drawn this propagation of outputs to inputs of the next machine using green arrows. We’ll draw the digits of input from right to left, matching the conventional order of writing binary numbers, so in a sense, time flows from right to left here. Each state machine remembers its state as time progresses to the left, and I’ve drawn this memory of previous states using blue arrows. Notice that to play out the full dynamics, we need to feed in quite a few of the leading zeros on the input.
In the rows of green arrows, you can read off the outputs of each state machine in the sequence in binary:
binary 111 = decimal 7
binary 1011 = decimal 11
binary 10001 = decimal 17
binary 11010 = decimal 26
binary 10100 = decimal 20
binary 1000 = decimal 8
binary 10 = decimal 2
If we were to continue, the next rows would just repeat binary 10 (decimal 2) forever. This makes sense, because the way we defined the compressed Collatz map stabilizes at 2, rather than 1.
A second thing we can read from this map is the so-called parity sequence of the shortcut Collatz map. This is just the sequence of evens and odds that occur when the map is iterated on the starting number. When a column ends by emitting a 1 bit, bumping a new machine out of the S state, that indicates that the next value of the shortcut Collatz map will be odd. When it ends in a 0 bit, then the next value will be even.
There’s quite a lot of interesting theory about the parity sequences of the shortcut Collatz map! It turns out that every possible parity sequence is generated by a unique 2-adic integer, which I defined in my previous article, so the 2-adic integers are in one-to-one correspondence with parity sequences. We can, in fact, compute the reverse direction of this one-to-one correspondence as well using state arrays like this one. Every eventually repeating parity sequence comes from a rational number, via the canonical embedding of the rationals into the 2-adic numbers. (The converse, though, that acyclic parity sequences only come from irrational 2-adic integers, is conjectured but not known!)
Because every parity sequence comes from a unique 2-adic integer, if we could show that every positive integer eventually leads to alternating even and odd numbers in its parity sequence, this would prove that the Collatz conjecture is true. Now we have a new way of looking at that question. Among the 2-adic integers, the positive integers are those that eventually have an infinite number of leading 0s. So we can ask instead, from any starting state sequence of this array of machines, when feeding zeros into the sequence forever, do we eventually (ignoring machines at the beginning that have reached the 0 state) reach only a single machine alternating through states 1, 2, 1, 2, etc.?
This isn’t an easy question, though. Certainly, feeding zeros into the array on the left will bump the state of the top-most machines down to zero. However, the bits continue to propagate through the machine, possibly pushing new machines out of their starting states and so appending them on the bottom! There is something of a race between these two processes of pruning machines on the top and adding them on the bottom, and we would need to show that the pruning wins that race.
From Base 2 to Base 3
As we investigate this race, we discover something surprising about the behavior of the state sequences when leading zeros are fed into the top machine. Reading the machine states along the blue arrows, starting at the third column from the right (after all the non-zero bits of input have been fed in), we can interpret each state sequence as a ternary (base 3) number! And we get quite a familiar progression:
ternary 222 = decimal 26
ternary 111 = decimal 13
ternary 202 = decimal 20
ternary 101 = decimal 10
ternary 12 = decimal 5
ternary 22 = decimal 8
ternary 11 = decimal 4
ternary 2 = decimal 2
ternary 1 = decimal 1
That’s the shortcut Collatz sequence again! Rather than starting at 7, we jumped three numbers ahead because it took those three steps to feed in the non-zero bits of 7, so we missed 7, 11 and 17 and went straight to 26. Then we continue according to the same dynamics.
This coincidence where state sequences can be interpreted as ternary numbers is surprising, but is it useful? It can be a revealing way to think about Collatz sequences. Here’s an example.
Numbers of the form 2ⁿ-1 have a binary representation consisting of n consecutive 1s. If we trace what happens to the state sequence, we find that each 1 we feed to this state sequence propagates all the way to the end to append another 2 to the sequence, leaving us with a state sequence consisting of n consecutive 2s. As a ternary number, that is 3ⁿ-1. If the above is correct, then, we can conclude that iterating the shortcut Collatz map n times starting with 2ⁿ-1 should yield 3ⁿ-1 as a result.
In fact, this isn’t hard to prove. We can prove the more general statement that for all i ≤ n, iterating the shortcut Collatz map i times on 2ⁿ-1 gives a result of 3ⁱ2ⁿ⁻ⁱ-1. A simple induction suffices. If i = 0, then the result is immediate. Assuming it’s true for i, and that i + 1 ≤ n, we know that 3ⁱ2ⁿ⁻ⁱ-1 is odd, so applying the shortcut Collatz map one more time yields (3(3ⁱ2ⁿ⁻ⁱ-1)+1)/2 = (3ⁱ⁺¹2ⁿ⁻ⁱ-2)/2 = 3ⁱ⁺¹2ⁿ⁻⁽ⁱ⁺¹⁾-1, which establishes the property for i + 1 as well, completing the induction. Now let i = n to recover the original statement.
The proof was simple, but the idea came from observing the behavior of this state machine. And this is an interesting observation: 3ⁿ-1 grows asymptotically faster than 2ⁿ-1, so it implies that there is no bound on the factor by which a number might grow in the Collatz sequence. We can always find some arbitrarily large n that grows to at least about 1.5ⁿ times its original value.
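For a quick sanity check of the 2ⁿ-1 to 3ⁿ-1 claim (a throwaway helper of mine, not from the original analysis):

-- Iterating the shortcut map n times from 2^n - 1 should give 3^n - 1.
checkClaim :: Int -> Bool
checkClaim n = iterate shortcut (2 ^ n - 1) !! n == 3 ^ n - 1
  where
    shortcut m  -- the shortcut Collatz map again
      | even m = m `div` 2
      | otherwise = (3 * m + 1) `div` 2

and indeed, all checkClaim [1 .. 20] evaluates to True.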
From Base 3 to Base 2
Recall that in base 3, computing 3n+1 is easy, but it’s dividing by two that becomes a non-trivial computation. Back in elementary school again, we learned about long division, an algorithm for dividing numbers digit by digit, this time left to right. To do this, we divide each digit in turn, but keep track of a remainder that propagates from digit to digit. We can also draw this as a state machine.
To extend this to the shortcut Collatz map, we need to look for a remainder when the division completes. This means that we’ll need to feed our state machine not just the ternary digits 0, 1, and 2, but an extra “Stop” signal (S for short) indicating the number is complete. Since the result may be longer than the input, it will be convenient to send an infinite sequence of these S signals, giving the machine time to output all of the digits of the result before it begins to produce S signals, as well. Upon receiving this S signal, if there is a remainder, then the input was odd, so our state machine needs to add another 2 onto the end of the sequence of ternary digits to complete the computation of (3n+1)/2 before emitting its own S signals.
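The per-digit rule is easy to state in code (a sketch of my own; the machine reads ternary digits most-significant first, carrying a remainder of 0 or 1):

-- remainder -> digit -> (output digit, new remainder)
divTwoStep :: Int -> Int -> (Int, Int)
divTwoStep r d = ((3 * r + d) `div` 2, (3 * r + d) `mod` 2)

-- On the Stop signal: a remainder of 1 means the input n was odd, say
-- n = 2q + 1; then (3n + 1) / 2 = 3q + 2, so we append a final digit 2
-- to the quotient q already emitted.
onStop :: Int -> [Int]
onStop 0 = []   -- even input: the digits emitted so far are exactly n/2
onStop _ = [2]  -- odd input: one extra 2 completes (3n+1)/2

For example, 7 is ternary 21: divTwoStep 0 2 emits 1 with remainder 0, then divTwoStep 0 1 emits 0 with remainder 1, and the Stop signal appends a 2, giving ternary 102, which is 11.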
Just as before, we’re interested not in a single application of this state machine, but the behavior of numbers under many different iterations, so we will again chain together an infinite number of these machines, feeding the ternary digits (or S signals) that leave each one into the next one as inputs. This time I’ll draw the ternary digits flowing from right to left.
Let’s try to simplify this picture.
Again, all the state machines share the same transition diagram, so we need only note which state each machine is in.
Once a machine (or the input) starts emitting S signals, it will never stop, so we need not concern ourselves with these machines.
Because the machines start in state 0, and machines in state 0 always decrease the ternary digit as it passes through, no single digit can change more than two of these machines into non-zero states before it becomes zero itself. So we’ll always encounter an infinite tail of 0 states to the left, which is similarly uninteresting.
With those simplifications in mind, we can work through the behavior of these machines starting with the input 7, which is 21 in base 3. This time, input digits (time, essentially) flow from top to bottom, while the iterations of the state machine are oriented from right to left. The green arrows represent the memory of state over time, and the blue arrows represent input and output digits.
Following the flow from right to left and reading down the blue arrows representing ternary digits, we can see the ternary values from the shortcut Collatz map computed by the state machines, read from top to bottom. We might ask a question similar to the earlier one: can we show that, starting from any state and throwing S signals at these state machines from the right, it somehow simplifies to the sequence 10 (a 0, followed by a 1 to its left), which indicates we’ve reached the cyclic orbit at 1, 2 in the shortcut Collatz sequence?
In looking at this, as you likely guessed, we find that the state sequences when read from left to right from green arrows (starting from the second row down, after all the input digits have been fed in) give the binary form of the compressed Collatz map. That’s the one that even further shortens the shortcut Collatz map by folding all the divisions by two so they happen implicitly before each odd value is processed. Starting with base 3, then, we end up back in base 2!
What’s going on? It’s easy to see that the diagram above is the same as the one from binary earlier, except for the addition of two rows at the top (where we’re still feeding in the ternary digits), some uninteresting state propagation from machines that are already emitting S signals, and the swapped interpretation of the axes. But why?
Let’s compare the state machines:
They look quite different… but this is an illusion created by a biased presentation. These diagrams emphasize the state structure, but relegate the input structure to text labels on the arrows. We can instead draw both diagrams at once in this way:
In the base 2 case, we can interpret the rows as representing bits of input, and the columns as states: three carries, and the Start state S. In the base 3 case, we can interpret the columns as inputs: three ternary digits, and the Stop signal S, and the rows as states. With either interpretation, though, the rule is the same: we are exchanging a presentation of a number from 0 through 5 as 3b + t for a presentation as 2t + b, where t takes values 0, 1, or 2, while b takes values 0 or 1, and with the same rules for the special S token on the side of the least significant digits.
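In code, the exchange of presentations is just this pair of bijections on the six combined values (hypothetical helper names):

-- Read s in {0..5} as 3b + t: a bit b paired with a ternary digit t.
asBase2 :: Int -> (Int, Int)  -- s -> (b, t)
asBase2 s = (s `div` 3, s `mod` 3)

-- Read the same s as 2t + b: a ternary digit t paired with a bit b.
asBase3 :: Int -> (Int, Int)  -- s -> (t, b)
asBase3 s = (s `div` 2, s `mod` 2)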
So in some deep sense, computing the Collatz trajectory in base 2 or base 3 is performing the same computation. This is true even though in base 2, we’re computing the compressed Collatz map, which has fewer iterations (but more digits to compute with), while in base 3, we’re computing the shortcut Collatz map, which has more iterations (but fewer digits to compute with). Somehow these differences are all dual to each other so the same thing happens in each.
AI is hot, so let’s talk about some “classical machine learning” in Haskell
with k-means clustering! Let’s throw in some dependent types too.
There are a bazillion ways of implementing such a simple algorithm, but this
is how I’d do it, as someone who develops almost exclusively in Haskell
(or functional pure languages) in both personal projects and work. It’s not the
“right” way or the “best” way, but it’s the way that brings me joy. Hopefully it
can also break beyond the simple toy projects you’ll often see in conceptual
tutorials. You’ll see how I integrate dependent types, type-driven development,
mutable data structures, generating random data, and preparation for
parallelism. I have been meaning to shift away from “conceptual” posts and
instead post a bit more about small, practical snippets that demonstrate some
useful Haskell techniques and principles that drive how I approach coding in Haskell
overall.
For reference, the intended audience is people with knowledge of Haskell
syntax and basic idioms (mapping, traversing, folding, applicatives). The source
code is
online here, and is structured as a nix flake script. If you have nix installed (and flakes enabled), you should be
able to run the script as an executable (./kmeans.hs). You can also
load it for editing with nix develop + ghci.
The Algorithm
K-means is a
method of assigning a bunch of data points/samples to k clusters.
For the purpose of this post, we’re going to talk about data points as points in
a vector space and clustering as grouping together clusters of points that are
close to each other (using Euclidean/L2 distance).
The basic iteration goes like this:
Start with k cluster centers (“means”, or “centroids” sometimes),
k arbitrary points in your space.
Repeat until the stop condition:
Assign/bucket each data point to its closest cluster center/mean.
Move each of the cluster centers to the mean/centroid of the points that
were assigned to it, or the points in its bucket.
Basically, we repeatedly say, “if this was the true cluster center, what
points would be in it?”. Then we adjust our cluster center to the center of
those points that were assigned to it, updating to a better guess. Then we
repeat again. A simple stopping condition would be if none of the k
centers move after the update step.
The algorithm leaves the assigning of the original points undefined, and it’s
not optimal either, since it might converge on clusters that aren’t the
best. But it’s simple enough conceptually that it’s taught in every beginner
machine learning course.
The Haskell
We’re going to be dealing with points in a vector space and distances between
them, so a good thing to reach for is the linear library, which
offers types for 2D vectors, 3D vectors, etc. and how to deal with them as
points in a vector space. linear offers an abstraction over multiple
vector space points. A point has type p a: p is a
vector space over field a. The library has V2 a for 2D
points, so V2 Double is essentially \(\mathbb{R}^2\), a 2-dimensional point with
double-valued components.
We want a collection of k cluster centers. We can use vector-sized for
a fixed-size collection of items, Vector k (V2 Double) for
k 2-D double points, or Vector k (p a) for
k of any type of points.1
So overall, our function will have type:
kMeans :: [p a] -> Vector k (p a)
It will take a collection of p a points, and provide the
k cluster centers. Note here that we have “return-type
polymorphism”, where the k (number of items) is determined by what
type the user expects the function to return. If they want 3 clusters of 2d
points, they will call it expecting Vector 3 (V2 Double). If they
want 10 clusters of 2d points, they would call it expecting
Vector 10 (V2 Double).
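For example (a hypothetical call, with somePoints :: [V2 Double] assumed in scope):

centers :: Vector 3 (V2 Double)
centers = kMeans somePoints  -- k = 3 is inferred from the annotation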
We take a list of p a’s here because all we are going
to do is iterate over each one…we don’t really care about random access
or updates, so it’s really the best we can hope for, asymptotically2.
We have some leeway as to how we initialize our initial clusters. One simple
solution is to just assign point 0 to cluster 0, point 1 to cluster 1, point 2 to
cluster 2, etc., cycling around the clusters.
runST runs the mutable algorithm (see the sketch below), where we initialize a vector of
point sums and a vector of point counts. We then iterate over all of the points
with their index (with ifor_), and we add that point to the index
of the cluster, modulo k. A sized vector Vector k a is
indexed by a Finite k (an integer from 0 to k-1). So,
modulo :: Integer -> Finite k will convert an integer index to
the Finite k index type, using modulus to wrap it around if it’s
too big.
Here we are using some functions from linear:
(^+^) :: (Additive p, Num a) => p a -> p a -> p a
which adds together two points
(^/) :: (Functor p, Fractional a) => p a -> a -> p a
which divides a point by a scalar
At the end of it all, we use V.generateM to assemble our final
(immutable) centroids by reading out the sums and totals at each cluster:
V.generateM :: (Finite k -> m a) -> m (Vector k a)
Note that the lengths of our intermediate vectors (sums,
counts, and the final result) are all implicitly inferred through
type inference (by k).
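Putting those pieces together, initialClusters might look like this (a sketch reconstructed from the description above, assuming the same V and MV qualified imports as the other snippets, and at least k points so that no count is zero):

initialClusters
    :: forall k p a. (Functor p, Additive p, Fractional a, KnownNat k, 1 <= k)
    => [p a]
    -> Vector k (p a)
initialClusters pts = runST do
    sums <- MV.replicate zero    -- running sum of the points in each cluster
    counts <- MV.replicate 0     -- number of points in each cluster
    ifor_ pts \i p -> do
      let i' = modulo (fromIntegral i)  -- wrap the index around into Finite k
      MV.modify sums (^+^ p) i'
      MV.modify counts (+ 1) i'
    V.generateM \i ->            -- centroid = sum / count
      (^/) <$> MV.read sums i <*> (fromInteger <$> MV.read counts i)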
We can actually do a similar loop to assign/bin each point and compute the
new centroids:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/kmeans/kmeans.hs#L44-L61
moveClusters
    :: forall k p a. (Metric p, Floating a, Ord a, KnownNat k, 1 <= k)
    => [p a]
    -> Vector k (p a)
    -> Vector k (p a)
moveClusters pts origCentroids = runST do
    sums <- MV.replicate zero
    counts <- MV.replicate 0
    for_ pts \p -> do
      let closestIx = V.minIndex @a @(k - 1) (distance p <$> origCentroids)
      MV.modify sums (^+^ p) closestIx
      MV.modify counts (+ 1) closestIx
    V.generateM \i -> do
      n <- MV.read counts i
      if n == 0
        then pure $ origCentroids `V.index` i
        else (^/ fromInteger n) <$> MV.read sums i
We just have to be careful to not move the centroid if there are no points
assigned to it, otherwise we’d be dividing by 0.
Notice there’s also something a little subtle going on with
closestIx, which exposes a bit of the awkwardness with working with
type-level numbers in Haskell today. The type of V.minIndex is:
V.minIndex :: forall a n. Ord a => Vector (n + 1) a -> Finite (n + 1)
This is because we only ever get a minimum if the vector is non-empty. So the
library takes n + 1 as the size to ensure that only positive length
vectors are passed.
In our case, we want V.minIndex blah :: Finite k. However,
remember how typechecking works: we need to unify the type variables
a and n so that n + 1 is equal to
k. So, what does n have to be so that \(n + 1 = k\)? Well, we can see from algebra that
n needs to be k - 1: (k - 1) + 1 is equal
to k. However, GHC is a little dumb-dumb here in that it cannot
solve for n itself. We can explicitly pass in @(k - 1)
to say that n has to be k - 1.
For this to work we need to pull in a GHC plugin ghc-typelits-natnormalise
which will allow GHC to simplify (k - 1) + 1 to be k,
which it can’t do by itself for some reason. It also requires the constraint
that 1 <= k in order for k - 1 to make sense for
natural number k. We can pull in the plugin with:
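{-# OPTIONS_GHC -fplugin GHC.TypeLits.Normalise #-}
-- (the standard pragma for enabling ghc-typelits-natnormalise)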
Honestly if we were to design the library from scratch today, I’d define it
as:
V.minIndex :: forall a n. (Ord a, 1 <= n) => Vector n a -> Finite n
in the first place, and we wouldn’t need the typechecker plugin.
Anyway so that’s the whole thing:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/kmeans/kmeans.hs#L63-L75
kMeans
    :: forall k p a. (Metric p, Floating a, Ord a, Eq (p a), KnownNat k, 1 <= k)
    => [p a]
    -> Vector k (p a)
kMeans pts = go 0 (initialClusters pts)
  where
    go :: Int -> Vector k (p a) -> Vector k (p a)
    go !i !cs
      | cs == cs' || i > 100 = cs
      | otherwise = go (i + 1) cs'
      where
        cs' = moveClusters pts cs
Note I also added a stop after 100 steps, just to be safe.
Type-Level Advantages and Usability
Having k in the type is useful for many reasons:
It helps us ensure that moveClusters doesn’t change the number
of clusters/centroids. If it were just [p a] -> [p a], we couldn’t
guarantee that it doesn’t add or drop clusters.
The type system means we don’t have to manually pass int sizes
around. For example, in initialClusters, we implicitly pass the
size around four times when we do MV.replicate (twice),
modulo, and generateM! And, in the definition of
kMeans, we implicitly pass it on to our call to
initialClusters.
We don’t have to worry about out-of-bounds indexing because any indices we
generate (using modulo or minIndex) are guaranteed
(by their types) to be valid.
It’s useful for the caller to guarantee they are getting what they are
asking for. If kMeans :: Int -> [p a] -> [p a], then we (as
the caller) can’t be sure that the result list has the number of items that we
requested. But because we have
kMeans :: [p a] -> Vector k (p a), the compiler ensures that the
result has k items.
However, you won’t necessarily always be able to put in a literal
3 in Vector 3 (V2 Double). Maybe your k comes
from a configuration file or something else you pull in at runtime. We need a
way to call kMeans with just an Int! (also known as
“reification”)
Normally, this means using someNatVal to convert a value-level
Natural into a type-level Nat. However, in this case
we have to be a bit more careful because k must be at least 1. As of
GHC 9.2, we can use cmpNat (before this, you could use typelits-witnesses)
to bring this constraint into scope.
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/kmeans/kmeans.hs#L77-L87
kMeans'
    :: forall p a. (Metric p, Floating a, Ord a, Eq (p a))
    => Natural
    -> [p a]
    -> [p a]
kMeans' k pts = case someNatVal k of
  SomeNat @k pk -> case cmpNat (Proxy @1) pk of
    LTI -> toList $ kMeans @k pts  -- 1 < k, so 1 <= k is valid
    EQI -> toList $ kMeans @k pts  -- 1 == k, so 1 <= k is valid
    GTI -> []                      -- in this branch, 1 > k, so we cannot call kMeans
Applying the Clusters
Of course, kMeans only gets us our centroids, so it would be
useful to actually create the clusters themselves and all their member points.
We can do something similar to what we did before with ST and
mutable vectors and runST, but life is too short to always be using
mutable state. Let’s instead build up a map of indices to all the points that
are closest to that index. Then we use
generate :: (Finite k -> a) -> Vector k a to create a vector
by picking out the map’s value at the index at each spot in the vector. Again
here we see that the type system helps us by not having to manually pass in a
size, and generate giving us indices i that match the
number of the centroids we are grouping on.
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/kmeans/kmeans.hs#L104-L119
applyClusters
    :: forall k p a. (Metric p, Floating a, Ord a, Ord (p a), KnownNat k, 1 <= k)
    => [p a]
    -> Vector k (p a)
    -> Vector k (Set (p a))
applyClusters pts cs = V.generate \i -> M.findWithDefault S.empty i pointsClosestTo
  where
    pointsClosestTo :: Map (Finite k) (Set (p a))
    pointsClosestTo =
      M.fromListWith
        (<>)
        [ (closestIx, S.singleton p)
        | p <- pts
        , let closestIx = V.minIndex @a @(k - 1) (distance p <$> cs)
        ]
Parallelization
Typically we parallelize this by assigning each worker thread a chunk of
points it has to deal with, and having each one compute sums and counts and
coordinating it all back in the end. In this case we want to keep the
intermediate sums and counts:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/kmeans/kmeans.hs#L89-L102
groupAndSum
    :: (Metric p, Floating a, Ord a, KnownNat (k + 1))
    => [p a]
    -> Vector (k + 1) (p a)
    -> Vector (k + 1) (p a, Integer)
groupAndSum pts cs0 = runST do
    sums <- MV.replicate zero
    counts <- MV.replicate 0
    for_ pts \p -> do
      let closestIx = V.minIndex (distance p <$> cs0)
      MV.modify sums (^+^ p) closestIx
      MV.modify counts (+ 1) closestIx
    V.generateM \i -> (,) <$> MV.read sums i <*> MV.read counts i
Running an example
For funsies let us generate sample points that we know are clustered based on
k random cluster centers, using mwc-random for
randomness.
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/kmeans/kmeans.hs#L121-L147
generateSamples
    :: forall p g m. (Applicative p, Traversable p, StatefulGen g m)
    => Int  -- ^ number of points per cluster
    -> Int  -- ^ number of clusters
    -> g
    -> m ([p Double], [p Double])
generateSamples numPts numClusters g = do
    (centers, ptss) <- unzip <$> replicateM numClusters do
      -- generate the centroid uniformly in the box, component-by-component
      center <- sequenceA $ pure @p $ MWC.uniformRM (0, boxSize) g
      -- generate numPts points...
      pts <- replicateM numPts $
        -- ...component-by-component, as a normal distribution around the center
        traverse (\c -> MWC.normal c 0.1 g) center
      pure (center, pts)
    pure (centers, concat ptss)
  where
    -- get the dimension by getting the length of a unit point
    dim = length (pure () :: p ())
    -- approximately scale the range of the numbers by the area that the
    -- clusters would take up
    boxSize = (fromIntegral numClusters ** recip (fromIntegral dim)) * 20
By the way isn’t it funny that everything just ends up being
traverse or some derivation of it (like replicateM or
sequenceA)? Anyways,
I am very humbled to be supported by an amazing community, who make it
possible for me to devote time to researching and writing these posts. Very
special thanks to my supporter at the “Amazing” level on patreon, Josh Vera! :)
Be mindful that for Vector here we are using it
strictly as a “fixed-size collection of values”, whereas for linear,
we have types like V2 which represent points in a mathematical
vector space. It’s a bit unfortunate that the terminology overlaps here a
bit.↩︎
Yes, yes, linked lists are notoriously bad for the CPU-level
cache and branch prediction, so if we are in a situation where we really care,
using a contiguous memory data structure (like Storable Vector) might be
better.↩︎
“Algebraic data types” is a beloved feature of programming languages with
such a mysterious name. Where does this name come from?
There are two main episodes in this saga: Hope and Miranda.
The primary conclusion is that the name comes from universal algebra,
whereas another common interpretation of “algebraic” as a reference
to “sums of products” is not historically accurate.
We drive the point home with Clear. CLU is extra.
Disclaimer: I’m no historian and I’m nowhere as old as these languages
to have any first-hand perspective.
Corrections and suggestions for additional information are welcome.
Algebraic data types were at first simply called “data types”.
This programming language feature is commonly attributed to
Hope, an experimental applicative language by Rod Burstall et al.
Here is the relevant excerpt from the paper, illustrating its concrete syntax:
A data declaration is used to introduce a new data type along with
the data constructors which create elements of that type. For example,
the data declaration for natural numbers would be:
data num == 0 ++ succ(num)
(…) To define a type ‘tree of numbers’, we could say
data numtree == empty ++ tip(num)
++ node(numtree#numtree)
(The sign # gives the cartesian product of types).
One of the elements of numtree is:
But we would like to have trees of lists and trees of trees as well,
without having to redefine them all separately. So we declare a
type variable
typevar alpha
which when used in a type expression denotes any type
(including second- and higher-order types).
A general definition of tree as a parametric type is now possible:
data tree(alpha) == empty ++ tip(alpha)
++ node(tree(alpha)#tree(alpha))
Now tree is not a type but a unary type constructor – the type
numtree can be dispensed with in favour of tree(num).
Pattern matching in Hope is done in multi-clause function declarations or multi-clause lambdas.
There was no case expression.
As far as I can tell, other early programming languages cite Hope or one of its descendants
as their inspiration for data types.
There is a slightly earlier appearance in NPL by Darlington and the same Burstall,
but I couldn’t find a source describing the language or any samples of data type declarations.
Given the proximity, it seems reasonable to consider them the same language to a large extent.
This paper by Burstall and Darlington (1977) seems to be using NPL
in its examples, but data types are only introduced informally;
see on page 62 (page 19 of the PDF):
We need a data type atom, from which we derive a data type tree, using constructor
functions tip to indicate a tip and tree to combine two subtrees
tip : atoms → trees
tree : trees x trees → trees
We also need lists of atoms and of trees, so for any type alpha let
nil : alpha-lists
cons : alphas x alpha-lists → alpha-lists
Hope inspired ML (OCaml’s grandpa) to adopt data types. In Standard ML:
datatype 'a option = Nothing | Some of 'a
Before it became Standard, ML started out as the “tactic language” of the
LCF proof assistant
by Robin Milner, and early versions did not feature data types
(see the first version of Edinburgh LCF).
It’s unclear exactly when data types were added, but
The Definition of Standard ML
by Milner et al. credits Hope for it (in Appendix F: The Development of ML):
Two movements led to the re-design of ML. One was the work of Rod Burstall
and his group on specifications, crystallised in the specification language
Clear and in the functional programming language Hope; the
latter was for expressing executable specifications. The outcome of this work
which is relevant here was twofold. First, there were elegant programming
features in Hope, particularly pattern matching and clausal function definitions;
second, there were ideas on modular construction of specifications,
using signatures in the interfaces. A smaller but significant movement was
by Luca Cardelli, who extended the data-type repertoire in ML by adding
named records and variant types.
Miranda
The second episode is Miranda. Here is how its documentation introduces data types:
The basic method of introducing a new concrete data type, as in a number of
other languages, is to declare a free algebra. In Miranda this is done by an
equation using the symbol ::=,
tree ::= Niltree | Node num tree tree
being a typical example. (…)
The idea of using free algebras to define data types has a long and respectable
history [Landin 64], [Burstall 69], [Hoare 75]. We call it a free algebra, because
there are no associated laws, such as a law equating a tree with its mirror image.
Two trees are equal only if they are constructed in exactly the same way.
In case you aren’t aware, Miranda is a direct precursor of Haskell.
A minor similarity with Haskell that we can see here
is that data constructors are curried in Miranda, unlike in Hope and ML.
Another distinguishing feature of Miranda is laziness.
See also A History of Haskell: being lazy with class.
Below are links to the articles cited in the quote above.
The first [Landin 64] doesn’t explicitly talk about algebra in
this sense, while [Burstall 69] and [Hoare 75] refer to “word algebra” rather than
“free algebra” to describe the same structure, without putting “algebra” in the
same phrase as “type” yet.
Hoare’s paper contains some futuristic pseudocode in particular:
A possible notation for such a type definition was
suggested by Knuth; it is a mixture of BNF (the | symbol) and the
PASCAL definition of a type by enumeration:
(…)
In defining operations on a data structure, it is usually necessary
to enquire which of the various forms the structure takes, and what are
its components. For this, I suggest an elegant notation which has been
implemented by Fred McBride in his pattern-matching LISP. Consider
for example a function intended to count the number of &s contained in
a proposition. (…)
function andcount (p: proposition): integer;
andcount := cases p of
(prop(c) → 0|
neg(q) → andcount(q)|
conj(q,r) → andcount(q) + andcount(r)+1|
disj(q,r) → andcount(q) + andcount(r));
Fred McBride’s pattern-matching LISP is the topic of
his PhD dissertation.
There is not enough room on this page to write about the groundbreaking history of LISP.
Unfree algebras in Miranda
If algebraic data types are “free algebras”,
one may naturally wonder whether “unfree algebras” have a role to play.
Miranda allows quotienting data type definitions by equations (“laws” or “rewrite rules”).
You could then define the integers like this, with a constructor
to decrement numbers, and equations to reduce integers to a canonical representation:
int ::= Zero | Suc int | Pred int
Suc (Pred n) => n
Pred (Suc n) => n
In hindsight this is superfluous, but it’s fun to see this kind of
old experiment in programming languages.
The modern equivalent in Haskell would be to hide the data constructors
and expose smart constructors instead.
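For instance, a sketch of that modern equivalent (constructor and function names are illustrative; only the smart constructors would be exported):

data Z = Zero | Suc Z | Pred Z

suc :: Z -> Z
suc (Pred n) = n  -- the law Suc (Pred n) => n
suc n = Suc n

pred' :: Z -> Z
pred' (Suc n) = n  -- the law Pred (Suc n) => n
pred' n = Pred n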
There are uses for quotient types in proof assistants and dependently
typed languages, but they work quite differently.
Sums of products?
There is another folklore interpretation of “algebraic” in “algebraic data types”
as referring to “sums of products”.
It’s not an uncommon interpretation. In fact, trying to find a source for
this folklore is what got me going on this whole adventure.
The Wikipedia article on algebraic data types
at the time of writing doesn’t outright say it, but it does refer to sums and
products several times while making no mention of free algebras.
Some [citation needed] tags should be sprinkled around.
The Talk page of that article contains an unresolved discussion of this issue, with links to
a highly upvoted SO answer
and another one
whose references don’t provide a first-hand account of the origins of the term.
For sure, following that idea leads to some fun combinatorics,
like differentiation on data types,
but that doesn’t seem to have been the original meaning of “algebraic data types”.
That interpretation might have been in some people’s mind in the 70s and 80s,
even if only as a funny coincidence, but I haven’t found any written evidence of
it except maybe this one sentence in a later paper,
Some history of programming languages by David Turner (2012):
The ISWIM paper also has the first appearance of algebraic type definitions
used to define structures. This is done in words, but the sum-of-products idea is
clearly there.
It’s only a “maybe” because while the phrase “algebraic type” undeniably refers to
sums of products, it’s not clear that the adjective “algebraic” specifically is
meant to be associated with “sum-of-products” in that sentence. We
could replace “algebraic type” with “data type” without changing the meaning
of the sentence.
In contrast, free algebras—or initial algebras as one might prefer to call them—are
a concept from the areas of universal algebra and category theory with
a well-established history in programming language theory by the time
algebraic data types came around, with influential contributions by a certain
ADJ group;
see for example Initial algebra semantics and continuous algebras.
Ironically, much related work focused on the other ADT, “abstract data types”.
Using universal algebra as a foundation, a variety of “specification languages” have
been designed for defining algebraic structures, notably the OBJ family
of languages created by Joseph Goguen (a member of the aforementioned ADJ group) and others,
and the Clear language by Rod Burstall (of Hope fame) and Joseph Goguen.
Details of the latter can be found in The Semantics of Clear, a specification language.
(You may remember seeing a mention of Clear earlier
in the quote from The Definition of Standard ML.)
Example theories in Clear
Here is the theory of monoids in Clear. It consists of one sort named carrier,
an element (a nullary operation) named empty and a binary operation append.
constant Monoid = theory
sorts carrier
opns empty : carrier
append : carrier,carrier -> carrier
eqns all x: carrier . append(x,empty) = x
all x: carrier . append(empty,x) = x
all x,y,z: carrier . append(append(x,y),z) = append(x,append(y,z))
endth
A theory is an interface. Its implementations are called algebras.
In that example, the algebras of “the theory of monoids” are exactly monoids.
In every theory, there is an initial algebra obtained by turning the
operations into constructors (or “uninterpreted operations”), equating elements
(which are trees of constructors) modulo the equations of the theory.
For the example above, the initial monoid is a singleton monoid, with only an empty element
(all occurrences of append are simplified away by the two equations for empty),
which is not very interesting. Better examples are those corresponding to the usual data types.
The booleans can be defined as the initial algebra of the theory with one sort (truthvalue)
and two values of that sort, true and false.
constant Bool = theory data
sorts truthvalue
opns true,false: truthvalue
endth
In Clear, the initial algebra is specified by adding the data keyword to a theory.
In the semantics of Clear, rather than thinking in terms of a specific algebra,
a “data theory” is still a theory (an interface),
with additional constraints that encode “initiality”, so the only possible
algebra (implementation) is the initial one.
My guess as to why the concept of data theory is set up that way
is that it allows plain theories and data theories to be combined seamlessly.
The natural numbers are the initial algebra of zero and succ:
constant Nat = theory data
sorts nat
opns zero: nat
succ: nat -> nat
endth
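For comparison, the constructors of the corresponding Haskell data declarations are exactly the uninterpreted operations of these two data theories (primes added to dodge the Prelude names):

data TruthValue = True' | False'
data Nat = Zero | Succ Nat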
At this point, the connection between “data theories” in Clear and data types
in Hope and subsequent languages is hopefully clear.
More substantial examples in Clear
Theories can be extended into bigger theories with new sorts, operations, and equations.
Here is an extended theory of booleans with two additional operations not, and,
and their equations. This should demonstrate that, beyond the usual mathematical structures,
we can define non-trivial operations in this language:
constant Bool1 = enrich Bool by
opns not: truthvalue -> truthvalue
and: truthvalue,truthvalue -> truthvalue
eqns all . not true = false
all . not false = true
all p: truthvalue . and(false, p) = false
all p: truthvalue . and(true, p) = p
enden
Initial algebras are also called free algebras, but that gets confusing because
“free” is an overloaded word. Earlier for instance, you might have expected the initial
monoid, or “free monoid”, to be the monoid of lists. The monoid of lists is the
initial algebra in a slightly different theory: the theory of monoids with an
embedding from a fixed set of elements A.
We might formalize it as follows in Clear.
The theory List is parameterized by an algebra A of the theory Set,
and its body is the same as Monoid, except that we renamed carrier to list,
we added an embed operation, and we added the data keyword to restrict that
theory to its initial algebra.
constant Set = theory sorts element endth
procedure List(A : Set) = theory data
sorts list
opns empty : list
append : list,list -> list
embed : element of A -> list
eqns all x: list . append(x,empty) = x
all x: list . append(empty,x) = x
all x,y,z: list . append(append(x,y),z) = append(x,append(y,z))
endth
One may certainly see a resemblance between theories in Clear, modules in ML,
and object-oriented classes.
It’s always funny to find overlaps between the worlds of functional and
object-oriented programming.
CLU is a programming language created at MIT by Barbara
Liskov and her students in the course of their work on data abstraction.
It features tagged union types, which are called “oneof types”.
(Source: CLU Reference Manual by Barbara Liskov et al. (1979).)
T = oneof[empty: null,
integer: int,
real_num: real,
complex_num: complex]
Values are constructed by naming the oneof type (either as an identifier bound to it,
or by spelling out the oneof construct) then the tag prefixed by make_:
T$make_integer(42)
The tagcase destructs “oneof” values.
x: oneof[pair: pair, empty: null]
...
tagcase x
tag empty: return(false)
tag pair(p: pair): if (p.car = i)
then return(true)
else x := down(p.cdr)
end
end
The main missing feature for parity with algebraic data types is recursive type
definitions, which are not allowed directly. They can be achieved indirectly
though inconveniently, through multiple clusters (classes in modern terminology).
(Source: A History of CLU by Barbara Liskov (1992).)
Burstall’s papers on Hope and Clear cite CLU, but beyond that it doesn’t
seem easy to make precise claims about the influence of CLU, which is an object-oriented language,
on the evolution of those other declarative languages developed across the pond.
Software engineers are not (and should not be) technicians
I don’t actually think predictability is a good thing in software
engineering. This will probably come as a surprise to some people
(especially managers), but I’ll explain what I mean.
In my view, a great software engineer is one who automates
repetitive/manual labor. You would think that this is a pretty low bar
to clear, right? Isn’t automation of repetitive tasks … like …
programming 101? Wouldn’t most software engineers be great engineers
according to my criterion?
No.
I would argue that most large software engineering
organizations incentivize anti-automation and it’s primarily
because of their penchant for predictability, especially predictable
estimates and predictable work. The reason this happens is that
predictable work is work that could have been automated but was
not automated.
Example
I’ll give a concrete example of predictable work from my last job.
Early on we had a dedicated developer for maintaining our web API. Every
time some other team added a new gRPC API endpoint to an internal
service this developer was tasked with exposing that same information
via an HTTP API. This was a fairly routine job but it still required
time and thought on their part.
Initially managers liked the fact that this developer could estimate
reliably (because the work was well-understood) and this developer liked
the fact that they didn’t have to leave their comfort zone. But
it wasn’t great for the business! This person frequently became
a bottleneck for releasing new features because they had inserted their
own manual labor as a necessary step in the development pipeline. They
made the case that management should hire more developers like themselves
to handle increased demand for their work.
Our team pushed back on this because we recognized that this
developer’s work was so predictable that it could be completely
automated. We made the case to management that rather than hiring
another person to do the same work, we should be automating more, and it's
a good thing we did: that developer soon left the company, and rather than
hiring a replacement we automated away their job. We wrote
some code to automatically generate an HTTP API from the corresponding
gRPC API1 and that generated much more value
for the business than hiring a new developer.
Technicians vs Engineers
I like to use the term “technician” to describe a developer who (A)
does work that is well-understood and (B) doesn’t need to leave their
comfort zone very often. Obviously there is not a bright line dividing
engineers from technicians, but generally speaking the more predictable
and routine a developer’s job the more they tend to slide into becoming
a technician. In the above example, I viewed the developer maintaining
the web API as more of a technician than an engineer.
In contrast, the more someone leans into being an engineer the more
unpredictable their work gets (along with their estimates). If you’re
consistently automating things then all of the predictable work slowly
dries up and all that’s left is unpredictable work. The nature of a
software engineer’s job is that they are tackling increasingly
challenging and ambitious tasks as they progressively automate more.
I believe that most tech companies should not bias towards
predictability and should avoid hiring/cultivating technicians. The
reason that tech companies command outsized valuations is because of
automation. Leaning into predictability and well-understood work
inadvertently incentivizes manual labor instead of automation. This
isn’t obvious to a lot of tech companies because they assume any work
involving code is necessarily automation but that’s not always the
case2. Tech companies that fail to
recognize this end up over-hiring and wondering why less work is getting
done with more people.
Or to put it another way: I actually view it as a red flag if an
engineer or team gets into a predictable “flow” because it means that
there is a promising opportunity for automation they’re ignoring.
Nowadays there are off-the-shelf
tools to do this like grpc-gateway
but this wasn’t available to us at the time.↩︎
… or even usually the case; I’m
personally very cynical about the engineering effectiveness of most tech
companies.↩︎
I've been running development versions of Emacs ever since I switched to Wayland
and needed the PGTK code. The various X-git packages on AUR make that easy,
as long as one doesn't mind building the packages locally, and regularly.
Building a large package like Emacs does get a bit tiring after a while, though,
so I started looking at the emacs overlay to see if I could keep up without
building quite that much.
The first attempt at this failed as I couldn't get my email setup working; emacs
simply refused to find the locally installed mu4e package. I felt I didn't
have time to solve it at the time, so I reverted to doing the builds myself
again. It kept irritating me though, and today I made another attempt. This time
I invested a bit more time in reading up on how to install emacs via Nix with
packages, something that paid off.
I'm managing my packages using nix profile and a flake.nix. To install emacs
with a working mu4e I started with adding the emacs overlay to the inputs
and in the list of packages passed to pkgs.buildEnv I added
...
((emacsPackagesFor emacs-pgtk).emacsWithPackages
  (epkgs: [ epkgs.mu4e ]))
mu
...
That's all there is to it. After running nix profile update 0 I had a build of
emacs with Wayland support that's less than a day old, all downloaded from the
community cache. Perfect!
Being quite into mathematics,
I sometimes blog about it.1
There are very capable solutions for rendering LaTeX in HTML documents out there,
which in particular solve the problem of properly aligning the fragments with the rest of the text.
One of them is KaTeX,
advertising itself to be easily executed on the server-side,
avoiding the use of extraneous client-side JavaScript.
Integrating it with Hakyll turned out to be relatively straightforward,
yet I haven’t seen an actual implementation anywhere;
this post is supposed to fill that gap.
My dark MathJax past
One of my quite strongly held opinions is that,
for static websites such as this one,
client-side LaTeX rendering is completely unnecessary,
and actually just a waste of resources.
As a result, I’ve been using MathJax
to insert LaTeX fragments into the HTML after it’s compiled from Markdown.
This setup—stolen essentially verbatim from Gwern—uses the now deprecated mathjax-node-page
to crawl through the already rendered HTML pages, and, upon recognising a math fragment,
replaces that text with the rendered formula.
The call to mathjax-node-page is trivial to parallelise on a per-file level with something like GNU parallel,
and so the whole thing actually works quite well.
However, the fact that this is “external” to Pandoc’s pipeline
and requires a separate build.sh file to be created has always felt a bit awkward to me.2
Plus, Hakyll is already capable of using GHC’s parallel runtime—why outsource a part of that to an external tool?
At some point, the annoyance I felt at this became stronger than the inertia my old setup had, so here we are.
A brighter future with KaTeX
Naturally, when you change something you really want to change something—at least I do—so instead of using MathJax v3’s native support for these kinds of things,
why not try something new?
An often cited alternative to MathJax is KaTeX,
yet another JavaScript library that promises decent maths rendering on the web.
This one is pretty good, though;
it’s supposed to be faster than MathJax,
and has “server side rendering” as a big bullet point on its landing page.
Sounds exactly like what I’m looking for.
KaTeX has a CLI of the same name,
but booting up the node runtime for every single maths fragment sounds perfectly dreadful to me,
so let’s not do that.
As such, one probably can’t avoid writing at least a little bit of JavaScript.
Thankfully, integrating KaTeX into Pandoc itself seems to be a well-trodden path,
so other people have already done this for me.
For example,
pandoc#6651
has a tiny script—essentially just calling katex.renderToString—that
is fed maths on stdin,
and then produces HTML on stdout.
Slightly adjusted to support inline and display maths, it looks like this:
Having this in place,
all that’s left is to crawl through Pandoc’s AST,
and feed each maths fragment to KaTeX.
Transforming its AST is something that Pandoc does
very well,
so the code is usually swiftly written.
Indeed, both the Block and Inline types
have a Math constructor which we can match on.3
import Data.Text (Text)
import Data.Text qualified as T
import Data.Text.IO qualified as T
import GHC.IO.Handle (BufferMode (NoBuffering), Handle, hSetBuffering)
import Hakyll
import System.Process (runInteractiveCommand)
import Text.Pandoc.Definition (Block (..), Inline (..), MathType (..), Pandoc)
import Text.Pandoc.Walk (walk, walkM)

hlKaTeX :: Pandoc -> Compiler Pandoc
hlKaTeX pandoc = recompilingUnsafeCompiler do
  (hin, hout, _, _) <- runInteractiveCommand "deno run scripts/math.ts"
  hSetBuffering hin  NoBuffering
  hSetBuffering hout NoBuffering

  (`walkM` pandoc) \case
    Math mathType (T.unwords . T.lines . T.strip -> text) -> do
      let math :: Text
          math = foldl' (\str (repl, with) -> T.replace repl with str)
                        (case mathType of
                           DisplayMath -> ":DISPLAY " <> text
                           InlineMath  -> text)
                        macros
      T.hPutStrLn hin math
      RawInline "html" <$> getResponse hout
    block -> pure block
 where
  -- KaTeX might send the input back as multiple lines if it involves a
  -- matrix of coordinates. The big assumption here is that it does so only
  -- when matrices—or other such constructs—are involved, and not when it
  -- sends back "normal" HTML.
  getResponse :: Handle -> IO Text
  getResponse handle = go ""
   where
    go :: Text -> IO Text
    go !str = do
      more <- (str <>) <$> T.hGetLine handle
      if ">" `T.isSuffixOf` more  -- end of HTML snippet
        then pure more
        else go more

  -- I know that one could supply macros to KaTeX directly, but where is the
  -- fun in that‽
  macros :: [(Text, Text)]
  macros =
    [ ("≔"       , "\\mathrel{\\vcenter{:}}=")
    , ("\\defeq" , "\\mathrel{\\vcenter{:}}=")
    , ("\\to"    , "\\longrightarrow")
    , ("\\mapsto", "\\longmapsto")
    , ("\\cat"   , "\\mathcal")
    , ("\\kVect" , "\\mathsf{Vect}_{\\mathtt{k}}")
    ]
The (T.unwords . T.lines . T.strip -> text) view pattern
is because KaTeX really does not seem to like it when there is a line break—even a semantically irrelevant one—in a formula.
Perhaps this is a setting I’ve overlooked.
Other than that the code should be reasonably self-explanatory;
there are a few macro definitions that are copied from the now deleted
build.sh
and some fiddling to make the stdout handle actually output the full response.4
The hlKaTeX function,
having a Pandoc -> Compiler Pandoc signature,
can be given to pandocCompilerWithTransformM like any other function:
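Something like the following works (a minimal sketch using Hakyll's default reader and writer options; the options on the actual site may differ):
myPandocCompiler :: Compiler (Item String)
myPandocCompiler =
  pandocCompilerWithTransformM
    defaultHakyllReaderOptions
    defaultHakyllWriterOptions
    hlKaTeX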
All that’s left is to include the custom CSS and special fonts that KaTeX relies upon.
The former can be downloaded from their CDN,
and the latter are easily obtained from
the latest release
by copying the fonts directory.
The fonts are both reasonably small and loaded on demand,
such that the website does not blow up in size with this switch.
Conclusion
The whole affair was much easier than I—not knowing any JavaScript—expected it to be, and actually turned out to be quite fun.
Of course, nothing at all has changed on the user-side of things,
which is to say that the new KaTeX fragments look pretty much exactly the same as the old MathJax maths.
Still, the warm feeling I had when deleting that build.sh shell script tells me that this was not solely an exercise in futility.
Or perhaps I’ve fully embraced rolling the boulder up the hill by now.
If you’re interested,
the commit adding it to my setup can be found
here.
Not as much as I should,
I guess,
but nowadays when I write maths it feels like a waste to not have it go into either
Anki, Org Roam, or a paper,
and these notes are not necessarily written/ready for public consumption.
Oh well.↩︎
Especially because, unlike in Gwern’s case, this site is not super complex to build;
there aren’t any other moving parts that would require me to leave Haskell.↩︎
Seemingly as always when subprocesses are involved,
the hardest thing is to actually get all of the incantations right
such that buffering does not deadlock your program indefinitely.↩︎
tl;dr: if you appreciate my past or ongoing contributions to the
Haskell community, please consider helping me get to ICFP this year by donating
via my ko-fi page!
Working at a small liberal arts institution
has some tremendous benefits (close interaction with motivated students,
freedom to pursue the projects I want rather than jump through a bunch
of hoops to get tenure, fantastic colleagues), and I love my job. But
there are also downsides; the biggest ones for me are the difficulty of
securing enough travel funding, and, relatedly, the difficulty of
cultivating and maintaining collaborations.
Last
year
I was very grateful for people’s generosity in helping me get to
Seattle. I am planning to again attend ICFP in Milan this
September; this time I will even bring
some students along. I have once again secured some funding from my
institution, but it will not be enough to cover all the expenses.
Morleyization is a fairly important operation in categorical logic for which it is hard to find readily
accessible references to a statement and proof. Most refer to D1.5.13 of “Sketches of an Elephant” which is
not an accessible text. 3.2.8 of “Accessible Categories” by Makkai and Paré is another reference, and
“Accessible Categories” is more accessible but still a big ask for just a single theorem.
Here I reproduce the statement and proof from “Accessible Categories” albeit with some notational and
conceptual adaptations as well as some commentary. This assumes some basic familiarity with the ideas
and notions of traditional model theory, e.g. what structures, models, and |\vDash| are.
Preliminaries
The context of the theorem is infinitary, classical (multi-sorted) first-order logic.
|L| will stand for a language aka a signature, i.e. sorts, function symbols, predicate symbols as usual,
except if we’re allowing infinitary quantification we may have function or predicate symbols of infinite
arity. We write |L_{\kappa,\lambda}| for the corresponding classical first-order logic where we
allow conjunctions and disjunctions indexed by sets of cardinality less than the regular (infinite) cardinal
|\kappa| while allowing quantification over sets of variables of (infinite) cardinality less than
|\lambda \leq \kappa|. |\lambda=\varnothing| is also allowed to indicate a propositional logic.
If |\kappa| or |\lambda| are |\infty|, that means conjunctions/disjunctions or quantifications over
arbitrary sets. |L_{\omega,\omega}| would be normal finitary, classical first-order logic. Geometric
logic would be a fragment of |L_{\infty,\omega}|. The theorem will focus on |L_{\infty,\infty}|,
but inspection of the proof shows that theorem would hold for any reasonable choice for |\kappa|
and |\lambda|.
As a note, infinitary logics can easily have a proper class of formulas. Thus, it will make sense
to talk about small subclasses of formulas, i.e. ones which are sets.
Instead of considering logics with different sets of connectives, Makkai and Paré introduce the
fairly standard notion of a positive existential formula which is a formula that uses only
atomic formulas, conjunctions, disjunctions, and existential quantification. That is, no implication,
negation, or universal quantification. They then define a basic sentence as “a conjunction of
a set of sentences, i.e. closed formulas, each of which is of the form |\forall\vec x(\phi\to\psi)|
where |\phi| and |\psi| are [positive existential] formulas”.
It’s clear the component formulas of a basic sentence correspond to sequents of the form
|\phi\vdash\psi| for open positive existential formulas. A basic sentence corresponds to what
is often called a theory, i.e. a set of sequents. Infinitary logic lets us smash a theory down
to a single formula, but I think the theory concept is clearer though I’m sure there are benefits
to having a single formula. Instead of talking about basic sentences, we can talk about a theory
in the positive existential fragment of the relevant logic. This has the benefit that we don’t
need to introduce connectives or infinitary versions of connectives just for structural reasons.
I’ll call a theory that corresponds to a basic sentence a positive existential theory for
conciseness.
Makkai and Paré also define |L_{\kappa,\lambda}^*| “for the class of formulas |L_{\kappa,\lambda}|
which are conjunctions of formulas in each of which the only conjunctions occurring are
of cardinality |< \lambda|”. For us, the main significance of this is that geometric theories
correspond to basic sentences in |L_{\infty,\omega}^*| as this limits the conjunctions to the
finitary case. Indeed, Makkai and Paré include the somewhat awkward sentence: “Thus, a geometric
theory is the same as a basic sentence in |L_{\infty,\omega}^*|, and a coherent theory is
a conjunction of basic sentences in |L_{\omega,\omega}|.” Presumably, the ambiguous meaning of
“conjunction” leads to the differences in how these are stated, i.e. a basic sentence is already
a “conjunction” of formulas.
The standard notion of an |L|-structure and model are used, and I won’t give a precise definition
here. An |L|-structure assigns meaning (sets, functions, and relations) to all the sorts and
symbols of |L|, and a model of a formula (or theory) is an |L|-structure which satisfies the
formula (or all the formulas of the theory). We’ll write |Str(L)| for the category of |L|-structures
and homomorphisms. In categorical logic, an |L|-structure would usually
be some kind of structure preserving (fibred) functor usually into |\mathbf{Set}|, and a homomorphism
is a natural transformation. A formula would be mapped to a subobject, and a model would require
these subobjects to satisfy certain factoring properties specified by the theory. A sequent
|\varphi \vdash \psi| in the theory would require a model to have the interpretation of
|\varphi| factor through the interpretation of |\psi|, i.e. for the former to be a subset
of the latter when interpreting into |\mathbf{Set}|.
Theorem Statement
|\mathcal F \subseteq L_{\infty,\infty}| is called a fragment of |L_{\infty,\infty}| if:
it contains all atomic formulas of |L|,
it is closed under substitution,
if a formula is in |\mathcal F| then so are all its subformulas,
if |\forall\vec x\varphi \in \mathcal F|, then so is |\neg\exists\vec x\neg\varphi|, and
if |\varphi\to\psi \in \mathcal F|, then so is |\neg\varphi\lor\psi|.
Basically, and the motivation for this will become clear shortly, formulas in |\mathcal F| are
like “compound atomic formulas” with the caveat that we must include the classically equivalent
versions of |\forall| and |\to| in terms of |\neg| and |\exists| or |\lor| respectively.
Given |\mathcal F|, we define an |\mathcal F|-basic sentence exactly like a basic sentence
except that we allow formulas from |\mathcal F| instead of just atomic formulas as the base
case. In theory language, an |\mathcal F|-basic sentence is a theory, i.e. set of sequents,
using only the connectives |\bigwedge|, |\bigvee|, and |\exists|, except within subformulas
contained in |\mathcal F| which may use any (first-order) connective. We’ll call such a
theory a positive existential |\mathcal F|-theory. Much of the following will be double-barrelled
as I try to capture the proof as stated in “Accessible Categories” and my slight reformulation
using positive existential theories.
|\mathrm{Mod}^{(\mathcal F)}(\mathbb T)| for a theory |\mathbb T| (or
|\mathrm{Mod}^{(\mathcal F)}(\sigma)| for a basic sentence |\sigma|) is the category
whose objects are |L|-structures that are models of |\mathbb T| (or |\sigma|), and whose arrows are the
|\mathcal F|-elementary mappings. An |\mathcal F|-elementary mapping |h : M \to N|,
for any subset of formulas of |L_{\infty,\infty}|, |\mathcal F|, is a mapping of |L|-structures
which preserves the meaning of all formulas in |\mathcal F|.
That is, |M \vDash \varphi(\vec a)| implies |N \vDash \varphi(h(\vec a))| for all
formulas, |\varphi \in \mathcal F| and appropriate sequences |\vec a|. We can define
the elementary mappings for a language |L’| as the |\mathcal F’|-elementary mappings
where |\mathcal F’| consists of (only) the atomic formulas of |L’|. |\mathrm{Mod}^{(L’)}(\mathbb T’)|
(or |\mathrm{Mod}^{(L’)}(\sigma’)|) can be defined by |\mathrm{Mod}^{(\mathcal F’)}(\mathbb T’)|
(or |\mathrm{Mod}^{(\mathcal F’)}(\sigma’)|) for the |\mathcal F’| determined this way.
Here’s the theorem as stated in “Accessible Categories”.
Theorem (Proposition 3.2.8): Given any small fragment |\mathcal F| and an |\mathcal F|-basic
sentence |\sigma|, the category of |\mathrm{Mod}^{(\mathcal F)}(\sigma)| is equivalent to
|\mathrm{Mod}^{(L’)}(\sigma’)| for some other language |L’| and basic sentence |\sigma’| over
|L’|, hence by 3.2.1, to the category of models of a small sketch as well.
We’ll replace the |\mathcal F|-basic sentences |\sigma| and |\sigma’| with positive existential
|\mathcal F|-theories |\mathbb T| and |\mathbb T’|.
Implied is that |\mathcal F \subseteq L_{\infty,\infty}|, i.e. that |L| and |L’| may be distinct
and usually will be. As the proof will show, they agree on sorts and function symbols, but we have
different predicate symbols in |L’|.
I’ll be ignoring the final comment referencing Theorem 3.2.1. Theorem 3.2.1 is the main theorem of
the section and states that every small sketch gives rise to a language |L| and theory |\mathbb T|
(or basic sentence |\sigma|) and vice versa such that the category of models of the sketch are
equivalent to models of |\mathbb T| (or |\sigma|). Thus, the final comment is an immediate corollary.
For us, the interesting part of 3.2.8 is that it takes a classical first-order theory, |\mathbb T|,
and produces a positive existential theory, as represented by |\mathbb T’|, that has
an equivalent, in fact isomorphic, category of models. This positive existential theory is called
the Morleyization of the first-order theory.
In particular, if we have a finitary classical first-order theory, then we get a coherent theory
with the same models. This means to study models of classical first-order theories, it’s enough
to study models of coherent theories via the Morleyization of the classical first-order theories.
This allows many techniques for geometric and coherent theories to be applied, e.g. (pre)topos theory
and classifying toposes. As stated before, the theorem statement doesn’t actually make it clear that
the result holds for a restricted degree of “infinitariness”, but this is obvious from the proof.
Proof
I’ll quote the first few sentences of the proof to which I have nothing to add.
The idea is to replace each formula in |\mathcal F| by a new predicate. Let the
sorts of the language |L’| be the same as those of |L|, and similarly for the [function]
symbols.
The description of the predicate symbols is complicated by their (potential) infinitary nature.
I’ll quote the proof here as well as I have nothing to add and am not as interested in this case.
The finitary quantifiers case would be similar, just slightly less technical. It would be even
simpler if we defined formulas in a given (ordered) variable context as is typical in categorical
logic.
With any formula |\phi(\vec x)| in |\mathcal F|, with |\vec x| the repetition free sequence
|\langle x_\beta\rangle_{\beta<\alpha}| of exactly the free variables of |\phi| in a
once and for all fixed order of variables, let us associate the new [predicate] symbol |P_\phi|
of arity |a : \alpha \to \mathrm{Sorts}| such that |a(\beta) = x_\beta|. The [predicate]
symbols of |L’| are the |P_\phi| for all |\phi\in\mathcal F|.
The motivation of |\mathcal F|-basic sentences / positive existential |\mathcal F|-theories should
now be totally clear. The |\mathcal F|-basic sentences / positive existential |\mathcal F|-theories
are literally basic sentences / positive existential theories in the language of |L’| if we
replace all occurrences of subformulas in |\mathcal F| with their corresponding predicate symbol in |L’|.
We can extend any |L|-structure |M| to an |L’|-structure |M^\sharp| such that they agree on all the sorts
and function symbols of |L|, and |M^\sharp| satisfies |M^\sharp \vDash P_\varphi(\vec a)| if and only if
|M \vDash \varphi(\vec a)|. Which is to say, we define the interpretation of |P_\varphi| to
be the subset of the interpretation of its domain specified by |M \vDash \varphi(\vec a)| for all
|\vec a| in the domain. In more categorical language, we define the subobject that |P_\varphi|
gets sent to to be the subobject |\varphi|.
We can define an |L|-structure, |N^\flat|, for |N| an |L’|-structure by, again, requiring it to do the
same thing to sorts and function symbols as |N|, and defining the interpretation of the predicate
symbols as |N^\flat \vDash R(\vec a)| if and only if |N \vDash P_{R(\vec x)}(\vec a)|.
We immediately have |(M^\sharp)^\flat = M|.
We can extend this to |L’|-formulas. Let |\psi| be an |L’|-formula, then |\psi^\flat| is defined
by a connective-preserving operation for which we only need to specify the action on predicate
symbols. We define that by declaring |P_\varphi(\vec t)^\flat| gets mapped to |\varphi(\vec t)|.
We extend |\flat| to theories via |\mathbb T’^\flat \equiv \{ \varphi^\flat \vdash \psi^\flat \mid (\varphi\vdash\psi) \in \mathbb T’\}|.
A similar induction allows us to prove \[M\vDash\psi^\flat(\vec a)\iff M^\sharp\vDash\psi(\vec a)\]
for all |L|-structures |M| and appropriate |\vec a|.
We have |\mathbb T = \mathbb T’^\flat| for a positive existential theory |\mathbb T’| over |L’|
(or |\sigma = \rho^\flat| for a basic |L’|-sentence |\rho|)
and thus |\varphi^\flat \vDash_M \psi^\flat \iff \varphi \vDash_{M^\sharp}\psi|
for all |\varphi\vdash\psi \in \mathbb T’| (or |M \vDash\sigma \iff M^\sharp\vDash\rho|).
We want to make it so that any |L’|-structure |N| interpreting |\mathbb T’| (or |\rho|) as |\mathbb T|
(or |\sigma|) is of the form |N = M^\sharp| for some |M|. Right now that doesn’t happen because, while
the definition of |M^\sharp| forces it to respect the logical connectives in the formula |\varphi|
associated to the |L’| predicate symbol |P_\varphi|, this isn’t required for an arbitrary model |N|.
For example, nothing requires |N \vDash P_\top| to hold.
The solution is straightforward. In addition to |\mathbb T’| (or |\rho|) representing
the theory |\mathbb T| (or |\sigma|), we add in an additional set of axioms |\Phi|
that capture the behavior of the (encoded) logical connectives of the formulas associated to the
predicate symbols.
These axioms are largely structural with a few exceptions that I’ll address separately. I’ll present
this as a collection of sequents for a theory, but we can replace |\vdash| and |\dashv \vdash| with
|\to| and |\leftrightarrow| for the basic sentence version. |\varphi \dashv\vdash \psi| stands
for two sequents going opposite directions.
We avoid needing negation by axiomatizing that |P_{\neg\varphi}| is the complement to |P_\varphi|. This
is arguably the key idea. Once we can simulate the behavior of negation without actually needing it, then
it is clear that we can embed all the other non-positive-existential connectives.
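To give the flavor (this is the standard presentation of these axioms, not quoted verbatim from the book): the structural sequents say that the new predicate symbols commute with the positive existential connectives, e.g.
\[P_{\varphi\land\psi} \dashv\vdash P_\varphi \land P_\psi \qquad P_{\exists \vec x\varphi} \dashv\vdash \exists \vec x P_\varphi\]
while the negation axioms assert that |P_{\neg\varphi}| is the complement of |P_\varphi|:
\[P_{\neg\varphi} \land P_\varphi \vdash \bot \qquad \top \vdash P_{\neg\varphi} \lor P_\varphi\]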
|\Phi| is the set of all these sequents. (For the basic sentence version, |\Phi| is the set of universal
closures of all these formulas for all |\varphi,\psi \in \mathcal F|.)
Another straightforward structural induction over the subformulas of |\varphi\in\mathcal F| shows that
\[N^\flat \vDash \varphi(\vec a) \iff N \vDash P_\varphi(\vec a)\] for any |L’|-structure |N|
which is a model of |\Phi|. The only interesting case is the negation case. Here, the induction hypothesis
states that |N^\flat\vDash\varphi(\vec a)| agrees with |N\vDash P_\varphi(\vec a)| and the axioms
state that |N\vDash P_{\neg\varphi}(\vec a)| is the complement of the latter which thus agrees with the
complement of the former which is |N^\flat\vDash\neg\varphi(\vec a)|.
From this, it follows that |N = M^\sharp| for |M = N^\flat| or, equivalently, |N = (N^\flat)^\sharp|.
|({-})^\sharp| and |({-})^\flat| thus establish a bijection between the objects of
|\mathrm{Mod}^{(\mathcal F)}(\mathbb T)| (or |\mathrm{Mod}^{(\mathcal F)}(\sigma)|) and
|\mathrm{Mod}^{(L’)}(\mathbb T’\cup\Phi)| (or |\mathrm{Mod}^{(L’)}(\bigwedge(\{\rho\}\cup\Phi))|).
The morphisms of these two categories would each be subclasses of the morphisms of |Str(L_0)| where |L_0| is
the language consisting of only the sorts and function symbols of |L| and thus |L’|. We can show that they
are identical subclasses which basically comes down to showing that an elementary mapping of
|\mathrm{Mod}^{(L’)}(\mathbb T’\cup\Phi)| (or |\mathrm{Mod}^{(L’)}(\bigwedge(\{\rho\}\cup\Phi))|)
is an |\mathcal F|-elementary mapping.
The idea is that such a morphism is a map |h : N \to N’| in |Str(L_0)| which must satisfy
\[N \vDash P_\varphi(\vec a) \implies N’ \vDash P_\varphi(h(\vec a))\] for
all |\varphi \in \mathcal F| and appropriate |\vec a|. However, since |N = (N^\flat)^\sharp|
and |P_\varphi(\vec a)^\flat = \varphi(\vec a)|, we have |N^\flat \vDash \varphi(\vec a) \iff N \vDash P_\varphi(\vec a)|
and similarly for |N’|. Thus
\[N^\flat \vDash \varphi(\vec a) \implies N’^\flat \vDash \varphi(h(\vec a))\] for all
|\varphi \in \mathcal F|, and every such |h| corresponds to an |\mathcal F|-elementary mapping.
Choosing |N = M^\sharp| allows us to show the converse for any |\mathcal F|-elementary
mapping |g : M \to M’|. |\square|
Commentary
The proof doesn’t particularly care that we’re interpreting the models into |\mathbf{Set}| and would
work just as well if we interpreted into some other category with the necessary structure. The amount
of structure required would vary with how much “infinitariness” we actually used, though it would need
to be a Boolean category. In particular, the proof works as stated (in its theory form) without
any infinitary connectives being implied for mapping finitary classical first-order logic to coherent logic.
We could simplify the statement and the proof by first eliminating |\forall| and |\to| and then
considering the proof over classical first-order logic with the connectives
|\{\bigwedge,\bigvee,\exists,\neg\}|. This would simplify the definition of fragment and
remove some cases in the proof.
To reiterate, the key is how we handle negation.
Defunctionalization
Morleyization is related to defunctionalization1.
For simplicity, I’ll only consider the finitary, propositional case, i.e. |L_{\omega,\varnothing}|.
In this case, we can consider each |P_\varphi| to be a new data type. In most cases, it would be
a newtype to use Haskell terminology. The only non-trivial case is |P_{\neg\varphi}|. Now, the
computational interpretation of classical propositional logic would use control operators to handle
negation. Propositional coherent logic, however, has a straightforward (first-order) functional
interpretation. Here, a negated formula, |\neg\varphi|, is represented by a primitive type
|P_{\neg\varphi}|.
The |P_{\neg\varphi} \land P_\varphi \vdash \bot| sequent is the apply
function for the defunctionalized continuation (of type |\varphi|). Even more clearly, this
is interderivable with |P_{\neg\varphi} \land \varphi’ \vdash \bot| where |\varphi’| is
the same as |\varphi| except the most shallow negated subformulas are replaced with the corresponding
predicate symbols. In particular, if |\varphi| contains no negated subformulas, then |\varphi’=\varphi|.
We have no way of creating new values of |P_{\neg\varphi}| other than via whatever sequents have been given.
We can, potentially, get a value of |P_{\neg\varphi}| by case analyzing on |\vdash \mathsf{lem}_\varphi : P_{\neg\varphi}\lor P_\varphi|.
What this corresponds to is a first-order functional language with a primitive type for each negated formula.
Any semantics/implementation for this will need to decide if the primitive type |P_{\neg\varphi}| is
empty or not, and then implement |\mathsf{lem}_\varphi| appropriately (or allow inconsistency). A
programmer writing a program in this signature, however, cannot assume either way whether |P_{\neg\varphi}|
is empty unless they can create a program with that type.
As a very slightly non-trivial example, let’s consider implementing |A \to P_{\neg\neg A}|
corresponding to double negating. Using Haskell-like syntax, the program looks like:
proof :: A -> NotNotA
proof a = case lem_NotA of
  Left notNotA -> notNotA
  Right notA   -> absurd (apply_NotA (notA, a))
where lem_NotA :: Either NotNotA NotA, apply_NotA :: (NotA, A) -> Void, and absurd :: Void -> a
is the eliminator for |\bot| where |\bot| is represented by Void.
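To make this self-contained, we can pass the assumed primitives in explicitly. Here is a compilable sketch of my own, in which A, NotA, and NotNotA become type parameters and the two axioms become arguments:
import Data.Void (Void, absurd)

-- 'notA' and 'notNotA' play the roles of the primitive types P_¬A and
-- P_¬¬A; 'lem' and 'apply' are the axioms we are handed.
proof
  :: Either notNotA notA   -- lem_NotA : P_¬¬A ∨ P_¬A
  -> ((notA, a) -> Void)   -- apply_NotA : P_¬A ∧ A ⊢ ⊥
  -> a
  -> notNotA
proof lem apply a = case lem of
  Left  notNotA -> notNotA
  Right notA    -> absurd (apply (notA, a))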
Normally in defunctionalization we’d also be adding constructors to our new types for all the
occurrences of lambdas (or maybe |\mu|s would be better in this case). However, since the only
thing we can do (in general) with NotA is use apply_A on it, no information can be extracted
from it. Either it’s inhabited and behaves like (), i.e. |\top|, or it’s not inhabited and
behaves like Void, i.e. |\bot|. We can even test for this by case analyzing on lem_A which
makes sense because in the classical logic this formula was decidable.
Bonus: Grothendieck toposes as categories of models of sketches
The main point of this section of “Accessible Categories” is to show that we can equivalently
view categories of models of sketches
as categories of models of theories. In particular, models of geometric sketches, those whose
cone diagrams are finite but cocone diagrams are arbitrary, correspond to models of geometric theories.
We can view a site, |(\mathcal C, J)|, for a Grothendieck topos as the
data of a geometric sketch. In particular, |\mathcal C| becomes the underlying category of the sketch, we
add cones to capture all finite limits, and the coverage, |J|, specifies the cocones. These cocones
have a particular form as the quotient of the kernel of a sink
as specified by the sieves in |J|. (We need to use the apex of the cones representing pullbacks instead
of actual pullbacks.)
Lemma 3.2.2 shows the sketch-to-theory implication. The main thing I want to note about its proof is that
it illustrates how infinitely large cones would require infinitary (universal) quantification (in addition
to the unsurprising need for infinitary conjunction), but infinitely large cocones do not (but they do
require infinitary disjunction). I’ll not reproduce it here, but it comes down to writing out the normal
set-theoretic constructions of limits and colimits (in |\mathbf{Set}|), but instead of using some first-order
theory of sets, like ZFC, uses of sets would be replaced with (infinitary) logical operations. The
“infinite tuples” of an infinite limit become universal quantification over an infinitely large number of
free variables. For the colimits, though, the most complex use of quantifiers is an infinite disjunction of
increasingly deeply nested quantifiers to represent the transitive closure of a relation, but no single
disjunct is infinitary. Figuring out the infinitary formulas is a good exercise.
An
even more direct connection to defunctionalization is the fact that geometric logic is the internal logic
of Grothendieck toposes, but Grothendieck toposes are elementary toposes and so have the structure to model
implication and universal quantification. It’s just that those connectives aren’t preserved by geometric
morphisms. For implication, the idea is that |A \to B| is represented by
|\bigvee\{\bigwedge\Gamma\mid \Gamma,A\vdash B\}| where |\Gamma| is finite. We can even see how
a homomorphism that preserved geometric logic structure will fail to preserve this definition of |\to|.
Specifically, there could be additional contexts not in the image of the homomorphism that should be
included in the image of the disjunction for it to lead to |\to| in the target but won’t be.↩︎
In this episode, Garrett Morris talks with Wouter Swierstra and Niki Vazou about his work on Haskell’s type classes, how to fail successfully, and how to construct a set of ponies.
Lately I’ve been thinking about representing eventually constant
streams in Haskell. An eventually constant stream is an infinite
stream which eventually, after some finite prefix, starts repeating
the same value forever. For example,
\(6, 8, 2, 9, 3, 1, 1, 1, 1, \dots\)
There are many things we can do in a
decidable way with eventually constant streams that we can’t do with
infinite streams in general—for example, test them for equality.
This is a work in progress. I only have one specific use case in mind
(infinite-precision two’s complement arithmetic, explained at the end
of the post), so I would love to hear of other potential use cases, or
any other feedback. Depending on the feedback I may eventually turn
this into a package on Hackage.
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE PatternSynonyms #-}
{-# LANGUAGE ViewPatterns #-}

module River where

import Data.Monoid (All (..), Any (..))
import Data.Semigroup (Max (..), Min (..))
import Prelude hiding (all, and, any, drop, foldMap, maximum, minimum, or, repeat, take, zipWith, (!!))
import Prelude qualified as P
Now let’s get to the main definition. A value of type River a is
either a constant C a, representing an infinite stream of copies of
a, or a Cons with an a and a River a.
data River a = C !a | Cons !a !(River a)
  deriving Functor
I call this a River since “all Rivers flow to the C”!
The strictness annotations on the a values just seem like a good
idea in general. The strictness annotation on the River a tail,
however, is more interesting: it's there to rule out infinite streams
constructed using only Cons, such as flipflop = Cons 0 (Cons 1 flipflop).
(Although the strictness annotation on the River a tail is semantically
correct, I could imagine not wanting it there for performance reasons; I'd be
happy to hear any feedback on this point.) In other words, the only way to
make a non-bottom value of type River a is to have a finite sequence of Cons
finally terminated by C.
We need to be a bit careful here, since there are multiple ways to
represent streams which are semantically supposed to be the same. For
example, Cons 1 (Cons 1 (C 1)) and C 1 both represent an infinite stream of
all 1’s. In general, we have the law
C a === Cons a (C a),
and want to make sure that any functions we write respect this
equivalence, i.e. do not distinguish between such values. (It would be
interesting to try implementing rivers as a higher inductive type, say, in
Cubical Agda.) This is the reason I did not derive an Eq instance; we will
have to write our own.
expand :: River a -> River a
expand (C a) = Cons a (C a)
expand as = as

infixr 5 :::

pattern (:::) :: a -> River a -> River a
pattern (:::) a as <- (expand -> Cons a as)
  where a ::: as = Cons a as
{-# COMPLETE (:::) #-}
Matching with the pattern (a ::: as) uses a view pattern
to potentially expand a C one step into a Cons, so that we can
pretend all River values are always constructed with (:::).
In the other direction, (:::) merely constructs a Cons.
We mark (:::) as COMPLETE on its own since it is, in fact,
sufficient to handle every possible input of type River. However,
in order to obtain terminating algorithms we will often include one or
more special cases for C.
Normalization by construction?
As an alternative, we could use a variant pattern synonym:
infixr 5 ::=

pattern (::=) :: Eq a => a -> River a -> River a
pattern (::=) a as <- (expand -> Cons a as)
  where
    a' ::= C a | a' == a = C a
    a  ::= as            = Cons a as
{-# COMPLETE (::=) #-}
As compared to (:::), this has an extra Eq a constraint: when we
construct a River with (::=), it checks to see whether we are
consing an identical value onto an existing C a, and if so, simply
returns the C a unchanged. If we always use (::=) instead of
directly using Cons, it ensures that River values are always
normalized—that is, for every eventually constant stream, we
always use the canonical representative where the element immediately
preceding the constant tail is not equal to it.
This, in turn, technically makes it impossible to write functions
which do not respect the equivalence C a === Cons a (C a), simply
because they will only ever be given canonical rivers as input.
However, as we will see when we discuss folds, it is still possible to
write “bad” functions, i.e. functions that are semantically
questionable as functions on eventually constant streams—it would
just mean we cannot directly observe them behaving badly.
The big downside of using this formulation is that the Eq constraint
infects absolutely everything—we even end up with Eq constraints
in places where we would not expect them (for example, on head :: River a -> a), because the pattern synonym incurs an Eq constraint
anywhere we use it, regardless of whether we are using it to construct
or destruct River values. As you can see from the definition above,
we only do an equality check when using (::=) to construct a
River, not when using it to pattern-match, but there is no way to
give the pattern synonym different types in the two directions. (Of course, we could make it a unidirectional pattern synonym and just make a differently named smart constructor, but that seems somewhat ugly, as we would have to remember which to use in which situation.)
So, because this normalizing variant does not really go far enough in
removing our burden of proof, and has some big downsides in the form
of leaking Eq constraints everywhere, I have chosen to stick with
the simpler (:::) in this post. But I am still a bit unsure about this
choice; in fact, I went back and forth two times while writing.
We can at least provide a normalize function, which we can use when
we want to ensure normalization:
normalize :: Eq a => River a -> River a
normalize (C a) = C a
normalize (a ::: as) = a ::= normalize as  -- normalize the tail, then re-cons with the smart constructor
Some standard functions on rivers
With the preliminary definitions out of the way, we can now build up a
library of standard functions and instances for working with River a
values. To start, we can write an Eq instance as follows:
instance Eq a => Eq (River a) where
  C a == C b = a == b
  (a ::: as) == (b ::: bs) = a == b && as == bs
Notice that we only need two cases, not four: if we compare two values
whose finite prefixes are different lengths, the shorter one will
automatically expand (via matching on (:::)) to the length of the
longer.
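For instance (a quick check of my own, not from the post):
eqDemo :: Bool
eqDemo = Cons 1 (Cons 1 (C 1)) == (C 1 :: River Int)  -- True: two representatives of the same stream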
We already derived a Functor instance; we can also define a “zippy”
Applicative instance like so:
repeat :: a -> River a
repeat = C

instance Applicative River where
  pure = repeat
  C f <*> C x = C (f x)
  (f ::: fs) <*> (x ::: xs) = f x ::: (fs <*> xs)

zipWith :: (a -> b -> c) -> River a -> River b -> River c
zipWith = liftA2
We can write safe head, tail, and index functions:
head :: River a -> a
head (a ::: _) = a

tail :: River a -> River a
tail (_ ::: as) = as

infixl 9 !!

(!!) :: River a -> Int -> a
C a !! _ = a
(a ::: _) !! 0 = a
(_ ::: as) !! n = as !! (n - 1)
We can also write take and drop variants. Note that take
returns a finite prefix of a River, which is a list, not another
River. The special case for drop _ (C a) is not strictly
necessary, but makes it more efficient.
take :: Int -> River a -> [a]
take n _ | n <= 0 = []
take n (a ::: as) = a : take (n - 1) as

drop :: Int -> River a -> River a
drop n r | n <= 0 = r
drop _ (C a) = C a
drop n (_ ::: as) = drop (n - 1) as
There are many other such functions we could implement (e.g. span,
dropWhile, tails…); if I eventually put this on Hackage I would
be sure to have a much more thorough selection of functions. Which
functions would you want to see?
Folds for River
How do we fold over a River a? The Foldable type class requires us
to define either foldMap or foldr; let’s think about foldMap,
which would have type
foldMap :: Monoid m => (a -> m) -> River a -> m
However, this doesn’t really make sense. For example, suppose we have
a River Int; if we had foldMap with the above type, we could use
foldMap Sum to turn our River Int into a Sum Int. But what is
the sum of an infinite stream of Int? Unless the eventually
repeating part is C 0, this is not well-defined. If we simply write
a function to add up all the Int values in a River, including
(once) the value contained in the final C, this would be a good
example of a semantically “bad” function: it does not respect the law
C a === a ::: C a. If we ensure River values are always
normalized, we would not be able to directly observe anything amiss,
but the function still seems suspect.
Thinking about the law C a === a ::: C a again is the key.
Supposing foldMap f (C a) = f a (since it’s unclear what else it
could possibly do), applying foldMap to both sides of the law we
obtain f a == f a <> f a, that is, the combining operation must be
idempotent. This makes sense: with an idempotent operation,
continuing to apply the operation to the infinite constant tail will
not change the answer, so we can simply stop once we reach the C.
We can create a subclass of Semigroup to represent idempotent
semigroups, that is, semigroups for which a <> a = a. There are
several idempotent semigroups in base; we list a few below. Note
that since rivers are never empty, we can get away with just a
semigroup and not a monoid, since we do not need an identity value
onto which to map an empty structure.
class Semigroup m => Idempotent m
-- No methods, since Idempotent represents adding only a law,
-- namely, ∀ a. a <> a == a

-- Exercise for the reader: convince yourself that these are all
-- idempotent
instance Idempotent All
instance Idempotent Any
instance Idempotent Ordering
instance Ord a => Idempotent (Max a)
instance Ord a => Idempotent (Min a)
Now, although we cannot make a Foldable instance, we can write our own
variant of foldMap which requires an idempotent semigroup instead of
a monoid:
foldMap :: Idempotent m => (a -> m) -> River a -> m
foldMap f (C a) = f a
foldMap f (a ::: as) = f a <> foldMap f as

fold :: Idempotent m => River m -> m
fold = foldMap id
We can then instantiate it at some of the semigroups listed above to
get some useful folds. These are all guaranteed to terminate and
yield a sensible answer on any River.
and :: River Bool -> Bool
and = getAll . foldMap All

or :: River Bool -> Bool
or = getAny . foldMap Any

all :: (a -> Bool) -> River a -> Bool
all f = and . fmap f

any :: (a -> Bool) -> River a -> Bool
any f = or . fmap f

maximum :: Ord a => River a -> a
maximum = getMax . foldMap Max

minimum :: Ord a => River a -> a
minimum = getMin . foldMap Min

lexicographic :: Ord a => River a -> River a -> Ordering
lexicographic xs ys = fold $ zipWith compare xs ys
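A couple of quick examples (mine), showing that these folds terminate even though the streams are infinite:
foldsDemo :: (Bool, Int)
foldsDemo =
  ( all even (Cons 2 (Cons 4 (C 6)))  -- True: inspects 2, 4, and 6, then stops at the C
  , maximum (Cons 3 (Cons 9 (C 1)))   -- 9
  )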
We could make an instance Ord a => Ord (River a) with compare = lexicographic; however, in the next section I want to make a
different Ord instance for a specific instantiation of River.
Application: \(2\)-adic numbers
Briefly, here’s the particular application I have in mind:
infinite-precision two’s complement arithmetic, i.e. \(2\)-adic
numbers. Chris Smith also wrote about \(2\)-adic numbers
recently;
however, unlike Chris, I am not interested in \(2\)-adic numbers in
general, but only specifically those \(2\)-adic numbers which represent
an embedded copy of \(\mathbb{Z}\). These are precisely the eventually
constant ones: nonnegative integers are represented in binary as
usual, with an infinite tail of \(0\) bits, and negative integers are
represented with an infinite tail of \(1\) bits. For example, \(-1\) is
represented as an infinite string of all \(1\)’s. The amazing thing
about this representation (and the reason it is commonly used in
hardware) is that the usual addition and multiplication algorithms
continue to work without needing special cases to handle negative
integers. If you’ve never seen how this works, you should definitely
read about it.
First, some functions to convert to and from integers. We only need
special cases for \(0\) and \(-1\), and beyond that it is just the usual
business with mod and div to peel off one bit at a time, or
multiplying by two and adding to build up one bit at a time. (I am a big fan of LambdaCase.)
toBits :: Integer -> Bits
toBits = \case
  0  -> C O
  -1 -> C I
  n  -> toEnum (fromIntegral (n `mod` 2)) ::: toBits (n `div` 2)

fromBits :: Bits -> Integer
fromBits = \case
  C O -> 0
  C I -> -1
  b ::: bs -> 2 * fromBits bs + fromIntegral (fromEnum b)
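A couple of worked conversions (mine; assuming Bit's constructors are O and I, as the code above suggests): 6 is …000110 in two's complement, so toBits 6 is O ::: I ::: I ::: C O, and -3 is …11101, so toBits (-3) is I ::: O ::: C I. Both round-trip:
roundTrip :: (Integer, Integer)
roundTrip = (fromBits (toBits 6), fromBits (toBits (-3)))  -- (6, -3)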
For testing, we can also make a Show instance. When it comes to
showing the infinite constant tail, I chose to repeat the bit 3 times
and then show an ellipsis; this is not really necessary but somehow
helps my brain more easily see whether it is an infinite tail of zeros
or ones.
instance Show Bits where
  show = reverse . go
   where
    go (C b) = replicate 3 (showBit b) ++ "..."
    go (b ::: bs) = showBit b : go bs
    showBit = ("01" P.!!) . fromEnum
Let’s implement some arithmetic. First, incrementing. It is standard
except for a special case for C I (without which, incrementing C I
would diverge). Notice that we use (::=) instead of (:::), which
ensures our Bits values remain normalized.
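The incrementing code itself is not shown here, but following that description it might look something like this (a sketch, assuming Bit is data Bit = O | I):
incr :: Bits -> Bits
incr (C I)      = C O            -- special case: -1 + 1 = 0
incr (O ::: bs) = I ::= bs       -- flip the low bit; no carry
incr (I ::: bs) = O ::= incr bs  -- carry propagates leftward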
Is (:::) or (::=) the better default? It’s tempting to just say
“provide both and let the user decide”. I don’t disagree with that;
however, the question is which one we use to implement various basic
functions such as map/fmap. For example, if we use (:::), we
can make a Functor instance, but values may not be normalized
after mapping.
Can we generalize from eventually constant to eventually periodic?
That is, instead of repeating the same value forever, we cycle
through a repeating period of some finite length. I think this
is possible, but it would make the implementation more
complex, and I don’t know the right way to generalize foldMap. (We
could insist that it only works for commutative idempotent
semigroups, but in that case what’s the point of having a sequence
of values rather than just a set?)
Happy to hear any comments or suggestions!
This is a follow-up to my previous post on Collatz in base 2 and 3. I got a response from a reader, Olaf K., who pointed out that the functions defined there work just fine not only on finite sequences of base 2/3 digits, but infinite sequences as well. In the base 2 case, where the digits were listed from right to left, this has a common mathematical interpretation. An integer with possibly non-zero bits extending infinitely to the left is called a 2-adic integer. And the function defined there yields some interesting observations when applied to the 2-adic integers!
Brief introduction to 2-adic integers
A standard binary integer is a finite sequence of bits, either 0 or 1, with each bit having a value equal to some power of two. Because any non-negative integer can be written as a sum of powers of two, it can be written in this way.
But a finite sequence isn’t exactly right. We can always make that sequence longer, incorporating greater powers of two, by adding zeros on the left side. For this reason, if we think of a binary number as a finite sequence, we get non-unique representations: one with a 1 as the largest digit, but others that add leading zeros on the left. This is messy, so in general we tend to think of a binary integer as having infinitely many bits, but with the constraint that only finitely many of them can be non-zero. We don’t usually write the leading zeros, but that’s just a matter of notation. They are still there.
This leads to the obvious question: what happens if you remove the restriction that all but finitely many digits must be zero? The answer is the 2-adic integers. It turns out that we can write a lot of rational numbers as 2-adic integers. For example:
Even without a negative sign, we can write -1 as …111, the 2-adic integer all of whose bits are 1. Why? Try adding one, and you’ll notice that the result is all 0s, so clearly this is the opposite of 1.
What happens when you multiply 3 (binary 11) by the 2-adic integer …01010101011? If you work it out, you’ll get 1. So that 2-adic integer is the multiplicative inverse of 3, making it effectively 1/3.
In fact, it turns out that the 2-adic integers include all rational numbers with odd denominators! Not only that, but all of them ultimately end up with digits in a repeating pattern to the left, similar to how rational numbers in traditional decimal representations end up with digits in a repeating pattern to the right. (There are even irrational 2-adic integers that don’t repeat their digits; they don’t correspond to the traditional irrational numbers, but rather to some completely new concept that doesn’t happen in traditional numbers!)
The Collatz map on 2-adics
The Collatz map on 2-adic integers can be defined in precisely the same way as it is on integers: even numbers are halved, while odd numbers are mapped to 3n+1. But hold on… if we can represent numbers like 1/3, what does it mean to be even or odd?
For arbitrary rationals, this would be a tricky question to answer, but in the 2-adic integers, there’s an easy answer: just look at the 1s place. If it’s 0, the number is even; if it’s 1, the number is odd. This is equivalent to saying that a rational number is even iff its numerator is even. And this notion is well-defined because we’ve already constrained the denominator to be odd.
I’m now going to redefine the Collatz step function on binary numbers from my previous post, but with one difference: I’ll assume that the numbers are odd. Because every number therefore ends with a 1, we won’t represent the 1 explicitly, but rather let it be implied. This implied 1 is expressed by the OddBits newtype.
data Bit = B0 | B1

newtype OddBits = XXX1 [Bit]

data State = C0 | C1 | C1_Trim | C2 | C2_Trim
threeNPlusOneDiv2s :: OddBits -> OddBits
threeNPlusOneDiv2s (XXX1 bits) = XXX1 (go C2_Trim bits)
  where
    go C1_Trim [] = []
    go C1_Trim (B0 : bs) = go C0 bs
    go C1_Trim (B1 : bs) = go C2_Trim bs
    go C2_Trim [] = []
    go C2_Trim (B0 : bs) = go C1_Trim bs
    go C2_Trim (B1 : bs) = go C2 bs
    go C0 [] = []
    go C0 (B0 : bs) = B0 : go C0 bs
    go C0 (B1 : bs) = B1 : go C1 bs
    go C1 [] = [B1]
    go C1 (B0 : bs) = B1 : go C0 bs
    go C1 (B1 : bs) = B0 : go C2 bs
    go C2 [] = [B0, B1]
    go C2 (B0 : bs) = B0 : go C1 bs
    go C2 (B1 : bs) = B1 : go C2 bs
This function, in a single pass, multiplies an odd number by 3, adds 1, then divides by 2 as many times as needed to make the result odd. Therefore, this is a map from the odd numbers to other odd numbers. The states represent:
The amount carried from the previous bit when multiplying by 3.
Whether the lower-order bits are all zeros, in which case we should continue to trim zeros instead of emit them.
This function still handles finite lists, but you can generally ignore those equations, since they are equivalent to extending with 0 bits to the left. And as Olaf suggests, the function extends to the 2-adic numbers by operating on infinite lists. (That is, except for one specific input: …010101, on which the function hangs non-productively. That’s because this 2-adic integer corresponds to the rational -1/3, and 3(-1/3) + 1 = 0, which can never be halved long enough to yield another odd number!)
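As a quick sanity check on a finite input (my own, not from the post): 7 is XXX1 [B1, B1] (binary 111, with the trailing 1 implied), and f(7) = (3·7+1)/2 = 11, which is XXX1 [B1, B0, B1]:
demo :: Bool
demo = case threeNPlusOneDiv2s (XXX1 [B1, B1]) of
  XXX1 [B1, B0, B1] -> True   -- f(7) = 11
  _                 -> False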
Fixed points
The Collatz conjecture amounts to finding the orbits of the Collatz map, which fall into two categories: periodic orbits, which repeat infinitely, and divergent orbits, which grow larger indefinitely without repeating. Among positive integers, the conjecture is that the only orbit is the periodic one that ends in 4, 2, 1, 4, 2, 1, 4, 2, 1…
Since we’re skipping the even numbers, our step function has the property that f(1) = 1, making 1 a fixed point. Not all periodic orbits are fixed points, but it’s natural to ask whether there are any other fixed points of this map. Let’s explore this question!
We start by looking only at the non-terminating equations for the recursive definition. (Recall that the terminating equations are really just duplicates of these, since leading zeros are equivalent to termination.)
go C1_Trim (B0 : bs) = go C0 bs
go C1_Trim (B1 : bs) = go C2_Trim bs
go C2_Trim (B0 : bs) = go C1_Trim bs
go C2_Trim (B1 : bs) = go C2 bs
go C0 (B0 : bs) = B0 : go C0 bs
go C0 (B1 : bs) = B1 : go C1 bs
go C1 (B0 : bs) = B1 : go C0 bs
go C1 (B1 : bs) = B0 : go C2 bs
go C2 (B0 : bs) = B0 : go C1 bs
go C2 (B1 : bs) = B1 : go C2 bs
State transition diagram for Collatz step
These observations will be relevant:
We start in the state C2_Trim.
The Trim states do not emit bits to the result, only consuming them. Therefore, the output will lag behind the input by some number of bits depending on how long evaluation lingers in these Trim states.
Once we leave the Trim states, we can never re-enter them. Inputs and outputs will then match exactly, so the lag stays the same forever.
If we’re searching for inputs that evaluate in a certain way, the bits of the input are completely determined by whether we want to stay in Trim states or leave them, and then whether we want the next output to be a 0 or 1.
Because of this, when searching for a fixed point of this function, the input value is entirely determined by one choice: for how many input bits do we choose to remain in the Trim states. Once that single choice is made, the rest of the input is entirely determined by that plus the desire for the input to be a fixed point.
Let’s work some out.
Lag = 1. Here, we want to stay in the Trim states for only one bit of input. Then that bit must be a 1, since that’s what gets us out of the Trim state. From that point, we will stay in state C2, and in order to produce 1 output bits that match the inputs, we’ll need to keep seeing 1s in the input! Then the fixed point here is XXX1 (repeat B1), which corresponds to the 2-adic integer …111.
We noted earlier that this 2-adic integer corresponds to -1. We can double-check that -1 is indeed a fixed point of the function that computes 3n+1 and then divides by 2 until the result is odd. To compute f(-1), we first compute 3(-1)+1 = -2, then divide by 2 to get -1, which is odd. So it is indeed a fixed point.
Lag = 2. Here, we want to stay in the Trim states for two bits of input. That means we expect the first bit to be 0 so that we’ll switch to state C1_Trim, and then the second bit to be 0 again to transition us to the C0 state. At this point, we’re producing output, which must match the input bits already chosen, and the input bit we need will always be a 0 so as to produce the 0 that matches the input. Then the fixed point is XXX1 (repeat B0), and keeping in mind that there’s an implied 1 on the end, this corresponds to the 2-adic integer …0001, which is just 1.
This is the standard periodic orbit mentioned above: 4, 2, 1, 4, 2, 1, which is just 1s when we skip the even numbers.
Now things start to get interesting:
Lag = 3. To stay in the Trim states for exactly three bits of input, we need those bits to be 0, 1, and 1. This ends up in state C2, with the input sequence 011 left to match. The next input must therefore be 0, yielding a 0 as output, and leaving us in state C1 with 110 left to match. The next input must be 0 again, leaving us in C0 with 100 left to match. We need a 1 next, leaving us in C1 with 001 left to match. Then we need to see a 1 again to leave us in C2 with 011 left to match. That’s the same state and pending bits as we were in when we left the Trim states, so we’ve finally found a loop.
The fixed point that produces this behavior is XXX1 (cycle [B0, B1, B1, B0]), and including the implied 1 on the end, this corresponds to the 2-adic integer …(0110)(0110)(0110)1. This turns out to correspond to the rational number 1/5. We can check that 3(1/5) + 1 = 8/5, and halving that three times yields 1/5 again, so this is indeed a fixed point of the map, even though it’s not an integer.
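We can double-check these values by computation. Here is a small sketch (my own aside, not code from the original post) of the same odd-to-odd map acting on rationals:

import Data.Ratio

-- One odd-to-odd Collatz step on rationals: compute 3x + 1, then halve
-- until the numerator is odd again. (On x = -1/3 this loops forever,
-- matching the ...010101 input on which the bit-level function hangs.)
step :: Rational -> Rational
step x = go (3 * x + 1)
  where
    go y
      | even (numerator y) = go (y / 2)
      | otherwise          = y

Indeed, step (-1) == -1 and step (1 % 5) == 1 % 5.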
Observations about fixed points
A few observations can be made about the fixed points of this map:
There are an infinite number of them. Every possible choice of lag, starting with one but increasing without bound, yields a fixed point, and they all must be different since they produce different behaviors.
There are only a countably infinite number of them. This is the only way to produce a fixed point, so the list of fixed points we compute in this way is complete. There are no others.
The 2-adic fixed points are all rational. Once we leave the Trim states, there’s only a finite amount of state involved in determining what happens from there: the state of the function implementation, together with the pending input bits remaining to match, which keeps the same length. We progress through this finite number of states indefinitely, so we must eventually reach the same state twice, and from that point, the bits will follow a repeating pattern. Therefore, interpreted as a 2-adic number, they will correspond to a rational value.
The only integer fixed points are 1 and -1. You can see this for non-negative integers by looking at the terminating equations in the original code: the longest terminating case produces two bits at the end before ending in trailing zeros, so the lag can be no greater than 2. Similar logic applies to negative integers, which have 1s extending infinitely to the left.
In fact, if we work out what’s going on here, we find that the fixed points of this function are precisely 1 / (2ⁿ - 3) for n > 0. (In fact, n = 0 yields -1/2, which is also a fixed point as a rational, but is not a 2-adic integer so it didn’t occur in our list.)
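Using the step function sketched above, we can spot-check this closed form in GHCi (with Data.Ratio in scope):

ghci> all (\n -> let x = 1 % (2 ^ n - 3) in step x == x) [1 .. 20]
True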
Periodic points
We can press further on this, and consider periodic points with period greater than one, by composing the function with itself and writing down the state machine that results. This grows more complex, as every additional iteration of the function adds a new choice about lag, yielding a larger-dimensional array of periodic points. The general form of the computation seems to remain the same, but the state diagrams grow increasingly daunting.
State transition diagram for two Collatz steps
The diagram above gives the state transitions for the composition of two Collatz steps. The left two states are those where the first of the two steps has yet to produce a bit. The next six are states where the first is producing bits but the second is not. The final nine, then, represent the situation where both of the composed steps are productive.
I have not labeled the outputs, but the general rule is that the trim states have no output, the non-trim states with an even number of occurrences of C1 will produce outputs that are the same as their inputs, and the states with odd occurrences of C1 produce outputs the opposite of their inputs.
Because there are now two kinds of trimming that happen, we can choose any combination of the two lags, giving a 2D array of points that repeat with period 2. The computations are similar to the above, so I’ll just give the results for lag 1, 2, and 3 in each dimension.
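(The table of results didn’t survive in this copy; the values below are reconstructed from the closed form given a little further on, with rows indexed by the first lag m and columns by the second lag n.)

        n = 1    n = 2    n = 3
m = 1   -1       -5       5/7
m = 2   -7       1        7/23
m = 3   11/7     11/23    1/5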
The diagonal elements are not new. This is expected, though, because a periodic point of period 1 is also periodic with period 2. The off-diagonal elements yield three new period-2 orbits:
-5, -7, -5, …
5/7, 11/7, 5/7, …
7/23, 11/23, 7/23, …
Just as with fixed points, we can work out a closed form for the period two points. This time, we get (2^m + 3) / (2^(m+n) - 9). As we noticed above, this simplifies to the earlier formula for fixed points when m = n. (Hint: the denominator factors as a difference of squares.)
You might wonder if there’s a pattern here that continues to all periodic points, and indeed there is! On the Wikipedia page about the Collatz Conjecture, a formula is given for the unique rational number that generates any periodic sequence of even and odd numbers from the “shortcut” Collatz map. (The shortcut map is defined there as the map that divides by 2 once after each 3n+1 step.)
To translate this into our terms:
m is the period, which is also the number of lag values.
k₀ = 0, because the function defined here can only be evaluated on odd numbers.
Each additional kᵢ is the sum of the first i lag values.
n is the sum of all the lag values.
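Assembled into code, here is a sketch of that formula as I read it (my transcription, not code from the original post; it reuses Data.Ratio and assumes at least one lag):

-- Periodic point for a given list of lags [l1, ..., lm]:
--   (sum over j of 3^(m-1-j) * 2^(k_j)) / (2^n - 3^m)
-- where k_j is the sum of the first j lags and n is the sum of all of them.
periodicPoint :: [Integer] -> Rational
periodicPoint lags = num % den
  where
    m   = length lags
    ks  = scanl (+) 0 (init lags)   -- k_0 = 0, then partial sums of the lags
    num = sum [3 ^ (m - 1 - j) * 2 ^ k | (j, k) <- zip [0 ..] ks]
    den = 2 ^ sum lags - 3 ^ m

Spot checks: periodicPoint [2] is 1 % 1, periodicPoint [1, 2] is (-5) % 1, and periodicPoint [2, 3] is 7 % 23, matching the fixed points and period-2 orbits above.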
We can make the interesting observation that the sign of the number is determined by the denominator: if n > m log₂(3), or about 1.585 m, then the result will be positive. But n/m is also understood as the average of the lag values. So looking for positive periodic points amounts to choosing lag values with an average of greater than about 1.585. But perhaps not too much greater, if we want them to be integers, because we should not allow the denominator to grow too large. (Indeed, we saw in the fixed point case above that there was a bound on how large the lag could grow because the output needs to catch up!) Working out more precise upper bounds on lag would be an interesting step toward a search for periodic points.
The fact that this rational number is unique, left somewhat mysterious in the Wikipedia statement, comes down to the way that fixed points of these composed state machines are always determined by how long we linger in each of the trim states. The result on Wikipedia already implies that these are the only periodic points among the rationals with odd denominators. The analysis here also makes it clear that these are the only periodic points in the 2-adic integers, as well, so there are no irrational 2-adic periodic points of the Collatz map.
Of course, the trick would be to show that none of these rational values except for 1 are positive integers, and then that there are also no orbits that increase aperiodically. Actually solving the Collatz Conjecture is left as an exercise for the reader. :)
Today, 2024-07-17, at 1830 UTC (11:30 am PDT, 2:30 pm EDT, 7:30 pm BST, 20:30 CEST, …)
we are streaming the 29th episode of the Haskell Unfolder live on YouTube.
Version 9.10 of GHC introduces an extremely useful new feature: exception annotations and automatic exception backtraces. This new feature, four years in the making, can be a life-saver when debugging code and has not received nearly as much attention as it deserves. In this episode of the Haskell Unfolder we therefore give an overview of the changes and discuss how we can take advantage of them.
About the Haskell Unfolder
The Haskell Unfolder is a YouTube series about all things Haskell hosted by
Edsko de Vries and Andres Löh, with episodes appearing approximately every two
weeks. All episodes are live-streamed, and we try to respond to audience
questions. All episodes are also available as recordings afterwards.
In Part 1 I explained what
the boolean operations on polygons are, and I gave a motivating
example by way of an algorithm that traces out the intersections of a
set of overlapping polygons. That algorithm uses intersection (∩),
union (∪), and subtraction (−) operations between polygons. It’s also
necessary to identify duplicate polygons, which can be harder than you
might first think: a polygon is typically defined by giving the
coördinates of its vertices, in some order (for example, in
GeoJSON, vertices (of an
exterior ring) should be listed in anti-clockwise order). But the list
of vertices can start at any of the vertices. More problematic is that
the same polygon, calculated in different ways, may have tiny
differences in its coördinates due to the rounding of floating point
numbers. I’ll be spending a lot of time talking about floating point
numbers very soon!
One algorithm that is widely used for these boolean operations is given by the papers:
The algorithm described by these papers will be the focus of the blog
posts in this series. These papers are not quite enough on their own
to understand the algorithm. You will also want to refer to bits of
these books:
“Computational Geometry - An Introduction” by Franco P. Preparata and Michael Ian Shamos
“Geometric Tools for Computer Graphics” by Philip J. Schneider and David H. Eberly
Both of these you can find after a quick web search.
Francisco Martínez has a page for these
papers and if that’s
disappeared then the Wayback Machine has a
copy. That
page provides a link to a C++
implementation which also
appears in various Github repositories, such as this
one. This
algorithm has been quite widely implemented (sometimes with a few
changes), and this Github
repo
provides a list of implementations, which I reproduce verbatim here:
I have looked at some, but not all, of these implementations. Often,
they seem to be quite mechanical translations of the original C++
code, so bugs may also have been translated across.
Working closely and in detail on this algorithm has turned out to be
very interesting, and has completely changed the way I think about
floating point maths. That alone has been worth the effort to
me. Please do not interpret any of what follows as criticisms of the
original authors or their work; for the following reasons (amongst
many others):
Most importantly, this algorithm seems to work well for a large
number of people and applications. There’s a good chance that most
reasonable people would consider the issues I’ve found to be
“nit-picking”.
Many years ago I attempted to do a PhD. I abandoned it after three
years or so, but during that time, I attempted to write a few
papers. I learnt that one of the most frustrating things about writing papers is the page limit, which forces huge compromises on how you
describe and explain your own work.
They’ve provided some source code, which helps to explain many of
the details elided from the papers (though in some places also
suggests the details in the papers may not be quite right). This is
vastly better than most academic papers.
This is far from the first time I’ve come across papers that turn
out to be missing certain key details. In fact nearly every paper
describing an algorithm that I’ve wanted to implement turns out to
be missing certain details.
Without their work I couldn’t do this work.
Some constraints on the input polygons
I need to define some terminology, and what is considered to be valid
input to this algorithm.
A polygon is defined by a list of (x,y) coördinates. Each (x,y)
coördinate defines a vertex of the polygon. The polygon is
closed: there is an edge from its final vertex back to its first vertex (some
standards, such as GeoJSON, require that you restate the first vertex
as the last. This typically makes iterating through the vertices and
constructing the edges easier. I’m not specifying an input format here
– anything will do).
The set of vertices that define the outside perimeter of the polygon
is called the “exterior ring”. A polygon may have zero or more
“holes”. A “hole”
cuts out a region inside of the polygon. These are sometimes called
“interior rings”. Both exterior and interior rings are sometimes
collectively called “contours”.
The “edge interior” is the set of all the points that lie on an edge,
excluding the two vertices that define the ends of the edge. The
“polygon interior” is the set of all points that lie inside the
exterior ring and outside all interior rings.
I rule as invalid any input polygon that:
is self-intersecting. It’s perfectly OK for a polygon to revisit the
same vertex many times. But it’s not OK for a polygon to either:
have two edges that intersect each other away from their
vertices (the “edge interior”),
have a vertex of some edge A lying on the edge interior of
some other edge B.
In both these scenarios, adding additional vertices can solve
these problems.
has a hole that intersects with itself (i.e. the hole is
self-intersecting in the same way as described above).
has a hole that intersects any other hole (except by the sharing of
vertices).
has a hole that intersects with the exterior ring (except by the
sharing of vertices).
has a hole within a hole.
Figure 1: valid and invalid polygons
Figure 1 shows two valid polygons, a and b, and three invalid
polygons:
c is invalid because it is self-intersecting;
d is invalid because its hole intersects with its exterior ring;
e is invalid because its hole has a hole.
Note that b could be a single exterior ring, or an exterior ring
with one or two interior rings. All would be valid because the only
intersections possible between these different rings are at vertices.
It’s all about intersections
No matter which boolean operation you wish to perform between polygons
A and B, you need to find the coördinates of the intersections
between the edges of A and the edges of B. To find the coördinates
of an intersection between two edges (or “line segments”), refer to
page 244 (as printed on the page; page 281 of the PDF) of “Geometric
Tools for Computer Graphics”. There are more results possible than you
might like though:
0 points of intersection found: the two edges have no points in
common;
1 point of intersection found: the two edges cross over each other
somewhere;
several points of intersection found: the two edges overlap (there
are several points which are
collinear to both
edges). Typically, you want to know the coördinates of the two
vertices which mark the ends of this overlapping region.
Starting down the floating-point rabbit hole
A refrain that I’m going to keep coming back to is:
Floats are not much different from ints, and if an algorithm isn’t
correct for ints, it’s definitely not going to be correct for
floats.
By “correct”, I don’t necessarily mean “calculates the exact right
number”. I more mean that from a logical point of view, it “does the
right thing”.
If the coördinates of the vertices of our line segments were expressed
in ints and not floats, then we could draw this sort of diagram:
Figure 2: Intersection of two line segments with integer coördinates
The coördinates of our original two line segments are valid integer
coördinates: the orange line is (20,65) to (24,60), and the blue
line is (20,61) to (25,64).
The point of intersection cannot be expressed as integer
coördinates. And once you ponder this for a moment, you realise the
probability of any point of intersection being expressible as
integer coördinates is close to 0. So we have no choice but to move
the point of intersection to the nearest coördinate that we can
express with integers. I think of this as “snap to grid”.
The result is four line segments, none of which exactly match up to
the originals, but are as close as they can be. Note that this
“snapping to grid” has changed the angle of the orange and blue
lines. Foreshadowing for the distant future!
This “snapping to grid” is logically correct: it’s impossible to
avoid, but it does also mean that the result you get back won’t be
quite perfect, simply because we cannot represent the point of
intersection exactly. Well, the situation with floating point
coördinates is no different to that of integer coördinates (in fact,
it’s a bit worse).
A 64-bit float is made up of the following pieces:
a sign bit. This is the 63rd bit. 0 indicates a positive number; 1 means negative.
Then 11 bits which represent the exponent. This is a biased unsigned
integer: to get the real exponent, decode these 11 bits as an
unsigned int, cast to a signed int, and then subtract 1023.
Then 52 bits which represent the fraction (or mantissa). Again,
this encoded as an unsigned int.
If the exponent is all 0s then we have a subnormal number: (−1)^sign × 2^(−1022) × 0.fraction
If the exponent is all 1s (a.k.a 0x7FF), then we have either NaN
(not-a-number), or infinity. We won’t worry about these any further
here.
Otherwise, we have a normal number: (−1)^sign × 2^(exponent − 1023) × 1.fraction
For the rest of this discussion when I write exponent, I mean the
fully decoded and unbiased number, i.e. the result of the −1023 step.
Here’s some Go code I wrote that decodes a float64. Note the
fraction gets printed in binary (base-2):
func decodeFloat64(num float64) string {
	bits := math.Float64bits(num)
	sign := bits >> 63
	// 0x7FF is 2^11 - 1
	exponent := int64((bits>>52)&0x7FF) - 1023
	isSubnormal := ((bits >> 52) & 0x7FF) == 0
	// 0xFFFFFFFFFFFFF is 2^52 - 1
	fraction := bits & 0xFFFFFFFFFFFFF
	signStr := "+"
	if sign == 1 {
		signStr = "-"
	}
	if isSubnormal {
		return fmt.Sprintf("%s0.%052b * 2^%d (subnormal)\t%g", signStr, fraction, -1022, num)
	} else {
		return fmt.Sprintf("%s1.%052b * 2^%d\t%g", signStr, fraction, exponent, num)
	}
}
2^(−1022) is the smallest possible normal number we can
represent: the fraction is all 0s, and the exponent is as low as
it can get. If we want to go even smaller, we need to start using the
subnormal encoding. For subnormals, the exponent is fixed at
−1022 (the exponent bits are all 0s), and now the implicit 1.
prefix of the fraction becomes a 0. prefix, allowing us to
represent even smaller numbers. So as we continue to halve the number,
the exponent is staying the same, but the fraction is now
changing. This sequence ends as:
0.0000000000000000000000000000000000000000000000000001₂ × 2^(−1022) (a.k.a. 5e−324) is the smallest possible number greater than 0 that we can represent with a 64-bit float. No number between 0 and 5e−324 can be represented.
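(As a quick sanity check in GHCi, since Haskell is handy for poking at this: encodeFloat builds a float from a mantissa and an exponent, and 2^(−1074) is exactly this smallest subnormal. This aside is mine, not part of the original post.)

ghci> encodeFloat 1 (-1074) :: Double
5.0e-324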
Thinking about floats
What does this all mean though, and how can we think about it a bit
more intuitively?
For any given exponent we have all 52 bits of the fraction
available to us, so 2^52 possible values, which are all evenly distributed. So we can think of the fraction as a simple integer count (multiple) of the smallest possible value for that exponent. For any normal float, that smallest value is 2^(exponent − 52). Or in other words, it’s 0.0000000000000000000000000000000000000000000000000001₂ × 2^exponent
Let’s imagine our exponent is 48. So that means we have 2^52 evenly-distributed values available to us from 2^48 (inclusive) to 2^49 (exclusive). We can think of the fraction as a simple counter of the number of smallest-values, which for this exponent is 2^(48 − 52) = 2^(−4) = 0.0625.
So if we treat the fraction as an integer (ignoring the 1. prefix), then we can think of a number as 2^exponent + fraction × ULP-for-this-exponent
For subnormals the exponent is fixed at −1022, and the 1. prefix of the fraction is gone, so the ULP for subnormals is the same as the smallest possible non-0 value: 2^(−1074). So between 0 and the first normal number (2^(−1022)), there are once again 2^52 possible values, which are evenly spaced, and we can think of them as multiples of 2^(−1074).
Interestingly, the ULP is the same for subnormals as for the first
normal range: the first normal range has an exponent of −1022,
it’s just the fraction now has the 1. prefix.
Putting it all together, the ULP is smallest when we’re closest
to 0. Every time the exponent increases by 1, the ULP doubles.
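As a quick cross-check of that doubling (in Haskell, for illustration; the helper name is mine, built from the standard RealFloat methods):

-- ULP of a normal double: 2^(e - 52), where e is the IEEE exponent.
-- Haskell's `exponent` is offset from the IEEE exponent, but
-- `floatDigits` (53 for Double) compensates. Ignores subnormals.
ulp :: Double -> Double
ulp x = encodeFloat 1 (exponent x - floatDigits x)

ghci> ulp 1.0
2.220446049250313e-16
ghci> ulp 2.0    -- exponent one higher: the ULP doubles
4.440892098500626e-16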
Back to the grid
Figure 3 (again): Intersection of two line segments with integer coördinates
Coming back to this grid, now armed with our understanding of floats,
it’s suddenly apparent that the numbers on this grid could just be the
integer interpretation of the fraction from floating point
numbers. I.e. this is just the multiple of the ULP for whatever
exponent we’re currently using. So the intuition about
“snapping to grid” that made so much sense earlier when talking about
integer coördinates applies to floats too.
Earlier, I said things were a bit worse with floats. This is because of having to deal with the fact that the grid may not be regular. Consider:
Figure 4: Intersection of two line segments with float coördinates
Here, the x-coördinates go beyond one exponent (52) and into the next
(53). This means that the resolution of the grid halves (or the
spacing doubles). But the y-coördinates are all within the range of a
single exponent, which ends up creating a rectangular grid. Somewhat
unusual (but again, when you ponder this for a moment, it’s clearly
the common case: the square grid only exists when the exponents for
the x- and y-coördinates are the same). Furthermore, the intersection
of the same two line segments, translated around to different
exponents, can be wildly different as the resolution of the grid
changes. For example, if these polygons are geo-spatial data
(i.e. shapes on the ground in some way), then intersecting them using
different coördinate reference
systems (CRS)
could result in very different shapes, as the same polygons on the
ground could end up using different exponents when expressed in
different CRSs.
But I think the most challenging problem with non-regular grids is
that it means subtraction might not work, and that means that when we
calculate vectors by subtracting the coördinates of two vertices, that
vector might be wrong.
With all values spaced evenly apart (such as with integers), we have:
a + (b − a) = b. But with a non-regular grid, if a and b
have different exponents, then this may no longer hold, because it may
be impossible to represent b − a. For example:
Let’s say a is the value just before you get to 1.0, and b is 2.0
b − a should be a value just greater than 1.0, so we need to use the exponent for the range 1.0-to-2.0 to represent this value.
But the exponent for the range 1.0-to-2.0 is 1 greater than for the
range 0.5-to-1.0. So therefore the ULP for the range from 1.0-to-2.0
is twice the ULP for the range 0.5-to-1.0, and we know that a
makes use of this finer-grained resolution, because we defined it to
be the value immediately before 1.0, i.e. the greatest value
possible using the 0.5-to-1.0 exponent.
So the smallest value greater than 1.0 is too big to represent b −
a.
func main() {
	a := math.Nextafter(1, 0)
	b := float64(2)
	difference := b - a
	fmt.Println("     a =", decodeFloat64(a))
	fmt.Println("     b =", decodeFloat64(b))
	fmt.Println(" (b-a) =", decodeFloat64(difference))
	fmt.Println("a + (b-a) == b ?", a+difference == b)
	fmt.Println("a == b - (b-a) ?", a == b-difference)
	fmt.Println()
	a = math.Nextafter(a, 0)
	difference = b - a
	fmt.Println("    a' =", decodeFloat64(a))
	fmt.Println("     b =", decodeFloat64(b))
	fmt.Println("(b-a') =", decodeFloat64(difference))
	fmt.Println("a' + (b-a') == b ?", a+difference == b)
	fmt.Println("a' == b - (b-a') ?", a == b-difference)
}
a = +1.1111111111111111111111111111111111111111111111111111 * 2^-1 0.9999999999999999
b = +1.0000000000000000000000000000000000000000000000000000 * 2^1 2
(b-a) = +1.0000000000000000000000000000000000000000000000000000 * 2^0 1
a + (b-a) == b ? true
a == b - (b-a) ? false
a' = +1.1111111111111111111111111111111111111111111111111110 * 2^-1 0.9999999999999998
b = +1.0000000000000000000000000000000000000000000000000000 * 2^1 2
(b-a') = +1.0000000000000000000000000000000000000000000000000001 * 2^0 1.0000000000000002
a' + (b-a') == b ? true
a' == b - (b-a') ? true
So we can see the computer decides that b − a is 1.0 exactly –
i.e. a value that’s too small. If we define aʹ to be one value
smaller than a, then we have that aʹ is on an even multiple of its
ULP, which means it can be expressed correctly in the ULP of the next
exponent up, so b − aʹ can be expressed exactly as a 64-bit float.
Given that finding the point of intersection between two line segments
will require calculating the vector of each line segment, we need to
tread carefully, to say the least!
Wrapping up
Hopefully I’ve illustrated that whether or not your coördinates are
ints or floats, it’s unavoidable that line segment intersections will
change the lines of your polygons: the intersection will always “snap
to grid”. A floating point number can be thought of as
2^exponent + fraction × ULP-for-this-exponent, and so
within a given exponent, floats are really no different than
integers. But for line segments that have coördinates that use
different exponents, trouble lies ahead because we can’t expect the
vector to be correct: we will likely have to do further, and coarser,
“snapping to grid” if we want to be able to trust our vectors.
Thinking back to Part 1, the algorithm
there needed to repeatedly consider whether one polygon covered
another, to union polygons together and then subtract or intersect
other polygons. It now seems that we actually need to do quite a lot
of preparation work for each polygon based on what it intersects with,
in order to have a logically correct outcome. This suggests an
algorithm that allows you to prepare many polygons at a time, before
then querying and calculating boolean operations between any pair of
the prepared polygons. This is not what the Martínez algorithm does:
the Martínez algorithm loads only the two polygons you want to operate
on, and so this is further motivation for really digging into the
algorithm so we can see where it can be extended. Stay tuned for part
3!
I didn't get an answer there, but some people on the Nix Haskell channel on Matrix helped a bit; still, it seems this particular use case requires a bit of manual work. The following commands get me an almost fully working setup:
cabal haddock --haddock-internal --haddock-quickjump --haddock-hoogle --haddock-html
hoogle_dir=$(dirname $(dirname $(readlink -f $(which hoogle))))
hoogle generate --database=local.hoo \
$(for d in $(fd -L .txt ${hoogle_dir}); do printf "--local=%s " $(dirname $d); done)\
--local=./dist-newstyle/build/x86_64-linux/ghc-9.8.2/pkg-0.0.1/doc/html/pkg
hoogle server --local --database=local.hoo
What's missing is working links between the documentation of locally installed packages. It looks like the links in the generated documentation in Nix have a lot of relative references containing ${pkgroot}/../../../../ which is what I suspect causes the broken links.
In a previous
post
I challenged you to solve Factor-Full
Tree. In this
problem, we are given an unlabelled rooted tree, and asked to create a divisor
labelling. That is, we must label the vertices with positive
integers in such a way that \(v\) is an ancestor of \(u\) if and only if
\(v\)’s label evenly divides \(u\)’s label.
For example, here is a tree with a divisor labelling:
Divisor labelling of a tree
The interesting point (though irrelevant to solving the problem) is
that this is a method for encoding a tree as a set of integers:
because \(v\) is an ancestor of \(u\) if and only if \(v\)’s label divides
\(u\)’s, all the information about the tree’s structure is fully
contained in the set of labels. For example, if we simply write
down the set \(\{1, 5, 6, 7, 12, 14, 21, 49, 63\}\), it is possible to
fully reconstruct the above tree from this set. (Note that we consider trees equivalent up to reordering of siblings; that is, each node has a bag, not a list, of children.)
This is not a
particularly efficient way to encode a tree, but it is certainly
interesting!
Basic setup
First, some basic setup. (See here for the Scanner abstraction, and here for the basics of how I organize solutions.)
The first line of
input specifies the number of nodes \(N\), and after that there are
\(N-1\) lines, each specifying a single undirected edge.
We are guaranteed that the edges describe a tree; next we will
actually build a tree data structure from the input.
Building trees
There are many
similar problems which specify a tree structure by giving a list of
edges, so it’s worthwhile trying to write some generic code to
transform such an input into an actual tree. In an imperative language
we would do this by building a map from each node to its neighbors,
then doing a DFS to orient the tree. Our Haskell code will be
similar, except building the map and doing a DFS will both be
one-liners!
First, a function to turn a list of undirected edges into a Map
associating each vertex to all its neighbors. It’s convenient to
decompose this into a function to turn a list of directed edges into
a Map, and a function to duplicate and swap each pair. We won’t
need dirEdgesToMap for this problem, but we can certainly imagine
wanting it elsewhere.
edgesToMap :: Ord a => [(a, a)] -> Map a [a]
edgesToMap = concatMap (\p -> [p, swap p]) >>> dirEdgesToMap

dirEdgesToMap :: Ord a => [(a, a)] -> Map a [a]
dirEdgesToMap = map (second (: [])) >>> M.fromListWith (++)
Next, we can turn such a neighbor Map into a tree. Rather than
returning a literal Tree data structure, it’s convenient to
incorporate a tree fold: that is, given a function a -> [b] -> b, a neighbor
map, and a root node, we fold over the whole tree and return the
resulting b value. (Of course, if we want an actual Tree we can use
mapToTree Node.) We can also compose these into a single function edgesToTree.
mapToTree :: Ord a => (a -> [b] -> b) -> Map a [a] -> a -> b
mapToTree nd m root = dfs root root
  where
    dfs parent root = nd root (maybe [] (map (dfs root) . filter (/= parent)) (m !? root))

edgesToTree :: Ord a => (a -> [b] -> b) -> [(a, a)] -> a -> b
edgesToTree nd = mapToTree nd . edgesToMap
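As a quick usage sketch (a hypothetical four-node tree of my own; the order of children depends on Map internals):

example :: Tree Int
example = edgesToTree Node [(1, 2), (1, 3), (3, 4)] 1
-- example represents node 1 with children 3 (which has child 4) and 2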
Inventing divisor labellings
So how do we create a divisor labelling for a given tree? Clearly, we
might as well choose the root to have label \(1\), and every time we
descend from a parent to a child, we must multiply by some integer,
which might as well be a prime. Of course, we need to multiply by a
different prime for each sibling. We might at first imagine simply
multiplying by 2 for each (arbitrarily chosen) leftmost child, 3 for
each second child, 5 for each third child, and so on, but this does
not work—the second child of the first child ends up with the same
label as the first child of the second child, and so on.
Each node \(u\)’s label is some prime \(p\) times its parent’s label; call
\(p\) the factor of node \(u\). It is OK for one child of \(u\) to also
have factor \(p\), but the other children must get different factors.
To be safe, we can give each additional child a new globally unique
prime factor. This is not always necessary—in some cases it can be
OK to reuse a factor if it does not lead to identically numbered
nodes—but it is certainly sufficient. As an example, below is a
divisor labelling of the example tree from before, via this scheme.
Each edge is labelled with the factor of its child.
Divisor labelling of a tree with consecutive primes
Notice how we use \(2\) for the first child of the root, and \(3\) for the
next child. \(3\)’s first child can also use a factor of \(3\), yielding
a label of \(3^2 = 9\). \(3\)’s next child uses a new, globally unique
prime \(5\), and its third child uses \(7\); the final child of \(1\) uses
the next available prime, \(11\).
We can code this up via a simple stateful traversal of the tree. (For
primes, see this
post.)
It’s a bit fiddly since we have to switch to the next prime between
consecutive children, but not after the last child.
primes :: [Integer]
primes = 2 : sieve primes [3 ..]
  where
    sieve (p : ps) xs =
      let (h, t) = span (< p * p) xs
       in h ++ sieve ps (filter ((/= 0) . (`mod` p)) t)

curPrime :: State [Integer] Integer
curPrime = gets head

nextPrime :: State [Integer] ()
nextPrime = modify tail

labelTree :: Tree a -> Tree (Integer, a)
labelTree = flip evalState primes . go 1
  where
    go :: Integer -> Tree a -> State [Integer] (Tree (Integer, a))
    go x (Node a ts) = Node (x, a) <$> labelChildren x ts

    labelChildren :: Integer -> [Tree a] -> State [Integer] [Tree (Integer, a)]
    labelChildren _ [] = pure []
    labelChildren x (t : ts) = do
      p <- curPrime
      t' <- go (x * p) t
      case ts of
        [] -> pure [t']
        _ -> do
          nextPrime
          (t' :) <$> labelChildren x ts
There is a bit of additional glue code we need to get the parsed tree
from the input, apply labelTree, and then print out the node
labels in order. However, I’m not going to bother showing it,
because—this solution is not accepted! It fails with a WA (Wrong
Answer) verdict. What gives?
Keeping things small
The key is one of the last sentences in the problem statement, which I
haven’t mentioned so far: all the labels in our output must be at most
\(10^{18}\). Why is this a problem? Multiplying by primes over and
over again, it’s not hard to get rather large numbers. For example,
consider the tree below:
Tree for which our naïve scheme generates labels that are too large
Under our scheme, the root gets label \(1\), and the children of the
root get consecutive primes \(2, 3, 5, \dots, 29\). Then the nodes
in the long chain hanging off the last sibling get labels \(29^2, 29^3, \dots, 29^{13}\), and \(29^{13}\) is too big—in fact, it is
approximately \(10^{19}\). And this tree has only 23 nodes; in general
the input can have up to 60.
Of course, \(29\) was a poor choice of factor for such a long chain—we
should have instead labelled the long chain with powers of,
say, 2. Notice that if we have a “tree” consisting of a single long
chain of 60 nodes (and you can bet this is one of the secret test
inputs!), we just barely get by labelling it with powers of two from
\(2^0\) up to \(2^{59}\): in fact \(2^{59} < 10^{18} < 2^{60}\). So in
general, we want to find a way to label long chains with small primes,
and reserve larger primes for shorter chains.
Attempt 1: sorting by height
One obvious approach is to simply sort the children at each node by
decreasing height, before traversing the tree to assign prime
factors. This handles the above example correctly, since the long
chain would be sorted to the front and assigned the factor 2.
However, this does not work in general! It can still fail to assign
the smallest primes to the longest chains. As a simple example,
consider this tree, in which the children of every node are already
sorted by decreasing height from left to right:
Tree for which sorting by height first does not work
The straightforward traversal algorithm indeed assigns powers of 2 to
the left spine of the tree, but it then assigns 3, 5, 7, and so on to
all the tiny spurs hanging off it. So by the time we get to the other long
chain hanging off the root, it is assigned powers of \(43\), which are
too big. In fact, we want to assign powers of 2 to the left spine,
powers of 3 to the chain on the right, and then use the rest of the
primes for all the short spurs. But this sort of “non-local”
labelling means we can’t assign primes via a tree traversal.
To drive this point home, here’s another example tree. This one is
small enough that it probably doesn’t matter too much how we label it,
but it’s worth thinking about how to label the longest chains with the
smallest primes. I’ve drawn it in a “left-leaning” style to further
emphasize the different chains that are involved.
Tree with chains of various lengths
In fact, we want to assign the factor 2 to the long chain on the left;
then the factor 3 to the second-longest chain, in the fourth column;
then 5 to the length-6 chain in the second column; 7 to the length-3
chain all the way on the right; and finally 11 to the smallest chain, in column 3.
In general, then, we want a way to decompose an arbitrary tree into
chains, where we repeatedly identify the longest chain, remove it from
consideration, and then identify the longest chain from the remaining
nodes, and so on. Once we have decomposed a tree into chains, it will
be a relatively simple matter to sort the chains by length and assign
consecutive prime factors.
This decomposition occasionally comes in handy (for example, see
Floating
Formation), and
belongs to a larger family of important tree decomposition techniques
such as heavy-light
decomposition. Next time,
I’ll demonstrate how to implement such tree decompositions in Haskell!
I'll be appearing at the Fringe in the Cabaret of Dangerous Ideas,
12.20-13.20 Monday 5 August and 12.20-13.20 Saturday 17 August, at Stand 5.
The 5 August show is joint with Matthew Knight of the National Museums of Scotland, the 17 August show is all mine. Both shows are hosted by comedian Susan Morrison.
You can book either via the Fringe or via the Stand.
If one is sold out, try the other.
Here's the brief summary:
Chatbots like ChatGPT and Google's Gemini dominate the news. But the answers they give are, literally, bullshit. Historically, artificial intelligence has two strands. One is machine learning, which powers ChatGPT and art-bots like Midjourney, and which threatens to steal the work of writers and artists and put some of us out of work. The other is the 2,000-year-old discipline of logic. Professor Philip Wadler (The University of Edinburgh) takes you on a tour of the risks and promises of these two strands, and explores how they may work better together.
I'm looking forward to the audience interaction. Everyone should laugh and learn something. Do come!
Every so often, the Collatz conjecture comes up in discussion forums I read, and I start to think about it again. I did for a bit this past weekend. Here are my thoughts this time around.
The Problem
A Collatz sequence starts with some positive integer x₀ and develops the sequence inductively as xₙ₊₁ = xₙ / 2 if xₙ is even, or 3xₙ + 1 if xₙ is odd. For instance, starting with 13, we get:
x₀ = 13
x₁ = 3(13) + 1 = 40
x₂ = 40 / 2 = 20
x₃ = 20 / 2 = 10
x₄ = 10 / 2 = 5
x₅ = 3(5) + 1 = 16
x₆ = 16 / 2 = 8
x₇ = 8 / 2 = 4
x₈ = 4 / 2 = 2
x₉ = 2 / 2 = 1
x₁₀ = 3(1) + 1 = 4
x₁₁ = 4 / 2 = 2
x₁₂ = 2 / 2 = 1
From there, it’s apparent that the sequence repeats 1, 4, 2, 1, 4, 2, … forever. That is one way for a Collatz sequence to end up. The famous question here, known as the Collatz Conjecture, is whether it’s the only way any such sequence can end up. Not necessarily! There could be other cycles besides 1, 4, 2. Or there could be a sequence that keeps increasing forever without repeating a number. Or maybe that never happens. No one knows!
We do know a few things. First, if these things happen, they only happen with astronomically large numbers, beyond everything that even powerful computers have been able to check. We know that if even a single number is repeated, then that part of the sequence will repeat forever, since the whole tail of the sequence is determined by any single number in it. And we know that such a sequence cannot decrease forever: Collatz sequences remain positive integers, so a decreasing sequence would eventually reach a number small enough that we already know it ends up in the 1, 4, 2 loop. We also know that there are other loops in Collatz sequences that begin with negative integers, so the fact that none have been found so far in the positive integers is at least a little surprising.
The Collatz Conjecture is famous because it’s probably one of the easiest unsolved math problems for a mathematical novice to understand. There’s no Riemann zeta function to define. Just even and odd numbers, division by two, multiplication by three, and adding one. That doesn’t mean it’s easy to solve, though! Many mathematicians and countless novices have spent decades working on the problem, and there’s no promising road to a solution. The mathematician Erdős suggested that it’s not simply that no one has found the solution, but that mathematics is lacking even the basic tools needed to work on this problem.
Collatz and Alternate Bases
There are many, many ways to think about the Collatz conjecture, but one of them is to look at the computation in different bases. We’re not really attempting to find a more efficient way to compute Collatz sequences. If we cared about that, it would be far more efficient to use whatever representation our computing hardware is designed for! Rather, what we’re looking for here is the possibility of some kind of pattern in the computation that reveals something analytical about the problem.
Addition works essentially the same way regardless of base, but the computations involved in multiplication and division depend very much on the choice of base! Given the definition of the Collatz sequence, two natural choices for computing it are base 2 (binary) and base 3 (ternary).
In base 2, it’s trivial to decide whether a number is even or odd, and to divide by two. On the other hand, computing 3n+1 is less trivial, requiring a pass over potentially every digit in the number.
In base 3, the opposite happens. Computing 3n+1 is now trivial. But recognizing that a number is even and dividing by two now require a pass over every digit.
Let’s jump into the details and see what happens.
Base 3 in Detail
Base 3 representations are appealing for the Collatz sequence because it’s trivial to compute 3n+1. It amounts to simply adding a 1 to the end of the representation, shifting everything else left (i.e., multiplying it by 3) to make room. If you have n = 1201 (decimal 46), for example, then 3n+1 = 12011 (decimal 139).
The more difficult tasks are:
Determining whether the number is even or odd. Unlike decimal, we cannot simply look at the last digit. Instead, a number in base 3 is even if and only if it has an even number of 1s in its representation. That’s not hard to count, but it does require looking at the entire sequence of digits.
Dividing by two. Given a sequence of base 3 digits, most significant digit first, we can express division by two as a state machine: the long division algorithm, with the remainder (starting at zero) as the state, using the following division table.
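(The table itself didn’t survive in this copy; it can be reconstructed from long division. With remainder r and next digit d, the value in play is 3r + d; the output digit is (3r + d) ÷ 2, and the next remainder is (3r + d) mod 2.)

Remainder   Digit   Output digit   Next remainder
0           0       0              0
0           1       0              1
0           2       1              0
1           0       1              1
1           1       2              0
1           2       2              1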
Let’s see how this table works with an example. Starting again with 1201 (decimal 46):
We always start with a remainder of 0. The first digit is 1. That’s the second line of the table. The output digit is, therefore, 0, and the next remainder is 1.
A remainder of 1 and a digit of 2 is the last line of the table. It tells us to add a 2 to the output, and proceed with a remainder of 1.
A remainder of 1 and a digit of 0 is the fourth line. We add a 1 to the output, and proceed with a remainder of 1.
A remainder of 1 and a digit of 1 is the fifth line. We add a 2 to the output and proceed with a remainder of 0.
We’re now out of digits. The quotient is 0212 (decimal 23, but note that leading zero which we’ll talk about later!) and the remainder is 0.
Naively, we would have to make two passes over the current number: one to determine whether it’s even or odd, and then again, if it’s even, to divide by two. We can avoid this, though, by remembering that if a number is odd, we intend to compute 3n+1, which will always be even (because the product of two odd numbers is odd, so adding one makes it even), so we’ll then divide that by two. A little algebra reveals that (3n+1)/2 = 3(n/2 - 1/2) + 2 = 3⌊n/2⌋ + 2 if n is odd.
What this means is that we can go ahead and halve n regardless of whether it’s even or odd. At the end, we’ll know whether there’s a remainder or not, and if so, we will already be in position to append a 2 (rather than a 1 as discussed earlier) to the halved number and rejoin the original sequence. This skips one step of the Collatz sequence, but that’s okay. If our goal is only to determine whether the sequence eventually reaches 1, it doesn’t change the answer if we take this shortcut.
Appending that 2 to the end of the number changes the meaning of our state transition table a little bit. Instead of automatically quitting when we reach the end of the current number, we’ll need a chance to append another digit at the end. We’ll add rows to the table for what to do after all the digits have been seen, and be explicit about when to terminate (i.e., finish processing).
There’s one more detail we can handle as we go: as we saw earlier, dividing by two can produce a leading zero at the beginning of the result, which is unnecessary. We can arrange to never produce that leading zero at all, so we don’t need to ignore or remove it later. We just need to remember whether we’re just starting and therefore don’t need to write leading zeros. In that case, the remainder is always zero, so there’s only one state to add.
Since there are no leading zeros in the representations, we need not concern ourselves with the case where the first digit encountered is a zero, but if you want to handle it, we can produce no output and remain in the Just Starting state, since it ought to change nothing. I’ve done so in the code below.
We can iterate this state machine on ternary numbers, and get consecutive values from the Collatz sequence, though slightly abbreviated because we combined the 3n+1 step with the following division by 2. The Collatz conjecture is now equivalent to the proposition that this iterated state machine will eventually produce only a single 1 digit.
I’ve implemented this in the Haskell programming language as follows:
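(The original listing didn’t survive here; the following is a reconstruction from the transition table above, with constructor names of my own choosing.)

data Trit = T0 | T1 | T2 deriving (Eq, Show)

-- One (shortcut) Collatz step on a base-3 number, most significant digit
-- first: halve, and if the number turns out to be odd, append a final 2,
-- combining the 3n+1 step with the division by two that follows it.
step3 :: [Trit] -> [Trit]
step3 = start
  where
    -- Just Starting: don't emit a leading zero.
    start (T0 : ds) = start ds       -- tolerate leading zeros in the input
    start (T1 : ds) = r1 ds          -- the output digit would be 0: skip it
    start (T2 : ds) = T1 : r0 ds
    start []        = []
    -- Remainder 0
    r0 (T0 : ds) = T0 : r0 ds
    r0 (T1 : ds) = T0 : r1 ds
    r0 (T2 : ds) = T1 : r0 ds
    r0 []        = []                -- the number was even: we're done
    -- Remainder 1
    r1 (T0 : ds) = T1 : r1 ds
    r1 (T1 : ds) = T2 : r0 ds
    r1 (T2 : ds) = T2 : r1 ds
    r1 []        = [T2]              -- the number was odd: append the 2

For example, step3 [T1, T2, T0, T1] (that is, 1201₃ = 46) yields [T2, T1, T2], i.e. 212₃ = 23, as in the sequence below.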
Starting with 1201 (decimal 46), we get 212 (decimal 23), 1022 (decimal 35), 1222 (decimal 53), 2222 (decimal 80), 1111 (decimal 40), 202 (decimal 20), 101 (decimal 10), 12 (decimal 5), 22 (decimal 8), 11 (decimal 4), 2, 1, 2, 1, … As predicted, that’s the Collatz sequence, except for the omission of 3n+1 terms since their computation is merged into the following division by two.
Base 2 in Detail
So what happens in base 2 (binary)? It’s a curiously related but different story!
Determining whether a number is even or odd is trivial: just look at the last bit and observe whether it is 0 or 1.
Dividing an even number by two is trivial: once you observe that the last digit is a 0, simply delete it, shifting the remaining bits to the right to fill in.
However, computing 3n+1 becomes less trivial, now requiring a pass over the entire digit sequence.
Since the hard step is multiplication, and the algorithmically natural direction to perform multiplication is from right to left, we can reverse the order in which we visit the bits, progressing from the least-significant to the most-significant. This is a change from the base 3 case, where division (the inverse of multiplication) was easier to perform in the left-to-right order.
We can start as before, by writing down a simple state transition table for a state machine that multiplies a binary number by 3. The state here is represented by the number carried to the next column.
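(This table also didn’t survive in this copy; reconstructed, with carry c and bit b, the value is 3b + c, the output bit is (3b + c) mod 2, and the next carry is (3b + c) ÷ 2.)

Carry   Bit   Output bit   Next carry
0       0     0            0
0       1     1            1
1       0     1            0
1       1     0            2
2       0     0            1
2       1     1            2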
(You might recognize this as the same table we already wrote down for halving a ternary number! The only differences are the column headers: the roles of states and digits are swapped, and we must traverse the digits in the opposite order.)
There’s one unfortunate subtlety to this table, and it has to do with leading zeros again. In principle, we think of a number in any base as having an infinite number of leading zeros on the left. In order to get correct results from this table, we need to continue consuming more digits until both the remaining digits and the current remainder are all zero. To express this, we’ll again need to convert our transition table to use explicit termination. This is so that we can stop at exactly the right point and not emit any unnecessary trailing zeros.
But what about the rest of the logic of the Collatz sequence?
We should add one after tripling to compute 3n+1. That would also require a pass over potentially the entire number in the worst case… but we’re in luck. We can combine the two tasks just by starting from the Carry 1 state when following this state transition diagram.
If the number is even, we should divide by two. Recall how in the ternary case, we merged some of the halving with the 3n+1 computation? This time, we can merge all the halving! Dividing even numbers by two just means dropping trailing zeros from the right side of the representation. Since we’re working right to left, it’s easy to add one more state that ignores trailing zeros at the start of the input.
We need to be a little careful here, because this version of the Collatz sequence never emits a 1, so looking for a 1 in the sequence is doomed! Instead, the numbers displayed are only the ones immediately after a 3n+1 step, so the final behavior (for all numbers computed so far, anyway) is an infinitely repeating sequence of 4s. We know from earlier that 4 is part of the 1,4,2 cycle, so seeing 4s is enough to know that the full Collatz sequence passes through 1.
We can fix this by refusing to emit any of the trailing zeros in the first place. Now we’re ignoring trailing zeros on the input, but also never producing them in the output. The blowup in the number of states is unfortunate: to keep track of whether anything has been emitted yet, each state we might pass through before emitting the first non-zero digit needs a copy that handles this new case. Here’s our final transition table.
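(The table is missing from this copy; here is a sketch of the corresponding code as I reconstruct it, using an integer carry in place of named states, so the trim/non-trim split shows up as two workers.)

data Bit = B0 | B1 deriving (Eq, Show, Read)

-- One step, least significant bit first: skip the input's trailing zeros
-- (halving an even number), then multiply by 3 with an incoming carry of 1
-- for the "+1", refusing to emit the result's trailing zeros.
step2 :: [Bit] -> [Bit]
step2 (B0 : bs) = step2 bs      -- even: drop a trailing zero
step2 (B1 : bs) = goTrim 2 bs   -- odd: 3*1 + 1 = 4, emit nothing, carry 2
step2 []        = []

-- Consume the next bit, treating the end of the list as implicit zeros.
next :: [Bit] -> (Int, [Bit])
next []        = (0, [])
next (B0 : bs) = (0, bs)
next (B1 : bs) = (1, bs)

-- Trim phase: every output so far would be 0, so emit nothing until the
-- first 1, which becomes the new least significant bit.
goTrim :: Int -> [Bit] -> [Bit]
goTrim 0 [] = []
goTrim c bs
  | odd v     = B1 : go (v `div` 2) bs'
  | otherwise = goTrim (v `div` 2) bs'
  where
    (b, bs') = next bs
    v        = 3 * b + c

-- Normal phase: emit every output bit until the input and carry run out.
go :: Int -> [Bit] -> [Bit]
go 0 [] = []
go c bs = (if even v then B0 else B1) : go (v `div` 2) bs'
  where
    (b, bs') = next bs
    v        = 3 * b + c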
main :: IO ()
main = do
  [n] <- fmap read <$> getArgs
  traverse_ print (iterate step2 n)
And a result:
$ cabal run exe:collatz2 '[B1, B0, B1, B1, B0, B1]' | head -80
[B1,B0,B1,B1,B0,B1]
[B1,B0,B0,B0,B1]
[B1,B0,B1,B1]
[B1,B0,B1]
[B1]
[B1]
[B1]
We start with 101101 (45 in decimal). We triple and add one to get 136, then halve to get 68, then 34, then 17, which is the next value that appears (10001 = 17 in decimal). We triple and add one to get 52, then halve to get 26, then 13, which is 1101 in binary, and the third number in the list. (Remember the bits are listed from right to left!) Now triple and add one to get 40, and halve until you reach 5, which is 101 in binary and the fourth number in the list. Finally, triple and add one to get 16, and halve until you reach 1, which is where it stays.
Analysis
Is this a promising avenue to attack the Collatz Conjecture? Almost surely not. I’m not sure anyone knows a promising way to solve the problem. Nevertheless, we can ask what it might look like if one were to use this approach to attempt some progress on the conjecture.
One way (in fact, in some sense, the only way) to solve the Collatz Conjecture is to find some kind of quantity that:
Takes its minimum possible value for the number 1.
Always decreases from one element of a Collatz sequence to the next, except at 1.
Cannot decrease forever.
If such a quantity exists, then a Collatz sequence must eventually reach 1, so the Collatz Conjecture must be true — and conversely, in fact, if the Collatz Conjecture is true, then such a quantity must exist, since the number of steps to reach 1 would then be exactly such a quantity. This is equivalent to the original conjecture, which is why I commented that proving this is the only way to solve it! But this way of looking at the conjecture is interesting because it lets you define any quantity you like, as long as it has those three properties.
We know a lot of things that this quantity isn’t. It can’t be just the magnitude of the number, since that can increase with the 3n+1 rule. It also can’t be the number of digits (in any base), since that can increase sometimes, as well. Plenty of people have looked for other quantities that work. It’s useful to me to think of the quantity as a measure of the “entropy” (or rather its opposite, since it’s decreasing). It’s something you lose any time you take a step, and this tells you that eventually you will reach some minimum state, which must be the number 1.
Just guessing a quantity is unlikely to work. But if you can come to some understanding of the behavior of these computations, it’s conceivable there’s a quantity embedded in them somewhere that satisfies these conditions. If this entropy value is calculated digit by digit, you may be able to isolate how it changes in response to each of these state transition rules.
It is, at the very least, one point of view from which one might start thinking. I never claimed to have any answers! This was always just a random train of thought.
There are a small set of common boolean operations that you can
perform on two polygons: union (a∪b), intersection (a∩b), subtraction
(a−b and b−a), and exclusive-or (a xor b) (also called
symmetric-difference). All apart from subtraction are commutative,
e.g. a∪b gives the same result as b∪a.
Boolean operations between two overlapping polygons, a and b
I found myself in a situation where I needed to overlay a few dozen
datasets of polygons, and trace the intersecting shapes. When you’re
dealing with over 100 million polygons, this becomes non-trivial, and
I ended up writing my own batch processing solution, which was highly
parallelised. My solution nevertheless used an existing implementation
of these boolean operations from libgeos.
Although not the focus of this series, it’s probably worth documenting
the algorithm I came up with.
The algorithm proceeds in rounds: processing each round may produce
some polygons that go into the next round, and it may produce some
polygons that go into the output. When you’ve processed all the
polygons in the current round, you start work on the next
round. Eventually you reach a point where the next round is empty, and
your result is then all the polygons you’ve output.
For each polygon A in the current round:
If there is some other polygon B in the current round that
completely covers A, then ignore A and move on to the next
polygon in the current round.
For all other polygons (B, C, …) in the current round that
intersect with A, output A − ⋃{B, C, …}.
In addition, for each polygon B from the current round that
intersects with A, add to the next round A ∩ B.
If the next round is empty, we’re done, otherwise start processing
the next round after removing duplicates.
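In rough Haskell-flavoured pseudocode, one round looks like the sketch below. This is only an illustration: Poly and the operations covers, intersects, intersection, difference, unionAll, and dedupe stand in for whatever your geometry library provides (e.g. a libgeos binding), and duplicate removal is glossed over.

-- One round of the algorithm; recurse until the next round is empty.
-- Assumes Eq Poly for the "some *other* polygon" test, which is exactly
-- the hard duplicate-identification problem discussed in this series.
overlay :: [Poly] -> [Poly]
overlay []    = []
overlay round = out ++ overlay (dedupe next)
  where
    results = [process a [b | b <- round, b /= a] | a <- round]
    out     = concat [o | (o, _) <- results]
    next    = concat [n | (_, n) <- results]

    -- Returns (polygons to output, polygons for the next round).
    -- unionAll [] is taken to be the empty polygon.
    process a others
      | any (`covers` a) others = ([], [])   -- a is ignored entirely
      | otherwise               = ( [a `difference` unionAll bs]
                                  , [a `intersection` b | b <- bs] )
      where bs = [b | b <- others, b `intersects` a]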
Let’s walk through an example:
Round 1
Here are the original polygons:
Round 1 input
A, B and C are all covered by D (D does not have any holes
in it), so A, B and C are completely ignored.
But for D, we:
output D − ⋃{A, B, C}
add to next-round D ∩ A, D ∩ B, and D ∩ C (which is
equal to the set A, B, C because the ∩ D changes
nothing).
Round 1 results
Round 2
Round 2 now starts, processing the next-round results from
round 1: A, B, and C.
A is covered by C, so A gets completely ignored.
For B we:
output B − ⋃{A, C}
add to next-round B ∩ A and B ∩ C.
For C we:
output C − ⋃{A, B}
add to next-round C ∩ A (which is equal to A).
Round 2 results
Round 3
Round 3 now starts, processing the next-round results from
round 2: A, B ∩ A, and B ∩ C
For A, we
output A − ⋃{(B ∩ A), (B ∩ C)} (which is the same as A − (B ∩ A), or A − (B ∩ C))
add to next-round A ∩ (B ∩ A), which is the same as B ∩ A.
add to next-round A ∩ (B ∩ C), which is the same as B ∩ A.
B ∩ A is covered by both A and B ∩ C, so B ∩ A gets
completely ignored.
For B ∩ C, we:
output (B ∩ C) − ⋃{A, (B ∩ A)} (which is the same as (B ∩ C) − A, or (B ∩ C) − (B ∩ A))
add to next-round:
(B ∩ C) ∩ A, which is the same as B ∩ A
(B ∩ C) ∩ (B ∩ A), which is the same as B ∩ A
After removing duplicates, the next-round contains only one polygon,
equal to B ∩ A.
Round 3 results
Round 4
Round 4 now starts, processing the next-round results from
round 3: B ∩ A
For B ∩ A we output (B ∩ A) − ⋃{}. I.e. (B ∩ A)
There’s nothing to go to the next round, so we are finished.
The full set of output polygons is:
D − ⋃{A, B, C}
B − ⋃{A, C}
C − ⋃{A, B}
A − (B ∩ A)
(B ∩ C) − A
B ∩ A
Making it go fast
In each round, for each polygon A, we need to be able to find the
set of polygons that intersect with A. So that suggests using an
R-tree or
quad-tree and building some
sort of index of intersections. In my particular case, I had so many
polygons that they couldn’t be held in RAM, so these data structures
had to be designed to work off disk. As usual for me, I used my LMDB
bindings and built a quad-tree on top of
that. After some effort, my quad-tree could add multiple
non-intersecting polygon segments in parallel.
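For datasets that do fit in RAM, shapely exposes GEOS’s STRtree (a packed R-tree), and the per-round query has roughly this shape (assuming shapely 2.x, where query returns indices); the LMDB-backed quad-tree described above answers the same kind of query, just off disk:

from shapely.strtree import STRtree

def intersection_candidates(current):
    # The predicate makes the tree run the exact intersection test
    # after its bounding-box filter.
    tree = STRtree(current)
    for i, a in enumerate(current):
        for j in tree.query(a, predicate="intersects"):
            if j != i:
                yield i, int(j)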
Within each round, you can process every polygon in parallel. With a
modern CPU with 16 cores or so, this makes a big difference. This is a
major point of difference from other approaches I attempted, such as
using PostGIS: in my experience, PostGIS would
often start off using multiple cores, and then fall back to a single
core far too quickly. Even with my custom approach and a powerful
desktop machine, processing these datasets would take multiple
days. PostGIS would have taken months or more.
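As a rough illustration of that per-round parallelism, reusing process_one from the sketch above (shapely geometries pickle cleanly, though this naive version ships the whole round to every worker instead of sharing a spatial index):

from concurrent.futures import ProcessPoolExecutor

def process_round(current):
    # process_one must live in an importable module for pickling.
    others = [current[:i] + current[i + 1:] for i in range(len(current))]
    with ProcessPoolExecutor() as pool:   # one worker per core by default
        results = list(pool.map(process_one, current, others))
    output = [p for outs, _ in results for p in outs]
    next_round = [p for _, nxts in results for p in nxts]
    return output, next_round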
Additionally, the more I used libgeos to do the
polygon intersection, union, and difference operations, the more I
felt libgeos itself might be some way from optimal. I started
wondering whether I could reimplement these boolean algorithms using
OpenCL or
CUDA. That required learning
how these boolean operations actually work, and so I started down that
rabbit hole.
The next post in this series will start looking at a widely used
algorithm for boolean operations on polygons, and will start to
uncover the challenges and difficulties in this space.
The above shows an average of five recent polls for my constituency, Edinburgh North and Leith, and comes courtesy of Stop the Tories. Clearly, the Tories have no chance, but I will still be voting tactically. I am a member of the Greens. But if everyone who intends to vote Green instead votes SNP, the SNP will beat Labour (rather than the other way around). While the SNP has made some awful missteps of late, they are the best hope to push Labour toward the more progressive policies from which Starmer has dragged them away. My tactical vote goes to the SNP.
Quality and productivity are not necessarily mutually exclusive
One of my pet peeves is when people pit quality and
productivity against each other in engineering management discussions,
because I don’t always view them as competing priorities.
And I don’t just mean that quality improves productivity in the long
run by avoiding tech debt. I’m actually saying that a focus on quality
can immediately boost delivery speed for the task at
hand.
In my experience there are two primary ways that attention to quality
helps engineers ship and deliver more features on shorter
timescales:
Mindfulness of quality counteracts tunnel vision
By “tunnel vision” I mean the tendency of engineers to focus too much
on their initial approach to solving a problem, to the point where they
miss other (drastically) simpler solutions to the same problem. When an
engineer periodically steps back and holistically evaluates the quality
of what they’re building, they’re more likely to notice a simpler
solution to the same problem.
Prioritizing quality improves morale
Many engineers deeply desire being masters at their craft, and the
morale boost of doing a quality job can sharply increase their
productivity, too. Conversely, if you pressure an engineer to cut
corners and ship at all costs, you might decrease the scope of the
project but you also might tank their productivity even more
and wipe out any gains from cutting scope.
HOWEVER (and this is a big caveat), the above points
do not always apply, which is why I say that a focus on quality only
sometimes improves productivity. In other words, part of the
art/intuition of being a manager is recognizing the situations where
quality supports productivity.
For example, not every engineer cares about doing a quality job or
honing their craft (for some people it’s just a job), and if you ask
these kinds of engineers to prioritize quality, they’re not going to get
the morale/productivity boost that a more passionate engineer might get.
Like, it could still be the right decision to prioritize quality, but
now it’s no longer an obvious decision.
Similarly, not every engineer will benefit from stepping back and
thinking longer about the problem at hand because some engineers are
enamored with complexity and aren’t as good at identifying radically
simpler solutions (although I will say that valuing simplicity is a great
thing to cultivate in all of your engineers even if they’re not good at it
initially). As a manager you have to
recognize which engineers will move faster when given this extra breathing room and
which ones won’t.
Anyway, the reason I’m writing this post is to counteract the mindset
that quality and productivity are competing priorities because this
mentality causes people to turn off their brains and miss the numerous
opportunities where quality actually supports productivity (even in the
very short term).
Andres and Sam interview Pepe Iborra, exploring his journey from academia, via banking, to Meta. In this episode, we discuss Pepe’s involvement in the evolution of the Haskell ecosystem, in particular the ongoing journey to improve the developer experience via work on debuggers, build systems and IDEs.
After 6 months of hard work, I am happy to announce that Solve.hs now has a new module: Essential Algorithms! You’ll learn the “Haskell Way” to write all of the most important algorithms for solving coding problems, such as Breadth First Search, Dijkstra’s Algorithm, and more!
You can get a 20% discount code for this and all of our other courses by subscribing to our mailing list! Starting next week, the price for Solve.hs will go up to reflect the increased content. So if you subscribe and purchase this week, you’ll end up saving 40% vs. buying later!
The GHC developers are happy to announce the availability of GHC 9.6.6. Binary
distributions, source distributions, and documentation are available on the
release page.
This release is primarily a bugfix release addressing some issues
found in the 9.6 series. These include:
A fix for a bug in the native code generator (NCG) that could lead to incorrect
runtime results due to its erroneously removing a jump instruction (#24507).
A fix for a linker error that manifested on certain platform/toolchain combinations,
particularly Darwin with a Homebrew-provisioned toolchain, arising from a confusion
in linker options between GHC and cabal (#22210).
A fix for a compiler panic in the simplifier due to incorrect eta expansion (#24718).
A fix for possible segfaults when using the bytecode interpreter due to incorrect
constructor tagging (#24870).
And a few more fixes.
A full accounting of changes can be found in the release notes. As
some of the fixed issues do affect correctness, users are encouraged to
upgrade promptly.
We would like to thank Microsoft Azure, GitHub, IOG, the Zw3rk stake pool,
Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, Haskell Foundation, and
other anonymous contributors whose on-going financial and in-kind support has
facilitated GHC maintenance and release management over the years. Finally,
this release would not have been possible without the hundreds of
open-source contributors whose work makes up this release.
As always, do give this release a try and open a ticket if you see
anything amiss.