Planet Haskell

February 17, 2025

Mark Jason Dominus

David McShane's mural with 18 Franks

Since the demolition of the Harriet Tubman mural, this has been my favorite mural in Philadelphia. It's by Philadelphia muralist David McShane.

The mural is outside an infamous windowless bar called Dirty Frank's. I like to say that Oscar's Tavern on Sansom is Philadelphia's best Worst Bar. That's where, when the fancy place across the street wouldn't seat us, I took my coworker from out of town, with pride. Dirty Frank's might be Philadelphia's worst Worst Bar.

A few months ago Rik Signes remarked:

[Image: Street view of the sign outside McGlinchey's. At the top it says “McGlinchey's Bar” with a green shamrock. Under this, in clear red letters on white, is the word “Sandwiches”. Below this hangs a red neon sign that reads “DRAFT BEER”.]

I think Mark Dominus said "Dirty Frank's is where I saw roaches walk over the food and when I told them, they shrugged"

I was at once able to refute this, because I know for a fact that I have never ordered food at Dirty Frank's. Nor would I. Actually I have only ever been there once, which was enough.

(Lorrie has a similar story about a similarly notorious bar, McGlinchey's. Hanging outside McGlinchey's is a sign that proclaims “sandwiches”. Lorrie tried to order a sandwich there and was met only with puzzled stares.)


I will stop digressing now. My current favorite mural is outside Dirty Frank's and is by David McShane. It depicts famous Franks through history. I enjoyed trying to identify the 18 Franks. Many years ago I took pictures of it so that I could offer my Gentle Readers an opportunity to enjoy this themselves. You can infer from the resolution of the pictures below how long ago that must have been. But at last, here they are. I will reveal the answers tomorrow.

(This link will point to tomorrow's article, once it is posted.)

by Mark Dominus (mjd@plover.com) at February 17, 2025 03:03 PM

Haskell Interlude

62: Conal Elliott

In this episode Wouter Swierstra and Niki Vazou talk with Conal Elliott. Conal discusses doing things just for the poetry, how most programs miss their purpose, and the simplest way to ask a question. Conal is currently working on a book about his ideas and is actively looking for partners.

by Haskell Podcast at February 17, 2025 11:00 AM

February 15, 2025

Sandy Maguire

Bidirectional Instance Contexts

Just a quick one today, but I wanted to point out a little trick you can do with Haskell’s typeclass inference.

Imagine we have some little class, the details of which matter not in the least:

type Foo :: Type -> Constraint
class Foo a where
  ...

We can give some instances of this type:

instance Foo Int where ...
instance Foo Bool where ...
instance Foo () where ...

Regular, everyday stuff. But the instances for type constructors are more interesting, because they come with an instance context:

instance (Foo a, Foo b) => Foo (a, b) where ...

Then, of course, if we know both Foo a and Foo b, we can infer Foo (a, b). To make this fact overwhelmingly explicit, we can reify the usual constraint-solving logic by using the Dict type, and thus the following program will typecheck:

import Data.Constraint

forwards
  :: Dict (Foo a)
  -> Dict (Foo b)
  -> Dict (Foo (a, b))
forwards Dict Dict = Dict

Perhaps tipped off by the name, the gentle reader is asked to notice the asymmetry here, since the converse program will not typecheck:

backwards
  :: Dict (Foo (a, b))
  -> (Dict (Foo a), Dict (Foo b))
backwards Dict = (Dict, Dict)

But why should it not typecheck?1 Recall from the relevant instance definition that these instances must, in fact, exist:

instance (Foo a, Foo b) => Foo (a, b)

As a testament to just how good GHC is, we can support this bidirectionality via a minor tweak to the definition of class and its instances.

The trick is to add an associated type family to Foo, and to use it as a superclass constraint:

type Foo :: Type -> Constraint
class Evidence a => Foo a where
  type Evidence a :: Constraint
  type Evidence a = ()
  ...

Because we’ve given a default implementation of the type family, our existing simple instances work as before:

instance Foo Int where ...
instance Foo Bool where ...
instance Foo () where ...

with the only change required coming from the type constructor instances:

instance (Foo a, Foo b) => Foo (a, b) where
  type Evidence (a, b) = (Foo a, Foo b)
  ...

or, if you want to be cute about it:

instance Evidence (a, b) => Foo (a, b) where
  type Evidence (a, b) = (Foo a, Foo b)
  ...

By sticking Evidence into the superclass constraint, GHC knows that this dictionary is always available when you’ve got a Foo dictionary around. And our earlier backwards program now typechecks as expected.
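Putting it all together, here is the whole trick as one self-contained sketch. The module name, the round-trip in main, and the exact extension list are my own additions rather than anything from the post; the class and instances just restate the snippets above:

{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE StandaloneKindSignatures #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE UndecidableSuperClasses #-}

module Bidirectional where

import Data.Constraint (Dict (..))
import Data.Kind (Constraint, Type)

type Foo :: Type -> Constraint
class Evidence a => Foo a where
  type Evidence a :: Constraint
  type Evidence a = ()

instance Foo Int
instance Foo Bool

-- The instance context is also recorded as the Evidence superclass,
-- so it can be recovered later from a Foo (a, b) dictionary.
instance (Foo a, Foo b) => Foo (a, b) where
  type Evidence (a, b) = (Foo a, Foo b)

forwards :: Dict (Foo a) -> Dict (Foo b) -> Dict (Foo (a, b))
forwards Dict Dict = Dict

backwards :: Dict (Foo (a, b)) -> (Dict (Foo a), Dict (Foo b))
backwards Dict = (Dict, Dict)

-- Round trip: build the pair dictionary, then take it apart again.
main :: IO ()
main =
  case backwards (forwards Dict Dict :: Dict (Foo (Int, Bool))) of
    (Dict, Dict) -> putStrLn "recovered Foo Int and Foo Bool"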

This is all available in a play session if you’d like to fool around with it.


  1. Rhetorical question. I don’t want to hear about orphans or overlapping instances or whatever.↩︎

February 15, 2025 02:15 AM

February 13, 2025

Tweag I/O

Bashfulness


When I first joined the Topiary Team, I floated the idea of trying to format Bash with Topiary. While this did nothing to appease my unenviable epithet of “the Bash guy,” it was our first foray into expanding Topiary’s support beyond OCaml and simple syntaxes like JSON.

Alas, at the time, the Tree-sitter Bash grammar was not without its problems. I got quite a long way, despite this, but there were too many things that didn’t work properly for us to graduate Bash to a supported language.

Fast-forward two years and both Topiary and the Tree-sitter Bash grammar have moved on. As the incumbent Bash grammar was beginning to cause downstream problems from bit rot — frustratingly breaking the builds of both Topiary and Nickel — my fellow Topiarist, Nicolas Bacquey, migrated Topiary to the latest version of the Bash grammar and updated our Bash formatting queries to match.

With surprisingly little effort, Nicolas was able to resolve all those outstanding problems. So with that, Bash was elevated to the lofty heights of “supported language” and — with the changes I’ve made from researching this blog post — Bash formatting is now in pretty good shape in Topiary v0.6.

So much so, in fact, let me put my money where my mouth is! Let’s see how Topiary fares against a rival formatter. I’ll do this, first, by taking you down some of the darker alleys of Bash parsing, just to show you what we’re up against.

Hello darkness, my old friend

There is a fifth dimension beyond that which is known to man. It is a dimension as vast as space and as timeless as infinity. It is the middle ground between light and shadow, between science and superstition; it lies between the pit of man’s fears and the summit of his knowledge. This is the dimension of imagination. It is an area we call: the Bash grammar.

In our relentless hubris, man has built a rocket that — rather than exploding on contact with reality — dynamically twists and turns to meet reality’s expectations. Is that a binary? Execute it! Is that a built-in? Execute it! Is that three raccoons in a trench coat, masquerading as a function? Execute it! And so, with each token parsed, we are Bourne Again and stray ever further from god.

Bear witness to but a few eldritch horrors:1

  • Trailing comments must be preceded by whitespace or a semicolon. However, if either of those are escaped, they are interpreted as literals and this changes the tokenisation semantics:

    echo \ # Ceci n'est pas
     | une pipe'

    Here, perhaps the writer intended to add a comment against the first line. But, what looks like a comment isn’t a comment at all; it becomes an argument to echo, along with everything that follows. That includes the apostrophe in “n’est”, which is interpreted as an opening quote — a raw string — which is closed at the end of the next line.

  • Case statements idiomatically delimit each branch condition with a closing parenthesis. In a subshell, for example, this leads to unbalanced brackets:

    ( case $x in foo )   # Wat?...
    echo bar;; esac )    # 🤯

    This subshell outputs bar when the variable $x is equal to foo. Whereas, on a more casual reading, this formulation might just look like a confusing syntax error.

    Speaking of case statements, did you know that ;& and ;;& are also valid branch terminators? Without checking the manual — if you can find the single paragraph where it’s mentioned — can you tell me how they differ?

  • Bash will try to compute an array index if it looks like an arithmetic expression:

    # Output the (foo - bar)th element of array
    echo "${array[foo-bar]}"

    However, if array in this example is an associative array (i.e., a hash map/dictionary), then foo-bar could be a valid key. In which case, it’s not evaluated and used verbatim.

  • Without backtracking, it’s not possible to distinguish between an arithmetic expansion and a command substitution containing a subshell at its beginning or end:

    echo $((foo + bar))
    echo $((foo); (bar))

    Here, the first statement will output the value of the addition of those two variables; the second will execute foo then bar, each in a subshell, echoing their output. In the subshell case, the POSIX standards even recommend that you add spaces — e.g., $( (foo) ) — to remove this ambiguity.

  • Heredocs effectively switch the parser into a different state, where everything is interpreted literally except when it isn’t. This alone is tricky, but Bash introduces some variant forms that allow additional indentation (with hard tabs), switching off all string interpolation, or both.

    # Indented, with interpolation
    cat <<-HEREDOC
    	I am a heredoc. Hear me roar.
    	HEREDOC

Suffice it to say, any formatter has its work cut out.

Battle of the Bash formatters

The de facto formatter for Bash is shfmt. It’s written in Go by Daniel Martí, is actively maintained, and has been around for the best part of a decade.

Let’s compare Topiary’s Bash formatting with shfmt in a contest worthy of a Netflix special. I’ll look specifically at each tool’s parsing and formatting capabilities as well as their performance characteristics. I won’t, however, compare their subjective formatting styles, as this is largely a matter of taste.

What Topiary can’t do that shfmt can2

When it comes to formatting Bash in a way that is commonly attested in the wild, there are three things that Topiary cannot currently do. Unfortunately, these stem either from the absence of a feature in Topiary or from a lack of fidelity in the Tree-sitter grammar; no amount of hacking on queries will fix them.

The worst offender is probably the inability to distinguish line continuations from other token boundaries. These are used in Bash scripts all the time to break up long commands into more digestible code. In the following example, the call to topiary was spread over multiple lines, with line continuations. Topiary slurps everything onto a single line, whereas shfmt preserves the original line continuations in the input:

# Topiary
topiary format --language bash --query bash.scm <"${script}"
# shfmt
topiary format \
    --language bash \
    --query bash.scm \
    <"${script}"

One saving grace is that Topiary’s Bash parser understands a trailing |, in a pipeline, to accept a line break. As such — while it isn’t my personal favourite style3 — Topiary does support multi-line pipelines. Arguably, they even look a little nicer in Topiary than in shfmt, which only preserves where the line breaks occurred in the input:

# Topiary
foo |
  bar |
  baz |
  quux
# shfmt
foo | bar |
    baz | quux

Otherwise, in Topiary, every command is a one-liner…whether you like it or not!

Next on the “nice to have” list is the long-standing (and controversial) feature request of “alignment blocks”; specifically for comments. That is, presumably related comments appearing on a series of lines should be aligned to the same column:

# Topiary
here # comment
is # comment
a # comment
sequence # comment
of # comment
commands # comment
# shfmt
here     # comment
is       # comment
a        # comment
sequence # comment
of       # comment
commands # comment

The tl;dr of the controversy is that, despite being a popular request — and we all know where popularity gets us, these days — it’s a slap in the face to one of Topiary’s core design principles: minimising diffs. Because we live in a universe where elastic tabstops never really took off, a small change to the above example — say, adding an option to one of the commands — would produce the following noisy diff:

-here     # comment
-is       # comment
-a        # comment
-sequence # comment
-of       # comment
-commands # comment
+here                      # comment
+is                        # comment
+a                         # comment
+sequence                  # comment
+of                        # comment
+commands --with-an-option # comment

For the time being, Topiary won’t be making alignment great again.

Finally, string interpolations — with command substitution and arithmetic expansions — cannot be formatted without potentially breaking the string itself. This is particularly true of heredocs; the full subtleties of which escape the Tree-sitter Bash grammar and so are easily corruptible with naive formatting changes. As such, Topiary has to treat these as immutable leaves and leave them untouched:

# Topiary
echo "2 + 2 = $((  2+  2 ))"

cat <<EOF
Today is $(   date )
EOF
# shfmt
echo "2 + 2 = $((2 + 2))"

cat <<EOF
Today is $(date)
EOF

So far, I have only found three constructions that are syntactically correct but that the Tree-sitter Bash grammar cannot parse (whereas shfmt can):

  1. A herestring that follows a file redirection (issue #282):

    rev > output <<< hello

    A workaround, for now, is to switch the order; so the herestring comes first.

  2. A heredoc that uses an empty marker (issue #283):

    cat <<''
    Only a monster would do this, anyway!
    
    
  3. Similar to line continuations, the Tree-sitter Bash grammar seems to swallow escaped spaces at the beginning of tokens, interpreting them as tokenisation whitespace rather than literals (issue #284):

    # This should output:
    # <a>
    # <b>
    # < >
    # <c>
    printf "<%s>\n" a b \  c

For what it’s worth, shfmt also supports POSIX shell and mksh (a KornShell implementation). As of writing, there are no Tree-sitter grammars for these shells. However, their syntax doesn’t diverge too far from Bash, so it’s likely that Topiary’s Bash support will be sufficient for large swathes of such scripts. Moreover, the halcyon years of the 1990s are a long way behind us, so maybe this doesn’t matter.

What shfmt can’t do that Topiary can2

shfmt is part of a wider project that includes a Bash parser for the Go ecosystem. A purpose-built parser, particularly for Bash, should perform better than the generalised promise of Tree-sitter and, indeed, that’s what we see. However, there are a few minor constructions that shfmt doesn’t like, but the Tree-sitter Bash grammar accepts:

  1. An array index assignment which uses the addition augmented assignment operator:

    my_array=(
      foo
      [0]+=bar
    )

    To be fair to shfmt, while this is valid Bash, not even the venerable ShellCheck can parse this!

  2. Topiary leaves array indices unformatted, despite them allowing arithmetic expressions. shfmt, however, will add whitespace to any index that looks like an arithmetic expression (e.g., [foo-bar] will become [ foo - bar ]); even if the original, unspaced version could be a valid associative array key.

    (Neither Topiary nor shfmt can handle indices containing spaces. However, the standard Bash workaround™ is to quote these: ${array["foo bar"]}.)

  3. Brace expansions can appear — perhaps surprisingly — almost anywhere. Particularly surprising to shfmt is when they appear in variable declarations, which it cannot parse:

    declare {a,b,c}=123      # a=123 b=123 c=123
    declare foo{1..10}=bar   # foo1=bar foo2=bar ... foo10=bar

While it’s a bit of a hack,4 we also implement something akin to “rewrite rules” in our Topiary Bash formatting queries, which shfmt (mostly) doesn’t do. This is to enforce a canonical style over certain constructions. Namely:

  • All $... variables are rewritten in their unambiguous form of ${...}, excluding special variables such as $1 and $@. (Note that this doesn’t affect $'...' ANSI C strings, despite their superficial similarity.)

  • All function signatures are rewritten to the name() { ... } form, rather than function name { ... } or function name() { ... }.

  • All POSIX-style [ ... ] test clauses are rewritten to the Bash [[ ... ]] form.

  • All legacy $[ ... ] arithmetic expansions are rewritten to their $(( ... )) form.

  • All `...` command substitutions are rewritten to their $( ... ) form.

    (This is one that shfmt does do.)

Technically, it is also possible to write rules that put quotes around unquoted command arguments, ignoring things like -o/--options. While this is good practice, we do not enforce this style as it changes the code’s semantics and there may be legitimate reasons to leave arguments unquoted.

Throughput

Let’s be honest: If you have so much Bash to format that throughput becomes meaningful, then formatting is probably the least of your worries. That being said, it is the one metric that we can actually quantify.

Our first problem is that we need a large corpus of normal scripts. By “normal,” I mean things that you’d see in the wild and could conceivably understand if you squint hard enough. This rules out the Bash test suite, for example, which — while quite large — is a grimoire of weird edge cases that neither Topiary nor shfmt handle well. Quite frankly, if you’re writing Bash that looks like this, then you don’t deserve formatting:

: $(case a in a) : ;#esac ;;
esac)

Digging around on r/bash, I came across this repository of scripts. They’re all fairly short, but they’re quite sane. This will do.

We need to slam large amounts of Bash into the immovable objects that are our formatters; a “Bash test dummy,”5 if you will. It would be ideal if we could stream Bash into our formatters — so we could orchestrate sampling at regular time intervals — however, neither Topiary nor shfmt support streaming formatting. This stands to reason as there are cases where formatting will depend on some future context, so the whole input will need to be read upfront. As such, we need to invert our approach to collecting metrics and sample over input size instead.

The general method is:

  1. Locate the scripts in the repository that are Bash, by looking at their shebang.
  2. Filter this list to those which Topiary can handle without tripping over itself because of some obscure parsing failure. (We assume shfmt doesn’t require such a concession.)
  3. Perform N trials, in which:
    • The whitelist of scripts is randomised, to remove any potential confounding from caching.
    • The top M scripts are concatenated to obtain a single trial input.6 This is to increase the input size to the formatters in each trial, which is presumed to be the dependent variable, but may be subject to confounding effects when the input is small.
    • The trial input is read to /dev/null a handful of times to warm up the filesystem cache.
    • The trial input is fed into the following, with benchmarks — trial input size (bytes) and runtime (nanoseconds) — recorded for each:
      • cat, which acts as a control;
      • Topiary (v0.5.1; release build, with the query changes described in this blog post);
      • Topiary, with its idempotence checking disabled;
      • shfmt (v3.10.0).

This identified 156 Bash scripts within the test repository, of which 154 could be handled by Topiary.7 On an 11th generation Intel Core i7, at normal stepping, with N=50 and M=25, on a Tuesday afternoon, I obtained the following results:

[Chart: benchmark results for N=50, M=25]

cat, which does nothing, is unsurprisingly way out in front; by two orders of magnitude. This is not interesting, but establishes that input can be read faster than it can be formatted. That is, our little experiment is not accidentally I/O bound.

What is interesting is that Topiary is about 3× faster than shfmt. We also see that the penalty imposed by idempotency checking — which formats twice, to check the output reaches a fixed point — is quite negligible. This indicates that most of the work Topiary is doing is in its startup overhead, which involves loading the grammar and parsing the formatting query file.

Since Topiary only has to do this once per trial, it’s a little unfair to set M=25; that is, an artificially enlarged input that is syntactically valid but semantically meaningless. However, if we set M=1 (i.e., individual scripts), then we see a similar comparison:

[Chart: benchmark results for N=50, M=1]

For small inputs, the idempotency check penalty is barely perceptible. Otherwise, the startup overhead dominates for both formatters — hence the much lower throughput values — but, still, Topiary comfortably outperforms shfmt by a similar factor.

And the winner is…

In an attempt to regain some professional integrity, I’ll fess up to the fact that Topiary has a bit of a home advantage and maybe — just maybe — I’m ever so slightly biased. That is, as we are in the (dubious) position of building a plane while attempting to fly it, I was able to tweak and fix a few of our formatting rules to improve Topiary’s Bash support during the writing of this blog post:

  • I added formatting rules for arrays (and associative arrays) and their elements.
  • I corrected the formatting of trailing comments that appear at the end of a script.
  • I corrected the function signature rewriting rule.
  • I corrected the formatting of a string of commands that are interposed by Bash’s & asynchronous operator.
  • I fixed the formatting of test commands and added a rewrite rule for POSIX-style [ ... ] tests.
  • I implemented multi-line support for pipelines.8
  • I updated the $... variable rewrite rule to avoid targeting special forms like $0, $? and $@, etc.
  • I implemented a rewrite rule that converts legacy $[ ... ] arithmetic expansions into their $(( ... )) form.
  • I implemented a rewrite rule that converts `...` command substitutions into their $(...) form.
  • I fixed the spacing within variable declarations, to accommodate arguments and expansions.
  • I forced additional spacing in command substitutions containing subshells, to remove any ambiguity with arithmetic expansions.

The point I’m making here is that these adjustments were very easy to conjure up; just a few minutes of thought for each, across our Tree-sitter queries, was required.

So who’s the winner?

Well, would it be terribly anticlimactic of me, after all that, not to call it? shfmt is certainly more resilient to Bash-weirdness and, of the “big three” I discussed, its line continuation handling is a must have. However, Topiary does pretty well, regardless: It’s much faster, for what that’s worth, and — more to the point — far easier to tweak and hack on.

Indeed, when the Topiary team first embarked upon this path, we weren’t even sure whether it would be possible to format Bash. Now that the Tree-sitter Bash grammar has matured, Topiary — perhaps with future fixes to address some of its shortcomings, uncovered by this blog post — is a contender in the Bash ecosystem.

Thanks to Nicolas Bacquey, Yann Hamdaoui, Tor Hovland, Torsten Schmits and Arnaud Spiwack for their reviews and input on this post, and to Florent Chevrou for his assistance with the side-by-side code styling.


  1. It’s very likely that the syntax highlighting for the more exotic Bash snippets in this blog post will be completely broken.↩

  2. …Yet.↩

  3. My preferred multi-line pipeline style is to have a line continuation and then the | character on the next line, indented:

    foo \
      | bar \
      | baz \
      | quux

    I personally find this much clearer, but Topiary cannot currently handle those pesky line continuations. For shame!↩

  4. Topiary’s formatting rules include node deletion and delimiter insertion. However, delimiters can be any string, so we can coopt this functionality to create basic rewrite rules.↩

  5. I’m also the “terrible pun guy.”↩

  6. This exposed an unexpected bug, whereby Topiary’s formatting model breaks down when some complexity (or, by proxy, size) limit is reached. This behaviour had not been previously observed and further investigation is required.↩

  7. The two failures were due to the aforementioned herestring and complexity6 problems.↩

  8. It may also be possible to implement multi-line && and || lists in a similar way. However, the Tree-sitter grammar parses these into a left-associative nested (list) structure, which is tricky to query.↩

February 13, 2025 12:00 AM

Oleg Grenrus

PHOAS to de Bruijn conversion

Posted on 2025-02-13 by Oleg Grenrus agda

Recently I looked again at PHOAS, and once again I concluded it's nice for library APIs, but so painful to do anything with inside those libraries. So let's convert to something else, like de Bruijn.

There are standalone source files if you just want to see the code:

How to convert PHOAS terms to de Bruijn terms?

The solution is hard to find.

You can cheat, as mentioned by Roman on the Agda mailing list (https://lists.chalmers.se/pipermail/agda/2018/010033.html):

There is always a way to cheat, though. You can turn the PHOAS -> untyped de Bruijn machinery into the PHOAS -> typed de Bruijn machinery by checking that future contexts indeed extend past contexts and throwing an error otherwise (which can't happen, because future contexts always extend past contexts, but it's a metatheorem).

In "Generic Conversions of Abstract Syntax Representation" by Steven Keuchel and Johan Jeuring, authors also "cheat" a bit. The "Parametrhic higher-order abstract syntax" section ends with a somewhat disappointing

  where postulate whatever : _

Keuchel and Jeuring also mention "Unembedding Domain-Specific Languages" by Robert Atkey, Sam Lindley and Jeremy Yallop; where there is one unsatisfactory ⊥ (undefined in Haskell) hiding.

I think that for practical developments (say a library in Haskell), it is ok to make a small shortcut; but I kept wondering whether there is a way to make the conversion without cheating.

Well... it turns out that we cannot avoid "cheating". Well-formedness of the PHOAS representation depends on parametricity, and the conversion seems to require a theorem for which there is no proof in Agda.

In unpublished (?) work, Adam Chlipala shows a way to do the conversion without relying on postulates (http://adam.chlipala.net/cpdt/html/Intensional.html); but that procedure requires an extra well-formedness proof of the given PHOAS term.

This Agda development is a translation of that development.

Common setup

Our syntax representations will be well-typed, so we need types:

-- Types
data Ty : Set where
  emp : Ty
  fun : Ty → Ty → Ty

Ctx : Set
Ctx = List Ty

variable
  A B C : Ty
  Γ Δ Ω : Ctx
  v : Ty → Set

de Bruijn syntax

Var : Ctx → Ty → Set
Var Γ A = Idx A Γ -- from agda-np, essentially membership relation.

data DB (Γ : Ctx) : Ty → Set where
  var : Var Γ A → DB Γ A
  app : DB Γ (fun A B) → DB Γ A → DB Γ B
  lam : DB (A ∷ Γ) B → DB Γ (fun A B)
  abs : DB Γ emp → DB Γ A

Parametric Higher-order abstract syntax

data PHOAS (v : Ty → Set) : Ty → Set where
  var : v A → PHOAS v A
  app : PHOAS v (fun A B) → PHOAS v A → PHOAS v B
  lam : (v A → PHOAS v B) → PHOAS v (fun A B)
  abs : PHOAS v emp → PHOAS v A

-- closed "true" PHOAS terms.
PHOAS° : Ty → Set₁
PHOAS° A = ∀ {v} → PHOAS v A

de Bruijn to PHOAS

This direction is trivial, which is anecdotal evidence that the de Bruijn representation is easier to do transformations on.

phoasify : NP v Γ → DB Γ A → PHOAS v A
phoasify γ (var x)   = var (lookup γ x)
phoasify γ (app f t) = app (phoasify γ f) (phoasify γ t)
phoasify γ (lam t)   = lam λ x → phoasify (x ∷ γ) t
phoasify γ (abs t)   = abs (phoasify γ t)

Interlude: Well-formedness of PHOAS terms

Adam Chlipala defines an equivalence relation between two PHOAS terms (exp_equiv in Intensional, wf in the CPDT book). We only need single-term well-formedness, so we can do a little less.

The goal is to rule out standalone terms like

module Invalid where
  open import Data.Unit using (⊤; tt)

  invalid : PHOAS (λ _ → ⊤) emp
  invalid = var tt

Terms like invalid cannot be values of PHOAS°, as all values of "v" inside PHOAS° have to originate from lam-constructor abstractions. We really should keep the v parameter free, i.e. parametric, when constructing PHOAS terms.

The idea, then, is simply to track which variables (values of v) are introduced by lambda abstraction.

data phoasWf {v : Ty → Set} (G : List (Σ Ty v)) : {A : Ty} → PHOAS v A → Set
 where
  varWf : ∀ {A} {x : v A}
    → Idx (A , x) G
    → phoasWf G (var x)
  appWf : ∀ {A B} {f : PHOAS v (fun A B)} {t : PHOAS v A}
    → phoasWf G f
    → phoasWf G t
    → phoasWf G (app f t)
  lamWf : ∀ {A B} {f : v A → PHOAS v B}
    → (∀ (x : v A) → phoasWf ((A , x) ∷ G) (f x))
    → phoasWf G (lam f)
  absWf : ∀ {A} {t : PHOAS v emp}
    → phoasWf G t
    → phoasWf G (abs {A = A} t)

-- closed terms start with an empty G
phoasWf° : PHOAS° A → Set
phoasWf° tm = ∀ {v} → phoasWf {v = v} [] tm

A meta theorem is then that all PHOASᵒ terms are well-formed, i.e.

meta-theorem-proposition : Set₁
meta-theorem-proposition = ∀ {A} (t : PHOAS° A) → phoasWf° t

As far as I'm aware this proposition cannot be proved nor refuted in Agda.

de Bruijn to PHOAS translation creates well-formed PHOAS terms.

As a small exercise we can show that phoasify of closed de Bruijn terms creates well-formed PHOAS terms.

toList : NP v Γ → List (Σ Ty v)
toList []       = []
toList (x ∷ xs) = (_ , x) ∷ toList xs

phoasifyWfVar : (γ : NP v Γ) (x : Var Γ A) → Idx (A , lookup γ x) (toList γ)
phoasifyWfVar (x ∷ γ) zero    = zero
phoasifyWfVar (x ∷ γ) (suc i) = suc (phoasifyWfVar γ i)

phoasifyWf : (γ : NP v Γ) (t : DB Γ A) → phoasWf (toList γ) (phoasify γ t)
phoasifyWf γ (var x)   = varWf (phoasifyWfVar γ x)
phoasifyWf γ (app f t) = appWf (phoasifyWf γ f) (phoasifyWf γ t)
phoasifyWf γ (lam t)   = lamWf λ x → phoasifyWf (x ∷ γ) t
phoasifyWf γ (abs t)   = absWf (phoasifyWf γ t)

phoasifyWf° : (t : DB [] A) → phoasWf° (phoasify [] t)
phoasifyWf° t = phoasifyWf [] t

PHOAS to de Bruijn

The rest deals with the opposite direction.

In Intensional, Adam Chlipala uses the v = λ _ → ℕ instantiation to make the translation.

I think that in the typed setting using v = λ _ → Ctx turns out nicer.

The idea in both is that we instantiate PHOAS variables to be de Bruijn levels.

data IsSuffixOf {ℓ} {a : Set ℓ} : List a → List a → Set ℓ where
  refl : ∀ {xs} → IsSuffixOf xs xs
  cons : ∀ {xs ys} → IsSuffixOf xs ys → ∀ {y} → IsSuffixOf xs (y ∷ ys)

We need to establish well-formedness of a PHOAS expression in relation to some context Γ.

Note that variables encode de Bruijn levels, thus the contexts we "remember" in variables should be a suffix of the outside context.

wf : (Γ : Ctx) → PHOAS (λ _ → Ctx) A → Set
wf {A = A} Γ (var Δ)         = IsSuffixOf (A ∷ Δ) Γ
wf         Γ (app f t)       = wf Γ f × wf Γ t
wf         Γ (lam {A = A} t) = wf (A ∷ Γ) (t Γ)
wf         Γ (abs t)         = wf Γ t

And if (A ∷ Δ) is a suffix of the context Γ, we can convert the evidence to a de Bruijn index (i.e. a variable):

makeVar : IsSuffixOf (A ∷ Δ) Γ → Var Γ A
makeVar refl     = zero
makeVar (cons s) = suc (makeVar s)

Given that the term is well-formed in relation to the context Γ, we can convert it to the de Bruijn representation.

dbify : (t : PHOAS (λ _ → Ctx) A) → wf Γ t → DB Γ A
dbify         (var x)   wf        = var (makeVar wf)
dbify         (app f t) (fʷ , tʷ) = app (dbify f fʷ) (dbify t tʷ)
dbify {Γ = Γ} (lam t)   wf        = lam (dbify (t Γ) wf)
dbify         (abs t)   wf        = abs (dbify t wf)

What is left is to show that we can construct wf for all phoasWf-well-formed terms.

Adam Chlipala defines a helper function:

makeG′ : Ctx → List (Σ Ty (λ _ → Ctx))
makeG′ [] = []
makeG′ (A ∷ Γ) = (A , Γ) ∷ makeG′ Γ

However for somewhat technical reasons, we rather define

expand : (Γ : Ctx) → NP (λ _ → Ctx) Γ
expand []      = []
expand (_ ∷ Γ) = Γ ∷ expand Γ

and use expand with previously defined toList to define our version of makeG:

makeG : Ctx → List (Σ Ty (λ _ → Ctx))
makeG Γ = toList (expand Γ)

makeG and makeG′ are the same:

toList∘expand≡makeG : ∀ Γ → makeG Γ ≡ makeG′ Γ
toList∘expand≡makeG []      = refl
toList∘expand≡makeG (A ∷ Γ) = cong ((A , Γ) ∷_) (toList∘expand≡makeG Γ)

Then we can construct wf for all phoasWf:

wfWfVar : Idx (A , Δ) (makeG Γ) → IsSuffixOf (A ∷ Δ) Γ
wfWfVar {Γ = B ∷ Γ} zero    = refl
wfWfVar {Γ = B ∷ Γ} (suc i) = cons (wfWfVar i)

wfWf : (t : PHOAS (λ _ → Ctx) A) → phoasWf (makeG Γ) t → wf Γ t
wfWf         (var x)   (varWf xʷ)    = wfWfVar xʷ
wfWf         (app f t) (appWf fʷ tʷ) = wfWf f fʷ , wfWf t tʷ
wfWf {Γ = Γ} (lam f)   (lamWf fʷ)    = wfWf (f Γ) (fʷ Γ)
wfWf         (abs t)   (absWf tʷ)    = wfWf t tʷ

And finally we define dbify° for all well-formed PHOAS° terms.

dbify° : (t : PHOAS° A) → phoasWf° t → DB [] A
dbify° t w = dbify t (wfWf t w)

Bonus section

We can show that converting closed de Bruijn term to PHOAS and back is an identity function:

bonus-var : (x : Var Γ A) → x ≡ makeVar (wfWfVar (phoasifyWfVar (expand Γ) x))
bonus-var {Γ = A ∷ Γ} zero    = refl
bonus-var {Γ = A ∷ Γ} (suc i) = cong suc (bonus-var i)

bonus : (t : DB Γ A)
      → t ≡ dbify (phoasify (expand Γ) t)
              (wfWf (phoasify (expand Γ) t) (phoasifyWf _ t))
bonus (var x)   = cong var (bonus-var x)
bonus (app f t) = cong₂ app (bonus f) (bonus t)
bonus (lam t)   = cong lam (bonus t)
bonus (abs t)   = cong abs (bonus t)

bonus° : ∀ (t : DB [] A) → t ≡ dbify° (phoasify [] t) (phoasifyWf° t)
bonus° t = bonus t

February 13, 2025 12:00 AM

February 12, 2025

Mark Jason Dominus

Genealogy of the House of Reuss

A couple of years ago I lamented the difficulty I had in verifying what appeared to be a simple statement of fact:

[Abdullah bin Abdul-Rahman] was the seventh son of the Emir of the Second Saudi State, Abdul Rahman bin Faisal.

The essential problem is that Saudi princes have at least ten or twenty sons each, and they all reuse the same ten or twenty names.

Until today, I was not aware of any European tradition even remotely so confusing. Today I learned of the House of Reuss.

I have other things to do today, so just a couple of highlights, starting with this summary:

Since the end of the 12th century, all male members of the House of Reuss are named Heinrich.

No, don't panic, there must be some way to distinguish them, and of course there is:

For the purpose of differentiation, they are given order numbers according to certain systems (see below, section Numbering of the Heinrichs)

Yes, they are numbered. Since the 12th century. So you might think they would be up to Heinrich MCMXVII by now. No no no, that would be silly.

In the elder line the numbering covers all male children of the elder House, and the numbers increase until 100 is reached and then start again at 1.

In the younger line the system is similar but the numbers increase until the end of the century before starting again at 1.

The Wikipedia article later embarks on a list of rulers of the House of Reuss that includes 151 Henrys with numbers as high as LXXVII. I wonder at this, since if they have really exercised that numbering scheme you would expect to see mention of at least one Henry with a number in the LXXX–XCIX range, but there are none.

A few of the 151 Henrys have distinctive nicknames like Henry II the Bohemian, Henry VII the Red, or Henry VI the Peppersack. But they seem to have run out of new epithets in the 14th century, and lapsed into a habit of using and reusing "the Elder", "the Middle", and "the Younger" over and over. Around the mid-1600s they tired even of this and abandoned the epithets entirely.

Just by way of example, I searched the page for “Henry XIX” and found three rulers by that name and number:

  1. One born 1 March 1790, Heinrich XIX, Prince Reuss of Greiz

  2. Another born 16 October 1720, Count of Selbitz. The English Wikipedia page is a redlink, but the German article on the Houses of Reuss has a bit to say.

  3. And a third, born around 1440, where there is a whole article about him, in Bulgarian. For some reason he is known as Хайнрих XXI фон Вайда, Henry XXI (not XIX) of Vaida.

Toward the end of the article, we learn this:

On 7 December 2022, German police conducted an operation which resulted in the arrest of 25 alleged members of the far-right group Reichsbürger, including a member of the Köstritz branch of the House of Reuss, identified as Heinrich XIII Prince Reuss. The suspects arrested in the operation were allegedly planning to overturn the existing German government, and instate Heinrich XIII as the new German de facto leader.

All I can think now is, I think of myself as someone who is good at sniffing out Wikipedia bullshit, but this entire article could be completely made up and I would never be the wiser.

By the way, the link from “Henry VI the Peppersack” is to an article in Bulgarian Wikipedia that does not appear to mention the "Peppersack" epithet, a search on the Internet Archive for books mentioning "Henry Peppersack" turns up nothing, and while the section on the plot to bring Heinrich XIII to power cites a source, the page it purports to link to is gone.

Addendum 20250215

Here's a funny coincidence. The highest-numbered Henry I could find was Henry LXXVII. Lord Sepulchrave is stated at the beginning of Titus Groan to be the 76th Earl of Groan, which makes Titus Groan the 77th.

by Mark Dominus (mjd@plover.com) at February 12, 2025 02:23 AM

Well-Typed.Com

The Haskell Unfolder Episode 39: deriving strategies

Today, 2025-02-12, at 1930 UTC (11:30 am PST, 2:30 pm EST, 7:30 pm GMT, 20:30 CET, …) we are streaming the 39th episode of the Haskell Unfolder live on YouTube.

The Haskell Unfolder Episode 39: deriving strategies

In this episode we’ll discuss the four different ways GHC offers for deriving class instance definitions: the classic “stock” deriving, generalised “newtype” deriving, as well as the “anyclass” and “via” strategies. For each of these, we’ll explain the underlying ideas, use cases, and limitations.
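If you want a taste before the episode, here is a tiny sketch of the four strategies side by side. The Describable class and the Age newtype are invented for illustration and are not code from the episode:

{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE DerivingVia #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

module DerivingDemo where

import Data.Monoid (Sum (..))

class Describable a where
  describe :: a -> String
  describe _ = "something"  -- default method, so an empty instance suffices

newtype Age = Age Int
  deriving stock Show                       -- generated from Age's own structure
  deriving newtype (Eq, Ord, Num)           -- reuse Int's instances by coercion
  deriving anyclass Describable             -- empty instance, falls back to the default
  deriving (Semigroup, Monoid) via Sum Int  -- borrow Sum Int's instances by coercion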

About the Haskell Unfolder

The Haskell Unfolder is a YouTube series about all things Haskell hosted by Edsko de Vries and Andres Löh, with episodes appearing approximately every two weeks. All episodes are live-streamed, and we try to respond to audience questions. All episodes are also available as recordings afterwards.

We have a GitHub repository with code samples from the episodes.

And we have a public Google calendar (also available as ICal) listing the planned schedule.

There’s now also a web shop where you can buy t-shirts and mugs (and potentially in the future other items) with the Haskell Unfolder logo.

by andres, edsko at February 12, 2025 12:00 AM

February 11, 2025

Oleg Grenrus

NbE PHOAS

Posted on 2025-02-11 by Oleg Grenrus agda

Normalization by evaluation using parametric higher-order abstract syntax. In Agda.

I couldn't find a self-contained example of PHOAS NbE, so here it is. I hope someone might find it useful.

module NbEXP.PHOAS where

data Ty : Set where
  emp : Ty
  fun : Ty → Ty → Ty

data Tm (v : Ty → Set) : Ty → Set where
  var : ∀ {a} → v a → Tm v a
  app : ∀ {a b} → Tm v (fun a b) → Tm v a → Tm v b
  lam : ∀ {a b} → (v a → Tm v b) → Tm v (fun a b)

data Nf (v : Ty → Set) : Ty → Set
data Ne (v : Ty → Set) : Ty → Set

data Ne v where
  nvar : ∀ {a} → v a → Ne v a
  napp : ∀ {a b} → Ne v (fun a b) → Nf v a → Ne v b

data Nf v where
  neut : Ne v emp → Nf v emp
  nlam : ∀ {a b} → (v a → Nf v b) → Nf v (fun a b)

Sem : (Ty → Set) → Ty → Set
Sem v emp       = Ne v emp
Sem v (fun a b) = Sem v a → Sem v b

lower : ∀ {v : Ty → Set} (a : Ty) → Sem v a → Nf v a
raise : ∀ {v : Ty → Set} (a : Ty) → Ne v a → Sem v a

lower emp       s = neut s
lower (fun a b) s = nlam λ x → lower b (s (raise a (nvar x)))

raise emp       n   = n
raise (fun a b) n x = raise b (napp n (lower a x ))

eval : {v : Ty → Set} {a : Ty} → Tm (Sem v) a → Sem v a
eval (var x)   = x
eval (app f t) = eval f (eval t)
eval (lam t) x = eval (t x)

nf : {a : Ty} → {v : Ty → Set} → Tm (Sem v) a → Nf v a
nf {a} t = lower a (eval t)

nf_parametric : {a : Ty} → ({v : Ty → Set} → Tm v a) -> ({v : Ty → Set} → Nf v a)
nf_parametric t = nf t

February 11, 2025 12:00 AM

February 10, 2025

Oskar Wickström

Machine: Learning; Human: Unlearning;

This last month has been fascinating. I guess LLMs have finally resonated with me on a deeper level. It wasn’t like I woke up and suddenly everything was different, but their impact is growing on me non-linearly, forcing me to rewire my brain.

February 10, 2025 11:00 PM

Mark Jason Dominus

Surnames from nicknames nobody has any more

English has a pattern of common patronymic names. For example, "John Peters" and "John Peterson" are someone whose father was named "Peter". ("Peters" should be understood as "Peter's".) Similarly we have John Williams and John Williamson, John Roberts and John Robertson, John Richards and John Richardson, John James and John Jameson, John Johns and John Johnson, and so on.

Often Dad's name was a nickname. For example, a common nickname for "John" is "Jack" and we have (less commonly) John Jacks and (more commonly) John Jackson. John Bills and John Bilson, John Wills and John Wilson, and John Willis and John Willison are Bill, Will, and Wille, all short for William.

"Richard" is "Dick", and we have John Dicks (or Dix) and John Dickson (or Dixon). "Nicholas" is "Nick" and we have John Nicks (or Nix) and John Nickson (or Nixon).

Sometimes the name has the diminutive suffix “-kin” inserted. Wilkins is little Will's son, as is Wilkinson; Peterkins is little Peter's son.

These patterns are so common that if you find surnames that follow them you can almost always infer a forename, although it may be one that is no longer common, or that is spelled differently. For example, many people are named Pierce, Pearse, Pierson, or Pearson, which is from the name Pierre, Piers or Pierce, still used in English although much less common than in the past. (It is from the same root as Peter.) Perkins is little Pierre. Robin used to be a nickname for Robert (it's “Robkin” with the difficult “-bk-” simplified to just “-b-”) and we have John Robins and John Robinson.

Sometimes, the pattern is there but the name is unclear because it is a nickname that is now so uncommon that it is nearly forgotten. The fathers of John Watts, Watson, and Watkins were called Wat, which used to be short for Walter. John Hobbs, John Hobson, and Hobkins are named for Hob, which was short for Robert in the same way that Rob and Bob are still. (I had a neighbor who was called Hob, and told me his family claimed that it was short for Robert, but that he wasn't sure. I assured him that they were correct.) “Daw”, an archaic nickname for “David”, gives us Dawes, Dawkins, and Dawson.

Back in September when I started this article I thought on John Gibbs and John Gibson. Who's named "Gib", and why? Is it an archaic nickname? Yes! It was short for Gilbert. Then I forgot about the draft article until today when I woke up wondering about John Simpson (and, I realize now, John Simms and John Simkins). And it transpired that "Sim" or "Simme" was once a common nickname for Simon.

I would welcome further examples.

Addenda

20250210

Vicki Rosenzweig reminds me that another lost nickname for Richard is "Hick" (like "Rick" and "Dick") and from this we get Hicks, Hix, Hickson, Hixon, Hickman, and (she informs me) "Hickmott".

"-mott" perhaps indicates a male in-law, according to the Dictionary of American Family Names via Ancestrry.com. The only other clear example of this that I noticed was "Willmott". Note that Gaelic names like Dermott are not of this type; the resemblance is coincidental.

We get Hodge / Hodges / Hodgson / Hodgkins from "Hodge", an archaic nickname for Roger.

Lucy Keer suggests that Bateson / Bates is from "Bat", an archaic nickname for Bartholomew.

Christopher Taylor reminds me that in Atkins and Atkinson the "At-" is short for "Adam", and similarly in "Addis" and "Addison". I know a guy named Atteson but I'm not sure it is the same thing.

20250210 (again)

Response to this article has been too much for me to keep up with.

Hacker News user tibbar mentions several that I either didn't know, or knew and forgot to include:

  • "Dob" is another old nickname for "Robert", giving us "Dobson", "Dobbs", "Dobkin", etc.;
  • "Dodge", another nickname for Roger, hence Dodgson, as in Lewis Carroll's real name, Charles Dodgson;
  • "Tibb" is an old nickname for Theobald, giving surnames like "Tibbs" and "Tibbets";
  • "Hud" for "Hugh", giving us the Hudsons.

Citizen Claude weighs in

I also asked Claude for suggestions:

I'm going to paste a blog article that discusses common English surnames that are patronymic versions of forenames that no longer exist. For example, the common surname "Watson" is from "Wat", and "Wat" was once a common nickname for "Walter", but no longer used. I would like you to list ten similar examples that are not mentioned in the article.

Claude replied:

Here are ten examples of common English surnames derived from obsolete forenames or nicknames, different from those mentioned in the article:

  1. Higgins/Higgs - from "Hick," a medieval pet form of Richard
  2. Timms/Timson - from "Tim," originally a medieval nickname for Timothy
  3. Hodgson/Hodges - from "Hodge," once a common nickname for Roger
  4. Atkins/Atkinson - from "At" or "Atte," medieval diminutive of Adam
  5. Parkins/Parkinson - from "Perkin," diminutive of Peter/Pierre
  6. Judson/Judd - from "Judd," medieval nickname for Jordan
  7. Malkins/Malkinson - from "Malk," diminutive of Malcolm
  8. Larkins/Larkin - from "Larry" or "Laurence," medieval form of Lawrence
  9. Collins/Colson - from "Col," medieval nickname for Nicholas
  10. Huggins/Huggett - from "Hug," medieval pet form of Hugh

Remember that some of these might be made up. For example, Wiktionary claims that #7 is matronymic, from Malle, a nickname for Mary. Who's right? I don't know. The only way to settle this is with a cage match! I hope.

I complained that #2 was not to spec because “Tim” is still common, and demanded a replacement:

  1. Rankins/Rankin - from "Ran," a medieval diminutive of Randolph

I also asked Claude if I had made any errors, and it said:

The article claims that "Robin" comes from "Robkin" with "bk" simplified to "b". This is incorrect.

"Robin" is actually a French diminutive of Robert, coming from Old French Robin/Rovin. …

This seems plausible.

by Mark Dominus (mjd@plover.com) at February 10, 2025 09:06 PM

February 07, 2025

Mark Jason Dominus

Just give the man the fish!

Last week I complained about a Math SE pathology in which OP asks a simple question, and instead of an answer gets an attempt at a socratic dialog. I ended by saying:

I have been banging this drum for decades, but I will cut the scroll here. Expect a followup article.

Seeing this, Scott Francis remarked:

when you say “I have been banging this drum for decades" I hear echoes of EFnet #perl in the back of my head :)

And yes, that is one of the things I was thinking of.

Thirty years ago the regulars in the #perl IRC channel would play a cruel teasing game. A stranger would come into the channel and ask a simple technical question, like “how do I remove the first character from a string?”

Instead of giving the answer, two or three people would reply perldoc perlre.

In case it's not obvious — and there is no reason why it should be — this means you can run this command to get the manual for how to use Perl regular expressions.

This manual was about 20,000 words long.

People indulging in this shitty behavior would excuse themselves by chanting the maxim “If you give a man a fish, he can eat for one day. If you teach him to fish, he can eat for his whole life.” An actual answer to a question was a “fish”. Apparently, saying perldoc perlre was considered to be “teaching a man to fish.”

If the newbie objected that the reply perldoc perlre was unhelpful, the regulars were only too ready to lecture them on why it was helpful actually, on why they didn't deserve a better answer, on why they shouldn't expect their questions to be answered, on how they were being rude by rejecting the help that was offered them, on how they shouldn't feel entitled to answers, and on why the regulars there were all very busy people with more important things to do than to answer stupid newbie questions.

In my view, someone who is hanging around in #perl should expect newbie questions, and if they don't want to answer newbie questions they simply shouldn't do it, they should ignore them. If they can't do that, if they are so enraged by newbie questions that it ruins the rest of the chat for them, they should go start a different channel with a name that won't attract newbies. But they should not hang around and vent their impotent rage on the newbies who inevitably do show up.

I'm kind of an asshole, but I'm not that big an asshole. I'm callous, but I'm not sadistic. Someone who says they don't have time to help you, but who does have time to explain to you in detail why they aren't helping you, is sadistic.

“Well, we want them to learn to read the manual,” the regulars would claim. Maybe so, but I don't think their strategy was usually effective. If one really wants people to read the manual, a much better strategy would be to answer the question, and then having established oneself as a helpful person, suggest the manual:

By the way, you can get complete documentation about regexes with the command perldoc perlre. It's really long, but it's full of useful information. The ^ operator I mentioned is in the section called "Metacharacters". Would you like help finding it?

On the other hand if what one actually wanted was to convince someone that Perl was a language used by assholes and they might have better success with a different language whose community had fewer assholes, then the #perl regulars’ strategy was probably very effective.

Then as now my usual habit was to just answer the question. There would be this odd little moment where three people would say perldoc perlre and I would say $string =~ s/^.//. Did people yell at me for this? I don't remember. Probably, I was spoiling their fun.

But at least once someone asked me (in good faith, I'm sure) why I did it my way. I saved my answer. It was:

Because it's easy. Because it's helpful. Because I think the theory that says that people will become dependent on it is bullshit.

Because I think the theory that says that telling them to read the man page is more helpful is also bullshit.

Because in my experience people are much more likely to heed your suggestion to read the man page after you have established that you are a helpful concerned person by assisting them.

The main points are the first two: Because it's easy, and because it's helpful, so why not?

It's at least 25 years later and I'm still angry about this. Who the hell hangs around in a help forum for the purpose of refusing to help?

Social media now is toxic in ways we couldn't have imagined then. But let's not forget that it could be pretty toxic then too.

Addenda

“in good faith, I'm sure” is not sarcasm.

20250208

The previous addendum was also not sarcasm.

by Mark Dominus (mjd@plover.com) at February 07, 2025 01:06 AM

February 06, 2025

Philip Wadler

I've been nominated for a teaching award


I've been fortunate to be nominated for a few teaching awards over my career, and even to win a couple. The nomination I just received may be the best.

As a new student at the uni, Philip Wadler was the first introductory lecture I had, and his clear passion for the subject made me feel excited to begin my journey in computer science. In particular he emphasised the importance of asking questions, which made the idea of tutorials and lectures a lot less intimidating, and went on to give really valuable advice for starting university. I enjoyed this session so much, and so was looking forward to the guest lectures he was going to do for Inf1A at the end of semester 1. They certainly did not disappoint, the content he covered was engaging, interesting, and above all very entertaining to listen to, especially when he dressed up as a superhero to cement his point. Because I found these talks so rewarding, I also attended the STMU that he spoke at about AI and ChatGPT, and everyone I talked to after the event said they had a really good time whilst also having a completely new insightful perspective on the topic. In summary, Philip Wadler has delivered the best lectures I have attended since starting university, and I have gotten a lot out of them.

Thank you, anonymous first-year student! 

by Philip Wadler (noreply@blogger.com) at February 06, 2025 10:07 PM

Tweag I/O

The refactoring of a Haskell codebase

Common engineering scenario: There is a large legacy codebase out there which is known to have a few pervasive problems that everyone wants to get rid of. But nobody understands all the details of the codebase, and few are willing to risk breaking the artifact in a long and costly surgery. This post is an experience report on one such refactoring of Liquid Haskell (LH), a tool to verify Haskell programs.

LH has grown mostly from academic contributions that demonstrate the feasibility of some proof technique or another. Since the focus of a demonstration is not always placed on generality, a new user can find unresolved problems, sometimes blockers that make adoption difficult. Let us look at one such example.

The problem: Name resolution

LH requires the user to write specifications for the various parts of the program she wants to verify. Suppose we have a module with a type to describe the verbosity of a program.

module Verbosity where
data Verbosity = Quiet | Verbose

And suppose that we also have a module where we declare the configuration of the program.

module Config where
import Verbosity
data Config = Config Verbosity

For the sake of brevity, this program can only configure the verbosity. Now let us add some more definitions in the Config module to construct a configuration and to give it a specification.

{-@ measure isVerboseConfig @-}
isVerboseConfig :: Config -> Bool
isVerboseConfig (Config Verbose) = True
isVerboseConfig _ = False

{-@ verboseConfig :: {v:Config | isVerboseConfig v } @-}
verboseConfig :: Config
verboseConfig = Config Verbose

The annotation {-@ measure isVerboseConfig @-} indicates to LH that we want to use isVerboseConfig in specifications, as we do in the specification of verboseConfig:

{-@ verboseConfig :: {v:Config | isVerboseConfig v } @-}

This specification says that we expect verboseConfig to have verbosity Verbose, and LH will verify so, but first it has to find out what names like Verbosity and Verbose refer to. For this sake, LH inspects the imports of the module to learn that these names come from the Verbosity module.

Now, when we import the module elsewhere

module Main where
import Config
...

the specification of verboseConfig should be available to verify the new module Main. This time we would hope we wouldn’t need to resolve the names of the imported specs again. Alas, when changing modules LH discards the name resolution of imported modules and needs to resolve names a second time. But module Main is missing the import of module Verbosity, which provides the names that LH needs to resolve.

Easily enough, we can import Verbosity in module Main and declare the problem solved. Unfortunately, this solution means that in large programs we need to import explicitly the transitive dependencies of the modules we want to verify, which is too much to ask of our kind users.
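Spelled out, the workaround is just one extra import in Main. The following only restates the sentence above, with a hypothetical main added so the module is complete:

module Main where

import Config
import Verbosity -- needed only so that LH can resolve the names used in Config's specs

main :: IO ()
main = print (isVerboseConfig verboseConfig)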

We must consider, then, why LH is discarding the name resolution of imported modules. It turns out that it is a pretty structural reason: all names in Liquid Haskell are represented as strings. While at places there is an effort to make the strings unambiguous, the representation makes resolved and unresolved names hard to distinguish without doing some parsing, and there are just too many opportunities to mistake one for the other. Matters are worsened by the fact that LH does not keep the keys (GHC Names) that allow us to retrieve type information used for verification, and finding these keys from the environments of different modules is not trivial either.

The refactoring process

LH is a tool of about 28,000 lines of code; changing its representation of names is not an easy refactoring, especially when, as in my case, much of the knowledge of the implementation details still needs to be acquired. An alternative would be to rewrite LH from scratch, this time getting it right. But to accomplish that I would also need complete awareness of all the quirks of the old implementation, and we couldn’t argue that we have a method or a technology that promises a better outcome. Refactoring it had to be, then.

Ideally, we would replace all strings with a more structured type to represent resolved names, which should weed out the accidental mistakes and omissions. The change was so massive, though, that landing it as a single contribution was impractical.

  1. It would have been difficult for anyone to review such a large contribution.
  2. If tests in the testsuite didn’t pass, it would be difficult to identify which part of the changes affected the test outcome.
  3. If tests uncovered issues with the design, we would have invested much effort into an implementation built on the wrong assumptions.
  4. It would have been difficult to estimate the overall effort.

For these reasons, it is essential that, whatever plan is chosen, the refactoring is carried out in sufficiently small and incremental steps. Fortunately, name resolution admits breaking down the task, as the many language constructs used in specifications can be resolved separately. I started by resolving names of Haskell types used in specifications; then I could resolve names in measure annotations, later names in assumptions, later names of data constructors in specifications for algebraic data types, and so on.

There were also choices to make about introducing a new representation. For instance, I knew from the start that I wanted to use GHC Names for all names pointing to Haskell entities (type constructors, data constructors, functions). This makes the name representation as precise as it can ever be. But should I arrange data structures to be parametric on the name representation?

data Spec name = ...

The parser would then produce specifications that use strings for unresolved names, which would later be converted to a type of specifications with resolved names.

parse :: String -> Either [Error] (Spec String)
resolveNames :: Spec String -> Spec GHC.Name

This is close to how the GHC compiler manages different representations of names in the various phases of the compilation pipeline.

Making the abstract syntax tree (AST) parametric, and then implementing the traversals and updating function type signatures was going to be some work, and it didn’t look like the parametricity would help me catch a lot of mistakes. The alternative that I adopted was to replace strings with a sum type called LHName that could hold either resolved or unresolved names.

data LHName = LHNResolved ... | LHNUnresolved String

The parser produces LHNUnresolved values, and a generic syb-style traversal takes care of changing those to LHNResolved during name resolution. The intent is that all names are resolved after name resolution, though this knowledge is not explicit in the types of the AST. This would be a problem if some odd function after name resolution expected unresolved names, or if name resolution accidentally left names unresolved, but I don’t regard runtime errors arising in those cases as likely to escape the testing. As I modified the AST one string occurrence at a time, the type checker dutifully flagged every use of string names that needed updating.
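
To make the idea concrete, here is a minimal sketch of such a generic pass. It is not LH’s actual code: GhcName and lookupGhcName are stand-ins for GHC’s Name type and for the real environment lookup.

{-# LANGUAGE DeriveDataTypeable #-}
module NameResolutionSketch where

import Data.Generics (Data, everywhereM, mkM)

newtype GhcName = GhcName String   -- stand-in for the compiler's Name type
  deriving (Data)

data LHName
  = LHNResolved GhcName
  | LHNUnresolved String
  deriving (Data)

-- Resolve a single name, leaving already resolved names untouched.
resolveName :: LHName -> IO LHName
resolveName (LHNUnresolved s) = LHNResolved <$> lookupGhcName s
resolveName resolved          = pure resolved

-- One generic traversal replaces every LHName, wherever it occurs in the AST.
resolveSpec :: Data spec => spec -> IO spec
resolveSpec = everywhereM (mkM resolveName)

-- Stand-in for looking the string up in the compiler environments.
lookupGhcName :: String -> IO GhcName
lookupGhcName = pure . GhcName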

In a strongly typed language, it feels sinful to defer to runtime checks that could be expressed in the type system, for instance by parameterizing the AST. The implementation of the GHC compiler is a notable example where the common ASTs are parametric in the variable representation. Parameterizing the specifications may still be considered in LH, though it wasn’t an absolute prerequisite to start.

One advantage of LHName over strings is that an LHName can hold a GHC Name. Another advantage is that passing an unresolved LHName where a resolved name is expected produces a runtime error, whereas with strings we got undefined behavior. This is most helpful when serializing specifications to import them later: if any unresolved name is found at that time, an error is produced. Moreover, propagating the switch to LHName through the code helped find the places that were mistakenly producing unresolved names.
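
Schematically, and continuing with the made-up names from the sketch above, the conversion performed at serialization time can insist on resolved names:

-- Sketch: extract the resolved name, failing loudly if resolution was missed.
getResolvedName :: LHName -> GhcName
getResolvedName (LHNResolved n)   = n
getResolvedName (LHNUnresolved s) =
  error ("name left unresolved at serialization time: " ++ s)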

Another interesting choice came when deciding the representation for logic names. These are names that refer to entities in the logic, usually unknown to the Haskell compiler. Logic names can refer to functions or type aliases that can be used in specifications. An example of a type alias is the one for non-negative integers.

{-@ type Nat = {v:Int | v >= 0} @-}

For the refactoring, the major difference between Haskell and logic names was that logic names need to be fed to liquid-fixpoint, the theorem prover that LH uses to discharge proof obligations, and liquid-fixpoint does expect strings as a representation for names. Because liquid-fixpoint is used as a library, data structures in LH and liquid-fixpoint share the name representation, which made using a sum type like LHName harder in this case.

One option would have been to generalize liquid-fixpoint to deal with other representations for names, but that was going to be another project on its own. It seemed more practical to keep the interface to liquid-fixpoint unaffected, so the plan was to parse names as strings, resolve them to LHNames, serialize the specs, and then convert the LHNames back to strings before interacting with liquid-fixpoint. In this way, I could still reuse the output of name resolution when importing specs.

If I didn’t want to have two versions of the AST, one with strings and one with LHNames, then, surprise, I had to parameterize specifications over the logic name type. Ignoring environments and other details, I ended up with a schematic interface like

data Spec logicName = ...
parse :: String -> Either [Error] (Spec String)
resolveNames :: Spec String -> Spec LHName
serializeSpec :: Spec LHName -> ByteString
convertToLiquidFixpoint :: Spec LHName -> Spec String

Now the syb traversals were no longer adequate for implementing name resolution, as the transformation changes the type of the AST, so I had to implement it with a mix of stock Traversable instances and manual traversals. And I find it a bit amusing that I chose a parametric representation for the sake of reusing data structures, and still not in order to have more precise types.
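
The shape of that parametric representation, with made-up type and field names rather than LH’s real ones, looks roughly like this; the stock Traversable instance is what lets name resolution change the type parameter from String to LHName, something a syb traversal over a single type cannot do.

{-# LANGUAGE DeriveTraversable #-}
module LogicNameSketch where

type Error = String                 -- stand-in for the real error type
newtype Resolved = Resolved String  -- stand-in for a resolved logic name

data LHName = LHNResolved Resolved | LHNUnresolved String

-- Specifications parameterized over the logic name representation.
data Spec logicName = Spec
  { specMeasures :: [logicName]
  , specAliases  :: [logicName]
  } deriving (Functor, Foldable, Traversable)

-- Placeholder: the real implementation consults the logic environments
-- and reports the names it cannot resolve.
resolveLogicName :: String -> Either [Error] LHName
resolveLogicName s = Right (LHNResolved (Resolved s))

resolveNames :: Spec String -> Either [Error] (Spec LHName)
resolveNames = traverse resolveLogicName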

The current state of the refactoring

At the time of writing, the resolution of all Haskell names in LH annotations is persisted and reused when importing specifications. Some of the logic names are handled in the same fashion, but a few cases still remain to complete the refactoring. The state of the refactoring and all of the related contributions can be checked in the corresponding GitHub issue.

There were quite a few side quests derived from the name resolution refactoring. I found it challenging to stay focused on name resolution and not try to fix all the things I discovered broken along the way. These were details like type parameters that could be removed since they were always instantiated to the same type, or fields in record data types that were never read, or functions that were almost dead code if I could remove just that one use site that should be doing something different. I ended up fixing a bunch of secondary problems when they were easy enough to resolve. But I had to give up more than once on issues that turned out to be deeper than anticipated; a humbling exercise, if you will, where I had to admit that my goals for the day were too ambitious for the sake of progressing on the main refactoring.

I’m excited at the prospect of leaving behind the kind of user-facing errors that the old implementation induced. Much of the success rests on having formulated a plan that allowed the task to be performed incrementally, always keeping the test suite passing. The disarray of name resolution was identified as problematic by both contributors and users, and for much of the specification language it is already a thing of the past.

February 06, 2025 12:00 AM

February 05, 2025

Mark Jason Dominus

Claude helps me find more presidential emoji

A couple of years back I tried to make a list of emoji representing the U.S. presidents. Many of them were fun and easy, or at least amused me. But for some I was stumped. What emoji represents Zachary Taylor?

I've been playing around with Anthropic's LLM “Claude” for a while, so I thought I'd see what Claude had to contribute.

Last time I had looked at the LLM space I was deeply unimpressed:

  1. ChatGPT discusses four-digit numbers
  2. ChatGPT discusses a hypothetical fifth tarot suit
  3. ChatGPT discusses women named James
  4. ChatGPT discusses cauliflower, Decameron and Scheherazade

But that was two years ago, and gods, what a difference. What persuaded me that it was time to take another look was two articles by Adam Unikowsky. Unikowsky is a working lawyer who has practiced before the US Supreme Court. He writes an extremely geeky blog, called Adam's Legal Newsletter. Last summer he wrote two articles that blew my mind. Here's an excerpt from the first one:

Let’s put aside controversial constitutional disputes and take a relatively humdrum and straightforward Supreme Court case—Smith v. Spizziri, decided on May 16, 2024. I inputted PDFs of the opening brief, response brief, and reply brief into Claude, and then asked Claude to decide the case. Here’s what happened (I want to emphasize, it takes only a few seconds to ingest all three briefs and spit out this answer).

The prompt he gave to Claude was:

You are a Supreme Court Justice. Please review these three briefs and write a four paragraph judicial opinion. In the first, you will describe the dispute. In the second, you will describe the parties’ positions. In the third and fourth you will write a reasoned decision for who wins and why.

Claude did so; you can read the output on Unikowsky's blog. Here's his evaluation of the result:

This is absolutely correct, and exactly what the Supreme Court decided. 10/10.

Unikowsky looks at a second specific example and concludes:

This is perfect.

(His italics.)

Unikowsky then discusses at some length why AI might be preferable to human judges. Don't sneer; he has probably thought about this more carefully than you have. Read the article.

In the second article Unikowsky tests more extensively:

I downloaded the briefs in every Supreme Court merits case that has been decided so far this Term, inputted them into Claude 3 Opus (the best version of Claude), and then asked a few follow-up questions.

And he reports:

The results were otherworldly. Claude is fully capable of acting as a Supreme Court Justice right now. When used as a law clerk, Claude is easily as insightful and accurate as human clerks, while towering over humans in efficiency.

(Emphasis mine this time.)

This is why I decided I should take another look at LLMs. Well, this is not what I planned to write about. You are here for presidential emoji, so let's do that. Here's my prompt to Claude:

I'm going to paste a full list of U.S. presidents and associated emoji, and I'd like you to explain all of them. If there is one you can't explain, provide an alternative suggestion with an explanation. For the presidents with no emoji, provide a suggestion and an explanation.

and then appended the list.

Here are Claude's explanations for the emoji I had selected:

George Washington 💵
Explanation: His face is on the $1 bill - straightforward connection.

Thomas Jefferson 📜
Explanation: References his role as primary author of the Declaration of Independence.

Right so far.

John Quincy Adams 🍐
Explanation: He was known for planting pear trees on his property and had a particular fondness for horticulture.

This wasn't what I had in mind. I was thinking that it would be funny to represent John Quincy Adams with a picture of a quince. Which is silly and dumb and I can't fault Claude for not getting it. If it had been Claude that had made this suggestion I would probably have quibbled that “Quincy” is not pronounced the same as “quince”.

Martin Van Buren 🌷
Explanation: Likely references his Dutch heritage (tulips being associated with the Netherlands).

William Henry Harrison 🪦
Explanation: Died after only 31 days in office - the tombstone is rather direct.

Both right. Although maybe it would be more amusing to use 🛶 for Harrison.

Millard Fillmore ⛽
Explanation: This is unclear - I'd suggest 📚 instead, as he established the first free public library system.

Another pun, this one even dumber than the quince. ⛽ is the gas pump you use to fill more gas into your car, get it?


I'm not excited about Claude's suggestion, but at least it isn't stupid.

Abraham Lincoln 🔫
Explanation: Likely references his assassination at Ford's Theatre, though 🎩 might be more appropriate for his iconic hat.

Claude is a little bit prudish and tends not to like my inappropriate jokes. The hat of course is obvious. In the previous article I said:

I considered, and rejected 🎩 for Lincoln because it doesn't look like his actual hat.

Still, it's a perfectly good suggestion.

Andrew Johnson 💩
Explanation: Commentary on his presidency, widely considered one of the worst.

I notice that Claude did not object that this was inappropriate. Prudish or not, even Claude can agree that Andrew Johnson was a turd of a President.

Ulysses S. Grant �
Explanation: References his noted drinking habits.

Rutherford B. Hayes 🧔
Explanation: Known for his distinctive beard.

James Garfield 🧔
Explanation: Also known for his beard.

I wasn't completely phoning it in here, the repeated white-guys-with-beards thing is also a joke. I don't think Garfield was actually known for his beard, but whatever. (I've already dispensed with Garfield the lazy cat in the previous article.)

Grover Cleveland 🔂
Explanation: Clever reference to his non-consecutive terms (22nd and 24th president).

I'm pretty sure I don't like that Claude appears to be trying to flatter me. What does it mean, philosophically, when Claude calls something ‘clever’? I have no idea. Being flattered by a human is bad enough, they might really mean something by it.

Benjamin Harrison 🧔
Explanation: Another bearded president.

Theodore Roosevelt 🧸
Explanation: The teddy bear was named after him.

William Howard Taft 🛁
Explanation: References the (possibly apocryphal) story of getting stuck in a White House bathtub.

Woodrow Wilson �
Explanation: Former president of Princeton University, academic background.

Warren G. Harding 🫖
Explanation: Likely references the Teapot Dome scandal.

Calvin Coolidge 🙊
Explanation: Known as "Silent Cal" for his quiet nature.

Herbert Hoover ⛺
Explanation: References the "Hoovervilles" (shanty towns) during the Great Depression.

I wasn't sure Claude would get these last three because they're a little bit tricky and obscure. But it did.

Franklin D. Roosevelt 👨‍🦽
Explanation: References his use of a wheelchair due to polio.

Harry S. Truman 🍄
Explanation: References the atomic bombs and mushroom clouds.

Dwight D. Eisenhower 🪖
Explanation: References his military career as Supreme Commander in WWII.

John F. Kennedy �
Explanation: Likely references his notorious womanizing - though 🚢 (PT-109) might be more appropriate.

Yes, yes, yes, and yes.

Again Claude implies that my suggestion is inappropriate. Lighten up, Claude.

Lyndon B. Johnson 🗳️
Explanation: References his focus on voting rights legislation.

Uh, yeah, the Voting Rights Act of 1965 is definitely what I meant, I certainly would not have been intending to remind everyone of LBJ's propensity to stuff ballot boxes.

In some ways, Claude is a better person than I am.

Richard M. Nixon �
Explanation: Likely a "bug" reference to Watergate surveillance.

Gerald R. Ford 🏈
Explanation: Was a football player at University of Michigan.

Jimmy Carter 🥜
Explanation: Was a peanut farmer before presidency.

Ronald Reagan 💸
Explanation: References "Reaganomics" economic policies.

Yes, yes, yes, and yes.

George H. W. Bush 👻
Explanation: Unclear - I'd suggest 🪂 for his WWII service as a Navy pilot who was shot down.

I had picked 👻 to recall his tenure as Director of the CIA. But on looking into it I have found he had not served in that role for nearly as long as I thought: only from 1974–1976. It is far from his most prominent accomplishment in government.

I sometimes wonder what would have happened if Bush had beaten Reagan in the 1980 election. People sometimes say that the Republican party only ever runs fools and clowns for president. George Bush was their candidate in 1988 and whatever his faults he was neither a fool nor a clown.

Bill Clinton 🎷
Explanation: Famous for playing the saxophone.

George W. Bush �
Explanation: References the shoe thrown at him during a press conference in Iraq.

Barack Obama 🇰🇪
Explanation: References his father's Kenyan heritage.

Here's Claude again being a better person than me. I had picked 🇰🇪 because I was trying to troll my audience with the insinuation that Obama was born in Kenya.

Donald Trump 🍊
Explanation: References his distinctive skin tone.

Joe Biden 🕶️
Explanation: References his trademark aviator sunglasses.

Right, except to me the little quiff on the tangerine is also mocking Trump's hair. But overall I give Claude no demerits. The only times Claude didn't take my meaning were on stupid shit like ⛽ Fillmore.

Here are the presidents where I couldn't come up with anything and asked for Claude's suggestions. I found Claude's suggestions mostly reasonable but also pretty dull.

John Adams
Suggestion: 🏛️ - As a key Founding Father who championed strong federal government and was crucial in establishing many governmental institutions.

I don't know, 🏛️ is supposed to be a “classical building”, and yes, the buildings in Washington were eventually in neoclassical style once they were built, mostly in the early 20th century, but okay, I guess.

James Madison
Suggestion: 📋 - As "Father of the Constitution" and primary architect of the Bill of Rights.

Okay, but… a clipboard?

James Monroe
Suggestion: 🌎 - For the Monroe Doctrine, establishing U.S. influence in the Western Hemisphere.

Full marks. The only reason I'm not kicking myself for not thinking of it first is that I just barely did think of it first. As I was composing the prompt for Claude I said to myself “Why didn't I think of 🌎 for the Monroe Doctrine? Oh well, I'll see what Claude says anyway.”

Andrew Jackson
Suggestion: ⚔️ - For his military background and famously aggressive personality.

Fine.

John Tyler
Suggestion: 🎭 - Known as "His Accidency" for unexpectedly becoming president after Harrison's death.

Not good. I had to get Claude to explain this one to me. See below.

James K. Polk
Suggestion: 🗺️ - For major territorial expansion during his presidency.

Not good.

(The emoji is “world map�.)

Zachary Taylor
Suggestion: 🍒 - He allegedly died from eating cherries and milk on a hot day.

Whatever my complaints about Claude's other suggestions, I feel that this one redeems all their faults. I love it. It's just the kind of thing I was looking for, the sort of thing Arachne would have woven into her tapestry.

James Buchanan
Suggestion: 👨‍🦳 - The only bachelor president.

I'll discuss this one later.

Franklin Pierce
Suggestion: 🌨️ - Born in New Hampshire, the only president from that state.

I had wanted to comment on Pierce's best quality, which was his great hairstyle, but I couldn't find any good emoji for hair. But this is a better idea. Using 🌨️ for New Hampshire is funny.

Chester A. Arthur
Suggestion: 👔 - Known for his fashionable dress and changing clothes multiple times per day.

I don't know a damn thing about Chester Arthur except he succeeded Garfield and he had sideburns. I haven't even checked to see if Claude is right about his fashionable dress. I don't think it is physically possible to get me to care about Chester Arthur.


Okay, back to Tyler and Buchanan. I asked Claude to explain Tyler:

The only suggestion you made that I don't understand is for John Tyler. How does � relate to "his accidency"?

Claude said:

Ah, I see why that's confusing! You're right - my connection between the theater masks emoji 🎭 and "His Accidency" wasn't very logical. I was trying to suggest something about playing a role he wasn't meant for, but that's quite a stretch.

That actually makes sense! I agree it was a stretch, but I see it now.

But Claude continued:

Let me suggest a better emoji for John Tyler: ⚡

ZOMG, hilarious! Perfect! A++++ 11/10 ⭐⭐⭐⭐⭐ would buy again.

If you don't get it, here's an excerpt of Claude's explanation:

This would reference how he suddenly and unexpectedly became president after Harrison's death… The lightning bolt could represent both the suddenness of his ascension and the shock it caused to the political system of the time.

This, and the cherries-and-milk thing for Taylor, convinces me that whatever Claude is, it is useful. I asked it for help with this ridiculous task and it provided real contributions of real value.

I remarked:

The gray hair for Buchanan I do not like, for two reasons. First, many presidents had gray hair. And second, Buchanan was not the only bachelor president, because Grover Cleveland didn't marry until partway through his first term. Do you have any suggestions for Hayes, Garfield, or Harrison?

Claude tried again for Buchanan:

Thank you for those corrections about Buchanan! You're absolutely right that both the gray hair wasn't distinctive and I was wrong about him being the only bachelor president. Let me rethink that one.

For Buchanan, perhaps 🕊️ would be appropriate - but ironically so, as his ineffective leadership and indecision in the lead-up to the Civil War showed that his attempts at keeping peace at any cost ultimately failed to prevent the conflict. Though that might be too subtle.

I don't love it, but I don't have anything better…

No, wait, I just thought of something! I'm going with 🥫 because, to my mind, Buchanan was the guy who, when he tried to kick the slavery can a little farther down the road, discovered that there was no more road down which to kick it.

I suggested this to Claude just now and it was so enthusiastically complimentary that I was embarrassed, so let's move on.

Claude didn't have any ideas I liked for Hayes, Garfield, or Harrison. I tried workshopping Hayes a little more:

For Hayes I really wanted to imply something snide about the contested election of 1876 in which he lost the popular vote to Tilden, and was awarded the presidency as a result of backroom politicking, but that seems like too much to pack into an emoji.

Claude said:

Maybe 🎲 for the "dice roll" nature of the commission's decisions? Or 🤝 for the corrupt bargain itself? But neither really captures the full gravity of how that compromise effectively ended Reconstruction and ushered in the Jim Crow era.

I think it kind of misses the point if you don't put EMOJI MODIFIER FITZPATRICK TYPE 1-2 on the corrupt handshake: 🤝🏻. But this is the amazing thing, it does feel like I'm workshopping with Claude. It really feels like a discussion between two people. This isn't Eliza parroting back IS IT BECAUSE OF YOUR MOTHER THAT YOU SAY I DON'T PUT EMOJI MODIFIER FITZPATRICK TYPE 1-2 ON THE CORRUPT HANDSHAKE?.

Could Hayes be a crow? You're supposed to be able to compose ‘bird’, ZWJ, and ‘black square’ to get a black bird. It might be too bitter, even for me.

If you want a conclusion, it is: Claude is fun and useful, even for silly stuff that nobody could have planned for.

by Mark Dominus (mjd@plover.com) at February 05, 2025 12:00 PM

February 04, 2025

Michael Snoyman

Who pays a tax?

President Trump has started rolling out his tariffs, something I blogged about in November. People are talking about these tariffs a lot right now, with many people (correctly) commenting on how consumers will end up with higher prices as a result of these tariffs. While that part is true, I’ve seen a lot of people taking it to the next, incorrect step: that consumers will pay the entirety of the tax. I put up a poll on X to see what people thought, and while the right answer got a lot of votes, it wasn't the winner.

For purposes of this blog post, our ultimate question will be the following:

  • Suppose apples currently sell for $1 each in the entire United States.
  • There are domestic sellers and foreign sellers of apples, all receiving the same price.
  • There are no taxes or tariffs on the purchase of apples.
  • The question is: if the US federal government puts a $0.50 import tariff per apple, what will be the change in the following:
    • Number of apples bought in the US
    • Price paid by buyers for apples in the US
    • Post-tax price received by domestic apple producers
    • Post-tax price received by foreign apple producers

Before we can answer that question, we need to ask an easier, first question: before instituting the tariff, why do apples cost $1?

And finally, before we dive into the details, let me provide you with the answers to the ultimate question. I recommend you try to guess these answers before reading this, and if you get it wrong, try to understand why:

  1. The number of apples bought will go down
  2. The buyers will pay more for each apple they buy, but not the full amount of the tariff
  3. Domestic apple sellers will receive a higher price per apple
  4. Foreign apple sellers will receive a lower price per apple, but not lowered by the full amount of the tariff

In other words, regardless of who sends the payment to the government, both taxed parties (domestic buyers and foreign sellers) will absorb some of the costs of the tariff, while domestic sellers will benefit from the protectionism provided by tariffs and be able to sell at a higher price per unit.

Marginal benefit

All of the numbers discussed below are part of a helper Google Sheet I put together for this analysis. Also, apologies about the jagged lines in the charts below, I hadn’t realized before starting on this that there are some difficulties with creating supply and demand charts in Google Sheets.

Let’s say I absolutely love apples, they’re my favorite food. How much would I be willing to pay for a single apple? You might say “$1, that’s the price in the supermarket,” and in many ways you’d be right. If I walk into supermarket A, see apples on sale for $50, and know that I can buy them at supermarket B for $1, I’ll almost certainly leave A and go buy at B.

But that’s not what I mean. What I mean is: how high would the price of apples have to go everywhere so that I’d no longer be willing to buy a single apple? This is a purely personal, subjective opinion. It’s impacted by how much money I have available, other expenses I need to cover, and how much I like apples. But let’s say the number is $5.

How much would I be willing to pay for another apple? Maybe another $5. But how much am I willing to pay for the 1,000th apple? 10,000th? At some point, I’ll get sick of apples, or run out of space to keep the apples, or not be able to eat, cook, and otherwise preserve all those apples before they rot.

The point being: I’ll be progressively willing to spend less and less money for each apple. This form of analysis is called marginal benefit: how much benefit (expressed as dollars I’m willing to spend) will I receive from each apple? This is a downward sloping function: for each additional apple I buy (quantity demanded), the price I’m willing to pay goes down. This is what gives my personal demand curve. And if we aggregate demand curves across all market participants (meaning: everyone interested in buying apples), we end up with something like this:

Demand curve before tax

Assuming no changes in people’s behavior and other conditions in the market, this chart tells us how many apples will be purchased by our buyers at each price point between $0.50 and $5. And ceteris paribus (all else being equal), this will continue to be the demand curve for apples.

Marginal cost

Demand is half the story of economics. The other half is supply, or: how many apples will I sell at each price point? Supply curves are upward sloping: the higher the price, the more a person or company is willing and able to sell a product.

Let’s understand why. Suppose I have an apple orchard. It’s a large property right next to my house. With about 2 minutes of effort, I can walk out of my house, find the nearest tree, pick 5 apples off the tree, and call it a day. 5 apples for 2 minutes of effort is pretty good, right?

Yes, there was all the effort necessary to buy the land, and plant the trees, and water them… and a bunch more that I likely can’t even guess at. We’re going to ignore all of that for our analysis, because for short-term supply-and-demand movement, we can ignore these kinds of sunk costs. One other simplification: in reality, supply curves often start descending before ascending. This accounts for achieving efficiencies of scale after the first number of units purchased. But since both these topics are unneeded for understanding taxes, I won’t go any further.

Anyway, back to my apple orchard. If someone offers me $0.50 per apple, I can do 2 minutes of effort and get $2.50 in revenue, which equates to a $75/hour wage for me. I’m more than happy to pick apples at that price!

However, let’s say someone comes to buy 10,000 apples from me instead. I no longer just walk out to my nearest tree. I’m going to need to get in my truck, drive around, spend the day in the sun, pay for gas, take a day off of my day job (let’s say it pays me $70/hour). The costs go up significantly. Let’s say it takes 5 days to harvest all those apples myself, it costs me $100 in fuel and other expenses, and I lose out on my $70/hour job for 5 days. We end up with:

  • Total expenditure: $100 + $70 * 8 hours a day * 5 days == $2900
  • Total revenue: $5000 (10,000 apples at $0.50 each)
  • Total profit: $2100

So I’m still willing to sell the apples at this price, but it’s not as attractive as before. And as the number of apples purchased goes up, my costs keep increasing. I’ll need to spend more money on fuel to travel more of my property. At some point I won’t be able to do the work myself anymore, so I’ll need to pay others to work on the farm, and they’ll be slower at picking apples than me (less familiar with the property, less direct motivation, etc.). The point being: at some point, the number of apples can go high enough that the $0.50 price point no longer makes me any money.

This kind of analysis is called marginal cost. It refers to the additional amount of expenditure a seller has to spend in order to produce each additional unit of the good. Marginal costs go up as quantity sold goes up. And like demand curves, if you aggregate this data across all sellers, you get a supply curve like this:

Supply curve before tariff

Equilibrium price

We now know, for every price point, how many apples buyers will purchase, and how many apples sellers will sell. Now we find the equilibrium: where the supply and demand curves meet. This point is where the marginal benefit the next buyer would receive is less than the marginal cost the next seller would incur to produce it. Let’s see it in a chart:

Supply and demand before tariff

You’ll notice that these two graphs cross at the $1 price point, where 63 apples are both demanded (bought by consumers) and supplied (sold by producers). This is our equilibrium price. We also have a visualization of the surplus created by these trades. Everything to the left of the equilibrium point and between the supply and demand curves represents surplus: an area where someone is receiving something of more value than they give. For example:

  • When I bought my first apple for $1, but I was willing to spend $5, I made $4 of consumer surplus. The consumer portion of the surplus is everything to the left of the equilibrium point, between the supply and demand curves, and above the equilibrium price point.
  • When a seller sells his first apple for $1, but it only cost $0.50 to produce it, the seller made $0.50 of producer surplus. The producer portion of the surplus is everything to the left of the equilibrium point, between the supply and demand curves, and below the equilibrium price point.

Another way of thinking of surplus is “every time someone got a better price than they would have been willing to take.”

OK, with this in place, we now have enough information to figure out how to price in the tariff, which we’ll treat as a negative externality.

Modeling taxes

Alright, the government has now instituted a $0.50 tariff on every apple sold within the US by a foreign producer. We can generally model taxes by either increasing the marginal cost of each unit sold (shifting the supply curve up), or by decreasing the marginal benefit of each unit bought (shifting the demand curve down). In this case, since only some of the producers will pay the tax, it makes more sense to modify the supply curve.

First, let’s see what happens to the foreign seller-only supply curve when you add in the tariff:

Foreign supply shift from tariff

With the tariff in place, for each quantity level, the price at which the seller will sell is $0.50 higher than before the tariff. That makes sense: if I was previously willing to sell my 82nd apple for $3, I would now need to charge $3.50 for that apple to cover the cost of the tariff. We see this as the tariff “pushing up” or “pushing left” the original supply curve.

We can add this new supply curve to our existing (unchanged) supply curve for domestic-only sellers, and we end up with a result like this:

Supply curves post tariff

The total supply curve adds up the individual foreign and domestic supply curves. At each price point, we add up the total quantity each group would be willing to sell to determine the total quantity supplied for each price point. Once we have that cumulative supply curve defined, we can produce an updated supply-and-demand chart including the tariff:

Supply and demand post tariff

As we can see, the equilibrium has shifted:

  • The equilibrium price paid by consumers has risen from $1 to $1.20.
  • The total number of apples purchased has dropped from 63 apples to 60 apples.
  • Consumers therefore received 3 fewer apples. They spent $72 for these 60 apples, whereas previously they spent $63 and got 3 more apples, a definite decrease in consumer surplus.
  • Foreign producers sold 36 of those apples (see the raw data in the linked Google Sheet), for a gross revenue of $43.20. However, they also need to pay the tariff to the US government, which accounts for $18, meaning they only receive $25.20 post-tariff. Previously, they sold 42 apples at $1 each with no tariff to be paid, meaning they took home $42.
  • Domestic producers sold the remaining 24 apples at $1.20, giving them a revenue of $28.80. Since they don’t pay the tariff, they take home all of that money. By contrast, previously, they sold 21 apples at $1, for a take-home of $21.
  • The government receives $0.50 for each of the 60 apples sold, or in other words receives $30 in revenue it wouldn’t have received otherwise.
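
For readers who want to see the mechanics spelled out, here is a small Haskell sketch of the same exercise. The linear demand and supply curves below are made up for illustration (they are not the numbers from the linked Google Sheet), but they are chosen so that the pre-tariff equilibrium lands at $1; with the $0.50 tariff applied to foreign sellers only, the computed buyer price rises to about $1.20 and foreign sellers net about $0.70, matching the story above.

module Main where

-- Quantity demanded at a given price (downward sloping).
demand :: Double -> Double
demand p = max 0 (100 - 40 * p)

-- Quantities supplied by each group at a given price (upward sloping).
domesticSupply, foreignSupply :: Double -> Double
domesticSupply p = max 0 (20 * p)
foreignSupply  p = max 0 (40 * p)

-- Total supply with a per-unit tariff charged to foreign sellers only:
-- to supply the same quantity as before, foreign sellers now need a
-- price that is higher by the amount of the tariff.
totalSupply :: Double -> Double -> Double
totalSupply tariff p = domesticSupply p + foreignSupply (p - tariff)

-- Approximate the equilibrium price by scanning candidate prices and
-- picking the one where quantity supplied is closest to quantity demanded.
equilibriumPrice :: Double -> Double
equilibriumPrice tariff =
  snd (minimum [ (abs (totalSupply tariff p - demand p), p)
               | p <- [0.50, 0.51 .. 5.00] ])

main :: IO ()
main = do
  let p0 = equilibriumPrice 0.00   -- price before the tariff  (~ $1.00)
      p1 = equilibriumPrice 0.50   -- price with a $0.50 tariff (~ $1.20)
  putStrLn ("Price paid by buyers, before tariff: " ++ show p0)
  putStrLn ("Price paid by buyers, with tariff:   " ++ show p1)
  putStrLn ("Price netted by foreign sellers:     " ++ show (p1 - 0.50))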

We could be more specific about the surpluses, and calculate the actual areas for consumer surplus, producer surplus, inefficiency from the tariff, and government revenue from the tariff. But I won’t bother, as those calculations get slightly more involved. Instead, let’s just look at the aggregate outcomes:

  • Consumers were unquestionably hurt. Their price paid went up by $0.20 per apple, and they received fewer apples.
  • Foreign producers were also hurt. Their price received went down from the original $1 to the new post-tariff price of $1.20, minus the $0.50 tariff. In other words: foreign producers only receive $0.70 per apple now. This hurt can be mitigated by shifting sales to other countries without a tariff, but the pain will exist regardless.
  • Domestic producers scored. They sell more apples at a higher price, and take home more revenue doing it.
  • And the government walked away with an extra $30.

Hopefully you now see the answer to the original questions. Importantly, while the government imposed a $0.50 tariff, neither side fully absorbed that cost. Consumers paid a bit more, foreign producers received a bit less. The exact details of how that tariff was split across the groups is mediated by the relevant supply and demand curves of each group. If you want to learn more about this, the relevant search term is “price elasticity,” or how much a group’s quantity supplied or demanded will change based on changes in the price.

Other taxes

Most taxes are some kind of a tax on trade. Tariffs on apples is an obvious one. But the same applies to income tax (taxing the worker for the trade of labor for money) or payroll tax (same thing, just taxing the employer instead). Interestingly, you can use the same model for analyzing things like tax incentives. For example, if the government decided to subsidize domestic apple production by giving the domestic producers a $0.50 bonus for each apple they sell, we would end up with a similar kind of analysis, except instead of the foreign supply curve shifting up, we’d see the domestic supply curve shifting down.

And generally speaking, this is what you’ll always see with government involvement in the economy. It disrupts an existing equilibrium, lets the market readjust to a new equilibrium, and incentivizes some behavior, causing some people to benefit and others to lose out. As we saw with the apple tariff, domestic producers and the government benefited while others lost.

You can see the reverse though with tax incentives. If I give a tax incentive of providing a deduction (not paying income tax) for preschool, we would end up with:

  • Government needs to make up the difference in tax revenue, either by raising taxes on others or printing more money (leading to inflation). Either way, those paying the tax or those holding government debased currency will pay a price.
  • Those people who don’t use the preschool deduction will receive no benefit, so they simply pay a cost.
  • Those who do use the preschool deduction will end up paying less on tax+preschool than they would have otherwise.

This analysis is fully amoral. It’s not saying whether providing subsidized preschool is a good thing or not, it simply tells you where the costs will be felt, and points out that such government interference in free economic choice does result in inefficiencies in the system. Once you have that knowledge, you’re more well educated on making a decision about whether the costs of government intervention are worth the benefits.

February 04, 2025 12:00 AM

February 02, 2025

Joachim Breitner

Coding on my eInk Tablet

For many years I wished I had a setup that would allow me to work (that is, code) productively outside in the bright sun. It’s winter right now, but when it’s summer again the wish will be back; this weekend I got closer to that goal.

TL;DR: Using code-server on a beefy machine seems to be quite neat.

Passively lit coding

Personal history

Looking back at my own old blog entries I find one from 10 years ago describing how I bought a Kobo eBook reader with the intent of using it as an external monitor for my laptop. It seems that I got a proof-of-concept setup working, using VNC, but it was tedious to set up, and I never actually used it. I subsequently noticed that the eBook reader is rather useful to read eBooks, and it has been in heavy use for that ever since.

Four years ago I gave this old idea another shot and bought an Onyx BOOX Max Lumi. This is an A4-sized tablet running Android, with the very promising feature of an HDMI input. So hopefully I could attach it to my laptop and it would just work™. Turns out that this never worked as well as I hoped: even if I set the resolution to exactly the tablet’s screen’s resolution I got blurry output, and it also drained the battery a lot, so I gave up on this. I subsequently noticed that the tablet is rather useful to take notes, and it has been in sporadic use for that.

Going off on this tangent: I later learned that the HDMI input of this device appears to the system like a camera input, and I don’t have to use Boox’s “monitor” app but can use other apps like FreeDCam as well. This somehow managed to fix the resolution issues, but the setup still wasn’t convenient enough to be used regularly.

I also played around with pure terminal approaches, e.g. SSH’ing into a system, but since my usual workflow was never purely text-based (I was at least used to using a window manager instead of a terminal multiplexer like screen or tmux) that never led anywhere either.

VSCode, working remotely

Since these attempts I have started a new job working on the Lean theorem prover, and working on or with Lean basically means using VSCode. (There is a very good neovim plugin as well, but I’m using VSCode nevertheless, if only to make sure I am dogfooding our default user experience).

My colleagues have said good things about using VSCode with the remote SSH extension to work on a beefy machine, so I gave this a try now as well, and while it’s not a complete game changer for me, it does make certain tasks (rebuilding everything after a switching branches, running the test suite) very convenient. And it’s a bit spooky to run these work loads without the laptop’s fan spinning up.

In this setup, the workspace is remote, but VSCode still runs locally. But it made me wonder about my old goal of being able to work reasonably efficient on my eInk tablet. Can I replicate this setup there?

VSCode itself doesn’t run on Android directly. There are projects that run a Linux chroot or run in termux on the Android system, and then you can use VNC to connect to it (e.g. on Andronix)… but that did not seem promising. It seemed fiddly, and I probably should take it easy on the tablet’s system.

code-server, running remotely

A more promising option is code-server. This is a fork of VSCode (actually of VSCodium) that runs completely on the remote machine, and the client machine just needs a browser. I set that up this weekend and found that I was able to do a little bit of work reasonably.

Access

With code-server one has to decide how to expose it safely enough. I decided against the tunnel-over-SSH option, as I expected that to be somewhat tedious to set up (both initially and for each session) on the android system, and I liked the idea of being able to use any device to work in my environment.

I also decided against the more involved “reverse proxy behind proper hostname with SSL” setups, because they involve a few extra steps, and some of them I cannot do as I do not have root access on the shared beefy machine I wanted to use.

That left me with the option of using a code-server’s built-in support for self-signed certificates and a password:

$ cat .config/code-server/config.yaml
bind-addr: 1.2.3.4:8080
auth: password
password: xxxxxxxxxxxxxxxxxxxxxxxx
cert: true

With trust-on-first-use this seems reasonably secure.

Update: I noticed that the browsers would forget that I trust this self-signed cert after restarting the browser, and also that I cannot “install” the page (as a Progressive Web App) unless it has a valid certificate. But since I don’t have superuser access to that machine, I can’t just follow the official recommendation of using a reverse proxy on port 80 or 443 with automatic certificates. Instead, I pointed a hostname that I control to that machine, obtained a certificate manually on my laptop (using acme.sh) and copied the files over, so the configuration now reads as follows:

bind-addr: 1.2.3.4:3933
auth: password
password: xxxxxxxxxxxxxxxxxxxxxxxx
cert: .acme.sh/foobar.nomeata.de_ecc/foobar.nomeata.de.cer
cert-key: .acme.sh/foobar.nomeata.de_ecc/foobar.nomeata.de.key

(This is getting very specific to my particular needs and constraints, so I’ll spare you the details.)

Service

To keep code-server running I created a systemd service that’s managed by my user’s systemd instance:

~ $ cat ~/.config/systemd/user/code-server.service
[Unit]
Description=code-server
After=network-online.target

[Service]
Environment=PATH=/home/joachim/.nix-profile/bin:/nix/var/nix/profiles/default/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
ExecStart=/nix/var/nix/profiles/default/bin/nix run nixpkgs#code-server

[Install]
WantedBy=default.target

(I am using nix as a package manager on a Debian system there, hence the additional PATH and complex ExecStart. If you have a more conventional setup then you do not have to worry about Environment and can likely use ExecStart=code-server.)

For this to survive me logging out I had to ask the system administrator to run loginctl enable-linger joachim, so that systemd allows my jobs to linger.

Git credentials

The next issue to be solved was how to access the git repositories. The work is all on public repositories, but I still need a way to push my work. With the classic VSCode-SSH-remote setup from my laptop, this is no problem: My local SSH key is forwarded using the SSH agent, so I can seamlessly use that on the other side. But with code-server there is no SSH key involved.

I could create a new SSH key and store it on the server. That did not seem appealing, though, because SSH keys on Github always have full access. It wouldn’t be horrible, but I still wondered if I can do better.

I thought of creating fine-grained personal access tokens that only allow me to push code to specific repositories, and nothing else, and just storing them permanently on the remote server. Still a neat and convenient option, but creating PATs for our org requires approval and I didn’t want to bother anyone on the weekend.

So I am experimenting with Github’s git-credential-manager now. I have configured it to use git’s credential cache with an elevated timeout, so that once I log in, I don’t have to again for one workday.

$ nix-env -iA nixpkgs.git-credential-manager
$ git-credential-manager configure
$ git config --global credential.credentialStore cache
$ git config --global credential.cacheOptions "--timeout 36000"

To log in, I have to visit https://github.com/login/device on an authenticated device (e.g. my phone) and enter an 8-character code. Not too shabby in terms of security. I only wish that webpage would not require me to press Tab after each character…

This still grants rather broad permissions to the code-server, but at least only temporarily.

Android setup

On the client side I could now open https://host.example.com:8080 in Firefox on my eInk Android tablet, click through the warning about self-signed certificates, log in with the fixed password mentioned above, and start working!

I switched to a theme that supposedly is eInk-optimized (eInk by Mufanza). It’s not perfect (e.g. git diffs are unhelpful because it is not possible to distinguish deleted from added lines), but it’s a start. There are more eInk themes on the official Visual Studio Marketplace, but because code-server is a fork it cannot use that marketplace, and for example this theme isn’t on Open-VSX.

For some reason the F11 key doesn’t work, but going fullscreen is crucial, because screen estate is scarce in this setup. I can go fullscreen using VSCode’s command palette (Ctrl-P) and invoking the command there, but Firefox often jumps out of the fullscreen mode, which is annoying. I still have to pay attention to when that’s happening; maybe it’s the Esc key, which I am of course using a lot due to me using vim bindings.

A more annoying problem was that on my Boox tablet, sometimes the on-screen keyboard would pop up, which is seriously annoying! It took me a while to track this down: The Boox has two virtual keyboards installed: The usual Google AOSP keyboard, and the Onyx Keyboard. The former is clever enough to stay hidden when there is a physical keyboard attached, but the latter isn’t. Moreover, pressing Shift-Ctrl on the physical keyboard rotates through the virtual keyboards. Now, VSCode has many keyboard shortcuts that require Shift-Ctrl (especially on an eInk device, where you really want to avoid using the mouse). And the limited settings exposed by the Boox Android system do not allow you to configure that or disable the Onyx keyboard! To solve this, I had to install the KISS Launcher, which would allow me to see more Android settings, and in particular allow me to disable the Onyx keyboard. So this is fixed.

I was hoping to improve the experience even more by opening the web page as a Progressive Web App (PWA), as described in the code-server FAQ. Unfortunately, that did not work. Firefox on Android did not recognize the site as a PWA (even though it recognizes a PWA test page). And I couldn’t use Chrome either because (unlike Firefox) it would not consider a site with a self-signed certificate as a secure context, and then code-server does not work fully. Maybe this is just some bug that gets fixed in later versions.

Now that I use a proper certificate, I can use it as a Progressive Web App, and with Firefox on Android this starts the app in full-screen mode (no system bars, no location bar). The F11 key still doesn’t work, and using the command palette to enter fullscreen does nothing visible, but then Esc leaves that fullscreen mode and I suddenly have the system bars again. But maybe if I just don’t do that I get the full screen experience. We’ll see.

I did not work enough with this yet to assess how much the smaller screen estate, the lack of colors and the slower refresh rate will bother me. I probably need to hide Lean’s InfoView more often, and maybe use the Error Lens extension, to avoid having to split my screen vertically.

I also cannot easily work on a park bench this way, with a tablet and a separate external keyboard. I’d need at least a table, or some additional piece of hardware that turns tablet + keyboard into some laptop-like structure that I can put on my, well, lap. There are cases for Onyx products that include a keyboard, and maybe they work on the lap, but they don’t have the Trackpoint that I have on my ThinkPad TrackPoint Keyboard II, and how can you live without that?

Conclusion

After this initial setup chances are good that entering and using this environment is convenient enough for me to actually use it; we will see when it gets warmer.

A few bits could be better. In particular logging in and authenticating GitHub access could be both more convenient and more safe – I could imagine that when I open the page I confirm that on my phone (maybe with a fingerprint), and that temporarily grants access to the code-server and to specific GitHub repositories only. Is that easily possible?

by Joachim Breitner (mail@joachim-breitner.de) at February 02, 2025 03:07 PM

January 31, 2025

Well-Typed.Com

An introduction to Cabal Hooks for package authors

Over the last year, Well-Typed have carried out significant work in Cabal, Haskell’s build system, thanks to funding from the Sovereign Tech Fund. Our main goal was to re-think the Cabal architecture for building packages. This was historically tied to the Setup command-line interface, with each package technically capable of providing its own independent build system via the Custom build-type. In practice, the full generality of this interface is not useful; it obstructed the development of new features and created a drag on maintenance, so there has long been an appetite to reimagine this interface within Cabal.1

With the release of Cabal-3.14.0.0 and cabal-install-3.14.1.1, the new Hooks build-type we have developed, together with the Cabal-hooks library, are now available to package authors. Over time, we hope to see packages that depend on the Custom build-type gradually migrate to use Hooks instead.

For more background on this work, check out:

In the remainder of this post, we will:

  • dive into the background details of how Cabal works,

  • provide an introduction to the new interfaces for package authors who may wish to adapt their packages.

This post is based on Sam’s talk at the Haskell Ecosystem Workshop 2024.

Background

The Cabal specification

The Cabal specification (2005) was designed to allow Haskell tool authors to package their code and share it with other developers.

The Haskell Package System (Cabal) has the following main goal:

  • to specify a standard way in which a Haskell tool can be packaged, so that it is easy for consumers to use it, or re-package it, regardless of the Haskell implementation or installation platform.

The Cabal concept of a package is a versioned unit of distribution in source format, with enough metadata to allow it to be built and packaged by downstream distributors (e.g. Linux distributions and other build tools).

A Cabal package consists of multiple components which map onto individual Haskell units (e.g. a single library or executable).

The Cabal package model

Each package must bundle some metadata, specified in a .cabal file. Chiefly:

  • the package name and version number,
  • its dependencies, including version bounds (e.g. base >= 4.17 && < 4.21, lens ^>= 5.3),
  • what the package provides (libraries and their exposed modules, executables…),
  • how to build the package (e.g. build-type: Simple).

The Cabal library then implements everything required to build individual packages, first parsing the .cabal file and then building and invoking the Setup script of the package.

The Setup interface

The key component of the original Cabal specification is that each package must provision an executable which is used to build it. As written in an early draft:

To help users install packages and their dependencies, we propose a system similar to Python’s Distutils, where each Haskell package is distributed with a script which has a standard command-line interface.

More precisely, to comply with the Cabal specification, the build system of a package need only implement the Setup command-line interface, i.e. provide a Setup executable that supports invocations of the form ./Setup <cmd>:

<cmd>                   description
configure               resolve compiler, tools and dependencies
build/haddock/repl      prepare sources and build/generate docs/open a session in the interpreter
test/bench              run testsuites or benchmarks
copy/install/register   move files into an image dir or final location/register libraries with the compiler
sdist                   create an archive for distribution/packaging
clean                   clean local files (local package store, local build artifacts, …)

In practice, the ./Setup configure command takes a large number of parameters (as represented in the Cabal ConfigFlags datatype). This configuration is preserved for subsequent invocations, which usually only take a couple of parameters (e.g. ./Setup build -v2 --builddir=<dir>).

This interface can be used directly to build any package, by executing the following recipe:

  • build and install the dependencies in dependency order;
  • to build each individual unit:
    • ./Setup configure <componentName> <configurationArgs>
    • ./Setup build --builddir=<buildDir>
    • ./Setup haddock --builddir=<buildDir> <haddockArgs> (optional, to generate documentation)
  • to make a unit available to units that depend on it:
    • ./Setup copy --builddir=<buildDir> --destDir=<destDir> (this makes executables available, e.g. for build-tool-depends)
    • for libraries, registration (see § Library registration):
      • ./Setup register --builddir=<buildDir> --gen-pkg-config=<unitPkgRegFile>
      • hc-pkg register --package-db=<pkgDb> <unitPkgRegFile>

Usually, these steps will be executed by a build tool such as cabal-install, which provides a more convenient user interface than invoking Setup commands directly. Some systems (such as nixpkgs) do directly use this interface, however.

The tricky parts in the above are:

  • passing appropriate arguments to ./Setup configure, in particular exactly specifying dependencies,2 and making sure the arguments are consistent with those expected by the cabal-version of the package,3
  • constructing the correct environment for invoking ./Setup, e.g. adding appropriate build-tool-depends executables in PATH and defining the corresponding <buildTool>_datadir environment variables.

Library registration

In the above recipe to build packages, there was a single step which wasn’t an invocation of the Setup script: a call to hc-pkg. To quote from the original Cabal specification:

  • Each Haskell compiler hc must provide an associated package-management program hc-pkg. A compiler user installs a package by placing the package’s supporting files somewhere, and then using hc-pkg to make the compiler aware of the new package. This step is called registering the package with the compiler.
  • To register a package, hc-pkg takes as input an installed package description (IPD), which describes the installed form of the package in detail.

This is the key interchange mechanism between Cabal and the Haskell compiler.

The installed package description format is laid out in the Cabal specification; in brief, it contains all the information the Haskell compiler needs to use a library, such as its exposed modules, its dependencies, and its installation path. This information can be seen by calling hc-pkg describe:

> ghc-pkg describe attoparsec --package-db=<cabal-store>/<ghc-ver>/package.db
name:            attoparsec
version:         0.14.4
visibility:      public
id:              attoparsec-0.14.4-b35cdbf2c0654f3ef00c00804c5e2b390700d4a0
abi:             d84b6b3e46222f7ab87b5a2d405e7f48
exposed:         True
exposed-modules:
    Data.Attoparsec Data.Attoparsec.ByteString
    [...]
hidden-modules:
    Data.Attoparsec.ByteString.Internal Data.Attoparsec.Text.Internal
depends:
    array-0.5.7.0-9340
    attoparsec-0.14.4-ab0b5b7d4498267368f35b0c9f521e31e33fe144
    base-4.20.0.0-30dc bytestring-0.12.1.0-b549 containers-0.7-2f81
    deepseq-1.5.0.0-30ad ghc-prim-0.11.0-d05e
    scientific-0.3.6.2-d4ceb07500a94c3c60cb88dff4bfb53d40348b25
    text-2.1.1-e169 transformers-0.6.1.1-6955

Note that, perhaps confusingly, the hc-pkg interface is not concerned with Cabal’s notion of “packages”. Rather, it deals only in “units”; these generally map to Cabal components, such as the package’s main library and its private and public sublibraries. For example, the internal attoparsec-internal sublibrary of the attoparsec package is registered separately:

> ghc-pkg describe z-attoparsec-z-internal
name:            z-attoparsec-z-attoparsec-internal
version:         0.14.4
package-name:    attoparsec
lib-name:        attoparsec-internal
id:              attoparsec-0.14.4-ab0b5b7d4498267368f35b0c9f521e31e33fe144
abi:             908ae57d09719bcdfb9cf85a27dab0e4
exposed-modules:
    Data.Attoparsec.ByteString.Buffer
    Data.Attoparsec.ByteString.FastSet Data.Attoparsec.Internal.Compat
    [...]
depends:
    array-0.5.7.0-9340 base-4.20.0.0-30dc bytestring-0.12.1.0-b549
    text-2.1.1-e169

How the Setup interface is used by packages

Centering the package build process around the Setup script provides a great deal of flexibility to package authors, as the Setup executable can be implemented in any way the package author chooses. In this way, each package brings its own build system.

However, in practice, this is more expressiveness than most library authors want or need. Consequently, almost all packages use one of the following two build systems:

  1. build-type: Simple (most packages). For such packages, the Setup.hs file is of the following form:

    module Main where
    import Distribution.Simple (defaultMain)
    main = defaultMain

    This means that the ./Setup CLI interface maps directly to the implementation provided by the Cabal library.

  2. build-type: Custom where the Setup.hs file uses the Cabal library to perform most of the build, but brackets some of its logic with package-specific code using the Cabal UserHooks mechanism, e.g. so that it runs custom configuration code after Cabal configure, or generates module sources before running Cabal build.

For an example of case (2), the custom Setup.hs code for hooking into the configure phase might look like the following:

main =
  ( defaultMainWithHooks simpleUserHooks )
    { confHook = \ info cfgFlags -> do
        info' <- customPreConfHook info cfgFlags
        confHook simpleUserHooks info' cfgFlags
    }

In this example, simpleUserHooks means “no hooks” (or more accurately “exactly the hooks that build-type: Simple uses”). So the above snippet shows how we can include custom logic in customPreConfHook in order to update the Cabal GenericPackageDescription, before calling the Cabal library configure function (via confHook simpleUserHooks). Here, a GenericPackageDescription is the representation of a .cabal file used by Cabal (the Generic part means “before attempting to resolve any conditionals”).
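
For instance, a minimal customPreConfHook might look like the following sketch. The argument types are those of the usual UserHooks confHook (a pair of the GenericPackageDescription and the HookedBuildInfo, plus the ConfigFlags); the body shown is purely illustrative.

customPreConfHook
  :: ( GenericPackageDescription, HookedBuildInfo )
  -> ConfigFlags
  -> IO ( GenericPackageDescription, HookedBuildInfo )
customPreConfHook ( gpd, hbi ) cfgFlags = do
  -- Inspect the system or the configure flags here, and adjust the package
  -- description before handing it back to the Cabal library's confHook.
  let gpd' = gpd  -- illustration only: returned unchanged
  return ( gpd', hbi )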

The fact that Setup executables may (in principle) be arbitrary when using build-type: Custom fundamentally limits what build tools such as cabal-install or the Haskell Language Server can do in multi-package projects. The tool has to treat the build system of each package as an opaque black box, merely invoking functionality defined by the specific version of the Setup interface supported by the package.

The main observation is that, in practice, custom Setup.hs scripts only insert benign modifications to the build process: they still fundamentally rely on the Cabal library to do the bulk of the work building the package.

A replacement for Custom setup scripts

The limitations of the Setup interface discussed above motivate the need for a new mechanism to customise the build system of a package:

  • The bulk of the work should be carried out by the Cabal library, which exposes functions such as configure and build, but these need to be augmented with hooks so that individual packages can customise certain phases.
  • The hooks provided by this mechanism should be kept to a minimum (to give more flexibility to build tools) while still accommodating the needs of package authors in practice.
  • Customisation should be declared by a Haskell library interface (as opposed to the black-box command-line interface of Setup.hs), in order to enable as much introspection by build systems as possible.

This will enable a gradual restructuring of build tools such as cabal-install away from the Setup command-line interface, which has grown unwieldy due to the difficulty of evolving it to meet requirements that could not be foreseen when it was created.

Building on this understanding, as well as a survey of existing use cases of build-type: Custom, we have introduced an alternative mechanism for customizing how a package is built: build-type: Hooks. This mechanism does not allow arbitrary replacement of the usual Cabal build logic, but rather merely exposes a set of well-defined hooks which bracket a subset of Cabal’s existing build steps.

We arrived at this design through collaboration with Cabal developers, users, and packagers as part of an RFC process in Haskell Foundation Tech Proposal #60.

Introducing build-type: Hooks

The main documentation for usage of the hooks API is provided in the Haddocks for the Cabal-hooks package. The Cabal Hooks overlay contains patched packages using build-type: Hooks. It can be used as an overlay like head.hackage, for constructing build plans without any build-type: Custom packages. It can also serve as a reference for usage of the API.

At a high-level, a package with build-type: Hooks:

  • declares in its .cabal file:
    • a cabal-version of at least 3.14,
    • build-type: Hooks,
    • a custom-setup stanza with a dependency on Cabal-hooks (the latter is a library bundled with Cabal that provides the API for writing hooks):
cabal-version: 3.14
...
build-type: Hooks
...

custom-setup
  setup-depends:
    base        >= 4.18 && < 5,
    Cabal-hooks >= 0.1  && < 0.2
  • contains a SetupHooks.hs Haskell module source file, next to the .cabal file, which specifies the hooks the package uses. This module exports a value setupHooks :: SetupHooks (in which the SetupHooks type is exported by Distribution.Simple.SetupHooks from the Cabal-hooks package).
module SetupHooks where

-- Cabal-hooks
import Distribution.Simple.SetupHooks

setupHooks :: SetupHooks
setupHooks =
  noSetupHooks
    { configureHooks = myConfigureHooks
    , buildHooks = myBuildHooks }

The new hooks fall into the following categories:

  • configure hooks allow customising how a package will be built
  • pre-build rules allow generating source files to be built
  • post-build hooks allow the package to customise the linking step
  • install hooks allow the package to install additional files alongside the usual binary artifacts

In the remainder of this blog post, we will focus on the two most important (and most commonly used) hooks: configure hooks and pre-build rules.

Configure hooks

The configure hooks allow package authors to make decisions about how to build their package, by modifying the Cabal package description (which is Cabal’s internal representation of the information in a .cabal file). Crucially, these modifications will persist to all subsequent phases.

Configuration happens at two levels:

  • global configuration covers the entire package,
  • local configuration covers a single component.

There are three hooks into the configure phase:

  1. Package-wide pre-configure. This can be used for custom logic in the style of traditional ./configure scripts, e.g. finding out information about the system and configuring dependencies, when those don’t easily fit into Cabal’s framework.

  2. Package-wide post-configure. This can be used to write custom package-wide information to disk, to be consumed by (3).

  3. Per-component pre-configure. This can be used to modify individual components, e.g. adding exposed modules or specifying flags to be used when building the component.

Per-package configuration

Suppose our package needs to use some external executable, e.g. a preprocessor. If we require custom logic to find this external executable on the system, or to parse its version number, we need to go beyond Cabal’s built-in support for build-tool-depends.

We can do this in a pre-configure hook:

myConfigureHooks :: ConfigureHooks
myConfigureHooks =
  noConfigureHooks
    { preConfigurePackageHook = Just configureCustomPreProc }

configureCustomPreProc :: PreConfPackageInputs -> IO PreConfPackageOutputs
configureCustomPreProc pcpi@( PreConfPackageInputs { configFlags = cfg, localBuildConfig = lbc } ) = do
  let verbosity = fromFlag $ configVerbosity cfg
      progDb = withPrograms lbc
  configuredPreProcProg <-
    configureUnconfiguredProgram verbosity customPreProcProg progDb
  return $
    ( noPreConfPackageOutputs pcpi )
      { extraConfiguredProgs =
        Map.fromList
          [ ( customPreProcName, configuredPreProcProg ) ]
      }

customPreProcName :: String
customPreProcName = "customPreProc"

customPreProcProg :: Program
customPreProcProg =
  ( simpleProgram customPreProcName )
    { programFindLocation =
        -- custom logic to find the installed location of myPreProc
        -- on the system used to build the package
        myPreProcProgFindLocation
    , programFindVersion =
        -- custom logic to find the program version
        myPreProcProgFindVersion
    }

Cabal will then add this program to its program database, allowing the program to be used to satisfy build-tool-depends requirements, as well as making it available in subsequent hooks (e.g. pre-build hooks).

Modifying individual components

Suppose we want to modify a component of a Cabal package, e.g. inserting configuration options determined by inspecting the system used to build the package (e.g. availability of certain processor capabilities). We can do this using hooks into the configure phase. For illustration, consider the following example, which includes:

  • a package-wide post-configure hook, which inspects the system to determine availability of AVX2 CPU features, and writes it out to a "system-info" file,
  • a per-component pre-configure hook which reads the "system-info" file, and uses that to pass appropriate compiler options (e.g. -mavx2) when compiling each component.
myConfigureHooks :: ConfigureHooks
myConfigureHooks =
  noConfigureHooks
    { postConfPackageHook  = Just writeSystemInfo
    , preConfComponentHook = Just confComps
    }

data SystemInfo = SystemInfo { supportsAVX2 :: !Bool }
  deriving stock ( Show, Read )
    -- Show/Read for a quick-and-dirty serialisation interface (illustration only)

systemInfoFlags :: SystemInfo -> [ String ]
systemInfoFlags ( SystemInfo { supportsAVX2 } ) =
  [ "-mavx2" | supportsAVX2 ]

writeSystemInfo :: PostConfPackageInputs -> IO ()
writeSystemInfo ( PostConfPackageInputs { packageBuildDescr = pbd } ) = do
  let cfg = LBC.configFlags pbd
      distPref = fromFlag $ configDistPref cfg
      mbWorkDir = flagToMaybe $ configWorkingDir cfg
  supportsAVX2 <- System.Cpuid.Basic.supportsAVX2
  -- + more system-wide checks, if desired
  writeFile ( interpretSymbolicPath mbWorkDir $ systemInfoFile distPref )
    ( show $ SystemInfo { supportsAVX2 } )

systemInfoFile :: SymbolicPath Pkg ( Dir Dist ) -> SymbolicPath Pkg File
systemInfoFile distPref = distPref </> makeRelativePathEx "system-info"

confComps :: PreConfComponentInputs -> IO PreConfComponentOutputs
confComps pcci@( PreConfComponentInputs { packageBuildDescr = pbd, component = comp } ) = do
  let cfg = LBC.configFlags pbd
      distPref = fromFlag $ configDistPref cfg
      mbWorkDir = flagToMaybe $ configWorkingDir cfg
  sysInfo <- read <$> readFile ( interpretSymbolicPath mbWorkDir $ systemInfoFile distPref )
  let opts = systemInfoFlags sysInfo
      bi' = emptyBuildInfo
              { ccOptions = opts
              , cxxOptions = opts
              , options = PerCompilerFlavor opts []
              }
  return $
    ( noPreConfComponentOutputs pcci )
      { componentDiff =
         buildInfoComponentDiff ( componentName comp ) bi'
      }

Pre-build rules

Pre-build rules can be used to generate Haskell source files which can then be built as part of the compilation of a unit. Since we want to ensure that such generated modules don’t break recompilation avoidance (thereby crippling HLS and other interactive tools), these hooks comprise a simple build system. They are described in the Haddock documentation for Cabal-hooks.

The overall structure is that one specifies a collection of Rules using the monadic API of the RulesM monad.

Each individual rule contains a Command, consisting of a statically specified action to run (e.g. a preprocessor such as alex, happy or c2hs) bundled with (possibly dynamic) arguments (such as the input and output filepaths). In the Hooks API, these are constructed using the mkCommand function. The actions are referenced using static pointers; this allows the static pointer table of the SetupHooks module to be used as a dispatch table for all the custom preprocessors provided by the hooks.

One registers rules using staticRule, declaring the inputs and outputs of each rule. In this way, we can think of each rule as corresponding to an individual invocation of a custom preprocessor. Rules are also allowed to have dynamic dependencies (using dynamicRule instead of staticRule); this supports use-cases such as C2Hs in which one needs to first process .chs module headers to discover the import structure.

Let’s start with a simple toy example to get used to the API: declare hooks that run alex on Lexer.alex and happy on Parser.happy (running alex/happy on *.x/*.y files is built into Cabal, but this is just for illustrative purposes).

{-# LANGUAGE StaticPointers #-}
-- [...]
myBuildHooks :: BuildHooks
myBuildHooks =
  noBuildHooks
    { preBuildComponentRules =
      Just $ rules ( static () ) myPreBuildRules
    }

myPreBuildRules :: PreBuildComponentInputs -> RulesM ()
myPreBuildRules pbci = do
  -- [...]
  -- Define the alex/happy commands.
      alexCmd  = mkCommand ( static Dict ) ( static runAlex )
      happyCmd = mkCommand ( static Dict ) ( static runHappy )
  -- Register a rule: run alex on Lexer.alex, producing Lexer.hs.
  let lexerInFile  = Location srcDir     ( makeRelativePathEx "Lexer.alex" )
      lexerOutFile = Location autogenDir ( makeRelativePathEx "Lexer.hs" )
  registerRule_ "alex:Lexer" $
    staticRule ( alexCmd ( verbosity, mbWorkDir, alex, lexerInFile, lexerOutFile ) )
      {- inputs  -} [ FileDependency lexerInFile ]
      {- outputs -} ( NE.singleton lexerOutFile )
  -- Register a rule: run happy on Parser.happy, producing Parser.hs.
  let parserInFile  = Location srcDir     (  makeRelativePathEx "Parser.happy" )
      parserOutFile = Location autogenDir (  makeRelativePathEx "Parser.hs" )
  registerRule_ "happy:Parser" $
    staticRule ( happyCmd ( verbosity, mbWorkDir, happy, parserInFile, parserOutFile ) )
      {- inputs  -} [ FileDependency parserInFile ]
      {- outputs -} ( NE.singleton parserOutFile )

runAlex, runHappy :: ( Verbosity, Maybe ( SymbolicPath CWD ( Dir Pkg ) ), ConfiguredProgram, Location, Location ) -> IO ()
runAlex  = runPp ( Suffix "x" )
runHappy = runPp ( Suffix "y" )

runPp :: Suffix
      -> ( Verbosity, Maybe ( SymbolicPath CWD ( Dir Pkg ) ), ConfiguredProgram, Location, Location )
      -> IO ()
runPp ( Suffix ppExt ) ( verbosity, mbWorkDir, ppProg, inLoc, outLoc ) = do
  -- Alex/Happy expect files with a specific extension,
  -- so we make a new temporary file and copy its contents,
  -- giving the file the expected file extension.
  tempDir <- makeSymbolicPath <$> getTemporaryDirectory
  withTempFileCwd mbWorkDir tempDir ( "." <> ppExt ) $ \ inputPpFile _ -> do
    copyFileVerbose verbosity
      ( interpretSymbolicPath mbWorkDir $ location inLoc )
      ( interpretSymbolicPath mbWorkDir inputPpFile )
    runProgramCwd verbosity mbWorkDir ppProg
      [ getSymbolicPath inputPpFile
      , "-o"
      , getSymbolicPath ( location outLoc )
      ]

The static Dict arguments to mkCommand provide evidence that the arguments passed to the preprocessor can be serialised and deserialised. While syntactically inconvenient for writers of Hooks, this crucially allows external build tools (such as cabal-install or HLS) to run and re-run individual build rules without re-building everything, as explained in the Haskell Foundation Tech Proposal #60.

Rules are allowed to depend on the output of other rules, as well as directly on files (using the Location datatype). If rule B depends on a file generated by rule A, then one must declare A as a rule dependency of B (and not use a file dependency).

To summarise, the general structure is that we use the monadic API to declare a collection of rules (usually, one rule per Haskell module we want to generate, but a rule can generate multiple outputs as well). Each rule stores a reference (via StaticPointers) to a command to run, as well as the (possibly dynamic) arguments to that command. We can think of the pre-build rules as a table of statically known custom pre-processors, together with a collection of invocations of these custom pre-processors with specific arguments.

A word of warning: authors of pre-build rules should use the static keyword at the top-level whenever possible in order to avoid GHC bug #16981. In the example above, this corresponds to defining runAlex and runHappy at the top-level, instead of defining them in-line in the body of myPreBuildRules.

Custom pre-processors

To illustrate how to write pre-build rules, let’s suppose one wants to declare a custom preprocessor, say myPreProc, which generates Haskell modules from *.hs-mypp files. Any component of the package which requires such pre-processing would declare build-tool-depends: exe:myPreProc.

The pre-build rules can be structured as follows:

  1. Look up the pre-processor in the Cabal ProgramDb (program database).
  2. Define how, given input/output files, we should invoke this preprocessor, e.g. what arguments should we pass to it.
  3. Search for all *.hs-mypp files relevant to the project, monitoring the results of this search (for recompilation checking).
  4. For each file found by the search in (3), register a rule which invokes the processor as in (2).
{-# LANGUAGE StaticPointers #-}
myBuildHooks =
  noBuildHooks
    { preBuildComponentRules =
        Just $ rules ( static () ) myPreBuildRules
    }

myPreBuildRules :: PreBuildComponentInputs -> RulesM ()
myPreBuildRules
  PreBuildComponentInputs
    { buildingWhat   = what
    , localBuildInfo = lbi
    , targetInfo     = TargetInfo { targetComponent = comp, targetCLBI = clbi }
    } = do
  let verbosity = buildingWhatVerbosity what
      progDb = withPrograms lbi
      bi = componentBuildInfo comp
      mbWorkDir = mbWorkDirLBI lbi
  -- 1. Look up our custom pre-processor in the Cabal program database.
  for_ ( lookupProgramByName myPreProcName progDb ) $ \ myPreProc -> do
    -- 2. Define how to invoke our custom preprocessor.
    let myPpCmd :: Location -> Location -> Command MyPpArgs ( IO () )
        myPpCmd inputLoc outputLoc =
          mkCommand ( static Dict ) ( static ppModule )
            ( verbosity, mbWorkDir, myPreProc, inputLoc, outputLoc )

    -- 3. Search for "*.hs-mypp" files to pre-process in the source directories of the package.
    let glob = GlobDirRecursive [ WildCard, Literal "hs-mypp" ]
    myPpFiles <- liftIO $ for ( hsSourceDirs bi ) $ \ srcDir -> do
      let root = interpretSymbolicPath mbWorkDir srcDir
      matches <- runDirFileGlob verbosity Nothing root glob
      return
        [ Location srcDir ( makeRelativePathEx match )
        | match <- globMatches matches
        ]
    -- Monitor existence of file glob to handle new input files getting added.
    --   NB: we don't have to monitor the contents of the files, because the files
    --       are declared as inputs to rules, which means that their contents are
    --       automatically tracked.
    addRuleMonitors [ monitorFileGlobExistence $ RootedGlob FilePathRelative glob ]
      -- NB: monitoring a directory recursive glob isn't currently supported;
      -- but implementing support would be a nice newcomer-friendly task for cabal-install.
      -- See https://github.com/haskell/cabal/issues/10064.

    -- 4. Declare rules, one for each module to be preprocessed, with the
    --    corresponding preprocessor invocation.
    for_ ( concat myPpFiles ) $ \ inputLoc@( Location _ inputRelPath ) -> do
      let outputBaseLoc = autogenComponentModulesDir lbi clbi
          outputLoc =
            Location
              outputBaseLoc
              ( unsafeCoerceSymbolicPath $ replaceExtensionSymbolicPath inputRelPath "hs" )
      registerRule_ ( toShortText $ getSymbolicPath inputRelPath ) $
        staticRule ( myPpCmd inputLoc outputLoc ) [] ( outputLoc NE.:| [] )

type MyPpArgs = ( Verbosity, Maybe ( SymbolicPath CWD ( Dir Pkg ) ), ConfiguredProgram, Location, Location )
  -- NB: this could be a datatype instead, but it would need a 'Binary' instance.

ppModule :: MyPpArgs -> IO ()
ppModule ( verbosity, mbWorkDir, myPreProc, inputLoc, outputLoc ) = do
  let inputPath  = location inputLoc
      outputPath = location outputLoc
  createDirectoryIfMissingVerbose verbosity True $
    interpretSymbolicPath mbWorkDir $ takeDirectorySymbolicPath outputPath
  runProgramCwd verbosity mbWorkDir myPreProc
    [ getSymbolicPath inputPath, getSymbolicPath outputPath ]

This might all be a bit much on first reading, but the key principle is that we are declaring a preprocessor, and then registering one invocation of this preprocessor per *.hs-mypp file:

  • In myPpCmd, the occurrence of static ppModule can be thought of as declaring a new preprocessor,4 with ppModule being the function to run. This is accompanied by the neighbouring static Dict occurrence, which provides a way to serialise and deserialise the arguments passed to preprocessor invocations.

  • We register one rule per each module to pre-process, which means that external build tools can re-run the preprocessor on individual modules when the source *.hs-mypp file changes.

Conclusion

This post has introduced build-type: Hooks for the benefit of package authors who use build-type: Custom. We hope that this introduction will inspire and assist package authors to move away from build-type: Custom in the future.

We encourage package maintainers to explore build-type: Hooks and contribute their feedback on the Cabal issue tracker, helping refine the implementation and expand its adoption across the ecosystem. To assist such explorations, we also recall the existence of the Cabal Hooks overlay, an overlay repository like head.hackage which contains packages that have been patched to use build-type: Hooks instead of build-type: Custom.

In addition to the work described here, we have done extensive work in cabal-install to address technical debt and enable it to make use of the new interface as opposed to going through the Setup CLI. The changes needed in cabal-install and other build tools (such as HLS) will be the subject of a future post.

While there remains technical work needed in cabal-install and HLS to fully realize the potential of build-type: Hooks, it should eventually lead to:

  • decreases in build times,
  • improvements in recompilation checking,
  • more robust HLS support,
  • removal of most limitations of build-type: Custom, such as the lack of ability to use multiple sublibraries,
  • better long-term maintainability of the Cabal project.

Well-Typed are grateful to the Sovereign Tech Fund for funding this work. In order to continue our work on Cabal and the rest of the Haskell tooling ecosystem, we are offering Haskell Ecosystem Support Packages. If your company relies on Haskell, please encourage them to consider purchasing a package!


  1. See, for example, Cabal issue #3600.↩︎

  2. e.g. --package-db=<pkgDb>, --cid=<unitId> and --dependency=<depPkgNm>:<depCompNm>=<depUnitId> arguments↩︎

  3. The cabal-version field of a package description specifies the version of the Cabal specification it expects. As the Cabal specification evolves, so does the set of flags understood by the Setup CLI. This means that, when invoking the Setup script for a package, the build tool needs to be careful to pass arguments consistent with that version; see for instance how cabal-install handles this in Distribution.Client.Setup.filterConfigureFlags.↩︎

  4. In practice, this means adding an entry to the static pointer table.↩︎

by sam at January 31, 2025 12:00 AM

January 30, 2025

Tweag I/O

Writing a formatter has never been so easy: a Topiary tutorial

A bit more than one year ago, Tweag announced our open-source, universal formatting engine Topiary, based on the tree-sitter ecosystem. Since then, Topiary has been serving as the official formatter (under the hood) for the Nickel configuration language. Topiary also supports a bunch of other languages (CSS, TOML, OCaml, Bash) and we are seeing people trying it out to support even more languages such as Catala, Nushell, Nix, and more. While I’ve kind of been part of the project from a distance, I’m first and foremost a happy user of Topiary, which I genuinely find really cool both conceptually and practically. While the technical documentation provides an extensive description of Topiary’s capabilities, it doesn’t include (as of now) a complete step-by-step guide on how to write a new formatter for your own language starting from zero. In this post, I’ll show you precisely how to do that.

Why you should use Topiary

Let’s say that you’ve authored a great payroll management application and created a new niche programming language named Yolo to describe tax logic for different countries (tax calculation is all but a trivial subject!). Developers these days aren’t satisfied with an obscure command-line interpreter anymore. They expect beautiful colors, they expect auto-completion, they expect automatic and uniform formatting, they expect package management and a package registry to distribute their code!

While some of those features are just too much work for a niche language, formatting does sound like a basic commodity that you could provide. Alas, this is only true on the surface. At a high-level, a formatter performs the following steps:

  1. Parse the input to a structured representation
  2. Pretty-print the result while respecting parts of the original layout (comments, some line breaks, etc.)

Sometimes you can reuse the parser and the representation of your language implementation, but it’s not guaranteed, as parsing for formatting, for interpretation, or for compilation each comes with different requirements. If you’ve ever written a serious pretty-printer, with indentation, single-line versus multi-line layout, line-wrapping and all, you’ll know that it’s also not as simple as it looks. For a serious formatter, you’ll need to search for a variety of patterns and treat them in a specific way.

The worst part about all of this is that many of these tasks are generic (not language specific) and laborious, but we still need to reimplement them for every formatter under the sun. It’s frustrating!

This is where Topiary comes in. Topiary is a generic formatter that leverages tree-sitter, an incremental parsing framework. Chances are your language already has a tree-sitter grammar, or it probably should, if you want basic editor support such as syntax highlighting. Given a tree-sitter grammar definition for a language, Topiary will handle parsing and pretty-printing automatically for you. What’s left to do is to use Topiary’s declarative language to write formatting rules. You can focus on the actual logic of the formatter and delegate the boring stuff to Topiary.

As a teaser, beyond the initial setup, you’ll only need to write rules that look like this somewhere in a file:

; Add indentation to the condition of pattern guards in a match branch
(match_branch
  (pattern_guard
    "if" @append_indent_start
    (term) @append_indent_end
  )
)

And you’ll get a formatter! Neat, isn’t it?

There is one caveat: Topiary doesn’t plan to officially support formatting whitespace-sensitive languages, such as Python or Haskell. Depending on the language, it might or might not be doable, but it is likely to be troublesome.

Writing a formatter for Yolo

A Yolo file defines inputs and outputs for a tax calculation using the eponymous keywords:

input income, status
output net_income, income_tax

The rest of the file defines the output as functions of the inputs and other outputs. They can be either simple arithmetic formulas, or they can be defined by case analysis with basic support for boolean conditions:

income_tax := case {
  status = "exempted" | income < 10000 => 0,
  _ => income * 0.2
}

net_income := income - income_tax

Step 1: the tree-sitter grammar

This tutorial isn’t about writing a tree-sitter grammar, but since it’s a requirement for Topiary and I want this post to be exhaustive, I can’t just leave this part out. I’ll quickly cover how to spin up a tree-sitter grammar for a language and how to understand tree-sitter output.

Setup

You’ll need to install the tree-sitter CLI with a recent version (tested with 0.24). I’ll use Nix to install it, but other installation methods are documented in the tree-sitter documentation.

$ nix profile install nixpkgs#tree-sitter
$ mkdir tree-sitter-yolo
$ cd tree-sitter-yolo
$ tree-sitter init
[.. prompts from tree-sitter to init your repo ..]

tree-sitter init generates a bunch of files, but the one we care about is grammar.js. This is a grammar definition of your language in JavaScript. I won’t go into the details of tree-sitter grammar development but instead just provide a simple definition for our toy language Yolo.

Here is a simple tree-sitter grammar for Yolo. Even if you don’t know JavaScript nor tree-sitter very well, it should be reasonably readable.

Then, we need to ask tree-sitter to generate the parser source files for Yolo and build it:

tree-sitter generate
tree-sitter build

If everything went well, you should have a file yolo.so at the root of your grammar directory.

The grammar

The grammar defines the shape of the tree that tree-sitter will produce and that your formatter will manipulate. You might need to refine the grammar later to support finer formatting rules.

What’s important to understand is how a parse tree is represented. Let’s take the original Yolo example in full and put it in a test.yolo file:

input income, status
output net_income, income_tax

income_tax := case {
  status = "exempted" | income < 10000 => 0,
  _ => income * 0.2
}

net_income := income - income_tax

tree-sitter will parse it to a tree that looks like this1 (some subtrees have been collapsed for brevity):

[image: rendered parse tree of test.yolo, with some subtrees collapsed]

Images aren’t really suitable for interaction and automation, though. Fortunately, tree-sitter uses a syntax called S-expressions to represent and manipulate such trees as text. You can ask tree-sitter to print the text representation:

tree-sitter parse test.yolo --no-ranges

The full output is a bit verbose, but very instructive. Let’s take a quick look at it. I’ve added the corresponding source next to each node as a ;-delimited comment for clarity. The nesting structure is given by the parentheses, which introduce a new node starting with a name and followed by the node’s children.

(tax_rule
  (statement
    (input_statement              ; input income, status
      (identifier)                ; income
      (identifier)))              ; status
  (statement
    (output_statement             ; output net_income, income_tax
      (identifier)                ; net_income
      (identifier)))              ; income_tax
  (statement
    (definition_statement         ; income_tax := case { ... }
      (identifier)                ; income_tax
      (expression
        (case                     ; case { ... }
          (case_branch            ; status = "exempted" | income < 10000 => 0
            condition: (condition ; status = "exempted" | income < 10000
              (condition          ; status = "exempted"
                (identifier)      ; status
                (expression
                  (string)))      ; "exempted"
              [..])               ; | income < 10000
            body: (expression
              (number)))          ; 0
          (case_branch            ; _ => income * 0.2
           [..])
  [..]

You can take another look at the image above and try to match each node with a line in the S-expression (beware that I didn’t collapse exactly the same parts in the S-expression and in the image). We can see labels such as condition: and body: which we have introduced in the grammar using the tree-sitter field() helper, to make things easier to read and to use.

Some nodes seem to be missing from the S-expression: where are the operators or keywords such as |, :=, or case? Those are unnamed nodes in the tree-sitter jargon, which are hidden by default in the S-expression representation — but they are there in the tree nonetheless.

Step 2: the Topiary setup

Let’s now install Topiary and extend it with our grammar. Since Topiary 0.5, we don’t need to mess with the source code nor rebuild it anymore to add a custom language. Instead we can configure it.

First, install Topiary version 0.5.1 or higher. I will once again use Nix magic2, but the Topiary repository comes with pre-built binaries and other installation methods.

nix profile install github:tweag/topiary

Then, write the following Nickel configuration file in your grammar repository:

# topiary-yolo.ncl
{
  languages = {
    yolo = {
      extensions = ["yolo"],
      grammar.source.path = "/path/to/tree-sitter-yolo/yolo.so",
    }
  }
}

This defines the file extensions for yolo and the path to the compiled grammar3. If one day the grammar is published to a git repository, you can specify a git repository and a revision instead. See Topiary’s documentation for more information.

The last ingredient is the query file, which contains the formatting rules. We’ll start with an empty one:

mkdir -p ~/.config/topiary/queries
touch ~/.config/topiary/queries/yolo.scm

Using TOPIARY_LANGUAGE_DIR to point Topiary to our extra query directory, we can now try to format our program. Topiary formats in-place by default, but for now we use shell redirections to avoid mutating the original file:

$ export TOPIARY_LANGUAGE_DIR=~/.config/topiary/queries
$ topiary format --configuration topiary-yolo.ncl --skip-idempotence --language yolo < test.yolo
inputincome,statusoutputnet_income,income_taxincome_tax:=case{status="exempted"|income<10000=>0,_=>income*0.2}net_income:=income-income_tax

Well, that's not exactly what we expected, but something happened! Because our formatter is essentially empty, and Topiary considers languages to be whitespace-insensitive by default, all spaces have just been eaten up (--skip-idempotence disables a sanity check that would have rejected the output).

We can finally start to write the meat of our Yolo formatter to fix this!

Step 3: the queries

Queries are patterns that match subtrees of the input. A query is decorated with captures, which are attributes that are attached to matched nodes (prefixed with the @ sign). When a query matches, the tree is decorated with the corresponding captures. For tree-sitter, captures are generic extra annotations, but Topiary interprets them to format the output as desired.

I encourage you to read the reference documentation on tree-sitter queries at some point. Topiary’s README lists all captures that you can use with Topiary. Comments are introduced with a leading ; in the query file.

In the following, the code snippets are to be appended to the query file ~/.config/topiary/queries/yolo.scm. First, we’ll tell Topiary to ensure some spacing around operators:

; Do not mess with spaces within strings
(string) @leaf

; Do not remove empty lines between statements, for readability and space
(statement) @allow_blank_line_before

; Always surround operators with spaces
[
  "="
  ">"
  "<"
  "&"
  "|"
  "_"
  "=>"
  "+"
  "-"
  "*"
  ":="
] @prepend_space @append_space

Those queries will match the corresponding nodes wherever they appear in the tree. Now, let’s stipulate that each statement must be separated by at least a new line:

; Add a newline between two consecutive statements
(
  (statement) @append_hardline
  .
  (statement)
)

We’ve used a tree-sitter anchor ., which ensures that this pattern matches two consecutive statements with nothing in between (except maybe unnamed nodes), so that we don’t add a new line before the first one or after the last one, but only between each consecutive pair. Topiary won’t add a second new line if the source already has one: existing spacing is mostly forgotten (except when using @allow_blank_line_before or @append/prepend_input_softline) while query-introduced spacing is accumulated and flattened (this includes whitespace and line breaks). For example, if two different queries append a space after a node, the final result will still be that only one space is appended.

The statement nodes have more content than the query suggests: looking back at the output of tree-sitter parse in step 1, each one has a single child and many grand-children. Indeed, tree-sitter queries let you omit irrelevant siblings and children by default.

Let’s format the case branches now. We want to put the initial case { on the same line, then each branch indented and on their own line, and finally the closing } alone on its line.

; Lay out the case skeleton
(case
  "{" @append_hardline @append_indent_start
  "}" @prepend_indent_end
)

; Put case branches on their own lines
(case
  (case_branch) @append_hardline
)

Again, because extra children and siblings can appear in the matched subtree by default, the second query will match each branch of each case expression once, and not only a case expression with a single branch.

It looks like we could merge those two queries since they both control how the case is formatted. However, it’s in fact much harder to get the combined query right than just concatenating both, if even possible. In general, it’s both simpler and better to split your queries into small and topically coherent atoms, even if they apply to the same top-level node.

Let’s try to format a mangled version of our original Yolo file:

input income,
status output net_income, income_tax

income_tax := case { status="exempted"  | income<10000 => 0, _ => income*0.2}


net_income := income -    income_tax
$ topiary format --configuration topiary-yolo.ncl --skip-idempotence --language yolo < mangled.yolo
inputincome,status
outputnet_income,income_tax

income_tax := case{
  status = "exempted" | income < 10000 => 0
  , _ => income * 0.2
}

net_income := income - income_tax

Better, but we have some troubleshooting to do.

First, spaces are missing between input or output and the list of identifiers.

Second, we’d like to add a space after the comma and make sure there’s no space before the comma: input income, status. We also want a space between case and the following {.

Finally, the comma following a case branch is wrongly laid out on the next line. We are impacted by the way we wrote our grammar here: the comma is actually grouped with the next branch in the grammar as repeat(seq(",", $.case_branch)). We could either change the grammar or adapt the query. We choose the latter for simplicity.

Here’s the diff of the fix:

--- a/yolo.scm
+++ b/yolo.scm
@@ -19,6 +19,21 @@
   ":="
 ] @prepend_space @append_space

+; Add space after `input` and `output` decl
+[
+  "input"
+  "output"
+] @append_space
+
+; Add a space after and remove space before the comma in an identifier list
+(
+ (identifier)
+ .
+ "," @prepend_antispace @append_space
+ .
+ (identifier)
+)
+
 ; Add a newline between two consecutive statements
 (
   (statement) @append_hardline
@@ -28,11 +43,17 @@

 ; Lay out the case skeleton
 (case
-  "{" @append_hardline @append_indent_start
+  "{" @prepend_space @append_hardline
+  "}" @prepend_hardline
+)
+
+; Indent the content of case
+(case
+  "{" @append_indent_start
   "}" @prepend_indent_end
 )

 ; Put case branches on their own lines
 (case
-  (case_branch) @append_hardline
+  "," @append_hardline
 )

Now, we can try to format the mangled Yolo file again. We finally get rid of --skip-idempotence as we now output valid Yolo, and can format in-place.

$ topiary format --configuration topiary-yolo.ncl mangled.yolo
$ cat mangled.yolo
input income, status
output net_income, income_tax

income_tax := case {
  status = "exempted" | income < 10000 => 0,
  _ => income * 0.2
}

net_income := income - income_tax

And voilà!

Conclusion

In this post, we’ve seen how to set up a formatter for a new language using Topiary from scratch, creating a tree-sitter grammar, configuring Topiary, and writing our formatting rules. I hope that it’s a convincing demonstration that writing a code formatter has never been easier than today thanks to Topiary. Our formatter is simple but honest. In a follow-up post, I’ll cover more advanced features, such as multi-line versus single-line formatting, measuring scopes, comments, and more. Stay tuned!


  1. You can refer to Topiary’s documentation to learn how to generate those graphs.
  2. Although the Nix way is the easiest, the installation can take some time. Don’t panic if Nix doesn’t show any output for a while. Also note that we don’t install from nixpkgs but directly from the GitHub repository: nixpkgs doesn’t have the latest Topiary version yet.
  3. At the time of writing, using grammar.source.path unfortunately doesn’t work on Windows. You can still use the git revision style to point to your local tree-sitter-yolo repo, see Topiary documentation.

January 30, 2025 12:00 AM

January 26, 2025

Chris Reade

PenroseKiteDart Animations

About PenroseKiteDart

Below we present some animations that illustrate operations on finite patches of Penrose’s Kite and Dart tiles.

These were created using PenroseKiteDart which is a Haskell package available on Hackage making use of the Haskell Diagrams package. For details, see the PenroseKiteDart user guide.

Penrose’s Kite and Dart tiles can produce infinite aperiodic tilings of the plane. There are legal tiling rules to ensure aperiodicity, but these rules do not guarantee that a finite tiling will not get stuck. A legal finite tiling which can be continued to cover the whole plane is called a correct tiling. The rest, which are doomed to get stuck, are called incorrect tilings. (More details can be found in the links at the end of this blog.)

Decomposition Animations

The function decompose is a total operation which is guaranteed to preserve the correctness of a finite tiling represented as a tile graph (or Tgraph). Let us start with a particular Tgraph called sunGraph which is defined in PenroseKiteDart and consists of 5 kites arranged with a common origin vertex. It is drawn using default style in figure 1 on the left. On the right of figure 1 it is drawn with both vertex labels and dotted lines for half-tile join edges.

Figure 1: sunGraph
Figure 1: sunGraph

We can decompose sunGraph three times by selecting index 3 of the infinite list of its decompositions.

    sunD3 :: Tgraph
    sunD3 = decompositions sunGraph !! 3

where we have used

    decompose :: Tgraph -> Tgraph
    
    decompositions :: Tgraph -> [Tgraph]
    decompositions = iterate decompose

The result (sunD3) is drawn in figure 2 (scaled up).

Figure 2: sunD3
Figure 2: sunD3

The animation in figure 3 illustrates two further decompositions of sunD3 in two stages.

Figure 3: Two decompositions of sunD3

Figure 4 also illustrates two decompositions, this time starting from forcedKingD.

    forcedKingD :: Tgraph
    forcedKingD = force (decompose kingGraph)
Figure 4: Two decompositions of forcedKingD

A Composition Animation

An inverse to decomposing (namely composing) has some extra intricacies. In the literature (see for example 1 and 2) versions of the following method are frequently described.

  • Firstly, split darts in half.
  • Secondly, glue all the short edges of the half-darts where they meet a kite (simultaneously). This will form larger scale complete darts and larger scale half kites.
  • Finally join the halves of the larger scale kites.

This works for infinite tilings, but we showed in Graphs,Kites and Darts and Theorems that this method is unsound for finite tilings. There is the trivial problem that a half-dart may not have a complete kite on its short edge. Worse still, the second step can convert a correct finite tiling into an incorrect larger scale tiling. An example of this is given in Graphs, Kites and Darts and Theorems where we also described our own safe method of composing (never producing an incorrect Tgraph when given a correct Tgraph). This composition can leave some boundary half-tiles out of the composition (called remainder half-tiles).

The animation in figure 5 shows such a composition where the remainder half-tiles are indicated with lime green edges.

Figure 5: Composition Animation

In general, compose is a partial operation as the resulting half-tiles can break some requirements for Tgraphs (namely, connectedness and no crossing boundaries). However we have shown that it is a total function on forced Tgraphs. (Forcing is discussed next.)

Forcing Animations

The process of forcing a Tgraph adds half-tiles on the boundary where only one legal choice is possible. This continues until either there are no more forced additions possible, or a clash is found showing that the tiling is incorrect. In the latter case it must follow that the initial tiling before forcing was already an incorrect tiling.

The process of forcing is animated in figure 6, starting with a 5 times decomposed kite and in figure 7 with a 5 times decomposed dart.

Figure 6: Force animation
Figure 7: Another force animation

It is natural to wonder what forcing will do with cut-down (but still correct) Tgraphs. For example, taking just the boundary faces from the final Tgraph shown in the previous animation forms a valid Tgraph (boundaryExample) shown in figure 8.

    boundaryExample :: Tgraph
    boundaryExample = runTry $ tryBoundaryFaceGraph $ force $ decompositions dartGraph !!5
Figure 8: boundaryExample
Figure 8: boundaryExample

Applying force to boundaryExample just fills in the hole to recreate force (decompositions dartGraph !!5) modulo vertex numbering. To make it more interesting we tried removing further half-tiles from boundaryExample to make a small gap. Forcing this also completes the filling in of the boundary half-tiles to recreate force (decompositions dartGraph !!5). However, we can see that this filling in is constrained to preserve the required Tgraph property of no crossing boundaries, which prevents the tiling from closing round a hole.

This is illustrated in the animation shown in figure 9.

Figure 9: Boundary gap animation

As another experiment, we take the boundary faces of a (five times decomposed but not forced) star. When forced this fills in the star and also expands outwards, as illustrated in figure 10.

Figure 10: Star boundary

In the final example, we pick out a shape within a correct Tgraph (ensuring the chosen half-tiles form a valid Tgraph) then animate the force process and then run the animation in both directions (by adding a copy of the frames in reverse order).

The result is shown in figure 11.

Figure 11: Heart animation

Creating Animations

Animations as gif files can be produced by the Haskell Diagrams package using the rasterific back end.

The main module should import both Diagrams.Prelude and Diagrams.Backend.Rasterific.CmdLine. This will expose the type B standing for the imported backend, and diagrams then have type Diagram B.

An animation should have type [(Diagram B, Int)] and consist of a list of frames for the animation, each paired with an integer delay (in one-hundredths of a second).

The animation can then be passed to mainWith.

module Main (main) where
    
import Diagrams.Prelude
import Diagrams.Backend.Rasterific.CmdLine

...

fig::[(Diagram B,Int)]
fig = myExampleAnimation

main :: IO ()
main = mainWith fig

If main is then compiled and run (e.g. with parameters -w 700 -o test.gif) it will produce an output file (test.gif with width 700).

Crossfade tool

The decompose and compose animations were defined using crossfade.

crossfade :: Int -> Diagram B -> Diagram B -> [Diagram B]
crossfade n d1 d2 = map blending ratios 
  where
    blending r = opacity (1-r) d1 <> opacity r d2
    ratios = map ((/ fromIntegral n) . fromIntegral) [0..n]

Thus crossfade n d1 d2 produces n+1 frames, each with d1 overlaid on d2 but with varying opacities (decreasing for d1 and increasing for d2).

Adding the same pause (say 10 hundredths of a second) to every frame can be done by applying map (,10), which will produce an animation.
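
For example, a complete animation value of the kind expected by mainWith could be assembled like this (a sketch: the frame count and the two placeholder diagrams are arbitrary, and the lambda is the TupleSections-free spelling of map (,10)):

withPause :: [Diagram B] -> [(Diagram B, Int)]
withPause = map (\ d -> (d, 10))  -- pause of 10 hundredths of a second per frame

myExampleAnimation :: [(Diagram B, Int)]
myExampleAnimation = withPause (crossfade 20 diagramBefore diagramAfter)
  where
    -- stand-ins for drawings of a Tgraph and its decomposition
    diagramBefore, diagramAfter :: Diagram B
    diagramBefore = circle 1
    diagramAfter  = square 2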

Force animation tool

To create force animations it was useful to create a tool to produce frames with stages of forcing.

forceFrames :: Angle Double 
            -> Int
            -> Tgraph 
            -> (Colour Double, Colour Double, Colour Double)
            -> [Diagram B]

This takes as arguments

  • an angle argument (to rotate the diagrams in the animation from the default alignment of the Tgraph),
  • an Int (for the required number of frames),
  • a Tgraph (to be forced),
  • a triple of colours for filling darts, kites and grout (edge colour), respectively.

The definition of forceFrames uses stepForce to advance forcing a given number of steps to get the intermediate Tgraphs. The total number of forcing steps will be the number of faces (half-tiles) in the final force g less the number of faces in the initial g. All the Tgraphs are drawn (using colourDKG) but the resulting diagrams must all be aligned properly. The alignment can be achieved by creating a VPatch (vertex patch) from the final Tgraph which is then rotated. All the Tgraphs can then be drawn using sub vertex patches of the final rotated one. (For details see Overlaid examples in the PenroseKiteDart user guide.)
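
As a usage sketch, an animation of forcing a 5 times decomposed kite might be produced along these lines (the angle, frame count and colours are arbitrary choices, and kiteGraph is assumed to be the kite Tgraph exported by PenroseKiteDart):

kiteForceAnimation :: [(Diagram B, Int)]
kiteForceAnimation =
  map (\ d -> (d, 10)) $
    forceFrames (90 @@ deg) 50
                (decompositions kiteGraph !! 5)
                (darkmagenta, indigo, black)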

Previous related blogs

  • PenroseKiteDart user guide – this explains how to install and use the PenroseKiteDart package.
  • Graphs,Kites and Darts and Theorems established some important results relating force, compose, decompose.
  • Empires and SuperForce – these new operations were based on observing properties of boundaries of forced Tgraphs.
  • Graphs, Kites and Darts introduced Tgraphs. This gave more details of implementation and results of early explorations. (The class Forcible was introduced subsequently).
  • Diagrams for Penrose Tiles – the first blog introduced drawing Pieces and Patches (without using Tgraphs) and provided a version of decomposing for Patches (decompPatch).

References

[1] Martin Gardner (1977) MATHEMATICAL GAMES. Scientific American, 236(1), (pages 110 to 121). http://www.jstor.org/stable/24953856

[2] Grünbaum B., Shephard G.C. (1987) Tilings and Patterns. W. H. Freeman and Company, New York. ISBN 0-7167-1193-1 (Hardback) (pages 540 to 542).

by readerunner at January 26, 2025 10:11 AM

January 25, 2025

Ken T Takusagawa

[kufstdwm] alpha-beta with transposition table as a library function

transposition table is the other elegant improvement to minimax (after alpha-beta): elegant in principle, hairy to implement in practice.

consider a generic implementation of alpha-beta game tree search with transposition table, generic enough to be applicable to any user-specified game.  what should be its API?  what features should it provide?

evaluate to infinite depth (possible because of transposition table), returning game value and line (principal variation).  intended for small games.

return the transposition table so that it can be reused for subsequent moves.

evaluate to given depth.  or, user-specified predicate of whether to stop searching, e.g., quiescence search.  quiescence search wants access to the transposition table.
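
to make these design questions concrete, here is one possible shape for such an API, sketched in Haskell.  every name below is invented for illustration; this is not a real library, and the implementation is deliberately left out.

import qualified Data.Map.Strict as Map

-- invented interface sketch, not a real library
data Result move value = Result
  { gameValue :: value    -- value of the position for the player to move
  , bestLine  :: [move]   -- principal variation
  }

-- transposition table, keyed by position, returned so it can be reused
-- for subsequent moves
type Table position move value = Map.Map position (Result move value)

alphaBetaTT
  :: (Ord position, Ord value)
  => (position -> [move])            -- move generator (empty list = terminal)
  -> (position -> move -> position)  -- play a move
  -> (Table position move value -> position -> Maybe value)
        -- stop predicate, given access to the table: Just v ends the search
        -- here (fixed depth, quiescence, ...)
  -> Table position move value       -- table carried over from earlier searches
  -> position
  -> (Result move value, Table position move value)
alphaBetaTT = error "interface sketch only"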

ambitious: because of the many ways game tree search can be customized (for many examples, albeit often poorly described, see the chessprogramming wiki), structure the algorithm as a collection components each of which can be modified and hooked together in various ways.  I have no idea what language or framework could enable this kind of software engineering, though functional programming languages seem attractive as the first thing to try.  but beware that a pure functional programming language such as Haskell easily leaks space for this kind of task, and threading state, the transposition table, though the computation may be awkward.

common customizations sacrifice accuracy (correctness or completeness) for speed.  for example, if two different evaluated positions have the same key (for example, a 64-bit Zobrist hash in chess), one can optimize by doing no transposition table collision resolution; the second position gets ignored, assumed to have already been evaluated.  the default algorithm should not do such optimizations but should allow the user to specify both safe and unsafe optimizations.

allow the search to be augmented with various statistics gathered along the way that get consumed by other user-specified parts of the algorithm.  for example, the move generator could order moves based on values of similar moves already evaluated in other parts of the tree.

provide visibility into how user customizations are working, ways to evaluate whether or not they are worth it.

by Unknown (noreply@blogger.com) at January 25, 2025 04:34 AM

January 24, 2025

Sandy Maguire

Use Monoids for Construction

There’s a common anti-pattern I see in beginner-to-intermediate Haskell programmers that I wanted to discuss today. It’s the tendency to conceptualize the creation of an object by repeated mutation. Often this takes the form of repeated insertion into an empty container, but comes up under many other guises as well.

This anti-pattern isn’t particularly surprising in its prevalence; after all, if you’ve got the usual imperative brainworms, this is just how things get built. The gang of four “builder pattern” is exactly this; you can build an empty object, and setters on such a thing change the state but return the object itself. Thus, you build things by chaining together setter methods:

Foo myFoo = new Foo().setBar().setQux(17).setZap(true);

Even if you don’t ascribe to the whole OOP design principle thing, you’re still astronomically likely to think about building data structures like this:

Doodad doodad = new Doodad();
foreach (Widget widget in widgets) {
  doodad.addWidget(widget);
}

To be more concrete, maybe instead of doodads and widgets you have BSTs and Nodes. Or dictionaries and key-value pairs. Or graphs and edges. Anywhere you look, you’ll probably find examples of this sort of code.

Maybe you’re thinking to yourself “I’m a hairy-chested functional programmer and I scoff at patterns like these.” That might be true, but perhaps you too are guilty of writing code that looks like:

foldr
    (\(k, v) m -> Map.insert k v m)
    Map.empty
  $ toKVPairs something

Just because it’s dressed up with functional combinators doesn’t mean you’re not still writing C code. To my eye, the great promise of functional programming is its potential for conceptual clarity, and repeated mutation will always fall short of the mark.

The complaint, as usual, is that repeated mutation tells you how to build something, rather than focusing on what it is you’re building. An algorithm cannot be correct in the absence of intention—after all, you must know what you’re trying to accomplish in order to know if you succeeded. What these builder patterns, for loops, and foldrs all have in common is that they are algorithms, or strategies, for building something.

But you’ll notice none of them come with comments. And therefore we can only ever guess at what the original author intended, based on the context of the code we’re looking at.

I’m sure this all sounds like splitting hairs, but that’s because the examples so far have been extremely simple. But what about this one?

cgo :: (a -> (UInt, UInt)) -> [a] -> [NonEmpty a]
cgo f = foldr step []
  where
    step a [] = [pure a]
    step a bss0@((b :| bs) : bss)
      | let (al, ac) = f a
      , let (bl, bc) = f b
      , al + 1 == bl && ac == bc
            = (a :| b : bs) : bss
      | otherwise = pure a : bss0

which I found by grepping through haskell-language-server for foldr, and then mangled to remove the suggestive variable names. What does this one do? Based solely on the type we can presume it’s using that function to partition the list somehow. But how? And is it correct? We’ll never know—and the function doesn’t even come with any tests!

It’s Always Monoids

The shift in perspective necessary here is to reconceptualize building-by-repeated-mutation as building-by-combining. Rather than chiseling out the object you want, instead find a way of gluing it together from simple, obviously-correct pieces.

The notion of “combining together” should evoke in you a cozy warm fuzzy feeling. Much like being in a secret pillow fort. You must come to be one with the monoid. Once you have come to embrace monoids, you will have found inner programming happiness. Monoids are a sacred, safe place, at the fantastic intersection of “overwhelmingly powerful” and yet “hard to get wrong.”

As an amazingly fast recap, a monoid is a collection of three things: some type m, some value of that type mempty, and a binary operation over that type (<>) :: m -> m -> m, subject to a bunch of laws:

∀a. mempty <> a = a = a <> mempty
∀a b c. (a <> b) <> c = a <> (b <> c)

which is to say, mempty does nothing and (<>) doesn’t care where you stick the parentheses.

If you’re going to memorize any two particular examples of monoids, it had better be these two:

instance Monoid [a] where
  mempty = []
  a <> b = a ++ b

instance (Monoid a, Monoid b) => Monoid (a, b) where
  mempty = (mempty, mempty)
  (a1, b1) <> (a2, b2) = (a1 <> a2, b1 <> b2)

The first says that lists form a monoid under the empty list and concatenation. The second says that products preserve monoids.

The list monoid instance is responsible for the semantics of the ordered, “sequency” data structures. That is, if I have some sequential flavor of data structure, its monoid instance should probably satisfy the equation toList a <> toList b = toList (a <> b). Sequency data structures are things like lists, vectors, queues, deques, that sort of thing. Data structures where, when you combine them, you assume there is no overlap.

The second monoid instance here, over products, is responsible for pretty much all the other data structures. The first thing we can do with it is remember that functions are just really, really big product types, with one “slot” for every value in the domain. We can show an isomorphism between pairs and functions out of booleans, for example:

from :: (Bool -> a) -> (a, a)
from f = (f False, f True)

to :: (a, a) -> (Bool -> a)
to (a, _) False = a
to (_, a) True  = a

and under this isomorphism, we should thereby expect the Monoid a => Monoid (Bool -> a) instance to agree with Monoid a => Monoid (a, a). If you generalize this out, you get the following instance:

instance Monoid a => Monoid (x -> a) where
  mempty = \_ -> mempty
  f <> g = \x -> f x <> g x

which combines values in the codomain monoidally. We can show the equivalence between this monoid instance and our original product preservation:

  from f <> from g
= (f False,  f True) <> (g False, g True)
= (f False <> g False, f True <> g True)
= ((f <> g) False, (f <> g) True)
= from (f <> g)

and

  to (a11, a12) <> to (a21, a22)
= \x -> to (a11, a12) x <> to (a21, a22) x
= \x -> case x of
    False -> to (a11, a12) False <> to (a21, a22) False
    True  -> to (a11, a12) True  <> to (a21, a22) True
= \x -> case x of
    False -> a11 <> a21
    True  -> a12 <> a22
= \x -> to (a11 <> a21, a12 <> a22) x
= to (a11 <> a21, a12 <> a22)

which is a little proof that our function monoid agrees with the preservation-of-products monoid. The same argument works for any type x in the domain of the function, but showing it generically is challenging.

Anyway, I digress.

The reason to memorize this Monoid instance is that it’s the monoid instance that every data structure is trying to be. Recall that almost all data structures are merely different encodings of functions, designed to make some operations more efficient than they would otherwise be.

Don’t believe me? A Map k v is an encoding of the function k -> Maybe v optimized to efficiently query which k values map to Just something. That is to say, it’s a sparse representation of a function.
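
To make this concrete, here is a minimal sketch (my own, not from any codebase mentioned here) of the Map monoid that agrees with the Monoid (k -> Maybe v) instance: it combines values at colliding keys, unlike the left-biased union performed by the default Monoid instance for Data.Map:

import Data.Map (Map)
import qualified Data.Map as Map

-- A wrapper whose (<>) combines values monoidally on key collisions,
-- mirroring the Monoid instance for k -> Maybe v.
newtype MonoidalMap k v = MonoidalMap (Map k v)
  deriving Show

instance (Ord k, Semigroup v) => Semigroup (MonoidalMap k v) where
  MonoidalMap a <> MonoidalMap b = MonoidalMap (Map.unionWith (<>) a b)

instance (Ord k, Semigroup v) => Monoid (MonoidalMap k v) where
  mempty = MonoidalMap Map.empty

So, for example, MonoidalMap (Map.singleton 1 "a") <> MonoidalMap (Map.singleton 1 "b") is MonoidalMap (Map.fromList [(1, "ab")]). The monoidal-containers package on Hackage provides essentially this wrapper, so you rarely need to write it yourself.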

From Theory to Practice

What does all of this look like in practice? Stuff like worrying about foldr is surely programming-in-the-small, which is worth knowing, but isn’t the sort of thing that turns the tides of a successful application.

The reason I’ve been harping on about the function and product monoids is that they are compositional. The uninformed programmer will be surprised by just how far one can get by composing these things.

At work, we need to reduce a tree (+ nonlocal references) into an honest-to-goodness graph. While we’re doing it, we need to collect certain nodes. And the tree has a few constructors which semantically change the scope of their subtrees, so we need to preserve that information as well.

It’s actually quite the exercise to sketch out an algorithm that will accomplish all of these goals when you’re thinking about explicit mutation. Our initial attempts at implementing this were clumsy. We’d fold the tree into a graph, adding fake nodes for the Scope constructors. Then we’d filter all the nodes in the graph, trying to find the ones we needed to collect. Then we’d do a graph traversal from the root, trying to find these Scope nodes, and propagating their information downstream.

Rather amazingly, this implementation kinda sorta worked! But it was slow, and took \(O(10k)\) SLOC to implement.

The insight here is that everything we needed to collect was monoidal:

data Solution = Solution
  { graph :: Graph
  , collectedNodes :: Set Node
  , metadata :: Map Node Metadata
  }
  deriving stock (Generic)
  deriving (Semigroup, Monoid) via Generically Solution

where the deriving (Semigroup, Monoid) via Generically Solution stanza gives us the semigroup and monoid instances that we’d expect from Solution being the product of a bunch of other monoids.

And now for the coup de grace: we hook everything up with the Writer monad. Writer is a chronically slept-on type, because most people seem to think it’s useful only for logging, and underwhelming at logging compared to a real logger type. But the charm is in the details:

instance Monoid w => Monad (Writer w)

Writer w is a monad whenever w is a monoid, which makes it the perfect monad for solving data-structure-creation problems like the one we’ve got in mind. Such a thing gives rise to a few helper functions:

collectNode :: MonadWriter Solution m => Node -> m ()
collectNode n = tell $ mempty { collectedNodes = Set.singleton n }

addMetadata :: MonadWriter Solution m => Node -> Metadata -> m ()
addMetadata n m = tell $ mempty { metadata = Map.singleton n m }

emitGraphFragment :: MonadWriter Solution m => Graph -> m ()
emitGraphFragment g = tell $ mempty { graph = g }

each of which is responsible for adding a little piece to the final solution. Our algorithm is thus a function of the type:

algorithm
  :: Metadata
  -- ^ the current scope
  -> Tree
  -- ^ the tree we're reducing
  -> Writer Solution Node
  -- ^ our partial solution, and the node corresponding to the root of the tree

which traverses the Tree, recursing with a different Metadata whenever it comes across a Scope constructor, and calling our helper functions as it goes. At each step of the way, the only thing it needs to return is the root Node of the section of the graph it just built, which recursive calls can use to break up the problem into inductive pieces.
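
The real Tree, Graph, and Metadata types at work are proprietary, so here is a self-contained toy version of the same pattern (assuming only base, mtl, and containers) that shows the shape of such a traversal:

import Control.Monad.Writer
import Data.Monoid (Sum (..))
import Data.Set (Set)
import qualified Data.Set as Set

data SimpleTree = SimpleLeaf Int | SimpleBranch SimpleTree SimpleTree

data Summary = Summary
  { leafValues :: Set Int   -- the values we want to "collect"
  , nodeCount  :: Sum Int   -- some other monoidal statistic
  }

instance Semigroup Summary where
  Summary a1 b1 <> Summary a2 b2 = Summary (a1 <> a2) (b1 <> b2)

instance Monoid Summary where
  mempty = Summary mempty mempty

collectLeaf :: MonadWriter Summary m => Int -> m ()
collectLeaf n = tell $ mempty { leafValues = Set.singleton n }

countNode :: MonadWriter Summary m => m ()
countNode = tell $ mempty { nodeCount = Sum 1 }

-- Returns the depth of the tree; the Summary is accumulated monoidally
-- on the side by the Writer monad.
reduce :: SimpleTree -> Writer Summary Int
reduce (SimpleLeaf n)     = collectLeaf n >> countNode >> pure 0
reduce (SimpleBranch l r) = do
  dl <- reduce l
  dr <- reduce r
  countNode
  pure (1 + max dl dr)

-- runWriter (reduce someTree) yields (depth, accumulated Summary).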

This new implementation is roughly 20x smaller, coming in at \(O(500)\) SLOC, and was free of all the bugs we’d been diligently trying to squash under the previous implementation.

Chalk it up to another win for induction!

January 24, 2025 09:35 AM

January 23, 2025

Tweag I/O

Contract Testing: Shifting Left with Confidence for Enhanced Integration

In software development, especially with microservices, ensuring seamless integration between components is crucial for delivering high-quality applications. One approach I really like for taming this complexity is contract testing.

Contract testing is a powerful technique that focuses on verifying interactions between software components early and in a controlled environment. In this post, I want to show why I think contract testing can often reduce the amount of integration testing.

Contract Testing

A contract, in this context, is a scenario that describes an interaction between two components. A very simple contract could describe a call to a REST API, and its response, but they can describe more complex scenarios.

Contract testing consists in testing both components against the contract and specifications. Crucially, there are two different tests: not only is Service-1 tested against the contract, but Service-2 is tested against the specifications. Contract testing doesn’t involve both components at the same time.

Contract-Testing

Contrary to unit testing with a mock, contract tests are bi-directional, verifying both requirements and implementation during build time. Also contrary to integration testing, we don’t need to tackle the preparation of dependencies in an integration testing environment. Contract testing can be run in an isolated way whenever there is an update, even locally. For these reasons, I consider contract testing as both easy and useful.

Challenges in Integration Testing

As the number of services grows, the number of interactions between them, which integration tests are traditionally responsible for covering, grows too, and the challenges of integration testing become more apparent:

  • Integration tests are late
  • Integration tests require a lot of context
  • Integration tests are expensive

Let me elaborate.

Integration tests are late

Integration testing is conducted after several stages in the pipeline, including static checks, code building, unit tests, reviews, and deployment to the test environment. Providing feedback on integration issues after all these steps requires repeating the entire process multiple times. Consequently, running integration tests can result in delayed feedback. While this approach may seem reasonable due to the structured process, it can become a significant bottleneck in the pipeline for large projects.

Integration tests require a lot of context

Integration testing environments encompass the integration of all necessary cloud resources, effectively creating production-like settings. This approach is valuable as it provides comprehensive feedback on the overall system’s performance. However, evaluating the impact of a single commit within such an environment is often slow and inefficient. Maintaining these environments is challenging, and any issues or failures in dependent services can lead to false-positive results.

Another significant challenge is flakiness, which frequently arises from improperly managed test data that might be used by one of the dependent services. Managing this data is complex due to the numerous dependencies involved in creating and manipulating it.

Integration tests are expensive

The challenges in integration testing make it expensive to maintain and run tests. Building a complex production-like testing environment requires all dependent services and databases to be updated and functioning. Running a single check requires going all the way down to the network. Imagine how hard, slow and expensive it is to run integration tests given the following network traffic:

End-to-end traffic at Netflix

The microservice architectural pattern has led to a notable increase in such complex networks. For this reason, the shift-right strategy of doing integration tests late in the pipeline became less relevant, and microservice pioneers like Netflix or Amazon have been advocating for a shift-left strategy such as contract testing to test their massive networks.

Microservice interactions at Netflix and Amazon

The first thing you observe from the image is that we have a lot of integrations between the microservices. With hundreds of microservices, the number of integrations between them becomes very large. Consider two services, which have one interaction. With three services, it can be up to three, then six, ten, fifteen, twenty-one, and so on. This increases drastically as the number reaches the hundreds. For example, 100 services have 4950 potential integrations, and 500 services have 124750. This is based on the \(n\)-choose-\(k\) formula where \(n\) is the number of microservices, and \(k = 2\) (as we’re counting pairs of services that can interact bidirectionally):

\({n \choose 2} = \frac{n!}{(n - 2)! \times 2!} = \frac{n(n-1)}{2}\)

This calculates the maximum number of integrations with \(n\) services. Asymptotically, the number of interactions grows as the square of the number of services. It is not realistic to say we will have that many integrations, but it gives an idea of how fast the number of interactions grows. On the other hand, one interaction can involve many API calls, each requiring tests.

Let’s give a real-world example. Say we have 10 microservices, and the integrations between them are shown in the table:

Microservice | Integrates With
User Service | Authentication Service, Profile Service, Notification Service
Authentication Service | User Service, Authorization Service, API Gateway
Profile Service | User Service, Notification Service, Database Service
Notification Service | User Service, Profile Service, External Email Service
Authorization Service | User Service, Resource Service, API Gateway
Resource Service | Authorization Service, Logging Service, Payment Service
Billing Service | User Service, Payment Service, Notification Service
Payment Service | Billing Service, User Service, Notification Service
Logging Service | Resource Service, Monitoring Service, Notification Service
Monitoring Service | Logging Service, Notification Service, Dashboard Service

Total Integration Points = Σ (Number of integrations for each microservice)
Total Integration Points = 3 + 3 + 3 + 3 + 3 + 3 + 3 + 3 + 3 + 3 = 30

Again each interaction can require many tests.

Netflix, Google, and Amazon are pioneers in microservices testing. Netflix publicly shared their experience, showing how their testing evolved. Netflix and Spotify have also changed the traditional test pyramid, turning it into a test diamond/honeycomb. While unit tests are still important, the focus has shifted to writing more integration tests rather than extensive unit tests. To learn more about Spotify’s test pyramid transformation for microservice testing, read this post.

Consumer-Driven Contract Testing

In the terminology of contract testing, a consumer is a client of the API under test, while a provider is a service that exposes the API. The most common architecture for contract testing is consumer-driven contract testing, where the contract is defined in the consumer component and shared with the provider. The converse, provider-driven contract testing, is mostly useful when you have a public API and want to share contracts with unknown consumers. I’ll be focusing on consumer-driven contract testing.

consumer-driven-contract-testing

Imagine the following scenario:

  • Order Service (Consumer): Responsible for managing orders and inventory.
  • Inventory Service (Provider): Maintains the inventory levels of products.

The Order Service needs to check the availability of products in the Inventory Service before processing an order. As the consumer, the Order Service defines this expectation in a specification, which produces a contract document. The contract is then stored by a special service, called the broker, which makes the contract available to the provider for its own tests.

We’ll use Pact to ensure that the Inventory Service meets the expectations of the Order Service. Pact is the most popular contract-testing framework and supports many languages. Another popular contract-testing framework is Spring Cloud Contract which supports JVM based applications.

Consumer Implementation in Python (Order Service):

# test_order_service.py
import unittest
from pact import Consumer, Provider
import requests

class OrderServicePactTest(unittest.TestCase):
    def setUp(self):
        # create a `pact` object by defining the consumer and the provider
        self.pact = Consumer('OrderService').has_pact_with(Provider('InventoryService'), pact_specification_version="3.0.0")
        self.pact.start_service()
        self.addCleanup(self.pact.stop_service)
        self.base_url = 'http://localhost:1234'

    def test_order_service(self):
        # simple order object that should be the response
        expected = {
            'product_id': '123',
            'available': True
        }

        # setting the specification
        (self.pact
         .given('Product 123 exists')                                # set a precondition
         .upon_receiving('a request to check product availability')  # this is the name of the interaction aka test case, `description` of the interaction in the contract
         .with_request('get', '/inventory/123')                      # request detail
         .will_respond_with(200, body=expected))                     # response detail

        # running the specification
        with self.pact:
            result = requests.get(f'{self.base_url}/inventory/123')

        self.assertEqual(result.json(), expected)

if __name__ == '__main__':
    unittest.main()

Let’s run the test on the consumer side (Order Service):

python -m unittest test_order_service.py

Upon running the command above, the specification “a request to check product availability” in the Pact consumer test is turned into an interaction in a document. This document is the contract between OrderService and InventoryService, which is generated by Pact in a JSON file (e.g. orderservice-inventoryservice.json):

{
  "provider": {
    "name": "InventoryService"
  },
  "consumer": {
    "name": "OrderService"
  },
  "interactions": [
    {
      "description": "a request to check product availability",
      "request": {
        "method": "GET",
        "path": "/inventory/123",
        "headers": {}
      },
      "response": {
        "status": 200,
        "headers": {},
        "body": {
          "available": true
        }
      }
    }
  ],
  "metadata": {
    "pactSpecification": {
      "version": "3.0.0"
    }
  }
}

Notice how this test can easily be run locally on your machine. It can just as easily run in CI, even if the inventory service is in another repository, since no actual inventory service is required to run the test. It doesn’t require a complex setup or configuration, and runs on the local network loop, which is very fast.

When the contract is ready, we can publish it to the Pact broker which is a service for holding all the contracts.

export PACT_BROKER_BASE_URL=<pact-broker-url>
export PACT_BROKER_USERNAME=<username>
export PACT_BROKER_PASSWORD=<password>

pact-broker publish ./pacts/orderservice-inventoryservice.json \
    --consumer-app-version consumer-version \
    --broker-base-url $PACT_BROKER_BASE_URL \
    --broker-username $PACT_BROKER_USERNAME \
    --broker-password $PACT_BROKER_PASSWORD \
    --tag dev

You can either run the Pact broker locally or use the cloud services provided by Pact. Either way, you’ll have to set up a few environment variables for the Pact CLI to connect to the service. To run the Pact broker locally, you should select a database adapter such as sqlite or postgres and then run the Docker command.

docker run -d --name pact-broker -p 9292:9292 \
  -e PACT_BROKER_DATABASE_ADAPTER=sqlite \
  -e PACT_BROKER_DATABASE_NAME=/var/pact_broker/db.sqlite3 \
  -v $(pwd)/pact_broker:/var/pact_broker \
  pactfoundation/pact-broker

Provider Implementation in Python (Inventory Service):

# test_inventory_service.py
import unittest
from pact import Verifier

class InventoryServicePactTest(unittest.TestCase):
    def test_inventory_service(self):
        # define the verifier by specifying the `provider`; this is used to fetch all the contracts
        # whose provider is set to `InventoryService` and run the verification tests against them
        verifier = Verifier(provider='InventoryService', provider_base_url='http://localhost:8000')
        pact_broker_url = '<pact-broker-url>'
        broker_username = '<username>'
        broker_password = '<password>'

        # `verify_with_broker` connects to the pact broker, pulls all the related contracts and does the verification
        verifier.verify_with_broker(
            broker_url=pact_broker_url,
            broker_username=broker_username,
            broker_password=broker_password,
            publish_version='1.0.0',
            provider_tags=['master']
        )

if __name__ == '__main__':
    unittest.main()

First, we should run the provider service (inventory service):

python inventory_service.py

Then, let’s run the provider verification test for the inventory service:

python -m unittest test_inventory_service.py

Here again, the test was easy to run, and doesn’t require any knowledge of an actual implementation of the consumer.

What happens if there’s a mistake in the provider’s implementation, and it doesn’t actually satisfy the contract? In this case, Pact would respond with an error looking something like this.

    >       assert resp.status_code == 200, resp.text
    E       AssertionError: Actual interactions do not match expected interactions for mock MockService.
    E
    E       Missing requests:
    E           GET /inventory/123
    E
    E       See pact-mock-service.log for details.

    venv/lib/python3.10/site-packages/pact/pact.py:209: AssertionError

After the error is fixed, you can check the status, for instance, in Pact’s UI (pactflow):

pactflow-pact-broker-service

Conclusion

Contract testing addresses the challenges inherent in integration testing. By shifting integration testing to an earlier stage in the development process, it eliminates the need for maintaining complex integration testing environments, such as data preparation and deployment. Additionally, contract testing can be executed in isolation whenever a service changes, removing the necessity for integrating all services into a single running environment.

Contract testing doesn’t eliminate the need for integration tests altogether; end-to-end scenarios, for example, still need to be tested as system tests. But many integration tests can be replaced by contract tests, such as those covering interactions between microservices. As a consequence we can have far fewer tests that depend on a complex, slow, unreliable network environment, rendering the whole process much faster.

January 23, 2025 12:00 AM

Brent Yorgey

You could have invented Fenwick trees


Posted on January 23, 2025

My paper, You could have invented Fenwick trees, has just been published as a Functional Pearl in the Journal of Functional Programming. This blog post is an advertisement for the paper, which presents a novel way to derive the Fenwick tree data structure from first principles.

Suppose we have a sequence of integers \(a_1, \dots, a_n\) and want to be able to perform two operations:

  • we can update any \(a_i\) by adding some value \(v\) to it; or
  • we can perform a range query, which asks for the sum of the values \(a_i + \dots + a_j\) for any range \([i,j]\).

There are several ways to solve this problem. For example:

  1. We could just keep the sequence of integers in a mutable array. Updating is \(O(1)\), but range queries are \(O(n)\) since we must actually loop through the range and add up all the values.
  2. We could keep a separate array of prefix sums on the side, so that \(P_i\) stores the sum \(a_1 + \dots + a_i\). Then the range query on \([i,j]\) can be computed as \(P_j - P_{i-1}\), which only takes \(O(1)\); however, updates now take \(O(n)\) since we must also update all the prefix sums which include the updated element.
  3. We can get the best of both worlds using a segment tree, a binary tree storing the elements at the leaves, with each internal node caching the sum of its children. Then both update and range query can be done in \(O(\lg n)\).

I won’t go through the details of this third solution here, but it is relatively straightforward to understand and implement, especially in a functional language.
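
For a flavour of that third solution, here is a minimal, unoptimized segment tree in Haskell; this sketch is my own illustration, not code from the paper:

data SegTree
  = Leaf !Int                  -- a single element
  | Node !Int SegTree SegTree  -- cached sum of the two subtrees

total :: SegTree -> Int
total (Leaf v)     = v
total (Node s _ _) = s

-- Build a tree over indices [lo, hi], with initial values given by f.
build :: (Int -> Int) -> Int -> Int -> SegTree
build f lo hi
  | lo == hi  = Leaf (f lo)
  | otherwise = Node (total l + total r) l r
  where
    mid = (lo + hi) `div` 2
    l   = build f lo mid
    r   = build f (mid + 1) hi

-- Add v to the element at index i.
update :: Int -> Int -> Int -> Int -> SegTree -> SegTree
update _  _  _ v (Leaf x) = Leaf (x + v)
update lo hi i v (Node _ l r)
  | i <= mid  = let l' = update lo mid i v l
                in  Node (total l' + total r) l' r
  | otherwise = let r' = update (mid + 1) hi i v r
                in  Node (total l + total r') l r'
  where
    mid = (lo + hi) `div` 2

-- Sum of the elements with indices in [i, j].
rangeQuery :: Int -> Int -> Int -> Int -> SegTree -> Int
rangeQuery lo hi i j t
  | j < lo || hi < i   = 0         -- disjoint
  | i <= lo && hi <= j = total t   -- fully contained
  | otherwise          =
      case t of
        Leaf v     -> v            -- unreachable for well-formed queries
        Node _ l r ->
          let mid = (lo + hi) `div` 2
          in  rangeQuery lo mid i j l + rangeQuery (mid + 1) hi i j r

For example, rangeQuery 1 n i j (build f 1 n) returns \(a_i + \dots + a_j\), and both update and rangeQuery run in \(O(\lg n)\).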

However, there is a fourth solution, known as a Fenwick tree or Fenwick array, independently invented by Ryabko (1989) and Fenwick (1994). Here’s a typical Java implementation of a Fenwick tree:

class FenwickTree {
    private long[] a;
    public FenwickTree(int n) { a = new long[n+1]; }
    public long prefix(int i) {
        long s = 0;
        for (; i > 0; i -= LSB(i)) s += a[i]; return s;
    }
    public void update(int i, long delta) {
        for (; i < a.length; i += LSB(i)) a[i] += delta;
    }
    public long range(int i, int j) {
        return prefix(j) - prefix(i-1);
    }
    public long get(int i) { return range(i,i); }
    public void set(int i, long v) { update(i, v - get(i)); }
    private int LSB(int i) { return i & (-i); }
}

I know what you’re thinking: what the heck!? There are some loops adding and subtracting LSB(i), which is defined as the bitwise AND of i and -i? What on earth is this doing? Unless you have seen this before, this code is probably a complete mystery, as it was for me the first time I encountered it.

However, from the right point of view, we can derive this mysterious imperative code as an optimization of segment trees. In particular, in my paper I show how we can:

  1. Start with a segment tree.
  2. Delete some redundant info from the segment tree, and shove the remaining values into an array in a systematic way.
  3. Define operations for moving around in the resulting Fenwick array by converting array indices to indices in a segment tree, moving around the tree appropriately, and converting back.
  4. Describe these operations using a Haskell EDSL for infinite-precision 2’s complement binary arithmetic, and fuse away all the intermediate conversion steps, until the above mysterious implementation pops out.
  5. Profit.

I may be exaggerating step 5 a teensy bit. But you’ll find everything else described in much greater detail, with pretty pictures, in the paper! The official JFP version is here, and here’s an extended version with an appendix containing an omitted proof.

References

Fenwick, Peter M. 1994. “A New Data Structure for Cumulative Frequency Tables.” Software: Practice and Experience 24 (3): 327–36.
Ryabko, Boris Yakovlevich. 1989. “A Fast on-Line Code.” In Doklady Akademii Nauk, 306:548–52. 3. Russian Academy of Sciences.

by Brent Yorgey at January 23, 2025 12:00 AM

January 22, 2025

Haskell Interlude

61: Sam Lindley

Sam Lindley is a Reader in Programming Languages Design and Implementation at the University of Edinburgh. In this episode, he tells us how difficult naming is, the different kinds of effect systems and handlers, languages *much* purer than Haskell, and Modal logic.

by Haskell Podcast at January 22, 2025 09:00 PM

Well-Typed.Com

grapesy: industrial strength gRPC library for Haskell

Well-Typed are delighted to announce the release of grapesy (Hackage, GitHub), an industrial strength Haskell library providing support for gRPC, a modern open source high performance Remote Procedure Call (RPC) framework developed by Google. The library has the following features:

  • Parametric in the choice of message format; Protobuf is the most common choice for gRPC and is of course supported, as is JSON1. There is also support for general binary (“raw”) messages, and adding additional formats is easy and can be done without modifying grapesy itself.

  • Client-side and server-side support for the Protobuf common communication patterns: non-streaming, client-side streaming, server-side streaming, and bidirectional streaming. Use of these patterns is independent of the choice of message encoding, and is strictly optional.

  • For the specific case of Protobuf, support for Protobuf rich errors.

  • Support for metadata: request initial metadata, response initial metadata, and response trailing metadata.

  • Support for all the common gRPC compression algorithms: gzip and zlib (both through the zlib package), as well as snappy (through a new package snappy-c, developed for this purpose). Bespoke compression algorithms can also be used, and compression can be disabled on a per-message basis.

  • Support for both unencrypted and encrypted connections (TLS).

  • Support for cancellation, both through deadlines/timeouts (server-side cancellation) as well as through terminating an RPC early (client-side cancellation).2

  • Flow control: we are careful to use back-pressure to limit traffic, ultimately relying on HTTP2 flow control (which can be adjusted through the HTTP2Settings, primarily the stream window size and the connection window size).

  • Support for Wait-for-Ready, where the connection to a server can be (re)established in the background, rather than an RPC failing if the server is not immediately available. Note that this must be enabled explicitly (as per the spec).

  • Asynchronous design: operations happen in the background whenever possible (opening a connection, initiating an RPC, sending a message), and exceptions are only raised when directly interacting with those background processes. For example, when a client disconnects from the server, the corresponding handler will only get an exception if it attempts any further communication with that client. This is particularly important in RPC servers, which may need to complete certain operations even if the client that requested those operations did not stick around to wait for them.

  • Type safety: the types of inputs (messages sent from the client to the server) and outputs (messages from the server to the client), as well as the types of the request and response metadata, are all determined from the choice of a specific RPC. In addition, for Protobuf servers we can guarantee at the type-level that all methods are handled (or explicitly declared as unsupported).

  • Extensive documentation: this blog post contains a number of tutorials that highlight the various parts of grapesy, and the Haddock documentation is comprehensive.

The library is designed to be robust:

  • Exception safety: all exceptions, in both client and in server contexts, are caught and handled in context appropriate ways; they are never simply “lost”. Server-side exceptions are reported as gRPC errors on the client; handlers can also throw any of the standard gRPC errors.

  • Deals correctly with broken deployments (clients or servers that do not conform to the gRPC specification). This includes things such as dealing with non-200 HTTP status codes, correctly responding to unsupported content types (for which the gRPC spec mandates a different resolution on servers and clients), dealing with servers that don’t respect timeouts, etc.

  • Native Haskell library (does not bind to any C or C++ libraries).

  • Comes with a comprehensive test suite, which has been instrumental in achieving high reliability, as well as finding problems elsewhere in the network stack; as part of the development of grapesy we have also made numerous improvements to http2 and related libraries3. Many thanks to Kazu Yamamoto for being so receptive to all our PRs and willing to discuss all the issues we found, as well as his hard work on these core infrastructure libraries!

  • No memory leaks: even under stress conditions, memory usage is completely flat in both the client and the server.

  • Good performance, on par with the official Java implementation.

Developing a library of this nature is a significant investment, and so Well-Typed is thankful to Anduril for sponsoring the work.

Quickstart

In this section we explain how to get started, in the style of the official Quickstart guide. You can also use the Quickstart example as a basic template for your own gRPC applications.

gRPC tools

Neither gRPC nor grapesy requires the use of Protobuf, but it is the most common way of using gRPC, and it is used by both the Quickstart tutorial and the Basics tutorial. You will therefore need to install the protocol buffer compiler protoc, which can usually be done using your system’s package manager; see Protocol Buffer Compiler Installation for details.

Download the example

If you want to work through this quick start, you will need to clone the grapesy repository:

$ git clone https://github.com/well-typed/grapesy.git
$ cd grapesy/tutorials/quickstart

Run a gRPC application

From the grapesy/tutorials/quickstart directory, run the server

$ cabal run greeter_server

From another terminal, run the client:

$ cabal run greeter_client

If all went well, you should see the server responding to the client with

Proto {message: "Hello, you"}

Update the gRPC service

Now let’s try to add another method to the Greeter service. This service is defined using protocol buffers; for an introduction to gRPC in general and Protobuf specifically, you may wish to read the official Introduction to gRPC; we will also see more examples of Protobuf below in the Basics tutorial. You can find the definition for the quickstart tutorial in tutorials/quickstart/proto/helloworld.proto:

syntax = "proto3";

// The greeting service definition.
service Greeter {
  // Sends a greeting
  rpc SayHello (HelloRequest) returns (HelloReply) {}
}

// The request message containing the user's name.
message HelloRequest {
  string name = 1;
}

// The response message containing the greetings
message HelloReply {
  string message = 1;
}

Let’s add another method to this service, with the same request and response types:

service Greeter {
  // Sends a greeting
  rpc SayHello (HelloRequest) returns (HelloReply) {}

  // Sends another greeting
  rpc SayHelloAgain (HelloRequest) returns (HelloReply) {}
}

Generate gRPC code

The example is set up to use a custom Cabal setup script to automatically compile the proto definitions; see proto-lens-setup for a detailed discussion on how to do this. If you prefer not to use custom setup scripts in your own projects, it is also possible to run the Protobuf compiler manually; see section Manually running the protocol compiler of the proto-lens-protoc documentation.

This means that to re-run the Protobuf compiler, it suffices to build either the client or the server; let’s attempt to build the server:

$ cabal build greeter_server

You should see a type error:

app/Server.hs:13:7: error: [GHC-83865]
    • Couldn't match type: '[]
                     with: '[Protobuf Greeter "sayHelloAgain"]

This is telling you that the server is incomplete: we are missing a handler for the new sayHelloAgain method.

Update the server

To update the server, edit Server.hs and add:

sayHelloAgain :: Proto HelloRequest -> IO (Proto HelloReply)
sayHelloAgain req = do
    let resp = defMessage & #message .~ "Hello again, " <> req ^. #name
    return resp

Then update methods to list the new handler:

methods :: Methods IO (ProtobufMethodsOf Greeter)
methods =
      Method (mkNonStreaming sayHello)
    $ Method (mkNonStreaming sayHelloAgain)
    $ NoMoreMethods

Update the client

Unlike the server, the change to the service definition does not require changes to the client. The server must implement the new method, but the client does not have to call it. Of course, it is more interesting when it does, so let’s add another call to Client.hs:

withConnection def server $ \conn -> do
  let req = defMessage & #name .~ "you"
  resp <- nonStreaming conn (rpc @(Protobuf Greeter "sayHello")) req
  print resp
  resp2 <- nonStreaming conn (rpc @(Protobuf Greeter "sayHelloAgain")) req
  print resp2

Run

After restarting greeter_server, running greeter_client should now output

Proto {message: "Hello, you"}
Proto {message: "Hello again, you"}

Basics

In this section we delve a little deeper, following the official Basics tutorial, which introduces the RouteGuide service. From the official docs:

Our example is a simple route mapping application that lets clients get information about features on their route, create a summary of their route, and exchange route information such as traffic updates with the server and other clients.

You can find the example in the tutorials/basics directory of the grapesy repo.

Defining the service

The RouteGuide example illustrates the four different kinds of communication patterns that Protobuf services can have. You can find the full service definition in tutorials/basics/proto/route_guide.proto:

  • Non-streaming: client sends a single input, server replies with a single output:

    // Obtains the feature at a given position.
    rpc GetFeature(Point) returns (Feature) {}
  • Server-side streaming: client sends a single input, server can respond with any number of outputs:

    // Obtains the Features available within the given Rectangle.
    rpc ListFeatures(Rectangle) returns (stream Feature) {}
  • Client-side streaming: client can send any number of inputs, after which the server responds with a single output:

    // Accepts a stream of Points on a route being traversed, returning a
    // RouteSummary when traversal is completed.
    rpc RecordRoute(stream Point) returns (RouteSummary) {}
  • Bidirectional streaming: the client and the server can exchange messages at will:

    // Accepts a stream of RouteNotes sent while a route is being traversed,
    // while receiving other RouteNotes (e.g. from other users).
    rpc RouteChat(stream RouteNote) returns (stream RouteNote) {}

There is explicit support in grapesy for these four communication patterns, both for defining servers and for defining clients. In addition, there is a lower-level API which provides more control over the communication; we will see some examples in Beyond the basics.

Generated code

As in the Quickstart, we have set things up in the example to automatically generate Haskell code from the .proto definition. There is however one more thing that we need to take care of, which we glossed over previously. The .proto definition is sufficient to determine the types of the methods of the service, their arguments, and their results. But it does not say anything about the type of any metadata. We don’t need any metadata in this example, so we can declare the following module:

module Proto.API.RouteGuide (
    module Proto.RouteGuide
  ) where

import Network.GRPC.Common
import Network.GRPC.Common.Protobuf

import Proto.RouteGuide

type instance RequestMetadata          (Protobuf RouteGuide meth) = NoMetadata
type instance ResponseInitialMetadata  (Protobuf RouteGuide meth) = NoMetadata
type instance ResponseTrailingMetadata (Protobuf RouteGuide meth) = NoMetadata

This re-exports module Proto.RouteGuide (which was generated), along with three type family instances that indicate that none of the methods of the RouteGuide require metadata. We will see an example of using metadata later.

Proto wrapper

In the repository you will find an implementation of the logic of the RouteGuide example as a collection of pure functions; see tutorials/basics/src/RouteGuide.hs. For example, the type of the function that looks up which feature exists at a particular point, given the example database of features, is given by:

featureAt :: DB -> Proto Point -> Maybe (Proto Feature)

The precise implementation is not very important for our purposes here, but we should discuss that Proto wrapper. This is a type-level marker that explicitly identifies Protobuf values. Such values don’t behave like regular Haskell values; for example, record fields always have defaults, enums might have unknown values, etc. The idiomatic way of accessing fields of a Proto value is using a lens access and an (overloaded) label; for example, the following expression extracts a field #location from a feature (f :: Proto Feature):

f ^. #location

To construct a Proto value you first create an empty value using defMessage, and then update individual fields with a lens update. For example, here is how we might construct a Proto RouteSummary:

defMessage
  & #pointCount   .~ ..
  & #featureCount .~ ..
  & #distance     .~ ..
  & #elapsedTime  .~ ..

Everything required to work with Protobuf values is (re-)exported from Network.GRPC.Common.Protobuf. In addition, Network.GRPC.Common.Protobuf.Any provides functionality for working with the Protobuf Any type.

Implementing the server

We can use the type checker to help us in the development of the server. We know that we want to implement the methods of the RouteGuide service; if we define

methods :: DB -> Methods IO (ProtobufMethodsOf RouteGuide)
methods db = _

the type checker will tell us that it expects something of this type4:

_ :: Methods IO [
    Protobuf RouteGuide "getFeature"
  , Protobuf RouteGuide "listFeatures"
  , Protobuf RouteGuide "recordRoute"
  , Protobuf RouteGuide "routeChat"
  ]

We can therefore refine methods to

methods :: DB -> Methods IO (ProtobufMethodsOf RouteGuide)
methods db =
      Method _getFeature
    $ Method _listFeatures
    $ Method _recordRoute
    $ Method _routeChat
    $ NoMoreMethods

at which point the type checker informs us:

_getFeature   :: ServerHandler' NonStreaming    IO (Protobuf RouteGuide "getFeature")
_listFeatures :: ServerHandler' ServerStreaming IO (Protobuf RouteGuide "listFeatures")
_recordRoute  :: ServerHandler' ClientStreaming IO (Protobuf RouteGuide "recordRoute")
_routeChat    :: ServerHandler' BiDiStreaming   IO (Protobuf RouteGuide "routeChat")

We can therefore refine once more to

methods :: DB -> Methods IO (ProtobufMethodsOf RouteGuide)
methods db =
      Method (mkNonStreaming    $ _getFeature)
    $ Method (mkServerStreaming $ _listFeatures)
    $ Method (mkClientStreaming $ _recordRoute)
    $ Method (mkBiDiStreaming   $ _routeChat)
    $ NoMoreMethods

The resulting types will depend on the communication pattern (non-streaming, client-side streaming, etc.). We will discuss them one by one.

Non-streaming RPC

The first method is a non-streaming RPC, for which the type checker infers:

_getFeature :: Proto Point -> IO (Proto Feature)

That is, we are given a point of interest, and must return “the” feature at that point. We will also need the database of features. The implementation is straight-forward, and essentially just wraps the pure function featureAt:

getFeature :: DB -> Proto Point -> IO (Proto Feature)
getFeature db p = return $ fromMaybe (defMessage & #location .~ p) (featureAt db p)

The only minor complication here is that we need to construct some kind of default location for when there is no feature found at point p.

Server-side streaming

For server-side streaming we are given the input from the client, along with a function that we can use to send outputs back to the client:

_listFeatures :: Proto Rectangle -> (NextElem (Proto Feature) -> IO ()) -> IO ()

NextElem is similar to Maybe:

data NextElem a = NoNextElem | NextElem !a

but with a more specialized API. For example, it offers

forM_ :: Monad m => [a] -> (NextElem a -> m ()) -> m ()

which will invoke the specified callback NextElem x for all x in the list, and then invoke the callback once more with NoNextElem. We can use this to implement listFeatures:

listFeatures :: DB -> Proto Rectangle -> (NextElem (Proto Feature) -> IO ()) -> IO ()
listFeatures db r send = NextElem.forM_ (featuresIn db r) send

Client-side streaming

For client-side streaming we are given a function to receive inputs from the client, and must produce a single output to be sent back to the client:

_recordRoute :: IO (NextElem (Proto Point)) -> IO (Proto RouteSummary)

To implement it, we can use another function from the NextElem API:

collect :: Monad m => m (NextElem a) -> m [a]

The only other complication is that the function which constructs the RouteSummary also wants to know how long it took to collect all points:

recordRoute :: DB -> IO (NextElem (Proto Point)) -> IO (Proto RouteSummary)
recordRoute db recv = do
    start <- getCurrentTime
    ps    <- NextElem.collect recv
    stop  <- getCurrentTime
    return $ summary db (stop `diffUTCTime` start) ps

Bidirectional streaming

For bidirectional streaming finally we get two functions: one to receive inputs from the client, and one to send outputs back to the client:

_routeChat ::
     IO (NextElem (Proto RouteNote))
  -> (NextElem (Proto RouteNote) -> IO ())
  -> IO ()

The implementation is straight-forward and does not require any new grapesy features; you can find it in tutorials/basics/app/Server.hs.

Top-level application

The main server application then looks like this:

main :: IO ()
main = do
    db <- getDB
    runServerWithHandlers def config $ fromMethods (methods db)
  where
    config :: ServerConfig
    config = ServerConfig {
          serverInsecure = Just (InsecureConfig Nothing defaultInsecurePort)
        , serverSecure   = Nothing
        }

The first parameter to runServerWithHandlers is the server parameters. The most important parameters to consider are serverTopLevel and serverExceptionToClient. These two are related, and describe how to deal with exceptions:

  • serverTopLevel says what to do with exceptions server-side; by default it simply prints them to stderr
  • serverExceptionToClient says what information to include in the error sent to the client; by default it calls displayException. You may wish to override this if you are concerned about leaking security sensitive information.

Implementing the client

You can find the complete client in tutorials/basics/app/Client.hs.

Connecting to the server

Before we can make any RPCs, we have to connect to the server:

main :: IO ()
main =
    withConnection def server $ \conn -> do
      ..
  where
    server :: Server
    server = ServerInsecure $ Address "127.0.0.1" defaultInsecurePort Nothing

The first argument is the connection parameters, the most important of which is probably the reconnection policy, which (amongst other things) is used to enable Wait-for-Ready semantics.

Simple RPC

We already saw how to make a simple non-streaming RPC in the quickstart:

getFeature :: Connection -> IO ()
getFeature conn = do
    let req = defMessage
                & #latitude  .~  409146138
                & #longitude .~ -746188906
    resp <- nonStreaming conn (rpc @(Protobuf RouteGuide "getFeature")) req
    print resp

We construct a request, do the RPC, and print the response.

Server-side streaming

When we make a server-side streaming RPC, we are given a function we can call to get all of the server outputs:

listFeatures :: Connection -> IO ()
listFeatures conn = do
    let req = ..
    serverStreaming conn (rpc @(Protobuf RouteGuide "listFeatures")) req $ \recv ->
      NextElem.whileNext_ recv print

Here we are using another function from the NextElem API, in a sense dual to the one we used server-side; for comparison, both types:

forM_      :: Monad m => [a] -> (NextElem a -> m ()) -> m ()
whileNext_ :: Monad m => m (NextElem a) -> (a -> m b) -> m ()

Client-side streaming

To make a client-side streaming RPC, we are given a function that we can use to send inputs to the server; once we are done sending all inputs, we then receive the final (and only) output from the server:

recordRoute :: Connection -> IO ()
recordRoute conn = do
    resp <- clientStreaming_ conn (rpc @(Protobuf RouteGuide "recordRoute")) $ \send -> do
      replicateM_ 10 $ do
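        -- NB: db (the example feature database) and the index i are defined
        -- elsewhere in the full example; they are elided in this fragment.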
        let p = (db !! i) ^. #location
        send $ NextElem p
        threadDelay 500_000 -- 0.5 seconds
      send NoNextElem
    print resp

Bidirectional streaming

Finally, for bidirectional streaming we are given two functions, one to send, and one to receive. In this particular case, we can first send all inputs and then receive all outputs, but in general these can be interleaved in any order:

routeChat :: Connection -> IO ()
routeChat conn = do
    biDiStreaming conn (rpc @(Protobuf RouteGuide "routeChat")) $ \send recv -> do
      NextElem.forM_ messages send
      NextElem.whileNext_ recv print
  where
    messages = ..

See also The Haskell Unfolder, episode 27: duality for a more in-depth look into the duality between these various communication patterns.

End of output

When we discussed the client-side implementation of a client-side streaming RPC, we used function clientStreaming_:

clientStreaming_ ::
     ..
  -> (    (NextElem (Input rpc) -> m ())
       -> m ()
     )
  -> m (Output rpc)

The callback is given a function (which we called send) to send outputs to the server. The problem with this approach is that it’s possible to forget to call this function; in particular, it’s quite easy to forget the final

send NoNextElem

to indicate to the server that there is no further input coming. In some cases iteration functions such as NextElem.forM_ can take care of this, but this could also result in the opposite problem, calling send on a NextElem after NoNextElem has already been sent.

In short: make sure to send NoNextElem in clients or servers that stream values to their peer:

  • If you forget to do this in a server handler, grapesy will assume this is a bug and throw a HandlerTerminated exception, which will be reported as a gRPC exception with an unknown error on the client.

  • If you forget to do this in a client, grapesy will assume that you intend to cancel the RPC. The server will see the call being closed suddenly5, and on the client this will result in a gRPC exception with “cancelled” as the error.

Sending more elements after sending NoNextElem will result in a SendAfterFinal exception.

Side note. In principle it is possible to give clientStreaming_ a different type:

-- Alternative definition, not actually used in grapesy
clientStreaming_ ::
     ..
  -> m (NextElem (Input rpc))
  -> m (Output rpc)

In this style there is no callback at all; instead, we must provide an action that produces the next element one by one, and the library will ensure that the function is called repeatedly until it returns NoNextElem. This amounts to inversion of control: you don’t call a function to send each value, but the library calls you to ask what the next value to be sent is. This provides stronger guarantees that the communication pattern is implemented correctly, but we deemed the cost too high: it results in quite an awkward programming model. Of course, if you want to, nothing stops you from defining such an API on top of the API offered by grapesy.
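
For instance, a small adapter in that spirit might look like this (an illustration only, not a grapesy function): it repeatedly pulls elements from a producer and pushes them through the callback-style send, stopping after NoNextElem:

-- Drive a callback-style sender from a pull-style producer, using the
-- NextElem type shown earlier.
pullToPush :: Monad m => m (NextElem a) -> (NextElem a -> m ()) -> m ()
pullToPush produce send = go
  where
    -- keep pulling and forwarding until the producer yields NoNextElem,
    -- which is forwarded as well and then terminates the loop
    go = do
      next <- produce
      send next
      case next of
        NoNextElem -> pure ()
        NextElem _ -> go

A client could then pass pullToPush produceNext as the callback to clientStreaming_, and the final NoNextElem is guaranteed to be sent by construction.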

Beyond the basics

In this section we describe some of the more advanced features of grapesy.

Using the low-level API

Both the Quickstart and the Basics tutorial used the StreamType API, which captures the four different communication patterns (aka streaming types) used in Protobuf, both on the server and on the client: non-streaming, server-side streaming, client-side streaming, and bidirectional streaming. Although these four communication patterns originate in Protobuf, in grapesy they are not Protobuf specific and can also be used with other message encodings.

The high-level API will probably suffice for the vast majority of gRPC applications, but not quite all, and grapesy also offers a low-level API. The most important reasons to use the low-level API instead are:

  • Making sure that the final message is marked as final; we discuss this in more detail in this section in Final elements.

  • Sending and receiving metadata; we will discuss this in detail in the next section Using metadata.

  • Preference: some people may simply prefer the style of the low-level API over the high-level API.

Although the use of the low-level API does come with some responsibilities that are taken care of for you in the high-level API, it is not significantly more difficult to use.

Final elements

When we discussed the high-level API, we saw the NextElem type. The low-level API uses StreamElem instead; here they are side by side:

data NextElem     a = NoNextElem     | NextElem   !a
data StreamElem b a = NoMoreElems !b | StreamElem !a | FinalElem !a !b

There are two differences here:

  • When there are no more elements, we record an additional value. This is the metadata to be sent or received after the final element. We will see an example of this below; for RouteGuide this metadata will always be NoMetadata, which is a trivial type isomorphic to ():

    data NoMetadata = NoMetadata
  • The final element can be marked as final, rather than requiring a separate NoMoreElems value. This may feel like an insignificant difference, and it is a technicality, but in some cases it’s an important one.

To understand the need for marking the final element, we need to understand that gRPC messages are transferred over HTTP2 DATA frames. It’s not necessarily true that one frame corresponds to one message, but let’s for the sake of simplicity assume that it is. Then in order to send 3 messages, we have two options:

Option 1: empty final frame
  frame 1: message 1
  frame 2: message 2
  frame 3: message 3
  frame 4: empty, marked END_STREAM

Option 2: mark final message
  frame 1: message 1
  frame 2: message 2
  frame 3: message 3, marked END_STREAM

corresponding to

    [StreamElem 1, StreamElem 2, StreamElem 3, NoMoreElems NoMetadata]
and [StreamElem 1, StreamElem 2, FinalElem 3 NoMetadata]

respectively. This matters because some servers report an error if they receive a message that they expect will be the final message, but the corresponding HTTP2 DATA frame is not marked END_STREAM. This is not completely unreasonable: after all, waiting to receive the next DATA frame might be a blocking operation.

This is particularly important in cases where a server (or client) only expects a single message (non-streaming, client-side streaming, expecting a single output from the server, or server-side streaming, expecting a single input from the client). It is much less critical in other situations, which is why the high-level API gets away with using NextElem instead of StreamElem (which it uses only when multiple messages are expected).
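
As an aside, the relationship between the two types can be captured by a pair of small conversions (my own illustration, not functions offered by grapesy). Note that going from NextElem to StreamElem can never produce FinalElem: with NextElem you only learn that an element was the last one after the fact, which is exactly the limitation discussed above.

nextToStream :: b -> NextElem a -> StreamElem b a
nextToStream b NoNextElem   = NoMoreElems b
nextToStream _ (NextElem a) = StreamElem a

streamToNext :: StreamElem b a -> (NextElem a, Maybe b)
streamToNext (StreamElem a)  = (NextElem a, Nothing)
streamToNext (FinalElem a b) = (NextElem a, Just b)
streamToNext (NoMoreElems b) = (NoNextElem, Just b)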

On the server

To use the low-level API on the server, you can either use RawMethod to use the low-level API for some (or all) of the methods of an API, or you can avoid the use of fromMethods altogether. The latter option is primarily useful if you don’t have a type-level description of your service available. If you do, the first option is safer:

methods :: DB -> Methods IO (ProtobufMethodsOf RouteGuide)
methods db =
      RawMethod (mkRpcHandler $ getFeature   db)
    $ RawMethod (mkRpcHandler $ listFeatures db)
    $ RawMethod (mkRpcHandler $ recordRoute  db)
    $ RawMethod (mkRpcHandler $ routeChat      )
    $ NoMoreMethods

It is also possible to use the high-level API for most methods, and escape to the low-level API for those methods that need it.
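
For example, a hedged sketch of such a mixed setup might look as follows, assuming the high-level RouteGuide handlers from the Basics tutorial for the first three methods and the low-level routeChat shown above:

methods :: DB -> Methods IO (ProtobufMethodsOf RouteGuide)
methods db =
      Method    (mkNonStreaming    $ getFeature   db)  -- high-level handler
    $ Method    (mkServerStreaming $ listFeatures db)  -- high-level handler
    $ Method    (mkClientStreaming $ recordRoute  db)  -- high-level handler
    $ RawMethod (mkRpcHandler      $ routeChat       ) -- low-level handler
    $ NoMoreMethods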

Unlike with the high-level API, the signature of all handlers that use the low-level API is the same:

getFeature   :: DB -> Call (Protobuf RouteGuide "getFeature")   -> IO ()
listFeatures :: DB -> Call (Protobuf RouteGuide "listFeatures") -> IO ()
recordRoute  :: DB -> Call (Protobuf RouteGuide "recordRoute")  -> IO ()
routeChat    ::       Call (Protobuf RouteGuide "routeChat")    -> IO ()

The most important two functions6 for communication on the server are recvInput and sendOutput:

recvInput  :: Call rpc -> IO (StreamElem NoMetadata (Input rpc))
sendOutput :: Call rpc -> StreamElem (ResponseTrailingMetadata rpc) (Output rpc) -> IO ()

For convenience there are also some derived functions available; for example, here is getFeature again, now using the low-level API:

getFeature :: DB -> Call (Protobuf RouteGuide "getFeature") -> IO ()
getFeature db call = do
    p <- recvFinalInput call
    sendFinalOutput call (
        fromMaybe (defMessage & #location .~ p) (featureAt db p)
      , NoMetadata
      )

The StreamElem API also offers some iteration functions similar to the ones offered by NextElem; for example, here is listFeatures:

listFeatures :: DB -> Call (Protobuf RouteGuide "listFeatures") -> IO ()
listFeatures db call = do
    r <- recvFinalInput call
    StreamElem.forM_ (featuresIn db r) NoMetadata (sendOutput call)
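
For comparison, here is a hedged sketch of how the client-streaming recordRoute handler might look when written directly against recvInput, pattern matching on StreamElem; the summarize helper is hypothetical and stands in for the summary logic of the Basics tutorial:

recordRoute :: DB -> Call (Protobuf RouteGuide "recordRoute") -> IO ()
recordRoute db call = loop []
  where
    -- Collect points until the client indicates end of input, then reply
    -- with a single RouteSummary.
    loop acc = recvInput call >>= \case
      StreamElem  p            -> loop (p : acc)
      FinalElem   p NoMetadata -> done (p : acc)
      NoMoreElems   NoMetadata -> done acc
    done ps = sendFinalOutput call (summarize db (reverse ps), NoMetadata)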

The full server definition is available in tutorials/lowlevel/app/Server.hs.

On the client

The main function to make an RPC using the low-level API is withRPC. For example, here is getFeature:

getFeature :: Connection -> IO ()
getFeature conn = do
    let req = ..
    withRPC conn def (Proxy @(Protobuf RouteGuide "getFeature")) $ \call -> do
      sendFinalInput call req
      resp <- recvFinalOutput call
      print resp

The second argument to withRPC is the call parameters, of which there are two important ones: the timeout for this RPC, and the request metadata. (When using the high-level API the only way to set a timeout is to specify the default RPC timeout for the connection.)

End of output, revisited

At the end of the basics tutorial, we emphasized the importance of indicating end of output for streaming clients and server handlers. The discussion there is relevant when using the low-level API as well, with some additional caveats:

  • In the high-level API, the library can take care of marking the (only) value for non-streaming output as final; in the low-level API, this is your own responsibility, either through calling sendFinalInput / sendFinalOutput or through calling sendInput / sendOutput and constructing the StreamElem manually.

  • For streaming outputs, you can use sendEndOfInput (clients) or sendTrailers (servers) to indicate end of output after the fact (like NoNextElem does), or use sendFinalInput / sendFinalOutput to mark the final element as final when you send it. The latter should be preferred whenever possible; see the sketch below.
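
As a rough illustration, here is a hedged sketch of the two styles on the client, using only the low-level primitives named above; msgs is a hypothetical (non-empty) list of inputs:

-- After the fact: send every input, then indicate end of output separately.
mapM_ (sendInput call . StreamElem) msgs
sendEndOfInput call

-- Preferred: mark the last input as final when sending it.
mapM_ (sendInput call . StreamElem) (init msgs)
sendInput call (FinalElem (last msgs) NoMetadata)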

Using metadata

As an example of using metadata, let’s construct a simple file server which tells the client the size of the file to be downloaded as the initial response metadata, then streams the contents of the file as a series of chunks, and finally reports a SHA256 hash of the file contents in the trailing response metadata. The client can use the initial file size metadata to show a progress bar, and the hash in the trailing metadata to verify that everything went well.

You can find the full example in tutorials/metadata.

Service definition

The .proto file is straightforward:

syntax = "proto3";

package fileserver;

service Fileserver {
  rpc Download (File) returns (stream Partial) {}
}

message File {
  string name = 1;
}

message Partial {
  bytes chunk = 1;
}

As mentioned above, however, the .proto definition does not tell us the type of the metadata. We need to do this in Haskell:

type instance RequestMetadata          (Protobuf Fileserver "download") = NoMetadata
type instance ResponseInitialMetadata  (Protobuf Fileserver "download") = DownloadStart
type instance ResponseTrailingMetadata (Protobuf Fileserver "download") = DownloadDone

data DownloadStart = DownloadStart {
      downloadSize :: Integer
    }
  deriving stock (Show)

data DownloadDone = DownloadDone {
      downloadHash :: ByteString
    }
  deriving stock (Show)

(In this example we make no use of request metadata; see callRequestMetadata for the main entry point for setting request metadata.)

Serialization

In order for the server to be able to send the metadata to the client, we need to be able to serialize it as zero, one, or more headers/trailers. This means we must give an instance of BuildMetadata:

instance BuildMetadata DownloadStart where
  buildMetadata DownloadStart{downloadSize} = [
        CustomMetadata "download-size" $ C8.pack (show downloadSize)
      ]

instance BuildMetadata DownloadDone where
  buildMetadata DownloadDone{downloadHash} = [
        CustomMetadata "download-hash-bin" downloadHash
      ]

Note the use of the -bin suffix for the name of the download-hash-bin trailer: this indicates that this is metadata containing binary data, and that it must be Base64-encoded; grapesy will automatically take care of encoding and decoding for binary metadata.

We need to take care of one more thing. The HTTP2 spec mandates that clients must be informed up-front which trailing headers they can expect. In grapesy this comes down to giving an instance of StaticMetadata:

instance StaticMetadata DownloadDone where
  metadataHeaderNames _ = ["download-hash-bin"]

This can be an over-approximation but not an under-approximation; if you return a trailer in BuildMetadata that was not declared in StaticMetadata, then grapesy will throw an exception.

Deserialization

For deserialization we must provide an instance of ParseMetadata, which is given all metadata headers to parse. In our example this is relatively simple because our metadata uses only a single header:

instance ParseMetadata DownloadStart where
  parseMetadata md =
      case md of
        [CustomMetadata "download-size" value]
          | Just downloadSize <- readMaybe (C8.unpack value)
          -> return $ DownloadStart{downloadSize}
        _otherwise
          -> throwM $ UnexpectedMetadata md

instance ParseMetadata DownloadDone where
  parseMetadata md =
      case md of
        [CustomMetadata "download-hash-bin" downloadHash]
          -> return $ DownloadDone{downloadHash}
        _otherwise
          -> throwM $ UnexpectedMetadata md

These particular instances will throw an error if additional metadata is present. This is a choice, and instead we could simply ignore any additional headers. There is no single right answer here: ignoring additional metadata runs the risk of not realizing that the peer is trying to tell you something important, but throwing an error runs the risk of unnecessarily aborting an RPC.
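
For instance, a more lenient variant of the DownloadStart instance, which ignores any headers it does not recognize, might look something like this (a sketch of the alternative only, not what the example in tutorials/metadata does):

instance ParseMetadata DownloadStart where
  parseMetadata md = go md
    where
      -- Look for the one header we care about and ignore everything else.
      go (CustomMetadata "download-size" value : _)
        | Just downloadSize <- readMaybe (C8.unpack value)
        = return DownloadStart{downloadSize}
      go (_ : rest) = go rest
      go []         = throwM $ UnexpectedMetadata md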

Specifying initial response metadata

The metadata that is sent to the client with the response headers can be overridden with setResponseInitialMetadata. This can be done at any point before the response is initiated, which happens either explicitly using initiateResponse or implicitly by sending the first output to the client using sendOutput and related functions.

Most server handlers, however, don’t care about metadata, and prefer not to have to call setResponseInitialMetadata at all. For this reason mkRpcHandler has type

mkRpcHandler :: Default (ResponseInitialMetadata rpc) => ..

This constraint is inherited by the high-level API, which doesn’t support metadata at all:

Method :: (Default (ResponseInitialMetadata rpc), Default (ResponseTrailingMetadata rpc)) => ..

Crucially, there is a Default instance for NoMetadata:

instance Default NoMetadata where
  def = NoMetadata

In our case however we cannot provide a Default instance, because the initial metadata depends on the file size. We therefore use mkRpcHandlerNoDefMetadata instead:

methods :: Methods IO (ProtobufMethodsOf Fileserver)
methods =
      RawMethod (mkRpcHandlerNoDefMetadata download)
    $ NoMoreMethods

This means we must call setResponseInitialMetadata in the handler; if we don’t, an exception will be raised when the response is initiated.

Server handler

Since we are using the low-level API (we must, if we want to deal with metadata), the server handler has this signature:

download :: Call (Protobuf Fileserver "download") -> IO ()
download call = do

We wait for the request from the client, get the file size, set the response initial metadata, and initiate the response. Explicitly initiating the response in this manner is not essential, but it means that the file size is sent to the client (along with the rest of the response headers) before the first chunk is sent; in some cases this may be important:

req :: Proto File <- recvFinalInput call
let fp :: FilePath
    fp = Text.unpack (req ^. #name)

fileSize <- getFileSize fp
setResponseInitialMetadata call $ DownloadStart fileSize
initiateResponse call

We then open the file the client requested, and keep reading chunks until we have reached end of file. Although it is probably not particularly critical in this case, we follow the recommendations from End of output, revisited and mark the final chunk as the final output to the client, as opposed to telling the client that no more outputs are available after the fact.

withFile fp ReadMode $ \h -> do
  let loop :: SHA256.Ctx -> IO ()
      loop ctx = do
          chunk <- BS.hGet h defaultChunkSize
          eof   <- hIsEOF h

          let resp :: Proto Partial
              resp = defMessage & #chunk .~ chunk

              ctx' :: SHA256.Ctx
              ctx' = SHA256.update ctx chunk

          if eof then
            sendFinalOutput call (resp, DownloadDone $ SHA256.finalize ctx')
          else do
            sendNextOutput call resp
            loop ctx'

  loop SHA256.init

When we send the final output, we must also include the hash that we computed as we were streaming the file to the client.

Client

Let’s first consider how to process the individual chunks that we get from the server. We do this in an auxiliary function processPartial:

processPartial ::
     Handle
  -> Proto Partial
  -> ProgressT (StateT SHA256.Ctx IO) ()
processPartial h partial = do
    liftIO $ BS.hPut h chunk
    modify $ flip SHA256.update chunk
    updateProgressBar $ BS.length chunk
  where
    chunk :: ByteString
    chunk = partial ^. #chunk

We do three things in this function: write the chunk to disk, update the hash, and update the progress bar; this uses StateT to keep track of the partially computed hash, and ProgressT for a simple progress bar (ProgressT is defined in tutorials/metadata/app/ProgressT.hs; its details are not important here).

With this in hand, we can now define the main client function. We are given some file inp that we are interested in downloading, and a path out where we want to store it locally. As on the server, here too we must use the low-level API, so the client starts like this:

download :: Connection -> String -> String -> IO ()
download conn inp out = do
    withRPC conn def (Proxy @(Protobuf Fileserver "download")) $ \call -> do
      sendFinalInput call $ defMessage & #name .~ Text.pack inp

We then wait for the initial response metadata, telling us how big the file is:

DownloadStart{downloadSize} <- recvResponseInitialMetadata call

We then use StreamElem.whileNext_ again to process all the chunks with the processPartial function we just discussed, unwrap the monad stack, and finally do a hash comparison:

(DownloadDone{downloadHash = theirHash}, ourHash) <-
  withFile out WriteMode $ \h ->
    flip runStateT SHA256.init . runProgressT downloadSize $
      StreamElem.whileNext_ (recvOutput call) (processPartial h)

putStrLn $ "Hash match: " ++ show (theirHash == SHA256.finalize ourHash)

Custom monad stack

In this section we will briefly discuss how to use custom monad stacks. You can find the full tutorial in tutorials/monadstack; it is a variant on Basics tutorial.

On the server

Most of the server handlers for the RouteGuide service need to take the DB as an argument:

getFeature   :: DB -> Proto Point -> IO (Proto Feature)
listFeatures :: DB -> Proto Rectangle -> (NextElem (Proto Feature) -> IO ()) -> IO ()
recordRoute  :: DB -> IO (NextElem (Proto Point)) -> IO (Proto RouteSummary)

It might be more convenient to define a custom Handler monad stack in which we have access to the DB at all times:

newtype Handler a = WrapHandler {
      unwrapHandler :: ReaderT DB IO a
    }
  deriving newtype (Functor, Applicative, Monad, MonadIO, MonadReader DB)

runHandler :: DB -> Handler a -> IO a
runHandler db = flip runReaderT db . unwrapHandler

The types of our handlers then become

getFeature   :: Proto Point -> Handler (Proto Feature)
listFeatures :: Proto Rectangle -> (NextElem (Proto Feature) -> IO ()) -> Handler ()
recordRoute  :: IO (NextElem (Proto Point)) -> Handler (Proto RouteSummary)
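
As a hedged illustration (not part of the tutorial code), getFeature might then use ask from the derived MonadReader DB instance, reusing the featureAt helper from the Basics tutorial:

getFeature :: Proto Point -> Handler (Proto Feature)
getFeature p = do
    db <- ask  -- the DB comes from the ReaderT environment
    return $ fromMaybe (defMessage & #location .~ p) (featureAt db p)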

Note that the callbacks to send or receive values still live in IO. The DB argument now disappears from methods also:

methods :: Methods Handler (ProtobufMethodsOf RouteGuide)
methods =
      Method (mkNonStreaming    getFeature  )
    $ Method (mkServerStreaming listFeatures)
    $ Method (mkClientStreaming recordRoute )
    $ Method (mkBiDiStreaming   routeChat   )
    $ NoMoreMethods

The only requirement from grapesy is that at the top-level we can hoist this monad stack into IO, using hoistMethods:

hoistMethods :: (forall a. m a -> n a) -> Methods m rpcs -> Methods n rpcs

Here’s how we can run the server:

runServerWithHandlers def config $ fromMethods $
  hoistMethods (runHandler db) methods

On the client

For the high-level API there is support for custom monad stacks also. One reason why you might want to do this is to avoid having to pass the Connection object around all the time. In the Basics tutorial our client functions had these signatures:

getFeature   :: Connection -> IO ()
listFeatures :: Connection -> IO ()
recordRoute  :: Connection -> IO ()
routeChat    :: Connection -> IO ()

Like on the server, we can define a custom monad stack to reduce syntactic overhead:

newtype Client a = WrapClient {
      unwrapClient :: ReaderT ClientEnv IO a
    }
  deriving newtype (Functor, Applicative, Monad, MonadIO, MonadCatch, MonadThrow, MonadMask)

data ClientEnv = ClientEnv {
      conn :: Connection
    }

In order for such a monad stack to be usable, it needs to implement MonadIO and MonadMask, as well as CanCallRPC; it’s this last class that tells grapesy how to get access to the Connection object:

instance CanCallRPC Client where
  getConnection = WrapClient $ conn <$> ask

With this defined, we can now avoid having to pass the connection around at all. Instead of importing from Network.GRPC.Client.StreamType.IO we import from Network.GRPC.Client.StreamType.CanCallRPC instead, which gives us a different definition of nonStreaming and friends. For example, here is getFeature:

getFeature :: Client ()
getFeature = do
    let req = ..
    resp <- nonStreaming (rpc @(Protobuf RouteGuide "getFeature")) req
    liftIO $ print resp

As with the server handlers, the callbacks provided to send and receive messages still live in IO; this means that we’ll need to liftIO them where appropriate:

listFeatures :: Client ()
listFeatures = do
    let req = ..
    serverStreaming (rpc @(Protobuf RouteGuide "listFeatures")) req $ \recv -> liftIO $
      NextElem.whileNext_ recv print

Using conduits

We discussed the simplest form of serverStreaming and co when we discussed the implementation of the client in the Basics tutorial, and we have also seen the generalized form to arbitrary monad stacks. There is one more form, provided in Network.GRPC.Client.StreamType.Conduit, which provides an API using conduits. You can find this example in tutorials/conduit; there is currently no conduit support on the server side.

The main idea is that serverStreaming provides a source to stream from, and clientStreaming_ provides a sink to stream to:

listFeatures :: Connection -> IO ()
listFeatures conn = do
    let req = ..

    let sink :: ConduitT (Proto Feature) Void IO ()
        sink = ..

    serverStreaming conn (rpc @(Protobuf RouteGuide "listFeatures")) req $ \source ->
      runConduit $ source .| sink

recordRoute :: Connection -> IO ()
recordRoute conn = do
    let source :: ConduitT () (Proto Point) IO ()
        source = ..

    resp <- clientStreaming_ conn (rpc @(Protobuf RouteGuide "recordRoute")) $ \sink ->
              runConduit $ source .| sink
    print resp

Finally, in bidirectional streaming we get two conduits, one in each direction (that is, one source and one sink).

(Ab)using Trailers-Only

For this final section we need to consider some more low-level details about how gRPC is sent over HTTP2. When we discussed final elements, we mentioned that gRPC messages are sent using HTTP2 DATA frames, but we didn’t talk about headers. In general, a gRPC request looks like this:

  1. One or more HEADERS frames, containing the request headers. One of the most important headers here is the :path (pseudo) header, which indicates which RPC we want to invoke; for example, this might be /routeguide.RouteGuide/ListFeatures.

  2. One or more DATA frames, the last of which is marked END_STREAM. We discussed these before.

This is probably as expected, but the structure of the response may look a bit more surprising:

  1. Just like the request, we first get one or more HEADERS. An important example here is the content-type response header, which indicates what kind of message encoding is being used (for example, application/grpc+proto for Protobuf).

  2. One or more DATA frames, the last of which is marked END_STREAM.

  3. Finally, another set of headers, also known as trailers. This set of trailers provides some concluding information about how the RPC went; for example, if the RPC failed, then the trailers will include a grpc-status header with a non-zero value. Any application specific response trailing metadata (such as the checksum we discussed in the file server example) is included here as well.

There is however a special case, known as Trailers-Only: if there is no data to be sent at all, it is possible to send only HEADERS frames, the last of which is marked END_STREAM, and no DATA frames at all. Put another way, the two sets of headers (headers and trailers) are combined, and the data frames are omitted entirely.

The gRPC specification is very explicit about the use of Trailers-Only, and states that it can be used only in RPCs that result in an error:

Most responses are expected to have both headers and trailers but Trailers-Only is permitted for calls that produce an immediate error.

In grapesy this will happen automatically: if a server handler raises an error, and no outputs have as yet been sent to the client, then grapesy will automatically take advantage of Trailers-Only and only send a single set of headers.
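
For instance, a handler along these lines (a hedged sketch, reusing the shape of the earlier getFeature handler) would result in a Trailers-Only response whenever it throws before any output has been sent:

getFeature :: DB -> Call (Protobuf RouteGuide "getFeature") -> IO ()
getFeature db call = do
    p <- recvFinalInput call
    case featureAt db p of
      -- No output has been sent yet, so this error goes out as Trailers-Only.
      Nothing -> throwIO $ userError "no feature at this location"
      Just f  -> sendFinalOutput call (f, NoMetadata)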

However, some gRPC servers also make use of Trailers-Only in non-error cases, when there is no output (e.g. for server-side streaming). Since this does not conform to the gRPC specification, grapesy will not do this automatically, but it is possible if really needed. In tutorials/trailers-only you can find an example RouteGuide server which will take advantage of Trailers-Only in the listFeatures method, when there are no features to return:

listFeatures :: DB -> Call (Protobuf RouteGuide "listFeatures") -> IO ()
listFeatures db call = do
    r <- recvFinalInput call
    case featuresIn db r of
      [] -> sendTrailersOnly call NoMetadata
      ps -> StreamElem.forM_ ps NoMetadata (sendOutput call)

The difference between this implementation and the previous one can only be observed when we look at the raw network traffic; the difference is not visible at the gRPC level. Since this violates the specification, however, it’s possible (though perhaps unlikely) that some clients will be confused by this server.

Future work

The gRPC specification is only the core of the gRPC ecosystem. There are additional features that are defined on top, some of which are supported by grapesy (see list of features at the start of this post), but not all; the features that are not yet supported are listed below. Note that these are optional features, which have various degrees of support in the official clients and servers. If you or your company needs any of these features, we’d be happy to discuss options; please contact us at info@well-typed.com.

  • Authentication. The gRPC Authentication Guide mentions three ways to authenticate: SSL/TLS, Application Layer Transport Security (ALTS) and token-based authentication, possibly through OAuth2. Of these three only SSL/TLS is currently supported by grapesy.

  • Interceptors are essentially a form of middleware that are applied to every request, and can be used for things like metrics (see below).

  • Custom Backend Metrics. There is support in grapesy for parsing or including an Open Request Cost Aggregation (ORCA) load report in the response trailers through the endpoint-load-metrics-bin trailer, but there is otherwise no support for ORCA or custom backend metrics in general.

  • Load balancing. There is very limited support for load balancing in the ReconnectPolicy, but we have no support for load balancing as described in the Custom Load Balancing Policies Guide.

  • Custom name resolution.

  • Automatic deadline propagation. There is of course support for setting timeouts, but there is no support for automatic propagation from one server to another, adjusting for clock skew. See the section “Deadline Propagation” in Deadlines Guide for server.

  • Introspection: services that allow querying the state of the server.

  • True binary metadata. There is support in grapesy for sending binary metadata (in -bin headers/trailers), using base64 encoding (as per the spec). True binary metadata is about avoiding this encoding overhead.

  • Sending keep-alive pings (this will require adding this feature to the http2 library).

  • Retry policies. The gRPC documentation currently identifies two such policies: request hedging, which sends the same request to a number of servers, waiting for the first response it receives; and automatic retries of failed requests. There is support in grapesy for the grpc-previous-rpc-attempts request header as well as the grpc-retry-pushback-ms response trailer, necessary to support these features.

Footnotes

  1. There are actually two ways to use JSON with gRPC. It can be a very general term, simply meaning using an otherwise-unspecified JSON encoding, or it can specifically refer to “Protobuf over JSON”. The former is supported by grapesy; the latter is not yet.↩︎

  2. The cancellation guide describes client-side cancellation as “A client cancels an RPC call by calling a method on the call object or, in some languages, on the accompanying context object.”. In grapesy this is handled slightly differently: cancellation corresponds to leaving the scope of withRPC early.↩︎

  3. The full list: http2#72, http2#74, http2#77, http2#78, http2#79, http2#80, http2#81, http2#82, http2#83, http2#84, http2#92, http2#97, http2#99, http2#101, http2#104, http2#105, http2#106, http2#107, http2#108, http2#115, http2#116, http2#117, http2#119, http2#120, http2#122, http2#124, http2#126, http2#133, http2#135, http2#136, http2#137, http2#138, http2#140, http2#142, http2#146, http2#147, http2#155, http-semantics#1, http-semantics#2, http-semantics#3, http-semantics#4, http-semantics#5, http-semantics#9, http-semantics#10, http-semantics#11, http2-tls#2, http2-tls#3, http2-tls#4, http2-tls#5, http2-tls#6, http2-tls#8, http2-tls#9, http2-tls#10, http2-tls#11, http2-tls#14, http2-tls#15, http2-tls#16, http2-tls#17, http2-tls#19, http2-tls#20, http2-tls#21, network-run#3, network-run#6, network-run#8, network-run#9, network-run#12, network-run#13, network-control#4, network-control#7, tls#458, tls#459, tls#477, tls#478, and network#588.↩︎

  4. Layout of the type error slightly modified for improved readability↩︎

  5. Since gRPC does not support client-side trailers, client-side cancellation is made visible to the server by sending a HTTP2 RST_STREAM frame.↩︎

  6. They are however not primitive; see recvInputWithMeta and sendOutputWithMeta.↩︎

by edsko, finley at January 22, 2025 12:00 AM

January 21, 2025

in Code

Advent of Code 2024: Haskell Solution Reflections for all 25 Days

Admittedly a bit late, buuuuuut Merry belated Christmas and Happy New Years to all!

This past December I again participated in Eric Wastl’s Advent of Code, a series of 25 daily Christmas-themed puzzles. Each puzzle comes with a cute story about saving Christmas, and the puzzles increase in difficulty as the stakes get higher and higher. Every night at midnight EST, my friends and I (including the good people of libera chat’s ##advent-of-code channel) discuss the latest puzzle and creative ways to solve and optimize it. But, the main goal isn’t to solve it quickly, it’s always to see creative ways to approach the puzzle and share different insights. The puzzles are bite-sized enough that there are often multiple ways to approach it, and in the past I’ve leveraged group theory, galilean transformations and linear algebra, and more group theory. This year was also the special 10 year anniversary event, with callbacks to fun story elements of all the previous years!

Most of the puzzles are also pretty nice to solve in Haskell! Lots of DFS’s that melt away as simple recursion or recursion schemes, and even the BFS’s that expose you to different data structures and encodings.

This year I’ve moved almost all of my Haskell code to an Advent of Code Megarepo. I also like to post write-ups on Haskelly ways to approach the problems, and they are auto-compiled on the megarepo wiki.

I try my best every year, but sometimes I am able to complete write-ups for all 25 puzzles before the new year catches up. The last time was 2020, and I’m proud to announce that 2024 is now also 100% complete!

You can find all of them here, but here are links to each individual one. Hopefully you can find them helpful. And if you haven’t yet, why not try Advent of Code yourself? :) And drop by the freenode ##advent-of-code channel, we’d love to say hi and chat, or help out! Thanks all for reading, and also thanks to Eric for a great event this year, as always!

by Justin Le at January 21, 2025 07:54 AM

Matt Parsons

Making My Life Harder with GADTs

Lucas Escot wrote a good blog post titled “Making My Life Easier with GADTs”, which contains a demonstration of GADTs that made his life easier. He posted the article to reddit.

I’m going to trust that - for his requirements and anticipated program evolution - the solution is a good one for him, and that it actually made his life easier. However, there’s one point in his post that I take issue with:

Dependent types and assimilated type-level features get a bad rep. They are often misrepresented as a futile toy for “galaxy-brain people”, providing no benefit to the regular programmer. I think this opinion stems from a severe misconception about the presumed complexity of dependent type systems.

I am often arguing against complexity in Haskell codebases. While Lucas’s prediction about “misconceptions” may be true for others, it is not true for me. I have worked extensively with Haskell’s most advanced features in large scale codebases. I’ve studied “Types and Programming Languages,” the Idris book, “Type Theory and Formal Proof”, and many other resources on advanced type systems. I don’t say this to indicate that I’m some kind of genius or authority, just that I’m not a rube who’s looking up on the Blub Paradox. My argument for simplicity comes from the hard experience of having to rip these advanced features out, and the pleasant discovery that simpler alternatives are usually nicer in every respect.

So how about GADTs? Do they make my life easier? Here, I’ll reproduce the comment I left on reddit:


They are often misrepresented as a futile toy for “galaxy-brain people”, providing no benefit to the regular programmer. I think this opinion stems from a severe misconception about the presumed complexity of dependent type systems.

This opinion - in my case at least - stems from having seen people code themselves into a corner with fancy type features where a simpler feature would have worked just as well.

In this case, the “simplest solution” is to have two entirely separate datatypes, as the blog post initially starts with. These datatypes, after all, represent different things - a typed environment and an untyped environment. Why mix the concerns? What pain or requirement is solved by having one more complicated datatype when two datatypes works pretty damn well?

I could indeed keep typed environments completely separate. Different datatypes, different information. But this would lead to a lot of code duplication. Given that the compilation logic will be mostly identical for these two targets, I don’t want to be responsible for the burden of keeping both implementations in sync.

Code duplication can be a real concern. In this case, we have code that is not precisely duplicated, but simply similar - we want compilation logic to work for both untyped and typed logics, and only take typing information into account. When we want code to work over multiple possible types, we have two options: parametric polymorphism and ad-hoc polymorphism.

With parametric polymorphism, the solution looks like this:

data GlobalEnv a = GlobalEnv [(Name, GlobalDecl a)]

data GlobalDecl a
  = DataDecl (DataBody a)
  | FunDecl  (FunBody a)
  | TypeDecl a

data DataBody a = DataBody
  { indConstructors :: [ConstructorBody a]
  }

data ConstructorBody a = ConstructorBody
  { ctorName :: Name
  , ctorArgs :: Int
  , ctorType :: a
  }

data FunBody a = FunBody
  { funBody :: LamBox.Term
  , funType :: a
  }

This is actually very similar to the GADT approach, because we’re threading a type variable through the system. For untyped, we can write GlobalDecl (), and for typed, we can write GlobalDecl LamBox.Type.

Functions which can work on either untyped or typed would have GlobalDecl a -> _ as their input, and functions which require a representation can specify it directly. This would look very similar to the GADT approach: in practice, replace GlobalDecl Typed with GlobalDecl Type and GlobalDecl Untyped with GlobalDecl () and you’re good.

(or, heck, data Untyped = Untyped and the change is even smaller).

This representation is much easier to work with. You can deriving stock (Show, Eq, Ord). You can $(deriveJSON ''GlobalEnv). You can delete several language extensions. It’s also more flexible: you can use Maybe Type to represent partially typed programs (or programs with type inference). You can use Either TypeError Type to represent full ASTs with type errors. You can deriving stock (Functor, Foldable, Traversable) to get access to fmap (change the type with a function) and toList (collect all the types in the AST) and traverse (change each type effectfully, combining results).
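
As a small sketch of what those derived instances buy you (assuming the deriving clauses above have been added to GlobalDecl and the types it contains):

import Data.Foldable (toList)
import Data.Functor (void)

-- Collect every type annotation in a declaration, via the derived Foldable.
collectTypes :: GlobalDecl a -> [a]
collectTypes = toList

-- Erase all type information, via the derived Functor.
eraseTypes :: GlobalDecl a -> GlobalDecl ()
eraseTypes = void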

When we choose GADTs here, we pay significant implementation complexity costs, and we give up flexibility. What is the benefit? Well, the entire benefit is that we’ve given up flexibility. With the parametric polymorphism approach, we can put anything in for that type variable a. The GADT prevents us from writing TypeDecl () and it forbids you from having anything other than Some (type :: Type) or None in the fields.

This restriction is what I mean by ‘coding into a corner’. Let’s say you get a new requirement to support partially typed programs. If you want to stick with the GADT approach, then you need to change data Typing = Typed | Untyped | PartiallyTyped and modify all the WhenTyped machinery - Optional :: Maybe a -> WhenTyped PartiallyTyped a. Likewise, if you want to implement inference or type-checking, you need another constructor on Typing and another on WhenTyped - ... | TypeChecking and Checking :: Either TypeError a -> WhenTyped TypeChecking a.

But wait - now our TypeAliasDecl has become overly strict!

data GlobalDecl :: Typing -> Type where
  FunDecl       :: FunBody t     -> GlobalDecl t
  DataDecl      :: DataBody t    -> GlobalDecl t
  TypeAliasDecl :: TypeAliasBody -> GlobalDecl Typed

We actually want TypeAliasDecl to work with any of PartiallyTyped, Typed, or TypeChecking. Can we make this work? Yes, with a type class constraint:

class IsTypedIsh (t :: Typing)

instance IsTypedIsh Typed
instance IsTypedIsh PartiallyTyped
instance (Unsatisfiable msg) => IsTypedIsh Untyped

data GlobalDecl :: Typing -> Type where
  FunDecl       :: FunBody t     -> GlobalDecl t
  DataDecl      :: DataBody t    -> GlobalDecl t
  TypeAliasDecl :: (IsTypedIsh t) => TypeAliasBody -> GlobalDecl t

But, uh oh, we also want to write functions that can operate in many of these states. We can extend IsTypedIsh with a witness function isTypedIshWitness :: WhenTyped t Type -> Type, but that also doesn’t quite work - the t actually determines the output type.

class IsTypedIsh (t :: Typing) where
  type TypedIshPayload t 
  isTypedIshWitness :: WhenTyped t Type -> TypedIshPayload t

instance IsTypedIsh Typed where
  type TypedIshPayload Typed = Type
  isTypedIshWitness (Some a) = a

instance IsTypedIsh PartiallyTyped where
  type TypedIshPayload PartiallyTyped = Maybe Type
  isTypedIshWitness (Optional a) = a

instance IsTypedIsh TypeChecking where
  type TypedIshPayload TypeChecking = Either TypeError Type
  isTypedIshWitness (Checking a) = a

instance (Unsatisfiable msg) => IsTypedIsh Untyped

Now, this does let us write code like:

inputHasTypeSorta :: (IsTypedIsh t) => GlobalDec t -> _

but actually working with this becomes a bit obnoxious. You see, without knowing t, you can’t know the result of isTypedIshWitness, so you end up needing to say things like (IsTypedIsh t, TypedIshPayload t ~ f Type, Foldable f) => ... to cover the Maybe and Either cases - and this only lets you fold the result. But now you’re working with the infelicities of type classes (inherently open) and sum types (inherently closed) and the way that GHC tries to unify these two things with type class dispatch.

Whew.

Meanwhile, in parametric polymorphism land, we get almost all of the above for free. If we want to write code that covers multiple possible cases, then we can use much simpler type class programming. Consider how easy it is to write this function and type:

beginTypeChecking 
    :: GlobalDecl () 
    -> GlobalDecl (Maybe (Either TypeError Type))
beginTypeChecking = fmap (\() -> Nothing)

And now consider what you need to do to make the GADT program work out like this.

January 21, 2025 12:00 AM

January 19, 2025

Magnus Therning

Reviewing GitHub PRs in Emacs

My Emacs config's todo-list has long had an item about finding some way to review GitHub PRs without having to leave Emacs, and when the forge issue that I subscribe to came alive again I thought it was time to see if I could improve my config.

I found three packages for doing reviews

I've tried the first one before but at the time it didn't seem to work at all. Apparently that's improved somewhat, though there's a PR with a change that's necessary to make it work.1 The first two don't support comments on multiple lines of a PR; there are issues/discussions for both

The last one, emacs-pr-review does support commenting on multiple lines, but it lacks a nice way of opening a review from magit. What I can do is

  1. position the cursor on a PR in the magit status view, then
  2. copy the PR's URL using forge-copy-url-at-point-as-kill, and
  3. open the PR by calling pr-review and pasting the PR's URL.

Which I did for a few days until I got tired of it and wrote a function to cut out the copy/paste part.

(defun mes/pr-review-via-forge ()
  (interactive)
  (if-let* ((target (forge--browse-target))
            (url (if (stringp target) target (forge-get-url target)))
            (rev-url (pr-review-url-parse url)))
      (pr-review url)
    (user-error "No PR to review at point")))

I've bound it to a key in magit-mode-map to make it easier.

I have to say I'm not completely happy with emacs-pr-review, so if either of the other two sort out commenting on multiple lines I'll check them out again.

My full setup for pr-review is here.

Footnotes:

1

The details can be found among the comments of the forge issue.

January 19, 2025 10:10 AM

Dan Piponi (sigfpe)

Running from the past


Important Note

The links to formulae here are broken but a PDF version is available at github.


Preface

Functional programming encourages us to program without mutable state. Instead we compose functions that can be viewed as state transformers. It's a change of perspective that can have a big impact on how we reason about our code. But it's also a change of perspective that can be useful in mathematics and I'd like to give an example: a really beautiful technique that allows you to sample from the infinite limit of a probability distribution without needing an infinite number of operations. (Unless you're infinitely unlucky!)



Markov Chains

A Markov chain is a sequence of random states where each state is drawn from a random distribution that possibly depends on the previous state, but not on any earlier state. So it is a sequence \(X_0, X_1, X_2, \ldots\) such that \(P(X_{n+1} \mid X_0, X_1, \ldots, X_n) = P(X_{n+1} \mid X_n)\) for all \(n\). A basic example might be a model of the weather in which each day is either sunny or rainy but where it's more likely to be rainy (or sunny) if the previous day was rainy (or sunny). (And to be technically correct: having information about two days or earlier doesn't help us if we know yesterday's weather.)


Like imperative code, this description is stateful. The state at step \(n\) depends on the state at step \(n-1\). Probability is often easier to reason about when we work with independent identically drawn random variables and our \(X_n\) aren't of this type. But we can eliminate the state from our description using the same method used by functional programmers.


Let's choose a Markov chain to play with. I'll pick one with 3 states called \(A\), \(B\) and \(C\), and with transition probabilities \(P(X_{n+1} = j \mid X_n = i) = p_{ij}\) where

\[
(p_{ij}) = \begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2} & 0 \\ \tfrac{1}{3} & \tfrac{1}{3} & \tfrac{1}{3} \\ 0 & \tfrac{1}{2} & \tfrac{1}{2} \end{pmatrix}
\]

with rows and columns ordered \(A\), \(B\), \(C\) (these values match the step function defined below).


Here's a diagram illustrating our states: [state-transition diagram between \(A\), \(B\) and \(C\), labelled with the transition probabilities above]




Implementation

First some imports:


> {-# LANGUAGE LambdaCase #-}
> {-# LANGUAGE TypeApplications #-}


> import Data.Sequence(replicateA)
> import System.Random
> import Control.Monad.State
> import Control.Monad
> import Data.List
> import Data.Array


And now the type of our random variable:


> data ABC = A | B | C deriving (Eq, Show, Ord, Enum, Bounded)


We are now in a position to simulate our Markov chain. First we need some random numbers drawn uniformly from [0, 1]:


> uniform :: (RandomGen gen, MonadState gen m) => m Double
> uniform = state random


And now the code to take a single step in the Markov chain:


> step :: (RandomGen gen, MonadState gen m) => ABC -> m ABC
> step A = do
>     a <- uniform
>     if a < 0.5
>         then return A
>         else return B
> step B = do
>     a <- uniform
>     if a < 1/3.0
>         then return A
>         else if a < 2/3.0
>             then return B
>             else return C
> step C = do
>     a <- uniform
>     if a < 0.5
>         then return B
>         else return C


Notice how the step function generates a new state at random in a way that depends on the previous state. The m ABC in the type signature makes it clear that we are generating random states at each step.


We can simulate the effect of taking \(n\) steps with a function like this:


> steps :: (RandomGen gen, MonadState gen m) => Int -> ABC -> m ABC
> steps 0 i = return i
> steps n i = do
>     i <- steps (n-1) i
>     step i


We can run for 100 steps, starting with \(A\), with a line like so:


*Main> evalState (steps 3 A) gen
B


The starting state of our random number generator is given by gen.


Consider the distribution of states after taking \(n\) steps. For Markov chains of this type, we know that as \(n\) goes to infinity the distribution of the \(n\)th state approaches a limiting "stationary" distribution. There are frequently times when we want to sample from this final distribution. For a Markov chain as simple as this example, you can solve exactly to find the limiting distribution. But for real world problems this can be intractable. Instead, a popular solution is to pick a large \(n\) and hope it's large enough. As \(n\) gets larger the distribution gets closer to the limiting distribution. And that's the problem I want to solve here - sampling from the limit. It turns out that by thinking about random functions instead of random states we can actually sample from the limiting distribution exactly.



Some random functions


Here is a new version of our random step function:


> step' :: (RandomGen gen, MonadState gen m) => m (ABC -> ABC)
> step' = do
>     a <- uniform
>     return $ \case
>         A -> if a < 0.5 then A else B
>         B -> if a < 1/3.0
>                 then A
>                 else if a < 2/3.0 then B else C
>         C -> if a < 0.5 then B else C


In many ways it's similar to the previous one. But there's one very big difference: the type signature m (ABC -> ABC) tells us that it's returning a random function, not a random state. We can simulate the result of taking 10 steps, say, by drawing 10 random functions, composing them, and applying the result to our initial state:


> steps' :: (RandomGen gen, MonadState gen m) => Int -> m (ABC -> ABC)
> steps' n = do
>   fs <- replicateA n step'
>   return $ foldr (flip (.)) id fs


Notice the use of flip. We want to compose functions \(f_n \circ f_{n-1} \circ \cdots \circ f_1\), each time composing on the left by the new \(f_n\). This means that for a fixed seed gen, each time you increase \(n\) by 1 you get the next step in a single simulation: (BTW I used replicateA instead of replicateM to indicate that these are independent random draws. It may be well known that you can use Applicative instead of Monad to indicate independence but I haven't seen it written down.)


*Main> [f A | n <- [0..10], let f = evalState (steps' n) gen]
[A,A,A,B,C,B,A,B,A,B,C]


When I first implemented this I accidentally forgot the flip. So maybe you're wondering what effect removing the flip has? The effect is about as close to a miracle as I've seen in mathematics. It allows us to sample from the limiting distribution in a finite number of steps!


Here's the code:


> steps_from_past :: (RandomGen gen, MonadState gen m) => Int -> m (ABC -> ABC)
> steps_from_past n = do
>   fs <- replicateA n step'
>   return $ foldr (.) id fs


We end up building \(f_1 \circ f_2 \circ \cdots \circ f_n\). This is still a composition of independent identically distributed functions and so it's still drawing from exactly the same distribution as steps'. Nonetheless, there is a difference: for a particular choice of seed, steps_from_past n no longer gives us a sequence of states from a Markov chain. Running with argument \(n\) draws a random composition of \(n\) functions. But if you increase \(n\) by 1 you don't add a new step at the end. Instead you effectively restart the Markov chain with a new first step generated by a new random seed.


Try it and see:


*Main> [f A | n <- [0..10], let f = evalState (steps_from_past n) gen]
[A, A, A, A, A, A, A, A, A, A]


Maybe that's surprising. It seems to get stuck in one state. In fact, we can try applying the resulting function to all three states.


*Main> [fmap f [A, B, C] | n <- [0..10], let f = evalState (steps_from_past n) gen]
[[A,B,C],[A,A,B],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A]]


In other words, for large enough \(n\) we get the constant function.


Think of it this way: If f isn't injective then it's possible that two states get collapsed to the same state. If you keep picking random f's it's inevitable that you will eventually collapse down to the point where all arguments get mapped to the same state. Once this happens, we'll get the same result no matter how large we take \(n\). If we can detect this then we've found the limit of \(f_1 \circ f_2 \circ \cdots \circ f_n\) as \(n\) goes to infinity. But because we know composing forwards and composing backwards lead to draws from the same distribution, the limiting backward composition must actually be a draw from the same distribution as the limiting forward composition. That flip can't change what probability distribution we're drawing from - just the dependence on the seed. So the value the constant function takes is actually a draw from the limiting stationary distribution.


We can code this up:


> all_equal :: (Eq a) => [a] -> Bool
> all_equal [] = True
> all_equal [_] = True
> all_equal (a : as) = all (== a) as


> test_constant :: (Bounded a, Enum a, Eq a) => (a -> a) -> Bool
> test_constant f =
>     all_equal $ map f $ enumFromTo minBound maxBound


This technique is called coupling from the past. It's "coupling" because we've arranged that different starting points coalesce. And it's "from the past" because we're essentially answering the question of what the outcome of a simulation would be if we started infinitely far in the past.


> couple_from_past :: (RandomGen gen, MonadState gen m, Enum a, Bounded a, Eq a) =>
>                   m (a -> a) -> (a -> a) -> m (a -> a)
> couple_from_past step f = do
>     if test_constant f
>         then return f
>         else do
>             f' <- step
>             couple_from_past step (f . f')


We can now sample from the limiting distribution a million times, say:


*Main> let samples = map ($ A) $ evalState (replicateA 1000000 (couple_from_past step' id)) gen


We can now count how often A appears:


*Main> fromIntegral (length $ filter (== A) samples)/1000000
0.285748


That's a pretty good approximation to \(2/7 \approx 0.2857\), the exact answer that can be found by finding the eigenvector of the transition matrix corresponding to an eigenvalue of 1.
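
As a quick sanity check (this definition is an addition, not part of the original program; it just uses the transition probabilities from the step function above), we can approximate the stationary distribution directly by power iteration:

> stationary :: [Double]
> stationary = iterate stepDist [1, 0, 0] !! 1000
>   where
>     -- One step of a distribution [pA, pB, pC] under the transition matrix
>     -- of our example chain.
>     stepDist [a, b, c] = [ a/2 + b/3
>                          , a/2 + b/3 + c/2
>                          ,       b/3 + c/2 ]
>     stepDist _ = error "expected a three-state distribution"

Evaluating stationary gives approximately [0.2857, 0.4286, 0.2857], i.e. \([2/7, 3/7, 2/7]\).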


> gen = mkStdGen 669



Notes

The technique of coupling from the past first appeared in a paper by Propp and Wilson. The paper Iterated Random Functions by Persi Diaconis gave me a lot of insight into it. Note that the code above is absolutely not how you'd implement this for real. I wrote the code that way so that I could switch algorithm with the simple removal of a flip. In fact, with some clever tricks you can make this method work with state spaces so large that you couldn't possibly hope to enumerate all starting states to detect if convergence has occurred. Or even with uncountably large state spaces. But I'll let you read the Propp-Wilson paper to find out how.

by sigfpe (noreply@blogger.com) at January 19, 2025 05:15 AM

Abhinav Sarkar

Interpreting Brainfuck in Haskell

Writing an interpreter for Brainfuck is almost a rite of passage for any programming language implementer, and it’s my turn now. In this post, we’ll write not one but four Brainfuck interpreters in Haskell. Let’s go!

This post was originally published on abhinavsarkar.net.

Introduction

Brainfuck (henceforth BF) is the most famous of esoteric programming languages. Its fame lies in the fact that it is extremely minimalistic, with only eight instructions, and very easy to implement. Yet, it is Turing-complete and as capable as any other programming language1. Writing an interpreter for BF is a fun exercise, and so there are hundreds, maybe even thousands of them. Since BF is very verbose, optimizing BF interpreters is almost a sport, with people posting benchmarks of their creations. I can’t say that what I have in this post is novel, but it was definitely a fun exercise for me.

BF has eight instructions of one character each. A BF program is a sequence of these instructions. It may have other characters as well, which are treated as comments and are ignored while executing. An instruction pointer (IP) points at the next instruction to be executed, starting with the first instruction. The instructions are executed sequentially, except for the jump instructions that may cause the IP to jump to remote instructions. The program terminates when the IP moves past the last instruction.

BF programs work by modifying data in a memory that is an array of at least 30000 byte cells initialized to zero. A data pointer (DP) points to the current byte of the memory to be modified, starting with the first byte of the memory. BF programs can also read from standard input and write to standard output, one byte at a time using the ASCII character encoding.

The eight BF instructions each consist of a single character:

>
Increment the DP by one to point to the next cell to the right.
<
Decrement the DP by one to point to the next cell to the left.
+
Increment the byte at the DP by one.
-
Decrement the byte at the DP by one.
.
Output the byte at the DP.
,
Accept one byte of input, and store its value in the byte at the DP.
[
If the byte at the DP is zero, then instead of moving the IP forward to the next command, jump it forward to the command after the matching ] command.
]
If the byte at the DP is nonzero, then instead of moving the IP forward to the next command, jump it back to the command after the matching [ command.

Each [ matches exactly one ] and vice versa, and the [ comes first. Together, they add conditions and loops to BF.

Some details are left to implementations. In our case, we assume that the memory cells are signed bytes that underflow and overflow without errors. Also, accessing the memory beyond array boundaries wraps to the opposite side without errors.
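
As a quick check of the overflow behaviour we are relying on, here is a tiny standalone snippet (not part of the interpreter):

import Data.Int (Int8)

-- Fixed-width Int8 arithmetic wraps around silently on overflow/underflow:
wrapDemo :: (Int8, Int8)
wrapDemo = (maxBound + 1, minBound - 1)  -- evaluates to (-128, 127)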

For a taste, here is a small BF program that prints Hello, World! when run:

+++++++++++[>++++++>+++++++++>++++++++>++++>+++>+<<<<<<-]>+++
+++.>++.+++++++..+++.>>.>-.<<-.<.+++.------.--------.>>>+.>-.

As you can imagine, interpreting BF is easy, at least when doing it naively. So instead of writing one interpreter, we are going to write four, with increasing performance and complexity.

Setup

First, some imports:

{-# LANGUAGE GHC2021 #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE TypeFamilies #-}

module Main where

import Control.Arrow ((>>>))
import Control.Monad (void)
import Data.Bits (shiftR, (.&.))
import Data.ByteArray qualified as BA
import Data.Char (chr, ord)
import Data.Functor (($>))
import Data.Int (Int8)
import Data.Kind (Type)
import Data.Vector qualified as V
import Data.Vector.Storable.Mutable qualified as MV
import Data.Word (Word16, Word8)
import Foreign.Ptr (Ptr, castPtr, minusPtr, plusPtr)
import Foreign.Storable qualified as S
import System.Environment (getArgs, getProgName)
import System.Exit (exitFailure)
import System.IO qualified as IO
import Text.ParserCombinators.ReadP qualified as P

We use the GHC2021 extension here that enables a lot of useful GHC extensions by default. Our non-base imports come from the memory and vector libraries.

We abstract the interpreter interface as a typeclass:

class Interpreter a where
  data Program a :: Type
  parse :: String -> Program a
  interpret :: Memory -> Program a -> IO ()

An Interpreter is specified by a data type Program and two functions: parse parses a string to a Program, and interpret interprets the parsed Program.

For modelling the mutable memory, we use a mutable unboxed IOVector of signed bytes (Int8) from the vector package. Since our interpreter runs in IO, this works well for us. The DP, hence, is modelled as an index into this vector, which we name the MemIdx type.

newtype Memory = Memory {unMemory :: MV.IOVector Int8}
type MemIdx = Int

newMemory :: Int -> IO Memory
newMemory = fmap Memory . MV.new

memorySize :: Memory -> Int
memorySize = MV.length . unMemory

readMemory :: Memory -> MemIdx -> IO Int8
readMemory = MV.unsafeRead . unMemory

writeMemory :: Memory -> MemIdx -> Int8 -> IO ()
writeMemory = MV.unsafeWrite . unMemory

modifyMemory :: Memory -> (Int8 -> Int8) -> MemIdx -> IO ()
modifyMemory = MV.unsafeModify . unMemory

nextMemoryIndex :: Memory -> MemIdx -> MemIdx
nextMemoryIndex memory memIdx = (memIdx + 1) `rem` memorySize memory

prevMemoryIndex :: Memory -> MemIdx -> MemIdx
prevMemoryIndex memory memIdx = (memIdx - 1) `mod` memorySize memory

We wrap the IOVector Int8 with a Memory newtype. newMemory creates a new memory array of bytes initialized to zero. memorySize returns the size of the memory. readMemory, writeMemory and modifyMemory are for reading from, writing to and modifying the memory respectively. nextMemoryIndex and prevMemoryIndex increment and decrement the array index respectively, taking care of wrapping at boundaries.

Now we write the main function using the Interpreter typeclass functions:

main :: IO ()
main = do
  IO.hSetBuffering IO.stdin IO.NoBuffering
  IO.hSetBuffering IO.stdout IO.LineBuffering
  progName <- getProgName
  let usage = "Usage: " <> progName <> " -(s|a|b|o) <bf_file>"

  getArgs >>= \case
    [interpreterType, fileName] -> do
      code <- filter (`elem` "+-.,><[]") <$> readFile fileName
      memory <- newMemory 30000
      parseAndInterpret memory code usage interpreterType
    _ -> exitWithMsg usage
  where
    parseAndInterpret memory code usage = \case
      "-s" -> interpret @StringInterpreter memory $ parse code
      "-a" -> interpret @ASTInterpreter memory $ parse code
      "-b" -> interpret @BytecodeInterpreter memory $ parse code
      "-o" -> interpret @OptimizingBytecodeInterpreter memory $ parse code
      t -> exitWithMsg $ "Invalid interpreter type: " <> t <> "\n" <> usage

    exitWithMsg msg = IO.hPutStrLn IO.stderr msg >> exitFailure

The main function calls the parse and interpret functions for the right interpreter with a new memory and the input string read from the file specified in the command line argument. We make sure to filter out non-BF characters when reading the input file.

With the setup done, let’s move on to our first interpreter.

String Interpreter

A BF program can be interpreted directly from its string representation, going over the characters and executing the right logic for them. But strings in Haskell are notoriously slow because they are implemented as singly linked lists of characters. Indexing into strings has \(O(n)\) time complexity, so it is not a good idea to use them directly. Instead, we use a char Zipper2.

data StringInterpreter

instance Interpreter StringInterpreter where
  data Program StringInterpreter = ProgramCZ CharZipper
  parse = ProgramCZ . czFromString
  interpret memory (ProgramCZ code) = interpretCharZipper memory code

Zippers are a special view of data structures, which allow one to navigate and easily update them. A zipper has a focus or cursor which is the current element of the data structure we are “at”. Alongside, it also captures the rest of the data structure in a way that makes it easy to move around it. We can update the data structure by updating the element at the focus3.

data CharZipper = CharZipper
  {czLeft :: String, czFocus :: Maybe Char, czRight :: String}

czFromString :: String -> CharZipper
czFromString = \case
  [] -> CharZipper [] Nothing []
  (x : xs) -> CharZipper [] (Just x) xs

czMoveLeft :: CharZipper -> CharZipper
czMoveLeft = \case
  CharZipper [] (Just focus) right -> CharZipper [] Nothing (focus : right)
  CharZipper (x : xs) (Just focus) right -> CharZipper xs (Just x) (focus : right)
  z -> z

czMoveRight :: CharZipper -> CharZipper
czMoveRight = \case
  CharZipper left (Just focus) [] -> CharZipper (focus : left) Nothing []
  CharZipper left (Just focus) (x : xs) -> CharZipper (focus : left) (Just x) xs
  z -> z

This zipper is a little different from the usual implementations because we need to know when the focus of the zipper has moved out of the program boundaries. Hence, we model the focus as Maybe Char. czFromString creates a char zipper from a string. czMoveLeft and czMoveRight move the focus left and right respectively, taking care of setting the focus to Nothing if we move outside the program string.
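
For a small illustration of these moves (this demo value is an addition, not part of the interpreter), consider a two-instruction program:

-- Walking a two-character program off its right edge:
demo :: [CharZipper]
demo =
  [ czFromString "+-"                               -- CharZipper "" (Just '+') "-"
  , czMoveRight (czFromString "+-")                 -- CharZipper "+" (Just '-') ""
  , czMoveRight (czMoveRight (czFromString "+-"))   -- focus is Nothing: past the end
  ]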

Parsing the program is thus the same as creating the char zipper from the program string. For interpreting the program, we write this function:

interpretCharZipper :: Memory -> CharZipper -> IO ()
interpretCharZipper memory = go 0
  where
    go !memIdx !program = case czFocus program of
      Nothing -> return ()
      Just c -> case c of
        '+' -> modifyMemory memory (+ 1) memIdx >> goNext
        '-' -> modifyMemory memory (subtract 1) memIdx >> goNext
        '>' -> go (nextMemoryIndex memory memIdx) program'
        '<' -> go (prevMemoryIndex memory memIdx) program'
        ',' -> do
          getChar >>= writeMemory memory memIdx . fromIntegral . ord
          goNext
        '.' -> do
          readMemory memory memIdx >>= putChar . chr . fromIntegral
          goNext
        '[' -> readMemory memory memIdx >>= \case
          0 -> go memIdx $ skipRight 1 program
          _ -> goNext
        ']' -> readMemory memory memIdx >>= \case
          0 -> goNext
          _ -> go memIdx $ skipLeft 1 program
        _ -> goNext
      where
        program' = czMoveRight program
        goNext = go memIdx program'

Our main driver here is the tail-recursive go function that takes the memory index and the program as inputs. It then gets the current focus of the program zipper, and executes the BF logic accordingly.

If the current focus is Nothing, it means the program has finished running. So we end the execution. Otherwise, we switch over the character and do what the BF spec tells us to do.

For + and -, we increment or decrement respectively the value in the memory cell at the current index, and go to the next character. For > and <, we increment or decrement the memory index respectively, and go to the next character.

For ,, we read an ASCII encoded character from the standard input, and write it to the memory at the current memory index as a byte. For ., we read the byte from the memory at the current memory index, and write it out to the standard output as an ASCII encoded character. In either case, we then go to the next character.

For [, we read the byte at the current memory index, and if it is zero, we skip right over the part of the program till the matching ] is found. Otherwise, we go to the next character.

For ], we skip left over the part of the program till the matching [ is found, if the current memory byte is non-zero. Otherwise, we go to the next character.

The next two functions implement the skipping logic:

skipRight :: Int -> CharZipper -> CharZipper
skipRight !depth !program
  | depth == 0 = program'
  | otherwise = case czFocus program' of
      Nothing -> error "No matching [ while skipping the loop forward"
      Just '[' -> skipRight (depth + 1) program'
      Just ']' -> skipRight (depth - 1) program'
      _ -> skipRight depth program'
  where
    program' = czMoveRight program

skipLeft :: Int -> CharZipper -> CharZipper
skipLeft !depth !program
  | depth == 0 = czMoveRight program
  | otherwise = case czFocus program' of
      Nothing -> error "No matching ] while skipping the loop backward"
      Just ']' -> skipLeft (depth + 1) program'
      Just '[' -> skipLeft (depth - 1) program'
      _ -> skipLeft depth program'
  where
    program' = czMoveLeft program

The tail-recursive functions skipRight and skipLeft skip over parts of the program by moving the focus to right and left respectively, till the matching bracket is found. Since the loops can contain nested loops, we keep track of the depth of loops we are in, and return only when the depth becomes zero. If we move off the program boundaries while skipping, we throw an error.

That’s it! We now have a fully functioning BF interpreter. To test it, we use these two BF programs: hanoi.bf and mandelbrot.bf.

hanoi.bf solves the Tower of Hanoi puzzle, animating the solution process as ASCII art:

A freeze-frame from the animation of solving the Tower of Hanoi puzzle with hanoi.bf

mandelbrot.bf prints an ASCII art showing the Mandelbrot set:

Mandelbrot set ASCII art by mandelbrot.bf

Both of these BF programs serve as good benchmarks for BF interpreters. Let’s test ours by compiling and running it4:

❯ nix-shell -p "ghc.withPackages (pkgs: with pkgs; [vector memory])" \
    --run "ghc --make bfi.hs -O2"
[1 of 2] Compiling Main             ( bfi.hs, bfi.o )
[2 of 2] Linking bfi [Objects changed]
❯ time ./bfi -s hanoi.bf > /dev/null
       29.15 real        29.01 user         0.13 sys
❯ time ./bfi -s mandelbrot.bf > /dev/null
       94.86 real        94.11 user         0.50 sys

That seems quite slow. We can do better.

AST Interpreter

Instead of executing BF programs from their string representations, we can parse them to an Abstract Syntax Tree (AST). This allows us to match brackets only once at parse time, instead of doing it repeatedly at run time. We capture loops as AST nodes, allowing us to skip them trivially.

data ASTInterpreter

instance Interpreter ASTInterpreter where
  data Program ASTInterpreter = ProgramAST Instructions
  parse = ProgramAST . parseToInstrs
  interpret memory (ProgramAST instrs) = interpretAST memory instrs

We represent the BF AST as a Haskell Algebraic Data Type (ADT):

type Instructions = V.Vector Instruction

data Instruction
  = Inc               -- +
  | Dec               -- -
  | MoveR             -- >
  | MoveL             -- <
  | GetC              -- ,
  | PutC              -- .
  | Loop Instructions -- []
  deriving (Show)

There is one constructor per BF instruction, except for loops where the Loop constructor captures both the start and end of loop instructions. We use immutable boxed vectors for lists of instructions instead of Haskell lists so that we can index into them in \(O(1)\).

We use the parser combinator library ReadP to write a recursive-descent parser for BF:

parseToInstrs :: String -> Instructions
parseToInstrs code =
  V.fromList $ case P.readP_to_S (P.many instrParser <* P.eof) code of
    [(res, "")] -> res
    out -> error $ "Unexpected output while parsing: " <> show out

instrParser :: P.ReadP Instruction
instrParser = P.choice
  [ P.char '+' $> Inc,
    P.char '-' $> Dec,
    P.char '>' $> MoveR,
    P.char '<' $> MoveL,
    P.char ',' $> GetC,
    P.char '.' $> PutC,
    Loop . V.fromList <$> P.between (P.char '[') (P.char ']') (P.many instrParser)
  ]

All cases except the loop one are straightforward. For loops, we call the parser recursively to parse the loop body. Note that the parser matches the loop brackets correctly. If the brackets don’t match, the parser fails.
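
As an illustration, here is how a tiny program parses (an indicative GHCi session; the output relies on the derived Show instances and the list-like Show of vectors):

> parseToInstrs "+[-]"
[Inc,Loop [Dec]]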

Next, we interpret the BF AST:

interpretAST :: Memory -> Instructions -> IO ()
interpretAST memory = void . interpretInstrs 0 memory

interpretInstrs :: MemIdx -> Memory -> Instructions -> IO MemIdx
interpretInstrs memIdx !memory !program = go memIdx 0
  where
    go !memIdx !progIdx
      | progIdx == V.length program = return memIdx
      | otherwise = case program V.! progIdx of
          Inc -> modifyMemory memory (+ 1) memIdx >> goNext
          Dec -> modifyMemory memory (subtract 1) memIdx >> goNext
          MoveR -> go (nextMemoryIndex memory memIdx) $ progIdx + 1
          MoveL -> go (prevMemoryIndex memory memIdx) $ progIdx + 1
          GetC -> do
            getChar >>= writeMemory memory memIdx . fromIntegral . ord
            goNext
          PutC -> do
            readMemory memory memIdx >>= putChar . chr . fromIntegral
            goNext
          Loop instrs -> readMemory memory memIdx >>= \case
            0 -> goNext
            _ -> interpretInstrs memIdx memory instrs >>= flip go progIdx
      where
        goNext = go memIdx $ progIdx + 1

The AST interpreter code is quite similar to the string interpreter one. This time we use an integer as the IP to index the Instructions vector. All cases except the loop one are pretty much the same as before.

For loops, we read the byte at the current memory index, and if it is zero, we skip executing the Loop AST node and go to the next instruction. Otherwise, we recursively interpret the loop body and go to the next instruction, taking care of passing the updated memory index returned from the recursive call to the execution of the next instruction.

And we are done. Let’s see how it performs:

❯ time ./bfi -a hanoi.bf > /dev/null
       14.94 real        14.88 user         0.05 sys
❯ time ./bfi -a mandelbrot.bf > /dev/null
       36.49 real        36.32 user         0.17 sys

Great! hanoi.bf runs 2x faster, whereas mandelbrot.bf runs 2.6x faster. Can we do even better?

Bytecode Interpreter

AST interpreters are well known to be slow because of how AST nodes are represented in the computer’s memory. The AST nodes contain pointers to other nodes, which may be anywhere in the memory. So while interpreting an AST, it jumps all over the memory, causing a slowdown. One solution to this is to convert the AST into a more compact and optimized representation known as Bytecode. That’s what our next interpreter uses.

data BytecodeInterpreter

instance Interpreter BytecodeInterpreter where
  data Program BytecodeInterpreter = ProgramBC BA.Bytes
  parse =
    parseToInstrs
      >>> translate
      >>> assemble
      >>> ProgramBC
  interpret memory (ProgramBC bytecode) = interpretBytecode memory bytecode

We reuse the parser from the AST interpreter, but then we convert the resultant AST into bytecode by translating and assembling it5. We use the Bytes byte array data type from the memory package to represent bytecode.

Unlike an AST, bytecode has a flat list of instructions—called Opcodes—that can be encoded in a single byte each, with optional parameters. Because of its flat nature and compactness, bytecode is more CPU-friendly to execute, which is where it gets its performance from. The downside is that, unlike an AST, bytecode is not human-readable.

data Opcode
  = OpInc
  | OpDec
  | OpMoveR
  | OpMoveL
  | OpGetC
  | OpPutC
  | OpLoop Opcodes
  | OpClear
  deriving (Show)

type Opcodes = [Opcode]

We use the Opcode ADT to model the BF opcodes. It corresponds one-to-one with the Instruction ADT, except for the OpClear constructor, which the optimizing interpreter uses later in this post.

The translate function translates Instructions to Opcodes:

translate :: Instructions -> Opcodes
translate = V.toList >>> map translateOpcode
  where
    translateOpcode = \case
      Inc -> OpInc
      Dec -> OpDec
      MoveR -> OpMoveR
      MoveL -> OpMoveL
      GetC -> OpGetC
      PutC -> OpPutC
      Loop instrs -> OpLoop $ translate instrs

The assemble function assembles Opcodes to bytecode byte array:

assemble :: Opcodes -> BA.Bytes
assemble = BA.pack . concatMap assembleOpcode

assembleOpcode :: Opcode -> [Word8]
assembleOpcode = \case
  OpInc -> [0]
  OpDec -> [1]
  OpMoveR -> [2]
  OpMoveL -> [3]
  OpGetC -> [4]
  OpPutC -> [5]
  OpLoop body ->
    let assembledBody = concatMap assembleOpcode body
        bodyLen = length assembledBody + 3
     in if bodyLen > 65_536 -- 2 ^ 16
          then error $ "Body of loop is too big: " <> show bodyLen
          else do
            let assembledBodyLen = assembleBodyLen bodyLen
            [6] <> assembledBodyLen <> assembledBody <> [7] <> assembledBodyLen
  OpClear -> [8]
  where
    assembleBodyLen bodyLen =
      let lb = fromIntegral $ bodyLen .&. 0xff
          mb = fromIntegral $ (bodyLen .&. 0xff00) `shiftR` 8
       in [lb, mb] -- assumes Little-endian arch

The assembleOpcode function assembles an Opcode to a list of bytes (Word8s). For all cases except for OpLoop, we simply return a unique byte for the opcode.

For OpLoop, we first recursively assemble the loop body. We encode both the body and the body length in the assembled bytecode, so that the bytecode interpreter can use the body length to skip over the loop body when required. We use two bytes to encode the body length, so we first check if the body length plus three is over 65536 (\(= 2^8*2^8\)). If so, we throw an error. Otherwise, we return:

  1. a unique byte for loop start (6),
  2. followed by the body length encoded in two bytes (in the Little-endian order),
  3. then the assembled loop body,
  4. followed by a unique byte for loop end (7),
  5. finally followed by the encoded body length again.

We encode the body length at the end again so that we can use it to jump backward to the start of the loop, to continue looping. Let’s look at this example to understand the loop encoding better:

> code = "++++++++++++++++++++++++++++++++++++++++++++++++>+++++[<+.>-]"
> concatMap assembleOpcode . translate . parseToInstrs $ code
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,6,8,0,3,0,5,2,1,7,8,0]

Let’s focus on the last twelve bytes. The diagram below shows the meaning of the various bytes:

Assembled bytecode for a BF loop

The example also demonstrates the flat nature of assembled bytecode. Now, all we have to do is to interpret it:

interpretBytecode :: Memory -> BA.Bytes -> IO ()
interpretBytecode memory bytecode =
  MV.unsafeWith
    (unMemory memory)
    (BA.withByteArray bytecode
      . interpretBytecodePtr (memorySize memory) (BA.length bytecode))

Instead of using integer indices in the bytecode array and memory vector, this time we use C-style direct pointers6:

type ProgPtr = Ptr Word8
type MemPtr = Ptr Int8

interpretBytecodePtr :: Int -> Int -> MemPtr -> ProgPtr -> IO ()
interpretBytecodePtr memLen programLen memStartPtr progStartPtr =
  go memStartPtr progStartPtr
  where
    progEndPtr = progStartPtr `plusProgPtr` programLen
    memEndPtr = memStartPtr `plusMemPtr` memLen

    go !memPtr !progPtr
      | progPtr == progEndPtr = return ()
      | otherwise = readProg >>= \case
          0 -> modifyMem (+ 1) >> goNext                           -- Inc
          1 -> modifyMem (subtract 1) >> goNext                    -- Dec
          2 -> jump (nextMemPtr memStartPtr memEndPtr memPtr 1) 1  -- MoveR
          3 -> jump (prevMemPtr memStartPtr memEndPtr memPtr 1) 1  -- MoveL
          4 -> getChar >>= writeMem . fromIntegral . ord >> goNext -- GetC
          5 -> readMem >>= putChar . chr . fromIntegral >> goNext  -- PutC
          6 -> readMem >>= \case                                   -- Loop start
            0 -> readProg2 >>= jump memPtr
            _ -> jump memPtr 3
          7 -> readMem >>= \case                                   -- Loop end
            0 -> jump memPtr 3
            _ -> readProg2 >>= jump memPtr . negate
          8 -> writeMem 0 >> goNext                                -- Clear
          op -> error $ "Unknown opcode: " <> show op
      where
        goNext = jump memPtr 1
        jump memPtr offset = go memPtr $ progPtr `plusProgPtr` offset

        readProg = S.peek progPtr
        readProg2 = -- assumes Little-endian arch
          fromIntegral <$> S.peek (castPtr @_ @Word16 $ progPtr `plusProgPtr` 1)

        readMem = S.peek memPtr
        writeMem = S.poke memPtr
        modifyMem f = readMem >>= writeMem . f

In Haskell, the pointer type Ptr is parametrized by the type of the data it points to. We have two types of pointers here, one that points to the bytecode program, and another that points to the memory cells. So in this case, the IP and DP are actually pointers.

The go function here is again the core of the interpreter loop. We track the current IP and DP in it, and execute the logic corresponding to the opcode that the IP currently points to. go ends when the IP points to the end of the program byte array.

Most of the cases in go are similar to the previous interpreters. The only difference is that we use pointers to read the current opcode and memory cell. For the loop start opcode, we read the byte pointed to by the DP, and if it is zero, we read the next two bytes from the program bytecode, and use them as the offset to jump the IP by to skip over the loop body. Otherwise, we jump the IP by 3 bytes to skip over the loop start opcode and the encoded loop body length bytes. For the loop end opcode, we follow similar steps, except we jump backward to the start of the loop.

The helper functions for pointer arithmetic are as follows:

plusProgPtr :: ProgPtr -> Int -> ProgPtr
plusProgPtr = plusPtr

plusMemPtr :: MemPtr -> Int -> MemPtr
plusMemPtr = plusPtr

nextMemPtr :: MemPtr -> MemPtr -> MemPtr -> Int -> MemPtr
nextMemPtr memStartPtr memEndPtr memPtr inc =
  let memPtr' = memPtr `plusMemPtr` inc
   in if memEndPtr > memPtr'
        then memPtr'
        else memStartPtr `plusPtr` (memPtr' `minusPtr` memEndPtr)

prevMemPtr :: MemPtr -> MemPtr -> MemPtr -> Int -> MemPtr
prevMemPtr memStartPtr memEndPtr memPtr inc =
  let memPtr' = memPtr `plusMemPtr` (-1 * inc)
   in if memPtr' >= memStartPtr
        then memPtr'
        else memEndPtr `plusPtr` (memPtr' `minusPtr` memStartPtr)

nextMemPtr and prevMemPtr implement wrapping of pointers as we do for memory indices in nextMemoryIndex and prevMemoryIndex. Let’s see what the results of our hard work are:

❯ time ./bfi -b hanoi.bf > /dev/null
       11.10 real        11.04 user         0.04 sys
❯ time ./bfi -b mandelbrot.bf > /dev/null
       15.72 real        15.68 user         0.04 sys

1.3x and 2.3x speedups for hanoi.bf and mandelbrot.bf respectively over the AST interpreter. Not bad. But surely we can do even better?

Optimizing Bytecode Interpreter

We can optimize our bytecode interpreter by emitting specialized opcodes for particular patterns of opcodes that occur frequently. Think of it as replacing every occurrence of a long phrase in a text with a single word that means the same, leading to a shorter text and faster reading time. Since BF is so verbose, there are many opportunities for optimizing BF bytecode7. We are going to implement only one simple optimization, just to get a taste of how to do it.

data OptimizingBytecodeInterpreter

instance Interpreter OptimizingBytecodeInterpreter where
  data Program OptimizingBytecodeInterpreter = ProgramOBC BA.Bytes
  parse =
    parseToInstrs
      >>> translate
      >>> optimize
      >>> assemble
      >>> ProgramOBC
  interpret memory (ProgramOBC bytecode) = interpretBytecode memory bytecode

The optimizing bytecode interpreter is pretty much the same as the bytecode interpreter, with the optimize function called between the translation and assembly phases.

The patterns of opcodes we are optimizing for are [-] and [+]. When executed, both of these BF patterns decrement or increment the current memory cell repeatedly till it becomes zero. In effect, these patterns clear the current cell. We start the process by adding a new Opcode for clearing a cell:

data Opcode
  = OpInc
  | OpDec
  | OpMoveR
  | OpMoveL
  | OpGetC
  | OpPutC
  | OpLoop Opcodes
  | OpClear
  deriving (Show)

type Opcodes = [Opcode]

The optimize function recursively goes over the Opcodes, and emits optimized ones by replacing the patterns that clear the current cell with OpClear:

optimize :: Opcodes -> Opcodes
optimize = map $ \case
  OpLoop [OpDec] -> OpClear
  OpLoop [OpInc] -> OpClear
  OpLoop body -> OpLoop $ optimize body
  op -> op
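
As a quick check, running the pipeline up to the optimization phase on a tiny program replaces the clearing loop with the new opcode (an indicative GHCi session):

> optimize . translate . parseToInstrs $ "+[-]"
[OpInc,OpClear]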

Then we modify the assembleOpcode function to emit a unique byte for OpClear:

assembleOpcode :: Opcode -> [Word8]
assembleOpcode = \case
  OpInc -> [0]
  OpDec -> [1]
  OpMoveR -> [2]
  OpMoveL -> [3]
  OpGetC -> [4]
  OpPutC -> [5]
  OpLoop body ->
    let assembledBody = concatMap assembleOpcode body
        bodyLen = length assembledBody + 3
     in if bodyLen > 65_536 -- 2 ^ 16
          then error $ "Body of loop is too big: " <> show bodyLen
          else do
            let assembledBodyLen = assembleBodyLen bodyLen
            [6] <> assembledBodyLen <> assembledBody <> [7] <> assembledBodyLen
  OpClear -> [8]

Finally, we modify the bytecode interpreter to execute the OpClear opcode.

go !memPtr !progPtr
  | progPtr == progEndPtr = return ()
  | otherwise = readProg >>= \case
      0 -> modifyMem (+ 1) >> goNext                           -- Inc
      1 -> modifyMem (subtract 1) >> goNext                    -- Dec
      2 -> jump (nextMemPtr memStartPtr memEndPtr memPtr 1) 1  -- MoveR
      3 -> jump (prevMemPtr memStartPtr memEndPtr memPtr 1) 1  -- MoveL
      4 -> getChar >>= writeMem . fromIntegral . ord >> goNext -- GetC
      5 -> readMem >>= putChar . chr . fromIntegral >> goNext  -- PutC
      6 -> readMem >>= \case                                   -- Loop start
        0 -> readProg2 >>= jump memPtr
        _ -> jump memPtr 3
      7 -> readMem >>= \case                                   -- Loop end
        0 -> jump memPtr 3
        _ -> readProg2 >>= jump memPtr . negate
      8 -> writeMem 0 >> goNext                                -- Clear
      op -> error $ "Unknown opcode: " <> show op

We can see how the patterns [-] and [+], which may execute tens or even hundreds of operations, are now replaced by a single operation in the interpreter. This is what gives us the speedup in this case. Let’s run it:

❯ time ./bfi -o hanoi.bf > /dev/null
        4.07 real         4.04 user         0.01 sys
❯ time ./bfi -o mandelbrot.bf > /dev/null
       15.58 real        15.53 user         0.04 sys

hanoi.bf runs 2.7x faster, whereas mandelbrot.bf is barely 1% faster as compared to the non-optimizing bytecode interpreter. This demonstrates how different optimizations apply to different programs, and hence the need to implement a wide variety of them to be able to optimize all programs well.

Comparison

It’s time for a final comparison of the run times of the four interpreters:

Interpreter           Hanoi    Mandelbrot
String                29.15s   94.86s
AST                   14.94s   36.49s
Bytecode              11.10s   15.72s
Optimizing Bytecode    4.07s   15.58s

The final interpreter is 7x faster than the baseline one for hanoi.bf, and 6x faster for mandelbrot.bf. Here’s the same data as a chart:

Run time of the four interpreters

That’s it for this post. I hope you enjoyed it and took something away from it. In a future post, we’ll explore more optimizations for our BF interpreter. The full code for this post is available here.

If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading!


  1. BF is Turing-complete. That means it can be used to implement any computable program. However, it is a Turing tarpit, which means it is not feasible to write any useful programs in it because of its lack of abstractions.↩︎

  2. A string interpreter also serves as a useful baseline for measuring the performance of BF interpreters. That’s why I decided to use strings instead of Data.Text or Data.Sequence, which are more performant.↩︎

  3. I am a big fan of zippers, as evidenced by this growing list of posts that I use them in.↩︎

  4. We use Nix for getting the dependency libraries.↩︎

  5. If you are unfamiliar, >>> is the left-to-right function composition function:

    f >>> g = g . f
    ↩︎
  6. While the only way to access byte arrays is via pointers, we could have continued accessing the memory vector using indices. I benchmarked both methods, and found that using pointers for memory access sped up the execution of hanoi.bf by 1.1x and mandelbrot.bf by 1.6x as compared to index-based access. It’s also nice to learn how to use pointers in Haskell. This is why we chose to use Storable vectors for the memory.↩︎

  7. See BFC, which touts itself as “an industrial-grade Brainfuck compiler”, with a huge list of optimizations.↩︎

If you liked this post, please leave a comment.

by Abhinav Sarkar (abhinav@abhinavsarkar.net) at January 19, 2025 12:00 AM

January 18, 2025

Sandy Maguire

A New Perspective on Lenses

I’ve always considered lenses to be a bit uncomfortable. While they’re occasionally useful for doing deeply nested record updates, they often seem to be more trouble than they’re worth. There’s a temptation for the novice programmer to ^.. and folded their way to a solution that is much more naturally written merely as toList. And don’t get me started about the stateful operators like <<+= and their friends. Many programs which can be more naturally written functionally accidentally end up being imperative due to somebody finding a weird lens combinator and trying to use it in anger. Much like a serious drug collection, the tendency is to push it as far as you can.

Thus, my response has usually been one of pushback and moderation. I don’t avoid lenses at all costs, but I do try to limit myself to the prime types (Lens', Prism', Iso'), and to the boring combinators (view, set, over). I feel like these give me most of the benefits of lenses, without sending me tumbling down the rabbit hole.

All of this is to say that my grokkage of lenses has always been one of generalized injections and projections, for a rather shallow definition of “generalized”. That is, I’ve grown accustomed to thinking about lenses as getter/setter pairs for data structures—eg, I’ve got a big product type and I want to pull a smaller piece out of it, or modify a smaller piece in a larger structure. I think about prisms as the dual structure over coproducts—“generalized” injecting and pattern matching.

And this is all true; but I’ve been missing the forest for the trees on this one. That’s not to say that I want to write lensier code, but that I should be taking the “generalized” part much more seriously.

The big theme of my intellectual development over the last few years has been thinking about abstractions as shared vocabularies. Monoids are not inherently interesting; they’re interesting because of how they let you quotient seemingly-unrelated problems by their monoidal structure. Applicatives are cool because once you’ve grokked them, you begin to see them everywhere. Anywhere you’ve got conceptually-parallel, data-independent computations, you’ve got an applicative lurking somewhere under the surface (even if it happens to be merely the Identity applicative.)

I’ve had a similar insight about lenses, and that’s what I wanted to write about today.

The Context

At work, I’ve been thinking a lot about compilers and memory layout lately. I won’t get into the specifics of why, but we can come up with an inspired example. Imagine we’d like to use Haskell to write a little eDSL that we will use to generate x86 machine code.

The trick of course, is that we’re writing Haskell in order to not write machine code. So the goal is to design high-level combinators in Haskell that express our intent, while simultaneously generating machine code that faithfully implements the intention.

One particularly desirable feature about eDSLs is that they allow us to reuse Haskell’s type system. Thus, imagine we have some type:

type Code :: Type -> Type
data Code a = Code
  { getMachineCode :: [X86OpCode]
  }

Notice that the a parameter here is entirely phantom; it serves only to annotate the type of the value produced by executing getMachineCode. For today’s purpose, we’ll ignore all the details about calling conventions and register layout and what not; let’s just assume a Code a corresponds to a computation that leaves a value (or pointer) to something of type a in a well-known place, whether that be the top of the stack, or eax or something. It doesn’t matter!

Since the type parameter to Code is phantom, we need to think about what role it should have. Keeping it at phantom would be disastrous, since this type isn’t used by Haskell, but it is certainly used to ensure our program is correct. Similarly, representational seems wrong, since coerce is meaningful only when thinking about Haskell; which this thing decidedly is not. Thus, our only other option is:

type role Code nominal

Frustratingly, due to very similar reasoning, Code cannot be a functor, because there’s no way1 to lift an arbitrary Haskell function a -> b into a corresponding function Code a -> Code b. If there were, we’d be in the clear! But alas, we are not.

The Problem

All of the above is to say that we are reusing Haskell’s type system, but not its values. An expression of type Code Bool has absolutely no relation to the values True or False—except that we could write, by hand, a function litBool :: Bool -> Code Bool which happened to do the right thing.
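
As a rough sketch of what such a hand-written function could look like (PushImm is a hypothetical X86OpCode constructor used only for illustration; it is not defined in this post):

litBool :: Bool -> Code Bool
litBool b = Code [PushImm (if b then 1 else 0)] -- leave 1 or 0 in the well-known result location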

It is tempting, however, to make new Haskell types in order to help constrain the assembly code we end up writing. For example, maybe we want to write a DSP for efficiently decoding audio. We can use Haskell’s types to organize our thoughts and prevent ourselves from making any stupid mistakes:

data Decoder = Decoder
  { format :: Format
  , seekPos :: Int
  , state :: ParserState
  }

data Chunk = ...

createDecoder :: Code MediaHandle -> Code Decoder
decodeChunk :: Code Decoder -> (Code Decoder, Code Chunk)

We now have a nice interface in our eDSL to guide end-users along the blessed path of signal decoding. We have documented what we are trying to do, and how it can be used once it’s implemented. But due to our phantom, yet nominal, parameter to Code, this is all just make believe. There is absolutely no correlation between what we’ve written down and how we can use it. The problem arises when we go to implement decodeChunk. We’ll need to know what state we’re in, which means we’ll need some function:

decoderState :: Code Decoder -> Code ParserState
decoderState = ???

In a world where Code is a functor, this is implemented trivially as fmap state. But Code is not a functor! Alas! Woe! What ever can we do?

The Solution

Lenses, my guy!

Recall that Code is phantom in its argument, even if we use roles to restrict that fact. This means we can implement a safe-ish version of unsafeCoerce, one that only fiddles with the phantom parameter of our type:

unsafeCoerceCode :: Code a -> Code b
unsafeCoerceCode (Code ops) = Code ops

Judicious use of unsafeCoerceCode allows us to switch between a value’s type and its in-memory representation. For example, given a type:

type Bytes :: Nat -> Type
data Bytes n

we can reinterpret a Decoder as a sequence of bytes:

decoderRep :: Iso' (Code Decoder) (Code (Bytes (32 + 4 + 1)))
decoderRep = iso unsafeCoerceCode unsafeCoerceCode

stateRep :: Iso' (Code ParserState) (Code (Bytes 1))
stateRep = iso unsafeCoerceCode unsafeCoerceCode

which says we are considering our Decoder to be laid out in memory like:

struct Decoder {
  char format[32];
  int32_t seekPos;
  char state;
};

Of course, this is a completely unsafe transformation, as far as the Haskell type system is aware. We’re in the wild west out here, well past any type theoretical life buoys. We’d better be right that this coercion is sound. But assuming this is in fact the in-memory representation of a Decoder, we are well justified in this transformation.

Notice the phrasing of our Iso' above. It is not an iso between Decoder and Bytes 37, but between Codes of such things. This witnesses the fact that it is not true in the Haskell embedding, merely in our Code domain. Of course, isos are like the least exciting optics, so let’s see what other neat things we can do.

Imagine we have some primitives:

slice
    :: n <= m
    => Int     -- ^ offset
    -> Proxy n -- ^ size
    -> Code (Bytes m)
    -> Code (Bytes n)

overwrite
    :: n <= m
    => Int  -- ^ offset
    -> Code (Bytes n)
    -> Code (Bytes m)
    -> Code (Bytes m)

which we can envision as Haskell bindings to the pseudo-C functions:

const char[n] slice(size_t offset, char[m] bytes) {
  return &bytes[offset];
}

char[m] overwrite(size_t offset, char[n] value, char[m] bytes) {
  char[m] new_bytes = malloc(m);
  memcpy(new_bytes, bytes, m);
  memcpy(&new_bytes[offset], value, n);
  return new_bytes;
}

We can use slice and overwrite to give a Lens' into Bytes:

slicing :: n <= m => Int -> Lens' (Code (Bytes m)) (Code (Bytes n))
slicing offset =
  lens
    (slice offset Proxy)
    (\orig new -> overwrite offset new orig)

and finally, we can give an implementation of the desired decoderState above:

decoderState :: Lens' (Code Decoder) (Code ParserState)
decoderState = decoderRep . slicing 36 . from stateRep

Such a lens acts exactly as a record selector would, in that it allows us to view, set, and over a ParserState inside of a Decoder. But recall that Code is just a list of instructions we eventually want the machine to run. We’re using the shared vocabulary of lenses to emit machine code! What looks like using a data structure to us when viewed through the Haskell perspective, is instead invoking an assembler.
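
As a minimal sketch of that usage (getParserState and setParserState are illustrative helpers, not definitions from the post):

getParserState :: Code Decoder -> Code ParserState
getParserState = view decoderState

setParserState :: Code ParserState -> Code Decoder -> Code Decoder
setParserState = set decoderState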

Reflections

Once the idea sinks in, you’ll start seeing all sorts of cool things you can do with optics to generate code. Prisms generalize running initializer code. A Traversal over Code can be implemented as a loop. And since all the sizes are known statically, if you’re feeling plucky, you can decide to unroll the loop right there in the lens.

Outside of the context of Code, the realization that optics are this general is still doing my head in. Something I love about working in Haskell is that I’m still regularly having my mind blown, even after a decade.


  1. Short of compiling to categories via something like categorifier.↩︎

January 18, 2025 09:18 AM

January 17, 2025

Well-Typed.Com

Tracing foreign function invocations

When profiling Haskell programs, time spent in foreign functions (functions defined in C) does not show up on normal time profiles, which can be problematic when debugging or optimizing performance of code that makes heavy use of the foreign function interface (FFI). In this blog post we present a new compiler plugin called trace-foreign-calls, which makes this time visible and available for analysis.

The trace-foreign-calls plugin as well as a simple analysis tool ghc-events-util are both available on GitHub.

Overview

Consider a C function

long slow_add(long a, long b) {
  while(b--) {
    a++;
  }
  return a;
}

with corresponding Haskell import

foreign import capi unsafe "test_cbits.h slow_add"
  c_slowAddIO :: CLong -> CLong -> IO CLong

and an application that invokes

main :: IO ()
main = do
    print =<< slowAddIO a b
    print =<< slowAddIO b a
  where
    a = 1_000_000_000
    b = 2_000_000_000

When we compile the application with the trace-foreign-calls plugin enabled, run it, and then look at the generated .eventlog using ghc-events-util, we will see:

 607.16ms   607.16ms       1922573  cap 1  trace-foreign-calls: call c_slowAddIO (capi unsafe "test_cbits.h slow_add")
   0.26ms     0.00ms     609077635  cap 1  trace-foreign-calls: return c_slowAddIO

 302.02ms   302.02ms     609336093  cap 1  trace-foreign-calls: call c_slowAddIO (capi unsafe "test_cbits.h slow_add")
   0.01ms     0.00ms     911353269  cap 1  trace-foreign-calls: return c_slowAddIO

The important column here is the first column, which for each event reports the time from that event to the next; in this case, from the call to the foreign function to its return. Perhaps ghc-events-util could be given a mode that is designed specifically for trace-foreign-calls to make this output a bit more readable, but for now this general purpose output suffices.1

If we additionally compile with profiling enabled, we get an additional event for each foreign call, recording the cost-centre callstack:

   0.00ms     0.00ms       4643217  cap 1  trace-foreign-calls: call c_slowAddIO (capi unsafe "test_cbits.h slow_add")
 856.37ms   856.37ms       4643327  cap 1  heap prof sample 0, residency 1, cost centre stack:
                                           slowAddIO in Example at Example.hs:29:1-78
                                           main in Main at test/Main.hs:(8,1)-(19,21)
                                           runMainIO1 in GHC.Internal.TopHandler at <no location info>
   0.65ms     0.00ms     861018010  cap 1  trace-foreign-calls: return c_slowAddIO

   0.00ms     0.00ms     861672464  cap 1  trace-foreign-calls: call c_slowAddIO (capi unsafe "test_cbits.h slow_add")
 426.79ms   426.79ms     861672834  cap 1  heap prof sample 0, residency 1, cost centre stack:
                                           slowAddIO in Example at Example.hs:29:1-78
                                           main in Main at test/Main.hs:(8,1)-(19,21)
                                           runMainIO1 in GHC.Internal.TopHandler at <no location info>
   0.06ms     0.00ms    1288461103  cap 1  trace-foreign-calls: return c_slowAddIO

Note that we are abusing the “heap profile sample” event to record the cost-centre callstack to the foreign function (see “Conclusions and future work”, below).2

Dependencies

Suppose in example-pkg-A we have

foreign import capi "cbits.h xkcdRandomNumber"
  someFunInA :: IO CInt

and we use this function in example-pkg-B

main :: IO ()
main = do
    randomNumber <- someFunInA
    let bs = compress (BS.Char8.pack $ show randomNumber)
    print $ BS.Word8.unpack bs

where compress is from zlib. Although we are running main from example-pkg-B, in order to get information about someFunInA we need to enable the plugin when compiling example-pkg-A; the README.md describes how to enable the plugin for all dependencies. Indeed, when we do this, we see calls to libz as well:

   0.00ms     0.00ms        414047  cap 0  trace-foreign-calls: call someFunInA (capi safe "cbits.h xkcdRandomNumber")
   0.00ms     0.00ms        414607  cap 0  trace-foreign-calls: return someFunInA

(..)

   0.00ms     0.00ms        493076  cap 0  trace-foreign-calls: call c_zlibVersion (capi unsafe "zlib.h zlibVersion")
   0.00ms     0.00ms        493866  cap 0  trace-foreign-calls: return c_zlibVersion

Indeed, if we are willing to do a custom build of ghc, we can even enable the plugin on the boot libraries, which (amongst other things) makes the final print also visible:

   0.00ms     0.00ms        609576  cap 0  trace-foreign-calls: call unsafe_fdReady (ccall unsafe "fdReady")
   0.00ms     0.00ms        611846  cap 0  trace-foreign-calls: return unsafe_fdReady
   0.01ms     0.00ms        612286  cap 0  trace-foreign-calls: call c_write (capi unsafe "HsBase.h write")
   0.23ms     0.00ms        618236  cap 0  trace-foreign-calls: return c_write

Conclusions and future work

The trace-foreign-calls compiler plugin can be used to generate eventlog events for foreign function invocations, so that the time spent in foreign functions becomes visible; the ghc-events-util tool can be used to inspect these eventlogs.

The plugin works by renaming each foreign function import of foo to foo_uninstrumented, and then introducing a new wrapper function foo which emits some custom events to the eventlog before and after calling foo_uninstrumented. Since we want the plugin to work even on the GHC boot libraries, the wrapper tries to use only functionality from GHC.Prim, which limits what we can do. One consequence is that because the plugin reuses “heap profile sample” events to record the cost centre stacks for foreign functions, it is not currently possible to record both regular heap profile samples (that is, run the code with +RTS -p) and enable the plugin at the same time.
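
Conceptually, the wrapper generated for the earlier c_slowAddIO example behaves like the hand-written sketch below. This is not the plugin’s actual output: traceEventIO from Debug.Trace stands in for the lower-level primitives the plugin emits, and the event text is only indicative.

import Debug.Trace (traceEventIO)

foreign import capi unsafe "test_cbits.h slow_add"
  c_slowAddIO_uninstrumented :: CLong -> CLong -> IO CLong

-- Wrapper with the original name, emitting events around the real call.
c_slowAddIO :: CLong -> CLong -> IO CLong
c_slowAddIO a b = do
  traceEventIO "trace-foreign-calls: call c_slowAddIO (capi unsafe \"test_cbits.h slow_add\")"
  result <- c_slowAddIO_uninstrumented a b
  traceEventIO "trace-foreign-calls: return c_slowAddIO"
  pure result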

A better solution would be to add support for profiling foreign functions to ghc itself. This would involve creating new eventlog event types, and then upgrading existing time profiling tools to interpret these new events. Until then, however, time profiling of foreign function invocations is now at least possible.


  1. The first three columns are the time from each event to the next visible event (some events might be filtered out), the time from each event to the next actual event, and the time of the event since the start of the program.↩︎

  2. The sample field will always be 0; the residency field is used to record the capability. The latter allows us to correlate events of concurrent foreign function invocations; the --res-is-cap command line option to ghc-events-util makes it understand this convention.↩︎

by edsko, zubin, matthew at January 17, 2025 12:00 AM

January 16, 2025

Michael Snoyman

The Paradox of Necessary Force

Humans want the resources of other humans. I want the food that the supermarket owns so that I can eat it. Before buying it, I wanted the house that I now own. And before that, someone wanted to build a house on that plot of land, which was owned by someone else first. Most of the activities we engage in during our lifetime revolve around extracting something from someone else.

There are two basic modalities to getting the resources of someone else. The first, the simplest, and the one that has dominated the majority of human history, is force. Conquer people, kill them, beat them up and take their stuff, force them into slavery and make them do your work. It’s a somewhat effective strategy. This can also be more subtle, by using coercive and fraudulent methods to trick people into giving you their resources. Let’s call this modality the looter approach.

The second is trade. In the world of trade, I can only extract resources from someone else when they willingly give them to me in exchange for something else of value. This can be barter of value for value, payment in money, built-up goodwill, favors, charity (exchanging resources for the benefit you receive for helping someone else), and more. In order to participate in this modality, you need to create your own valuable resources that other people want to trade for. Let’s call this the producer approach.

The producer approach is better for society in every conceivable way. The looter approach causes unnecessary destruction, pushes production into ventures that don’t directly help anyone (like making more weapons), and rewards people for their ability to inflict harm. By contrast, the producer approach rewards the ability to meet the needs of others and causes resources to end up in the hands of those who value them the most.

Looter philosophy is rooted in the concept of the zero sum game, the mistaken belief that I can only have more if someone else has less. By contrast, the producer philosophy correctly identifies the fact that we can all end up better by producing more goods in more efficient ways. We live in our modern world of relatively widespread luxury because producers have made technological leaps—for their own self-serving motives—that have improved everyone’s ability to produce more goods going forward. Think of the steam engine, electricity, computing power, and more.

A producer-only world

It would be wonderful to live in a world in which there are no looters. We all produce, we all trade, everyone receives more value than they give, and there is no wasted energy or destruction from the use of force.

Think about how wonderful it could be! We wouldn’t need militaries, allowing a massive amount of productive capacity to be channeled into things that make everyone’s lives better. We wouldn’t need police. Not only would that free up more resources, but would remove the threat of improper use of force by the state against citizens. The list goes on and on.

I believe many economists—especially Austrian economists—are cheering for that world. I agree with them on the cheering. It’s why things like Donald Trump’s plans for tariffs are so horrific in their eyes. Tariffs introduce an artificial barrier between nations, impeding trade, preventing the peaceful transfer of resources, and leading to a greater likelihood of armed conflict.

There’s only one problem with this vision, and it’s also based in economics: game theory.

Game theory and looters

Imagine I’m a farmer. I’m a great farmer, I have a large plot of land, I run my operations efficiently, and I produce huge amounts of food. I sell that food into the marketplace, and with that money I’m able to afford great resources from other people, who willingly trade them to me because they value the money more than their own resources. For example, how many T-shirts does the clothing manufacturer need? Instead of his 1,000th T-shirt, he’d rather sell it for $5 and buy some food.

While I’m really great as a farmer, I’m not very good as a fighter. I have no weapons training, I keep no weapons on my property, and I dislike violence.

And finally, there’s a strong, skilled, unethical person down the street. He could get a job with me on the farm. For back-breaking work 8 hours a day, I’ll pay him 5% of my harvest. Or, by contrast, he could act like the mafia, demand a “protection fee” of 20%, and either beat me up, beat up my family, or cause harm to my property, if I don’t pay it.

In other words, he could be a producer and get 5% in exchange for hard work, or be a looter and get 20% in exchange for easy (and, likely for him, fun) work. As described, the game theoretic choice is clear.

So how do we stop a producer world from devolving back into a looter world?

Deterrence

There’s only one mechanism I’m aware of for this, and it’s deterrence. As the farmer, I made a mistake. I should get weapons training. I should keep weapons on my farm. I should be ready to defend myself and my property. Because if I don’t, game theory ultimately predicts that all trade will collapse, and society as we know it will crumble.

I don’t necessarily have to have the power of deterrence myself. I could hire a private security company, once again allowing the producer world to work out well. I trade something of lesser value (some money) for something I value more (the protection afforded by private security). If I’m lucky, that security company will never need to do anything, because the mere threat of their presence is sufficient.

And in modern society, we generally hope to rely on the government police force to provide this protection.

There are easy ways to defeat the ability of deterrence to protect our way of life. The simplest is to defang it. Decriminalize violent and destructive acts, for example. Remove the consequences for bad, looter behavior, and you will incentivize looting. This is far from a theoretical discussion. We’ve seen the clear outcome in California, which has decriminalized theft under $950, resulting—in a completely predictable way—in more theft, stores closing, and an overall erosion of producer philosophy.

And in California, this is even worse. Those who try to be their own deterrence, by arming themselves and protecting their rights, are often the targets of government force instead of the looters.

I’m guessing this phrasing has now split my reading audience into three groups. Group A agrees wholly with what I’m saying. Group B believes what I’ve just written is pure evil and garbage. Group C initially disagreed with my statements, but has an open mind and is willing to consider a different paradigm. The next section is targeted at groups A and C. Group B: good luck with the broken world you’re advocating.

Global scale

This concept of deterrence applies at a global scale too. I would love to live in a world where all nations exchange value for value and never use force against others. In fact, I believe the ultimate vision for this kind of a world ends with anarcho-capitalism (though I don’t know enough about the topic to be certain). There ends up being no need for any force against anyone else. It’s a beautiful vision for a unified world, where there are no borders, there is no destruction, there is only unity through trade. I love it.

But game theory destroys this too. If the entire world disarmed, it would take just one person who thinks he can do better through looter tactics to destroy the system. The only way to defeat that is to have a realistic threat of force to disincentivize someone from acting like a looter.

And this is the paradox. In order to live in our wonderful world of production, prosperity, health, and happiness, we always need to have our finger near enough to the trigger to respond to looters with force. I know of no other approach that allows production to happen. (And I am very interested in other theoretical solutions to this problem, if anyone wants to share reading material.)

Peace through strength

This line of thinking leads to the concept of peace through strength. When those tempted to use violence see the overwhelming strength of their potential victims, they will be disincentivized to engage in violent behavior. It’s the story of the guy who wants to rob my farm. Or the roaming army in the ancient world that bypassed the well fortified walled city and attacked its unprotected neighbor.

There are critics of this philosophy. As put by Andrew Bacevich, "'Peace through strength' easily enough becomes 'peace through war.'" I don’t disagree at all with that analysis, and it’s something we must remain vigilant against. But disarming is not the answer, as it will, of course, necessarily lead to the victory of those willing to use violence on others.

In other words, my thesis here is that the threat of violence must be present to keep society civilized. But the cost of using that violence must be high enough that neither side is incentivized to initiate it.

Israel

I’d been thinking of writing a blog post on this topic for a few months now, but finally decided to today. Israel just agreed to a hostage deal with Hamas. In exchange for the release of 33 hostages taken in the October 7 massacre, Israel will hand over 1,000 terrorists in Israeli prisons.

I have all the sympathy in the world for the hostages and their families. I also have great sympathy for the Palestinian civilians who have been harmed, killed, displaced, and worse by this war. And I have empathy (as one of the victims) for all of the Israeli citizens who have lived under threat of rocket attacks, had our lives disrupted, and for those who have been killed by this war. War is hell, full stop.

My message here is to those who have been pushing the lie of “peace through negotiations.” Or peace through capitulation. Or anything else. These tactics are the reason the war has continued. As long as the incentive structure makes initiating a war a positive, wars will continue to be initiated. Hamas has made its stance on the matter clear: it has sworn for the eradication of all Jews within the region, and considers civilian casualties on the Palestinian side not only acceptable, but advantageous.

Gaza Chief's Brutal Calculation: Civilian Bloodshed Will Help Hamas

I know that many people who criticize Israel and put pressure on us to stop the war in Gaza believe they are doing so for noble reasons. (For the record, I also believe many people have less altruistic reasons for their stance.) I know people like to point to the list of atrocities they believe Israel has committed. And, by contrast, the pro-Israel side is happy to respond with corresponding atrocities from the other side.

I honestly believe this is all far beyond irrelevant. The only question people should be asking is: how do we disincentivize the continuation of hostilities? And hostage deals that result in the release of terrorists, allow “aid” to come in (which, if history is any indication, will be used to further the construction of tunnels and other sources for attack on Israel), and give Hamas an opportunity to rearm, only incentivize the continuation of the war.

In other words, if you care about the innocent people on either side, you should be opposed to this kind of capitulation. Whatever you think about the morality of each side, more people will suffer with this approach.

Skin in the game

It’s easy to say things like that when your life isn’t on the line. I also don’t think that matters much. Either the philosophical, political, and economic analysis is correct, or it isn’t. Nonetheless, I do have skin in the game here. I still live in a warzone. I am less than 15 kilometers from the Lebanese border. We’ve had Hezbollah tunnels reaching into our surrounding cities. My family had to lock ourselves inside when Hezbollah paratroopers had attempted to land in our city.

My wife (Miriam) and I have discussed this situation at length, many times, over the course of this war. If I’m ever taken hostage, I hope the Israeli government bombs the hell out of wherever I am being held. I say this not only because I believe it is the right, just, moral, ethical, and strategically correct thing to do. I say this because I am selfish:

  • I would rather die than be tortured by our enemies.
  • I would rather die than be leveraged to make my family and country less safe.
  • I would rather die than live the rest of my life a shell of my former self, haunted not only by the likely torture inflicted on me, but by the guilt of the harm to others resulting from my spared life.

I don’t know why this hostage deal went through now. I don’t know what pressures have been brought to bear on the leaders in Israel. I don’t know if they are good people trying to protect their citizens, nefarious power hungry cretins looking to abuse both the Israeli and Palestinian populace to stay in control, weak-willed toadies who do what they’re told by others, or simply stupid. But my own stance is clear.

But what about the Palestinians?

I said it above, and I’ll say it again: I truly do feel horrible for the trauma that the Palestinian people are going through. Not for the active terrorists mind you, I feel no qualms about those raising arms against us being destroyed. But everyone else, even those who wish me and my fellow Israelis harm. (And, if polling is to be believed, that’s the majority of Palestinians.) I would much rather that they not be suffering now, and that eventually through earned trust on both sides, everyone’s lots are improved.

But the framework being imposed by those who “love” peace isn’t allowing that to happen. Trust cannot be built when there’s a greater incentive to return to the use of force. I was strongly opposed to the 2005 disengagement from Gaza. But once it happened, it could have been one of those trust-building starting points. Instead, I saw many people justify further violence by Hamas—such as non-stop rocket attacks on the south of Israel—because Israel hadn’t done enough yet.

Notice how fundamentally flawed this mentality is, just from an incentives standpoint! Israel gives up control of land, something against its own overall interests and something desired by Palestinians, and is punished for it with increased violence against citizens. Hamas engaged in a brutal destruction of all of its opponents within the Palestinian population, launched attacks on Israel, and when Israel did respond with force, Israel was blamed for having not done enough to appease Hamas.

I know people will want to complicate this story by bringing up the laundry list of past atrocities, of assigning negative motivations to Israel and its leaders, and a million other evasions that are used to avoid actually solving this conflict. Instead, I beg everyone to just use basic logic.

The violence will continue as long as the violence gets results.

January 16, 2025 12:00 AM

January 15, 2025

Well-Typed.Com

The Haskell Unfolder Episode 38: tasting and testing CUDA (map, fold, scan)

Today, 2025-01-15, at 1930 UTC (11:30 am PST, 2:30 pm EST, 7:30 pm GMT, 20:30 CET, …) we are streaming the 38th episode of the Haskell Unfolder live on YouTube.

The Haskell Unfolder Episode 38: tasting and testing CUDA (map, fold, scan)

CUDA is an extension of C for programming NVIDIA GPUs. In this episode of the Haskell Unfolder we show how to set up a CUDA library so that we can link to it from a Haskell application, how we can call CUDA functions from Haskell, and how we can use QuickCheck to find subtle bugs in our CUDA code. On the CUDA side, we show how to implement simple concurrent versions of map, fold and scan. No familiarity with CUDA will be assumed, but of course we will only be able to give a taste of CUDA programming.

About the Haskell Unfolder

The Haskell Unfolder is a YouTube series about all things Haskell hosted by Edsko de Vries and Andres Löh, with episodes appearing approximately every two weeks. All episodes are live-streamed, and we try to respond to audience questions. All episodes are also available as recordings afterwards.

We have a GitHub repository with code samples from the episodes.

And we have a public Google calendar (also available as ICal) listing the planned schedule.

There’s now also a web shop where you can buy t-shirts and mugs (and potentially in the future other items) with the Haskell Unfolder logo.

by andres, edsko at January 15, 2025 12:00 AM

January 13, 2025

Michael Snoyman

Incentives Determine Outcomes

My blog posts and reading material have both been on a decidedly economics-heavy slant recently. The topic today, incentives, squarely falls into the category of economics. However, when I say economics, I’m not talking about “analyzing supply and demand curves.” I’m talking about the true basis of economics: understanding how human beings make decisions in a world of scarcity.

A fair definition of incentive is “a reward or punishment that motivates behavior to achieve a desired outcome.” When most people think about economic incentives, they’re thinking of money. If I offer my son $5 if he washes the dishes, I’m incentivizing certain behavior. We can’t guarantee that he’ll do what I want him to do, but we can agree that the incentive structure itself will guide and ultimately determine what outcome will occur.

The great thing about monetary incentives is how easy they are to talk about and compare. “Would I rather make $5 washing the dishes or $10 cleaning the gutters?” But much of the world is incentivized in non-monetary ways too. For example, using the “punishment” half of the definition above, I might threaten my son with losing Nintendo Switch access if he doesn’t wash the dishes. No money is involved, but I’m still incentivizing behavior.

And there are plenty of incentives beyond our direct control! My son is also incentivized to not wash dishes because it’s boring, or because he has some friends over that he wants to hang out with, or dozens of other things. Ultimately, the conflicting array of different incentive structures placed on him will determine what actions he chooses to take.

Why incentives matter

A phrase I see often in discussions—whether they are political, parenting, economic, or business—is “if they could just do…” Each time I see that phrase, I cringe a bit internally. Usually, the underlying assumption of the statement is “if people would behave contrary to their incentivized behavior then things would be better.” For example:

  • If my kids would just go to bed when I tell them, they wouldn’t be so cranky in the morning.
  • If people would just use the recycling bin, we wouldn’t have such a landfill problem.
  • If people would just stop being lazy, our team would deliver our project on time.

In all these cases, the speakers are seemingly flummoxed as to why the people in question don’t behave more rationally. The problem is: each group is behaving perfectly rationally.

  • The kids have a high time preference, and care more about the joy of staying up now than the crankiness in the morning. Plus, they don’t really suffer the consequences of morning crankiness, their parents do.
  • No individual suffers much from their individual contribution to a landfill. Not adding to it would make an insignificant difference, compared with the effort it takes to recycle properly.
  • If a team doesn’t properly account for the productivity of individuals on a project, each individual receives less harm from their own inaction. Sure, the project may be delayed, company revenue may be down, and they may even risk losing their job when the company goes out of business. But their laziness individually won’t determine the entirety of that outcome. By contrast, they greatly benefit from being lazy by getting to relax at work, go on social media, read a book, or do whatever else they do when they’re supposed to be working.

Free Candy!

My point here is that, as long as you ignore the reality of how incentives drive human behavior, you’ll fail at getting the outcomes you want.

If everything I wrote up until now made perfect sense, you understand the premise of this blog post. The rest of it will focus on a bunch of real-world examples to hammer home the point, and demonstrate how versatile this mental model is.

Running a company

Let’s say I run my own company, with myself as the only employee. My personal revenue will be 100% determined by my own actions. If I decide to take Tuesday afternoon off and go fishing, I’ve chosen to lose that afternoon’s revenue. Implicitly, I’ve decided that the enjoyment I get from an afternoon of fishing is greater than the potential revenue. You may think I’m being lazy, but it’s my decision to make. In this situation, the incentive–money–is perfectly aligned with my actions.

Compare this to a typical company/employee relationship. I might have a bank of Paid Time Off (PTO) days, in which case once again my incentives are relatively aligned. I know that I can take off 15 days throughout the year, and I’ve chosen to use half a day for the fishing trip. All is still good.

What about unlimited time off? Suddenly incentives are starting to misalign. I don’t directly pay a price for not showing up to work on Tuesday. Or Wednesday as well, for that matter. I might ultimately be fired for not doing my job, but that will take longer to work its way through the system than simply not making any money for the day taken off.

Compensation overall falls into this misaligned incentive structure. Let’s forget about taking time off. Instead, I work full time on a software project I’m assigned. But instead of using the normal toolchain we’re all used to at work, I play around with a new programming language. I get the fun and joy of playing with new technology, and potentially get to pad my resume a bit when I’m ready to look for a new job. But my current company gets slower results, less productivity, and is forced to subsidize my extracurricular learning.

When a CEO has a bonus structure based on profitability, he’ll do everything he can to make the company profitable. This might include things that actually benefit the company, like improving product quality, reducing internal red tape, or finding cheaper vendors. But it might also include destructive practices, like slashing the R&D budget to show massive profits this year, in exchange for a catastrophe next year when the next version of the product fails to ship.

Golden Parachute CEO

Or my favorite example. My parents owned a business when I was growing up. They had a back office where they ran operations like accounting. All of the furniture was old couches from our house. After all, any money they spent on furniture came right out of their paychecks! But in a large corporate environment, each department is generally given a budget for office furniture, a budget which doesn’t roll over year-to-year. The result? Executives make sure to spend the entire budget each year, often buying furniture far more expensive than they would choose if it was their own money.

There are plenty of details you can quibble with above. It’s in a company’s best interest to give people downtime so that they can come back recharged. Having good ergonomic furniture can in fact increase productivity in excess of the money spent on it. But overall, the picture is pretty clear: in large corporate structures, you’re guaranteed to have mismatches between the company’s goals and the incentive structure placed on individuals.

Using our model from above, we can lament how lazy, greedy, and unethical the employees are for doing what they’re incentivized to do instead of what’s right. But that’s simply ignoring the reality of human nature.

Moral hazard

Moral hazard is a situation where one party is incentivized to take on more risk because another party will bear the consequences. Suppose I tell my son when he turns 21 (or whatever legal gambling age is) that I’ll cover all his losses for a day at the casino, but he gets to keep all the winnings.

What do you think he’s going to do? The most logical course of action is to place the largest possible bets for as long as possible, asking me to cover each time he loses, and taking money off the table and into his bank account each time he wins.

Heads I win, tails you lose

But let’s look at a slightly more nuanced example. I go to a bathroom in the mall. As I’m leaving, I wash my hands. It will take me an extra 1 second to turn off the water when I’m done washing. That’s a trivial price to pay. If I don’t turn off the water, the mall will have to pay for many liters of wasted water, benefiting no one. But I won’t suffer any consequences at all.

This is also a moral hazard, but most people will still turn off the water. Why? Usually due to some combination of other reasons such as:

  1. We’re so habituated to turning off the water that we don’t even consider not turning it off. Put differently, the mental effort needed to not turn off the water is more expensive than the 1 second of time to turn it off.
  2. Many of us have been brought up with a deep guilt about wasting resources like water. We have an internal incentive structure that makes the 1 second to turn off the water much less costly than the mental anguish of the waste we created.
  3. We’re afraid we’ll be caught by someone else and face some kind of social repercussions. (Or maybe more than social. Are you sure there isn’t a law against leaving the water tap on?)

Even with all that in place, you may notice that many public bathrooms use automatic water dispensers. Sure, there’s a sanitation reason for that, but it’s also to avoid this moral hazard.

A common denominator in both of these is that the person taking the action that causes the liability (either the gambling or leaving the water on) is not the person who bears the responsibility for that liability (the father or the mall owner). Generally speaking, the closer together the person making the decision and the person incurring the liability are, the smaller the moral hazard.

It’s easy to demonstrate that by extending the casino example a bit. I said it was the father who was covering the losses of the gambler. Many children (though not all) would want to avoid totally bankrupting their parents, or at least financially hurting them. Instead, imagine that someone from the IRS shows up at your door, hands you a credit card, and tells you you can use it at a casino all day, taking home all the chips you want. The money is coming from the government. How many people would put any restriction on how much they spend?

And since we’re talking about the government already…

Government moral hazards

As I was preparing to write this blog post, the California wildfires hit. The discussions around those wildfires gave a huge number of examples of moral hazards. I decided to cherry-pick a few for this post.

The first and most obvious one: California is asking for disaster relief funds from the federal government. That sounds wonderful. These fires were a natural disaster, so why shouldn’t the federal government pitch in and help take care of people?

The problem is, once again, a moral hazard. In the case of the wildfires, California and Los Angeles both had ample actions they could have taken to mitigate the destruction of this fire: better forest management, larger fire department, keeping the water reservoirs filled, and probably much more that hasn’t come to light yet.

If the federal government bails out California, it will be a clear message for the future: your mistakes will be fixed by others. You know what kind of behavior that incentivizes? More risky behavior! Why spend state funds on forest management and extra firefighters—activities that don’t win politicians a lot of votes in general—when you could instead spend it on a football stadium, higher unemployment payments, or anything else, and then let the feds cover the cost of screw-ups.

You may notice that this is virtually identical to the 2008 “too big to fail” bail-outs. Wall Street took insanely risky behavior, reaped huge profits for years, and when they eventually got caught with their pants down, the rest of us bailed them out. “Privatizing profits, socializing losses.”

Too big to fail

And here’s the absolute best part of this: I can’t even truly blame either California or Wall Street. (I mean, I do blame them, I think their behavior is reprehensible, but you’ll see what I mean.) In a world where the rules of the game implicitly include the bail-out mentality, you would be harming your citizens/shareholders/investors if you didn’t engage in that risky behavior. Since everyone is on the hook for those socialized losses, your best bet is to maximize those privatized profits.

There’s a lot more to government and moral hazard, but I think these two cases demonstrate the crux pretty solidly. But let’s leave moral hazard behind for a bit and get to general incentivization discussions.

Non-monetary competition

At least 50% of the economics knowledge I have comes from the very first econ course I took in college. That professor was amazing, and had some very colorful stories. I can’t vouch for the veracity of the two I’m about to share, but they definitely drive the point home.

In the 1970s, the US had an oil shortage. To “fix” this problem, they instituted price caps on gasoline, which of course resulted in insufficient gasoline. To “fix” this problem, they instituted policies where, depending on your license plate number, you could only fill up gas on certain days of the week. (Irrelevant detail for our point here, but this just resulted in people filling up their tanks more often, no reduction in gas usage.)

Anyway, my professor’s wife had a friend. My professor described in great detail how attractive this woman was. I’ll skip those details here since this is a PG-rated blog. In any event, she never had any trouble filling up her gas tank any day of the week. She would drive up, be told she couldn’t fill up gas today, bat her eyes at the attendant, explain how helpless she was, and was always allowed to fill up gas.

This is a demonstration of non-monetary compensation. Most of the time in a free market, capitalist economy, people are compensated through money. When price caps come into play, there’s a limit to how much monetary compensation someone can receive. And in that case, people find other ways of competing. Like this woman’s case: through using flirtatious behavior to compensate the gas station workers to let her cheat the rules.

The other example was much more insidious. Santa Monica had a problem: it was predominantly wealthy and white. They wanted to fix this problem, and decided to put in place rent controls. After some time, they discovered that Santa Monica had become wealthier and whiter, the exact opposite of their desired outcome. Why would that happen?

Someone investigated, and ended up interviewing a landlady who demonstrated the reason. She was an older white woman, and admittedly racist. Prior to the rent controls, she would list her apartments in the newspaper, and would be legally obligated to rent to anyone who could afford it. Once rent controls were in place, she took a different tack. She knew that she would only get a certain amount for the apartment, and that the demand for apartments was higher than the supply. That meant she could be picky.

She ended up finding tenants through friends-of-friends. Since it wasn’t an official advertisement, she wasn’t legally required to rent it out if someone could afford to pay. Instead, she got to interview people individually and then make them an offer. Normally, that would have resulted in receiving a lower rental price, but not under rent controls.

So who did she choose? A young, unmarried, wealthy, white woman. It made perfect sense. Women were less intimidating and more likely to maintain the apartment better. Wealthy people, she determined, would be better tenants. (I have no idea if this is true in practice or not, I’m not a landlord myself.) Unmarried, because no kids running around meant less damage to the property. And, of course, white. Because she was racist, and her incentive structure made her prefer whites.

You can deride her for being racist, I won’t disagree with you. But it’s simply the reality. Under the non-rent-control scenario, her profit motive for money outweighed her racism motive. But under rent control, the monetary competition was removed, and she was free to play into her racist tendencies without facing any negative consequences.

Bureaucracy

These were the two examples I remember from that course. But non-monetary compensation pops up in many more places. One highly pertinent example is bureaucracies. Imagine you have a government office, or a large corporation’s acquisition department, or the team that apportions grants at a university. In all these cases, you have a group of people making decisions about handing out money that has no monetary impact on them. If they give to the best qualified recipients, they receive no raises. If they spend the money recklessly on frivolous projects, they face no consequences.

Under such an incentivization scheme, there’s little to encourage the bureaucrats to make intelligent funding decisions. Instead, they’ll be incentivized to spend the money where they recognize non-monetary benefits. This is why it’s so common to hear about expensive meals, gift bags at conferences, and even more inappropriate ways of trying to curry favor with those that hold the purse strings.

Compare that ever so briefly with the purchases made by a small mom-and-pop store like my parents owned. Could my dad take a bribe to buy from a vendor who’s ripping him off? Absolutely he could! But he’d lose more on the deal than he’d make on the bribe, since he’s directly incentivized by the deal itself. It would make much more sense for him to go with the better vendor, save $5,000 on the deal, and then treat himself to a lavish $400 meal to celebrate.

Government incentivized behavior

This post is getting longer than I’d intended, so I’ll finish off with this section and make it a bit briefer. Beyond all the methods mentioned above, government has another mechanism for modifying behavior: directly changing incentives via legislation, regulation, and monetary policy. Let’s see some examples:

  • Artificial modification of interest rates encourages people to take on more debt than they would in a free capital market, leading to malinvestment and a consumer debt crisis, and causing the boom-bust cycle we all painfully experience.
  • Going along with that, giving tax breaks on interest payments further artificially incentivizes people to take on debt that they wouldn’t otherwise.
  • During COVID-19, at some points unemployment benefits were greater than minimum wage, incentivizing people to stay home rather than work, leading to reduced overall productivity in the economy and more printed dollars for benefits. In other words, it was a perfect recipe for inflation.
  • The tax code gives deductions to “help” people. That might be true, but the real impact is incentivizing people to make decisions they wouldn’t have otherwise. For example, giving out tax deductions on children encourages having more kids. Tax deductions on childcare and preschools incentivize dual-income households. Whether or not you like the outcomes, it’s clear that it’s government that’s encouraging these outcomes to happen.
  • Tax incentives cause people to engage in behavior they wouldn’t otherwise (daycare+working mother, for example).
  • Inflation means that the value of your money goes down over time, which encourages people to spend more today, when their money has a larger impact. (Milton Friedman described this as high living.)

Conclusion

The idea here is simple, and fully encapsulated in the title: incentives determine outcomes. If you want to know how to get a certain outcome from others, incentivize them to want that to happen. If you want to understand why people act in seemingly irrational ways, check their incentives. If you’re confused why leaders (and especially politicians) seem to engage in destructive behavior, check their incentives.

We can bemoan these realities all we want, but they are realities. While there are some people who have a solid internal moral and ethical code, and that internal code incentivizes them to behave against their externally-incentivized interests, those people are rare. And frankly, those people are self-defeating. People should take advantage of the incentives around them. Because if they don’t, someone else will.

(If you want a literary example of that last comment, see the horse in Animal Farm.)

How do we improve the world under these conditions? Make sure the incentives align well with the overall goals of society. To me, it’s a simple formula:

  • Focus on free trade, value for value, as the basis of a society. In that system, people are always incentivized to provide value to other people.
  • Reduce the size of bureaucracies and large groups of all kinds. The larger an organization becomes, the farther the consequences of decisions are from those who make them.
  • And since the nature of human beings will be to try and create areas where they can control the incentive systems to their own benefits, make that as difficult as possible. That comes in the form of strict limits on government power, for example.

And even if you don’t want to buy in to this conclusion, I hope the rest of the content was educational, and maybe a bit entertaining!

January 13, 2025 12:00 AM

January 12, 2025

Sandy Maguire

Read the Code, Not the Profile

At work a few weeks back, I found myself digging into profile reports, trying to determine why our program was running so slowly. Despite having the extremely obvious-in-retrospect data in front of me, I wasted a lot of time speeding up code that turned out to not move the needle at all.

Although perhaps it will be interesting only to future me, I thought it would be a good exercise to write up the experience—if only so I learn the lesson about how to read profiles and not make the same mistake again.

Some Context

I’m currently employed to work on a compiler. The performance has never been stellar, in that we were usually seeing about 5s to compile programs, even trivially small ones consisting of less than a hundred instructions. It was painful, but not that painful, since the test suite still finished in a minute or two. It was a good opportunity to get a coffee. I always assumed that the time penalties we were seeing were constant factors; perhaps it took a second or two to connect to Z3 or something like that.

But then we started unrolling loops, which turned trivially small programs into merely small programs, and our performance ballooned. Now we were looking at 45s for some of our tests! Uh oh! That’s no longer in the realm of constant factors, and it was clear that something was asymptotically wrong.

So I fired up GHC with the trusty old -prof flag, and ran the test suite in +RTS -p mode, which instruments the program with all sorts of profiling goodies. After a few minutes, the test suite completed, and left a test-suite.prof file laying around in the current directory. You can inspect such things by hand, but tools like profiteur make the experience much nicer.

Without further ado, here’s what our profile looked like:

MAIN . . . . . . . . . . . . . . . . . . . . . . . . 100%

Well, that’s not very helpful. Of course MAIN takes 100% of the time. So I expanded that, and saw:

MAIN . . . . . . . . . . . . . . . . . . . . . . . . 100%
└ main . . . . . . . . . . . . . . . . . . . . . . . 100%

No clearer. Opening up main:

MAIN . . . . . . . . . . . . . . . . . . . . . . . . 100%
└ main . . . . . . . . . . . . . . . . . . . . . . . 100%
  └ main.\ . . . . . . . . . . . . . . . . . . . . . 100%

Sheesh.

MAIN . . . . . . . . . . . . . . . . . . . . . . . . 100%
└ main . . . . . . . . . . . . . . . . . . . . . . . 100%
  └ main.\ . . . . . . . . . . . . . . . . . . . . . 100%
    └ getTest  . . . . . . . . . . . . . . . . . . . 100%

OH MY GOD. JUST TELL ME SOMETHING ALREADY.

MAIN . . . . . . . . . . . . . . . . . . . . . . . . 100%
└ main . . . . . . . . . . . . . . . . . . . . . . . 100%
  └ main.\ . . . . . . . . . . . . . . . . . . . . . 100%
    └ getTest  . . . . . . . . . . . . . . . . . . . 100%
      └ test . . . . . . . . . . . . . . . . . . . . 100%

Fast forwarding for quite a while, I opened up the entire stack until I got to something that didn’t take 100% of the program’s runtime:

MAIN . . . . . . . . . . . . . . . . . . . . . . . . 100%
└ main . . . . . . . . . . . . . . . . . . . . . . . 100%
  └ main.\ . . . . . . . . . . . . . . . . . . . . . 100%
    └ getTest  . . . . . . . . . . . . . . . . . . . 100%
      └ test . . . . . . . . . . . . . . . . . . . . 100%
        └ makeTest . . . . . . . . . . . . . . . . . 100%
          └ makeTest.\ . . . . . . . . . . . . . . . 100%
            └ compileProgram . . . . . . . . . . . . 100%
              └ evalAppT . . . . . . . . . . . . . . 100%
                └ runAppT  . . . . . . . . . . . . . 100%
                  └ runAppT' . . . . . . . . . . . . 100%
                    └ withLogging  . . . . . . . . . 100%
                      └ transformSSA . . . . . . . . 100%
                        └ >>=  . . . . . . . . . . . 100%
                          └ >>>= . . . . . . . . . . 100%
                            └ ibind  . . . . . . . . 100%
                              └ ibind.\  . . . . . . 100%
                                └ ibind.\.\  . . . . 100%
                                  ├ toSSA  . . . . . 15%
                                  ├ transform1 . . . 15%
                                  ├ transform2 . . . 10%
                                  ├ transform3 . . . 10%
                                  ├ transform4 . . . 20%
                                  └ collectGarbage . 30%

Now we’re in business. I dutifully dug into toSSA, the transforms, and collectGarbage. I cached some things, used better data structures, stopped appending lists, you know, the usual Haskell tricks. My work was rewarded, in that I managed to shave 80% off the runtime of our program.

A few months later, we wrote a bigger program and fed it to the compiler. This one didn’t stop compiling. We left it overnight.

Uh oh. Turns out I hadn’t fixed the problem. I’d only papered over it.

Retrospective

So what went wrong here? Quite a lot, in fact! And worse, I had all of the information all along, but managed to misinterpret it at several steps of the process.

Unwinding the story stack, the most salient aspect of having not solved the problem was reducing the runtime by only 80%. Dramatic percentages feel like amazing improvements, but that’s because human brains are poorly designed for building software. In the real world, big percentages are fantastic. In software, they are linear improvements.

That is to say that a percentage-based improvement is \(O(n)\) faster in the best case. My efforts improved our runtime from 45s to 9s. Which feels great, but the real problem is that this program is measured in seconds at all.

It’s more informative to think in terms of orders of magnitude. Taking 45s on a ~3GHz processor is on the order of \(10^{11}\) instructions, while 9s is \(10^{10}\). How the hell is it taking us TEN BILLION instructions to compile a dinky little program? That’s the real problem. Improving things from one hundred billion down to ten billion is no longer very impressive at all.

To get a sense of the scale here, even if we spent 1M cycles (which feels conservatively expensive) for each instruction we wanted to compile, we should still be looking at < 0.1s. Somehow we are over 1000x worse than that.
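
Spelling that out as a back-of-the-envelope check (taking ~100 instructions as the program size and the generous budget of \(10^6\) cycles per compiled instruction):

\[
45\,\mathrm{s} \times 3\times10^{9}\,\mathrm{cycles/s} \approx 1.4\times10^{11}\ \mathrm{cycles},
\qquad
10^{2}\ \mathrm{instructions} \times 10^{6}\ \mathrm{cycles/instruction} = 10^{8}\ \mathrm{cycles} \approx 0.03\,\mathrm{s},
\]

a gap of more than three orders of magnitude, which is where the 1000x figure comes from.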

So that’s one mistake I made: being impressed by extremely marginal improvements. Bad Sandy.

The other mistake came from my interpretation of the profile. As a quick pop quiz, scroll back up to the profile and see if you can spot where the problem is.

After expanding a few obviously-not-the-problem cost centres, each of which was 100% of the runtime, I turned my brain off and opened all of the 100% nodes. But in doing so, I accidentally breezed past the real problem. The real problem is either that compileProgram takes 100% of the time of the test, or that transformSSA takes 100% of compiling the program. Why’s that? Because unlike main and co, test does more work than just compiling the program. It also does non-trivial IO to produce debugging outputs, and property checks the resulting programs. Similarly for compileProgram, which does a great deal more than transformSSA.

This is somewhat of a philosophical enlightenment. The program execution hasn’t changed at all, but our perspective has. Rather than micro-optimizing the code that is running, this new perspective suggests we should focus our effort on determining why that code is running in the first place.

Digging through transformSSA made it very obvious that the problem was an algorithmic one—we were running an unbounded loop that terminated on convergence, where each step took \(O(n^2)\) work. When I stopped to actually read the code, the problem was immediate, and the solution obvious.
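
To make the shape of that bug concrete, here is a minimal illustrative sketch (not the actual compiler code; transformOnce is a hypothetical stand-in for the real rewrite pass):

-- Illustrative sketch: iterate a rewrite pass until the program stops
-- changing. If a single pass costs O(n^2) and convergence takes O(n)
-- passes, the loop as a whole is O(n^3) in the size of the program.
untilStable :: Eq prog => (prog -> prog) -> prog -> prog
untilStable transformOnce prog
  | prog' == prog = prog                        -- reached a fixed point
  | otherwise     = untilStable transformOnce prog'
  where
    prog' = transformOnce prog                  -- each call may be O(n^2)

With a loop of this shape, micro-optimizing the body only improves constants; the asymptotic cost comes from the loop itself.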

The lesson? Don’t read the profile. Read the code. Use the profile to focus your attention.

January 12, 2025 03:29 PM

January 09, 2025

Edward Z. Yang

New Years resolutions for PyTorch in 2025

In my previous two posts "Ways to use torch.compile" and "Ways to use torch.export", I often said that PyTorch would be good for a use case, but there might be some downsides. Some of the downsides are foundational and difficult to remove. But some... just seem like a little something is missing from PyTorch. In this post, here are some things I hope we will end up shipping in 2025!

Improving torch.compile

A programming model for PT2. A programming model is an abstract description of the system that is both simple (so anyone can understand it and keep it in their head all at once) and can be used to predict the system's behavior. The torch.export programming model is an example of such a description. Beyond export, we would like to help users understand why all aspects of PT2 behave the way they do (e.g., via improved error messages), and give simple, predictable tools for working around problems when they arise. The programming model helps us clearly define the intrinsic complexity of our compiler, which we must educate users about. This is a big effort involving many folks on the PyTorch team and I hope we can share more about this effort soon.

Pre-compilation: beyond single graph export. Whenever someone realizes that torch.compile compilation is taking a substantial amount of time on expensive cluster machines, the first thing they ask is, "Why don't we just compile it in advance?" Supporting precompilation with the torch.compile API exactly as is turns out not to be so easy; unlike a traditional compiler, which gets the source program directly as input, users of torch.compile must actually run their Python program to hit the regions of code that are intended to be compiled. Nor can these regions be trivially enumerated and then compiled: not only must we know all the metadata of the input tensors flowing into a region, a user might not even know what the compiled graphs are if a model has graph breaks.

OK, but why not just run the model, dump all the compiled products, and then reuse them later? This works! Here is a POC from Nikita Shulga where a special decorator aot_compile_sticky_cache swaps between exporting a graph and running the exported product. Zhengxu Chen used a similar idea to export Whisper as a few distinct graphs, which he then manually stitched together in C++ to get a Python-free version of Whisper. If you want training to work, you can more directly integrate AOTInductor as an Inductor backend, e.g., as seen in this POC. We are a stone's throw away from working precompilation, which can guarantee no compilation at runtime; we just need to put the pieces together!

Improving caching further. There are some gaps with caching which we hope to address in the near future: (1) loading Triton cache artifacts takes a long time because we still re-parse the Triton code before doing a cache lookup (James Wu is on this), (2) if you have a lot of small graphs, remote cache ends up having to do lots of small network requests, instead of one batched network request at the beginning (Oguz Ulgen recently landed this), (3) AOTAutograd cache is not fully rolled out yet (James Wu again). These collectively should be worth a 2x speedup or even more on warm cache time.

Fix multithreading. We should just make sure multithreading works, doing the testing and fiddly thread safety auditing needed to make it work. Here's a list of multithreading related issues.

Improving torch.export

Draft mode export. Export requires a lot of upfront work to even get an exported artifact in the first place. Draft mode export capitalizes on the idea that it's OK to generate an unsound "draft" graph early in the export, because even an incorrect graph is useful for kicking the tires on the downstream processing that happens after export. A draft export gives you a graph, and it also gives you a report describing what potential problems need to be fixed to get some guarantees about the correctness of the export. You can then chip away on the problems in the report until everything is green. One of the biggest innovations of draft-mode export is pervasive use of real tensor propagation when doing export: you run the export with actual tensors, so you can always trace through code, even if it is doing spicy things like data-dependent control flow.

Libtorch-free AOTInductor. AOTInductor generated binaries have a relatively small ABI surface that needs to be implemented. This hack from the most recent CUDA Mode meetup shows that you can just create an alternate implementation of the ABI that has no dependence on libtorch. This makes your deployed binary size much smaller!

Support for bundling CUDA kernels into AOTInductor. AOTInductor already supports directly bundling Triton kernels into the generated binary, but traditional CUDA kernels cannot be bundled in this way. There's no reason this has to be the case though: all we're doing is bundling cubins in both cases. If we have the ability to bundle traditional CUDA kernels into AOTInductor, this means you could potentially directly embed custom operators into AOTInductor binaries, which is nice because then those operators no longer have to be offered on the runtime (especially if you're commonly iterating on these kernels!)

Export multigraphs. Export's standard model is to give you a single graph that you call unconditionally. But it's easy to imagine a level of indirection on top of these graphs, where we can dispatch between multiple graphs depending on some arguments to the model. For example, if you have a model that optionally takes an extra Tensor argument, you can simply have two graphs, one for when the Tensor is absent, and one for when it is present.

ABI stable PyTorch extensions. It's hard work being a third-party PyTorch extension with native code, because whenever there's a new release of Python or PyTorch you have to rebuild all of your wheels. If there was a limited ABI that you could build your extension against that didn't expose CPython and only relied on a small, stable ABI of PyTorch functions, your binary packaging situation would be much simpler! And if an extension relied on a small ABI, it could even be bundled with an AOTInductor binary, letting these export products be truly package agnostic (one of the lessons we learned with torch.package is that picking the split between "what is packaged" and "what is not" is very difficult, and people would much rather just have everything be packaged.) Jane Xu is investigating how to do this, and separately, Scott Wolchok has been refactoring headers in libtorch so that a small set of headers can be used independently of the rest of libtorch.

by Edward Z. Yang at January 09, 2025 08:50 PM

January 05, 2025

Manuel M T Chakravarty

Functional Programming in Swift

When people talk about functional programming in modern multi-paradigm languages, they usually mention Rust, Scala, or Kotlin. You rarely hear Swift being mentioned. This is odd, as one might argue that, of these languages, Swift places the strongest emphasis on functional programming.

In this talk, I will explain the core functional programming features of Swift, including its expressive type system, value types, and mutability control. Furthermore, I will discuss how Swift’s language design is influenced by the desire to create a language that addresses the whole spectrum from low-level systems programming up to high-level applications with sophisticated graphical user interfaces. Beyond the core language itself, functional programming also permeates Swift’s rich ecosystem of libraries. To support this point, I will outline some FP-inspired core libraries, covering concepts ranging from functional data structures through functional reactive programming to declarative user interfaces.

Finally, I will briefly summarise practical considerations for using Swift in your own projects. This includes the cross-platform toolchain, the package manager, and interoperability with other languages.

January 05, 2025 07:45 PM

Abhinav Sarkar

Solving Advent of Code “Seating System” with Comonads and Stencils

In this post, we solve the Advent of Code 2020 “Seating System” challenge in Haskell using comonads and stencils.

This post was originally published on abhinavsarkar.net.

The Challenge

Here’s a quick summary of the challenge:

The seat layout fits on a grid. Each position is either floor (.), an empty seat (L), or an occupied seat (#). For example, the initial seat layout might look like this:

L.LL.LL.LL
LLLLLLL.LL
L.L.L..L..
LLLL.LL.LL
L.LL.LL.LL
L.LLLLL.LL
..L.L.....
LLLLLLLLLL
L.LLLLLL.L
L.LLLLL.LL

All decisions are based on the number of occupied seats adjacent to a given seat (one of the eight positions immediately up, down, left, right, or diagonal from the seat).

The following rules are applied to every seat simultaneously:

  • If a seat is empty (L) and there are no occupied seats adjacent to it, the seat becomes occupied.
  • If a seat is occupied (#) and four or more seats adjacent to it are also occupied, the seat becomes empty.
  • Otherwise, the seat’s state does not change.
Floor (.) never changes; seats don’t move, and nobody sits on the floor.

This is a classic Cellular Automaton problem. We need to write a program that simulates seats being occupied till no further seats are emptied or occupied, and returns the final number of occupied seats. Let’s solve this in Haskell.

The Cellular Automaton

First, some imports:

{-# LANGUAGE GHC2021 #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE PatternSynonyms #-}
{-# LANGUAGE TypeFamilies #-}

module Main where

import Control.Arrow ((>>>))
import Control.Comonad (Comonad (..))
import Data.Function (on)
import Data.List (intercalate, nubBy)
import Data.Massiv.Array (Ix2 (..))
import Data.Massiv.Array qualified as A
import Data.Massiv.Array.Unsafe qualified as AU
import Data.Proxy (Proxy (..))
import Data.Vector.Generic qualified as VG
import Data.Vector.Generic.Mutable qualified as VGM
import Data.Vector.Unboxed qualified as VU
import System.Environment (getArgs, getProgName)

We use the GHC2021 extension here that enables a lot of useful GHC extensions by default. Our non-base imports come from the comonad, massiv and vector libraries.

Quoting the Wikipedia page on Cellular Automaton (CA):

  • A cellular automaton consists of a regular grid of cells, each in one of a finite number of states.
  • For each cell, a set of cells called its neighborhood is defined relative to the specified cell.
  • An initial state is selected by assigning a state for each cell.
  • A new generation is created, according to some fixed rule that determines the new state of each cell in terms of the current state of the cell and the states of the cells in its neighborhood.

Let’s model the automaton of the challenge using Haskell:

newtype Cell = Cell Char deriving (Eq)

pattern Empty, Occupied, Floor :: Cell
pattern Empty = Cell 'L'
pattern Occupied = Cell '#'
pattern Floor = Cell '.'
{-# COMPLETE Empty, Occupied, Floor #-}

parseCell :: Char -> Cell
parseCell = \case
  'L' -> Empty
  '#' -> Occupied
  '.' -> Floor
  c -> error $ "Invalid character: " <> show c

rule :: Cell -> [Cell] -> Cell
rule cell neighbours =
  let occupiedNeighboursCount = length $ filter (== Occupied) neighbours
   in case cell of
        Empty | occupiedNeighboursCount == 0 -> Occupied
        Occupied | occupiedNeighboursCount >= 4 -> Empty
        _ -> cell

A cell in the grid can be in empty, occupied or floor state. We encode this with the pattern synonyms Empty, Occupied and Floor over the Cell newtype over Char¹.

The parseCell function parses a character to a Cell. The rule function implements the automaton rule.
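
For example, a quick sanity check of rule in GHCi (comparing results with ==, since Cell only derives Eq):

> rule Empty [Floor, Floor, Floor] == Occupied
True
> rule Occupied (replicate 4 Occupied) == Empty
True
> rule Occupied [Occupied, Floor] == Occupied
True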

The Solution

We are going to solve this puzzle in three different ways. So, let’s abstract the details and solve it top-down.

class (Eq a) => Grid a where
  fromLists :: [[Cell]] -> a
  step :: a -> a
  toLists :: a -> [[Cell]]

solve :: forall a. (Grid a) => Proxy a -> [[Cell]] -> Int
solve _ =
  fromLists @a
    >>> fix step
    >>> toLists
    >>> fmap (filter (== Occupied) >>> length)
    >>> sum
  where
    fix f x = let x' = f x in if x == x' then x else fix f x'

We solve the challenge using the Grid typeclass that all our different solutions implement. A grid is specified by three functions:

  1. fromLists: converts a list of lists of cells to the grid.
  2. step: runs one step of the CA simulation.
  3. toLists: converts the grid back to a list of lists of cells.

The solve function calculates the number of finally occupied seats for any instance of the Grid typeclass by running the simulation till it converges2.

Now, we use solve to solve the challenge in three ways depending on the command line argument supplied:

main :: IO ()
main = do
  progName <- getProgName
  getArgs >>= \case
    [gridType, fileName] ->
      readFile fileName
        >>= (lines >>> map (map parseCell) >>> solve' gridType >>> print)
    _ -> putStrLn $ "Usage: " <> progName <> " -(z|a|s) <input_file>"
  where
    solve' = \case
      "-z" -> solve $ Proxy @(ZGrid Cell)
      "-a" -> solve $ Proxy @(AGrid Cell)
      "-s" -> solve $ Proxy @(SGrid Cell)
      _ -> error "Invalid grid type"

We have set up the top (main) and the bottom (rule) of our solutions. Now let’s work on the middle part.

The Zipper

To simulate a CA, we need to focus on each cell of the automaton grid, and run the rule for the cell. What is the first thing that comes to the minds of functional programmers when we want to focus on a part of a data structure? Zippers!

Zippers are a special view of data structures, which allow one to navigate and easily update them. A zipper always has a focus or cursor which is the current element of the data structure we are “at”. Alongside, it also captures the rest of the data structure in a way that makes it easy to move around it. We can update the data structure by updating the element at the focus.

The first way to solve the challenge is the zipper for once-nested lists. Let’s start with creating the zipper for a simple list:

data Zipper a = Zipper [a] a [a] deriving (Eq, Functor)

zPosition :: Zipper a -> Int
zPosition (Zipper left _ _) = length left

zLength :: Zipper a -> Int
zLength (Zipper left _ right) = length left + 1 + length right

listToZipper :: [a] -> Zipper a
listToZipper = \case
  [] -> error "Cannot create Zipper from empty list"
  (x : xs) -> Zipper [] x xs

zipperToList :: Zipper a -> [a]
zipperToList (Zipper left focus right) = reverse left <> (focus : right)

pShowZipper :: (Show a) => Zipper a -> String
pShowZipper (Zipper left focus right) =
  unwords $
    map show (reverse left) <> (("[" <> show focus <> "]") : map show right)

zLeft :: Zipper a -> Zipper a
zLeft z@(Zipper left focus right) = case left of
  [] -> z
  x : xs -> Zipper xs x (focus : right)

zRight :: Zipper a -> Zipper a
zRight z@(Zipper left focus right) = case right of
  [] -> z
  x : xs -> Zipper (focus : left) x xs

A list zipper has a focus element, and two lists that capture the elements to the left and right of the focus. We use it through these functions:

  • zPosition returns the zero-indexed position of the focus in the zipper.
  • zLength returns the length of the zipper.
  • listToZipper and zipperToList do conversions between lists and zippers.
  • pShowZipper pretty-prints a zipper, highlighting the focus.
  • zLeft and zRight move the zipper’s focus to left and right respectively.

Let’s see it all in action:

> z = listToZipper [1..7]
> putStrLn $ pShowZipper z
[1] 2 3 4 5 6 7
> z' = zRight $ zRight $ zLeft $ zRight $ zRight z
> putStrLn $ pShowZipper z'
1 2 3 [4] 5 6 7
> zPosition z'
3
> zLength z'
7
> zipperToList z'
[1,2,3,4,5,6,7]

Great! Now, what is the zipper for a once-nested list? A once-nested zipper, of course:

newtype ZGrid a = ZGrid (Zipper (Zipper a)) deriving (Eq, Functor)

zgPosition :: ZGrid a -> (Int, Int)
zgPosition (ZGrid rows@(Zipper _ focus _)) = (zPosition rows, zPosition focus)

zgSize :: ZGrid a -> (Int, Int)
zgSize (ZGrid rows@(Zipper _ focus _)) = (zLength rows, zLength focus)

listsToZGrid :: [[a]] -> ZGrid a
listsToZGrid rows =
  let (first : rest) = fmap listToZipper rows
   in ZGrid $ Zipper [] first rest

zGridToLists :: ZGrid a -> [[a]]
zGridToLists (ZGrid (Zipper left focus right)) =
  reverse (fmap zipperToList left)
    <> (zipperToList focus : fmap zipperToList right)

pShowZGrid :: (Show a) => ZGrid a -> String
pShowZGrid (ZGrid (Zipper left focus right)) =
  intercalate "\n" $ pShowRows left <> (pShowZipper focus : pShowRows right)
  where
    pShowRows = map pShowZipper'
    pShowZipper' =
      zipperToList
        >>> splitAt (zPosition focus)
        >>> \ ~(left', focus' : right') ->
          unwords $
            map show left' <> ((" " <> show focus' <> " ") : map show right')

ZGrid is a newtype over a zipper of zippers. It has functions similar to Zipper for getting focus, position and size, for conversions to-and-from lists of lists, and for pretty-printing.

Next, the functions to move the focus in the grid:

zgUp :: ZGrid a -> ZGrid a
zgUp (ZGrid rows) = ZGrid $ zLeft rows

zgDown :: ZGrid a -> ZGrid a
zgDown (ZGrid rows) = ZGrid $ zRight rows

zgLeft :: ZGrid a -> ZGrid a
zgLeft (ZGrid rows) = ZGrid $ fmap zLeft rows

zgRight :: ZGrid a -> ZGrid a
zgRight (ZGrid rows) = ZGrid $ fmap zRight rows

Let’s check them out in GHCi:

> zg = listsToZGrid $ replicate 7 $ [1..7]
> putStrLn $ pShowZGrid zg
[1] 2 3 4 5 6 7
 1  2 3 4 5 6 7
 1  2 3 4 5 6 7
 1  2 3 4 5 6 7
 1  2 3 4 5 6 7
 1  2 3 4 5 6 7
 1  2 3 4 5 6 7
> zg' = zgDown $ zgRight $ zgDown $ zgRight zg
> putStrLn $ pShowZGrid zg'
1 2  3  4 5 6 7
1 2  3  4 5 6 7
1 2 [3] 4 5 6 7
1 2  3  4 5 6 7
1 2  3  4 5 6 7
1 2  3  4 5 6 7
1 2  3  4 5 6 7
> zgPosition zg'
(2,2)
> zgSize zg'
(7,7)
> zGridToLists zg'
[[1,2,3,4,5,6,7],[1,2,3,4,5,6,7],[1,2,3,4,5,6,7],[1,2,3,4,5,6,7],[1,2,3,4,5,6,7],[1,2,3,4,5,6,7],[1,2,3,4,5,6,7]]

It works as expected. Now, how do we use this to simulate a CA?

The Comonad

A CA requires us to focus on each cell of the grid, and run a rule for the cell that depends on the neighbours of the cell. A Haskell abstraction that neatly fits this requirement is Comonad.

Comonads are duals of Monads3. We don’t need to learn everything about them for now. For our purpose, Comonad provides an interface that exactly lines up with what is needed for simulating CA:

class Functor w => Comonad w where
  extract :: w a -> a
  duplicate :: w a -> w (w a)
  extend :: (w a -> b) -> w a -> w b
  {-# MINIMAL extract, (duplicate | extend) #-}

Assuming we can make ZGrid a comonad instance, the signatures for the above functions for ZGrid Cell would be:

class Comonad ZGrid where
  extract :: ZGrid Cell -> Cell
  duplicate :: ZGrid Cell -> ZGrid (ZGrid Cell)
  extend :: (ZGrid Cell -> Cell) -> ZGrid Cell -> ZGrid Cell

For ZGrid as a CA comonad:

  • The extract function would return the current focus of the grid.
  • The duplicate function would return a grid of grids, one inner grid for each possible focus of the input grid.
  • The extend function would apply the automata rule to each possible focus of the grid, and return a new grid.

The nice part is, we need to implement only the extract and duplicate functions, and the generation of the new grid is taken care of automatically by the default implementation of the extend function. Let’s write the comonad instance for ZGrid.
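
(As a reminder of why that suffices: in the comonad package the two methods are, roughly, defined in terms of each other, so providing extract plus duplicate gives us extend for free.)

-- Default method definitions in Control.Comonad (roughly):
extend f  = fmap f . duplicate
duplicate = extend id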

First, we write the comonad instance for Zipper:

instance Comonad Zipper where
  extract (Zipper _ focus _) = focus
  duplicate zipper = Zipper left zipper right
    where
      pos = zPosition zipper
      left = iterateN pos zLeft $ zLeft zipper
      right = iterateN (zLength zipper - pos - 1) zRight $ zRight zipper

iterateN :: Int -> (a -> a) -> a -> [a]
iterateN n f = take n . iterate f

extract for Zipper simply returns the input zipper’s focus element.

duplicate returns a zipper of zippers, with the input zipper as its focus, and the left and right lists of zippers as variations of the input zipper with all possible focuses. Trying out the functions in GHCi gives a better idea:

> z = listToZipper [1..7] :: Zipper Int
> :t duplicate z
duplicate z :: Zipper (Zipper Int)
> mapM_ (putStrLn . pShowZipper) $ zipperToList $ duplicate z
[1] 2 3 4 5 6 7
1 [2] 3 4 5 6 7
1 2 [3] 4 5 6 7
1 2 3 [4] 5 6 7
1 2 3 4 [5] 6 7
1 2 3 4 5 [6] 7
1 2 3 4 5 6 [7]

Great! Now we use a similar construction to write the comonad instance for ZGrid:

instance Comonad ZGrid where
  extract (ZGrid grid) = extract $ extract grid
  duplicate grid = ZGrid $ Zipper left focus right
    where
      (focusRowPos, focusColPos) = zgPosition grid
      (rowCount, colCount) = zgSize grid

      focus = Zipper focusLeft grid focusRight
      focusLeft = iterateN focusColPos zgLeft $ zgLeft grid
      focusRight =
        iterateN (colCount - focusColPos - 1) zgRight $ zgRight grid

      left = iterateN focusRowPos (fmap zgUp) $ fmap zgUp focus
      right =
        iterateN (rowCount - focusRowPos - 1) (fmap zgDown) $ fmap zgDown focus

It works in similar fashion:

> zg = listsToZGrid $ replicate 4 $ [0..3] :: ZGrid Int
> putStrLn $ pShowZGrid zg
[0] 1 2 3
 0  1 2 3
 0  1 2 3
 0  1 2 3
> :t duplicate zg
duplicate zg :: ZGrid (ZGrid Int)
> :t mapM_ (putStrLn . pShowZGrid) $ concat $ zGridToLists $ duplicate zg
mapM_ (putStrLn . pShowZGrid) $ concat $ zGridToLists $ duplicate zg :: IO ()

I’ve rearranged the output of running the last line of the code above for clarity:

Output of duplicate for ZGrid

We can see a grid of grids, with one inner grid focussed at each possible focus of the input grid. Now we finally implement the automaton:

zGridNeighbours :: ZGrid a -> [a]
zGridNeighbours grid =
  map snd . nubBy ((==) `on` fst) $
    [ (pos, extract grid')
      | move <- moves,
        let grid' = move grid,
        let pos = zgPosition grid',
        pos /= zgPosition grid
    ]
  where
    moves =
      [ zgUp, zgDown, zgRight, zgLeft,
        zgUp >>> zgLeft, zgUp >>> zgRight,
        zgDown >>> zgLeft, zgDown >>> zgRight
      ]

stepZGrid :: ZGrid Cell -> ZGrid Cell
stepZGrid = extend $ \grid -> rule (extract grid) (zGridNeighbours grid)

instance Grid (ZGrid Cell) where
  fromLists = listsToZGrid
  step = stepZGrid
  toLists = zGridToLists

zGridNeighbours returns the neighbour cells of the currently focussed cell of the grid. It does so by moving the focus in all eight directions, and extracting the new focuses. We also make sure to return unique cells by their position.

stepZGrid implements one step of the CA using the extend function of the Comonad typeclass. We call extend with a function that takes the current grid, and returns the result of running the CA rule on its focus and the neighbours of the focus.

Finally, we plug in our functions into the ZGrid Cell instance of Grid.

That’s it! Let’s compile and run the code4:

❯ nix-shell -p "ghc.withPackages (p: [p.massiv p.comonad])" \
      --run "ghc --make seating-system.hs -O2"
[1 of 2] Compiling Main             ( seating-system.hs, seating-system.o )
[2 of 2] Linking seating-system
❯ time ./seating-system -z input.txt
2243
        2.72 real         2.68 user         0.02 sys

I verified with the Advent of Code website that the result is correct. We also see the time elapsed, which is 2.7 seconds. That seems pretty high. Can we do better?

The Array

The problem with the zipper approach is that lists in Haskell are too slow. Some operations on them, like length, are \(O(n)\). They are also lazy in both spine and value, and build up thunks. We could switch to a different list-like data structure5, or cache the grid size and neighbour indices for each index to make it run faster. Or we could try an entirely different approach.

Let’s think about it for a bit. Zippers intermix two things: the data in the grid, and the focus. When running a step of the CA, the grid data does not change when focussing on all possible focuses, only the focus itself changes. What if we separate the data from the focus? Maybe that’ll make it faster. Let’s try it out.

Let’s model the grid as a combination of a 2D array and an index into the array. We are using the arrays from the massiv library.

data AGrid a = AGrid {aGrid :: A.Array A.B A.Ix2 a, aGridFocus :: A.Ix2}
  deriving (Eq, Functor)

A.Ix2 is massiv’s way of representing an index into a 2D array, and is essentially the same as a two-tuple of Ints. A.Array A.B A.Ix2 a here means a 2D boxed array of as. massiv uses representation strategies to decide how arrays are actually represented in memory, among which are boxed, unboxed, primitive, storable, delayed etc. Even though primitive and storable arrays are faster, we have to go with boxed arrays here because the Functor instance of A.Array exists only for boxed and delayed arrays, and boxed ones are the faster of the two for our purpose.

It is actually massively6 easier to write the Comonad instance for AGrid:

instance Comonad AGrid where
  extract (AGrid grid focus) = grid A.! focus
  extend f (AGrid grid focus) =
    AGrid (A.compute $ A.imap (\pos _ -> f $ AGrid grid pos) grid) focus

The extract implementation simply looks up the element from the array at the focus index. This time, we don’t need to implement duplicate because it is easier to implement extend directly. We map with index (A.imap) over the grid, calling the function f for the variation of the grid with the index as the focus.

Next, we write the CA step:

listsToAGrid :: [[Cell]] -> AGrid Cell
listsToAGrid = A.fromLists' A.Seq >>> flip AGrid (0 :. 0)

aGridNeighbours :: AGrid a -> [a]
aGridNeighbours (AGrid grid (x :. y)) =
  [ grid A.! (x + i :. y + j)
    | i <- [-1, 0, 1],
      j <- [-1, 0, 1],
      (x + i, y + j) /= (x, y),
      validIndex (x + i, y + j)
  ]
  where
    A.Sz (rowCount :. colCount) = A.size grid
    validIndex (a, b) = and [a >= 0, b >= 0, a < rowCount, b < colCount]

stepAGrid :: AGrid Cell -> AGrid Cell
stepAGrid = extend $ \grid -> rule (extract grid) (aGridNeighbours grid)

instance Grid (AGrid Cell) where
  fromLists = listsToAGrid
  step = stepAGrid
  toLists = aGrid >>> A.toLists

listsToAGrid converts a list of lists of cells into an AGrid focussed at (0,0). aGridNeighbours finds the neighbours of the current focus of a grid by directly looking up the valid neighbour indices into the array. stepAGrid calls extract and aGridNeighbours to implement the CA step, much like the ZGrid case. And finally, we create the AGrid Cell instance of Grid.

Let’s compile and run it:

❯ rm ./seating-system
❯ nix-shell -p "ghc.withPackages (p: [p.massiv p.comonad])" \
      --run "ghc --make seating-system.hs -O2"
[2 of 2] Linking seating-system
❯ time ./seating-system -a input.txt
2243
        0.10 real         0.09 user         0.00 sys

Woah! It takes only 0.1 second this time. Can we do even better?

The Stencil

massiv has a construct called Stencil that can be used for simulating CA:

Stencil is abstract description of how to handle elements in the neighborhood of every array cell in order to compute a value for the cells in the new array.

That sounds like exactly what we need. Let’s try it out next.

With stencils, we do not need the instance of Comonad for the grid. So we can switch to the faster unboxed array representation:

newtype instance VU.MVector s Cell = MV_Char (VU.MVector s Char)
newtype instance VU.Vector Cell = V_Char (VU.Vector Char)
deriving instance VGM.MVector VU.MVector Cell
deriving instance VG.Vector VU.Vector Cell
instance VU.Unbox Cell

type SGrid a = A.Array A.U A.Ix2 a

The first five lines make Cell an instance of the Unbox typeclass. We chose to make Cell a newtype wrapper over Char because Char has an Unbox instance.

Then we define a new grid type SGrid that is a 2D unboxed array.

Now, we define the stencil and the step function for our CA:

ruleStencil :: A.Stencil A.Ix2 Cell Cell
ruleStencil = AU.makeUnsafeStencil (A.Sz (3 :. 3)) (1 :. 1) $ \_ get ->
  rule (get (0 :. 0)) $ map get neighbourIndexes
  where
    neighbourIndexes =
      [ -1 :. -1, -1 :. 0, -1 :. 1,
         0 :. -1,           0 :. 1,
         1 :. -1,  1 :. 0,  1 :. 1
      ]

stepSGrid :: SGrid Cell -> SGrid Cell
stepSGrid = A.mapStencil (A.Fill Floor) ruleStencil >>> A.computeP

instance Grid (SGrid Cell) where
  fromLists = A.fromLists' A.Seq
  step = stepSGrid
  toLists = A.toLists

We make a stencil of size 3-by-3, where the focus is at index (1,1) relative to the stencil’s top-left cell. In the callback function, we use the supplied get function to get the neighbours of the focus by using indices relative to the focus, and call rule with the cells at focus and neighbour indices.

Then we write the step function stepSGrid that maps the stencil over the grid. The A.Fill Floor border strategy pads out-of-bounds neighbour lookups at the grid edges with Floor cells, so edge cells see the same neighbourhood as in the earlier implementations, where out-of-bounds neighbours were simply skipped. Finally, we put everything together in the SGrid Cell instance of Grid.

Let’s compile and run it:

❯ rm ./seating-system
❯ nix-shell -p "ghc.withPackages (p: [p.massiv p.comonad])" \
      --run "ghc --make seating-system.hs -O2"
[2 of 2] Linking seating-system
❯ time ./seating-system -s input.txt
2243
        0.08 real         0.07 user         0.00 sys

It is only a bit faster than the previous solution. But this time we have another trick up our sleeve. Did you notice the A.computeP we sneaked in there? With stencils, we can now run the step for all cells in parallel! Let’s recompile with the right options and run it again:

❯ rm ./seating-system
❯ nix-shell -p "ghc.withPackages (p: [p.massiv p.comonad])" \
      --run "ghc --make seating-system.hs -O2 -threaded -rtsopts"
[2 of 2] Linking seating-system
❯ time ./seating-system -s input.txt +RTS -N
2243
        0.04 real         0.11 user         0.05 sys

The -threaded option enables multithreading, and the +RTS -N option makes the process use all CPU cores7. We get a nice speedup of 2x over the single-threaded version.

Bonus Round: Simulation Visualization

Since you’ve read the entire post, here is a bonus visualization of the CA simulation for you (warning: lots of fast blinking):

Play the simulation

That’s it for this post! I hope you enjoyed it and took something away from it. If you have any questions or comments, please leave a comment below. If you liked this post, please share it with your friends. Thanks for reading!

The full code for this post is available here.


  1. The reason for using a newtype instead of a data is explained in the Stencil section.↩︎

  2. If you are unfamiliar, >>> is the left-to-right function composition function:

    f >>> g = g . f
    ↩︎
  3. This short post by Bartosz Milewski explains how comonads and monads are related.↩︎

  4. We use Nix for getting the dependency libraries.↩︎

  5. I did try a variation with Data.Sequence.Seq instead of lists, and it was twice as fast.↩︎

  6. Pun very much intended.↩︎

  7. I tried running the process with different values of N and found that N4 gave the fastest results. So, Amdahl’s law applies here.↩︎

If you liked this post, please leave a comment.

by Abhinav Sarkar (abhinav@abhinavsarkar.net) at January 05, 2025 12:00 AM

January 04, 2025

Philip Wadler

Telnaes quits The Washington Post



Cartoonist Ann Telnaes has quit the Washington Post, after they refused to publish one of her cartoons, depicting Mark Zuckerberg (Meta), Sam Altman (Open AI), Patrick Soon-Shiong (LA Times), the Walt Disney Company (ABC News), and Jeff Bezos (Amazon & Washington Post). All that exists is her preliminary sketch, above. Why is this important? See her primer below. (Spotted via Boing Boing.)





 

by Philip Wadler (noreply@blogger.com) at January 04, 2025 09:41 PM

December 24, 2024

Edward Z. Yang

Ways to use torch.export

Previously, I discussed the value proposition of torch.compile. While doing so, I observed a number of downsides (long compile time, complicated operational model, lack of packaging) that were intrinsic to torch.compile's API contract, which emphasized being able to work on Python code as is, with minimal intervention from users. torch.export occupies a different spot in the tradeoff space: in exchange for more upfront work making a model exportable, it allows for use of PyTorch models in environments where using torch.compile as is would be impossible.

Enable end-to-end C++ CPU/GPU Inference

Scenario: Like before, suppose you want to deploy your model for inference. However, now you have more stringent runtime requirements: perhaps you need to do inference from a CPython-less environment (because your QPS requirements require GIL-less multithreading; alternately, CPython execution overhead is unacceptable but you cannot use CUDA graphs, e.g., due to CPU inference or dynamic shapes requirements). Or perhaps your production environment requires hermetic deploy artifacts (for example, in a monorepo setup, where infrastructure code must be continually pushed but model code should be frozen). But like before, you would prefer not to have to rewrite your model; you would like the existing model to serve as the basis for your Python-less inference binary.

What to do: Use torch.export targeting AOTInductor. This will compile the model into a self-contained shared library which then can be directly invoked from a C++ runtime. This shared library contains all of the compiler generated Triton kernels as precompiled cubins and is guaranteed not to need any runtime compilation; furthermore, it relies only on a small runtime ABI (with no CPython dependency), so the binaries can be used across versions of libtorch. AOTInductor's multithreading capability and low runtime overhead also makes it a good match for CPU inference too!

You don't have to go straight to C++ CPU/GPU inference: you can start with using torch.compile on your code before investing in torch.export. There are four primary extra requirements export imposes: (1) your model must compile with fullgraph=True (though you can sometimes bypass missing Dynamo functionality by using non-strict export; sometimes, it is easier to do non-strict torch.export than it is to torch.compile!), (2) your model's inputs/outputs must only be in torch.export's supported set of argument types (think Tensors in pytrees), (3) your model must never recompile--specifically, you must specify what inputs have dynamic shapes, and (4) the top-level of your model must be an nn.Module (so that export can keep track of all of the parameters your model has).

Some tips:

  • Check out the torch.export programming model. The torch.export programming model is an upcoming doc which aims to help set expectations on what can and cannot be exported. It talks about things like "Tensors are the only inputs that can actually vary at runtime" and common mistakes such as module code which modifies NN modules (not supported!) or optional input types (you will end up with an export that takes in that input or not, there is no runtime optionality).
  • Budget time for getting a model to export. With torch.compile for Python inference, you could just slap it on your model and see what happens. For torch.export, you have to actually finish exporting your entire model before you can even consider running the rest of the pipeline. For some of the more complicated models we have exported, there were often dozens of issues that had to be worked around in one way or another. And that doesn't even account for all of the post-export work you have to do, like validating the numerics of the exported model.
  • Intermediate value debugging. AOTInductor has an option to add dumps of intermediate tensor values in the compiled C++ code. This is good for determining, e.g., the first time where a NaN shows up, in case you are suspecting a miscompilation.

Open source examples: Among other things, torchchat has an example end-to-end AOTInductor setup for server-side LLM inference, which you can view in run.cpp.

torch.export specific downsides:

  • No built-in support for guard-based dispatch (multiple compilations). Earlier, I mentioned that an exported model must not have any recompiles. This leads to some fairly common patterns of code not being directly supported by torch.export: you can't export a single model that takes an enum as input, or has an optional Tensor argument, or accepts two distinct tensor shapes that need to be compiled individually. Now, technically, we could support this: you could imagine a package that contains multiple exported artifacts and dispatches between them depending on some conditions (e.g., the value of the enum, whether or not the optional Tensor argument was provided, the shape of the input tensor). But you're on your own: torch.compile will do this for you, but torch.export will not.
  • No built-in support for models that are split into multiple graphs. Similarly, we've mentioned that an exported model must be a single graph. This is in contrast to torch.compile, which will happily insert graph breaks and compile distinct islands of code that can be glued together with Python eager code. Now, technically, you can do this with export too: you can carve out several distinct subnets of your model, export them individually, and then glue them together with some custom written code on the other end (in fact, Meta's internal recommendation systems do this), but there's no built-in support for this workflow.
  • The extra requirements often don't cover important components of real world models. I've mentioned this previously as the extra restrictions export places on you, but it's worth reiterating some of the consequences of this. Take an LLM inference application: obviously, there is a core model that takes in tokens and produces logit predictions--this part of the model is exportable. But there are also important other pieces such as the tokenizer and sampling strategy which are not exportable (tokenizer because it operates on strings, not tensors; sampling because it involves complicated control flow). Arguably, it would be much better if all of these things could be directly bundled with the model itself; in practice, end-to-end applications should just expect to directly implement these in native code (e.g., as is done in torchchat). Our experience with TorchScript taught us that we don't really want to be in the business of designing a general purpose programming language that is portable across all of export's targets; better to just bet that the tokenizer doesn't change that often and eat the cost of natively integrating it by hand.

AOTInductor specific downsides:

  • You still need libtorch to actually run the model. Although AOTInductor binaries bundle most of their compiled kernel implementation, they still require a minimal runtime that can offer basic necessities such as tensor allocation and access to custom operators. There is not yet an official offering of an alternative, lightweight implementation of the stable ABI AOTInductor binaries depends on, so if you do want to deploy AOTInductor binaries you will typically have to also bring libtorch along. This is usually not a big deal server side, but it can be problematic if you want to do client side deployments!
  • No CUDA graphs support. This one is not such a big deal since you are much less likely to be CPU bound when the host side logic is all compiled C++, but there's no support for CUDA graphs in AOTInductor. (Funnily enough, this is also something you technically can orchestrate from outside of AOTInductor.)

Edge deployment

Scenario: You need to deploy your PyTorch model to edge devices (e.g., a mobile phone or a wearable device) where computational resources are limited. You have requirements that are a bit different from the server side: you care a lot more about minimizing binary size and startup time. Traditional PyTorch deployment with full libtorch won't work. The device you're deploying to might also have some strange extra processors, like a DSP or NPU, that you want your model to target.

What to do: Use torch.export targeting Executorch. Among other things, Executorch offers a completely separate runtime for exported PyTorch programs (i.e., it has no dependency on libtorch, except perhaps there are a few headers which we share between the projects) which was specifically designed for edge deployment. (Historical note: we spent a long time trying to directly ship a stripped down version of libtorch to mobile devices, but it turns out it's really hard to write code that is portable on server and client, so it's better to only share when absolutely necessary.) Quantization is also a pretty important part of deployment to Edge, and Executorch incorporates this into the end-to-end workflow.

Open source examples: torchchat also has an Executorch integration letting you run an LLM on your Android phone.

Downsides. All of the export related downsides described previously apply here. But here's something to know specifically about Executorch:

  • The edge ecosystem is fragmented. At time of writing, there are seven distinct backends Executorch can target. This is not really Executorch's fault; it comes with the territory--but I want to call it out because it stands in stark contrast to NVIDIA's server-side hegemony. Yes, AMD GPUs are a thing, and various flavors of CPU are real, but it really is a lot easier to be focused on server side because NVIDIA GPUs come first.

Pre-compiled kernels for eager mode

Scenario: You need a new function or self-contained module with an efficient kernel implementation. However, you would prefer not to have to write the CUDA (or even Triton) by hand; the kernel is something that torch.compile can generate from higher level PyTorch implementation. At the same time, however, you cannot tolerate just-in-time compilation at all (perhaps you are doing a massive training job, and any startup latency makes it more likely that one of your nodes will fail during startup and then you make no progress at all; or maybe you just find it annoying when PyTorch goes out to lunch when you cache miss).

What to do: Use torch.export targeting AOTInductor, and then load and run the AOTInductor generated binary from Python.

Downsides. So, we know this use case works, because we have internally used this to unblock people who wanted to use Triton kernels but could not tolerate Triton's just-in-time compilation. But there's not much affordance in our APIs for this use case; for example, guard-based dispatch is often quite useful for compiled functions, but you'll have to roll that by hand. More generally, when compiling a kernel, you have to make tradeoffs about how static versus dynamic the kernel should be (for example, will you force the inputs to be evenly divisible by eight? Or would you have a separate kernel for the divisible and not divisible cases?) Once again, you're on your own for making the call there.

An exchange format across systems

Scenario: In an ideal world, you would have a model, you could export it to an AOTInductor binary, and then be all done. In reality, maybe this export process needs to be a multi-stage process, where it has to be processed to some degree on one machine, and then finish processing on another machine. Or perhaps you need to shift the processing over time: you want to export a model to freeze it (so it is no longer tied to its original source code), and then repeatedly run the rest of the model processing pipeline on this exported program (e.g., because you are continuously updating its weights and then reprocessing the model). Maybe you want to export the model and then train it from Python later, committing to a distributed training strategy only when you know how many nodes you are running. The ability to hermetically package a model and then process it later is one of the big value propositions of TorchScript and torch.package.

What to do: Use torch.export by itself, potentially using pre-dispatch if you need to support training use-cases. torch.export produces an ExportedProgram which has a clean intermediate representation that you can do processing on, or just serialize and then do processing on later.

Downsides:

  • Custom operators are not packaged. A custom operator typically refers to some native code which was linked with PyTorch proper. There's no way to extract out this kernel and embed it into the exported program so that there is no dependence; instead, you're expected to ensure the eventual runtime relinks with the same custom operator. Note that this problem doesn't apply to user defined Triton kernels, as export can simply compile it and package the binary directly into the exported product. (Technically, this applies to AOTInductor too, but this tends to be much more of a problem for use cases which are primarily about freezing rapidly evolving model code, as opposed to plain inference where you would simply just expect people to not be changing custom operators willy nilly.)
  • Choose your own decompositions. Export produces IR that only contains operators from a canonical operator set. However, the default choice is sometimes inappropriate for use cases (e.g., some users want aten.upsample_nearest2d.vec to be decomposed while others do not), so in practice for any given target you may have a bespoke operator set that is appropriate for that use case. Unfortunately, it can be fiddly getting your operator set quite right, and while we've talked about ideas like a "build your own operator set interactive tool" these have not been implemented yet.
  • Annoyingly large FC/BC surface. Something I really like about AOTInductor is that it has a very small FC/BC surface: I only need to make sure I don't make breaking changes to the C ABI, and I'm golden. With export IR, the FC/BC surface is all of the operators produced by export. Even a decomposition is potentially BC breaking: a downstream pass could be expecting to see an operator that no longer exists because I've decomposed it into smaller pieces. Matters get worse in pre-dispatch export, since the scope of APIs used inside export IR expands to include autograd control operators (e.g., torch.no_grad) as well as tensor subclasses (since Tensor subclasses cannot be desugared if we have not yet eliminated autograd). We will not break your AOTInductor blobs. We can't as easily give the same guarantee for the IR here.

Next time: What's missing, and what we're doing about it

by Edward Z. Yang at December 24, 2024 04:28 AM

December 23, 2024

Michael Snoyman

A secure Bitcoin self custody strategy

Up until this year, my Bitcoin custody strategy was fairly straightforward, and likely familiar to other hodlers:

  • Buy a hardware wallet
  • Put the seed phrase on steel plates
  • Secure those steel plates somewhere on my property

But in October of last year, the situation changed. I live in Northern Israel, close to the Lebanese border. The past 14 months have involved a lot of rocket attacks, including destruction of multiple buildings in my home town. This brought into question how to properly secure my sats. Importantly, I needed to balance two competing goals:

  1. Resiliency of the saved secrets against destruction. In other words: make sure I didn't lose access to the wallet.
  2. Security against attackers trying to steal those secrets. In other words: make sure no one else got access to the wallet.

I put some time into designing a solution to these conflicting goals, and would like to share some thoughts for others looking to improve their BTC custody strategy. And if anyone has any recommendations for improvements, I'm all ears!

Goals

  • Self custody I didn't want to rely on an external custody company. Not your keys, not your coins.
  • Full access I always maintain full access to my funds, without relying on any external party.
  • Computer hack resilient If my computer systems are hacked, I will not lose access to or control of my funds (neither stolen nor lost).
  • Physical destruction resilient If my hardware device and steel plates are both destroyed (as well as anything else physically located in my home town), I can still recover my funds.
  • Will survive me If I'm killed, I want my wife, children, or other family members to be able to recover and inherit my BTC.

Multisig

The heart of this protection mechanism is a multisig wallet. Unfortunately, interfaces for setting up multisig wallets are tricky. I'll walk through the basics and then come back to how to set it up.

The concept of a multisig is that your wallet is protected by multiple signers. Each signer can be any "normal" wallet, e.g. a software or hardware wallet. You choose a number of signers and a threshold of signers required to perform a transaction.

For example, a 2 of 2 multisig would mean that 2 wallets can sign transactions, and both of them need to sign to make a valid transaction. A 3 of 5 would mean 5 total signers, any 3 of them being needed to sign a transaction.

For my setup, I set up a 2 of 3 multisig, with the 3 signers being a software wallet, a hardware wallet, and SLIP39 wallet. Let's go through each of those, explain how they work, and then see how the solution addresses the goals.

Software wallet

I set up a software wallet and saved the seed phrase in a dedicated password manager account using Bitwarden. Bitwarden offers an emergency access feature, which essentially means a trusted person can be listed as an emergency contact and can recover your account. The process includes a waiting period, during which the account owner can reject the request.

Put another way: Bitwarden is offering a cryptographically secure, third party hosted, fully managed, user friendly dead-man switch. Exactly what I needed.

I added a select group of trusted people as the recoverers on the account. Otherwise, I keep the account securely locked down in Bitwarden and can use it for signing when necessary.

Let's see how this stacks up against the goals:

  • Self custody Check, no reliance on anyone else
  • Full access Check, I have access to the wallet at all times
  • Computer hack resilient Fail, if my system is hacked, I lose control of the wallet
  • Physical destruction resilient Check, Bitwarden lives beyond my machines
  • Will survive me Check thanks to the dead-man switch

Hardware wallet

Not much to say about the hardware wallet setup that I haven't said already. Let's do the goals:

  • Self custody Check, no reliance on anyone else
  • Full access Check, I have access to the wallet at all times
  • Computer hack resilient Check, the private keys never leave the hardware device
  • Physical destruction resilient Fail, the wallet and plates could easily be destroyed, and the plates could easily be stolen. (The wallet could be stolen too, but thanks to the PIN mechanism would theoretically be resistant to compromise. But that's not a theory I'd want to bet my wealth on.)
  • Will survive me Check, anyone can take my plates and recover the wallet

SLIP39

This one requires a bit of explanation. SLIP39 is a not-so-common standard for taking some data and splitting it up into a number of shards. You can define the threshold of shards necessary to reconstruct the original secret. This uses an algorithm called Shamir's Secret Sharing. (And yes, it is very similar in function to multisig, but implemented differently).

The idea here is that this wallet is controlled by a group of friends and family members. Without getting into my actual setup, I could choose 7 very trusted individuals from all over the world and tell them that, should I contact them and ask, they should send me their shards so I can reconstruct that third wallet. And to be especially morbid, they also know the identity of some backup people in the event of my death.

In any event, the idea is that if enough of these people agree to it, they can reconstruct the third wallet. The assumption is that these are all trustworthy people. But even with trustworthy people, (1) I could be wrong about how trustworthy they are, or (2) they could be coerced or tricked. So let's see how this security mechanism stands up:

  • Self custody Fail, I'm totally reliant on others.
  • Full access Fail, by design I don't keep this wallet myself, so I must rely on others.
  • Computer hack resilient Check, the holders of these shards keep them in secure, offline storage.
  • Physical destruction resilient Check (sort of), since the probability of all copies being destroyed or stolen is negligible.
  • Will survive me Check, by design

Comparison against goals

We saw how each individual wallet stacked up against the goals. How about all of them together? Well, there are certainly some theoretical ways I could lose the funds, e.g. my hardware wallet and plates are destroyed and a majority of shard holders for the SLIP39 lost their shards. However, if you look through the check/fail lists, every category has at least two checks. Meaning: on all dimensions, if some catastrophe happens, at least two of the wallets should survive.

Now the caveats (I seem to like that word). I did a lot of research on this, and this is at least tangential to my actual field of expertise. But I'm not a dedicated security researcher, and can't really claim full, deep understanding of all these topics. So if I made any mistakes here, please let me know.

How-to guide

OK, so how do you actually get a system like this running? I'll give you my own step-by-step guide. Best case scenario for all this: download all the websites and programs mentioned onto a fresh Linux system install, disconnect the internet, run the programs and copy down any data as needed, and then wipe the system again. (Or, alternatively, do all the actions from a Live USB session.)

  1. Set up the SLIP39. You can use an online generator. Choose the number of bits of entropy (IMO 128bit is sufficient), choose the total shares and threshold, and then copy down the phrases.
  2. Generate the software wallet. You can use a sister site to the SLIP39 generator. Choose either 12 or 24 words, and write those words down. On a different, internet-connected computer, you can save those words into a Bitwarden account, and set it up with appropriate emergency access.
  3. Open up Electrum. (Other wallets, like Sparrow, probably work for this too, but I've only done it with Electrum.) The rest of this section will include a step-by-step guide through the Electrum steps. And yes, I took these screenshots on a Mac, but for a real setup use a Linux machine.

Set up a new wallet. Enter a name (doesn't matter what) and click next.

New wallet

Choose a multisig wallet and click next.

Multisig

Choose 3 cosigners and require 2 signatures.

Signer count

Now we're going to enter all three wallets. The first one will be your hardware device. Click next, then follow all the prompts to set it up.

Hardware

After a few screens (they'll be different based on your choice of hardware device), you'll be prompted to select a derivation path. Use native segwit and the standard derivation path.

segwit

This next screen was the single most complicated for me, simply because the terms were unclear. First, you'll see a Zpub string displayed as a "master public key," e.g.:

Zpub75J9cLwa3iX1zB2oiTdvGDf4EyHWN1ZYs5gVt6JSM9THA6XLUoZhA4iZwyruCKHpw8BFf54wbAK6XdgtMLa2TgbDcftdsietCuKQ6eDPyi6

You need to write this down. It's the same as an xpub, but for multisig wallets. This represents all the possible public keys for your hardware wallet. Putting together the three Zpub values will allow your software of choice to generate all the receiving and change addresses for your new wallet. You'll need all three, so don't lose them! But on their own, they cannot be used to access your funds. Therefore, treat them with "medium" security. Backing up in Bitwarden with your software wallet is a good idea, and potentially simply sending to some friends to back up just in case.

And that explanation brings us back to the three choices on the screen. You can choose to either enter a cosigner key, a cosigner seed, or use another hardware wallet. The difference between key and seed is that the former is public information only, whereas the latter is full signing power. Often, multisig wallets are set up by multiple different people, and so instead of sharing the seed with each other (a major security violation), they each generate a seed phrase and only share the key with each other.

However, given that you're setting up the wallet with access to all seed phrases, and you're doing it on an airgapped device, it's safe to enter the seed phrases directly. And I'd recommend it, to avoid the risk of generating the wrong master key from a seed. So go ahead and choose "enter cosigner seed" and click next.

Add cosigner 2

And now onto the second most confusing screen. I copied my seed phrase into this text box, but it won't let me continue!

Cannot continue

The trick is that Electrum, by default, uses its own concept of seed phrases. You need to click on "Options" and then choose BIP39, and then enter your seed phrase.

BIP39

Continue through the other screens until you're able to enter the final seed. This time, instead of choosing BIP39, choose SLIP39. You'll need to enter enough of the SLIP39 shards to meet the threshold.

SLIP39

And with that, you can continue through the rest of the screens, and you'll now have a fully operational multisig!

Addresses

Open up Electrum again on an internet-connected computer. This time, connect the hardware wallet as before, enter the BIP39 as before, but for the SLIP39, enter the master key instead of the SLIP39 seed phrase. This will ensure that no internet connected device ever has both the software wallet and SLIP39 at the same time. You should confirm that the addresses on the airgapped machine match the addresses on the internet connected device.

If so, you're ready for the final test. Send a small amount of funds into the first receiving address, and then use Electrum on the internet connected device to (1) confirm in the history that it arrived and (2) send it back to another address. You should be asked to sign with your hardware wallet.

If you made it this far, congratulations! You're the proud owner of a new 2of3 multisig wallet.

Conclusion

I hope the topic of death and war wasn't too terribly morbid for others. But these are important topics to address in our world of self custody. I hope others found this useful. And once again, if anyone has recommendations for improvements to this setup, please do let me know!

December 23, 2024 12:00 AM

December 22, 2024

Haskell Interlude

60: Tom Ellis

Tom Ellis works at Groq, using Haskell to compile AI models to specialized hardware.  In this episode, we talk about stability of both GHC and Haskell libraries, effects, and strictness, and the premise of functional programming: make invalid states and invalid *laziness* unrepresentable! 

by Haskell Podcast at December 22, 2024 06:00 PM

December 21, 2024

Philip Wadler

Please submit to Lambda Days

 


I'm part of the programme committee for Lambda Days, and I’m personally inviting you to submit your talk!

Lambda Days is all about celebrating the world of functional programming, and we’re eager to hear about your latest ideas, projects, and discoveries. Whether it’s functional languages, type theory, reactive programming, or something completely unexpected—we want to see it!

🎯 Submission Deadline: 9 February 2025
🎙️ Never spoken before? No worries! We’re committed to supporting speakers from all backgrounds, especially those from underrepresented groups in tech.

Submit your talk and share your wisdom with the FP community.

👉 https://www.lambdadays.org/lambdadays2025#call-for-talks

by Philip Wadler (noreply@blogger.com) at December 21, 2024 07:56 PM

December 19, 2024

Tweag I/O

The Developer Experience Upgrade: From Create React App to Vite

We all know how it feels: staring at the terminal while your development server starts up, or watching your CI/CD pipeline crawl through yet another build process. For many React developers using Create React App (CRA), this waiting game has become an unwanted part of the daily routine. While CRA has been the go-to build tool for React applications for years, its aging architecture is increasingly becoming a bottleneck for developer productivity. Enter Vite: a modern build tool that’s not just an alternative to CRA, but a glimpse into the future of web development tooling. I’ll introduce both CRA and Vite, and share how switching to Vite transformed our development workflow, with concrete numbers and benchmarks demonstrating the dramatic improvements in build times, startup speed, and overall developer experience.

Create React App: A Historical Context

Create React App played a very important role in making React what it is today. By introducing a single, clear, and recommended approach for creating React projects, it enabled developers to focus on building applications without worrying about the complexity of the underlying build tools.

However, like many mature and widely established tools, CRA has become stagnant over time by not keeping up with features provided by modern (meta-)frameworks like server-side rendering, routing, and data fetching. It also hasn’t taken advantage of web APIs to deliver fast applications by default.

Let’s dive into some of the most noticeable limitations.

Performance Issues

CRA’s performance issues stem from one major architectural factor: its reliance on Webpack as its bundler. Webpack, while powerful and flexible, has inherent performance limitations. Webpack processes everything through JavaScript, which is single-threaded by nature and slower at CPU-intensive tasks compared to lower-level languages like Go or Rust.

Here’s a simplified version of what happens every time you make a code change:

  1. CRA (using Webpack) needs to scan your entire project to understand how all your files are connected to build a dependency graph
  2. It then needs to transform all your modern JavaScript, TypeScript, or JSX code into a version that browsers can understand
  3. Finally, it bundles everything together into a single package that can be served to your browser

Rebuilding the app becomes increasingly time-consuming as the project grows. During development, Webpack’s incremental builds help mitigate performance challenges by only reprocessing modules that have changed, leveraging the dependency graph to minimize unnecessary work. However, the bundling step still needs to consider all files, both cached and reprocessed, to generate a complete bundle that can be served to the browser, which means Webpack must account for the entire codebase’s structure with each build.

Security Issues

When running npx create-react-app <project-directory>, after waiting for a while, a good number of deprecation warnings (covering 23 packages as of this writing) will be shown. At the end of the installation process, a message indicating 8 vulnerabilities (2 moderate, 6 high) will appear. This means that create-react-app relies on packages that have known critical security vulnerabilities.

Support Issues

The React team no longer recommends CRA for new projects, and they have stopped providing support for it. The last published version on npm was 3 years ago.

Instead, React’s official documentation now includes Vite in its recommendations for both starting new projects and adding React to existing projects.

While CRA served its purpose well in the past, its aging architecture, security vulnerabilities, and lack of modern features make it increasingly difficult to justify for new projects.

Introducing Vite

Vite is a build tool that is designed to be simpler, faster and more efficient for building modern web applications. It’s opinionated and comes with sensible defaults out of the box.

Vite was created by Evan You, author of Vue, in 2020 to solve the complexity, slowness and the heaviness of the JavaScript module bundling toolchain. Since then, Vite has become one of the most popular build tools for web development, with over 15 million downloads per week and a community that has rated it as the Most Loved Library Overall, No.1 Most Adopted (+30%) and No.2 Highest Retention (98%) in the State of JS 2024 Developer Survey.

In addition to streamlining the development of single-page applications, Vite can also power meta frameworks and has support for server-side rendering (SSR). Although its scope is broader than what CRA was meant for, it does a fantastic job replacing CRA.

Why Vite is Faster

Vite applies several modern web technologies to improve the development experience:

1. Native ES Modules (ESM)

During development mode, Vite serves source code over native ES modules basically letting the browser handle module loading directly and skipping the bundling step. With this approach, Vite only processes and sends code as it is imported by the browser, and conditionally imported modules are processed only if they’re actually needed on the current page. This means the dev server can start much faster, even in large projects.

2. Efficient Hot Module Replacement (HMR)

By serving source code as native ESM to the browser, thus skipping the bundling step, Vite’s HMR process can provide near-instant updates while preserving the application state. When code changes, Vite updates only the modified module and its direct dependencies, ensuring fast updates regardless of project size. Additionally, Vite leverages HTTP headers and caching to minimize server requests, speeding up page reloads when necessary. More information about what HMR is and how it works in Vite can be found in this exhaustive blog post.

3. Optimized Build Tooling

Even though ESM are now widely supported, dependencies can still be shipped as CommonJS or UMD. To leverage the benefits of ESM during development, Vite uses esbuild to pre-bundle dependencies when starting the dev server. This step involves transforming CommonJS/UMD to ES modules and converting dependencies with many internal modules into a single module, thus improving performance and reducing browser requests.

When it comes to production, Vite switches to Rollup to bundle the application. Bundling is still preferred over ESM when shipping to production, as it allows for more optimizations like tree-shaking, lazy-loading and chunk splitting.

While this dual-bundler approach leverages the strengths of each bundler, it’s important to note that it’s a trade-off that can potentially introduce subtle inconsistencies between development and production environments and adds to Vite’s complexity.

By leveraging modern web technologies like ESM and efficient build tools like esbuild and Rollup, Vite represents a significant leap forward in development tooling, offering speed and simplicity that CRA simply cannot match with the way it’s currently architected.

Practical Results

The Migration Process

The codebase we migrated from CRA to Vite had around 250 files and 30k lines of code. Built as a Single Page Application using React 18, it uses Zustand and React Context for state management, with Tailwind CSS and shadcn/ui and some Bootstrap legacy components.

Here is a high-level summary of the migration process as it applied to our project, which took roughly a day to complete. The main steps included:

  1. Removing CRA-related dependencies
  2. Installing Vite and its React plugin
  3. Moving index.html to the root directory
  4. Creating a Vite configuration file
  5. Adding a type declaration file
  6. Updating the npm scripts in package.json
  7. Adjusting tsconfig.json to align with Vite’s requirements

All steps are well documented in the Vite documentation and in several step-by-step guides available on the web.

Most challenges encountered were related to environment variables and path aliases, which were easily resolved using Vite’s documentation, and its vibrant community has produced extensive resources, guides, and solutions for even the most specialized setups.

Build Time

The build time for the project using Create React App (CRA) was 1 minute and 34 seconds. After migrating to Vite, the build time was reduced to 29.2 seconds, making it 3.2 times faster.

[Build time comparison between CRA and Vite showing 3.2x improvement]

This reduction in build time speeds up CI/CD cycles, enabling more frequent testing and deployment. This is crucial for our development workflow, where faster builds mean quicker turnaround times and fewer delays for other team members. It can also reduce the cost of running the build process.

Dev Server Startup Time

The speed at which the development server starts can greatly impact the development workflow, especially in large projects.

The development server startup times saw a remarkable improvement after migrating from Create React App (CRA) to Vite. With CRA, a cold start took 15.469 seconds, and a non-cold start was 6.241 seconds. Vite dramatically reduced these times, with a cold start at just 1.202 seconds—12.9 times faster—and a non-cold start at 598 milliseconds, 10.4 times faster. The graph below highlights these impressive gains.

Development server startup time comparison showing 12.9x improvement

This dramatic reduction in startup time is particularly valuable when working with multiple branches or when frequent server restarts are needed during development.

HMR Update Time

While both CRA and Vite perform well with Hot Module Replacement at our current project scale, there are notable differences in the developer experience. CRA’s Webpack-based HMR typically takes around 1 second to update—which might sound fast, but the difference becomes apparent when compared to Vite’s near-instantaneous updates.

This distinction becomes more pronounced as projects grow in size and complexity. More importantly, the immediate feedback from Vite’s HMR creates a noticeably smoother development experience, especially when designing features that require frequent code changes and UI testing cycles. The absence of even a small delay helps maintain a more fluid and enjoyable workflow.

Bundle Size

Another essential factor is the size of the final bundled application, which affects load times and overall performance.

Bundle size comparison between CRA and Vite showing 27.5% reduction in raw bundle size and 9.3% reduction in gzipped size

This represents a 27.5% reduction in raw bundle size and a 9.3% reduction in gzipped size. For end users, this means faster page loads, less data usage, and better performance, especially on mobile devices.

The data clearly illustrates that Vite’s improvements in build times, startup speed, and bundle size provide a significant and measurable upgrade to our development workflow.

The Hidden Advantage: Reduced Context Switching

One of the less obvious but valuable benefits of migrating to a faster environment like Vite is the reduction in context switching. In environments with slower build and start-up times, developers are more likely to engage in other tasks during these “idle” moments. Research on task interruptions shows that even brief context switches can introduce cognitive “reorientation” costs, increasing stress and reducing efficiency.

By reducing build and start-up times, Vite allows our team to maintain focus on their primary tasks. Developers are less likely to switch tasks and better able to stay within the “flow” of development, ultimately leading to a smoother, more focused workflow and, over time, less cognitive strain.

Beyond the measurable metrics, the real victory lies in how Vite’s speed helps developers maintain their focus and flow, leading to a more enjoyable and happy experience overall.

The Future of Vite is Bright

Vite is aiming to be a unified toolchain for the JavaScript ecosystem, and it is already showing great progress by introducing new tools like Rolldown and OXC.

Rolldown, Vite’s new bundler written in Rust, promises to be even faster than esbuild while maintaining full compatibility with the JavaScript ecosystem. It also unifies Vite’s bundling approach across development and production environments, solving the previously mentioned trade-off. Meanwhile, OXC provides a suite of high-performance tools including the fastest JavaScript parser, resolver, and TypeScript transformer available.

These innovations are part of Vite’s broader vision to create a more unified, efficient, and performant development experience that eliminates the traditional fragmentation in JavaScript tooling.

Early benchmarks show impressive performance improvements:

  • OXC Parser is 3x faster than SWC
  • OXC Resolver is 28x faster than enhanced-resolve
  • OXC TypeScript transformer is 4x faster than SWC
  • OXLint is 50-100x faster than ESLint

With innovations like Rolldown and OXC on the horizon, Vite is not just solving today’s development challenges but is actively shaping the future of web development tooling.

Conclusion

Migrating from Create React App to Vite proved to be a straightforward process that delivered substantial benefits across multiple dimensions. The quantifiable improvements in terms of build time, bundle size and development server startup time were impressive and by themselves justify the migration effort.

However, the true value extends beyond these measurable metrics. The near-instant Hot Module Replacement, reduced context switching, and overall smoother development workflow have significantly enhanced our team’s development experience. Developers spend less time waiting and more time in their creative flow, leading to better focus and increased productivity.

The migration also positions our project for the future, as Vite continues to evolve with promising innovations like Rolldown and OXC. Given the impressive results and the relatively straightforward migration process, the switch from CRA to Vite stands as a clear win for both our development team and our application’s performance.

December 19, 2024 12:00 AM

December 18, 2024

Michael Snoyman

Normal People Shouldn't Invest

The world we live in today is inflationary. Through the constant increase in the money supply by governments around the world, the purchasing power of any dollars (or other government money) sitting in your wallet or bank account will go down over time. To simplify massively, this leaves people with three choices:

  1. Keep your money in fiat currencies and earn a bit of interest. You’ll still lose purchasing power over time, because inflation virtually always beats interest, but you’ll lose it more slowly.
  2. Try to beat inflation by investing in the stock market and other risk-on investments.
  3. Recognize that the game is slanted against you, don’t bother saving or investing, and spend all your money today.

(Side note: if you’re reading this and screaming at your screen that there’s a much better option than any of these, I’ll get there, don’t worry.)

High living and melting ice cubes

Option 3 is what we’d call “high time preference.” It means you value the consumption you can have today over the potential savings for the future. In an inflationary environment, this is unfortunately a very logical stance to take. Your money is worth more today than it will ever be later. May as well live it up while you can. Or as Milton Friedman put it, engage in high living.

But let’s ignore that option for the moment, and pursue some kind of low time preference approach. Despite the downsides, we want to hold onto our wealth for the future. The first option, saving in fiat, would work with things like checking accounts, savings accounts, Certificates of Deposit (CDs), government bonds, and perhaps corporate bonds from highly rated companies. There’s little to no risk in those of losing your original balance or the interest (thanks to FDIC protection, a horrible concept I may dive into another time). And the downside is also well understood: you’re still going to lose wealth over time.

Or, to quote James from InvestAnswers, you can hold onto some melting ice cubes. But with sufficient interest, they’ll melt a little bit slower.

The investment option

With that option sitting on the table, many people end up falling into the investment bucket. If they’re more risk-averse, it will probably be a blend of both risk-on stock investment and risk-off fiat investment. But ultimately, they’re left with some amount of money that they want to put into a risk-on investment. The only reason they’re doing that is on the hopes that between price movements and dividends, the value of their investment will grow faster than anything else they can choose.

You may be bothered by my phrasing. “The only reason.” Of course that’s the only reason! We only put money into investments in order to make more money. What other possible reason exists?

Well, the answer is that while we invest in order to make money, that’s not the only reason. That would be like saying I started a tech consulting company to make money. Yes, that’s a true reason. But the purpose of the company is to meet a need in the market: providing consulting services. Like every economic activity, starting a company has a dual purpose: making a profit, but by providing actual value.

So what actual value is generated for the world when I choose to invest in a stock? Let’s rewind to real investment, and then we’ll see how modern investment differs.

Michael (Midas) Mulligan

Let’s talk about a fictional character, Michael Mulligan, aka Midas. In Atlas Shrugged, he’s the greatest banker in the country. He created a small fortune for himself. Then, using that money, he very selectively invested in the most promising ventures. He put his own wealth on the line because he believed each of those ventures had a high likelihood to succeed.

He wasn’t some idiot who jumps on his CNBC show to spout nonsense about which stocks will go up and down. He wasn’t a venture capitalist who took money from others and put it into the highest-volatility companies hoping that one of them would 100x and cover the massive losses on the others. He wasn’t a hedge fund manager who bets everything on financial instruments so complex he can’t understand them, knowing that if it crumbles, the US government will bail him out.

And he wasn’t a normal person sitting in his house, staring at candlestick charts, hoping he can outsmart every other person staring at those same charts by buying in and selling out before everyone else.

No. Midas Mulligan represented the true gift, skill, art, and value of real investment. In the story, we find out that he was the investor who got Hank Rearden off the ground. Hank Rearden uses that investment to start a steel empire that drives the country, and ultimately that powers his ability to invest huge amounts of his new wealth into research into an even better metal that has the promise to reshape the world.

That’s what investment is. And that’s why investment has such a high reward associated with it. It’s a massive gamble that may produce untold value for society. The effort necessary to determine the right investments is high. It’s only right that Midas Mulligan be well compensated for his work. And by compensating him well, he’ll have even more money in the future to invest in future projects, creating a positive feedback cycle of innovation and improvements.

Michael (Crappy Investor) Snoyman

I am not Midas Mulligan. I don’t have the gift to choose the winners in newly emerging markets. I can’t sit down with entrepreneurs and guide them to the best way to make their ideas thrive. And I certainly don’t have the money available to make such massive investments, much less the psychological profile to handle taking huge risks with my money like that.

I’m a low time preference individual by my upbringing, plus I am very risk-averse. I spent most of my adult life putting money into either the house I live in or into risk-off assets. I discuss this background more in a blog post on my current investment patterns. During the COVID-19 money printing, I got spooked about this, realizing that the melting ice cubes were melting far faster than I had ever anticipated. It shocked me out of my risk-averse nature, realizing that if I didn’t take a more risky stance with my money, ultimately I’d lose it all.

So like so many others, I diversified. I put money into stock indices. I realized the stock market was risky, so I diversified further. I put money into various cryptocurrencies too. I learned to read candlestick charts. I made some money. I felt pretty good.

I started feeling more confident overall, and started trying to predict the market. I fixated on this. I was nervous all the time, because my entire wealth was on the line constantly.

And it gets even worse. In economics, we have the concept of an opportunity cost. If I invest in company ABC and it goes up 35% in a month, I’m a genius investor, right? Well, if company DEF went up 40% that month, I can just as easily kick myself for losing out on the better opportunity. In other words, once you’re in this system, it’s a constant rat race to keep finding the best possible returns, not simply being happy with keeping your purchasing power.

Was I making the world a better place? No, not at all. I was just another poor soul trying to do a better job of entering and exiting a trade than the next guy. It was little more than riding a casino.

And yes, I ultimately lost a massive amount of money through this.

Normal people shouldn’t invest

Which brings me to the title of this post. I don’t believe normal people should be subjected to this kind of investment. It’s an extra skill to learn. It’s extra life stress. It’s extra risk. And it doesn’t improve the world. You’re being rewarded—if you succeed at all—simply for guessing better than others.

(Someone out there will probably argue efficient markets and that having everyone trading stocks like this does in fact add some efficiencies to capital allocation. I’ll give you a grudging nod of agreement that this is somewhat true, but not sufficiently enough to justify the returns people anticipate from making “good” gambles.)

The only reason most people ever consider this is because they feel forced into it; otherwise they'll simply be sitting on their melting ice cubes. But once they get into the game, between risk, stress, and time investment, their lives will often get worse.

One solution is to not be greedy. Invest in stock market indices, don’t pay attention to day-to-day price, and assume that the stock market will continue to go up over time, hopefully beating inflation. And if that’s the approach you’re taking, I can honestly say I think you’re doing better than most. But it’s not the solution I’ve landed on.

Option 4: deflation

The problem with all of our options is that they are built in a broken world. The fiat/inflationary world is a rigged game. You’re trying to walk up an escalator that’s going down. If you try hard enough, you’ll make progress. But the system is against you. This is inherent to the design. The inflation in our system is so that central planners have the undeserved ability to appropriate productive capacity in the economy to do whatever they want with it. They can use it to fund government welfare programs, perform scientific research, pay off their buddies, and fight wars. Whatever they want.

If you take away their ability to print money, your purchasing power will not go down over time. In fact, the opposite will happen. More people will produce more goods. Innovators will create technological breakthroughs that will create better, cheaper products. Your same amount of money will buy more in the future, not less. A low time preference individual will be rewarded. By setting aside money today, you’re allowing productive capacity today to be invested into building a stronger engine for tomorrow. And you’ll be rewarded by being able to claim a portion of that larger productive pie.

And to reiterate: in today’s inflationary world, if you defer consumption and let production build a better economy, you are punished with reduced purchasing power.

So after burying the lead so much, my option 4 is simple: Bitcoin. It’s not an act of greed, trying to grab the most quickly appreciating asset. It’s about putting my money into a system that properly rewards low time preference and saving. It’s admitting that I have no true skill or gift to the world through my investment capabilities. It’s recognizing that I care more about destressing my life and focusing on things I’m actually good at than trying to optimize an investment portfolio.

Can Bitcoin go to 0? Certainly, though year by year that likelihood is becoming far less likely. Can Bitcoin have major crashes in its price? Absolutely, but I’m saving for the long haul, not for a quick buck.

I’m hoping for a world where deflation takes over. Where normal people don’t need to add yet another stress and risk to their life, and saving money is the most natural, safest, and highest-reward activity we can all do.

Further reading

December 18, 2024 12:00 AM

December 16, 2024

GHC Developer Blog

GHC 9.12.1 is now available

GHC 9.12.1 is now available

Zubin Duggal - 2024-12-16

The GHC developers are very pleased to announce the release of GHC 9.12.1. Binary distributions, source distributions, and documentation are available at downloads.haskell.org.

We hope to have this release available via ghcup shortly.

GHC 9.12 will bring a number of new features and improvements, including:

  • The new language extension OrPatterns allowing you to combine multiple pattern clauses into one (see the short sketch after this list).

  • The MultilineStrings language extension to allow you to more easily write strings spanning multiple lines in your source code.

  • Improvements to the OverloadedRecordDot extension, allowing the built-in HasField class to be used for records with fields of non-lifted representations.

  • The NamedDefaults language extension has been introduced allowing you to define defaults for typeclasses other than Num.

  • More deterministic object code output, controlled by the -fobject-determinism flag, which substantially improves the determinism of builds (though does not yet make them fully deterministic) at the cost of some compiler performance (1-2%). See #12935 for details.

  • GHC now accepts type syntax in expressions as part of GHC Proposal #281.

  • The WASM backend now has support for TemplateHaskell.

  • Experimental support for the RISC-V platform with the native code generator.

  • … and many more
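To give a flavour of the first two extensions above, here is a small, self-contained sketch (my own illustration, not taken from the release notes; the OrPatterns example assumes the parenthesised, semicolon-separated pattern syntax from the proposal):

{-# LANGUAGE OrPatterns #-}
{-# LANGUAGE MultilineStrings #-}

-- One clause covering several patterns at once.
describe :: Int -> String
describe (0; 1; 2) = "small"
describe _ = "big"

-- A string literal spanning several source lines.
banner :: String
banner =
  """
  GHC 9.12.1
  is now available
  """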

A full accounting of changes can be found in the release notes. As always, GHC’s release status, including planned future releases, can be found on the GHC Wiki status.

We would like to thank GitHub, IOG, the Zw3rk stake pool, Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, the Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprise this release.

As always, do give this release a try and open a ticket if you see anything amiss.

by ghc-devs at December 16, 2024 12:00 AM

December 12, 2024

Stackage Blog

LTS 23 release for ghc-9.8 and Nightly now on ghc-9.10

Stackage LTS 23 has been released

The Stackage team is happy to announce that Stackage LTS version 23 was finally released a couple of days ago, based on GHC stable version 9.8.4. It follows on from the LTS 22 series, which was the longest-lived LTS major release to date (with probable final snapshot lts-22.43).

We are dedicating the LTS 23 release to the memory of Chris Dornan, who left this world suddenly and unexpectedly around the end of May. We are indebted to Christopher for his many years of wide-ranging Haskell community service, including serving as one of the Stackage Curators up until the time he passed away. He is warmly remembered.

LTS 23 includes many package changes, and almost 3200 packages! Thank you for all your nightly contributions that made this release possible: the initial release was prepared by Jens Petersen. (The closest nightly snapshot to lts-23.0 is nightly-2024-12-09, but lts-23 is just ahead of it with pandoc-3.6.)

If your package is missing from LTS 23 and can build there, you can easily have it added by opening a PR in lts-haskell to the build-constraints/lts-23-build-constraints.yaml file.

Stackage Nightly updated to ghc-9.10.1

At the same time we are excited to move Stackage Nightly to GHC 9.10.1: the initial snapshot release is nightly-2024-12-11. Current nightly has over 2800 packages, and we expect that number to grow over the coming weeks and months: we welcome your contributions and help with this. This initial release build was made by Jens Petersen (64 commits).

Most of our upper bounds were dropped for this rebase, so quite a lot of packages had to be disabled. You can see all the changes made relative to the last 9.8 nightly snapshot. Apart from trying to build yourself, the easiest way to understand why particular packages are disabled is to look for their < 0 lines in build-constraints.yaml, particularly under the "Library and exe bounds failures" section. We also have some tracking issues still open related to 9.10 core boot libraries.

Thank you to all those who have already done work updating their packages for ghc-9.10.

Adding or enabling your package for Nightly is just a simple pull request to the large build-constraints.yaml file.

If you have questions, you can ask in Stack and Stackage Matrix room (#haskell-stack:matrix.org) or Slack channel.

December 12, 2024 07:00 AM

December 11, 2024

Haskell Interlude

Episode 59: Harry Goldstein

Sam and Wouter interview Harry Goldstein, a researcher in property-based testing who works in PL, SE, and HCI. In this episode, we reflect on random generators, the find-a-friend model, interdisciplinary research, and how to have impact beyond your own research community.

by Haskell Podcast at December 11, 2024 02:00 PM

Philip Wadler

John Longley's Informatics Lecturer Song

From my colleague, John Longley, a treat. 

‘Informatics Lecturer Song 

(Based on Gilbert and Sullivan’s ‘Major General song’) 

John Longley 

I am the very model of an Informatics lecturer,
For educating students you will never find a betterer.
I teach them asymptotics with a rigour that’s impeccable,
I’ll show them how to make their proofs mechanically checkable.
On parsing algorithms I can hold it with the best of them,
With LL(1) and CYK and Earley and the rest of them.
I’ll teach them all the levels of the Chomsky hierarchy…
With a nod towards that Natural Language Processing malarkey.

I’ll summarize the history of the concept of a function,
And I’ll tell them why their Haskell code is ‘really an adjunction’.
In matters mathematical and logical, etcetera,
I am the very model of an Informatics lecturer.

For matters of foundations I’m a genuine fanaticker:
I know by heart the axioms of Principia Mathematica,
I’m quite au fait with Carnap and with Wittgenstein’s Tractatus,
And I’ll dazzle you with Curry, Church and Turing combinators.
I’ll present a proof by Gödel with an algebraic seasoning,
I’ll instantly detect a step of non-constructive reasoning.
I’ll tell if you’re a formalist or logicist or Platonist…
For I’ll classify your topos by the kinds of objects that exist.

I’ll scale the heights of cardinals from Mahlo to extendible,
I’ll find your favourite ordinals and stick them in an n-tuple.
In matters philosophical, conceptual, etcetera,
I am the very essence of an Informatics lecturer.

And right now I’m getting started on my personal computer,
I’ve discovered how to get it talking to the Wifi router.
In Internet and World Wide Web I’ve sometimes had my finger dipped,
And once I wrote a line of code in HTML/Javascript.
[Sigh.] I know I have a way to go to catch up with my students,
But I try to face each lecture with a dash of common prudence.
When it comes to modern tech: if there’s a way to get it wrong, I do!
But that seems to be forgiven if I ply them with a song or two.

So… although my present IT skills are rather rudimentary,
And my knowledge of computing stops around the nineteenth century,
Still, with help from all my colleagues and my audience, etcetera…
I’ll be the very model of an Informatics lecturer.


by Philip Wadler (noreply@blogger.com) at December 11, 2024 11:52 AM

December 10, 2024

Chris Smith 2

When is a call stack not a call stack?

Tom Ellis, who I have the privilege of working with at Groq, has an excellent article up about using HasCallStack in embedded DSLs. You should read it. If you don’t, though, the key idea is that HasCallStack isn’t just about exceptions: you can use it to get source code locations in many different contexts, and storing call stacks with data is particularly powerful in providing a helpful experience to programmers.

Seeing Tom’s article reminded me of a CodeWorld feature which was implemented long ago, but I’m excited to share again in this brief note.

CodeWorld Recap

If you’re not familiar with CodeWorld, it’s a web-based programming environment I created mainly to teach mathematics and computational thinking to students in U.S. middle school, ages around 11 to 14 years old. The programming language is based on Haskell — well, it is technically Haskell, but with a lot of preprocessing and tricks aimed at smoothing out the rough edges. There’s also a pure Haskell mode, giving you the full power of the idiomatic Haskell language.

In CodeWorld, the standard library includes primitives for putting pictures on the screen. This includes:

  • A few primitive pictures: circles, rectangles, and the like
  • Transformations to rotate, translate, scale, clip, and recolor an image
  • Compositions to overlay and combine multiple pictures into a more complex picture.

Because the environment is functional and declarative — and this will be important — there isn’t a primitive to draw a circle. There is a primitive that represents the concept of a circle. You can include a circle in your drawing, of course, but you compose a picture by combining simpler pictures declaratively, and then draw the whole thing only at the very end.

Debugging in CodeWorld

CodeWorld’s declarative interface enables a number of really fun kinds of interactivity… what programmers might call “debugging”, but for my younger audience, I view as exploratory tools: ways they can pry open the lid of their program and explore what it’s doing.

There are a few of these that are pretty awesome. Lest I seem to be claiming the credit, the implementation for these features is due to two students in Summer of Haskell and then in Google Summer of Code: Eric Roberts, and Krystal Maughan.

  • Not the point here, but there are some neat features for rewinding and replaying programs, zooming in, etc.
  • There’s also an “inspect” mode, in which you not only see the final result, but the whole structure of the resulting picture (e.g., maybe it’s an overlay of three other pictures: a background and two characters, each of which is transformed in some way, and the base picture for the transformation is some other overlay of multiple parts…). This is possible because pictures are represented not as bitmaps, but as data structures that remember how the picture was built from its individual parts.

Krystal’s recap blog post contains demonstrations of not only her own contributions, but the inspect window as well. Here’s a section showing what I’ll talk about now.

(Embedded video demo: https://medium.com/media/7f09408e8411d852516bedb5aab2601c/href)

The inspect window is linked to the code editor! Hover over a structural part of the picture, and you can see which expression in your own code produced that part of the picture.

This is another application of the technique from Tom’s post. The data type representing pictures in CodeWorld stores a call stack captured at each part of the picture, so that when you inspect the picture and hover over some part, the environment knows where in your code you described that part, and it highlights the code for you, and jumps there when clicked.
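For a rough idea of how this works, here is a minimal sketch (my own illustration, not CodeWorld’s actual implementation): a declarative picture type whose smart constructors capture the call stack at the point where each part of the picture is described.

import GHC.Stack (CallStack, HasCallStack, callStack)

-- Each constructor carries the CallStack captured where that part was built.
data Picture
  = Circle CallStack Double
  | Translated CallStack Double Double Picture
  | Pictures CallStack [Picture]

circle :: HasCallStack => Double -> Picture
circle = Circle callStack

translated :: HasCallStack => Double -> Double -> Picture -> Picture
translated = Translated callStack

pictures :: HasCallStack => [Picture] -> Picture
pictures = Pictures callStack

An inspector can then walk the Picture tree and use each stored CallStack (via GHC.Stack.getCallStack) to find and highlight the source expression that produced that part.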

While it’s the same technique, I really like this example because it’s not at all like an exception. We aren’t reporting errors or anything of the sort. Just using this nice feature of GHC that makes the connection between code and declarative data observable to help our users observe things about their own code.

by Chris Smith at December 10, 2024 10:50 PM

Christopher Allen

Two memory issues from the last two weeks

Okay maybe they don't qualify as actual memory bugs, but they were annoying and had memory as a common theme. One of them by itself doesn't merit a blog post so I bundled them together.

by Unknown at December 10, 2024 12:00 AM

December 02, 2024

GHC Developer Blog

GHC 9.8.4 is now available

GHC 9.8.4 is now available

Ben Gamari - 2024-12-02

The GHC developers are happy to announce the availability of GHC 9.8.4. Binary distributions, source distributions, and documentation are available on the release page.

This release is a small release fixing a few issues noted in 9.8.3, including:

  • Update the filepath submodule to avoid a misbehavior of splitFileName under Windows.

  • Update the unix submodule to fix a compilation issue on musl platforms

  • Fix a potential source of miscompilation when building large projects on 32-bit platforms

  • Fix unsound optimisation of prompt# uses

A full accounting of changes can be found in the release notes. As some of the fixed issues do affect correctness, users are encouraged to upgrade promptly.

We would like to thank Microsoft Azure, GitHub, IOG, the Zw3rk stake pool, Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprise this release.

As always, do give this release a try and open a ticket if you see anything amiss.

Happy compiling!

  • Ben

by ghc-devs at December 02, 2024 12:00 AM

December 01, 2024

Magnus Therning

Servant and a weirdness in Keycloak

When writing a small tool to interface with Keycloak I found an endpoint that requires the content type to be application/json while the body should be plain text. (The details are in the issue.) Since servant assumes that the content type and the content match (I know, I'd always thought that was a safe assumption to make too), it doesn't work with ReqBody '[JSON] Text. Instead I had to create a custom type that's a combination of JSON and PlainText, something that turned out to require surprisingly little code:

data KeycloakJSON deriving (Typeable)

instance Accept KeycloakJSON where
    contentType _ = "application" // "json"

instance MimeRender KeycloakJSON Text where
    mimeRender _ = fromStrict . encodeUtf8
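For context, here is a sketch of how such a type slots into an API definition (the endpoint below is hypothetical, my own illustration rather than the actual Keycloak route):

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeOperators #-}

import Data.Text (Text)
import Servant.API

-- The body is plain text, but the server insists on Content-Type:
-- application/json, so we use KeycloakJSON (defined above) instead of JSON.
type ResetPassword =
  "reset-password" :> ReqBody '[KeycloakJSON] Text :> Put '[JSON] NoContent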

The bug has already been fixed in Keycloak, but I'm sure there are other APIs with similar weirdness so maybe this will be useful to someone else.

December 01, 2024 10:00 PM

Christopher Allen

Rebuilding Rust (Leptos) apps quickly

I'm working on a side project that is written in Rust on the backend and the frontend. The frontend component is in Leptos. Our app is about 20kLOC in total, so it takes a little time.

by Unknown at December 01, 2024 12:00 AM

November 29, 2024

GHC Developer Blog

GHC 9.12.1-rc1 is now available

GHC 9.12.1-rc1 is now available

Zubin Duggal - 2024-11-29

The GHC developers are very pleased to announce the availability of the release candidate for GHC 9.12.1. Binary distributions, source distributions, and documentation are available at downloads.haskell.org.

We hope to have this release available via ghcup shortly.

GHC 9.12 will bring a number of new features and improvements, including:

  • The new language extension OrPatterns allowing you to combine multiple pattern clauses into one.

  • The MultilineStrings language extension to allow you to more easily write strings spanning multiple lines in your source code.

  • Improvements to the OverloadedRecordDot extension, allowing the built-in HasField class to be used for records with fields of non-lifted representations.

  • The NamedDefaults language extension has been introduced allowing you to define defaults for typeclasses other than Num.

  • More deterministic object code output, controlled by the -fobject-determinism flag, which substantially improves the determinism of builds (though does not yet make them fully deterministic) at the cost of some compiler performance (1-2%). See #12935 for details.

  • GHC now accepts type syntax in expressions as part of GHC Proposal #281.

  • The WASM backend now has support for TemplateHaskell.

  • … and many more

A full accounting of changes can be found in the release notes. As always, GHC’s release status, including planned future releases, can be found on the GHC Wiki status.

We would like to thank GitHub, IOG, the Zw3rk stake pool, Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, the Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprise this release.

As always, do give this release a try and open a ticket if you see anything amiss.

by ghc-devs at November 29, 2024 12:00 AM

November 28, 2024

Christopher Allen

The cost of hosting is too damn high

I recently migrated a side project from DigitalOcean to some dedicated servers. I thought that I would offer some context and examples for why.

by Unknown at November 28, 2024 12:00 AM

November 27, 2024

Brent Yorgey

Competitive Programming in Haskell: stacks, queues, and monoidal sliding windows

Competitive Programming in Haskell: stacks, queues, and monoidal sliding windows

Posted on November 27, 2024

Suppose we have a list of items of length \(n\), and we want to consider windows (i.e. contiguous subsequences) of width \(w\) within the list.

A list of numbers, with contiguous size-3 windows highlighted

We can compute the sum of each window by brute force in \(O(nw)\) time, by simply generating the list of all the windows and then summing each. But, of course, we can do better: keep track of the sum of the current window; every time we slide the window one element to the right we can add the new element that enters the window on the right and subtract the element that falls off the window to the left. Using this “sliding window” technique, we can compute the sum of every window in only \(O(n)\) total time instead of \(O(nw)\).
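As a concrete illustration of the trick just described (my own sketch, not code from the rest of this post), here is the \(O(n)\) sliding sum:

-- Sum of every width-w window in O(n) total time: start from the sum of the
-- first window, then repeatedly subtract the element leaving on the left and
-- add the element entering on the right.
slidingSums :: Num a => Int -> [a] -> [a]
slidingSums w xs = scanl step (sum (take w xs)) (zip xs (drop w xs))
  where
    step s (old, new) = s - old + new

For example, slidingSums 3 [1,4,2,8,9] yields [7,14,19].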

How about finding the maximum of every window? Of course the brute force \(O(nw)\) algorithm still works, but doing it in only \(O(n)\) is considerably trickier! We can’t use the same trick as we did for sums since there’s no way to “subtract” the element falling off the left. This really comes down to the fact that addition forms a group (i.e. a monoid-with-inverses), but max does not. So more generally, the question is: how can we compute a monoidal summary for every window in only \(O(n)\) time?

Today I want to show you how to solve this problem using one of my favorite competitive programming tricks, which fits beautifully in a functional context. Along the way we’ll also see how to implement simple yet efficient functional queues.

Stacks

Before we get to queues, we need to take a detour through stacks. Stacks in Haskell are pretty boring. We can just use a list, with the front of the list corresponding to the top of the stack. However, to make things more interesting—and because it will come in very handy later—we’re going to implement monoidally-annotated stacks. Every element on the stack will have a measure, which is a value from some monoid m. We then want to be able to query any stack for the total of all the measures in \(O(1)\). For example, perhaps we want to always be able to find the sum or max of all the elements on a stack.

If we wanted to implement stacks annotated by a group, we could just do something like this:

data GroupStack g a = GroupStack (a -> g) !g [a]

That is, a GroupStack stores a measure function, which assigns to each element of type a a measure of type g (which is intended to be a Group); a value of type g representing the sum (via the group operation) of measures of all elements on the stack; and the actual stack itself. To push, we would just compute the measure of the new element and add it to the cached g value; to pop, we subtract the measure of the element being popped, something like this:

push :: Semigroup g => a -> GroupStack g a -> GroupStack g a
push a (GroupStack f g as) = GroupStack f (f a <> g) (a:as)

-- assuming some Group class (a monoid with inverses) providing 'inv',
-- e.g. 'invert' from Data.Group
pop :: Group g => GroupStack g a -> Maybe (a, GroupStack g a)
pop (GroupStack f g as) = case as of
  [] -> Nothing
  (a:as') -> Just (a, GroupStack f (inv (f a) <> g) as')

But this won’t work for a monoid, of course. The problem is pop, where we can’t just subtract the measure for the element being popped. Instead, we need to be able to restore the measure of a previous stack. Hmmm… sounds like we might be able to use… a stack! We could just store a stack of measures alongside the stack of elements; even better is to store a stack of pairs. That is, each element on the stack is paired with an annotation representing the sum of all the measures at or below it. Here, then, is our representation of monoidally-annotated stacks:

{-# LANGUAGE BangPatterns #-}

module Stack where

import Data.List (foldl')

data Stack m a = Stack (a -> m) !Int [(m, a)]

A Stack m a stores three things:

  1. A measure function of type a -> m. (Incidentally, what if we want to be able to specify an arbitrary measure for each element, and even give different measures to the same element at different times? Easy: just use (m,a) pairs as elements, and use fst as the measure function.)

  2. An Int representing the size of the stack. This is not strictly necessary, especially since one could always just use a monoidal annotation to keep track of the size; but wanting the size is so ubiquitous that it seems convenient to just include it as a special case.

  3. The aforementioned stack of (annotation, element) pairs.

Note that we cannot write a Functor instance for Stack m, since a occurs contravariantly in (a -> m). But this makes sense: if we change all the a values, the cached measures would no longer be valid.

When creating a new, empty stack, we have to specify the measure function; to get the measure of a stack, we just look up the measure on top, or return mempty for an empty stack.

new :: (a -> m) -> Stack m a
new f = Stack f 0 []

size :: Stack m a -> Int
size (Stack _ n _) = n

measure :: Monoid m => Stack m a -> m
measure (Stack _ _ as) = case as of
  [] -> mempty
  (m, _) : _ -> m

Now let’s implement push and pop. Both are relatively straightforward.

push :: Monoid m => a -> Stack m a -> Stack m a
push a s@(Stack f n as) = Stack f (n + 1) ((f a <> measure s, a) : as)

pop :: Stack m a -> Maybe (a, Stack m a)
pop (Stack f n as) = case as of
  [] -> Nothing
  (_, a) : as' -> Just (a, Stack f (n - 1) as')

Note that if we care about using non-commutative monoids, in the implementation of push we have a choice to make between f a <> measure s and measure s <> f a. The former seems nicer to me, since it keeps the measures “in the same order” as the list representing the stack. For example, if we push a list of elements onto a stack via foldr, using the measure function (:[]) that injects each element into the monoid of lists, the resulting measure is just the original list:

measure . foldr push (new (:[])) == id

And more generally, for any measure function f, we have

measure . foldr push (new f) == foldMap f

Finally, we are going to want a function to reverse a stack, which is a one-liner:

reverse :: Monoid m => Stack m a -> Stack m a
reverse (Stack f _ as) = foldl' (flip push) (new f) (map snd as)

That is, to reverse a stack, we extract the elements and then use foldl' to push the elements one at a time onto a new stack using the same measure function.

There is a bit more code you can find on GitHub, such as Show and Eq instances.

Queues

Now that we have monoidally-annotated stacks under our belt, let’s turn to queues. And here’s where my favorite trick is revealed: we can implement a queue out of two stacks, so that enqueue and dequeue run in \(O(1)\) amortized time; and if we use monoidally-annotated stacks, we get monoidally-annotated queues for free!

First, some imports.

{-# LANGUAGE ImportQualifiedPost #-}

module Queue where

import Data.Bifunctor (second)
import Stack (Stack)
import Stack qualified as Stack

A Queue m a just consists of two stacks, one for the front and one for the back. To create a new queue, we just create two new stacks; to get the size of a queue, we just add the sizes of the stacks; to get the measure of a queue, we just combine the measures of the stacks. Easy peasy.

type CommutativeMonoid = Monoid

data Queue m a = Queue {getFront :: Stack m a, getBack :: Stack m a}
  deriving (Show, Eq)

new :: (a -> m) -> Queue m a
new f = Queue (Stack.new f) (Stack.new f)

size :: Queue m a -> Int
size (Queue front back) = Stack.size front + Stack.size back

measure :: CommutativeMonoid m => Queue m a -> m
measure (Queue front back) = Stack.measure front <> Stack.measure back

Note the restriction to commutative monoids, since the queue elements are stored in different orders in the front and back stacks. If we really cared about making this work with non-commutative monoids, we would have to make two different push methods for the front and back stacks, to combine the measures in opposite orders. That just doesn’t seem worth it. But if you have a good example requiring the use of a queue annotated by a non-commutative monoid, I’d love to hear it!

Now, to enqueue, we just push the new element on the back:

enqueue :: CommutativeMonoid m => a -> Queue m a -> Queue m a
enqueue a (Queue front back) = Queue front (Stack.push a back)

Dequeueing is the magic bit that makes everything work. If there are any elements in the front stack, we can just pop from there. Otherwise, we need to first reverse the back stack into the front stack. This means dequeue may occasionally take \(O(n)\) time, but it’s still \(O(1)\) amortized. (The easiest way to see this is to note that every element is touched exactly three times: once when it is pushed on the back; once when it is transferred from the back to the front; and once when it is popped from the front. So, overall, we do \(O(1)\) work per element.)

dequeue :: CommutativeMonoid m => Queue m a -> Maybe (a, Queue m a)
dequeue (Queue front back)
  | Stack.size front == 0 && Stack.size back == 0 = Nothing
  | Stack.size front == 0 = dequeue (Queue (Stack.reverse back) front)
  | otherwise = second (\front' -> Queue front' back) <$> Stack.pop front

Finally, for convenience, we can make a function drop1 which just dequeues an item from the front of a queue and throws it away.

drop1 :: CommutativeMonoid m => Queue m a -> Queue m a
drop1 q = case dequeue q of
  Nothing -> q
  Just (_, q') -> q'

This “banker’s queue” method of building a queue out of two stacks is discussed in Purely Functional Data Structures by Okasaki, though I don’t think he was the first to come up with the idea. It’s also possible to use some clever tricks to make both enqueue and dequeue take \(O(1)\) time in the worst case. In a future post I’d like to do some benchmarking to compare various queue implementations (i.e. banker’s queues, Data.Sequence, circular array queues built on top of STArray). At least anecdotally, in solving some sliding window problems, banker’s queues seem quite fast so far.

Sliding windows

I hope you can see how this solves the initial motivating problem: to find e.g. the max of a sliding window, we can just put the elements in a monoidally-annotated queue, enqueueing and dequeueing one element every time we slide the window over. (More generally, of course, it doesn’t even matter if the left and right ends of the window stay exactly in sync; we can enqueue and dequeue as many times as we want.)

The following windows function computes the monoidal sum foldMap f window for each window of width \(w\), in only \(O(n)\) time overall.

windows :: CommutativeMonoid m => Int -> (a -> m) -> [a] -> [m]
windows w f as = go startQ rest
 where
  (start, rest) = splitAt w as
  startQ = foldl' (flip enqueue) (new f) start

  go q as =
    measure q : case as of
      [] -> []
      a : as -> go (enqueue a (drop1 q)) as

“But…maximum and minimum do not form monoids, only semigroups!” I hear you cry. Well, we can just adjoin special positive or negative infinity elements as needed, like so:

data Max a = NegInf | Max a deriving (Eq, Ord, Show)

instance Ord a => Semigroup (Max a) where
  NegInf <> a = a
  a <> NegInf = a
  Max a <> Max b = Max (max a b)

instance Ord a => Monoid (Max a) where
  mempty = NegInf

data Min a = Min a | PosInf deriving (Eq, Ord, Show)

instance Ord a => Semigroup (Min a) where
  PosInf <> a = a
  a <> PosInf = a
  Min a <> Min b = Min (min a b)

instance Ord a => Monoid (Min a) where
  mempty = PosInf

Now we can write, for example, windows 3 Max [1,4,2,8,9,4,4,6] which yields [Max 4, Max 8, Max 9, Max 9, Max 9, Max 6], the maximums of each 3-element window.

Challenges

If you’d like to try solving some problems using the techniques from this blog post, I can recommend the following (generally in order of difficulty):

In a future post I’ll walk through my solution to Hockey Fans. And here’s another couple problems along similar lines; unlike the previous problems I am not so sure how to solve these in a nice way. I may write about them in the future.


by Brent Yorgey at November 27, 2024 12:00 AM

November 18, 2024

Haskell Interlude

58: ICFP 2024

In this episode, Matti and Sam traveled to the International Conference on Functional Programming (ICFP 2024) in Milan, Italy, and recorded snippets with various participants, including keynote speakers, Haskell legends, and organizers.

by Haskell Podcast at November 18, 2024 04:00 PM

Brent Yorgey

Competitive Programming in Haskell: Union-Find, part II

Competitive Programming in Haskell: Union-Find, part II

Posted on November 18, 2024

In my previous post I explained how to implement a reasonably efficient union-find data structure in Haskell, and challenged you to solve a couple Kattis problems. In this post, I will (1) touch on a few generalizations brought up in the comments of my last post, (2) go over my solutions to the two challenge problems, and (3) briefly discuss generalizing the second problem’s solution to finding max-edge decompositions of weighted trees.

Generalizations

Before going on to explain my solutions to those problems, I want to highlight some things from a comment by Derek Elkins and a related blog post by Philip Zucker. The first is that instead of (or in addition to) annotating each set with a value from a commutative semigroup, we can also annotate the edges between nodes with elements from a group (or, more generally, a groupoid). The idea is that each edge records some information about, or evidence for, the relationship between the endpoints of the edge. To compute information about the relationship between two arbitrary nodes in the same set, we can compose elements along the path between them. This is a nifty idea—I have never personally seen it used for a competitive programming problem, but it probably has been at some point. (It kind of makes me want to write such a problem!) And of course it has “real” applications beyond competitive programming as well. I have not actually generalized my union-find code to allow edge annotations; I leave it as an exercise for the reader.

The other idea to highlight is that instead of thinking in terms of disjoint sets, what we are really doing is building an equivalence relation, which partitions the elements into disjoint equivalence classes. In particular, we do this by incrementally building a relation \(R\), where the union-find structure represents the reflexive, transitive, symmetric closure of \(R\). We start with the empty relation \(R\) (whose reflexive, transitive, symmetric closure is the discrete equivalence relation, with every element in its own equivalence class); every \(\mathit{union}(x,y)\) operation adds \((x,y)\) to \(R\); and the \(\mathit{find}(x)\) operation computes a canonical representative of the equivalence class of \(x\). In other words, given some facts about which things are related to which other things (possibly along with some associated evidence), the union-find structure keeps track of everything we can infer from the given facts and the assumption that the relation is an equivalence.

Finally, through the comments I also learned about other potentially-faster-in-practice schemes for doing path compression such as Rem’s Algorithm; I leave it for future me to try these out and see if they speed things up.

Now, on to the solutions!

Duck Journey

In Duck Journey, we are essentially given a graph with edges labelled by bitstrings, where edges along a path are combined using bitwise OR. We are then asked to find the greatest possible value of a path between two given vertices, assuming that we are allowed to retrace our steps as much as we want. (Incidentally, if we are not allowed to retrace our steps, this problem probably becomes NP-hard.)

If we can retrace our steps, then on our way from A to B we might as well visit every edge in the entire connected component, so this problem is not really about path-finding at all. It boils down to two things: (1) being able to quickly test whether two given vertices are in the same connected component or not, and (2) computing the bitwise OR of all the edge labels in each connected component.

One way to solve this would be to first use some kind of graph traversal, like DFS, to find the connected components and build a map from vertices to component labels; then partition the edges by component and take the bitwise OR of all the edge weights in each component. To answer queries we could first look up the component label of the two vertices; if the labels are the same then we look up the total weight for that component.

This works, and is in some sense the most “elementary” solution, but it requires building some kind of graph data structure, storing all the edges in memory, doing the component labelling via DFS and building another map, and so on. An alternative solution is to use a union-find structure with a bitstring annotation for each set: as we read in the edges, we simply union the endpoints of each edge, and then update the bitstring for the resulting equivalence class with the bitstring for the edge. If we take a union-find library as given, this solution seems simpler to me.

First, some imports and the top-level main function. (See here for the ScannerBS module.)

{-# LANGUAGE ImportQualifiedPost #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE RecordWildCards #-}

module Main where

import Control.Category ((>>>))
import Control.Monad.ST
import Data.Bits
import Data.ByteString.Lazy.Char8 (ByteString)
import Data.ByteString.Lazy.Char8 qualified as BS

import ScannerBS
import UnionFind qualified as UF

main = BS.interact $ runScanner tc >>> solve >>> format

format :: [Maybe Int] -> ByteString
format = map (maybe "-1" (show >>> BS.pack)) >>> BS.unlines

Next, some data types to represent the input, and a Scanner to read it.

-- Each edge is a "filter" represented as a bitstring stored as an Int.
newtype Filter = Filter Int
  deriving (Eq, Show)

instance Semigroup Filter where
  Filter x <> Filter y = Filter (x .|. y)

filterSize :: Filter -> Int
filterSize (Filter f) = popCount f

data Channel = Channel UF.Node UF.Node Filter deriving (Eq, Show)
data TC = TC {n :: !Int, channels :: [Channel], queries :: [(Int, Int)]}
  deriving (Eq, Show)

tc :: Scanner TC
tc = do
  n <- int
  m <- int
  q <- int
  channels <- m >< (Channel <$> int <*> int <*> (Filter <$> int))
  queries <- q >< pair int int
  return TC {..}

Finally, here’s the solution itself: process each channel with a union-find structure, then process queries. The annoying thing, of course, is that this all has to be in the ST monad, but other than that it’s quite straightforward.

solve :: TC -> [Maybe Int]
solve TC {..} = runST $ do
  uf <- UF.new (n + 1) (Filter 0)
  mapM_ (addChannel uf) channels
  mapM (answer uf) queries

addChannel :: UF.UnionFind s Filter -> Channel -> ST s ()
addChannel uf (Channel a b f) = do
  UF.union uf a b
  UF.updateAnn uf a f

answer :: UF.UnionFind s Filter -> (Int, Int) -> ST s (Maybe Int)
answer uf (a, b) = do
  c <- UF.connected uf a b
  case c of
    False -> pure Nothing
    True -> Just . filterSize <$> UF.getAnn uf a

Inventing Test Data

In Inventing Test Data, we are given a tree \(T\) with integer weights on its edges, and asked to find the minimum possible weight of a complete graph for which \(T\) is the unique minimum spanning tree (MST).



Let \(e = (x,y)\) be some edge which is not in \(T\). There must be a unique path between \(x\) and \(y\) in \(T\) (so adding \(e\) to \(T\) would complete a cycle); let \(m\) be the maximum weight of the edges along this path. Then I claim that we must give edge \(e\) weight \(m+1\):

  • On the one hand, this ensures \(e\) can never be in any MST, since an edge which is strictly the largest edge in some cycle can never be part of an MST (this is often called the “cycle property”).
  • Conversely, if \(e\) had a weight less than or equal to \(m\), then \(T\) would not be a MST (or at least not uniquely): we could remove any edge in the path from \(x\) to \(y\) through \(T\) and replace it with \(e\), resulting in a spanning tree with a lower (or equal) weight.

Hence, every edge not in \(T\) must be given a weight one more than the largest weight in the unique \(T\)-path connecting its endpoints; these are the minimum weights that ensure \(T\) is a unique MST.

A false start

At first, I thought what we needed was a way to quickly compute this max weight along any path in the tree (where by “quickly” I mean something like “faster than linear in the length of the path”). There are indeed ways to do this, for example, using a heavy-light decomposition and then putting a data structure on each heavy path that allows us to query subranges of the path quickly. (If we use a segment tree on each path we can even support operations to update the edge weights quickly.)

All this is fascinating, and something I may very well write about later. But it doesn’t actually help! Even if we could find the max weight along any path in \(O(1)\), there are still \(O(V^2)\) edges to loop over, which is too big. There can be up to \(V = 15\,000\) nodes in the tree, so \(V^2 = 2.25 \times 10^8\). A good rule of thumb is \(10^8\) operations per second, and there are likely to be very high constant factors hiding in whatever complex data structures we use to query paths efficiently.

So we need a way to somehow process many edges at once. As usual, a change in perspective is helpful; to get there we first need to take a slight detour.

Kruskal’s Algorithm

It helps to be familiar with Kruskal’s Algorithm, which is the simplest algorithm I know for finding minimum spanning trees:

  • Sort the edges from smallest to biggest weight.
  • Initialize \(T\) to an empty set of edges.
  • For each edge \(e\) in order from smallest to biggest:
    • If \(e\) does not complete a cycle with the other edges already in \(T\), add \(e\) to \(T\).

To efficiently check whether \(e\) completes a cycle with the other edges in \(T\), we can use a union-find, of course: we maintain equivalence classes of vertices under the “is connected to” equivalence relation; adding \(e\) would complete a cycle if and only if the endpoints of \(e\) are already connected to each other in \(T\). If we do add an edge \(e\), we can just \(\mathit{union}\) its endpoints to properly maintain the relation.
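In code, Kruskal’s algorithm is just a sort followed by a fold over a union-find. Here is a minimal sketch (my own, assuming a union-find interface along the lines of the previous post, with UF.new, UF.connected, and UF.union, and with vertices as plain Ints):

import Control.Monad (filterM)
import Control.Monad.ST (ST)
import Data.List (sortOn)
import qualified UnionFind as UF

-- Keep an edge iff its endpoints are not yet connected; the kept edges form an MST.
kruskal :: Int -> [(Int, Int, Integer)] -> ST s [(Int, Int, Integer)]
kruskal n edges = do
  uf <- UF.new (n + 1)
  filterM (keep uf) (sortOn (\(_, _, w) -> w) edges)
 where
  keep uf (a, b, _) = do
    c <- UF.connected uf a b
    if c then pure False else True <$ UF.union uf a b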

A change of perspective

So how does this help us solve “Inventing Test Data”? After all, we are not being directly asked to find a minimum spanning tree. However, it’s still helpful to think about the process Kruskal’s Algorithm would go through, in order to choose edge weights that will force it to do what we want (i.e. pick all the edges in \(T\)). That is, instead of thinking about each individual edge not in \(T\), we can instead think about the edges that are in \(T\), and what must be true to force Kruskal’s algorithm to pick each one.

Suppose we are part of the way through running Kruskal’s algorithm, and that it is about to consider a given edge \(e = (x,y) \in T\) which has weight \(w_e\). At this point it has already considered any edges with smaller weight, and (we shall assume) chosen all the smaller-weight edges in \(T\). So let \(X\) be the set of vertices reachable from \(x\) by edges in \(T\) with weight less than or equal to \(w_e\), and similarly let \(Y\) be those reachable from \(y\). Kruskal’s algorithm will pick edge \(e\) after checking that \(X\) and \(Y\) are disjoint.



Think about all the other edges from \(X\) to \(Y\): all of them must have weight greater than \(w_e\), because otherwise Kruskal’s algorithm would have already considered them earlier, and used one of them to connect \(X\) and \(Y\). In fact, all of these edges must have weight \(w_e + 1\), as we argued earlier, since \(e\) is the largest-weight edge on the \(T\)-path between their endpoints (all the other edges on these paths were already chosen earlier and hence have smaller weight). The number of such edges is just \(|X| |Y| - 1\) (there is an edge for every pair of vertices, but we do not want to count \(e\) itself). Hence they contribute a total of \((|X||Y| - 1)(w_e + 1)\) to the sum of edge weights.

Hopefully the solution is now becoming clear: we process the edges of \(T\) in order from smallest to biggest, using a union-find to keep track of the equivalence classes of connected vertices so far. For each edge \((x,y)\) we look up the sizes of the equivalence classes of \(x\) and \(y\), add \((|X||Y| - 1)(w_e + 1)\) to a running total, and union. This accounts for all the edges not in \(T\); finally, we must also add the weights of the edges in \(T\) themselves.
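As a tiny sanity check (my own example, not from the problem statement): take a tree that is just a path on vertices \(1, 2, 3\), with edge \(\{1,2\}\) of weight \(1\) and edge \(\{2,3\}\) of weight \(2\). Processing the weight-\(1\) edge first, both classes have size \(1\), so it contributes \(1 + (1 \cdot 1 - 1)(1+1) = 1\); the weight-\(2\) edge then joins classes of sizes \(2\) and \(1\), contributing \(2 + (2 \cdot 1 - 1)(2+1) = 5\). The total is \(6\), which matches assigning the single non-tree edge \(\{1,3\}\) the weight \(2 + 1 = 3\) directly.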

First some standard pragmas and imports, along with some data types and a Scanner to parse the input. Note the custom Ord instance for Edge, so we can sort edges by weight.

{-# LANGUAGE ImportQualifiedPost #-}
{-# LANGUAGE RecordWildCards #-}

import Control.Category ((>>>))
import Control.Monad.ST
import Data.ByteString.Lazy.Char8 qualified as BS
import Data.List (sort)
import Data.Ord (comparing)
import Data.STRef
import ScannerBS
import UnionFind qualified as UF

main = BS.interact $ runScanner (numberOf tc) >>> map (solve >>> show >>> BS.pack) >>> BS.unlines

data Edge = Edge {a :: !Int, b :: !Int, w :: !Integer}
  deriving (Eq, Show)

instance Ord Edge where
  compare = comparing w

data TC = TC {n :: !Int, edges :: [Edge]}
  deriving (Eq, Show)

tc :: Scanner TC
tc = do
  n <- int
  edges <- (n - 1) >< (Edge <$> int <*> int <*> integer)
  return TC {..}

Finally, the (remarkably short) solution proper: we sort the edges and process them from smallest to biggest; for each edge we update an accumulator according to the formula discussed above. Since we’re already tied to the ST monad anyway, we might as well keep the accumulator in a mutable STRef cell.

solve :: TC -> Integer
solve TC {..} = runST $ do
  uf <- UF.new (n + 1)
  total <- newSTRef (0 :: Integer)
  mapM_ (processEdge uf total) (sort edges)
  readSTRef total

processEdge :: UF.UnionFind s -> STRef s Integer -> Edge -> ST s ()
processEdge uf total (Edge a b w) = do
  modifySTRef' total (+ w)
  sa <- UF.size uf a
  sb <- UF.size uf b
  modifySTRef' total (+ (fromIntegral sa * fromIntegral sb - 1) * (w + 1))
  UF.union uf a b

Max-edge decomposition



Incidentally, there’s something a bit more general going on here: for a given nonempty weighted tree \(T\), a max-edge decomposition of \(T\) is a binary tree defined as follows:

  • The max-edge decomposition of a trivial single-vertex tree is a single vertex.
  • Otherwise, the max-edge decomposition of \(T\) consists of a root node with two children, which are the max-edge decompositions of the two trees that result from deleting a largest-weight edge from \(T\).

Any max-edge decomposition of a tree \(T\) with \(n\) vertices will have \(n\) leaf nodes and \(n-1\) internal nodes. Typically we think of the leaf nodes of the decomposition as being labelled by the vertices of \(T\), and the internal nodes as being labelled by the edges of \(T\).
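In Haskell, the shape of such a decomposition can be written down directly (a minimal sketch of my own, not code from this post), with leaves labelled by vertices and internal nodes labelled by edges:

data MaxEdgeDecomp v e
  = Leaf v
  | Node e (MaxEdgeDecomp v e) (MaxEdgeDecomp v e)
  deriving (Show)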

An alternative way to think of the max-edge decomposition is as the binary tree of union operations performed by Kruskal’s algorithm while building \(T\), starting with each vertex in a singleton leaf and then merging two trees into one with every union operation. Thinking about, or even explicitly building, this max-edge decomposition occasionally comes in handy. For example, see Veður and Toll Roads.

Incidentally, I can’t remember whether I got the term “max-edge decomposition” from somewhere else or if I made it up myself; in any case, regardless of what it is called, I think I first learned of it from this blog post by Petr Mitrichev.


by Brent Yorgey at November 18, 2024 12:00 AM