Planet Haskell

March 21, 2025

Well-Typed.Com

GHC activities report: December 2024–February 2025

This is the twenty-sixth edition of our GHC activities report, which describes the work Well-Typed are doing on GHC, Cabal, HLS and other parts of the core Haskell toolchain. The current edition covers roughly the months of December 2024 to February 2025. You can find the previous editions collected under the ghc-activities-report tag.

Sponsorship

We offer Haskell Ecosystem Support Packages to provide commercial users with support from Well-Typed’s experts, while investing in the Haskell community and its technical ecosystem. Clients who engage in these packages both fund the work described in this report and support the Haskell Foundation.

We are delighted to announce two new Bronze Haskell Ecosystem Supporters: Channable and QBayLogic. Many thanks also to our existing clients who contribute to making this work possible: Anduril, Juspay and Mercury, and to the HLS Open Collective for supporting HLS release management.

However, Haskell’s core infrastructure is expansive; we need more funding to continue this essential maintenance work! Read more about our offer, or get in touch with us today.

Team

The Haskell toolchain team at Well-Typed currently includes:

In addition, many others within Well-Typed contribute to GHC, Cabal and HLS occasionally, or contribute to other open source Haskell libraries and tools.

GHC

GHC Releases

We are currently overseeing releases of the GHC 9.10 and 9.12 release series.

Zubin oversaw the preparation and final release of GHC 9.12.1 on 16 December 2024. Unfortunately, it was found in mid-January that this release was affected by a regression affecting sub-word division (#25653). In response to this we scheduled a minimal 9.12.2 release fixing this issue, which was released on 14 March 2025. We anticipate that the next GHC 9.12 release will come in the summer.

Our current release engineering focus is 9.10.2, which is being worked on by Andreas and Ben. We expect that the release candidate for this will be out in late March.

In parallel, Zubin has been working towards cutting a corresponding release of HLS and introducing support for GHC 9.12.

Platform support

For many years, GHC’s FreeBSD support has been in a state of limbo: while the compiler has usually been functional on FreeBSD, we have never had proper CI support, meaning that we could neither systematically validate correctness nor produce binary distributions.

Late last year Ben fixed several issues that had been plaguing FreeBSD support, allowing GHC to pass the testsuite, and worked with a contributor to bring up a continuous integration runner for this platform (!13619, !13963). As a result, we expect that GHC 9.14 will ship binary distributions for FreeBSD as a tier 2 platform.

However, supporting GHC’s compatibility matrix requires a real investment of time and energy. If you rely on FreeBSD or any other BSD, we would appreciate your help in looking after and improving GHC’s support of these platforms.

Frontend

  • Matthew, Adam, and Rodrigo wrote and proposed the now-accepted Explicit Level Imports proposal. This proposal represents a significant step forward in Haskell’s staged-metaprogramming story, introducing syntax to distinguish imports needed at runtime from those only needed at compile time (e.g. for TemplateHaskell splices). This distinction opens the door to compile-time improvements, more robust cross-compilation support, and a more expressive metaprogramming story (a rough sketch of the new syntax follows this list).

  • Sam worked with Simon Peyton-Jones to implement GHC proposal #493 (Allow expressions in SPECIALISE pragmas) (!12319).

  • Sam finished up work by GHC contributor Jade, giving GHCi error messages their own error codes (#23338, !13094).

  • Sam made several internal improvements to the typechecker, surrounding the function checkTyEqRhs, which is responsible for skolem-escape checks, occurs checks and representation-polymorphism checking (!13778, !13931).

  • Sam implemented a simplification of the logic for solving quantified constraints, both improving solver efficiency and simplifying specialization (!13958).

  • Sam refactored the GHC “error context” infrastructure, migrating it to a structured representation, as is already done for the error message contents (#23436, !10540).

  • Sam implemented defaulting of representational equalities, which allows GHC to accept several uses of coerce that used to be rejected with ambiguous types (#21003, !13834).

  • Sam fixed GHC emitting spurious “incomplete record selectors” warnings due to missing long-distance information in the pattern-match checker (#25749, !13979).
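
As a rough sketch of the Explicit Level Imports direction mentioned above (this is my reading of the accepted proposal; the extension name, the splice modifier, and the module and function names below are assumptions and may differ from what finally ships in GHC):

{-# LANGUAGE ExplicitLevelImports #-}  -- extension name as proposed; may change
{-# LANGUAGE TemplateHaskell #-}
module Example where

-- 'splice' marks an import as only needed at compile time, i.e. inside
-- Template Haskell splices, so it need not be available at runtime (useful
-- for cross-compilation). 'MyGenerators' and 'genInstances' are hypothetical.
import splice MyGenerators (genInstances)

$(genInstances)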

Backend

  • Sam prevented GHC from emitting LLVM code with incompatible vector types (e.g. the same variable being declared as 4xi32 and used as 8xi16) (!13936).

  • Sam investigated CI failures with the LLVM backend, identifying several critical bugs such as #25771 and #25773.

  • Ben improved the naming of various compiler-generated binders, which will make it easier to make sense of -ddump-simpl output and runtime stacks (!13849, !13875).

  • Rodrigo ensured that certain join-points inline, dramatically improving runtime allocations of certain programs (#25723, !13909).

  • In response to a serious correctness regression in 9.12.1, Ben improved the testing story for primops by adding Cmm surface syntax for the previously-untested Mul2 operations and expanding the scope of the test-primops testsuite (!13843, test-primops!27).

Compiler performance

  • Matthew improved the performance of type family consistency checking by ensuring checks are run in topological order, significantly reducing redundant work (#25554, !13685).

  • Rodrigo made a variety of improvements in GHC’s memory consumption, including refactoring the ModuleGraph interface (!13658) and reducing the memory usage of module transitive closure calculations done when encountering Template Haskell splices (#25634, !13753).

  • Rodrigo refactored the HomePackageTable and HomeUnitGraph to avoid significant space leaks (#25511, !13675).

  • Rodrigo improved the performance of compiling deriving Show and deriving Data (!13739).

Runtime system

  • Ben fixed a few bugs in the linker’s object unloading implementation resulting in runtime crashes (#24935, !13704; #25039, !13714).

  • Ben removed some dead code in the IO manager which was causing some CI jobs to fail (!13678).

  • Ben lifted the runtime system’s limit of 256 capabilities, ensuring that the runtime system can scale to large multicore systems (#25560, !13692).

  • Ben fixed a bug in mmapInRegion which would cause it to loop indefinitely in certain circumstances on FreeBSD (#25492, !13618).

GHCi & bytecode interpreter

  • Matthew improved the error reporting of out-of-scope qualified names in GHCi (!13751).

  • Matthew fixed segfaults in the bytecode interpreter that were caused by the FastString table being loaded unoptimised (!13877).

  • Matthew dramatically improved the performance of the bytecode interpreter

    • by avoiding generating no-op SLIDE x 0 instructions (!13868),
    • by using a strict genericLength function (!13885),
    • by avoiding intermediate lists in nameToCLabel (!13898),
    • by using Name rather than FastString to key the symbol cache (!13914).
  • Matthew fixed the INTERP_STATS macro that is used for performance statistics of GHCi (!13879), and then proceeded to fix accounting errors in these statistics (!13956).

  • Ben and Matthew improved the printing of BCOs to assist debugging the bytecode interpreter (!13570, !13878, !13955).

  • Ben fixed an incorrect assumption regarding which unlifted types can appear at the top-level (#25641, !13796).

Libraries

  • Ben re-introduced missing {Enum/Show} IOSubSystem instances that were accidentally removed in !9676 (#25549, !13683).

  • Ben introduced Data.Enum and Data.Bounded as agreed in CLC Proposal #208, correcting an accidental inclusion of Data.Enum in the ghc-internal refactor (#25320, !11347, !13790).

  • Ben implemented CLC proposal #305, ensuring that threads created by GHC’s base library can be easily identified by their thread label (#25452, !13566).

  • Rodrigo improved the implementation of SomeException for SomeAsyncException, implementing CLC Proposal #309 (!13725).

Build system

  • Ben fixed #25501, ensuring that the ld-override logic is consistent between the configure script and ghc-toolchain (!13617).

  • Ben mitigated a race condition with mktexfmt in Hadrian (#25564, !13703).

  • Ben allowed i686 to be parsed as part of triples in the configure script (#25691, !13874).

Cabal

  • Matthew fixed two Cabal 3.14 regressions in which the current working directory was not correctly taken into account, when creating the build folder (#10772, #10800) and when running test executables (#10704, #10725).

  • Matthew fixed a Cabal 3.14 regression in which Cabal would erroneously pick versions of build tools (such as alex or happy) from the system environment rather than the versions specified in build-tool-depends (#10692, #10731).

ghc-debug

Zubin implemented support in the ghc-debug backend and Brick front-end for streaming heap traversals, enabling constant-space analysis of large heaps (!66).

by adam, andreask, ben, hannes, matthew, mikolaj, rodrigo, sam, zubin at March 21, 2025 12:00 AM

March 20, 2025

Tweag I/O

A hundred pull requests for Liquid Haskell

A new release of Liquid Haskell is out after quite an active period of development with 99 pull requests in the liquidhaskell repository, and 29 pull requests in the liquid-fixpoint repository from about ten contributors. This post is to provide an overview of the changes that made it into the latest release.

There were contributions to the reflection and proof mechanisms; we got contributions to the integration with GHC; support for the cvc5 SMT solver was improved for dealing with sets, bags, and maps; and there was a rather large overhaul of the name resolution mechanism.

Reflection improvements

Liquid Haskell is a tool to verify Haskell programs. We can write formal specifications inside special Haskell comments {-@ ... @-}, and the tool will check whether the program behaves as specified. For instance, the following specification of the filter function says that we expect all of the elements in the result to satisfy the given predicate.

{-@ filter :: p:(a -> Bool) -> xs:[a] -> {v:[a] | all p v } @-}

Liquid Haskell would then analyze the implementation of filter to verify that it does indeed yield elements that satisfy the predicate.

To verify such a specification, Liquid Haskell needs to attach a meaning to the names in the predicate all p v. It readily learns that p is a parameter of filter, and that v is the result. all, however, isn’t bound by the specification’s parameters, so it refers to whatever is in scope, which is the Haskell function from the Prelude.

all :: (a -> Bool) -> [a] -> Bool

And Liquid Haskell has a mechanism, known as reflection, to give logical meaning to the implementation of a function like all. While it has always been convenient to reflect functions in modules analyzed by Liquid Haskell, it was not so easy when there was a mix of local definitions and imported definitions from dependencies that are not analysed with Liquid Haskell. Last year, there was an internship at Tweag to address exactly this friction, which resulted in contributions to the latest release.
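
As a rough sketch of what reflection looks like for one's own code (a minimal example of mine, not from the release; depending on the setup it may also need proof-automation flags such as --ple to verify automatically):

{-@ LIQUID "--reflection" @-}
module ReflectDemo where

-- Reflecting allP exposes its defining equations to the logic, so it can
-- appear in refinement types like the filter specification above.
{-@ reflect allP @-}
allP :: (a -> Bool) -> [a] -> Bool
allP _ []       = True
allP p (x : xs) = p x && allP p xs

{-@ myFilter :: p:(a -> Bool) -> xs:[a] -> {v:[a] | allP p v} @-}
myFilter :: (a -> Bool) -> [a] -> [a]
myFilter _ []       = []
myFilter p (x : xs)
  | p x       = x : myFilter p xs
  | otherwise = myFilter p xs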

Reasoning and reflection of lambdas

The reflection mechanism also has other specific limitations at the moment. For instance, it doesn’t allow reflecting recursive functions defined in let or where bindings. And until recently, it didn’t allow reflecting functions that contained anonymous functions. For example,

takePositives = filter (\x -> x > 0)

In the latest release, we have several contributions that introduce support for reflecting lambdas and improve the story for reasoning with them. This feature is considered experimental at the moment, since there are still usability and performance concerns that deserve further contributions, but one can already explore the experience that we could expect in the long run.

Integration with GHC

In 2020 Liquid Haskell became a compiler plugin for GHC. It was hooked into the end of the type checking phase firstly to ensure it only runs on well-typed programs, and secondly, to ensure the plugin runs when GHC is only asked to typecheck the module but not to generate code, which was helpful to IDEs.

For a few technical reasons, the plugin was re-parsing and re-typechecking the module instead of using the abstract syntax tree (AST) that GHC handed to it as the result of type checking. That is no longer the case in the latest release, where the AST after type checking is now used for all purposes. In addition, there were several improvements to how the ghc library is used.

cvc5 support

Liquid Haskell offloads part of its reasoning to a family of automated theorem provers known as SMT solvers. For most developments, Liquid Haskell has been used with the Z3 SMT solver, and this is what has been used most of the time in continuous integration pipelines.

In theory, any SMT solver can be used with Liquid Haskell, if it provides a standard interface known as SMT-LIB. In practice, however, experiments are done with theories that are not part of the standard. For instance, the reasoning capabilities for bags, sets, and maps used to require z3. But now the latest release implements support for cvc5 as well.

Name resolution overhaul

Name resolution determines, for each name in a program, what is the definition that it refers to. Liquid Haskell, in particular, is responsible for resolving names that appear in specifications. This task was problematic when the programs it was asked to verify spanned many modules.

There were multiple kinds of names, each with their own name resolution rules, and names were resolved in different environments when verifying a module and when importing it elsewhere, not always yielding the same results, which often produced confusing errors.

Name resolution, however, was done all over the code base, and any attempt to rationalize it would require a few months of effort. I started such an epic last September, and managed to conclude it in February. These changes made it into the latest release together with an awful lot of side quests to simplify the existing code.

The road ahead

There is no coordinated roadmap for Liquid Haskell. Many of the contributions that it receives depend on the opportunities enabled by academic research or the needs of particular use cases.

On my side, I’m trying to improve the adoption of Liquid Haskell. Much of the challenge is reducing the number of common workarounds that the proficient Liquid Haskeller needs to employ today. For instance, supporting reflection of functions in local bindings would save the user the trouble of rewriting her programs to put the recursive functions at the top level. Repairing the support for type classes would allow functions to be verified even if they use type classes, which covers a large portion of Haskell code today. And without having defined a scope with precision yet, Liquid Haskell still needs to improve its user documentation, its error messages, and its tracing and logging.

The project is chugging along, though. It is making significant leaps in usability. The upgrade costs have been quantified for a few GHC releases, and no longer look like an unbounded risk. The number of external contributions increased last year, although we still have to see whether that is a trend. And there is no shortage of interest from academia and industrial interns.

Thanks to the many contributors for their work and their help during code reviews. I look forward to learning what makes it into the coming Liquid Haskell releases!

March 20, 2025 12:00 AM

March 18, 2025

Mark Jason Dominus

Baseball on the Moon

We want to adapt baseball to be played on the moon. Is there any way to make it work?

My first impression is: no, for several reasons.

The pitched ball will go a little faster (no air resistance) but breaking balls are impossible (ditto). So the batter will find it easier to get a solid hit. We can't fix this by moving the plate closer to the pitcher's rubber; that would expose both batter and pitcher to unacceptable danger. I think we also can't fix it by making the plate much wider.

Once the batter hits the ball, it will go a long long way, six times as far as a batted ball on Earth. In order for every hit to not be a home run, the outfield fence will have to be about six times as far away, so the outfield will be about thirty-six times as large. I don't think the outfielders can move six times as fast to catch up to it. Perhaps if there were 100 outfielders instead of only three?
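
(For what it's worth, the factors come straight from projectile motion: with no air resistance the range scales inversely with gravity, lunar gravity is about a sixth of Earth's, and the outfield area scales with the square of the distance.)

R = \frac{v^2 \sin 2\theta}{g}, \qquad g_{\text{moon}} \approx \tfrac{1}{6}\, g_{\text{earth}} \;\Rightarrow\; R_{\text{moon}} \approx 6\, R_{\text{earth}}, \qquad A_{\text{moon}} \approx 6^2\, A_{\text{earth}} = 36\, A_{\text{earth}}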

Fielding the ball will be more difficult. Note that even though the vacuum prevents the pitch from breaking, the batted ball can still take unexpected hops off the ground.

Having gotten hold of the ball, the outfielder will then need to throw it back to the infield. They will be able to throw it that far, but they probably won't be able to do it accurately enough for the receiving fielder to make the play at the base. More likely the outfielder will throw it wild.

I don't think this can be easily salvaged. People do love home runs, but I don't think they would love this. Games are too long already.

Well, here's a thought. What if instead of four bases, arranged in a 90-foot square, we had, I don't know, eight or ten, spaced correspondingly farther apart? More opportunities for outs on the basepaths, and also the middle bases would not be so far from the outfield. Instead of throwing directly to the infield, the outfielders would have a relay system where one outfielder would throw to another that was farther in, and perhaps one more, before reaching the infield. That might be pretty cool.

I think it's not easy to run fast on the Moon. On the Earth, a runner's feet are pushing against the ground many times each second. On the Moon, the runner is taking big leaps. They may only get in one-sixth as many steps over the same distance, which would give them much less opportunity to convert muscle energy into velocity. (Somewhat countervailing, though: no air resistance.) Runners would have to train specially to be able to leap accurately to the bases. Under standard rules, a runner who overshoots the base will land off the basepaths and be automatically out.

So we might expect to see the runner bounding toward first base. Then one of the thirty or so far-left fielders would get the ball, relay it to the middle-left fielder and then the near-left fielder who would make the throw back to first. The throw would be inaccurate because it has to traverse a very large infield, and the first baseman would have to go chasing after it and pick it up from foul territory. He can't get back to first base quickly enough, but that's okay, the pitcher has bounded over from the mound and is waiting near first base to make the force play. Maybe the runner isn't there yet because one of his leaps was too long and to take another he has to jump high into the air and come down again.

It would work better than Quidditch, anyway.

by Mark Dominus (mjd@plover.com) at March 18, 2025 03:12 PM

March 17, 2025

Haskell Interlude

63: Farhad Mehta

On this episode of the Haskell Interlude, Andres Löh and Mike Sperber are joined by Farhad Mehta, a professor at OST Rapperswil, and one of the organizers of ZuriHac. Farhad tells us about formal methods, building tunnels, the importance of education, and the complicated relationship between academia and industry.

by Haskell Podcast at March 17, 2025 02:00 PM

March 16, 2025

Magnus Therning

Using lens-aeson to implement FromJSON

At work I sometimes need to deal with large and deep JSON objects where I'm only interested in a few of the values. If all the interesting values are on the top level, then aeson has functions that make it easy to implement FromJSON's parseJSON (Constructors and accessors), but if the values are spread out then the functions in aeson come up a bit short. That's when I reach for lens-aeson, as lenses make it very easy to work with large structures. However, I've found that using its lenses to implement parseJSON becomes a lot easier with a few helper functions.

Many of the lenses produce results wrapped in Maybe, so the first function is one that transforms a Maybe a to a Parser a. Here I make use of Parser implementing MonadFail.

infixl 8 <!>
(<!>) :: (MonadFail m) => Maybe a -> String -> m a
(<!>) mv err = maybe (fail err) pure mv

In some code I wrote this week I used it to extract the user name out of a JWT produced by Keycloak:

instance FromJSON OurClaimsSet where
    parseJSON = ... $ \o -> do
        cs <- parseJSON o
        n <- o ^? key "preferred_username" . _String <!> "preferred username missing"
        ...
        pure $ OurClaimsSet cs n ...

Also, all the lenses start with a Value, and that makes the withX functions in aeson not a perfect fit. So I define variations of the withX functions, e.g.

withObjectV :: String -> (Value -> Parser a) -> Value -> Parser a
withObjectV s f = withObject s (f . Object)

That makes the full FromJSON instance for OurClaimsSet look like this:

instance FromJSON OurClaimsSet where
    parseJSON = withObjectV "OurClaimsSet" $ \o -> do
        cs <- parseJSON o
        n <- o ^? key "preferred_username" . _String <!> "name"
        let rs = o ^.. key "resource_access" . members . key "roles" . _Array . traverse . _String
        pure $ OurClaimsSet cs n rs

March 16, 2025 08:26 AM

March 15, 2025

Mark Jason Dominus

Hangeul sign-engraving machine

Last summer I was privileged to visit the glorious Letterpress Museum in Paju Book City, where I spent several hours and took a collection of photos that are probably not of interest to anyone but letterpress geeks, and perhaps not even to them.

Looking back at the photos it's not always clear to me why I took each one. But some of them I can remember. For example, this one:

This is not exactly letterpress. It is a device for engraving lettered signs on thin strips of metal or perhaps plastic. Happily I don't have to spend too much time explaining this because Marcin Wichary has just published an extensively-illustrated article about the Latin-script version. The only thing different about this one is the fonts, which are for writing Korean in Hangeul script rather than English in Latin script.

(Here's my real-quick summary. There is no ink. A stylus goes into the grooves of those brass templates. The stylus is attached with a pantograph to a router bit that rests on the object that the operator wants to engrave. When the operator moves the stylus in the template grooves, the router bit follows their motions and engraves matching grooves in the target object. By adjusting the pantograph, one can engrave letters that are larger or smaller than the templates.)

Hangeul has an alphabet of 24 letters, but there's a difficulty in adapting this engraving technique for written Hangeul: The letters aren't written in a simple horizontal row as European languages are. Instead, they are grouped into syllables of two or three letters. For example, consider the Korean word “문어”, pronounced (roughly) "moon-aw", which means “octopus”. This is made up of five letters ㅁㅜㄴㅇㅓ, but as you see they are arranged in two syllables 문 ("moon") and 어 ("aw"). So instead of twenty-four kinds of templates, one for each letter, the Korean set needs one for every possible syllable, and there are thousands of possible syllables.

Unicode gets around this by… sorry, Unicode doesn't get around it, they just allocate eleven thousand codepoints, one for each possible syllable. But for this engraving device, it would be prohibitively expensive to make eleven thousand little templates, then another eleven thousand spares, and impractical to sort and manage them in the shop. Instead there is a clever solution.
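
(An aside with the actual numbers, which I'm fairly confident of: the Unicode Hangul Syllables block composes 19 initial consonants × 21 vowels × 28 optional finals = 11,172 precomposed syllables, laid out consecutively from U+AC00, so building one is just arithmetic. A quick Haskell sketch:)

import Data.Char (chr)

-- Compose a precomposed Hangul syllable from standard jamo indices:
-- 19 initials x 21 vowels x 28 finals (index 0 = no final) = 11172.
composeSyllable :: Int -> Int -> Int -> Char
composeSyllable initial vowel final =
  chr (0xAC00 + (initial * 21 + vowel) * 28 + final)

-- 문 = ㅁ (initial 6) + ㅜ (vowel 13) + ㄴ (final 4), i.e. U+BB38
moon :: Char
moon = composeSyllable 6 13 4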

Take a look at just one of these templates:

This is not a Hangeul syllable.

Rather, it is five. The upper-right letter in the syllable is the vowel, and the template allows the operator to engrave any of the five vowels

ㅣㅓㅏㅕㅑ

to produce the syllables

잉 엉 앙 영 양

pronounced respectively "ing", "ông", "ang", "yông", and "yang".

Similarly this one can produce six different syllables:

The upper-left part can be used to engrave either of the consonants ㅅ or ㅈ, and the upper-right part can be used to engrave any of the vowels ㅣㅓㅏ, to produce the combined set 싱 성 상 징 정 장. I'm not sure why this template doesn't also enable the vowels ㅕㅑ as the other one did.

In the picture at top you can see that while the third template can be used to engrave any of the three syllables 송 승 숭, the operator has actually used it to engrave the first of these.

This ingenious mechanism cuts down the required number of templates by perhaps a factor of five, from ten boxes to two.

by Mark Dominus (mjd@plover.com) at March 15, 2025 09:58 PM

March 14, 2025

GHC Developer Blog

GHC 9.12.2 is now available

GHC 9.12.2 is now available

Ben Gamari - 2025-03-14

The GHC developers are happy to announce the release of GHC 9.12.2. Binary distributions, source distributions, and documentation are available at downloads.haskell.org.

We hope to have this release available via ghcup shortly. This is a small release fixing a critical code generation bug, #25653, affecting some subword division operations.

As always, GHC’s release status, including planned future releases, can be found on the GHC Wiki status page.

We would like to thank IOG, the Zw3rk stake pool, Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, the Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors who contribute their code, tickets, and energy to the GHC project.

As always, do give this release a try and open a ticket if you see anything amiss.

by ghc-devs at March 14, 2025 12:00 AM

March 13, 2025

Brent Yorgey

Hendrix College Programming Contest 2025

Hendrix College Programming Contest 2025

Posted on March 13, 2025

I haven’t written on here in a while, mostly because a lot of my time has gone into preparing for the second annual Hendrix College Programming Contest, which will take place this Saturday, March 15, from 12:30-5:30pm CDT (17:30-22:30 UTC).

I’ve created an open mirror contest which will run in parallel to the official contest, so if you want to grab some friends and try solving some of the problems together using your favorite language, be my guest!


by Brent Yorgey at March 13, 2025 12:00 AM

March 12, 2025

Well-Typed.Com

The Haskell Unfolder Episode 40: understanding through a model

Today, 2025-03-12, at 1930 UTC (12:30 pm PST, 3:30 pm EST, 7:30 pm GMT, 20:30 CET, …) we are streaming the 40th episode of the Haskell Unfolder live on YouTube.

The Haskell Unfolder Episode 40: understanding through a model

QuickCheck is useful for more than just testing. Comparing the behaviour of a system to a model can be used to check if a system under construction is working correctly, but it can also be used to better understand an already existing system. In this episode we show that this does not need to be very difficult, by designing a model that we can use to understand tensor convolutions in an existing large library.
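
The episode's example involves tensor convolutions in a large library; as a much smaller sketch of the same model-based idea (my own toy example, not the episode's code), here is a QuickCheck property comparing Data.Map against a plain association-list model:

import Data.Function (on)
import Data.List (nubBy)
import qualified Data.Map.Strict as M
import Test.QuickCheck

-- The model: an association list where inserting a key replaces any
-- previous binding for it.
modelInsert :: Int -> String -> [(Int, String)] -> [(Int, String)]
modelInsert k v kvs = (k, v) : filter ((/= k) . fst) kvs

-- The property: after the same insert, the real map and the model agree on
-- every lookup. (Keys are deduplicated first so both sides start from the
-- same unambiguous state.)
prop_insertMatchesModel :: Int -> String -> [(Int, String)] -> Int -> Bool
prop_insertMatchesModel k v kvs q =
    M.lookup q (M.insert k v (M.fromList kvs'))
      == lookup q (modelInsert k v kvs')
  where
    kvs' = nubBy ((==) `on` fst) kvs

main :: IO ()
main = quickCheck prop_insertMatchesModel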

About the Haskell Unfolder

The Haskell Unfolder is a YouTube series about all things Haskell hosted by Edsko de Vries and Andres Löh, with episodes appearing approximately every two weeks. All episodes are live-streamed, and we try to respond to audience questions. All episodes are also available as recordings afterwards.

We have a GitHub repository with code samples from the episodes.

And we have a public Google calendar (also available as ICal) listing the planned schedule.

There’s now also a web shop where you can buy t-shirts and mugs (and potentially in the future other items) with the Haskell Unfolder logo.

by andres, edsko at March 12, 2025 12:00 AM

March 10, 2025

Joachim Breitner

Extrinsic termination proofs for well-founded recursion in Lean

A few months ago I explained that one reason why this blog has become more quiet is that all my work on Lean is covered elsewhere.

This post is an exception, because it is an observation that is (arguably) interesting, but does not lead anywhere, so where else to put it than my own blog…

Want to share your thoughts about this? Please join the discussion on the Lean community zulip!

Background

When defining a function recursively in Lean with nested recursion, e.g. a recursive call that appears in the argument to a higher-order function like List.map, extra attention used to be necessary so that Lean can see that xs.map applies its argument only to elements of the list xs. The usual idiom is to write xs.attach.map instead, where List.attach attaches to the list elements a proof that they are in that list. You can read more about this in my Lean blog post on recursive definitions and in our new shiny reference manual; look for the example “Nested Recursion in Higher-order Functions”.

To make this step less tedious I taught Lean to automatically rewrite xs.map to xs.attach.map (where suitable) within the construction of well-founded recursion, so that nested recursion just works (issue #5471). We already do such a rewriting to change if c then … else … to the dependent if h : c then … else …, but the attach-introduction is much more ambitious (the rewrites are not definitionally equal, there are higher-order arguments etc.) Rewriting the terms in a way that we can still prove the connection later when creating the equational lemmas is hairy at best. Also, we want the whole machinery to be extensible by the user, setting up their own higher order functions to add more facts to the context of the termination proof.

I implemented it like this (PR #6744) and it ships with 4.18.0, but in the course of this work I thought about a quite different and maybe better™ way to do this, and well-founded recursion in general:

A simpler fix

Recall that to use WellFounded.fix

WellFounded.fix : (hwf : WellFounded r) (F : (x : α) → ((y : α) → r y x → C y) → C x) (x : α) : C x

we have to rewrite the functorial of the recursive function, which naturally has type

F : ((y : α) →  C y) → ((x : α) → C x)

to the one above, where all recursive calls take the termination proof r y x. This is a fairly hairy operation, mangling the type of matcher’s motives and whatnot.

Things are simpler for recursive definitions using the new partial_fixpoint machinery, where we use Lean.Order.fix

Lean.Order.fix : [CCPO α] (F : β → β) (hmono : monotone F) : β

so the functorial’s type is unmodified (here β will be ((x : α) → C x)), and everything else is in the propositional side-condition monotone F. For this predicate we have a syntax-guided compositional tactic, and it’s easily extensible, e.g. by

theorem monotone_mapM (f : γ → α → m β) (xs : List α) (hmono : monotone f) :
    monotone (fun x => xs.mapM (f x)) 

Once given, we don’t care about the content of that proof. In particular proving the unfolding theorem only deals with the unmodified F that closely matches the function definition as written by the user. Much simpler!

Isabelle has it easier

Isabelle also supports well-founded recursion, and has great support for nested recursion. And it’s much simpler!

There, all you have to do to make nested recursion work is to define a congruence lemma of the right form; for List.map, something like our List.map_congr_left:

List.map_congr_left : (h : ∀ a ∈ l, f a = g a) :
    List.map f l = List.map g l

This is because in Isabelle, too, the termination proof is a side-condition that essentially states “the functorial F calls its argument f only on smaller arguments”.

Can we have it easy, too?

I had wished we could do the same in Lean for a while, but that form of congruence lemma just isn’t strong enough for us.

But maybe there is a way to do it, using an existential to give a witness that F can alternatively be implemented using the more restrictive argument. The following callsOn P F predicate can express that F calls its higher-order argument only on arguments that satisfy the predicate P:

section setup

variable {α : Sort u}
variable {β : α → Sort v}
variable {γ : Sort w}

def callsOn (P : α → Prop) (F : (∀ y, β y) → γ) :=
  ∃ (F': (∀ y, P y → β y) → γ), ∀ f, F' (fun y _ => f y) = F f

variable (R : α → α → Prop)
variable (F : (∀ y, β y) → (∀ x, β x))

local infix:50 " ≺ " => R

def recursesVia : Prop := ∀ x, callsOn (· ≺ x) (fun f => F f x)

noncomputable def fix (wf : WellFounded R) (h : recursesVia R F) : (∀ x, β x) :=
  wf.fix (fun x => (h x).choose)

def fix_eq (wf : WellFounded R) h x :
    fix R F wf h x = F (fix R F wf h) x := by
  unfold fix
  rw [wf.fix_eq]
  apply (h x).choose_spec

This allows nice compositional lemmas to discharge callsOn predicates:

theorem callsOn_base (y : α) (hy : P y) :
    callsOn P (fun (f : ∀ x, β x) => f y) := by
  exists fun f => f y hy
  intros; rfl

@[simp]
theorem callsOn_const (x : γ) :
    callsOn P (fun (_ : ∀ x, β x) => x) :=
  ⟨fun _ => x, fun _ => rfl⟩

theorem callsOn_app
    {γ₁ : Sort uu} {γ₂ : Sort ww}
    (F₁ :  (∀ y, β y) → γ₂ → γ₁) -- can this also support dependent types?
    (F₂ :  (∀ y, β y) → γ₂)
    (h₁ : callsOn P F₁)
    (h₂ : callsOn P F₂) :
    callsOn P (fun f => F₁ f (F₂ f)) := by
  obtain ⟨F₁', h₁⟩ := h₁
  obtain ⟨F₂', h₂⟩ := h₂
  exists (fun f => F₁' f (F₂' f))
  intros; simp_all

theorem callsOn_lam
    {γ₁ : Sort uu}
    (F : γ₁ → (∀ y, β y) → γ) -- can this also support dependent types?
    (h : ∀ x, callsOn P (F x)) :
    callsOn P (fun f x => F x f) := by
  exists (fun f x => (h x).choose f)
  intro f
  ext x
  apply (h x).choose_spec

theorem callsOn_app2
    {γ₁ : Sort uu} {γ₂ : Sort ww}
    (g : γ₁ → γ₂ → γ)
    (F₁ :  (∀ y, β y) → γ₁) -- can this also support dependent types?
    (F₂ :  (∀ y, β y) → γ₂)
    (h₁ : callsOn P F₁)
    (h₂ : callsOn P F₂) :
    callsOn P (fun f => g (F₁ f) (F₂ f)) := by
  apply_rules [callsOn_app, callsOn_const]

With this setup, we can have the following, possibly user-defined, lemma expressing that List.map calls its arguments only on elements of the list:

theorem callsOn_map (δ : Type uu) (γ : Type ww)
    (P : α → Prop) (F : (∀ y, β y) → δ → γ) (xs : List δ)
    (h : ∀ x, x ∈ xs → callsOn P (fun f => F f x)) :
    callsOn P (fun f => xs.map (fun x => F f x)) := by
  suffices callsOn P (fun f => xs.attach.map (fun ⟨x, h⟩ => F f x)) by
    simpa
  apply callsOn_app
  · apply callsOn_app
    · apply callsOn_const
    · apply callsOn_lam
      intro ⟨x', hx'⟩
      dsimp
      exact (h x' hx')
  · apply callsOn_const

end setup

So here is the (manual) construction of a nested map for trees:

section examples

structure Tree (α : Type u) where
  val : α
  cs : List (Tree α)

-- essentially
-- def Tree.map (f : α → β) : Tree α → Tree β :=
--   fun t => ⟨f t.val, t.cs.map Tree.map⟩)
noncomputable def Tree.map (f : α → β) : Tree α → Tree β :=
  fix (sizeOf · < sizeOf ·) (fun map t => ⟨f t.val, t.cs.map map⟩)
    (InvImage.wf (sizeOf ·) WellFoundedRelation.wf) <| by
  intro ⟨v, cs⟩
  dsimp only
  apply callsOn_app2
  · apply callsOn_const
  · apply callsOn_map
    intro t' ht'
    apply callsOn_base
    -- ht' : t' ∈ cs -- !
    -- ⊢ sizeOf t' < sizeOf { val := v, cs := cs }
    decreasing_trivial

end examples

This makes me happy!

All details of the construction are now contained in a proof that can proceed by a syntax-driven tactic and that’s easily and (likely robustly) extensible by the user. It also means that we can share a lot of code paths (e.g. everything related to equational theorems) between well-founded recursion and partial_fixpoint.

I wonder if this construction is really as powerful as our current one, or if there are certain (likely dependently typed) functions where this doesn’t fit, but the β above is dependent, so it looks good.

With this construction, functions defined by well-founded recursion will reduce even worse in the kernel, I assume. This may be a good thing.

The cake is a lie

What unfortunately kills this idea, though, is the generation of the functional induction principles, which I believe is not (easily) possible with this construction: The functional induction principle is proved by massaging F to return a proof, but since the extra assumptions (e.g. for ite or List.map) only exist in the termination proof, they are not available in F.

Oh wey, how anticlimactic.

PS: Path dependencies

Curiously, if we didn’t have functional induction at this point yet, then very likely I’d change Lean to use this construction, and then we’d either not get functional induction, or it would be implemented very differently, maybe a more syntactic approach that would re-prove termination. I guess that’s called path dependence.

by Joachim Breitner (mail@joachim-breitner.de) at March 10, 2025 05:47 PM

March 08, 2025

Mark Jason Dominus

Claude and Merle Miller let me down

Claude

My relationship with Claude has its ups and downs, and I'm still trying to figure out what to use it for and what not. It's great at rewriting my job application cover letters to sound less like an awkward nerd.

Last week I was in the supermarket and decided to try asking it something I've been wondering for a long time:

MJD: What is the song "Mercy Street” about? What is the connection with Anne Sexton? Who is "Mary"?

I thought Claude might do well with this. I had had a conversation with it a while back about Pixies songs, which I was satisfied with. But this time Claude let me down:

Claude: … The "Mary" referenced in the song is likely Mary Gray Staples, who was the name of Sexton's childhood doll. …

(I thought: What? Am I supposed to believe that

Dreaming of the tenderness
The tremble in the hips
Of kissing Mary's lips

is about a doll?)

MJD: Mary Gray Staples is an unusual name for a doll. Are you sure about that?

Claude: You're right to question this - I made an error. Mary Gray was actually Anne Sexton's great-aunt, not the name of her doll. … The doll references in Gabriel's song likely connect to Sexton's imagery in other poems, but Mary specifically refers to her great-aunt.

MJD: I don't remember a doll in Gabriel's song.

Claude: You're absolutely right - I apologize for the error. There isn't actually a reference to a doll in Peter Gabriel's "Mercy Street" lyrics. … The "Mary" referenced in the song is indeed Anne Sexton's great-aunt Mary Gray…

MJD: Are you sure? Wikipedia says that Mary Gray Staples was the birth name of Sexton's mother.

Claude: You're absolutely right, and I apologize for the confusion. …

Claude just flubbed over and over.

I wonder if the grammatical error in “Mary Gray Staples, who was the name of …” is a kind of a tell? Perhaps Claude is fabricating, by stitching together parts of two unrelated sentences that it read somewhere, one with “Mary Gray Staples, who was…” and the other “… was the name of…”? Probably it's not that simple, but the grammatical error is striking.

Anyway, this was very annoying because I tend to remember things like this long past the time when I remember where I heard them. Ten years from now I might remember that Anne Sexton once had a doll with a very weird name.

Merle Miller

A while back I read Merle Miller's book Plain Speaking. It's an edited digest of a series of interviews Miller did with former President Truman in 1962, at his home in Independence, Missouri. The interviews were originally intended to be for a TV series, but when that fell through Miller turned them into a book. In many ways it's a really good book. I enjoyed it a lot, read it at least twice, and a good deal of it stuck in my head.

But I can't recommend it, because it has a terrible flaw. There have been credible accusations that Miller changed some of the things that Truman said, embellished or rephrased many others, that he tarted up Truman's language, and that he made up some conversations entirely.

So now whenever I remember something that I think Truman said, I have to stop and try to remember if it was from Miller. Did Truman really say that it was the worst thing in the world when records were destroyed? I'm sure I read it in Miller, so, uhh… maybe?

Miller recounts a discussion in which Truman says he is pretty sure that President Grant had never read the Constitution. Later, Miller says, he asked Truman if he thought that Nixon had read the Constitution, and reports that Truman's reply was:

I don't know. I don't know. But I'll tell you this. If he has, he doesn't understand it.

Great story! I have often wanted to repeat it. But I don't, because for all I know it never happened.

(I've often thought of this, in years past, and whatever Nixon's faults you could at least wonder what the answer was. Nobody would need to ask this about the current guy, because the answer is so clear.)

Miller quotes Truman's remarks about Supreme Court Justice Tom Clark, “It isn't so much that he's a bad man. It's just that he's such a dumb son of a bitch.” Did Truman actually say that? Did he just imply it? Did he say anything like it? Uhhh… maybe?

There's a fun anecdote about the White House butler learning to make an Old-fashioned cocktail in the way the Trumans preferred. (The usual recipe involves whiskey, sugar, fresh fruit, and bitters.) After several attempts the butler converged on the Trumans' preferred recipe, of mostly straight bourbon. Hmm, is that something I heard from Merle Miller? I don't remember.

There's a famous story about how Paul Hume, music critic for the Washington Post, savaged a performance by Truman's daughter Margaret, and how Truman sent him an infamous letter, very un-presidential, that supposedly contained the paragraph:

Some day I hope to meet you. When that happens you'll need a new nose, a lot of beef steak for black eyes, and perhaps a supporter below!

Miller reports that he asked Truman about this, and Truman's blunt response: “I said I'd kick his nuts out.” Or so claims Miller, anyway.

I've read Truman's memoirs. Volume I, about the immediate postwar years, is fascinating; Volume II is much less so. They contain many detailed accounts of the intransigence of the Soviets and their foreign minister Vyacheslav Molotov, namesake of the Molotov Cocktail. Probably 95% of what I remember Truman saying is from those memoirs, direct from Truman himself. But some of it must be from Plain Speaking. And I don't know any longer which 5% it is.

As they say, an ice cream sundae with a turd in it isn't 95% ice cream, it's 100% shit. Merle Miller shit in the ice cream sundae of my years of reading of Truman and the Truman administrations.

Now Claude has done the same. And if I let it, Claude will keep doing it to me. Claude caga en la leche.

Addendum

The Truman Library now has the recordings of those interviews available online. I could conceivably listen to them all and find out for myself which things went as Miller said.

So there may yet be a happy ending, thanks to the Wonders of the Internet! I dream of someday going through those interviews and producing an annotated edition of Plain Speaking.

by Mark Dominus (mjd@plover.com) at March 08, 2025 02:07 AM

March 07, 2025

Mark Jason Dominus

Reflector grids

Around here, these metal things are commonly found on streetside utility poles, attached maybe a meter off the ground.

a yellow rectangular grid has been nailed to a wooden utility pole about a meter off the ground, curving around the pole. It has four rows of ten rectangular holes punched out of it.
Metal reflector
A very similar-looking grid on a different pole
Plastic reflector

When I first noticed one of these I said “I wonder what the holes are for. Maybe to make it more visible? And what do they do with all the leftover rectangles after they've made one?”

I eventually got a better idea: The little metal rectangles are the primary product, and after they have been die-cut out of the metal sheet, there is this waste material left over with all the holes. Instead of throwing it away someone nails it to a utility pole to make the pole easier to see at night. I felt a bit silly that my first idea had been exactly backwards.

I later learned that only the older ones are made of sheet metal. Newer ones are made of some sort of plastic, maybe polyethylene or vinyl or something, about the same thickness. They look pretty much the same. I can only tell them apart by feeling them.

Still I wondered what the little rectangles had been used for. It turns out that the purpose is this:

a wooden utility pole, attached to which is a metal frame enclosing six of those little yellow rectangles. Each has been embossed with a black numeral, and together the six rectangles announce the pole's ID number.

That's according to an old Philadelphia Inquirer article, Why yellow grids are on some Philly-area utility poles. (Patricia Madej, Aug. 31, 2019.) But I measured them to make sure. They matched.


The answer came as a bit of a surprise to Jay Lipschutz, 73, of Northeast Philly …

His wife, Ruth, he said, had insisted they’re reflectors for drivers to see. She was right.

Jay, my friend, your wife is smarter than you are. Listen to her.

The article also tells us that the rectangular leftover is called a “grid reflector”. With a little more research I learned that one manufacturer of grid reflectors is Almetek. They cost $3.50 each. Pricey, for something they would have had to throw away. (Here's the old South Philly Review article that put me on to Almetek.)

What kicked off this article was that I was walking around and I saw this similar reflector grid, which felt to me like it was a bit of a farce, like a teenager sneaking into a bar wearing a fake mustache:

This one is a solid sheet of plastic, nailed to the pole. Instead of forty little rectangular cutouts, it has forty black rectangles that only look like cutouts.

Hey, those aren't holes! When I saw this one I wondered for a moment if I was suffering some sort of mental collapse, or if none of the others had had real holes either. But no, they had, and this one really did have fake holes.

(Also, it has been installed sideways. Normally they are oriented as the two above.)

This isn't the first time I have written about ID numbers on utility poles hereabouts.

by Mark Dominus (mjd@plover.com) at March 07, 2025 12:37 AM

March 06, 2025

in Code

Sum Types and Subtypes and Unions

There’s yet again been a bit of functional programming-adjacent twitter drama recently, but it’s actually touched on some subtleties about sum types that I am asked about (and think about) a lot nowadays. So, I’d like to take this opportunity to talk a bit about the “why” and nature of sum types, how to use them effectively, how they contrast with other related concepts in programming and software development, and even the cases where sum types aren’t the best option.

Sum Types at their Best

The quintessential sum type that you just can’t live without is Maybe, now adopted in a lot of languages as Optional:

data Maybe a = Nothing | Just a

If you have a value of type Maybe Int, it means that its valid values are Nothing, Just 0, Just 1, etc.

This is also a good illustration to why we call it a “sum” type: if a has n possible values, then Maybe a has 1 + n: we add the single new value Nothing to it.

The “benefit” of the sum type is illustrated pretty clearly here too: every time you use a value of type Maybe Int, you are forced to consider the fact that it could be Nothing:

showMaybeInt :: Maybe Int -> String
showMaybeInt = \case
  Nothing -> "There's nothing here"
  Just i -> "Something is here: " <> show i

That’s because sum types are usually implemented in a way that forces you to handle each case exhaustively. Otherwise, sum types are much less useful.

At the most fundamental level, this behaves like a compiler-enforced null check, but built within the language in user-space instead of being compiler magic, ad-hoc syntax, or static analysis — and the fact that it can live in user-space is why it’s been adopted so widely. At a higher level, functional abstractions like Functor, Applicative, Monad, Foldable, Traversable allow you to use a Maybe a like just a normal a with the appropriate semantics, but that’s a topic for another time (like 2014).
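
As a tiny illustration of what those abstractions buy you (my own example, not from the original post): Functor lets you transform the value if it is there, and Monad chains steps that may each fail, short-circuiting on the first Nothing.

bumpIfPresent :: Maybe Int -> Maybe Int
bumpIfPresent = fmap (+ 1)   -- fmap (+1) (Just 41) == Just 42; fmap (+1) Nothing == Nothing

lookupBoth :: Eq k => k -> k -> [(k, v)] -> Maybe (v, v)
lookupBoth k1 k2 kvs = do
  v1 <- lookup k1 kvs        -- a Nothing here aborts the whole computation
  v2 <- lookup k2 kvs
  pure (v1, v2)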

This power is very special to me on a personal level. I remember many years ago on my first major haskell project changing a type from String to Maybe String, and then GHC telling me every place in the codebase where something needed to change in order for things to work still. Coming from dynamically typed languages in the past, this sublime experience truly altered my brain chemistry and Haskell-pilled me for the rest of my life. I still remember the exact moment, what coffee shop I was at, what my order was, the weather that day … it was truly the first day of the rest of my life.

It should be noted that I don’t consider sum types a “language feature” or a compiler feature as much as I’d consider them a design pattern. Languages that don’t have sum types built-in can usually implement them using typed unions and an abstract visitor pattern interface (more on that later). Of course, having a way to “check” your code before running it (like with a type system or statically verified type annotations) does make a lot of the features much more useful.

Anyway, this basic pattern can be extended to include more error information in your Nothing branch, which is how you get the Either e a type in the Haskell standard library, or the Result<T,E> type in rust.
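
For instance, here is a small sketch of carrying a reason in the failure branch (the error type is made up for illustration):

data ParseError = EmptyInput | NotANumber String
  deriving Show

parseNat :: String -> Either ParseError Int
parseNat "" = Left EmptyInput
parseNat s
  | all (`elem` ['0' .. '9']) s = Right (read s)
  | otherwise                   = Left (NotANumber s)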

Along different lines, we have the common use case of defining syntax trees:

data Expr =
    Lit Int
  | Negate Expr
  | Add Expr Expr
  | Sub Expr Expr
  | Mul Expr Expr

eval :: Expr -> Int
eval = \case
    Lit i -> i
    Negate x -> -(eval x)
    Add x y -> eval x + eval y
    Sub x y -> eval x - eval y
    Mul x y -> eval x * eval y

pretty :: Expr -> String
pretty = go 0
  where
    wrap :: Int -> Int -> String -> String
    wrap prio opPrec s
      | prio > opPrec = "(" <> s <> ")"
      | otherwise = s
    go prio = \case
        Lit i -> show i
        Negate x -> wrap prio 2 $ "-" <> go 2 x
        Add x y -> wrap prio 0 $ go 0 x <> " + " <> go 1 y
        Sub x y -> wrap prio 0 $ go 0 x <> " - " <> go 1 y
        Mul x y -> wrap prio 1 $ go 1 x <> " * " <> go 2 y

main :: IO ()
main = do
    putStrLn $ pretty myExpr
    print $ eval myExpr
  where
    myExpr = Mul (Negate (Add (Lit 4) (Lit 5))) (Lit 8)
-(4 + 5) * 8
-72

Now, if we add a new command to the sum type, the compiler enforces us to handle it.

data Expr =
    Lit Int
  | Negate Expr
  | Add Expr Expr
  | Sub Expr Expr
  | Mul Expr Expr
  | Abs Expr

eval :: Expr -> Int
eval = \case
    Lit i -> i
    Negate x -> -(eval x)
    Add x y -> eval x + eval y
    Sub x y -> eval x - eval y
    Mul x y -> eval x * eval y
    Abs x -> abs (eval x)

pretty :: Expr -> String
pretty = go 0
  where
    wrap :: Int -> Int -> String -> String
    wrap prio opPrec s
      | prio > opPrec = "(" <> s <> ")"
      | otherwise = s
    go prio = \case
        Lit i -> show i
        Negate x -> wrap prio 2 $ "-" <> go 2 x
        Add x y -> wrap prio 0 $ go 0 x <> " + " <> go 1 y
        Sub x y -> wrap prio 0 $ go 0 x <> " - " <> go 1 y
        Mul x y -> wrap prio 1 $ go 1 x <> " * " <> go 2 y
        Abs x -> wrap prio 2 $ "|" <> go 0 x <> "|"

Another example where sum types shine is as clearly-defined APIs between processes. For example, we can imagine a “command” type that sends different types of commands with different payloads. This can be interpreted as perhaps the result of parsing command line arguments or the message in some communication protocol.

For example, you could have a protocol that launches and controls processes:

data Command a =
    Launch String (Int -> a)    -- ^ takes a name, returns a process ID
  | Stop Int (Bool -> a)        -- ^ takes a process ID, returns success/failure

launch :: String -> Command Int
launch nm = Launch nm id

stop :: Int -> Command Bool
stop pid = Stop pid id

This ADT is written in the “interpreter” pattern (used often with things like free monad), where any arguments not involving a are the command payload, and any X -> a represents that the command can respond with an X.
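
(A small aside that isn't in the original code: the a parameter is precisely what makes Command a Functor, which is the ingredient needed if you want to drop it into a free monad.)

-- Hedged sketch: a Functor instance for the two-constructor Command above.
instance Functor Command where
  fmap f (Launch nm next) = Launch nm (f . next)
  fmap f (Stop pid next)  = Stop pid (f . next)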

Let’s write a sample interpreter backing the state in an IntMap in an IORef:

import qualified Data.IntMap as IM
import Data.IntMap (IntMap)

runCommand :: IORef (IntMap String) -> Command a -> IO a
runCommand ref = \case
    Launch newName next -> do
        currMap <- readIORef ref
        let newId = case IM.lookupMax currMap of
              Nothing -> 0
              Just (i, _) -> i + 1
        modifyIORef ref $ IM.insert newId newName
        pure (next newId)
    Stop procId next -> do
        existed <- IM.member procId <$> readIORef ref
        modifyIORef ref $ IM.delete procId
        pure (next existed)

main :: IO ()
main = do
    ref <- newIORef IM.empty
    aliceId <- runCommand ref $ launch "alice"
    putStrLn $ "Launched alice with ID " <> show aliceId
    bobId <- runCommand ref $ launch "bob"
    putStrLn $ "Launched bob with ID " <> show bobId
    success <- runCommand ref $ stop aliceId
    putStrLn $
      if success
        then "alice succesfully stopped"
        else "alice unsuccesfully stopped"
    print =<< readIORef ref
Launched alice with ID 0
Launched bob with ID 1
alice successfully stopped
fromList [(1, "bob")]

Let’s add a command to “query” a process id for its current status:

data Command a =
    Launch String (Int -> a)    -- ^ takes a name, returns a process ID
  | Stop Int (Bool -> a)        -- ^ takes a process ID, returns success/failure
  | Query Int (String -> a)     -- ^ takes a process ID, returns a status message

query :: Int -> Command String
query pid = Query pid id

runCommand :: IORef (IntMap String) -> Command a -> IO a
runCommand ref = \case
    -- ...
    Query procId next -> do
        procName <- IM.lookup procId <$> readIORef ref
        pure case procName of
          Nothing -> "This process doesn't exist, silly."
          Just n -> "Process " <> n <> " chugging along..."

Relationship with Unions

To clarify a common confusion: sum types can be described as “tagged unions”: you have a tag to indicate which branch you are on (which can be case-matched on), and then the rest of your data is conditionally present.

In many languages this can be implemented under the hood as a struct with a tag and a union of data, along with some abstract visitor pattern interface to ensure exhaustiveness.

Remember, it’s not exactly a union; consider, for instance, a type like:

data Entity = User Int | Post Int

An Entity here could represent a user at a user id, or a post at a post id. If we considered it purely as a union of Int and Int:

union Entity {
    int user_id;
    int post_id;
};

we’d lose the ability to branch on whether we have a user or a post. If we add back the tag, we recover the original sum type semantics:

struct Entity {
    bool is_user;
    union {
        int user_id;
        int post_id;
    } payload;
};

Of course, you still need an abstract interface like the visitor pattern to actually be able to use this as a sum type with guarantees that you handle every branch, but that’s a story for another day. Alternatively, if your language supports dynamic dispatch nicely, that’s another underlying implementation that would work to back a higher-level visitor pattern interface.
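
As a sketch of what such a visitor interface amounts to, here is the Entity example phrased in Haskell itself via a Scott/Church-style encoding (my own illustration, not from the post): the value carries its own exhaustive matcher, so every consumer is forced to handle both branches.

{-# LANGUAGE RankNTypes #-}

newtype Entity = Entity
  { matchEntity :: forall r. (Int -> r)   -- handler for the User branch
                          -> (Int -> r)   -- handler for the Post branch
                          -> r
  }

user, post :: Int -> Entity
user uid = Entity $ \onUser _onPost -> onUser uid
post pid = Entity $ \_onUser onPost -> onPost pid

describe :: Entity -> String
describe e = matchEntity e
    (\uid -> "user #" ++ show uid)
    (\pid -> "post #" ++ show pid)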

Subtypes Solve a Different Problem

Now, sum types aren’t exactly a part of common programming education curriculum, but subtypes and supertypes definitely were drilled into every CS student’s brain and waking nightmares from their first year.

Informally (a la Liskov), B is a subtype of A (and A is a supertype of B) if anywhere that expects an A, you could also provide a B.

In normal object-oriented programming, this often shows up in early lessons as Cat and Dog being subclasses of an Animal class, or Square and Circle being subclasses of a Shape class.

When people first learn about sum types, there is a tendency to understand them as similar to subtyping. This is unfortunately understandable, since a lot of introductions to sum types often start with something like

-- | Bad Sum Type Example!
data Shape = Circle Double | Rectangle Double Double

While there are situations where this might be a good sum type (e.g., for an API specification or a state machine), at face value this is a bad example for illustrating the sum types vs. subtyping distinction.

You might notice the essential “tension” of the sum type: you declare all of your options up-front, but the functions that consume your value are open and declared ad hoc. And, if you add new options, all of the consuming functions must be adjusted.

So, subtypes (and supertypes) are more effective when they lean into the opposite end: the universe of possible options is open and declared ad hoc, but the consuming functions are closed. And, if you add new functions, all of the members must be adjusted.

In typed languages with a concept of “objects” and “classes”, subtyping is often implemented using inheritance and interfaces.

interface Widget {
    void draw();
    void handleEvent(String event);
    String getName();
}

class Button implements Widget {
    // ..
}

class InputField implements Widget {
    // ..
}

class Box implements Widget {
    // ..
}

So, a function like processWidget(Widget widget) that expects a Widget could be passed a Button, an InputField, or a Box. And, if you had a container like List<Widget>, you could assemble a structure using Button, InputField, and Box. A perfect Liskov storm.

In typical library design, you're able to add new implementations of Widget as an open universe easily: anyone who imports Widget can define one and immediately use it with functions taking Widgets. But, if you ever wanted to add new functionality to the Widget interface, that would be a breaking change to all downstream implementations.

However, this implementation of subtyping, while prevalent, is the most mind-numbingly boring realization of the concept, and it pained my soul to even spend time talking about it. So let's jump into the more interesting way that subtype and supertype relationships manifest in the only language where anything is interesting: Haskell.

Subtyping via Parametric Polymorphism

In Haskell, subtyping is implemented in terms of parametric polymorphism and sometimes typeclasses. This allows us to work nicely with the concept of functions and APIs as subtypes and supertypes of each other.

For example, let’s look at a function that takes indexers and applies them:

sumAtLocs :: ([Double] -> Int -> Double) -> [Double] -> Double
sumAtLocs ixer xs = ixer xs 1 + ixer xs 2 * ixer xs 3
ghci> sumAtLocs (!!) [1,2,3,4,5]
14.0

So, what functions could you pass to sumAtLocs? Can you only pass [Double] -> Int -> Double?

Well, not quite. Look at the above where we passed (!!), which has type forall a. [a] -> Int -> a!

In fact, what other types could we pass? Here are some examples:

fun1 :: [a] -> Int -> a
fun1 = (!!)

fun2 :: [a] -> Int -> a
fun2 xs i = reverse xs !! i

fun3 :: (Foldable t, Floating a) => t a -> Int -> a
fun3 xs i = if length xs > i then toList xs !! i else pi  -- 'toList' from Data.Foldable

fun4 :: Num a => [a] -> Int -> a
fun4 xs i = sum (take i xs)

fun5 :: (Integral b, Num c) => a -> b -> c
fun5 xs i = fromIntegral i

fun6 :: (Foldable t, Fractional a, Integral b) => t a -> b -> a
fun6 xs i = sum xs / fromIntegral i

fun7 :: (Foldable t, Integral b, Floating a) => t a -> b -> a
fun7 xs i = logBase (fromIntegral i) (sum xs)

What’s going on here? Well, the function expects a [Double] -> Int -> Double, but there are a lot of other types that could be passed instead.

At first this might seem like meaningless semantics or trickery, but it’s deeper than that: remember that each of the above types actually has a very different meaning and different possible behaviors!

  1. forall a. [a] -> Int -> a means that the a must come from the given list. In fact, any function with that type is guaranteed to be partial: if you pass it an empty list, there is no a available to use.
  2. forall a. Num a => [a] -> Int -> a means that the result might actually come from outside of the list: the implementation could always return 0 or 1, even if the list is empty. It also guarantees that it will only add, subtract, multiply, or abs: it will never divide.
  3. forall a. Fractional a => [a] -> Int -> a means that we could possibly do division on the result, but we can’t do anything “floating” like square rooting or logarithms.
  4. forall a. Floating a => [a] -> Int -> a means that we can possibly start square rooting or taking the logarithms of our input numbers.
  5. [Double] -> Int -> Double gives us the least guarantees about the behavior: the result could come from thin air (and not be a part of the list), and we can even inspect the machine representation of our inputs.

So, we have all of these types with completely different semantics and meanings. And yet, they can all be passed to something expecting a [Double] -> Int -> Double. That means that they are all subtypes of [Double] -> Int -> Double! [Double] -> Int -> Double is a supertype that houses multitudes of possible values, uniting all of the possible values and semantics into one big supertype.
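
For instance, trying a couple of these in ghci (using the definitions above; the results are Doubles because sumAtLocs fixes the element type):

ghci> sumAtLocs fun2 [1,2,3,4,5]
10.0
ghci> sumAtLocs fun4 [1,2,3,4,5]
19.0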

Through the power of parametric polymorphism and typeclasses, you can actually create an extensible hierarchy of supertypes, not just of subtypes.

Consider a common API for JSON serialization. You could have multiple functions that serialize into JSON:

fooToJson :: Foo -> Value
barToJson :: Bar -> Value
bazToJson :: Baz -> Value

Through typeclasses, you can create:

toJSON :: ToJSON a => a -> Value

The type of toJSON :: forall a. ToJSON a => a -> Value is a subtype of Foo -> Value, Bar -> Value, and Baz -> Value, because everywhere you would want a Foo -> Value, you could give toJSON instead. Every time you want to serialize a Foo, you could use toJSON.

This usage works well, as it gives you an extensible abstraction to design code around. When you write code polymorphic over Monoid a, it forces you to reason about your values with respect to only the aspects relating to monoidness. If you write code polymorphic over Num a, it forces you to reason about your values only with respect to how they can be added, subtracted, negated, or multiplied, instead of having to worry about things like their machine representation.

The extensibility comes from the fact that you can create even more supertypes of forall a. ToJSON a => a -> Value easily, just by defining a new typeclass instance. So, if you need a MyType -> Value, you could make it a supertype of toJSON :: ToJSON a => a -> Value by defining an instance of the ToJSON typeclass, and now you have something you can use in its place.
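
As a small sketch (assuming the aeson library; MyType is a made-up example type), defining the instance is all it takes for toJSON to become usable wherever a MyType -> Value is expected:

{-# LANGUAGE DeriveGeneric #-}

import Data.Aeson (ToJSON, Value, toJSON)
import GHC.Generics (Generic)

-- A hypothetical type we want to serialize.
data MyType = MyType { name :: String, count :: Int }
  deriving (Generic)

-- The generic default implementation fills in 'toJSON' for us.
instance ToJSON MyType

-- Now 'toJSON' can stand in for a 'MyType -> Value'.
myTypeToJson :: MyType -> Value
myTypeToJson = toJSON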

Practically this is used by many libraries. For example, ad uses it for automatic differentiation: its diff function looks scary:

diff :: (forall s. AD s ForwardDouble -> AD s ForwardDouble) -> Double -> Double

But it relies on the fact that (forall s. AD s ForwardDouble -> AD s ForwardDouble) is a supertype of (forall a. Floating a => a -> a), (forall a. Num a => a -> a), etc., so you can give it functions like \x -> x * x (which is a forall a. Num a => a -> a) and it will work as that AD s type:

ghci> diff (\x -> x * x) 10
20      -- 2*x

This “numeric overloading” method is used by libraries for GPU programming, as well, to accept numeric functions to be optimized and compiled to GPU code.

Another huge application is in the lens library, which uses subtyping to unite its hierarchy of optics.

For example, an Iso is a subtype of Lens, which is a subtype of Traversal, and a Traversal is in turn a subtype of Fold and Setter, etc. In the end the system even allows you to use id from the Prelude as a lens or a traversal, because the type signature of id :: a -> a is actually a subtype of all of those types!
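
For instance (a small sketch, assuming the lens library), id can be handed directly to lens combinators, acting as a lens from a value to itself:

import Control.Lens (over, toListOf, view)

-- id :: a -> a specializes to a Lens' a a:
viewed :: Int
viewed = view id 5          -- 5

bumped :: Int
bumped = over id (+ 1) 5    -- 6

folded :: [Int]
folded = toListOf id 5      -- [5]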

Subtyping using Existential Types

What more closely matches the spirit of subtypes in OOP and other languages is the existential type: a value that can be a value of any type matching some interface.

For example, let’s imagine a value that could be any instance of Num:

data SomeNum = forall a. Num a => SomeNum a

someNums :: [SomeNum]
someNums = [SomeNum (1 :: Int), SomeNum (pi :: Double), SomeNum (0xfe :: Word)]

This is somewhat equivalent to Java's List<MyInterface> or List<MyClass>, or Python's list[MyClass].
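
Consuming such a list looks like this (a small sketch; doubleEach is a name made up for this example). Inside the pattern match, all we know about x is that it has a Num instance, so Num operations are all we can use:

-- Combine each packed value with itself using only 'Num' operations.
doubleEach :: [SomeNum] -> [SomeNum]
doubleEach = map (\(SomeNum x) -> SomeNum (x + x))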

Note that to use this effectively in Haskell with superclasses and subclasses, you need to manually wrap and unwrap:

data SomeFractional = forall a. Fractional a => SomeFractional a

castUp :: SomeFractional -> SomeNum
castUp (SomeFractional x) = SomeNum x

So, SomeNum is “technically” a supertype of SomeFractional: everywhere a SomeNum is expected, a SomeFractional can be given…but in Haskell it’s a lot less convenient because you have to explicitly cast.

In OOP languages, you can often cast “down” using runtime reflection (SomeNum -> Maybe SomeFractional). However, this is impossible in Haskell the way we have written it!

castDown :: SomeNum -> Maybe SomeFractional
castDown = error "impossible!"

That’s because of type erasure: Haskell does not (by default) couple a value at runtime with all of its associated interface implementations. When you create a value of type SomeNum, you are packing an untyped pointer to that value as well as a “dictionary” of all the functions you could use it with:

data NumDict a = NumDict
    { (+) :: a -> a -> a
    , (*) :: a -> a -> a
    , negate :: a -> a
    , abs :: a -> a
    , fromInteger :: Integer -> a
    }

mkNumDict :: Num a => NumDict a
mkNumDict = NumDict (+) (*) negate abs fromInteger

data FractionalDict a = FractionalDict
    { numDict :: NumDict a
    , (/) :: a -> a -> a
    , fromRational :: Rational -> a
    }

-- | Essentially equivalent to the previous 'SomeNum'
data SomeNum = forall a. SomeNum
    { numDict :: NumDict a
    , value :: a
    }

-- | Essentially equivalent to the previous 'SomeFractional'
data SomeFractional = forall a. SomeFractional
    { fractionalDict :: FractionalDict a
    , value :: a
    }

castUp :: SomeFractional -> SomeNum
castUp (SomeFractional (FractionalDict {numDict}) x) = SomeNum numDict x

castDown :: SomeNum -> Maybe SomeFractional
castDown (SomeNum nd x) = error "not possible!"

All of these function pointers essentially exist at runtime inside the SomeNum. So, SomeFractional can be “cast up” to SomeNum by simply dropping the FractionalDict. However, you cannot “cast down” from SomeNum because there is no way to materialize the FractionalDict: the association from type to instance is lost at runtime. OOP languages usually get around this by having the value itself hold pointers to all of its interface implementations at runtime. However, in Haskell, we have type erasure by default: there are no tables carried around at runtime.2

In the end, existential subtyping requires explicit wrapping/unwrapping instead of the implicit or lightweight casting possible in OOP languages optimized around this sort of behavior.3 Existential-based subtyping is just less common in Haskell because parametric polymorphism offers a solution to most similar problems. For more on this, Simon Peyton Jones has a nice lecture on the topic.

The pattern of using existentially quantified data in a container (like [SomeNum]) is often called the “widget pattern” because it's used in libraries like xmonad to allow extensible “widgets” stored alongside the methods used to manipulate them. It's more common to explicitly store the handler functions (a “dictionary”) inside the type instead of using existential typeclasses, but sometimes it can be nice to let the compiler handle generating and passing your method tables implicitly for you. Using existential typeclasses instead of explicit dictionaries also allows you to bless certain methods and functions as “canonical” to your type, and the compiler will make sure they are always coherent.

I do mention in a blog post about different types of existential lists, however, that this “container of instances” type is much less useful in Haskell than in other languages for many reasons, including the up/downcasting issues mentioned above. In addition, Haskell gives you a whole wealth of functionality for operating over homogeneous containers (like [a], where all items have the same type) that you give up by jumping to heterogeneous lists.

Aside

Let’s briefly take a moment to talk about how typeclass hierarchies give us subtle subtype/supertype relationships.

Let’s look at the classic Num and Fractional:

class Num a

class Num a => Fractional a

Num is a superclass of Fractional, and Fractional is a subclass of Num. Everywhere a Num constraint is required, you can provide a Fractional constraint to do the same thing.

However, in these two types:

Num a => a
Fractional a => a

forall a. Num a => a is actually a subtype of forall a. Fractional a => a! That's because if you need a forall a. Fractional a => a, you can provide a forall a. Num a => a instead. In fact, let's look at three levels: Double, forall a. Fractional a => a, and forall a. Num a => a.

-- can be used as `Double`
1.0 :: Double
1.0 :: Fractional a => a
1 :: Num a => a

-- can be used as `forall a. Fractional a => a`
1.0 :: Fractional a => a
1 :: Num a => a

-- can be used as `forall a. Num a => a`
1 :: Num a => a

So, Double is a supertype of forall a. Fractional a => a, which is in turn a supertype of forall a. Num a => a.
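
One way to see this concretely is with a rank-2 function (a small sketch assuming RankNTypes; needsFractional is a name made up for this example): it demands a forall a. Fractional a => a, and a Num-polymorphic value is happily accepted in that position.

{-# LANGUAGE RankNTypes #-}

needsFractional :: (forall a. Fractional a => a) -> Double
needsFractional x = x / 2

okay :: Double
okay = needsFractional 1   -- 1 :: Num a => a is accepted here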

The general idea here is that the more super- you go, the more you “know” about the actual term you are creating. So, with Num a => a, you know the least (and, you have the most possible actual terms because there are more instances of Num than of Fractional). And, with Double, you know the most: you even know its machine representation!

So, Num is a superclass of Fractional but forall a. Num a => a is a subtype of forall a. Fractional a => a. This actually follows the typical rules of subtyping: if something appears on the “left” of an arrow (=> in this case), it gets flipped from sub- to super-. We often call the left side a “negative” (contravariant) position and the right side a “positive” position, because a negative of a negative (the left side of a left side, like a in (a -> b) -> c) is a positive.

Also note that our “existential wrappers”:

data SomeNum = forall a. Num a => SomeNum a
data SomeFractional = forall a. Fractional a => SomeFractional a

can be CPS-transformed to their equivalent types:

type SomeNum' = forall r. (forall a. Num a => a -> r) -> r
type SomeFractional' = forall r. (forall a. Fractional a => a -> r) -> r

toSomeNum' :: SomeNum -> SomeNum'
toSomeNum' (SomeNum x) f = f x

toSomeNum :: SomeNum' -> SomeNum
toSomeNum sn = sn SomeNum

And in those cases, Num and Fractional again appear in the covariant (positive) position, since they're the negative of a negative. So, this aligns with our intuition that SomeFractional is a subtype of SomeNum.
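
And in the CPS form, the upcast needs no re-wrapping at all: it is just a matter of handing the Num-expecting continuation to the Fractional-packing value (castUp' is a name made up for this sketch; it assumes RankNTypes and the type synonyms above):

-- Upcasting in CPS form: the Num-expecting continuation 'k' is
-- handed to 'sf', which supplies it a Fractional (hence Num) value.
castUp' :: SomeFractional' -> SomeNum'
castUp' sf k = sf (\x -> k x)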

The Expression Problem

The tension I described earlier is closely related to the expression problem, and it is inherent to many different aspects of language and abstraction design. However, in the context laid out in this post, it serves as a good general guide for deciding which pattern to reach for:

  • If you expect a canonical set of “inhabitants” and an open set of “operations”, sum types can suit that end of the spectrum well.
  • If you expect a canonical set of “operations” and an open set of “inhabitants”, consider subtyping and supertyping.

I don’t really think of the expression problem as a “problem” in the sense of “some hindrance to deal with”. Instead, I see it in the “math problem” sort of way: by adjusting how you approach things, you can play with the equation make the most out of what requirements you need in your design.

Looking Forward

A lot of frustration in Haskell (and programming in general) lies in trying to force abstraction and tools to work in a way they weren’t meant to. Hopefully this short run-down can help you avoid going against the point of these design patterns and start making the most of what they can offer. Happy Haskelling!

Special Thanks

I am very humbled to be supported by an amazing community, who make it possible for me to devote time to researching and writing these posts. Very special thanks to my supporter at the “Amazing” level on Patreon, Josh Vera! :)


  1. ?↩︎

  2. Most OOP languages also have mechanisms for type erasure, but the default is unerased, which is the opposite of Haskell.↩︎

  3. Note that there are current GHC proposals that attempt to allow “naked” existentials without newtype wrappers, so we could actually get the same seamless and implicit up-casting as we would get in OOP languages. However, the jury is out on whether or not this is a good idea.↩︎

by Justin Le at March 06, 2025 05:25 PM

Tweag I/O

Bazel and Testwell CTC++, revisited

A while ago, we wrote a post on how we helped a client initially integrate the Testwell CTC++ code coverage tool from Verifysoft into their Bazel build.

Since then, some circumstances have changed, and we were recently challenged to see if we could improve the CTC++/Bazel integration to the point where CTC++ coverage builds could enjoy the same benefits of Bazel caching and incremental rebuilds as regular (non-coverage) builds. Our objective was to make it feasible for developers to do coverage builds with CTC++ locally, rather than using different coverage tools or delaying coverage testing altogether. This would enable the client to focus their efforts on improving overall test coverage with CTC++ as their only coverage tool.

In this sequel to the initial integration, we have come up with a more involved scheme for making CTC++ meet Bazel's expectations of hermetic and reproducible build actions. Considerable extra complexity is needed to make this work, but the result is a typical speedup of 5-10x on most coverage builds: the kind of speedup that not only makes your CI faster, but also allows developers to work in an altogether different and more efficient way.

More generally, we hope this blog post can serve as a good example (or maybe a cautionary tale 😉) of how to take a tool that does not play well with Bazel’s idea of a well-behaved build step, and force it into a shape where we can still leverage Bazel’s strengths.

The status quo

You can read our previous blog post for more details, but here we’ll quickly summarize the relevant bits of the situation after our initial integration of CTC++ coverage builds with Bazel:

  • CTC++ works by wrapping the compiler invocation with its ctc tool, and adding coverage instrumentation between the preprocessing and compiling steps.
  • In addition to instrumenting the source code itself, ctc also writes instrumentation data in a custom text format (aka. symbol data) to a separate output file, typically called MON.sym (aka. the symbol file).
  • At runtime the instrumented unit tests will collect coverage statistics and write these (in binary form) to another separate output file: MON.dat.
  • As far as Bazel is concerned, both the MON.sym and MON.dat files are untracked side-effects of the respective compilation and testing steps. As such we had to poke a hole in the Bazel sandbox and arrange for these files to be written to a persistent location without otherwise being tracked or managed by Bazel.
  • More importantly, these side-effects mean that we have to disable all caching and re-run the entire build and all tests from scratch every single time. Otherwise, we would end up with incomplete MON.sym and MON.dat files.

Another consideration - not emphasized in our previous post since we had to disable caching of intermediate outputs in any case - is that the outputs from ctc are not hermetic and reproducible. Both the instrumentation that is added to the source code and the symbol file that is written separately by ctc contain the following information, collected at compile time:

  • Absolute paths to source code files: Even though Bazel passes relative paths on the command-line, ctc will still resolve these into absolute paths and record these paths into its outputs. Since all these build steps run inside the Bazel sandbox, the recorded paths vary arbitrarily from build to build. Even worse: the paths are made invalid as soon as the sandbox is removed, when the compilation step is done.
  • Timestamps: ctc will also record timestamps into the instrumented source code and the symbol file. As far as we know, these might have been part of some internal consistency check in previous versions of CTC++, but currently they are simply copied into the final report, and displayed as a property of the associated symbol data on which the HTML report is based. Since our coverage reports are already tied to known Git commits in the code base, these timestamps have no additional value for us.
  • Fingerprints: ctc calculates a 32-bit fingerprint based on the symbol data, and records this fingerprint into both the instrumented source and the symbol file. Since the symbol data already contains absolute path names as detailed above, the resulting fingerprint will also vary accordingly, and thus not be reproducible from one build to the next, even when all other inputs remain unchanged.

Outlining the problems to be solved

If we are to make CTC++ coverage builds quicker by leveraging the Bazel cache, we must answer these two questions:

  1. Can we make ctc’s outputs reproducible? Without this, re-enabling the Bazel cache for these builds is a non-starter, as each re-evaluation of an intermediate build step will have never-before-seen action inputs, and none of the cached outputs from previous builds will ever get reused.
  2. Can we somehow capture the extra MON.sym output written by ctc at build time, and appropriately include it into Bazel’s build graph?1 We need for Bazel to cache and reuse the symbol data associated with a compilation unit in exactly the same way that it would cache and reuse the object file associated with the same compilation unit.

Solving both of these would allow us to achieve a correct coverage report assembled from cached object files and symbol data from previously-built and unchanged source code, together with newly-built object files and symbol data from recently-changed source code (in addition to the coverage statistics collected from re-running all tests).

Achieving reproducibility

Let’s tackle the problem of making ctc’s outputs reproducible first. We start by observing that ctc allows us to configure hook scripts that will be invoked at various points while ctc is running. We are specifically interested in:

  • RUN_AFTER_CPP, allows access to the preprocessed source before the instrumentation step, and
  • RUN_AFTER_INSTR, allows access to the instrumented source before it’s passed on to the underlying compiler.

From our existing work, we of course also have our own wrapper script around ctc, which allows us to access the outputs of each ctc invocation before they are handed back to Bazel. We also know, from our previous work, that we can instruct ctc to write a separate symbol file per compilation unit, rather than have all compilation units append to the same MON.sym file.

Together this allows us to rewrite the outputs from ctc in such a way as to make them reproducible. What we want to rewrite, has already been outlined above:

  • Absolute paths into the sandbox: We could rewrite these into corresponding absolute paths to the original source tree instead, but we can just as well take it one step further and simply strip the sandbox root directory prefix from all absolute paths. This turns them into relative paths that happen to resolve correctly, whether they’re taken relative to the sandbox directory at compile time, or relative to the root of the source tree afterwards.
  • Timestamps: This one is relatively easy, we just need to decide on a static timestamp that does not change across builds. For some reason the CTC++ report tooling did not like us passing the ultimate default timestamp, aka. the Unix Epoch, so we instead settled for midnight on January 1 2024.2
  • Fingerprints: Here we need to calculate a 32-bit value that will reflect the complete source code in this compilation unit (but importantly with transient sandbox paths excluded). We don’t have direct access to the in-progress symbol data that ctc uses to calculate its own fingerprint, so instead we settle on calculating a CRC32 checksum across the entire preprocessed source code (before ctc adds its own instrumentation).3

Once we’ve figured out what to rewrite, we can move on to the how:

  1. Using the RUN_AFTER_CPP option to ctc, we can pass in a small script that calculates our new fingerprint by running the preprocessed source code through CRC32.
  2. Using the RUN_AFTER_INSTR option to ctc, we can pass in a script that processes the instrumented source, line by line:
    • rewriting any absolute paths that point into the Bazel sandbox,
    • rewriting the timestamp recorded by ctc into our static timestamp, and
    • rewriting the fingerprint to the one calculated in step 1.
  3. In our script that wraps the ctc invocation, we can insert the above two options on the ctc command line. We can also instruct ctc to write a separate .sym file for this compilation unit inside the sandbox.
  4. In the same wrapper script, after ctc is done producing the object file and symbol file for a compilation unit, we can now rewrite the symbol file that ctc produced. The rewrites are essentially the same as performed in step 2, although the syntax of the symbol file is different than the instrumented source.

At this point, we have managed to make ctc’s outputs reproducible, and we can proceed to looking at the second problem from above: properly capturing and maintaining the symbol data generated by ctc. However, we have changed the nature of the symbol data somewhat: Instead of having multiple compilation units write to the same MON.sym file outside of the sandbox, we now have one .sym file per compilation unit written inside the sandbox. These files are not yet known to Bazel, and would be removed together with the rest of the sandbox as soon as the compilation step is finished.

Enabling correct cache/reuse of symbol data

What we want to achieve here is for the symbol data associated with a compilation unit to closely accompany the corresponding object file from the same compilation unit: If the object file is cached and later reused by Bazel, we want the symbol file to be treated the same. And when the object file is linked into an executable or a shared library, we want the symbol file to automatically become part of any coverage report that is later created based on running code from that executable or library.

I suspect there are other ways we could handle this, for example using Bazel aspects, or similar, but since we’re already knee-deep in compiler wrappers and rewriting outputs…

In for a penny, in for a pound…

Given that we want the symbol file to be as closely associated with the object file as possible, let's take that to the ultimate conclusion and make it a stowaway inside the object file. After all, the object file is “just” an ELF file, and it does not take too much squinting to regard the ELF format as a generic container of sections, where a section really can be any piece of data you like.

The objcopy tool, part of the GNU binutils tool suite, also comes to our aid with options like --add-section and --dump-section to help us embed and extract such sections from any ELF file.

With this in hand, we can design the following scheme:

  1. In our wrapper script, after ctc has generated an object file with an accompanying symbol file, we run objcopy --add-section ctc_sym=$SYMBOL_FILE $OBJECT_FILE to embed the symbol file as a new ctc_sym section inside the object file.
  2. We make no changes to our Bazel build, otherwise. We merely expect Bazel to collect, cache, and reuse the object files as it would do with any intermediate build output. The symbol data is just along for the ride.
  3. In the linking phase (which is already intercepted by ctc and our wrapper script) we can forward the symbol data from the linker inputs (ELF object files) into the linker output (a shared library or executable, also in the ELF format), like this: Extract the ctc_sym from each object file passed as input (objcopy --dump-section ctc_sym=$SYMBOL_FILE $OBJECT_FILE /dev/null), then concatenate these symbol files together, and finally embed that into the ELF output file from the linker.4
  4. At test run time, in addition to running the tests (which together produce MON.dat as a side effect), we can iterate over the test executables and their shared library dependencies, and extract any ctc_sym sections that we come across. These are then split into separate symbol files and placed next to MON.dat.
  5. Finally, we can pass MON.dat and all the .sym files on to the ctcreport report generator to generate the final HTML report.5

Results

With all of the above in place, we can run coverage builds with and without our changes, while testing various build scenarios, to see what we have achieved.

Let’s look at some sample build times for generating CTC++ coverage reports. All times below are taken from the best of three runs, all on the same machine.

Status quo

Starting with the situation as of our previous blog post:

Scope of coverage build + tests | bazel build/test | ctcreport | Total
Entire source tree              | 38m46s           | 2m06s     | 44m26s
One large application           | 13m59s           | 43s       | 15m30s
One small application           | 21s              | 1s        | 35s

Since caching is intentionally disabled and there is no reuse between these coverage builds, these are the kinds of numbers you will get, no matter the size of your changes since the last coverage build.

Let’s look at the situation after we made the changes outlined above.

Worst case after our changes: No cache to be reused

First, for a new coverage build from scratch (i.e. a situation in which there is nothing that can be reused from the cache):

Scope of coverage build + tests | bazel build/test | ctcreport | Total  | Speedup
Entire source tree              | 38m48s           | 1m59s     | 43m03s | 1.0x
One large application           | 13m04s           | 43s       | 14m26s | 1.1x
One small application           | 19s              | 1s        | 22s    | 1.6x

As expected, these numbers are very similar to the status quo. After all, we are doing the same amount of work, and this is not the scenario we sought to improve in any case.

There is maybe a marginal improvement in the overhead (i.e. the time spent between/around bazel and ctcreport), but it’s pretty much lost in the noise, and certainly nothing worth writing a blog post about.

Best case after our changes: Rebuild with no changes

This is the situation where we are now able to reuse already-instrumented intermediate build outputs. In fact, in this case there are no changes whatsoever, and Bazel can reuse the test executables from the previous build directly, no (re-)building necessary. However, as discussed above, we do need to re-run all tests and then re-generate the coverage report:

Scope of coverage build + tests | bazel build/test | ctcreport | Total | Speedup
Entire source tree              | 3m24s            | 1m58s     | 6m55s | 6.4x
One large application           | 1m31s            | 42s       | 2m49s | 5.5x
One small application           | 1s               | 1s        | 4s    | 8.8x

Common case after our changes: Rebuild with limited change set

This last table is in many ways the most interesting (but least accurate), as it tries to reflect the common case that most developers are interested in:

“I’ve made a few changes to the source code, how long will I have to wait to see the updated coverage numbers?�

Of course, as with a regular build, it depends on the size of your changes, and the extent to which they cause misses in Bazel's build cache. Here, I've made a small source code change that causes rebuilds in a handful of compilation units:

Scope of coverage build + tests | bazel build/test | ctcreport | Total | Speedup
Entire source tree              | 3m23s            | 1m57s     | 6m54s | 6.4x
One large application           | 1m34s            | 42s       | 2m52s | 5.4x
One small application           | 4s               | 1s        | 6s    | 5.8x

The expectation here would be that the total time needed is the sum of how long it takes to do a regular build of your changes, plus the numbers from the no-op case above. And this seems to largely hold true, especially for the single-application case, where we expect your changes to affect the application's unit tests, and therefore the build phase must strictly precede the test runs.

In the full source tree scenario, it seems that Bazel can start running other (unrelated) tests concurrently with building your changes, and as long as your changes, and the tests on which they depend, are not among the slowest tests to run, then those other, slower tests will “hide” the marginal build time cost imposed by your changes.

Conclusion

We have achieved what we set out to do: to leverage the Bazel cache to avoid unnecessary re-building of coverage-instrumented source code. It involves a fair amount of added complexity in the build process, in order to make CTC++’s outputs reproducible, and thus reusable by Bazel, but the end result, in the common case - a developer making a small source code change relative to a previous coverage build - is a 5-10x speedup of the total time needed to build and test with coverage instrumentation, including the generation of the final coverage report.

Future work

A natural extension of the above scheme is to apply a similar treatment to the generation of the coverage statistics at test runtime: Bazel allows for test runs to be cached, so that later build/test runs can reuse the results and logs from earlier test runs, rather than having to re-run tests that haven’t changed.

However, in much the same way as for symbol data at build time, we would need to make sure that coverage statistics (.dat files) were saved and reused along with the corresponding test run results/logs.

One could imagine each test creating a separate .dat file when run, and then have Bazel cache this together with the test logs. The report generation phase would then need to collect the .dat files from both the reused/cached and the new/uncached test runs, and pass them all to the ctcreport tool. Failure to do so correctly would cause coverage statistics to be lost, and the resulting coverage report would be misleading.

With all this in place we could then enable caching of test results (in practice, removing the --nocache_test_results flag that we currently pass), and enjoy yet another speedup courtesy of Bazel’s cache.

That said, we are entering the realm of diminishing returns: Unit tests - once they are built - typically run quickly, and there is certainly less time to be saved here than what is saved by reusing cached build results. Looking at the above numbers: even if we were able to fully eliminate time used by bazel test, we would still only achieve another 2x speedup, theoretically.

For now, we can live with re-running all tests from scratch in order to create a complete MON.dat file, every time.

And that is where I believe it stops: extending this even further to incrementally generate the coverage report itself, in effect to re-generate parts of the report based on a few changed inputs, is - as far as I can see - not possible with the existing tools.

Finally, I want to commend Verifysoft for their understanding and cooperation. I can only imagine that for someone not used to working with Bazel, our initial questions must have seemed very eccentric. They were, however, eager to understand our situation and find a way to make CTC++ work for us. They have even hinted at including a feature in a future version of CTC++ to allow shortening/mapping paths at instrumentation time. Using such a feature to remove the sandbox paths would also have the nice side effect of making CTC++’s own fingerprint logic reproducible, as far as we can see. Together, this would enable us to stop rewriting paths and fingerprints on our own.

Thanks to Mark Karpov for being my main co-conspirator in coming up with this scheme, and helping to work out all the side quests and kinks along the way.

Also thanks to Christopher Harrison, Joseph Neeman, and Malte Poll for their reviews of this article.


  1. For now, we ignore the non-hermetic writing of MON.dat files. See the section on future work for how tackling this properly is in many ways similar (and similarly complex) to what we’re doing for the CTC++ symbol data in the rest of this article.↩
  2. On reconsideration, we should probably have used the somewhat standardized $SOURCE_DATE_EPOCH environment variable here rather than coming up with our own static date. In practice, it should not matter.↩
  3. In later talks with Verifysoft, we have been given the OK that this fingerprint scheme should be sufficient for our purpose, at least until a new version of CTC++ that allows for more reproducible fingerprints is available.↩
  4. It seems that - by default - the linker is doing almost exactly what we want: The ctc_sym sections from the linker inputs are indeed automatically concatenated into the linker output. However, the linker appears to discard sections from inputs that are completely optimized away at link time. But we do in fact want these symbol data sections to be retained, otherwise the final coverage report would omit the corresponding source files rather than showing them as lacking test coverage. Hence we resort to maintaining the ctc_sym section ourselves at link time.↩
  5. As an extra sanity check, ctcreport will verify that the fingerprints from inside the given .sym files match the corresponding fingerprints recorded alongside the coverage statistics in the MON.dat file. Thus we can discover if we’ve messed up somewhere along the way.↩

March 06, 2025 12:00 AM

March 01, 2025

Mark Jason Dominus

Jonathan Chait

[ Content warning: angry, contemptuous ranting that accomplishes nothing. ]

I didn't really know who Jonathan Chait was until last week when I unfortunately read this essay of his (from February 2016) on “Why Liberals Should Support a Trump Republican Nomination”.

I've said a lot of dumb things in my life but I don't think I've ever been as wrong about anything as Chait was about this. I sure hope I haven't. But if I do ever find out I had been this wrong about something, I would want to retire to a cave or a mountaintop or something.

“Hey, remember Dominus? Whatever happened to him, anyway?”

“Oh, he said he was going away to cleanse himself of error, and might not be back for a long time.”

And yet this guy is still shamelessly writing. And why not? Editors are still buying his essays and maybe people are even still reading them. Why? You'd think that people would look at this essay and say “yeah, that's enough Chait for me, thanks, next time I need an opinion I'll try someone else.” I get it, nobody's right all the time. Whenever you read anyone's essay you're taking a risk, like rolling a die. Sometimes the die rolls high, sometimes it rolls low, and some dice might have higher numbers to begin with. I've usually been well-served by Daniel Dennett's dice, and Robertson Davies'.

But here people have an opportunity to toss a totally unknown die that they haven't tried before but that most likely rolls numbers from 1 to 6, and instead they toss the Jonathan Chait die when they know it has at least one side with a -1000.

A long time ago, I wrote:

I hate trying to predict the future; I don't think I'm good at it and I don't think anyone else is.

I don't think anyone could have predicted the extent of the current fiasco, but I do think it should not have been hard to predict, in 2016, that liberals should not, in fact, have supported a Trump Republican nomination.

Anyone can be wrong, even the wise cannot see all ends. But I think this one was maybe not so hard to see. Chait spends a lot of time comparing Trump with Arnold Schwarzenegger: both nominally conservative, both inexperienced in government, both assholes. I think the part that Chait ignored was that by 2016 — no, scratch that, by 1990 — it was perfectly clear that Trump was a liar, a thief, a racist, and a deadbeat, and that he had no respect for law or truth or ethics or anything other than his own convenience of the moment. (Here are just two examples. More recently, his ridiculous years-long insistence that Barack Obama was born in Kenya. And earlier, his equally ridiculous lies around his destruction of the Bonwit Teller building.)

In that old essay I said:

Most people who try don't seem to revisit their old predictions to see if they were correct, or to learn from their past errors, and the people who listen to them never do this.

I looked around a little to see if Jonathan Chait had written an essay titled “I was wrong, I was so, so wrong, I just couldn't have been wronger” but I didn't find one and I also didn't find any recent essays that said anything like “here's why I think this new essay is more reliable than that embarrassing Trump one I wrote for The New Yorker in 2016.”

I don't understand how Chait still has a job after writing this essay. Why isn't he selling shoes? How does a writer come back from this? Isn't there some charitable society for the protection for the public that could pay to have someone follow Chait around, quoting out loud from this essay, as a warning to everyone he meets for the rest of his life?

At least now I've been warned. Now when I read “Jonathan Chait said recently…” I'll remember: “Oh, you mean Jonathan ‘Why Liberals Should Support a Trump Republican Nomination’ Chait! Thanks, I'll pass.”

by Mark Dominus (mjd@plover.com) at March 01, 2025 12:00 PM

February 27, 2025

Russell O'Connor

The Right Number of YubiKeys is Three

Four years ago I bought a pair of YubiKey 5s: One YubiKey 5 Nano, which fits in my laptop’s USB slot, and another YubiKey 5 NFC as backup, which sat in my home office. However, I kept worrying about what happens if my house burns down or something, taking both my laptop and office YubiKeys together at the same time. On the other hand, if I stored my YubiKey 5 NFC offsite, then whenever I needed to register a new FIDO service, I would need to go fetch the key, update it, and then return it. Based on my personal experience, even if that were not a big pain, the "return it" step often gets delayed indefinitely because it feels so low priority.

Then I read a popular comment made on Hacker News: Get three YubiKeys. Suddenly everything clicked! I bought a second YubiKey 5 NFC last year.

Now, I keep a second YubiKey 5 NFC offsite, in addition to the one in my laptop and the one in my office. If my home burns down, I still have an offsite YubiKey available. But the best thing about having a second YubiKey 5 NFC is that it partly mitigates the offsite update problem.

In the previous scenario, we required potentially two trips offsite to update the backup YubiKey. However, now the procedure to register a new FIDO service is to first update the office YubiKey 5 NFC key (and the YubiKey 5 Nano). Then, at your earliest convenience, you swap the office YubiKey 5 NFC key with the offsite YubiKey 5 NFC. When you get the offsite YubiKey home, you update it with the new FIDO service and then it becomes the new office YubiKey. There is no need to return to the offsite location.

Part of the issue is that there is no "public FIDO key", like there is with a "public PGP key". You need the actual YubiKey in hand to register it with a FIDO service, no matter whether it is a discoverable credential or not. If you were only using the YubiKey as an OpenPGP smart card, then perhaps you could get away with just having a local key and an offsite key. Even still, I would recommend a third YubiKey so that whenever the time comes to do some operation on your offsite key, you can perform the same swapping trick.

The title of this article says that three is the right number of YubiKeys. However, this is because I only have one Nano, in my laptop, because that is my primary computing interface. I do have a desktop computer that I mostly only access as a remote server. If you have multiple computer devices that you regularly use, it would make sense to have a YubiKey Nano device in each of them. And in addition to those, have one offsite key, and one local key for swapping with the offsite key.

February 27, 2025 01:25 AM

Tweag I/O

Evaluating the evaluators: know your RAG metrics

Retrieval-augmented generation (RAG) is about providing large language models with extra context to help them produce more informative responses. Like any machine learning application, your RAG app needs to be monitored and evaluated to ensure that it continues to respond accurately to user queries. Fortunately, the RAG ecosystem has developed to the point where you can evaluate your system in just a handful of lines of code. The outputs of these evaluations are easily interpretable: numbers between 0 and 1, where higher numbers are better. Just copy our sample code below, paste it into your continuous monitoring system, and you’ll be looking at nice dashboards in no time. So that’s it, right?

Well, not quite. There are several common pitfalls in RAG evaluation. From this blog post, you will learn what the metrics mean and how to check that they’re working correctly on your data with our field-gained knowledge. As they say, “forewarned is forearmed”!

Background

If you’re new to RAG evaluation, our previous posts about it give an introduction to evaluation and discuss benchmark suites. For now, you just need to know that a benchmark suite consists of a collection of questions or prompts, and for each question establishes:

  • a “ground truth” context, consisting of documents from our database that are relevant for answering the question; and
  • a “ground truth” answer to the question.

For example

Query | Ground truth context | Ground truth answer
What is the capital of France? | Paris, the capital of France, is known for its delicious croissants. | Paris
Where are the best croissants? | Lune Croissanterie, in Melbourne, Australia, has been touted as ‘the best croissant in the world.’ | Melbourne

Then the RAG system provides (for each question):

  • a “retrieved” context — the documents that our RAG system thought were relevant — and
  • a generated answer.
[Figure: The inputs to a RAG evaluator]

Example

Here’s an example that uses the Ragas library to evaluate the “faithfulness” (how well the response was supported by the context) of a single RAG output, using an LLM from AWS Bedrock:

from langchain_aws import ChatBedrockConverse
from ragas import EvaluationDataset, evaluate
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import Faithfulness

# In real life, this probably gets loaded from an internal file (and hopefully
# has more than one element!)
eval_dataset = EvaluationDataset.from_list([{
    "user_input": "What is the capital of France?",
    "retrieved_contexts": ["Berlin is the capital of Germany."],
    "response": "I don't know.",
}])

# The LLM to use for computing metrics (more on this below).
model = "anthropic.claude-3-haiku-20240307-v1:0"
evaluator = LangchainLLMWrapper(ChatBedrockConverse(model=model))
print(evaluate(dataset=eval_dataset, metrics=[Faithfulness(llm=evaluator)]))

If you paid close attention in the previous section, you’ll have noticed that our evaluation dataset doesn’t include all of the components we talked about. That’s because the “faithfulness” metric only requires the retrieved context and the generated answer.

RAG evaluation metrics

There are a variety of RAG evaluation metrics available; to keep them straight, we like to use the RAG Triad, a helpful system of categorizing some RAG metrics. A RAG system has one input (the query) and two outputs (the context and the response), and the RAG Triad lets us visualize the three interactions that need to be evaluated.

[Figure: The RAG triad]

Evaluating retrieval

Feeding an LLM with accurate and relevant context can help it respond well; that’s the whole idea of RAG. Your system needs to find that relevant context, and your evaluation system needs to figure out how well the retrieval is working. This is the top-right side of the RAG Triad: evaluating the relationship between the query and the retrieved context. The two main retrieval metrics are precision and recall; each one has a classical definition, plus an “LLM-enhanced” definition for RAG. Roughly, “good precision” means that we don’t return irrelevant information, while “good recall” means that we don’t miss any relevant information. Let’s say that each of our benchmark queries is labelled with a ground truth set of relevant documents, so that we can check how many of the retrieved documents are relevant.

Then the classical precision and recall are

\[
\text{precision} = \frac{\text{\# relevant retrieved docs}}{\text{\# retrieved docs}}
\qquad
\text{recall} = \frac{\text{\# relevant retrieved docs}}{\text{\# relevant docs in the database}}
\]
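
For instance, if a query retrieves five documents, four of which are relevant, and the database contains eight relevant documents in total, then:

\[
\text{precision} = \frac{4}{5} = 0.8
\qquad
\text{recall} = \frac{4}{8} = 0.5
\]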

These metrics are well-established, useful, and easy to compute. But in a RAG system, the database might be large, uncurated, and contain redundant documents. For example, suppose you have ten related documents, each containing an answer to the query. If your retrieval system returns just one of them then it will have done its job adequately, but it will only receive a 10% recall score. With a large database, it’s also possible that there’s a document with the necessary context that wasn’t tagged as relevant by the benchmark builder. If the retrieval system finds that document, it will be penalized in the precision score even though the document is relevant.

Because of these issues with classical precision and recall, RAG evaluations often adapt them to work on statements instead of documents. We list the statements in the ground-truth context and in the retrieved context; we call a retrieved statement “relevant” if it was present in the ground-truth context.

\[
\text{precision} = \frac{\text{\# relevant retrieved statements}}{\text{\# retrieved statements}}
\qquad
\text{recall} = \frac{\text{\# relevant retrieved statements}}{\text{\# ground truth statements}}
\]

This definition of precision and recall is better tailored to RAG than the classical one, but it comes with a big disadvantage: you need to decide what a “statement” is, and whether two statements are “the same.” Usually you’ll want to automate this decision with an LLM, but that raises its own issues with cost and reliability. We’ll say more about that later.

Evaluating generation

Once your retrieval is working well — with continuous monitoring and evaluation, of course — you’ll need to evaluate your generation step. The most commonly used metric here is faithfulness1, which measures whether a generated answer is factually supported by the retrieved context; this is the bottom side of the RAG Triad. To calculate faithfulness, we count the number of factual claims in the generated answer, and then decide which of them is supported by the context. Then we define

\[
\text{faithfulness} = \frac{\text{\# context-supported statements}}{\text{\# statements}}
\]

Like the RAG-adapted versions of context precision and recall, this is a statement-based metric. To automate it, we’d need an LLM to count the factual claims and decide which of them is context-supported.
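
For example, if a generated answer makes four factual claims and the judge finds that three of them are supported by the retrieved context, then:

\[
\text{faithfulness} = \frac{3}{4} = 0.75
\]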

You can evaluate faithfulness without having retrieval working yet, as long as you have a benchmark with ground truth contexts. But if you do that, there’s one crucial point to keep in mind: you also need to test generation when retrieval is bad, like when it contains distracting irrelevant documents or just doesn’t have anything useful at all. Bad retrieval will definitely happen in the wild, and so you need to ensure that your generation (and your generation evaluation) will degrade gracefully. More on that below.

Evaluating the answer

Finally, there is a family of commonly-used generation metrics that evaluate the quality of the answer by comparing it to the prompt and the ground truth:

  • answer semantic similarity measures the semantic similarity between the generated answer and the ground truth;
  • answer correctness also compares the generated answer and the ground truth, but is based on counting factual claims instead of semantic similarity; and
  • answer relevance measures how well the generated answer corresponds to the question that the prompt asked. This is the top-left side of the RAG Triad.

These metrics directly get to the key outcome of your RAG system: are the generated responses good? They come with the usual pluses and minuses of end-to-end metrics. On the one hand, they measure exactly what you care about; on the other hand, when they fail you don’t know which component is to blame.

As you’ve seen above, many of the metrics used for evaluating RAG rely on LLMs to extract and evaluate factual claims. That means that some of the same challenges you’ll face while building your RAG system also apply to its evaluation:

  • You’ll need to decide which model (or models) to use for evaluation, taking into account cost, accuracy, and reliability.
  • You’ll need to sanity-check the evaluator’s responses, preferably with continuous monitoring and occasional manual checks.
  • Because the field is moving so quickly, you’ll need to evaluate the options yourself — any benchmarks you read online have a good chance of being obsolete by the time you read them.

When the judges don’t agree

In order to better understand these issues, we ran a few experiments on a basic RAG system — without query re-writing, context re-ranking or other tools to improve retrieval — using the Neural Bridge benchmark dataset as our test set. We first ran these experiments in early 2024; when we re-visited them in December 2024 we found that newer base LLMs had improved results somewhat but not dramatically.

The Neural Bridge dataset contains 12,000 questions; each one comes with a context and an answer. We selected 200 of these questions at random and ran them through a basic RAG system using Chroma DB as the vector store and either Llama 2 or Claude Haiku 3 as the LLM for early 2024 and December 2024 runs, respectively. The RAG system was not highly tuned — for example, its retrieval step was just a vector similarity search — and so it gave a mix of good answers, bad answers, and answers saying essentially “I don’t know: the context doesn’t say.” Finally, we used Ragas to evaluate various metrics on the generated responses, while varying the LLMs used to power the metrics.

Experimental results

Our goal in these experiments was to determine:

  1. whether the LLM evaluators were correct, and
  2. whether they were consistent with one another.

We found that different LLMs are often not in agreement. In particular, they can’t all be correct.

Here are the evaluation scores of five different models on four different metrics, averaged across our benchmark dataset. You’ll notice a fair amount of spread in the scores for faithfulness and context precision.

[Figure: Average metric scores across models]

But the scores above are just averages across the dataset — they don’t tell us how well the LLMs agreed on individual ratings. For that, we checked the correlation between model scores and again found some discrepancies between models. Here are the results for answer relevancy scores: the correlations show that even though the different models gave very similar average scores, they aren’t in full agreement.

[Figure: Correlation of answer relevancy scores across models. A score of one means that the models agree completely, while a score of zero means that they agree or disagree essentially at random.]

It might not be too surprising that models from the same family (GPT 3.5 and 4, and Sonnet 3 and 3.5) had larger overlaps than models from different families. If your budget allows it, choosing multiple uncorrelated models and evaluating with all of them might make your evaluation more robust.

When faithfulness gets difficult

We dug a little more into the specific reasons for LLM disagreement, and found something interesting about the faithfulness score: we restricted to the subset of questions for which retrieval was particularly bad, having no overlap with the ground truth data. Even the definition of faithfulness is tricky when the context is bad. Let’s say the LLM decides that the context doesn’t have relevant information and so responds “I don’t know” or “The context doesn’t say.” Are those factual statements? If so, are they supported by the context? If not, then according to the definition, the faithfulness is zero divided by zero. Alternatively, you could try to detect responses like this and treat them as a sort of meta-response that doesn’t go through the normal metrics pipeline. We’re not sure how best to handle this corner case, but we do know that you need to do it explicitly and consistently. You also need to be prepared to handle null values and empty responses from your metrics pipeline, because this situation often induces them.

Experimental results

On the subset of questions with poor retrieval our Ragas-computed faithfulness scores ranged from 0%, as judged by Llama 3, to more than 80%, as judged by Claude 3 Sonnet. We emphasize that these were faithfulness scores evaluated by different LLMs judging the same retrievals, responses, and generated answers. Even if you exclude Llama 3 as an outlier, there is a lot of variation.

faithfulness
Faithfulness scores across models, when the context is bad

This variation in scores doesn’t seem to be an intentional choice (to the extent that LLMs can have “intent”) by the evaluator LLMs, but rather a situation of corner cases compounding one another. We noticed that this confusing situation made some models — Llama 3 most often, but also other models — fail to respond in the JSON format expected by the Ragas library. Depending on how you treat these failures, this can result in missing metrics or strange scores. You can sidestep these issues somewhat if you have thorough evaluation across the entire RAG pipeline: if other metrics are flagging poor retrieval, it matters less that your generation metrics are behaving strangely on poorly-retrieved examples.

In general, there’s no good substitute for careful human evaluation. Since the LLM judges don’t agree with one another, the question becomes which of them agrees best with ground-truth human evaluations, and whether that agreement is good enough for your application. The answer will depend on your documents, your typical questions, and on future releases of improved models.

Conclusion

Oh, were you hoping we’d tell you which LLM you should use? No such luck: our advice would be out of date by the time you read this, and if your data doesn’t closely resemble our benchmark data, then our results might not apply anyway.

In summary, it’s easy to compute metrics for your RAG application, but don’t just do it blindly. You’ll want to test different LLMs for driving the metrics, and you’ll need to evaluate their outputs. Your metrics should cover all the sides of the RAG triad, and you should know what they mean (and be aware of their corner cases) so that you can interpret the results. We hope that helps, and happy measuring!


  1. The terminology is not quite settled: what Ragas calls “faithfulness,” TruLens calls “groundedness.” Since the RAG Triad was introduced by TruLens, you’ll usually see it used in conjunction with their terminology. We’ll use the Ragas terminology in this post, since that’s what we used for our experiments.

February 27, 2025 12:00 AM

February 26, 2025

GHC Developer Blog

GHC 9.12.2-rc1 is now available

GHC 9.12.2-rc1 is now available

Ben Gamari - 2025-02-26

The GHC developers are happy to announce the availability of the first and likely final release candidate of GHC 9.12.2. This is an important bug-fix release resolving a significant correctness issue present in 9.12.1 (#25653).

In accordance with our under-discussion release policies this candidate will have a two-week testing window. The final 9.12.2 release will likely come the week of 12 March 2025.

As always, if you find anything amiss please open a ticket.

Happy Haskelling!

  • Ben

by ghc-devs at February 26, 2025 12:00 AM

February 20, 2025

Tweag I/O

From minimal skeletons to comprehensive transactions with cooked-validators

Cooked Validators is a Haskell library designed to simplify the complex process of crafting and testing transactions on the Cardano blockchain. Writing proper transactions in Cardano can be challenging due to its UTXO-based model, which requires precise definitions and careful structuring of inputs, outputs, and complementary components. cooked-validators tackles these challenges by offering a powerful framework for defining transactions in a minimal and declarative manner while incorporating a significant degree of automation.

One of the library’s core strengths lies in its ability to help developers transform simple transaction templates, referred to as “skeletons”, or TxSkel, into fully-formed transactions that satisfy the technical requirements of Cardano’s validation process. This automation not only minimizes boilerplate code but also reduces the room for errors, thus streamlining the creation and testing of transactions. In particular, we’ve used cooked-validators extensively to rigorously audit smart contracts for many clients and well-known products now live on Cardano.

Although cooked-validators has been a reliable tool for years, no blog post has yet explored how it automates key aspects of transaction creation, simplifying complex processes into manageable workflows. This post aims to fill that gap by showcasing how the library helps developers build Cardano transactions with ease and efficiency, allowing them to focus on high-level design and intent rather than getting bogged down by low-level technical details.

Validating transactions in cooked-validators

cooked-validators provides a convenient way to interact with the blockchain through a type class abstraction, MonadBlockChain. Among the primitives provided by this type class, the most fundamental is validateTxSkel which:

  1. takes a transaction skeleton as input,
  2. expands the skeleton’s content based on missing parts and skeleton options,
  3. generates a transaction,
  4. submits this transaction for validation, and
  5. returns the validated transaction, or throws an error if it is invalid.

Thus, the function has the following type signature:

validateTxSkel :: (MonadBlockChain m) => TxSkel -> m CardanoTx

In the remainder of this post, we will explore the fields of the transaction skeleton (TxSkel) and how validateTxSkel behaves when automatically expanding this skeleton.

Transaction skeletons

Cardano transactions are usually represented by large Haskell records containing a predefined set of fields that evolve alongside the Cardano protocol. The traditional approach to building transactions involves directly creating instances of these records and submitting them for validation.

In cooked-validators, however, transactions are further abstracted through a custom record called TxSkel, which has its own set of fields, some of which map directly to corresponding fields in a Cardano transaction, while others guide the translation process. The primary motivation behind using this abstraction is to highlight the most relevant information for common use cases while hiding less critical details that can be inferred automatically based on the provided data1.

There are several additional reasons for the use of TxSkel:

  • Our transaction skeletons embed as much type information as possible for scripts and UTXOs, thus increasing type-safety.
  • Each transaction skeleton includes its own set of options to guide transaction generation, with sensible default values.
  • Our transaction skeletons have default values for all fields, allowing users to provide minimal information relevant to their use-case.
  • Our skeleton elements use meaningful yet simple types, avoiding the complex overlays and type annotations commonly found in the Cardano and Ledger APIs by defaulting to the current Cardano era.

While TxSkel is designed to be lighter and more user-friendly than Cardano transactions, it does not compromise user flexibility. Since TxSkel ultimately generates Cardano transactions, users are provided with the option to manually tweak the generated transaction if desired. This ensures that users retain full control and can build their Cardano transactions in any way they prefer.

To build a transaction skeleton, users simply override the fields they need to set from the default skeleton, txSkelTemplate.

txSkelTemplate
  { txSkelIns = ...,
    txSkelMints = ...,
    ...
  }

From manual ADA payments to automated transaction balancing

The first feature one might expect from a transaction is to pay assets to a given peer. Surprisingly, this can be quite complex due to the underlying extended UTXO model on which Cardano is based. Without diving too deeply into the details, it’s important to understand that exchanging assets in Cardano is done through “pouches” of various sizes, called UTXOs. If Alice wants to send 12 ADA (Cardano’s currency) to Bob, and she possesses one UTXO with 4 ADA and another with 10 ADA, she will have to provide both UTXOs, create a new UTXO with 12 ADA for Bob, and return a UTXO with 2 ADA for herself. Moreover, she will also need to account for transaction fees, meaning the returning UTXO will actually contain something like 1.998222 ADA (1,998,222 lovelace).

In summary, this seemingly simple payment of 12 ADA will result in a transaction with 2 inputs and 2 outputs, along with an additional “phantom” payment corresponding to the transaction fees. However, from the user’s perspective, the key point is that Alice needs to pay 12 ADA to Bob. cooked-validators allows users to focus on these high-level intentions, as demonstrated by the following skeleton:

txSkelTemplate
  { txSkelOuts = [paysPk bob $ ada 12],
    txSkelSigners = [alice]
  }

In this skeleton, we specify that the transaction pays 12 ADA to Bob and that Alice is a signer of the transaction. And that’s it.
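
As a minimal sketch (not from the post itself) of how such a skeleton is submitted with validateTxSkel — assuming the alice and bob wallets and the relevant cooked-validators imports are in scope, as in the examples above:

alicePaysBob :: (MonadBlockChain m) => m CardanoTx
alicePaysBob =
  -- validateTxSkel expands and balances the skeleton, generates the
  -- transaction, and submits it for validation.
  validateTxSkel $
    txSkelTemplate
      { txSkelOuts = [paysPk bob $ ada 12],
        txSkelSigners = [alice]
      }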

Internally, cooked-validators processes this skeleton through a balancing phase. In this context, “balancing” is a multifaceted term. It not only refers to ensuring that the inputs and outputs of the transaction contain the same amount of ADA (and other assets)2, but also to calculating fees, accounting for them in the transaction, and handling associated collaterals when necessary (funds that are made available within the transaction in case a script failure occurs during validation). This automated process is a part of the added value provided by cooked-validators.

Computing fees, collaterals, and balancing transactions is notoriously difficult in Cardano due to circular dependencies (higher fees imply more collaterals, which increase transaction size, which in turn leads to higher fees…) and the unpredictable resource consumption of scripts in terms of memory space and computation cycles. See cooked-validators’s documentation for the details of what balancing involves, how cooked-validators performs it, and the options available to control balancing. Notably, cooked-validators is non-invasive, meaning that the automation can be disabled if needed. For instance, users can manually set fees and collaterals and even balance their transactions themselves.

After balancing, the skeleton will look like this:

txSkelTemplate
  { txSkelOuts = [paysPk bob $ ada 12, paysPk alice $ lovelace 1_998_222],
    txSkelIns = Map.fromList [(aliceUtxo1, emptyTxSkelRedeemer), (aliceUtxo2, emptyTxSkelRedeemer)],
    txSkelSigners = [alice]
  }

In most cases, this skeleton will remain hidden from the user, though it can be retrieved and used if necessary by manually invoking the balancing function or checking the logs.

From manual payments to automated minimal amount of ADA

While Alice is using Cardano, she might come across non-ADA tokens with custom names3, such as mySmartContractToken. These tokens are provided by smart contracts and dedicated to specific purposes such as NFTs to represent ownership of a certain resource. Alice might also want to send such a token to Bob:

txSkelTemplate
  { txSkelOuts = [paysPk bob $ mySmartContractToken 1],
    txSkelSigners = [alice]
  }

As shown above, cooked-validators will attempt to balance this skeleton by retrieving an instance of mySmartContractToken from Alice’s UTXOs, along with the necessary ADA to cover the transaction fee. However, validating the resulting balanced skeleton will fail because Cardano requires every UTXO to include a minimum amount of lovelace to cover its storage cost. This minimum amount, derived from the protocol parameters, also acts as a safeguard against potential security risks that could arise if UTXOs were allowed to exist without any ADA. Thankfully, cooked-validators can automatically calculate this required amount when the appropriate transaction option is enabled. The updated skeleton then becomes:

txSkelTemplate
  { txSkelOuts = [paysPk bob $ mySmartContractToken 1],
    txSkelSigners = [alice],
    txSkelOpts = def { txOptEnsureMinAda = True }
  }

Enabling this option triggers an initial transformation pass, before balancing, which calculates the required amount of ADA to sustain the output and adds this amount to the transaction skeleton. After both passes, the skeleton will resemble the following, with remainingValue being the original value in Alice’s UTXO minus the fees and the payment to Bob:

txSkelTemplate
  { txSkelOuts = [paysPk bob $ mySmartContractToken 1 <> lovelace 546_000, paysPk alice remainingValue],
    txSkelIns = Map.singleton aliceUtxo emptyTxSkelRedeemer,
    txSkelSigners = [alice],
    txSkelOpts = def { txOptEnsureMinAda = True }
  }

By default, txOptEnsureMinAda is set to False, which may seem counterintuitive. However, this prevents unexpected adjustments to ADA amounts that may have been carefully computed. If a transaction output is meant to contain a specific ADA amount based on a precise calculation, but the protocol requires a higher minimum, enabling this option would silently modify the value. This could obscure computation errors, allowing transactions to validate without the user realizing the discrepancy. To stay true to cooked-validators’s philosophy of minimal intervention, the option remains off by default, ensuring that any necessary adjustments are made explicitly.

From spending scripts to automated script witness binding

In the previous examples, we saw how cooked-validators can handle the addition of inputs in a transaction skeleton. However, there are cases where one might want to manually specify the inputs. This is typically necessary when a transaction needs to consume UTXOs belonging to scripts, in which case a redeemer must be provided, as it cannot be inferred automatically. A redeemer is a piece of information (which may be empty) required whenever a script from a smart contract is invoked. This redeemer usually informs the script as to why it has been called, and can also pass dynamic values as inputs to the script. In the examples above, the added inputs were UTXOs from peers, so emptyTxSkelRedeemer was automatically provided.

When consuming scripts, collaterals must be included in case the validation process fails after the script execution. These collaterals cover the computation resources used during validation, which cannot be covered by fees, as fees are only paid if the transaction is successfully validated. The inclusion (or omission) of collaterals, depending on whether the transaction involves scripts, is handled during balancing. Collaterals can only be provided as UTXOs from peers, so a signer is also required in such cases, even if no peer UTXO is consumed. A transaction skeleton that consumes a script can thus be written as:

txSkelTemplate
  { txSkelIns = Map.singleton scriptUtxo $ someTxSkelRedeemer scriptRedeemer,
    txSkelSigners = [alice]
  }

From this skeleton, cooked-validators offers two types of automation. The first is the balancing mechanism, which has already been discussed. Beyond computing fees and collaterals, the balancing process also creates an output at the first signer’s address to return any excess value from inputs and consumes a UTXO from the user to cover the transaction fees.

The second automation concerns the addition of script witnesses. On-chain, scripts are represented by their hash, which serves different purposes depending on the script’s type4—address for spending scripts, policy ID for minting scripts, or staking ID for staking scripts. However, during validation, scripts must be executed, and their hash alone is insufficient. Instead, users must supply the full scripts as witnesses, ensuring their hash matches the expected on-chain hash.

When a UTXO is created at a spending script’s address, cooked-validators retains the script, allowing it to automatically attach the required witness for these inputs in future transactions. However, for minting or staking scripts, the tool lacks knowledge of the necessary witnesses, so they must be specified manually.

Since September 2022, Cardano has supported reference scripts, which are complete scripts stored on-chain in UTXOs. These reference scripts can be used as witnesses in place of the full script, reducing transaction size and fees. cooked-validators also automates the inclusion of such reference scripts. In practice, when a script witness is required, the following process unfolds:

  • if a witness is manually provided by the user, it is used as is.
  • if no such witness exists, cooked-validators attempts to find a reference witness among known UTXOs.
  • if no such witness could be found, and the script is used for spending a UTXO, cooked-validators attempts to find a direct witness among its known scripts.

In the previous example, assuming a reference witness was present on some UTXO, the skeleton will look like this:

txSkelTemplate
  { txSkelIns = Map.fromList
      [ (scriptUtxo, TxSkelRedeemer scriptRedeemer (Just referenceInputWithScript)),
        (aliceUtxoForFees, emptyTxSkelRedeemer)
      ],
    txSkelOuts = [paysPK alice $ valueInScriptUtxo <> valueInAliceUtxo <> negate fee],
    txSkelSigners = [alice]
  }

From issuing proposals to automated deposit payment

The final type of automation we will discuss in this post involves proposals issued by users, a feature introduced in the Conway era. These proposals can vary, but the most common are parameter changes, where users propose new values for parameters that control on-chain behaviors. These proposals must obey a set of constitutional rules, which is checked by a constitution script. For example, here is a skeleton where Alice proposes to update the cost of fees per byte in the size of a transaction to 100 lovelace, witnessed by a given constitution script:

txSkelTemplate
  { txSkelProposals = [simpleTxSkelProposal alice (TxGovActionParameterChange [FeePerByte 100])
                        `withWitness` (constitutionScript, emptyTxSkelRedeemer)],
    txSkelSigners = [alice]
  }

Each proposal requires a deposit of a certain amount of lovelace, as specified by the protocol parameters. cooked-validators takes such deposits into account during the balancing phase. It looks up the current required deposit amount and retrieves this amount from the available UTXOs from the balancing wallet to include in the transaction. After balancing, the skeleton will look like this:

txSkelTemplate
  { txSkelProposals = [simpleTxSkelProposal alice (TxGovActionParameterChange [FeePerByte 100])
                        `withWitness` (constitutionScript, emptyTxSkelRedeemer)],
    txSkelIns = Map.singleton aliceUtxo emptyTxSkelRedeemer,
    txSkelOuts = [paysPK alice (valueInAliceUtxo <> negate fee <> negate depositValueFromParams)],
    txSkelSigners = [alice]
  }

Currently, cooked-validators allows users to provide any constitution script to validate whether the proposal adheres to constitutional rules. In practice, the ledger rejects any such script that does not correspond to the current official Cardano constitution. Thus, in the future, cooked-validators might automatically fetch this script and attach it to proposals.

Conclusion

One of cooked-validators’ main strengths is its ability to allow users to express their high-level transaction requirements conveniently and efficiently, without having to deal with the intricate technical details of the resulting transaction. This is achieved through TxSkels, which are transaction abstractions that can be partially filled by users. cooked-validators performs several passes on these partial skeletons, such as filling in missing minimal ADA, balancing the transaction, and automatically adding witnesses, to translate these minimal skeletons into transactions that can be submitted for validation. This blog post has summarized these key automation steps; stay tuned for more posts about cooked-validators.


  1. It is always possible to override those fields in the generated transaction, though, as cooked-validators never forces users to build their transactions one way or another.
  2. The actual balancing equation is more complicated: withdrawals + inputs + mints = burn + outputs + deposits + fees.
  3. This name stands for the combination of a token name and a policy ID.
  4. All scripts are defined in the same way since the Conway era; what we call script types are only abstractions referring to the way they are used.

February 20, 2025 12:00 AM

February 17, 2025

Haskell Interlude

62: Conal Elliott

In this episode Wouter Swierstra and Niki Vazou talk with Conal Elliott. Conal discusses doing things just for the poetry, how most programs miss their purpose, and the simplest way to ask a question. Conal is currently working on a book about his ideas and actively looking for partners.

by Haskell Podcast at February 17, 2025 11:00 AM

February 15, 2025

Sandy Maguire

Bidirectional Instance Contexts

Just a quick one today, but I wanted to point out a little trick you can do with Haskell’s typeclass inference.

Imagine we have some little class, the details of which matter not in the least:

type Foo :: Type -> Constraint
class Foo a where
  ...

We can give some instances of this type:

instance Foo Int where ...
instance Foo Bool where ...
instance Foo () where ...

Regular, everyday stuff. But the instances for type constructors are more interesting, because they come with an instance context:

instance (Foo a, Foo b) => Foo (a, b) where ...

Then, of course, if we know both Foo a and Foo b, we can infer Foo (a, b). To make this fact overwhelmingly explicit, we can reify the usual constraint-solving logic by using the Dict type, and thus the following program will typecheck:

import Data.Constraint

forwards
  :: Dict (Foo a)
  -> Dict (Foo b)
  -> Dict (Foo (a, b))
forwards Dict Dict = Dict

Perhaps tipped off by the name, the gentle reader is asked to notice the asymmetry here, since the converse program will not typecheck:

backwards
  :: Dict (Foo (a, b))
  -> (Dict (Foo a), Dict (Foo b))
backwards Dict = (Dict, Dict)

But why should it not typecheck?1 Recall from the relevant instance definition that these instances must, in fact, exist:

instance (Foo a, Foo b) => Foo (a, b)

As a testament to just how good GHC is, we can support this bidirectionality via a minor tweak to the definition of class and its instances.

The trick is to add an associated type family to Foo, and to use it as a superclass constraint:

type Foo :: Type -> Constraint
class Evidence a => Foo a where
  type Evidence a :: Constraint
  type Evidence a = ()
  ...

Because we’ve given a default implementation of the type family, our existing simple instances work as before:

instance Foo Int where ...
instance Foo Bool where ...
instance Foo () where ...

with the only change required coming from the type constructor instances:

instance (Foo a, Foo b) => Foo (a, b) where
  type Evidence (a, b) = (Foo a, Foo b)
  ...

or, if you want to be cute about it:

instance Evidence (a, b) => Foo (a, b) where
  type Evidence (a, b) = (Foo a, Foo b)
  ...

By sticking Evidence into the superclass constraint, GHC knows that this dictionary is always available when you’ve got a Foo dictionary around. And our earlier backwards program now typechecks as expected.
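
As a quick sanity check — roundtrip is just an illustrative helper, not from the original code — we can now take a pair dictionary apart with backwards and rebuild it via the ordinary instance:

-- Illustrative helper: decompose a pair dictionary and put it back together.
-- This only compiles because backwards now typechecks, and because the
-- (Foo a, Foo b) => Foo (a, b) instance can reassemble the pieces.
roundtrip :: Dict (Foo (a, b)) -> Dict (Foo (a, b))
roundtrip d = case backwards d of
  (Dict, Dict) -> Dict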

This is all available in a play session if you’d like to fool around with it.


  1. Rhetorical question. I don’t want to hear about orphans or overlapping instances or whatever.↩︎

February 15, 2025 02:15 AM

February 13, 2025

Tweag I/O

Bashfulness


When I first joined the Topiary Team, I floated the idea of trying to format Bash with Topiary. While this did nothing to appease my unenviable epithet of “the Bash guy,” it was our first foray into expanding Topiary’s support beyond OCaml and simple syntaxes like JSON.

Alas, at the time, the Tree-sitter Bash grammar was not without its problems. I got quite a long way, despite this, but there were too many things that didn’t work properly for us to graduate Bash to a supported language.

Fast-forward two years and both Topiary and the Tree-sitter Bash grammar have moved on. As the incumbent Bash grammar was beginning to cause downstream problems from bit rot — frustratingly breaking the builds of both Topiary and Nickel — my fellow Topiarist, Nicolas Bacquey, migrated Topiary to the latest version of the Bash grammar and updated our Bash formatting queries to match.

With surprisingly little effort, Nicolas was able to resolve all those outstanding problems. So with that, Bash was elevated to the lofty heights of “supported language” and — with the changes I’ve made from researching this blog post — Bash formatting is now in pretty good shape in Topiary v0.6.

So much so, in fact, let me put my money where my mouth is! Let’s see how Topiary fares against a rival formatter. I’ll do this, first, by taking you down some of the darker alleys of Bash parsing, just to show you what we’re up against.

Hello darkness, my old friend

There is a fifth dimension beyond that which is known to man. It is a dimension as vast as space and as timeless as infinity. It is the middle ground between light and shadow, between science and superstition; it lies between the pit of man’s fears and the summit of his knowledge. This is the dimension of imagination. It is an area we call: the Bash grammar.

In our relentless hubris, man has built a rocket that — rather than exploding on contact with reality — dynamically twists and turns to meet reality’s expectations. Is that a binary? Execute it! Is that a built-in? Execute it! Is that three raccoons in a trench coat, masquerading as a function? Execute it! And so, with each token parsed, we are Bourne Again and stray ever further from god.

Bear witness to but a few eldritch horrors:1

  • Trailing comments must be preceded by whitespace or a semicolon. However, if either of those are escaped, they are interpreted as literals and this changes the tokenisation semantics:

    echo \ # Ceci n'est pas
     | une pipe'

    Here, perhaps the writer intended to add a comment against the first line. But, what looks like a comment isn’t a comment at all; it becomes an argument to echo, along with everything that follows. That includes the apostrophe in “n’est”, which is interpreted as an opening quote — a raw string — which is closed at the end of the next line.

  • Case statements idiomatically delimit each branch condition with a closing parenthesis. In a subshell, for example, this leads to unbalanced brackets:

    ( case $x in foo )   # Wat?...
    echo bar;; esac )    # 🤯

    This subshell outputs bar when the variable $x is equal to foo. Whereas, on a more casual reading, this formulation might just look like a confusing syntax error.

    Speaking of case statements, did you know that ;& and ;;& are also valid branch terminators? Without checking the manual — if you can find the single paragraph where it’s mentioned — can you tell me how they differ?

  • Bash will try to compute an array index if it looks like an arithmetic expression:

    # Output the (foo - bar)th element of array
    echo "${array[foo-bar]}"

    However, if array in this example is an associative array (i.e., a hash map/dictionary), then foo-bar could be a valid key. In which case, it’s not evaluated and used verbatim.

  • Without backtracking, it’s not possible to distinguish between an arithmetic expansion and a command substitution containing a subshell at its beginning or end:

    echo $((foo + bar))
    echo $((foo); (bar))

    Here, the first statement will output the value of the addition of those two variables; the second will execute foo then bar, each in a subshell, echoing their output. In the subshell case, the POSIX standards even recommend that you add spaces — e.g., $( (foo) ) — to remove this ambiguity.

  • Heredocs effectively switch the parser into a different state, where everything is interpreted literally except when it isn’t. This alone is tricky, but Bash introduces some variant forms that allow additional indentation (with hard tabs), switching off all string interpolation, or both.

    # Indented, with interpolation
    cat <<-HEREDOC
    	I am a heredoc. Hear me roar.
    	HEREDOC

Suffice to say, any formatter has their work cut out.

Battle of the Bash formatters

The de facto formatter for Bash is shfmt. It’s written in Go, by Daniel Martí, actively maintained and has been around for the best part of a decade.

Let’s compare Topiary’s Bash formatting with shfmt in a contest worthy of a Netflix special. I’ll look specifically at each tool’s parsing and formatting capabilities as well as their performance characteristics. I won’t, however, compare their subjective formatting styles, as this is largely a matter of taste.

What Topiary can’t do that shfmt can2

When it comes to formatting Bash in a way that is commonly attested in the wild, there are three things that Topiary cannot currently do. Unfortunately, these stem either from the absence of a feature in Topiary, or from a lack of fidelity in the Tree-sitter grammar; no amount of hacking on queries will fix them.

The worst offender is probably the inability to distinguish line continuations from other token boundaries. These are used in Bash scripts all the time to break up long commands into more digestible code. In the following example, the call to topiary was spread over multiple lines, with line continuations. Topiary slurps everything onto a single line, whereas shfmt preserves the original line continuations in the input:

# Topiary
topiary format --language bash --query bash.scm <"${script}"
# shfmt
topiary format \
    --language bash \
    --query bash.scm \
    <"${script}"

One saving grace is that Topiary’s Bash parser understands a trailing |, in a pipeline, to accept a line break. As such — while it isn’t my personal favourite style3 — Topiary does support multi-line pipelines. Arguably, they even look a little nicer in Topiary than in shfmt, which only preserves where the line breaks occurred in the input:

# Topiary
foo |
  bar |
  baz |
  quux
# shfmt
foo | bar |
    baz | quux

Otherwise, in Topiary, every command is a one-liner…whether you like it or not!

Next on the “nice to have” list is the long-standing (and controversial) feature request of “alignment blocks”; specifically for comments. That is, presumably related comments appearing on a series of lines should be aligned to the same column:

# Topiary
here # comment
is # comment
a # comment
sequence # comment
of # comment
commands # comment
# shfmt
here     # comment
is       # comment
a        # comment
sequence # comment
of       # comment
commands # comment

The tl;dr of the controversy is that, despite being a popular request — and we all know where popularity gets us, these days — it’s a slap in the face to one of Topiary’s core design principles: minimising diffs. Because we live in a universe where elastic tabstops never really took off, a small change to the above example — say, adding an option to one of the commands — would produce the following noisy diff:

-here     # comment
-is       # comment
-a        # comment
-sequence # comment
-of       # comment
-commands # comment
+here                      # comment
+is                        # comment
+a                         # comment
+sequence                  # comment
+of                        # comment
+commands --with-an-option # comment

For the time being, Topiary won’t be making alignment great again.

Finally, string interpolations — with command substitution and arithmetic expansions — cannot be formatted without potentially breaking the string itself. This is particularly true of heredocs; the full subtleties of which escape the Tree-sitter Bash grammar and so are easily corruptible with naive formatting changes. As such, Topiary has to treat these as immutable leaves and leave them untouched:

# Topiary
echo "2 + 2 = $((  2+  2 ))"

cat <<EOF
Today is $(   date )
EOF
# shfmt
echo "2 + 2 = $((2 + 2))"

cat <<EOF
Today is $(date)
EOF

So far, I have only found three constructions that are syntactically correct, but the Tree-sitter Bash grammar cannot parse (whereas, shfmt can):

  1. A herestring that follows a file redirection (issue #282):

    rev > output <<< hello

    A workaround, for now, is to switch the order; so the herestring comes first.

  2. A heredoc that uses an empty marker (issue #283):

    cat <<''
    Only a monster would do this, anyway!
    
    
  3. Similar to line continuations, the Tree-sitter Bash grammar seems to swallow escaped spaces at the beginning of tokens, interpreting them as tokenisation whitespace rather than literals (issue #284):

    # This should output:
    # <a>
    # <b>
    # < >
    # <c>
    printf "<%s>\n" a b \  c

For what it’s worth, shfmt also supports POSIX shell and mksh (a KornShell implementation). As of writing, there are no Tree-sitter grammars for these shells. However, their syntax doesn’t diverge too far from Bash, so it’s likely that Topiary’s Bash support will be sufficient for large swathes of such scripts. Moreover, the halcyon years of the 1990s are a long way behind us, so maybe this doesn’t matter.

What shfmt can’t do that Topiary can2

shfmt is part of a wider project that includes a Bash parser for the Go ecosystem. A purpose-built parser, particularly for Bash, should perform better than the generalised promise of Tree-sitter and, indeed, that’s what we see. However, there are a few minor constructions that shfmt doesn’t like, but the Tree-sitter Bash grammar accepts:

  1. An array index assignment which uses the addition augmented assignment operator:

    my_array=(
      foo
      [0]+=bar
    )

    To be fair to shfmt, while this is valid Bash, not even the venerable ShellCheck can parse this!

  2. Topiary leaves array indices unformatted, despite them allowing arithmetic expressions. shfmt, however, will add whitespace to any index that looks like an arithmetic expression (e.g., [foo-bar] will become [ foo - bar ]); even if the original, unspaced version could be a valid associative array key.

    (Neither Topiary nor shfmt can handle indices containing spaces. However, the standard Bash workaround™ is to quote these: ${array["foo bar"]}.)

  3. Brace expansions can appear — perhaps surprisingly — almost anywhere. Particularly surprising to shfmt is when they appear in variable declarations, which it cannot parse:

    declare {a,b,c}=123      # a=123 b=123 c=123
    declare foo{1..10}=bar   # foo1=bar foo2=bar ... foo10=bar

While it’s a bit of a hack,4 we also implement something akin to “rewrite rules” in our Topiary Bash formatting queries, which shfmt (mostly) doesn’t do. This is to enforce a canonical style over certain constructions. Namely:

  • All $... variables are rewritten in their unambiguous form of ${...}, excluding special variables such as $1 and $@. (Note that this doesn’t affect $'...' ANSI C strings, despite their superficial similarity.)

  • All function signatures are rewritten to the name() { ... } form, rather than function name { ... } or function name() { ... }.

  • All POSIX-style [ ... ] test clauses are rewritten to the Bash [[ ... ]] form.

  • All legacy $[ ... ] arithmetic expansions are rewritten to their $(( ... )) form.

  • All `...` command substitutions are rewritten to their $( ... ) form.

    (This is one that shfmt does do.)

Technically, it is also possible to write rules that put quotes around unquoted command arguments, ignoring things like -o/--options. While this is good practice, we do not enforce this style as it changes the code’s semantics and there may be legitimate reasons to leave arguments unquoted.

Throughput

Let’s be honest: If you have so much Bash to format that throughput becomes meaningful, then formatting is probably the least of your worries. That being said, it is the one metric that we can actually quantify.

Our first problem is that we need a large corpus of normal scripts. By “normal,” I mean things that you’d see in the wild and could conceivably understand if you squint hard enough. This rules out the Bash test suite, for example, which — while quite large — is a grimoire of weird edge cases that neither Topiary nor shfmt handle well. Quite frankly, if you’re writing Bash that looks like this, then you don’t deserve formatting:

: $(case a in a) : ;#esac ;;
esac)

Digging around on r/bash, I came across this repository of scripts. They’re all fairly short, but they’re quite sane. This will do.

We need to slam large amounts of Bash into the immovable objects that are our formatters; a “Bash test dummy,”5 if you will. It would be ideal if we could stream Bash into our formatters — so we could orchestrate sampling at regular time intervals — however, neither Topiary nor shfmt support streaming formatting. This stands to reason as there are cases where formatting will depend on some future context, so the whole input will need to be read upfront. As such, we need to invert our approach to collecting metrics and sample over input size instead.

The general method is:

  1. Locate the scripts in the repository that are Bash, by looking at their shebang.
  2. Filter this list to those which Topiary can handle without tripping over itself because of some obscure parsing failure. (We assume shfmt doesn’t require such a concession.)
  3. Perform N trials, in which:
    • The whitelist of scripts is randomised, to remove any potential confounding from caching.
    • The top M scripts are concatenated to obtain a single trial input.6 This is to increase the input size to the formatters in each trial, which is taken to be the independent variable, but may be subject to confounding effects when the input is small.
    • The trial input is read to /dev/null a handful of times to warm up the filesystem cache.
    • The trial input is fed into the following, with benchmarks — trial input size (bytes) and runtime (nanoseconds) — recorded for each:
      • cat, which acts as a control;
      • Topiary (v0.5.1; release build, with the query changes described in this blog post);
      • Topiary, with its idempotence checking disabled;
      • shfmt (v3.10.0).

This identified 156 Bash scripts within the test repository, of which 154 could be handled by Topiary.7 On an 11th generation Intel Core i7, at normal stepping, with N=50 and M=25, on a Tuesday afternoon, I obtained the following results:

N=50, M=25

cat, which does nothing, is unsurprisingly way out in front; by two orders of magnitude. This is not interesting, but establishes that input can be read faster than it can be formatted. That is, our little experiment is not accidentally I/O bound.

What is interesting is that Topiary is about 3× faster than shfmt. We also see that the penalty imposed by idempotency checking — which formats twice, to check the output reaches a fixed point — is quite negligible. This indicates that most of the work Topiary is doing is in its startup overhead, which involves loading the grammar and parsing the formatting query file.

Since Topiary only has to do this once per trial, it’s a little unfair to set M=25; that is, an artificially enlarged input that is syntactically valid but semantically meaningless. However, if we set M=1 (i.e., individual scripts), then we see a similar comparison:

N=50, M=1

For small inputs, the idempotency check penalty is barely perceptible. Otherwise, the startup overhead dominates for both formatters — hence the much lower throughput values — but, still, Topiary comfortably outperforms shfmt by a similar factor.

And the winner is…

In an attempt to regain some professional integrity, I’ll fess up to the fact that Topiary has a bit of a home advantage and maybe — just maybe — I’m ever so slightly biased. That is, as we are in the (dubious) position of building a plane while attempting to fly it, I was able to tweak and fix a few of our formatting rules to improve Topiary’s Bash support during the writing of this blog post:

  • I added formatting rules for arrays (and associative arrays) and their elements.
  • I corrected the formatting of trailing comments that appear at the end of a script.
  • I corrected the function signature rewriting rule.
  • I corrected the formatting of a string of commands that are interposed by Bash’s & asynchronous operator.
  • I fixed the formatting of test commands and added a rewrite rule for POSIX-style [ ... ] tests.
  • I implemented multi-line support for pipelines.8
  • I updated the $... variable rewrite rule to avoid targeting special forms like $0, $? and $@, etc.
  • I implemented a rewrite rule that converts legacy $[ ... ] arithmetic expansions into their $(( ... )) form.
  • I implemented a rewrite rule that converts `...` command substitutions into their $(...) form.
  • I fixed the spacing within variable declarations, to accommodate arguments and expansions.
  • I forced additional spacing in command substitutions containing subshells, to remove any ambiguity with arithmetic expansions.

The point I’m making here is that these adjustments were very easy to conjure up; just a few minutes of thought for each, across our Tree-sitter queries, was required.

So who’s the winner?

Well, would it be terribly anticlimactic of me, after all that, not to call it? shfmt is certainly more resilient to Bash-weirdness and, of the “big three� I discussed, its line continuation handling is a must have. However, Topiary does pretty well, regardless: It’s much faster, for what that’s worth, and — more to the point — far easier to tweak and hack on.

Indeed, when the Topiary team first embarked upon this path, we weren’t even sure whether it would be possible to format Bash. Now that the Tree-sitter Bash grammar has matured, Topiary — perhaps with future fixes to address some of its shortcomings, uncovered by this blog post — is a contender in the Bash ecosystem.

Thanks to Nicolas Bacquey, Yann Hamdaoui, Tor Hovland, Torsten Schmits and Arnaud Spiwack for their reviews and input on this post, and to Florent Chevrou for his assistance with the side-by-side code styling.


  1. It’s very likely that the syntax highlighting for the more exotic Bash snippets in this blog post will be completely broken.↩

  2. …Yet.↩

  3. My preferred multi-line pipeline style is to have a line continuation and then the | character on the next line, indented:

    foo \
      | bar \
      | baz \
      | quux

    I personally find this much clearer, but Topiary cannot currently handle those pesky line continuations. For shame!↩

  4. Topiary’s formatting rules include node deletion and delimiter insertion. However, delimiters can be any string, so we can coopt this functionality to create basic rewrite rules.↩

  5. I’m also the “terrible pun guy.”↩

  6. This exposed an unexpected bug, whereby Topiary’s formatting model breaks down when some complexity (or, by proxy, size) limit is reached. This behaviour had not been previously observed and further investigation is required.↩

  7. The two failures were due to the aforementioned herestring and complexity6 problems.↩

  8. It may also be possible to implement multi-line && and || lists in a similar way. However, the Tree-sitter grammar parses these into a left-associative nested (list) structure, which is tricky to query.↩

February 13, 2025 12:00 AM

Oleg Grenrus

PHOAS to de Bruijn conversion

Posted on 2025-02-13 by Oleg Grenrus agda

Recently I looked again at PHOAS, and once again I concluded it's nice for library APIs, but so painful to do anything with inside those libraries. So let's convert to something else, like de Bruijn.

There are standalone source files if you just want to see the code:

How to convert PHOAS terms to de Bruijn terms?

The solution is hard to find.

You can cheat, as mentioned by Roman on the Agda mailing list (https://lists.chalmers.se/pipermail/agda/2018/010033.html):

There is always a way to cheat, though. You can turn the PHOAS -> untyped de Bruijn machinery into the PHOAS -> typed de Bruijn machinery by checking that future contexts indeed extend past contexts and throwing an error otherwise (which can't happed, because future contexts always extend past contexts, but it's a metatheorem).

In "Generic Conversions of Abstract Syntax Representation" by Steven Keuchel and Johan Jeuring, authors also "cheat" a bit. The "Parametrhic higher-order abstract syntax" section ends with a somewhat disappointing

  where postulate whatever : _

Keuchel and Jeuring also mention "Unembedding Domain-Specific Languages" by Robert Atkey, Sam Lindley and Jeremy Yallop, where there is one unsatisfactory ⊥ (undefined in Haskell) hiding.

I think that for practical developments (say a library in Haskell), it is ok to make a small shortcut; but I kept wondering whether there is a way to make the conversion without cheating.

Well... it turns out that we cannot avoid "cheating": well-formedness of the PHOAS representation depends on parametricity, and the conversion seems to require a theorem for which there is no proof in Agda.

In unpublished (?) work, Adam Chlipala shows a way to do the conversion without relying on postulates (http://adam.chlipala.net/cpdt/html/Intensional.html), but that procedure requires an extra well-formedness proof of the given PHOAS term.

This Agda development is a translation of that development.

Common setup

Our syntax representations will be well-typed, so we need types:

-- Types
data Ty : Set where
  emp : Ty
  fun : Ty → Ty → Ty

Ctx : Set
Ctx = List Ty

variable
  A B C : Ty
  Γ Δ Ω : Ctx
  v : Ty → Set

de Bruijn syntax

Var : Ctx → Ty → Set
Var Γ A = Idx A Γ -- from agda-np, essentially membership relation.

data DB (Γ : Ctx) : Ty → Set where
  var : Var Γ A → DB Γ A
  app : DB Γ (fun A B) → DB Γ A → DB Γ B
  lam : DB (A ∷ Γ) B → DB Γ (fun A B)
  abs : DB Γ emp → DB Γ A

Parametric Higher-order abstract syntax

data PHOAS (v : Ty → Set) : Ty → Set where
  var : v A → PHOAS v A
  app : PHOAS v (fun A B) → PHOAS v A → PHOAS v B
  lam : (v A → PHOAS v B) → PHOAS v (fun A B)
  abs : PHOAS v emp → PHOAS v A

-- closed "true" PHOAS terms.
PHOAS° : Ty → Set₁
PHOAS° A = ∀ {v} → PHOAS v A

de Bruijn to PHOAS

This direction is trivial — anecdotal evidence that the de Bruijn representation is easier to do transformations on.

phoasify : NP v Γ → DB Γ A → PHOAS v A
phoasify γ (var x)   = var (lookup γ x)
phoasify γ (app f t) = app (phoasify γ f) (phoasify γ t)
phoasify γ (lam t)   = lam λ x → phoasify (x ∷ γ) t
phoasify γ (abs t)   = abs (phoasify γ t)

Interlude: Well-formedness of PHOAS terms

Adam Chlipala defines an equivalence relation between two PHOAS terms (exp_equiv in Intensional, wf in the CPDT book). We only need single-term well-formedness, so we can do a little less.

The goal is to rule out standalone terms like

module Invalid where
  open import Data.Unit using (⊤; tt)

  invalid : PHOAS (λ _ → ⊤) emp
  invalid = var tt

Terms like invalid cannot be values of PHOAS°, as all values of "v" inside PHOAS° have to originate from lam-constructor abstractions. We really should keep the v parameter free, i.e. parametric, when constructing PHOAS terms.

The idea is then simply to track which variables (values of v) are introduced by lambda abstractions.

data phoasWf {v : Ty → Set} (G : List (Σ Ty v)) : {A : Ty} → PHOAS v A → Set
 where
  varWf : ∀ {A} {x : v A}
    → Idx (A , x) G
    → phoasWf G (var x)
  appWf : ∀ {A B} {f : PHOAS v (fun A B)} {t : PHOAS v A}
    → phoasWf G f
    → phoasWf G t
    → phoasWf G (app f t)
  lamWf : ∀ {A B} {f : v A → PHOAS v B}
    → (∀ (x : v A) → phoasWf ((A , x) ∷ G) (f x))
    → phoasWf G (lam f)
  absWf : ∀ {A} {t : PHOAS v emp}
    → phoasWf G t
    → phoasWf G (abs {A = A} t)

-- closed terms start with an empty G
phoasWf° : PHOAS° A → Set
phoasWf° tm = ∀ {v} → phoasWf {v = v} [] tm

A meta-theorem is then that all PHOAS° terms are well-formed, i.e.

meta-theorem-proposition : Set₁
meta-theorem-proposition = ∀ {A} (t : PHOAS° A) → phoasWf° t

As far as I'm aware this proposition cannot be proved nor refuted in Agda.

de Bruijn to PHOAS translation creates well-formed PHOAS terms.

As a small exercise we can show that phoasify of closed de Bruijn terms creates well-formed PHOAS terms.

toList : NP v Γ → List (Σ Ty v)
toList []       = []
toList (x ∷ xs) = (_ , x) ∷ toList xs

phoasifyWfVar : (γ : NP v Γ) (x : Var Γ A) → Idx (A , lookup γ x) (toList γ)
phoasifyWfVar (x ∷ γ) zero    = zero
phoasifyWfVar (x ∷ γ) (suc i) = suc (phoasifyWfVar γ i)

phoasifyWf : (γ : NP v Γ) (t : DB Γ A) → phoasWf (toList γ) (phoasify γ t)
phoasifyWf γ (var x)   = varWf (phoasifyWfVar γ x)
phoasifyWf γ (app f t) = appWf (phoasifyWf γ f) (phoasifyWf γ t)
phoasifyWf γ (lam t)   = lamWf λ x → phoasifyWf (x ∷ γ) t
phoasifyWf γ (abs t)   = absWf (phoasifyWf γ t)

phoasifyWf° : (t : DB [] A) → phoasWf° (phoasify [] t)
phoasifyWf° t = phoasifyWf [] t

PHOAS to de Bruijn

The rest deals with the opposite direction.

In Intensional, Adam Chlipala uses the v = λ _ → ℕ instantiation to make the translation.

I think that in the typed setting using v = λ _ → Ctx turns out nicer.

The idea in both is that we instantiate PHOAS variables to be de Bruijn levels.

data IsSuffixOf {ℓ} {a : Set ℓ} : List a → List a → Set ℓ where
  refl : ∀ {xs} → IsSuffixOf xs xs
  cons : ∀ {xs ys} → IsSuffixOf xs ys → ∀ {y} → IsSuffixOf xs (y ∷ ys)

We need to establish well-formedness of a PHOAS expression in relation to some context Γ.

Note that variables encode de Bruijn levels; thus the contexts we "remember" in variables should be suffixes of that outside context.

wf : (Γ : Ctx) → PHOAS (λ _ → Ctx) A → Set
wf {A = A} Γ (var Δ)         = IsSuffixOf (A ∷ Δ) Γ
wf         Γ (app f t)       = wf Γ f × wf Γ t
wf         Γ (lam {A = A} t) = wf (A ∷ Γ) (t Γ)
wf         Γ (abs t)         = wf Γ t

And if (A ∷ Δ) is a suffix of the context Γ, we can convert the evidence into a de Bruijn index (i.e. a variable):

makeVar : IsSuffixOf (A ∷ Δ) Γ → Var Γ A
makeVar refl     = zero
makeVar (cons s) = suc (makeVar s)

Given that the term is well-formed in relation to the context Γ, we can convert it to the de Bruijn representation.

dbify : (t : PHOAS (λ _ → Ctx) A) → wf Γ t → DB Γ A
dbify         (var x)   wf        = var (makeVar wf)
dbify         (app f t) (fʷ , tʷ) = app (dbify f fʷ) (dbify t tʷ)
dbify {Γ = Γ} (lam t)   wf        = lam (dbify (t Γ) wf)
dbify         (abs t)   wf        = abs (dbify t wf)

What is left is to show that we can construct wf for all phoasWf-well-formed terms.

Adam Chlipala defines a helper function:

makeG′ : Ctx → List (Σ Ty (λ _ → Ctx))
makeG′ [] = []
makeG′ (A ∷ Γ) = (A , Γ) ∷ makeG′ Γ

However, for somewhat technical reasons, we instead define

expand : (Γ : Ctx) → NP (λ _ → Ctx) Γ
expand []      = []
expand (_ ∷ Γ) = Γ ∷ expand Γ

and use expand with previously defined toList to define our version of makeG:

makeG : Ctx → List (Σ Ty (λ _ → Ctx))
makeG Γ = toList (expand Γ)

makeG and makeG′ are the same:

toList∘expand≡makeG : ∀ Γ → makeG Γ ≡ makeG′ Γ
toList∘expand≡makeG []      = refl
toList∘expand≡makeG (A ∷ Γ) = cong ((A , Γ) ∷_) (toList∘expand≡makeG Γ)

Then we can construct wf for all phoasWf:

wfWfVar : Idx (A , Δ) (makeG Γ) → IsSuffixOf (A ∷ Δ) Γ
wfWfVar {Γ = B ∷ Γ} zero    = refl
wfWfVar {Γ = B ∷ Γ} (suc i) = cons (wfWfVar i)

wfWf : (t : PHOAS (λ _ → Ctx) A) → phoasWf (makeG Γ) t → wf Γ t
wfWf         (var x)   (varWf xʷ)    = wfWfVar xʷ
wfWf         (app f t) (appWf fʷ tʷ) = wfWf f fʷ , wfWf t tʷ
wfWf {Γ = Γ} (lam f)   (lamWf fʷ)    = wfWf (f Γ) (fʷ Γ)
wfWf         (abs t)   (absWf tʷ)    = wfWf t tʷ

And finally we define dbify° for all well-formed PHOAS° terms.

dbify° : (t : PHOAS° A) → phoasWf° t → DB [] A
dbify° t w = dbify t (wfWf t w)

Bonus section

We can show that converting closed de Bruijn term to PHOAS and back is an identity function:

bonus-var : (x : Var Γ A) → x ≡ makeVar (wfWfVar (phoasifyWfVar (expand Γ) x))
bonus-var {Γ = A ∷ Γ} zero    = refl
bonus-var {Γ = A ∷ Γ} (suc i) = cong suc (bonus-var i)

bonus : (t : DB Γ A)
      → t ≡ dbify (phoasify (expand Γ) t)
              (wfWf (phoasify (expand Γ) t) (phoasifyWf _ t))
bonus (var x)   = cong var (bonus-var x)
bonus (app f t) = cong₂ app (bonus f) (bonus t)
bonus (lam t)   = cong lam (bonus t)
bonus (abs t)   = cong abs (bonus t)

bonus° : ∀ (t : DB [] A) → t ≡ dbify° (phoasify [] t) (phoasifyWf° t)
bonus° t = bonus t

February 13, 2025 12:00 AM

February 12, 2025

Well-Typed.Com

The Haskell Unfolder Episode 39: deriving strategies

Today, 2025-02-12, at 1930 UTC (11:30 am PST, 2:30 pm EST, 7:30 pm GMT, 20:30 CET, …) we are streaming the 39th episode of the Haskell Unfolder live on YouTube.

The Haskell Unfolder Episode 39: deriving strategies

In this episode we’ll discuss the four different ways GHC offers for deriving class instance definitions: the classic “stock” deriving, generalised “newtype” deriving, as well as the “anyclass” and “via” strategies. For each of these, we’ll explain the underlying ideas, use cases, and limitations.
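
Not from the episode itself, but as a quick illustration of the four strategies (the class Describe and the types Age, Hex and Port are made-up names for this sketch):

{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DerivingVia #-}

import Numeric (showHex)

-- A class with a default method, so an "anyclass"-derived (empty) instance
-- still does something useful.
class Describe a where
  describe :: a -> String
  describe _ = "<no description>"

newtype Age = Age Int
  deriving stock    Show      -- generated from the structure of the datatype
  deriving newtype  (Eq, Num) -- reuses the instances of the underlying Int
  deriving anyclass Describe  -- empty instance; methods fall back to defaults

-- A carrier type whose Show instance we want to borrow.
newtype Hex = Hex Int
instance Show Hex where
  show (Hex n) = "0x" ++ showHex n ""

newtype Port = Port Int
  deriving Show via Hex       -- Port's Show is Hex's Show, transported by coercion

main :: IO ()
main = do
  print (Age 41 + 1)           -- Age 42           (stock Show, newtype Num)
  putStrLn (describe (Age 42)) -- <no description> (anyclass default)
  print (Port 8080)            -- 0x1f90           (via Hex)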

About the Haskell Unfolder

The Haskell Unfolder is a YouTube series about all things Haskell hosted by Edsko de Vries and Andres Löh, with episodes appearing approximately every two weeks. All episodes are live-streamed, and we try to respond to audience questions. All episodes are also available as recordings afterwards.

We have a GitHub repository with code samples from the episodes.

And we have a public Google calendar (also available as ICal) listing the planned schedule.

There’s now also a web shop where you can buy t-shirts and mugs (and potentially in the future other items) with the Haskell Unfolder logo.

by andres, edsko at February 12, 2025 12:00 AM

February 11, 2025

Oleg Grenrus

NbE PHOAS

Posted on 2025-02-11 by Oleg Grenrus agda

Normalization by evaluation using parametric higher order syntax. In Agda.

I couldn't find a self-contained example of PHOAS NbE, so here it is. I hope someone might find it useful.

module NbEXP.PHOAS where

data Ty : Set where
  emp : Ty
  fun : Ty → Ty → Ty

data Tm (v : Ty → Set) : Ty → Set where
  var : ∀ {a} → v a → Tm v a
  app : ∀ {a b} → Tm v (fun a b) → Tm v a → Tm v b
  lam : ∀ {a b} → (v a → Tm v b) → Tm v (fun a b)

data Nf (v : Ty → Set) : Ty → Set
data Ne (v : Ty → Set) : Ty → Set

data Ne v where
  nvar : ∀ {a} → v a → Ne v a
  napp : ∀ {a b} → Ne v (fun a b) → Nf v a → Ne v b

data Nf v where
  neut : Ne v emp → Nf v emp
  nlam : ∀ {a b} → (v a → Nf v b) → Nf v (fun a b)

Sem : (Ty → Set) → Ty → Set
Sem v emp       = Ne v emp
Sem v (fun a b) = Sem v a → Sem v b

lower : ∀ {v : Ty → Set} (a : Ty) → Sem v a → Nf v a
raise : ∀ {v : Ty → Set} (a : Ty) → Ne v a → Sem v a

lower emp       s = neut s
lower (fun a b) s = nlam λ x → lower b (s (raise a (nvar x)))

raise emp       n   = n
raise (fun a b) n x = raise b (napp n (lower a x ))

eval : {v : Ty → Set} {a : Ty} → Tm (Sem v) a → Sem v a
eval (var x)   = x
eval (app f t) = eval f (eval t)
eval (lam t) x = eval (t x)

nf : {a : Ty} → {v : Ty → Set} → Tm (Sem v) a → Nf v a
nf {a} t = lower a (eval t)

nf_parametric : {a : Ty} → ({v : Ty → Set} → Tm v a) → ({v : Ty → Set} → Nf v a)
nf_parametric t = nf t

February 11, 2025 12:00 AM

February 10, 2025

Oskar Wickström

Machine: Learning; Human: Unlearning;

This last month has been fascinating. I guess LLMs have finally resonated with me on a deeper level. It wasn’t like I woke up and suddenly everything was different, but their impact is growing on me non-linearly, forcing me to rewire my brain.

February 10, 2025 11:00 PM

February 06, 2025

Philip Wadler

I've been nominated for a teaching award


I've been fortunate to be nominated for a few teaching awards over my career, and even to win a couple. The nomination I just received may be the best.

As a new student at the uni, Philip Wadler was the first introductory lecture I had, and his clear passion for the subject made me feel excited to begin my journey in computer science. In particular he emphasised the importance of asking questions, which made the idea of tutorials and lectures a lot less intimidating, and went on to give really valuable advice for starting university. I enjoyed this session so much, and so was looking forward to the guest lectures he was going to do for Inf1A at the end of semester 1. They certainly did not disappoint, the content he covered was engaging, interesting, and above all very entertaining to listen to, especially when he dressed up as a superhero to cement his point. Because I found these talks so rewarding, I also attended the STMU that he spoke at about AI and ChatGPT, and everyone I talked to after the event said they had a really good time whilst also having a completely new insightful perspective on the topic. In summary, Philip Wadler has delivered the best lectures I have attended since starting university, and I have gotten a lot out of them.

Thank you, anonymous first-year student! 

by Philip Wadler (noreply@blogger.com) at February 06, 2025 10:07 PM

February 04, 2025

Michael Snoyman

Who pays a tax?

President Trump has started rolling out his tariffs, something I blogged about in November. People are talking about these tariffs a lot right now, with many people (correctly) commenting on how consumers will end up with higher prices as a result of these tariffs. While that part is true, I’ve seen a lot of people taking it to the next, incorrect step: that consumers will pay the entirety of the tax. I put up a poll on X to see what people thought, and while the right answer got a lot of votes, it wasn't the winner.

For purposes of this blog post, our ultimate question will be the following:

  • Suppose apples currently sell for $1 each in the entire United States.
  • There are domestic sellers and foreign sellers of apples, all receiving the same price.
  • There are no taxes or tariffs on the purchase of apples.
  • The question is: if the US federal government puts a $0.50 import tariff per apple, what will be the change in the following:
    • Number of apples bought in the US
    • Price paid by buyers for apples in the US
    • Post-tax price received by domestic apple producers
    • Post-tax price received by foreign apple producers

Before we can answer that question, we need to ask an easier, first question: before instituting the tariff, why do apples cost $1?

And finally, before we dive into the details, let me provide you with the answers to the ultimate question. I recommend you try to guess these answers before reading this, and if you get it wrong, try to understand why:

  1. The number of apples bought will go down
  2. The buyers will pay more for each apple they buy, but not the full amount of the tariff
  3. Domestic apple sellers will receive a higher price per apple
  4. Foreign apple sellers will receive a lower price per apple, but not lowered by the full amount of the tariff

In other words, regardless of who sends the payment to the government, both taxed parties (domestic buyers and foreign sellers) will absorb some of the costs of the tariff, while domestic sellers will benefit from the protectionism provided by tariffs and be able to sell at a higher price per unit.

Marginal benefit

All of the numbers discussed below are part of a helper Google Sheet I put together for this analysis. Also, apologies about the jagged lines in the charts below, I hadn’t realized before starting on this that there are some difficulties with creating supply and demand charts in Google Sheets.

Let’s say I absolutely love apples, they’re my favorite food. How much would I be willing to pay for a single apple? You might say “$1, that’s the price in the supermarket,” and in many ways you’d be right. If I walk into supermarket A, see apples on sale for $50, and know that I can buy them at supermarket B for $1, I’ll almost certainly leave A and go buy at B.

But that’s not what I mean. What I mean is: how high would the price of apples have to go everywhere so that I’d no longer be willing to buy a single apple? This is a purely personal, subjective opinion. It’s impacted by how much money I have available, other expenses I need to cover, and how much I like apples. But let’s say the number is $5.

How much would I be willing to pay for another apple? Maybe another $5. But how much am I willing to pay for the 1,000th apple? 10,000th? At some point, I’ll get sick of apples, or run out of space to keep the apples, or not be able to eat, cook, and otherwise preserve all those apples before they rot.

The point being: I’ll be progressively willing to spend less and less money for each apple. This form of analysis is called marginal benefit: how much benefit (expressed as dollars I’m willing to spend) will I receive from each apple? This is a downward sloping function: for each additional apple I buy (quantity demanded), the price I’m willing to pay goes down. This is what gives my personal demand curve. And if we aggregate demand curves across all market participants (meaning: everyone interested in buying apples), we end up with something like this:

Demand curve before tax

Assuming no changes in people’s behavior and other conditions in the market, this chart tells us how many apples will be purchased by our buyers at each price point between $0.50 and $5. And ceteris paribus (all else being equal), this will continue to be the demand curve for apples.

Marginal cost

Demand is half the story of economics. The other half is supply, or: how many apples will I sell at each price point? Supply curves are upward sloping: the higher the price, the more a person or company is willing and able to sell a product.

Let’s understand why. Suppose I have an apple orchard. It’s a large property right next to my house. With about 2 minutes of effort, I can walk out of my house, find the nearest tree, pick 5 apples off the tree, and call it a day. 5 apples for 2 minutes of effort is pretty good, right?

Yes, there was all the effort necessary to buy the land, and plant the trees, and water them… and a bunch more that I likely can’t even guess at. We’re going to ignore all of that for our analysis, because for short-term supply-and-demand movement, we can ignore these kinds of sunk costs. One other simplification: in reality, supply curves often start descending before ascending. This accounts for achieving efficiencies of scale after the first number of units sold. But since both these topics are unneeded for understanding taxes, I won’t go any further.

Anyway, back to my apple orchard. If someone offers me $0.50 per apple, I can do 2 minutes of effort and get $2.50 in revenue, which equates to a $75/hour wage for me. I’m more than happy to pick apples at that price!

However, let’s say someone comes to buy 10,000 apples from me instead. I no longer just walk out to my nearest tree. I’m going to need to get in my truck, drive around, spend the day in the sun, pay for gas, take a day off of my day job (let’s say it pays me $70/hour). The costs go up significantly. Let’s say it takes 5 days to harvest all those apples myself, it costs me $100 in fuel and other expenses, and I lose out on my $70/hour job for 5 days. We end up with:

  • Total expenditure: $100 + $70 * 8 hours a day * 5 days == $2900
  • Total revenue: $5000 (10,000 apples at $0.50 each)
  • Total profit: $2100

So I’m still willing to sell the apples at this price, but it’s not as attractive as before. And as the number of apples purchased goes up, my costs keep increasing. I’ll need to spend more money on fuel to travel more of my property. At some point I won’t be able to do the work myself anymore, so I’ll need to pay others to work on the farm, and they’ll be slower at picking apples than me (less familiar with the property, less direct motivation, etc.). The point being: at some point, the number of apples can go high enough that the $0.50 price point no longer makes me any money.

This kind of analysis is called marginal cost. It refers to the additional amount of expenditure a seller has to spend in order to produce each additional unit of the good. Marginal costs go up as quantity sold goes up. And like demand curves, if you aggregate this data across all sellers, you get a supply curve like this:

Supply curve before tariff

Equilibrium price

We now know, for every price point, how many apples buyers will purchase, and how many apples sellers will sell. Now we find the equilibrium: where the supply and demand curves meet. This point represents where the marginal benefit a buyer would receive from the next apple would be less than the cost it would take the next seller to produce it. Let’s see it in a chart:

Supply and demand before tariff

You’ll notice that these two graphs cross at the $1 price point, where 63 apples are both demanded (bought by consumers) and supplied (sold by producers). This is our equilibrium price. We also have a visualization of the surplus created by these trades. Everything to the left of the equilibrium point and between the supply and demand curves represents surplus: an area where someone is receiving something of more value than they give. For example:

  • When I bought my first apple for $1, but I was willing to spend $5, I made $4 of consumer surplus. The consumer portion of the surplus is everything to the left of the equilibrium point, between the supply and demand curves, and above the equilibrium price point.
  • When a seller sells his first apple for $1, but it only cost $0.50 to produce it, the seller made $0.50 of producer surplus. The producer portion of the surplus is everything to the left of the equilibrium point, between the supply and demand curves, and below the equilibrium price point.

Another way of thinking of surplus is “every time someone got a better price than they would have been willing to take.”
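As a toy illustration, the crossing point can be expressed in a few lines of Haskell. The linear curves below are made up (the real ones live in the linked Google Sheet), but they are chosen so that the numbers land near the $1, 63-apple equilibrium described above:

marginalBenefit, marginalCost :: Double -> Double
marginalBenefit q = 5.00 - 0.0635 * q   -- downward-sloping demand (made-up numbers)
marginalCost    q = 0.50 + 0.0079 * q   -- upward-sloping supply (made-up numbers)

-- The equilibrium quantity is the last apple whose marginal benefit still
-- covers its marginal cost: 63 apples with these curves, at a price of about $1.
equilibriumQty :: Double
equilibriumQty = last [ q | q <- [0 .. 100], marginalBenefit q >= marginalCost q ]

equilibriumPrice :: Double
equilibriumPrice = marginalCost equilibriumQty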

OK, with this in place, we now have enough information to figure out how to price in the tariff, which we’ll treat as a negative externality.

Modeling taxes

Alright, the government has now instituted a $0.50 tariff on every apple sold within the US by a foreign producer. We can generally model taxes by either increasing the marginal cost of each unit sold (shifting the supply curve up), or by decreasing the marginal benefit of each unit bought (shifting the demand curve down). In this case, since only some of the producers will pay the tax, it makes more sense to modify the supply curve.

First, let’s see what happens to the foreign seller-only supply curve when you add in the tariff:

Foreign supply shift from tariff

With the tariff in place, for each quantity level, the price at which the seller will sell is $0.50 higher than before the tariff. That makes sense: if I was previously willing to sell my 82nd apple for $3, I would now need to charge $3.50 for that apple to cover the cost of the tariff. We see this as the tariff “pushing up” or “pushing left” the original supply curve.
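In code, the shift is just adding the tariff to the price a foreign seller needs at every quantity. A minimal illustration with made-up numbers (not the spreadsheet’s):

-- Hypothetical inverse supply curve for foreign sellers: the price needed to
-- make selling the q-th apple worthwhile (made-up numbers, roughly $3 for the
-- 82nd apple as in the example above).
foreignSupplyPrice :: Double -> Double
foreignSupplyPrice q = 0.50 + 0.0305 * q

-- With the $0.50 tariff in place, every quantity needs a price $0.50 higher
-- (roughly $3.50 for that same 82nd apple).
foreignSupplyPriceWithTariff :: Double -> Double
foreignSupplyPriceWithTariff q = foreignSupplyPrice q + 0.50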

We can add this new supply curve to our existing (unchanged) supply curve for domestic-only sellers, and we end up with a result like this:

Supply curves post tariff

The total supply curve adds up the individual foreign and domestic supply curves. At each price point, we add up the total quantity each group would be willing to sell to determine the total quantity supplied for each price point. Once we have that cumulative supply curve defined, we can produce an updated supply-and-demand chart including the tariff:

Supply and demand post tariff

As we can see, the equilibrium has shifted:

  • The equilibrium price paid by consumers has risen from $1 to $1.20.
  • The total number of apples purchased has dropped from 63 apples to 60 apples.
  • Consumers therefore received 3 fewer apples. They spent $72 for these 60 apples, whereas previously they spent $63 for 63 apples, a definite decrease in consumer surplus.
  • Foreign producers sold 36 of those apples (see the raw data in the linked Google Sheet), for a gross revenue of $43.20. However, they also need to pay the tariff to the US government, which accounts for $18, meaning they only receive $25.20 post-tariff. Previously, they sold 42 apples at $1 each with no tariff to be paid, meaning they took home $42.
  • Domestic producers sold the remaining 24 apples at $1.20, giving them a revenue of $28.80. Since they don’t pay the tariff, they take home all of that money. By contrast, previously, they sold 21 apples at $1, for a take-home of $21.
  • The government receives $0.50 for each of the 36 foreign apples sold, or in other words receives $18 in revenue it wouldn’t have received otherwise.

We could be more specific about the surpluses, and calculate the actual areas for consumer surplus, producer surplus, inefficiency from the tariff, and government revenue from the tariff. But I won’t bother, as those calculations get slightly more involved. Instead, let’s just look at the aggregate outcomes:

  • Consumers were unquestionably hurt. Their price paid went up by $0.20 per apple, and they received fewer apples.
  • Foreign producers were also hurt. They now receive the new post-tariff price of $1.20 minus the $0.50 tariff, i.e. only $0.70 per apple, down from the original $1. This hurt can be mitigated by shifting sales to other countries without a tariff, but the pain will exist regardless.
  • Domestic producers scored. They sell more apples, at a higher price, and make more revenue doing it.
  • And the government walked away with an extra $18.

Hopefully you now see the answer to the original questions. Importantly, while the government imposed a $0.50 tariff, neither side fully absorbed that cost. Consumers paid a bit more, foreign producers received a bit less. The exact details of how that tariff was split across the groups is mediated by the relevant supply and demand curves of each group. If you want to learn more about this, the relevant search term is “price elasticity,” or how much a group’s quantity supplied or demanded will change based on changes in the price.
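To make the bookkeeping concrete, here is a small Haskell sketch that recomputes the split from the numbers quoted above; it is pure arithmetic on the quoted figures, not a derivation from the curves. Note that the government’s take equals the tariff paid by foreign sellers: 36 apples at $0.50 each, or $18.

-- Post-tariff revenue split, in cents to keep the arithmetic exact:
-- 60 apples at $1.20, of which 36 foreign and 24 domestic, with a $0.50
-- tariff per foreign apple.
main :: IO ()
main = do
  let priceC       = 120 :: Int                 -- post-tariff price per apple, in cents
      tariffC      = 50                         -- tariff per foreign apple, in cents
      foreignQty   = 36
      domesticQty  = 24
      foreignGross = foreignQty * priceC        -- 4320c = $43.20
      tariffPaid   = foreignQty * tariffC       -- 1800c = $18.00 (government revenue)
      foreignNet   = foreignGross - tariffPaid  -- 2520c = $25.20
      domesticRev  = domesticQty * priceC       -- 2880c = $28.80
  print ( foreignGross, tariffPaid, foreignNet, domesticRev )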

Other taxes

Most taxes are some kind of a tax on trade. A tariff on apples is an obvious one. But the same applies to income tax (taxing the worker for the trade of labor for money) or payroll tax (same thing, just taxing the employer instead). Interestingly, you can use the same model for analyzing things like tax incentives. For example, if the government decided to subsidize domestic apple production by giving the domestic producers a $0.50 bonus for each apple they sell, we would end up with a similar kind of analysis, except instead of the foreign supply curve shifting up, we’d see the domestic supply curve shifting down.

And generally speaking, this is what you’ll always see with government involvement in the economy. It disrupts an existing equilibrium, lets the market readjust to a new equilibrium, and incentivizes some behavior, causing some people to benefit and others to lose out. As we saw with the apple tariff, domestic producers and the government benefited while others lost.

You can see the reverse, though, with tax incentives. If the government provides a deduction (not paying income tax on the amount spent) for preschool, we would end up with:

  • The government needs to make up the difference in tax revenue, either by raising taxes on others or by printing more money (leading to inflation). Either way, those paying the tax or those holding the government’s debased currency will pay a price.
  • Those people who don’t use the preschool deduction will receive no benefit, so they simply pay a cost.
  • Those who do use the preschool deduction will end up paying less on tax+preschool than they would have otherwise.

This analysis is fully amoral. It’s not saying whether providing subsidized preschool is a good thing or not; it simply tells you where the costs will be felt, and points out that such government interference in free economic choice does result in inefficiencies in the system. Once you have that knowledge, you’re better equipped to decide whether the costs of government intervention are worth the benefits.

February 04, 2025 12:00 AM

February 02, 2025

Joachim Breitner

Coding on my eInk Tablet

For many years I wished I had a setup that would allow me to work (that is, code) productively outside in the bright sun. It’s winter right now, but when it’s summer again, the bright sun always makes this a bit of a challenge. This weekend I got closer to that goal.

TL;DR: Using code-server on a beefy machine seems to be quite neat.

Passively lit coding

Personal history

Looking back at my own old blog entries I find one from 10 years ago describing how I bought a Kobo eBook reader with the intent of using it as an external monitor for my laptop. It seems that I got a proof-of-concept setup working, using VNC, but it was tedious to set up, and I never actually used that. I subsequently noticed that the eBook reader is rather useful for reading eBooks, and it has been in heavy use for that ever since.

Four years ago I gave this old idea another shot and bought an Onyx BOOX Max Lumi. This is an A4-sized tablet running Android, and it had the very promising feature of an HDMI input. So hopefully I could just attach it to my laptop and it would just work™. Turns out that this never worked as well as I hoped: even if I set the resolution to exactly the tablet screen’s resolution I got blurry output, and it also drained the battery a lot, so I gave up on this. I subsequently noticed that the tablet is rather useful for taking notes, and it has been in sporadic use for that.

Going off on this tangent: I later learned that the HDMI input of this device appears to the system like a camera input, and I don’t have to use Boox’s “monitor” app but could use other apps like FreeDCam as well. This somehow managed to fix the resolution issues, but the setup still wasn’t convenient enough to use regularly.

I also played around with pure terminal approaches, e.g. SSH’ing into a system, but since my usual workflow was never purely text-based (I was at least used to using a window manager instead of a terminal multiplexer like screen or tmux) that never led anywhere either.

VSCode, working remotely

Since these attempts I have started a new job working on the Lean theorem prover, and working on or with Lean basically means using VSCode. (There is a very good neovim plugin as well, but I’m using VSCode nevertheless, if only to make sure I am dogfooding our default user experience).

My colleagues have said good things about using VSCode with the remote SSH extension to work on a beefy machine, so I gave this a try now as well, and while it’s not a complete game changer for me, it does make certain tasks (rebuilding everything after switching branches, running the test suite) very convenient. And it’s a bit spooky to run these workloads without the laptop’s fan spinning up.

In this setup, the workspace is remote, but VSCode still runs locally. But it made me wonder about my old goal of being able to work reasonably efficiently on my eInk tablet. Can I replicate this setup there?

VSCode itself doesn’t run on Android directly. There are projects that run a Linux chroot or run inside termux on the Android system, and then you can connect to it via VNC (e.g. with Andronix)… but that did not seem promising. It seemed fiddly, and I probably should take it easy on the tablet’s system.

code-server, running remotely

A more promising option is code-server. This is a fork of VSCode (actually of VSCodium) that runs completely on the remote machine, and the client machine just needs a browser. I set that up this weekend and found that I was able to do a little bit of work reasonably well.

Access

With code-server one has to decide how to expose it safely enough. I decided against the tunnel-over-SSH option, as I expected that to be somewhat tedious to set up (both initially and for each session) on the Android system, and I liked the idea of being able to use any device to work in my environment.

I also decided against the more involved “reverse proxy behind proper hostname with SSL” setups, because they involve a few extra steps, and some of them I cannot do as I do not have root access on the shared beefy machine I wanted to use.

That left me with the option of using code-server’s built-in support for self-signed certificates and a password:

$ cat .config/code-server/config.yaml
bind-addr: 1.2.3.4:8080
auth: password
password: xxxxxxxxxxxxxxxxxxxxxxxx
cert: true

With trust-on-first-use this seems reasonably secure.

Update: I noticed that the browsers would forget that I trust this self-signed cert after restarting the browser, and also that I cannot “install” the page (as a Progressive Web App) unless it has a valid certificate. But since I don’t have superuser access to that machine, I can’t just follow the official recommendation of using a reverse proxy on port 80 or 443 with automatic certificates. Instead, I pointed a hostname that I control to that machine, obtained a certificate manually on my laptop (using acme.sh) and copied the files over, so the configuration now reads as follows:

bind-addr: 1.2.3.4:3933
auth: password
password: xxxxxxxxxxxxxxxxxxxxxxxx
cert: .acme.sh/foobar.nomeata.de_ecc/foobar.nomeata.de.cer
cert-key: .acme.sh/foobar.nomeata.de_ecc/foobar.nomeata.de.key

(This is getting very specific to my particular needs and constraints, so I’ll spare you the details.)

Service

To keep code-server running I created a systemd service that’s managed by my user’s systemd instance:

~ $ cat ~/.config/systemd/user/code-server.service
[Unit]
Description=code-server
After=network-online.target

[Service]
Environment=PATH=/home/joachim/.nix-profile/bin:/nix/var/nix/profiles/default/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
ExecStart=/nix/var/nix/profiles/default/bin/nix run nixpkgs#code-server

[Install]
WantedBy=default.target

(I am using nix as a package manager on a Debian system there, hence the additional PATH and complex ExecStart. If you have a more conventional setup then you do not have to worry about Environment and can likely use ExecStart=code-server.)

For this to survive me logging out I had to ask the system administrator to run loginctl enable-linger joachim, so that systemd allows my jobs to linger.

Git credentials

The next issue to be solved was how to access the git repositories. The work is all on public repositories, but I still need a way to push my work. With the classic VSCode-SSH-remote setup from my laptop, this is no problem: My local SSH key is forwarded using the SSH agent, so I can seamlessly use that on the other side. But with code-server there is no SSH key involved.

I could create a new SSH key and store it on the server. That did not seem appealing, though, because SSH keys on GitHub always have full access. It wouldn’t be horrible, but I still wondered if I could do better.

I thought of creating fine-grained personal access tokens that only allow me to push code to specific repositories, and nothing else, and just storing them permanently on the remote server. That would still be a neat and convenient option, but creating PATs for our org requires approval and I didn’t want to bother anyone on the weekend.

So I am experimenting with Github’s git-credential-manager now. I have configured it to use git’s credential cache with an elevated timeout, so that once I log in, I don’t have to again for one workday.

$ nix-env -iA nixpkgs.git-credential-manager
$ git-credential-manager configure
$ git config --global credential.credentialStore cache
$ git config --global credential.cacheOptions "--timeout 36000"

To log in, I have to visit https://github.com/login/device on an authenticated device (e.g. my phone) and enter an 8-character code. Not too shabby in terms of security. I only wish that webpage would not require me to press Tab after each character…

This still grants rather broad permissions to the code-server, but at least only temporarily.

Android setup

On the client side I could now open https://host.example.com:8080 in Firefox on my eInk Android tablet, click through the warning about self-signed certificates, log in with the fixed password mentioned above, and start working!

I switched to a theme that supposedly is eInk-optimized (eInk by Mufanza). It’s not perfect (e.g. git diffs are unhelpful because it is not possible to distinguish deleted from added lines), but it’s a start. There are more eInk themes on the official Visual Studio Marketplace, but because code-server is a fork it cannot use that marketplace, and for example this theme isn’t on Open-VSX.

For some reason the F11 key doesn’t work, but going fullscreen is crucial, because screen real estate is scarce in this setup. I can go fullscreen using VSCode’s command palette (Ctrl-P) and invoking the command there, but Firefox often jumps out of the fullscreen mode, which is annoying. I still have to pay attention to when that’s happening; maybe it’s the Esc key, which I am of course using a lot since I use vim bindings.

A more annoying problem was that on my Boox tablet, sometimes the on-screen keyboard would pop up, which is seriously annoying! It took me a while to track this down: the Boox has two virtual keyboards installed: the usual Google AOSP keyboard, and the Onyx Keyboard. The former is clever enough to stay hidden when there is a physical keyboard attached, but the latter isn’t. Moreover, pressing Shift-Ctrl on the physical keyboard rotates through the virtual keyboards. Now, VSCode has many keyboard shortcuts that require Shift-Ctrl (especially on an eInk device, where you really want to avoid using the mouse). And the limited settings exposed by the Boox Android system do not allow you to configure that or to disable the Onyx keyboard! To solve this, I had to install the KISS Launcher, which allows me to see more Android settings, and in particular to disable the Onyx keyboard. So this is fixed.

I was hoping to improve the experience even more by opening the web page as a Progressive Web App (PWA), as described in the code-server FAQ. Unfortunately, that did not work. Firefox on Android did not recognize the site as a PWA (even though it recognizes a PWA test page). And I couldn’t use Chrome either because (unlike Firefox) it would not consider a site with a self-signed certificate as a secure context, and then code-server does not work fully. Maybe this is just some bug that gets fixed in later versions.

Now that I use a proper certificate, I can use it as a Progressive Web App, and with Firefox on Android this starts the app in full-screen mode (no system bars, no location bar). The F11 key still doesn’t work, and using the command palette to enter fullscreen does nothing visible, but then Esc leaves that fullscreen mode and I suddenly have the system bars again. But maybe if I just don’t do that I get the full-screen experience. We’ll see.

I have not yet worked with this enough to assess how much the smaller screen real estate, the lack of colors and the slower refresh rate will bother me. I probably need to hide Lean’s InfoView more often, and maybe use the Error Lens extension, to avoid having to split my screen vertically.

I also cannot easily work on a park bench this way, with a tablet and a separate external keyboard. I’d need at least a table, or some additional piece of hardware that turns tablet + keyboard into some laptop-like structure that I can put on my, well, lap. There are cases for Onyx products that include a keyboard, and maybe they work on the lap, but they don’t have the Trackpoint that I have on my ThinkPad TrackPoint Keyboard II, and how can you live without that?

Conclusion

After this initial setup chances are good that entering and using this environment is convenient enough for me to actually use it; we will see when it gets warmer.

A few bits could be better. In particular logging in and authenticating GitHub access could be both more convenient and more safe – I could imagine that when I open the page I confirm that on my phone (maybe with a fingerprint), and that temporarily grants access to the code-server and to specific GitHub repositories only. Is that easily possible?

by Joachim Breitner (mail@joachim-breitner.de) at February 02, 2025 03:07 PM

January 31, 2025

Well-Typed.Com

An introduction to Cabal Hooks for package authors

Over the last year, Well-Typed have carried out significant work in Cabal, Haskell’s build system, thanks to funding from the Sovereign Tech Fund. Our main goal was to re-think the Cabal architecture for building packages. This was historically tied to the Setup command-line interface, with each package technically capable of providing its own independent build system via the Custom build-type. In practice, the full generality of this interface is not useful: it obstructs the development of new features and creates a drag on maintenance, so there has long been an appetite to reimagine this interface within Cabal.1

With the release of Cabal-3.14.0.0 and cabal-install-3.14.1.1, the new Hooks build-type we have developed and the accompanying Cabal-hooks library are now available to package authors. Over time, we hope to see packages that depend on the Custom build-type gradually migrate to use Hooks instead.

For more background on this work, check out:

In the remainder of this post, we will:

  • dive into the background details of how Cabal works,

  • provide an introduction to the new interfaces for package authors who may wish to adapt their packages.

This post is based on Sam’s talk at the Haskell Ecosystem Workshop 2024.

Background

The Cabal specification

The Cabal specification (2005) was designed to allow Haskell tool authors to package their code and share it with other developers.

The Haskell Package System (Cabal) has the following main goal:

  • to specify a standard way in which a Haskell tool can be packaged, so that it is easy for consumers to use it, or re-package it, regardless of the Haskell implementation or installation platform.

The Cabal concept of a package is a versioned unit of distribution in source format, with enough metadata to allow it to be built and packaged by downstream distributors (e.g. Linux distributions and other build tools).

A Cabal package consists of multiple components which map onto individual Haskell units (e.g. a single library or executable).

The Cabal package model

Each package must bundle some metadata, specified in a .cabal file. Chiefly:

  • the package name and version number,
  • its dependencies, including version bounds (e.g. base >= 4.17 && < 4.21, lens ^>= 5.3),
  • what the package provides (libraries and their exposed modules, executables…),
  • how to build the package (e.g. build-type: Simple).

The Cabal library then implements everything required to build individual packages, first parsing the .cabal file and then building and invoking the Setup script of the package.

The Setup interface

The key component of the original Cabal specification is that each package must provision an executable which is used to build it. As written in an early draft:

To help users install packages and their dependencies, we propose a system similar to Python’s Distutils, where each Haskell package is distributed with a script which has a standard command-line interface.

More precisely, to comply with the Cabal specification, the build system of a package need only implement the Setup command-line interface, i.e. provide a Setup executable that supports invocations of the form ./Setup <cmd>:

<cmd>                  description
configure              resolve compiler, tools and dependencies
build/haddock/repl     prepare sources and build / generate docs / open a session in the interpreter
test/bench             run testsuites or benchmarks
copy/install/register  move files into an image dir or final location / register libraries with the compiler
sdist                  create an archive for distribution/packaging
clean                  clean local files (local package store, local build artifacts, …)

In practice, the ./Setup configure command takes a large number of parameters (as represented in the Cabal ConfigFlags datatype). This configuration is preserved for subsequent invocations, which usually only take a couple of parameters (e.g. ./Setup build -v2 --builddir=<dir>).

This interface can be used directly to build any package, by executing the following recipe:

  • build and install the dependencies in dependency order;
  • to build each individual unit:
    • ./Setup configure <componentName> <configurationArgs>
    • ./Setup build --builddir=<buildDir>
    • ./Setup haddock --builddir=<buildDir> <haddockArgs> (optional, to generate documentation)
  • to make a unit available to units that depend on it:
    • ./Setup copy --builddir=<buildDir> --destDir=<destDir> (this makes executables available, e.g. for build-tool-depends)
    • for libraries, registration (see § Library registration):
      • ./Setup register --builddir=<buildDir> --gen-pkg-config=<unitPkgRegFile>
      • hc-pkg register --package-db=<pkgDb> <unitPkgRegFile>

Usually, these steps will be executed by a build tool such as cabal-install, which provides a more convenient user interface than invoking Setup commands directly. Some systems (such as nixpkgs) do directly use this interface, however.

The tricky parts in the above are:

  • passing appropriate arguments to ./Setup configure, in particular exactly specifying dependencies,2 and making sure the arguments are consistent with those expected by the cabal-version of the package,3
  • constructing the correct environment for invoking ./Setup, e.g. adding appropriate build-tool-depends executables in PATH and defining the corresponding <buildTool>_datadir environment variables.

Library registration

In the above recipe to build packages, there was a single step which wasn’t an invocation of the Setup script: a call to hc-pkg. To quote from the original Cabal specification:

  • Each Haskell compiler hc must provide an associated package-management program hc-pkg. A compiler user installs a package by placing the package’s supporting files somewhere, and then using hc-pkg to make the compiler aware of the new package. This step is called registering the package with the compiler.
  • To register a package, hc-pkg takes as input an installed package description (IPD), which describes the installed form of the package in detail.

This is the key interchange mechanism between Cabal and the Haskell compiler.

The installed package description format is laid out in the Cabal specification; in brief, it contains all the information the Haskell compiler needs to use a library, such as its exposed modules, its dependencies, and its installation path. This information can be seen by calling hc-pkg describe:

> ghc-pkg describe attoparsec --package-db=<cabal-store>/<ghc-ver>/package.db
name:            attoparsec
version:         0.14.4
visibility:      public
id:              attoparsec-0.14.4-b35cdbf2c0654f3ef00c00804c5e2b390700d4a0
abi:             d84b6b3e46222f7ab87b5a2d405e7f48
exposed:         True
exposed-modules:
    Data.Attoparsec Data.Attoparsec.ByteString
    [...]
hidden-modules:
    Data.Attoparsec.ByteString.Internal Data.Attoparsec.Text.Internal
depends:
    array-0.5.7.0-9340
    attoparsec-0.14.4-ab0b5b7d4498267368f35b0c9f521e31e33fe144
    base-4.20.0.0-30dc bytestring-0.12.1.0-b549 containers-0.7-2f81
    deepseq-1.5.0.0-30ad ghc-prim-0.11.0-d05e
    scientific-0.3.6.2-d4ceb07500a94c3c60cb88dff4bfb53d40348b25
    text-2.1.1-e169 transformers-0.6.1.1-6955

Note that, perhaps confusingly, the hc-pkg interface is not concerned with Cabal’s notion of “packages”. Rather, it deals only in “units”; these generally map to Cabal components, such as the package’s main library and its private and public sublibraries. For example, the internal attoparsec-internal sublibrary of the attoparsec package is registered separately:

> ghc-pkg describe z-attoparsec-z-internal
name:            z-attoparsec-z-attoparsec-internal
version:         0.14.4
package-name:    attoparsec
lib-name:        attoparsec-internal
id:              attoparsec-0.14.4-ab0b5b7d4498267368f35b0c9f521e31e33fe144
abi:             908ae57d09719bcdfb9cf85a27dab0e4
exposed-modules:
    Data.Attoparsec.ByteString.Buffer
    Data.Attoparsec.ByteString.FastSet Data.Attoparsec.Internal.Compat
    [...]
depends:
    array-0.5.7.0-9340 base-4.20.0.0-30dc bytestring-0.12.1.0-b549
    text-2.1.1-e169

How the Setup interface is used by packages

Centering the package build process around the Setup script provides a great deal of flexibility to package authors, as the Setup executable can be implemented in any way the package author chooses. In this way, each package brings its own build system.

However, in practice, this is more expressiveness than most library authors want or need. Consequently, almost all packages use one of the following two build systems:

  1. build-type: Simple (most packages). For such packages, the Setup.hs file is of the following form:

    module Main where
    import Distribution.Simple (defaultMain)
    main = defaultMain

    This means that the ./Setup CLI interface maps directly to the implementation provided by the Cabal library.

  2. build-type: Custom where the Setup.hs file uses the Cabal library to perform most of the build, but brackets some of its logic with package-specific code using the Cabal UserHooks mechanism, e.g. so that it runs custom configuration code after Cabal configure, or generates module sources before running Cabal build.

For an example of case (2), the custom Setup.hs code for hooking into the configure phase might look like the following:

main =
  ( defaultMainWithHooks simpleUserHooks )
    { confHook = \ info cfgFlags -> do
        info' <- customPreConfHook info cfgFlags
        confHook simpleUserHooks info' cfgFlags
    }

In this example, simpleUserHooks means “no hooks” (or more accurately “exactly the hooks that build-type: Simple uses”). So the above snippet shows how we can include custom logic in customPreConfHook in order to update the Cabal GenericPackageDescription, before calling the Cabal library configure function (via confHook simpleUserHooks). Here, a GenericPackageDescription is the representation of a .cabal file used by Cabal (the Generic part means “before attempting to resolve any conditionals”).
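For concreteness, customPreConfHook above could have the following shape. This is a sketch for illustration, based on the type of confHook in Cabal’s UserHooks, not code taken from an actual package:

-- A possible signature and skeleton for customPreConfHook: it receives the
-- same inputs as Cabal's confHook and returns an adjusted package description
-- to pass on to it.
customPreConfHook
  :: ( GenericPackageDescription, HookedBuildInfo )
  -> ConfigFlags
  -> IO ( GenericPackageDescription, HookedBuildInfo )
customPreConfHook ( gpd, hbi ) _cfgFlags = do
  -- e.g. inspect the system and adjust 'gpd' here
  pure ( gpd, hbi )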

The fact that Setup executables may (in principle) be arbitrary when using build-type: Custom fundamentally limits what build tools such as cabal-install or the Haskell Language Server can do in multi-package projects. The tool has to treat the build system of each package as an opaque black box, merely invoking functionality defined by the specific version of the Setup interface supported by the package.

The main observation is that, in practice, custom Setup.hs scripts only insert benign modifications to the build process: they still fundamentally rely on the Cabal library to do the bulk of the work building the package.

A replacement for Custom setup scripts

The limitations of the Setup interface discussed above motivate the need for a new mechanism to customise the build system of a package:

  • The bulk of the work should be carried out by the Cabal library, which exposes functions such as configure and build, but these need to be augmented with hooks so that individual packages can customise certain phases.
  • The hooks provided by this mechanism should be kept to a minimum (to give more flexibility to build tools) while still accommodating the needs of package authors in practice.
  • Customisation should be declared by a Haskell library interface (as opposed to the black-box command-line interface of Setup.hs), in order to enable as much introspection by build systems as possible.

This will enable a gradual restructuring of build tools such as cabal-install away from the Setup command-line interface, which has grown unwieldy due to the difficulty of evolving it to meet requirements that could not be foreseen when it was created.

Building on this understanding, as well as a survey of existing use cases of build-type: Custom, we have introduced an alternative mechanism for customizing how a package is built: build-type: Hooks. This mechanism does not allow arbitrary replacement of the usual Cabal build logic, but rather merely exposes a set of well-defined hooks which bracket a subset of Cabal’s existing build steps.

We arrived at this design through collaboration with Cabal developers, users, and packagers as part of an RFC process in Haskell Foundation Tech Proposal #60.

Introducing build-type: Hooks

The main documentation for usage of the hooks API is provided in the Haddocks for the Cabal-hooks package. The Cabal Hooks overlay contains patched packages using build-type: Hooks. It can be used as an overlay like head.hackage, for constructing build plans without any build-type: Custom packages. It can also serve as a reference for usage of the API.

At a high-level, a package with build-type: Hooks:

  • declares in its .cabal file:
    • a cabal-version of at least 3.14,
    • build-type: Hooks,
    • a custom-setup stanza with a dependency on Cabal-hooks (the latter is a library bundled with Cabal that provides the API for writing hooks):
cabal-version: 3.14
...
build-type: Hooks
...

custom-setup
  setup-depends:
    base        >= 4.18 && < 5,
    Cabal-hooks >= 0.1  && < 0.2
  • contains a SetupHooks.hs Haskell module source file, next to the .cabal file, which specifies the hooks the package uses. This module exports a value setupHooks :: SetupHooks (in which the SetupHooks type is exported by Distribution.Simple.SetupHooks from the Cabal-hooks package).
module SetupHooks where

-- Cabal-hooks
import Distribution.Simple.SetupHooks

setupHooks :: SetupHooks
setupHooks =
  noSetupHooks
    { configureHooks = myConfigureHooks
    , buildHooks = myBuildHooks }

The new hooks fall into the following categories:

  • configure hooks allow customising how a package will be built
  • pre-build rules allow generating source files to be built
  • post-build hooks allow the package to customise the linking step
  • install hooks allow the package to install additional files alongside the usual binary artifacts

In the remainder of this blog post, we will focus on the two most important (and most commonly used) hooks: configure hooks and pre-build rules.

Configure hooks

The configure hooks allow package authors to make decisions about how to build their package, by modifying the Cabal package description (which is Cabal’s internal representation of the information in a .cabal file). Crucially, these modifications will persist to all subsequent phases.

Configuration happens at two levels:

  • global configuration covers the entire package,
  • local configuration covers a single component.

There are three hooks into the configure phase:

  1. Package-wide pre-configure. This can be used for custom logic in the style of traditional ./configure scripts, e.g. finding out information about the system and configuring dependencies, when those don’t easily fit into Cabal’s framework.

  2. Package-wide post-configure. This can be used to write custom package-wide information to disk, to be consumed by (3).

  3. Per-component pre-configure. This can be used to modify individual components, e.g. adding exposed modules or specifying flags to be used when building the component.

Per-package configuration

Suppose our package needs to use some external executable, e.g. a preprocessor. If we require custom logic to find this external executable on the system, or to parse its version number, we need to go beyond Cabal’s built-in support for build-tool-depends.

We can do this in a pre-configure hook:

myConfigureHooks :: ConfigureHooks
myConfigureHooks =
  noConfigureHooks
    { preConfigurePackageHook = Just configureCustomPreProc }

configureCustomPreProc :: PreConfPackageInputs -> IO PreConfPackageOutputs
configureCustomPreProc pcpi@( PreConfPackageInputs { configFlags = cfg, localBuildConfig = lbc } ) = do
  let verbosity = fromFlag $ configVerbosity cfg
      progDb = withPrograms lbc
  configuredPreProcProg <-
    configureUnconfiguredProgram verbosity customPreProcProg progDb
  return $
    ( noPreConfPackageOutputs pcpi )
      { extraConfiguredProgs =
        Map.fromList
          [ ( customPreProcName, configuredPreProcProg ) ]
      }

customPreProcName :: String
customPreProcName = "customPreProc"

customPreProcProg :: Program
customPreProcProg =
  ( simpleProgram customPreProcName )
    { programFindLocation =
        -- custom logic to find the installed location of myPreProc
        -- on the system used to build the package
        myPreProcProgFindLocation
    , programFindVersion =
        -- custom logic to find the program version
        myPreProcProgFindVersion
    }

Cabal will then add this program to its program database, allowing the program to be used to satisfy build-tool-depends requirements, as well as making it available in subsequent hooks (e.g. pre-build hooks).
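For example, a later hook can fetch the configured program back out of the program database. Here is a minimal sketch, following the lookupProgramByName and withPrograms usage that appears in the pre-build example later in this post:

-- Illustration only: retrieve the pre-processor configured above from a later
-- hook that has access to the LocalBuildInfo.
lookupCustomPreProc :: LocalBuildInfo -> Maybe ConfiguredProgram
lookupCustomPreProc lbi =
  lookupProgramByName customPreProcName ( withPrograms lbi )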

Modifying individual components

Suppose we want to modify a component of a Cabal package, e.g. inserting configuration options determined by inspecting the system used to build the package (e.g. availability of certain processor capabilities). We can do this using hooks into the configure phase. For illustration, consider the following example, which includes:

  • a package-wide post-configure hook, which inspects the system to determine availability of AVX2 CPU features, and writes it out to a "system-info" file,
  • a per-component pre-configure hook which reads the "system-info" file, and uses that to pass appropriate compiler options (e.g. -mavx2) when compiling each component.
myConfigureHooks :: ConfigureHooks
myConfigureHooks =
  noConfigureHooks
    { postConfPackageHook  = Just writeSystemInfo
    , preConfComponentHook = Just confComps
    }

data SystemInfo = SystemInfo { supportsAVX2 :: !Bool }
  deriving stock ( Show, Read )
    -- Show/Read for a quick-and-dirty serialisation interface (illustration only)

systemInfoFlags :: SystemInfo -> [ String ]
systemInfoFlags ( SystemInfo { supportsAVX2 } ) =
  [ "-mavx2" | supportsAVX2 ]

writeSystemInfo :: PostConfPackageInputs -> IO ()
writeSystemInfo ( PostConfPackageInputs { packageBuildDescr = pbd } ) = do
  let cfg = LBC.configFlags pbd
      distPref = fromFlag $ configDistPref cfg
      mbWorkDir = flagToMaybe $ configWorkingDir cfg
  supportsAVX2 <- System.Cpuid.Basic.supportsAVX2
  -- + more system-wide checks, if desired
  writeFile ( interpretSymbolicPath mbWorkDir $ systemInfoFile distPref )
    ( show $ SystemInfo { supportsAVX2 } )

systemInfoFile :: SymbolicPath Pkg ( Dir Dist ) -> SymbolicPath Pkg File
systemInfoFile distPref = distPref </> makeRelativePathEx "system-info"

confComps :: PreConfComponentInputs -> IO PreConfComponentOutputs
confComps pcci@( PreConfComponentInputs { packageBuildDescr = pbd, component = comp } ) = do
  let cfg = LBC.configFlags pbd
      distPref = fromFlag $ configDistPref cfg
      mbWorkDir = flagToMaybe $ configWorkingDir cfg
  sysInfo <- read <$> readFile ( interpretSymbolicPath mbWorkDir $ systemInfoFile distPref )
  let opts = systemInfoFlags sysInfo
      bi' = emptyBuildInfo
              { ccOptions = opts
              , cxxOptions = opts
              , options = PerCompilerFlavor opts []
              }
  return $
    ( noPreConfComponentOutputs pcci )
      { componentDiff =
         buildInfoComponentDiff ( componentName comp ) bi'
      }

Pre-build rules

Pre-build rules can be used to generate Haskell source files which can then be built as part of the compilation of a unit. Since we want to ensure that such generated modules don’t break recompilation avoidance (thereby crippling HLS and other interactive tools), these hooks comprise a simple build system. They are described in the Haddock documentation for Cabal-hooks.

The overall structure is that one specifies a collection of Rules inside the monadic API in the RulesM monad.

Each individual rule contains a Command, consisting of a statically specified action to run (e.g. a preprocessor such as alex, happy or c2hs) bundled with (possibly dynamic) arguments (such as the input and output filepaths). In the Hooks API, these are constructed using the mkCommand function. The actions are referenced using static pointers; this allows the static pointer table of the SetupHooks module to be used as a dispatch table for all the custom preprocessors provided by the hooks.

One registers rules using staticRule, declaring the inputs and outputs of each rule. In this way, we can think of each rule as corresponding to an individual invocation of a custom preprocessor. Rules are also allowed to have dynamic dependencies (using dynamicRule instead of staticRule); this supports use-cases such as C2Hs in which one needs to first process .chs module headers to discover the import structure.

Let’s start with a simple toy example to get used to the API: declare hooks that run alex on Lexer.alex and happy on Parser.happy (running alex/happy on *.x/*.y files is built into Cabal, but this is just for illustrative purposes).

{-# LANGUAGE StaticPointers #-}
-- [...]
myBuildHooks :: BuildHooks
myBuildHooks =
  noBuildHooks
    { preBuildComponentRules =
      Just $ rules ( static () ) myPreBuildRules
    }

myPreBuildRules :: PreBuildComponentInputs -> RulesM ()
myPreBuildRules pbci = do
  -- [...]
  -- Define the alex/happy commands.
      alexCmd  = mkCommand ( static Dict ) ( static runAlex )
      happyCmd = mkCommand ( static Dict ) ( static runHappy )
  -- Register a rule: run alex on Lexer.alex, producing Lexer.hs.
  let lexerInFile  = Location srcDir     ( makeRelativePathEx "Lexer.alex" )
      lexerOutFile = Location autogenDir ( makeRelativePathEx "Lexer.hs" )
  registerRule_ "alex:Lexer" $
    staticRule ( alexCmd ( verbosity, mbWorkDir, alex, lexerInFile, lexerOutFile ) )
      {- inputs  -} [ FileDependency lexerInFile ]
      {- outputs -} ( NE.singleton lexerOutFile )
  -- Register a rule: run happy on Parser.happy, producing Parser.hs.
  let parserInFile  = Location srcDir     (  makeRelativePathEx "Parser.happy" )
      parserOutFile = Location autogenDir (  makeRelativePathEx "Parser.hs" )
  registerRule_ "happy:Parser" $
    staticRule ( happyCmd ( verbosity, mbWorkDir, happy, parserInFile, parserOutFile ) )
      {- inputs  -} [ FileDependency parserInFile ]
      {- outputs -} ( NE.singleton parserOutFile )

runAlex, runHappy :: ( Verbosity, Maybe ( SymbolicPath CWD ( Dir Pkg ) ), ConfiguredProgram, Location, Location ) -> IO ()
runAlex  = runPp ( Suffix "x" )
runHappy = runPp ( Suffix "y" )

runPp :: Suffix
      -> ( Verbosity, Maybe ( SymbolicPath CWD ( Dir Pkg ) ), ConfiguredProgram, Location, Location )
      -> IO ()
runPp ( Suffix ppExt ) ( verbosity, mbWorkDir, ppProg, inLoc, outLoc ) = do
  -- Alex/Happy expect files with a specific extension,
  -- so we make a new temporary file and copy its contents,
  -- giving the file the expected file extension.
  tempDir <- makeSymbolicPath <$> getTemporaryDirectory
  withTempFileCwd mbWorkDir tempDir ( "." <> ppExt ) $ \ inputPpFile _ -> do
    copyFileVerbose verbosity
      ( interpretSymbolicPath mbWorkDir $ location inLoc )
      ( interpretSymbolicPath mbWorkDir inputPpFile )
    runProgramCwd verbosity mbWorkDir ppProg
      [ getSymbolicPath inputPpFile
      , "-o"
      , getSymbolicPath ( location outLoc )
      ]

The static Dict arguments to mkCommand provide evidence that the arguments passed to the preprocessor can be serialised and deserialised. While syntactically inconvenient for writers of Hooks, this crucially allows external build tools (such as cabal-install or HLS) to run and re-run individual build rules without re-building everything, as explained in the Haskell Foundation Tech Proposal #60.

Rules are allowed to depend on the output of other rules, as well as directly on files (using the Location datatype). If rule B depends on a file generated by rule A, then one must declare A as rule dependency of B (and not use a file dependency).
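Schematically, that looks like the following sketch, assuming the registerRule, RuleDependency and RuleOutput names from the Cabal-hooks Haddocks; cmdA, cmdB and the Location values are hypothetical:

-- Inside the RulesM do-block: rule A generates a file at genLoc, and rule B
-- consumes it, so B declares a dependency on A's output (index 0) rather than
-- a FileDependency on the generated file.
ruleA <- registerRule "generate" $
  staticRule cmdA [ FileDependency inLoc ] ( NE.singleton genLoc )
registerRule_ "consume" $
  staticRule cmdB
    [ RuleDependency ( RuleOutput { outputOfRule = ruleA, outputIndex = 0 } ) ]
    ( NE.singleton outLoc )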

To summarise, the general structure is that we use the monadic API to declare a collection of rules (usually, one rule per Haskell module we want to generate, but a rule can generate multiple outputs as well). Each rule stores a reference (via StaticPointers) to a command to run, as well as the (possibly dynamic) arguments to that command. We can think of the pre-build rules as a table of statically known custom pre-processors, together with a collection of invocations of these custom pre-processors with specific arguments.

A word of warning: authors of pre-build rules should use the static keyword at the top-level whenever possible in order to avoid GHC bug #16981. In the example above, this corresponds to defining runAlex and runHappy at the top-level, instead of defining them in-line in the body of myPreBuildRules.

Custom pre-processors

To illustrate how to write pre-build rules, let’s suppose one wants to declare a custom preprocessor, say myPreProc, which generates Haskell modules from *.hs-mypp files. Any component of the package which requires such pre-processing would declare build-tool-depends: exe:myPreProc.

The pre-build rules can be structured as follows:

  1. Look up the pre-processor in the Cabal ProgramDb (program database).
  2. Define how, given input/output files, we should invoke this preprocessor, e.g. what arguments should we pass to it.
  3. Search for all *.hs-mypp files relevant to the project, monitoring the results of this search (for recompilation checking).
  4. For each file found by the search in (3), register a rule which invokes the processor as in (2).
{-# LANGUAGE StaticPointers #-}
myBuildHooks =
  noBuildHooks
    { preBuildComponentRules =
        Just $ rules ( static () ) myPreBuildRules
    }

myPreBuildRules :: PreBuildComponentInputs -> RulesM ()
myPreBuildRules
  PreBuildComponentInputs
    { buildingWhat   = what
    , localBuildInfo = lbi
    , targetInfo     = TargetInfo { targetComponent = comp, targetCLBI = clbi }
    } = do
  let verbosity = buildingWhatVerbosity what
      progDb = withPrograms lbi
      bi = componentBuildInfo comp
      mbWorkDir = mbWorkDirLBI lbi
  -- 1. Look up our custom pre-processor in the Cabal program database.
  for_ ( lookupProgramByName myPreProcName progDb ) $ \ myPreProc -> do
    -- 2. Define how to invoke our custom preprocessor.
    let myPpCmd :: Location -> Location -> Command MyPpArgs ( IO () )
        myPpCmd inputLoc outputLoc =
          mkCommand ( static Dict ) ( static ppModule )
            ( verbosity, mbWorkDir, myPreProc, inputLoc, outputLoc )

    -- 3. Search for "*.hs-mypp" files to pre-process in the source directories of the package.
    let glob = GlobDirRecursive [ WildCard, Literal "hs-mypp" ]
    myPpFiles <- liftIO $ for ( hsSourceDirs bi ) $ \ srcDir -> do
      let root = interpretSymbolicPath mbWorkDir srcDir
      matches <- runDirFileGlob verbosity Nothing root glob
      return
        [ Location srcDir ( makeRelativePathEx match )
        | match <- globMatches matches
        ]
    -- Monitor existence of file glob to handle new input files getting added.
    --   NB: we don't have to monitor the contents of the files, because the files
    --       are declared as inputs to rules, which means that their contents are
    --       automatically tracked.
    addRuleMonitors [ monitorFileGlobExistence $ RootedGlob FilePathRelative glob ]
      -- NB: monitoring a directory recursive glob isn't currently supported;
      -- but implementing support would be a nice newcomer-friendly task for cabal-install.
      -- See https://github.com/haskell/cabal/issues/10064.

    -- 4. Declare rules, one for each module to be preprocessed, with the
    --    corresponding preprocessor invocation.
    for_ ( concat myPpFiles ) $ \ inputLoc@( Location _ inputRelPath ) -> do
      let outputBaseLoc = autogenComponentModulesDir lbi clbi
          outputLoc =
            Location
              outputBaseLoc
              ( unsafeCoerceSymbolicPath $ replaceExtensionSymbolicPath inputRelPath "hs" )
      registerRule_ ( toShortText $ getSymbolicPath inputRelPath ) $
        staticRule ( myPpCmd inputLoc outputLoc ) [] ( outputLoc NE.:| [] )

type MyPpArgs = ( Verbosity, Maybe ( SymbolicPath CWD ( Dir Pkg ) ), ConfiguredProgram, Location, Location )
  -- NB: this could be a datatype instead, but it would need a 'Binary' instance.

ppModule :: MyPpArgs -> IO ()
ppModule ( verbosity, mbWorkDir, myPreProc, inputLoc, outputLoc ) = do
  let inputPath  = location inputLoc
      outputPath = location outputLoc
  createDirectoryIfMissingVerbose verbosity True $
    interpretSymbolicPath mbWorkDir $ takeDirectorySymbolicPath outputPath
  runProgramCwd verbosity mbWorkDir myPreProc
    [ getSymbolicPath inputPath, getSymbolicPath outputPath ]

This might all be a bit much on first reading, but the key principle is that we are declaring a preprocessor, and then registering one invocation of this preprocessor per *.hs-mypp file:

  • In myPpCmd, the occurrence of static ppModule can be thought of as declaring a new preprocessor,4 with ppModule being the function to run. This is accompanied by the neighbouring static Dict occurrence, which provides a way to serialise and deserialise the arguments passed to preprocessor invocations.

  • We register one rule per each module to pre-process, which means that external build tools can re-run the preprocessor on individual modules when the source *.hs-mypp file changes.

Conclusion

This post has introduced build-type: Hooks for the benefit of package authors who use build-type: Custom. We hope that this introduction will inspire and assist package authors to move away from build-type: Custom in the future.

We encourage package maintainers to explore build-type: Hooks and contribute their feedback on the Cabal issue tracker, helping refine the implementation and expand its adoption across the ecosystem. To assist such explorations, we also recall the existence of the Cabal Hooks overlay, an overlay repository like head.hackage which contains packages that have been patched to use build-type: Hooks instead of build-type: Custom.

In addition to the work described here, we have done extensive work in cabal-install to address technical debt and enable it to make use of the new interface as opposed to going through the Setup CLI. The changes needed in cabal-install and other build tools (such as HLS) will be the subject of a future post.

While there remains technical work needed in cabal-install and HLS to fully realize the potential of build-type: Hooks, it should eventually lead to:

  • decreases in build times,
  • improvements in recompilation checking,
  • more robust HLS support,
  • removal of most limitations of build-type: Custom, such as the inability to use multiple sublibraries,
  • better long-term maintainability of the Cabal project.

Well-Typed are grateful to the Sovereign Tech Fund for funding this work. In order to continue our work on Cabal and the rest of the Haskell tooling ecosystem, we are offering Haskell Ecosystem Support Packages. If your company relies on Haskell, please encourage them to consider purchasing a package!


  1. See, for example, Cabal issue #3600.↩︎

  2. e.g. --package-db=<pkgDb>, --cid=<unitId> and --dependency=<depPkgNm>:<depCompNm>=<depUnitId> arguments↩︎

  3. The cabal-version field of a package description specifies the version of the Cabal specification it expects. As the Cabal specification evolves, so does the set of flags understood by the Setup CLI. This means that, when invoking the Setup script for a package, the build tool needs to be careful to pass arguments consistent with that version; see for instance how cabal-install handles this in Distribution.Client.Setup.filterConfigureFlags.↩︎

  4. In practice, this means adding an entry to the static pointer table.↩︎

by sam at January 31, 2025 12:00 AM

January 26, 2025

Chris Reade

PenroseKiteDart Animations

About PenroseKiteDart

Below we present some animations that illustrate operations on finite patches of Penrose’s Kite and Dart tiles.

These were created using PenroseKiteDart which is a Haskell package available on Hackage making use of the Haskell Diagrams package. For details, see the PenroseKiteDart user guide.

Penrose’s Kite and Dart tiles can produce infinite aperiodic tilings of the plane. There are legal tiling rules to ensure aperiodicity, but these rules do not guarantee that a finite tiling will not get stuck. A legal finite tiling which can be continued to cover the whole plane is called a correct tiling. The rest, which are doomed to get stuck, are called incorrect tilings. (More details can be found in the links at the end of this blog.)

Decomposition Animations

The function decompose is a total operation which is guaranteed to preserve the correctness of a finite tiling represented as a tile graph (or Tgraph). Let us start with a particular Tgraph called sunGraph which is defined in PenroseKiteDart and consists of 5 kites arranged with a common origin vertex. It is drawn using default style in figure 1 on the left. On the right of figure 1 it is drawn with both vertex labels and dotted lines for half-tile join edges.

Figure 1: sunGraph

We can decompose sunGraph three times by selecting index 3 of the infinite list of its decompositions.

    sunD3 :: Tgraph
    sunD3 = decompositions sunGraph !! 3

where we have used

    decompose :: Tgraph -> Tgraph
    
    decompositions :: Tgraph -> [Tgraph]
    decompositions = iterate decompose

The result (sunD3) is drawn in figure 2 (scaled up).

Figure 2: sunD3

The animation in figure 3 illustrates two further decompositions of sunD3 in two stages.

Figure 3: Two decompositions of sunD3

Figure 4 also illustrates two decompositions, this time starting from forcedKingD.

    forcedKingD :: Tgraph
    forcedKingD = force (decompose kingGraph)
Figure 4: Two decompositions of forcedKingD

A Composition Animation

An inverse to decomposing (namely composing) has some extra intricacies. In the literature (see for example 1 and 2) versions of the following method are frequently described.

  • Firstly, split darts in half.
  • Secondly, glue all the short edges of the half-darts where they meet a kite (simultaneously). This will form larger scale complete darts and larger scale half kites.
  • Finally join the halves of the larger scale kites.

This works for infinite tilings, but we showed in Graphs,Kites and Darts and Theorems that this method is unsound for finite tilings. There is the trivial problem that a half-dart may not have a complete kite on its short edge. Worse still, the second step can convert a correct finite tiling into an incorrect larger scale tiling. An example of this is given in Graphs, Kites and Darts and Theorems where we also described our own safe method of composing (never producing an incorrect Tgraph when given a correct Tgraph). This composition can leave some boundary half-tiles out of the composition (called remainder half-tiles).

The animation in figure 5 shows such a composition where the remainder half-tiles are indicated with lime green edges.

Figure 5: Composition Animation

In general, compose is a partial operation as the resulting half-tiles can break some requirements for Tgraphs (namely, connectedness and no crossing boundaries). However we have shown that it is a total function on forced Tgraphs. (Forcing is discussed next.)

Forcing Animations

The process of forcing a Tgraph adds half-tiles on the boundary where only one legal choice is possible. This continues until either there are no more forced additions possible, or a clash is found showing that the tiling is incorrect. In the latter case it must follow that the initial tiling before forcing was already an incorrect tiling.

The process of forcing is animated in figure 6, starting with a 5 times decomposed kite and in figure 7 with a 5 times decomposed dart.

Figure 6: Force animation
Figure 7: Another force animation

It is natural to wonder what forcing will do with cut-down (but still correct) Tgraphs. For example, taking just the boundary faces from the final Tgraph shown in the previous animation forms a valid Tgraph (boundaryExample) shown in figure 8.

    boundaryExample :: Tgraph
    boundaryExample = runTry $ tryBoundaryFaceGraph $ force $ decompositions dartGraph !!5
Figure 8: boundaryExample

Applying force to boundaryExample just fills in the hole to recreate force (decompositions dartGraph !!5) modulo vertex numbering. To make it more interesting we tried removing further half-tiles from boundaryExample to make a small gap. Forcing this also completes the filling in of the boundary half-tiles to recreate force (decompositions dartGraph !!5). However, we can see that this filling in is constrained to preserve the required Tgraph property of no crossing boundaries, which prevents the tiling from closing round a hole.

This is illustrated in the animation shown in figure 9.

Figure 9: Boundary gap animation

As another experiment, we take the boundary faces of a (five times decomposed but not forced) star. When forced this fills in the star and also expands outwards, as illustrated in figure 10.

Figure 10: Star boundary

In the final example, we pick out a shape within a correct Tgraph (ensuring the chosen half-tiles form a valid Tgraph) then animate the force process and then run the animation in both directions (by adding a copy of the frames in reverse order).

The result is shown in figure 11.

Figure 11: Heart animation

Creating Animations

Animations as gif files can be produced by the Haskell Diagrams package using the Rasterific backend.

The main module should import both Diagrams.Prelude and Diagrams.Backend.Rasterific.CmdLine. This will expose the type B standing for the imported backend, and diagrams then have type Diagram B.

An animation should have type [(Diagram B, Int)] and consist of a list of frames for the animation, each paired with an integer delay (in one-hundredths of a second).

The animation can then be passed to mainWith.

module Main (main) where
    
import Diagrams.Prelude
import Diagrams.Backend.Rasterific.CmdLine

...

fig::[(Diagram B,Int)]
fig = myExampleAnimation

main :: IO ()
main = mainWith fig

If main is then compiled and run (e.g. with parameters -w 700 -o test.gif) it will produce an output file (test.gif with width 700).

Crossfade tool

The decompose and compose animations were defined using crossfade.

crossfade :: Int -> Diagram B -> Diagram B -> [Diagram B]
crossfade n d1 d2 = map blending ratios 
  where
    blending r = opacity (1-r) d1 <> opacity r d2
    ratios = map ((/ fromIntegral n) . fromIntegral) [0..n]

Thus crossfade n d1 d2 produces n+1 frames, each with d1 overlaid on d2 but with varying opacities (decreasing for d1 and increasing for d2).

Adding the same pause (say 10 hundredths of a second) to every frame can be done by applying map (,10), and this will produce an animation.
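For instance, the myExampleAnimation used in the Main module above could be defined as follows (a minimal sketch: the two diagrams being crossfaded are just placeholders, and (,10) is a tuple section so the TupleSections extension may be needed):

myExampleAnimation :: [(Diagram B, Int)]
myExampleAnimation = map (,10) (crossfade 20 (circle 1) (square 2))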

Force animation tool

To create force animations it was useful to create a tool to produce frames with stages of forcing.

forceFrames :: Angle Double 
            -> Int
            -> Tgraph 
            -> (Colour Double, Colour Double, Colour Double)
            -> [Diagram B]

This takes as arguments

  • an angle argument (to rotate the diagrams in the animation from the default alignment of the Tgraph),
  • an Int (for the required number of frames),
  • a Tgraph (to be forced),
  • a triple of colours for filling darts, kites and grout (edge colour), respectively.

The definition of forceFrames uses stepForce to advance forcing a given number of steps to get the intermediate Tgraphs. The total number of forcing steps will be the number of faces (half-tiles) in the final force g less the number of faces in the initial g. All the Tgraphs are drawn (using colourDKG) but the resulting diagrams must all be aligned properly. The alignment can be achieved by creating a VPatch (vertex patch) from the final Tgraph which is then rotated. All the Tgraphs can then be drawn using sub vertex patches of the final rotated one. (For details see Overlaid examples in the PenroseKiteDart user guide.)
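As an illustration, a call to forceFrames might look like the following (the argument values here are invented for illustration only; the angle syntax and colour names come from Diagrams.Prelude):

forceAnim :: [Diagram B]
forceAnim = forceFrames (1/10 @@ turn)                  -- rotate the final alignment
                        100                             -- number of frames
                        (decompositions dartGraph !! 5) -- Tgraph to be forced
                        (darkblue, gold, black)         -- dart, kite and grout colours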

Previous related blogs

  • PenroseKiteDart user guide – this explains how to install and use the PenroseKiteDart package.
  • Graphs,Kites and Darts and Theorems established some important results relating force, compose, decompose.
  • Empires and SuperForce – these new operations were based on observing properties of boundaries of forced Tgraphs.
  • Graphs, Kites and Darts introduced Tgraphs. This gave more details of implementation and results of early explorations. (The class Forcible was introduced subsequently).
  • Diagrams for Penrose Tiles – the first blog introduced drawing Pieces and Patches (without using Tgraphs) and provided a version of decomposing for Patches (decompPatch).

References

[1] Martin Gardner (1977) MATHEMATICAL GAMES. Scientific American, 236(1), (pages 110 to 121). http://www.jstor.org/stable/24953856

[2] Grünbaum B., Shephard G.C. (1987) Tilings and Patterns. W. H. Freeman and Company, New York. ISBN 0-7167-1193-1 (Hardback) (pages 540 to 542).

by readerunner at January 26, 2025 10:11 AM

January 25, 2025

Ken T Takusagawa

[kufstdwm] alpha-beta with transposition table as a library function

transposition table is the other elegant improvement to minimax (after alpha-beta): elegant in principle, hairy to implement in practice.

consider a generic implementation of alpha-beta game tree search with transposition table, generic enough to be applicable to any user-specified game.  what should be its API?  what features should it provide?

evaluate to infinite depth (possible because of transposition table), returning game value and line (principal variation).  intended for small games.

return the transposition table so that it can be reused for subsequent moves.

evaluate to given depth.  or, user-specified predicate of whether to stop searching, e.g., quiescence search.  quiescence search wants access to the transposition table.
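to make the above wishlist concrete, here is one possible shape for such an API in Haskell (every name and type below is invented for illustration, not an existing library):

import qualified Data.Map.Strict as Map

data Result move value = Result
  { gameValue          :: value
  , principalVariation :: [move]   -- the "line"
  }

type Table key move value = Map.Map key (Result move value)

-- the caller supplies move generation, evaluation, position hashing and a
-- stopping predicate; the transposition table is threaded through and
-- returned so it can be reused for subsequent searches.
search ::
     (Ord key, Ord value)
  => (pos -> [move])          -- legal moves
  -> (pos -> move -> pos)     -- play a move
  -> (pos -> value)           -- static evaluation
  -> (pos -> key)             -- e.g. a Zobrist hash
  -> (pos -> Int -> Bool)     -- stop searching here? (depth limit, quiescence, ...)
  -> Table key move value     -- transposition table in
  -> pos
  -> (Result move value, Table key move value)
search = undefined            -- signature sketch only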

ambitious: because of the many ways game tree search can be customized (for many examples, albeit often poorly described, see the chessprogramming wiki), structure the algorithm as a collection of components, each of which can be modified and hooked together in various ways.  I have no idea what language or framework could enable this kind of software engineering, though functional programming languages seem attractive as the first thing to try.  but beware that a pure functional programming language such as Haskell easily leaks space for this kind of task, and threading state (the transposition table) through the computation may be awkward.

common customizations sacrifice accuracy (correctness or completeness) for speed.  for example, if two different evaluated positions have the same key (for example, a 64-bit Zobrist hash in chess), one can optimize by doing no transposition table collision resolution; the second position gets ignored, assumed to have already been evaluated.  the default algorithm should not do such optimizations but should allow the user to specify both safe and unsafe optimizations.

allow the search to be augmented with various statistics gathered along the way that get consumed by other user-specified parts of the algorithm.  for example, the move generator could order moves based on values of similar moves already evaluated in other parts of the tree.

provide visibility into how user customizations are working, ways to evaluate whether or not they are worth it.

by Unknown (noreply@blogger.com) at January 25, 2025 04:34 AM

January 24, 2025

Sandy Maguire

Use Monoids for Construction

There’s a common anti-pattern I see in beginner-to-intermediate Haskell programmers that I wanted to discuss today. It’s the tendency to conceptualize the creation of an object by repeated mutation. Often this takes the form of repeated insertion into an empty container, but comes up under many other guises as well.

This anti-pattern isn’t particularly surprising in its prevalence; after all, if you’ve got the usual imperative brainworms, this is just how things get built. The Gang of Four “builder pattern” is exactly this; you can build an empty object, and setters on such a thing change the state but return the object itself. Thus, you build things by chaining together setter methods:

Foo myFoo = new Foo().setBar().setQux(17).setZap(true);

Even if you don’t subscribe to the whole OOP design principle thing, you’re still astronomically likely to think about building data structures like this:

Doodad doodad = new Doodad();
foreach (Widget widget in widgets) {
  doodad.addWidget(widget);
}

To be more concrete, maybe instead of doodads and widgets you have BSTs and Nodes. Or dictionaries and key-value pairs. Or graphs and edges. Anywhere you look, you’ll probably find examples of this sort of code.

Maybe you’re thinking to yourself “I’m a hairy-chested functional programmer and I scoff at patterns like these.” That might be true, but perhaps you too are guilty of writing code that looks like:

foldr
    (\(k, v) m -> Map.insert k v m)
    Map.empty
  $ toKVPairs something

Just because it’s dressed up with functional combinators doesn’t mean you’re not still writing C code. To my eye, the great promise of functional programming is its potential for conceptual clarity, and repeated mutation will always fall short of the mark.

The complaint, as usual, is that repeated mutation tells you how to build something, rather than focusing on what it is you’re building. An algorithm cannot be correct in the absence of intention—after all, you must know what you’re trying to accomplish in order to know if you succeeded. What these builder patterns, for loops, and foldrs all have in common is that they are algorithms: strategies for building something.

But you’ll notice none of them come with comments. And therefore we can only ever guess at what the original author intended, based on the context of the code we’re looking at.

I’m sure this all sounds like splitting hairs, but that’s because the examples so far have been extremely simple. But what about this one?

cgo :: (a -> (UInt, UInt)) -> [a] -> [NonEmpty a]
cgo f = foldr step []
  where
    step a [] = [pure a]
    step a bss0@((b :| bs) : bss)
      | let (al, ac) = f a
      , let (bl, bc) = f b
      , al + 1 == bl && ac == bc
            = (a :| b : bs) : bss
      | otherwise = pure a : bss0

which I found by grepping through haskell-language-server for foldr, and then mangled to remove the suggestive variable names. What does this one do? Based solely on the type we can presume it’s using that function to partition the list somehow. But how? And is it correct? We’ll never know—and the function doesn’t even come with any tests!

It’s Always Monoids

The shift in perspective necessary here is to reconceptualize building-by-repeated-mutation as building-by-combining. Rather than chiseling out the object you want, instead find a way of gluing it together from simple, obviously-correct pieces.

The notion of “combining together” should evoke in you a cozy warm fuzzy feeling. Much like being in a secret pillow fort. You must come to be one with the monoid. Once you have come to embrace monoids, you will have found inner programming happiness. Monoids are a sacred, safe place, at the fantastic intersection of “overwhelmingly powerful” and yet “hard to get wrong.”

As an amazingly fast recap, a monoid is a collection of three things: some type m, some value of that type mempty, and a binary operation over that type (<>) :: m -> m -> m, subject to a bunch of laws:

∀a. mempty <> a = a = a <> mempty
∀a b c. (a <> b) <> c = a <> (b <> c)

which is to say, mempty does nothing and (<>) doesn’t care where you stick the parentheses.

If you’re going to memorize any two particular examples of monoids, it had better be these two:

instance Monoid [a] where
  mempty = []
  a <> b = a ++ b

instance (Monoid a, Monoid b) => Monoid (a, b) where
  mempty = (mempty, mempty)
  (a1, b1) <> (a2, b2) = (a1 <> a2, b1 <> b2)

The first says that lists form a monoid under the empty list and concatenation. The second says that products preserve monoids.

The list monoid instance is responsible for the semantics of the ordered, “sequency” data structures. That is, if I have some sequential flavor of data structure, its monoid instance should probably satisfy the equation toList a <> toList b = toList (a <> b). Sequency data structures are things like lists, vectors, queues, deques, that sort of thing. Data structures where, when you combine them, you assume there is no overlap.

The second monoid instance here, over products, is responsible for pretty much all the other data structures. The first thing we can do with it is remember that functions are just really, really big product types, with one “slot” for every value in the domain. We can show an isomorphism between pairs and functions out of booleans, for example:

from :: (Bool -> a) -> (a, a)
from f = (f False, f True)

to :: (a, a) -> (Bool -> a)
to (a, _) False = a
to (_, a) True  = a

and under this isomorphism, we should thereby expect the Monoid a => Monoid (Bool -> a) instance to agree with Monoid a => Monoid (a, a). If you generalize this out, you get the following instance:

instance Monoid a => Monoid (x -> a) where
  mempty = \_ -> mempty
  f <> g = \x -> f x <> g x

which combines values in the codomain monoidally. We can show the equivalence between this monoid instance and our original product preservation:

  from f <> from g
= (f False,  f True) <> (g False, g True)
= (f False <> g False, f True <> g True)
= ((f <> g) False, (f <> g) True)
= from (f <> g)

and

  to (a11, a12) <> to (a21, a22)
= \x -> to (a11, a12) x <> to (a21, a22) x
= \x -> case x of
    False -> to (a11, a12) False <> to (a21, a22) False
    True  -> to (a11, a12) True  <> to (a21, a22) True
= \x -> case x of
    False -> a11 <> a21
    True  -> a12 <> a22
= \x -> to (a11 <> a21, a12 <> a22) x
= to (a11 <> a21, a12 <> a22)

which is a little proof that our function monoid agrees with the preservation-of-products monoid. The same argument works for any type x in the domain of the function, but showing it generically is challenging.

Anyway, I digress.

The reason to memorize this Monoid instance is that it’s the monoid instance that every data structure is trying to be. Recall that almost all data structures are merely different encodings of functions, designed to make some operations more efficient than they would otherwise be.

Don’t believe me? A Map k v is an encoding of the function k -> Maybe v optimized to efficiently query which k values map to Just something. That is to say, it’s a sparse representation of a function.
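To make that claim concrete, here is a small sketch of the encoding (the helper names are illustrative, not from the post):

import qualified Data.Map as Map
import           Data.Map (Map)

-- a Map is a sparse encoding of a function into Maybe
toFunction :: Ord k => Map k v -> (k -> Maybe v)
toFunction m = \k -> Map.lookup k m

-- going back requires enumerating the (finite) part of the domain we care about
fromFunction :: Ord k => [k] -> (k -> Maybe v) -> Map k v
fromFunction domain f = Map.fromList [ (k, v) | k <- domain, Just v <- [f k] ]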

From Theory to Practice

What does all of this look like in practice? Stuff like worrying about foldr is surely programming-in-the-small, which is worth knowing, but isn’t the sort of thing that turns the tides of a successful application.

The reason I’ve been harping on about the function and product monoids is that they are compositional. The uninformed programmer will be surprised by just how far one can get by composing these things.

At work, we need to reduce a tree (+ nonlocal references) into an honest-to-goodness graph. While we’re doing it, we need to collect certain nodes. And the tree has a few constructors which semantically change the scope of their subtrees, so we need to preserve that information as well.

It’s actually quite the exercise to sketch out an algorithm that will accomplish all of these goals when you’re thinking about explicit mutation. Our initial attempts at implementing this were clumsy. We’d fold the tree into a graph, adding fake nodes for the Scope constructors. Then we’d filter all the nodes in the graph, trying to find the ones we needed to collect. Then we’d do a graph traversal from the root, trying to find these Scope nodes, and propagating their information downstream.

Rather amazingly, this implementation kinda sorta worked! But it was slow, and took \(O(10k)\) SLOC to implement.

The insight here is that everything we needed to collect was monoidal:

data Solution = Solution
  { graph :: Graph
  , collectedNodes :: Set Node
  , metadata :: Map Node Metadata
  }
  deriving stock (Generic)
  deriving (Semigroup, Monoid) via Generically Solution

where the deriving (Semigroup, Monoid) via Generically Solution stanza gives us the semigroup and monoid instances that we’d expect from Solution being the product of a bunch of other monoids.

And now for the coup de grace: we hook everything up with the Writer monad. Writer is a chronically slept-on type, because most people seem to think it’s useful only for logging, and underwhelming at logging compared to a real logger type. But the charm is in the details:

instance Monoid w => Monad (Writer w)

Writer w is a monad whenever w is a monoid, which makes it the perfect monad for solving data-structure-creation problems like the one we’ve got in mind. Such a thing gives rise to a few helper functions:

collectNode :: MonadWriter Solution m => Node -> m ()
collectNode n = tell $ mempty { collectedNodes = Set.singleton n }

addMetadata :: MonadWriter Solution m => Node -> Metadata -> m ()
addMetadata n m = tell $ mempty { metadata = Map.singleton n m }

emitGraphFragment :: MonadWriter Solution m => Graph -> m ()
emitGraphFragment g = tell $ mempty { graph = g }

each of which is responsible for adding a little piece to the final solution. Our algorithm is thus a function of the type:

algorithm
  :: Metadata
  -- ^ the current scope
  -> Tree
  -- ^ the tree we're reducing
  -> Writer Solution Node
  -- ^ our partial solution, and the node corresponding to the root of the tree

which traverses the Tree, recursing with a different Metadata whenever it comes across a Scope constructor, and calling our helper functions as it goes. At each step of the way, the only thing it needs to return is the root Node of the section of the graph it just built, which recursive calls can use to break up the problem into inductive pieces.

This new implementation is roughly 20x smaller, coming in at \(O(500)\) SLOC, and was free of all the bugs we’d been diligently trying to squash under the previous implementation.

Chalk it down to another win for induction!

January 24, 2025 09:35 AM

January 23, 2025

Brent Yorgey

You could have invented Fenwick trees


Posted on January 23, 2025

My paper, You could have invented Fenwick trees, has just been published as a Functional Pearl in the Journal of Functional Programming. This blog post is an advertisement for the paper, which presents a novel way to derive the Fenwick tree data structure from first principles.

Suppose we have a sequence of integers \(a_1, \dots, a_n\) and want to be able to perform two operations:

  • we can update any \(a_i\) by adding some value \(v\) to it; or
  • we can perform a range query, which asks for the sum of the values \(a_i + \dots + a_j\) for any range \([i,j]\).

There are several ways to solve this problem. For example:

  1. We could just keep the sequence of integers in a mutable array. Updating is \(O(1)\), but range queries are \(O(n)\) since we must actually loop through the range and add up all the values.
  2. We could keep a separate array of prefix sums on the side, so that \(P_i\) stores the sum \(a_1 + \dots + a_i\). Then the range query on \([i,j]\) can be computed as \(P_j - P_{i-1}\), which only takes \(O(1)\); however, updates now take \(O(n)\) since we must also update all the prefix sums which include the updated element.
  3. We can get the best of both worlds using a segment tree, a binary tree storing the elements at the leaves, with each internal node caching the sum of its children. Then both update and range query can be done in \(O(\lg n)\).

I won’t go through the details of this third solution here, but it is relatively straightforward to understand and implement, especially in a functional language.
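For concreteness, here is a minimal sketch of the segment tree idea in Haskell (not the implementation from the paper): each internal node caches the sum of its subtree, and both update and range query work by descending from the root in \(O(\lg n)\).

data SegTree = Leaf Int | Node Int SegTree SegTree   -- Node caches its subtree sum

total :: SegTree -> Int
total (Leaf v)     = v
total (Node s _ _) = s

-- build a segment tree from a non-empty list of values
build :: [Int] -> SegTree
build [x] = Leaf x
build xs  = Node (total l' + total r') l' r'
  where
    (l, r)   = splitAt (length xs `div` 2) xs
    (l', r') = (build l, build r)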

However, there is a fourth solution, known as a Fenwick tree or Fenwick array, independently invented by Ryabko (1989) and Fenwick (1994). Here’s a typical Java implementation of a Fenwick tree:

class FenwickTree {
    private long[] a;
    public FenwickTree(int n) { a = new long[n+1]; }
    public long prefix(int i) {
        long s = 0;
        for (; i > 0; i -= LSB(i)) s += a[i];
        return s;
    }
    public void update(int i, long delta) {
        for (; i < a.length; i += LSB(i)) a[i] += delta;
    }
    public long range(int i, int j) {
        return prefix(j) - prefix(i-1);
    }
    public long get(int i) { return range(i,i); }
    public void set(int i, long v) { update(i, v - get(i)); }
    private int LSB(int i) { return i & (-i); }
}

I know what you’re thinking: what the heck!? There are some loops adding and subtracting LSB(i), which is defined as the bitwise AND of i and -i? What on earth is this doing? Unless you have seen this before, this code is probably a complete mystery, as it was for me the first time I encountered it.
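As a side note (not part of the paper’s code), the same LSB trick is easy to write in Haskell:

import Data.Bits ((.&.))

lsb :: Int -> Int
lsb i = i .&. negate i   -- e.g. lsb 12 == 4, since 12 is 1100 in binary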

However, from the right point of view, we can derive this mysterious imperative code as an optimization of segment trees. In particular, in my paper I show how we can:

  1. Start with a segment tree.
  2. Delete some redundant info from the segment tree, and shove the remaining values into an array in a systematic way.
  3. Define operations for moving around in the resulting Fenwick array by converting array indices to indices in a segment tree, moving around the tree appropriately, and converting back.
  4. Describe these operations using a Haskell EDSL for infinite-precision 2’s complement binary arithmetic, and fuse away all the intermediate conversion steps, until the above mysterious implementation pops out.
  5. Profit.

I may be exaggerating step 5 a teensy bit. But you’ll find everything else described in much greater detail, with pretty pictures, in the paper! The official JFP version is here, and here’s an extended version with an appendix containing an omitted proof.

References

Fenwick, Peter M. 1994. “A New Data Structure for Cumulative Frequency Tables.” Software: Practice and Experience 24 (3): 327–36.
Ryabko, Boris Yakovlevich. 1989. “A Fast on-Line Code.” Doklady Akademii Nauk 306 (3): 548–52. Russian Academy of Sciences.

by Brent Yorgey at January 23, 2025 12:00 AM

January 22, 2025

Haskell Interlude

61: Sam Lindley

Sam Lindley is a Reader in Programming Languages Design and Implementation at the University of Edinburgh. In this episode, he tells us how difficult naming is, the different kinds of effect systems and handlers, languages *much* purer than Haskell, and Modal logic.

by Haskell Podcast at January 22, 2025 09:00 PM

Well-Typed.Com

grapesy: industrial strength gRPC library for Haskell

Well-Typed are delighted to announce the release of grapesy (Hackage, GitHub), an industrial strength Haskell library providing support for gRPC, a modern open source high performance Remote Procedure Call (RPC) framework developed by Google. The library has the following features:

  • Parametric in the choice of message format; Protobuf is the most common choice for gRPC and is of course supported, as is JSON1. There is also support for general binary (“raw”) messages, and adding additional formats is easy and can be done without modifying grapesy itself.

  • Client-side and server-side support for the Protobuf common communication patterns: non-streaming, client-side streaming, server-side streaming, and bidirectional streaming. Use of these patterns is independent of the choice of message encoding, and is strictly optional.

  • For the specific case of Protobuf, support for Protobuf rich errors.

  • Support for metadata: request initial metadata, response initial metadata, and response trailing metadata.

  • Support for all the common gRPC compression algorithms: gzip and zlib (both through the zlib package), as well as snappy (through a new package snappy-c, developed for this purpose). Bespoke compression algorithms can also be used, and compression can be disabled on a per-message basis.

  • Support for both unencrypted and encrypted connections (TLS).

  • Support for cancellation, both through deadlines/timeouts (server-side cancellation) and through terminating an RPC early (client-side cancellation).2

  • Flow control: we are careful to use back-pressure to limit traffic, ultimately relying on HTTP2 flow control (which can be adjusted through the HTTP2Settings, primarily the stream window size and the connection window size).

  • Support for Wait-for-Ready, where the connection to a server can be (re)established in the background, rather than an RPC failing if the server is not immediately available. Note that this must be enabled explicitly (as per the spec).

  • Asynchronous design: operations happen in the background whenever possible (opening a connection, initiating an RPC, sending a message), and exceptions are only raised when directly interacting with those background processes. For example, when a client disconnects from the server, the corresponding handler will only get an exception if it attempts any further communication with that client. This is particularly important in RPC servers, which may need to complete certain operations even if the client that requested those operations did not stick around to wait for them.

  • Type safety: the types of inputs (messages sent from the client to the server) and outputs (messages from the server to the client), as well as the types of the request and response metadata, are all determined from the choice of a specific RPC. In addition, for Protobuf servers we can guarantee at the type-level that all methods are handled (or explicitly declared as unsupported).

  • Extensive documentation: this blog post contains a number of tutorials that highlight the various parts of grapesy, and the Haddock documentation is comprehensive.

The library is designed to be robust:

  • Exception safety: all exceptions, in both client and in server contexts, are caught and handled in context appropriate ways; they are never simply “lost”. Server-side exceptions are reported as gRPC errors on the client; handlers can also throw any of the standard gRPC errors.

  • Deals correctly with broken deployments (clients or servers that do not conform to the gRPC specification). This includes things such as dealing with non-200 HTTP status codes, correctly responding to unsupported content types (for which the gRPC spec mandates a different resolution on servers and clients), dealing with servers that don’t respect timeouts, etc.

  • Native Haskell library (does not bind to any C or C++ libraries).

  • Comes with a comprehensive test suite, which has been instrumental in achieving high reliability, as well as finding problems elsewhere in the network stack; as part of the development of grapesy we have also made numerous improvements to http2 and related libraries3. Many thanks to Kazu Yamamoto for being so receptive to all our PRs and willing to discuss all the issues we found, as well as his hard work on these core infrastructure libraries!

  • No memory leaks: even under stress conditions, memory usage is completely flat in both the client and the server.

  • Good performance, on par with the official Java implementation.

Developing a library of this nature is a significant investment, and so Well-Typed is thankful to Anduril for sponsoring the work.

Quickstart

In this section we explain how to get started, in the style of the official Quickstart guide. You can also use the Quickstart example as a basic template for your own gRPC applications.

gRPC tools

Neither gRPC nor grapesy requires the use of Protobuf, but it is the most common way of using gRPC, and it is used by both the Quickstart tutorial and the Basics tutorial. You will therefore need to install the protocol buffer compiler protoc, which can usually be done using your system’s package manager; see Protocol Buffer Compiler Installation for details.

Download the example

If you want to work through this quick start, you will need to clone the grapesy repository:

$ git clone https://github.com/well-typed/grapesy.git
$ cd grapesy/tutorials/quickstart

Run a gRPC application

From the grapesy/tutorials/quickstart directory, run the server

$ cabal run greeter_server

From another terminal, run the client:

$ cabal run greeter_client

If all went well, you should see the server responding to the client with

Proto {message: "Hello, you"}

Update the gRPC service

Now let’s try to add another method to the Greeter service. This service is defined using protocol buffers; for an introduction to gRPC in general and Protobuf specifically, you may wish to read the official Introduction to gRPC; we will also see more examples of Protobuf below in the Basics tutorial. You can find the definition for the quickstart tutorial in tutorials/quickstart/proto/helloworld.proto:

syntax = "proto3";

// The greeting service definition.
service Greeter {
  // Sends a greeting
  rpc SayHello (HelloRequest) returns (HelloReply) {}
}

// The request message containing the user's name.
message HelloRequest {
  string name = 1;
}

// The response message containing the greetings
message HelloReply {
  string message = 1;
}

Let’s add another method to this service, with the same request and response types:

service Greeter {
  // Sends a greeting
  rpc SayHello (HelloRequest) returns (HelloReply) {}

  // Sends another greeting
  rpc SayHelloAgain (HelloRequest) returns (HelloReply) {}
}

Generate gRPC code

The example is set up to use a custom Cabal setup script to automatically compile the proto definitions; see proto-lens-setup for a detailed discussion on how to do this. If you prefer not to use custom setup scripts in your own projects, it is also possible to run the Protobuf compiler manually; see section Manually running the protocol compiler of the proto-lens-protoc documentation.
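Under this scheme, the package’s Setup.hs is typically just a thin wrapper around proto-lens-setup (a sketch; the tutorial’s actual setup script and .proto directory layout may differ slightly):

import Data.ProtoLens.Setup (defaultMainGeneratingProtos)

main :: IO ()
main = defaultMainGeneratingProtos "proto"  -- directory containing the .proto files

The .cabal file correspondingly uses build-type: Custom with a custom-setup stanza that depends on proto-lens-setup.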

This means that to re-run the Protobuf compiler, it suffices to build either the client or the server; let’s attempt to build the server:

$ cabal build greeter_server

You should see a type error:

app/Server.hs:13:7: error: [GHC-83865]
    • Couldn't match type: '[]
                     with: '[Protobuf Greeter "sayHelloAgain"]

This is telling you that the server is incomplete: we are missing a handler for the new sayHelloAgain method.

Update the server

To update the server, edit Server.hs and add:

sayHelloAgain :: Proto HelloRequest -> IO (Proto HelloReply)
sayHelloAgain req = do
    let resp = defMessage & #message .~ "Hello again, " <> req ^. #name
    return resp

Then update methods to list the new handler:

methods :: Methods IO (ProtobufMethodsOf Greeter)
methods =
      Method (mkNonStreaming sayHello)
    $ Method (mkNonStreaming sayHelloAgain)
    $ NoMoreMethods

Update the client

Unlike the server, the change to the service definition does not require changes to the client. The server must implement the new method, but the client does not have to call it. Of course, it is more interesting when it does, so let’s add another call to Client.hs:

withConnection def server $ \conn -> do
  let req = defMessage & #name .~ "you"
  resp <- nonStreaming conn (rpc @(Protobuf Greeter "sayHello")) req
  print resp
  resp2 <- nonStreaming conn (rpc @(Protobuf Greeter "sayHelloAgain")) req
  print resp2

Run

After restarting greeter_server, running greeter_client should now output

Proto {message: "Hello, you"}
Proto {message: "Hello again, you"}

Basics

In this section we delve a little deeper, following the official Basics tutorial, which introduces the RouteGuide service. From the official docs:

Our example is a simple route mapping application that lets clients get information about features on their route, create a summary of their route, and exchange route information such as traffic updates with the server and other clients.

You can find the example in the tutorials/basics directory of the grapesy repo.

Defining the service

The RouteGuide example illustrates the four different kinds of communication patterns that Protobuf services can have. You can find the full service definition in tutorials/basics/proto/route_guide.proto:

  • Non-streaming: client sends a single input, server replies with a single output:

    // Obtains the feature at a given position.
    rpc GetFeature(Point) returns (Feature) {}
  • Server-side streaming: client sends a single input, server can respond with any number of outputs:

    // Obtains the Features available within the given Rectangle.
    rpc ListFeatures(Rectangle) returns (stream Feature) {}
  • Client-side streaming: client can send any number of inputs, after which the server responds with a single output:

    // Accepts a stream of Points on a route being traversed, returning a
    // RouteSummary when traversal is completed.
    rpc RecordRoute(stream Point) returns (RouteSummary) {}
  • Bidirectional streaming: the client and the server can exchange messages at will:

    // Accepts a stream of RouteNotes sent while a route is being traversed,
    // while receiving other RouteNotes (e.g. from other users).
    rpc RouteChat(stream RouteNote) returns (stream RouteNote) {}

There is explicit support in grapesy for these four communication patterns, both for defining servers and for defining clients. In addition, there is a lower-level API which provides more control over the communication; we will see some examples in Beyond the basics.

Generated code

As in the Quickstart, we have set things up in the example to automatically generate Haskell code from the .proto definition. There is however one more thing that we need to take care of, which we glossed over previously. The .proto definition is sufficient to determine the types of the methods of the service, their arguments, and their results. But it does not say anything about the type of any metadata. We don’t need any metadata in this example, so we can declare the following module:

module Proto.API.RouteGuide (
    module Proto.RouteGuide
  ) where

import Network.GRPC.Common
import Network.GRPC.Common.Protobuf

import Proto.RouteGuide

type instance RequestMetadata          (Protobuf RouteGuide meth) = NoMetadata
type instance ResponseInitialMetadata  (Protobuf RouteGuide meth) = NoMetadata
type instance ResponseTrailingMetadata (Protobuf RouteGuide meth) = NoMetadata

This re-exports module Proto.RouteGuide (which was generated), along with three type family instances that indicate that none of the methods of the RouteGuide require metadata. We will see an example of using metadata later.

Proto wrapper

In the repository you will find an implementation of the logic of the RouteGuide example as a collection of pure functions; see tutorials/basics/src/RouteGuide.hs. For example, the type of the function that looks up which feature exists at a particular point, given the example database of features, is given by:

featureAt :: DB -> Proto Point -> Maybe (Proto Feature)

The precise implementation is not very important for our purposes here, but we should discuss that Proto wrapper. This is a type-level marker that explicitly identifies Protobuf values. Such values don’t behave like regular Haskell values; for example, record fields always have defaults, enums might have unknown values, etc. The idiomatic way of accessing fields of a Proto value is using a lens access and an (overloaded) label; for example, the following expression extracts a field #location from a feature (f :: Proto Feature):

f ^. #location

To construct a Proto value you first create an empty value using defMessage, and then update individual fields with a lens update. For example, here is how we might construct a Proto RouteSummary:

defMessage
  & #pointCount   .~ ..
  & #featureCount .~ ..
  & #distance     .~ ..
  & #elapsedTime  .~ ..

Everything required to work with Protobuf values is (re-)exported from Network.GRPC.Common.Protobuf. In addition, Network.GRPC.Common.Protobuf.Any provides functionality for working with the Protobuf Any type.

Implementing the server

We can use the type checker to help us in the development of the server. We know that we want to implement the methods of the RouteGuide service; if we define

methods :: DB -> Methods IO (ProtobufMethodsOf RouteGuide)
methods db = _

the type checker will tell us that it expects something of this type4:

_ :: Methods IO [
    Protobuf RouteGuide "getFeature"
  , Protobuf RouteGuide "listFeatures"
  , Protobuf RouteGuide "recordRoute"
  , Protobuf RouteGuide "routeChat"
  ]

We can therefore refine methods to

methods :: DB -> Methods IO (ProtobufMethodsOf RouteGuide)
methods db =
      Method _getFeature
    $ Method _listFeatures
    $ Method _recordRoute
    $ Method _routeChat
    $ NoMoreMethods

at which point the type checker informs us:

_getFeature   :: ServerHandler' NonStreaming    IO (Protobuf RouteGuide "getFeature")
_listFeatures :: ServerHandler' ServerStreaming IO (Protobuf RouteGuide "listFeatures")
_recordRoute  :: ServerHandler' ClientStreaming IO (Protobuf RouteGuide "recordRoute")
_routeChat    :: ServerHandler' BiDiStreaming   IO (Protobuf RouteGuide "routeChat")

We can therefore refine once more to

methods :: DB -> Methods IO (ProtobufMethodsOf RouteGuide)
methods db =
      Method (mkNonStreaming    $ _getFeature)
    $ Method (mkServerStreaming $ _listFeatures)
    $ Method (mkClientStreaming $ _recordRoute)
    $ Method (mkBiDiStreaming   $ _routeChat)
    $ NoMoreMethods

The resulting types will depend on the communication pattern (non-streaming, client-side streaming, etc.). We will discuss them one by one.

Non-streaming RPC

The first method is a non-streaming RPC, for which the type checker infers:

_getFeature :: Proto Point -> IO (Proto Feature)

That is, we are given a point of interest, and must return “the” feature at that point. We will also need the database of features. The implementation is straight-forward, and essentially just wraps the pure function featureAt:

getFeature :: DB -> Proto Point -> IO (Proto Feature)
getFeature db p = return $ fromMaybe (defMessage & #location .~ p) (featureAt db p)

The only minor complication here is that we need to construct some kind of default location for when there is no feature found at point p.

Server-side streaming

For server-side streaming we are given the input from the client, along with a function that we can use to send outputs back to the client:

_listFeatures :: Proto Rectangle -> (NextElem (Proto Feature) -> IO ()) -> IO ()

NextElem is similar to Maybe:

data NextElem a = NoNextElem | NextElem !a

but with a more specialized API. For example, it offers

forM_ :: Monad m => [a] -> (NextElem a -> m ()) -> m ()

which will invoke the specified callback NextElem x for all x in the list, and then invoke the callback once more with NoNextElem. We can use this to implement listFeatures:

listFeatures :: DB -> Proto Rectangle -> (NextElem (Proto Feature) -> IO ()) -> IO ()
listFeatures db r send = NextElem.forM_ (featuresIn db r) send

Client-side streaming

For client-side streaming we are given a function to receive inputs from the client, and must produce a single output to be sent back to the client:

_recordRoute :: IO (NextElem (Proto Point)) -> IO (Proto RouteSummary)

To implement it, we can use another function from the NextElem API:

collect :: Monad m => m (NextElem a) -> m [a]

The only other complication is that the function which constructs the RouteSummary also wants to know how long it took to collect all points:

recordRoute :: DB -> IO (NextElem (Proto Point)) -> IO (Proto RouteSummary)
recordRoute db recv = do
    start <- getCurrentTime
    ps    <- NextElem.collect recv
    stop  <- getCurrentTime
    return $ summary db (stop `diffUTCTime` start) ps

Bidirectional streaming

For bidirectional streaming finally we get two functions: one to receive inputs from the client, and one to send outputs back to the client:

_routeChat ::
     IO (NextElem (Proto RouteNote))
  -> (NextElem (Proto RouteNote) -> IO ())
  -> IO ()

The implementation is straight-forward and does not require any new grapesy features; you can find it in tutorials/basics/app/Server.hs.
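Purely as a sketch of the handler’s shape, a trivial echo implementation could look like the following (the tutorial’s actual handler additionally keeps track of previously received notes):

routeChat ::
     IO (NextElem (Proto RouteNote))
  -> (NextElem (Proto RouteNote) -> IO ())
  -> IO ()
routeChat recv send = do
    NextElem.whileNext_ recv $ \note -> send (NextElem note)  -- echo each incoming note
    send NoNextElem                                           -- signal end of output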

Top-level application

The main server application then looks like this:

main :: IO ()
main = do
    db <- getDB
    runServerWithHandlers def config $ fromMethods (methods db)
  where
    config :: ServerConfig
    config = ServerConfig {
          serverInsecure = Just (InsecureConfig Nothing defaultInsecurePort)
        , serverSecure   = Nothing
        }

The first parameter to runServerWithHandlers is the server parameters. The most important parameters to consider are serverTopLevel and serverExceptionToClient. These two are related, and describe how to deal with exceptions:

  • serverTopLevel says what to do with exceptions server-side; by default it simply prints them to stderr
  • serverExceptionToClient says what information to include in the error sent to the client; by default it calls displayException. You may wish to override this if you are concerned about leaking security sensitive information.

Implementing the client

You can find the complete client in tutorials/basics/app/Client.hs.

Connecting to the server

Before we can make any RPCs, we have to connect to the server:

main :: IO ()
main =
    withConnection def server $ \conn -> do
      ..
  where
    server :: Server
    server = ServerInsecure $ Address "127.0.0.1" defaultInsecurePort Nothing

The first argument is the connection parameters, the most important of which is probably the reconnection policy which (amongst other things) is used to enable Wait-for-Ready semantics.

Simple RPC

We already saw how to make a simple non-streaming RPC in the quickstart:

getFeature :: Connection -> IO ()
getFeature conn = do
    let req = defMessage
                & #latitude  .~  409146138
                & #longitude .~ -746188906
    resp <- nonStreaming conn (rpc @(Protobuf RouteGuide "getFeature")) req
    print resp

We construct a request, do the RPC, and print the response.

Server-side streaming

When we make a server-side streaming RPC, we are given a function we can call to get all of the server outputs:

listFeatures :: Connection -> IO ()
listFeatures conn = do
    let req = ..
    serverStreaming conn (rpc @(Protobuf RouteGuide "listFeatures")) req $ \recv ->
      NextElem.whileNext_ recv print

Here we are using another function from the NextElem API, in a sense dual to the one we used server-side; for comparison, both types:

forM_      :: Monad m => [a] -> (NextElem a -> m ()) -> m ()
whileNext_ :: Monad m => m (NextElem a) -> (a -> m b) -> m ()

Client-side streaming

To make a client-side streaming RPC, we are given a function that we can use to send inputs to the server; once we are done sending all inputs, we then receive the final (and only) output from the server:

recordRoute :: Connection -> IO ()
recordRoute conn = do
    db   <- getDB  -- the example database of features, also used by the server
    resp <- clientStreaming_ conn (rpc @(Protobuf RouteGuide "recordRoute")) $ \send -> do
      replicateM_ 10 $ do
        i <- randomRIO (0, length db - 1)  -- pick a random feature from the database
        let p = (db !! i) ^. #location
        send $ NextElem p
        threadDelay 500_000 -- 0.5 seconds
      send NoNextElem
    print resp

Bidirectional streaming

Finally, for bidirectional streaming we are given two functions, one to send, and one to receive. In this particular case, we can first send all inputs and then receive all outputs, but in general these can be interleaved in any order:

routeChat :: Connection -> IO ()
routeChat conn = do
    biDiStreaming conn (rpc @(Protobuf RouteGuide "routeChat")) $ \send recv -> do
      NextElem.forM_ messages send
      NextElem.whileNext_ recv print
  where
    messages = ..

See also The Haskell Unfolder, episode 27: duality for a more in-depth look into the duality between these various communication patterns.

End of output

When we discussed the client-side implementation of a client-side streaming RPC, we used function clientStreaming_:

clientStreaming_ ::
     ..
  -> (    (NextElem (Input rpc) -> m ())
       -> m ()
     )
  -> m (Output rpc)

The callback is given a function (which we called send) to send outputs to the server. The problem with this approach is that it’s possible to forget to call this function; in particular, it’s quite easy to forget the final

send NoNextElem

to indicate to the server that there is no further input coming. In some cases iteration functions such as NextElem.forM_ can take care of this, but this could also result in the opposite problem, calling send on a NextElem after NoNextElem has already been sent.

In short: make sure to send NoNextElem in clients or servers that stream values to their peer:

  • If you forget to do this in a server handler, grapesy will assume this is a bug and throw a HandlerTerminated exception, which will be reported as a gRPC exception with an unknown error on the client.

  • If you forget to do this in a client, grapesy will assume that you intend to cancel the RPC. The server will see call closed suddenly5, and on the client this will result in a gRPC exception with “cancelled” as the error.

Sending more elements after sending NoNextElem will result in a SendAfterFinal exception.

Side note. In principle it is possible to give clientStreaming_ a different type:

-- Alternative definition, not actually used in grapesy
clientStreaming_ ::
     ..
  -> m (NextElem (Input rpc))
  -> m (Output rpc)

In this style there is no callback at all; instead, we must provide an action that produces the next element one by one, and the library will ensure that the function is called repeatedly until it returns NoNextElem. This amounts to inversion of control: you don’t call a function to send each value, but the library calls you to ask what the next value to be sent is. This provides stronger guarantees that the communication pattern is implemented correctly, but we deemed the cost too high: it results in quite an awkward programming model. Of course, if you want to, nothing stops you from defining such an API on top of the API offered by grapesy.
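For example, a wrapper of roughly the following shape could recover that inversion-of-control style on top of the callback-style API shown above (this is our own sketch, not part of grapesy):

fromAction ::
     Monad m
  => (((NextElem inp -> m ()) -> m ()) -> m out)  -- e.g. clientStreaming_ partially applied
  -> m (NextElem inp)                             -- action producing the next element
  -> m out
fromAction callbackStyle produce =
    callbackStyle $ \send ->
      let loop = do
            next <- produce
            send next
            case next of
              NextElem _ -> loop
              NoNextElem -> pure ()
      in loop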

Beyond the basics

In this section we describe some of the more advanced features of grapesy.

Using the low-level API

Both the Quickstart and the Basics tutorial used the StreamType API, which captures the four different communication patterns (aka streaming types) used in Protobuf, both on the server and on the client: non-streaming, server-side streaming, client-side streaming, and bidirectional streaming. Although these four communication patterns originate in Protobuf, in grapesy they are not Protobuf specific and can also be used with other message encodings.

The high-level API will probably suffice for the vast majority of gRPC applications, but not quite all, and grapesy also offers a low-level API. The most important reasons to use the low-level API instead are:

  • Making sure that the final message is marked as final; we discuss this in more detail in this section in Final elements.

  • Sending and receiving metadata; we will discuss this in detail in the next section Using metadata.

  • Preference: some people may simply prefer the style of the low-level API over the high-level API.

Although the use of the low-level API does come with some responsibilities that are taken care of for you in the high-level API, it is not significantly more difficult to use.

Final elements

When we discussed the high-level API, we saw the NextElem type. The low-level API uses StreamElem instead; here they are side by side:

data NextElem     a = NoNextElem     | NextElem   !a
data StreamElem b a = NoMoreElems !b | StreamElem !a | FinalElem !a !b

There are two differences here:

  • When there are no more elements, we record an additional value. This is the metadata to be sent or received after the final element. We will see an example of this below; for RouteGuide this metadata will always be NoMetadata, which is a trivial type isomorphic to ():

    data NoMetadata = NoMetadata
  • The final element can be marked as final, rather than requiring a separate NoMoreElems value. This may feel like an insignificant difference, and it is a technicality, but in some cases it’s an important technicality.

To understand the need for marking the final element, we need to understand that gRPC messages are transferred over HTTP2 DATA frames. It’s not necessarily true that one frame corresponds to one message, but let’s for the sake of simplicity assume that it is. Then in order to send 3 messages, we have two options:

Option 1: empty final frame

    frame 1: message 1
    frame 2: message 2
    frame 3: message 3
    frame 4: empty, marked END_STREAM

Option 2: mark final message

    frame 1: message 1
    frame 2: message 2
    frame 3: message 3, marked END_STREAM

corresponding to

    [StreamElem 1, StreamElem 2, StreamElem 3, NoMoreElems NoMetadata]
and [StreamElem 1, StreamElem 2, FinalElem 3 NoMetadata]

respectively. This matters because some servers report an error if they receive a message that they expect will be the final message, but the corresponding HTTP2 DATA frame is not marked END_STREAM. This is not completely unreasonable: after all, waiting to receive the next DATA frame might be a blocking operation.

This is particularly important in cases where a server (or client) only expects a single message (non-streaming, client-side streaming, expecting a single output from the server, or server-side streaming, expecting a single input from the client). It is much less critical in other situations, which is why the high-level API gets away with using NextElem instead of StreamElem (which it uses only when multiple messages are expected).

On the server

To use the low-level API on the server, you can either use RawMethod to use the low-level API for some (or all) of the methods of an API, or avoid the use of fromMethods altogether. The latter option is primarily useful if you don’t have a type-level description of your service available. If you do, the first option is safer:

methods :: DB -> Methods IO (ProtobufMethodsOf RouteGuide)
methods db =
      RawMethod (mkRpcHandler $ getFeature   db)
    $ RawMethod (mkRpcHandler $ listFeatures db)
    $ RawMethod (mkRpcHandler $ recordRoute  db)
    $ RawMethod (mkRpcHandler $ routeChat      )
    $ NoMoreMethods

It is also possible to use the high-level API for most methods, and escape to the low-level API for those methods that need it.
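
For example, here is a sketch (not from the tutorial) of what such a mix could look like, reusing the high-level handlers from the Basics tutorial (e.g. getFeature :: DB -> Proto Point -> IO (Proto Feature)) together with the low-level routeChat handler shown just below; the point is only that Method and RawMethod entries can be interleaved in the same Methods list:

-- Sketch: high-level handlers for most methods, low-level for routeChat.
methods :: DB -> Methods IO (ProtobufMethodsOf RouteGuide)
methods db =
      Method    (mkNonStreaming    $ getFeature   db)
    $ Method    (mkServerStreaming $ listFeatures db)
    $ Method    (mkClientStreaming $ recordRoute  db)
    $ RawMethod (mkRpcHandler routeChat)
    $ NoMoreMethods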

Unlike with the high-level API, the signature of all handlers that use the low-level API is the same:

getFeature   :: DB -> Call (Protobuf RouteGuide "getFeature")   -> IO ()
listFeatures :: DB -> Call (Protobuf RouteGuide "listFeatures") -> IO ()
recordRoute  :: DB -> Call (Protobuf RouteGuide "recordRoute")  -> IO ()
routeChat    ::       Call (Protobuf RouteGuide "routeChat")    -> IO ()

The two most important functions6 for communication on the server are recvInput and sendOutput:

recvInput  :: Call rpc -> IO (StreamElem NoMetadata (Input rpc))
sendOutput :: Call rpc -> StreamElem (ResponseTrailingMetadata rpc) (Output rpc) -> IO ()

For convenience there are also some derived functions available; for example, here is getFeature again, now using the low-level API:

getFeature :: DB -> Call (Protobuf RouteGuide "getFeature") -> IO ()
getFeature db call = do
    p <- recvFinalInput call
    sendFinalOutput call (
        fromMaybe (defMessage & #location .~ p) (featureAt db p)
      , NoMetadata
      )
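
The recvFinalInput and sendFinalOutput used here are derived from recvInput and sendOutput. Purely for intuition, here is a sketch of how a helper in the spirit of recvFinalInput could be written; grapesy's actual definition and error handling will differ:

-- Sketch only; grapesy's real recvFinalInput is more careful.
recvFinalInput' :: Call rpc -> IO (Input rpc)
recvFinalInput' call = do
    x <- recvInput call
    case x of
      FinalElem   a _ -> return a
      NoMoreElems _   -> error "recvFinalInput': no input received"
      StreamElem  a   -> do
        -- The element was not marked final, so the end-of-stream
        -- marker should follow immediately.
        y <- recvInput call
        case y of
          NoMoreElems _ -> return a
          _             -> error "recvFinalInput': expected exactly one input"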

The StreamElem API also offers some iteration functions similar to the ones offered by NextElem; for example, here is listFeatures:

listFeatures :: DB -> Call (Protobuf RouteGuide "listFeatures") -> IO ()
listFeatures db call = do
    r <- recvFinalInput call
    StreamElem.forM_ (featuresIn db r) NoMetadata (sendOutput call)

The full server definition is available in tutorials/lowlevel/app/Server.hs.

On the client

The main function to make an RPC using the low-level API is withRPC. For example, here is getFeature:

getFeature :: Connection -> IO ()
getFeature conn = do
    let req = ..
    withRPC conn def (Proxy @(Protobuf RouteGuide "getFeature")) $ \call -> do
      sendFinalInput call req
      resp <- recvFinalOutput call
      print resp

The second argument to withRPC is the call parameters, of which there are two important ones: the timeout for this RPC, and the request metadata. (When using the high-level API the only way to set a timeout is to specify the default RPC timeout for the connection.)

End of output, revisited

At the end of the basics tutorial, we emphasized the importance of indicating end of output for streaming clients and server handlers. The discussion there is relevant when using the low-level API as well, with some additional caveats:

  • In the high-level API, the library can take care of marking the (only) value for non-streaming output as final; in the low-level API, this is your own responsibility, either through calling sendFinalInput / sendFinalOutput or through calling sendInput / sendOutput and constructing the StreamElem manually.

  • For streaming outputs, you can use sendEndOfInput (clients) or sendTrailers (servers) to indicate end of output after the fact (like NoNextElem does), or use sendFinalInput / sendFinalOutput to mark the final element as final when you send it. The latter should be preferred whenever possible; see the sketch below.
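
Concretely, for a RouteGuide-style handler (where the trailing metadata is NoMetadata) the two options look as follows. This is just a sketch; in particular, sendTrailers is assumed to take the trailing metadata as its second argument:

-- Preferred: mark the element itself as final.
sendLast :: Call (Protobuf RouteGuide "listFeatures") -> Proto Feature -> IO ()
sendLast call msg = sendFinalOutput call (msg, NoMetadata)

-- Alternative: send a regular element, then close the stream afterwards.
sendLast' :: Call (Protobuf RouteGuide "listFeatures") -> Proto Feature -> IO ()
sendLast' call msg = do
    sendOutput call (StreamElem msg)
    sendTrailers call NoMetadata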

Using metadata

As an example of using metadata, let’s construct a simple file server which tells the client the size of the file to be downloaded as the initial response metadata, then streams the contents of the file as a series of chunks, and finally reports a SHA256 hash of the file contents in the trailing response metadata. The client can use the initial file size metadata to show a progress bar, and the hash in the trailing metadata to verify that everything went well.

You can find the full example in tutorials/metadata.

Service definition

The .proto file is straightforward:

syntax = "proto3";

package fileserver;

service Fileserver {
  rpc Download (File) returns (stream Partial) {}
}

message File {
  string name = 1;
}

message Partial {
  bytes chunk = 1;
}

As mentioned above, however, the .proto definition does not tell us the type of the metadata. We need to do this in Haskell:

type instance RequestMetadata          (Protobuf Fileserver "download") = NoMetadata
type instance ResponseInitialMetadata  (Protobuf Fileserver "download") = DownloadStart
type instance ResponseTrailingMetadata (Protobuf Fileserver "download") = DownloadDone

data DownloadStart = DownloadStart {
      downloadSize :: Integer
    }
  deriving stock (Show)

data DownloadDone = DownloadDone {
      downloadHash :: ByteString
    }
  deriving stock (Show)

(In this example we make no use of request metadata; see callRequestMetadata for the main entry point for setting request metadata.)

Serialization

In order for the server to be able to send the metadata to the client, we need to be able to serialize it as zero, one, or more headers/trailers. This means we must give an instance of BuildMetadata:

instance BuildMetadata DownloadStart where
  buildMetadata DownloadStart{downloadSize} = [
        CustomMetadata "download-size" $ C8.pack (show downloadSize)
      ]

instance BuildMetadata DownloadDone where
  buildMetadata DownloadDone{downloadHash} = [
        CustomMetadata "download-hash-bin" downloadHash
      ]

Note the use of the -bin suffix for the name of the download-hash-bin trailer: this indicates that this is metadata containing binary data, and that it must be Base64-encoded; grapesy will automatically take care of encoding and decoding for binary metadata.

We need to take care of one more thing. The HTTP2 spec mandates that clients must be informed up-front which trailing headers they can expect. In grapesy this comes down to giving an instance of StaticMetadata:

instance StaticMetadata DownloadDone where
  metadataHeaderNames _ = ["download-hash-bin"]

This can be an over-approximation but not an under-approximation; if you return a trailer in BuildMetadata that was not declared in StaticMetadata, then grapesy will throw an exception.

Deserialization

For deserialization we must provide an instance of ParseMetadata, which is given all metadata headers to parse. In our example this is relatively simple because our metadata uses only a single header:

instance ParseMetadata DownloadStart where
  parseMetadata md =
      case md of
        [CustomMetadata "download-size" value]
          | Just downloadSize <- readMaybe (C8.unpack value)
          -> return $ DownloadStart{downloadSize}
        _otherwise
          -> throwM $ UnexpectedMetadata md

instance ParseMetadata DownloadDone where
  parseMetadata md =
      case md of
        [CustomMetadata "download-hash-bin" downloadHash]
          -> return $ DownloadDone{downloadHash}
        _otherwise
          -> throwM $ UnexpectedMetadata md

These particular instances will throw an error if additional metadata is present. This is a choice, and instead we could simply ignore any additional headers. There is no single right answer here: ignoring additional metadata runs the risk of not realizing that the peer is trying to tell you something important, but throwing an error runs the risk of unnecessarily aborting an RPC.
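
For instance, a more lenient variant of the DownloadStart instance (a sketch, not the one the example uses) could pick out the download-size header and silently ignore everything else:

instance ParseMetadata DownloadStart where
  parseMetadata md =
      -- Keep only the header we care about, ignoring any others.
      case [v | CustomMetadata "download-size" v <- md] of
        [value]
          | Just downloadSize <- readMaybe (C8.unpack value)
          -> return $ DownloadStart{downloadSize}
        _otherwise
          -> throwM $ UnexpectedMetadata md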

Specifying initial response metadata

The metadata that is sent to the client with the response headers can be overridden with setResponseInitialMetadata. This can be done at any point before initiating the response, either explicitly using initiateResponse or implicitly by sending the first output to the client using sendOutput and related functions.

Most server handlers, however, don't care about metadata, and prefer not to have to call setResponseInitialMetadata at all. For this reason mkRpcHandler has type

mkRpcHandler :: Default (ResponseInitialMetadata rpc) => ..

This constraint is inherited by the high-level API, which doesn’t support metadata at all:

Method :: (Default (ResponseInitialMetadata rpc), Default (ResponseTrailingMetadata rpc)) => ..

Crucially, there is a Default instance for NoMetadata:

instance Default NoMetadata where
  def = NoMetadata

In our case however we cannot provide a Default instance, because the initial metadata depends on the file size. We therefore use mkRpcHandlerNoDefMetadata instead:

methods :: Methods IO (ProtobufMethodsOf Fileserver)
methods =
      RawMethod (mkRpcHandlerNoDefMetadata download)
    $ NoMoreMethods

This means we must call setResponseInitialMetadata in the handler; if we don’t, an exception will be raised when the response is initiated.

Server handler

Since we are using the low-level API (we must, if we want to deal with metadata), the server handler has this signature:

download :: Call (Protobuf Fileserver "download") -> IO ()
download call = do

We wait for the request from the client, get the file size, set the response initial metadata, and initiate the response. Explicitly initiating the response in this manner is not essential, but it means that the file size is sent to the client (along with the rest of the response headers) before the first chunk is sent; in some cases this may be important:

req :: Proto File <- recvFinalInput call
let fp :: FilePath
    fp = Text.unpack (req ^. #name)

fileSize <- getFileSize fp
setResponseInitialMetadata call $ DownloadStart fileSize
initiateResponse call

We then open the file the client requested, and keep reading chunks until we have reached end of file. Although it is probably not particularly critical in this case, we follow the recommendations from End of output, revisited and mark the final chunk as the final output to the client, as opposed to telling the client that no more outputs are available after the fact.

withFile fp ReadMode $ \h -> do
  let loop :: SHA256.Ctx -> IO ()
      loop ctx = do
          chunk <- BS.hGet h defaultChunkSize
          eof   <- hIsEOF h

          let resp :: Proto Partial
              resp = defMessage & #chunk .~ chunk

              ctx' :: SHA256.Ctx
              ctx' = SHA256.update ctx chunk

          if eof then
            sendFinalOutput call (resp, DownloadDone $ SHA256.finalize ctx')
          else do
            sendNextOutput call resp
            loop ctx'

  loop SHA256.init

When we send the final output, we must also include the hash that we computed as we were streaming the file to the client.

Client

Let’s first consider how to process the individual chunks that we get from the server. We do this in an auxiliary function processPartial:

processPartial ::
     Handle
  -> Proto Partial
  -> ProgressT (StateT SHA256.Ctx IO) ()
processPartial h partial = do
    liftIO $ BS.hPut h chunk
    modify $ flip SHA256.update chunk
    updateProgressBar $ BS.length chunk
  where
    chunk :: ByteString
    chunk = partial ^. #chunk

We do three things in this function: write the chunk to disk, update the hash, and update the progress bar; this uses StateT to keep track of the partially computed hash, and ProgressT for a simple progress bar (ProgressT is defined in tutorials/metadata/app/ProgressT.hs; its details are not important here).

With this in hand, we can now define the main client function. We are given some file inp that we are interested in downloading, and a path out where we want to store it locally. Like in the server, here too we must use the low-level API, so the client starts like this:

download :: Connection -> String -> String -> IO ()
download conn inp out = do
    withRPC conn def (Proxy @(Protobuf Fileserver "download")) $ \call -> do
      sendFinalInput call $ defMessage & #name .~ Text.pack inp

We then wait for the initial response metadata, telling us how big the file is:

DownloadStart{downloadSize} <- recvResponseInitialMetadata call

We then use StreamElem.whileNext_ again to process all the chunks with the processPartial function we already discussed, unwrap the monad stack, and finally do a hash comparison:

(DownloadDone{downloadHash = theirHash}, ourHash) <-
  withFile out WriteMode $ \h ->
    flip runStateT SHA256.init . runProgressT downloadSize $
      StreamElem.whileNext_ (recvOutput call) (processPartial h)

putStrLn $ "Hash match: " ++ show (theirHash == SHA256.finalize ourHash)

Custom monad stack

In this section we will briefly discuss how to use custom monad stacks. You can find the full tutorial in tutorials/monadstack; it is a variant of the Basics tutorial.

On the server

Most of the server handlers for the RouteGuide service need to take the DB as an argument:

getFeature   :: DB -> Proto Point -> IO (Proto Feature)
listFeatures :: DB -> Proto Rectangle -> (NextElem (Proto Feature) -> IO ()) -> IO ()
recordRoute  :: DB -> IO (NextElem (Proto Point)) -> IO (Proto RouteSummary)

It might be more convenient to define a custom Handler monad stack in which we have access to the DB at all times:

newtype Handler a = WrapHandler {
      unwrapHandler :: ReaderT DB IO a
    }
  deriving newtype (Functor, Applicative, Monad, MonadIO, MonadReader DB)

runHandler :: DB -> Handler a -> IO a
runHandler db = flip runReaderT db . unwrapHandler

The types of our handlers then become

getFeature   :: Proto Point -> Handler (Proto Feature)
listFeatures :: Proto Rectangle -> (NextElem (Proto Feature) -> IO ()) -> Handler ()
recordRoute  :: IO (NextElem (Proto Point)) -> Handler (Proto RouteSummary)
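
For example, getFeature can now fetch the DB from the environment with ask; here is a sketch, reusing featureAt and the lens-style construction from the earlier handler:

getFeature :: Proto Point -> Handler (Proto Feature)
getFeature p = do
    db <- ask  -- provided by the derived MonadReader DB instance
    return $ fromMaybe (defMessage & #location .~ p) (featureAt db p)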

Note that the callbacks to send or receive values still live in IO. The DB argument now disappears from methods also:

methods :: Methods Handler (ProtobufMethodsOf RouteGuide)
methods =
      Method (mkNonStreaming    getFeature  )
    $ Method (mkServerStreaming listFeatures)
    $ Method (mkClientStreaming recordRoute )
    $ Method (mkBiDiStreaming   routeChat   )
    $ NoMoreMethods

The only requirement from grapesy is that at the top-level we can hoist this monad stack into IO, using hoistMethods:

hoistMethods :: (forall a. m a -> n a) -> Methods m rpcs -> Methods n rpcs

Here’s how we can run the server:

runServerWithHandlers def config $ fromMethods $
  hoistMethods (runHandler db) methods

On the client

For the high-level API there is support for custom monad stacks also. One reason why you might want to do this is to avoid having to pass the Connection object around all the time. In the Basics tutorial our client functions had these signatures:

getFeature   :: Connection -> IO ()
listFeatures :: Connection -> IO ()
recordRoute  :: Connection -> IO ()
routeChat    :: Connection -> IO ()

Like on the server, we can define a custom monad stack to reduce syntactic overhead:

newtype Client a = WrapClient {
      unwrapClient :: ReaderT ClientEnv IO a
    }
  deriving newtype (Functor, Applicative, Monad, MonadIO, MonadCatch, MonadThrow, MonadMask)

data ClientEnv = ClientEnv {
      conn :: Connection
    }

In order for such a monad stack to be usable, it needs to implement MonadIO and MonadMask, as well as CanCallRPC; it's this last class that tells grapesy how to get access to the Connection object:

instance CanCallRPC Client where
  getConnection = WrapClient $ conn <$> ask

With this defined, we can now avoid having to pass the connection around at all. Instead of importing from Network.GRPC.Client.StreamType.IO, we import from Network.GRPC.Client.StreamType.CanCallRPC, which gives us a different definition of nonStreaming and friends. For example, here is getFeature:

getFeature :: Client ()
getFeature = do
    let req = ..
    resp <- nonStreaming (rpc @(Protobuf RouteGuide "getFeature")) req
    liftIO $ print resp

As for the server handlers, the callbacks provided to send and receive messages still live in IO; this means that we’ll need to liftIO them where appropriate:

listFeatures :: Client ()
listFeatures = do
    let req = ..
    serverStreaming (rpc @(Protobuf RouteGuide "listFeatures")) req $ \recv -> liftIO $
      NextElem.whileNext_ recv print
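
To run these client functions we still need to unwrap the stack and supply the Connection at the top level; a minimal sketch (runClient is our own helper, not part of grapesy):

runClient :: Connection -> Client a -> IO a
runClient conn = flip runReaderT (ClientEnv conn) . unwrapClient

The client functions above can then be invoked as, for example, runClient conn getFeature.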

Using conduits

We saw the simplest form of serverStreaming and co when we discussed the implementation of the client in the Basics tutorial, and we have also seen the generalized form for arbitrary monad stacks. There is one more form, provided in Network.GRPC.Client.StreamType.Conduit, which provides an API using conduits. You can find this example in tutorials/conduit; there is currently no conduit support on the server side.

The main idea is that serverStreaming provides a source to stream from, and clientStreaming_ provides a sink to stream to:

listFeatures :: Connection -> IO ()
listFeatures conn = do
    let req = ..

    let sink :: ConduitT (Proto Feature) Void IO ()
        sink = ..

    serverStreaming conn (rpc @(Protobuf RouteGuide "listFeatures")) req $ \source ->
      runConduit $ source .| sink

recordRoute :: Connection -> IO ()
recordRoute conn = do
    let source :: ConduitT () (Proto Point) IO ()
        source = ..

    resp <- clientStreaming_ conn (rpc @(Protobuf RouteGuide "recordRoute")) $ \sink ->
              runConduit $ source .| sink
    print resp
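
The tutorial leaves the actual sink and source open. Purely for illustration, here is one hypothetical way to fill them in, using Data.Conduit.Combinators (imported as C) and assuming Show instances for the Protobuf messages:

-- Hypothetical helpers, not part of the tutorial.
printFeatures :: ConduitT (Proto Feature) Void IO ()
printFeatures = C.mapM_ print

pointSource :: [Proto Point] -> ConduitT () (Proto Point) IO ()
pointSource = C.yieldMany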

Finally, in bidirectional streaming we get two conduits, one in each direction (that is, one source and one sink).

(Ab)using Trailers-Only

For this final section we need to consider some more low-level details about how gRPC is sent over HTTP2. When we discussed final elements, we mentioned that gRPC messages are sent using HTTP2 DATA frames, but we didn’t talk about headers. In general, a gRPC request looks like this:

  1. One or more HEADERS frames, containing the request headers. One of the most important headers here is the :path (pseudo) header, which indicates which RPC we want to invoke; for example, this might be /routeguide.RouteGuide/ListFeatures.

  2. One or more DATA frames, the last of which is marked END_STREAM. We discussed these before.

This is probably as expected, but the structure of the response may look a bit more surprising:

  1. Just like the request, we first get one or more HEADERS. An important example here is the content-type response header, which indicates what kind of message encoding is being used (for example, application/grpc+proto for Protobuf).

  2. One or more DATA frames, the last of which is marked END_STREAM.

  3. Finally, another set of headers, also known as trailers. This set of trailers provides some concluding information about how the RPC went; for example, if the RPC failed, then the trailers will include a grpc-status header with a non-zero value. Any application specific response trailing metadata (such as the checksum we discussed in the file server example) is included here as well.

There is however a special case, known as Trailers-Only: if there is no data to be sent at all, it is possible to send only HEADERS frames, the last of which is marked END_STREAM, and no DATA frames at all. Put another way, the two sets of headers (headers and trailers) are combined, and the data frames are omitted entirely.

The gRPC specification is very explicit about the use of Trailers-Only, and states that it can be used only in RPCs that result in an error:

Most responses are expected to have both headers and trailers but Trailers-Only is permitted for calls that produce an immediate error.

In grapesy this will happen automatically: if a server handler raises an error, and no outputs have as yet been sent to the client, then grapesy will automatically take advantage of Trailers-Only and only send a single set of headers.

However, some gRPC servers also make use of Trailers-Only in non-error cases, when there is no output (e.g. for server-side streaming). Since this does not conform to the gRPC specification, grapesy will not do this automatically, but it is possible if really needed. In tutorials/trailers-only you can find an example RouteGuide server which will take advantage of Trailers-Only in the listFeatures method, when there are no features to return:

listFeatures :: DB -> Call (Protobuf RouteGuide "listFeatures") -> IO ()
listFeatures db call = do
    r <- recvFinalInput call
    case featuresIn db r of
      [] -> sendTrailersOnly call NoMetadata
      ps -> StreamElem.forM_ ps NoMetadata (sendOutput call)

The difference between this implementation and the previous one can only be observed when we look at the raw network traffic; the difference is not visible at the gRPC level. Since this violates the specification, however, it’s possible (though perhaps unlikely) that some clients will be confused by this server.

Future work

The gRPC specification is only the core of the gRPC ecosystem. There are additional features that are defined on top, some of which are supported by grapesy (see list of features at the start of this post), but not all; the features that are not yet supported are listed below. Note that these are optional features, which have various degrees of support in the official clients and servers. If you or your company needs any of these features, we’d be happy to discuss options; please contact us at info@well-typed.com.

  • Authentication. The gRPC Authentication Guide mentions three ways to authenticate: SSL/TLS, Application Layer Transport Security (ALTS) and token-based authentication, possibly through OAuth2. Of these three only SSL/TLS is currently supported by grapesy.

  • Interceptors are essentially a form of middleware that are applied to every request, and can be used for things like metrics (see below).

  • Custom Backend Metrics. There is support in grapesy for parsing or including an Open Request Cost Aggregation (ORCA) load report in the response trailers through the endpoint-load-metrics-bin trailer, but there is otherwise no support for ORCA or custom backend metrics in general.

  • Load balancing. There is very limited support for load balancing in the ReconnectPolicy, but we have no support for load balancing as described in the Custom Load Balancing Policies Guide.

  • Custom name resolution.

  • Automatic deadline propagation. There is of course support for setting timeouts, but there is no support for automatic propagation from one server to another, adjusting for clock skew. See the section “Deadline Propagation” in Deadlines Guide for server.

  • Introspection: services that allow querying the state of the server.

  • True binary metadata. There is support in grapesy for sending binary metadata (in -bin headers/trailers), using base64 encoding (as per the spec). True binary metadata is about avoiding this encoding overhead.

  • Sending keep-alive pings (this will require adding this feature to the http2 library).

  • Retry policies. The gRPC documentation currently identifies two such policies: request hedging, which sends the same request to a number of servers, waiting for the first response it receives; and automatic retries of failed requests. There is support in grapesy for the grpc-previous-rpc-attempts request header as well as the grpc-retry-pushback-ms response trailer, necessary to support these features.

Footnotes

  1. There are actually two ways to use JSON with gRPC. It can be a very general term, simply meaning using an otherwise-unspecified JSON encoding, or it can specifically refer to “Protobuf over JSON”. The former is supported by grapesy; the latter is not yet.↩︎

  2. The cancellation guide describes client-side cancellation as “A client cancels an RPC call by calling a method on the call object or, in some languages, on the accompanying context object.” In grapesy this is handled slightly differently: cancellation corresponds to leaving the scope of withRPC early.↩︎

  3. The full list: http2#72, http2#74, http2#77, http2#78, http2#79, http2#80, http2#81, http2#82, http2#83, http2#84, http2#92, http2#97, http2#99, http2#101, http2#104, http2#105, http2#106, http2#107, http2#108, http2#115, http2#116, http2#117, http2#119, http2#120, http2#122, http2#124, http2#126, http2#133, http2#135, http2#136, http2#137, http2#138, http2#140, http2#142, http2#146, http2#147, http2#155, http-semantics#1, http-semantics#2, http-semantics#3, http-semantics#4, http-semantics#5, http-semantics#9, http-semantics#10, http-semantics#11, http2-tls#2, http2-tls#3, http2-tls#4, http2-tls#5, http2-tls#6, http2-tls#8, http2-tls#9, http2-tls#10, http2-tls#11, http2-tls#14, http2-tls#15, http2-tls#16, http2-tls#17, http2-tls#19, http2-tls#20, http2-tls#21, network-run#3, network-run#6, network-run#8, network-run#9, network-run#12, network-run#13, network-control#4, network-control#7, tls#458, tls#459, tls#477, tls#478, and network#588.↩︎

  4. Layout of the type error slightly modified for improved readability↩︎

  5. Since gRPC does not support client-side trailers, client-side cancellation is made visible to the server by sending an HTTP2 RST_STREAM frame.↩︎

  6. They are however not primitive; see recvInputWithMeta and sendOutputWithMeta.↩︎

by edsko, finley at January 22, 2025 12:00 AM

January 21, 2025

in Code

Advent of Code 2024: Haskell Solution Reflections for all 25 Days

Admittedly a bit late, buuuuuut Merry belated Christmas and Happy New Years to all!

This past December I again participated in Eric Wastl’s Advent of Code, a series of 25 daily Christmas-themed puzzles. Each puzzle comes with a cute story about saving Christmas, and the puzzles increase in difficulty as the stakes get higher and higher. Every night at midnight EST, my friends and I (including the good people of libera chat’s ##advent-of-code channel) discuss the latest puzzle and creative ways to solve and optimize it. But, the main goal isn’t to solve it quickly, it’s always to see creative ways to approach the puzzle and share different insights. The puzzles are bite-sized enough that there are often multiple ways to approach it, and in the past I’ve leveraged group theory, galilean transformations and linear algebra, and more group theory. This year was also the special 10 year anniversary event, with callbacks to fun story elements of all the previous years!

Most of the puzzles are also pretty nice to solve in Haskell! Lots of DFS’s that melt away as simple recursion or recursion schemes, and even the BFS’s that expose you to different data structures and encodings.

This year I’ve moved almost all of my Haskell code to an Advent of Code Megarepo. I also like to post write-ups on Haskelly ways to approach the problems, and they are auto-compiled on the megarepo wiki.

I try my best every year, but sometimes I am able to complete write-ups for all 25 puzzles before the new year catches up. The last time was 2020, and I’m proud to announce that 2024 is now also 100% complete!

You can find all of them here, but here are links to each individual one. Hopefully you can find them helpful. And if you haven’t yet, why not try Advent of Code yourself? :) And drop by the freenode ##advent-of-code channel, we’d love to say hi and chat, or help out! Thanks all for reading, and also thanks to Eric for a great event this year, as always!

by Justin Le at January 21, 2025 07:54 AM

Matt Parsons

Making My Life Harder with GADTs

Lucas Escot wrote a good blog post titled “Making My Life Easier with GADTs”, which contains a demonstration of GADTs that made his life easier. He posted the article to reddit.

I’m going to trust that - for his requirements and anticipated program evolution - the solution is a good one for him, and that it actually made his life easier. However, there’s one point in his post that I take issue with:

Dependent types and assimilated type-level features get a bad rep. They are often misrepresented as a futile toy for “galaxy-brain people”, providing no benefit to the regular programmer. I think this opinion stems from a severe misconception about the presumed complexity of dependent type systems.

I am often arguing against complexity in Haskell codebases. While Lucas’s prediction about “misconceptions” may be true for others, it is not true for me. I have worked extensively with Haskell’s most advanced features in large scale codebases. I’ve studied “Types and Programming Languages,” the Idris book, “Type Theory and Formal Proof”, and many other resources on advanced type systems. I don’t say this to indicate that I’m some kind of genius or authority, just that I’m not a rube who’s looking up on the Blub Paradox. My argument for simplicity comes from the hard experience of having to rip these advanced features out, and the pleasant discovery that simpler alternatives are usually nicer in every respect.

So how about GADTs? Do they make my life easier? Here, I’ll reproduce the comment I left on reddit:


They are often misrepresented as a futile toy for “galaxy-brain people”, providing no benefit to the regular programmer. I think this opinion stems from a severe misconception about the presumed complexity of dependent type systems.

This opinion - in my case at least - stems from having seen people code themselves into a corner with fancy type features where a simpler feature would have worked just as well.

In this case, the “simplest solution” is to have two entirely separate datatypes, as the blog post initially starts with. These datatypes, after all, represent different things - a typed environment and an untyped environment. Why mix the concerns? What pain or requirement is solved by having one more complicated datatype when two datatypes works pretty damn well?

I could indeed keep typed environments completely separate. Different datatypes, different information. But this would lead to a lot of code duplication. Given that the compilation logic will be mostly identical for these two targets, I don’t want to be responsible for the burden of keeping both implementations in sync.

Code duplication can be a real concern. In this case, we have code that is not precisely duplicated, but simply similar - we want compilation logic to work for both untyped and typed logics, and only take typing information into account. When we want code to work over multiple possible types, we have two options: parametric polymorphism and ad-hoc polymorphism.

With parametric polymorphism, the solution looks like this:

data GlobalEnv a = GlobalEnv [(Name, GlobalDecl a)]

data GlobalDecl a
  = DataDecl (DataBody a)
  | FunDecl  (FunBody a)
  | TypeDecl a

data DataBody a = DataBody
  { indConstructors :: [ConstructorBody a]
  }

data ConstructorBody a = ConstructorBody
  { ctorName :: Name
  , ctorArgs :: Int
  , ctorType :: a
  }

data FunBody a = FunBody
  { funBody :: LamBox.Term
  , funType :: a
  }

This is actually very similar to the GADT approach, because we’re threading a type variable through the system. For untyped, we can write GlobalDecl (), and for typed, we can write GlobalDecl LamBox.Type.

Functions which can work on either untyped or typed would have GlobalDecl a -> _ as their input, and functions which require a representation can specify it directly. This would look very similar to the GADT approach: in practice, replace GlobalDecl Typed with GlobalDecl Type and GlobalDecl Untyped with GlobalDecl () and you’re good.

(or, heck, data Untyped = Untyped and the change is even smaller).

This representation is much easier to work with. You can deriving stock (Show, Eq, Ord). You can $(deriveJSON ''GlobalEnv). You can delete several language extensions. It’s also more flexible: you can use Maybe Type to represent partially typed programs (or programs with type inference). You can use Either TypeError Type to represent full ASTs with type errors. You can deriving stock (Functor, Foldable, Traversable) to get access to fmap (change the type with a function) and toList (collect all the types in the AST) and traverse (change each type effectfully, combining results).
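
For instance (my own sketch, with the relevant Derive* and DerivingStrategies extensions enabled, and assuming LamBox.Term has the usual instances), attaching those deriving clauses to one of the types above gives these tools directly:

data FunBody a = FunBody
  { funBody :: LamBox.Term
  , funType :: a
  }
  deriving stock (Show, Eq, Ord, Functor, Foldable, Traversable)

-- collect every type annotation in a function body (toList from Data.Foldable)
funBodyTypes :: FunBody a -> [a]
funBodyTypes = toList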

When we choose GADTs here, we pay significant implementation complexity costs, and we give up flexibility. What is the benefit? Well, the entire benefit is that we’ve given up flexibility. With the parametric polymorphism approach, we can put anything in for that type variable a. The GADT prevents us from writing TypeDecl () and it forbids you from having anything other than Some (type :: Type) or None in the fields.

This restriction is what I mean by ‘coding into a corner’. Let’s say you get a new requirement to support partially typed programs. If you want to stick with the GADT approach, then you need to change data Typing = Typed | Untyped | PartiallyTyped and modify all the WhenTyped machinery - Optional :: Maybe a -> WhenTyped PartiallyTyped a. Likewise, if you want to implement inference or type-checking, you need another constructor on Typing and another on WhenTyped - ... | TypeChecking and Checking :: Either TypeError a -> WhenTyped TypeChecking a.

But wait - now our TypeAliasDecl has become overly strict!

data GlobalDecl :: Typing -> Type where
  FunDecl       :: FunBody t     -> GlobalDecl t
  DataDecl      :: DataBody t    -> GlobalDecl t
  TypeAliasDecl :: TypeAliasBody -> GlobalDecl Typed

We actually want TypeAliasDecl to work with any of PartiallyTyped, Typed, or TypeChecking. Can we make this work? Yes, with a type class constraint:

class IsTypedIsh (t :: Typing)

instance IsTypedIsh Typed
instance IsTypedIsh PartiallyTyped
instance (Unsatisfiable msg) => IsTypedIsh Untyped

data GlobalDecl :: Typing -> Type where
  FunDecl       :: FunBody t     -> GlobalDecl t
  DataDecl      :: DataBody t    -> GlobalDecl t
  TypeAliasDecl :: (IsTypedIsh t) => TypeAliasBody -> GlobalDecl t

But, uh oh, we also want to write functions that can operate in many of these states. We can extend IsTypedIsh with a witness function isTypedIshWitness :: WhenTyped t Type -> Type, but that also doesn’t quite work - the t actually determines the output type.

class IsTypedIsh (t :: Typing) where
  type TypedIshPayload t 
  isTypedIshWitness :: WhenTyped t Type -> TypedIshPayload t

instance IsTypedIsh Typed where
  type TypedIshPayload Typed = Type
  isTypedIshWitness (Some a) = a

instance IsTypedIsh PartiallyTyped where
  type TypedIshPayload PartiallyTyped = Maybe Type
  isTypedIshWitness (Optional a) = a

instance IsTypedIsh TypeChecking where
  type TypedIshPayload TypeChecking = Either TypeError Type
  isTypedIshWitness (Checking a) = a

instance (Unsatisfiable msg) => IsTypedIsh Untyped

Now, this does let us write code like:

inputHasTypeSorta :: (IsTypedIsh t) => GlobalDecl t -> _

but actually working with this becomes a bit obnoxious. You see, without knowing t, you can’t know the result of isTypedIshWitness, so you end up needing to say things like (IsTypedIsh t, TypedIshPayload t ~ f Type, Foldable f) => ... to cover the Maybe and Either case - and this only lets you fold the result. But now you’re working with the infelicities of type classes (inherently open) and sum types (inherently closed) and the way that GHC tries to unify these two things with type class dispatch.

Whew.

Meanwhile, in parametric polymorphism land, we get almost all of the above for free. If we want to write code that covers multiple possible cases, then we can use much simpler type class programming. Consider how easy it is to write this function and type:

beginTypeChecking 
    :: GlobalDecl () 
    -> GlobalDecl (Maybe (Either TypeError Type))
beginTypeChecking = fmap (\() -> Nothing)

And now consider what you need to do to make the GADT program work out like this.

January 21, 2025 12:00 AM

January 19, 2025

Magnus Therning

Reviewing GitHub PRs in Emacs

My Emacs config's todo-list has long had an item about finding some way to review GitHub PRs without having to leave Emacs, and when the forge issue that I subscribe to came alive again I thought it was time to see if I could improve my config.

I found three packages for doing reviews

I've tried the first one before but at the time it didn't seem to work at all. Apparently that's improved somewhat, though there's a PR with a change that's necessary to make it work.1 The first two don't support comments on multiple lines of a PR; there are issues/discussions for both.

The last one, emacs-pr-review does support commenting on multiple lines, but it lacks a nice way of opening a review from magit. What I can do is

  1. position the cursor on a PR in the magit status view, then
  2. copy the PR's URL using forge-copy-url-at-point-as-kill, and
  3. open the PR by calling pr-review and pasting the PR's URL.

Which I did for a few days until I got tired of it and wrote a function to cut out the copy/paste part.

(defun mes/pr-review-via-forge ()
  (interactive)
  (if-let* ((target (forge--browse-target))
            (url (if (stringp target) target (forge-get-url target)))
            (rev-url (pr-review-url-parse url)))
      (pr-review url)
    (user-error "No PR to review at point")))

I've bound it to a key in magit-mode-map to make it easier.

I have to say I'm not completely happy with emacs-pr-review, so if either of the other two sort out commenting on multiple lines I'll check them out again.

My full setup for pr-review is here.

Footnotes:

1

The details can be found among the comments of the forge issue.

January 19, 2025 10:10 AM

Dan Piponi (sigfpe)

Running from the past


Important Note

The links to formulae here are broken but a PDF version is available at github.


Preface

Functional programming encourages us to program without mutable state. Instead we compose functions that can be viewed as state transformers. It's a change of perspective that can have a big impact on how we reason about our code. But it's also a change of perspective that can be useful in mathematics and I'd like to give an example: a really beautiful technique that allows you to sample from the infinite limit of a probability distribution without needing an infinite number of operations. (Unless you're infinitely unlucky!)



Markov Chains

A Markov chain is a sequence of random states where each state is drawn from a random distribution that possibly depends on the previous state, but not on any earlier state. So it is a sequence \(X_0, X_1, X_2, \ldots\) such that \(P(X_{n+1} = x \mid X_0, X_1, \ldots, X_n) = P(X_{n+1} = x \mid X_n)\) for all \(n\). A basic example might be a model of the weather in which each day is either sunny or rainy but where it's more likely to be rainy (or sunny) if the previous day was rainy (or sunny). (And to be technically correct: having information about two days or earlier doesn't help us if we know yesterday's weather.)


Like imperative code, this description is stateful. The state at step \(n+1\) depends on the state at step \(n\). Probability is often easier to reason about when we work with independent identically drawn random variables and our \(X_n\) aren't of this type. But we can eliminate the state from our description using the same method used by functional programmers.


Let's choose a Markov chain to play with. I'll pick one with 3 states called \(A\), \(B\) and \(C\) and with transition probabilities given by \(P(X_{n+1} = y \mid X_n = x) = T_{xy}\) where

\[ T = \begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2} & 0 \\ \tfrac{1}{3} & \tfrac{1}{3} & \tfrac{1}{3} \\ 0 & \tfrac{1}{2} & \tfrac{1}{2} \end{pmatrix} \]

(rows and columns ordered \(A\), \(B\), \(C\)).


Here's a diagram illustrating our states:




Implementation

First some imports:


> {-# LANGUAGE LambdaCase #-}
> {-# LANGUAGE TypeApplications #-}


> import Data.Sequence(replicateA)
> import System.Random
> import Control.Monad.State
> import Control.Monad
> import Data.List
> import Data.Array


And now the type of our random variable:


> data ABC = A | B | C deriving (Eq, Show, Ord, Enum, Bounded)


We are now in a position to simulate our Markov chain. First we need some random numbers drawn uniformly from [0, 1]:


> uniform :: (RandomGen gen, MonadState gen m) => m Double
> uniform = state random


And now the code to take a single step in the Markov chain:


> step :: (RandomGen gen, MonadState gen m) => ABC -> m ABC
> step A = do
>     a <- uniform
>     if a < 0.5
>         then return A
>         else return B
> step B = do
>     a <- uniform
>     if a < 1/3.0
>         then return A
>         else if a < 2/3.0
>             then return B
>             else return C
> step C = do
>     a <- uniform
>     if a < 0.5
>         then return B
>         else return C


Notice how the step function generates a new state at random in a way that depends on the previous state. The m ABC in the type signature makes it clear that we are generating random states at each step.


We can simulate the effect of taking \(n\) steps with a function like this:


> steps :: (RandomGen gen, MonadState gen m) => Int -> ABC -> m ABC
> steps 0 i = return i
> steps n i = do
>     i <- steps (n-1) i
>     step i


We can run for a few steps, starting with A, with a line like so:


*Main> evalState (steps 3 A) gen
B


The starting state of our random number generator is given by gen.


Consider the distribution of states after taking \(n\) steps. For Markov chains of this type, we know that as \(n\) goes to infinity the distribution of the \(n\)th state approaches a limiting "stationary" distribution. There are frequently times when we want to sample from this final distribution. For a Markov chain as simple as this example, you can solve exactly to find the limiting distribution. But for real world problems this can be intractable. Instead, a popular solution is to pick a large \(n\) and hope it's large enough. As \(n\) gets larger the distribution gets closer to the limiting distribution. And that's the problem I want to solve here - sampling from the limit. It turns out that by thinking about random functions instead of random states we can actually sample from the limiting distribution exactly.



Some random functions


Here is a new version of our random step function:


> step' :: (RandomGen gen, MonadState gen m) => m (ABC -> ABC)
> step' = do
>     a <- uniform
>     return $ \case
>         A -> if a < 0.5 then A else B
>         B -> if a < 1/3.0
>                 then A
>                 else if a < 2/3.0 then B else C
>         C -> if a < 0.5 then B else C


In many ways it's similar to the previous one. But there's one very big difference: the type signature m (ABC -> ABC) tells us that it's returning a random function, not a random state. We can simulate the result of taking 10 steps, say, by drawing 10 random functions, composing them, and applying the result to our initial state:


> steps' :: (RandomGen gen, MonadState gen m) => Int -> m (ABC -> ABC)
> steps' n = do
>   fs <- replicateA n step'
>   return $ foldr (flip (.)) id fs


Notice the use of flip. We want to compose functions as \(f_n \circ f_{n-1} \circ \ldots \circ f_1\), each time composing on the left by the new \(f_n\). This means that for a fixed seed gen, each time you increase \(n\) by 1 you get the next step in a single simulation: (BTW I used replicateA instead of replicateM to indicate that these are independent random draws. It may be well known that you can use Applicative instead of Monad to indicate independence but I haven't seen it written down.)


*Main> [f A | n <- [0..10], let f = evalState (steps' n) gen]
[A,A,A,B,C,B,A,B,A,B,C]


When I first implemented this I accidentally forgot the flip. So maybe you're wondering what effect removing the flip has? The effect is about as close to a miracle as I've seen in mathematics. It allows us to sample from the limiting distribution in a finite number of steps!


Here's the code:


> steps_from_past :: (RandomGen gen, MonadState gen m) => Int -> m (ABC -> ABC)
> steps_from_past n = do
>   fs <- replicateA n step'
>   return $ foldr (.) id fs


We end up building \(f_1 \circ f_2 \circ \ldots \circ f_n\). This is still a composition of independent identically distributed functions and so it's still drawing from exactly the same distribution as steps'. Nonetheless, there is a difference: for a particular choice of seed, steps_from_past n no longer gives us a sequence of states from a Markov chain. Running with argument \(n\) draws a random composition of \(n\) functions. But if you increase \(n\) by 1 you don't add a new step at the end. Instead you effectively restart the Markov chain with a new first step generated by a new random seed.


Try it and see:


*Main> [f A | n <- [0..10], let f = evalState (steps_from_past n) gen]
[A, A, A, A, A, A, A, A, A, A]


Maybe that's surprising. It seems to get stuck in one state. In fact, we can try applying the resulting function to all three states.


*Main> [fmap f [A, B, C] | n <- [0..10], let f = evalState (steps_from_past n) gen]
[[A,B,C],[A,A,B],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A],[A,A,A]]


In other words, for large enough \(n\) we get the constant function.


Think of it this way: If f isn't injective then it's possible that two states get collapsed to the same state. If you keep picking random f's it's inevitable that you will eventually collapse down to the point where all arguments get mapped to the same state. Once this happens, we'll get the same result no matter how large we take \(n\). If we can detect this then we've found the limit of \(f_1 \circ f_2 \circ \ldots \circ f_n\) as \(n\) goes to infinity. But because we know composing forwards and composing backwards lead to draws from the same distribution, the limiting backward composition must actually be a draw from the same distribution as the limiting forward composition. That flip can't change what probability distribution we're drawing from - just the dependence on the seed. So the value the constant function takes is actually a draw from the limiting stationary distribution.


We can code this up:


> all_equal :: (Eq a) => [a] -> Bool
> all_equal [] = True
> all_equal [_] = True
> all_equal (a : as) = all (== a) as


> test_constant :: (Bounded a, Enum a, Eq a) => (a -> a) -> Bool
> test_constant f =
>     all_equal $ map f $ enumFromTo minBound maxBound


This technique is called coupling from the past. It's "coupling" because we've arranged that different starting points coalesce. And it's "from the past" because we're essentially answering the question of what the outcome of a simulation would be if we started infinitely far in the past.


> couple_from_past :: (RandomGen gen, MonadState gen m, Enum a, Bounded a, Eq a) =>
>                   m (a -> a) -> (a -> a) -> m (a -> a)
> couple_from_past step f = do
>     if test_constant f
>         then return f
>         else do
>             f' <- step
>             couple_from_past step (f . f')


We can now sample from the limiting distribution a million times, say:


*Main> let samples = map ($ A) $ evalState (replicateA 1000000 (couple_from_past step' id)) gen


We can now count how often A appears:


*Main> fromIntegral (length $ filter (== A) samples)/1000000
0.285748


That's a pretty good approximation to \(2/7 \approx 0.2857\), the exact answer that can be found by finding the eigenvector of the transition matrix corresponding to an eigenvalue of 1.


> gen = mkStdGen 669



Notes

The technique of coupling from the past first appeared in a paper by Propp and Wilson. The paper Iterated Random Functions by Persi Diaconis gave me a lot of insight into it. Note that the code above is absolutely not how you'd implement this for real. I wrote the code that way so that I could switch algorithm with the simple removal of a flip. In fact, with some clever tricks you can make this method work with state spaces so large that you couldn't possibly hope to enumerate all starting states to detect if convergence has occurred. Or even with uncountably large state spaces. But I'll let you read the Propp-Wilson paper to find out how.

by sigfpe (noreply@blogger.com) at January 19, 2025 05:15 AM

Abhinav Sarkar

Interpreting Brainfuck in Haskell

Writing an interpreter for Brainfuck is almost a rite of passage for any programming language implementer, and it’s my turn now. In this post, we’ll write not one but four Brainfuck interpreters in Haskell. Let’s go!

This post was originally published on abhinavsarkar.net.

Introduction

Brainfuck (henceforth BF) is the most famous of esoteric programming languages. Its fame lies in the fact that it is extremely minimalistic, with only eight instructions, and very easy to implement. Yet, it is Turing-complete and as capable as any other programming language1. Writing an interpreter for BF is a fun exercise, and so there are hundreds, maybe even thousands of them. Since BF is very verbose, optimizing BF interpreters is almost a sport, with people posting benchmarks of their creations. I can’t say that what I have in this post is novel, but it was definitely a fun exercise for me.

BF has eight instructions of one character each. A BF program is a sequence of these instructions. It may have other characters as well, which are treated as comments and are ignored while executing. An instruction pointer (IP) points at the next instruction to be executed, starting with the first instruction. The instructions are executed sequentially, except for the jump instructions that may cause the IP to jump to remote instructions. The program terminates when the IP moves past the last instruction.

BF programs work by modifying data in a memory that is an array of at least 30000 byte cells initialized to zero. A data pointer (DP) points to the current byte of the memory to be modified, starting with the first byte of the memory. BF programs can also read from standard input and write to standard output, one byte at a time using the ASCII character encoding.

The eight BF instructions each consist of a single character:

>   Increment the DP by one to point to the next cell to the right.
<   Decrement the DP by one to point to the next cell to the left.
+   Increment the byte at the DP by one.
-   Decrement the byte at the DP by one.
.   Output the byte at the DP.
,   Accept one byte of input, and store its value in the byte at the DP.
[   If the byte at the DP is zero, then instead of moving the IP forward to the next command, jump it forward to the command after the matching ] command.
]   If the byte at the DP is nonzero, then instead of moving the IP forward to the next command, jump it back to the command after the matching [ command.

Each [ matches exactly one ] and vice versa, and the [ comes first. Together, they add conditions and loops to BF.

Some details are left to implementations. In our case, we assume that the memory cells are signed bytes that underflow and overflow without errors. Also, accessing the memory beyond array boundaries wraps to the opposite side without errors.

For a taste, here is a small BF program that prints Hello, World! when run:

+++++++++++[>++++++>+++++++++>++++++++>++++>+++>+<<<<<<-]>+++
+++.>++.+++++++..+++.>>.>-.<<-.<.+++.------.--------.>>>+.>-.

As you can imagine, interpreting BF is easy, at least when doing it naively. So instead of writing one interpreter, we are going to write four, with increasing performance and complexity.

Setup

First, some imports:

{-# LANGUAGE GHC2021 #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE TypeFamilies #-}

module Main where

import Control.Arrow ((>>>))
import Control.Monad (void)
import Data.Bits (shiftR, (.&.))
import Data.ByteArray qualified as BA
import Data.Char (chr, ord)
import Data.Functor (($>))
import Data.Int (Int8)
import Data.Kind (Type)
import Data.Vector qualified as V
import Data.Vector.Storable.Mutable qualified as MV
import Data.Word (Word16, Word8)
import Foreign.Ptr (Ptr, castPtr, minusPtr, plusPtr)
import Foreign.Storable qualified as S
import System.Environment (getArgs, getProgName)
import System.Exit (exitFailure)
import System.IO qualified as IO
import Text.ParserCombinators.ReadP qualified as P

We use the GHC2021 extension here that enables a lot of useful GHC extensions by default. Our non-base imports come from the memory and vector libraries.

We abstract the interpreter interface as a typeclass:

class Interpreter a where
  data Program a :: Type
  parse :: String -> Program a
  interpret :: Memory -> Program a -> IO ()

An Interpreter is specified by a data type Program and two functions: parse parses a string to a Program, and interpret interprets the parsed Program.

For modelling the mutable memory, we use a mutable unboxed IOVector of signed bytes (Int8) from the vector package. Since our interpreter runs in IO, this works well for us. The DP is hence modelled as an index into this vector, which we name the MemIdx type.

newtype Memory = Memory {unMemory :: MV.IOVector Int8}
type MemIdx = Int

newMemory :: Int -> IO Memory
newMemory = fmap Memory . MV.new

memorySize :: Memory -> Int
memorySize = MV.length . unMemory

readMemory :: Memory -> MemIdx -> IO Int8
readMemory = MV.unsafeRead . unMemory

writeMemory :: Memory -> MemIdx -> Int8 -> IO ()
writeMemory = MV.unsafeWrite . unMemory

modifyMemory :: Memory -> (Int8 -> Int8) -> MemIdx -> IO ()
modifyMemory = MV.unsafeModify . unMemory

nextMemoryIndex :: Memory -> MemIdx -> MemIdx
nextMemoryIndex memory memIdx = (memIdx + 1) `rem` memorySize memory

prevMemoryIndex :: Memory -> MemIdx -> MemIdx
prevMemoryIndex memory memIdx = (memIdx - 1) `mod` memorySize memory

We wrap the IOVector Int8 with a Memory newtype. newMemory creates a new memory array of bytes initialized to zero. memorySize returns the size of the memory. readMemory, writeMemory and modifyMemory are for reading from, writing to and modifying the memory respectively. nextMemoryIndex and prevMemoryIndex increment and decrement the array index respectively, taking care of wrapping at boundaries.
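
As an aside (not from the original post), the cell values themselves need no special wrapping logic: Int8 arithmetic in Haskell already wraps around on overflow and underflow, matching the behaviour we chose earlier:

-- Int8 wraps around, which is exactly the cell semantics we want.
cellWrapCheck :: Bool
cellWrapCheck = (127 + 1 :: Int8) == -128 && ((-128) - 1 :: Int8) == 127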

Now we write the main function using the Interpreter typeclass functions:

main :: IO ()
main = do
  IO.hSetBuffering IO.stdin IO.NoBuffering
  IO.hSetBuffering IO.stdout IO.LineBuffering
  progName <- getProgName
  let usage = "Usage: " <> progName <> " -(s|a|b|o) <bf_file>"

  getArgs >>= \case
    [interpreterType, fileName] -> do
      code <- filter (`elem` "+-.,><[]") <$> readFile fileName
      memory <- newMemory 30000
      parseAndInterpret memory code usage interpreterType
    _ -> exitWithMsg usage
  where
    parseAndInterpret memory code usage = \case
      "-s" -> interpret @StringInterpreter memory $ parse code
      "-a" -> interpret @ASTInterpreter memory $ parse code
      "-b" -> interpret @BytecodeInterpreter memory $ parse code
      "-o" -> interpret @OptimizingBytecodeInterpreter memory $ parse code
      t -> exitWithMsg $ "Invalid interpreter type: " <> t <> "\n" <> usage

    exitWithMsg msg = IO.hPutStrLn IO.stderr msg >> exitFailure

The main function calls the parse and interpret functions for the right interpreter with a new memory and the input string read from the file specified in the command line argument. We make sure to filter out non-BF characters when reading the input file.

With the setup done, let’s move on to our first interpreter.

String Interpreter

A BF program can be interpreted directly from its string representation, going over the characters and executing the right logic for them. But strings in Haskell are notoriously slow because they are implemented as singly linked lists of characters. Indexing into strings has \(O(n)\) time complexity, so it is not a good idea to use them directly. Instead, we use a char Zipper2.

data StringInterpreter

instance Interpreter StringInterpreter where
  data Program StringInterpreter = ProgramCZ CharZipper
  parse = ProgramCZ . czFromString
  interpret memory (ProgramCZ code) = interpretCharZipper memory code

Zippers are a special view of data structures, which allow one to navigate and easily update them. A zipper has a focus or cursor which is the current element of the data structure we are “at”. Alongside, it also captures the rest of the data structure in a way that makes it easy to move around it. We can update the data structure by updating the element at the focus3.

data CharZipper = CharZipper
  {czLeft :: String, czFocus :: Maybe Char, czRight :: String}

czFromString :: String -> CharZipper
czFromString = \case
  [] -> CharZipper [] Nothing []
  (x : xs) -> CharZipper [] (Just x) xs

czMoveLeft :: CharZipper -> CharZipper
czMoveLeft = \case
  CharZipper [] (Just focus) right -> CharZipper [] Nothing (focus : right)
  CharZipper (x : xs) (Just focus) right -> CharZipper xs (Just x) (focus : right)
  z -> z

czMoveRight :: CharZipper -> CharZipper
czMoveRight = \case
  CharZipper left (Just focus) [] -> CharZipper (focus : left) Nothing []
  CharZipper left (Just focus) (x : xs) -> CharZipper (focus : left) (Just x) xs
  z -> z

This zipper is a little different from the usual implementations because we need to know when the focus of the zipper has moved outside the program boundaries. Hence, we model the focus as Maybe Char. czFromString creates a char zipper from a string. czMoveLeft and czMoveRight move the focus left and right respectively, taking care of setting the focus to Nothing if we move outside the program string.
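
As a quick check (not in the original post), moving right past the last character indeed clears the focus, which is how the interpreter will detect that the program has ended:

-- The focus becomes Nothing once we walk off the end of the program.
czEndCheck :: Bool
czEndCheck = czFocus (czMoveRight (czMoveRight (czFromString "+-"))) == Nothing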

Parsing the program is thus the same as creating the char zipper from the program string. For interpreting the program, we write this function:

interpretCharZipper :: Memory -> CharZipper -> IO ()
interpretCharZipper memory = go 0
  where
    go !memIdx !program = case czFocus program of
      Nothing -> return ()
      Just c -> case c of
        '+' -> modifyMemory memory (+ 1) memIdx >> goNext
        '-' -> modifyMemory memory (subtract 1) memIdx >> goNext
        '>' -> go (nextMemoryIndex memory memIdx) program'
        '<' -> go (prevMemoryIndex memory memIdx) program'
        ',' -> do
          getChar >>= writeMemory memory memIdx . fromIntegral . ord
          goNext
        '.' -> do
          readMemory memory memIdx >>= putChar . chr . fromIntegral
          goNext
        '[' -> readMemory memory memIdx >>= \case
          0 -> go memIdx $ skipRight 1 program
          _ -> goNext
        ']' -> readMemory memory memIdx >>= \case
          0 -> goNext
          _ -> go memIdx $ skipLeft 1 program
        _ -> goNext
      where
        program' = czMoveRight program
        goNext = go memIdx program'

Our main driver here is the tail-recursive go function that takes the memory index and the program as inputs. It then gets the current focus of the program zipper, and executes the BF logic accordingly.

If the current focus is Nothing, it means the program has finished running. So we end the execution. Otherwise, we switch over the character and do what the BF spec tells us to do.

For + and -, we increment or decrement respectively the value in the memory cell at the current index, and go to the next character. For > and <, we increment or decrement the memory index respectively, and go to the next character.

For ,, we read an ASCII-encoded character from the standard input and write it to the memory at the current memory index as a byte. For ., we read the byte from the memory at the current memory index and write it out to the standard output as an ASCII-encoded character. In either case, we then go to the next character.

For [, we read the byte at the current memory index, and if it is zero, we skip right over the part of the program till the matching ] is found. Otherwise, we go to the next character.

For ], we skip left over the part of the program till the matching [ is found, if the current memory byte is non-zero. Otherwise, we go to the next character.

The next two functions implement the skipping logic:

skipRight :: Int -> CharZipper -> CharZipper
skipRight !depth !program
  | depth == 0 = program'
  | otherwise = case czFocus program' of
      Nothing -> error "No matching [ while skipping the loop forward"
      Just '[' -> skipRight (depth + 1) program'
      Just ']' -> skipRight (depth - 1) program'
      _ -> skipRight depth program'
  where
    program' = czMoveRight program

skipLeft :: Int -> CharZipper -> CharZipper
skipLeft !depth !program
  | depth == 0 = czMoveRight program
  | otherwise = case czFocus program' of
      Nothing -> error "No matching ] while skipping the loop backward"
      Just ']' -> skipLeft (depth + 1) program'
      Just '[' -> skipLeft (depth - 1) program'
      _ -> skipLeft depth program'
  where
    program' = czMoveLeft program

The tail-recursive functions skipRight and skipLeft skip over parts of the program by moving the focus right and left respectively, till the matching bracket is found. Since loops can contain nested loops, we keep track of the depth of loops we are in, and return only when the depth becomes zero. If we move off the program boundaries while skipping, we throw an error.

That’s it! We now have a fully functioning BF interpreter. To test it, we use these two BF programs: hanoi.bf and mandelbrot.bf.

hanoi.bf solves the Tower of Hanoi puzzle, animating the solution process as ASCII art:

A freeze-frame from the animation of solving the Tower of Hanoi puzzle with hanoi.bf

mandelbrot.bf prints an ASCII art showing the Mandelbrot set:

Mandelbrot set ASCII art by mandelbrot.bf

Both of these BF programs serve as good benchmarks for BF interpreters. Let’s test ours by compiling and running it4:

❯ nix-shell -p "ghc.withPackages (pkgs: with pkgs; [vector memory])" \
    --run "ghc --make bfi.hs -O2"
[1 of 2] Compiling Main             ( bfi.hs, bfi.o )
[2 of 2] Linking bfi [Objects changed]
❯ time ./bfi -s hanoi.bf > /dev/null
       29.15 real        29.01 user         0.13 sys
❯ time ./bfi -s mandelbrot.bf > /dev/null
       94.86 real        94.11 user         0.50 sys

That seems quite slow. We can do better.

AST Interpreter

Instead of executing BF programs from their string representations, we can parse them to an Abstract Syntax Tree (AST). This allows us to match brackets only once at parse time, instead of doing it repeatedly at run time. We capture loops as AST nodes, allowing us to skip them trivially.

data ASTInterpreter

instance Interpreter ASTInterpreter where
  data Program ASTInterpreter = ProgramAST Instructions
  parse = ProgramAST . parseToInstrs
  interpret memory (ProgramAST instrs) = interpretAST memory instrs

We represent the BF AST as a Haskell Algebraic Data Type (ADT):

type Instructions = V.Vector Instruction

data Instruction
  = Inc               -- +
  | Dec               -- -
  | MoveR             -- >
  | MoveL             -- <
  | GetC              -- ,
  | PutC              -- .
  | Loop Instructions -- []
  deriving (Show)

There is one constructor per BF instruction, except for loops where the Loop constructor captures both the start and end of loop instructions. We use immutable boxed vectors for lists of instructions instead of Haskell lists so that we can index into them in \(O(1)\).

We use the parser combinator library ReadP to write a recursive-descent parser for BF:

parseToInstrs :: String -> Instructions
parseToInstrs code =
  V.fromList $ case P.readP_to_S (P.many instrParser <* P.eof) code of
    [(res, "")] -> res
    out -> error $ "Unexpected output while parsing: " <> show out

instrParser :: P.ReadP Instruction
instrParser = P.choice
  [ P.char '+' $> Inc,
    P.char '-' $> Dec,
    P.char '>' $> MoveR,
    P.char '<' $> MoveL,
    P.char ',' $> GetC,
    P.char '.' $> PutC,
    Loop . V.fromList <$> P.between (P.char '[') (P.char ']') (P.many instrParser)
  ]

All cases except the loop one are straightforward. For loops, we call the parser recursively to parse the loop body. Note that the parser matches the loop brackets correctly. If the brackets don’t match, the parser fails.
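
As a quick illustration, here is roughly what the parser produces for a tiny program (an illustrative GHCi session):

> parseToInstrs "+[-]>."
[Inc,Loop [Dec],MoveR,PutC]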

Next, we interpret the BF AST:

interpretAST :: Memory -> Instructions -> IO ()
interpretAST memory = void . interpretInstrs 0 memory

interpretInstrs :: MemIdx -> Memory -> Instructions -> IO MemIdx
interpretInstrs memIdx !memory !program = go memIdx 0
  where
    go !memIdx !progIdx
      | progIdx == V.length program = return memIdx
      | otherwise = case program V.! progIdx of
          Inc -> modifyMemory memory (+ 1) memIdx >> goNext
          Dec -> modifyMemory memory (subtract 1) memIdx >> goNext
          MoveR -> go (nextMemoryIndex memory memIdx) $ progIdx + 1
          MoveL -> go (prevMemoryIndex memory memIdx) $ progIdx + 1
          GetC -> do
            getChar >>= writeMemory memory memIdx . fromIntegral . ord
            goNext
          PutC -> do
            readMemory memory memIdx >>= putChar . chr . fromIntegral
            goNext
          Loop instrs -> readMemory memory memIdx >>= \case
            0 -> goNext
            _ -> interpretInstrs memIdx memory instrs >>= flip go progIdx
      where
        goNext = go memIdx $ progIdx + 1

The AST interpreter code is quite similar to the string interpreter one. This time we use an integer as the IP to index the Instructions vector. All cases except the loop one are pretty much the same as before.

For loops, we read the byte at the current memory index, and if it is zero, we skip executing the Loop AST node and go to the next instruction. Otherwise, we recursively interpret the loop body and go to the next instruction, taking care of passing the updated memory index returned from the recursive call to the execution of the next instruction.

And we are done. Let’s see how it performs:

❯ time ./bfi -a hanoi.bf > /dev/null
       14.94 real        14.88 user         0.05 sys
❯ time ./bfi -a mandelbrot.bf > /dev/null
       36.49 real        36.32 user         0.17 sys

Great! hanoi.bf runs 2x faster, whereas mandelbrot.bf runs 2.6x faster. Can we do even better?

Bytecode Interpreter

AST interpreters are well known to be slow because of how AST nodes are represented in the computer’s memory. The AST nodes contain pointers to other nodes, which may be anywhere in the memory. So while interpreting an AST, the interpreter jumps all over the memory, causing a slowdown. One solution to this is to convert the AST into a more compact and optimized representation known as Bytecode. That’s what our next interpreter uses.

data BytecodeInterpreter

instance Interpreter BytecodeInterpreter where
  data Program BytecodeInterpreter = ProgramBC BA.Bytes
  parse =
    parseToInstrs
      >>> translate
      >>> assemble
      >>> ProgramBC
  interpret memory (ProgramBC bytecode) = interpretBytecode memory bytecode

We reuse the parser from the AST interpreter, but then we convert the resultant AST into bytecode by translating and assembling it5. We use the Bytes byte array data type from the memory package to represent bytecode.

Unlike an AST, bytecode is a flat list of instructions—called Opcodes—each of which can be encoded in a single byte, with optional parameters. Because of its flat nature and compactness, bytecode is more CPU-friendly to execute, which is where it gets its performance from. The downside is that bytecode is not human-readable, unlike an AST.

data Opcode
  = OpInc
  | OpDec
  | OpMoveR
  | OpMoveL
  | OpGetC
  | OpPutC
  | OpLoop Opcodes
  deriving (Show)

type Opcodes = [Opcode]

We use the Opcode ADT to model the BF opcodes. For now, it corresponds one-to-one with the Instruction ADT.

The translate function translates Instructions to Opcodes:

translate :: Instructions -> Opcodes
translate = V.toList >>> map translateOpcode
  where
    translateOpcode = \case
      Inc -> OpInc
      Dec -> OpDec
      MoveR -> OpMoveR
      MoveL -> OpMoveL
      GetC -> OpGetC
      PutC -> OpPutC
      Loop instrs -> OpLoop $ translate instrs

The assemble function assembles Opcodes to bytecode byte array:

assemble :: Opcodes -> BA.Bytes
assemble = BA.pack . concatMap assembleOpcode

assembleOpcode :: Opcode -> [Word8]
assembleOpcode = \case
  OpInc -> [0]
  OpDec -> [1]
  OpMoveR -> [2]
  OpMoveL -> [3]
  OpGetC -> [4]
  OpPutC -> [5]
  OpLoop body ->
    let assembledBody = concatMap assembleOpcode body
        bodyLen = length assembledBody + 3
     in if bodyLen >= 65_536 -- 2 ^ 16
          then error $ "Body of loop is too big: " <> show bodyLen
          else do
            let assembledBodyLen = assembleBodyLen bodyLen
            [6] <> assembledBodyLen <> assembledBody <> [7] <> assembledBodyLen
  where
    assembleBodyLen bodyLen =
      let lb = fromIntegral $ bodyLen .&. 0xff
          mb = fromIntegral $ (bodyLen .&. 0xff00) `shiftR` 8
       in [lb, mb] -- assumes Little-endian arch

The assembleOpcode function assembles an Opcode to a list of bytes (Word8s). For all cases except for OpLoop, we simply return a unique byte for the opcode.

For OpLoop, we first recursively assemble the loop body. We encode both the body and the body length in the assembled bytecode, so that the bytecode interpreter can use the body length to skip over the loop body when required. We use two bytes to encode the body length, so we first check that the body length plus three fits in two bytes, that is, that it is less than 65536 (\(2^{16}\)). If it does not fit, we throw an error. Otherwise, we return:

  1. a unique byte for loop start (6),
  2. followed by the body length encoded in two bytes (in the Little-endian order),
  3. then the assembled loop body,
  4. followed by a unique byte for loop end (7),
  5. finally followed by the encoded body length again.

We encode the body length at the end again so that we can use it to jump backward to the start of the loop, to continue looping. Let’s look at this example to understand the loop encoding better:

> code = "++++++++++++++++++++++++++++++++++++++++++++++++>+++++[<+.>-]"
> concatMap assembleOpcode . translate . parseToInstrs $ code
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,6,8,0,3,0,5,2,1,7,8,0]

Let’s focus on the last twelve bytes. The diagram below shows the meaning of the various bytes:

Assembled bytecode for a BF loop

The example also demonstrates the flat nature of assembled bytecode. Now, all we have to do is interpret it:

interpretBytecode :: Memory -> BA.Bytes -> IO ()
interpretBytecode memory bytecode =
  MV.unsafeWith
    (unMemory memory)
    (BA.withByteArray bytecode
      . interpretBytecodePtr (memorySize memory) (BA.length bytecode))

Instead of using integer indices in the bytecode array and memory vector, this time we use C-style direct pointers6:

type ProgPtr = Ptr Word8
type MemPtr = Ptr Int8

interpretBytecodePtr :: Int -> Int -> MemPtr -> ProgPtr -> IO ()
interpretBytecodePtr memLen programLen memStartPtr progStartPtr =
  go memStartPtr progStartPtr
  where
    progEndPtr = progStartPtr `plusProgPtr` programLen
    memEndPtr = memStartPtr `plusMemPtr` memLen

    go !memPtr !progPtr
      | progPtr == progEndPtr = return ()
      | otherwise = readProg >>= \case
          0 -> modifyMem (+ 1) >> goNext                           -- Inc
          1 -> modifyMem (subtract 1) >> goNext                    -- Dec
          2 -> jump (nextMemPtr memStartPtr memEndPtr memPtr 1) 1  -- MoveR
          3 -> jump (prevMemPtr memStartPtr memEndPtr memPtr 1) 1  -- MoveL
          4 -> getChar >>= writeMem . fromIntegral . ord >> goNext -- GetC
          5 -> readMem >>= putChar . chr . fromIntegral >> goNext  -- PutC
          6 -> readMem >>= \case                                   -- Loop start
            0 -> readProg2 >>= jump memPtr
            _ -> jump memPtr 3
          7 -> readMem >>= \case                                   -- Loop end
            0 -> jump memPtr 3
            _ -> readProg2 >>= jump memPtr . negate
          op -> error $ "Unknown opcode: " <> show op
      where
        goNext = jump memPtr 1
        jump memPtr offset = go memPtr $ progPtr `plusProgPtr` offset

        readProg = S.peek progPtr
        readProg2 = -- assumes Little-endian arch
          fromIntegral <$> S.peek (castPtr @_ @Word16 $ progPtr `plusProgPtr` 1)

        readMem = S.peek memPtr
        writeMem = S.poke memPtr
        modifyMem f = readMem >>= writeMem . f

In Haskell, the pointer type Ptr is parametrized by the type of the data it points to. We have two types of pointers here, one that points to the bytecode program, and another that points to the memory cells. So in this case, the IP and DP are actually pointers.

The go function here is again the core of the interpreter loop. We track the current IP and DP in it, and execute the logic corresponding to the opcode the IP currently points to. go ends when the IP points to the end of the program byte array.

Most of the cases in go are similar to the previous interpreters. The only difference is that we use pointers to read the current opcode and memory cell. For the loop start opcode, we read the byte pointed to by the DP, and if it is zero, we read the next two bytes from the program bytecode and use them as the offset by which to jump the IP forward, skipping over the loop body. Otherwise, we jump the IP by 3 bytes to skip over the loop start opcode and the encoded loop body length bytes. For the loop end opcode, we follow similar steps, except we jump backward to the start of the loop.

The helper functions for pointer arithmetic are as follows:

plusProgPtr :: ProgPtr -> Int -> ProgPtr
plusProgPtr = plusPtr

plusMemPtr :: MemPtr -> Int -> MemPtr
plusMemPtr = plusPtr

nextMemPtr :: MemPtr -> MemPtr -> MemPtr -> Int -> MemPtr
nextMemPtr memStartPtr memEndPtr memPtr inc =
  let memPtr' = memPtr `plusMemPtr` inc
   in if memEndPtr > memPtr'
        then memPtr'
        else memStartPtr `plusPtr` (memPtr' `minusPtr` memEndPtr)

prevMemPtr :: MemPtr -> MemPtr -> MemPtr -> Int -> MemPtr
prevMemPtr memStartPtr memEndPtr memPtr inc =
  let memPtr' = memPtr `plusMemPtr` (-1 * inc)
   in if memPtr' >= memStartPtr
        then memPtr'
        else memEndPtr `plusPtr` (memPtr' `minusPtr` memStartPtr)

nextMemPtr and prevMemPtr implement wrapping of pointers, just as nextMemoryIndex and prevMemoryIndex do for memory indices. Let’s see what the results of our hard work are:

❯ time ./bfi -b hanoi.bf > /dev/null
       11.10 real        11.04 user         0.04 sys
❯ time ./bfi -b mandelbrot.bf > /dev/null
       15.72 real        15.68 user         0.04 sys

1.3x and 2.3x speedups for hanoi.bf and mandelbrot.bf respectively over the AST interpreter. Not bad. But surely we can do even better?

Optimizing Bytecode Interpreter

We can optimize our bytecode interpreter by emitting specialized opcodes for particular patterns of opcodes that occur frequently. Think of it as replacing every occurrence of a long phrase in a text with a single word that means the same, leading to a shorter text and faster reading time. Since BF is so verbose, there are many opportunities for optimizing BF bytecode7. We are going to implement only one simple optimization, just to get a taste of how to do it.

data OptimizingBytecodeInterpreter

instance Interpreter OptimizingBytecodeInterpreter where
  data Program OptimizingBytecodeInterpreter = ProgramOBC BA.Bytes
  parse =
    parseToInstrs
      >>> translate
      >>> optimize
      >>> assemble
      >>> ProgramOBC
  interpret memory (ProgramOBC bytecode) = interpretBytecode memory bytecode

The optimizing bytecode interpreter is pretty much the same as the bytecode interpreter, with the optimize function called between the translation and assembly phases.

The opcode patterns we are optimizing for are [-] and [+]. When executed, both of these decrement or increment the current memory cell till it becomes zero. In effect, these patterns clear the current cell. We start the process by adding a new Opcode for clearing a cell:

data Opcode
  = OpInc
  | OpDec
  | OpMoveR
  | OpMoveL
  | OpGetC
  | OpPutC
  | OpLoop Opcodes
  | OpClear
  deriving (Show)

type Opcodes = [Opcode]

The optimize function recursively goes over the Opcodes, and emits optimized ones by replacing the patterns that clear the current cell with OpClear:

optimize :: Opcodes -> Opcodes
optimize = map $ \case
  OpLoop [OpDec] -> OpClear
  OpLoop [OpInc] -> OpClear
  OpLoop body -> OpLoop $ optimize body
  op -> op
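
For example, the clear-loop patterns are rewritten wherever they occur, even inside nested loops (an illustrative GHCi session):

> optimize [OpInc, OpLoop [OpDec], OpLoop [OpMoveR, OpLoop [OpInc]]]
[OpInc,OpClear,OpLoop [OpMoveR,OpClear]]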

Then we modify the assembleOpcode function to emit a unique byte for OpClear:

assembleOpcode :: Opcode -> [Word8]
assembleOpcode = \case
  OpInc -> [0]
  OpDec -> [1]
  OpMoveR -> [2]
  OpMoveL -> [3]
  OpGetC -> [4]
  OpPutC -> [5]
  OpLoop body ->
    let assembledBody = concatMap assembleOpcode body
        bodyLen = length assembledBody + 3
     in if bodyLen >= 65_536 -- 2 ^ 16
          then error $ "Body of loop is too big: " <> show bodyLen
          else do
            let assembledBodyLen = assembleBodyLen bodyLen
            [6] <> assembledBodyLen <> assembledBody <> [7] <> assembledBodyLen
  OpClear -> [8]

Finally, we modify the bytecode interpreter to execute the OpClear opcode.

go !memPtr !progPtr
  | progPtr == progEndPtr = return ()
  | otherwise = readProg >>= \case
      0 -> modifyMem (+ 1) >> goNext                           -- Inc
      1 -> modifyMem (subtract 1) >> goNext                    -- Dec
      2 -> jump (nextMemPtr memStartPtr memEndPtr memPtr 1) 1  -- MoveR
      3 -> jump (prevMemPtr memStartPtr memEndPtr memPtr 1) 1  -- MoveL
      4 -> getChar >>= writeMem . fromIntegral . ord >> goNext -- GetC
      5 -> readMem >>= putChar . chr . fromIntegral >> goNext  -- PutC
      6 -> readMem >>= \case                                   -- Loop start
        0 -> readProg2 >>= jump memPtr
        _ -> jump memPtr 3
      7 -> readMem >>= \case                                   -- Loop end
        0 -> jump memPtr 3
        _ -> readProg2 >>= jump memPtr . negate
      8 -> writeMem 0 >> goNext                                -- Clear
      op -> error $ "Unknown opcode: " <> show op

We can see how the patterns [-] and [+], which may execute tens or even hundreds of operations, are now replaced by a single operation in the interpreter. This is what gives us the speedup in this case. Let’s run it:

❯ time ./bfi -o hanoi.bf > /dev/null
        4.07 real         4.04 user         0.01 sys
❯ time ./bfi -o mandelbrot.bf > /dev/null
       15.58 real        15.53 user         0.04 sys

hanoi.bf runs 2.7x faster, whereas mandelbrot.bf is barely 1% faster as compared to the non-optimizing bytecode interpreter. This demonstrates how different optimizations apply to different programs, and hence the need to implement a wide variety of them to be able to optimize all programs well.

Comparison

It’s time for a final comparison of the run times of the four interpreters:

Interpreter            Hanoi    Mandelbrot
String                 29.15s   94.86s
AST                    14.94s   36.49s
Bytecode               11.10s   15.72s
Optimizing Bytecode    4.07s    15.58s

The final interpreter is 7x faster than the baseline one for hanoi.bf, and 6x faster for mandelbrot.bf. Here’s the same data as a chart:

Run time of the four interpreters

That’s it for this post. I hope you enjoyed it and took something away from it. In a future post, we’ll explore more optimizations for our BF interpreter. The full code for this post is available here.


  1. BF is Turing-complete. That means it can be used to implement any computable program. However, it is a Turing tarpit, which means it is not feasible to write any useful programs in it because of its lack of abstractions.↩︎

  2. A string interpreter also serves as a useful baseline for measuring the performance of BF interpreters. That’s why I decided to use strings instead of Data.Text or Data.Sequence, which are more performant.↩︎

  3. I am a big fan of zippers, as evidenced by this growing list of posts that I use them in.↩︎

  4. We use Nix for getting the dependency libraries.↩︎

  5. If you are unfamiliar, >>> is the left-to-right function composition function:

    f >>> g = g . f
    ↩︎
  6. While the only way to access byte arrays is pointers, we could have continued accessing the memory vector using indices. I benchmarked both methods, and found that using pointers for memory access sped up the execution of hanoi.bf by 1.1x and mandelbrot.bf by 1.6x as compared to index-based access. It’s also nice to learn how to use pointers in Haskell. This is why we chose to use Storable vectors for the memory.↩︎

  7. See BFC, which touts itself as “an industrial-grade Brainfuck compiler”, with a huge list of optimizations.↩︎

If you liked this post, please leave a comment.

by Abhinav Sarkar (abhinav@abhinavsarkar.net) at January 19, 2025 12:00 AM

January 18, 2025

Sandy Maguire

A New Perspective on Lenses

I’ve always considered lenses to be a bit uncomfortable. While they’re occasionally useful for doing deeply nested record updates, they often seem to be more trouble than they’re worth. There’s a temptation in the novice programmer to ^.. and folded their way to a solution that is much more naturally written merely as toList. And don’t get me started about the stateful operators like <<+= and their friends. Many programs which can be more naturally written functionally accidentally end up being imperative due to somebody finding a weird lens combinator and trying to use it in anger. Much like a serious drug collection, the tendency is to push it as far as you can.

Thus, my response has usually been one of pushback and moderation. I don’t avoid lenses at all costs, but I do try to limit myself to the prime types (Lens', Prism', Iso'), and to the boring combinators (view, set, over). I feel like these give me most of the benefits of lenses, without sending me tumbling down the rabbit hole.

All of this is to say that my grokkage of lenses has always been one of generalized injections and projections, for a rather shallow definition of “generalized”. That is, I’ve grown accustomed to thinking about lenses as getter/setter pairs for data structures—eg, I’ve got a big product type and I want to pull a smaller piece out of it, or modify a smaller piece in a larger structure. I think about prisms as the dual structure over coproducts—“generalized” injecting and pattern matching.

And this is all true; but I’ve been missing the forest for the trees on this one. That’s not to say that I want to write lensier code, but that I should be taking the “generalized” part much more seriously.

The big theme of my intellectual development over the last few years has been thinking about abstractions as shared vocabularies. Monoids are not inherently interesting; they’re interesting because of how they let you quotient seemingly-unrelated problems by their monoidal structure. Applicatives are cool because once you’ve grokked them, you begin to see them everywhere. Anywhere you’ve got conceptually-parallel, data-independent computations, you’ve got an applicative lurking somewhere under the surface (even if it happens to be merely the Identity applicative.)

I’ve had a similar insight about lenses, and that’s what I wanted to write about today.

The Context

At work, I’ve been thinking a lot about compilers and memory layout lately. I won’t get into the specifics of why, but we can come up with an inspired example. Imagine we’d like to use Haskell to write a little eDSL that we will use to generate x86 machine code.

The trick of course, is that we’re writing Haskell in order to not write machine code. So the goal is to design high-level combinators in Haskell that express our intent, while simultaneously generating machine code that faithfully implements the intention.

One particularly desirable feature about eDSLs is that they allow us to reuse Haskell’s type system. Thus, imagine we have some type:

type Code :: Type -> Type
data Code a = Code
  { getMachineCode :: [X86OpCode]
  }

Notice that the a parameter here is entirely phantom; it serves only to annotate the type of the value produced by executing getMachineCode. For today’s purpose, we’ll ignore all the details about calling conventions and register layout and what not; let’s just assume a Code a corresponds to a computation that leaves a value (or pointer) to something of type a in a well-known place, whether that be the top of the stack, or eax or something. It doesn’t matter!

Since the type parameter to Code is phantom, we need to think about what role it should have. Keeping it at phantom would be disastrous, since this type isn’t used by Haskell, but it is certainly used to ensure our program is correct. Similarly, representational seems wrong, since coerce is meaningful only when thinking about Haskell; which this thing decidedly is not. Thus, our only other option is:

type role Code nominal

Frustratingly, due to very similar reasoning, Code cannot be a functor, because there’s no way1 to lift an arbitrary Haskell function a -> b into a corresponding function Code a -> Code b. If there were, we’d be in the clear! But alas, we are not.

The Problem

All of the above is to say that we are reusing Haskell’s type system, but not its values. An expression of type Code Bool has absolutely no relation to the values True or False—except that we could write, by hand, a function litBool :: Bool -> Code Bool which happened to do the right thing.
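
Such a function might look like the following sketch, where PushImm is a made-up X86OpCode constructor, invented only to make the example concrete:

-- A hand-written lifting function. PushImm is a hypothetical X86OpCode
-- constructor used purely for illustration.
litBool :: Bool -> Code Bool
litBool b = Code [PushImm (if b then 1 else 0)]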

It is tempting, however, to make new Haskell types in order to help constrain the assembly code we end up writing. For example, maybe we want to write a DSP for efficiently decoding audio. We can use Haskell’s types to organize our thoughts and prevent ourselves from making any stupid mistakes:

data Decoder = Decoder
  { format :: Format
  , seekPos :: Int
  , state :: ParserState
  }

data Chunk = ...

createDecoder :: Code MediaHandle -> Code Decoder
decodeChunk :: Code Decoder -> (Code Decoder, Code Chunk)

We now have a nice interface in our eDSL to guide end-users along the blessed path of signal decoding. We have documented what we are trying to do, and how it can be used once it’s implemented. But due to our phantom, yet nominal, parameter to Code, this is all just make believe. There is absolutely no correlation between what we’ve written down and how we can use it. The problem arises when we go to implement decodeChunk. We’ll need to know what state we’re in, which means we’ll need some function:

decoderState :: Code Decoder -> Code ParserState
decoderState = ???

In a world where Code is a functor, this is implemented trivially as fmap state. But Code is not a functor! Alas! Woe! What ever can we do?

The Solution

Lenses, my guy!

Recall that Code is phantom in its argument, even if we use roles to restrict that fact. This means we can implement a safe-ish version of unsafeCoerce, that only fiddles with the parameter of our phantom type:

unsafeCoerceCode :: Code a -> Code b
unsafeCoerceCode (Code ops) = Code ops

Judicious use of unsafeCoerceCode allows us to switch between a value’s type and its in-memory representation. For example, given a type:

type Bytes :: Nat -> Type
data Bytes n

we can reinterpret a Decoder as a sequence of bytes:

decoderRep :: Iso' (Code Decoder) (Code (Bytes (32 + 4 + 1)))
decoderRep = iso unsafeCoerceCode unsafeCoerceCode

stateRep :: Iso' (Code ParserState) (Code (Bytes 1))
stateRep = iso unsafeCoerceCode unsafeCoerceCode

which says we are considering our Decoder to be laid out in memory like:

struct Decoder {
  char format[32];
  int32_t seekPos;
  char state;
};

Of course, this is a completely unsafe transformation, as far as the Haskell type system is aware. We’re in the wild west out here, well past any type theoretical life buoys. We’d better be right that this coercion is sound. But assuming this is in fact the in-memory representation of a Decoder, we are well justified in this transformation.

Notice the phrasing of our Iso' above. It is not an iso between Decoder and Bytes 37, but between Codes of such things. This witnesses the fact that it is not true in the Haskell embedding, merely in our Code domain. Of course, isos are like the least exciting optics, so let’s see what other neat things we can do.

Imagine we have some primitives:

slice
    :: n <= m
    => Int     -- ^ offset
    -> Proxy n -- ^ size
    -> Code (Bytes m)
    -> Code (Bytes n)

overwrite
    :: n <= m
    => Int             -- ^ offset
    -> Code (Bytes n)  -- ^ new value
    -> Code (Bytes m)  -- ^ original
    -> Code (Bytes m)

which we can envision as Haskell bindings to the pseudo-C functions:

const char[n] slice(size_t offset, char[m] bytes) {
  return &bytes[offset];
}

char[m] overwrite(size_t offset, char[n] value, char[m] bytes) {
  char[m] new_bytes = malloc(m);
  memcpy(new_bytes, bytes, m);
  memcpy(&new_bytes[offset], value, n);
  return new_bytes;
}

We can use slice and overwrite to give a Lens' into Bytes:

slicing :: n <= m => Int -> Lens' (Code (Bytes m)) (Code (Bytes n))
slicing offset =
  lens
    (slice offset Proxy)
    (\orig new -> overwrite offset new orig)

and finally, we can give an implementation of the desired decoderState above:

decoderState :: Lens' (Code Decoder) (Code ParserState)
decoderState = decoderRep . slicing 36 . from stateRep

Such a lens acts exactly as a record selector would, in that it allows us to view, set, and over a ParserState inside of a Decoder. But recall that Code is just a list of instructions we eventually want the machine to run. We’re using the shared vocabulary of lenses to emit machine code! What looks like using a data structure to us when viewed through the Haskell perspective, is instead invoking an assembler.
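
Concretely, using it reads just like ordinary lens code. Here is a small sketch; the helper names are made up for this example:

-- Reading and writing the parser state through the lens; under the hood,
-- each of these builds the machine code for the corresponding memory access.
currentState :: Code Decoder -> Code ParserState
currentState = view decoderState

replaceState :: Code ParserState -> Code Decoder -> Code Decoder
replaceState = set decoderState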

Reflections

Once the idea sinks in, you’ll start seeing all sorts of cool things you can do with optics to generate code. Prisms generalize running initializer code. A Traversal over Code can be implemented as a loop. And since all the sizes are known statically, if you’re feeling plucky, you can decide to unroll the loop right there in the lens.

Outside of the context of Code, the realization that optics are this general is still doing my head in. Something I love about working in Haskell is that I’m still regularly having my mind blown, even after a decade.


  1. Short of compiling to categories via something like categorifier.↩︎

January 18, 2025 09:18 AM

January 16, 2025

Michael Snoyman

The Paradox of Necessary Force

Humans want the resources of other humans. I want the food that the supermarket owns so that I can eat it. Before buying it, I wanted the house that I now own. And before that, someone wanted to build a house on that plot of land, which was owned by someone else first. Most of the activities we engage in during our lifetime revolve around extracting something from someone else.

There are two basic modalities to getting the resources of someone else. The first, the simplest, and the one that has dominated the majority of human history, is force. Conquer people, kill them, beat them up and take their stuff, force them into slavery and make them do your work. It’s a somewhat effective strategy. This can also be more subtle, by using coercive and fraudulent methods to trick people into giving you their resources. Let’s call this modality the looter approach.

The second is trade. In the world of trade, I can only extract resources from someone else when they willingly give them to me in exchange for something else of value. This can be barter of value for value, payment in money, built-up goodwill, favors, charity (exchanging resources for the benefit you receive for helping someone else), and more. In order to participate in this modality, you need to create your own valuable resources that other people want to trade for. Let’s call this the producer approach.

The producer approach is better for society in every conceivable way. The looter approach causes unnecessary destruction, pushes production into ventures that don’t directly help anyone (like making more weapons), and rewards people for their ability to inflict harm. By contrast, the producer approach rewards the ability to meet the needs of others and causes resources to end up in the hands of those who value them the most.

Looter philosophy is rooted in the concept of the zero sum game, the mistaken belief that I can only have more if someone else has less. By contrast, the producer philosophy correctly identifies the fact that we can all end up better by producing more goods in more efficient ways. We live in our modern world of relatively widespread luxury because producers have made technological leaps—for their own self-serving motives—that have improved everyone’s ability to produce more goods going forward. Think of the steam engine, electricity, computing power, and more.

A producer-only world

It would be wonderful to live in a world in which there are no looters. We all produce, we all trade, everyone receives more value than they give, and there is no wasted energy or destruction from the use of force.

Think about how wonderful it could be! We wouldn’t need militaries, allowing a massive amount of productive capacity to be channeled into things that make everyone’s lives better. We wouldn’t need police. Not only would that free up more resources, but would remove the threat of improper use of force by the state against citizens. The list goes on and on.

I believe many economists—especially Austrian economists—are cheering for that world. I agree with them on the cheering. It’s why things like Donald Trump’s plans for tariffs are so horrific in their eyes. Tariffs introduce an artificial barrier between nations, impeding trade, preventing the peaceful transfer of resources, and leading to a greater likelihood of armed conflict.

There’s only one problem with this vision, and it’s also based in economics: game theory.

Game theory and looters

Imagine I’m a farmer. I’m a great farmer, I have a large plot of land, I run my operations efficiently, and I produce huge amounts of food. I sell that food into the marketplace, and with that money I’m able to afford great resources from other people, who willingly trade them to me because they value the money more than their own resources. For example, how many T-shirts does the clothing manufacturer need? Instead of his 1,000th T-shirt, he’d rather sell it for $5 and buy some food.

While I’m really great as a farmer, I’m not very good as a fighter. I have no weapons training, I keep no weapons on my property, and I dislike violence.

And finally, there’s a strong, skilled, unethical person down the street. He could get a job with me on the farm. For back-breaking work 8 hours a day, I’ll pay him 5% of my harvest. Or, by contrast, he could act like the mafia, demand a “protection fee” of 20%, and either beat me up, beat up my family, or cause harm to my property, if I don’t pay it.

In other words, he could be a producer and get 5% in exchange for hard work, or be a looter and get 20% in exchange for easy (and, likely for him, fun) work. As described, the game theoretic choice is clear.

So how do we stop a producer world from devolving back into a looter world?

Deterrence

There’s only one mechanism I’m aware of for this, and it’s deterrence. As the farmer, I made a mistake. I should get weapons training. I should keep weapons on my farm. I should be ready to defend myself and my property. Because if I don’t, game theory ultimately predicts that all trade will collapse, and society as we know it will crumble.

I don’t necessarily have to have the power of deterrence myself. I could hire a private security company, once again allowing the producer world to work out well. I trade something of lesser value (some money) for something I value more (the protection afforded by private security). If I’m lucky, that security company will never need to do anything, because the mere threat of their presence is sufficient.

And in modern society, we generally hope to rely on the government police force to provide this protection.

There are easy ways to defeat the ability of deterrence to protect our way of life. The simplest is to defang it. Decriminalize violent and destructive acts, for example. Remove the consequences for bad, looter behavior, and you will incentivize looting. This is far from a theoretical discussion. We’ve seen the clear outcome in California, which has decriminalized theft under $950, resulting—in a completely predictable way—in more theft, stores closing, and an overall erosion of producer philosophy.

And in California, this is even worse. Those who try to be their own deterrence, by arming themselves and protecting their rights, are often the targets of government force instead of the looters.

I’m guessing this phrasing has now split my reading audience into three groups. Group A agrees wholly with what I’m saying. Group B believes what I’ve just written is pure evil and garbage. Group C initially disagreed with my statements, but has an open mind and is willing to consider a different paradigm. The next section is targeted at groups A and C. Group B: good luck with the broken world you’re advocating.

Global scale

This concept of deterrence applies at a global scale too. I would love to live in a world where all nations exchange value for value and never use force against others. In fact, I believe the ultimate vision for this kind of a world ends with anarcho-capitalism (though I don’t know enough about the topic to be certain). There ends up being no need for any force against anyone else. It’s a beautiful vision for a unified world, where there are no borders, there is no destruction, there is only unity through trade. I love it.

But game theory destroys this too. If the entire world disarmed, it would take just one person who thinks he can do better through looter tactics to destroy the system. The only way to defeat that is to have a realistic threat of force to disincentivize someone from acting like a looter.

And this is the paradox. In order to live in our wonderful world of production, prosperity, health, and happiness, we always need to have our finger near enough to the trigger to respond to looters with force. I know of no other approach that allows production to happen. (And I am very interested in other theoretical solutions to this problem, if anyone wants to share reading material.)

Peace through strength

This line of thinking leads to the concept of peace through strength. When those tempted to use violence see the overwhelming strength of their potential victims, they will be disincentivized to engage in violent behavior. It’s the story of the guy who wants to rob my farm. Or the roaming army in the ancient world that bypassed the well fortified walled city and attacked its unprotected neighbor.

There are critics of this philosophy. As put by Andrew Bacevich, "'Peace through strength' easily enough becomes 'peace through war.'" I don’t disagree at all with that analysis, and it’s something we must remain vigilant against. But disarming is not the answer, as it will, of course, necessarily lead to the victory of those willing to use violence on others.

In other words, my thesis here is that the threat of violence must be present to keep society civilized. But the cost of using that violence must be high enough that neither side is incentivized to initiate it.

Israel

I’d been thinking of writing a blog post on this topic for a few months now, but finally decided to today. Israel just agreed to a hostage deal with Hamas. In exchange for the release of 33 hostages taken in the October 7 massacre, Israel will hand over 1,000 terrorists in Israeli prisons.

I have all the sympathy in the world for the hostages and their families. I also have great sympathy for the Palestinian civilians who have been harmed, killed, displaced, and worse by this war. And I have empathy (as one of the victims) for all of the Israeli citizens who have lived under threat of rocket attacks, had our lives disrupted, and for those who have been killed by this war. War is hell, full stop.

My message here is to those who have been pushing the lie of “peace through negotiations.” Or peace through capitulation. Or anything else. These tactics are the reason the war has continued. As long as the incentive structure makes initiating a war a positive, wars will continue to be initiated. Hamas has made its stance on the matter clear: it has sworn for the eradication of all Jews within the region, and considers civilian casualties on the Palestinian side not only acceptable, but advantageous.

Gaza Chief's Brutal Calculation: Civilian Bloodshed Will Help Hamas

I know that many people who criticize Israel and put pressure on us to stop the war in Gaza believe they are doing so for noble reasons. (For the record, I also believe many people have less altruistic reasons for their stance.) I know people like to point to the list of atrocities they believe Israel has committed. And, by contrast, the pro-Israel side is happy to respond with corresponding atrocities from the other side.

I honestly believe this is all far beyond irrelevant. The only question people should be asking is: how do we disincentivize the continuation of hostilities? And hostage deals that result in the release of terrorists, allow “aid” to come in (which, if history is any indication, will be used to further the construction of tunnels and other sources for attack on Israel), and give Hamas an opportunity to rearm, only incentivize the continuation of the war.

In other words, if you care about the innocent people on either side, you should be opposed to this kind of capitulation. Whatever you think about the morality of each side, more people will suffer with this approach.

Skin in the game

It’s easy to say things like that when your life isn’t on the line. I also don’t think that matters much. Either the philosophical, political, and economic analysis is correct, or it isn’t. Nonetheless, I do have skin in the game here. I still live in a warzone. I am less than 15 kilometers from the Lebanese border. We’ve had Hezbollah tunnels reaching into our surrounding cities. My family had to lock ourselves inside when Hezbollah paratroopers had attempted to land in our city.

My wife (Miriam) and I have discussed this situation at length, many times, over the course of this war. If I’m ever taken hostage, I hope the Israeli government bombs the hell out of wherever I am being held. I say this not only because I believe it is the right, just, moral, ethical, and strategically correct thing to do. I say this because I am selfish:

  • I would rather die than be tortured by our enemies.
  • I would rather die than be leveraged to make my family and country less safe.
  • I would rather die than live the rest of my life a shell of my former self, haunted not only by the likely torture inflicted on me, but by the guilt of the harm to others resulting from my spared life.

I don’t know why this hostage deal went through now. I don’t know what pressures have been brought to bear on the leaders in Israel. I don’t know if they are good people trying to protect their citizens, nefarious power hungry cretins looking to abuse both the Israeli and Palestinian populace to stay in control, weak-willed toadies who do what they’re told by others, or simply stupid. But my own stance is clear.

But what about the Palestinians?

I said it above, and I’ll say it again: I truly do feel horrible for the trauma that the Palestinian people are going through. Not for the active terrorists mind you, I feel no qualms about those raising arms against us being destroyed. But everyone else, even those who wish me and my fellow Israelis harm. (And, if polling is to be believed, that’s the majority of Palestinians.) I would much rather that they not be suffering now, and that eventually through earned trust on both sides, everyone’s lots are improved.

But the framework being imposed by those who “love” peace isn’t allowing that to happen. Trust cannot be built when there’s a greater incentive to return to the use of force. I was strongly opposed to the 2005 disengagement from Gaza. But once it happened, it could have been one of those trust-building starting points. Instead, I saw many people justify further violence by Hamas—such as non-stop rocket attacks on the south of Israel—because Israel hadn’t done enough yet.

Notice how fundamentally flawed this mentality is, just from an incentives standpoint! Israel gives up control of land, something against its own overall interests and something desired by Palestinians, and is punished for it with increased violence against citizens. Hamas engaged in a brutal destruction of all of its opponents within the Palestinian population, launched attacks on Israel, and when Israel did respond with force, Israel was blamed for having not done enough to appease Hamas.

I know people will want to complicate this story by bringing up the laundry list of past atrocities, of assigning negative motivations to Israel and its leaders, and a million other evasions that are used to avoid actually solving this conflict. Instead, I beg everyone to just use basic logic.

The violence will continue as long as the violence gets results.

January 16, 2025 12:00 AM

January 13, 2025

Michael Snoyman

Incentives Determine Outcomes

My blog posts and reading material have both been on a decidedly economics-heavy slant recently. The topic today, incentives, squarely falls into the category of economics. However, when I say economics, I’m not talking about “analyzing supply and demand curves.” I’m talking about the true basis of economics: understanding how human beings make decisions in a world of scarcity.

A fair definition of incentive is “a reward or punishment that motivates behavior to achieve a desired outcome.” When most people think about economic incentives, they’re thinking of money. If I offer my son $5 if he washes the dishes, I’m incentivizing certain behavior. We can’t guarantee that he’ll do what I want him to do, but we can agree that the incentive structure itself will guide and ultimately determine what outcome will occur.

The great thing about monetary incentives is how easy they are to talk about and compare. “Would I rather make $5 washing the dishes or $10 cleaning the gutters?” But much of the world is incentivized in non-monetary ways too. For example, using the “punishment” half of the definition above, I might threaten my son with losing Nintendo Switch access if he doesn’t wash the dishes. No money is involved, but I’m still incentivizing behavior.

And there are plenty of incentives beyond our direct control! My son is also incentivized to not wash dishes because it’s boring, or because he has some friends over that he wants to hang out with, or dozens of other things. Ultimately, the conflicting array of different incentive structures placed on him will ultimately determine what actions he chooses to take.

Why incentives matter

A phrase I see often in discussions—whether they are political, parenting, economic, or business—is “if they could just do…” Each time I see that phrase, I cringe a bit internally. Usually, the underlying assumption of the statement is “if people would behave contrary to their incentivized behavior then things would be better.” For example:

  • If my kids would just go to bed when I tell them, they wouldn’t be so cranky in the morning.
  • If people would just use the recycling bin, we wouldn’t have such a landfill problem.
  • If people would just stop being lazy, our team would deliver our project on time.

In all these cases, the speakers are seemingly flummoxed as to why the people in question don’t behave more rationally. The problem is: each group is behaving perfectly rationally.

  • The kids have a high time preference, and care more about the joy of staying up now than the crankiness in the morning. Plus, they don’t really suffer the consequences of morning crankiness, their parents do.
  • No individual suffers much from their individual contribution to a landfill. If they stopped growing the size of the landfill, it would make an insignificant difference versus the amount of effort they need to engage in to properly recycle.
  • If a team doesn’t properly account for the productivity of individuals on a project, each individual receives less harm from their own inaction. Sure, the project may be delayed, company revenue may be down, and they may even risk losing their job when the company goes out of business. But their laziness individually won’t determine the entirety of that outcome. By contrast, they greatly benefit from being lazy by getting to relax at work, go on social media, read a book, or do whatever else they do when they’re supposed to be working.

Free Candy!

My point here is that, as long as you ignore the reality of how incentives drive human behavior, you’ll fail at getting the outcomes you want.

If everything I wrote up until now made perfect sense, you understand the premise of this blog post. The rest of it will focus on a bunch of real-world examples to hammer home the point, and demonstrate how versatile this mental model is.

Running a company

Let’s say I run my own company, with myself as the only employee. My personal revenue will be 100% determined by my own actions. If I decide to take Tuesday afternoon off and go fishing, I’ve chosen to lose that afternoon’s revenue. Implicitly, I’ve decided that the enjoyment I get from an afternoon of fishing is greater than the potential revenue. You may think I’m being lazy, but it’s my decision to make. In this situation, the incentive–money–is perfectly aligned with my actions.

Compare this to a typical company/employee relationship. I might have a bank of Paid Time Off (PTO) days, in which case once again my incentives are relatively aligned. I know that I can take off 15 days throughout the year, and I’ve chosen to use half a day for the fishing trip. All is still good.

What about unlimited time off? Suddenly incentives are starting to misalign. I don’t directly pay a price for not showing up to work on Tuesday. Or Wednesday as well, for that matter. I might ultimately be fired for not doing my job, but that will take longer to work its way through the system than simply not making any money for the day taken off.

Compensation overall falls into this misaligned incentive structure. Let’s forget about taking time off. Instead, I work full time on a software project I’m assigned. But instead of using the normal toolchain we’re all used to at work, I play around with a new programming language. I get the fun and joy of playing with new technology, and potentially get to pad my resume a bit when I’m ready to look for a new job. But my current company gets slower results, less productivity, and is forced to subsidize my extracurricular learning.

When a CEO has a bonus structure based on profitability, he’ll do everything he can to make the company profitable. This might include things that actually benefit the company, like improving product quality, reducing internal red tape, or finding cheaper vendors. But it might also include destructive practices, like slashing the R&D budget to show massive profits this year, in exchange for a catastrophe next year when the next version of the product fails to ship.

Golden Parachute CEO

Or my favorite example. My parents owned a business when I was growing up. They had a back office where they ran operations like accounting. All of the furniture was old couches from our house. After all, any money they spent on furniture came right out of their paychecks! But in a large corporate environment, each department is generally given a budget for office furniture, a budget which doesn’t roll over year-to-year. The result? Executives make sure to spend the entire budget each year, often buying furniture far more expensive than they would choose if it was their own money.

There are plenty of details you can quibble with above. It’s in a company’s best interest to give people downtime so that they can come back recharged. Having good ergonomic furniture can in fact increase productivity in excess of the money spent on it. But overall, the picture is pretty clear: in large corporate structures, you’re guaranteed to have mismatches between the company’s goals and the incentive structure placed on individuals.

Using our model from above, we can lament how lazy, greedy, and unethical the employees are for doing what they’re incentivized to do instead of what’s right. But that’s simply ignoring the reality of human nature.

Moral hazard

Moral hazard is a situation where one party is incentivized to take on more risk because another party will bear the consequences. Suppose I tell my son when he turns 21 (or whatever legal gambling age is) that I’ll cover all his losses for a day at the casino, but he gets to keep all the winnings.

What do you think he’s going to do? The most logical course of action is to place the largest possible bets for as long as possible, asking me to cover each time he loses, and taking money off the table and into his bank account each time he wins.

Heads I win, tails you lose

But let’s look at a slightly more nuanced example. I go to a bathroom in the mall. As I’m leaving, I wash my hands. It will take me an extra 1 second to turn off the water when I’m done washing. That’s a trivial price to pay. If I don’t turn off the water, the mall will have to pay for many liters of wasted water, benefiting no one. But I won’t suffer any consequences at all.

This is also a moral hazard, but most people will still turn off the water. Why? Usually due to some combination of other reasons such as:

  1. We’re so habituated to turning off the water that we don’t even consider not turning it off. Put differently, the mental effort needed to not turn off the water is more expensive than the 1 second of time to turn it off.
  2. Many of us have been brought up with a deep guilt about wasting resources like water. We have an internal incentive structure that makes the 1 second to turn off the water much less costly than the mental anguish of the waste we created.
  3. We’re afraid we’ll be caught by someone else and face some kind of social repercussions. (Or maybe more than social. Are you sure there isn’t a law against leaving the water tap on?)

Even with all that in place, you may notice that many public bathrooms use automatic water dispensers. Sure, there’s a sanitation reason for that, but it’s also to avoid this moral hazard.

A common denominator in both of these is that the person taking the action that causes the liability (either the gambling or leaving the water on) is not the person who bears the responsibility for that liability (the father or the mall owner). Generally speaking, the closer together the person making the decision and the person incurring the liability are, the smaller the moral hazard.

It’s easy to demonstrate that by extending the casino example a bit. I said it was the father who was covering the losses of the gambler. Many children (though not all) would want to avoid totally bankrupting their parents, or at least financially hurting them. Instead, imagine that someone from the IRS shows up at your door, hands you a credit card, and tells you you can use it at a casino all day, taking home all the chips you want. The money is coming from the government. How many people would put any restriction on how much they spend?

And since we’re talking about the government already…

Government moral hazards

As I was preparing to write this blog post, the California wildfires hit. The discussions around those wildfires gave a huge number of examples of moral hazards. I decided to cherry-pick a few for this post.

The first and most obvious one: California is asking for disaster relief funds from the federal government. That sounds wonderful. These fires were a natural disaster, so why shouldn’t the federal government pitch in and help take care of people?

The problem is, once again, a moral hazard. In the case of the wildfires, California and Los Angeles both had ample actions they could have taken to mitigate the destruction of this fire: better forest management, larger fire department, keeping the water reservoirs filled, and probably much more that hasn’t come to light yet.

If the federal government bails out California, it will be a clear message for the future: your mistakes will be fixed by others. You know what kind of behavior that incentivizes? More risky behavior! Why spend state funds on forest management and extra firefighters—activities that don’t win politicians a lot of votes in general—when you could instead spend them on a football stadium, higher unemployment payments, or anything else, and then let the feds cover the cost of screw-ups?

You may notice that this is virtually identical to the 2008 “too big to fail” bail-outs. Wall Street engaged in insanely risky behavior, reaped huge profits for years, and when they eventually got caught with their pants down, the rest of us bailed them out. “Privatizing profits, socializing losses.”

Too big to fail

And here’s the absolute best part of this: I can’t even truly blame either California or Wall Street. (I mean, I do blame them, I think their behavior is reprehensible, but you’ll see what I mean.) In a world where the rules of the game implicitly include the bail-out mentality, you would be harming your citizens/shareholders/investors if you didn’t engage in that risky behavior. Since everyone is on the hook for those socialized losses, your best bet is to maximize those privatized profits.

There’s a lot more to government and moral hazard, but I think these two cases demonstrate the crux pretty solidly. But let’s leave moral hazard behind for a bit and get to general incentivization discussions.

Non-monetary competition

At least 50% of the economics knowledge I have comes from the very first econ course I took in college. That professor was amazing, and had some very colorful stories. I can’t vouch for the veracity of the two I’m about to share, but they definitely drive the point home.

In the 1970s, the US had an oil shortage. To “fix” this problem, they instituted price caps on gasoline, which of course resulted in insufficient gasoline. To “fix” this problem, they instituted policies where, depending on your license plate number, you could only fill up gas on certain days of the week. (Irrelevant detail for our point here, but this just resulted in people filling up their tanks more often, no reduction in gas usage.)

Anyway, my professor’s wife had a friend. My professor described in great detail how attractive this woman was. I’ll skip those details here since this is a PG-rated blog. In any event, she never had any trouble filling up her gas tank any day of the week. She would drive up, be told she couldn’t fill up gas today, bat her eyes at the attendant, explain how helpless she was, and was always allowed to fill up gas.

This is a demonstration of non-monetary compensation. Most of the time in a free market, capitalist economy, people are compensated through money. When price caps come into play, there’s a limit to how much monetary compensation someone can receive. And in that case, people find other ways of competing. Like this woman’s case: through using flirtatious behavior to compensate the gas station workers to let her cheat the rules.

The other example was much more insidious. Santa Monica had a problem: it was predominantly wealthy and white. They wanted to fix this problem, and decided to put in place rent controls. After some time, they discovered that Santa Monica had become wealthier and whiter, the exact opposite of their desired outcome. Why would that happen?

Someone investigated, and ended up interviewing a landlady who demonstrated the reason. She was an older white woman, and admittedly racist. Prior to the rent controls, she would list her apartments in the newspaper, and would be legally obligated to rent to anyone who could afford it. Once rent controls were in place, she took a different tack. She knew that she would only get a certain amount for the apartment, and that the demand for apartments was higher than the supply. That meant she could be picky.

She ended up finding tenants through friends-of-friends. Since it wasn’t an official advertisement, she wasn’t legally required to rent it out if someone could afford to pay. Instead, she got to interview people individually and then make them an offer. Normally, that would have resulted in receiving a lower rental price, but not under rent controls.

So who did she choose? A young, unmarried, wealthy, white woman. It made perfect sense. Women were less intimidating and more likely to keep the apartment in good condition. Wealthy people, she determined, would be better tenants. (I have no idea if this is true in practice or not, I’m not a landlord myself.) Unmarried, because no kids running around meant less damage to the property. And, of course, white. Because she was racist, and her incentive structure made her prefer whites.

You can deride her for being racist, I won’t disagree with you. But it’s simply the reality. Under the non-rent-control scenario, her profit motive for money outweighed her racism motive. But under rent control, the monetary competition was removed, and she was free to play into her racist tendencies without facing any negative consequences.

Bureaucracy

These were the two examples I remember for that course. But non-monetary compensation pops up in many more places. One highly pertinent example is bureaucracies. Imagine you have a government office, or a large corporation’s acquisition department, or the team that apportions grants at a university. In all these cases, you have a group of people making decisions about handing out money that has no monetary impact on them. If they give to the best qualified recipients, they receive no raises. If they spend the money recklessly on frivolous projects, they face no consequences.

Under such an incentivization scheme, there’s little to encourage the bureaucrats to make intelligent funding decisions. Instead, they’ll be incentivized to spend the money where they recognize non-monetary benefits. This is why it’s so common to hear about expensive meals, gift bags at conferences, and even more inappropriate ways of trying to curry favor with those that hold the purse strings.

Compare that ever so briefly with the purchases made by a small mom-and-pop store like my parents owned. Could my dad take a bribe to buy from a vendor who’s ripping him off? Absolutely he could! But he’d lose more on the deal than he’d make on the bribe, since he’s directly incentivized by the deal itself. It would make much more sense for him to go with the better vendor, save $5,000 on the deal, and then treat himself to a lavish $400 meal to celebrate.

Government incentivized behavior

This post is getting longer than I’d intended, so I’ll finish off with this section and make it a bit briefer. Beyond all the methods mentioned above, government has another mechanism for modifying behavior: directly changing incentives via legislation, regulation, and monetary policy. Let’s see some examples:

  • Artificial modification of interest rates encourages people to take on more debt than they would in a free capital market, leading to malinvestment and a consumer debt crisis, and causing the boom-bust cycle we all painfully experience.
  • Going along with that, giving tax breaks on interest payments further artificially incentivizes people to take on debt that they wouldn’t otherwise.
  • During COVID-19, at some points unemployment benefits were greater than the minimum wage, incentivizing people to stay home rather than work, leading to reduced overall productivity in the economy and more printed dollars for benefits. In other words, it was a perfect recipe for inflation.
  • The tax code gives deductions to “help” people. That might be true, but the real impact is incentivizing people to make decisions they wouldn’t have otherwise. For example, giving out tax deductions on children encourages having more kids. Tax deductions on childcare and preschools incentivizes dual-income households. Whether or not you like the outcomes, it’s clear that it’s government that’s encouraging these outcomes to happen.
  • Tax incentives cause people to engage in behavior they wouldn’t otherwise (daycare+working mother, for example).
  • Inflation means that the value of your money goes down over time, which encourages people to spend more today, when their money has a larger impact. (Milton Friedman described this as high living.)

Conclusion

The idea here is simple, and fully encapsulated in the title: incentives determine outcomes. If you want to know how to get a certain outcome from others, incentivize them to want that to happen. If you want to understand why people act in seemingly irrational ways, check their incentives. If you’re confused why leaders (and especially politicians) seem to engage in destructive behavior, check their incentives.

We can bemoan these realities all we want, but they are realities. While there are some people who have a solid internal moral and ethical code, and that internal code incentivizes them to behave against their externally-incentivized interests, those people are rare. And frankly, those people are self-defeating. People should take advantage of the incentives around them. Because if they don’t, someone else will.

(If you want a literary example of that last comment, see the horse in Animal Farm.)

How do we improve the world under these conditions? Make sure the incentives align well with the overall goals of society. To me, it’s a simple formula:

  • Focus on free trade, value for value, as the basis of a society. In that system, people are always incentivized to provide value to other people.
  • Reduce the size of bureaucracies and large groups of all kinds. The larger an organization becomes, the farther the consequences of decisions are from those who make them.
  • And since the nature of human beings will be to try and create areas where they can control the incentive systems to their own benefits, make that as difficult as possible. That comes in the form of strict limits on government power, for example.

And even if you don’t want to buy in to this conclusion, I hope the rest of the content was educational, and maybe a bit entertaining!

January 13, 2025 12:00 AM

January 12, 2025

Sandy Maguire

Read the Code, Not the Profile

At work a few weeks back, I found myself digging into profile reports, trying to determine why our program was running so slowly. Despite having the extremely obvious-in-retrospect data in front of me, I wasted a lot of time speeding up code that turned out to not move the needle at all.

Although perhaps it will be interesting only to future me, I thought it would be a good exercise to write up the experience—if only so I learn the lesson about how to read profiles and not make the same mistake again.

Some Context

I’m currently employed to work on a compiler. The performance has never been stellar, in that we were usually seeing about 5s to compile programs, even trivially small ones consisting of less than a hundred instructions. It was painful, but not that painful, since the test suite still finished in a minute or two. It was a good opportunity to get a coffee. I always assumed that the time penalties we were seeing were constant factors; perhaps it took a second or two to connect to Z3 or something like that.

But then we started unrolling loops, which turned trivially small programs into merely small programs, and our performance ballooned. Now we were looking at 45s for some of our tests! Uh oh! That’s no longer in the realm of constant factors, and it was clear that something was asymptotically wrong.

So I fired up GHC with the trusty old -prof flag, and ran the test suite in +RTS -p mode, which instruments the program with all sorts of profiling goodies. After a few minutes, the test suite completed, and left a test-suite.prof file lying around in the current directory. You can inspect such things by hand, but tools like profiteur make the experience much nicer.

Without further ado, here’s what our profile looked like:

MAIN . . . . . . . . . . . . . . . . . . . . . . . . 100%

Well, that’s not very helpful. Of course MAIN takes 100% of the time. So I expanded that, and saw:

MAIN . . . . . . . . . . . . . . . . . . . . . . . . 100%
└ main . . . . . . . . . . . . . . . . . . . . . . . 100%

No clearer. Opening up main:

MAIN . . . . . . . . . . . . . . . . . . . . . . . . 100%
└ main . . . . . . . . . . . . . . . . . . . . . . . 100%
  └ main.\ . . . . . . . . . . . . . . . . . . . . . 100%

Sheesh.

MAIN . . . . . . . . . . . . . . . . . . . . . . . . 100%
└ main . . . . . . . . . . . . . . . . . . . . . . . 100%
  └ main.\ . . . . . . . . . . . . . . . . . . . . . 100%
    └ getTest  . . . . . . . . . . . . . . . . . . . 100%

OH MY GOD. JUST TELL ME SOMETHING ALREADY.

MAIN . . . . . . . . . . . . . . . . . . . . . . . . 100%
└ main . . . . . . . . . . . . . . . . . . . . . . . 100%
  └ main.\ . . . . . . . . . . . . . . . . . . . . . 100%
    └ getTest  . . . . . . . . . . . . . . . . . . . 100%
      └ test . . . . . . . . . . . . . . . . . . . . 100%

Fast forwarding for quite a while, I opened up the entire stack until I got to something that didn’t take 100% of the program’s runtime:

MAIN . . . . . . . . . . . . . . . . . . . . . . . . 100%
└ main . . . . . . . . . . . . . . . . . . . . . . . 100%
  └ main.\ . . . . . . . . . . . . . . . . . . . . . 100%
    └ getTest  . . . . . . . . . . . . . . . . . . . 100%
      └ test . . . . . . . . . . . . . . . . . . . . 100%
        └ makeTest . . . . . . . . . . . . . . . . . 100%
          └ makeTest.\ . . . . . . . . . . . . . . . 100%
            └ compileProgram . . . . . . . . . . . . 100%
              └ evalAppT . . . . . . . . . . . . . . 100%
                └ runAppT  . . . . . . . . . . . . . 100%
                  └ runAppT' . . . . . . . . . . . . 100%
                    └ withLogging  . . . . . . . . . 100%
                      └ transformSSA . . . . . . . . 100%
                        └ >>=  . . . . . . . . . . . 100%
                          └ >>>= . . . . . . . . . . 100%
                            └ ibind  . . . . . . . . 100%
                              └ ibind.\  . . . . . . 100%
                                └ ibind.\.\  . . . . 100%
                                  ├ toSSA  . . . . . 15%
                                  ├ transform1 . . . 15%
                                  ├ transform2 . . . 10%
                                  ├ transform3 . . . 10%
                                  ├ transform4 . . . 20%
                                  └ collectGarbage . 30%

Now we’re in business. I dutifully dug into toSSA, the transforms, and collectGarbage. I cached some things, used better data structures, stopped appending lists, you know, the usual Haskell tricks. My work was rewarded, in that I managed to shave 80% off the runtime of our program.

A few months later, we wrote a bigger program and fed it to the compiler. This one didn’t stop compiling. We left it overnight.

Uh oh. Turns out I hadn’t fixed the problem. I’d only papered over it.

Retrospective

So what went wrong here? Quite a lot, in fact! And worse, I had all of the information all along, but managed to misinterpret it at several steps of the process.

Unwinding the story stack, the most salient aspect of having not solved the problem was reducing the runtime by only 80%. Dramatic percentages feel like amazing improvements, but that’s because human brains are poorly designed for building software. In the real world, big percentages are fantastic. In software, they are linear improvements.

That is to say that a percentage-based improvement is \(O(n)\) faster in the best case. My efforts improved our runtime from 45s to 9s. Which feels great, but the real problem is that this program is measured in seconds at all.

It’s more informative to think in terms of orders of magnitude. Taking 45s on a ~3GHz processor is on the order of \(10^{11}\) instructions, while 9s is \(10^{10}\). How the hell is it taking us TEN BILLION instructions to compile a dinky little program? That’s the real problem. Improving things from one hundred billion down to ten billion is no longer very impressive at all.

To get a sense of the scale here, even if we spent 1M cycles (which feels conservatively expensive) for each instruction we wanted to compile, we should still be looking at < 0.1s. Somehow we are over 1000x worse than that.
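
Spelled out as a rough back-of-the-envelope (assuming a generous few hundred compiled instructions at the ~3GHz figure above): \(300 \text{ instructions} \times 10^{6} \text{ cycles/instruction} = 3 \times 10^{8} \text{ cycles}\), and \(3 \times 10^{8} \text{ cycles} \mathbin{/} (3 \times 10^{9} \text{ cycles/s}) = 0.1\,\text{s}\); the programs in question are smaller than that.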

So that’s one mistake I made: being impressed by extremely marginal improvements. Bad Sandy.

The other mistake came from my interpretation of the profile. As a quick pop quiz, scroll back up to the profile and see if you can spot where the problem is.

After expanding a few obviously-not-the-problem cost centres, each of which was 100% of the runtime, I turned my brain off and opened all of the 100% nodes. But in doing so, I accidentally breezed past the real problem. The real problem is either that compileProgram takes 100% of the time of the test, or that transformSSA takes 100% of compiling the program. Why’s that? Because unlike main and co, test does more work than just compiling the program. It also does non-trivial IO to produce debugging outputs, and property checks the resulting programs. Similarly for compileProgram, which does a great deal more than transformSSA.

This is somewhat of a philosophical enlightenment. The program execution hasn’t changed at all, but our perspective has. Rather than micro-optimizing the code that is running, this new perspective suggests we should focus our effort on determining why that code is running in the first place.

Digging through transformSSA made it very obvious the problem was an algorithmic one—we were running an unbounded loop that terminated on convergence, where each iteration took \(O(n^2)\) work. When I stopped to actually read the code, the problem was immediate, and the solution obvious.
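
To make that shape concrete, here is a minimal sketch of that kind of loop (illustrative only; the names are made up and this is not the actual compiler code):

-- Repeatedly apply a pass until the result stops changing.
-- If each pass costs O(n^2) and the number of iterations is unbounded,
-- the total cost can blow up far beyond the cost of any single pass.
untilConverged :: Eq a => (a -> a) -> a -> a
untilConverged pass x
  | x' == x   = x
  | otherwise = untilConverged pass x'
  where
    x' = pass x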

The lesson? Don’t read the profile. Read the code. Use the profile to focus your attention.

January 12, 2025 03:29 PM

January 09, 2025

Edward Z. Yang

New Years resolutions for PyTorch in 2025

In my previous two posts "Ways to use torch.compile" and "Ways to use torch.export", I often said that PyTorch would be good for a use case, but there might be some downsides. Some of the downsides are foundational and difficult to remove. But some... just seem like a little something is missing from PyTorch. In this post, here are some things I hope we will end up shipping in 2025!

Improving torch.compile

A programming model for PT2. A programming model is an abstract description of the system that is both simple (so anyone can understand it and keep it in their head all at once) and can be used to predict the system's behavior. The torch.export programming model is an example of such a description. Beyond export, we would like to help users understand why all aspects of PT2 behave the way they do (e.g., via improved error messages), and give simple, predictable tools for working around problems when they arise. The programming model helps us clearly define the intrinsic complexity of our compiler, which we must educate users about. This is a big effort involving many folks on the PyTorch team and I hope we can share more about this effort soon.

Pre-compilation: beyond single graph export. Whenever someone realizes that torch.compile compilation is taking a substantial amount of time on expensive cluster machines, the first thing they ask is, "Why don't we just compile it in advance?" Supporting precompilation with the torch.compile API exactly as is is not so easy; unlike a traditional compiler which gets the source program directly as input, users of torch.compile must actually run their Python program to hit the regions of code that are intended to be compiled. Nor can these regions be trivially enumerated and then compiled: not only must we know all the metadata of the input tensors flowing into a region, but a user might not even know what the compiled graphs are if a model has graph breaks.

OK, but why not just run the model, dump all the compiled products, and then reuse them later? This works! Here is a POC from Nikita Shulga where a special decorator aot_compile_sticky_cache swaps between exporting a graph and running the exported product. Zhengxu Chen used a similar idea to export Whisper as a few distinct graphs, which he then manually stitched together in C++ to get a Python-free version of Whisper. If you want training to work, you can more directly integrate AOTInductor as an Inductor backend, e.g., as seen in this POC. We are a stone's throw away from working precompilation, which can guarantee no compilation at runtime; we just need to put the pieces together!

Improving caching further. There are some gaps with caching which we hope to address in the near future: (1) loading Triton cache artifacts takes a long time because we still re-parse the Triton code before doing a cache lookup (James Wu is on this), (2) if you have a lot of small graphs, remote cache ends up having to do lots of small network requests, instead of one batched network request at the beginning (Oguz Ulgen recently landed this), (3) AOTAutograd cache is not fully rolled out yet (James Wu again). These collectively should be worth a 2x speedup or even more on warm cache time.

Fix multithreading. We should just make sure multithreading works, doing the testing and fiddly thread safety auditing needed to make it work. Here's a list of multithreading related issues.

Improving torch.export

Draft mode export. Export requires a lot of upfront work to even get an exported artifact in the first place. Draft mode export capitalizes on the idea that it's OK to generate an unsound "draft" graph early in the export, because even an incorrect graph is useful for kicking the tires on the downstream processing that happens after export. A draft export gives you a graph, and it also gives you a report describing what potential problems need to be fixed to get some guarantees about the correctness of the export. You can then chip away on the problems in the report until everything is green. One of the biggest innovations of draft-mode export is pervasive use of real tensor propagation when doing export: you run the export with actual tensors, so you can always trace through code, even if it is doing spicy things like data-dependent control flow.

Libtorch-free AOTInductor. AOTInductor generated binaries have a relatively small ABI surface that needs to be implemented. This hack from the most recent CUDA Mode meetup shows that you can just create an alternate implementation of the ABI that has no dependence on libtorch. This makes your deployed binary size much smaller!

  • Support for bundling CUDA kernels into AOTInductor. AOTInductor already supports directly bundling Triton kernels into the generated binary, but traditional CUDA kernels cannot be bundled in this way. There's no reason this has to be the case though: all we're doing is bundling cubins in both cases. If we have the ability to bundle traditional CUDA kernels into AOTInductor, this means you could potentially directly embed custom operators into AOTInductor binaries, which is nice because then those operators no longer have to be provided by the runtime (especially if you're frequently iterating on these kernels!).

Export multigraphs. Export's standard model is to give you a single graph that you call unconditionally. But it's easy to imagine a level of indirection on top of these graphs, where we can dispatch between multiple graphs depending on some arguments to the model. For example, if you have a model that optionally takes an extra Tensor argument, you can simply have two graphs, one for when the Tensor is absent, and one for when it is present.

  • ABI stable PyTorch extensions. It's hard work being a third-party PyTorch extension with native code, because whenever there's a new release of Python or PyTorch you have to rebuild all of your wheels. If there were a limited ABI that you could build your extension against that didn't expose CPython and only relied on a small, stable ABI of PyTorch functions, your binary packaging situation would be much simpler! And if an extension relied on a small ABI, it could even be bundled with an AOTInductor binary, letting these export products be truly package agnostic (one of the lessons we learned with torch.package is that picking the split between "what is packaged" and "what is not" is very difficult, and people would much rather just have everything be packaged.) Jane Xu is investigating how to do this, and separately, Scott Wolchok has been refactoring headers in libtorch so that a small set of headers can be used independently of the rest of libtorch.

by Edward Z. Yang at January 09, 2025 08:50 PM

January 05, 2025

Manuel M T Chakravarty

Functional Programming in Swift

When people talk about functional programming in modern multi-paradigm languages, they usually mention Rust, Scala, or Kotlin. You rarely hear Swift being mentioned. This is odd, as one might argue that, of these languages, Swift places the strongest emphasis on functional programming.

In this talk, I will explain the core functional programming features of Swift, including its expressive type system, value types, and mutability control. Furthermore, I will discuss how Swift’s language design is influenced by the desire to create a language that addresses the whole spectrum from low-level systems programming up to high-level applications with sophisticated graphical user interfaces. Beyond the core language itself, functional programming also permeates Swift’s rich ecosystem of libraries. To support this point, I will outline some FP-inspired core libraries, covering concepts from functional data structures over functional reactive programming to declarative user interfaces.

Finally, I will briefly summarise practical considerations for using Swift in your own projects. This includes the cross-platform toolchain, the package manager, and interoperability with other languages.

January 05, 2025 07:45 PM

Abhinav Sarkar

Solving Advent of Code “Seating System” with Comonads and Stencils

In this post, we solve the Advent of Code 2020 “Seating System” challenge in Haskell using comonads and stencils.

This post was originally published on abhinavsarkar.net.

The Challenge

Here’s a quick summary of the challenge:

The seat layout fits on a grid. Each position is either floor (.), an empty seat (L), or an occupied seat (#). For example, the initial seat layout might look like this:

L.LL.LL.LL
LLLLLLL.LL
L.L.L..L..
LLLL.LL.LL
L.LL.LL.LL
L.LLLLL.LL
..L.L.....
LLLLLLLLLL
L.LLLLLL.L
L.LLLLL.LL

All decisions are based on the number of occupied seats adjacent to a given seat (one of the eight positions immediately up, down, left, right, or diagonal from the seat).

The following rules are applied to every seat simultaneously:

  • If a seat is empty (L) and there are no occupied seats adjacent to it, the seat becomes occupied.
  • If a seat is occupied (#) and four or more seats adjacent to it are also occupied, the seat becomes empty.
  • Otherwise, the seat’s state does not change.
Floor (.) never changes; seats don’t move, and nobody sits on the floor.

This is a classic Cellular Automaton problem. We need to write a program that simulates seats being occupied till no further seats are emptied or occupied, and returns the final number of occupied seats. Let’s solve this in Haskell.

The Cellular Automaton

First, some imports:

{-# LANGUAGE GHC2021 #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE PatternSynonyms #-}
{-# LANGUAGE TypeFamilies #-}

module Main where

import Control.Arrow ((>>>))
import Control.Comonad (Comonad (..))
import Data.Function (on)
import Data.List (intercalate, nubBy)
import Data.Massiv.Array (Ix2 (..))
import Data.Massiv.Array qualified as A
import Data.Massiv.Array.Unsafe qualified as AU
import Data.Proxy (Proxy (..))
import Data.Vector.Generic qualified as VG
import Data.Vector.Generic.Mutable qualified as VGM
import Data.Vector.Unboxed qualified as VU
import System.Environment (getArgs, getProgName)

We use the GHC2021 extension here that enables a lot of useful GHC extensions by default. Our non-base imports come from the comonad, massiv and vector libraries.

Quoting the Wikipedia page on Cellular Automaton (CA):

  • A cellular automaton consists of a regular grid of cells, each in one of a finite number of states.
  • For each cell, a set of cells called its neighborhood is defined relative to the specified cell.
  • An initial state is selected by assigning a state for each cell.
  • A new generation is created, according to some fixed rule that determines the new state of each cell in terms of the current state of the cell and the states of the cells in its neighborhood.

Let’s model the automaton of the challenge using Haskell:

newtype Cell = Cell Char deriving (Eq)

pattern Empty, Occupied, Floor :: Cell
pattern Empty = Cell 'L'
pattern Occupied = Cell '#'
pattern Floor = Cell '.'
{-# COMPLETE Empty, Occupied, Floor #-}

parseCell :: Char -> Cell
parseCell = \case
  'L' -> Empty
  '#' -> Occupied
  '.' -> Floor
  c -> error $ "Invalid character: " <> show c

rule :: Cell -> [Cell] -> Cell
rule cell neighbours =
  let occupiedNeighboursCount = length $ filter (== Occupied) neighbours
   in case cell of
        Empty | occupiedNeighboursCount == 0 -> Occupied
        Occupied | occupiedNeighboursCount >= 4 -> Empty
        _ -> cell

A cell in the grid can be in the empty, occupied or floor state. We encode this with the pattern synonyms Empty, Occupied and Floor for the Cell newtype, which wraps a Char¹.

The parseCell function parses a character to a Cell. The rule function implements the automaton rule.
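
As a quick illustration (not part of the solution, just a sanity check), all of the following hold for rule:

-- An empty seat with no occupied neighbours becomes occupied,
-- an occupied seat with four occupied neighbours is vacated,
-- and floor cells never change.
ruleChecks :: [Bool]
ruleChecks =
  [ rule Empty (replicate 8 Floor) == Occupied
  , rule Occupied (replicate 4 Occupied ++ replicate 4 Floor) == Empty
  , rule Floor (replicate 8 Occupied) == Floor
  ]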

The Solution

We are going to solve this puzzle in three different ways. So, let’s abstract the details and solve it top-down.

class (Eq a) => Grid a where
  fromLists :: [[Cell]] -> a
  step :: a -> a
  toLists :: a -> [[Cell]]

solve :: forall a. (Grid a) => Proxy a -> [[Cell]] -> Int
solve _ =
  fromLists @a
    >>> fix step
    >>> toLists
    >>> fmap (filter (== Occupied) >>> length)
    >>> sum
  where
    fix f x = let x' = f x in if x == x' then x else fix f x'

We solve the challenge using the Grid typeclass that all our different solutions implement. A grid is specified by three functions:

  1. fromLists: converts a list of lists of cells to the grid.
  2. step: runs one step of the CA simulation.
  3. toLists: converts the grid back to a list of lists of cells.

The solve function calculates the final number of occupied seats for any instance of the Grid typeclass by running the simulation till it converges².

Now, we use solve to solve the challenge in three ways depending on the command line argument supplied:

main :: IO ()
main = do
  progName <- getProgName
  getArgs >>= \case
    [gridType, fileName] ->
      readFile fileName
        >>= (lines >>> map (map parseCell) >>> solve' gridType >>> print)
    _ -> putStrLn $ "Usage: " <> progName <> " -(z|a|s) <input_file>"
  where
    solve' = \case
      "-z" -> solve $ Proxy @(ZGrid Cell)
      "-a" -> solve $ Proxy @(AGrid Cell)
      "-s" -> solve $ Proxy @(SGrid Cell)
      _ -> error "Invalid grid type"

We have set up the top (main) and the bottom (rule) of our solutions. Now let’s work on the middle part.

The Zipper

To simulate a CA, we need to focus on each cell of the automaton grid, and run the rule for that cell. What is the first thing that comes to the mind of a functional programmer when we want to focus on a part of a data structure? Zippers!

Zippers are a special view of data structures, which allow one to navigate and easily update them. A zipper always has a focus or cursor which is the current element of the data structure we are “at”. Alongside, it also captures the rest of the data structure in a way that makes it easy to move around it. We can update the data structure by updating the element at the focus.

The first way to solve the challenge is the zipper for once-nested lists. Let’s start with creating the zipper for a simple list:

data Zipper a = Zipper [a] a [a] deriving (Eq, Functor)

zPosition :: Zipper a -> Int
zPosition (Zipper left _ _) = length left

zLength :: Zipper a -> Int
zLength (Zipper left _ right) = length left + 1 + length right

listToZipper :: [a] -> Zipper a
listToZipper = \case
  [] -> error "Cannot create Zipper from empty list"
  (x : xs) -> Zipper [] x xs

zipperToList :: Zipper a -> [a]
zipperToList (Zipper left focus right) = reverse left <> (focus : right)

pShowZipper :: (Show a) => Zipper a -> String
pShowZipper (Zipper left focus right) =
  unwords $
    map show (reverse left) <> (("[" <> show focus <> "]") : map show right)

zLeft :: Zipper a -> Zipper a
zLeft z@(Zipper left focus right) = case left of
  [] -> z
  x : xs -> Zipper xs x (focus : right)

zRight :: Zipper a -> Zipper a
zRight z@(Zipper left focus right) = case right of
  [] -> z
  x : xs -> Zipper (focus : left) x xs

A list zipper has a focus element, and two lists that capture the elements to the left and right of the focus. We use it through these functions:

  • zPosition returns the zero-indexed position of the focus in the zipper.
  • zLength returns the length of the zipper.
  • listToZipper and zipperToList do conversions between lists and zippers.
  • pShowZipper pretty-prints a zipper, highlighting the focus.
  • zLeft and zRight move the zipper’s focus to left and right respectively.

Let’s see it all in action:

> z = listToZipper [1..7]
> putStrLn $ pShowZipper z
[1] 2 3 4 5 6 7
> z' = zRight $ zRight $ zLeft $ zRight $ zRight z
> putStrLn $ pShowZipper z'
1 2 3 [4] 5 6 7
> zPosition z'
3
> zLength z'
7
> zipperToList z'
[1,2,3,4,5,6,7]

Great! Now, what is the zipper for a once-nested list? A once-nested zipper, of course:

newtype ZGrid a = ZGrid (Zipper (Zipper a)) deriving (Eq, Functor)

zgPosition :: ZGrid a -> (Int, Int)
zgPosition (ZGrid rows@(Zipper _ focus _)) = (zPosition rows, zPosition focus)

zgSize :: ZGrid a -> (Int, Int)
zgSize (ZGrid rows@(Zipper _ focus _)) = (zLength rows, zLength focus)

listsToZGrid :: [[a]] -> ZGrid a
listsToZGrid rows =
  let (first : rest) = fmap listToZipper rows
   in ZGrid $ Zipper [] first rest

zGridToLists :: ZGrid a -> [[a]]
zGridToLists (ZGrid (Zipper left focus right)) =
  reverse (fmap zipperToList left)
    <> (zipperToList focus : fmap zipperToList right)

pShowZGrid :: (Show a) => ZGrid a -> String
pShowZGrid (ZGrid (Zipper left focus right)) =
  intercalate "\n" $ pShowRows left <> (pShowZipper focus : pShowRows right)
  where
    pShowRows = map pShowZipper'
    pShowZipper' =
      zipperToList
        >>> splitAt (zPosition focus)
        >>> \ ~(left', focus' : right') ->
          unwords $
            map show left' <> ((" " <> show focus' <> " ") : map show right')

ZGrid is a newtype over a zipper of zippers. It has functions similar to Zipper for getting focus, position and size, for conversions to-and-from lists of lists, and for pretty-printing.

Next, the functions to move the focus in the grid:

zgUp :: ZGrid a -> ZGrid a
zgUp (ZGrid rows) = ZGrid $ zLeft rows

zgDown :: ZGrid a -> ZGrid a
zgDown (ZGrid rows) = ZGrid $ zRight rows

zgLeft :: ZGrid a -> ZGrid a
zgLeft (ZGrid rows) = ZGrid $ fmap zLeft rows

zgRight :: ZGrid a -> ZGrid a
zgRight (ZGrid rows) = ZGrid $ fmap zRight rows

Let’s check them out in GHCi:

> zg = listsToZGrid $ replicate 7 $ [1..7]
> putStrLn $ pShowZGrid zg
[1] 2 3 4 5 6 7
 1  2 3 4 5 6 7
 1  2 3 4 5 6 7
 1  2 3 4 5 6 7
 1  2 3 4 5 6 7
 1  2 3 4 5 6 7
 1  2 3 4 5 6 7
> zg' = zgDown $ zgRight $ zgDown $ zgRight zg
> putStrLn $ pShowZGrid zg'
1 2  3  4 5 6 7
1 2  3  4 5 6 7
1 2 [3] 4 5 6 7
1 2  3  4 5 6 7
1 2  3  4 5 6 7
1 2  3  4 5 6 7
1 2  3  4 5 6 7
> zgPosition zg'
(2,2)
> zgSize zg'
(7,7)
> zGridToLists zg'
[[1,2,3,4,5,6,7],[1,2,3,4,5,6,7],[1,2,3,4,5,6,7],[1,2,3,4,5,6,7],[1,2,3,4,5,6,7],[1,2,3,4,5,6,7],[1,2,3,4,5,6,7]]

It works as expected. Now, how do we use this to simulate a CA?

The Comonad

A CA requires us to focus on each cell of the grid, and run a rule for the cell that depends on the neighbours of the cell. A Haskell abstraction that neatly fits this requirement is Comonad.

Comonads are duals of Monads³. We don’t need to learn everything about them for now. For our purpose, Comonad provides an interface that exactly lines up with what is needed for simulating CA:

class Functor w => Comonad w where
  extract :: w a -> a
  duplicate :: w a -> w (w a)
  extend :: (w a -> b) -> w a -> w b
  {-# MINIMAL extract, (duplicate | extend) #-}

Assuming we can make ZGrid a comonad instance, the signatures for the above functions for ZGrid Cell would be:

class Comonad ZGrid where
  extract :: ZGrid Cell -> Cell
  duplicate :: ZGrid Cell -> ZGrid (ZGrid Cell)
  extend :: (ZGrid Cell -> Cell) -> ZGrid Cell -> ZGrid Cell

For ZGrid as a CA comonad:

  • The extract function would return the current focus of the grid.
  • The duplicate function would return a grid of grids, one inner grid for each possible focus of the input grid.
  • The extend function would apply the automata rule to each possible focus of the grid, and return a new grid.

The nice part is, we need to implement only the extract and duplicate functions, and the generation of the new grid is taken care of automatically by the default implementation of the extend function. Let’s write the comonad instance for ZGrid.
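
For reference, the two are interdefinable; the default definitions in Control.Comonad look like this, which is why providing extract together with either duplicate or extend is enough:

-- Default definitions from the comonad package:
extend f  = fmap f . duplicate
duplicate = extend id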

First, we write the comonad instance for Zipper:

instance Comonad Zipper where
  extract (Zipper _ focus _) = focus
  duplicate zipper = Zipper left zipper right
    where
      pos = zPosition zipper
      left = iterateN pos zLeft $ zLeft zipper
      right = iterateN (zLength zipper - pos - 1) zRight $ zRight zipper

iterateN :: Int -> (a -> a) -> a -> [a]
iterateN n f = take n . iterate f

extract for Zipper simply returns the input zipper’s focus element.

duplicate returns a zipper of zippers, with the input zipper as its focus, and the left and right lists of zippers as variations of the input zipper with all possible focuses. Trying out the functions in GHCi gives a better idea:

> z = listToZipper [1..7] :: Zipper Int
> :t duplicate z
duplicate z :: Zipper (Zipper Int)
> mapM_ (putStrLn . pShowZipper) $ zipperToList $ duplicate z
[1] 2 3 4 5 6 7
1 [2] 3 4 5 6 7
1 2 [3] 4 5 6 7
1 2 3 [4] 5 6 7
1 2 3 4 [5] 6 7
1 2 3 4 5 [6] 7
1 2 3 4 5 6 [7]

Great! Now we use similar construction to write the comonad instance for ZGrid:

instance Comonad ZGrid where
  extract (ZGrid grid) = extract $ extract grid
  duplicate grid = ZGrid $ Zipper left focus right
    where
      (focusRowPos, focusColPos) = zgPosition grid
      (rowCount, colCount) = zgSize grid

      focus = Zipper focusLeft grid focusRight
      focusLeft = iterateN focusColPos zgLeft $ zgLeft grid
      focusRight =
        iterateN (colCount - focusColPos - 1) zgRight $ zgRight grid

      left = iterateN focusRowPos (fmap zgUp) $ fmap zgUp focus
      right =
        iterateN (rowCount - focusRowPos - 1) (fmap zgDown) $ fmap zgDown focus

It works in similar fashion:

> zg = listsToZGrid $ replicate 4 $ [0..3] :: ZGrid Int
> putStrLn $ pShowZGrid zg
[0] 1 2 3
 0  1 2 3
 0  1 2 3
 0  1 2 3
> :t duplicate zg
duplicate zg :: ZGrid (ZGrid Int)
> :t mapM_ (putStrLn . pShowZGrid) $ concat $ zGridToLists $ duplicate zg
mapM_ (putStrLn . pShowZGrid) $ concat $ zGridToLists $ duplicate zg :: IO ()

I’ve rearranged the output of running the last line of the code above for clarity:

Output of duplicate for ZGrid

We can see a grid of grids, with one inner grid focussed at each possible focus of the input grid. Now we finally implement the automaton:

zGridNeighbours :: ZGrid a -> [a]
zGridNeighbours grid =
  map snd . nubBy ((==) `on` fst) $
    [ (pos, extract grid')
      | move <- moves,
        let grid' = move grid,
        let pos = zgPosition grid',
        pos /= zgPosition grid
    ]
  where
    moves =
      [ zgUp, zgDown, zgRight, zgLeft,
        zgUp >>> zgLeft, zgUp >>> zgRight,
        zgDown >>> zgLeft, zgDown >>> zgRight
      ]

stepZGrid :: ZGrid Cell -> ZGrid Cell
stepZGrid = extend $ \grid -> rule (extract grid) (zGridNeighbours grid)

instance Grid (ZGrid Cell) where
  fromLists = listsToZGrid
  step = stepZGrid
  toLists = zGridToLists

zGridNeighbours returns the neighbour cells of the currently focussed cell of the grid. It does so by moving the focus in all eight directions, and extracting the new focuses. We also deduplicate the results by position, since moves at the edge of the grid leave the focus unchanged and would otherwise produce duplicates.

stepZGrid implements one step of the CA using the extend function of the Comonad typeclass. We call extend with a function that takes the current grid, and returns the result of running the CA rule on its focus and the neighbours of the focus.

Finally, we plug in our functions into the ZGrid Cell instance of Grid.

That’s it! Let’s compile and run the code⁴:

❯ nix-shell -p "ghc.withPackages (p: [p.massiv p.comonad])" \
      --run "ghc --make seating-system.hs -O2"
[1 of 2] Compiling Main             ( seating-system.hs, seating-system.o )
[2 of 2] Linking seating-system
❯ time ./seating-system -z input.txt
2243
        2.72 real         2.68 user         0.02 sys

I verified with the Advent of Code website that the result is correct. We also see the time elapsed, which is 2.7 seconds. That seems pretty high. Can we do better?

The Array

The problem with the zipper approach is that lists in Haskell are too slow. Some operations on them, like length, are \(O(n)\). They are also lazy in both spine and value, and build up thunks. We could switch to a different list-like data structure⁵, or cache the grid size and neighbour indices for each index to make it run faster. Or we could try an entirely different approach.

Let’s think about it for a bit. Zippers intermix two things together: the data in the grid, and the focus. When running a step of the CA, the grid data does not change when focussing on all possible focuses, only the focus itself changes. What if we separate the data from the focus? Maybe that’ll make it faster. Let’s try it out.

Let’s model the grid as a combination of a 2D array and an index into the array. We are using the arrays from the massiv library.

data AGrid a = AGrid {aGrid :: A.Array A.B A.Ix2 a, aGridFocus :: A.Ix2}
  deriving (Eq, Functor)

A.Ix2 is massiv’s way of representing an index into a 2D array, and is essentially the same as a two-tuple of Ints. A.Array A.B A.Ix2 a here means a 2D boxed array of as. massiv uses representation strategies to decide how arrays are actually represented in memory, among which are boxed, unboxed, primitive, storable, delayed, etc. Even though primitive and storable arrays are faster, we have to go with boxed arrays here because the Functor instance of A.Array exists only for boxed and delayed arrays, and boxed ones are the faster of the two for our purpose.
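
For a bit of intuition, an Ix2 is built with the (:.) constructor, row index first (a purely illustrative snippet, not part of the solution):

-- Row 1, column 2 of a 2D array; the code below pattern-matches
-- indices the same way, as (x :. y).
exampleIndex :: Ix2
exampleIndex = 1 :. 2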

It is actually massively⁶ easier to write the Comonad instance for AGrid:

instance Comonad AGrid where
  extract (AGrid grid focus) = grid A.! focus
  extend f (AGrid grid focus) =
    AGrid (A.compute $ A.imap (\pos _ -> f $ AGrid grid pos) grid) focus

The extract implementation simply looks up the element from the array at the focus index. This time, we don’t need to implement duplicate because it is easier to implement extend directly. We map with index (A.imap) over the grid, calling the function f for the variation of the grid with the index as the focus.

Next, we write the CA step:

listsToAGrid :: [[Cell]] -> AGrid Cell
listsToAGrid = A.fromLists' A.Seq >>> flip AGrid (0 :. 0)

aGridNeighbours :: AGrid a -> [a]
aGridNeighbours (AGrid grid (x :. y)) =
  [ grid A.! (x + i :. y + j)
    | i <- [-1, 0, 1],
      j <- [-1, 0, 1],
      (x + i, y + j) /= (x, y),
      validIndex (x + i, y + j)
  ]
  where
    A.Sz (rowCount :. colCount) = A.size grid
    validIndex (a, b) = and [a >= 0, b >= 0, a < rowCount, b < colCount]

stepAGrid :: AGrid Cell -> AGrid Cell
stepAGrid = extend $ \grid -> rule (extract grid) (aGridNeighbours grid)

instance Grid (AGrid Cell) where
  fromLists = listsToAGrid
  step = stepAGrid
  toLists = aGrid >>> A.toLists

listsToAGrid converts a list of lists of cells into an AGrid focussed at (0,0). aGridNeighbours finds the neighbours of the current focus of a grid by directly looking up the valid neighbour indices into the array. stepAGrid calls extract and aGridNeighbours to implement the CA step, much like the ZGrid case. And finally, we create the AGrid Cell instance of Grid.

Let’s compile and run it:

❯ rm ./seating-system
❯ nix-shell -p "ghc.withPackages (p: [p.massiv p.comonad])" \
      --run "ghc --make seating-system.hs -O2"
[2 of 2] Linking seating-system
❯ time ./seating-system -a input.txt
2243
        0.10 real         0.09 user         0.00 sys

Woah! It takes only 0.1 second this time. Can we do even better?

The Stencil

massiv has a construct called Stencil that can be used for simulating CA:

Stencil is abstract description of how to handle elements in the neighborhood of every array cell in order to compute a value for the cells in the new array.

That sounds like exactly what we need. Let’s try it out next.

With stencils, we do not need the instance of Comonad for the grid. So we can switch to the faster unboxed array representation:

newtype instance VU.MVector s Cell = MV_Char (VU.MVector s Char)
newtype instance VU.Vector Cell = V_Char (VU.Vector Char)
deriving instance VGM.MVector VU.MVector Cell
deriving instance VG.Vector VU.Vector Cell
instance VU.Unbox Cell

type SGrid a = A.Array A.U A.Ix2 a

The first five lines make Cell an instance of the Unbox typeclass. We chose to make Cell a newtype wrapper over Char because Char has an Unbox instance.

Then we define a new grid type SGrid that is a 2D unboxed array.

Now, we define the stencil and the step function for our CA:

ruleStencil :: A.Stencil A.Ix2 Cell Cell
ruleStencil = AU.makeUnsafeStencil (A.Sz (3 :. 3)) (1 :. 1) $ \_ get ->
  rule (get (0 :. 0)) $ map get neighbourIndexes
  where
    neighbourIndexes =
      [ -1 :. -1, -1 :. 0, -1 :. 1,
         0 :. -1,           0 :. 1,
         1 :. -1,  1 :. 0,  1 :. 1
      ]

stepSGrid :: SGrid Cell -> SGrid Cell
stepSGrid = A.mapStencil (A.Fill Floor) ruleStencil >>> A.computeP

instance Grid (SGrid Cell) where
  fromLists = A.fromLists' A.Seq
  step = stepSGrid
  toLists = A.toLists

We make a stencil of size 3-by-3, where the focus is at index (1,1) relative to the stencil’s top-left cell. In the callback function, we use the supplied get function to get the neighbours of the focus by using indices relative to the focus, and call rule with the cells at focus and neighbour indices.

Then we write the step function stepSGrid that maps the stencil over the grid. The A.Fill Floor argument tells massiv how to handle the border: neighbours that fall outside the grid are treated as Floor cells, which never count as occupied. Finally we put everything together in the SGrid Cell instance of Grid.

Let’s compile and run it:

❯ rm ./seating-system
❯ nix-shell -p "ghc.withPackages (p: [p.massiv p.comonad])" \
      --run "ghc --make seating-system.hs -O2"
[2 of 2] Linking seating-system
❯ time ./seating-system -s input.txt
2243
        0.08 real         0.07 user         0.00 sys

It is only a bit faster than the previous solution. But this time we have another trick up our sleeves. Did you notice the A.computeP we sneaked in there? With stencils, we can now run the step for all cells in parallel! Let’s recompile it with the right options and run it again:

❯ rm ./seating-system
❯ nix-shell -p "ghc.withPackages (p: [p.massiv p.comonad])" \
      --run "ghc --make seating-system.hs -O2 -threaded -rtsopts"
[2 of 2] Linking seating-system
❯ time ./seating-system -s input.txt +RTS -N
2243
        0.04 real         0.11 user         0.05 sys

The -threaded option enables multithreading, and the +RTS -N option makes the process use all CPU cores⁷. We get a nice speedup of 2x over the single-threaded version.

Bonus Round: Simulation Visualization

Since you’ve read the entire post, here is a bonus visualization of the CA simulation for you (warning: lots of fast blinking):

Play the simulation (interactive visualization available in the original post)

That’s it for this post! I hope you enjoyed it and took something away from it. The full code for this post is available here.


  1. The reason for using a newtype instead of a data is explained in the Stencil section.↩︎

  2. If you are unfamiliar, >>> is the left-to-right function composition function:

    f >>> g = g . f
    ↩︎
  3. This short post by Bartosz Milewski explains how comonads and monads are related.↩︎

  4. We use Nix for getting the dependency libraries.↩︎

  5. I did try a variation with Data.Sequence.Seq instead of lists, and it was twice as fast.↩︎

  6. Pun very much intended.↩︎

  7. I tried running the process with different values of N and found that -N4 gave the fastest results. So, Amdahl’s law applies here.↩︎

If you liked this post, please leave a comment.

by Abhinav Sarkar (abhinav@abhinavsarkar.net) at January 05, 2025 12:00 AM

January 04, 2025

Philip Wadler

Telnaes quits The Washington Post

Cartoonist Ann Telnaes has quit the Washington Post, after they refused to publish one of her cartoons, depicting Mark Zuckerberg (Meta), Sam Altman (Open AI), Patrick Soon-Shiong (LA Times), the Walt Disney Company (ABC News), and Jeff Bezos (Amazon & Washington Post). All that exists is her preliminary sketch, above. Why is this important? See her primer below. (Spotted via Boing Boing.)

by Philip Wadler (noreply@blogger.com) at January 04, 2025 09:41 PM

December 24, 2024

Edward Z. Yang

Ways to use torch.export

Previously, I discussed the value proposition of torch.compile. While doing so, I observed a number of downsides (long compile time, complicated operational model, lack of packaging) that were intrinsic to torch.compile's API contract, which emphasized being able to work on Python code as is, with minimal intervention from users. torch.export occupies a different spot in the tradeoff space: in exchange for more upfront work making a model exportable, it allows for use of PyTorch models in environments where using torch.compile as is would be impossible.

Enable end-to-end C++ CPU/GPU Inference

Scenario: Like before, suppose you want to deploy your model for inference. However, now you have more stringent runtime requirements: perhaps you need to do inference from a CPython-less environment (because your QPS requirements require GIL-less multithreading; alternately, CPython execution overhead is unacceptable but you cannot use CUDA graphs, e.g., due to CPU inference or dynamic shapes requirements). Or perhaps your production environment requires hermetic deploy artifacts (for example, in a monorepo setup, where infrastructure code must be continually pushed but model code should be frozen). But like before, you would prefer not to have to rewrite your model; you would like the existing model to serve as the basis for your Python-less inference binary.

What to do: Use torch.export targeting AOTInductor. This will compile the model into a self-contained shared library which can then be directly invoked from a C++ runtime. This shared library contains all of the compiler generated Triton kernels as precompiled cubins and is guaranteed not to need any runtime compilation; furthermore, it relies only on a small runtime ABI (with no CPython dependency), so the binaries can be used across versions of libtorch. AOTInductor's multithreading capability and low runtime overhead also make it a good match for CPU inference!

You don't have to go straight to C++ CPU/GPU inference: you can start with using torch.compile on your code before investing in torch.export. There are four primary extra requirements export imposes: (1) your model must compile with fullgraph=True (though you can sometimes bypass missing Dynamo functionality by using non-strict export; sometimes, it is easier to do non-strict torch.export than it is to torch.compile!), (2) your model's inputs/outputs must only be in torch.export's supported set of argument types (think Tensors in pytrees), (3) your model must never recompile--specifically, you must specify what inputs have dynamic shapes, and (4) the top-level of your model must be an nn.Module (so that export can keep track of all of the parameters your model has).

Some tips:

  • Check out the torch.export programming model. The torch.export programming model is an upcoming doc which aims to help set expectations on what can and cannot be exported. It talks about things like "Tensors are the only inputs that can actually vary at runtime" and common mistakes such as module code which modifies NN modules (not supported!) or optional input types (you will end up with an export that takes in that input or not, there is no runtime optionality).
  • Budget time for getting a model to export. With torch.compile for Python inference, you could just slap it on your model and see what happens. For torch.export, you have to actually finish exporting your entire model before you can even consider running the rest of the pipeline. For some of the more complicated models we have exported, there were often dozens of issues that had to be worked around in one way or another. And that doesn't even account for all of the post-export work you have to do, like validating the numerics of the exported model.
  • Intermediate value debugging. AOTInductor has an option to add dumps of intermediate tensor values in the compiled C++ code. This is good for determining, e.g., the first time where a NaN shows up, in case you are suspecting a miscompilation.

Open source examples: Among other things, torchchat has an example end-to-end AOTInductor setup for server-side LLM inference, which you can view in run.cpp.

torch.export specific downsides:

  • No built-in support for guard-based dispatch (multiple compilations). Earlier, I mentioned that an exported model must not have any recompiles. This leads to some fairly common patterns of code not being directly supported by torch.export: you can't export a single model that takes an enum as input, or has an optional Tensor argument, or accepts two distinct tensor shapes that need to be compiled individually. Now, technically, we could support this: you could imagine a package that contains multiple exported artifacts and dispatches between them depending on some conditions (e.g., the value of the enum, whether or not the optional Tensor argument was provided, the shape of the input tensor). But you're on your own: torch.compile will do this for you, but torch.export will not.
  • No built-in support for models that are split into multiple graphs. Similarly, we've mentioned that an exported model must be a single graph. This is in contrast to torch.compile, which will happily insert graph breaks and compile distinct islands of code that can be glued together with Python eager code. Now, technically, you can do this with export too: you can carve out several distinct subnets of your model, export them individually, and then glue them together with some custom written code on the other end (in fact, Meta's internal recommendation systems do this), but there's no built-in support for this workflow.
  • The extra requirements often don't cover important components of real world models. I've mentioned this previously as the extra restrictions export places on you, but it's worth reiterating some of the consequences of this. Take an LLM inference application: obviously, there is a core model that takes in tokens and produces logit predictions--this part of the model is exportable. But there are also important other pieces such as the tokenizer and sampling strategy which are not exportable (tokenizer because it operates on strings, not tensors; sampling because it involves complicated control flow). Arguably, it would be much better if all of these things could be directly bundled with the model itself; in practice, end-to-end applications should just expect to directly implement these in native code (e.g., as is done in torchchat). Our experience with TorchScript taught us that we don't really want to be in the business of designing a general purpose programming language that is portable across all of export's targets; better to just bet that the tokenizer doesn't change that often and eat the cost of natively integrating it by hand.
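
To make the first point above concrete, here is a minimal, hypothetical sketch of hand-rolled guard-based dispatch. The stand-in callables at the bottom only exist so the sketch runs on its own; in reality they would be your separately exported and compiled artifacts, loaded however your packaging works (e.g. aoti_load_package in recent PyTorch releases):

    import torch
    from typing import Optional

    def make_dispatcher(small_batch, large_batch, with_image):
        # Each argument is a compiled artifact that was exported for exactly
        # one case; the dispatch logic lives out here, in plain Python.
        def run(tokens: torch.Tensor, image: Optional[torch.Tensor] = None):
            if image is not None:          # optionality handled outside the export
                return with_image(tokens, image)
            if tokens.shape[0] <= 32:      # ...and so is shape-based dispatch
                return small_batch(tokens)
            return large_batch(tokens)
        return run

    # Stand-ins so this sketch is self-contained; substitute your loaded artifacts.
    linear = torch.nn.Linear(16, 8)
    run = make_dispatcher(
        small_batch=lambda t: linear(t),
        large_batch=lambda t: linear(t),
        with_image=lambda t, img: linear(t) + img.mean(),
    )
    run(torch.randn(64, 16))                    # routed to the "large batch" artifact
    run(torch.randn(8, 16), torch.randn(3, 4))  # routed to the "with image" artifact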

AOTInductor specific downsides:

  • You still need libtorch to actually run the model. Although AOTInductor binaries bundle most of their compiled kernel implementation, they still require a minimal runtime that can offer basic necessities such as tensor allocation and access to custom operators. There is not yet an official offering of an alternative, lightweight implementation of the stable ABI that AOTInductor binaries depend on, so if you do want to deploy AOTInductor binaries you will typically have to bring libtorch along as well. This is usually not a big deal server side, but it can be problematic if you want to do client side deployments!
  • No CUDA graphs support. This one is not such a big deal since you are much less likely to be CPU bound when the host side logic is all compiled C++, but there's no support for CUDA graphs in AOTInductor. (Funnily enough, this is also something you technically can orchestrate from outside of AOTInductor.)

Edge deployment

Scenario: You need to deploy your PyTorch model to edge devices (e.g., a mobile phone or a wearable device) where computational resources are limited. You have requirements that are a bit different from the server side: you care a lot more about minimizing binary size and startup time. Traditional PyTorch deployment with full libtorch won't work. The device you're deploying to might also have some strange extra processors, like a DSP or NPU, that you want your model to target.

What to do: Use torch.export targeting Executorch. Among other things, Executorch offers a completely separate runtime for exported PyTorch programs (i.e., it has no dependency on libtorch, except perhaps a few headers which we share between the projects) which was specifically designed for edge deployment. (Historical note: we spent a long time trying to directly ship a stripped down version of libtorch to mobile devices, but it turns out it's really hard to write code that is portable across server and client, so it's better to only share when absolutely necessary.) Quantization is also a pretty important part of edge deployment, and Executorch incorporates this into the end-to-end workflow.
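
For orientation, the happy path looks roughly like the sketch below. The ExecuTorch Python API was still evolving at the time of writing, so treat the exact entry points here (to_edge, to_executorch, the .pte output) as assumptions to check against the current ExecuTorch docs; the model is hypothetical:

    import torch
    from torch.export import export
    from executorch.exir import to_edge   # assumed ExecuTorch API; check your version

    class Tiny(torch.nn.Module):
        def forward(self, x):
            return torch.sigmoid(x * 2.0)

    ep = export(Tiny(), (torch.randn(8),))
    edge = to_edge(ep)                     # lower to the edge dialect
    et_program = edge.to_executorch()      # backend delegation/quantization hook in here
    with open("tiny.pte", "wb") as f:      # .pte is what the ExecuTorch runtime loads
        f.write(et_program.buffer)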

Open source examples: torchchat also has an Executorch integration letting you run an LLM on your Android phone.

Downsides. All of the export related downsides described previously apply here. But here's something to know specifically about Executorch:

  • The edge ecosystem is fragmented. At time of writing, there are seven distinct backends Executorch can target. This is not really Executorch's fault, it comes with the territory--but I want to call it out because it stands in stark contrast to NVIDIA's server-side hegemony. Yes, AMD GPUs are a thing, and various flavors of CPU are real, but it really is a lot easier to be focused on the server side, because NVIDIA GPUs come first.

Pre-compiled kernels for eager mode

Scenario: You need a new function or self-contained module with an efficient kernel implementation. However, you would prefer not to have to write the CUDA (or even Triton) by hand; the kernel is something that torch.compile can generate from a higher-level PyTorch implementation. At the same time, however, you cannot tolerate just-in-time compilation at all (perhaps you are doing a massive training job, and any startup latency makes it more likely that one of your nodes will fail during startup and then you make no progress at all; or maybe you just find it annoying when PyTorch goes out to lunch whenever you miss the cache).

What to do: Use torch.export targeting AOTInductor, and then load and run the AOTInductor generated binary from Python.
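
A rough sketch of that flow, assuming a recent PyTorch. The packaging entry points under torch._inductor have moved between releases, so treat the two calls below as assumptions and check the AOTInductor docs for your version; the module is hypothetical:

    import torch
    from torch.export import export
    # NB: these entry points have shifted between releases; verify against your version.
    from torch._inductor import aoti_compile_and_package, aoti_load_package

    class FusedOp(torch.nn.Module):
        def forward(self, x, y):
            # the kind of pointwise chain Inductor will happily fuse into one kernel
            return torch.nn.functional.gelu(x * y + 1.0)

    args = (torch.randn(1024), torch.randn(1024))
    ep = export(FusedOp(), args)

    package_path = aoti_compile_and_package(ep)  # all compilation happens now, ahead of time
    compiled = aoti_load_package(package_path)   # cheap at startup: no JIT in sight
    out = compiled(*args)                        # call it from eager code like any function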

Downsides. So, we know this use case works, because we have internally used it to unblock people who wanted to use Triton kernels but could not tolerate Triton's just-in-time compilation. But there's not much affordance in our APIs for this use case; for example, guard-based dispatch is often quite useful for compiled functions, but you'll have to roll that by hand. More generally, when compiling a kernel, you have to make tradeoffs about how static versus dynamic the kernel should be (for example, will you force the inputs to be evenly divisible by eight, or will you have a separate kernel for the divisible and non-divisible cases?). Once again, you're on your own for making the call there.

An exchange format across systems

Scenario: In an ideal world, you would have a model, you could export it to an AOTInductor binary, and then be all done. In reality, maybe this export process needs to be a multi-stage process, where it has to be processed to some degree on one machine, and then finish processing on another machine. Or perhaps you need to shift the processing over time: you want to export a model to freeze it (so it is no longer tied to its original source code), and then repeatedly run the rest of the model processing pipeline on this exported program (e.g., because you are continuously updating its weights and then reprocessing the model). Maybe you want to export the model and then train it from Python later, committing to a distributed training strategy only when you know how many nodes you are running. The ability to hermetically package a model and then process it later is one of the big value propositions of TorchScript and torch.package.

What to do: Use torch.export by itself, potentially using pre-dispatch if you need to support training use-cases. torch.export produces an ExportedProgram which has a clean intermediate representation that you can do processing on, or just serialize and then do processing on later.
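
As a small sketch of what that looks like (torch.export.save/load are the serialization entry points at the time of writing, though they are still marked unstable; the module is hypothetical):

    import torch
    from torch.export import export, save, load

    class Tiny(torch.nn.Module):
        def forward(self, x):
            return torch.sin(x) + 1.0

    ep = export(Tiny(), (torch.randn(8),))

    # The IR is a graph of canonical ATen ops you can inspect or transform now...
    for node in ep.graph.nodes:
        print(node.op, node.target)

    # ...or serialize and hand off to a later stage of the pipeline.
    save(ep, "tiny.pt2")
    ep2 = load("tiny.pt2")
    print(ep2.module()(torch.randn(8)))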

Downsides:

  • Custom operators are not packaged. A custom operator typically refers to some native code which was linked with PyTorch proper. There's no way to extract that kernel and embed it into the exported program so that there is no dependence; instead, you're expected to ensure the eventual runtime relinks against the same custom operator. Note that this problem doesn't apply to user-defined Triton kernels, as export can simply compile them and package the binaries directly into the exported product. (Technically, this applies to AOTInductor too, but it tends to be much more of a problem for use cases which are primarily about freezing rapidly evolving model code, as opposed to plain inference where you would simply expect people not to change custom operators willy-nilly.)
  • Choose your own decompositions. Export produces IR that only contains operators from a canonical operator set. However, the default choice is sometimes inappropriate for particular use cases (e.g., some users want aten.upsample_nearest2d.vec to be decomposed while others do not), so in practice, for any given target, you may have a bespoke operator set that is appropriate for that use case. Unfortunately, it can be fiddly getting your operator set quite right, and while we've talked about ideas like a "build your own operator set" interactive tool, these have not been implemented yet.
  • Annoyingly large FC/BC surface. Something I really like about AOTInductor is that it has a very small FC/BC surface: I only need to make sure I don't make breaking changes to the C ABI, and I'm golden. With export IR, the FC/BC surface is all of the operators produced by export. Even a decomposition is potentially BC breaking: a downstream pass could be expecting to see an operator that no longer exists because I've decomposed it into smaller pieces. Matters get worse in pre-dispatch export, since the scope of APIs used inside export IR expands to include autograd control operators (e.g., torch.no_grad) as well as tensor subclasses (since Tensor subclasses cannot be desugared if we have not yet eliminated autograd). We will not break your AOTInductor blobs. We can't as easily give the same guarantee for the IR here.

Next time: What's missing, and what we're doing about it

by Edward Z. Yang at December 24, 2024 04:28 AM

December 23, 2024

Michael Snoyman

A secure Bitcoin self custody strategy

Up until this year, my Bitcoin custody strategy was fairly straightforward, and likely familiar to other hodlers:

  • Buy a hardware wallet
  • Put the seed phrase on steel plates
  • Secure those steel plates somewhere on my property

But in October of last year, the situation changed. I live in Northern Israel, close to the Lebanese border. The past 14 months have involved a lot of rocket attacks, including destruction of multiple buildings in my home town. This brought into question how to properly secure my sats. Importantly, I needed to balance two competing goals:

  1. Resiliency of the saved secrets against destruction. In other words: make sure I didn't lose access to the wallet.
  2. Security against attackers trying to steal those secrets. In other words: make sure no one else got access to the wallet.

I put some time into designing a solution to these conflicting goals, and would like to share some thoughts for others looking to improve their BTC custody strategy. And if anyone has any recommendations for improvements, I'm all ears!

Goals

  • Self custody I didn't want to rely on an external custody company. Not your keys, not your coins.
  • Full access I always maintain full access to my funds, without relying on any external party.
  • Computer hack resilient If my computer systems are hacked, I will not lose access to or control of my funds (neither stolen nor lost).
  • Physical destruction resilient If my hardware device and steel plates are both destroyed (as well as anything else physically located in my home town), I can still recover my funds.
  • Will survive me If I'm killed, I want my wife, children, or other family members to be able to recover and inherit my BTC.

Multisig

The heart of this protection mechanism is a multisig wallet. Unfortunately, interfaces for setting up multisig wallets are tricky. I'll walk through the basics and then come back to how to set it up.

The concept of a multisig is that your wallet is protected by multiple signers. Each signer can be any "normal" wallet, e.g. a software or hardware wallet. You choose a number of signers and a threshold of signers required to perform a transaction.

For example, a 2 of 2 multisig would mean that 2 wallets can sign transactions, and both of them need to sign to make a valid transaction. A 3 of 5 would mean 5 total signers, any 3 of them being needed to sign a transaction.

For my setup, I set up a 2 of 3 multisig, with the 3 signers being a software wallet, a hardware wallet, and a SLIP39 wallet. Let's go through each of those, explain how they work, and then see how the solution addresses the goals.

Software wallet

I set up a software wallet and saved the seed phrase in a dedicated password manager account using Bitwarden. Bitwarden offers an emergency access feature, which essentially means a trusted person can be listed as an emergency contact and can recover your account. The process includes a waiting period, during which the account owner can reject the request.

Put another way: Bitwarden is offering a cryptographically secure, third party hosted, fully managed, user friendly dead-man switch. Exactly what I needed.

I added a select group of trusted people as the recoverers on the account. Otherwise, I keep the account securely locked down in Bitwarden and can use it for signing when necessary.

Let's see how this stacks up against the goals:

  • Self custody Check, no reliance on anyone else
  • Full access Check, I have access to the wallet at all times
  • Computer hack resilient Fail, if my system is hacked, I lose control of the wallet
  • Physical destruction resilient Check, Bitwarden lives beyond my machines
  • Will survive me Check thanks to the dead-man switch

Hardware wallet

Not much to say about the hardware wallet setup that I haven't said already. Let's do the goals:

  • Self custody Check, no reliance on anyone else
  • Full access Check, I have access to the wallet at all times
  • Computer hack resilient Check, the private keys never leave the hardware device
  • Physical destruction resilient Fail, the wallet and plates could easily be destroyed, and the plates could easily be stolen. (The wallet could be stolen too, but thanks to the PIN mechanism would theoretically be resistant to compromise. But that's not a theory I'd want to bet my wealth on.)
  • Will survive me Check, anyone can take my plates and recover the wallet

SLIP39

This one requires a bit of explanation. SLIP39 is a not-so-common standard for taking some data and splitting it up into a number of shards. You can define the threshold of shards necessary to reconstruct the original secret. This uses an algorithm called Shamir's Secret Sharing. (And yes, it is very similar in function to multisig, but implemented differently).
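
For intuition, here is a toy sketch of the k-of-n threshold idea behind Shamir's Secret Sharing, written over a prime field in plain Python. This is emphatically not SLIP39 itself (which works over GF(256) and adds mnemonic words and checksums), and the 4-of-7 split is purely an illustrative choice, not a description of my actual setup:

    import secrets

    PRIME = 2**521 - 1  # a prime comfortably larger than a 128-bit secret

    def split(secret: int, n: int, k: int):
        """Split secret into n shares such that any k of them reconstruct it."""
        # random polynomial of degree k-1 whose constant term is the secret
        coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
        def eval_at(x):
            acc = 0
            for c in reversed(coeffs):
                acc = (acc * x + c) % PRIME
            return acc
        return [(x, eval_at(x)) for x in range(1, n + 1)]

    def combine(shares):
        """Reconstruct the secret via Lagrange interpolation at x = 0."""
        secret = 0
        for xi, yi in shares:
            num, den = 1, 1
            for xj, _ in shares:
                if xj != xi:
                    num = (num * -xj) % PRIME
                    den = (den * (xi - xj)) % PRIME
            secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
        return secret

    secret = secrets.randbits(128)      # e.g. 128 bits of entropy, as in the guide below
    shares = split(secret, n=7, k=4)    # 7 holders, any 4 of whom can reconstruct
    assert combine(shares[:4]) == secret
    assert combine(shares[2:6]) == secret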

The idea here is that this wallet is controlled by a group of friends and family members. Without getting into my actual setup, I could choose 7 very trusted individuals from all over the world and tell them that, should I contact them and ask, they should send me their shards so I can reconstruct that third wallet. And to be especially morbid, they also know the identity of some backup people in the event of my death.

In any event, the idea is that if enough of these people agree to, they can reconstruct the third wallet. The assumption is that these are all trustworthy people. But even with trustworthy people, (1) I could be wrong about how trustworthy they are, or (2) they could be coerced or tricked. So let's see how this security mechanism stands up:

  • Self custody Fail, I'm totally reliant on others.
  • Full access Fail, by design I don't keep this wallet myself, so I must rely on others.
  • Computer hack resilient Check, the holders of these shards keep them in secure, offline storage.
  • Physical destruction resilient Check (sort of), since the probability of all copies being destroyed or stolen is negligible.
  • Will survive me Check, by design

Comparison against goals

We saw how each individual wallet stacked up against the goals. How about all of them together? Well, there are certainly some theoretical ways I could lose the funds, e.g. my hardware wallet and plates are destroyed and a majority of shard holders for the SLIP39 lost their shards. However, if you look through the check/fail lists, every category has at least two checks. Meaning: on all dimensions, if some catastrophe happens, at least two of the wallets should survive.

Now the caveats (I seem to like that word). I did a lot of research on this, and this is at least tangential to my actual field of expertise. But I'm not a dedicated security researcher, and can't really claim full, deep understanding of all these topics. So if I made any mistakes here, please let me know.

How-to guide

OK, so how do you actually get a system like this running? I'll give you my own step-by-step guide. Best case scenario for all this: download all the websites and programs mentioned onto a fresh Linux system install, disconnect the internet, run the programs and copy down any data as needed, and then wipe the system again. (Or, alternatively, do all the actions from a Live USB session.)

  1. Set up the SLIP39. You can use an online generator. Choose the number of bits of entropy (IMO 128bit is sufficient), choose the total shares and threshold, and then copy down the phrases.
  2. Generate the software wallet. You can use a sister site to the SLIP39 generator. Choose either 12 or 24 words, and write those words down. On a different, internet-connected computer, you can save those words into a Bitwarden account, and set it up with appropriate emergency access.
  3. Open up Electrum. (Other wallets, like Sparrow, probably work for this too, but I've only done it with Electrum.) The rest of this section will include a step-by-step guide through the Electrum steps. And yes, I took these screenshots on a Mac, but for a real setup use a Linux machine.

Set up a new wallet. Enter a name (doesn't matter what) and click next.

New wallet

Choose a multisig wallet and click next.

Multisig

Choose 3 cosigners and require 2 signatures.

Signer count

Now we're going to enter all three wallets. The first one will be your hardware device. Click next, then follow all the prompts to set it up.

Hardware

After a few screens (they'll be different based on your choice of hardware device), you'll be prompted to select a derivation path. Use native segwit and the standard derivation path.

segwit

This next screen was the single most complicated for me, simply because the terms were unclear. First, you'll see a Zpub string displayed as a "master public key," e.g.:

Zpub75J9cLwa3iX1zB2oiTdvGDf4EyHWN1ZYs5gVt6JSM9THA6XLUoZhA4iZwyruCKHpw8BFf54wbAK6XdgtMLa2TgbDcftdsietCuKQ6eDPyi6

You need to write this down. It's the same as an xpub, but for multisig wallets. This represents all the possible public keys for your hardware wallet. Putting together the three Zpub values will allow your software of choice to generate all the receiving and change addresses for your new wallet. You'll need all three, so don't lose them! But on their own, they cannot be used to access your funds. Therefore, treat them with "medium" security: backing them up in Bitwarden alongside your software wallet is a good idea, and you could even send them to some friends as an extra backup, just in case.

And that explanation brings us back to the three choices on the screen. You can choose to either enter a cosigner key, a cosigner seed, or use another hardware wallet. The difference between key and seed is that the former is public information only, whereas the latter is full signing power. Often, multisig wallets are set up by multiple different people, and so instead of sharing the seed with each other (a major security violation), they each generate a seed phrase and only share the key with each other.

However, given that you're setting up the wallet with access to all seed phrases, and you're doing it on an airgapped device, it's safe to enter the seed phrases directly. And I'd recommend it, to avoid the risk of generating the wrong master key from a seed. So go ahead and choose "enter cosigner seed" and click next.

Add cosigner 2

And now onto the second most confusing screen. I copied my seed phrase into this text box, but it won't let me continue!

Cannot continue

The trick is that Electrum, by default, uses its own concept of seed phrases. You need to click on "Options" and then choose BIP39, and then enter your seed phrase.

BIP39

Continue through the other screens until you're able to enter the final seed. This time, instead of choosing BIP39, choose SLIP39. You'll need to enter enough of the SLIP39 shards to meet the threshold.

SLIP39

And with that, you can continue through the rest of the screens, and you'll now have a fully operational multisig!

Addresses

Open up Electrum again on an internet-connected computer. This time, connect the hardware wallet as before, enter the BIP39 as before, but for the SLIP39, enter the master key instead of the SLIP39 seed phrase. This will ensure that no internet connected device ever has both the software wallet and SLIP39 at the same time. You should confirm that the addresses on the airgapped machine match the addresses on the internet connected device.

If so, you're ready for the final test. Send a small amount of funds into the first receiving address, and then use Electrum on the internet connected device to (1) confirm in the history that it arrived and (2) send it back to another address. You should be asked to sign with your hardware wallet.

If you made it this far, congratulations! You're the proud owner of a new 2 of 3 multisig wallet.

Conclusion

I hope the topic of death and war wasn't too terribly morbid for others. But these are important topics to address in our world of self custody. I hope others found this useful. And once again, if anyone has recommendations for improvements to this setup, please do let me know!

December 23, 2024 12:00 AM

December 22, 2024

Haskell Interlude

60: Tom Ellis

Tom Ellis works at Groq, using Haskell to compile AI models to specialized hardware.  In this episode, we talk about stability of both GHC and Haskell libraries, effects, and strictness, and the premise of functional programming: make invalid states and invalid *laziness* unrepresentable! 

by Haskell Podcast at December 22, 2024 06:00 PM

December 21, 2024

Philip Wadler

Please submit to Lambda Days

 


I'm part of the programme committee for Lambda Days, and I’m personally inviting you to submit your talk!

Lambda Days is all about celebrating the world of functional programming, and we’re eager to hear about your latest ideas, projects, and discoveries. Whether it’s functional languages, type theory, reactive programming, or something completely unexpected—we want to see it!

🎯 Submission Deadline: 9 February 2025
🎙️ Never spoken before? No worries! We’re committed to supporting speakers from all backgrounds, especially those from underrepresented groups in tech.

Submit your talk and share your wisdom with the FP community.

👉 https://www.lambdadays.org/lambdadays2025#call-for-talks

by Philip Wadler (noreply@blogger.com) at December 21, 2024 07:56 PM

December 18, 2024

Michael Snoyman

Normal People Shouldn't Invest

The world we live in today is inflationary. Through the constant increase in the money supply by governments around the world, the purchasing power of any dollars (or other government money) sitting in your wallet or bank account will go down over time. To simplify massively, this leaves people with three choices:

  1. Keep your money in fiat currencies and earn a bit of interest. You’ll still lose purchasing power over time, because inflation virtually always beats interest, but you’ll lose it more slowly.
  2. Try to beat inflation by investing in the stock market and other risk-on investments.
  3. Recognize that the game is slanted against you, don’t bother saving or investing, and spend all your money today.

(Side note: if you’re reading this and screaming at your screen that there’s a much better option than any of these, I’ll get there, don’t worry.)

High living and melting ice cubes

Option 3 is what we’d call “high time preference.” It means you value the consumption you can have today over the potential savings for the future. In an inflationary environment, this is unfortunately a very logical stance to take. Your money is worth more today than it will ever be later. May as well live it up while you can. Or as Milton Friedman put it, engage in high living.

But let’s ignore that option for the moment, and pursue some kind of low time preference approach. Despite the downsides, we want to hold onto our wealth for the future. The first option, saving in fiat, would work with things like checking accounts, savings accounts, Certificates of Deposit (CDs), government bonds, and perhaps corporate bonds from highly rated companies. There’s little to no risk in those of losing your original balance or the interest (thanks to FDIC protection, a horrible concept I may dive into another time). And the downside is also well understood: you’re still going to lose wealth over time.

Or, to quote James from InvestAnswers, you can hold onto some melting ice cubes. But with sufficient interest, they’ll melt a little bit slower.

The investment option

With that option sitting on the table, many people end up falling into the investment bucket. If they’re more risk-averse, it will probably be a blend of both risk-on stock investment and risk-off fiat investment. But ultimately, they’re left with some amount of money that they want to put into a risk-on investment. The only reason they’re doing that is on the hopes that between price movements and dividends, the value of their investment will grow faster than anything else they can choose.

You may be bothered by my phrasing. “The only reason.” Of course that’s the only reason! We only put money into investments in order to make more money. What other possible reason exists?

Well, the answer is that while we invest in order to make money, that’s not the only reason. That would be like saying I started a tech consulting company to make money. Yes, that’s a true reason. But the purpose of the company is to meet a need in the market: providing consulting services. Like every economic activity, starting a company has a dual purpose: making a profit, but by providing actual value.

So what actual value is generated for the world when I choose to invest in a stock? Let’s rewind to real investment, and then we’ll see how modern investment differs.

Michael (Midas) Mulligan

Let’s talk about a fictional character, Michael Mulligan, aka Midas. In Atlas Shrugged, he’s the greatest banker in the country. He created a small fortune for himself. Then, using that money, he very selectively invested in the most promising ventures. He put his own wealth on the line because he believed each of those ventures had a high likelihood to succeed.

He wasn’t some idiot who jumps on his CNBC show to spout nonsense about which stocks will go up and down. He wasn’t a venture capitalist who took money from others and put it into the highest-volatility companies hoping that one of them would 100x and cover the massive losses on the others. He wasn’t a hedge fund manager who bets everything on financial instruments so complex he can’t understand them, knowing that if it crumbles, the US government will bail him out.

And he wasn’t a normal person sitting in his house, staring at candlestick charts, hoping he can outsmart every other person staring at those same charts by buying in and selling out before everyone else.

No. Midas Mulligan represented the true gift, skill, art, and value of real investment. In the story, we find out that he was the investor who got Hank Rearden off the ground. Hank Rearden uses that investment to start a steel empire that drives the country, and ultimately that powers his ability to invest huge amounts of his new wealth into research into an even better metal that has the promise to reshape the world.

That’s what investment is. And that’s why investment has such a high reward associated with it. It’s a massive gamble that may produce untold value for society. The effort necessary to determine the right investments is high. It’s only right that Midas Mulligan be well compensated for his work. And by compensating him well, he’ll have even more money in the future to invest in future projects, creating a positive feedback cycle of innovation and improvements.

Michael (Crappy Investor) Snoyman

I am not Midas Mulligan. I don’t have the gift to choose the winners in newly emerging markets. I can’t sit down with entrepreneurs and guide them to the best way to make their ideas thrive. And I certainly don’t have the money available to make such massive investments, much less the psychological profile to handle taking huge risks with my money like that.

I’m a low time preference individual by my upbringing, plus I am very risk-averse. I spent most of my adult life putting money into either the house I live in or into risk-off assets. I discuss this background more in a blog post on my current investment patterns. During the COVID-19 money printing, I got spooked about this, realizing that the melting ice cubes were melting far faster than I had ever anticipated. It shocked me out of my risk-averse nature, realizing that if I didn’t take a more risky stance with my money, ultimately I’d lose it all.

So like so many others, I diversified. I put money into stock indices. I realized the stock market was risky, so I diversified further. I put money into various cryptocurrencies too. I learned to read candlestick charts. I made some money. I felt pretty good.

I started feeling more confident overall, and started trying to predict the market. I fixated on this. I was nervous all the time, because my entire wealth was on the line constantly.

And it gets even worse. In economics, we have the concept of an opportunity cost. If I invest in company ABC and it goes up 35% in a month, I’m a genius investor, right? Well, if company DEF went up 40% that month, I can just as easily kick myself for losing out on the better opportunity. In other words, once you’re in this system, it’s a constant rat race to keep finding the best possible returns, not simply being happy with keeping your purchasing power.

Was I making the world a better place? No, not at all. I was just another poor soul trying to do a better job of entering and exiting a trade than the next guy. It was little more than gambling at a casino.

And yes, I ultimately lost a massive amount of money through this.

Normal people shouldn’t invest

Which brings me to the title of this post. I don’t believe normal people should be subjected to this kind of investment. It’s an extra skill to learn. It’s extra life stress. It’s extra risk. And it doesn’t improve the world. You’re being rewarded—if you succeed at all—simply for guessing better than others.

(Someone out there will probably argue efficient markets and that having everyone trading stocks like this does in fact add some efficiencies to capital allocation. I’ll give you a grudging nod of agreement that this is somewhat true, but not sufficiently enough to justify the returns people anticipate from making “good” gambles.)

The only reason most people ever consider this is because they feel forced into it; otherwise they’ll simply be sitting on their melting ice cubes. But once they get into the game, between risk, stress, and time investment, their lives will often get worse.

One solution is to not be greedy. Invest in stock market indices, don’t pay attention to day-to-day price, and assume that the stock market will continue to go up over time, hopefully beating inflation. And if that’s the approach you’re taking, I can honestly say I think you’re doing better than most. But it’s not the solution I’ve landed on.

Option 4: deflation

The problem with all of our options is that they are built in a broken world. The fiat/inflationary world is a rigged game. You’re trying to walk up an escalator that’s going down. If you try hard enough, you’ll make progress. But the system is against you. This is inherent to the design. The inflation in our system is so that central planners have the undeserved ability to appropriate productive capacity in the economy to do whatever they want with it. They can use it to fund government welfare programs, perform scientific research, pay off their buddies, and fight wars. Whatever they want.

If you take away their ability to print money, your purchasing power will not go down over time. In fact, the opposite will happen. More people will produce more goods. Innovators will create technological breakthroughs that will create better, cheaper products. Your same amount of money will buy more in the future, not less. A low time preference individual will be rewarded. By setting aside money today, you’re allowing productive capacity today to be invested into building a stronger engine for tomorrow. And you’ll be rewarded by being able to claim a portion of that larger productive pie.

And to reiterate: in today’s inflationary world, if you defer consumption and let production build a better economy, you are punished with reduced purchasing power.

So after burying the lead so much, my option 4 is simple: Bitcoin. It’s not an act of greed, trying to grab the most quickly appreciating asset. It’s about putting my money into a system that properly rewards low time preference and saving. It’s admitting that I have no true skill or gift to the world through my investment capabilities. It’s recognizing that I care more about destressing my life and focusing on things I’m actually good at than trying to optimize an investment portfolio.

Can Bitcoin go to 0? Certainly, though year by year that likelihood is becoming far less likely. Can Bitcoin have major crashes in its price? Absolutely, but I’m saving for the long haul, not for a quick buck.

I’m hoping for a world where deflation takes over. Where normal people don’t need to add yet another stress and risk to their life, and saving money is the most natural, safest, and highest-reward activity we can all do.

Further reading

December 18, 2024 12:00 AM

December 16, 2024

GHC Developer Blog

GHC 9.12.1 is now available

GHC 9.12.1 is now available

Zubin Duggal - 2024-12-16

The GHC developers are very pleased to announce the release of GHC 9.12.1. Binary distributions, source distributions, and documentation are available at downloads.haskell.org.

We hope to have this release available via ghcup shortly.

GHC 9.12 will bring a number of new features and improvements, including:

  • The new language extension OrPatterns allowing you to combine multiple pattern clauses into one.

  • The MultilineStrings language extension to allow you to more easily write strings spanning multiple lines in your source code.

  • Improvements to the OverloadedRecordDot extension, allowing the built-in HasField class to be used for records with fields of unlifted representations.

  • The NamedDefaults language extension has been introduced allowing you to define defaults for typeclasses other than Num.

  • More deterministic object code output, controlled by the -fobject-determinism flag, which greatly improves the determinism of builds (though does not yet make them fully deterministic) at the cost of some compiler performance (1-2%). See #12935 for the details.

  • GHC now accepts type syntax in expressions as part of GHC Proposal #281.

  • The WASM backend now has support for TemplateHaskell.

  • Experimental support for the RISC-V platform with the native code generator.

  • … and many more

A full accounting of changes can be found in the release notes. As always, GHC’s release status, including planned future releases, can be found on the GHC Wiki status.

We would like to thank GitHub, IOG, the Zw3rk stake pool, Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, the Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprise this release.

As always, do give this release a try and open a ticket if you see anything amiss.

by ghc-devs at December 16, 2024 12:00 AM

December 12, 2024

Stackage Blog

LTS 23 release for ghc-9.8 and Nightly now on ghc-9.10

Stackage LTS 23 has been released

The Stackage team is happy to announce that Stackage LTS version 23 has finally been released a couple of days ago, based on GHC stable version 9.8.4. It follows on from the LTS 22 series which was the longest lived LTS major release to date (with probable final snapshot lts-22.43).

We are dedicating the LTS 23 release to the memory of Chris Dornan, who left this world suddenly and unexpectedly around the end of May. We are indebted to Christopher for his many years of wide Haskell community service, including also being one of the Stackage Curators up until the time he passed away. He is warmly remembered.

LTS 23 includes many package changes, and almost 3200 packages! Thank you for all your nightly contributions that made this release possible: the initial release was prepared by Jens Petersen. (The closest nightly snapshot to lts-23.0 is nightly-2024-12-09, but lts-23 is just ahead of it with pandoc-3.6.)

If your package is missing from LTS 23 and can build there, you can easily have it added by opening a PR in lts-haskell to the build-constraints/lts-23-build-constraints.yaml file.

Stackage Nightly updated to ghc-9.10.1

At the same time we are excited to move Stackage Nightly to GHC 9.10.1: the initial snapshot release is nightly-2024-12-11. Current nightly has over 2800 packages, and we expect that number to grow over the coming weeks and months: we welcome your contributions and help with this. This initial release build was made by Jens Petersen (64 commits).

Most of our upper bounds were dropped for this rebase, so quite a lot of packages had to be disabled. You can see all the changes made relative to the last 9.8 nightly snapshot. Apart from trying to build yourself, the easiest way to understand why particular packages are disabled is to look for their < 0 lines in build-constraints.yaml, particularly under the "Library and exe bounds failures" section. We also have some tracking issues still open related to 9.10 core boot libraries.

Thank you to all those who have already done work updating their packages for ghc-9.10.

Adding or enabling your package for Nightly is just a simple pull request to the large build-constraints.yaml file.

If you have questions, you can ask in the Stack and Stackage Matrix room (#haskell-stack:matrix.org) or the Slack channel.

December 12, 2024 07:00 AM

December 11, 2024

Haskell Interlude

Episode 59: Harry Goldstein

Sam and Wouter interview Harry Goldstein, a researcher in property-based testing who works in PL, SE, and HCI. In this episode, we reflect on random generators, the find-a-friend model, interdisciplinary research, and how to have impact beyond your own research community.

by Haskell Podcast at December 11, 2024 02:00 PM

Philip Wadler

John Longley's Informatics Lecturer Song

From my colleague, John Longley, a treat. 

‘Informatics Lecturer Song 

(Based on Gilbert and Sullivan’s ‘Major General song’) 

John Longley 

I am the very model of an Informatics lecturer,
For educating students you will never find a betterer.
I teach them asymptotics with a rigour that’s impeccable,
I’ll show them how to make their proofs mechanically checkable.
On parsing algorithms I can hold it with the best of them,
With LL(1) and CYK and Earley and the rest of them.
I’ll teach them all the levels of the Chomsky hierarchy…
With a nod towards that Natural Language Processing malarkey.

I’ll summarize the history of the concept of a function,
And I’ll tell them why their Haskell code is ‘really an adjunction’.
In matters mathematical and logical, etcetera,
I am the very model of an Informatics lecturer.

For matters of foundations I’m a genuine fanaticker:
I know by heart the axioms of Principia Mathematica,
I’m quite au fait with Carnap and with Wittgenstein’s Tractatus,
And I’ll dazzle you with Curry, Church and Turing combinators.
I’ll present a proof by Gödel with an algebraic seasoning,
I’ll instantly detect a step of non-constructive reasoning.
I’ll tell if you’re a formalist or logicist or Platonist…
For I’ll classify your topos by the kinds of objects that exist.

I’ll scale the heights of cardinals from Mahlo to extendible,
I’ll find your favourite ordinals and stick them in an n-tuple.
In matters philosophical, conceptual, etcetera,
I am the very essence of an Informatics lecturer.

And right now I’m getting started on my personal computer,
I’ve discovered how to get it talking to the Wifi router.
In Internet and World Wide Web I’ve sometimes had my finger dipped,
And once I wrote a line of code in HTML/Javascript.
[Sigh.] I know I have a way to go to catch up with my students,
But I try to face each lecture with a dash of common prudence.
When it comes to modern tech: if there’s a way to get it wrong, I do!
But that seems to be forgiven if I ply them with a song or two.

So… although my present IT skills are rather rudimentary,
And my knowledge of computing stops around the nineteenth century,
Still, with help from all my colleagues and my audience, etcetera…
I’ll be the very model of an Informatics lecturer.


by Philip Wadler (noreply@blogger.com) at December 11, 2024 11:52 AM

December 10, 2024

Chris Smith 2

When is a call stack not a call stack?

Tom Ellis, who I have the privilege of working with at Groq, has an excellent article up about using HasCallStack in embedded DSLs. You should read it. If you don’t, though, the key idea is that HasCallStack isn’t just about exceptions: you can use it to get source code locations in many different contexts, and storing call stacks with data is particularly powerful in providing a helpful experience to programmers.

Seeing Tom’s article reminded me of a CodeWorld feature which was implemented long ago, but I’m excited to share again in this brief note.

CodeWorld Recap

If you’re not familiar with CodeWorld, it’s a web-based programming environment I created mainly to teach mathematics and computational thinking to students in U.S. middle school, ages around 11 to 14 years old. The programming language is based on Haskell — well, it is technically Haskell, but with a lot of preprocessing and tricks aimed at smoothing out the rough edges. There’s also a pure Haskell mode, giving you the full power of the idiomatic Haskell language.

In CodeWorld, the standard library includes primitives for putting pictures on the screen. This includes:

  • A few primitive pictures: circles, rectangles, and the like
  • Transformations to rotate, translate, scale, clip, and recolor an image
  • Compositions to overlay and combine multiple pictures into a more complex picture.

Because the environment is functional and declarative — and this will be important — there isn’t a primitive to draw a circle. There is a primitive that represents the concept of a circle. You can include a circle in your drawing, of course, but you compose a picture by combining simpler pictures declaratively, and then draw the whole thing only at the very end.

Debugging in CodeWorld

CodeWorld’s declarative interface enables a number of really fun kinds of interactivity… what programmers might call “debugging”, but which, for my younger audience, I view as exploratory tools: ways they can pry open the lid of their program and explore what it’s doing.

There are a few of these that are pretty awesome. Lest I seem to be claiming the credit, the implementation for these features is due to two students in Summer of Haskell and then in Google Summer of Code: Eric Roberts, and Krystal Maughan.

  • Not the point here, but there are some neat features for rewinding and replaying programs, zooming in, etc.
  • There’s also an “inspect” mode, in which you not only see the final result, but the whole structure of the resulting picture (e.g., maybe it’s an overlay of three other pictures: a background and two characters, and each of those is transformed in some way, and the base picture for the transformation is some other overlay of multiple parts…). This is possible because pictures are represented not as bitmaps, but as data structures that remember how the picture was built from its individual parts.

Krystal’s recap blog post contains demonstrations of not only her own contributions, but the inspect window as well. Here’s a section showing what I’ll talk about now.

(Embedded video: https://medium.com/media/7f09408e8411d852516bedb5aab2601c/href)

The inspect window is linked to the code editor! Hover over a structural part of the picture, and you can see which expression in your own code produced that part of the picture.

This is another application of the technique from Tom’s post. The data type representing pictures in CodeWorld stores a call stack captured at each part of the picture, so that when you inspect the picture and hover over some part, the environment knows where in your code you described that part, and it highlights the code for you, and jumps there when clicked.

While it’s the same technique, I really like this example because it’s not at all like an exception. We aren’t reporting errors or anything of the sort. Just using this nice feature of GHC that makes the connection between code and declarative data observable to help our users observe things about their own code.

by Chris Smith at December 10, 2024 10:50 PM

Christopher Allen

Two memory issues from the last two weeks

Okay maybe they don't qualify as actual memory bugs, but they were annoying and had memory as a common theme. One of them by itself doesn't merit a blog post so I bundled them together.

by Unknown at December 10, 2024 12:00 AM