Planet Haskell

November 07, 2024

Donnacha Oisín Kidney

POPL Paper—Algebraic Effects Meet Hoare Logic in Cubical Agda

Posted on November 7, 2024

Our new paper, “Algebraic Effects Meet Hoare Logic in Cubical Agda”, by myself, Zhixuan Yang, and Nicolas Wu, will be published at POPL 2024.

Zhixuan has a nice summary of it here.

The preprint is available here.

by Donnacha Oisín Kidney at November 07, 2024 12:00 AM

February 23, 2024

GHC Developer Blog

GHC 9.8.2 is now available

Zubin Duggal - 2024-02-23

The GHC developers are happy to announce the availability of GHC 9.8.2. Binary distributions, source distributions, and documentation are available on the release page.

This release is primarily a bugfix release addressing many issues found in the 9.8 series.

A full accounting of changes can be found in the release notes. As some of the fixed issues affect correctness, users are encouraged to upgrade promptly.

We would like to thank Microsoft Azure, GitHub, IOG, the Zw3rk stake pool, Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, Haskell Foundation, and other anonymous contributors whose ongoing financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprises this release.

As always, do give this release a try and open a ticket if you see anything amiss.

Enjoy!

-Zubin

by ghc-devs at February 23, 2024 12:00 AM

February 22, 2024

Gabriella Gonzalez

Unification-free ("keyword") type checking

From my perspective, one of the biggest open problems in implementing programming languages is how to add a type system to the language without significantly complicating the implementation.

For example, in my tutorial Fall-from-Grace implementation the type checker logic accounts for over half of the code. In the following lines-of-code report I’ve highlighted the modules responsible for type-checking with a ‡:

$ cloc --by-file src/Grace/*.hs       

--------------------------------------------------------------------------------
File                                    blank        comment           code
--------------------------------------------------------------------------------
src/Grace/Infer.hs        ‡               499            334           1696
src/Grace/Type.hs         ‡                96             91            633
src/Grace/Syntax.hs                        61            163            543
src/Grace/Parser.hs                       166             15            477
src/Grace/Lexer.hs                         69             25            412
src/Grace/Normalize.hs                     47             48            409
src/Grace/Context.hs      ‡                72            165            249
src/Grace/Import.hs                        38              5            161
src/Grace/REPL.hs                          56              4            148
src/Grace/Interpret.hs                     30             28            114
src/Grace/Pretty.hs                        25             25            108
src/Grace/Monotype.hs     ‡                11             48             61
src/Grace/Location.hs                      16             15             60
src/Grace/TH.hs                            23             32             53
src/Grace/Value.hs                         12             53             53
src/Grace/Input.hs                         10              8             43
src/Grace/Compat.hs                         9              2             32
src/Grace/Existential.hs  ‡                12             23             25
src/Grace/Domain.hs       ‡                 4              7             20
--------------------------------------------------------------------------------
SUM:                                     1256           1091           5297
--------------------------------------------------------------------------------

That’s 2684 lines of code (≈51%) just for type-checking (and believe me: I tried very hard to simplify the type-checking code).

This is why programming language implementers are often keen to skip implementing a type-checker for their language (especially given that a type system is an optional language feature) if it’s going to balloon the size of their codebase. That’s how we end up with a proliferation of untyped programming languages (e.g. Godot or Nix), or ones that get a type system bolted on long after the fact (e.g. TypeScript or Python).

So I’m extremely keen on implementing a “lean” type checker that has a high power-to-weight ratio. I also believe that a compact type checker is an important foundational step for functional programming to “go viral” and displace imperative programming. This post outlines one approach to this problem that I’ve been experimenting with¹.

Unification

The thing that bloats the size of most type-checking implementations is the need to track unification variables. These variables are placeholders for storing as-yet-unknown information about something’s type.

For example, when a functional programming language infers the type of something like this Grace expression:

(λx → x) true

… the way it typically works is that it will infer the type of the function (λx → x) which will be:

λx → x : α → α

… where α is a unification variable (an unsolved type). So you can read the above type annotation as saying “the type of λx → x is a function from some unknown input type (α) to the same output type (α)”.

Then the type checker will infer the type of the function’s input argument (true) which will be:

true : Bool

… and finally the type checker will combine those two pieces of information and reason about the final type like this:

  • the input to the function (true) is a Bool
  • therefore the function’s input type (α) must also be Bool
  • therefore the function’s output type (α) must also be Bool
  • therefore the entire expression’s type is Bool

… which gives the following conclusion of type inference:

(λx → x) true : Bool
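As a rough illustration of the bookkeeping this walkthrough implies, here is a minimal sketch in Haskell (hypothetical names; a real implementation would also need an occurs check, a fresh-variable supply, and so on):

```haskell
import qualified Data.Map as Map

-- A tiny type language with unification variables ("Meta").
data Type = TBool | TFun Type Type | Meta Int
  deriving (Eq, Show)

-- A substitution maps solved unification variables to types.
type Subst = Map.Map Int Type

-- Apply a substitution, chasing any solved variables.
apply :: Subst -> Type -> Type
apply s (Meta n)   = maybe (Meta n) (apply s) (Map.lookup n s)
apply s (TFun a b) = TFun (apply s a) (apply s b)
apply _ TBool      = TBool

-- Make two types equal by solving unification variables,
-- extending the current substitution.
unify :: Type -> Type -> Subst -> Maybe Subst
unify a b s = go (apply s a) (apply s b)
  where
    go (Meta m) (Meta n) | m == n   = Just s
    go (Meta n) t                   = Just (Map.insert n t s)
    go t (Meta n)                   = Just (Map.insert n t s)
    go TBool TBool                  = Just s
    go (TFun a1 b1) (TFun a2 b2)    = unify a1 a2 s >>= unify b1 b2
    go _ _                          = Nothing
```

For `(λx → x) true`, the function synthesizes `TFun (Meta 0) (Meta 0)`; unifying `Meta 0` with `TBool` and applying the resulting substitution to the output type yields `TBool`. Even this small core drags substitutions and solver state through the whole checker, which is the complexity the rest of the post is trying to avoid.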

However, managing unification variables like α is a lot trickier than it sounds. There are multiple unification algorithms/frameworks in the wild but the problem with all of them is that you have to essentially implement a bespoke logic programming language (with all of the complexity that entails). Like, geez, I’m already implementing a programming language and I don’t want to have to implement a logic programming language on top of that just to power my type-checker.

So there are a couple of ways I’ve been brainstorming how to address this problem and one idea I had was: what if we could get rid of unification variables altogether?

Deleting unification

Alright, so this is the part of the post that requires some familiarity/experience with implementing a type-checker. If you’re somebody new to programming language theory then you can still keep reading but this is where I have to assume some prior knowledge otherwise this post will get way too long.

The basic idea is that you start from the “Complete and Easy” bidirectional type checking algorithm, which is a type checking algorithm that does use unification variables² but is simpler than most type checking algorithms. The type checking rules look like this (you can just gloss over them):

Now, delete all the rules involving unification variables. Yes, all of them. That means that all of the type-checking judgments from Figures 9 and 10 are gone and also quite a few rules from Figure 11 disappear, too.

Surprisingly, you can still type check a lot of code with what’s left, but you lose two important type inference features if you do this:

  • you can no longer infer the types of lambda arguments

  • you can no longer automatically instantiate polymorphic code

… and I’ll dig into those two issues in more detail.

Inferring lambda argument types

You lose the ability to infer the type of a function like this one when you drop support for unification variables:

λx → x == False

Normally, a type checker that supports unification can infer that the above function has type Bool → Bool, but (in general) a type checker can no longer infer that when you drop unification variables from the implementation.

This loss is not too bad (in fact, it’s a pretty common trade-off proposed in the bidirectional type checking literature) because you can make up for it in a few ways (all of which are easy and efficient to implement in a type checker):

  • You can allow the input type to be inferred if the lambda is given an explicit type annotation, like this:

λx → x == False : Bool → Bool

    More generally, you can allow the input type to be inferred if the lambda is checked against an expected type (and a type annotation is one case, but not the only case, where a lambda is checked against an expected type).

    We’re going to lean on this pretty heavily because it’s pretty reasonable to ask users to provide type annotations for function definitions and also because there are many situations where we can infer the expected type of a lambda expression from its immediate context.

  • You can allow the user to explicitly supply the type of the argument

    … like this:

    λ(x : Bool) → x == False

    This is how Dhall works, although it’s not as ergonomic.

  • You can allow the input type to be inferred if the lambda is applied to an argument

    This is not that interesting, but I’m mentioning it for completeness. The reason it’s not interesting is because you won’t often see expressions of the form (λx → e) y in the wild, because they can more idiomatically be rewritten as let x = y in e.
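The checking-mode behaviour the bullets above lean on can be sketched as a tiny bidirectional checker (hypothetical constructor names, Booleans only): a bare lambda cannot be inferred, but it checks fine against an expected function type, and an annotation is one way to supply that expected type:

```haskell
data Type = TBool | TFun Type Type
  deriving (Eq, Show)

data Expr
  = EVar String
  | ELam String Expr
  | EApp Expr Expr
  | ETrue
  | EAnn Expr Type   -- e : t
  deriving Show

type Ctx = [(String, Type)]

-- Inference (synthesis) mode.
infer :: Ctx -> Expr -> Maybe Type
infer ctx (EVar x)   = lookup x ctx
infer _   ETrue      = Just TBool
infer ctx (EAnn e t) = if check ctx e t then Just t else Nothing
infer ctx (EApp f x) = case infer ctx f of
  Just (TFun a b) | check ctx x a -> Just b
  _                               -> Nothing
infer _   (ELam _ _) = Nothing   -- no rule: can't guess the argument type

-- Checking mode: a lambda checked against A → B learns x : A.
check :: Ctx -> Expr -> Type -> Bool
check ctx (ELam x body) (TFun a b) = check ((x, a) : ctx) body b
check ctx e t                      = infer ctx e == Just t
```

With this split, `infer [] (ELam "x" (EVar "x"))` fails, while checking the same lambda against `TFun TBool TBool` (or inferring it under an `EAnn`) succeeds, with no unification variables anywhere.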

Instantiating polymorphic code

The bigger issue with dropping support for unification variables is: all user-defined polymorphic functions now require explicit type abstraction and explicit type application, which is a major regression in the type system’s user experience.

For example, in a language with unification variables you can write the polymorphic identity function as:

λx → x

… and use it like this³:

let id = λx → x
in  (id true, id 1)

… but when you drop support for unification variables then you have to do something like this:

let id = λ(a : Type) → λ(x : a) → x
in  (id Bool true, id Natural 1)

Most programmers do NOT want to program in a language where they have to explicitly manipulate type variables in this way. In particular, they really hate explicit type application. For example, nobody wants to write:

map { x : Bool, … large record … } Bool (λr → r.x) rs

So we need to figure out some way to work around this limitation.

The trick

However, there is a solution that I believe gives a high power-to-weight ratio, which I will refer to as “keyword” type checking:

  • add a bunch of built-in functions

    Specifically, add enough built-in functions to cover most use cases where users would need a polymorphic function.

  • add special type-checking rules for those built-in functions when they’re fully saturated with all of their arguments

    These special-cased type-checking rules would not require unification variables.

  • still require explicit type abstraction when these built-in functions are not fully saturated

    Alternatively, you can require that built-in polymorphic functions are fully saturated with their arguments and make it a parsing error if they’re not.

  • still require explicit type abstraction and explicit type application for all user-defined (i.e. non-builtin) polymorphic functions

  • optionally, turn these built-in functions into keywords or language constructs

I’ll give a concrete example: the map function for lists. In many functional programming languages this map function is not a built-in function; rather it’s defined within the host language as a function of the following type:

map : ∀(a b : Type) → (a → b) → List a → List b

What I’m proposing is that the map function would now become a built-in function within the language and you would now apply a special type-checking rule when the map function is fully saturated:

Γ ⊢ xs ⇒ List a   Γ ⊢ f ⇐ a → b
───────────────────────────────
Γ ⊢ map f xs ⇐ List b

In other words, we’re essentially treating the map built-in function like a “keyword” in our language (when it’s fully saturated). Just like a keyword, it’s a built-in language feature that has special type-checking rules. Hell, you could even make it an actual keyword or language construct (e.g. a list comprehension) instead of a function call.

I would even argue that you should make each of these special-cased built-in functions a keyword or a language construct instead of a function call (which is why I call this “keyword type checking” in the first place). When viewed through this lens, the restrictions that these polymorphic built-in functions (A) are saturated with their arguments and (B) have a special type checking judgment are no different from the restrictions for ordinary keywords or language constructs (which also must be saturated with their arguments and also require special type checking judgments).
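The saturated map rule above transcribes almost directly into checker code. Here is a minimal sketch (hypothetical constructor names; only the fragments needed for one example):

```haskell
data Type = TBool | TList Type | TFun Type Type
  deriving (Eq, Show)

data Expr
  = EVar String
  | ELam String Expr
  | ENot Expr          -- a stand-in for some Bool → Bool operation
  | EMap Expr Expr     -- map f xs, as a saturated language construct
  deriving Show

type Ctx = [(String, Type)]

infer :: Ctx -> Expr -> Maybe Type
infer ctx (EVar x) = lookup x ctx
infer ctx (ENot e) = if check ctx e TBool then Just TBool else Nothing
infer _   _        = Nothing

check :: Ctx -> Expr -> Type -> Bool
check ctx (ELam x body) (TFun a b) = check ((x, a) : ctx) body b
-- The special-cased rule:
--   Γ ⊢ xs ⇒ List a    Γ ⊢ f ⇐ a → b
--   ─────────────────────────────────
--   Γ ⊢ map f xs ⇐ List b
check ctx (EMap f xs) (TList b) =
  case infer ctx xs of
    Just (TList a) -> check ctx f (TFun a b)
    _              -> False
check ctx e t = infer ctx e == Just t
```

Note that no unification variables appear: `a` comes from inferring `xs`, and `b` comes from the expected type, so the lambda passed to `map` gets its argument type for free.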

To make an analogy, in many functional programming languages the if/then/else construct has this same “keyword” status. You typically don’t implement it as a user-space function of this type:

ifThenElse : ∀(a : Type) → Bool → a → a → a

Rather, you define if as a language construct and you also add a special type-checking rule for if:

Γ ⊢ b ⇐ Bool   Γ ⊢ x ⇒ a   Γ ⊢ y ⇐ a
────────────────────────────────────
Γ ⊢ if b then x else y ⇒ a

… and what I’m proposing is essentially greatly exploding the number of “keywords” in the implementation of the language by turning a whole bunch of commonly-used polymorphic functions into built-in functions (or keywords, or language constructs) that are given special type-checking treatment.

For example, suppose the user were to create a polymorphic function like this one:

let twice = λ(a : Type) → λ(x : a) → [ x, x ]

in  twice (List Bool) (twice Bool true)

That’s not very ergonomic to define and use, but we also can’t reasonably expect our programming language to provide a twice built-in function. However, our language could provide a generally useful replicate built-in function (like Haskell’s replicate function):

replicate : ∀(a : Type) → Natural → a → List a

… with the following type-checking judgment:

Γ ⊢ n ⇐ Natural   Γ ⊢ x ⇒ a
───────────────────────────
Γ ⊢ replicate n x ⇒ List a

… and then you would tell the user to use replicate directly instead of defining their own twice function:

replicate 2 (replicate 2 true)

… and if the user were to ask you “How do I define a twice synonym for replicate 2” you would just tell them “Don’t do that. Use replicate 2 directly.”
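The replicate judgment transcribes just as directly, and notably in inference mode this time (again a minimal sketch with hypothetical constructor names):

```haskell
data Type = TBool | TNat | TList Type
  deriving (Eq, Show)

data Expr = ETrue | ENat Integer | EReplicate Expr Expr
  deriving Show

-- The special-cased rule:
--   Γ ⊢ n ⇐ Natural    Γ ⊢ x ⇒ a
--   ─────────────────────────────
--   Γ ⊢ replicate n x ⇒ List a
infer :: Expr -> Maybe Type
infer ETrue    = Just TBool
infer (ENat n) = if n >= 0 then Just TNat else Nothing
infer (EReplicate n x)
  | check n TNat = TList <$> infer x
  | otherwise    = Nothing

check :: Expr -> Type -> Bool
check e t = infer e == Just t
```

With this rule, `replicate 2 (replicate 2 true)` infers to `List (List Bool)` with no type abstractions or type applications in sight.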

Conclusion

This approach has the major upside that it’s much easier to implement a large number of keywords than it is to implement a unification algorithm, but there are other benefits to doing this, too!

  • It discourages complexity and fragmentation in user-space code

    Built-in polymorphic functions have an ergonomic advantage over user-defined polymorphic functions because under this framework type inference works better for built-in functions. This creates an ergonomic incentive to stick to the “standard library” of built-in polymorphic functions, which in turn promotes an opinionated coding style across all code written in that language.

You might notice that this approach is somewhat similar in spirit to how Go handles polymorphism, which is to say: it doesn’t handle user-defined polymorphic code well. For example, Go provides a few built-in language features that support polymorphism (e.g. the map data structure and for loops) but if users ask for any sort of user-defined polymorphism then the maintainers tell them they’re wrong for wanting that. The main difference here is that (unlike Go) we do actually support user-defined polymorphism; it’s not forbidden, but it is less ergonomic than sticking to the built-in utilities that support polymorphism.

  • It improves error messages

    When you special-case the type-checking logic you can also special-case the error messages, too! With general-purpose unification the error message can often be a bit divorced from the user’s intent, but with “keyword type checking” the error message is not only more local to the problem but it can also suggest highly-specific tips or fixes appropriate for that built-in function (or keyword or language construct).

  • It can in some cases more closely match the expectations of imperative programmers

    What I mean is: most programmers coming from an imperative and typed background are used to languages where (most of the time) polymorphism is “supported” via built-in language constructs and keywords and user-defined polymorphism might be supported but considered “fancy”. Leaning on polymorphism via keywords and language constructs would actually make them more comfortable using polymorphism instead of trying to teach them how to produce and consume user-defined polymorphic functions.

    For example, in a lot of imperative languages the idiomatic solution for how to do anything with a list is “use a for loop” where you can think of a for loop as a built-in keyword that supports polymorphic code. The functional programming equivalent of “just use a for loop” would be something like “just use a list comprehension” (where a list comprehension is a “keyword” that supports polymorphic code that we can give special type checking treatment).

That said, this approach is still more brittle than unification and will require more type annotations in general. The goal here isn’t to completely recover the full power of unification but rather to get something that’s not too bad but significantly easier to implement.

I think this “keyword type checking” can potentially occupy a “low tech” point in the type checking design space for functional programming languages that need to have efficient and compact implementations (e.g. for ease of embedding). Also, this can potentially provide a stop-gap solution for novice language implementers that want some sort of a type system but they’re not willing to commit to implementing a unification-based type system.

There’s also a variation on this idea which Verity Scheel has been exploring: providing userland support for defining new functions with special type-checking rules. She has a post outlining how to do that:

User Operators with Implicits & Overloads


  1. The other approach is to create essentially an “ABNF for type checkers” that would let you write type-checking judgments in a standard format that could generate the corresponding type-checking code in multiple languages. That’s still a work-in-progress, though.

  2. I believe some people might take issue with calling these unification variables because they consider bidirectional type checking as a distinct framework from unification. Moreover, in the original bidirectional type checking paper they’re called “unsolved” variables rather than unification variables. However, I feel that for the purpose of this post it’s still morally correct to refer to these unsolved variables as unification variables since their usage and complexity tradeoffs are essentially identical to unification variables in traditional unification algorithms.

  3. … assuming let expressions are generalized.

by Gabriella Gonzalez (noreply@blogger.com) at February 22, 2024 04:04 PM

February 21, 2024

Well-Typed.Com

The Haskell Unfolder Episode 20: Dijkstra's shortest paths

Today, 2024-02-21, at 1930 UTC (11:30 am PST, 2:30 pm EST, 7:30 pm GMT, 20:30 CET, …) we are streaming the 20th episode of the Haskell Unfolder live on YouTube.

The Haskell Unfolder Episode 20: Dijkstra’s shortest paths

In this (beginner-friendly) episode, we will use Dijkstra’s shortest paths algorithm as an example of how one can go about implementing an algorithm given in imperative pseudo-code in idiomatic Haskell. We will focus on readability, not on performance.
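Ahead of the episode, here is one possible shape such an implementation could take (a sketch of our own, not necessarily the episode’s code): Dijkstra’s algorithm with `Data.Set` standing in for a priority queue, favouring readability over performance:

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- A weighted graph: each vertex maps to its (neighbour, edge weight) list.
type Graph = Map.Map Int [(Int, Int)]

-- Dijkstra's shortest paths from a source vertex, using Data.Set as the
-- priority queue (stale entries are discarded lazily when popped).
dijkstra :: Graph -> Int -> Map.Map Int Int
dijkstra g source = go (Set.singleton (0, source)) Map.empty
  where
    go frontier dist = case Set.minView frontier of
      Nothing -> dist
      Just ((d, v), rest)
        | v `Map.member` dist -> go rest dist  -- stale entry; already settled
        | otherwise ->
            let dist' = Map.insert v d dist
                relax q (w, c)
                  | w `Map.member` dist' = q
                  | otherwise            = Set.insert (d + c, w) q
            in go (foldl relax rest (Map.findWithDefault [] v g)) dist'
```

Immutable maps make the invariants easy to state: `dist` only ever grows with settled vertices, and a popped entry for an already-settled vertex is simply dropped.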

About the Haskell Unfolder

The Haskell Unfolder is a YouTube series about all things Haskell hosted by Edsko de Vries and Andres Löh, with episodes appearing approximately every two weeks. All episodes are live-streamed, and we try to respond to audience questions. All episodes are also available as recordings afterwards.

We have a GitHub repository with code samples from the episodes.

And we have a public Google calendar (also available as ICal) listing the planned schedule.

by andres, edsko at February 21, 2024 12:00 AM

February 18, 2024

Haskell Interlude

43: Ivan Perez

In this episode, Wouter and Andres interview Ivan Perez, a senior research scientist at NASA. Ivan tells us about how NASA uses Haskell to develop the Copilot embedded domain specific language for runtime verification, together with some of the obstacles he encounters getting end users to learn Haskell and adopt such an EDSL.

February 18, 2024 12:00 PM

February 16, 2024

Mark Jason Dominus

Etymology roundup 2024-02

The Recurse Center Zulip chat now has an Etymology channel, courtesy of Jesse Chen, so I have been posting whenever I run into something interesting. This is a summary of some of my recent discoveries. Everything in this article is, to the best of my knowledge, accurate. That is, there are no intentional falsehoods.

Baba ghanouj

I tracked down the meaning of (Arabic) baba ghanouj. It was not what I would have guessed.

Well, sort of. Baba is “father” just like in every language. I had thought of this and dismissed it as unlikely. (What is the connection with eggplants?) But that is what it is.

And ghanouj is …
“coquetry”.

So it's the father of coquetry.

Very mysterious.

Eggnog

Toph asked me if “nog” appeared in any word other than “eggnog”. Is there lemonnog or baconnog? I had looked this up before but couldn't remember what it was except that it was some obsolete word for some sort of drink.

“Nog” is an old Norfolk (England) term for a kind of strong beer which was an ingredient in the original recipe, sometime in the late 17th or early 18th century.

I think modern recipes don't usually include beer.

Wow

“Wow!” appears to be an 18th-century borrowing from an indigenous American language, because most of its early appearances are quotes from indigenous Americans. It is attested in standard English from 1766, spelled “waugh!”, and in Scots English from 1788, spelled “vow!”

Riddles

Katara asked me for examples of words in English like “bear” where there are two completely unrelated meanings. (The word bear like to bear fruit, bear children, or bear a burden is not in any way related to the big brown animal with claws.)

There are a zillion examples of this. They're easy to find in a paper dictionary: you just go down the margin looking for a superscript. When you see “bear¹” and “bear²”, you know you've found an example.

The example I always think of first is “venery” because long, long ago Jed Hartman pointed it out to me: venery can mean stuff pertaining to hunting (it is akin to “venison”) and it can also mean stuff pertaining to sex (akin to “venereal”) and the fact that these two words are spelled the same is a complete coincidence.

Jed said “I bet this is a really rare phenomenon” so I harassed him for the next several years by emailing him examples whenever I happened to think of it.

Anyway, I found an excellent example for Katara that is less obscure than “venery”: “riddle” (like a puzzling question) has nothing to do with when things are riddled with errors. It's a complete coincidence.

The “bear” / “bear” example is a nice simple one, everyone understands it right away. When I was studying Korean I asked my tutor an etymology question, something like whether the “eun” in eunhaeng 은행, “bank”, was the same word as “eun” 은 which means “silver”. He didn't understand the question at first: what did I mean, “is it the same word”?

I gave the bear / bear example, and said that to bear fruit and to bear children are the same word, but the animal with claws is a different word, and just a coincidence that it is spelled the same way. Then he understood what I meant.

(Korean eunhaeng 은행 is a Chinese loanword, from 銀行. 銀 is indeed the word for silver, and 行 is a business-happening-place.)

Right and left

The right arm is the "right" arm because, being the one that is (normally) stronger and more adept, it is the right one to use for most jobs.

But if you ignore the right arm, there is only one left, so that is the "left" arm.

This sounds like a joke, but I looked it up and it isn't.

Leave and left

"Left" is the past tense (and past participle) of "leave". As in, I leave the room, I left the room, when I left the room I left my wallet there, my wallet was left, etc.

(As noted above, this is also where we get the left side.)

There are two other words "leave" in English. Leaves like the green things on trees are not related to leaving a room.

(Except I was once at a talk by J.H. Conway in which he was explaining some sort of tree algorithm in which certain nodes were deleted and he called the remaining ones "leaves" because they were the ones that were left. Conway was like that.)

The other "leave" is the one that means "permission" as in "by your leave…". This is the leave we find in "sick leave" or "shore leave". They are not related to the fact that you have left on leave, that is a coincidence.

Normal norms

Latin norma is a carpenter's square, for making sure that things are at right angles to one another.

So something that is normal is something that is aligned the way things are supposed to be aligned, that is to say at right angles. And a norm is a rule or convention or standard that says how things ought to line up.

In mathematics and physics we have terms like “normal vector”, “normal forces” and the like, which means that vectors or forces are at right angles to something. This is puzzling if you think of “normal” as “conventional” or “ordinary” but becomes obvious if you remember the carpenter's square.

In contrast, mathematical “normal forms” have nothing to do with right angles, they are conventional or standard forms. “Normal subgroups” are subgroups that behave properly, the way subgroups ought to.

The names Norman and Norma are not related to this. They are related to the surname Norman which means a person from Normandy. Normandy is so-called because it was inhabited by Vikings (‘northmen’) starting from the 9th century.

Hydrogen and oxygen

Jesse Chen observed that hydrogen means “water-forming”, because when you burn it you get water.

A lot of element names are like this. Oxygen is oxy- (“sharp” or “sour”) because it makes acids, or was thought to make acids. In German the analogous calque is “sauerstoff”.

Nitrogen makes nitre, which is an old name for saltpetre (potassium nitrate). German for nitre seems to be salpeter which doesn't work as well with -stoff.

The halogen gases are ‘salt-making’. (Greek for salt is hals.) Chlorine, for example, is a component of table salt, which is sodium chloride.

In Zulip I added that The capital of Denmark, Copenha-gen, is so-called because in the 11th century it was a major site for the production of koepenha, a Germanic term for a lye compound, used in leather tanning processes, produced from bull dung. I was somewhat ashamed when someone believed this lie despite my mention of bull dung.

Spas, baths, and coaches

Spas (like wellness spa or day spa) are named for the town of Spa, Belgium, which has been famous for its cold mineral springs for thousands of years!

(The town of Bath England is named for its baths, not the other way around.)

The coach is named for the town of Kocs (pronounced “coach”), Hungary, where it was invented. This sounds like something I would make up to prank the kids, but it is not.

Spanish churches

“Iglesia” is Spanish for “church”, and you see it as a surname in Spanish as in English. (I guess, like “Church”, originally the name of someone who lived near a church).

Thinking on this, I realized: “iglesia” is akin to English “ecclesiastic”.

They're both from ἐκκλησία which is an assembly or congregation.

The mysterious Swedish hedgehog

In German, a hedgehog is “Igel”. This is a very ancient word, and several other Germanic languages have similar words. For example, in Frisian it's “ychel”.

In Swedish, “igel” means leech. The hedgehog is “igelkott”.

I tried to find out what -kott was about. “kotte” is a pinecone and may be so-called because “kott” originally meant some rounded object, so igelkott would mean the round igel rather than the blood igel, which is sometimes called blodigel in Swedish.

I was not able to find any other words in Swedish with this sense of -kott. There were some obviously unrelated words like bojkott (“boycott”). And there are a great many Swedish words that end in -skott, which is also unrelated. It means “tail”. For example, the grip of a handgun is revolverskott.

[ Addendum: Gustaf Erikson advises me that I have misunderstood ‑skott; see below. ]

Bonus hedgehog weirdness: In Michael Moorcock's Elric books, Elric's brother is named “Yyrkoon”. The Middle English for a hedgehog is “yrchoun” (variously spelled). Was Moorcock thinking of this? The -ch- in “yrchoun” is t͡ʃ though, which doesn't match the stop consonant in “Yyrkoon”. Also, “yrchoun” is just a variant spelling of “urchin”. (Compare “sea urchin”, which is a sea hedgehog. Or compare “street urchin”, a small round bristly person who scuttles about in the gutter.)

In Italian a hedgehog is riccio, which I think is also used as a nickname for a curly-haired or bristly-haired person.

Slobs and schlubs

These are not related. Schlub is originally Polish, coming to English via (obviously!) Yiddish. But slob is Irish.

-euse vs. -ice

I tried to guess the French word for a female chiropractor. I guessed “chiropracteuse” by analogy with masseur, masseuse, but I was wrong. It is chiropractrice.

The '‑ice' suffix was clearly descended from the Latin '‑ix' suffix, but I had to look up ‘‑euse’. It's also from a Latin suffix, this time from ‘‑osa’.

Jot

When you jot something down on a notepad, the “jot” is from Greek iota, which is the name of the small, simple letter ι that is easily jotted.

Bonus: This is also the jot that is meant by someone who says “not a jot or a tittle”, for example Matthew 5:18 (KJV):

For verily I say unto you, Till heaven and earth pass, one jot or one tittle shall in no wise pass from the law, till all be fulfilled.

A tittle is the dot above the lowercase ‘i’ or ‘j’. The NIV translates this as “not the smallest letter, not the least stroke of a pen”, which I award an A-plus for translation.

Vilifying villains

I read something that suggested that these were cognate, but they are not.

“Vilify” is from Latin vīlificō which means to vilify. It is a compound of vīlis (of low value or worthless, I suppose the source of “vile”) and faciō (to make, as in “factory” and “manufacture”.)

A villain, on the other hand, was originally just a peasant or serf; that is, a person who lives in a village. “Village” is akin to Latin villa, which originally meant a plantation.

Döner kebab

I had always assumed that “Döner” and its “ö” were German, but they are not, at least not originally. “Döner kebab” is the original Turkish name of the dish, right down to the diaresis on the ‘ö’, which is the normal Turkish spelling; Turkish has an ‘ö’ also. Döner is the Turkish word for a turning-around-thing, because döner kebab meat roasts on a vertical spit from which it is sliced off as needed.

“Döner” was also used in Greek as a loanword but at some point the Greeks decided to use the native Greek word gyro, also a turning-around-thing, instead. Greek is full of Turkish loanwords. (Ottoman Empire, yo.)

“Shawarma”, another variation on the turning-around-vertical-spit dish, is from a different Ottoman Turkish word for a turning-around thing, this time چویرمه (çevirme), which is originally from Arabic.

The Armenian word for shawarma is also shawarma, but despite Armenian being full of Turkish loanwords, this isn't one. They got it from Russian.

Everyone loves that turning-on-a-vertical-spit dish. Lebanese immigrants brought it to Mexico, where it is served in tacos with pineapple and called tacos al pastor (“shepherd style”). I do not know why the Mexicans think that Lebanese turning-around-meat plus pineapples adds up to shepherds. I suppose it must be because the meat is traditionally lamb.

Roll call

To roll is to turn over with a circular motion. This motion might wind a long strip of paper into a roll, or it might roll something into a flat sheet, as with a rolling pin. After rolling out the flat sheet you could then roll it up into a roll.

Dinner rolls are made by rolling up a wad of bread dough.

When you call the roll, it is because you are reading a list of names off a roll of paper.

Theatrical roles are from French rôle which seems to have something to do with rolls but I am not sure what. Maybe because the cast list is a roll (as in roll call).

Wombats and numbats

Both of these are Australian animals. Today it occurred to me to wonder: are the words related? Is -bat a productive morpheme, maybe a generic animal suffix in some Australian language?

The answer is no! The two words are from different (although distantly related) languages. Wombat is from Dharug, a language of the Sydney area. Numbat is from the Nyungar language, spoken on the other end of the continent.

Addendum

Gustaf Erikson advises me that I have misunderstood ‑skott. It is akin to English shoot, and means something that springs forth suddenly, like little green shoots in springtime, or like the shooting of an arrow. In the former sense, it can mean a tail or a sticking-out thing more generally. But in revolverskott it is the latter sense, the firing of a revolver.

by Mark Dominus (mjd@plover.com) at February 16, 2024 09:05 AM

February 15, 2024

Mark Jason Dominus

The pleasures of dolmen-licking

Ugh, the blog has been really stuck lately. I have lots of good stuff in process but I don't know if I will finish any of it, which would be a shame, because it's good stuff and I have put a lot of work into it. So I thought maybe I should make an effort to relax my posting standards for a bit. In fact I should make an effort to relax them more generally. But in particular, today. So,

here is a picture of me licking a dolmen.

A slightly balding dark-haired man with glasses is leaning slightly as he sticks out his tongue to touch a massive rectangular stone that is resting at head height atop smaller upright stones.  In the background are a green hill and a stone wall.  The man has his hands in the pockets of his blue jeans and is wearing a blue denim jacket.

Here is Michael G. Schwern licking the same dolmen.

A bearded man with a great deal of long curly hair is leaning over to lick the same stone table as in the other picture.  The day is much brighter and sunnier.  He is wearing blue jeans, an olive-colored sweater, and has his hands clasped behind him.

Not on the same day, obviously. As far as I know we were not in the country at the same time. The question is in my mind: who was the first of us to lick the dolmen? I think he was there before me. But I also wonder: when I decided to lick it, did I know he had done the same thing? It's quite possible that Marty Pauley or someone said to me “You know, when Schwern was here, he licked it,” to which I would surely have responded “then I shall lick it as well!” But it's also possible that we licked the dolmen completely independently, because why wouldn't you? How often do you get a chance to taste a piece of human prehistory?

As a little kid you discover that the world is full of all sorts of fascinating stuff that you may be allowed to look at, but not to touch, and certainly not to climb on or to lick. (“Don't put it in your mouth!”) Dolmens are a delightful exception to this rule. Sure, lick the dolmen all you want. It has stood in the same place for five thousand years, and whether it stands there for five thousand more will not be affected by any amount of licking.

My inner four-year-old was very satisfied the day I licked the dolmen. I imagine that Schwern felt the same way.

by Mark Dominus (mjd@plover.com) at February 15, 2024 01:32 AM

February 06, 2024

Mark Jason Dominus

Jehovah's Witnesses do not number the days of the week

[ Content warning: Rambly. ]

Two Jehovah's Witnesses came to the door yesterday, and at first I did not want to talk to them, but as they were leaving I remembered that I had a question. I asked them what they called the days of the week. They were very puzzled by this because it turns out that they call them Monday, Tuesday, Wednesday, Thursday, and so on, just like everyone else in this country. They were so puzzled that they did not even take the opportunity to continue the conversation. They thanked me for coming to the door, and left.

I found this interesting. The reason I had asked is that the JW religion is very strict regarding paganism. For example, they do not observe Christmas or Easter, because these holidays, to them, have a suspicious pagan origin. A few months ago I had wondered: do they celebrate Thanksgiving? I thought it was possible. As far as I know it has no pagan connection at all, and an observance of giving thanks to Jehovah seemed consistent with their beliefs. No, it turns out that they don't, on the principle that to single out one special day might lead them to neglect to give proper thanks to Jehovah on the other days.

So, I wondered, if they object to Easter, how do they feel about the days of the week? To speak of Tuesday, Wednesday, Thursday, and Friday is to honor the pagan Germanic gods Tyr, Odin, Thor, and Frigg, and I thought they might object to this also. The Quakers referred to the days of the week as First Day, Second Day, and so on for this reason, and I thought that the Witnesses might too. But the issue appears to have flown under the JWs' radar.

I didn't ask about the months, assuming that if they didn't cringe when speaking of Thor's Day, they wouldn't have a problem with the month of Janus (the two-faced god of boundaries) or with Maia (her fertility festival is in May) or with the month of the deified person of Roman Emperor Augustus.

I have a sense that Quakers are generally more sophisticated thinkers than Jehovah's Witnesses. They objected to the names of the months also, but decided it would be too confusing to change them. But they saw their opportunity in 1752, when the Kingdom of Great Britain finally brought its calendar in line with the rest of Europe. Along with the other calendrical changes, the Quakers agreed amongst themselves to start calling the months after numbers instead of the old-style names.

I had a conversation once with Larry Wall, who is himself a devout Christian. We were talking about Jehovah's Witnesses, because at that time there was a prominent member of the Perl community who was one. Larry, not at all a venomous person, said with some venom that the JWs were “a cult”.

“A ‘cult’?” I asked. “What do you mean?” People often use the word cult as a pejorative for “sect” or religion: a cult is any religion that I don't like. But Larry, as usual, was wiser and more thoughtful than that. He said that he called them a cult because you are not allowed to leave. If you do, the other JWs, even your close friends and your family, are no longer allowed to associate with you, and if they do, they may be threatened with expulsion.

I thought that seemed like a principled definition, and it has served me since then. Sometimes, encountering other organizations from which it was difficult to extract oneself, I have heard Larry's voice in my mind, saying “that's a cult”. Thanks, Larry.

I have a draft article about how Larry Wall is my model for a rational, admirable Christian, but I'm not sure it is ever going to come together.

by Mark Dominus (mjd@plover.com) at February 06, 2024 03:10 PM

Tweag I/O

Evaluating Retrieval in RAGs: A Gentle Introduction

llama on rag
No, not this RAG.

Despite their many capabilities, Large Language Models (LLMs) have a serious limitation: they’re stuck in time and their knowledge is limited to the data they have been trained on.

Updating the knowledge of an LLM can take two forms: fine-tuning, which we will address in a future post, and the ever-present RAG. RAG, short for Retrieval Augmented Generation, has garnered a lot of attention in the GenAI community and for good reasons. You “simply” hook the LLM up to your documents (more on that later), and it can suddenly tackle any question, as long as the answer is somewhere in the documents.

This is almost too good to be true: it offers endless possibilities, a simple concept and, thanks to advances in the tooling ecosystem, a straightforward implementation. It is hard to imagine at first sight how it could go wrong.

Yet wrong it goes, and we have seen it happen consistently with our chatbots, as well as SaaS products that we have tested.

In this article, the first of a series on evaluation in LLMs, we will unpack how retrieval impacts the performance of RAG systems, why we need systematic evaluation and what the different schools and frameworks of evaluation are. If you’ve been wondering about evaluating your own RAG system and needed an introduction, look no further.

The perfect RAG assumptions

Simply put, a RAG system retrieves documents similar to your query and uses them to generate a response (see Figure 1).

RAG data
Figure 1. Data flow in a RAG system.

For this to work perfectly, the following assumptions should hold:

  • In retrieval, you need to retrieve relevant data, all the relevant data and nothing but the relevant data.
  • In generation, the LLM should know enough about the topic to synthesize retrieved documents, yet be capable of changing its knowledge when confronted with conflicting or updated evidence.

Good retrieval is vital for a good RAG system. If you feed garbage into your LLM, you should not be too surprised when it spouts garbage back at you. But good retrieval becomes even more essential when using smaller LLMs. These models are not always the best at identifying and filtering irrelevant context.

Retrieval can indeed be one of the weakest parts of a RAG system. Despite the hype around vector databases and semantic search, the problem of knowledge indexing is still far from being solved.

Retrieval, semantic search and everything in between

Because the context an LLM can take is limited, stuffing your whole knowledge base into a prompt is not an option. Even if it were, LLMs are not as good at extracting information from a long piece of text as they are from shorter contexts. This is why retrieval is needed to find the documents that are most relevant to your query.

While this can be done with good old keyword search, semantic search is increasingly becoming the norm for RAG applications. This makes sense. Suppose you ask, “Why do we need search in RAGs?” Despite the absence of the exact words from the query, semantic search may be able to find the previous paragraph, as it is semantically aligned with the query. Keyword search, on the other hand, will fail, as neither “search” nor “RAG” appears in the text.

In practice, the process is a bit more involved:

  • Documents in a knowledge base are divided into smaller chunks.
  • An embedding model is used to “vectorize” these chunks.
  • These vectors are indexed into a vector database.

Upon receiving a query:

  • The query is vectorized using the same embedding model.
  • The closest vectors are retrieved from the vector database.
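
The indexing and querying steps above can be sketched in a few lines of Python. This is a toy illustration rather than any particular framework's API: the “embedding model” here is a crude bag-of-words vectorizer and the “vector database” a plain in-memory list.

```python
# Toy sketch of the RAG indexing/retrieval flow. Stand-ins for a real
# embedding model and vector store; only the shape of the flow matters.
import math
from collections import Counter

def embed(text, vocab):
    """Toy embedding: term-frequency vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# 1. Documents are divided into smaller chunks (here: one sentence each).
chunks = [
    "retrieval finds the documents most relevant to a query",
    "the llm generates an answer from the retrieved context",
    "llamas do not understand quenya",
]

# 2. An embedding model "vectorizes" the chunks,
# 3. and the vectors are indexed into the "vector database".
vocab = sorted({w for c in chunks for w in c.split()})
index = [(embed(c, vocab), c) for c in chunks]

def retrieve(query, k=1):
    """Embed the query with the same model; return the k closest chunks."""
    qv = embed(query, vocab)
    ranked = sorted(index, key=lambda e: cosine(qv, e[0]), reverse=True)
    return [chunk for _, chunk in ranked[:k]]
```

A real system would swap in a trained embedding model and a proper vector store, but the flow stays the same: embed the chunks, index the vectors, embed the query with the same model, return the nearest neighbours.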

Experiments vs. Eyeballing: or why do we need evaluation anyway?

developer eyeballing
Figure 2. Eyeballing, aka changing the code until it works.

We’ve been to so many demos and presentations where questions about evaluation were answered with a variation of “evaluation is on our future agenda” or “we changed the [prompt|chain|model|temperature] until the answer looked good”1 that we internally coined a term for this: eyeballing™.

When performing “eyeballing”, the most probable scenario is that someone, likely the engineer working on the RAG app, tested the app with some queries. For one or more of those, the generated answer was subpar. The engineer randomly debugs these cases, and finds one or more of the following problems:

  • Retrieved references are not relevant to the query.
  • The answer is not truthful to the retrieved content.
  • The answer does not address the question.

The engineer changes something in the implementation, and now the answer looks better (for some, most probably vague, definition of “better”).

There are many problems with this approach:

  • No benchmark: There is no guarantee that the introduced change did not degrade performance on other questions.
  • No experiment tracking: Likely none of the intermediate states were committed or properly tracked. So we don’t know what combinations of parameters were tested.
  • No evaluation metrics: In the absence of an evaluation framework that defines the notion of “better”, we cannot numerically compare the current RAG state to any other possible state.

The closest software engineering metaphor to the eyeballing approach is manually testing every change applied to the code without having a proper test suite.

The two schools of evaluation: human vs. machine

schools of evaluation

By now it should be clear why evaluation of RAG systems is a must. The question has been approached from various angles and with different evaluation metrics and strategies. We can distinguish, however, a division along the line of whether the evaluator, or the oracle (as it’s typically referred to in Machine Learning and Expert Systems), is a human or an LLM.

  • In human-based evaluation, a human labeler rates the relevance of retrieved documents, either repeatedly (for every experimental setting), or as a one-off, by creating a benchmark of queries and associated relevant documents.
  • In LLM-based evaluation, it is an LLM, usually one that is powerful enough, that evaluates if and how the retrieved content is relevant to a query.

Building a benchmark

Note that in both cases, you need a benchmark to evaluate the RAG against. With LLM-based evaluation, this is usually a set of queries over the documents database. In human-based evaluation, benchmarks can be more elaborate (more on that further below).

Building a useful benchmark is not an easy task. One should balance the types of queries asked, their statistical incidence over the database and the value in catering to a specific subset of queries as opposed to doing a good job over all queries. Exploring these considerations is beyond the scope of this post.

Human-based evaluation

Human-based evaluation is closer to the evaluation paradigm in classic Machine Learning. One can readily apply evaluation metrics originally devised for Information Retrieval. These should be adapted to the RAG retrieval setting, where only the top k documents are passed on to the LLM as context and the order in which these documents are retrieved is not relevant. Rather than raw recall and precision, we should think of both as a function of k.

A higher precision at k means less noise is mixed with the signal, while a higher recall means that more relevant information is retrieved. Since k is fixed, these should go hand in hand.

Besides using k as a threshold, we can also consider other parameters such as the threshold of similarity between the query and retrieved documents.
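
To make the adapted metrics concrete, here is a small sketch (the names and data are illustrative, not from the post): rank-agnostic precision and recall at k, where `retrieved` is the retriever's ranked list of document ids and `relevant` the benchmark's set of relevant ids.

```python
# Rank-agnostic precision/recall at k for RAG retrieval: only the top-k
# retrieved documents reach the LLM, and their order inside the top k
# does not matter.

def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant (signal vs. noise)."""
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant documents that appear in the top k."""
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)
```

For example, with `retrieved = ["d1", "d7", "d3", "d9"]` and `relevant = {"d1", "d3", "d5"}`, both precision and recall at k=3 are 2/3: one noisy document was retrieved, and one relevant document was missed.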

Note that despite this approach being more demanding in time, automation of evaluation is still possible once a one-off benchmark is created and evaluation metrics are defined.

LLM-based evaluation

LLM-based evaluation is easier to set up and automate since it does not require any human involvement beyond the creation of a benchmark of queries. This is the core of the RAGAS and TruLens evaluation frameworks that we will discuss below.

TruLens

rag triad
Figure 3. The RAG evaluation triad.

TruLens defines a golden triad of RAG evaluation (see Figure 3). Let’s discuss in particular retrieval relevance. The idea is to quantify how much the retrieved content is relevant to the query by computing the ratio of relevant to total sentences in the retrieved documents. It is an LLM that determines whether a sentence is needed to answer a query.
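
A rough sketch of that ratio follows. It is not the actual TruLens implementation; in particular, the LLM judge is stubbed out with naive keyword overlap so that the example is runnable.

```python
# Sketch of a sentence-level context-relevance score in the spirit of the
# TruLens triad. `judge` stands in for an LLM call answering "is this
# sentence needed to answer the query?"; the default below is a crude
# keyword-overlap stub used only to make the sketch self-contained.

def keyword_judge(query, sentence):
    """Stub 'LLM': deem a sentence relevant if it shares any word with the query."""
    return bool(set(query.lower().split()) & set(sentence.lower().split()))

def context_relevance(query, retrieved_sentences, judge=keyword_judge):
    """Ratio of judged-relevant sentences to total retrieved sentences."""
    if not retrieved_sentences:
        return 0.0
    relevant = sum(1 for s in retrieved_sentences if judge(query, s))
    return relevant / len(retrieved_sentences)
```

Swapping `keyword_judge` for a real LLM call turns this into the kind of automated metric the frameworks compute.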

RAGAS

RAGAS defines a matrix of evaluation metrics covering both retrieval and generation; two of these are retrieval evaluation metrics, namely context relevancy (similar to the one defined by TruLens) and context recall.

Context recall is defined as the proportion of statements in a “model” answer that can be found in the retrieved documents. This model answer should be provided as part of a human-crafted ground truth, and the approach is therefore a hybrid human-LLM one. An LLM is responsible for extracting statements from the model answer and checking them against the retrieved context.
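
In the same hedged spirit, context recall can be sketched as follows. The `supported` check stands in for the LLM that decides whether a statement is attributable to the retrieved context; here it is a naive substring match, not RAGAS's actual prompt-based check.

```python
# Sketch of a context-recall metric: what share of the ground-truth
# statements can be attributed to the retrieved context? The `supported`
# predicate is a stand-in for an LLM judgement.

def context_recall(gt_statements, context, supported=None):
    """Share of ground-truth statements supported by the retrieved context."""
    if supported is None:
        # Naive substring stub, just to keep the sketch runnable.
        supported = lambda s, c: s.lower() in c.lower()
    if not gt_statements:
        return 0.0
    return sum(1 for s in gt_statements if supported(s, context)) / len(gt_statements)
```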

Limitations of LLM-based evaluation

llama not understanding quenya
Llamas do not understand Quenya

A fundamental unspoken assumption behind using LLMs to evaluate retrieval is that the LLM knows enough about the question and the context to judge their relevance. This assumption is hard to justify in the context of, e.g., fairly technical documentation that the model has not seen before, or a subject the model is not fluent in.

Take the following passage written in Quenya, a fictional language invented by Tolkien and used in The Lord of the Rings:

Alcar i cala elenion ancalima. Varda Elentári, Tintallë, tiris ninqe eleni. Lórien omentieva Yavanna Kementári. Eärendil elenion ancalima, perina i oiolossë.2

And take this query:

Man enyalië Varda Elentári tiris eleni?3

Can you tell if the context is relevant to the query?

This is admittedly a constructed example, but we have seen similar cases in play while evaluating a chatbot over Bazel documentation.

This approach has an additional pitfall: even if the retrieved context is relevant to the query, that alone does not measure recall, i.e. how many of the relevant documents in the database, and how much of the information required to answer the question, were actually retrieved. While the RAGAS recall metric attempts to mitigate this, crafting model answers for fairly technical topics, or for topics that require intimate knowledge of a domain or a knowledge base, is both hard and time-consuming. It also does not account for the fact that a crafted answer might be correct without including every relevant bit of information in the knowledge base.

Conclusion

The evaluation of RAG systems presents a unique set of challenges, but its value in building usable apps cannot be overstated.

The evaluation frameworks we discussed, both human-based and LLM-based, each have their own advantages and limitations. Human-based evaluations, while thorough and more trustworthy, are labor-intensive and hard to repeat. LLM-based evaluations, on the other hand, are much more scalable and can easily be repeated, but they rely heavily on LLMs, which have their own biases and limitations.

Stay tuned for the next post in this series, where we present our in-house evaluation framework and share insights and results from real-world cases.


  1. I (Nour) have been to a conference lately where someone said they were using RAGAS, and I had a hard time containing my excitement.
  2. “The glory of the light of the stars is brightest. Varda, Queen of the Stars, Kindler, watches over the sparkling stars. Lórien met Yavanna, Queen of the Earth. Eärendil, brightest of stars, sailed on the everlasting night.”
  3. “Who called Varda, the star queen, watcher of the stars?”

February 06, 2024 12:00 AM

February 03, 2024

Magnus Therning

Bending Warp

In the past I've noticed that Warp both writes to stdout at times and produces some default HTTP responses, but I've never bothered taking the time to look up what possibilities it offers to change this behaviour. I've also always thought that I ought to find out how Warp handles signals.

If you wonder why this would be interesting to know, there are three main points:

  1. The environments where the services run are set up to handle structured logging. In our case it should be JSONL written to stdout, i.e. one JSON object per line.
  2. We've decided that the error responses we produce in our code should be JSON, so it's irritating to have to document some special cases where this isn't true just because Warp has a few default error responses.
  3. Signal handling is, IMHO, a very important part of writing a service that runs well in k8s as it uses signals to handle the lifetime of pods.

Looking through the Warp API

Browsing through the API documentation for Warp, it wasn't too difficult to find the interesting pieces, and to see that Warp follows a fairly common pattern in Haskell libraries:

  • There's a function called runSettings that takes an argument of type Settings.
  • The default settings are available in a variable called defaultSettings (not very surprising).
  • There are several functions for modifying the settings and they all have the same shape

    setX :: X -> Settings -> Settings
    

    which makes it easy to chain them together.

  • The functions I'm interested in now are
    setOnException
    the default handler, defaultOnException, prints the exception to stdout using its Show instance
    setOnExceptionResponse
    the default responses are produced by defaultOnExceptionResponse and contain plain text response bodies
    setInstallShutdownHandler
    the default behaviour is to wait for all ongoing requests and then shut down
    setGracefulShutdownTimeout
    sets the number of seconds to wait for ongoing requests to finish; the default is to wait indefinitely

Some experimenting

In order to experiment with these I put together a small API using servant, app, with a main function that uses runSettings and strings together a bunch of modifications to defaultSettings.

main :: IO ()
main = Log.withLogger $ \logger -> do
    Log.infoIO logger "starting the server"
    runSettings (mySettings logger defaultSettings) (app logger)
    Log.infoIO logger "stopped the server"
  where
    mySettings logger = myShutdownHandler logger . myOnException logger . myOnExceptionResponse

myOnException logs JSON objects (using the logging I've written about before, here and here). It decides whether to log or not using defaultShouldDisplayException, something I copied from defaultOnException.

myOnException :: Log.Logger -> Settings -> Settings
myOnException logger = setOnException handler
  where
    handler mr e = when (defaultShouldDisplayException e) $ case mr of
        Nothing -> Log.warnIO logger $ lm $ "exception: " <> T.pack (show e)
        Just _ -> do
            Log.warnIO logger $ lm $ "exception with request: " <> T.pack (show e)

myOnExceptionResponse responds with JSON objects. It's simpler than defaultOnExceptionResponse, but it suffices for my learning.

myOnExceptionResponse :: Settings -> Settings
myOnExceptionResponse = setOnExceptionResponse handler
  where
    handler _ =
        responseLBS
            H.internalServerError500
            [(H.hContentType, "application/json; charset=utf-8")]
            (encode $ object ["error" .= ("Something went wrong" :: String)])

Finally, myShutdownHandler installs a handler for SIGTERM that logs and then shuts down.

myShutdownHandler :: Log.Logger -> Settings -> Settings
myShutdownHandler logger = setInstallShutdownHandler shutdownHandler
  where
    shutdownAction = Log.infoIO logger "closing down"
    shutdownHandler closeSocket = void $ installHandler sigTERM (Catch $ shutdownAction >> closeSocket) Nothing

Conclusion

I really ought to have looked into this sooner, especially as it turns out that Warp offers all the knobs and dials I could wish for to control these aspects of its behaviour. The next step is to take this and put it to use in one of the services at $DAYJOB.

February 03, 2024 09:16 PM

January 31, 2024

Haskell Interlude

42 : Jezen Thomas

Jezen Thomas is co-founder and CTO of Supercede, a company applying Haskell in the reinsurance industry. In this episode, Jezen, Wouter and Joachim talk about his experience using Haskell in industry, growing a diverse and remote team of developers, and starting a company to create your own Haskell job.

by Haskell Podcast at January 31, 2024 10:00 AM

Well-Typed.Com

The Haskell Unfolder Episode 19: a new perspective on foldl'

Today, 2024-01-31, at 1930 UTC (11:30 am PST, 2:30 pm EST, 7:30 pm GMT, 20:30 CET, …) we are streaming the 19th episode of the Haskell Unfolder live on YouTube.

The Haskell Unfolder Episode 19: a new perspective on foldl’

In this beginner-oriented episode we introduce a useful combinator called repeatedly, which captures the concept “repeatedly execute an action to a bunch of arguments.” We will discuss both how to implement this combinator as well as some use cases.

About the Haskell Unfolder

The Haskell Unfolder is a YouTube series about all things Haskell hosted by Edsko de Vries and Andres Löh, with episodes appearing approximately every two weeks. All episodes are live-streamed, and we try to respond to audience questions. All episodes are also available as recordings afterwards.

We have a GitHub repository with code samples from the episodes.

And we have a public Google calendar (also available as ICal) listing the planned schedule.

by andres, edsko at January 31, 2024 12:00 AM

January 25, 2024

Joachim Breitner

GHC Steering Committee Retrospective

After seven years of service as member and secretary on the GHC Steering Committee, I have resigned from that role. So this is a good time to look back and retrace the formation of the GHC proposal process and committee.

In my memory, I helped define and shape the proposal process, optimizing it for effectiveness and throughput, but memory can be misleading, and judging from the paper trail in my email archives, this was indeed mostly Ben Gamari’s and Richard Eisenberg’s achievement: Already in Summer of 2016, Ben Gamari set up the ghc-proposals Github repository with a sketch of a process and sent out a call for nominations on the GHC user’s mailing list, which I replied to. The Simons picked the first set of members, and in the fall of 2016 we discussed the committee’s by-laws and procedures. As so often, Richard was an influential shaping force here.

Three ingredients

For example, it was he who suggested that for each proposal we have one committee member be the “Shepherd”, overseeing the discussion. I believe this was one ingredient of the process’s effectiveness: there is always one person in charge, and thus we avoid the delays incurred when any one of a non-singleton set of volunteers has to do the next step (and everyone hopes someone else does it).

The next ingredient was that we do not usually require a vote among all members (again, not easy with volunteers with limited bandwidth and occasional phases of absence). Instead, the shepherd makes a recommendation (accept/reject), and if the other committee members do not complain, this silence is taken as consent, and we come to a decision. It seems this idea can also be traced back to Richard, who suggested that “once a decision is requested, the shepherd [generates] consensus. If consensus is elusive, then we vote.”

At the end of the year we agreed and wrote down these rules, created the mailing list for our internal, but publicly archived committee discussions, and began accepting proposals, starting with Adam Gundry’s OverloadedRecordFields.

At that point, there was no “secretary” role yet, so how did I become one? It seems that in February 2017 I started to clean up and refine the process documentation, fixing “bugs in the process” (like requiring authors to set Github labels when they don’t even have permissions to do that). This in particular meant that someone from the committee had to manually handle submissions and so on, and by the aforementioned principle that at every step there ought to be exactly one person in charge, the role of a secretary followed naturally. In the email in which I described that role I wrote:

Simon already shoved me towards picking up the “secretary” hat, to reduce load on Ben.

So when I merged the updated process documentation, I already listed myself as “secretary”.

It wasn’t just Simon’s shoving that put me into the role, though. I dug out my original self-nomination email to Ben, and among other things I wrote:

I also hope that there is going to be clear responsibilities and a clear workflow among the committee. E.g. someone (possibly rotating), maybe called the secretary, who is in charge of having an initial look at proposals and then assigning it to a member who shepherds the proposal.

So it is hardly a surprise that I became secretary, when it was dear to my heart to have a smooth continuous process here.

I am rather content with the result: These three ingredients – single secretary, per-proposal shepherds, silence-is-consent – helped the committee to be effective throughout its existence, even as every once in a while individual members dropped out.

Ulterior motivation

I must admit, however, there was an ulterior motivation behind me grabbing the secretary role: Yes, I did want the committee to succeed, and I did want that authors receive timely, good and decisive feedback on their proposals – but I did not really want to have to do that part.

I am, in fact, a lousy proposal reviewer. I am too generous when reading proposals, and more likely mentally fill gaps in a specification rather than spotting them. Always optimistically assuming that the authors surely know what they are doing, rather than critically assessing the impact, the implementation cost and the interaction with other language features.

And, maybe more importantly: why should I know which changes are good and which are not so good in the long run? Clearly, the authors cared enough about a proposal to put it forward, so there is some need… and I do believe that Haskell should stay an evolving and innovating language… but how does this help me decide about this or that particular feature.

I even, during the formation of the committee, explicitly asked that we write down some guidance on “Vision and Guideline”; do we want to foster change or innovation, or be selective gatekeepers? Should we accept features that are proven to be useful, or should we accept features so that they can prove to be useful? This discussion, however, did not lead to a concrete result, and the assessment of proposals relied on the sum of each member’s personal preference, expertise and gut feeling. I am not saying that this was a mistake: It is hard to come up with a general guideline here, and even harder to find one that does justice to each individual proposal.

So the secret motivation for me to grab the secretary post was that I could contribute without having to judge proposals. Being secretary allowed me to assign most proposals to others to shepherd, and only once in a while take care of a proposal myself, when it seemed to be very straightforward. Sneaky, ain’t it?

7 Years later

For years to come I happily played secretary: When an author finished their proposal and public discussion ebbed down, they would ping me on GitHub, I would pick a suitable shepherd among the committee and ask them to judge the proposal. Eventually, the committee would come to a conclusion, usually by implicit consent, sometimes by voting, and I’d merge the pull request and update the metadata thereon. Every few months I’d summarize the current state of affairs to the committee (what happened since the last update, which proposals are currently on our plate), and once per year gather the data for Simon Peyton Jones’ annual GHC Status Report. Sometimes some members needed a nudge or two to act. Some would eventually step down, and I’d send around a call for nominations, and when the nominations came in, distributed them off-list among the committee and tallied the votes.

Initially, that was exciting. For a long while it was a pleasant and rewarding routine. Eventually, it became a mere chore. I noticed that I didn’t quite care so much anymore about some of the discussion, and there was a decent amount of navel-gazing, meta-discussions and some wrangling about claims of authority that was probably useful and necessary, but wasn’t particularly fun.

I also began to notice weaknesses in the processes that I helped shape: We could really use some more automation for showing proposal statuses, notifying people when they have to act, and nudging them when they don’t. The whole silence-is-assent approach is good for throughput, but not necessarily great for quality, and maybe the committee members need to be pushed more firmly to engage with each proposal. Like GHC itself, the committee processes deserve continuous refinement and refactoring, and since I could not muster the motivation to change my now well-trod secretarial ways, it was time for me to step down.

Luckily, Adam Gundry volunteered to take over, and that makes me feel much less bad for quitting. Thanks for that!

And although I am for my day job now enjoying a language that has many of the things out of the box that for Haskell are still only language extensions or even just future proposals (dependent types, BlockArguments, do notation with (· foo) expressions and 💜 Unicode), I’m still around, hosting the Haskell Interlude Podcast, writing on this blog and hanging out at ZuriHac etc.

by Joachim Breitner (mail@joachim-breitner.de) at January 25, 2024 12:21 AM

Well-Typed.Com

Eras profiling for GHC

Memory detectives now have many avenues of investigation when looking into memory usage problems in Haskell programs. You might start by looking at what has been allocated: which types of closures and which constructors are contributing significantly to the problem. Then perhaps it’s prudent to look at why a closure has been allocated, using the info table provenance information. This will tell you from which point in the source code your allocations are coming. But if you then turn to investigate when a closure was allocated during the lifecycle of your program, you end up stuck.

Existing Haskell heap profiling tools work by taking regular samples of the heap to generate a graph of heap usage over time. This can give an aggregate view, but makes it difficult to determine when an individual closure was allocated.

Eras profiling is a new GHC profiling mode that will be available in GHC 9.10 (!11903). For each closure it records the “era” during which it was allocated, thereby making it possible to analyse the points at which closures are allocated much more precisely.

In this post, we are going to explore this profiling mode that makes it easier to find space leaks and identify suspicious long lived objects on the heap. We have discussed ghc-debug before, and we are going to make use of it to explore the new profiling mode using some new features added to the ghc-debug-brick TUI.

Introduction to eras profiling

The idea of eras profiling is to mark each closure with the era it was allocated. An era is simply a Word. The era can then be used to create heap profiles by era and also inspected by ghc-debug.

To enable eras profiling, you compile programs with profiling enabled and run with the +RTS -he option.

The era starts at 1, then there are two means by which it can be changed:

  • User: The user program has control to set the era explicitly (by using functions in GHC.Profiling.Eras).
  • Automatic: The era is incremented at each major garbage collection (enabled by --automatic-era-increment).

The user mode is most useful as this allows you to provide domain specific eras. There are three new primitive functions exported from GHC.Profiling.Eras for manipulating the era:

setUserEra :: Word -> IO ()
getUserEra :: IO Word
incrementUserEra :: Word -> IO Word

Note that the current era is a program global variable, so if your program is multi-threaded then setting the era will apply to all threads.
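As a sketch (the phase names here are invented; GHC.Profiling.Eras is only available from GHC 9.10, and the resulting profile requires a profiled build run with +RTS -he), a program can bracket its phases with era changes like this:

```haskell
import GHC.Profiling.Eras (setUserEra, incrementUserEra)

-- Invented phase stubs standing in for real work.
loadInput, processData, writeOutput :: IO ()
loadInput   = pure ()
processData = pure ()
writeOutput = pure ()

main :: IO ()
main = do
  setUserEra 1             -- era 1: loading
  loadInput
  _ <- incrementUserEra 1  -- era 2: processing
  processData
  _ <- incrementUserEra 1  -- era 3: output
  writeOutput
```

When compiled with profiling and run with +RTS -he, every closure in the profile is then attributed to the phase during which it was allocated.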

Below is an example of an eras profile rendered using eventlog2html. The eras have been increased by the user functions, and the programmer has defined 4 distinct eras.

eventlog2html eras graph, sample

Diagnosing a GHCi memory leak

Recently, we came across a regression in GHCi’s memory behaviour (#24116), where reloading a project would use double the amount of expected memory. During each reload of a project, the memory usage would uniformly increase, only to return to the expected level after the reload had concluded.

Reproducing the problem

In order to investigate the issue we loaded Agda into a GHCi session and reloaded the project a few times. Agda is the kind of project we regularly use to analyse compiler performance, as it’s a typical example of a medium-sized Haskell application.

eventlog2html detail chart, agda

The profile starts with the initial load of Agda into GHCi, then each subsequent vertical line represents a :reload call.

We can see that while loading the project a second time, GHCi seems to hold on to all of the in-memory compilation artifacts from the first round of compilation, before releasing them right as the load finishes. This is not expected behaviour for GHCi, and ideally it should stay at a roughly constant level of heap usage during a reload.

During a reload, GHCi should either

  1. Keep compilation artifacts from the previous build in memory if it determines that they are still valid after recompilation avoidance checking
  2. Release them as soon as it realizes they are out of date, replacing them with fresh artifacts

In either case, the total heap usage shouldn’t change, since we are either keeping old artifacts or replacing them with new ones of roughly similar size. This task is a perfect fit for eras profiling: if we can assign a different era to each reload, then we should be able to easily confirm our hypothesis about the memory usage pattern.

Instrumenting GHCi

First we instrument the program so that the era is incremented on each reload. We do this by modifying the load function in GHC to increment the era on each call:

--- a/compiler/GHC/Driver/Make.hs
+++ b/compiler/GHC/Driver/Make.hs
@@ -153,6 +153,7 @@ import GHC.Utils.Constants
 import GHC.Types.Unique.DFM (udfmRestrictKeysSet)
 import GHC.Types.Unique
 import GHC.Iface.Errors.Types
+import GHC.Profiling.Eras

 import qualified GHC.Data.Word64Set as W

@@ -702,6 +703,8 @@ load' mhmi_cache how_much diag_wrapper mHscMessage mod_graph = do
     -- In normal usage plugins are initialised already by ghc/Main.hs this is protective
     -- for any client who might interact with GHC via load'.
     -- See Note [Timing of plugin initialization]
+    era <- liftIO getUserEra
+    liftIO $ setUserEra (era + 1)
     initializeSessionPlugins
     modifySession $ \hsc_env -> hsc_env { hsc_mod_graph = mod_graph }
     guessOutputFile

Then when running the benchmark with eras profiling enabled, the profile looks as follows:

eventlog2html eras graph, agda

Now we can clearly see that after the reload (the vertical line), all the memory which has been allocated during era 2 remains alive as newly allocated memory belongs to era 3.

Identifying a culprit closure

With the general memory usage pattern established, it’s time to look more closely at the culprits. By performing an info table profile and looking at the detailed tab, we can identify a specific closure which contributes to the bad memory usage pattern.

The GRE closure is one of the top entries in the info table profile, and we can see that its pattern of heap usage matches the overall shape of the graph, which means that we are probably incorrectly holding on to GREs from the first round of compilation.

eventlog2html screenshot, detailed

Now we can turn to more precise debugging tools in order to actually determine where the memory leak is.

Looking closer with ghc-debug

We decided to investigate the leak using ghc-debug. After instrumenting the GHC executable we can connect to a running ghc process with ghc-debug-brick and explore its heap objects using a TUI interface.

Tracing retainers with ghc-debug

To capture the leak, we pause the GHC process right before it finishes the reload, while it is compiling the final few modules of Agda. Remember that all the memory is released at the end of the reload, returning to the expected baseline.

In order to check for the cause of the leak, we do a search for the retainers of GRE closure in ghc-debug-brick. We are searching for GRE because the info table profile indicated that this was one type of closure which was leaking.

Before eras profiling, if we tried to use this knowledge and ghc-debug-brick to find out why the GREs are being retained, we would get stuck. Looking at the interface, we can’t distinguish between the two distinct classes of live GREs:

  1. Fresh GREs from the current load (era 3), which we really do need in memory.
  2. Stale GREs from the first load (era 2), which shouldn’t be live anymore and should have been released.
ghc-debug-brick screenshot, no eras

This is the retainer view of ghc-debug, where all closures matching our search (constructor is GRE) are listed, and expanding a closure shows a tree with a retainer stack of all the heap objects which retain the closure. Reading this stack downwards you can determine a chain of references through which any particular closure is retained, going all the way back to a GC root. Inspecting the retainer stack can shed light on why your program is holding on to a particular object.

We can try to scroll through the list of GREs in the TUI, carefully inspecting the retainer stack of each in turn and using our domain knowledge to classify each GRE closure as belonging to one of the two categories above.

However, this process is tedious and error prone, especially given that we have such a large number of potentially leaking objects to inspect. Depending on the order that ghc-debug happened to traverse the heap, we might find leaking entries after inspecting the first few items, or we may be very unlucky and all the leaking items might be hundreds or thousands of entries deep into the list.

ghc-debug supercharged with eras profiling

Now with eras profiling there are two extensions to ghc-debug which make it easy to distinguish these two cases, since we have already recorded the era of each object.

  • Filtering By Era: You can filter the results of any search to only include objects allocated during particular eras, given by ranges.
  • Colouring by Era: You can also enable colouring by era, so that the background colour of entries in ghc-debug-brick is selected based on the era the object was allocated in, making it easy to visually partition the heap and quickly identify leaking objects.

So now, if we enable filtering by era, it’s easy to distinguish the new and old closures.

With the new filtering mode, we search for retainers of GREs which were allocated in era 2. Now we can inspect the retainer stacks of any one of these closures with the confidence that it has leaked. Because ghc-debug-brick is also colouring by era, we can also easily identify roughly where in the retainer stack the leak occurs, because we can see new objects (from era 3) holding on to objects from the previous era (era 2).

We can see that a GRE from era 2 (green) is being retained, through a thunk from GHC.Driver.Make allocated in era 3 (yellow):

ghc-debug-brick screenshot, with closures coloured by era

The location of the thunk tells us the exact location of the leak, and it is now just a matter of understanding why this code is holding on to the unwanted objects and plugging the leak. For more details on the actual fix, see !11608.

Conclusion

We hope the new ghc-debug features and the eras profiling mode will be useful to others investigating the memory behaviour of their programs and easily identifying leaking objects which should not be retained in memory.

This work has been performed in collaboration with Mercury. Mercury have a long-term commitment to the scalability and robustness of the Haskell ecosystem and are supporting the development of memory profiling tools to aid with these goals.

Well-Typed are always interested in projects and looking for funding to improve GHC and other Haskell tools. Please contact info@well-typed.com if we might be able to work with you!

by matthew, zubin at January 25, 2024 12:00 AM

January 18, 2024

Tweag I/O

A look under GHC's hood: desugaring linear types

I recently merged linear let- and where-bindings in GHC. Which means that we’ll have these in GHC 9.10, which is cause for celebration for me. Though they are much overdue, so maybe I should instead apologise to you.

Anyway, I thought I’d take the opportunity to discuss some of GHC’s inner workings and how they explain some of the features of linear types in Haskell. We’ll be discussing a frequently asked question: why can’t Ur be a newtype? And some new questions such as: why must linear let-bindings have a !? But first…

GHC’s frontend part 1: typechecking

Like a nature documentary, let’s follow the adventure of a single declaration as it goes through GHC’s frontend. Say

f True x y = [ toUpper c | c <- show x ++ show y ]
f False x y = show x ++ y

There’s a lot left implicit here: What are the types of x, y, and c? Where do the instances of Show required by show come from? (Magic from outer space?)

Figuring all this out is the job of the typechecker. The typechecker is carefully designed; it’s actually very easy to come up with a sound type system which is undecidable. But even then, decidability is not enough: we don’t want swapping the True and the False equations of f to change the result of the typechecker. Nor should pulling part of a term into a let, as in let s = show x ++ show y in [ toUpper c | c <- s ]. We want the algorithm to be stable and predictable.

GHC’s typechecker largely follows the algorithm described in the paper OutsideIn(x). It would walk the declaration as:

  • x is used as an argument of show, so it has some type tx and we need Show tx.
  • y is used as an argument of show, so it has some type ty and we need Show ty.
  • c is bound to the return of show, so it has some type tc such that [tc] is [Char] (from which we know that tc is Char).
  • c is used as an argument of toUpper so tc is Char (which we already knew from the previous point, so there’s no type error).
  • From all this we deduce that f :: Bool -> tx -> ty -> [Char].
  • Then the typechecker proceeds through the second equation and discovers that ty is [Char].
  • So the type of f is Bool -> tx -> [Char] -> [Char].
  • Only now, we look at tx and notice that we don’t know anything about it, except that it must have Show tx. So tx is generalised and we get the final type for f: f :: forall a. Show a => Bool -> a -> [Char] -> [Char].
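As a quick sanity check of the walkthrough above, we can write the inferred type as an explicit signature and try the function out (the Int instantiation is just an arbitrary choice):

```haskell
import Data.Char (toUpper)

-- The example declaration, annotated with the type the walkthrough infers.
f :: Show a => Bool -> a -> [Char] -> [Char]
f True  x y = [ toUpper c | c <- show x ++ show y ]
f False x y = show x ++ y

main :: IO ()
main = do
  putStrLn (f True  (42 :: Int) "hi")  -- show 42 ++ show "hi", upper-cased: 42"HI"
  putStrLn (f False (42 :: Int) "hi")  -- show 42 ++ "hi": 42hi
```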

Where we’re allowed to generalise is very important: if the typechecking algorithm could generalise everywhere it’d be undecidable. So the typechecker only generalises at bindings (it even only generalises at top-level bindings with -XMonoLocalBinds), and where it knows that it needs a forall (usually: because of a type signature).
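As a small illustration of where generalisation can and cannot happen (ordinary Haskell with default extensions, i.e. NoMonoLocalBinds):

```haskell
-- Accepted: g is generalised at its let-binding, so it gets type
-- forall a. a -> a and can be used at both Bool and Char.
ok :: (Bool, Char)
ok = let g = \z -> z in (g True, g 'x')

-- Rejected if uncommented: here g is lambda-bound, not let-bound, so the
-- typechecker never generalises it and it cannot be used at two types.
-- bad = (\g -> (g True, g 'x')) (\z -> z)

main :: IO ()
main = print ok
```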

Of course, there’s no way I could give a faithful account of GHC’s typechecker in a handful of paragraphs. I’m sweeping a ton under the carpet (such as bidirectional typechecking or the separation of constraint generation and constraint solving).

GHC’s frontend part 2: desugaring

Now that we know all of this type information, our declaration is going to get desugared. In GHC, this means that the expression is translated to a language called Core. Our expression looks like this after desugaring:

f = \(@a :: Type) -> \(d :: Show a) -> \(b :: Bool) -> \(x :: a) -> \(y :: [Char]) ->
  case b of
  { True -> map @Char (\(c::Char) -> toUpper c) ((++) @Char (show d x) (show (showList showChar) y))
  , False -> (++) @Char (show d x) y
  }

There are a few things to notice:

  • This language is typed.
  • Nothing is left implicit: all the types are tediously specified on each variable binding, on each forall, and on each application of a polymorphic function.
  • Core is a language with far fewer features than Haskell. Gone are the type classes (replaced with actual function arguments), gone are list comprehensions, gone are equations, in fact gone are deep pattern matching, “do” notation, and so many other things.

For the optimiser (Core is the language where most of the optimisations take place), this means dealing with fewer constructs. So there’s less interaction to consider. Types are used to drive some optimisations such as specialisation. But by the time we reach the optimiser, we’ve already left the frontend, and it’s a story for another blog post maybe.

For language authors, this means something very interesting: when defining a new language construction, we can describe its semantics completely by specifying its desugaring. This is a powerful technique, which is used, for instance, in many GHC proposals.

Ur can’t be a newtype

Let me use this idea that a feature’s semantics is given by its desugaring to answer a common question: why must Ur be boxed, couldn’t it be a newtype instead?

To this end, consider the following function, which illustrates the meaning of Ur

f :: forall a. Ur a %1 -> Bool
f (Ur _) = True

That is, Ur a is a wrapper over a that lets me use the inner a as many times as I wish (here, none at all).

The desugaring is very straightforward:

f = \(@a :: Type) -> \(x :: Ur a) ->
  case x of
    Ur y -> True

Because Ur a is just a wrapper, it looks as though it could be defined as a newtype. But newtypes are actually desugared quite differently and this will turn out to be a problem. That is, if Ur a was a newtype then it would have the same underlying representation as a, the desugaring reflects that we can always use one in place of the other. Specifically, desugaring a newtype is done via a coercion of the form ur_co :: Ur a ~R a, which lets Core convert back and forth between types Ur a and a (the R stands for representational). With a newtype the desugaring of f would be:

f_newtype = \(@a :: Type) -> \(x :: Ur a) ->
  let
    y :: a
    y = x `cast` ur_co
  in
  True

Because y isn’t used at all, this is optimised to:

f_newtype = \(@a :: Type) -> \(x :: Ur a) -> True

See the problem yet?

Consider f_newtype destroyTheOneRing. This evaluates to True; the one ring is never destroyed! Contrast with f destroyTheOneRing which reduces to:

case destroyTheOneRing of
  Ur y -> True

This forces me to destroy the one ring before returning True (in our story, the ring is destroyed when the Ur constructor is forced).

This is why Ur can’t be a newtype and why GHC’s typechecker rejects attempts to define an Ur newtype: such a newtype would semantically violate the linearity discipline.

Exercise: this objection wouldn’t apply in a strict language. Do you see why?
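The forcing behaviour is observable in ordinary (non-linear) Haskell; here is a small sketch using Debug.Trace, with a simplified boxed Ur defined locally:

```haskell
import Debug.Trace (trace)

-- A simplified boxed Ur, without the linearity annotations.
data Ur a = Ur a

-- Matching on the constructor forces the box, like the desugared case above.
fBoxed :: Ur a -> Bool
fBoxed u = case u of Ur _ -> True

-- Ignoring the argument never forces it, like the optimised newtype version.
fCast :: Ur a -> Bool
fCast _ = True

main :: IO ()
main = do
  print (fBoxed (trace "ring destroyed" (Ur ())))  -- the trace fires: the box is forced
  print (fCast  (trace "ring destroyed" (Ur ())))  -- no trace: the argument is dropped unevaluated
```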

No linear lazy patterns

In Haskell, a pattern in a let-binding is lazy. That is:

let (x, y) = bakeTurkey in body

Desugars to:

let
  xy = bakeTurkey
  x = fst xy
  y = snd xy
in
body

Is this linear in bakeTurkey? You can make this argument: although it’s forced several times (once when forcing x and once when forcing y), laziness will ensure that you don’t actually bake your turkey twice. There are some systems, such as Linearity and laziness [pdf], which treat lazy patterns as linear.
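Indeed, in plain (non-linear) Haskell the shared thunk in this desugaring is forced only once, which we can check with Debug.Trace:

```haskell
import Debug.Trace (trace)

main :: IO ()
main = do
  -- The lazy pattern desugars to one shared thunk plus two selector thunks,
  -- so "baking" is emitted once even though both components are demanded.
  let (x, y) = trace "baking" (1 :: Int, 2 :: Int)
  print (x + y)
```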

But it’s awfully tricky to treat the desugared code as linear. A typechecker that does so would have to understand that x and y are disjoint (so as to make sure that we don’t bake parts of the turkey twice) and that they actually cover all of xy (so that no part of the turkey is left unbaked). And because Core is typed, we do have to make sure that the desugaring is linear.

At this point, this is a complication that we don’t want to tackle. If we wanted to, it would require some additional research; it’s not just a matter of engineering time. As a consequence, patterns in linear let-bindings must be annotated with a !, making them strict:

let !(x, y) = bakeTurkey in body

which desugars to:1

case bakeTurkey of
  (x, y) -> body

Which is understood as linear in Core.

This is really only for actual patterns. Variables let x = rhs in body, which desugar to themselves, are fine.

No linear polymorphic patterns

Finally, let’s look at a trickier interaction. Consider the following:

let (x, y) = ((\z -> z), (\z -> z)) in
case b of
  True -> (x 'a', y False)
  False -> (x True, y 'b')

If MonoLocalBinds is turned off, then x and y are inferred to have the type forall a. a -> a. What’s really interesting is the desugaring:

let
  xy = \@a -> ((\(z::a) -> z), (\(z::a) -> z))
  x = \@a -> fst (xy @a)
  y = \@a -> snd (xy @a)
in
case b of
  True -> (x @Char 'a', y @Bool False)
  False -> (x @Bool True, y @Char 'b')

An alternative desugaring might have been:

let
  xy = ((\@a (z::a) -> z), (\@a (z::a) -> z))
  x = fst xy
  y = snd xy
in
case b of
  True -> (x @Char 'a', y @Bool False)
  False -> (x @Bool True, y @Char 'b')

Which does indeed give the same type to x and y. But this alternative desugaring can’t be produced because of how the typechecker works: remember that the typechecker only considers generalisation (which corresponds to adding \@a -> in the desugaring) at the root of a let-binding, which isn’t the case in the alternative desugaring.

So, what does this all have to do with us anyway? Well, here’s the thing. If the alternative desugaring can be transformed into a case (in the strict case):

case ((\@a (z::a) -> z), (\@a (z::a) -> z)) of
  (x, y)  ->
    case b of
      True -> (x @Char 'a', y @Bool False)
      False -> (x @Bool True, y @Char 'b')

Then the actual desugaring cannot:

-- There's no type to use for `??`
case (\@a -> ((\(z::a) -> z), (\(z::a) -> z))) @?? of
  (x, y) ->
    -- Neither `x` nor `y` is polymorphic, so the result doesn't
    -- typecheck either.
    case b of
      True -> (x @Char 'a', y @Bool False)
      False -> (x @Bool True, y @Char 'b')

So we’re back to the situation of the lazy pattern matching: we might be able to make the desugaring linear without endangering soundness, but it’s too much effort.

So pattern let-bindings can’t be both linear and polymorphic (here again, variables are fine). This means in practice that:

  • In let %1 (x, y) = rhs, the type of x and y is never generalised.
  • Starting with GHC 9.10, LinearTypes will imply MonoLocalBinds.
  • If you turn MonoLocalBinds off, however, and the typechecker infers a polymorphic type for let (x, y) = rhs then rhs will count as being used Many times.

What’s really interesting, here, is that this limitation follows mainly from the typechecking algorithm, which can’t infer every permissible type otherwise it wouldn’t always terminate.

Wrapping up

This is the end of our little journey in the interactions of typechecking, desugaring, and linear types. Maybe you’ve learned a thing or two about what makes GHC tick. Hopefully you’ve learned something about why the design of linear types is how it is.

What I’d like to stress before I leave you is that the design of GHC is profound. We’re not speaking about accidents of history that make some features harder to implement than they need to be. It’s quite possible that a future exists where we can lift some of the limitations that I discussed (not Ur being a newtype, though: that one is simply unsound). But the reason why they are hard to lift is that they require actual research to understand what the relaxed condition implies. They are hard to lift because they are fundamentally hard problems! GHC forces you to think about it. It forces you to deeply understand the features you implement, and tells you to go do your research if you don’t.


  1. This is not actually how strict lets used to desugar. Instead they desugared to:

    let
      xy = bakeTurkey
      x = fst xy
      y = snd xy
    in
    xy `seq` body

    Which will eventually optimise to the appropriate case expression. This desugaring generalises to (mutually) recursive strict bindings, but these aren’t relevant for us as recursive definition lets aren’t linear.

    As part of the linear let patch, I did change the desugaring to target a case directly in the non-recursive case, in order to support linear lets.

January 18, 2024 12:00 AM

Michael Snoyman

My Best and Worst Deadlock in Rust

We're going to build up a deadlock together. If you're unfamiliar with Rust and/or its multithreaded concepts, you'll probably learn a lot from this. If you are familiar with Rust's multithreading capabilities, my guess is you'll be as surprised by this deadlock as I was. And if you spot the deadlock immediately, you get a figurative hat-tip from me.

As to the title, this deadlock was the worst I ever experienced because of how subtle it was. It was the best because the tooling told me exactly where the problem was. You'll see both points come out below.

Access control

If you've read much of my writing, you'll know I almost always introduce a data structure that looks like this:

struct Person {
    name: String,
    age: u32,
}

So we'll do something very similar here! I'm going to simulate some kind of an access control program that allows multiple threads to use some shared, mutable state representing a person. And we'll make two sets of accesses to this state:

  • A read-only thread that checks if the user has access
  • A writer thread that will simulate a birthday and make the person 1 year older

Our access control is really simple: we grant access to people 18 years or older. One way to write this program looks like this:

use std::sync::Arc;

use parking_lot::RwLock;

#[derive(Clone)]
struct Person {
    inner: Arc<RwLock<PersonInner>>,
}

struct PersonInner {
    name: String,
    age: u32,
}

impl Person {
    fn can_access(&self) -> bool {
        const MIN_AGE: u32 = 18;

        self.inner.read().age >= MIN_AGE
    }

    /// Returns the new age
    fn birthday(&self) -> u32 {
        let mut guard = self.inner.write();
        guard.age += 1;
        guard.age
    }
}

fn main() {
    let alice = Person {
        inner: Arc::new(RwLock::new(PersonInner {
            name: "Alice".to_owned(),
            age: 15,
        })),
    };

    let alice_clone = alice.clone();
    std::thread::spawn(move || loop {
        println!("Does the person have access? {}", alice_clone.can_access());
        std::thread::sleep(std::time::Duration::from_secs(1));
    });

    for _ in 0..10 {
        std::thread::sleep(std::time::Duration::from_secs(1));
        let new_age = alice.birthday();

        println!("Happy birthday! Person is now {new_age} years old.");
    }
}

We're using the wonderful parking-lot crate for this example. Since we have one thread which will exclusively read, an RwLock seems like the right data structure to use. It will allow us to take multiple concurrent read locks or one exclusive write lock at a time. For those familiar with it, this is very similar to the general Rust borrow rules, which allow for multiple read-only (or shared) references or a single mutable (or exclusive) reference.

Anyway, we follow a common pattern with our Person data type. It has a single inner field, which contains an Arc and RwLock wrapping around our inner data structure, which contains the actual name and age. Now we can cheaply clone the Person, keep a single shared piece of data in memory for multiple threads, and either read or mutate the values inside.

Next up, to provide nicely encapsulated access, we provide a series of methods on Person that handle the logic of getting read or write locks. In particular, the can_access method takes a read lock, gets the current age, and compares it to the constant value 18. The birthday method takes a write lock and increments the age, returning the new value.

If you run this on your computer, you'll see something like the following output:

Does the person have access? false
Happy birthday! Person is now 16 years old.
Does the person have access? false
Happy birthday! Person is now 17 years old.
Does the person have access? false
Does the person have access? false
Happy birthday! Person is now 18 years old.
Does the person have access? true
Happy birthday! Person is now 19 years old.
Does the person have access? true
Happy birthday! Person is now 20 years old.
Does the person have access? true
Happy birthday! Person is now 21 years old.
Does the person have access? true
Happy birthday! Person is now 22 years old.
Does the person have access? true
Happy birthday! Person is now 23 years old.
Does the person have access? true
Happy birthday! Person is now 24 years old.
Does the person have access? true
Happy birthday! Person is now 25 years old.

The output may look slightly different due to timing differences, but you get the idea. The person, whoever that happens to be, suddenly has access starting at age 18.

NOTE TO READER I'm not going to keep asking this, but I encourage you to look at each code sample and ask: is this the one that introduces the deadlock? I'll give you the answers towards the end of the post.

What's in a name?

It's pretty annoying having no idea who has access. Alice has a name! We should use it. Let's implement a helper method for getting the person's name:

fn get_name(&self) -> &String {
    &self.inner.read().name
}

While this looks nice, it doesn't actually compile:

error[E0515]: cannot return value referencing temporary value
  --> src/main.rs:30:9
   |
30 |         &self.inner.read().name
   |         ^-----------------^^^^^
   |         ||
   |         |temporary value created here
   |         returns a value referencing data owned by the current function

You see, the way an RwLock's read method works is that it returns a RwLockReadGuard. This implements all the borrow rules we want to see at runtime via value creation and dropping. Said more directly: when you call read, it does something like the following:

  1. Waits until it's allowed to take a read guard. For example, if there's an existing write guard active, it will block until that write guard finishes.
  2. Increments a counter somewhere indicating that there's a new active read guard.
  3. Constructs the RwLockReadGuard value.
  4. When that value gets dropped, its Drop impl will decrement that counter.

And this is basically how many interior mutability primitives in Rust work, whether it's an RwLock, Mutex, or RefCell.

The problem with our implementation of get_name is that it tries to take a lock and then borrow a value through the lock. However, when we exit the get_name method, the returned reference would still point into the RwLockReadGuard, which is dropped at the end of the method. So how do we implement this method? There are a few possibilities:

  • Return the RwLockReadGuard<PersonInner>. This is no longer a get_name method, but now a general purpose "get a read lock" method. It's also unsatisfying because it requires exposing the innards of our inner data structure.
  • Clone the inner String, which is unnecessary allocation.
  • Wrap the name field with an Arc and clone the Arc, which is probably cheaper than cloning the String.

There are really interesting API design points implied by all this, and it would be fun to explore them another time. However, right now, I've got a tight deadline from my boss on the really important feature of printing out the person's name, so I'd better throw together something really quick and direct. And the easiest thing to do is to just lock the RwLock directly wherever we want a name.

We'll make a small tweak to our spawned thread's closure:

std::thread::spawn(move || loop {
    let guard = alice_clone.inner.read();
    println!(
        "Does the {} have access? {}",
        guard.name,
        alice_clone.can_access()
    );
    std::thread::sleep(std::time::Duration::from_secs(1));
});

Delays

The definition of insanity is doing the same thing over and over and expecting different results

- Somebody, but almost certainly not Albert Einstein

By the above definition of insanity, many have pointed out that multithreaded programming is asking the programmer to become insane. You need to expect different results for different runs of a program. That's because the interleaving of actions between two different threads is non-deterministic. Random delays, scheduling differences, and much more can cause a program to behave correctly on one run and completely incorrectly on another. Which is what makes deadlocks so infuriatingly difficult to diagnose and fix.

So let's simulate some of those random delays in our program by pretending that we need to download some super cute loading image while checking access. I've done so with a println call and an extra sleep to simulate the network request time:

    std::thread::spawn(move || loop {
        let guard = alice_clone.inner.read();
        println!("Downloading a cute loading image, please wait...");
        std::thread::sleep(std::time::Duration::from_secs(1));
        println!(
            "Does the {} have access? {}",
            guard.name,
            alice_clone.can_access()
        );
        std::thread::sleep(std::time::Duration::from_secs(1));
    });

And when I run my program, lo and behold, output stops after printing Downloading a cute loading image, please wait.... Maybe the output will be a bit different on your computer, maybe not. That's the nature of the non-deterministic beast. But this appears to be a deadlock.

The best deadlock experience ever

It turns out that the parking-lot crate provides an experimental feature: deadlock detection. When we were facing the real-life deadlock in our production systems, Sibi found this feature and added it to our executable. And boom! The next time our program deadlocked, we immediately got a backtrace pointing us to the exact function where the deadlock occurred. Since it was a release build, we didn't get line numbers, since those had been stripped out. But since I'm doing a debug build for this blog post, we're going to get something even better here.
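If you want to try this at home, note that deadlock detection is gated behind a Cargo feature flag. Assuming the parking_lot 0.12 release seen in the backtraces below, enabling it looks like:

```toml
[dependencies]
# The deadlock_detection feature enables parking_lot::deadlock::check_deadlock
parking_lot = { version = "0.12", features = ["deadlock_detection"] }
```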

Let's add in the following code to the top of our main function:

    std::thread::spawn(move || loop {
        std::thread::sleep(std::time::Duration::from_secs(2));
        for deadlock in parking_lot::deadlock::check_deadlock() {
            for deadlock in deadlock {
                println!(
                    "Found a deadlock! {}:\n{:?}",
                    deadlock.thread_id(),
                    deadlock.backtrace()
                );
            }
        }
    });

Every 2 seconds, this background thread will check if parking-lot has detected any deadlocks and print out the thread they occurred in and the full backtrace. (Why 2 seconds? Totally arbitrary. You could use any sleep amount you want.) When I add this to my program, I get some very helpful output. I'll slightly trim the output to not bother with a bunch of uninteresting backtraces outside of the main function:

Found a deadlock! 140559740036800:
   0: parking_lot_core::parking_lot::deadlock_impl::on_unpark
             at /home/michael/.cargo/registry/src/index.crates.io-6f17d22bba15001f/parking_lot_core-0.9.9/src/parking_lot.rs:1211:32
   1: parking_lot_core::parking_lot::deadlock::on_unpark
             at /home/michael/.cargo/registry/src/index.crates.io-6f17d22bba15001f/parking_lot_core-0.9.9/src/parking_lot.rs:1144:9
   2: parking_lot_core::parking_lot::park::{{closure}}
             at /home/michael/.cargo/registry/src/index.crates.io-6f17d22bba15001f/parking_lot_core-0.9.9/src/parking_lot.rs:637:17
   3: parking_lot_core::parking_lot::with_thread_data
             at /home/michael/.cargo/registry/src/index.crates.io-6f17d22bba15001f/parking_lot_core-0.9.9/src/parking_lot.rs:207:5
      parking_lot_core::parking_lot::park
             at /home/michael/.cargo/registry/src/index.crates.io-6f17d22bba15001f/parking_lot_core-0.9.9/src/parking_lot.rs:600:5
   4: parking_lot::raw_rwlock::RawRwLock::lock_common
             at /home/michael/.cargo/registry/src/index.crates.io-6f17d22bba15001f/parking_lot-0.12.1/src/raw_rwlock.rs:1115:17
   5: parking_lot::raw_rwlock::RawRwLock::lock_shared_slow
             at /home/michael/.cargo/registry/src/index.crates.io-6f17d22bba15001f/parking_lot-0.12.1/src/raw_rwlock.rs:719:9
   6: <parking_lot::raw_rwlock::RawRwLock as lock_api::rwlock::RawRwLock>::lock_shared
             at /home/michael/.cargo/registry/src/index.crates.io-6f17d22bba15001f/parking_lot-0.12.1/src/raw_rwlock.rs:109:26
   7: lock_api::rwlock::RwLock<R,T>::read
             at /home/michael/.cargo/registry/src/index.crates.io-6f17d22bba15001f/lock_api-0.4.11/src/rwlock.rs:459:9
   8: access_control::Person::can_access
             at src/main.rs:19:9
   9: access_control::main::{{closure}}
             at src/main.rs:59:13
  10: std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/79e9716c980570bfd1f666e3b16ac583f0168962/library/std/src/sys_common/backtrace.rs:154:18

Wow, this gave us a direct pointer to where in our codebase the problem occurs. The deadlock happens in the can_access method, which is called from our println! macro call in main.

In a program of this size, getting a direct link to the relevant code isn't terribly helpful. There were only a few lines that could have possibly caused the deadlock. However, in our production codebase, we have thousands of lines of code in the program itself that could have possibly been related. And it turns out the program itself wasn't even the culprit, it was one of the support libraries we wrote!

Being able to get such direct information on a deadlock is a complete gamechanger for debugging problems of this variety. Absolutely huge props and thanks to the parking-lot team for providing this.

But what's the problem?

OK, now it's time for the worst. We still need to identify what's causing the deadlock. Let's start off with the actual deadlock location: the can_access method:

fn can_access(&self) -> bool {
    const MIN_AGE: u32 = 18;

    self.inner.read().age >= MIN_AGE
}

Is this code, on its own, buggy? Try as I might, I can't possibly find a bug in this code. And there isn't one. This is completely legitimate usage of a read lock. In fact, it's a great demonstration of best practices: we take the lock for as little time as needed, ensuring we free the lock and avoiding contention.

So let's go up the call stack and look at the body of our subthread infinite loop:

let guard = alice_clone.inner.read();
println!("Downloading a cute loading image, please wait...");
std::thread::sleep(std::time::Duration::from_secs(1));
println!(
    "Does the {} have access? {}",
    guard.name,
    alice_clone.can_access()
);
std::thread::sleep(std::time::Duration::from_secs(1));

This code is already pretty suspicious. The first thing that pops out to me when reading this code is the sleeps. We're doing something very inappropriate: holding onto a read lock while sleeping. This is a sure-fire way to cause contention for locks. It would be far superior to only take the locks for a limited period of time. Because lexical scoping leads to drops, and drops lead to freeing locks, one possible implementation would look like this:

println!("Downloading a cute loading image, please wait...");
std::thread::sleep(std::time::Duration::from_secs(1));
{
    let guard = alice_clone.inner.read();
    println!(
        "Does the {} have access? {}",
        guard.name,
        alice_clone.can_access()
    );
}
std::thread::sleep(std::time::Duration::from_secs(1));
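As an aside, lexical scope isn't the only way to release a guard early: calling drop on it explicitly works just as well. A minimal sketch, using the standard library's RwLock so it stands alone:

```rust
use std::sync::RwLock;

fn main() {
    let lock = RwLock::new(String::from("Alice"));

    let guard = lock.read().unwrap();
    println!("name: {}", *guard);
    // Explicitly dropping the guard releases the read lock,
    // just like leaving a lexical scope would.
    drop(guard);

    // Now a write lock can be taken without blocking.
    let mut guard = lock.write().unwrap();
    guard.push_str(" the Adult");
    assert_eq!(*guard, "Alice the Adult");
}
```

Which style to prefer is mostly taste; the braces make the critical section visually obvious, while drop reads more linearly.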

This scoped version of the code is an improvement. We've eliminated a legitimate performance issue of over-locking a value. And if you run it, you might see output like the following:

Downloading a cute loading image, please wait...
Happy birthday! Person is now 16 years old.
Does the Alice have access? false
Happy birthday! Person is now 17 years old.
Downloading a cute loading image, please wait...
Happy birthday! Person is now 18 years old.
Does the Alice have access? true
Happy birthday! Person is now 19 years old.
Downloading a cute loading image, please wait...
Happy birthday! Person is now 20 years old.
Does the Alice have access? true
Downloading a cute loading image, please wait...
Happy birthday! Person is now 21 years old.
Happy birthday! Person is now 22 years old.
Does the Alice have access? true
Downloading a cute loading image, please wait...
Happy birthday! Person is now 23 years old.
Does the Alice have access? true
Happy birthday! Person is now 24 years old.
Happy birthday! Person is now 25 years old.
Downloading a cute loading image, please wait...

However, you may also see another deadlock message! So our change is a performance improvement, and makes it more likely for our program to complete without hitting the deadlock. But the deadlock is still present. But where???

Why I thought this isn't a deadlock

It's worth pausing one quick moment before explaining where the deadlock is. (And figurative hat-tip if you already know.) Our program has three threads of execution:

  1. The deadlock detection thread. We know this isn't the cause of the deadlock, because we added that thread after we saw the deadlock. (Though "deadlock detection thread leads to deadlock" would be an appropriately mind-breaking statement to make.)
  2. The access check thread, which only does read locks.
  3. The main thread, where we do the birthday updates. We'll call it the birthday thread instead. This thread takes write locks.

And my assumption going into our debugging adventure is that this is perfectly fine. The birthday thread will keep blocking waiting for a write lock. It will block as long as the access check thread is holding a read lock. OK, that's part of a deadlock: thread B is waiting on thread A. And the access check thread will wait for the birthday thread to release its write lock before it can grab a read lock. That's another component of a deadlock. But it seems like each thread can always complete its locking without waiting on the other thread.

If you don't know what the deadlock is yet, and want to try to figure it out for yourself, go check out the RwLock docs from the standard library. But we'll continue the analysis here.

How many read locks?

At this point in our real-life debugging, Sibi observed something: our code was less efficient than it should be. Focus on this bit of code:

let guard = alice_clone.inner.read();
println!(
    "Does the {} have access? {}",
    guard.name,
    alice_clone.can_access()
);

If we inline the definition of can_access, the problem becomes more obvious:

let guard = alice_clone.inner.read();
println!("Does the {} have access? {}", guard.name, {
    const MIN_AGE: u32 = 18;

    alice_clone.inner.read().age >= MIN_AGE
});

The inefficiency is that we're taking two read locks instead of one! We already read-lock inner to get the name, and then we call alice_clone.can_access(), which takes its own read lock. This is good from a code reuse standpoint. But it's not good from a resource standpoint. During our debugging session, I agreed that this warranted further investigation, but we continued looking for the deadlock.

Turns out, I was completely wrong. This wasn't just an inefficiency. This is the deadlock. But how? It turns out, I'd missed a very important piece of the documentation for RwLock.

This lock uses a task-fair locking policy which avoids both reader and writer starvation. This means that readers trying to acquire the lock will block even if the lock is unlocked when there are writers waiting to acquire the lock. Because of this, attempts to recursively acquire a read lock within a single thread may result in a deadlock.

Or, to copy from std's docs, we have a demonstration of how to generate a potential deadlock with seemingly innocuous code:

// Thread 1             |  // Thread 2
let _rg = lock.read();  |
                        |  // will block
                        |  let _wg = lock.write();
// may deadlock         |
let _rg = lock.read();  |

This is exactly what our code above was doing: the access check thread took a first read lock to get the name, then took a second read lock inside the can_access method to check the age. By introducing a sleep in between these two actions, we increased the likelihood of the deadlock occurring by giving a wider timespan when the write lock from the birthday thread could come in between those two locks. But the sleep was not the bug. The bug was taking two read locks in the first place!

Let's first try to understand why RwLock behaves like this, and then put together some fixes.

Fairness and starvation

Imagine that, instead of a single access check thread, we had a million of them. Each of them is written so that it grabs a read lock, holds onto it for about 200 milliseconds, and then releases it. With a million such threads, there's a fairly high chance that the birthday thread will never be able to get a write lock. There will always be at least one read lock active.

This problem is starvation: one of the workers in a system is never able to get a lock, and therefore it's starved from doing any work. This can be more than just a performance issue, it can completely undermine the expected behavior of a system. In our case, Alice would remain 15 for the entire lifetime of the program and never be able to access the system.

The solution to starvation is fairness, where you make sure all workers get a chance to do some work. With a simpler data structure like a Mutex, this is relatively easy to think about: everyone who wants a lock stands in line and takes the lock one at a time.

However, RwLocks are more complex. They have both read and write locks, so there's not really just one line to stand in. A naive implementation--meaning what I would have implemented before reading the docs from std and parking-lot--would look like this:

  • read blocks until all write locks are released
  • write blocks until all read and write locks are released

However, the actual implementation with fairness accounted for looks something like this:

  • read blocks if there's an active write lock, or if another thread is waiting for a write lock
  • write blocks until all read and write locks are released

And now we can see the deadlock directly:

  1. Access check thread takes a read lock (for reading the name)
  2. Birthday thread tries to take a write lock, but it can't because there's already a read lock. It stands in line waiting its turn.
  3. Access check thread tries to take a read lock (for checking the age). It sees that there's a write lock waiting in line, and to avoid starving it, stands in line behind the birthday thread
  4. The access check thread is blocked until the birthday thread releases its lock. The birthday thread is blocked until the access check thread releases its first lock. Neither thread can make progress. Deadlock!
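The four steps above can be sketched as the bookkeeping of a toy fair lock (an illustration of the policy only, not parking-lot's real internals):

```rust
// Toy state for a fairness-aware RwLock (illustration only).
struct LockState {
    active_readers: usize,
    active_writer: bool,
    waiting_writers: usize,
}

impl LockState {
    // Fair policy: a new reader must also wait if a writer is queued.
    fn can_read(&self) -> bool {
        !self.active_writer && self.waiting_writers == 0
    }

    // A writer must wait until all readers and writers are done.
    fn can_write(&self) -> bool {
        !self.active_writer && self.active_readers == 0
    }
}

fn main() {
    // Step 1: the access check thread holds one read lock.
    let mut state = LockState {
        active_readers: 1,
        active_writer: false,
        waiting_writers: 0,
    };

    // Step 2: the birthday thread asks for a write lock and queues up.
    assert!(!state.can_write());
    state.waiting_writers = 1;

    // Step 3: the second read request must line up behind the writer...
    assert!(!state.can_read());
    // Step 4: ...and the writer is still blocked by the first reader.
    assert!(!state.can_write());
    // Neither side can proceed: deadlock.
}
```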

This, to me, is the worst deadlock I've encountered. Every single step of this process is logical. The standard library and parking-lot both made the correct decisions about implementation. And it still led to confusing behavior at runtime. Yes, the answer is "you should have read the docs," which I've now done. Consider this blog post an attempt to make sure that everyone else reads the docs too.

OK, so how do we resolve this problem? Let's check out two approaches.

Easiest: read_recursive

The parking-lot crate provides a read_recursive method. Unlike the normal read method, this method will not check if there's a waiting write lock. It will simply grab a read lock. By using read_recursive in our can_access method, we don't have a deadlock anymore. And in this program, we also don't have a risk of starvation, because the read_recursive call is always gated after our thread already got a read lock.

However, this isn't a good general purpose solution. It's essentially undermining all the fairness work that's gone into RwLock. Instead, even though it requires a bit more code change, there's a more idiomatic solution.

Just take one lock

This is the best approach we can take. We only need to take one read lock inside our access check thread. One way to make this work is to move the can_access method from Person to PersonInner, and then call can_access on the guard, like so:

impl PersonInner {
    fn can_access(&self) -> bool {
        const MIN_AGE: u32 = 18;

        self.age >= MIN_AGE
    }
}

// ...


let guard = alice_clone.inner.read();
println!("Downloading a cute loading image, please wait...");
std::thread::sleep(std::time::Duration::from_secs(1));
println!(
    "Does the {} have access? {}",
    guard.name,
    guard.can_access()
);
std::thread::sleep(std::time::Duration::from_secs(1));

This fully resolves the deadlock issue. There are still questions about exposing the innards of our data structure. We could come up with a more complex API that keeps some level of encapsulation, e.g.:

use std::sync::Arc;

use parking_lot::{RwLock, RwLockReadGuard};

#[derive(Clone)]
struct Person {
    inner: Arc<RwLock<PersonInner>>,
}

struct PersonInner {
    name: String,
    age: u32,
}

struct PersonReadGuard<'a> {
    guard: RwLockReadGuard<'a, PersonInner>,
}

impl Person {
    fn read(&self) -> PersonReadGuard {
        PersonReadGuard {
            guard: self.inner.read(),
        }
    }

    /// Returns the new age
    fn birthday(&self) -> u32 {
        let mut guard = self.inner.write();
        guard.age += 1;
        guard.age
    }
}

impl PersonReadGuard<'_> {
    fn can_access(&self) -> bool {
        const MIN_AGE: u32 = 18;

        self.guard.age >= MIN_AGE
    }

    fn get_name(&self) -> &String {
        &self.guard.name
    }
}

fn main() {
    std::thread::spawn(move || loop {
        std::thread::sleep(std::time::Duration::from_secs(2));
        for deadlock in parking_lot::deadlock::check_deadlock() {
            for deadlock in deadlock {
                println!(
                    "Found a deadlock! {}:\n{:?}",
                    deadlock.thread_id(),
                    deadlock.backtrace()
                );
            }
        }
    });

    let alice = Person {
        inner: Arc::new(RwLock::new(PersonInner {
            name: "Alice".to_owned(),
            age: 15,
        })),
    };

    let alice_clone = alice.clone();
    std::thread::spawn(move || loop {
        let guard = alice_clone.read();
        println!("Downloading a cute loading image, please wait...");
        std::thread::sleep(std::time::Duration::from_secs(1));
        println!(
            "Does the {} have access? {}",
            guard.get_name(),
            guard.can_access()
        );
        std::thread::sleep(std::time::Duration::from_secs(1));
    });

    for _ in 0..10 {
        std::thread::sleep(std::time::Duration::from_secs(1));
        let new_age = alice.birthday();

        println!("Happy birthday! Person is now {new_age} years old.");
    }
}

Is this kind of overhead warranted? Definitely not for this case. But such an approach might make sense for larger programs.

So when did we introduce the bug?

Just to fully answer the question I led with: we introduced the deadlock in the section titled "What's in a name". In the real-life production code, the bug came into existence in almost exactly the same way I described above. We had an existing helper method that took a read lock, then ended up introducing another method that took a read lock on its own and, while that lock was held, called into the existing helper method.

It's very easy to introduce a bug like that. (Or at least that's what I'm telling myself to feel like less of an idiot.) Besides the deadlock problem, it also introduces other race conditions. For example, if I had taken-and-released the read lock in the parent function before calling the helper function, I'd have a different kind of race condition: I'd be pulling data from the same RwLock in a non-atomic manner. Consider if, for example, Alice's name changes to "Alice the Adult" when she turns 18. In the program above, it's entirely possible to imagine a scenario where we say that "Alice the Adult" doesn't have access.

All of this to say: any time you're dealing with locking, you need to be careful to avoid potential data races. Rust makes it so much nicer than many other languages to avoid race conditions through things like RwLockReadGuard, the Send and Sync traits, mutable borrow checking, and other techniques. But it's still not a panacea.

January 18, 2024 12:00 AM

January 17, 2024

Well-Typed.Com

The Haskell Unfolder Episode 18: computing constraints

Today, 2024-01-17, at 1930 UTC (11:30 am PST, 2:30 pm EST, 7:30 pm GMT, 20:30 CET, …) we are streaming the 18th episode of the Haskell Unfolder live on YouTube.

The Haskell Unfolder Episode 18: computing constraints

Sometimes, for example when working with type-level lists, you have to compute with constraints. For example, you might want to say that a constraint holds for all types in a type-level list. In this episode, we will explore this special case of type-level programming in Haskell. We will also revisit type class aliases and take a closer look at exactly how and why they work.

About the Haskell Unfolder

The Haskell Unfolder is a YouTube series about all things Haskell hosted by Edsko de Vries and Andres Löh, with episodes appearing approximately every two weeks. All episodes are live-streamed, and we try to respond to audience questions. All episodes are also available as recordings afterwards.

We have a GitHub repository with code samples from the episodes.

And we have a public Google calendar (also available as ICal) listing the planned schedule.

by andres, edsko at January 17, 2024 12:00 AM

January 15, 2024

Monday Morning Haskell

Functional Programming vs. Object Oriented Programming

Functional Programming (FP) and Object Oriented Programming (OOP) are the two most important programming paradigms in use today. In this article, we'll discuss these two different programming paradigms and compare their key differences, strengths and weaknesses. We'll also highlight a few specific ways Haskell fits into this discussion. Here's a quick outline if you want to skip around a bit!

What is a Programming Paradigm?

A paradigm is a way of thinking about a subject. It's a model against which we can compare examples of something.

In programming, there are many ways to write code to solve a particular task. Our tasks normally involve taking some kind of input, whether data from a database or commands from a user. A program's job is then to produce outputs of some kind, like updates in that database or images on the user's screen.

Programming paradigms help us to organize our thinking so that we can rapidly select an implementation path that makes sense to us and other developers looking at the code. Paradigms also provide mechanisms for reusing code, so that we don't have to start from scratch every time we write a new program.

The two dominant paradigms in programming today are Object Oriented Programming (OOP) and Functional Programming (FP).

The Object Oriented Paradigm

In object oriented programming, our program's main job is to maintain objects. Objects almost always store data, and they have particular ways of acting on other objects and being acted on by other objects (these are the object's methods). Objects often have mutable data - many actions you take on your objects are capable of changing some of the object's underlying data.

Object oriented programming allows code reuse through a system called inheritance. Objects belong to classes which share the same kinds of data and actions. Classes can inherit from a parent class (or multiple classes, depending on the language), so that they also have access to the data from the base class and some of the same code that manipulates it.

The Functional Paradigm

In functional programming, we think about programming in terms of functions. This idea is rooted in the mathematical idea of a function. A function in math is a process which takes some input (or a series of different inputs) and produces some kind of output. A simple example would be a function that takes an input number and produces the square of that number. Many functional languages emphasize pure functions, which produce the exact same output every time when given the same input.

In programming, we may view our entire program as a function. It is a means by which some kind of input (file data or user commands), is transformed into some kind of output (new files, messages on our terminal). Individual functions within our program might take smaller portions of this input and produce some piece of our output, or some intermediate result that is needed to eventually produce this output.

In functional programming, we still need to organize our data in some way. So some of the ideas of objects/classes are still used to combine separate pieces of data in meaningful ways. However, we generally do not attach "actions" to data in the same way that classes do in OOP languages.

Since we don't perform actions directly on our data, functional languages are more likely to use immutable data as a default, rather than mutable data. (We should note though that both paradigms use both kinds of data in their own ways).

Functional Programming vs. OOP

The main point of separation between these paradigms is the question of "what is the fundamental building block of my program?" In object oriented programming, our programs are structured around objects. Functions are things we can do to an object or with an object.

In functional programming, functions are always first class citizens - the main building block of our code. In object oriented programming, functions can be first class citizens, but they do not need to be. Even in languages where they can be, they often are not used in this way, since this isn't as natural within the object oriented paradigm.

Object Oriented Programming Languages

Many of the most popular programming languages are OOP languages. Java, for a long time the most widely used language, is perhaps the most archetypal OO language. All code must exist within an object, even in a simple "Hello World" program:

class MyProgram {
  public static void main(String[] args) {
    System.out.println("Hello World!");
  }
}

In this example, we could not write our 'main' function on its own, without the use of 'class MyProgram'.

Java has a single basic 'Object' class, and all other classes (including any new classes you write) must inherit from it for basic behaviors like memory allocation. Java classes only allow single inheritance. This means that a class cannot inherit from multiple different types. Thus, all Java classes you would use can be mapped out on a tree structure with 'Object' as the root of the tree.

Other object oriented languages use the general ideas of classes, objects, and inheritance, but with some differences. C++ and Python both allow multiple inheritance, so that a class can inherit behavior from multiple existing classes. While these are both OOP languages, they are also more flexible in allowing functions to exist outside of classes. A basic script in either of these languages need not use any classes. In Python, we'd just write:

if __name__ == "__main__":
  print("Hello World!")

In C++, this looks like:

int main() {
  std::cout << "Hello World!" << std::endl;
}

These languages also don't have such a strictly defined inheritance structure. You can create classes that do not inherit from anything else, and they'll still work.

FP Languages

Haskell is perhaps the language that is most identifiable with the functional paradigm. Its type system and compiler really force you to adopt functional ideas, especially around immutable data, pure functions, and tail call optimization. It also embraces lazy evaluation, which is aligned with FP principles, but not a requirement for a functional language.

Several other programming languages generally associated with the functional paradigm include Clojure, OCaml, Lisp, Scala, and Rust. These languages aren't all functional in the same way as Haskell; there are many notable differences. Lisp bills itself specifically as a multi-paradigm language, and Scala is built to cross-compile with Java! Meanwhile, Rust's syntax looks more object oriented, but its inheritance system (traits) feels much more like Haskell. However, on balance, these languages express functional programming ideas much more than their counterparts.

Amongst the languages mentioned in the object oriented section, Python has the most FP features. It is more natural to write functions outside of your class objects, and concepts like higher order functions and lambda expressions are more idiomatic than in C++ or Java. This is part of the reason Python is often recommended for beginners, with another reason being that its syntax makes it a relatively simple language to learn.

Advantages of Functional Programming

Fewer Bugs

FP code has a deserved reputation for having fewer bugs. Anecdotally, I certainly find I have a much easier time writing bug-free code in Haskell than in Python. Many bugs in object oriented code are caused by the proliferation of mutable state. You might pass an object to a method and expect your object to come back unchanged...only to find that the method does in fact change your object's state. With objects, it's also very easy for unstated pre-conditions to pop up in class methods. If your object is not in the state you expect when the method is called, you'll end up with behavior you didn't intend.

A lot of functional code makes these errors impossible by making immutable data the default, if not a near requirement, as Haskell does. When the function is the building block of your code, you must specify precisely what the inputs of the function are. This gives you more opportunities to determine pre-conditions for this data. It also ensures that the return results of the function are the primary way you affect the rest of your program.

Functions also tend to be easier to test than objects. It is often tricky to create objects with the precise state you want to assess in a unit test, whereas to test a function you only need to reproduce the inputs.

More Expressive, Reasonable Design

The more you work with functions as your building blocks, and the more you try to fill your code with pure functions, the easier it will be to reason about your code. Imagine you have a couple dozen fields on an object in OO code. If someone calls a function on that object, any of those fields could impact the result of the method call.

Functions give you the opportunity to narrow things down to the precise values that you actually need to perform the computation. They let you separate the essential information from superfluous information, making it more obvious what the responsibilities are for each part of your code.

Multithreading

You can do parallel programming no matter what programming language you're using, but the functional programming paradigm aligns very well with parallel processing. To kick off a new thread in any language, you pretty much always have to pass a function as an argument, and this is more natural in FP. And with pure functions that don't modify shared mutable state, functional programs are generally much easier to break into parallelizable pieces that don't require complex locking schemes.
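As a small sketch of that idea in Rust (one of the FP-influenced languages mentioned above), a pure function can be farmed out to threads with no locks at all; the square function here is a made-up example:

```rust
use std::thread;

// A pure function: the same input always yields the same output,
// and nothing shared is mutated.
fn square(n: u64) -> u64 {
    n * n
}

fn main() {
    // Because square is pure, each piece of work can run on its own
    // thread without any locking scheme.
    let handles: Vec<_> = (1u64..=4)
        .map(|n| thread::spawn(move || square(n)))
        .collect();

    // Joining in spawn order keeps the results in order.
    let results: Vec<u64> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    assert_eq!(results, vec![1, 4, 9, 16]);
}
```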

Disadvantages of Functional Programming

Intuition of Complete Objects

Functional programming can feel less intuitive than object oriented programming. Perhaps one reason for this is that object oriented programming allows us to reason about "complete" objects, whose state at any given time is properly defined.

Functions are, in a sense, incomplete. A function is not a what that you can hold as a picture in your head. A function is a how. Given some inputs, how do you produce the outputs? In other words, it's a procedure. And a procedure can only really be imagined as a concrete object once you've filled in its inputs. This is best exemplified by the fact that functions have no native 'Show' instance in Haskell.

>> show (+)
No instance for (Show (Integer -> Integer -> Integer)) arising from a use of 'show'

If you apply the '+' function to arguments (and so create what could be called an "object"), then we can print it. But until then, it doesn't make much sense. If objects are the building block of your code though, you could, hypothetically, print the state of the objects in your code every step of the way.

Mutable State can be Useful!

As much as mutable state can cause a lot of bugs, it is nonetheless a useful tool for many problems, and decidedly more intuitive for certain data structures. If we just imagine something like the "Snake" game, it has a 2D grid that remains mostly the same from tick to tick, with just a couple things updating. This is easier to capture with mutable data.

Web development is another area where mutable objects are extremely useful. Anytime the user enters information on the page, some object has to change! Web development in FP almost requires its own paradigm (see "Functional Reactive Programming"). Haskell can represent mutable data, but the syntax is more cumbersome; you essentially need a separate data structure. Likewise, other functional languages might make mutability easier than Haskell, but mutability is still, again, more intuitive when objects are your fundamental building block, rather than functions on those objects.
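For instance (a sketch of mine, not from the article), mutable state in Haskell lives behind an explicit wrapper type such as IORef from base:

```haskell
import Data.IORef (newIORef, modifyIORef', readIORef)

main :: IO ()
main = do
  counter <- newIORef (0 :: Int)  -- the mutable cell is explicit in the types
  modifyIORef' counter (+ 1)
  modifyIORef' counter (+ 1)
  n <- readIORef counter
  print n  -- 2
```

The extra ceremony is the "separate data structure" cost: mutation is possible, but never the default.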

We can see this even with something as simple as loops. Haskell doesn't perform "for-loops" in the same way as other languages, because most for loops essentially rely on the notion that there is some kind of state updating on each iteration of the loop, even if that state is only the integer counter. To write loops in Haskell, you have to learn concepts like maps and folds, which require you to get very used to writing new functions on the fly.
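For example (my illustration), a for-loop that accumulates a running total becomes a fold, with the loop variable turned into an explicit accumulator:

```haskell
import Data.List (foldl')

-- Imperative pseudocode:  total = 0; for x in xs: total += x * x
sumSquares :: [Int] -> Int
sumSquares = foldl' (\acc x -> acc + x * x) 0

main :: IO ()
main = print (sumSquares [1 .. 10])  -- 385
```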

A Full Introduction to Haskell (and its Functional Aspects)

So functional programming languages are perhaps a bit more difficult to learn, but can offer a significant payoff if you put in the time to master the skills. Ultimately, you can use either paradigm for most kinds of projects and keep your development productive. It's down to your personal preference which you try while building software.

If you really want to dive into functional programming though, Haskell is a great language, since it will force you to learn FP principles more than other functional languages. For a complete introduction to Haskell, you should take a look at Haskell From Scratch, our beginner-level course for those new to the language. It will teach you everything you need to know about syntax and fundamental concepts, while providing you with a ton of hands-on practice through exercises and projects.

Haskell From Scratch also includes Making Sense of Monads, our course that shows the more functional side of Haskell by teaching you about the critical concept of monads. With these two courses under your belt, you'll be well on your way to mastery of functional programming! Head over here to learn more about these courses!

by James Bowen at January 15, 2024 04:00 PM

Haskell Interlude

41: Moritz Angermann

Today, Matthías and Joachim are interviewing Moritz Angermann. Moritz knew he wanted to use Haskell before he knew Haskell, and fixed cross-compilation as his first GHC contribution. We’ll talk more about cross-compilation to Windows and mobile platforms, why Template Haskell is the cause of most headaches, why you should be careful if your sister calls and tells you to cabal install a package, and finally how we can reduce the fear of new GHC releases, by improving stability.

by Haskell Podcast at January 15, 2024 08:00 AM

Derek Elkins

The Pullback Lemma in Gory Detail (Redux)

Introduction

Andrej Bauer has a paper titled The pullback lemma in gory detail that goes over the proof of the pullback lemma in full detail. This is a basic result of category theory and most introductions leave it as an exercise. It is a good exercise, and you should prove it yourself before reading this article or Andrej Bauer’s.

Andrej Bauer’s proof is what most introductions are expecting you to produce. I very much like the representability perspective on category theory and like to see what proofs look like using this perspective.

So this is a proof of the pullback lemma from the perspective of representability.

Preliminaries

The key thing we need here is a characterization of pullbacks in terms of representability. To just jump to the end, we have for |f : A \to C| and |g : B \to C|, |A \times_{f,g} B| is the pullback of |f| and |g| if and only if it represents the functor \[\{(h, k) \in \mathrm{Hom}({-}, A) \times \mathrm{Hom}({-}, B) \mid f \circ h = g \circ k \}\]

That is to say we have the natural isomorphism \[ \mathrm{Hom}({-}, A \times_{f,g} B) \cong \{(h, k) \in \mathrm{Hom}({-}, A) \times \mathrm{Hom}({-}, B) \mid f \circ h = g \circ k \} \]

We’ll write the left to right direction of the isomorphism as |\langle u,v\rangle : U \to A \times_{f,g} B| where |u : U \to A| and |v : U \to B| and they satisfy |f \circ u = g \circ v|. Applying the isomorphism right to left on the identity arrow gives us two arrows |p_1 : A \times_{f,g} B \to A| and |p_2 : A \times_{f,g} B \to B| satisfying |p_1 \circ \langle u, v\rangle = u| and |p_2 \circ \langle u,v \rangle = v|. (Exercise: Show that this follows from being a natural isomorphism.)

One nice thing about representability is that it reduces categorical reasoning to set-theoretic reasoning that you are probably already used to, as we’ll see. You can connect this definition to a typical universal property based definition used in Andrej Bauer’s article. Here we’re taking it as the definition of the pullback.
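For intuition, here is the concrete instance in |\mathbf{Set}| (an aside of mine, not in the original): the representing object can be taken to be \[ A \times_{f,g} B = \{(a, b) \in A \times B \mid f(a) = g(b)\} \] with |p_1| and |p_2| the evident projections, so that an arrow |U \to A \times_{f,g} B| is precisely a pair of functions |u : U \to A| and |v : U \to B| satisfying |f \circ u = g \circ v|.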

Proof

The claim to be proven is if the right square in the below diagram is a pullback square, then the left square is a pullback square if and only if the whole rectangle is a pullback square. \[ \xymatrix { A \ar[d]_{q_1} \ar[r]^{q_2} & B \ar[d]_{p_1} \ar[r]^{p_2} & C \ar[d]^{h} \\ X \ar[r]_{f} & Y \ar[r]_{g} & Z }\]

Rewriting the diagram as equations, we have:

Theorem: If |f \circ q_1 = p_1 \circ q_2|, |g \circ p_1 = h \circ p_2|, and |(B, p_1, p_2)| is a pullback of |g| and |h|, then |(A, q_1, q_2)| is a pullback of |f| and |p_1| if and only if |(A, q_1, p_2 \circ q_2)| is a pullback of |g \circ f| and |h|.

Proof: If |(A, q_1, q_2)| was a pullback of |f| and |p_1| then we’d have the following.

\[\begin{align} \mathrm{Hom}({-}, A) & \cong \{(u_1, u_2) \in \mathrm{Hom}({-}, X)\times\mathrm{Hom}({-}, B) \mid f \circ u_1 = p_1 \circ u_2 \} \\ & \cong \{(u_1, (v_1, v_2)) \in \mathrm{Hom}({-}, X)\times\mathrm{Hom}({-}, Y)\times\mathrm{Hom}({-}, C) \mid f \circ u_1 = p_1 \circ \langle v_1, v_2\rangle \land g \circ v_1 = h \circ v_2 \} \\ & = \{(u_1, (v_1, v_2)) \in \mathrm{Hom}({-}, X)\times\mathrm{Hom}({-}, Y)\times\mathrm{Hom}({-}, C) \mid f \circ u_1 = v_1 \land g \circ v_1 = h \circ v_2 \} \\ & = \{(u_1, v_2) \in \mathrm{Hom}({-}, X)\times\mathrm{Hom}({-}, C) \mid g \circ f \circ u_1 = h \circ v_2 \} \end{align}\]

The second isomorphism is |B| being a pullback and |u_2| is an arrow into |B| so it’s necessarily of the form |\langle v_1, v_2\rangle|. The first equality is just |p_1 \circ \langle v_1, v_2\rangle = v_1| mentioned earlier. The second equality merely eliminates the use of |v_1| using the equation |f \circ u_1 = v_1|.

This overall natural isomorphism, however, is exactly what it means for |A| to be a pullback of |g \circ f| and |h|. We verify the projections are what we expect by pushing |id_A| through the isomorphism. By assumption, |u_1| and |u_2| will be |q_1| and |q_2| respectively in the first isomorphism. We see that |v_2 = p_2 \circ \langle v_1, v_2\rangle = p_2 \circ q_2|.

We simply run the isomorphism backwards to get the other direction of the if and only if. |\square|

The simplicity and compactness of this proof demonstrates why I like representability.

January 15, 2024 01:33 AM

January 11, 2024

Chris Reade

Graphs, Kites and Darts


Figure 1: Three Coloured Patches

Non-periodic tilings with Penrose’s kites and darts

(An updated version, since original posting on Jan 6, 2022)

We continue our investigation of the tilings using Haskell with Haskell Diagrams. What is new is the introduction of a planar graph representation. This allows us to define more operations on finite tilings, in particular forcing and composing.

Previously in Diagrams for Penrose Tiles we implemented tools to create and draw finite patches of Penrose kites and darts (such as the samples depicted in figure 1). The code for this and for the new graph representation and tools described here can be found on GitHub https://github.com/chrisreade/PenroseKiteDart.

To describe the tiling operations it is convenient to work with the half-tiles: LD (left dart), RD (right dart), LK (left kite), RK (right kite) using a polymorphic type HalfTile (defined in a module HalfTile)

data HalfTile rep 
 = LD rep | RD rep | LK rep | RK rep   deriving (Show,Eq)

Here rep is a type variable for a representation to be chosen. For drawing purposes, we chose two-dimensional vectors (V2 Double) and called these Pieces.

type Piece = HalfTile (V2 Double)

The vector represents the join edge of the half tile (see figure 2) and thus the scale and orientation are determined (the other tile edges are derived from this when producing a diagram).

Figure 2: The (half-tile) pieces showing join edges (dashed) and origin vertices (red dots)

Finite tilings or patches are then lists of located pieces.

type Patch = [Located Piece]

Both Piece and Patch are made transformable, so rotate and scale can be applied to both, and translate can be applied to a Patch. (Translate has no effect on a Piece unless it is located.)

In Diagrams for Penrose Tiles we also discussed the rules for legal tilings and specifically the problem of incorrect tilings which are legal but get stuck so cannot continue to infinity. In order to create correct tilings we implemented the decompose operation on patches.

The vector representation that we use for drawing is not well suited to exploring properties of a patch such as neighbours of pieces. Knowing about neighbouring tiles is important for being able to reason about composition of patches (inverting a decomposition) and to find which pieces are determined (forced) on the boundary of a patch.

However, the polymorphic type HalfTile allows us to introduce our alternative graph representation alongside Pieces.

Tile Graphs

In the module Tgraph.Prelude, we have the new representation which treats half tiles as triangular faces of a planar graph – a TileFace – by specialising HalfTile with a triple of vertices (clockwise starting with the tile origin).

type Vertex = Int
type TileFace = HalfTile (Vertex,Vertex,Vertex)

For example

LD (1,3,4)       RK (6,4,3)

When we need to refer to particular vertices from a TileFace we use originV (the first vertex – red dot in figure 2), oppV (the vertex at the opposite end of the join edge – dashed edge in figure 2), wingV (the remaining vertex not on the join edge).

originV, oppV, wingV :: TileFace -> Vertex

Tgraphs

The Tile Graphs implementation uses a newtype Tgraph which is a list of tile faces.

newtype Tgraph = Tgraph [TileFace]
                 deriving (Show)

faces :: Tgraph -> [TileFace]
faces (Tgraph fcs) = fcs

For example, fool (short for a fool’s kite) is a Tgraph with 6 faces (and 7 vertices), shown in figure 3.

fool = Tgraph [RD (1,2,3),LD (1,3,4),RK (6,2,5)
              ,LK (6,3,2),RK (6,4,3),LK (6,7,4)
              ]

(The fool is also called an ace in the literature.)

Figure 3: fool

With this representation we can investigate how composition works with whole patches. Figure 4 shows a twice decomposed sun on the left and a once decomposed sun on the right (both with vertex labels). In addition to decomposing the right Tgraph to form the left Tgraph, we can also compose the left Tgraph to get the right Tgraph.

Figure 4: sunD2 and sunD

After implementing composition, we also explore a force operation and an emplace operation to extend tilings.

There are some constraints we impose on Tgraphs.

  • No spurious vertices. The vertices of a Tgraph are the vertices that occur in the faces of the Tgraph (and maxV is the largest number occurring).
  • Connected. The collection of faces must be a single connected component.
  • No crossing boundaries. By this we mean that vertices on the boundary are incident with exactly two boundary edges. The boundary consists of the edges between the Tgraph faces and exterior region(s). This is important for adding faces.
  • Tile connected. Roughly, this means that if we collect the faces of a Tgraph by starting from any single face and then add faces which share an edge with those already collected, we get all the Tgraph faces. This is important for drawing purposes.

In fact, if a Tgraph is connected with no crossing boundaries, then it must be tile connected. (We could define tile connected to mean that the dual graph excluding exterior regions is connected.)

Figure 5 shows two excluded graphs which have crossing boundaries at 4 (left graph) and 13 (right graph). The left graph is still tile connected but the right is not tile connected (the two faces at the top right do not have an edge in common with the rest of the faces.)

Although we have allowed for Tgraphs with holes (multiple exterior regions), we note that such holes cannot be created by adding faces one at a time without creating a crossing boundary. They can be created by removing faces from a Tgraph without necessarily creating a crossing boundary.

Important We are using face as an abbreviation for half-tile face of a Tgraph here, and we do not count the exterior of a patch of faces to be a face. The exterior can also be disconnected when we have holes in a patch of faces and the holes are not counted as faces either. In graph theory, the term face would generally include these other regions, but we will call them exterior regions rather than faces.

Figure 5: A tile-connected graph with crossing boundaries at 4, and a non tile-connected graph

In addition to the constructor Tgraph we also use

checkedTgraph:: [TileFace] -> Tgraph

which creates a Tgraph from a list of faces, but also performs checks on the required properties of Tgraphs. We can then remove or select faces from a Tgraph and then use checkedTgraph to ensure the resulting Tgraph still satisfies the required properties.

selectFaces, removeFaces  :: [TileFace] -> Tgraph -> Tgraph
selectFaces fcs g = checkedTgraph (faces g `intersect` fcs)  -- intersect from Data.List
removeFaces fcs g = checkedTgraph (faces g \\ fcs)           -- (\\) from Data.List

Edges and Directed Edges

We do not explicitly record edges as part of a Tgraph, but calculate them as needed. Implicitly we are requiring

  • No spurious edges. The edges of a Tgraph are the edges of the faces of the Tgraph.

To represent edges, a pair of vertices (a,b) is regarded as a directed edge from a to b. A list of such pairs will usually be regarded as a directed edge list. In the special case that the list is symmetrically closed [(b,a) is in the list whenever (a,b) is in the list] we will refer to this as an edge list rather than a directed edge list.

The following functions on TileFaces all produce directed edges (going clockwise round a face).

type Dedge = (Vertex,Vertex)

joinE  :: TileFace -> Dedge  -- join edge - dashed in figure 2
shortE :: TileFace -> Dedge  -- the short edge which is not a join edge
longE  :: TileFace -> Dedge  -- the long edge which is not a join edge
faceDedges :: TileFace -> [Dedge]
  -- all three directed edges clockwise from origin

For the whole Tgraph, we often want a list of all the directed edges of all the faces.

graphDedges :: Tgraph -> [Dedge]
graphDedges = concatMap faceDedges . faces

Because our graphs represent tilings they are planar (can be embedded in a plane) so we know that at most two faces can share an edge and they will have opposite directions of the edge. No two faces can have the same directed edge. So from graphDedges g we can easily calculate internal edges (edges shared by 2 faces) and boundary directed edges (directed edges round the external regions).

internalEdges, boundaryDedges :: Tgraph -> [Dedge]

The internal edges of g are those edges which occur in both directions in graphDedges g. The boundary directed edges of g are the missing reverse directions in graphDedges g.
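A naive standalone version of this calculation (my own sketch with primed names; the package's internalEdges and boundaryDedges work on a Tgraph and will be more efficient) might look like:

```haskell
type Vertex = Int
type Dedge = (Vertex, Vertex)

reverseD :: Dedge -> Dedge
reverseD (a, b) = (b, a)

-- An edge is internal when its reverse direction is also present;
-- a boundary directed edge is one whose reverse is missing.
internalEdges', boundaryDedges' :: [Dedge] -> [Dedge]
internalEdges'  des = [ e | e <- des, reverseD e `elem`    des ]
boundaryDedges' des = [ e | e <- des, reverseD e `notElem` des ]

main :: IO ()
main = do
  let des = [(1,2),(2,3),(3,1),(2,1)]  -- a triangle plus the reverse of one edge
  print (internalEdges' des)   -- [(1,2),(2,1)]
  print (boundaryDedges' des)  -- [(2,3),(3,1)]
```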

We also refer to all the long edges of a Tgraph (including kite join edges) as phiEdges (both directions of these edges).

phiEdges :: Tgraph -> [Dedge]

This is so named because, when drawn, these long edges are phi times the length of the short edges (phi being the golden ratio which is approximately 1.618).
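Numerically (a throwaway check of mine, not part of the package):

```haskell
-- The golden ratio, the positive solution of x^2 = x + 1.
phi :: Double
phi = (1 + sqrt 5) / 2

main :: IO ()
main = do
  print phi                      -- approximately 1.618
  print (phi * phi - (phi + 1))  -- close to zero, up to floating point error
```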

Drawing Tgraphs (Patches and VPatches)

The module Tgraph.Convert contains functions to convert a Tgraph to our previous vector representation (Patch) defined in TileLib so we can use the existing tools to produce diagrams.

However, it is convenient to have an intermediate stage (a VPatch = Vertex Patch) which contains both faces and calculated vertex locations (a finite map from vertices to locations). This allows vertex labels to be drawn and for faces to be identified and retained/excluded after the location information is calculated.

data VPatch = VPatch { vLocs :: VertexLocMap
                     , vpFaces::[TileFace]
                     } deriving Show

The conversion functions include

makeVP   :: Tgraph -> VPatch

For drawing purposes we introduced a class Drawable which has a means to create a diagram when given a function to draw Pieces.

class Drawable a where
  drawWith :: (Piece -> Diagram B) -> a -> Diagram B

This allows us to make Patch, VPatch and Tgraph instances of Drawable, and we can define special cases for the most frequently used drawing tools.

draw :: Drawable a => a -> Diagram B
draw = drawWith drawPiece

drawj :: Drawable a => a -> Diagram B
drawj = drawWith dashjPiece

We also need to be able to create diagrams with vertex labels, so we use a draw function modifier

class DrawableLabelled a where
  labelSize :: Measure Double -> (VPatch -> Diagram B) -> a -> Diagram B

Both VPatch and Tgraph are made instances (but not Patch as this no longer has vertex information). The type Measure is defined in Diagrams, but we generally use a default measure for labels to define

labelled :: DrawableLabelled a => (VPatch -> Diagram B) -> a -> Diagram B
labelled = labelSize (normalized 0.018)

This allows us to use, for example (where g is a Tgraph or VPatch)

labelled draw g
labelled drawj g

One consequence of using abstract graphs is that there is no unique predefined way to orient, scale, or position the VPatch (and Patch) arising from a Tgraph representation. Our implementation selects a particular join edge and aligns it along the x-axis (unit length for a dart, phi length for a kite), and tile-connectedness ensures the rest of the VPatch (and Patch) can be calculated from this.

We also have functions to re-orient a VPatch and lists of VPatchs using chosen pairs of vertices. [Simply doing rotations on the final diagrams can cause problems if these include vertex labels. We do not, in general, want to rotate the labels – so we need to orient the VPatch before converting to a diagram]

Decomposing Graphs

We previously implemented decomposition for patches which splits each half-tile into two or three smaller scale half-tiles.

decompPatch :: Patch -> Patch

We now have a Tgraph version of decomposition in the module Tgraph.Decompose:

decompose :: Tgraph -> Tgraph

Graph decomposition is particularly simple. We start by introducing one new vertex for each long edge (the phiEdges) of the Tgraph. We then build the new faces from each old face using the new vertices.

As a running example we take fool (mentioned above) and its decomposition foolD

*Main> foolD = decompose fool

*Main> foolD
Tgraph [LK (1,8,3),RD (2,3,8),RK (1,3,9)
       ,LD (4,9,3),RK (5,13,2),LK (5,10,13)
       ,RD (6,13,10),LK (3,2,13),RK (3,13,11)
       ,LD (6,11,13),RK (3,14,4),LK (3,11,14)
       ,RD (6,14,11),LK (7,4,14),RK (7,14,12)
       ,LD (6,12,14)
       ]

which are best seen together (fool followed by foolD) in figure 6.

Figure 6: fool and foolD (= decompose fool)

Composing Tgraphs, and Unknowns

Composing is meant to be an inverse to decomposing, and one of the main reasons for introducing our graph representation. In the literature, decomposition and composition are defined for infinite tilings and in that context they are unique inverses to each other. For finite patches, however, we will see that composition is not always uniquely determined.

In figure 7 (Two Levels) we have emphasised the larger scale faces on top of the smaller scale faces.

Figure 7: Two Levels

How do we identify the composed tiles? We start by classifying vertices which are at the wing tips of the (smaller) darts as these determine how things compose. In the interior of a graph/patch (e.g in figure 7), a dart wing tip always coincides with a second dart wing tip, and either

  1. the 2 dart halves share a long edge. The shared wing tip is then classified as a largeKiteCentre and is at the centre of a larger kite. (See left vertex type in figure 8), or
  2. the 2 dart halves touch at their wing tips without sharing an edge. This shared wing tip is classified as a largeDartBase and is the base of a larger dart. (See right vertex type in figure 8)
Figure 8: largeKiteCentre (left) and largeDartBase (right)

[We also call these (respectively) a deuce vertex type and a jack vertex type later in figure 10]

Around the boundary of a Tgraph, the dart wing tips may not share with a second dart. Sometimes the wing tip has to be classified as unknown but often it can be decided by looking at neighbouring tiles. In this example of a four times decomposed sun (sunD4), it is possible to classify all the dart wing tips as a largeKiteCentre or a largeDartBase so there are no unknowns.

If there are no unknowns, then we have a function to produce the unique composed Tgraph.

compose:: Tgraph -> Tgraph

Any correct decomposed Tgraph without unknowns will necessarily compose back to its original. This makes compose a left inverse to decompose provided there are no unknowns.

For example, with an (n times) decomposed sun we will have no unknowns, so these will all compose back up to a sun after n applications of compose. For n=4 (sunD4 – the smaller scale shown in figure 7) the dart wing classification returns 70 largeKiteCentres, 45 largeDartBases, and no unknowns.

Similarly with the simpler foolD example, if we classify the dart wings we get

largeKiteCentres = [14,13]
largeDartBases = [3]
unknowns = []

In foolD (the right hand Tgraph in figure 6), nodes 14 and 13 are new kite centres and node 3 is a new dart base. There are no unknowns so we can use compose safely

*Main> compose foolD
Tgraph [RD (1,2,3),LD (1,3,4),RK (6,2,5)
       ,RK (6,4,3),LK (6,3,2),LK (6,7,4)
       ]

which reproduces the original fool (left hand Tgraph in figure 6).

However, if we now check out unknowns for fool we get

largeKiteCentres = []
largeDartBases = []
unknowns = [4,2]    

So both nodes 2 and 4 are unknowns. It had looked as though fool would simply compose into two half kites back-to-back (sharing their long edge not their join), but the unknowns show there are other possible choices. Each unknown could become a largeKiteCentre or a largeDartBase.

The question is then what to do with unknowns.

Partial Compositions

In fact our compose resolves two problems when dealing with finite patches. One is the unknowns and the other is critical missing faces needed to make up a new face (e.g the absence of any half dart).

It is implemented using an intermediary function for partial composition

partCompose:: Tgraph -> ([TileFace],Tgraph) 

partCompose will compose everything that is uniquely determined, but will leave out faces round the boundary which cannot be determined or cannot be included in a new face. It returns the faces of the argument Tgraph that were not used, along with the composed Tgraph.

Figure 9 shows the result of partCompose applied to two graphs. [These are force kiteD3 and force dartD3 on the left. Force is described later]. In each case, the excluded faces of the starting Tgraph are shown in pale green, overlaid by the composed Tgraph on the right.

Figure 9: partCompose for two graphs (force kiteD3 top row and force dartD3 bottom row)

Then compose is simply defined to keep the composed faces and ignore the unused faces produced by partCompose.

compose:: Tgraph -> Tgraph
compose = snd . partCompose 

This approach avoids making a decision about unknowns when composing, but it may lose some information by throwing away the uncomposed faces.

For correct Tgraphs g, if decompose g has no unknowns, then compose is a left inverse to decompose. However, if we take g to be two kite halves sharing their long edge (not their join edge), then these decompose to fool, which produces an empty Tgraph when recomposed. Thus we do not have g = compose (decompose g) in general. On the other hand, we do have g = compose (decompose g) for correct whole-tile Tgraphs g (whole-tile means all half-tiles of g have their matching half-tile on their join edge in g).

Later (figure 21) we show another exception to g = compose (decompose g) with an incorrect tiling.

We make use of

selectFacesVP    :: [TileFace] -> VPatch -> VPatch
removeFacesVP    :: [TileFace] -> VPatch -> VPatch

for creating VPatches from selected tile faces of a Tgraph or VPatch. This allows us to represent and draw a list of faces which need not be connected nor satisfy the no crossing boundaries property provided the Tgraph it was derived from had these properties.

Forcing

When building up a tiling, following the rules, there is often no choice about what tile can be added alongside certain tile edges at the boundary. Such additions are forced by the existing patch of tiles and the rules. For example, if a half tile has its join edge on the boundary, the unique mirror half tile is the only possibility for adding a face to that edge. Similarly, the short edge of a left (respectively, right) dart can only be matched with the short edge of a right (respectively, left) kite. We also make use of the fact that only 7 types of vertex can appear in (the interior of) a patch, so on a boundary vertex we sometimes have enough of the faces to determine the vertex type. These are given the following names in the literature (shown in figure 10): sun, star, jack (=largeDartBase), queen, king, ace, deuce (=largeKiteCentre).

Figure 10: Vertex types

The function

force :: Tgraph -> Tgraph

will add some faces on the boundary that are forced (i.e new faces where there is exactly one possible choice). For example:

  • When a join edge is on the boundary – add the missing half tile to make a whole tile.
  • When a half dart has its short edge on the boundary – add the half kite that must be on the short edge.
  • When a vertex is both a dart origin and a kite wing (it must be a queen or king vertex) – if there is a boundary short edge of a kite half at the vertex, add another kite half sharing the short edge, (this converts 1 kite to 2 and 3 kites to 4 in combination with the first rule).
  • When two half kites share a short edge their common oppV vertex must be a deuce vertex – add any missing half darts needed to complete the vertex.

Figure 11 shows foolDminus (which is foolD with 3 faces removed) on the left, and the result of forcing, i.e. force foolDminus, on the right, which is the same Tgraph we get from force foolD (modulo vertex renumbering).

foolDminus = 
    removeFaces [RD(6,14,11), LD(6,12,14), RK(5,13,2)] foolD
Figure 11: foolDminus and force foolDminus = force foolD

Figures 12, 13 and 14 illustrate the result of forcing a 5-times decomposed kite, a 5-times decomposed dart, and a 5-times decomposed sun (respectively). The first two figures reproduce diagrams from an article by Roger Penrose illustrating the extent of influence of tiles round a decomposed kite and dart. [Penrose R Tilings and quasi-crystals; a non-local growth problem? in Aperiodicity and Order 2, edited by Jarich M, Academic Press, 1989. (fig 14)].

Figure 12: force kiteD5 with kiteD5 shown in red
Figure 13: force dartD5 with dartD5 shown in red
Figure 14: force sunD5 with sunD5 shown in red

In figure 15, the bottom row shows successive decompositions of a dart (dashed blue arrows from right to left), so applying compose to each dart will go back (green arrows from left to right). The black vertical arrows are force. The solid blue arrows from right to left are (force . decompose) being applied to the successive forced Tgraphs. The green arrows in the reverse direction are compose again and the intermediate (partCompose) figures are shown in the top row with the remainder faces in pale green.

Figure 15: Arrows: black = force, green = compose, solid blue = (force . decompose)

Figure 16 shows the forced graphs of the seven vertex types (with the starting Tgraphs in red) along with a kite (top right).

Figure 16: Relating the forced seven vertex types and the kite

These are related to each other as shown in the columns. Each Tgraph composes to the one above (an empty Tgraph for the ones in the top row) and the Tgraph below is its forced decomposition. [The rows have been scaled differently to make the vertex types easier to see.]

Adding Faces to a Tgraph

This is technically tricky because we need to discover what vertices (and implicitly edges) need to be newly created and which ones already exist in the Tgraph. This goes beyond a simple graph operation and requires use of the geometry of the faces. We have chosen not to do a full conversion to vectors to work out all the geometry, but instead we introduce a local representation of relative directions of edges at a vertex allowing a simple equality test.

Edge directions

All directions are integer multiples of 1/10th turn (mod 10) so we use these integers for face internal angles and boundary external angles. The face adding process always adds to the right of a given directed edge (a,b) which must be a boundary directed edge. [Adding to the left of an edge (a,b) would mean that (b,a) will be the boundary direction and so we are really adding to the right of (b,a)]. Face adding looks to see if either of the two other edges already exist in the Tgraph by considering the end points a and b to which the new face is to be added, and checking angles.

This allows an edge in a particular sought direction to be discovered. If it is not found it is assumed not to exist. However, the search will be undermined if there are crossing boundaries. In such a case there will be more than two boundary directed edges at the vertex and there is no unique external angle.
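As a toy illustration of this bookkeeping (with hypothetical helper names, not the library's actual definitions), directions can be modelled as integers modulo 10:

```haskell
-- Toy model (not the library's code) of the angle arithmetic:
-- a direction is an integer 0..9, i.e. a multiple of 1/10 turn.
type Direction = Int

dir :: Int -> Direction
dir n = n `mod` 10

-- the reverse of a directed edge points half a turn (5 tenths) away
reverseDir :: Direction -> Direction
reverseDir d = dir (d + 5)

-- the external angle at a boundary vertex is what remains of a full
-- turn after summing the internal angles of the incident faces
externalAngle :: [Int] -> Direction
externalAngle internals = dir (10 - sum internals)
```

With this representation, testing whether an edge already exists in a sought direction reduces to an integer equality check, which is the point of the design.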

Establishing the no crossing boundaries property ensures these failures cannot occur. We can easily check this property for newly created Tgraphs (with checkedTgraph) and the face adding operations cannot create crossing boundaries.

Touching Vertices and Crossing Boundaries

When a new face to be added on (a,b) has neither of the other two edges already in the Tgraph, the third vertex needs to be created. However, it could already exist in the Tgraph – not on an edge coming from a or b, but in another non-local part of the Tgraph. We call this a touching vertex. If we simply added a new vertex without checking for a clash, this would create a nonsensical Tgraph. However, if we do check and find an existing vertex, we still cannot add the face using it because that would create a crossing boundary.

Our version of forcing prevents face additions that would create a touching vertex/crossing boundary by calculating the positions of boundary vertices.

No conflicting edges

There is a final (simple) check when adding a new face, to prevent a long edge (phiEdge) sharing with a short edge. This can arise if we force an incorrect Tgraph (as we will see later).

Implementing Forcing

Our order of forcing prioritises updates (face additions) which do not introduce a new vertex. Such safe updates are easy to recognise and they do not require a touching vertex check. Surprisingly, this pretty much removes the problem of touching vertices altogether.

As an illustration, consider foolDMinus again on the left of figure 11. Adding the left dart onto edge (12,14) is not a safe addition (and would create a crossing boundary at 6). However, adding the right dart RD(6,14,11) is safe and creates the new edge (6,14) which then makes the left dart addition safe. In fact it takes some contrivance to come up with a Tgraph with an update that could fail the check during forcing when safe cases are always done first. Figure 17 shows such a contrived Tgraph formed by removing the faces shown in green from a twice decomposed sun on the left. The forced result is shown on the right. When there are no safe cases, we need to try an unsafe one. The four green faces at the bottom are blocked by the touching vertex check. This leaves any one of 9 half-kites at the centre which would pass the check. But after just one of these is added, the check is not needed again. There is always a safe addition to be done at each step until all the green faces are added.

Figure 17: A contrived example requiring a touching vertex check

Boundary information

The implementation of forcing has been made more efficient by calculating some boundary information in advance. This boundary information uses a type BoundaryState

data BoundaryState
  = BoundaryState
    { boundary    :: [Dedge]
    , bvFacesMap  :: Mapping Vertex [TileFace]
    , bvLocMap    :: Mapping Vertex (Point V2 Double)
    , allFaces    :: [TileFace]
    , nextVertex  :: Vertex
    } deriving (Show)

This records the boundary directed edges (boundary) plus a mapping of the boundary vertices to their incident faces (bvFacesMap) plus a mapping of the boundary vertices to their positions (bvLocMap). It also keeps track of all the faces and the vertex number to use when adding a vertex. The boundary information is easily incremented for each face addition without being recalculated from scratch, and a final Tgraph with all the new faces is easily recovered from the boundary information when there are no more updates.

makeBoundaryState  :: Tgraph -> BoundaryState
recoverGraph  :: BoundaryState -> Tgraph

The saving that comes from using boundary information lies in efficient incremental changes to the boundary information and, of course, in avoiding the need to consider internal faces. As a further optimisation we keep track of updates in a mapping from boundary directed edges to updates, and supply a list of affected edges after an update so the update calculator (update generator) need only revise these. The boundary and mapping are combined in a ForceState.

type UpdateMap = Mapping Dedge Update
type UpdateGenerator = BoundaryState -> [Dedge] -> UpdateMap
data ForceState = ForceState 
       { boundaryState:: BoundaryState
       , updateMap:: UpdateMap 
       }

Forcing then involves using a specific update generator (allUGenerator) and initialising the state, then using the recursive forceAll which keeps doing updates until there are no more, before recovering the final Tgraph.

force:: Tgraph -> Tgraph
force = forceWith allUGenerator

forceWith:: UpdateGenerator -> Tgraph -> Tgraph
forceWith uGen = recoverGraph . boundaryState . 
                 forceAll uGen . initForceState uGen

forceAll :: UpdateGenerator -> ForceState -> ForceState
initForceState :: UpdateGenerator -> Tgraph -> ForceState

In addition to force we can easily define

wholeTiles:: Tgraph -> Tgraph
wholeTiles = forceWith wholeTileUpdates 

which just uses the first forcing rule to make sure every half-tile has a matching other half.

We also have a version of force which counts to a specific number of face additions.

stepForce :: Int -> ForceState -> ForceState

This proved essential in uncovering problems of accumulated inaccuracy in calculating boundary positions (now fixed).

Some Other Experiments

Below we describe results of some experiments using the tools introduced above. Specifically: emplacements, sub-Tgraphs, incorrect tilings, and composition choices.

Emplacements

The finite number of rules used in forcing are based on local boundary vertex and edge information only. We thought we may be able to improve on this by considering a composition and forcing at the next level up before decomposing and forcing again. This thus considers slightly broader local information. In fact we can iterate this process to all the higher levels of composition. Some Tgraphs produce an empty Tgraph when composed so we can regard those as maximal compositions. For example compose fool produces an empty Tgraph.

The idea was to take an arbitrary Tgraph and apply (compose . force) repeatedly to find its maximally composed (non-empty) Tgraph, before applying (force . decompose) repeatedly back down to the starting level (so the same number of decompositions as compositions).

We called the function emplace, and called the result the emplacement of the starting Tgraph as it shows a region of influence around the starting Tgraph.

With earlier versions of forcing when we had fewer rules, emplace g often extended force g for a Tgraph g. This allowed the identification of some new rules. However, since adding the new rules we have not found Tgraphs where the result of force had fewer faces than the result of emplace.

[As an important update, we have now found examples where the result of force strictly includes the result of emplace (modulo vertex renumbering).]

Sub-Tgraphs

In figure 18 on the left we have a four times decomposed dart dartD4 followed by two sub-Tgraphs brokenDart and badlyBrokenDart which are constructed by removing faces from dartD4 (but retaining the connectedness condition and the no crossing boundaries condition). These all produce the same forced result (depicted middle row left in figure 15).

Figure 18: dartD4, brokenDart, badlyBrokenDart

However, if we do compositions without forcing first we find badlyBrokenDart fails because it produces a graph with crossing boundaries after 3 compositions. So compose on its own is not always safe, where safe means guaranteed to produce a valid Tgraph from a valid correct Tgraph.

In other experiments we tried force on Tgraphs with holes and on incomplete boundaries around a potential hole. For example, we have taken the boundary faces of a forced, 5 times decomposed dart, then removed a few more faces to make a gap (which is still a valid Tgraph). This is shown at the top in figure 19. The result of forcing reconstructs the complete original forced graph. The bottom figure shows an intermediate stage after 2200 face additions. The gap cannot be closed off to make a hole as this would create a crossing boundary, but the channel does get filled and eventually closes the gap without creating a hole.

Figure 19: Forcing boundary faces with a gap (after 2200 steps)

Incorrect Tilings

When we say a Tgraph g is correct (respectively: incorrect), we mean g represents a correct tiling (respectively: incorrect tiling). A simple example of an incorrect Tgraph is a kite with a dart on each side (referred to as a mistake by Penrose) shown on the left of figure 20.

*Main> mistake
Tgraph [RK (1,2,4),LK (1,3,2),RD (3,1,5)
       ,LD (4,6,1),LD (3,5,7),RD (4,8,6)
       ]

If we try to force (or emplace) this Tgraph it produces an error in construction which is detected by the test for conflicting edge types (a phiEdge sharing with a non-phiEdge).

*Main> force mistake
... *** Exception: doUpdate:(incorrect tiling)
Conflicting new face RK (11,1,6)
with neighbouring faces
[RK (9,1,11),LK (9,5,1),RK (1,2,4),LK (1,3,2),RD (3,1,5),LD (4,6,1),RD (4,8,6)]
in boundary
BoundaryState ...

In figure 20 on the right, we see that after successfully constructing the two whole kites on the top dart short edges, there is an attempt to add an RK on edge (1,6). The process finds an existing edge (1,11) in the correct direction for one of the new edges so tries to add the erroneous RK (11,1,6) which fails a noConflicts test.

Figure 20: An incorrect Tgraph (mistake), and the point at which force mistake fails

So it is certainly true that incorrect Tgraphs may fail on forcing, but forcing cannot create an incorrect Tgraph from a correct Tgraph.

If we apply decompose to mistake it produces another incorrect Tgraph (which is similarly detected if we apply force), but will nevertheless still compose back to mistake if we do not try to force.

Interestingly, though, the incorrectness of a Tgraph is not always preserved by decompose. If we start with mistake1, which is mistake with just two of the half darts (and also incorrect), we still get a similar failure on forcing, but decompose mistake1 is no longer incorrect. If we then apply compose, or force followed by compose, the mistake is thrown away to leave just a kite (see figure 21). This is an example where compose is not a left inverse to either decompose or (force . decompose).

Figure 21: mistake1 with its decomposition, forced decomposition, and recomposed.

Composing with Choices

We know that unknowns indicate possible choices (although some choices may lead to incorrect Tgraphs). As an experiment we introduce

makeChoices :: Tgraph -> [Tgraph]

which produces 2^n alternatives for the 2 choices of each of n unknowns (prior to composing). This uses forceLDB which forces an unknown to be a largeDartBase by adding an appropriate joined half dart at the node, and forceLKC which forces an unknown to be a largeKiteCentre by adding a half dart and a whole kite at the node (making up the 3 pieces for a larger half kite).
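The exponential growth is easy to see in isolation. The following self-contained sketch (the Choice type and allChoices are illustrative names, not part of the library) enumerates the 2^n assignments for n unknowns:

```haskell
-- Illustrative only: enumerate the 2^n assignments of a choice to n unknowns
data Choice = LargeDartBase | LargeKiteCentre deriving (Show, Eq)

allChoices :: Int -> [[Choice]]
allChoices n = sequence (replicate n [LargeDartBase, LargeKiteCentre])
```

Here sequence in the list monad takes all combinations, one choice per unknown, so length (allChoices n) is 2^n.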

Figure 22 illustrates the four choices for composing fool this way. The top row has the four choices of makeChoices fool (with the fool shown embedded in red in each case). The bottom row shows the result of applying compose to each choice.

Figure 22: makeChoices fool (top row) and compose of each choice (bottom row)

In this case, all four compositions are correct tilings. The problem is that, in general, some of the choices may lead to incorrect tilings. More specifically, a choice of one unknown can determine what other unknowns have to become with constraints such as

  • a and b have to be opposite choices
  • a and b have to be the same choice
  • a and b cannot both be largeKiteCentres
  • a and b cannot both be largeDartBases

This analysis of constraints on unknowns is not trivial. The potentially exponential number of results from choices suggests we should compose and force as much as possible and only consider unknowns of a maximal Tgraph.

For calculating the emplacement of a Tgraph, we first find the forced maximal Tgraph before decomposing. We could also consider using makeChoices at this top step when there are unknowns, i.e. a version of emplace which produces these alternative results (emplaceChoices).

The result of emplaceChoices is illustrated for foolD in figure 23. The first force and composition is unique, producing the fool level, at which point we get 4 alternatives, each of which composes further as previously illustrated in figure 22. Each of these is forced, then decomposed and forced, decomposed and forced again back down to the starting level. In figure 23 foolD is overlaid on the 4 alternative results. What they have in common is (as you might expect) emplace foolD which equals force foolD and is the graph shown on the right of figure 11.

Figure 23: emplaceChoices foolD

Future Work

I am collaborating with Stephen Huggett who suggested the use of graphs for exploring properties of the tilings. We now have some tools to experiment with but we would also like to complete some formalisation and proofs.

It would also be good to establish whether it is true that g is incorrect iff force g fails.

We have other conjectures relating to subgraph ordering of Tgraphs and Galois connections to explore.

by readerunner at January 11, 2024 12:51 PM

January 10, 2024

Chris Reade

Diagrams for Penrose Tiles

Penrose Kite and Dart Tilings with Haskell Diagrams

Revised version (no longer the full program in this literate Haskell)

Infinite non-periodic tessellations of Roger Penrose’s kite and dart tiles.

leftFilledSun6

As part of a collaboration with Stephen Huggett, working on some mathematical properties of Penrose tilings, I recognised the need for quick renderings of tilings. I thought Haskell diagrams would be helpful here, and that turned out to be an excellent choice. Two dimensional vectors were well-suited to describing tiling operations and these are included as part of the diagrams package.

This literate Haskell uses the Haskell diagrams package to draw tilings with kites and darts. It also implements the main operations of compChoices and decompPatch which are used for constructing tilings (explained below).

Firstly, these 5 lines are needed in Haskell to use the diagrams package:

{-# LANGUAGE NoMonomorphismRestriction #-}
{-# LANGUAGE FlexibleContexts          #-}
{-# LANGUAGE TypeFamilies              #-}
import Diagrams.Prelude
import Diagrams.Backend.SVG.CmdLine

and we will also import a module for half tiles (explained later)

import HalfTile

These are the kite and dart tiles.

Kite and Dart

The red line markings here on the right hand copies are purely to illustrate rules about how tiles can be put together for legal (non-periodic) tilings. Obviously edges can only be put together when they have the same length. If all the tiles are marked with red lines as illustrated on the right, then at each vertex where tiles meet, either all the tiles have a red line at that vertex or none of them do. This prevents us from forming a simple rhombus by placing a kite top at the base of a dart, which would enable periodic tilings.

All edges are powers of the golden section \phi which we write as phi.

phi::Double
phi = (1.0 + sqrt 5.0) / 2.0

So if the shorter edges are unit length, then the longer edges have length phi. We also have the interesting property of the golden section that phi^2 = phi + 1 and so 1/phi = phi - 1, phi^3 = 2phi + 1 and 1/phi^2 = 2 - phi.
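These identities are easy to confirm numerically with a standalone check (independent of the rest of the code):

```haskell
phi :: Double
phi = (1.0 + sqrt 5.0) / 2.0

-- approximate equality for Double
close :: Double -> Double -> Bool
close x y = abs (x - y) < 1e-9

-- check phi^2 = phi + 1, 1/phi = phi - 1, phi^3 = 2phi + 1, 1/phi^2 = 2 - phi
checks :: Bool
checks = and [ close (phi^2) (phi + 1)
             , close (1/phi)   (phi - 1)
             , close (phi^3)   (2*phi + 1)
             , close (1/phi^2) (2 - phi)
             ]
```

All four follow from phi being a root of x^2 = x + 1, which is why every edge length in the tilings is a power of phi.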

All angles in the figures are multiples of tt which is 36 degrees or 1/10 turn. We use ttangle to express such angles (e.g. 180 degrees is ttangle 5).

ttangle:: Int -> Angle Double
ttangle n = (fromIntegral (n `mod` 10))*^tt
             where tt = 1/10 @@ turn

Pieces

In order to implement compChoices and decompPatch, we need to work with half tiles. We now define these in the separately imported module HalfTile with constructors for Left Dart, Right Dart, Left Kite, Right Kite

data HalfTile rep = LD rep -- defined in HalfTile module
                  | RD rep
                  | LK rep
                  | RK rep

where rep is a type variable allowing for different representations. However, here, we want to use a more specific type which we will call Piece:

type Piece = HalfTile (V2 Double)

where the half tiles have a simple 2D vector representation to provide orientation and scale. The vector represents the join edge of each half tile where halves come together. The origin for a dart is the tip, and the origin for a kite is the acute angle tip (marked in the figure with a red dot).

These are the only 4 pieces we use (oriented along the x axis)

ldart,rdart,lkite,rkite:: Piece
ldart = LD unitX
rdart = RD unitX
lkite = LK (phi*^unitX)
rkite = RK (phi*^unitX)
pieces

Perhaps confusingly, we regard left and right of a dart differently from left and right of a kite when viewed from the origin. The diagram shows the left dart before the right dart and the left kite before the right kite. Thus in a complete tile, going clockwise round the origin the right dart comes before the left dart, but the left kite comes before the right kite.

When it comes to drawing pieces, for the simplest case, we just want to show the two tile edges of each piece (and not the join edge). These edges are calculated as a list of 2 new vectors, using the join edge vector v. They are ordered clockwise from the origin of each piece

pieceEdges:: Piece -> [V2 Double]
pieceEdges (LD v) = [v',v ^-^ v'] where v' = phi*^rotate (ttangle 9) v
pieceEdges (RD v) = [v',v ^-^ v'] where v' = phi*^rotate (ttangle 1) v
pieceEdges (RK v) = [v',v ^-^ v'] where v' = rotate (ttangle 9) v
pieceEdges (LK v) = [v',v ^-^ v'] where v' = rotate (ttangle 1) v

Now drawing lines for the 2 outer edges of a piece is simply

drawPiece:: Piece -> Diagram B
drawPiece = strokeLine . fromOffsets . pieceEdges

and drawing all 3 edges round a piece is

drawRoundPiece:: Piece -> Diagram B
drawRoundPiece = strokeLoop . closeLine . fromOffsets . pieceEdges

To fill half tile pieces, we can use fillOnlyPiece which fills without showing edges of a half tile (by using line width none).

fillOnlyPiece:: Colour Double -> Piece -> Diagram B
fillOnlyPiece col piece = drawRoundPiece piece # fc col # lw none

We also use fillPieceDK which fills darts and kites with given colours and also draws edges using drawPiece.

fillPieceDK:: Colour Double -> Colour Double -> Piece -> Diagram B
fillPieceDK dcol kcol piece = drawPiece piece <> fillOnlyPiece col piece where
    col = case piece of (LD _) -> dcol
                        (RD _) -> dcol
                        (LK _) -> kcol
                        (RK _) -> kcol

For an alternative fill operation on whole tiles, it is useful to calculate a list of the 4 tile edges of a completed half-tile piece clockwise from the origin of the tile. (This will allow colour filling a whole tile)

wholeTileEdges:: Piece -> [V2 Double]
wholeTileEdges (LD v) = pieceEdges (RD v) ++ map negated (reverse (pieceEdges (LD v)))
wholeTileEdges (RD v) = wholeTileEdges (LD v)
wholeTileEdges (LK v) = pieceEdges (LK v) ++ map negated (reverse (pieceEdges (RK v)))
wholeTileEdges (RK v) = wholeTileEdges (LK v)

To fill whole tiles with colours, darts with dcol and kites with kcol we can now use leftFillPieceDK. This uses only the left pieces to identify the whole tile and ignores right pieces so that a tile is not filled twice.

leftFillPieceDK:: Colour Double -> Colour Double -> Piece -> Diagram B
leftFillPieceDK dcol kcol c = case c of 
  (LD _) -> (strokeLoop $ glueLine $ fromOffsets $ wholeTileEdges c)  # fc dcol
  (LK _) -> (strokeLoop $ glueLine $ fromOffsets $ wholeTileEdges c)  # fc kcol
  _      -> mempty

By making Pieces transformable we can reuse generic transform operations. These 4 lines of code are required to do this

type instance N (HalfTile a) = N a
type instance V (HalfTile a) = V a
instance Transformable a => Transformable (HalfTile a) where
    transform t ht = fmap (transform t) ht

So we can also scale and rotate a piece by an angle. (Positive rotations are in the anticlockwise direction.)

scale :: Double -> Piece -> Piece
rotate :: Angle Double -> Piece -> Piece

Patches

A patch is a list of located pieces (each with a 2D point)

type Patch = [Located Piece]

To turn a whole patch into a diagram using some function pd for drawing the pieces, we use

drawPatchWith:: (Piece -> Diagram B) -> Patch -> Diagram B 
drawPatchWith pd patch = position $ fmap (viewLoc . mapLoc pd) patch

Here mapLoc applies a function to the piece in a located piece – producing a located diagram in this case, and viewLoc returns the pair of point and diagram from a located diagram. Finally position forms a single diagram from the list of pairs of points and diagrams.

Update: We now use a class for drawable tilings, making Patch an instance

class Drawable a where
 drawWith :: (Piece -> Diagram B) -> a -> Diagram B
instance Drawable Patch where
 drawWith = drawPatchWith

We then introduce special cases:

draw :: Drawable a => a -> Diagram B
draw = drawWith drawPiece
fillDK:: Drawable a => Colour Double -> Colour Double -> a -> Diagram B
fillDK c1 c2 = drawWith (fillPieceDK c1 c2)

Patches are automatically inferred to be transformable now Pieces are transformable, so we can also scale a patch, translate a patch by a vector, and rotate a patch by an angle.

scale :: Double -> Patch -> Patch
rotate :: Angle Double -> Patch -> Patch
translate:: V2 Double -> Patch -> Patch

As an aid to creating patches with 5-fold rotational symmetry, we combine 5 copies of a basic patch (rotated by multiples of ttangle 2 successively).

penta:: Patch -> Patch
penta p = concatMap copy [0..4] 
            where copy n = rotate (ttangle (2*n)) p

This must be used with care to avoid nonsense patches. But two special cases are

sun,star::Patch         
sun =  penta [rkite `at` origin, lkite `at` origin]
star = penta [rdart `at` origin, ldart `at` origin]

This figure shows some example patches, drawn with draw. The first is a star and the second is a sun.

tile patches

The tools so far for creating patches may seem limited (and do not help with ensuring legal tilings), but there is an even bigger problem.

Correct Tilings

Unfortunately, correct tilings – that is, tilings which can be extended to infinity – are not as simple as just legal tilings. It is not enough to have a legal tiling, because an apparent (legal) choice of placing one tile can have non-local consequences, causing a conflict with a choice made far away in a patch of tiles, resulting in a patch which cannot be extended. This suggests that constructing correct patches is far from trivial.

The infinite number of possible infinite tilings do have some remarkable properties. Any finite patch from one of them will occur in all the others (infinitely many times) and within a relatively small radius of any point in an infinite tiling. (For details of this see links at the end.)

This is why we need a different approach to constructing larger patches. There are two significant processes used for creating patches, namely inflate (also called compose) and decompose.

To understand these processes, take a look at the following figure.

experiment

Here the small pieces have been drawn in an unusual way. The edges have been drawn with dashed lines, but long edges of kites have been emphasised with a solid line and the join edges of darts marked with a red line. From this you may be able to make out a patch of larger scale kites and darts. This is an inflated patch arising from the smaller scale patch. Conversely, the larger kites and darts decompose to the smaller scale ones.

Decomposition

Since the rule for decomposition is uniquely determined, we can express it as a simple function on patches.

decompPatch :: Patch -> Patch
decompPatch = concatMap decompPiece

where the function decompPiece acts on located pieces and produces a list of the smaller located pieces contained in the piece. For example, a larger right dart will produce both a smaller right dart and a smaller left kite. Decomposing a located piece also takes care of the location, scale and rotation of the new pieces.

decompPiece lp = case viewLoc lp of
  (p, RD vd)-> [ LK vd  `at` p
               , RD vd' `at` (p .+^ v')
               ] where v'  = phi*^rotate (ttangle 1) vd
                       vd' = (2-phi) *^ (negated v') -- (2-phi) = 1/phi^2
  (p, LD vd)-> [ RK vd `at` p
               , LD vd' `at` (p .+^ v')
               ]  where v'  = phi*^rotate (ttangle 9) vd
                        vd' = (2-phi) *^ (negated v')  -- (2-phi) = 1/phi^2
  (p, RK vk)-> [ RD vd' `at` p
               , LK vk' `at` (p .+^ v')
               , RK vk' `at` (p .+^ v')
               ] where v'  = rotate (ttangle 9) vk
                       vd' = (2-phi) *^ v' -- v'/phi^2
                       vk' = ((phi-1) *^ vk) ^-^ v' -- (phi-1) = 1/phi
  (p, LK vk)-> [ LD vd' `at` p
               , RK vk' `at` (p .+^ v')
               , LK vk' `at` (p .+^ v')
               ] where v'  = rotate (ttangle 1) vk
                       vd' = (2-phi) *^ v' -- v'/phi^2
                       vk' = ((phi-1) *^ vk) ^-^ v' -- (phi-1) = 1/phi

This is illustrated in the following figure for the cases of a right dart and a right kite.

explanation

The symmetric diagrams for left pieces are easy to work out from these, so they are not illustrated.

With the decompPatch operation we can start with a simple correct patch, and decompose repeatedly to get more and more detailed patches. (Each decomposition scales the tiles down by a factor of 1/phi but we can rescale at any time.)

This figure illustrates how each piece decomposes with 4 decomposition steps below each one.

four decompositions of pieces
thePieces =  [ldart, rdart, lkite, rkite]  
fourDecomps = hsep 1 $ fmap decomps thePieces # lw thin where
        decomps pc = vsep 1 $ fmap draw $ take 5 $ decompositionsP [pc `at` origin] 

We have made use of the fact that we can create an infinite list of finer and finer decompositions of any patch, using:

decompositionsP:: Patch -> [Patch]
decompositionsP = iterate decompPatch

We could get the n-fold decomposition of a patch as just the nth item in a list of decompositions.
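A hypothetical piece-count model (counts only, not geometry) shows how fast these lists grow: reading off decompPiece, each dart half yields one dart half and one kite half, and each kite half yields one dart half and two kite halves, so the counts follow a linear recurrence under iterate just as patches do under decompositionsP:

```haskell
-- (dart halves, kite halves) after one decomposition step:
-- dart half -> 1 dart half + 1 kite half
-- kite half -> 1 dart half + 2 kite halves
step :: (Int, Int) -> (Int, Int)
step (d, k) = (d + k, d + 2*k)

-- counts after successive decompositions, mirroring decompositionsP
counts :: (Int, Int) -> [(Int, Int)]
counts = iterate step
```

Starting from sun (10 kite halves, no darts) the counts grow roughly by a factor of phi^2 per step, which is why deep decompositions get expensive quickly.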

For example, here is an infinite list of decomposed versions of sun.

suns = decompositionsP sun

The coloured tiling shown at the beginning is simply 6 decompositions of sun displayed using leftFillPieceDK.

leftFilledSun6 :: Diagram B
leftFilledSun6 = drawWith (leftFillPieceDK red blue) (suns !!6) # lw thin

The earlier figure illustrating larger kites and darts emphasised among the smaller ones is also suns!!6, but this time the pieces are drawn with experiment.

experimentFig = drawWith experiment (suns!!6) # lw thin
experiment:: Piece -> Diagram B
experiment pc = emph pc <> (drawRoundPiece pc # dashingN [0.002,0.002] 0 # lw ultraThin)
  where emph pc = case pc of
          (LD v) -> (strokeLine . fromOffsets) [v] # lc red   -- emphasise join edge of darts in red
          (RD v) -> (strokeLine . fromOffsets) [v] # lc red 
          (LK v) -> (strokeLine . fromOffsets) [rotate (ttangle 1) v] -- emphasise long edge for kites
          (RK v) -> (strokeLine . fromOffsets) [rotate (ttangle 9) v]

Compose Choices

You might expect composition (also called inflation) to be a kind of inverse to decomposition, but it is a bit more complicated than that. With our current representation of pieces, we can only compose single pieces. This amounts to embedding the piece into a larger piece that matches how the larger piece decomposes. There is thus a choice at each inflation step as to which of several possibilities we select as the larger half-tile. We represent this choice as a list of alternatives. This list should not be confused with a patch. It only makes sense to select one of the alternatives giving a new single piece.

The earlier diagram illustrating how decompositions are calculated also shows the two choices for embedding a right dart into either a right kite or a larger right dart. There will be two symmetric choices for a left dart, and three choices for left and right kites.

Once again we work with located pieces to ensure the resulting larger piece contains the original in its original position in a decomposition.

compChoices :: Located Piece -> [Located Piece]
compChoices lp = case viewLoc lp of
  (p, RD vd)-> [ RD vd' `at` (p .+^ v')
               , RK vk  `at` p
               ] where v'  = (phi+1) *^ vd                  -- vd*phi^2
                       vd' = rotate (ttangle 9) (vd ^-^ v')
                       vk  = rotate (ttangle 1) v'
  (p, LD vd)-> [ LD vd' `at` (p .+^ v')
               , LK vk `at` p
               ] where v'  = (phi+1) *^ vd                  -- vd*phi^2
                       vd' = rotate (ttangle 1) (vd ^-^ v')
                       vk  = rotate (ttangle 9) v'
  (p, RK vk)-> [ LD vk  `at` p
               , LK lvk' `at` (p .+^ lv') 
               , RK rvk' `at` (p .+^ rv')
               ] where lv'  = phi*^rotate (ttangle 9) vk
                       rv'  = phi*^rotate (ttangle 1) vk
                       rvk' = phi*^rotate (ttangle 7) vk
                       lvk' = phi*^rotate (ttangle 3) vk
  (p, LK vk)-> [ RD vk  `at` p
               , RK rvk' `at` (p .+^ rv')
               , LK lvk' `at` (p .+^ lv')
               ] where lv'  = phi*^rotate (ttangle 9) vk
                       rv'  = phi*^rotate (ttangle 1) vk
                       rvk' = phi*^rotate (ttangle 7) vk
                       lvk' = phi*^rotate (ttangle 3) vk

As the result is a list of alternatives, we need to select one to do further inflations. We can express all the alternatives after n steps as compNChoices n where

compNChoices :: Int -> Located Piece -> [Located Piece]
compNChoices 0 lp = [lp]
compNChoices n lp = do
    lp' <- compChoices lp
    compNChoices (n-1) lp'

This figure illustrates 5 consecutive choices for inflating a left dart to produce a left kite. On the left, the finishing piece is shown with the starting piece embedded, and on the right the 5-fold decomposition of the result is shown.

five inflations
fiveCompChoices = hsep 1 $ [ draw [ld] <> draw [lk']
                           , draw (decompositionsP [lk'] !!5)
                           ] where
  ld  = (ldart `at` origin)
  lk  = compChoices ld  !!1
  rk  = compChoices lk  !!1
  rk' = compChoices rk  !!2
  ld' = compChoices rk' !!0
  lk' = compChoices ld' !!1

Finally, at the end of this literate haskell program we choose which figure to draw as output.

fig :: Diagram B
fig = leftFilledSun6
main = mainWith fig

That’s it. But what about composing whole patches? I hear you ask. Unfortunately we need to answer questions like which pieces are adjacent to a piece in a patch and whether there is a corresponding other half for a piece. These cannot be answered with our simple vector representations. We would need some form of planar graph representation, which is much more involved. That is another story.

Many thanks to Stephen Huggett for his inspirations concerning the tilings. A library version of the above code is available on GitHub

Further reading on Penrose Tilings

As well as the Wikipedia entry Penrose Tilings I recommend two articles in Scientific American from 2005 by David Austin Penrose Tiles Talk Across Miles and Penrose Tilings Tied up in Ribbons.

There is also a very interesting article by Roger Penrose himself: Penrose R Tilings and quasi-crystals; a non-local growth problem? in Aperiodicity and Order 2, edited by Jarich M, Academic Press, 1989.

More information about the diagrams package can be found from the home page Haskell diagrams

by readerunner at January 10, 2024 04:48 PM

Well-Typed.Com

When "blocked indefinitely" is not indefinite

Consider a Haskell thread trying to read from a TMVar:

x <- atomically $ takeTMVar v

If the TMVar is currently empty and there are no other threads that could write to the TMVar, then this thread will never be able to make progress. The GHC runtime detects such situations, and this call to atomically will throw a BlockedIndefinitelyOnSTM exception, rendered as

thread blocked indefinitely in an STM transaction
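The basic case is easy to demonstrate. Below is a minimal sketch of our own (not code from the post): the main thread blocks on a TMVar that nothing can ever fill, and catches the resulting exception rather than hanging.

```haskell
import Control.Concurrent.STM
import Control.Exception (BlockedIndefinitelyOnSTM (..), try)

-- Our own minimal reproduction: nothing can ever fill this TMVar, so the
-- takeTMVar can never succeed, and the runtime delivers
-- BlockedIndefinitelyOnSTM to the blocked (main) thread.
main :: IO ()
main = do
  v <- newEmptyTMVarIO :: IO (TMVar Int)
  r <- try (atomically (takeTMVar v)) :: IO (Either BlockedIndefinitelyOnSTM Int)
  case r of
    Left e  -> putStrLn ("caught: " ++ show e)
    Right x -> print x
```

Here show on the exception renders the familiar "thread blocked indefinitely in an STM transaction" message.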

Occasionally, however, the runtime will throw this exception even when progress is possible.

This blog post is not the first to make this observation; Simon Marlow’s book Parallel and Concurrent Programming in Haskell discusses it, and it occasionally comes up in various tickets (e.g. GHC #9401, GHC #10241, Async #14). Nonetheless, the problem is not as widely known as perhaps it should be, and it can lead to very confusing behaviour. In this blog post we will therefore examine when and how this can arise, and what we can do about it.

Note on terminology. In the Haskell literature a thread that cannot make progress in a situation like this is often referred to as “deadlocked” (for example, section Detecting Deadlock in Simon Marlow’s book, or the section on Deadlock in the documentation of Control.Concurrent). However, traditionally the term deadlock refers to a situation in which a group of threads cannot make progress because they are all waiting on each other; that need not be the case here, and so we will avoid the term “deadlock” in this blog post, instead referring to a thread that cannot make progress as “stalled.”

Example

Consider an application with three threads.

  1. The first thread mimics a low-level network library (such as http2); we'll assume it decodes messages from a network interface and makes them available on a TMVar.
  2. The second thread mimics a higher-level networking library (such as grapesy, a gRPC library); for our purposes we'll just assume that this reads the messages from the TMVar and writes them to a TQueue, providing some kind of buffering.1
  3. The third thread mimics the application layer; here we'll just assume it reads from the TQueue and writes all messages to the terminal.

In other words, the setup looks something like this:

/--------\          /--------\            /-------\
|        |   TMVar  |        |   TQueue   |       |
| decode | <------> | buffer | <--------> | write |
|        |          |        |            |       |
\--------/          \--------/            \-------/

For the implementation of decode, we will simply write a new “message” every two seconds:

decode :: TMVar Int -> IO ()
decode decoderOutput =
    forM_ [1..] $ \i -> do
      threadDelay 2_000_000
      atomically $ putTMVar decoderOutput i

For buffer we want to wait for each message, enqueue it, and loop; however, we also want to detect if any exceptions are thrown, and if so, write those to the queue as well2:

newtype NetworkFailure = NetworkFailure SomeException
  deriving (Show)

buffer :: TMVar Int -> TQueue (Either NetworkFailure Int) -> IO ()
buffer decoderOutput queue =
    loop
  where
    loop :: IO ()
    loop = do
        mMsg <- try $ atomically $ takeTMVar decoderOutput
        case mMsg of
          Left err -> do
            atomically $ writeTQueue queue $ Left (NetworkFailure err)
            putStrLn $ "buffer exiting: " ++ show err
          Right msg -> do
            atomically $ writeTQueue queue $ Right msg
            loop

Finally, in write we wait for a message, print it to the terminal, and loop, unless we see an exception reported:

write :: TQueue (Either NetworkFailure Int) -> IO ()
write queue =
    loop
  where
    loop :: IO ()
    loop = do
        mMsg <- atomically $ readTQueue queue
        case mMsg of
          Left (NetworkFailure err) ->
            putStrLn $ "Network failure: " ++ show err
          Right msg -> do
            putStrLn $ "Incoming message: " ++ show msg
            loop

When the decoder dies

Suppose we start all three threads running: decode is writing a new message to the TMVar every two seconds, buffer is copying all of these to the TQueue, and write is dequeueing them and writing them to the terminal – and then we kill the decoder thread (this might simulate a network failure, for example).

Since nobody is writing to the TMVar anymore, the takeTMVar in buffer cannot make progress and will throw a “blocked indefinitely” exception; this exception is caught, written to the TQueue as NetworkFailure BlockedIndefinitelyOnSTM, and buffer terminates cleanly. So far so good.

However, the write thread does not manage to dequeue this NetworkFailure; instead, it is killed with a BlockedIndefinitelyOnSTM of its own when it reads from the TQueue. This is extremely confusing; we can see that the write to the TQueue happens, so why is the readTQueue considered to be blocked indefinitely? Indeed, if we replace the call to atomically in write by

atomicallyStubborn :: forall a. STM a -> IO a
atomicallyStubborn stm = persevere
  where
    persevere :: IO a
    persevere =
        catch (atomically stm) $ \BlockedIndefinitelyOnSTM ->
          persevere

which simply attempts the transaction again when it receives the “blocked indefinitely” exception, then the write thread does see the NetworkFailure being reported and terminates cleanly.

To reiterate: we have a readTQueue from a queue, and we see something being written to that TQueue. Yet that readTQueue throws a "blocked indefinitely" exception, and if we try to read again after receiving that exception, the read succeeds. This clearly illustrates that the thread was not blocked indefinitely and can make progress. So why was it killed?

Detection of stalled threads

The detection of stalled threads happens as part of garbage collection. The default garbage collector (GC) in ghc uses a mark-and-sweep algorithm: it first traverses the heap, starting at a set of roots, marking everything that is reachable from that set of roots, and then collects (“deletes”) everything that was not marked. As a first approximation, the set of the roots is the set of running (non-blocked) threads.

To understand how this relates to the detection of stalled threads, we need to know two additional pieces of information. First, threads and TVars are themselves heap allocated objects (and, by extension, so are TMVars and TQueues, which are built from TVars)3. Second, TVars have associated lists of threads that are blocked on them.

Let’s first consider the non-exceptional situation where the buffer thread is blocked (i.e., not running), waiting on the TMVar, and the decode thread is running and is therefore a GC root. As we start traversing the heap, starting at the decode thread, the GC will encounter the TMVar; from there, it will mark the buffer thread, which is recorded as one of the threads blocked on that TMVar. Since the buffer thread gets marked, the runtime concludes that it is not stalled.

However, now consider what happens when the decode thread is killed. When GC marks the heap, it never encounters the buffer thread (there is no path from any running thread to the buffer thread). After everything is marked, the GC therefore concludes that the buffer thread is unreachable, and hence that it must be stalled, and it sends it the BlockedIndefinitelyOnSTM exception. Put another way:

A thread that is blocked on a TVar is considered blocked indefinitely if there is no reference to that TVar from a running thread.

The problem is that the buffer thread is not the only thread that is unreachable: the write thread is also blocked, and similarly cannot be reached from any running thread. The GC therefore concludes that it is also stalled, and sends it too a BlockedIndefinitelyOnSTM exception. The fact that the buffer thread can recover from this exception, and then in turn unblocks the write thread, is invisible to the GC. The write thread is sent the exception before the exception handler in the buffer thread even gets a chance to run.

Whether or not this is the correct behaviour can of course be argued. It is certainly expected behaviour, and changing it may be non-trivial. In the remainder of this blog post we will consider how we can work with this behaviour to get the results we want.

Workaround

As the library author, we know that a read from the TQueue in the write thread can never be blocked indefinitely (provided we stop reading after receiving a NetworkFailure). It would therefore be good if we could somehow exclude that thread from stall detection.

One option is to use something like atomicallyStubborn in the write thread. This works, but has two important downsides. First, in our example the write thread is the “application layer”; if we are the author of the library in the middle (the buffer thread), we would not have control over the write thread. But perhaps instead of exposing the TQueue directly, we could offer an API such as

dequeue :: TQueue (Either NetworkFailure Int) -> IO (Either NetworkFailure Int)
dequeue queue = atomicallyStubborn $ readTQueue queue

However, this doesn’t address the second, more important, problem. Suppose that instead of writing the messages to the terminal, the write thread writes those messages to a TMVar of its own:

/--------\          /--------\            /-------\          /-----\
|        |   TMVar  |        |   TQueue   |       |   TMVar  |     |
| decode | <------> | buffer | <--------> | write | <------> | ... |
|        |          |        |            |       |          |     |
\--------/          \--------/            \-------/          \-----/

If we now have a fourth thread reading from this TMVar, it too must use atomicallyStubborn to read from that TMVar, or else be subject to the exact same problem again.

A better workaround is to consider the write thread to be a GC root when it is blocked on the TQueue; this way it will not be considered stalled, and neither will any other threads that depend on it (such as the fourth thread in the example above). We can do this by creating a stable pointer to the thread (this workaround is due to Simon Marlow):

write :: TQueue (Either NetworkFailure Int) -> IO ()
write queue = do
    void $ newStablePtr =<< myThreadId
    loop
  where
    loop = ... -- as before

Alternatively, if write is considered part of the application layer again and we are developing the middle layer, we could provide a dequeue function such as

dequeue :: TQueue (Either NetworkFailure Int) -> IO (Either NetworkFailure Int)
dequeue queue = do
    tid <- myThreadId
    bracket (newStablePtr tid) freeStablePtr $ \_ ->
      atomically $ readTQueue queue

Here we make sure to free the stable pointer again, because we don’t want to turn off stall detection entirely in the user’s application.

Conclusions

Arguably the real problem in our running example is that a network failure in the decode thread results in a stall in the first place. Indeed, Simon Marlow writes:

You should not rely on deadlock detection for the correct working of your program. Deadlock detection is a debugging feature; in the event of a deadlock, you get an exception rather than a silent hang, but you should aim to never have any deadlocks in your program.

In practice this is not always easy to achieve; maybe because the stall originates in a library we have no control over, or simply because fixing the problem is difficult (e.g. see Http2 #97, Http2 #104); in cases like this it is important to have a reliable workaround.

For the reasons we explained above, we do not consider atomicallyStubborn to be a good workaround. However, variants of this pattern are occasionally useful and do appear in the wild; for example, it gets used in async (Async #14), although it is not entirely clear why the stable pointer workaround did not work there.

For completeness' sake, we want to mention two further complications. First, the interaction between finalizers and stalled threads is subtle; see the section on Deadlock in the documentation of Control.Concurrent (and GHC #11001). Second, in the case of both infinite recursion and stalling, you might not get the exception you prefer; see Shake #294 / GHC #10793.

This work was sponsored by Anduril as part of the development of grapesy, a new Haskell library for gRPC. This library is currently still under development, but we expect to make a first release relatively soon. When that happens it will of course be accompanied by a blog post on the Well-Typed blog.

With thanks to Justin Le, Ryan Brown and Kazu Yamamoto for their comments on a draft of this blog post.


  1. In reality an unbounded buffer like this is almost certainly a bad idea. This is just an example.↩︎

  2. This may seem a bit contrived in this hugely simplified example, but if the “get a new message” function, which here is simply the call to takeTMVar, would instead be something like getNewMessage :: IO Int, and could throw all kinds of networking exceptions, as well as the “blocked indefinitely” exception we discuss in this blog post, it is a lot more natural.↩︎

  3. Stall detection for MVars works exactly the same way.↩︎

by edsko at January 10, 2024 12:00 AM

January 09, 2024

GHC Developer Blog

GHC 9.6.4 is now available

GHC 9.6.4 is now available

Zubin Duggal - 2024-01-09

The GHC developers are happy to announce the availability of GHC 9.6.4. Binary distributions, source distributions, and documentation are available on the release page.

This release is primarily a bugfix release addressing a few issues found in the 9.6 series. These include:

  • A fix for a bug where certain warnings flags were not recognised (#24071)
  • Fixes for a number of simplifier bugs (#23952, #23862).
  • Fixes for compiler panics with certain package databases involving unusable units and module reexports (#21097, #16996, #11050).
  • A fix for a typechecker crash (#24083).
  • A fix for a code generator bug on AArch64 platforms resulting in invalid conditional jumps (#23746).
  • Fixes for some memory leaks in GHCi (#24107, #24118)
  • And a few other fixes

A full accounting of changes can be found in the release notes. As some of the fixed issues do affect correctness users are encouraged to upgrade promptly.

We would like to thank Microsoft Azure, GitHub, IOG, the Zw3rk stake pool, Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprise this release.

As always, do give this release a try and open a ticket if you see anything amiss.

Enjoy!

-Zubin

by ghc-devs at January 09, 2024 12:00 AM

January 08, 2024

Monday Morning Haskell

How to Write Comments in Haskell

Comments are often a simple item to learn, but there's a few ways we can get more sophisticated with them! This article is all about writing comments in Haskell. Here's a quick outline to get you started!

  • What is a Comment?
  • Single Line Comments
  • Multi-Line Comments
  • Inline Comments
  • Writing Formal Documentation Comments
  • Intro to Haddock
  • Basic Haddock Comments
  • Creating Our Haskell Report
  • Documenting the Module Header
  • Module Header Fields
  • Haddock Comments Below
  • Commenting Type Signatures
  • Commenting Constructors
  • Commenting Record Fields
  • Commenting Class Definitions
  • A Complete Introduction to the Haskell Programming Language

    What is a Comment?

    A comment is a non-code note you write in a code file. You write it to explain what the code does or how it works, in order to help someone else reading it. Comments are ignored by a language's compiler or interpreter. There is usually some kind of syntax for comments to distinguish them from code. Writing comments in Haskell isn't much different from other programming languages. But in this article, we'll also look extensively at Haddock, a more advanced program for writing nice-looking documentation.

    Single Line Comments

    The basic syntax for comments in Haskell is easy, even if it is unusual compared to more common programming languages. In languages like Java, Javascript and C++, you use two forward slashes to start a single line comment:

    #include <iostream>

    int main() {
      // This line will print the string value "Hello, World!" to the console
      std::cout << "Hello, World!" << std::endl;
    }

    But in Haskell, single line comments start with two hyphens, '--':

    -- This is our 'main' function, which will print a string value to the console
    main :: IO ()
    main = putStrLn "Hello World!"

    You can have these take up an entire line by themselves, or you can add a comment after a line of code. In this simple "Hello World" program, we place a comment at the end of the first line of code, giving instructions on what would need to happen if you extended the program.

    main :: IO ()
    main = -- Add 'do' to this line if you add another 'putStrLn' statement!
      putStrLn "Hello World!"

    Multi-Line Comments

    While you can always start multiple consecutive lines with your language's single line comment marker, many languages also have a specific way to make multi-line comments. Generally speaking, this method has a "start" and an "end" sequence. For example, in C++ or Java, you start a multi-line comment block with the characters '/*' and end it with '*/':

    #include <list>

    /*
    This function returns a new list
    that is a reversed copy of the input.

    It iterates through each value in the input
    and uses 'push_front' on the new copy.
    */
    std::list<int> reverseList(const std::list<int>& ints) {
      std::list<int> result;
      for (const auto& i : ints) {
        result.push_front(i);
      }
      return result;
    }

    In Haskell, it is very similar. You use the brace and a hyphen character to open ('{-') and then the reverse to close the block ('-}').

    {- This function returns a new list
     that is a reversed copy of the input.
    
     It uses a tail recursive helper function.
    -}
    reverse :: [a] -> [a]
    reverse = reverseTail []
      where
        reverseTail acc [] = acc
        reverseTail acc (x : xs) = reverseTail (x : acc) xs

    Notice we don't have to start every line in the comment with double hyphens. Everything in there is part of the comment, until we reach the closing character sequence. Comments like these with multiple lines are also known as "block comments". They are useful because it is easy to add more information to the comment without adding any more formatting.

    Inline Comments

    While you generally use the brace/hyphen sequence to write a multiline comment, this format is surprisingly also useful for a particular form of single line comments. You can write an "inline" comment, where the content is in between operational code on that line.

    reverse :: [a] -> [a]
    reverse = reverseTail []
      where
        reverseTail {- Base Case -}      acc [] = acc
        reverseTail {- Recursive Case -} acc (x : xs) = reverseTail (x : acc) xs

    The fact that our code has a start and end sequence means that the compiler knows where the real code starts up again. This is impossible when you use double hyphens to signify a comment.

    Writing Formal Documentation Comments

    If the only people using this code will be you or a small team, the two techniques above are all you really need. They tell people looking at your source code (including your future self) why you have written things in a certain way, and how they should work.

    However, if other people will be using your code as a library without necessarily looking at the source code, there's a much deeper area you can explore. In these cases, you will want to write formal documentation comments. A documentation comment tells someone what a function does, generally without going into the details of how it works. More importantly, documentation comments are usually compiled into a format for someone to read outside of the source code.

    These sorts of comments are aimed at people using your code as a library. They'll import your module into their own programs, rather than modifying it themselves. You need to answer questions they'll have, like "How do I use this feature?" or "What argument do I need to provide for this function to work?" You should also consider having examples in this kind of documentation, since these can explain your library much better than plain statements. A simple code snippet often provides way more clarification than a long document of function descriptions.

    Intro to Haddock

    As I mentioned above, formal documentation needs to be compiled into a format that is more readable than source code. In most cases, this requires an additional tool. Doxygen, for example, is one tool that supports many programming languages, like C++ and Python. Haskell has a special tool called Haddock.

    Luckily, you probably don't need to go through any additional effort to install Haddock. If you used GHCup to install Haskell, then Haddock comes along with it automatically. (For a full walkthrough on getting Haskell installed, you can read our Startup Guide.) It also integrates well with Haskell's package tools, Stack and Cabal.

    In this article we'll use it through Stack. So if you want to follow along, you should create a new Haskell project on your machine with Stack, calling it 'HaddockTest'. Then build the code before we add comments so you don't have to wait for it later:

    >> stack new HaddockTest
    >> cd HaddockTest
    >> stack build

    You can write all the code from the rest of the article in the file 'src/Lib.hs', which Stack creates by default.

    Basic Haddock Comments

    Now let's see how easy it is to write Haddock comments! To write basic comments, you just have to add a vertical bar character after the two hyphens:

    -- | Get the "block" distance of two 2D coordinate pairs
    manhattanDistance :: (Int, Int) -> (Int, Int) -> Int
    manhattanDistance (x1, y1) (x2, y2) = abs (x2 - x1) + abs (y2 - y1)

    It still works even if you add a second line without the vertical bar. All comment lines until the type signature or function definition will be considered part of the Haddock comment.

    -- | Get the "block" distance of two 2D coordinate pairs
    -- This is the sum of the absolute difference in x and y values.
    manhattanDistance :: (Int, Int) -> (Int, Int) -> Int
    manhattanDistance (x1, y1) (x2, y2) = abs (x2 - x1) + abs (y2 - y1)

    You can also make a block comment in the Haddock style. It involves the same character sequences as multi line comments, but once again, you just add a vertical bar after the start sequence. The end sequence does not need the bar:

    {-| Get the "block" distance of two 2D coordinate pairs
     This is the sum of the absolute difference in x and y values.
    -}
    manhattanDistance :: (Int, Int) -> (Int, Int) -> Int
    manhattanDistance (x1, y1) (x2, y2) = abs (x2 - x1) + abs (y2 - y1)

    No matter which of these options you use, your comment will look the same in the final document. Next, we'll see how to generate our Haddock document. To contrast Haddock comments with normal comments, we'll add a second function in our code with a "normal" single line comment. We also need to add both functions to the export list of our module at the top:

    module Lib
      ( someFunc
      , manhattanDistance
      , euclideanDistance
      ) where

    ...

    -- Get the Euclidean distance of two 2D coordinate pairs (not Haddock)
    euclideanDistance :: (Double, Double) -> (Double, Double) -> Double
    euclideanDistance (x1, y1) (x2, y2) = sqrt ((x2 - x1) ^ 2 + (y2 - y1) ^ 2)

Now let's create our document!

Creating Our Haskell Report

To generate our document, we just use the following command:

>> stack haddock

This will compile our code. At the end of the process, it will also inform us about what percentage of the elements in our code used Haddock comments. For example:

25% (  1 /  4) in 'Lib'
  Missing documentation for:
    Module header
    someFunc (src/Lib.hs:7)
    euclideanDistance (src/Lib.hs:17)

As expected, 'euclideanDistance' is not considered to have a Haddock comment. We also haven't defined a Haddock comment for our module header; we'll do that in the next section. We'll also get rid of the 'someFunc' expression, which is just a stub.

This command will generate HTML files for us, most importantly an index file! They get generated in the '.stack-work' directory, usually in a folder that looks like '{project}/.stack-work/install/{os}/{hash}/{ghc_version}/doc/'. For example, the full path of my index file in this example is:

/home/HaddockTest/.stack-work/install/x86_64-linux-tinfo6/6af01190efdb20c14a771b6e2823b492cb22572e9ec30114989156919ec4ab3a/9.6.3/doc/index.html

You can open the file with your web browser, and you'll find a mostly blank page listing the modules in your project, which at this point should only be 'Lib'. If you click on 'Lib', it will take you to a page that looks like this:

We can see that all three expressions from our file are there, but only 'manhattanDistance' has its comment visible on the page. What's neat is that the type links all connect to documentation for the base libraries. If we click on 'Int', it will take us to the page for the 'base' package module 'Data.Int', giving documentation on 'Int' and other integer types.

Documenting the Module Header

In the picture above, you'll see a blank space between our module name and the 'Documentation' section. This is where the module header documentation should go. Let's see how to add this into our code. Just as Haddock comments for functions should go above their type signatures, the module comment should go above the module declaration. You can start it with the same format as you would have with other Haddock block comments:

{-| This module exposes a couple functions
    related to 2D distance calculation.
-}
module Lib
  ( manhattanDistance
  , euclideanDistance
  ) where

...

If you rerun 'stack haddock' and refresh your Haddock page, this comment will now appear under 'Lib' and above 'Documentation'. This is the simplest thing you can do to provide general information about the module.

Module Header Fields

However, there are also additional fields you can add to the header that Haddock will specifically highlight on the page. Suppose we update our block comment to have these lines:

{-|
Module: Lib
Description: A module for distance functions.
Copyright: (c) Monday Morning Haskell, 2023
License: MIT
Maintainer: person@mmhaskell.com

The module has two functions. One calculates the "Manhattan" distance, or "block" distance on integer 2D coordinates. The other calculates the Euclidean distance for a floating-point coordinate system.
-}
module Lib
  ( manhattanDistance
  , euclideanDistance
  ) where

...

At the bottom of the multi line comment, after all the lines for the fields, we can put a longer description, as you see. After adding this, removing 'someFunc', and making our prior comment on Euclidean distance a Haddock comment, we now get 100% marks on the documentation for this module when we recompile it:

100% (  3 /  3) in 'Lib'

And here's what our HTML page looks like now. Note how the fields we entered are populated in the small box in the upper right.

Note that the short description we gave is now visible next to the module name on the index page. This page still only contains the description below the fields.

Haddock Comments Below

So far, we've been using the vertical bar character to place Haddock comments above our type signatures. However, it is also possible to place comments below the type signatures, and this will introduce us to a new syntax technique that we'll use for other areas. The general idea is that we can use a caret character '^' instead of the vertical bar, indicating that the item we are commenting is "above" or "before" the comment. We can do this either with single line comments or block comments. Here's how we would use this technique with our existing functions:

manhattanDistance :: (Int, Int) -> (Int, Int) -> Int
-- ^ Get the "block" distance of two 2D coordinate pairs
manhattanDistance (x1, y1) (x2, y2) = abs (x2 - x1) + abs (y2 - y1)

euclideanDistance :: (Double, Double) -> (Double, Double) -> Double
{- ^ Get the Euclidean distance of two 2D coordinate pairs
     This uses the Pythagorean formula.
-}
euclideanDistance (x1, y1) (x2, y2) = sqrt ((x2 - x1) ^ 2 + (y2 - y1) ^ 2)

The comments will appear the same in the final documentation.

Commenting Type Signatures

The comments we've written so far have described each function as a unit. However, sometimes you want to make notes on specific function arguments. The most common way to write these comments in Haskell with Haddock is with the "above" style. Each argument goes on its own line with a "caret" Haddock comment after it. Here's an example:

-- | Given a base point and a list of other points, returns
-- the shortest distance from the base point to a point in the list.
shortestDistance ::
  (Double, Double) -> -- ^ The base point we are measuring from
  [(Double, Double)] -> -- ^ The list of alternative points
  Double
shortestDistance base [] = -1.0
shortestDistance base rest = minimum $ map (euclideanDistance base) rest

It is also possible to write these with the vertical bar above each argument, but then you will need a second line for the comment.

-- | Given a base point and a list of other points, returns
-- the shortest distance from the base point to a point in the list.
shortestDistance ::
  -- | The base point we are measuring from
  (Double, Double) ->
  -- | The list of alternative points
  [(Double, Double)] -> 
  Double
shortestDistance base [] = -1.0
shortestDistance base rest = minimum $ map (euclideanDistance base) rest

It is even possible to write the comments before AND on the same line as inline comments. However, this is less common since developers usually prefer seeing the type as the first thing on the line.
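That same-line placement can be sketched with Haddock's block-comment form (our own illustration; the name shortestDistance' and its body mirror the earlier example but are not from the original post):

```haskell
-- | Same-line argument docs using Haddock's block-comment form
-- (an illustration of the less common style; names here are our own).
shortestDistance' ::
  {-| The base point we are measuring from -} (Double, Double) ->
  {-| The list of alternative points -}       [(Double, Double)] ->
  Double
shortestDistance' _ [] = -1.0
shortestDistance' (x1, y1) rest = minimum (map dist rest)
  where
    dist (x2, y2) = sqrt ((x2 - x1) ^ 2 + (y2 - y1) ^ 2)
```

Since '{-| ... -}' opens and closes on the same line, the argument type can follow the comment directly; as noted above, most developers still prefer seeing the type first.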

Commenting Constructors

You can also use Haddock comments for type definitions. Here is an example of a data type with different constructors. Each gets a comment.

data Direction =
  DUp    | -- ^ Positive y direction
  DRight | -- ^ Positive x direction
  DDown  | -- ^ Negative y direction
  DLeft    -- ^ Negative x direction

Commenting Record Fields

You can also comment record fields within a single constructor.

data Movement = Movement
  { direction :: Direction -- ^ Which way we are moving
  , distance  :: Int       -- ^ How far we are moving
  }

An important note is that if you have a constructor on the same line as its fields, a single caret comment will refer to the constructor, not to its last field.

data Point =
  Point2I Int Int       |      -- ^ 2d integral coordinate
  Point2D Double Double |      -- ^ 2d floating point coordinate
  Point3I Int Int Int   |      -- ^ 3d integral coordinate
  Point3D Double Double Double -- ^ 3d floating point coordinate

Commenting Class Definitions

As one final feature, we can add these sorts of comments to class definitions as well. With class functions, it is usually better to use "before" comments with the vertical bar. Unlike constructors and fields, an "after" comment will get associated with the argument, not the method.

{-| The Polar class describes objects which can be described
    in "polar" coordinates, with a magnitude and angle
-}
class Polar a where
  -- | The total length of the item
  magnitude :: a -> Double 
  -- | The angle (in radians) of the point around the z-axis
  angle :: a -> Double

Here's what all these new pieces look like in our documentation:

You can see the way that each comment is associated with a particular field or argument.

A Complete Introduction to the Haskell Programming Language

Of course, comments are useless if you have no code or projects to write them in! If you're a beginner to Haskell, the fastest way to get up to writing project-level code is our course, Haskell From Scratch! This course features hours of video lectures, over 100 programming exercises, and a final project to test your skills! Learn more about it on this page!

by James Bowen at January 08, 2024 04:00 PM

January 04, 2024

Stackage Blog

LTS 22 release for ghc-9.6 and Nightly now on ghc-9.8

Stackage LTS 22 has been released

The Stackage team is happy to announce that Stackage LTS version 22 was released last month, based on GHC stable version 9.6.3.

LTS 22 includes many package changes, and has over 3300 packages! Thank you for all the nightly contributions that made this release possible: the release was made by Mihai Maruseac. (The closest nightly snapshot to lts-22.0 is nightly-2023-12-17.)

If your package is missing from LTS 22 and builds there, you can easily request to have it added by opening a PR in the (new) lts-haskell project against the build-constraints/lts-22-build-constraints.yaml file. The new LTS workflow was implemented by Adam Bergmark and first appeared in lts-22.1: we are in the process of updating our documentation to cover the new nightly-style workflow for LTS snapshots.

Stackage Nightly updated to ghc-9.8.1

At the same time we are excited to have moved Stackage Nightly to GHC 9.8.1: the initial snapshot being nightly-2023-12-27. Current nightly has over 2400 packages, but we expect that number to continue to grow over the coming days, weeks, and months: we very much welcome your contributions and help with this. You can see all the changes made relative to the preceding last 9.6 nightly snapshot. The initial snapshot was done by Alexey Zabelin and Jens.

Thank you to all those who have already done work updating their packages to ghc-9.8.

Adding or enabling your package for Nightly is just a simple pull request to the large build-constraints.yaml file.

If you have questions you can also ask in the Slack #stackage channel.

New HF server

We would also like to take this opportunity to thank the Haskell Foundation for providing the new upgraded Stackage build-server (set up by Bryan Richter, along with other stackage.org migration work), which has greatly helped our daily work with much increased performance and storage.

January 04, 2024 04:00 AM

January 03, 2024

Derek Elkins

Universal Quantification and Infinite Conjunction

Introduction

It is not uncommon for universal quantification to be described as (potentially) infinite conjunction1. Quoting Wikipedia’s Quantifier_(logic) page (my emphasis):

For a finite domain of discourse |D = \{a_1,\dots,a_n\}|, the universal quantifier is equivalent to a logical conjunction of propositions with singular terms |a_i| (having the form |Pa_i| for monadic predicates).

The existential quantifier is equivalent to a logical disjunction of propositions having the same structure as before. For infinite domains of discourse, the equivalences are similar.

While there’s a small grain of truth to this, I think it is wrong and/or misleading far more often than it’s useful or correct. Indeed, it takes a bit of effort to even get a statement that makes sense at all. There’s a bit of conflation between syntax and semantics that’s required to have it naively make sense, unless you’re working (quite unusually) in an infinitary logic where it is typically outright false.

What harm does this confusion do? The most obvious harm is that this view does not generalize to non-classical logics. I’ll focus on constructive logics, in particular. Besides causing problems in these contexts, which maybe you think you don’t care about, it betrays a significant gap in understanding of what universal quantification actually is. Even in purely classical contexts, this confusion often manifests, e.g., in confusion about |\omega|-inconsistency.

So what is the difference between universal quantification and infinite conjunction? Well, the most obvious difference is that infinite conjunction is indexed by some (meta-theoretic) set that doesn’t have anything to do with the domain the universal quantifier quantifies over. However, even if these sets happened to coincide2 there are still differences between universal quantification and infinite conjunction. The key is that universal quantification requires the predicate being quantified over to hold uniformly, while infinite conjunction does not. It just so happens that for the standard set-theoretic semantics of classical first-order logic this “uniformity” constraint is degenerate. However, even for classical first-order logic, this notion of uniformity will be relevant.

Classical Semantic View

I want to start in the context where this identification is closest to being true, so I can show where the idea comes from. The summary of this section is that the standard, classical, set-theoretic semantics of universal quantification is equivalent to an infinitary generalization of the semantics of conjunction. The issue is “infinitary generalization of the semantics of conjunction” isn’t the same as “semantics of infinitary conjunction”.

The standard set-theoretic semantics of classical first-order logic interprets each formula, |\varphi|, as a subset of |D^{\mathsf{fv}(\varphi)}| where |D| is a given domain set and |\mathsf{fv}| computes the (necessarily finite) set of free variables of |\varphi|. Traditionally, |D^{\mathsf{fv}(\varphi)}| would be identified with |D^n| where |n| is the cardinality of |\mathsf{fv}(\varphi)|. This involves an arbitrary mapping of the free variables of |\varphi| to the numbers |1| to |n|. The semantics of a formula then becomes an |n|-ary set-theoretic relation.

The interpretation of binary conjunction is straightforward:

\[\den{\varphi \land \psi} = \den{\varphi} \cap \den{\psi}\]

where |\den{\varphi}| stands for the interpretation of the formula |\varphi|. To be even more explicit, I should index this notation by a structure which specifies the domain, |D|, as well as the interpretations of any predicate or function symbols, but we’ll just consider this fixed but unspecified.

The interpretation of universal quantification is more complicated but still fairly straightforward:

\[\den{\forall x.\varphi} = \bigcap_{d \in D}\left\{\bar y|_{\mathsf{fv}(\varphi) \setminus \{x\}} \mid \bar y \in \den{\varphi} \land \bar y(x) = d\right\}\]

Set-theoretically, we have:

\[\begin{align} \bar z \in \bigcap_{d \in D}\left\{\bar y|_{\mathsf{fv}(\varphi) \setminus \{x\}} \mid \bar y \in \den{\varphi} \land \bar y(x) = d\right\} \iff & \forall d \in D. \bar z \in \left\{\bar y|_{\mathsf{fv}(\varphi) \setminus \{x\}} \mid \bar y \in \den{\varphi} \land \bar y(x) = d\right\} \\ \iff & \forall d \in D. \exists \bar y \in \den{\varphi}. \bar z = \bar y|_{\mathsf{fv}(\varphi) \setminus \{x\}} \land \bar y(x) = d \\ \iff & \forall d \in D. \bar z[x \mapsto d] \in \den{\varphi} \end{align}\]

where |f[x \mapsto c]| extends a function |f \in D^{S}| to a function in |D^{S \cup \{x\}}| via |f[x \mapsto c](v) = \begin{cases}c, &\textrm{ if }v = x \\ f(v), &\textrm{ if }v \neq x\end{cases}|. The final |\iff| arises because |\bar z[x \mapsto d]| is the unique function which extends |\bar z| to the desired domain such that |x| is mapped to |d|. Altogether, this illustrates our desired semantics of the interpretation of |\forall x.\varphi| being the interpretations of |\varphi| which hold when |x| is interpreted as any element of the domain.

This demonstrates the summary that the semantics of quantification is an infinitary version of the semantics of conjunction, as |\bigcap| is an infinitary version of |\cap|. But even here there are substantial cracks in this perspective.

Infinitary Logic

The first problem is that we don’t have an infinitary conjunction so saying universal quantification is essentially infinitary conjunction doesn’t make sense. However, it’s easy enough to formulate the syntax and semantics of infinitary conjunction (assuming we have a meta-theoretic notion of sets).

Syntactically, for a (meta-theoretic) set |I| and an |I|-indexed family of formulas |\{\varphi_i\}_{i \in I}|, we have the infinitary conjunction |\bigwedge_{i \in I} \varphi_i|.

The set-theoretic semantics of this connective is a direct generalization of the binary conjunction case:

\[\bigden{\bigwedge_{i \in I}\varphi_i} = \bigcap_{i \in I}\den{\varphi_i}\]

If |I = \{1,2\}|, we recover exactly the binary conjunction case.

Equipped with a semantics of actual infinite conjunction, we can compare to the semantics of universal quantification case and see where things go wrong.

The first problem is that it makes no sense to choose |I| to be |D|. The formula |\bigwedge_{i \in I} \varphi_i| can be interpreted with respect to many different domains. So any particular choice of |D| would be wrong for most semantics. This is assuming that our syntax’s meta-theoretic sets were the same as our semantics’ meta-theoretic sets, which need not be the case at all3.

An even bigger problem is that infinitary conjunction expects a family of formulas while with universal quantification has just one. This is one facet of the uniformity I mentioned. Universal quantification has one formula that is interpreted a single way (with respect to the given structure). The infinitary intersection expression is computing a set out of this singular interpretation. Infinitary conjunction, on the other hand, has a family of formulas which need have no relation to each other. Each of these formulas is independently interpreted and then all those separate interpretations are combined with an infinitary intersection. The problem we have is that there’s generally no way to take a formula |\varphi| with free variable |x| and an element |d \in D| and make a formula |\varphi_d| with |x| not free such that |\bar y[x \mapsto d] \in \den{\varphi} \iff \bar y \in \den{\varphi_d}|. A simple cardinality argument shows that: there are only countably many (finitary) formulas, but there are plenty of uncountable domains. This is why |\omega|-inconsistency is possible. We can easily have elements in the domain which cannot be captured by any formula.

Syntactic View

Instead of taking a semantic view, let’s take a syntactic view of universal quantification and infinitary conjunction, i.e. let’s compare the rules that characterize them. As before, the first problem we have is that traditional first-order logic does not have infinitary conjunction, but we can easily formulate what the rules would be.

The elimination rules are superficially similar but have subtle but important distinctions:

\[\frac{\Gamma \vdash \forall x.\varphi}{\Gamma \vdash \varphi[x \mapsto t]}\forall E,t \qquad \frac{\Gamma \vdash \bigwedge_{i \in I} \varphi_i}{\Gamma \vdash \varphi_j}{\wedge}E,j\] where |t| is a term, |j| is an element of |I|, and |\varphi[x \mapsto t]| corresponds to syntactically substituting |t| for |x| in |\varphi| in a capture-avoiding way. A first, not-so-subtle distinction is if |I| is an infinite set, then |\bigwedge_{i \in I}\varphi_i| is an infinitely large formula. Another pretty obvious issue is universal quantification is restricted to instantiating terms while |I| stands for either an arbitrary (meta-theoretic) set or it may stand for some particular (meta-theoretic) set, e.g. |\mathbb N|. Either way, it is typically not the set of terms of the logic.

Arguably, this isn’t an issue since the claim isn’t that every infinite conjunction corresponds to a universal quantification, but only that universal quantification corresponds to some infinite conjunction. The set of terms is a possible choice for |I|, so that shouldn’t be a problem. Well, whether it’s a problem or not depends on how you set up the syntax of the language. In my preferred way of handling the syntax of logical formulas, I index each formula by the set of free variables that may occur in that formula. This means the set of terms varies with the set of possible free variables. Writing |\vdash_V \varphi| to mean |\varphi| is well-formed and provable in a context with free variables |V|, then we would want the following rule:

\[\frac{\vdash_V \varphi}{\vdash_U \varphi}\] where |V \subseteq U|. This simply states that if a formula is provable, it should remain provable even if we add more (unused) free variables. This causes a problem with having an infinitary conjunction indexed by terms. Writing |\mathsf{Term}(V)| for the set of terms with (potential) free variables in |V|, then while |\vdash_V \bigwedge_{t \in \mathsf{Term}(V)}\varphi_t| might be okay, this would also lead to |\vdash_U \bigwedge_{t \in \mathsf{Term}(V)}\varphi_t| which would also hold but would no longer correspond to universal quantification in a context with free variables in |U|. This really makes a difference. For example, for many theories, such as the usual presentation of ZFC, |\mathsf{Term}(\varnothing) = \varnothing|, i.e. there are no closed terms. As such, |\vdash_\varnothing \forall x.\bot| is neither provable (which we wouldn’t expect it to be) nor refutable without additional axioms. On the other hand, |\bigwedge_{i \in \varnothing}\bot| is |\top| and thus trivially provable. If we consider |\vdash_{\{y\}} \forall x.\bot| next, it becomes refutable. This doesn’t contradict our earlier rule about adding free variables because |\vdash_\varnothing \forall x.\bot| wasn’t provable and so the rule says nothing. On the other hand, that rule does require |\vdash_{\{y\}} \bigwedge_{i \in \varnothing}\bot| to be provable, and it is. Of course, it no longer corresponds to |\forall x.\bot| with this set of free variables. The putative corresponding formula would be |\bigwedge_{i \in \{y\}}\bot| which is indeed refutable.

With the setup above, we can’t get the elimination rule for |\bigwedge| to correspond to the elimination rule for |\forall|, because there isn’t a singular set of terms. However, a more common if less clean approach is to allow all free variables all the time, i.e. to fix a single countably infinite set of variables once and for all. This would “resolve” this problem.

The differences in the introduction rules are more stark. The rules are:

\[\frac{\Gamma \vdash \varphi \quad x\textrm{ not free in }\Gamma}{\Gamma \vdash \forall x.\varphi}\forall I \qquad \frac{\left\{\Gamma \vdash \varphi_i \right\}_{i \in I}}{\Gamma \vdash \bigwedge_{i \in I}\varphi_i}{\wedge}I\]

Again, the most blatant difference is that (when |I| is infinite) |{\wedge}I| corresponds to an infinitely large derivation. Again, the uniformity aspects show through. |\forall I| requires a single derivation that will handle all terms, whereas |{\wedge}I| allows a different derivation for each |i \in I|.

We don’t run into the same issue as in the semantic view with needing to turn elements of the domain into terms/formulas. Given a formula |\varphi| with free variable |x|, we can easily make a formula |\varphi_t| for every term |t|, namely |\varphi_t = \varphi[x \mapsto t]|. We won’t have the issue that leads to |\omega|-inconsistency because |\forall x.\varphi| is derivable from |\bigwedge_{t \in \mathsf{Term}(V)}\varphi[x \mapsto t]|. Of course, the reason this is true is because one of the terms in |\mathsf{Term}(V)| will be a variable not occurring in |\Gamma| allowing us to derive the premise of |\forall I|. On the other hand, if we choose |I = \mathsf{Term}(\varnothing)|, i.e. only consider closed terms, which is what the |\omega| rule in arithmetic is doing, then we definitely can get |\omega|-inconsistency-like situations. Most notably, in the case of theories, like ZFC, which have no closed terms.

Constructive View

A constructive perspective allows us to accentuate the contrast between universal quantification and infinitary conjunction even more as well as bring more clarity to the notion of uniformity.

We’ll start with the BHK interpretation of Intuitionistic logic and specifically a realizabilty interpretation. For this, we’ll allow infinitary conjunction only for |I = \mathbb N|.

I’ll write |n\textbf{ realizes }\varphi| for the statement that the natural number |n| realizes the formula |\varphi|. As in the linked articles, we’ll need a computable pairing function which computably encodes a pair of natural numbers as a natural number. I’ll just write this using normal pairing notation, i.e. |(n,m)|. We’ll also need Gödel numbering to computably map a natural number |n| to a computable function |f_n|.

\[\begin{align} (n_0, n_1)\textbf{ realizes }\varphi_0 \land \varphi_1 \quad & \textrm{if and only if} \quad n_0\textbf{ realizes }\varphi_0\textrm{ and } n_1\textbf{ realizes }\varphi_1 \\ n\textbf{ realizes }\forall x.\varphi \quad & \textrm{if and only if}\quad \textrm{for all }m, f_n(m)\textbf{ realizes }\varphi[x \mapsto m] \\ (k, n_k)\textbf{ realizes }\varphi_0 \lor \varphi_1 \quad & \textrm{if and only if} \quad k \in \{0, 1\}\textrm{ and }n_k\textbf{ realizes }\varphi_k \\ n\textbf{ realizes }\neg\varphi \quad & \textrm{if and only if} \quad\textrm{there is no }m\textrm{ such that }m\textbf{ realizes }\varphi \end{align}\]

I included disjunction and negation in the above so I could talk about the Law of the Excluded Middle. Via the above interpretation, given any formula |\varphi| with free variable |x|, the meaning of |\forall x.\varphi\lor\neg\varphi| would be a computable function which for each natural number |m| produces a bit indicating whether or not |\varphi[x \mapsto m]| holds. The Law of Excluded Middle holding would thus mean every such formula is computationally decidable which we know isn’t the case. For example, choose |\varphi| as the formula which asserts that the |x|-th Turing machine halts.

This example illustrates the uniformity constraint. Assuming a traditional, classical meta-language, e.g. ZFC, then it is the case that |(\varphi\lor\neg\varphi)[x \mapsto m]| is realized for each |m| in the case where |\varphi| is asserting the halting of the |x|-th Turing machine4. But this interpretation of universal quantification requires not only that the quantified formula holds for all naturals, but also that we can computably find this out.

It’s clear that trying to formulate a notion of infinitary conjunction with regards to realizability would require using something other than natural numbers as realizers if we just directly generalize the finite conjunction case. For example, we might use potentially infinite sequences of natural numbers as realizers. Regardless, the discussion of the previous example makes it clear an interpretation of infinitary conjunction can’t be done in standard computability5, while, obviously, universal quantification can.

Categorical View

The categorical semantics of universal quantification and conjunction are quite different which also suggests that they are not related, at least not in some straightforward way.

One way to get to categorical semantics is to restate traditional, set-theoretic semantics in categorical terms. Traditionally, the semantics of a formula is a subset of some product of the domain set, one for each free variable. Categorically, that suggests we want finite products and the categorical semantics of a formula should be a subobject of a product of some object representing the domain.

Conjunction is traditionally represented via intersection of subsets, and categorically we form the intersection of subobjects via pulling back. So to support finite conjunctions, we need our category to additionally have finite pullbacks of monomorphisms. Infinitary conjunctions simply require infinitely wide pullbacks of monomorphisms. However, we can start to see some cracks here. What does it mean for a pullback to be infinitely wide? It means the obvious thing; namely, that we have an infinite set of monomorphisms sharing a codomain, and we’ll take the limit of this diagram. The key here, though, is “set”. Regardless of whatever the objects of our semantic category are, the infinitary conjunctions are indexed by a set.

To talk about the categorical semantics of universal quantification, we need to bring to the foreground some structure that we have been leaving – and traditionally accounts do leave – in the background. Before, I said the semantics of a formula, |\varphi|, depends on the free variables in that formula, e.g. if |D| is our domain object, then the semantics of a formula with three free variables would be a subobject of |\prod_{v \in \mathsf{fv}(\varphi)}D \cong D\times D \times D| which I’ll continue to write as |D^{\mathsf{fv}(\varphi)}| though now it will be interpreted as a product rather than a function space. For |\mathbf{Set}|, this makes no difference. It would be more accurate to say that a formula can be given semantics in any product of the domain object indexed by any superset of the free variables. This is just to say that a formula doesn’t need to use every free variable that is available. Nevertheless, even if it is induced by the same formula, a subobject of |D^{\mathsf{fv}(\varphi)}| is a different subobject than a subobject of |D^{\mathsf{fv}(\varphi) \cup \{u\}}| where |u| is a variable not free in |\varphi|, so we need a way of relating the semantics of formulas considered with respect to different sets of free variables.

To do this, we will formulate a category of contexts and index our semantics by it. Fix a category |\mathcal C| and an object |D| of |\mathcal C|. Our category of contexts, |\mathsf{Ctx}|, will be the full subcategory of |\mathcal C| with objects of the form |D^S| where |S| is a finite subset of |V|, a fixed set of variables. We’ll assume these products exist, though typically we’ll just assume that |\mathcal C| has all finite products. From here, we use the |\mathsf{Sub}| functor. |\mathsf{Sub} : \mathsf{Ctx}^{op} \to \mathbf{Pos}| maps an object of |\mathsf{Ctx}| to the poset of its subobjects as objects of |\mathcal C|6. Now an arrow |f : D^{\{x,y,z,w\}} \to D^{\{x,y,z\}}| would induce a monotonic function |\mathsf{Sub}(f) : \mathsf{Sub}(D^{\{x,y,z\}}) \to \mathsf{Sub}(D^{\{x,y,z,w\}})|. This is defined for each subobject by pulling back a representative monomorphism of that subobject along |f|. Arrows of |\mathsf{Ctx}| are the semantic analogues of substitutions, and |\mathsf{Sub}(f)| applies these “substitutions” to the semantics of formulas.

Universal quantification is then characterized as the (indexed) right adjoint (Galois connection in this context) of |\mathsf{Sub}(\pi^x)| where |\pi^x : D^S \to D^{S \setminus \{x\}}| is just projection. The indexed nature of this adjoint leads to Beck-Chevalley conditions reflecting the fact universal quantification should respect substitution. |\mathsf{Sub}(\pi^x)| corresponds to adding |x| as a new, unused free variable to a formula. Let |U| be a subobject of |D^{S \setminus \{x\}}| and |V| a subobject of |D^S|. Furthermore, write |U \sqsubseteq U’| to indicate that |U| is a subobject of the subobject |U’|, i.e. that the monos that represent |U| factor through the monos that represent |U’|. The adjunction then states: \[\mathsf{Sub}(\pi^x)(U) \sqsubseteq V\quad \textrm{if and only if}\quad U \sqsubseteq \forall_x(V)\] The |\implies| direction is a fairly direct semantic analogue of the |\forall I| rule: \[\frac{\Gamma \vdash \varphi\quad x\textrm{ not free in }\Gamma}{\Gamma \vdash \forall x.\varphi}\] Indeed, it is easy to show that the converse of this rule is derivable with |\forall E| validating the semantic “if and only if”. To be clear, the full adjunction is natural in |U| and |V| and indexed, effectively, in |S|.

Incidentally, we’d also want the semantics of infinite conjunctions to respect substitution, so they too have a Beck-Chevalley condition they satisfy and give rise to an indexed right adjoint.

It’s hard to even compare the categorical semantics of infinitary conjunction and universal quantification, let alone conflate them, even when |\mathcal C = \mathbf{Set}|. This isn’t too surprising as these semantics work just fine for constructive logics where, as illustrated earlier, these can be semantically distinct. As mentioned, both of these constructs can be described by indexed right adjoints. However, they are adjoints between very different indexed categories. If |\mathcal M| is our indexed category (above it was |\mathsf{Sub}|), then we’ll have |I|-indexed products if |\Delta_{\mathcal M} : \mathcal M \to [DI, -] \circ \mathcal M| has an indexed right adjoint where |D : \mathbf{Set} \to \mathbf{cat}| is the discrete (small) category functor. For |\mathcal M| to have universal quantification, we need an indexed right adjoint to an indexed functor |\mathcal M \circ \mathsf{cod} \circ \iota \to \mathcal M \circ \mathsf{dom} \circ \iota| where |\iota : s(\mathsf{Ctx}) \hookrightarrow \mathsf{Ctx}^{\to}| is the full subcategory of the arrow category |\mathsf{Ctx}^{\to}| consisting of just the projections.

Conclusion

My hope is that the preceding makes it abundantly clear that viewing universal quantification as some kind of special “infinite conjunction” is not sensible even approximately. To do so is to seriously misunderstand universal quantification. Most discussions “equating” them involve significant conflations of syntax and semantics where a specific choice of domain is fixed and elements of that specific domain are used as terms.

A secondary goal was to illustrate an aspect of logic from a variety of perspectives and illustrate some of the concerns in meta-logical reasoning. For example, quantifiers and connectives are syntactical concepts and thus can’t depend on the details of the semantic domain. As another example, better perspectives on quantifiers and connectives are more robust to weakening the logic. I’d say this is especially true when going from classical to constructive logic. Structural proof theory and categorical semantics are good at formulating logical concepts modularly so that they still make sense in very weak logics.

Unfortunately, the traditional trend towards minimalism strongly pushes in the other direction leading to the exploiting of every symmetry and coincidence a stronger logic (namely classical logic) provides producing definitions that don’t survive even mild weakening of the logic7. The attempt to identify universal quantification with infinite conjunction here takes that impulse too far and doesn’t even work in classical logic as demonstrated. While there’s certainly value in recognizing redundancy, I personally find minimizing logical assumptions far more important and valuable than minimizing (primitive) logical connectives.


  1. “Universal statements are true if they are true for every individual in the world. They can be thought of as an infinite conjunction,” from some random AI lecture notes. You can find many others.↩︎

  2. The domain doesn’t even need to be a set.↩︎

  3. For example, we may formulate our syntax in a second-order arithmetic identifying our syntax’s meta-theoretic sets with unary predicates, while our semantics is in ZFC. Just from cardinality concerns, we know that there’s no way of injectively mapping every ZFC set to a set of natural numbers.↩︎

  4. It’s probably worth pointing out that not only will this classical meta-language not tell us whether it’s |\varphi[x \mapsto m]| or |\neg\varphi[x \mapsto m]| that holds for every specific |m|, but it’s easy to show (assuming consistency of ZFC) that |\varphi[x \mapsto m]| is independent of ZFC for specific values of |m|. For example, it’s easy to make a Turing machine that halts if and only if it finds a contradiction in the theory of ZFC.↩︎

  5. Interestingly, for some models of computation, e.g. ones based on Turing machines, infinitary disjunction, or, specifically, |\mathbb N|-ary disjunction is not problematic. Given an infinite sequence of halting Turing machines, we can interleave their execution such that every Turing machine in the sequence will halt at some finite time. Accordingly, extending the definition of disjunction in realizability to the |\mathbb N|-ary case does not run into any of the issues that |\mathbb N|-ary conjunction has and is completely unproblematic. We just let |k| be an arbitrary natural instead of just |\{0, 1\}|.↩︎

  6. This is a place we could generalize the categorical semantics further. There’s no reason we need to consider this particular functor. We could consider other functors from |\mathsf{Ctx}^{op} \to \mathbf{Pos}|, i.e. other indexed |(0,1)|-categories. This setup is called a hyperdoctrine.↩︎

  7. The most obvious example of this is defining quantifiers and connectives in terms of other connectives particularly when negation is involved. A less obvious example is the overwhelming focus on |\mathbf 2|-valued semantics when classical logic naturally allows arbitrary Boolean-algebra-valued semantics.↩︎

January 03, 2024 06:00 AM

January 01, 2024

Monday Morning Haskell

How to Write “Hello World” in Haskell

In this article we're going to write the easiest program we can in the Haskell programming language. We're going to write a simple example program that prints "Hello World!" to the console. It's such a simple program that we can do it in one line! But it's still the first thing you should do when starting a new programming language. Even with such a simple program there are several details we can learn about writing a Haskell program.

Now let's get started!

Writing Haskell "Hello World"

To write our "Haskell Hello World" program, we just need to open a file named 'HelloWorld.hs' in our code editor and write the following line:

main = putStrLn "Hello World!"

This is all the code you need! With just this one line, there's still another way you could write it. You could use the function 'print' instead of 'putStrLn':

main = print "Hello World!"

These programs will both accomplish our goal, but their behavior is slightly different! But to explore this, we first need to run our program!

The Simplest Way to Run the Code

Hopefully you've already installed the Haskell language tools on your machine. The old way to do this was through Haskell Platform, but now you should use GHCup. You can read our Startup Guide for more instructions on that! But assuming you've installed everything, the simplest way to run your program is to use the 'runghc' command on your file:

>> runghc HelloWorld.hs

With the first version of our code using 'putStrLn', we'll see this printed to our terminal:

Hello World!

If we use 'print' instead, we'll get this output:

"Hello World!"

In the second example, there are quotation marks! To understand why this is, we need to understand a little more about types, which are extremely important in Haskell code.
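The quotation marks come from how 'print' treats its argument: 'print' is essentially 'putStrLn . show', and calling 'show' on a String wraps it in quotes (and escapes any special characters). A small sketch makes the difference visible:

```haskell
main :: IO ()
main = do
  putStrLn "Hello World!"        -- prints: Hello World!
  putStrLn (show "Hello World!") -- prints: "Hello World!"
  print "Hello World!"           -- same output as the previous line
```

So 'putStrLn' is the right choice for printing a String verbatim, while 'print' is convenient for any value with a 'Show' instance.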

Functional Programming and Types

Haskell is a functional programming language with a strong, static type system. Even something as simple as our "Hello World" program is composed of expressions, and each of these expressions has a type. For that matter, our whole program has a type!

In fact, every Haskell program has the same type: 'IO ()'. The IO type signifies any expression which can perform Input/Output activities, like printing to the terminal and reading user input. Most functions you write in Haskell won't need to do these tasks. But since we're printing, we need the IO signifier. The second part of the type is the empty tuple, '()'. This is also referred to as the "unit type". When used following 'IO', it is similar to having a 'void' return value in other programming languages.

Now, our 'main' expression signifies our whole program, and we can explicitly declare it to have this type by putting a type signature above it in our code. We write the expression's name, then two colons, and then the type:

main :: IO ()
main = putStrLn "Hello World!"

Our program will run the same with the type signature. We didn't need to put it there, because GHC, the Haskell compiler, can usually infer the types of expressions. With more complicated programs, it can get stuck without explicit type signatures, but we don't have to worry about that right now.

Requirements of an Executable Haskell Program

Now if we give any other type to our 'main' function, we won't be able to run our program! Our file is supposed to be an entry point - the root of an executable program - and Haskell has several requirements for such files.

These files must have an expression named 'main'. This expression must have the type 'IO ()'. Finally, if we put a module name on our code, that module name should be Main. Module names go at the top of our file, prefaced by "module", and followed by the word "where". Here's how we can explicitly declare the name of our module:

module Main where

main :: IO ()
main = putStrLn "Hello World!"

Like the type signature on our function 'main', GHC could infer the module name as well. But let's try giving it a different module name:

module HelloWorld where

main :: IO ()
main = putStrLn "Hello World!"

For most Haskell modules you write, using the file name (minus the '.hs' extension) IS how you want to name the module. But runnable entry point modules are different. If we use the 'runghc' command on this code, it will still work. However, if we get into more specific behaviors of GHC, we'll see that Haskell treats our file differently if we don't use 'Main'.

Using the GHC Compiler

Instead of using 'runghc', a command designed mainly for one-off files like this, let's try to compile our code more directly using the Haskell compiler. Suppose we have used HelloWorld as the module name. What files does it produce when we compile it with the 'ghc' command?

>> ghc HelloWorld.hs
[1 of 1] Compiling HelloWorld       ( HelloWorld.hs, HelloWorld.o )
>> ls
HelloWorld.hi HelloWorld.hs HelloWorld.o

This produces two output files beside our source module. The '.hi' file is an interface file. The '.o' file is an object file. Unfortunately, neither of these are runnable! So let's try changing our module name back to Main.

module Main where

main :: IO ()
main = putStrLn "Hello World!"

Now we'll go back to the command line and run it again:

>> ghc HelloWorld.hs
[1 of 2] Compiling Main       ( HelloWorld.hs, HelloWorld.o )
[2 of 2] Linking HelloWorld
>> ls 
HelloWorld HelloWorld.hi HelloWorld.hs HelloWorld.o

This time, things are different! We now have two compilation steps. The first says 'Compiling Main', referring to our code module. The second says 'Linking HelloWorld'. This refers to the creation of the 'HelloWorld' file, which is executable code! (On Windows, this file will be called 'HelloWorld.exe'). We can "run" this file on the command line now, and our program will run!

>> ./HelloWorld
Hello World!

Using GHCI - The Haskell Interpreter

Now there's another simple way for us to run our code. We can also use the GHC Interpreter, known as GHCI. We open it with the command 'ghci' on our command line terminal. This brings us a prompt where we can enter Haskell expressions. We can also load code from our modules, using the ':load' command. Let's load our hello world program and run its 'main' function.

>> ghci
GHCi, version 9.4.7: https://www.haskell.org/ghc/   :? for help
ghci> :load HelloWorld
[1 of 1] Compiling Main          ( HelloWorld.hs, interpreted )
Ok, one module loaded.
ghci> main
Hello World!

If we wanted, we could also just run our "Hello World" code in the interpreter itself:

ghci> putStrLn "Hello World!"
Hello World!

It's also possible to assign our string to a value and then use it in another expression:

ghci> let myString = "Hello World!"
ghci> putStrLn myString
Hello World!

A Closer Look at Our Types

A very useful function of GHCI is that it can tell us the types of our expressions. We just have to use the ':type' command, or ':t' for short. We have two expressions in our Haskell program: 'putStrLn', and "Hello World!". Let's look at their types. We'll start with "Hello World!":

ghci> :type "Hello World!"
"Hello World!" :: String

The type of "Hello World!" itself is a 'String'. This is the name given for a list of characters. We can look at the type of an individual character as well:

ghci> :type 'H'
'H' :: Char
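Since a 'String' is just a list of 'Char', ordinary list notation describes the very same value. Here is a quick sketch (the names 'greeting' and 'greetingChars' are our own, not part of the lesson):

```haskell
-- 'String' is a synonym for [Char], so these two notations
-- describe exactly the same value.
greeting :: String
greeting = "Hello"

greetingChars :: [Char]
greetingChars = ['H', 'e', 'l', 'l', 'o']

sameValue :: Bool
sameValue = greeting == greetingChars   -- True
```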

What about 'putStrLn'?

ghci> :t putStrLn
putStrLn :: String -> IO ()

The type for 'putStrLn' looks like 'String -> IO ()'. Any type with an arrow in it ('->') is a function. It takes a 'String' as an input and it returns a value of type 'IO ()', which we've discussed. In order to apply a function, we place its argument next to it in our code. This is very different from other programming languages, where you usually need parentheses to apply a function on arguments. Once we apply a function, the type of the resulting expression is just whatever is on the right side of the arrow. So applying our string to the function 'putStrLn', we get 'IO ()' as the resulting type!

ghci> :t putStrLn "Hello World!"
putStrLn "Hello World!" :: IO ()
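The same rule works for any function type, not just 'putStrLn'. A small sketch with a hypothetical helper of our own:

```haskell
-- 'exclaim' has one arrow in its type: it takes a String and returns a String.
exclaim :: String -> String
exclaim s = s ++ "!"

-- Applying it to an argument drops the left side of the arrow,
-- leaving just 'String' as the type of the result.
excited :: String
excited = exclaim "Hello World"   -- "Hello World!"
```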

Compilation Errors

For a different example, let's see what happens if we try to use an integer with 'putStrLn':

ghci> putStrLn 5
No instance for (Num String) arising from the literal '5'

The 'putStrLn' function only accepts a 'String', so GHC tries to treat the literal 5 as a 'String'. Since 'String' is not a numeric type (it has no 'Num' instance), the expression doesn't compile.
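One way to repair the call is sketched below, using the standard 'show' function, which converts Show-able values to Strings:

```haskell
-- 'show' turns a number into a String, which putStrLn accepts.
fiveAsString :: String
fiveAsString = show (5 :: Int)   -- "5"

-- The repaired call would then be:
--   main = putStrLn (show 5)
```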

A Quick Look At Type Classes

However, this is where 'print' comes in. Let's look at its type signature:

ghci> :t print
print :: Show a => a -> IO ()

Unlike 'putStrLn', the 'print' function takes a more generic input. A "type class" is a general category describing a behavior. Many different types can perform the behavior. One such class is 'Show'. The behavior is that Show-able items can be converted to strings for printing. The 'Int' type is part of this type class, so we can use 'print' with it!

ghci> print 5
5

When we use 'show' on a string (and 'print' uses 'show' internally), Haskell adds quotation marks to it. This is why the output looks different when we use 'print' instead of 'putStrLn' in our initial program:

ghci> print "Hello World!"
"Hello World!"
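In fact, the Prelude defines 'print' as 'show' followed by 'putStrLn', which explains the quotation marks. A sketch:

```haskell
-- print x = putStrLn (show x), so printing a String first 'show's it,
-- which wraps it in (escaped) quote characters.
quoted :: String
quoted = show "Hello World!"   -- the String "\"Hello World!\""
```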

Echo - Another Example Program

Our Haskell "Hello World" program is the most basic example of a program we can write. It only showed one side of the input/output equation. Here's an "echo" program, which first waits for the user to enter some text on the command line and then prints that line back out:

main :: IO ()
main = do
  input <- getLine
  putStrLn input

Let's quickly check the type of 'getLine':

ghci> :t getLine
getLine :: IO String

We can see that 'getLine' is an IO action returning a string. When we use the backwards arrow '<-' in our code, this means we unwrap the IO value and get the result on the left side. So the type of 'input' in our code is just 'String', meaning we can then use it with 'putStrLn'! Then we use the 'do' keyword to string together two consecutive IO actions. Here's what it looks like to run the program. The first line is us entering input, the second line is our program repeating it back to us!

>> runghc Echo.hs
I'm entering input!
I'm entering input!
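As a small variation on the echo program, we can pass the unwrapped line through a pure helper before printing it. Here '++' concatenates Strings; the 'greet' name is our own, not part of the lesson:

```haskell
-- A pure helper: build a greeting from the name read on stdin.
greet :: String -> String
greet name = "Hello, " ++ name ++ "!"

-- The full program would unwrap getLine's result and pass it along:
--   main :: IO ()
--   main = do
--     name <- getLine
--     putStrLn (greet name)
```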

A Complete Introduction to the Haskell Programming Language

Our Haskell "Hello World" program is the most basic thing you can do with the language. But if you want a comprehensive look at the syntax and every fundamental concept of Haskell, you should take our beginners course, Haskell From Scratch.

You'll get several hours of video lectures, plus a lot of hands-on experience with 100+ exercise problems with automated testing.

All-in-all, you'll only need 10-15 hours to work through all the material, so within a couple weeks you'll be ready for action! Read more about the course here!

by James Bowen at January 01, 2024 04:00 PM

December 31, 2023

Haskell Interlude

40: Mike Sperber

In this episode, Andres and Matti talk to Mike Sperber, CEO of Active Group in Germany. They discuss how to successfully develop an application based on deep learning in Haskell, contrast learning by example with the German bureaucratic approach, and highlight the virtues of having fewer changes in the language.

by Haskell Podcast at December 31, 2023 05:00 PM

December 26, 2023

Sandy Maguire

FRP in Yampa: Part 4: Routing

In the last post, we investigated the switch combinator, and saw how it can give us the ability to work with “state machine”-sorts of things in our functionally reactive programs.

Today we turn our attention towards game objects—that is, independently operating entities inside of the game, capable of behaving on their own and communicating with one another. I originally learned of this technique from the paper The Yampa Arcade, but haven’t looked at it in a few years, so any shortcomings here are my own.

Nevertheless, the material presented here does in fact work—I’ve actually shipped a game using this exact technique!

Game Objects🔗

Before we dive into the Yampa, it’s worth taking some time to think about what it is we’re actually trying to accomplish. There are a series of constraints necessary to get everything working, and we’ll learn a lot about the problem domain by solving those constraints simultaneously.

The problem: we’d like several Objects running around, which we’d like to program independently, but which behave compositionally. There are going to be a lot of moving pieces here—not only in our game, but also in our solution—so let’s take a moment to define a type synonym for ourselves:

type Object = SF ObjectInput ObjectOutput

Of course, we haven’t yet defined ObjectInput or ObjectOutput, but that’s OK! They will be subject to a boatload of constraints, so we’ll sort them out as we go. At the very least, we will need the ability for an Object to render itself, so we can add a Render field:

data ObjectOutput = ObjectOutput
  { oo_render :: Render
  , ...
  }

We would like Objects to be able to interact with one another. The usual functional approach to this problem is to use message passing—that is, Objects can send values of some message type to one another. Those messages could be things like “I shot you!” or “teleport to me,” or any sort of crazy game-specific behavior you’d like.

In order to do this, we’ll need some sort of Name for each Object. The exact structure of this type depends on your game. For the purposes of this post we’ll leave the thing abstract:

data Name = ...

We’ll also need a Message type, which again we leave abstract:

data Message = ...

Sending messages is clearly an output of the Object, so we will add them to ObjectOutput:

data ObjectOutput = ObjectOutput
  { oo_render :: Render
  , oo_outbox :: [(Name, Message)]
  , ...
  }

There are actions we’d like to perform in the world which are not messages we want to send to anyone; particularly things like “kill my Object” or “start a new Object.” These two are particularly important, but you could imagine updating global game state or something else here.

data Command
  = Die
  | Spawn Name ObjectState Object
  | ...

Commands are also outputs:

data ObjectOutput = ObjectOutput
  { oo_render   :: Render
  , oo_outbox   :: [(Name, Message)]
  , oo_commands :: [Command]
  , ...
  }

Finally, it’s often helpful to have some common pieces of state that belong to all Objects—things like their current position, and hit boxes, and anything else that might make sense to track in your game. We’ll leave this abstract:

data ObjectState = ...

data ObjectOutput = ObjectOutput
  { oo_render   :: Render
  , oo_outbox   :: [(Name, Message)]
  , oo_commands :: [Command]
  , oo_state    :: ObjectState
  }

Let’s turn our attention now to the input side. It’s pretty clear we’re going to want incoming messages, and our current state:

data ObjectInput = ObjectInput
  { oi_inbox :: [(Name, Message)]
  , oi_state :: ObjectState
  }

What’s more interesting, however, than knowing our own state is knowing everyone’s state. Once we have that, we can re-derive oi_state if we know our own Name. Thus, instead:

data ObjectInput = ObjectInput
  { oi_inbox    :: [(Name, Message)]
  , oi_me       :: Name
  , oi_everyone :: Map Name ObjectState
  }

oi_state :: ObjectInput -> ObjectState
oi_state oi
    = fromMaybe (error "impossible!")
    $ Data.Map.lookup (oi_me oi)
    $ oi_everyone oi

Parallel Switching🔗

Armed with our input and output types, we now need to figure out how to implement any of this. The relevant combinator is Yampa’s pSwitch, with the ridiculous type:

pSwitch
  :: Functor col
  => (forall sf. gi -> col sf -> col (li, sf))
  -> col (SF li o)
  -> SF (gi, col o) (Event e)
  -> (col (SF li o) -> e -> SF gi (col o))
  -> SF gi (col o)

Yes, there are five type variables here (six, if you include the rank-2 type.) In order, they are:

  1. col: the data structure we’d like to store everything in
  2. gi: the global input, fed to the eventual signal
  3. li: the local input, fed to each object
  4. o: the output of each object signal
  5. e: the type we will use to articulate desired changes to the world

Big scary types like these are an excellent opportunity to turn on -XTypeApplications, and explicitly fill out the type parameters. From our work earlier, we know the types of li and o—they ought to be ObjectInput and ObjectOutput:

pSwitch @_
        @_
        @ObjectInput
        @ObjectOutput
        @_
  :: Functor col
  => (forall sf. gi -> col sf -> col (ObjectInput, sf))
  -> col (SF ObjectInput ObjectOutput)
  -> SF (gi, col ObjectOutput) (Event e)
  -> (col (SF ObjectInput ObjectOutput) -> e -> SF gi (col ObjectOutput))
  -> SF gi (col ObjectOutput)

It’s a little clearer what’s going on here. We can split it up by its four parameters:

  1. The first (value) parameter is this rank-2 function which is responsible for splitting the global input into a local input for each object.
  2. The second parameter is the collection of starting objects.
  3. The third parameter extracts the desired changes from the collection of outputs
  4. The final parameter applies the desired changes, resulting in a new signal of collections.

We are left with a few decisions, the big ones are: what should col be, and what should e be? My answer for the first is:

data ObjectMap a = ObjectMap
  { om_objects  :: Map Name (ObjectState, a)
  , om_messages :: MonoidalMap Name [(Name, Message)]
  }
  deriving stock Functor

which not only conveniently associates names with their corresponding objects and states, but also keeps track of the messages which haven’t yet been delivered. We’ll investigate this further momentarily.

For maximum switching power, we can therefore make our event type be ObjectMap Object -> ObjectMap Object. Filling all the types in, we get:

pSwitch @ObjectMap
        @_
        @ObjectInput
        @ObjectOutput
        @(ObjectMap Object -> ObjectMap Object)
  :: (forall sf. gi -> ObjectMap sf -> ObjectMap (ObjectInput, sf))
  -> ObjectMap Object
  -> SF (gi, ObjectMap ObjectOutput)
        (Event (ObjectMap Object -> ObjectMap Object))
  -> ( ObjectMap Object
    -> (ObjectMap Object -> ObjectMap Object)
    -> SF gi (ObjectMap ObjectOutput)
     )
  -> SF gi (ObjectMap ObjectOutput)

which is something that feels almost reasonable. Let’s write a function that calls pSwitch at these types. Thankfully, we can immediately fill in two of these parameters:

router
    :: ObjectMap Object
    -> SF gi (ObjectMap ObjectOutput)
router objs =
  pSwitch @ObjectMap
          @_
          @ObjectInput
          @ObjectOutput
          @(ObjectMap Object -> ObjectMap Object)
    _
    objs
    _
    (\om f -> router' $ (f om) { om_messages = mempty })

We are left with two holes: one which constructs ObjectInputs, the other which destructs ObjectOutputs. The first is simple enough:

routeInput :: gi -> ObjectMap sf -> ObjectMap (ObjectInput, sf)
routeInput gi om@(ObjectMap objs msgs) = om
  { om_objects = flip Data.Map.mapWithKey objs $ \name (_, sf) ->
      (, sf) $ ObjectInput
        { oi_inbox    = fromMaybe mempty $ Data.MonoidalMap.lookup name msgs
        , oi_me       = name
        , oi_everyone = fmap fst objs
        }
  }

Writing decodeOutput is a little more work—we need to accumulate every change that ObjectOutput might want to enact:

decodeOutput :: Name -> ObjectOutput -> Endo (ObjectMap Object)
decodeOutput from (ObjectOutput _ msgs cmds _) = mconcat
  [ flip foldMap msgs $ uncurry $ send from
  , flip foldMap cmds $ decodeCommand from
  ]

send :: Name -> Name -> Message -> Endo (ObjectMap Object)
send from to msg
  = Endo $ #om_messages <>~ Data.MonoidalMap.singleton to [(from, msg)]

decodeCommand :: Name -> Command -> Endo (ObjectMap Object)
decodeCommand _ (Spawn name st obj)
  = Endo $ #om_objects . at name ?~ (st, obj)
decodeCommand who Die
  = Endo $ #om_objects %~ Data.Map.delete who

There’s quite a lot going on here. Rather than dealing with ObjectMap Object -> ObjectMap Object directly, we instead work with Endo (ObjectMap Object) which gives us a nice monoid for combining endomorphisms. Then by exploiting mconcat and foldMap, we can split up all of the work of building the total transformation into pieces. Here, send handles sending a message from one object to another, while decodeCommand transforms each Command into an endomap.

We can tie everything together:

router
    :: ObjectMap Object
    -> SF gi (ObjectMap ObjectOutput)
router objs =
  pSwitch @ObjectMap
          @_
          @ObjectInput
          @ObjectOutput
          @(ObjectMap Object -> ObjectMap Object)
    routeInput
    objs
    (arr $ Event
         . appEndo
         . foldMap (uncurry decodeOutput)
         . Data.Map.assocs
         . om_objects
         . snd
         )
    (\om f -> router' $ (f om) { om_messages = mempty })

Notice that we’ve again done the monoid trick to run decodeOutput on every output in the ObjectMap. If you’re not already on the monoid bandwagon, hopefully this point will help to change your mind about that!
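For readers new to this trick, here is a minimal standalone illustration of the Endo monoid, unrelated to the game types above:

```haskell
import Data.Monoid (Endo (..))

-- Endo's (<>) is function composition, so foldMap can fuse a list of
-- small edits into one big transformation, exactly as decodeOutput does.
increment, double :: Endo Int
increment = Endo (+ 1)
double    = Endo (* 2)

applyAll :: Int
applyAll = appEndo (increment <> double) 5   -- (+ 1) . (* 2) $ 5 == 11
```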

So our router is finally done! Except not quite. For some reason I don’t understand, pSwitch is capable of immediately switching if the Event you generate for decodeOutput immediately fires. This makes sense, but means Yampa will happily get itself into an infinite loop. The solution is to delay the event by an infinitesimal amount:

router
    :: ObjectMap Object
    -> SF gi (ObjectMap ObjectOutput)
router objs =
  pSwitch @ObjectMap
          @_
          @ObjectInput
          @ObjectOutput
          @(ObjectMap Object -> ObjectMap Object)
    routeInput
    objs
    ((arr $ Event
         . appEndo
         . foldMap (uncurry decodeOutput)
         . Data.Map.assocs
         . om_objects
         . snd
         ) >>> notYet)
    (\om f -> router' $ (f om) { om_messages = mempty })

There’s probably a more elegant solution to this problem, and if you know it, please do get in touch!

Wrapping Up🔗

Today we saw how to use the pSwitch combinator in order to build a router capable of managing independent objects, implementing message passing between them in the process.

You should now have enough knowledge of Yampa to get real tasks done, although if I’m feeling inspired, I might write one more post on integrating a Yampa stream into your main function, and doing all the annoying boilerplate like setting up a game window. Maybe! Watch this space for updates!

December 26, 2023 12:00 AM

December 24, 2023

Sandy Maguire

FRP in Yampa: Part 3: Switching

Yesterday we looked at arrowized FRP in Yampa, and saw how the proc notation is to arrows what do is to monads. While these syntaxes don’t give you any new power, notation nevertheless matters and helps us better structure our programs.

So far all of our programs have consisted of a single signal function. We’ve sketched out how to build a lobotomized version of the Snake game, but real games have things like title screens and option menus as well as the actual gameplay component. If you were determined, you could probably figure out how to build these missing components with what we’ve seen so far, but it wouldn’t be fun.

Instead, we turn our attention to switches.

Switching🔗

Yampa’s SF type isn’t monadic, but the switch combinator gets you surprisingly close:

switch :: SF i (o, Event e) -> (e -> SF i o) -> SF i o

The idea is that you run the first SF until the outputted Event produces an event, at which point you take its value and use it to generate a new SF, which you subsequently run.
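To make that concrete, here is a toy model of signal functions as step functions. This is not Yampa's actual representation, and Event is approximated by Maybe, but it captures how switch behaves at the instant the event fires:

```haskell
-- Toy signal functions: at each step, consume one input, produce one
-- output and a continuation SF.
newtype SF i o = SF { step :: i -> (o, SF i o) }

-- Run the first SF until its event fires, then build the new SF from
-- the event value and run it, starting on the very same input.
switch :: SF i (o, Maybe e) -> (e -> SF i o) -> SF i o
switch sf k = SF $ \i ->
  case step sf i of
    ((o, Nothing), sf') -> (o, switch sf' k)
    ((_, Just e), _)    -> step (k e) i

constSF :: o -> SF i o
constSF o = SF $ \_ -> (o, constSF o)

-- Count down from n, firing the event when the counter hits zero.
afterN :: Int -> e -> SF i (Int, Maybe e)
afterN n e = SF $ \_ ->
  if n <= 0
    then ((0, Just e), afterN 0 e)
    else ((n, Nothing), afterN (n - 1) e)

runSF :: SF i o -> [i] -> [o]
runSF _ [] = []
runSF sf (i : is) = let (o, sf') = step sf i in o : runSF sf' is

demo :: [Int]
demo = runSF (switch (afterN 2 ()) (\() -> constSF 99)) [(), (), (), ()]
-- demo == [2, 1, 99, 99]: the switch happens on the third step
```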

As an example, let’s build a little coproduct type for the choices we might make on the menu screen:

data MenuOption = Start | Options

Our menu screen is now an SF that outputs the things we’d like to draw on the screen (a Render), as well as an Event MenuOption corresponding to an event for when we actually make a selection:

menuScreen :: SF () (Render, Event MenuOption)
menuScreen = ...

As before, we have our main Snake game, and now a new screen for the options:

mainGame :: SF () Render
mainGame = ...

optionsScreen :: SF () Render
optionsScreen = ...

We can tie it all together by switching from menuScreen to the appropriate next SF:

program :: SF () Render
program = switch menuScreen $ \case
  Start   -> mainGame
  Options -> optionsScreen

Again, you can kind of squint to get the picture, but things get a little gnarlier when you actually get into the gritty details here. For example, in a real game, you might go back to the menu screen after the game ends, and you’d certainly go back after setting up the appropriate options. If we wanted to encode those rules, we’d need to fiddle with some types.

Let’s add Event ()s to mainGame and optionScreen, corresponding to when the player has died and when the options have been set, respectively:

mainGame :: SF () (Render, Event ())
optionsScreen :: SF () (Render, Event ())

With a creative amount of switching, it’s possible to encode everything we’d like:

program :: SF () Render
program = switch menuScreen $ \case
  Start   -> switch mainGame      $ const program
  Options -> switch optionsScreen $ const program

Of course, we can use switch for much more than just modeling state machines—the following example uses it as a combinator to do something for a while:

timed :: Time -> SF i o -> SF i o -> SF i o
timed dur s1 s2 =
  switch
    (proc i -> do
      o  <- s1 -< i
      ev <- after dur () -< ()
      returnA -< (o, ev)
    ) $ const s2

or, more interestingly, a combinator which interpolates a function:

interpolate :: Time -> (Time -> a) -> SF (i, a) o -> SF i o -> SF i o
interpolate dur f interp final =
  switch
    (proc i -> do
      t  <- time -< ()
      o  <- interp -< (i, f (t / dur))
      ev <- after dur () -< ()
      returnA -< (o, ev)
    ) $ const final

The parameter f here will be called with values of time from 0 to 1, linearly increasing until dur. This is the sort of combinator that is extremely useful for animating objects, where you’d like to tween from a known starting point to a known ending point.

Making a Real Monad🔗

Most of what I know about Yampa I learned by reverse-engineering Alex Stuart’s excellent game Peoplemon (source here). As you might expect, it’s a fun parody on Pokemon.

One night while desperately trying to work out how he programmed up the menu-based battle system in Peoplemon, I came across the mysteriously named Lightarrow.hs, which makes the following improvement over the switching technique above.

He sticks the whole thing into the Cont monad:

newtype Cont r a = Cont { runCont :: (a -> r) -> r }

I think this is the first and only time I’ve seen a use for Cont in the wild, that doesn’t stem directly from trying to CPS everything in order to make your program go faster from fusion. It’s so COOL to see a real world opportunity to throw Cont at a problem!
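Here is a standalone sketch of that Cont type: the same newtype as above, with the instances spelled out so the snippet compiles without any library:

```haskell
newtype Cont r a = Cont { runCont :: (a -> r) -> r }

instance Functor (Cont r) where
  fmap f (Cont c) = Cont $ \k -> c (k . f)

instance Applicative (Cont r) where
  pure a = Cont ($ a)
  Cont cf <*> Cont ca = Cont $ \k -> cf (\f -> ca (k . f))

instance Monad (Cont r) where
  Cont c >>= f = Cont $ \k -> c (\a -> runCont (f a) k)

cont :: ((a -> r) -> r) -> Cont r a
cont = Cont

-- CPS-style arithmetic: nothing runs until the final continuation
-- (here 'id') arrives, mirroring how Swont delays the final SF.
addC :: Int -> Int -> Cont r Int
addC x y = pure (x + y)

result :: Int
result = runCont (addC 1 2 >>= addC 3) id   -- 6
```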

Anyway. This type is known as Swont, which I’ve always assumed was something like “signal continuation” but your guess is as good as mine:

newtype Swont i o a = Swont { unSwont :: Cont (SF i o) a }
  deriving newtype (Functor, Applicative, Monad)

We can lift any SF i (b, Event c) into a Swont via swont:

swont :: SF i (o, Event e) -> Swont i o e
swont = Swont . cont . switch

and we can lower the whole thing again by way of switchSwont:

switchSwont :: Swont i o e -> (e -> SF i o) -> SF i o
switchSwont sw end = runCont (unSwont sw) end

What’s really nice about Swont is that it is a genuine, bona-fide monad. This gives us a really lovely notation for programming sequential things like state machines or battle animations—stuff that consists of needing to switch between disparate things with discrete reasons to change.

We can use Swont to encode our above state machine in a much more familiar way:

foreverSwont :: Swont i o e -> SF i o
foreverSwont sw = switchSwont (forever sw) $ error "impossible"

program :: SF () Render
program = foreverSwont $
  swont menuScreen >>= \case
    Start   -> swont mainGame
    Options -> swont optionsScreen

Not bad at all!

Wrapping Up🔗

Today we looked at Yampa’s switch combinator, saw how it can be used to string disparate signals together, and saw how wrapping everything in a continuation monad makes it tolerable to work with.

In tomorrow’s post, we’ll look at writing object routers in Yampa—essentially, the main data structure for tracking lots of game objects, and allowing them to communicate with one another. Until then, I hope you’re having a very special Christmas weekend.

December 24, 2023 12:00 AM

December 22, 2023

Joachim Breitner

The Haskell Interlude Podcast

It was pointed out to me that I have not blogged about this, so better now than never:

Since 2021 I am – together with four other hosts – producing a regular podcast about Haskell, the Haskell Interlude. Roughly every two weeks two of us interview someone from the Haskell Community, and we chat for approximately an hour about how they came to Haskell, what they are doing with it, why they are doing it and what else is on their mind. Sometimes we talk to very famous people, like Simon Peyton Jones, and sometimes to people who maybe should be famous, but aren’t quite yet.

For most episodes we also have a transcript, so you can read the interviews instead, if you prefer, and you should find the podcast on most podcast apps as well. I do not know how reliable these statistics are, but supposedly we regularly have around 1300 listeners. We don’t get much feedback, however, so if you like the show, or dislike it, or have feedback, let us know (for example on the Haskell Discourse, which has a thread for each episode).

At the time of writing, we released 40 episodes. For the benefit of my (likely hypothetical) fans, or those who want to train an AI voice model for nefarious purposes, here is the list of episodes co-hosted by me:

Can’t decide where to start? The one with Ryan Trinkle might be my favorite.

Thanks to the Haskell Foundation and its sponsors for supporting this podcast (hosting, editing, transcription).

by Joachim Breitner (mail@joachim-breitner.de) at December 22, 2023 09:04 AM

Derek Elkins

What is the coproduct of two groups?

Introduction

The purpose of this article is to answer the question: what is the coproduct of two groups? The approach, however, will be somewhat absurd. Instead of simply presenting a construction and proving that it satisfies the appropriate universal property, I want to find the general answer and simply instantiate it for the case of groups.

Specifically, this will be a path through the theory of Lawvere theories and their models with the goal of motivating some of the theory around it in pursuit of the answer to this relatively simple question.

If you really just want to know the answer to the title question, then the construction is usually called the free product and is described on the linked Wikipedia page.

Groups as Models of a Lawvere Theory

A group is a model of an equational theory. This means a group is described by a set equipped with a collection of operations that must satisfy some equations. So we’d have a set, |G|, and operations |\mathtt{e} : () \to G|, |\mathtt{i} : G \to G|, and |\mathtt{m} : G \times G \to G|. These operations satisfy the equations, \[ \begin{align} \mathtt{m}(\mathtt{m}(x, y), z) = \mathtt{m}(x, \mathtt{m}(y, z)) \\ \mathtt{m}(\mathtt{e}(), x) = x = \mathtt{m}(x, \mathtt{e}()) \\ \mathtt{m}(\mathtt{i}(x), x) = \mathtt{e}() = \mathtt{m}(x, \mathtt{i}(x)) \end{align} \] universally quantified over |x|, |y|, and |z|.

These equations can easily be represented by commutative diagrams, i.e. equations of compositions of arrows, in any category with finite products of an object, |G|, with itself. For example, the left inverse law becomes: \[ \mathtt{m} \circ (\mathtt{i} \times id_G) = \mathtt{e} \circ {!}_G \] where |{!}_G : G \to 1| is the unique arrow into the terminal object corresponding to the |0|-ary product of copies of |G|.

One nice thing about this categorical description is that we can now talk about a group object in any category with finite products. Even better, we can make this pattern describing what a group is first-class. The (Lawvere) theory of a group is a (small) category, |\mathcal{T}_{\mathbf{Grp}}| whose objects are an object |\mathsf{G}| and all its powers, |\mathsf{G}^n|, where |\mathsf{G}^0 = 1| and |\mathsf{G}^{n+1} = \mathsf{G} \times \mathsf{G}^n|. The arrows consist of the relevant projection and tupling operations, the three arrows above, |\mathsf{m} : \mathsf{G}^2 \to \mathsf{G}^1|, |\mathsf{i} : \mathsf{G}^1 \to \mathsf{G}^1|, |\mathsf{e} : \mathsf{G}^0 \to \mathsf{G}^1|, and all composites that could be made with these arrows. See my previous article for a more explicit description of this, but it should be fairly intuitive.

An actual group is then, simply, a finite-product-preserving functor |\mathcal{T}_{\mathbf{Grp}} \to \mathbf{Set}|. It must be finite-product-preserving so the image of |\mathsf{m}| actually gets sent to a binary function and not some function with some arbitrary domain. The category, |\mathbf{Grp}|, of groups and group homomorphisms is equivalent to the category |\mathbf{Mod}_{\mathcal{T}_{\mathbf{Grp}}}| which is defined to be the full subcategory of the category of functors from |\mathcal{T}_{\mathbf{Grp}} \to \mathbf{Set}| consisting of the functors which preserve finite products. While we’ll not explore it more here, we could use any category with finite products as the target, not just |\mathbf{Set}|. For example, we’ll show that |\mathbf{Grp}| has finite products, and in fact all limits and colimits, so we can talk about the models of the theory of groups in the category of groups. This turns out to be equivalent to the category of Abelian groups via the well-known Eckmann-Hilton argument.
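For reference, the heart of the Eckmann–Hilton argument is a one-line calculation, sketched here: suppose two unital binary operations |\otimes| and |\oplus| share a unit |e| and satisfy the interchange law |(a \oplus b) \otimes (c \oplus d) = (a \otimes c) \oplus (b \otimes d)|. Then

```latex
\begin{align*}
a \otimes b &= (a \oplus e) \otimes (e \oplus b)
             = (a \otimes e) \oplus (e \otimes b) = a \oplus b \\
a \otimes b &= (e \oplus a) \otimes (b \oplus e)
             = (e \otimes b) \oplus (a \otimes e) = b \oplus a
\end{align*}
```

so the two operations coincide and are commutative.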

A Bit of Organization

First, a construction that will become even more useful later. Given any category, |\mathcal{C}|, we define |\mathcal{C}^{\times}|, or, more precisely, an inclusion |\sigma : \mathcal{C} \hookrightarrow \mathcal{C}^{\times}| to be the free category-with-finite-products generated from |\mathcal{C}|. Its universal property is: given any functor |F : \mathcal{C} \to \mathcal{E}| into a category-with-finite-products |\mathcal E|, there exists a unique finite-product-preserving functor |\bar{F} : \mathcal{C}^{\times} \to \mathcal E| such that |F = \bar{F} \circ \sigma|.

An explicit construction of |\mathcal{C}^{\times}| is the following. Its objects consist of (finite) lists of objects of |\mathcal{C}| with concatenation as the categorical product and the empty list as the terminal object. The arrows are tuples with a component for each object in the codomain list. Each component is a pair of an index into the domain list and an arrow from the corresponding object in the domain list to the object in the codomain list for this component. For example, the arrow |[A, B] \to [B, A]| would be |((1, id_B), (0, id_A))|. Identity and composition is straightforward. |\sigma| then maps each object to a singleton list and each arrow |f| to |((0, f))|.

Like most free constructions, this construction completely ignores any finite products the original category may have had. In particular, we want the category |\mathcal{T}_{\mathbf{Set}} = \mathbf{1}^{\times}|, called the theory of a set. The fact that the one object of the category |\mathbf{1}| is terminal has nothing to do with its image via |\sigma| which is not the terminal object.

We now define the general notion of a (Lawvere) theory as a small category with finite products, |\mathcal{T}|, equipped with a finite-product-preserving, identity-on-objects functor |\mathcal{T}_{\mathbf{Set}} \to \mathcal{T}|. A morphism of (Lawvere) theories is a finite-product-preserving functor that preserves these inclusions a la: \[ \xymatrix { & \mathcal{T}_{\mathbf{Set}} \ar[dl] \ar[dr] & \\ \mathcal{T}_1 \ar[rr] & & \mathcal{T}_2 } \]

The identity-on-objects aspect of the inclusion of |\mathcal{T}_{\mathbf{Set}}| along with finite-product-preservation ensures that the only objects in |\mathcal{T}| are powers of a single object which we’ll generically call |\mathsf{G}|. This is sometimes called the “generic object”, though the term “generic object” has other meanings in category theory.

A model of a theory (in |\mathbf{Set}|) is then simply a finite-product-preserving functor into |\mathbf{Set}|. |\mathbf{Mod}_{\mathcal{T}}| is the full subcategory of functors from |\mathcal{T} \to \mathbf{Set}| which preserve finite products. The morphisms of models are simply the natural transformations. As an exercise, you should show that for a natural transformation |\tau : M \to N| where |M| and |N| are two models of the same theory, |\tau_{\mathsf{G}^n} = \tau_{\mathsf{G}}^n|.

The Easy Categorical Constructions

This relatively simple definition of model already gives us a large swathe of results. An easy result in basic category theory is that (co)limits in functor categories are computed pointwise whenever the corresponding (co)limits exist in the codomain category. In our case, |\mathbf{Set}| has all (co)limits, so all categories of |\mathbf{Set}|-valued functors have all (co)limits and they are computed pointwise.

However, the (co)limit of finite-product-preserving functors into |\mathbf{Set}| may not be finite-product-preserving, so we don’t immediately get that |\mathbf{Mod}_{\mathcal{T}}| has all (co)limits (and they are computed pointwise). That said, finite products are limits and limits commute with each other, so we do get that |\mathbf{Mod}_{\mathcal{T}}| has all limits and they are computed pointwise. Similarly, sifted colimits, which are colimits that commute with finite products in |\mathbf{Set}|, also exist and are computed pointwise in |\mathbf{Mod}_{\mathcal{T}}|. Sifted colimits include the better known filtered colimits which commute with all finite limits.

I’ll not elaborate on sifted colimits. We’re here for (finite) coproducts, and, as you’ve probably already guessed, coproducts are not sifted colimits.

When the Coproduct of Groups is Easy

There is one class of groups whose coproduct is easy to compute for general reasons: the free groups. The free group construction, like most “free constructions”, is a left adjoint and left adjoints preserve colimits, so the coproduct of two free groups is just the free group on the coproduct, i.e. disjoint union, of their generating sets. We haven’t defined the free group yet, though.

Normally, the free group construction would be defined as left adjoint to the underlying set functor. We have a very straightforward way to define the underlying set functor. Define |U : \mathbf{Mod}_{\mathcal T} \to \mathbf{Set}| as |U(M) = M(\mathsf{G}^1)| and |U(\tau) = \tau_{\mathsf{G}^1}|. Identifying |\mathsf{G}^1| with the functor |\mathsf G : \mathbf{1} \to \mathcal{T}| we have |U(M) = M \circ \mathsf{G}| giving a functor |\mathbf{1} \to \mathbf{Set}| which we identify with a set. The left adjoint to precomposition by |\mathsf{G}| is the left Kan extension along |\mathsf{G}|.

We then compute |F(S) = \mathrm{Lan}_{\mathsf{G}}(S) \cong \int^{{*} : \mathbf{1}} \mathcal{T}(\mathsf{G}({*}), {-}) \times S({*}) \cong \mathcal{T}(\mathsf{G}^1, {-}) \times S|. This is the left Kan extension and does form an adjunction, but not with the category of models, because the functor produced by |F(S)| does not preserve finite products. We should have |F(S)(\mathsf{G}^n) \cong F(S)(\mathsf{G})^n|, but substituting in the definition of |F(S)| clearly does not satisfy this. For example, consider |F(\varnothing)(\mathsf{G}^0) = \mathcal{T}(\mathsf{G}^1, \mathsf{G}^0) \times \varnothing = \varnothing|: since |\mathsf{G}^0| is the empty product, finite-product-preservation would require it to be a singleton.

We can and will show that the left Kan extension of a functor into |\mathbf{Set}| preserves finite products when the original functor did. Once we have that result we can correct our definition of the free construction. We simply replace |S : \mathbf{1} \to \mathbf{Set}| with a functor that does preserve finite products, namely |\bar{S} : \mathbf{1}^{\times} \to \mathbf{Set}|. Of course, |\mathbf{1}^{\times}| is exactly our definition of |\mathcal{T}_{\mathbf{Set}}|. We see now that a model of |\mathcal{T}_{\mathbf{Set}}| is the same thing as having a set, hence the name. Indeed, we have an equivalence of categories between |\mathbf{Set}| and |\mathbf{Mod}_{\mathcal{T}_{\mathbf{Set}}}|. (More generally, this theory is called “the theory of an object” as we may consider models in categories other than |\mathbf{Set}|, and we’ll still have this relation.)

The correct definition of |F| is |F(S) = \mathrm{Lan}_{\iota}(\bar S) \cong \int^{\mathsf{G}^n:\mathcal{T}_{\mathbf{Set}}} \mathcal{T}(\iota(\mathsf{G}^n), {-}) \times \bar{S}(\mathsf{G}^n) \cong \int^{\mathsf{G}^n:\mathcal{T}_{\mathbf{Set}}} \mathcal{T}(\iota(\mathsf{G}^n), {-}) \times S^n| where |\iota : \mathcal{T}_{\mathbf{Set}} \to \mathcal{T}| is the inclusion we give as part of the definition of a theory. We can also see |\iota| as |\bar{\mathsf{G}}|.

We can start to see the term algebra in this definition. An element of |F(S)| is a choice of |n|, an |n|-tuple of elements of |S|, and a (potentially compound) |n|-ary operation. We can think of an element of |\mathcal{T}(\mathsf{G}^n, {-})| as a term with |n| free variables which we’ll label with the elements of |S^n| in |F(S)|. The equivalence relation in the explicit construction of the coend allows us to swap projections and tupling morphisms from the term to the tuple of labels. For example, it equates a unary term paired with one label with a binary term paired with two labels but where the binary term immediately discards one of its inputs. Essentially, if you are given a unary term and two labels, you can either discard one of the labels or you can make the unary term binary by precomposing with a projection. Similarly for tupling.
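The informal reading above, an element of |F(S)| as a (possibly compound) term with labelled free variables, can be made concrete with a small syntax of group terms. The names `Term` and `evalTerm` are illustrative, not from any library.

```haskell
-- A syntax of group terms with free variables: an element of F(S) pairs
-- such a term with labels for its variables drawn from S.
data Term v
  = Var v                  -- a labelled free variable
  | Unit                   -- the nullary operation (the unit e)
  | Mul (Term v) (Term v)  -- the binary operation m
  | Inv (Term v)           -- the unary inverse operation
  deriving Show

-- Interpreting terms in a concrete group realizes the "model" side of
-- the story; here the group is the integers under addition.
evalTerm :: Term Integer -> Integer
evalTerm (Var n)   = n
evalTerm Unit      = 0
evalTerm (Mul s t) = evalTerm s + evalTerm t
evalTerm (Inv t)   = negate (evalTerm t)
```

For example, `evalTerm (Mul (Var 3) (Inv (Var 5)))` computes |3 + (-5)| in this model.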

It’s still not obvious this definition produces a functor which preserves finite products. As a lemma to help in the proof of that fact, we have a bit of coend calculus.

Lemma 1: Let |F \dashv U : \mathcal{D} \to \mathcal{C}| and |H : \mathcal D^{op} \times \mathcal{C} \to \mathcal{E}|. Then, |\int^C H(FC, C) \cong \int^D H(D, UD)| when one, and thus both, exist. Proof: \[ \begin{align} \mathcal{E}\left(\int^C H(FC, C), {-}\right) & \cong \int_C \mathcal{E}(H(FC, C), {-}) \tag{continuity} \\ & \cong \int_C \int_D [\mathcal{D}(FC, D), \mathcal{E}(H(D, C), {-})] \tag{Yoneda} \\ & \cong \int_C \int_D [\mathcal{C}(C, UD), \mathcal{E}(H(D, C), {-})] \tag{adjunction} \\ & \cong \int_D \int_C [\mathcal{C}(C, UD), \mathcal{E}(H(D, C), {-})] \tag{Fubini} \\ & \cong \int_D \mathcal{E}(H(D, UD), {-}) \tag{Yoneda} \\ & \cong \mathcal{E}\left(\int^D H(D, UD), {-}\right) \tag{continuity} \\ & \square \end{align} \]

Using the adjunction |\Delta \dashv \times : \mathcal{C} \times \mathcal{C}\to \mathcal{C}| gives the following corollary.

Corollary 2: For any |H : \mathcal{C}^{op} \times \mathcal{C}^{op} \times \mathcal{C} \to \mathcal{E}|, \[\int^{C} H(C, C, C) \cong \int^{C_1}\int^{C_2} H(C_1, C_2, C_1 \times C_2)\] when both exist. This allows us to combine two (co)ends into one.

Now our theorem.

Theorem 3: Let |F : \mathcal{T}_1 \to \mathbf{Set}| and |J : \mathcal{T}_1 \to \mathcal{T}_2| where |\mathcal{T}_1| and |\mathcal{T}_2| have finite products. Then |\mathrm{Lan}_J(F)| preserves finite products if |F| does.

Proof: \[ \begin{flalign} \mathrm{Lan}_J(F)(X \times Y) & \cong \int^A \mathcal{T}_2(J(A), X \times Y) \times F(A) \tag{coend formula for left Kan extension} \\ & \cong \int^A \mathcal{T}_2(J(A), X) \times \mathcal{T}_2(J(A), Y) \times F(A) \tag{continuity} \\ & \cong \int^{A_1}\int^{A_2}\mathcal{T}_2(J(A_1), X) \times \mathcal{T}_2(J(A_2), Y) \times F(A_1 \times A_2) \tag{Corollary 2} \\ & \cong \int^{A_1}\int^{A_2}\mathcal{T}_2(J(A_1), X) \times \mathcal{T}_2(J(A_2), Y) \times F(A_1) \times F(A_2) \tag{finite product preservation} \\ & \cong \left(\int^{A_1}\mathcal{T}_2(J(A_1), X) \times F(A_1) \right) \times \left(\int^{A_2}\mathcal{T}_2(J(A_2), Y) \times F(A_2)\right) \tag{commutativity and cocontinuity of $\times$} \\ & \cong \mathrm{Lan}_J(F)(X) \times \mathrm{Lan}_J(F)(Y) \tag{coend formula for left Kan extension} \\ & \square \end{flalign} \]

The Coproduct of Groups

To get general coproducts (and all colimits), we’ll show that |\mathbf{Mod}_{\mathcal{T}}| is a reflective subcategory of |[\mathcal{T}, \mathbf{Set}]|. Write |\iota : \mathbf{Mod}_{\mathcal{T}} \hookrightarrow [\mathcal{T}, \mathbf{Set}]|. If we had a functor |R| such that |R \dashv \iota|, then we have |R \circ \iota = Id| which allows us to quickly produce colimits in the subcategory via |\int^I D(I) \cong R\int^I \iota D(I)|. It’s easy to verify that |R\int^I \iota D(I)| has the appropriate universal property to be |\int^I D(I)|.

We’ll compute |R| by composing two adjunctions. First, we have |\bar{({-})} \dashv \iota({-}) \circ \sigma : \mathbf{Mod}_{\mathcal{T}^{\times}} \to [\mathcal T, \mathbf{Set}]|. This is essentially the universal property of |\mathcal{T}^{\times}|. When |\mathcal{T}| has finite products, which, of course, we’re assuming, then we can use the universal property of |\mathcal{T}^{\times}| to factor |Id_{\mathcal{T}}| into |Id = \bar{Id} \circ \sigma|. The second adjunction is then |\mathrm{Lan}_{\bar{Id}} \dashv {-} \circ \bar{Id} : \mathbf{Mod}_{\mathcal{T}} \to \mathbf{Mod}_{\mathcal{T}^{\times}}|. The left adjoint sends finite-product-preserving functors to finite-product-preserving functors via Theorem 3. The right adjoint is the composition of finite-product-preserving functors.

The composite of the left adjoints is |\iota({-} \circ \bar{Id}) \circ \sigma = \iota({-}) \circ \bar{Id} \circ \sigma = \iota({-})|. The composite of the right adjoints is \[ \begin{align} R(F) & = \mathrm{Lan}_{\bar{Id}}(\bar{F}) \\ & \cong \int^X \mathcal{T}(\bar{Id}(X), {-}) \times \bar{F}(X) \\ & \cong \int^X \mathcal{T}\left(\prod_{i=1}^{\lvert X\rvert} X_i, {-}\right) \times \prod_{i=1}^{\lvert X \rvert} F(X_i) \end{align} \] where we view the list |X : \mathcal{T}^{\times}| as a |\lvert X\rvert|-tuple with components |X_i|.

This construction of the reflector, |R|, is quite similar to the free construction. The main difference is that here we factor |Id| via |\mathcal{T}^{\times}| where there we factored |\mathsf{G} : \mathbf{1} \to \mathcal{T}| via |\mathbf{1}^{\times} = \mathcal{T}_{\mathbf{Set}}|.

Let’s now explicitly describe the coproducts via |R|. As a warm-up, we’ll consider the initial object, i.e. nullary coproducts. We consider |R(\Delta 0)|. Because |0 \times S = 0|, the only case in the coend that isn’t |0| is when |\lvert X \rvert = 0| so the underlying set of the coend reduces to |\mathcal{T}(\mathsf{G}^0, \mathsf{G}^1)|, i.e. the nullary terms. For groups, this is just the unit element. For bounded lattices, it would be the two element set consisting of the top and bottom elements. For lattices without bounds, it would be the empty set. Of course, |R(\Delta 0)| matches |F(0)|, i.e. the free model on |0|.

Next, we consider two models |G| and |H|. First, we compute the coproduct of |G| and |H| as (plain) functors, which is just computed pointwise, i.e. |(G+H)(\mathsf{G}^n) = G(\mathsf{G}^n)+H(\mathsf{G}^n) \cong G(\mathsf{G^1})^n + H(\mathsf{G^1})^n|. Considering the case where |X_i = \mathsf{G}^1| for all |i| and where |\lvert X \rvert = n|, which subsumes all the other cases, we see we have a term with |n| free variables, each labelled by either an element of |G| or an element of |H|. If we normalized the term into a list of variables representing a product of variables, then we’d have essentially a word as described on the Wikipedia page for the free product. If we then only considered quotienting by the equivalences induced by projection and tupling, we’d have the free group on the disjoint union of the underlying sets of |G| and |H|. However, for |R|, we quotient also by the action of the other operations. The lists of objects with |X_i \neq \mathsf{G}^1| come in here to support equating non-unary ops. For example, a pair of the binary term |\mathsf{m}| and the 2-tuple of elements |(g_1, g_2)| for |g_1, g_2 \in U(G)| will be equated with the pair of the unary term |id| and the 1-tuple of elements |(g)| where |g = g_1 g_2| in |G|. Similarly for |H| and the other operations (and terms generally). Ultimately, the quotient identifies every element with an element that consists of a pair of a term that is a fully right-associated set of multiplications ending in a unit where each variable is labelled with an element from |U(G)| or |U(H)| in an alternating fashion. These are the reduced words in the Wikipedia article.
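The normalization to reduced words just described can be sketched directly. This is a toy, assuming groups given concretely by a unit and a multiplication; the names `Grp`, `Word'`, and `reduceWord` are ours, not from any library.

```haskell
-- A group presented by its unit and multiplication.
data Grp a = Grp { unitOf :: a, mulOf :: a -> a -> a }

-- A word in the free product: letters drawn from G (Left) or H (Right).
type Word' g h = [Either g h]

-- Drop unit letters and multiply adjacent letters from the same group,
-- cascading, so the result alternates non-unit G- and H-letters.
reduceWord :: (Eq g, Eq h) => Grp g -> Grp h -> Word' g h -> Word' g h
reduceWord gG gH = foldr push []
  where
    push (Left g) acc
      | g == unitOf gG = acc
      | (Left g' : rest) <- acc =
          let g'' = mulOf gG g g'
          in if g'' == unitOf gG then rest else Left g'' : rest
    push (Right h) acc
      | h == unitOf gH = acc
      | (Right h' : rest) <- acc =
          let h'' = mulOf gH h h'
          in if h'' == unitOf gH then rest else Right h'' : rest
    push l acc = l : acc
```

Taking both groups to be the integers under addition, the word `[Left 1, Left 2, Right 3, Right (-3), Left 4]` reduces to the single letter `[Left 7]`: the two |H|-letters cancel to the unit, letting the three |G|-letters merge.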

This, perhaps combined with a more explicit spelling out of the equivalence relation, should make it clear that this construction does actually correspond to the usual free product construction. The name “free product” is also made a bit clearer, as we are essentially building the free group on the disjoint union of the underlying sets of the inputs, and then quotienting that to get the result. While there are some categorical treatments of normalization, the normalization arguments used above were not guided by the category theory. The (underlying sets of the) models produced by the above |F| and |R| functors are big equivalence classes of “terms”. The above constructions provide no guidance for finding “good” representatives of those equivalence classes.

Conclusions

This was, of course, a very complex and round-about way of answering the title question. Obviously the real goal was illustrating these ideas and illustrating how “abstract” categorical reasoning can lead to relatively “concrete” results. Of course, these concrete constructions are derived from other concrete constructions, usually concrete constructions of limits and colimits in |\mathbf{Set}|. That said, category theory allows you to get a lot from a small collection of relatively simple concrete constructions. Essentially, category theory is like a programming language with a small set of primitives. You can write “abstract” programs in terms of that language, but once you provide an “implementation” for those primitives, all those “abstract” programs can be made concrete.

I picked (finite) coproducts, in particular, as they are where a bunch of complexity suddenly arises when studying algebraic objects categorically, but (finite) coproducts are still fairly simple.

For Lawvere theories, one thing to note is that the Lawvere theory is independent of the presentation. Any presentation of the axioms of a group would give rise to the same Lawvere theory. Of course, to explicitly describe the category would end up requiring a presentation of the category anyway. Beyond Lawvere theories are algebraic theories and algebraic categories, and further into essentially algebraic theories and categories. These extend to the multi-sorted case and then into the finite limit preserving case. The theory of categories, for example, cannot be presented as a Lawvere theory but is an essentially algebraic theory. There’s much more that can be said even about specifically Lawvere theories, both from a theoretical perspective, starting with monadicity, and from practical perspectives like algebraic effects.

Familiarity with the properties of functor categories, and especially categories of (co)presheaves was behind many of these results, and many that I only mentioned in passing. It is always useful to learn more about categories of presheaves. That said, most of the theory works in an enriched context and often without too many assumptions. The fact that all we need to talk about models is for the codomains of the functors to have finite products allows quite broad application. We can talk about algebraic objects almost anywhere. For example, sheaves of rings, groups, etc. can equivalently be described as models of the theories of rings, groups, etc. in sheaves of sets.

Kan extensions unsurprisingly played a large role, as they almost always do when you’re talking about (co)presheaves. One of the motivations for me to make this article was a happy confluence of things I was reading leading to a nice, coend calculus way of describing and proving finite-product-preservation for free models.

Thinking about what exactly was going on around finite-product-preservation was fairly interesting. The incorrect definition of the free model functor could be corrected in a different (though, of course, ultimately equivalent) way. The key is to remember that the coend formula for the left Kan extension is generally a copower and not a cartesian product. The copower for |\mathbf{Set}|-valued functors is different from the copower for finite-product-preserving |\mathbf{Set}|-valued functors. For a category with (arbitrary) coproducts, the copower corresponds to the coproduct of a constant family. We get |F(S) \cong \coprod_{S} \mathcal T(\mathsf{G}^1, {-})|, as is immediately evident from |F| being a left adjoint and a set |S| being the coproduct of |1| with itself |S|-many times. For the purposes of this article, this would have been less than satisfying, as figuring out what coproducts were was the nominal point.

That said, it isn’t completely unsatisfying as this defines the free model in terms of a coproduct of, specifically, representables and those are more tractable. In particular, an easy and neat exercise is to work out what |\mathcal{T}(\mathsf{G}^n, {-}) + \mathcal{T}(\mathsf{G}^m, {-})| is. Just use Yoneda and work out what must be true of the mapping out property, and remember that the object you’re mapping into preserves finite products. Once you have finite coproducts described, you can get all the rest via filtered colimits, and since those commute with finite products that gives us arbitrary coproducts.

December 22, 2023 02:47 AM

Sandy Maguire

FRP in Yampa: Part 2: Arrowized FRP

In the last part, we got a feel for how FRP can help us with real-time programming tasks, especially when contrasted against implicit models of time. However, the interface we looked at yesterday left much to be desired—stringing together long signal functions felt clunky, and since SFs don’t form a monad, we couldn’t alleviate the problem with do-notation.

So today we’ll look at one of Haskell’s lesser-known features—arrow notation—and learn how it can help structure bigger reactive programs.

Arrows

What an awful, overloaded word we’ve found ourselves with. Being Haskell programmers, we’re all very familiar with the everyday function arrow (->), which you should think of as a special case of a more general notion of arrow.

Notice how both function arrows (i -> o) and signal functions (SF i o) have two type parameters—one for the input side of things, and another for the output side. And indeed, we should think of these as sides of the computation, where we are transforming an i into an o.

For our purposes today, we’ll want to be very precise when we differentiate between functions-as-data and functions-as-ways-of-building things. In order to do so, we will give ourselves a little type synonym to help differentiate:

type Fn i o = i -> o

And henceforth, we will use the Fn synonym to refer to functions we’re manipulating, reserving (->) to talk about combinators for building those functions.

For example, our favorite identity function is a Fn:

id :: Fn a a

We usually give the constant function the type a -> b -> a, but my claim is that it ought to be:

const :: a -> Fn b a

The subtle thing I’m trying to point out is that there is a (conceptual) difference between the functions we want to operate on at runtime (called Fns), and the combinators we use to build those functions (called (->).)

Like I said, it’s a bit hard to point to in Haskell, because one of the great successes of functional programming has been to blur this distinction.

Anyway, let’s return to our discussion of arrows. Both functions and SFs admit a notion of composition, which allow us to line up the output of one arrow with the input of another, fusing the two into a single computation. The types they have are:

  • (.) :: Fn b c -> Fn a b -> Fn a c
  • (<<<) :: SF b c -> SF a b -> SF a c

Despite our intimate familiarity with functions, this pattern of types with both an input and an output is quite uncommon in Haskell. Due to the immense mindshare that the monad meme takes up, we usually think about computation in terms of monads, and it can be hard to remember that not all computation is monadic (nor applicative.)

Monadic values are of the shape M o, with only a single type parameter that corresponds (roughly) with the output of the computation. That is to say, all of the interesting computational structure of a monad exists only in its output, and never in its input—in fact, we can’t even talk about the input to a monad. What we do instead is cheat; we take the input side of the computation directly from the function arrow.

If we expand out the types of (<*>) and flip (>>=), using our Fn notation from above, they get the types:

  • (<*>) :: M (Fn i o) -> Fn (M i) (M o)
  • flip (>>=) :: Fn i (M o) -> Fn (M i) (M o)

which makes it much clearer that the relevant interactions here are some sort of distributivity of our monad over the regular, everyday function arrows. In other words, that monads are cheating by getting their “inputs” from functions.

What the Hell?

Enough philosophy. What the hell are arrows? The example that really made it stick for me is in the domain of digital circuits. A digital circuit is some piece of silicon with wire glued to it, that moves electrons from one side to the other—with the trick being that the eventual endpoint of the electrons depends on their original positions. With enough squinting, you can see the whole thing as a type Circuit i o, where i corresponds to which wires we chose to put a high voltage on, and o is which wires have a high voltage at the end of the computation. With a little more squinting, it’s not too hard to reconceptualize these wires as bits, which we can again reconceptualize as encodings of particular types.

The point I was trying to make earlier about the distinction between (->) and Fn makes much more sense in this context; just replace Fn with Circuit. Here it makes much more sense to think about the identity circuit:

id :: Circuit a a

which is probably just a bundle of wires, and the constant circuit:

const :: o -> Circuit i o

which lets you pick some particular o value (at design time), and then make a circuit that is disconnected from its input wires and merely holds the chosen o value over its output wires.

Anyway. The important thing about digital circuits is that you have infinite flexibility when you are designing them, but once they’re manufactured, they stay that way. If you chose to wire the frobulator directly to the zanzigurgulator, those two components are, and always will be, wired together. In perpetuity.

Of course, you can do some amount of dynamic reconfiguring of a circuit, by conditionally choosing which wires you consider to be “relevant” right now, but those wires are going to have signals on them whether you’re interested in them or not.

In other words, there is a strict phase distinction between the components on the board and the data they carry at runtime.

And this is what arrows are all about.

Arrows are about computations whose internal structure must remain constant. You’ve got all the flexibility in the world when you’re designing them, but you can’t reconfigure anything at runtime.

Arrow Notation

Yesterday’s post ended with the following code, written directly with the arrow combinators.

onPress :: (Controller -> Bool) -> a -> SF () (Event a)
onPress field a = fmap (fmap (const a)) $ fmap field controller >>> edge

arrowEvents :: Num a => SF () (Event (V2 a))
arrowEvents =
  (\u d l r -> asum [u, d, l, r])
    <$> onPress ctrl_up    (V2 0 (-1))
    <*> onPress ctrl_down  (V2 0 1)
    <*> onPress ctrl_left  (V2 (-1) 0)
    <*> onPress ctrl_right (V2 1    0)

snakeDirection :: SF () (V2 Float)
snakeDirection = arrowEvents >>> hold (V2 0 1)

snakePosition :: SF () (V2 Float)
snakePosition = snakeDirection >>> integral

While technically you can get anything done in this style, it’s a lot like writing all of your monadic code directly in terms of (>>=). Possible certainly, but indisputably clunky.

Instead, let’s rewrite it with arrow notation:

{-# LANGUAGE Arrows #-}

snakePosition :: SF () (V2 Float)
snakePosition = proc i -> do
  u <- onPress ctrl_up    $ V2 0 (-1) -< i
  d <- onPress ctrl_down  $ V2 0 1    -< i
  l <- onPress ctrl_left  $ V2 (-1) 0 -< i
  r <- onPress ctrl_right $ V2 1    0 -< i

  dir <- hold $ V2 0 1 -< asum [u, d, l, r]
  pos <- integral -< dir

  returnA -< pos

Much tidier, no? Reading arrow notation takes a little getting used to, but there are really only two things you need to understand. The first is that proc i -> do introduces an arrow computation, much like the do keyword introduces a monadic computation. Here, the input to the entire arrow is bound to i, but you can put any legal Haskell pattern you want there.

The other thing to know about arrow notation is that <- and -< are two halves of the same syntax. The notation here is:

  output <- arrow -< input

where arrow is of type SF i o, and input is any normal everyday Haskell value of type i. At the end of the day, you bind the result to output, whose type is obviously o.

The mnemonic for this whole thing is that you’re shooting an arrow (of bow and arrow fame) from the input to the output. And the name of the arrow is written on the shaft. It makes more sense if you play around with the whitespace:

  output   <-arrow-<   input

More importantly, the name of that arrow can be any valid Haskell expression, including one with infix operators. Thus, we should parse:

  u <- onPress ctrl_up $ V2 0 (-1) -< i

as

  u <- (onPress ctrl_up $ V2 0 (-1)) -< i

What’s likely to bite you as you get familiar with arrow notation is that the computations (the bits between <- and -<) exist in a completely different phase/namespace than the inputs and outputs. That means the following program is illegal:

  proc (i, j) -> do
    x <- blah  -< i
    y <- bar x -< j
    ...

because x simply isn’t in scope in the expression bar x. It’s the equivalent of designing a circuit board with n capacitors on it, where n will be determined by an input voltage supplied by the end-user. Completely nonsensical!
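The legal version routes x through the input side of the second arrow instead. Plain functions are themselves an Arrow instance, so we can demonstrate the shape self-containedly with (->); the names blah' and bar' here are made-up stand-ins, not from the post.

```haskell
{-# LANGUAGE Arrows #-}
import Control.Arrow

blah' :: Int -> Int
blah' = (+ 1)

-- bar' consumes x as part of its *input*, not as part of its structure.
bar' :: (Int, Int) -> Int
bar' = uncurry (*)

legal :: (Int, Int) -> Int
legal = proc (i, j) -> do
  x <- blah' -< i
  y <- bar'  -< (x, j)  -- x arrives as runtime data on the input side
  returnA -< y
```

The circuit-board analogy holds up: the components (`blah'`, `bar'`) are fixed at design time, while `x` flows over the wires at runtime.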

Wrapping Up

That’s all for today, folks. The day caught me by surprise, so we’ll be back tomorrow to talk about building state machines in Yampa—something extremely important for making real video games.

December 22, 2023 12:00 AM

December 21, 2023

Sandy Maguire

FRP in Yampa: Part 1

I’ve been writing some Haskell lately, for the first time in a year, and it’s a total blast! In particular, school is out for the holidays, so I had some spare time, and thought I’d waste it by making a video game. In Haskell.

It’s always more fun to make video games with other people, but the few people I pitched it to all had the same response—“I don’t know how to do that.” So it seemed like a good opportunity to dust off the old blog and write about how to make a video game in Haskell, using arrowized FRP.

What the hell does that mean? Get ready to FIND OUT!

FRP?

FRP is short for functional reactive programming, originally invented by Conal Elliott. The library we’ll be using today is called Yampa, which is certainly inspired by Elliott’s work, but my guess is it’s insufficiently true to the core idea for him to be excited about it.

Nevertheless, even an imperfect implementation of the idea is still orders of magnitude better for making real-time applications than doing everything by hand. And to this extent, Yampa is an excellent library.

So what exactly is FRP? The core idea is that we want to talk about functions that are continuous in time, which give rise to extremely useful combinators-over-time. Real-time programs written as FRP are much easier to reason about, and significantly more expressive than you’d manage otherwise.

A Point of Contrast

It’s informative to compare what writing a video game looks like under an imperative style. The idea is that you have your game loop (a fancy name for “infinite loop”) running:

void main() {
  setup();

  while (true) {
    float delta_time = waitForNextFrame();
    updateGame(delta_time);
    renderFrame();
  }
}

and this is kind of fine and manages to get the job done. But it’s inelegant for a few reasons. The biggest problem is that we are not actually modeling time here; we’re just running the game discretely, and time happens as a side effect of things changing. There’s this delta_time variable which counts how long it’s been since you last updated the game, which is to say it corresponds to “how much work you need to do this frame.”

What goes wrong is when updateGame or renderFrame takes too long to run; in that case, you might get spikes in how long it’s been since you last updated. Procedurally-written games compensate by interpolating everything a little further on the next frame, which gives the player the perception that they’re actually experiencing time.

But things can break down. If your last frame took too long, you need to simulate physics a little more this frame. In practice this usually means that you integrate your velocity a little more than usual—which really means your positions will teleport a little further than usual. This is a common bug in games, where it’s often easy to clip through obstacles when the frame-rate is too low.

The other problem with modeling your time only incidentally is that it makes it really annoying to actually do anything. For example, when you read from the controller you will only get whether the buttons are down or up, but you won’t get whether the button was just pressed this frame. If you want to know that you’ll have to compute it yourself:

bool last_a_button = false;

void updateGame(float delta_time) {
  controller ctrls = getControllerState();

  if (ctrls.a_button && !last_a_button) {
    // handle a press
  }

  last_a_button = ctrls.a_button;
}

It’s tedious, but it gets the job done. Another common pain point is when you want to do something five seconds in the future:

float timer;

void updateGame(float delta_time) {
  timer -= delta_time;

  if (getWantsToStartTimer()) {
    timer = 5.0;
  }

  // ...

  if (timer <= 0) {
    // handle timer finishing
  }
}

Again, nothing you can’t tackle, but in aggregate, this all becomes very weighty. Not being able to model time explicitly is a real pain, and everywhere you go you need to simulate it by diddling state changes.
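For contrast, the “just pressed” computation can be phrased as a pure function over a sampled stream of button states, a hint of the style FRP makes pervasive. This is a discrete, self-contained toy, not how Yampa implements anything.

```haskell
-- True exactly where the sampled button state goes False -> True,
-- i.e. the button was just pressed on that sample. We pair each sample
-- with the previous one, seeding "previously" with False.
risingEdges :: [Bool] -> [Bool]
risingEdges samples = zipWith pressed (False : samples) samples
  where pressed prev now = now && not prev
```

For example, `risingEdges [False, True, True, False, True]` yields `[False, True, False, False, True]`: one True per press, none for the held frame.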

Signal Functions

If you’ve ever written a video game, it probably looked a lot like the examples from the previous section. That’s the sort of thing we’d like to abstract over, so that we can work at a much higher level.

Here comes FRP to the rescue.

The core building block in Yampa is the “signal function”, written as SF i o. You can think of this as a transformer of signals of i into signals of o, where a signal is a function Time -> a. Unwrapping all of this, an SF i o is a function (Time -> i) -> (Time -> o).
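To make that description concrete, here is a toy semantic model of the type, emphatically not Yampa’s actual implementation (which is different and more efficient), just a sketch of the stated meaning:

```haskell
type Time = Double

-- A signal function as a transformer of signals: (Time -> i) -> (Time -> o).
newtype SF' i o = SF' { runSF' :: (Time -> i) -> (Time -> o) }

-- Pointwise lifting of a pure function gives the Functor-style behaviour...
arrSF :: (i -> o) -> SF' i o
arrSF f = SF' (f .)

-- ...and composition lines up outputs with inputs, like Yampa's (>>>).
compSF :: SF' a b -> SF' b c -> SF' a c
compSF (SF' f) (SF' g) = SF' (g . f)
```

Running `compSF (arrSF (+ 1)) (arrSF (* 2))` on the identity signal samples, at each time t, the value 2 * (t + 1).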

That’s everything you need to know about what SFs are. I don’t know how they’re implemented, and I don’t need to, because the abstraction doesn’t leak. Being a haskell programmer, you’re probably looking at SF i o and thinking “that thing is clearly a Functor/Applicative/Monad.” Two out of three—it’s a functor and an applicative, but not a monad. We’ll come back to this momentarily.

The trick to working in FRP is to think of continuous streams of values over time. Thus, we can think about the player’s controller as an SF:

controller :: SF () Controller

which is to say, a continuous stream of Controller values. By marking the input side of the SF as a unit, it means we don’t need to provide anything in order to get this value, which makes sense since the controller state is obviously at the very periphery of our program.

Since SF is a functor, we can get the state of the A button by fmapping it:

aState :: SF () Bool
aState = fmap a_button controller

which isn’t very surprising. But what’s more interesting are the SF-operating primitives that Yampa gives us. For example, there’s delay:

delay :: Time -> a -> SF a a

which delays a signal by the given time, using the a parameter as the value for the initial value of the stream. Thus, we can get the value of the A button two seconds ago via:

aStateTwoSecondsAgo :: SF () Bool
aStateTwoSecondsAgo = aState >>> delay 2 False

where (>>>) :: SF a b -> SF b c -> SF a c is composition of SFs, analogous to function composition.

Already we can see the benefit of this approach. While it’s not clear exactly why we might want to look at the state of the controller two seconds ago, it’s also non-obvious how you’d go about implementing such a thing procedurally with a game loop.

One last signal function we might be interested for the time being is integral, which allows us to compute the integral of a stream:

integral :: Fractional a => SF a a

Events

SFs are transformers of continuous signals, but often we want to talk about discrete moments in time. For this, we’ve got the Event type, which is isomorphic to Maybe:

data Event a
  = Event a
  | NoEvent

The interpretation you should have for an Event is that it’s a discrete piece of data arriving at a specific moment in time. In our earlier discussion of things you want to do in games, we’ve already seen two examples of events: when a timer expires, and when the player presses the A button. Under Yampa, the first is particularly easy to code up, by way of the after combinator:

after :: Time -> b -> SF a (Event b)

If we want to trigger a timer after 5 seconds, it’s just after 5 () :: SF a (Event ()), and we can listen to the output of this stream for an Event () value in order to know when the timer has elapsed.

Similarly, when we’re interested in the player pressing a button, what we’re really interested is in the edges of their button signal. We can get this functionality by way of the edge signal function:

edge :: SF Bool (Event ())

which generates an event whenever the input boolean goes from false to true.

Of course, just being able to generate events isn’t very useful if we don’t have any means of subsequently eliminating them. A simple means of eliminating events is via hold:

hold :: a -> SF (Event a) a

The hold function takes a stream of events, and holds onto the most recent value it received.
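Because Yampa SFs are continuous-time, the following is only an intuition-building sketch: it simulates edge and hold over discrete lists of samples, with Maybe standing in for Event. The names edgeList and holdList are made up for illustration; they are not Yampa functions.

```haskell
-- A discrete-sample sketch (not Yampa's real implementation): streams
-- become lists of samples, and Maybe stands in for Event.
edgeList :: [Bool] -> [Maybe ()]
edgeList bs = zipWith rising (False : bs) bs
  where
    rising False True = Just ()  -- an event fires on each rising edge
    rising _     _    = Nothing

holdList :: a -> [Maybe a] -> [a]
holdList a0 = tail . scanl latch a0
  where
    latch _ (Just a) = a         -- latch the newly arrived value
    latch a Nothing  = a         -- otherwise keep the previous one
```

For example, edgeList [False, True, True] produces an event only at the second sample, and holding that stream latches True from that point on.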

Making a Game of Snake

We’ve already seen enough of FRP in order to make most of the old classic, Snake. In Snake, you are a snake who slithers around in a square, with a constant velocity, continuing in the direction you’re going until the player asks you to turn.

Begin with a Controller, and an SF to read it:

data Controller = Controller
  { ctrl_up    :: Bool
  , ctrl_down  :: Bool
  , ctrl_left  :: Bool
  , ctrl_right :: Bool
  }

controller :: SF () Controller
controller = ...

We can then write a little helper function to determine when a button has been pressed—tagging it with a particular value of our choice:

onPress :: (Controller -> Bool) -> a -> SF () (Event a)
onPress field a = fmap (fmap (const a)) $ fmap field controller >>> edge

Next, we can sum up an onPress for each direction on the controller, mapping them into direction vectors:

arrowEvents :: Num a => SF () (Event (V2 a))
arrowEvents =
  (\u d l r -> asum [u, d, l, r])
    <$> onPress ctrl_up    (V2 0 (-1))
    <*> onPress ctrl_down  (V2 0 1)
    <*> onPress ctrl_left  (V2 (-1) 0)
    <*> onPress ctrl_right (V2 1    0)

Above, the use of asum allows us to combine these four events into one, meaning that if the player presses two directions at exactly the same moment, we will prefer up over down, and down over left, etc.

By holding onto the most recent arrow event, we can get the current direction our snake is facing:

snakeDirection :: SF () (V2 Float)
snakeDirection = arrowEvents >>> hold (V2 0 1)

which we can then integrate in order to have the snake move around:

snakePosition :: SF () (V2 Float)
snakePosition = snakeDirection >>> integral

Not too shabby at all! This particular snake will move at a rate of 1 unit per second, but we could make him faster by scaling up snakeDirection before taking its integral.
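Under the hood, integrating a sampled signal amounts to Euler integration: a running sum of dt * value. Here is a hedged, list-based sketch of that idea (integralList is an illustrative name, not Yampa's actual implementation):

```haskell
-- Hedged sketch: integral over discrete (dt, value) samples is a running
-- sum of dt * value (Euler integration). Yampa hides this bookkeeping
-- behind the SF abstraction.
integralList :: Fractional a => [(a, a)] -> [a]
integralList = scanl1 (+) . map (uncurry (*))
```

Sampling a velocity of 1 every half second, integralList [(0.5, 1), (0.5, 1)] gives [0.5, 1.0]: one unit of distance per second, matching the snake's speed above.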

Wrapping Up

Hopefully I’ve given you a taste of how FRP can radically simplify the implementation of real-time applications. Tomorrow we’ll look into arrowized FRP, and get a sense of how to build bigger, more interesting programs.

December 21, 2023 12:00 AM

December 16, 2023

Mark Jason Dominus

My Git pre-commit hook contained a footgun

The other day I made some changes to a program, but when I ran the tests they failed in a very bizarre way I couldn't understand. After a bit of investigation I still didn't understand. I decided to try to narrow down the scope of possible problems by reverting the code to the unmodified state, then introducing changes from one file at a time.

My plan was: commit all the new work, reset the working directory back to the last good commit, and then start pulling in file changes. So I typed in rapid succession:

git add -u
git commit -m 'broken'
git branch wat
git reset --hard good

So the complete broken code was on the new branch wat.

Then I wanted to pull in the first file from wat. But when I examined wat there were no changes.

Wat.

I looked all around the history and couldn't find the changes. The wat branch was there but it was on the current commit, the one with none of the changes I wanted. I checked in the reflog for the commit and didn't see it.

Eventually I looked back in my terminal history and discovered the problem: I had a Git pre-commit hook which git-commit had attempted to run before it made the new commit. It checks for strings I don't usually intend to commit, such as XXX and the like.

This time one of the files had something like that. My pre-commit hook had printed an error message and exited with a failure status, so git-commit aborted without making the commit. But I had typed the commands in quick succession without paying attention to what they were saying, so I went ahead with the git-reset without even seeing the error message. This wiped out the working tree changes that I had wanted to preserve.

Fortunately the git-add had gone through, so the modified files were in the repository anyway, just hard to find. And even more fortunately, last time this happened to me, I wrote up instructions about what to do. This time around recovery was quicker and easier. I knew I only needed to recover stuff from the last add command, so instead of analyzing every loose object in the repository, I did

find .git/objects -type f -mmin -10

to locate loose objects that had been modified in the last ten minutes. There were only half a dozen or so. I was able to recover the lost changes without too much trouble.

Looking back at that previous article, I see that it said:

it only took about twenty minutes… suppose that it had taken much longer, say forty minutes instead of twenty, to rescue the lost blobs from the repository. Would that extra twenty minutes have been time wasted? No! … The rescue might have cost twenty extra minutes, but if so it was paid back with forty minutes of additional Git expertise…

To that I would like to add, the time spent writing up the blog article was also well-spent, because it meant that seven years later I didn't have to figure everything out again, I just followed my own instructions from last time.

But there's a lesson here I'm still trying to figure out. Suppose I want to prevent this sort of error in the future. The obvious answer is “stop splatting stuff onto the terminal without paying attention, jackass”, but that strategy wasn't sufficient this time around and I couldn't think of any way to make it more likely to work next time around.

You have to play the hand you're dealt. If I can't fix myself, maybe I can fix the software. I would like to make some changes to the pre-commit hook to make it easier to recover from something like this.

My first idea was that the hook could unconditionally save the staged changes somewhere before it started, and then once it was sure that it would complete it could throw away the saved changes. For example, it might use the stash for this.

(Although, strangely, git-stash does not seem to have an easy way to say “stash the current changes, but without removing them from the working tree”. Maybe git-stash save followed by git-stash apply would do what I wanted? I have not yet experimented with it.)
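A hedged sketch of that experiment (the commands exist, but I haven't verified the exact behaviour around the index): git stash push records the changes, and git stash apply immediately restores them, which together amounts to "stash without removing":

```shell
# Snapshot the working tree into the stash, then immediately restore it.
# Net effect: the changes are saved in the stash *and* still in the tree.
git stash push -m "pre-commit safety snapshot"
git stash apply
```

Note that after a plain apply, previously staged changes may come back unstaged; git stash apply --index tries to restore the index state too.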

Rather than using the stash, the hook might just commit everything (with commit -n to prevent infinite loops) and then reset the commit immediately, before doing whatever it was planning to do. Then if it was successful, Git would make a second, permanent commit and we could forget about the one made by the hook. But if something went wrong, the hook's commit would still be in the reflog. This doubles the number of commits you make. That doesn't take much time, because Git commit creation is lightning fast. But it would tend to clutter up the reflog.

Thinking on it now, I wonder if a better approach isn't to turn the pre-commit hook into a post-commit hook. Instead of a pre-commit hook that does this:

  1. Check for errors in staged files
    • If there are errors:
      1. Fix the files (if appropriate)
      2. Print a message
      3. Fail
    • Otherwise:
      1. Exit successfully
      2. (git-commit continues and commits the changes)

How about a post-commit hook that does this:

  1. Check for errors in the files that changed in the current head commit
    • If there are errors:
      1. Soft-reset back to the previous commit
      2. Fix the files (if appropriate)
      3. Print a message
      4. Fail
    • Otherwise:
      1. Exit successfully

Now suppose I ignore the failure, and throw away the staged changes. It's okay, the changes were still committed and the commit is still in the reflog. This seems clearly better than my earlier ideas.

I'll consider it further and report back if I actually do anything about this.

Larry Wall once said that too many programmers will have a problem, think of a solution, and implement it, but it works better if you can think of several solutions, then implement the one you think is best.

That's a lesson I think I have learned. Thanks, Larry.

Addendum

I see that Eric Raymond's version of the jargon file, last revised December 2003, omits “footgun”. Surely this word is not that new? I want to see if it was used on Usenet prior to that update, but Google Groups search is useless for this question. Does anyone have suggestions for how to proceed?

by Mark Dominus (mjd@plover.com) at December 16, 2023 04:49 PM

December 15, 2023

Mark Jason Dominus

Recent addenda to articles 202311: Christenings in Tel Aviv

[ Content warning: extremely miscellaneous. ]

Wow, has it really been 7 months since I did one of these? Surprising, then, how few there are. (I omitted the ones that seemed trivial, and the ones that turned into complete articles.)

  • Back in 2018 I wrote an article that mentioned two alleys in Tel Aviv and quoted an article from Haaretz that said (in part):

    A wealthy American businessman … had christened the streets without official permission… .

    Every time I go back to read this I am brought up short by the word “christened”, in an article in Haaretz, in connection with the naming of streets in Tel Aviv. A christening is a specifically Christian baptism and naming ceremony. It's right there in the word!

    Orwell's essay on Politics and the English Language got into my blood when I was quite young. Orwell's thesis is that language is being warped by the needs of propaganda. The world is full of people who (in one of Orwell's examples) want to slip the phrase “transfer of population” past you before you can realize that what it really means is “millions of peasants are robbed of their farms and sent trudging along the roads with no more than they can carry”. Writers are exposed to so much of this purposefully vague language that they learn to imitate it even when they are not trying to produce propaganda.

    I don't mean to say that that's what the Haaretz writer was doing, intentionally or unintentionally. My claim is only that in this one case, because she wasn't thinking carefully about the meanings of the words she chose, she chose a hilariously inept one. Because of an early exposure to Orwell, that kind of mischoice jumps out at me.

    This is hardly the most memorable example I have. The prize for that belongs to my mother, who once, when she was angry with me, called me a “selfish bastard”. This didn't have the effect she intended, because I was so distracted by the poor word choice.

    Anyway, the Orwell thing is good. Brief and compelling. Full of good style advice. Check it out.

  • In 2019, I wrote an article about men who are the husbands of someone important and gave as examples the billionaire husband of Salma Hayek and the Nobel prizewinning husband of Marie Curie. I was not expecting that I would join this august club myself! In April, Slate ran an article about my wife in which I am referred to only as “Kim's husband”. (Judy Blume's husband is also mentioned, and having met him, I am proud to be in the same club.)

  • Also, just today I learned that Antoine Veil is interred in the Panthéon, but only because he was married to Simone Veil.

  • In an ancient article about G.H. Hardy I paraphrased from memory something Hardy had said about Ramanujan. In latter years Hardy's book became available on the Internet, so I was able to append the exact quotation.

  • A few years ago I wrote a long article about eggplants in which I asked:

    Wasn't there a classical Latin word for eggplant? If so, what was it? Didn't the Romans eat eggplant? How do you conquer the world without any eggplants?

    I looked into this a bit and was amazed to discover that the Romans did not eat eggplant. I can only suppose that it was because they didn't have any, poor benighted savages. No wonder the Eastern Roman Empire lasted three times as long.

by Mark Dominus (mjd@plover.com) at December 15, 2023 05:25 PM

Haskell Interlude

39: Rebecca Skinner

In this episode, we are joined by Rebecca Skinner. She talks about her new book, Effective Haskell, which takes you from list manipulation to thunks to type-level programming. She also tells us about large scale industrial applications in Haskell, and how the architecture is shaped by the organization of the engineering teams.

Disclaimer: Mercury is a financial technology company, not a bank. Banking services provided by Choice Financial Group and Evolve Bank & Trust, Members FDIC.

by Haskell Podcast at December 15, 2023 11:00 AM

December 09, 2023

Magnus Therning

Getting Amazonka S3 to work with localstack

I'm writing this in case someone else is getting strange errors when trying to use amazonka-s3 with localstack. It took me rather too long finding the answer and neither the errors I got from Amazonka nor from localstack were very helpful.

The code I started with for setting up the connection looked like this

main = do
  awsEnv <- AWS.overrideService localEndpoint <$> AWS.newEnv AWS.discover
  -- do S3 stuff
  where
    localEndpoint = AWS.setEndpoint False "localhost" 4566

A few years ago, when I last wrote some Haskell to talk to S3 this was enough1, but now I got some strange errors.

It turns out there are different ways to address buckets and the default, which is used by AWS itself, isn't used by localstack. The documentation of S3AddressingStyle has more details.

So to get it to work I had to change the S3 addressing style as well and ended up with this code instead

main = do
  awsEnv <- AWS.overrideService (s3AddrStyle . localEndpoint) <$> AWS.newEnv AWS.discover
  -- do S3 stuff
  where
    localEndpoint = AWS.setEndpoint False "localhost" 4566
    s3AddrStyle svc = svc {AWS.s3AddressingStyle = AWS.S3AddressingStylePath}

Footnotes:

1

That was before version 2.0 of Amazonka, so it did look slightly different, but overriding the endpoint was all that was needed.

December 09, 2023 04:23 PM

December 07, 2023

Tweag I/O

BazelCon Community Day - Munich

On October 23, the day before the first European BazelCon, EngFlow and Tweag organized the sixth Bazel Community Day at the Salesforce office in Munich, capped off with a happy hour sponsored by Gradle.

Photo from Bazel Community Day, Munich

The event kicked off early afternoon with snacks and welcome talks:

  • Introduction by Helen Altshuler (EngFlow) and Andreas Herrmann (Tweag)
  • Bazel at Salesforce by Gunnar Wagenknecht (Salesforce)

After that, the event split into two parallel tracks:

Track 1:

  • A Bazel Beginner Bootcamp, by Billy Autrey (EngFlow)
  • Debugging Cache Misses and Sources of Nondeterminism in Bazel, by Ben Radford (Tweag) and Joseph Gette (Mercedes)

The Bootcamp was delivered to a packed room with mixed experience levels and received useful feedback which will be used to improve the next one.

It was really good actually. I already use Bazel, but the Bootcamp helped me fill in some gaps.

The session on debugging cache misses was also in high demand. We had to move it to the main stage to provide enough room for all attendees. This session also received positive and constructive feedback.

Track 2:

Comprised a series of lightning talks:

  • Coverage with Bazel: An Overdue Summary, by Ulf Adams (EngFlow)
  • Bazel Migration Using Fully Ephemeral BUILD Files, by Markus Hofbauer (Luminar)
  • Fast Incremental Bazel Builds with ‘Persistent Pods’, by Shishir Kumar (ThoughtSpot)
  • Build Server Protocol, by Andrzej Głuszak (Jetbrains)
  • Buck2: Optimizations and Dynamic Dependencies, by Neil Mitchell and Chris Hopman (Meta)
  • Bazel + Go with rules_go and Gazelle, by Tyler French and Zhongpeng Lin (Uber)
  • Building JavaScript & TypeScript with rules_js, by Greg Magolan (Aspect)

There were also two unconference sessions. One focused on managing large monorepos and remote execution, while the other went into a deeper discussion on Buck2, Bazel, Starlark, and bzlmod. Bazel folks were interested to know more about Buck2’s Starlark type checking, profiling, debugging, and dynamic actions.

We’ll go into more detail about some of those talks and sessions. The recordings of the lightning talks are available on YouTube.

Intro and welcome

Helen Altshuler (EngFlow) gave us a brief history of Bazel Community Day, a quick recap of the Build Meetup from a few days prior, and some numbers about Bazel uptake. She also mentioned that the next Community Day would be held in London, sometime in June 2024.

Andreas Herrmann (Tweag) gave us a breakdown of the Community Day Structure and logistics for the day and moderated the lightning talks.

Gunnar Wagenknecht (Salesforce) gave us a detailed account of the scale of the Bazel migration at Salesforce. Many groups are using Bazel, some with up to 4000 engineers, multiple languages and operating systems, and up to 120000 Bazel targets. This was made possible by disallowing custom build rules, regulating load statements, and testing everything.

Coverage with Bazel: An Overdue Summary, by Ulf Adams (EngFlow)

After excitedly confessing to having had two colas before the talk, Ulf Adams sped into an overview of code coverage and Bazel.

Coverage keeps track of what code is executed.

Ulf underscored that while low coverage may indicate subpar tests, a high coverage figure doesn’t guarantee the quality of the tests. You also need to select tests based on coverage to target your changes to optimize resource utilization.

He went on to talk about how there is no right way to collect coverage, instead, it can be done:

  • per file
  • per class
  • per function
  • per line
  • per character

When looking at code coverage with Bazel, you find that different languages have different coverage stacks, each with its own format! Before having a human look at the output you’re probably going to need to post-process it:

  1. With an output generator,
  2. A report generator, or
  3. By manually generating the HTML for human consumption.

Bazel Migration Using Fully Ephemeral BUILD Files, by Markus Hofbauer (Luminar)

Markus gave us an account of how Luminar, which builds lidar systems from the chip up for autonomous driving, migrated to ephemeral BUILD files that aren’t checked into version control.

They were initially using Conan, a C++ package manager, to instrument CMake and split the monolithic CMake project into packages with interdependencies. That resulted in a handcrafted build system for the mono repo, which didn’t work for them in the end.

Then they looked at automatically converting the Conan Buildfile into Bazel BUILD files, but:

  • people introduced changes that broke the build, and
  • they didn’t want to maintain two CI systems.

So how did they solve this? By always auto-generating the Bazel BUILD files, using Gazelle, with the following constraints:

  • always mapping 1 header file to 1 target, resulting in small targets
  • figuring out dependencies by looking at include files with naming conventions
  • handling edge cases with manual targets that override Gazelle
  • using NixOS for 3rd party dependencies
  • removing local BUILD files

All of which led them to a happy state with low Bazel maintenance so they could focus on developing new features.

Fast Incremental Bazel Builds with ‘Persistent Pods’, by Shishir Kumar (ThoughtSpot)

Shishir’s opening line,

What do you want in life? Peace, long life, money, and faster builds.

got some laughs from the crowd. He went on to discuss the difference between clean and incremental builds, and how he leveraged warm Bazel caches with long-living Kubernetes pods as Jenkins agents, and local disk caches as low-latency persistent volumes to achieve a 40% reduction in average Build and Test time.

Challenges included finding the right pod for the incoming build, minimizing the amount of work that you do, and syncing Git branches.

Ongoing work includes experimenting with different heuristics to minimize build time:

  • using the SHA from the previous build
  • diffing the targets between the two SHAs and using a smaller target list

All in all, they made good progress, but there are many avenues still left to explore.

Build Server Protocol, by Andrzej Głuszak (Jetbrains)

Andrzeij presented a lightning view of the Build Server Protocol (BSP), which creates an abstraction layer between editors and build tools, analogous to what the Language Server Protocol (LSP) does for language tools.

By using the protocol, you can write once and run everywhere! Andrezej also mentioned a new IntelliJ plugin for Bazel that uses BSP.

Buck2: optimizations and dynamic dependencies, by Neil Mitchell and Chris Hopman (Meta)

Next up, Neil and Chris gave an account of Buck2, the new primary build system at Meta, which is:

  • Polyglot
  • For Monorepos
  • Open Source
  • Twice as fast as Buck1

Buck2 has target files in Starlark specific to the project, rules which are all written in Starlark, a core binary written in Rust, providing the Starlark APIs and a Starlark interpreter. It also has profiling, linting and a type checker built in.

The main reason to build Buck2 was to go faster, and as such, it has:

  • a single dependency graph with no phases
  • remote execution, with pre-computed Merkle trees, and virtual files to provide builds without the bytes, with inputs served through EdenFS.

Some rules don’t yet work, and it does not have a bzlmod-like package manager, but it does have dynamic dependencies and the Buck Extension Language (BXL).

You can learn more about the differences and similarities between Bazel and Buck2 here.

Bazel + Go with rules_go and Gazelle, by Tyler French and Zhongpeng Lin (Uber)

Tyler and Lin talked about how nogo, the rules_go static analysis tool, ran as part of the compile action, and how changing the nogo settings file invalidated the cache. An approach to solving this could be moving the nogo check out of the compile action.

They also talked about using Go modules with generated code, and handling dependencies with large MODULE.bazel files.

Building JavaScript & TypeScript with rules_js, by Greg Magolan (Aspect)

Greg started by describing the early state of JavaScript support in Bazel, which came out in 2017 with rules_nodejs but hit a dead end in terms of performance with third-party dependencies and runtime compatibility with JavaScript and TypeScript build tools.

In 2021, Greg and Alex Eagle started writing rules_js, which was released in August 2022 and has gained steady adoption.

rules_js handles third-party dependencies much more efficiently, by working with pnpm lockfiles.

Debugging Cache Misses, by Ben Radford (Tweag), and Joseph Gette (Mercedes)

Ben and Joseph co-presented a workshop on debugging cache misses.

It began with a brief explanation of hermeticity and determinism, which are important concepts to know about when trying to understand why a cache miss has occurred.

Then the audience were invited to clone the toy repository created for the workshop, and follow along as various issues were discovered, explained, and fixed.

Bonus: Games Night after BazelCon Day 1

The next day, in the evening after BazelCon Day 1, JetBrains hosted a games night at their Munich office, co-organized with EngFlow.

Spirits were high after a packed and exciting first conference day. Groups formed where people chatted with beverages in hand or played board games, card games, and chess.

We even had a chance to enjoy a round of a Jeopardy-style quiz game about Bazel flags, featuring the new deck of Bazel flag playing cards EngFlow printed for the event.

Photo of Bazel playing cards made by EngFlow

Bringing Bazel Community Day to You

If you’d like to join a future Bazel Community Day, then you can fill in this survey hosted by EngFlow.

You can also read this article on the EngFlow Blog.

December 07, 2023 12:00 AM

December 04, 2023

Matthew Sackman

Golang Bebop serialisation codec

I’ve been paying some attention to serialisation formats since 2014 or so. I used Cap’n Proto when I was writing GoshawkDB which dates from around 2015, and just before that I think I’d been using Protobuf. Two years ago, when working on WalkTogether, was the first time I’d used Bebop.

There are dozens of different serialisation formats, and a very wide range of trade-offs to each one. I tend to favour formats where you define the schema in a separate language and file, and from there use tools to generate suitable code for different languages. I think I was very resistant to this approach when I first came across it, but without it, it means you have no single source of truth as to what the protocol or its types are, and manually updating different implementations in different languages quickly becomes a disaster.

JSON has its place: it benefits from being mildly human-readable on the wire, and if you really need something vaguely self-describing then it’s probably the obvious choice. Tooling support for JSON is great, especially in browsers, though JSON Schema is pretty awful. But JSON doesn’t scale, and the awful efficiency of the encoding is frustrating. About a year ago I found myself dealing with 100GB+ JSON files. That wasn’t a lot of fun, though I would be surprised if even the most efficient serialisation format would get that below 70GB and so you’d still be writing your own tooling to deal with streaming and processing files at that size.

Bebop brings nothing new to the table. It doesn’t say anything about memory allocation like Cap’n Proto, and it doesn’t have built-in mechanisms to deal with versioning and schema evolution (though you can build support for that yourself by using its unions). It doesn’t have good documentation – certainly nothing that I would consider acceptable as a specification. There are also some statements there that are questionable at best. For example when talking about opcodes, it says:

All the compiler does is check that no opcode is used twice…

Well that’s impossible unless you’re doing compile-the-world. But there’s no specification of how compilation should proceed. So in my implementation it’s a runtime check (via Go’s init() funcs) that no opcode is used twice.

There is an existing Go binding which I’ve used before. It’s fast and it seems to work just fine. Nevertheless, I wasn’t super keen on a few aspects: the parser is hand rolled; the code generation is done by appending strings together; and the generated code isn’t all that nice. None of these are critical issues really, but nevertheless I decided to see whether I could address these points, and here is the result.

For the parser, I’m using Pigeon which I’ve used before. Given that upstream doesn’t have a specification of the Bebop language, I propose my PEG grammar as a specification (of sorts). That grammar is fairly relaxed about where you need semicolons, new lines, that sort of thing.

I’m using Go’s stdlib templates to drive the code generation. In truth the code to drive it is not super pretty: it would probably benefit from a bunch of refactoring and tidying. The generated code though is fairly nice: I don’t think it’s terribly far off the sort of code I’d write if I was rolling it by hand. There’s a []byte-based API (Marshal/Unmarshal), which makes sense when you have something else doing framing (for example maybe you’re sending over websocket and so your transport is message oriented already, or you’re reading in from a key-value store so the value’s length is already known), and there’s an io.Reader/io.Writer-based API too for when your transport is a stream (e.g. plain TCP).

December 04, 2023 04:01 PM

November 30, 2023

Ken T Takusagawa

[iuigljdm] and (False,True)

some more silliness resulting from the Foldable Traversable Proposal (FTP) in Haskell:

*Main> and (False,True)
True
*Main> and [False,True]
False
*Main> uncurry (&&) (False,True)
False
*Main> snd (False,True)
True

"and" may be called on a tuple because tuples are instances of Foldable.  this is similar to length (1,2) == 1.

"and" == "snd" was discovered by accident.  I had accidentally typed "and" instead of "snd" (A and S are adjacent on a QWERTY keyboard), calling it on an argument of type (a,Bool).  despite the typo, seemingly substituting functions of completely different types, the program compiled and ran successfully.  I think "and" and "snd" always give the same answer for inputs of type (a,Bool).

by Unknown (noreply@blogger.com) at November 30, 2023 10:38 PM

November 28, 2023

Tweag I/O

Source filtering with file sets

Sponsored by Antithesis (distributed systems reliability testing experts), I’ve developed a new library to filter local files in Nix which I’d like to introduce!

This post requires some familiarity with Nix and its language. So if you don’t know what Nix is yet, take a look first, it’s pretty neat.

In this post we’re going to look at what source filtering is, why it’s useful, why a new library was needed for it, and the basics of the new library.

This post notably won’t really teach you a lot about the new library, that’s what the official tutorial and the reference documentation is for. But if you’d like to know some background and motivation, please read on!

Why filter sources

You most likely have come across this pattern:

stdenv.mkDerivation {
  src = ./.;
  # stuff..
}

This is the basis for a Nix expression to build a project in a local directory. There’s a lot of magic to make this work, but we’ll focus on just a few aspects:

  • Bar some exceptions, attributes passed to stdenv.mkDerivation are automatically turned into environment variables that are available to the derivation builder.
  • The relative path expression ./. is turned into a string of the form "/nix/store/<hash>-<name>" by hashing the contents of the directory that Nix file is in, and adding it to the Nix store.

We then end up with a store derivation whose environment variables include src = "/nix/store/<hash>-<name>". To get more info on how all of this works, see the documentation on derivations.

This generally does work, with the big caveat that all files in the local directory are copied into the Nix store and become a dependency of the derivation. This means that:

  • Changing any file will cause the resulting derivation to change, making Nix unable to reuse previous build results.

    For example, if you just format your Nix files, Nix will have to build the project’s derivation again!

  • If you have secret files in the local directory, they get added to the Nix store too, making them readable by any user on the system!

    Note If you use the experimental new nix CLI with Flakes and Git in a current (2.18.1) version of Nix, only files tracked by Git will be available to Nix, so this generally won’t be a problem.

    But be careful: If you don’t use Git, the entire directory is always copied to the Nix store! Furthermore, experimental features may change over time.

The hardcore way to filter sources

To address this, Nix comes with builtins.path, which allows controlling how paths get added to the Nix store:

builtins.path {
  # The path to add to the store
  path = ./.;
  # A function determining whether to include specific paths under ./.
  filter =
    # A path under ./.
    path:
    # The path type, either "directory", "normal", "symlink" or "unknown"
    type:
    # The return value of this function indicates
    # whether the path should be included or not.
    # In this case we always return true, meaning everything is included
    true;
}

While this interface looks straightforward, it’s notoriously tricky to get it to do what you want.

Let’s give it a try and start with something trivial, say only including a single file in the current directory:

# default.nix
builtins.path {
  path = ./.;
  filter =
    path: type:
    path == ./file;
}
$ touch file

$ tree "$(nix eval --raw -f.)"
/nix/store/dg5zq00kxabc3lfg03bnrfwax1ndgn6s-filter

0 directories, 0 files

It doesn’t work, we just get an empty directory! The problem here is that the filter function is not called with path values but rather strings, and the == operator always returns false when given two values of different types.

To fix this, we can use the builtin toString function, which converts path values to strings:

# default.nix
builtins.path {
  path = ./.;
  filter =
    path: type:
    path == toString ./file;
}
$ tree "$(nix eval --raw -f.)"
/nix/store/2myzf03ca2ch3lc40p7frvqqbvm5nm2m-filter
└── file

1 directory, 1 file

Great, this works! Now let’s try the same for dir/file:

# default.nix
builtins.path {
  path = ./.;
  filter =
    path: type:
    path == toString ./dir/file;
}
$ mkdir dir && touch dir/file

$ tree "$(nix eval --raw -f.)"
/nix/store/dg5zq00kxabc3lfg03bnrfwax1ndgn6s-filter

0 directories, 0 files

Apparently this doesn’t work for nested directories.

The problem now is that the filter function first gets called on dir itself. And because ./dir != ./dir/file, it returns false, therefore excluding dir entirely.

To fix this we need to make sure the function recurses into the directories, which we can do by checking for type == "directory":

# default.nix
builtins.path {
  path = ./.;
  filter =
    path: type:
    # Return true for all directories
    type == "directory"
    # But also for the file we want to include
    || path == toString ./dir/file;
}

But what if there is another directory we don’t care about?

$ mkdir another-dir

$ tree $(nix eval --raw -f.)
/nix/store/wj0y4f4x5llzz8kj48jd0gvszddp3jr0-filter
├── another-dir
└── dir
    └── file

3 directories, 1 file

This worked: dir/file is there. But so is another-dir, although it doesn’t even contain any files!

We could go on like that, but I think you get the gist: This function is tricky to use!

Introducing the file set library

Let’s compare this to the new file set library:

{ lib ? import (fetchTarball "channel:nixos-23.11" + "/lib") }:
lib.fileset.toSource {
  root = ./.;
  fileset = ./dir/file;
}
$ tree $(nix eval --raw -f. outPath)
/nix/store/csgp388b3zqxp2av01gjncy9sadxib9q-source
└── dir
    └── file

2 directories, 1 file

This is much more straightforward and does what you’d expect. But where is the file set here?

The key here is that when the file set library expects a file set but gets a path, it implicitly turns the path into a file set. So to the library, ./dir/file is a file set containing just the single file ./dir/file, while ./dir would be a file set containing all files in the directory ./dir.

The real power of this library, however, comes from the fact that file sets behave just like mathematical sets, and it comes with core set operations (union, difference, intersection and friends) to support that.

Some other notable features are:

  • Files are never added to the store unless explicitly requested with lib.fileset.toSource.
  • Maximum laziness: Directories are never recursed into more than necessary.
  • Actionable error messages in case something doesn’t look right.
  • Minimal assumptions: The library only relies on stable Nix features. It even works correctly with possible future changes to the behavior of Nix paths.
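As a sketch of how those set operations can compose (the paths ./src, ./Makefile and ./src/tests here are hypothetical; unions and difference come from the 23.11 library):

```nix
{ lib ? import (fetchTarball "channel:nixos-23.11" + "/lib") }:
lib.fileset.toSource {
  root = ./.;
  # Everything in ./src plus the Makefile, minus the tests:
  fileset = lib.fileset.difference
    (lib.fileset.unions [ ./src ./Makefile ])
    ./src/tests;
}
```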

But this is not the best place to teach you about the library. For that, head over to the official tutorial instead, or check out the reference documentation!

The file set library is going to be included in the upcoming NixOS 23.11 release. If you encounter any problems using it or are missing some feature, let me know in this tracking issue.

Comparison

For completeness, we also need to look at previous related efforts and see how they compare to this library:

  • builtins.fetchGit ./. allows creating a store path from all Git-tracked files, so it’s very similar to lib.fileset.gitTracked. However, it’s tricky to further restrict or extend the set of selected files, since the above filter-based approach wouldn’t work on store paths without some changes.

  • lib.sources.cleanSourceWith is a simple wrapper around builtins.path. While it has the same filter-based interface, it improves over builtins.path by being chainable, allowing the set of included files to be further restricted.

  • lib.sources.cleanSource uses cleanSourceWith underneath to set filter to a reasonable default, filtering out some of the most common unwanted files automatically. The file set library doesn’t yet have a good replacement for this, but there is lib.fileset.fromSource, which you can use to convert any lib.sources-based value to a file set.

  • lib.sources.sourceByRegex and lib.sources.sourceFilesBySuffices are also functions built on top of cleanSourceWith, and as such can be chained with each other. While sourceFilesBySuffices is not bad, the interface of sourceByRegex is rather clunky and error-prone. Furthermore, it’s hard to add more files to the result.

  • gitignore.nix and pkgs.nix-gitignore allow you to filter files based on Git’s .gitignore files, which is very related to Git-tracked files. The file set library doesn’t replace these functions, but it can be used as a more composable foundation.

  • nix-filter is a third-party lib.sources wrapper. It wraps it with a nicer interface, but suffers from some unclear semantics and composability issues. The file set library should serve as an improved replacement.

  • Source combinators was a previous attempt to create a composable interface for handling source files. It was a bit tricky to use and never merged, but this is in fact the work that inspired the new file set library!

Conclusion

We’ve seen that filtering sources can improve your Nix experience by avoiding unnecessary derivation rebuilds. While it was possible to filter sources before using the builtins.path function and other approaches, there are many pitfalls. The lib.fileset library in comparison makes source filtering a breeze.

In addition to a huge thanks to Antithesis as the main sponsor of this work, I’d also like to thank Robert Hensing from Hercules CI and Valentin Gagarin from Tweag for all the help they’ve given me during reviews!

November 28, 2023 12:00 AM

November 27, 2023

Monday Morning Haskell

Black Friday Sale: Last Day!

We've come to Cyber Monday, marking the last day of our Black Friday sale! Today is your last chance to get big discounts on all of our courses. You can get 20% off by using the code BFSOLVE23 at checkout. Or you can subscribe to our mailing list to receive a 30% discount code. You must use these codes by the end of the day in order to get the discount!

Here's a final runthrough of the courses we have available, including our newest course, Solve.hs!

Solve.hs

We just released the first part of our newest course last week! These two detailed modules dive into the fundamentals of problem solving in Haskell. You'll get to rewrite the list type and most of its API from scratch, teaching you all the different ways you can write "loop" code in Haskell. Then you'll get an in-depth look at how data structures work in Haskell, including the quick process to learn a data structure from start to finish!

Course Page

Normal Price: $89 Sale Price: $71.20 Subscriber Price: $62.30

Haskell From Scratch

This is our extensive, 7-module beginners course. You'll get a complete introduction to Haskell's syntax and core concepts, including things like monads and tricky type conversions.

Course Page

Normal Price: $99 Sale Price: $79.20 Subscriber Price: $69.30

Practical Haskell

Practical Haskell is designed to break the idea that "Haskell is only an academic language". In our longest and most detailed course, you'll learn the ins and outs of communicating with a database in Haskell, building a web server, and connecting that server to a functional frontend page. You'll also learn about the flexibility that comes with Haskell's effect systems, as well as best practices for testing your code, including tricky test cases like IO based functions!

Course Page

Normal Price: $149 Sale Price: $119.20 Subscriber Price: $104.30

Making Sense of Monads

The first of our shorter, more targeted courses, Making Sense of Monads will teach you how to navigate monads, one of Haskell's defining concepts. This idea is a bit tricky at first but also quite important for unleashing Haskell's full power. The course is well suited to beginners who know all the basic syntax but want more conceptual practice.

Note that Making Sense of Monads is bundled with Haskell From Scratch. So if you buy the full beginners course, you'll get this in-depth look at monads for free!

Course Page

Normal Price: $29 Sale Price: $23.20 Subscriber Price: $20.30

Effectful Haskell

If Making Sense of Monads is best for teaching the basics of monads, Effectful Haskell will show you how to maximize the potential of this idea. You'll develop a more complete idea of what we mean by "effects" in your code. You'll see a variety of ways to incorporate them into your code and learn some interesting ideas about effect substitution!

Course Page

Normal Price: $39 Sale Price: $31.20 Subscriber Price: $27.30

Haskell Brain

Last, but not least, Haskell Brain will teach you how to perform machine learning tasks in Haskell with TensorFlow. There's a lot of steps involved in linking these two technologies. So while machine learning is a valuable skill to have in today's world, understanding the ways we can link software together is almost as valuable!

Course Page

Normal Price: $39 Sale Price: $31.20 Subscriber Price: $27.30

Conclusion

So don't miss out on this special offer! You can use the code BFSOLVE23 for 20% off, or you can subscribe to our mailing list to get a code for 30% off! This offer ends tonight, so don't wait!

by James Bowen at November 27, 2023 03:30 PM

November 24, 2023

Monday Morning Haskell

Spotlight: Quick, Focused Haskell Courses

A couple days ago I gave a brief spotlight on the longer, more in-depth courses I've written. The newest of these is Solve.hs, with its focus on problem solving, and the original two I wrote were Haskell From Scratch and Practical Haskell.

After my first two courses, I transitioned towards writing a few shorter courses. These are designed to teach vital concepts in a shorter period of time. They all consist of just a single module and have a shorter total lecture time (1.5 to 2 hours each). You can finish any of them in a concentrated 1-2 week effort. Today I'll give a brief summary of each of these, listed from most abstract to most practical, and easiest to hardest.

Remember, all of these are on sale at 20% off using the code BFSOLVE23 at checkout! You can also subscribe to our mailing list to get an even bigger discount, at 30% off!

Making Sense of Monads

This is for those of you who have been writing Haskell long enough that you've got the hang of the syntax, but you still struggle a bit to understand monads. You might look at parts of Modules 4 and 5 from Haskell From Scratch and think they look useful, but you don't think you need the rest of the course.

Making Sense of Monads really "zooms in" on Module 5. It goes deeper in understanding all of the simpler structures that help us understand monads, and it gives a sizable amount of practice with writing monadic code. You'll also get a crash course on parsing (a common use of monadic operations), and write two fairly complex parsers. So it's a great option if you want a shorter but more concentrated approach on some of the basics!

Effectful Haskell

Effectful Haskell takes a lot of the core ideas and concepts in Making Sense of Monads and goes one step beyond into the more practical realm of applying monadic effects in a program. You'll learn more abstractly what an effect is, but then also the different ways to incorporate polymorphic effects into your Haskell program. You'll see how to use monads and monad classes to swap effectful behaviors in your program, and why this is useful.

This course culminates in a similar (but smaller) project to Practical Haskell, where you'll deploy an effectful web server to Heroku.

Haskell Brain

This course is the hardest and most practically-oriented of this series. You will take on the challenge of incorporating TensorFlow and machine learning into Haskell. This is easier said than done, because TensorFlow has many dependencies beyond the normal packages you can simply pick up on Hackage. So you'll gain valuable experience going through this installation process, and then we'll run through some of the main information you need to know when it comes to creating tensors in Haskell, and building moderately complex models.

Conclusion

So while these courses are shorter, they still pack a decent amount of material! And with the subscriber discount, you can get each of them for less than $30! This offer will only last until Monday though, so make up your mind quickly!

by James Bowen at November 24, 2023 03:30 PM

November 23, 2023

Tweag I/O

Separating debug symbols from executables

This article aims to introduce and explore the practice of splitting debug symbols away from C/C++ build artifacts to save space and time when building large codebases. Note that we want to retain access to the debug symbols if and when they are needed at a later date, hence we don’t want to merely remove (aka strip) the debug symbols.1

This exploration is largely inspired and based on what I have learned in various places around the web, most notably:

along with various experiments of my own, outlined below.

This article will focus on ELF files on Linux. For other formats and platforms, things are likely to be quite different. The compiler/toolchain below is based on GCC, but the experiments are repeatable with minor changes on LLVM-based toolchains. Your mileage may vary.

What are debug symbols?

In short, debug symbols are extra “stuff” in your intermediate object files — and ultimately in your executables — that help your debugger map the machine code being executed back into higher-level source code concepts like variables and functions. This allows the debugger to then present a view of the execution that corresponds more directly to the source code that you’re used to reading.

Without debug symbols, the debugger can become almost useless as it is typically very hard to understand which part of the (often optimized) machine code execution (i.e. registers, memory addresses, etc.) corresponds to which part of the source code (variables, functions, etc.).

To illustrate with a toy example, hello.cpp:

#include <iostream>

int main() {
    std::cout << "Hello, World!" << std::endl;
    return 0;
}

Debug symbols are the difference between:

$ g++ hello.cpp -o hello.default
$ gdb ./hello.default
GNU gdb (GDB) 13.1
[...]
Reading symbols from ./hello.default...
(No debugging symbols found in ./hello.default)
(gdb) br main
Breakpoint 1 at 0x4010a0
(gdb) run
Starting program: /home/jherland/code/debug_fission_experiment/hello.default
[...]
Breakpoint 1, 0x00000000004010a0 in main ()
(gdb) list
No symbol table is loaded.  Use the "file" command.

and:

$ g++ -g hello.cpp -o hello.with-g
$ gdb ./hello.with-g
[...]
Reading symbols from ./hello.with-g...
(gdb) br main
Breakpoint 1 at 0x4010a0: file hello.cpp, line 4.
(gdb) run
Starting program: /home/jherland/code/debug_fission_experiment/hello.with-g
[...]
Breakpoint 1, main () at hello.cpp:4
4           std::cout << "Hello, world!" << std::endl;
(gdb) list
1       #include <iostream>
2
3       int main() {
4           std::cout << "Hello, world!" << std::endl;
5           return 0;
6       }

Preliminaries

Before we dive into the deeper analysis, let’s make sure that our toy example can stay relevant for the remainder of this exploration.

Compiling and linking as separate steps

Above, we used a single command (g++ <options> hello.cpp -o hello.<suffix>) to compile and link our application. These two steps are worth separating in our further analysis. For one, it allows us to examine the intermediate results (the object files). But also, all build systems for larger C/C++ codebases typically compile and link in separate steps already, so this is closer to what we’ll encounter when we want to apply our learnings to a larger build system towards the end of this article.

Here’s how we separate the two steps:

$ g++ -g -c hello.cpp
$ g++ hello.o -o hello.with-g.2

We can confirm that nothing has changed with our executable:

$ ls -l hello.*
-rw-r--r-- 1 jherland users    97 Jan  1 00:00 hello.cpp
-rwxr-xr-x 1 jherland users 16304 Jan  1 00:00 hello.default
-rw-r--r-- 1 jherland users 29744 Jan  1 00:00 hello.o
-rwxr-xr-x 1 jherland users 38784 Jan  1 00:00 hello.with-g
-rwxr-xr-x 1 jherland users 38784 Jan  1 00:00 hello.with-g.2
$ diff --report-identical-files hello.with-g hello.with-g.2
Files hello.with-g and hello.with-g.2 are identical

In addition, we have this intermediate hello.o file. More about that, soon.

At this point we can start to play around with different compiler and linker options.

Using the gold linker

In fact, let’s start with trying a different linker altogether. The default linker used by GCC is the BFD linker, and that is what we’ve used so far. Let’s try the more recent gold linker instead. There are multiple reasons for this switch:

  • gold is faster and generates smaller executables than BFD.
  • gold supports some options that we’re going to need later on.
  • gold is a popular choice in many build systems, including Bazel.

Let’s re-run the above commands, but using gold:

$ rm hello.with-g.2
$ g++ -fuse-ld=gold hello.cpp -o hello.default
$ g++ -g -c hello.cpp
$ g++ -fuse-ld=gold hello.o -o hello.with-g

What about other linkers?

In this experiment we could have opted for an even more modern linker, like lld from the LLVM project or the more recent mold linker. The choice, however, is often dictated by the context of your project. For example, if you’re working in an embedded setting, the more recent linkers are often not available for the cross-building toolchains that are used.

What are debug symbols, really?

OK, with that out of the way, let’s dive into what our executables look like with and without debug symbols. First let’s have a look at the relative sizes of our files:

$ ls -l hello.*
-rw-r--r-- 1 jherland users    97 Jan  1 00:00 hello.cpp
-rwxr-xr-x 1 jherland users  8280 Jan  1 00:00 hello.default
-rw-r--r-- 1 jherland users 29744 Jan  1 00:00 hello.o
-rwxr-xr-x 1 jherland users 31560 Jan  1 00:00 hello.with-g

We can see that the debug symbols add an extra (31560 - 8280 =) 23280 bytes (or almost 300%) to the final executable. Comparing the output of readelf --sections --wide between the two executables (to list the ELF sections inside), we can see some extra ELF sections in the latter:2

+  [28] .debug_info       PROGBITS        0000000000000000 001023 002c66 00      0   0  1
+  [29] .debug_abbrev     PROGBITS        0000000000000000 003c89 0007b9 00      0   0  1
+  [30] .debug_loclists   PROGBITS        0000000000000000 004442 00010a 00      0   0  1
+  [31] .debug_aranges    PROGBITS        0000000000000000 00454c 000050 00      0   0  1
+  [32] .debug_rnglists   PROGBITS        0000000000000000 00459c 00007f 00      0   0  1
+  [33] .debug_line       PROGBITS        0000000000000000 00461b 000242 00      0   0  1
+  [34] .debug_str        PROGBITS        0000000000000000 00485d 001b62 01  MS  0   0  1
+  [35] .debug_line_str   PROGBITS        0000000000000000 0063bf 0004e1 01  MS  0   0  1

The other sections in the executable appear to be unchanged.

So, the debug symbols are part of the executable file, but they are not part of the actual machine code that is being executed (as that resides in other ELF sections).

You can drill further into the contents of these sections with commands like readelf --debug-dump, but the above will suffice for our analysis here.

What is the problem with debug symbols?

They take up space. And therefore also time. Both build time and run time.

In the above toy example, the effects are also toy-sized, but as we’ll see later, the giant proportions of debug symbols relative to executable code remain when we scale up to real-world projects. Furthermore this extra space is consumed both in the executable and the intermediate object files. In other words, turning on debug symbols can make your build artifacts take up orders of magnitude more disk space compared to a build without debug symbols.

Since these debug symbols are embedded within the intermediate object files, the tools that interact with these files have to process the debug symbols too: the compiler has to generate them in the first place, and the linker has to copy these sections into the final executable. Then, typically, the final packaging steps of the build process need to package these larger executables. All of these steps take extra time, because everything is so much bigger.

This extra space and time is all wasted as long as the debug symbols are not actually used.

In embedded projects where the final executable often has to run on a hardware-constrained device, the extra space taken up by debug symbols can also be the difference between something that will fit on the device and run successfully, and something that simply won’t.

Stripped executables

Many projects strip their final executables before deploying them. Stripping removes the debug symbols from the executable, but it also removes more than that. Returning to our toy example:

$ strip hello.with-g -o hello.stripped
$ ls -l hello.*
-rw-r--r-- 1 jherland users    97 Jan  1 00:00 hello.cpp
-rwxr-xr-x 1 jherland users  8280 Jan  1 00:00 hello.default
-rw-r--r-- 1 jherland users 29744 Jan  1 00:00 hello.o
-rwxr-xr-x 1 jherland users  6368 Jan  1 00:00 hello.stripped
-rwxr-xr-x 1 jherland users 31560 Jan  1 00:00 hello.with-g

With strip we are able to remove not only the debug symbols, but also an additional 1912 bytes (or ~23%) from the original executable (i.e. compared to hello.default). What is removed? Again, comparing the output of readelf --sections --wide on hello.default vs hello.stripped, we see that the following sections disappear:

-  [29] .symtab           SYMTAB          0000000000000000 001040 000408 18     30  21  8
-  [30] .strtab           STRTAB          0000000000000000 001448 0002df 00      0   0  1

Similar to debug symbols, these sections are not necessary for the execution of the program per se, but they do provide the most basic of symbol lookup (e.g. which function name is located at which address).3 Without these sections, any kind of debugging now becomes very hard indeed:

$ gdb ./hello.stripped
[...]
Reading symbols from ./hello.stripped...
(No debugging symbols found in ./hello.stripped)
(gdb) br main
Function "main" not defined.
Make breakpoint pending on future shared library load? (y or [n]) n
(gdb) br *0x4009e0
Breakpoint 1 at 0x4009e0
(gdb) run
Starting program: /home/jherland/code/debug_fission_experiment/hello.stripped
[...]
Breakpoint 1, 0x00000000004009e0 in ?? ()
(gdb) list
No symbol table is loaded.  Use the "file" command.

Where our initial executable (compiled with default options) was at least able to understand where the main function was located, this stripped executable offers absolutely no help whatsoever, and we’re forced to interact via raw memory addresses. 😱

In other words, running a stripped executable under a debugger is not a very friendly experience. Not only are we missing correspondences between the machine code and the source code, but we’re even missing the most fundamental of symbol information.

Together this provides the top two reasons for why someone would like to strip their executables:

  1. To really make the executable as small and fast as possible.
  2. To intentionally make it harder to run the executable in a debugger, especially if developed in a proprietary setting and/or deployed into a potentially hostile environment where one wants to make reverse-engineering more difficult.

Stripped and unstripped, the best of both worlds?

A common practice in many C/C++ projects is to build everything with debug symbols, and then add a final build step that strips all the executables. Thus we can provide two versions of all executables: one stripped that is small and fast, and an unstripped version that contains all the debugging comforts.

When you need to debug the stripped executable, you can have gdb look at the corresponding unstripped executable to find all the debug symbols. Here, the unstripped executable is not actually executed itself, instead we execute the stripped executable and merely use the unstripped executable as a source of debug symbols:

$ gdb ./hello.stripped
[...]
Reading symbols from ./hello.stripped...
(No debugging symbols found in ./hello.stripped)
(gdb) symbol-file ./hello.with-g
Reading symbols from ./hello.with-g...
(gdb) br main
Breakpoint 1 at 0x4009e0: file hello.cpp, line 4.
(gdb) run
Starting program: /home/jherland/code/debug_fission_experiment/hello.stripped
[...]
Breakpoint 1, main () at hello.cpp:4
4           std::cout << "Hello, world!" << std::endl;
(gdb) list
1       #include <iostream>
2
3       int main() {
4           std::cout << "Hello, world!" << std::endl;
5           return 0;
6       }

Success!

However, your build process might not be so happy with this: Depending on the shape of the final build product (e.g. a single archive or image containing multiple executables), you might end up generating a giant archive full of unstripped executables that then needs to be unpacked, stripped, and finally re-packaged as a corresponding archive of stripped executables.

Worse, when all your executables go into this archive, these expensive pack/unpack/strip/repack steps end up depending on all the executables in your project! The result is that no matter how small the change you make to one of your executables, your build system ends up having to redo the pack/unpack/strip/repack dance for almost every build.

You could mitigate this by moving the stripping to an earlier stage of the build: for example, right after you generate an (unstripped) executable, you could then strip this executable and carry both the stripped and unstripped executables forward into the final build steps. This is certainly much better, but you still haven’t addressed the extra build time incurred by the linker having to process these debug symbols in the first place.

Approaching debug fission

After the previous section, there are two burning questions that should remain:

  1. If we never actually execute the unstripped executable, does it actually need to contain any code at all? Can we remove the executable code from it and retain only the debug information?
  2. Can we push the splitting of the executable to an even earlier step in the build graph?

In essence, can we have the compiler and/or linker generate two separate output files? One with already-stripped executable code, and another with the debug information only?

Let’s tackle the first question first:

Making a file containing only debug information

We can use the --only-keep-debug option to strip (or objcopy) to convert an unstripped executable into a (non-executable ELF) file that contains only the debug-related sections from the original ELF executable:

$ strip --only-keep-debug hello.with-g -o hello.debug
$ ls -l hello.*
-rw-r--r-- 1 jherland users    97 Jan  1 00:00 hello.cpp
-rwxr-xr-x 1 jherland users 28184 Jan  1 00:00 hello.debug
-rwxr-xr-x 1 jherland users  8280 Jan  1 00:00 hello.default
-rw-r--r-- 1 jherland users 29744 Jan  1 00:00 hello.o
-rwxr-xr-x 1 jherland users  6368 Jan  1 00:00 hello.stripped
-rwxr-xr-x 1 jherland users 31560 Jan  1 00:00 hello.with-g

The new file is 28184 bytes, and although it appears to be an executable ELF file, it surely cannot be executed:

$ ./hello.debug
bash: ./hello.debug: cannot execute binary file: Exec format error

If we compare the output of readelf --sections --wide from hello.with-g with that of hello.debug, we see that most of the ELF sections have been removed: even though the section headers are still there, their type has been changed into NOBITS, and the actual section data is gone.

The only non-empty ELF sections that remain are:

  [ 2] .note.gnu.property     NOTE            00000000004002c8 0002c8 000030 00   A  0   0  8
  [ 3] .note.ABI-tag          NOTE            00000000004002f8 0002f8 000020 00   A  0   0  4
[...]
  [27] .comment               PROGBITS        0000000000000000 000318 000013 01  MS  0   0  1
  [28] .debug_info            PROGBITS        0000000000000000 00032b 002c66 00      0   0  1
  [29] .debug_abbrev          PROGBITS        0000000000000000 002f91 0007b9 00      0   0  1
  [30] .debug_loclists        PROGBITS        0000000000000000 00374a 00010a 00      0   0  1
  [31] .debug_aranges         PROGBITS        0000000000000000 003854 000050 00      0   0  1
  [32] .debug_rnglists        PROGBITS        0000000000000000 0038a4 00007f 00      0   0  1
  [33] .debug_line            PROGBITS        0000000000000000 003923 000242 00      0   0  1
  [34] .debug_str             PROGBITS        0000000000000000 003b65 001b62 01  MS  0   0  1
  [35] .debug_line_str        PROGBITS        0000000000000000 0056c7 0004e1 01  MS  0   0  1
  [36] .note.gnu.gold-version NOTE            0000000000000000 005ba8 00001c 00      0   0  4
  [37] .symtab                SYMTAB          0000000000000000 005bc8 000408 18     38  21  8
  [38] .strtab                STRTAB          0000000000000000 005fd0 0002ab 00      0   0  1
  [39] .shstrtab              STRTAB          0000000000000000 00627b 00019a 00      0   0  1

These correspond almost directly to the sections that we stripped earlier.

Furthermore, this new — much smaller — file can be used directly with gdb:

$ gdb ./hello.stripped
[...]
Reading symbols from ./hello.stripped...
(No debugging symbols found in ./hello.stripped)
(gdb) symbol-file ./hello.debug
Reading symbols from ./hello.debug...
(gdb) br main
Breakpoint 1 at 0x4009e0: file hello.cpp, line 4.
(gdb) run
Starting program: /home/jherland/code/debug_fission_experiment/hello.stripped
[...]
Breakpoint 1, main () at hello.cpp:4
4           std::cout << "Hello, world!" << std::endl;
(gdb) list
1       #include <iostream>
2
3       int main() {
4           std::cout << "Hello, world!" << std::endl;
5           return 0;
6       }

Connecting the stripped executable to the debug file

It is still annoying to have to tell gdb exactly where to find the debug symbols with symbol-file ./hello.debug. Fortunately, there are a couple of tools available to help us connect an executable to its debug file.

objcopy has a --add-gnu-debuglink option that allows us to reconnect the stripped executable to its debug symbols in that other file:

$ objcopy --add-gnu-debuglink=hello.debug hello.stripped hello.stripped.debuglink
$ ls -l hello.stripped*
-rwxr-xr-x 1 jherland users 6368 Jan  1 00:00 hello.stripped
-rwxr-xr-x 1 jherland users 6464 Jan  1 00:00 hello.stripped.debuglink

We can now debug the stripped executable directly!

$ gdb ./hello.stripped.debuglink
[...]
Reading symbols from ./hello.stripped.debuglink...
Reading symbols from /home/jherland/code/debug_fission_experiment/hello.debug...
(gdb) br main
Breakpoint 1 at 0x4009e0: file hello.cpp, line 4.
(gdb) run
Starting program: /home/jherland/code/debug_fission_experiment/hello.stripped.debuglink
[...]
Breakpoint 1, main () at hello.cpp:4
4           std::cout << "Hello, world!" << std::endl;
(gdb) list
1       #include <iostream>
2
3       int main() {
4           std::cout << "Hello, world!" << std::endl;
5           return 0;
6       }

As expected, comparing readelf --sections --wide on ./hello.stripped.debuglink and ./hello.stripped confirms that a single section (.gnu_debuglink) has been added, and comparing the file sizes we see that this costs a modest 96 bytes.
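When gdb follows a debuglink, it validates the candidate debug file against a CRC-32 checksum stored at the end of the .gnu_debuglink section. The check can be sketched in Python (an illustrative sketch; the checksum algorithm is the standard CRC-32, the same polynomial that zlib uses):

```python
import zlib

def debuglink_crc(path):
    """CRC-32 of a debug file, as stored in the .gnu_debuglink section.

    GDB recomputes this checksum over the whole candidate debug file and
    rejects the file on mismatch.
    """
    with open(path, "rb") as f:
        return zlib.crc32(f.read()) & 0xFFFFFFFF
```

This is also why the debug file must not be modified after objcopy --add-gnu-debuglink has recorded its checksum: any change would invalidate the stored CRC.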

Linking with --build-id

The linker option --build-id (reachable with -Wl,--build-id from the GCC command line) will embed a Build ID (specifically, a .note.gnu.build-id ELF section) into the linked executable. This build ID is a unique identifier for the built files: “the ID remains the same across multiple builds of the same build tree”.4 This section will be copied with --only-keep-debug, and it will survive stripping, so that — following the steps outlined in the previous sections — you will end up with a stripped executable and the debug symbols in a separate file, but both files will share the same Build ID.

When it comes time to debug, the debug file should be placed in a special place (that corresponds to the debug-file-directory setting in gdb, and with a name that is derived from the Build ID itself), and gdb will then be able to automatically find and use it. Here is a (contrived) example of creating a stripped executable and its debug file, both with the same Build ID, and then putting the debug file where it will be automatically picked up by gdb (after setting debug-file-directory):

$ g++ -fuse-ld=gold -Wl,--build-id hello.o -o hello.buildid
$ strip --only-keep-debug hello.buildid -o hello.buildid.debug
$ strip hello.buildid -o hello.buildid.stripped
$ readelf -a  hello.buildid.stripped | grep "Build ID"
    Build ID: 8b7c6e6c56287a2959b0fe41996fcb7876c6bf98
$ mkdir -p .build-id/8b
$ cp hello.buildid.debug .build-id/8b/7c6e6c56287a2959b0fe41996fcb7876c6bf98.debug
$ gdb
[...]
(gdb) set debug-file-directory .
(gdb) file hello.buildid.stripped
Reading symbols from hello.buildid.stripped...
Reading symbols from /home/jherland/code/debug_fission_experiment/.build-id/8b/7c6e6c56287a2959b0fe41996fcb7876c6bf98.debug...
(gdb) br main
Breakpoint 1 at 0x400a00: file hello.cpp, line 4.
(gdb) run
Starting program: /home/jherland/code/debug_fission_experiment/hello.buildid.stripped
[...]
Breakpoint 1, main () at hello.cpp:4
4           std::cout << "Hello, world!" << std::endl;
(gdb) list
1       #include <iostream>
2
3       int main() {
4           std::cout << "Hello, world!" << std::endl;
5           return 0;
6       }

Since this scheme requires configuration of gdb and/or control over system-wide paths like /usr/lib/debug, we’ll leave it alone for the rest of this exploration. Still, depending on your development/debugging scenario, this might be a better way to organize the lookup of debug symbols. For example, if your build was performed on some CI infrastructure, and you’re now trying to remotely debug a hardware device running a stripped executable, it would be awfully nice if the CI had already arranged for the debug symbols to be placed somewhere your gdb instance could find them. For situations like these, it’s worth looking into debuginfod for serving debug symbols to gdb. That, however, is outside the scope of what we’re looking at here.
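The Build-ID-to-filename mapping used in the example above can be sketched as follows: the first two hex digits of the Build ID select a subdirectory under .build-id/, and the remaining digits plus a .debug suffix form the file name.

```python
from pathlib import Path

def build_id_debug_path(build_id: str, debug_dir: str = ".") -> Path:
    # First two hex digits -> subdirectory; the rest + ".debug" -> file name.
    return Path(debug_dir) / ".build-id" / build_id[:2] / (build_id[2:] + ".debug")

# For the Build ID from the example above:
print(build_id_debug_path("8b7c6e6c56287a2959b0fe41996fcb7876c6bf98"))
# .build-id/8b/7c6e6c56287a2959b0fe41996fcb7876c6bf98.debug
```

This matches the `mkdir -p .build-id/8b` and `cp` steps performed manually in the session above.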

Summary so far

So at this point we have:

  • A stripped executable of 6464 bytes with a debuglink to:
  • A separate 28184 byte file with debug information

Together, these two files replace the 31560 byte unstripped executable.

Compared to keeping both the stripped and unstripped executables, this provides a space saving of 3376 bytes (or 9%). And compared to the original unstripped executable, the size increase is modest, at 3088 bytes (or 10%).
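As a quick sanity check, the arithmetic behind these numbers, using the file sizes from the listings above:

```python
with_g = 31560       # unstripped executable (hello.with-g)
stripped = 6368      # stripped executable (hello.stripped)
debuglinked = 6464   # stripped executable with .gnu_debuglink added
debug_file = 28184   # separate debug-info file (hello.debug)

# Keeping stripped + debug file instead of stripped + unstripped:
saving = (stripped + with_g) - (stripped + debug_file)     # 3376 bytes
percent_saved = round(100 * saving / (stripped + with_g))  # 9%

# The two-file solution versus the single unstripped executable:
increase = (debuglinked + debug_file) - with_g  # 3088 bytes
percent_increase = round(100 * increase / with_g)  # 10%
```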

In larger projects, the numbers vary of course, but these are savings that are often worth pursuing.

Furthermore, the only tools we have used so far are g++, objcopy, and strip, with which we have been able to achieve what we want without relying on any “modern” toolchain features!

What remains at this point is to examine if we can make this scale to work with larger executables. That is, are we able to:

  1. split off the debug symbols into a separate file already at the compilation stage?
  2. link together the debug symbols for each object file into bigger “packages” of debug symbols for the entire executable?
  3. (re)establish a link from the final executable to this debug package so that gdb is able to seamlessly debug a stripped executable when it is accompanied by its corresponding debug package?

Debug fission, for real

Splitting debug symbols into a .dwo file with -gsplit-dwarf

The -gsplit-dwarf option is at the core of the debug fission concept. It instructs the compiler to place debug symbols into a separate .dwo file. Let’s try it:

$ g++ -g -gsplit-dwarf -c hello.cpp -o hello.split.o
$ ls -l *o
-rw-r--r-- 1 jherland users 29744 Jan  1 00:00 hello.o
-rw-r--r-- 1 jherland users 20328 Jan  1 00:00 hello.split.dwo
-rw-r--r-- 1 jherland users 10640 Jan  1 00:00 hello.split.o

We have two new files: a new .o file, and a corresponding .dwo file. Looking at the file sizes, it seems that around two thirds of the data in hello.o has been moved into hello.split.dwo, and one third remains in hello.split.o. The overhead introduced is a modest 1224 bytes (or 4% of the original object file).
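Spelling out the overhead calculation with the file sizes above:

```python
orig_o = 29744     # hello.o: object file with embedded debug info
split_o = 10640    # hello.split.o: object file after -gsplit-dwarf
split_dwo = 20328  # hello.split.dwo: the split-off debug info

overhead = (split_o + split_dwo) - orig_o  # 1224 bytes
percent = round(100 * overhead / orig_o)   # 4%
```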

If we try to look closer at the actual debug information, it is clear that the debug information now is split between hello.split.o and hello.split.dwo:

$ readelf --debug-dump hello.split.o
The .debug_info section contains link(s) to dwo file(s):

  Name:      hello.split.dwo
  Directory: /home/jherland/code/debug_fission_experiment


hello.split.o: Found separate debug object file: /home/jherland/code/debug_fission_experiment/hello.split.dwo

Contents of the .debug_addr section (loaded from hello.split.o):
[...]
Contents of the .debug_info section (loaded from hello.split.o):
[...]
[9 more sections loaded from hello.split.o...]
Contents of the .debug_info.dwo section (loaded from /home/jherland/code/debug_fission_experiment/hello.split.dwo):
[...]
Contents of the .debug_abbrev.dwo section (loaded from /home/jherland/code/debug_fission_experiment/hello.split.dwo):
[...]
[5 more sections loaded from hello.split.dwo...]

To summarize, -gsplit-dwarf has moved most (but not all) of the debug information from the .o file into a separate .dwo, and replaced it with a reference that links the .o file to its .dwo counterpart to allow tools to access the debug information there.

When it comes time to link the final executable, the linker now has to handle an object file that is only one third of the original size. This results in a smaller executable, and considerably faster link times. For our toy example, these things do not matter, but for a larger code base (although the exact numbers and relative sizes will surely vary) this can make a very significant difference in build times, especially for incremental builds, which still need to link the final executables from scratch.

Speaking of linking…

Linking an executable after compiling with -gsplit-dwarf

Now let’s link together an unstripped executable using the new hello.split.o file we created in the previous section:

$ g++ -fuse-ld=gold hello.split.o -o hello.split
$ ls -l hello.split hello.with-g
-rwxr-xr-x 1 jherland users 19984 Jan  1 00:00 hello.split
-rwxr-xr-x 1 jherland users 31560 Jan  1 00:00 hello.with-g

This new hello.split executable is equivalent to the previous hello.with-g, the only difference being that it is based on an object file that was built with -gsplit-dwarf. Indeed, running readelf --debug-dump on the split executable shows the same results as we got for the corresponding .o files above:

$ readelf --debug-dump hello.split
The .debug_info section contains link(s) to dwo file(s):

  Name:      hello.split.dwo
  Directory: /home/jherland/code/debug_fission_experiment


hello.split: Found separate debug object file: /home/jherland/code/debug_fission_experiment/hello.split.dwo
[...]

The important thing to note here is that the .dwo references are carried forward into the final executable by the linker.

At this point it’s also worthwhile to compare the ELF sections inside this new, split executable against the original executable (with debug symbols). Again, we pull out readelf --sections --wide and compare its output on hello.with-g to the corresponding output on hello.split. Here is a quick summary of the differences (section sizes are in hexadecimal, as reported by readelf):

ELF section name     hello.with-g size  hello.split size  diff
.debug_addr          not present        120               +120
.debug_info          2c66               31                -2c35
.debug_abbrev        7b9                15                -7a4
.debug_loclists      10a                not present       -10a
.debug_gnu_pubnames  not present        10e0              +10e0
.debug_gnu_pubtypes  not present        fdd               +fdd
.debug_aranges       50                 50                0
.debug_rnglists      7f                 17                -68
.debug_line          242                264               +22
.debug_str           1b62               3d                -1b25
.debug_line_str      4e1                577               +96
Sum section sizes    587d               2aa2              -2ddb

So even though most of the ELF sections are the same, we find that 0x2ddb (= 11739) bytes of debug information has moved into the .dwo file. (This corresponds roughly with the directory listing above which shows the split executable being 11576 bytes smaller than the original executable with debug symbols embedded).
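We can double-check the table by summing the (hexadecimal) section sizes ourselves:

```python
# Debug-related section sizes (hex, from readelf) in each executable,
# in the order they appear in the table above.
with_g_sections = [0x2C66, 0x7B9, 0x10A, 0x50, 0x7F, 0x242, 0x1B62, 0x4E1]
split_sections = [0x120, 0x31, 0x15, 0x10E0, 0xFDD, 0x50, 0x17, 0x264,
                  0x3D, 0x577]

moved = sum(with_g_sections) - sum(split_sections)
print(hex(moved), moved)  # 0x2ddb 11739
```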

An option that is typically also mentioned when we talk about debug fission is the --gdb-index linker option (or -Wl,--gdb-index via the usual compiler wrapper). Although it’s not easy to find good documentation on this option,5 its main objective seems to be to speed up GDB when loading the executable and its symbols for debugging, in effect trading link time for debugging time, perhaps at a minor cost in executable size.

Let’s try it:

$ g++ -fuse-ld=gold -Wl,--gdb-index hello.split.o -o hello.split.gdbindex
$ ls -l hello.split hello.split.gdbindex hello.with-g hello.stripped
-rwxr-xr-x 1 jherland users 19984 Jan  1 00:00 hello.split
-rwxr-xr-x 1 jherland users 11377 Jan  1 00:00 hello.split.gdbindex
-rwxr-xr-x 1 jherland users  6368 Jan  1 00:00 hello.stripped
-rwxr-xr-x 1 jherland users 31560 Jan  1 00:00 hello.with-g

Wow, we saved 8607 bytes by adding this option. What happened to the debug sections? Three sections (.debug_gnu_pubnames, .debug_gnu_pubtypes, and .debug_aranges), totalling 8461 bytes, were replaced with a single .gdb_index section of only 25 bytes.

Furthermore, if we compare the size of this new executable against the hello.stripped executable, we see that although we are still quite a bit off, we have taken off more than half of the overhead from our .split executable.

Can we further strip this executable?

As we did previously, we could try to strip this executable:

$ strip hello.split.gdbindex -o hello.stripped.again
$ ls -l hello.split.gdbindex hello.stripped hello.stripped.again
-rwxr-xr-x 1 jherland users 11377 Jan  1 00:00 hello.split.gdbindex
-rwxr-xr-x 1 jherland users  6368 Jan  1 00:00 hello.stripped
-rwxr-xr-x 1 jherland users  6368 Jan  1 00:00 hello.stripped.again
$ diff --report-identical-files hello.stripped hello.stripped.again
Files hello.stripped and hello.stripped.again are identical

However, we seem to have thrown the baby out with the bath water: there is no debug information at all in hello.stripped.again, not even any references to the .dwo file. It is in fact identical to the hello.stripped we created in an earlier section. This is further confirmed by gdb:

$ gdb hello.stripped.again
GNU gdb (GDB) 13.1
[...]
Reading symbols from hello.stripped.again...
(No debugging symbols found in hello.stripped.again)
(gdb) symbol-file hello.split.dwo
Reading symbols from hello.split.dwo...
(No debugging symbols found in hello.split.dwo)
(gdb) br main
No symbol table is loaded.  Use the "file" command.
Make breakpoint pending on future shared library load? (y or [n]) n

We can’t even use symbol-file to tell gdb to look up symbols in the .dwo file. Thus it seems that the -gsplit-dwarf mechanism relies on putting debug information into both files, and that the .dwo file is not at all useful if we strip the corresponding executable.

What about the .dwo files?

So we now have a linked executable with references to the .dwo file(s) that were created in the -gsplit-dwarf compilation step. For a bigger application spread across many source files, this will amount to a lot of .dwo files. If you want to debug the application at a later date, you must make sure to have all these .dwo files available at that point in time. So, do you need to painstakingly collect .dwo files into a tar archive that accompanies your executable, and that needs to be unpacked before each debugging session?

Fear not: In the same way that the linker takes a collection of .o files and produces an executable, you can regard the dwp tool as a linker for debug information: It takes a collection of .dwo files and produces a single .dwp (short for “DWARF package”) file that contains all the debug symbols needed to debug the final executable.

Compiling .dwo files into .dwp packages can be done directly, by passing each .dwo on the dwp command line:

$ dwp -o hello.split.dwp hello.split.dwo
$ ls -l hello.split.dw*
-rw-r--r-- 1 jherland users 20328 Jan  1 00:00 hello.split.dwo
-rw-r--r-- 1 jherland users 57416 Jan  1 00:00 hello.split.dwp

The easier option, however, is probably to use the --exec option to have dwp look at the executable itself, to automatically find all the .dwo files referenced and then “link” them into a .dwp package that corresponds to the name of the executable:

$ dwp --exec hello.split.gdbindex
$ ls -l hello.split*dw*
-rw-r--r-- 1 jherland users 20328 Jan  1 00:00 hello.split.dwo
-rw-r--r-- 1 jherland users 57416 Jan  1 00:00 hello.split.dwp
-rw-r--r-- 1 jherland users 57416 Jan  1 00:00 hello.split.gdbindex.dwp
$ diff --report-identical-files hello.split.dwp hello.split.gdbindex.dwp
Files hello.split.dwp and hello.split.gdbindex.dwp are identical

(Note that I’m not sure why the .dwp file here ends up so much larger than the .dwo file it is based on. On a different (older) toolchain version the size difference is much smaller, almost negligible. I suspect this is due to some constant-size overhead, and that it will disappear in the noise when scaled up to much larger executables.)

In any case, with the .dwp file created, we can remove the .dwo file(s) and still successfully access the debug symbols via the .dwp file:

$ rm *.dwo
$ gdb hello.split.gdbindex
[...]
Reading symbols from hello.split.gdbindex...
(gdb) br main
Breakpoint 1 at 0x400a00: file hello.cpp, line 4.
(gdb) run
Starting program: /home/jherland/code/debug_fission_experiment/hello.split.gdbindex
[...]
Breakpoint 1, main () at hello.cpp:4
4           std::cout << "Hello, world!" << std::endl;
(gdb) list
1       #include <iostream>
2
3       int main() {
4           std::cout << "Hello, world!" << std::endl;
5           return 0;
6       }

Summary of debug fission

So, at last, we have arrived at debug fission: we now have a debuggable executable that is only slightly larger than a stripped executable, and an accompanying .dwp package of debug symbols. The executable can be distributed/deployed on its own, and as long as the corresponding .dwp file is available when you need to debug, all the debug symbols will automatically be available to you. Let’s recap:

  • The compiler produces two output files from each compilation step: a .o object file without debug symbols, and a .dwo file containing the debug information.
  • The .o file carries a reference to the corresponding .dwo file. This reference is carried forward by the linker into the final executable.
  • The .dwo files can also be “linked” together into a .dwp “package” that carries all the debug symbols for the associated executable.
  • GDB knows how to look up debug symbols in both .dwp and .dwo files, so in the end we need to make either available to GDB, along with the final (stripped) executable.
  • The -Wl,--gdb-index linker option allows a precomputed symbol index to be embedded into the final executable, which makes loading the program into the debugger considerably faster.

Integration into larger build systems

So far, we’ve looked at the basics, invoking the compiler/linker manually at each step, but this is not how most software is built. Let’s look at enabling debug fission in a couple of popular build systems.

Case study: CMake

CMake currently does not support debug fission (aka. “split dwarf”) natively. Still, we’re not going to give up that easily.6

First steps

First, let’s try to wrap our toy example in a simple CMake project:

$ cat CMakeLists.txt
cmake_minimum_required(VERSION 3.25)
project(debug_fission_experiment)
add_executable(hello hello.cpp)
set(CMAKE_VERBOSE_MAKEFILE on)
$ cmake -S . -B cmake_default
[...]
$ cmake --build cmake_default
[...]
[ 50%] Building CXX object CMakeFiles/hello.dir/hello.cpp.o
[...]/g++ [...] -o CMakeFiles/hello.dir/hello.cpp.o -c /home/jherland/code/debug_fission_experiment/hello.cpp
[100%] Linking CXX executable hello
[...]g++ CMakeFiles/hello.dir/hello.cpp.o -o hello
[100%] Built target hello
[...]
$ ls -l cmake_default/hello
-rwxr-xr-x 1 jherland users 16304 Jan  1 00:00 cmake_default/hello

We see that CMake by default creates an executable without debug symbols, and one that is indeed identical to the very first build we did in this entire saga:

$ g++ hello.cpp -o cmake_default/hello.compare
$ diff --report-identical-files cmake_default/hello cmake_default/hello.compare
Files cmake_default/hello and cmake_default/hello.compare are identical

From this we can deduce that:

  • CMake defaults to building executables without debug symbols
  • CMake uses the default (BFD) linker rather than gold

Let’s fix both of those.

Debug builds with the gold linker

As far as I can see, there’s no built-in mechanism in CMake to choose the linker to be used, but all we really need to do is to add -fuse-ld=gold to the command line used when linking the executable.

Conversely, although we could “simply” add -g to the compiler command line, CMake instead provides the CMAKE_BUILD_TYPE variable to specify what kind of build we want. The following alternatives are available by default: Debug, Release, RelWithDebInfo and MinSizeRel, and you can see what they entail in terms of compiler flags here. We’ll stick with Debug for this exploration.

We encode these choices into our CMakeLists.txt by adding these lines:

add_link_options(-fuse-ld=gold)
set(CMAKE_BUILD_TYPE Debug)

Setting compiler/linker flags to achieve debug fission

We can use the various CMAKE_*_FLAGS variables to directly pass the debug fission options to the compiler and linker command lines run by CMake:

set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -gsplit-dwarf")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -gsplit-dwarf")
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -Wl,--gdb-index")
set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -Wl,--gdb-index")

Producing the .dwp debug package

However, the above compiler/linker flags do not get us all the way there: We still need to tell CMake how to assemble the .dwo files into a .dwp package. That is, for our toy example, we need to run:

$ dwp -o cmake_split/hello.dwp cmake_split/CMakeFiles/hello.dir/hello.cpp.dwo

# or, the more indirect option that goes via .dwo references in the executable:
$ dwp --exec cmake_split/hello -o cmake_split/hello.dwp

(Note that the latter, more indirect, invocation currently fails with a segmentation fault. The instructions below will nonetheless assume that this dwp bug is fixed, and that the indirect invocation will work as advertised.)

Furthermore, we want to make a generic CMake rule so that this is done automatically for all executables. Here is a CMake fragment to do just that:

find_program(DWP_TOOL dwp)
function(add_executable target_name)
    # Call the original function
    _add_executable(${target_name} ${ARGN})
    set(out_dwp "${target_name}.dwp")
    add_custom_command(TARGET ${target_name}
        POST_BUILD
        COMMAND ${DWP_TOOL} --exec ${target_name} -o ${out_dwp}
        WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
        COMMENT "Linking debug package ${out_dwp}"
        VERBATIM
        )
endfunction()

This redefines the add_executable() CMake function to also attach the appropriate dwp command as an extra command run immediately after the creation of every executable.

In the end, this is our final CMakeLists.txt for our toy project:

cmake_minimum_required(VERSION 3.25)
project(debug_fission_experiment)
add_link_options(-fuse-ld=gold)
set(CMAKE_BUILD_TYPE Debug)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -gsplit-dwarf")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -gsplit-dwarf")
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -Wl,--gdb-index")
set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -Wl,--gdb-index")

find_program(DWP_TOOL dwp)
function(add_executable target_name)
    # Call the original function
    _add_executable(${target_name} ${ARGN})
    set(out_dwp "${target_name}.dwp")
    add_custom_command(TARGET ${target_name}
        POST_BUILD
        COMMAND ${DWP_TOOL} --exec ${target_name} -o ${out_dwp}
        WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
        COMMENT "Linking debug package ${out_dwp}"
        VERBATIM
        )
endfunction()

add_executable(hello hello.cpp)

Case study: Bazel

As opposed to CMake, modern versions of Bazel (since v6) do support debug fission with the --fission option. However, this is conditional on the underlying toolchain configuration advertising the per_object_debug_info toolchain feature. If we assume that is in place, using debug fission is fairly straightforward:

  • Build an executable target with --fission=yes, either passed via the command line, or suitably encoded in a .bazelrc file: bazel build //path/to:executable --fission=yes
  • To get the .dwp file corresponding to an executable, append .dwp to the executable target, to trigger its creation: bazel build //path/to:executable.dwp --fission=yes
  • (Along the same lines, you can instruct Bazel to build a stripped executable by appending .stripped to the executable target name.)
  • As always when debugging compilation with Bazel: the --subcommands option is very useful to see exactly how Bazel ends up invoking the compiler/linker.

That’s it, really, from a naive point of view. Depending on the complexity of your project, you might run into other complications. For example, if your Bazel project uses rules_foreign_cc to drive some other build system for a subset of your build products, then you might have to communicate debug fission into that other build system and, crucially, extract the separate debug symbols out of that build system and back into Bazel.

An example at scale: Building LLVM

Here, we’re going to leave our small experiments behind, and rather look at the potential wins of using debug fission in a larger project. We’ll look at the cost that debug symbols add to a release build, and see how debug fission can mitigate these costs.

For this, we need a larger project where we can compare release builds to builds with debug symbols, with or without debug fission enabled. One such project is LLVM. In addition to building with CMake, LLVM already provides an LLVM_USE_SPLIT_DWARF option to enable debug fission.7

Setting up the LLVM builds

To get numbers suitable for comparison, we’ll choose the Release build type for the build without debug symbols, and RelWithDebInfo (rather than Debug) for the builds with debug symbols.

These are the three separate builds of the LLVM project that we will run:

  1. A build with debug symbols, but no fission enabled. We name this build “debug” in the discussion below, and it is configured like this:
    cmake -S llvm -B debug -G Ninja \
          -DLLVM_USE_LINKER=lld \
          -DCMAKE_BUILD_TYPE=RelWithDebInfo
  2. A build with debug symbols, and debug fission enabled. We name this build “fission”, and it is configured like this:
    cmake -S llvm -B fission -G Ninja \
          -DLLVM_USE_LINKER=lld \
          -DCMAKE_BUILD_TYPE=RelWithDebInfo \
          -DLLVM_USE_SPLIT_DWARF=ON
  3. A release build with no debug symbols, named “release”. It is configured like this:
    cmake -S llvm -B release -G Ninja \
          -DLLVM_USE_LINKER=lld \
          -DCMAKE_BUILD_TYPE=Release

After configuration, each build is performed by running cmake --build $build_dir, and the following installation is done with cmake --install $build_dir.

The numbers

Here are the relevant statistics from running these builds on my laptop. The absolute numbers in this table are not too interesting, but we’ll examine the relative differences below:

Build phase                            Debug    Fission  Release
Wall clock time spent building         71m58s   64m36s   51m53s
Total size of $build_dir               46.94GB  14.91GB  3.06GB
Total size of $build_dir/bin           30.90GB  7.25GB   1.97GB
Total size of $build_dir/lib           15.19GB  7.03GB   1.02GB
Number of files in $build_dir          4348     7275     4348
Number of *.o files in $build_dir      2931     2931     2931
Number of *.a files in $build_dir      192      192      192
Number of *.dwo files in $build_dir    0        2927     0
Total size of *.dwo files              N/A      4.02GB   N/A

Install phase                          Debug    Fission  Release
Wall clock time spent installing       5m44s    1m08s    0m02s
Total size of $install_dir             35.69GB  8.16GB   2.16GB
Total size of $install_dir/bin         27.45GB  6.48GB   1.78GB
Total size of $install_dir/lib         8.22GB   1.65GB   0.36GB
Number of files in $install_dir        2201     2201     2201
Number of *.a files in $install_dir    189      189      189

Focus on a single executable: llvm-ar  Debug    Fission  Release
Size of llvm-ar executable             259MB    84MB     32MB
Number of *.dwo referenced by llvm-ar  0        530      0
Size of *.dwo referenced by llvm-ar    N/A      363MB    N/A

Using the release build as our baseline, the build time increases by 39% for a debug build, but only by 25% when debug fission is enabled. Looking at the size of the build, the differences are much bigger: The debug build needs 15 times the disk space of the release build, but with debug fission, only 5 times the disk space is needed.

Next, let’s add the installation phase into the mix, which in LLVM’s case consists almost exclusively of copying files from the $build_dir. This highlights how the sheer size of build outputs with debug symbols contributes to slowing everything down: The debug build + install is 50% slower than the release build + install, and for debug fission the corresponding slowdown is only 27%.

The same trend is reflected if we focus on a single executable from the many built by LLVM: llvm-ar is 8.1 times bigger in the debug build than in the release build, but with debug fission, this is reduced to 2.6 times. In the debug fission case, the debug information has been moved into a large number of .dwo files that all together (84MB + 363MB) take up more space than the debug executable (259MB), but most of these .dwo files are referenced from (i.e. shared between) several LLVM executables, so the full cost of this debug information is amortized.
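The relative numbers quoted in the last few paragraphs follow directly from the table:

```python
# Wall-clock build times in seconds, from the table above.
release, debug, fission = 51*60 + 53, 71*60 + 58, 64*60 + 36
build_slowdown_debug = round(100 * (debug / release - 1))      # 39%
build_slowdown_fission = round(100 * (fission / release - 1))  # 25%

# Build-tree sizes relative to the release build.
size_factor_debug = round(46.94 / 3.06)    # 15x
size_factor_fission = round(14.91 / 3.06)  # 5x

# Build + install, relative to release.
rel = release + 2
dbg = debug + 5*60 + 44
fis = fission + 68
total_slowdown_debug = round(100 * (dbg / rel - 1))    # 50%
total_slowdown_fission = round(100 * (fis / rel - 1))  # 27%

# llvm-ar size relative to the release build.
ar_factor_debug = round(259 / 32, 1)   # 8.1x
ar_factor_fission = round(84 / 32, 1)  # 2.6x
```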

Summary of our LLVM build comparison

What can we learn from these three LLVM builds?

First, debug symbols take up a lot of space: When you turn on debug symbols (whether you enable debug fission or not), you should expect your build artifacts to become several times larger compared to a stripped release build.

Without debug fission, the large sections of debug symbols are copied from object files into intermediate archives/libraries and then again into the final executables. This duplication wastes a lot of space, hence it also significantly impacts the overall build time.

With debug fission enabled, however, much of this duplication is eliminated, and the reduced waste helps improve build times as well.

Conclusion

This concludes our exploration of how to separate debug symbols from executables in a way that saves both build time and space. Hopefully I have shown that this is possible to achieve in real-world projects, albeit maybe at the cost of some added complexity, especially if your build system is not already equipped with support for debug fission.

So, is debug fission worth it? The answer, of course, depends on your point of view:

If your baseline is a stripped release executable without any debug information, and you “just” wanted to add the ability to debug — but with minimal overhead in the executable itself — then I’m afraid debug fission is no silver bullet.

In terms of final executable size, the absolute least overhead you can achieve is with the --add-gnu-debuglink or --build-id discussed previously. However, this incurs the most overhead in terms of build time: You first need to do a full debug build, then strip the resulting executable to create the release executable.

On the other hand, if you already have a debug build, but struggle with its space/time requirements, then — depending on how well your build system supports it — debug fission could be a very valuable investment.

It all comes down to knowing your own project/codebase and the context in which it is built. Hopefully, in this article, you have at least found some hints on where and how to look for potential savings.

In terms of toolchain requirements for the techniques described here, you need a halfway modern C/C++ toolchain with support for -gsplit-dwarf and a corresponding debugger capable of reading .dwo/.dwp files. You will also need a linker that is more modern than the default BFD linker. In fact, if your project has access to a linker like gold (or even better: lld or mold), then these are a sure win over the BFD linker in any case, both in terms of improving build times, and the size of the final executables. And that is probably true even before you factor in the techniques described above!

Generally, it seems to me that GCC’s wiki pages on debug fission and the DWARF package format should be considered the authoritative documentation on debug fission. Other than those, I consider this to be an under-documented feature in the world of C/C++ compilers/linkers. Hopefully this article can help remedy that.

GDB’s documentation details the use of --only-keep-debug and --add-gnu-debuglink.

At the start, I mentioned two articles that turned me onto this topic in the first place. In addition to those, here are some other interesting resources that I came across while working on this:

Thanks to Christopher Harrison, Mark Karpov, Cheng Shao, and Arnaud Spiwack for their reviews of this article.


  1. A related topic would be that of compressing the debug symbols. This can be done instead of — or in some cases, in addition to — the separation of debug symbols that we discuss here. Debug symbol compression is a topic worthy of its own exploration, and I won’t tackle it here, except for pointing to resources like this, and this, compiler options like GCC’s -gz, linker options like --compress-debug-sections, or the dwz tool.
  2. According to the ELF specifications: “All section names with the prefix .debug hold information for symbolic debugging. The contents of these sections are unspecified.”
  3. Again, the ELF specifications have this to say about these sections: “.symtab holds a symbol table”, and “.strtab holds strings, most commonly the strings that represent the names associated with symbol table entries.”
  4. According to the GDB documentation.
  5. GCC’s wiki states: “Use the gold linker’s --gdb-index option (-Wl,--gdb-index when linking with gcc or g++) at link time to create the .gdb_index section that allows GDB to locate and read the .dwo files as it needs them.”
  6. Note that yours truly does not have much experience with CMake, so please don’t regard the following instructions as authoritative in any way. 😉 This section is inspired by some projects that use CMake, and that also do support debug fission, e.g. the WebKit project has this resolved bug, along with this code in a current version.
  7. The LLVM_USE_SPLIT_DWARF option does not give us the “full” debug fission experience, as outlined in previous sections: Notably, there is no “linking” of .dwo files into .dwp debug packages, and the .dwo files are also not part of the final installation. That is, you will need access to the original build tree in order to access the debug information. This aspect of debug fission is therefore missing from the numbers presented below.

November 23, 2023 12:00 AM

November 19, 2023

Magnus Therning

Making Emacs without terminal emulator a little more usable

After reading Andrey Listopadov's You don't need a terminal emulator (mentioned at Irreal too) I decided to give up on using Emacs as a terminal for my shell. In my experience Emacs simply isn't a very good terminal to run a shell in anyway. I removed the almost completely unused shell-pop from my configuration and replaced its keybinding with a binding to async-shell-command. I'm keeping terminal-here in my config for the time being though.

I realised projectile didn't have a function for running it in the root of a project, so I wrote one heavily based on project-async-shell-command.

(defun mep-projectile-async-shell-command ()
  "Run `async-shell-command' in the current project's root directory."
  (declare (interactive-only async-shell-command))
  (interactive)
  (let ((default-directory (projectile-project-root)))
    (call-interactively #'async-shell-command)))

I quickly found that the completion offered by Emacs for shell-command and async-shell-command is far from as sophisticated as what I'm used to from Z shell. After a bit of searching I found emacs-bash-completion. Bash isn't my shell of choice, partly because I've found the completion to not be as good as in Z shell, but it's an improvement over what stock Emacs offers. The instructions in the repo were good, but had to be adjusted slightly:

(use-package bash-completion
  :straight (:host github
             :repo "szermatt/emacs-bash-completion")
  :config
  (add-hook 'shell-dynamic-complete-functions 'bash-completion-dynamic-complete))

I just wish I'll find a package offering completions reaching Z shell levels.

November 19, 2023 07:50 AM

November 16, 2023

Magnus Therning

Using the golang mode shipped with Emacs

A few weeks ago I wanted to try out tree-sitter and switched a few of the modes I use for coding to their -ts-mode variants. Based on the excellent How to Get Started with Tree-Sitter I added bits like this to the setup I have for coding modes:1

(use-package X-mode
  :init
  (add-to-list 'treesit-language-source-alist '(X "https://github.com/tree-sitter/tree-sitter-X"))
  ;; (treesit-install-language-grammar 'X)
  (add-to-list 'major-mode-remap-alist '(X-mode . X-ts-mode))
  ;; ...
  )

I then manually evaluated the expression that's commented out to download and compile the tree-sitter grammar. It's a rather small change, it works, and I can switch over language by language. I swapped a couple of languages to the tree-sitter modes like this, including golang. The only mode that I noticed changes in was golang, in particular my adding of gofmt-before-save to before-save-hook had stopped having any effect.

What I hadn't realised was that the go-mode I was using didn't ship with Emacs and that when I switched to go-ts-mode I switched to one that was. It turns out that gofmt-before-save is hard-wired to work only in go-mode, something others have noticed.

I don't feel like waiting for go-mode to fix that though, especially not when there's a perfectly fine golang mode shipping with Emacs now, and not when emacs-reformatter make it so easy to define formatters (as I've written about before).

My golang setup, sans keybindings, now looks like this:2

(use-package go-ts-mode
  :hook
  (go-ts-mode . lsp-deferred)
  (go-ts-mode . go-format-on-save-mode)
  :init
  (add-to-list 'treesit-language-source-alist '(go "https://github.com/tree-sitter/tree-sitter-go"))
  (add-to-list 'treesit-language-source-alist '(gomod "https://github.com/camdencheek/tree-sitter-go-mod"))
  ;; (dolist (lang '(go gomod)) (treesit-install-language-grammar lang))
  (add-to-list 'auto-mode-alist '("\\.go\\'" . go-ts-mode))
  (add-to-list 'auto-mode-alist '("/go\\.mod\\'" . go-mod-ts-mode))
  :config
  (reformatter-define go-format
    :program "goimports"
    :args '("/dev/stdin"))
  :general
  ;; ...
  )

So far I'm happy with the built-in go-ts-mode and I've got to say that using a minor mode for the format-on-save functionality is more elegant than adding a function to before-save-hook (something that go-mode may get through this PR).

Footnotes:

1

There were a few more things that I needed to modify. As the tree-sitter modes are completely separate from the non-tree-sitter modes, things like hooks and keybindings in the modes' keymaps had to be set up again for the new modes.

2

The full file is here.

November 16, 2023 06:02 AM

November 10, 2023

GHC Developer Blog

GHC 9.4.8 is now available

GHC 9.4.8 is now available

Zubin Duggal - 2023-11-10

The GHC developers are happy to announce the availability of GHC 9.4.8. Binary distributions, source distributions, and documentation are available on the release page.

This release is primarily a bugfix release addressing a few issues found in the 9.4 series. These include:

  • A fix for a recompilation checking bug where GHC may miss changes in transitive dependencies when deciding to relink a program (#23724).
  • A fix for a code generator bug on AArch64 platforms resulting in invalid conditional jumps (#23746).
  • Support for -split-sections on Windows.
  • Enabling -split-sections for various Linux and Windows binary distributions, enabling GHC to produce smaller binaries on these platforms.
  • And a few other fixes

A full accounting of changes can be found in the release notes. As some of the fixed issues do affect correctness users are encouraged to upgrade promptly.

We would like to thank Microsoft Azure, GitHub, IOG, the Zw3rk stake pool, Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprise this release.

As always, do give this release a try and open a ticket if you see anything amiss.

Enjoy!

-Bryan

by ghc-devs at November 10, 2023 12:00 AM

November 07, 2023

Chris Reade

Graphs, Kites and Darts – and Theorems

We continue our exploration of properties of Penrose’s aperiodic tilings with kites and darts using Haskell and Haskell Diagrams.

In this blog we discuss some interesting properties we have discovered concerning the \small\texttt{decompose}, \small\texttt{compose}, and \small\texttt{force} operations along with some proofs.

Index

  1. Quick Recap (including operations \small\texttt{compose}, \small\texttt{force}, \small\texttt{decompose} on Tgraphs)
  2. Composition Problems and a Compose Force Theorem (composition is not a simple inverse to decomposition)
  3. Perfect Composition Theorem (establishing relationships between \small\texttt{compose}, \small\texttt{force}, \small\texttt{decompose})
  4. Multiple Compositions (extending the Compose Force theorem for multiple compositions)
  5. Proof of the Compose Force Theorem (showing \small\texttt{compose} is total on forced Tgraphs)

1. Quick Recap

Haskell diagrams allowed us to render finite patches of tiles easily as discussed in Diagrams for Penrose tiles. Following a suggestion of Stephen Huggett, we found that the description and manipulation of such tilings is greatly enhanced by using planar graphs. In Graphs, Kites and Darts we introduced a specialised planar graph representation for finite tilings of kites and darts which we called Tgraphs (tile graphs). These enabled us to implement operations that use neighbouring tile information and in particular operations \small\texttt{decompose}, \small\texttt{compose}, and \small\texttt{force}.

For ease of reference, we reproduce the half-tiles we are working with here.

Figure 1: Half-tile faces

Figure 1 shows the right-dart (RD), left-dart (LD), left-kite (LK) and right-kite (RK) half-tiles. Each has a join edge (shown dotted) and a short edge and a long edge. The origin vertex is shown red in each case. The vertex at the opposite end of the join edge from the origin we call the opp vertex, and the remaining vertex we call the wing vertex.

If the short edges have unit length then the long edges have length \phi (the golden ratio) and all angles are multiples of 36^{\circ} (a tenth turn), with kite halves having two 2s and a 1 (72°, 72°, 36°), and dart halves having a 3 and two 1s (108°, 36°, 36°). This geometry of the tiles is abstracted away at the graph representation level but is used when checking validity of tile additions and by the drawing functions.
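As a quick sanity check of this geometry, here is a small self-contained Haskell fragment (independent of the Tgraph library; all names are ours, not the library's) encoding the angle bookkeeping:

```haskell
-- Golden ratio: the long-edge length when short edges have unit length.
phi :: Double
phi = (1 + sqrt 5) / 2

-- Half-tile interior angles as multiples of 36 degrees (a tenth turn):
-- kite halves have two 2s and a 1, dart halves have a 3 and two 1s.
kiteHalfAngles, dartHalfAngles :: [Int]
kiteHalfAngles = [2, 2, 1]
dartHalfAngles = [3, 1, 1]

-- Each half-tile is a triangle, so its angles sum to 180° = 5 tenth turns.
anglesOK :: Bool
anglesOK = all ((== 5) . sum) [kiteHalfAngles, dartHalfAngles]
```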

There are rules for how the tiles can be put together to make a legal tiling (see e.g. Diagrams for Penrose tiles). We defined a Tgraph (in Graphs, Kites and Darts) as a list of such half-tiles which are constrained to form a legal tiling but must also be connected with no crossing boundaries (see below).

As a simple example consider kingGraph (2 kites and 3 darts round a king vertex). We represent each half-tile as a TileFace with three vertex numbers, then apply makeTgraph to the list of ten TileFaces. The function makeTgraph :: [TileFace] -> Tgraph performs the necessary checks to ensure the result is a valid Tgraph.

kingGraph :: Tgraph
kingGraph = makeTgraph 
  [LD (1,2,3),RD (1,11,2),LD (1,4,5),RD (1,3,4),LD (1,10,11)
  ,RD (1,9,10),LK (9,1,7),RK (9,7,8),RK (5,7,1),LK (5,6,7)
  ]

To view the Tgraph we simply form a diagram (in this case 2 diagrams horizontally separated by 1 unit)

  hsep 1 [labelled drawj kingGraph, draw kingGraph]

and the result is shown in figure 2 with labels and dashed join edges (left) and without labels and join edges (right).

Figure 2: kingGraph with labels and dashed join edges (left) and without (right).

The boundary of the Tgraph consists of the edges of half-tiles which are not shared with another half-tile, so they go round untiled/external regions. The no crossing boundary constraint (equivalently, locally tile-connected) means that a boundary vertex has exactly two incident boundary edges and therefore has a single external angle in the tiling. This ensures we can always locally determine the relative angles of tiles at a vertex. We say a collection of half-tiles is a valid Tgraph if it constitutes a legal tiling but also satisfies the connectedness and no crossing boundaries constraints.

Our key operations on Tgraphs are \small\texttt{decompose}, \small\texttt{force}, and \small\texttt{compose} which are illustrated in figure 3.

Figure 3: decompose, force, and compose

Figure 3 shows the kingGraph with its decomposition above it (left), the result of forcing the kingGraph (right) and the composition of the forced kingGraph (bottom right).

Decompose

An important property of Penrose dart and kite tilings is that it is possible to divide the half-tile faces of a tiling into smaller half-tile faces, to form a new (smaller scale) tiling.

Figure 4: Decomposition of (left) half-tiles

Figure 4 illustrates the decomposition of a left-dart (top row) and a left-kite (bottom row). With our Tgraph representation we simply introduce new vertices for dart and kite long edges and kite join edges and then form the new faces using these. This does not involve any geometry, because that is taken care of by drawing operations.

Force

Figure 5 illustrates the rules used by our \small\texttt{force} operation (we omit a mirror-reflected version of each rule).

Figure 5: Force rules

In each case the yellow half-tile is added in the presence of the other half-tiles shown. The yellow half-tile is forced because, by the legal tiling rules, there is no choice for adding a different half-tile on the edge where the yellow tile is added.

We call a Tgraph correct if it represents a tiling which can be continued infinitely to cover the whole plane without getting stuck, and incorrect otherwise. Forcing involves adding half-tiles by the illustrated rules round the boundary until either no more rules apply (in which case the result is a forced Tgraph) or a stuck tiling is encountered (in which case an incorrect Tgraph error is raised). Hence \small\texttt{force} is a partial function but total on correct Tgraphs.

Compose: This is discussed in the next section.

2. Composition Problems and a Theorem

Compose Choices

For an infinite tiling, composition is a simple inverse to decomposition. However, for a finite tiling with boundary, composition is not so straightforward. Firstly, we may need to leave half-tiles out of a composition because the necessary parts of a composed half-tile are missing. For example, a half-dart with a boundary short edge or a whole kite with both short edges on the boundary must necessarily be excluded from a composition. Secondly, on the boundary, there can sometimes be a problem of choosing whether a half-dart should compose to become a half-dart or a half-kite. This choice in composing only arises when there is a half-dart with its wing on the boundary but insufficient local information to determine whether it should be part of a larger half-dart or a larger half-kite.

In the literature (see for example 1 and 2) there is an often repeated method for composing (also called inflating). This method always makes the kite choice when there is a choice. Whilst this is a sound method for an unbounded tiling (where there will be no choice), we show that this is an unsound method for finite tilings as follows.

Clearly composing should preserve correctness. However, figure 6 (left) shows a correct Tgraph which is a forced queen, but the kite-favouring composition of the forced queen produces the incorrect Tgraph shown in figure 6 (centre). Applying our \small\texttt{force} function to this reveals a stuck tiling and reports an incorrect Tgraph.

Figure 6: An erroneous and a safe composition

Our algorithm (discussed in Graphs, Kites and Darts) detects dart wings on the boundary where there is a choice and classifies them as unknowns. Our composition refrains from making a choice by not composing a half-dart with an unknown wing vertex. The rightmost Tgraph in figure 6 shows the result of our composition of the forced queen with the half-tile faces left out of the composition (the remainder faces) shown in green. This avoidance of making a choice (when there is a choice) guarantees our composition preserves correctness.

Compose is a Partial Function

A different composition problem can arise when we consider Tgraphs that are not decompositions of Tgraphs. In general, \small\texttt{compose} is a partial function on Tgraphs.

Figure 7: Composition may fail to produce a Tgraph

Figure 7 shows a Tgraph (left) with its successful composition (centre) and the half-tile faces that would result from a second composition (right) which do not form a valid Tgraph because of a crossing boundary (at vertex 6). Thus composition of a Tgraph may fail to produce a Tgraph when the resulting faces are disconnected or have a crossing boundary.

However, we claim that \small\texttt{compose} is a total function on forced Tgraphs.

Compose Force Theorem

Theorem: Composition of a forced Tgraph produces a valid Tgraph.

We postpone the proof (outline) for this theorem to section 5. Meanwhile we use the result to establish relationships between \small\texttt{compose}, \small\texttt{force}, and \small\texttt{decompose} in the next section.

3. Perfect Composition Theorem

In Graphs, Kites and Darts we produced a diagram showing relationships between multiple decompositions of a dart and the forced versions of these Tgraphs. We reproduce this here along with a similar diagram for multiple decompositions of a kite.

Figure 8: Commuting Diagrams

In figure 8 we show separate (apparently) commuting diagrams for the dart and for the kite. The bottom rows show the decompositions, the middle rows show the result of forcing the decompositions, and the top rows illustrate how the compositions of the forced Tgraphs work by showing both the composed faces (black edges) and the remainder faces (green edges) which are removed in the composition. The diagrams are examples of some commutativity relationships concerning \small\texttt{force}, \small\texttt{compose} and \small\texttt{decompose} which we will prove.

It should be noted that these diagrams break down if we consider only half-tiles as the starting points (bottom right of each diagram). The decomposition of a half-tile does not recompose to its original, but produces an empty composition. So we do not even have g = (\small\texttt{compose} \cdot \small\texttt{decompose}) \ g in these cases. Forcing the decomposition also results in an empty composition. Clearly there is something special about the depicted cases and it is not merely that they are wholetile complete because the decompositions are not wholetile complete. [Wholetile complete means there are no join edges on the boundary, so every half-tile has its other half.]

Below we have captured the properties that are sufficient for the diagrams to commute as in figure 8. In the proofs we use a partial ordering on Tgraphs (modulo vertex relabelling) which we define next.

Partial ordering of Tgraphs

If g_0 and g_1 are both valid Tgraphs and g_0 consists of a subset of the (half-tile) faces of g_1 we have

\displaystyle g_0 \subseteq g_1

which gives us a partial order on Tgraphs. Often, though, g_0 is only isomorphic to a subset of the faces of g_1, requiring a vertex relabelling to become a subset. In that case we write

\displaystyle g_0 \sqsubseteq g_1

which is also a partial ordering and induces an equivalence of Tgraphs defined by

\displaystyle g_0 \equiv g_1 \text{ if and only if } g_0 \sqsubseteq g_1 \text{ and } g_1 \sqsubseteq g_0

in which case g_0 and g_1 are isomorphic as Tgraphs.
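Ignoring vertex relabelling for a moment, the plain subset order \subseteq and its induced equivalence can be sketched directly on face lists. The following self-contained Haskell fragment is illustrative only (this TileFace and Tgraph are stand-ins for the library's types, and handling \sqsubseteq would additionally need a search for a vertex relabelling, which we omit):

```haskell
data TileFace = LD (Int,Int,Int) | RD (Int,Int,Int)
              | LK (Int,Int,Int) | RK (Int,Int,Int)
              deriving (Eq, Show)

newtype Tgraph = Tgraph [TileFace]

-- g0 ⊆ g1: every (half-tile) face of g0 is a face of g1
subTgraph :: Tgraph -> Tgraph -> Bool
subTgraph (Tgraph fs0) (Tgraph fs1) = all (`elem` fs1) fs0

-- the induced equivalence: mutual inclusion
equivTgraph :: Tgraph -> Tgraph -> Bool
equivTgraph g0 g1 = subTgraph g0 g1 && subTgraph g1 g0
```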

Both \small\texttt{compose} and \small\texttt{decompose} are monotonic with respect to \sqsubseteq meaning:

\displaystyle g_0 \sqsubseteq g_1 \text{ implies } \small\texttt{compose} \ g_0 \sqsubseteq \small\texttt{compose} \ g_1 \text{ and } \small\texttt{decompose} \ g_0 \sqsubseteq \small\texttt{decompose} \ g_1

We also have \small\texttt{force} is monotonic, but only when restricted to correct Tgraphs. Also, when restricted to correct Tgraphs, we have \small\texttt{force} is non decreasing because it only adds faces:

\displaystyle g \sqsubseteq \small\texttt{force} \ g

and \small\texttt{force} is idempotent (forcing a forced correct Tgraph leaves it the same):

\displaystyle (\small\texttt{force} \cdot \small\texttt{force}) \ g \equiv \small\texttt{force} \ g
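Taken together, these laws say that on correct Tgraphs \small\texttt{force} behaves like a closure operator. As a toy illustration (not the library's code), any closure operator on sets satisfies the same laws; here is a minimal self-contained instance where "forcing" closes an Int set under the rule "if 1 is present, 2 must be too":

```haskell
import qualified Data.Set as Set

-- A toy closure operator standing in for force on Tgraphs.
forceToy :: Set.Set Int -> Set.Set Int
forceToy s = if 1 `Set.member` s then Set.insert 2 s else s

-- force is non decreasing: g ⊑ force g
nonDecreasing :: Set.Set Int -> Bool
nonDecreasing g = g `Set.isSubsetOf` forceToy g

-- force is idempotent: force (force g) ≡ force g
idempotent :: Set.Set Int -> Bool
idempotent g = forceToy (forceToy g) == forceToy g
```

One can check that forceToy is also monotonic with respect to subset inclusion, mirroring the monotonicity of \small\texttt{force} on correct Tgraphs.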

Composing perfectly and perfect compositions

Definition: A Tgraph g composes perfectly if all faces of g are composable (i.e there are no remainder faces of g when composing).

We note that the composed faces must be a valid Tgraph (connected with no crossing boundaries) if all faces are included in the composition because g has those properties. Clearly, if g composes perfectly then

\displaystyle (\small\texttt{decompose} \cdot \small\texttt{compose}) \ g \equiv g

In general, for arbitrary g where the composition is defined, we only have

\displaystyle (\small\texttt{decompose} \cdot \small\texttt{compose}) \ g \sqsubseteq g

Definition: A Tgraph g' is a perfect composition if \small\texttt{decompose} \ g' composes perfectly.

Clearly if g' is a perfect composition then

\displaystyle (\small\texttt{compose} \cdot \small\texttt{decompose}) \ g' \equiv g'

(We could use equality here because any new vertex labels introduced by \small\texttt{decompose} will be removed by \small\texttt{compose}). In general, for arbitrary g',

\displaystyle (\small\texttt{compose} \cdot \small\texttt{decompose}) \ g' \sqsubseteq g'

Lemma 1: g' is a perfect composition if and only if g' has the following 2 properties:

  1. every half-kite with a boundary join has either a half-dart or a whole kite on the short edge, and
  2. every half-dart with a boundary join has a half-kite on the short edge,

(Proof outline:) Firstly note that unknowns in g (= \small\texttt{decompose} \ g') can only come from boundary joins in g'. The properties 1 and 2 guarantee that g has no unknowns. Since every face of g has come from a decomposed face in g', there can be no faces in g that will not recompose, so g will compose perfectly to g'. Conversely, if g' is a perfect composition, its decomposition g can have no unknowns. This implies boundary joins in g' must satisfy properties 1 and 2. \square

(Note: a perfect composition g' may have unknowns even though its decomposition g has none.)

It is easy to see two special cases:

  1. If g' is wholetile complete then g' is a perfect composition. Proof: Wholetile complete implies no boundary joins which implies properties 1 and 2 in lemma 1 which implies g' is a perfect composition. \square
  2. If g' is a decomposition then g' is a perfect composition. Proof: If g' is a decomposition, then every half-dart has a half-kite on the short edge which implies property 2 of lemma 1. Also, any half-kite with a boundary join in g' must have come from a decomposed half-dart since a decomposed half-kite produces a whole kite with no boundary kite join. So the half-kite must have a half-dart on the short edge which implies property 1 of lemma 1. The two properties imply g' is a perfect composition. \square

We note that these two special cases cover all the Tgraphs in the bottom rows of the diagrams in figure 8. So the Tgraphs in each bottom row are perfect compositions, and furthermore, they all compose perfectly except for the rightmost Tgraphs which have empty compositions.

In the following results we make the assumption that a Tgraph is correct, which guarantees that when \small\texttt{force} is applied, it terminates with a correct Tgraph. We also note that \small\texttt{decompose} preserves correctness as does \small\texttt{compose} (provided the composition is defined).

Lemma 2: If g_f is a forced, correct Tgraph then

\displaystyle (\small\texttt{compose} \cdot \small\texttt{force} \cdot \small\texttt{decompose}) \ g_f \equiv g_f

(Proof outline:) The proof uses a case analysis of boundary and internal vertices of g_f. For internal vertices we just check there is no change at the vertex after (\small\texttt{compose} \cdot \small\texttt{force} \cdot \small\texttt{decompose}) using figure 11 (plus an extra case for the forced star). For boundary vertices we check local contexts similar to those depicted in figure 10 (but including empty composition cases). This reveals there is no local change of the boundary at any boundary vertex, and since this is true for all boundary vertices, there can be no global change. (We omit the full details). \square

Lemma 3: If g' is a perfect composition and a correct Tgraph, then

\displaystyle \small\texttt{force} \ g' \sqsubseteq (\small\texttt{compose} \cdot \small\texttt{force} \cdot \small\texttt{decompose}) \ g'

(Proof outline:) The proof is by analysis of each possible force rule applicable on a boundary edge of g' and checking local contexts to establish that (i) the result of applying (\small\texttt{compose} \cdot \small\texttt{force} \cdot \small\texttt{decompose}) to the local context must include the added half-tile, and (ii) if the added half-tile has a new boundary join, then the result must include both halves of the new half-tile. The two properties of perfect compositions mentioned in lemma 1 are critical for the proof. However, since the result of adding a single half-tile may break the condition of the Tgraph being a perfect composition, we need to arrange that half-tiles are completed first, then each subsequent half-tile addition is paired with its wholetile completion. This ensures the perfect composition condition holds at each step for a proof by induction. [A separate proof is needed to show that the ordering of applying force rules makes no difference to a final correct Tgraph (apart from vertex relabelling)]. \square

Lemma 4 If g composes perfectly and is a correct Tgraph then

\displaystyle \small\texttt{force} \ g \equiv (\small\texttt{force} \cdot \small\texttt{decompose} \cdot \small\texttt{force} \cdot \small\texttt{compose})\ g

Proof: Assume g composes perfectly and is a correct Tgraph. Since \small\texttt{force} is non-decreasing (with respect to \sqsubseteq on correct Tgraphs)

\displaystyle \small\texttt{compose} \ g \sqsubseteq (\small\texttt{force} \cdot \small\texttt{compose}) \ g

and since \small\texttt{decompose} is monotonic

\displaystyle (\small\texttt{decompose} \cdot \small\texttt{compose}) \ g \sqsubseteq (\small\texttt{decompose} \cdot \small\texttt{force} \cdot \small\texttt{compose}) \ g

Since g composes perfectly, the left hand side is just g, so

\displaystyle g \sqsubseteq (\small\texttt{decompose} \cdot \small\texttt{force} \cdot \small\texttt{compose}) \ g

and since \small\texttt{force} is monotonic (with respect to \sqsubseteq on correct Tgraphs)

\displaystyle (*) \ \ \ \ \ \small\texttt{force} \ g \sqsubseteq (\small\texttt{force} \cdot \small\texttt{decompose} \cdot \small\texttt{force} \cdot \small\texttt{compose}) \ g

For the opposite direction, we substitute \small\texttt{compose} \ g for g' in lemma 3 to get

\displaystyle (\small\texttt{force} \cdot \small\texttt{compose}) \ g \sqsubseteq (\small\texttt{compose} \cdot \small\texttt{force} \cdot \small\texttt{decompose} \cdot \small\texttt{compose}) \ g

Then, since (\small\texttt{decompose} \cdot \small\texttt{compose}) \ g \equiv g, we have

\displaystyle (\small\texttt{force} \cdot \small\texttt{compose}) \ g \sqsubseteq (\small\texttt{compose} \cdot \small\texttt{force}) \ g

Apply \small\texttt{decompose} to both sides (using monotonicity)

\displaystyle (\small\texttt{decompose} \cdot \small\texttt{force} \cdot \small\texttt{compose}) \ g \sqsubseteq (\small\texttt{decompose} \cdot \small\texttt{compose} \cdot \small\texttt{force}) \ g

For any g'' for which the composition is defined we have (\small\texttt{decompose} \cdot \small\texttt{compose})\ g'' \sqsubseteq g'' so we get

\displaystyle (\small\texttt{decompose} \cdot \small\texttt{force} \cdot \small\texttt{compose}) \ g \sqsubseteq \small\texttt{force} \ g

Now apply \small\texttt{force} to both sides and note (\small\texttt{force} \cdot \small\texttt{force})\ g \equiv \small\texttt{force} \ g to get

\displaystyle (\small\texttt{force} \cdot \small\texttt{decompose} \cdot \small\texttt{force} \cdot \small\texttt{compose}) \ g \sqsubseteq \small\texttt{force} \ g

Combining this with (*) above proves the required equivalence. \square

Theorem (Perfect Composition): If g composes perfectly and is a correct Tgraph then

\displaystyle (\small\texttt{compose} \cdot \small\texttt{force}) \ g \equiv (\small\texttt{force} \cdot \small\texttt{compose}) \ g

Proof: Assume g composes perfectly and is a correct Tgraph. By lemma 4 we have

\displaystyle \small\texttt{force} \ g \equiv (\small\texttt{force} \cdot \small\texttt{decompose} \cdot \small\texttt{force} \cdot \small\texttt{compose})\ g

Applying \small\texttt{compose} to both sides, gives

\displaystyle (\small\texttt{compose} \cdot \small\texttt{force}) \ g \equiv (\small\texttt{compose} \cdot \small\texttt{force} \cdot \small\texttt{decompose} \cdot \small\texttt{force} \cdot \small\texttt{compose})\ g

Now by lemma 2, with g_f = (\small\texttt{force} \cdot \small\texttt{compose}) \ g, the right hand side is equivalent to

\displaystyle (\small\texttt{force} \cdot \small\texttt{compose}) \ g

which establishes the result. \square

Corollaries (of the perfect composition theorem):

  1. If g' is a perfect composition and a correct Tgraph then
    \displaystyle \small\texttt{force} \ g' \equiv (\small\texttt{compose} \cdot \small\texttt{force} \cdot \small\texttt{decompose}) \ g'

    Proof: Let g' = \small\texttt{compose} \ g (so g \equiv \small\texttt{decompose} \ g') in the theorem. \square

    [This result generalises lemma 2 because any correct forced Tgraph g_f is necessarily wholetile complete and therefore a perfect composition, and \small\texttt{force} \ g_f \equiv g_f.]

  2. If g' is a perfect composition and a correct Tgraph then
    \displaystyle (\small\texttt{decompose} \cdot \small\texttt{force}) \ g' \sqsubseteq (\small\texttt{force} \cdot \small\texttt{decompose}) \ g'

    Proof: Apply \small\texttt{decompose} to both sides of the previous corollary and note that

    \displaystyle (\small\texttt{decompose} \cdot \small\texttt{compose}) \ g'' \sqsubseteq g'' \textit{ for any } g''

    provided the composition is defined, which it must be for a forced Tgraph by the Compose Force theorem. \square

  3. If g' is a perfect composition and a correct Tgraph then
    \displaystyle (\small\texttt{force} \cdot \small\texttt{decompose}) \ g' \equiv (\small\texttt{force} \cdot \small\texttt{decompose} \cdot \small\texttt{force}) \ g'

    Proof: Apply \small\texttt{force} to both sides of the previous corollary noting \small\texttt{force} is monotonic and idempotent for correct Tgraphs

    \displaystyle (\small\texttt{force} \cdot \small\texttt{decompose} \cdot \small\texttt{force}) \ g' \sqsubseteq (\small\texttt{force} \cdot \small\texttt{decompose}) \ g'

    From the fact that \small\texttt{force} is non decreasing and \small\texttt{decompose} and \small\texttt{force} are monotonic, we also have

    \displaystyle (\small\texttt{force} \cdot \small\texttt{decompose}) \ g' \sqsubseteq (\small\texttt{force} \cdot \small\texttt{decompose} \cdot \small\texttt{force}) \ g'

    Hence combining these two sub-Tgraph results we have

    \displaystyle (\small\texttt{force} \cdot \small\texttt{decompose}) \ g' \equiv (\small\texttt{force} \cdot \small\texttt{decompose} \cdot \small\texttt{force}) \ g'

    \square
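
The properties of \small\texttt{force} used throughout these proofs (monotone, non-decreasing, and idempotent on correct Tgraphs) are exactly the laws of a closure operator. As a self-contained illustration only (a toy closure operator on finite sets of integers, not the tilings library), the following Haskell sketch checks the three laws, with set inclusion standing in for the sub-Tgraph relation:

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- A toy closure operator standing in for `force`: close a set of
-- naturals downwards to 0.  Like `force` it is monotone,
-- non-decreasing (s is a subset of close s) and idempotent.
close :: Set Int -> Set Int
close s = case Set.lookupMax s of
  Nothing -> Set.empty
  Just m  -> Set.fromList [0 .. m]

-- The three closure-operator laws, checked on sample inputs.
lawsHold :: Bool
lawsHold =
  let g  = Set.fromList [2, 5]
      g' = Set.fromList [2, 5, 7]
  in Set.isSubsetOf (close g) (close g')  -- monotone (on g below g')
       && Set.isSubsetOf g (close g)      -- non-decreasing
       && close (close g) == close g      -- idempotent
```

These three laws are all that the two-sided (sandwich) arguments in the proofs above rely on.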

It is important to point out that if g is a correct Tgraph and \small\texttt{compose} \ g is a perfect composition, this is not the same as saying that g composes perfectly. It could be the case that g has more faces than (\small\texttt{decompose} \cdot \small\texttt{compose}) \ g and so g could have unknowns. In this case we can only prove that

\displaystyle (\small\texttt{force} \cdot \small\texttt{compose}) \ g \sqsubseteq (\small\texttt{compose} \cdot \small\texttt{force}) \ g

As an example where this is not an equivalence, choose g to be a star. Then its composition is the empty Tgraph (which is still a perfect composition) and so the left hand side is the empty Tgraph, but the right hand side is a sun.

Perfectly composing generators

The perfect composition theorem and lemmas and the three corollaries justify all the commuting implied by the diagrams in figure 8. However, one might ask more general questions like: Under what circumstances do we have (for a correct forced Tgraph g_f)

\displaystyle (\small\texttt{force} \cdot \small\texttt{decompose} \cdot \small\texttt{compose}) \ g_f \equiv g_f

Definition A generator of a correct forced Tgraph g_f is any Tgraph g such that g \sqsubseteq g_f and \small\texttt{force} \ g \equiv g_f.

We can now state that

Corollary If a correct forced Tgraph g_f has a generator which composes perfectly, then

\displaystyle (\small\texttt{force} \cdot \small\texttt{decompose} \cdot \small\texttt{compose}) \ g_f \equiv g_f

Proof: This follows directly from lemma 4 and the perfect composition theorem. \square

As an example where the required generator does not exist, consider the rightmost Tgraph of the middle row in figure 9. It is generated by the Tgraph directly below it, but it has no generator with a perfect composition. The Tgraph directly above it in the top row is the result of applying (\small\texttt{force} \cdot \small\texttt{decompose} \cdot \small\texttt{compose}) which has lost the leftmost dart of the Tgraph.

Figure 9: A Tgraph without a perfectly composing generator
Figure 9: A Tgraph without a perfectly composing generator

We could summarise this section by saying that \small\texttt{compose} can lose information which cannot be recovered by a subsequent \small\texttt{force} and, similarly, \small\texttt{decompose} can lose information which cannot be recovered by a subsequent \small\texttt{force}. We have defined perfect compositions, which are the Tgraphs that do not lose information when decomposed, and Tgraphs which compose perfectly, which are those that do not lose information when composed. Forcing does the same thing at each level of composition (that is, it commutes with composition) provided information is not lost when composing.

4. Multiple Compositions

We know from the Compose Force theorem that the composition of a Tgraph that is forced is always a valid Tgraph. In this section we use this and the results from the last section to show that composing a forced, correct Tgraph produces a forced Tgraph.

First we note that:

Lemma 5: The composition of a forced, correct Tgraph is wholetile complete.

Proof: Let g' = \small\texttt{compose} \ g_f where g_f is a forced, correct Tgraph. A boundary join in g' implies there must be a boundary dart wing of the composable faces of g_f. (See for example figure 4 where this would be vertex 2 for the half dart case, and vertex 5 for the half-kite face). This dart wing cannot be an unknown as the half-dart is in the composable faces. However, a known dart wing must be either a large kite centre or a large dart base and therefore internal in the composable faces of g_f (because of the force rules) and therefore not on the boundary in g'. This is a contradiction showing that g' can have no boundary joins and is therefore wholetile complete. \square

Theorem: The composition of a forced, correct Tgraph is a forced Tgraph.

Proof: Let g' = \small\texttt{compose} \ g_f for some forced, correct Tgraph g_f, then g' is wholetile complete (by lemma 5) and therefore a perfect composition. Let g = \small\texttt{decompose} \ g', so g composes perfectly (g' \equiv \small\texttt{compose} \ g). By the perfect composition theorem we have

\displaystyle (**) \ \ \ \ \ (\small\texttt{compose} \cdot \small\texttt{force}) \ g \equiv (\small\texttt{force} \cdot \small\texttt{compose}) \ g \equiv \small\texttt{force} \ g'

We also have

\displaystyle g = \small\texttt{decompose} \ g' = (\small\texttt{decompose} \cdot \small\texttt{compose}) \ g_f \sqsubseteq g_f

Applying \small\texttt{force} to both sides, noting that \small\texttt{force} is monotonic and the identity on forced Tgraphs, we have

\displaystyle \small\texttt{force} \ g \sqsubseteq \small\texttt{force} \ g_f \equiv g_f

Applying \small\texttt{compose} to both sides, noting that \small\texttt{compose} is monotonic, we have

\displaystyle (\small\texttt{compose} \cdot \small\texttt{force}) \ g \sqsubseteq \small\texttt{compose} \ g_f \equiv g'

By (**) above, the left hand side is equivalent to \small\texttt{force} \ g' so we have

\displaystyle \small\texttt{force} \ g' \sqsubseteq g'

but since we also have (\small\texttt{force} being non-decreasing)

\displaystyle g' \sqsubseteq \small\texttt{force} \ g'

we have established that

\displaystyle g' \equiv \small\texttt{force} \ g'

which means g' is a forced Tgraph. \square

This result means that after forcing once we can repeatedly compose creating valid Tgraphs until we reach the empty Tgraph.
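
In code, this says that after a single \small\texttt{force} we may iterate \small\texttt{compose} freely. A hedged, pseudocode-style Haskell sketch (\small\texttt{force} and \small\texttt{compose} are the operations discussed in this article; the Tgraph type and the emptiness test nullGraph are assumed names for illustration, not checked against the library):

```haskell
-- Sketch only: iterate composition after one force.  Each element of
-- the list is a valid (indeed forced) Tgraph, and the sequence
-- eventually reaches the empty Tgraph.
compositions :: Tgraph -> [Tgraph]
compositions g = takeWhile (not . nullGraph) (iterate compose (force g))
```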

We can also use lemma 5 to establish the converse to a previous corollary:

Corollary If a correct forced Tgraph g_f satisfies:

\displaystyle (\small\texttt{force} \cdot \small\texttt{decompose} \cdot \small\texttt{compose}) \ g_f \equiv g_f

then g_f has a generator which composes perfectly.

Proof: By lemma 5, \small\texttt{compose} \ g_f is wholetile complete and hence a perfect composition. This means that (\small\texttt{decompose} \cdot \small\texttt{compose}) \ g_f composes perfectly and it is also a generator for g_f because

\displaystyle (\small\texttt{force} \cdot \small\texttt{decompose} \cdot \small\texttt{compose}) \ g_f \equiv g_f

\square

5. Proof of the Compose Force theorem

Theorem (Compose Force): Composition of a forced Tgraph produces a valid Tgraph.

Proof: For any forced Tgraph we can construct the composed faces. For the result to be a valid Tgraph we need to show no crossing boundaries and connectedness for the composed faces. These are proved separately by case analysis below.

Proof of no crossing boundaries

Assume g_f is a forced Tgraph and that it has a non-empty set of composed faces (we can ignore cases where the composition is empty as the empty Tgraph is valid). Consider a vertex v in the composed faces of g_f and first take the case that v is on the boundary of g_f. We consider the possible local contexts for a vertex v on a forced Tgraph boundary and the nature of the composed faces at v in each case.

Figure 10: Forced Boundary Vertex Contexts
Figure 10: Forced Boundary Vertex Contexts

Figure 10 shows local contexts for a boundary vertex v in a forced Tgraph where the composition is non-empty. In each case v is shown as a red dot, and the composition is shown filled yellow. The cases for v are shown in rows: the first row is for dart origins, the second row is for kite origins, the next two rows are for kite wings, and the last two rows are for kite opps. The dart wing cases are a subset of the kite opp cases, so not repeated, and dart opp vertices are excluded because they cannot be on the boundary of a forced Tgraph. We only show left-hand versions, so there is a mirror symmetric set for right-hand versions.

It is easy to see that there are no crossing boundaries of the composed faces at v in each case. Since any boundary vertex of any forced Tgraph (with a non-empty composition) must match one of these local context cases around the vertex, we can conclude that a boundary vertex of g_f cannot become a crossing boundary in \small\texttt{compose} \ g_f.

Next take the case where v is an internal vertex of g_f.

Figure 11: Vertex types and their relationships
Figure 11: Vertex types and their relationships

Figure 11 shows relationships between the forced Tgraphs of the 7 (internal) vertex types (plus a kite at the top right). The red faces are those around the vertex type and the black faces are those produced by forcing (if any). Each forced Tgraph has its composition directly above with empty compositions for the top row. We note that a (forced) star, jack, king, and queen vertex remains an internal vertex in the respective composition so cannot become a crossing boundary vertex. A deuce vertex becomes the centre of a larger kite and is no longer present in the composition (top right). That leaves cases for the sun vertex and ace vertex (=fool vertex). The sun Tgraph (sunGraph) and fool Tgraph (fool) consist of just the red faces at the respective vertex (shown top left and top centre). These both have empty compositions when there is no surrounding context. We thus need to check possible forced local contexts for sunGraph and fool.

The fool case is simple and similar to a deuce vertex in that it is never part of a composition. [To see this consider inverting the decomposition arrows shown in figure 4. In both cases we see the half-dart opp vertex (labelled 4 in figure 4) is removed].

For the sunGraph there are only 7 local forced context cases to consider where the sun vertex is on the boundary of the composition.

Figure 12: Forced Contexts for a sun vertex v where v is on the composition boundary
Figure 12: Forced Contexts for a sun vertex v where v is on the composition boundary

Six of these are shown in figure 12 (the missing one is just a mirror reflection of the fourth case). Again, the relevant vertex v is shown as a red dot and the composed faces are shown filled yellow, so it is easy to check that there is no crossing boundary of the composed faces at v in each case. Every forced Tgraph containing an internal sun vertex where the vertex is on the boundary of the composition must match one of the 7 cases locally round the vertex.

Thus no vertex from g_f can become a crossing boundary vertex in the composed faces and since the vertices of the composed faces are a subset of those of g_f, we can have no crossing boundary vertex in the composed faces.

Proof of Connectedness

Assume g_f is a forced Tgraph as before. We refer to the half-tile faces of g_f that get included in the composed faces as the composable faces and the rest as the remainder faces. We want to prove that the composable faces are connected as this will imply the composed faces are connected.

As before we can ignore cases where the set of composable faces is empty, and assume this is not the case. We study the nature of the remainder faces of g_f. Firstly, we note:

Lemma (remainder faces)

The remainder faces of g_f are made up entirely of groups of half-tiles which are either:

  1. Half-fools (= a half dart and both halves of the kite attached to its short edge) where the other half-fool is entirely composable faces, or
  2. Both halves of a kite with both short edges on the (g_f) boundary (so they are not part of a half-fool) where only the origin is in common with composable faces, or
  3. Whole fools with just the shared kite origin in common with composable faces.
Figure 13: Remainder face groups (cases 1,2, and 3)
Figure 13: Remainder face groups (cases 1,2, and 3)

These 3 cases of remainder face groups are shown in figure 13. In each case the border in common with composable faces is shown yellow and the red edges are necessarily on the boundary of g_f (the black boundary could be on the boundary of g_f or shared with another remainder face group). [A mirror symmetric version for the first group is not shown.] Examples can be seen in e.g. figure 12 where the first Tgraph has four examples of case 1, and two of case 2, the second has six examples of case 1 and two of case 2, and the fifth Tgraph has an example of case 3 as well as four of case 1. [We omit the detailed proof of this lemma which reasons about what gets excluded in a composition after forcing. However, all the local context cases are included in figure 14 (left-hand versions), where we only show those contexts where there is a non-empty composition.]

We note from the (remainder faces) lemma that the common boundary of the group of remainder faces with the composable faces (shown yellow in figure 13) is just a single vertex in cases 2 and 3. In case 1, the common boundary is just a single edge of the composed faces which is made up of 2 adjacent edges of the composable faces that constitute the join of two half-fools.

This means each (remainder face) group shares boundary with exactly one connected component of the composable faces.

Next we establish that if two (remainder face) groups are connected they must share boundary with the same connected component of the composable faces. We need to consider how each (remainder face) group can be connected with a neighbouring such group. It is enough to consider forced contexts of boundary dart long edges (for cases 1 and 3) and boundary kite short edges (for case 2). The cases where the composition is non-empty all appear in figure 14 (left-hand versions) along with boundary kite long edges (middle two rows) which are not relevant here.

Figure 14: Forced contexts for boundary edges
Figure 14: Forced contexts for boundary edges

We note that, whenever one group of the remainder faces (half-fool, whole-kite, whole-fool) is connected to a neighbouring group of the remainder faces, the common boundary (shared edges and vertices) with the composable faces is also connected, forming either 2 adjacent composed face boundary edges (= 4 adjacent edges of the composable faces), or a composed face boundary edge and one of its end vertices, or a single composed face boundary vertex.

It follows that any connected collection of the remainder face groups shares boundary with a unique connected component of the composable faces. Since the collection of composable and remainder faces together is connected (g_f is connected) the removal of the remainder faces cannot disconnect the composable faces. For this to happen, at least one connected collection of remainder face groups would have to be connected to more than one connected component of composable faces.

This establishes connectedness of any composition of a forced Tgraph, and this completes the proof of the Compose Force theorem. \square

References

[1] Martin Gardner (1977) MATHEMATICAL GAMES. Scientific American, 236(1), (pages 110 to 121). http://www.jstor.org/stable/24953856

[2] Grünbaum B., Shephard G.C. (1987) Tilings and Patterns. W. H. Freeman and Company, New York. ISBN 0-7167-1193-1 (Hardback) (pages 540 to 542).

by readerunner at November 07, 2023 01:55 PM

Donnacha Oisín Kidney

POPL Paper—Algebraic Effects Meet Hoare Logic in Cubical Agda

Posted on November 7, 2023

New paper: “Algebraic Effects Meet Hoare Logic in Cubical Agda”, by myself, Zhixuan Yang, and Nicolas Wu, will be published at POPL 2024.

Zhixuan has a nice summary of it here.

The preprint is available here.

by Donnacha Oisín Kidney at November 07, 2023 12:00 AM

November 01, 2023

Joachim Breitner

Joining the Lean FRO

Tomorrow is going to be a new first day in a new job for me: I am joining the Lean FRO, and I’m excited.

What is Lean?

Lean is the new kid on the block of theorem provers.

It’s a pure functional programming language (like Haskell, with and on which I have worked a lot), but it’s dependently typed (which Haskell may be evolving to be as well, but rather slowly and carefully). It has a refreshing syntax, built on top of a rather good (I have been told, not an expert here) macro system.

As a dependently typed programming language, it is also a theorem prover, or proof assistant, and there exists already a lively community of mathematicians who started to formalize mathematics in a coherent library, creatively called mathlib.

What is a FRO?

A Focused Research Organization has the organizational form of a small start up (small team, little overhead, a few years of runway), but its goals and measure for success are not commercial, as funding is provided by donors (in the case of the Lean FRO, the Simons Foundation International, the Alfred P. Sloan Foundation, and Richard Merkin). This allows us to build something that we believe is a contribution for the greater good, even though it’s not (or not yet) commercially interesting enough and does not fit other forms of funding (such as research grants) well. This is a very comfortable situation to be in.

Why am I excited?

To me, working on Lean seems to be the perfect mix: I have been working on language implementation for about a decade now, and always with a preference for functional languages. Add to that my interest in theorem proving, where I have used Isabelle and Coq so far, and played with Agda and others. So technically, clearly up my alley.

Furthermore, the language isn’t too old, and plenty of interesting things are simply still to do, rather than tried before. The ecosystem is still evolving, so there is a good chance to have some impact.

On the other hand, the language isn’t too young either. It is no longer an open question whether we will have users: we have them already, they hang out on zulip, so if I improve something, there is likely someone going to be happy about it, which is great. And the community seems to be welcoming and full of nice people.

Finally, this library of mathematics that these users are building is itself an amazing artifact: Lots of math in a consistent, machine-readable, maintained, documented, checked form! With a little bit of optimism I can imagine this changing how math research and education will be done in the future. It could be for math what Wikipedia is for encyclopedic knowledge and OpenStreetMap for maps – and the thought of facilitating that excites me.

With this new job I find that when I am telling friends and colleagues about it, I do not hesitate or hedge when asked why I am doing this. This is a good sign.

What will I be doing?

We’ll see what main tasks I’ll get to tackle initially, but knowing myself, I expect I’ll get broadly involved.

To get up to speed I started playing around with a few things already, and for example created Loogle, a Mathlib search engine inspired by Haskell’s Hoogle, including a Zulip bot integration. This seems to be useful and quite well received, so I’ll continue maintaining that.

Expect more about this and other contributions here in the future.

by Joachim Breitner (mail@joachim-breitner.de) at November 01, 2023 08:47 PM

October 30, 2023

Sandy Maguire

Certainty by Construction: Done!

Happy days and happy news: it’s done.

Certainty by Construction

After a year of work, I’m thrilled to announce the completion of my new book, Certainty by Construction.

Certainty by Construction is a book on doing mathematics and software design in the proof assistant Agda, which is the language Haskell wants to be when it grows up. The book is part Agda primer, introduction to abstract algebra, and algorithm design manual, with a healthy dose of philosophy mixed in to help build intuition.

If you’re the sort of person who would like to learn more math (including all the proof burden), and see how to apply it to writing real software, I think you’d groove on this book. If it sounds up your alley, I’d highly encourage you to give it a read.

I’m not much on social media these days, but if you are, I’d really appreciate a signal boost on this announcement! Thanks to everyone for their support and understanding over the last year. I love you all!

Go cop Certainty by Construction!

October 30, 2023 12:00 AM

October 29, 2023

Joachim Breitner

Squash your Github PRs with one click

TL;DR: Squash your PRs with one click at https://squasher.nomeata.de/.

Very recently I got this response from the project maintainer at a pull request I contributed: “Thanks, approved, please squash so that I can merge.”

It’s nice that my contribution can go in, but why did the maintainer not just press the “Squash and merge” button, instead adding this unnecessary roundtrip to the process? Anyways, maintainers make the rules, so I play by them. But unlike the maintainer, who can squash-and-merge with just one click, squashing the PR’s branch is surprisingly laborious: Github does not allow you to do that via the Web UI (and hence on mobile), and it seems you are expected to go to your computer and juggle with git rebase --interactive.

I found this rather annoying, so I created Squasher, a simple service that will squash your branch for you. There is no configuration, just paste the PR url. It will use the PR title and body as the commit message (which is obviously the right way™), and create the commit in your name:

Squasher in action
Squasher in action

If you find this useful, or found it to be buggy, let me know. The code is at https://github.com/nomeata/squasher if you are curious about it.

by Joachim Breitner (mail@joachim-breitner.de) at October 29, 2023 09:46 PM

October 22, 2023

Tony Zorman

Fixing Lsp-Mode's Hover Signatures

Posted on 2023-10-22  ·  last modified: 2023-10-27  ·  5 min read

By now, LSP servers have become the norm for editor-agnostic language support. As expected, Emacs features at least two packages that implement the protocol: the built-in eglot, and the third-party lsp-mode. I will focus on the latter in this post.

Lsp clients have the option of showing useful things on hover. In most languages, there is an obvious candidate for this: the type signature of the thing at point. Sadly—for some languages—the implementation of the feature is… not great. Buggy even, one might say.1 Taking this as an excuse to talk about Emacs’s infinite customisability, there is of course a way to fix this within the bounds of our configuration. Let’s do that!

The problem

Take any Haskell function with a long enough type signature, like the following:

iAmTooLong :: String -> String -> String -> String -> String -> String -> String -> String
iAmTooLong = undefined

By default, lsp-mode2 will display the following type signature in the echo-area when hovering over the function name:

By default, lsp-mode only shows `iAmTooLong :: String`

That’s… not correct. Executing lsp-describe-thing-at-point immediately reveals the problem; the request we get back looks like this:

``` haskell
iAmTooLong :: String
-> String
-> String
-> String
-> String
-> String
-> String
-> String
```

Defined at »PATH«

The type signature is so long that the server breaks it into several lines. Lsp-mode uses lsp-clients-extract-signature-on-hover to extract a signature on hover—by default, it looks like this:

(cl-defgeneric lsp-clients-extract-signature-on-hover (contents _server-id)
  "Extract a representative line from CONTENTS, to show in the echo area."
  (car (s-lines (s-trim (lsp--render-element contents)))))

It just takes the first line of the first markdown code block. While this works for simple type signatures, it obviously falls flat in more complicated scenarios. However, this being a generic function, there’s the possibility to overload it depending on the major mode.

Fixing Haskell type signatures

The strategy seems pretty clear: extract the whole block instead of only the first line. This is swiftly done:3

(defun slot/lsp-get-type-signature (lang str)
  "Get LANG's type signature in STR.
Original implementation from https://github.com/emacs-lsp/lsp-mode/pull/1740."
  (let* ((start (concat "```" lang))
         (groups (--filter (s-equals? start (car it))
                           (-partition-by #'s-blank? (s-lines (s-trim str)))))
         (name-at-point (symbol-name (symbol-at-point)))
         (type-sig-group (car
                          (--filter (s-contains? name-at-point (cadr it))
                                    groups))))
    (->> (or type-sig-group (car groups))
         (-drop 1)                    ; ``` LANG
         (-drop-last 1)               ; ```
         (-map #'s-trim)
         (s-join " "))))

We can now override the method with our own implementation:

(cl-defmethod lsp-clients-extract-signature-on-hover
  (contents (_server-id (eql lsp-haskell))) ; Only for Haskell.
  "Display the type signature of the function at point."
  (slot/lsp-get-type-signature "haskell" (plist-get contents :value)))

This already looks fine, but something is still amiss.

Correctly shows the whole type signature, but there is no syntax highlighting

There is no syntax highlighting! Thankfully, this is not very difficult to fix; the idea is to paste the string into a temporary buffer, activate haskell-mode, and grab the propertised string from that. The only thing to take care of is that we don't want to run lsp-mode and friends again in the temporary buffer.

(defun slot/syntax-highlight-string (str mode)
  "Syntax highlight STR in MODE."
  (with-temp-buffer
    (insert str)
    ;; We definitely don't want to call certain modes, so delay the mode's
    ;; hooks until we have removed them.
    (delay-mode-hooks (funcall mode))
    (-map #'funcall
          (--remove (-contains? '(lsp-mode lsp-deferred) it)
                    (-mapcat #'symbol-value delayed-mode-hooks)))
    ;; Now we can propertise the string.
    (font-lock-ensure)
    (buffer-string)))

Lsp-mode also provides a function for this, lsp--render-string, but that one does not try to load all of the “safe” hooks for the major mode. However, I have some prettify-symbols-mode configuration for Haskell which I would very much like to take effect.

All in all, we have4

;; Fixes https://github.com/emacs-lsp/lsp-haskell/issues/151
(cl-defmethod lsp-clients-extract-signature-on-hover
  (contents (_server-id (eql lsp-haskell)))
  "Display the type signature of the function at point."
  (slot/syntax-highlight-string
   (slot/lsp-get-type-signature "haskell" (plist-get contents :value))
   'haskell-mode))

This works quite nicely:

Properly syntax highlighted type signature

Fixing Rust hovers

One of the above code snippets already mentions lsp-mode#1740, which is not about Haskell, but Rust, a language that I also occasionally dabble in. The basic issue here goes like this: by default, lsp-mode shows the following hover information.

By default, the hover shows the module that the identifier is imported from

Much like the user who opened the mentioned pull-request, I really don’t care about this. Instead, I’d much rather see

Instead of the module, show the type signature

which looks much more useful to me.

Luckily, this is exactly the same situation as in the Haskell case, which we already fixed. Writing

(cl-defmethod lsp-clients-extract-signature-on-hover
  (contents (_server-id (eql rust-analyzer))) ; Only for Rust.
  "Display the type signature of the function at point."
  (slot/syntax-highlight-string
   (slot/lsp-get-type-signature "rust" (plist-get contents :value))
   'rustic-mode))

works out of the box. Nice.

Bonus: adding type signatures

Here’s another problem that we’ve solved en passant: lsp-mode has code-lens support5, which enables one to add type signatures by clicking on the relevant button:

Clicking on the relevant code lens adds a type signature

However, this ostensibly requires me to use the mouse,6 and—more importantly—the above GIF also shows that local functions do not have such a code lens attached to them. I quite like type signatures for local definitions, so that’s a bit of a shame.

Fixing this is not terribly difficult either; the hardest thing is having to look through lsp-mode’s codebase so one actually knows which functions to call. When defining the overrides for lsp-clients-extract-signature-on-hover, the LSP response was free, whereas now we want to create a request for the thing at point.

(defun slot/lsp-get-type-signature-at-point (&optional lang)
  "Get LANG's type signature at point.
If LANG is not given, get it from `lsp--buffer-language'."
  (interactive)
  (-some->> (lsp--text-document-position-params)
    (lsp--make-request "textDocument/hover")
    lsp--send-request
    lsp:hover-contents
    (funcall (-flip #'plist-get) :value)
    (slot/lsp-get-type-signature (or lang lsp--buffer-language))))

Once we have the type signature at point, all that’s left is to insert it into the buffer.7

(defun slot/lsp-haskell-type-signature ()
  "Add a type signature for the thing at point.
This is very convenient, for example, when dealing with local
functions, since those—as opposed to top-level expressions—don't
have a code lens for \"add type signature here\" associated with
them."
  (interactive)
  (let* ((value (slot/lsp-get-type-signature-at-point "haskell")))
    (slot/back-to-indentation)
    (insert value)
    (haskell-indentation-newline-and-indent)))

Bind that to a key and you’re good to go!

Clicking on the relevant code lens adds a type signature


  1. I have reported this as a bug here, but that issue seems to have stalled, so here we are.↩︎

  2. And also eglot, judging from a cursory test.↩︎

  3. Even more so because smart people have already written this for me; see the docstring.↩︎

  4. This code assumes that lsp-mode uses plists instead of hash tables for deserialisation. If you don’t have the lsp-use-plists variable set—and have recompiled lsp-mode afterwards—then just replace (plist-get contents :value) with (gethash "value" contents).↩︎

  5. Incidentally, this is the only reason that I use lsp-mode over eglot. There is a stalled PR from five years ago, but that never led anywhere. Someone should pick this back up, I suppose.↩︎

  6. Lsp-mode also provides lsp-avy-lens, so this is not really an actual problem.↩︎

  7. For when hovering inexplicably breaks again, this also allows for a quick definition of “show the type signature of the thing at point”:

    (defun slot/lsp-show-type-signature ()
      "Show the type signature for the thing at
    point.  This is essentially what
    `lsp-clients-extract-signature-on-hover'
    does, just as an extra function."
      (interactive)
      (message
       (slot/syntax-highlight-string
        (slot/lsp-get-type-signature-at-point)
        major-mode)))

    This can, again, be bound to a key for convenient access.↩︎

October 22, 2023 12:00 AM

October 21, 2023

ERDI Gergo

Getting my HomeLab-2 sea legs

Previously, we left off our HomeLab-2 game jam story with two somewhat working emulators, and a creeping realization that we still haven't written a single line of code.

Actually, it was a bit worse than that. My initial "plan" was to "participate in the HomeLab-2 game jam", with nothing about pesky details such as:

  • What game do I want to make?
  • What technology will I use?
  • How will I find time to do it, given that I'm spending all of September back home in Hungary?

I found the answers to these questions in reverse order. First of all, since for three of the five weeks I've spent in Hungary, I was working from home instead of being on leave, we didn't really have much planned for those days so the afternoons were mostly free.

Because the HomeLab-2 is so weak in its processing power (what with its Z80 only doing useful work in less than 20% of the time if you want to have video output), and also because I have never ever done any assembly programming, I decided now or never: I will go full assembly. Perhaps unsurprisingly, perhaps as a parody of myself, I found a way to use Haskell as my Z80 assembler of choice.

This left me with the question of what game to do. Coming up with a completely original concept was out of the question simply because I lack both game design experience and ideas. Also, if there's one thing I've learnt from the Haskell Tiny Games Jam, it is that it's better to crank out multiple poor quality entries (and improve in the process) than it is to aim for the stars (a.k.a. that pottery class story that is hard to find an authoritative origin for). Another constraint was that neither of my emulators supported raster graphics, and I was worried that even if they did, it would be too slow on real hardware; so I wanted to come up with games that would work well with character graphics.

Screenshot of Snake

After a half-hearted attempt at Tetris (which I stopped working on when someone else has already submitted a Tetris implementation), the first game I actually finished was Snake. For testing, I just hacked my emulator so that on boot, it loads the game to its fixed starting address, then used CALL from HomeLab BASIC to start it. This was much more convenient than loading from WAV files; doubly so because it took me a while to figure out how exactly to generate a valid WAV file. For the release version, I ended up going via an HTP file (a byte-level representation of the cassette tape contents) which is used by some of the pre-existing emulators. There's an HTP to WAV converter completing the pipeline.

Screenshot of Snake's attract screen

There's not much to say about my Snake. I tried to give it a bit of an arcade machine flair, with an animated attract screen and some simple screen transitions between levels. One of the latter was inspired by Wolfenstein 3D's death transition effect: since the textual video mode has 40×25 characters, a 10-bit maximal LFSR can be used as a computationally cheap way of replacing every character in a seemingly random (yet full-screen-covering) way.
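The LFSR trick can be sketched in Haskell (a model of the idea only; the game itself does this in Z80 assembly). A 10-bit maximal-length LFSR, here with taps at bits 10 and 7 (the primitive polynomial x¹⁰ + x⁷ + 1), cycles through all 1023 non-zero states; shifting each state down by one and skipping values ≥ 1000 visits each of the 40×25 = 1000 screen cells exactly once, in scrambled order:

```haskell
import Data.Bits (shiftL, shiftR, xor, (.&.), (.|.))

-- One step of a 10-bit maximal-length Fibonacci LFSR
-- (taps at bit positions 10 and 7, i.e. x^10 + x^7 + 1).
-- From any non-zero seed it visits all 1023 non-zero 10-bit
-- values before repeating.
stepLFSR :: Int -> Int
stepLFSR s = ((s `shiftL` 1) .|. feedback) .&. 0x3FF
  where
    feedback = ((s `shiftR` 9) `xor` (s `shiftR` 6)) .&. 1

-- The 40*25 = 1000 screen offsets in dissolve order: shift the
-- 1..1023 state range down to 0..1022 and drop the 23 overshoots.
dissolveOrder :: [Int]
dissolveOrder =
  filter (< 1000) (map (subtract 1) (take 1023 (iterate stepLFSR 1)))
```

Walking `dissolveOrder` and rewriting one character per step gives the full-screen "fizzle" effect for the cost of a shift, an xor, and a mask per cell.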

For my second entry, I went with 2048. Shortly after finishing Snake, thanks to Gábor Képes I had the opportunity to try a real HomeLab-2 machine. Feeling how unresponsive the original keyboard is convinced me that real-time games are not the way to go.

Screenshot of HL-2048

The challenge with a 2048-like game on a platform like this is that you have to update a large portion of the screen as the tiles slide around. Having an assembler that is an EDSL made it a breeze to try various speedcoding techniques, but with lots of tiles sliding around, I just couldn't get it to fit within the frame budget, which led to annoying flickering as a half-finished frame would get redrawn before the video refreshing interrupt handler eventually returned to finish the rest of the frame. So I ended up using double buffering, which technically makes each frame longer to draw, but avoids inconsistent visible frames.

Since the original 2048 is a product of modern times, I decided to pair my implementation with a cleaner design: just like a phone app, it boots straight into the game with no intro screen, with all controls shown right on the main screen.

Between these two games, and all the other fun stuff one does when visiting home for a month, September flew by. As October approached, I stumbled upon this year's RetroChallenge announcement and realized the potential for synergy between doing something cool in a last hurrah before the HomeLab-2 game jam deadline and also blogging about it for RetroChallenge. But this meant less than two weeks to produce a magnum opus. Which is why this blogpost series became retrospective — there was no way to finish my third game on time while also writing about it.

But what even was that third and final game idea? Let's find out in the next post.

October 21, 2023 10:37 AM

October 20, 2023

Chris Reade

Graphs, Kites and Darts – Empires and SuperForce

We have been exploring properties of Penrose’s aperiodic tilings with kites and darts using Haskell.

Previously in Diagrams for Penrose tiles we implemented tools to draw finite tilings using Haskell diagrams. There we also noted that legal tilings are only correct tilings if they can be continued infinitely and are incorrect otherwise. In Graphs, Kites and Darts we introduced a graph representation for finite tilings (Tgraphs) which enabled us to implement operations that use neighbouring tile information. In particular we implemented a force operation to extend a Tgraph on any boundary edge where there is a unique choice for adding a tile.

In this note we find a limitation of force, show a way to improve on it (superForce), and introduce boundary coverings which are used to implement superForce and calculate empires.

Properties of Tgraphs

A Tgraph is a collection of half-tile faces representing a legal tiling and a half-tile face is either an LD (left dart) , RD (right dart), LK (left kite), or RK (right kite) each with 3 vertices to form a triangle. Faces of the Tgraph which are not half-tile faces are considered external regions and those edges round the external regions are the boundary edges of the Tgraph. The half-tile faces in a Tgraph are required to be connected and locally tile-connected which means that there are exactly two boundary edges at any boundary vertex (no crossing boundaries).

As an example Tgraph we show kingGraph (the three darts and two kites round a king vertex), where

  kingGraph = makeTgraph 
    [LD (1,2,3),RD (1,11,2),LD (1,4,5),RD (1,3,4),LD (1,10,11)
    ,RD (1,9,10),LK (9,1,7),RK (9,7,8),RK (5,7,1),LK (5,6,7)
    ]

This is drawn in figure 1 using

  hsep 1 [labelled drawj kingGraph, draw kingGraph]

which shows vertex labels and dashed join edges (left) and without labels and join edges (right). (hsep 1 provides a horizontal separator of unit length.)

Figure 1: kingGraph with labels and dashed join edges (left) and without (right).

Properties of forcing

We know there are at most two legal possibilities for adding a half-tile on a boundary edge of a Tgraph. If there are zero legal possibilities for adding a half-tile to some boundary edge, we have a stuck tiling/incorrect Tgraph.

Forcing deals with all cases where there is exactly one legal possibility for extending on a boundary edge. That means forcing either fails at some stage with a stuck Tgraph (indicating the starting Tgraph was incorrect) or it enlarges the starting Tgraph until every boundary edge has exactly two legal possibilities for adding a half-tile so a choice would need to be made to grow the Tgraph any further.

Figure 2 shows force kingGraph with kingGraph shown red.

Figure 2: force kingGraph with kingGraph shown red.

If g is a correct Tgraph, then force g succeeds and the resulting Tgraph will be common to all infinite tilings that extend the finite tiling represented by g. However, we will see that force g is not a greatest lower bound of (infinite) tilings that extend g. Firstly, what is common to all extensions of g may not be a connected collection of tiles. This leads to the concept of empires which we discuss later. Secondly, even if we only consider the connected common region containing g, we will see that we need to go beyond force g to find this, leading to an operation we call superForce.

Our empire and superForce operations are implemented using boundary coverings which we introduce next.

Boundary edge covering

Given a successfully forced Tgraph fg, a boundary edge covering of fg is a list of successfully forced extensions of fg such that

  1. no boundary edge of fg remains on the boundary in each extension, and
  2. the list takes into account all legal choices for extending on each boundary edge of fg.

[Technically this is a covering of the choices round the boundary, but each extension is also a cover of the boundary edges.] Figure 3 shows a boundary edge covering for a forced kingGraph (force kingGraph is shown red in each extension).

Figure 3: A boundary edge covering of force kingGraph.

In practice, we do not need to explore both choices for every boundary edge of fg. When one choice is made, it may force choices for other boundary edges, reducing the number of boundary edges we need to consider further.

The main function is boundaryECovering working on a BoundaryState (which is a Tgraph with extra boundary information). It uses covers which works on a list of extensions each paired with the remaining set of the original boundary edges not yet covered. (Initially covers is given a singleton list with the starting boundary state and the full set of boundary edges to be covered.) For each extension in the list, if its uncovered set is empty, that extension is a completed cover. Otherwise covers replaces the extension with further extensions. It picks the (lowest numbered) boundary edge in the uncovered set, tries extending with a half-dart and with a half-kite on that edge, forcing in each case, then pairs each result with its set of remaining uncovered boundary edges before adding the resulting extensions back at the front of the list to be processed again. If one of the choices for a dart/kite leads to an incorrect tiling (a stuck tiling) when forced, that choice is dropped (provided the other choice succeeds). The final list returned consists of all the completed covers.

  boundaryECovering:: BoundaryState -> [BoundaryState]
  boundaryECovering bs = covers [(bs, Set.fromList (boundary bs))]

  covers:: [(BoundaryState, Set.Set Dedge)] -> [BoundaryState]
  covers [] = []
  covers ((bs,es):opens) 
    | Set.null es = bs:covers opens -- bs is complete
    | otherwise   = covers (newcases ++ opens)
       where (de,des) = Set.deleteFindMin es
             newcases = fmap (\b -> (b, commonBdry des b))
                             (atLeastOne $ tryDartAndKite bs de)

Here we have used

  type Try a = Either String a
  tryDartAndKite:: BoundaryState -> Dedge -> [Try BoundaryState]
  atLeastOne    :: [Try a] -> [a]

We frequently use Try as a type for results of partial functions where we need to continue computation if there is a failure. For example we have a version of force (called tryForce) that returns a Try Tgraph so it does not fail by raising an error, but returns a result indicating either an explicit failure situation or a successful result with a final forced Tgraph. The function tryDartAndKite tries adding an appropriate half-dart and half-kite on a given boundary edge, then uses tryForceBoundary (a variant of tryForce which works with boundary states) on each result and returns a list of Try results. The list of Try results is converted with atLeastOne which collects the successful results but will raise an error when there are no successful results.
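A minimal sketch of how atLeastOne could be defined, given the behaviour just described (the library's actual definition may differ):

```haskell
import Data.Either (rights)

type Try a = Either String a

-- Collect the successful results, raising an error only when every
-- attempt failed (which would indicate a stuck/incorrect Tgraph).
atLeastOne :: [Try a] -> [a]
atLeastOne tries
  | null oks  = error "atLeastOne: no successful cases"
  | otherwise = oks
  where
    oks = rights tries
```

So when only one of the dart/kite choices forces successfully, the failed branch is silently dropped, matching the behaviour used in covers.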

Boundary vertex covering

You may notice in figure 3 that the top right cover still has boundary vertices of kingGraph on the final boundary. We use a boundary vertex covering rather than a boundary edge covering if we want to exclude these cases. This involves picking a boundary edge that includes such a vertex and continuing the process of growing possible extensions until no boundary vertices of the original remain on the boundary.

Empires

A partial example of an empire was shown in a 1977 article by Martin Gardner 1. The full empire of a finite tiling would consist of the common faces of all the infinite extensions of the tiling. This will include at least the force of the tiling but it is not obviously finite. Here we confine ourselves to the empire in finite local regions.

For example, we can calculate a local empire for a given Tgraph g by finding the common faces of all the extensions in a boundary vertex covering of force g (which we call empire1 g).

This requires an efficient way to compare Tgraphs. We have implemented guided intersection and guided union operations which, when given a common edge starting point for two Tgraphs, proceed to compare the Tgraphs face by face and produce an appropriate relabelling of the second Tgraph to match the first Tgraph only in the overlap where they agree. These operations may also use geometric positioning information to deal with cases where the overlap is not just a single connected region. From these we can return a union as a single Tgraph when it exists, and an intersection as a list of common faces. Since the (guided) intersection of Tgraphs (the common faces) may not be connected, we do not have a resulting Tgraph. However we can arbitrarily pick one of the argument Tgraphs and emphasise which are the common faces in this example Tgraph.

Figure 4 (left) shows empire1 kingGraph where the starting kingGraph is shown in red. The grey-filled faces are the common faces from a boundary vertex covering. We can see that these are not all connected and that the force kingGraph from figure 2 corresponds to the connected set of grey-filled faces around and including the kingGraph in figure 4.

Figure 4: King's empire (level 1 and level 2).

We call this a level 1 empire because we only explored out as far as the first boundary covering. We could instead, find further boundary coverings for each of the extensions in a boundary covering. This grows larger extensions in which to find common faces. On the right of figure 4 is a level 2 empire (empire2 kingGraph) which finds the intersection of the combined boundary edge coverings of each extension in a boundary edge covering of force kingGraph. Obviously this process could be continued further but, in practice, it is too inefficient to go much further.

SuperForce

We might hope that (when not discovering an incorrect tiling), force g produces the maximal connected component containing g of the common faces of all infinite extensions of g. This is true for the kingGraph as noted in figure 4. However, this is not the case in general.

The problem is that forcing will not discover if one of the two legal choices for extending a resulting boundary edge always leads to an incorrect Tgraph. In such a situation, the other choice would be common to all infinite extensions.

We can use a boundary edge covering to reveal such cases, leading us to a superForce operation. For example, figure 5 shows a boundary edge covering for the forced Tgraph shown in red.

Figure 5: One choice cover.

This example is particularly interesting because in every case, the leftmost end of the red forced Tgraph has a dart immediately extending it. Why is there no case extending one of the leftmost two red edges with a half-kite? The fact that such cases are missing from the boundary edge covering suggests they are not possible. Indeed we can check this by adding a half-kite to one of the edges and trying to force. This leads to a failure showing that we have an incorrect tiling. Figure 6 illustrates the Tgraph at the point that it is discovered to be stuck (at the bottom left) by forcing.

Figure 6: An incorrect extension.

Our superForce operation starts by forcing a Tgraph. After a successful force, it creates a boundary edge covering for the forced Tgraph and checks to see if there is any boundary edge of the forced Tgraph for which each cover has the same choice. If so, that choice is made to extend the forced Tgraph and the process is repeated by applying superForce to the result. Otherwise, just the result of forcing is returned.
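The loop just described can be sketched abstractly. Everything below is hypothetical scaffolding, parameterised over the real force and covering operations rather than the library's actual code:

```haskell
-- Hypothetical skeleton of the superForce loop, abstracted over the
-- graph type g: force, take a boundary edge covering, and if every
-- cover agrees on a choice for some boundary edge, commit to that
-- choice and repeat; otherwise stop with the forced result.
superForceWith
  :: (g -> g)                 -- force
  -> (g -> [g])               -- boundary edge covering of a forced graph
  -> ([g] -> Maybe (g -> g))  -- a choice every cover agrees on, if any
  -> g -> g
superForceWith force covering agreedChoice = go . force
  where
    go fg = case agreedChoice (covering fg) of
      Nothing     -> fg                      -- no agreed choice: done
      Just extend -> go (force (extend fg))  -- make the choice, repeat
```

The real operation additionally propagates Try-style failures from forcing, but the control flow is the same.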

Figure 7 shows a chain of examples (rockets) where superForce has been used. In each case, the starting Tgraph is shown red, the additional faces added by forcing are shown black, and any further extension produced by superForce is shown in blue.

Figure 7: SuperForce rockets.

Coda

We still do not know if forcing decides that a Tgraph is correct/incorrect. Can we conclude that if force g succeeds then g (and force g) are correct? We found examples (rockets in figure 7) where force succeeds but one of the 2 legal choices for extending on a boundary edge leads to an incorrect Tgraph. If we find an example g where force g succeeds but both legal choices on a boundary edge lead to incorrect Tgraphs we will have a counter-example. If such a g exists then superForce g will raise an error. [The calculation of a boundary edge covering will call atLeastOne where both branches have led to failure for extending on an edge.]

This means that when superForce succeeds every resulting boundary edge has two legal extensions, neither of which will get stuck when forced.

I would like to thank Stephen Huggett who suggested the idea of using graphs to represent tilings and who is working with me on proof problems relating to the kite and dart tilings.

Reference [1] Martin Gardner (1977) MATHEMATICAL GAMES. Scientific American, 236(1), (pages 110 to 121). http://www.jstor.org/stable/24953856

by readerunner at October 20, 2023 01:06 PM

October 17, 2023

Oleg Grenrus

More traversals and more Cabal SAT

Posted on 2023-10-17 by Oleg Grenrus

In [the previous post] I discussed using traversals for batch operations.

I forgot to mention any libraries which actually do this. They are kind of hard to find, as often the Traversable usage comes up very naturally.

One such example is unification-fd. As the name suggests the library is for doing unification. One operation in the process is applying bindings, i.e. substituting the unification values with the terms they have been unified to. (I think that's what zonking is in GHC1).

The function type signature is

applyBindings :: (...)
              => UTerm t v -> em m (UTerm t v)

But the library also provides the batched method:

applyBindingsAll :: (..., Traversable s)
                 => s (UTerm t v) -> em m (s (UTerm t v))

And the docs say:

Same as applyBindings, but works on several terms simultaneously. This function preserves sharing across the entire collection of terms, whereas applying the bindings to each term separately would only preserve sharing within each term.

The library also has freshen and freshenAll.

When I was studying how unification-fd works, having an applyBindingsAll operation with Traversable felt very natural; the library makes heavy use of Traversable already anyway.

There are probably more examples, but I cannot find them. (If you know any others, please tell me, I'll be happy to learn more, and maybe include them into this post).

sat-simple

Another example is sat-simple, a hopefully simple SAT library (e.g. simpler than ersatz).

The central operation of the library is

solve :: Traversable model => model (Lit s) -> SAT s (model Bool)

We have some model with symbolic boolean variables (Lit s), and solve finds a concrete Bool assignment for them.

For comparison, ersatz uses type-family (Decoded):

solveWith :: (Monad m, HasSAT s, Default s, Codec a)
          => Solver s m -> StateT s m a -> m (Result, Maybe (Decoded a))

While the ersatz approach is arguably more expressive, I find it somewhat more "magical": the Decoded x value may look very different from x.

different types

I saw a comment on Twitter/X

If arguments can have different types then you need to generalize somehow, and product-profunctors is one sufficient generalization.

I never grasped the product-profunctors library. The ProductProfunctor class looks like

class (forall a. Applicative (p a), Profunctor p) => ProductProfunctor p where
   (***!) :: p a b -> p a' b' -> p (a, a') (b, b')  -- Arrow has (***)

and it feels like a very ad-hoc collection of things.

indexed

There is an alternative solution to "if arguments can have different types". Often when you have a singular thing and you want to generalize to many things, you make an indexed version of the singular thing.

By indexed here I mean, changing Type to I -> Type for some index I.

A simple example is recursive types. Suppose a language has recursive types, so we can write

data Nat = Zero | Succ Nat

but this language does not have mutually recursive types, though it happens to have DataKinds- and GADTs-like features.

So we cannot write

data Even = Zero | SuccOdd  Odd
data Odd  = One  | SuccEven Even

but we can write

data I = E | O

type EvenOdd :: I -> Type
data EvenOdd i where
    Zero     :: EvenOdd E
    SuccOdd  :: EvenOdd O -> EvenOdd E
    One      :: EvenOdd O
    SuccEven :: EvenOdd E -> EvenOdd O

type Even = EvenOdd E
type Odd  = EvenOdd O

And sometimes the latter encoding "works" better, e.g. mutual recursion becomes ordinary recursion on a single type. I remember having a better time satisfying the Coq termination checker with a similar trick.
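For instance, what would have been two mutually recursive functions over Even and Odd becomes a single ordinary recursion over the indexed type (the definitions above are repeated so the sketch is self-contained; toInt is an illustrative name, not from the post):

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

data I = E | O

data EvenOdd (i :: I) where
  Zero     :: EvenOdd 'E
  SuccOdd  :: EvenOdd 'O -> EvenOdd 'E
  One      :: EvenOdd 'O
  SuccEven :: EvenOdd 'E -> EvenOdd 'O

-- A single recursive function covers both "types" at once;
-- in the mutually recursive encoding this would be two functions.
toInt :: EvenOdd i -> Int
toInt Zero         = 0
toInt One          = 1
toInt (SuccOdd n)  = 1 + toInt n
toInt (SuccEven n) = 1 + toInt n
```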

another indexed traversable

So what does an "indexed" Traversable look like? It looks like

type FTraversable :: ((k -> Type) -> Type) -> Constraint
class (...) => FTraversable t where
  ftraverse :: Applicative m => (forall a. f a -> m (g a)) -> t f -> m (t g)

This class exists in many libraries on Hackage

I use FTraversable in my version-sat experiment.

It's like simple-sat, but adds Version literals. The solve function has a more general type, which however looks very similar.

solve :: FTraversable model => model (Lit s) -> SAT s (model Identity)

We can have symbolic booleans Lit s Bool, but also symbolic versions Lit s Version, the resulting model will have Bools and Versions (wrapped in Identity).
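As a toy illustration of the same shape (Model, solveAll, and the Maybe-to-Identity interpretation are made up for this sketch; version-sat's real Lit s and SAT s types are more involved):

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Functor.Identity (Identity (..))

-- A toy higher-kinded model with fields of different underlying types.
data Model f = Model { flagA :: f Bool, count :: f Int }

class FTraversable t where
  ftraverse :: Applicative m => (forall a. f a -> m (g a)) -> t f -> m (t g)

instance FTraversable Model where
  ftraverse f (Model a c) = Model <$> f a <*> f c

-- A solve-like stand-in: succeed only if every field has a value,
-- wrapping each concrete value in Identity.
solveAll :: FTraversable t => t Maybe -> Maybe (t Identity)
solveAll = ftraverse (fmap Identity)
```

The point is the type shape: t f goes in, t Identity comes out, and ftraverse handles each differently-typed field with one polymorphic function.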

"Historical" note: simple-sat started as hkdsat, trying to allow encodings like in ersatz, and maybe it eventually will, if I find a simple way to add them. 2

version-sat: satisfiable

What do I do with version-sat? Well, it's just an experiment for now. One thing you can do is ask whether a Cabal library has any build plan.

That is very straight-forward: convert a library stanza (a conditional tree) into a propositional formula and ask whether it has any satisfiable models.

It turns out that 417 of 125991 libraries are unsatisfiable. For example vector-0.12.1.1 has been revisioned with a base <0 bound.

I think that is a fine number. Mistakes happen, and 0.33% is a very small amount of b0rked releases. Many of these revisions are actually on my packages. And probably the number should be a bit larger, as people deprecate package versions, which allows them to still be installed, just less prioritized.

While you can look for unsatisfiable build-depends syntactically, it becomes less obvious with if conditionals etc.

Throwing the problem at a SAT solver in full generality is a complete approach (i.e. it always gives a definitive answer).

version-sat: disjoint automatic flags

Another question we can ask version-sat is

Is there an install plan which satisfies the package definition with an automatic flag turned on *and* off?

That probably needs an explanation. In cabal-install solver as sat solver I briefly touched this topic.

Ideally, the automatic flag assignment is disjoint, so that the assignment made by the dependency solver is deterministic (a function of the package versions in the install plan).

The easy way to ensure it is to have disjoint constraints:

  if flag(old-locale)
    build-depends:
        old-locale  >=1.0.0.2 && <1.1
      , time        >=1.4     && <1.5

  else
    build-depends: time >=1.5 && <1.7

The time <1.5 and time >=1.5 constraints are disjoint, so depending on which time package version is picked, the value of old-locale flag is forced.

I was surprised that 15776 of 136048 libraries have non-disjoint automatic flags (11.60%). That number seems very high.

There are obvious false positives however, e.g. semigroups has the following structure

if impl(ghc < 7.11.20151002)
    if flag(bytestring)
      if flag(bytestring-builder)
        build-depends: bytestring         >= 0.9    && < 0.10.4,
                       bytestring-builder >= 0.10.4 && < 1
      else
        build-depends: bytestring         >= 0.10.4 && < 1

The automatic bytestring-builder flag only has an effect on old GHCs and when the manual bytestring flag is on.

However, the dependency solver would still try to flip bytestring-builder flag if it cannot satisfy the other dependencies. Not a terrible cost in case of semigroups, but might be for some other packages (with non-trivial dependencies). A way to force it would be to have

if impl(ghc < 7.11.20151002)
  ...
else
  if flag(bytestring-builder)
    build-depends: base <0

Another example of invalid usage of an automatic flag is e.g. the examples flag in Earley (which has been fixed long ago: examples is now a manual flag). The flag disables building of example executables. When it was automatic, the dependency solver could unnecessarily flip it and try to satisfy the example dependencies as well.

Unfortunately there are a lot of what I consider invalid usages of automatic flags. (Having flags automatic by default is really the wrong default, IMO).

But for example accelerate, atomic-primops, hashtables, unordered-containers have flags like debug, bounds-checks which affect the package code in a non-trivial way. You definitely don't want dependency solver flipping flags like that.

The Cabal documentation says

By default, Cabal will first try to satisfy dependencies with the default flag value and then, if that is not possible, with the negated value.

However, I don't think this can or should be relied upon. The build plans are generally non-comparable.

A somewhat contrived example is

flag foo
  default: True

flag bar
  default: True

library
  ...

  if flag(foo) && flag(bar)
    build-depends: base <0

the solver will need to make a (non-deterministic) choice to flip either flag.

Secondly, it restricts possible alternative solver implementations, i.e. they would also need to try hard to keep automatic flags at their default values. Luckily e.g. minisat tries literals with False first, so one can initialise the flag literals so that their default values match. Still, a SAT solver is a black box; there are no hard guarantees it won't flip something just because it feels like it.

TL;DR from the cabal-install solver as sat solver post:

Only use automatic flags for encoding `if build-depends(...)` like constraints.

  1. GHC relies heavily on mutability in the typechecker for efficient operation. For this reason, throughout much of the type checking process meta type variables (the MetaTv constructor of TcTyVarDetails) are represented by mutable variables (known as TcRefs).

    Zonking is the process of ripping out these mutable variables and replacing them with a real Type. This involves traversing the entire type expression, but the interesting part of replacing the mutable variables occurs in zonkTyVarOcc.↩︎

  2. In my opinion, the Bool-only, Traversable-based solve is very simple to work with, when it's enough. And the Version encoding in version-sat (ab)uses mutability a lot; I haven't tried to do it in ersatz.↩︎

October 17, 2023 12:00 AM

October 12, 2023

Oleg Grenrus

Use traversals for batch operations

Posted on 2023-10-12 by Oleg Grenrus

Often enough we have an API which may (or needs to) provide a batch operation: "give me many inputs, and I'll give you many outputs".

For example, shake has operators like

-- Define a rule for building multiple files at the same time.
(&%>) :: [FilePattern] -> ([FilePath] -> Action ()) -> Rules ()

And the usage looks like

["*.o","*.hi"] &%> \[o,hi] -> do
    let hs = o -<.> "hs"
    need ... -- all files the .hs import
    cmd "ghc -c" [hs]

but that is terrible. \[o, hi] -> ... is an incomplete pattern match. Recent GHCs include -Wincomplete-uni-patterns in -Wall:

warning: [GHC-62161] [-Wincomplete-uni-patterns]
    Pattern match(es) are non-exhaustive

There is a relation: the input and output counts should match, but that is not encoded in the types, so the compiler cannot know.


One option is to use Vec:

(&%>) :: Vec n FilePattern -> (Vec n FilePath -> Action ()) -> Rules ()

Here it's clear that the output count will match the input count.

Then the usage looks like:

("*.o" ::: "*.hi" ::: Nil) &%> \(o ::: hi ::: Nil) -> do
    let hs = o -<.> "hs"
    need ... -- all files the .hs import
    cmd "ghc -c" [hs]

This is slightly more noisy1 but the pattern match is complete.


Another alternative is to use traversals.

(&%>) :: Traversable t
      => t FilePattern -> (t FilePath -> Action ()) -> Rules ()

This abstracts over both previous usages. You may use Vecs if you really don't like (turning off) the incomplete pattern warnings. Or you may continue to use lists, as lists are Traversable, and the signature of this variant of (&%>) restricts the implementation to just traversing the structure.

You can go one step further and use the Each class from lens2, which generalises Traversable:

(&%>) :: Each FilePattern FilePath ps fs
      => ps -> (fs -> Action ()) -> Rules ()

As Each has a special instance for tuples (forcing them to be homogeneous), our running example can be written neatly as:

("*.o","*.hi") &%> \(o,hi) -> do
    let hs = o -<.> "hs"
    need ... -- all files the .hs import
    cmd "ghc -c" [hs]

Each traversal :: Applicative f => (a -> f b) -> s -> f t can be converted into a s -> FunList a b t function and back:

data FunList a b t = Done t
                   | More a (FunList a b (b -> t))

-- This can be done more efficiently using curried Yoneda,
-- without using `append`.
-- See https://dl.acm.org/doi/10.1145/3236780
-- and https://gist.github.com/phadej/f5e8107e303265241e6b7b556db5ca48
funList :: (forall f. Applicative f => (a -> f b) -> s -> f t)
        -> s -> FunList a b t
funList trav s = trav singleton s

unfunList :: forall f s t a b. Applicative f => (s -> FunList a b t)
          -> (a -> f b) -> s -> f t
unfunList f afb s = go (f s) where
    go :: FunList a b r -> f r
    go (Done t)    = pure t
    go (More x xs) = liftA2 (&) (afb x) (go xs)

where

empty :: t -> FunList a b t
empty = Done

append :: (t -> s -> r) -> FunList a b t -> FunList a b s -> FunList a b r
append h (Done t)    ys = fmap (\s -> h t s) ys
append h (More x xs) ys = More x $ append (\bt s b -> h (bt b) s) xs ys

singleton :: a -> FunList a b b
singleton x = More x (Done id)

instance Applicative (FunList a b) where
    pure = empty
    liftA2 = append

so if your underlying implementation would be easier using a concrete type (instead of using traversal directly) then a FunList is one candidate:

(&%>) :: FunList FilePattern FilePath res -> (res -> Action ()) -> Rules ()

that would be terrible to use, but it might be about as easy to implement as the list variant.


Alternatively, you can "cheat" like lens does in its partsOf implementation, by using a state monad:

Given a list-based operation fooList :: Monad m => [k] -> m [v], we can write a generalized version

fooTrav :: (Monad m, Traversable t) => t k -> m (t v)
fooTrav ks = do
    -- convert to list and use fooList
    vs <- fooList (toList ks)

    -- traverse ks again, replacing each k with a v from the state
    evalStateT (traverse (\_k -> state un) ks) vs
  where
    un []     = error "invalid traversal"
    un (x:xs) = (x, xs)

Implementation using Each would look somewhat similar.


Finally, a Traversable-powered interface not only allows complete pattern matches in shake-like use cases, it also allows using more elaborate data structures for batch operations.

For example, suppose you have a Map Client [Key] and you want to look up every key, getting a Map Client [Value] back.

With the Traversable interface it's as easy as using Compose, turning Map Client [Key] into Compose (Map Client) [] Key, which fits the Traversable interface perfectly, so you avoid bundling-and-distributing code at the use sites: Map in, Map out.
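A concrete sketch, with a hypothetical pure fooTrav standing in for a real batch API:

```haskell
import Data.Functor.Compose (Compose (..))
import qualified Data.Map.Strict as Map

-- Stand-in for a batch operation over any Traversable; a real API
-- would make a single round-trip for all the keys at once.
fooTrav :: (Monad m, Traversable t) => t Int -> m (t String)
fooTrav = traverse (pure . show)

-- Map Client [Key] in, Map Client [Value] out, with no manual
-- bundling/distributing: Compose is Traversable when both layers are.
lookupAll :: Monad m => Map.Map String [Int] -> m (Map.Map String [String])
lookupAll = fmap getCompose . fooTrav . Compose
```

The nested structure survives the round trip untouched; only the leaves are replaced.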

The answer is always traverse.


  1. And if we had a different OverloadedLists, this could look like the previous version, though I'm not aware of anyone having figured out how to do overloaded pattern matches for list-like structures so that Vec would fit them too.↩︎

  2. Which I'd like to split out into own tiny package https://github.com/ekmett/lens/issues/1050↩︎

October 12, 2023 12:00 AM

October 09, 2023

Matt Parsons

Matt von Hagen

I am happy to announce that I have married Emily von Hagen. I am taking her name.

I will be registering new account names to remove Parsons from things. However, the URLs on the web live forever, and this blog will remain up. I will eventually (he says, again) make a forwarding service and complete the migration to overcoming.software.

October 09, 2023 12:00 AM