November 20, 2025

Tweag I/O

Migrating to Bazel symbolic macros
In Bazel, there are two types of macros: legacy macros and symbolic macros, that were introduced in Bazel 8. Symbolic macros are recommended for code clarity, where possible. They include enhancements like typed arguments and the ability to define and limit the visibility of the targets they create.

This post is intended for experienced Bazel engineers or those tasked with modernizing the build metadata of their codebases. The following discussion assumes a solid working knowledge of Bazel’s macro system and build file conventions. If you are looking to migrate legacy macros or deepen your understanding of symbolic macros, you’ll find practical guidance and nuanced pitfalls addressed here.

What are symbolic macros?

Macros instantiate rules by acting as templates that generate targets. As such, they are expanded in the loading phase, when Bazel definitions and BUILD files are loaded and evaluated. This is in contrast with build rules that are run later in the analysis phase. In older Bazel versions, macros were defined exclusively as Starlark functions (the form that is now called “legacy macros”). Symbolic macros are an improvement on that idea; they allow defining a set of attributes similar to those of build rules.

In a BUILD file, you invoke a symbolic macro by supplying attribute values as arguments. Because Bazel is explicitly aware of symbolic macros and their function in the build process, they can be considered “first-class macros”. See the Symbolic macros design document to learn more about the rationale. Symbolic macros also intend to support lazy evaluation, a feature that is currently being considered for a future Bazel release. When that functionality is implemented, Bazel would defer evaluating a macro until the targets defined by that macro are actually requested.

Conventions and restrictions

There is already good documentation that explains how to write symbolic macros. In this section, we are going to take a look at some practical examples of the restrictions that apply to their implementation, which you can learn more about in the Restrictions docs page.

Naming

Any targets created by a symbolic macro must either match the macro’s name parameter exactly or begin with that name followed by a _ (preferred), ., or -. This is different from legacy macros which don’t have naming constraints.

This symbolic macro
# defs.bzl
def _simple_macro_impl(name):
    native.genrule(
        name = "genrule" + name,
        outs = [name + "_out.data"],
        srcs = ["//:file.json"],
    )

# BUILD.bazel
simple_macro(name = "tool")
would fail when evaluated:
$ bazel cquery //...
ERROR: in genrule rule //src:genruletool: Target //src:genruletool declared in symbolic macro 'tool'
violates macro naming rules and cannot be built.
This means simple_macro(name = "tool") may only produce files or targets named tool or starting with tool_, tool., or tool-. In this particular macro, tool_genrule would work.

Access to undeclared resources

Symbolic macros must follow Bazel’s standard visibility rules: they cannot directly access source files unless those files are passed in as arguments or are made public by their parent package. This is different from legacy macros, whose implementations were effectively inlined into the BUILD file where they were called.

Attributes

Positional arguments

In legacy macro invocations, you could have passed the attribute values as positional arguments. For instance, these are perfectly valid legacy macro calls:
# defs.bzl
def special_test_legacy(name, tag = "", **kwargs):
    kwargs["name"] = name
    kwargs["tags"] = [tag] if tag else []
    cc_test(**kwargs)


# BUILD.bazel
special_test_legacy("no-tag")
special_test_legacy("with-tag", "manual")
With the macro’s name and tags collected as expected:
$ bazel cquery //test/package:no-tag --output=build
cc_test(
  name = "no-tag",
  tags = [],
  ...
)

$ bazel cquery //test/package:with-tag --output=build
cc_test(
  name = "with-tag",
  tags = ["manual"],
  ...
)
You can control how arguments are passed to functions by using an asterisk (*) in the parameter list of a legacy macro, as per the Starlark language specs. If you are a seasoned Python developer (Starlark’s syntax is heavily inspired by Python), you might have already guessed that this asterisk separates positional arguments from keyword-only arguments:
# defs.bzl
def special_test_legacy(name, *, tag = "", **kwargs):
    kwargs["name"] = name
    kwargs["tags"] = [tag] if tag else []
    cc_test(**kwargs)

# BUILD.bazel
special_test_legacy("no-tag")  # okay
special_test_legacy("with-tag", tag="manual") # okay
# Error: special_test_legacy() accepts no more than 1 positional argument but got 2
special_test_legacy("with-tag", "manual")
Positional arguments are not supported in symbolic macros as attributes must either be declared in the attrs dictionary (which would make it automatically a keyword argument) or be inherited in which case it should also be provided by name.

Arguably, avoiding positional arguments in macros altogether is helpful because it eliminates subtle bugs caused by incorrect order of parameters passed and makes them easier to read and easier to process by tooling such as buildozer.

Default values

Legacy macros accepted default values for their parameters which made it possible to skip passing certain arguments:
# defs.bzl
def special_test_legacy(name, *, purpose = "dev", **kwargs):
    kwargs["name"] = name
    kwargs["tags"] = [purpose]
    cc_test(**kwargs)

# BUILD.bazel
special_test_legacy("dev-test")
special_test_legacy("prod-test", purpose="prod")
With symbolic macros, the default values are declared in the attrs dictionary instead:
# defs.bzl
def _special_test_impl(name, purpose = "dev", **kwargs):
    kwargs["tags"] = [purpose]
    cc_test(
        name = name,
        **kwargs
    )

special_test = macro(
    inherit_attrs = native.cc_test,
    attrs = {
        "purpose": attr.string(configurable = False, default = "staging"),
        "copts": None,
    },
    implementation = _special_test_impl,
)

# BUILD.bazel
special_test(
    name = "my-special-test-prod",
    srcs = ["test.cc"],
    purpose = "prod",
)

special_test(
    name = "my-special-test-dev",
    srcs = ["test.cc"],
)
Let’s see what kind of tags are going to be set for these cc_test targets:
$ bazel cquery //test/package:my-special-test-prod --output=build
cc_test(
  name = "my-special-test-prod",
  tags = ["prod"],
  ...
)

$ bazel cquery //test/package:my-special-test-dev --output=build
cc_test(
  name = "my-special-test-dev",
  tags = ["staging"],
  ...
)
Notice how the default dev value declared in the macro implementation was never used. This is because the default values defined for parameters in the macro’s function are going to be ignored, so it’s best to remove them to avoid any confusion.

Also, all the inherited attributes have a default value of None, so make sure to refactor your macro logic accordingly. Be careful when processing the keyword arguments to avoid subtle bugs such as checking whether a user has passed [] in a keyword argument merely by doing if not kwargs["attr-name"] as None would also be evaluated to False in this context.

This might be potentially confusing as the default value for many common attributes is not None. Take a look at the target_compatible_with attribute which normally has the default value [] when used in a rule, but when used in a macro, would still by default be set to None. Using bazel cquery //:target --output=build with some print calls in your .bzl files can help when refactoring.

Inheritance

Macros are frequently designed to wrap a rule (or another macro), and the macro’s author typically aims to pass most of the wrapped symbol’s attributes using **kwargs directly to the macro’s primary target or the main inner macro without modification.

To enable this behavior, a macro can inherit attributes from a rule or another macro by providing the rule or macro symbol to the inherit_attrs parameter of macro(). Note that when inherit_attrs is set, the implementation function must have a **kwargs parameter. This makes it possible to avoid listing every attribute that the macro may accept, and it is also possible to disable certain attributes that you don’t want macro callers to provide. For instance, let’s say you don’t want copts to be defined in macros that wrap cc_test because you want to manage them internally within the macro body instead:
# BUILD.bazel
special_test(
    name = "my-special-test",
    srcs = ["test.cc"],
    copts = ["-std=c++22"],
)
This can be done by setting the attributes you don’t want to inherit to None.
# defs.bzl
special_test = macro(
    inherit_attrs = native.cc_test,
    attrs = { "copts": None },
    implementation = _special_test_impl,
)
Now the macro caller will see that copts is not possible to declare when calling the macro:
$ bazel query //test/package:my-special-test
        File "defs.bzl", line 19, column 1, in special_test
                special_test = macro(
Error: no such attribute 'copts' in 'special_test' macro
Keep in mind that all inherited attributes are going to be included in the kwargs parameter with the default value of None unless specified otherwise. This means you have to be extra careful in the macro implementation function if you refactor a legacy macro: you can no longer merely check for the presence of a key in the kwargs dictionary.

Mutation

In symbolic macros, you will not be able to mutate the arguments passed to the macro implementation function.
# defs.bzl
def _simple_macro_impl(name, visibility, env):
    print(type(env), env)
    env["some"] = "more"

simple_macro = macro(
    attrs = {
        "env": attr.string_dict(configurable = False)
    },
    implementation = _simple_macro_impl
)

# BUILD.bazel
simple_macro(name = "tool", env = {"state": "active"})
Let’s check how this would get evaluated:
$ bazel cquery //...
DEBUG: defs.bzl:36:10: dict {"state": "active"}
        File "defs.bzl", line 37, column 17, in _simple_macro_impl
                env["some"] = "more"
Error: trying to mutate a frozen dict value
This, however, is no different to legacy macros where you could not modify mutable objects in place either. In situations like this, creating a new dict with env = dict(env) would be of help.

In legacy macros you can still modify objects in place when they are inside the kwargs, but this arguably leads to code that is harder to reason about and invites subtle bugs that are a nightmare to troubleshoot in a large codebase. See the Mutability in Starlark section to learn more.

This is still possible in legacy macros:
# defs.bzl
def special_test_legacy(name, **kwargs):
    kwargs["name"] = name
    kwargs["env"]["some"] = "more"
    cc_test(**kwargs)

# BUILD.bazel
special_test_legacy("small-test", env = {"state": "active"})
Let’s see how the updated environment variables were set for the cc_test target created in the legacy macro:
$ bazel cquery //test/package:small-test --output=build
...
cc_test(
  name = "small-test",
  ...
  env = {"state": "active", "some": "more"},
)
This is no longer allowed in symbolic macros:
# defs.bzl
def _simple_macro_impl(name, visibility, **kwargs):
    print(type(kwargs["env"]), kwargs["env"])
    kwargs["env"]["some"] = "more"
It would fail to evaluate:
$ bazel cquery //...

DEBUG: defs.bzl:35:10: dict {"state": "active"}
        File "defs.bzl", line 36, column 27, in _simple_macro_impl
                kwargs["env"]["some"] = "more"
Error: trying to mutate a frozen dict value
Configuration

Symbolic macros, just like legacy macros, support configurable attributes, commonly known as select(), a Bazel feature that lets users determine the values of build rule (or macro) attributes at the command line.

Here’s an example symbolic macro with the select toggle:
# defs.bzl
def _special_test_impl(name, **kwargs):
    cc_test(
        name = name,
        **kwargs
    )
special_test = macro(
    inherit_attrs = native.cc_test,
    attrs = {},
    implementation = _special_test_impl,
)

# BUILD.bazel
config_setting(
    name = "linking-static",
    define_values = {"static-testing": "true"},
)

config_setting(
    name = "linking-dynamic",
    define_values = {"static-testing": "false"},
)

special_test(
    name = "my-special-test",
    srcs = ["test.cc"],
    linkstatic = select({
        ":linking-static": True,
        ":linking-dynamic": False,
        "//conditions:default": False,
    }),
)
Let’s see how this expands in the BUILD file:
$ bazel query //test/package:my-special-test --output=build
cc_test(
  name = "my-special-test",
  ...(omitted for brevity)...
  linkstatic = select({
    "//test/package:linking-static": True,
    "//test/package:linking-dynamic": False,
    "//conditions:default": False
  }),
)
The query command does show that the macro was expanded into a cc_test target, but it does not show what the select() is resolved to. For this, we would need to use the cquery (configurable query) which is a variant of query that runs after select()s have been evaluated.
$ bazel cquery //test/package:my-special-test --output=build
cc_test(
  name = "my-special-test",
  ...(omitted for brevity)...
  linkstatic = False,
)
Let’s configure the test to be statically linked:
$ bazel cquery //test/package:my-special-test --output=build --define="static-testing=true"
cc_test(
  name = "my-special-test",
  ...(omitted for brevity)...
  linkstatic = True,
)
Each attribute in the macro function explicitly declares whether it tolerates select() values, in other words, whether it is configurable. For common attributes, consult the Typical attributes defined by most build rules to see which attributes can be configured. Most attributes are configurable, meaning that their values may change when the target is built in different ways; however, there are a handful which are not. For example, you cannot assign a *_test target to be flaky using a select() (e.g., to mark a test as flaky only on aarch64 devices).

Unless specifically declared, all attributes in symbolic macros are configurable (if they support this) which means they will be wrapped in a select() (that simply maps //conditions:default to the single value), and you might need to adjust the code of the legacy macro you migrate. For instance, this legacy code used to append some dependencies with the .append() list method, but this might break:
# defs.bzl
def _simple_macro_impl(name, visibility, **kwargs):
    print(kwargs["deps"])
    kwargs["deps"].append("//:commons")
    cc_test(**kwargs)

simple_macro = macro(
    attrs = {
        "deps": attr.label_list(),
    },
    implementation = _simple_macro_impl,
)

# BUILD.bazel
simple_macro(name = "simple-test", deps = ["//:helpers"])
Let’s evaluate the macro:
$ bazel cquery //...
DEBUG: defs.bzl:35:10: select({"//conditions:default": [Label("//:helpers")]})
        File "defs.bzl", line 36, column 19, in _simple_macro_impl
                kwargs["deps"].append("//:commons")
Error: 'select' value has no field or method 'append'
Keep in mind that select is an opaque object with limited interactivity. It does, however, support modification in place, so that you can extend it, e.g., with kwargs["deps"] += ["//:commons"]:
$ bazel cquery //test/package:simple-test --output=build
...
cc_test(
  name = "simple-test",
  generator_name = "simple-test",
  ...
  deps = ["//:commons", "//:helpers", "@rules_cc//:link_extra_lib"],
)
Be extra vigilant when dealing with attributes of bool type that are configurable because the return type of select converts silently in truthy contexts to True. This can lead to some code being legitimate, but not doing what you intended. See Why does select() always return true? to learn more.

When refactoring, you might need to make an attribute configurable, however, it may stop working using the existing macro implementation. For example, imagine you need to pass different files as input to your macro depending on the configuration specified at runtime:
# defs.bzl
def _deployment_impl(name, visibility, filepath):
    print(filepath)
    # implementation

simple_macro = macro(
    attrs = {
        "filepath": attr.string(),
    },
    implementation = _deployment_impl,
)

# BUILD.bazel
deployment(
    name = "deploy",
    filepath = select({
        "//conditions:default": "deploy/config/dev.ini",
        "//:production": "deploy/config/production.ini",
    }),
)
In rules, select() objects are resolved to their actual values, but in macros, select() creates a special object of type select that isn’t evaluated until the analysis phase, which is why you won’t be able to get actual values out of it.
$ bazel cquery //:deploy
...
select({
    Label("//conditions:default"): "deploy/config/dev.ini",
    Label("//:production"): "deploy/config/production.ini"
    })
...
In some cases, such as when you need to have the selected value available in the macro function, you can have the select object resolved before it’s passed to the macro. This can be done with the help of an alias target, and the label of a target can be turned into a filepath using the special location variable:
# defs.bzl
def _deployment_impl(name, visibility, filepath):
    print(type(filepath), filepath)
    native.genrule(
        name = name + "_gen",
        srcs = [filepath],
        outs = ["config.out"],
        cmd = "echo '$(location {})' > $@".format(filepath)
    )

deployment = macro(
    attrs = {
        "filepath": attr.label(configurable = False),
    },
    implementation = _deployment_impl,
)

# BUILD.bazel
alias(
    name = "configpath",
    actual = select({
        "//conditions:default": "deploy/config/dev.ini",
        "//:production": "deploy/config/production.ini",
    }),
    visibility = ["//visibility:public"],
)

deployment(
    name = "deploy",
    filepath = ":configpath",
)
You can confirm the right file is chosen when passing different configuration flags before building the target:
$ bazel cquery //tests:configpath --output=build
INFO: Analyzed target //tests:configpath (0 packages loaded, 1 target configured).
...
alias(
  name = "configpath",
  visibility = ["//visibility:public"],
  actual = "//tests:deploy/config/dev.ini",
)
...

$ bazel build //tests:deploy_gen && cat bazel-bin/tests/config.out
...
DEBUG: defs.bzl:29:10: Label //tests:configpath
...
tests/deploy/config/dev.ini
Querying macros

Since macros are evaluated when BUILD files are queried, you cannot use Bazel itself to query “raw” BUILD files. Identifying definitions of legacy macros is quite difficult, as they resemble Starlark functions, but instantiate targets. Using bazel cquery with the --output=starlark might help printing the properties of targets to see if they have been instantiated from macros.

When using --output=build, you can also inspect some of the properties:

generator_name (the name attribute of the macro)

generator_function (which function generated the rules)

generator_location (where the macro was invoked)

This information with some heuristics might help you to identify the macros. Once you have identified the macro name, you can run bazel query --output=build 'attr(generator_function, simple_macro, //...)' to find all targets that are generated by a particular macro. Finding symbolic macros, in contrast, is trivial as you would simply need to grep for macro() function calls in .bzl files.

To query unprocessed BUILD files, you might want to use buildozer which is a tool that lets you query the contents of BUILD files using a static parser. The tool will come in handy for various use cases when refactoring, such as migrating the macros. Because both legacy and symbolic macros follow the same BUILD file syntax, buildozer can be used to query build metadata for either type.

Let’s write some queries for these macro invocations:
# BUILD.bazel
perftest(
  name = "apis",
  srcs = ["//:srcA", "//:srcB"],
  env = {"type": "performance"},
)

perftest(
  name = "backend",
  srcs = ["//:srcC", "//:srcD"],
  env = {"type": "performance"},
)
Print all macro invocations (raw) across the whole workspace:
$ buildozer 'print rule' "//...:%perftest"

perftest(
    name = "apis",
    srcs = [
        "//:srcA",
        "//:srcB",
    ],
    env = {"type": "performance"},
)

perftest(
    name = "backend",
    srcs = [
        "//:srcC",
        "//:srcD",
    ],
    env = {"type": "performance"},
)
Print attribute’s values for all macro invocations:
$ buildozer 'print label srcs' "//...:%perftest"
//test/package:apis [//:srcA //:srcB]
//test/package:backend [//:srcC //:srcD]
Print path to files where macros are invoked:
$ buildozer 'print path' "//...:%perftest" | xargs realpath --relative-to "$PWD" | sort | uniq
test/package/BUILD.bazel
The path can be combined with an attribute, e.g., print path and the srcs to make reviewing easier:
$ buildozer 'print path srcs' "//...:%perftest"
/home/user/code/project/test/package/BUILD.bazel [//:srcA //:srcB]
/home/user/code/project/test/package/BUILD.bazel [//:srcC //:srcD]
Remove an attribute from a macro invocation (e.g., env will be set up in the macro implementation function):
$ buildozer 'remove env' "//...:%perftest"
fixed /home/user/code/project/test/package/BUILD.bazel
You might also want to check that no macro invocation passes an attribute that is not supposed to be passed. In the command output, the missing means the attribute doesn’t exist; these lines can of course be ignored with grep -v missing:
$ buildozer -quiet 'print path env' "//...:%perftest" 2>/dev/null
/home/user/code/project/test/package/BUILD.bazel {"type": "performance"}
/home/user/code/project/test/package/BUILD.bazel (missing)
We hope that these practical suggestions and examples will assist you in your efforts to modernize the use of macros throughout your codebase. Remember that you can compose legacy and symbolic macros, which may be useful during the transition. Also, legacy macros can still be used and are to remain supported in Bazel for the foreseeable future. Some organizations may even choose not to migrate at all, particularly if they rely on the current behavior of the legacy macros heavily.
November 20, 2025 12:00 AM

November 18, 2025

Donnacha Oisín Kidney

POPL Paperâ€”Hyperfunctions: Communicating Continuations

Posted on November 18, 2025

Tags:

New paper: “Hyperfunctions: Communicating Continuations”, by myself and Nicolas Wu, will be published at POPL 2026.

The preprint is available here.

The work contained in the paper started with a post on this blog in 2021. I had read a paper by Launchbury, Krstić, and Sauerwein (2000) and I recognised that their hyperfunction construction was quite similar to some types I had used to implement breadth-first traversal (in particular, the Queue in this post). After that, I started seeing hyperfunctions in lots of different settings, rediscovered by different authors, and almost always accompanied by some remark about how difficult it was to understand the type.

My hope with this paper is to clarify and explain what hyperfunctions can do, and where they might be useful. Ideally, the paper will save some future programmer from having to reinvent the type.

Launchbury, John, Sava Krstić, and Timothy E. Sauerwein. 2000. Zip Fusion with Hyperfunctions. Oregon Graduate Institute. https://launchbury.blog/wp-content/uploads/2019/01/zip-fusion-with-hyperfunctions.pdf.

by Donnacha Oisín Kidney at November 18, 2025 12:00 AM

November 17, 2025

Monday Morning Haskell

Serializing an HTTP Response & Running the Server
Welcome to the third and final part of our simple HTTP Server series. In Part 1, we defined our request and response types, showing how we can use documentation to understand these. In Part 2, we wrote a parser for the HTTP Request type, which will allow our server to understand the inputs we receive from clients. In this final part, we’ll define a function to produce a response, serialize that response, and wrap all this in an actual server.

If you want a more in depth tutorial on some of the aspects of this series, you can take a look at some of our Haskell courses! Solve.hs will teach you about parsing so that you can use Megaparsec effectively. And Practical Haskell will show you how to write a more advanced server with special Haskell type features!

Creating a Response

Before we get started, let’s recall what our main types look like:
newtype HttpHeaders = HttpHeaders
    (HM.HashMap ByteString ByteString)
    deriving (Show, Eq)

data HttpRequest = HttpRequest
    { requestMethod :: HttpMethod
    , requestUri :: ByteString
    , requestHttpVersion :: (Word8, Word8)
    , requestHeaders :: HttpHeaders
    , requestBody :: Maybe ByteString
    }
    deriving (Show, Eq)

data HttpResponse = HttpResponse
    { responseHttpVersion :: (Word8, Word8)
    , responseStatusCode :: Int
    , responseReason :: ByteString
    , responseHeaders :: HttpHeaders
    , responseBody :: Maybe ByteString
    }
    deriving (Show, Eq)
Last time, we parsed a request. Since we’re writing a very simple server, we won’t actually use much of anything in that request. We’ll just return a basic “200 OK” response if we parsed the request successfully, and a “400 Bad Request” response if we could not parse the request.

When we run a Megaparsec parser, we get an Either result thanks to this function:
runParserT :: ParsecT e s m a -> m (Either (ParseErrorBundle) a)
We don’t need to know much about ParseErrorBundle, except that it is Show-able. So we’ll write our server function to have this outline:
server :: (Show a) => Either a HttpRequest -> IO HttpResponse
server (Left e) = undefined
server (Right (HttpRequest m u v (HttpHeaders h) b)) = undefined
In the Left case, we want to build a response that has a 400 exit code, a “Bad Request” reason, and have it show the parse error for our response body. Here’s our first pass:
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.HashMap.Lazy as HM
import qualified Data.ByteString.Lazy.Char8 as BSC

server :: (Show a) => Either a HttpRequest -> IO HttpResponse
server (Left e) =
    let body = BSC.pack (show e)
    in  return $ HttpResponse (1,1) 400 "Bad Request" (HttpHeaders HM.empty) (Just body)
There’s something missing though! When we return a response, we always want to include a Content-Length header so that the recipient knows how many bytes to parse for the body. It’s easy enough to add this though:
server :: (Show a) => Either a HttpRequest -> IO HttpResponse
server (Left e) =
    let body = BSC.pack (show e)
        headerMap = HM.singleton "Content-Length" (BSC.pack $ show (BS.length body))
    in  return $ HttpResponse (1,1) 400 "Bad Request" (HttpHeaders headerMap) (Just body)
server (Right (HttpRequest m u v (HttpHeaders h) b)) = undefined
The Right branch will have a similar look. We’ll just use a dummy body instead of reporting an error. The other difference here is that we’ll use the input’s HTTP version as our version.
server :: (Show a) => Either a HttpRequest -> IO HttpResponse
server (Left e) =
    let body = BSC.pack (show e)
        headerMap = HM.singleton "Content-Length" (BSC.pack $ show (BS.length body))
    in  return $ HttpResponse (1,1) 400 "Bad Request" (HttpHeaders headerMap) (Just body)
server (Right (HttpRequest m u v (HttpHeaders h) b)) = do
    let body = "This is the response body!"
    let headerMap = HM.singleton "Content-Length" (BSC.pack $ show (BS.length body))
    return $ HttpResponse v 200 "OK" (HttpHeaders headerMap) (Just body)
Serializing the Response

Now that we’ve got our response, we need to convert it to a ByteString! This is the opposite of parsing, but a lot of the same principles apply. We want to consider the specification of the response in our documentation, outline the structure of our response, write applicable helpers, and then fill everything in.

Here’s what a general outline might look like:
serializeHttpVersion :: (Word8, Word8) -> ByteString

serializeStatusCode :: Int -> ByteString

serializeHttpHeaders :: HttpHeaders -> ByteString

serializeBody :: Maybe ByteString -> ByteString

-- Not final!
serializeHttpResponse :: HttpResponse -> ByteString
serializeHttpResponse (HttpResponse v c r h b) =
    serializeHttpVersion v <> serializeStatusCode c <> r <>
    serializeHttpHeaders h <> serializeBody b
Obviously, we don’t need a special function to serialize the “reason” field since it’s already a ByteString! Additionally, it is easy enough to serialize the body with fromMaybe:
serializeHttpVersion :: (Word8, Word8) -> ByteString

serializeStatusCode :: Int -> ByteString

serializeHttpHeaders :: HttpHeaders -> ByteString

-- Not final!
serializeHttpResponse :: HttpResponse -> ByteString
serializeHttpResponse (HttpResponse v c r h b) =
    serializeHttpVersion v <> serializeStatusCode c <> r <>
    serializeHttpHeaders h <> end
    where
        end = fromMaybe “” b
However, we should consider as well the format of the “response line” and the headers in the documentation.

Here’s the full response format:
Response = Status-Line
           *(( general-header
            | response-header
            | entity-header ) CRLF)
           CRLF
           [ message-body ]
Here’s the response line:
Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
And here’s the header structure:
message-header = field-name ":" [ field-value ]
 field-name     = token
 field-value    = *( field-content | LWS )
 field-content  = <the OCTETs making up the field-value
                  and consisting of either *TEXT or combinations
                  of token, separators, and quoted-string>
Just as with parsing, we’ll want helpers to incorporate the SP character and CRLF sequence into our serialized bytestring using helpers.

Adding Helpers

Since prepending is generally more efficient with Haskell’s string manipulation that appending, we’ll write these helpers like so:
addSp :: ByteString -> ByteString
addSp = BS.cons (o ' ')

addCrlf :: ByteString -> ByteString
addCrlf = (<>) "\r\n"
Factoring these in, we can put spaces in front of the status code and reason, and a CLRF in front of the headers section. (We’ll handle the final CRLF before the body in serializeHttpHeaders).
serializeHttpResponse :: HttpResponse -> ByteString
serializeHttpResponse (HttpResponse v c r h b) =
    serializeHttpVersion v <> addSp (serializeStatusCode c) <> addSp r <>
    (addCrlf $ serializeHttpHeaders h) <> end
    where
        end = fromMaybe "" b
Version and Code

Now we just have to fill in the functions we outlined earlier. We can start with the HTTP version and the status code. These are straightforward, since we are mostly just using show and pack to convert these values into bytestrings. With the version, we also include the HTTP/ prefix and the . separator.
serializeHttpVersion :: (Word8, Word8) -> ByteString
serializeHttpVersion (d1, d2) = "HTTP/" <>
    (BSC.pack $ (show d1 <> ('.' : show d2)))

serializeStatusCode :: Int -> ByteString
serializeStatusCode = BSC.pack . show
Serializing Headers

Serializing headers is a little trickier, just like parsing them was. The crux of this is HashMap’s foldrWithKey function:
foldrWithKey :: (k -> v -> a -> a) -> a -> HashMap k v -> a
In this setup, k, v and a are all ByteStrings. We get k and v from the keys and values in our map. Then a is the resulting bytestring we construct by appending. Our “folding function” will take one key and value and prepend the line with these to the prior string. Here’s what that folding function looks like:
serializeHttpHeaders :: HttpHeaders -> ByteString
serializeHttpHeaders (HttpHeaders mp) = ...
    where
        f k v bs = k <> (o ':' `BS.cons` v) <> (addCrlf bs)
Starting from the right of this line, we prepend the CRLF for this line to the prior bytestring. Then we prepend a colon : to the value of this header. Finally we prepend the key, the name of the header to this. We invoke the fold like so:
serializeHttpHeaders :: HttpHeaders -> ByteString
serializeHttpHeaders (HttpHeaders mp) = HM.foldrWithKey f "\r\n" mp
    where
        f k v bs = k <> (o ':' `BS.cons` v) <> (addCrlf bs)
The “initial” bytestring is a CLRF, since the headers section must contain this sequence even without any headers. As an alternative though, we could have prepended it to our “body” portion as well and used an empty string here.

And believe it or not, we’ve tied up all the loose ends with serializing our response! That function should be working now. We just have to pull everything together at the server level.

Writing the Networking Layer

When writing a utility “from scratch”, you’ll have many choices of which details you’ll implement at the lowest possible level. So far, we’ve gone into a lot of depth on parsing and serializing the message. We could also go into a lot of depth on the network aspects of running a server - binding to a port, listening on that port, and accepting connections. You’ll find this sort of functionality in Network.Socket.

Instead though, we’ll shortcut a lot of those details by relying on the runTCPServer function from Network.Run.TCP.
runTCPServer :: Maybe HostName -> ServiceName -> (Socket -> IO a) -> IO a
All we have to provide to use this is our hostname (Just “127.0.0.1” to run locally), a port (e.g. 3000) and a server loop function to communicate with the socket:
import Network.Run.TCP (runTCPServer)

serverLoop :: Socket -> IO ()
serverLoop sock = ...

main :: IO ()
main = runTCPServer (Just “127.0.0.1”) “3000” serverLoop
Our final remaining task is to implement serverLoop. This function handles a single connection with a single client, communicating back and forth until the client ends the connection. It runs as a separate process, forked off our main process. There are three primary functions we would use to run this communication. Since we’re using bytestrings, we’ll get them from Network.Socket.ByteString.
recv :: Socket -> Int -> IO ByteString

send :: Socket -> ByteString -> IO Int

sendAll :: Socket -> ByteString -> IO ()
When we call recv, our function blocks until it receives data from the client, up to a maximum number of bytes we specify. (A single large request could require multiple recv calls). We can then use send to communicate our response, which will return the number of bytes the client accepted. Again, large requests could require multiple send calls, but sendAll automatically ensures the whole response gets sent.

Hopefully the structure of serverLoop is clear. We want to “receive” the data, parse it as a request, pass the parser result to our server function to get a response, and send this response. We’ll also recurse to listen for more data from this client, unless we receive a null input (this indicates the client closed the connection).
server :: (Show a) => Either a HttpRequest -> IO HttpResponse
server = ... (as above)

serverLoop :: Socket -> IO ()
serverLoop sock = do
    msg <- recv sock 1024
    unless (BS.null msg) $ do
        req <- runParserT parseHttpRequest "" msg
        response <- server req
        let resp = serializeHttpResponse response
        sendAll sock resp
        serverLoop sock
Our server is now complete! The next section has all the code we wrote this week so you can put it all together:

Final Code
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Lazy.Char8 as BSC
import Data.ByteString.Lazy (ByteString)
import Network.Run.TCP
import Network.Socket
import Network.Socket.ByteString.Lazy
import Text.Megaparsec

addSp :: ByteString -> ByteString
addSp = BS.cons (o ' ')

addCrlf :: ByteString -> ByteString
addCrlf = (<>) "\r\n"

serializeHttpVersion :: (Word8, Word8) -> ByteString
serializeHttpVersion (d1, d2) = "HTTP/" <>
    (BSC.pack $ (show d1 <> ('.' : show d2)))

serializeStatusCode :: Int -> ByteString
serializeStatusCode = BSC.pack . show

serializeHttpHeaders :: HttpHeaders -> ByteString
serializeHttpHeaders (HttpHeaders mp) = HM.foldrWithKey f "\r\n" mp
    where
        f k v bs = k <> (o ':' `BS.cons` v) <> (addCrlf bs)

serializeHttpResponse :: HttpResponse -> ByteString
serializeHttpResponse (HttpResponse v c r h b) =
    serializeHttpVersion v <> addSp (serializeStatusCode c) <> addSp r <>
    (addCrlf $ serializeHttpHeaders h) <> end
    where
        end = fromMaybe "" b

server :: (Show a) => Either a HttpRequest -> IO HttpResponse
server (Left e) =
    let body = BSC.pack (show e)
        headerMap = HM.singleton "Content-Length" (BSC.pack $ show (BS.length body))
    in  return $ HttpResponse (1,1) 400 "Bad Request" (HttpHeaders headerMap) (Just body)
server (Right (HttpRequest m u v (HttpHeaders h) b)) = do
    let body = "This is the response body!"
    let headerMap = HM.singleton "Content-Length" (BSC.pack $ show (BS.length body))
    return $ HttpResponse v 200 "OK" (HttpHeaders headerMap) (Just body)

serverLoop :: Socket -> IO ()
serverLoop sock = do
    putStrLn "Running Server Loop"
    msg <- recv sock 1024
    unless (BS.null msg) $ do
        req <- runParserT parseHttpRequest "" msg
        response <- server req
        let resp = serializeHttpResponse response
        sendAll sock resp
        serverLoop sock

main :: IO ()
main = runTCPServer (Just "127.0.0.1") "3000" serverLoop
If you put all the code from the three parts of this series together, you’ll have an executable program you can run. While running this, you can point your browser or an application like Postman to this program and it should return a “200 OK” response with a body!

Obviously there’s more work to do to make this server useful, but you can now work entirely off of the HttpRequest and HttpResponse types without having to worry about the network layer, parsing or serialization!

Conclusion

If you want to go even further with writing a server, you can take our course Practical Haskell. In this course you’ll learn about Servant, which uses Haskell’s type mechanics to abstract away even more HTTP details. Instead of worrying about requests and responses, you can define all your functions in terms of your “business” types.

You can also learn more about parsing by taking our other course, Solve.hs, and focusing on Module 4. You’ll learn how to build a parser from scratch, as well as how to use Megaparsec.

Starting next week, we’ll follow a similar trajectory of “building from scratch” with a more complicated project!
by James Bowen at November 17, 2025 09:30 AM

November 16, 2025

Philip Wadler

Maybe Donâ€™t Talk to the New York Times About Zohran Mamdani

Peter Coviello explains how the New York Times enables right-wing billionaires to push their agenda.
Like so many other bits of Times coverage, the whole of the piece is structured as an orchestrated encounter. Some people say this; however, others say this. It’s so offhand you can think you’re gazing through a pane of glass. Only when you stand a little closer, or when circumstances make you a little less blinkered, do you notice the fact which then becomes blinding and finally crazymaking, which is just that there is zero, less than zero, stress put on the relation between those two “sides,” or their histories, or their sponsors, or their relative evidentiary authority, or any of it. Instead, what you get is a piece making the various more or less bovine noises of studious grey-lady impartiality, with the labor of anything resembling “appraisal” surgically excised.
... [W]hat this sort of reporting ultimately means is that if you have enough money to get somebody, anybody, to produce a white paper for you, which you can then put on some think-tank stationery? Then, my friend, you are ready to enter into the rushing current of elite reportage. For no matter how unhinged the position you’ve taken, or paid someone marginally credentialed to sketch out on your behalf—“Can Woman Think?: We Investigate,” “Is the Negro a Man: A Reconsideration”—that opinion will, by virtue of such provenance, possess all needed evidentiary gravity for the Times. And then some. (Only yesterday the Times ran this actual story, which is not parody.)
George Monbiot points out that the same forces are in action at the BBC.
This also describes the BBC’s understanding of “impartiality”. While it no longer provides a platform for outright climate denial, almost every day it breaks its own editorial guidelines by hosting Tufton Street junktanks (which often argue against environmental action) without revealing who funds them. Shouldn’t we be allowed to know whether or not they are sponsored by fossil fuel companies?

by Philip Wadler (noreply@blogger.com) at November 16, 2025 01:38 PM

November 13, 2025

Lee Pike

This blog has moved!

This blog has moved to https://leepike.github.io/blog/.

by Lee Pike at November 13, 2025 08:17 PM

Haskell Interlude

73: Jean-Philippe Bernardy

In this Interlude, weâ€™re joined by Jean-Philipe Bernardy, a Senior Lecturer at University of Gothenburg and Chalmers University of Technology. We discuss letting types be your guide, getting into AI to feed yourself, and never testing your programs.

by Haskell Podcast at November 13, 2025 05:00 PM

Christoph Breitkopf

Interval Tables in Common Lisp

Recently, I've been getting back to parensful programming. I started with Scheme in the 1980s after reading SICP, but for most of my programming, I've preferred statically typed languages. However, for some reason, interacting with Lisp code always gives me that warm, fuzzy feeling, so in the intervening years, I sometimes tried to get back to Scheme, but was always put off by the fractured ecosystem and incompatibilities between implementations. I remember trying Common Lisp too, but the fact that it's a Lisp-2, coupled with the ugly #'function syntax, drove me away before I had a chance to see the positives.
But last time I had a strong urge to write Lisp, I just sat down to prototype something larger in Common Lisp, and parts of it started to click. I grew accustomed to the less-than-ideal aspects (quoting from the CLtL2 index: "kludges, 1-971") and began to appreciate the scope of the language, its type system, the quality and compatibility of implementations, and the surprisingly stable library ecosystem. I've been using Common Lisp regularly for about two years now, and I felt it's time to port some libraries I've been using in other languages.
So I started writing a Lisp version of my Haskell IntervalMap library. When writing the Haskell version, I started out with a simple API using a concrete type for intervals, and later added a version using type classes. For the functions to provide in addition to those for interval queries, the Data.Map API was a good guideline. (And a source of much work - it's a large API with almost 100 functions, even if many are just variants of others. And since Haskell also has Data.Set there are IntervalSets, too.)
Common Lisp does not have sorted collections in the standard, and there's no widely accepted library either. As for other tables, the standard has property lists, association lists, and hash tables. The first is rather specialized; association lists are, well, lists; so only hash tables could serve as inspiration for the API. In comparison to Haskell's Data.Map, Lisps hash table API is small - just about 10 functions. Unlike most data structures in Haskell, and the pure subset of Lisp lists, hash-tables are not persistent, but are mutated when adding, changing, or deleting elements. It seemed advisable, if only for efficiency reasons, to make the interval table API use destructive operations like hash tables, and perhaps later offer a persistent version as an alternative.
Efficiency considerations also played a role in the API design. In Haskell, there's a lower barrier to returning, say, a list of tuples from a function, because the assumption is that the compiler will transform intermediate data structures away. In practice, that's more often an unfulfilled hope than a realistic assumption, since it requires coding things in a certain way when producing the result and sufficient inlining, which is problematic given the recursive structure of binary trees. In Lisp, consing (Lisp slang for "allocating on the heap") intermediate data structures will most certainly not be optimized away, so the API should avoid that as far as possible. Instead, it takes a function argument that is called with each key-value pair.
The most important decision, however, was how to handle ordering. Common Lisp lacks comparison predicates that work across all comparable types. So there are two options: pass an ordering predicate to the table constructor, or use CLOS generic methods to implement the necessary operations on intervals, like the Interval type class in the Haskell version. Not having used CLOS extensively so far, I decided to start with the seemingly simpler and more functional-style predicate version. I might add an alternative CLOS-based API later on. Using just a single predicate leaves the question of how to get at the lower and upper bounds of the intervals themselves. No problem with CLOS - just add generic methods. The critical realization was to pass the lower and upper interval bounds as separate values, obviating the need for an actual interval type.
Taken together, this led to the following basic API, choosing names to avoid clashes with standard functions:
make-interval-table predicate [bounds-type]
interval-table-count table
get-interval lower-bound upper-bound [default] (setf-able, of course)
delete-interval lower-bound upper-bound table
clear-intervals table
map-intervals function table
But where are the interval-lookup functions, like containing and intersecting? They turned out to fit quite nicely into the map paradigm, since most of the time, you want to process their results further. Thus, map-intervals has a result-type parameter just like the standard map function, and keyword arguments like :containing or :intersecting for interval queries. And there are some functions related to the table being ordered, such as get-min, get-max, delete-min, ... There's certainly room for improvement, and some experimentation with the API would be good, which is why I have not yet requested addition to Quicklisp.
Here is the code and documentation on GitHub: https://github.com/bokesan/interval-tables

by bokesan (noreply@blogger.com) at November 13, 2025 03:39 PM

November 11, 2025

Chris Penner

Ditch your (mut)ex, you deserve better
Having access to multiple parallel CPU cores isn't a new thing by any means, people have been programming in parallel for half a century now, but recent years we've found ourselves at an inflection point. Moore's law is dying, beefy single cores are no longer keeping up. Modern computers come with multiple CPU cores, so exploiting parallel compute is more important than ever. Given how long it's been an area of research we can naturally expect that effective tools have taken root and that synchronizing threads is trivial now right...?

Unfortunately this has not been my experience, and I'm willing to bet it hasn't been yours either. Managing shared state across threads is hard, and the most commonly used tools: mutexes and semaphores, simply haven't evolved much since their inception.

The words that follow will dig into the problems inherent to mutexes and synchronizing shared mutable state. Afterwards we'll look into other avenues which should prove more helpful.

The Problem with Shared State

Let's begin by crafting a simple software system which needs synchronization in the first place.

I'll present a commonly used example: the task of managing bank account balances correctly in spite of parallel transfer requests.

Of course real banks don't store all their account balances in RAM, so I'll hope that the reader can apply the concepts from this pedagogical example to a their own domain as necessary, it serves as a stand-in for any sufficiently complex system which requires ad-hoc synchronization of arbitrary data between multiple threads.

Here's some golang'ish pseudo-code (please don't try to actually compile it) for a simple bank account and the operations upon it. I'm focused on the synchronization problems here, so forgive me for skipping the double-entry accounting, input validation, and other real-world complexities.
struct Account {
  balance int,
}

// Deposit money into an account
func (a *Account) deposit(amount int) {
  a.balance += amount
}

// Withdraw money from an account, or return false if there are insufficient funds
func (a *Account) withdraw(amount int) bool {
  if (a.balance <= amount) {
    return false
  } else {
    balance -= amount
    return true
  }
}
Great! This defines our Account type and some methods for withdrawing and depositing money into such an account. Now let's add a function to transfer money between accounts:
func transfer(from *Account, to *Account, amount int) bool {
  if (from.withdraw(amount)) {
    to.deposit(amount)
    return true
  } else {
    return false
  }
}
Looks good, but now what happens when we start handling multiple requests concurrently?
struct TransferRequest {
  from *Account,
  to *Account,
  amount int,
}

func main() {
  // loop forever, accepting transfer requests and processing them in goroutines
  for {
    req := acceptTransferRequest()
    go transfer(req.from, req.to, req.amount)
  }
}
Things may work well in your tests if you're (un)lucky, and might even work well in production for a while, but sooner or later you're going to lose track of money and have some confused and angry customers.

Do you see why? This brings us to our first synchronization problem to solve, Data Races.

Data races

Most programming languages are imperative with mutable data structures [citation needed], so passing pointers to multiple threads leads to shared mutable data, and shared mutable data necessarily causes data races.

A data race occurs any time two threads access the same memory location concurrently and non-deterministically when at least one of the accesses is a write. When a data race is present two runs of the same code with the same state may non-deterministically have a different result.

We're passing accounts by reference here, so multiple threads have access to modify the same account. With multiple transfer go-routines running on the same account, each could be paused by the scheduler at nearly any point during its execution. This means that even within this simple example we've already introduced a data race. Take another look at the withdraw function, I'll point it out:
// Withdraw money from an account, or return false if there are insufficient funds
func (a *Account) withdraw(amount int) bool {
  hasFunds := a.balance >= amount 
  // HERE! The scheduler could pause execution here and switch to another thread
  if (hasFunds) {
    balance -= amount
    return true
  } else {
    return false
  }
}
If two threads are withdrawing $100 from Alice's account, which only has $150 in it, it's possible that thread 1 checks the balance, sees there's enough money, then gets paused by the scheduler. Thread 2 runs, checks the balance, also sees there's enough money, then withdraws $100. When thread 1 later resumes execution after the check it withdraws its $100 too, Alice's account ends up with a negative balance of -$50, which is invalid even though we had validation!

This sort of concurrency error is particularly insidious because the original withdraw method is perfectly reasonable, idiomatic, and correct in a single-threaded program; however when we decide to add concurrency at a completely different place in the system we've introduced a bug deep within existing previously correct code. The idea that a perfectly normal evolution from a single-threaded to a multi-threaded program can introduce critical system-breaking bugs in completely unrelated code without so much as a warning is quite frankly completely unacceptable. As a craftsman I expect better from my tools.

Okay, but now that we've lost thousands if not millions of dollars, how do we fix this?

Traditional knowledge points us towards Mutexes.

Mutexes

Okay, we've encountered a problem with our shared mutable state, the traditional approach to solving these problems is to enforce exclusive access to the shared data in so-called "critical sections". Mutexes are so-named because they provide mutual exclusion, meaning only a single thread may access a given virtual resource at a time.

Here's how we can edit our program to fix the data race problems using a mutex:
struct Account {
  mutex Mutex,
  balance int,
}

func (a *Account) deposit(amount int) {
  a.mutex.lock()
  defer a.mutex.unlock()
  a.balance += amount
}

func (a *Account) withdraw(amount int) bool {
  a.mutex.lock()
  defer a.mutex.unlock()
  hasFunds := a.balance >= amount 
  if (hasFunds) {
    balance -= amount
    return true
  } else {
    return false
  }
}
Now every Account has a mutex on it, which acts as an exclusive lock.

It's much like a bathroom key in a busy restaurant. When you want to use the bathroom, you take the key, there's only one key available for each bathroom, so while you've got hold of it nobody else can use that bathroom. Now you're free to do your business, then you return the key to the hook on the wall for the next person.

Unlike a bathroom key however, mutexes are only conceptual locks, not real locks, and as such they operate on the honor system.

If the programmer forgets to lock the mutex the system won't stop them from accessing the data anyways, and even then there's no actual link between the data being locked and the lock itself, we need to trust the programmers to both understand and respect the agreement. A risky prospect on both counts.

In this case, we've addressed the data-race within withdraw and deposit by using mutexes, but we've still got a problem within the transfer function.

What happens if a thread is pre-empted between the calls to withdraw and deposit while running the transfer function? It's possible that money will been withdrawn from an account, but won't have yet been deposited in the other. This is an inconsistent state of the system, the money has temporarily disappeared, existing only in the operating memory of a thread, but not visible in any externally observable state. This can (and will) result in very strange behaviour.

As a concrete way to observe the strangeness let's write a report function which prints out all account balances:
func report() {
    for _, account := range accounts {
        account.mutex.lock()
        fmt.Println(account.balance)
        account.mutex.unlock()
    }
}
If we run a report while transfers are ongoing we'll likely see that the count of the total amount of money that exists within the system is incorrect, and changes from report to report, which should be impossible in a closed system like this! This inconsistency occurs even if we obtain the locks for each individual account before checking the balance.

In larger systems this sort of inconsistency problem can cause flaws in even simple logic, since choices may be made against inconsistent system states. The root of this issue is that the transfer function requires holding multiple independent locks, but they're not grouped in any way into an atomic operation.

Composing Critical Sections

We need some way to make the entire transfer operation atomic, at least from the perspective of other threads who are respecting our mutexes.

Okay, well no problem, we can just lock both accounts, right?
func transfer(from *Account, to *Account, amount int) bool {
  from.mutex.lock()
  to.mutex.lock()
  defer from.mutex.unlock()
  defer to.mutex.unlock()

  if (from.withdraw(amount)) {
    to.deposit(amount)
    return true
  } else {
    return false
  }
}
I'm sure some readers have already seen a problem here, but have you seen two problems here?

The first is obvious when you point it out, remember that withdraw and deposit also lock the mutex on the account, so we're trying to acquire the same lock twice in the same thread.

transfer won't even begin to run in this state, it will block forever inside withdraw when it tries to lock the from.mutex for the second time.

Some systems, like re-entrant locks and Java's synchronized keyword do some additional book-keeping which allow a single thread to lock the same mutex multiple times, so using a re-entrant lock here would solve this particular problem. However other systems, like golang, avoid providing re-entrant locks on a matter of principle.

So what can we do? I suppose we'll need to pull the locks out of withdraw and deposit so we can lock them in transfer instead.
func (a *Account) deposit(amount int) {
  a.balance += amount
}

func (a *Account) withdraw(amount int) bool {
  hasFunds := a.balance >= amount 
  if (hasFunds) {
    balance -= amount
    return true
  } else {
    return false
  }
}

func transfer(from *Account, to *Account, amount int) bool {
  from.mutex.lock()
  to.mutex.lock()
  defer from.mutex.unlock()
  defer to.mutex.unlock()

  if (from.withdraw(amount)) {
    to.deposit(amount)
    return true
  } else {
    return false
  }
}
Ugh, a correct transfer function should conceptually just be the composition of our well encapsulated withdraw and a deposit functions, but defining it correctly has forced us to remove the locking from both withdraw and deposit, making both of them less safe to use. It has placed the burden of locking on the caller (without any system-maintained guarantees), and even worse, we now need to remember to go and add locking around every existing withdraw and deposit call in the entire codebase. Even if we try to encapsulate everything within the module and only export "safe" operations we've caused duplication since we now need synchronized and unsynchronized versions of our withdraw and deposit operations. And we'd still need to expose the mutexes if we want to allow callers to synchronize operations with other non-Account data.

What I'm getting at is that mutexes don't compose! They don't allow us to chain multiple critical sections into a single atomic unit, they force us to break encapsulation and thrust the implementation details of mutexes and locking onto the caller who shouldn't need to know the details about which invariants must be maintained deep within the implementation. Adding or removing access to synchronized variables within an operation will also necessitate adding or removing locking to every call site, and those call sites may be in a completely different application or library. This is an absolute mess.

All that sounds pretty bad, but would you believe those aren't the only problems here? It's not just composition that's broken here though, in fixing transfer to make it an atomic operation we've managed to introduce a new, extra-well-hidden deadlock bug.

Deadlocks/Livelocks

Recall that in our main loop we're accepting arbitrary transfer requests and spawning them off in goroutines. What happens in our system if we have two transfer requests, Alice is trying to Venmo Bob $25 for the beanbag chair she just bought off him, meanwhile Bob remembers he needs to Venmo Alice the $130 he owes her for Weird Al concert tickets.

If by sheer coincidence they both submit their requests at the same time, we have two transfer calls:

transfer(aliceAccount, bobAccount, 25)

transfer(bobAccount, aliceAccount, 130)

Each of these calls will attempt to lock their from account and then their to account. If Alice and Bob get very unlucky, the system will start the first transfer and lock Alice's account, then get paused by the scheduler. When the second transfer call comes in, it first locks Bob's account, then tries to lock Alice's account, but can't because it's already locked by the first transfer call.

This is a classic deadlock situation. Both threads will be stuck forever, and worse, both Alice and Bob's accounts will be locked until the system restarts.

This is a pretty disastrous consequence for a problem which is relatively hard to spot even in this trivially simple example. In a real system with dozens or hundreds of methods being parallelized in a combinatorial explosion of ways it's very difficult to reason about this, and can be a lot of work to ensure locks are obtained in a safe and consistent order.

Golang gets some credit here in that it does provide some runtime tools for detecting both dead-locks and data-races, which is great, but these detections only help if your tests encounter the problem; they don't prevent the problem from happening in the first place. Most languages aren't so helpful, these issues can be very difficult to track down in production systems.

Assessing the damage

What a dumpster fire we've gotten ourselves into...

While it may be no accident that the example I've engineered happens to hit all of the worst bugs at once, in my experience, given enough time and complexity these sorts of problems will crop up any system eventually. Solving them with mutexes is especially dangerous because it will seem to be an effective solution at first. Mutexes work fine in small localized use-cases, thus tempting us to use them, but as the system grows organically we stretch them too far and they fail catastrophically as the complexity of the system scales up, causing all sorts of hacky workarounds. I'm of the opinion that crossing your fingers and hoping for the best is not an adequate software-engineering strategy.

So, we've seen that architecting a correct software system using mutexes is possible, but very difficult. Every attempt we've made to fix one problem has spawned a couple more.

Here's a summary of the problems we've encountered:

Data races causing non-determinism and logic bugs

Lack of atomicity causing inconsistent system states

Lack of composition causing

Broken encapsulation

Code duplication

Cognitive overload on callers

Deadlocks/livelocks causing system-wide freezes

New features may require changes to every call-site

In my opinion, we've tried to stretch mutexes beyond their limits, both in this blog post and in the industry as a whole. Mutexes work great in small, well-defined scopes where you're locking a single resource which is only ever accessed in a handful of functions in the same module, but they're too hard to wrangle in larger complex systems with many interacting components maintained by dozens or hundreds of developers. We need to evolve our tools and come up with more reliable solutions.

Cleaning up the Chaos

Thankfully, despite an over-reliance on mutexes, we as an industry have still learned a thing or two since the 1960s. Particularly I think that enforcing immutability by default goes a long way here. For many programmers this is a paradigm shift from what they're used to, which usually causes some uneasiness. Seatbelts, too, were often scorned in their early years for their restrictive nature, but over time it has become the prevailing opinion that the mild inconvenience is more than worth the provided safety.

More and more languages (Haskell, Clojure, Erlang, Gleam, Elixir, Roc, Elm, Unison, ...) are realizing this and are adopting this as core design principle. Obviously not every programmer can switch to an immutable-first language over night, but I think it would behoove most programmers to strongly consider an immutable language if parallelism is a large part of their project's workload.

Using immutable data structures immediately prevents data-races, full-stop. So stick with immutable data everywhere you can, but in a world of immutability we'll still need some way to synchronize parallel processes and for that most of these languages do still provide some form of mutable reference. It's never the default, and there's typically some additional ceremony or tracking in the type system which acts as an immediate sign-post that shared-mutable state is involved; here there be dragons.

Even better than mutable references, decades of research and industrial research have provided us with a swath battle-tested high-level concurrency patterns which are built on top of lower-level synchronization primitives like mutexes or mutable references, typically exposing much safer interfaces to the programmer.

Concurrency Patterns

Actor systems and Communicating Sequential Processes (CSP) are some of the most common concurrency orchestration patterns. Each of these operate by defining independent sub-programs which have their own isolated states which only they can access. Each actor or process receives messages from other units and can respond to them in turn. Each of these deserves a talk or blog post of their own so I won't dive too deeply into them here, but please look into them deeper if this is the first you're hearing of them.

These approaches work great for task parallelism, where there are independent processes to run, and where your parallelism needs are bounded by the number of tasks you'd like to run. As an example, I used an actor-based system when building Unison's code-syncing protocol. There was one actor responsible for loading and sending requests for code, one for receiving and unpacking code, and one for validating the hashes of received code. This system required exactly 3 workers to co-operate regardless of how many things I was syncing. Actor and CSP systems are great choices when the number of workers/tasks we need to co-ordinate is statically known, i.e. a fixed number of workers, or a pre-defined map-reduce pipeline. These patterns can scale well to many cores since each actor or process can run independently on its own core without worrying about synchronizing access to shared mutable state, and as a result can often scale to multiple machines as well.

However, there are also problems where the parallelism is dynamic or ad-hoc, meaning there could be any number of runtime-spawned concurrent actors that must co-ordinate well with each other. In those cases these systems tend to break down. I've seen consultants describe complex patterns for dynamically introducing actors, one-actor-per-resource systems, tree-based actor resource hierarchies and other complex ideas but in my opinion these systems quickly outgrow the ability of any one developer to understand and debug.

So how then do we model a system like the bank account example? Even if we were to limit the system to a fixed number of transfer-workers they'd still be concurrently accessing the same data (the bank accounts) and need some way to express atomic transfers between them, which isn't easily accomplished with actors or CSP.

What's a guy to do?

A new (old) synchronization primitive

In the vast majority of cases using a streaming system, actors or CSP is going to be most effective and understandable. However in cases where we must synchronize individual chunks of data across many workers, and require operations to affect multiple chunks of data atomically, there's only one name in town that gets the job done right.

Software Transactional Memory (STM) is a criminally under-utilized synchronization tool which solves all of the problems we've encountered so far while providing more safety, better compositionality, and cleaner abstractions. Did I mention they prevent most deadlocks and livelocks too?

To understand how STM works, think of database transactions; in a database transaction isolation provides you with a consistent view of data in spite of concurrent access. Each transaction sees an isolated view of the data, untampered by other reads and writes. After making all your reads and writes you commit the transaction. Upon commit, the transaction either succeeds completely and applies ALL the changes you made to the data snapshot, or it may result in a conflict. In cases of a conflict the transaction fails and rolls back all your changes as though nothing happened, then it can retry on the new data snapshot.

STM works in much the same way, but instead of the rows and columns in a database, transactions operate on normal in-memory data structures and variables.

To explore this technique let's convert our bank account example into Haskell so we can use STM instead of mutexes.
data Account = Account {
  -- Data that needs synchronization is stored in a 
  -- Transactional Variable, a.k.a. TVar
  balanceVar :: TVar Int
}

-- Deposit money into an account.
deposit :: Account -> Int -> STM ()
deposit Account{balanceVar} amount = do
  -- We interact with the data using TVar operations which
  -- build up an STM transaction.
  modifyTVar balanceVar (\existing -> existing + amount)

-- Withdraw money from an account
-- Everything within the `do` block
-- is part of the same transaction.
-- This guarantees a consistent view of the TVars we 
-- access and mutate.
withdraw :: Account -> Int -> STM Bool
withdraw Account{balanceVar} amount = do
  existing <- readTVar balanceVar
  if existing <= amount
    then (return False)
    else do
      writeTVar balanceVar (existing - amount)
      return True

-- Transfer money between two accounts atomically
transfer :: Account -> Account -> Int -> STM Bool
transfer from to amount = do
  -- These two individual transactions seamlessly
  -- compose into one larger transaction, guaranteeing
  -- consistency without any need to change the individual
  -- operations.
  withdrawalSuccessful <- withdraw from amount
  if successful
    then do
      deposit to amount
      return True
    else 
      return False
Let's do another lap over all the problems we had with mutexes to see how this new approach fares.

Data Races

Data races are a problem which I believe are best solved at the language level itself. As mentioned earlier, using immutable data by default simply prevents data races from existing in the first place. Since data in Haskell is all immutable by default, pre-emption can occur at any point in normal code and we know we won't get a data race.

When we need mutable data, it's made explicit by wrapping that data in TVars. The language further protects us by only allowing us to mutate these variables within transactions, which we compose into operations which are guaranteed a consistent uncorrupted view of the data.

Let's convert withdraw to use STM and our balaceVar TVar.
-- Withdraw money from an account
withdraw :: Account -> Int -> STM Bool
withdraw Account{balanceVar} amount = do
  existing <- readTVar balanceVar
  if existing <= amount
    then (return False)
    else do
      -- No data races here!
      writeTVar balanceVar (existing - amount)
      return True
We can see that the code we wrote looks very much like the original unsynchronized golang version, but while using STM it's perfectly safe from data races! Even if it the thread is pre-empted in the middle of the operation, the transaction-state is invisible to other threads until the transaction commits.

Deadlock/Livelock

STM is an optimistic concurrency system. This means that threads never block waiting for locks. Instead, each concurrent operation proceeds, possibly in parallel, on their own independent transaction log. Each transaction tracks which pieces of data it has accessed or mutated and if at commit time it is detected that some other transaction has been committed and altered data which this transaction also accessed, then the latter transaction is rolled back and is simply retried.

This arrangement is fundamentally different from a lock-based exclusive access system. In STM, you don't deal with locks at all, you simply read and write data within a transaction as necessary. Our transfer function reads and writes two different TVars, but since we're not obtaining exclusive locks to these vars, we don't need to worry about deadlock at all. If two threads happen to be running a transfer on the same TVars at the same time, whichever commits first will atomically apply its updates to both accounts and the other transaction will detect this update at commit-time and will retry against the new balances.

This can cause some contention and possibly even starvation of any single transaction if many threads are trying to update the same data at the same time, but since a conflict can only occur if some other transaction has been committed, it does still have the guarantee that the system will make progress on at least some work. In Haskell, STM transactions must be pure code, and can't do IO, so most transactions are relatively short-running and should proceed eventually. This seems like a downside, but in practice it only surfaces as a rare annoyance and can usually be worked around without too much trouble.

Composition

It may not be immediately obvious from the types if you're not used to Haskell code, but all three of withdraw, deposit, and transfer are all functions which return their results wrapped in the STM monad, which is essentially a sequence of operations which we can ask to execute in a transaction using the atomically function.

We can call out to any arbitrary methods which return something wrapped in STM and it will automatically be joined in as part of the current transaction.

Unlike our mutex setup, callers don't need to manually handle locks when callingwithdraw and deposit, nor do we need to expose special synchronized versions of these methods for things to be safe. We can define them exactly once and use that one definition either on its own or within a more complex operation like transfer without any additional work. The abstraction is leak-proof, the caller doesn't need to know which synchronized data is accessed or lock or unlock any mutexes. It simply runs the transaction and the STM system happily handles the rest for you.

Here's what it looks like to actually run our STM transactions, which we do using the atomically function:
main :: IO ()
main = do
  forever $ do
    req <- acceptTransferRequest
    -- Run each transfer on its own green-thread, in an atomic transaction.
    forkIO (atomically (transfer req.from req.to req.amount)
If we'd like to compile a report of all account balances as we did previously, we can do that too. This time however we won't get a potentially inconsistent snapshot of the system by accident, instead the type-system forces us to make an explicit choice of which behaviour we'd like.

We can either:

Access and print each account balance individually as separate transaction which means accounts may be edited in-between transactions, leading to an inconsistent report like we saw earlier.

Or, we can wrap the entire report into a single transaction, reading all account balances in a single transaction. This will provide a consistent snapshot of the system, but due to the optimistic transaction system, the entire transaction will be retried if any individual transfers commit and edit accounts while we're collecting the report. It's possible that if transfers are happening very frequently, the report may be retried many times before it can complete.

This is a legitimate tradeoff that the developer of the system should be forced to consider.

Here's what those two different implementations look like:
-- Inconsistent report, may see money disappear/appear
reportInconsistent :: [Account] -> IO ()
reportInconsistent accounts = do
  for_ accounts $ \Account{balanceVar} -> do
    balance <- atomically (readTVar balanceVar)
    print balance

-- Consistent report, may be retried indefinitely 
-- if transfers are happening too frequently
reportConsistent :: [Account] -> IO ()
reportConsistent accounts = do
  balances <- atomically do 
    for accounts $ \Account{balanceVar} -> do
      readTVar balanceVar
  -- Now that we've got a snapshot we can print it out
  for_ balances print
Smart Retries

One last benefit of STM which we haven't yet discussed is that it supports intelligent transaction retries based on conditions of the synchronized data itself. For instance, if we have a task to withdraw $100 from Alice's account but it only has $50 in it, the mutex-based system has no choice to but fail the withdrawal entirely and return the failure up the stack. We can wrap that call with code to try again later, but how will we know when it's reasonable to try again? This would once again require the caller to understand the implementation details, and which locks the method is accessing.

STM, instead, supports failure and retrying as a first-class concept. At any point in an STM transaction you can simply call retry, this will record every TVar that the transaction has accessed up until that point, then will abort the current transaction and will sleep until any of those TVars has been modified by some other successful transaction. This avoids busy-waiting, and allows writing some very simple and elegant code.

For example, here's a new version of our withdraw function which instead of returning a failure will simply block the current thread until sufficient funds are available, retrying only when the balance of that account is changed by some other transaction's success.
-- Withdraw money from an account, blocking until sufficient funds are available
withdraw :: Account -> Int -> STM ()
withdraw Account{balanceVar} amount = do
  existing <- readTVar balanceVar
  if existing <= amount
    then retry
    else do
      writeTVar balanceVar (existing - amount)
You typically wouldn't use this to wait for an event which may take days or weeks to occur like in this example; but it's a very elegant and efficient solution for waiting on a channel, waiting for a future to produce a result, or waiting on any other short-term condition to be met.

Here's an example utility for zipping together two STM queues. The transaction will only succeed and produce a result when a value is available on both queues, and if that's not the case, it will only bother retrying when one of the queues is modified since readTQueue calls retry internally if the queue is empty.
zipQueues :: TQueue a -> TQueue b -> STM (a, b)
zipQueues q1 q2 = do
  val1 <- readTQueue q1
  val2 <- readTQueue q2
  return (val1, val2)
Nifty!

Conclusion

We've covered a lot in this post, if there's only one thing you can take away from it, I hope that you've taken the time to consider whether mutexes with shared mutable state are providing you with utility which outweighs their inherent costs and complexities. Unless you need peak performance, you may want to think twice about using such dangerous tools. Instead, consider using a concurrency pattern like actors, CSP, streaming, or map-reduce if it matches your use-case.

If you need something which provides greater flexibility or lower-level control, Software Transactional Memory (STM) is a fantastic choice if it's available in your language of choice, though note that not all languages support it, or if they do, may not be able to provide sufficient safety guarantees due to mutable variables and data structures.

If you're starting a new project for which concurrency or parallelism is a first-class concern, consider trying out a language that supports STM properly, I can recommend Unison or Haskell as great starting points.

Hopefully you learned something ðŸ¤ž! Did you know I'm currently writing a book? It's all about Lenses and Optics! It takes you all the way from beginner to optics-wizard and it's currently in early access! Consider supporting it, and more posts like this one by pledging on my Patreon page! It takes quite a bit of work to put these things together, if I managed to teach your something or even just entertain you for a minute or two maybe send a few bucks my way for a coffee? Cheers! ðŸ�»
November 11, 2025 12:00 AM

November 10, 2025

Monday Morning Haskell

Parsing an HTTP Request
Last week, we discussed the utility of writing simple tools from scratch. We started writing a simple HTTP server by defining our request and response types. Today, we’ll write a parser for the HTTP request.

We’ll use the Megaparsec library throughout this article, without dwelling on too many details of its usage. If you want to learn all about parsing in Haskell from the ground up, including all the details of using this library, check out our course, Solve.hs. Module 4 in particular will teach you all about parsing!

Outlining our Parser

In our Haskell LeetCode series, we discussed top-down and bottom-up implementations of functions. In a top-down implementation, we can start with the high-level aspects of our program, leaving many parts as undefined temporarily. Then we’ll fill in the details as we go along. In a bottom-up implementation, we first figure out the smaller details and combine them until we get our complete program.

In reality, you can often use both of these together! To write this parser, we’ll begin with a top-down outline, then we’ll fill in some helper functions that we know will be useful. Finally we’ll fill in the “middle” of our program.

As a reminder, we are following this RFC guide on HTTP Version 1.1. Here are all the Haskell types we wrote last week to describe an HTTP Request:
import Data.Word (Word8)
import qualified Data.HashMap.Lazy as HM
import Data.ByteString.Lazy (ByteString)

data HttpMethod =
    HttpOptions | HttpGet | HttpHead | HttpPost | HttpPut |
    HttpDelete | HttpTrace | HttpConnect
    deriving (Show, Eq)

newtype HttpHeaders = HttpHeaders
    (HM.HashMap ByteString ByteString)
    deriving (Show, Eq)

data HttpRequest = HttpRequest
    { requestMethod :: HttpMethod
    , requestUri :: ByteString
    , requestHttpVersion :: (Word8, Word8)
    , requestHeaders :: HttpHeaders
    , requestBody :: Maybe ByteString
    }
    deriving (Show, Eq)
First, let’s define the type for our parsing function. Our Megaparsec Stream type will be ByteString, and we’ll be operating in the IO monad on our server. So the type should look something like this:
parseHttpRequest :: ParsecT Void ByteString IO HttpRequest
parseHttpRequest = undefined
We’ll be writing many type signatures with this parameterized ParsecT type though. So let’s simplify this with an alias:
type Parser a = ParsecT Void ByteString IO a

parseHttpRequest :: Parser HttpRequest
parseHttpRequest = undefined
Now that nice thing about monadically parsing a type is that, as an outline, we can parse each of the fields in order and combine them, looking something like this:
-- Not a final version!
parseHttpRequest :: Parser HttpRequest
parseHttpRequest = do
    m <- parseMethod
    u <- parseUri
    v <- parseVersion
    h <- parseHeaders
    b <- parseBody
    return $ HttpRequest m u v h b
In the spirit of top-down implementation, we can then define functions for each of these fields:
parseMethod :: Parser HttpMethod

parseUri :: Parser ByteString

parseHttpVersion :: Parser (Word8, Word8)

parseHeaders :: Parser HttpHeaders

parseBody :: Parser (Maybe ByteString)

-- Not a final version!
parseHttpRequest :: Parser HttpRequest
parseHttpRequest = do
    m <- parseMethod
    u <- parseUri
    v <- parseVersion
    h <- parseHeaders
    b <- parseBody
    return $ HttpRequest m u v h b
This is already a great start, but we don’t want to forget about some of the formatting elements of the request. For example, there are spaces between the method and URI, and then the URI and version. There are CRLF sequences between the version and headers, and the headers and body. So we may wish to define helpers that specifically parse those sequences.
sp :: Parser ()

crlf’ :: Parser ()

-- Not a final version!
parseHttpRequest :: Parser HttpRequest
parseHttpRequest = do
    m <- parseMethod
    sp
    u <- parseUri
    sp
    v <- parseVersion
    crlf
    h <- parseHeaders
    crlf
    b <- parseBody
    return $ HttpRequest m u v h b
Now we’ve got a definite outline, as well as some motivation for helper functions. This still isn’t our final parseHttpRequest function, but it's given us sufficient information to switch to bottom-up implementation to write some of these helpers.

Helper Functions

One of the important roles of helper functions is to smooth out rough edges of library functions. In our case, we want to define a lot of helpers using characters (Char type). However, since we’re parsing bytestrings, the “token” of our parsing is actually Word8. So we’ll write a simple function to convert a character to Word8, using Data.Char (ord):
import Data.Char

o :: Char -> Word8
o = fromIntegral . ord
Now we can use this to fill in sp, writing a simple Parser for a single space character. We use single from the Parsec library:
sp :: Parser ()
sp = void $ single (o ' ')
With URIs, we want to parse any character except a space. So we can use the opposite function anySingleBut to do that:
nonSp :: Parser Word8
nonSp = anySingleBut (o ' ')
We also observe that the CRLF sequence is common, so we’ll write a parser that gives us these two characters:
crlf' :: Parser ()
crlf' = void $ do
    single (o '\r')
    single (o '\n')
For the HTTP version, we’ll want to parse a digit and convert it to Word8. This involves two tricks. First, we use satisfy with a predicate to only capture characters 0-9. Then we subtract 48 from them! This is because we want the number, not the ASCII code. Since 48 is the ASCII code for 0, this subtraction gives us the actual digit value.
digitW :: Parser Word8
digitW = (\x -> x - (o '0')) <$> satisfy (\c -> c >= (o '0') && c <= (o '9'))
Finally, I’ll share an important helper that isn’t so intuitive. Our parsers will capture individual Word8 characters, often producing the type [Word8] when we combine them. We want to easily turn a Parser of this list into a Parser of a ByteString. To do this, we want to use Data.ByteString.pack like so:
import qualified Data.ByteString.Lazy as BS

mkBS :: Parser [Word8] -> Parser ByteString
mkBS = fmap BS.pack
We can combine our helpers to capture even more patterns! For example, here’s a parser that will capture all characters up until a CRLF sequence, and then consume the CRLF. It uses o, crlf’ and mkBS, as well as the someTill combinator from Megaparsec.
parseTillCrlf :: Parser ByteString
parseTillCrlf = mkBS (someTill (anySingleBut (o '\r')) crlf')
That’s all the helpers we’ll need! Now let’s consider the individual components of the HTTP Request, one by one.

Parsing a Method

We’ll start with the first part of our request line, the “method”. We have a finite number of methods as defined by this enum:
data HttpMethod =
    HttpOptions | HttpGet | HttpHead | HttpPost | HttpPut |
    HttpDelete | HttpTrace | HttpConnect
    deriving (Show, Eq)
This means we can define all the pairs of strings to parse with the proper constructor:
parseMethod :: Parser HttpMethod
parseMethod = ...
    where
        pairs :: [(ByteString, HttpMethod)]
        pairs =
            [ ("OPTIONS", HttpOptions)
            , ("GET", HttpGet)
            , ("HEAD", HttpHead)
            , ("POST", HttpPost)
            , ("PUT", HttpPut)
            , ("DELETE", HttpDelete)
            , ("TRACE", HttpTrace)
            , ("CONNECT", HttpConnect)
            ]

        ...
Now we can write a function that will turn one of these pairs into a Parser. This parser will use string to parse the string, and then return the method constructor:
parseMethod :: Parser HttpMethod
parseMethod = ...
    where
        pairs :: [(ByteString, HttpMethod)]
        pairs = ...
        
        p :: (ByteString, HttpMethod) -> Parser HttpMethod
        p (s, m) = string s >> return m
Finally, we use map to create a list of Parser HttpMethod items, and then we use the choice combinator from Megaparsec to parse one of these options:
parseMethod :: Parser HttpMethod
parseMethod = choice parsers
    where
        pairs :: [(ByteString, HttpMethod)]
        pairs =
            [ ("OPTIONS", HttpOptions)
            , ("GET", HttpGet)
            , ("HEAD", HttpHead)
            , ("POST", HttpPost)
            , ("PUT", HttpPut)
            , ("DELETE", HttpDelete)
            , ("TRACE", HttpTrace)
            , ("CONNECT", HttpConnect)
            ]
        
        p :: (ByteString, HttpMethod) -> Parser HttpMethod
        p (s, m) = string s >> return m

        parsers :: [Parser HttpMethod]
        parsers = map p pairs
And now our method parser is complete!

Parsing the URI

Parsing the URI is quite easy with our helpers:
parseUri :: Parser ByteString
parseUri =  mkBS $ someTill nonSp sp
This definition has 3 steps:

Parse a series of “non-space” characters

End the parsing once we see a “space” character

Convert this to a ByteString.

The someTill combinator from Megaparsec does most of the work here. What is notable is that this parser will consume the space separator, but does not include it as part of the result. So we won’t need to include this separator in parseHttpRequest.

Parsing the Version

Parsing the version is even simpler as a step-by-step process.

Parse the string HTTP/

Parse the major version digit. (We’ll limit to 1 digit to keep things simple)

Parse the period separator

Parse the minor version digit.

We already know about the string function from Megaparsec, and we have our digitW helper, so this is easy:
parseHttpVersion :: Parser (Word8, Word8)
parseHttpVersion = do
    string "HTTP/"
    d1 <- digitW
    single (o '.')
    d2 <- digitW
    return (d1, d2)
At this point, it’s worth checking in on our main request function. Here’s what is implemented so far:
parseHttpRequest :: Parser HttpRequest
parseHttpRequest = do
    m <- parseMethod
    sp
    u <- parseUri
    v <- parseHttpVersion
    crlf'
    h <- undefined
    b <- undefined
    return $ HttpRequest m u v h b
We still have a sp separator between the method and URI. But we don’t need one between the URI and version since parseUri consumes that space. But we’ll still keep the crlf’ between the version and the headers.

Parsing Headers

Now we need to parse the headers, bearing in mind that there could be 0 of them. In this case, the headers section consists of a single CRLF sequence. The key trick here is to view the header parser as a recursive loop, maintaining the state of the header map we’ve parsed so far. So its type signature will actually look like this:
parseHeaders :: HM.HashMap ByteString ByteString -> Parser HttpHeaders
To implement this function, we need to consider two different cases. In the first case, we see a CRLF. This means we are at the end of the headers, because a header key cannot start (or even contain) a carriage return. We’ll wrap our map into HttpHeaders and return it. Thus we can start our function like this:
parseHeaders :: HM.HashMap ByteString ByteString -> Parser HttpHeaders
parseHeaders prev = crlf' >> return (HttpHeaders prev)
But if we don’t encounter a CRLF, we need to actually parse the key-value pair. We use the <|> combinator to give ourselves a second option if the CRLF parser fails.
parseHeaders :: HM.HashMap ByteString ByteString -> Parser HttpHeaders
parseHeaders prev = (crlf' >> return (HttpHeaders prev)) <|> do
    ...
We’ll then parse all the characters until a colon : for the header name, followed by parseTillCrlf for the value characters. (Recall that each individual header line has its own CRLF). The “name” portion uses the same pattern as parseTillCrlf, except with a colon):
parseHeaders :: HM.HashMap ByteString ByteString -> Parser HttpHeaders
parseHeaders prev = (crlf' >> return (HttpHeaders prev)) <|> do
    headerName <- mkBS (someTill (anySingleBut (o ':')) (single $ o ':'))
    headerBody <- parseTillCrlf
    ...
Now we store this key/value pair in our HashMap, and then we recurse with our new map:
parseHeaders :: HM.HashMap ByteString ByteString -> Parser HttpHeaders
parseHeaders prev = (crlf' >> return (HttpHeaders prev)) <|> do
    headerName <- mkBS (someTill (anySingleBut (o ':')) (single $ o ':'))
    headerBody <- parseTillCrlf
    parseHeaders (HM.insert headerName headerBody prev)
Our request parser now looks like this:
parseHttpRequest :: Parser HttpRequest
parseHttpRequest = do
    m <- parseMethod
    sp
    u <- parseUri
    v <- parseHttpVersion
    crlf'
    h <- parseHeaders HM.empty
    ...
    return $ HttpRequest m u v h undefined
The headers parser consumes the CRLF separating the headers from the body, so we don’t need that separator there.

Parsing the Body

Now we’ll write a simple body parser. This will have a similar pattern to the header parser, except that we are checking for the “end of the input” using the eof parser, rather than checking for a carriage return.
parseBody :: Parser (Maybe ByteString)
parseBody = (eof >> return Nothing) <|> do
    b <- mkBS (manyTill anySingle eof)
    return $ Just b
We use the alternative operator <|> so that we can return Nothing in the case of an empty body, and Just in the case of a body.

This approach would be dubious on a production server, since another incoming request could prevent us from reading eof from the socket. We’d include the second request as part of the first’s body! In reality, you would actually want to use the Content-Length header in the request to parse a specific number of characters in the body.

With this, we can now complete our request parser!
parseHttpRequest :: Parser HttpRequest
parseHttpRequest = do
    m <- parseMethod
    sp
    u <- parseUri
    v <- parseHttpVersion
    crlf'
    h <- parseHeaders HM.empty
    b <- parseBody
    return $ HttpRequest m u v h b
Complete Parser Code

Here’s all the code we worked on this week:
o :: Char -> Word8
o c = fromIntegral (ord c)

sp :: Parser ()
sp = void $ single (o ' ')

nonSp :: Parser Word8
nonSp = anySingleBut (o ' ')

crlf' :: Parser ()
crlf' = void $ do
    single (o '\r')
    single (o '\n')

digitW :: Parser Word8
digitW = (\x -> x - (o '0')) <$> satisfy (\c -> c >= (o '0') && c <= (o '9'))

mkBS :: Parser [Word8] -> Parser ByteString
mkBS = fmap BS.pack

parseTillCrlf :: Parser ByteString
parseTillCrlf = mkBS (someTill (anySingleBut (o '\r')) crlf') 

parseMethod :: Parser HttpMethod
parseMethod = choice parsers
    where
        pairs :: [(ByteString, HttpMethod)]
        pairs =
            [ ("OPTIONS", HttpOptions)
            , ("GET", HttpGet)
            , ("HEAD", HttpHead)
            , ("POST", HttpPost)
            , ("PUT", HttpPut)
            , ("DELETE", HttpDelete)
            , ("TRACE", HttpTrace)
            , ("CONNECT", HttpConnect)
            ]

        p :: (ByteString, HttpMethod) -> Parser HttpMethod
        p (s, m) = string s >> return m

        parsers :: [Parser HttpMethod]
        parsers = map p pairs

parseUri :: Parser ByteString
parseUri =  mkBS $ someTill nonSp sp

parseHttpVersion :: Parser (Word8, Word8)
parseHttpVersion = do
    string "HTTP/"
    d1 <- digitW
    single (o '.')
    d2 <- digitW
    return (d1, d2)

parseHeaders :: HM.HashMap ByteString ByteString -> Parser HttpHeaders
parseHeaders prev = (crlf' >> return (HttpHeaders prev)) <|> do
    headerName <- mkBS (someTill (anySingleBut (o ':')) (single $ o ':'))
    headerBody <- parseTillCrlf
    parseHeaders (HM.insert headerName headerBody prev)

parseBody :: Parser (Maybe ByteString)
parseBody = (eof >> return Nothing) <|> do
    b <- mkBS (manyTill anySingle eof)
    return $ Just b

parseHttpRequest :: Parser HttpRequest
parseHttpRequest = do
    m <- parseMethod
    sp
    u <- parseUri
    v <- parseHttpVersion
    crlf'
    h <- parseHeaders HM.empty
    b <- parseBody
    return $ HttpRequest m u v h b
Conclusion

Next week, we’ll take this parser and incorporate it into a simple but functional HTTP server! We’ll just need to add some serialization logic and network mechanics.

If you want to learn how to use Megaparsec to parse anything in Haskell, sign up for our course, Solve.hs! Module 4 of this course will teach you how to write a parser from scratch, as well as deal with common data formats, including HTML!
by James Bowen at November 10, 2025 09:30 AM

Derek Elkins

Umbral Calculus

Introduction

I’ll start by rationalizing an example of “old” umbral calculus from Wikipedia. |\newcommand{pair}[2]{\langle{#1}\mid{#2}\rangle}| |\newcommand{bigpair}[2]{\left\langle{#1}\ \middle|\ {#2}\right\rangle}| |\newcommand{pseq}[2]{\{#1\}_{#2 \in \mathbb N}}| |\newcommand{ucomp}[2]{#1_n(\underline #2(x))}| >>>

We know |(x+y)^n = \sum_{k=0}^n {n \choose k} x^{n-k} y^k|. We then “infer” that |B_n(x+y) = \sum_{k=0}^n {n \choose k} B_{n-k}(x) y^k| where |B_n(x)| are the Bernoulli polynomials. Actually, the “old” style would be even more dubious. You’d have a “rule” like representing |B_n(x+y)| as |(b+y)^n|, then expand like usual and replace |b^k = (b + 0)^k| with |B_k(x)|. The variables like |b| were the “shadow” or “umbral” variables.

Rationalizing it using techniques I’ll describe below. Let |\varepsilon_y| be the linear operator on polynomials satisfying |\varepsilon_y p(x) = p(x + y)|. Since |D_x[\varepsilon_y p(x)] = \varepsilon_y D_x p(x)| for all |y| where |D_x| is differentiation by |x|, |\varepsilon_y| is induced from a formal power series in |D_x|. In particular, |\varepsilon_y = e^{yD_x}|.

Let |T| be the linear operator (the transfer or umbral operator) characterized by mapping |x^n| to |B_n(x)| representing a change of basis on the vector space of polynomials. We can apply |T| to the equation \[ \varepsilon_y(x^n) = (x+y)^n = \sum_{k=0}^n {n \choose k} x^{n-k} y^k \] to get via linearity \[ T(\varepsilon_y(x^n)) = T((x+y)^n) = \sum_{k=0}^n {n \choose k} T(x^{n-k}) y^k = \sum_{k=0}^n {n \choose k} B_{n-k}(x) y^k \]

The key property we then need is |T(\varepsilon_y(x^n)) = \varepsilon_y T(x^n) = B_n(x + y)| which can be reduced to |D_xT(x^n) = T(D_x x^n)|. This is just the statement that |D_x B_n(x) = nB_{n-1}(x) = nT(x^{n-1}) = T(nx^{n-1}) = T(D_x x^n)| using a well-known property of Bernoulli (and other) polynomials. In fact, this relation implies that |T| is itself induced by a formal power series and thus commutes with any other linear operator so induced.

Ultimately, the only properties we needed for this was that we had a (linear) change of basis from the monomial basis to the polynomial sequence, which we’ll have for any polynomial sequence whose |n|th element has degree |n|, and that the change of basis commuted with the |D_x| operator. The latter is more stringent, but there are various ways we can expand the scope of the argument.

First, |D_x x^n = \frac{c_n}{c_{n-1}} x^n| with |c_n = n!| completely characterizes (with |D_x x^0 = 0|) the differentiation operation on polynomials. Choosing a different sequence |c| leads to different notions of “differentiation”. This will change |\varepsilon_y| and lead to different formulas, but they will be structurally similar.

In a different direction, we can ask for formal power series and polynomial sequences that relate to each other the way |D_x| and |x^n| do. We say that a polynomial sequence |s_n(x)| is Sheffer for a pair of formal power series |(g(t), f(t))| with |\deg g = 0| and |\deg f = 1| when |\pair{g(t)f(t)^k}{s_n(x)} = c_n\delta_k^n|. (This inner-product-like notation will be defined later, but the key thing is that this mirrors |\pair{t^k}{x^n} = c_n\delta_k^n|.) This has |g(t)f(t)^k| taking the place of |t^k| and |s_n(x)| taking the place of |x^n|. This would let us transfer identities involving the |s_n(x)| to any linear operator that commutes with |g(t)f(t)|. While changing |c| changes our notion of “differentiation”, using a Sheffer sequence allows us to consider other “differential operators” using the same notion of “differentiation”.

This is based primarily off of works by Steven Roman and Gian-Carlo Rota. It closely follows The Theory of the Umbral Calculus. I by Steven Roman (1982). See also The Umbral Calculus by Steven Roman (1984).

I’ll include proofs below to illustrate that each bit of reasoning is fairly straightforward.

Overview

Conventions

Formal Power Series

Linear Functionals

Properties

Evaluation Functional

Formal Derivative

Linear Operators

Evaluation Operator

Characterizing Linear Operators Induced from Formal Power Series

Polynomial Sequences

Recurrence Formulas

Transfer Formulas

Umbral Composition and Transfer Operators

Example: Chebyshev Polynomials

Summary

Overview

One of the key ideas of umbral calculus is the relation of four vector spaces: the space of polynomials, its dual space, i.e. linear functionals on the space of polynomials, linear operators, i.e. endomorphisms, on the space of polynomials, and the space of formal power series. It will turn out that the space of formal power series and the space of linear functionals on the space of polynomials are isomorphic not just as vector spaces but as algebras. Further, we can embed the space of formal power series as a commutative sub-algebra of the space of linear operators on the space of polynomials. This latter view views formal power series as differential operators on polynomials. This gives us three perspectives on formal powers series and two ways they interact with polynomials.

We’ll also be interested in linear operators on the space of polynomials that aren’t induced by formal power series such as the transfer or umbral operators and umbral shift operators. A transfer operator is essentially a “well-behaved” change of basis from the monomial basis. Umbral shift operators generalize the “multiply by |x|” operator. These operators will have adjoints that are linear operators on the space of formal power series.

We’ll find that a linear operator on the space of formal power series is an adjoint of a linear operator on the space of polynomials if and only if it is continuous in a sense to be defined. Surjective derivations on the space of formal power series will be exactly those adjoint to umbral shifts. Finally, continuous algebra automorphisms on the algebra of formal power series will be exactly those adjoint to transfer operators.

Beyond these structural aspects, we’ll also derive many results for working with Sheffer sequences and polynomial sequences along the way.

Conventions

Fix a field |\mathbb K| of characteristic |0|. I’ll write |\mathscr F = \mathbb K[\![t]\!]| for the |\mathbb K|-algebra of univariate formal power series (with indeterminate |t|) and |P = \mathbb K[x]| for the |\mathbb K|-algebra of univariate polynomials (with indeterminate |x|). Further, |\mathrm{Hom}(X, Y)| will be the |\mathbb K|-vector space of |\mathbb K|-linear maps from a |\mathbb K|-vector space |X| to a |\mathbb K|-vector space |Y|. In particular, |X^* = \mathrm{Hom}(X,\mathbb K)| is the dual space of |X|.

The four main |\mathbb K|-vector spaces we’ll be focused on are |\mathscr F|, |P|, |P^*|, and |\mathrm{Hom}(P, P)|. The first two are additionally |\mathbb K|-algebras, and we’ll find that the third is as well and is, in fact, isomorphic to |\mathscr F| which is arguably a key enabling fact of umbral calculus. In particular, the |\mathbb K|-algebra structure induced on |P^*| via |\mathscr F| is called the umbral algebra.

Given the above, unsurprisingly, we’ll be talking a lot about formal power series and polynomials. To save a bit of space and typing for me, unless otherwise specified, if I say |f| is a formal power series or |f \in \mathscr F|, then that will also defined the sequence |\pseq{f_n}{n}| such that |f(t) = \sum_{k=0}^\infty f_k t^k|. Generally, when I state something is a sequence it will be a function |\mathbb N \to X| for some |X| and the parameter will be written as a subscript. So the formal power series |f| also defines a sequence, also called |f|, from |\mathbb N \to \mathbb K|. (In this case, we could literally identify formal power series with these sequences.) Similarly, stating |p| is a polynomial or |p \in P| defines a sequence also called |p| such that |p(x) = \sum_{n=0}^{\deg p}p_n x^n| where |\deg p| is the degree of the polynomial, i.e. the largest value of |n| such that |p_n \neq 0|. These are also sequences |\mathbb N \to \mathbb K|, and we could identify polynomials with such sequences that are eventually always |0|. We also have the degree of a formal power series written |\deg f| which is the smallest |k| such that |f_k \neq 0|. Note that this is dual to the notion for polynomials. It’s clear that |\deg(fg) = \deg f + \deg g|.

Of course, sometimes I will explicitly state something like |f(t) = \sum_{k=0}^\infty a_k t^k| in which case the sequence |f| is not defined. Usually, this will arise with a formal power series expression, e.g. |(f \circ g)(t)| so there shouldn’t be any ambiguity. As is typical, I’ll often say “the formal power series |f(t)|” as opposed to “the formal power series |f|”. Finally, as has been illustrated, I’ll endeavor to use |k| as the indexing variable for formal power series and |n| for polynomials, but that won’t always be possible.

The Kronecker delta is written |\delta_k^n| and defined by |\delta_n^n = 1| and |\delta_k^n = 0| for |k \neq n|. This should typically be thought of as a way of forcing |n| and |k| to be equal, i.e. |f(n)\delta_k^n = f(k)\delta_k^n| and |\delta_{f(k)}^n = \delta_k^{f^{-1}(n)}| for an invertible function |f|.

Formal Power Series

I’ll assume familiarity with the basic algebra of formal power series. This lecture gives a nice in-depth and more technical overview, though it goes far beyond what we’ll need. I’ll recall a few results and fix the terminology and notation that will be used in this article which largely follows Roman but there are many variations in the literature.

Theorem (id:wewp): |f \in \mathscr F| has a multiplicative inverse, written |f^{-1}| if and only if |\deg f = 0|.

Proof: (click to expand) |f(t)g(t) = \sum_{k=0}^\infty c_n t^k| where |c_k = \sum_{i+j=k} f_i g_j|. Clearly, |c_0 = f_0 g_0| which makes it clear a multiplicative inverse to |f| can only exist if |f_0 \neq 0|. It also makes it clear that for |g| to be |f^{-1}|, |g_0 = 1/f_0|. A simple calculation shows that |g_k = f_0^{-1} \sum_{n=1} f_n g_{k-n}| for |k > 0| which gives a recurrence computing all the remaining coefficients of |f^{-1}|. |\square|

Thus, |f| being invertible will be synonymous with |f| having degree |0|.

It’s worth noting that |f/g| can be defined even when |g| isn’t invertible, e.g. |t/t = 1|. If |\deg f > \deg g|, then we can cancel out common factors of |t| until the denominator is invertible.

Suppose |g : \mathbb N \to \mathscr F|, which we’ll write as |g_k(t) \in \mathscr F|, such that |\deg g_k \to \infty| as |k \to \infty|. Given any |a : \mathbb N \to \mathbb K|, then |\sum_{k=0}^\infty a_k g_k(t)| is a well-defined element of |\mathscr F|. In particular, if |\deg g > 0|, then |g_k(t) = g(t)^k| satisfies the conditions. If we have |\deg g_k = k|, which is the case when |g_k(t) = g(t)^k| for |g| with degree |1| for example, then |g| forms a pseudobasis¹ for |\mathscr F| meaning for any |f \in \mathscr F|, there exists a unique sequence |a| such that |f(t) = \sum_{k=0}^\infty a_n g_k(t)|. A series |f| with |\deg f = 1| is called a delta series. Every delta series gives rise to a pseudobasis of |\mathscr F|.

If |f, g \in \mathscr F| and |\deg g > 0|, then we can thus form the composition |f(g(t)) = \sum_{k=0}^\infty f_k g(t)^k|. It’s clear that |\deg(f\circ g) = \deg f\deg g|.

Theorem (id:cjme): A series, |f|, has a compositional inverse, written |\bar f|, meaning |f(\bar f(t)) = t = \bar f(f(t)| if and only if |\deg f = 1|.

Proof: (click to expand) Suppose |g \in \mathscr F| such that |f(g(t)) = t|. By taking degrees, we immediately see that |f| (and |g|) need to be of degree exactly one to have any chance for |g| to be a compositional inverse to |f|. If |g(t)^k = \sum_{n=0}^\infty b_{k,n} t^n|, then clearly we need |f_1 b_{1,1} = 1|. By induction, we have |b_{k+1,n} = \sum_{i+j=n} b_{1,i}b_{k,j}| but note that |b_{k,n} = 0| for all |n < k| so this sum always has |k \leq j < n| and thus |1 \leq i \leq n-k|. This leads to \[\sum_{k=1}^n f_k b_{k,n} = 0 = f_1 b_{1,n} + \sum_{k=2}^n f_k b_{k,n}\] where the last sum only involves |b_{1,i}| for |i < n|. Alternatively, simply note that if |f(t) = th(t)| and |g(t) = tk(t)| then, |h| and |k| have degree |0|, and |t = f(g(t)) = t k(t)h(g(t))|. So |k(t) = h(g(t))^{-1} = h(tk(t))^{-1}|. |h| is invertible and |tk(t)| clearly has degree greater than |0| so the composition is well-defined. Unfolding this expression for |k(t)| shows that the |i|th coefficient only depends on earlier coefficients. |\square|

A useful result linking these two together is if |\deg f = 1|, then \[ 1 = t’ = [\bar f(f(t))]' = \bar f’(f(t))f’(t) \] In other words, |f’(t)^{-1} = \bar f’(f(t))|.

Linear Functionals

For |L \in P^*| and |p \in P|, we’ll write |\pair{L}{p(x)}| for the action of |L| on |p|. Any such |L| is uniquely defined by its values on |x^n| for all |n\in\mathbb N|.

If |c : \mathbb N \to \mathbb K\setminus \{0\}|, we can define for each |f \in \mathscr F| a linear functional which we’ll also write as |f| or |f(t)| via \[\pair{f(t)}{x^n} = c_n f_n\] Really, we should write something like |\pair{f(t)}{p(x)}_c| to indicate the dependence on |c|. This play on notation is unambiguous since |f(t) = g(t)| if and only if |\pair{f(t)}{x^n} = \pair{g(t)}{x^n}| for all |n|, i.e. |f| and |g| are equal as power series if and only if the induced linear functionals are equal.

Notable choices for |c| are:

|c_n = n!| is the most traditional case.

|c_n = 1|

|c_n = 1/{\lambda \choose n}| for |\lambda| not a negative integer.

The definition of the linear functional induced by |f \in \mathscr F| implies that |\pair{t^k}{x^n} = c_n\delta_k^n|. This leads to \[\bigpair{\sum_{n=0}^\infty a_n t^n}{p(x)} = \sum_{n=0}^\infty a_n \pair{t^n}{p(x)} \] where the right-hand side is well-defined because only finitely many of the terms of the sum will be non-zero. (We can generalize to allow Laurent series with only finitely many negative powers on the left and Laurent series with only finitely many positive powers on the right.)

We can articulate L’Hôpital’s rule with this notation as: if |\deg f \geq \deg g > 0|, then \[\pair{f(t)/g(t)}{x^0} = \pair{f’(t)/g’(t)}{x^0} \]

We can explicitly write the formal power series, |f_L|, corresponding to the linear functional, |L|, as \[f_L(t) = \sum_{k=0}^\infty \frac{\pair{L}{x^k}}{c_n}t^k\] It is trivial to verify that the linear functional induced by |f_L| is |L|. This gives an isomorphism |\mathscr F \cong P^*| as |\mathbb K|-vector spaces. However, the algebra structure on |\mathscr F| then induces an algebra structure on |P^*|. We can compute \[\pair{f(t)g(t)}{x^n}\rangle = \sum_{i+j=n}\frac{c_n}{c_i c_j}\pair{f(t)}{x^i}\pair{g(t)}{x^j}\]

We’ll call |L| a delta/invertible functional when it corresponds to a delta/invertible power series.

Properties

If |\deg f > \deg p| then |\pair{f(t)}{p(x)} = 0|.

If |\deg p_n(x) = n| and |\pair{f(t)}{p_n(x)} = 0| for all |n \in \mathbb N|, then |f(t) = 0|.

|\pair{f(at)}{p(x)} = \pair{f(t)}{p(ax)}| [Proof: Follows immediately from |\pair{a^n t^n}{x^n} = a^n c_n = \pair{t^n}{a^n x^n}|. |\square|]

|p(x) = \sum_{n=0}^\infty \frac{\pair{t^n}{p(x)}}{c_n}x^n|

If |\deg f_k = k| and |\pair{f_k(t)}{p(x)} = 0| for all |k \in \mathbb N|, then |p(x) = 0|.

Evaluation Functional

We always have the evaluation functional |\varepsilon_y| for |y \in \mathbb K| defined by \[\pair{\varepsilon_y(t)}{p(x)} = p(y)\] Note that this definition doesn’t depend on the choice of |c|. We quickly compute |\pair{\varepsilon_y(t)}{x^n} = c_n y^n| so \[ \varepsilon_y(t) = \sum_{k=0}^\infty \frac{y^k}{c_k}t^k\]

When |c_n = n!|, then |\varepsilon_y(t) = e^{yt}|.

Formal Derivative

The formal derivative of |f \in \mathscr F|, written |\partial_t f(t)|, is defined as \[\partial_t t^k = \begin{cases} \frac{c_k}{c_{k-1}}t^{k-1}, & k > 0 \\ 0, & k = 0 \end{cases}\] which leads to the key property |\pair{\partial_t f(t)}{p(x)} = \pair{f(t)}{xp(x)}|.

As an example, we immediately compute that |\partial_t\varepsilon_y(t) = y\varepsilon_y(t)|.

We will also use the ordinary derivative of formal power series which we’ll notate with |f’(t)|. The formal derivative and the ordinary derivative coincide when |c_n = n!| as suggested by the previous example.

Linear Operators

We’ve identified formal power series with linear functionals on |P|. Next, we want to identify them with linear operators on |P|. We’re clearly not going to get an isomorphism in this case as multiplication (i.e. composition) of linear operators doesn’t commute in general, while multiplication of formal power series does. Nevertheless, we will derive simple characterizations of which linear operators are of this form.

One of the most important properties we will want is the following adjointness property: \[ \pair{f(t)g(t)}{p(x)} = \pair{f(t)}{g(t)p(x)} \] where |g(t) p(x)| is the action of the linear operator induced by |g(t)| on |p(x)|. We can derive what the induced linear operator must be to satisfy this property.

\[\begin{align} c_n \delta_m^n & = \pair{t^m}{x^n} \\ & = \pair{t^{m-k} t^k}{x^n} \\ & = \pair{t^{m-k}}{t^k x^n} \\ & = \bigpair{t^{m-k}}{\frac{c_n}{c_{n-k}}x^{n-k}} \end{align}\] so |t^k x^n = \frac{c_n}{c_{n-k}} x^{n-k}| for |k \leq n| and |0| otherwise. Thus, \[f(t) x^n = \sum_{k=0}^n \frac{c_n}{c_{n-k}} f_k x^{n-k} \] which can be extended to all polynomials by linearity.

This should look familiar. |tx^n = \frac{c_n}{c_{n-1}}x^{n-1}| which is exactly the same formula as the one for |\partial_t| except this operates on polynomials while |\partial_t| operates on formal power series. In particular, when |c_n = n!|, |t| behaves exactly like the derivative of polynomials with respect to |x|, and we see that the formal power series pick out a special class of differential operators on polynomials.

A simple calculation shows that |(t^j t^k) x^n = t^j (t^k x^n)| which lifts to a general associativity law: |(f(t) g(t)) p(x) = f(t) (g(t) p(x))|. The adjointness property also immediately implies that the induced linear operators commute.

As before, we will say delta/invertible operator when the linear operator is induced by a delta/invertible formal power series.

Define |Dx^n = nx^{n-1}|, |D^{-1}x^n = \frac{1}{n+1}x^{n+1}|, and |x^{-1} x^n = \begin{cases}x^{n-1}, & n > 0 \\ 0, & n = 0\end{cases}|. Then for various choices of |c|, |t| behaves as the following linear operators:

|c_n = n!| implies |t = D|

|c_n = 1| implies |t = x^{-1}|

|c_n = (n!)^{m+1}| implies |t = (Dx)^m D|

|c_n = 1/(-\lambda)_{(n)}| implies |t = -(\lambda + xD)^{-1} x^{-1}|

|c_n = 1/n| implies |t = x^{-1} D^{-1} x^{-1}|

|c_n = 1/{-\lambda \choose n}| implies |t = -(\lambda + xD)^{-1} D|

|c_n = 2^{2n}(1+\alpha)^{(n)}/(1 + \alpha + \beta)^{(2n)}| implies |t = 4(1 + \alpha + \beta + 2xD)^{-1} (2 + \alpha + \beta + 2xD)^{-1} x^{-1} (\alpha + xD)|

|c_n = (1-q^n)^{-1} \prod_{k=1}^n (1-q^k)| implies |tp(x) = (p(qx) - p(x))/(qx - x)|

A linear operator |T| on |\mathscr F| is continuous if given a sequence of formal power series |\pseq{f_k}{k}| such that |\deg f_k \to \infty| as |k \to \infty|, we have |\deg T(f_k) \to \infty| as |k \to \infty|.

Theorem (id:fqys): If |T| is a continuous linear operator on |\mathscr F|, then \[ T\left(\sum_{k=0}^\infty a_k f_k(t)\right) = \sum_{k=0}^\infty a_k T(f_k(t)) \] for all sequences |\pseq{a_k}{k}| in |\mathbb K| and |\pseq{f_k}{k}| in |\mathscr F| for which |\deg f_k \to \infty| as |k \to \infty|.

Proof: (click to expand) By the assumptions, both |\pair{T\left(\sum_{k=0}^\infty a_k f_k(t)\right)}{x^n}| and |\pair{\sum_{k=0}^\infty a_k T(f_k(t))}{x^n}| involve only finitely many terms of the sum. That is, for every |n| there is some |N| such that for all |m > N|, |\pair{T\left(\sum_{k=0}^\infty a_k f_k(t)\right)}{x^n} = \pair{T\left(\sum_{k=0}^m a_k f_k(t)\right)}{x^n}| and |\pair{\sum_{k=0}^\infty a_k T(f_k(t))}{x^n} = \pair{\sum_{k=0}^m a_k T(f_k(t))}{x^n}|. But \[\begin{align} \bigpair{T\left(\sum_{k=0}^\infty a_k f_k(t)\right)}{x^n} & = \bigpair{T\left(\sum_{k=0}^m a_k f_k(t)\right)}{x^n} \\ & = \bigpair{\sum_{k=0}^m a_k T(f_k(t))}{x^n} \\ & = \bigpair{\sum_{k=0}^\infty a_k T(f_k(t))}{x^n} \end{align}\] so these two linear functionals agree on a pseudobasis and thus are the same which implies the formal power series are the same as well. |\square|

This can be cast as an instance of topological continuity, but I won’t describe that here.

Evaluation Operator

Unlike the linear functional case, the linear operator induced by the formal power series corresponding to the evaluation functional does depend on the choice of |c|. In general, we have: \[\varepsilon_y(t) x^n = \sum_{k=0}^n \frac{c_n}{c_{n-k} c_k} y^k x^{n-k} \]

For |c_n = n!|, |\varepsilon_y(t) p(x) = p(x + y)|. For |c_n = 1|, |\varepsilon_y(t) x^n = \frac{x^{n+1} - y^{n+1}}{x-y}|.

Characterizing Linear Operators Induced from Formal Power Series

Theorem (id:lgtb): Let |U| be a linear operator on |P|. There is an |f \in \mathscr F| such that |Up(x) = f(t) p(x)| for all |p \in P| if and only if |U| commutes with the operator |t|, i.e. |Utp(x) = tUp(x)| for all |p \in P|.

Proof: (click to expand) The only if direction is obvious. For the other direction, first note that |\deg Ux^n \leq n| because |t^k Ux^n = Ut^k x^n = 0| if |k > n| so |\pair{t^k}{Ux^n} = 0| for all |k > n|. Now define \[f(t) = \sum_{k=0}^\infty \frac{\pair{t^0}{Ux^k}}{c_k}t^k \] Then, \[\begin{align} f(t) x^n & = \sum_{k=0}^n \frac{\pair{t^0}{Ux^k}}{c_k}t^k x^n \\ & = \sum_{k=0}^n \frac{c_n}{c_k c_{n-k}} \pair{t^0}{Ux^k} x^{n-k} \\ & = \sum_{k=0}^n \frac{\pair{t^0}{Ut^{n-k} x^n}}{c_{n-k}} x^{n-k} \\ & = \sum_{k=0}^n \frac{\pair{t^0}{t^{n-k} Ux^n}}{c_{n-k}} x^{n-k} \\ & = \sum_{k=0}^n \frac{\pair{t^{n-k}}{Ux^n}}{c_{n-k}} x^{n-k} \\ & = \sum_{k=0}^n \frac{\pair{t^k}{Ux^n}}{c_k} x^k \\ & = Ux^n \end{align}\] The last equality relies on the degree of |Ux^n| being less than or equal to |n|. |\square|

Corollary (id:viti): A linear operator on P has the form of |f(t)| for an |f \in \mathscr F| if and only if it commutes with any specific delta operator.

Proof: (click to expand) The sequence of powers of the formal power series associated with the specific delta operator form a pseudobasis which means we can write |t| as an infinite linear combination of them. Thus the linear operator commutes with |t| and we can apply the theorem. |\square|

Corollary (id:qllw): A linear operator on P has the form of |f(t)| for an |f \in \mathscr F| if and only if it commutes with |\varepsilon_y(t)| for all |y \in \mathbb K|.

Proof: |\varepsilon_y(t) - c_0^{-1} t^0| is a delta operator. |\square|

Polynomial Sequences

When we say |\pseq{p_n(x)}{n}| is a (polynomial) sequence, that will always mean that |\deg p_n(x) = n|.

Theorem (id:cgsr): Let |f| be a delta series and |g| be an invertible series, then there is a unique polynomial sequence |\pseq{s_n(x)}{n}| such that \[\pair{g(t)f(t)^k}{s_n(x)} = c_n\delta_k^n \] holds for all |n,k \in \mathbb N|.

Proof: (click to expand)
Uniqueness follows easily by considering |\pair{g(t)f(t)^k}{s_n(x) - r_n(x)} = 0| where |\pseq{r_n(x)}{n}| is another sequence satisfying the same property.
For existence, we can just brute force it. If |g(t)f(t)^k = \sum_{i=k}^\infty b_{k,i} t^i| and we set |s_n(x) = \sum_{j=0}^n a_{n,j} x^j|, then we want to solve for the |a_{n,i}| induced by the following triangular system of linear equations: \[\begin{align} c_n\delta_k^n & = \pair{g(t)f(t)^k}{s_n(x)} \\ & = \bigpair{\sum_{i=k}^\infty b_{k,i} t^i}{\sum_{j=0}^n a_{n,j} x^j} \\ & = \bigpair{\sum_{i=k}^n b_{k,i} t^i}{\sum_{j=0}^n a_{n,j} x^j} \\ & = \sum_{i=k}^n \sum_{j=0}^n b_{k,i} a_{n,j} \pair{t^i}{x^j} \\ & = \sum_{i=k}^n c_i b_{k,i} a_{n,i} \end{align}\] |b_{k,k} \neq 0| since |\deg g(t)f(t)^k = k| and |c_n \neq 0| by assumption. Therefore, the diagonal entries of the triangular matrix corresponding to this system of linear equations are non-zero, and thus the matrix is invertible. |\square|

We’ll say |\pseq{s_n(x)}{n}| is the Sheffer sequence or is Sheffer for the pair |(g, f)|. When |g(t) = 1|, then we say that the corresponding Sheffer sequence is the associated sequence to |f|. When |f(t) = t|, then we say that the corresponding Sheffer sequence is the Appell sequence for |g|. Often I’ll use |\pseq{p_n(x)}{n}| for associated sequences, i.e. when |g(t) = 1|, and reserve |\pseq{s_n(x)}{n}| for the general case. The idea is that if |g(t)f(t)^k| takes the place of |t^k|, then |s_n(x)| takes the place of |x^n|. This is illustrated by the defining property and the following theorems.

Theorem (Expansion Theorem): Let |\pseq{s_n(x)}{n}| be a Sheffer for |(g, f)|. Then for any |h \in \mathscr F|, \[ h(t) = \sum_{k=0}^\infty \frac{\pair{h(t)}{s_k(x)}}{c_k}g(t)f(t)^k \]

Proof: (click to expand) |\pseq{s_n(x)}{n}|, like any polynomial sequence, is a basis for |P|, so two linear functionals are equal if they agree on all |s_n(x)|. Clearly, \[\begin{align} \bigpair{\sum_{k=0}^\infty \frac{\pair{h(t)}{s_k(x)}}{c_k}g(t)f(t)^k}{s_n(x)} & = \bigpair{\sum_{k=0}^n \frac{\pair{h(t)}{s_k(x)}}{c_k}g(t)f(t)^k}{s_n(x)} \\ & = \sum_{k=0}^n \frac{\pair{h(t)}{s_k(x)}}{c_k}\pair{g(t)f(t)^k}{s_n(x)} \\ & = \pair{h(t)}{s_n(x)} \end{align}\] |\square|

Corollary (Polynomial Expansion Theorem): Let |\pseq{s_n(x)}{n}| be a Sheffer for |(g, f)|. Then for a |p \in P|, \[ p(x) = \sum_{n=0}^\infty \frac{\pair{g(t)f(t)^n}{p(x)}}{c_n} s_n(x) \]

Proof: (click to expand) Choose |h = \varepsilon_y| in the Expansion Theorem, then apply it as a linear functional to |p|. |\square|

Theorem (Generating Function): |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| if and only if \[ \frac{1}{g(\bar f(t))} \varepsilon_y(\bar f(t)) = \sum_{k=0}^\infty \frac{s_k(y)}{c_k} t^k \] for all |y \in \mathbb K|.

Proof: (click to expand)
For the forward implication, using the Expansion Theorem we have: \[ \varepsilon_y(t) = \sum_{k=0}^\infty \frac{\pair{\varepsilon_y}{s_k(x)}}{c_k} g(t) f(t)^k = \sum_{k=0}^\infty \frac{s_k(y)}{c_k} g(t) f(t)^k \] Substituting |\bar f(t)| for |t| and dividing both sides by |g(\bar f(t))| gives the result.
For the reverse implication, if |\pseq{r_n(x)}{n}| is the Sheffer sequence for |(g, f)| then we immediately get \[ \sum_{k=0}^\infty \frac{r_k(y)}{c_k} t^k = \frac{1}{g(\bar f(t))} \varepsilon_y(\bar f(t)) = \sum_{k=0}^\infty \frac{s_k(y)}{c_k} t^k \] and applying both sides to |x^n| then gives |s_n(y) = r_n(y)| for all |y \in \mathbb K|. |\square|

Theorem (Conjugate Representation): |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| if and only if \[ s_n(x) = \sum_{k=0}^n \frac{\pair{g(\bar f(t))^{-1}\bar f(t)^k}{x^n}}{c_k} x^k \]

Proof: (click to expand) Applying |\varepsilon_y| to both sides gives \[\begin{align} s_n(y) & = \sum_{k=0}^n \frac{\pair{g(\bar f(t))^{-1}\bar f(t)^k}{x^n}}{c_k} y^k \\ & = \bigpair{\sum_{k=0}^n \frac{y^k}{c_k}g(\bar f(t))^{-1}\bar f(t)^k}{x^n} \\ & = \bigpair{\sum_{k=0}^\infty \frac{y^k}{c_k}g(\bar f(t))^{-1}\bar f(t)^k}{x^n} \\ & = \pair{g(\bar f(t))^{-1}\varepsilon_y(\bar f(t))}{x^n} \\ \end{align}\] but \[ s_n(y) = \bigpair{\sum_{k=0}^\infty \frac{s_n(y)}{c_k}t^k}{x^n} \] so we can apply the Generating Function theorem. |\square|

Theorem (Multiplication Theorem): Let |\pseq{s_n(x)}{n}| be Appell for |g|, then \[ s_n(\alpha x) = \alpha^n \frac{g(t)}{g(t/\alpha)} s_n(x) \] for |\alpha\neq 0|.

Proof: (click to expand) \[\begin{align} \pair{t^k}{g(t/\alpha) s_n(\alpha x)} & = \pair{t^k g(t/\alpha)}{s_n(\alpha x)} \\ & = \pair{(\alpha t)^k g(\alpha t/\alpha)}{s_n(x)} \\ & = \alpha^k \pair{g(t) t^k}{s_n(x)} \\ & = \alpha^k c_n \delta_k^n \\ & = \alpha^n c_n \delta_k^n \\ & = \alpha^n \pair{g(t)t^k}{s_n(x)} \\ & = \pair{t^k}{\alpha^n g(t) s_n(x)} \end{align}\] so |g(t/\alpha) s_n(\alpha x) = \alpha^n g(t) s_n(x)|. |\square|

Theorem (id:qqes): |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| if and only if |\pseq{g(t)s_n(x)}{n}| is the associated sequence for |f|.

Proof: Just apply adjointness to the definition. |\square|

Theorem (id:cutg): A sequence |\pseq{p_n(x)}{n}| is the associated sequence for |f| if and only if 1) |\pair{t^0}{p_n(x)} = c_0 \delta_n^0| for all |n \in \mathbb N|, and 2) |f(t) p_n(x) = \frac{c_n}{c_{n-1}}p_{n-1}(x)| for all |n \in \mathbb N_+|.

Proof: (click to expand)
|f(t)^0 = t^0| implies the first condition. For the second condition, if |\pseq{p_n(x)}{n}| is associated to |f|, then for |k > 0| \[\begin{align} \bigpair{f(t)^{k-1}}{\frac{c_n}{c_{n-1}}p_{n-1}(x)} & = \frac{c_n}{c_{n-1}}\pair{f(t)^{k-1}}{p_{n-1}(x)} \\ & = c_n \delta_k^n \\ & = \pair{f(t)^k}{p_n(x)} \\ & = \pair{f(t)^{k-1}}{f(t)p_n(x)} \end{align}\] and |\pseq{f(t)^k}{k}| is a pseudobasis.
Conversely, assuming (1) and (2) hold, then \[\begin{align} \pair{f(t)^k}{p_n(x)} & = \pair{t^0}{f(t)^k p_n(x)} \\ & = \frac{c_n}{c_{n-k}} \pair{t^0}{p_{n-k}(x)} \tag{(2) k times} \\ & = \frac{c_n}{c_{n-k}} c_0 \delta_{n-k}^0 \tag{(1)} \\ & = c_n \delta_k^n \end{align}\] |\square|

Theorem (id:hvdt): A sequence |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| for some invertible |g| if and only if |f(t) s_n(x) = \frac{c_n}{c_{n-1}}s_{n-1}(x)|.

Proof: (click to expand)
For the forward implication, simply apply the previous theorem to |\pseq{g(t)s_n(x)}{n}| which is associated to |f|, then apply |g(t)^{-1}| to the resulting recurrence equation.
For the reverse implication, let |\pseq{p_n(x)}{n}| be the associated sequence for |f| and |U| be the linear operator defined by sending |s_n(x)| to |p_n(x)|. Then we have \[ Uf(t)s_n(x) = \frac{c_n}{c_{n-1}}Us_{n-1}(x) = \frac{c_n}{c_{n-1}}p_{n-1}(x) = f(t)p_n(x) = f(t)Us_n(x) \] Since |\pseq{s_n(x)}{n}| is a basis, we see that |U| commutes with a delta series and thus must be of the form |g(t)| for some |g| which is invertible because |U| preserves degree. Thus |\pseq{g(t)s_n(x)}{n}| is associated to |f| which is equivalent to |\pseq{s_n(x)}{n}| being Sheffer for |(g, f)|. |\square|

Corollary (id:tvdx): \[ ts_n(x) = \sum_{k=0}^{n-1} \frac{c_n}{c_k c_{n-k}} \pair{t}{p_{n-k}(x)} s_k(x) \]

Proof: (click to expand) Start by expanding |ts_n(x)| via the Polynomial Expansion Theorem \[\begin{align} ts_n(x) & = \sum_{k=0}^\infty \frac{\pair{g(t)f(t)^k}{ts_n(x)}}{c_k} s_k(x) \\ & = \sum_{k=0}^{n-1} \frac{\pair{t}{g(t)f(t)^k s_n(x)}}{c_k} s_k(x) \\ & = \sum_{k=0}^{n-1} \frac{c_k}{c_{n-k} c_k} \pair{t}{g(t) s_{n-k}(x)} s_k(x) \tag{theorem (id:hvdt)} \\ & = \sum_{k=0}^{n-1} \frac{c_k}{c_{n-k} c_k} \pair{t}{p_{n-k}(x)} s_k(x) \tag{theorem (id:qqes)} \end{align}\] |\square|

Theorem (Sheffer Identity): A sequence |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| for some invertible |g| if and only if \[ \varepsilon_y(t) s_n(x) = \sum_{i+j=n} \frac{c_n}{c_i c_j} p_i(y) s_j(x) \] for all |y \in \mathbb K| where |\pseq{p_n(x)}{n}| is associated to |f|.

Proof: (click to expand)
First, we’ll establish that \[ \varepsilon_y(t) p_n(x) = \sum_{i+j=n} \frac{c_n}{c_i c_j} p_i(y) p_j(x) \]

Applying |f(t)^k| to both sides of the equation leads to \[\begin{align} \pair{f(t)^k}{\varepsilon_y(t) p_n(x)} & = \pair{\varepsilon_y(t)}{f(t)^k p_n(x)} \\ & = \frac{c_n}{c_{n-k}} \pair{\varepsilon_y(t)}{p_{n-k}(x)} \\ & = \frac{c_n}{c_{n-k}} p_{n-k}(y) \\ & = \sum_{i+j=n} \frac{c_n}{c_i c_{j-k}} p_i(y) \pair{t^0}{p_{j-k}(x)} \\ & = \sum_{i+j=n} \frac{c_n}{c_i c_j} p_i(y) \pair{t^0}{f(t)^k p_j(x)} \\ & = \sum_{i+j=n} \frac{c_n}{c_i c_j} p_i(y) \pair{f(t)^k}{p_j(x)} \\ & = \bigpair{f(t)^k}{\sum_{i+j=n} \frac{c_n}{c_i c_j} p_i(y) p_j(x)} \end{align}\] which is most easily read from outwards in.
Doing the same trick as the previous proof, we let |U| be a linear operator defined by sending |s_n(x)| to |p_n(x)|. If |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| for some |g|, we can choose |U = g(t)| and applying |U| to both sides of the above equation gives the forward direction since |g(t)| commutes with |\varepsilon_y(t)|. Conversely, if we assume the equation then we’ll see that |U| must commute with |\varepsilon_y(t)| which implies its of the form |g(t)|. |\square|

As an example, we see that the Bernoulli polynomials |B_n(x)| are Sheffer for |(1, t)| with |c_n = n!| for which the associated polynomials are |\pseq{x^n}{n}|. The Sheffer Identity is thus the one mentioned in the Introduction.

Theorem (id:jtbs): Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)| and |\pseq{p_n(x)}{n}| be associated to |f|. For all |h, l \in \mathscr F|, \[ \pair{h(t)l(t)}{s_n(x)} = \sum_{i+j=n} \frac{c_n}{c_i c_j} \pair{h(t)}{p_i(x)} \pair{l(t)}{s_j(x)} \]

Proof: (click to expand) Use the Expansion Theorem on |h| with respect to |\pseq{p_n(x)}{n}| and |l| with respect to |\pseq{s_n(x)}{n}|. |h(t)l(t)| will then be \[ \sum_{i+j=k} \left(\frac{\pair{h(t)}{p_i(x)}}{c_i} f(t)^i\right) \left(\frac{\pair{l(t)}{s_j(x)}}{c_j} g(t)f(t)^j\right) = \left[\sum_{i+j=k} \frac{\pair{h(t)}{p_i(x)}\pair{l(t)}{s_j(x)}}{c_i c_j}\right] g(t)f(t)^k \] Applying this to |s_n(x)| gives the result. |\square|

See the paper for an interesting alternative proof of this result.

Recurrence Formulas

Given a linear operator |\mu| on |P|, the adjoint |\mu^*| is a linear operator on |\mathscr F| characterized by: \[ \pair{\mu^* f(t)}{p(x)} = \pair{f(t)}{\mu p(x)} \]

We can readily compute that: \[ \mu^* f(t) = \sum_{k=0}^\infty \frac{\pair{f(t)}{\mu x^k}}{c_k} t^k \]

Theorem (id:ahdu): The adjoint to a linear operator on |P| is continuous.

Proof: (click to expand) Let |\pseq{f_k(t)}{k}| where |\deg f_k \to \infty| as |k \to \infty|. Let |K_n| be an index such that |\deg f_{K_n} > \sup_{i=0}^n \deg \mu x^i|. |\pair{f_{K_n}(t)}{\mu x^m} = 0| for all |m \leq n|, so, using the formula above, |\deg \mu^* f_{K_n} \geq n|. |\square|

In fact, a linear operator on |\mathscr F| is an adjoint to one on |P| if and only if it is continuous.

Theorem (id:fdui): If |T| is a continuous linear operator on |\mathscr F|, then there exists a linear operator |\mu| on |P| such that |T = \mu^*|.

Proof: (click to expand) Define |\mu x^n = \sum_{k=0}^\infty \frac{\pair{Tt^k}{x^n}}{c_k} x^k| which is well-defined because |T| being continuous means only finitely many |\pair{Tt^k}{x^n}| are non-zero. By construction, we see that |\pair{t^k}{\mu x^n} = \pair{Tt^k}{x^n}|. |\square|

If |\pseq{p_n(x)}{n}| is associated to |f|, then the umbral shift |\theta_f| associated to |f| is the linear operator on |P| defined by \[ \theta_f p_n(x) = \frac{(n+1)c_n}{c_{n+1}} p_{n+1}(x) \] for all |n \in \mathbb N|. In the case where |p_n(x) = x^n| and |c_n = n!| the umbral shift is just multiplication by |x|. Since, famously, multiplication by |x| does not commute with differentiation, |Dx - xD = 1| as operators, the umbral shift isn’t induced as a linear operator by a formal power series. We’ll see that this is generally the case below.

A derivation on an algebra |A| is a linear operator |\partial| on |A| satisfying \[ \partial(ab) = (\partial a)b + a\partial b \] for all |a, b \in A|.

Lemma (id:lxnr): A continuous linear operator |\partial| on |\mathscr F| is a continuous derivation if and only if |\partial 1 = 0| and for any delta series |f|, |\partial f(t)^k = kf(t)^{k-1}g(t)| for all |k \in \mathbb N| for some |g|.

Proof: (click to expand) A continuous derivation satisfies |\partial h(t) = h’(t)\partial t| from which the result follows with |g(t) = f’(t)\partial t|. A continuous linear operator satisfying these laws satisfies |\partial h(t) = h’(t)f’(t)^{-1}g(t)| which we can see by expanding |h(t)| in terms of the pseudobasis |\pseq{f(t)^k}{k}| and using continuity to push the |\partial| into the sum. Given this \[\begin{align} \partial (h(t)k(t)) & = (h(t)k(t))'f’(t)^{-1}g(t) \\ & = k(t)h’(t)f’(t)^{-1}g(t) + h(t)k’(t)f’(t)^{-1}g(t) \\ & = (\partial h(t))k(t) + h(t)\partial k(t) \end{align}\] |\square|

Theorem (id:oqyq): An operator |\theta| on |P| is the umbral shift for the delta series |f| if and only if its adjoint |\theta^*| is a derivation on |\mathscr F| and \[ \theta^* f(t)^k = kf(t)^{k-1} \] for all |k \in \mathbb N|.

Proof: (click to expand)
If |\theta| is the umbral shift for |f| associated to |\pseq{p_n(x)}{n}|, then \[\begin{align} \pair{\theta^* f(t)^k}{p_n(x)} & = \pair{f(t)^k}{\theta p_n(x)} \\ & = \frac{(n+1)c_n}{c_{n+1}}\pair{f(t)^k}{p_{n+1}(x)} \\ & = (n+1)c_n \delta_k^{n+1} \\ & = kc_n \delta_k^{n+1} \\ & = kc_n \delta_{k-1}^n \\ & = \pair{kf(t)^{k-1}}{p_n(x)} \end{align}\] We of course have |\pair{\theta^* t^0}{p_n(x)} = \pair{t^0}{\theta p_n(x)} = \frac{(n+1)c_n c_0}{c_{n+1}} \pair{f(t)^0}{p_{n+1}(x)} = \frac{(n+1)c_n c_0}{c_{n+1}} \delta_0^{n+1} = 0| since |n+1| is never |0|. Since |\theta^*| is an adjoint, it’s continuous, and we can apply the previous lemma to conclude that it is a derivation.
If |\theta^*| is a derivation satisfying |\theta^* f(t)^k = kf(t)^{k-1}| then we can rearrange the equations of the first result to get: \[\begin{align} \pair{f(t)^k}{\theta p_n(x)} & = \pair{\theta^* f(t)^k}{p_n(x)} \\ & = \pair{kf(t)^{k-1}}{p_n(x)} \\ & = kc_n \delta_{k-1}^n \\ & = kc_n \delta_k^{n+1} \\ & = (n+1)c_n \delta_k^{n+1} \\ & = \frac{(n+1)c_n}{c_{n+1}}\pair{f(t)^k}{p_{n+1}(x)} \end{align}\] |\square|

In the particular case where |f(t) = t|, the above states that |\theta_t^* t^k = kt^{k-1}|, i.e. |\theta_t^* f(t) = f’(t)|. (Not to be confused with |\partial_t| which is only the true derivative when |c_n = n!|.) We can easily compute that |\theta_t t = xD|. Notably, this does not depend on the choice of |c|.

Theorem (id:kscz): If |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)|, then \[ \theta_t s_n(x) = \sum_{k=0}^{n+1} \left[\frac{c_n}{c_k c_{n-k}} \pair{g’(t)}{s_{n-k}(x)} + \frac{kc_n}{c_k c_{n-k+1}} \pair{g(t)f’(t)}{s_{n-k+1}(x)} \right] s_k(x) \]

Proof: (click to expand) Start with the Polynomial Expansion Theorem. \[\begin{align} \theta_t s_n(x) & = \sum_{n=0}^{n+1} \frac{\pair{g(t)f(t)^k}{\theta_t s_n(x)}}{c_k} s_k(x) \\ & = \sum_{n=0}^{n+1} \frac{\pair{\theta_t^*(g(t)f(t)^k)}{s_n(x)}}{c_k} s_k(x) \\ & = \sum_{n=0}^{n+1} \left[\frac{\pair{g’(t)f(t)^k + kg(t)f(t)^{k-1}f’(t)}{s_n(x)}}{c_k}\right] s_k(x) \\ & = \sum_{n=0}^{n+1} \left[\frac{\pair{g’(t)}{f(t)^k s_n(x)} + \pair{kg(t)f’(t)}{f(t)^{k-1} s_n(x)}}{c_k}\right] s_k(x) \\ & = \sum_{n=0}^{n+1} \left[\frac{c_n}{c_k c_{n-k}}\pair{g’(t)}{s_{n-k}(x)} + \frac{kc_n}{c_{n-k+1}c_k}\pair{g(t)f’(t)}{s_{n-k+1}(x)}\right] s_k(x) \end{align}\] |\square|

Lemma (id:golc): A surjective derivation on |\mathscr F| is continuous.

Proof: (click to expand) For any derivation |\partial|, we have |\partial 1 = \partial 1 + \partial 1| so |\partial 1 = 0|. We also have |\partial t^k = kt^{k-1}\partial t|. Since |\deg \partial t^0 = \infty| and |\deg \partial t^k = \deg \partial t + k - 1|, the only way to get something of degree |0| and have |\partial| be surjective is if |\deg \partial t = 0|. In general, if |\deg f = k| then |f(t) = t^k g(t)| where |\deg g = 0|. Therefore, |\deg \partial f(t) = \deg (t^k \partial g(t) + kt^{k-1}g(t)\partial t)| and the right-hand side has degree |k-1| implying |\partial| is continuous. |\square|

Theorem (id:gxkw): A surjective derivation on |\mathscr F| is adjoint to an umbral shift and vice versa.

Proof: (click to expand)
For the reverse direction, since |f| is a delta series in the previous theorem and |\theta^* f(t)^k = kf(t)^{k-1}|, |\pseq{\theta^* f(t)^k}{k}| is a pseudobasis and so |\theta^*| is surjective.
For the forward direction, if |\partial| is a surjective derivation, then it is an adjoint of a linear operator on |P| because it is continuous. Generally, we have |\partial f(t) = f’(t)\partial t| so we can solve |f’(t) = (\partial t)^{-1}| with |f(0) = 0|, i.e. |\deg f = 1|. Then |\partial f(t)^k = kf(t)^{k-1} f’(t)\partial t| and |f’(t) = (\partial t)^{-1}| so we end up with just |\partial f(t)^k = k f(t)^{k-1}| as desired. |\square|

Lemma (id:woxc): If |f| and |g| are delta series, then \[\theta_f^* = (\theta_f^* g(t))\theta_g^* \]

Proof: (click to expand) For any derivation |\partial|, we have |\partial a^k = (\partial a)ka^{k-1}| and |\theta_f^*| is a derivation. Therefore \[ \theta_f^* g(t)^k = (\theta_f^* g(t))kg(t)^{k-1} = (\theta_f^* g(t)) \theta_g^* g(t)^k \] and |g(t)^k| for |k\in \mathbb N| is a pseudobasis so this suffices to show the operators are equal. |\square|

Theorem (id:dxii): If |\theta_f| and |\theta_g| are umbral shifts, then \[ \theta_f = \theta_g \circ (\theta_g^* f(t))^{-1} \]

Proof: (click to expand) \[\begin{align} \pair{t^k}{\theta_f p(x)} & = \pair{\theta_f^* t^k}{p(x)} \\ & = \pair{(\theta_g^* f(t))^{-1}\theta_g^* t^k}{p(x)} \\ & = \pair{\theta_g^* t^k}{(\theta_g^* f(t))^{-1} p(x)} \\ & = \pair{t^k}{\theta_g (\theta_g^* f(t))^{-1} p(x)} \end{align}\] |\square|

Theorem (id:ouma): If |\pseq{p_n(x)}{n}| is associated to |f|, then \[ p_{n+1}(x) = \frac{c_{n+1}}{(n+1)c_n} \theta_t(f’(t))^{-1} p_n(x) \]

Proof: This is just the previous theorem applied to |p_n(x)| with |g(t) = t|. |\square|

Lemma (id:xnvj): Let |\theta_f| be the umbral shift for |f|. Then \[ \theta_f^*(h(t)) = h(t) \theta_f - \theta_f h(t) \] for all |h \in \mathscr F|. The left-hand side is the linear operator on |P| induced by the formal power series that is the output of |\theta_f^*|.

Proof: (click to expand) \[\begin{align} \pair{t^k}{\theta_f^*(h(t))x^n + \theta_f h(t) x^n} & = \pair{t^k}{\theta_f^*(h(t))x^n} + \pair{t^k}{\theta_f h(t) x^n} \\ & = \pair{\theta_f^*(h(t)) t^k}{x^n} + \pair{h(t)\theta_f^*(t^k)}{x^n} \\ & = \pair{\theta_f^*(h(t)) t^k + h(t)\theta_f^*(t^k)}{x^n} \\ & = \pair{\theta_f^*(h(t) t^k)}{x^n} \\ & = \pair{t^k}{h(t) \theta_f x^n} \end{align}\] |\square|

This lemma shows that no umbral shift has the form |g(t)| for a formal power series |g|.

Theorem (id:omlq): Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)|. Then if |\theta_f| is the umbral shift for |f|, \[ s_{n+1}(x) = \frac{c_{n+1}}{(n+1)c_n}(g(t)\theta_f^*(g(t)^{-1}) + \theta_f)s_n(x) \]

Proof: (click to expand) We have that |\pseq{g(t)s_n(x)}{n}| is associated to |f| leading to \[ g(t)s_n(x) = \frac{c_{n+1}}{(n+1)c_n} \theta_f g(t)s_{n+1}(x) \] The previous lemma leads to \[\begin{align} s_n(x) & = \frac{c_{n+1}}{(n+1)c_n} g(t)^{-1}\theta_f g(t)s_{n+1}(x) \\ & = \frac{c_{n+1}}{(n+1)c_n} g(t)^{-1}(g(t)\theta_f - \theta_f^*(g(t)))s_{n+1}(x) \\ & = \frac{c_{n+1}}{(n+1)c_n} (\theta_f - g(t)^{-1}\theta_f^*(g(t)))s_{n+1}(x) \\ & = \frac{c_{n+1}}{(n+1)c_n} (\theta_f + g(t)\theta_f^*(g(t)^{-1}))s_{n+1}(x) \end{align}\] where the final equality comes from \[ 0 = \theta_f^*(1) = \theta_f^*(g(t)g(t)^{-1}) = g(t)^{-1}\theta_f^*(g(t)) + g(t)\theta_f^*(g(t)^{-1}) \] |\square|

Theorem (id:kusb): Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)|. \[ s_{n+1}(x) = \frac{c_{n+1}}{(n+1)c_n}\left(\theta_t - \frac{g’(t)}{g(t)}\right) \frac{1}{f’(t)} s_n(x) \]

Proof: (click to expand)
Starting from the previous theorem \[ s_{n+1}(x) = \frac{c_{n+1}}{(n+1)c_n}(g(t)\theta_f^*(g(t)^{-1}) + \theta_f)s_n(x) \]
Theorem (id:dxii) gives |\theta_f = \theta_t f’(t)^{-1}|. Since |\theta_f^* h(t) = h’(t)\theta_f^* t| for any |h|, we first note that using theorem (id:oqyq), |1 = \theta_f^* f(t) = f’(t) \theta_f^* t| or |\theta_f^* t = f’(t)^{-1}|. Then \[ g(t)\theta_f^*(g(t)^{-1}) = -\frac{g(t) g’(t)}{g(t)^2} \theta_f^* t = -\frac{g’(t)}{g(t)} \theta_f^* t = -\frac{g’(t)}{g(t)} f’(t)^{-1} \] |\square|

Theorem (id:vxhh): Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)|. If \[ T = \left(\theta_t - \frac{g’(t)}{g(t)}\right)\frac{f(t)}{f’(t)} = \left(xD - \frac{tg’(t)}{g(t)}\right)\frac{f(t)}{tf’(t)} \] then \[ Ts_n(x) = ns_n(x) \] In other words, |s_n(x)| is an eigenfunction for |T| with eigenvalue |n|.

Proof: (click to expand) The equality for |T| just involves inserting a |t/t| in the middle. \[ Ts_n(x) = \left(\theta_t - \frac{g’(t)}{g(t)}\right)\frac{1}{f’(t)}f(t)s_n(x) = \frac{c_n}{c_{n-1}}\left(\theta_t - \frac{g’(t)}{g(t)}\right)\frac{1}{f’(t)}s_{n-1}(x) = n s_n(x) \] where theorem (id:kusb) and theorem (id:hvdt) have been used. |\square|

Transfer Formulas

Theorem (Transfer Formula): If |\pseq{p_n(x)}{n}| is the associated sequence of |f|, then \[ p_n(x) = f’(t)\left(\frac{t}{f(t)}\right)^{n+1} x^n \] for all |n \in \mathbb N|.

Proof: (click to expand)
We verify that the right-hand side meets the conditions of theorem (id:cutg). Condition (2) is easily verified: \[\begin{align} f(t) p_n(x) & = f(t)f’(t)\left(\frac{t}{f(t)}\right)^{n+1} x^n \\ & = f’(t)\left(\frac{t}{f(t)}\right)^n t x^n \\ & = \frac{c_n}{c_{n-1}} f’(t)\left(\frac{t}{f(t)}\right)^n x^{n-1} \\ & = \frac{c_n}{c_{n-1}} p_{n-1}(x) \end{align}\]

For condition (1), we start with a small trick by writing |f’(t) = [t(f(t)/t)]’|. \[\begin{align} \bigpair{t^0}{f’(t)\left(\frac{t}{f(t)}\right)^{n+1} x^n} & = \bigpair{\left(t\frac{f(t)}{t}\right)'\left(\frac{t}{f(t)}\right)^{n+1}}{x^n} \\ & = \bigpair{\left(\frac{t}{f(t)}\right)^n + t\left(\frac{f(t)}{t}\right)'\left(\frac{t}{f(t)}\right)^{n+1}}{x^n} \tag{product rule} \\ & = \bigpair{\left(\frac{t}{f(t)}\right)^n}{x^n} + \bigpair{t\left(\frac{f(t)}{t}\right)'\left(\frac{t}{f(t)}\right)^{n+1}}{x^n} \end{align}\]

We proceed from there by cases. In the |n = 0| case, we have \[ \bigpair{t^0}{x^0} + \bigpair{t\left(\frac{f(t)}{t}\right)'\left(\frac{t}{f(t)}\right)}{x^0} = \pair{t^0}{x^0} = c_0 \]
For the |n > 0| case, to simplify the expressions we’ll show that \[ \bigpair{t\left(\frac{f(t)}{t}\right)'\left(\frac{t}{f(t)}\right)^{n+1}}{x^n} = -\bigpair{\left(\frac{t}{f(t)}\right)^n}{x^n} \] We proceed as follows \[\begin{align} \bigpair{t\left(\frac{f(t)}{t}\right)'\left(\frac{t}{f(t)}\right)^{n+1}}{x^n} & = \bigpair{\left(\frac{f(t)}{t}\right)'\left(\frac{f(t)}{t}\right)^{-n-1}}{tx^n} \\ & = -\frac{1}{n}\bigpair{\left[\left(\frac{f(t)}{t}\right)^{-n}\right]'}{tx^n} \\ & = -\frac{1}{n}\bigpair{\theta_t^*\left[\left(\frac{f(t)}{t}\right)^{-n}\right]}{tx^n} \\ & = -\frac{1}{n}\bigpair{\left(\frac{f(t)}{t}\right)^{-n}}{\theta_t tx^n} \\ & = -\frac{1}{n}\bigpair{\left(\frac{f(t)}{t}\right)^{-n}}{xDx^n} \\ & = -\bigpair{\left(\frac{f(t)}{t}\right)^{-n}}{x^n} \end{align}\] |\square|

Theorem (Transfer Formula, alternate form): If |\pseq{p_n(x)}{n}| is the associated sequence of |f|, then \[ p_n(x) = \frac{c_n}{nc_{n-1}} \theta_t \left(\frac{t}{f(t)}\right)^n x^{n-1} \] for all |n \geq 1|.

Proof: (click to expand) \[\begin{align} \bigpair{f(t)^k}{\frac{c_n}{nc_{n-1}} \theta_t \left(\frac{t}{f(t)}\right)^n x^{n-1}} & = \frac{c_n}{nc_{n-1}}\bigpair{\theta_t^*[f(t)^k]}{\left(\frac{t}{f(t)}\right)^n x^{n-1}} \\ & = \frac{kc_n}{nc_{n-1}}\bigpair{f(t)^{k-1}f’(t)}{\left(\frac{t}{f(t)}\right)^n x^{n-1}} \\ & = \frac{kc_n}{nc_{n-1}}\bigpair{f(t)^{k-1}}{f’(t)\left(\frac{t}{f(t)}\right)^n x^{n-1}} \\ & = \frac{k}{n}\bigpair{f(t)^{k-1}}{\frac{c_n}{c_{n-1}}p_{n-1}(x)} \tag{Transfer Formula} \\ & = \frac{k}{n}\pair{f(t)^{k-1}}{f(t)p_n(x)} \tag{theorem (id:cutg)} \\ & = \frac{k}{n}\pair{f(t)^k}{p_n(x)} \\ & = \frac{k}{n}c_n\delta_k^n \\ & = c_n\delta_k^n \\ & = \pair{f(t)^k}{p_n(x)} \end{align}\] |\square|

Corollary (id:hcem): Let |\pseq{p_n(x)}{n}| is associated to |f| and |\pseq{q_n(x)}{n}| is associated to |gf| with |g| invertible. Then \[ q_n(x) = \theta_t g(t)^{-n} \theta_t^{-1} p_n(x) \] where |\theta_t^{-1} x^{n+1} = (c_{n+1}/((n+1)c_n))x^n| and |\theta_t^{-1} 1 = 0|.

Proof: Just write |q_n(x)| using the alternate form of the Transfer Formula. |\square|

Theorem (id:jyhq): Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)|, and let |h| and |l| be invertible series. Then the sequence |r_n(x) = h(t)l(t)^n s_n(x)| is Sheffer for \[ \left(\frac{[l(t)^{-1} f(t)]'}{f’(t)h(t)}l(t)g(t), l(t)^{-1} f(t) \right) \]

Proof: (click to expand) Apply |g(t)| to both sides getting \[ g(t)r_n(x) = h(t)l(t)^n g(t)s_n(x) \] and noting that |\pseq{g(t)s_n(x)}{n}| is associated to |f| and applying the Transfer Formula, we get: \[\begin{align} g(t)r_n(x) & = h(t)l(t)^n f’(t)\left(\frac{t}{f(t)}\right)^{n+1} x^n \\ & = h(t)\frac{f’(t)}{l(t)} l(t)^{n+1} \left(\frac{t}{f(t)}\right)^{n+1} x^n \\ & = h(t)\frac{f’(t)}{l(t)} \left(\frac{t}{l(t)^{-1} f(t)}\right)^{n+1} x^n \\ & = h(t)\frac{f’(t)}{l(t)} \frac{[l(t)^{-1} f(t)]'}{[l(t)^{-1} f(t)]'} \left(\frac{t}{l(t)^{-1} f(t)}\right)^{n+1} x^n \\ & = \frac{h(t)f’(t)}{[l(t)^{-1} f(t)]'l(t)} [l(t)^{-1} f(t)]' \left(\frac{t}{l(t)^{-1} f(t)}\right)^{n+1} x^n \\ \end{align} \] So we get \[ \frac{[l(t)^{-1} f(t)]'}{f’(t)h(t)} l(t)g(t)r_n(x) = [l(t)^{-1} f(t)]' \left(\frac{t}{l(t)^{-1} f(t)}\right)^{n+1} x^n \] which by the Transfer Formula means these are associated to |l(t)^{-1}f(t)| and thus |\pseq{r_n(x)}{n}| is Sheffer for \[ \left(\frac{[l(t)^{-1} f(t)]'}{f’(t)h(t)}l(t)g(t), l(t)^{-1} f(t) \right) \] |\square|

Umbral Composition and Transfer Operators

Let |\pseq{p_n(x)}{n}| be the associated sequence to |f|. The transfer or umbral² operator for |\pseq{p_n(x)}{n}| (or |f|) is the linear operator |\lambda_f| on |P| defined by \[ \lambda_f x^n = p_n(x) \] This implies the adjoint operator is \[ \lambda_f^* g(t) = \sum_{k=0}^\infty \frac{\pair{g(t)}{p_k(x)}}{c_k} t^k \]

Lemma (id:oony): A |\mathbb K|-algebra homomorphism of |\mathscr F| is an automorphism if and only if it is preserves degree.

Proof: (click to expand)
For the forward direction, assume |T| is an automorphism. Let |f(t) = T^{-1}(t)| with |\deg f = k| so |f(t) = t^kg(t)|. |k > 0| since |\mathbb K|-algebra homomorphisms send constants to constants but then |T(f(t)) = t = T(t)^k T(g(t))| and the degrees can only line up if |k = 1| and |\deg T(t) = 1|. |T| thus can’t reduce degree and by the same logic neither can |T^{-1}|, so |T| must preserve degree.
For the reverse direction, a degree preserving linear operator is continuous. We have that |\pseq{T(t)^k}{k}| is a pseudobasis, so for any |f \in \mathscr F|, we can write \[ f(t) = \sum_{k=0}^\infty a_k T(t)^k = T\left(\sum_{k=0}^\infty a_k t^k\right) \] so |g(t) = \sum_{k=0}^\infty a_k t^k| satisfies |f = T(g)| and for every |f| we can find such a |g| making |T| surjective. The uniqueness of the |\pseq{a_k}{k}| implies |T| is injective and thus bijective. For abstract nonsense reasons this is enough for it to be an automorphism. |\square|

Lemma (id:lfum): If |T| is a continuous |\mathbb K|-algebra homomorphism on |\mathscr F| and |f, g \in \mathscr F| with |\deg f > 0|, then |T(g(t)) = g(T(t))| and |(Tg)(f(t)) = T(g(f(t)))|.

Proof: (click to expand) \[\begin{align} T(g(t)) & = T\left(\sum_{k=0}^\infty g_k t^k\right) \\ & = \sum_{k=0}^\infty g_k T(t)^k \\ & = g(T(t)) \end{align}\] and |(Tg)(f(t)) = g(T(f(t))) = T(g(f(t)))|. |\square|

Theorem (id:stji): A linear operator |\lambda| on |P| is the transfer operator for |f \in \mathscr F| if and only if its adjoint |\lambda^*| is a |\mathbb K|-algebra automorphism of |\mathscr F| for which |\lambda^* f(t) = t|. This makes |\lambda_f(g(t)) = g(\bar f(t))|.

Proof: (click to expand)
For the forward direction, if |\lambda| is a transfer operator for |f| then we immediately get \[ \lambda^* f(t) = \sum_{k=0}^\infty \frac{\pair{f(t)}{p_k(x)}}{c_k} t^k = \sum_{k=0}^\infty \delta_1^k t^k = t \] and \[\begin{align} \pair{\lambda^*(g(t)h(t))}{x^n} & = \pair{g(t)h(t)}{\lambda x^n} \\ & = \pair{g(t)h(t)}{p_n(x)} \\ & = \sum_{i+j=n} \frac{c_n}{c_i c_j} \pair{g(t)}{p_i(x)}\pair{h(t)}{p_j(x)} \tag{theorem (id:jtbs)} \\ & = \sum_{i+j=n} \frac{c_n}{c_i c_j} \pair{\lambda^* g(t)}{x^i}\pair{\lambda^* h(t)}{x^j} \\ & = \sum_{i+j=n} \frac{c_n}{c_i c_j} \pair{\lambda^* g(t)}{x^i}\pair{\lambda^* h(t)}{x^j} \\ & = \pair{(\lambda^* g(t))(\lambda^* h(t))}{x^n} \end{align}\]
For the reverse direction, \[\begin{align} \pair{f(t)^k}{\lambda x^n} & = \pair{\lambda^*(f(t)^k)}{x^n} \\ & = \pair{\lambda^*(f(t))^k}{x^n} \\ & = \pair{t^k}{x^n} \\ & = c_n \delta_k^n \end{align}\] so |\lambda x^n| has the characteristic property of |p_n(x)|. |\square|

Corollary (id:chiw): A continuous |\mathbb K|-algebra automorphism on |\mathscr F| is the adjoint of a transfer operator.

Proof: By theorem (id:fdui) a continuous linear operator is adjoint to some linear operator on |P|, and being an automorphism there is some |f \in \mathscr F| that gets mapped to |t| by the automorphism so the previous theorem applies. |\square|

Summarizing some results of this form: There’s a bijection between continuous linear operators on |\mathscr F| and linear operators on |P|. Further, there’s a bijection between continuous surjective derivations on |\mathscr F| and umbral shifts, and a bijection between continuous |\mathbb K|-algebra automorphisms on |\mathscr F| and transfer operators.

Corollary (id:ocfq): Transfer operators form a group with |(\lambda_f^*)^{-1} = \lambda_{\bar f}^*| and |\lambda_f^* \circ \lambda_g^* = \lambda_{g\circ f}^*|.

Proof: This readily follows from |\lambda_f^* g(t) = g(\bar f(t))|. |\square|

Theorem (id:ydci):

An transfer operator maps associated sequences to associated sequences.

If |\pseq{p_n(x)}{n}| and |\pseq{q_n(x)}{n}| are associated to |f| and |g| respectively, and |\lambda| is a linear operator which maps |p_n(x)| to |q_n(x)|, then |\lambda^* g(t) = f(t)|.

Additionally, |\lambda| is a transfer operator.

Proof: (click to expand)
For all of the following let |\pseq{p_n(x)}{n}| and |\pseq{q_n(x)}{n}| be associated to |f| and |g| respectively and |\lambda_f|, |\lambda_g| be the transfer operators.

For 1, \[ \pair{(\lambda_g^*)^{-1}(g(t)^k)}{\lambda_g q_n(x)} = \pair{\lambda_g(\lambda_g^*)^{-1}(g(t)^k)}{q_n(x)} = \pair{g(t)^k}{q_n(x)} = c_n \delta_k^n \] so |\lambda_g q_n(x)| is associated to |(\lambda_g^*)^{-1}(g(t))|.

For 2, \[ \pair{\lambda^*(g(t))}{p_n(x)} = \pair{g(t)}{\lambda p_n(x)} = \pair{g(t)}{q_n(x)} = c_n \delta_1^n = \pair{f(t)}{p_n(x)} \]
For 3, by lemma (id:lfum), we can just precompose with |\bar f| to get |\lambda(g(\bar f(t))) = t| and do the same logic as 2 with |t^k| and |x^n| implying that |\lambda| is a transfer operator for |g(\bar f(t))|. |\square|

Let |\pseq{p_n(x)}{n}| and |\pseq{q_n(x)}{n}| be two polynomial sequences. The umbral composition of |q| with |p| is written and defined as \[ \ucomp{q}{p} = \sum_{k=0}^n q_{n,k}p_k(x) \] If |\lambda| is the transfer operator for |\pseq{p_n(x)}{n}|, then we have \[ \ucomp{q}{p} = \lambda q_n(x) \]

Theorem (id:xvse): If |\pseq{p_n(x)}{n}| and |\pseq{q_n(x)}{n}| are associated to |f| and |g| respectively, then |\pseq{\ucomp{q}{p}}{n}| is associated to |g(f(t))|.

Proof: (click to expand) \[\begin{align} \pair{g(f(t))^k}{\ucomp{q}{p}} & = \pair{g(f(t))^k}{\lambda_f q_n(x)} \\ & = \pair{\lambda_f^*(g(f(t)))^k}{q_n(x)} \\ & = \pair{g(\lambda_f^*(f(t)))^k}{q_n(x)} \\ & = \pair{g(t)^k}{q_n(x)} \\ & = c_n \delta_k^n \\ \end{align}\] |\square|

Corollary (id:xajh): Umbral composition makes the set of associated sequences into a group.

Proof: It follows the group structure of transfer operators. |\square|

A Sheffer operator is the linear operator |\mu_{g,f}| defined by |\mu_{g,f}x^n = s_n(x)| where |\pseq{s_n(x)}{n}| is Sheffer for |(g,f)|. By considering the associated sequence induced by a Sheffer sequence, i.e. |\pseq{g(t)s_n(x)}{n}|, we readily get |\mu_{g,f} = g(t)^{-1}\lambda_f| so Sheffer operators can be reduced to transfer operators.

Theorem (id:pnci): Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)| and let |\pseq{r_n(x)}{n}| be Sheffer for |(h, l)|. Then |\ucomp{r}{s}| is Sheffer for the pair |(g(t)h(f(t)), l(f(t)))|.

Proof: (click to expand) \[\begin{align} \pair{g(t)h(f(t))l(f(t))^k}{\ucomp{r}{s}} & = \pair{g(t)h(f(t))l(f(t))^k}{\mu_{g,f} r_n(x)} \\ & = \pair{h(f(t))l(f(t))^k}{g(t)\mu_{g,f} r_n(x)} \\ & = \pair{h(f(t))l(f(t))^k}{\lambda_f r_n(x)} \\ & = \pair{\lambda_f^*(h(f(t))l(f(t))^k)}{r_n(x)} \\ & = \pair{h(\lambda_f^*(f(t)))l(\lambda_f^*(f(t)))^k)}{r_n(x)} \\ & = \pair{h(t)l(t)^k)}{r_n(x)} \\ & = c_n \delta_k^n \end{align}\] |\square|

Corollary (id:fony): \[ \pair{h(t)}{\mu_{g,f} q_n(x)} = \pair{\mu_{g,f}^*(h(t))}{q_n(x)} = \pair{g(\bar f(t))^{-1} h(\bar f(t))}{q_n(x)} \]

Proof: Immediate from definition of |\mu_{g,f}| and theorem (id:stji). |\square|

Given two polynomial sequences |\pseq{r_n(x)}{n}| and |\pseq{s_n(x)}{n}| related by \[ r_n(x) = \sum_{k=0}^n a_{n,k} s_k(x) \] the connection-constants problem is to determine the constants |a_{n,k}|. When |\pseq{r_n(x)}{n}| and |\pseq{s_n(x)}{n}| are Sheffer for given formal power series pairs, we can solve this problem as follows.

Theorem (id:hqtd): Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)| and |\pseq{r_n(x)}{n}| be Sheffer for |(h, l)|. If \[ r_n(x) = \sum_{k=0}^n a_{n,k} s_k(x) \] then the sequence |t_n(x) = \sum_{k=0}^n a_{n,k} x^k| is Sheffer for \[ \left(\frac{h(\bar f(t))}{g(\bar f(t))}, l(\bar f(t)) \right) \]

Proof: (click to expand) Clearly, |r_n(x) = \ucomp{t}{s}|, so we just apply theorem (id:pnci) and solve for the |u, v \in \mathscr F| such that |\pseq{t_n(x)}{n}| is Sheffer for |(u, v)|. |\square|

Corollary (id:ixkb): Let |\pseq{p_n(x)}{n}| be associated to |f| and |\pseq{q_n(x)}{n}| be associated to |l| and \[ q_n(x) = \sum_{k=0}^n a_{n,k} p_k(x) \] then |t_n(x) = \sum_{k=0}^n a_{n,k} x^k| is associated to |l(\bar f(t))|.

Proof: Immediate from the previous theorem. |\square|

Transfer operators give us a concise proof of the Lagrange Inversion Formula used for the compositional inverse. The usual formula would arise from |g(t) = t| in the below.

Corollary (Lagrange Inversion Formula): Let |f, g \in \mathscr F| with |\deg f = 1| and |c_n = n!|, then for |n > 0| \[ \pair{g(\bar f(t))}{x^n} = \bigpair{g’(t)\left(\frac{t}{f(t)}\right)^n}{x^{n-1}} \] Of course, |\pair{g(\bar f(t))}{x^0} = \pair{g(t)}{x^0}| since |\deg \bar f = 1|.

Proof: (click to expand)
First we characterize the action of |\lambda_f^*|. |\lambda_f^*(g(f(t))) = g(\lambda_f^*(f(t))) = g(t)| so |\lambda_f^*(g(t)) = g(\bar f(t))|.
We conclude with a use of the alternate form of the Transfer Formula with |\pseq{p_n(x)}{n}| being associated to |f|. \[\begin{align} \pair{g(\bar f(t))}{x^n} & = \pair{\lambda_f^*(g(t))}{x^n} \\ & = \pair{g(t)}{\lambda_f x^n} \\ & = \pair{g(t)}{p_n(x)} \\ & = \bigpair{g(t)}{\theta_t\left(\frac{t}{f(t)}\right)^n x^{n-1}} \tag{Transfer Formula, alternate form} \\ & = \bigpair{\theta_t^*(g(t))\left(\frac{t}{f(t)}\right)^n}{x^{n-1}} \\ & = \bigpair{g’(t)\left(\frac{t}{f(t)}\right)^n}{x^{n-1}} \end{align}\] |\square|

Corollary (Lagrange Inversion Formula, alternate form): Let |f, g \in \mathscr F| with |\deg f = 1| and |c_n = n!|, then for |n > 0| \[ \pair{g(\bar f(t))}{x^n} = \bigpair{g(t)f’(t)\left(\frac{t}{f(t)}\right)^{n+1}}{x^n} \]

Proof: Do the same proof as the previous corollary just with the first form of the Transfer Formula. |\square|

Corollary (Lagrange Inversion Formula, Hermite form): Let |f, h \in \mathscr F| with |\deg f = 1| and |c_n = n!|, then for |n > 0| \[ \bigpair{\frac{th(\bar f(t))}{\bar f(t)f’(\bar f(t))}}{x^n} = \bigpair{h(t)\left(\frac{t}{f(t)}\right)^n}{x^n} \]

Proof: Apply the previous corollary with |g(t) = h(t)\frac{f(t)}{tf’(t)}|. |\square|

Example: Chebyshev Polynomials

While the “classical” umbral calculus generally used |c_n = n!|, one interesting (orthogonal) polynomial sequence that benefits from the extra flexibility is Chebyshev polynomials where we’ll use |c_n = (-1)^n|. (This can be viewed as a special case of Gegenbauer polynomials which use |c_n = {-\lambda \choose n}^{-1}| which reduces to the Chebyshev case when |\lambda = 1|.)

The book “The Umbral Calculus” mentioned in the introduction primarily covers the “classical” case and has many examples of those. It also covers the “non-classical” case as well albeit as a bit of a second thought.

|tx^n = -x^{n-1}| so |tp(x) = -x^{-1} p(x)|

|\pseq{T_n(x)}{n}| is Sheffer for |(g, f)| where |g(t) = (1-t^2)^{-2}| and |f(t) = \frac{\sqrt{1-t^2} - 1}{t} = \frac{-t}{1 + \sqrt{1 - t^2}}|. |\bar f(t) = \frac{-2t}{1+t^2}| and |f’(t) = \frac{-1}{\sqrt{1-t^2}(1 + \sqrt{1 - t^2})} = \frac{f(t)}{t\sqrt{1-t^2}}|.

|f(t)s_n(x) = \frac{c_n}{c_{n-1}}s_n(x)| from theorem (id:hvdt) gives the recurrence \[\frac{c_n}{c_{n+1}ts_{n+1}(x) + \frac{c_n}{c_{n-1}}ts_{n-1}(x)} + 2s_n(x) = 0 \] for any Sheffer sequence with |f(t)| as its delta series. For the Chebyshev polynomials, this simplifies to \[ 2xT_n(x) + T_{n+1}(x) + T_{n-1}(x) = 0 \]

|\theta_t x^n = -(n+1)x^{n+1}| which we can compute from |\theta_t t = xD| which takes the form |\theta_t t x^n = -\theta_t x^{n-1} = nx^n|. This leads to |\theta_t = -x(1 + xD)|.

From theorem (id:vxhh), we get |TT_n(x) = nT_n(x)| where

\[ T = \left(\theta_t - \frac{g’(t)}{g(t)}\right)\frac{f(t)}{f’(t)} = \left(xD - \frac{tg’(t)}{g(t)}\right)\frac{f(t)}{tf’(t)} \]

This leads to |nT_n(x) = (xD - (1-t^2)^{-1})\sqrt{1-t^2}T_n(x)|.

Summary

A formal power series |f| is invertible iff |\deg f = 0| and a delta series iff |\deg f = 1|. We call a linear functional/operator induced by an invertible/delta series an invertible/delta functional/operator.

Defining property of |\pseq{s_n(x)}{n}| being Sheffer for |(g, f)| where |g| is an invertible series and |f| is a delta series: \[ \pair{g(t)f(t)^k}{s_n(x)} = \pair{t^k}{x^n} = c_n \delta_k^n \]

If |g(t) = 1|, then we say |\pseq{s_n(x)}{n}| is the associated sequence for |f|, and usually we’ll use |\pseq{p_n(x)}{n}| instead.

If |f(t) = t|, then we say |\pseq{s_n(x)}{n}| is the Appell sequence for |g|.

Formal power series as operators \[ t^k x^n = \frac{c_n}{c_{n-k}} x^{n-k} \] and generally, \[f(t) x^n = \sum_{k=0}^n \frac{c_n}{c_{n-k}} f_k x^{n-k} \]

We can generalize this to Sheffer sequences a la \[h(f(t)) s_n(x) = \sum_{k=0}^n \frac{c_n}{c_{n-k}} h_k s_{n-k}(x) \]

A linear operator |T| on |\mathscr F| is continuous if given |\pseq{f_k}{k}| such that |\deg f_k \to \infty| as |k \to \infty|, we have |\deg T(f_k) \to \infty| as |k \to \infty|.

Given a linear operator |\mu| on |P|, the adjoint |\mu^*| is a linear operator on |\mathscr F| characterized by: \[ \pair{\mu^* f(t)}{p(x)} = \pair{f(t)}{\mu p(x)} \]

Its adjoint expansion is: \[ \mu^* f(t) = \sum_{k=0}^\infty \frac{\pair{f(t)}{\mu x^k}}{c_k} t^k \]

This applies generally, but, in particular for umbral shifts and transfer operators.

Theorem (id:ahdu): The adjoint to a linear operator on |P| is continuous.

Theorem (id:fdui): If |T| is a continuous linear operator on |\mathscr F|, then there exists a linear operator |\mu| on |P| such that |T = \mu^*|.

Theorem (Generating Function): |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| if and only if \[ \frac{1}{g(\bar f(t))} \varepsilon_y(\bar f(t)) = \sum_{k=0}^\infty \frac{s_k(y)}{c_k} t^k \] for all |y \in \mathbb K|.*

Theorem (Sheffer Identity): A sequence |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| for some invertible |g| if and only if \[ \varepsilon_y(t) s_n(x) = \sum_{i+j=n} \frac{c_n}{c_i c_j} p_i(y) s_j(x) \] for all |y \in \mathbb K| where |\pseq{p_n(x)}{n}| is associated to |f|.

Theorem (id:cutg): A sequence |\pseq{p_n(x)}{n}| is the associated sequence for |f| if and only if 1) |\pair{t^0}{p_n(x)} = c_0 \delta_n^0| for all |n \in \mathbb N|, and 2) |f(t) p_n(x) = \frac{c_n}{c_{n-1}}p_{n-1}(x)| for all |n \in \mathbb N_+|.

Theorem (Transfer Formula): If |\pseq{p_n(x)}{n}| is the associated sequence of |f|, then \[ p_n(x) = f’(t)\left(\frac{t}{f(t)}\right)^{n+1} x^n \] for all |n \in \mathbb N|.

Theorem (Transfer Formula, alternate form): If |\pseq{p_n(x)}{n}| is the associated sequence of |f|, then \[ p_n(x) = \frac{c_n}{nc_{n-1}} \theta_t \left(\frac{t}{f(t)}\right)^n x^{n-1} \] for all |n \geq 1|.

Theorem (id:qqes): |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| if and only if |\pseq{g(t)s_n(x)}{n}| is the associated sequence for |f|.

Theorem (id:hvdt): A sequence |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| for some invertible |g| if and only if |f(t) s_n(x) = \frac{c_n}{c_{n-1}}s_{n-1}(x)|.

Theorem (id:omlq): Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)|. Then if |\theta_f| is the umbral shift for |f|, \[ s_{n+1}(x) = \frac{c_{n+1}}{(n+1)c_n}(g(t)\theta_f^*(g(t)^{-1}) + \theta_f)s_n(x) \]

Theorem (Expansion Theorem): Let |\pseq{s_n(x)}{n}| be a Sheffer for |(g, f)|. Then for any |h \in \mathscr F|, \[ h(t) = \sum_{k=0}^\infty \frac{\pair{h(t)}{s_k(x)}}{c_k}g(t)f(t)^k \]

Corollary (Polynomial Expansion Theorem): Let |\pseq{s_n(x)}{n}| be a Sheffer for |(g, f)|. Then for a |p \in P|, \[ p(x) = \sum_{n=0}^\infty \frac{\pair{g(t)f(t)^n}{p(x)}}{c_n} s_n(x) \]

Theorem (id:stji): A linear operator |\lambda| on |P| is the transfer operator for |f \in \mathscr F| if and only if its adjoint |\lambda^*| is a |\mathbb K|-algebra automorphism of |\mathscr F| for which |\lambda^* f(t) = t|. This makes |\lambda_f(g(t)) = g(\bar f(t))|.

Theorem (id:pnci): Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)| and let |\pseq{r_n(x)}{n}| be Sheffer for |(h, l)|. Then |\ucomp{r}{s}| is Sheffer for the pair |(g(t)h(f(t)), l(f(t)))|.

Corollary (id:fony): \[ \pair{h(t)}{\mu_{g,f} q_n(x)} = \pair{\mu_{g,f}^*(h(t))}{q_n(x)} = \pair{g(\bar f(t))^{-1} h(\bar f(t))}{q_n(x)} \]

Theorem (id:kscz): If |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)|, then \[ \theta_t s_n(x) = \sum_{k=0}^{n+1} \left[\frac{c_n}{c_k c_{n-k}} \pair{g’(t)}{s_{n-k}(x)} + \frac{kc_n}{c_k c_{n-k+1}} \pair{g(t)f’(t)}{s_{n-k+1}(x)} \right] s_k(x) \]

Corollary (id:tvdx): \[ ts_n(x) = \sum_{k=0}^{n-1} \frac{c_n}{c_k c_{n-k}} \pair{t}{p_{n-k}(x)} s_k(x) \]

Theorem (Conjugate Representation): |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| if and only if \[ s_n(x) = \sum_{k=0}^n \frac{\pair{g(\bar f(t))^{-1}\bar f(t)^k}{x^n}}{c_k} x^k \]

Theorem (Multiplication Theorem): Let |\pseq{s_n(x)}{n}| be Appell for |g|, then \[ s_n(\alpha x) = \alpha^n \frac{g(t)}{g(t/\alpha)} s_n(x) \] for |\alpha\neq 0|.

Corollary (Lagrange Inversion Formula): Let |f, g \in \mathscr F| with |\deg f = 1| and |c_n = n!|, then for |n > 0| \[ \pair{g(\bar f(t))}{x^n} = \bigpair{g’(t)\left(\frac{t}{f(t)}\right)^n}{x^{n-1}} \] Of course, |\pair{g(\bar f(t))}{x^0} = \pair{g(t)}{x^0}| since |\deg \bar f = 1|.

Corollary (Lagrange Inversion Formula, alternate form): Let |f, g \in \mathscr F| with |\deg f = 1| and |c_n = n!|, then for |n > 0| \[ \pair{g(\bar f(t))}{x^n} = \bigpair{g(t)f’(t)\left(\frac{t}{f(t)}\right)^{n+1}}{x^n} \]

Corollary (Lagrange Inversion Formula, Hermite form): Let |f, h \in \mathscr F| with |\deg f = 1| and |c_n = n!|, then for |n > 0| \[ \bigpair{\frac{th(\bar f(t))}{\bar f(t)f’(\bar f(t))}}{x^n} = \bigpair{h(t)\left(\frac{t}{f(t)}\right)^n}{x^n} \]

This isn’t a basis because we’re allowing countably infinite sums of the elements of the pseudobasis, whereas for a basis, even with infinite elements, we’d be claiming that every element is a finite sum of the basis elements.↩︎

As far as I can tell, “umbral operator” is used in the “classical” |c_n = n!| case while “transfer operator” is more general. I’m sure people use “umbral operator” generally too, though.↩︎

November 10, 2025 12:36 AM

Brent Yorgey

Call for collaboration: Disco web UI
Call for collaboration: Disco web UI

Posted on November 10, 2025
Tagged disco, web, WASM, UI, teaching, discrete, math, Haskell
tl;dr: I would like to have a web interface for Disco, a student-oriented programming language for teaching functional programming and discrete mathematics, which is implemented in Haskell. I’m looking for others interested to help build it. If you like building web stuff with Haskell compiled to WASM and want to have a positive impact on students learning mathematics and functional programming, get in touch!
Disco

For the past nine (!) years I have been developing Disco, a functional teaching language for use in a discrete mathematics course. It features first-class functions, polymorphism, and recursive algebraic data types, along with various built-in mathematical niceties and syntax that is designed to be close to mathematical practice.

As a simple example, here’s a recursive function in Disco to compute the hyperbinary numbers:
h : N -> N
h(0)      = 1
h(2n)     = h(n) + h(n .- 1)
h(2n + 1) = h(n)
Since recursive functions in Disco are automatically memoized, this evaluates almost instantly even on very large numbers:
Disco> h(1000000)
1287
If you want to know more about Disco, you can read a paper about it here, or read the language documentation.
The problem

I want students in my Discrete Mathematics course (or any Discrete Mathematics course) to be able to use Disco: to play around at a REPL, write and test their own code, and so on.

However, getting students to install Haskell and build Disco from source is a total non-starter, for multiple reasons. Some students in the course are not all that tech-savvy. Some students don’t even have their own computer, or the computer they do have chokes when trying to compile a large Haskell project (e.g. because of limited memory). Even aside from those issues, I simply don’t want to spend time and effort helping students install stuff at the beginning of the semester (at least not in this class).

Old solution: Replit

For a couple years, students were able to use Disco via a site called Replit, which provided free educational accounts, and supported arbitrary Nix configurations. Since the disco package on Hackage was picked up by nixpkgs, it was possible to spin up a Disco interpreter on Replit in just a few seconds. Replit provided a virtual file system, an editor, and, of course, a REPL.

This was a great solution while it lasted, but sadly it is no longer viable:

Disco has been broken in nixpkgs for a while now, and I have no idea how to fix it.

Even if I could fix it, Replit has stopped supporting free education accounts, and started trying to shove LLMs into everything, making it no longer a viable teaching platform.

Solution criteria

There are a few non-negotiable criteria that any solution must meet:

It must either run on existing cloud infrastructure, or run completely client-side. I do not want to be in the business of running a server, or of worrying about mitigating DOS attacks (by which I mean students accidentally running infinite loops generating infinite amounts of memory). I also do not want to be in the business of managing accounts and logins.

Students should not have to install anything. As I mentioned before, this is a nonstarter for some students.

It should allow students to work at a Disco REPL, and also test their own Disco code.

Potential solutions

Of course, a really nice solution would be to find an existing site which replicates some of the functionality we used to have with Replit and supports educational uses. I kind of doubt such a thing exists, though I would be happy to be wrong about this.

Another possibility is to have students use VSCode in their browsers via GitHub; to make this work we would have to add LSP support to Disco and package it up appropriately. I’m not really excited about using GitHub for this, although implementing LSP for Disco is independently a great idea.

The other solution I can think of is to compile Disco to WASM, and build a web UI on top of it, so that the whole thing runs in the student’s browser. I spent some time on this last year and successfully got Disco to compile to WASM, but didn’t make it any farther than that. Honestly, I simply don’t know anything about web development, and I am not very motivated to learn.

UI levels

Running with the last idea above (WASM + a custom web UI), what could such a thing look like? What features would we want? Here are some different solutions I could imagine, in increasing order of effort.

Level 0 would be to have just a web-based REPL. Students can edit Disco code on their own device, upload it to the website, then evaluate expressions at the REPL. Having to reupload their code every time they make changes would be annoying, but this would still be better than nothing.

Level 1 would be to have a web-based editor and REPL. Students can type code in the editor, then push a button to load it into the REPL and play with it.

Level 2 would be to have a web-based filesystem + editor + REPL. Students can see a list of multiple files, edit them individually, and load any of them into the REPL. The filesystem must live on their own computer, not on a remote server; but it could be their real filesystem, or a virtual filesystem.

I don’t know how difficult any of these are; I would assume that at least some of them can be achieved by gluing together some existing Javascript components (e.g. CodeMirror?). I’m sure it will require extending Disco itself with some new APIs, but I can definitely work on that part once I know what would be needed.

If you know how to build web UIs like this and are interested in helping, get in touch!
<noscript>Javascript needs to be activated to view comments.</noscript>
by Brent Yorgey at November 10, 2025 12:00 AM

November 08, 2025

Magnus Therning

Making a theme based on modus

In modus-theme 5.0.0 Prot introduced a structured way to build a theme based on modus. Just a few days ago he released version 5.1.0 with some improvements in this area.

The official documentation of how to build on top of the Modus themes is very good. It's focused on how to make sure your theme fits in with the rest of the "modus universe". However, after reading it I still didn't have a good idea of how to get started with my own theme. In case others feel the same way I thought I'd write down how I ended up getting started.

The resulting theme, modus-catppuccin, can be found here.

A little background

I read about how to create a catppuccin-mocha theme using modus-vivendi through modus' mechanism of overrides. On Reddit someone pointed out that Prot had been working on basing themes on modus and when I checked the state of it he'd just released version 5.0.0. Since I'm using catppuccin themes for pretty much all software with a GUI I thought it could be interesting to see if I could make a modus-based catppuccin theme to replace my use of catppuccin-theme.

I'm writing the rest as if it was a straight and easy journey. It wasn't! I made a few false starts, each time realising something new about the structure and starting over with a better idea.

Finding a starting point

When reading what Prot had written about modus-themes in general, and about how to create themes based on it, in particular, I found that he's ported both standard-themes and ef-themes so they now are based on modus. Instead of just using them for inspiration I decided that since standard-themes is so small I might as well use it as my starting point.

Starting

I copied all files of standard-themes to an empty git repository, then I

deleted all but one of the theme file

copied the remaining theme file so I had four in total (one for each of the catppuccin flavours)

renamed constants, variables, and functions so they would match the theme and its flavours

put the colours into each catppuccin-<flavour>-palette

emptied the common palette, modus-catppuccin-common-palette-mappings

made sure that my use of modus-themes-theme was reasonable, in particular the base palette (I based the light flavour on modus-operandi and the three dark flavours on modus-vivendi)

The result can be seen here.

At this point the three theme flavours contained no relevant mappings of their own, so what I had was in practice modus-operandi under a new name and modus-vivendi under three new names.

Adding mappings for catppuccin

By organising the theme flavours the way outlined above I only need to add mappings to modus-catppuccin-common-palette-mappings because

each flavour-specific mapping adds its colour palette using the same name (that's how catppuccin organises its colors too, as seen here)

each flavour-specific mapping is combined with the common one

any missing mapping is picked up by the underlying theme, modus-operandi or modus-vivendi, so there will be (somewhat) nice colours for everything

I started out with the mappings in the dark standard theme but then I realised that's not the complete list of available mappings and I started looking at the themes in modus-themes itself.

Current state of modus-catppuccin

I've so far defined enough mappings to make it look enough like catppuccin for my use. There are a lot of possible mappings so my plan is to add them over time and use catppuccin-theme for inspiration.

Tags: emacs

November 08, 2025 08:45 AM

November 04, 2025

GHC Developer Blog

GHC 9.14.1-rc2 is now available

GHC 9.14.1-rc2 is now available

zubin - 2025-11-04

The GHC developers are very pleased to announce the availability of the second release candidate of GHC 9.14.1. Binary distributions, source distributions, and documentation are available at downloads.haskell.org.

The changes from the first release candidate are:

Bump the exceptions submodule to 0.10.11

Bump the os-string submodule to 2.0.8

Fix a driver regression causing an infinite loop with cyclic imports

GHC 9.14 will bring a number of new features and improvements, including:

Significant improvements in specialisation:

The SPECIALISE pragma now allows use of type application syntax

The SPECIALISE pragma can be used to specialise for expression arguments as well as type arguments.

Specialisation is now considerably more reliable in the presence of newtypes

Significant GHCi improvements including:

Correctness and performance improvements in the bytecode interpreter

Features in the GHCi debugger

Support for multiple home units in GHCi

Implementation of the Explicit Level Imports proposal

RequiredTypeArguments can now be used in more contexts

SSE/AVX2 support in the x86 native code generator backend

A major update of the Windows toolchain and improved compatibility with macOS Tahoe

â€¦ and many more

A full accounting of changes can be found in the release notes. Given the many specialisation improvements and their potential for regression, we would very much appreciate testing and performance characterisation on downstream workloads.

Note that while this release makes many improvements in the specialisation optimisation, polymorphic specialisation will remain disabled by default in the final release due to concern over regressions of the sort identified in #26329. Users needing more aggressive specialisation can explicitly enable this feature with the -fpolymorphic-specialisation flag. Depending upon our experience with 9.14.1, we may enable this feature by default in a later minor release.

We would like to thank the Zw3rk stake pool, Well-Typed, Mercury, Channable, Tweag I/O, Serokell, SimSpace, the Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work have made the Haskell ecosystem what it is today.

As always, do give this release a try and open a ticket if you see anything amiss.

by ghc-devs at November 04, 2025 12:00 AM

November 03, 2025

Monday Morning Haskell

Defining Types for a Simple HTTP Server
In the last several months, we’ve gone through solutions for a multitude of LeetCode problems in Haskell. Practicing problems like these is a great step towards learning a new language. However, you’ll only get so far solving contrived problems with no extra programming context.

Another great step you can take to level up your programming skills is to write common tools from scratch. This forces you to tackle a larger context than simply the inputs and outputs of a single function. You’ll also get more familiar with techniques that are entirely absent from LeetCode problems, like filesystem operations and network mechanics. This is beneficial whether you’re learning more with your primary language or getting started with a new language.

We’re going to spend the rest of the year writing a couple small projects like this in Haskell. We’ll start by writing a simple HTTP Server in these first few weeks. Then we’ll try something more complicated.

What you’ll find with projects like these is that parsing is extremely important. In a LeetCode problem, you’re typically receiving pre-structured input data. When you’re writing a tool from scratch, your input is more often a stream of unstructured data from a file or the network, and one of your main jobs is making sense of that data! To learn some great techniques for parsing in Haskell, you should sign up for our course, Solve.hs! In Module 4, you’ll learn all about the Megaparsec library that we’ll use in this series!

Outlining Our Server

Before we dive into any code, let’s outline the basic expectations for our server - what do we expect it to do? We’re going to keep things very simple.

Our program should start a server listening on port 3000 When a user pings our server with a valid HTTP request, we should reply with a valid HTTP response using the code “200 OK”. This response should have a simple body like “This is the response body!” If we receive an invalid HTTP request, we should reply with a valid HTTP response using the code “400 Bad Request”. This 400 response should give an error message in the body.

Now there are many libraries out there for writing HTTP Servers. In fact, if you take our Practical Haskell course, you’ll learn about Servant, which uses some really cool type-level mechanics that are unique to Haskell! By using a server library, you could get all this functionality in about 10-20 lines of code (if that).

But when you’re writing something “from scratch”, you want to limit which libraries you use, so that you can focus on learning some of the lower level details. In our case, we want to focus on the details of the HTTP Protocol itself. Our objective will be to improve our understanding of the message format behind HTTP requests and responses.

This means we’re going to write our own parsing code for HTTP requests, and our own serialization code for responses. We’ll follow this guide for HTTP version 1.1. We’ll use this to help structure our data, but we won’t get too complicated. We’ll aim to correctly parse (almost) all valid requests. But as we’ll explain below, there are a lot of rules we won’t enforce, so our server will “accept” a wide variety of “invalid” requests.

Defining Types

The first thing we want to do when writing a parser is define the types of our system. This is especially true in Haskell, where it’s easy for us to define the structure of new types, and to combine our elements using sum and product types.

If you’re using open-source documentation, coming up with types is usually easy! The docs will often lay out the structure for you. For example, the the doc linked above defines an HTTP Message like so:
HTTP-message = Request | Response; HTTP/1.1 messages
We could translate this into Haskell types:
data HttpRequest = HttpRequest

data HttpResponse = HttpResponse

data HttpMessage =
  RequestMessage HttpRequest |
  ResponseMessage HttpResponse
Of course our request and response types are incomplete, and we’ll fill them in next. If we wanted, we could define each field as we parse it. When you’re writing an entirely new system, you might take this approach. Once again though, good documentation can give us an overview of the entire type. Let’s see how we can use the documentation to produce a complete “request” type.

HTTP Request

For the request, we can read the following definition in the docs:
Request       = Request-Line;
                *(( general-header;
                  | request-header;
                  | entity-header)
                 CRLF);
                CRLF
                [ message-body ];
Note that the CRLF items refer to the consecutive characters \r\n, a “carriage return” and “line feed” (AKA “new line character”). We read the full definition as 4 parts.

The request line (we’ll see below what information this has)

0 or more headers, each terminated by CRLF. There are 3 types of headers, but they all have the same structure, as we’ll see.

A mandatory CRLF separating the headers from the body

An optional message body

The Request Line

This still isn’t specific enough to write our types. Let’s examine the “request line” for more details.
Request-Line = Method SP Request-URI SP HTTP-Version CRLF
The request line has 3 components and 3 separators (SP means a single space character ’ ‘). The first component is the “method” of the request (e.g. “Get”, “Post”). The protocol defines a series of valid methods for a request.
Method = "OPTIONS"
       | "GET"
       | "HEAD"
       | "POST"
       | "PUT"
       | "DELETE"
       | "TRACE"
       | "CONNECT"
       | extension-method
extension-method = token
If we ignore the “extension” method, we can make a simple enumerated type for the different methods, and add this as the first field in our request!
data HttpMethod =
    HttpOptions | HttpGet | HttpHead | HttpPost | HttpPut |
    HttpDelete | HttpTrace | HttpConnect
    deriving (Show, Eq)

data HttpRequest = HttpRequest
    { requestMethod :: HttpMethod
    ...
    } deriving (Show, Eq)
The “request URI” has a few different options as well.
Request-URI    = "*" | absoluteURI | abs_path | authority
Each of these has a particular structure and rules, but we’re going to simplify it considerably. We’ll just treat the URI as a ByteString, with the only restriction being that it can’t have any “space” characters, since the space is the separator.

One of the biggest gains you’ll get from a good HTTP library is breaking down request URIs into component parts, like path components and query parameters. The Servant library does this very well.
data HttpRequest = HttpRequest
    { requestMethod :: HttpMethod
    , requestUri :: ByteString
    ...
    } deriving (Show, Eq)
The last item in the request line is the “HTTP Version”. Here’s the spec from the documentation:
HTTP-Version   = "HTTP" "/" 1*DIGIT "." 1*DIGIT
The two values we care about are the major and minor version numbers. For example, HTTP/1.0 gives the major version 1 and the minor version 0. As a practical matter, we only care about very small integer (<256), data-preserve-html-node="true" so each of these. So we can represent the version of the request with a tuple (Word8, Word8).
import Data.Word (Word8)

data HttpRequest = HttpRequest
    { requestMethod :: HttpMethod
    , requestUri :: ByteString
    , requestHttpVersion :: (Word8, Word8)
    ...
    } deriving (Show, Eq)
So now our type is representing all the parts of the request line. Let’s move on to the rest of the request.

Headers & Body

Now let’s tackle headers. As we mentioned before, there are several types of headers (general, request, response, entity), but they all have the same basic structure. Here is that structure:
message-header = field-name ":" [ field-value ]
 field-name     = token
 field-value    = *( field-content | LWS )
 field-content  = <the OCTETs making up the field-value
                  and consisting of either *TEXT or combinations
                  of token, separators, and quoted-string>
There are references to LWS, which is “leading white space”. But at a basic level, a header consists of a “name” and a “value”, separated by a colon. We’ll treat both the name and value as bytestrings. Then we want to use some kind of a map to match the names with the values. So we’ll add this field to our type:
import qualified Data.HashMap.Lazy as HM

newtype HttpHeaders = HttpHeaders
    (HM.HashMap ByteString ByteString)
    deriving (Show, Eq)

data HttpRequest = HttpRequest
    { requestMethod :: HttpMethod
    , requestUri :: ByteString
    , requestHttpVersion :: (Word8, Word8)
    , requestHeaders :: HttpHeaders
    ...
    } deriving (Show, Eq)
We use a newtype to package this map away in a type-safe manner.

Finally, we have the “Body” of the request. In general, this is simply a ByteString. We could represent empty request bodies with an empty bytestring. But since there’s a meaningful semantic difference between a request that has a body and one that doesn’t, we can also use a Maybe value.
data HttpResponse = HttpResponse
    { responseHttpVersion :: (Word8, Word8)
    , responseStatusCode :: Int
    , responseReason :: ByteString
    , responseHeaders :: HttpHeaders
    , responseBody :: Maybe ByteString
    }
    deriving (Show, Eq)
This completes our request type!

The Response Type

Any server that receives a request should be able to produce a valid response, so we need to define that type as well. The good news is that the documentation shows that a response is very similar to a request in its structure:
Response = Status-Line;
           *(( general-header;
           | response-header;
           | entity-header ) CRLF);
           CRLF
           [ message-body ]
There are only two differences. First a response has a “status line” instead of a “request line”. Second, it has “response-header” as an option instead of “request-header”. This second difference doesn’t affect our type, so we’ll go ahead and start outlining the response like this:
data HttpResponse = HttpResponse
    { ...
    , responseHeaders :: HttpHeaders
    , responseBody :: Maybe ByteString
    }
    deriving (Show, Eq)
Now we just have to understand the response line. Here is its specification:
Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
This has a similar structure to the request line, but different data in a different order. The HTTP version comes first. Then comes a status code (e.g. 200 = OK, 400 = client error, etc.). Finally, we have a “reason” for the response code (e.g. “OK”, “Bad Request”, “Forbidden”).

We’re already representing the version as (Word8, Word8). The status code is a straightforward Int, and the reason is just going to be a Bytestring. So it’s easy to fill out the rest of this response type:
data HttpResponse = HttpResponse
    { responseHttpVersion :: (Word8, Word8)
    , responseStatusCode :: Int
    , responseReason :: ByteString
    , responseHeaders :: HttpHeaders
    , responseBody :: Maybe ByteString
    }
    deriving (Show, Eq)
Now we have our fundamental types! Here’s the complete code for our request and response types, including imports and subtypes:
import Data.Word (Word8)
import qualified Data.HashMap.Lazy as HM
import Data.ByteString.Lazy (ByteString)

data HttpMethod =
    HttpOptions | HttpGet | HttpHead | HttpPost | HttpPut |
    HttpDelete | HttpTrace | HttpConnect
    deriving (Show, Eq)

newtype HttpHeaders = HttpHeaders
    (HM.HashMap ByteString ByteString)
    deriving (Show, Eq)

data HttpRequest = HttpRequest
    { requestMethod :: HttpMethod
    , requestUri :: ByteString
    , requestHttpVersion :: (Word8, Word8)
    , requestHeaders :: HttpHeaders
    , requestBody :: Maybe ByteString
    }
    deriving (Show, Eq)

data HttpResponse = HttpResponse
    { responseHttpVersion :: (Word8, Word8)
    , responseStatusCode :: Int
    , responseReason :: ByteString
    , responseHeaders :: HttpHeaders
    , responseBody :: Maybe ByteString
    }
    deriving (Show, Eq)
Conclusion

That’s all for the first part of this series. Next week in Part 2, we’ll write code to parse a request on our server using Megaparsec. For an in-depth tutorial on parsing in Haskell, including using this powerful library, you should sign up for Solve.hs, our Haskell problem solving course! Module 4 goes into a lot of detail on parsing, and allows you to build your own parser from scratch!
by James Bowen at November 03, 2025 09:30 AM

October 31, 2025

Oskar Wickström

Computer Says No: Error Reporting for LTL
Quickstrom is a property-based testing tool for web applications, using QuickLTL for specifying the intended behavior. QuickLTL is a linear temporal logic (LTL) over finite traces, especially suited for testing. As with many other logic systems, when a formula evaluates to false — like when a counterexample to a safety property is found or a liveness property cannot be shown to hold — the computer says no. That is, you get “false” or “test failed”, perhaps along with a trace. Understanding complex bugs in stateful systems then comes down to staring at the specification alongside the trace, hoping you can somehow pin down what went wrong. It’s not great.

Instead, we should have helpful error messages explaining why a property does not hold; which parts of the specification failed and which concrete values from the trace were involved. Not false, unsat, or even assertion error: x != y. We should get the full story. I started exploring this space a few years ago when I worked actively on Quickstrom, but for some reason it went on the shelf half-finished. Time to tie up the loose ends!

The starting point was Picostrom, a minimal Haskell version of the checker in Quickstrom, and Error Reporting Logic (ERL), a paper introducing a way of rendering natural-language messages to explain propositional logic counterexamples. I ported it to Rust mostly to see what it turned into, and extended it with error reporting supporting temporal operators. The code is available at codeberg.org/owi/picostrom-rs under the MIT license.

Between the start of my work and picking it back up now, A Language for Explaining Counterexamples was published, which looks closely related, although it’s focused on model checking with CTL. If you’re interested in other related work, check out A Systematic Literature Review on Counterexample Explanation in Model Checking.

All right, let’s dive in!

QuickLTL and Picostrom

A quick recap on QuickLTL is in order before we go into the Picostrom code. QuickLTL operates on finite traces, making it suitable for testing. It’s a four-valued logic, meaning that a formula evaluates to one of these values:

definitely true

definitely false

probably true

probably false

It extends propositional logic with temporal operators, much like LTL:

next_d(P)

P must hold in the next state, demanding a next state is available. This forces the evaluator to draw a next state.

next_f(P)

P must hold in the next state, defaulting to definitely false if no next state is available.

next_t(P)

P must hold in the next state, defaulting to probably true if no next state is available.

eventually_N(P)

P must hold in the current or a future state. It demands at least N states, evaluating on all available states, finally defaulting to probably false.

always_N(P)

P must hold in the current and all future states. It demands at least N states, evaluating on all available states, finally defaulting to probably true.

You can think of eventually_N(P) as unfolding into a sequence of N nested next_d, wrapping an infinite sequence of next_f, all connected by ∨. Let’s define that inductively with a coinductive base case:

$$ \begin{align} \text{eventually}_0(P) & = P \lor \text{next}_F(\text{eventually}_0(P)) \\ \text{eventually}_(N + 1)(P) & = P \lor \text{next}_D(\text{eventually}_N(P)) \\ \end{align} $$

And similarly, always_N(P) can be defined as:

$$ \begin{align} \text{always}_0(P) & = P \land \text{next}_T(\text{always}_0(P)) \\ \text{always}_(N + 1)(P) & = P \land \text{next}_D(\text{always}_N(P)) \\ \end{align} $$

This is essentially how the evaluator expands these temporal operators, but for error reporting reasons, not exactly.

Finally, there are atoms, which are domain-specific expressions embedded in the AST, evaluating to ⊤ or ⊥. The AST is parameterized on the atom type, so you can plug in an atom language of choice. An atom type must implement the Atom trait, which in simplified form looks like this:
trait Atom {  type State;   fn eval(&self, state: &Self::State) -> bool;   fn render(  &self,   mode: TextMode,   negated: bool,  ) -> String;   fn render_actual(  &self,   negated: bool,   state: &Self::State,  ) -> String; }
For testing the checker, and for this blog post, I’m using the following atom type:
enum TestAtom {  Literal(u64),  Select(Identifier),  Equals(Box<TestAtom>, Box<TestAtom>),  LessThan(Box<TestAtom>, Box<TestAtom>),  GreaterThan(Box<TestAtom>, Box<TestAtom>), }  enum Identifier {  A,  B,  C, }
Evaluation

The first step, like in ERL, is transforming the formula into negation normal form (NNF), which means pushing down all negations into the atoms:
enum Formula<Atom> {  Atomic {  negated: bool,  atom: Atom,  },  // There's no `Not` variant here!  ... }
This makes it much easier to construct readable sentences, in addition to another important upside which I’ll get to in a second. The NNF representation is the one used by the evaluator internally.

Next, the eval function takes an Atom::State and a Formula, and produces a Value:
enum Value<'a, A: Atom> {  True,  False { problem: Problem<'a, A> },  Residual(Residual<'a, A>), }
A value is either immediately true or false, meaning that we don’t need to evaluate on additional states, or a residual, which describes how to continue evaluating a formula when given a next state. Also note how the False variant holds a Problem, which is what we’d report as definitely false. The True variant doesn’t need to hold any such information, because due to NNF, it can’t be negated and “turned into a problem.”

I won’t cover every variant of the Residual type, but let’s take one example:
 enum Residual<'a, A: Atom> {  // ...  AndAlways {  start: Numbered<&'a A::State>,  left: Box<Residual<'a, A>>,  right: Box<Residual<'a, A>>,  },  // ... }
When such a value is returned, the evaluator checks if it’s possible to stop at this point, i.e. if there are no demanding operators in the residual. If not possible, it draws a new state and calls step on the residual. The step function is analogous to eval, also returning a Value, but it operates on a Residual rather than a Formula.

The AndAlways variant describes an ongoing evaluation of the always operator, where the left and right residuals are the operands of ∧ in the inductive definition I described earlier. The start field holds the starting state, which is used when rendering error messages. Similarly, the Residual enum has variants for ∨, ∧, ⟹, next, eventually, and a few others.

When the stop function deems it possible to stop evaluating, we get back a value of this type:
enum Stop<'a, A: Atom> {  True,  False(Problem<'a, A>), }
Those variants correspond to probably true and probably false. In the false case, we get a Problem which we can render. Recall how the Value type returned by eval and step also had True and False variants? Those are the definite cases.

Rendering Problems

The Problem type is a tree structure, mirroring the structure of the evaluated formula, but only containing the parts of it that contributed to its falsity.
enum Problem<'a, A: Atom> {  And {  left: Box<Problem<'a, A>>,  right: Box<Problem<'a, A>>,  },  Or {  left: Box<Problem<'a, A>>,  right: Box<Problem<'a, A>>,  },  Always {  state: Numbered<&'a A::State>,  problem: Box<Problem<'a, A>>,  },  Eventually {  state: Numbered<&'a A::State>,  formula: Box<Formula<A>>,  },  // A bunch of others... }
I’ve written a simple renderer that walks the Problem tree, constructing English error messages. When hitting the atoms, it uses the render and render_actual methods from the Atom trait I showed you before.

The mode is very much like in the ERL paper, i.e. whether it should be rendered in deontic (e.g. “x should equal 4”) or indicative (e.g. “x equals 4”) form:
enum TextMode {  Deontic,  Indicative, }
The render method should render the atom according to the mode, and render_actual should render relevant parts of the atom in a given state, like its variable assignments.

With all these pieces in place, we can finally render some error messages! Let’s say we have this formula:

eventually₁₀(B = 3 ∧ C = 4)

If we run a test and never see such a state, the rendered error would be:

Probably false: eventually B must equal 3 and C must equal 4, but it was not observed starting at state 0

Neat! This is the kind of error reporting I want for my stateful tests.

Implication

You can trace why some subformula is relevant by using implication. A common pattern in state machine specs and other safety properties is:

precondition ⟹ before ∧ next_t(after)

So, let’s say we have this formula:

always_N((A > 0) ⟹ (B > 5 ∧ next_t(C < 10)))

If B or C are false, the error includes the antecedent:

Definitely false: B must be greater than 5 and in the next state, C must be less than 10 since A is greater than 0, […]

Small Errors, Short Tests

Let’s consider a conjunction of two invariants. We could of course combine the two atomic propositions with conjunction inside a single always(...), but in this case we have the formula:

always(A < 3) ∧ always(B < C)

An error message, where both invariants fail, might look the following:

Definitely false: it must always be the case that A is less than 3 and it must always be the case that B is greater than C, but A=3 in state 3 and B=0 in state 3

If only the second invariant (B < C) fails, we get a smaller error:

Definitely false: it must always be the case that B is greater than C, but B=0 and C=0 in state 0

And, crucially, if one of the invariants fail before the other we also get a smaller error, ignoring the other invariant. While single-state conjunctions evaluate both sides, possibly creating composite errors, conjunctions over time short-circuit in order to stop tests as soon as possible.

Diagrams

Let’s say we have a failing safety property like the following:

next_d(always₈(B < C))

The textual error might be:

Definitely false: in the next state, it must always be the case that B is greater than C, but B=13 and C=15 in state 6

But with some tweaks we could also draw a diagram, using the Problem tree and the collected states:

Or for a liveness property like next_d(eventually₈(B = C)), where there is no counterexample at a particular state, we could draw a diagram showing how we give up after some time:

These are only sketches, but I think they show how the Problem data structure can be used in many interesting ways. What other visualizations would be possible? An interactive state space explorer could show how problems evolve as you navigate across time. You could generate spreadsheets or HTML documents, or maybe even annotate the relevant source code of some system-under-test? I think it depends a lot on the domain this is applied to.

No Loose Ends

It’s been great to finally finish this work! I’ve had a lot of fun working through the various head-scratchers in the evaluator, getting strange combinations of temporal operators to render readable error messages. I also enjoyed drawing the diagrams, and almost nerd-sniped myself into automating that. Maybe another day. I hope this is interesting or even useful to someone out there. LTL is really cool and should be used more!

The code, including many rendering tests cases, is available at codeberg.org/owi/picostrom-rs.

A special thanks goes to Divyanshu Ranjan for reviewing a draft of this post.
October 31, 2025 11:00 PM

Manuel M T Chakravarty

Applicative code —the IDE for functional programming— is now in beta and sports a Bluesky account to…

Applicative Code (@code.applicative.co)
Applicative code —the IDE for functional programming— is now in beta and sports a Bluesky account to follow!

October 31, 2025 11:04 AM

Well-Typed.Com

Case Study: Debugging a Haskell space leak
As part of their Haskell Ecosystem Support Package, QBayLogic asked us to investigate a space leak in one of their Haskell applications, a simulation of a circuit using Clash. The starting point was a link to a ticket in the bittide-hardware package with reproduction instructions.

This post explains the debugging process which led to the resolution of this ticket. At the start of the investigation the program used 2 GB memory, at the end, about 200 MB, an improvement of approximately 10x!

First impressions

I first looked at the ticket report to get an idea of the problem.

The ticket contained a profile generated by eventlog2html which showed a profile of a program which runs in two phases. During these two phases the memory increased before resetting to some baseline and then increasing again.

Reproduction instructions were provided in the subsequent comment, I could run these easily to reproduce the issue. I altered the options to use the -hT profiling mode to generate a basic heap profile without needing to compile with profiling support. This is a useful technique to get an initial handle on the problem.

The instructions used profiling-detail: all-functions, which will insert many cost centres into the program which will significantly affect the runtime characteristics of the resulting program. I replaced this with profiling-detail: late.

Most importantly, the ticket lacked a precise description about what the issue was with the profile. It may have been that this was exactly the memory profile that the program should exhibit! When starting to think about memory issues, thinking about memory invariants is a very helpful technique. The first question I ask myself is:

What is the memory invariant that the program should uphold?

This situation was a useful test of this technique, since I had no domain knowledge of what the program did, what the test did or what function the library even aimed to perform. It certainly highlighted to me the importance of knowing your domain and knowing the invariants.

Memory invariants

A memory invariant is a property that your program’s heap should obey. Establishing a memory invariant makes it easier to verify and fix memory issues, since if you have a precise invariant, it is easy to check whether the invariant holds.

A memory invariant consists of two parts:

A predicate on the heap

The timeline over which the predicate should hold

For example, some predicates on the heap might be:

“No constructors of type T are alive”

“There is an upper bound of 20000 bytestrings”

“There are exactly 5 live closures of type T”

“No closures of type T are reachable from closures of type Q”

When paired with a timeline, a memory invariant is formed. Example timelines include:

“Before the second phase of the program”

“During the cleanup phase”

“After initialisation”

“Between runs 10 and 100”

“At all points of the program’s execution”

Establishing a memory invariant requires domain knowledge about the program. Without first establishing an invariant (even informally in your head), you can’t begin to debug memory usage of a program. The main challenge for me when investigating this issue was coming up with a memory invariant.

Initial investigation

In order to get an idea of how to proceed, I generated a “Profile by Closure Type” using the -hT runtime system option.
cabal run bittide-instances:unittests -- -p RegisterWb +RTS -hT -l -RTS
The result was a unittests.eventlog file which contains the profiling information.

I rendered this eventlog using eventlog2html and inspected the result in my browser.
eventlog2html unittests.eventlog
The profile shows a coarse breakdown by the type of closures currently alive on the heap. The maximum value reported in the profile is about 600 MB. This value relates to the total memory used by the process (2 GB), but doesn’t include additional memory used by the RTS. The relationship between live bytes and OS memory usage is explained fully in this blog post. Reducing live memory is a good way to reduce the overall memory usage of your program, but it isn’t the only factor.

The top four bands came from

Clash.Signal.Internal.:-

Protocols.Wishbone.Wishbone.S2M

THUNK_1_0

2-tuples (,)

and as can be easily seen from the “detailed pane”, the patterns of allocation of these top four bands closely align with each other:

Looking at these correlations in the detailed pane can be invaluable in understanding the root issue, since memory issues are normally about different closures retaining each other, they are allocated together and retained together. Seeing these overall patterns can give you context about what exact kind of thing is using the memory.

Without a clear memory invariant, I wanted to get a better idea about these top 4 bands of allocation, I had a hypothesis at this stage that the THUNK closures were contained within tuples, which were retaining the WishboneS2M and :- constructors.

A more specific profile with info table provenance

I want to know information about the precise source location of the :-, WishboneS2M constructors and thunks. Therefore I enabled a few more debugging options to add this information to the binary:

-finfo-table-map: Gives a source location to each thunk and data constructor

-fdistinct-constructor-tables: Distinguishes between allocation sites of each constructor

Then if you make a profile using the -hi option, you get a very fine-grained breakdown about where exactly in your program to start looking. That’s useful for me since I didn’t yet look at any of the source code!
cabal run bittide-instances:unittests -- -p RegisterWb +RTS -hi -l -RTS
Nothing very useful from these source locations.

After consulting the source code in the relevant places, I quickly realised that this wouldn’t necessarily be as straightforward an investigation as I had hoped. I had hoped that the THUNK_1_0 locations reported in the -hT profile would be clear that my hypothesis about retaining was correct, but the -hi profile didn’t show up anything directly wrong. These locations were normal ways you could construct a Clash circuit.

At this stage my lack of a memory invariant or some domain knowledge was a hindrance. I took the opportunity to consult Ben who knew about the Clash ecosystem and asked on the ticket what the expected profile should look like.

The program is simulating a digital circuit.

The :- constructor represents a single time-step of simulation.

The expected memory profile is to use a constant (or near constant) amount of memory, since the circuit being simulated has bounded size.

For this program, a plausible invariant might have been: the number of :- constructors should remain roughly constant during simulation.

With this knowledge, the number of :- constructors alive seemed to be the biggest unexpected source of memory usage. Knowing that :- is a data constructor with two arguments, Each allocation is 24 bytes, so 240 MB of live :- closures corresponds to roughly 10 million constructors. That is certainly an issue.

Secondly, the number of live WishboneS2M constructors looked wrong. I didn’t have a good idea of the domain still, but by similar arithmetic, many millions of these are also resident on the heap.

These two facts gave me some further avenues to investigate but I was going to need to use ghc-debug to investigate further.

Using ghc-debug to investigate retainers

Using ghc-debug I wanted to establish

What was retaining :- constructors

What was retaining WishboneS2M constructors

Therefore I instrumented the test executable, and launched ghc-debug-brick in order to query what was retaining :- and WishboneS2M. This was the start of making progress on the investigation.

To find the retainers of :-, I paused the test program just after the test started and used the “Find retainers” command in ghc-debug-brick. The result was a list of 100 :- closures, and when expanded, each one shows the path which was taken to reach it. It wasn’t very important where I paused, as long as it was in this initial period, since we saw in the profile that the :- closures are alive and linearly increasing for the whole phase.

When looking at retainers of :-, it was immediately noticeable that the program contained very long chains of :- constructors (upwards of 5000 long). This looked wrong to me, since my understanding was that :- was being used as a control operation to drive the simulation of the circuit.

The information about where each :- constructor was allocated is not very informative. That just gives me a location inside the library functions.

The question then becomes, why is :- being retained? I scrolled, for a long while, and eventually get to the point where the chain of :- constructors is retained by a non-:- constructor. That’s the interesting part since it’s the part of the program which led to the long chain being retained.

At the time, I didn’t think this looked so interesting, but also I didn’t know what I was looking for exactly.

So I kept going down the stack, looking for anything which looked suspicious. In the end, I got quite lucky: I found a tuple which was retained by a thunk. Since I had compiled with profiling enabled, I could see the cost centre stack where the thunk was allocated, which pointed to the implementation of singleMasterInterconnectC.

Culprit 1: lazy unzip

In the source code of singleMasterInterconnectC, I worked out this part of the allocation was coming from these calls to unzip.
  go (((), m2s), unzip -> (prefixes, unzip -> (slaveMms, s2ms))) = ((SimOnly memMap, s2m), (\x -> ((),     ((), x))) <$> m2ss)
Then I looked at the definition of unzip, and found it was defined in a very lazy manner.
unzip :: Vec n (a,b) -> (Vec n a, Vec n b)
unzip xs = (map fst xs, map snd xs)
With this definition, the thunk created by applying map to fst and xs retains a reference to xs, which retains a reference to all the bs as well as the as. In a definition which performs a single iteration, if you force either list, both will be evaluated and leave no reference to the other half. I changed this definition to one which performed a single iteration and this had a massive positive effect on memory usage.
unzip xs
  | clashSimulation = unzipSim xs
  | otherwise = (map fst xs, map snd xs)
 where
  unzipSim :: Vec m (a,b) -> (Vec m a, Vec m b)
  unzipSim Nil = (Nil, Nil)
  unzipSim (~(a,b) `Cons` rest) =
    let (as, bs) = unzipSim rest
    in (a `Cons` as, b `Cons` bs)
This issue was hard to spot with the tools, I got lucky, but it was harder to spot since unzip was also marked as INLINE. In the end, I guessed right, but it’s a bit unsatisfying to not have a great story about how I worked it out, but I knew the answer was somewhere in the retainer stack I was looking at, and eventually I looked in the right place.

This problem is similar to one you can encounter when using conduit and similar libraries. In short, by sharing a thunk between two consumers, the input structure can be retained longer than intended. Since one part of the program continues by evaluating the thunk, the other reference is updated to the result of the thunk being evaluated. This is a problem though, since the original structure was intended to be used for control and discarded immediately after driving the next step of execution.

Culprit 2: lazy record update retains old field values

Once the original problem had been fixed, memory usage was much improved. I circled back to the start to look at a modified -hT profile. Perhaps there were still other problems lurking?

The final phase of memory usage looks much better, so I turned my attention to the initial phase. In the initial phase, it looked like OUTPUT was increasing linearly.

I turned to ghc-debug again, inspected the retainers of the OUTPUT constructor and discovered that the fields of WishboneM2S were not being strictly updated, indirectly keeping a reference to the OUTPUT constructor.

I looked at the source location that WishboneM2S was allocated,

and made the update strict:
     toSlaves =
-      (\newStrobe -> (updateM2SAddr newAddr master){strobe = strobe && newStrobe})
+      (\newStrobe -> strictM2S $ (updateM2SAddr newAddr master){strobe = strobe && newStrobe})
         <$> oneHotOrZeroSelected
     toMaster
       | busCycle && strobe =
@@ -152,10 +152,8 @@ singleMasterInterconnect (fmap pack -> config) =
             (maskToMaybes slaves oneHotOrZeroSelected)
       | otherwise = emptyWishboneS2M

+    strictM2S (WishboneM2S !a !b !c !d !e !f !g !h !i) = WishboneM2S a b c d e f g h i
This resulted in another reduction in memory usage:

Culprit 3: lack of sharing in iterate

When I returned to the profile a final time, the memory usage seemed much better, especially in the initial section. I had now eliminated retained :- constructors, so the overall memory usage was much lower, but still increasing slightly. I turned to a -hi profile again to get more information about the THUNK bands.

Looking at the source code for the sat_ssVQ thunks, they come from the Clash.Sized.Vector.map function. Therefore… back to ghc-debug to see what retains these map thunks, and the callstack where they are allocated from. This time I used “Find Retainers (Exact)”, to find closures which are named sat_ssVQ_info.

The first one I looked at, I found was allocated from iterateI by inspecting the cost centre stack.

iterateI was implemented as follows:
iterateI :: forall n a. KnownNat n => (a -> a) -> a -> Vec n a
iterateI f a = xs
  where
    xs = init (a `Cons` ws)
    ws = map f (lazyV xs)
Reasoning about the definition, you can see that iterateI will result in a vector of the form:
a `Cons` map f (a `Cons` (map f (a `Cons` ...)))
As a result, each element of the vector will independently compute f^n a, no intermediate results will be shared, a quadratic amount of thunks for the n applications of f will be allocated.

Defining iterateI in a directly recursive style means only a linear amount of thunks will be allocated and f will be computed only a linear number of times.
iterateU :: UNat n -> (a -> a) -> a -> Vec n a
iterateU UZero _ _ = Nil
iterateU (USucc s) f a = a `Cons` iterateU s f (f a)
Even better, for the specific example, was to use a strict accumulator, so no intermediate thunks were allocated or retained in the early part of the program.

Final result

The final profile shows slightly higher memory usage in the initial phase of the program’s execution than the second phase, but looking at the detailed pane, I could identify why the memory was retained.

Overall, the total memory usage decreased from about 2 GB to 200 MB. There is probably still some improvement which can be made to this profile, but we felt like it was a good place to stop for this post.

Conclusion

The goal of this post is to document the thought process involved in investigating a memory issue. Overall, I feel like I would have found it easier to fix the problem with some domain knowledge. Once I acquired some knowledge about the domain I made much more rapid progress about what to investigate.

The main cause of the memory leak was not obvious, and I got a bit lucky in finding the right place. One issue being the problem was obfuscated by the problematic definition being inlined. In general, with a busy heap, finding the needle can be quite tricky. Debugging the subsequent issues was more straightforward.

In the future, we want to explore more reliable ways to identify and investigate the kinds of memory invariants that were violated in this program. For example, it was crucial to know that :- should not be retained, perhaps additional language design can express that property more clearly. On another note, a logical specification of memory invariants could be useful to automatically detect and pause the program at the exact point a violation was detected. There remains significant potential to improve our memory debugging tooling!

This work was performed for QBayLogic as part of their Haskell Ecosystem Support Package. If your company is using Haskell and from time to time requires expert help in issues like this, our packages fund maintenance on core tooling such as GHC and Cabal, as well as development or support for your specific issues. Please contact info@well-typed.com if we might be able to help you!
by matthew at October 31, 2025 12:00 AM

October 30, 2025

Haskell Interlude

72: Manuel Chakravarty

In this episode, we talk to Manuel Chakravarty - specifically, his work on the ghc backend such as data-parallel Haskell and the FFI and how that work segued into type system design. We also discussed Manuel's perspective on Haskell from the language design of Swift.

by Haskell Podcast at October 30, 2025 10:00 AM

Tweag I/O

Continuous Performance Testing: staying fast
The performance of a system is critical to the user experience. Whether it’s a website, mobile app, or service, users demand fast response and seamless functionality. Every change to a system brings the risk of performance degradation, so you should check every commit during development to ensure that loyal users do not face any performance issues.

From my experience, one of the most effective methods to achieve this is with Continuous Performance Testing (CPT). In this post, I want to explain how CPT is effective in catching performance-related issues during development. CPT is a performance testing strategy, so you might benefit from a basic understanding of the latter. A look at my previous blog post will be helpful!

What is Continuous Performance Testing?

Continuous Performance Testing (CPT) is an automated and systematic approach to performance testing, leveraging various tools to spontaneously conduct tests throughout the development lifecycle. Its primary goal is to gather insightful data, providing real-time feedback on how code changes impact system performance and ensuring the system is performing adequately before proceeding further.

As shown in the example below, CPT is integrated directly into the Continuous Integration and Continuous Deployment (CI/CD) pipeline. This integration allows performance testing to act as a crucial gatekeeper, enabling quick and accurate assessments to ensure that software meets required performance benchmarks before moving to subsequent stages.

A key benefit of this approach is its alignment with shift-left testing, which emphasizes bringing performance testing earlier into the development lifecycle. By identifying and addressing performance issues much sooner, teams can avoid costly late-stage fixes, improve software quality, and accelerate the overall development process, ultimately ensuring that performance standards and Service Level Agreements (SLAs) are consistently met.

To which types of performance testing can CPT be applied?

Continuous performance testing can be applied to the all types of performance testing. However each types has different challenges.

Automated Performance Testing is

Easily applied to load testing

Hard to apply to stress and spike tests, but still has benefits

Very hard to apply to soak-endurance tests

For more details about why the latter two performance testing types are difficult to implement in CI/CD, see the previous blog post.

Why prefer automated load testing?

The load test is designed with the primary objective of assessing how well the system performs under a specific and defined load. This type of testing is crucial for evaluating the system’s behavior and ensuring it can handle expected levels of user activity or data processing. The success of a load test is determined by its adherence to predefined metrics, which serve as benchmarks against which the system’s performance is measured. These metrics might include factors such as response times, throughput, and resource utilization. Given this focus on quantifiable outcomes, load testing is considered the most appropriate, easiest and well-suited type of performance testing type for Continuous Performance Testing (CPT).

How to apply continuous load testing

Strategy

Performance testing can be conducted at every level, starting with unit testing. It should be tailored to evaluate the specific performance requirements of each development stage, ensuring the system meets its capabilities and user expectation.

Load testing can be performed at any level—unit, integration, system, or acceptance. In Continuous Performance Testing (CPT), performance testing should start as early as possible in the development process to provide timely feedback, especially at the integration level. Early testing helps identify bottlenecks and optimize the application before progression. When CPT is applied at the system level, it offers insights into the overall performance of the entire system and how components interact, helping ensure the system meets its performance goals.

In my opinion, to maximize CPT benefits, it’s best to apply automated load testing at both integration and system level. This ensures realistic load conditions, highlights performance issues early, and helps optimize performance throughout development for a robust, efficient application.

Evaluation with static thresholds

Continuous Performance Testing (CPT) is fundamentally centered around fully automated testing processes, meaning that the results obtained from performance testing must also be evaluated automatically to ensure efficiency and accuracy. This automatic evaluation can be achieved in different ways. Establishing static metrics that serve as benchmarks against which the current results can be measured is one of them. By setting and comparing against these predefined metrics, we can effectively assess whether the application meets the required performance standards.

The below code snippet shows how we can set threshold values for various metrics with K6. K6 is an open source performance testing tool built in Go and it allows us to write performance testing scripts in Javascript, and it has an embedded threshold feature that we can use to evaluate the performance test results. For more information about setting thresholds, please see the documentation of K6 thresholds.
import { check, sleep } from "k6"
import http from "k6/http"

export let options = {
  vus: 250, // number of virtual users
  duration: "30s", // duration of the test
  thresholds: {
    http_req_duration: [
      "avg<20", // average response time must be below 2ms
      "p(90)<30", // 90% of requests must complete below 3ms
      "p(95)<40", // 95% of requests must complete below 4ms
      "max<50", // max response time must be below 5ms
    ],
    http_req_failed: [
      "rate<0.01", // http request failures should be less than 1%
    ],
    checks: [
      "rate>0.99", // 99% of checks should pass
    ],
  },
}
With the example above, K6 tests the service for 30 seconds with 250 virtual users and compares the results to the metrics defined in the threshold section. Let’s look at the results of this test:
running (0m30.0s), 250/250 VUs, 7250 complete and 0 interrupted iterations
default   [ 100% ] 250 VUs  30.0s/30s

     ✓ is status 201
     ✓ is registered

   ✓ checks.........................: 100.00% 15000 out of 15000
   ✗ http_req_duration..............: avg=2.45ms   min=166.47µs med=1.04ms   max=44.52ms p(90)=3.68ms   p(95)=7.71ms
       { expected_response:true }...: avg=2.45ms   min=166.47µs med=1.04ms   max=44.52ms p(90)=3.68ms   p(95)=7.71ms
   ✓ http_req_failed................: 0.00%   0 out of 7500
     iterations.....................: 7500    248.679794/s
     vus_max........................: 250     min=250            max=250


running (0m30.2s), 000/250 VUs, 7500 complete and 0 interrupted iterations
default ✓ [ 100% ] 250 VUs  30s
time="2025-03-12T12:09:54Z" level=error msg="thresholds on metrics 'http_req_duration' have been crossed"
Error: Process completed with exit code 99.
Although the checks and the http_req_failed rate thresholds are satisfied, this test failed because all the calculated http_req_duration metrics are greater than the thresholds defined above.

Evaluation by comparing to historical data

Another method of evaluation involves comparing the current results with historical data within a defined confidence level. This statistical approach allows us to understand trends over time and determine if the application’s performance is improving, declining, or remaining stable.

In many cases, performance metrics such as response times or throughput can be assumed to follow a normal distribution, especially when you have a large enough sample size. The normal distribution, often referred to as the bell curve, is a probability distribution that is symmetric about the mean. You can read more about it on Wikipedia.

Here’s how the statistical analysis works: from your historical data, calculate the mean (or average, μ) and standard deviation (SD, σ) of the performance metrics. These values will serve as the basis for hypothesis testing. Then, determine the performance metric from the current test run that you want to compare against the historical data. This could be the mean response time, p(90), error rate, etc.

Define test hyptheses

Concretely, let’s first create an hypothesis to test the current result with the historical data.

Null Hypothesis (H0): The current performance metric is equal to the historical mean (no significant difference).
$encoding="application/x-tex">H_0: \mu_{\text{current}} = \mu_{\text{historical}}</annotation></semantics>$

Alternative Hypothesis (H1): The current performance metric is not equal to the historical mean (there is a significant difference).
$encoding="application/x-tex">H_1: \mu_{\text{current}} \neq \mu_{\text{historical}}</annotation></semantics>$

Define a comparison metric and acceptance criterion

To compare the current result to the historical mean, we calculate the Z-score, which tells you how many standard deviations the current mean is from the historical mean. The formula for the Z-score is:

$\frac{\mu_{\text{current}} - \mu_{\text{historical}}}{\sigma_{\text{historical}}}</annotation></semantics>$

Where:

$encoding="application/x-tex">\mu_{\text{current}}</annotation></semantics>$ is the current mean.

$encoding="application/x-tex">\mu_{\text{historical}}</annotation></semantics>$ is the historical mean.

$encoding="application/x-tex">\sigma_{\text{historical}}</annotation></semantics>$ is the standard deviation of the historical data.

Finally, we need to determine the critical value of the Z-score: for a 95% confidence level, you can extract it from the standard normal distribution table. For a two-tailed test, the critical values are approximately ±1.96. For the full standard normal distribution table, see, for example, this website.

The confidence level means that the calculated difference between current and historical performance would fall within the chosen range around the historical mean in 95% of the cases. I believe the 95% confidence level provides good enough coverage for most purposes, but depending on the criticality of the product or service, you can increase or decrease it.

Make a decision

If the calculated Z-score falls outside the range of -1.96 to +1.96, you reject the null hypothesis (H0) and conclude that there is a statistically significant difference between the current performance metric and the historical mean. If the Z-score falls within this range, you fail to reject the null hypothesis, indicating no significant difference.

Based on these findings, you can interpret whether the application’s performance has improved, declined, or remained stable compared to historical data. This statistical analysis provides a robust framework for understanding performance trends over time and making data-driven decisions for further optimizations.

Implementation

In the above section, I tried to provide a clear explanation of how we can effectively evaluate the results of performance testing using historical data. It is important to note that we do not need to engage in complex manual statistical analyses to check the validity of these results. Instead, we should focus on scripting a comprehensive process that allows us to test the hypothesis for the Z-score within the 95% confidence level. This approach will streamline our evaluation and ensure that we rely on a straightforward method to assess the performance outcomes in the CI/CD pipeline.
import numpy as np
from scipy import stats

def hypothesis_test(historical_data, current_data, confidence_level=0.95):
    # Calculate historical mean and standard deviation
    historical_mean = np.mean(historical_data)
    historical_std = np.std(historical_data, ddof=1)

    # Calculate the current mean
    current_mean = np.mean(current_data)

    # Number of observations in the current dataset
    n_current = len(current_data)

    # Calculate Z-score
    z_score = (current_mean - historical_mean) / historical_std

    # Determine the critical Z-values for the two-tailed test
    critical_value = stats.norm.ppf((1 + confidence_level) / 2)

    # Print results
    print(f"Historical Mean: {historical_mean:.2f}")
    print(f"Current Mean: {current_mean:.2f}")
    print(f"Z-Score: {z_score:.2f}")
    print(f"Critical Value for {confidence_level*100}% confidence: ±{critical_value:.2f}")

    # Hypothesis testing decision
    if abs(z_score) > critical_value:
        assert abs(z_score) <= critical_value, f"z_score {z_score} exceeds the critical value {critical_value}"

if __name__ == "__main__":
    # Read the historical data (performance metrics)
    historical_data = get_historical_data()
    # Current data to compare
    current_data = get_current_result()
    hypothesis_test(historical_data, current_data, confidence_level=0.95)
The challenges with CPT

CPT can add additional cost to your project. It is an additional step in the CI pipeline, and requires performance engineering expertise that organizations might need to hire for. Furthermore, an additional test environment is needed to run the performance testing.

In addition to the costs, maintenance can be challenging. Likewise, data generation is very critical for the success of the performance testing: it requires obtaining data, masking sensitive information, and deleting them securely. CPT also requires testing new services, reflecting changes in the current services or removing unused services. Following up on detected issues and on new features of performance testing tools are also mandatory. All these must be done regularly to keep the system afloat, adding to existing maintenance efforts.

The benefits of CPT

Continuous Performance Testing offers significant benefits by enabling automatic early detection of performance issues within the development process. This proactive approach allows teams to identify and address bottlenecks before they reach production, reducing both costs and efforts associated with fixing problems later. By continuously monitoring and optimizing application performance, CPT helps ensure a fast, responsive user experience and minimizes the risk of outages or slowdowns that could disrupt users and business operations.

In addition to early detection, CPT enhances resource utilization by pinpointing inefficient code and infrastructure setups, ultimately reducing overall costs despite initial investments. It also fosters better collaboration among development, testing, and operations teams by providing a shared understanding of performance metrics: each test generates valuable data that supports advanced analysis and better decision-making regarding code improvements, infrastructure upgrades, and capacity planning. Finally, CPT offers the convenience of on-demand testing with just one click, providing an easy-to-use baseline for more rigorous performance evaluations when needed.

Conclusion

Continuous Performance Testing (CPT) transforms traditional performance testing by integrating it directly into the CI/CD pipeline. CPT can, in principle, be applied to each performance testing type, but load testing is most advantageous with lower cost and higher benefits.

The core idea is to automate and conduct performance tests continuously and earlier in the development cycle, aligning with the “shift-left” philosophy. This approach provides real-time feedback on performance impacts, helps identify and resolve issues sooner, and ultimately leads to improved software quality, faster development, and consistent adherence to performance standards and SLAs.
October 30, 2025 12:00 AM

GHC Developer Blog

GHC 9.14.1-rc1 is now available

GHC 9.14.1-rc1 is now available

bgamari - 2025-10-30

The GHC developers are very pleased to announce the availability of the release candidate of GHC 9.14.1. Binary distributions, source distributions, and documentation are available at downloads.haskell.org.

GHC 9.14 will bring a number of new features and improvements, including:

Significant improvements in specialisation:

The SPECIALISE pragma now allows use of type application syntax

The SPECIALISE pragma can be used to specialise for expression arguments as well as type arguments.

Specialisation is now considerably more reliable in the presence of newtypes

Significant GHCi improvements including:

Correctness and performance improvements in the bytecode interpreter

Features in the GHCi debugger

Support for multiple home units in GHCi

Implementation of the Explicit Level Imports proposal

RequiredTypeArguments can now be used in more contexts

SSE/AVX2 support in the x86 native code generator backend

A major update of the Windows toolchain and improved compatibility with macOS Tahoe

â€¦ and many more

A full accounting of changes can be found in the release notes. Given the many specialisation improvements and their potential for regression, we would very much appreciate testing and performance characterisation on downstream workloads.

Note that while this release makes many improvements in the specialisation optimisation, polymorphic specialisation will remain disabled by default in the final release due to concern over regressions of the sort identified in #26329. Users needing more aggressive specialisation can explicitly enable this feature with the -fpolymorphic-specialisation flag. Depending upon our experience with 9.14.1, we may enable this feature by default in a later minor release.

This is the first and hopefully last release candidate prerelease of 9.14.1. This comes later than expected in part due to work on resolving a regression in macOS 26 (#26166) which threatened the usability of the release. This prerelease includes a fix to this regression; naturally, please open a ticket if you encounter any trouble when using this release on macOS Tahoe or recent XCode releases. We expect that this fix will be backported to GHC 9.12 and 9.10 in the coming months.

We would like to thank the Zw3rk stake pool, Well-Typed, Mercury, Channable, Tweag I/O, Serokell, SimSpace, the Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work have made the Haskell ecosystem what it is today.

As always, do give this release a try and open a ticket if you see anything amiss.

by ghc-devs at October 30, 2025 12:00 AM

October 27, 2025

Monday Morning Haskell

Stock Market Shark: More Multidimensional DP
Today will be the final problem we do (for now) comparing Rust and Haskell LeetCode solutions. We’ll do a wrap-up of some of the important lessons next week. Last week’s problem was a multi-dimensional dynamic programming problem where the “dimensions” were obvious. We were working in 2D space trying to find the largest square, so we wanted the cells in our “DP” grid to correspond to the cells in our input grid.

Today we’ll solve one final problem using DP in multiple dimensions where the dimensions aren’t quite as obvious. To learn more about the basics behind implementing DP in Haskell, you need to enroll in our course, Solve.hs! You’ll learn many principles about algorithms in Module 3 and get a ton of practice with our exercises!

The Problem

Today’s problem is Best Time to Buy and Sell Stack IV, the final in a series of problems where we are aiming to maximize the profit we can make from purchasing a single stock.

We have two problem inputs. The first is an array of the prices of the stock over a number of days. Each day has one price. There is no fluctuation over the course of a day (real world stock trading would be much easier if we got this kind of future data!).

Our second input is a number of “transactions” we can make. A single transaction consists of buying AND selling the stock. There are some restrictions on how these transactions work. The primary one is that we cannot have simultaneous transactions. Another way of saying this is that we can only hold one “instance” of the stock at a time. We can’t buy one instance of the stock on day 1, and then another instance on day 2, and then sell them both later.

We also cannot sell a stock on the same day we buy it, nor buy a new instance on the same day we sell a previous instance. This isn’t so much a problem constraint as an algorithmic insight that there is no benefit to us doing this. Buying and selling on the same day yields no net profit, so we may as well just not use the transaction.

As an example, suppose we have 3 transactions to use, and the following data for the upcoming days:
[1, 4, 8, 2, 7, 1, 15]
The solution here is 26, via the following transactions:

Buy the stock on day 1 for $1, sell it on day 3 for $8 ($7 profit)

Buy the stock on day 4 for $2, sell it on day 5 for $7 ($5 profit)

Buy the stock on day 6 for $1, sell it on day 7 for $15 ($14 profit)

If we only had 2 transactions to work with, the answer would be 21. We would simply omit the second transaction.

The Algorithm

Since this is a “hard” problem, the algorithm description is a bit tricky! But we can break it into a few pieces.

Grid Structure

As I alluded to, this is a multi-dimensional DP problem, but the “dimensions” are not as clear as our last problem, because this problem doesn’t have a spatial nature. But once you do enough DP problems, it gets easier to see what the dimensions are.

One dimension will be the “current day”, and the other will be the “transaction state”. The cell {s, d} will indicate “Given I am in state s on day d, what is the largest additional profit I can achieve?”

The number of days is obviously equal to the size of our input array. This will be our column dimension. So column i will always mean “if I am in this state on day i”.

The number of transaction states is actually double the number of transactions we are allowed. We want one row for each transaction to capture the state after we have bought for this transaction, and one row for before buying as part of this transaction (we’ll refer to this row as “pre-bought” throughout).

We’ll order the rows so that earlier rows represent fewer transactions remaining. Thus the first row indicates the state of having purchased the stock for the final transaction, but not yet having sold it. The second row indicates you have one transaction still available, but you haven’t bought the stock for this transaction yet. The third row indicates you have purchased the stock and you’ll have 1 complete transaction remaining after selling it. And so on. So with n days and k transactions, our grid will have size 2k x n.

Base Cases

Now let’s think about the base cases of this grid. It is easiest to consider the last day, the final column of the grid. If we’re on the last day, the marginal gain we can make if we are holding the stock is simply to sell it (all prices are positive), which would give us a “profit” of the final sale price. We don’t need to consider the cost of buying the stock for these rows. We just think about “given that I have the stock, what’s the most I can end up with”.

Then, for all the “pre-bought” rows, the final column is 0. We don’t have enough time to buy AND sell a stock, so we just do nothing.

Now we can also populate the rows for the final transaction fairly easily. These are base cases as well. We’ll populate them from right to left, meaning from the later days to the earlier days (recall we’ve already filled in the very last day).

For the “top” row, where we’ve already bought the stock for our final transaction, we have two choices. We can “sell” the stock on that day, or “keep” the stock to cell later. The first option means we just use the price for that day, and the second means we use the recorded value for the next day. We want the maximum of these options.

Once we’ve populated the “bought” row, we move on to the “pre-bought” row below it. Again, we’ll loop right to left and have two options each time. We can “buy” the stock, which would move us “up” to the bought row on the next day, except we have to subtract the price of the stock. Or we can “stay” and not buy the stock. This means we grab the value from the same row in the next column. Again, we just use the max of these two options.

At this point, we’ve populated the entire last column of our grid AND the first two rows.

Recursive Cases

For the “recursive” cases (we can actually think of them as “inductive” cases), we go two rows at a time, counting up to our total transaction count. Each transaction follows the same pattern, which is similar to what we did for the rows above.

First, fill in the “bought” row for this transaction. We can “sell” or “keep” the stock. Selling moves us up and to the right, and adds the sale price for that day. But keeping moves us directly right in our grid. Again, we take the max of these options.

Then we fill the “pre-bought” row for this transaction. We can “buy” or “stay”. Buying means subtracting the price for that day from the value up and to the right. Staying means we take the value immediately to our right. As always, take the max.

When we’ve completed populated our grid following this pattern, our final answer is the value in the bottom left of the grid! This is the maximum profit starting from day 0 and before buying for any of our transactions, which is the true starting state of the problem.

Rust Solution

Let’s solve this in Rust first! We begin by defining a few values and handling an edge case (if there’s only 1 day, the answer is 0 since we can’t buy and sell).
pub fn max_profit(k: i32, prices: Vec<i32>) -> i32 {
    let n = prices.len();
    if n == 1 {
        return 0;
    }
    let ku = k as usize;
    let numRows = 2 * ku;

    // Create our zero-ed out grid
    let mut dp: Vec<Vec<i32>> = Vec::with_capacity(numRows);
    dp.resize(numRows, Vec::with_capacity(n));
    for i in 0..numRows {
        dp[i].resize(n, 0);
    }
    ...
}
Now we handle the first two rows (our “final” transaction). In each case, we start with the base case of the final day, and then move from right to left, following the rules described in the algorithm.
pub fn max_profit(k: i32, prices: Vec<i32>) -> i32 {
    ...

    // Final Transaction
    // Always sell on the last day!
    dp[0][n - 1] = prices[n - 1];
    for i in (0..=(n-2)).rev() {
        // Sell or Keep
        dp[0][i] = std::cmp::max(prices[i], dp[0][i+1]);
    }
    dp[1][n - 1] = 0;
    for i in (0..=(n-2)).rev() {
        // Buy (subtract price!) or keep
        dp[1][i] = std::cmp::max(dp[0][i+1] - prices[i], dp[1][i+1]);
    }
    ...
}
Now we write our core loop, going through the remaining transaction count. We start by defining the correct row numbers and setting the final-column base cases:
pub fn max_profit(k: i32, prices: Vec<i32>) -> i32 {
    // Setup
    ...
    // Final Transaction
    ...
    // All other transactions
    for j in 1..ku {
        let boughtRow = 2 * j;
        let preBoughtRow = boughtRow + 1;
        // Always sell on the last day!
        dp[boughtRow][n - 1] = prices[n - 1];
        // 0 - No time to buy/sell!
        dp[preBoughtRow][n - 1] = 0;
        ...
    }
}
And now we apply the logic for our algorithm. As we populate each row from right to left, we simply apply our two choices: sell/keep for the “bought” row and buy/stay for the “pre-bought” row.
pub fn max_profit(k: i32, prices: Vec<i32>) -> i32 {
    ...
    // All other transactions
    for j in 1..ku {
        let boughtRow = 2 * j;
        let preBoughtRow = boughtRow + 1;
        // Always sell on the last day!
        dp[boughtRow][n - 1] = prices[n - 1];
        // 0 - No time to buy/sell!
        dp[preBoughtRow][n - 1] = 0;
        // Sell or Keep!
        for i in (0..=(n-2)).rev() {
            dp[boughtRow][i] = std::cmp::max(dp[boughtRow - 1][i+1] + prices[i], dp[boughtRow][i + 1]);
        }
        // Buy or Stay!
        for i in (0..=(n-2)).rev() {
            dp[preBoughtRow][i] = std::cmp::max(dp[boughtRow][i+1] - prices[i], dp[preBoughtRow][i + 1])
        }
    }
    return dp[numRows - 1][0];
}
This completes our loop, and the final thing we need, as you can see, is to return the value in the bottom left of our grid!

Here is the complete solution:
pub fn max_profit(k: i32, prices: Vec<i32>) -> i32 {
    let n = prices.len();
    if n == 1 {
        return 0;
    }
    let ku = k as usize;
    let numRows = 2 * ku;
    let mut dp: Vec<Vec<i32>> = Vec::with_capacity(numRows);
    dp.resize(numRows, Vec::with_capacity(n));
    for i in 0..numRows {
        dp[i].resize(n, 0);
    }

    // Final Transaction
    dp[0][n - 1] = prices[n - 1];
    for i in (0..=(n-2)).rev() {
        dp[0][i] = std::cmp::max(prices[i], dp[0][i+1]);
    }
    dp[1][n - 1] = 0;
    for i in (0..=(n-2)).rev() {
        dp[1][i] = std::cmp::max(dp[0][i+1] - prices[i], dp[1][i+1]);
    }

    // All other transactions
    for j in 1..ku {
        let boughtRow = 2 * j;
        let preBoughtRow = boughtRow + 1;
        dp[boughtRow][n - 1] = prices[n - 1];
        dp[preBoughtRow][n - 1] = 0;
        for i in (0..=(n-2)).rev() {
            dp[boughtRow][i] = std::cmp::max(dp[boughtRow - 1][i+1] + prices[i], dp[boughtRow][i + 1]);
        }
        for i in (0..=(n-2)).rev() {
            dp[preBoughtRow][i] = std::cmp::max(dp[boughtRow][i+1] - prices[i], dp[preBoughtRow][i + 1])
        }
    }
    return dp[numRows - 1][0];
}
Haskell Solution

As we saw in our first DP problem, we often don’t need as much memory as it initially seems. We filled out the “whole grid” for Rust, which helps make the algorithm more clear. But our Haskell solution will reflect the fact that we only actually need to pass along one preceding row (the pre-bought row) each time we loop through a transaction.

Let’s start by defining our edge case, as well as a few useful terms. We’ll define our indices in left-to-right order, but in all cases we’ll loop through them in reverse with foldr:
maxProfit :: V.Vector Int -> Int -> Int
maxProfit nums k = if n == 1 then 0
  else ...
  where
    n = V.length nums
    lastPrice = nums V.! (n - 1)
    idxs = ([0..(n-2)] :: [Int])
    ...
Now we’ll define three different “loop” functions, all with the same pattern. We’ll use an IntMap Int to represent each “row” in our grid. So these functions will modify the IntMap for the row as we go along, while taking the new “index” we are populating. Let’s start with the base case, the first “bought” row, corresponding to our final transaction.

It will give us two options: sell or keep, following our algorithm. We insert the max of these into the map.
maxProfit :: V.Vector Int -> Int -> Int
maxProfit nums k = if n == 1 then 0
  else ...
  where
    n = V.length nums
    lastPrice = nums V.! (n - 1)
    idxs = ([0..(n-2)] :: [Int])

    ibFold :: Int -> IM.IntMap Int -> IM.IntMap Int
    ibFold i mp =
      let sell = nums V.! i
          keep = mp IM.! (i + 1)
      in  IM.insert i (max sell keep) mp
    initialBought = foldr ibFold (IM.singleton (n-1) lastPrice) idxs

    ...
We construct our initialBought row by folding, starting with a singleton of the last column base case.

Now we’ll write a function that, given a “bought” row, can construct the preceding “pre-bought” row. This will apply the “buy” and “stay” ideas in our algorithm and select between them. Choosing the “buy” option requires looking into the preceding “bought” row, while “stay” looking into a later index of the existing map:
maxProfit :: V.Vector Int -> Int -> Int
maxProfit nums k = if n == 1 then 0
  else ...
  where
    ...
    initialBought = foldr ibFold (IM.singleton (n-1) lastPrice) idxs
    
    preBoughtFold :: IM.IntMap Int -> Int -> IM.IntMap Int -> IM.IntMap Int
    preBoughtFold bought i preBought =
      let buy = bought IM.! (i+1) - nums V.! i
          stay = preBought IM.! (i+1)
      in  IM.insert i (max buy stay) preBought

    initialPreBought = foldr (preBoughtFold initialBought) (IM.singleton (n-1) 0) idxs
We construct the initialPreBought row by applying this function with initialBought as the input. But we’ll use this for the rest of our “pre-bought” rows as well! First though, we need a more general loop for the rest of our “bought” rows.

This function has the same structure as pre-bought, just applying the “sell” and “keep” rules instead of “buy” and “stay”:
maxProfit :: V.Vector Int -> Int -> Int
maxProfit nums k = if n == 1 then 0
  else ...
  where
    ...
    
    boughtFold :: IM.IntMap Int -> Int -> IM.IntMap Int -> IM.IntMap Int
    boughtFold preBought i bought =
      let sell = preBought IM.! (i+1) + nums V.! i
          keep = bought IM.! (i+1)
      in  IM.insert i (max sell keep) bought
Now we’re ready for our core loop! This will loop through every transaction except the base case. It takes only the preceding “pre-bought” row and the transaction counter. Once the counter reaches k, we return the first value in this row. Otherwise, we run the “bought” loop to produce a next “bought” row, and we pass this in to the “pre-bought” loop to produce a new “pre-bought” row. This becomes the input to our recursive call:
maxProfit :: V.Vector Int -> Int -> Int
maxProfit nums k = if n == 1 then 0
  else loop 1 initialPreBought
  where
    ...
    loop :: Int -> IM.IntMap Int -> Int
    loop i preBought = if i >= k then preBought IM.! 0
      else
        let bought' = foldr (boughtFold preBought) (IM.singleton (n-1) lastPrice) idxs
            preBought' = foldr (preBoughtFold bought') (IM.singleton (n-1) 0) idxs
        in  loop (i + 1) preBought'
As you can see above, we complete the solution by calling our loop with the initial “pre-bought” row, and a transaction counter of 1!

Here’s our full Haskell solution:
maxProfit :: V.Vector Int -> Int -> Int
maxProfit nums k = if n == 1 then 0
  else loop 1 initialPreBought
  where
    n = V.length nums
    lastPrice = nums V.! (n - 1)
    idxs = ([0..(n-2)] :: [Int])

    ibFold :: Int -> IM.IntMap Int -> IM.IntMap Int
    ibFold i mp =
      let sell = nums V.! i
          keep = mp IM.! (i + 1)
      in  IM.insert i (max sell keep) mp
    initialBought = foldr ibFold (IM.singleton (n-1) lastPrice) idxs
    
    preBoughtFold :: IM.IntMap Int -> Int -> IM.IntMap Int -> IM.IntMap Int
    preBoughtFold bought i preBought =
      let buy = bought IM.! (i+1) - nums V.! i
          stay = preBought IM.! (i+1)
      in  IM.insert i (max buy stay) preBought

    initialPreBought = foldr (preBoughtFold initialBought) (IM.singleton (n-1) 0) idxs

    boughtFold :: IM.IntMap Int -> Int -> IM.IntMap Int -> IM.IntMap Int
    boughtFold preBought i bought =
      let sell = preBought IM.! (i+1) + nums V.! i
          keep = bought IM.! (i+1)
      in  IM.insert i (max sell keep) bought

    loop :: Int -> IM.IntMap Int -> Int
    loop i preBought = if i >= k then preBought IM.! 0
      else
        let bought' = foldr (boughtFold preBought) (IM.singleton (n-1) lastPrice) idxs
            preBought' = foldr (preBoughtFold bought') (IM.singleton (n-1) 0) idxs
        in  loop (i + 1) preBought'
Conclusion

That’s the last LeetCode solution we’re going to write for now! Hopefully you’ve got a good impression now on the differences between dynamic programming in Haskell when compared to a language like Rust. To learn more about the basics of these Haskell solutions, take a look at our course, Solve.hs! You’ll also get a ton of practice with hundreds of exercise problems in the course!

Next week we will switch gears, and start working on some interesting parsing problems!
by James Bowen at October 27, 2025 08:30 AM

October 26, 2025

Ken T Takusagawa

[blyqokgn] simultaneously define record type and data

we propose a Haskell syntax extension which may be useful when a record type has very few data values of that type, perhaps just one, e.g., the Singleton design pattern.

introduce the keyword DEFINERECORD:

singletonrecord :: Recordtype;
singletonrecord = DEFINERECORD RecordType RecordConstructor { john :: Int = 1, paul :: String = "bass", george :: Bool = True, ringo :: Float = 1.0};
the benefit of this syntax is that the field types and their corresponding values get defined next to each other. future code changes to type and value will happen at the same place. there is less danger of accidentally leaving a field uninitialized.
also, the field labels are each written exactly once (Don't Repeat Yourself), in contrast to the current method (below), defining the data type then defining the singleton record using record syntax, which requires typing each field label twice. although you can optionally put type annotations on field values so that field, value, and type are next to each other when defining the singleton, it feels like even more Repeating Yourself.

data RecordType = RecordConstructor { john :: Int, paul :: String, george :: Bool, ringo :: Float };
singletonrecord :: Recordtype;
singletonrecord = RecordConstructor { john = 1 :: Int, paul = "bass" :: String, george = True :: Bool, ringo = 1 :: Float};

the proposed syntax cannot be used if RecordType has multiple constructors. (because there can only be one constructor, also consider simpler syntax which restricts the constructor to be the same as that of the type.)

if you have nested records, the syntax can define inner record types "in place" as well.
singletonrecord :: Recordtype;
singletonrecord = DEFINERECORD RecordType RecordConstructor { john :: Int = 1, paul :: String = "bass", george :: Bool = True, ringo = DEFINERECORD DrumType DrumConstructor { snare :: Double = 1.0, cymbal :: String = "crash" } };
this might be a nice feature to combine with Data.Default .

slight variation: record types can be unnamed (anonymous), but have named accessor functions (named fields). introduce the keyword UNNAMEDRECORD, which stands for both type and constructor.

f1 :: UNNAMEDRECORD;
f1 = UNNAMEDRECORD { john :: String = "guitar", paul :: String = "bass", george :: String = "guitar", ringo = UNNAMEDRECORD { snare :: Double = 1.0, cymbal :: String = "crash" } };

f2 :: UNNAMEDRECORD;
f2 = UNNAMEDRECORD { capital :: String = "london", home :: String = "liverpool" };

g :: String;
g = let
{ v1 :: UNNAMEDRECORD
; v1 = f1
; v2 :: UNNAMEDRECORD
; v2 = f2
} in cymbal (ringo v1) ++ home v2;
-- returns "crashliverpool"

this is better than returning multiple values through tuple syntax because components get named, both when creating and extracting. using names seems less error-prone than extracting tuple components by position:

g = let { v1 = f1 ; v2 = f2 } in (case v1 of {(_,_,_, (_,x) )->x}) ++ snd v2

open question: if we use DEFINERECORD or UNNAMEDRECORD inside a let or where, should the type name and accessor functions escape the let and become visible in the global scope? perhaps explicitly mark things for export from the let (requires more new syntax).
more sophistication (and complexity) possible: lenses, record wildcards and puns, etc. type inference probably becomes more difficult.
related work by Alexander Thiemann: SuperRecord anonymous records.

although we propose this extension for Haskell, it seems a nice feature to have in any programming language.

by Unknown (noreply@blogger.com) at October 26, 2025 06:30 PM

October 25, 2025

Edward Z. Yang

Draw high dimensional tensors as a matrix of matrices
I have recently needed to draw the contents of high-dimensional (e.g., 4D and up) tensors where it is important to ensure that is clear how to identify each of the dimensions in the representation. Common strategies I've seen people do in this situation include printing a giant list 2D slices (what the default PyTorch printer will do) or flattening the Tensor in some way back down to a 2D tensor. However, if you have a lot of horizontal space, there is a strategy that I like that makes it easy to identify all the axes of the higher dimensional tensor: draw it as a matrix of matrices.

Here are some examples, including the easy up-to-2D cases for completeness.

0D: torch.arange(1).view()
0
1D: torch.arange(2)
0  1
2D: torch.arange(4).view(2, 2 )
0  1
2  3
3D: torch.arange(8).view(2, 2, 2)
0  1    4  5
2  3    6  7
4D: torch.arange(16).view(2, 2, 2, 2)
 0  1    4  5
 2  3    6  7

 8  9   12 13
10 11   14 15
5D: torch.arange(32).view(2, 2, 2, 2, 2):
 0  1    4  5  :  16 17   20 21
 2  3    6  7  :  18 19   22 23
               :
 8  9   12 13  :  24 25   28 29
10 11   14 15  :  26 27   30 31
The idea is that every time you add a new dimension, you alternate between stacking the lower dimension matrices horizontally and vertically. You always stack horizontally before stacking vertically, to follow the standard row-major convention for printing in the 2D case. Dimensions always proceed along the x and y axis, but the higher dimensions (smaller dim numbers) involve skipping over blocks. For example, a "row" on dim 3 in the 4D tensor is [0, 1] but the "row" on dim 1 is [0, 4] (we skip over to the next block.) The fractal nature of the construction means we can keep repeating the process for as many dimensions as we like.

In fact, for the special case when every size in the tensor is 2, the generated sequence of indices form a Morton curve. But I don't call it that, since I couldn't find a popular name for the variation of the Morton curve where the radix of each digit in the coordinate representation can vary.

Knowledge check. For the 4D tensor of size (2, 2, 2, 2) arranged in this way, draw the line(s) that would split the tensor into the pieces that torch.split(x, 1, dim), for each possible dimension 0, 1, 2 and 3. Answer under the fold.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.
dim=0

>>> [x.reshape(-1) for x in torch.arange(16).view(2,2,2,2).split(1,dim=0)]
[tensor([0, 1, 2, 3, 4, 5, 6, 7]), tensor([ 8, 9, 10, 11, 12, 13, 14, 15])]

     0  1    4  5
     2  3    6  7
   ----------------
     8  9   12 13
    10 11   14 15


dim=1

>>> [x.reshape(-1) for x in torch.arange(16).view(2,2,2,2).split(1,dim=1)]
[tensor([ 0, 1, 2, 3, 8, 9, 10, 11]), tensor([ 4, 5, 6, 7, 12, 13, 14, 15])]

     0  1 |  4  5
     2  3 |  6  7
          |
     8  9 | 12 13
    10 11 | 14 15

dim=2

>>> [x.reshape(-1) for x in torch.arange(16).view(2,2,2,2).split(1,dim=2)]
[tensor([ 0, 1, 4, 5, 8, 9, 12, 13]), tensor([ 2, 3, 6, 7, 10, 11, 14, 15])]

     0  1    4  5
   ------- -------
     2  3    6  7

     8  9   12 13
   ------- -------
    10 11   14 15

dim=3

>>> [x.reshape(-1) for x in torch.arange(16).view(2,2,2,2).split(1,dim=3)]
[tensor([ 0, 2, 4, 6, 8, 10, 12, 14]), tensor([ 1, 3, 5, 7, 9, 11, 13, 15])]

     0 |  1    4 |  5
     2 |  3    6 |  7

     8 |  9   12 | 13
    10 | 11   14 | 15
by Edward Z. Yang at October 25, 2025 04:55 PM

October 23, 2025

Tweag I/O

Introduction to Agentic Coding

AI-assisted coding is having its moment. For autocomplete tools and AI agents like GitHub Copilot and Cursor, the hype is real. But so is the confusion. Are we replacing developers? Can anyone build software just by prompting? Is “vibe coding” the future?

At Modus Create, we wanted to cut through the noise. So we ran a real experiment: two teams, same scope, same product, same timeline. One team used traditional workflows. The other used AI agents to scaffold, implement, and iterate — working in a new paradigm we call Agentic Coding.

Every technique we learned along the way and every insight this approach taught us is collected in our Agentic Coding Handbook. This article distills the lessons from the handbook into the core principles and practices any engineer can start applying today.

From Typing Code to Designing Systems

Agentic coding isn’t about writing code faster. It’s about working differently. Instead of manually authoring every line, engineers become high-level problem solvers. They define the goal, plan the implementation, and collaborate with an AI agent that writes code on their behalf.

Agentic Coding is a structured, AI-assisted workflow where skilled engineers prompt intentionally, validate rigorously, and guide the output within clear architectural boundaries.

This approach is fundamentally different from what many refer to as “vibe coding”, the idea that you can throw a vague prompt at an LLM and see what comes back. That mindset leads to bloated code, fragile architecture, and hallucinations.

Agentic Coding vs. Vibe Coding

To illustrate the difference, here’s how agentic coding compares to the more casual “vibe coding” approach across key dimensions:

Agentic Coding Vibe Coding

Planning Structured implementation plan None or minimal upfront thinking

Prompting Scoped, intentional, reusable Loose, improvisational, trial-and-error

Context Deliberately curated via files/MCPs Often missing or overloaded

Validation Treated as a critical engineering step Frequently skipped or shallow

Output Quality High, repeatable, aligned to standards Inconsistent, often needs full rewrite

Team Scalability Enables leaner squads with high output Prone to technical debt and drift

Agentic coding provides the structure, discipline, and scalability that large organizations need to standardize success across multiple squads. It aligns AI workflows with existing engineering quality gates, enabling automation without losing control. In contrast, vibe coding may produce short-term wins but fails to scale under the weight of enterprise demands for predictability, maintainability, and shared accountability.

A Note on Our Experiment

We ran a structured experiment with two engineering squads working on the same product. One team (DIY) built the product using traditional methods. The other team (AI) used Cursor and GitHub Copilot Agent to complete the same scope, using agentic workflows. The AI team had 30% fewer engineers and delivered in half the time. More importantly, the code quality — verified by SonarQube and human reviewers — was consistent across both teams.

Core Practices That Make the Difference

Implementation Planning is Non-Negotiable

Before any prompting happens, engineers must do the thinking. Creating an implementation plan isn’t just a formality but the most critical piece in making agentic coding work. It’s where intent becomes design.

A solid implementation plan defines what to build, but also why, how, and within what constraints. It includes:

Functional goals: What should this piece of code do?

Constraints: Performance expectations, architecture rules, naming conventions, etc.

Edge cases: Known pitfalls, alternate flows, integration risks.

Required context: Links to schemas, designs, existing modules, etc.

Step-by-step plan: Breakdown of the task into scoped units that will become individual prompts.

This plan is usually written in markdown and lives inside the codebase. It acts like a contract between the engineer and the AI agent.

The more precise and explicit this document is, the easier it is to turn each unit into a high-quality prompt. This is where agentic coding shifts from “throw a prompt and see what happens” to deliberate system design, supported by AI.

In short, prompting is the act. Planning is the discipline. Without it, you’re not doing agentic coding — you’re just taking shots in the dark and hoping something works.

Prompt Engineering is a Real Skill

Prompt engineering is not about being clever. It’s about being precise, scoped, and iterative. We teach engineers to break down tasks into discrete steps, write action-oriented instructions, avoid vague intentions, chain prompts, and use prompting strategies like:

Three Experts: Use this when you want multiple perspectives on a tough design problem. For example, ask the AI to respond as a senior engineer, a security expert, and a performance-focused architect.

N-Shot Prompting: Provide the AI with N examples of the desired output format or pattern. Zero-shot uses no examples, one-shot provides a single example, and few-shot (N-shot) includes multiple examples to guide the AI toward the expected structure and style.

10 Iteration Self-Refinement: Best used when you want the AI to improve its own output iteratively. Give it a problem, then prompt it to improve its previous response 10 times, evaluating each step with reasoning.

Choosing the right style depends on the type of challenge you’re tackling — architectural design, implementation, refactoring, or debugging.

Context is a First-Class Citizen

Model Context Providers (MCPs) give GitHub Copilot a second brain. Instead of treating the LLM as an isolated suggester, MCPs stream relevant context — from Figma designs, documentation in Confluence, code changes from GitHub, and decision logs — directly into the Copilot chat session.

This allows engineers to ask Copilot to write code that matches an actual UI layout, or implements some logic described in a design doc, without manually pasting content into the prompt. The results are significantly more relevant and aligned. Some of the MCPs we use are:

GitHub MCP: Pulls in pull request content and comments to give the model full context for writing review responses, proposing changes, or continuing implementation from feedback.

Figma MCP: Streams UI layouts into the session, enabling the AI to generate frontend code that accurately reflects the design.

Database Schema MCP: Injects table structures, column types, and relationships to help the AI write or update queries, migrations, or API models with accurate field-level context.

Memory Bank MCP: Shares scoped memory across sessions and team members, maintaining continuity of architectural decisions, prompt history, and recent iterations.

CloudWatch MCP: Supplies log output to the AI for debugging and incident triage — essential during the Debugging workflow.

SonarQube MCP: Feeds static analysis results so the AI can refactor code to eliminate bugs, smells, or duplication.

Confluence MCP: Integrates architecture and business documentation to inform decisions around domain logic, constraints, and requirements.

MCPs are just one part of the context curation puzzle. Engineers also need to deliberately craft the model’s working memory for each session. That includes:

Implementation Plans: Markdown files that define goals, steps, constraints, and trade-offs, acting as an onboarding doc for the AI agent.

Codebase Files: Selectively attaching relevant parts of the codebase (like entry points, shared utilities, schemas, or config files) so the AI operates with architectural awareness.

Console Logs or Test Output: Including runtime details helps the AI understand execution behavior and suggest context-aware fixes.

Instructions or TODO Blocks: GitHub Copilot supports markdown-based instruction files and inline TODO comments to guide its code generation. These instructions act like lightweight tickets embedded directly in the repo. For example, an INSTRUCTIONS.md might define architectural rules, file responsibilities, or interface contracts. Within code files, TODOs like // TODO: replace mock implementation with production-ready logic act as scoped prompts that Copilot can act on directly. Used consistently, these become in-repo signals that align the agent’s output with team expectations and design intent, markers inside the code to direct the model towards a specific change or design pattern.

Effective context curation is an engineering discipline. Give too little, and the agent hallucinates. Give too much, and it loses focus or runs out of space in the LLM context window. The best results come from curating the smallest possible set of high-signal resources. When you treat context as a design artifact the AI becomes a more reliable collaborator.

The Role of Workflows

We embedded AI in our delivery pipeline using a set of core workflows. You can explore each one in more detail in our handbook, but here is the high-level overview:

Workflow Purpose

Spec-First Write a scoped prompt plan before coding

Exploratory Understand unfamiliar codebases with AI help

Memory Bank Maintain continuity across sessions and team members

TDD Test-first with AI-generated test coverage

Debugging Use AI to triage, investigate, and fix bugs

Visual Feedback Align AI output with Figma and screenshots

Auto Validations Run tools like SonarQube, ESLint post-output

In our experience, these workflows are not just productivity boosters; they’re the foundation for scaling AI-assisted development across teams. They provide consistency, repeatability, and shared mental models. We believe this approach is especially critical in enterprise environments, where large engineering organizations require predictable output, quality assurance, and alignment with established standards. Agentic workflows bring just enough structure to harness AI’s strengths without sacrificing accountability or control.

Building a Validation Loop

We use validation tools like SonarQube, ESLint, Vitest, and Prettier to provide automatic feedback to the AI. For example, if SonarQube flags duplication, we prompt the AI to refactor accordingly. This creates a tight loop where validation tools become coaching signals.

Some tools, like GitHub Copilot, can even collect log output from the terminal running tests or executing scripts. This allows the AI to observe the outcome of code execution, analyze stack traces or test failures, and automatically attempt fixes. One common approach is asking the AI to run a test suite, interpret the failed test results, make corrections, and repeat this process until all tests pass.

Lizard, a tool that calculates code complexity metrics, is another useful validation tool. Engineers can instruct the AI to execute Lizard against the codebase. When the output indicates that a function exceeds the defined complexity threshold (typically 10), the AI is prompted to refactor that function into smaller, more maintainable blocks. This method forces the AI to act on specific, measurable quality signals and improves overall code readability.

In this setup, engineers can let the AI operate in a closed loop for several iterations. Once the AI produces clean validation results — whether through passing tests, static analysis, or complexity reduction — the human engineer steps back in to review the result. This combination of automation and oversight speeds up bug fixing while maintaining accountability.

But here’s the thing: the team needs to actually understand what the AI built. If you’re just rubber-stamping AI changes without really getting what they do, you’re setting yourself up for trouble. The review step isn’t just a checkbox — it’s where you make sure the code actually makes sense for your system.

Why Human Oversight Still Matters

No AI is accountable for what goes to production. Engineers are. AI doesn’t own architectural tradeoffs, domain-specific reasoning, or security assumptions. Human-in-the-loop is the safety mechanism.

Humans are the only ones who can recognize when business context changes, when a feature should be cut for scope, or when a security concern outweighs performance gains. AI can assist in code generation, validation, and even debugging — but it lacks the experience, judgment, and ownership required to make trade-offs that affect users, stakeholders, or the long-term health of the system.

Human engineers are also responsible for reviewing the AI’s decisions, ensuring they meet legal, ethical, and architectural constraints. This is especially critical in regulated industries, or when dealing with sensitive data. Without a human to enforce these standards, the risk of silent failure increases dramatically.

Agentic coding isn’t about handing off responsibility, it’s about amplifying good engineering judgment.

Where People Fail (And Blame the AI)

Common mistakes include vague prompts, lack of planning, poor context, and not validating output. While LLMs have inherent limitations — they hallucinate, make incorrect assumptions, and produce plausible-sounding but wrong outputs even with good inputs — engineering discipline significantly increases the reliability of results.

A prompt like “make this better” tells the AI nothing about what “better” means — faster? more readable? safer? Without clear constraints and context, LLMs default to producing generic solutions that may not align with your actual needs. The goal isn’t to eliminate all AI errors, but to create workflows that catch and correct them systematically.

Lack of validation is another key failure mode. Trusting the first output, skipping tests, or ignoring code quality tools defeats the point of the feedback loop. AI agents need boundaries and coaching signals or, without them, they can drift into plausible nonsense.

Using these tools effectively also means understanding their current limitations. AI models work best with well-represented programming languages like JavaScript, TypeScript, and Python (to name a few examples). However, teams working in specialized domains may see limited results even with popular languages.

A Closer Look at Our Tooling

GitHub Copilot played a key role in our experiment, especially when paired with instruction files, validation scripts, and Model Context Providers (MCPs).

What made GitHub Copilot viable for agentic workflows wasn’t just its autocomplete or inline chat. It was how we surrounded it with structure and feedback mechanisms:

Instruction Files

Instruction files served as the AI’s map. These markdown-based guides detailed the implementation plan, scoped tasks, architectural constraints, naming conventions, and even file-level goals. When placed inside the repo, they gave GitHub Copilot context it otherwise wouldn’t have. Unlike ad-hoc prompts, these files were written with intent and discipline, and became a critical part of the repo’s knowledge layer.

Validation Scripts

We paired Copilot with post-generation validation tools like ESLint, Vitest, Horusec, and SonarQube. These weren’t just guardrails but closers of the loop. When Copilot generated code that violated rules or failed tests, engineers would reframe the prompt with validation results as input. This prompted Copilot to self-correct. It’s how we turned passive AI output into an iterative feedback process.

Copilot + Workflows = Impact

Used this way, GitHub Copilot became more than a helper. It became a participant in our structured workflows:

In Spec-First, Copilot consumed instruction files to scaffold code.

In Debugging, it analyzed logs fed via MCP and proposed targeted fixes.

In TDD, it generated unit tests from requirements, then refactored code until tests passed.

In Visual Feedback, it aligned components with Figma via the design MCP.

By aligning Copilot with prompts, plans, validation, and context, we moved from “code completion” to code collaboration.

So no — GitHub Copilot isn’t enough on its own. But when embedded inside a disciplined workflow, with context and feedback flowing in both directions, it’s a capable agent. One that gets better the more structured your engineering practice becomes.

Final Advice: How to Actually Start

The path to agentic coding begins with a single, well-chosen task. Pick something atomic that you understand deeply — a function you need to refactor, a component you need to build, or a bug you need to fix. Before touching any AI tool, write an implementation plan that defines your goals, constraints, and step-by-step approach.

Once you have your plan, start experimenting with the workflows we’ve outlined. Try Spec-First to scaffold your implementation, then use Auto Validations to create feedback loops. If you’re working with UI, explore Visual Feedback with design tools. As you gain confidence, introduce Model Context Providers to give your AI agent richer context about your codebase and requirements. Always keep in mind that the quality of AI output depends on the quality of the task setup and the availability of feedback.

Treat each interaction as both an experiment and a learning opportunity. Validate every output as if it came from a junior developer. Most importantly, remember that this isn’t about replacing your engineering judgment; it’s about amplifying it. The most successful engineers in our experiments were the ones who treated the AI as a collaborator — not a magician.

What we’ve described isn’t just a productivity technique — it’s a fundamental shift in how we think about human creativity and machine capability. When engineers become high-level problem solvers, supported by AI agents within well-defined boundaries, we unlock new possibilities for what software teams can accomplish. Welcome to the next era of software development.

October 23, 2025 12:00 AM

	Agentic Coding	Vibe Coding
Planning	Structured implementation plan	None or minimal upfront thinking
Prompting	Scoped, intentional, reusable	Loose, improvisational, trial-and-error
Context	Deliberately curated via files/MCPs	Often missing or overloaded
Validation	Treated as a critical engineering step	Frequently skipped or shallow
Output Quality	High, repeatable, aligned to standards	Inconsistent, often needs full rewrite
Team Scalability	Enables leaner squads with high output	Prone to technical debt and drift

Workflow	Purpose
Spec-First	Write a scoped prompt plan before coding
Exploratory	Understand unfamiliar codebases with AI help
Memory Bank	Maintain continuity across sessions and team members
TDD	Test-first with AI-generated test coverage
Debugging	Use AI to triage, investigate, and fix bugs
Visual Feedback	Align AI output with Figma and screenshots
Auto Validations	Run tools like SonarQube, ESLint post-output

October 21, 2025

Abhinav Sarkar

A Fast Bytecode VM for Arithmetic: The Virtual Machine
In this series of posts, we write a fast bytecode compiler and a virtual machine for arithmetic in Haskell. We explore the following topics:

Parsing arithmetic expressions to Abstract Syntax Trees (ASTs).

Unit testing for our parser.

Interpreting ASTs.

Compiling ASTs to bytecode.

Disassembling and decompiling bytecode.

Unit testing for our compiler.

Property-based testing for our compiler.

Efficiently executing bytecode in a virtual machine (VM).

Unit testing and property-based testing for our VM.

Benchmarking our code to see how the different passes perform.

All the while keeping an eye on performance.

In this final post, we write the virtual machine that executes our bytecode, and benchmark it.

This post was originally published on abhinavsarkar.net.

This post is part of the series: A Fast Bytecode VM for Arithmetic.

The Parser

The Compiler

The Virtual Machine (you are here)

Contents
Introduction
Testing the Compiler
The Virtual Machine
Testing the VM
Benchmarking the VM
Benchmarking Against C
Future Directions
Conclusion

Introduction

Bytecode Virtual Machines (VMs) are known to be faster than AST-walking interpreters. That’s why many real-world programming languages these days are implemented with bytecode VMs, for example, Java, Python, PHP, and Raku. The reason is partially, the flat and compact nature of bytecode itself. But VMs also have a few other tricks up their sleeves that make them highly performant. In this post, we write a VM for our arithmetic expression language, and explore some of these performance tricks.

But first, we need to finish a pending task.

Testing the Compiler

We wrote some unit tests for our compiler in the last post, but unit tests cover only the cases we can think of. A compiler has to deal with any input, and with just unit tests we cannot be sure of its correctness.

To test our compiler and other components for correctness, we use the QuickCheck library. QuickCheck is a Property-based Testing framework. The key idea of property-based testing is to write properties of our code that hold true for any input, and then to automatically generate a large number of arbitrary inputs and make sure that the properties are indeed true for them¹ ². Since we are writing an arithmetic expression parser/compiler/VM, we generate arbitrary expression ASTs, and use them to assert certain invariants of our program.

With QuickCheck, we write generators for the inputs for our tests. These generators are composable just like parser combinators are. We use the library provided generators to write small generators that we combine to create larger ones. Let’s start:
numGen :: Q.Gen Expr
numGen = Num <$> Q.arbitrary

varGen :: Set.Set Ident -> Q.Gen Expr
varGen vars = Var <$> Q.elements (Set.toList vars)
 
identGen :: Q.Gen Ident
identGen =
  mkIdent
    <$> ( (:)
            <$> Q.elements lower
            <*> Q.resize 5 (Q.listOf1 $ Q.elements validChars)
        ) `Q.suchThat` (not . isReservedKeyword . BSC.pack)
  where
    lower = ['a' .. 'z']
    validChars = lower <> ['A' .. 'Z']
ArithVMLib.hs
First come the basic generators:

numGen generates number expressions by using QuickCheck’s built-in arbitrary function.

varGen generates variable expressions by choosing from the set of passed valid variable names.

identGen generates valid identifiers from combinations of letters a—z and A—Z, and discarding ones that are reserved keywords.

Moving on to composite generators:
binOpGen :: Set.Set Ident -> Int -> Q.Gen Expr
binOpGen vars size =
  BinOp
    <$> Q.chooseEnum (Add, Div)
    <*> exprGen vars (size `div` 2)
    <*> exprGen vars (size `div` 2)

letGen :: Set.Set Ident -> Int -> Q.Gen Expr
letGen vars size = do
  x <- identGen
  let vars' = Set.insert x vars
  Let x <$> exprGen vars (size `div` 2) <*> exprGen vars' (size `div` 2)

exprGen :: Set.Set Ident -> Int -> Q.Gen Expr
exprGen vars size
  | size < 5 = Q.frequency [(4, Q.oneof baseGens), (1, Q.oneof compositeGens)]
  | otherwise = Q.frequency [(1, Q.oneof baseGens), (4, Q.oneof compositeGens)]
  where
    baseGens = numGen : [varGen vars | not $ Set.null vars]
    compositeGens = [binOpGen vars size, letGen vars size]
ArithVMLib.hs
binOpGen generates binary expressions with arbitrary binary operations. It recursively calls exprGen to generate the operands. The size parameter controls the complexity of the generated expressions, and we half the size of operands (and so on recursively) so that we don’t end up with infinitely large expressions.

letGen generates Let expressions by generating an identifier, and then generating the assignment and body expressions recursively. We do the same trick of halving sizes here as well. Notice that the assignment is generated with the passed variable names in scope, whereas the body is generated with the new identifier added to the scope.

exprGen uses the above generators to generate all kinds of expressions. At smaller sizes, it prefers to generate base expressions, while at larger sizes, it prefers composite ones. Due to the careful recursive halving of size in composite generators, we end up with expressions of finite sizes.

Finally, we have some instances of QuickCheck’s Arbitrary type class to tie everything together:
instance Q.Arbitrary Expr where
  arbitrary = Q.sized $ exprGen Set.empty
  shrink = Q.genericShrink

instance Q.Arbitrary Ident where
  arbitrary = identGen

instance Q.Arbitrary Op where
  arbitrary = Q.chooseEnum (Add, Div)
ArithVMLib.hs
We can apply them in GHCi:
> :set -XTypeApplications
> Q.sample $ Q.arbitrary @Expr
0
((let jgSg = 2 in (-2 - -2)) + -2)
2
(0 / 1)
(-11 / -13)
((let kpuS = 10 in 31) + (let jChmZV = -12 in jChmZV))
((54 * -55) * (let ohLSk = 29 in -45))
(-102 - (-119 * -125))
(-234 - (32 / -217))
(let kVrB = (-261 * 238) in ((let qdz = 228 in 347) + 18))
(let uMMdXH = ((let ePUi = 842 in ePUi) - (let zrkM = (let vwH = ((9 + -987) / -487) in (let ylKowr = vwH in vwH)) in zrkM)) in (((uMMdXH / -836) / uMMdXH) - (let qkK = uMMdXH in qkK)))
Notice that the generated samples increase in complexity. With the generators in place, we define our properties next. Let’s test our parser first:
prop_print_ast_then_parse_returns_same_ast :: Spec
prop_print_ast_then_parse_returns_same_ast =
  prop "Property: Print AST then parse returns same AST" $ \expr ->
    parse (BSC.pack $ show expr) == Right expr
ArithVMSpec.hs
This property is a simple round-trip test for the parser and printer: we parse the string representation of a generated expression, and assert that it gives back the same expression.

The second property is a more involved round-trip test for the compiler and decompiler:
prop_disassemble_bytecode_then_decompile_then_compile_returns_same_bytecode :: Spec
prop_disassemble_bytecode_then_decompile_then_compile_returns_same_bytecode =
  prop ( "Property: Disassemble bytecode then decompile then compile"
         <> " returns same bytecode" ) $ \expr ->
    case compile (sizedExpr expr) of
      Left _ -> Q.discard
      Right bytecode ->
        (disassemble bytecode >>= (decompile >>> fmap sizedExpr) >>= compile)
          == Right bytecode
ArithVMSpec.hs
This asserts that compiling an expression, then disassembling and decompiling it, and finally compiling it again should result in the original bytecode³.

This requires a helper function to get the size of an expression:
sizedExpr :: Expr -> SizedExpr
sizedExpr expr = case expr of
  Num _ -> (expr, 3)
  Var _ -> (expr, 2)
  BinOp _ a b -> (expr, snd (sizedExpr a) + snd (sizedExpr b) + 1)
  Let _ a b -> (expr, snd (sizedExpr a) + snd (sizedExpr b) + 1)
ArithVMLib.hs
We run these tests in a later section. This ends our short detour.

The Virtual Machine

Now for the main event: the virtual machine. Our VM is a stack-based machine that operates on a stack of values and executes the compiled bytecode. Our goal is to be as fast as possible. For a quick reminder, these are our Opcodes:
data Opcode
  = OPush !Int16        -- 0
  | OGet !Word8         -- 1
  | OSwapPop            -- 2
  | OAdd                -- 3
  | OSub                -- 4
  | OMul                -- 5
  | ODiv                -- 6
  deriving (Show, Read, Eq, Generic)
ArithVMLib.hs
And now, the heart of the VM:
interpretBytecode :: Bytecode -> Result Int16
interpretBytecode = interpretBytecode' defaultStackSize

interpretBytecode' :: Int -> Bytecode -> Result Int16
interpretBytecode' stackSize bytecode = runST $ runExceptT $ do
  stack <- PA.newPinnedPrimArray stackSize
  sp <- go 0 0 stack
  checkStack InterpretBytecode stackSize sp
  PA.readPrimArray stack 0
  where
    !size = BS.length bytecode

    go sp ip _ | ip == size = pure sp
    go !sp !ip stack = do
      let opcode = readInstr bytecode ip
      if
        | sp >= stackSize -> throwInterpretError "Stack overflow"
        | sp < 0 -> throwInterpretError "Stack underflow"
        | sp < 2 && opcode >= 2 -> throwInsufficientElementsError
        | opcode == 0 && ip + 2 >= size -> throwIPOOBError $ ip + 2
        | opcode == 1 && ip + 1 >= size -> throwIPOOBError $ ip + 1
        | otherwise -> case opcode of
            0 -> do                 -- OPush
              PA.writePrimArray stack sp $ readInstrArgInt16 bytecode ip
              go (sp + 1) (ip + 3) stack
            1 -> do                 -- OGet
              let i = fromIntegral $ readInstrArgWord8 bytecode ip
              if i < sp
                then do
                  PA.copyMutablePrimArray stack sp stack i 1
                  go (sp + 1) (ip + 2) stack
                else throwInterpretError $
                  "Stack index " <> show i <> " out of bound " <> show (sp - 1)
            2 -> do                 -- OSwapPop
              PA.copyMutablePrimArray stack (sp - 2) stack (sp - 1) 1
              go (sp - 1) (ip + 1) stack
            3 -> interpretBinOp (+) -- OAdd
            4 -> interpretBinOp (-) -- OSub
            5 -> interpretBinOp (*) -- OMul
            6 -> do                 -- ODiv
              b <- PA.readPrimArray stack $ sp - 1
              a <- PA.readPrimArray stack $ sp - 2
              when (b == 0) $ throwInterpretError "Division by zero"
              when (b == (-1) && a == minBound) $
                throwInterpretError "Arithmetic overflow"
              PA.writePrimArray stack (sp - 2) $ a `div` b
              go (sp - 1) (ip + 1) stack
            n -> throwInterpretError $
              "Invalid bytecode: " <> show n <> " at: " <> show ip
      where
        interpretBinOp op = do
          b <- PA.readPrimArray stack $ sp - 1
          a <- PA.readPrimArray stack $ sp - 2
          PA.writePrimArray stack (sp - 2) $ a `op` b
          go (sp - 1) (ip + 1) stack
        {-# INLINE interpretBinOp #-}

        throwIPOOBError ip = throwInterpretError $
          "Instruction index " <> show ip <> " out of bound " <> show (size - 1)

        throwInsufficientElementsError =
          throwInterpretError "Not enough elements to execute operation"

        throwInterpretError = throwError . Error InterpretBytecode
ArithVMLib.hs
The interpretBytecode' function is where the action happens. It is way more complex than interpretAST, but the complexity has a reason, namely performance.

interpretBytecode' runs inside the ST monad wrapped with the ExceptT monad transformer. ST monad lets us use mutable data structures locally while ensuring the function remains externally pure. ExceptT monad transformer adds support for throwing and propagating errors in a pure manner.

We use PrimArray for our stack, which is a mutable array of unboxed primitive types, in our case an array of Int16 values. Using a mutable unboxed array is much faster than using an immutable and/or boxed one like Seq or Vector due to reduced allocation and/or pointer chasing.

The core of the VM is the go function, a tight, tail-recursive loop that GHC compiles into an efficient machine loop, as we see later. It takes the stack pointer (sp), instruction pointer (ip)⁴, and the stack as arguments.

At the top of each loop, a block of guard clauses checks for stack overflow, underflow, and other error conditions before branching on the current opcode. Placing these checks at the top instead of inside the opcode cases is a deliberate choice. This may make the code slightly harder to understand, but it significantly improves the performance of the loop by moving all branching at the beginning of the loop, resulting in code that is more friendly to the CPU’s Branch Predictor. Also notice how we reduce the number of checks by working with a range of opcodes at once in the opcode >= 2 guard. The checks are also sorted so as to be most performant, guided by profiling and benchmarking⁵.

The handling of each opcode is actually pretty straightforward. We use different PrimArray specific operations to read and write to the stack, while taking care of doing the required bound and arithmetic checks. We also use the readInstr* functions that we wrote earlier.

After carrying out each operation, we reenter the loop by calling it tail-recursively with the right stack and instruction pointers. Finally, we make sure that the execution terminated correctly by checking the state of the stack, and return its first element.

Peeking Under the Hood: GHC Core

We see later that the VM is quite fast, but how does GHC achieve this performance? To see the magic, we can look at GHC’s intermediate language: Core. Core is a simpler functional language than Haskell to which GHC compiles Haskell. The simpler nature of Core makes it easier for GHC to optimize it, and compile it further. We can get the Core code for a program by compiling with the GHC option -ddump-simpl.

The actual Core code for our VM is too verbose to show here, but here is a simplified C-like pseudo-code version of our go loop:
$wgo (stack_addr, ip, sp) {
  if (ip == bytecode_size) {
    return sp;
  }
  if (sp >= stack_size) {
    throw "Stack Overflow";
  }
  if (sp < 0) {
    throw "Stack Underflow";
  }

  opcode = read_byte_at(bytecode_addr, ip);
  // ... other checks ...

  switch (opcode) {
    case 0: // OPush
      val = read_int16_at(bytecode_addr, ip + 1);
      write_int16_at(stack_addr, sp, val);
      jump $wgo(stack_addr, ip + 3, sp + 1);

    case 3: // OAdd
      val2 = read_int16_at(stack_addr, sp - 1);
      val1 = read_int16_at(stack_addr, sp - 2);
      write_int16_at(stack_addr, sp - 2, val1 + val2);
      jump $wgo(stack_addr, ip + 1, sp - 1);

    // ... other cases ...
  }
}
A few key optimizations are worth pointing out:

The loop: The tail-recursive go function is compiled into a proper loop. The jump $wgo(...) instruction is effectively a goto, which means there’s no function call overhead for each iteration of the VM loop.

Unboxing: The Core code is full of primitive, unboxed types like Int#, Addr#, and Word#, and operations on them. These are raw machine integers and memory addresses, not boxed Haskell objects. This means operations on them are as fast as they would be in C. The stack operations are not function calls on a PrimArray instance, but primitive memory reads and writes on a raw memory address stack_addr.

Inlining: The interpretBinOp helper function is completely inlined into the main loop. For OAdd, the code for reading two values, adding them, and writing the result is laid out inline, and works on unboxed values and array address.

In short, GHC has turned our high-level, declarative Haskell code into a low-level loop that looks remarkably like one we would write in C. We get the safety and expressiveness of Haskell, while GHC does the heavy lifting to produce highly optimized code. It’s the best of both worlds!

Testing the VM

We must test the VM to make sure it works correctly⁶. We reuse the success and failure tests for the AST interpreter, as the bytecode interpreter should yield the same result:
bytecodeInterpreterSpec :: Spec
bytecodeInterpreterSpec = describe "Bytecode interpreter" $ do
  forM_ astInterpreterSuccessTests $ \(input, result) ->
    it ("interprets: \"" <> BSC.unpack input <> "\"") $ do
      parseCompileInterpret input `shouldBe` Right result

  forM_ errorTests $ \(input, err) ->
    it ("fails for: \"" <> BSC.unpack input <> "\"") $ do
      parseCompileInterpret input `shouldSatisfy` \case
        Left (Error InterpretBytecode msg) | err == msg -> True
        _ -> False
  where
    parseCompileInterpret = parseSized >=> compile >=> interpretBytecode' 7

    errorTests =
      [ ("1/0", "Division by zero"),
        ("-32768 / -1", "Arithmetic overflow"),
        ( "let a = 0 in let b = 0 in let c = 0 in let d = 0 in let e = 0 in "
            <> "let f = 0 in a + b + c + d + e + f",
          "Stack overflow"
        )
      ]

prop_interpret_ast_returns_same_result_as_compile_assemble_then_interpret_bytecode ::
  Spec
prop_interpret_ast_returns_same_result_as_compile_assemble_then_interpret_bytecode =
  prop ( "Property: Interpret AST returns same result as compile"
          <> " then interpret bytecode" ) $ \expr ->
    interpretAST expr == (compile (sizedExpr expr) >>= interpretBytecode)
ArithVMSpec.hs
We also add a property-based test this time: for any given expression, interpreting the AST should produce the same result as compiling it to bytecode and executing it in the VM⁷.

Our test suite is complete now:
main :: IO ()
main = hspec $ do
  parserSpec
  astInterpreterSpec
  compilerSpec
  prop_print_ast_then_parse_returns_same_ast
  prop_disassemble_bytecode_then_decompile_then_compile_returns_same_bytecode
  bytecodeInterpreterSpec
  prop_interpret_ast_returns_same_result_as_compile_assemble_then_interpret_bytecode
ArithVMSpec.hs
And finally, we run all tests together:
Test run
$ cabal test -O2
Running 1 test suites...
Test suite specs: RUNNING...

Parser
  parses: "1 + 2 - 3 * 4 + 5 / 6 / 0 + 1" [✔]
  parses: "1+2-3*4+5/6/0+1" [✔]
  parses: "1 + -1" [✔]
  parses: "let x = 4 in x + 1" [✔]
  parses: "let x=4in x+1" [✔]
  parses: "let x = 4 in let y = 5 in x + y" [✔]
  parses: "let x = 4 in let y = 5 in x + let z = y in z * z" [✔]
  parses: "let x = 4 in (let y = 5 in x + 1) + let z = 2 in z * z" [✔]
  parses: "let x=4in 2+let y=x-5in x+let z=y+1in z/2" [✔]
  parses: "let x = (let y = 3 in y + y) in x * 3" [✔]
  parses: "let x = let y = 3 in y + y in x * 3" [✔]
  parses: "let x = let y = 1 + let z = 2 in z * z in y + 1 in x * 3" [✔]
  fails for: "" [✔]
  fails for: "1 +" [✔]
  fails for: "1 & 1" [✔]
  fails for: "1 + 1 & 1" [✔]
  fails for: "1 & 1 + 1" [✔]
  fails for: "(" [✔]
  fails for: "(1" [✔]
  fails for: "(1 + " [✔]
  fails for: "(1 + 2" [✔]
  fails for: "(1 + 2}" [✔]
  fails for: "66666" [✔]
  fails for: "-x" [✔]
  fails for: "let 1" [✔]
  fails for: "let x = 1 in " [✔]
  fails for: "let let = 1 in 1" [✔]
  fails for: "let x = 1 in in" [✔]
  fails for: "let x=1 inx" [✔]
  fails for: "letx = 1 in x" [✔]
  fails for: "let x ~ 1 in x" [✔]
  fails for: "let x = 1 & 2 in x" [✔]
  fails for: "let x = 1 inx" [✔]
  fails for: "let x = 1 in x +" [✔]
  fails for: "let x = 1 in x in" [✔]
  fails for: "let x = let x = 1 in x" [✔]
AST interpreter
  interprets: "1" [✔]
  interprets: "1 + 2 - 3 * 4 + 5 / 6 / 1 + 1" [✔]
  interprets: "1 + (2 - 3) * 4 + 5 / 6 / (1 + 1)" [✔]
  interprets: "1 + -1" [✔]
  interprets: "1 * -1" [✔]
  interprets: "let x = 4 in x + 1" [✔]
  interprets: "let x = 4 in let x = x + 1 in x + 2" [✔]
  interprets: "let x = 4 in let y = 5 in x + y" [✔]
  interprets: "let x = 4 in let y = 5 in x + let z = y in z * z" [✔]
  interprets: "let x = 4 in (let y = 5 in x + y) + let z = 2 in z * z" [✔]
  interprets: "let x = let y = 3 in y + y in x * 3" [✔]
  interprets: "let x = let y = 1 + let z = 2 in z * z in y + 1 in x * 3" [✔]
  fails for: "x" [✔]
  fails for: "let x = 4 in y + 1" [✔]
  fails for: "let x = y + 1 in x" [✔]
  fails for: "let x = x + 1 in x" [✔]
  fails for: "1/0" [✔]
  fails for: "-32768 / -1" [✔]
Compiler
  compiles: "1" [✔]
  compiles: "1 + 2 - 3 * 4 + 5 / 6 / 1 + 1" [✔]
  compiles: "1 + (2 - 3) * 4 + 5 / 6 / (1 + 1)" [✔]
  compiles: "let x = 4 in x + 1" [✔]
  compiles: "let x = 4 in let y = 5 in x + y" [✔]
  compiles: "let x = 4 in let x = x + 1 in x + 2" [✔]
  compiles: "let x = let y = 3 in y + y in x * 3" [✔]
  compiles: "let x = let y = 1 + let z = 2 in z * z in y + 1 in x * 3" [✔]
  compiles: "1/0" [✔]
  compiles: "-32768 / -1" [✔]
  fails for: "x" [✔]
  fails for: "let x = 4 in y + 1" [✔]
  fails for: "let x = y + 1 in x" [✔]
  fails for: "let x = x + 1 in x" [✔]
  fails for: "let x = 4 in let y = 1 in let z = 2 in y + x" [✔]
  fails for: "let x = 4 in let y = 5 in x + let z = y in z * z" [✔]
  fails for: "let a = 0 in let b = 0 in let c = 0 in let d = 0 in d" [✔]
  fails for greater sized expr [✔]
  fails for lesser sized expr [✔]
Property: Print AST then parse returns same AST [✔]
  +++ OK, passed 100 tests.
Property: Disassemble bytecode then decompile then compile returns same bytecode [✔]
  +++ OK, passed 100 tests.
Bytecode interpreter
  interprets: "1" [✔]
  interprets: "1 + 2 - 3 * 4 + 5 / 6 / 1 + 1" [✔]
  interprets: "1 + (2 - 3) * 4 + 5 / 6 / (1 + 1)" [✔]
  interprets: "1 + -1" [✔]
  interprets: "1 * -1" [✔]
  interprets: "let x = 4 in x + 1" [✔]
  interprets: "let x = 4 in let x = x + 1 in x + 2" [✔]
  interprets: "let x = 4 in let y = 5 in x + y" [✔]
  interprets: "let x = 4 in let y = 5 in x + let z = y in z * z" [✔]
  interprets: "let x = 4 in (let y = 5 in x + y) + let z = 2 in z * z" [✔]
  interprets: "let x = let y = 3 in y + y in x * 3" [✔]
  interprets: "let x = let y = 1 + let z = 2 in z * z in y + 1 in x * 3" [✔]
  fails for: "1/0" [✔]
  fails for: "-32768 / -1" [✔]
  fails for: "let a = 0 in let b = 0 in let c = 0 in let d = 0 in let e = 0 in let f = 0 in a + b + c + d + e + f" [✔]
Property: Interpret AST returns same result as compile then interpret bytecode [✔]
  +++ OK, passed 100 tests.

Finished in 0.0166 seconds
91 examples, 0 failures
Test suite specs: PASS
Happily, all tests pass.

Benchmarking the VM

Now for the fun part: benchmarking. We use the criterion library to benchmark the code.
{-# LANGUAGE GHC2021 #-}

module Main where

import ArithVMLib
import Control.Arrow ((>>>))
import Control.DeepSeq (force)
import Control.Exception (evaluate)
import Control.Monad ((>=>))
import Criterion
import Criterion.Main
import Criterion.Main.Options
import Criterion.Types
import Data.ByteString qualified as BS

main :: IO ()
main = do
  code <- BS.getContents >>= evaluate . force
  let Right ast = force $ parseSized code
      Right bytecode = force $ compile ast
      Right program = force $ disassemble bytecode
  runMode
    ( Run
        (defaultConfig {reportFile = Just "benchmark.html"})
        Prefix
        []
    )
    [ bgroup
        "pass"
        [ bench "parse" $ whnf (parseSized >>> force) code,
          bench "compile" $ whnf (compile >>> force) ast,
          bench "disassemble" $ whnf (disassemble >>> force) bytecode,
          bench "decompile" $ whnf (decompile >>> force) program
        ],
      bgroup
        "interpret"
        [ bench "ast" $ whnf (fst >>> interpretAST >>> force) ast,
          bench "bytecode" $ whnf (interpretBytecode >>> force) bytecode
        ],
      bgroup
        "run"
        [ bench "ast" $
            whnf (parse >=> interpretAST >>> force) code,
          bench "bytecode" $
            whnf (parseSized >=> compile >=> interpretBytecode >>> force) code
        ]
    ]
ArithVMBench.hs
We have a benchmark suite to measure the performance of each pass, the two interpreters (AST and bytecode), and the full end-to-end runs⁸. We compile with the following GHC options:
 -O2
 -fllvm
 -funbox-strict-fields
 -funfolding-use-threshold=16
 -threaded
 -rtsopts
 -with-rtsopts=-N2
Benchmark run
$ cat benchmark.tb | cabal bench
Running 1 benchmarks...
Benchmark bench: RUNNING...
benchmarking pass/parse
time                 581.1 ms   (566.7 ms .. 594.3 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 573.5 ms   (570.4 ms .. 577.1 ms)
std dev              3.948 ms   (1.359 ms .. 5.424 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking pass/compile
time                 51.00 ms   (50.48 ms .. 52.54 ms)
                     0.998 R²   (0.995 R² .. 1.000 R²)
mean                 50.82 ms   (50.57 ms .. 51.87 ms)
std dev              810.9 μs   (185.8 μs .. 1.509 ms)

benchmarking pass/disassemble
time                 160.3 ms   (154.7 ms .. 166.5 ms)
                     0.998 R²   (0.990 R² .. 1.000 R²)
mean                 155.8 ms   (150.0 ms .. 160.5 ms)
std dev              7.642 ms   (4.255 ms .. 11.76 ms)
variance introduced by outliers: 12% (moderately inflated)

benchmarking pass/decompile
time                 495.1 ms   (454.0 ms .. 523.7 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 506.5 ms   (495.0 ms .. 525.1 ms)
std dev              17.73 ms   (2.167 ms .. 22.59 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking interpret/ast
time                 49.57 ms   (49.53 ms .. 49.61 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 49.80 ms   (49.71 ms .. 50.07 ms)
std dev              255.9 μs   (124.2 μs .. 433.9 μs)

benchmarking interpret/bytecode
time                 15.83 ms   (15.79 ms .. 15.88 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 15.79 ms   (15.75 ms .. 15.83 ms)
std dev              96.85 μs   (70.30 μs .. 140.9 μs)

benchmarking run/ast
time                 628.0 ms   (626.7 ms .. 630.5 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 617.2 ms   (610.2 ms .. 621.0 ms)
std dev              6.679 ms   (1.899 ms .. 8.802 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking run/bytecode
time                 643.8 ms   (632.5 ms .. 655.3 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 638.3 ms   (635.8 ms .. 641.2 ms)
std dev              2.981 ms   (1.292 ms .. 4.153 ms)
variance introduced by outliers: 19% (moderately inflated)

Benchmark bench: FINISH
Here are the results in a more digestible format:

Benchmark Mean Time (ms)

pass/parse 573.5

pass/compile 50.8

pass/disassemble 155.8

pass/decompile 506.5

interpret/ast 49.8

interpret/bytecode 15.8

run/ast 617.2

run/bytecode 638.3

Here are the times in a chart (smaller is better):

<noscript></noscript>
Benchmark times

Let’s break down these numbers:

Parsing and decompiling are slow: At ~573ms and ~506ms, these are by far the slowest passes. This isn’t surprising. Parsing with parser combinators has a known trade-off of expressiveness for performance. Decompiling is a shift-reduce parser that reconstructs an AST from a linear stream of opcodes, and we didn’t spend any time optimizing it.

Compilation is fast: At ~51ms, compilation is an order of magnitude faster than parsing. This is thanks to pre-calculating the bytecode size during the parsing phase, which allows us to pre-allocate a single ByteString and fill it in with low-level pointer operations.

Bytecode interpretation is blazingly fast: At just ~16ms, our VM’s interpreter is over 3 times faster than the AST interpreter (~50ms), which proves our belief that bytecode interpreters are faster.

End-to-end runs: Interestingly, the total time to run via bytecode (~638ms) is slightly slower than the run via AST (~617ms). This is because the cost of parsing, compiling, and then interpreting is higher than just parsing and interpreting. The real win for a bytecode VM comes when you compile once and run many times, amortizing the initial compilation cost.

I can already see readers thinking, “Sure that’s fast, but is it faster than C/Rust/Zig/my favourite language?” Let’s find out.

Benchmarking Against C

To get a better sense of our VM’s performance, I rewrote it in C.

The C implementation is a classic manual approach: a hand-written tokenizer and recursive-descent parser, structs with pointers for the AST, and manual memory management and error propagation. The VM is a simple while loop with a switch statement for dispatching opcodes⁹.

To compare our Haskell code against the C code, we need to write the last Haskell module, the CLI app that we demonstrated in the first post:
ArithVMApp.hs
{-# LANGUAGE GHC2021 #-}

module Main where

import ArithVMLib
import Control.Arrow ((>>>))
import Control.Monad ((>=>))
import Data.ByteString qualified as BS
import Data.Foldable (toList)
import Data.Set qualified as Set
import Data.String (IsString (fromString))
import Options.Applicative qualified as O
import System.Exit (exitFailure)
import System.IO qualified as IO
import Test.QuickCheck qualified as Q
import Text.Pretty.Simple qualified as PS

data Command
  = RunPass Pass Input
  | Run Input
  | Generate Int
  deriving (Show, Eq)

data Input = InputFP FilePath | InputStdin deriving (Show, Eq)

instance IsString Input where
  fromString = \case
    "-" -> InputStdin
    fp -> InputFP fp

commandParser :: IO Command
commandParser =
  O.customExecParser (O.prefs $ O.showHelpOnError <> O.showHelpOnEmpty)
    . O.info (O.hsubparser (mconcat subcommandParsers) O.<**> O.helper)
    $ O.fullDesc <> O.header "Bytecode VM for Arithmetic written in Haskell"
  where
    subcommandParsers =
      map
        ( \(command, pass, desc) ->
            O.command command
            . O.info (RunPass pass <$> inputParser)
            $ O.progDesc desc
        )
        [ ("read", Read, "Read an expression from file or STDIN"),
          ("parse", Parse, "Parse expression to AST"),
          ("print", Print, "Parse expression to AST and print it"),
          ("compile", Compile, "Parse and compile expression to bytecode"),
          ("disassemble", Disassemble, "Disassemble bytecode to opcodes"),
          ( "decompile",
            Decompile,
            "Disassemble and decompile bytecode to expression"
          ),
          ("interpret-ast", InterpretAST, "Parse expression and interpret AST"),
          ( "interpret-bytecode",
            InterpretBytecode,
            "Parse, compile and assemble expression, and interpret bytecode"
          )
        ]
        <> [ O.command "run" . O.info (Run <$> inputParser) $
               O.progDesc "Run bytecode",
             O.command "generate" . O.info (Generate <$> maxSizeParser) $
               O.progDesc "Generate a random arithmetic expression"
           ]

    inputParser =
      O.strArgument
        ( O.metavar "FILE"
            <> O.value InputStdin
            <> O.help "Input file, pass - to read from STDIN (default)"
        )

    maxSizeParser =
      O.option
        O.auto
        ( O.long "size"
            <> O.short 's'
            <> O.metavar "INT"
            <> O.value 100
            <> O.help "Maximum size of the generated AST"
        )

main :: IO ()
main = commandParser >>= runCommand

runCommand :: Command -> IO ()
runCommand = \case
  RunPass Read i -> run i (const $ pure ()) (\_ -> Right () :: Either String ())
  RunPass Parse i -> run i (const $ pure ()) parse
  RunPass Print i -> run i pPrintExpr parse
  RunPass Compile i -> run i BS.putStr $ parseSized >=> compile
  RunPass Decompile i -> run i pPrintExpr $ disassemble >=> decompile
  RunPass Disassemble i -> run i (mapM_ print) $ disassemble >>> fmap toList
  RunPass InterpretAST i -> run i print $ parse >=> interpretAST
  RunPass InterpretBytecode i ->
    run i print $ parseSized >=> compile >=> interpretBytecode
  Run i -> run i print interpretBytecode
  Generate maxSize -> Q.generate (exprGen Set.empty maxSize) >>= pPrintExpr
  where
    run input print process = do
      code <- case input of
        InputStdin -> BS.getContents
        InputFP fp -> BS.readFile fp
      case process code of
        Left err -> IO.hPrint IO.stderr err >> exitFailure
        Right val -> print val

    pPrintExpr =
      PS.pPrintOpt PS.CheckColorTty $
        PS.defaultOutputOptionsDarkBg
          { PS.outputOptionsIndentAmount = 2,
            PS.outputOptionsCompact = True
          }
We compile with the following GHC options¹⁰:
 -O2
 -fllvm
 -funbox-strict-fields
 -funfolding-use-threshold=16
And for the C version, we compile using GCC:
gcc -O3 arithvm.c -o arithvm -Wall
Now, let’s see how they stack up against each other. We use hyperfine to run the two executables.
Hyperfine run
$ arith-vm compile benchmark.tb > benchmark.tbc
# Haskell runs
$ hyperfine -L pass read,parse,compile,interpret-bytecode --warmup 10 -r 30 \
    "arith-vm {pass} benchmark.tb"
Benchmark 1: arith-vm read benchmark.tb
  Time (mean ± σ):      30.4 ms ±   0.2 ms    [User: 2.4 ms, System: 15.9 ms]
  Range (min … max):    30.0 ms …  30.9 ms    30 runs

Benchmark 2: arith-vm parse benchmark.tb
  Time (mean ± σ):     567.6 ms ±   5.7 ms    [User: 537.4 ms, System: 22.0 ms]
  Range (min … max):   554.7 ms … 579.9 ms    30 runs

Benchmark 3: arith-vm compile benchmark.tb
  Time (mean ± σ):     630.0 ms ±   4.5 ms    [User: 598.5 ms, System: 23.5 ms]
  Range (min … max):   622.6 ms … 641.1 ms    30 runs

Benchmark 4: arith-vm interpret-bytecode benchmark.tb
  Time (mean ± σ):     650.2 ms ±   4.9 ms    [User: 619.0 ms, System: 23.3 ms]
  Range (min … max):   640.9 ms … 656.6 ms    30 runs

$ hyperfine --warmup 10 -r 30 "arith-vm run benchmark.tbc"
Benchmark 1: arith-vm run benchmark.tbc
  Time (mean ± σ):      29.3 ms ±   0.2 ms    [User: 17.6 ms, System: 2.9 ms]
  Range (min … max):    28.9 ms …  29.6 ms    30 runs

# C runs
$ hyperfine -L pass read,parse,compile,interpret --warmup 10 -r 30 \
    "./arithvm {pass} benchmark.tb"
Benchmark 1: ./arithvm read benchmark.tb
  Time (mean ± σ):      14.2 ms ±   0.2 ms    [User: 0.8 ms, System: 13.0 ms]
  Range (min … max):    14.0 ms …  14.6 ms    30 runs

Benchmark 2: ./arithvm parse benchmark.tb
  Time (mean ± σ):     217.4 ms ±   2.6 ms    [User: 192.2 ms, System: 23.7 ms]
  Range (min … max):   213.6 ms … 223.9 ms    30 runs

Benchmark 3: ./arithvm compile benchmark.tb
  Time (mean ± σ):     254.5 ms ±   2.9 ms    [User: 228.3 ms, System: 24.7 ms]
  Range (min … max):   246.0 ms … 259.1 ms    30 runs

Benchmark 4: ./arithvm interpret benchmark.tb
  Time (mean ± σ):     267.9 ms ±   2.1 ms    [User: 241.5 ms, System: 24.9 ms]
  Range (min … max):   263.4 ms … 272.2 ms    30 runs

$ hyperfine --warmup 10 -r 30 "./arithvm run benchmark.tbc"
Benchmark 1: ./arithvm run benchmark.tbc
  Time (mean ± σ):      13.9 ms ±   0.1 ms    [User: 12.4 ms, System: 1.1 ms]
  Range (min … max):    13.6 ms …  14.1 ms    30 runs
Here’s a summary of the results:

Pass C Time (ms) Haskell Time (ms) Slowdown

Read 14.2 30.4 2.14x

Parse 203.2 537.2 2.64x

Compile 37.1 62.4 1.68x

Interpret 13.4 20.2 1.51x

Run 13.9 29.3 2.11x

I have subtracted the times of previous passes to get the times for individual passes. Here’s the same in a chart (smaller is better):

<noscript></noscript>
Run time of different passes for C and Haskell VMs

As expected, the C implementation is faster across the board, between 1.5x to 2.6x. The biggest difference is in parsing, where the hand-written C parser is more than twice as fast as our combinator-based one. On the other hand, the Haskell VM is only 50% slower than the C VM. In my opinion, the Haskell code’s performance is quite respectable, especially given the safety, expressiveness and conciseness benefits, as illustrated by the code sizes¹¹:

Implementation Lines of Code

C 775

Haskell 407

The Haskell implementation is almost half the size of the C code. I don’t know about you but I’m perfectly happy with the half as small, half as fast tread-off.

The benchmark results for the VMs become less surprising when I compare the C interpret function with the GHC Core code for interpretBytecode'¹².
int interpret(const uint8_t *bytecode, const long bytecode_size, int16_t *result) {
  VM vm;
  vm_init(&vm);

  while (vm.ip < bytecode_size) {
    if (vm.sp >= STACK_SIZE) { return VM_ERROR_STACK_OVERFLOW; }
    if (vm.sp < 0) { return VM_ERROR_STACK_UNDERFLOW; }

    const uint8_t op = bytecode[vm.ip];
    // other checks

    switch (op) {
    case OP_PUSH: {
      const uint8_t byte1 = bytecode[vm.ip + 1];
      const uint8_t byte2 = bytecode[vm.ip + 2];
      const int16_t value = (int16_t)((uint16_t)byte1 | ((uint16_t)byte2 << 8));

      vm.stack[vm.sp] = value;
      vm.sp++;
      vm.ip += 3;
      break;
    }

    case OP_ADD:
    case OP_SUB:
    case OP_MUL:
    case OP_DIV: {
      int16_t value1 = vm.stack[vm.sp - 2];
      int16_t value2 = vm.stack[vm.sp - 1];

      int16_t result;
      switch (op) {
      case OP_ADD: { result = value1 + value2; break; }
      case OP_SUB: { result = value1 - value2; break; }
      case OP_MUL: { result = value1 * value2; break; }
      case OP_DIV: {
        if (value2 == 0) { return VM_ERROR_DIVISION_BY_ZERO; }
        if (value2 == -1 && value1 == -32768) {
          return VM_ERROR_ARITHMETIC_OVERFLOW;
        }
        result = value1 / value2;
        break;
      }
      }

      vm.stack[vm.sp - 2] = result;
      vm.sp--;
      vm.ip++;
      break;
    }
    // ... other cases ...
    }
  }
  // ... final checks and return ...
}
This structure is almost a 1-to-1 match with the GHC Core code we saw earlier. The C while loop corresponds to the optimized $wgo function that GHC generates, the switch statement is almost identical to the case analysis on the raw opcode byte, and the C stack array is equivalent to the MutableByteArray# GHC uses. GHC effectively compiles our high-level Haskell into a low-level code that is structurally identical to what we wrote by hand in C¹³.

This explains why the performance is in the same ballpark. The remaining performance gap is probably due to the thin layer of abstraction that the Haskell runtime still maintains, but it’s remarkable how close we can get to C-like performance.

Future Directions

While our Haskell program is fast, we can improve certain things:

Parser optimizations: As the benchmarks showed, parsing is our slowest pass. For better performance, we could replace our Attoparsec-based combinator parser with a parser generator like Alex and Happy, or even write a recursive-descent parser by hand.

Superinstructions: We could analyze the bytecode for common instruction sequences (like OPush followed by OAdd) and combine them into single superinstructions. This would reduce the instruction dispatch overhead, but may make compilation slower.

Register-based VM: A register-based VM, which uses a small array of virtual registers instead of a memory-based stack, could significantly reduce memory traffic and improve performance. This would require a more complex compiler capable of register allocation.

Just-in-Time (JIT) compilation: The ultimate performance boost could come from a JIT compiler. Instead of interpreting bytecode, we could compile it to native machine code at runtime, eliminating the interpreter entirely. Maybe we could use LLVM to build a JIT compiler in Haskell.

Conclusion

And that’s a wrap! We successfully built a bytecode compiler and virtual machine in Haskell. We covered parsing, AST interpretation, compilation, and bytecode execution, as well as, debugging and testing functionalities. Let’s update our checklist:

Parsing arithmetic expressions to Abstract Syntax Trees (ASTs).

Unit testing for our parser.

Interpreting ASTs.

Compiling ASTs to bytecode.

Disassembling and decompiling bytecode.

Unit testing for our compiler.

Property-based testing for our compiler.

Efficiently executing bytecode in a virtual machine.

Unit testing and property-based testing for our VM.

Benchmarking our code to see how the different passes perform.

All the while keeping an eye on performance.

The journey from a simple AST interpreter to a bytecode VM has been a rewarding one. We saw a significant performance improvement, learned about how compilers and VMs work, and how to write performant code in Haskell. While our Haskell implementation isn’t as fast as the hand-written C version, it’s far more concise and, I would argue, easier to reason about. It’s a great demonstration of Haskell’s power for writing high-performance—yet safe and elegant—code.

See the full code at:

ArithVMLib.hs

ArithVMSpec.hs

ArithVMBench.hs

ArithVMApp.hs

arithvm.c

If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading!

Actually, QuickCheck does not generate entirely arbitrary inputs. It generates arbitrary inputs with increasing complexity—where the complexity is defined by the user—and asserts the properties on these inputs. When a test fails for a particular input, QuickCheck also tries to simplify the culprit and tries to find the simplest input for which the test fails. This process is called Shrinking in QuickCheck parlance. QuickCheck then shows this simplest input to the user for them to use it to debug their code.↩︎

Read this good introduction to QuickCheck if you are unfamiliar.↩︎

Notice that we discard the expressions that do not compile successfully.↩︎

sp and ip are not actual pointers, but indices into the stack and bytecode arrays respectively.↩︎

Guided by the GHC profiler, I tweaked the code in many different ways and ran benchmarks for every change. Then I chose the code that was most performant.↩︎

It is extremely important to write good tests before getting your hands dirty with performance optimizations. In my case, the tests saved me many times from breaking the VM while moving code around for performance.↩︎

We are using our AST interpreter as a definitional interpreter, assuming it to be correctly implemented because of its simpler nature.↩︎

I ran all benchmarks on an Apple M4 Pro 24GB machine against a 142MB file generated using the expression generator we wrote earlier.↩︎

I don’t claim to be a great or even a good C programmer. In fact, this C VM is the first substantial C code I have written in decades. I’m sure the code is not most optimized. It may even be ridden with memory management bugs. If you find something wrong, please let me know in the comments.↩︎

I tried various RTS options to tweak GHC garbage collection, but the defaults proved to be fastest.↩︎

The lines of code are for only the overlapping functionalities between C and Haskell versions.↩︎

I did try using Direct Threading and Subroutine Threading in the C code, but they resulted in slower code than the switch-case variant. GCC may be smart enough in case of this simple VM to optimize the switch-case to be faster than threaded code.↩︎

You may have noticed that the C interpret function is not laid out in the exact same manner as the Haskell interpretBytecode'.go function. In case of C, moving the checks to the front did not yield in performance improvement. I suspect this may be because GCC is smart enough to do that optimization by itself. The nested switch were also no detriment to the performance of the C code.↩︎

This post is part of the series: A Fast Bytecode VM for Arithmetic.

The Parser

The Compiler

The Virtual Machine (you are here)

If you liked this post, please leave a comment.
by Abhinav Sarkar (abhinav@abhinavsarkar.net) at October 21, 2025 12:00 AM

Benchmark	Mean Time (ms)
pass/parse	573.5
pass/compile	50.8
pass/disassemble	155.8
pass/decompile	506.5
interpret/ast	49.8
interpret/bytecode	15.8
run/ast	617.2
run/bytecode	638.3

Pass	C Time (ms)	Haskell Time (ms)	Slowdown
Read	14.2	30.4	2.14x
Parse	203.2	537.2	2.64x
Compile	37.1	62.4	1.68x
Interpret	13.4	20.2	1.51x
Run	13.9	29.3	2.11x

Implementation	Lines of Code
C	775
Haskell	407

October 20, 2025

Monday Morning Haskell

Spatial DP: Finding the Largest Square
In the past two weeks we’ve explored a couple different problems in dynamic programming. These were simpler 1-dimensional problems. But dynamic programming is often at its most powerful when you can work across multiple dimensions. In today’s problem, we’ll consider a problem that is actually a 2D spatial problem where we can use dynamic programming.

If you want to learn how to write dynamic programming solutions in Haskell from the ground up, take a look at our Solve.hs course. DP is one of several algorithmic approaches you’ll learn in Module 3!

The Problem

Today’s problem (Maximal Square) is fairly simple conceptually. We are given a grid of 1’s and 0’s like so:
10100
11111
00111
10101
We must return the size of the largest square in the grid composed entirely of 1’s. So in the example above, the answer would be 4. There are two 2x2 squares we can form, starting in the 2nd row, using either the 3rd or 4th column as the “top left” of the square.

We can do a couple small edits to change the answer here. For example, we can flip the second ‘0’ in the bottom row and we’ll get a 3x3 grid, allowing the answer 9:
10100
11111
00111
10111
We could instead flip the second ‘1’ in the third row, and now the answer is only 1, as there are no 2x2 squares remaining:
10100
11111
00101
10101
The Algorithm

To solve this, we can imagine a DP grid that has “layers” where each layer has the same dimensions as the original grid. Each layer has a number “k” associated with it. The index {row,column} at layer k tells us whether or not a square of size k exists in the original grid with size k x k, with the cell {row, column} as its top left cell.

To construct this grid, we would need a base case and a recursive case. The base case is to consider layer 1. This is identical to the original grid we receive. Any location with 1 in the original grid is the top left for a 1x1 square.

So how do we build the layer k+1? This requires one simple insight. Suppose we are dealing with a single index {r,c}. In order for this to be the top left of a square of size k+1, we just need to check that 4 cells begin squares of size k: {r,c}, {r+1,c}, {r,c+1},{r+1,c+1}.

So to form the next layer, we just loop through each index in the layer and fill it in with 1 if it meets that criterion. Once we reach a layer where each entry is 0, we are done. We should return the square of the last layer we found.

There are a few optimizations possible here. Thinking back to our first DP problem, we didn’t need to store the full DP array since each new step only depended on a couple prior values. This time, we don’t need a full grid with “k” layers. We could alternate with only two grids, saving new values from the prior grid, and then making our “new” grid the “old” grid for the next layer.

But even simpler than that, we can keep modifying a single grid in place. Each “new” value we calculate depends on numbers below and/or to its right. So as long as we loop through the grid from left to right and top to bottom, we are safe modifying its values in place. At least, that’s what we’ll do in Rust. In Haskell we could do this with the mutable array API, but we’ll stick with the more conventional, immutable, approach in this article. (You can learn more about Haskell’s mutable arrays in Solve.hs).

Rust Solution

Let’s start with the Rust solution, demonstrating the mutable array approach. We’ll start by defining a series of terms, like the dimensions of our input and our dp grid (which is initially a clone of the input). We’ll also define a boolean (found) to indicate if we’ve found at least a single 1 on the current layer. We’ll track level, the number of layers confirmed to have a 1.
pub fn maximal_square(matrix: Vec<Vec<char>>) -> i32 {
    let m = matrix.len();
    let n = matrix[0].len();
    let mut level = 0;
    let mut dp = matrix.clone();
    let mut found = true;
    ...
    return level * level;
}
Of course, our final answer is just the square of the final “level” we determine. But how do we find this? We’ll need an outer while loop that terminates once we hit a level that does not hold a 1. We reset found as false to start each loop, but at the end of the loop, we’ll increment the level if we have found something.
pub fn maximal_square(matrix: Vec<Vec<char>>) -> i32 {
    let m = matrix.len();
    let n = matrix[0].len();
    let mut level = 0;
    let mut dp = matrix.clone();
    let mut found = true;
    while (found) {
        found = false;
        ...
        if (found) {
            level += 1;
        }
    }
    return level * level;
}
Now the core of the “layer” loop is to loop through each cell, left to right and top to bottom.
pub fn maximal_square(matrix: Vec<Vec<char>>) -> i32 {
    ...
    while (found) {
        found = false;
        for i in 0..m {
            for j in 0..n {
                ...
            }
        }
        if (found) {
            level += 1;
        }
    }
    return level * level;
}
So what happens inside the loop? When we hit a 0 cell, we don’t need to do anything. It always remains a 0 and we haven’t “found” anything. But interesting things happen if we hit a 1.

First, we note that found is now true - this layer is not empty. We have found a k x k square. But second, we should now reset this cell as 0 if it does not make a square of size k+1. We need to first check the dimensions to make sure we don’t go out of bounds, but then also check the 3 spaces, to the right, below, and diagonally away from us. If any of these are 0, we reset this cell as 0.
pub fn maximal_square(matrix: Vec<Vec<char>>) -> i32 {
    ...
    while (found) {
        found = false;
        for i in 0..m {
            for j in 0..n {
                if (dp[i][j] == '1') {
                    found = true;
                    if (i + 1 >= m || 
                        j + 1 >= n || 
                        dp[i][j+1] == '0' ||
                        dp[i+1][j] == '0' ||
                        dp[i+1][j+1] == '0') {
                        dp[i][j] = '0';
                    }
                }
            }
        }
        if (found) {
            level += 1;
        }
    }
    return level * level;
}
And just by filling in this logic, our function is suddenly done! Our inner loop is complete, and our outer loop will break once we find no more increasingly large squares. Here is the full Rust solution:
pub fn maximal_square(matrix: Vec<Vec<char>>) -> i32 {
    let m = matrix.len();
    let n = matrix[0].len();
    let mut level = 0;
    let mut dp = matrix.clone();
    let mut found = true;
    while (found) {
        found = false;
        for i in 0..m {
            for j in 0..n {
                if (dp[i][j] == '1') {
                    found = true;
                    if (i + 1 >= m || 
                        j + 1 >= n || 
                        dp[i][j+1] == '0' ||
                        dp[i+1][j] == '0' ||
                        dp[i+1][j+1] == '0') {
                        dp[i][j] = '0';
                    }
                }
            }
        }
        if (found) {
            level += 1;
        }
    }
    return level * level;
}
Haskell Solution

Now let’s write this in Haskell. We’ll start with a few definitions, including a type alias for our DP map. We’ll take an Array as the problem input, but we want a HashMap for our stateful version since we can “mutate” a HashMap efficiently:
type SquareMap = HM.HashMap (Int, Int) Bool

maximalSquare :: A.Array (Int, Int) Bool -> Int
maximalSquare grid = ...
  where
    ((minRow,minCol), (maxRow, maxCol)) = A.bounds grid
    initialMap = HM.fromList [vs | vs <- A.assocs grid]

    ...
Now we’ll define two loop functions - one for the inner loop, one for the outer loop. The “state” for the inner loop is our current level number, as well as the map of the previous layer. The inner loop (coordLoop) should return us an updated map, as well as the found bool value telling us if we’ve found at least a single 1 in the prior layer.
maximalSquare :: A.Array (Int, Int) Bool -> Int
maximalSquare grid = ...
  where
    ((minRow,minCol), (maxRow, maxCol)) = A.bounds grid
    initialMap = HM.fromList [vs | vs <- A.assocs grid]

    coordLoop :: (Bool, SquareMap) -> (Int, Int) -> (Bool, SquareMap)
    coordLoop (found, mp) coord@(r, c) = ...

    loop :: Int -> HM.HashMap (Int, Int) Bool -> Int
    loop level mp = ...

    ...
Notice that coordLoop has the argument pattern for foldl, rather than foldr. We want to loop through our coordinates in the proper order, from left to right and top down. If we use a right fold over the indices of the grid, it will go in reverse order.

Let’s start by filling in the inner loop. The first thing to do is determine if the found value needs to change. This is the case if we discover a True value at this index:
maximalSquare :: A.Array (Int, Int) Bool -> Int
maximalSquare grid = ...
  where
    ((minRow,minCol), (maxRow, maxCol)) = A.bounds grid
    initialMap = HM.fromList [vs | vs <- A.assocs grid]

    coordLoop :: (Bool, SquareMap) -> (Int, Int) -> (Bool, SquareMap)
    coordLoop (found, mp) coord@(r, c) =
      let found' = found || mp HM.! coord
          ...
      in  (found’, ...)
Now we need the 5 conditions that tell us if this cell should get cleared. Calculate all these, and insert False at the cell if any of them match. Otherwise, keep the map as is!
maximalSquare :: A.Array (Int, Int) Bool -> Int
maximalSquare grid = ...
  where
    ((minRow,minCol), (maxRow, maxCol)) = A.bounds grid
    initialMap = HM.fromList [vs | vs <- A.assocs grid]

    coordLoop :: (Bool, SquareMap) -> (Int, Int) -> (Bool, SquareMap)
    coordLoop  (found, mp) coord@(r, c) =
      let found' = found || mp HM.! coord
          tooRight = c >= maxCol
          tooLow = r >= maxRow
          toRight = mp HM.! (r, c + 1)
          under = mp HM.! (r + 1, c)
          diag = mp HM.! (r + 1, c + 1)
          failNext = tooLow || tooRight || not toRight || not under || not diag
          mp' = if failNext then HM.insert coord False mp else mp
      in  (found', mp')

    ...
Now for the outer loop, we use foldl to go through our coordinates using the coordLoop. If we’ve found at least 1 square at this size, then we recurse with the new map and an incremented size. Otherwise we return the square of the current level. Then we just need to call this loop with initial values:
```haskell
type SquareMap = HM.HashMap (Int, Int) Bool

maximalSquare :: A.Array (Int, Int) Bool -> Int
maximalSquare grid = loop 0 initialMap
  where
    ((minRow,minCol), (maxRow, maxCol)) = A.bounds grid
    initialMap = HM.fromList [vs | vs <- A.assocs grid]

    coordLoop :: (Bool, SquareMap) -> (Int, Int) -> (Bool, SquareMap)
    coordLoop  (found, mp) coord@(r, c) = ...

    loop :: Int -> HM.HashMap (Int, Int) Bool -> Int
    loop level mp =
      let (found, mp') = foldl coordLoop (False, mp) (A.indices grid)
      in  if found then loop (level + 1) mp' else (level * level)
This completes our Haskell solution!
type SquareMap = HM.HashMap (Int, Int) Bool

maximalSquare :: A.Array (Int, Int) Bool -> Int
maximalSquare grid = loop 0 initialMap
  where
    ((minRow,minCol), (maxRow, maxCol)) = A.bounds grid
    initialMap = HM.fromList [vs | vs <- A.assocs grid]

    coordLoop :: (Bool, SquareMap) -> (Int, Int) -> (Bool, SquareMap)
    coordLoop  (found, mp) coord@(r, c) =
      let found' = found || mp HM.! coord
          tooRight = c >= maxCol
          tooLow = r >= maxRow
          toRight = mp HM.! (r, c + 1)
          under = mp HM.! (r + 1, c)
          diag = mp HM.! (r + 1, c + 1)
          failNext = tooLow || tooRight || not toRight || not under || not diag
          mp' = if failNext then HM.insert coord False mp else mp
      in  (found', mp')

    loop :: Int -> HM.HashMap (Int, Int) Bool -> Int
    loop level mp =
      let (found, mp') = foldl coordLoop (False, mp) (A.indices grid)
      in  if found then loop (level + 1) mp' else (level * level)
Conclusion

Next week we’ll look at one more multi-dimensional DP problem where the dimensions aren’t quite as obvious in this spatial way. The best way to understand DP is to learn related concepts from scratch, including your basic use-it-or-lose-it problems and memoization. You’ll study all these concepts and learn Haskell implementation tricks in Module 3 of Solve.hs. Enroll in the course now!
by James Bowen at October 20, 2025 08:30 AM

October 16, 2025

Haskell Interlude

71: Stefan Wehr

Stefan Wehr is a professor at the Offenburg University of Applied Sciences. Before becoming a professor, Stefan worked in industry on a large Haskell codebase - specifically one that's not a compiler and not a blockchain. So of course we talked about using Haskell in large projects, software architecture, modularity, type classes and data modeling and the suppression of sums outside of functional programming, and also about teaching Haskell at his current job.

by Haskell Podcast at October 16, 2025 08:00 AM

Chris Penner

Exploring Arrows for sequencing effects
Last time, we explored common methods of sequencing effects into little programs. If you haven't read it yet, I'd recommend starting with that, but you can probably manage without it if you insist.

We examined Applicatives, Monads, and Selective Applicatives, and each of these systems had its own trade-offs. We dug into how all approaches exist on the spectrum between being expressive or analyzable and at the end of the post we were unfortunately left wanting something better. Monads reign supreme when it comes to expressiveness as they can express any possible programs we may want to write, but they offer essentially no ability to analyze program they represent without executing it.

On the other hand, Applicatives and Selective Applicatives offered reasonable program analysis, but are unable to express complex programs. They can't even encode programs in which downstream effects materially depend on the results of upstream effects.

These approaches are all based on the same Functor-Applicative-Monad hierarchy, in this post we'll set that aside and rebuild on an altogether different foundation to see if we can do even better.

Setting the goal posts

Before putting in the work let's think critically about the what we felt was missing from the Monad hierarchy and what we wish to gain from a new system.

Here's my wish-list:

I want to be able to list out every effect that program might perform without executing anything.

I want to understand the dependencies between the effects including the flow of data between them.

I want to be able to express programs in which downstream effects can fully utilize the results of upstream effects.

Looking at these requirements, the biggest problem with the Monadic effects system is that it's far too rough-grained in how it handles the results of previous effects. We can see this by reviewing the signature of bind:
(>>=) :: Monad m => m a -> (a -> m b) -> m b
We can see that the result from the previous effect is passed to an arbitrary Haskell function whose job is to return the entire continuation of the program! This permits that function to swap out the entire rest of the program on any particular run, which I'd argue is way more power than the vast majority of reasonable programs require. This is quite frankly a dangerous amount of expressive power, what sort of programs are you writing where you can't even statically identify the possible code paths that might be taken? Even more complex flows like branching, looping and recursion can be expressed in a more structured way without resorting to this sledgehammer level of dynamism.

This tells us we have some room to constrain our programs a bit, and if we're economical about how we do it we can trade that power for the benefits we desire.

We still need to utilize these past results, but we want to avoid opening Pandora's box. That is, we must be careful not to allow the creation of new effects by running arbitrary Haskell functions at execution time. So, in order to use results without a continuation-building function like Monads use, we must meaningfully include the inputs and outputs for our effects in the structure of our effect system itself. We also know that we need to be able to chain these effects together, so we'll need some way to compose them.

If it's not obvious already, this is a great fit for the Category typeclass:
class Category k where
  id :: k a a
  (.) :: k b c -> k a b -> k a c
This already gives us a lot of what we want. Unlike Monads which bake outputs into the continuation of the program using function closures, the Category structure routes inputs and outputs explicitly as part of its structure. Unsurprisingly, it's quite a natural fit; after all, it's called Category Theory, not Monad Theory...

Rebuilding on Categories

Now let's begin to re-implement the examples from the previous post using this new Category-based effect system. In order to save some time, we're actually going to jump up the hierarchy a bit all the way to Arrows.

The Arrow class, if you're not familiar with it, looks like this:
class Category a => Arrow (a :: Type -> Type -> Type) where
  arr :: (b -> c) -> a b c
  (***) :: a b c -> a b' c' -> a (b, b') (c, c')
There are a few other methods we get for free, but this is a minimal set of methods we need to define.

Notice that it has a Category superclass, so we'll use identity and composition from there. We can leverage arr to lift pure Haskell functions into our Category structure. I know we just said we wanted to avoid arbitrary Haskell functions, but note that in this case, just like Applicatives, the function is pure, we can't determine any effects or structure of the effects within the function. No problems here.

We'll re-visit (***) in just a minute.

To get started, how about we re-implement the program we wrote using Applicative in the previous post?

I'll save you from clicking over, here's a refresher on what we did before:
import Control.Applicative (liftA3)
import Control.Monad.Writer (Writer, runWriter, tell)

class (Applicative m) => ReadWrite m where
  readLine :: m String
  writeLine :: String -> m ()

data Command
  = ReadLine
  | WriteLine String
  deriving (Show)

-- | We can implement an instance which runs a dummy interpreter that simply records the commands
-- the program wants to run, without actually executing anything for real.
instance ReadWrite (Writer [Command]) where
  readLine = tell [ReadLine] *> pure "Simulated User Input"
  writeLine msg = tell [WriteLine msg]

-- | A helper to run our program and get the list of commands it would execute
recordCommands :: Writer [Command] String -> [Command]
recordCommands w = snd (runWriter w)

-- | A simple program that greets the user.
myProgram :: (ReadWrite m) => String -> m String
myProgram greeting =
  liftA3
    (\_ name _ -> name)
    (writeLine (greeting <> ", what is your name?"))
    readLine
    (writeLine "Welcome!")

-- We can now run our program in the Writer applicative to see what it would do!
main :: IO ()
main = do
  let commands = recordCommands (myProgram "Hello")
  print commands

-- [WriteLine "Hello, what is your name?", ReadLine, WriteLine "Welcome!"]
The key aspects of this Applicative version were that we could analyze any program which required only an Applicative constraint to get the full list of sequential effects that the program would perform.

Here's the same program, but this time we'll encode the effects using Arrow constraints instead.

But first, a disclaimer: writing Arrow-based programs looks ugly, but don't worry, bear with me for a bit and we'll address that later.

Just like the Applicative version, we'll define a typeclass as the interface to our set of ReadWrite effects, but this time will assume an Arrow constraint:
import Control.Arrow
import Control.Category
import Prelude hiding (id)

class (Arrow k) => ReadWrite k where
  -- Readline has no interesting input, so we use () as input type.
  readLine :: k () String

  -- We track the inputs for the writeLine directly in the Category structure.
  writeLine :: k String ()

-- Helper for embedding a static Haskell value directly into an Arrow
constA :: (Arrow k) => b -> k a b
constA b = arr (\_ -> b)

-- | A simple program which uses a statically provided message to greet the user.
myProgram :: (ReadWrite k) => String -> k () ()
myProgram greeting =
  constA (greeting <> ", what is your name?")
    >>> writeLine
    >>> readLine
    >>> constA "Welcome!"
    >>> writeLine
Great, that should feel pretty straight-forward, it's trivial to convert sequential Applicative programs like this.

In order to run it, we still need to use the IO monad, since that's just how base does IO, but we can use the nifty Kleisli newtype wrapper which turns any monadic computation into a valid Arrow by embedding the monadic effects into the Arrow structure.

Here's how we implement the ReadWrite instance for Kleisli IO:
instance ReadWrite (Kleisli IO) where
  readLine = Kleisli $ \() -> getLine
  writeLine = Kleisli $ \msg -> putStrLn msg

run :: Kleisli IO i o -> i -> IO o
run prog i = do
  runKleisli prog i
And it runs just fine:
>>> run (myProgram "Hello") ()
Hello, what is your name?
Chris
Welcome!
Let's look a little closer at Kleisli:
newtype Kleisli m a b = Kleisli { runKleisli :: a -> m b }
Look familiar? It's just the continuation function from monadic bind hiding in there.

There's a difference though, now that arbitrary function is part of our implementation, not our interface!

This is important, because it means we can invent a different implementation of our ReadWrite interface that just tracks the effects that doesn't have to deal with arbitrary binds like this.

Let's implement a command-recorder that does exactly that.
data Command
  = ReadLine
  | WriteLine
  deriving (Show)

-- Just like the applicative we create a custom implementation of the interface which for static analysis.
-- The parameters are phantom, we won't be running anything, so we only care about
-- the structure of the effects for now.
data CommandRecorder i o = CommandRecorder [Command]

-- We need a Category instance since it's a pre-requisite for Arrow:
instance Category CommandRecorder where
  -- The identity command does nothing, so it records no commands.
  id = CommandRecorder []

  -- Composition of two CommandRecorders just collects their command lists.
  (CommandRecorder cmds2) . (CommandRecorder cmds1) = CommandRecorder (cmds1 <> cmds2)

-- Now the Arrow instance.
instance Arrow CommandRecorder where
  -- We know this function must be pure (barring errors), so we don't
  -- need to track any effects from it.
  arr _ = CommandRecorder []

  -- Don't worry about this combinator yet, we'll come back to it.
  -- For now we'll collect the effects from both sides.
  (CommandRecorder cmds1) *** (CommandRecorder cmds2) = CommandRecorder (cmds1 <> cmds2)

-- | Now implementing the ReadWrite instance is just a matter of collecting the commands
-- the program is running.
instance ReadWrite CommandRecorder where
  readLine = CommandRecorder [ReadLine]
  writeLine = CommandRecorder [WriteLine]

-- | A helper to run our program and get the list of commands it would execute
recordCommands :: CommandRecorder i o -> [Command]
recordCommands (CommandRecorder cmds) = cmds

-- | Here's a helper for printing out the effects a program will run.
analyze :: CommandRecorder i o -> IO ()
analyze prog = do
  let commands = recordCommands prog
  print commands
We can analyze our program and it'll show us which effects it will run if we were to execute it:
>>> analyze (myProgram "Hello")
[WriteLine,ReadLine,WriteLine]
Okay, we've achieved the ability to analyze and execute our program at parity with the Applicative version, but isn't it silly that we're asking the user their name and simply ignoring it? As it turns out, our Arrow interface is quantifiably more expressive: we can use results of past effects in future effects! Since we're now allowing writeLine to take it's input dynamically we no longer track the output in the structure of the command itself. This bit might seem like a step back, but if you still wanted the old version you could of course still define it: writeLineStatic :: String -> k () (). Arrows allow us the flexibility to choose which we prefer. We'll chat a bit more about this later in the article.

Here's something we couldn't do with the Applicative version, we can rewrite the program to greet the user by the name they provide. While we're at it, why not receive the greeting message as an input too?
-- | This program uses the name provided by the user in the response.
myProgram2 :: (ReadWrite k) => k String ()
myProgram2 =
  arr (\greeting -> greeting <> ", what is your name?")
    >>> writeLine
    >>> readLine
    >>> arr (\name -> "Welcome, " <> name <> "!")
    >>> writeLine
Composing arrows lets us route data from one effect to the next, and arr let's us map over values to change them just like fmap does for Functors. The structure of the effects are still statically defined, so even when routing input we can still analyze the entire program ahead of time:
>>> analyze myProgram2
[WriteLine, ReadLine, WriteLine]

>>> run myProgram2 "Hello"
Hello, what is your name?
Chris
Welcome, Chris!
Nifty!

Levelling Up

We're off to a great start, the ability to use the results of past effects is already better than we could get from Selective Applicative, without sacrificing any of the analysis capabilities we had in the Applicative version.

However, at the moment our programs are all still just linear sequences of commands. What happens if we want to route results from an earlier effect down to one far later in the program?

We need a bit more power, time to call back to that (***) we ignored earlier, and while we're at it, let's look at (&&&) too, which we get for free when we implement (***).
(***) :: Arrow k => k a b -> k c d -> k (a, c) (b, d)
(&&&) :: Arrow k => k a b -> k a c -> k a (b, c)
These operators allow us to take two independent programs in our arrow interface and compose them in parallel to one another, rather than sequentially. What parallel means is going to be up to the implementation (within the scope of the Arrow laws), but the key part is that these two sides don't depend on each other, which is distinct from the normal sequential composition we've been doing with (>>>).

With these we can write a now write a slightly more complex program which routes values around, and can forward values from earlier effects to later ones.
import UnliftIO.Directory qualified as Directory

-- The effects we'll need for this example
class (Arrow k) => FileCopy k where
  readLine :: k () String
  writeLine :: k String ()
  copyFile :: k (String, String) ()

data Command
  = ReadLine
  | WriteLine
  | CopyFile
  deriving (Show)

-- Here's the real executable implementation
instance FileCopy (Kleisli IO) where
  readLine = Kleisli $ \() -> getLine
  writeLine = Kleisli $ \msg -> putStrLn msg
  copyFile = Kleisli $ \(src, dest) -> Directory.copyFile src dest

-- Helper prompting the user for input.
prompt :: (FileCopy cat) => String -> cat a String
prompt msg =
  pureC msg
    >>> writeLine
    >>> readLine

fileCopyProgram :: (FileCopy k) => k () ()
fileCopyProgram =
  ( prompt "Select a file to copy"
      &&& prompt "Select the destination"
  )
    >>> copyFile
This program prompts the user for a source file and a destination file, then copies the source file to the destination. Notably, each prompt is independent of one another, that is, they don't have any data-dependencies on one another. But, copyFile takes two arguments, the results of each prompt. (&&&) allows us to express this.

Let's run it:
>>> run fileCopyProgram ()
Select a file to copy
ShoppingList.md
Select the destination
ShoppingList.backup
Uhh, okay so you can't see the result, but trust me it works! Kleisli's implementation of (***) just runs the left side, then the right side; but if, for other applications, you wanted real parallel execution you could write your implementation which runs each pair of parallel operations using Concurrently or something like it and your program will magically become as parallel as your data-dependencies allow! Caveat emptor, but at least having the option is nice, we don't get that from the Monadic interface where data-dependencies are hidden from us.

Now for the analysis.

We could, of course, still collect and print out the list of effects that would be run, but I'm bored of that, so let's level that up too. Now that we have both sequential and parallel composition, our programs are a tree of operations, so our analysis tools should probably follow suite.

Here's a rewrite of our CommandRecorder which tracks the whole tree of effects:
-- | We can represent the effects in our computations as a tree now.
data CommandTree eff
  = Effect eff
  | Identity
  | Composed (CommandTree eff {- >>> -}) (CommandTree eff)
  | -- (***)
    Parallel
      (CommandTree eff) -- First
      (CommandTree eff) -- Second
  deriving (Show, Eq, Ord, Functor, Traversable, Foldable)

data CommandRecorder eff i o = CommandRecorder (CommandTree eff)

instance Category (CommandRecorder eff) where
  -- The identity command does nothing, so it records no commands.
  id = CommandRecorder Identity

  -- I collapse redundant 'Identity's for clarity.
  -- The category laws make this safe to do.
  (CommandRecorder Identity) . (CommandRecorder cmds1) = CommandRecorder cmds1
  (CommandRecorder cmds2) . (CommandRecorder Identity) = CommandRecorder cmds2
  (CommandRecorder cmds2) . (CommandRecorder cmds1) = CommandRecorder (Composed cmds1 cmds2)

instance Arrow (CommandRecorder eff) where
  -- We don't bother tracking pure functions, so arr is a no-op.
  arr _f = CommandRecorder Identity

  -- Track when we fork into parallel execution paths as part of the tree.
  (CommandRecorder cmdsL) *** (CommandRecorder cmdsR) = CommandRecorder (Parallel cmdsL cmdsR)

-- | The interface implementation just tracks the commands
instance FileCopy (CommandRecorder Command) where
  readLine = CommandRecorder (Effect ReadLine)
  writeLine = CommandRecorder (Effect WriteLine)
  copyFile = CommandRecorder (Effect CopyFile)

analyze :: CommandRecorder Command i o -> IO ()
analyze prog = do
  let commands = recordCommands prog
  putStrLn $ renderCommandTree commands
Now we can build the tree of effects, let's take advantage of that and render it as a tree too!

Here's a function that renders any program tree down into a flow-chart description using the mermaid diagramming language.

Don't judge me for the implementation of my mermaid renderer... In fact, if you have a nicer one please send it to me :)

(It's not terribly important, so feel free to skip it)
diagram :: CommandRecorder Command i o -> IO ()
diagram prog = do
  let commands = recordCommands prog
  putStrLn $ commandTreeToMermaid commands

-- | A helper to render our command tree as a flow-chart style mermaid diagram.
commandTreeToMermaid :: forall eff. (Show eff) => CommandTree eff -> String
commandTreeToMermaid cmdTree =
  let preamble = "flowchart TD\n"
      (outputNodes, links) =
        renderNode cmdTree
          & flip runReaderT (["Input"] :: [String])
          & flip evalState (0 :: Int)
   in preamble
        <> unlines
          ( links
              <> ((\output -> output <> " --> Output") <$> outputNodes)
          )
  where
    newNodeId :: (MonadState Int m) => m Int
    newNodeId = do
      n <- get
      put (n + 1)
      return n
    renderNode :: CommandTree eff -> ReaderT [String] (State Int) ([String], [String])
    renderNode = \case
      Effect cmd -> do
        prev <- ask
        nodeId <- newNodeId
        let cmdLabel = show cmd
            nodeDef = show nodeId <> "[" <> cmdLabel <> "]"
            links = do
              x <- prev
              pure $ x <> (" --> " <> nodeDef)
        pure ([nodeDef], links)
      Identity -> do
        nodeId <- newNodeId
        prev <- ask
        let nodeDef = show nodeId <> ("[Identity]")
        let links = do
              x <- prev
              pure $ x <> (" --> " <> nodeDef)
        pure ([nodeDef], links)
      Composed cmds1 cmds2 -> do
        (leftIds, leftNode) <- renderNode cmds1
        (rightIds, rightNode) <- local (const leftIds) $ renderNode cmds2
        pure (rightIds, leftNode <> rightNode)
      Parallel cmds1 cmds2 -> do
        prev <- ask
        nodeId <- newNodeId
        let nodeDef = show nodeId <> ("[Parallel]")
        (leftIds, leftNode) <- local (const [nodeDef]) $ renderNode cmds1
        (rightIds, rightNode) <- local (const [nodeDef]) $ renderNode cmds2
        let thisLink = do
              x <- prev
              pure $ x <> (" --> " <> nodeDef)
            links =
              thisLink
                <> leftNode
                <> rightNode
        pure (leftIds <> rightIds, links)
Here's what the diagram output for our fileCopyProgram looks like:
>>> diagram fileCopyProgram
flowchart TD
Input --> 0[Parallel]
0[Parallel] --> 1[WriteLine]
1[WriteLine] --> 2[ReadLine]
0[Parallel] --> 3[WriteLine]
3[WriteLine] --> 4[ReadLine]
2[ReadLine] --> 5[CopyFile]
4[ReadLine] --> 5[CopyFile]
5[CopyFile] --> Output
And rendered:

Pretty cool eh?

Diagramming is just one thing you can do with our CommandTree, it's just data, you can fold over it to get all the effects, analyze which effects depend on which others, all sorts of things. This provides more clarity into what's happening than Selective's Over and Under newtypes.

This was a very simple example, but I promise you, with combinations of arr, (***) and first/second you can do any possible routing of values that you might like.

What you can't do yet, however, is to branch between possible execution paths, then run only one of them.

Let's add that.

Branching with ArrowChoice

Luckily for us, adding branching is pretty straight-forward. There's an aptly named ArrowChoice in base that we'll go ahead and implement.

ArrowChoice adds a new combinator:
(+++) :: ArrowChoice k => k a b -> k c d -> k (Either a c) (Either b d)
Similar to how (***) lets us represent two parallel and independent programs and fuse them into a single arrow which runs both, (+++) lets us introduce a conditional branch to our program, only one path will be executed based on whether the input value is a Left or a Right.

By implementing (+++) we also get the similar (|||) for free:
(|||) :: ArrowChoice k => k a c -> k b c -> k (Either a b) c
Let's add a Branch case to our CommandTree and implement ArrowChoice for our CommandRecorder.
data CommandTree eff
  = Effect eff
  | Identity
  | Composed (CommandTree eff {- >>> -}) (CommandTree eff)
  | Parallel
      (CommandTree eff) -- First
      (CommandTree eff) -- Second
  | Branch
      (CommandTree eff) -- Left
      (CommandTree eff) -- Right
  deriving (Show, Eq, Ord, Functor, Traversable, Foldable)

instance ArrowChoice (CommandRecorder eff) where
  (CommandRecorder cmds1) +++ (CommandRecorder cmds2) = CommandRecorder (Branch cmds1 cmds2)
No problem. As a reminder, here's the branching program we expressed using Selective Applicatives last time:
-- | A program using Selective effects
myProgram :: (ReadWriteDelete m) => m String
myProgram =
  let msgKind =
        Selective.matchS
          -- The list of values our program has explicit branches for.
          -- These are the values which will be used to crawl codepaths when
          -- analysing your program using `Over`.
          (Selective.cases ["friendly", "mean"])
          -- The action we run to get the input
          readLine
          -- What to do with each input
          ( \case
              "friendly" -> writeLine ("Hello! what is your name?") *> readLine
              "mean" ->
                let msg = unlines [ "Hey doofus, what do you want?"
                                  , "Too late. I deleted your hard-drive."
                                  , "How do you feel about that?"
                                  ]
                 in writeLine msg *> deleteMyHardDrive *> readLine
              -- This can't actually happen.
              _ -> error "impossible"
          )
      prompt = writeLine "Select your mood: friendly or mean"
      fallback =
        (writeLine "That was unexpected. You're an odd one aren't you?")
          <&> \() actualInput -> "Got unknown input: " <> actualInput
   in prompt
        *> Selective.branch
          msgKind
          fallback
          (pure id)
This example was always a bit forced just because of how limited Selective Applicatives are, but let's copy it over into our Arrow setup anyways.

First we'll implement ArrowChoice for our CommandRecorder.
-- Define our effects
class (Arrow k) => ReadWriteDelete k where
  readLine :: k () String

  writeLine :: k String ()

  deleteMyHardDrive :: k () ()

-- New commands for the new effects
data Command
  = ReadLine
  | WriteLine
  | DeleteMyHardDrive
  deriving (Show)

-- Track the effects
instance ReadWriteDelete CommandRecorder where
  readLine = CommandRecorder (Pure ReadLine)
  writeLine = CommandRecorder (Pure WriteLine)
  deleteMyHardDrive = CommandRecorder (Pure DeleteMyHardDrive)

-- Here's the runnable implementation
instance ReadWriteDelete (Kleisli IO) where
  readLine = Kleisli $ \() -> getLine
  writeLine = Kleisli $ \msg -> putStrLn msg
  deleteMyHardDrive = Kleisli $ \() -> putStrLn "Deleting hard drive... Just kidding!"
And here's our program which uses ArrowChoice:
branchingProgram :: (ReadWriteDelete k, ArrowChoice k) => k () ()
branchingProgram =
  pureC "Select your mood: friendly or mean"
    >>> writeLine
    >>> readLine
    >>> mapC
      ( \case
          "mean" -> Left ()
          "friendly" -> Right ()
          -- Just default to friendly
          _ -> Right ()
      )
    >>> let friendly =
              pureC "Hello! what is your name?"
                >>> writeLine
                >>> readLine
                >>> mapC (\name -> "Lovely to meet you, " <> name <> "!")
                >>> writeLine
            mean =
              pureC
                ( unlines
                    [ "Hey doofus, what do you want?",
                      "Too late. I deleted your hard-drive.",
                      "How do you feel about that?"
                    ]
                )
                >>> writeLine
                >>> deleteMyHardDrive
         in mean ||| friendly
Notice again, this version is actually more expressive than the Selective Applicative version, it actually greets the user by the name they provided, how kind.

I'll elide the edits to the mermaid renderer, Branch is very similar to the implementation of Parallel.

Let's make a mermaid chart like before:
>>> diagram branchingProgram
flowchart TD
Input --> 0[WriteLine]
0[WriteLine] --> 1[ReadLine]
1[ReadLine] --> 2[Branch]
2[Branch] --> 3[WriteLine]
3[WriteLine] --> 4[DeleteMyHardDrive]
2[Branch] --> 5[WriteLine]
5[WriteLine] --> 6[ReadLine]
6[ReadLine] --> 7[WriteLine]
4[DeleteMyHardDrive] --> Output
7[WriteLine] --> Output
See how it's now clear that the effects on one branch differ from another?

And of course we can run it just as you'd expect:
>>> run branchingProgram
Select your mood: friendly or mean
friendly
Hello! what is your name?
Joe
Lovely to meet you, Joe!

>>> run branchingProgram
Select your mood: friendly or mean
mean
Hey doofus, what do you want?
Too late. I deleted your hard-drive.
How do you feel about that?

Deleting hard drive... Just kidding!
Okay, so the syntax of that last example was starting to get pretty hairy, if only there was something like do-notation, but for arrows...

Arrow Notation

By enabling the {-# LANGUAGE Arrows #-} pragma we can use a form of do-notation with arrows. It will automatically route your inputs wherever you need them using combinators from the Arrow class and will even translate if and case statements into ArrowChoice combinators, it's very impressive.

I won't explain Arrow Notation deeply here, so go ahead and check out the GHC Manual for a more detailed look.

Here's what our branching program looks like when we translate it:
branchingProgramArrowNotation :: (ReadWriteDelete k, ArrowChoice k) => k () ()
branchingProgramArrowNotation = proc () -> do
  writeLine -< "Select your mood: friendly or mean"
  mood <- readLine -< ()
  case mood of
    "mean" -> mean -< ()
    "friendly" -> friendly -< ()
    _ -> friendly -< ()
  where
    friendly = proc () -> do
      writeLine -< "Hello! what is your name?"
      name <- readLine -< ()
      writeLine -< "Lovely to meet you, " <> name <> "!"

    mean = proc () -> do
      writeLine
        -<
          unlines
            [ "Hey doofus, what do you want?",
              "Too late. I deleted your hard-drive.",
              "How do you feel about that?"
            ]
      deleteMyHardDrive -< ()
It takes a bit of getting used to, but it's not so bad.

Here's the diagram, so we can get an idea of how it's being translated:

It's not quite as pretty, the translation introduces a lot of unnecessary calls to Parallel where it's just inserting Identity on the other side, this is perfectly valid, since the Category laws require that the Identity won't affect behaviour, but in our case it's messy and is clogging up our diagram, so let's clean it up.

The command tree we build as an intermediate step is just a value, so we can transform it to clean it up no problem.

If you derive Data and Plated for our Command and CommandTree types then we can do this with a simple transform on the tree. transform will rebuild the tree from the bottom up removing any redundant Identity nodes as it goes.
unredundify :: (Data eff) => CommandTree eff -> CommandTree eff
unredundify = transform \case
  Parallel Identity right -> right
  Parallel left Identity -> left
  Composed Identity right -> right
  Composed left Identity -> left
  other -> other
Diagramming the unredundified version looks much cleaner:

We can see here that the with multiple arms are getting collapsed into a sequence of binary branches, which is perfectly correct of course, but if you wanted to diagram it as a single branch you could rewrite the Branch constructor to have a list of options and collapse them all down with another rewrite rule. Same for Parallels of course. You can really do whatever is most useful for your use-case.

Arrow notation has its quirks, but it's still a substantial improvement over doing argument routing completely manually.

Static vs Dynamic data

It's worth a quick note on the difference between static and dynamic data with Arrows. With Applicatives, all the data needed to define an effect's behaviour was static, that is, it must be known at the time the program was constructed, though this might still be at runtime for the greater Haskell program.

With Arrows it's possible to interleave static and dynamic data, it's up to the author of the interface.

For example, if one were constructing a build-system they might have an interface like this:
class (Arrow k) => Builder k where
  dynamicReadFile :: k FilePath String
  staticReadFile :: FilePath -> k () String
dynamicReadFile takes its FilePath as a dynamic input, so we won't know which file we're going to read until execution time, however staticReadFile takes its FilePath as a static input. You pass it a single FilePath as a Haskell value when you construct the program. In this case we can embed the FilePath into the structure of the effect itself so that it's available during analysis.

While this is a bit more of an advanced use-case, it can be very useful. In the build-system case you could provide any statically known dependency files using staticReadFile and the build-system could check if those files have changed since the last run and safely replace some subtrees of the build with cached results if no dependencies in that subtree have changed.

This sort of thing takes careful thought and design, but provides a lot of flexibility which can unlock whole new programming techniques.

Folks may well have heard of Haxl, it's a Haskell library for analyzing programs and batching and caching requests to remote data sources. The implementation and interface for Haxl is moderately complex, and is limited in what it can do by the fact that it uses Monads. I'm curious how effective an Arrow-based version could be.

What's next?

We explored enough classes to enable most basic programs here. At this point you can branch, express independence between computations, and route input anywhere you need it. In case you're still hankering for a bit more expressive power we'll do a lightning quick tour of a few more classes.

There's ArrowLoop which encodes fixed-point style recursion.
class Arrow a => ArrowLoop a where
  loop :: a (b, d) (c, d) -> a b c
Interestingly, this is actually just another name for Costrong, as you can see by comparing with Costrong from the profunctors package.

If you really really need to be able to completely restructure your program on the fly you can do so using the ArrowApply class, which enables applying arbitrary runtime-created arrows.
class Arrow a => ArrowApply a where
    app :: a (a b c, b) c
This gives you the wildly expressive power to define entirely new code-paths at runtime. I'd still argue that reasonable programs that actually need to do this are pretty rare, but sometimes it's a useful shortcut to avoid some tedium. Note that if you use app, any effects within the dynamically applied arrow will be hidden from analysis, but you can still analyze the non-dynamic parts.

There are a few additional interesting classes which are strangely missing from base; but they have counterparts in profunctors. One example would be an arrow counterpart to Cochoice, which, if it existed, would look something like this:
class (Arrow k) => ArrowCochoice k where
  unright :: k (Either d a) (Either d b) -> k a b
  unleft :: k (Either a d) (Either b d) -> k a b
While the behaviour ultimately depends on the implementation, you can use this to implement things like recursive loops and while-loops, which avoids one of the more common needs for ArrowApply while preserving analysis over the contents of the loop.

There's some other good stuff in profunctors so I'd recommend just browsing around over there, (Thanks Ed). Traversing lets you apply a profunctor to elements of a Traversable container, Mapping does the same for Functors.

Anyways, you can see that most behaviours you take for granted when writing Haskell code with arbitrary functions in do-notation binds can generally be decomposed into some combination of Arrow typeclasses which accomplish the same thing. Using the principal of least-power is a good rule of thumb here. Generally you should use the lowest-power abstraction you can reasonably encode your program with, that will ensure you'll have the strongest potential for analysis.

In Summary

We've discovered that by switching from the Functor-Applicative-Monad effect system to a Category and Arrow hierarchy we can express significantly more complex and expressive programs while maintaining the ability to deeply introspect the programs we create.

We learned how we can collect additional typeclasses to gain more expressive power, and how we can implement custom instances to analyze and even diagram our programs.

Lastly we took a look at Arrow notation and how it improves the burden of syntax for writing these sorts of programs.

So, should we all abandon Monads and write everything using Arrows instead? Truthfully, I do believe they comprise a better foundation; so while the current Haskell ecosystem is all-in on Monads, if you the reader happen to be designing the effects system for a brand new functional programming language, why not give Arrows a try?

Hopefully you learned something ðŸ¤ž! Did you know I'm currently writing a book? It's all about Lenses and Optics! It takes you all the way from beginner to optics-wizard and it's currently in early access! Consider supporting it, and more posts like this one by pledging on my Patreon page! It takes quite a bit of work to put these things together, if I managed to teach your something or even just entertain you for a minute or two maybe send a few bucks my way for a coffee? Cheers! ðŸ�»
October 16, 2025 12:00 AM

October 14, 2025

Arrows to Arrows, Categories to Queries

I’ve had a little time off of work as of late, and been spending it in characteristically unwise ways. In particular, I’ve written a little programming language that compiles to SQL. I call it catlang. That’s not to say that I’ve written a new query language. It’s a programming language, whose compiler spits out one giant SELECT statement. When you run that query in postgres, you get the output of your program.

Why have I done this? Because I needed a funny compilation target to test out the actual features of the language, which is that its intermediary language is a bunch of abstract category theory nonsense. Which I’ll get to. But I’m sure you first want to see this bad boy in action.

Behold, the function that returns 100 regardless of what input you give it. But it does it with the equivalent of a while loop:

count : Int -> Int
count =
  x ->
    loop x
      i ->
        n <- join id id -< i
        z <- abs . (-) -< (n, 100)
        case z of
          inl _ -> inr . (+) -< (n, 1)
          inr _ -> inl -< n

If you’re familiar with arrow notation, you’ll notice the above looks kinda like one big proc block. This is not a coincidence (because nothing is a coincidence). I figured if I were to go through all of this work, we might as well get a working arrow desugarer out of the mix. But I digress; that’s a story for another time.

Anyway, what’s going on here is we have an arrow count, which takes a single argument x. We then loop, starting from the value of x. Inside the loop, we now have a new variable i, which we do some voodoo on to compute n—the current value of the loop variable. Then we subtract 100 from n, and take the absolute value. The abs function here is a bit odd; it returns Left (abs x) if the input was negative, and Right x otherwise. Then we branch on the output of abs, where Left and Right have been renamed inl and inr respectively. If n - 100 was less than zero, we find ourselves in the inl case, where we add 1 to n and wrap the whole thing in inr—which the loop interprets as “loop again with this new value.” Otherwise, n - 100 was non-negative, and so we can return n directly.

Is it roundabout? You bet! The obtuseness here is not directly a feature, I was just looking for conceptually simple things I could do which would be easy to desugar into category-theoretical stuff. Which brings us to the intermediary language. After desugaring the source syntax for count above, we’re left with this IL representation:

  id △ id
⨟ cochoice
    ( undist
    ⨟   ( (prj₁ ⨟ id ▽ id) △ id
          ⨟   ( prj₁ △ 100
              ⨟ (-)
              ⨟ abs
              )
            △ id
          ⨟ prj₁ △ id
          ⨟ dist
          ⨟   ( (prj₂ ⨟ prj₂ ⨟ prj₁) △ 1
              ⨟ (+)
              ⨟ inr
              )
            ▽ ( prj₂
              ⨟ prj₂
              ⨟ prj₁
              ⨟ inl
              )
        )
      △ prj₂
    ⨟ dist
    )
⨟ prj₁

We’ll discuss all of this momentarily, but for now, just let your eyes glaze over the pretty unicode.

The underlying idea here is that each of these remaining symbols has very simple and specific algebraic semantics. For example, A ⨟ B means “do A and pipe the result into B.” By giving a transformation from this categorical IL into other domains, it becomes trivial to compile catlang to all sorts of weird compilation targets. Like SQL.

You’re probably wondering what the generated SQL looks like. Take a peek if you dare.

Ungodly Compiled SQL

SELECT
f0 AS f0
FROM
(SELECT
 f0 AS f0, f1 AS f1
 FROM
 (SELECT *
  FROM
  (WITH t0 AS
   (SELECT *
    FROM
    (WITH RECURSIVE recursion AS
     (SELECT
      clock_timestamp() as step
      , *
      FROM
      (WITH t1 AS
       (SELECT *
        FROM
        (SELECT
         f0 AS f0, f1 AS f1, NULL::integer AS f2, NULL::integer AS f3
         FROM
         (WITH t2 AS
          (SELECT * FROM (SELECT 0 as f0) AS _)
          SELECT *
          FROM
          (SELECT * FROM (SELECT f0 AS f0 FROM t2 AS _) AS _
           CROSS JOIN
           (SELECT f0 AS f1 FROM t2 AS _))
          AS _)
         AS _)
        AS _)
       SELECT *
       FROM
       (WITH t3 AS
        (SELECT *
         FROM
         (-- undist
          SELECT *
          FROM
          (SELECT
           f0 AS f0, NULL::integer AS f1, f1 AS f2
           FROM
           (-- undist1
            SELECT * FROM t1 AS _ WHERE "f0" IS NOT NULL)
           AS _)
          AS _
          UNION
          SELECT *
          FROM
          (SELECT
           NULL::integer AS f0, f2 AS f1, f3 AS f2
           FROM
           (-- dist2
            SELECT * FROM t1 AS _ WHERE "f2" IS NOT NULL)
           AS _)
          AS _)
         AS _)
        SELECT *
        FROM
        (WITH t4 AS
         (SELECT *
          FROM
          (SELECT *
           FROM
           (SELECT
            f0 AS f0, f1 AS f1
            FROM
            (WITH t5 AS
             (SELECT * FROM t3 AS _)
             SELECT *
             FROM
             (WITH t6 AS
              (SELECT *
               FROM
               (SELECT *
                FROM
                (SELECT
                 f0 AS f0
                 FROM
                 (WITH t7 AS
                  (SELECT * FROM (SELECT f0 AS f0, f1 AS f1 FROM t5 AS _) AS _)
                  SELECT *
                  FROM
                  (SELECT *
                   FROM
                   (SELECT
                    f0 AS f0
                    FROM
                    (-- join1
                     SELECT * FROM t7 AS _ WHERE "f0" IS NOT NULL)
                    AS _)
                   AS _
                   UNION
                   SELECT *
                   FROM
                   (SELECT
                    f1 AS f0
                    FROM
                    (-- join2
                     SELECT * FROM t7 AS _ WHERE "f1" IS NOT NULL)
                    AS _)
                   AS _)
                  AS _)
                 AS _)
                AS _
                CROSS JOIN
                (SELECT f0 AS f1, f1 AS f2, f2 AS f3 FROM t5 AS _))
               AS _)
              SELECT *
              FROM
              (WITH t8 AS
               (SELECT *
                FROM
                (SELECT *
                 FROM
                 (SELECT
                  f0 AS f0, f1 AS f1
                  FROM
                  (WITH t9 AS
                   (SELECT *
                    FROM
                    (SELECT
                     f0 - f1 AS f0
                     FROM
                     (WITH t10 AS
                      (SELECT * FROM t6 AS _)
                      SELECT *
                      FROM
                      (SELECT *
                       FROM
                       (SELECT f0 AS f0 FROM (SELECT f0 AS f0 FROM t10 AS _) AS _)
                       AS _
                       CROSS JOIN
                       (SELECT f0 AS f1 FROM (SELECT 100 as f0 FROM t10 AS _) AS _))
                      AS _)
                     AS _)
                    AS _)
                   SELECT *
                   FROM
                   (SELECT *
                    FROM
                    (SELECT
                     abs(f0) as f0, NULL::integer as f1
                     FROM
                     t9
                     AS _
                     WHERE
                     f0 < 0)
                    AS _
                    UNION
                    SELECT *
                    FROM
                    (SELECT NULL::integer as f0, f0 as f1 FROM t9 AS _ WHERE f0 >= 0)
                    AS _)
                   AS _)
                  AS _)
                 AS _
                 CROSS JOIN
                 (SELECT f0 AS f2, f1 AS f3, f2 AS f4, f3 AS f5 FROM t6 AS _))
                AS _)
               SELECT *
               FROM
               (WITH t11 AS
                (SELECT *
                 FROM
                 (SELECT *
                  FROM
                  (SELECT
                   f0 AS f0, f1 AS f1
                   FROM
                   (SELECT f0 AS f0, f1 AS f1 FROM t8 AS _)
                   AS _)
                  AS _
                  CROSS JOIN
                  (SELECT
                   f0 AS f2, f1 AS f3, f2 AS f4, f3 AS f5, f4 AS f6, f5 AS f7
                   FROM
                   t8
                   AS _))
                 AS _)
                SELECT *
                FROM
                (WITH t12 AS
                 (SELECT *
                  FROM
                  (-- dist
                   SELECT *
                   FROM
                   (SELECT
                    f0 AS f0, f2 AS f1, NULL::integer AS f10, NULL::integer AS f11, NULL::integer AS f12, NULL::integer AS f13, f3 AS f2, f4 AS f3, f5 AS f4, f6 AS f5, f7 AS f6, NULL::integer AS f7, NULL::integer AS f8, NULL::integer AS f9
                    FROM
                    (-- dist1
                     SELECT * FROM t11 AS _ WHERE "f0" IS NOT NULL)
                    AS _)
                   AS _
                   UNION
                   SELECT *
                   FROM
                   (SELECT
                    NULL::integer AS f0, NULL::integer AS f1, f4 AS f10, f5 AS f11, f6 AS f12, f7 AS f13, NULL::integer AS f2, NULL::integer AS f3, NULL::integer AS f4, NULL::integer AS f5, NULL::integer AS f6, f1 AS f7, f2 AS f8, f3 AS f9
                    FROM
                    (-- dist2
                     SELECT * FROM t11 AS _ WHERE "f1" IS NOT NULL)
                    AS _)
                   AS _)
                  AS _)
                 SELECT *
                 FROM
                 (SELECT *
                  FROM
                  (SELECT
                   NULL::integer AS f0, f0 AS f1
                   FROM
                   (SELECT
                    f0 + f1 AS f0
                    FROM
                    (WITH t13 AS
                     (SELECT *
                      FROM
                      (SELECT
                       f0 AS f0, f1 AS f1, f2 AS f2, f3 AS f3, f4 AS f4, f5 AS f5, f6 AS f6
                       FROM
                       (-- join1
                        SELECT * FROM t12 AS _
                        WHERE
                        ("f0" IS NOT NULL) AND ((("f1" IS NOT NULL) OR ("f2" IS NOT NULL)) AND (("f3" IS NOT NULL) AND ((("f4" IS NOT NULL) OR ("f5" IS NOT NULL)) AND ("f6" IS NOT NULL)))))
                       AS _)
                      AS _)
                     SELECT *
                     FROM
                     (SELECT *
                      FROM
                      (SELECT
                       f0 AS f0
                       FROM
                       (SELECT
                        f0 AS f0
                        FROM
                        (SELECT
                         f2 AS f0, f3 AS f1, f4 AS f2, f5 AS f3
                         FROM
                         (SELECT
                          f1 AS f0, f2 AS f1, f3 AS f2, f4 AS f3, f5 AS f4, f6 AS f5
                          FROM
                          t13
                          AS _)
                         AS _)
                        AS _)
                       AS _)
                      AS _
                      CROSS JOIN
                      (SELECT f0 AS f1 FROM (SELECT 1 as f0 FROM t13 AS _) AS _))
                     AS _)
                    AS _)
                   AS _)
                  AS _
                  UNION
                  SELECT *
                  FROM
                  (SELECT
                   f0 AS f0, NULL::integer AS f1
                   FROM
                   (SELECT
                    f0 AS f0
                    FROM
                    (SELECT
                     f2 AS f0, f3 AS f1, f4 AS f2, f5 AS f3
                     FROM
                     (SELECT
                      f1 AS f0, f2 AS f1, f3 AS f2, f4 AS f3, f5 AS f4, f6 AS f5
                      FROM
                      (SELECT
                       f7 AS f0, f8 AS f1, f9 AS f2, f10 AS f3, f11 AS f4, f12 AS f5, f13 AS f6
                       FROM
                       (-- join2
                        SELECT * FROM t12 AS _
                        WHERE
                        ("f7" IS NOT NULL) AND ((("f8" IS NOT NULL) OR ("f9" IS NOT NULL)) AND (("f10" IS NOT NULL) AND ((("f11" IS NOT NULL) OR ("f12" IS NOT NULL)) AND ("f13" IS NOT NULL)))))
                       AS _)
                      AS _)
                     AS _)
                    AS _)
                   AS _)
                  AS _)
                 AS _)
                AS _)
               AS _)
              AS _)
             AS _)
            AS _)
           AS _
           CROSS JOIN
           (SELECT f0 AS f2 FROM (SELECT f2 AS f0 FROM t3 AS _) AS _))
          AS _)
         SELECT *
         FROM
         (-- dist
          SELECT *
          FROM
          (SELECT
           f0 AS f0, f2 AS f1, NULL::integer AS f2, NULL::integer AS f3
           FROM
           (-- dist1
            SELECT * FROM t4 AS _ WHERE "f0" IS NOT NULL)
           AS _)
          AS _
          UNION
          SELECT *
          FROM
          (SELECT
           NULL::integer AS f0, NULL::integer AS f1, f1 AS f2, f2 AS f3
           FROM
           (-- dist2
            SELECT * FROM t4 AS _ WHERE "f1" IS NOT NULL)
           AS _)
          AS _)
         AS _)
        AS _)
       AS _)
      AS _
      UNION ALL
      SELECT
      clock_timestamp() as step
      , *
      FROM
      (SELECT *
       FROM
       (WITH t14 AS
        (SELECT * FROM recursion AS _)
        SELECT *
        FROM
        (WITH t15 AS
         (SELECT *
          FROM
          (-- undist
           SELECT *
           FROM
           (SELECT
            f0 AS f0, NULL::integer AS f1, f1 AS f2
            FROM
            (-- undist1
             SELECT * FROM t14 AS _ WHERE "f0" IS NOT NULL)
            AS _)
           AS _
           UNION
           SELECT *
           FROM
           (SELECT
            NULL::integer AS f0, f2 AS f1, f3 AS f2
            FROM
            (-- dist2
             SELECT * FROM t14 AS _ WHERE "f2" IS NOT NULL)
            AS _)
           AS _)
          AS _)
         SELECT *
         FROM
         (WITH t16 AS
          (SELECT *
           FROM
           (SELECT *
            FROM
            (SELECT
             f0 AS f0, f1 AS f1
             FROM
             (WITH t17 AS
              (SELECT * FROM t15 AS _)
              SELECT *
              FROM
              (WITH t18 AS
               (SELECT *
                FROM
                (SELECT *
                 FROM
                 (SELECT
                  f0 AS f0
                  FROM
                  (WITH t19 AS
                   (SELECT * FROM (SELECT f0 AS f0, f1 AS f1 FROM t17 AS _) AS _)
                   SELECT *
                   FROM
                   (SELECT *
                    FROM
                    (SELECT
                     f0 AS f0
                     FROM
                     (-- join1
                      SELECT * FROM t19 AS _ WHERE "f0" IS NOT NULL)
                     AS _)
                    AS _
                    UNION
                    SELECT *
                    FROM
                    (SELECT
                     f1 AS f0
                     FROM
                     (-- join2
                      SELECT * FROM t19 AS _ WHERE "f1" IS NOT NULL)
                     AS _)
                    AS _)
                   AS _)
                  AS _)
                 AS _
                 CROSS JOIN
                 (SELECT f0 AS f1, f1 AS f2, f2 AS f3 FROM t17 AS _))
                AS _)
               SELECT *
               FROM
               (WITH t20 AS
                (SELECT *
                 FROM
                 (SELECT *
                  FROM
                  (SELECT
                   f0 AS f0, f1 AS f1
                   FROM
                   (WITH t21 AS
                    (SELECT *
                     FROM
                     (SELECT
                      f0 - f1 AS f0
                      FROM
                      (WITH t22 AS
                       (SELECT * FROM t18 AS _)
                       SELECT *
                       FROM
                       (SELECT *
                        FROM
                        (SELECT f0 AS f0 FROM (SELECT f0 AS f0 FROM t22 AS _) AS _)
                        AS _
                        CROSS JOIN
                        (SELECT f0 AS f1 FROM (SELECT 100 as f0 FROM t22 AS _) AS _))
                       AS _)
                      AS _)
                     AS _)
                    SELECT *
                    FROM
                    (SELECT *
                     FROM
                     (SELECT
                      abs(f0) as f0, NULL::integer as f1
                      FROM
                      t21
                      AS _
                      WHERE
                      f0 < 0)
                     AS _
                     UNION
                     SELECT *
                     FROM
                     (SELECT NULL::integer as f0, f0 as f1 FROM t21 AS _ WHERE f0 >= 0)
                     AS _)
                    AS _)
                   AS _)
                  AS _
                  CROSS JOIN
                  (SELECT f0 AS f2, f1 AS f3, f2 AS f4, f3 AS f5 FROM t18 AS _))
                 AS _)
                SELECT *
                FROM
                (WITH t23 AS
                 (SELECT *
                  FROM
                  (SELECT *
                   FROM
                   (SELECT
                    f0 AS f0, f1 AS f1
                    FROM
                    (SELECT f0 AS f0, f1 AS f1 FROM t20 AS _)
                    AS _)
                   AS _
                   CROSS JOIN
                   (SELECT
                    f0 AS f2, f1 AS f3, f2 AS f4, f3 AS f5, f4 AS f6, f5 AS f7
                    FROM
                    t20
                    AS _))
                  AS _)
                 SELECT *
                 FROM
                 (WITH t24 AS
                  (SELECT *
                   FROM
                   (-- dist
                    SELECT *
                    FROM
                    (SELECT
                     f0 AS f0, f2 AS f1, NULL::integer AS f10, NULL::integer AS f11, NULL::integer AS f12, NULL::integer AS f13, f3 AS f2, f4 AS f3, f5 AS f4, f6 AS f5, f7 AS f6, NULL::integer AS f7, NULL::integer AS f8, NULL::integer AS f9
                     FROM
                     (-- dist1
                      SELECT * FROM t23 AS _ WHERE "f0" IS NOT NULL)
                     AS _)
                    AS _
                    UNION
                    SELECT *
                    FROM
                    (SELECT
                     NULL::integer AS f0, NULL::integer AS f1, f4 AS f10, f5 AS f11, f6 AS f12, f7 AS f13, NULL::integer AS f2, NULL::integer AS f3, NULL::integer AS f4, NULL::integer AS f5, NULL::integer AS f6, f1 AS f7, f2 AS f8, f3 AS f9
                     FROM
                     (-- dist2
                      SELECT * FROM t23 AS _ WHERE "f1" IS NOT NULL)
                     AS _)
                    AS _)
                   AS _)
                  SELECT *
                  FROM
                  (SELECT *
                   FROM
                   (SELECT
                    NULL::integer AS f0, f0 AS f1
                    FROM
                    (SELECT
                     f0 + f1 AS f0
                     FROM
                     (WITH t25 AS
                      (SELECT *
                       FROM
                       (SELECT
                        f0 AS f0, f1 AS f1, f2 AS f2, f3 AS f3, f4 AS f4, f5 AS f5, f6 AS f6
                        FROM
                        (-- join1
                         SELECT * FROM t24 AS _
                         WHERE
                         ("f0" IS NOT NULL) AND ((("f1" IS NOT NULL) OR ("f2" IS NOT NULL)) AND (("f3" IS NOT NULL) AND ((("f4" IS NOT NULL) OR ("f5" IS NOT NULL)) AND ("f6" IS NOT NULL)))))
                        AS _)
                       AS _)
                      SELECT *
                      FROM
                      (SELECT *
                       FROM
                       (SELECT
                        f0 AS f0
                        FROM
                        (SELECT
                         f0 AS f0
                         FROM
                         (SELECT
                          f2 AS f0, f3 AS f1, f4 AS f2, f5 AS f3
                          FROM
                          (SELECT
                           f1 AS f0, f2 AS f1, f3 AS f2, f4 AS f3, f5 AS f4, f6 AS f5
                           FROM
                           t25
                           AS _)
                          AS _)
                         AS _)
                        AS _)
                       AS _
                       CROSS JOIN
                       (SELECT f0 AS f1 FROM (SELECT 1 as f0 FROM t25 AS _) AS _))
                      AS _)
                     AS _)
                    AS _)
                   AS _
                   UNION
                   SELECT *
                   FROM
                   (SELECT
                    f0 AS f0, NULL::integer AS f1
                    FROM
                    (SELECT
                     f0 AS f0
                     FROM
                     (SELECT
                      f2 AS f0, f3 AS f1, f4 AS f2, f5 AS f3
                      FROM
                      (SELECT
                       f1 AS f0, f2 AS f1, f3 AS f2, f4 AS f3, f5 AS f4, f6 AS f5
                       FROM
                       (SELECT
                        f7 AS f0, f8 AS f1, f9 AS f2, f10 AS f3, f11 AS f4, f12 AS f5, f13 AS f6
                        FROM
                        (-- join2
                         SELECT * FROM t24 AS _
                         WHERE
                         ("f7" IS NOT NULL) AND ((("f8" IS NOT NULL) OR ("f9" IS NOT NULL)) AND (("f10" IS NOT NULL) AND ((("f11" IS NOT NULL) OR ("f12" IS NOT NULL)) AND ("f13" IS NOT NULL)))))
                        AS _)
                       AS _)
                      AS _)
                     AS _)
                    AS _)
                   AS _)
                  AS _)
                 AS _)
                AS _)
               AS _)
              AS _)
             AS _)
            AS _
            CROSS JOIN
            (SELECT f0 AS f2 FROM (SELECT f2 AS f0 FROM t15 AS _) AS _))
           AS _)
          SELECT *
          FROM
          (-- dist
           SELECT *
           FROM
           (SELECT
            f0 AS f0, f2 AS f1, NULL::integer AS f2, NULL::integer AS f3
            FROM
            (-- dist1
             SELECT * FROM t16 AS _ WHERE "f0" IS NOT NULL)
            AS _)
           AS _
           UNION
           SELECT *
           FROM
           (SELECT
            NULL::integer AS f0, NULL::integer AS f1, f1 AS f2, f2 AS f3
            FROM
            (-- dist2
             SELECT * FROM t16 AS _ WHERE "f1" IS NOT NULL)
            AS _)
           AS _)
          AS _)
         AS _)
        AS _)
       AS _
       WHERE
       ("f2" IS NOT NULL) AND ("f3" IS NOT NULL))
      AS _)
     SELECT * FROM recursion ORDER BY step DESC LIMIT 1)
    AS _)
   SELECT *
   FROM
   (WITH t26 AS
    (SELECT *
     FROM
     (-- undist
      SELECT *
      FROM
      (SELECT
       f0 AS f0, NULL::integer AS f1, f1 AS f2
       FROM
       (-- undist1
        SELECT * FROM t0 AS _ WHERE "f0" IS NOT NULL)
       AS _)
      AS _
      UNION
      SELECT *
      FROM
      (SELECT
       NULL::integer AS f0, f2 AS f1, f3 AS f2
       FROM
       (-- dist2
        SELECT * FROM t0 AS _ WHERE "f2" IS NOT NULL)
       AS _)
      AS _)
     AS _)
    SELECT *
    FROM
    (WITH t27 AS
     (SELECT *
      FROM
      (SELECT *
       FROM
       (SELECT
        f0 AS f0, f1 AS f1
        FROM
        (WITH t28 AS
         (SELECT * FROM t26 AS _)
         SELECT *
         FROM
         (WITH t29 AS
          (SELECT *
           FROM
           (SELECT *
            FROM
            (SELECT
             f0 AS f0
             FROM
             (WITH t30 AS
              (SELECT * FROM (SELECT f0 AS f0, f1 AS f1 FROM t28 AS _) AS _)
              SELECT *
              FROM
              (SELECT *
               FROM
               (SELECT
                f0 AS f0
                FROM
                (-- join1
                 SELECT * FROM t30 AS _ WHERE "f0" IS NOT NULL)
                AS _)
               AS _
               UNION
               SELECT *
               FROM
               (SELECT
                f1 AS f0
                FROM
                (-- join2
                 SELECT * FROM t30 AS _ WHERE "f1" IS NOT NULL)
                AS _)
               AS _)
              AS _)
             AS _)
            AS _
            CROSS JOIN
            (SELECT f0 AS f1, f1 AS f2, f2 AS f3 FROM t28 AS _))
           AS _)
          SELECT *
          FROM
          (WITH t31 AS
           (SELECT *
            FROM
            (SELECT *
             FROM
             (SELECT
              f0 AS f0, f1 AS f1
              FROM
              (WITH t32 AS
               (SELECT *
                FROM
                (SELECT
                 f0 - f1 AS f0
                 FROM
                 (WITH t33 AS
                  (SELECT * FROM t29 AS _)
                  SELECT *
                  FROM
                  (SELECT *
                   FROM
                   (SELECT f0 AS f0 FROM (SELECT f0 AS f0 FROM t33 AS _) AS _)
                   AS _
                   CROSS JOIN
                   (SELECT f0 AS f1 FROM (SELECT 100 as f0 FROM t33 AS _) AS _))
                  AS _)
                 AS _)
                AS _)
               SELECT *
               FROM
               (SELECT *
                FROM
                (SELECT
                 abs(f0) as f0, NULL::integer as f1
                 FROM
                 t32
                 AS _
                 WHERE
                 f0 < 0)
                AS _
                UNION
                SELECT *
                FROM
                (SELECT NULL::integer as f0, f0 as f1 FROM t32 AS _ WHERE f0 >= 0)
                AS _)
               AS _)
              AS _)
             AS _
             CROSS JOIN
             (SELECT f0 AS f2, f1 AS f3, f2 AS f4, f3 AS f5 FROM t29 AS _))
            AS _)
           SELECT *
           FROM
           (WITH t34 AS
            (SELECT *
             FROM
             (SELECT *
              FROM
              (SELECT
               f0 AS f0, f1 AS f1
               FROM
               (SELECT f0 AS f0, f1 AS f1 FROM t31 AS _)
               AS _)
              AS _
              CROSS JOIN
              (SELECT
               f0 AS f2, f1 AS f3, f2 AS f4, f3 AS f5, f4 AS f6, f5 AS f7
               FROM
               t31
               AS _))
             AS _)
            SELECT *
            FROM
            (WITH t35 AS
             (SELECT *
              FROM
              (-- dist
               SELECT *
               FROM
               (SELECT
                f0 AS f0, f2 AS f1, NULL::integer AS f10, NULL::integer AS f11, NULL::integer AS f12, NULL::integer AS f13, f3 AS f2, f4 AS f3, f5 AS f4, f6 AS f5, f7 AS f6, NULL::integer AS f7, NULL::integer AS f8, NULL::integer AS f9
                FROM
                (-- dist1
                 SELECT * FROM t34 AS _ WHERE "f0" IS NOT NULL)
                AS _)
               AS _
               UNION
               SELECT *
               FROM
               (SELECT
                NULL::integer AS f0, NULL::integer AS f1, f4 AS f10, f5 AS f11, f6 AS f12, f7 AS f13, NULL::integer AS f2, NULL::integer AS f3, NULL::integer AS f4, NULL::integer AS f5, NULL::integer AS f6, f1 AS f7, f2 AS f8, f3 AS f9
                FROM
                (-- dist2
                 SELECT * FROM t34 AS _ WHERE "f1" IS NOT NULL)
                AS _)
               AS _)
              AS _)
             SELECT *
             FROM
             (SELECT *
              FROM
              (SELECT
               NULL::integer AS f0, f0 AS f1
               FROM
               (SELECT
                f0 + f1 AS f0
                FROM
                (WITH t36 AS
                 (SELECT *
                  FROM
                  (SELECT
                   f0 AS f0, f1 AS f1, f2 AS f2, f3 AS f3, f4 AS f4, f5 AS f5, f6 AS f6
                   FROM
                   (-- join1
                    SELECT * FROM t35 AS _
                    WHERE
                    ("f0" IS NOT NULL) AND ((("f1" IS NOT NULL) OR ("f2" IS NOT NULL)) AND (("f3" IS NOT NULL) AND ((("f4" IS NOT NULL) OR ("f5" IS NOT NULL)) AND ("f6" IS NOT NULL)))))
                   AS _)
                  AS _)
                 SELECT *
                 FROM
                 (SELECT *
                  FROM
                  (SELECT
                   f0 AS f0
                   FROM
                   (SELECT
                    f0 AS f0
                    FROM
                    (SELECT
                     f2 AS f0, f3 AS f1, f4 AS f2, f5 AS f3
                     FROM
                     (SELECT
                      f1 AS f0, f2 AS f1, f3 AS f2, f4 AS f3, f5 AS f4, f6 AS f5
                      FROM
                      t36
                      AS _)
                     AS _)
                    AS _)
                   AS _)
                  AS _
                  CROSS JOIN
                  (SELECT f0 AS f1 FROM (SELECT 1 as f0 FROM t36 AS _) AS _))
                 AS _)
                AS _)
               AS _)
              AS _
              UNION
              SELECT *
              FROM
              (SELECT
               f0 AS f0, NULL::integer AS f1
               FROM
               (SELECT
                f0 AS f0
                FROM
                (SELECT
                 f2 AS f0, f3 AS f1, f4 AS f2, f5 AS f3
                 FROM
                 (SELECT
                  f1 AS f0, f2 AS f1, f3 AS f2, f4 AS f3, f5 AS f4, f6 AS f5
                  FROM
                  (SELECT
                   f7 AS f0, f8 AS f1, f9 AS f2, f10 AS f3, f11 AS f4, f12 AS f5, f13 AS f6
                   FROM
                   (-- join2
                    SELECT * FROM t35 AS _
                    WHERE
                    ("f7" IS NOT NULL) AND ((("f8" IS NOT NULL) OR ("f9" IS NOT NULL)) AND (("f10" IS NOT NULL) AND ((("f11" IS NOT NULL) OR ("f12" IS NOT NULL)) AND ("f13" IS NOT NULL)))))
                   AS _)
                  AS _)
                 AS _)
                AS _)
               AS _)
              AS _)
             AS _)
            AS _)
           AS _)
          AS _)
         AS _)
        AS _)
       AS _
       CROSS JOIN
       (SELECT f0 AS f2 FROM (SELECT f2 AS f0 FROM t26 AS _) AS _))
      AS _)
     SELECT *
     FROM
     (-- dist
      SELECT *
      FROM
      (SELECT
       f0 AS f0, f2 AS f1, NULL::integer AS f2, NULL::integer AS f3
       FROM
       (-- dist1
        SELECT * FROM t27 AS _ WHERE "f0" IS NOT NULL)
       AS _)
      AS _
      UNION
      SELECT *
      FROM
      (SELECT
       NULL::integer AS f0, NULL::integer AS f1, f1 AS f2, f2 AS f3
       FROM
       (-- dist2
        SELECT * FROM t27 AS _ WHERE "f1" IS NOT NULL)
       AS _)
      AS _)
     AS _)
    AS _)
   AS _)
  AS _
  WHERE
  ("f0" IS NOT NULL) AND ("f1" IS NOT NULL))
 AS _)
AS _;

It’s not pretty, rather amazingly, running the above query in postgres 17 will in fact return a single row with a single column whose value is 100. And you’d better believe it does it by actually looping its way up to 100. If you don’t believe me, make the following change:

-     SELECT * FROM recursion ORDER BY step DESC LIMIT 1)
+     SELECT * FROM recursion ORDER BY step DESC)

which will instead return a row for each step of the iteration.

There are some obvious optimizations I could make to the generated SQL, but it didn’t seem worth my time, since that’s not the interesting part of the project.

What the Hell Is Going On?

Let’s take some time to discuss the underlying category theory here. I am by no means an expert, but what I have learned after a decade of bashing my head against this stuff is that a little goes a long way.

For our intents and purposes, we have types, and arrows (functions) between types. We always have the identity “do nothing arrow” id:

id  :: a ~> a

and we can compose arrows by lining up one end to another:¹

(⨟) :: (a ~> b) -> (b ~> c) -> (a ~> c)

Unlike Haskell (or really any programming language, for that matter), we DO NOT have the notion of function application. That is, there is no arrow:

-- doesn't exist!
($) :: (a ~> b) -> a -> b

You can only compose arrows, you can’t apply them. That’s why we call these things “arrows” rather than “functions.”

There are a bundle of arrows for working with product types. The two projection functions correspond to fst and snd, taking individual components out of pairs:

prj₁ :: (a, b) ~> a
prj₂ :: (a, b) ~> b

How do we get things into pairs in the first place? We can use the “fork” operation, which takes two arrows computing b and c, and generates a new arrow which generates a pair of (b, c):

(△)  :: (a ~> b) -> (a ~> c) -> (a ~> (b, c))

If you’re coming from a Haskell background, it’s tempting to think of this operation merely as the (,) pair constructor. But you’ll notice from the type of the computation that there can be no data dependency between b and c, thus we are free to parallelize each side of the pair.

In category theory, the distinction between left and right sides of an arrow is rather arbitrary. This gives rise to a notion called duality where we can flip the arrows around, and get cool new behavior. If we dualize all of our product machinery, we get the coproduct machinery, where a coproduct of a and b is “either a or b, but definitely not both nor neither.”

Swapping the arrow direction of prj₁ and prj₂, and replacing (,) with Either gives us the following injections:

inl :: a ~> Either a b
inr :: b ~> Either a b

and the following “join” operation for eliminating coproducts:

(▽) :: (a ~> c) -> (b ~> c) -> (Either a b ~> c)

Again, coming from Haskell this is just the standard either function. It corresponds to a branch between one of two cases.

As you can see, with just these eight operations, we already have a tremendous amount of expressivity. We can express data dependencies via ⨟ and branching via ▽. With △ we automatically encode opportunities for parallelism, and gain the ability to build complicated data structures, with prj₁ and prj₂ allowing us to get the information back out of the data structures.

You’ll notice in the IL that there are no variable names anywhere to be found. The desugaring of the source language builds a stack (via the something to allocate △ id pattern), and replaces subsequent variable lookups with a series of projections on the stack to find the value again. On one hand, this makes the categorical IL rather hard to read, but it makes it very easy to re-target! Many domains do have a notion of grouping, but don’t have a native notion of naming.

For example, in an electronic circuit, I can have a ribbon of 32 wires which represents an Int32. If I have another ribbon of 32 wires, I can trivially route both wires into a 64-wire ribbon corresponding to a pair of (Int32, Int32).

By eliminating names before we get to the IL, it means no compiler backend ever needs to deal with names. They can just work on a stack representation, and are free to special-case optimize series of projections if they are able to.

Of particular interest to this discussion is how we desugar loops in catlang. The underlying primitive is cochoice:

cochoice :: (Either a c ~> Either b c) -> (a ~> b)

which magically turns an arrow on Eithers into an arrow without the eithers. We obviously must run that arrow on eithers. If that function returns inl, then we’re happy and we can just output that. But if the function returns inr, we have no choice but to pass it back in to the eithered arrow. In Haskell, cochoice is implemented as:

cochoiceHask :: (Either a c -> Either b c) -> a -> c
cochoiceHask f = go . Left
  where
    go :: Either a c -> b
    go eac =
      case f eac of
        Left b -> b
        Right c -> go (Right c)

which as you can see, will loop until f finally returns a Left. What’s neat about this formulation of a loop is that we can statically differentiate between our first and subsequent passes through the loop body. The first time through eac is Left, while for all other times it is Right. We don’t take advantage of it in the original count program, but how many times have you written loop code that needs to initialize something its first time through?

Compiling to SQL

So that’s the underlying theory behind the IL. How can we compile this to SQL now?

As alluded to before, we simply need to give SQL implementations for each of the operations in the intermediary language. As a simple example, id compiles to SELECT * FROM {}, where {} is the input of the arrow.

The hardest part here was working out a data representation. It seems obvious to encode each element of a product as a new column, but what do we do about coproducts? After much work thought, I decided to flatten out the coproducts. So, for example, the type:

(Int, Either Int Int)

would be represented as three columns:

( f1 INT NOT NULL
, f2 INT
, f3 INT
)

with the constraint that exactly one of f2 or f3 would be IS NOT NULL at any given point in time.

With this hammered out, almost everything else is pretty trivial. Composition corresponds to a nested query. Forks are CROSS JOINs which concatenate the columns of each sub-query. Joins are UNIONs, where we add a WHERE field IS NOT NULL clause to enforce we’re looking at the correct coproduct constructor.

Cochoice is the only really tricky thing, but it corresponds to a recursive CTE. Generating a recursive CTE table for the computation isn’t too hard, but getting the final value out of it was surprisingly tricky. The semantics of SQL tables is that they are multisets and come with an arbitrary greatest element. Which is to say, you need an column structured in a relevant way in order to query the final result. Due to some quirks in what postgres accepts, and in how I structured my queries, it was prohibitively hard to insert a “how many times have I looped” column and order by that. So instead I cheated and added a clock_timestamp() as step column which looks at the processor clock and ordered by that.

This is clearly a hack, and presumably will cause problems if I ever add some primitives which generate more than one row, but again, this is just for fun and who cares. Send me a pull request if you’re offended by my chicanery!

Stupid Directions To Go In the Future

I’ve run out of vacation time to work on this project, so I’m probably not going to get around to the meta-circular stupidity I was planning.

The compiler still needs a few string-crunching primitives (which are easy to add), but then it would be simple to write a little brainfuck interpreter in catlang. Which I could then compile to SQL. Now we’ve got a brainfuck interpreter running in postgres. Of course, this has been done by hand before, but to my knowledge, never via compilation.

There exist C to brainfuck compilers. And postgres is written in C. So in a move that would make Xzibit proud, we could run postgres in postgres. And of course, it would be fun to run brainfuck in brainfuck. That’d be a cool catlang backend if someone wanted to contribute such a thing.

Notes and Due Diligence and What Have You

I am not the first person to do anything like this. The source language of catlang is heavily inspired by Haskell’s arrow syntax, which in turn is essentially a desugaring algorithm for Arrows. Arrows are slightly the wrong abstraction because they require an operation arr :: (a -> b) -> (a ~> b)—which requires you to be able to embed Haskell functions in your category, something which is almost never possible.

Unfortunately, arrow syntax in Haskell desugars down to arr for almost everything it does, which in turn makes arrow notation effectively useless. In an ideal world, everything I described in this blog post would be a tiny little Haskell library, with arrow notation doing the heavy lifting. But that is just not the world we live in.

Nor am I the first person to notice that there are categorical semantics behind programming languages. I don’t actually know whom to cite on this one, but it is well-established folklore that the lambda calculus corresponds to cartesian-closed categories. The “closed” part of “cartesian-closed” means we have an operation eval :: (a ~> b, a) ~> b, but everyone and their dog has implemented the lambda calculus, so I thought it would be fun to see how far we can get without it. This is not a limitation on catlang’s turing completeness (since cochoice gives us everything we need.)

I’ve been thinking about writing a category-first programming language for the better part of a decade, ever since I read Compiling to Categories. That paper takes Haskell and desugars it back down to categories. I stole many of the tricks here from that paper.

Anyway. All of the code is available on github if you’re interested in taking a look. The repo isn’t up to my usual coding standards, for which you have my apologies. Of note is the template-haskell backend which can spit out Haskell code; meaning it wouldn’t be very hard to make a quasiquoter to compile catlang into what Haskell’s arrow desugaring ought to be. If there’s enough clamor for such a thing, I’ll see about turning this part into a library.

When looking at the types of arrows in this essay, we make the distinction that ~> are arrows that we can write in catlang, while -> exist in the metatheory.↩︎

October 14, 2025 02:31 PM

October 13, 2025

Well-Typed.Com

Verifying and testing timeliness constraints with io-sim
Testing and verifying concurrent systems is hard due to their non-deterministic nature — verifying behavior that changes with each execution is difficult. Race conditions thrive in the non-deterministic world of thread scheduling. Even more challenging is verifying timeliness constraints, i.e. ensuring that operations complete within specified deadlines or that service guarantees are maintained under load. Traditional testing approaches struggle with concurrency, and mocking strategies often fail to capture the subtle interactions between threads, time, and shared state that cause real production failures.

The io-sim Haskell library, developed by Well-Typed in partnership with engineers from IOG and Quviq, offers a compelling solution to this problem. The library provides a pure simulation environment for IO computations, enabling deterministic execution of concurrent code with accurate time simulation and detailed execution traces. Unlike other testing approaches, with io-sim one is able to write highly concurrent, real-time systems and verify their timeliness constraints in a deterministic manner, by accurately simulating GHC’s runtime system (e.g. asynchronous exceptions, timeouts & delays, etc.).

This blog post introduces and explores io-sim through a practical example: debugging an elevator controller that violates its response time requirements.

There’s also this great blog post announcing io-sim and it goes a bit more into detail about its features!

The Problem

Consider a simple elevator located in a three-floor building (ground, first , second). It takes roughly 1 second for the elevator to go up and down between each floor. The service requirement is: no passenger should wait more than 4 seconds from pressing the call button until the elevator doors open at their floor. It should be possible to test and verify this requirement when writing our elevator controller.

This ensures a reasonable quality of service and prevents frustration. Given the short distance between floors, 4 seconds is sensible. In the worst case, the elevator must travel from ground to second floor and back again.

Here’s a first attempt at modelling the system. Let’s start with the core data structures:
data Direction = Up | Down | None
  deriving (Eq, Show)

data Floor = Ground | First | Second
  deriving (Eq, Ord, Show, Enum)

data ElevatorState = ElevatorState
  { currentFloor :: Floor
  , moving       :: Direction
  , requests     :: [Floor]
  } deriving (Show)
The elevator’s state tracks three things: where it currently is, which direction it’s moving (if any), and a queue of floor requests.

The system has two main components that run concurrently:

An elevator controller that continuously processes the request queue

Button press handler that adds new floor requests

Let’s look at the controller first:
-- | Initialize an empty elevator state.
--
-- The elevator starts on the ground floor
--
initElevator :: IO (TVar ElevatorState)
initElevator = newTVarIO $ ElevatorState Ground None []

-- | Elevator controller logic.
--
-- 1. Read the current 'ElevatorState'
-- 2. Check if there are any requested floors
-- 3.
--    3.1. Block waiting for new requests if there aren't any
--    3.2. If there any requests, move to the floor at the top of the queue.
--
-- Straightforward FIFO elevator algorithm.
--
elevatorController :: TVar ElevatorState -> IO ()
elevatorController elevatorVar = forever $ do
  -- Atomically get the next floor from the queue
  (nextFloor, dir) <- atomically $ do
    state <- readTVar elevatorVar
    case requests state of
      []               -> retry  -- Block until a request arrives
      (targetFloor:rs) -> do
        -- Remove the floor from queue and start moving
        let direction = getDirection (currentFloor state) targetFloor
        writeTVar elevatorVar $ state
          { moving = direction, requests = rs }
        return (targetFloor, direction)

  putStrLn ("Going " ++ show dir ++ " to " ++ show nextFloor)
  moveToFloor elevatorVar nextFloor
The moveToFloor function simulates the physical movement of the elevator.
moveToFloor :: TVar ElevatorState -> Floor -> IO ()
moveToFloor elevatorVar targetFloor = do
  elevatorState <- readTVarIO elevatorVar
  when (currentFloor elevatorState /= targetFloor) $ do
    -- Takes 1 second to move between floors
    threadDelay (1000000 * numberOfFloorsToMove)
    atomically $
      modifyTVar elevatorVar (\elevatorState' ->
        elevatorState' { currentFloor = targetFloor
                       , moving       = Idle
                       }
        )
  putStrLn ("Arrived at " ++ show targetFloor)
The buttonPress function handles both external calls (someone waiting for the elevator) and internal requests (someone inside selecting a destination):
-- | Whenever a button is pressed this function is called.
--
-- There are two scenarios when a button is pressed:
--
-- 1. When a person is calling the elevator to a floor in order to enter it.
-- 2. When a person is inside the elevator and wants to instruct the elevator
--    to go to a particular floor.
--
buttonPress :: TVar ElevatorState -> Floor -> IO ()
buttonPress elevatorVar floor = do
  putStrLn ("Pressing button to " ++ show floor)
  atomically $
    modifyTVar elevatorVar $ \state -> do
      case requests state of
        rs@(nextFloor:_)
          | let mostRecentRequestedFloor = last rs
          ,    nextFloor /= floor
            || mostRecentRequestedFloor /= floor ->
            state { requests = rs ++ [floor] }
          | otherwise -> state
        [] -> state { requests = [floor] }
Consider the following example scenario and timeline:

The elevator starts on the ground floor.

Person A is on the first floor and presses the button to call the elevator to the first floor.

While the elevator is going up, Person B arrives on the ground floor calls it to the ground floor.

Elevator arrives at the first floor.

Person A enters and presses the button to go to the second floor.

Elevator goes to the ground floor to pick up Person B.

Person B enters and presses the button to go to the first floor.

Elevator goes to the second floor.

Elevator goes to the first floor.
-- | This example mimicks the scenario above, pressing buttons in the right
-- order.
elevatorExample :: [Floor] -> IO ()
elevatorExample floors = do
  elevator <- initElevator
  withAsync (elevatorController elevator)
    $ \controllerAsync -> do
        -- Simulate multiple people pressing buttons simultaneously
        forConcurrently_ floors (buttonPress elevator)
        threadDelay (10 * 1000000)
        cancel controllerAsync

elevatorExample [First, Ground, Second, First]
This function spawns the elevator controller and then simulates multiple button presses happening concurrently. Let’s trace through our example:
Pressing button to First
Going Up to First
Pressing button to Ground
Pressing button to Second
Pressing button to First
Arrived at First
Going Down to Ground
Arrived at Ground
Going Up to Second
Arrived at Second
Going Down to First
Arrived at First
Does such a simple implementation adhere to the specified time constraints? The answer is no, a FIFO elevator algorithm is easy to implement but can be inefficient if the requests are spread out across floors, leading to more travel time.

How would one go about to test/verify this? Testing timeliness constraints in concurrent IO is tricky, due to its non-deterministic nature and limited observability.

io-sim: Deterministic IO Simulator

io-sim closes the gap between the code that’s actually run in production and the code that runs in tests. Combined with property based testing techniques it is possible to simulate execution of a program for years worth of simulated time and find reproducible, rare edge-case bugs.

io-sim achieves this by taking advantage of the io-classes set of packages, which offers a class-based API compatible with most of the core Haskell packages, including mtl. In general the APIs follow the base or async

io-sim is a time based, discrete event simulator. Which means, it provides a granular execution trace that can be used from inspecting the commit order of STM transactions to validating a high level, temporal logic property over some abstract trace. The best part is that code requires minimal changes to use io-sim, just polymorphic type signatures that work with both IO and IOSim monads. Here’s the elevator controller code refactored for testing with io-sim:
initElevator :: MonadSTM m => m (TVar m ElevatorState)
initElevator = ...

elevatorController
  :: ( MonadSTM m
     , MonadDelay m
     , MonadSay m
     )
  => TVar m ElevatorState -> m ()
elevatorController elevatorVar =
  ...
  say ("Going " ++ show dir ++ " to " ++ show nextFloor)
  ...

moveToFloor
  :: ( MonadSTM m
     , MonadDelay m
     , MonadSay m
     )
  => TVar m ElevatorState -> Floor -> m ()
moveToFloor elevatorVar targetFloor = do
  ...
  say ("Arrived at " ++ show targetFloor)

getDirection :: Floor -> Floor -> Direction
getDirection from to = ...

buttonPress
  :: ( MonadSTM m
     , MonadSay m
     )
  => TVar m ElevatorState -> Floor -> m ()
buttonPress elevatorVar floor = do
  say ("Pressing button to " ++ show floor)
  ...

elevatorExample
  :: ( MonadSTM m
     , MonadAsync m
     , MonadDelay m
     , MonadSay m
     )
  => [Floor]
  -> m ()
elevatorExample floors = ...
Notice that only type signatures and IO operations needed changes. The core business logic remains identical. When instantiated to IO, say becomes putStrLn, but in the IOSim monad it produces traceable events.
main :: IO ()
main = do
  let simpleExample :: [Floor]
      simpleExample = [First, Ground, Second, First]

  -- Runs the 'elevatorExample' in IO. This outputs exactly the same output
  -- as before
  elevatorExample simpleExample

  -- Runs the 'elevatorExample' in IOSim.
  putStrLn . intercalate "\n"
           . map show
           . selectTraceEventsSayWithTime
           -- ^ Extracts only the 'say' events from the 'SimTrace' and
           -- attaches the timestamp for each event
           --
           -- selectTraceEventsSayWithTime :: SimTrace a -> [(Time, String)]
           --
           -- This function takes a 'SimTrace' and filters all 'EventSay'
           -- traces. It also captures the time of the trace event.
           $ runSimTrace (elevatorExample simpleExample)
           -- ^ Runs example in 'IOSim'
           --
           -- runSimTrace :: (forall s. IOSim s a) -> SimTrace a
           --
           -- This function runs a IOSim program, yielding an execution trace.
Running the program above, the first noticeable thing is that when the program runs in IO, it actually takes 10 real seconds to run due to the threadDelay calls. However, when the program runs in IOSim the output is instantaneous. This is because io-sim operates on simulated time rather than wall-clock time, i.e. only the internal clock advances when threads execute time-dependent operations like threadDelay or timeouts. Between these operations, the simulation executes as if it had infinite CPU speed, i.e. all computations at a given timestamp complete instantly, yet remain sequentially ordered and deterministic.
(Time 0s,"Pressing button to First")
(Time 0s,"Going Up to First")
(Time 0s,"Pressing button to Ground")
(Time 0s,"Pressing button to Second")
(Time 0s,"Pressing button to First")
(Time 1s,"Arrived at First")
(Time 1s,"Going Down to Ground")
(Time 2s,"Arrived at Ground")
(Time 2s,"Going Up to Second")
(Time 4s,"Arrived at Second")
(Time 4s,"Going Down to First")
(Time 5s,"Arrived at First")
This particular scenario doesn’t violate the constraint. To find violations, property-based testing can explore the space of possible request patterns. The only problem is that our say traces are strings which is not a very functional way of tracing things.

contra-tracer: Structured Tracing

While say provides basic, string-based tracing, real systems need structured tracing of domain-specific events. String-based logging quickly becomes inadequate when trying to verify complex properties or analyze system behavior programmatically. Tracing strongly-typed events that can be filtered, analyzed, and used in property tests is much better. The contra-tracer library provides a contravariant tracing abstraction that integrates seamlessly with io-sim.

The key advantages of structured tracing:

Type Safety: Events are strongly typed, preventing typos and logging errors.

Composability: Tracers can be filtered, transformed, and combined.

Testability: Events can be programmatically analyzed in tests.

All one needs to do is to have a custom trace type:
data ElevatorTrace = ButtonPress Floor
                   | Going Direction Floor
                   | ArrivedAt Floor
                   deriving (Eq, Show, Typeable)
And substitute all calls to say for traceWith tracer (ButtonPress floor), for example.

With structured tracing in place, extracting and analyzing traces becomes type-safe and straightforward:
-- | Extract typed elevator events with timestamps

extractElevatorEvents :: SimTrace a -> [(Time, ElevatorTrace)]
extractElevatorEvents =
  selectTraceEventsDynamicWithTime
Property-Based Testing: Verifying Timing Constraints

The elevator system began with a clear requirement: no passenger should wait more than 4 seconds. The FIFO implementation seemed reasonable, but the elevator can end up travelling between the bottom and top floors whilst someone in the middle waits their turn.

With typed traces from contra-tracer and deterministic simulation from io-sim, QuickCheck can systematically explore the space of possible request patterns and verify this property.

To verify our timing constraint, we need to:

Generate random sequences of floor requests

Run each sequence through the elevator simulation

Check that every passenger gets service within 4 seconds

Let’s start with the test data generation:
-- | 'Floor' Arbitrary instance.
--
-- Randomly generate floors. The shrink instance is the most important here
-- since it will be responsible for generating a simpler counterexample.
--
instance Arbitrary Floor where
  arbitrary = elements [Ground, First, Second]
  shrink Second = [Ground, First]
  shrink First  = [Ground]
  shrink Ground   = []
The shrinking strategy is important because when QuickCheck finds a failing case with many floors, it will try simpler combinations to find the minimal reproduction of the original input.

To verify the property that no passenger waits more than 4 seconds for the elevator to arrive to its floor, one needs to track the button presses and measure how long until the elevator arrives.

The property works by maintaining a map of pending requests. Each ButtonPress adds an entry (keeping the earliest if multiple people request the same floor), and each ArrivedAt checks if that floor was requested and whether the wait exceeded 4 seconds:
-- Traverse the event trace and check if there is any gap longer than 4s
-- between requests and the elevator arriving at the request's floor.
--
violatesFourSecondRule :: [(Time, ElevatorTrace)] -> Property
violatesFourSecondRule events = counterexample (intercalate "\n" $ map show events)
                              $ checkViolations events Map.empty
  where
    checkViolations :: [(Time, ElevatorTrace)] -> Map Floor DiffTime -> Property
    -- Fail if there are pending requests
    checkViolations [] pending =
      counterexample ("Elevator never arrived at: " ++ show pending)
                     (Map.null pending)
    checkViolations ((Time t, event):rest) pending =
      case event of
        -- Add request to the pending requests map. Note that if there's
        -- already a request for a particular floor, overwriting the
        -- timestamp is not the right thing to do because there's an older
        -- request that shouldn't be forgotten.
        --
        ButtonPress floor ->
          checkViolations rest (Map.alter (maybe (Just t) Just) floor pending)

        -- The elevator arrived at a floor. Check if it took more than 4
        -- seconds to do so. If not continue and remove the request from
        -- the pending map.
        --
        ArrivedAt floor ->
          case Map.lookup floor pending of
            Nothing ->
              checkViolations rest pending
            Just requestTime
              | let time = t - requestTime
                counterexample (  "Passenger waited "
                               ++ show time
                               ++ " for the elevator to arrive to the "
                               ++ show floor
                               ++ " floor"
                               ) False
              | otherwise -> checkViolations rest (Map.delete floor pending)

        _ -> checkViolations rest pending
Then it is just a matter of running the example for randomly generated inputs, extract the trace and use QuickCheck to assert if the property is true or not.
prop_no_passenger_waits_4_seconds :: [Floor] -> Property
prop_no_passenger_waits_4_seconds floors =
  -- Run the button press sequence and get the execution trace
  --
  let trace = extractElevatorEvents
            $ runSimTrace
            $ elevatorExample (Tracer (emit traceM)) floors

   in violatesFourSecondRule trace

  where
Running this property, QuickCheck quickly finds a counterexample:
*** Failed! Falsified (after 8 tests and 2 shrinks):
[Second,Ground,First]
(Time 0s,ButtonPress Second)
(Time 0s,Going Up Second)
(Time 0s,ButtonPress Ground)
(Time 0s,ButtonPress First)
(Time 2s,ArrivedAt Second)
(Time 2s,Going Down Ground)
(Time 4s,ArrivedAt Ground)
(Time 4s,Going Up First)
(Time 5s,ArrivedAt First)
Passenger waited 5s for the elevator to arrive to the First floor
The counterexample is minimal thanks to QuickCheck’s shrinking. Here, one can imagine three passengers, pressing a button almost at the same time. Since the elevator starts on the Ground floor and it is the Second floor passenger that wins the race, the elevator starts going to the Second floor and queues the Ground and the First floor requests, by this order. It then takes 5 seconds in total for the elevator to arrive at the First floor, violating the timeliness requirement.

With property test in place, it is possible to iterate on better algorithms with confidence. prop_no_passenger_waits_4_seconds property will be able to assert if any of the improvements actually meet the timing requirements.

Using io-sim in the Real World

Real systems don’t explicitly block, they perform actual work that takes time. To make such code testable with io-sim, one can introduce a typeclass abstraction (e.g. MonadElevator m) with methods like moveElevator. In production, this would perform real hardware control; in tests, it would use threadDelay to simulate the operation’s duration.

In this elevator example, in a real system, there would be a sensor which would inform the controller at what time the elevator arrive at a specific floor, at which point the internal logic about the current floor of the elevator would be updated. With suitable abstraction, that implementation could replace our simplification using threadDelay.

io-sim can accurately simulate the standard IO operations, but this additional abstraction also introduces the challenge of verifying that the model accurately describes the real-world interactions. For example, 1 second is actually a very fast elevator, so our model and timeliness requirements may have to be modified slightly. That’s a topic left for another blog post!

Related Tools and Libraries

The Haskell ecosystem offers several libraries to test concurrent systems, each one addresses different aspects of the problem. Here are two of the most popular and known ones:

dejafu

quickcheck-state-machine

Each takes a slightly different approach to exploring thread schedules, invariants, or state-space, and all have proven useful in practice.

dejafu explores all possible thread interleavings to find concurrency bugs. The library offers a similar typeclass abstraction to io-classes for concurrency primitives, allowing testing code that uses threads, MVars and STM.

quickcheck-state-machine tests stateful programs using state machine models with pre and post-conditions. The library can find race conditions through parallel testing. It excels at testing APIs with complex state dependencies, e.g. databases or file systems, but focuses on state correctness rather than temporal properties.

io-sim distinguishes itself by being the only time-based simulator. One can’t easily ask “what happens when this operation takes 150ms instead of 15ms?” with dejafu nor quickcheck-state-machine. io-sim enables testing of timeout logic, retry mechanisms, timeliness constraints, etc. The ability to compress years of simulated execution into seconds of test runtime makes io-sim particularly valuable for testing long-running systems where bugs emerge only after extended operation.

Conclusion

The key insight is that io-sim simulates the actual behavior of Haskell’s runtime. STM transactions, thread scheduling, and time passing behave exactly as in production, but deterministically.

For concurrent Haskell systems with timing requirements, e.g. network protocols, distributed systems, or real-time controllers, io-sim allows the verification of time-sensitive properties. The library offers much more than shown here, including thread scheduling exploration testing with partial order reduction.

The complete code examples are available here.
by armando at October 13, 2025 12:00 AM

October 09, 2025

Oskar Wickström

Programming in the Sun: A Year with the Daylight Computer

I’ve been hinting on X/Twitter about my use of the Daylight DC-1 as a programming environment, and after about a year of use, it’s time to write about it in longer form. This isn’t a full product review, but rather an experience report on coding in sunlight. It’s also about the Boox Tab Ultra – which has a different type of display – and how it compares to the DC-1 for my use cases.

This is not a sponsored post.

Neovim in Termux on the Daylight DC-1.

Why do I even bother, you might ask? Sunlight makes me energetic and alert, which I need when I work. Living in the Nordics, 50% of the year is primarily dark, so any direct daylight I can get becomes really important. I usually run light mode on my Framework laptop during the day, but working in actual daylight with these displays, or plain old paper, is even better.

The Setup

Here are the main components of this coding environment:

Daylight DC-1: an Android-based tablet with a “Live Paper” display (Reflective LCD, not E-Ink)

8BitDo Retro Mechanical Keyboard: a mechanical Bluetooth-enabled keyboard, with Kailh key switches and USB-C charging and optional connection

Termux: a terminal emulator for Android, with a package collection based on apt

SSH, tmux, and Neovim: nothing surprising here

I use a slimmed-down version of my regular dotfiles, because this setup doesn’t use Nix. I’ve manually installed Neovim, tmux, and a few other essentials, using the package manager that comes with Termux. I’ve configured Termux to not show its virtual keyboard when a physical keyboard is connected (the Bluetooth keyboard). The Termux theme is “E-Ink” and the font is JetBrains Mono, all built into Termux. Neovim uses the built-in quiet colorscheme for maximum contrast.

Certain work requires a more capable environment, and in those cases I connect to my workstation using SSH and run tmux in there. For writing or simpler programming projects (I’ve even done Rust work with Cargo, for instance), the local Termux environment is fine.

Sometimes I want to go really minimalist, so I hide the tmux status bar and run Goyo in Neovim. Deep breaths. Feel the fresh air in your lungs. This is especially nice for writing blog posts like this one.

Minimalist typing with Goyo in Neovim.

My blog editing works locally in Termux, with a live reloading Chrome in a split window, here during an evening writing session with the warm backlight enabled:

Split-screen blogging locally on the Daylight.

There’s the occasional Bluetooth connection problem with the 8BitDo keyboard. I also don’t love the layout, and I’m considering getting the Kinesis Freestyle2 Blue instead. I already have the wired version for my workstation, and the ergonomics are great.

Daylight DC-1 vs Boox Tab Ultra

What about the Boox? I’ve had this device for longer and I really like it too, but not for the same tasks. The E-Ink display is, quite frankly, a lot nicer to read on; EPUB books, research PDFs, web articles, etc. The 227 PPI instead of the Daylight’s 190 PPI makes a difference, and I like the look of E-Ink better overall.

However, the refresh rate and ghosting make it a bit frustrating for typing. Same goes for drawing, which I’ve used the Daylight for a lot. Most of my home renovation blueprints are sketched on the Daylight. The refresh rate makes it possible.

When reading at night with a more direct bedside lamp, often in combination with a subtle backlight, the Boox is much better. The Daylight screen can glare quite a bit, so the only option is backlight only. And at that point, a lot of the paperlike quality goes away.

You can also get some glare when there’s direct sunlight at a particular angle:

You may get glare in direct sunlight or from lamps at some angles.

Even if I don’t write or program directly on the Boox, I’ve experimented with using it as a secondary display, like for the live reload blog preview:

Using the Boox Tab Ultra as a secondary display by browsing the live reload HTTP server.

To sum up, these devices are good for different things, in my experience. I’ve probably spent more time on the Boox, because I’ve had it for longer and I’ve read a lot on it, but the Daylight has been much better for typing and drawing.

Another thing I’d like to try is a larger E-Ink monitor for my workstation, like the one Zack is hacking on. I’m hoping this technology continues to improve on refresh rate, because I love E-Ink. Until then, the Daylight is a good compromise.

Touch grass, as they say.

October 09, 2025 10:00 PM

Sandy Maguire

Theorems for Free Redux
A reader recently got in touch with me regarding my 2017 blog post Review: Theorems for Free. He had some questions about the paper/my review, and upon revisiting it, I realized that I had no idea how the paper worked anymore.

So I decided to rehash my understanding, and came up with something much conceptually clearer about what is happening and why.

A quick summary of Theorems for Free:

For any polymorphic type, we can generate a law that must hold for any value of that type.

One the examples given is for the function length :: forall a. [a] -> Int, which states that forall f l. length (fmap f l) = length l—namely, that fmap doesn’t change the length of the list.

Theorems for Free gives a roundabout and obtuse set of rules for computing these free theorems. But, as usual, the clarity of the idea is obscured by the encoding details.

The actual idea is this:

Parametrically-polymorphic functions can’t branch on the specific types they are instantiated at.

Because of this fact, functions must behave the same way, regardless of the type arguments passed to them. So all of the free theorems have the form “replacing the type variables before calling the function is the same as replacing the type variables after calling the function.”

What does it mean to replace a type variable? Well, if we want to replace a type variable a with a', we will generate a fresh function f :: a -> a', and then stick it wherever we need to.

For example, given the function id :: a -> a, we generate the free theorem:
forall f a.
  f (id a) = id (f a)
or, for the function fromJust :: Maybe a -> a, we get:
forall f ma.
  f (fromJust ma) = fromJust (fmap f ma)
This scheme also works for functions in multiple type parameters. Given the function swap :: (a, b) -> (b, a), we must replace both a and b, giving the free theorem:
forall
    (f :: a -> a')
    (g :: b -> b')
    (p :: (a, b))
  swap (bimap f g p) = bimap g f (swap p)
In the special case where there are no type parameters, we don’t need to do anything. This is what’s happening in the length example given in the introduction.

Simple stuff, right? The obfuscation in the paper comes from the actual technique given to figure out where to apply these type substitutions. The paper is not fully general here, in that it only gives rules for the [] and (->) type constructors (if I recall correctly.) These rules are further obscured in that they inline the definitions of fmap, rather than writing fmap directly.¹ But for types in one variable, fmap is exactly the function that performs type substitution.

Perhaps this paper predates typeclasses? Very possible.↩︎
October 09, 2025 11:27 AM

Chris Smith 2

Rebooting NYHaskell

It’s been a few years since the last meeting of the New York Haskell User Group. I’m very pleased to announce that we’ll be meeting again starting in November. Richard Eisenberg is presenting at the next meeting. I hope to see you there!
A Tale of Two Lambdas: A Haskeller’s Journey Into Ocaml
November 6, 2025
Jane Street, 250 Vesey St, New York, NY 10007
https://www.meetup.com/ny-haskell/events/311160463
NOTE: Please RSVP if you plan to attend. If you arrive unannounced, we’ll do our best to get you a visitor badge so you can attend, but it’s a last minute scramble for the security staff.
Schedule
6:00–6:30: Meet and Greet
6:30–8:30: Presentation
8:30–10:00: Optional Social Gathering @ a nearby bar
Speaker: Richard Eisenberg
Richard Eisenberg is a Principal Researcher at Jane Street and a leading figure in the Haskell community. His work focuses on programming language design and implementation, with major contributions to GHC, including dependent types and type system extensions. He is widely recognized for advancing the expressiveness and power of Haskell’s type system while making these ideas accessible to the broader functional programming community.
Abstract
After spending a decade focusing mostly on Haskell, I have spent the last three years looking deeply at Ocaml. This talk will capture some lessons learned about my work in the two languages and their communities — how they are similar, how they differ, and how each might usefully grow to become more like the other. I will compare Haskell’s purity against Ocaml’s support for mutation, type classes against modules as abstraction paradigms, laziness against strictness, along with some general thoughts about language philosophy. We’ll also touch on some of the challenges both languages face as open-source products, in need of both volunteers and funding. While some functional programming experience will definitely be helpful, I’ll explain syntax as we go — no Haskell or Ocaml knowledge required, as I want this talk to be accessible equally to the two communities.

by Chris Smith at October 09, 2025 12:50 AM

GHC Developer Blog

GHC 9.14.1-alpha3 is now available

GHC 9.14.1-alpha3 is now available

bgamari - 2025-10-09

The GHC developers are very pleased to announce the availability of the third alpha release of GHC 9.14.1. Binary distributions, source distributions, and documentation are available at downloads.haskell.org.

GHC 9.14 will bring a number of new features and improvements, including:

Significant improvements in specialisation:

The SPECIALISE pragma now allows use of type application syntax

The SPECIALISE pragma can be used to specialise for expression arguments as well as type arguments.

Specialisation is now considerably more reliable in the presence of newtypes

Significant improvements GHCi including:

Correctness and performance improvements in the bytecode interpreter

Features in the GHCi debugger

Support for multiple home units in GHCi

Implementation of the Explicit Level Imports proposal

RequiredTypeArgments can now be used in more contexts

SSE/AVX2 support in the x86 native code generator backend

A major update of the Windows toolchain

â€¦ and many more

A full accounting of changes can be found in the release notes. Given the many specialisation improvements and their potential for regression, we would very much appreciate testing and performance characterisation on downstream workloads.

Note that while this release makes many improvements in the specialisation optimisation, polymorphic specialisation will remain disabled by default in the final release due to concern over regressions of the sort identified in #26329. Users needing more aggressive specialisation can explicitly enable this feature with the -fpolymorphic-specialisation flag. Depending upon our experience with 9.14.1, we may enable this feature by default in a later minor release.

This is the third alpha release of 9.14.1. This comes later than expected in part due to work on a resolving a regression in the macOS 26 (#26166) which threatened the usability of the release. While a complete fix for this issue is not present in this alpha, we have done enough work to have confidence that it will be in finished for the release candidate which we expect should come the week of 27 October.

We would like to thank the Zw3rk stake pool, Well-Typed, Mercury, Channable, Tweag I/O, Serokell, SimSpace, the Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work have made the Haskell ecosystem what it is today.

As always, do give this release a try and open a ticket if you see anything amiss.

by ghc-devs at October 09, 2025 12:00 AM

October 06, 2025

Gabriella Gonzalez

Nix Steering Committee vote of no confidence

Nix Steering Committee vote of no confidence
Earlier this week I proposed a vote of no confidence for the Nix Steering Committee, which would have ended the terms of all currently serving members and put all seven positions up for election in November. That vote failed with 3 out of 6 votes (4 were necessary) and I’m writing up a post-mortem on why I proposed and voted in favor of the vote of no confidence even though it ultimately failed.

Background

In a previous post of mine I announced that I was ending my Nix Steering Committee term early (at the one year mark instead of the two year term I was elected for). In that post I shared some fairly polite criticisms of the Nix Steering Committee’s performance over the last year and explained why I was stepping down early (basically: burnout induced by the Nix Steering Committee’s dysfunction).

Not long after that the moderation team resigned and I was part of the problem and bear some responsibility for that. I (along with three other Steering Committee members: Tom Berek, John Ericson, and Robert Hensing) voted in favor of both of the moderation-related changes that the moderation team resigned in response to (I later changed one of my two votes at the last minute but I take responsibility for the consequences of both votes).

In the wake of that, Winter (another Steering Committee member), publicly blew the whistle on internal SC discussions specifically highlighting malfeasance from another Steering Committee member (John Ericson) although the exact conversations were not included (only summaries and third parties who had seen the conversations confirming the details). This led to a public outcry calling for John’s resignation and/or a vote of no confidence.

In response to that outcry four members of the Steering Committee (Tom, John, Robert, and Jan) responded by publishing the votes relevant to the ongoing controversy and also claiming that the conversations Winter leaked were taken out of context.

I personally agreed with the outcry and the targeted criticisms of John based on my own experiences working on the Steering Committee. I didn’t propose to remove John from the Steering Committee but that same day I did propose a vote of no confidence and I’ll explain why I proposed and voted in favor of that.

Politics

From my perspective, three current members and one former member of the Steering Committee have already lost confidence in the committee:

Franz is a former member of the Steering Committee who had to step down earlier for his well-being who has echoed many of the accusations being leveled at the Steering Committee based on his experiences working on the committee, and called for the current Steering Committee to resign pending reelection

Winter has lost confidence in the Steering Committee, going so far as to blow the whistle on internal comms and vote in favor of the vote of no confidence

I lost confidence in the Nix Steering Committee even before the recent controversy, which led me to end my term early, publish my Steering Committee retrospective, and ultimately vote in favor of the vote of no confidence.

Jan also ended his term early and voted for the vote of no confidence

If Franz had not been forced to resign for health reasons the vote of no confidence would have gone through, but currently the Steering Committee is deadlocked over this vote. Only a minority of the original Steering Committee (John, Tom, and Robert) still believe that the Steering Committee has any legitimacy at this point.

The Nix core team

Not so coincidentally John, Tom, and Robert are the three Steering Committee members that are also members of the Nix core team. The vote of no confidence made it pretty clear to me that the Nix team has consistently put the needs of their own team and members ahead of the needs of the broader community (which is why I felt compelled to speak out).

It was probably a mistake to allow three Steering Committee members to all be members of the Nix team. There should be a constitutional amendment to consider shared membership on the Nix team to also count as a conflict of interest, which would create a soft limit of one of them on the team and a hard limit of two of them on the team. For more details, see the Nix Constitution’s Conflict of Interest Balance section.

However, besides the constitutional amendment, I’d go even further and say that the Nix community should vote against any member of the current Nix team (which would include Tom who is currently running for re-election), since I believe they are in large part responsible for why our community now has two forks (Lix and Determinate Nix) and is losing ground against both of them.

Nix has lost a large number of contributors to these forks due to dysfunction within the Nix team and now they’ve brought that same dysfunction to the Steering Committee, which has resulted in every other member of the Steering Committee abandoning ship because we can’t do our job.

The Rust rule

A few people brought up the “Rust rule” during the recent controversy, which says that under the Rust governance structure both the Leadership Council (the Rust analog of Nix’s Steering Committee) and their moderation team have the nuclear option of disbanding both teams.

The Nix Constitution has no such rule, but I do think that the Rust rule is the morally correct way to think about the recent controversies, even if it is not enforceable under our current Constitution. In particular, if the moderation team resigns in such a public manner it signals a serious loss of confidence in the leadership of the Steering Committee which justifies the need for members of the Steering Committee to run for reelection and reaffirm their mandate.

Conclusion

The committee is down a member, mired in controversy, and facing a community that feels misled by a lack of transparency. Franz’s public comment confirms that four of the original seven committee members would have supported a vote of no confidence today. I do not believe any member can now credibly claim to hold a mandate.

Note that John and Robert could still run in the next Steering Committee election (a vote of no confidence does not bar them from reelection). To me, refusing to resign under these circumstances and stand for reelection suggests a belief that voters would not return them to office.

Anyone who wishes to remain should run for re-election if they still believe their policies are the best way forward for Nix.

by Gabriella Gonzalez (noreply@blogger.com) at October 06, 2025 02:04 PM

October 02, 2025

Philip Wadler

Type Theory for All: The Goal of Science is to Communicate Ideas

Thank you to the crew at Type Theory for All for their thoughtful and well-prepared interview.

by Philip Wadler (noreply@blogger.com) at October 02, 2025 11:35 AM

Tweag I/O

Single-line and multi-line formatting with Topiary
Writing a formatter has never been so easy: a Topiary tutorial

Single-line and multi-line formatting with Topiary

In a previous post, I introduced Topiary, a universal formatter (or one could say a formatter generator), and showed how to start a formatter for a programming language from scratch. This post is the second part of the tutorial, where we’ll explore more advanced features of Topiary that come in handy when handling real-life languages, and in particular the single-line and multi-line layouts. I’ll assume that you have a working setup to format our toy Yolo language. If you don’t, please follow the relevant sections of the previous post first.

Single-line and multi-line

A fundamental tenet of formatting is that you want to lay code out in different ways depending on if it fits on one line or not. For example, in Nickel, or any functional programming language for that matter, it’s idiomatic to write small anonymous functions on one line, as in std.array.map (fun x => x * 2 + 1) [1,2,3]. But longer functions would rather look like:
fun x y z =>
  if x then
    y
  else
    z
This is true for almost any language construct that you can think of: you’d write a small boolean condition is_a && is_b, but write a long validation expressions as:
std.is_string value
&& std.string.length value > 5
&& std.string.length value < 10
&& !(std.string.is_match "\\d" value)
In Rust, with rustfmt, short method calls are formatted on one line as in x.clone().unwrap().into(), but they are spread over several lines when the line length is over a fixed threshold:
value
    .maybe_do_something(|x| x+1)
    .or_something_else(|_| Err(()))
    .into_iter()
You usually either want the single-line layout or the multi-line one. A hybrid solution wouldn’t be very consistent:
std.is_string value
&& std.string.length value > 5 && std.string.length value < 10
&& !(std.string.is_match "\\d" value)
Some formatters, such as Rust’s, choose the layout automatically depending on the length of the line. Long lines are wrapped and laid out in the multi-line style automatically, freeing the programmer from any micro decision. On the flip side, the programmer can’t force one style in cases where it’d make more sense.

Some other formatters, like our own Ormolu for Haskell, decide on the layout based on the original source code. For any syntactic construct, the programmer has two options:

Write it on one line, or

Write it on two lines or more.

1. will trigger the single-line layout, and 2. the multi-line one. No effort is made to try to fit within reasonable line lengths. That’s up to the programmer.

As we will see, Topiary follows the same approach as Ormolu, although future support for optional line wrapping isn’t off the table¹.

Softlines

Less line breaks, please

Let’s see how our Yolo formatter handles the following source:
input income, status
output income_tax

income_tax := case { status = "exempted" => 0, _ => income * 0.2 }
Since the case is short, we want to keep it single-line. Alas, this gets formatted as:
input income, status
output income_tax

income_tax := case {
  status = "exempted" => 0,
  _ => income * 0.2
}
The simplest mechanism for multi-line-aware layout is to use soft lines instead of spaces or hardlines. Let’s change the @append_hardline capture in the case branches separating rule to @append_spaced_softline:
; Put case branches on their own lines
(case
  "," @append_spaced_softline
)
As the name indicates, a spaced softline will result in a space for the single-line case, and a line break for the multi-line case, which is precisely what we want. However, if we try to format our example, we get the dreaded idempotency check failure, meaning that formatting one time or two times in a row doesn’t give the same result, which is a usually a red flag (and is why Topiary performs this check). What happens is that our braces { and } also introduce hardlines, so the double formatting goes like:
income_tax := case { status = "exempted" => 0, _ => income * 0.2 }

--> (case is single-line: @append_spaced_softline is a space)
income_tax := case {
  status = "exempted" => 0, _ => income * 0.2
}
--> (case is multi-line! @append_spaced_softline is a line break)
income_tax := case {
  status = "exempted" => 0,
  _ => income * 0.2
}
We need to amend the rule for braces as well:
; Lay out the case skeleton
(case
  "{" @prepend_space @append_spaced_softline
  "}" @prepend_spaced_sofline
)
Our original example is now left untouched, as desired. Note that softline annotations are expanded depending on the multi-lineness of the direct parent of the node they attach to (and neither the subtree matched by the whole query nor the node itself). Topiary applies this logic because this is most often what you want. The parse tree of the multi-line version of income_tax:
income_tax := case {
  status = "exempted" => 0,
  _ => income * 0.2
}
is as follows (hiding irrelevant parts in [...]):
0:0  - 4:0    tax_rule
0:0  - 3:1      statement
0:0  - 3:1        definition_statement
0:0  - 0:10         identifier `income_tax`
0:11 - 0:13         ":="
0:14 - 3:1          expression
0:14 - 3:1            case
0:14 - 0:18             "case"
0:19 - 0:20             "{"
1:2  - 1:26             case_branch
                        [...]
1:26 - 1:27             ","
2:2  - 2:19             case_branch
                        [...]
3:0  - 3:1              "}"
The left part is the span of the node, in the format start_line:start_column - end_line:end_column. A node is multiline simply if end_line > start_line. You can see that since "{" is not multiline (it can’t be, as it’s only one character!), if Topiary considered the multi-lineness of the node itself, our previous "{" @append_spaced_softline would always act as a space.

What happens is that Topiary considers the direct parent instead, which is 0:14 - 3:1 case here, and is indeed multi-line.

Both single-line and multi-line case are now formatted as expected.

More line breaks, please

Let’s consider the dual issue, where line breaks are unduly removed. We’d like to allow inputs and outputs to span multiple lines, but the following snippet:
input
  income,
  status,
  tax_coefficient
output income_tax
is formatted as:
input income, status, tax_coefficient
output income_tax
The rule for spacing around input and output and the rule for spacing around , and identifiers both use @append_space. We can simply replace this with a spaced softline. Recall that a spaced softline turns into a space and thus behaves like @append_space in a single-line context, making it a proper substitution.
; Add spaced softline after `input` and `output` decl
[
  "input"
  "output"
] @append_spaced_softline


; Add a spaced softline after and remove space before the comma in an identifier
; list
(
  (identifier)
  .
  "," @prepend_antispace @append_spaced_softline
  .
  (identifier)
)
We also need to add new rules to indent multi-line lists of inputs or outputs.
; Indent multi-line lists of inputs.
(input_statement
  "input" @append_indent_start
) @append_indent_end

; Indent multi-line lists of outputs.
(output_statement
  "output" @append_indent_start
) @append_indent_end
A matching pair of indentation captures *_indent_start and *_indent_end will amount to a no-op if they are on the same line, so those rules don’t disturb the single-line layout.

Recall that as long as you don’t use anchors (.), additional nodes can be omitted from a Tree-sitter query: here, the first query will match an input statement with an "input" child somewhere, and any children before or after that (although in our case, there won’t be any children before).

Scopes

More (scoped) line breaks, please

Let us now consider a similar example, at least on the surface. We want to allow long arithmetic expressions to be laid out on multiple lines as well, as in:
input
  some_long_name,
  other_long_name,
  and_another_one
output result

result :=
  some_long_name
  + other_long_name
  + and_another_one
As before, result is currently smashed back into one line by our current formatter. Unsurprisingly, since our keywords rule uses @prepend_space and @append_space. At this point, you start to get the trick: let’s use softlines! I’ll only handle + for simplicity. We remove "+" from the original keywords rule and add the following rule:
; (Multi-line) spacing around +
("+" @prepend_spaced_softline @append_space)
Ignoring indentation for now, the line wrapping seems to work. For the following example at least:
result :=
  some_long_name
  + other_long_name + and_another_one
which is reformatted as:
result := some_long_name
+ other_long_name
+ and_another_one
However, perhaps surprisingly, the following example:
result :=
some_long_name + other_long_name
+ and_another_one
is reformatted as:
result := some_long_name + other_long_name
+ and_another_one
The first addition hasn’t been split! To understand why, we have to look at how our grammar parses arithmetic expressions:
expression: $ => choice(
  $.identifier,
  $.number,
  $.string,
  $.arithmetic_expr,
  $.case,
),

arithmetic_expr: $ => choice(
  prec.left(1, seq(
    $.expression,
    choice('+', '-'),
    $.expression,
  )),
  prec.left(2, seq(
    $.expression,
    choice('*', '/'),
    $.expression,
  )),
  prec(3, seq(
    '(',
    $.expression,
    ')',
  )),
),
Even if you don’t understand everything, there are two important points:

Arithmetic expressions are recursively nested. Indeed, we can compose arbitrarily complex expressions, as in (foo*2 + 1) + (bar / 4 * 6).

They are parsed in a left-associative way.

This means that our big addition is parsed as: ((some_long_name "+" other_long_name) "+" and_another_one). In the first example, since the line break happens just after some_long_name in the original source, both the inner node and the outer one are multi-line. However, in the second example, the line break happens after other_long_name, meaning that the innermost arithmetic expression is contained in a single line, and the corresponding + isn’t considered multi-line. Indeed, you can see here that the parent of the first + is 7:0 - 7:32 arithmetic_expr, which fits entirely on line 7.
7:0  - 8:17           arithmetic_expr
7:0  - 7:32             expression
7:0  - 7:32               arithmetic_expr
7:0  - 7:14                 expression
7:0  - 7:14                   identifier `some_long_name`
7:15 - 7:16                 "+"
7:17 - 7:32                 expression
7:17 - 7:32                   identifier `other_long_name`
8:0  - 8:1              "+"
8:2  - 8:17             expression
8:2  - 8:17               identifier `and_another_one`
The solution here is to use scopes. A scope is a user-defined group of nodes associated with an identifier. Crucially, when using scoped softline captures such as @append_scoped_space_softline within a scope, Topiary will consider the multi-lineness of the whole scope instead of the multi-lineness of the (parent) node.

Let’s create a scope for all the nested sub-expressions of an arithmetic expression. Scopes work the same as other node groups in Topiary: we create them by using a matching pair of begin and end captures. We need to find a parent node that can’t occur recursively in an arithmetic expression. A good candidate would be definition_statement, which encompasses the whole right-hand side of the definition of an output:
; Creates a scope for the whole right-hand side of a definition statement
(definition_statement
  (#scope_id! "definition_rhs")
  ":="
  (expression) @prepend_begin_scope @append_end_scope
)
We must specify an identifier for the scope using the predicate scope_id. Identifiers are useful when several scopes might be nested or even overlap, and help readability in general.

We then amend our initial attempt at formatting multi-line arithmetic expressions:
; (Multi-line) spacing around +
(
  (#scope_id! "definition_rhs")
  "+" @prepend_scoped_spaced_softline @append_space
)
We use a scoped version of softlines, in which case we need to specify the identifier of the corresponding scope. The captured node must also be part of said scope. You can check that both examples (and multiple variations of them) are finally formatted as expected.

Conclusion

This second part of the Topiary tutorial has taught how to finely specify an alternative formatting layout depending on whether an expression spans multiple lines or not. The main concepts at play here are multi-line versus single-line nodes, and scopes. There is an extension to this concept not covered here, measuring scopes, but standard scopes already go a long way for formatting a real life language. If you’re looking for a comprehensive resource to help you write your formatter, the official Topiary book is for you. You can however find the complete code for this post in the companion repository. Happy hacking!

See #700 ↩
October 02, 2025 12:00 AM

September 26, 2025

Well-Typed.Com

Haskell ecosystem activities report: Juneâ€“August 2025

This is the twenty-eighth edition of our Haskell ecosystem activities report, which describes the work Well-Typed are doing on GHC, Cabal, HLS and other parts of the core Haskell toolchain. The current edition covers roughly the months of June 2025 to August 2025.

This is a change of name for our GHC activities report, to reflect the fact that it focuses on more than just GHC work. You can find the previous editions collected under the haskell-ecosystem-report tag.

Sponsorship

We offer Haskell Ecosystem Support Packages to provide commercial users with support from Well-Typed’s experts while investing in the Haskell community and its technical ecosystem including through the work described in this report. To find out more, read our announcement of these packages in partnership with the Haskell Foundation. We need funding to continue this essential maintenance work!

Recently we will delighted to welcome Standard Chartered as a Gold Haskell Ecosystem Supporter. Many thanks to Standard Chartered; to our other Haskell Ecosystem Supporters: Channable and QBayLogic; to our existing clients who also contribute to making this work possible: Anduril, Juspay and Mercury; and to the HLS Open Collective for supporting HLS release management.

Team

The Haskell toolchain team at Well-Typed currently includes:

Andreas Klebinger

Ben Gamari

Matthew Pickering

Rodrigo Mesquita

Sam Derbyshire

Hannes Siebenhandl

Zubin Duggal

Mikolaj Konarski

In addition, many others within Well-Typed contribute to GHC, Cabal and HLS occasionally, or contribute to other open source Haskell libraries and tools. This report includes contributions from Alex Washburn and Wen Kokke in particular.

GHC

GHC Releases

Ben worked on the 9.14.1 release, preparing backports and releasing the first alpha.

Ben and Zubin worked on backports for GHC 9.12.3.

Zubin worked on the 9.10.3 release, preparing backports and publishing the rc1, rc2 and rc3 release candidates.

Frontend

Sam made several improvements to the implementation of deep subsumption, allowing GHC to accept programs that it previously rejected (#26225, !14577).

Matt refactored the treatment of nested Template Haskell splices in GHC, making the code paths more consistent and removing code duplication (!14377).

Matt fixed some bugs related to level checking (for the ExplicitLevelImports extension), including a crash in the presence of cyclic imports (#26087, !14478) and some missing level checks (#26088, !14479, #26090, !14550).

Andreas allowed the type-class specialiser to look through type families, exposing more opportunities for specialisation (#26051, !14272).

Sam identified and fixed a situation in which RULES which were not active could fire nonetheless (#26323, !14687).

Ben and Andreas investigated various performance issues in the opaque newtype dictionaries patch, in preparation for merging to 9.14 (!10479)

LLVM backend

Alex fixed incorrect sign-extension and narrowing of bitReverse, byteSwap and pdep primops in the LLVM backend (#20645, #26109, !14609).

Alex fixed the LLVM backend generating references to non-existent LLVM intrinsics llvm.x86.bmi.{pdep,pext}.{8,16}, replacing them with usage of the appropriate 32-bit operations (#26065, !14647).

Andreas made GHC allow LLVM versions outside of the supported range, emitting a warning rather than an error (#25915, !14531).

Ben fixed the treatment of built-in arrays with LLVM, which was necessary to support newer LLVM versions. This fixed a raft of failing tests with the LLVM backend (#25769, !14157).

Ben implemented a major rework of the Windows Clang toolchain (!14442), with help from long-time Windows contributor Tamar Christina. This was necessary to adapt to changes in newer LLVM versions, such as the use of API sets.

GHCi and bytecode interpreter

Rodrigo has been working to improve the GHCi debugger, as a step towards better debugger tooling for Haskell programs. In particular, he implemented support for stepping-out of a function when debugging in GHCi, with the new :stepout command (#26042, !14416). He also made breakpoint indices 32 bits instead of 16 bits, as loading large programs in GHCi could overflow the counter (#26325, !14691), and made various other improvements to breakpoints (!14461, !14480, !14534).

Andreas fixed some issues around endianness in the bytecode interpreter, which could cause the interpreter to produce incorrect results from primitive operations on some platforms (#25791, #23387, !14172).

Hannes added a new GHCi flag -fno-load-initial-targets, allowing GHCi to be started without immediately loading all the target modules, so the user can selectively load the modules they are interested in with :reload (#26144, !14448). This can significantly reduce GHCi startup times when working on part of a large project.

Hannes fixed the remaining issues in the interaction of the GHCi :reload command with multiple home units (#26128, !14427).

Matt fixed some bugs in interpreter statistics calculations (#25756, !13956, #25695, !13879).

Documentation

Hannes updated the user’s guide to advertise full support for multiple home units (#20889, !14426).

Wen improved the documentation of eventlog cost centres and sample labels (!14499), and of heap profile IDs (!14506).

Andreas added @since annotations for the -fexpose-overloaded-unfolding and -fdo-clever-arg-eta-expansion GHC flags (#26112, #26113, !14517).

Runtime system and linker

Andreas added support for COFF BigObj files in the linker (!14582).

Ben made the linker less reliant on file extensions to identify archive members (#13103, #24230, !14405).

Hannes fixed an oversight in the hashing function used in the RTS (#26274, !14651).

Profiling and debugging

Hannes added a new primop and ghc-experimental API which allows annotating the call stack with arbitrary data, in pursuit of better backtraces (#26218, !14538). See his recent blog post Better Haskell stack traces via user annotations for a more detailed explanation of the new features.

Hannes improved the implementation of the Backtraces type and extended the backtrace mechanism to allow configuring stack decoders (#26211, !14532). He also reverted a change that would have exposed the internal implementation of the Backtraces type from base, as this needs a CLC proposal (!14587).

Andreas disabled the -fprof-late-overloaded-calls functionality for join points, as this could cause GHC crashes (#26138, !14460). This is a temporary fix before the root problem can be addressed in full.

Matt allowed info table entries used for profiling to be distributed separately (#21766, !14465).

Wen disabled the usage of --eventlog-flush-interval in the non-threaded RTS, to avoid eventlog corruption (#26222, !14547).

Wen removed the unused hard-coded profile_id from eventlog traces (!14507).

Ben factored out the constructor ctoi tuple info tables into a data section for re-usability (!14508).

Ben fixed a regression in zstd compression support for info-table provenance tables (#26312)

Core libraries and ghc library

Zubin added newNameCache, a version of initNameCache that isn’t prone to being misused (!14446).

This was in response to a bug in which the weeder library (weeder #194, #26055) was using initNameCache incorrectly.

Rodrigo added an export of displayExceptionWithInfo to base, implementing CLC proposal #344 (!14419).

Hannes implemented CLC proposal #212, removing some deprecated heap representation details from GHC.Exts (!14544).

Ben removed IOPort, an internal datatype that could be replaced by MVar, implementing CLC proposal #213 (!8776).

Build system and packaging

Zubin fixed an issue in which the user’s guide PDF would not be included in a binary distribution (#24093, !14469).

Ben added support for otool, install-name-tool and LLVM utilies such as llc, opt to ghc-toolchain (#23675, !14050).

Ben allowed the CrossCompiling predicate to be overridden (#26236, !14568).

Ben dropped build-system logic for preferring the now-deprecated ld.gold linker (#25716, !14324).

CI and testing

Hannes and Zubin upgraded the bootstrap compiler to 9.10.1 on MacOS (!14601), Windows (!14622) and FreeBSD (!14666). This allowed Hannes to update the test-bootstrap job to use 9.10.1 (!14676).

Zubin improved how the testsuite driver filters out certain spurious linker warnings (#26249, !14615).

Haddock

Zubin fixed Haddock emitting spurious warnings for undocumented type family axioms (which cannot have documentation attached to them) (#26114, !14447).

Cabal

Matt helped the Cabal project adopt a formal proposal process by writing up the Cabal Proposals Process. This document was discussed in Cabal #11006, and eventually agreed upon by the existing Cabal maintainers.

Matt opened a Cabal proposal to add support for bytecode artifacts, which would speed up GHCi usage and allow the GHCi debugger to step through dependencies such as code in base (Cabal Proposals #2).

Matt helped finish up work by Cabal contributor Julian G (@jgotoh) to migrate the cabal.project parser to use Parsec (Cabal #8889).

Matt updated the CI release scripts, bumping the boot compiler version and updating platforms (Cabal #11032).

Matt adapted the Cabal library to the change in exception contexts in base-4.21 (Cabal #11125). This issue arose when helping Phil de Joux investigate a mysterious issue on Cabal #10684.

Matt made Cabal use response files when starting multi-repl sessions, rather than passing long command-line invocations (Cabal #10995). These changes were subsequently reverted for the Cabal 3.16 release, in order to preserve compatibility with released HLS bindists (Cabal #11101). The plan is to go forward with response files in Cabal 3.18, giving HLS the time to adapt.

Matt, with help from Hannes, added the --with-repl flag to the cabal-install repl command, allowing external tools such as hie-bios and doctest to easily figure out the correct options for starting a GHCi session (Cabal #10996).

Haskell Language Server

Hannes made hie-bios use Cabal’s --with-repl command to load the session, which greatly simplifies the implementation and its treatment of multiple home units (hie-bios #466).

ghc-debug

Hannes made the IPE information display inline for stack closures (ghc-debug #73).

Matthew allowed the ghc-debug-brick terminal interface to be incrementally updated while a query is run (ghc-debug #68).

Hannes restructured the ghc-debug-brick module hierarchy (ghc-debug #72), and made it available as a separate library (ghc-debug #74).

Hannes added support for custom stack annotations (ghc-debug #69).

Infrastructure

Ben migrated the haskell.org mail delivery and mailing list infrastructure to a more maintainable hosting situation. This involved rebuilding the mail delivery configuration on NixOS, migrating two decades of mailing list data to from mailman-2 to mailman-3, and implementing a scheme to ensure that the previous mailman-2 archives remain available.

In response to user feedback, Ben carried out a variety of improvements to the Hackage documentation builder, allowing a greater breadth of packages to build.

Ben coordinated with the Haskell Foundation to provision a set of new CI runners for AArch64/Linux to replace capacity lost to the end of Azure’s open-source program.

Libraries

Sam migrated the ghc-typelits-natnormalise library to use ghc-tcplugin-api in ghc-typelits-natnormalise #90.

This eases the maintenance burden of keeping ghc-typelits-natnormalise working across different GHC versions. Several pre-existing bugs were fixed in the process (ghc-typelits-natnormalise #47, ghc-typelits-natnormalise #70, ghc-typelits-natnormalise #71).

Wen improved the eventlog-socket library, adding support for setting the eventlog writer from C, and providing documented usage examples.

by adam, andreask, ben, hannes, matthew, mikolaj, rodrigo, sam, zubin at September 26, 2025 12:00 AM

September 24, 2025

Chris Reade

PenroseKiteDart User Guide
Introduction

(Updated September 2025 for PenroseKiteDart version 1.5.1)

PenroseKiteDart is a Haskell package with tools to experiment with finite tilings of Penrose’s Kites and Darts. It uses the Haskell Diagrams package for drawing tilings. As well as providing drawing tools, this package introduces tile graphs (Tgraphs) for describing finite tilings. (I would like to thank Stephen Huggett for suggesting planar graphs as a way to reperesent the tilings).

This document summarises the design and use of the PenroseKiteDart package.

PenroseKiteDart package is now available on Hackage.

The source files are available on GitHub at https://github.com/chrisreade/PenroseKiteDart.

There is a small art gallery of examples created with PenroseKiteDart here.

Index

About Penrose’s Kites and Darts

Using the PenroseKiteDart Package (initial set up).

Overview of Types and Operations

Drawing in more detail

Forcing in more detail

Advanced Operations

Other Reading

1. About Penroseâ€™s Kites and Darts

The Tiles

In figure 1 we show a dart and a kite. All angles are multiples of $36^{\circ}$ (a tenth of a full turn). If the shorter edges are of length 1, then the longer edges are of length $\phi$ , where $\phi = (1+ \sqrt{5})/ 2$ is the golden ratio.

Figure 1: The Dart and Kite Tiles

Aperiodic Infinite Tilings

What is interesting about these tiles is:

It is possible to tile the entire plane with kites and darts in an aperiodic way.

Such a tiling is non-periodic and does not contain arbitrarily large periodic regions or patches.

The possibility of aperiodic tilings with kites and darts was discovered by Sir Roger Penrose in 1974. There are other shapes with this property, including a chiral aperiodic monotile discovered in 2023 by Smith, Myers, Kaplan, Goodman-Strauss. (See the Penrose Tiling Wikipedia page for the history of aperiodic tilings)

This package is entirely concerned with Penrose’s kite and dart tilings also known as P2 tilings.

Legal Tilings

In figure 2 we add a temporary green line marking purely to illustrate a rule for making legal tilings. The purpose of the rule is to exclude the possibility of periodic tilings.

If all tiles are marked as shown, then whenever tiles come together at a point, they must all be marked or must all be unmarked at that meeting point. So, for example, each long edge of a kite can be placed legally on only one of the two long edges of a dart. The kite wing vertex (which is marked) has to go next to the dart tip vertex (which is marked) and cannot go next to the dart wing vertex (which is unmarked) for a legal tiling.

Figure 2: Marked Dart and Kite

Correct Tilings

Unfortunately, having a finite legal tiling is not enough to guarantee you can continue the tiling without getting stuck. Finite legal tilings which can be continued to cover the entire plane are called correct and the others (which are doomed to get stuck) are called incorrect. This means that decomposition and forcing (described later) become important tools for constructing correct finite tilings.

2. Using the PenroseKiteDart Package

You will need the Haskell Diagrams package (See Haskell Diagrams) as well as this package (PenroseKiteDart). When these are installed, you can produce diagrams with a Main.hs module. This should import a chosen backend for diagrams such as the default (SVG) along with Diagrams.Prelude.
    module Main (main) where
    
    import Diagrams.Backend.SVG.CmdLine
    import Diagrams.Prelude
For Penrose’s Kite and Dart tilings, you also need to import the PKD module and (optionally) the TgraphExamples module.
    import PKD
    import TgraphExamples
Then to ouput someExample figure
    fig::Diagram B
    fig = someExample

    main :: IO ()
    main = mainWith fig
Note that the token B is used in the diagrams package to represent the chosen backend for output. So a diagram has type Diagram B. In this case B is bound to SVG by the import of the SVG backend. When the compiled module is executed it will generate an SVG file. (See Haskell Diagrams for more details on producing diagrams and using alternative backends).

3. Overview of Types and Operations

Half-Tiles

In order to implement operations on tilings (decompose in particular), we work with half-tiles. These are illustrated in figure 3 and labelled RD (right dart), LD (left dart), LK (left kite), RK (right kite). The join edges where left and right halves come together are shown with dotted lines, leaving one short edge and one long edge on each half-tile (excluding the join edge). We have shown a red dot at the vertex we regard as the origin of each half-tile (the tip of a half-dart and the base of a half-kite).

Figure 3: Half-Tile pieces showing join edges (dashed) and origin vertices (red dots)

The labels are actually data constructors introduced with type operator HalfTile which has an argument type (rep) to allow for more than one representation of the half-tiles.
    data HalfTile rep 
      = LD rep -- Left Dart
      | RD rep -- Right Dart
      | LK rep -- Left Kite
      | RK rep -- Right Kite
      deriving (Show,Eq)
Tgraphs

We introduce tile graphs (Tgraphs) which provide a simple planar graph representation for finite patches of tiles. For Tgraphs we first specialise HalfTile with a triple of vertices (positive integers) to make a TileFace such as RD(1,2,3), where the vertices go clockwise round the half-tile triangle starting with the origin.
    type TileFace  = HalfTile (Vertex,Vertex,Vertex)
    type Vertex    = Int  -- must be positive
The function
    makeTgraph :: [TileFace] -> Tgraph
then constructs a Tgraph from a TileFace list after checking the TileFaces satisfy certain properties (described below). We also have
    faces :: Tgraph -> [TileFace]
to retrieve the TileFace list from a Tgraph.

As an example, the fool (short for fool’s kite and also called an ace in the literature) consists of two kites and a dart (= 4 half-kites and 2 half-darts):
    fool :: Tgraph
    fool = makeTgraph [RD (1,2,3), LD (1,3,4)   -- right and left dart
                      ,LK (5,3,2), RK (5,2,7)   -- left and right kite
                      ,RK (5,4,3), LK (5,6,4)   -- right and left kite
                      ]
To produce a diagram, we simply draw the Tgraph
    foolFigure :: Diagram B
    foolFigure = draw fool
which will produce the diagram on the left in figure 4.

Alternatively,
    foolFigure :: Diagram B
    foolFigure = labelled drawj fool
will produce the diagram on the right in figure 4 (showing vertex labels and dashed join edges).

Figure 4: Diagram of fool without labels and join edges (left), and with (right)

When any (non-empty) Tgraph is drawn, a default orientation and scale are chosen based on the lowest numbered join edge. This is aligned on the positive x-axis with length 1 (for darts) or length $\phi$ (for kites).

Tgraph Properties

Tgraphs are actually implemented as
    newtype Tgraph = Tgraph [TileFace]
                     deriving (Show)
but the data constructor Tgraph is not exported to avoid accidentally by-passing checks for the required properties. The properties checked by makeTgraph ensure the Tgraph represents a legal tiling as a planar graph with positive vertex numbers, and that the collection of half-tile faces are both connected and have no crossing boundaries (see note below). Finally, there is a check to ensure two or more distinct vertex numbers are not used to represent the same vertex of the graph (a touching vertex check). An error is raised if there is a problem.

Note: If the TileFaces are faces of a planar graph there will also be exterior (untiled) regions, and in graph theory these would also be called faces of the graph. To avoid confusion, we will refer to these only as exterior regions, and unless otherwise stated, face will mean a TileFace. We can then define the boundary of a list of TileFaces as the edges of the exterior regions. There is a crossing boundary if the boundary crosses itself at a vertex. We exclude crossing boundaries from Tgraphs because they prevent us from calculating relative positions of tiles locally and create touching vertex problems.

For convenience, in addition to makeTgraph, we also have
    makeUncheckedTgraph :: [TileFace] -> Tgraph
    checkedTgraph   :: [TileFace] -> Tgraph
The first of these (performing no checks) is useful when you know the required properties hold. The second performs the same checks as makeTgraph except that it omits the touching vertex check. This could be used, for example, when making a Tgraph from a sub-collection of TileFaces of another Tgraph.

Main Tiling Operations

There are three key operations on finite tilings, namely
    decompose :: Tgraph -> Tgraph
    force     :: Tgraph -> Tgraph
    compose   :: Tgraph -> Tgraph
Decompose

Decomposition (also called deflation) works by splitting each half-tile into either 2 or 3 new (smaller scale) half-tiles, to produce a new tiling. The fact that this is possible, is used to establish the existence of infinite aperiodic tilings with kites and darts. Since our Tgraphs have abstracted away from scale, the result of decomposing a Tgraph is just another Tgraph. However if we wish to compare before and after with a drawing, the latter should be scaled by a factor $1/{\phi} = \phi - 1$ times the scale of the former, to reflect the change in scale.

Figure 5: fool (left) and decompose fool (right)

We can, of course, iterate decompose to produce an infinite list of finer and finer decompositions of a Tgraph
    decompositions :: Tgraph -> [Tgraph]
    decompositions = iterate decompose
Force

Force works by adding any TileFaces on the boundary edges of a Tgraph which are forced. That is, where there is only one legal choice of TileFace addition consistent with the seven possible vertex types. Such additions are continued until either (i) there are no more forced cases, in which case a final (forced) Tgraph is returned, or (ii) the process finds the tiling is stuck, in which case an error is raised indicating an incorrect tiling. [In the latter case, the argument to force must have been an incorrect tiling, because the forced additions cannot produce an incorrect tiling starting from a correct tiling.]

An example is shown in figure 6. When forced, the Tgraph on the left produces the result on the right. The original is highlighted in red in the result to show what has been added.

Figure 6: A Tgraph (left) and its forced result (right) with the original shown red

Compose

Composition (also called inflation) is an opposite to decompose but this has complications for finite tilings, so it is not simply an inverse. (See Graphs,Kites and Darts and Theorems for more discussion of the problems). Figure 7 shows a Tgraph (left) with the result of composing (right) where we have also shown (in pale green) the faces of the original that are not included in the composition – the remainder faces.

Figure 7: A Tgraph (left) and its (part) composed result (right) with the remainder faces shown pale green

Under some circumstances composing can fail to produce a Tgraph because there are crossing boundaries in the resulting TileFaces. However, we have established that

If g is a forced Tgraph, then compose g is defined and it is also a forced Tgraph.

Try Results

It is convenient to use types of the form Try a for results where we know there can be a failure. For example, compose can fail if the result does not pass the connected and no crossing boundary check, and force can fail if its argument is an incorrect Tgraph. In situations when you would like to continue some computation rather than raise an error when there is a failure, use a try version of a function.
    tryCompose :: Tgraph -> Try Tgraph
    tryForce   :: Tgraph -> Try Tgraph
We define Try as a synonym for Either ShowS (which is a monad) in module Tgraph.Try.
type Try a = Either ShowS a
(Note ShowS is String -> String). Successful results have the form Right r (for some correct result r) and failure results have the form Left (s<>) (where s is a String describing the problem as a failure report).

The function
    runTry:: Try a -> a
    runTry = either error id
will retrieve a correct result but raise an error for failure cases. This means we can always derive an error raising version from a try version of a function by composing with runTry.
    force = runTry . tryForce
    compose = runTry . tryCompose
Elementary Tgraph and TileFace Operations

The module Tgraph.Prelude defines elementary operations on Tgraphs relating vertices, directed edges, and faces. We describe a few of them here.

When we need to refer to particular vertices of a TileFace we use
    originV :: TileFace -> Vertex -- the first vertex - red dot in figure 2
    oppV    :: TileFace -> Vertex -- the vertex at the opposite end of the join edge from the origin
    wingV   :: TileFace -> Vertex -- the vertex not on the join edge
A directed edge is represented as a pair of vertices.
    type Dedge = (Vertex,Vertex)
So (a,b) is regarded as a directed edge from a to b.

When we need to refer to particular edges of a TileFace we use
    joinE  :: TileFace -> Dedge  -- shown dotted in figure 2
    shortE :: TileFace -> Dedge  -- the non-join short edge
    longE  :: TileFace -> Dedge  -- the non-join long edge
which are all directed clockwise round the TileFace. In contrast, joinOfTile is always directed away from the origin vertex, so is not clockwise for right darts or for left kites:
    joinOfTile:: TileFace -> Dedge
    joinOfTile face = (originV face, oppV face)
In the special case that a list of directed edges is symmetrically closed [(b,a) is in the list whenever (a,b) is in the list] we can think of this as an edge list rather than just a directed edge list.

For example,
    internalEdges :: Tgraph -> [Dedge]
produces an edge list, whereas
    boundary :: Tgraph -> [Dedge]
produces single directions. Each directed edge in the resulting boundary will have a TileFace on the left and an exterior region on the right. The function
    dedges :: Tgraph -> [Dedge]
produces all the directed edges obtained by going clockwise round each TileFace so not every edge in the list has an inverse in the list.

Note: There is now a class HasFaces (introduced in version 1.4) which includes instances for both Tgraph and [TileFace] and others. This allows some generalisations. In particular the more general types of the above three functions are now
    internalEdges :: HasFaces a => a -> [Dedge]
    boundary      :: HasFaces a => a -> [Dedge] 
    dedges        :: HasFaces a => a -> [Dedge]   
Patches (Scaled and Positioned Tilings)

Behind the scenes, when a Tgraph is drawn, each TileFace is converted to a Piece. A Piece is another specialisation of HalfTile using a two dimensional vector to indicate the length and direction of the join edge of the half-tile (from the originV to the oppV), thus fixing its scale and orientation. The whole Tgraph then becomes a list of located Pieces called a Patch.
    type Piece = HalfTile (V2 Double)
    type Patch = [Located Piece]
Piece drawing functions derive vectors for other edges of a half-tile piece from its join edge vector. In particular (in the TileLib module) we have
    drawPiece :: Piece -> Diagram B
    darawjPiece :: Piece -> Diagram B
    fillPieceDK :: Colour Double -> Colour Double -> Piece -> Diagram B
where the first draws the non-join edges of a Piece, the second does the same but adds a faint dashed line for the join edge, and the third takes two colours – one for darts and one for kites, which are used to fill the piece as well as using drawPiece.

Patch is an instances of class Transformable so a Patch can be scaled, rotated, and translated.

Vertex Patches

It is useful to have an intermediate form between Tgraphs and Patches, that contains information about both the location of vertices (as 2D points), and the abstract TileFaces. This allows us to introduce labelled drawing functions (to show the vertex labels) which we then extend to Tgraphs. We call the intermediate form a VPatch (short for Vertex Patch).
    type VertexLocMap = IntMap.IntMap (Point V2 Double)
    data VPatch = VPatch {vLocs :: VertexLocMap,  vpFaces::[TileFace]} deriving Show
and
    makeVP :: Tgraph -> VPatch
calculates vertex locations using a default orientation and scale.

VPatch is made an instance of class Transformable so a VPatch can also be scaled and rotated.

One essential use of this intermediate form is to be able to draw a Tgraph with labels, rotated but without the labels themselves being rotated. We can simply convert the Tgraph to a VPatch, and rotate that before drawing with labels.
    labelled draw (rotate someAngle (makeVP g))
We can also align a VPatch using vertex labels.
    alignXaxis :: (Vertex, Vertex) -> VPatch -> VPatch 
So if g is a Tgraph with vertex labels a and b we can align it on the x-axis with a at the origin and b on the positive x-axis (after converting to a VPatch), instead of accepting the default orientation.
    labelled draw (alignXaxis (a,b) (makeVP g))
Another use of VPatches is to share the vertex location map when drawing only subsets of the faces (see Overlaid examples in the next section).

4. Drawing in More Detail

Class Drawable

There is a class Drawable with instances Tgraph, VPatch, Patch. When the token B is in scope standing for a fixed backend then we can assume
    draw   :: Drawable a => a -> Diagram B  -- draws non-join edges
    drawj  :: Drawable a => a -> Diagram B  -- as with draw but also draws dashed join edges
    fillDK :: Drawable a => Colour Double -> Colour Double -> a -> Diagram B -- fills with colours
where fillDK clr1 clr2 will fill darts with colour clr1 and kites with colour clr2 as well as drawing non-join edges.

These are the main drawing tools. However they are actually defined for any suitable backend b so have more general types.

(Update Sept 2024) From version 1.1 onwards of PenroseKiteDart, these are
    draw ::   (Drawable a, OKBackend b) =>
              a -> Diagram b
    drawj ::  (Drawable a, OKBackend) b) =>
              a -> Diagram b
    fillDK :: (Drawable a, OKBackend b) =>
              Colour Double -> Colour Double -> a -> Diagram b
where the class OKBackend is a check to ensure a backend is suitable for drawing 2D tilings with or without labels.

In these notes we will generally use the simpler description of types using B for a fixed chosen backend for the sake of clarity.

The drawing tools are each defined via the class function drawWith using Piece drawing functions.
    class Drawable a where
        drawWith :: (Piece -> Diagram B) -> a -> Diagram B
    
    draw = drawWith drawPiece
    drawj = drawWith drawjPiece
    fillDK clr1 clr2 = drawWith (fillPieceDK clr1 clr2)
To design a new drawing function, you only need to implement a function to draw a Piece, (let us call it newPieceDraw)
    newPieceDraw :: Piece -> Diagram B
This can then be elevated to draw any Drawable (including Tgraphs, VPatches, and Patches) by applying the Drawable class function drawWith:
    newDraw :: Drawable a => a -> Diagram B
    newDraw = drawWith newPieceDraw
Class DrawableLabelled

Class DrawableLabelled is defined with instances Tgraph and VPatch, but Patch is not an instance (because this does not retain vertex label information).
    class DrawableLabelled a where
        labelColourSize :: Colour Double -> Measure Double -> (Patch -> Diagram B) -> a -> Diagram B
So labelColourSize c m modifies a Patch drawing function to add labels (of colour c and size measure m). Measure is defined in Diagrams.Prelude with pre-defined measures tiny, verySmall, small, normal, large, veryLarge, huge. For most of our diagrams of Tgraphs, we use red labels and we also find small is a good default size choice, so we define
    labelSize :: DrawableLabelled a => Measure Double -> (Patch -> Diagram B) -> a -> Diagram B
    labelSize = labelColourSize red

    labelled :: DrawableLabelled a => (Patch -> Diagram B) -> a -> Diagram B
    labelled = labelSize small
and then labelled draw, labelled drawj, labelled (fillDK clr1 clr2) can all be used on both Tgraphs and VPatches as well as (for example) labelSize tiny draw, or labelCoulourSize blue normal drawj.

Further drawing functions

There are a few extra drawing functions built on top of the above ones. The function smart is a modifier to add dashed join edges only when they occur on the boundary of a Tgraph
    smart :: (VPatch -> Diagram B) -> Tgraph -> Diagram B
So smart vpdraw g will draw dashed join edges on the boundary of g before applying the drawing function vpdraw to the VPatch for g. For example the following all draw dashed join edges only on the boundary for a Tgraph g
    smart draw g
    smart (labelled draw) g
    smart (labelSize normal draw) g
When using labels, the function rotateBefore allows a Tgraph to be drawn rotated without rotating the labels.
    rotateBefore :: (VPatch -> a) -> Angle Double -> Tgraph -> a
    rotateBefore vpdraw angle = vpdraw . rotate angle . makeVP
So for example,
    rotateBefore (labelled draw) (90@@deg) g
makes sense for a Tgraph g. Of course if there are no labels we can simply use
    rotate (90@@deg) (draw g)
Similarly alignBefore allows a Tgraph to be aligned on the X-axis using a pair of vertex numbers before drawing.
    alignBefore :: (VPatch -> a) -> (Vertex,Vertex) -> Tgraph -> a
    alignBefore vpdraw (a,b) = vpdraw . alignXaxis (a,b) . makeVP
So, for example, if Tgraph g has vertices a and b, both
    alignBefore draw (a,b) g
    alignBefore (labelled draw) (a,b) g
make sense. Note that the following examples are wrong. Even though they type check, they re-orient g without repositioning the boundary joins.
    smart (labelled draw . rotate angle) g      -- WRONG
    smart (labelled draw . alignXaxis (a,b)) g  -- WRONG
Instead use
    smartRotateBefore (labelled draw) angle g
    smartAlignBefore (labelled draw) (a,b) g
where
    smartRotateBefore :: (VPatch -> Diagram B) -> Angle Double -> Tgraph -> Diagram B
    smartAlignBefore  :: (VPatch -> Diagram B) -> (Vertex,Vertex) -> Tgraph -> Diagram B
are defined using
    smartOn :: Tgraph -> (VPatch -> Diagram B) -> VPatch -> Diagram B
Here, smartOn g vpdraw vp uses the given vp for drawing boundary joins and drawing faces of g (with vpdraw) rather than converting g to a new VPatch. This assumes vp has locations for vertices in g.

Overlaid examples (location map sharing)

The function
    drawForce :: Tgraph -> Diagram B
will (smart) draw a Tgraph g in red overlaid (using <>) on the result of force g as in figure 6. Similarly
    drawPCompose  :: Tgraph -> Diagram B
applied to a Tgraph g will draw the result of a partial composition of g as in figure 7. That is a drawing of compose g but overlaid with a drawing of the remainder faces of g shown in pale green.

Both these functions make use of sharing a vertex location map to get correct alignments of overlaid diagrams. In the case of drawForce g, we know that a VPatch for force g will contain all the vertex locations for g since force only adds to a Tgraph (when it succeeds). So when constructing the diagram for g we can use the VPatch created for force g instead of starting afresh. Similarly for drawPCompose g the VPatch for g contains locations for all the vertices of compose g so compose g is drawn using the the VPatch for g instead of starting afresh.

The location map sharing is done with
    subFaces :: HasFaces a => 
                a -> VPatch -> VPatch
so that subFaces fcs vp is a VPatch with the same vertex locations as vp, but replacing the faces of vp with fcs. [Of course, this can go wrong if the new faces have vertices not in the domain of the vertex location map so this needs to be used with care. Any errors would only be discovered when a diagram is created.]

For cases where labels are only going to be drawn for certain faces, we need a version of subFaces which also gets rid of vertex locations that are not relevant to the faces. For this situation we have
    restrictTo:: HasFaces a => 
                 a -> VPatch -> VPatch
which filters out un-needed vertex locations from the vertex location map. Unlike subFaces, restrictTo checks for missing vertex locations, so restrictTo fcs vp raises an error if a vertex in fcs is missing from the keys of the vertex location map of vp.

5. Forcing in More Detail

The force rules

The rules used by our force algorithm are local and derived from the fact that there are seven possible vertex types as depicted in figure 8.

Figure 8: Seven vertex types

Our rules are shown in figure 9 (omitting mirror symmetric versions). In each case the TileFace shown yellow needs to be added in the presence of the other TileFaces shown.

Figure 9: Rules for forcing

Main Forcing Operations

To make forcing efficient we convert a Tgraph to a BoundaryState to keep track of boundary information of the Tgraph, and then calculate a ForceState which combines the BoundaryState with a record of awaiting boundary edge updates (an update map). Then each face addition is carried out on a ForceState, converting back when all the face additions are complete. It makes sense to apply force (and related functions) to a Tgraph, a BoundaryState, or a ForceState, so we define a class Forcible with instances Tgraph, BoundaryState, and ForceState.

This allows us to define
    force :: Forcible a => a -> a
    tryForce :: Forcible a => a -> Try a
The first will raise an error if a stuck tiling is encountered. The second uses a Try result which produces a Left string for failures and a Right a for successful result a.

There are several other operations related to forcing including
    stepForce :: Forcible a => Int -> a -> a
    tryStepForce  :: Forcible a => Int -> a -> Try a

    addHalfDart, addHalfKite :: Forcible a => Dedge -> a -> a
    tryAddHalfDart, tryAddHalfKite :: Forcible a => Dedge -> a -> Try a
The first two force (up to) a given number of steps (=face additions) and the other four add a half dart/kite on a given boundary edge.

Update Generators

An update generator is used to calculate which boundary edges can have a certain update. There is an update generator for each force rule, but also a combined (all update) generator. The force operations mentioned above all use the default all update generator (defaultAllUGen) but there are more general (with) versions that can be passed an update generator of choice. For example
    forceWith :: Forcible a => UpdateGenerator -> a -> a
    tryForceWith :: Forcible a => UpdateGenerator -> a -> Try a
In fact we defined
    force = forceWith defaultAllUGen
    tryForce = tryForceWith defaultAllUGen
We can also define
    wholeTiles :: Forcible a => a -> a
    wholeTiles = forceWith wholeTileUpdates
where wholeTileUpdates is an update generator that just finds boundary join edges to complete whole tiles.

In addition to defaultAllUGen there is also allUGenerator which does the same thing apart from how failures are reported. The reason for keeping both is that they were constructed differently and so are useful for testing.

In fact UpdateGenerators are functions that take a BoundaryState and a focus (list of boundary directed edges) to produce an update map. Each Update is calculated as either a SafeUpdate (where two of the new face edges are on the existing boundary and no new vertex is needed) or an UnsafeUpdate (where only one edge of the new face is on the boundary and a new vertex needs to be created for a new face).
    type UpdateGenerator = BoundaryState -> [Dedge] -> Try UpdateMap
    type UpdateMap = Map.Map Dedge Update
    data Update = SafeUpdate TileFace 
                | UnsafeUpdate (Vertex -> TileFace)
Completing (executing) an UnsafeUpdate requires a touching vertex check to ensure that the new vertex does not clash with an existing boundary vertex. Using an existing (touching) vertex would create a crossing boundary so such an update has to be blocked.

Forcible Class Operations

The Forcible class operations are higher order and designed to allow for easy additions of further generic operations. They take care of conversions between Tgraphs, BoundaryStates and ForceStates.
    class Forcible a where
      tryFSOpWith :: UpdateGenerator -> (ForceState -> Try ForceState) -> a -> Try a
      tryChangeBoundaryWith :: UpdateGenerator -> (BoundaryState -> Try BoundaryChange) -> a -> Try a
      tryInitFSWith :: UpdateGenerator -> a -> Try ForceState
For example, given an update generator ugen and any f:: ForceState -> Try ForceState , then f can be generalised to work on any Forcible using tryFSOpWith ugen f. This is used to define both tryForceWith and tryStepForceWith.

We also specialize tryFSOpWith to use the default update generator
    tryFSOp :: Forcible a => (ForceState -> Try ForceState) -> a -> Try a
    tryFSOp = tryFSOpWith defaultAllUGen
Similarly given an update generator ugen and any f:: BoundaryState -> Try BoundaryChange , then f can be generalised to work on any Forcible using tryChangeBoundaryWith ugen f. This is used to define tryAddHalfDart and tryAddHalfKite.

We also specialize tryChangeBoundaryWith to use the default update generator
    tryChangeBoundary :: Forcible a => (BoundaryState -> Try BoundaryChange) -> a -> Try a
    tryChangeBoundary = tryChangeBoundaryWith defaultAllUGen
Note that the type BoundaryChange contains a resulting BoundaryState, the single TileFace that has been added, a list of edges removed from the boundary (of the BoundaryState prior to the face addition), and a list of the (3 or 4) boundary edges affected around the change that require checking or re-checking for updates.

The class function tryInitFSWith will use an update generator to create an initial ForceState for any Forcible. If the Forcible is already a ForceState it will do nothing. Otherwise it will calculate updates for the whole boundary. We also have the special case
    tryInitFS :: Forcible a => a -> Try ForceState
    tryInitFS = tryInitFSWith defaultAllUGen
Efficient chains of forcing operations.

Note that (force . force) does the same as force, but we might want to chain other force related steps in a calculation.

For example, consider the following combination which, after decomposing a Tgraph, forces, then adds a half dart on a given boundary edge (d) and then forces again.
    combo :: Dedge -> Tgraph -> Tgraph
    combo d = force . addHalfDart d . force . decompose
Since decompose:: Tgraph -> Tgraph, the instances of force and addHalfDart d will have type Tgraph -> Tgraph so each of these operations, will begin and end with conversions between Tgraph and ForceState. We would do better to avoid these wasted intermediate conversions working only with ForceStates and keeping only those necessary conversions at the beginning and end of the whole sequence.

This can be done using tryFSOp. To see this, let us first re-express the forcing sequence using the Try monad, so
    force . addHalfDart d . force
becomes
    tryForce <=< tryAddHalfDart d <=< tryForce
Note that (<=<) is the Kliesli arrow which replaces composition for Monads (defined in Control.Monad). (We could also have expressed this right to left sequence with a left to right version tryForce >=> tryAddHalfDart d >=> tryForce). The definition of combo becomes
    combo :: Dedge -> Tgraph -> Tgraph
    combo d = runTry . (tryForce <=< tryAddHalfDart d <=< tryForce) . decompose
This has no performance improvement, but now we can pass the sequence to tryFSOp to remove the unnecessary conversions between steps.
    combo :: Dedge -> Tgraph -> Tgraph
    combo d = runTry . tryFSOp (tryForce <=< tryAddHalfDart d <=< tryForce) . decompose
The sequence actually has type Forcible a => a -> Try a but when passed to tryFSOp it specialises to type ForceState -> Try ForseState. This ensures the sequence works on a ForceState and any conversions are confined to the beginning and end of the sequence, avoiding unnecessary intermediate conversions.

A limitation of forcing

To avoid creating touching vertices (or crossing boundaries) a BoundaryState keeps track of locations of boundary vertices. At around 35,000 face additions in a single force operation the calculated positions of boundary vertices can become too inaccurate to prevent touching vertex problems. In such cases it is better to use
    recalibratingForce :: Forcible a => a -> a
    tryRecalibratingForce :: Forcible a => a -> Try a
These work by recalculating all vertex positions at 20,000 step intervals to get more accurate boundary vertex positions. For example, 6 decompositions of the kingGraph has 2,906 faces. Applying force to this should result in 53,574 faces but will go wrong before it reaches that. This can be fixed by calculating either
    recalibratingForce (decompositions kingGraph !!6)
or using an extra force before the decompositions
    force (decompositions (force kingGraph) !!6)
In the latter case, the final force only needs to add 17,864 faces to the 35,710 produced by decompositions (force kingGraph) !!6.

6. Advanced Operations

Guided comparison of Tgraphs

Asking if two Tgraphs are equivalent (the same apart from choice of vertex numbers) is a an np-complete problem. However, we do have an efficient guided way of comparing Tgraphs. In the module Tgraph.Rellabelling we have
    sameGraph :: (Tgraph,Dedge) -> (Tgraph,Dedge) -> Bool
The expression sameGraph (g1,d1) (g2,d2) asks if g2 can be relabelled to match g1 assuming that the directed edge d2 in g2 is identified with d1 in g1. Hence the comparison is guided by the assumption that d2 corresponds to d1.

It is implemented using
    tryRelabelToMatch :: (Tgraph,Dedge) -> (Tgraph,Dedge) -> Try Tgraph
where tryRelabelToMatch (g1,d1) (g2,d2) will either fail with a Left report if a mismatch is found when relabelling g2 to match g1 or will succeed with Right g3 where g3 is a relabelled version of g2. The successful result g3 will match g1 in a maximal tile-connected collection of faces containing the face with edge d1 and have vertices disjoint from those of g1 elsewhere. The comparison tries to grow a suitable relabelling by comparing faces one at a time starting from the face with edge d1 in g1 and the face with edge d2 in g2. (This relies on the fact that Tgraphs are connected with no crossing boundaries, and hence tile-connected.)

The above function is also used to implement
    tryFullUnion:: (Tgraph,Dedge) -> (Tgraph,Dedge) -> Try Tgraph
which tries to find the union of two Tgraphs guided by a directed edge identification. However, there is an extra complexity arising from the fact that Tgraphs might overlap in more than one tile-connected region. After calculating one overlapping region, the full union uses some geometry (calculating vertex locations) to detect further overlaps.

Finally we have
    commonFaces:: (Tgraph,Dedge) -> (Tgraph,Dedge) -> [TileFace]
which will find common regions of overlapping faces of two Tgraphs guided by a directed edge identification. The resulting common faces will be a sub-collection of faces from the first Tgraph. These are returned as a list as they may not be a connected collection of faces and therefore not necessarily a Tgraph.

Empires and SuperForce

In Empires and SuperForce we discussed forced boundary coverings which were used to implement both a superForce operation
    superForce:: Forcible a => a -> a
and operations to calculate empires.

We will not repeat the descriptions here other than to note that
    forcedBoundaryECovering:: Tgraph -> [Tgraph]
finds boundary edge coverings after forcing a Tgraph. That is, forcedBoundaryECovering g will first force g, then (if it succeeds) finds a collection of (forced) extensions to force g such that

each extension has the whole boundary of force g as internal edges.

each possible addition to a boundary edge of force g (kite or dart) has been included in the collection.

(possible here means – not leading to a stuck Tgraph when forced.) There is also
    forcedBoundaryVCovering:: Tgraph -> [Tgraph]
which does the same except that the extensions have all boundary vertices internal rather than just the boundary edges.

Combinations and Explicitly Forced

We introduced a new type Forced (in v 1.3) to enable a forcible to be explictily labelled as being forced. For example
    forceF    :: Forcible a => a -> Forced a 
    tryForceF :: Forcible a => a -> Try (Forced a)
    forgetF   :: Forced a -> a
This allows us to restrict certain functions which expect a forced argument by making this explicit.
    composeF :: Forced Tgraph -> Forced Tgraph
The definition makes use of theorems established in Graphs,Kites and Darts and Theorems that composing a forced Tgraph does not require a check (for connectedness and no crossing boundaries) and the result is also forced. This can then be used to define efficient combinations such as
    compForce:: Tgraph -> Forced Tgraph      -- compose after forcing
    compForce = composeF . forceF

    allCompForce:: Tgraph -> [Forced Tgraph] -- iterated (compose after force) while not emptyTgraph
    maxCompForce:: Tgraph -> Forced Tgraph   -- last item in allCompForce (or emptyTgraph)
Tracked Tgraphs

The type
    data TrackedTgraph = TrackedTgraph
       { tgraph  :: Tgraph
       , tracked :: [[TileFace]] 
       } deriving Show
has proven useful in experimentation as well as in producing artwork with darts and kites. The idea is to keep a record of sub-collections of faces of a Tgraph when doing both force operations and decompositions. A list of the sub-collections forms the tracked list associated with the Tgraph. We make TrackedTgraph an instance of class Forcible by having force operations only affect the Tgraph and not the tracked list. The significant idea is the implementation of
    decomposeTracked :: TrackedTgraph -> TrackedTgraph
Decomposition of a Tgraph involves introducing a new vertex for each long edge and each kite join. These are then used to construct the decomposed faces. For decomposeTracked we do the same for the Tgraph, but when it comes to the tracked collections, we decompose them re-using the same new vertex numbers calculated for the edges in the Tgraph. This keeps a consistent numbering between the Tgraph and tracked faces, so each item in the tracked list remains a sub-collection of faces in the Tgraph.

The function
    drawTrackedTgraph :: [VPatch -> Diagram B] -> TrackedTgraph -> Diagram B
is used to draw a TrackedTgraph. It uses a list of functions to draw VPatches. The first drawing function is applied to a VPatch for any untracked faces. Subsequent functions are applied to VPatches for the tracked list in order. Each diagram is beneath later ones in the list, with the diagram for the untracked faces at the bottom. The VPatches used are all restrictions of a single VPatch for the Tgraph, so will be consistent in vertex locations. When labels are used, there is also a drawTrackedTgraphRotated and drawTrackedTgraphAligned for rotating or aligning the VPatch prior to applying the drawing functions.

Note that the result of calculating empires (see Empires and SuperForce ) is represented as a TrackedTgraph. The result is actually the common faces of a forced boundary covering, but a particular element of the covering (the first one) is chosen as the background Tgraph with the common faces as a tracked sub-collection of faces. Hence we have
    empire1, empire2 :: Tgraph -> TrackedTgraph
    
    drawEmpire :: TrackedTgraph -> Diagram B
Figure 10 was also created using TrackedTgraphs.

Figure 10: Using a TrackedTgraph for drawing

7. Other Reading

Previous related blogs are:

Diagrams for Penrose Tiles – the first blog introduced drawing Pieces and Patches (without using Tgraphs) and provided a version of decomposing for Patches (decompPatch).

Graphs, Kites and Darts intoduced Tgraphs. This gave more details of implementation and results of early explorations. (The class Forcible was introduced subsequently).

Empires and SuperForce – these new operations were based on observing properties of boundaries of forced Tgraphs.

Graphs,Kites and Darts and Theorems established some important results relating force, compose, decompose.
by readerunner at September 24, 2025 09:01 AM

Chris Penner

Monads are too powerful: The Expressiveness Spectrum
Okay, so you and I both know monads are great, they allow us to sequence effects in a structured way and are in many ways a super-power in the functional-programming toolkit. It's likely none of us would have even heard of Haskell without them.

It's my opinion, though, that monads are actually too powerful for their own good. Or to be more clear, monads are more expressive than they need to be, and that we're paying hidden costs to gain expressive power that we rarely, if ever, actually use.

In this post we'll take a look at how different approaches to effects lie on the spectrum between expressiveness and strong static analysis, and how, just like Dynamic vs Statically typed programming languages, there's a benefit to limiting the number of programs you can write by adding more structure and constraints to your effects system.

The Status Quo

A defining feature of the Monadic interface is that it allows the dynamic selection of effects based on the results of previous effects.

This is a huge boon, and is what allowed the construction of real programs in Haskell without compromising on its goals of purity and laziness. This ability is what allows us to express normal programming workflows like fetching input from a user before deciding which command to run next, or fetching IDs from the database and then resolving those IDs with subsequent database calls. This form of choice is necessary for writing most moderately complex programs.

Alas, as it turns out, this expressiveness isn't free! It exists on a spectrum. As anyone who's maintained any relatively complex JavaScript or Python codebase can tell you, the ability to do anything at any time comes at a cost of readability, perhaps more relevant to the current discussion, at the cost of static analysis.

Allow me to present, in all its glory, the Expressiveness Spectrum:
Strong Static Analysis <+---------+---------+> Embarrassingly Expressive Code
As you can clearly see, as you gain more expressive power you begin to lose the ability to know what the heck your program could possibly do when it runs.

This has fueled a good many debates among programming language connoisseurs, and it turns out that there's a similar version of the debate to be had within the realm of effect systems themselves.

In their essence, effect systems are just methods of expressing miniature programs within your programming language of choice. These mini programs can be constructed, analysed, and executed at runtime within the framework of the larger programming language, and the same Expressiveness Spectrum applies independently to them as well. That is, the more programs you allow your effect system to express, the less you can know about any individual program before you run it.

In the effect-system microcosm there are similar mini compile time and run time stages. As an example here's a simple Haskell program which constructs a chain of effects using a DSL:
-- The common way to express effects in Haskell 
-- is with a Monadic typeclass interface.
class Monad m => ReadWrite m where
  readLine :: m String
  writeLine :: String -> m ()

-- We can write a little program builder which depends on 
-- input that may only be known at runtime.
greetUser :: ReadWrite m => String -> m () 
greetUser greeting = do
  writeLine (greeting <> ", what is your name?")
  name <- readLine
  writeLine ("Hello, " <> name <> "!")

-- We can, at run time, construct a new mini-program 
-- that the world has never seen before!
mkSimpleGreeting :: ReadWrite m => IO (m ())
mkSimpleGreeting = do 
  greeting <- readFile "greeting.txt"
  pure (greetUser greeting)
In this simplified example we clearly see that we can use our host languages features arbitrarily to construct a smaller program within our ReadWrite DSL. Our simple program here just reads a line of input from the user and then greets them by name.

This is all well and good in such a simple case, however if we expand our simple ReadWrite effect slightly by adding a new effect:
class Monad m => ReadWriteDelete m where
  readLine :: m String
  writeLine :: String -> m ()
  deleteMyHardDrive :: m ()
Well now, if we're constructing or parsing programs of the ReadWriteDelete effect type at runtime, we probably want to be able to know whether or not the program we're about to run contains a call to deleteMyHardDrive before we actually run it.

We could of course simply abort execution or ignore requests to delete everything when we're running the effects in our host language, which is nice, but the fact remains that if our app is handed an arbitrary ReadWriteDelete m => m () program at runtime, there's no way to know whether or not it could possibly contain a call to deleteMyHardDrive without actually running the program, and even then, there's no way to know whether there's some other possible execution path that we missed which does call deleteMyHardDrive.

We'd really love to be able to analyse the program and all of its possible effects before we run anything at all.

The Benefits of Static Analysis

Most programmers are familiar with the benefits of static analysis when applied to regular everyday programming languages. It can catch basic errors like type-mismatches, incorrect function calls, and in some cases things like memory unsafety or race conditions.

We're typically after different kinds of benefits when analysing programs in our effect systems, but they are similarly useful!

For instance, given enough understanding of an effectful program we can perform code transformations like removing redundant calls, parallelizing independent workflows, caching results, and optimizing workflows into more efficient ones.

We can also gain useful knowledge, like creating a call graph for developers to better understand what's about to happen. Or perhaps analyzing the use of sensitive resources like the file system or network such that we can ask for approval before even beginning execution.

But as I've already mentioned, we can't do most of these techniques in a Monadic effect system. The monad interface itself makes it clear why this is the case:
class Applicative m => Monad m where
  (>>=) :: m a -> (a -> m b) -> m b
  return :: a -> m a
We can see from Bind (>>=) that in order to know which effects (m b) will be executed next, we need to first execute the previous effect (m a) and then we need the host language (Haskell) to execute an arbitrary Haskell function. There's no way at all for us to gain insight about what the results of that function might be without running it first.

Let's move a step towards the analysis side of the spectrum and talk about Applicatives...

The origin of Applicatives

Applicatives are another interface for expressing effectful operations.

As far as I can determine, the first widespread introduction of Applicatives to programming was in Applicative Programming with Effects, a 2008 paper by Conor McBride and Ross Paterson.

Take note that this paper was written after Monads were already in widespread use, and Applicatives are, by their very definition, less expressive than Monads. To be precise, Applicatives can express fewer effectful programs than Monads can. This is shown by the fact that every Monad implements the Applicative interface, but not every Applicative is a Monad.

Despite being less expressive Applicatives are still very useful. They allow us to express programs with effects that aren't valid monads, but they also provide us with the ability to better analyse which effects are part of an effectful program before running it.

Take a look at the Applicative interface:
class Functor f => Applicative f where
  pure :: a -> f a
  (<*>) :: f (a -> b) -> f a -> f b
Notice how the interface does contain an arrow f (a -> b), but this arrow can only affect the pure aspect of the computation. Unlike monadic bind, there's no way to use the a result from running effects to select or build new effects to run.

The sequence of effects is determined entirely by the host language before we start to run the effects, and thus the sequence of effects can be reliably inspected in advance.

This limitation, if you can even call it that, gives us a ton of utility in program analysis. For any given sequence of Applicative Effects we can analyse it and produce a list of all the planned effects before running any of them, then could ask the end-user for permission before running potentially harmful effects.

Let's see what this looks like for our ReadWrite effect.
import Control.Applicative (liftA3)
import Control.Monad.Writer (Writer, runWriter, tell)

-- | We only require the Applicative interface now
class (Applicative m) => ReadWrite m where
  readLine :: m String
  writeLine :: String -> m ()

data Command
  = ReadLine
  | WriteLine String
  deriving (Show)

-- | We can implement an instance which runs a dummy interpreter that simply records the commands
-- the program wants to run, without actually executing anything for real.
instance ReadWrite (Writer [Command]) where
  readLine = tell [ReadLine] *> pure "Simulated User Input"
  writeLine msg = tell [WriteLine msg]

-- | A helper to run our program and get the list of commands it would execute
recordCommands :: Writer [Command] String -> [Command]
recordCommands w = snd (runWriter w)

-- | A simple program that greets the user.
myProgram :: (ReadWrite m) => String -> m String
myProgram greeting =
  liftA3
    (\_ name _ -> name)
    (writeLine (greeting <> ", what is your name?"))
    readLine
    (writeLine "Welcome!")

-- We can now run our program in the Writer applicative to see what it would do!
main :: IO ()
main = do
  let commands = recordCommands (myProgram "Hello")
  print commands

-- [WriteLine "Hello, what is your name?",ReadLine,WriteLine "Welcome!"]
Since this interface doesn't provide us with a bind, we can't use results from readLine in a future writeLine effect, which is a bummer. It's clear that Applicatives are less expressive in this way, but we can run an analysis of a program written in the Applicative ReadWrite to see exactly which effects it will run, and which arguments each of them are provided with, before we execute anything for real.

I hope that's enough ink to convince you that it's not a simple matter of "more expressive is always better", but rather that expressiveness exists on a continuum between ease of program analysis and expressiveness.

Expressive power comes at a cost, specifically the cost of analysis.

Closer to the Sweet Spot

So clearly Applicatives are nice, but they're a pretty strong limitation and prevent us from writing a lot of useful programs. What if there was an interface somewhere on the spectrum between the two?

Selective Applicatives fit nicely between Applicatives and Monads.

If you haven't heard of them, this isn't a tutorial on Selective itself, so go read up on them here if you like.

The interface for Selective Applicatives is similar to Applicatives, but they allow us to specify a known set of branching codepaths that our program may choose between when executing. Unlike the monadic interface, these branching paths need to be known and enumerated in advance, we can't make them up on the fly while running our effects.

This interface gets us much closer to matching the level of expressiveness we actually need for everyday programming while still granting us most of the best benefits of program analysis.

Here's an example of what it looks like to analyse a ReadWriteDelete program using Selective Applicatives:
import Control.Monad.Writer
import Control.Selective as Selective
import Data.Either
import Data.Functor ((<&>))

-- We require the Selective interface now
class (Selective m) => ReadWriteDelete m where
  readLine :: m String
  writeLine :: String -> m ()
  deleteMyHardDrive :: m ()

data Command
  = ReadLine
  | WriteLine String
  | DeleteMyHardDrive
  deriving (Show)

-- | "Under" is a helper for collecting the 
-- *minimum* set of effects we might run.
instance ReadWriteDelete (Under [Command]) where
  readLine = Under [ReadLine]
  writeLine msg = Under [WriteLine msg]
  deleteMyHardDrive = Under [DeleteMyHardDrive]

-- | "Over" is a helper which collects *all* possible effects we might run.
instance ReadWriteDelete (Over [Command]) where
  readLine = Over [ReadLine]
  writeLine msg = Over [WriteLine msg]
  deleteMyHardDrive = Over [DeleteMyHardDrive]

-- | A "real" IO instance
instance ReadWriteDelete IO where
  readLine = getLine
  writeLine msg = putStrLn msg
  deleteMyHardDrive = putStrLn "Deleting hard drive... Just kidding!"

-- | A program using Selective effects
myProgram :: (ReadWriteDelete m) => m String
myProgram =
  let msgKind =
        Selective.matchS
          -- The list of values our program has explicit branches for.
          -- These are the values which will be used to crawl codepaths when
          -- analysing your program using `Over`.
          (Selective.cases ["friendly", "mean"])
          -- The action we run to get the input
          readLine
          -- What to do with each input
          ( \case
              "friendly" -> writeLine ("Hello! what is your name?") *> readLine
              "mean" -> 
                let msg = unlines [ "Hey doofus, what do you want?"
                                  , "Too late. I deleted your hard-drive."
                                  , "How do you feel about that?"
                                  ]
                 in writeLine msg *> deleteMyHardDrive *> readLine
              -- This can't actually happen.
              _ -> error "impossible"
          )
      prompt = writeLine "Select your mood: friendly or mean"
      fallback =
        (writeLine "That was unexpected. You're an odd one aren't you?")
          <&> \() actualInput -> "Got unknown input: " <> actualInput
   in prompt
        *> Selective.branch
          msgKind
          fallback
          (pure id)

allPossibleCommands :: Over [Command] x -> [Command]
allPossibleCommands (Over cmds) = cmds

minimumPossibleCommands :: Under [Command] x -> [Command]
minimumPossibleCommands (Under cmds) = cmds

runIO :: IO String
runIO = myProgram

-- | We can now run our program in the Writer applicative to see what it would do!
main :: IO ()
main = do
  let allCommands = allPossibleCommands myProgram
  let minimumCommands = minimumPossibleCommands myProgram
  putStrLn "All possible commands:"
  print allCommands
  putStrLn "Minimum possible commands:"
  print minimumCommands

-- All possible commands:
-- [ WriteLine "Select your mood: friendly or mean"
-- , ReadLine
-- , WriteLine "Hey doofus, what do you want?\nToo late. I deleted your hard-drive.\nHow do you feel about that?"
-- , DeleteMyHardDrive
-- , ReadLine
-- , WriteLine "Hello! what is your name?"
-- , ReadLine
-- , WriteLine "That was unexpected. You're an odd one aren't you?"
-- ]
--
-- Minimum possible commands:
-- [ WriteLine "Select your mood: friendly or mean"
-- , ReadLine
-- ]
Okay, so now you've read a program which uses the full power of Selective applicative to branch based on the results of previous effects.

We can branch on user input to select either a friendly or mean greeting style, so it's clearly more expressive than the Applicative version, but it's also pretty obvious that this is the clunkiest option available. It's a bit tricky to write, and is also pretty tough to read.

We can now branch on user input, but since we need to pre-configure an explicit branch for every possible input we want to handle, we can't even write a simple program which echos back whatever the user types in, or even one that greets them by name. There are clearly still some substantial limitations on which programs we can express here.

However, let's look on the bright side for a bit, similar to our approach with Applicatives we can analyse the commands our program may run. This time however, we've got branching paths in our program.

The selective interface gives us two methods to analyse our program:

The Under newtype will let us collect the minimum possible sequence of of effects that our program will run no matter what inputs it receives.

The Over newtype instead collects the list of all possible effects that our program could possibly encounter if it were to run through all of its branching paths.

This isn't as usful as receiving, say, a graph representing the possible execution paths, but it does give us enough information to give users a warning aobut what a program might possibly do, we can let them know that hey, I don't know exactly what will cause it, but this program has the ability to delete your hard-drive.

You can of course write additional Selective interfaces, or use the Free Selective to re-write Selective computations in order to optimize or memoize them as you wish just like you can with Applicatives.

It's clear at this point that Selectives are another good tool, but the limitations are still too severe:

We can't use results from previous effects in future effects.

We can't express things like loops or recursion which require effects

Branching logic like case-statements are expressible, but very cumbersome.

The syntax for writing programs using Selective Applicatives is a bit rough, and there's no do-notation equivalent.

In search of the true sweet spot

This isn't a solved problem yet, but don't worry, there are yet more methods of sequencing effects to explore!

It may take me another 5 years to finally finish it, but at some point we'll continue this journey and explore how we can sequence effects using the hierarchy of Category classes instead. Perhaps you've wondered why Arrows don't get more love, we'll dive into that too! We'll seek to find a more tenable middle-ground on our Expressiveness Spectrum, a place where we can analyze possible execution paths without sacrificing the ability to write the programs we need.

I hope this blog post helps others to understand that while Monads were a huge discovery to the benefit of functional programming, that we should keep looking for abstractions which are a better fit for the problems we generally face in day-to-day programming.

Hopefully you learned something ðŸ¤ž! Did you know I'm currently writing a book? It's all about Lenses and Optics! It takes you all the way from beginner to optics-wizard and it's currently in early access! Consider supporting it, and more posts like this one by pledging on my Patreon page! It takes quite a bit of work to put these things together, if I managed to teach your something or even just entertain you for a minute or two maybe send a few bucks my way for a coffee? Cheers! ðŸ�»
September 24, 2025 12:00 AM

September 22, 2025

GHC Developer Blog

GHC 9.12.3-rc1 is now available

GHC 9.12.3-rc1 is now available

wz1000 - 2025-09-22

The GHC developers are very pleased to announce the availability of the release candidate for GHC 9.12.3. Binary distributions, source distributions, and documentation are available at downloads.haskell.org and via GHCup.

GHC 9.12.3 is a bug-fix release fixing several issues of a variety of severities and scopes. A full accounting of these fixes can be found in the release notes. As always, GHCâ€™s release status, including planned future releases, can be found on the GHC Wiki status.

This release candidate will have a two-week testing period. If all goes well the final release will be available the week of 2 October 2025.

We would like to thank Well-Typed, Tweag I/O, Juspay, QBayLogic, Channable, Serokell, SimSpace, the Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprise this release.

As always, do give this release a try and open a ticket if you see anything amiss.

by ghc-devs at September 22, 2025 12:00 AM

September 21, 2025

Mark Jason Dominus

My new git utility `what-changed-twice` needs a new name
As I have explained in the past, my typical workflow is to go along commiting stuff that might or might not make sense, then clean it all up at the end, doing multiple passes with git-add and git-rebase to get related changes into the same commit, and then to order the commits in a sensible way. Yesterday I built a new utility that I found helpful. I couldn't think of a name for it, so I called it what-changed-twice, which is not great but my I am bad at naming things and my first attempt was analyze-commits. I welcome suggestions. In this article I will call it Fred.

What is Fred for? I have a couple of uses for it so far.

Often as I work I'll produce a chain of commits that looks like this:
470947ff minor corrections
d630bf32 continue work on `jq` series
c24b8b24 wip
f4695e97 fix link
a8aa1a5c sp
5f1d7a61 WIP
a337696f Where is the quincunx on the quincunx?
39fe1810 new article: The fivefold symmetry of the quince
0a5a8e2e update broken link
196e7491 sp
bdc781f6 new article: fpuzhpx
40c52f47 merge old and new seasons articles and publish
b59441cd finish updating with Star Wars Droids
537a3545 droids and BJ and the Bear
d142598c Add nicely formatted season tables to this old article
19340470 mention numberphile video
It often happens that I will modify a file on Monday, modify it some more on Tuesday, correct a spelling error on Wednesday. I might have made 7 sets of changes to the main file, of which 4 are related, 2 others are related to each other but not to the other 4, and the last one is unrelated to any of the rest. When a file has changed more than once, I need to see what changed and then group the changes into related sets.

The sp commits are spelling corrections; if the error was made in the same unmerged topic branch I will want to squash the correction into the original commit so that the error never appears at all.

Some files changed only once, and I don't need to think about those at this stage. Later I can go back and split up those commits if it seems to make the history clearer.

Fred takes the output of git-log for the commits you are interested in:
$ git log --stat -20 main...topic | /tmp/what-changed-twice
It finds which files were modified in which commits, and it prints a report about any file that was modified in more than one commit:
 calendar/seasons.blog  196 40 d1
  math/centrifuge.blog  193 33
misc/straight-men.blog  53 b5 bd
        prog/jq-2.blog  33 5f d6 

    193  1934047
    196  196e749
     33  33a2304
     40  40c52f4
     53  537a354
     5f  5f1d7a6
     b5  b59441c
     bd  bdc781f
     d1  d142598
     d6  d630bf3
The report is in two parts. At the top, the path of each file that changed more than once in the log, and the (highly-abbreviated) commit IDs of the commits in which it changed. For example, calendar/seasons.blog changed in commits 196, 40, and d1. The second part of the report explains that 196 is actually an abbreviation for commit 196e749.

Now I can look to see what else changed in those three commits:
$ git show --stat 196e749 40c52f4 d142598
then look at the changes to calendar/seasons.blog in those three commits
$ git show 196e74 40c52f4 d142598 -- calendar/seasons.blog
and then decide if there are any changes I might like to squash together.

Many other files changed on the branch, but I only have to concern myself with four.

There's bonus information too. If a commit is not mentioned in the report, then it only changed files that didn't change in any other commit. That means that in a rebase, I can move that commit literally anywhere else in the sequence without creating a conflict. Only the commits in the report can cause conflicts if they are reordered.

I write most things in Python these days, but this one seemed to cry out for Perl. Here's the code.

Hmm, maybe I'll call it squash-what.
by Mark Dominus (mjd@plover.com) at September 21, 2025 02:48 PM

September 18, 2025

Philip Wadler

Curious: Brave New Bullshit

by Philip Wadler (noreply@blogger.com) at September 18, 2025 04:34 PM

Gabriella Gonzalez

Steering Committee Retrospective

Steering Committee Retrospective
I am voluntarily ending my Nix Steering Committee term early (I am only serving out a one-year term instead of two) and I wanted to document the reasons for my early exit.

The short version is: I believe the Nix Steering Committee is in need of reform in order to be effective and in its present state it does not set up the Nix community for success nor does it set up individual Steering Committee members for success. In particular, I’m resigning because I’m unable to make progress on issues that I care about and campaigned on even when there is a Steering Committee supermajority in favor of these policy positions.

That might sound surprising, which brings me to the longer version of my concerns, starting with:

Size

I believe the Steering Committee is too large and should be reduced in size (which would require a change to the Constitution). I think the Steering Committee should be (conservatively) reduced to five members and possibly (more aggressively) reduced to even just three members. The large size of the Steering Committee is counterproductive because of:

diffusion of responsibility

Steering Committee members are less willing to step up and volunteer for various responsibilities if they believe they can offload that responsibility onto another Steering Committee member.

This also has multiple negative downstream effects. For example, you tend to see an unequal division of responsibilities which in turn leads to all participants engaging less: the participants who volunteer too much burn out and the participants who volunteer too little check out.

more stagnation

It’s much harder and slower to round up a majority of votes on anything when the committee is larger. This doesn’t just affect final votes on community policies: it slows down intermediate steps such as delegation of tasks, public statements … everything. The high latency and activation energy surrounding all of these things kills momentum on a lot of internal efforts and fosters a committee culture of learned helplessness.

greater difficulty building consensus

The Steering Committee can technically force certain policies/statements/initiatives through by simple majorities over the protest of the minority, but we try to avoid this as much as possible because that’s an easy way to kill the working relationship between committee members (and it’s already hard enough to get anything done when the working relationship is good).

The consensus-building is also particularly difficult because of the next issue:

Timidity

Consensus-building wouldn’t be as much of a problem if the Steering Committee were willing to force through certain policies with a vote but many of the current Steering Committee members do not have the temperament to “disagree and commit”, which means that if any committee member raises an objection and/or filibusters then the issue typically dies in committee. In particular, several committee members will wait for unanimous consensus before formally voting in support of something. For example, there were a few cases where we had a supermajority of the committee theoretically in support of a policy and we still got bogged down trying to please a highly vocal minority instead of shutting them down.

Poor self-organization and internal policies/procedures

As the first “edition” of the Steering Committee we had to self-organize and figure out how we would operate. I think there are some things we got right, but also some things that I believe we got wrong.

I think one of the big mistakes we made was that we insisted on “speaking with one voice”, meaning that we could not make any meaningful external statements or comments without getting majority approval from the committee (something we had difficulty with on the regular). This is why the committee remained largely silent or slow-to-respond on a large number of issues.

This problem got bad enough that at some point many members began to break the wall of silence by commenting in an unofficial capacity on high-profile issues so that outsiders would get some visibility into what was going on instead of waiting for us to completely the slow process of gathering enough consensus and votes.

Another internal policy that I believe was counter-productive was not disclosing the final votes on various issues or requiring individual signatories on public statements. Had we done this it would have likely broken a lot of internal stalemates and filibusters if all committee members were held publicly accountable for their policy positions (and therefore subject to public pressure).

It would have also helped with another issue, which was:

Absenteeism

For various reasons (some justifiable, some not), at many points in time a large number of committee members would be unreachable, even during crucial junctures like ongoing controversy. This absenteeism was masked by the committee not publicizing that fact earlier. If we had required all votes to be publicly recorded and all statements to require individual signatories it would have exposed this absenteeism earlier (and led to quicker corrections).

Conclusion

I burned out on Steering Committee work for the above reasons, which is why I’m ending my term after one year instead of two.

I hope that people reading this push for reforms and candidates that will address the current stagnation on the committee, which is why I’m breaking the wall of silence to publicize my criticisms. I’ve done my part attempting to fix some of these issues but I haven’t been successful in doing so (one reason why I believe that I’m not the correct person for the job).

I don’t want to give the impression that the Steering Committee accomplished nothing or that they were a force for bad/harm. There were several positive outcomes of the Steering Committee’s first year, but overall I feel like there is still wasted potential that could be improved upon. I originally ran for the Nix Steering Committee because I want to see Nix win, meaning that I want Nix to go mainstream and I also want Nix/NixOS/Nixpkgs to come out ahead against other forks.

The early end of my term means that there is another Steering Committee opening for the upcoming election, so if you believe you can do a better job of fixing the problem I encourage you to run for the seat I’m vacating. There are five openings on the Steering Committee up for election, so there is ample opportunity for newcomers to shake things up.

by Gabriella Gonzalez (noreply@blogger.com) at September 18, 2025 04:18 PM

Tweag I/O

Managing dependency graph in a large codebase

In the previous post, we explored the concepts of the dependency graph and got familiar with some of its applications in the context of build systems. We also observed that managing dependencies can be complicated.

In this post, we are going to take a closer look at some of the issues you might need to deal with when working in a large codebase, such as having incomplete build metadata or conflicting requirements between components.

Common issues

Diamond dependency

The diamond dependency problem is common in large projects, and resolving it often requires careful dependency version management or deduplication strategies.

Imagine you have these dependencies in your project:

Packaging appA and appB individually is not a problem because they will end up having libX of a particular version. But what if appA starts using something from libB as well? Now when building appA, it is unclear what version of libX should be used — v1 or v2. This results in having a part of the dependency graph looking like a diamond hence the dependency name.

Depending on the programming language and the packaging mechanisms, it might be possible to specify that when calls are made from libA, then libX.v1 should be used, and when calls are made from libB, then libX.v2 should be used, but in practice it can get quite complicated. The worst situation is perhaps when appA is compatible with both v1 and v2, but may suffer from intermittent failures when being used in certain conditions such as under high load. Then you would actually be able to build your application, and since it includes a “build compatible” yet different version of the third-party library, you won’t be able to spot the issue straight away.

Some tools, such as the functional package manager nix, treat packages as immutable values and allow you to specify exact versions of dependencies for each package, and these can coexist without conflict.

Having a single set of requirements can also be desirable, because if all the code uses the same versions of required libraries, you avoid version conflicts entirely and everyone in the company works with the same dependencies, reducing “works on my machine”-type issues. In practice, however, this is often unrealistic for large or complex projects, especially in large monorepos or polyglot codebases. For instance, upgrading a single dependency may require updating many parts of the codebase at once, which might be risky and time-consuming. Likewise, if you want to split your codebase into independently developed modules or services, a single requirements set can become a bottleneck.

Re-exports

Re-exports — when a module imports a member from another module and re-exports it — are possible in some languages such as Python or JavaScript.

Take a look at this graph

where appA needs value of dpi from the config, but instead of importing from the config, it imports it from libA. While re-exports may simplify imports and improve encapsulation, they also introduce implicit dependencies: downstream code like appA becomes coupled not only to libA, but also to the transitive closure of libA. In this graph this means that changes in any modules that libA depends on would require rebuilding appA. This is not truly needed since appA doesn’t really depend on any code members from that closure.

To improve the chain of dependencies, the refactored graph would look like this:

Identifying re-exports can be tricky particularly with highly dynamic languages such as Python. The available tooling is limited (e.g. see mypy), and custom static analysis programs might need to be written.

Stale dependencies

Maintaining up-to-date and correct build metadata is necessary to represent the dependency graph accurately, but issues might appear silently. For example, you might have modules that were once declared to depend on a particular library but do not depend on them any longer (however, the metadata in build files suggests they still are). This can cause your modules to be unnecessarily rebuilt every time the library changes.

Some build systems such as Pants rely on dependency inference where users do not have to maintain the build metadata in build files, but any manual dependencies declared (where inference cannot be done programmatically in all situations) still need to be kept up-to-date and might easily get stale.

There are tools that can help ensuring the dependency metadata is fresh for C++ (1, 2) Python, and JVM codebases, but often keeping the build metadata up-to-date is still a semi-automated process that cannot be safely automated completely due to edge cases and occasional false positives.

Incompatible dependencies

It is possible for an application to end up depending on third-party libraries that cannot be used together. This could be enforced for multiple reasons:

to ensure the design is sane (e.g., only a single cryptography library may be used by an application)

to avoid malfunctioning of the service (e.g., two resource intensive backend services can’t be run concurrently)

to keep the CI costs under control (e.g., tests may not depend on a live database instance and should always use rich mock objects instead).

Appropriate rules vary between organizations, and should be updated continuously as the dependency graph evolves. If you use Starlark for declaring build metadata, take a look at buildozer which can help querying the build files when validating dependencies statically.

Large transitive closures

If a module depends on a lot of other modules, it’s more likely that it will also need to be changed whenever any of those dependencies change. Usually, bigger files (with more lines of code) have more dependencies, but that’s not always true. For example, a file full of boilerplate or generated code might be huge, but barely depend on anything else. Sticking to good design practices — like grouping related code together and making sure classes only do one thing — can help keep your dependencies under control.

For example, with this graph

a build system is likely to require running all test cases in tests should any of the apps change which would be wasteful most of the time since most likely you are going to change only one of them at a time.

This could be refactored in having individual test modules targeting every application individually:

Third-party dependencies

It is generally advisable to be cautious about adding any dependency, particularly a third-party one, and its usage should be justified — it may pay off to be reluctant to adding any external dependencies unless the benefits of bringing them outweigh the associated cost.

For instance, a team working on a Python command-line application processing some text data may consider using pandas because it’s a powerful data manipulation tool and twenty lines of code written using built-in modules could be replaced by a one-liner with pandas. But what happens when this application is going to be distributed? The team will have to make sure that pandas (which contains C code that needs to be compiled) can be used on all supported operating systems and CPU architectures meeting the reliability and performance constraints.

It may sound harsh, but there’s truth to the idea that every dependency eventually becomes a liability. By adding a dependency (either to your dependency graph, if it’s a new one, or to your program), you are committing to stay on top of its security vulnerabilities, compatibility with other dependencies and your build system, and licensing compliance.

Adding a new dependency means adding a new node or a new edge to the dependency graph, too. The graph traversal time is negligible, but the time spent on rebuilding code at every node is not. The absolute build time is less of a problem since most build systems can parallelize build actions very aggressively, but what about the computational time? While developer time (mind they still have to wait for the builds to finish!) is far more valuable than machine time, every repeated computation during a build contributes to the total build cost. These operations still consume resources — whether you’re paying a cloud provider or covering the energy and maintenance costs of an on-premises setup.

Cross-component dependencies

It is common for applications to depend on libraries (shared code), however, it is also possible (but less ideal) for an application to use code from another application. If multiple applications have some code they both need, it is often advisable that this code is extracted into a shared library so that both applications can depend on that instead.

Modern build systems such as Pants and Bazel have a visibility control mechanism that enforces rules of dependency between your codebase components. These safeguards exist to prevent developers from accessing and incorporating code from unrelated parts of the codebase. For instance, when building source code for accounting software, the billing component should never depend on the expenses component just because it also needs to support exports to PDF.

However, visibility rules may not be expressive enough to cover certain cases. For instance, if you follow a particular deployment model, you may need to make sure that a specified module will never end up as a transitive dependency of a certain package. You may also want to enforce that some code is justified to exist in a particular package only if it’s being imported by some others. For example, you may want to prevent placing any modules in the src/common-plugins package unless they are imported by src/plugins package modules to keep the architecture robust.

Keep in mind that when introducing a modern build system to a large, legacy codebase that has evolved without paying attention to the dependency graph’s shape, builds may be slow not because the code compilation or tests take long, but because any change in the source code requires re-building most or all nodes of the dependency graph. That is, if all nodes of the graph transitively depend on a node with many widely used code members that are modified often, there will be lots of re-build actions unless this module is split across multiple modules each containing only closely related code.

Direct change propagation

When source code in a module is changed, downstream nodes (reverse dependencies of this module) often get rebuilt even if the specific changes don’t truly require it. In large codebases, this causes unnecessary rebuilds, longer feedback cycles, and higher CI costs.

In most build systems (including Bazel and GNU Make), individual actions or targets are invalidated if their inputs change. In GNU Make, this would be mtime of declared input files, and in Bazel, this would be digests, or the action key. Most build systems can perform an “early cutoff” if the output of an action doesn’t change. Granted, with GNU Make, the mtime could be updated even if the output was already correct from a previous build (which will force unnecessary rebuilds), but that’s a very nuanced point.

However, with Application Binary Interface (ABI) awareness, it would only be necessary to rebuild downstream dependencies if the interface they rely on has actually changed.

A related idea is having a stable API, which can help figure out which nodes in the graph actually changed. Picture a setup like this — an application depends on the database writer module which in turn depends on the database engine:

This application calls the apply function from the database writer module to insert some rows, which then uses the database engine to handle the actual disk writing. If anything in internals changes (e.g., how the data is compressed before writing to disk), the client won’t notice as long as the writer’s interface stays the same. That interface acts as a “stable layer” between the parts. In the build context, running tests of the application should not be necessary on changes in the database component.

Practically, reordering methods in a Java class, adding a docstring to a Python function, or even making minor changes in the implementation (such as return a + b instead of return b + a) would still be marking that node in the graph as “changed” particularly if you rely on tooling that queries modified files in the version control repository without taking into account the semantics of the change.

Therefore, relying on the checksum of a source file or all files in a package (depending on what a node in your dependency graph represents) just as relying on checksum of compiled objects (be it machine code or bytecode) may prove insufficient when determining what kind of change deserves to be propagated further in the dependency chain of the graph. Take a look at the Recompilation avoidance in rules_haskell to learn more about checksum based recompilation avoidance in Haskell.

Many programming languages have language constructs, such as interfaces in Go, that can avoid this problem by replacing a dependency on some concrete implementation with a dependency on a shared public interface. The application from the example above could depend on a database interface (or abstract base class) instead of the actual implementation. This is another kind of “ABI” system that avoids unnecessary rebuilds and helps to decouple components.

How ABI compatibility is handled depends on the build system used. In Buck, there is a concept of Java ABI that is used to figure out which nodes actually need rebuilding during an incremental build. For example, a Java library doesn’t always need to be rebuilt just because one of its dependencies changed unless the public interface of that dependency changed too. Knowing this helps skip unnecessary rebuilds when the output would be the same anyway.

In the most recent versions of Bazel, there is experimental support for dormant dependencies which are not an actual dependency, but the possibility of one. The idea is that every edge between nodes can be marked as dormant, and then it is possible for it to be passed up the dependency graph and turned into an actual dependency (“materialized”) in the reverse transitive closure. Take a look at the design document to learn more about the rationale.

We hope it is clear now how notoriously complex managing a large dependency graph in a monorepo is. Changes in one package can ripple across dozens or even hundreds of interconnected modules. Developers must carefully coordinate versioning, detect and prevent circular dependencies, and ensure that builds remain deterministic, particularly in industries with harder reproducibility constraints such as automotive or biotech.

Failing to keep the dependency graph sane often leads to brittle CI pipelines and long development feedback loops which impedes innovation and worsens developer experience. In the future, we can expect more intelligent tools to emerge such as machine learning based dependency impact analyzers that predict downstream effects of code changes and self-healing CI pipelines that auto-adjust scope and change propagation. Additionally, semantic-aware refactoring tools and “intent-based” build systems could automate much of the manual effort that is currently required to manage interdependencies at scale.

In the next post, we’ll talk about scalability problems and limitations of the dependency graph scope that is exposed by build systems and explore some applications of graph querying that are relevant for tests selection and code review assignment strategy.

September 18, 2025 12:00 AM

September 16, 2025

Magnus Therning

Listing buffers by tab using consult and bufferlo
I've gotten into the habit of using tabs, via tab-bar, to organise my buffers when I have multiple projects open at once. Each project has its own tab. There's nothing fancy here (yet), I simply open a new tab manually before opening a new project.

A while ago I added bufferlo to my config to help with getting consult-buffer to organise buffers (somewhat) by tab. I copied the configuration from the bufferlo README and started using it. It took me a little while to notice that the behaviour wasn't quite what I wanted. It seemed like one buffer "leaked" from another tab.

Figure 1: Example of buffer leakage

In the image above all files in ~/.emacs.d should be listed under Other Buffers, but one has been brought over into the tab for the Sider project.

After a bit of experimenting I realised that

the buffer that leaks is the one I'm in when creating the new tab, and

my function for creating a new tab doesn't work the way I thought.

My function for creating a new tab looked like this
(lambda ()
  (interactive)
  (tab-new)
  (dashboard-open))
and it turns out that tab-new shows the current buffer in the new tab which in turn caused bufferlo to associate it to the wrong tab. From what I can see there's no way to tell tab-new to open a specific buffer in the newly created tab. I tried the following
(lambda ()
  (interactive)
  (with-current-buffer dashboard-buffer-name
    (tab-new)))
hoping that the dashboard would open in the new tab. It didn't, it was still the active buffer that popped up in the new tab.

In the end I resorted to use bufferlo-remove to simply remove the current buffer from the new tab.
(lambda ()
  (interactive)
  (tab-new)
  (bufferlo-remove (current-buffer))
  (dashboard-open))
No more leakage and consult-buffer works like I wanted it to.

Tags: emacs
September 16, 2025 06:29 AM

September 14, 2025

Haskell Interlude

70: Phil Wadler

We sat down with Phil Wadler, one of the most influential folks in the Haskell community, functional programming, and programming languages, responsible for type classes, monads, and much more. We take a stroll down memory lane, starting from Haskell's inception. We talked about the difference between research and Phil's work on impactful industrial projects and standards - specifically XML and the design of generics in Java, as well as Phll's teaching at the University of Edinburgh using Agda.. Phil is a fountain of great ideas and stories, and this conversation could have gone on for hours. As it is, we hope you enjoy the hour that we had as much as we did.

by Haskell Podcast at September 14, 2025 07:00 AM

Christopher Allen

Moonbit developers are lying to you

The Moonbit team recently published a blog post claiming their language runs "30% faster than Rust" for FFT workloads. This is a lie by omission. They benchmarked against a deliberately crippled Rust implementation that no competent programmer would write.

The Moonbit FFT benchmark used a crippled Rust baseline and used to claim their language was faster than Rust.

My corrected Rust implementation is 3.2–3.4× faster than Moonbit on the same benchmark.

In 5 minutes of prompting GPT-5, I produced a Rust version already 2.33× faster than Moonbit.

Zero PRs merged or replied to by the team at time of writing. There are PRs fixing the Rust benchmark older than their tweet announcing Moonbit was faster than Rust.

Moonbit devs are programming language developers that have marketed their language aggressively on the basis of performance for awhile now, they know better than this.

Moonbit should retract or clearly amend their blog post with corrected Rust baseline results. Including the qualification that their benchmark is a naive Cooley-Tukey FFT benchmark and nothing else.

by Unknown at September 14, 2025 12:00 AM

September 13, 2025

Philip Wadler

Haskell equations, thirty-eight years later
One night, while drifting off to sleep (or failing to), I solved a conundrum that has puzzled me since 1987.
Before Haskell there was Orwell. In Orwell equations were checked to ensure order is unimportant (similar to Agda today). When an equation was to match only if no previous equation applied, it was to be preceded by ELSE. Thus, equality on lists would be defined as follows:
    (==) :: Eq a => [a] -> [a] -> Bool
    [] == []          =  True
    (x:xs) == (y:ys)  =  x == y && xs == ys
    ELSE
    _ == _            =  False
We pondered whether to include this restriction in Haskell. Further, we wondered whether Haskell should insist that order is unimportant in a sequence of conditionals, unless ELSE was included. Thus, equality on an abstract type Shape would be defined as follows:
    (==) :: Shape -> Shape -> Bool
    x == y | circle x && circle y  =  radius x == radius y
           | square x && square y  =  side x == side y
    ELSE
           | otherwise             =  False
In Orwell and early Haskell, guards were written at the end of an equation and preceded by the keyword if or the end of an equation could be labelled otherwise. (Miranda was similar, but lacked the keywords.) Here I use the guard notation, introduced later by Paul Hudak, where otherwise is a variable bound to True.
Sometime two equations or two guards not separated by ELSE might both be satisfied. In that case, we thought the semantics should ensure that both corresponding right-hand sides returned the same value, indicating an error otherwise. Thus, the following:
    plus :: Thing -> Thing -> Thing
    plus x y | zero x     =  y
             | zero y     =  x
    ELSE
             | otherwise  =  ...
would be equivalent to:
    plus :: Thing -> Thing -> Thing
    plus x y | zero x && zero y && x == y    =  x
             | zero x && zero y && x /= y    =  error "undefined"
             | zero x && not (zero y)        =  y
             | not (zero x) && zero y        =  x
             | not (zero x) && not (zero y)  =  ...
Here the code checks that if x and y are both zero then they are the same. (I will consider a refinement to the check for sameness later.) Of course, the compiler would issue code that performs the tests zero x, zero y, and x == y at most once.
We didn’t pursue this design in Haskell for two reasons. First, because we thought it might be too unfamiliar. Second, because the ELSE on a line by itself was syntactically awkward. It would be especially annoying if one ever wanted the usual cascading behaviour:
    f :: Thing -> Thing
    f x | p x  =  ...
    ELSE
        | q x  =  ...
    ELSE
        | r x  =  ...
Here each guard is tested in turn, and we take the first that succeeds.
Today, the first problem is perhaps no longer quite so strong an issue. Many applications using Haskell would welcome the extra assurance from flagging any cases where order of the equations is significant. But the syntactic awkwardness of ELSE remains considerable. It was syntax about which I had an insight while tossing in bed.
Above otherwise is a variable bound to True in the standard prelude. But say we were to treat otherwise as a keyword, and to give it the meaning that the equation applies only if no previous equation applies, and to allow it to optionally be followed by a further guard. Then our first example becomes:
    (==) :: Eq a => [a] -> [a] -> Bool
    [] == []            =  True
    (x:xs) == (y:ys)    =  x == y && xs == ys
    _ == _ | otherwise  =  False
And our second example becomes:
    (==) :: Shape -> Shape -> Bool
    x == y | circle x && circle y  =  radius x == radius y
           | square x && square y  =  side x == side
           | otherwise             =  False
And our third example becomes:
    plus :: Thing -> Thing -> Thing
    plus x y | zero x     =  y
             | zero y     =  x
             | otherwise  =  ...
If one doesn’t want to invoke the equality test in the case that both zero x and zero y hold then one would instead write:
    plus :: Thing -> Thing -> Thing
    plus x y | zero x            =  y
             | otherwise zero y  =  x
             | otherwise         =  ...
Similarly, the cascading example becomes:
    f :: Thing -> Thing
    f x | p x            =  ...
        | otherwise q x  =  ...
        | otherwise r x  =  ...
That’s it! The syntactic awkwardness is greatly reduced.
The proposed notation depends upon Paul’s clever insight to move the guard from the end of the equation to the middle, so evaluation works strictly left to right. But we’ve had guards in that position for quite a while now. Goodness knows why none of us hit upon this proposal thirty-odd years ago.
Of course, the change is not backward compatible. Changes to guards could be made backward compatible (with added ugliness) by using a different symbol than ‘|’ to mark guards with the new semantics. But now the old definition of (==) should not be accepted without an otherwise, and I cannot think of how to introduce that new semantics with a backward compatible syntax.
The solution, as with so much of Haskell nowadays, is to activate the new semantics with a pragma. Manual porting of legacy code would not be hard in most cases, and it would also be easy to write a tool that adds otherwisewhenever the equations are not easily shown to be independent of order.
John Hughes suggests a further refinement to the above. Using equality to check that the value of two equations is the same may not be appropriate if the values are computed lazily. Instead, he suggests that the plus example should translates as follows:
    plus :: Thing -> Thing -> Thing
    plus x y | zero x && zero y              =  x `meet` y
             | zero x && not (zero y)        =  y
             | not (zero x) && zero y        =  x
             | not (zero x) && not (zero y)  =  ...
Here we presume a type class
    class Meet a where
      meet : a -> a -> a
which confirms that the two arguments are the same and returns a value that is the same as both the arguments. For strict data types, two arguments are the same if they are equal.
    instance Meet Integer where
      x `meet` y | x == y     =  x
                 | otherwise  =  error "undefined"
For lazy data types, we check that they are the meet lazily.
    instance Meet a => Meet [a] where
      [] `meet` []           =  []
      (x:xs) `meet` (y:ys)   =  (x `meet` y) :: (xs `meet` ys)
      meet _ _  | otherwise  =  error "undefined"
If the compiler could not verify that equations are disjoint, it would require that their right-hand sides have a type belonging to the class Meet.
In most cases, one would hope the compiler could verify that equations are disjoint, and hence would not have to resort to meet or additional checks. One might wish to allow a pragma to declare disjointness, permitting the compiler to assume, for instance, that x < y and x >= y are disjoint. An SMT solver could do much of the work of checking for disjointness.
In general, equations not separated with otherwise would be checked to ensure they are disjoint or all give equivalent results. For example,
    g :: Thing -> Thing
    g x | p x             =  a x
        | q x             =  b x
        | otherwise r x   =  c x
        | s x             =  d x
        | otherwise t x   =  e x
would be equivalent to
    g :: Thing -> Thing
    g x | p x && q x              =  a x `meet` b x
        | p x && not (q x)        =  a x
        | q x && not (p x)        =  b x
        | otherwise r x && s x    =  c x `meet` d x
        | r x && not (s x)        =  c x
        | s x && not (r x)        =  d x
        | otherwise t x           =  e x
On the other hand, if we declared that p x and q x are disjoint, and the same for s x and r x, then the first code would instead compile to something equivalent to Haskell’s current behaviour,
    g :: Thing -> Thing
    g x | p x             =  a x
        | otherwise q x   =  b x
        | otherwise r x   =  c x
        | otherwise s x   =  d x
        | otherwise t x   =  e x
One drawback of this proposal is that the source code doesn’t directly indicate when extra tests and the use of meet are required. An IDE might provide feedback to make explicit which tests are performed, or one might also add pragmas or additional syntax to reflect that information in the source.
I hope some reader might be keen to take this forward. What do you think?
by Philip Wadler (noreply@blogger.com) at September 13, 2025 10:37 PM

September 09, 2025

Philip Wadler

Translation Table

I remember seeing a version of the above in High School. My favourite entries, which I quote to this day, are
"... accidentally strained during mounting" --> "... dropped on the floor"
"... handled with extreme care throughout the experiments" --> "... not dropped on the floor"
and
"correct within an order of magnitude" --> "wrong"
From Futility Closet. Spotted via Boing Boing.

by Philip Wadler (noreply@blogger.com) at September 09, 2025 11:34 AM

September 05, 2025

Edward Z. Yang

So you want to control flow in PT2
With contributions from Richard Zou.

PT2’s dominant internal representation, FX graphs, do not directly support control flow (if statements, while loops): they only represent straight-line basic blocks. Most of our graph capture mechanisms are tracing based (fx.symbolic_trace, make_fx, Dynamo), which means that we expect to be able to linearize all conditionals we encounter into a straight line program. Sometimes, you want to work with code that has control flow while working the compiler stack. There is no silver bullet, instead there are a lot of different options with different tradeoffs.

Regional compilation

We have a perfectly good general purpose language that supports control flow: Python. To handle control flow, compile only regions/submodules of your program that have no internal control flow, and then string them together with a standard Python control flow constructs. PT2 compiled regions are compositional with non-compiled regions, “it works.”

Pro:

Simple: requires no major model changes

Universal: it always works (including data dependent flow, calling into third-party libraries, making an HTTP request, anything!)

Cons:

You will not get a full graph this way; you will only get graphs for each region. In particular, you will not be able to do truly global optimizations, nor will you be able to serialize a self-contained Python-less representation of the entire model

It can sometimes be inconvenient to structure your program so all the regions you want are compilable. Suppose you have this call graph between modules: A -> B -> C. C is compileable; A is compileable except for its call to B, which is what does the control flow. It’s easy to compile C, but you can’t directly compile A, as it has a B-shaped bit that can’t be compiled. What to do? If you split A so it is pipelined as A1, B, A2, you can then compile A1 and A2, but not B. Dynamo also supports “graph breaks” to automatically perform this split for you, in which case you just disable compilation on B, but graph break generated graphs can be difficult to reason about as the inputs to A2 are implicitly inferred.

Link: Reducing torch.compile cold start compilation time with regional compilation

Multiple graphs dispatched with guards

When the control flow is controlled by arguments that are known ahead of time (no data-dependent), you can also compile at the top level and get the flattened straight-line program for the particular branching you had in this case. Because Dynamo is a symbolic bytecode interpreter, it can automatically determine what inputs were used as part of control flow, and generate guards to validate that we would take the same paths again. If those values change, we will recompile the program at the new values. We dispatch between all the different unrollings of the program we have generated.

Pros:

Simple: requires no major model changes

You get a full graph for a particular unrolling of loops / conditionals, so global optimizations are possible

Cons:

Doesn’t work with data-dependent shapes.

You will end up with a graph for every unrolling; for example, if you have a loop that ranges from 1 to 32, you will end up with 32 different graphs. This will increase compile time.

Black box via custom operator

An FX graph just calls operators. The operator internally can have whatever control flow in them they want. So you can always black box a problematic region of your model into an operator and preserve compilation for everything else.

Pros:

You get a single, full graph that works for all possible branches

Cons:

A custom operator only supports inputs/outputs that fall inside our type system, which means you can only pass simple types like Tensor, int, bool (or pytree-able containers containing these things). There is some in progress work to relax this to allow more opaque types.

You have to explicitly declare all the inputs/outputs for the custom operator. This can be tiresome if the black boxed region represents a Module, since all the parameters also have to be directly passed in as well. The larger the region you black box, the bigger the arguments are.

You don’t actually get to see the inside of the custom operator from the outside graph, so no optimization over both inside and outside of the custom operator is possible. (Of course, you can always special case this operator in a pass on the outer graph.)

There are some bugs related to doing another torch.compile region inside of a custom operator, although these are workaroundable: https://github.com/pytorch/pytorch/issues/151328

Conditional operators / Unroll to max iterations

Do you really, really need a conditional? If you’re doing an if-branch, can you instead rewrite it so that you run both branches and torch.where dispatch to the results? If you’re doing a while-loop, can you unroll it to the max number of iterations and rely on dynamic shapes to cause it to no-op when you’re done and running extra iterations. Basically, this option is to rewrite your model so it doesn’t have Python-level control flow anymore (the conditional can either be done host or GPU side).

Pros:

You get a single, full graph that works for all possible branches

You are able to optimize inside and outside of the control flow

Cons:

You have to rewrite your model

For unrolling, if you are close to being CPU-dispatch bound, unrolling and running with zero size could push you over the brink (as zero size dispatches are still not free)

For conditional operators, unconditionally both branches increases the compute you need to do, which can be bad if you are compute-bound.
Control flow HOP

torch has special structured control flow operators that avoid unrolling large loops or needing to execute both branches of a control flow statement. If you’re familiar with JAX, these are very similar to the JAX equivalents. They have specific constraints that allow them to be directly compilable by torch.compile. For example, torch.cond accepts two functions (a true_fn and a false_fn) for the two branches and requires that outputs of each function must have the same properties (e.g. shape, dtype).

So far, we have the following “higher-order” operators (HOPs):

torch.cond (differentiable)

torch.while_loop (not yet differentiable)

torch._higher_order_ops.scan (differentiable, ignore the docs)

These are relatively new, have been used in torch.export for inference, but have not been battle tested for training or performance.

The semantics of these control flow operators are as follows:
def cond(pred, true_branch, false_branch, operands):
    if pred:
        return true_branch(*operands)
    else:
        return false_branch(*operands)

def while_loop(cond_fn, body_fn, carried_inputs):
    val = carried_inputs
    while cond_fn(*val):
        val = body_fn(*val)
    return val

def scan(combine_fn, init, xs, length=None):
    carry = init
    ys = []
    for x in xs:
        carry, y = f(carry, x)
        ys.append(y)
    return carry, np.stack(ys)
Pros:

You get a single, full graph that works for all possible branches

You are able to optimize inside and outside of the control flow

Cons:

You have to rewrite your model.

The control flow HOPs are structured: they have specific constraints on the functions (true_fn, false_fn (cond) or body_fn (while_loop)) that can be passed to them. One such constraint is that these functions may not mutate any of their inputs. This may make rewrites difficult because you have to think about code in a “functional”, JAX-like way.

Still WIP and they have some quirks especially for training. For example, the backward pass of torch.scan currently requires re-computing the forward pass (instead of just saving intermediates from each iteration of scan).
CFG over FX graphs

If FX graphs give you basic blocks, you can use them as building blocks for a language that does support conditionals, stringing them together with basic blocks. In fact, Helion, a kernel DSL language, does exactly this, as it is common to need to directly write data-dependent conditionals and loops when writing kernels (it otherwise uses all PyTorch API functions, similar to conventional FX graphs). To do this, you would need to write your own Python frontend that parses Python directly to generate the CFG. TorchScript also does this, but TorchScript frontend is unmaintained and we don’t recommend using it (and it also doesn’t generate FX graphs by default.)

Pros:

You get a single graph that works for all possible branches

You are able to optimize inside and outside of control flow

In principle, you can write exactly the control flow you want

Cons:

You have to write the frontend, we don’t have one ready for you (TorchScript is not it, you’re princess is in another castle)

If your language looks too much like Python and too general purpose, prepare to get on the endless treadmill of feature requests for adding “just one more Python feature” (can we have lists? dataclasses? etc etc) in the frontend (it is more tractable for Helion, as it’s not a general purpose language.)
by Edward Z. Yang at September 05, 2025 02:01 PM

September 04, 2025

Well-Typed.Com

Better Haskell stack traces via user annotations
Getting an accurate and precise backtrace is the key to debugging unexpected exceptions in Haskell programs. We recently implemented a family of functions that enable the user to push user-defined annotations to the native Haskell stack. The native stack decoder can display this information to the user when an unexpected exception is thrown.

This facility offers a number of advantages over the existing backtrace collection mechanisms:

It is not necessary modify the function API (unlike HasCallStack)

A “continuous chain” of modifications is not necessary (unlike HasCallStack)

The annotations work in all ways of compilation (unlike cost centre stacks)

The backtrace is expressed in terms of predictable source locations (unlike some IPE backtraces)

In this post we wil introduce the API for stack annotation, give some examples of how to use the annotation functions and discuss some trade-offs we have noticed with the design.

We’re interested in feedback from users on this feature. We’re expecting it to be available from GHC 9.16, as our implementation already landed in GHC HEAD (!14538).

Annotation stack frames

The core of the design is a new primop, annotateStack#, which when executed pushes an “annotation stack-frame” to the stack. Semantically, the frame is a no-op, but the payload contains a pointer to an arbitrary user-defined annotation. When decoding the native Haskell stack the annotation can be rendered to provide the user with additional context about the current location of the program.

The primop annotateStack# is exposed to the user via an IO-based API in GHC.Stack.Annotation.Experimental from the ghc-experimental package:¹
annotateStackIO :: (Typeable a, StackAnnotation a) => a -> IO b -> IO b
This will push the annotation value a onto the stack for the duration of the IO b action. The constraints allow the value to be rendered to a string or have its type inspected, similarly to the Exception class.

There are also specialised variants:
annotateCallStackIO   :: HasCallStack => IO b -> IO b  -- Annotate with the current source location
annotateStackStringIO :: String       -> IO b -> IO b  -- Annotate with an arbitrary String
annotateStackShowIO   :: Show a => a  -> IO b -> IO b  -- Annotate with the result of 'show' on a value
In addition, there are “pure” variants for use in non-IO code. However, these tend to be less intuitive due to the combination of lazy evaluation and imprecise exceptions, so the IO versions will generally produce better stack traces more reliably.

Note, annotateStack# is heavily inspired by annotated-exception and can be used together with annotated-exception for even better stack traces.

Example of the status quo

Let’s use the annotation functions to improve the backtrace for a program reported in a GHC ticket (#26040). The program implements a simple REST API using servant. When the endpoint is requested with a parameter which is larger than or equal to 100, the endpoint will error. topHandler catches all exceptions thrown by the handler and turns them into an HTTP 505 error. Finally, the exception handler prints any exceptions that might be thrown by the endpoint.
main :: IO ()
main = do
  setBacktraceMechanismState IPEBacktrace True
  run 8086 mkServer

type Api = Capture "x" Int :> Get '[PlainText] Text

mkServer :: Application
mkServer =
  serve
    (Proxy @Api)
    (hoistServer (Proxy @Api) topHandler api)

topHandler :: IO a -> Handler a
topHandler action = do
  result <- liftIO $
    (Right <$> action) `catch` \(exc :: SomeException) -> do
      liftIO $ putStrLn $ "Exception: " <> displayExceptionWithInfo exc
      pure $ Left err500

  either throwError pure result

api :: ServerT Api IO
api = handler

handler :: Int -> IO Text
handler x =
  if x >= 100
    then throw $ ErrorCall "Oh no!"
    else pure (pack "handler")
With the current version of GHC, when calling this API via http://localhost:8086/105, this stack trace is printed:
Exception: ghc-internal:GHC.Internal.Exception.ErrorCall:

Oh no!

IPE backtrace:
  Main.liftIO (src/Servant/Server/Internal/Handler.hs:30:36-42)
  Servant.Server.Internal.Delayed.runHandler' (src/Servant/Server/Internal/Handler.hs:27:31-41)
  Control.Monad.Trans.Resource.runResourceT (./Control/Monad/Trans/Resource.hs:(192,14)-(197,18))
  Network.Wai.Handler.Warp.HTTP1.processRequest (./Network/Wai/Handler/Warp/HTTP1.hs:195:20-22)
  Network.Wai.Handler.Warp.HTTP1.processRequest (./Network/Wai/Handler/Warp/HTTP1.hs:(195,5)-(203,31))
  Network.Wai.Handler.Warp.HTTP1.http1server.loop (./Network/Wai/Handler/Warp/HTTP1.hs:(141,9)-(157,42))
HasCallStack backtrace:
  collectExceptionAnnotation, called at libraries/ghc-internal/src/GHC/Internal/Exception.hs:170:37 in ghc-internal:GHC.Internal.Exception
  toExceptionWithBacktrace, called at libraries/ghc-internal/src/GHC/Internal/Exception.hs:90:42 in ghc-internal:GHC.Internal.Exception
  throw, called at app/Main.hs:42:10 in backtrace-0.1.0.0-inplace-server:Main
In this example there are two different backtraces:

The “IPE backtrace” is constructed by decoding the Haskell stack, using information stored in the binary by -finfo-table-map, where each frame is automatically associated with a source location. (The compiler option -finfo-table-map was originally introduced for profiling.)

On the the other hand, the “HasCallStack backtrace” is built using the implicitly passed HasCallStack constraints, which are automatically supplied by the type-checker, provided HasCallStack appears in the type.

The HasCallStack backtrace seems the most useful, telling us exactly where our program went wrong. However, the backtrace is very brief, as the rest of the program doesn’t have any HasCallStack constraints. As such, this stack trace might be unhelpful in larger programs, if the call to error was placed behind many layers of abstraction.

The IPE backtrace looks impressive, but doesn’t even show us where the exception is thrown! We get more intermediate source locations, but not the source of the exception. The function from which the exception is thrown is not even listed.

The reason the IPE backtrace may be unhelpful lies in the way the Haskell call stack works. We show the IPE info for each stack frame, which doesn’t relate precisely to the original source code and the resulting stack trace feels unintuitive. One reason for this is many function calls are tail-calls which don’t result in stack frames.

For more of an overview of the different backtrace mechanisms consult the discussion section of GHC Proposal #330.

Better stack traces with annotateCallStackIO

The IPE backtrace can be improved by manually annotating important parts of the program which should always appear in a backtrace.

For example, we always want to know in which handler the exception was thrown in, so the handler function is annotated with annotateCallStackIO. Further, we annotate the location where the exception is thrown.
handler :: Int -> IO Text
handler x = annotateCallStackIO $ do
  if x >= 100
    then annotateCallStackIO $ throw $ ErrorCall "Oh no!"
    else pure (pack "handleIndex")
When running this program again, the stack trace will now contain the source location of the handler where exception was thrown from:
Exception: ghc-internal:GHC.Internal.Exception.ErrorCall:

Oh no!

IPE backtrace:
  annotateCallStackIO, called at app/Main.hs:42:10 in backtrace-0.1.0.0-inplace-server:Main
  annotateCallStackIO, called at app/Main.hs:40:13 in backtrace-0.1.0.0-inplace-server:Main
  Main.handler (app/Main.hs:(40,1)-(43,30))
  Main.liftIO (src/Servant/Server/Internal/Handler.hs:30:36-42)
  Servant.Server.Internal.Delayed.runHandler' (src/Servant/Server/Internal/Handler.hs:27:31-41)
  Control.Monad.Trans.Resource.runResourceT (./Control/Monad/Trans/Resource.hs:(192,14)-(197,18))
  Network.Wai.Handler.Warp.HTTP1.processRequest (./Network/Wai/Handler/Warp/HTTP1.hs:195:20-22)
  Network.Wai.Handler.Warp.HTTP1.processRequest (./Network/Wai/Handler/Warp/HTTP1.hs:(195,5)-(203,31))
  Network.Wai.Handler.Warp.HTTP1.http1server.loop (./Network/Wai/Handler/Warp/HTTP1.hs:(141,9)-(157,42))
HasCallStack backtrace:
  collectExceptionAnnotation, called at libraries/ghc-internal/src/GHC/Internal/Exception.hs:170:37 in ghc-internal:GHC.Internal.Exception
  toExceptionWithBacktrace, called at libraries/ghc-internal/src/GHC/Internal/Exception.hs:90:42 in ghc-internal:GHC.Internal.Exception
  throw, called at app/Main.hs:42:32 in backtrace-0.1.0.0-inplace-server:Main
Note the first two entries of the IPE backtrace:
annotateCallStackIO, called at app/Main.hs:42:10 in backtrace-0.1.0.0-inplace-server:Main
annotateCallStackIO, called at app/Main.hs:40:13 in backtrace-0.1.0.0-inplace-server:Main
These have been added due to our manual annotation of our source program via annotateCallStackIO!

They give us precise source location where the exception is thrown, making the IPE backtrace just as useful as the HasCallStack backtrace. However, note, we did not have to change the type signature of handler at all to get a much more informative stack trace.

throwIO vs throw vs error

Some readers may have noticed that we used throw instead of error, which is usually the go to function for throwing example errors (or from within pure code). At the moment, throw and error produce noticeably different stack traces, because error evaluates the exception annotations lazier than throw, which leads to failing to capture the call stack when throwing the exception. This should be possible to resolve; see GHC issue #25430.

On the other hand, throwIO behaves more predictably within IO code and the IPE backtrace includes the source location of the exception throwing:
IPE backtrace:
  Main.handler (app/Main.hs:42:10-45)
  Main.liftIO (src/Servant/Server/Internal/Handler.hs:30:36-42)
  Servant.Server.Internal.Delayed.runHandler' (src/Servant/Server/Internal/Handler.hs:27:31-41)
  Control.Monad.Trans.Resource.runResourceT (./Control/Monad/Trans/Resource.hs:(192,14)-(197,18))
  Network.Wai.Handler.Warp.HTTP1.processRequest (./Network/Wai/Handler/Warp/HTTP1.hs:195:20-22)
  Network.Wai.Handler.Warp.HTTP1.processRequest (./Network/Wai/Handler/Warp/HTTP1.hs:(195,5)-(203,31))
  Network.Wai.Handler.Warp.HTTP1.http1server.loop (./Network/Wai/Handler/Warp/HTTP1.hs:(141,9)-(157,42))
This means that how the exception is thrown is important to get reasonable stack traces. Unsurprisingly, you should use throwIO whenever you are within the IO monad.

Summary

Annotation stack frames are a lightweight way to add extra information to stack traces. By modifying the execution stack, the information is always available and can be used by the native stack decoder to display informative backtraces to users. We’re interested to hear what users think about this feature and how libraries will be adapted to take advantage of the new annotation frames.

This work has been performed in collaboration with Mercury, who have a long-term commitment to the scalability and robustness of the Haskell ecosystem. Well-Typed are always interested in projects and looking for funding to improve GHC and other Haskell tools. Please contact info@well-typed.com if we might be able to work with you!

The ghc-experimental package ships with GHC, but is distinct from base, and has weaker stability guarantees. This allows new APIs to be introduced and fine-tuned before eventually being stabilised and added to base.↩︎
by hannes, matthew at September 04, 2025 12:00 AM

September 03, 2025

Joachim Breitner

F91 in Lean
Back in March, with version 4.17.0, Lean introduced partial_fixpoint, a new way to define recursive functions. I had drafted a blog post for the official Lean FRO blog back then, but forgot about it, and with the Lean FRO blog discontinued, I’ll just publish it here, better late than never.

With the partial_fixpoint mechanism we can model possibly partial functions (so those returning an Option) without an explicit termination proof, and still prove facts about them. See the corresponding section in the reference manual for more details.

On the Lean Zulip, I was asked if we can use this feature to define the McCarthy 91 function and prove it to be total. This function is a well-known tricky case for termination proofs.

First let us have a brief look at why this function is tricky to define in a system like Lean. A naive definition like
def f91 (n : Nat) : Nat :=
  if n > 100
  then n - 10
  else f91 (f91 (n + 11))
does not work; Lean is not able to prove termination of this functions by itself.

Even using well-founded recursion with an explicit measure (e.g. termination_by 101 - n) is doomed, because we would have to prove facts about the function’s behaviour (namely that f91n = f91101 = 91 for 90 ≤ n ≤ 100) and at the same time use that fact in the termination proof that we have to provide while defining the function. (The Wikipedia page spells out the proof.)

We can make well-founded recursion work if we change the signature and use a subtype on the result to prove the necessary properties while we are defining the function. Lean by Example shows how to do it, but for larger examples this approach can be hard or tedious.

With partial_fixpoint, we can define the function as a partial function without worrying about termination. This requires a change to the function’s signature, returning an Option Nat:
def f91 (n : Nat) : Option Nat :=
  if n > 100
    then pure (n - 10)
    else f91 (n + 11) >>= f91
partial_fixpoint
From the point of view of the logic, Option.none is then used for those inputs for which the function does not terminate.

This function definition is accepted and the function runs fine as compiled code:
#eval f91 42
prints some 91.

The crucial question is now: Can we prove anything about f91 In particular, can we prove that this function is actually total?

Since we now have the f91 function defined, we can start proving auxillary theorems, using whatever induction schemes we need. In particular we can prove that f91 is total and always returns 91 for n ≤ 100:
theorem f91_spec_high (n : Nat) (h : 100 < n) : f91 n = some (n - 10) := by
  unfold f91; simp [*]

theorem f91_spec_low (n : Nat) (h₂ : n ≤ 100) : f91 n = some 91 := by
  unfold f91
  rw [if_neg (by omega)]
  by_cases n < 90
  · rw [f91_spec_low (n + 11) (by omega)]
    simp only [Option.bind_eq_bind, Option.some_bind]
    rw [f91_spec_low 91 (by omega)]
  · rw [f91_spec_high (n + 11) (by omega)]
    simp only [Nat.reduceSubDiff, Option.some_bind]
    by_cases h : n = 100
    · simp [f91, *]
    · exact f91_spec_low (n + 1) (by omega)

theorem f91_spec (n : Nat) : f91 n = some (if n ≤ 100 then 91 else n - 10) := by
  by_cases h100 : n ≤ 100
  · simp [f91_spec_low, *]
  · simp [f91_spec_high, Nat.lt_of_not_le ‹_›, *]

-- Generic totality theorem
theorem f91_total (n : Nat) : (f91 n).isSome := by simp [f91_spec]
(Note that theorem f91_spec_low is itself recursive in a somewhat non-trivial way, but Lean can figure that out all by itself. Use termination_by? if you are curious.)

This is already a solid start! But what if we want a function of type f91! (n : Nat) : Nat, without the Option? Then can derive that from the partial variant, as we have just proved that to be actually total:
def f91! (n : Nat) : Nat  := (f91 n).get (f91_total n)

theorem f91!_spec (n : Nat) : f91! n = if n ≤ 100 then 91 else n - 10 := by
  simp [f91!, f91_spec]
Using partial_fixpoint one can decouple the definition of a function from a termination proof, or even model functions that are not terminating on all inputs. This can be very useful in particular when using Lean for program verification, such as with the aeneas package, where such partial definitions are used to model Rust programs.
by Joachim Breitner (mail@joachim-breitner.de) at September 03, 2025 08:18 PM

September 01, 2025

Lysxia's blog

Alpha-beta pruning is just minimax in a lattice of clamping functions
A lazy take on a classic game theory algorithm.

Sip a caffè latte while thinking about lattices
Haskell extensions and imports used in this post
{-# LANGUAGE
  DataKinds,
  DeriveGeneric,
  DeriveTraversable,
  DerivingStrategies,
  GeneralizedNewtypeDeriving,
  RankNTypes,
  ScopedTypeVariables,
  StandaloneDeriving,
  TypeFamilies #-}

import Data.Ord (Down(Down, getDown))
import Data.List.NonEmpty (NonEmpty(..))
import qualified GHC.Generics as GHC
import Generics.SOP (Generic, HasDatatypeInfo, NP(..), K(..))
import Test.QuickCheck
import Test.StrictCheck
Minimax

Minimax is a general algorithm for finding optimal strategies. It’s not meant to be efficient or practical. It is more of a basic concept of game theory, and a reference against which to compare other game-solving algorithms.

We consider a simple model of two-player games. They take turns playing moves until reaching an end state with a final score. One player’s goal is to maximize the score, whereas the other player’s goal is to minimize it. Let us call these players Max and Min respectively, short for Maximizer and Minimizer.

We represent such a game by its game tree, which is made up of three constructors: a Max (resp. Min) node represents a game state where Max (resp. Min) chooses the next move, each move resulting in a new game state, and an End leaf represents an end state as its score.
data Game score
  = Max (NonEmpty (Game score))
  | Min (NonEmpty (Game score))
  | End score
  deriving stock (Show, Functor, Foldable)
Note that Max and Min nodes must have at least one possible move. You may be wondering about games that end when one player can no longer play: instead of an empty Min or Max node, such game states simply correspond to an End leaf, making the final score explicit.

Most real games just have a win/tie/lose end condition. They naturally correspond to applying Game to a type with three possible scores:
data WinLose = MinWins | Tie | MaxWins
  deriving (Eq, Ord, Show)
In practice, chess engines don’t work with the whole game tree since it is too massive. Instead, they build approximations by pruning certain branches of the tree and replacing them with leaves. The score on each leaf is a number which estimates how favorable the game state is to either player. So we end up with Game ℝ, or Game Double.

In general, the type Game represents two-player games with complete information and zero-sum objectives.

We shall assume that score is a totally ordered set. This requirement corresponds to a constraint Ord score in Haskell. In that case, there exists an “optimal strategy” for each player which guarantees them an “optimal score” m in the sense that as long as one player sticks to their “optimal strategy”, the other player cannot score better than m. This situation is what we call a Nash equilibrium in game theory. For win/tie/lose games, the existence of a Nash equilibrium means that either there is a winning strategy for one of the players, or they must tie by playing optimally.

The “optimal score” m is unique, and can be computed by a fold of the game tree, replacing Max and Min constructors with the functions maximum and minimum. This is the minimax algorithm:
minimax :: Ord score => Game score -> score
minimax (Max gs) = maximum (minimax <$> gs)
minimax (Min gs) = minimum (minimax <$> gs)
minimax (End s) = s
minimax is quite an inefficient algorithm: it must traverse the whole game tree. Indeed, maximum and minimum must traverse the whole list to find the maximum or minimum element.

Often, we can do much better. For instance, consider the following tree:
Max [ End 0,
      Min [ End (-1),
            t ] ]
The minimax of that tree does not depend on the subtree t. Indeed, minimum [-1, minimax t] is guaranteed to be at most -1, so the maximum between that value and 0 is guaranteed to be 0. Thus we can compute the minimax without inspecting the subtree t, which may be arbitrarily large. That idea leads to a more efficient algorithm to compute the minimax.

Alpha-beta

The alpha-beta pruning algorithm¹ is a modification of minimax with an extra pair of arguments:
alphabeta :: Ord score => Game score -> (score, score) -> score
The pair (alpha, beta) represents a “relevance interval” which relaxes the possible outputs of alphabeta. Either alphabeta t (alpha, beta) produces a score within that interval, in which case it is guaranteed to be equal to minimax. Otherwise, alphabeta t (alpha, beta) produces a value outside of the interval, in which case its exact value does not matter; it only has to be on the same side of the interval as minimax t. More rigorously:

if alpha < minimax t < beta, then alphabeta t (alpha, beta) = minimax t;

if minimax t <= alpha, then alphabeta t (alpha, beta) <= alpha;

if beta <= minimax t, then beta <= alphabeta t (alpha, beta).

Leaving the value of alphabeta underspecified when outside of the interval allows the implementation to short-circuit: we can stop searching through Max nodes as soon as we can guarantee a score greater than beta, and we can stop searching through Min nodes as soon as we can guarantee a score smaller than alpha.

We can then use alphabeta to redefine minimax:
-- Minimax using alpha-beta pruning
minimaxAB :: (Ord score, Bounded score) => Game score -> score
minimaxAB t = alphabeta t (minBound, maxBound)
assuming that score is Bounded with extreme values minBound :: score and maxBound :: score. It’s possible to avoid the Bounded constraint by changing the interval type (score, score) to (Maybe score, Maybe score), which amounts to adding distinguished top and bottom elements. We’ll stick with Bounded to keep things a bit simpler.

Implementing alphabeta is a standard exercise. It is even easier when you have a formal specification like the above to guide the implementation.
alphabeta :: Ord score => Game score -> (score, score) -> score
alphabeta (Max (g0 :| [])) i = alphabeta g0 i
alphabeta (Max (g0 :| g1 : gs)) (alpha, beta) =
  let m0 = alphabeta g0 (alpha, beta) in
  if beta <= m0 then m0
  else m0 `max` alphabeta (Max (g1 :| gs)) (max alpha m0, beta)
alphabeta (Min (g0 :| [])) i = alphabeta g0 i
alphabeta (Min (g0 :| g1 : gs)) (alpha, beta) =
  let m0 = alphabeta g0 (alpha, beta) in
  if m0 <= alpha then m0
  else m0 `min` alphabeta (Min (g1 :| gs)) (alpha, min beta m0)
alphabeta (End s) _ = s
But still, it is at least a little finicky and tedious to make sure that you haven’t mixed your alphas and betas.

As we will see in this post, we can streamline the implementation of alpha-beta pruning by factoring the short-circuiting logic out of the “minimax” logic.

Generalized minimax

Remark that minimax only uses min and max (via minimum and maximum), rather than the comparison functions of Ord (compare, (<=), etc.).

We can reduce the dependency footprint of minimax by defining a new class with only the necessary operations, the class of lattices:
class Lattice a where
  -- Join, least upper bound, max
  (\/) :: a -> a -> a 
  -- Meet, greatest lower bound, min
  (/\) :: a -> a -> a
In mathematics, lattices are algebraic structures with two operations (\/) (“join”) and (/\) (“meet”) satisfying commutativity, associativity, as well as the absorption laws:
x \/ (x /\ y) = x
x /\ (x \/ y) = x
In this post, we will only be looking at lattices that arise out of total orders, so this class is rather just a way of saying that we only depend on min and max.

Binary operations can be iterated to combine lists of arguments, similarly to the maximum and minimum functions:
-- maximum
joins :: Lattice a => NonEmpty a -> a
joins = foldr1 (\/)

-- minimum
meets :: Lattice a => NonEmpty a -> a
meets = foldr1 (/\)
Minimax in lattices is defined by replacing Max and Min nodes with the joins and meets operations.
-- Minimax in lattices
minimaxL :: Lattice score => Game score -> score
minimaxL (Max gs) = joins (minimaxL <$> gs)
minimaxL (Min gs) = meets (minimaxL <$> gs)
minimaxL (End x) = x
minimaxL generalizes minimax since every decidable total order is a lattice (because you can use (<=) to define min/max). Ideally this fact would be made explicit by making Lattice into a superclass of Ord. Unfortunately in Haskell this would require us to modify Ord or redefine it. Another way to express the relation between Lattice and Ord is through a newtype.
newtype OrdLattice a = OrdLattice a
   deriving newtype (Eq, Ord, Bounded)

unOrdLattice :: OrdLattice a -> a
unOrdLattice (OrdLattice x) = x

instance Ord a => Lattice (OrdLattice a) where
  OrdLattice x \/ OrdLattice y = OrdLattice (max x y)
  OrdLattice x /\ OrdLattice y = OrdLattice (min x y)
With that, we recover the starting minimax by specializing minimaxL to OrdLattice s, and then unwrapping OrdLattice:
minimaxO :: Ord score => Game score -> score
minimaxO = unOrdLattice . minimaxL . fmap OrdLattice
Clamping functions

Focus on the type (score, score) -> score which appears in the signature of alphabeta. More specifically, we are interested in a subset of those functions that we shall call clamping functions.

Intuitively, a clamping function f is a delayed representation of a constant s: the goal of f is to compute s, but it may also stop early with an approximation if it’s not necessary to know the exact value of s.

The name “clamping function” is a reference to the clamp function:
clamp :: Ord score => score -> (score, score) -> score
clamp s (alpha, beta) = max alpha (min s beta)
We can think of the partially applied function clamp s as an encoding of the constant s, which may or may not be output depending on the interval (alpha, beta).

More formally, a clamping function with value s is a function f :: (score, score) -> score that satisfies the following, for all (alpha, beta) such that alpha < beta:

if alpha < s < beta, then f (alpha, beta) = s;

if s <= alpha, then f (alpha, beta) <= alpha;

if beta <= s, then beta <= f (alpha, beta).

Two clamping functions with the same value s are considered equal. In particular, as clamping functions, const s is equal to clamp s. Making the notion of equality explicit is necessary to make sense of equations (laws for lattices, homomorphisms, and isomorphisms).

We enshrine the definition of clamping functions in a newtype:
-- Type of clamping functions, satisfying the properties above.
newtype Clamping score = Clamping ((score, score) -> score)

unClamping :: Clamping score -> (score, score) -> score
unClamping (Clamping f) = f
For any value s, we can construct the constant clamping function:
clamping :: score -> Clamping score
clamping s = Clamping (\_ -> s)
Note that \_ -> s and clamp s are both clamping functions with value s, so both are valid definitions of clamping s. We prefer the constant function \_ -> s because it does less work.

Conversely, we can project clamping functions back into their values by passing the whole interval (minBound, maxBound):
declamp :: Bounded score => Clamping score -> score
declamp (Clamping f) = f (minBound, maxBound)
Those two functions form an isomorphism between score and Clamping score, meaning that they satisfy the following equations:
declamp . clamping = id
clamping . declamp = id
We now get to the secret sauce of this post: the maximum of two clamping functions (as well as the minimum). This operation can be defined in two ways. First is the naive definition, for reference:
-- "max" for clamping functions, naive variant
maxC :: Ord s => Clamping s -> Clamping s -> Clamping s
maxC (Clamping f) (Clamping g) = Clamping (\i -> max (f i) (g i))
Second is the lazy definition: if f (alpha, beta) is greater than the given upper bound beta, then the max of f and g will be even greater:
beta <= f (alpha, beta) <= max (f (alpha, beta)) (g (alpha, beta)) 
In that case, the maximum of f and g is allowed to output f (alpha, beta) without looking at g. Otherwise we must evaluate g, but we can tighten the interval by updating the lower bound to max alpha (f (alpha, beta)).
-- "max" for clamping functions, lazy variant
lazyMaxC :: Ord s => Clamping s -> Clamping s -> Clamping s
lazyMaxC (Clamping f) (Clamping g) = Clamping (\(alpha, beta) ->
  let fi = f (alpha, beta) in
  if beta <= fi then fi else fi `max` g (max alpha fi, beta))
Dually, we also have a lazyMinC.
lazyMinC :: Ord s => Clamping s -> Clamping s -> Clamping s
lazyMinC (Clamping f) (Clamping g) = Clamping (\(alpha, beta) ->
  let fi = f (alpha, beta) in
  if fi <= alpha then fi else fi `min` g (alpha, min beta fi))
To avoid repeating ourselves, we can also reuse lazyMaxC to implement lazyMinC. Use Down to invert the ordering of an Ord:
lazyMinC :: Ord s => Clamping s -> Clamping s -> Clamping s
lazyMinC f g = undualize (lazyMaxC (dualize f) (dualize g))
  where
    dualizeWith from to (Clamping h) =
      Clamping ($beta, alpha) -> from (h (to alpha, to beta)))
    dualize   = dualizeWith Down getDown -- Clamping s -> Clamping (Down s)
    undualize = dualizeWith getDown Down -- Clamping (Down s) -> Clamping s
These “naive” and “lazy” functions denote the same value (maxC = lazyMaxC and minC = lazyMinC), but lazyMaxC and lazyMinC may do less work, either by ignoring their second argument or by applying it to a smaller interval than expected.

The point is that these “lazy” functions embody the short-circuiting logic of alpha-beta pruning exactly. All that’s left to do is to plug them into minimax.

The lattice of clamping functions

With the lazy min and max that we just defined, we get a lattice:
instance Ord score => Lattice (Clamping score) where
  (\/) = lazyMaxC
  (/$ = lazyMinC
Specialize minimax in the lattice of clamping functions:
minimaxC :: Ord score => Game (Clamping score) -> Clamping score
minimaxC = minimaxL
This doesn’t look like much, but we have actually implemented the alpha-beta pruning algorithm. With a tiny bit of plumbing, we can redefine the function alphabeta from earlier:
alphabeta' :: Ord score => Game score -> (score, score) -> score
alphabeta' = unClamping . minimaxC . fmap clamping
Then we want to partially apply alphabeta' to the interval (minBound, maxBound). This amounts to replacing unClamping with declamp in the body of alphabeta'. Behold our final implementation of minimax by alpha-beta pruning:
minimaxAB' :: (Ord score, Bounded score) => Game score -> score
minimaxAB' = declamp . minimaxC . fmap clamping
To sum up, we implemented alpha-beta pruning as a simple combination of:

minimax, generalized from orders to lattices (minimaxL);

the lattice of clamping functions (Lattice (Clamping score)).

This alternative approach does not completely absolve you from effort: you still have to juggle alphas and betas correctly to implement the lattice (lazyMinC and lazyMaxC). But unlike in the original alphabeta, you don’t have to do all that juggling in the middle of a recursive function. The logic of alpha-beta pruning is neatly decomposed into bite-sized pieces.

Correctness for free

Since we just reused the code of minimax, it’s also easier to prove that that alpha-beta pruning yields the same result:
minimax = minimaxAB'
As we are about to see, this is a direct consequence of the free theorem² for minimaxL: any function of type forall s. Lattice s => Game s -> s, such as minimaxL, commutes with any lattice homomorphism³ f, in the following sense:
f . minimaxL = minimaxL . fmap f
We can picture that equation as a commutative diagram:

\[\require{AMScd} \begin{CD} \small\texttt{Game s} @>{\texttt{minimaxL}}>> \small\texttt{s} \\ @V{\texttt{fmap f}}VV @VV{\texttt{f}}V \\ \small\texttt{Game t} @>{\texttt{minimaxL}}>> \small\texttt{t} \end{CD}\]

If f has an inverse f⁻¹, we can rewrite that to
minimaxL = f⁻¹ . minimaxL . fmap f
By replacing (f, f⁻¹) with the isomorphism (clamping, declamp) defined earlier, we obtain exactly the equality between minimax and alpha-beta pruning:
minimaxL = declamp . minimaxL . fmap clamping
         = minimaxAB'
As a commutative diagram:

\[\require{AMScd} \begin{CD} \small\texttt{Game s} @>{\texttt{minimaxAB’}\text{ (alpha-beta)}}>> \small\texttt{s} \\ @V{\texttt{fmap clamping}}VV @AA{\texttt{declamp}}A \\ \scriptsize\texttt{Game (Clamping s)} @>{\texttt{minimaxL}}>> \scriptsize\texttt{Clamping s} \end{CD}\]

QED.

(To be pedantic, the above proof conflates minimaxL with minimax/minimaxO, which relies on pretending that Lattice is a superclass of Ord. Below is another proof that doesn’t take that shortcut, by going through the OrdLattice newtype explicitly, so this proof applies more directly to the Haskell definitions as written here.)
A somewhat more rigorous proof
We want to prove that the alpha-beta-pruning minimaxAB' is equivalent to the naive minimax:
minimax = minimaxAB'
Recall the free theorem of minimaxL. For any lattice isomorphism (f, f⁻¹):
minimaxL = f⁻¹ . minimaxL . fmap f
Replace (f, f⁻¹) with the lattice isomorphism (clamping . unOrdLattice, OrdLattice . declamp) between the lattices OrdLattice score and Clamping score.
minimaxL = OrdLattice . declamp . minimaxL . fmap (clamping . unOrdLattice)
Now we can prove the equality between minimax and minimaxAB', using the above equation as the middle step, followed by canceling inverses:
minimax
= minimaxO
= unOrdLattice . minimaxL . fmap OrdLattice
= unOrdLattice . OrdLattice . declamp . minimaxL . fmap (clamping . unOrdLattice) . fmap OrdLattice
= declamp . minimaxL . fmap clamping
= minimaxAB'
The above is only a proof of functional correctness: minimax and minimaxAB' compute the same result.

To verify that minimaxAB' does so more efficiently is another problem for another day. For now, we can test it.

Strictness check

We test that our “fancy” implementation of alpha-beta (minimaxAB') has the same strictness as the “classical” implementation (minimaxAB), which we presume to be much lazier than minimax.

We use StrictCheck for property-testing of strictness behaviors in Haskell. The following test checks that minimaxAB and minimaxAB' have the same demand on random inputs. We use the function observe1 from StrictCheck to observe the demand of a function f: observe1 applies f it to an instrumented copy of the provided input g, it forces the output (f g of type Int) using the provided forcing function (`seq` ()), and finally returns the demand on the input tree g that was observed by forcing the instrumented copy of g.
main :: IO ()
main = do
  quickCheck $ \(g :: Game Int) ->
    label (bucket (length g)) $
    let demand f = snd (observe1 (`seq` ()) f g) in
    demand minimaxAB === demand minimaxAB'
From the source repository of this blog, the following command compiles and runs this blog post:
cabal run alpha-beta
Instances and auxiliary definitions
-- Histogram of generated value sizes
bucket :: Int -> String
bucket n | n == 1 = "= 1"
         | n < 10 = "< 10"
         | n < 100 = "< 100"
         | n < 1000 = "< 1000"
         | otherwise = ">= 1000"

-- Instances
deriving stock instance GHC.Generic (Game a) 
instance Generic (Game a)
instance HasDatatypeInfo (Game a)
instance Shaped a => Shaped (Game a)

instance Arbitrary a => Arbitrary (Game a) where
  arbitrary = sized $ \n -> if n == 0 then End <$> arbitrary else
    resize (n `div` 2) $ frequency
      [(1, End <$> arbitrary), (2, Max <$> arbitrary), (2, Min <$> arbitrary)]
  shrink (Max (g :| gs)) = g : gs ++ (Max <$> shrink (g :| gs))
  shrink (Min (g :| gs)) = g : gs ++ (Min <$> shrink (g :| gs))
  shrink (End s) = End <$> shrink s

instance Arbitrary a => Arbitrary (NonEmpty a) where
  arbitrary = liftA2 (:|) arbitrary arbitrary
  shrink (x :| xs) = [y :| ys | y : ys <- shrink (x : xs)]
Conclusion

I came up with this idea a while back on Stack Overflow, as an answer to Alpha-beta pruning with recursion schemes. My understanding of alpha-beta pruning changed overnight from a somewhat tricky algorithm to a completely trivial solution. Getting to reuse minimax is not only a satisfying achievement in refactoring, it enables a neat proof of correctness by parametricity (via free theorems).

The role of laziness should also be underscored. If you try to do the same thing in a call-by-value language, the implementation of “generalized minimax” must explicitly delay computations, obscuring the point:

Alpha-beta pruning is just minimax in a lattice of clamping functions.
For a clearer presentation, see the talk Alpha-Beta Pruning Explored, Extended and Verified (2024) by Tobias Nipkow.↩︎

Theorems for free! by Philip Wadler. Free theorems involving type constructor classes by Janis Voigtländer.↩︎
A lattice homomorphism f is a function that commutes with the lattice operations:
f (x /\ y) = f x /\ f y
f (x \/ y) = f x \/ f y
↩︎
by Lysxia at September 01, 2025 12:00 AM

August 31, 2025

Edward Z. Yang

The Parallelism Mesh Zoo

When training large scale LLMs, there is a large assortment of parallelization strategies which you can employ to scale your training runs to work on more GPUs. There are already a number of good resources for understanding how to parallelize your models: I particularly recommend How To Scale Your Model and The Ultra-Scale Playbook. The purpose of this blog post is to discuss parallelization strategies in a more schematic way by focusing only on how they affect your device mesh. The device mesh is an abstraction used by both PyTorch and JAX that takes your GPUs (however many of them you've got in your cluster!) and organizes them into a N-D tensor that expresses how the devices communicate with each other. When we parallelize computation, we shard a tensor along one dimension of the mesh, and then do collectives along that dimension when there are nontrivial dependencies between shards. Being able to explain why a device mesh is set up the way it is for a collection of parallelization strategies is a good check for seeing if you understand how the parallelization strategies work in the first place! (Credit: This post was influenced by Visualizing 6D Mesh Parallelism.)

tl;dr

DP, FSDP: ["dp"]

HSDP: ["dp_replicate", "dp_shard"]

DP+TP, DP+TP+SP: ["dp", "tp"]

DP+UlyssesSP: ["dp", "sp"] (verl)

DP+CP: ["dp", "cp"]

DP+CP+TP: ["dp", "cp", "tp"]

PP+DP+...: ["pp", "dp", ...] (torchtitan), ["dp", "pp", ...] (Megatron)

PP+DP+CP+TP+EP: ["pp", "dp_replicate", "dp_shard_mod_ep", "dp_shard_in_ep", "cp", "tp"] (torchtitan)

Prologue: Why device mesh? Before we jump into the zoo, why do we have multi-dimensional meshes in the first place? One intuition is that the dimensions of the device mesh are a reflection of the physical constraints of networking between GPUs (there's a reason why all of the scaling books talk extensively about how the networking for GPUs works; you can't reason about what parallelization strategy you should use without knowing about this!) Let's imagine you have 1024 NVIDIA GPUs. You don't want to treat this 1024 GPUs as an undifferentiated blob of GPUs. Physically, these GPUs are grouped into nodes of eight which have much faster NVLink connections compared to cross-node communication which is done on a slower Infiniband connection. Intuitively, you will want to do something different depending on if you're doing intra-node communication or inter-node communication.

The device mesh imposes structure on this collection of GPUs. A mesh is typically specified as a tensor size (e.g., (128, 8)) as well as string axis names ala named tensor (e.g., ["dp", "tp"]), and is simply an N-D tensor over a range of GPU indices (typically [0, 1, 2, 3, ...] for GPUs, and a mostly ascending but occasionally permuted sequence for TPUs). We typically think of 2D and 3D tensors as grids and cubes, but I find it is more helpful (especially in higher dimensions) to think of the device mesh as imposing some self-similar (fractal) structure on the GPUs. In the simplest 2D mesh that accounts for intra versus inter node communication, GPUs are first organized into nodes on the inner-most dimension, and then the nodes are collected together in the outer-most dimension to form the cluster. (The self-similar nature of the nodes is important because it tells us how communication occurs across the cluster: to communicate over the outer-most mesh dimension, all the GPU 0s on each node talk to each other, all the GPU 1s, etc.) This is only the very simplest mesh we can create, however; with more complicated parallelization strategies we may impose extra levels of structure, e.g., we may organize nodes into pods of two and four, or we might further divide the eight GPUs of a single node. In other words, the mesh tells us about which GPUs communicate to which other GPUs. This is important to know, because when I want to parallelize our model, I am making choices about how to shard tensors across my GPUs. The mesh tells me which GPUs have the other shards of my tensor; in other words, they are who I have to communicate with when I am doing a computation that requires information about the full tensor and cannot be done with the local shards only.

In the zoo, when we talk about a parallelism strategy, we will talk to how it typically relates to other parallelization strategies in the model, and the device mesh will tell us if it is orthogonal to other parallelisms (a new dimension), multiplexed with another strategy (a reused dimension) or perhaps a completely different hierarchy of communication (multiple meshes in the same model that don't factor into the other).

Without further ado, here is the zoo!

Data parallelism (DP). Data parallelism predates the concept of device meshes, since you don't actually need any nontrivial mesh structure to do data parallelism: if you are only doing data parallel, you just shard your input on the batch axis for however many devices you have. This sharding propagates through forwards and backwards until you allreduce to compute the final global gradient for a parameter. If you did make a 1D device mesh (this is useful to think about, because most higher dimensional parallelisms will include some form of data parallelism), you'd probably name your mesh ["dp"], ["ddp"] or perhaps ["batch"].

Let's talk briefly about how people tend to name device mesh axes. In the PyTorch world, it's most common to name the axis after the parallelism that it is responsible, so either "dp" or "ddp" (you really shouldn't call it ddp, but the DataParallel taboo in PyTorch is very real!) The batch name is common in JAX, and is very natural there because when you annotate the sharding of your input, you need to say for each dimension tensor what mesh dim it is sharded over. So when you shard the batch dimension over the batch mesh dim, it looks just like you're labeling the batch dimension of your tensor as batch, e.g., P("batch", None). (This situation doesn't happen in PyTorch because shardings of a tensor are specified per device mesh dim, but that's a story for another day!)

Fully-sharded data parallel (FSDP). This is best understood as an augmentation over DP where weights are also sharded over all GPUs and you just all-gather weights before performing operations (and reduce-scatter in backwards). Because this all-gather is also among all devices, you don't need another axes in your mesh, and your mesh might also be called ["dp"] in this case, even though you're actually doing FSDP. Occasionally, you'll see people name their mesh ["fsdp"] in this case.

Hybrid sharded data parallel (HSDP). HSDP is an extension of FSDP where you shard weights (FSDP) up to the point where you can't actually do a giant all-gather/reduce-scatter over every GPU, and then replicate these shards to cover the rest of your cluster (DP). It's also amenable to fault tolerance techniques that make the modeling assumption that it's OK to lose samples of your batch if a replica fails (you won't model this with device mesh though!). This is probably the first time you will encounter a 2D device mesh (indeed, the DeviceMesh tutorial in PyTorch specifically uses hybrid sharding as its motivating example), since HSDP doesn't require any extra model changes on top of FSDP. There are a few common ways to name the mesh axes for HSDP. One way to think about it is that it is FSDP on the inner dimension and DP on the outer dimension, in which case you would say ["dp", "fsdp"]. Another way is to think about what happens to parameters at the various layers of the mesh: the inner dimension shards, while the outer dimension replicates, so you would say ["replicate", "shard"] or perhaps ["dp_replicate", "dp_shard"] to make it clear that you are still doing data parallelism across both of these device mesh dims (in particular, when you split your batches, you split on both the dp_replicate and dp_shard dims--although, to get the final gradients, you can do the reduction hierarchically by first doing a reduce-scatter on "dp_shard" and then doing an allreduce on "dp_replicate").

Tensor parallelism (TP). Depending on who you ask, tensor parallelism is either about letting you reduce your effective batch size for training or moving you towards reducing the memory usage of activations in your model. In the "reduce effective batch size" framing, the idea behind TP is that you can only scale up DP until your cluster is as large as your batch size. From a modeling perspective, it can be undesirable to have a batch size that is too large, so you can't just keep increasing your batch size to get more parallelism. Instead, TP allows us to get some extra scaling by sharding over the feature dimension of our matrix multiplies [1] (you can shard over either the columns or the rows of your weight matrix, so we will frequently specify if a TP Linear is column-wise or row-wise; in attention, column-wise linear effectively parallelizes the attention computation over attention heads). The communication needed to do TP is fairly exposed (unless you're doing async tensor parallel), so you typically want to keep the communications for it within a single node. This leads to this classic 2D device mesh for DP+TP: ["dp", "tp"] (or, if you're a JAXer, you might write ["batch", "model"], where model is used to indicate the inner feature dimension of the model weights being parallelized over.) When someone says 2D parallelism, they're usually referring to this combo of parallelisms (although I do not recommend using this term--as you can see, it is obviously ambiguous!) Note that tp is the inner mesh dimension, since it benefits the most from the high bandwidth network between GPUs on a single node.

You don't have to stop with DP+TP, however. If you're using FSDP with tensor parallelism (remember, "dp" can mean FSDP!), intra-node TP doesn't improve the amount of inter-node FSDP communication you have to do: however much TP you do, within one TP node you only have one slice of the model and have to talk to everyone else to get their slices. You could solve this by expanding TP to also cross nodes, but in practice mixed intra/inter-node collectives are a lot slower than pure inter-node collectives. This limits the scaling you can get from TP, and so if you're still hitting limits on FSDP, it can still be useful to apply HSDP to avoid running collectives that are too large. In that case, you'd end up with a mesh like ["dp_replicate", "dp_shard", "tp"].

Sequence parallelism (SP). For this section, we specifically take the definition of sequence parallelism from the Ultrascale Playbook (as distinguished from context parallelism). Although we said that TP is the first step towards reducing the memory usage of activations [2], if you literally implement DP+TP based on my descriptions above, you will still end up with more memory spent on activations than you want because there are still parts of the model around the FFN like the LayerNorm need the full hidden dimension to compute mean and variance [3]. To reduce the memory usage in these segments, you need to shard on something else. So typically what you will see is that the model will alternate between TP (hidden dimension is sharded) and SP (sequence dimension is sharded). Consequently, if you look at the device mesh for a model using DP+TP+SP, it will typically still look like ["dp", "tp"], and instead the tp dimension is multiplexed to be used both for TP and SP. Because TP and SP never occur at the same time, you don't need a separate dimension for them.

Ulysses sequence parallelism. Ulysses sequence parallelism from DeepSpeed Ulysses is another sequence parallelism strategy that is implemented by verl (because verl is forked so often, it shows up quite prominently if you are looking for examples of init_device_mesh on GitHub code search). It aims to alleviate memory pressure from extremely long sequences, so sequences are sharded on input, and only when attention needs to be computed is an alltoall issued to re-shard on the attention heads rather than the sequence (doing another alltoall to restore the sequence sharding after the attention is done). Importantly, this means it competes with TP for sharding on the attention heads, which is why you also see people use it to replace TP in MoE models, since it has much less communication than TP (at the cost of having to replicate the attention weights). In verl, you will just see a device mesh ["dp", "sp"] when you are using their FSDP backend (which is what supports Ulysses).

Context parallelism (CP). Context parallelism is another form of "sequence" parallelism. Like Ulysses sequence parallelism, sequences are sharded on input; the difference, however, is instead of using an alltoall to re-shard on attention heads, you just do a (distributed) attention on the entire context. You can do this the easy way by just using allgather to get the full context (as was done in llama4) or you can use a fancy kernel like ring attention, which carefully overlaps communication and computation when performing attention. A popular implementation of context parallelism lives in Megatron, which doesn't directly use PyTorch's native DeviceMesh abstraction but has an analogous HyperCommGrid. The mesh we see here will be something like ["dp", "cp"] or more commonly ["dp", "cp", "tp"]. Notice that we can have a dedicated mesh dim for CP: CP operates very similarly to SP outside of the attention calls (as it is just plain data parallelism when there is no cross-token dependency), but because it never shards on attention heads, it doesn't compete with TP and can be used completely orthogonally to TP (TP shards hidden, CP shards sequence).

CP has a pretty interesting interaction with FSDP. Both DP and CP shard the input data (on batch and sequence respectively). It's pretty common when you do FSDP to just shard over both "dp" ("dp_shard" in HSDP) and "cp". In torchtitan, we create a flattened mesh dim "dp_shard_cp" specifically for FSDP sharding (a flattened mesh dim is what happens if you take your mess and "forget" about some of the structure; e.g., if you were to do an all-gather, you just all-gather over all the flattened axes). In the HSDP world, "dp_cp" is still a useful concept because this is the combination of axes you want to all-reduce over to, e.g., compute the global average loss.

Pipeline parallelism (PP). Pipeline parallelism is kind of an ugly duckling and people tend to hate on it because you have to rewrite your models to introduce pipeline stages, and you can't really use things like DTensor with it (unless you do really strange things like how the GSPMD paper "supports" pipeline parallelism--the general consensus is automatic parallelism does not like PP). PP still goes in the device mesh, because it affects how you are organizing your GPUs, but, for example, torchtitan solely uses it to setup PGs for doing the point-to-point communications. I've seen both ["dp", "pp", ...] or ["pp", "dp", ...] for meshes with PP, but the order probably doesn't make too much of a difference as you are likely solidly inter-node at this point. Pipeline parallelism bandwidth use is very low, and latency can be covered up as you can immediately start processing the next batch after triggering an asynchronous send of the previous batch.

Expert parallelism (EP). EP is its own kettle of fish. Expert parallelism only applies over the expert computation of the model, but within this region, we are not sharding parameters as FSDP conventionally sees it: we will commonly have the entire expert's weights on our node. torchtitan's WIP expert parallelism implementation, when it has ALL parallelisms on, would look like ["pp", "dp_replicate", "dp_shard_mod_ep", "dp_shard_in_ep", "cp", "tp"], where dp_shard has been split into two mesh dimensions (DP shard modulo EP, and DP shard in EP). dp_shard_mod_ep is conventionally one, but when it is not it represents further FSDP-style sharding of expert weights inside of the expert region (there's some complication here if you have shared experts along-side your EP-sharded experts). But then dp_shard_in_ep, cp and optionally tp are combined together to give you the expert parallel dimension. It's actually more intuitive to imagine that you have two distinct meshes: ["pp", "dp_replicate", "dp_shard", "cp", "tp"] and ["pp", "dp_shard_mod_ep", "ep", "tp"]. The keen-eyed may also notice that there is no intrinsic reason the tp mesh size inside and outside of the expert parallel region, but this is not easily done if you have to have a single global device mesh for everything. In fact, there is a WIP PR to have two meshes, one for inside the expert region and one for outside: https://github.com/pytorch/torchtitan/pull/1660

Conclusion. The general concept behind mesh parallelism is that you can compose parallelization strategies without too much fuss. Indeed, the use of, e.g., TP to improve scaling is precisely because it lets you cover your device space without having to expand DP beyond the batch size you want to do. However, as you can see from these concrete examples, it's not always quite as simple as just stacking all of the parallelisms together one on top of each other. In the end, all the device mesh is doing is creating PGs behind groups of devices as defined by the mesh, so if you want some weird setup where you're swapping between two device meshes, PyTorch's general philosophy has been to say, have fun!

Thanks to Horace He, Tianyu Liu and Natalia Gimelshein for helping fact check this post. Any remaining errors are mine!

[1] One more subtlety I want to point out: while we tend to think of TP as sharding the feature dimension of parameters, when we "propagate" this sharding through the network, other intermediate tensors end up getting sharded on the TP dimension as well. In particular, in a transformer block, you will typically have a column-wise linear followed by a row-wise linear, and the intermediate activation will be temporarily sharded on the TP dimension before the row-wise linear runs.

[2] I am very carefully using "activation memory" here and not total memory, because total memory usage (what you actually care about) is also a function of peak memory usage, which is subject to transient peaks such as when FSDP does an all-gather to collect parameters. In fact, even without SP, TP will improve your peak memory usage, because unlike FSDP, it's not necessary to all-gather the full weight matrix to actually perform the matrix multiply. TPs peak memory usage occurs when it all-gathers activations.

[3] You will get a little improvement between the column-wise and row-wise linear, since the activations there are sharded. You can turn this into a big improvement by using selective activation checkpointing and forcing recomputation of activations that aren't sharded! (Plain activation checkpointing tends not to work so well because of the all-gather of the activations.)

by Edward Z. Yang at August 31, 2025 03:20 AM

August 29, 2025

Well-Typed.Com

Welcoming a new Haskell Ecosystem Supporter: Standard Chartered

Following on from our announcement of Haskell Ecosystem Support Packages, Well-Typed are delighted to introduce Standard Chartered as our first Gold Haskell Ecosystem Supporter.

At Standard Chartered Bank, Haskell is used in a core software library supporting the entire Markets division – a business line with 3 billion USD operating income in 2023. Typed functional programming is used across the entire tech stack, including foundational APIs and CLIs for deal valuation and risk analysis, server-side components for long-running batches or sub-second RESTful services, and end-user GUIs. Thousands of users across Markets interact with software built using functional programming, and over one hundred write functional code.

Well-Typed’s Haskell Ecosystem Support Packages, offered in partnership with the Haskell Foundation, allow companies using Haskell to

invest in the maintenance and future development of the core Haskell toolchain,

access Well-Typed’s team of Haskell experts for private development or technical support, and

fund the Haskell Foundation to sustain key community infrastructure.

You can read more about the toolchain maintenance activities these packages fund in our regular reports. Many thanks to Standard Chartered, to the existing Haskell Ecosystem Supporters, and to our other clients who fund open-source development work, for making this possible.

If your company relies on Haskell, and depends on its core toolchain and vibrant open-source ecosystem, why not read more about our offer?

by adam at August 29, 2025 12:00 AM

August 27, 2025

Oskar Wickström

Finding Bugs in a Coding Agent with Lightweight DST
Amp is a coding agent which I’ve been working on the last six months at Sourcegraph. And in the last couple of weeks, I’ve been building a testing rig inspired by Deterministic Simulation Testing (DST) to test the most crucial parts of the system. DST is closely related to fuzzing and property-based testing.

The goal is to get one of Amp’s most central pieces, the ThreadWorker, under heavy scrutiny. We’ve had a few perplexing bug reports, where users experienced corrupted threads, LLM API errors from invalid tool calls, and more vague issues like “it seems like it’s spinning forever.” Reproducing such problems manually is usually somewhere between impractical and impossible. I want to reproduce them deterministically, and in a way where we can debug and fix them. And beyond the known ones, I’d like to find the currently unknown ones before our users hit them.

Generative testing to the rescue!

Approach: Lightweight DST in TypeScript

Amp is written in TypeScript, which is an ecosystem currently not drowning in fuzzing tools. My starting point was using jsfuzz, which I hadn’t used before but it looked promising. However, I had a bunch of problems getting it to run together with our Bun stack. One could use fast-check, but as far as I can tell, the model-based testing they support doesn’t fit with our needs. We don’t have a model of the system, and we need to generate values in multiple places as the test runs. So, I decided to build something from scratch for our purposes.

I borrowed an idea I got from matklad last year: instead of passing a seeded PRNG to generate test input, we generate an entropy Buffer with random contents, and track our position in that array with a cursor. Drawing a random byte consumes the byte at the current position and increments the cursor. We don’t know up-front how many bytes we need for a given fuzzer, so the entropy buffer grows dynamically when needed, appending more random bytes. This, together with a bunch of methods for drawing different types of values, is packaged up in an Entropy class:
class Entropy {  random(count): UInt8Array { ... }  randomRange(minIncl: number, maxExcl: number): number { ... }  // ... lots of other stuff }
A fuzzer is an ES module written in TypeScript, exporting a single function:
export async function fuzz(entropy: Entropy) {  // test logic here }
Any exception thrown by fuzz is considered a test failure. We use the node:assert module for our test assertions, but it could be anything.

Another program, the fuzz runner, imports a built fuzzer module and runs as many tests it can before a given timeout. If it finds a failure, it prints out the command to reproduce that failure:
Fuzzing example.fuzzer.js iteration 1000... Fuzzing example.fuzzer.js iteration 2000... Fuzzer failed: AssertionError [ERR_ASSERTION]: 3 != 4 at [...] Reproduce with: bun --console-depth=10 scripts/fuzz.ts \ dist/example.fuzzer.js \ --verbose \ --reproduce=1493a513f88d0fd9325534c33f774831
Why use this Entropy rather than a seed? More about that at the end of the post!

The ThreadWorker Fuzzer

In the fuzzer for our ThreadWorker, we stub out all IO and other nondeterministic components, and we install fake timers to control when and how asynchronous code is run. In effect, we have determinism and simulation to run tests in, so I guess it qualifies as DST.

The test simulates a sequence of user actions (send message, cancel, resume, and wait). Similarly, it simulates responses from tool calls (like the agent reading a file) and from inference backends (like the Anthropic API). We inject faults and delays in both tool calls and inference requests to test our error handling and possible race conditions.

After all user actions have been executed, we make sure to approve any pending tool calls that require confirmation. Next, we tell the fake timer to run all outstanding timers until the queue is empty; like fast-forwarding until there’s nothing left to do. Finally, we check that the thread is idle, i.e. that there’s no ongoing inference and that all tool calls have terminated. This is a liveness property.

After the liveness property, we check a bunch of safety properties:

all messages posted by the user are present in the thread

all message pairs involving tools calls are valid according to Anthropic’s API specification

all tool calls have settled in expected terminal states

Some of these are targeted at specific known bugs, while some are more general but have found bugs we did not expect.

Here’s a highly simplified version of the fuzzer:
export async function fuzz(entropy: Entropy) {  const clock = sinon.useFakeTimers({  loopLimit: 1_000_000,  })  const worker = setup() // including stubbing IO, etc   try {  const resumed = worker.resume()  await clock.runAllAsync()  await resumed   async function run() {  for (let round = 0; round < entropy.randomRange(1, 50); round++) {  const action = await generateNextAction(entropy, worker)  switch (action.type) {  case 'user-message':  await worker.handle({  ...action,  type: 'user:message',  })  break  case 'cancel':  await worker.cancel()  break  case 'resume':  await worker.resume()  break  case 'sleep':  await sleep(action.milliseconds)  break  case 'approve': {  await approveTool(action.threadID, action.toolUseID)  break  }  }  }   // Approve any remaining tool uses to ensure termination into an   // idle thread state  const blockedTools = await blockedToolUses()  await Promise.all(blockedTools.map(approve))  }   const done = run()  await clock.runAllAsync()  await done   // check liveness and safety properties  // ...  } finally {  sinon.restore()  } }
Now, let’s dig into the findings!

Results

Given I’ve been working on this for about a week in total, I’m very happy with the outcome. Here are some issues the fuzzer found:

Corrupted thread due to eagerly starting tool calls during streaming

While streaming tool use blocks from the Anthropic API, we invoked tools eagerly, while not all of them were finished streaming. This, in combination with how state was managed, led to tool results being incorrectly split across messages. Anthropic’s API would reject any further requests, and the thread would essentially be corrupted. This was reported by a user and was the first issue we found and fixed using the fuzzer.

Another variation, which the fuzzer also found, this was a race condition where user messages interfered at a particular timing with ongoing tool calls, splitting them up incorrectly.

Subagent tool calls not terminating when subthread tool calls were rejected

Due to a recent change in behavior, where we don’t run inference automatically after tool call rejection, subagents could end up never signalling their termination, which led to the main thread never reaching an idle state.

I confirmed this in both VSCode and the CLI: infinite spinners, indeed.

Tool calls blocked on user not getting cancelled after user message

Due to how some tool calls require confirmation, like reading files outside the workspace or running some shell commands, in combination how we represent and track termination of tools, there’s a possibility for such tools to be resumed and then, after an immediate user cancellation, not be properly cancelled. This leads to incorrect mutations of the thread data.

I’ve not yet found the cause of this issue, but it’s perfectly reproducible, so that’s a start.

Furthermore, we were able to verify an older bug fix, where Anthropic’s API would send an invalid message with an empty tool use block array. That used to get the agent into an infinite loop. With the fuzzer, we verified and improved the old fix which had missed another case.

How about number of test runs and timeouts? Most of these bugs were found almost immediately, i.e. within a second. The last one in the list above takes longer, around a minute normally. We run a short version of each fuzzer in every CI build, and longer runs on a nightly basis. This is up for a lot of tuning and experimentation.

Why the Entropy Buffer?

So why the entropy buffer instead of a seeded PRNG? The idea is to use that buffer to mutate the test input, instead of just bombarding with random data every time. If we can track which parts of the entropy was used where, we can make those slices “smaller” or “bigger.” We can use something like gradient descent or simulated annealing to optimize inputs, maximizing some objective function set by the fuzzer. Finally, we might be able to minimize inputs by manipulating the entropy.

In case the JavaScript community gets some powerful fuzzing framework like AFL+, that could also just be plugged in. Who knows, but I find this an interesting approach that’s worth exploring. I believe the entropy buffer approach is also similar to how Hypothesis works under the hood. Someone please correct me if that’s not the case.

Anyhow, that’s today’s report from the generative testing mines. Cheers!
August 27, 2025 10:00 PM

August 25, 2025

Haskell Interlude

69: Jurriaan Hage

Today’s guest is Jurriaan Hage. Jurriaan is a professor at Heriot-Watt University in Edinburgh who’s worked with and on Haskell for many years. He’s known for the Helium Haskell compiler, specifically designed for teaching, and he has plenty of other projects related to Haskell, including improvements to the type system, the generation of better error messages, or detection of plagiarism.

by Haskell Podcast at August 25, 2025 07:00 AM

August 24, 2025

Abhinav Sarkar

A Fast Bytecode VM for Arithmetic: The Compiler
In this series of posts, we write a fast bytecode compiler and a virtual machine for arithmetic in Haskell. We explore the following topics:

Parsing arithmetic expressions to Abstract Syntax Trees (ASTs).

Unit testing for our parser.

Interpreting ASTs.

Compiling ASTs to bytecode.

Disassembling and decompiling bytecode.

Unit testing for our compiler.

Property-based testing for our compiler.

Efficiently executing bytecode in a virtual machine (VM).

Unit testing and property-based testing for our VM.

Benchmarking our code to see how the different passes perform.

All the while keeping an eye on performance.

In this post, we write the compiler for our AST to bytecode, and a decompiler for the bytecode.

This post was originally published on abhinavsarkar.net.

This post is part of the series: A Fast Bytecode VM for Arithmetic.

The Parser

The Compiler (you are here)

The Virtual Machine

Contents
Introduction
The Bytecode
Num
BinOp
Var and Let
The Compiler
Compiling, Fast and Slow
The Decompiler
Testing the Compiler

Introduction

AST interpreters are well known to be slow because of how AST nodes are represented in the computer’s memory. The AST nodes contain pointers to other nodes, which may be anywhere in the memory. So while interpreting an AST, the interpreter jumps all over the memory, causing a slowdown. One solution to this is to convert the AST into a more compact and optimized representation known as Bytecode.

Bytecode is a flattened and compact representation of a program, usually manifested as a byte array. Bytecode is essentially an Instruction Set (IS), but custom-made to be executed by a Virtual Machine (VM), instead of a physical machine. Each bytecode instruction is one byte in size (that’s where it gets its name from). A bytecode and its VM are created in synergy so that the execution is as efficient as possible¹. Compiling source code to bytecode and executing it in a VM also allows the program to be run on all platforms that the VM supports without the developer caring much about portability concerns. The most popular combo of bytecode and VM is probably the Java bytecode and the Java virtual machine.

The VMs can be stack-based or register-based. In a stack-based VM, all values created during the execution of a program are stored only in a Stack data-structure residing in the memory. Whereas, in a register-based VM, there is also an additional set of fixed number of registers that are used to store values in preference to the stack². Register-based VMs are usually faster, but stack-based VMs are usually simpler to implement. For our purpose, we choose to implement a stack-based VM.

We are going to write a compiler that compiles our expression AST to bytecode. But first, let’s design the bytecode for our stack-based VM.

The Bytecode

Here is our expression AST as a reminder:
data Expr
  = Num !Int16
  | Var !Ident
  | BinOp !Op Expr Expr
  | Let !Ident Expr Expr
  deriving (Eq, Generic)

newtype Ident = Ident BS.ByteString
  deriving (Eq, Ord, Generic, Hashable)

data Op = Add | Sub | Mul | Div deriving (Eq, Enum, Generic)
ArithVMLib.hs
Let’s figure out the right bytecode for each case. First, we create Opcodes for each bytecode, which are sort of mnemonics for actual bytecode. Think of them as Assembly is to Machine Code.

Num

For a number literal, we need to put it directly in the bytecode so that we can use it later during the execution. We also need an opcode to push it on the stack. Let’s call it OPush with an Int16 parameter.

BinOp

Binary operations recursively use Expr for their operands. To evaluate a binary operation, we need its operands to be evaluated before, so we compile them first to bytecode. After that, all we need is an opcode per operator. Let’s call them OAdd, OSub, OMul, and ODiv for Add, Sub, Mul, and Div operators respectively.

Var and Let

Variables and Let expressions are more complex³. In the AST interpreter we chucked the variables in a map, but we cannot do that in a VM. There is no environment map in a VM, and all values must reside in the stack. How do we have variables at all then? Let’s think for a bit.

Each expression, after being evaluated in the VM, must push exactly one value on the stack: its result. Num expressions are a trivial case. When a binary operation is evaluated, first its left operand is evaluated. That pushes one value on the stack. Then its right operand is evaluated, and that pushes another value on the stack. Finally, the operation pops the two values from the top of the stack, does its thing, and pushes the resultant value back on the stack—again one value for the entire BinOp expression.

A Let expression binds a variable’s value to its name, and then the variable can be referred from the body of the expression. But how can we refer to a variable when the stack contains only values, not names? Let’s imagine that we are in middle of evaluating a large expression, wherein we encounter a Let expression. First we evaluate its assignment expression, and that pushes a value on the top of the stack. Let’s say that the stack has n values at this point. After this we get to evaluate the body expression. At all times when we are doing that, the value from assignment stays at the same point in the stack because evaluating sub-expressions, no matter how complicated, only adds new values to the stack, without popping an existing value from before. Therefore, we can use the stack index of the assignment value (n−1) to refer to it from within the body expression. So, we encode Var as an opcode and an integer index into the stack.

We choose to use a Word8 to index the stack, limiting us to a stack depth of 256. We encode the variable references with an opcode OGet, which when executed gets the value from the stack at the given index and pushes it on the stack.

For a Let expression, after we compile its assignment and body expressions, we need to make sure that the exactly-one-value invariant holds. Evaluating the assignment and body pushes two values on the stack, but we can have only one! So we overwrite the assignment value with the body value, and pop the stack to remove the body value. We invent a new opcode OSwapPop to do this, called so because its effect is equivalent to swapping the topmost two values on the stack, and then popping the new top value⁴.

Putting all the opcodes together, we have the Opcode ADT:
data Opcode
  = OPush !Int16        -- 0
  | OGet !Word8         -- 1
  | OSwapPop            -- 2
  | OAdd                -- 3
  | OSub                -- 4
  | OMul                -- 5
  | ODiv                -- 6
  deriving (Show, Read, Eq, Generic)

instance NFData Opcode
ArithVMLib.hs
Notice that we also assigned bytecodes—that is, a unique byte value—to each Opcode above, which are just their ordinals. Now we are ready to write the compiler.

The Compiler

The compiler takes an expression with the bytecode size, and compiles it to a strict ByteString of that size. Recall that in the previous post, we wrote our parser such that the bytecode size for each AST node was calculated while parsing it. This allows us to pre-allocate a bytestring of required size before compiling the AST. We compile to actual bytes here, and don’t use the opcodes.
type Bytecode = BS.ByteString

compile :: SizedExpr -> Result Bytecode
compile = compile' defaultStackSize

compile' :: Int -> SizedExpr -> Result Bytecode
compile' stackSize (expr, bytecodeSize) =
  uncurry (fmap . const) . BSI.unsafeCreateUptoN' bytecodeSize $ \fp -> do
    (bytecodeSize,)
      <$> fmap
        Right
        (compileIO bytecodeSize stackSize fp fp expr >>= checkSize fp . TS.fst)
        `catch` (pure . Left)
  where
    checkSize fp ip = do
      let actualBytecodeSize = ip `minusPtr` fp
      unless (actualBytecodeSize == bytecodeSize) $
        throwIO . Error Compile $
          "Compiled bytecode size " <> show actualBytecodeSize
            <> " is not same as expected size: " <> show bytecodeSize

compileIO ::
  Int -> Int -> Ptr Word8 -> Ptr Word8 -> Expr -> IO (Pair (Ptr Word8) Int)
compileIO bytecodeSize stackSize fp ip = go Map.empty 0 ip
  where
    ep = fp `plusPtr` bytecodeSize

    go env !sp !ip = \case
      Num n | sp + 1 <= stackSize -> do
        let !lb = fromIntegral $ n .&. 0xff
            !mb = fromIntegral $ ((fromIntegral n :: Word16) .&. 0xff00) `shiftR` 8
        writeByte ip 0 -- OPush
        writeByte (ip `plusPtr` 1) lb
        writeByte (ip `plusPtr` 2) mb
        pure (ip `plusPtr` 3 :!: sp + 1)
      Num _ -> throwCompileError "Stack overflow"
      BinOp op a b -> do
        (ip' :!: sp') <- go env sp ip a
        (ip'' :!: sp'') <- go env sp' ip' b
        writeByte ip'' $ translateOp op
        pure (ip'' `plusPtr` 1 :!: sp'' - 1)
      Let x assign body -> do
        (ip' :!: sp') <- go env sp ip assign
        (ip'' :!: sp'') <- go (Map.insert x sp env) sp' ip' body
        writeByte ip'' 2 -- OSwapPop
        pure (ip'' `plusPtr` 1 :!: sp'' - 1)
      Var x | sp + 1 <= stackSize -> case Map.lookup x env of
        Nothing -> throwCompileError $ "Unknown variable: " <> show x
        Just varScope
          | varScope < stackSize && varScope < fromIntegral (maxBound @Word8) -> do
              writeByte ip 1 -- OGet
              writeByte (ip `plusPtr` 1) $ fromIntegral varScope
              pure (ip `plusPtr` 2 :!: sp + 1)
        Just _ -> throwCompileError "Stack overflow"
      Var _ -> throwCompileError "Stack overflow"

    writeByte :: Ptr Word8 -> Word8 -> IO ()
    writeByte !ip !val
      | ip < ep = poke ip val
      | otherwise = throwCompileError $
          "Instruction index " <> show (ip `minusPtr` fp)
            <> " out of bound " <> show (bytecodeSize - 1)

    translateOp = \case
      Add -> 3 -- OAdd
      Sub -> 4 -- OSub
      Mul -> 5 -- OMul
      Div -> 6 -- ODiv

    throwCompileError = throwIO . Error Compile

defaultStackSize :: Int
defaultStackSize = 256
ArithVMLib.hs
We use the unsafeCreateUptoN' function from the Data.ByteString.Internal module that allocates enough memory for the provided bytecode size, and gives us a pointer to the allocated memory. We call this pointer fp for frame pointer. Then we traverse the AST recursively, writing bytes for opcodes and arguments for each case. We use pointer arithmetic and the poke function to write the bytes. Int16 numbers are encoded as two bytes in little endian fashion.

In the recursive traversal function go, we pass and return the current stack pointer sp and instruction pointer ip. We update these correctly for each case⁵. We also take care of checking that the pointers stay in the right bounds, failing which we throw appropriate errors.

We also pass an env parameter that is similar to the variable names to values environment we use in the AST interpreter, but this one tracks variable names to stack indices at which they reside. We update this information before compiling the body of a Let expression to capture the stack index of its assignment value. When compiling a Var expression, we use the env map to lookup the variable’s stack index, and encode it in the bytecode.

At the end of compilation, we check that the entire bytestring is filled with bytes till the very end, failing which we throw an error. This check is required because otherwise the bytestring may have garbage bytes, and may fail inexplicably during execution.

All the errors are thrown in the IO monad using the throwIO function, and are caught after compilation using the catch function. The final result or error is returned wrapped into Result.

Let’s see it in action:
$ echo -n "1 + 2 - 3 * 4" | arith-vm compile | hexdump -C
00000000  00 01 00 00 02 00 03 00  03 00 00 04 00 05 04     |...............|
0000000f
$ echo -n "let x = 4 in let y = 5 in x + y" | arith-vm compile | hexdump -C
00000000  00 04 00 00 05 00 01 00  01 01 03 02 02           |.............|
0000000d
You can verify that the resultant bytes are indeed correct. I assume that it is difficult for you to read raw bytes. We’ll fix this in a minute. Meanwhile, let’s ponder upon some performance characteristics of our compiler.

Compiling, Fast and Slow

You may be wondering why I chose to write the compiler in this somewhat convoluted way of pre-allocating a bytestring and using pointers. The answer is: performance. I didn’t actually start with pointers. I iterated through many different data and control structures to find the fastest one.

The table below shows the compilation times for a benchmark expression file when using different data structures to implement the compileIO function:

Data structure Time (ms) Incremental speedup Overall speedup

List 4345 1x 1x

Seq 523 8.31x 8.31x

DList 486 1.08x 8.94x

BS Builder 370 1.31x 11.74x

Pre-allocated BS 54 6.85x 80.46x

Bytearray 52 1.02x 83.55x

I started with the bread-and-butter data structure of Haskellers, the humble and known to be slow List, which was indeed quite slow. Next, I moved on to Seq and thereafter DList, which are known to be faster at concatenation/consing. Then I abandoned the use of intermediate data structures completely, choosing to use a bytestring Builder to create the bytestring. Finally, I had the epiphany that the bytestring size was known at compile time, and rewrote the function to pre-allocate the bytestring, thereby reaching the fastest solution.

I also tried using Bytearray, which has more-or-less same performance of bytestring, but it is inconvenient to use because there are no functions to do IO with bytearrays. So I’d anyway need to use bytestrings for reading from STDIN or writing to STDOUT, and converting to-and-fro between bytearray and bytestring is a performance killer. Thus, I decided to stick to bytestrings.

The pre-allocated bytestring approach is 80 times faster than using lists, and almost 10 times faster than using Seq. For such gain, I’m okay with the complications it brings to the code. Here are the numbers in a chart (smaller is better):

<noscript></noscript>
Compilation time using different data-structures

The other important data structure used here is the map (or dictionary) in which we add the mappings from identifiers to their stack indices. This data structure needs to be performant because we do a lookup for each variable we encounter while compiling. I benchmarked compilation for some data structures⁶:

Data structure Time (ms) Slowdown

Data.HashMap.Strict.HashMap 55 1.00x

Data.List.List⁷ 63 1.14x

Data.Map.Strict.Map 71 1.29x

Data.Trie.Trie 80 1.45x

Data.Vector.Hashtables.Dictionary 104 1.89x

Data.HashTable.IO.BasicHashTable 312 5.67x

Strict hashmap turns out to be the fasted one, but interestingly, linked list is a close second. Mutable hashtable is the slowest even though I expected it to be the fastest. Here are the times in a chart (smaller is better):

<noscript></noscript>
Compilation time using different map data-structures

Another choice I had to make was how to write the go function. I ended up passing and returning pointers and environment map, and throwing errors in IO, but a number of solutions are possible. I tried out some of them, and noted the compilation times for the benchmark expression file:

Control structure Time (ms) Slowdown

IO 57.4 1.00x

IO + IORef 65.0 1.13x

IO + ReaderT 60.9 1.06x

IO + StateT 65.6 1.14x

IO + ExceptT 65.9 1.15x

IO + ReaderT + ExceptT 107.1 1.87x

IO + StateT + ExceptT 383.9 6.69x

IO + StateT + ReaderT 687.5 11.98x

IO + StateT + ReaderT + ExceptT 702.0 12.23x

IO + CPS 78.2 1.36x

IO + DCPS 78.4 1.37x

IO + ContT 76.5 1.33x

I tried putting the pointer in IORefs and StateT state instead of passing them back-and-forth. I tried putting the environment in a ReaderT config. I tried using ExceptT for throwing errors instead of using IO errors. Then I tried various combinations of these monad transformers.

Finally, I also tried converting the go function to be tail-recursive by using Continuation-passing style (CPS), and then defunctionalizing the continuations, as well as, using the ContT monad transformer. All of these approaches resulted in slower code. The times are interesting to compare (smaller is better):

<noscript></noscript>
Compilation time using different control-structures

There is no reason to use IORefs here because they result in slower and uglier code. Using one monad transformer at a time results in slight slowdowns, which may be worth the improvement in the code. But using more than one of them degrades performance by a lot. Also, there is no improvement caused by CPS conversion, because GHC is smart enough to optimize the non tail-recursive code to be faster then handwritten tail-recursive one that allocates a lot of closures (or objects in case of defunctionalization).

Moving on …

The Decompiler

It is a hassle to read raw bytes in the compiler output. Let’s write a decompiler to aid us in debugging and testing the compiler. First, a disassembler that converts bytes to opcodes:
type Program = Seq Opcode

disassemble :: Bytecode -> Result Program
disassemble bytecode = go 0 Seq.empty
  where
    !size = BS.length bytecode

    go !ip !program
      | ip == size = pure program
      | otherwise = case readInstr bytecode ip of
          0 | ip + 2 < size ->
            go (ip + 3) $ program |> OPush (readInstrArgInt16 bytecode ip)
          0 -> throwIPOOBError $ ip + 2
          1 | ip + 1 < size ->
            go (ip + 2) $ program |> OGet (readInstrArgWord8 bytecode ip)
          1 -> throwIPOOBError $ ip + 1
          2 -> go (ip + 1) $ program |> OSwapPop
          3 -> go (ip + 1) $ program |> OAdd
          4 -> go (ip + 1) $ program |> OSub
          5 -> go (ip + 1) $ program |> OMul
          6 -> go (ip + 1) $ program |> ODiv
          n -> throwDisassembleError $
            "Invalid bytecode: " <> show n <> " at: " <> show ip

    throwIPOOBError ip = throwDisassembleError $
      "Instruction index " <> show ip <> " out of bound " <> show (size - 1)

    throwDisassembleError = throwError . Error Disassemble
ArithVMLib.hs
A disassembled program is a sequence of opcodes. We simply go over each byte of the bytecode, and append the right opcode for it to the program, along with any parameters it may have. Note that we do not verify that the disassembled program is correct.

Here are the helpers that read instruction bytes and their arguments from a bytestring:
readInstr :: BS.ByteString -> Int -> Word8
readInstr = BS.unsafeIndex
{-# INLINE readInstr #-}

readInstrArgWord8 :: BS.ByteString -> Int -> Word8
readInstrArgWord8 bytecode ip = readInstr bytecode (ip + 1)
{-# INLINE readInstrArgWord8 #-}

readInstrArgInt16 :: BS.ByteString -> Int -> Int16
readInstrArgInt16 bytecode ip =
  let lb = readInstr bytecode (ip + 1)
      mb = readInstr bytecode (ip + 2)
      b1 :: Word16 = fromIntegral lb
      b2 = fromIntegral mb `shiftL` 8
   in fromIntegral (b1 .|. b2)
{-# INLINE readInstrArgInt16 #-}
ArithVMLib.hs
Next, we decompile the opcodes to an expression:
decompile :: Program -> Result Expr
decompile program = do
  stack <- go Seq.empty program
  checkStack Decompile maxBound $ length stack
  let ast :<| _ = stack
  pure ast
  where
    go stack = \case
      Seq.Empty -> pure stack
      opcode :<| rest -> case opcode of
        OPush n -> go (stack |> Num n) rest
        OAdd -> decompileBinOp Add >>= flip go rest
        OSub -> decompileBinOp Sub >>= flip go rest
        OMul -> decompileBinOp Mul >>= flip go rest
        ODiv -> decompileBinOp Div >>= flip go rest
        OGet i -> go (stack |> Var (mkIdent $ mkName $ fromIntegral i)) rest
        OSwapPop -> decompileLet >>= flip go rest
      where
        decompileBinOp op = case stack of
          stack' :|> a :|> b -> pure $ stack' |> BinOp op a b
          _ -> throwDecompileError $
            "Not enough elements to decompile binary operation: " <> show op

        decompileLet = case stack of
          stack' :|> a :|> b ->
            pure $ stack' |> Let (mkIdent $ mkName $ length stack - 2) a b
          _ -> throwDecompileError "Not enough elements to decompile let"

    mkName i = names `Seq.index` i
    names = Seq.fromList $ tail $ combinations 2

    combinations = \case
      0 -> [""]
      n -> let prev = combinations (n - 1)
        in prev <> [x : xs | x <- ['a' .. 'z'], xs <- prev]

    throwDecompileError = throwError . Error Decompile

checkStack :: (MonadError Error m) => Pass -> Int -> Int -> m ()
checkStack pass stackSize = \case
  1 -> pure ()
  0 -> throwError $ Error pass "Final stack has no elements"
  n | n > stackSize -> throwError . Error pass $ "Stack overflow"
  n | n > 1 -> throwError . Error pass $ "Final stack has more than one element"
  _ -> throwError . Error pass $ "Stack underflow"
ArithVMLib.hs
Decompilation is the opposite of compilation. While compiling there is an implicit stack of expressions that are yet to be compiled. We make that stack explicit here, capturing expressions as they are decompiled from opcodes. For compound expressions, we inspect the stack and use the already decompiled expressions as the operands of the expression being decompiled. This way we build up larger expressions from smaller ones, culminating in the single top-level expression at the end⁸. Finally, we check the stack to make sure that there is only one expression left in it. Note that like the disassembler, we do not verify that the decompiled expression is correct.

There is one tricky thing in decompilation: we lose the names of the variables when compiling, and are left with only stack indices. So while decompiling, we generate variable names from their stack indices by indexing a list of unique names. Let’s see it in action:
$ echo -n "1 + 2 - 3 * 4" | arith-vm compile | arith-vm disassemble
OPush 1
OPush 2
OAdd
OPush 3
OPush 4
OMul
OSub

$ echo -n "1 + 2 - 3 * 4" | arith-vm compile | arith-vm decompile
( ( 1 + 2 ) - ( 3 * 4 ) )

$ echo -n "let x = 4 in let y = 5 in x + y" | arith-vm compile | arith-vm disassemble
OPush 4
OPush 5
OGet 0
OGet 1
OAdd
OSwapPop
OSwapPop

$ echo -n "let x = 4 in let y = 5 in x + y" | arith-vm compile | arith-vm decompile
( let a = 4 in ( let b = 5 in ( a + b ) ) )
That’s all for compilation and decompilation. Now, we use them together to make sure that everything works.

Testing the Compiler

We write some unit tests for the compiler, targeting both success and failure cases:
compilerSpec :: Spec
compilerSpec = describe "Compiler" $ do
  forM_ compilerSuccessTests $ \(input, result) ->
    it ("compiles: \"" <> BSC.unpack input <> "\"") $ do
      parseCompile input `shouldBe` Right (Seq.fromList result)

  forM_ compilerErrorTests $ \(input, err) ->
    it ("fails for: \"" <> BSC.unpack input <> "\"") $ do
      parseCompile input `shouldSatisfy` \case
        Left (Error Compile msg) | err == msg -> True
        _ -> False

  it "fails for greater sized expr" $ do
    compile (Num 1, 4) `shouldSatisfy` \case
      Left
        ( Error Compile "Compiled bytecode size 3 is not same as expected size: 4"
        ) -> True
      _ -> False

  it "fails for lesser sized expr" $ do
    compile (Num 1, 2) `shouldSatisfy` \case
      Left (Error Compile "Instruction index 2 out of bound 1") -> True
      _ -> False
  where
    parseCompile = parseSized >=> compile' 4 >=> disassemble

compilerSuccessTests :: [(BSC.ByteString, [Opcode])]
compilerSuccessTests =
  [ ( "1",
      [OPush 1]
    ),
    ( "1 + 2 - 3 * 4 + 5 / 6 / 1 + 1",
      [ OPush 1, OPush 2, OAdd, OPush 3, OPush 4, OMul, OSub, OPush 5, OPush 6,
        ODiv, OPush 1, ODiv, OAdd, OPush 1, OAdd ]
    ),
    ( "1 + (2 - 3) * 4 + 5 / 6 / (1 + 1)",
      [ OPush 1, OPush 2, OPush 3, OSub, OPush 4, OMul, OAdd, OPush 5, OPush 6,
        ODiv, OPush 1, OPush 1, OAdd, ODiv, OAdd ]
    ),
    ( "let x = 4 in x + 1",
      [OPush 4, OGet 0, OPush 1, OAdd, OSwapPop]
    ),
    ( "let x = 4 in let y = 5 in x + y",
      [OPush 4, OPush 5, OGet 0, OGet 1, OAdd, OSwapPop, OSwapPop]
    ),
    ( "let x = 4 in let x = x + 1 in x + 2",
      [OPush 4, OGet 0, OPush 1, OAdd, OGet 1, OPush 2, OAdd, OSwapPop, OSwapPop]
    ),
    ( "let x = let y = 3 in y + y in x * 3",
      [ OPush 3, OGet 0, OGet 0, OAdd, OSwapPop, OGet 0, OPush 3, OMul, OSwapPop ]
    ),
    ( "let x = let y = 1 + let z = 2 in z * z in y + 1 in x * 3",
      [ OPush 1, OPush 2, OGet 1, OGet 1, OMul, OSwapPop, OAdd, OGet 0, OPush 1,
        OAdd, OSwapPop, OGet 0, OPush 3, OMul, OSwapPop ]
    ),
    ("1/0", [OPush 1, OPush 0, ODiv]),
    ("-32768 / -1", [OPush (-32768), OPush (-1), ODiv])
  ]

compilerErrorTests :: [(BSC.ByteString, String)]
compilerErrorTests =
  [ ("x", "Unknown variable: x"),
    ("let x = 4 in y + 1", "Unknown variable: y"),
    ("let x = y + 1 in x", "Unknown variable: y"),
    ("let x = x + 1 in x", "Unknown variable: x"),
    ("let x = 4 in let y = 1 in let z = 2 in y + x", "Stack overflow"),
    ("let x = 4 in let y = 5 in x + let z = y in z * z", "Stack overflow"),
    ("let a = 0 in let b = 0 in let c = 0 in let d = 0 in d", "Stack overflow")
  ]
ArithVMSpec.hs
In each test, we parse and compile an expression, and then disassemble the compiled bytes, which we match with expected list of opcodes, or an error message.

Let’s put these tests with the parser tests, and run them:
main :: IO ()
main = hspec $ do
  parserSpec
  astInterpreterSpec
  compilerSpec
ArithVMSpec.hs
Output of the test run
$ cabal test -O2
Running 1 test suites...
Test suite specs: RUNNING...

Parser
  parses: "1 + 2 - 3 * 4 + 5 / 6 / 0 + 1" [✔]
  parses: "1+2-3*4+5/6/0+1" [✔]
  parses: "1 + -1" [✔]
  parses: "let x = 4 in x + 1" [✔]
  parses: "let x=4in x+1" [✔]
  parses: "let x = 4 in let y = 5 in x + y" [✔]
  parses: "let x = 4 in let y = 5 in x + let z = y in z * z" [✔]
  parses: "let x = 4 in (let y = 5 in x + 1) + let z = 2 in z * z" [✔]
  parses: "let x=4in 2+let y=x-5in x+let z=y+1in z/2" [✔]
  parses: "let x = (let y = 3 in y + y) in x * 3" [✔]
  parses: "let x = let y = 3 in y + y in x * 3" [✔]
  parses: "let x = let y = 1 + let z = 2 in z * z in y + 1 in x * 3" [✔]
  fails for: "" [✔]
  fails for: "1 +" [✔]
  fails for: "1 & 1" [✔]
  fails for: "1 + 1 & 1" [✔]
  fails for: "1 & 1 + 1" [✔]
  fails for: "(" [✔]
  fails for: "(1" [✔]
  fails for: "(1 + " [✔]
  fails for: "(1 + 2" [✔]
  fails for: "(1 + 2}" [✔]
  fails for: "66666" [✔]
  fails for: "-x" [✔]
  fails for: "let 1" [✔]
  fails for: "let x = 1 in " [✔]
  fails for: "let let = 1 in 1" [✔]
  fails for: "let x = 1 in in" [✔]
  fails for: "let x=1 inx" [✔]
  fails for: "letx = 1 in x" [✔]
  fails for: "let x ~ 1 in x" [✔]
  fails for: "let x = 1 & 2 in x" [✔]
  fails for: "let x = 1 inx" [✔]
  fails for: "let x = 1 in x +" [✔]
  fails for: "let x = 1 in x in" [✔]
  fails for: "let x = let x = 1 in x" [✔]
AST interpreter
  interprets: "1" [✔]
  interprets: "1 + 2 - 3 * 4 + 5 / 6 / 1 + 1" [✔]
  interprets: "1 + (2 - 3) * 4 + 5 / 6 / (1 + 1)" [✔]
  interprets: "1 + -1" [✔]
  interprets: "1 * -1" [✔]
  interprets: "let x = 4 in x + 1" [✔]
  interprets: "let x = 4 in let x = x + 1 in x + 2" [✔]
  interprets: "let x = 4 in let y = 5 in x + y" [✔]
  interprets: "let x = 4 in let y = 5 in x + let z = y in z * z" [✔]
  interprets: "let x = 4 in (let y = 5 in x + y) + let z = 2 in z * z" [✔]
  interprets: "let x = let y = 3 in y + y in x * 3" [✔]
  interprets: "let x = let y = 1 + let z = 2 in z * z in y + 1 in x * 3" [✔]
  fails for: "x" [✔]
  fails for: "let x = 4 in y + 1" [✔]
  fails for: "let x = y + 1 in x" [✔]
  fails for: "let x = x + 1 in x" [✔]
  fails for: "1/0" [✔]
  fails for: "-32768 / -1" [✔]
Compiler
  compiles: "1" [✔]
  compiles: "1 + 2 - 3 * 4 + 5 / 6 / 1 + 1" [✔]
  compiles: "1 + (2 - 3) * 4 + 5 / 6 / (1 + 1)" [✔]
  compiles: "let x = 4 in x + 1" [✔]
  compiles: "let x = 4 in let y = 5 in x + y" [✔]
  compiles: "let x = 4 in let x = x + 1 in x + 2" [✔]
  compiles: "let x = let y = 3 in y + y in x * 3" [✔]
  compiles: "let x = let y = 1 + let z = 2 in z * z in y + 1 in x * 3" [✔]
  compiles: "1/0" [✔]
  compiles: "-32768 / -1" [✔]
  fails for: "x" [✔]
  fails for: "let x = 4 in y + 1" [✔]
  fails for: "let x = y + 1 in x" [✔]
  fails for: "let x = x + 1 in x" [✔]
  fails for: "let x = 4 in let y = 1 in let z = 2 in y + x" [✔]
  fails for: "let x = 4 in let y = 5 in x + let z = y in z * z" [✔]
  fails for: "let a = 0 in let b = 0 in let c = 0 in let d = 0 in d" [✔]
  fails for greater sized expr [✔]
  fails for lesser sized expr [✔]

Finished in 0.0147 seconds
73 examples, 0 failures
Test suite specs: PASS
Awesome, it works! That’s it for this post. Let’s update our checklist:

Parsing arithmetic expressions to Abstract Syntax Trees (ASTs).

Unit testing for our parser.

Interpreting ASTs.

Compiling ASTs to bytecode.

Disassembling and decompiling bytecode.

Unit testing for our compiler.

Property-based testing for our compiler.

Efficiently executing bytecode in a virtual machine (VM).

Unit testing and property-based testing for our VM.

Benchmarking our code to see how the different passes perform.

All the while keeping an eye on performance.

In the next part, we write a virtual machine that runs our compiled bytecode, and do some benchmarking.

If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading!

There are VMs that execute hardware ISs instead of bytecode. Such VMs are also called Emulators because they emulate actual CPU hardware. Some examples are QEMU and video game console emulators.↩︎

VMs use virtual registers instead of actual CPU registers, which are often represented as a fixed size array of 1, 2, 4 or 8 byte elements.↩︎

I call them variables here but they do not actually vary. A better name is let bindings.↩︎

We could have used two separate opcodes here: OSwap and OPop. That would result in same final result when evaluating an expression, but we’d have to execute two instructions instead of one for Let expressions. Using a single OSwapPop instruction speeds up execution, not only because we reduce the number of instructions, but also because we don’t need to do a full swap, only a half swap is enough because we pop the stack anyway after the swap. This also shows how we can improve the performance of our VMs by inventing specific opcodes for particular operations.↩︎

Notice the use of strict Pairs here, for performance reasons.↩︎

I ran all benchmarks on an Apple M4 Pro 24GB machine against a 142MB file.↩︎

Used as Association List.↩︎

The decompiler is a bottom-up shift-reduce parser from the opcodes to the expression tree.↩︎

This post is part of the series: A Fast Bytecode VM for Arithmetic.

The Parser

The Compiler (you are here)

The Virtual Machine

If you liked this post, please leave a comment.
by Abhinav Sarkar (abhinav@abhinavsarkar.net) at August 24, 2025 12:00 AM

Data structure	Time (ms)	Incremental speedup	Overall speedup
List	4345	1x	1x
Seq	523	8.31x	8.31x
DList	486	1.08x	8.94x
BS Builder	370	1.31x	11.74x
Pre-allocated BS	54	6.85x	80.46x
Bytearray	52	1.02x	83.55x

Data structure	Time (ms)	Slowdown
`Data.HashMap.Strict.HashMap`	55	1.00x
`Data.List.List`⁷	63	1.14x
`Data.Map.Strict.Map`	71	1.29x
`Data.Trie.Trie`	80	1.45x
`Data.Vector.Hashtables.Dictionary`	104	1.89x
`Data.HashTable.IO.BasicHashTable`	312	5.67x

Control structure	Time (ms)	Slowdown
IO	57.4	1.00x
IO + IORef	65.0	1.13x
IO + ReaderT	60.9	1.06x
IO + StateT	65.6	1.14x
IO + ExceptT	65.9	1.15x
IO + ReaderT + ExceptT	107.1	1.87x
IO + StateT + ExceptT	383.9	6.69x
IO + StateT + ReaderT	687.5	11.98x
IO + StateT + ReaderT + ExceptT	702.0	12.23x
IO + CPS	78.2	1.36x
IO + DCPS	78.4	1.37x
IO + ContT	76.5	1.33x

Planet Haskell

November 22, 2025

GHC 9.12.3-rc2 is now available

wz1000 - 2025-11-22

November 20, 2025

What are symbolic macros?

Conventions and restrictions

Naming

Access to undeclared resources

Attributes

Positional arguments

Default values

Inheritance

Mutation

Configuration

Querying macros

November 18, 2025

November 17, 2025

Creating a Response

Serializing the Response

Adding Helpers

Version and Code

Serializing Headers

Writing the Networking Layer

Final Code

Conclusion

November 16, 2025

November 13, 2025

November 11, 2025

The Problem with Shared State

Data races

Mutexes

Composing Critical Sections

Deadlocks/Livelocks

Assessing the damage

Cleaning up the Chaos

Concurrency Patterns

A new (old) synchronization primitive

Data Races

Deadlock/Livelock

Composition

Smart Retries

Conclusion

November 10, 2025

Outlining our Parser

Helper Functions

Parsing a Method

Parsing the URI

Parsing the Version

Parsing Headers

Parsing the Body

Complete Parser Code

Conclusion

Introduction

Overview

Conventions

Formal Power Series

Linear Functionals

Properties

Evaluation Functional

Formal Derivative

Linear Operators

Evaluation Operator

Characterizing Linear Operators Induced from Formal Power Series

Polynomial Sequences

Recurrence Formulas

Transfer Formulas

Umbral Composition and Transfer Operators

Example: Chebyshev Polynomials

Summary

Call for collaboration: Disco web UI

Disco

The problem

Old solution: Replit

Solution criteria

Potential solutions

UI levels

November 08, 2025

A little background

Finding a starting point

Current state of `modus-catppuccin`

Using `ghc-debug` to investigate retainers

Culprit 1: lazy `unzip`

Culprit 3: lack of sharing in `iterate`