Planet Haskell

December 23, 2024

Michael Snoyman

A secure Bitcoin self custody strategy

Up until this year, my Bitcoin custody strategy was fairly straightforward, and likely familiar to other hodlers:

  • Buy a hardware wallet
  • Put the seed phrase on steel plates
  • Secure those steel plates somewhere on my property

But in October of last year, the situation changed. I live in Northern Israel, close to the Lebanese border. The past 14 months have involved a lot of rocket attacks, including destruction of multiple buildings in my home town. This brought into question how to properly secure my sats. Importantly, I needed to balance two competing goals:

  1. Resiliency of the saved secrets against destruction. In other words: make sure I didn't lose access to the wallet.
  2. Security against attackers trying to steal those secrets. In other words: make sure no one else got access to the wallet.

I put some time into designing a solution to these conflicting goals, and would like to share some thoughts for others looking to improve their BTC custody strategy. And if anyone has any recommendations for improvements, I'm all ears!

Goals

  • Self custody I didn't want to rely on an external custody company. Not your keys, not your coins.
  • Full access I always maintain full access to my funds, without relying on any external party.
  • Computer hack resilient If my computer systems are hacked, I will not lose access to or control of my funds (neither stolen nor lost).
  • Physical destruction resilient If my hardware device and steel plates are both destroyed (as well as anything else physically located in my home town), I can still recovery my funds.
  • Will survive me If I'm killed, I want my wife, children, or other family members to be able to recover and inherit my BTC.

Multisig

The heart of this protection mechanism is a multisig wallet. Unfortunately, interfaces for setting up multisig wallets are tricky. I'll walk through the basics and then come back to how to set it up.

The concept of a multisig is that your wallet is protected by multiple signers. Each signer can be any "normal" wallet, e.g. a software or hardware wallet. You choose a number of signers and a threshold of signers required to perform a transaction.

For example, a 2 of 2 multisig would mean that 2 wallets can sign transactions, and both of them need to sign to make a valid transaction. A 3 of 5 would mean 5 total signers, any 3 of them being needed to sign a transaction.

For my setup, I set up a 2 of 3 multisig, with the 3 signers being a software wallet, a hardware wallet, and SLIP39 wallet. Let's go through each of those, explain how they work, and then see how the solution addresses the goals.

Software wallet

I set up a software wallet and saved the seed phrase in a dedicated password manager account using Bitwarden. Bitwarden offers an emergency access feature, which essentially means a trusted person can be listed as an emergency contact and can recover your account. The process includes a waiting period, during which the account owner can reject the request.

Put another way: Bitwarden is offering a cryptographically secure, third party hosted, fully managed, user friendly dead-man switch. Exactly what I needed.

I added a select group of trusted people as the recoverers on the account. Otherwise, I keep the account securely locked down in Bitwarden and can use it for signing when necessary.

Let's see how this stacks up against the goals:

  • Self custody Check, no reliance on anyone else
  • Full access Check, I have access to the wallet at all times
  • Computer hack resilient Fail, if my system is hacked, I lose control of the wallet
  • Physical destruction resilient Check, Bitwarden lives beyond my machines
  • Will survive me Check thanks to the dead-man switch

Hardware wallet

Not much to say about the hardware wallet setup that I haven't said already. Let's do the goals:

  • Self custody Check, no reliance on anyone else
  • Full access Check, I have access to the wallet at all times
  • Computer hack resilient Check, the private keys never leave the hardware device
  • Physical destruction resilient Fail, the wallet and plates could easily be destroyed, and the plates could easily be stolen. (The wallet could be stolen too, but thanks to the PIN mechanism would theoretically be resistant to compromise. But that's not a theory I'd want to bet my wealth on.)
  • Will survive me Check, anyone can take my plates and recover the wallet

SLIP39

This one requires a bit of explanation. SLIP39 is a not-so-common standard for taking some data and splitting it up into a number of shards. You can define the threshold of shards necessary to reconstruct the original secret. This uses an algorithm called Shamir's Secret Sharing. (And yes, it is very similar in function to multisig, but implemented differently).

The idea here is that this wallet is controlled by a group of friends and family members. Without getting into my actual setup, I could choose 7 very trusted individuals from all over the world and tell them that, should I contact them and ask for them, they should send me their shards so I can reconstruct that third wallet. And to be especially morbid, they also know the identity of some backup people in the event of my death.

In any event, the idea is that if enough of these people agree to, they can reconstruct the third wallet. The assumption is that these are all trustworthy people. But even with trustworthy people, (1) I could be wrong about how trustworthy they are, or (2) they could be coerced or tricked. So let's see how these security mechanism stands up:

  • Self custody Fail, I'm totally reliant on others.
  • Full access Fail, by design I don't keep this wallet myself, so I must rely on others.
  • Computer hack resilient Check, the holders of these shards keep them in secure, offline storage.
  • Physical destruction resilient Check (sort of), since the probability of all copies being destroyed or stolen is negligible.
  • Will survive me Check, by design

Comparison against goals

We saw how each individual wallet stacked up against the goals. How about all of them together? Well, there are certainly some theoretical ways I could lose the funds, e.g. my hardware wallet and plates are destroyed and a majority of shard holders for the SLIP39 lost their shards. However, if you look through the check/fail lists, every category has at least two checks. Meaning: on all dimensions, if some catastrophe happens, at least two of the wallets should survive.

Now the caveats (I seem to like that word). I did a lot of research on this, and this is at least tangential to my actual field of expertise. But I'm not a dedicated security researcher, and can't really claim full, deep understanding of all these topics. So if I made any mistakes here, please let me know.

How-to guide

OK, so how do you actually get a system like this running? I'll give you my own step-by-step guide. Best case scenario for all this: download all the websites and programs mentioned onto a fresh Linux system install, disconnect the internet, run the programs and copy down any data as needed, and then wipe the system again. (Or, alternatively, do all the actions from a Live USB session.)

  1. Set up the SLIP39. You can use an online generator. Choose the number of bits of entropy (IMO 128bit is sufficient), choose the total shares and threshold, and then copy down the phrases.
  2. Generate the software wallet. You can use a sister site to the SLIP39 generator. Choose either 12 or 24 words, and write those words down. On a different, internet-connected computer, you can save those words into a Bitwarden account, and set it up with appropriate emergency access.
  3. Open up Electrum. (Other wallets, like Sparrow, probably work for this too, but I've only done it with Electrum.) The rest of this section will include a step-by-step guide through the Electrum steps. And yes, I took these screenshots on a Mac, but for a real setup use a Linux machine.

Set up a new wallet. Enter a name (doesn't matter what) and click next.

New wallet

Choose a multisig wallet and click next.

Multisig

Choose 3 cosigners and require 2 signatures.

Signer count

Now we're going to enter all three wallets. The first one will be your hardware device. Click next, then follow all the prompts to set it up.

Hardware

After a few screens (they'll be different based on your choice of hardware device), you'll be prompted to select a derivation path. Use native segwit and the standard derivation path.

segwit

This next screen was the single most complicated for me, simply because the terms were unclear. First, you'll see a Zpub string displayed as a "master public key," e.g.:

Zpub75J9cLwa3iX1zB2oiTdvGDf4EyHWN1ZYs5gVt6JSM9THA6XLUoZhA4iZwyruCKHpw8BFf54wbAK6XdgtMLa2TgbDcftdsietCuKQ6eDPyi6

You need to write this down. It's the same as an xpub, but for multisig wallets. This represents all the possible public keys for your hardware wallet. Putting together the three Zpub values will allow your software of choice to generate all the receiving and change addresses for your new wallet. You'll need all three, so don't lose them! But on their own, they cannot be used to access your funds. Therefore, treat them with "medium" security. Backing up in Bitwarden with your software wallet is a good idea, and potentially simply sending to some friends to back up just in case.

And that explanation brings us back to the three choices on the screen. You can choose to either enter a cosigner key, a cosigner seed, or use another hardware wallet. The difference between key and seed is that the former is public information only, whereas the latter is full signing power. Often, multisig wallets are set up by multiple different people, and so instead of sharing the seed with each other (a major security violation), they each generate a seed phrase and only share the key with each other.

However, given that you're setting up the wallet with access to all seed phrases, and you're doing it on an airgapped device, it's safe to enter the seed phrases directly. And I'd recommend it, to avoid the risk of generating the wrong master key from a seed. So go ahead and choose "enter cosigner seed" and click next.

Add cosigner 2

And now onto the second most confusing screen. I copied my seed phrase into this text box, but it won't let me continue!

Cannot continue

The trick is that Electrum, by default, uses its own concept of seed phrases. You need to click on "Options" and then choose BIP39, and then enter your seed phrase.

BIP39

Continue through the other screens until you're able to enter the final seed. This time, instead of choosing BIP39, choose SLIP39. You'll need to enter enough of the SLIP39 shards to meet the threshold.

SLIP39

And with that, you can continue through the rest of the screens, and you'll now have a fully operational multisig!

Addresses

Open up Electrum again on an internet-connected computer. This time, connect the hardware wallet as before, enter the BIP39 as before, but for the SLIP39, enter the master key instead of the SLIP39 seed phrase. This will ensure that no internet connected device ever has both the software wallet and SLIP39 at the same time. You should confirm that the addresses on the airgapped machine match the addresses on the internet connected device.

If so, you're ready for the final test. Send a small amount of funds into the first receiving address, and then use Electrum on the internet connected device to (1) confirm in the history that it arrived and (2) send it back to another address. You should be asked to sign with your hardware wallet.

If you made it this far, congratulations! You're the proud owner of a new 2of3 multisig wallet.

Conclusion

I hope the topic of death and war wasn't too terribly morbid for others. But these are important topics to address in our world of self custody. I hope others found this useful. And once again, if anyone has recommendations for improvements to this setup, please do let me know!

December 23, 2024 12:00 AM

December 22, 2024

Haskell Interlude

60: Tom Ellis

Tom Ellis works at Groq, using Haskell to compile AI models to specialized hardware.  In this episode, we talk about stability of both GHC and Haskell libraries, effects, and strictness, and the premise of functional programming: make invalid states and invalid *laziness* unrepresentable! 

by Haskell Podcast at December 22, 2024 06:00 PM

December 21, 2024

Philip Wadler

Please submit to Lambda Days

 


I'm part of the programme committee for Lambda Days, and I’m personally inviting you to submit your talk!

Lambda Days is all about celebrating the world of functional programming, and we’re eager to hear about your latest ideas, projects, and discoveries. Whether it’s functional languages, type theory, reactive programming, or something completely unexpected—we want to see it!

🎯 Submission Deadline: 9 February 2025
🎙ï¸� Never spoken before? No worries! We’re committed to supporting speakers from all backgrounds, especially those from underrepresented groups in tech.

Submit your talk and share your wisdom with the FP community.

👉 https://www.lambdadays.org/lambdadays2025#call-for-talks

by Philip Wadler (noreply@blogger.com) at December 21, 2024 07:56 PM

December 19, 2024

Tweag I/O

The Developer Experience Upgrade: From Create React App to Vite

We all know how it feels: staring at the terminal while your development server starts up, or watching your CI/CD pipeline crawl through yet another build process. For many React developers using Create React App (CRA), this waiting game has become an unwanted part of the daily routine. While CRA has been the go-to build tool for React applications for years, its aging architecture is increasingly becoming a bottleneck for developer productivity. Enter Vite: a modern build tool that’s not just an alternative to CRA, but a glimpse into the future of web development tooling. I’ll introduce both CRA and Vite, share how switching to Vite transformed our development workflow with concrete numbers and benchmarks to demonstrate the dramatic improvements in build times, startup speed, and overall developer experience.

Create React App: A Historical Context

Create React App played a very important role in making React what it is today. By introducing a single, clear, and recommended approach for creating React projects, it enabled developers to focus on building applications without worrying about the complexity of the underlying build tools.

However, like many mature and widely established tools, CRA has become stagnant over time by not keeping up with features provided by modern (meta-)frameworks like server-side rendering, routing, and data fetching. It also hasn’t taken advantage of web APIs to deliver fast applications by default.

Let’s dive into some of the most noticeable limitations.

Performance Issues

CRA’s performance issues stem from one major architectural factor: its reliance on Webpack as its bundler. Webpack, while powerful and flexible, has inherent performance limitations. Webpack processes everything through JavaScript, which is single-threaded by nature and slower at CPU-intensive tasks compared to lower-level languages like Go or Rust.

Here’s a simplified version of what happens every time you make a code change:

  1. CRA (using Webpack) needs to scan your entire project to understand how all your files are connected to build a dependency graph
  2. It then needs to transform all your modern JavaScript, TypeScript, or JSX code into a version that browsers can understand
  3. Finally, it bundles everything together into a single package that can be served to your browser

Rebuilding the app becomes increasingly time-consuming as the project grows. During development, Webpack’s incremental builds help mitigate performance challenges by only reprocessing modules that have changed, leveraging the dependency graph to minimize unnecessary work. However, the bundling step still needs to consider all files—both cached and reprocessed, to generate a complete bundle that can be served to the browser, which means Webpack must account for the entire codebase’s structure with each build.

Security Issues

When running npx create-react-app <project-directory>, after waiting for a while, a good amount of deprecated warnings (23 packages as of writing this) will be shown. At the end of the installation process, a message indicating 8 vulnerabilities (2 moderate, 6 high) will appear. This means that create-react-app relies on packages that have known critical security vulnerabilities.

Support Issues

The React team no longer recommends CRA for new projects, and they have stopped providing support for it. The last published version on npm was 3 years ago.

Instead, React’s official documentation now includes Vite in its recommendations for both starting new projects and adding React to existing projects.

While CRA served its purpose well in the past, its aging architecture, security vulnerabilities, and lack of modern features make it increasingly difficult to justify for new projects.

Introducing Vite

Vite is a build tool that is designed to be simpler, faster and more efficient for building modern web applications. It’s opinionated and comes with sensible defaults out of the box.

Vite was created by Evan You, author of Vue, in 2020 to solve the complexity, slowness and the heaviness of the JavaScript module bundling toolchain. Since then, Vite has become one of the most popular build tools for web development, with over 15 million downloads per week and a community that has rated it as the Most Loved Library Overall, No.1 Most Adopted (+30%) and No.2 Highest Retention (98%) in the State of JS 2024 Developer Survey.

In addition to streamlining the development of single-page applications, Vite can also power meta frameworks and has support for server-side rendering (SSR). Although its scope is broader than what CRA was meant for, it does a fantastic job replacing CRA.

Why Vite is Faster

Vite applies several modern web technologies to improve the development experience:

1. Native ES Modules (ESM)

During development mode, Vite serves source code over native ES modules basically letting the browser handle module loading directly and skipping the bundling step. With this approach, Vite only processes and sends code as it is imported by the browser, and conditionally imported modules are processed only if they’re actually needed on the current page. This means the dev server can start much faster, even in large projects.

2. Efficient Hot Module Replacement (HMR)

By serving source code as native ESM to the browser, thus skipping the bundling step, Vite’s HMR process can provide near-instant updates while preserving the application state. When code changes, Vite updates only the modified module and its direct dependencies, ensuring fast updates regardless of project size. Additionally, Vite leverages HTTP headers and caching to minimize server requests, speeding up page reloads when necessary. More information about what HMR is and how it works in Vite can be found in this exhaustive blog post.

3. Optimized Build Tooling

Even though ESM are now widely supported, dependencies can still be shipped as CommonJS or UMD. To leverage the benefits of ESM during development, Vite uses esbuild to pre-bundle dependencies when starting the dev server. This step involves transforming CommonJS/UMD to ES modules and converting dependencies with many internal modules into a single module, thus improving performance and reducing browser requests.

When it comes to production, Vite switches to Rollup to bundle the application. Bundling is still preferred over ESM when shipping to production, as it allows for more optimizations like tree-shaking, lazy-loading and chunk splitting.

While this dual-bundler approach leverages the strengths of each bundler, it’s important to note that it’s a trade-off that can potentially introduce subtle inconsistencies between development and production environments and adds to Vite’s complexity.

By leveraging modern web technologies like ESM and efficient build tools like esbuild and Rollup, Vite represents a significant leap forward in development tooling, offering speed and simplicity that CRA simply cannot match with the way it’s currently architected.

Practical Results

The Migration Process

The codebase we migrated from CRA to Vite had around 250 files and 30k lines of code. Built as a Single Page Application using React 18, it uses Zustand and React Context for state management, with Tailwind CSS and shadcn/ui and some Bootstrap legacy components.

Here is a high-level summary of the migration process as it applied to our project, which took roughly a day to complete. The main steps included:

  1. Removing CRA-related dependencies
  2. Installing Vite and its React plugin
  3. Moving index.html to the root directory
  4. Creating a Vite configuration file
  5. Adding a type declaration file
  6. Updating the npm scripts in package.json
  7. Adjusting tsconfig.json to align with Vite’s requirements

All steps are well documented in the Vite documentation and in several step-by-step guides available on the web.

Most challenges encountered were related to environment variables and path aliases, which were easily resolved using Vite’s documentation, and its vibrant community has produced extensive resources, guides, and solutions for even the most specialized setups.

Build Time

The build time for the project using Create React App (CRA) was 1 minute and 34 seconds. After migrating to Vite, the build time was reduced to 29.2 seconds, making it 3.2 times faster.

[Build time comparison between CRA and Vite showing 3.2x improvement]

This reduction in build time speeds up CI/CD cycles, enabling more frequent testing and deployment. This is crucial for our development workflow, where faster builds mean quicker turnaround times and fewer delays for other team members. It can also reduce the cost of running the build process.

Dev Server Startup Time

The speed at which the development server starts can greatly impact the development workflow, especially in large projects.

The development server startup times saw a remarkable improvement after migrating from Create React App (CRA) to Vite. With CRA, a cold start took 15.469 seconds, and a non-cold start was 6.241 seconds. Vite dramatically reduced these times, with a cold start at just 1.202 seconds—12.9 times faster—and a non-cold start at 598 milliseconds, 10.4 times faster. The graph below highlights these impressive gains.

Development server startup time comparison showing 12.9x improvement

This dramatic reduction in startup time is particularly valuable when working with multiple branches or when frequent server restarts are needed during development.

HMR Update Time

While both CRA and Vite perform well with Hot Module Replacement at our current project scale, there are notable differences in the developer experience. CRA’s Webpack-based HMR typically takes around 1 second to update—which might sound fast, but the difference becomes apparent when compared to Vite’s near-instantaneous updates.

This distinction becomes more pronounced as projects grow in size and complexity. More importantly, the immediate feedback from Vite’s HMR creates a noticeably smoother development experience, especially when designing features that require frequent code changes and UI testing cycles. The absence of even a small delay helps maintain a more fluid and enjoyable workflow.

Bundle Size

Another essential factor is the size of the final bundled application, which affects load times and overall performance.

Bundle size comparison between CRA and Vite showing 27.5% reduction in raw bundle size and 9.3% reduction in gzipped size

This represents a 27.5% reduction in raw bundle size and a 9.3% reduction in gzipped size. For end users, this means faster page loads, less data usage, and better performance, especially on mobile devices.

The data clearly illustrates that Vite’s improvements in build times, startup speed, and bundle size provide a significant and measurable upgrade to our development workflow.

The Hidden Advantage: Reduced Context Switching

One of the less obvious but valuable benefits of migrating to a faster environment like Vite is the reduction in context switching. In environments with slower build and start-up times, developers are more likely to engage in other tasks during these “idle” moments. Research on task interruptions shows that even brief context switches can introduce cognitive “reorientation” costs, increasing stress and reducing efficiency.

By reducing build and start-up times, Vite allows our team to maintain focus on their primary tasks. Developers are less likely to switch tasks and better able to stay within the “flow” of development, ultimately leading to a smoother, more focused workflow and, over time, less cognitive strain.

Beyond the measurable metrics, the real victory lies in how Vite’s speed helps developers maintain their focus and flow, leading to a more enjoyable and happy experience overall.

The Future of Vite is Bright

Vite is aiming to be a unified toolchain for the JavaScript ecosystem, and it is already showing great progress by introducing new tools like Rolldown and OXC.

Rolldown, Vite’s new bundler written in Rust, promises to be even faster than esbuild while maintaining full compatibility with the JavaScript ecosystem. It also unifies Vite’s bundling approach across development and production environments, solving the previously mentioned trade-off. Meanwhile, OXC provides a suite of high-performance tools including the fastest JavaScript parser, resolver, and TypeScript transformer available.

These innovations are part of Vite’s broader vision to create a more unified, efficient, and performant development experience that eliminates the traditional fragmentation in JavaScript tooling.

Early benchmarks show impressive performance improvements:

  • OXC Parser is 3x faster than SWC
  • OXC Resolver is 28x faster than enhanced-resolve
  • OXC TypeScript transformer is 4x faster than SWC
  • OXLint is 50-100x faster than ESLint

With innovations like Rolldown and OXC on the horizon, Vite is not just solving today’s development challenges but is actively shaping the future of web development tooling.

Conclusion

Migrating from Create React App to Vite proved to be a straightforward process that delivered substantial benefits across multiple dimensions. The quantifiable improvements in terms of build time, bundle size and development server startup time were impressive and by themselves justify the migration effort.

However, the true value extends beyond these measurable metrics. The near-instant Hot Module Replacement, reduced context switching, and overall smoother development workflow have significantly enhanced our team’s development experience. Developers spend less time waiting and more time in their creative flow, leading to better focus and increased productivity.

The migration also positions our project for the future, as Vite continues to evolve with promising innovations like Rolldown and OXC. Given the impressive results and the relatively straightforward migration process, the switch from CRA to Vite stands as a clear win for both our development team and our application’s performance.

December 19, 2024 12:00 AM

December 18, 2024

Michael Snoyman

Normal People Shouldn't Invest

The world we live in today is inflationary. Through the constant increase in the money supply by governments around the world, the purchasing power of any dollars (or other government money) sitting in your wallet or bank account will go down over time. To simplify massively, this leaves people with three choices:

  1. Keep your money in fiat currencies and earn a bit of interest. You’ll still lose purchasing power over time, because inflation virtually always beats interest, but you’ll lose it more slowly.
  2. Try to beat inflation by investing in the stock market and other risk-on investments.
  3. Recognize that the game is slanted against you, don’t bother saving or investing, and spend all your money today.

(Side note: if you’re reading this and screaming at your screen that there’s a much better option than any of these, I’ll get there, don’t worry.)

High living and melting ice cubes

Option 3 is what we’d call “high time preference.” It means you value the consumption you can have today over the potential savings for the future. In an inflationary environment, this is unfortunately a very logical stance to take. Your money is worth more today than it will ever be later. May as well live it up while you can. Or as Milton Friedman put it, engage in high living.

But let’s ignore that option for the moment, and pursue some kind of low time preference approach. Despite the downsides, we want to hold onto our wealth for the future. The first option, saving in fiat, would work with things like checking accounts, savings accounts, Certificates of Deposit (CDs), government bonds, and perhaps corporate bonds from highly rated companies. There’s little to no risk in those of losing your original balance or the interest (thanks to FDIC protection, a horrible concept I may dive into another time). And the downside is also well understood: you’re still going to lose wealth over time.

Or, to quote James from InvestAnswers, you can hold onto some melting ice cubes. But with sufficient interest, they’ll melt a little bit slower.

The investment option

With that option sitting on the table, many people end up falling into the investment bucket. If they’re more risk-averse, it will probably be a blend of both risk-on stock investment and risk-off fiat investment. But ultimately, they’re left with some amount of money that they want to put into a risk-on investment. The only reason they’re doing that is on the hopes that between price movements and dividends, the value of their investment will grow faster than anything else they can choose.

You may be bothered by my phrasing. “The only reason.” Of course that’s the only reason! We only put money into investments in order to make more money. What other possible reason exists?

Well, the answer is that while we invest in order to make money, that’s not the only reason. That would be like saying I started a tech consulting company to make money. Yes, that’s a true reason. But the purpose of the company is to meet a need in the market: providing consulting services. Like every economic activity, starting a company has a dual purpose: making a profit, but by providing actual value.

So what actual value is generated for the world when I choose to invest in a stock? Let’s rewind to real investment, and then we’ll see how modern investment differs.

Michael (Midas) Mulligan

Let’s talk about a fictional character, Michael Mulligan, aka Midas. In Atlas Shrugged, he’s the greatest banker in the country. He created a small fortune for himself. Then, using that money, he very selectively invested in the most promising ventures. He put his own wealth on the line because he believed each of those ventures had a high likelihood to succeed.

He wasn’t some idiot who jumps on his CNBC show to spout nonsense about which stocks will go up and down. He wasn’t a venture capitalist who took money from others and put it into the highest-volatility companies hoping that one of them would 100x and cover the massive losses on the others. He wasn’t a hedge fund manager who bets everything on financial instruments so complex he can’t understand them, knowing that if it crumbles, the US government will bail him out.

And he wasn’t a normal person sitting in his house, staring at candlestick charts, hoping he can outsmart every other person staring at those same charts by buying in and selling out before everyone else.

No. Midas Mulligan represented the true gift, skill, art, and value of real investment. In the story, we find out that he was the investor who got Hank Rearden off the ground. Hank Rearden uses that investment to start a steel empire that drives the country, and ultimately that powers his ability to invest huge amounts of his new wealth into research into an even better metal that has the promise to reshape the world.

That’s what investment is. And that’s why investment has such a high reward associated with it. It’s a massive gamble that may produce untold value for society. The effort necessary to determine the right investments is high. It’s only right that Midas Mulligan be well compensated for his work. And by compensating him well, he’ll have even more money in the future to invest in future projects, creating a positive feedback cycle of innovation and improvements.

Michael (Crappy Investor) Snoyman

I am not Midas Mulligan. I don’t have the gift to choose the winners in newly emerging markets. I can’t sit down with entrepreneurs and guide them to the best way to make their ideas thrive. And I certainly don’t have the money available to make such massive investments, much less the psychological profile to handle taking huge risks with my money like that.

I’m a low time preference individual by my upbringing, plus I am very risk-averse. I spent most of my adult life putting money into either the house I live in or into risk-off assets. I discuss this background more in a blog post on my current investment patterns. During the COVID-19 money printing, I got spooked about this, realizing that the melting ice cubes were melting far faster than I had ever anticipated. It shocked me out of my risk-averse nature, realizing that if I didn’t take a more risky stance with my money, ultimately I’d lose it all.

So like so many others, I diversified. I put money into stock indices. I realized the stock market was risky, so I diversified further. I put money into various cryptocurrencies too. I learned to read candlestick charts. I made some money. I felt pretty good.

I started feeling more confident overall, and started trying to predict the market. I fixated on this. I was nervous all the time, because my entire wealth was on the line constantly.

And it gets even worse. In economics, we have the concept of an opportunity cost. If I invest in company ABC and it goes up 35% in a month, I’m a genius investor, right? Well, if company DEF went up 40% that month, I can just as easily kick myself for losing out on the better opportunity. In other words, once you’re in this system, it’s a constant rat race to keep finding the best possible returns, not simply being happy with keeping your purchasing power.

Was I making the world a better place? No, not at all. I was just another poor soul trying to do a better job of entering and exiting a trade than the next guy. It was little more than riding a casino.

And yes, I ultimately lost a massive amount of money through this.

Normal people shouldn’t invest

Which brings me to the title of this post. I don’t believe normal people should be subjected to this kind of investment. It’s an extra skill to learn. It’s extra life stress. It’s extra risk. And it doesn’t improve the world. You’re being rewarded—if you succeed at all—simply for guessing better than others.

(Someone out there will probably argue efficient markets and that having everyone trading stocks like this does in fact add some efficiencies to capital allocation. I’ll give you a grudging nod of agreement that this is somewhat true, but not sufficiently enough to justify the returns people anticipate from making “good” gambles.)

The only reason most people ever consider this is because they feel forced into it, otherwise they’ll simply be sitting on their melting ice cubes. But once they get into the game, between risk, stress, and time investment, they’re lives will often get worse.

One solution is to not be greedy. Invest in stock market indices, don’t pay attention to day-to-day price, and assume that the stock market will continue to go up over time, hopefully beating inflation. And if that’s the approach you’re taking, I can honestly say I think you’re doing better than most. But it’s not the solution I’ve landed on.

Option 4: deflation

The problem with all of our options is that they are built in a broken world. The fiat/inflationary world is a rigged game. You’re trying to walk up an escalator that’s going down. If you try hard enough, you’ll make progress. But the system is against you. This is inherent to the design. The inflation in our system is so that central planners have the undeserved ability to appropriate productive capacity in the economy to do whatever they want with it. They can use it to fund government welfare programs, perform scientific research, pay off their buddies, and fight wars. Whatever they want.

If you take away their ability to print money, your purchasing power will not go down over time. In fact, the opposite will happen. More people will produce more goods. Innovators will create technological breakthroughs that will create better, cheaper products. Your same amount of money will buy more in the future, not less. A low time preference individual will be rewarded. By setting aside money today, you’re allowing productive capacity today to be invested into building a stronger engine for tomorrow. And you’ll be rewarded by being able to claim a portion of that larger productive pie.

And to reiterate: in today’s inflationary world, if you defer consumption and let production build a better economy, you are punished with reduced purchasing power.

So after burying the lead so much, my option 4 is simple: Bitcoin. It’s not an act of greed, trying to grab the most quickly appreciating asset. It’s about putting my money into a system that properly rewards low time preference and saving. It’s admitting that I have no true skill or gift to the world through my investment capabilities. It’s recognizing that I care more about destressing my life and focusing on things I’m actually good at than trying to optimize an investment portfolio.

Can Bitcoin go to 0? Certainly, though year by year that likelihood is becoming far less likely. Can Bitcoin have major crashes in its price? Absolutely, but I’m saving for the long haul, not for a quick buck.

I’m hoping for a world where deflation takes over. Where normal people don’t need to add yet another stress and risk to their life, and saving money is the most natural, safest, and highest-reward activity we can all do.

Further reading

December 18, 2024 12:00 AM

December 17, 2024

Michael Snoyman

Hello Nostr

This blog post is in the style of my previous blog post on Matrix. I'm reviewing and sharing onboarding experience with a new technology. I'm sharing in the hopes that it will help others become aware of this new technology, understand what it can do, and if people are intrigued, have a more pleasant onboarding experience. Just keep in mind: I'm in no way an expert on this. PRs welcome to improve the content here!

What is Nostr? Why Nostr?

I’d describe Nostr as decentralized social media. It’s a protocol for people to identify themselves via public key cryptography (thus no central identity service), publish various kinds of information, access through any compatible client, and interact with anyone else. At its simplest, it’s a Twitter/X clone, but it offers much more functionality than that (though I’ve barely scratched the surface).

Nostr has a high overlap with the Bitcoin ecosystem, including built-in micropayments (zaps) via the Lightning Network, an instantaneous peer-to-peer payment layer built on top of Bitcoin.

I'll start off by saying: right now, Nostr's user experience is not on a par with centralized services like X. But I can see a lot of promise. The design of the protocol encourages widespread innovation, as demonstrated by the plethora of clients and other tools to access the protocol. Decentralized/federated services are more difficult to make work flawlessly, but the advantages in terms of freedom of expression, self-custody of your data, censorship resistance, and ability to build more featureful tools on top of it make me excited.

I was skeptical (to say the least) about the idea of micropayments built into social media. But I'm beginning to see the appeal. Firstly, getting away from an advertiser-driven business model fixes the age old problem of "if you're not paying for a service, you're the product." But I see a deeper social concept here too. I intend to blog more in the future on the topic of non-monetary competition and compensation. But in short, in the context of social media: every social network ends up making its own version of imaginary internet points (karma, moderator privileges, whatever you want to call it). Non-monetary compensation has a lot of downsides, which I won't explore here. Instead, making the credit system based on money with real-world value has the promise to vastly improve social media interactions.

Did that intrigue you enough to want to give this a shot? Awesome! Let me give you an overview of the protocol, and then we'll dive into my recommendation on getting started.

Protocol overview

The basics of the protocol can be broken down into:

  • Relays
  • Events
  • Identities
  • Clients

As a decentralized protocol, Nostr relies on public key cryptography for identities. That means, when you interact on the network, you'll use a private key (represented as an nsec value) to sign your messages, and will be identified by your public key (represented as an npub value). Anyone familiar with Bitcoin or cryptocurrency will be familiar with the keys vs wallet divide, and it lines up here too. Right off the bat, we see the first major advantage of Nostr: no one controls your identity except you.

Clients are how a user will interact with the protocol. You'll provide your identity to the client in one of a few ways:

  • Directly entering your nsec. This is generally frowned upon since it opens you up for exploit, though most mobile apps work by direct nsec entry.
  • Getting a view-only experience in clients that support it by entering your npub.
  • Using a signing tool to perform the signing on behalf of the client without giving away your private keys to everyone. (This matches most web3 interactions that rely on a wallet browser extension.)

Events are a general-purpose concept, and are the heart of Nostr interaction. Events can represent a note (similar to a Tweet), articles, likes, reposts, profile updates, and more. Anything you do on the protocol involves creating and signing an event. This is also the heart of Nostr's extensibility: new events can be created to support new kinds of interactions.

Finally there are relays. Relays are the servers of the Nostr world, and are where you broadcast your events to. Clients will typically configure multiple relays, broadcast your events to those relays, and query relays for relevant events for you (such as notes from people you follow, likes on your posts, and more).

Getting started

This is where the suboptimal experience really exists for Nostr. It took me a few days to come up with a setup that worked reliably. I'm going to share what worked best for me, but keep in mind that there are many other options. I'm a novice; other guides may give different recommendations and you may not like my selection of tools. My best recommendation: don't end up in shell shock like I did. Set up any kind of a Nostr profile, introduce yourself with the #introductions hashtag, and ask for help. I've found the community unbelievably welcoming.

Alright, so here are the different pieces you're going to need for a full experience:

  • Browser extension for signing
  • Web client
  • Mobile client
  • Lightning wallet
  • A Nostr address

I'm going to give you a set of steps that I hope both provides easy onboarding while still leaving you with the ability to more directly control your Nostr experience in the future.

Lightning wallet: coinos

First, you're going to set up a Lightning wallet. There are a lot of options here, and there are a lot of considerations between ease-of-use, self-custody, and compatibility with other protocols. I tried a bunch. My recommendation: use coinos. It's a custodial wallet (meaning: they control your money and you're trusting them), so don't put any large sums into it. But coinos is really easy to use, and supports Nostr Wallet Connect (NWC). After you set up your account, click on the gear icon, and then click on "Reveal Connection String." You'll want to use that when setting up your clients. Also, coinos gives you a Lightning address, which will be <username>@coinos.io. You'll need that for setting up your profile on Nostr.

Web client: YakiHonne

I tried a bunch of web clients and had problems with almost all of them. I later realized most of my problems seemed to be caused by incorrectly set relays, which we'll discuss below. In any event, I ultimately chose YakiHonne. It also has mobile iOS and Android clients, so you can have a consistent experience. (I also used the Damus iOS client, which is also wonderful.)

Go to the homepage, click on the Login button in the bottom-left, and then choose "Create an account." You can add a profile picture, banner image, choose a display name, and add a short description. In the signup wizard, you'll see an option to let YakiHonne set up a wallet (meaning a Lightning wallet) for you. I chose not to rely on this and used coinos instead to keep more flexibility for swapping clients in the future. If you want to simplify, however, you can just use the built-in wallet.

Before going any further, make sure you back up your nsec secret key!!! Click on your username in the bottom-left, then settings, and then "Your keys." I recommend saving both your nsec and npub values in your password manager.

YakiHonne keys

No, this isn't my actual set of keys, this was a test profile I set up while writing this post.

Within that settings page, click on "wallets," then click on the plus sign next to add wallets, and choose "Nostr wallet connect." Paste the NWC string you got from coinos, and you'll be able to zap people money!

Next, go back to settings and choose "Edit Profile." Under "Lightning address," put your <username>@coinos.io address. Now you'll also be able to receive zaps from others.

Another field you'll notice on the profile is NIP-05. That's your Nostr address. Let's talk about getting that set up.

NIP-05 Nostr address

Remembering a massive npub address is a pain. Instead, you'll want to set up a NIP-05 address. (NIP stands for Nostr Implementation Possibilities, you can see NIP-05 on GitHub.) There are many services—both paid and free—to get a NIP-05 address. You can see a set of services on AwesomeNostr. Personally, I decided to set up an identifier on my own domain. You can see my live nostr.json file, which at time of writing supports:

  • michael@snoyman.com and an alias snoyberg@snoyman.com
  • The special _@snoyman.com, which actually means "the entire domain itself"
  • And an identifier for my wife as well, miriam@snoyman.com

If you host this file yourself, keep in mind these two requirements:

  • You cannot have any redirects at that URL! If your identifier is name@domain, the URL https://domain/.well-known/nostr.json?name=<name> must resolve directly to this file.
  • You need to set CORS headers appropriately to allow for web client access, specifically the response header access-control-allow-origin: *.

Once you have that set up, add your Nostr address to your profile on YakiHonne. Note that you'll need the hex version of your npub, which you can generate by using the Nostr Army Knife:

Nostr Army Knife

BONUS I decided to also set up my own Lightning wallet address, by rehosting the Lightning config file from https://coinos.io/.well-known/lnurlp/snoyberg on my domain at https://snoyman.com/.well-known/lnurlp/michael.

Signer

As far as I can tell, Alby provides the most popular Nostr signing browser extension. The only problem I had with it was confusion about all the different things it does. Alby provides a custodial lightning wallet via Alby Hub, plus a mobile Alby Go app for accessing it, plus a browser extension for Nostr signing, and that browser extension supports using both the Alby Hub wallet and some other lightning wallets. I did get it all to work together, and it's a pleasant experience.

nos2x

However, to keep things a bit more simple and single-task focused, I'll recommend trying out the nos2x extension first. It's not pretty, but handles the signer piece very well. Install the extension, enter the nsec you got from YokiHonne, click save, and you're good to go. If you go to another Nostr client, like noStrudel, you should be able to "sign in with extension."

You may also notice that there's any entry area for "preferred relays." We'll discuss relays next. Feel free to come back to this step and add those relays. (And, after you've done that, you can also use a nostr.json generator to help you self-host your NIP-05 address if you're so inclined.)

Final note: once you've done the initial setup, it's not clear how to get back to the nos2x settings page. Right-click the extension, click manage extension, and then choose "extension options." At least those were the steps in Brave; it may be slightly different in other browsers.

Relays

This has been my biggest pain point with Nostr so far. Everything you do with Nostr needs to be sent to relays or received from relays. You want to have a consistent and relatively broad set of relays to make sure your view of the world is consistent. If you don't, you'll likely end up with things like mismatched profiles across relays, messages that seem to disappear, and more. This was probably my biggest stumbling block when first working with Nostr.

There seem to be three common ways to set the list of relays:

  • Manually entering the relays in the client's settings.
  • Getting the list of relays from the signer extension (covered by NIP-07).
  • Getting the list of relays from your NIP-05 file.

Unfortunately, it looks like most clients don't support the latter two methods. So unfortunately, any time you start using a new client, you should check the relay list and manually sync it up with a list of relays you maintain.

You can look at my nostr.json file for my own list of relays. One relay in particular I was recommended to use is wss://hist.nostr.land. This relay will keep track of your profile and follow list updates. As I mentioned, it's easy to accidentally partially-override your profile information through inconsistent relay lists, and apparently lots of new users (myself included) end up doing this. If you go to hist.nostr.land you can sign in, find your historical updates, and restore old versions.

Mobile

You're now set up on your web experience. For mobile, download any mobile app and set it up similarly to what I described for web. The major difference will be that you'll likely be entering your nsec directly into the mobile app.

I've used both Damus and YakiHonne. I had better luck with YakiHonne for getting zaps working reliably, but that may simply be because I'd tried Damus before I'd gotten set up with coinos before. I'll probably try out Damus some more soon.

Note on Damus: I had trouble initially with sending Zaps on Damus, but apparently that's because of Apple rules. You can enable Zaps by visiting this site on your device: https://zap.army/. Thanks William Cassarin for the guidance and the great app.

Introductions

You should now be fully set up to start interacting on Nostr! As a final step, I recommend you start off by sending an introduction note. This is a short note telling the world a bit about yourself, with the #introductions hashtag. For comparison, here's my introduction note (or a Nostr-native URL).

And in addition, feel free to @ me in a note as well, I'd love to meet other people on Nostr who joined up after reading this post. My identifier is michael@snoyman.com. You can also check out my profile page on njump, which is a great service to become acquainted with.

And now that you're on Nostr, let me share my experiences with the platform so far.

My experience

I'm definitely planning to continue using Nostr. The community has a different feel to my other major social media hub, X, which isn't surprising. There's a lot more discussion of Bitcoin and economics, which I love. There's also, at least subjectively, more of a sense of having fun versus X. I described it as joyscrolling versus doomscrolling.

Nostr is a free speech haven. It's quite literally impossible to fully silence someone. People can theoretically be banned from specific relays, but a banned user could always just use other relays are continue to create new keys. There's no KYC process to stop them. I've only found one truly vile account so far, and it was easy enough to just ignore. This fits very well with my own personal ethos. I'd rather people have a public forum to express any opinion, especially the opinions I most strongly disagree with, including calls to violence. I believe the world is better for allowing these opinions to be shared, debated, and (hopefully) exposed as vapid.

The process of zapping is surprisingly engaging. The amount of money people send around isn't much. The most common zap amount is 21 satoshis, which at the current price of Bitcoin is just about 2 US cents. Unless you become massively popular, you're not going to retire on zaps. But it's far more meaningful to receive a zap than a like; it means someone parted with something of actual value because you made their day just a little bit better. And likewise, zapping someone else has the same feeling. It's also possible to tip providers of clients and other tools, which is a fundamental shift from the advertiser-driven web of today.

I'd love to hear from others about their own experiences! Please reach out with your own findings. Hopefully we'll all be able to push social media into a more open, healthy, and fun direction.

December 17, 2024 12:00 AM

December 16, 2024

GHC Developer Blog

GHC 9.12.1 is now available

GHC 9.12.1 is now available

Zubin Duggal - 2024-12-16

The GHC developers are very pleased to announce the release of GHC 9.12.1. Binary distributions, source distributions, and documentation are available at downloads.haskell.org.

We hope to have this release available via ghcup shortly.

GHC 9.12 will bring a number of new features and improvements, including:

  • The new language extension OrPatterns allowing you to combine multiple pattern clauses into one.

  • The MultilineStrings language extension to allow you to more easily write strings spanning multiple lines in your source code.

  • Improvements to the OverloadedRecordDot extension, allowing the built-in HasField class to be used for records with fields of non lifted representations.

  • The NamedDefaults language extension has been introduced allowing you to define defaults for typeclasses other than Num.

  • More deterministic object code output, controlled by the -fobject-determinism flag, which improves determinism of builds a lot (though does not fully do so) at the cost of some compiler performance (1-2%). See #12935 for the details

  • GHC now accepts type syntax in expressions as part of GHC Proposal #281.

  • The WASM backend now has support for TemplateHaskell.

  • Experimental support for the RISC-V platform with the native code generator.

  • … and many more

A full accounting of changes can be found in the release notes. As always, GHC’s release status, including planned future releases, can be found on the GHC Wiki status.

We would like to thank GitHub, IOG, the Zw3rk stake pool, Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, the Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprise this release.

As always, do give this release a try and open a ticket if you see anything amiss.

by ghc-devs at December 16, 2024 12:00 AM

December 13, 2024

Well-Typed.Com

GHC activities report: September–November 2024

This is the twenty-fifth edition of our GHC activities report, which describes the work Well-Typed are doing on GHC, Cabal, HLS and other parts of the core Haskell toolchain. The current edition covers roughly the months of September to November 2024. You can find the previous editions collected under the ghc-activities-report tag.

Sponsorship

We are delighted to offer Haskell Ecosystem Support Packages to provide commercial users with access to Well-Typed’s experts, while investing in the Haskell community and its technical ecosystem. Clients will both fund the work described in this report and support the Haskell Foundation. If your company is using Haskell, read more about our offer, or get in touch with us today, so we can help you get the most out of the toolchain. We need more funding to continue our essential maintenance work!

Many thanks to our existing sponsors who make this work possible: Anduril and Juspay. In addition, we are grateful to Mercury for funding specific work on improved performance for developer tools on large codebases.

Team

The GHC team at Well-Typed currently consists of Andreas Klebinger, Ben Gamari, Matthew Pickering, Rodrigo Mesquita, Sam Derbyshire and Zubin Duggal. Adam Gundry acts as Secretary to the GHC Steering Committee. Cabal maintenance is undertaken by Mikolaj Konarski, and HLS maintenance by Hannes Siebenhandl and Zubin Duggal. In addition, many others within Well-Typed are contributing to GHC more occasionally.

GHC Releases

Cabal

Cabal-3.14.0.0 was released in September, adding initial support for the new hooks build-type we implemented to replace custom setup scripts, as part of our work for the Sovereign Tech Fund on Cabal long-term maintainability. Corresponding new versions of cabal-install and the release of the Cabal-hooks library are due soon, at which point it will become easier for users to explore replacing their custom setup scripts with the hooks feature. Related to this effort:

  • Sam amended the Cabal-hooks version to match the Cabal library version (#10579).

  • Rodrigo made progress on Sam’s work to simplify the way cabal-install uses the Cabal library to build packages. This will make the code easier to work with in the future, and should improve build performance (#9871).

  • Rodrigo made cabal-install invoke git clone concurrently when downloading from git repositories for source-repository-package stanzas or cabal get, and switched it to use shallow clones by default (#10254). This speeds up the cloning step significantly.

  • Rodrigo’s work on private dependencies (#9743) is slowly making progress, thanks to recent work by Kristen Kozak.

  • Rodrigo and Matthew fixed various minor Cabal bugs, improved documentation and future-proofed against future core libraries changes (#10311, #10404, #10415 and #10433).

HLS

  • Hannes finished off and merged support for a new “jump to instance definition” feature in HLS (#4392), which will make it easier for users to understand which typeclass instance is in use in a particular expression.

GHC

Exception backtraces

Rodrigo worked on improving several facets of the exception backtraces story:

  • Improved the rendering of uncaught exceptions so the default output is much clearer and easier to understand (CLC proposal #285, !13301). This included reformatting the output, reducing duplication, avoiding exposing internal implementation details in the call stacks, and reducing duplication.

  • Changed functions such as catch to propagate the original cause if another exception is subsequently thrown (CLC proposal #202.

  • Landed a patch by Ben so that the HasCallStack-based backtrace for an error call is more informative (!12620, #24807).

Overall, exception backtraces will be much more useful in GHC 9.12 and later, making it easier to debug Haskell applications.

Frontend

  • Andreas added new primops, is[Mutable]ByteArrayWeaklyPinned#, which allow checking whether a bytearray can be moved by the RTS, as per CLC proposal #283.

  • Sam fixed a GHC panic involving out-of scope pattern synonyms (#25056, !13092).

  • Sam augmented the -fdiagnostics-as-json output to include the reason an error or warning was emitted (!13577, #25403).

  • Matthew and Rodrigo deprecated the unused -Wcompat-unqualified-imports warning (!12755, #24904, !13349, #25330).

  • Ben improved the parsing and parser errors for sizes in GHC RTS flags (!12384, #20201).

  • Matthew fixed a bug in the interaction of -working-dir and foreign files (!13196, #25150).

  • Zubin bumped the Haddock binary interface version to 46, to improve errors when there are mismatched interface files (!13342).

SIMD in the NCG backend

  • Sam and Andreas teamed up to finish the mega-MR adding SIMD support to GHC’s X86 native code generator backend (!12860, and see our previous report for more background). This also fixed critical correctness bugs that affected SIMD support in the existing LLVM backend, such as #25062 and #25169.

  • Sam followed this up with user’s guide documentation for the feature (!13880) and a couple of additional fixes for bugs that have been reported since (!13561, !13612).

LLVM backend

  • Sam implemented several fixes relating to the LLVM backend, in collaboration with GHC contributor @aratamizuki:

    • fix bugs involving fltused to ensure that GHC can use the LLVM backend on Windows once more (#22487, !13183),
    • use +sse4.2 instead of +sse42 (#25019),
    • make SSE4.2 imply +popcnt (#25353).
  • Matthew bumped the LLVM upper bound to allow GHC to use LLVM 19 (!13311, #25295).

RISC-V backend

  • Andreas added support for floating-point min/max operations in the RISC-V NCG backend (!13325)

  • Matthew fixed some issues to do with the fact that the RISC-V backend does not yet support SIMD vectors (!13327, #25314, #13327).

Object code determinism

  • Rodrigo merged !12680, which goes 95% of the way towards ensuring GHC produces fully deterministic object code (#12935).

  • Rodrigo made the unique generation used by the LLVM backend deterministic (!13307, #25274), thus making GHC 96% object-code deterministic.

  • Rodrigo ensured that re-exports did not spoil determinism of interface files in !13316 (#25304).

Compiler performance

  • Matthew, Rodrigo and Adam published a proposal for Explicit Level Imports. The proposed language feature will allow users of Template Haskell to communicate more precise dependencies for quotes and splices, which can unlock significant compile-time performance improvements and is a step towards better cross-compilation support.

  • Rodrigo improved the performance of module reachability queries (!13593), which can significantly reduce compile times and memory usage for projects with very large numbers of modules.

  • Andreas introduced a new flag, -fmax-forced-spec-args, to control the maximum size of specialised functions introduced when using the SPEC keyword (!13184, #25197). This avoids a potential compile-time performance cliff caused by specialisations with excessively large numbers of arguments.

  • Matthew greatly reduced the memory footprint of linking with the Javascript backend by making some parts of the compiler more lazy (!13346).

  • Zubin’s proposal to address libc compatibility issues when using semaphores for build parallelism has been making progress through the proposal process.

Runtime system

  • Ben fixed the encoding of breakpoint instructions in the RTS to account for a recent addition of support for inlining breakpoints (!13423, #25374).

  • Ben tightened up the documentation and invariants in the bytecode interpreter (!13565).

  • Ben allowed GNU-style non-executable stack notes to be used on FreeBSD (!13587, #25475).

  • Ben fixed an incorrect EINTR check in the timerfd ticker (!13588, #25477).

  • Zubin ensured that any new cost centres added by freshly-loaded objects are correctly included in the eventlog (!13114, #24148).

  • Ben increased the gen_workspace alignment in the RTS from 64 bytes to 128 bytes in order to prevent false sharing on Apple’s ARMv8 implementation, which uses a cache-line size of 128 bytes (!13594, #25459).

  • Ben removed some incorrect platform-dependent pointer casts in the RTS (!13597).

  • Zubin fixed a segfault when using the non-moving GC with profiling (!13271, #25232).

  • Ben fixed a stack overrun error with i386 adjustors (!13599, #25485).

  • Ben introduced a convenience printIPE debugging function for printing info-provenance table entries (!13614).

  • Andreas fixed a crash that could happen if an exception and the compacting GC aligned in specific ways (#24791, !13640).

Documentation

  • Ben fleshed out missing documentation of the eventlog format in the user’s guide (!13398, #25296).

  • Ben documented the :where GHCi command (!13399, #24509).

  • Andreas clarified the documentation of fexpose-overloaded-unfoldings (!13286, #24844).

  • Ben documented that GHC coalesces adjacent sub-word size fields of data constructors (!13397).

  • Ben improved the documentation of equality constraints to mention which language extensions are required to use them (!12395, #24127).

Codebase improvements

  • Ben fixed a few warnings in the runtime system, including some FreeBSD specific warnings (!13586).

  • Ben fixed (!13394, #25362) some incomplete pattern matches in GHC.Internal.IO.Windows.Handle that came to light in !13308, a refactor of the desugarer. Andreas also chipped in, squashing some warnings in ghc-heap (!13510).

  • Matthew refactored the partitionByWorkerSize function to avoid spurious pattern-match warning bugs when compiling with -g3 (!13359, #25338).

  • Matthew removed the hs-boot file for Language.Haskell.Syntax.ImpExp and introduced one for GHC.Hs.Doc, which better reflects the intended modular hierarchy of the modules (!13406).

GHC API

  • We are pleased to see that the Haskell Foundation and Tweag are resuming efforts aimed at defining a stable API for GHC.

  • Ben added lookupTHName, a more convenient way to look up a name in a GHC plugin that re-uses Template Haskell lookup functions (!12432, #24741).

  • Zubin made sure that the driverPlugin is correctly run for static plugins (!13199, #25217).

Libraries

  • Andreas allowed unknown FD device types in the setNonBlockingMode function (!13204, #25199), as per CLC proposal #282. This fixes a regression in the hinotify package on GHC 9.10.

  • Andreas made sure that all primops are re-exported in the ghc-experimental package via the module GHC.PrimOps (!13245), and changed the versioning scheme of ghc-experimental to follow GHC versions (!13344, #25289).

  • Ben fixed a performance regression in throw by judicious insertion of noinline (!13275, #25066), as discussed in CLC proposal #290.

  • Matthew unwired the base package to pave the way for a reinstallable base package (!13200).

  • Matthew upgraded GHC’s Unicode support to Unicode 16 (!13514).

  • Rodrigo removed BCO primops from the GHC.Exts re-export, as per CLC proposal #212 (!13211, #25110).

Profiling

  • Matthew enabled late cost-centres by default when the libraries distributed with GHC are built for profiling (!10930, #21732). This greatly improves the resolution of cost centre stacks when profiling.

  • Andreas fixed a bug in which profiling could affect program behaviour by allowing profiling ticks to move past unsafeCoerce# in Core (!13413, #25212).

Build system

  • Ben fixed the configure script incorrectly reporting that subsections-via-symbols is enabled on AArch64/Darwin (!12834, #24962).

  • The first alpha pre-release of 9.12 incorrectly had a version number of 9.12.20241014 instead of 9.12.0.20241014, which broke the expected lexicographic ordering of GHC releases. Ben added a check for the validity of a release GHC version number to prevent such issues in the future (!13456).

  • Matthew allowed GHC to build with happy-2.0.2 (!13318, #25276), and Ben with happy-2.1.2 (!13532, #25438).

  • Andreas made the Hadrian progress messages including the working directory to allow quickly distinguishing builds when multiple builds are in progress (!13353, #25335).

  • Ben allowed Haddock options to be passed as Hadrian key-value settings (!11006).

Testsuite

  • Ben improved the reporting of certain errors in the testsuite (!13332).

  • Ben ensured performance metrics are collected even when there are untracked files (!13579, #25471).

  • Ben ensured performance metrics are properly pushed for WebAssembly jobs (!13312).

  • Zubin made several improvements to the testsuite, in particular regarding normalisation of tests and fixing a Haddock bug involving files with the same modification time (!13418, !13522).

CI

  • Matthew added a i386 validation job that can be triggered by adding the i386 label on an MR (!13352).

  • Matthew added a mechanism for only triggering certain individual jobs in CI by using the ONLY_JOBS variable (!13350).

  • Matthew fixed some issues of variable inheritance in the ghcup-metadata testing job (!13306).

  • Matthew added Ubuntu 22.04 jobs to CI (!13335).

by adam, andreask, ben, hannes, matthew, mikolaj, rodrigo, sam, zubin at December 13, 2024 12:00 AM

December 12, 2024

Stackage Blog

LTS 23 release for ghc-9.8 and Nightly now on ghc-9.10

Stackage LTS 23 has been released

The Stackage team is happy to announce that Stackage LTS version 23 has finally been released a couple of days ago, based on GHC stable version 9.8.4. It follows on from the LTS 22 series which was the longest lived LTS major release to date (with probable final snapshot lts-22.43).

We are dedicating the LTS 23 release to the memory of Chris Dornan, who left this world suddenly and unexceptedly around the end of May. We are indebted to Christopher for his many years of wide Haskell community service, including also being one of the Stackage Curators up until the time he passed away. He is warmly remembered.

LTS 23 includes many package changes, and almost 3200 packages! Thank you for all your nightly contributions that made this release possible: the initial release was prepared by Jens Petersen. (The closest nightly snapshot to lts-23.0 is nightly-2024-12-09, but lts-23 is just ahead of it with pandoc-3.6.)

If your package is missing from LTS 23 and can build there, you can easily have it added by opening a PR in lts-haskell to the build-constraints/lts-23-build-constraints.yaml file.

Stackage Nightly updated to ghc-9.10.1

At the same time we are excited to move Stackage Nightly to GHC 9.10.1: the initial snapshot release is nightly-2024-12-11. Current nightly has over 2800 packages, and we expect that number to grow over the coming weeks and months: we welcome your contributions and help with this. This initial release build was made by Jens Petersen (64 commits).

Most of our upperbounds were dropped for this rebase so quite a lot of packages had to be disabled. You can see all the changes made relative to the preceding last 9.8 nightly snapshot. Apart from trying to build yourself, the easiest way to understand why particular packages are disabled is to look for their < 0 lines in build-constraints.yaml, particularly under the "Library and exe bounds failures" section. We also have some tracking issues still open related to 9.10 core boot libraries.

Thank you to all those who have already done work updating their packages for ghc-9.10.

Adding or enabling your package for Nightly is just a simple pull request to the large build-constraints.yaml file.

If you have questions, you can ask in Stack and Stackage Matrix room (#haskell-stack:matrix.org) or Slack channel.

December 12, 2024 07:00 AM

December 11, 2024

Haskell Interlude

Episode 59: Harry Goldstein

Sam and Wouter interview Harry Goldstein, a researcher in property-based testing who works in PL, SE, and HCI. In this episode, we reflect on random generators, the find-a-friend model, interdisciplinary research, and how to have impact beyond your own research community.

by Haskell Podcast at December 11, 2024 02:00 PM

Philip Wadler

John Longley's Informatics Lecturer Song

From my colleague, John Longley, a treat. 

‘Informatics Lecturer Song 

(Based on Gilbert and Sullivan’s ‘Major General song’) 

John Longley 

I am the very model of an Informatics lecturer,
For educating students you will never find a betterer.
I teach them asymptotics with a rigour that’s impeccable,
I’ll show them how to make their proofs mechanically checkable.
On parsing algorithms I can hold it with the best of them,
With LL(1) and CYK and Earley and the rest of them.
I’ll teach them all the levels of the Chomsky hierarchy…
With a nod towards that Natural Language Processing malarkey.

I’ll summarize the history of the concept of a function,
And I’ll tell them why their Haskell code is ‘really an adjunction’.
In matters mathematical and logical, etcetera,
I am the very model of an Informatics lecturer.

For matters of foundations I’m a genuine fanaticker:
I know by heart the axioms of Principia Mathematica,
I’m quite au fait with Carnap and with Wittgenstein’s Tractatus,
And I’ll dazzle you with Curry, Church and Turing combinators.
I’ll present a proof by Gödel with an algebraic seasoning,
I’ll instantly detect a step of non-constructive reasoning.
I’ll tell if you’re a formalist or logicist or Platonist…
For I’ll classify your topos by the kinds of objects that exist.

I’ll scale the heights of cardinals from Mahlo to extendible,
I’ll find your favourite ordinals and stick them in an n-tuple.
In matters philosophical, conceptual, etcetera,
I am the very essence of an Informatics lecturer.

And right now I’m getting started on my personal computer,
I’ve discovered how to get it talking to the Wifi router.
In Internet and World Wide Web I’ve sometimes had my finger dipped,
And once I wrote a line of code in HTML/Javascript.
[Sigh.] I know I have a way to go to catch up with my students,
But I try to face each lecture with a dash of common prudence.
When it comes to modern tech: if there’s a way to get it wrong, I do!
But that seems to be forgiven if I ply them with a song or two.

So… although my present IT skills are rather rudimentary,
And my knowledge of computing stops around the nineteenth century,
Still, with help from all my colleagues and my audience, etcetera…
I’ll be the very model of an Informatics lecturer.


by Philip Wadler (noreply@blogger.com) at December 11, 2024 11:52 AM

December 10, 2024

Chris Smith 2

When is a call stack not a call stack?

Tom Ellis, who I have the privilege of working with at Groq, has an excellent article up about using HasCallStack in embedded DSLs. You should read it. If you don’t, though, the key idea is that HasCallStack isn’t just about exceptions: you can use it to get source code locations in many different contexts, and storing call stacks with data is particularly powerful in providing a helpful experience to programmers.

Seeing Tom’s article reminded me of a CodeWorld feature which was implemented long ago, but I’m excited to share again in this brief note.

CodeWorld Recap

If you’re not familiar with CodeWorld, it’s a web-based programming environment I created mainly to teach mathematics and computational thinking to students in U.S. middle school, ages around 11 to 14 years old. The programming language is based on Haskell — well, it is technically Haskell, but with a lot of preprocessing and tricks aimed at smoothing out the rough edges. There’s also a pure Haskell mode, giving you the full power of the idiomatic Haskell language.

In CodeWorld, the standard library includes primitives for putting pictures on the screen. This includes:

  • A few primitive pictures: circles, rectangles, and the like
  • Transformations to rotate, translate, scale, clip, and and recolor an image
  • Compositions to overlay and combine multiple pictures into a more complex picture.

Because the environment is functional and declarative — and this will be important — there isn’t a primitive to draw a circle. There is a primitive that represents the concept of a circle. You can include a circle in your drawing, of course, but you compose a picture by combining simpler pictures declaratively, and then draw the whole thing only at the very end.

Debugging in CodeWorld

CodeWorld’s declarative interface enables a number of really fun kinds of interactivity… what programmers might call “debugging”, but for my younger audience, I view as exploratory tools: ways they can pry open the lid of their program and explore what it’s doing.

There are a few of these that are pretty awesome. Lest I seem to be claiming the credit, the implementation for these features is due to two students in Summer of Haskell and then in Google Summer of Code: Eric Roberts, and Krystal Maughan.

  • Not the point here, but there are some neat features for rewinding and replaying programs, zooming in, etc.
  • There’s also an “inspect” mode, in which you not only see the final result, but the whole structure of the resulting picture (e.g., maybe it’s an overlay of three other pictures: a background, and two characters, and each of those is transformed in some way, and the base picture for the transformation is some other overlay of multiple parts…) This is possible because pictures are represented not as bitmaps, but as data structures that remember how the picture was built from its individual parts

Krystal’s recap blog post contains demonstrations of not only her own contributions, but the inspect window as well. Here’s a section showing what I’ll talk about now.

https://medium.com/media/7f09408e8411d852516bedb5aab2601c/href

The inspect window is linked to the code editor! Hover over a structural part of the picture, and you can see which expression in your own code produced that part of the picture.

This is another application of the technique from Tom’s post. The data type representing pictures in CodeWorld stores a call stack captured at each part of the picture, so that when you inspect the picture and hover over some part, the environment knows where in your code you described that part, and it highlights the code for you, and jumps there when clicked.

While it’s the same technique, I really like this example because it’s not at all like an exception. We aren’t reporting errors or anything of the sort. Just using this nice feature of GHC that makes the connection between code and declarative data observable to help our users observe things about their own code.

by Chris Smith at December 10, 2024 10:50 PM

Christopher Allen

Two memory issues from the last two weeks

Okay maybe they don't qualify as actual memory bugs, but they were annoying and had memory as a common theme. One of them by itself doesn't merit a blog post so I bundled them together.

by Unknown at December 10, 2024 12:00 AM

December 06, 2024

Well-Typed.Com

Debugging your Haskell application with debuggable

In this blog post we will introduce a new open source Haskell library called debuggable, which provides various utilities designed to make it easier to debug your applications. Some of these are intended for use during actual debugging, others are designed to be a regular part of your application, ready to be used if and when necessary.

Non-interleaved output

Ever see output like this when debugging concurrent applications?

ATnhdi st hiiss  ai sm eas smaegses afgreo mf rtohme  tfhier sste ctohnrde atdh
read
AndT htihsi si si sa  am emsessasgaeg ef rformo mt hteh ef isrescto ntdh rteharde
ad
TAhnids  tihsi sa  imse sas amgees sfargoem  ftrhoem  ftihres ts etchorneda dt
hread

The problem is that concurrent calls to putStrLn can result in interleaved output. To solve this problem, debuggable offers Debug.NonInterleavedIO, which provides variants of putStrLn and friends, as well as trace and its variants, all of which can safely be called concurrently without ever resulting in interleaved output. For example:

import Debug.NonInterleavedIO qualified as NIIO

useDebuggable :: IO ()
useDebuggable = do
    concurrently_
      ( replicateM_ 10 $ do
          NIIO.putStrLn "This is a message from the first thread"
          threadDelay 100_000
      )
      ( replicateM_ 10 $ do
          NIIO.putStrLn "And this is a message from the second thread"
          threadDelay 100_000
      )

If we run this as-is, we will only see

niio output to /tmp/niio2418318-0

on the terminal; inspecting /tmp/niio2418318-0 will show

And this is a message from the second thread
This is a message from the first thread
And this is a message from the second thread
This is a message from the first thread
...

If you want to send the output to a specific file (or /dev/stdout for output to the terminal), you can set the NIIO_OUTPUT environment variable.

Provenance

Provenance is about tracking of what was called when and where.

Call-sites

Consider the following example:

f1 :: IO ()
f1 = f2

f2 :: HasCallStack => IO ()
f2 = f3

f3 :: HasCallStack => IO ()
f3 = putStrLn $ prettyCallStack callStack

The callstack we get from this example looks something like this:

CallStack (from HasCallStack):
  f3, called at Demo/Provenance.hs:15:6 in ..
  f2, called at Demo/Provenance.hs:12:6 in ..

Callstacks are awesome, and a huge help during debugging, but there are some minor issues with this example:

  • Personally, this has always felt a bit “off by one” to me: the first entry tells us that we are in f3, but we were called from line 15, which is f2; likewise, the second entry in the stack tells us that we are in f2, but we were called from line 12, which is f1. Not a huge deal, but arguably a bit confusing. (See also GHC ticket #25546: Make HasCallStack include the caller.)
  • Somewhat relatedly, when we are in f3, and ask for a CallStack, being told that we are in f3 is not particularly helpful (we knew that already).
  • Finally, it is sometimes useful to have just the “first” entry in the callstack; “we were called from line such and such, which is function so and so”.

For this reason, Debug.Provenance provides a CallSite abstraction

g1 :: IO ()
g1 = g2

g2 :: HasCallStack => IO ()
g2 = g3

g3 :: HasCallStack => IO ()
g3 = print callSite

This outputs:

g2 -> g3 (Demo/CallSite.hs:31:6)

where line 31 is the call to g3 in g2. Due to the (alleged) “off-by-one”, both g2 and g3 must be given a HasCallStack constraint, otherwise we get

{unknown} -> g3 (Demo/CallSite.hs:31:6)

when g2 lacks the constraint, or

{unknown} -> {unknown} ()

when g3 does. There is also a variant callSiteWithLabel, which results in output such as

g2 -> g3 (Demo/CallSite.hs:31:6, "foo")

Invocations

Sometimes we are not so much interested in where we are called from, but how often a certain line in the source is called. Debug.Provenance offers “invocations” to track this:

g1 :: IO ()
g1 = replicateM_ 2 g2

g2 :: HasCallStack => IO ()
g2 = do
    print =<< newInvocation
    replicateM_ 2 g3

g3 :: HasCallStack => IO ()
g3 = print =<< newInvocation

This results in output such as

g2 (Demo/Invocation.hs:30:15) #1
g3 (Demo/Invocation.hs:34:16) #1
g3 (Demo/Invocation.hs:34:16) #2
g2 (Demo/Invocation.hs:30:15) #2
g3 (Demo/Invocation.hs:34:16) #3
g3 (Demo/Invocation.hs:34:16) #4

We see the first call to g2, then the first and second call to g3, then the second call to g2, and finally the third and fourth call to h3.

When debugging problems such as deadlocks, it is often useful to insert putStrLn statements like this:

f4 :: IO ()
f4 = do
    putStrLn "f4:1"
    -- f4 does something ..
    putStrLn "f4:2"
    -- f4 does something else ..
    putStrLn "f4:3"

This pattern too can be made a bit simpler by using invocations:

g4 :: HasCallStack => IO ()
g4 = do
    print =<< newInvocation
    -- f4 does something ..
    print =<< newInvocation
    -- f4 does something else ..
    print =<< newInvocation

Resulting in output such as

g4 (Demo/Invocation.hs:48:15) #1
g4 (Demo/Invocation.hs:50:15) #1
g4 (Demo/Invocation.hs:52:15) #1

Scope

The definition of g4 above is still a little clunky, especially if we also want to include other output than just the invocation itself. We can do better:

import Debug.NonInterleavedIO.Scoped qualified as Scoped

g4 :: HasCallStack => IO ()
g4 = do
    Scoped.putStrLn "start"
    -- f4 does something ..
    Scoped.putStrLn "middle"
    -- f4 does something else ..
    Scoped.putStrLn "end"

outputs

[g4 (Demo/Scope.hs:21:5) #1] start
[g4 (Demo/Scope.hs:23:5) #1] middle
[g4 (Demo/Scope.hs:25:5) #1] end

As the name suggests, though, there is more going on here than simply a more convenient API: Debug.Provenance.Scope offers a combinator called scoped for scoping invocations:

g1 :: IO ()
g1 = g2

g2 :: HasCallStack => IO ()
g2 = scoped g3

g3 :: HasCallStack => IO ()
g3 = scoped g4

This results in

[g4 (Demo/Scope.hs:29:5) #1, g3 (Demo/Scope.hs:25:6) #1, g2 (Demo/Scope.hs:22:6) #1] start
[g4 (Demo/Scope.hs:31:5) #1, g3 (Demo/Scope.hs:25:6) #1, g2 (Demo/Scope.hs:22:6) #1] middle
[g4 (Demo/Scope.hs:33:5) #1, g3 (Demo/Scope.hs:25:6) #1, g2 (Demo/Scope.hs:22:6) #1] end

Threads

The counters that are part of an Invocation can be very useful to cross-reference output messages from multiple threads. Continuing with the g4 example we introduced in the section on Scope, suppose we have

concurrent :: IO ()
concurrent = concurrently_ g4 g4

we might get output like this:

[g4 (Demo/Scope.hs:32:5) #1] start
[g4 (Demo/Scope.hs:32:5) #2] start
[g4 (Demo/Scope.hs:34:5) #1] middle
[g4 (Demo/Scope.hs:34:5) #2] middle
[g4 (Demo/Scope.hs:36:5) #1] end
[g4 (Demo/Scope.hs:36:5) #2] end

(where the scheduling between the two thread might be different, of course).

Scope is always thread local, but debuggable provides a way to explicitly inherit the scope of a parent thread in a child thread:

h1 :: IO ()
h1 = h2

h2 :: HasCallStack => IO ()
h2 = scoped h3

h3 :: HasCallStack => IO ()
h3 = scoped $ do
    tid <- myThreadId
    concurrently_
      (inheritScope tid >> g4)
      (inheritScope tid >> g4)

results in

[g4 (Demo/Scope.hs:34:5) #1, h3 (Demo/Scope.hs:50:6) #1, h2 (Demo/Scope.hs:47:6) #1] start
[g4 (Demo/Scope.hs:34:5) #2, h3 (Demo/Scope.hs:50:6) #1, h2 (Demo/Scope.hs:47:6) #1] start
[g4 (Demo/Scope.hs:36:5) #1, h3 (Demo/Scope.hs:50:6) #1, h2 (Demo/Scope.hs:47:6) #1] middle
[g4 (Demo/Scope.hs:36:5) #2, h3 (Demo/Scope.hs:50:6) #1, h2 (Demo/Scope.hs:47:6) #1] middle
[g4 (Demo/Scope.hs:38:5) #1, h3 (Demo/Scope.hs:50:6) #1, h2 (Demo/Scope.hs:47:6) #1] end
[g4 (Demo/Scope.hs:38:5) #2, h3 (Demo/Scope.hs:50:6) #1, h2 (Demo/Scope.hs:47:6) #1] end

Callbacks

Suppose we have some functions which take another function, a callback, as argument, and invoke that callback at some point:

f1 :: HasCallStack => (Int -> IO ()) -> IO ()
f1 k = f2 k

f2 :: HasCallStack => (Int -> IO ()) -> IO ()
f2 k = scoped $ k 1

Let’s use this example callback function:

g1 :: HasCallStack => Int -> IO ()
g1 n = g2 n

g2 :: HasCallStack => Int -> IO ()
g2 n = Scoped.putStrLn $ "n = " ++ show n ++ " at " ++ prettyCallStack callStack

and invoke f1 as follows:

withoutDebuggable :: HasCallStack => IO ()
withoutDebuggable = f1 g1

This outputs:

[g2 (Demo/Callback.hs:26:8) #1, f2 (Demo/Callback.hs:20:8) #1]
  n = 1 at CallStack (from HasCallStack):
    g2, called at Demo/Callback.hs:23:8 in ..
    g1, called at Demo/Callback.hs:29:24 in ..
    withoutDebuggable, called at Demo.hs:25:36 in ..

Confusingly, this callstack does not include any calls to f1 or f2. This happens because the call to k in f2 does not pass the current CallStack; instead we see the CallStack as it was when we defined g1.

For callbacks like this it is often useful to have two pieces of information: the CallStack that shows how the callback is actually invoked, and the CallSite where the callback was defined. Debug.Provenance.Callback provides a Callback abstraction that does exactly this. A Callback m a b is essentially a function a -> m b, modulo treatment of the CallStack. Let’s change f1 and f2 to take a CallBack instead:

h1 :: HasCallStack => Callback IO Int () -> IO ()
h1 k = h2 k

h2 :: HasCallStack => Callback IO Int () -> IO ()
h2 k = scoped $ invokeCallback k 1

If we now use this top-level function

useDebuggable :: HasCallStack => IO ()
useDebuggable = h1 (callback g1)

we get a much more useful CallStack:

[g2 (Demo/Callback.hs:26:8) #1, h2 (Demo/Callback.hs:39:8) #1]
  n = 1 at CallStack (from HasCallStack):
    g2, called at Demo/Callback.hs:23:8 in ..
    g1, called at Demo/Callback.hs:42:30 in ..
    callbackFn, called at src/Debug/Provenance/Callback.hs:57:48 in ..
    invoking callback defined at useDebuggable (Demo/Callback.hs:42:21), called at ..
    h2, called at Demo/Callback.hs:36:8 in ..
    h1, called at Demo/Callback.hs:42:17 in ..
    useDebuggable, called at Demo.hs:26:36 in ..

Alternative: profiling backtraces

in addition to HasCallStack-style backtraces, there may also be other types of backtraces available, depending on how we build and how we run the code (we discuss some of these in the context of exception handling in episode 29 of the Haskell Unfolder). The most important of these is probably the profiling (cost centre) backtrace.

We can request the “current” callstack with currentCallstack, and the callstack attached to an object (“where was this created”) using whoCreated. This allows us to make similar distinctions that we made in Callback, for example:

f1 :: (Int -> IO ()) -> IO ()
f1 k = do
    cs <- whoCreated k
    putStrLn $ "f1: invoking callback defined at " ++ show (cs)
    f2 k

f2 :: (Int -> IO ()) -> IO ()
f2 k = k 1

g1 :: Int -> IO ()
g1 n = g2 n

g2 :: Int -> IO ()
g2 n = do
    cs <- currentCallStack
    putStrLn $ "n = " ++ show n ++ " at " ++ show cs

This does require the code to be compiled with profiling enabled. The profiling callstacks are sometimes more useful than HasCallstack callstacks, and sometimes worse; for example, in

demo :: Maybe Int -> IO ()
demo Nothing  = f1 (\x -> g1 x)
demo (Just i) = f1 (\x -> g1 (x + i))

the function defined in the Just case will have a useful profiling callstack, but since the function defined in the Nothing case is entirely static (does not depend on any runtime info), its callstack is reported as

["MAIN.DONT_CARE (<built-in>)"]

It would be useful to extend debuggable with support for both types of backtraces in a future release.

Performance considerations

Adding permanent HasCallStack constraints to functions does come at a slight cost, since they correspond to additional arguments that must be passed at runtime. For most functions this is not a huge deal; personally, I consider some well-placed HasCallStack constraints part of designing with debugging in mind. That said, you will probably want to avoid adding HasCallStack constraints to functions that get called repeatedly in tight inner loops; similar considerations also apply to the use of the Callback abstraction.

Conclusions

Although debuggable is a small library, it offers some functionality that has proven quite useful in debugging applications, especially concurrent ones. We can probably extend it over time to cover more use cases; “design for debuggability” is an important principle, and is made easier with proper library support. Contributions and comments are welcome!

As a side note, the tracing infrastructure of debuggable can also be combined with the recover-rtti package, which implements some dark magic to recover runtime type information by looking at the heap; in particular, it offers

anythingToString :: forall a. a -> String

which can be used to print objects without having a Show a instance available (though this is not the only use of recover-rtti). The only reason that debuggable doesn’t provide explicit support for this is that the dependency footprint of recover-rtti is a bit larger.

by edsko at December 06, 2024 12:00 AM

December 04, 2024

Well-Typed.Com

The Haskell Unfolder Episode 37: solving Advent of Code 2024 day 4

Today, 2024-12-04, at 1930 UTC (11:30 am PST, 2:30 pm EST, 7:30 pm GMT, 20:30 CET, …) we are streaming the 37th episode of the Haskell Unfolder live on YouTube.

The Haskell Unfolder Episode 37: solving Advent of Code 2024 day 4

In this episode of the Haskell Unfolder, we are going to try solving the latest problem of this year’s Advent of Code live.

About the Haskell Unfolder

The Haskell Unfolder is a YouTube series about all things Haskell hosted by Edsko de Vries and Andres Löh, with episodes appearing approximately every two weeks. All episodes are live-streamed, and we try to respond to audience questions. All episodes are also available as recordings afterwards.

We have a GitHub repository with code samples from the episodes.

And we have a public Google calendar (also available as ICal) listing the planned schedule.

There’s now also a web shop where you can buy t-shirts and mugs (and potentially in the future other items) with the Haskell Unfolder logo.

by andres, edsko at December 04, 2024 12:00 AM

December 02, 2024

GHC Developer Blog

GHC 9.8.4 is now available

GHC 9.8.4 is now available

Ben Gamari - 2024-12-02

The GHC developers are happy to announce the availability of GHC 9.8.4. Binary distributions, source distributions, and documentation are available on the release page.

This release is a small release fixing a few issues noted in 9.8.3, including:

  • Update the filepath submodule to avoid a misbehavior of splitFileName under Windows.

  • Update the unix submodule to fix a compilation issue on musl platforms

  • Fix a potential source of miscompilation when building large projects on 32-bit platforms

  • Fix unsound optimisation of prompt# uses

A full accounting of changes can be found in the release notes. As some of the fixed issues do affect correctness users are encouraged to upgrade promptly.

We would like to thank Microsoft Azure, GitHub, IOG, the Zw3rk stake pool, Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprise this release.

As always, do give this release a try and open a ticket if you see anything amiss.

Happy compiling!

  • Ben

by ghc-devs at December 02, 2024 12:00 AM

December 01, 2024

Magnus Therning

Servant and a weirdness in Keycloak

When writing a small tool to interface with Keycloak I found an endpoint that require the content type to be application/json while the body should be plain text. (The details are in the issue.) Since servant assumes that the content type and the content match (I know, I'd always thought that was a safe assumption to make too) it doesn't work with ReqBody '[JSON] Text. Instead I had to create a custom type that's a combination of JSON and PlainText, something that turned out to required surprisingly little code:

data KeycloakJSON deriving (Typeable)

instance Accept KeycloakJSON where
    contentType _ = "application" // "json"

instance MimeRender KeycloakJSON Text where
    mimeRender _ = fromStrict . encodeUtf8

The bug has already been fixed in Keycloak, but I'm sure there are other APIs with similar weirdness so maybe this will be useful to someone else.

December 01, 2024 10:00 PM

Christopher Allen

Rebuilding Rust (Leptos) apps quickly

I'm working on a side project that is written in Rust on the backend and the frontend. The frontend component is in Leptos. Our app is about 20kLOC in total, so it takes a little time.

by Unknown at December 01, 2024 12:00 AM

November 29, 2024

Mark Jason Dominus

A complex bug with a ⸢simple⸣ fix

Last month I did a fairly complex piece of systems programming that worked surprisingly well. But it had one big bug that took me a day to track down.

One reason I find the bug interesting is that it exemplifies the sort of challenges that come up in systems programming. The essence of systems programming is that your program is dealing with the state of a complex world, with many independent agents it can't control, all changing things around. Often one can write a program that puts down a wrench and then picks it up again without looking. In systems programming, the program may have to be prepared for the possibility that someone else has come along and moved the wrench.

The other reason the bug is interesting is that although it was a big bug, fixing it required only a tiny change. I often struggle to communicate to nonprogrammers just how finicky and fussy programming is. Nonprogrammers, even people who have taken a programming class or two, are used to being harassed by crappy UIs (or by the compiler) about missing punctuation marks and trivially malformed inputs, and they think they understand how fussy programming is. But they usually do not. The issue is much deeper, and I think this is a great example that will help communicate the point.

The job of my program, called sync-spam, was to move several weeks of accumulated email from system S to system T. Each message was probably spam, but its owner had not confirmed that yet, and the message was not yet old enough to be thrown away without confirmation.

The probably-spam messages were stored on system S in a directory hierarchy with paths like this:

    /spam/2024-10-18/…

where 2024-10-18 was the date the message had been received. Every message system S had received on October 18 was somewhere under /spam/2024-10-18.

One directory, the one for the current date, was "active", and new messages were constantly being written to it by some other programs not directly related to mine. The directories for the older dates never changed. Once sync-spam had dealt with the backlog of old messages, it would continue to run, checking periodically for new messages in the active directory.

The sync-spam program had a database that recorded, for each message, whether it had successfully sent that message from S to T, so that it wouldn't try to send the same message again.

The program worked like this:

  • Repeat forever:
    1. Scan the top-level spam directory for the available dates
    2. For each date D:
      1. Scan the directory for D and find the messages in it. Add to the database any messages not already recorded there.
      2. Query the database for the list of messages for date D that have not yet been sent to T
      3. For each such message:
        1. Attempt to send the message
        2. If the attempt was successful, record that in the database
    3. Wait some appropriate amount of time and continue.

Okay, very good. The program would first attempt to deal with all the accumulated messages in roughly chronological order, processing the large backlog. Let's say that on November 1 it got around to scanning the active 2024-11-01 directory for the first time. There are many messages, and scanning takes several minutes, so by the time it finishes scanning, some new messages will be in the active directory that it hasn't seen. That's okay. The program will attempt to send the messages that it has seen. The next time it comes around to 2024-11-01 it will re-scan the directory and find the new messages that have appeared since the last time around.

But scanning a date directory takes several minutes, so we would prefer not to do it if we don't have to. Since only the active directory ever changes, if the program is running on November 1, it can be sure that none of the directories from October will ever change again, so there is no point in its rescanning them. In fact, once we have located the messages in a date directory and recorded them in the database, there is no point in scanning it again unless it is the active directory, the one for today's date.

So sync-spam had an elaboration that made it much more efficient. It was able to put a mark on a date directory that meant "I have completely scanned this directory and I know it will not change again". The algorithm was just as I said above, except with these elaborations.

  • Repeat forever:
    1. Scan the top-level spam directory for the available dates
    2. For each date D:
        • If the directory for D is marked as having already been scanned, we already know exactly what messages are in it, since they are already recorded in the database.
        • Otherwise:
          1. Scan the directory for D and find the messages in it. Add to the database any messages not already recorded there.
          2. If D is not today's date, mark the directory for D as having been scanned completely, because we need not scan it again.
      1. Query the database for the list of messages for date D that have not yet been sent to T
      2. For each such message:
        1. Attempt to send the message
        2. If the attempt was successful, record that in the database
    3. Wait some appropriate amount of time and continue.

It's important to not mark the active directory as having been completely scanned, because new messages are continually being deposited into it until the end of the day.

I implemented this, we started it up, and it looked good. For several days it processed the backlog of unsent messages from September and October, and it successfully sent most of them. It eventually caught up to the active directory for the current date, 2024-11-01, scanned it, and sent most of the messages. Then it went back and started over again with the earliest date, attempting to send any messages that it hadn't sent the first time.

But a couple of days later, we noticed that something was wrong. Directories 2024-11-02 and 2024-11-03 had been created and were well-stocked with the messages that had been received on those dates. The program had found the directories for those dates and had marked them as having been scanned, but there were no messages from those dates in its database.

Now why do you suppose that is?

(Spoilers will follow the horizontal line.)

I investigate this in two ways. First, I made sync-spam's logging more detailed and looked at the results. While I was waiting for more logs to accumulate, I built a little tool that would generate a small, simulated spam directory on my local machine, and then I ran sync-spam against the simulated messages, to make sure it was doing what I expected.

In the end, though, neither of these led directly to my solving the problem; I just had a sudden inspiration. This is very unusual for me. Still, I probably wouldn't have had the sudden inspiration if the information from the logging and the debugging hadn't been percolating around my head. Fortune favors the prepared mind.


The problem was this: some other agent was creating the 2024-11-02 directory a bit prematurely, say at 11:55 PM on November 1.

Then sync-spam came along in the last minutes of November 1 and started its main loop. It scanned the spam directory for available dates, and found 2024-11-02. It processed the unsent messages from the directories for earlier dates, then looked at 2024-11-02 for the first time. And then, at around 11:58, as per above it would:

  1. Scan the directory for 2024-11-02 and find the messages in it. Add to the database any messages not already recorded there.

There weren't any yet, because it was still 11:58 on November 1.

  1. If 2024-11-02 is not today's date, mark the directory as having been scanned completely, because we need not scan it again.

Since the 2024-11-02 directory was not the one for today's date — it was still 11:58 on November 1 — sync-spam recorded that it had scanned that directory completely and need not scan it again.

Five minutes later, at 00:03 on November 2, there would be new messages in the 2024-11-02, which was now the active directory, but sync-spam wouldn't look for them, because it had already marked 2024-11-02 as having been scanned completely.

This complex problem in this large program was completely fixed by changing:

        if ($date ne $self->current_date) {
          $self->mark_this_date_fully_scanned($date_dir);
        }

to:

        if ($date lt $self->current_date) {
          $self->mark_this_date_fully_scanned($date_dir);
        }

(ne and lt are Perl-speak for "not equal to" and "less than".)

Many organizations have their own version of a certain legend, which tells how a famous person from the past was once called out of retirement to solve a technical problem that nobody else could understand. I first heard the General Electric version of the legend, in which Charles Proteus Steinmetz was called out of retirement to figure out why a large complex of electrical equipment was not working.

In the story, Steinmetz walked around the room, looking briefly at each of the large complicated machines. Then, without a word, he took a piece of chalk from his pocket, marked one of the panels, and departed. When the puzzled engineers removed that panel, they found a failed component, and when that component was replaced, the problem was solved.

Steinmetz's consulting bill for $10,000 arrived the following week. Shocked, the bean-counters replied that $10,000 seemed an exorbitant fee for making a single chalk mark, and, hoping to embarrass him into reducing the fee, asked him to itemize the bill.

Steinmetz returned the itemized bill:

One chalk mark $1.00
Knowing where to put it $9,999.00
TOTAL $10,000.00

This felt like one of those times. Any day when I can feel a connection with Charles Proteus Steinmetz is a good day.

This episode also makes me think of the following variation on an old joke:

A: Ask me what is the most difficult thing about systems programming.

B: Okay, what is the most difficult thing ab—

A: TIMING!

by Mark Dominus (mjd@plover.com) at November 29, 2024 03:11 PM

GHC Developer Blog

GHC 9.12.1-rc1 is now available

GHC 9.12.1-rc1 is now available

Zubin Duggal - 2024-11-29

The GHC developers are very pleased to announce the availability of the release candidate for GHC 9.12.1. Binary distributions, source distributions, and documentation are available at downloads.haskell.org.

We hope to have this release available via ghcup shortly.

GHC 9.12 will bring a number of new features and improvements, including:

  • The new language extension OrPatterns allowing you to combine multiple pattern clauses into one.

  • The MultilineStrings language extension to allow you to more easily write strings spanning multiple lines in your source code.

  • Improvements to the OverloadedRecordDot extension, allowing the built-in HasField class to be used for records with fields of non lifted representations.

  • The NamedDefaults language extension has been introduced allowing you to define defaults for typeclasses other than Num.

  • More deterministic object code output, controlled by the -fobject-determinism flag, which improves determinism of builds a lot (though does not fully do so) at the cost of some compiler performance (1-2%). See #12935 for the details

  • GHC now accepts type syntax in expressions as part of GHC Proposal #281.

  • The WASM backend now has support for TemplateHaskell.

  • … and many more

A full accounting of changes can be found in the release notes. As always, GHC’s release status, including planned future releases, can be found on the GHC Wiki status.

We would like to thank GitHub, IOG, the Zw3rk stake pool, Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, the Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprise this release.

As always, do give this release a try and open a ticket if you see anything amiss.

by ghc-devs at November 29, 2024 12:00 AM

November 28, 2024

Christopher Allen

The cost of hosting is too damn high

I recently migrated a side project from DigitalOcean to some dedicated servers. I thought that I would offer some context and examples for why.

by Unknown at November 28, 2024 12:00 AM

November 27, 2024

Brent Yorgey

Competitive Programming in Haskell: stacks, queues, and monoidal sliding windows

Competitive Programming in Haskell: stacks, queues, and monoidal sliding windows

Posted on November 27, 2024
Tagged , , , , ,

Suppose we have a list of items of length \(n\), and we want to consider windows (i.e. contiguous subsequences) of width \(w\) within the list.

A list of numbers, with contiguous size-3 windows highlighted

We can compute the sum of each window by brute force in \(O(nw)\) time, by simply generating the list of all the windows and then summing each. But, of course, we can do better: keep track of the sum of the current window; every time we slide the window one element to the right we can add the new element that enters the window on the right and subtract the element that falls of the window to the left. Using this “sliding window” technique, we can compute the sum of every window in only \(O(n)\) total time instead of \(O(nw)\).

How about finding the maximum of every window? Of course the brute force \(O(nw)\) algorithm still works, but doing it in only \(O(n)\) is considerably trickier! We can’t use the same trick as we did for sums since there’s no way to “subtract” the element falling off the left. This really comes down to the fact that addition forms a group (i.e. a monoid-with-inverses), but max does not. So more generally, the question is: how can we compute a monoidal summary for every window in only \(O(n)\) time?

Today I want to show you how to solve this problem using one of my favorite competitive programming tricks, which fits beautifully in a functional context. Along the way we’ll also see how to implement simple yet efficient functional queues.

Stacks

Before we get to queues, we need to take a detour through stacks. Stacks in Haskell are pretty boring. We can just use a list, with the front of the list corresponding to the top of the stack. However, to make things more interesting—and because it will come in very handy later—we’re going to implement monoidally-annotated stacks. Every element on the stack will have a measure, which is a value from some monoid m. We then want to be able to query any stack for the total of all the measures in \(O(1)\). For example, perhaps we want to always be able to find the sum or max of all the elements on a stack.

If we wanted to implement stacks annotated by a group, we could just do something like this:

data GroupStack g a = GroupStack (a -> g) !g [a]

That is, a GroupStack stores a measure function, which assigns to each element of type a a measure of type g (which is intended to be a Group); a value of type g representing the sum (via the group operation) of measures of all elements on the stack; and the actual stack itself. To push, we would just compute the measure of the new element and add it to the cached g value; to pop, we subtract the measure of the element being popped, something like this:

push :: a -> GroupStack g a -> GroupStack g a
push a (GroupStack f g as) = GroupStack f (f a <> g) (a:as)

pop :: GroupStack g a -> Maybe (a, GroupStack g a)
pop (GroupStack f g as) = case as of
  [] -> Nothing
  (a:as') -> GroupStack f (inv (f a) <> g) as'

But this won’t work for a monoid, of course. The problem is pop, where we can’t just subtract the measure for the element being popped. Instead, we need to be able to restore the measure of a previous stack. Hmmm… sounds like we might be able to use… a stack! We could just store a stack of measures alongside the stack of elements; even better is to store a stack of pairs. That is, each element on the stack is paired with an annotation representing the sum of all the measures at or below it. Here, then, is our representation of monoidally-annotated stacks:

{-# LANGUAGE BangPatterns #-}

module Stack where

data Stack m a = Stack (a -> m) !Int [(m, a)]

A Stack m a stores three things:

  1. A measure function of type a -> m.Incidentally, what if we want to be able to specify an arbitrary measure for each element, and even give different measures to the same element at different times? Easy: just use (m,a) pairs as elements, and use fst as the measure function.

  2. An Int representing the size of the stack. This is not strictly necessary, especially since one could always just use a monoidal annotation to keep track of the size; but wanting the size is so ubiquitous that it seems convenient to just include it as a special case.

  3. The aforementioned stack of (annotation, element) pairs.

Note that we cannot write a Functor instance for Stack m, since a occurs contravariantly in (a -> m). But this makes sense: if we change all the a values, the cached measures would no longer be valid.

When creating a new, empty stack, we have to specify the measure function; to get the measure of a stack, we just look up the measure on top, or return mempty for an empty stack.

new :: (a -> m) -> Stack m a
new f = Stack f 0 []

size :: Stack m a -> Int
size (Stack _ n _) = n

measure :: Monoid m => Stack m a -> m
measure (Stack _ _ as) = case as of
  [] -> mempty
  (m, _) : _ -> m

Now let’s implement push and pop. Both are relatively straightforward.

push :: Monoid m => a -> Stack m a -> Stack m a
push a s@(Stack f n as) = Stack f (n + 1) ((f a <> measure s, a) : as)

pop :: Stack m a -> Maybe (a, Stack m a)
pop (Stack f n as) = case as of
  [] -> Nothing
  (_, a) : as' -> Just (a, Stack f (n - 1) as')

Note that if we care about using non-commutative monoids, in the implementation of push we have a choice to make between f a <> measure s and measure s <> f a. The former seems nicer to me, since it keeps the measures “in the same order” as the list representing the stack. For example, if we push a list of elements onto a stack via foldr, using the measure function (:[]) that injects each element into the monoid of lists, the resulting measure is just the original list:

measure . foldr push (new (:[])) == id

And more generally, for any measure function f, we have

measure . foldr push (new f) == foldMap f

Finally, we are going to want a function to reverse a stack, which is a one-liner:

reverse :: Monoid m => Stack m a -> Stack m a
reverse (Stack f _ as) = foldl' (flip push) (new f) (map snd as)

That is, to reverse a stack, we extract the elements and then use foldl' to push the elements one at a time onto a new stack using the same measure function.

There is a bit more code you can find on GitHub, such as Show and Eq instances.

Queues

Now that we have monoidally-annotated stacks under our belt, let’s turn to queues. And here’s where my favorite trick is revealed: we can implement a queue out of two stacks, so that enqueue and dequeue run in \(O(1)\) amortized time; and if we use monoidally-annotated stacks, we get monoidally-annotated queues for free!

First, some imports.

{-# LANGUAGE ImportQualifiedPost #-}

module Queue where

import Data.Bifunctor (second)
import Stack (Stack)
import Stack qualified as Stack

A Queue m a just consists of two stacks, one for the front and one for the back. To create a new queue, we just create two new stacks; to get the size of a queue, we just add the sizes of the stacks; to get the measure of a queue, we just combine the measures of the stacks. Easy peasy.

type CommutativeMonoid = Monoid

data Queue m a = Queue {getFront :: Stack m a, getBack :: Stack m a}
  deriving (Show, Eq)

new :: (a -> m) -> Queue m a
new f = Queue (Stack.new f) (Stack.new f)

size :: Queue m a -> Int
size (Queue front back) = Stack.size front + Stack.size back

measure :: CommutativeMonoid m => Queue m a -> m
measure (Queue front back) = Stack.measure front <> Stack.measure back

Note the restriction to commutative monoids, since the queue elements are stored in different orders in the front and back stacks. If we really cared about making this work with non-commutative monoids, we would have to make two different push methods for the front and back stacks, to combine the measures in opposite orders. That just doesn’t seem worth it. But if you have a good example requiring the use of a queue annotated by a non-commutative monoid, I’d love to hear it!

Now, to enqueue, we just push the new element on the back:

enqueue :: CommutativeMonoid m => a -> Queue m a -> Queue m a
enqueue a (Queue front back) = Queue front (Stack.push a back)

Dequeueing is the magic bit that makes everything work. If there are any elements in the front stack, we can just pop from there. Otherwise, we need to first reverse the back stack into the front stack. This means dequeue may occasionally take \(O(n)\) time, but it’s still \(O(1)\) amortized.The easiest way to see this is to note that every element is touched exactly three times: once when it is pushed on the back; once when it is transferred from the back to the front; and once when it is popped from the front. So, overall, we do \(O(1)\) work per element.

dequeue :: CommutativeMonoid m => Queue m a -> Maybe (a, Queue m a)
dequeue (Queue front back)
  | Stack.size front == 0 && Stack.size back == 0 = Nothing
  | Stack.size front == 0 = dequeue (Queue (Stack.reverse back) front)
  | otherwise = second (\front' -> Queue front' back) <$> Stack.pop
  front

Finally, for convenience, we can make a function drop1 which just dequeues an item from the front of a queue and throws it away.

drop1 :: CommutativeMonoid m => Queue m a -> Queue m a
drop1 q = case dequeue q of
  Nothing -> q
  Just (_, q') -> q'

This “banker’s queue” method of building a queue out of two stacks is discussed in Purely Functional Data Structures by Okasaki, though I don’t think he was the first to come up with the idea. It’s also possible to use some clever tricks to make both enqueue and dequeue take \(O(1)\) time in the worst case. In a future post I’d like to do some benchmarking to compare various queue implementations (i.e. banker’s queues, Data.Sequence, circular array queues built on top of STArray). At least anecdotally, in solving some sliding window problems, banker’s queues seem quite fast so far.

Sliding windows

I hope you can see how this solves the initial motivating problem: to find e.g. the max of a sliding window, we can just put the elements in a monoidally-annotated queue, enqueueing and dequeueing one element every time we slide the window over.More generally, of course, it doesn’t even matter if the left and right ends of the window stay exactly in sync; we can enqueue and dequeue as many times as we want.

The following windows function computes the monoidal sum foldMap f window for each window of width \(w\), in only \(O(n)\) time overall.

windows :: CommutativeMonoid m => Int -> (a -> m) -> [a] -> [m]
windows w f as = go startQ rest
 where
  (start, rest) = splitAt w as
  startQ = foldl' (flip enqueue) (new f) start

  go q as =
    measure q : case as of
      [] -> []
      a : as -> go (enqueue a (drop1 q)) as

“But…maximum and minimum do not form monoids, only semigroups!” I hear you cry. Well, we can just adjoin special positive or negative infinity elements as needed, like so:

data Max a = NegInf | Max a deriving (Eq, Ord, Show)

instance Ord a => Semigroup (Max a) where
  NegInf <> a = a
  a <> NegInf = a
  Max a <> Max b = Max (max a b)

instance Ord a => Monoid (Max a) where
  mempty = NegInf

data Min a = Min a | PosInf deriving (Eq, Ord, Show)

instance Ord a => Semigroup (Min a) where
  PosInf <> a = a
  a <> PosInf = a
  Min a <> Min b = Min (min a b)

instance Ord a => Monoid (Min a) where
  mempty = PosInf

Now we can write, for example, windows 3 Max [1,4,2,8,9,4,4,6] which yields [Max 4, Max 8, Max 9, Max 9, Max 9, Max 6], the maximums of each 3-element window.

Challenges

If you’d like to try solving some problems using the techniques from this blog post, I can recommend the following (generally in order of difficulty):

In a future post I’ll walk through my solution to Hockey Fans. And here’s another couple problems along similar lines; unlike the previous problems I am not so sure how to solve these in a nice way. I may write about them in the future.

<noscript>Javascript needs to be activated to view comments.</noscript>

by Brent Yorgey at November 27, 2024 12:00 AM

November 25, 2024

Michael Snoyman

Steelmanning Tariffs

UPDATE A few days after posting this article, I saw a video on X that gives (IMO) a better argument in favor of tariffs than I came up with here. If you're interested in the topic, I'd recommend giving it a watch, it's only about 11 minutes.

I’m a believer in the idea of free markets. The principle is simple: with less regulation and freedom of individuals to engage in trade and their own price discovery, we’ll end up with optimal price points and maximizing production, making everyone’s life better.

Tariffs fly in the face of this by introducing unnecessary and artificial barriers to trade. A classic example is sugar. The US has an import tariff on sugar, which makes it artificially more expensive to use sugar in products. Corn syrup, on the other hand, is produced from domestically grown corn and faces no such penalty. It is therefore artificially cheaper than sugar, and ends up being used in products. The results:

  • More corn is produced than is actually needed, preventing agriculture from focusing on higher value production for society
  • Consumers receive inferior products made out of corn syrup instead of sugar
  • Consumers pay more for these goods than they would without the tariff
  • Sugar producers outside the US make smaller profits

There’s an even worse aspect to tariffs though: they can kick off devastating battles between countries. Tariffs are essentially economic warfare, harming citizens of another country to help your own citizens. Once one country starts tariffs, it can snowball into a crippling domino effect that impedes all global trade.

Donald Trump has said he’s going to use tariffs, because we’re “losing on trade” by having a trade deficit. But that statement is bonkers. A trade deficit means a country receives more goods than it sends out. In other words, citizens do better with a trade deficit.

Based on all this, it seems pretty straightforward that economists would oppose Trump’s tariff plan, and would balk at his explanations. In this post, I want to steelman the position: give the best argument in favor of Trump’s plan that I can think of. (I won’t bother trying to defend the “losing on trade” comment though, it’s factually wrong, but seems like a good rallying cry for a policy from a political standpoint.)

To set the stage, we’re going to start by discussing two ideas, and then bringing them together: negative externalities and granularity of competition.

Negative externalities

In economics, a negative externality is when some activity has a negative impact on others. This essentially transfers some of the costs of an activity to others, while keeping all the benefits for the actor. A great example is pollution. A factory can either spend a million dollars a year cleaning up its waste, or it could dump its pollution into the lake. The business gets no benefit from an unpolluted lake, but normal people will lose the ability to use the lake. The business has externalized a cost onto society.

Programmers may already be familiar with another term for this concept: the tragedy of the commons.

One method to address externalities like this is through regulation: make it illegal for companies to pollute in the lake. But economics offers another approach to this as well: assign ownership rights on the lake. An owner can decide whether or not to allow pollution based on any criteria they want. Being a rational actor (one of the largest assumptions in economics, often violated), the new owner is incentivized to set up an auction for usage rights to the lake. The polluting business can compete against companies offering leisure activities on the lake. Then the free market can determine if the million dollars of cost savings is more valuable than the benefits people can take from a clean lake.

Instead of assigning the property rights to the lake to a private entity, the government can engage in this activity through open auction as well. This results in increased tax revenue, which will decrease the overall tax burden for everyone, flipping the tragedy of the commons into a benefit for all.

You may not like this solution, because you believe that the free market can’t properly price in the true value of a non-polluted lake, or because you don’t believe people will act rationally, or any other reason. That’s not terribly relevant for this discussion. The point is the definition of a negative externality, and the fact that the government can extract money from economic actors while increasing public good.

Granularity of competition

All of economics is a story of competition over scarce goods. Generally, we talk about the competition of individuals or private entities. In other words: people and businesses competing with each other. Note that the competition isn’t between buyers and sellers, as is often believed. Instead, buyers are competing with other buyers, pushing prices up, while sellers are competing with other sellers, pushing prices down.

One of the underlying assumptions of capitalism is that there’s a fair playing field. All actors should be treated equally. In practice, this fails fairly often. For example, in crony capitalism, select businesses receive special handouts from the government. Monopolies are another arguable example, where a company can leverage its overwhelming market share in one industry to subsidize the destruction of competitors in another industry.

As mentioned above, tariffs hurt people by creating an uneven playing field between individuals. But viewed at the national level, tariffs are a method of competition between different governments. In other words, leveraging tariffs may end up hurting competition at the granular level, but might end up serving the interests of a government policy which is at odds with “make all goods cheaper through more competition.”

Protecting workers

In this sense, tariffs are not at all unique. Differences in regulations between countries, laws about fair labor practices, environment impact, local tax structures, and manipulation of currency exchange rates are all part of the competition between different nations. As a simple example, suppose country A has strict labor laws, demanding safe working conditions and demanding health coverage for all workers, while country B does not. Country B will be able to outcompete country A for new businesses, because it’s relatively cheaper to produce in country B.

Tariffs in such a scenario can be a method for country A to make production in country B less attractive. If country A imposes a 20% tariff on country B imports due to human rights violations, it is in essence making production in country A more competitive again. Without this kind of change, countries seeking to attract investment and new businesses may be incentivized to pass laws that hurt their citizens just to bring down production costs.

National security

Another topic is national security. Let’s take countries C and D, who are not on the best of terms. Both countries are stocking up on weapons in case war breaks out. Both countries locally source weapons production, ordering lots of tanks from domestic producers.

Firstly, why domestic? Because it would be crazy to put your national defense in the hands of a potential enemy!

But suppose there’s completely free international trade, with no embargos and no tariffs. Country C is a major steel exporter, obviously an import input to tank production. Country C can engage in some economic warfare of its own against country D:

  • Subsidize local production of steel
  • Cheaper steel exports prevent domestic production of steel from ramping up in country D
    • Under normal circumstances, this is great! It’s what we would term specialization, allowing citizens in country D to focus on what they’re relatively better at.
    • In the long run, the strategy would be unprofitable for country C’s government, and it will eventually either go bankrupt or have to halt the policy.
    • However, since we’re discussing national security…
  • When war breaks out, country C can simply block all steel exports
  • Country D will face a supply chain crisis. Its domestic steel production is low, it hasn’t invested in better tools and technology for steel production, and it will have to quickly and inefficiently produce enough steel to keep up with the war effort.

Tying it together

The best argument I can pull together from these points is that tariffs can be used to place the United States in a stronger position for future competition with geopolitical competitors. Instead of allowing poor labor practices in other countries to drag down the standard of living for Americans, tariffs will artificially inflate the price of incoming goods. Instead of rewarding other countries that pollute, tariffs will extract a penalty from those countries, properly allocating the costs of the negative externalities to those polluting nations. And finally, by incentivizing an increase in domestic production across the board, the US is set up for more autonomy in the case of escalation (through either embargos or full-on warfare), protecting its interests.

It’s the best argument I can make. Others can probably critique my points as well as provide better justifications than I have. I’d love to see those in the comments. The real question is: do the arguments in favor of these tariffs justify the costs many of us anticipate seeing: higher product costs, trade wars, decreased international trade, and isolationism of the US.

Personally, I don’t think the arguments add up. I’m mostly on the side of the mainstream for once. I wouldn’t have proposed tariffs in the current world environment, I wouldn’t vote in favor of them, and I wouldn’t speak out in support of them. And especially given that the proposal seems to be a flat tariff on most countries (with a higher tariff on China), it doesn’t seem to address the “negative externalities” bit at all, which would do better from targeted tariffs attempting to incentivize specific changes (like carbon emission reduction or improvement to labor laws).

Which leaves me with only one final argument in favor of the tariffs: they could be a great bluff. I think many people in the world believe that Trump would be willing to pull the trigger and enact such a policy. That gives him a really great negotiating position for whatever trade deals and other foreign policy objectives he has.

My prediction

I’m a software developer who watches politics and studies economics. My prediction on topics like this isn’t particularly informed, and is likely to be completely wrong. But I may as well put my thoughts in writing so everyone can remind me how wrong I was in the future!

I think Trump will continue to talk about tariffs in his new administration. He’ll spend more time discussing them with the press and foreign leaders than with the Republicans in Congress. There will always be a convenient reason why the tariffs aren’t proposed. And eventually, Trump will get some concessions from other nations, and will eventually “fail” to pass tariffs. The media will have a field day with it, and Trump will accuse someone somewhere of being the reason why “the greatest tax plan of all time” failed.

And maybe it’s my prediction just because I’m hoping it’s what becomes reality. Other outcomes I can foresee are much less rosy.

November 25, 2024 12:00 AM

November 21, 2024

Tweag I/O

GHC's wasm backend now supports Template Haskell and ghci

Two years ago I wrote a blog post to announce that the GHC wasm backend had been merged upstream. I’ve been too lazy to write another blog post about the project since then, but rest assured, the project hasn’t stagnated. A lot of improvements have happened after the initial merge, including but not limited to:

  • Many, many bugfixes in the code generator and runtime, witnessed by the full GHC testsuite for the wasm backend in upstream GHC CI pipelines. The GHC wasm backend is much more robust these days compared to the GHC-9.6 era.
  • The GHC wasm backend can be built and tested on macOS and aarch64-linux hosts as well.
  • Earlier this year, I landed the JSFFI feature for wasm. This lets you call JavaScript from Haskell and vice versa, with seamless integration of JavaScript async computation and Haskell’s green threading concurrency model. This allows us to support Haskell frontend frameworks like reflex & miso, and we have an example repo to demonstrate that.

And…the GHC wasm backend finally supports Template Haskell and ghci!

Show me the code!

$ nix shell 'gitlab:haskell-wasm/ghc-wasm-meta?host=gitlab.haskell.org'
$ wasm32-wasi-ghc --interactive
GHCi, version 9.13.20241102: https://www.haskell.org/ghc/  :? for help
ghci>

Or if you prefer the non-Nix workflow:

$ curl https://gitlab.haskell.org/haskell-wasm/ghc-wasm-meta/-/raw/master/bootstrap.sh | sh
...
Everything set up in /home/terrorjack/.ghc-wasm.
Run 'source /home/terrorjack/.ghc-wasm/env' to add tools to your PATH.
$ . ~/.ghc-wasm/env
$ wasm32-wasi-ghc --interactive
GHCi, version 9.13.20241102: https://www.haskell.org/ghc/  :? for help
ghci>

Both the Nix and non-Nix installation methods default to GHC HEAD, for which binary artifacts for Linux and macOS hosts, for both x86_64 and aarch64, are provided. The Linux binaries are statically linked so they should work across a wide range of Linux distros.

If you take a look at htop, you’ll notice wasm32-wasi-ghc spawns a node child process. That’s the “external interpreter” process that runs our Template Haskell (TH) splice code as well as ghci bytecode. We’ll get to what this “external interpreter” is about later, just keep in mind that whatever code is typed into this ghci session is executed on the wasm side, not on the native side.

Now let’s run some code. It’s been six years since I published the first blog post when I joined Tweag and worked on a prototype compiler codenamed “Asterius”; the first Haskell program I managed to compile to wasm was fib, time to do that again:

ghci> :{
ghci| fib :: Int -> Int
ghci| fib 0 = 0
ghci| fib 1 = 1
ghci| fib n = fib (n - 2) + fib (n - 1)
ghci| :}
ghci> fib 10
55

It works, though with <semantics>O(2n)<annotation encoding="application/x-tex">O(2^n)</annotation></semantics>O(2n) time complexity. It’s easy to do an <semantics>O(n)<annotation encoding="application/x-tex">O(n)</annotation></semantics>O(n) version, using the canonical Haskell fib implementation based on a lazy infinite list:

ghci> :{
ghci| fib :: Int -> Int
ghci| fib = (fibs !!)
ghci|   where
ghci|     fibs = 0 : 1 : zipWith (+) fibs (drop 1 fibs)
ghci| :}
ghci> fib 32
2178309

That’s still boring isn’t it? Now buckle up, we’re gonna do an <semantics>O(1)<annotation encoding="application/x-tex">O(1)</annotation></semantics>O(1) implementation… using Template Haskell!

ghci> import Language.Haskell.TH
ghci> :{
ghci| genFib :: Int -> Q Exp
ghci| genFib n =
ghci|   pure $
ghci|     LamCaseE
ghci|       [ Match (LitP $ IntegerL $ fromIntegral i) (NormalB $ LitE $ IntegerL r) []
ghci|       | (i, r) <- zip [0 .. n] fibs
ghci|       ]
ghci|   where
ghci|     fibs = 0 : 1 : zipWith (+) fibs (drop 1 fibs)
ghci| :}
ghci> :set -XTemplateHaskell
ghci> :{
ghci| fib :: Int -> Int
ghci| fib = $(genFib 32)
ghci| :}
ghci> fib 32
2178309

Joking aside, the real point is not about how to implement fib, but rather to demonstrate that the GHC wasm backend indeed supports Template Haskell and ghci now.

Here’s a quick summary of wasm’s TH/ghci support status:

  • The patch has landed in the GHC master branch and will be present in upstream release branches starting from ghc-9.12. I also maintain non-official backport branches in my fork, and wasm TH/ghci has been backported to 9.10 as well. The GHC release branch bindists packaged by ghc-wasm-meta are built from my branches.
  • TH splices that involve only pure computation (e.g. generating class instances) work. Simple file I/O also works, so file-embed works. Side effects are limited to those supported by WASI, so packages like gitrev won’t work because you can’t spawn subprocesses in WASI. The same restrictions apply to ghci.
  • Our wasm dynamic linker can load bytecode and compiled code, but the only form of compiled code it can load are wasm shared libraries. If you’re using wasm32-wasi-ghc directly to compile code that involves TH, make sure to pass -dynamic-too to ensure the dynamic flavour of object code is also generated. If you’re using wasm32-wasi-cabal, make sure shared: True is present in the global config file ~/.ghc-wasm/.cabal/config.
  • The wasm TH/ghci feature requires at least cabal-3.14 to work (the wasm32-wasi-cabal shipped in ghc-wasm-meta is based on the correct version).
  • Our novel JSFFI feature also works in ghci! You can type foreign import javascript declarations directly into a ghci session, use that to import sync/async JavaScript functions, and even export Haskell functions as JavaScript ones.
  • If you have c-sources/cxx-sources in a cabal package, those can be linked and run in TH/ghci out of the box. However, more complex forms of C/C++ foreign library dependencies like pkgconfig-depends, extra-libraries, etc. will require special care to build both static and dynamic flavours of those libraries.
  • For ghci, hot reloading and basic REPL functionality works, but the ghci debugger doesn’t work yet.

What happens under the hood?

For the curious mind, -opti-v can be passed to wasm32-wasi-ghc. This tells GHC to pass -v to the external interpreter, so the external interpreter will print all messages passed between it and the host GHC process:

$ wasm32-wasi-ghc --interactive -opti-v
GHCi, version 9.13.20241102: https://www.haskell.org/ghc/  :? for help
GHC iserv starting (in: {handle: <file descriptor: 2147483646>}; out: {handle: <file descriptor: 2147483647>})
[             dyld.so] reading pipe...
[             dyld.so] discardCtrlC
...
[             dyld.so] msg: AddLibrarySearchPath ...
...
[             dyld.so] msg: LoadDLL ...
...
[             dyld.so] msg: LookupSymbol "ghczminternal_GHCziInternalziBase_thenIO_closure"
[             dyld.so] writing pipe: Just (RemotePtr 2950784)
...
[             dyld.so] msg: CreateBCOs ...
[             dyld.so] writing pipe: [RemoteRef (RemotePtr 33)]
...
[             dyld.so] msg: EvalStmt (EvalOpts {useSandboxThread = True, singleStep = False, breakOnException = False, breakOnError = False}) (EvalApp (EvalThis (RemoteRef (RemotePtr 34))) (EvalThis (RemoteRef (RemotePtr 33))))
4
[             dyld.so] writing pipe: EvalComplete 15248 (EvalSuccess [RemoteRef (RemotePtr 36)])
...

Why is any message passing involved in the first place? There’s a past blog post which contains an overview of cross compilation issues in Template Haskell, most of the points still hold today, and apply to both TH as well as ghci. To summarise:

  • When GHC cross compiles and evaluates a TH splice, it has to load and run code that’s compiled for the target platform. Compiling both host/target code and running host code for TH is never officially supported by GHC/Cabal.
  • The “external interpreter” runs on the target platform and handles target code. Messages are passed between the host GHC and the external interpreter, so GHC can tell the external interpreter to load stuff, and the external interpreter can send queries back to GHC when running TH splices.

In the case of wasm, the core challenge is dynamic linking: to be able to interleave code loading and execution at run-time, all while sharing the same program state. Back when I worked on Asterius, it could only link a self-contained wasm module that wasn’t able to share any code/data with other Asterius-linked wasm modules at run-time.

So I went with a hack: when compiling each single TH splice, just link a temporary wasm module and run it, get the serialized result and throw it away! That completely bypasses the need to make a wasm dynamic linker. Needless to say, it’s horribly slow and doesn’t support cross-splice state or ghci. Though it is indeed sufficient to support compiling many packages that use TH.

Now it’s 2024, time to do it the right way: implement our own wasm dynamic linker! Some other toolchains like emscripten also support dynamic linking of wasm, but there’s really no code to borrow here: each wasm dynamic linker is tailored to that toolchain’s specific needs, and we have JSFFI-related custom sections in our wasm code that can’t be handled by other linkers anyway.

Our wasm dynamic linker supports loading exactly one kind of wasm module: wasm shared libraries. This is something that you get by compiling C with wasm32-wasi-clang -shared, which enables generation of position-independent code. Such machine code can be placed anywhere in the address space, making it suitable for run-time code loading. A wasm shared library is yet another wasm module; it imports the linear memory and function table, and you can specify any base address for memory data and functions.

So I rolled up my sleeves and got to work. Below is a summary of the journey I took towards full TH & ghci support in the GHC wasm backend:

  • Step one was to have a minimum NodeJS script to load libc.so: it is the bottom of all shared library dependencies, the first and most important one to be loaded. It took me many cans of energy drink to debug mysterious memory corruptions! But finally I could invoke any libc function and do malloc/free, etc. from the NodeJS REPL, with the wasm instance state properly persisted.
  • Then load multiple shared libraries up to libc++.so and running simple C++ snippets compiled to .so. Dependency management logic of shared libraries is added at this step: the dynamic linker traverses the dependency tree of a .so, spawns async WebAssembly.compile tasks, then sequentially loads the dynamic libraries based on their topological order.
  • Then figure out a way to emit wasm position-independent-code from GHC’s wasm backend’s native code generator. The GHC native code generator emits a .s assembly file for the target platform, and while assembly format for x86_64 or aarch64, etc. is widely taught, there’s really no tutorial nor blog post to teach me about assembly syntax for wasm! Luckily, learning from Godbolt output examples was easy enough and I quickly figured out how the position-independent entities are represented in the assembly syntax.
  • The dynamic linker can now load the Haskell ghci shared library! It contains the default implementation of the external interpreter; it almost worked out of the box, though the linker needed some special logic to handle the piping logic across wasm/JS and the host GHC process.
  • In ghci, the logic to load libraries, lookup symbols, etc. are calling into the RTS linker on other platforms. Given all the logic exists on the JS side instead of C for wasm, they are patched to call back into the linker using JSFFI imports.
  • The GHC build system and driver needed quite a few adjustments, to ensure that shared libraries are generated for the wasm target when TH/ghci is involved. Thanks to Matthew Pickering for his patient and constructive review of my patch, I was able to replace many hacks in the GHC driver with more principled approaches.
  • The GHC driver also needs to learn to handle the wasm flavour of the external interpreter. Thanks to the prior work of the JS backend team here, my life is a lot easier when adding wasm external interpreter logic.
  • The GHC testsuite also needed quite a bit of work. In the end, there are over 1000 new test case passes after I flip on TH/ghci support for the wasm target.

What comes next?

The GHC wasm backend TH/ghci feature is way faster and more robust than what I hacked in Asterius back then. One nice example I’d like to show off here is pandoc-wasm: it’s finally possible to compile our beloved pandoc tool to wasm again since Asterius is deprecated.

The new pandoc-wasm is more performant not only at run-time, but also at compile-time. On a GitHub-hosted runner with just 4 CPU cores and 16 GB of memory, it takes around 16min to compile pandoc from scratch, and the time consumption can even be halved on my own laptop with peak memory usage at around 10.8GB. I wouldn’t doubt that time/memory usage can triple or more with legacy GHC-based compilers like Asterius or GHCJS to compile the same codebase!

The work on wasm TH/ghci is not fully finished yet. I do have some things in mind to work on next:

  • Support running the wasm external interpreter in the browser via puppeteer. So your ghci session can connect to the browser, all your Haskell code runs in the browser main thread, and all JSFFI logic in your code can access the browser’s window context. This would allow you to do Haskell frontend livecoding using ghci.
  • Support running an interactive ghci session within the browser. Which means a truly client side Haskell playground in the browser. It’ll only support in-memory bytecode, since it can’t invoke compiler processes to do any heavy lifting, but it’s still good for teaching purposes.
  • Maybe make it even faster? Performance isn’t my concern right now, though I haven’t done any serious profiling and optimization in the wasm dynamic linker either, so we’ll see.
  • Fix ghci debugger support.

You’re welcome to join the Haskell wasm Matrix room to chat about the GHC wasm backend. Do get in touch if you feel it is useful to your project!

November 21, 2024 12:00 AM

November 20, 2024

Well-Typed.Com

The Haskell Unfolder Episode 36: concurrency and the FFI

Today, 2024-11-20, at 1930 UTC (11:30 am PST, 2:30 pm EST, 7:30 pm GMT, 20:30 CET, …) we are streaming the 36th episode of the Haskell Unfolder live on YouTube.

The Haskell Unfolder Episode 36: concurrency and the FFI

There are two primary ways to import C functions in Haskell: “unsafe” and “safe”. We will first briefly recap what this means: unsafe functions are fast but cannot call back into Haskell, safe functions are much slower but can. As we will see in this episode, however, there are many more differences between unsafe and safe functions, especially in a concurrent setting. In particular, safe functions are not always safer!

About the Haskell Unfolder

The Haskell Unfolder is a YouTube series about all things Haskell hosted by Edsko de Vries and Andres Löh, with episodes appearing approximately every two weeks. All episodes are live-streamed, and we try to respond to audience questions. All episodes are also available as recordings afterwards.

We have a GitHub repository with code samples from the episodes.

And we have a public Google calendar (also available as ICal) listing the planned schedule.

There’s now also a web shop where you can buy t-shirts and mugs (and potentially in the future other items) with the Haskell Unfolder logo.

by andres, edsko at November 20, 2024 12:00 AM

November 18, 2024

Haskell Interlude

58: ICFP 2024

In this episode, Matti and Sam traveled to the International Conference on Functional Programming (ICFP 2024) in Milan, Italy, and recorded snippets with various participants, including keynote speakers, Haskell legends, and organizers.

by Haskell Podcast at November 18, 2024 04:00 PM

Michael Snoyman

My Path to Bitcoin

Since deciding to write more blog posts again, I’ve drafted and thrown away a few different versions of this blog post. Originally, I was going to try to explain how Bitcoin works, or motivate to others why they should care about it. But the reality is that there is already much better material out there than I can produce. So instead, I’ve decided to make this much more personal: my own journey, why I ultimately changed my opinion and decided to embrace Bitcoin, and then answer some questions and comments I’ve received from others.

If you really do want to learn the best arguments in favor of Bitcoin (as opposed to “price goes up” style arguments), here is my list of top resources. I’ll try to keep this list up to date over time:

OK, without further ado, my path to Bitcoin!

Before Bitcoin

My worldview is highly impactful on this discussion, so I need to get the cliffnotes of my outlook clarified. I come from a financially conservative background. I grew up believing that dollars were king, fixed income savings were good, and the stock market was little more than a casino. I ran most of my financial investments like that for most of my life, either investing in property (i.e., the house I own, not extra investment properties) or keeping cash, US treasuries, and Certificates of Deposit (CDs). When I went truly crazy, I would occasionally invest in the S&P 500 index, the least gambly of gambling. (I’ve commented on that previously.)

I studied actuarial science in school, and worked as an actuary for a few years before moving to Israel. Being an actuary definitely put me in the world of finance, but on a much more theoretical as opposed to practical side. As a small example, I learned all the financial mathematics involved in pricing of options, but never learned any real-life strategy for trading options. And that suited me just fine: options are even more risky than normal stock market gambling!

What that theoretical background gave me was two relevant bodies of knowledge: statistics for risk analysis, and economics. Overall I loved my economics courses. The simple concepts of scarcity and competition fit so perfectly with human psychology and perfectly describe so much of the world around us. (Side note: I don’t just mean in financial matters, I strongly recommend everyone learns the basics of economics to better understand the world at large.)

One minor note about studying economics. My courses were roughly broken down into micro and macro economics. Microeconomics clicked with me from day 1. Things like “price caps lead to shortages” are so simple to understand and evidently true that I can explain them to my 7 year old without a problem.

It was macroeconomics that I couldn’t understand. Why did all the rules of microeconomics–intervention prevents free markets from discovering equilibrium–go out the door when you went to the macro level? Why was it that the government needed to intervene by printing money and spurring the market into action by increasing spending?

I was just a math student larping as an economist, not a true economist, so I incorrectly assumed that this was all just beyond my feeble understanding. And I happily lived my life for about 15 years as an actuary/programmer who enjoyed some economics and lived a fiscally conservative lifestyle.

First hints of Bitcoin

I heard about Bitcoin relatively early in its existence. My wife and I even discussed buying some when it was still under a dollar. To our chagrin, we didn’t. This fit in nicely with my “avoid casinos” approach. Bitcoin was simply a get-rich-quick scam, a Ponzi scheme, fake internet money, you name it. To be completely fair, I didn’t fully come to those conclusions at the time, but it was more-or-less what I thought.

What’s amazing is that I had just lived through the Great Financial Crisis of 2008. I was painfully aware of what a financial disaster we had. I had already graduated at the time, but I was still close to many friends from UCLA, many in economics and financial majors. I remember discussing how ridiculous “too big to fail” was. The phrase I learned much later, Privatizing Profits and Socializing Losses, was exactly how we all felt, and I knew it was setting up incorrect incentive structures. But I didn’t spend enough time to recognize that there was any connection between that disaster and this new Bitcoin fake money scam.

I spent close to 7 years having virtually nothing to do with blockchain and cryptocurrency. At some point around 2016, through my work in the Haskell space, I ended up consulting on a few blockchain projects, including building blockchains for others. I also ended up on a lot of sales calls as a sales engineer.

I won’t call out any specific projects. But suffice it to say that I walked away completely believing that all of “crypto” was a scam. And to quote a phrase from Jewish literature, אוי לרשע אוי לשכנו (woe to the evil one, woe to his neighbor). I must have opened up a Binance account at some point in this, and probably bought some crypto to get a feel for how it all worked. But I had no interest in being part of the space. Blockchain seemed like a cool technology on its own early in its hype cycle. But cryptocurrency itself wasn’t for me.

COVID-19, money printer goes brrrr

When COVID-19 began kicking off, many of us saw the huge amount of money printing, paired with forced lack of productivity due to various COVID restrictions. The combination meant that, simultaneously, true productivity in the economy slowed down (meaning: less goods) and more money was chasing those goods. A lot more money. Money printer goes brrr.

Money printer goes brr

As much as I’d buried my head in the sand during the 2008 financial crisis, things were different this time.

  1. I was older (and hopefully more mature), more financially aware, and had more money at risk.
  2. In 2008, I was a young, newly married man dealing with his first job and learning how to raise my first child. I didn’t have a lot of free thought cycles around. While I wasn’t exactly lounging around in 2020, I had more time available to think about the problem.
  3. Thanks to developments at work, I ended up working in the cryptocurrency space again.

Putting these things together, I paid attention, did some level of research, and decided inflation terrified me. I could easily see the value of all my savings go down dramatically. After some soul searching, I decided it was time to start abandoning my highly-risk-averse savings strategy, and begin embracing a more diversified investment strategy. This meant dividing among:

  • Cash, CDs, and treasury bonds. Higher interest rates certainly made this pretty attractive to myself at the time.
  • Buying into the S&P 500 index. While I still had my original financial outlook of the stock market being little more than a gamble, I also understood risk and empirical data, and investing in the index would–on average and assuming markets continued operating the same way as the past–continue to go up. Hopefully somewhere in line with the officially reported inflation numbers, if you believe those.
  • And the scariest and riskiest of all: crypto.

I want to be clear that this was not a purchase of me saying “I’m a total believer in crypto, this thing is going to skyrocket.” It was simple risk hedging. I was afraid of the fiat currency system, i.e. dollars, losing a huge amount of their value. That made stock investments far less risky in comparison. But my faith in the stock market wasn’t exactly stalwart. With all the financial system changes coming as a result of COVID-19 restrictions and money printing, I was looking for any kind of safety net.

I invested in crypto like I would invest in stocks: I chose a basket of the top performers at the time, diversified my money into them, and hoped for the best.

Terra collapse

This is a side note almost not worth including, but some people may be comforted by this part of the story.

The work I’d been doing in crypto at the time had been on the Terra blockchain. For those who aren’t aware, Terra had a systemic collapse in May 2022 due to the depegging of the TerraUSD (a.k.a. UST) stablecoin. Or, as the jokes correctly put it, not-so-stablecoin.

I lost a significant amount of money on that. With my fiscally conservative background, this was essentially a worst-case-scenario making all of my greatest fears of risky investment come true.

At this point, however, I had learned enough about the crypto boom/bust cycles. Miriam (my wife) and I spent a lot of time discussing, and decided we’d ride out the bear market, not simply run away screaming.

The point of the inclusion of this here: I basically went through the worst possible financial outcome I could imagine. And it wasn’t nearly as bad as I thought it would be. I lost money. That’s never fun. But the true moral of the story I’m telling is that everything in the financial world is a massive jumble of different risks these days. There isn’t a single safe haven, at least not like gold was in the 1800s.

The important part for everyone: don’t fall into the sunk cost fallacy! If you’ve taken losses on an investment, try to stay calm, look at it rationally, and make the best decision you possibly can with current knowledge and no emotional input.

The Bitcoin maxi path

I’m almost embarrassed by this next sentence. Despite working in the crypto space off-and-on for about 8 years now, and despite spending 3 years full-time working on crypto and Decentralized Finance products, I only recently understood the connection between the beginning of this story (the Great Financial Crisis of 2008) and Bitcoin.

You see, I shouldn’t have stuck my head in the sand back then. Had I had more time and more curiosity, I could have discovered a few truths. Firstly, Bitcoin is directly targeted at addressing the unsound system that led to the Great Recession. Secondly, Bitcoin and crypto–at least in general–are very different beasts. My guess is that, like me, most people are more afraid of Bitcoin because they have it mentally associated with the rest of the crypto world and all the fun casino-like-games it contains. And finally, I discovered that I knew both more and less about economics than I thought I did.

You’ll see people in the Bitcoin world talk about “doing your 100 hours” before you understand Bitcoin. I think I’ve only completed that in the past few months. It’s from watching a lot of random YouTube videos, chatting with people on X and Reddit, reading blog posts, and most powerfully and most recently: reading The Bitcoin Standard. I haven’t even finished it yet, I’m looking forward to the rest. But already, it’s snapped a lot of my economics understanding into focus.

In particular, it’s given me a much better understanding of the concept of money than I ever received in my university economics courses. (Or I’m simply listening better this time around.) It’s also helped me understand why macroeconomics never clicked with me. Anyone who’s read the book will know that it’s not exactly gentle in its treatment of Keynesian and Monetarist economic theories. Getting a clear breakdown of the different schools of thought, how they compare to Austrian economics, and the author’s very opinionated views on them, has been one of the most intellectually stimulating things I’ve done since learning monads and the borrow checker.

Side note: I wish more texts included the authors’ direct and unapologetic opinions like this. If someone has a similarly direct take-down of the arguments in The Bitcoin Standard, I would love to read it, please pass it along.

Putting that all together: a monetary system which allows for no inflation and grants no party central control is far more powerful than I originally understood. My opinion evolved somewhat slowly from “magical internet money” to “potentially good risk hedge in a balanced portfolio.” It went really quickly from that to “oh, I get it, this is the hardest money that exists, it will store value for the foreseeable future.”

I no longer look at my investment in Bitcoin as a risk. I’ve switched worldviews completely. Every other asset I hold is the risk. I can be hurt significantly by this of course. If Bitcoin has another bear market and I need to buy groceries, I’ll be taking a huge loss on the Bitcoin I need to liquidate. (I discuss how I address this in my buying Bitcoin or selling dollars post.)

Aren’t I scared?

Yes. I think we’re living in some of the scariest financial times of our lives. But my viewpoint now is that everything is risky, there’s no escaping it. There isn’t a safe-haven in dollars and potentially big gains in other assets. Everything is on a precipice right now. And my honest belief is that, for the long haul, Bitcoin is the least risky of all potentially stores of value.

Am I right? We’ll know in 20 years.

My recommendation to others

This is my personal journey. This story shouldn’t convince anyone to do anything with their money. I haven’t presented any true arguments in favor of Bitcoin here, just some comments and references to larger ideas.

If you take anything away from this, I hope it’s this: there’s some guy out there who’s really scared of risky investments, has at least some formal training in economics and finance, wanted nothing to do with Bitcoin, and then decided he was completely wrong and embraced it.

I hope that motivates those of you who are opposed to Bitcoin to do a bit more research. Challenge the ideas you read. Challenge the ideas you already hold. Ask questions. Get in debates. Treat this seriously. Because whichever way you decide, for or against Bitcoin, will likely have a major impact on the rest of your life.

Random Q&A

I received some questions previously on my Bitcoin vs gold blog post, and haven’t had a chance to answer them in another post. Instead of a dedicated post for that, this seems like a good time to address that.

Bitcoin was originally marketed as anonymous. Nowadays we consider it pseudonymous at best. Do you believe this is an important feature for money? Do you believe it is actually *desirable*?

Yes, I do. The economy works best by all people being completely free to make whatever financial decisions they believe are best for themselves. Having non-anonymous financial transactions will add friction to that system. Each time I make a purchase, I’ll wonder if others will judge me unfavorably for it.

Having the Bitcoin chain work as it does today with all transactions being publicly visible is great for transparency. Publicly held companies publishing their wallet addresses is great too. But there needs to be some room for truly anonymous interactions. And I believe we’ll see that space expand over time as Bitcoin moves from “great speculative asset” to “money.” The Lightning Network already does a pretty great job at this.

If anonymity is important, why not some more modern cryptocurrency?

Because it isn’t needed. Bitcoin has one thing few other cryptocurrencies have, and one thing none of them have:

  1. Bitcoin is a scarce resource. If you look at the tokenomics of most other cryptocurrencies, they are massively complex and usually inflationary in some way. None of those can ever act as a true store of value over time.
  2. Bitcoin is first. That’s obviously not entirely true; there are plenty of other attempts at digital money that predate it, arguably the modern dollar is also a “digital asset,” etc. But for this new class of “algorithmically scarce, decentralized, digital money,” Bitcoin is the first on the scene. As a result, it has major network effects, and will be essentially impossible to dethrone until someone comes up with a fundamentally better approach. No one has so far.

How would/should society be organized in a world where a lot of modern taxation would likely(?) be very hard due to completely unobservable cash flows? Currently I think a lot of this relies on cash being local, and the sources and sinks of it being relatively traceable.

Exposing my politics a bit more, I’m totally in favor of a society based far less on taxing the populace.

That said, I don’t think Bitcoin will fundamentally change this. People who have wanted to evade taxes have found ways in the past, and the government has always found ways to increase its surveillance apparatus to keep up with them. It’s an arms race that will continue.

If we get to a point where all money is Bitcoin and everyone transacts daily in Lightning wallets, I have no doubt that the local supermarket will fully comply with all reporting rules to the government on purchases, property purchases will require justification of where your funds came from, and other such things that will ensure the majority of financial transactions stay in the legal, taxable world.

Modern economic theory indeed often aims at a ~2% annual inflation (not more, not less), as an incentive to keep investing in ways that presumably benefit the society instead of sitting on money. Deflation is considered dangerous for the same reason. It seems to me, perhaps naively, that zero inflation would make investment a zero sum game, which seemingly removes a lot of the incentives. What do you think of this?

I’m still coming to terms on inflation vs deflation. Until recently, I had naively assumed that everyone believed deflation was awesome because all of society benefits from getting more stuff, and inflation was the penalty for letting the government print money. Apparently that is far from the modern well-accepted outlook.

I may be making a mistake in my interpretation, but based on what I’ve read in The Bitcoin Standard, here’s an answer that I hope is accurate. Whenever you have money, you can do one of two things with it: use it for immediate consumption, or defer its usage to later. The real interest rate (nominal interest - inflation rate) gives an indication of how much you’ll be rewarded for deferring usage of your money till later.

With deflation (i.e., negative inflation), you end up with a high real interest. This means “if you hold off on consumption you’ll get even more stuff in the future.” This incentivizes a low time preference: I don’t value immediate consumption very much versus later consumption.

Coming back to your question: the “sitting on money” idea comes directly from Keynesian economics, where the focus is on spending, consumerism, and short term money flows. An Austrian approach is completely different. Sitting on money means that I allow productive capability today to be put into capital goods, things which will increase production in the future, thus making more stuff available in the future at lower prices, thus further fueling deflation.

Note that another way of looking at interest rates is the cost of capital. It’s a price just like any other price in the economy. If you have the government set the price of apples, you’ll either have shortages or surpluses. So what happens to capital when the central bank gets to set the price of capital by controlling interest rates?

There’s a lot more to this topic, and I’m pretty fresh to it, so I’ll call it there. I’d definitely recommend The Bitcoin Standard for more on the topic.

November 18, 2024 12:00 AM

Brent Yorgey

Competitive Programming in Haskell: Union-Find, part II

Competitive Programming in Haskell: Union-Find, part II

Posted on November 18, 2024
Tagged , ,

In my previous post I explained how to implement a reasonably efficient union-find data structure in Haskell, and challenged you to solve a couple Kattis problems. In this post, I will (1) touch on a few generalizations brought up in the comments of my last post, (2) go over my solutions to the two challenge problems, and (3) briefly discuss generalizing the second problem’s solution to finding max-edge decompositions of weighted trees.

Generalizations

Before going on to explain my solutions to those problems, I want to highlight some things from a comment by Derek Elkins and a related blog post by Philip Zucker. The first is that instead of (or in addition to) annotating each set with a value from a commutative semigroup, we can also annotate the edges between nodes with elements from a group (or, more generally, a groupoid). The idea is that each edge records some information about, or evidence for, the relationship between the endpoints of the edge. To compute information about the relationship between two arbitrary nodes in the same set, we can compose elements along the path between them. This is a nifty idea—I have never personally seen it used for a competitive programming problem, but it probably has been at some point. (It kind of makes me want to write such a problem!) And of course it has “real” applications beyond competitive programming as well. I have not actually generalized my union-find code to allow edge annotations; I leave it as an exercise for the reader.

The other idea to highlight is that instead of thinking in terms of disjoint sets, what we are really doing is building an equivalence relation, which partitions the elements into disjoint equivalence classes. In particular, we do this by incrementally building a relation \(R\), where the union-find structure represents the reflexive, transitive, symmetric closure of \(R\). We start with the empty relation \(R\) (whose reflexive, transitive, symmetric closure is the discrete equivalence relation, with every element in its own equivalence class); every \(\mathit{union}(x,y)\) operation adds \((x,y)\) to \(R\); and the \(\mathit{find}(x)\) operation computes a canonical representative of the equivalence class of \(x\). In other words, given some facts about which things are related to which other things (possibly along with some associated evidence), the union-find structure keeps track of everything we can infer from the given facts and the assumption that the relation is an equivalence.

Finally, through the comments I also learned about other potentially-faster-in-practice schemes for doing path compression such as Rem’s Algorithm; I leave it for future me to try these out and see if they speed things up.

Now, on to the solutions!

Duck Journey

In Duck Journey, we are essentially given a graph with edges labelled by bitstrings, where edges along a path are combined using bitwise OR. We are then asked to find the greatest possible value of a path between two given vertices, assuming that we are allowed to retrace our steps as much as we want.Incidentally, if we are not allowed to retrace our steps, this problem probably becomes NP-hard.

If we can retrace our steps, then on our way from A to B we might as well visit every edge in the entire connected component, so this problem is not really about path-finding at all. It boils down to two things: (1) being able to quickly test whether two given vertices are in the same connected component or not, and (2) computing the bitwise OR of all the edge labels in each connected component.

One way to solve this would be to first use some kind of graph traversal, like DFS, to find the connected components and build a map from vertices to component labels; then partition the edges by component and take the bitwise OR of all the edge weights in each component. To answer queries we could first look up the component label of the two vertices; if the labels are the same then we look up the total weight for that component.

This works, and is in some sense the most “elemantary” solution, but it requires building some kind of graph data structure, storing all the edges in memory, doing the component labelling via DFS and building another map, and so on. An alternative solution is to use a union-find structure with a bitstring annotation for each set: as we read in the edges in the input, we simply union the endpoints of the edge, and then update the bitstring for the resulting equivalence class with the bitstring for the edge. If we take a union-find library as given, this solution seems simpler to me.

First, some imports and the top-level main function. (See here for the ScannerBS module.)

{-# LANGUAGE ImportQualifiedPost #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE RecordWildCards #-}

module Main where

import Control.Category ((>>>))
import Control.Monad.ST
import Data.Bits
import Data.ByteString.Lazy.Char8 (ByteString)
import Data.ByteString.Lazy.Char8 qualified as BS

import ScannerBS
import UnionFind qualified as UF

main = BS.interact $ runScanner tc >>> solve >>> format

format :: [Maybe Int] -> ByteString
format = map (maybe "-1" (show >>> BS.pack)) >>> BS.unlines

Next, some data types to represent the input, and a Scanner to read it.

-- Each edge is a "filter" represented as a bitstring stored as an Int.
newtype Filter = Filter Int
  deriving (Eq, Show)

instance Semigroup Filter where
  Filter x <> Filter y = Filter (x .|. y)

filterSize :: Filter -> Int
filterSize (Filter f) = popCount f

data Channel = Channel UF.Node UF.Node Filter deriving (Eq, Show)
data TC = TC {n :: !Int, channels :: [Channel], queries :: [(Int, Int)]}
  deriving (Eq, Show)

tc :: Scanner TC
tc = do
  n <- int
  m <- int
  q <- int
  channels <- m >< (Channel <$> int <*> int <*> (Filter <$> int))
  queries <- q >< pair int int
  return TC {..}

Finally, here’s the solution itself: process each channel with a union-find structure, then process queries. The annoying thing, of course, is that this all has to be in the ST monad, but other than that it’s quite straightforward.

solve :: TC -> [Maybe Int]
solve TC {..} = runST $ do
  uf <- UF.new (n + 1) (Filter 0)
  mapM_ (addChannel uf) channels
  mapM (answer uf) queries

addChannel :: UF.UnionFind s Filter -> Channel -> ST s ()
addChannel uf (Channel a b f) = do
  UF.union uf a b
  UF.updateAnn uf a f

answer :: UF.UnionFind s Filter -> (Int, Int) -> ST s (Maybe Int)
answer uf (a, b) = do
  c <- UF.connected uf a b
  case c of
    False -> pure Nothing
    True -> Just . filterSize <$> UF.getAnn uf a

Inventing Test Data

In Inventing Test Data, we are given a tree \(T\) with integer weights on its edges, and asked to find the minimum possible weight of a complete graph for which \(T\) is the unique minimum spanning tree (MST).



Let \(e = (x,y)\) be some edge which is not in \(T\). There must be a unique path between \(x\) and \(y\) in \(T\) (so adding \(e\) to \(T\) would complete a cycle); let \(m\) be the maximum weight of the edges along this path. Then I claim that we must give edge \(e\) weight \(m+1\):

  • On the one hand, this ensures \(e\) can never be in any MST, since an edge which is strictly the largest edge in some cycle can never be part of an MST (this is often called the “cycle property”).
  • Conversely, if \(e\) had a weight less than or equal to \(m\), then \(T\) would not be a MST (or at least not uniquely): we could remove any edge in the path from \(x\) to \(y\) through \(T\) and replace it with \(e\), resulting in a spanning tree with a lower (or equal) weight.

Hence, every edge not in \(T\) must be given a weight one more than the largest weight in the unique \(T\)-path connecting its endpoints; these are the minimum weights that ensure \(T\) is a unique MST.

A false start

At first, I thought what we needed was a way to quickly compute this max weight along any path in the tree (where by “quickly” I mean something like “faster than linear in the length of the path”). There are indeed ways to do this, for example, using a heavy-light decomposition and then putting a data structure on each heavy path that allows us to query subranges of the path quickly. (If we use a segment tree on each path we can even support operations to update the edge weights quickly.)

All this is fascinating, and something I may very well write about later. But it doesn’t actually help! Even if we could find the max weight along any path in \(O(1)\), there are still \(O(V^2)\) edges to loop over, which is too big. There can be up to \(V = 15\,000\) nodes in the tree, so \(V^2 = 2.25 \times 10^8\). A good rule of thumb is \(10^8\) operations per second, and there are likely to be very high constant factors hiding in whatever complex data structures we use to query paths efficiently.

So we need a way to somehow process many edges at once. As usual, a change in perspective is helpful; to get there we first need to take a slight detour.

Kruskal’s Algorithm

It helps to be familiar with Kruskal’s Algorithm, which is the simplest algorithm I know for finding minimum spanning trees:

  • Sort the edges from smallest to biggest weight.
  • Initialize \(T\) to an empty set of edges.
  • For each edge \(e\) in order from smallest to biggest:
    • If \(e\) does not complete a cycle with the other edges already in \(T\), add \(e\) to \(T\).

To efficiently check whether \(e\) completes a cycle with the other edges in \(T\), we can use a union-find, of course: we maintain equivalence classes of vertices under the “is connected to” equivalence relation; adding \(e\) would complete a cycle if and only if the endpoints of \(e\) are already connected to each other in \(T\). If we do add an edge \(e\), we can just \(\mathit{union}\) its endpoints to properly maintain the relation.

A change of perspective

So how does this help us solve “Inventing Test Data”? After all, we are not being directly asked to find a minimum spanning tree. However, it’s still helpful to think about the process Kruskal’s Algorithm would go through, in order to choose edge weights that will force it to do what we want (i.e. pick all the edges in \(T\)). That is, instead of thinking about each individual edge not in \(T\), we can instead think about the edges that are in \(T\), and what must be true to force Kruskal’s algorithm to pick each one.

Suppose we are part of the way through running Kruskal’s algorithm, and that it is about to consider a given edge \(e = (x,y) \in T\) which has weight \(w_e\). At this point it has already considered any edges with smaller weight, and (we shall assume) chosen all the smaller-weight edges in \(T\). So let \(X\) be the set of vertices reachable from \(x\) by edges in \(T\) with weight less than or equal to \(w_e\), and similarly let \(Y\) be those reachable from \(y\). Kruskal’s algorithm will pick edge \(e\) after checking that \(X\) and \(Y\) are disjoint.



Think about all the other edges from \(X\) to \(Y\): all of them must have weight greater than \(w_e\), because otherwise Kruskal’s algorithm would have already considered them earlier, and used one of them to connect \(X\) and \(Y\). In fact, all of these edges must have weight \(w_e + 1\), as we argued earlier, since \(e\) is the largest-weight edge on the \(T\)-path between their endpoints (all the other edges on these paths were already chosen earlier and hence have smaller weight). The number of such edges is just \(|X| |Y| - 1\) (there is an edge for every pair of vertices, but we do not want to count \(e\) itself). Hence they contribute a total of \((|X||Y| - 1)(w_e + 1)\) to the sum of edge weights.

Hopefully the solution is now becoming clear: we process the edges of \(T\) in order from smallest to biggest, using a union-find to keep track equivalence classes of connected vertices so far. For each edge \((x,y)\) we look up the sizes of the equivalence classes of \(x\) and \(y\), add \((|X||Y| - 1)(w_e + 1)\) to a running total, and union. This accounts for all the edges not in \(T\); finally we must also add the weights of the edges in \(T\) themselves.

First some standard pragmas and imports, along with some data types and a Scanner to parse the input. Note the custom Ord instance for Edge, so we can sort edges by weight.

{-# LANGUAGE ImportQualifiedPost #-}
{-# LANGUAGE RecordWildCards #-}

import Control.Category ((>>>))
import Control.Monad.ST
import Data.ByteString.Lazy.Char8 qualified as BS
import Data.List (sort)
import Data.Ord (comparing)
import Data.STRef
import ScannerBS
import UnionFind qualified as UF

main = BS.interact $ runScanner (numberOf tc) >>> map (solve >>> show >>> BS.pack) >>> BS.unlines

data Edge = Edge {a :: !Int, b :: !Int, w :: !Integer}
  deriving (Eq, Show)

instance Ord Edge where
  compare = comparing w

data TC = TC {n :: !Int, edges :: [Edge]}
  deriving (Eq, Show)

tc :: Scanner TC
tc = do
  n <- int
  edges <- (n - 1) >< (Edge <$> int <*> int <*> integer)
  return TC {..}

Finally, the (remarkably short) solution proper: we sort the edges and process them from smallest to biggest; for each edge we update an accumulator according to the formula discussed above. Since we’re already tied to the ST monad anyway, we might as well keep the accumulator in a mutable STRef cell.

solve :: TC -> Integer
solve TC {..} = runST $ do
  uf <- UF.new (n + 1)
  total <- newSTRef (0 :: Integer)
  mapM_ (processEdge uf total) (sort edges)
  readSTRef total

processEdge :: UF.UnionFind s -> STRef s Integer -> Edge -> ST s ()
processEdge uf total (Edge a b w) = do
  modifySTRef' total (+ w)
  sa <- UF.size uf a
  sb <- UF.size uf b
  modifySTRef' total (+ (fromIntegral sa * fromIntegral sb - 1) * (w + 1))
  UF.union uf a b

Max-edge decomposition



Incidentally, there’s something a bit more general going on here: for a given nonempty weighted tree \(T\), a max-edge decomposition of \(T\) is a binary tree defined as follows:

  • The max-edge decomposition of a trivial single-vertex tree is a single vertex.
  • Otherwise, the max-edge decomposition of \(T\) consists of a root node with two children, which are the max-edge decompositions of the two trees that result from deleting a largest-weight edge from \(T\).

Any max-edge decomposition of a tree \(T\) with \(n\) vertices will have \(n\) leaf nodes and \(n-1\) internal nodes. Typically we think of the leaf nodes of the decomposition as being labelled by the vertices of \(T\), and the internal nodes as being labelled by the edges of \(T\).

An alternative way to think of the max-edge decomposition is as the binary tree of union operations performed by Kruskal’s algorithm while building \(T\), starting with each vertex in a singleton leaf and then merging two trees into one with every union operation. Thinking about, or even explicitly building, this max-edge decomposition occasionally comes in handy. For example, see Veður and Toll Roads.

Incidentally, I can’t remember whether I got the term “max-edge decomposition” from somewhere else or if I made it up myself; in any case, regardless of what it is called, I think I first learned of it from this blog post by Petr Mitrichev.

<noscript>Javascript needs to be activated to view comments.</noscript>

by Brent Yorgey at November 18, 2024 12:00 AM

November 17, 2024

Eric Kidd

9½ years of Rust in production (and elsewhere)

The first stable release of Rust was on May 15, 2015, just about 9½ years ago. My first “production” Rust code was a Slack bot, which talked to GoCD to control the rollout of a web app. This was utterly reliable. And so new bits of Rust started popping up.

I’m only going to talk about open source stuff here. This will be mostly production projects, with a couple of weekend projects thrown in. Each project will ideally get its own post over the next couple of months.

Planned posts

Here are some of the tools I’d like to talk about:

  1. Moving tables easily between many databases (dbcrossbar)
  2. 700-CPU batch jobs
  3. Geocoding 60,000 addresses per second
  4. Interlude: Neural nets from scratch in Rust
  5. Lots of CSV munging
  6. Interlude: Language learning using subtitles, Anki, Whisper and ChatGPT
  7. Transpiling BigQuery SQL for Trino (a work in progress)

I’ll update this list to link to the posts. Note that I may not get to all of these!

Maintaining Rust & training developers

One of the delightful things about Rust is the low rate of “bit rot”. If something worked 5 years ago—and if it wasn’t linked against the C OpenSSL libraries—then it probably works unchanged today. And if it doesn’t, you can usually fix it in 20 minutes. This is largely thanks to Rust’s “stability without stagnation” policy, the Edition system, and the Crater tool which is used to nest new Rust releases against the entire ecosystem.

The more interesting questions are (1) when should you use Rust, and (2) how do you make sure your team can use it?

Read more…

November 17, 2024 04:08 PM

November 14, 2024

Gabriella Gonzalez

The Haskell inlining and specialization FAQ

The Haskell inlining and specialization FAQ

This is a post is an FAQ answering the most common questions people ask me related to inlining and specialization. I’ve also structured it as a blog post that you can read from top to bottom.

What is inlining?

“Inlining” means a compiler substituting a function call or a variable with its definition when compiling code. A really simple example of inlining is if you write code like this:

module Example where

x :: Int
x = 5

y :: Int
y = x + 1

… then at compile time the Haskell compiler can (and will) substitute the last occurrence of x with its definition (i.e. 5):

y :: Int
y = 5 + 1

… which then allows the compiler to further simplify the code to:

y :: Int
y = 6

In fact, we can verify that for ourselves by having the compiler dump its intermediate “core” representation like this:

$ ghc -O2 -fforce-recomp -ddump-simpl -dsuppress-all Example.hs

… which will produce this output:

==================== Tidy Core ====================
Result size of Tidy Core
  = {terms: 20, types: 7, coercions: 0, joins: 0/0}

x = I# 5#

$trModule4 = "main"#

$trModule3 = TrNameS $trModule4

$trModule2 = "Example"#

$trModule1 = TrNameS $trModule2

$trModule = Module $trModule3 $trModule1

y = I# 6#

… which we can squint a little bit and read it as:

x = 5

y = 6

… and ignore the other stuff.

A slightly more interesting example of inlining is a function call, like this one:

f :: Int -> Int
f x = x + 1

y :: Int
y = f 5

The compiler will be smart enough to inline f by replacing f 5 with 5 + 1 (here x is 5):

y :: Int
y = 5 + 1

… and just like before the compiler will simplify that further to y = 6, which we can verify from the core output:

y = I# 6#

What is specialization?

“Specialization” means replacing a “polymorphic” function with a “monomorphic” function. A “polymorphic” function is a function whose type has a type variable, like this one:

-- Here `f` is our type variable
example :: Functor f => f Int -> f Int
example = fmap (+ 1)

… and a “monomorphic” version of the same function replaces the type variable with a specific (concrete) type or type constructor:

example2 :: Maybe Int -> Maybe Int
example2 = fmap (+ 1)

Notice that example and example2 are defined in the same way, but they are not exactly the same function:

  • example is more flexible and works on strictly more type constructors

    example works on any type constructor f that implements Functor, whereas example2 only works on the Maybe type constructor (which implements Functor).

  • example and example2 compile to very different core representations

In fact, they don’t even have the same “shape” as far as GHC’s core representation is concerned. Under the hood, the example function takes two extra “hidden” function arguments compared to example2, which we can see if you dump the core output (and I’ve tidied up the output a lot for clarity):

example @f $Functor = fmap $Functor (\v -> v + 1)

example2 Nothing = Nothing
example2 (Just a) = Just (a + 1)

The two extra function arguments are:

  • @f: This represents the type variable f

    Yes, the type variable that shows up in the type signature also shows up at the term level in the GHC core representation. If you want to learn more about this you might be interested in my Polymorphism for Dummies post.

  • $Functor: This represents the Functor instance for f

    Yes, the Functor instance for a type like f is actually a first-class value passed around within the GHC core representation. If you want to learn more about this you might be interested in my Scrap your Typeclasses post.

Notice how the compiler cannot optimize example as well as it can optimize example2 because the compiler doesn’t (yet) know which type constructor f we’re going to call example on and also doesn’t (yet) know which Functor f instance we’re going to use. However, once the compiler does know which type constructor we’re using it can optimize a lot more.

In fact, we can see this for ourselves by changing our code a little bit to simply define example2 in terms of example:

example :: Functor f => f Int -> f Int
example = fmap (+ 1)

example2 :: Maybe Int -> Maybe Int
example2 = example

This compiles to the exact same code as before (you can check for yourself if you don’t believe me).

Here we would say that example2 is “example specialized to the Maybe type constructor”. When write something like this:

example2 :: Maybe Int -> Maybe Int
example2 = example

… what’s actually happening under the hood is that the compiler is actually doing something like this:

example2 = example @Maybe $FunctorMaybe

In other words, the compiler is taking the more general example function (which works on any type constructor f and any Functor f instance) and then “applying” it to a specific type constructor (@Maybe) and the corresponding Functor instance ($FunctorMaybe).

In fact, we can see this for ourselves if we generate core output with optimization disabled (-O0 instead of -O2) and if we remove the -dsuppress-all flag:

$ ghc -O0 -fforce-recomp -ddump-simpl Example.hs

This outputs (among other things):

…

example2 = example @Maybe $FunctorMaybe
…

And when we enable optimizations (with -O2):

$ ghc -O2 -fforce-recomp -ddump-simpl -dsuppress-all Example.hs

… then GHC inlines the definition of example and simplifies things further, which is how it generates this much more optimized core representation for example2:

example2 Nothing = Nothing
example2 (Just a) = Just (a + 1)

In fact, specialization is essentially the same thing as inlining under the hood (I’m oversimplifying a bit, but they are morally the same thing). The main distinction between inlining and specialization is:

  • specialization simplifies function calls with “type-level” arguments

    By “type-level” arguments I mean (hidden) function arguments that are types, type constructors, and type class instances

  • inlining simplifies function calls with “term-level” arguments

    By “term-level” arguments I mean the “ordinary” (visible) function arguments you know and love

Does GHC always inline or specialize code?

NO. GHC does not always inline or specialize code, for two main reasons:

  • Inlining is not always an optimization

    Inlining can sometimes make code slower. In particular, it can often be better to not inline a function with a large implementation because then the corresponding CPU instructions can be cached.

  • Inlining a function requires access to the function’s source code

    In particular, if the function is defined in a different module from where the function is used (a.k.a. the “call site”) then the call site does not necessarily have access to the function’s source code.

To expand on the latter point, Haskell modules are compiled separately (in other words, each module is a separate “compilation unit”), and the compiler generates two outputs when compiling a module:

  • a .o file containing object code (e.g. Example.o)

    This object code is what is linked into the final executable to generate a runnable program.

  • a .hi file containing (among other things) source code

    The compiler can optionally store the source code for any compiled functions inside this .hi file so that it can inline those functions when compiling other modules.

However, the compiler does not always save the source code for all functions that it compile because there are downsides to storing source code for functions:

  • this slows down compilation

    This slows down compilation both for the “upstream” module (the module defining the function we might want to inline) and the “downstream” module (the module calling the function we might want to inline). The upstream module takes longer to compile because now the full body of the function needs to be saved in the .hi file and the downstream module takes longer to compile because inlining isn’t free (all optimizations, including inlining, generate more work for the compiler).

  • this makes the .hi file bigger

    The .hi file gets bigger because it’s storing the source code of the function.

  • this can also make the object code larger, too

    Inlining a function multiple times can lead to duplicating the corresponding object code for that function.

This is why by default the compiler uses its own heuristic to decide which functions are worth storing in the .hi file. The compiler does not indiscriminately save the source code for all functions.

You can override the compiler’s heuristic, though, using …

Compiler directives

There are a few compiler directives (a.k.a. “pragmas”) related to inlining and specialization that we’ll cover here:

  • INLINABLE
  • INLINE
  • NOINLINE
  • SPECIALIZE

My general rule of thumb for these compiler directives is:

  • don’t use any compiler directive until you benchmark your code to show that it helps
  • if you do use a compiler directive, INLINABLE is probably the one you should pick

I’ll still explain what what all the compiler directives mean, though.

INLINABLE

INLINABLE is a compiler directive that you use like this:

f :: Int -> Int
f x = x + 1
{-# INLINABLE f #-}

The INLINABLE directive tells the compiler to save the function’s source code in the .hi file in order to make that function available for inlining downstream. HOWEVER, INLINABLE does NOT force the compiler to inline that function. The compiler will still use its own judgment to decide whether or not the function should be inlined (and the compiler’s judgment tends to be fairly good).

INLINE

INLINE is a compiler directive that you use in a similar manner as INLINABLE:

f :: Int -> Int
f x = x + 1
{-# INLINE f #-}

INLINE behaves like INLINABLE except that it also heavily biases the compiler in favor of inlining the function. There are still some cases where the compiler will refuse to fully inline the function (for example, if the function is recursive), but generally speaking the INLINE directive override’s the compiler’s own judgment for whether or not to inline the function.

I would argue that you usually should prefer the INLINABLE pragma over the INLINE pragma because the compiler’s judgment for whether or not to inline things is usually good. If you override the compiler’s judgment there’s a good chance you’re making things worse unless you have benchmarks showing otherwise.

NOINLINE

If you mark a function as NOINLINE:

f :: Int -> Int
f x = x + 1
{-# NOINLINE f #-}

… then the compiler will refuse to inline that function. It’s pretty rare to see people use a NOINLINE annotation for performance reasons (although there are circumstances where NOINLINE can be an optimization). It’s far, far, far more common to see people use NOINLINE in conjunction with unsafePerformIO because that’s what the unsafePerformIO documentation recommends:

Use {-# NOINLINE foo #-} as a pragma on any function foo that calls unsafePerformIO. If the call is inlined, the I/O may be performed more than once.

SPECIALIZE

SPECIALIZE lets you hint to the compiler that it should compile a polymorphic function for a monomorphic type ahead of time. For example, if we define a polymorphic function like this:

example :: Functor f => f Int -> f Int
example = fmap (+ 1)

… we can tell the compiler to go ahead and specialize the example function for the special case where f is Maybe, like this:

example :: Functor f => f Int -> f Int
example = fmap (+ 1)
{-# SPECIALIZE example :: Maybe Int -> Maybe Int #-}

This tells the compiler to go ahead and compile the more specialized version, too, because we expect some other module to use that more specialized version. This is nice if we want to get the benefits of specialization without exporting the function’s source code (so we don’t bloat the .hi file) or if we want more precise control over when specialize does and does not happen.

In practice, though, I find that most Haskell programmers don’t want to go to the trouble of anticipating and declaring all possible specializations, which is why I endorse INLINABLE as the more ergonomic alternative to SPECIALIZE.

by Gabriella Gonzalez (noreply@blogger.com) at November 14, 2024 04:58 PM

GHC Developer Blog

GHC 9.12.1-alpha3 is now available

GHC 9.12.1-alpha3 is now available

Zubin Duggal - 2024-11-14

The GHC developers are very pleased to announce the availability of the third alpha release of GHC 9.12.1. Binary distributions, source distributions, and documentation are available at downloads.haskell.org.

We hope to have this release available via ghcup shortly.

GHC 9.12 will bring a number of new features and improvements, including:

  • The new language extension OrPatterns allowing you to combine multiple pattern clauses into one.

  • The MultilineStrings language extension to allow you to more easily write strings spanning multiple lines in your source code.

  • Improvements to the OverloadedRecordDot extension, allowing the built-in HasField class to be used for records with fields of non lifted representations.

  • The NamedDefaults language extension has been introduced allowing you to define defaults for typeclasses other than Num.

  • More deterministic object code output, controlled by the -fobject-determinism flag, which improves determinism of builds a lot (though does not fully do so) at the cost of some compiler performance (1-2%). See #12935 for the details

  • GHC now accepts type syntax in expressions as part of GHC Proposal #281.

  • The WASM backend now has support for TemplateHaskell.

  • … and many more

A full accounting of changes can be found in the release notes. As always, GHC’s release status, including planned future releases, can be found on the GHC Wiki status.

We would like to thank GitHub, IOG, the Zw3rk stake pool, Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, the Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprise this release.

As always, do give this release a try and open a ticket if you see anything amiss.

by ghc-devs at November 14, 2024 12:00 AM

November 08, 2024

Donnacha Oisín Kidney

POPL Paper—Formalising Graph Algorithms with Coinduction

Posted on November 8, 2024
Tags:

New paper: “Formalising Graph Algorithms with Coinduction”, by myself and Nicolas Wu, will be published at POPL 2025.

The preprint is available here.

The paper is about representing graphs (especially in functional languages). We argue in the paper that graphs are naturally coinductive, rather than inductive, and that many of the problems with graphs in functional languages go away once you give up on induction and pattern-matching, and embrace the coinductive way of doing things.

Of course, coinduction comes with its own set of problems, especially when working in a total language or proof assistant. Another big focus of the paper was figuring out a representation that was amenable to formalisation (we formalised the paper in Cubical Agda). Picking a good representation for formalisation is a tricky thing: often a design decision you make early on only looks like a mistake after a few thousand lines of proofs, and modern formal proofs tend to be brittle, meaning that it’s difficult to change an early definition without also having to change everything that depends on it. On top of this, we decided to use quotients for an important part of the representation, and (as anyone who’s worked with quotients and coinduction will tell you) productivity proofs in the presence of quotients can be a real pain.

All that said, I think the representation we ended up with in the paper is quite nice. We start with a similar representation to the one we had in our ICFP paper in 2021: a graph over vertices of type a is simply a function a -> [a] that returns the neighbours of a supplied vertex (this is the same representation as in this post). Despite the simplicity, it turns out that this type is enough to implement a decent number of search algorithms. The really interesting thing is that the arrow methods (from Control.Arrow) work on this type, and they define an algebra on graphs similar to the one from Mokhov (2017). For example, the <+> operator is the same as the overlay operation in Mokhov (2017).

That simple type gets expanded upon and complicated: eventually, we represent a possibly-infinite collection as a function that takes a depth and then returns everything in the search space up to that depth. It’s a little like representing an infinite list as the partial application of the take function. The paper spends a lot of time picking an algebra that properly represents the depth, and figuring out coherency conditions etc.

One thing I’m especially proud of is that all the Agda code snippets in the paper are hyperlinked to a rendered html version of the code. Usually, when I want more info on some code snippet in a paper, I don’t really want to spend an hour or so downloading some artefact, installing a VM, etc. What I actually want is just to see all of the definitions the snippet relies on, and the 30 or so lines of code preceding it. With this paper, that’s exactly what you get: if you click on any Agda code in the paper, you’re brought to the source of that code block, and every definition is clickable so you can browse without having to install anything.

I think the audience for this paper is anyone who is interested in graphs in functional languages. It should be especially interesting to people who have dabbled in formalising some graphs, but who might have been stung by an uncooperative proof assistant. The techniques in the second half of the paper might help you to convince Agda (or Idris, or Rocq) to accept your coinductive and quotient-heavy arguments.

Mokhov, Andrey. 2017. “Algebraic Graphs with Class (Functional Pearl).” In Proceedings of the 10th ACM SIGPLAN International Symposium on Haskell, 2–13. Haskell 2017. New York, NY, USA: ACM. doi:10.1145/3122955.3122956.

by Donnacha Oisín Kidney at November 08, 2024 12:00 AM

November 07, 2024

Tweag I/O

Exploring Effect in TypeScript: Simplifying Async and Error Handling

Effect is a powerful library for TypeScript developers that brings functional programming techniques into managing effects and errors. It aims to be a comprehensive utility library for TypeScript, offering a range of tools that could potentially replace specialized libraries like Lodash, Zod, Immer, or RxJS.

In this blog post, we will introduce you to Effect by creating a simple weather widget app. This app will allow users to search for weather information by city name, making it a good example as it involves API data fetching, user input handling, and error management. We will implement this project in both vanilla TypeScript and using Effect to demonstrate the advantages Effect brings in terms of code readability and maintainability.

What is Effect?

Effect promises to improve TypeScript code by providing a set of modules and functions that are composable with maximum type-safety. The term “effect” refers to an effect system, which provides a declarative approach to handling side effects. Side effects are operations that have observable consequences in the real world, like logging, network requests, database operations, etc. The library revolves around the Effect<Success, Error, Requirements> type, which can be used to represent an immutable value that lazily describes a workflow or job. Effects are not functions by themselves, they are descriptions of what should be done. They can be composed with other effects, and they can be interpreted by the Effect runtime system. Before we dive into the project we will build, let’s look at some basic concepts of Effect.

Creating effects

We can create an effect based on a value using the Effect.succeed and Effect.fail functions:

const success: Effect.Effect<number, never, never> = Effect.succeed(42)

const fail: Effect.Effect<never, Error, never> = Effect.fail(
  new Error("Something went wrong")
)
  • An effect with never as the Error means it never fails
  • An effect with never as the Success means it never produces a successful value.
  • An effect with never as the Requirements means it doesn’t require any context to run.

With the functions above, we can create effects like this:

const divide = (a: number, b: number): Effect.Effect<number, Error, never> =>
  b === 0
    ? Effect.fail(new Error("Cannot divide by zero"))
    : Effect.succeed(a / b)

To create an effect based on a function, we can use the Effect.sync and Effect.promise for synchronous and asynchronous functions that can’t fail, respectively, and Effect.try and Effect.tryPromise for synchronous and asynchronous functions that can fail.

// Synchronous function that can't fail
const log = (message: string): Effect.Effect<void, never, never> =>
  Effect.sync(() => console.log(message))

// Asynchronous function that can't fail
const delay = (message: string): Effect.Effect<string, never, never> =>
  Effect.promise<string>(
    () =>
      new Promise(resolve => {
        setTimeout(() => {
          resolve(message)
        }, 2000)
      })
  )

// Synchronous function that can fail
const parse = (input: string): Effect.Effect<any, Error, never> =>
  Effect.try({
    // JSON.parse may throw for bad input
    try: () => JSON.parse(input),
    // remap the error
    catch: _unknown => new Error(`something went wrong while parsing the JSON`),
  })

// Asynchronous function that can fail
const getTodo = (id: number): Effect.Effect<Response, Error, never> =>
  Effect.tryPromise({
    // fetch can throw for network errors
    try: () => fetch(`https://jsonplaceholder.typicode.com/todos/${id}`),
    // remap the error
    catch: unknown => new Error(`something went wrong ${unknown}`),
  })

For more details about creating effects you can check the Effect documentation.

Running effects

In order to run an effect, we need to use the appropriate function depending on the effect type. In our application we’ll use the Effect.runPromise function, which is used for effects that are asynchronous and can’t fail:

Effect.runPromise(delay("Hello, World!")).then(console.log)
// -> Hello, World! (after 2 seconds)

You can read about other ways to run effects, and what happens when you don’t use the correct function, in the “Running Effects” page of the Effect documentation.

Pipe

When writing a program using Effect, we usually need to run a sequence of operations, and we can use the pipe function to compose them:

const double = (n: number) => n * 2

const divide =
  (b: number) =>
  (a: number): Effect.Effect<number, Error> =>
    b === 0
      ? Effect.fail(new Error("Cannot divide by zero"))
      : Effect.succeed(a / b)

const increment = (n: number) => Effect.succeed(n + 1)

const result = pipe(
  42,
  // Here we have an Effect.Effect<number, Error> with the value 21
  divide(2),
  // To run a function over the value changing the effect's value, we use Effect.map
  Effect.map(double),
  // To run a function over the value without changing the effect's value, we use Effect.tap
  Effect.tap(n => console.log(`The double is ${n}`)),
  // To run a function that returns a new effect, we use Effect.andThen
  Effect.andThen(increment),
  Effect.tap(n => console.log(`The incremented value is ${n}`))
)

Effect.runSync(result)
// -> The double is 42
// -> The incremented value is 43

If you want to know more about the pipe function, you can check this page on the Effect documentation.

The project

Now that we have a basic understanding of Effect, we can start the project! We will build a simple weather app in which the user types the name of a city, selects the desired one from a list of suggestions, and then the app shows the current weather in that city.
The project will have three main components: the input field, the list of suggestions, and the weather information.

We will use the Open-Meteo API to get the weather information as it doesn’t require an API key.

Setup

We begin by creating a new TypeScript project:

mkdir weather-app
cd weather-app
npm init -y

Next, we install the dependencies. We will use Parcel to bundle the project as it works without any configuration:

npm install --save-dev parcel

Now we create the project structure:

mkdir src
touch src/index.html
touch src/styles.scss
touch src/index.ts

The index.html file contains a main element with sections: one with a text input for city input and another for displaying weather information.

You can check the HTML and SCSS code in the GitHub repository.

In order to run the project, we need to add the following keys to the package.json file:

{
  "source": "./src/index.html",
  "scripts": {
    "dev": "parcel",
    "build": "parcel build"
  }
}

Now we can run the project:

npm run dev

Server running at http://localhost:1234
✨ Built in 8ms

By accessing the URL, you should see the application, but it won’t work yet.

Figure 1. Application's initial state
Figure 1. Application's initial state

Let’s write the TypeScript code!

Without Effect

All the following code examples should be placed in the src/index.ts file.

First, we query the elements from the DOM:

// The field input
const cityElement = document.querySelector<HTMLInputElement>("#city")
// The list of suggestions
const citiesElement = document.querySelector<HTMLUListElement>("#cities")
// The weather information
const weatherElement = document.querySelector<HTMLDivElement>("#weather")

Next, we’ll define the types for the data we’ll fetch from the API.
To validate the data, we’ll use a library called Zod. Zod is a TypeScript-first schema declaration and validation library.

npm install zod

First, we define the schema by using z.object and, for each property, we use z.string, z.number and other functions to define its type:

import { z } from "zod"

// ...

const CityResponse = z.object({
  name: z.string(),
  country_code: z.string().length(2),
  latitude: z.number(),
  longitude: z.number(),
})

const GeocodingResponse = z.object({
  results: z.array(CityResponse),
})

With the schema defined, we can use the z.infer utility type to infer the type of the data based on the schema:

type CityResponse = z.infer<typeof CityResponse>

type GeocodingResponse = z.infer<typeof GeocodingResponse>

Now, we create the function to fetch the cities from the Open-Meteo API. It fetches the cities that match the given name and returns a list of suggestions. In order to validate the API response, we use the safeParse method that our GeocodingResponse Zod schema provides. This method returns an object with two key properties:

  1. success: A boolean indicating if the parsing succeeded.
  2. data: The parsed data if successful, matching our defined schema.
const getCity = async (city: string): Promise<CityResponse[]> => {
  try {
    const response = await fetch(
      `https://geocoding-api.open-meteo.com/v1/search?name=${city}&count=10&language=en&format=json`
    )

    // Convert the response to JSON
    const geocoding = await response.json()

    // Parse the response using the GeocodingResponse schema
    const parsedGeocoding = GeocodingResponse.safeParse(geocoding)

    if (!parsedGeocoding.success) {
      return []
    }

    return parsedGeocoding.data.results
  } catch (error) {
    console.error("Error:", error)
    return []
  }
}

To make the input field work, we need to attach an event listener to it to call the getCity function:

const getCities = async function (input: HTMLInputElement) {
  const { value } = input

  // Check if the HTML element exists
  if (citiesElement) {
    // Clear the list of suggestions
    citiesElement.innerHTML = ""
  }

  // Check if the input is empty
  if (!value) {
    return
  }

  // Fetch the cities
  const results = await getCity(value)

  renderCitySuggestions(results)
}

cityElement?.addEventListener("input", function (_event) {
  getCities(this)
})

Next, we create the renderCitySuggestions function to render the list of suggestions or display an error message if there are no suggestions:

const renderCitySuggestions = (cities: CityResponse[]) => {
  // If there are cities, populate the suggestions
  if (cities.length > 0) {
    populateSuggestions(cities)
    return
  }

  // Otherwise, show a message that the city was not found
  if (weatherElement) {
    const search = cityElement?.value || "searched"
    weatherElement.innerHTML = `<p>City ${search} not found</p>`
  }
}

The populateSuggestions function is very simple - it creates a list item for each city:

const populateSuggestions = (results: CityResponse[]) =>
  results.forEach(city => {
    const li = document.createElement("li")
    li.innerText = `${city.name} - ${city.country_code}`
    citiesElement?.appendChild(li)
  })

Now if we type a city name in the input field, we should see the list of suggestions:

Figure 2. City suggestions
Figure 2. City suggestions

Great!

The next step is to implement the selectCity function that fetches the weather information of a city and displays it:

const selectCity = async (result: CityResponse) => {
  // If the HTML element doesn't exist, return
  if (!weatherElement) {
    return
  }

  try {
    const data = await getWeather(result)

    if (data.tag === "error") {
      throw data.value
    }

    const {
      temperature_2m,
      apparent_temperature,
      relative_humidity_2m,
      precipitation,
    } = data.value.current

    weatherElement.innerHTML = `
 <h2>${result.name}</h2>
 <p>Temperature: ${temperature_2m}°C</p>
 <p>Feels like: ${apparent_temperature}°C</p>
 <p>Humidity: ${relative_humidity_2m}%</p>
 <p>Precipitation: ${precipitation}mm</p>
 `
  } catch (error) {
    weatherElement.innerHTML = `<p>An error occurred while fetching the weather: ${error}</p>`
  }
}

Then we call it in the populateSuggestions function:

const populateSuggestions = (results: CityResponse[]) =>
  results.forEach(city => {
    // ...
    li.addEventListener("click", () => selectCity(city))
    citiesElement?.appendChild(li)
  })

The last piece of the puzzle is the getWeather function. Once again, we’ll use Zod to create the schema and the type for the weather information.

type WeatherResult =
  | { tag: "ok"; value: WeatherResponse }
  | { tag: "error"; value: unknown }

const WeatherResponse = z.object({
  current_units: z.object({
    temperature_2m: z.string(),
    relative_humidity_2m: z.string(),
    apparent_temperature: z.string(),
    precipitation: z.string(),
  }),
  current: z.object({
    temperature_2m: z.number(),
    relative_humidity_2m: z.number(),
    apparent_temperature: z.number(),
    precipitation: z.number(),
  }),
})

type WeatherResponse = z.infer<typeof WeatherResponse>

const getWeather = async (result: CityResponse): Promise<WeatherResult> => {
  try {
    const response = await fetch(
      `https://api.open-meteo.com/v1/forecast?latitude=${result.latitude}&longitude=${result.longitude}&current=temperature_2m,relative_humidity_2m,apparent_temperature,precipitation&timezone=auto&forecast_days=1`
    )

    // Convert the response to JSON
    const weather = await response.json()

    // Parse the response using the WeatherResponse schema
    const parsedWeather = WeatherResponse.safeParse(weather)

    if (!parsedWeather.success) {
      return { tag: "error", value: parsedWeather.error }
    }

    return { tag: "ok", value: parsedWeather.data }
  } catch (error) {
    return { tag: "error", value: error }
  }
}

We have a type WeatherResult for error handling; it can be ok or error. The getWeather function fetches the weather information based on the latitude and longitude of a city and returns the result. We are passing some parameters to the API to get the current temperature, humidity, apparent temperature, and precipitation. If you want to know more about these parameters, you can check the API documentation.

One last thing we need to do is to use a debounce function to avoid making too many requests to the API while the user is typing. To do that, we’ll install Lodash which provides many useful functions for everyday programming.

npm install lodash
npm install --save-dev @types/lodash

We’ll wrap the getCities function with the debounce function:

import { debounce } from "lodash"

// ...

const getCities = debounce(async function (input: HTMLInputElement) {
  // The same code as before
}, 500)

This way, the getCities function will be called only after the user stops typing for 500 milliseconds.

Our small weather app is now complete: when we type a city name in the input field, a list of suggestions is displayed, and when we click on one of them, we can see the weather information for that city.

Figure 3. Weather information
Figure 3. Weather information

While our current code works and handles errors well, let’s explore how using Effect can potentially improve its robustness and simplicity.

With Effect

To get started with Effect, we need to install it:

npm install effect

We will start by refactoring the functions in the order we implemented them in the previous section.

First, we refactor the querySelector calls. We’ll use the Option type from Effect: it represents a value that may or may not exist. If the value exists, it’s a Some, if it doesn’t, it’s a None.

import { Option } from "effect"

// The field input
const cityElement = Option.fromNullable(
  document.querySelector<HTMLInputElement>("#city")
)
// The list of suggestions
const citiesElement = Option.fromNullable(
  document.querySelector<HTMLUListElement>("#cities")
)
// The weather information
const weatherElement = Option.fromNullable(
  document.querySelector<HTMLDivElement>("#weather")
)

Using the Option type, we can chain operations without worrying about null or undefined values. This approach simplifies our code by eliminating the need for explicit null checks. We can use functions like Option.map and Option.andThen to handle the transformations and checks in a more elegant way. To know more about the Option type, take a look at the page about it in the documentation.

Now, let’s move to the getCity function. We’ll use the Schema.Struct to define the types of the CityResponse and GeocodingResponse objects. Those schemas will be used to validate the response from the API. This is the same thing we did before with Zod, but this time we don’t have to install any library. Instead, we can just use the Schema module that Effect provides.

import { /* ... */, Effect, Scope, pipe } from "effect";
import { Schema } from "@effect/schema"
import {
  FetchHttpClient,
  HttpClient,
  HttpClientResponse,
  HttpClientError
} from "@effect/platform";

// ...

const CityResponse = Schema.Struct({
  name: Schema.String,
  country_code: pipe(Schema.String, Schema.length(2)),
  latitude: Schema.Number,
  longitude: Schema.Number,
})

type CityResponse = Schema.Schema.Type<typeof CityResponse>

const GeocodingResponse = Schema.Struct({
  results: Schema.Array(CityResponse),
})

type GeocodingResponse = Schema.Schema.Type<typeof GeocodingResponse>

const getRequest = (url: string): Effect.Effect<HttpClientResponse.HttpClientResponse, HttpClientError.HttpClientError, Scope.Scope> =>
  pipe(
    HttpClient.HttpClient,
    // Using `Effect.andThen` to get the client from the `HttpClient.HttpClient` tag and then make the request
    Effect.andThen(client => client.get(url)),
    // We don't need to send the tracing headers to the API to avoid CORS errors
    HttpClient.withTracerPropagation(false),
    // Providing the HTTP client to the effect
    Effect.provide(FetchHttpClient.layer)
  )

const getCity = (city: string): Effect.Effect<readonly CityResponse[], never, never> =>
  pipe(
    getRequest(
      `https://geocoding-api.open-meteo.com/v1/search?name=${city}&count=10&language=en&format=json`
    ),
    // Validating the response using the `GeocodingResponse` schema
    Effect.andThen(HttpClientResponse.schemaBodyJson(GeocodingResponse)),
    // Providing a default value in case of failure
    Effect.orElseSucceed<GeocodingResponse>(() => ({ results: [] })),
    // Extracting the `results` array from the `GeocodingResponse` object
    Effect.map(geocoding => geocoding.results),
    // Providing a scope to the effect
    Effect.scoped
  )

Here we already have some interesting things happening!

The getRequest function sets up the HTTP client. While we could use the built-in fetch API as our HTTP client, Effect provides a solution called HttpClient in the @effect/platform package. It’s important to note that this package is currently in beta, as mentioned in the official documentation. Despite its beta status, we’ll be using it to explore more of Effect’s capabilities and showcase how it integrates with the broader Effect ecosystem. This choice allows us to demonstrate Effect’s approach to HTTP requests and error handling in a more idiomatic way. HttpClient.HttpClient is something called a “tag” that we can use to get the HTTP client from the context. To do that, we use the Effect.andThen function.
After that, we’re setting withTracerPropagation to false to avoid sending the tracing headers to the API and getting a CORS error.

Since we’re using the HttpClient service, it’s a requirement to our effect (remember the Effect<Success, Error, Requirements> type?) and we need to provide this requirement in order to run the effect.
With the Effect.provide function we can add a layer to the effect that provides the HttpClient service. For more information about the Effect.provide function and how it works, take a look at the runtime page on the Effect documentation.

In the getCity function, we call the getRequest function to get the response from the API. Then we validate the response using the HttpClientResponse.schemaBodyJson function, which validates the response body using the GeocodingResponse schema.
In the last line of the function, we use the Effect.scoped function to provide a scope to the effect, this is a requirement for the HttpClient service that we’re using in the getRequest function. The scope ensures that if the program is interrupted, any request will be aborted, preventing memory leaks. getCity returns a Effect.Effect<CityResponse[], never, never>: the two never means it never fails (we’re providing a default value in case of failure), and it doesn’t require any context to run.

Next, we refactor the getCities function:

import { /* ... */, Effect, Option, pipe } from "effect";

// ...

const getCities = (search: string): Effect.Effect<Option.Option<void>, never, never> => {
  Option.map(citiesElement, citiesEl => (citiesEl.innerHTML = ""))

  return pipe(
    getCity(search),
    Effect.map(renderCitySuggestions),
    // Check if the input is empty
    Effect.when(() => Boolean(search))
  )
}

We’re using the Option.map function to access the actual citiesElement and clear the list of suggestions. After that, it’s pretty straightforward: we call the getCity function with the search term, then we map the renderCitySuggestions function over the successful value, and finally, we apply a condition that makes the effect run only if the search term is not empty.

Here is how we add the event listener to the input field:

import { /* ... */, Effect, Option, pipe, Stream, Chunk, StreamEmit } from "effect";

// ...

Option.map(cityElement, cityEl => {
  const stream = Stream.async(
    (emit: StreamEmit.Emit<never, never, string, void>) =>
      cityEl.addEventListener("input", function (_event) {
        emit(Effect.succeed(Chunk.of(this.value)))
      })
  )

  pipe(
    stream,
    Stream.debounce(500),
    Stream.runForEach(getCities),
    Effect.runPromise
  )
})

Actually, we’re doing more than just adding an event listener. The debounce function that we had to import from Lodash before is now part of Effect as the Stream.debounce function. In order to use this function, we need to create a Stream.
A Stream has the type Stream<A, E, R> and it’s a program description that, when executed, can emit zero or more values of type A, handle errors of type E, and operates within a context of type R. There are a couple of ways to create a Stream, which are detailed in the page about streams in the documentation. In this case, we’re using the Stream.async function as it receives a callback that emits values to the stream.

After creating the Stream and assigning it to the stream variable, we use a pipe to build a pipeline where we debounce the stream by 500 milliseconds, run the getCities function whenever the stream gets a value (that is, when we emit a value), and finally run the effect with Effect.runPromise.

Let’s move on to the renderCitySuggestions function:

import { /* ... */, Array, Option, pipe } from "effect";

// ...

const renderCitySuggestions = (cities: readonly CityResponse[]): void | Option.Option<void> =>
  // If there are multiple cities, populate the suggestions
  // Otherwise, show a message that the city was not found
  pipe(
    cities,
    Array.match({
      onNonEmpty: populateSuggestions,
      onEmpty: () => {
        const search = Option.match(cityElement, {
          onSome: (cityEl) => cityEl.value,
          onNone: () => "searched",
        });

        Option.map(
          weatherElement,
          (weatherEl) =>
            (weatherEl.innerHTML = `<p>City ${search} not found</p>`),
        );
      },
    }),
  );

Instead of manually checking the length of the cities array, we’re using the Array.match function to handle that. If the array is empty, it calls the callback defined in the onEmpty property, and if the array is not empty, it calls the callback defined in the onNonEmpty property.

The populateSuggestions function remains almost the same. The only change is that we now wrap the forEach operation in an Option.map to safely handle the optional cities element. This ensures we only attempt to populate suggestions when the element exists.

The selectCity function is simpler now:

import { /* ... */, Option, pipe } from "effect";

// ...

const selectCity = (result: CityResponse): Option.Option<Promise<string>> =>
  Option.map(weatherElement, weatherEl =>
    pipe(
      result,
      getWeather,
      Effect.match({
        onFailure: error =>
          (weatherEl.innerHTML = `<p>An error occurred while fetching the weather: ${error}</p>`),
        onSuccess: (weatherData: WeatherResponse) =>
          (weatherEl.innerHTML = `
<h2>${result.name}</h2>
<p>Temperature: ${weatherData.current.temperature_2m}°C</p>
<p>Feels like: ${weatherData.current.apparent_temperature}°C</p>
<p>Humidity: ${weatherData.current.relative_humidity_2m}%</p>
<p>Precipitation: ${weatherData.current.precipitation}mm</p>
`),
      }),
      Effect.runPromise
    )
  )

There is no checking for the data.tag any more, we’re using the Effect.match function to handle both cases, success and failure, and we don’t throw anything anymore.

Finally, the getWeather function:

import { /* ... */, Effect, pipe } from "effect";
import { Schema, ParseResult } from "@effect/schema";
import { /* ... */, HttpClientResponse, HttpClientError } from "@effect/platform";

// ...

const WeatherResponse = Schema.Struct({
  current_units: Schema.Struct({
    temperature_2m: Schema.String,
    relative_humidity_2m: Schema.String,
    apparent_temperature: Schema.String,
    precipitation: Schema.String,
  }),
  current: Schema.Struct({
    temperature_2m: Schema.Number,
    relative_humidity_2m: Schema.Number,
    apparent_temperature: Schema.Number,
    precipitation: Schema.Number,
  }),
})

type WeatherResponse = Schema.Schema.Type<typeof WeatherResponse>

const getWeather = (
  result: CityResponse,
): Effect.Effect<WeatherResponse, HttpClientError.HttpClientError | ParseResult.ParseError, never> =>
  pipe(
    getRequest(
      `https://api.open-meteo.com/v1/forecast?latitude=${result.latitude}&longitude=${result.longitude}&current=temperature_2m,relative_humidity_2m,apparent_temperature,precipitation&timezone=auto&forecast_days=1`
    ),
    Effect.andThen(HttpClientResponse.schemaBodyJson(WeatherResponse)),
    Effect.scoped
  )

We’re again using the Schema.Struct to define the WeatherResponse type. However, we don’t need to have a WeatherResult anymore as the Effect type already handles the success and failure cases.

After this refactoring, the app works the same way it did before, but now we have the confidence that our code is more robust and type-safe. Let’s see the benefits of Effect when comparing to the code without it.

Conclusion

Now that we have the two versions of the application, we can analyze them and highlight the pros and cons of using Effect:

Pros

  • Type-safety: Effect provides a way to handle errors and requirements in a type-safe way and using it increases the overall type safety of our app.
  • Error handling: The Effect type has built-in error handling, making the code more robust.
  • Validation: We don’t need to use a library like Zod to validate the response - we can use the Schema module to validate the response.
  • Utility functions: We don’t need to use a library like Lodash to use utility functions. Instead, we can use the Array, Option, Stream, and other modules.
  • Declarative style: Writing code with Effect means we’re using a more declarative approach: we’re describing “what” we want our program to do, rather than “how” we want it to do it.

Cons

  • Complexity: The code is more complex than the one without Effect; it may be hard to understand for people who are not familiar with the library.
  • Learning curve: You need to learn how to use the library - it’s not as simple as writing plain TypeScript code.
  • Documentation: The documentation is good, but could be better. Some parts are not clear.

While the code written with Effect may initially appear more complex to those unfamiliar with the library, its benefits far outweigh the initial learning curve. Effect offers powerful tools for maximum type-safety, error handling, asynchronous operations, streams and more, all within a single library that is incrementally adoptable. In our project, we used two separate libraries (Zod and Lodash) to achieve what Effect accomplishes on its own.

While plain TypeScript may be adequate for small projects, we believe Effect can truly shine in larger, more complex applications. Its robust handling of side-effects and comprehensive error management have the potential to make it a game changer for taming complexity and maintaining code quality at scale.

November 07, 2024 12:00 AM

Donnacha Oisín Kidney

POPL Paper—Algebraic Effects Meet Hoare Logic in Cubical Agda

Posted on November 7, 2024
Tags:

New paper: “Algebraic Effects Meet Hoare Logic in Cubical Agda”, by myself, Zhixuan Yang, and Nicolas Wu, will be published at POPL 2024.

Zhixuan has a nice summary of it here.

The preprint is available here.

by Donnacha Oisín Kidney at November 07, 2024 12:00 AM

November 06, 2024

Well-Typed.Com

The Haskell Unfolder Episode 35: distributive and representable functors

Today, 2024-11-06, at 1930 UTC (11:30 am PST, 2:30 pm EST, 7:30 pm GMT, 20:30 CET, …) we are streaming the 35th episode of the Haskell Unfolder live on YouTube.

The Haskell Unfolder Episode 35: distributive and representable functors

We’re going to look at two somewhat more exotic type classes in the Haskell library ecosystem: Distributive and Representable. The former allows you to distribute one functor over another, the latter provides you with a notion of an index to access the elements. As an example, we’ll return once more to the grids used in Episodes 32 and 33 to describe the tic-tac-toe game, and we’ll see how some operations we used can be made more elegant in terms of these type classes. This episode is, however, self-contained; having seen the previous episodes is not required.

About the Haskell Unfolder

The Haskell Unfolder is a YouTube series about all things Haskell hosted by Edsko de Vries and Andres Löh, with episodes appearing approximately every two weeks. All episodes are live-streamed, and we try to respond to audience questions. All episodes are also available as recordings afterwards.

We have a GitHub repository with code samples from the episodes.

And we have a public Google calendar (also available as ICal) listing the planned schedule.

There’s now also a web shop where you can buy t-shirts and mugs (and potentially in the future other items) with the Haskell Unfolder logo.

by andres, edsko at November 06, 2024 12:00 AM

November 05, 2024

Edward Z. Yang

Ways to use torch.compile

On the surface, the value proposition of torch.compile is simple: compile your PyTorch model and it runs X% faster. But after having spent a lot of time helping users from all walks of life use torch.compile, I have found that actually understanding how this value proposition applies to your situation can be quite subtle! In this post, I want to walk through the ways to use torch.compile, and within these use cases, what works and what doesn't. By the way, some of these gaps are either served by export, or by missing features we are actively working on, those will be some other posts!

Improve training efficiency on a small-medium scale

Scenario: You have a model in PyTorch that you want to train at a small-medium scale (e.g., below 1K GPUs--at the 1K point there is a phase change in behavior that deserves its own section). You would like it to train faster. Locally, it's nice to get a trained model faster than you would have otherwise. But globally, the faster everyone's models train, the less GPU hours they use, which means you can run more jobs in a given time window with a fixed cluster. If your supply of GPUs is inelastic (lol), efficiency improvement means you can support more teams and use cases for the same amount of available GPUs. At a capacity planning level, this can be a pretty big deal even if you are GPU rich.

What to do: In some sense, this is the reason we built torch.compile. (When we were initially planning torch.compile, we were trying to assess if we were going after inference; but inference compilers are a much more crowded space than training compilers, and we reasoned that if we did a good job building a training compiler, inference would work too--which it did!) The dream which we sold with torch.compile is that you could slap it on the top of your model and get a speed up. This turns out to... not quite be true? But the fact remains that if you're willing to put in some work, there is almost always performance waiting at the end of the road for you. Some tips:

  • Compile only the modules you need. You don't have to compile the entire model; there might be specific modules which are easy to compile which will give you the most of the benefit. For example, in recommendation systems, there is not much compute improvement to be had from optimizing the embedding lookups, and their model parallelism is often quite hard to handle in the compiler, so torch.compiler.disable them. NB: This doesn't apply if you want to do some global graph optimization which needs the whole model: in that case, pass fullgraph=True to torch.compile and ganbatte!
  • Read the missing manual. The missing manual is full of guidance on working with the compiler, with a particular emphasis on working on training.

Open source examples: torchtune and torchtitan are two first party libraries which are intended to showcase modern PyTorch using torch.compile in a training context. There's also some training in torchao.

Downsides:

  • The compiler is complicated. One of the things we've slowly been coming to terms with is that, uh, maybe promising you could just slap torch.compile on a model and have it run faster was overselling the feature a teensy bit? There seems to be some irreducible complexity with compilers that any user bringing their own model to torch.compile has to grapple with. So yes, you are going to spend some of your complexity budget on torch.compile, in hopes that the payoff is worth it (we think it is!) One ameliorating factor is that the design of torch.compile (graph breaks) means it is very easy to incrementally introduce torch.compile into a codebase, without having to do a ton of upfront investment.
  • Compile time can be long. The compiler is not a straightforward unconditional win. Even if the compiler doesn't slow down your code (which it can, in pathological cases), you have to spend some amount of time compiling your model (investment), which you then have to make back by training the model more quickly (return). For very small experimentation jobs, or jobs that are simply crashing, the time spent compiling is just dead weight, increasing the overall time your job takes to run. (teaser: async compilation aims to solve this.) To make matters worse, if you are scheduling your job on systems that have preemption, you might end up repeatedly compiling over and over again every time your job gets rescheduled (teaser: caching aims to solve this.) But even when you do spend some time training, it is not obvious without an A/B test whether or not you are actually getting a good ROI. In an ideal world, everyone using torch.compile would actually verify this ROI calculation, but it doesn't happen automatically (teaser: automatic ROI calculation) and in large organizations we see people running training runs without even realizing torch.compile is enabled.
  • Numerics divergence from eager. Unfortunately, the compiler does not guarantee exact bitwise equivalence with eager code; we reserve the right to do things like select different matrix multiply algorithms with different numerics or eliminate unnecessary downcast/upcasts when fusing half precision compute together. The compiler is also complicated and can have bugs that can cause loss not to converge. Expect to also have to evaluate whether or not application of torch.compile affects accuracy. Fortunately, for most uses of compiler for training efficiency, the baseline is the eager model, so you can just run an ablation to figure out who is actually causing the accuracy problem. (This won't be true in a later use case when the compiler is load bearing, see below!)

Improve Python inference efficiency

Scenario: You've finished training your model and you want to deploy it for inference. Here, you want to improve the efficiency of inference to improve response latency or reduce the overall resource requirements of the system, so you can use less GPUs to serve the traffic you are receiving. Admittedly, it is fairly common to just use some other, more inference friendly systems (which I will decline to name by name lol) to serve the model. But let's say you can't rewrite the model in a more serving friendly language (e.g., because the model authors are researchers and they keep changing the model, or there's a firehose of models and you don't have the money to keep continuously porting each of them, or you depend on an ecosystem of libraries that are only available in CPython).

What to do: If Python can keep up with the CPU-side QPS requirements, a way of getting good performance without very much work is taking the Python model, applying torch.compile on it in the same way as you did in training and directly using this as your inference solution. Some tips that go beyond training:

  • Autotuning makes the most sense for inference. In training runs, you have a limited window (the lifetime of the training job) to get return on the investment you spent optimizing the model. In the serving regime, you can amortize over the entire lifetime of your model in inference, which is typically much longer. Therefore, expensive optimization modes like mode="max-autotune" are more likely to pay off!
  • Warmup inference processes before serving traffic to them. Because torch.compile is a just-in-time compiler, you will spend quite a bit of time compiling (even if you cache hit) at startup. If you have latency requirements, you will want to warmup a fresh process with a representative set of inputs so that you can make sure you trigger all of the compilation paths you need to hit. Caching will reduce compile time but not eliminate it.
  • Try skip_guard_eval_unsafe to reduce guard overhead. Dynamo guard overhead can be material in the inference case. If this is a problem, get a nightly and try skip_guard_eval_unsafe.

Open source examples: LLM serving on torch.compile is quite popular: vllm, sglang, tensorrt-llm, gpt-fast (this is technically not an E2E serving solution, but one of its primary reasons for existing is to serve as a starting point so you can build your own torch.compile based LLM inference stack on top of it). Stable diffusion models are also notable beneficiaries of torch.compile, e.g., diffusers.

Downsides:

  • Just in time compilation is a more complicated operational model. It would be better if you didn't have to warmup inference processes before serving traffic to them. Here, torch.compile has traded operational simplicity for ease of getting started. If you wanted to guarantee that compilation had already happened ahead of time, you have to instead commit to some sort of export-based flow (e.g., C++ GPU/CPU inference) below.
  • Model and dependency packaging in Python is unaddressed. You need to somehow package and deploy the actual Python code (and all its dependencies) which constitute the model; torch.compile doesn't address this problem at all (while torch.export does). If you are running a monorepo and do continuous pushes of your infra code, it can be organizationally complicated to ensure people don't accidentally break model code that is being shipped to production--it's very common to be asked if there's a way to "freeze" your model code so that the monorepo can move on. But with Python inference you have to solve this problem yourself, whether the solution is torch.package, Docker images, or something else.
  • Caches are not guaranteed to hit. Do you have to recompile the model every time you restart the inference process? Well, no, we have an Inductor and Triton (and an in-progress AOTAutograd) cache which in principle can cache all of the cubin's that are generated by torch.compile. Most of the time, you can rely on this to reduce startup cost to Dynamo tracing the model only. However, the caches are not guaranteed to hit: there are rarer cases where we don't know how to compute the cache key for some feature a model is using, or the compiler is nondeterministic in a way that means the cache doesn't hit. You should file bugs for all of these issues as we are interested in fixing them, but we don't give a categorical guarantee that after you've compiled your inference program once, you won't have to compile it again. (And indeed, under torch.compile's user model, we can't, because the user code might be the root cause of the nondeterminism--imagine a model that is randomly sampling to decide what version of a model to run.)
  • Multithreading is currently buggy. It should, in principle, be possible to run torch.compile'd code from multiple threads in Python and get a speedup, especially when CUDA graphs or CPP wrapper is used. (Aside: Inductor's default compile target is "Python wrapper", where Inductor's individually generated Triton kernels are called from Python. In this regime, you may get in trouble due to the GIL; CUDA graphs and CPP wrapper, however, can release the GIL when the expensive work is being done.) However, it doesn't work. Track the issue at https://github.com/pytorch/pytorch/issues/136833

Like above, but the compiler is load bearing

Scenario: In both the cases above, we assumed that we had a preexisting eager model that worked, and we just wanted to make it faster. But you can also use the compiler in a load bearing way, where the model does not work without the compiler. Here are two common cases where this can occur:

  1. Performance: A compiler optimization results in an asymptotic or large constant factor improvement in performance can make a naive eager implementation that would have otherwise been hopelessly slow have good performance. For example, SimpleFSDP chooses to apply no optimizations to the distributed collectives it issues, instead relying on the compiler to bucket and prefetch them for acceptable performance.
  2. Memory: A compiler optimization reduces the memory usage of a model, can allow you to fit a model or batch size that would otherwise OOM. Although we don't publicly expose APIs for doing so, you can potentially use the compiler to do things like force a certain memory budget when doing activation checkpointing, without requiring the user to manually specify what needs to be checkpointed.

What to do: Unlike in the previous cases where you took a preexisting model and slap torch.compile, this sort of use of the compiler is more likely to arise from a codevelopment approach, where you use torch.compile while you build your model, and are constantly checking what the compiler does to the code you write. Some tips:

  • Don't be afraid to write your own optimization pass. Inductor supports custom FX optimization passes. torch.compile has done the work of getting your model into an optimizable form; you can take advantage of this to apply domain specific optimizations that Inductor may not support natively.

Open source examples. SimpleFSDP as mentioned above. VLLM uses torch.compile to apply custom optimization passes. Although its implementation is considerably more involved than what you might reasonable expect a third party to implement, FlexAttention is a good example of a non-compiler feature that relies on the compiler in a load-bearing way for performance.

Downsides: Beyond the ones mentioned above:

  • You can no longer (easily) use eager as a baseline. This is not always true; for example, FlexAttention has an eager mode that runs everything unfused which can still be fast enough for small experiments. But if you have an accuracy problem, it may be hard to compare against an eager baseline if you OOM in that case! It turns out that it's really, really useful to have access to an eager implementation, so it's worth working harder to make sure that the eager implementation works, even if it is slow. (It's less clear how to do that with, e.g., a fancy global optimization based activation checkpointing strategy.)

Next time: Ways to use torch.export

by Edward Z. Yang at November 05, 2024 03:11 PM

Jeremy Gibbons

Alan Jeffrey, 1967–2024

My friend Alan Jeffrey passed away earlier this year. I described his professional life at a Celebration in Oxford on 2nd November 2024. This post is a slightly revised version of what I said.

Edinburgh, 1983–1987

I’ve known Alan for over 40 years—my longest-standing friend. We met at the University of Edinburgh in 1983, officially as computer science freshers together, but really through the clubs for science fiction and for role-playing games. Alan was only 16: like many in Scotland, he skipped the final school year for an earlier start at university. It surely helped that his school had no computers, so he wasted no time in transferring to a university that did. His brother David says that it also helped that he would then be able to get into the student union bars.

Oxford, 1987–1991

After Edinburgh, Alan and I wound up together again as freshers at the University of Oxford. We didn’t coordinate this; we independently and simultaneously applied to the same DPhil programme (Oxford’s name for the PhD). We were officemates for those 4 years, and shared a terraced hovel on St Mary’s Road in bohemian East Oxford with three other students for most of that time. He was clever, funny, kind, and serially passionate about all sorts of things. It was a privilege and a pleasure to have known him.

Alan had a career that spanned academia and industry, and he excelled at both. He described himself as a “semanticist”: using mathematics instead of English for precise descriptions of programming languages. He had already set out in that direction with his undergraduate project on concurrency under Robin Milner at Edinburgh; and he continued to work on concurrency for his DPhil under Bill Roscoe at Oxford, graduating in 1992.

Chalmers, 1991–1992

Alan spent the last year of his DPhil as a postdoc working for K V S Prasad at Chalmers University in Sweden. While there, he was assigned to host fellow Edinburgh alumnus Carolyn Brown visiting for an interview; Carolyn came bearing a bottle of malt whisky, as one does, which she and Alan proceeded to polish off together that evening.

Sussex, 1992–1999

Carolyn’s interview was successful; but by the time she arrived at Chalmers, Alan had left for a second postdoc under Matthew Hennessy at the University of Sussex. They worked together again when Carolyn was in turn hired as a lecturer at Sussex. In particular, they showed in 1994 that “string diagrams”—due to Roger Penrose and Richard Feynman in physics—provide a “fully abstract” calculus for hardware circuits, meaning that everything true of the diagrams is true of the hardware, and vice versa. This work foreshadowed a hot topic in the field of Applied Category Theory today.

Matthew essentially left Alan to his own devices: as Matthew put it, “something I was very happy with as he was an exceptional researcher”. Alan was soon promoted to a lectureship himself. He collaborated closely with Julian Rathke, then Matthew’s PhD student and later postdoc, on the Full Abstraction Factory project, developing a bunch more full abstraction results for concurrent and object-oriented languages. That fruitful collaboration continued even after Alan left Sussex.

DePaul, 1999–2004

Alan presented a paper A Fully Abstract Semantics for a Nondeterministic Functional Language with Monadic Types at the 1995 conference on Mathematical Foundations of Programming Semantics in New Orleans. I believe that this is where he met Karen Bernstein, who also had a paper. One thing led to another, and Alan took a one-year visiting position at DePaul University in Chicago in 1998, then formally left Sussex in 1999 for a regular Associate Professor position at DePaul. He lived in Chicago for the rest of his life.

Alan established the Foundations of Programming Languages research group at DePaul, attracting Radha Jagadeesan from Loyola, James Riely from Sussex, and Corin Pitcher from Oxford, working among other things on “relaxed memory”—modern processors don’t actually read and write their multiple levels of memory in the order you tell them to, when they can find quicker ways to do things concurrently and sometimes out of order.

James remembers showing Alan his first paper on relaxed memory, co-authored with Radha. Alan thought their approach was an “appalling idea”; the proper way was to use “event structures”, an idea from the 1980s. This turned in 2016 into a co-authored paper at LICS (Alan’s favourite conference), and what James considers his own best ever talk—an on-stage reenactment of the to and fro of their collaboration, sadly not recorded for posterity.

James was Alan’s most frequent collaborator over the years, with 14 joint papers. Their modus operandi was that, having identified a problem together, Alan would go off by himself and do some Alanny things, eventually coalescing on a solution, and choose an order of exposition, tight and coherent; this is about 40% of the life of the paper. But then there are various tweaks, extensions, corrections… Alan would never look at the paper again, and would be surprised years later to learn what was actually in it. However, Alan was always easy to work with: interested only in the truth, although it must be beautiful. He had a curious mix of modesty and egocentricity: always convinced he was right (and usually right that he was right). Still, he had no patience for boring stuff, especially university admin.

While at DePaul, Alan also had a significant collaboration with Andy Gordon from Microsoft on verifying security protocols. Their 2001 paper Authenticity by Typing for Security Protocols won a Test Of Time Award this year at the Symposium on Computer Security Foundations, “given to outstanding papers with enduring significance and impact”—recognition which happily Alan lived to see.

Bell Labs, 2004–2015

After the dot com crash in 2000, things got more difficult at DePaul, and Alan left in 2004 for Bell Labs, nominally as a member of technical staff in Naperville but actually part of a security group based at HQ in Murray Hill NJ. He worked on XPath, “a modal logic for XML”, with Michael Benedikt, now my databases colleague at Oxford. They bonded because only Alan and Michael lived in Chicago rather than the suburbs. Michael had shown Alan a recent award-winning paper in which Alan quickly spotted an error in a proof—an “obvious” and unproven lemma that turned out to be false—which led to their first paper together.

(A recurring pattern. Andy Gordon described Alan’s “uncanny ability to find bugs in arguments”: he found a type unsoundness bug in a released draft specification for Java, and ended up joining the standards committee to help fix it. And as a PhD examiner he “shockingly” found a subtle bug that unpicked the argument of half of the dissertation, necessitating major corrections: it took a brave student to invite Alan as examiner—or a very confident one.)

Michael describes Alan as an “awesome developer”. They once had an intern; it didn’t take long after the intern had left for Alan to discard the intern’s code and rewrite it from scratch. Alan was unusual in being able to combine Euro “abstract nonsense” and US engineering. Glenn Bruns, another Bell Labs colleague, said that “I think Alan was the only person I’ve met who could do theory and also low-level hackery”.

At Bell Labs Alan also worked with Peter Danielsen on the Web InterFace Language, WIFL for short: a way of embedding API descriptions in HTML. Peter recalls: “We spent a few months working together on the conceptual model. In the early stages of software development, however, Alan looked at what I’d written and said, “I wouldn’t do it that way at all!”, throwing it all away and starting over. The result was much better; and he inadvertently taught me a new way to think in JavaScript (including putting //Sigh… comments before unavoidable tedious code.)”

Mozilla Research, 2015–2020

The Bell Labs group dissolved in 2015, and Alan moved to Mozilla Research as a staff research engineer to work on Servo, a new web rendering engine in the under-construction programming language Rust.

For one of Alan’s projects at Mozilla, he took a highly under-specified part of the HTML specification about how web links and the back and forwards browser buttons should interact, created a formal model in Agda based on the existing specification, identified gaps in it as well as ways that major browsers did not match the model, then wrote it all up as a paper. Alan’s manager Josh Matthews recalls the editors of the HTML standard being taken aback by Alan’s work suddenly being dropped in their laps, but quickly appreciated how much more confidently they could make changes based on it.

Josh also recalled: “Similarly, any time other members of the team would talk about some aspect of the browser engine being safe as long as some property was upheld by the programmer, Alan would get twitchy. He had a soft spot for bad situations that looked like they could be made impossible by clever enough application of static types.”

In 2017 Alan made a rather surprising switch to working on augmented reality for the web, partly driven by internal politics at Mozilla. He took the lead on integrating Servo into the Magic Leap headset; the existing browser was clunky, the only way to interact with pages being via an awkward touchpad on the hand controller. This was not good enough for Alan: after implementing the same behaviour for Servo and finding it frustrating, he had several email exchanges with the Magic Leap developers, figured out how to access some interfaces that weren’t technically public but also were not actually private, and soon he proudly showed off a more natural laser pointer-style means of interacting with pages in augmented reality in Servo—to much acclaim from the team and testers.

Roblox, 2020–2024

Then in 2020, Mozilla’s funding stream got a lot more constrained, and Alan moved to the game platform company Roblox. Alan was a principal software engineer, and the language owner of Luau, “a fast, small, safe, gradually typed embeddable scripting language derived from Lua”, working on making the language easier to use, teach, and learn. Roblox supports more than two million “content creators”, mostly kids, creating millions of games a year; Alan’s goal was to empower them to build larger games with more characters.

The Luau product manager Bryan Nealer says that “people loved Alan”. Roblox colleagues appreciated his technical contributions: “Alan was meticulous in what he built and wrote at Roblox. He would stress not only the substance of his work, but also the presentation. His attention to detail inspired the rest of us!”; “One of the many wonderful things Alan did for us was to be the guy who could read the most abstruse academic research imaginable and translate it into something simple, useful, interesting, and even fun.” They also appreciated the more personal contributions: Alan led an internal paper reading group, meeting monthly to study some paper on programming or networking, but he also established the Roblox Book Club: “He was always thoughtful when discussing books, and challenged us to think about the text more deeply. He also had an encyclopedic knowledge of scifi. He recommended Iain M. Banks’s The Culture series to me, which has become my favorite scifi series. I think about him every time I pick up one of those books.”

Envoie

From my own perspective, one of the most impressive things about Alan is that he was impossible to pigeonhole: like Dr Who, he was continually regenerating. He explained to me that he got bored quickly with one area, and moved on to another. As well as his academic abilities, he was a talented and natural cartoonist: I still have a couple of the tiny fanzine comics he produced as a student.

Of course he did some serious science for his DPhil and later career: but he also took a strong interest in typography and typesetting. He digitized some beautiful Japanese crests for the chapter title pages of his DPhil dissertation. Alan dragged me in typography with him, a distraction I have enjoyed ever since. Among other projects, Alan and I produced a font containing some extra symbols so that we could use them in our papers, and named it St Mary’s Road after our Oxford digs. And Alan produced a full blackboard bold font, complete with lowercase letters and punctuation: you can see some of it in the order of service. But Alan was not satisfied with merely creating these things; he went to all the trouble to package them up properly and get them included in standard software distributions, so that they would be available for everyone: Alan loved to build things for people to use. These two fonts are still in regular use 35 years later, and I’m sure they will be reminding us of him for a long time to come.

by jeremygibbons at November 05, 2024 01:13 PM

GHC Developer Blog

GHC 9.12.1-alpha2 is now available

GHC 9.12.1-alpha2 is now available

Zubin Duggal - 2024-11-05

The GHC developers are very pleased to announce the availability of the second alpha release of GHC 9.12.1. Binary distributions, source distributions, and documentation are available at downloads.haskell.org.

We hope to have this release available via ghcup shortly.

GHC 9.12 will bring a number of new features and improvements, including:

  • The new language extension OrPatterns allowing you to combine multiple pattern clauses into one.

  • The MultilineStrings language extension to allow you to more easily write strings spanning multiple lines in your source code.

  • Improvements to the OverloadedRecordDot extension, allowing the built-in HasField class to be used for records with fields of non lifted representations.

  • The NamedDefaults language extension has been introduced allowing you to define defaults for typeclasses other than Num.

  • More deterministic object code output, controlled by the -fobject-determinism flag, which improves determinism of builds a lot (though does not fully do so) at the cost of some compiler performance (1-2%). See #12935 for the details

  • GHC now accepts type syntax in expressions as part of GHC Proposal #281.

  • The WASM backend now has support for TemplateHaskell.

  • … and many more

A full accounting of changes can be found in the release notes. As always, GHC’s release status, including planned future releases, can be found on the GHC Wiki status.

We would like to thank GitHub, IOG, the Zw3rk stake pool, Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, the Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprise this release.

As always, do give this release a try and open a ticket if you see anything amiss.

by ghc-devs at November 05, 2024 12:00 AM

November 04, 2024

in Code

Functors to Monads: A Story of Shapes

For many years now I’ve been using a mental model and intuition that has guided me well for understanding and teaching and using functors, applicatives, monads, and other related Haskell abstractions, as well as for approaching learning new ones. Sometimes when teaching Haskell I talk about this concept and assume everyone already has heard it, but I realize that it’s something universal yet easy to miss depending on how you’re learning it. So, here it is: how I understand the Functor and other related abstractions and free constructions in Haskell.

The crux is this: instead of thinking about what fmap changes, ask: what does fmap keep constant?

This isn’t a rigorous understanding and isn’t going to explain every aspect about every Functor, and will probably only be useful if you already know a little bit about Functors in Haskell. But it’s a nice intuition trick that has yet to majorly mislead me.

The Secret of Functors

First of all, what is a Functor? A capital-F Functor, that is, the Haskell typeclass and abstraction. Ask a random Haskeller on the street and they’ll tell you that it’s something that can be “mapped over”, like a list or an optional. Maybe some of those random Haskellers will feel compelled to mention that this mapping should follow some laws…they might even list the laws. Ask them why these laws are so important and maybe you’ll spend a bit of time on this rhetorical street of Haskellers before finding one confident enough to give an answer.

So I’m going to make a bit of a tautological leap: a Functor gives you a way to “map over” values in a way that preserves shape. And what is “shape”? A shape is the thing that fmap preserves.

The Functor typeclass is simple enough: for Functor f, you have a function fmap :: (a -> b) -> f a -> f b, along with fmap id = id and fmap f . fmap g = fmap (f . g). Cute things you can drop into quickcheck to prove for your instance, but it seems like those laws are hiding some sort of deeper, fundamental truth.

The more Functors you learn about, the more you see that fmap seems to always preserve “something”:

  • For lists, fmap preserves length and relative orderings.
  • For optionals (Maybe), fmap preserves presence (the fact that something is there or not). It cannot flip a Just to a Nothing or vice versa.
  • For Either e, fmap preserves the error (if it exists) or the fact that it was succesful.
  • For Map k, fmap preserves the keys: which keys exist, how many there are, their relative orderings, etc.
  • For IO, fmap preserves the IO effect. Every bit of external I/O that an IO action represents is unchanged by an fmap, as well as exceptions.
  • For Writer w or (,) w, fmap preserves the “logged” w value, leaving it unchanged. Same for Const w.
  • For Tree, fmap preserves the tree structure: how many layers, how big they are, how deep they are, etc.
  • For State s, fmap preserves what happens to the input state s. How a State s transform a state value s is unchanged by fmap
  • For ConduitT i o m from conduit, fmap preserves what the conduit pulls upstream and what it yields downstream. fmap will not cause the conduit to yield more or different objects, nor cause it to consume/pull more or less.
  • For parser-combinator Parser, fmap preserves what input is consumed or would fail to be consumed. fmap cannot change whether an input string would fail or succeed, and it cannot change how much it consumes.
  • For optparse-applicative Parsers, fmap preserves the command line arguments available. It leaves the --help message of your program unchanged.

It seems like as soon as you define a Functor instance, or as soon as you find out that some type has a Functor instance, it magically induces some sort of … “thing” that must be preserved.1 A conserved quantity must exist. It reminds me a bit of Noether’s Theorem in Physics, where any continuous symmetry “induces” a conserved quantity (like how translation symmetry “causes” conservation of momentum). In Haskell, every lawful Functor instance induces a conserved quantity. I don’t know if there is a canonical name for this conserved quantity, but I like to call it “shape”.

A Story of Shapes

The word “shape” is chosen to be as devoid of external baggage/meaning as possible while still having some. The word isn’t important as much as saying that there is some “thing” preserved by fmap, and not exactly the nature of that “thing”. The nature of that thing changes a lot from Functor to Functor, where we might better call it an “effect” or a “structure” specifically, but that some “thing” exists is almost universal.

Of course, the value if this “thing” having a canonical name at all is debatable. I were to coin a completely new term I might call it a “conserved charge” or “gauge” in allusion to physics. But the most useful name probably would be shape.

For some Functor instances, the word shape is more literal than others. For trees, for instance, you have the literal shape of the tree preserved. For lists, the “length” could be considered a literal shape. Map k’s shape is also fairly literal: it describes the structure of keys that exist in the map. But for Writer w and Const w, shape can be interpreted as some information outside of the values you are mapping that is left unchanged by mapping. For Maybe and Either e shape also considers if there has been any short-circuiting. For State s and IO and Parser, “shape” involves some sort of side-computation or consumption that is left unchanged by fmap, often called an effect. For optparse-applicative, “shape” involves some sort of inspectable and observable static aspects of a program. “Shape” comes in all forms.

But, this intuition of “looking for that conserved quantity” is very helpful for learning new Functors. If you stumble onto a new type that you know is a Functor instance, you can immediately ask “What shape is this fmap preserving?”, and it will almost always yield insight into that type.

This viewpoint also sheds insight onto why Set.map isn’t a good candidate for fmap for Data.Set: What “thing” does Set.map f preserve? Not size, for sure. In a hypothetical world where we had ordfmap :: Ord b => (a -> b) -> f a -> f b, we would still need Set.map to preserve something for it to be useful as an “Ord-restricted Functor”.2

A Result

Before we move on, let’s look at another related and vague concept that is commonly used when discussing functors: fmap is a way to map a function that preserves the shape and changes the result.

If shape is the thing that is preserved by fmap, result is the thing that is changed by it. fmap cleanly splits the two.

Interestingly, most introduction to Functors begin with describing functor values as having a result and fmap as the thing that changes it, in some way. Ironically, though it’s a more common term, it’s by far the more vague and hard-to-intuit concept.

For something like Maybe, “result” is easy enough: it’s the value present if it exists. For parser-combinator Parsers too it’s relatively simple: the “shape” is the input consumed but the “result” is the Haskell value you get as a result of the consumption. For optparse-applicative parser, it’s the actual parsed command line arguments given by the user at runtime. But sometimes it’s more complicated: for the technical List functor, the “non-determinism” functor, the “shape” is the number of options to choose from and the order you get them in, and the “result” (to use precise semantics) is the non-deterministic choice that you eventually pick or iterate over.

So, the “result” can become a bit confusing to generalize. So, in my mind, I usually reduce the definitions to:

  • Shape: the “thing” that fmap preserves: the f in f a
  • Result: the “thing” that fmap changes: the a in f a

With this you could “derive” the Functor laws:

  • fmap id == id: fmap leaves the shape unchanged, id leaves the result unchanged. So entire thing must remain unchanged!
  • fmap f . fmap g == fmap (f . g). In both cases the shape remains unchanged, but one changes the result by f after g, and the other changes the result by f . g. They must be the same transformation!

All neat and clean, right? So, maybe the big misdirection is focusing too much on the “result” when learning Functors, when we should really be focusing more on the “shape”, or at least the two together.

Once you internalize “Functor gives you shape-preservation”, this helps you understand the value of the other common typeclass abstractions in Haskell as well, and how they function based on how they manipulate “shape” and “result”.

Traversable

For example, what does the Traversable typeclass give us? Well, if Functor gives us a way to map pure functions and preserve shape, then Traversable gives us a way to map effectful functions and preserve shape.

Whenever someone asks me about my favorite Traversable instance, I always say it’s the Map k traversable:

traverse :: Applicative f => (a -> f b) -> Map k a -> f (Map k b)

Notice how it has no constraints on k? Amazing isn’t it? Map k b lets us map an (a -> f b) over the values at each key in a map, and collects the results under the key the a was originally under.

In essence, you can be assured that the result map has the same keys as the original map, perfectly preserving the “shape” of the map. The Map k instance is the epitome of beautiful Traversable instances. We can recognize this by identifying the “shape” that traverse is forced to preserve.

Applicative

What does the Applicative typeclass give us? It has ap and pure, but its laws are infamously difficult to understand.

But, look at liftA2 (,):

liftA2 (,) :: Applicative f => f a -> f b -> f (a, b)

It lets us take “two things” and combine their shapes. And, more importantly, it combines the shapes without considering the results.

  • For Writer w, <*> lets us combine the two logged values using mappend while ignoring the actual a/b results.
  • For list, <*> (the cartesian product) lets us multiply the lengths of the input lists together. The length of the new list ignores the actual contents of the list.
  • For State s, <*> lets you compose the s -> s state functions together, ignoring the a/bs
  • For Parser, <*> lets you sequence input consumption in a way that doesn’t depend on the actual values you parse: it’s “context-free” in a sense, aside from some caveats.
  • For optparse-applicative, <*> lets you combine your command line argument specs together, without depending on the actual values provided at runtime by the caller.

The key takeaway is that the “final shape” only depends on the input shapes, and not the results. You can know the length of <*>-ing two lists together with only knowing the length of the input lists, and you can also know the relative ordering of inputs to outputs. Within the specific context of the semantics of IO, you can know what “effect” <*>-ing two IO actions would produce only knowing the effects of the input IO actions3. You can know what command line arguments <*>-ing two optparse-applicative parsers would have only knowing the command line arguments in the input parsers. You can know what strings <*>-ing two parser-combinator parsers would consume or reject, based only on the consumption/rejection of the input parsers. You can know the final log of <*>-ing two Writer w as together by only knowing the logs of the input writer actions.

And hey…some of these combinations feel “monoidal”, don’t they?

  • Writer w sequences using mappend
  • List lengths sequence by multiplication
  • State s functions sequence by composition

You can also imagine “no-op” actions:

  • Writer w’s no-op action would log mempty, the identity of mappend
  • List’s no-op action would have a length 1, the identity of multiplication
  • State s’s no-op action would be id, the identity of function composition

That might sound familiar — these are all pure from the Applicative typeclass!

So, the Applicative typeclass laws aren’t that mysterious at all. If you understand the “shape” that a Functor induces, Applicative gives you a monoid on that shape! This is why Applicative is often called the “higher-kinded” Monoid.

This intuition takes you pretty far, I believe. Look at the examples above where we clearly identify specific Applicative instances with specific Monoid instances (Monoid w, Monoid (Product Int), Monoid (Endo s)).

Put in code:

-- A part of list's shape is its length and the monoid is (*, 1)
length (xs <*> ys) == length xs * length ys
length (pure r) == 1

-- Maybe's shape is isJust and the monoid is (&&, True)
isJust (mx <*> my) == isJust mx && isJust my
isJust (pure r) = True

-- State's shape is execState and the monoid is (flip (.), id)
execState (sx <*> sy) == execState sy . execState sx
execState (pure r) == id

-- Writer's shape is execWriter and the monoid is (<>, mempty)
execWriter (wx <*> wy) == execWriter wx <> execWriter wy
execWriter (pure r) == mempty

We can also extend this to non-standard Applicative instances: the ZipList newtype wrapper gives us an Applicative instance for lists where <*> is zipWith. These two have the same Functor instances, so their “shape” (length) is the same. And for both the normal Applicative and the ZipList Applicative, you can know the length of the result based on the lengths of the input, but ZipList combines shapes using the Min monoid, instead of the Product monoid. And the identity of Min is positive infinity, so pure for ZipList is an infinite list.

-- A part of ZipList's shape is length and its monoid is (min, infinity)
length (xs <*> ys) == length xs `min` length ys
length (pure r) == infinity

The “know-the-shape-without-knowing-the-results” property is actually leveraged by many libraries. It’s how optparse-applicative can give you --help output: the shape of the optparse-applicative parser (the command line arguments list) can be computed without knowing the results (the actual arguments themselves at runtime). You can list out what arguments are expecting without ever getting any input from the user.

This is also leveraged by the async library to give us the Concurrently Applicative instance. Normally <*> for IO gives us sequential combination of IO effects. But, <*> for Concurrently gives us parallel combination of IO effects. We can launch all of the IO effects in parallel at the same time because we know what the IO effects are before we actually have to execute them to get the results. If we needed to know the results, this wouldn’t be possible.

This also gives some insight into the Backwards Applicative wrapper — because the shape of the final does not depend on the result of either, we are free to combine the shapes in whatever order we want. In the same way that every monoid gives rise to a “backwards” monoid:

ghci> "hello" <> "world"
"helloworld"
ghci> getDual $ Dual "hello" <> Dual "world"
"worldhello"

Every Applicative gives rise to a “backwards” Applicative that does the shape “mappending” in reverse order:

ghci> putStrLn "hello" *> putStrLn "world"
hello
world
ghci> forwards $ Backwards (putStrLn "hello") *> Backwards (putStrLn "world")
world
hello

The monoidal nature of Applicative with regards to shapes and effects is the heart of the original intent, and I’ve discussed this in earlier blog posts.

Alternative

The main function of the Alternative typeclass is <|>:

(<|>) :: Alternative f => f a -> f a -> f a

At first this might look a lot like <*> or liftA2 (,)

liftA2 (,) :: Applicative f => f a -> f b -> f (a, b)

Both of them take two f a values and squish them into a single one. Both of these are also monoidal on the shape, independent of the result. They have a different monoidal action on <|> than as <*>:

-- A part of list's shape is its length:
-- the Ap monoid is (*, 1), the Alt monoid is (+, 0)
length (xs <*> ys) == length xs * length ys
length (pure r) == 1
length (xs <|> ys) == length xs + length ys
length empty == 0

-- Maybe's shape is isJust:
-- The Ap monoid is (&&, True), the Alt monoid is (||, False)
isJust (mx <*> my) == isJust mx && isJust my
isJust (pure r) = True
isJust (mx <|> my) == isJust mx || isJust my
isJust empty = False

If we understand that functors have a “shape”, Applicative implies that the shapes are monoidal, and Alternative implies that the shapes are a “double-monoid”. The exact nature of how the two monoids relate to each other, however, is not universally agreed upon. For many instances, however, it does happen to form a semiring, where empty “annihilates” via empty <*> x == empty, and <*> distributes over <|> like x <*> (y <|> z) == (x <*> y) <|> (x <*> z). But this is not universal.

However, what does Alternative bring to our shape/result dichotomy that Applicative did not? Notice the subtle difference between the two:

liftA2 (,) :: Applicative f => f a -> f b -> f (a, b)
(<|>) :: Alternative f => f a -> f a -> f a

For Applicative, the “result” comes from the results of both inputs. For Alternative, the “result” could come from one or the other input. So, this introduces a fundamental data dependency for the results:

  • Applicative: Shapes merge monoidally independent of the results, but to get the result of the final, you need to produce the results of both of the two inputs in the general case.
  • Alternative: Shapes merge monoidally independent of the results, but to get the result of the final, you need the results of one or the other input in the general case.

This also implies that choice of combination method for shapes in Applicative vs Alternative aren’t arbitrary: the former has to be “conjoint” in a sense, and the latter has to be “disjoint”.

See again that clearly separating the shape and the result gives us the vocabulary to say precisely what the different data dependencies are.

Monad

Understanding shapes and results also help us appreciate more the sheer power that Monad gives us. Look at >>=:

(>>=) :: Monad m => m a -> (a -> m b) -> m b

Using >>= means that the shape of the final action is allowed to depend on the result of the first action! We are no longer in the Applicative/Alternative world where shape only depends on shape.

Now we can write things like:

greet = do
  putStrLn "What is your name?"
  n <- getLine
  putStrLn ("Hello, " ++ n ++ "!")

Remember that for “IO”, the shape is the IO effects (In this case, what exactly gets sent to the terminal) and the “result” is the haskell value computed from the execution of that IO effect. In our case, the action of the result (what values are printed) depends on the result of of the intermediate actions (the getLine). You can no longer know in advance what action the program will have without actually running it and getting the results.

The same thing happens when you start sequencing parser-combinator parsers: you can’t know what counts as a valid parse or how much a parser will consume until you actually start parsing and getting your intermediate parse results.

Monad is also what makes guard and co. useful. Consider the purely Applicative:

evenProducts :: [Int] -> [Int] -> [Bool]
evenProducts xs ys = (\x y -> even (x * y)) <$> xs <*> ys

If you passed in a list of 100 items and a list of 200 items, you can know that the result has 100 * 200 = 20000 items, without actually knowing any of the items in the list.

But, consider an alternative formulation where we are allowed to use Monad operations:

evenProducts :: [Int] -> [Int] -> [(Int, Int)]
evenProducts xs ys = do
  x <- xs
  y <- ys
  guard (even (x * y))
  pure (x, y)

Now, even if you knew the lengths of the input lists, you can not know the length of the output list without actually knowing what’s inside your lists. You need to actually start “sampling”.

That’s why there is no Monad instance for Backwards or optparse-applicative parsers. For Backwards doesn’t work because we’ve now introduced an asymmetry (the m b depends on the a of the m a) that can’t be reversed. For optparse-applicative, it’s because we want to be able to inspect the shape without knowing the results at runtime (so we can show a useful --help without getting any actual arguments): but, with Monad, we can’t know the shape without knowing the results!

In a way, Monad simply “is” the way to combine Functor shapes together where the final shape is allowed to depend on the results. Hah, I tricked you into reading a monad tutorial!

Free Structures

I definitely write way too much about free structures on this blog. But this “shapeful” way of thinking also gives rise to why free structures are so compelling and interesting to work with in Haskell.

Before, we were describing shapes of Functors and Applicatives and Monads that already existed. We had this Functor, what was its shape?

However, what if we had a shape that we had in mind, and wanted to create an Applicative or Monad that manipulated that shape?

For example, let’s roll our own version of optparse-applicative that only supported --myflag somestring options. We could say that the “shape” is the list of supported option and parsers. So a single element of this shape would be the specification of a single option:

data Option a = Option { optionName :: String, optionParse :: String -> Maybe a }
  deriving Functor

The “shape” here is the name and also what values it would parse, essentially. fmap won’t affect the name of the option and won’t affect what would succeed or fail.

Now, to create a full-fledged multi-argument parser, we can use Ap from the free library:

type Parser = Ap Option

We specified the shape we wanted, now we get the Applicative of that shape for free! We can now combine our shapes monoidally using the <*> instance, and then use runAp_ to inspect it:

data Args = Args { myStringOpt :: String, myIntOpt :: Int }

parseTwo :: Parser args
parseTwo = Args <$> liftAp stringOpt <*> liftAp intOpt
  where
    stringOpt = Option "string-opt" Just
    intOpt = Option "int-opt" readMaybe

getAllOptions :: Parser a -> [String]
getAllOptions = runAp_ (\o -> [optionName o])
ghci> getAllOptions parseTwo
["string-opt", "int-opt"]

Remember that Applicative is like a “monoid” for shapes, so Ap gives you a free “monoid” on your custom shape: you can now create list-like “sequences” of your shape that merge via concatenation through <*>. You can also know that fmap on Ap Option will not add or remove options: it’ll leave the actual options unchanged. It’ll also not affect what options would fail or succeed to parse.

You could also write a parser combinator library this way too! Remember that the “shape” of a parser combinator Parser is the string that it consumes or rejects. The single element might be a parser that consumes and rejects a single Char:

newtype Single a = Single { satisfies :: Char -> Maybe a }
  deriving Functor

The “shape” is whether or not it consumes or rejects a char. Notice that fmap for this cannot change whether or not a char is rejected or accepted: it can only change the Haskell result a value. fmap can’t flip the Maybe into a Just or Nothing.

Now we can create a full monadic parser combinator library by using Free from the free library:

type Parser = Free Single

Again, we specified the shape we wanted, and now we have a Monad for that shape! For more information on using this, I’ve written a blog post in the past. Ap gives you a free “monoid” on your shapes, but in a way Free gives you a “tree” for your shapes, where the sequence of shapes depends on which way you go down their results. And, again, fmap won’t ever change what would or would not be parsed.

How do we know what free structure to pick? Well, we ask questions about what we want to be able to do with our shape. If we want to inspect the shape without knowing the results, we’d use the free Applicative or free Alternative. As discussed earlier, using the free Applicative means that our final result must require producing all of the input results, but using the free Alternative means it doesn’t. If we wanted to allow the shape to depend on the results (like for a context-sensitive parser), we’d use the free Monad. Understanding the concept of the “shape” makes this choice very intuitive.

The Shape of You

Next time you encounter a new Functor, I hope these insights can be useful. Ask yourself, what is fmap preserving? What is fmap changing? And from there, its secrets will unfold before you. Emmy Noether would be proud.

Special Thanks

I am very humbled to be supported by an amazing community, who make it possible for me to devote time to researching and writing these posts. Very special thanks to my supporter at the “Amazing” level on patreon, Josh Vera! :)


  1. There are some exceptions, especially degenerate cases like Writer () aka Identity which add no meaningful structure. So for these this mental model isn’t that useful.↩︎

  2. Incidentally, Set.map does preserve one thing: non-emptiness. You can’t Set.map an empty set into a non-empty one and vice versa. So, maybe if we recontextualized Set as a “search for at least one result” Functor or Monad where you could only ever observe a single value, Set.map would work for Ord-restricted versions of those abstractions, assuming lawful Ord instances.↩︎

  3. That is, if we take the sum consideration of all input-output with the outside world, independent of what happens within the Haskell results, we can say the combination of effects is deterministic.↩︎

by Justin Le at November 04, 2024 07:44 PM

November 03, 2024

Haskell Interlude

57: Gabriele Keller

Gabriele Keller, professor at Utrecht University, is interviewed by Andres and Joachim. We follow her journey through the world as well as programming languages, learn why Haskell is the best environment for embedding languages and how the desire to implement parallel programming sparked the development of type families in Haskell and that teaching functional programming works better with graphics.

by Haskell Podcast at November 03, 2024 08:00 PM

November 02, 2024

Brent Yorgey

Competitive Programming in Haskell: Union-Find

Competitive Programming in Haskell: Union-Find

Posted on November 2, 2024
Tagged , ,

Union-find

A union-find data structure (also known as a disjoint set data structure) keeps track of a collection of disjoint sets, typically with elements drawn from \(\{0, \dots, n-1\}\). For example, we might have the sets

\(\{1,3\}, \{0, 4, 2\}, \{5, 6, 7\}\)

A union-find structure must support three basic operations:

  • We can \(\mathit{create}\) a union-find structure with \(n\) singleton sets \(\{0\}\) through \(\{n-1\}\). (Alternatively, we could support two operations: creating an empty union-find structure, and adding a new singleton set; occasionally this more fine-grained approach is useful, but we will stick with the simpler \(\mathit{create}\) API for now.)

  • We can \(\mathit{find}\) a given \(x \in \{0, \dots, n-1\}\), returning some sort of “name” for the set \(x\) is in. It doesn’t matter what these names are; the only thing that matters is that for any \(x\) and \(y\), \(\mathit{find}(x) = \mathit{find}(y)\) if and only if \(x\) and \(y\) are in the same set. The most important application of \(\mathit{find}\) is therefore to check whether two given elements are in the same set or not.

  • We can \(\mathit{union}\) two elements, so the sets that contain them become one set. For example, if we \(\mathit{union}(2,5)\) then we would have

    \(\{1,3\}, \{0, 4, 2, 5, 6, 7\}\)

Note that \(\mathit{union}\) is a one-way operation: once two sets have been unioned together, there’s no way to split them apart again. (If both merging and splitting are required, one can use a link/cut tree, which is very cool—and possibly something I will write about in the future—but much more complex.) However, these three operations are enough for union-find structures to have a large number of interesting applications!

In addition, we can annotate each set with a value taken from some commutative semigroup. When creating a new union-find structure, we must specify the starting value for each singleton set; when unioning two sets, we combine their annotations via the semigroup operation.

  • For example, we could annotate each set with its size; singleton sets always start out with size 1, and every time we union two sets we add their sizes.
  • We could also annotate each set with the sum, product, maximum, or minumum of all its elements.
  • Of course there are many more exotic examples as well.

We typically use a commutative semigroup, as in the examples above; this guarantees that a given set always has a single well-defined annotation value, regardless of the sequence of union-find operations that were used to create it. However, we can actually use any binary operation at all (i.e. any magma), in which case the annotations on a set may reflect the precise tree of calls to \(\mathit{union}\) that were used to construct it; this can occasionally be useful.

  • For example, we could annotate each set with a list of values, and combine annotations using list concatenation; the order of elements in the list associated to a given set will depend on the order of arguments to \(\mathit{union}\).

  • We could also annotate each set with a binary tree storing values at the leaves. Each singleton set is annotated with a single leaf; to combine two trees we create a new branch node with the two trees as its children. Then each set ends up annotated with the precise tree of calls to \(\mathit{union}\) that were used to create it.

Implementing union-find

My implementation is based on one by Kwang Yul Seo, but I have modified it quite a bit. The code is also available in my comprog-hs repository. This blog post is not intended to be a comprehensive union-find tutorial, but I will explain some things as we go.

{-# LANGUAGE RecordWildCards #-}

module UnionFind where

import Control.Monad (when)
import Control.Monad.ST
import Data.Array.ST

Let’s start with the definition of the UnionFind type itself. UnionFind has two type parameters: s is a phantom type parameter used to limit the scope to a given ST computation; m is the type of the arbitrary annotations. Note that the elements are also sometimes called “nodes”, since, as we will see, they are organized into trees.

type Node = Int
data UnionFind s m = UnionFind {

The basic idea is to maintain three mappings:

  • First, each element is mapped to a parent (another element). There are no cycles, except that some elements can be their own parent. This means that the elements form a forest of rooted trees, with the self-parenting elements as roots. We store the parent mapping as an STUArray (see here for another post where we used STUArray) for efficiency.
  parent :: !(STUArray s Node Node),
  • Each element is also mapped to a size. We maintain the invariant that for any element which is a root (i.e. any element which is its own parent), we store the size of the tree rooted at that element. The size associated to other, non-root elements does not matter.

    (Many implementations store the height of each tree instead of the size, but it does not make much practical difference, and the size seems more generally useful.)

  sz :: !(STUArray s Node Int),
  • Finally, we map each element to a custom annotation value; again, we only care about the annotation values for root nodes.
  ann :: !(STArray s Node m) }

To \(\mathit{create}\) a new union-find structure, we need a size and a function mapping each element to an initial annotation value. Every element starts as its own parent, with a size of 1. For convenience, we can also make a variant of createWith that gives every element the same constant annotation value.

createWith :: Int -> (Node -> m) -> ST s (UnionFind s m)
createWith n m =
  UnionFind
    <$> newListArray (0, n - 1) [0 .. n - 1]    -- Every node is its own parent
    <*> newArray (0, n - 1) 1                   -- Every node has size 1
    <*> newListArray (0, n - 1) (map m [0 .. n - 1])

create :: Int -> m -> ST s (UnionFind s m)
create n m = createWith n (const m)

To perform a \(\mathit{find}\) operation, we keep following parent references up the tree until reaching a root. We can also do a cool optimization known as path compression: after finding a root, we can directly update the parent of every node along the path we just traversed to be the root. This means \(\mathit{find}\) can be very efficient, since it tends to create trees that are extremely wide and shallow.

find :: UnionFind s m -> Node -> ST s Node
find uf@(UnionFind {..}) x = do
  p <- readArray parent x
  if p /= x
    then do
      r <- find uf p
      writeArray parent x r
      pure r
    else pure x

connected :: UnionFind s m -> Node -> Node -> ST s Bool
connected uf x y = (==) <$> find uf x <*> find uf y

Finally, to implement \(\mathit{union}\), we find the roots of the given nodes; if they are not the same we make the root with the smaller tree the child of the other root, combining sizes and annotations as appropriate.

union :: Semigroup m => UnionFind s m -> Node -> Node -> ST s ()
union uf@(UnionFind {..}) x y = do
  x <- find uf x
  y <- find uf y
  when (x /= y) $ do
    sx <- readArray sz x
    sy <- readArray sz y
    mx <- readArray ann x
    my <- readArray ann y
    if sx < sy
      then do
        writeArray parent x y
        writeArray sz y (sx + sy)
        writeArray ann y (mx <> my)
      else do
        writeArray parent y x
        writeArray sz x (sx + sy)
        writeArray ann x (mx <> my)

Note the trick of writing x <- find uf x: this looks kind of like an imperative statement that updates the value of a mutable variable x, but really it just makes a new variable x which shadows the old one.

Finally, a few utility functions. First, one to get the size of the set containing a given node:

size :: UnionFind s m -> Node -> ST s Int
size uf@(UnionFind {..}) x = do
  x <- find uf x
  readArray sz x

Also, we can provide functions to update and fetch the custom annotation value associated to the set containing a given node.

updateAnn :: Semigroup m => UnionFind s m -> Node -> m -> ST s ()
updateAnn uf@(UnionFind {..}) x m = do
  x <- find uf x
  old <- readArray ann x
  writeArray ann x (old <> m)
  -- We could use modifyArray above, but the version of the standard library
  -- installed on Kattis doesn't have it

getAnn :: UnionFind s m -> Node -> ST s m
getAnn uf@(UnionFind {..}) x = do
  x <- find uf x
  readArray ann x

Challenge

Here are a couple of problems I challenge you to solve for next time:

<noscript>Javascript needs to be activated to view comments.</noscript>

by Brent Yorgey at November 02, 2024 12:00 AM

October 31, 2024

Abhinav Sarkar

Going REPLing with Haskeline

So you went ahead and created a new programming language, with an AST, a parser, and an interpreter. And now you hate how you have to write the programs in your new language in files to run them? You need a REPL! In this post, we’ll create a shiny REPL with lots of nice features using the Haskeline library to go along with your new PL that you implemented in Haskell.

This post was originally published on abhinavsarkar.net.

The Demo

First a short demo:

<noscript>
Play demo <noscript></noscript>
</noscript>

That is a pretty good REPL, isn’t it? You can even try it online1, running entirely in your browser.

Dawn of a New Language

Let’s assume that we have created a new small Lisp2, just large enough to be able to conveniently write and run the Fibonacci function that returns the nth Fibonacci number. That’s it, nothing more. This lets us focus on the features of the REPL3, not the language.

We have a parser to parse the code from text to an AST, and an interpreter that evaluates an AST and returns a value. We are not going into the details of the parser and the interpreter, just listing the type signatures of the functions they provide is enough for this post.

Let’s start with the AST:

module Language.FiboLisp.Types where

import Data.Text qualified as Text
import Data.Text.Lazy qualified as LText
import Text.Pretty.Simple qualified as PS
import Text.Printf (printf)

type Ident = String

data Expr
  = Num_ Integer
  | Bool_ Bool
  | Var Ident
  | BinaryOp Op Expr Expr
  | If Expr Expr Expr
  | Apply Ident [Expr]
  deriving (Show)

data Op = Add | Sub | LessThan
  deriving (Show, Enum)

data Def = Def {defName :: Ident, defParams :: [Ident], defBody :: Expr}

data Program = Program [Def] [Expr]
  deriving (Show)

carKeywords :: [String]
carKeywords = ["def", "if", "+", "-", "<"]

instance Show Def where
  show Def {..} =
    printf "(Def %s [%s] (%s))" defName (unwords defParams) (show defBody)

showProgram :: Program -> String
showProgram =
  Text.unpack
    . LText.toStrict
    . PS.pShowOpt
      ( PS.defaultOutputOptionsNoColor
          { PS.outputOptionsIndentAmount = 2,
            PS.outputOptionsCompact = True,
            PS.outputOptionsCompactParens = True
          }
      )

That’s right! We named our little language FiboLisp.

FiboLisp is expression oriented; everything is an expression. So naturally, we have an Expr AST. Writing the Fibonacci function requires not many syntactic facilities. In FiboLisp we have:

  • integer numbers,
  • booleans,
  • variables,
  • addition, subtraction, and less-than binary operations on numbers,
  • conditional if expressions, and
  • function calls by name.

We also have function definitions, captured by Def, which records the function name, its parameter names, and its body as an expression.

And finally we have Programs, which are a bunch of function definitions to define, and another bunch of expressions to evaluate.

Short and simple. We don’t need anything more4. This is how the Fibonacci function looks in FiboLisp:

(def fibo [n]
  (if (< n 2)
    n
    (+ (fibo (- n 1)) (fibo (- n 2)))))

We can see all the AST types in use here. Note that FiboLisp is lexically scoped.

The module also lists a bunch of keywords (carKeywords) that can appear in the car5 position of a Lisp expression, that we use later for auto-completion in the REPL, and some functions to convert the AST types to nice looking strings.

For the parser, we have this pared-down code:

module Language.FiboLisp.Parser (ParsingError(..), parse) where

import Control.DeepSeq (NFData)
import Control.Exception (Exception)
import GHC.Generics (Generic)
import Language.FiboLisp.Types

parse :: String -> Either ParsingError Program

data ParsingError = ParsingError String | EndOfStreamError
  deriving (Show, Generic, NFData)

instance Exception ParsingError

The essential function is parse, which takes the code as a string, and returns either a ParsingError on failure, or a Program on success. If the parser detects that an S-expression is not properly closed, it returns an EndOfStreamError error.

We also have this pretty-printer module that converts function ASTs back to pretty Lisp code:

module Language.FiboLisp.Printer (prettyShowDef) where

import Language.FiboLisp.Types

prettyShowDef :: Def -> String

Finally, the last thing before we hit the real topic of this post, the FiboLisp interpreter:

module Language.FiboLisp.Interpreter
  (Value, RuntimeError, interpret, builtinFuncs, builtinVals) where

import Control.DeepSeq (NFData)
import Control.Exception (Exception)
import Data.Map.Strict qualified as Map
import GHC.Generics (Generic)
import Language.FiboLisp.Types

interpret :: (String -> IO ()) -> Program -> IO (Either RuntimeError Value)

newtype RuntimeError = RuntimeError String
  deriving (Show, Generic, NFData)

instance Exception RuntimeError

data Value = ...
  deriving (Show, Generic, NFData)

builtinFuncs :: Map.Map String Value

builtinVals :: [Value]

We have elided the details again. All that matters to us is the interpret function that takes a program, and returns either a runtime error or a value. Value is the runtime representation of the values of FiboLisp expressions, and all we care about is that it can be shown and fully evaluated via NFData6. interpret also takes a String -> IO () function, that’ll be demystified when we get into implementing the REPL.

Lastly, we have a map of built-in functions and a list of built-in values. We expose them so that they can be treated specially in the REPL.

If you want, you can go ahead and fill in the missing code using your favourite parsing and pretty-printing libraries7, and the method of writing interpreters. For this post, those implementation details are not necessary.

Let’s package all this functionality into a module for ease of importing:

module Language.FiboLisp
  ( module Language.FiboLisp.Types,
    module Language.FiboLisp.Parser,
    module Language.FiboLisp.Printer,
    module Language.FiboLisp.Interpreter,
  )
where

import Language.FiboLisp.Interpreter
import Language.FiboLisp.Parser
import Language.FiboLisp.Printer
import Language.FiboLisp.Types

Now, with all the preparations done, we can go REPLing.

A REPL of Our Own

The main functionality that a REPL provides is entering expressions and definitions, one at a time, that it Reads, Evaluates, and Prints, and then Loops back, letting us do the same again. This can be accomplished with a simple program that prompts the user for an input and does all these with it. However, such a REPL will be quite lackluster.

These days programming languages come with advanced REPLs like IPython and nREPL, which provide many functionalities beyond simple REPLing. We want FiboLisp to have a great REPL too.

You may have already noticed some advanced features that our REPL provides in the demo. Let’s state them here:

  1. Commands starting with colon:
    1. to set and unset settings: :set and :unset,
    2. to load files into the REPL: :load,
    3. to show the source code of functions: :source,
    4. to show a help message: :help.
  2. Settings to enable/disable:
    1. dumping of parsed ASTs: dump,
    2. showing program execution times: time.
  3. Multiline expressions and functions, with correct indentation.
  4. Colored output and messages.
  5. Auto-completion of commands, code and file names.
  6. Safety checks when loading files.
  7. Readline-like navigation through the history of previous inputs.

Haskeline — the Haskell library that we use to create the REPL — provides only basic functionalities, upon which we build to provide these features. Let’s begin.

State and Settings

As usual, we start the module with many imports8:

{-# LANGUAGE TemplateHaskell #-}

module Language.FiboLisp.Repl (run) where

import Control.DeepSeq qualified as DS
import Control.Exception (Exception (..), evaluate)
import Control.Lens.Basic qualified as Lens
import Control.Monad (when)
import Control.Monad.Catch qualified as Catch
import Control.Monad.IO.Class (MonadIO, liftIO)
import Control.Monad.Identity (IdentityT (..))
import Control.Monad.Reader (MonadReader, ReaderT (runReaderT))
import Control.Monad.Reader qualified as Reader
import Control.Monad.State.Strict (MonadState, StateT (runStateT))
import Control.Monad.State.Strict qualified as State
import Control.Monad.Trans (MonadTrans, lift)
import Data.Char qualified as Char
import Data.Functor ((<&>))
import Data.List
  (dropWhileEnd, foldl', isPrefixOf, isSuffixOf, nub, sort, stripPrefix)
import Data.Map.Strict qualified as Map
import Data.Maybe (fromJust)
import Data.Set qualified as Set
import Data.Time (NominalDiffTime, diffUTCTime, getCurrentTime)
import Language.FiboLisp qualified as L
import System.Console.Haskeline qualified as H
import System.Console.Terminfo qualified as Term
import System.Directory (canonicalizePath, doesFileExist, getCurrentDirectory)

Notice that we import the previously shown Language.FiboLisp module qualified as L, and Haskeline as H. Another important library that we use here is terminfo, which helps us do colored output.

A REPL must preserve the context through a session. In case of FiboLisp, this means we should be able to define a function9 as one input, and then use it later in the session, one or many times10. The REPL should also respect the REPL settings through the session till they are unset.

Additionally, the REPL has to remember whether it is in middle of writing a multiline input. To support multiline input, the REPL also needs to remember the previous indentation, and the input done in previous lines of a multiline input. Together these form the ReplState:

data ReplState = ReplState
  { _replDefs :: Defs,
    _replSettings :: Settings,
    _replLineMode :: LineMode,
    _replIndent :: Int,
    _replSeenInput :: String
  }

type Defs = Map.Map L.Ident L.Def
type Settings = Set.Set Setting
data Setting = Dump | MeasureTime deriving (Eq, Ord, Enum)
data LineMode = SingleLine | MultiLine deriving (Eq)

instance Show Setting where
  show = \case
    Dump -> "dump"
    MeasureTime -> "time"

Let’s deal with settings first. We set and unset settings using the :set and :unset commands. So, we write the code to parse setting the settings:

data SettingMode = Set | Unset deriving (Eq, Enum)

instance Show SettingMode where
  show = \case
    Set -> ":set"
    Unset -> ":unset"

parseSetting :: String -> Maybe Setting
parseSetting = \case
  "dump" -> Just Dump
  "time" -> Just MeasureTime
  _ -> Nothing

parseSettingMode :: String -> Maybe SettingMode
parseSettingMode = \case
  ":set" -> Just Set
  ":unset" -> Just Unset
  _ -> Nothing

parseSettingCommand :: String -> Either String (SettingMode, Setting)
parseSettingCommand command = case words command of
  [modeStr, settingStr] -> case parseSettingMode modeStr of
    Just mode -> case parseSetting settingStr of
      Just setting -> Right (mode, setting)
      Nothing -> Left $ "Unknown setting: " <> settingStr
    Nothing -> Left $ "Unknown command: " <> command
  [modeStr]
    | Just _ <- parseSettingMode modeStr -> Left "No setting specified"
  _ -> Left $ "Unknown command: " <> command

Nothing fancy here, just splitting the input into words and going through them to make sure they are valid.

The REPL is a monad that wraps over ReplState:

newtype Repl a = Repl
  { runRepl_ :: StateT ReplState (ReaderT AddColor IO) a
  }
  deriving
    ( Functor,
      Applicative,
      Monad,
      MonadIO,
      MonadState ReplState,
      MonadReader AddColor,
      Catch.MonadThrow,
      Catch.MonadCatch,
      Catch.MonadMask
    )

type AddColor = Term.Color -> String -> String

runRepl :: AddColor -> Repl a -> IO a
runRepl addColor =
  fmap fst
    . flip runReaderT addColor
    . flip runStateT (ReplState Map.empty Set.empty SingleLine 0 "")
    . runRepl_

Repl also lets us do IO — is it really a REPL if you can’t do printing — and deal with exceptions. Additionally, we have a read-only state that is a function, which will be explained soon. The REPL starts in the single line mode, with no indentation, functions definitions, settings, or previously seen input.

REPLing Down the Prompt

Let’s go top-down. We write the run function that is the entry point of this module:

run :: IO ()
run = do
  term <- Term.setupTermFromEnv
  let addColor =
        case Term.getCapability term $ Term.withForegroundColor @String of
          Just fc -> fc
          Nothing -> \_ s -> s
  runRepl addColor . H.runInputT settings $ do
    H.outputStrLn $ addColor promptColor "FiboLisp REPL"
    H.outputStrLn $ addColor infoColor "Press <TAB> to start"
    repl
  where
    settings =
      H.setComplete doCompletions $
        H.defaultSettings {H.historyFile = Just ".fibolisp"}

This sets up Haskeline to run our REPL using the functions we provide in the later sections: repl and doCompletions. This also demystifies the read-only state of the REPL: a function that adds colors to our output strings, depending on the capabilities of the terminal in which our REPL is running in. We also set up a history file to remember the previous REPL inputs.

When the REPL starts, we output some messages in nice colors, which are defined as:

promptColor, printColor, outputColor, errorColor, infoColor :: Term.Color
promptColor = Term.Green
printColor = Term.White
outputColor = Term.Green
errorColor = Term.Red
infoColor = Term.Cyan

Off we go repling now:

type Prompt = H.InputT Repl

repl :: Prompt ()
repl = do
  replLineMode .= SingleLine
  replIndent .= 0
  replSeenInput .= ""
  Catch.handle (\H.Interrupt -> repl) . H.withInterrupt $
    readInput >>= \case
      EndOfInput -> outputWithColor promptColor "Goodbye."
      input -> evalAndPrint input >> repl

outputWithColor :: Term.Color -> String -> Prompt ()
outputWithColor color text = do
  addColor <- getAddColor
  H.outputStrLn $ addColor color text

getAddColor :: Prompt AddColor
getAddColor = lift Reader.ask

We infuse our Repl with the powers of Haskeline by wrapping it with Haskeline’s InputT monad transformer, and call it the Prompt type. In the repl function, we readInput, evalAndPrint it, and repl again.

We also deal with the user quitting the REPL (the EndOfInput case), and hitting Ctrl + C to interrupt typing or a running evaluation (the handling for H.Interrupt).

Wait a minute! What is that imperative looking .= doing in our Haskell code? That’s right, we are looking through some lenses!

type Lens' s a = Lens.Lens s s a a

replDefs :: Lens' ReplState Defs
replDefs = $(Lens.field '_replDefs)

replSettings :: Lens' ReplState Settings
replSettings = $(Lens.field '_replSettings)

replLineMode :: Lens' ReplState LineMode
replLineMode = $(Lens.field '_replLineMode)

replIndent :: Lens' ReplState Int
replIndent = $(Lens.field '_replIndent)

replSeenInput :: Lens' ReplState String
replSeenInput = $(Lens.field '_replSeenInput)

use :: (MonadTrans t, MonadState s m) => Lens' s a -> t m a
use l = lift . State.gets $ Lens.view l

(.=) :: (MonadTrans t, MonadState s m) => Lens' s a -> a -> t m ()
l .= a = lift . State.modify' $ Lens.set l a

(%=) :: (MonadTrans t, MonadState s m) => Lens' s a -> (a -> a) -> t m ()
l %= f = lift . State.modify' $ Lens.over l f

If you’ve never encountered lenses before, you can think of them as pairs of setters and getters. The repl* lenses above are for setting and getting the corresponding fields from the ReplState data type11. The use, .=, and %= functions are for getting, setting and modifying respectively the state in the State monad using lenses. We see them in action at the beginning of the repl function when we use .= to set the various fields of ReplState to their initial values in the State monad.

All that is left now is actually reading the input, evaluating it and printing the results.

Reading the Input

Haskeline gives us functions to read the user’s input as text. However, being Haskellers, we prefer some structure around it:

data Input
  = Setting (SettingMode, Setting)
  | Load FilePath
  | Source String
  | Help
  | Program L.Program
  | BadInputError String
  | EndOfInput

We’ve got all previously mentioned cases covered with the Input data type. We also do some input validation and capture errors for the failure cases with the BadInputError constructor. EndOfInput is used for when the user quits the REPL.

Here is how we read the input:

readInput :: Prompt Input
readInput = do
  addColor <- getAddColor
  lineMode <- use replLineMode
  prevIndent <- use replIndent

  let promptSym = case lineMode of SingleLine -> "λ"; _ -> "|"
      prompt = addColor promptColor $ promptSym <> "> "

  mInput <- H.getInputLineWithInitial prompt (replicate prevIndent ' ', "")
  let currentIndent = maybe 0 (length . takeWhile (== ' ')) mInput

  case trimStart . trimEnd <$> mInput of
    Nothing -> return EndOfInput
    Just input | null input -> do
      replIndent .= case lineMode of
        SingleLine -> prevIndent
        MultiLine -> currentIndent
      readInput
    Just input@(':' : _) -> parseCommand input
    Just input -> parseCode input currentIndent

trimStart :: String -> String
trimStart = dropWhile Char.isSpace

trimEnd :: String -> String
trimEnd = dropWhileEnd Char.isSpace

We use the getInputLineWithInitial function provided by Haskeline to show a prompt and read user’s input as a string. The prompt shown depends on the LineMode of the REPL state. In the SingleLine mode we show λ>, where in the MultiLine mode we show |>.

If there is no input, that means the user has quit the REPL. In that case we return EndOfInput, which is handled in the repl function. If the input is empty, we read more input, preserving the previous indentation (prevIndent) in the MultiLine mode.

If the input starts with :, we parse it for various commands:

parseCommand :: String -> Prompt Input
parseCommand input
  | ":help" `isPrefixOf` input = return Help
  | ":load" `isPrefixOf` input =
      checkFilePath . trimStart . fromJust $ stripPrefix ":load" input
  | ":source" `isPrefixOf` input = do
      return . Source . trimStart . fromJust $ stripPrefix ":source" input
  | input == ":" = return $ BadInputError "No command specified"
  | otherwise = case parseSettingCommand input of
      Right setting -> return $ Setting setting
      Left err -> return $ BadInputError err

checkFilePath :: String -> Prompt Input
checkFilePath file
  | null file = return $ BadInputError "No file specified"
  | otherwise =
      isSafeFilePath file <&> \case
        True -> Load file
        False -> BadInputError $ "Cannot access file: " <> file

isSafeFilePath :: (MonadIO m) => FilePath -> m Bool
isSafeFilePath fp =
  liftIO $ isPrefixOf <$> getCurrentDirectory <*> canonicalizePath fp

The :help and :source cases are straightforward. In case of :load, we make sure to check that the file asked to be loaded is located somewhere inside the current directory of the REPL or its recursive subdirectories. Otherwise, we deny loading by returning a BadInputError. We parse the settings using the parseSettingCommand function we wrote earlier.

If the input is not a command, we parse it as code:

parseCode :: String -> Int -> Prompt Input
parseCode currentInput indent = do
  seenInput <- use replSeenInput
  let input = seenInput <> " " <> currentInput
  case L.parse input of
    Left L.EndOfStreamError -> do
      replLineMode .= MultiLine
      replIndent .= indent
      replSeenInput .= input
      readInput
    Left err ->
      return $ BadInputError $ "ERROR: " <> displayException err
    Right program -> return $ Program program

We append the previously seen input (in case of multiline input) with the current input and parse it using the parse function provided by the Language.FiboLisp module. If parsing fails with an EndOfStreamError, it means that the input is incomplete. In that case, we set the REPL line mode to Multiline, REPL indentation to the current indentation, and seen input to the previously seen input appended with the current input, and read more input. If it is some other error, we return a BadInputError with it.

If the result of parsing is a program, we return it as a Program input.

That’s it for reading the user input. Next, we evaluate it.

Evaluating the Input

Recall that the repl function calls the evalAndPrint function with the read input:

evalAndPrint :: Input -> Prompt ()
evalAndPrint = \case
  EndOfInput -> return ()
  BadInputError err -> outputWithColor errorColor err
  Help -> H.outputStr helpMessage
  Setting (Set, setting) -> replSettings %= Set.insert setting
  Setting (Unset, setting) -> replSettings %= Set.delete setting
  Source ident -> showSource ident
  Load fp -> loadAndEvalFile fp
  Program program -> interpretAndPrint program
  where
    helpMessage =
      unlines
        [ "Available commands",
          ":set/:unset dump       Dumps the program AST",
          ":set/:unset time       Shows the program execution time",
          ":load <file>           Loads a source file",
          ":source <func_name>    Prints the source code of a function",
          ":help                  Shows this help"
        ]

The cases of EndOfInput, BadInputError and Help are straightforward. For settings, we insert or remove the setting from the REPL settings, depending on it being set or unset. For the other cases, we call the respective helper functions.

For a :source command, we check if the requested identifier maps to a user-defined or builtin function, and if so, print its source. Otherwise we print an error.

showSource :: L.Ident -> Prompt ()
showSource ident = do
  defs <- use replDefs
  case Map.lookup ident defs of
    Just def -> outputWithColor infoColor $ L.prettyShowDef def
    Nothing -> case Map.lookup ident L.builtinFuncs of
      Just func -> outputWithColor infoColor $ show func
      Nothing ->
        outputWithColor errorColor $ "No such function: " <> ident

For a :load command, we check if the requested file exists. If so, we read and parse it, and interpret the resultant program. In case of any errors in reading or parsing the file, we catch and print them.

loadAndEvalFile :: FilePath -> Prompt ()
loadAndEvalFile fp =
  liftIO (doesFileExist fp) >>= \case
    False -> outputWithColor errorColor $ "No such file: " <> fp
    True -> Catch.handleAll outputError $ do
      code <- liftIO $ readFile fp
      outputWithColor infoColor $ "Loaded " <> fp
      case L.parse code of
        Left err -> outputError err
        Right program -> interpretAndPrint program

outputError :: (Exception e) => e -> Prompt ()
outputError err =
  outputWithColor errorColor $ "ERROR: " <> displayException err

Finally, we come to the workhorse of the REPL: the interpretation of the user provided program:

interpretAndPrint :: L.Program -> Prompt ()
interpretAndPrint (L.Program pDefs exprs) =
  Catch.handleAll outputError $ do
    defs <- use replDefs
    settings <- use replSettings

    let defs' =
          foldl' (\ds d -> Map.insert (L.defName d) d ds) defs pDefs
        program = L.Program (Map.elems defs') exprs
    when (Dump `Set.member` settings) $
      outputWithColor infoColor (L.showProgram program)

    addColor <- getAddColor
    extPrint <- H.getExternalPrint

    (execTime, val) <- liftIO . measureElapsedTime $ do
      val <- L.interpret (extPrint . addColor printColor) program
      evaluate $ DS.force val

    case val of
      Left err -> outputError err
      Right v -> do
        let output = show v
        if null output
          then return ()
          else outputWithColor outputColor $ "=> " <> output

    when (MeasureTime `Set.member` settings) $
      outputWithColor infoColor $
        "(Execution time: " <> show execTime <> ")"

    replDefs .= defs'

measureElapsedTime :: IO a -> IO (NominalDiffTime, a)
measureElapsedTime f = do
  start <- getCurrentTime
  ret <- f
  end <- getCurrentTime
  return (diffUTCTime end start, ret)

We start by collecting the user defined functions in the current input with the previously defined functions in the session such that current functions override the previous functions with the same names. At this point, if the dump setting is set, we print the program AST.

Then we invoke the interpret function provided by the Language.FiboLisp module. Recall that the interpret function takes the program to interpret and a function of type String -> IO (). This function is a color-adding wrapper over the function returned by the Haskeline function getExternalPrint12. This function allows non-REPL code to safely print to the Haskeline driven REPL without garbling the output. We pass it to the interpret function so that the interpret can invoke it when the user code invokes the builtin print function or similar.

We make sure to force and evaluate the value returned by the interpreter so that any lazy values or errors are fully evaluated13, and the measured elapsed time is correct.

If the interpreter returns an error, we print it. Else we convert the value to a string, and if is it not empty14, we print it.

Finally, we print the execution time if the time setting is set, and set the REPL defs to the current program defs.

That’s all! We have completed our REPL. But wait, I think we forgot one thing …

Doing the Completions

The REPL would work fine with this much code, but it would not be a good experience for the user, because they’d have to type everything without any help from the REPL. To make it convenient for the user, we provide contextual auto-completion functionality while typing. Haskeline lets us plug in our custom completion logic by setting a completion function, which we did way back at the start. Now we need to implement it.

doCompletions :: H.CompletionFunc Repl
doCompletions =
  fmap runIdentityT . H.completeWordWithPrev Nothing " " $ \leftRev word -> do
    defs <- use replDefs
    lineMode <- use replLineMode
    settings <- use replSettings
    let funcs = nub $ Map.keys defs <> Map.keys L.builtinFuncs
        vals = map show L.builtinVals
    case (word, lineMode) of
      ('(' : rest, _) ->
        pure
          [ H.Completion ('(' : hint) hint True
            | hint <- nub . sort $ L.carKeywords <> funcs,
              rest `isPrefixOf` hint
          ]
      (_, SingleLine) -> case word of
        "" | null leftRev ->
          pure [H.Completion "" s True | s <- commands <> funcs <> vals]
        ':' : _ | null leftRev ->
          pure [H.simpleCompletion c | c <- commands, word `isPrefixOf` c]
        _
          | "tes:" `isSuffixOf` leftRev ->
            pure
              [ H.simpleCompletion $ show s
                | s <- [Dump ..], s `notElem` settings, word `isPrefixOf` show s
              ]
          | "tesnu:" `isSuffixOf` leftRev ->
            pure
              [ H.simpleCompletion $ show s
                | s <- [Dump ..], s `elem` settings, word `isPrefixOf` show s
              ]
          | "daol:" `isSuffixOf` leftRev ->
            isSafeFilePath word >>= \case
              True -> H.listFiles word
              False -> pure []
          | "ecruos:" `isSuffixOf` leftRev ->
            pure
              [ H.simpleCompletion ident
                | ident <- funcs,
                  ident `Map.notMember` L.builtinFuncs,
                  word `isPrefixOf` ident
              ]
          | otherwise ->
            pure [H.simpleCompletion c | c <- funcs <> vals, word `isPrefixOf` c]
      _ -> pure []
  where
    commands = ":help" : ":load" : ":source" : map show [Set ..]

Haskeline provides us the completeWordWithPrev function to easily create our own completion function. It takes a callback function that it calls with the current word being completed (the word immediately to the left of the cursor), and the content of the line before the word (to the left of the word), reversed. We use these to return different completion lists of strings.

Going case by case:

  1. If the word starts with (, it means we are in middle of writing FiboLisp code. So we return the carKeywords and the user-defined and builtin function names that start with the current word sans the initial (. This happens regardless of the current line mode. Rest of the cases below apply only in the SingleLine mode.
  2. If the entire line is empty, we return the names of all commands, functions, and builtin values.
  3. If the word starts with :, and is at the beginning of the line, we return the commands that start with the word.
  4. If the line starts with
    1. :set, we return the not set settings
    2. :unset, we return the set settings
    3. :load, we return the names of the files and directories in the current directory
    4. :source, we return the names of the user-defined functions
    that start with the word.
  5. Otherwise we return no completions.

This covers all cases, and provides helpful completions, while avoiding bad ones. And this completes the implementation of our wonderful REPL.

Conclusion

I wrote this REPL while implementing a Lisp that I wrote15 while going through the Essentials of Compilation book, which I thoroughly recommend for getting started with compilers. It started as a basic REPL, and gathered a lot of nice functionalities over time. So I decided to extract and share it here. I hope that this Haskeline tutorial helps you in creating beautiful and useful REPLs. Here is the complete code for the REPL.


  1. The online demo is rather slow to load and to run, and works only on Firefox and Chrome. Even though I managed to put it together somehow, I don’t actually know how it exactly works, and I’m unable to fix the issues with it.↩︎

  2. Lisps are awesome and I absolutely recommend creating one or more of them as an amateur PL implementer. Some resources I recommend are: the Build Your Own Lisp book, and the Make-A-Lisp tutorial.↩︎

  3. REPLs are wonderful for doing interactive and exploratory programming where you try out small snippets of code in the REPL, and put your program together piece-by-piece. They are also good for debugging because they let you inspect the state of running programs from within. I still fondly remember the experience of connecting (or jacking in) to running productions systems written in Clojure over REPL, and figuring out issues by dumping variables.↩︎

  4. We don’t even need let. We can, and have to, define variables by creating functions, with parameters serving the role of variables. In fact, we can’t even assign or reassign variables. Functions are the only scoping mechanism in FiboLisp, much like old-school JavaScript with its IIFEs.↩︎

  5. car is obviously Contents of the Address part of the Register, the first expression in a list form in a Lisp.↩︎

  6. You may be wondering about why we need the NFData instances for the errors and values. This will become clear when we write the REPL.↩︎

  7. I recommend the sexp-grammar library, which provides both parsing and printing facilities for S-expressions based languages. Or you can write something by yourself using the parsing and pretty-printing libraries like megaparsec and prettyprinter.↩︎

  8. We assume that our project’s Cabal file sets the default-language to GHC2021, and the default-extensions to LambdaCase, OverloadedStrings, RecordWildCards, and StrictData.↩︎

  9. Recall that there is no way to define variables in FiboLisp.↩︎

  10. If the interpreter allows mutually recursive function definitions, functions can be called before defining them.↩︎

  11. We are using the basic-lens library here, which is the tiniest lens library, and provides only the five functions and types we see used here.↩︎

  12. Using the function returned from getExternalPrint is not necessary in our case because the REPL blocks when it invokes the interpreter. That means, nothing but the interpreter can print anything while it is running. So the interpreter can actually print directly to stdout and nothing will go wrong.

    However, imagine a case in which our code starts a background thread that needs to print to the REPL. In such case, we must use the Haskeline provided print function instead of printing directly. When printing to the REPL using it, Haskeline coordinates the prints so that the output in the terminal is not garbled.↩︎

  13. Now we see why we derive NFData instances for errors and Value.↩︎

  14. Returned value could be of type void with no textual representation, in which case we would not print it.↩︎

  15. I wrote the original REPL code almost three years ago. I refactored, rewrote and improved a lot of it in the course of writing this post. As they say, writing is thinking.↩︎

If you liked this post, please leave a comment.

by Abhinav Sarkar (abhinav@abhinavsarkar.net) at October 31, 2024 12:00 AM

October 25, 2024

Derek Elkins

Classical First-Order Logic from the Perspective of Categorical Logic

Introduction

Classical First-Order Logic (Classical FOL) has an absolutely central place in traditional logic, model theory, and set theory. It is the foundation upon which ZF(C), which is itself often taken as the foundation of mathematics, is built. When classical FOL was being established there was a lot of study and debate around alternative options. There are a variety of philosophical and metatheoretic reasons supporting classical FOL as The Right Choice.

This all happened, however, well before category theory was even a twinkle in Mac Lane’s and Eilenberg’s eyes, and when type theory was taking its first stumbling steps.

My focus in this article is on what classical FOL looks like to a modern categorical logician. This can be neatly summarized as “classical FOL is the internal logic of a Boolean First-Order Hyperdoctrine. Each of the three words in this term,”Boolean”, “First-Order”, and “Hyperdoctrine”, suggest a distinct axis in which to vary the (class of categorical models of the) logic. All of them have compelling categorical motivations to be varied.

Boolean

The first and simplest is the term “Boolean”. This is what differentiates the categorical semantics of classical (first-order) logic from constructive (first-order) logic. Considering arbitrary first-order hyperdoctrines would give us a form of intuitionistic first-order logic.

It is fairly rare that the categories categorists are interested in are Boolean. For example, most toposes, all of which give rise to first-order hyperdoctrines, are not Boolean. The assumption that they are tends to correspond to a kind of “discreteness” that’s often at odds with the purpose of the topos. For example, a category of sheaves on a topological space is Boolean if and only if that space is a Stone space. These are certainly interesting spaces, but they are also totally disconnected unlike virtually every non-discrete topological space one would typically mention.

First-Order

The next term is the term “first-order”. As the name suggests, a first-order hyperdoctrine has the necessary structure to interpret first-order logic. The question, then, is what kind of categories have this structure and only this structure. The answer, as far as I’m aware, is not many.

Many (classes of) categories have the structure to be first-order hyperdoctrines, but often they have additional structure as well that it seems odd to ignore. The most notable and interesting example is toposes. All elementary toposes (which includes all Grothendieck toposes) have the structure to give rise to a first-order hyperdoctrine. But, famously, they also have the structure to give rise to a higher order logic. Even more interesting, while Grothendieck toposes, being elementary toposes, technically do support the necessary structure for first-order logic, the natural morphisms of Grothendieck toposes, geometric morphisms, do not preserve that structure, unlike the logical functors between elementary toposes.

The natural internal logic for Grothendieck toposes turns out to be geometric logic. This is a logic that lacks universal quantification and implication (and thus negation) but does have infinitary disjunction. This leads to a logic that is, at least superficially, incomparable to first-order logic. Closely related logics are regular logic and coherent logic which are sub-logics of both geometric logic and first-order logic.

We see, then, just from the examples of the natural logics of toposes, none of them are first-order logic, and we get examples that are more powerful, less powerful, and incomparable to first-order logic. Other common classes of categories give other natural logics, such as the cartesian logic from left exact categories, and monoidal categories give rise to (ordered) linear logics. We get the simply typed lambda calculus from cartesian closed categories which leads to the next topic.

Hyperdoctrine

A (posetal) hyperdoctrine essentially takes a category and, for each object in that category, assigns to it a poset of “predicates” on that object. In many cases, this takes the form of the Sub functor assigning to each object its poset of subobjects. Various versions of hyperdoctrines will require additional structure on the source category, these posets, and/or the functor itself to interpret various logical connectives. For example, a regular hyperdoctrine requires the source category to have finite limits, the posets to be meet-semilattices, and the functor to give rise to monotonic functions with left adjoints satisfying certain properties. This notion of hyperdoctrines is suitable for regular logic.

It’s very easy to recognize that these functors are essentially indexed |(0,1)|-categories. This immediately suggests that we should consider higher categorical versions or at the very least normal indexed categories.

What this means for the logic is that we move from proof-irrelevant logic to proof-relevant logic. We now have potentially multiple ways a “predicate” could “entail” another “predicate”. We can present the simply typed lambda calculus in this indexed category manner. This naturally leads/connects to the categorical semantics of type theories.

Pushing forward to |(\infty, 1)|-categories is also fairly natural, as it’s natural to want to talk about an entailment holding for distinct but “equivalent” reasons.

Summary

Moving in all three of these directions simultaneously leads pretty naturally to something like Homotopy Type Theory (HoTT). HoTT is a naturally constructive (but not anti-classical) type theory aimed at being an internal language for |(\infty, 1)|-toposes.

Why Classical FOL?

Okay, so why did people pick classical FOL in the first place? It’s not like the concept of, say, a higher-order logic wasn’t considered at the time.

Classical versus Intuitionistic was debated at the time, but at that time it was primarily a philosophical argument, and the defense of Intuitionism was not very compelling (to me and obviously people at the time). The focus would probably have been more on (classical) FOL versus second- (or higher-)order logic.

Oversimplifying, the issue with second-order logic is fairly evident from the semantics. There are two main approaches: Henkin-semantics and full (or standard) semantics. Henkin-semantics keeps the nice properties of (classical) FOL but fails to get the nice properties, namely categoricity properties, of second-order logic. This isn’t surprising as Henkin-semantics can be encoded into first-order logic. It’s essentially syntactic sugar. Full semantics, however, states that the interpretation of predicate sorts is power sets of (cartesian products of) the domain1. This leads to massive completeness problems as our metalogical set theory has many, many ways of building subsets of the domain. There are metatheoretic results that state that there is no computable set of logical axioms that would give us a sound and complete theory for second-order logic with respect to full semantics. This aspect is also philosophically problematic, because we don’t want to need set theory to understand the very formulation of set theory. Thus Quine’s comment that “second-order logic [was] set theory in sheep’s clothing”.

On the more positive and (meta-)mathematical side, we have results like Lindström’s theorem which states that classical FOL is the strongest logic that simultaneously satisfies (downward) Löwenheim-Skolem and compactness. There’s also a syntactic result by Lindström which characterizes first-order logic as the only logic having a recursively enumerable set of tautologies and satisfying Löwenheim-Skolem2.

The Catch

There’s one big caveat to the above. All of the above results are formulated in traditional model theory which means there are various assumptions built in to their statements. In the language of categorical logic, these assumptions can basically be summed up in the statement that the only category of semantics that traditional model theory considers is Set.

This is an utterly bizarre thing to do from the standpoint of categorical logic.

The issues with full semantics follow directly from this choice. If, as categorical logic would have us do, we considered every category with sufficient structure as a potential category of semantics, then our theory would not be forced to follow every nook and cranny of Set’s notion of subset to be complete. Valid formulas would need to be true not only in Set but in wildly different categories, e.g. every (Boolean) topos.

These traditional results are also often very specific to classical FOL. Dropping this constraint of classical logic would lead to an even broader class of models.

Categorical Perspective on Classical First-Order Logic

A Boolean category is just a coherent category where every object has a complement. Since coherent functors preserve complements, we have that the category of Boolean categories is a full subcategory of the category of coherent categories.

One nice thing about, specifically, classical first-order logic from the perspective of category theory is the following. First, coherent logic is a sub-logic of geometric logic restricted to finitary disjunction. Via Morleyization, we can encode classical first-order logic into coherent logic such that the categories of models of each are equivalent. This implies that a classical FOL formula is valid if and only if its encoding is. Morleyization allows us to analyze classical FOL using the tools of classifying toposes. On the one hand, this once again suggests the importance of coherent logic, but it also means that we can use categorical tools with classical FOL.

Conclusion

There are certain things that I and, I believe, most logicians take as table stakes for a (foundational) logic3. For example, checking a proof should be computably decidable. For these reasons, I am in complete accord with early (formal) logicians that classical second-order logic with full semantics is an unacceptably worse alternative to classical first-order logic.

However, when it comes to statements about the specialness of FOL, a lot of them seem to be more statements about traditional model theory than FOL itself, and also statements about the philosophical predilections of the time. I feel that philosophical attitudes among logicians and mathematicians have shifted a decent amount since the beginning of the 20th century. We have different philosophical predilections today than then, but they are informed by another hundred years of thought, and they are more relevant to what is being done today.

Martin-Löf type theory (MLTT) and its progeny also present an alternative path with their own philosophical and metalogical justifications. I mention this to point out actual cases of foundational frameworks that a (very) superficial reading of traditional model theory results would seem to have been “ruled out”. Even if one thinks the FOL+ZFC (or whatever) is the better foundations, I think it is unreasonable to assert that MLTT derivatives are unworkable as a foundations.


  1. It’s worth mentioning that this is exactly what categorical logic would suggest: our syntactic power objects should be mapped to semantic power objects.↩︎

  2. While nice, it’s not clear that compactness and, especially, Löwenheim-Skolem are sacrosanct properties that we’d be unwilling to do without. Lindström’s first theorem is thus a nice abstract characterization theorem for classical FOL, but it doesn’t shut the door on considering alternatives even in the context of traditional model theory.↩︎

  3. I’m totally fine thinking about logics that lack these properties, but I would never put any of them forward as an acceptable foundational logic.↩︎

October 25, 2024 12:55 AM

October 17, 2024

Tweag I/O

Introducing rules_gcs

At Tweag, we are constantly striving to improve the developer experience by contributing tools and utilities that streamline workflows. We recently completed a project with IMAX, where we learned that they had developed a way to simplify and optimize the process of integrating Google Cloud Storage (GCS) with Bazel. Seeing value in this tool for the broader community, we decided to publish it together under an open source license. In this blog post, we’ll dive into the features, installation, and usage of rules_gcs, and how it provides you with access to private resources.

What is rules_gcs?

rules_gcs is a Bazel ruleset that facilitates the downloading of files from Google Cloud Storage. It is designed to be a drop-in replacement for Bazel’s http_file and http_archive rules, with features that make it particularly suited for GCS. With rules_gcs, you can efficiently fetch large amounts of data, leverage Bazel’s repository cache, and handle private GCS buckets with ease.

Key Features

  • Drop-in Replacement: rules_gcs provides gcs_file and gcs_archive rules that can directly replace http_file and http_archive. They take a gs://bucket_name/object_name URL and internally translate this to an HTTPS URL. This makes it easy to transition to GCS-specific rules without major changes to your existing Bazel setup.

  • Lazy Fetching with gcs_bucket: For projects that require downloading multiple objects from a GCS bucket, rules_gcs includes a gcs_bucket module extension. This feature allows for lazy fetching, meaning objects are only downloaded as needed, which can save time and bandwidth, especially in large-scale projects.

  • Private Bucket Support: Accessing private GCS buckets is seamlessly handled by rules_gcs. The ruleset supports credential management through a credential helper, ensuring secure access without the need to hardcode credentials or use gsutil for downloading.

  • Bazel’s Downloader Integration: rules_gcs uses Bazel’s built-in downloader and repository cache, optimizing the download process and ensuring that files are cached efficiently across builds, even across multiple Bazel workspaces on your local machine.

  • Small footprint: Apart from the gcloud CLI tool (for obtaining authentication tokens), rules_gcs requires no additional dependencies or Bazel modules. This minimalistic approach reduces setup complexity and potential conflicts with other tools.

Understanding Bazel Repositories and Efficient Object Fetching with rules_gcs

Before we dive into the specifics of rules_gcs, it’s important to understand some key concepts about Bazel repositories and repository rules, as well as the challenges of efficiently managing large collections of objects from a Google Cloud Storage (GCS) bucket.

Bazel Repositories and Repository Rules

In Bazel, external dependencies are managed using repositories, which are declared in your WORKSPACE or MODULE.bazel file. Each repository corresponds to a package of code, binaries, or other resources that Bazel fetches and makes available for your build. Repository rules, such as http_archive or git_repository, and module extensions define how Bazel should download and prepare these external dependencies.

However, when dealing with a large number of objects, such as files stored in a GCS bucket, using a single repository to download all objects can be highly inefficient. This is because Bazel’s repository rules typically operate in an “eager” manner—they fetch all the specified files as soon as any target of the repository is needed. For large buckets, this means downloading potentially gigabytes of data even if only a few files are actually needed for the build. This eager fetching can lead to unnecessary network usage, increased build times, and larger disk footprints.

The rules_gcs Approach: Lazy Fetching with a Hub Repository

rules_gcs addresses this inefficiency by introducing a more granular approach to downloading objects from GCS. Instead of downloading all objects at once into a single repository, rules_gcs uses a module extension that creates a “hub” repository, which then manages individual sub-repositories for each GCS object.

How It Works
  1. Hub Repository: The hub repository acts as a central point of reference, containing metadata about the individual GCS objects. This follows the “hub-and-spoke” paradigm with a central repository (the bucket) containing references to a large number of small repositories for each object. This architecture is commonly used by Bazel module extensions to manage dependencies for different language ecosystems (including Python and Rust).

  2. Individual Repositories per GCS Object: For each GCS object specified in the lockfile, rules_gcs creates a separate repository using the gcs_file rule. This allows Bazel to fetch each object lazily—downloading only the files that are actually needed for the current build.

  3. Methods of Fetching: Users can choose between different methods in the gcs_bucket module extension. The default method of creating symlinks is efficient while preserving the file structure set in the lockfile. If you need to access objects as regular files, choose one of the other methods.

    • Symlink: Creates a symlink from the hub repo pointing to a file in its object repo, ensuring the object repo and symlink pointing to it are created only when the file is accessed.
    • Alias: Similar to symlink, but uses Bazel’s aliasing mechanism to reference the file. No files are created in the hub repo.
    • Copy: Creates a copy of a file in the hub repo when accessed.
    • Eager: Downloads all specified objects upfront into a single repository.

This modular approach is particularly beneficial for large-scale projects where only a subset of the data is needed for most builds. By fetching objects lazily, rules_gcs minimizes unnecessary data transfer and reduces build times.

Integrating with Bazel’s Credential Helper Protocol

Another critical aspect of rules_gcs is its seamless integration with Bazel’s credential management system. Accessing private GCS buckets securely requires proper authentication, and Bazel uses a credential helper protocol to handle this.

How Bazel’s Credential Helper Protocol Works

Bazel’s credential helper protocol is a mechanism that allows Bazel to fetch authentication credentials dynamically when accessing private resources, such as a GCS bucket. The protocol is designed to be simple and secure, ensuring that credentials are only used when necessary and are never hardcoded into build files.

When Bazel’s downloader prepares a request and a credential helper was configured, it invokes the credential helper with the command get. Additionally, the request URI is passed to the helpers standard input encoded as JSON. The helper is expected to return a JSON object containing HTTP headers, including the necessary Authorization token, which Bazel will then include in its requests.

Here’s a breakdown of how the credential_helper script used in rules_gcs works:

  1. Authentication Token Retrieval: The script uses the gcloud CLI tool to obtain an access token via gcloud auth application-default print-access-token. This token is tied to the user’s current authentication context and can be used to fetch any objects the user is allowed to access.

  2. Output Format: The script outputs the token in a JSON format that Bazel can directly use:

    {
      "headers": {
        "Authorization": ["Bearer ${TOKEN}"]
      }
    }

    This JSON object includes the Authorization header, which Bazel uses to authenticate its requests to the GCS bucket.

  3. Integration with Bazel: To use this credential helper, you need to configure Bazel by specifying the helper in the .bazelrc file:

    common --credential_helper=storage.googleapis.com=%workspace%/tools/credential-helper

    This line tells Bazel to use the specified credential_helper script whenever it needs to access resources from storage.googleapis.com. If a request returns an error code or unexpected content, credentials are invalidated and the helper is invoked again.

How rules_gcs Hooks Into the Credential Helper Protocol

rules_gcs leverages this credential helper protocol to manage access to private GCS buckets securely and efficiently. By providing a pre-configured credential helper script, rules_gcs ensures that users can easily set up secure access without needing to manage tokens or authentication details manually.

Moreover, by limiting the scope of the credential helper to the GCS domain (storage.googleapis.com), rules_gcs reduces the risk of credentials being misused or accidentally exposed. The helper script is designed to be lightweight, relying on existing gcloud credentials, and integrates seamlessly into the Bazel build process.

Installing rules_gcs

Adding rules_gcs to your Bazel project is straightforward. The latest version is available on the Bazel Central Registry. To install, simply add the following to your MODULE.bazel file:

bazel_dep(name = "rules_gcs", version = "1.0.0")

You will also need to include the credential helper script in your repository:

mkdir -p tools
wget -O tools/credential-helper https://raw.githubusercontent.com/tweag/rules_gcs/main/tools/credential-helper
chmod +x tools/credential-helper

Next, configure Bazel to use the credential helper by adding the following lines to your .bazelrc:

common --credential_helper=storage.googleapis.com=%workspace%/tools/credential-helper
# optional setting to make rules_gcs more efficient
common --experimental_repository_cache_hardlinks

These settings ensure that Bazel uses the credential helper specifically for GCS requests. Additionally, the setting --experimental_repository_cache_hardlinks allows Bazel to hardlink files from the repository cache instead of copying them into a repository. This saves time and storage space, but requires the repository cache to be located on the same filesystem as the output base.

Using rules_gcs in Your Project

rules_gcs provides three primary rules: gcs_bucket, gcs_file, and gcs_archive. Here’s a quick overview of how to use each:

  • gcs_bucket: When dealing with multiple files from a GCS bucket, the gcs_bucket module extension offers a powerful and efficient way to manage these dependencies. You define the objects in a JSON lockfile, and gcs_bucket handles the rest.

    gcs_bucket = use_extension("@rules_gcs//gcs:extensions.bzl", "gcs_bucket")
    
    gcs_bucket.from_file(
        name = "trainingdata",
        bucket = "my_org_assets",
        lockfile = "@//:gcs_lock.json",
    )
  • gcs_file: Use this rule to download a single file from GCS. It’s particularly useful for pulling in assets or binaries needed during your build or test processes. Since it is a repository rule, you have to invoke it with use_repo_rule in a MODULE.bazel file (or wrap it in a module extension).

    gcs_file = use_repo_rule("@rules_gcs//gcs:repo_rules.bzl", "gcs_file")
    
    gcs_file(
        name = "my_testdata",
        url = "gs://my_org_assets/testdata.bin",
        sha256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    )
  • gcs_archive: This rule downloads and extracts an archive from GCS, making it ideal for pulling in entire repositories or libraries that your project depends on. Since it is a repository rule, you have to invoke it with use_repo_rule in a MODULE.bazel file (or wrap it in a module extension).

    gcs_archive = use_repo_rule("@rules_gcs//gcs:repo_rules.bzl", "gcs_archive")
    
    gcs_archive(
        name = "magic",
        url = "gs://my_org_code/libmagic.tar.gz",
        sha256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
        build_file = "@//:magic.BUILD",
    )

Try it Out

rules_gcs is a versatile and simple solution for integrating Google Cloud Storage with Bazel. We invite you to try out rules_gcs in your projects and contribute to its development. As always, we welcome feedback and look forward to seeing how this tool enhances your workflows. Check out the full example to get started!

Thanks to IMAX for sharing their initial implementation of rules_gcs and allowing us to publish the code under an open source license.

October 17, 2024 12:00 AM

October 15, 2024

Philip Wadler

You can help Cards Against Humanity pay "blue leaning" nonvoters $100 to vote


How is this not illegal??? Cards Against Humanity is PAYING people who didn't vote in 2020 to apologize, make a voting plan, and post #DonaldTrumpIsAHumanToilet—up to $100 for blue-leaning people in swing states. I helped by getting a 2024 Election Pack: checkout.giveashit.lol. Spotted via BoingBoing. More info at The Register. (Only American citizens and residents can participate. If, like me, you are an American citizen but non-resident, you will need a VPN.)

by Philip Wadler (noreply@blogger.com) at October 15, 2024 08:11 AM

October 14, 2024

Edward Z. Yang

Tensor programming for databases, with first class dimensions

Tensor libraries like PyTorch and JAX have developed compact and accelerated APIs for manipulating n-dimensional arrays. N-dimensional arrays are kind of similar to tables in database, and this results in the logical question which is could you setup a Tensor-like API to do queries on databases that would be normally done with SQL? We have two challenges:

  • Tensor computation is typically uniform and data-independent. But SQL relational queries are almost entirely about filtering and joining data in a data-dependent way.
  • JOINs in SQL can be thought of as performing outer joins, which is not a very common operation in tensor computation.

However, we have a secret weapon: first class dimensions were primarily designed to as a new frontend syntax that made it easy to express einsum, batching and tensor indexing expressions. They might be good for SQL too.

Representing the database. First, how do we represent a database? A simple model following columnar database is to have every column be a distinct 1D tensor, where all columns part of the same table have a consistent indexing scheme. For simplicity, we'll assume that we support rich dtypes for the tensors (e.g., so I can have a tensor of strings). So if we consider our classic customer database of (id, name, email), we would represent this as:

customers_id: int64[C]
customers_name: str[C]
customers_email: str[C]

Where C is the number of the entries in the customer database. Our tensor type is written as dtype[DIM0, DIM1, ...], where I reuse the name that I will use for the first class dimension that represents it. Let's suppose that the index into C does not coincide with id (which is good, because if they did coincide, you would have a very bad time if you ever wanted to delete an entry from the database!)

This gives us an opportunity for baby's first query: let's implement this query:

SELECT c.name, c.email FROM customers c WHERE c.id = 1000

Notice that the result of this operation is data-dependent: it may be zero or one depending on if the id is in the database. Here is a naive implementation in standard PyTorch:

mask = customers_id == 1000
return (customers_name[mask], customers_email[mask])

Here, we use boolean masking to perform the data-dependent filtering operation. This implementation in eager is a bit inefficient; we materialize a full boolean mask that is then fed into the subsequent operations; you would prefer for a compiler to fuse the masking and indexing together. First class dimensions don't really help with this example, but we need to introduce some new extensions to first class dimensions. First, what we can do:

C = dims(1)
c_id = customers_id[C]  # {C} => int64[]
c_name = customers_name[C]  # {C} => str[]
c_email = customers_email[C]  # {C} => str[]
c_mask = c_id == 1000  # {C} => bool[]

Here, a tensor with first class tensors has a more complicated type {DIM0, DIM1, ...} => dtype[DIM2, DIM3, ...]. The first class dimensions are all reported in the curly braces to the left of the double arrow; curly braces are used to emphasize the fact that first class dimensions are unordered.

What next? The problem is that now we want to do something like torch.where(c_mask, c_name, ???) but we are now in a bit of trouble, because we don't want anything in the false branch of where: we want to provide something like "null" and collapse the tensor to a smaller number of elements, much like how boolean masking did it without first class dimensions. To express this, we'll introduce a binary version of torch.where that does exactly this, as well as returning the newly allocated FCD for the new, data-dependent dimension:

C2, c2_name = torch.where(c_mask, c_name)  # {C2} => str[]
_C2, c2_email = torch.where(c_mask, c_email)  # {C2} => str[], n.b. C2 == _C2
return c2_name, c2_email

Notice that torch.where introduces a new first-class dimension. I've chosen that this FCD gets memoized with c_mask, so whenever we do more torch.where invocations we still get consistently the same new FCD.

Having to type out all the columns can be a bit tiresome. If we assume all elements in a table have the same dtype (let's call it dyn, short for dynamic type), we can more compactly represent the table as a 2D tensor, where the first dimension is the indexing as before, and the second dimension is the columns of the database. For clarity, we'll support using the string name of the column as a shorthand for the numeric index of the column. If the tensor is contiguous, this gives a more traditional row-wise database. The new database can be conveniently manipulated with FCDs, as we can handle all of the columns at once instead of typing them out individually):

customers:  dyn[C, C_ATTR]
C = dims(1)
c = customers[C]  # {C} => dyn[C_ATTR]
C2, c2 = torch.where(c["id"] == 1000, c)  # {C2} => dyn[C_ATTR]
return c2[["name", "email"]].order(C2)  # dyn[C2, ["name", "email"]]

We'll use this for the rest of the post, but the examples should be interconvertible.

Aggregation. What's the average age of all customers, grouped by the country they live in?

SELECT AVG(c.age) FROM customers c GROUP BY c.country;

PyTorch doesn't natively support this grouping operation, but essentially what is desired here is a conversion into a nested tensor, where the jagged dimension is the country (each of which will have a varying number of countries). Let's hallucinate a torch.groupby analogous to its Pandas equivalent:

customers: dyn[C, C_ATTR]
customers_by_country = torch.groupby(customers, "country")  # dyn[COUNTRY, JC, C_ATTR]
COUNTRY, JC = dims(2)
c = customers_by_country[COUNTRY, JC]  # {COUNTRY, JC} => dyn[C_ATTR]
return c["age"].mean(JC).order(COUNTRY)  # f32[COUNTRY]

Here, I gave the generic indexing dimension the name JC, to emphasize that it is a jagged dimension. But everything proceeds like we expect: after we've grouped the tensor and rebound its first class dimensions, we can take the field of interest and explicitly specify a reduction on the dimension we care about.

In SQL, aggregations have to operate over the entirety of groups specified by GROUP BY. However, because FCDs explicitly specify what dimensions we are reducing over, we can potentially decompose a reduction into a series of successive reductions on different columns, without having to specify subqueries to progressively perform the reductions we are interested in.

Joins. Given an order table, join it with the customer referenced by the customer id:

SELECT o.id, c.name, c.email FROM orders o JOIN customers c ON o.customer_id = c.id

First class dimensions are great at doing outer products (although, like with filtering, it will expensively materialize the entire outer product naively!)

customers: dyn[C, C_ATTR]
orders: dyn[O, O_ATTR]
C, O = dims(2)
c = customers[C]  # {C} => dyn[C_ATTR]
o = orders[O]  # {O} => dyn[O_ATTR]
mask = o["customer_id"] == c["id"]  # {C, O} => bool[]
outer_product = torch.cat(o[["id"]], c[["name", "email"]])  # {C, O} => dyn[["id", "name", "email"]]
CO, co = torch.where(mask, outer_product)  # {CO} => dyn[["id", "name", "email"]]
return co.order(CO)  # dyn[C0, ["id", "name", "email"]]

What's the point. There are a few reasons why we might be interested in the correspondence here. First, we might be interested in applying SQL ideas to the Tensor world: a lot of things people want to do in preprocessing are similar to what you do in traditional relational databases, and SQL can teach us what optimizations and what use cases we should think about. Second, we might be interested in applying Tensor ideas to the SQL world: in particular, I think first class dimensions are a really intuitive frontend for SQL which can be implemented entirely embedded in Python without necessitating the creation of a dedicated DSL. Also, this might be the push needed to get TensorDict into core.

by Edward Z. Yang at October 14, 2024 05:07 AM

Brent Yorgey

MonadRandom: major or minor version bump?

MonadRandom: major or minor version bump?

Posted on October 14, 2024
Tagged , , , ,

tl;dr: a fix to the MonadRandom package may cause fromListMay and related functions to extremely rarely output different results than they used to. This could only possibly affect anyone who is using fixed seed(s) to generate random values and is depending on the specific values being produced, e.g. a unit test where you use a specific seed and test that you get a specific result. Do you think this should be a major or minor version bump?


The Fix

Since 2013 I have been the maintainer of MonadRandom, which defines a monad and monad transformer for generating random values, along with a number of related utilities.

Recently, Toni Dietze pointed out a rare situation that could cause the fromListMay function to crash (as well as the other functions which depend on it: fromList, weighted, weightedMay, uniform, and uniformMay). This function is supposed to draw a weighted random sample from a list of values decorated with weights. I’m not going to explain the details of the issue here; suffice it to say that it has to do with conversions between Rational (the type of the weights) and Double (the type that was being used internally for generating random numbers).

Even though this could only happen in rare and/or strange circumstances, fixing it definitely seemed like the right thing to do. After a bit of discussion, Toni came up with a good suggestion for a fix: we should no longer use Double internally for generating random numbers, but rather Word64, which avoids conversion and rounding issues.

In fact, Word64 is already used internally in the generation of random Double values, so we can emulate the behavior of the Double instance (which was slightly tricky to figure out) so that we make exactly the same random choices as before, but without actually converting to Double.

The Change

…well, not exactly the same random choices as before, and therein lies the rub! If fromListMay happens to pick a random value which is extremely close to a boundary between choices, it’s possible that the value will fall on one side of the boundary when using exact calculations with Word64 and Rational, whereas before it would have fallen on the other side of the boundary after converting to Double due to rounding. In other words, it will output the same results almost all the time, but for a list of \(n\) weighted choices there is something like an \(n/2^{64}\) chance (or less) that any given random choice will be different from what it used to be. I have never observed this happening in my tests, and indeed, I do not expect to ever observe it! If we generated one billion random samples per second continuously for a thousand years, we might expect to see it happen once or twice. I am not even sure how to engineer a test scenario to force it to happen, because we would have to pick an initial PRNG seed that forces a certain Word64 value to be generated.

To PVP or not to PVP?

Technically, a function exported by MonadRandom has changed behavior, so according to the Haskell PVP specification this should be a major version bump (i.e. 0.6 to 0.7).Actually, I am not even 100% clear on this. The decision tree on the PVP page says that changing the behavior of an exported function necessitates a major version bump; but the actual specification does not refer to behavior at all—as I read it, it is exclusively concerned with API compatibility, i.e. whether things will still compile.

But there seem to be some good arguments for doing just a minor version bump (i.e. 0.6 to 0.6.1).

  • Arguments in favor of a minor version bump:

    • A major version bump would cause a lot of (probably unnecessary) breakage! MonadRandom has 149 direct reverse dependencies, and about 3500 distinct transitive reverse dependencies. Forcing all those packages to update their upper bound on MonadRandom would be a lot of churn.

    • What exactly constitutes the “behavior” of a function to generate random values? It depends on your point of view. If we view the function as a pure mathematical function which takes a PRNG state as input and produces some value as output, then its behavior is defined precisely by which outputs it returns for which input seeds, and its behavior has changed. However, if we think of it in more effectful terms, we could say its “behavior” is just to output random values according to a certain distribution, in which case its behavior has not changed.

    • It’s extremely unlikely that this change will cause any breakage; moreover, as argued by Boyd Stephen Smith, anyone who cares enough about reproducibility to be relying on specific outputs for specific seeds is probably already pinning all their package versions.

  • Arguments in favor of a major version bump:

    • It’s what the PVP specifies; what’s the point of having a specification if we don’t follow it?

    • In the unlikely event that this change does cause any breakage, it could be extremely difficult for package maintainers to track down. If the behavior of a random generation function completely changes, the source of the issue is obvious. But if it only changes for very rare inputs, you might reasonably think the problem is something else. A major version bump will force maintainers to read the changelog for MonadRandom and assess whether this is a change that could possibly affect them.

So, do you have opinions on this? Would the release affect you one way or the other? Feel free to leave a comment here, or send me an email with your thoughts. Note there has already been a bit of discussion on Mastodon as well.

<noscript>Javascript needs to be activated to view comments.</noscript>

by Brent Yorgey at October 14, 2024 12:00 AM

October 05, 2024

Lysxia's blog

Unicode shenanigans: Martine écrit en UTF-8

An old French meme
Martine écrit en UTF-8 (parody cover of the Martine series of French children's books)

On my feed aggregator haskell.pl-a.net, I occasionally saw posts with broken titles like this (from ezyang’s blog):

What’s different this time? LLM edition

Yesterday I decided to do something about it.

Locating the problem

Tracing back where it came from, that title was sent already broken by Planet Haskell, which is itself a feed aggregator for blogs. The blog originally produces the good not broken title. Therefore the blame lies with Planet Haskell. It’s probably a misconfigured locale. Maybe someone will fix it. It seems to be running archaic software on an old machine, stuff I wouldn’t deal with myself so I won’t ask someone else to.

ASCII diagram of how a blog title travels through the relevant parties
      Blog
       |
       | What’s
       v
 Planet Haskell
       | 
       | What’s
       v
haskell.pl-a.net (my site)
       |
       | What’s
       v
  Your screen

In any case, this mistake can be fixed after the fact. Mis-encoded text is such an ubiquitous issue that there are nicely packaged solutions out there, like ftfy.

ftfy has been used as a data processing step in major NLP research, including OpenAI’s original GPT.

But my hobby site is written in OCaml and I would rather have fun solving this encoding problem than figure out how to install a Python program and call it from OCaml.

Explaining the problem

This is the typical situation where a program is assuming the wrong text encoding.

Text encodings

A quick summary for those who don’t know about text encodings.

Humans read and write sequences of characters, while computers talk to each other using sequences of bytes. If Alice writes a blog, and Bob wants to read it from across the world, the characters that Alice writes must be encoded into bytes so her computer can send it over the internet to Bob’s computer, and Bob’s computer must decode those bytes to display them on his screen. The mapping between sequences of characters and sequences of bytes is called an encoding.

Multiple encodings are possible, but it’s not always obvious which encoding to use to decode a given byte string. There are good and bad reasons for this, but the net effect is that many text-processing programs arbitrarily guess and assume the encoding in use, and sometimes they assume wrong.

Back to the problem

UTF-8 is the most prevalent encoding nowadays.1 I’d be surprised if one of the Planet Haskell blogs doesn’t use it, which is ironic considering the issue we’re dealing with.

  1. A blog using UTF-8 encodes the right single quote2 " ’ " as three consecutive bytes (226, 128, 153) in its RSS or Atom feed.
  2. The culprit, Planet Haskell, read those bytes but wrongly assumed an encoding different from UTF-8 where each byte corresponds to one character.
  3. It did some transformation to the decoded text (extract the title and body and put it on a webpage with other blogs).
  4. It encoded the final result in UTF-8.
ASCII diagram of how text gets encoded and decoded (wrongly)
      What the blog sees →       '’'
                                  |
                                  | UTF-8 encode (one character into three bytes)
                                  v
                             226 128 153
                                  |
                                  | ??? decode (not UTF-8)
                                  v
What Planet Haskell sees →   'â' '€' '™'
                                  |
                                  | UTF-8 encode
                                  v
                                (...)
                                  |
                                  | UTF-8 decode
                                  v
            What you see →   'â' '€' '™'

The final encoding doesn’t really matter, as long as everyone else downstream agrees with it. The point is that Planet Haskell outputs three characters “’” in place of the right single quote " ’ ", all because UTF-8 represents " ’ " with three bytes.

In spite of their differences, most encodings in practice agree at least about ASCII characters, in the range 0-127, which is sufficient to contain the majority of English language writing if you can compromise on details such as confusing the apostrophe and the single quotes. That’s why in the title “What’s different this time?” everything but one character was transferred fine.

Solving the problem

The fix is simple: replace “’” with " ’ ". Of course, we also want to do that with all other characters that are mis-encoded the same way: those are exactly all the non-ASCII Unicode characters. The more general fix is to invert Planet Haskell’s decoding logic. Thank the world that this mistake can be reversed to begin with. If information had been lost by mis-encoding, I may have been forced to use one of those dreadful LLMs to reconstruct titles.3

  1. Decode Planet Haskell’s output in UTF-8.
  2. Encode each character as a byte to recover the original output from the blog.
  3. Decode the original output correctly, in UTF-8.

There is one missing detail: what encoding to use in step 2? I first tried the naive thing: each character is canonically a Unicode code point, which is a number between 0 and 1114111, and I just hoped that those which did occur would fit in the range 0-255. That amounts to making the hypothesis that Planet Haskell is decoding blog posts in Latin-1. That seems likely enough, but you will have guessed correctly that the naive thing did not reconstruct the right single quote in this case. The Latin-1 hypothesis was proven false.

As it turns out, the euro sign “€” and the trademark symbol “™” are not in the Latin-1 alphabet. They are code points numbers 8364 and 8482 in Unicode, which are not in the range 0-255. Planet Haskell has to be using an encoding that features these two symbols. I needed to find which one.

Faffing about, I came across the Wikipedia article on Western Latin character sets which lists a comparison table. How convenient. I looked up the two symbols to find what encoding had them, if any. There were two candidates: Windows-1252 and Macintosh. Flip a coin. It was Windows-1252.

Windows-1252 differs from Latin-1 (and thus Unicode) in 27 positions, those whose byte starts with 8 or 9 in hexadecimal (27 valid characters + 5 unused positions): that’s 27 characters that I had to map manually to the range 0-255 according to the Windows-1252 encoding, and the remaining characters would be mapped for free by Unicode. This data entry task was autocompleted halfway through by Copilot, because of course GPT-* knows Windows-1252 by heart.

let windows1252_hack (c : Uchar.t) : int =
  let c = Uchar.to_int c in
  if      c = 0x20AC then 0x80
  else if c = 0x201A then 0x82
  else if c = 0x0192 then 0x83
  else if c = 0x201E then 0x84
  else if c = 0x2026 then 0x85
  else if c = 0x2020 then 0x86
  else if c = 0x2021 then 0x87
  else if c = 0x02C6 then 0x88
  else if c = 0x2030 then 0x89
  else if c = 0x0160 then 0x8A
  else if c = 0x2039 then 0x8B
  else if c = 0x0152 then 0x8C
  else if c = 0x017D then 0x8E
  else if c = 0x2018 then 0x91
  else if c = 0x2019 then 0x92
  else if c = 0x201C then 0x93
  else if c = 0x201D then 0x94
  else if c = 0x2022 then 0x95
  else if c = 0x2013 then 0x96
  else if c = 0x2014 then 0x97
  else if c = 0x02DC then 0x98
  else if c = 0x2122 then 0x99
  else if c = 0x0161 then 0x9A
  else if c = 0x203A then 0x9B
  else if c = 0x0153 then 0x9C
  else if c = 0x017E then 0x9E
  else if c = 0x0178 then 0x9F
  else c

And that’s how I restored the quotes, apostrophes, guillemets, accents, et autres in my feed.


See also


Update: When Planet Haskell picked up this post, it fixed the intentional mojibake in the title.

Screenshot of Planet Haskell with a correctly displayed diacritic. October 05, 2024. Lysxia's blog. Unicode shenanigans: Martine écrit en UTF-8

There is no room for this in my mental model. Planet Haskell is doing something wild to parse blog titles.


  1. As of September 2024, UTF-8 is used by 98.3% of surveyed web sites.↩︎

  2. The Unicode right single quote is sometimes used as an apostrophe, to much disapproval.↩︎

  3. Or I could just query the blogs directly for their titles.↩︎

by Lysxia at October 05, 2024 12:00 AM

Christopher Allen

Routines in caring for children

I have 4 children aged 4, 3, almost 2, and 19 weeks. Parents are increasingly isolated from each other socially so it's harder to compare tactics and strategies for caregiving. I want to share a run-down of how my wife and I care for our children and what has seemed to work and what has not.

by Unknown at October 05, 2024 12:00 AM

October 04, 2024

Derek Elkins

Global Rebuilding, Coroutines, and Defunctionalization

Introduction

In 1983, Mark Overmars described global rebuilding in The Design of Dynamic Data Structures. The problem it was aimed at solving was turning the amortized time complexity bounds of batched rebuilding into worst-case bounds. In batched rebuilding we perform a series of updates to a data structure which may cause the performance of operations to degrade, but occasionally we expensively rebuild the data structure back into an optimal arrangement. If the updates don’t degrade performance too much before we rebuild, then we can achieve our target time complexity bounds in an amortized sense. An update that doesn’t degrade performance too much is called a weak update.

Taking an example from Okasaki’s Purely Functional Data Structures, we can consider a binary search tree where deletions occur by simply marking the deleted nodes as deleted. Then, once about half the tree is marked as deleted, we rebuild the tree into a balanced binary search tree and clean out the nodes marked as deleted at that time. In this case, the deletions count as weak updates because leaving the deleted nodes in the tree even when it corresponds to up to half the tree can only mildly impact the time complexity of other operations. Specifically, assuming the tree was balanced at the start, then deleting half the nodes could only reduce the tree’s depth by about 1. On the other hand, naive inserts are not weak updates as they can quickly increase the tree’s depth.

The idea of global rebuilding is relatively straightforward, though how you would actually realize it in any particular example is not. The overall idea is simply that instead of waiting until the last moment and then rebuilding the data structure all at once, we’ll start the rebuild sooner and work at it incrementally as we perform other operations. If we update the new version faster than we update the original version, we’ll finish it by the time we would have wanted to perform a batched rebuild, and we can just switch to this new version.

More concretely, though still quite vaguely, global rebuilding involves, when a threshold is reached, rebuilding by creating a new “empty” version of the data structure called the shadow copy. The original version is the working copy. Work on rebuilding happens incrementally as operations are performed on the data structure. During this period, we service queries from the working copy and continue to update it as usual. Each update needs to make more progress on building the shadow copy than it worsens the working copy. For example, an insert should insert more nodes into the shadow copy than the working copy. Once the shadow copy is built, we may still have more work to do to incorporate changes that occurred after we started the rebuild. To this end, we can maintain a queue of update operations performed on the working copy since the start of a rebuild, and then apply these updates, also incrementally, to the shadow copy. Again, we need to apply the updates from the queue at a fast enough rate so that we will eventually catch up. Of course, all of this needs to happen fast enough so that 1) the working copy doesn’t get too degraded before the shadow copy is ready, and 2) we don’t end up needing to rebuild the shadow copy before it’s ready to do any work.

Coroutines

Okasaki passingly mentions that global rebuilding “can be usefully viewed as running the rebuilding transformation as a coroutine”. Also, the situation described above is quite reminiscent of garbage collection. There the classic half-space stop-the-world copying collector is naturally the batched rebuilding version. More incremental versions often have read or write barriers and break the garbage collection into incremental steps. Garbage collection is also often viewed as two processes coroutining.

The goal of this article is to derive global rebuilding-based data structures from an expression of them as two coroutining processes. Ideally, we should be able to take a data structure implemented via batched rebuilding and simply run the batch rebuilding step as a coroutine. Modifying the data structure’s operations and the rebuilding step should, in theory, just be a matter of inserting appropriate yield statements. Of course, it won’t be that easy since the batched version of rebuilding doesn’t need to worry about concurrent updates to the original data structure.

In theory, such a representation would be a perfectly effective way of articulating the global rebuilding version of the data structure. That said, I will be using the standard power move of CPS transforming and defunctionalizing to get a more data structure-like result.

I’ll implement coroutines as a very simplified case of modeling cooperative concurrency with continuations. In that context, a “process” written in continuation-passing style “yields” to the scheduler by passing its continuation to a scheduling function. Normally, the scheduler would place that continuation at the end of a work queue and then pick up a continuation from the front of the work queue and invoke it resuming the previously suspended “process”. In our case, we only have two “processes” so our “work queue” can just be a single mutable cell. When one “process” yields, it just swaps its continuation into the cell and the other “process’” out and invokes the continuation it read.

Since the rebuilding process is always driven by the main process, the pattern is a bit more like generators. This has the benefit that only the rebuilding process needs to be written in continuation-passing style. The following is a very quick and dirty set of functions for this.

module Coroutine ( YieldFn, spawn ) where
import Control.Monad ( join )
import Data.IORef ( IORef, newIORef, readIORef, writeIORef )

type YieldFn = IO () -> IO ()

yield :: IORef (IO ()) -> IO () -> IO ()
yield = writeIORef

resume :: IORef (IO ()) -> IO ()
resume = join . readIORef

terminate :: IORef (IO ()) -> IO ()
terminate yieldRef = writeIORef yieldRef (ioError $ userError "Subprocess completed")

spawn :: (YieldFn -> IO () -> IO ()) -> IO (IO ())
spawn process = do
    yieldRef <- newIORef undefined
    writeIORef yieldRef $ process (yield yieldRef) (terminate yieldRef)
    return (resume yieldRef)

A simple example of usage is:

process :: YieldFn -> Int -> IO () -> IO ()
process     _ 0 k = k
process yield i k = do
    putStrLn $ "Subprocess: " ++ show i
    yield $ process yield (i-1) k

example :: IO ()
example = do
    resume <- spawn $ \yield -> process yield 10
    forM_ [(1 :: Int) .. 10] $ \i -> do
        putStrLn $ "Main process: " ++ show i
        resume
    putStrLn "Main process done"

with output:

Main process: 1
Subprocess: 10
Main process: 2
Subprocess: 9
Main process: 3
Subprocess: 8
Main process: 4
Subprocess: 7
Main process: 5
Subprocess: 6
Main process: 6
Subprocess: 5
Main process: 7
Subprocess: 4
Main process: 8
Subprocess: 3
Main process: 9
Subprocess: 2
Main process: 10
Subprocess: 1
Main process done

Queues

I’ll use queues since they are very simple and Purely Functional Data Structures describes Hood-Melville Real-Time Queues in Figure 8.1 as an example of global rebuilding. We’ll end up with something quite similar which could be made more similar by changing the rebuilding code. Indeed, the differences are just an artifact of specific, easily changed details of the rebuilding coroutine, as we’ll see.

The examples I’ll present are mostly imperative, not purely functional. There are two reasons for this. First, I’m not focused on purely functional data structures and the technique works fine for imperative data structures. Second, it is arguably more natural to talk about coroutines in an imperative context. In this case, it’s easy to adapt the code to a purely functional version since it’s not much more than a purely functional data structure stuck in an IORef.

For a more imperative structure with mutable linked structure and/or in-place array updates, it would be more challenging to produce a purely functional version. The techniques here could still be used, though there are more “concurrency” concerns. While I don’t include the code here, I did a similar exercise for a random-access stack (a fancy way of saying a growable array). There the “concurrency” concern is that the elements you are copying to the new array may be popped and potentially overwritten before you switch to the new array. In this case, it’s easy to solve, since if the head pointer of the live version reaches the source offset for copy, you can just switch to the new array immediately.

Nevertheless, I can easily imagine scenarios where it may be beneficial, if not necessary, for the coroutines to communicate more and/or for there to be multiple “rebuild” processes. The approach used here could be easily adapted to that. It’s also worth mentioning that even in simpler cases, non-constant-time operations will either need to invoke resume multiple times or need more coordination with the “rebuild” process to know when it can do more than a constant amount of work. This could be accomplished by “rebuild” process simply recognizing this from the data structure state, or some state could be explicitly set to indicate this, or the techniques described earlier could be used, e.g. a different process for non-constant-time operations.

The code below uses the extensions BangPatterns, RecordWildCards, and GADTs.

Batched Rebuilding Implementation

We start with the straightforward, amortized constant-time queues where we push to a stack representing the back of the queue and pop from a stack representing the front. When the front stack is empty, we need to expensively reverse the back stack to make a new front stack.

I intentionally separate out the reverse step as an explicit rebuild function.

module BatchedRebuildingQueue ( Queue, new, enqueue, dequeue ) where
import Data.IORef ( IORef, newIORef, readIORef, writeIORef, modifyIORef )

data Queue a = Queue {
    queueRef :: IORef ([a], [a])
}

new :: IO (Queue a)
new = do
    queueRef <- newIORef ([], [])
    return Queue { .. }

dequeue :: Queue a -> IO (Maybe a)
dequeue q@(Queue { .. }) = do
    (front, back) <- readIORef queueRef
    case front of
        (x:front') -> do
            writeIORef queueRef (front', back)
            return (Just x)
        [] -> case back of
                [] -> return Nothing
                _ -> rebuild q >> dequeue q

enqueue :: a -> Queue a -> IO ()
enqueue x (Queue { .. }) =
    modifyIORef queueRef (\(front, back) -> (front, x:back))

rebuild :: Queue a -> IO ()
rebuild (Queue { .. }) =
    modifyIORef queueRef (\([], back) -> (reverse back, []))

Global Rebuilding Implementation

This step is where a modicum of thought is needed. We need to make the rebuild step from the batched version incremental. This is straightforward, if tedious, given the coroutine infrastructure. In this case, we incrementalize the reverse by reimplementing reverse in CPS with some yield calls inserted. Then we need to incrementalize append. Since we’re not waiting until front is empty, we’re actually computing front ++ reverse back. Incrementalizing append is hard, so we actually reverse front and then use an incremental reverseAppend (which is basically what the incremental reverse does anyway1).

One of first thing to note about this code is that the actual operations are largely unchanged other than inserting calls to resume. In fact, dequeue is even simpler than in the batched version as we can just assume that front is always populated when the queue is not empty. dequeue is freed from the responsibility of deciding when to trigger a rebuild. Most of the bulk of this code is from reimplementing a reverseAppend function (twice).

The parts of this code that require some deeper though are 1) knowing when a rebuild should begin, 2) knowing how “fast” the incremental operations should go2 (e.g. incrementalReverse does two steps at a time and the Hood-Melville implementation has an explicit exec2 that does two steps at a time), and 3) dealing with “concurrent” changes.

For the last, Overmars describes a queue of deferred operations to perform on the shadow copy once it finishes rebuilding. This kind of suggests a situation where the “rebuild” process can reference some “snapshot” of the data structure. In our case, that is the situation we’re in, since our data structures are essentially immutable data structures in an IORef. However, it can easily not be the case, e.g. the random-access stack. Also, this operation queue approach can easily be inefficient and inelegant. None of the implementations below will have this queue of deferred operations. It is easier, more efficient, and more elegant to just not copy over parts of the queue that have been dequeued, rather than have an extra phase of the rebuilding that just pops off the elements of the front stack that we just pushed. A similar situation happens for the random-access stack.

The use of drop could probably be easily eliminated. (I’m not even sure it’s still necessary.) It is mostly an artifact of (not) dealing with off-by-one issues.

module GlobalRebuildingQueue ( Queue, new, dequeue, enqueue ) where
import Data.IORef ( IORef, newIORef, readIORef, writeIORef, modifyIORef, modifyIORef' )
import Coroutine ( YieldFn, spawn )

data Queue a = Queue {
    resume :: IO (),
    frontRef :: IORef [a],
    backRef :: IORef [a],
    frontCountRef :: IORef Int,
    backCountRef :: IORef Int
}

new :: IO (Queue a)
new = do
    frontRef <- newIORef []
    backRef <- newIORef []
    frontCountRef <- newIORef 0
    backCountRef <- newIORef 0
    resume <- spawn $ const . rebuild frontRef backRef frontCountRef backCountRef
    return Queue { .. }

dequeue :: Queue a -> IO (Maybe a)
dequeue q = do
    resume q
    front <- readIORef (frontRef q)
    case front of
        [] -> return Nothing
        (x:front') -> do
            modifyIORef' (frontCountRef q) pred
            writeIORef (frontRef q) front'
            return (Just x)

enqueue :: a -> Queue a -> IO ()
enqueue x q = do
    modifyIORef (backRef q) (x:)
    modifyIORef' (backCountRef q) succ
    resume q

rebuild :: IORef [a] -> IORef [a] -> IORef Int -> IORef Int -> YieldFn -> IO ()
rebuild frontRef backRef frontCountRef backCountRef yield = let k = go k in go k where
  go k = do
    frontCount <- readIORef frontCountRef
    backCount <- readIORef backCountRef
    if backCount > frontCount then do
        back <- readIORef backRef
        front <- readIORef frontRef
        writeIORef backRef []
        writeIORef backCountRef 0
        incrementalReverse back [] $ \rback ->
            incrementalReverse front [] $ \rfront ->
                incrementalRevAppend rfront rback 0 backCount k
      else do
        yield k

  incrementalReverse [] acc k = k acc
  incrementalReverse [x] acc k = k (x:acc)
  incrementalReverse (x:y:xs) acc k = yield $ incrementalReverse xs (y:x:acc) k

  incrementalRevAppend [] front !movedCount backCount' k = do
    writeIORef frontRef front
    writeIORef frontCountRef $! movedCount + backCount'
    yield k
  incrementalRevAppend (x:rfront) acc !movedCount backCount' k = do
    currentFrontCount <- readIORef frontCountRef
    if currentFrontCount <= movedCount then do
        -- This drop count should be bounded by a constant.
        writeIORef frontRef $! drop (movedCount - currentFrontCount) acc
        writeIORef frontCountRef $! currentFrontCount + backCount'
        yield k
      else if null rfront then
        incrementalRevAppend [] (x:acc) (movedCount + 1) backCount' k
      else
        yield $! incrementalRevAppend rfront (x:acc) (movedCount + 1) backCount' k

Defunctionalized Global Rebuilding Implementation

This step is completely mechanical.

There’s arguably no reason to defunctionalize. It produces a result that is more data-structure-like, but, unless you need the code to work in a first-order language, there’s nothing really gained by doing this. It does lead to a result that is more directly comparable to other implementations.

For some data structures, having the continuation be analyzable would provide a simple means for the coroutines to communicate. The main process could directly look at the continuation to determine its state, e.g. if a rebuild is in-progress at all. The main process could also directly manipulate the stored continutation to change the “rebuild” process’ behavior. That said, doing this would mean that we’re not deriving the implementation. Still, the opportunity for additional optimizations and simplifications is nice.

As a minor aside, while it is, of course, obvious from looking at the previous version of the code, it’s neat how the Kont data type implies that the call stack is bounded and that most calls are tail calls. REVERSE_STEP is the only constructor that contains a Kont argument, but its type means that that argument can’t itself be a REVERSE_STEP. Again, I just find it neat how defunctionalization makes this concrete and explicit.

module DefunctionalizedQueue ( Queue, new, dequeue, enqueue ) where
import Data.IORef ( IORef, newIORef, readIORef, writeIORef, modifyIORef, modifyIORef' )

data Kont a r where
  IDLE :: Kont a ()
  REVERSE_STEP :: [a] -> [a] -> Kont a [a] -> Kont a ()
  REVERSE_FRONT :: [a] -> !Int -> Kont a [a]
  REV_APPEND_START :: [a] -> !Int -> Kont a [a]
  REV_APPEND_STEP :: [a] -> [a] -> !Int -> !Int -> Kont a ()

applyKont :: Queue a -> Kont a r -> r -> IO ()
applyKont q IDLE _ = rebuildLoop q
applyKont q (REVERSE_STEP xs acc k) _ = incrementalReverse q xs acc k
applyKont q (REVERSE_FRONT front backCount) rback =
    incrementalReverse q front [] $ REV_APPEND_START rback backCount
applyKont q (REV_APPEND_START rback backCount) rfront =
    incrementalRevAppend q rfront rback 0 backCount
applyKont q (REV_APPEND_STEP rfront acc movedCount backCount) _ =
    incrementalRevAppend q rfront acc movedCount backCount

rebuildLoop :: Queue a -> IO ()
rebuildLoop q@(Queue { .. }) = do
    frontCount <- readIORef frontCountRef
    backCount <- readIORef backCountRef
    if backCount > frontCount then do
        back <- readIORef backRef
        front <- readIORef frontRef
        writeIORef backRef []
        writeIORef backCountRef 0
        incrementalReverse q back [] $ REVERSE_FRONT front backCount
      else do
        writeIORef resumeRef IDLE

incrementalReverse :: Queue a -> [a] -> [a] -> Kont a [a] -> IO ()
incrementalReverse q [] acc k = applyKont q k acc
incrementalReverse q [x] acc k = applyKont q k (x:acc)
incrementalReverse q (x:y:xs) acc k = writeIORef (resumeRef q) $ REVERSE_STEP xs (y:x:acc) k

incrementalRevAppend :: Queue a -> [a] -> [a] -> Int -> Int -> IO ()
incrementalRevAppend (Queue { .. }) [] front !movedCount backCount' = do
    writeIORef frontRef front
    writeIORef frontCountRef $! movedCount + backCount'
    writeIORef resumeRef IDLE
incrementalRevAppend q@(Queue { .. }) (x:rfront) acc !movedCount backCount' = do
    currentFrontCount <- readIORef frontCountRef
    if currentFrontCount <= movedCount then do
        -- This drop count should be bounded by a constant.
        writeIORef frontRef $! drop (movedCount - currentFrontCount) acc
        writeIORef frontCountRef $! currentFrontCount + backCount'
        writeIORef resumeRef IDLE
      else if null rfront then
        incrementalRevAppend q [] (x:acc) (movedCount + 1) backCount'
      else
        writeIORef resumeRef $! REV_APPEND_STEP rfront (x:acc) (movedCount + 1) backCount'

resume :: Queue a -> IO ()
resume q = do
    kont <- readIORef (resumeRef q)
    applyKont q kont ()

data Queue a = Queue {
    resumeRef :: IORef (Kont a ()),
    frontRef :: IORef [a],
    backRef :: IORef [a],
    frontCountRef :: IORef Int,
    backCountRef :: IORef Int
}

new :: IO (Queue a)
new = do
    frontRef <- newIORef []
    backRef <- newIORef []
    frontCountRef <- newIORef 0
    backCountRef <- newIORef 0
    resumeRef <- newIORef IDLE
    return Queue { .. }

dequeue :: Queue a -> IO (Maybe a)
dequeue q  = do
    resume q
    front <- readIORef (frontRef q)
    case front of
        [] -> return Nothing
        (x:front') -> do
            modifyIORef' (frontCountRef q) pred
            writeIORef (frontRef q) front'
            return (Just x)

enqueue :: a -> Queue a -> IO ()
enqueue x q = do
    modifyIORef (backRef q) (x:)
    modifyIORef' (backCountRef q) succ
    resume q

Functional Defunctionalized Global Rebuilding Implementation

This is just a straightforward reorganization of the previous code into purely functional code. This produces a persistent queue with worst-case constant time operations.

It is, of course, far uglier and more ad-hoc than Okasaki’s extremely elegant real-time queues, but the methodology to derive it was simple-minded. The result is also quite similar to the Hood-Melville Queues even though I did not set out to achieve that. That said, I’m pretty confident you could derive pretty much exactly the Hood-Melville queues with just minor modifications to Global Rebuilding Implementation.

module FunctionalQueue ( Queue, empty, dequeue, enqueue ) where

data Kont a r where
  IDLE :: Kont a ()
  REVERSE_STEP :: [a] -> [a] -> Kont a [a] -> Kont a ()
  REVERSE_FRONT :: [a] -> !Int -> Kont a [a]
  REV_APPEND_START :: [a] -> !Int -> Kont a [a]
  REV_APPEND_STEP :: [a] -> [a] -> !Int -> !Int -> Kont a ()

applyKont :: Queue a -> Kont a r -> r -> Queue a
applyKont q IDLE _ = rebuildLoop q
applyKont q (REVERSE_STEP xs acc k) _ = incrementalReverse q xs acc k
applyKont q (REVERSE_FRONT front backCount) rback =
    incrementalReverse q front [] $ REV_APPEND_START rback backCount
applyKont q (REV_APPEND_START rback backCount) rfront =
    incrementalRevAppend q rfront rback 0 backCount
applyKont q (REV_APPEND_STEP rfront acc movedCount backCount) _ =
    incrementalRevAppend q rfront acc movedCount backCount

rebuildLoop :: Queue a -> Queue a
rebuildLoop q@(Queue { .. }) =
    if backCount > frontCount then
        let q' = q { back = [], backCount = 0 } in
        incrementalReverse q' back [] $ REVERSE_FRONT front backCount
      else
        q { resumeKont = IDLE }

incrementalReverse :: Queue a -> [a] -> [a] -> Kont a [a] -> Queue a
incrementalReverse q [] acc k = applyKont q k acc
incrementalReverse q [x] acc k = applyKont q k (x:acc)
incrementalReverse q (x:y:xs) acc k = q { resumeKont = REVERSE_STEP xs (y:x:acc) k }

incrementalRevAppend :: Queue a -> [a] -> [a] -> Int -> Int -> Queue a
incrementalRevAppend q [] front' !movedCount backCount' =
    q { front = front', frontCount = movedCount + backCount', resumeKont = IDLE }
incrementalRevAppend q (x:rfront) acc !movedCount backCount' =
    if frontCount q <= movedCount then
        -- This drop count should be bounded by a constant.
        let !front = drop (movedCount - frontCount q) acc in
        q { front = front, frontCount = frontCount q + backCount', resumeKont = IDLE }
      else if null rfront then
        incrementalRevAppend q [] (x:acc) (movedCount + 1) backCount'
      else
        q { resumeKont = REV_APPEND_STEP rfront (x:acc) (movedCount + 1) backCount' }

resume :: Queue a -> Queue a
resume q = applyKont q (resumeKont q) ()

data Queue a = Queue {
    resumeKont :: !(Kont a ()),
    front :: [a],
    back :: [a],
    frontCount :: !Int,
    backCount :: !Int
}

empty :: Queue a
empty = Queue { resumeKont = IDLE, front = [], back = [], frontCount = 0, backCount = 0 }

dequeue :: Queue a -> (Maybe a, Queue a)
dequeue q =
    case front of
        [] -> (Nothing, q)
        (x:front') ->
            (Just x, q' { front = front', frontCount = frontCount - 1 })
  where q'@(Queue { .. }) = resume q

enqueue :: a -> Queue a -> Queue a
enqueue x q@(Queue { .. }) = resume (q { back = x:back, backCount = backCount + 1 })

Hood-Melville Implementation

This is just the Haskell code from Purely Functional Data Structures adapted to the interface of the other examples.

This code is mostly to compare. The biggest difference, other than some code structuring differences, is the front and back lists are reversed in parallel while my code does them sequentially. As mentioned before, to get a structure like that would simply be a matter of defining a parallel incremental reverse back in the Global Rebuilding Implementation.

Again, Okasaki’s real-time queue that can be seen as an application of the lazy rebuilding and scheduling techniques, described in his thesis and book, is a better implementation than this in pretty much every way.

module HoodMelvilleQueue (Queue, empty, dequeue, enqueue) where

data RotationState a
  = Idle
  | Reversing !Int [a] [a] [a] [a]
  | Appending !Int [a] [a]
  | Done [a]

data Queue a = Queue !Int [a] (RotationState a) !Int [a]

exec :: RotationState a -> RotationState a
exec (Reversing ok (x:f) f' (y:r) r') = Reversing (ok+1) f (x:f') r (y:r')
exec (Reversing ok [] f' [y] r') = Appending ok f' (y:r')
exec (Appending 0 f' r') = Done r'
exec (Appending ok (x:f') r') = Appending (ok-1) f' (x:r')
exec state = state

invalidate :: RotationState a -> RotationState a
invalidate (Reversing ok f f' r r') = Reversing (ok-1) f f' r r'
invalidate (Appending 0 f' (x:r')) = Done r'
invalidate (Appending ok f' r') = Appending (ok-1) f' r'
invalidate state = state

exec2 :: Int -> [a] -> RotationState a -> Int -> [a] -> Queue a
exec2 !lenf f state lenr r =
    case exec (exec state) of
        Done newf -> Queue lenf newf Idle lenr r
        newstate -> Queue lenf f newstate lenr r

check :: Int -> [a] -> RotationState a -> Int -> [a] -> Queue a
check !lenf f state !lenr r =
    if lenr <= lenf then exec2 lenf f state lenr r
    else let newstate = Reversing 0 f [] r []
         in exec2 (lenf+lenr) f newstate 0 []

empty :: Queue a
empty = Queue 0 [] Idle 0 []

dequeue :: Queue a -> (Maybe a, Queue a)
dequeue q@(Queue _ [] _ _ _) = (Nothing, q)
dequeue (Queue lenf (x:f') state lenr r) =
    let !q' = check (lenf-1) f' (invalidate state) lenr r in
    (Just x, q')

enqueue :: a -> Queue a -> Queue a
enqueue x (Queue lenf f state lenr r) = check lenf f state (lenr+1) (x:r)

Okasaki’s Real-Time Queues

Just for completeness. This implementation crucially relies on lazy evaluation. Our queues are of the form Queue f r s. If you look carefully, you’ll notice that the only place we consume s is in the first clause of exec, and there we discard its elements. In other words, we only care about the length of s. s gets “decremented” each time we enqueue until it’s empty at which point we rotate r to f in the second clause of exec. The key thing is that f and s are initialized to the same value in that clause. That means each time we “decrement” s we are also forcing a bit of f. Forcing a bit of f/s means computing a bit of rotate. rotate xs ys a is an incremental version of xs ++ reverse ys ++ a (where we use the invariant length ys = 1 + length xs for the base case).

Using Okasaki’s terminology, rotate illustrates a simple form of lazy rebuilding where we use lazy evaluation rather than explicit or implicit coroutines to perform work “in parallel”. Here, we interleave the evaluation of rotate with enqueue and dequeue via forcing the conses of f/s. However, lazy rebuilding itself may not lead to worst-case optimal times (assuming it is amortized optimal). We need to use Okasaki’s other technique of scheduling to strategically force the thunks incrementally rather than all at once. Here s is a schedule telling us when to force parts of f. (As mentioned, s also serves as a counter telling us when to perform a rebuild.)

module OkasakiQueue ( Queue, empty, dequeue, enqueue ) where

data Queue a = Queue [a] ![a] [a]

empty :: Queue a
empty = Queue [] [] []

dequeue :: Queue a -> (Maybe a, Queue a)
dequeue q@(Queue [] _ _) = (Nothing, q)
dequeue (Queue (x:f) r s) = (Just x, exec f r s)

rotate :: [a] -> [a] -> [a] -> [a]
rotate     [] (y: _) a = y:a
rotate (x:xs) (y:ys) a = x:rotate xs ys (y:a)

exec :: [a] -> [a] -> [a] -> Queue a
exec f !r (_:s) = Queue f r s
exec f !r [] = let f' = rotate f r [] in Queue f' [] f'

enqueue :: a -> Queue a -> Queue a
enqueue x (Queue f r s) = exec f (x:r) s 

It’s instructive to compare the above to the following implementation which doesn’t use a schedule. This implementation is essentially the Banker’s Queue from Okasaki’s book, except we use lazy rebuilding to spread the xs ++ reverse ys (particularly the reverse part) over multiple dequeues via rotate. The following implementation performs extremely well in my benchmark, but the operations are subtly not constant-time. Specifically, after a long series of enqueues, a dequeue will do work proportional to the logarithm of the number of enqueues. Essentially, f will be a nested series of rotate calls, one for every doubling of the length of the queue. Even if we change let f' to let !f', that will only make the first dequeue cheap. The second will still be expensive.

module UnscheduledOkasakiQueue ( Queue, empty, dequeue, enqueue ) where

data Queue a = Queue [a] !Int [a] !Int

empty :: Queue a
empty = Queue [] 0 [] 0

dequeue :: Queue a -> (Maybe a, Queue a)
dequeue q@(Queue [] _ _ _) = (Nothing, q)
dequeue (Queue (x:f) lenf r lenr) = (Just x, exec f (lenf - 1) r lenr)

rotate :: [a] -> [a] -> [a] -> [a]
rotate     [] (y: _) a = y:a
rotate (x:xs) (y:ys) a = x:rotate xs ys (y:a)

exec :: [a] -> Int -> [a] -> Int -> Queue a
exec f !lenf !r !lenr | lenf >= lenr = Queue f lenf r lenr
exec f !lenf !r !lenr = let f' = rotate f r [] in Queue f' (lenf + lenr) [] 0

enqueue :: a -> Queue a -> Queue a
enqueue x (Queue f lenf r lenr) = exec f lenf (x:r) (lenr + 1) 

Empirical Evaluation

I won’t reproduce the evaluation code as it’s not very sophisticated or interesting. It randomly generated a sequence of enqueues and dequeues with an 80% chance to produce an enqueue over a dequeue so that the queues would grow. It measured the average time of an enqueue and a dequeue, as well as the maximum time of any single dequeue.

The main thing I wanted to see was relatively stable average enqueue and dequeue times with only the batched implementation having a growing maximum dequeue time. This is indeed what I saw, though it took about 1,000,000 operations (or really a queue of a couple hundred thousand elements) for the numbers to stabilize.

The results were mostly unsurprising. Unsurprisingly, in overall time, the batched implementation won. Its enqueue is also, obviously, the fastest. (Indeed, there’s a good chance my measurement of its average enqueue time was largely a measurement of the timer’s resolution.) The operations’ average times were stable illustrating their constant (amortized) time. At large enough sizes, the ratio of the maximum dequeue time versus the average stabilized around 7000 to 1, except, of course, for the batched version which grew linearly to millions to 1 ratios at queue sizes of tens of millions of elements. This illustrates the worst-case time complexity of all the other implementations, and the merely amortized time complexity of the batched one.

While the batched version was best in overall time, the difference wasn’t that great. The worst implementations were still less 1.4x slower. All the worst-case optimal implementations performed roughly the same, but there were still some clear winners and losers. Okasaki’s real-time queue is almost on-par with the batched implementation in overall time and handily beats the other implementations in average enqueue and dequeue times. The main surprise for me was that the loser was the Hood-Melville queue. My guess is this is due to invalidate which seems like it would do more work and produce more garbage than the approach taken in my functional version.

Conclusion

The point of this article was to illustrate the process of deriving a deamortized data structure from an amortized one utilizing batched rebuilding by explicitly modeling global rebuilding as a coroutine.

The point wasn’t to produce the fastest queue implementation, though I am pretty happy with the results. While this is an extremely simple example, it was still nice that each step was very easy and natural. It’s especially nice that this derivation approach produced a better result than the Hood-Melville queue.

Of course, my advice is to use Okasaki’s real-time queue if you need a purely functional queue with worst-case constant-time operations.


  1. This code could definitely be refactored to leverage this similarity to reduce code. Alternatively, one could refunctionalize the Hood-Melville implementation at the end.↩︎

  2. Going “too fast”, so long as it’s still a constant amount of work for each step, isn’t really an issue asymptotically, so you can just crank the knobs if you don’t want to think too hard about it. That said, going faster than you need to will likely give you worse worst-case constant factors. In some cases, going faster than necessary could reduce constant factors, e.g. by better utilizing caches and disk I/O buffers.↩︎

October 04, 2024 08:24 AM

Edward Z. Yang

What’s different this time? LLM edition

One of the things that I learned in grad school is that even if you've picked an important and unsolved problem, you need some reason to believe it is solvable--especially if people have tried to solve it before! In other words, "What's different this time?" This is perhaps a dreary way of shooting down otherwise promising research directions, but you can flip it around: when the world changes, you can ask, "What can I do now that I couldn't do before?"

This post is a list of problems in areas that I care about (half of this is PL flavor, since that's what I did my PhD in), where I suspect something has changed with the advent of LLMs. It's not a list of recipes; there is still hard work to figure out how exactly an LLM can be useful (for most of these, just feeding the entire problem into ChatGPT usually doesn't work). But I often talk to people want to get started on something, anything, but have no idea to start. Try here!

Static analysis. The chasm between academic static analysis work and real world practice is the scaling problems that come with trying to apply the technique to a full size codebase. Asymptotics strike as LOC goes up, language focused techniques flounder in polyglot codebases, and "Does anyone know how to write cmake?" But this is predicated on the idea that static analysis has to operate on a whole program. It doesn't; humans can do perfectly good static analysis on fragments of code without having to hold the entire codebase in their head, without needing access to a build system. They make assumptions about APIs and can do local reasoning. LLMs can play a key role in drafting these assumptions so that local reasoning can occur. What if the LLM gets it wrong? Well, if an LLM could get it wrong, an inattentive junior developer might get it wrong too--maybe there is a problem in the API design. LLMs already do surprisingly well if you one-shot prompt them to find bugs in code; with more traditional static analysis support, maybe they can do even better.

DSL purgatory. Consider a problem that can be solved with code in a procedural way, but only by writing lots of tedious, error prone boilerplate (some examples: drawing diagrams, writing GUIs, SQL queries, building visualizations, scripting website/mobile app interactions, end to end testing). The PL dream is to design a sweet compositional DSL that raises the level of abstraction so that you can render a Hilbert curve in seven lines of code. But history is also abound with cases where the DSL did not solve the problems, or maybe it did solve the problem but only after years of grueling work, and so there are still many problems that feel like there ought to be a DSL that should solve them but there isn't. The promise of LLMs is that they are extremely good at regurgitating low level procedural actions that could conceivably be put together in a DSL. A lot of the best successes of LLMs today is putting coding powers in the hands of domain experts that otherwise do not how to code; could it also help in putting domain expertise in the hands of people who can code?

I am especially interested in these domains:

  • SQL - Its strange syntax purportedly makes it easier for non-software engineers to understand, whereas many (myself included) would often prefer a more functional syntax ala LINQ/list comprehensions. It's pretty hard to make an alternate SQL syntax take off though, because SQL is not one language, but many many dialects everywhere with no obvious leverage point. That sounds like an LLM opportunity. Or heck, just give me one of those AI editor environments but specifically fine tuned for SQL/data visualization, don't even bother with general coding.
  • End to end testing - This is https://momentic.ai/ but personally I'm not going to rely on a proprietary product for testing in my OSS projects. There's definitely an OSS opportunity here.
  • Scripting website/mobile app interactions - The website scraping version of this is https://reworkd.ai/ but I am also pretty interested in this from the browser extension angle: to some extent I can take back control of my frontend experience with browser extensions; can I go further with LLMs? And we typically don't imagine that I can do the same with a mobile app... but maybe I can??

OSS bread and butter. Why is Tesseract still the number one OSS library for OCR? Why is smooth and beautiful text to voice not ubiquitous? Why is the voice control on my Tesla so bad? Why is the wake word on my Android device so unreliable? Why doesn't the screenshot parser on a fansite for my favorite mobage not able to parse out icons? The future has arrived, but it is not uniformly distributed.

Improving the pipeline from ephemeral to durable stores of knowledge. Many important sources of knowledge are trapped in "ephemeral" stores, like Discord servers, private chat conversations, Reddit posts, Twitter threads, blog posts, etc. In an ideal world, there would be a pipeline of this knowledge into more durable, indexable forms for the benefit of all, but actually doing this is time consuming. Can LLMs help? Note that the dream of LLMs is you can just feed all of this data into the model and just ask questions to it. I'm OK with something a little bit more manual, we don't have to solve RAG first.

by Edward Z. Yang at October 04, 2024 04:30 AM

October 02, 2024

Ken T Takusagawa

[mlzpqxqu] import with type signature

proposal for a Haskell language extension: when importing a function from another module, one may optionally also specify a type signature for the imported function.  this would be helpful for code understanding.  the reader would have immediately available the type of the imported symbol, not having to go track down the type in the source module (which may be many steps away when modules re-export symbols, and the source module might not even have a type annotation), nor use a tool such as ghci to query it.  (maybe the code currently fails to compile for other reasons, so ghci is not available.)

if a function with the specified type signature is not exported by an imported module, the compiler can offer suggestions of other functions exported by the module which do have, or unify with, the imported type signature.  maybe the function got renamed in a new version of the module.

or, the compiler can do what Hoogle does and search among all modules in its search path for functions with the given signature.  maybe the function got moved to a different module.

the specified type signature may be narrower than how the function was originally defined.  this can limit some of the insanity caused by the Foldable Traversable Proposal (FTP):

import Prelude(length :: [a] -> Int) -- prevent length from being called on tuples and Maybe

various potentially tricky issues:

  1. a situation similar to the diamond problem (multiple inheritance) in object-oriented programming: module A defines a polymorphic function f, imported then re-exported by modules B and C.  module D imports both B and C, unqualified.  B imports and re-exports f from A with a type signature more narrow than originally defined in A.  C does not change the type signature.  what is the type of f as seen by D?  which version of f, which path through B or C, does D see?  solution might be simple: if the function through different paths are not identical, then the user has to qualify.

  2. the following tries to make List.length available only for lists, and Foldable.length available for anything else.  is this asking for trouble?

    import Prelude hiding(length);
    import qualified Prelude(length :: [a] -> Int) as List;
    import qualified Prelude(length) as Foldable;

by Unknown (noreply@blogger.com) at October 02, 2024 12:43 AM

October 01, 2024

Haskell Interlude

56: Satnam Singh

Today on the Haskell Interlude, Matti and Sam are joined by Satnam Singh. Satnam has been a lecturer at Glasgow, and Software Engineer at Google, Meta, and now Groq. He talks about convincing people to use Haskell, laying out circuits and why community matters.

PS: After the recording, it was important to Satnam to clarify that his advise to “not be afraid to loose your job” was specially meant to encourage to quit jobs that are not good for you, if possible, but he acknowledges that unfortunately not everybody can afford that risk.

by Haskell Podcast at October 01, 2024 05:00 PM

Brent Yorgey

Retiring BlogLiterately

Retiring BlogLiterately

Posted on October 1, 2024
Tagged , , , , , ,

Way back in 2012 I took over maintainership of the BlogLiterately tool from Robert Greayer, its initial author. I used it for many years to post to my Wordpress blog, added a bunch of features, solved some fun bugs, and created the accompanying BlogLiterately-diagrams plugin for embedding diagrams code in blog posts. However, now that I have fled Wordpress and rebuilt my blog with hakyll, I don’t use BlogLiterately any more (there is even a diagrams-pandoc package which does the same thing BlogLiterately-diagrams used to do). So, as of today I am officially declaring BlogLiterately unsupported.

The fact is, I haven’t actually updated BlogLiterately since March of last year. It currently only builds on GHC 9.4 or older, and no one has complained, which I take as strong evidence that no one else is using it either! However, if anyone out there is actually using it, and would like to take over as maintainer, I would be very happy to pass it along to you.

I do plan to continue maintaining HaXml and haxr, at least for now; unlike BlogLiterately, I know they are still in use, especially HaXml. However, BlogLiterately was really the only reason I cared about these packages personally, so I would be happy to pass them along as well; please get in touch if you would be willing to take over maintaining one or both packages.

<noscript>Javascript needs to be activated to view comments.</noscript>

by Brent Yorgey at October 01, 2024 12:00 AM

September 30, 2024

Chris Reade

PenroseKiteDart User Guide

Introduction

PenroseKiteDart is a Haskell package with tools to experiment with finite tilings of Penrose’s Kites and Darts. It uses the Haskell Diagrams package for drawing tilings. As well as providing drawing tools, this package introduces tile graphs (Tgraphs) for describing finite tilings. (I would like to thank Stephen Huggett for suggesting planar graphs as a way to reperesent the tilings).

This document summarises the design and use of the PenroseKiteDart package.

PenroseKiteDart package is now available on Hackage.

The source files are available on GitHub at https://github.com/chrisreade/PenroseKiteDart.

There is a small art gallery of examples created with PenroseKiteDart here.

Index

  1. About Penrose’s Kites and Darts
  2. Using the PenroseKiteDart Package (initial set up).
  3. Overview of Types and Operations
  4. Drawing in more detail
  5. Forcing in more detail
  6. Advanced Operations
  7. Other Reading

1. About Penrose’s Kites and Darts

The Tiles

In figure 1 we show a dart and a kite. All angles are multiples of 36^{\circ} (a tenth of a full turn). If the shorter edges are of length 1, then the longer edges are of length \phi, where \phi = (1+ \sqrt{5})/ 2 is the golden ratio.

Figure 1: The Dart and Kite Tiles
Figure 1: The Dart and Kite Tiles

Aperiodic Infinite Tilings

What is interesting about these tiles is:

It is possible to tile the entire plane with kites and darts in an aperiodic way.

Such a tiling is non-periodic and does not contain arbitrarily large periodic regions or patches.

The possibility of aperiodic tilings with kites and darts was discovered by Sir Roger Penrose in 1974. There are other shapes with this property, including a chiral aperiodic monotile discovered in 2023 by Smith, Myers, Kaplan, Goodman-Strauss. (See the Penrose Tiling Wikipedia page for the history of aperiodic tilings)

This package is entirely concerned with Penrose’s kite and dart tilings also known as P2 tilings.

In figure 2 we add a temporary green line marking purely to illustrate a rule for making legal tilings. The purpose of the rule is to exclude the possibility of periodic tilings.

If all tiles are marked as shown, then whenever tiles come together at a point, they must all be marked or must all be unmarked at that meeting point. So, for example, each long edge of a kite can be placed legally on only one of the two long edges of a dart. The kite wing vertex (which is marked) has to go next to the dart tip vertex (which is marked) and cannot go next to the dart wing vertex (which is unmarked) for a legal tiling.

Figure 2: Marked Dart and Kite
Figure 2: Marked Dart and Kite

Correct Tilings

Unfortunately, having a finite legal tiling is not enough to guarantee you can continue the tiling without getting stuck. Finite legal tilings which can be continued to cover the entire plane are called correct and the others (which are doomed to get stuck) are called incorrect. This means that decomposition and forcing (described later) become important tools for constructing correct finite tilings.

2. Using the PenroseKiteDart Package

You will need the Haskell Diagrams package (See Haskell Diagrams) as well as this package (PenroseKiteDart). When these are installed, you can produce diagrams with a Main.hs module. This should import a chosen backend for diagrams such as the default (SVG) along with Diagrams.Prelude.

    module Main (main) where
    
    import Diagrams.Backend.SVG.CmdLine
    import Diagrams.Prelude

For Penrose’s Kite and Dart tilings, you also need to import the PKD module and (optionally) the TgraphExamples module.

    import PKD
    import TgraphExamples

Then to ouput someExample figure

    fig::Diagram B
    fig = someExample

    main :: IO ()
    main = mainWith fig

Note that the token B is used in the diagrams package to represent the chosen backend for output. So a diagram has type Diagram B. In this case B is bound to SVG by the import of the SVG backend. When the compiled module is executed it will generate an SVG file. (See Haskell Diagrams for more details on producing diagrams and using alternative backends).

3. Overview of Types and Operations

Half-Tiles

In order to implement operations on tilings (decompose in particular), we work with half-tiles. These are illustrated in figure 3 and labelled RD (right dart), LD (left dart), LK (left kite), RK (right kite). The join edges where left and right halves come together are shown with dotted lines, leaving one short edge and one long edge on each half-tile (excluding the join edge). We have shown a red dot at the vertex we regard as the origin of each half-tile (the tip of a half-dart and the base of a half-kite).

Figure 3: Half-Tile pieces showing join edges (dashed) and origin vertices (red dots)
Figure 3: Half-Tile pieces showing join edges (dashed) and origin vertices (red dots)

The labels are actually data constructors introduced with type operator HalfTile which has an argument type (rep) to allow for more than one representation of the half-tiles.

    data HalfTile rep 
      = LD rep -- Left Dart
      | RD rep -- Right Dart
      | LK rep -- Left Kite
      | RK rep -- Right Kite
      deriving (Show,Eq)

Tgraphs

We introduce tile graphs (Tgraphs) which provide a simple planar graph representation for finite patches of tiles. For Tgraphs we first specialise HalfTile with a triple of vertices (positive integers) to make a TileFace such as RD(1,2,3), where the vertices go clockwise round the half-tile triangle starting with the origin.

    type TileFace  = HalfTile (Vertex,Vertex,Vertex)
    type Vertex    = Int  -- must be positive

The function

    makeTgraph :: [TileFace] -> Tgraph

then constructs a Tgraph from a TileFace list after checking the TileFaces satisfy certain properties (described below). We also have

    faces :: Tgraph -> [TileFace]

to retrieve the TileFace list from a Tgraph.

As an example, the fool (short for fool’s kite and also called an ace in the literature) consists of two kites and a dart (= 4 half-kites and 2 half-darts):

    fool :: Tgraph
    fool = makeTgraph [RD (1,2,3), LD (1,3,4)   -- right and left dart
                      ,LK (5,3,2), RK (5,2,7)   -- left and right kite
                      ,RK (5,4,3), LK (5,6,4)   -- right and left kite
                      ]

To produce a diagram, we simply draw the Tgraph

    foolFigure :: Diagram B
    foolFigure = draw fool

which will produce the diagram on the left in figure 4.

Alternatively,

    foolFigure :: Diagram B
    foolFigure = labelled drawj fool

will produce the diagram on the right in figure 4 (showing vertex labels and dashed join edges).

Figure 4: Diagram of fool without labels and join edges (left), and with (right)
Figure 4: Diagram of fool without labels and join edges (left), and with (right)

When any (non-empty) Tgraph is drawn, a default orientation and scale are chosen based on the lowest numbered join edge. This is aligned on the positive x-axis with length 1 (for darts) or length \phi (for kites).

Tgraph Properties

Tgraphs are actually implemented as

    newtype Tgraph = Tgraph [TileFace]
                     deriving (Show)

but the data constructor Tgraph is not exported to avoid accidentally by-passing checks for the required properties. The properties checked by makeTgraph ensure the Tgraph represents a legal tiling as a planar graph with positive vertex numbers, and that the collection of half-tile faces are both connected and have no crossing boundaries (see note below). Finally, there is a check to ensure two or more distinct vertex numbers are not used to represent the same vertex of the graph (a touching vertex check). An error is raised if there is a problem.

Note: If the TilFaces are faces of a planar graph there will also be exterior (untiled) regions, and in graph theory these would also be called faces of the graph. To avoid confusion, we will refer to these only as exterior regions, and unless otherwise stated, face will mean a TileFace. We can then define the boundary of a list of TileFaces as the edges of the exterior regions. There is a crossing boundary if the boundary crosses itself at a vertex. We exclude crossing boundaries from Tgraphs because they prevent us from calculating relative positions of tiles locally and create touching vertex problems.

For convenience, in addition to makeTgraph, we also have

    makeUncheckedTgraph :: [TileFace] -> Tgraph
    checkedTgraph   :: [TileFace] -> Tgraph

The first of these (performing no checks) is useful when you know the required properties hold. The second performs the same checks as makeTgraph except that it omits the touching vertex check. This could be used, for example, when making a Tgraph from a sub-collection of TileFaces of another Tgraph.

Main Tiling Operations

There are three key operations on finite tilings, namely

    decompose :: Tgraph -> Tgraph
    force     :: Tgraph -> Tgraph
    compose   :: Tgraph -> Tgraph

Decompose

Decomposition (also called deflation) works by splitting each half-tile into either 2 or 3 new (smaller scale) half-tiles, to produce a new tiling. The fact that this is possible, is used to establish the existence of infinite aperiodic tilings with kites and darts. Since our Tgraphs have abstracted away from scale, the result of decomposing a Tgraph is just another Tgraph. However if we wish to compare before and after with a drawing, the latter should be scaled by a factor 1/{\phi} = \phi - 1 times the scale of the former, to reflect the change in scale.

Figure 5: fool (left) and decompose fool (right)
Figure 5: fool (left) and decompose fool (right)

We can, of course, iterate decompose to produce an infinite list of finer and finer decompositions of a Tgraph

    decompositions :: Tgraph -> [Tgraph]
    decompositions = iterate decompose

Force

Force works by adding any TileFaces on the boundary edges of a Tgraph which are forced. That is, where there is only one legal choice of TileFace addition consistent with the seven possible vertex types. Such additions are continued until either (i) there are no more forced cases, in which case a final (forced) Tgraph is returned, or (ii) the process finds the tiling is stuck, in which case an error is raised indicating an incorrect tiling. [In the latter case, the argument to force must have been an incorrect tiling, because the forced additions cannot produce an incorrect tiling starting from a correct tiling.]

An example is shown in figure 6. When forced, the Tgraph on the left produces the result on the right. The original is highlighted in red in the result to show what has been added.

Figure 6: A Tgraph (left) and its forced result (right) with the original shown red
Figure 6: A Tgraph (left) and its forced result (right) with the original shown red

Compose

Composition (also called inflation) is an opposite to decompose but this has complications for finite tilings, so it is not simply an inverse. (See Graphs,Kites and Darts and Theorems for more discussion of the problems). Figure 7 shows a Tgraph (left) with the result of composing (right) where we have also shown (in pale green) the faces of the original that are not included in the composition – the remainder faces.

Figure 7: A Tgraph (left) and its (part) composed result (right) with the remainder faces shown pale green
Figure 7: A Tgraph (left) and its (part) composed result (right) with the remainder faces shown pale green

Under some circumstances composing can fail to produce a Tgraph because there are crossing boundaries in the resulting TileFaces. However, we have established that

  • If g is a forced Tgraph, then compose g is defined and it is also a forced Tgraph.

Try Results

It is convenient to use types of the form Try a for results where we know there can be a failure. For example, compose can fail if the result does not pass the connected and no crossing boundary check, and force can fail if its argument is an incorrect Tgraph. In situations when you would like to continue some computation rather than raise an error when there is a failure, use a try version of a function.

    tryCompose :: Tgraph -> Try Tgraph
    tryForce   :: Tgraph -> Try Tgraph

We define Try as a synonym for Either String (which is a monad) in module Tgraph.Try.

type Try a = Either String a

Successful results have the form Right r (for some correct result r) and failure results have the form Left s (where s is a String describing the problem as a failure report).

The function

    runTry:: Try a -> a
    runTry = either error id

will retrieve a correct result but raise an error for failure cases. This means we can always derive an error raising version from a try version of a function by composing with runTry.

    force = runTry . tryForce
    compose = runTry . tryCompose

Elementary Tgraph and TileFace Operations

The module Tgraph.Prelude defines elementary operations on Tgraphs relating vertices, directed edges, and faces. We describe a few of them here.

When we need to refer to particular vertices of a TileFace we use

    originV :: TileFace -> Vertex -- the first vertex - red dot in figure 2
    oppV    :: TileFace -> Vertex -- the vertex at the opposite end of the join edge from the origin
    wingV   :: TileFace -> Vertex -- the vertex not on the join edge

A directed edge is represented as a pair of vertices.

    type Dedge = (Vertex,Vertex)

So (a,b) is regarded as a directed edge from a to b. In the special case that a list of directed edges is symmetrically closed [(b,a) is in the list whenever (a,b) is in the list] we can think of this as an edge list rather than just a directed edge list.

For example,

    internalEdges :: Tgraph -> [Dedge]

produces an edge list, whereas

    graphBoundary :: Tgraph -> [Dedge]

produces single directions. Each directed edge in the resulting boundary will have a TileFace on the left and an exterior region on the right. The function

    graphDedges :: Tgraph -> [Dedge]

produces all the directed edges obtained by going clockwise round each TileFace so not every edge in the list has an inverse in the list.

The above three functions are defined using

    faceDedges :: TileFace -> [Dedge]

which produces a list of the three directed edges going clockwise round a TileFace starting at the origin vertex.

When we need to refer to particular edges of a TileFace we use

    joinE  :: TileFace -> Dedge  -- shown dotted in figure 2
    shortE :: TileFace -> Dedge  -- the non-join short edge
    longE  :: TileFace -> Dedge  -- the non-join long edge

which are all directed clockwise round the TileFace. In contrast, joinOfTile is always directed away from the origin vertex, so is not clockwise for right darts or for left kites:

    joinOfTile:: TileFace -> Dedge
    joinOfTile face = (originV face, oppV face)

Patches (Scaled and Positioned Tilings)

Behind the scenes, when a Tgraph is drawn, each TileFace is converted to a Piece. A Piece is another specialisation of HalfTile using a two dimensional vector to indicate the length and direction of the join edge of the half-tile (from the originV to the oppV), thus fixing its scale and orientation. The whole Tgraph then becomes a list of located Pieces called a Patch.

    type Piece = HalfTile (V2 Double)
    type Patch = [Located Piece]

Piece drawing functions derive vectors for other edges of a half-tile piece from its join edge vector. In particular (in the TileLib module) we have

    drawPiece :: Piece -> Diagram B
    dashjPiece :: Piece -> Diagram B
    fillPieceDK :: Colour Double -> Colour Double -> Piece -> Diagram B

where the first draws the non-join edges of a Piece, the second does the same but adds a dashed line for the join edge, and the third takes two colours – one for darts and one for kites, which are used to fill the piece as well as using drawPiece.

Patch is an instances of class Transformable so a Patch can be scaled, rotated, and translated.

Vertex Patches

It is useful to have an intermediate form between Tgraphs and Patches, that contains information about both the location of vertices (as 2D points), and the abstract TileFaces. This allows us to introduce labelled drawing functions (to show the vertex labels) which we then extend to Tgraphs. We call the intermediate form a VPatch (short for Vertex Patch).

    type VertexLocMap = IntMap.IntMap (Point V2 Double)
    data VPatch = VPatch {vLocs :: VertexLocMap,  vpFaces::[TileFace]} deriving Show

and

    makeVP :: Tgraph -> VPatch

calculates vertex locations using a default orientation and scale.

VPatch is made an instance of class Transformable so a VPatch can also be scaled and rotated.

One essential use of this intermediate form is to be able to draw a Tgraph with labels, rotated but without the labels themselves being rotated. We can simply convert the Tgraph to a VPatch, and rotate that before drawing with labels.

    labelled draw (rotate someAngle (makeVP g))

We can also align a VPatch using vertex labels.

    alignXaxis :: (Vertex, Vertex) -> VPatch -> VPatch 

So if g is a Tgraph with vertex labels a and b we can align it on the x-axis with a at the origin and b on the positive x-axis (after converting to a VPatch), instead of accepting the default orientation.

    labelled draw (alignXaxis (a,b) (makeVP g))

Another use of VPatches is to share the vertex location map when drawing only subsets of the faces (see Overlaid examples in the next section).

4. Drawing in More Detail

Class Drawable

There is a class Drawable with instances Tgraph, VPatch, Patch. When the token B is in scope standing for a fixed backend then we can assume

    draw   :: Drawable a => a -> Diagram B  -- draws non-join edges
    drawj  :: Drawable a => a -> Diagram B  -- as with draw but also draws dashed join edges
    fillDK :: Drawable a => Colour Double -> Colour Double -> a -> Diagram B -- fills with colours

where fillDK clr1 clr2 will fill darts with colour clr1 and kites with colour clr2 as well as drawing non-join edges.

These are the main drawing tools. However they are actually defined for any suitable backend b so have more general types.

(Update Sept 2024) As of version 1.1 of PenroseKiteDart, these will be

    draw ::   (Drawable a, OKBackend b) =>
              a -> Diagram b
    drawj ::  (Drawable a, OKBackend) b) =>
              a -> Diagram b
    fillDK :: (Drawable a, OKBackend b) =>
              Colour Double -> Colour Double -> a -> Diagram b

where the class OKBackend is a check to ensure a backend is suitable for drawing 2D tilings with or without labels.

In these notes we will generally use the simpler description of types using B for a fixed chosen backend for the sake of clarity.

The drawing tools are each defined via the class function drawWith using Piece drawing functions.

    class Drawable a where
        drawWith :: (Piece -> Diagram B) -> a -> Diagram B
    
    draw = drawWith drawPiece
    drawj = drawWith dashjPiece
    fillDK clr1 clr2 = drawWith (fillPieceDK clr1 clr2)

To design a new drawing function, you only need to implement a function to draw a Piece, (let us call it newPieceDraw)

    newPieceDraw :: Piece -> Diagram B

This can then be elevated to draw any Drawable (including Tgraphs, VPatches, and Patches) by applying the Drawable class function drawWith:

    newDraw :: Drawable a => a -> Diagram B
    newDraw = drawWith newPieceDraw

Class DrawableLabelled

Class DrawableLabelled is defined with instances Tgraph and VPatch, but Patch is not an instance (because this does not retain vertex label information).

    class DrawableLabelled a where
        labelColourSize :: Colour Double -> Measure Double -> (Patch -> Diagram B) -> a -> Diagram B

So labelColourSize c m modifies a Patch drawing function to add labels (of colour c and size measure m). Measure is defined in Diagrams.Prelude with pre-defined measures tiny, verySmall, small, normal, large, veryLarge, huge. For most of our diagrams of Tgraphs, we use red labels and we also find small is a good default size choice, so we define

    labelSize :: DrawableLabelled a => Measure Double -> (Patch -> Diagram B) -> a -> Diagram B
    labelSize = labelColourSize red

    labelled :: DrawableLabelled a => (Patch -> Diagram B) -> a -> Diagram B
    labelled = labelSize small

and then labelled draw, labelled drawj, labelled (fillDK clr1 clr2) can all be used on both Tgraphs and VPatches as well as (for example) labelSize tiny draw, or labelCoulourSize blue normal drawj.

Further drawing functions

There are a few extra drawing functions built on top of the above ones. The function smart is a modifier to add dashed join edges only when they occur on the boundary of a Tgraph

    smart :: (VPatch -> Diagram B) -> Tgraph -> Diagram B

So smart vpdraw g will draw dashed join edges on the boundary of g before applying the drawing function vpdraw to the VPatch for g. For example the following all draw dashed join edges only on the boundary for a Tgraph g

    smart draw g
    smart (labelled draw) g
    smart (labelSize normal draw) g

When using labels, the function rotateBefore allows a Tgraph to be drawn rotated without rotating the labels.

    rotateBefore :: (VPatch -> a) -> Angle Double -> Tgraph -> a
    rotateBefore vpdraw angle = vpdraw . rotate angle . makeVP

So for example,

    rotateBefore (labelled draw) (90@@deg) g

makes sense for a Tgraph g. Of course if there are no labels we can simply use

    rotate (90@@deg) (draw g)

Similarly alignBefore allows a Tgraph to be aligned on the X-axis using a pair of vertex numbers before drawing.

    alignBefore :: (VPatch -> a) -> (Vertex,Vertex) -> Tgraph -> a
    alignBefore vpdraw (a,b) = vpdraw . alignXaxis (a,b) . makeVP

So, for example, if Tgraph g has vertices a and b, both

    alignBefore draw (a,b) g
    alignBefore (labelled draw) (a,b) g

make sense. Note that the following examples are wrong. Even though they type check, they re-orient g without repositioning the boundary joins.

    smart (labelled draw . rotate angle) g      -- WRONG
    smart (labelled draw . alignXaxis (a,b)) g  -- WRONG

Instead use

    smartRotateBefore (labelled draw) angle g
    smartAlignBefore (labelled draw) (a,b) g

where

    smartRotateBefore :: (VPatch -> Diagram B) -> Angle Double -> Tgraph -> Diagram B
    smartAlignBefore  :: (VPatch -> Diagram B) -> (Vertex,Vertex) -> Tgraph -> Diagram B

are defined using

    restrictSmart :: Tgraph -> (VPatch -> Diagram B) -> VPatch -> Diagram B

Here, restrictSmart g vpdraw vp uses the given vp for drawing boundary joins and drawing faces of g (with vpdraw) rather than converting g to a new VPatch. This assumes vp has locations for vertices in g.

Overlaid examples (location map sharing)

The function

    drawForce :: Tgraph -> Diagram B

will (smart) draw a Tgraph g in red overlaid (using <>) on the result of force g as in figure 6. Similarly

    drawPCompose  :: Tgraph -> Diagram B

applied to a Tgraph g will draw the result of a partial composition of g as in figure 7. That is a drawing of compose g but overlaid with a drawing of the remainder faces of g shown in pale green.

Both these functions make use of sharing a vertex location map to get correct alignments of overlaid diagrams. In the case of drawForce g, we know that a VPatch for force g will contain all the vertex locations for g since force only adds to a Tgraph (when it succeeds). So when constructing the diagram for g we can use the VPatch created for force g instead of starting afresh. Similarly for drawPCompose g the VPatch for g contains locations for all the vertices of compose g so compose g is drawn using the the VPatch for g instead of starting afresh.

The location map sharing is done with

    subVP :: VPatch -> [TileFace] -> VPatch

so that subVP vp fcs is a VPatch with the same vertex locations as vp, but replacing the faces of vp with fcs. [Of course, this can go wrong if the new faces have vertices not in the domain of the vertex location map so this needs to be used with care. Any errors would only be discovered when a diagram is created.]

For cases where labels are only going to be drawn for certain faces, we need a version of subVP which also gets rid of vertex locations that are not relevant to the faces. For this situation we have

    restrictVP:: VPatch -> [TileFace] -> VPatch

which filters out un-needed vertex locations from the vertex location map. Unlike subVP, restrictVP checks for missing vertex locations, so restrictVP vp fcs raises an error if a vertex in fcs is missing from the keys of the vertex location map of vp.

5. Forcing in More Detail

The force rules

The rules used by our force algorithm are local and derived from the fact that there are seven possible vertex types as depicted in figure 8.

Figure 8: Seven vertex types
Figure 8: Seven vertex types

Our rules are shown in figure 9 (omitting mirror symmetric versions). In each case the TileFace shown yellow needs to be added in the presence of the other TileFaces shown.

Figure 9: Rules for forcing
Figure 9: Rules for forcing

Main Forcing Operations

To make forcing efficient we convert a Tgraph to a BoundaryState to keep track of boundary information of the Tgraph, and then calculate a ForceState which combines the BoundaryState with a record of awaiting boundary edge updates (an update map). Then each face addition is carried out on a ForceState, converting back when all the face additions are complete. It makes sense to apply force (and related functions) to a Tgraph, a BoundaryState, or a ForceState, so we define a class Forcible with instances Tgraph, BoundaryState, and ForceState.

This allows us to define

    force :: Forcible a => a -> a
    tryForce :: Forcible a => a -> Try a

The first will raise an error if a stuck tiling is encountered. The second uses a Try result which produces a Left string for failures and a Right a for successful result a.

There are several other operations related to forcing including

    stepForce :: Forcible a => Int -> a -> a
    tryStepForce  :: Forcible a => Int -> a -> Try a

    addHalfDart, addHalfKite :: Forcible a => Dedge -> a -> a
    tryAddHalfDart, tryAddHalfKite :: Forcible a => Dedge -> a -> Try a

The first two force (up to) a given number of steps (=face additions) and the other four add a half dart/kite on a given boundary edge.

Update Generators

An update generator is used to calculate which boundary edges can have a certain update. There is an update generator for each force rule, but also a combined (all update) generator. The force operations mentioned above all use the default all update generator (defaultAllUGen) but there are more general (with) versions that can be passed an update generator of choice. For example

    forceWith :: Forcible a => UpdateGenerator -> a -> a
    tryForceWith :: Forcible a => UpdateGenerator -> a -> Try a

In fact we defined

    force = forceWith defaultAllUGen
    tryForce = tryForceWith defaultAllUGen

We can also define

    wholeTiles :: Forcible a => a -> a
    wholeTiles = forceWith wholeTileUpdates

where wholeTileUpdates is an update generator that just finds boundary join edges to complete whole tiles.

In addition to defaultAllUGen there is also allUGenerator which does the same thing apart from how failures are reported. The reason for keeping both is that they were constructed differently and so are useful for testing.

In fact UpdateGenerators are functions that take a BoundaryState and a focus (list of boundary directed edges) to produce an update map. Each Update is calculated as either a SafeUpdate (where two of the new face edges are on the existing boundary and no new vertex is needed) or an UnsafeUpdate (where only one edge of the new face is on the boundary and a new vertex needs to be created for a new face).

    type UpdateGenerator = BoundaryState -> [Dedge] -> Try UpdateMap
    type UpdateMap = Map.Map Dedge Update
    data Update = SafeUpdate TileFace 
                | UnsafeUpdate (Vertex -> TileFace)

Completing (executing) an UnsafeUpdate requires a touching vertex check to ensure that the new vertex does not clash with an existing boundary vertex. Using an existing (touching) vertex would create a crossing boundary so such an update has to be blocked.

Forcible Class Operations

The Forcible class operations are higher order and designed to allow for easy additions of further generic operations. They take care of conversions between Tgraphs, BoundaryStates and ForceStates.

    class Forcible a where
      tryFSOpWith :: UpdateGenerator -> (ForceState -> Try ForceState) -> a -> Try a
      tryChangeBoundaryWith :: UpdateGenerator -> (BoundaryState -> Try BoundaryChange) -> a -> Try a
      tryInitFSWith :: UpdateGenerator -> a -> Try ForceState

For example, given an update generator ugen and any f:: ForceState -> Try ForceState , then f can be generalised to work on any Forcible using tryFSOpWith ugen f. This is used to define both tryForceWith and tryStepForceWith.

We also specialize tryFSOpWith to use the default update generator

    tryFSOp :: Forcible a => (ForceState -> Try ForceState) -> a -> Try a
    tryFSOp = tryFSOpWith defaultAllUGen

Similarly given an update generator ugen and any f:: BoundaryState -> Try BoundaryChange , then f can be generalised to work on any Forcible using tryChangeBoundaryWith ugen f. This is used to define tryAddHalfDart and tryAddHalfKite.

We also specialize tryChangeBoundaryWith to use the default update generator

    tryChangeBoundary :: Forcible a => (BoundaryState -> Try BoundaryChange) -> a -> Try a
    tryChangeBoundary = tryChangeBoundaryWith defaultAllUGen

Note that the type BoundaryChange contains a resulting BoundaryState, the single TileFace that has been added, a list of edges removed from the boundary (of the BoundaryState prior to the face addition), and a list of the (3 or 4) boundary edges affected around the change that require checking or re-checking for updates.

The class function tryInitFSWith will use an update generator to create an initial ForceState for any Forcible. If the Forcible is already a ForceState it will do nothing. Otherwise it will calculate updates for the whole boundary. We also have the special case

    tryInitFS :: Forcible a => a -> Try ForceState
    tryInitFS = tryInitFSWith defaultAllUGen

Efficient chains of forcing operations.

Note that (force . force) does the same as force, but we might want to chain other force related steps in a calculation.

For example, consider the following combination which, after decomposing a Tgraph, forces, then adds a half dart on a given boundary edge (d) and then forces again.

    combo :: Dedge -> Tgraph -> Tgraph
    combo d = force . addHalfDart d . force . decompose

Since decompose:: Tgraph -> Tgraph, the instances of force and addHalfDart d will have type Tgraph -> Tgraph so each of these operations, will begin and end with conversions between Tgraph and ForceState. We would do better to avoid these wasted intermediate conversions working only with ForceStates and keeping only those necessary conversions at the beginning and end of the whole sequence.

This can be done using tryFSOp. To see this, let us first re-express the forcing sequence using the Try monad, so

    force . addHalfDart d . force

becomes

    tryForce <=< tryAddHalfDart d <=< tryForce

Note that (<=<) is the Kliesli arrow which replaces composition for Monads (defined in Control.Monad). (We could also have expressed this right to left sequence with a left to right version tryForce >=> tryAddHalfDart d >=> tryForce). The definition of combo becomes

    combo :: Dedge -> Tgraph -> Tgraph
    combo d = runTry . (tryForce <=< tryAddHalfDart d <=< tryForce) . decompose

This has no performance improvement, but now we can pass the sequence to tryFSOp to remove the unnecessary conversions between steps.

    combo :: Dedge -> Tgraph -> Tgraph
    combo d = runTry . tryFSOp (tryForce <=< tryAddHalfDart d <=< tryForce) . decompose

The sequence actually has type Forcible a => a -> Try a but when passed to tryFSOp it specialises to type ForceState -> Try ForseState. This ensures the sequence works on a ForceState and any conversions are confined to the beginning and end of the sequence, avoiding unnecessary intermediate conversions.

A limitation of forcing

To avoid creating touching vertices (or crossing boundaries) a BoundaryState keeps track of locations of boundary vertices. At around 35,000 face additions in a single force operation the calculated positions of boundary vertices can become too inaccurate to prevent touching vertex problems. In such cases it is better to use

    recalibratingForce :: Forcible a => a -> a
    tryRecalibratingForce :: Forcible a => a -> Try a

These work by recalculating all vertex positions at 20,000 step intervals to get more accurate boundary vertex positions. For example, 6 decompositions of the kingGraph has 2,906 faces. Applying force to this should result in 53,574 faces but will go wrong before it reaches that. This can be fixed by calculating either

    recalibratingForce (decompositions kingGraph !!6)

or using an extra force before the decompositions

    force (decompositions (force kingGraph) !!6)

In the latter case, the final force only needs to add 17,864 faces to the 35,710 produced by decompositions (force kingGraph) !!6.

6. Advanced Operations

Guided comparison of Tgraphs

Asking if two Tgraphs are equivalent (the same apart from choice of vertex numbers) is a an np-complete problem. However, we do have an efficient guided way of comparing Tgraphs. In the module Tgraph.Rellabelling we have

    sameGraph :: (Tgraph,Dedge) -> (Tgraph,Dedge) -> Bool

The expression sameGraph (g1,d1) (g2,d2) asks if g2 can be relabelled to match g1 assuming that the directed edge d2 in g2 is identified with d1 in g1. Hence the comparison is guided by the assumption that d2 corresponds to d1.

It is implemented using

    tryRelabelToMatch :: (Tgraph,Dedge) -> (Tgraph,Dedge) -> Try Tgraph

where tryRelabelToMatch (g1,d1) (g2,d2) will either fail with a Left report if a mismatch is found when relabelling g2 to match g1 or will succeed with Right g3 where g3 is a relabelled version of g2. The successful result g3 will match g1 in a maximal tile-connected collection of faces containing the face with edge d1 and have vertices disjoint from those of g1 elsewhere. The comparison tries to grow a suitable relabelling by comparing faces one at a time starting from the face with edge d1 in g1 and the face with edge d2 in g2. (This relies on the fact that Tgraphs are connected with no crossing boundaries, and hence tile-connected.)

The above function is also used to implement

    tryFullUnion:: (Tgraph,Dedge) -> (Tgraph,Dedge) -> Try Tgraph

which tries to find the union of two Tgraphs guided by a directed edge identification. However, there is an extra complexity arising from the fact that Tgraphs might overlap in more than one tile-connected region. After calculating one overlapping region, the full union uses some geometry (calculating vertex locations) to detect further overlaps.

Finally we have

    commonFaces:: (Tgraph,Dedge) -> (Tgraph,Dedge) -> [TileFace]

which will find common regions of overlapping faces of two Tgraphs guided by a directed edge identification. The resulting common faces will be a sub-collection of faces from the first Tgraph. These are returned as a list as they may not be a connected collection of faces and therefore not necessarily a Tgraph.

Empires and SuperForce

In Empires and SuperForce we discussed forced boundary coverings which were used to implement both a superForce operation

    superForce:: Forcible a => a -> a

and operations to calculate empires.

We will not repeat the descriptions here other than to note that

    forcedBoundaryECovering:: Tgraph -> [Tgraph]

finds boundary edge coverings after forcing a Tgraph. That is, forcedBoundaryECovering g will first force g, then (if it succeeds) finds a collection of (forced) extensions to force g such that

  • each extension has the whole boundary of force g as internal edges.
  • each possible addition to a boundary edge of force g (kite or dart) has been included in the collection.

(possible here means – not leading to a stuck Tgraph when forced.) There is also

    forcedBoundaryVCovering:: Tgraph -> [Tgraph]

which does the same except that the extensions have all boundary vertices internal rather than just the boundary edges.

Combinations

Combinations such as

    compForce:: Tgraph -> Tgraph      -- compose after forcing
    allCompForce:: Tgraph -> [Tgraph] -- iterated (compose after force) while not emptyTgraph
    maxCompForce:: Tgraph -> Tgraph   -- last item in allCompForce (or emptyTgraph)

make use of theorems established in Graphs,Kites and Darts and Theorems. For example

    compForce = uncheckedCompose . force 

which relies on the fact that composition of a forced Tgraph does not need to be checked for connectedness and no crossing boundaries. Similarly, only the initial force is necessary in allCompForce with subsequent iteration of uncheckedCompose because composition of a forced Tgraph is necessarily a forced Tgraph.

Tracked Tgraphs

The type

    data TrackedTgraph = TrackedTgraph
       { tgraph  :: Tgraph
       , tracked :: [[TileFace]] 
       } deriving Show

has proven useful in experimentation as well as in producing artwork with darts and kites. The idea is to keep a record of sub-collections of faces of a Tgraph when doing both force operations and decompositions. A list of the sub-collections forms the tracked list associated with the Tgraph. We make TrackedTgraph an instance of class Forcible by having force operations only affect the Tgraph and not the tracked list. The significant idea is the implementation of

    decomposeTracked :: TrackedTgraph -> TrackedTgraph

Decomposition of a Tgraph involves introducing a new vertex for each long edge and each kite join. These are then used to construct the decomposed faces. For decomposeTracked we do the same for the Tgraph, but when it comes to the tracked collections, we decompose them re-using the same new vertex numbers calculated for the edges in the Tgraph. This keeps a consistent numbering between the Tgraph and tracked faces, so each item in the tracked list remains a sub-collection of faces in the Tgraph.

The function

    drawTrackedTgraph :: [VPatch -> Diagram B] -> TrackedTgraph -> Diagram B

is used to draw a TrackedTgraph. It uses a list of functions to draw VPatches. The first drawing function is applied to a VPatch for any untracked faces. Subsequent functions are applied to VPatches for the tracked list in order. Each diagram is beneath later ones in the list, with the diagram for the untracked faces at the bottom. The VPatches used are all restrictions of a single VPatch for the Tgraph, so will be consistent in vertex locations. When labels are used, there is also a drawTrackedTgraphRotated and drawTrackedTgraphAligned for rotating or aligning the VPatch prior to applying the drawing functions.

Note that the result of calculating empires (see Empires and SuperForce ) is represented as a TrackedTgraph. The result is actually the common faces of a forced boundary covering, but a particular element of the covering (the first one) is chosen as the background Tgraph with the common faces as a tracked sub-collection of faces. Hence we have

    empire1, empire2 :: Tgraph -> TrackedTgraph
    
    drawEmpire :: TrackedTgraph -> Diagram B

Figure 10 was also created using TrackedTgraphs.

Figure 10: Using a TrackedTgraph for drawing
Figure 10: Using a TrackedTgraph for drawing

7. Other Reading

Previous related blogs are:

  • Diagrams for Penrose Tiles – the first blog introduced drawing Pieces and Patches (without using Tgraphs) and provided a version of decomposing for Patches (decompPatch).
  • Graphs, Kites and Darts intoduced Tgraphs. This gave more details of implementation and results of early explorations. (The class Forcible was introduced subsequently).
  • Empires and SuperForce – these new operations were based on observing properties of boundaries of forced Tgraphs.
  • Graphs,Kites and Darts and Theorems established some important results relating force, compose, decompose.

by readerunner at September 30, 2024 11:15 AM

September 27, 2024

Chris Smith 2

Playing With a Game

In a recent comment (that I sadly cannot find any longer) in https://www.reddit.com/r/math/, someone mentioned the following game. There are n players, and they each independently choose a natural number. The player with the lowest unique number wins the game. So if two people choose 1, a third chooses 2, and a fourth chooses 5, then the third player wins: the 1s were not unique, so 2 was the least among the unique numbers chosen. (Presumably, though this wasn’t specified in the comment, if there is no unique number among all players, then no one wins).

I got nerd-sniped, so I’ll share my investigation.

For me, since the solution to the general problem wasn’t obvious, it made sense to specialize. Let’s say there are n players, and just to make the game finite, let’s say that instead of choosing any natural number, you choose a number from 1 to m. Choosing very large numbers is surely a bad strategy anyway, so intuitively I expect any reasonably large choice of m to give very similar results.

n = 2

Let’s start with the case where n = 2. This one turns out to be easy: you should always pick 1, daring your opponent to pick 1, as well. We can induct on m to prove this. If m = 1, then you are required to pick 1 by the rules. But if m > 1, suppose you pick m. Either your opponent also picks m and you both lose, or your opponent picks a number smaller than m and you still lose. Clearly, this is a bad strategy, and you always do at least as well choosing one of the first m - 1 options instead. This reduces the game to one where we already know the best strategy is to pick 1.

That wasn’t very interesting, so let’s try more players.

n = 3, m = 2

Suppose there are three players, each choosing either 1 or 2. It’s impossible for all three players to choose a different number! If you do manage to pick a unique number, then, you will be the only player to do so, so it will always be the least unique number simply because it’s the only one!

If you don’t think your opponents will have figured this out, you might be tempted to pick 2, in hopes that your opponents go for 1 to try to get the least number, and you’ll be the only one choosing 2. But this makes you predictable, so the other players can try to take advantage. But if one of the other players reasons the same way, you both are guaranteed to lose! What we want here is a Nash equilibrium: a strategy for all players such that no single player can do better by deviating from that strategy.

It’s not hard to see that all players should flip a coin, choosing either 1 or 2 with equal probability. There’s a 25% chance each that a player picks the unique number and wins, and there’s a 25% chance that they all choose the same number and all lose. Regrettable, but anything you do to try to avoid that outcome just makes your play more predictable so that the other players could exploit that.

It’s interesting to look at the actual computation. When computing a Nash equilibrium, we generally rely on the indifference principle: a player should always be indifferent between any choice that they make at random, since otherwise, they would take the one with the better outcome and always play that instead.

This is a bit counter-intuitive! Naively, you might think that the optimal strategy is the one that gives the best expected result, but when a Nash equilibrium involves a random choice— known as a mixed strategy — then any single player actually does equally well against other optimal players no matter which mix of those random choices they make! In this game, though, predictability is a weakness. Just as a poker player tries to avoid ‘tells’ that give away the strength of their hand, players in this number-choosing game need to be unpredictable. The reason for playing the Nash equilibrium isn’t that it gives the best expected result against optimal opponents, but rather that it can’t be exploited by an opponent.

Let’s apply this indifference principle. This game is completely symmetric — there’s no order of turns, and all players have the same choices and payoffs available — so an optimal strategy ought to be the same for any player. Then, let’s say p is the probability that any single player will choose 1. Then if you choose 1, you will win with probability (1 — p)², while if you choose 2, you’ll win with probability p². If you set these equal to each other as per the indifference principle, and solve the equation, you get p = 0.5, as we reasoned above.

n = 3, m = 3

Things get more interesting if each player can choose 1, 2, or 3. Now it’s possible for each player to choose uniquely, so it starts to matter which unique number you pick. Let’s say each player chooses 1, 2, and 3 with the probabilities p, q, and r respectively. We can analyze the probability of winning with each choice.

  • If you pick 1, then you always win unless someone else also picks a 1. Your chance of winning, then, is (qr)².
  • If you pick 2, then for you to win, either both other players need to pick 1 (eliminating each other because of uniqueness and leaving you to win by default), or both other players need to pick 3, so that you’ve picked the least number. Your chance of winning is p² + r².
  • If you pick 3, then you need your opponents to pick the same different number: either 1 or 2. Your chance of winning is p² + q².

Setting these equal to each other immediately shows us that since p² + q² = p² + r², we must conclude that q = r. Then p² + q² = (q + r)² = 4q², so p² = 3q² = 3r². Together with p + q + r = 1, we can conclude that p = 2√3 - 3 ≈ 0.464, while q = r = 2 - √3 ≈ 0.268.

This is our first really interesting result. Can we generalize?

n = 3, in general

The reasoning above generalizes well. If there are three players, and you pick a number k, you are betting that either the other two players will pick the same number less than k, or they will each pick numbers greater than k (regardless of whether they are the same one).

I’ll switch notation here for convenience. Let X be a random variable representing a choice by a player from the Nash equilibrium strategy. Then if you choose k, your probability of winning is P(X=1)² + … + P(X=k-1)² + P(X>k)². The indifference principle tells us that this should be equal for any choice of k. Equivalently, for any k from 1 to m - 1, the probability of winning when choosing k is the same as the probability when choosing k + 1. So:

  • P(X=1)² + … + P(X=k-1)² + P(X>k)² = P(X=1)² + … + P(X=k)² + P(X>k+1)²
  • Cancelling the common terms: P(X>k)² = P(X=k)² + P(X>k+1)²
  • Rearranging: P(X=k) = √(P(X≥k+1)² - P(X>k+1)²)

This gives us a recursive formula that we can use (in reverse) to compute P(X=k), if only we knew P(X=m) to get started. If we just pick something arbitrary, though, it turns out that all the results are just multiples of that choice. We can then divide by the sum of them all to normalize the probabilities to sum to 1.

Here I can write some code (in Haskell):

import Probability.Distribution (Distribution, categorical, probabilities)

nashEquilibriumTo :: Integer -> Distribution Double Integer
nashEquilibriumTo m = categorical (zip allPs [1 ..])
where
allPs = go m 1 0 []
go 1 pEqual pGreater ps = (/ (pEqual + pGreater)) <$> (pEqual : ps)
go k pEqual pGreater ps =
let pGreaterEqual = pEqual + pGreater
in go
(k - 1)
(sqrt (pGreaterEqual * pGreaterEqual - pGreater * pGreater))
pGreaterEqual
(pEqual : ps)

main :: IO ()
main = print (probabilities (nashEquilibriumTo 100))

I’ve used a probability library from https://github.com/cdsmith/prob that I wrote with Shae Erisson during a fun hacking session a few years ago. It doesn’t help yet, but we’ll play around with some of its further features below.

Trying a few large values for m confirms my suspicion that any reasonably large choice of m gives effectively the same result.

1 -> 0.4563109873079237
2 -> 0.24809127016999155
3 -> 0.1348844977362459
4 -> 7.333521940168612e-2
5 -> 3.987155303205954e-2
6 -> 2.1677725302500214e-2
7 -> 1.1785941067126387e-2

By inspection, this appears to be a geometric distribution, parameterized by the probability 0.4563109873079237. We can check that the distribution is geometric, which just means that for all k < m - 1, the ratio P(X > k) / P(X k) is the same as P(X > k + 1) / P(Xk + 1). This is the defining property of a geometric distribution, and some simple algebra confirms that it holds in this case.

But what is this bizarre number? A few Google queries gets us to an answer of sorts. A 2002 Ph.D. dissertation by Joseph Myers seems to arrive at the same number in the solution to a question about graph theory, where it’s identified as the real root of the polynomial x³ - 4x² + 6x - 2. We can check that this is right for a geometric distribution. Starting with P(X=k) = √(P(X≥k+1)² -P(X>k+1)²) where k = 1, we get P(X=1) = √(P(X ≥ 2)² -P(X > 2)²). If P(X=1) = p, then P(X ≥ 2) = 1 - p, and P(X > 2) = (1 - p)², so we have p = √((1-p)² - ((1 - p)²)²), which indeed expands to p⁴ - 4p³ + 6p² - 2p = 0, so either p = 0 (which is impossible for a geometric distribution), or p³ - 4p² + 6p - 2 = 0, giving the probability seen above. (How and if this is connected to the graph theory question investigated in that dissertation, though, is certainly beyond my comprehension.)

You may wonder, in these large limiting cases, how often it turns out that no one wins, or that we see wins with each number. Answering questions like this is why I chose to use my probability library. We can first define a function to implement the game’s basic rule:

leastUnique :: (Ord a) => [a] -> Maybe a
leastUnique xs = listToMaybe [x | [x] <- group (sort xs)]

And then we can define the whole game using the strategy above for each player:

gameTo :: Integer -> Distribution Double (Maybe Integer)
gameTo m = do
ns <- replicateM 3 (nashEquilibriumTo m)
return (leastUnique ns)

Then we can update main to tell us the distribution of game outcomes, rather than plays:

main :: IO ()
main = print (probabilities (gameTo 100))

And get these probabilities:

Nothing -> 0.11320677243374572
Just 1 -> 0.40465349320873445
Just 2 -> 0.22000565820506113
Just 3 -> 0.11961465909617276
Just 4 -> 6.503317590749513e-2
Just 5 -> 3.535782320137907e-2
Just 6 -> 1.9223659987298684e-2
Just 7 -> 1.0451692718822408e-2

An 11% probability of no winner for large m is an improvement over the 25% we computed for m = 2. Once again, a least unique number greater than 7 has less than 1% probability, and the probabilities drop even more rapidly from there.

More than three players?

With an arbitrary number of players, the expressions for the probability of winning grow rather more involved, since you must consider the possibility that some other players have chosen numbers greater than yours, while others have chosen smaller numbers that are duplicated, possibly in twos or in threes.

For the four-player case, this isn’t too bad. The three winning possibilities are:

  • All three other players choose the same smaller number. This has probability P(X=1)³ + … + P(X=k-1)³
  • All three other players choose larger numbers, though not necessarily the same one. This has probability P(X k
  • Two of the three other players choose the same smaller number, and the third chooses a larger number. This has probability 3 P(X > k) (P(X=1)² + … + P(X=k-1)²)

You could possibly work out how to compute this one without too much difficulty. The algebra gets harder, though, and I dug deep enough to determine that the Nash equilibrium is no longer a geometric distribution. If you assume the Nash equilibrium is geometric, then numerically, the probability of choosing 1 that gives 1 and 2 equal rewards would need to be about 0.350788, but this choice gives too small a reward for choosing 3 or more, implying they ought to be chosen less often.

For larger n, even stating the equations turns into a nontrivial problem of accurately counting the possible ways to win. I’d certainly be interested if there’s a nice-looking result here, but I do not yet know what it is.

Numerical solutions

We can solve this numerically, though. Using the probability library mentioned above, one can easily compute, for any finite game and any strategy (as a probability distribution of moves) the expected benefit for each choice.

expectedOutcomesTo :: Int -> Int -> Distribution Double Int -> [Double]
expectedOutcomesTo n m dist =
[ probability (== Just i) $ leastUnique . (i :) <$> replicateM (n - 1) dist
| i <- [1 .. m]
]

We can then then iteratively adjust the probability of each choice slightly based on how its expected outcome compares to other expected outcomes in the distribution. It turns out to be good enough to compare with an immediate neighbor. Just so that all of our distributions remain valid, instead of working with the global probabilities P(X=k), we’ll do the computation with conditional probabilities P(X = k | X k), so that any sequence of probabilities is valid, without worrying about whether they sum to 1. Given this list of conditional probabilities, we can produce a probability distribution like this.

distFromConditionalStrategy :: [Double] -> Distribution Double Int
distFromConditionalStrategy = go 1
where
go i [] = pure i
go i (q : qs) = do
choice <- bernoulli q
if choice then pure i else go (i + 1) qs

Then we can optimize numerically, using the difference of each choice’s win probability from its neighbor as a diff to add to the conditional probability of that choice.

refine :: Int -> Int -> [Double] -> Distribution Double Int
refine n iters strategy
| iters == 0 = equilibrium
| otherwise =
let ps = expectedOutcomesTo n m equilibrium
delta = zipWith subtract (drop 1 ps) ps
adjs = zipWith (+) strategy delta
in refine n (iters - 1) adjs
where
m = length strategy + 1
equilibrium = distFromConditionalStrategy strategy

It works well enough to run this for 10,000 iterations at n = 4, m = 10.

main :: IO ()
main = do
let n = 4
m = 10
d = refine n 10000 (replicate (m - 1) 0.3)
print $ probabilities d
print $ expectedOutcomesTo n m d

The resulting probability distribution is, to me, at least, quite surprising! I would have expected that more players would incentivize you to choose a higher number, since the additional players make collisions on low numbers more likely. But it seems the opposite is true. While three players at least occasionally (with 1% or more probability) should choose numbers up to 7, four players should apparently stop at 3.

Nash equilibrium strategy for n = 4, m = 10

Huh. I’m not sure why this is true, but I’ve checked the computation in a few ways, and it seems to be a real phenomenon. Please leave a comment if you have a better intuition for why it ought to be so!

With five players, at least, we see some larger numbers again in the Nash equilibrium, lending support to the idea that there was something unusual going on with the four player case. Here’s the strategy for five players:

Nash equilibrium strategy for n = 5, m = 10

The six player variant retracts the distribution a little, reducing the probabilities of choosing 5 or 6, but then 7 players expands the choices a bit, and it’s starting to become a pattern that even numbers of players lend themselves to a tighter style of play, while odd numbers open up the strategy.

Nash equilibrium strategy for n = 6, m = 10
Nash equilibrium strategy for n = 7, m = 10
Nash equilibrium strategy for n = 8, m = 10

In general, it looks like this is converging to something. The computations are also getting progressively slower, so let’s stop there.

Game variants

There is plenty of room for variation in the game, which would change the analysis. If you’re looking for a variant to explore on your own, in addition to expanding the game to more players, you might try these:

  • What if a tie awards each player an equal fraction of the reward for a full win, instead of nothing at all? (This actually simplifies the analysis a bit!)
  • What if, instead of all wins being equal, we found the least unique number, and paid that player an amount equal to the number itself? Now there’s somewhat less of an incentive for players to choose small numbers, since a larger number gives a large payoff! This gives the problem something like a prisoner’s dilemma flavor, where players could coordinate to make more money, but leave themselves open to being undercut by someone willing to make a small profit by betraying the coordinated strategy.

What other variants might be interesting?

Addendum (Sep 26): Making it faster

As is often the case, the naive code I originally wrote can be significantly improved. In this case, the code was evaluating probabilities by enumerating all the ways players might choose numbers, and then computing the winner for each one. For large values of m and n this is a lot, and it grows exponentially.

There’s a better way. We don’t need to remember each individual choice to determine the outcome of the game in the presence of further choices. Instead, we need only determine which numbers have been chosen once, and which have been chosen more than once.

data GameState = GameState
{ dups :: Set Int,
uniqs :: Set Int
}
deriving (Eq, Ord)

To add a new choice to a GameState requires checking whether it’s one of the existing unique or duplicate choices:

addToState :: Int -> GameState -> GameState
addToState n gs@(GameState dups uniqs)
| Set.member n dups = gs
| Set.member n uniqs = GameState (Set.insert n dups) (Set.delete n uniqs)
| otherwise = GameState dups (Set.insert n uniqs)

We can now directly compute the distribution of GameState corresponding to a set of n players playing moves with a given distribution. The use of simplify from the probability library here is crucial: it combines all the different paths that lead to the same outcome into a single case, avoiding the exponential explosion.

stateDist :: Int -> Distribution Double Int -> Distribution Double GameState
stateDist n moves = go n (pure (GameState mempty mempty))
where
go 0 states = states
go i states = go (i - 1) (simplify $ addToState <$> moves <*> states)

Now it remains to determine whether a certain move can win, given the game state resulting from the remaining moves.

win :: Int -> GameState -> Bool
win n (GameState dups uniqs) =
not (Set.member n dups) && maybe True (> n) (Set.lookupMin uniqs)

Finally, we update the function that computes win probabilities to use this new code.

expectedOutcomesTo :: Int -> Int -> Distribution Double Int -> [Double]
expectedOutcomesTo n m dist = [probability (win i) states | i <- [1 .. m]]
where
states = stateDist (n - 1) dist

The result is that while I previously had to leave the code running overnight to compute the n = 8 case, I can now easily compute cases up to 15 players with enough patience. This would involve computing the winner for about a quadrillion games in the naive code, making it hopeless , but the simplification reduces that to something feasible.

Nash equilibria for 2 through 15 players

It seems that once you leave behind small numbers of players where odd combinatorial things happen, the equilibrium eventually follows a smooth pattern. I suppose with enough players, the probability for every number would peak and then decline, just as we see for 4 and 5 here, as it becomes worthwhile to spread your choices even further to avoid duplicates. That’s a nice confirmation of my intuition.

by Chris Smith at September 27, 2024 07:19 AM

September 25, 2024

Oskar Wickström

How I Built "The Monospace Web"

Recently, I published The Monospace Web, a minimalist design exploration. It all started with this innocent post, yearning for a simpler web. Perhaps too typewriter-nostalgic, but it was an interesting starting point. After some hacking and sharing early screenshots, @noteed asked for grid alignment, and down the rabbit hole I went.

September 25, 2024 10:00 PM

September 24, 2024

Tweag I/O

Python Packaging in the Real World: Biomedical projects vs. PyPI

The Python programming language, and its huge ecosystem (there are more than 500,000 projects hosted on the main Python repository, PyPI), is used both for software engineering and scientific research. Both have similar requirements for reproducibility. But, as we will see, the practices are quite different.

In fact, the Python ecosystem and community is notorious for the countless ways it uses to declare dependencies. As we were developping FawltyDeps1, a tool to ensure that declared dependencies match the actual imports in the code, we had to accommodate many of these ways. This got us thinking: Could FawltyDeps be used to gain insights into how packaging is done across Python ecosystems?

In this blog post, we look at project structures and dependency declarations across Python projects, both from biomedical scientific papers (as an example of scientific usage of Python) as well as from more general and widely used Python packages. We’ll try to answer the following questions:

  • What practices does the community actually follows? And how do they differ between software engineering and scientific research?
  • Could such differences be related to why it’s often hard to reproduce results from scientific notebooks published in the data science community?

Experiment setup

In the following, we discuss the experimental setup — how we decided which data to use, where to get this data from, and what tools we use to analyze it, before we discuss our results in depth.

Data

First, we need to collect the names and source code locations of projects that we want to include in the analysis. Now, where did we find these projects? We selected projects for analysis based on two key areas: impactful real-world applications and broad community adoption.

  1. Biomedical data analysis repositories: biomedical data plays a vital role in healthcare and research. To capture its significance, we focused on packages directly linked to biomedical data, sourced from repositories supported or referenced by scientific biomedical articles. This criterion anchored our experiment in real-world scientific applications.
  2. To analyze software engineering practices, we’ve chosen to use the most popular PyPI packages: acknowledging the importance of widely adopted packages, we included a scan of the most downloaded and frequently used PyPI packages.

Biomedical data

We leverage a recent study by Samuel, S., & Mietchen, D. (2024): Computational reproducibility of Jupyter notebooks from biomedical publications. This study analyzed 2,177 GitHub repositories associated with publications indexed in PubMed Central to assess computational reproducibility. Specifically, we reused the dataset they generated (found here) for our own analyses.

PyPI data

In order to start analyzing actual projects published to PyPI, we still needed to access some basic metadata about these projects: the project’s name, source URL, and any extra metadata which could be useful for further analysis such as project tags.

While this information is available via the PyPI REST API, this API is subject to rate limiting and is not really designed for bulk analyses such as ours. Conveniently, Google maintains a public BigQuery dataset of PyPI download statistics and project metadata which we leveraged instead. As a starting point for our analysis, we produced a CSV with relevant metadata for top packages downloaded in 2023 using a simple SQL query. Since the above-mentioned biomedical database contains 2,177 projects, we conducted a scan of the first 2,000 PyPI packages to create a dataset of comparable size.

Using FawltyDeps to analyze the source code data

Now that we have the source URLs of our projects of interest, we downloaded all sources and ran an analysis script that wraps around FawltyDeps on the packages. For safety, all of this happened in a virtual machine.

Post-processing and filtering of FawltyDeps analysis results

While the data we collected from PyPI was quite clean (modulo broken or inaccessible project URLs), the biomedical dataset contained some projects written in R and some projects written in Python 2.X, which are outside of our scope. To further filter for relevant projects that are written in Python 3.X, we applied the following rules:

  • there should be .py or .ipynb files in the source code directory of the data. If there are only .ipynb files and no imports, then it is most likely an R project and not taken into account.
  • we are also only interested in Python projects that have 3rd-party imports, as these are the project we would expect to declare their dependencies.

After these filtering steps, we have 1,260 biomedical projects and 1,118 PyPI packages to be analyzed.

Results

Now that we had crunched thousands of Python packages, we were curious to see what secrets the data produced by FawltyDeps would reveal!

Dependency declaration patterns

First, we investigated which dependency declaration file choices were made in both samples. The following pie charts show the proportion of projects with and without dependency declaration files, and whether these files actually contain dependency declarations.

distribution deps 1 distribution deps 1 new
Figure 1. Percent of projects with dependency declaration files and actual dependency(ies) declared.

We find that about 60% of biomedical projects have dependency declaration files, while for PyPI packages, that number is almost 100%. That is expected, as the top PyPI projects are written to be reproducible: they are downloaded by a large group of people and if they are not working due to lack of dependency declarations, it would be noticed immediately by the users.

Interestingly, we found that some biomedical projects (6.8%) and PyPI packages (16.0%) have dependency declaration files with no dependencies listed inside them. This might be because they genuinely have no third-party dependencies, but more commonly it is a symptom of either:

  • setup.py files with complex dependency calculations: although FawltyDeps supports parsing simple setup.py files with a single setup()call and no computation involved for setting the install_requires and extras_require arguments, it is currently not able to analyze more complex scenarios.
  • pyproject.toml might be used to configure tools with sections like [tool.black] or [tool.isort], and declaring dependencies (and other project metadata) in the same file is not strictly required.

For the remainder of the analysis, we do not take these cases into account.

We then examined how different package types utilize various dependency declaration methods. The following chart shows the distribution of requirements.txt, pyproject.toml, and setup files across biomedical projects and PyPI packages (note that these three categories are not exclusive):

distribution deps 2
Figure 2. Percent of projects with dependencies declared in `requirements.txt`, `pyproject.toml` and setup files.

For biomedical projects, requirements.txt and setup.py/setup.cfg files are a majority of declaration files. In contrast, PyPI projects show a higher occurrence of pyproject.toml compared to biomedical projects. pyproject.toml is a suggested modern way of declaring dependencies. This result should not come as a surprise: top PyPI projects are actively maintained and are more likely to follow best practices. A requirements.txt file, on the other hand, is easier to add and if you do not need to package your projects it is a simpler option.

Now let’s have a more detailed view in which categories are exclusive:

distribution deps 3
Figure 3. Distribution of mutually exclusive dependency file choices.

For biomedical data there are a lot of projects that have either requirements.txt or setup.py/setup.cfg files (or a combination of both) present. The traditional method of using setup files utilizing setuptools to create Python packages has been around for a while and is still heavily relied upon in the scientific community.

On the PyPI side, no single method for declaring dependencies stood out, as different approaches were used with similar frequency across all projects. However, when it comes to using pyproject.toml, PyPI packages were about five times more likely to adopt this method compared to biomedical projects, suggesting that PyPI package authors tend to favor pyproject.toml significantly more often for dependency management.

Also, almost no top biomedical projects (only 2 out of 1,260) and very few PyPI packages (only 25 out of 1,118) used pyproject.toml and setup files together: it seems that projects don’t often mix the older method - setup files - with the more modern one - pyproject.toml - at the same time.

A different method of visualizing the subset of results pertaining to requirements.txt, pyproject.toml and setup.py/setup.cfg files are Venn diagrams:

distribution deps 4
Figure 4. Venn diagram of projects with dependencies declared with categories including combination of dependency files.

While these diagrams don’t contain new insights, they show clearly how much more common pyproject.toml usage is for PyPI packages.

Source code directories

We next examined where projects store their source code, which we refer to as the “source code directory”. In the following analysis, we defined this directory as the directory that contains the highest number of Python code files and does not have names like “test”, “example”, “sample”, “doc”, or “tutorial”.

code structure
Figure 5. Source code directories choices.

We can make some interesting observations: Over half (53%) of biomedical projects store their main source code in a directory with a name different than the project itself, and source code is not commonly stored in directories named src or src-python (7%). For PyPI projects, the numbers are lower, with 37% storing their main code in a directory that matches the project name. However, naming the source code directory differently from the package name is still fairly common for PyPI projects, appearing in 36% of cases. A somewhat surprising finding: the src layout, recommended by Python packaging user guide, appears in only 14% of cases.

Another noteworthy observation is that 23% of biomedical projects store all their source code in the root directory of the project. In contrast, only 12% of PyPI projects follow this pattern. This difference makes sense, as scientists working on biomedical projects might be less concerned about maintaining a strict code structure compared to developers on PyPI. Additionally, a lot of biomedical projects might be a loose collection of notebooks/scripts not intended to be packaged/importable, and thus will typically not need to add any subdirectories at all. On the other hand, everything from the PyPI data set is an importable package. Even in the “flat” layout (according to discussion), related modules are collected in a subdirectory named after the package.

The top PyPI projects that keep their code in the root directory are often small Python modules or plugins, like “python-json-patch”, “appdirs”, and “python-json-pointer”. These projects usually have all their source code in a single file, so storing it in the root directory makes sense.

Key results

Many people have preconceptions about how a Python project should look, but the reality can be quite different. Our analysis reveals distinct differences between top PyPI projects and biomedical projects:

  • PyPI projects tend to use modern tools like pyproject.toml more frequently, reflecting better overall project structure and dependency management practices.
  • In contrast, biomedical projects display a wide variety of practices; some store code in the root directory and fail to declare dependencies altogether.

This discrepancy is partially explained by the selection criteria: popular PyPI packages, by necessity, must be usable and thus correctly declare their dependencies, while biomedical projects accompanying scientific papers do not face such stringent requirements.

Conclusion

We found that biomedical projects are written with less attention to the coding best practices, which compromises their reproducibility. There are many projects without dependencies declared. The use of pyproject.toml, which is current state-of-the-art way to declare dependencies is less frequently present in biomedical packages. In our opinion, though, it’s essential for any package to adhere to the same high standards of reproducibility as top PyPI packages. This includes implementing robust dependency management practices and embracing modern packaging standards. Enhancing these practices will not only improve reproducibility but also foster greater trust and adoption within the scientific community.

While our initial analysis revealed some interesting insights, we feel that there might be some more interesting treasures to be found within this dataset - you can check yourself in our FawltyDeps-analysis repository! We invite you to join the discussion on FawltyDeps and reproducibility in package management on our Discord channel.

Finally, this experiment also served as a real-world stress test for FawltyDeps itself and identified several edge cases we had not yet accounted for, suggesting avenues of further development for FawltyDeps: One of the main challenges was to parse unconventional require and extra-require sections in setup.py files. This issue has been addressed by the FawltyDeps project, specifically through the improvements made in FawltyDeps PR #440. Furthermore, it was also not trivial to handle projects with multiple packages declared in one. Addressing these issues will be a focus as we continue to refine and improve FawltyDeps.

Stay tuned as we will drill deeper into the data we’ve collected. So far, we’ve reused part of FawltyDeps‘ code for our analysis, but the next step will be to run the full FawltyDeps tool on a large number of packages. Join us as we examine how FawltyDeps performs under rigorous testing and what improvements can be made to enhance its capabilities!


  1. For more insights, refer to our previous talk at PyData Global: Finding undeclared and unused dependencies in your notebooks and projects.

September 24, 2024 12:00 AM

September 16, 2024

Dan Piponi (sigfpe)

What does it take to be a hero? and other questions from statistical mechanics.

1 We only hear about the survivors

In the classic Star Trek episode Errand of Mercy, Spock computes the chance of success:

CAPTAIN JAMES T. KIRK : What would you say the odds are on our getting out of here?

MR. SPOCK : Difficult to be precise, Captain. I should say, approximately 7,824.7 to 1.


And yet they get out of there. Are Spock’s probability computations unreliable? Think of it another way. The Galaxy is a large place. There must be tens of thousands of Spocks, and Grocks, and Plocks out there on various missions. But we won’t hear (or don’t want to hear) about the failures. So they may all be perfectly good at probability theory, but we’re only hearing about the lucky ones. This is an example of survivor bias.


2 Simulation


We can model this. I’ve written a small battle simulator for a super-simple made up role-playing game...


And the rest of this article can be found at github


(Be sure to download the actual PDF if you want to be able to follow links.)

by sigfpe (noreply@blogger.com) at September 16, 2024 04:11 PM

September 09, 2024

Magnus Therning

Followup on secrets in my work notes

I got the following question on my post on how I handle secrets in my work notes:

Sounds like a nice approach for other secrets but how about :dbconnection for Orgmode and sql-connection-alist?

I have to admit I'd never come across the variable sql-connection-alist before. I've never really used sql-mode for more than editing SQL queries and setting up code blocks for running them was one of the first things I used yasnippet for.

I did a little reading and unfortunately it looks like sql-connection-alist can only handle string values. However, there is a variable sql-password-search-wallet-function, with the default value of sql-auth-source-search-wallet, so using auth-source is already supported for the password itself.

There seems to be a lack of good tutorials for setting up sql-mode in a secure way – all articles I found place the password in clear-text in the config – filling that gap would be a nice way to contribute to the Emacs community. I'm sure it'd prompt me to re-evaluate incorporating sql-mode in my workflow.

September 09, 2024 08:36 PM