# A review of Learn Python the Hard Way, 3rd ed

As a break from usual, I thought I would review Zed Shaw’s Learn Python the Hard Way. I’ve had several beginners to programming ask me what they should use to learn and Shaw’s book frequently comes up. I’ve looked over his materials before when they were a free website but I wanted to see what the current published version was like.

## A note from the reviewer

This review will be self-indulgent. I make copious comparisons with Haskell Programming from First Principles.

I write about and primarily use Haskell in my work now. I used Python for about 7 years of my 10 year career so I am comfortable with Python-the-language even if I don’t keep up with the community any longer.

My final conclusions and recommendation are at the bottom of this review!

# Comparing the physical version of Learn Python the Hard Way with the ebook

### Skip this section if you only want to hear about the content.

I bought the physical version from Amazon and the ebook from Zed Shaw directly. Below you’ll see a picture of the physical copy I got.

The margins in the layout are good enough. The main problem with the paper version is that it is a paperback with a typical perfect-bound binding. As a result, the book cannot stay open even on the 88th page depicted in the photo above. Page 88 of the print book was not at all the same content as the ebook. Page 88 in the print book was halfway through Exercise 25, titled “Even More Practice.” Page 88 in the ebook was in Exercise 17, titled “More Files.” Exercise 25 in the book was located at page 114. The ebook appears to use a slightly less compact layout than the print book, so the page difference will not necessarily be a fixed constant.

The content itself seemed identical, but there were some formatting differences. Here’s an example using the common student questions from Exercise 25:

In the physical version, typewriter text is monospaced, bold, and slightly larger than the bolded text surrounding it.

In the digital version, it’s monospaced but seemingly not bold. It’s also the same size as the surrounding text. The result is that the typewriter text is more visually apparent in the ebook version.

The physical version’s table of contents is conventional and clear with a clean layout. The structuring in terms of chapter/exercise, section, and then sub-section makes more sense than what I’ve seen in some tech books. Often in Manning books you’ll see “units” encompassing multiple chapters that serve no useful purpose. The physical version of LPTHW transitions to backmatter after the 52nd exercise and some comments on learning.

Unfortunately, the formatting of the table of contents is broken in the ebook version. There are 15 exercises in the appendix of the physical version, enumerated as 1 through 15. The ebook version counts up to 55.17 for the command-line content. The ebook stops abruptly at 55.17 and there’s nothing after that. The physical version includes an index at the end which the ebook does not have. I think this is because Zed’s indexing was done manually by humans but not incorporated into the source text he renders the ebook from. For the Haskell Book Julie and I have always indexed things in the original LaTeX source. As a result, the eventual print version of Haskell Programming from First Principles should have the same index as the final version of the ebook.

# Content review starts here

## Preface

Zed is right here about learning to code and the importance of instruction. I haven’t said much about it publicly before this, but his work on Learn Python the Hard Way heavily influenced my earliest approaches to pedagogy and how I thought about the Haskell Book. Much changed as Julie and I learned more about what worked with reviewers, but the basic principle of guiding learners through cumulative exercises is extremely important.

## The Hard Way Is Easier

Zed starts by explaining and justifying the process of learning code by typing code in, making it work, and learning to pay attention to detail. It’s a minimalistic approach to teaching a programming language, but for learning to code it would be difficult for me to improve on this. The Haskell Book took a lot more pages (over 1,000 to LPTHW’s ~300) because we were trying to convey concepts that would have lasting value in addition to teaching people to code in Haskell. Readers should take this section seriously and strive to follow Zed’s directions here.

## The Setup

Zed goes into more excruciating detail on getting things setup than even most books written for children. That’s not say it’s necessarily easy, but he’s working harder to smooth the way than most. In older versions of the book Zed would recommend gedit. In the print version he recommends TextWrangler for Mac users, Notepad++ for Windows users. The ebook recommends Atom for users of all operating systems which I think is sound advice even if I am an inveterate Emacs user.

For a sense of how much detail Zed goes into, he tells you multiple possible names for your terminal program, to add it to your dock, run it, and not to expect it to look like much.

This section is followed by how to find answers on the internet using Google, which most working programmers understand to be 80% of what they’re paid for. He even includes a screenshot of his example search and the results.

## Warnings for Beginners

Zed warns against some common but faulty programmer advice. I largely agree with him but I think his reasons for doing this bear some explaining.

If a programmer tells you to use vim or emacs, just say ”no.” These editors are for when you are a better programmer.

Practice some charity and ignore the, “for when you are a better programmer” bit. The are two important reasons to take this and the rest of his advice seriously.

1. You cannot conquer all things at once. My coauthor Julie essentially learned Linux, the terminal, Git, LaTeX, and Haskell all at once when she first joined me on Haskell Book. I do not recommend doing this. Load-balancing your stress levels is important when you are a beginner.

2. Programmers are more frequently pathologically impulsive and self-centered than I have seen in any other profession. They are not thinking of your needs as a beginner if they tell you to learn Python 3 instead of Python 2. They’re just cheering for red team vs. blue team or whatever other interest they might have. This is not to say I think Zed’s necessarily right to teach Python 3 instead of 2. (I don’t care) My point is that ignoring what programmers tell you is often sound advice.

## Exercise 1, A Good First Program

I do not plan to review every exercise. I’d like to go to bed at a reasonable hour tonight and I plan to allot myself only one evening to write this review. If only because I owe my dogs Papuchon and Jack some couch/TV time.

This exercise opens with a warning not to skip the front matter and methodological commentary. I can tell he has paid some attention to how people use the book based on this.

The first exercise code opens with a series of print statements. The lines are enumerated in the code block formatting of both versions of the book. The ebook has syntax highlighting, the print version does not. I can’t blame Zed for making the print version monochrome, I’ve priced what it costs to print a color version of the Haskell Book and it was horrifying.

The ebook version only shows a single screenshot of the code in the Atom text editor. The print version shows (monochrome, natch) pictures of the code in TextWrangler on Mac OS X and Notepad++ on Windows.

After the picture(s) of the code in a text editor, both versions of the book show you what running the program should print in a Mac or Windows terminal. These images seemed identical in both versions of the book. Main thing I noticed is that Zed needs to fix his terminal font and anti-aliasing, but I am petty and finicky about type.

Anticipating a common typographical error in the code, Zed points out where the error might’ve happened and what the error would look like. He also anticipates and informs the reader on how to correct a potential problem with ASCII encodings.

Exercise 1 is bookended by study drills and common questions asked by students. I was able to two of the three drills in Zed’s instructions. I’m not sure what Zed was asking for with the first study drill, which is a little worrying as beginners will be using this. I will assume it’s something obvious that I missed.

The common student questions occur at the end of the exercises throughout the book. They are intended to catch failure modes. Zed’s approach here is more modular than the Haskell Book. I think this works because the individual exercises are brief and typically last a handful of pages. In HPFFP we treated it more like a linear stream of consciousness and address anticipated problems in media res.

## Exercise 2-5

Zed goes over basic syntactic elements like comments as well as expanding what the learner can do semantically by covering basic arithmetic operations and variables. The progression here seems more focused on minimizing the novelty of what is introduced syntactically rather than in what is introduced semantically. This is an important pedagogical distinction in the approaches taken by Zed’s book and by ours. We ordered the book based on conceptual dependence and difficulty, not on syntactic elements. Syntax didn’t count for nothing, but we believed it was the less difficult category than semantics. Our experience bore this out but I don’t think this invalidates Zed’s method. To give you an idea of what I mean, here’s a snippet of progressions of the code samples:

# Ex1
print "Hello World!"

Side note: In the ebook, the source code has unicode quotation marks. This means if the reader attempts to copy-pasta the code from the ebook, it’ll break. I’m not certain if it was intentional or if it’s like our case where we intentionally don’t fix things that would make copying and pasting easier. The potential problem with LPTHW here is that someone familiar with unicode might believe they’re actually meant to use the fancy quotes and get stuck. Zed doesn’t address it in his student questions section that I could find.

print "I could have code like this." # and the comment after is ignored
# Ex3
print "Hens", 25 + 30 / 6
# Ex4
average_passengers_per_car = passengers / cars_driven

print "There are", cars, "cars available."
# Ex5
my_eyes = ’Blue’
my_hair = ’Brown’

print "He’s got %s eyes and %s hair." % (my_eyes, my_hair)

This should illuminate somewhat how Zed is adding to the syntactic elements demonstrated in each example as the reader progresses. The common student questions continue to be a strong point of this book in the potential problems they address.

## Exercises 6-12

I will get briefer here as Zed’s approach seems consistent and I mostly just want to touch on what the book covers.

Zed covers printing in more detail, escape sequences, string concatenation, and requesting manual user (terminal) input.

## Exercises 13-17

These exercises cover getting user input from the arguments passed to the python invocation at the command-line, combining this with input prompts and reading and writing text files. Getting the length of a string is demonstrated. The code written is still in a scripty top-level style.

## Exercise 18

This is where defining functions begins. Zed doesn’t stage out increasing complexity of function definition. Instead, the first function the reader sees will be one that has a gather parameter like so:

# the use of * with the args parameter is called a gather parameter
def print_two(*args):
arg1, arg2 = args
print ”arg1: %r, arg2: %r” % (arg1, arg2)

I did a search through the ebook PDF with Skim and as far as I can tell Zed never mentions what this is called so that readers could learn more about what it is or what it does. Zed could’ve showed the user how you can define a parameter-less function that can be invoked multiple times to save on repetition, but chose not to. Odder still, the gather parameter example is subsumed by a simple two parameter function and the first is called out as useless in the context of the example.

## Exercises 19-23

Zed begins by demonstrating the definition of variables along with passing them to functions as arguments. Exercise 18 only demonstrated passing string to functions as arguments. The usual carefulness with progression resumes here. This is followed by using files with functions, functions that return a result, a vocabulary overview, and an exercise in reading code.

Exercise 23 seems careless. The exercise suggests reading whatever Python code you can find on Github, Bitbucket, or Gitorious. There’s no real attempt to point people towards things they could understand at that point in the book. I suspect most readers don’t get very far with this.

## Exercises 24-26

This sub-sequence begins with practice in writing code from the book which synthesizes the elements you’ve seen so far. The study drills ask you to describe the elements of the code in a manner not dissimilar from the “parts of speech” you might’ve done in a language lesson. The help function in the REPL is touched upon.

This sub-sequence ends with a quiz where the objective is to fix broken code. I think it would have been better had this exercise been littered throughout the book so that the readers would have more practice doing so. Approaching this decision charitably, it could be said the readers had enough mistakes of their own to fix, but we chose to have many more exercises in our book.

## Exercises 27-31

Boolean logic, truth tables, boolean operators, expressions using boolean operators, and equality. This includes expressions like:

”test” == ”test”
”test” == 1

Python isn’t typed so Zed doesn’t seem to comment upon and equality comparison of values unequal types.

Followed by if-else statements and guarding blocks of code with if-statements. The progression is a cumulative synthesis like before.

## Exercises 32-34

Loops and lists begin here and is the title of the 32nd exercise. Appending onto lists, while loops, and indexing into lists are also covered.

## Exercise 35

This sub-sequence opens with branches within functions. What branch refers to here is the multiple “branches” of code which may or may not execute based on an if statement. The first example combines parameter-less functions, if-else statements, variables, user input, converting an integer from a string, printing, aborting the program, functions calling other functions, an infinite while loop, and having an initial function to kick off a script.

The author doesn’t wrap the start function in the usual if __name__ == "__main__" pablum you see in most Python scripts. I suspect he’s bucking it on purpose since these programs are not intended to be imported by other Python programs.

## Exercises 36-38

Debugging (the basic process, not a tool), a review of symbols and syntax, reading code, popping and appending to lists, getting the length of a list, splitting strings based on a character divider, and concatenating a list of lists are demonstrated.

## Exercise 39

The construction and basic manipulation of Python dictionaries is demonstrated here. The style is imperative and evocative of how the code’s been written through the book so far. There has been no lurch into a functional style yet.

## Exercises 40-42

Modules, classes, and objects begin here. Zed touches on Python being referred to as an object-oriented programming language. This is also where import is principally demonstrated.

My problem is that Object-Oriented Program- ming (OOP) is just plain weird.

The above quote demonstrates the effort Zed put in to explaining OOP.

Treating modules like dictionaries, invoking functions within modules like methods, accessing top-level variables in a module like a property, and using classes in all these same ways are covered.

Object oriented terms qua Python are briefly explained. Basic subtyping with a typical Animal/Species hierarchy is demonstrated.

## Exercises 43-44

This sub-sequence opens with basic object-oriented analysis and design. This is where things get much wordier than they had been up to this point. The objective is to write a simple game engine. The wordiness wouldn’t be unusual in some other books, but there’s a lot of upfront conceptual mapping and the basic approach isn’t demonstrated or justified with any smaller examples. This would be less jarring if it occurred in almost any other book.

Eventually Zed has the reader write a bunch of stubbed out classes and empty methods to plan out the API. A bit like…defining types. Anyway, they look like this:

class Scene(object):
def enter(self):
pass

class Engine(object):
def __init__(self, scene_map):
pass
def play(self):
pass

There’s some commentary on top-down vs. bottom-up design. The mapped out API is correctly described as top-down. This is followed by a listing of all the code that fills in the empty methods and classes for the game engine project. The usual “what you should see” sections and study drills follow. The study drill suggests changing the game, asks you fix a bug he left in the code, asks you to explain a piece of code, adding cheat codes, adding a small combat system, and mentions that the game engine is an example a finite state machine. He suggests reading about finite state machines on the web even if it might not make sense. It’s a little amusing to see these small gestures tossed out when he made little to no effort to point readers to further resources or examples earlier in the book. I suspect this time was different because some of Zed Shaw’s open source work entailed extensive use of finite state machines and he has an affectation for them.

Later, inheritance versus composition are covered. Composition here is simple methods-invoking-other-methods. He strongly recommends against using multiple inheritance. Nothing too objectionable here.

## Exercise 45

The reader is asked to make their own game like the space game that was just demonstrated. There’s a lot of commentary on code style and syntactic elements. There’s no attempt at ameliorating the blank-slate problem for beginners making a project from scratch.

## Exercises 46-49

Project structure, automated testing with assertions, nosetests, exceptions (very briefly), basic text processing with a lexicon, and basic parsing are covered.

Note that the first time exceptions were called out by name was in the overview of symbols but he’s been using try off and on throughout the book.

## Exercises 50-51

Making a basic web application with web.py (a locked version named lpthw.web), getting input from a web browser (HTML forms), HTML templates, and automated tests for forms are demonstrated. As you might imagine, the explanations of what makes a web app tick are briefly but my coauthor and I are no less guilty of this. It’s a huge topic.

## Exercise 52

The task in this exercise is to refactor the game from exercise 43 into a web application. This covers the basics of refactoring code and web sessions. The reader is expected to do most of the work this time, including figuring out how to make user authentication work.

This one seems like a leap based on how much handholding there had been in the book so far. I felt uncomfortable with the final project in our book because it expects the reader to learn TCP sockets on their own, but I think the lacuna was not so bad there.

## Zed’s Commentary

The book includes two bits of commentary separating the 52nd exercise and the content that goes over the command line. One is called “Next Steps” has a couple subsections. The first subsection of “Next Steps” is an overview of how to go deeper with Python, particularly in specific domains like data analysis. I believe this is partly because Zed is trying to empower people whose principal “job” isn’t software itself but which might by augmented by knowing how to code. The second subsection is titled, “How to learn any programming language.” The advice he gives here is sound and I recommend following it if you are learning Haskell with our book or not.

## Appendix A: Command Line Crash Course

These are the exercises sectioned under “55”. This is the content I noted had a very different presentation in the table of contents of the print and ebook variants of LPTHW. The sequence introduces individual command line applications and builds on them iteratively. Examples are given for Linux/OS X and Windows in each exercise. Learning basic file and directory manipulation is the priority of this appendix.

## A note on Python 2 vs. 3

I’ve reviewed the third edition of this book which uses Python 2 throughout. The next, unreleased fourth edition of the book will be using Python 3. It shouldn’t be an impediment for a beginner to learn Python 2 using this book and then move on to using Python 3 afterward, but you should be aware of this.

At present if you go to the purchase page for the ebook on Zed Shaw’s website, it’ll say

Learn Python The Hard Way, 3rd/4th Edition For 29.99 I believe this means purchasers of the ebook will get the eventual fourth edition when it is released even if they’re just buying the third for now. The downloads in my account for Learn Python the Hard Way included videos and the PDF for the third edition only. If you are uncertain and care a great deal about this, please ask Zed himself to confirm. # My conclusions I think this book is positively influenced by Zed’s background as a painter and musician. I believe he’s studied education as a domain as well and it shows in his work. Overall, I recommend this book as an introduction to learning to code and to Python specifically for new or early programmers. Evaluating the book on what I believe to be Zed’s own goals, the main flaws are in the sudden leaps from meticulously explained code examples to quizzes that expect you to do independent research and implementation. There wasn’t much in the way of “intermediate” tasks in the code drills. There’s some half-hearted name-dropping, but there’s not much guidance for readers who want to learn more than what’s covered in the book. To draw a contrast, we name things for what they are in the Haskell Book and footnote many references in addition to recommending further reading at the end of each chapter. Shifting to my own priorities, I’d say my main dissatisfaction with this book is that I wished there was a “follow-on” book which used a lot of the same methods for teaching people the semantics of Python. Zed has a “More Python” book in the works but I don’t know anything about it. The approach in Learn Python the Hard Way is very mechanical but it’s very monkey-see monkey-do. I realize this would’ve exploded the length of the book had it all been in one text. I wouldn’t wish the authoring of a 1,000+ page technical tome on any but my worst enemies. Of the formats of the book available, I recommend getting the ebook directly from Zed for the ergonomics of working side by side with the book, your text editor, and a terminal. This is also how we recommend working through the Haskell Book, but we’re preparing a print version anyway. I only glanced at the videos that came with my purchase of the ebook. They seemed like a fairly straight forward walkthrough of the ebook. I don’t rate videos very highly in an educational context except to demonstrate basic things like workflow, but your mileage may vary. I did not review the Kindle version of this book! If I had to guess, it was prepared from the print version by the print version’s publisher. As a result, I would not expect it to have the same content as the ebook directly from Zed Shaw. Further, I mostly wrote this review while reading the ebook because it was faster for me. There may be differences between the print and ebook versions I failed to note! I believe Zed continues to update the electronic version. Click here to get the ebook version of Learn Python the Hard Way from Zed. (No affiliate link) I know this site is a bit of a disaster zone, but if you like my writing or think you could learn something useful from me, please take a look at the Haskell book I've been writing. There's a free sample available too! ## March 24, 2017 ### Philip Wadler # Explained Visually As the creators put it Explained Visually (EV) is an experiment in making hard ideas intuitive inspired the work of Bret Victor's Explorable Explanations. Sign up to hear about the latest. I've found their explanations of Markov Chains and Eigenvectors and Eigenvalues both cool and clear. ## March 23, 2017 ### Edwin Brady # Type Driven Development with Idris Type Driven Development with Idris, published by Manning, is now in print. There’s a couple of free sample chapters, and an excerpt from Chapter 3 (including a discount code for 37% off the book) on interactive development in Atom. ## March 21, 2017 ### Edward Z. Yang # Proposal: Suggest explicit type application for Foldable length and friends tl;dr If you use a Foldable function like length or null, where instance selection is solely determined by the input argument, you should make your code more robust by introducing an explicit type application specifying which instance you want. This isn't necessary for a function like fold, where the return type can cross-check if you've gotten it right or not. If you don't provide this type application, GHC should give a warning suggesting you annotate it explicitly, in much the same way it suggests adding explicit type signatures to top-level functions. Recently, there has been some dust kicked up about Foldable instances causing "bad" code to compile. The prototypical example is this: you've written length (f x), where f is a function that returns a list [Int]. At some future point in time, a colleague refactors f to return (Warnings, [Int]). After the refactoring, will length (f x) continue to type check? Yes: length (f x) will always return 1, no matter how long the inner list is, because it is using the Foldable instance for (,) Warnings. The solution proposed in the mailing list was to remove Foldable for Either, a cure which is, quite arguably, worse than the disease. But I think there is definitely merit to the complaint that the Foldable instances for tuples and Either enable you to write code that typechecks, but is totally wrong. Richard Eisenberg described this problem as the tension between the goals of "if it compiles, it works!" (Haskell must exclude programs which don't work) and general, polymorphic code, which should be applicable in as many situations as possible. I think there is some more nuance here, however. Why is it that Functor polymorphic code never causes problems for being "too general", but Foldable does? We can construct an analogous situation: I've written fmap (+2) (f x), where f once again returns [Int]. When my colleague refactors f to return (Warnings, [Int]), fmap now makes use of the Functor instance (,) Warnings, but the code fails to compile anyway, because the type of (+1) doesn't line up with [Int]. Yes, we can still construct situations with fmap where code continues to work after a type change, but these cases are far more rare. There is a clear difference between these two programs: the fmap program is redundant, in the sense that the type is constrained by both the input container, the function mapping over it, and the context which uses the result. Just as with error-correcting codes, redundancy allows us to detect when an error has occurred; when you reduce redundancy, errors become harder to detect. With length, the only constraint on the selected instance is the input argument; if you get it wrong, we have no way to tell. Thus, the right thing to do is reintroduce redundancy where it is needed. Functions like fold and toList don't need extra redundancy, because they are cross-checked by the use of their return arguments. But functions like length and null (and arguably maximum, which only weakly constrains its argument to have an Ord instance) don't have any redundancy: we should introduce redundancy in these places! Fortunately, with GHC 8.0 provides a very easy way of introducing this redundancy: an explicit type application. (This was also independently suggested by Faucelme.) In this regime, rather than write length (f x), you write length @[] (f x), saying that you wanted length for lists. If you wanted length for maps, you write length @(Map _) (f x). Now, if someone changes the type of f, you will get a type error since the explicit type application no longer matches. Now, you can write this with your FTP code today. So there is just one more small change I propose we add to GHC: let users specify the type parameter of a function as "suggested to be explicit". At the call-site, if this function is used without giving a type application, GHC will emit a warning (which can be disabled with the usual mechanism) saying, "Hey, I'm using the function at this type, maybe you should add a type application." If you really want to suppress the warning, you could just type apply a type hole, e.g., length @_ (f x). As a minor refinement, you could also specify a "default" type argument, so that if we infer this argument, no warning gets emitted (this would let you use the list functions on lists without needing to explicitly specify type arguments). That's it! No BC-breaking flag days, no poisoning functions, no getting rid of FTP, no dropping instances: just a new pragma, and an opt-in warning that will let people who want to avoid these bugs. It won't solve all Foldable bugs, but it should squash the most flagrant ones. What do people think? ## March 20, 2017 ### Functional Jobs # Remote Haskell backend engineer at Health eFilings (Full-time) Our backend engineering team manages the ingestion and normalization of data sets, from data extraction through to product delivery. We want to work smarter instead of harder, and create domain specific languages, meta-programming etc. where possible. Our current code base is written in Ruby and Coffee Script, but some new modules are being written in Haskell. You will be on the front lines of creating a Haskell-based infrastructure that is maintainable and can scale to support our needs as we grow. WHAT WE WILL EXPECT FROM YOU You will have ownership of an entire module, including responsibility for: • Creating new features in a clean and maintainable way • Re-factoring existing code to ensure that we stay agile • Reviewing teammates’ code and providing feedback • Keeping yourself focused and your projects on track • An “I can run through walls” mentality to ensure that goals are met WHAT YOU CAN EXPECT FROM US • Autonomy to solve problems in the way you best see fit • A manager who is accountable for ensuring you meet your professional goals • A team who helps each other and always strives to improve • The time to focus on creating the right solution, instead of the easiest one REQUIREMENTS • Professional experience as a software engineer • Experience with Haskell • A desire for continual self-improvement • An understanding of best practices regarding maintainability and scalability • Must have US work authorization and be located in the US (we cannot sponsor visas at this time) • There are no formal education requirements for this position BONUS POINTS • Experience with data scraping and parsing LOCATION This is expected to be a remote position, although our Madison, Wisconsin office is also available as a work location. Get information on how to apply for this position. ### Vincent Hanquez # Efficient CStruct Dealing with complex C-structure-like data in haskell often force the developer to have to deal with C files, and create a system that is usually a tradeoff between efficiency, modularity and safety. The Foreign class doesn’t quite cut it, external program needs C files, Binary parsers (binary, cereal) are not efficient or modular. Let’s see if we can do better using the advanced haskell type system. First let define a common like C structure that we will re-use to compare different methods: struct example { uint64_t a; uint32_t b; union { uint64_t addr64; struct { uint32_t hi; uint32_t low; } addr32; } addr; uint8_t data[16]; }; ## Dealing with C structure The offset of each field is defined as a displacement (in bytes) from the beginning of the structure to point at the beginning of the field memory representation. For example here we have: • a is at offset 0 (relative to the beginning of the structure) • b is at offset 8 • addr.addr64 is at offset 12 • addr.addr32.hi is at offset 12 • addr.addr32.low is at offset 16 • data is at offset 20 The size of primitives is simply the number of bits composing the type; so a uint64_t, composed of 64 bits is 8 bytes. Union is a special construction where the different option in the union are overlayed on each other and the biggest element define its size. The size of a struct is defined recursively as the sum of all its component. • field pointed by a is size 8 • field pointed by b is of size 4 • field pointed by addr is size 8 • field pointed by data is size 16 • the whole structure is size 36 ## What’s wrong with Foreign Here’s the usual Foreign definition for something equivalent: data Example = Example { a :: {-# UNPACK #-} !Word64 , b :: {-# UNPACK #-} !Word32 , u :: {-# UNPACK #-} !Word64 , data :: !ByteString } peekBs p ofs len = ... instance Foreign Example sizeof _ = 36 alignment _ = 8 peek p = Example <> peek (castPtr p)
<*> peek (castPtr (p plusPtr 8))
<*> peek (castPtr (p plusPtr 12))
<*> peekBs p 20 16
poke p _ = ...

Given a (valid) Ptr, we can now get element in this by creating a new Example type by calling peek. This will materalize a new haskell data structure in the haskell GC-managed memory which have a copy of all the fields from the Ptr.

In some cases, copying all this values on the haskell heap is wasteful and not efficient. A simple of this use case, would be to quickly iterate over a block of memory to check for a few fields values repeatedly in structure.

The Foreign type classes and co is only about moving data between the foreign boundary, it’s not really about efficiency dealing with this foreign boundary.

In short:

• Materialize values on the haskell side
• Not modular: whole type peeking/poking or nothing.
• Size and alignment defined on values, not type.
• No distinction between constant size types and variable size types.
• Often passing undefined :: SomeType to sizeof and alignment.
• Usually manually created, not typo-proof.

There’s many binary parser on the market: binary , cereal, packer, store.

Most of a binary parser job is taking a stream of bytes and efficiently turning those bytes into haskell value. One added job is dealing with chunking, since you may not have all the memory for parsing, you need to deal with values that are cut between memory boundaries and have to deal with resumption.

Here’s an example of a binary parser for example:

getExample :: Get Example
getExample =
36

## Zooming with accessors

One first thing we need to have an accessor types to represent how we represent part of data structures. For example in C, given the struct example, we want to be able to do:

.a
.data[3]
.data

in a case of a structure or a union, we use the field name to dereference the structure, but in case of an array, we use an integral index. This is really straighforward:

data Access = Field Symbol | Index Nat

A List of Access would represent the zooming inside the data structures. The previous example can be written in haskell with:

'[ 'Field "a" ]
'[ 'Field "data", 'Index 3 ]
'[ 'Field "data" ]

## Calculating Offset

Offset of fields is the next important step to have full capabilities in this system

We define a type family for this that given an Element and [Access] would get back an offset in Nat. Note that due to the recurvise approach we add the offset ofs to start from.

type family Offset (ofs :: Nat) (accessors :: [Access]) (t :: Element) where

When the list of accessors is empty, we have reach the element, so we can just return the offset we have calculated

Offset ofs '[]          t                = ofs

When we have a non empty list we call to each respective data structure with:

• the current offset
• the name of field searched or the index searched
• either the dictionary of symbol to element (represented by '[(Symbol, Element)]) or the array size and inner Element
Offset ofs ('Field f:fs) ('FStruct dict) = StructOffset ofs f fs dict
Offset ofs ('Field f:fs) ('FUnion dict)  = UnionOffset ofs f fs dict
Offset ofs ('Index i:fs) ('FArray n t)   = ArrayOffset ofs i fs n t

Being a type enforced definition, it also mean that with this you can mix up trying to Index into a Structure, or trying to dereference a Field into an Array. the type system will (too) emphatically complain.

Both the Structure and Union will recursely match in the dictionary of symbol to find a matching field. If we reach the empty list, we haven’t found the right field and the developper is notified with a friendly TypeError, at compilation time, that the field is not present in the structure.

Each time an field is skipped in the structure the size of the element being skipped, is added to the current offset.

type family StructOffset (ofs :: Nat)
(field :: Symbol)
(rs :: [Access])
(dict :: [(Symbol, Element)]) where
StructOffset ofs field rs '[]                =
TypeError ('Text "offset: field "
':<>: 'ShowType field
StructOffset ofs field rs ('(field, t) ': _) = Offset ofs rs t
StructOffset ofs field rs ('(_    , v) ': r) = StructOffset (ofs + Size v) field rs r

type family UnionOffset (ofs :: Nat)
(field :: Symbo)
(rs :: [Access])
(dict :: [(Symbol, Element)]) where
UnionOffset ofs field rs '[]                 =
TypeError ('Text "offset: field "
':<>: 'ShowType field
UnionOffset ofs field rs ('(field, t) ': _)  = Offset ofs rs t
UnionOffset ofs field rs (_            : r)  = UnionOffset ofs field rs r

In the case of the array, we can just make sure, at compilation time, that the user is accessing a field that is within bounds, otherwise we also notify the developer with a friendly TypeError.

type family ArrayOffset (ofs :: Nat)
(idx :: Nat)
(rs :: [Access])
(n :: Nat)
(t :: Element) where
ArrayOffset ofs idx rs n t =
If (n <=? idx)
(TypeError ('Text "out of bounds : index is "
':<>: 'ShowType idx
':<>: 'Text " but array of size "
':<>: 'ShowType n))
(Offset (ofs + (idx * Size t)) rs t)

A simple example of how the machinery works:

> Offset ofs '[ 'Field "data", 'Index 2 ]) Example
> StructOffset ofs "data" ['Index 2 ]
'[ '("a", ..), '("b", ..) , '("addr", ..), '( "data", ..) ]
> StructOffset (ofs + Size 'Word64) "data" ['Index 2]
'[ '("b", ..) , '("addr", ..), '( "data", ..) ]
> StructOffset (ofs + 8 + Size 'Word32) "data" ['Index 2]
'[ '("addr", ..), '( "data", ..) ]
> StructOffset (ofs + 12 + Size ('Union ..)) "data" ['Index 2 ]
'[ '( "data", 'Farray 16 'FWord8) ]
> Offset (ofs + 20) ['Index 3] ('FArray 16 'FWord8)
> Offset (ofs + 20 + 3 * Size 'FWord8) [] 'FWord8
> ofs + 23

Now we can just calculate Offset of accessors in structure, we just need something to use it.

getOffset :: forall el fields . (KnownNat (Offset 0 fields el)) => Integer
getOffset = natVal (Proxy :: Proxy (Offset 0 fields el))

Again same magic as getSize, and we also define a constant by construction. We also start counting the offset at 0 since we want to calculate absolute displacement, but we could start at some other points depending on need, and prevent a runtime addition if we were to know the starting offset at compilation for example.

> putStrLn $show (getOffset @Example @('[]) 0 > putStrLn$ show (getOffset @Example @('[ 'Field "a"]))
0
> putStrLn $show (getOffset @Example @('[ 'Field "b"])) 8 > putStrLn$ show (getOffset @Example @('[ 'Field "addr, 'Index "addr32", 'Field "lo" "]))
16
> putStrLn show (getOffset @Example @('[ 'Field "data, 'Index 3 ])) 23 ## Conclusion One nice aspect on this is that you can efficiently nest structure, and you can without a problem re-use the same field names for structure. You can also define at compilation all sorts of different offsets and sizes that automatically recalculate given their structures, and combine together. With this primitive machinery, it’s straighforward to define an efficient, safe, modular accessors (e.g. peek & poke) functions on top of this. ## Code You can find the code: ## Notes 1. Packing & Padding In all this code I consider the C structure packed, and not containing any padding. While the rules of alignment/padding could be added to the calculation types, I chose to ignore the issue since the developper can always from a packed structure definition, add the necessary padding explicitely in the definition. It would also be possible to define special padding types that automatically work out their size given how much padding is needed. 1. Endianness I completely ignore endianness for simplicity purpose, but a real library would likely and simply extend the definitions to add explicit endianness for all multi-bytes types. 1. Nat and Integer It would be nice to be able to generate offset in machine Int or Word, instead of unbounded Integer. Sadly the only capability for Nat is to generate Integer with natVal. The optimisation is probably marginal considering it’s just a constructor away, but it would prevent an unnecessary unwrapping and possibly even more efficient code. ## March 17, 2017 ### Edward Z. Yang # Prio: Private, Robust, and Scalable Computation of Aggregate Statistics I want to take the opportunity to advertise some new work from a colleague of mine, Henry Corrigan-Gibbs (in collaboration with the venerable Dan Boneh) on the subject of preserving privacy when collecting aggregate statistics. Their new system is called Prio and will be appearing at this year's NSDI. The basic problem they tackle is this: suppose you're Google and you want to collect some statistics on your users to compute some aggregate metrics, e.g., averages or a linear regression fit: A big problem is how to collect this data without compromising the privacy of your users. To preserve privacy, you don't want to know the data of each of your individual users: you'd like to get this data in completely anonymous form, and only at the end of your collection period, get an aggregate statistic. This is an old problem; there are a number of existing systems which achieve this goal with varying tradeoffs. Prio tackles one particularly tough problem in the world of private aggregate data collection: robustness in the face of malicious clients. Suppose that you are collecting data for a linear regression, and the inputs your clients send you are completely anonymous. A malicious client could send you a bad data point that could skew your entire data set; and since you never get to see the individual data points of your data set, you would never notice: Thus, Prio looks at the problem of anonymously collecting data, while at the same time being able to validate that the data is reasonable. The mechanism by which Prio does this is pretty cool, and so in this post, I want to explain the key insights of their protocol. Prio operates in a regime where a client secret shares their secret across a pool of servers which are assumed to be non-colluding; as long as at least one server is honest, nothing is revealed about the client's secret until the servers jointly agree to publish the aggregate statistic. Here is the problem: given a secret share of some hidden value, how can we efficiently check if it is valid? To answer this question, we first have to explain a little bit about the world of secret sharing. A secret sharing scheme allows you to split a secret into many pieces, so that the original secret cannot be recovered unless you have some subset of the pieces. There are amazingly simple constructions of secret sharing: suppose that your secret is the number x in some field (e.g., integers modulo some prime p), and you want to split it into n parts. Then, let the first n-1 shares be random numbers in the field, the last random number be x minus the sum of the previous shares. You reconstruct the secret by summing all the shares together. This scheme is information theoretically secure: with only n-1 of the shares, you have learned nothing about the underlying secret. Another interesting property of this secret sharing scheme is that it is homomorphic over addition. Let your shares of x and y be $[x]_i$ and $[y]_i$: then $[x]_i + [y]_i$ form secret shares of x + y, since addition in a field is commutative (so I can reassociate each of the pairwise sums into the sum for x, and the sum for y.) Usually, designing a scheme with homomorphic addition is easy, but having a scheme that supports addition and multiplication simultaneously (so that you can compute interesting arithmetic circuits) is a bit more difficult. Suppose you want to compute an arithmetic circuit on some a secret shared value: additions are easy, but to perform a multiplication, most multiparty computation schemes (Prio uses Beaver's MPC protocol) require you to perform a round of communication: While you can batch up multiplications on the same "level" of the circuit, so that you only to do as many rounds as the maximum depth of multiplications in the circuit, for large circuits, you may end up having to do quite a bit of communication. Henry tells me that fully homomorphic secret sharing has been the topic of some research ongoing research; for example, this paper about homomorphic secret sharing won best paper at CRYPTO last year. Returning to Prio, recall that we had a secret share of the user provided input, and we would like to check if it is valid according to some arithmetic circuit. As we've seen above, we could try using a multi-party computation protocol to compute shares of the output of the circuit, reveal the output of the circuit: if it says that the input is valid, accept it. But this would require quite a few rounds of communication to actually do the computation! Here is one of the key insights of Prio: we don't need the servers to compute the result of the circuit--an honest client can do this just fine--we just need them to verify that a computation of the circuit is valid. This can be done by having the client ship shares of all of the intermediate values on each of the wires of the circuit, having the servers recompute the multiplications on these shares, and then comparing the results with the intermediate values provided to us by the client: When we transform the problem from a computation problem to a verification one, we now have an embarrassingly parallel verification circuit, which requires only a single round to multiply each of the intermediate nodes of the circuit. There is only one final problem: how are we to check that the recomputed multiplies of the shares and the client provided intermediate values are consistent? We can't publish the intermediate values of the wire (that would leak information about the input!) We could build a bigger circuit to do the comparison and combine the results together, but this would require more rounds of communication. To solve this problem, Prio adopts an elegant trick from Ben-Sasson'12 (Near-linear unconditionally-secure multiparty computation with a dishonest minority): rather than publish the entire all of the intermediate wires, treat them as polynomials and publish the evaluation of each polynomial at a random point. If the servers behave correctly, they reveal nothing about the original polynomials; furthermore, with high probability, if the original polynomials are not equal, then the evaluation of the polynomials at a random point will also be not equal. This is all very wonderful, but I'd like to conclude with a cautionary tale: you have to be very careful about how you setup these polynomials. Here is the pitfall: suppose that a malicious server homomorphically modifies one of their shares of the input, e.g., by adding some delta. Because our secret shares are additive, adding a delta to one of the share causes the secret to also be modified by this delta! If the adversary can carry out the rest of the protocol with this modified share, when the protocol finishes running, he finds out whether or not the modified secret was valid. This leaks information about the input: if your validity test was "is the input 0 or 1", then if you (homomorphically) add one to the input and it is still valid, you know that it definitely was zero! Fortunately, this problem can be fixed by randomizing the polynomials, so that even if the input share is shifted, the rest of the intermediate values that it computes cannot be shifted in the same way. The details are described in the section "Why randomize the polynomials?" I think this just goes to show how tricky the design of cryptographic systems can be! In any case, if this has piqued your interest, go read the paper! If you're at MIT, you can also go see Henry give a seminar on the subject on March 22 at the MIT CSAIL Security Seminar. ## March 14, 2017 ### Philip Wadler # Papers We Love: John Reynolds, Definitional Interpreters for Higher-Order Languages, now in Haskell I suggested at Papers We Love that someone might like to recode John Reynolds's definitional interpeter, and I'm pleased to say that Rein Henrichs has done so. ## March 13, 2017 ### Holden Karau # Strata San Jose 2017 I'm excited for Strata San Jose this week. I'll be giving two talks, on Wednesday Spark Structured Streaming for ML with Seth and on Thursday Debugging Apache Spark with Joey :) I'm also going to be back on theCUBE just before the Thursday talk if anyone wants to watch the livestream of that. I might also do un-official office hours if I find a good coffee shop near the venue (San Jose friends - where is good coffee?) If you want to chat about Spark come find me or ping me :) ### Douglas M. Auclair (geophf) # January 2017 1HaskellADay 1Liners • January 31st, 2017: You have d = "3.461497957769017000D+07" define parseDouble :: String -> Double (n.b.: (read d) :: Double throws an error) • bazzargh @bazzargh uncurry (*) (bimap read ((10^).read.(dropWhile(elem"0+D"))) (break (=='D') d))::Double • January 31st, 2017: Given e :: FilePath -> [String] -> [Epoch] readEpochs :: FilePath -> IO [Epoch] readEpochs f = e f . lines <> readFile f

• Astynax Pirogov @alex_pir uncurry fmap . (((. lines) . e) &&& readFile
• January 30th, 2017: Given

parseNum :: String -> Maybe (Float, String)

define: dropNum'' :: String -> Maybe String

points-free in terms of parseNum
• matt @themattchan dropNum" = fmap snd . parseNum
• January 30th, 2017: For

parseHeader :: String -> Maybe String
parseHeader str = match "Start " str <|> match "Finish " str

eliminate redundancies
• mconcat . ([match] <*> ["Start ", "Finish "] <*>) . pure
foldMap match ["Start", "Finish"]
• Andreas Källberg @Anka213  My solution was
parseHeader str = foldr1 (<|>) . map (match str) $["Start", "Finish"] But that's longer than the original. • January 25th, 2017: given f is type: f :: Int -> a -> Bool for: g :: a -> Bool g = (||) . f 2 <*> f 27 rewrite g using f only once in the definition • Denis Stoyanov @xgrommx ugly version but (liftA2 . liftA2) (||) ($2) ($27) f • January 19th, 2017: import Data.Tree.Merkle mkleaf :: Show a => a -> Leaf a mkleaf = uncurry Leaf . (show . hashDatum &&& id) redefine using (<*>) • Denis Stoyanov @xgrommx smth like mkleaf = uncurry Leaf . show . (hashDatum <$> (,) <*> id)
mkleaf = uncurry Leaf . show . (liftA2 (,) hashDatum id)
• January 19th, 2017:
mkbranch1 :: Leaf a -> Branch a
mkbranch1 = uncurry Twig . (uncurry childrenHash . (dataHash &&& dataHash) &&& id)

redefine using (<*>)s(?)

# [dwvcvceg] Colors

We explore the range of RGB colors (the sRGB gamut) projected into CIE L*a*b* (CIElab, Lab) color space, as implemented in the Haskell colour package.

The first image below is a slice of the CIE L*a*b* color solid through the plane L = 50, halfway between 0 (black) to 100 (white). The "a" and "b" coordinates have range ±128. The shape is slightly different from -- the bottom is slightly narrower than -- the image of the slice at L=50 on Wikipedia, so I wonder what is going on. It might be the choice of white point. We chose white_point=Data.Colour.CIE.Illuminant.d65 for everything in this discussion.

(Most of the images on this page have been scaled down with HTML. Right click and View Image to embiggen.)

The inside of the cross section is kind of boring being mostly grayish, so in the image below we fill the cross section by connecting the origin (a=0, b=0) with line segments of the color at edge. We do this because we are mostly interested in the extent (outer edge) of the sRGB gamut in CIE L*a*b*. And it makes the pictures more striking.

Finding the edge of the cross section, i.e., the last representable color still within the sRGB gamut, was an exercise in root finding.

Below are 13 other slices colored similarly. The lightness values are as follows.

1. Lightness 16.149085003423885, halfway between black and the lightness of fully saturated blue.
2. Lightness 32.29817000684777, the lightness of fully saturated blue.
3. Lightness 42.767524268618665, halfway between blue and red.
4. Lightness 53.23687853038956, the lightness of fully saturated red. We switch to a black background as colors approach white in order to better be able to see the edge of the shape.
5. Lightness 56.779166162118884, halfway between red and magenta.
6. Lightness 60.32145379384821, the lightness of fully saturated magenta.
7. Lightness 74.02883222725823, halfway between magenta and green.
8. Lightness 87.73621066066826, the lightness of fully saturated green.
9. Lightness 89.42553101260378, halfway between green and cyan.
10. Lightness 91.11485136453929, the lightness of fully saturated cyan.
11. Lightness 94.12695165179927, halfway between cyan and yellow.
12. Lightness 97.13905193905926, the lightness of fully saturated yellow.
13. Lightness 98.56952596952962, halfway between yellow and white.

It might be fun to 3D print or or render the solid with 3D graphics someday. It seems to have a complicated shape. For 3D graphics, it would be most natural for the rendered color of the solid at each surface point to be the actual color of the solid, incorporating no reflected lights or shadows. However, such a lighting model will probably prevent the solid's sharp edges from being easily visible.

Above, we presented only 13 slices of the CIE L*a*b* color space. The first image below depicts the outer edge colors of 1024 slices. The vertical axis is lightness (L). The horizontal axis is the angle from the a-b origin. On my monitor, there are curious ridges corresponding to the saturated colors. I suspect it has to do with gamma.

However, the appeal of the CIE L*a*b* color space is perceptual uniformity; that is, perceptual differences in color can be calculated by Euclidean distance. The second image above has each row individually rescaled for perceptural uniformity. In other words, the horizontal axis is the proportion of the perimeter of cross section.

Marching along the perimeter of the cross section was another exercise in root finding. At each step, we seek the next point on the perimeter a constant distance away (and remember that finding any point on the perimeter itself requires root finding). Because we don't know the perimeter of a cross section in advance, we arbitrarily choose a small step size, achieving a set of points separated by that step size (except from the final point to initial point), then crudely rescale the those points into 1024 steps.

The image above on the right was the "magnum opus" of this project, taking days to compute. Here is a raw data file of the (a,b) coordinates of the perimeter at 1023 levels of lightness. Some combination of Implicit Differentiation and Automatic Differentiation might have computed this more efficiently.

We can take any row of this image to extract a band of uniform lightness and uniform rate of color change. Below on the top row is two copies of the band at L = 53.3203125, the lightness with the longest perimeter. This happens to be very close to the lightness of pure red. On the bottom row is the same band shifted 25 pixels. The color distance between the rows is roughly constant, so ideally there should be equally sharp contrast along the entire boundary. (But on my monitor this appears not to be the case: we will explore this more further below.)

We can sample this band at multiples of phi (the golden ratio) to get an infinite palette of colors widely spaced from each other, all at the same lightness.

Palette entries 0, 5, 8, 13 are similar because the Fibonacci sequence approximates the golden ratio.

For a fixed size palette, one can probably do slightly better by particle repulsion on the cross section itself, though I have not implemented this.

Next, abandon the constraint of equal lightness and instead focus on the saturated RGB colors. The outline of next image projects has only the saturated RGB colors projected orthogonally to the a-b plane. The edge colors are then radially connected to the origin as before. Someday, it might be fun to render this in 3D as a minimal surface.

I discovered that the appearance of the above image on my LCD display radically changes depending on the position of my head: the width of the colors changes. (CRTs I suspect do not have this problem.) The image below may better illustrate the effect. Move your head up and down (or left and right) and notice how (or if) the color sectors change in width. I especially notice it in the blues. Also, here is a webpage with the same circles tiling the background.

The first image below shows the saturated colors scaled in distance for perceptual uniformity. The second image is without correction, a typical color palette moving in RGB space at constant speed, using all the saturated colors of the rainbow.

The upper image below gives the same perceptually uniform rainbow except extended (looped) a bit to better see the region around red. The lower image is the same, except shifted by 20 pixels. The color distance between the rows is roughly constant, so ideally there should be a boundary line of constant contrast across the whole width. On my monitor, this appears not to be the case: the rows blend in the red-magenta area. As before, on LCD displays, the contrast may depend on the viewing angle.

 rgb(55,0,255) rgb(147,0,255)

The above two colors are separated by a distance of 18.9955 according to this online color distance calculator, whose results are in close (but not exact) agreement with my code. On my monitor, the colors appear quite different.

 rgb(255,0,69) rgb(255,0,33)

The above two colors are separated by a distance of 18.89. On my monitor, they appear similar.

 rgb(0,255,44) rgb(0,255,100)

The above two colors are separated by a distance of 19.108. On my monitor, they appear similar.

Based on the above examples, I'm less than convinced that L*a*b* space is good for defining perceptual color distance. Or, my monitor is bad at displaying colors.

# Linear types make performance more predictable

We’re extending GHC with linear types.

Ever since Jean-Yves Girard discovered linear logic in 1986, researchers around the world have been going “wow! resource tracking, this must be useful for programming languages”. After all, any real computation on a real machine takes resources (memory pages, disk blocks, interrupts, buffers etc) that then aren’t available anymore unless restored somehow. But despite this match apparently made in heaven, research on this subject has remained largely theoretical. It was becoming easy to just give up and assume this nut was simply too tough to crack. We ourselves have been there, but we recovered: we’re having a go at extending GHC with linear types.

Great! But why is this useful in a high-level programming language like Haskell? You’ll find the long of it in the paper we just submitted to ICFP’17 with Mathieu Boespflug, Ryan Newton and Simon Peyton Jones. In this blog post, we’ll briefly discuss (more in the paper) how we’re trying to achieve more predictable performance at a smaller memory footprint in Haskell, and give you a preview of what you might expect in your favourite compiler in the not-so-distant future.

# A bit of history

Once upon a time, Jean-Yves Girard was playing with ways to describe the semantics of a λ-calculus with types, when he realised that from a semantic perspective there was nothing essential about the arrow type (the type for functions): it could be described in terms of two simpler constructions. He followed this thread through and came up with a strange beast he called linear logic.

## Two of everything

Linear logic had two of everything: two ways of building conjunctions and disjunctions, two notions of truth, falsehood and two ways to negate. It’s a strange system, but perhaps not moreso than the zoo of cute names and symbols Girard conferred to every construct. For the purposes of this post, we’ll only need one new symbol from this zoo: , which reads lollipop (also called linear arrow, or lolly for close friends).

If we transpose linear logic to describing the behaviour of programs (by which point we talk about linear types), the linear arrow says this: a function that has type A ⊸ B is a function that has an A available to compute with, but it can only use it once. Not twice, not zero times: just once. It turns out that this property, which has come to be known as linearity, is very useful for compilers of functional programs.

## Typing resources

Programming language researchers quickly took notice. It was not long before Yves Lafont proposed a language with safe memory management yet without needing a garbage collector, thanks to linear types. Philip Wadler piled on a few years later with a system also supporting state update while retaining the properties of a pure language. Recently, researchers have even pushed linear logic towards making it possible to reason about pointer aliasing (the absence or presence of aliasing matters a lot in the optimization and the correctness of C programs).

But despite all these early promises (and many more since then), linear types didn’t catch on in the mainstream. Mind you, there have been workarounds. It turns out linear types are also useful to model effects, but Haskell preferred monads for that purpose instead. Clean wanted safe mutable state, but eschewed monads, using uniqueness types instead. More recently, Rust rose to popularity with a system of ownership which is not unlike Clean’s uniqueness types. Both are complex systems that permeate the entire language and ecosystem in ways that make the learning curve pretty steep.

## Linear types as an extension

What if you could enjoy all your favourite goodies from your favourite programming language, and yet be able to leverage linear types for precise resource management exactly where it matters (as Lafont did by avoiding garbage collection)? The result won’t be as good as Rust for what Rust does: it’s a different trade-off where we assume that such precision is only needed in small key areas of a program that otherwise freely uses functional programming abstractions as we know them today.

This is what we are proposing: a simple, unintrusive design that can be grafted to your favourite functional language at no cost for the programmer. We’re working on GHC, but this would work just as well in your favourite ML flavour.

# So why are we doing this?

Among our many projects, we are working together with Seagate and a number of consortium partners on the SAGE platform, an EU funded R&D project exploring how to store massive amounts of data (in excess of one exabyte) and query this data efficiently. We use Haskell as a prototyping language to fail fast when we’re heading towards a dead end, backtrack and explore a different direction at low cost, and refactor our existing code easily when we commit to a different direction.

On this and other systems level projects, maintaining predictable latencies is key. Pusher recently documented how this matters to them too, to the point where they’ve been looking elsewhere for good latencies. Our use cases share the same characteristics. We decided to solve the problem by asking less of the GC, while extending Haskell to make working outside of the GC just as memory safe. You will find much more details in the paper, but in summary linear types help in two ways:

• We can use linear types to manually, yet safely, manage data with malloc: because linearity forces the programmer using a value at least once, we can ensure that the programmer eventually calls free. And because it forces to use a value at most once, we can make sure that free-d data is never used (no use-after-free or double-free bugs). Anything that we malloc explicitly doesn’t end up on the GC heap, so doesn’t participate in increasing GC pause times.
• Linear types can make fusion predictable and guaranteed. Fusion is crucial to writing programs that are both modular and high-performance. But a common criticism, one that we’ve seen born out in practice, is that it’s often hard to know for sure whether the compiler seized the opportunity to fuse intermediate data structures to reduce allocations, or not. This is still future work, but we’re excited about the possibilities: since fusion leans heavily on inlining, and since linear functions are always safe to inline without duplicating work because they only use their argument once, it should be possible with a few extra tricks to get guaranteed fusion.

But the story gets better still. Linear types aren’t only useful for performance, they can also be key to correctness. SAGE is a complex project with many communicating components. Linear types allow us to model these interactions at the type level, to statically rule out bugs where some component doesn’t respect the protocol that was agreed upon with others. We don’t talk about that much in the paper, but we’ll touch on that here.

# What it will look like

Let’s start slow, with the fst function projecting the first component from a pair:

fst :: (a,b) -> a
fst (x,_) = x

So far, so standard. Now here is a new thing you cannot do:

fst :: (a,b) ⊸ a
fst (x,_) = x

This is the linear arrow: the type of functions which must use their arguments exactly once. The first projection is not linear, since a part of the pair is silently dropped. So the type checker will reject this program. It will also reject the diagonal, which uses its argument twice:

dup :: a ⊸ (a,a)
dup x = (x,x)

On the surface, this is almost the entire story: the linear arrow allows you to reject more programs, of which fst and dup are prototypical examples.

With that in mind, there’s a lot that linear types can be used for. One example is strongly typed protocols that check applications in a distributed system interact as expected. We’ll need a little bit of kit first:

data S a
data R a

newCaps :: (R a ⊸ S a ⊸ IO ()) ⊸ IO ()
send :: S a ⊸ a ⊸ IO ()
receive :: R a ⊸ (a ⊸ IO ()) ⊸ IO ()

In this API, S is a send capability: it gives the ability to send a message. R is a receive capability. A pair of corresponding capabilities is allocated by newCaps. The important thing to notice is that since both capabilities are linear, both must be consumed exactly once by a send and a receive respectively. Conversely, you can’t send/receive without having a corresponding send/receive capability on hand.

As a first example, let’s consider a protocol where a server expects two integers and returns the sum to the client. A simple function, except happening across a wire. This protocol is fully captured by the type (Int, Int, Int ⊸ IO ()), which reads: receive two Int’s, then send an Int. Using ⊸ IO () switches between “send” and “receive”.

To implement a protocol P, one has to write an implementation (a function) of type P ⊸ IO (). The other side of the wire (the client) must implement the dual protocol P ⊸ IO (), with a function of type (P ⊸ IO ()) ⊸ IO ():

type P = (Int, Int, IntIO ())

server :: PIO ()
server (n, p, k) = k (n + p)

client :: (PIO ()) ⊸ IO ()
client sendToSrvr = newCaps $\r s -> do sendToSrvr (42, 57, send s) receive r (\n -> print n) We can see how typing works: client must create a function Int ⊸ IO () for the server to send its result to. So it creates an S/R pair, into which the server will be able to send exactly one result. The r capability must also be consumed exactly once. Therefore, the client must call receive and do something with the result (in this case, it boringly prints it on a console). Of course, one can replace all the linear arrows with regular arrows -> and the program stays well-typed. But the value of linear types lies in what they do not allow: client cannot forget to call receive, resulting in the server blocking on send; similarly, the server is not allowed to return twice to k (e.g. k (n+p) >> k (n*p) is not well-typed). This prevents all sort of bad behaviours. It is not a novel idea; see the paper’s “related work” section if you’re interested in the references. To run the server and the client, we must assume an initial S/R pair s0/r0 (known only to the runner, for safety): runServer :: IO () runServer = receive r0 server runClient :: IO () runClient = client (send s0) We can make more complex examples by alternating the ⊸ IO () construction to sequence the steps of the protocol. For instance, the following asks first for a number, then the word “apple” singular or pluralized, depending on the number, and returns a sentence: copyInt :: Int ⊸ (Int,Int) -- such a function exists for every data type data Number = Singular | Plural type P = (Int, (Number, (String, StringIO ()) ⊸ IO ()) ⊸ IO ()) server :: PIO () server (n, k) = newCaps$ \r s -> do
k (num, send s)
receive r $\(apples, k') -> do k' ("I have " ++ show n2 ++ apples) where (n1,n2) = copyInt n num = if n1 == 1 then Singular else Plural client :: (PIO ()) ⊸ IO () client k = newCaps$ \r s -> do
k (42,send s)
receive r $\(num,k') -> newCaps$ \r' s' -> do
let apples
| Singular <- num = "apple"
| Plural <- num = "apples"
k' (apples, send s')
receive r' $\sentence -> print sentence Running the server and client will result in the client printing the deliciously healthy sentence “I have 42 apples”. Again, the value is in what is rejected by the compiler, such as listening to r a second time in the client rather than to r'. An important takeaway is how haskelly this all looks: just replace -> by and linearity kicks in. Usual datatypes, including tuples, take a linear meaning in a linear context. The technical details of how this is achieved are exposed in the article. Edsko de Vries wrote a blog post where he compares the trade-offs of related type systems; he comes in favour of a system where types are segregated into linear types and unrestricted types, but our position is that such a system, perhaps more flexible, would be more complex to implement, especially if we want good sharing of code between linear and non-linear contexts. One thing that the article does not have, however, is a good description of how our prototype implementation (and type inference) works. # Did you say prototype? Yes! There is a branch of GHC where we are implementing the above proposal. At the time of writing the prototype is still a bit crude: it handles the λ-calculus fragment properly, but case and data do not have full support yet. We believe we have the hard part behind us though: modifying the arrow type so that arrows carry an annotation discriminating whether they are regular arrows (->) or linear arrows (). Indeed, GHC uses and abuses the arrow type all over the type inference mechanism. So making a change there required changes across many files. As it turns out, however, the changes are rather systematic and innocuous. The patch is currently under 1000 lines long. We are targeting a merge by the time of the 8.4 release of GHC. In a future post, we’ll delve into what the paper doesn’t cover, to show how inferring linear types works. Stay tuned! ## March 12, 2017 ### Brandon Simmons # Almost inline ASM in haskell with foreign import prim With help from Reid Barton in questions here and here I discovered it’s pretty easy to call assembly from GHC haskell with minimal overhead, so I cleaned up an example of this technique and posted it here: https://github.com/jberryman/almost-inline-asm-haskell-example This is especially useful if you want to return multiple values from a foreign procedure, where otherwise with the traditional FFI approach you would have to do some allocation and stuff the values into a struct or something. I find the above more understandable in any case. Here’s an example of the dumped ASM from the Main in the example above: ... call newCAF addq$8,%rsp
testq %rax,%rax
je _c73k
_c73j:
movq $stg_bh_upd_frame_info,-16(%rbp) movq %rax,-8(%rbp) movq$block_info,-24(%rbp)
movl $4,%edi movl$3,%esi
movl $2,%r14d movl$1,%ebx
addq $-24,%rbp jmp sipRound_s_x3 _c73z: movq$104,904(%r13)
movq $block_info,-32(%rbp) movq %r14,-24(%rbp) movq %rsi,-16(%rbp) movq %rdi,-8(%rbp) movq %rbx,(%rbp) addq$-32,%rbp
...

You can see we just prepare argument registers, do whatever with the stack pointer, do a jump, and then push the return values onto the stack. For my purposes this was almost too much overhead to make this worthwhile (you can look at notes in the code).

I thought about sketching out a ghc proposal about a way to formalize this, maybe make it safer, and maybe somehow more efficient but I don’t have the time right now and don’t really have the expertise to know if this is even a good idea or how it could work.

# Designing the Backpack signature ecosystem

Suppose you are a library writer interested in using Backpack. Backpack says that you can replace a direct dependency on a function, type or package with one or more signatures. You typecheck against a signature and your end user picks how they want to eventually implement the signature.

Sounds good right? But there's a dirty little secret: to get all of this goodness, you have to write a signature--you know, a type signature for each function and type that you want to use in your library. And we all know how much Haskellers hate writing signatures. But Backpack has a solution to this: rather than repeatedly rewrite signatures for all your packages, a conscientious user can put a signature in a package for reuse in other packages.

For the longest time, I thought that this was "enough", and it would be a simple matter of sitting down and writing some tutorials for how to write a signature package. But as I sat down and started writing signature packages myself, I discovered that there was more than one way to set things up. In the post, I want to walk through two different possible designs for a collection of signature packages. They fall out of the following considerations:

• How many signature packages for, e.g., bytestring, should there be? There could be exactly one, or perhaps a separate package for each API revision?
• Should it be possible to post a new version of a signature package? Under what circumstances should this be allowed?
• For developers of a library, a larger signature is more convenient, since it gives you more functionality to work with. For a client, however, a smaller signature is better, because it reduces the implementation burden. Should signature packages be setup to encourage big or small signatures by default?

### A signature package per release

Intuitively, every release of a package is also associated with a "signature" specifying what functions that release supports. One could conclude, then, that there should be a signature package per release, each package describing the interface of each version of the package in question. (Or, one could reasonably argue that GHC should be able to automatically infer the signature from a package. This is not so easy to do, for reasons beyond the scope of this post.)

However, we have to be careful how we perform releases of each of these signatures. One obvious but problematic thing to do is this: given bytestring-0.10.8.1, also release a bytestring-sig-0.10.8.1. The problem is that in today's Haskell ecosystem, it is strongly assumed that only one version of a package is ever selected. Thus, if I have one package that requires bytestring-sig == 0.10.8.1, and another package that requires bytestring-sig == 0.10.8.2, this will fail if we try to dependency solve for both packages at the same time. We could make this scheme work by teaching Cabal and Stack how to link against multiple versions of a signature package, but at the moment, it's not practical.

An easy way to work around the "multiple versions" problem is to literally create a new package for every version of bytestring. The syntax for package names is a bit irritating (alphanumeric characters plus hyphens only, and no bare numbers between a hyphen), but you could imagine releasing bytestring-v1008, bytestring-v1009, etc., one for each version of the API that is available. Once a signature package is released, it should never be updated, except perhaps to fix a mistranscription of a signature.

Under semantic versioning, packages which share the same major version are supposed to only add functionality, not take it away. Thus, these successive signature packages can also be built on one another: for example bytestring-v1009 can be implemented by inheriting all of the functions from bytestring-v1008, and only adding the new functions that were added in 0.10.9.

### A signature package per major release series

There is something very horrible about the above scheme: we're going to have a lot of signature packages: one per version of a package! How awful would it be to have in the Hackage index bytestring-v900, bytestring-v901, bytestring-v902, bytestring-v1000, bytestring-v1002, bytestring-v1004, bytestring-v1006 and bytestring-v1008 as package choices? (With perhaps more if there exist patch releases that accidentally changed the API.) Thus, it is extremely tempting to try to find ways to reduce the number of signature packages we need to publish.

Here is one such scheme which requires a signature package only for major releases; e.g., for bytestring, we would only have bytestring-v9 and bytestring-v10:

• The latest version of bytestring-v9 should correspond to the "biggest" API supported by the 0.9 series. Thus, bytestring-v9, every minor version release of bytestring, there is a new release of bytestring-v9: e.g., when bytestring-0.9.1.0 is released, we release bytestring-v9-1.0. Each of the releases increases the functionality recorded in the signature, but is not permitted to make any other changes.
• When depending on the signature package, we instead provide a version bound specifying the minimum functionality of the signature required to build our package; e.g., bytestring-v9 >= 1.0. (Upper bounds are not necessary, as it assumed that a signature package never breaks backwards compatibility.)

There is one major difficulty: suppose that two unrelated packages both specify a version bound on bytestring-v9. In this case, the ultimate version of the signature package we pick will be one that is compatible with both ranges; in practice, the latest version of the signature. This is bad for two reasons: first, it means that we'll always end up requiring the client to implement the full glory of bytestring-v9, even if we are compatible with an earlier version in the release series. Second, it means that whenever bytestring-v9 is updated, we may bring more entities into scope: and if that introduces ambiguity, it will cause previously compiling code to stop compiling.

Fortunately, there is a solution for this problem: use signature thinning to reduce the required entities to precisely the set of entities you need. For example, suppose that bytestring-v9-0.0 has the following signature:

signature Data.ByteString where
data ByteString
empty :: ByteString
null :: ByteString -> Bool

As a user, we only needed ByteString and empty. Then we write in our local ByteString signature:

signature Data.ByteString (ByteString, empty) where

and now no matter what new functions get added to bytestring-v9-0.0, this signature will only ever require ByteString and empty. (Another way of thinking about signature thinning is that it is a way to centralize explicit import lists.) Notice that this scheme does not work if you don't have a separate package per major release series, since thinning can't save you from a backwards incompatible change to the types of one of the functions you depend on.

These signature thinning headers can be automatically computed; I've written a tool (ghc-usage) which does precisely this. Indeed, signature thinning is useful even in the first design, where they can be used to reduce the requirements of a package; however, with a signature package per major release, they are mandatory; if you don't use them, your code might break.

### Conclusion

So, what design should we adopt? I think the first scheme (a signature package per release) is more theoretically pure, but I am very afraid of the "too many packages" problem. Additionally, I do think it's a good idea to thin signatures as much as possible (it's not good to ask for things you're not going to use!) which means the signature thinning requirement may not be so bad. Others I have talked to think the first scheme is just obviously the right thing to do.

Which scheme do you like better? Do you have your own proposal? I'd love to hear what you think. (Also, if you'd like to bikeshed the naming convention for signature packages, I'm also all ears.)

### Appendix

After publishing this post, the comments of several folks made me realize that I hadn't motivated why you would want to say something about the API of bytestring-0.10.8; don't you just want a signature of strings? So, to address this comment, I want to describe the line of reasoning that lead me down this path.

I started off with a simple goal: write a signature for strings that had the following properties:

1. Be reasonably complete; i.e., contain all of the functions that someone who wanted to do "string" things might want, but
2. Be reasonably universal; i.e., only support functions that would be supported by all the major string implementations (e.g., String, strict/lazy Text, strict/lazy Word8/Char8 ByteString and Foundation strings.)

It turned out that I needed to drop quite a number of functions to achieve universality; for example, transpose, foldl1, foldl1', mapAccumL/R, scanl, replicate, unfoldr, group, groupBy, inits, tails are not implemented in Foundation; foldr', foldr1', scanl1, scanr, scanr1, unfoldN, spanEnd, breakEnd, splitOn, isInfixOf are not implemented by the lazy types.

This got me thinking that I could provide bigger signatures, if I didn't require the signature to support all of the possible implementations; you might have a signature that lets you switch between only the strict variants of string types, or even a signature that just lets you swap between Word8 and Char8 ByteStrings.

But, of course, there are combinatorially many different ways one could put signatures together and it would be horrible to have to write (and name) a new signature package for each. So what is the minimal unit of signature that one could write? And there is an obvious answer in this case: the API of a specific module (say, Data.ByteString) in a specific version of the package. Enter the discussion above.

### Appendix 2

Above, I wrote:

But, of course, there are combinatorially many different ways one could put signatures together and it would be horrible to have to write (and name) a new signature package for each. So what is the minimal unit of signature that one could write? And there is an obvious answer in this case: the API of a specific module (say, Data.ByteString) in a specific version of the package.

I think there is an alternative conclusion to draw from this: someone should write a signature containing every single possible function that all choices of modules could support, and then have end-users responsible for paring these signatures down to the actual sets they use. So, everyone is responsible for writing big export lists saying what they use, but you don't have to keep publishing new packages for different combinations of methods.

I'm pursuing this approach for now!

# Do-It-Yourself Functional Reactive Programming @ iOSCon in London

End of this month, I will talk at iOSCon 2017 in London (30 & 31 March). I am very much looking forward to get to know the vibrant iOS dev community in London as well as to meet the other speakers. The line up of talks looks awesome, promising plenty of both breadth and depth.

In my own talk, I will deconstruct functional reactive programming (FRP) to show that it is based on a few simple ideas and holds the potential to improve the architecture of iOS apps. We will discuss questions, such as what is the purpose of FRP, how do you use it to structure your application, and how can you implement it yourself?

One of the things that I love about Swift is that it enables us to conveniently use functional programming concepts to write clearer and more robust code. The ease with which we can implement and use FRP is a great example of that capability.

See you in London @ iOSCon 2017!

## March 10, 2017

### Roman Cheplyaka

When you compile a Haskell program with GHC, by default your program is linked against the GMP (GNU Multiple Precision Arithmetic) library. GHC uses GMP to implement Haskell’s arbitrary-precision Integer type.

Because GMP is distributed under the L/GPL licenses, this presents a problem if you want to link the executable statically and distribute it without the implications of LGPL.

Here I’ll show how to compile a Haskell program without GMP. The process consists of two steps:

1. Install a GHC compiled with integer-simple instead of GMP. You may need to compile such a GHC yourself.
2. Make sure your dependencies do not use GMP.

These instructions are geared towards stack, but it should be clear how to adapt them to other workflows.

## Install GHC with integer-simple

integer-simple is a pure Haskell implementation of the subset of GMP functionality.

Because the Integer type is provided by the base library, and the base library is compiled at the same time as GHC itself, we need a different build of GHC to support integer-simple — although that may change at some point.

At the time of writing, FP Complete distributes integer-simple builds for several recent versions of GHC, but only for Windows. To check, look at the current version of stack-setup-2.yaml and search for “integersimple”.

Thus, on Windows you can say

stack setup --ghc-variant=integersimple 8.0.2

and it will download and install the GHC 8.0.2 based on integer-simple.

As part of my work for nstack, I also prepared builds of GHC 8.0.2 with integer-simple for Linux and macOS. You can download them from https://ro-che.info/tmp/ghc/, or you can just say

stack setup --ghc-variant=integersimple --setup-info-yaml=https://ro-che.info/tmp/ghc/stack-setup-2.yaml 8.0.2

If you are inside a stack project, add

ghc-variant: integersimple

to stack.yaml so that stack knows which compiler flavor to use. Also, in this case you don’t need to give stack setup the GHC version or --ghc-variant; these will be taken from stack.yaml.

If there is no precompiled integer-simple GHC for your platform or desired GHC version, you’ll have to build it yourself as I describe below.

## Compile GHC with integer-simple

These instructions were tested with GHC 8.0.2 on Linux and macOS.

1. Check the system requirements

3. Save the template mk/build.mk.sample as mk/build.mk:

cp mk/build.mk.sample mk/build.mk

Now, add the following line somewhere in mk/build.mk:

INTEGER_LIBRARY=integer-simple

While editing that file, also choose the build profile by uncommenting one of the BuildFlavour = lines.

• To test the process, use BuildFlavour = quick.
• Once you are happy with the result, run make distclean and rebuild with BuildFlavour = perf.

Another option I found useful to set in mk/build.mk is

BUILD_SPHINX_PDF=NO

Otherwise, I get errors because I don’t have various exotic TeX packages installed.

4. Follow the standard build instructions, except the final make install command.

5. Run make binary-dist to generate the release tarball.

linux64-integersimple-tinfo6:
8.0.2:
url: "/home/user/ghc/ghc-8.0.2-x86_64-unknown-linux.tar.xz"
content-length: 114017964

Now you can follow the instructions from the previous section, except replace the stack-setup-2.yaml url with the path or url of your own stack-setup-2.yaml file.

## Make sure your dependencies do not use GMP

Some packages depend on GMP through the integer-gmp package.

Fortunately, such packages usually have a Cabal flag to remove this dependency or replace it with integer-simple. The flag itself is usually called integer-gmp or integer-simple.

There are different ways to set these flags. With stack, you can declare the flags in stack.yaml as follows:

extra-deps:
- text-1.2.2.1
- hashable-1.2.5.0
- scientific-0.3.4.10
- integer-logarithms-1.0.1
- cryptonite-0.22

flags:
text:
integer-simple: true
hashable:
integer-gmp: false
scientific:
integer-simple: true
integer-logarithms:
integer-gmp: false
cryptonite:
integer-gmp: false

The above YAML snippet can be easily turned into a custom snapshot and shared among multiple stack projects if needed.

# Partial patterns in do blocks: let vs return

This blog post is about a pattern (pun not intended) I've used in my code for a while, and haven't seen discussed explicitly. A prime example is when doing simplistic parsing using the functions in Data.Text.Read. (And yes, this is a contrived example, and using parsec or attoparsec would be far better.)

Full versions of the code below are available as a Github Gist, and embedded at the end of this post.

The example: consider a file format encoding one person per line, indicating the name, height (in centimeters), age (in years), and bank balance (in your favorite currency). I have no idea why anyone would have such a collection of information, but let's roll with it:

Alice 165cm 30y 15
Bob 170cm 35y -20
Charlie 175cm 40y 0

And we want to convert this into a list of Person values:

data Person = Person
{ name    :: !Text
, height  :: !Int
, age     :: !Int
, balance :: !Int
}
deriving Show

Using the Data.Text and Data.Text.Read APIs, this isn't too terribly painful:

parseLine :: Text -> Maybe Person
parseLine t0 = do
let (name, t1) = T.break (== ' ') t0
t2 <- T.stripPrefix " " t1
(height, t3) <-
case decimal t2 of
Right (height, t3) -> Just (height, t3)
Left _ -> Nothing
t4 <- T.stripPrefix "cm " t3
(age, t5) <-
case decimal t4 of
Right (age, t5) -> Just (age, t5)
Left _ -> Nothing
t6 <- T.stripPrefix "y " t5
balance <-
case signed decimal t6 of
Right (balance, "") -> Just balance
_ -> Nothing
Just Person {..}

We start off with the original value of the line, t0, and continue to bite off pieces of it in the format we want. The progression is:

1. Use break to grab the name (everything up until the first space)
2. Use stripPrefix to drop the space itself
3. Use the decimal function to parse out the height
4. Use stripPrefix to strip off the cm after the height
5. Use decimal and stripPrefix yet again, but this time for the age and the trailing y
6. Finally grab the balance using signed decimal. Notice that our pattern match is slightly different here, insisting that the rest of the input be the empty string to ensure no trailing characters

If we add to this a pretty straight-forward helper function and a main function:

parseLines :: Text -> Maybe [Person]
parseLines = mapM parseLine . T.lines

main :: IO ()
main =
case parseLines input of
Nothing -> error "Invalid input"
Just people -> mapM_ print people

We get the output:

Person {name = "Alice", height = 165, age = 30, balance = 15}
Person {name = "Bob", height = 170, age = 35, balance = -20}
Person {name = "Charlie", height = 175, age = 40, balance = 0}

And if we corrupt the input (such as by replacing 175cm with x175cm), we get the output:

v1.hs: Invalid input
CallStack (from HasCallStack):
error, called at v1.hs:49:20 in main:Main

This works, and the Data.Text.Read API is particularly convenient for grabbing part of an input and then parsing the rest. However, all of those case expressions really break up the flow, feel repetitive, and make it difficult to follow the logic in that code. Fortunately, we can clean this up with some lets and partial pattern matches:

parseLine :: Text -> Maybe Person
parseLine t0 = do
let (name, t1) = T.break (== ' ') t0
t2 <- T.stripPrefix " " t1
let Right (height, t3) = decimal t2
t4 <- T.stripPrefix "cm " t3
let Right (age, t5) = decimal t4
t6 <- T.stripPrefix "y " t5
let Right (balance, "") = signed decimal t6
Just Person {..}

That's certainly easier to read! And our program works... assuming we have valid input. However, let's try running against our invalid input with x175cm:

v2.hs: v2.hs:27:9-39: Irrefutable pattern failed for pattern Right (height, t3)

We've introduced partiality into our function! Instead of being well behaved and returning a Nothing, our program now creates an impure exception that blows up in our face, definitely not what we wanted with a simple refactoring.

The problem here is that, when using let, an incomplete pattern match turns into a partial value. GHC will essentially convert our:

let Right (height, t3) = decimal t2

into

let Right (height, t3) = decimal t2
Left _ = error "Irrefutable pattern failed"

What we really want is for that Left clause to turn into a Nothing value, like we were doing explicitly with our case expressions above. Fortunately, we've got one more trick up our sleeve to do exactly that:

parseLine :: Text -> Maybe Person
parseLine t0 = do
let (name, t1) = T.break (== ' ') t0
t2 <- T.stripPrefix " " t1
Right (height, t3) <- Just $decimal t2 t4 <- T.stripPrefix "cm " t3 Right (age, t5) <- Just$ decimal t4
t6 <- T.stripPrefix "y " t5
Right (balance, "") <- Just $signed decimal t6 Just Person {..} To make it abundantly clear, we've replaced: let Right (height, t3) = decimal t2 with: Right (height, t3) <- Just$ decimal t2

We've replaced our let with the <- feature of do-notation. In order to make things type-check, we needed to wrap the right hand side in a Just value (you could also use return or pure, I was just trying to be explicit in the types). But we've still got an incomplete pattern on the left hand side, so why is this better?

When, within do-notation, you have an incomplete pattern match, GHC does something slightly different. Instead of using error and creating an impure exception, it uses the fail function. While generally speaking there are no guarantees that fail is a total function, certain types - like Maybe - due guarantee totality, e.g.:

fail _ = Nothing

Voila! Exactly the behavior we wanted, and now we've achieved it without some bulky, repetitive cases. My general advice around these techniques:

• Don't define partial patterns in lets, cases, or function definitions.
• Only use partial patterns within do-notation if you know that the underlying type defines a total fail function.

For completeness, you can also achieve this with more explicit conversion to a Maybe with the either helper function:

parseLine :: Text -> Maybe Person
parseLine t0 = do
let (name, t1) = T.break (== ' ') t0
t2 <- T.stripPrefix " " t1
(height, t3) <- either (const Nothing) Just $decimal t2 t4 <- T.stripPrefix "cm " t3 (age, t5) <- either (const Nothing) Just$ decimal t4
t6 <- T.stripPrefix "y " t5
(balance, t7) <- either (const Nothing) Just $signed decimal t6 guard$ T.null t7
Just Person {..}

While this works, personally I'm not as big a fan:

• It feels bulkier, hiding the main information I want to express
• It doesn't handle the issue of ensuring no content is left over after parsing the balance, so we need to add an explicit guard. You could just use (balance, "") <-, but that's just going back to using the partial pattern rules of do-notation.

Hopefully you found this little trick useful. Definitely not earth shattering, but perhaps a fun addition to your arsenal. If you want to learn more, be sure to check out our Haskell Syllabus.

The four versions of the code mentioned in this post are all available as a Github Gist:

<script src="https://gist.github.com/snoyberg/bfb8bd7bf41410bc1b6aa9b744d5f66f.js"></script>

# What's in a Fold: The Basic Catamorphism in recursion-schemes

This article is meant as an accessible introduction to the most basic recursion scheme, the catamorphism. It won’t engage in deep dives into theory, or survey practical motives for using recursion schemes – that will be covered by the further reading suggestions at the end. Rather, its main goal is simply offering a concrete presentation of how folds can be generalised. This presentation will be done in terms of the types and combinators used by the recursion-schemes library, so that the article doubles as an introduction to some of its key conventions.

## foldr

The primeval example of a fold in Haskell is the right fold of a list.

foldr :: (a -> b -> b) -> b -> [a] -> b

One way of picturing what the first two arguments of foldr are for is seeing them as replacements for the list constructors: the b argument is an initial value corresponding to the empty list, while the binary function incorporates each element prepended through (:) into the result of the fold.

data [a] = [] | a : [a]

foldr (+) 0 [ 1 ,  2 ,  3 ]
foldr (+) 0 ( 1 : (2 : (3 : [])) )
( 1 + (2 + (3 + 0 )) )

By applying this strategy to other data structures, we can get analogous folds for them.

-- This is foldr; I have flipped the arguments for cosmetic reasons.
data [a] = [] | (:) a [a]
foldList :: b -> (a -> b -> b) -> [a] -> b

-- Does this one look familiar?
data Maybe a = Nothing | Just a
foldMaybe :: b -> (a -> b) -> Maybe a -> b

-- This is not the definition in Data.List.NonEmpty; the differences
-- between them, however, are superficial.
data NEList :: NEList a (Maybe (NEList a))
foldNEList :: (a -> Maybe b -> b) -> NEList a -> b

-- A binary tree like the one in Diagrams.TwoD.Layout.Tree (and in
-- many other places).
data BTree a = Empty | BNode a (BTree a) (BTree a)
foldBTree :: b -> (a -> b -> b -> b) -> BTree a -> b

It would make sense to capture this pattern into an abstraction. At first glance, however, it is not obvious how to do so. Though we know intuitively what the folds above have in common, their type signatures have lots of superficial differences between them. Our immediate goal, then, will be simplifying things by getting rid of these differences.1

## ListF

We will sketch the simplification using the tangible and familiar example of lists. Let’s return to the type of foldr.

(a -> b -> b) -> b -> [a] -> b

With the cosmetic flip I had applied previously, it becomes:

b -> (a -> b -> b) -> [a] -> b

The annoying irregularities among the types of the folds in the previous section had to do with the number of arguments other than the data structure (one per constructor) and the types of said arguments (dependent on the shape of each constructor). Though we cannot entirely suppress these differences, we have a few tricks that make it possible to disguise them rather well. The number of extra arguments, for instance, can be always be reduced to just one with sufficient currying:

(b, a -> b -> b) -> [a] -> b

The first argument is now a pair. We continue by making its two halves more like each other by converting them into unary functions: the first component acquires a dummy () argument, while the second one gets some more currying:

(() -> b, (a, b) -> b) -> [a] -> b

We now have a pair of unary functions with result type b. A pair of functions with the same result type, however, is equivalent to a single function from Either one of the argument types (if you are sceptical about that, you might want to work out the isomorphism – that is, the pair of conversion functions – that witnesses this fact):

(Either () (a, b) -> b) -> [a] -> b

At this point, the only extra argument of the fold is an unary function with result type b. We have condensed the peculiarities of the original arguments at a single place (the argument of said function), which makes the overall shape of the signature a lot simpler. Since it can be awkward to work with anonymous nestings of Either and pairs, we will replace Either () (a, b) with an equivalent type equipped with suggestive names:

data ListF a b =  Nil | Cons a b
--            Left () | Right (a,b)

That leaves us with:

(ListF a b -> b) -> [a] -> b

The most important fact about ListF is that it mirrors the shape of the list type except for one crucial difference…

data []    a   = []  | (:)  a [a]
data ListF a b = Nil | Cons a b

… namely, it is not recursive. An [a] value built with (:) has another [a] in itself, but a ListF a b built with Cons does not contain another ListF a b. To put it in another way, ListF is the outcome of taking away the recursive nesting in the list data type and filling the resulting hole with a placeholder type, the b in our signatures, that corresponds to the result of the fold. This strategy can be used to obtain a ListF analogue for any other data structure. You might, for instance, try it with the BTree a type shown in the first section.

## cata

We have just learned that the list foldr can be expressed using this signature:

(ListF a b -> b) -> [a] -> b

We might figure out a foldr implementation with this signature in a mechanical way, by throwing all of the tricks from the previous section at Data.List.foldr until we squeeze out something with the right type. It is far more illuminating, however, to start from scratch. If we go down that route, the first question that arises is how to apply a ListF a b -> b function to an [a]. It is clear that the list must somehow be converted to a ListF a b, so that the function can be applied to it.

foldList :: (ListF a b -> b) -> [a] -> b
foldList f = f . something
-- foldList f xs = f (something xs)
-- something :: [a] -> ListF a b

We can get part of the way there by recalling how ListF mirrors the shape of the list type. That being so, going from [a] to ListF a [a] is just a question of matching the corresponding constructors.2

project :: [a] -> ListF a [a]
project = \case
[] -> Nil
x:xs -> Cons x xs

foldList :: (ListF a b -> b) -> [a] -> b
foldList f = f . something . project
-- something :: ListF a [a] -> ListF a b

project witnesses the simple fact that, given that ListF a b is the [a] except with a b placeholder in the tail position, where there would be a nested [a], if we plug the placeholder with [a] we get something equivalent to the [a] list type we began with.

We now need to go from ListF a [a] to ListF a b; in other words, we have to change the [a] inside ListF into a b. And sure enough, we do have a function from [a] to b

foldList :: (ListF a b -> b) -> ([a] -> b)
f :: ListF a b -> b
foldList f :: [a] -> b

… the fold itself! To conveniently reach inside ListF a b, we set up a Functor instance:

instance Functor (ListF a) where
fmap f = \case
Nil -> Nil
Cons x y -> Cons x (f y)

foldList :: (ListF a b -> b) -> [a] -> b
foldList f = f . fmap (foldList f) . project

And there it is, the list fold. First, project exposes the list (or, more precisely, its first constructor) to our machinery; then, the tail of the list (if there is one – what happens if there isn’t?) is recursively folded through the Functor instance of ListF; finally, the overall result is obtained by applying f to the resulting ListF a b.

-- A simple demonstration of foldList in action.
f :: ListF a b -> b
f = \case { Nil -> 0; Cons x y -> x + y }

foldList f [1, 2, 3]
-- Let's try and evaluate this by hand.
foldList f (1 : 2 : 3 : [])
f . fmap (foldList f) . project $(1 : 2 : 3 : []) f . fmap (foldList f)$ Cons 1 (2 : 3 : [])
f $Cons 1 (foldList f (2 : 3 : [])) f$ Cons 1 (f . fmap (foldList f) $project (2 : 3 : [])) f$ Cons 1 (f . fmap (foldList f) $Cons 2 (3 : [])) f$ Cons 1 (f $Cons 2 (foldList f (3 : []))) f$ Cons 1 (f $Cons 2 (f . fmap (foldList f) . project$ (3 : [])))
f $Cons 1 (f$ Cons 2 (f . fmap (foldList f) $Cons 3 [])) f$ Cons 1 (f $Cons 2 (f$ Cons 3 (foldList f [])))
f $Cons 1 (f$ Cons 2 (f $Cons 3 (f . fmap (foldList f) . project$ [])))
f $Cons 1 (f$ Cons 2 (f $Cons 3 (f . fmap (foldList f)$ Nil)))
f $Cons 1 (f$ Cons 2 (f $Cons 3 (f$ Nil)))
f $Cons 1 (f$ Cons 2 (f $Cons 3 0)) f$ Cons 1 (f $Cons 2 3) f$ Cons 1 5
6

One interesting thing about our definition of foldList is that all the list-specific details are tucked within the implementations of project, fmap for ListF and f (whatever it is). That being so, if we look only at the implementation and not at the signature, we find no outward signs of anything related to lists. No outward signs, that is, except for the name we gave the function. That’s easy enough to solve, though: it is just a question of inventing a new one:

cata f = f . fmap (cata f) . project

cata is short for catamorphism, the fancy name given to ordinary folds in the relevant theory. There is a function called cata in recursion-schemes. Its implementation

cata f = c where c = f . fmap c . project

… is the same as ours, almost down to the last character. Its type signature, however, is much more general:

cata :: Recursive t => (Base t b -> b) -> t -> b

It involves, in no particular order:

• b, the type of the result of the fold;

• t, the type of the data structure being folded. In our example, t would be [a]; or, as GHC would put it, t ~ [a].

• Base, a type family that generalises what we did with [a] and ListF by assigning base functors to data types. We can read Base t as “the base functor of t”; in our example, we have Base [a] ~ ListF a.

• Recursive, a type class whose minimal definition consists of project, with the type of project now being t -> Base t t.

The base functor is supposed to be a Functor, so that we can use fmap on it. That is enforced through a Functor (Base t) constraint in the definition of the Recursive class. Note, however, that there is no such restriction on t itself: it doesn’t need to be a polymorphic type, or even to involve a type constructor.

In summary, once we managed to concentrate the surface complexity in the signature of foldr at a single place, the ListF a b -> b function, an opportunity to generalise it revealed itself. Incidentally, that function, and more generally any Base t b -> b function that can be given to cata, is referred to as an algebra. In this context, the term “algebra” is meant in a precise technical sense; still, we can motivate it with a legitimate recourse to intuition. In basic school algebra, we use certain rules to get simpler expressions out of more complicated ones, such as ax + bx = (a + b)x. Similarly, a Base t b -> b algebra boils down to a set of rules that tell us what to do to get a b result out of the Base t b we are given at each step of the fold.

## Fix

The cata function we ended up with in the previous section…

cata :: Recursive t => (Base t b -> b) -> t -> b
cata f = c where c = f . fmap c . project

… is perfectly good for practical purposes: it allows us to fold anything that we can give a Base functor and a corresponding project. Not only that, the implementation of cata is very elegant. And yet, a second look at its signature suggests that there might be an even simpler way of expressing cata. The signature uses both t and Base t b. As we have seen for the ListF example, these two types are very similar (their shapes match except for recursiveness), and so using both in the same signature amounts to encoding the same information in two different ways – perhaps unnecessarily so.

In the implementation of cata, it is specifically project that links t and Base t b, as it translates the constructors from one type to the other.

project (1 : 2 : 3 : [])
Cons 1 (2 : 3 : [])

Now, let’s look at what happens once we repeatedly expand the definition of cata:

c = cata f
p = project
c
f . fmap c . p
f . fmap (f . fmap c . p) . p
f . fmap (f . fmap (f . fmap c . p) . p) . p
f . fmap (   .   .   .   f . fmap c . p   .   .   .   ) . p

This continues indefinitely. The fold terminates when, at some point, fmap c does nothing (in the case of ListF, that happens when we get to a Nil). Note, however, that even at that point we can carry on expanding the definition, merrily introducing do-nothing operations for as long as we want.

At the right side of the expanded expression, we have a chain of increasingly deep fmap-ped applications of project:3

.   .   .   fmap (fmap project) . fmap project . project

If we could factor that out into a separate function, it would change a t data structure into something equivalent to it, but built with the Base t constructors:

GHCi> :{
GHCi| fmap (fmap (fmap project))
GHCi|     . fmap (fmap project) . fmap project . project
let [x,y,z] = map snd $take 3 ops, (a u (b v (c w d))) == 17 ] You can see the problem in the last line. a, b, c, and d are numbers, and u, v, and w are operators. The program evaluates an expression to see if it has the value 17, but the expression always has the left-hand shape. (The program has another limitation: it never uses the same operator twice in the expression. That second permutations should be (sequence . take 3 . repeat) or something. It can still solve «2 5 6 6 ⇒ 17», however.) Often the way these programs worked was to generate every possible permutation of the inputs and then apply the operators to the input lists stackwise: pop the first two values, combine them, push the result, and repeat. Here's a relevant excerpt from a program by Tim Dierks, this time in Python: for ordered_values in permutations(values): for operations in product(ops, repeat=len(values)-1): result, formula = calc_result(ordered_values, operations) Here the expression structure is implicit, but the current result is always made by combining one of the input numbers with the old result. I have seen many people get caught by this and similar traps in the past. I once posed the problem of enumerating all the strings of balanced parentheses of a given length, and several people assumed that all such strings have the form ()S, S(), or (S), where S is a shorter string of the same type. This seems plausible, and it works up to length 6, but (())(()) does not have that form. #### Division by zero A less common error exhibited by some programs was a failure to properly deal with division by zero. «2 5 6 6 ⇒ 17» has a solution, and if a program dies while checking and doesn't find the solution, that's a bug. ### Programs that worked #### Ingo Blechschmidt (Haskell) Ingo Blechschmidt showed me a solution in Haskell. The code is quite short. M. Blechschmidt's program defines a synthetic expression type and an evaluator for it. It defines a function arb which transforms an ordered list of numbers into a list of all possible expressions over those numbers. Reordering the list is taken care of earlier, by Data.List.permutations. By “synthetic expression type” I mean this: data Exp a = Lit a | Sum (Exp a) (Exp a) | Diff (Exp a) (Exp a) | Prod (Exp a) (Exp a) | Quot (Exp a) (Exp a) deriving (Eq, Show) Probably 80% of the Haskell programs ever written have something like this in them somewhere. This approach has a lot of boilerplate. For example, M. Blechschmidt's program then continues: eval :: (Fractional a) => Exp a -> a eval (Lit x) = x eval (Sum a b) = eval a + eval b eval (Diff a b) = eval a - eval b eval (Prod a b) = eval a * eval b eval (Quot a b) = eval a / eval b Having made up our own synonyms for the arithmetic operators (Sum for , etc.) we now have to explain to Haskell what they mean. (“Not expressions, but an incredible simulation!”) I spent a while trying to shorten the code by using a less artificial expression type: data Exp a = Lit a | Op ((a -> a -> a), String) (Exp a) (Exp a) but I was disappointed; I was only able to cut it down by 18%, from 34 lines to 28. I hope to discuss this in a future article. By the way, “Blechschmidt” is German for “tinsmith”. #### Shreevatsa R. (Python) Shreevatsa R. showed me a solution in Python. It generates every possible expression and prints it out with its value. If you want to filter the voluminous output for a particular target value, you do that later. Shreevatsa wrote up an extensive blog article about this which also includes a discussion about eliminating duplicate expressions from the output. This is a very interesting topic, and I have a lot to say about it, so I will discuss it in a future article. #### Jeff Fowler (Ruby) Jeff Fowler of the Recurse Center wrote a compact solution in Ruby that he described as “hot garbage”. Did I say something earlier about Perl gobbledygook? It's nice that Ruby is able to match Perl's level of gobbledygookitude. This one seems to get everything right, but it fails mysteriously if I replace the floating-point constants with integer constants. He did provide a version that was not “egregiously minified” but I don't have it handy. #### Lindsey Kuper (Scheme) Lindsey Kuper wrote a series of solutions in the Racket dialect of Scheme, and discussed them on her blog along with some other people’s work. M. Kuper's first draft was 92 lines long (counting whitespace) and when I saw it I said “Gosh, that is way too much code” and tried writing my own in Scheme. It was about the same size. (My Perl solution is also not significantly smaller.) #### Martin Janecke (PHP) I saved the best for last. Martin Janecke showed me an almost flawless solution in PHP that uses a completely different approach than anyone else's program. Instead of writing a lot of code for generating permutations of the input, M. Janecke just hardcoded them:$zahlen = [
[2, 5, 6, 6],
[2, 6, 5, 6],
[2, 6, 6, 5],
[5, 2, 6, 6],
[5, 6, 2, 6],
[5, 6, 6, 2],
[6, 2, 5, 6],
[6, 2, 6, 5],
[6, 5, 2, 6],
[6, 5, 6, 2],
[6, 6, 2, 5],
[6, 6, 5, 2]
]

Then three nested loops generate the selections of operators:

$operatoren = []; foreach (['+', '-', '*', '/'] as$x) {
foreach (['+', '-', '*', '/'] as $y) { foreach (['+', '-', '*', '/'] as$z) {
$operatoren[] = [$x, $y,$z];
}
}
}

Expressions are constructed from templates:

$klammern = [ '%d %s %d %s %d %s %d', '(%d %s %d) %s %d %s %d', '%d %s (%d %s %d) %s %d', '%d %s %d %s (%d %s %d)', '(%d %s %d) %s (%d %s %d)', '(%d %s %d %s %d) %s %d', '%d %s (%d %s %d %s %d)', '((%d %s %d) %s %d) %s %d', '(%d %s (%d %s %d)) %s %d', '%d %s ((%d %s %d) %s %d)', '%d %s (%d %s (%d %s %d))' ]; (I don't think those templates are all necessary, but hey, whatever.) Finally, another set of nested loops matches each ordering of the input numbers with each selection of operators, uses sprintf to plug the numbers and operators into each possible expression template, and uses @eval to evaluate the resulting expression to see if it has the right value: foreach ($zahlen as list ($a,$b, $c,$d)) {
foreach ($operatoren as list ($x, $y,$z)) {
foreach ($klammern as$vorlage) {
$term = sprintf ($vorlage, $a,$x, $b,$y, $c,$z, $d); if (17 == @eval ("return$term;")) {
print ("$term = 17\n"); } } } } If loving this is wrong, I don't want to be right. It certainly satisfies Larry Wall's criterion of solving the problem before your boss fires you. The same approach is possible in most reasonable languages, and some unreasonable ones, but not in Haskell, which was specifically constructed to make this approach as difficult as possible. M. Janecke wrote up a blog article about this, in German. He says “It's not an elegant program and PHP is probably not an obvious choice for arithmetic puzzles, but I think it works.” Indeed it does. Note that the use of @eval traps the division-by-zero exceptions, but unfortunately falls foul of floating-point roundoff errors. ## Thanks Thanks to everyone who discussed this with me. In addition to the people above, thanks to Stephen Tu, Smylers, Michael Malis, Kyle Littler, Jesse Chen, Darius Bacon, Michael Robert Arntzenius, and anyone else I forgot. (If I forgot you and you want me to add you to this list, please drop me a note.) ## Coming up I have enough material for at least three or four more articles about this that I hope to publish here in the coming weeks. But the previous article on this subject ended similarly, saying I hope to write a longer article about solvers in the next week or so. and that was in July 2016, so don't hold your breath. ### Robert Harper # A “proof by contradiction” is not a proof that ends with a contradiction It is well-known that constructivists (supposedly) renounce “proof by contradiction”, and that classicists scoff at the critique. “Those fools,” the criticism goes, “want to rule out proofs by contradiction. How absurd! Look, Pythagoras showed that the square root of two is irrational by deriving a contradiction from the assumption that it is rational. There is nothing wrong with this. Ignore them!” On careful examination the critique fails miserably, because a proof by contradiction is not a proof that derives a contradiction. Pythagoras’s proof is perfectly valid, one of the eternal gems of mathematics. No one, but no one, questions the validity of that line of argument. Despite the righteous fury, no punch has landed. We may as well laugh at the flat earthers, who are of course dead wrong, but also nonexistent. Pythagoras’s Theorem expresses a negation: it is not the case that the square root of two can be expressed as the ratio of two integers. Of course one proves negative statements by contradiction! Assume that it can be so represented. A quick deduction shows that this is impossible. So the assumption is false. Done. This is a direct proof of a negative assertion; it is not a “proof by contradiction”, which is also known as an indirect proof. What, then, is a proof by contradiction? It is the affirmation of a positive statement by refutation of its denial. It is a direct proof of the negation of a negative assertion that is then pressed into service as a direct proof of the assertion, which it is not. Anyone is free to ignore the distinction for the sake of convenience, as a philosophical issue, or as a sly use of “goto” in a proof, but the distinction nevertheless exists and is important. Indeed, part of the beauty of constructive mathematics is that one can draw such distinctions, for, once drawn, one can selectively disregard them. Once blurred, forever blurred, a pure loss of expressiveness. For the sake of clarity, let me rehearse a standard example of a proof by contradiction. The claim is that there exists irrationals a and b such that a to the b power is rational. Here is an indirect proof, a true proof by contradiction. Move number one, let us prove instead that it is impossible that any two irrationals a and b are such that a to the b is irrational. This is a negative statement, so of course one proves it be deriving a contradiction from it. But it is not the original statement! This will be clear from examining the information content of the proof. Suppose, for a contradiction, that every two irrationals a and b are such that a to the b power is irrational. We know from Pythagoras that root two is irrational, so plug it in for both a and b, and conclude that root two to the root two power is irrational. Now use the assumption again, taking a to be root two to the root two, and b to be root two. Calculate a to the power of b, it is two, which is eminently rational. Contradiction. We have now proved that it is not the case that every pair of irrationals, when exponentiated, give an irrational. There is nothing questionable about this proof as far as I am aware. But it does not prove that there are two irrationals whose exponent is rational! If you think it does, then I ask you, please name them for me. That information is not in this proof (there are other proofs that do name them, but that is not relevant for my purposes). You may, if you wish, disregard the distinction I am drawing, that is your prerogative, and neither I nor anyone has any problem with that. But you cannot claim that it is a direct proof, it is rather an indirect proof, that proceeds by refuting the negative of the intended assertion. So why am I writing this? Because I have learned, to my dismay, that in U.S. computer science departments–of all places!–students are being taught, erroneously, that any proof that derives a contradiction is a “proof by contradiction”. It is not. Any proof of a negative must proceed by contradiction. A proof by contradiction in the long-established sense of the term is, contrarily, an indirect proof of a positive by refutation of the negative. This distinction is important, even if you want to “mod out” by it in your work, for it is only by drawing the distinction that one can even define the equivalence with which to quotient. That’s my main point. But for those who may not be familiar with the distinction between direct and indirect proof, let me take the opportunity to comment on why one might care to draw such a distinction. It is entirely a matter of intellectual honesty: the information content of the foregoing indirect proof does not fulfill the expectation stated in the theorem. It is a kind of boast, an overstatement, to claim otherwise. Compare the original statement with the reformulation used in the proof. The claim that it is not the case that every pair of irrationals exponentiate to an irrational is uncontroversial. The proof proves it directly, and there is nothing particularly surprising about it. One would even wonder why anyone would bother to state it. Yet the supposedly equivalent claim stated at the outset appears much more fascinating, because most people cannot easily think up an example of two irrationals that exponentiate to rationals. Nor does the proof provide one. Once, when shown the indirect proof, a student of mine blurted out “oh that’s so cheap.” Precisely. Who cares about the information content of proofs? You may not, but I do. It matters, at least in computer science, because proofs are programs and propositions are specifications. You may well not care, as you are well entitled to do. But you may also find one day that you do care. For example, Vladimir Voevodsky’s univalence principle, which codifies classical mathematical practices, is only sensible in a constructive framework in which the distinction cannot be disregarded. Don’t be so sure you don’t care! Filed under: Programming, Research, Teaching ### Brent Yorgey # Deep work and email habits: a followup About six months ago I wrote a post about some new work habits I adopted, inspired by Cal Newport’s blog and by his book, Deep Work. First, I began scheduling blocks of “deep work” time during the week, when I go to the library or a coffee shop and work intensely for several hours with no distractions or interruptions. Second, I decided to read and write email only after 4 pm each day. In my calendar, I put a reminder to write a followup blog post six months later, and here we are! (Has it been six months already!?) My deep work sessions are still going strong, though having this opportunity to reflect has been good: I realized that over the months I have become more lax about using my computer and about what sort of things I am willing to do during my “deep work” sessions. It’s too easy to let them become just a block of time I can use to get done all the urgent things I think I need to get done. Of course, sometimes there are truly urgent things to get done, and having a big block of time to work on them can be great, especially if they require focused effort. But it pays to be more intentional about using the deep work blocks to work on big, long-term projects. The myriad little urgent things will get taken care of some other time, if they’re truly important (or maybe they won’t, if they’re not). Since I’m only teaching two classes this semester, both of which I have taught before, I thought I would have more time for deep work sessions this semester than last, but for some reason it seems I have less. I’m not yet sure whether there’s something I could have done about that, or if the semester just looks different than I expected. This semester has also seen more unavoidable conflicts with my deep work blocks. Usually, I try to keep my scheduled deep work blocks sacrosanct, but I have made some exceptions this semester: for example, search committee meetings are quite important and also extremely difficult to schedule, so I let them be scheduled over top of my deep work blocks if necessary. (But it sure does wreak havoc on my work that week.) I’m also still blocking my email before 4pm. On the one hand, I know this is helping a lot with my productivity and general level of anxiety. Recently I needed to (or thought I needed to!) briefly unblock my email during the day to check whether I had received a certain reply, and I specifically noticed how my anxiety level shot up as soon as I opened my inbox and saw all the messages there—a good reminder of why I have my email blocked in the first place. On the other hand, it can be frustrating, since the hour from 4-5 is often taken up with other things, so email gets pushed to the evening, or to the next day. When this goes on several days in a row it really doesn’t help my anxiety level to know there are emails sitting there that I ought to respond to. So perhaps there might be a better time to process my email than 4-5, but to be honest I am not sure what it would be. I certainly don’t want to do it first thing in the morning, and the middle of the day is not really any better, schedule-wise, than the end. In any case, I intend to keep doing it until a better idea comes along. ## March 03, 2017 ### Functional Jobs # Functional Game Server (Gameplay) Engineer at Playstudios (Full-time) Game Server Engineer (Gameplay) PLAYSTUDIOS is seeking a Game Server Engineer to join the server development team in Austin. This role focuses on creating and improving server-side game logic. Millions of people play our games daily. The Austin team builds the reliable, large-scale production systems that deliver that gameplay and the tools to manage them. Usability and stability underpin every decision we make and we’re looking for someone who can help us build solutions to empower our users. The Game Server Engineer will work directly with business stakeholders to define features, Project Managers to scope and schedule deliverables, and QA Testers to achieve high quality products. PLAYSTUDIOS has a deep passion for quality and we want someone with a high personal bar. Responsibilities: o Develop server-side game-logic using C# for our suite of online social-casino games. o Create and enhance testing tools using C# and F# to verify games' mathematical integrity and client-facing APIs. o Combine sophisticated social-gaming elements into the core casino games. o Collaborate with game designers and mathematicians. o Work with project manager to break down high-level goals into tasks and timelines. Requirements: o Expert knowledge of C# of at least 6 years. o Experience with LINQ, functional programming, or the use of first class functions. o Understanding of client-server architecture in RESTful or similar request-response systems. o Thorough understanding of HTTP technologies and REST/JSON interfaces. o Excellent communication and organization skills. o Proven ability to build high quality products. o BS in Computer Science or similar. Candidates should possess at least three of the following: o Game development experience with shipped titles. o Development experience using ASP.NET Web API 2. o Experience architecting highly scalable backend systems. o Experience using both SQL and NoSQL databases. o Functional programming experience. Bonus: o Competence in F#, Haskell, OCaml, Scala, Clojure, Scheme, or Common Lisp. o Experience in mathematical programming. o Experience with Couchbase or other NoSQL database. o Familiarity with agile development processes. o Experience writing and maintaining unit tests. About PLAYSTUDIOS PLAYSTUDIOS is a developer of engaging casual games for the world’s largest social and mobile platforms. Founded by a team of experienced gaming and technology entrepreneurs, PLAYSTUDIOS’ first free-to-play applications myVEGAS Slots & myVEGAS Blackjack combine the best elements of popular social games with established gambling mechanics. Players enjoy an ever-growing collection of slot and table games and the opportunity to earn an unprecedented selection of valuable real-world Rewards. Benefits and Perks o 100% health benefit coverage for you and your dependents! o 401K plan and stock options. o Flexible vacation policy. o Great downtown Austin location with extensive nearby dining options and vibrant nightlife. o Employee-driven entertainment, happy hours, and team-building events. o Catered lunches multiple times per week. o Snacks and drinks available in the kitchen. o Casual startup-like environment. o An open, collaborative work space. o Commuter benefits program. LOCAL US CANDIDATES ONLY Get information on how to apply for this position. ### FP Complete # Continuous Integration: an overview While exact definitions of Continuous Integration vary, the idea is that your software project is automatically built and tested very regularly, often many times per day. In practice, this usually means every time a developer pushes their code to a central repository, the CI process is performed. Ideally, this is performed on every branch, even those that are works in progress. Advantages of continuous integration include: • Regressions are caught quickly, so less time is spent tracking them down since you know which change introduced the bug. • There is always a working "latest version", rather than finding out near release time that it turns out there are problems. • Prevents non-working code from reaching the "mainline." • Notifies developers immediately when code doesn't work. • Automates deployment of your application so you always have a running test server with the latest code, or even deploy straight to production. • Encourages developers to frequently share their work in progress. • Encourages developers to write automated tests. ## Required tools ### Source code management We manage our code in Git repositories, although other source code management systems will work as well. ### Centralized source code repository Developers must regularly "push" their code from their workstations to a centralized repository. These can be managed solutions, or privately hosted within a company's network, depending on specific project needs. We most commonly use Github, Gitlab, and Atlassian Bitbucket. ### CI server The CI server "watches" for changes to the centralized repository and automatically executes scripts for each change, and provides notifications and reports based on whether those scripts succeed or fail. We regularly use: • Travis CI: tightly integrated with Github, although it is a separate product operated by a different company • Gitlab CI: built directly into Gitlab for a totally seamless experience when using that tool for code repositories and issue management • Jenkins CI: one of the older CI servers, with a vast array of plugins that allow it to work with in conceivable environment, although tight integration can be tricky to configure • Atlassian Bamboo: part of the Atlassian suite, and tightly integrated with their other tools like Bitbucket and JIRA but also usable independently ## Features of CI server While every CI server is different, they offer a similar array of features (some integrated directly, others via plugins). ### Continuous deployment The CI process can automatically deploy your project for testing. This is always a good idea for a test environment, and also used for automatically deploying to production from a protected branch. Gitlab CI also supports "review apps", which will deploy a copy of the app for every branch, which is especially helpful for code reviews since the reviewer can work with a "live" version of the application built from the code that is under review, without first needing to build their own copy. ### Containerized builds Builds can each be run in their own throw-away Docker containers. This means the build environments (e.g. operating system, compilers, system libraries) are defined by Docker images, and multiple projects with different requirements can be built on the same machine and in parallel without interfering with each other. ### Parallel and distributed builds CI jobs are performed by "agents" (also known as "slaves" or "runners"), and multiple agents can be connected to a single CI server or project. This allows setting up a cluster of machines to perform run builds. ### Multi-platform Different projects or build stages within projects can be configured to require certain attributes in the agent where it runs. For example, a multi-platform project might have agents on Windows, Linux, and macOS, and run the build and tests on all three. ### Notification Notification integrates with your existing communications tools, such as e-mail and corporate chat services like Slack. Developers are notified of build failures, successful deployments, and many other events. ### Reporting Various reports are generated that provide a snapshot of your projects' health, so you can see at a glance whether there are increased rates of regressions, and you can see exactly which version of code is running where. ### Secret variables The CI server manages secret values such as deployment credentials, so that the builds have access to them but they aren't revealed otherwise. Access to the credentials can be restricted so that only some build stages can access them, which lets you prevent "untrusted" stages from having access to the credentials. ### Artifacts CI systems let you specify build "artifacts" which are saved by a build stage and can then be accessed by later stages or outside processes. For example, you might have a "build" stage that builds an executable and saves it as an artifact, followed by "deploy" stage that retrieves the artifact and deploys the executable to a web server. These artifacts are saved so that you can go back to an old build and retrieve its the file(s). ### Caching One disadvantage of building in an ephemeral container is that build artifacts (e.g. intermediate files like object code) are lost between builds, which would result in unnecessarily long build times as the same code is built over and over. CI systems support specifying certain paths and files to cache between builds, so that those files will be restored in their original locations in the next build. There is some overlap between caching and artifacts, but caching is meant for intermediate files to speed up subsequent builds, while artifacts are meant to save the final results of a build stage. ## Principles Continuous integration is a tool that can be wielded in many ways. These are some of the principles we follow to make the best use of it. ### Avoid complexity in the CI configuration It can be tempting to use every feature and plugin of a CI system to manage builds, but this is usually counterproductive. In general, let the CI system handle the "where and when" of building, but use your own scripts within the repository for the "how". This lets developers use the same scripts locally and makes it easier to switch to a different CI system in the future should that be desirable. It also means as much of the build process as possible is versioned along with the code which makes it easier to build older versions. ### Write automated tests While a CI system is still useful without automated tests, it really shines when an excellent suite of unit and integration tests is in place. You get quick feedback as soon as tests fail, and this kind of feedback encourages developers to write more tests. This also avoids code that fails tests from reaching the mainline. ### Make builds and tests fast If it takes too long to get feedback, it discourages regular use of the CI system. Features like caching, artifacts, and distributed builds mean you can avoid repeating unnecessary work and speed up builds. For larger and more complex projects this can be difficult, and sometimes it makes sense to split out more thorough integration tests into a separate nightly process so that the main tests return quickly. ### Commit, merge, and push code regularly Developers are encouraged to create feature branches for work in progress and push to them regularly. This gives feedback from the CI system regularly, and also makes collaboration easier. Merging from the "mainline" branch to feature branches should be frequent to avoid complex conflict resolutions. CI systems can also be configured to run a feature branches build as though "mainline" was merged so that developers know as soon as potential conflicts and test failures arise, without having to perform the merge themselves first. ## Conclusion Employing continuous integration makes your development team more productive and your release process less stressful. FP Complete is in the business of modernizing development and devops practices ### Manuel M T Chakravarty # Streaming Irregular Arrays In the new draft paper Streaming Irregular Arrays, we extend the high-performance array library Accelerate for Haskell with support for irregular streams of multidimensional arrays. We discuss why this is useful to work with large datasets on memory-constrained devices, such as GPUs, and how it widens the range of applications for which we achieve portable parallelism with Accelerate. The paper explains how we compile these programs and what runtime mechanisms we need for execution. Moreover, it includes a range of benchmarks to evaluate the performance of the resulting code. ## March 01, 2017 ### Douglas M. Auclair (geophf) # February 2017 1HaskellADay Problems and Solutions ## February 26, 2017 ### Daniel Mlot (duplode) # Casual Hacking With stack, Reloaded It has been quite a while since I wrote about how to use stack for casual play outside of the context of a conventional Haskell project. In the meantime, stack has gained a feature called the global project which in many cases makes it possible to do quick experiments with essentially no setup, while still taking advantage of the infrastructure provided through stack. The global project consists of a stack.yaml file and an associated .stack-work directory, which are kept in ~/.stack/global-project and are used by stack whenever there is no other stack.yaml lying around. The stack.yaml of the global project specifies a resolver, just like any other stack.yaml. If said resolver is a snapshot you use elsewhere, you get access to all packages you have installed from that snapshot with zero configuration.$ pwd
/home/duplode
$ls -lrt | grep stack.yaml$ stack ghci
Configuring GHCi with the following packages:
GHCi, version 8.0.1: http://www.haskell.org/ghc/  :? for help
GHCi> import Control.Lens
GHCi> (1,2) ^. _1
1

By the way, this also holds for the stack-powered Intero Emacs mode, which makes it possible to simply open a new *.hs file anywhere and immediately start hacking away.

What about packages you didn’t install beforehand? They are no problem, thanks to the --package option of stack ghci, which allows installing snapshot packages at a whim.

$stack ghci --package fmlist fmlist-0.9: download fmlist-0.9: configure fmlist-0.9: build fmlist-0.9: copy/register Configuring GHCi with the following packages: GHCi, version 8.0.1: http://www.haskell.org/ghc/ :? for help Loaded GHCi configuration from /home/duplode/.ghci Loaded GHCi configuration from /tmp/ghci22828/ghci-script GHCi> import qualified Data.FMList as FM GHCi> FM.foldMapA (\x -> show <$> [0..x]) [0..3]
["0000","0001","0002","0003","0010","0011","0012","0013","0020","0021",
"0022","0023","0100","0101","0102","0103","0110","0111","0112","0113",
"0120","0121","0122","0123"]

One caveat is that --package won’t install packages outside of the snapshot on its own, so you have to add them to the extra-deps field of the global project’s stack.yaml beforehand, just like you would do for an actual project. If you need several non-Stackage packages, you may find it convenient to create a throwaway project for the sole purpose of letting stack solver figure out the necessary extra-deps for you.

$mkdir throwaway$ stack new throwaway --resolver lts-7.14 # Same resolver of the global project.
# ...
Writing configuration to file: throwaway/stack.yaml
All done.
$cd throwaway$ vi throwaway.cabal # Let's add reactive-banana to the dependencies.
$stack solver # ... Successfully determined a build plan with 2 external dependencies. The following changes will be made to stack.yaml: * Dependencies to be added extra-deps: - pqueue-1.3.2 - reactive-banana-1.1.0.1 To automatically update stack.yaml, rerun with '--update-config'$ vi ~/.stack/global-project/stack.yaml # Add the packages to the extra-deps.
$cd ..$ rm -rf throwaway/
$stack ghci --package reactive-banana pqueue-1.3.2: configure pqueue-1.3.2: build pqueue-1.3.2: copy/register reactive-banana-1.1.0.1: configure reactive-banana-1.1.0.1: build reactive-banana-1.1.0.1: copy/register Completed 2 action(s). Configuring GHCi with the following packages: GHCi, version 8.0.1: http://www.haskell.org/ghc/ :? for help Loaded GHCi configuration from /home/duplode/.ghci Loaded GHCi configuration from /tmp/ghci23103/ghci-script GHCi> import Reactive.Banana GHCi> :t stepper stepper :: MonadMoment m => a -> Event a -> m (Behavior a) Support for running stack solver directly with the global project is on the horizon. There are also interesting possibilities if you need to compile your throwaway code. That might be useful, for instance, if you ever feel like testing a hypothesis with a criterion benchmark). While there is a stack ghc command, if you don’t need GHC profiles then taking advantage of --ghci-options to enable -fobject-code for stack ghci can be a more pleasant alternative.$ stack ghci --ghci-options "-O2 -fobject-code"
Configuring GHCi with the following packages:
GHCi, version 8.0.1: http://www.haskell.org/ghc/  :? for help
GHCi> :l Foo.hs
[1 of 1] Compiling Foo              ( Foo.hs, /home/duplode/.stack/global-project/.stack-work/odir/Foo.o )
GHCi> :main
A random number for you: 2045528912275320075

A nice little thing about this approach is that the build artifacts are kept in the global project’s .stack-work, which means they won’t pollute whichever other directory you happen to be at. -fobject-code means you can’t write definitions directly on the GHCi prompt; however, that is not much of a nuisance, given that you are compiling the code anyway, and that the source file is just a :!vim Foo.hs away.

While in these notes I have focused on seat-of-the-pants experimentation, stack also provides tools for running Haskell code with minimal configuration in a more controlled manner. I specially recommend having a look at the script interpreter section of the stack User Guide.

(see the full post for a reddit link)

# Software Engineer/Researcher at Galois, Inc. (Full-time)

We are currently seeking software engineers/researchers to play a pivotal role in fulfilling our mission to make critical systems trustworthy.

Galois engineers participate in one or more projects concurrently, and specific roles vary greatly according to skills, interests, and company needs. Your role may include technology research and development, requirements gathering, implementation, testing, formal verification, infrastructure development, project leadership, and/or supporting new business development.

Skills & Requirements

• Education: Minimum of a Bachelor’s degree in computer science or equivalent. MS or PhD in CS or a related field desirable but optional, depending on specific role.
• Required Technical Expertise: Must have hands-on experience developing software and/or performing computer science research. Demonstrated expertise in aspects of software development mentioned above.
• Required General Skills: Must work well with customers, including building rapport, identifying needs, and communicating with strong written, verbal, and presentation skills. Must be highly motivated and able to self-manage to deadlines and quality goals.

Our engineers use tools such as functional programming languages (including Haskell) and formal verification techniques to design and develop advanced technologies for safety- and security-critical systems, networks, and applications. Our research areas include computer security, cyber physical systems, identity management, security risk analysis, machine learning, systems software, and networking. Engineers work in small team settings and must successfully interact with clients, partners, and other employees in a highly cooperative, collaborative, and intellectually challenging environment.

We’re looking for people who can invent, learn, think, and inspire. We reward creativity and thrive on collaboration. If you are interested, please submit your cover letter and resume via out website

Get information on how to apply for this position.

# Software Engineer - Functional Programming at NAVIS (Full-time)

NAVIS is looking for a passionate software developer with experience in functional programming to join our talented Engineering Team.

Unfortunately, you won't get to save baby seals in this position, but we are big fans of them.

We need someone who's naturally-inclined to build quality software from the start using strong design patterns and TDD. We need someone who works well alone AND collaboratively with an Agile team.

NAVIS is building a suite of SaaS products and an underlying data platform to continue it’s impressive growth trajectory and propel the company to new levels of success. We are seeking ambitious software professionals (with varying levels of experience) to engineer our products, platform, and frameworks. Come be a part of making that happen!

You would be working as part of a small, cross-functional, Agile team of Software Engineers, QA Analysts, and Product Management to build out our product suite.

Our Engineering department sits in an open-concept work space to foster teamwork and collaboration.

KEY RESPONSIBILITIES & EXPECTATIONS:

• In close coordination with your team, design and build amazing software

• Consistently apply existing design and development patterns, extending/improving them as warranted

• Fully embrace the BDD approach to development

• Participate in the full SDLC – from idea formulation to supporting our production environments

QUALIFICATIONS:

• BS in Computer Science preferred or equivalent experience

• 2+ years in web-based application development (front-end and/or back-end)

• 2+ years experience working within product development teams

• Functional programming experience, particularly Clojure / ClojureScript, or another language such as Scala, OCAML, ML, Lisp, F#, and/or Haskell

• Demonstrated ability to develop rich, efficient code

• Propensity toward re-use

• Experience in full-stack web development

"EXTRA CREDIT":

• Practical experience with one or more front-end frameworks: Ember, Backbone, React

• Exposure/experience with both PostgreSQL and SQL data stores

• Experience building multi-tennant SaaS solutions

• Experience leveraging AWS hosting services, deployment pipeline

• Contributions to open source community

• Experience with an iterative development methodology (e.g. Scrum, Kanban) and techniques for sustaining rapid release cycles (e.g. TDD, BDD)

WHY WOULD YOU WANT TO WORK FOR NAVIS?

• Check out our Company Page on StackOverflow: http://careers.stackoverflow.com/company/http-www-navisresorts-com-

• Our technology is blossoming and we are evaluating and already using some of the latest and greatest technologies on the market today.

• NAVIS is growing fast due to high market demand for our products. Our products are successful because they truly help our clients succeed and grow. We demonstrate our value to our clients every single day.

• We are building new technologies and enhancing existing ones - we are NOT standing still. We are re-architecting our products from the UI through the database. This means we need strong, creative, intelligent developers with great ideas that love to build awesome software products.

• The Oregonian and the Orlando Sentinel have listed NAVIS as a "Top Place to Work" for FIVE YEARS RUNNING!

• We live our core values - they are not window dressing.

• Strong technical leadership in a team-based atmosphere

• Local, established Oregon-founded and based company

• Healthy financial foundation

• We give generously to local charities and offer opportunities to participate to directly help those in need

• Very competitive salaries with a 10% annual bonus target and a full benefits package, including matching 401(k), and paid time off (including Voluntary Time Off so you can donate your time to your favorite charity)

• You get to LIVE and WORK in Central Oregon - what could be better than that?

The NAVIS Core Values are: Golden Rule: Treat others as you would want to be treated Integrity: A person of your word, highly trusted Innovation: Open and involved in creating or executing on "new" Passion: Love the TEAM, the clients and the work we do Attitude: Consistently display a positive, can-do attitude

LOCATION OF THIS POSITION: To foster a highly-collaborative team environment, this position will be located in beautiful Bend, Oregon.

ABOUT NAVIS: Based in Bend, Oregon, NAVIS is the leading provider of sales and marketing solutions to hotels, resorts and vacation rental management companies in North America. Building on our rich 28-year heritage with humble beginnings, NAVIS is strategically focused on the critical value of providing accurate, timely data for our clients. Our clients view NAVIS as the best source of solutions, and employees view NAVIS as THE best place to work.

NAVIS is and Equal Opportunity Employer(EOE).​

Get information on how to apply for this position.

# The typed-process library

In October of last year, I published a new library - typed-process. It builds on top of the veritable process package, and provides an alternative API (which I'll explain in a bit). It's not the first time I've written such a wrapper library; I first did so when creating Data.Conduit.Process, which is just a thin wraper around Data.Streaming.Process.

With this proliferation of APIs, why did I go for another one? With Data.(Conduit/Streaming).Process, I tried to stay as close as possible to the underlying process API. And the underlying process API is rigid for (at least) two reasons:

• It's one of the most used APIs in the Haskell ecosystem, so breaking changes carry a very high cost
• Since process is a dependency of GHC itself (a boot library), we're limited in adding dependencies

After I got sufficiently fed up with limitations in the existing APIs, I decided to take a crack at doing it all from scratch. I made a small announcement on Twitter, and have been using this library regularly since its release. In addition, a few people have raised questions on the process issue tracker whose simplest answer is IMO "use typed-process." Therefore, I think now's a good time to discuss the library more publicly and get some feedback as to what to do with it.

## Overview of typed-process

There is both a typed-process tutorial and Haddock documentation available. If you want details, you should read those. This section is intended to give a little taste of typed-process to set the stage for the rest of the post.

Everything starts with the ProcessConfig datatype, which specified all the rules for how we're going to run an external process. This includes all of the common settings from the CreateProcess type in the process package, like changing the working directory or environment variables. Importantly (and the source of the "typed" in the library name), ProcessConfig takes three type parameters, representing the type of the three standard streams (input, output, and error). For example, ProcessConfig Handle Handle Handle indicates that all three streams will have Handles, whereas ProcessConfig () (STM ByteString) () indicates that input and error will be unit, but output can be access as an STM action which returns a ByteString. (Much more on this later.)

There are multiple helper functions - like withProcess or readProcess - to take a ProcessConfig and turn it into a live, running process. These running processes are represented by the Process type, which like ProcessConfig takes three type parameters. There are underscore variants of these launch functions (like withProcess_ and readProcess_) to automatically check the exit code of a process and, if unsuccessful, throw a runtime exception.

You can access the exit code of a process with waitExitCode and getExitCode, which are blocking and non-blocking, respectively. These functions also come in STM variants to more easily work with processes from atomic sections of code.

Alright, enough overview, let's start talking about motivation.

## Downsides of process

The typed-process tutorial identifies five limitations in the process library that I wanted to overcome. (There's also a sixth issue I'm aware of, a race condition, which I've added as a bonus section.) Let's dive into these more deeply, and see how typed-process addresses them.

### Type variables

I've made a big deal about type variables so far. I believe this is the biggest driving force behind the more usable API in typed-process. Let's consider some idiomatic process-based code.

#!/usr/bin/env stack
-- stack --install-ghc --resolver lts-8.0 runghc
import Control.Exception
import System.Process
import System.IO
import System.Exit

main :: IO ()
main = do
(Just inh, Just outh, Nothing, ph) <- createProcess
(proc "cat" ["-", "/usr/share/dict/words"])
{ std_in = CreatePipe
, std_out = CreatePipe
}
hPutStrLn inh "This is the list of all words:"
hClose inh
out <- hGetContents outh
evaluate $length out -- lazy I/O :( mapM_ putStrLn$ take 100 $lines out ec <- waitForProcess ph if (ec == ExitSuccess) then return () else error$ "cat process failed: " ++ show ec

The fact that std_in and std_out specify the creation of a Handle is not reflected in the types at all. If we left those changes out, our program would still compile, but our pattern match of (Just inh, Just outh would fail. By moving this information into the type system, we can catch bugs at compile time. Here's the equivalent code as above:

#!/usr/bin/env stack
-- stack --install-ghc --resolver lts-8.0 runghc --package typed-process
import Control.Exception
import System.Process.Typed
import System.IO

main :: IO ()
main = do
let procConf = setStdin createPipe
$setStdout createPipe$ proc "cat" ["-", "/usr/share/dict/words"]
withProcess_ procConf $\p -> do hPutStrLn (getStdin p) "This is the list of all words:" hClose$ getStdin p
out <- hGetContents $getStdout p evaluate$ length out -- lazy I/O :(
mapM_ putStrLn $take 100$ lines out

If you leave off the setStdin or setStdout calls, the program will not compile. But this is only the beginning. Instead of being limited to either generating a Handle or not, we now have huge amounts of flexibility in how we configure our streams. For example, here's an alternative approach to providing standard input to the process:

#!/usr/bin/env stack
-- stack --install-ghc --resolver lts-8.0 runghc --package typed-process
import Control.Exception
import System.Process.Typed
import System.IO

main :: IO ()
main = do
let procConf = setStdin (byteStringInput "This is the list of all words:\n")
$setStdout createPipe$ proc "cat" ["-", "/usr/share/dict/words"]
withProcess_ procConf $\p -> do out <- hGetContents$ getStdout p
evaluate $length out -- lazy I/O :( mapM_ putStrLn$ take 100 $lines out There are functions in the process package that allow specifying standard input this easily, but they are not as composable as this approach (as we'll discuss below). There's much more to be said about these type parameters, but hopefully this taste, plus the further examples in this post, will demonstrate their usefulness. ### Proper concurrency Functions like readProcessWithExitCode use some pretty hairy (IMO) lazy I/O tricks internally to read the output and error streams from a process. For the most part, you can simply use these functions without worrying about the crazy innards. However, consider if you want to do something off the beaten track, like capture the error stream while allowing the output stream to go to the parent process's stdout. There's no built-in function in process to handle that, so you'll be stuck implementing that behavior. And this functionality is far from trivial to get right. By contrast, typed-process does not use any lazy I/O. And while it provides a readProcess function, there's nothing magical about it; it's built on top of the byteStringOutput stream config, which uses proper threading under the surface and provides its output via STM for even nicer concurrent coding. #!/usr/bin/env stack -- stack --install-ghc --resolver lts-8.0 runghc --package typed-process {-# LANGUAGE OverloadedStrings #-} import Control.Concurrent.STM (atomically) import System.Process.Typed import qualified Data.ByteString.Lazy.Char8 as L8 main :: IO () main = do let procConf = setStdin closed$ setStderr byteStringOutput
$proc "stack" ["path", "--verbose"] err <- withProcess_ procConf$ atomically . getStderr
putStrLn "\n\n\nCaptured the following stderr:\n\n"
L8.putStrLn err

### STM

I won't dwell much on this one, since the benefits are less commonly useful. Since many functions in typed-process provide both IO and STM alternatives, it can significantly simplify some concurrent algorithms by letting you keep more logic within an atomic block. This is similar to (and inspired by) the design choices in the async library, which is my favorite library of all time.

### Binary I/O

All input and output in typed-process works on binary data as ByteStrings, instead of textual String data. This is:

### More composable

A major goal of this library has been to be as composable as possible. I've been frustrated by two issues in the process package:

1. Many common changes to the API necessitate a breaking API change (e.g., the addition of the child_group setting or NoStream constructor)
2. There is a big split between helper functions that work on CreateProcess values (like readCreateProcess) and those that work on raw command/argument pairs (like readProcess). The situation has improved in recent releases, but in older process releases, the lack of CreateProcess variants of many functions made it very difficult to both modify the environment/working directory for a process and capture its output or error.

For (1), I've gone the route of smart constructors throughout the API. You cannot access the ProcessConfig data constructor, but instead must use proc, shell, or OverloadedStrings. Instead of record accessors, there are setter and getter functions. And instead of a hard-coded list of stream types via a set of data constructors, you can create arbitrary StreamSpecs via the mkStreamSpec function. I hope this turns out to be an API that is resilient to breaking changes.

For (2), the solution is easy: all launch functions in typed-process work exclusively on ProcessConfig. Problem solved. We now have a very clear breakdown in the API: first you configure everything you want about your process, and then you choose whichever launch function makes the most sense to you.

### Bonus: Race condition

There's a long standing race condition in process - which will hopefully be resolved soon - that introduces a race condition on waiting for child processes. In typed-process, we've avoided this entirely with a different approach to child process exit codes. Namely: we fork a separate thread to wait for the process and fill an STM TMVar, which both ensures no race condition and makes it possible to observe the process exiting from within an atomic block.

As a side benefit, this also avoids the possibility of accidentally creating zombie processes by not getting the process's exit code when it finishes. Similarly, by encouraging the bracket pattern (via withProcess) when interacting with a process, killing off child processes in the case of exceptions happens far more reliably.

## Limitations

For the most part, I have not run into significant limitations with typed-process so far. The biggest annoyances I have with it are those inherited from process, specifically that command line arguments and environment variables are specified as Strings, leading to some character encoding issues.

I'm certain there are limitations of typed-process versus process. And for others, there may be a higher learning curve with typed-process versus process. I haven't received enough feedback on that yet to assess, however.

The other downside is dependencies, for those who worry about such things. In addition to depending on process itself (and therefore inheriting its dependencies), typed-process depends on async, bytestring, conduit, conduit-extra, exceptions, stm, and transformers. The conduit deps can easily be moved out, it's just for providing a convenience function that could be provided elsewhere. Regarding the others:

• transformers is only needed for MonadIO. Now that MonadIO has moved into base, I could make that dependency conditional.
• The exceptions dependency makes withProcess more general, and would be a shame to lose.
• Dropping async and stm could be done by inlining their code here, which would work, but is a bad idea IMO.

The only reason for considering these changes would be the next section...

## What's next?

I'm left with the question of what to do with this package, especially as more people ask questions that can be answered with "just use typed-process."

• Do nothing. The package can live on Hackage/Stackage as-is, people who want to use it can use it, and that's it.
• Add a note to the package process mentioning it as a potential, alternative API. Even though I'm currently the process package maintainer, I feel it would be inappropriate for me to make such a decision myself.
• Even more radically: if there is strong support for this API, we could consider merging it back into the process package. I wouldn't be in favor of modifying the System.Process module (we should keep it as-is for backwards compatibility), but adding a new module with this API is certainly doable (sans the dependency issues mentioned aboved).

At the very least, this library has scratched a personal itch. If it helps others, that's a great perk :).

# Signed sets and ballots, part 2

Recall, from my previous post, that our goal is to find a combinatorial proof showing the correspondence between signed sets and signed ballots, where a signed set is just a set of $n$ elements, considered positive or negative according to the parity of $n$, and a signed ballot is an ordered list of sets, considered positive or negative according to the parity of the number of sets.

So, how should such a proof look? For a given number of labels $n$, there is a single signed set structure, which is just the set of labels itself (with a sign depending on the parity of $n$). On the other hand, there are lots of ballots on $n$ labels; the key is that some are positive and some are negative, since the sign of the ballots depends on the parity of the number of parts, not the number of labels. For example, consider $n = 3$. There is a single (negative) signed set structure:

(I will use a dashed blue line to indicate negative things, and a solid black line for positive things.)

On the other hand, as we saw last time, there are 13 ballot structures on 3 labels, some positive and some negative:

In this example, it is easy to see that most of the positives and negatives cancel, with exactly one negative ballot left over, which corresponds with the one negative set. As another example, when $n = 4$, there is a single positive set, and 75 signed ballots:

This time it is not quite so easy to tell at a glance (at least not the way I have arranged the ballots in the above picture!), but in fact one can verify that there are exactly 37 negative ballots and 38 positive ones, again cancelling to match the one positive set.

What we need to show, then, is that we can pair up the ballots in such a way that positive ballots are matched with negative ballots, with exactly one ballot of the appropriate sign left to be matched with the one signed set. This is known as a signed involution: an involution is a function which is its own inverse, so it matches things up in pairs; a signed involution sends positive things to negative things and vice versa, except for any fixed points.

In order to do this, we will start by assuming the set of labels is linearly ordered. In one sense this is no big deal, since for any finite set of labels we can always just pick an arbitrary ordering, if there isn’t an “obvious” ordering to use already. On the other hand, it means that the correspondence will be specific to the chosen linear ordering. All other things being equal, we would prefer a correspondence that depends solely on the structure of the ballots, and not on any structure inherent to the labels. I will have quite a bit more to say about this in my third and (probably) final post on the topic. But for today, let’s just see how the correspondence works, given the assumption of a linear order on the labels. I came up with this proof independently while contemplating Anders Claesson’s post, though it turns out that the exact same proof is already in a paper by Claesson and Hannah (in any case it is really just a small lemma, the sort of thing you might give as a homework problem in an undergraduate course on combinatorics).

Given some ballot, find the smallest label. For example, if the labels are $\{1, \dots, n\}$ as in the examples so far, we will find the label $1$.

• If the smallest label is contained in some part together with at least one other label, separate it out into its own part by itself, and put it to the right of its former part. Like this:

• On the other hand, if the smallest label is in a part by itself, merge it with the part on the left (if one exists). This is clearly the inverse of the above operation.

• The only case we haven’t handled is when the smallest label is in a part by itself which is the leftmost part in the ballot. In that case, we leave that part alone, switch to considering the second-smallest label, and recursively carry out the involution on the remainder of the ballot.

For example:

In this case we find the smallest label (1) in a part by itself in the leftmost position, so we leave it where it is and recurse on the remainder of the ballot. Again, we find the smallest remaining label (2) by itself and leftmost, so we recurse again. This time, we find the smallest remaining label (3) in a part with one other label, so we separate it out and place it to the right.

This transformation on ballots is clearly reversible. The only ballots it doesn’t change are ballots with each label in its own singleton part, sorted from smallest to biggest, like this:

In this case the algorithm recurses through the whole ballot and finds each smallest remaining label in the leftmost position, ultimately doing nothing. Notice that a sorted ballot of singletons has the same sign as the signed set on the same labels, namely, $(-1)^n$. In any other case, we can see that the algorithm matches positive ballots to negative and vice versa, since it always changes the number of parts by 1, either splitting one part into two or merging two parts into one.

Here’s my implementation of the involution in Haskell:

type Ballot = [[Int]]

ballotInv :: Ballot -> Ballot
ballotInv = go 1
where
go _ [] = []
go s ([a]:ps)
| s == a = [a] : go (s+1) ps
go s (p:ps)
| s elem p = delete s p : [s] : ps
go s (p:[a]:ps)
| s == a = sort (a:p) : ps
go s (p:ps) = p : go s ps

(The call to sort is not strictly necessary, but I like to keep each part canonically sorted.)

Here again are the 13 signed ballots for $n = 3$, this time arranged so that the pair of ballots in each row correspond to each other under the involution, with the leftover, sorted ballot by itself at the top.

If you’d like to see an illustration of the correspondence for $n = 4$, you can find it here (I didn’t want to include inline since it’s somewhat large).

This completes the proof that signed sets and signed ballots correspond. But did we really need that linear order on the labels? Tune in next time to find out!

# Miscellaneous notes on anagram scoring

My article on finding the best anagram in English was well-received, and I got a number of interesting comments about it.

• A couple of people pointed out that this does nothing to address the issue of multiple-word anagrams. For example it will not discover “I, rearrangement servant / Internet anagram server” True, that is a different problem entirely.

• Markian Gooley informed me that “megachiropteran / cinematographer” has been long known to Scrabble players, and Ben Zimmer pointed out that A. Ross Eckler, unimpressed by “cholecystoduodenostomy / duodenocholecystostomy”, proposed a method almost identical to mine for scoring anagrams in an article in Word Ways in 1976. M. Eckler also mentioned that the “remarkable” “megachiropteran / cinematographer” had been published in 1927 and that “enumeration / mountaineer” (which I also selected as a good example) appeared in the Saturday Evening Post in 1879!

• The Hacker News comments were unusually pleasant and interesting. Several people asked “why didn't you just use the Levenshtein distance”? I don't remember that it ever occured to me, but if it had I would have rejected it right away as being obviously the wrong thing. Remember that my original chunking idea was motivated by the observation that “cholecystoduodenostomy / duodenocholecystostomy” was long but of low quality. Levenshtein distance measures how far every letter has to travel to get to its new place and it seems clear that this would give “cholecystoduodenostomy / duodenocholecystostomy” a high score because most of the letters move a long way.

Hacker News user tyingq tried it anyway, and reported that it produced a poor outcome. The top-scoring pair by Levenshtein distance is “anatomicophysiologic physiologicoanatomic”, which under the chunking method gets a score of 3. Repeat offender “cholecystoduodenostomy / duodenocholecystostomy” only drops to fourth place.

A better idea seems to be Levenshtein score per unit of length, suggested by lobste.rs user cooler_ranch.

• A couple of people complained about my “notaries / senorita” example, rightly observing that “senorita” is properly spelled “señorita”. This bothered me also while I was writing the article. I eventually decided although “notaries” and “señorita” are certainly not anagrams in Spanish (even supposing that “notaries” was a Spanish word, which it isn't) that the spelling of “senorita” without the tilde is a correct alternative in English. (Although I found out later that both the Big Dictionary and American Heritage seem to require the tilde.)

Hacker News user ggambetta observed that while ‘é’ and ‘e’, and ‘ó’ and ‘o’ feel interchangeable in Spanish, ‘ñ’ and ‘n’ do not. I think this is right. The ‘é’ is an ‘e’, but with a mark on it to show you where the stress is in the word. An ‘ñ’ is not like this. It was originally an abbreviation for ‘nn’, introduced in the 18th century. So I thought it might make sense to allow ‘ñ’ to be exchanged for ‘nn’, at least in some cases.

(An analogous situation in German, which may be more familiar, is that it might be reasonable to treat ‘ö’ and ‘ü’ as if they were ‘oe’ and ‘ue’. Also note that in former times, “w” and “uu” were considered interchangeable in English anagrams.)

Unfortunately my Spanish dictionary is small (7,000 words) and of poor quality and I did not find any anagrams of “señorita”. I wish I had something better for you. Also, “señorita” is not one of the cases where it is appropriate to replace “ñ” with “nn”, since it was never spelled “sennorita”.

I wonder why sometimes this sort of complaint seems to me like useless nitpicking, and other times it seems like a serious problem worthy of serious consideration. I will try to think about this.

• Mike Morton, who goes by the anagrammatic nickname of “Mr. Machine Tool”, referred me to his Higgledy-piggledy about megachiropteran / cinematographer, which is worth reading.

• Regarding the maximal independent set algorithm I described yesterday, Shreevatsa R. suggested that it might be conceptually simpler to find the maximal clique in the complement graph. I'm not sure this helps, because the complement graph has a lot more edges than the original. Below right is the complement graph for “acrididae / cidaridae”. I don't think I can pick out the 4-cliques in that graph any more than the independent sets in the graph on the lower-left, and this is an unusually favorable example case for the clique version, because the original graph has an unusually large number of edges.

But perhaps the cliques might be easier to see if you know what to look for: in the right-hand diagram the four nodes on the left are one clique, and the four on the right are the other, whereas in the left-hand diagram the two independent sets are all mixed together.

• An earlier version of the original article mentioned the putative 11-pointer “endometritria / intermediator”. The word “endometritria” seemed pretty strange, and I did look into it before I published the article, but not carefully enough. When Philip Cohen wrote to me to question it, I investigated more carefully, and discovered that it had been an error in an early WordNet release, corrected (to “endometria”) in version 1.6. I didn't remember that I had used WordNet's word lists, but I am not surprised to discover that I did.

A rare printing of Webster's 2¾th American International Lexican includes the word “endometritriostomoscopiotomous” but I suspect that it may be a misprint.

• Philippe Bruhat wrote to inform me of Alain Chevrier’s book notes / sténo, a collection of thematically related anagrams in French. The full text is available online.

• Alexandre Muñiz, who has a really delightful blog, and who makes and sells attractive and clever puzzles of his own invention. pointed out that soapstone teaspoons are available. The perfect gift for the anagram-lover in your life! They are not even expensive.

• Thanks also to Clinton Weir, Simon Tatham, Jon Reeves, Wei-Hwa Huang, and Philip Cohen for their emails about this.

# Moore's law beats a better algorithm

Yesterday I wrote about the project I did in the early 1990s to find the best anagrams. The idea is to give pair of anagram words a score, which is the number of chunks into which you have to divide one word in order to rearrange the chunks to form the other word. This was motivated by the observation that while “cholecysto-duodeno-stomy” and “duodeno-cholecysto-stomy” are very long words that are anagrams of one another, they are not interesting because they require so few chunks that the anagram is obvious. A shorter but much more interesting example is “aspired / diapers”, where the letters get all mixed up.

I wrote:

One could do this with a clever algorithm, if one were available. There is a clever algorithm, based on finding maximal independent sets in a certain graph. I did not find this algorithm at the time; nor did I try. Instead, I used a brute-force search.

I wrote about the brute-force search yesterday. Today I am going to discuss the clever algorithm.

The plan is to convert a pair of anagrams into a graph that expresses the constraints on how the letters can move around when one turns into the other. Shown below is the graph for comparing acrididae (grasshoppers) with cidaridae (sea urchins):

The “2,4” node at the top means that the letters ri at position 2 in acrididae match the letters ri at position 4 in cidaridae; the “3,1” node is for the match between the first id and the first id. The two nodes are connected by an edge to show that the two matchings are incompatible: if you map the ri to the ri, you cannot also map the first id to the first id; instead you have to map the first id to the second one, represented by the node “3,5”, which is not connected to “2,4”. A maximal independent set in this graph is a maximal selection of compatible matchings in the words, which corresponds to a division into the minimum number of chunks.

Usually the graph is much less complicated than this. For simple cases it is empty and the maximal independent set is trivial. This one has two maximal independent sets, one (3,1; 5,5; 6,6; 7,7) corresponding to the obvious minimal splitting:

and the other (2,4; 3,5; 5,1; 6,2) to this other equally-good splitting:

In an earlier draft of yesterday's post, I wrote:

I should probably do this over again, because my listing seems to be incomplete. For example, it omits “spectrum / crumpets” which would have scored 5, because the Webster's Second list contains crumpet but not crumpets.

I was going to leave it at that, but then I did do it over again, and this time around I implemented the “good” algorithm. It was not that hard. The code is on GitHub if you would like to see it.

To solve the maximal independent set instances, I used a guided brute-force search. Maximal independent set is NP-complete, and so the best known algorithm for it runs in exponential time. But the instances in which we are interested here are small enough that this doesn't matter. The example graph above has 8 nodes, so one needs to check at most 256 possible sets to see which is the maximal independent set.

I collated together all the dictionaries I had handy. (I didn't know yet about SCOWL.) These totaled 275,954 words, which is somewhat more than Webster's Second by itself. One of the new dictionaries did contain crumpets so the result does include “spectrum / crumpets”.

The old scored anagram list that I made in the 1990s contained 23,521 pairs. The new one contains 38,333. Unfortunately most of the new stuff is of poor quality, as one would expect. Most of the new words that were missing from my dictionary the first time around are obscure. Perhaps some people would enjoy discovering that that “basiparachromatin” and “Marsipobranchiata” are anagrams, but I find it of very limited appeal.

But the new stuff is not all junk. It includes:

10 antiparticles paternalistic
10 nectarines transience
10 obscurantist subtractions

11 colonialists oscillations
11 derailments streamlined

which I think are pretty good.

I wasn't sure how long the old program had taken to run back in the early nineties, but I was sure it had been at least a couple of hours. The new program processes the 275,954 inputs in about 3.5 seconds. I wished I knew how much of this was due to Moore's law and how much to the improved algorithm, but as I said, the old code was long lost.

But then just as I was finishing up the article, I found the old brute-force code that I thought I had lost! I ran it on the same input, and instead of 3.5 seconds it took just over 4 seconds. So almost all of the gain since the 1990s was from Moore's law, and hardly any was from the “improved” algorithm.

I had written in the earlier article:

In 2016 [ the brute force algorithm ] would probably still [ run ] quicker than implementing the maximal independent set algorithm.

which turned out to be completely true, since implementing the maximal independent set algorithm took me a couple of hours. (Although most of that was building out a graph library because I didn't want to look for one on CPAN.)

But hey, at least the new program is only twice as much code!

[ Addendum: The program had a minor bug: it would disregard capitalization when deciding if two words were anagrams, but then compute the scores with capitals and lowercase letters distinct. So for example Chaenolobus was considered an anagram of unchoosable, but then the Ch in Chaenolobus would not be matched to the ch in unchoosable, resulting in a score of 11 instead of 10. I have corrected the program and the output. Thanks to Philip Cohen for pointing this out. ]

# I found the best anagram in English

I planned to publish this last week sometime but then I wrote a line of code with three errors and that took over the blog.

A few years ago I mentioned in passing that in the 1990s I had constructed a listing of all the anagrams in Webster's Second International dictionary. (The Webster's headword list was available online.)

This was easy to do, even at the time, when the word list itself, at 2.5 megabytes, was a file of significant size. Perl and its cousins were not yet common; in those days I used Awk. But the task is not very different in any reasonable language:

# Process word list
while (my $word = <>) { chomp$word;
my $sorted = join "", sort split //,$word;  # normal form
push @{$anagrams{$sorted}}, $word; } for my$words (values %anagrams) {
print "@$words\n" if @$words > 1;
}

The key technique is to reduce each word to a normal form so that two words have the same normal form if and only if they are anagrams of one another. In this case we do this by sorting the letters into alphabetical order, so that both megalodon and moonglade become adeglmnoo.

Then we insert the words into a (hash | associative array | dictionary), keyed by their normal forms, and two or more words are anagrams if they fall into the same hash bucket. (There is some discussion of this technique in Higher-Order Perl pages 218–219 and elsewhere.)

(The thing you do not want to do is to compute every permutation of the letters of each word, looking for permutations that appear in the word list. That is akin to sorting a list by computing every permutation of the list and looking for the one that is sorted. I wouldn't have mentioned this, but someone on StackExchange actually asked this question.)

Anyway, I digress. This article is about how I was unhappy with the results of the simple procedure above. From the Webster's Second list, which contains about 234,000 words, it finds about 14,000 anagram sets (some with more than two words), consisting of 46,351 pairs of anagrams. The list starts with

aal ala

and ends with

zolotink zolotnik

which exemplify the problems with this simple approach: many of the 46,351 anagrams are obvious, uninteresting or even trivial. There must be good ones in the list, but how to find them?

I looked in the list to find the longest anagrams, but they were also disappointing:

cholecystoduodenostomy duodenocholecystostomy

(Webster's Second contains a large amount of scientific and medical jargon. A cholecystoduodenostomy is a surgical operation to create a channel between the gall bladder (cholecysto-) and the duodenum (duodeno-). A duodenocholecystostomy is the same thing.)

This example made clear at least one of the problems with boring anagrams: it's not that they are too short, it's that they are too simple. Cholecystoduodenostomy and duodenocholecystostomy are 22 letters long, but the anagrammatic relation between them is obvious: chop cholecystoduodenostomy into three parts:

cholecysto duodeno stomy

and rearrange the first two:

duodeno cholecysto stomy

and there you have it.

This gave me the idea to score a pair of anagrams according to how many chunks one had to be cut into in order to rearrange it to make the other one. On this plan, the “cholecystoduodenostomy / duodenocholecystostomy” pair would score 3, just barely above the minimum possible score of 2. Something even a tiny bit more interesting, say “abler / blare” would score higher, in this case 4. Even if this strategy didn't lead me directly to the most interesting anagrams, it would be a big step in the right direction, allowing me to eliminate the least interesting.

This rule would judge both “aal / ala” and “zolotink / zolotnik” as being uninteresting (scores 2 and 4 respectively), which is a good outcome. Note that some other boring-anagram problems can be seen as special cases of this one. For example, short anagrams never need to be cut into many parts: no four-letter anagrams can score higher than 4. The trivial anagramming of a word to itself always scores 1, and nontrivial anagrams always score more than this.

So what we need to do is: for each anagram pair, say acrididae (grasshoppers) and cidaridae (sea urchins), find the smallest number of chunks into which we can chop acrididae so that the chunks can be rearranged into cidaridae.

One could do this with a clever algorithm, if one were available. There is a clever algorithm, based on finding maximal independent sets in a certain graph. (More about this tomorrow.) I did not find this algorithm at the time; nor did I try. Instead, I used a brute-force search. Or rather, I used a very small amount of cleverness to reduce the search space, and then used brute-force search to search the reduced space.

Let's consider a example, scoring the anagram “abscise / scabies”. You do not have to consider every possible permutation of abscise. Rather, there are only two possible mappings from the letters of abscise to the letters of scabies. You know that the C must map to the C, the A must map to the A, and so forth. The only question is whether the first S of abscise maps to the first or to the second S of scabies. The first mapping gives us:

and the second gives us

because the S and the C no longer go to adjoining positions. So the minimum number of chunks is 5, and this anagram pair gets a score of 5.

To fully analyze cholecystoduodenostomy by this method required considering 7680 mappings. (120 ways to map the five O's × 2 ways to map the two C's × 2 ways to map the two D's, etc.) In the 1990s this took a while, but not prohibitively long, and it worked well enough that I did not bother to try to find a better algorithm. In 2016 it would probably still run quicker than implementing the maximal independent set algorithm. Unfortunately I have lost the code that I wrote then so I can't compare.

Assigning scores in this way produced a scored anagram list which began

2 aal ala

and ended

4 zolotink zolotnik

and somewhere in the middle was

3 cholecystoduodenostomy duodenocholecystostomy

all poor scores. But sorted by score, there were treasures at the end, and the clear winner was

14 cinematographer megachiropteran

I declare this the single best anagram in English. It is 15 letters long, and the only letters that stay together are the E and the R. “Cinematographer” is as familiar as a 15-letter word can be, and “megachiropteran” means a giant bat. GIANT BAT! DEATH FROM ABOVE!!!

And there is no serious competition. There was another 14-pointer, but both its words are Webster's Second jargon that nobody knows:

14 rotundifoliate titanofluoride

There are no score 13 pairs, and the score 12 pairs are all obscure. So this is the winner, and a deserving winner it is.

I think there is something in the list to make everyone happy. If you are the type of person who enjoys anagrams, the list rewards casual browsing. A few examples:

8 negativism timesaving
8 peripatetic precipitate
8 scepters respects
8 shortened threnodes
8 soapstone teaspoons

9 excitation intoxicate
9 integrals triangles
9 ivoriness revisions
9 masculine calumnies

10 coprophagist topographics
10 chuprassie haruspices
10 citronella interlocal

11 clitoridean directional
11 dispensable piebaldness

“Clitoridean / directional” has been one of my favorites for years. But my favorite of all, although it scores only 6, is

6 yttrious touristy

I think I might love it just because the word yttrious is so delightful. (What a debt we owe to Ytterby, Sweden!)

I also rather like

5 notaries senorita

which shows that even some of the low-scorers can be worth looking at. Clearly my chunk score is not the end of the story, because “notaries / senorita” should score better than “abets / baste” (which is boring) or “Acephali / Phacelia” (whatever those are), also 5-pointers. The length of the words should be worth something, and the familiarity of the words should be worth even more.

Here are the results:

In former times there was a restaurant in Philadelphia named “Soupmaster”. My best unassisted anagram discovery was noticing that this is an anagram of “mousetraps”.

[ Addendum 20170222: There is a followup article comparing the two algorithms I wrote for computing scores. ]

[ Addendum 20170222: An earlier version of this article mentioned the putative 11-pointer “endometritria / intermediator”. The word “endometritria” seemed pretty strange, and I did look into it before I published the article, but not carefully enough. When Philip Cohen wrote to me to question it, I investigated more carefully, and discovered that it had been an error in an early WordNet release, corrected (to “endometria”) in version 1.6. I didn't remember that I had used WordNet's word lists, but I am not surprised to discover that I did. ]

# Signed sets and ballots, part 1

The other day, Anders Claesson wrote a very nice blog post explaining a more combinatorial way to understand multiplicative inverses of virtual species (as opposed to the rather algebraic way I explained it in my previous post). In the middle of his post he makes an offhanded assumption which I stubbornly refused to take at face value; after thinking about it for a while and discussing it with Anders, I’m very glad I did, because there’s definitely more going on here than meets the eye and it’s given me a lot of interesting things to think about.

Recall that $E$ denotes the species of sets, defined by $E[U] = \{U\}$, that is, the only $E$-structure on a given label set $U$ is the set of labels itself. Recall also that the exponential generating function of a species $F$ is given by

$\displaystyle F(x) = \sum_{n \geq 0} f_n \frac{x^n}{n!}$

where $f_n$ counts the number of labelled $F$-structures of size $n$. In the case of $E$, we have $f_n = 1$ for all $n$, so

$\displaystyle E(x) = \sum_{n \geq 0} \frac{x^n}{n!} = e^x.$

(This is why $E$ is such a good name for the species of sets—though in a fantastic coincidence, it seems to originally come from the French word for set, ensemble, rather than from the fact that $E(x) = e^x$ (though on the other hand calling it a “coincidence” is probably too strong, since Joyal must surely have picked the notation with the generating function already in mind!).)

Now, from my previous post we know that

$\displaystyle E^{-1} = (1 + E_+)^{-1} = \sum_{k \geq 0} (-1)^k (E_+)^k.$

Let’s first consider $\sum_k (E_+)^k$ (without the $(-1)^k$). This means that we have, for some $k \geq 0$, a $k$-ary product of $E_+$ structures—in other words, a list of nonempty sets. This is the species of ballots, also known as ordered set partitions, and can also be written $L \circ E_+$. As an example, here is a ballot on the set of labels $\{1, \dots, 8\}$:

The order of the parts matters, so this is a different ballot:

But the order of labels within each part doesn’t matter (since each part is a set). As another example, here is the complete collection of ballot structures on the labels $\{1,2,3\}$:

We can see that there are 13 in total: six where the labels are each in their own separate part (corresponding to the six possible permutations of the labels); six where two labels share a part and the other label is a singleton part (corresponding to the three ways to choose the solitary label, times the two ways to order the parts); and one final ballot where all three labels are grouped in the same part. (As an exercise, can you verify that there are 75 different ballot structures on a set of four labels?)

Returning to $E^{-1} = \sum_k (-1)^k (E_+)^k$, we can see that it consists of signed ballots, where the sign of a ballot is the parity of its number of parts, that is, a ballot with $k$ parts has sign $(-1)^k$. The second half of Anders’ post gives a nice combinatorial proof that $E \cdot E^{-1} = 1$, via a sign-reversing involution: if we consider $E \cdot E^{-1}$-structures, i.e. pairs of sets and signed ballots, there is a natural1 way to pair them up, matching positive and negative structures so everything cancels (except in the case of the empty label set, which is why we get $1$ instead of $0$).

However, Anders is trying to do more than that. Note first that since multiplication of EGFs corresponds to multiplication of species, the EGF for $E^{-1}$ is of course $1/e^x = e^{-x}$. But this ought to also be the EGF for the virtual species $E(-X)$, and the rest of his post hinges on identifying $E(-X)$ and $E^{-1}$. As Anders and I discovered, however, this is precisely the point where it is worth being more careful.

First of all, what is $E(-X)$? Intuitively, an $E(-X)$ structure consists of a set of negative atoms; since each set can be thought of as an (unordered) product of atoms, the whole set acquires a sign given by the parity of the number of atoms. In other words, intuitively it seems that $E(-X)$ should be the species of signed sets, where an even-sized set is considered positive and an odd-sized set negative. That is,

$\displaystyle E(-X) = \sum_{n \geq 0} (-1)^n E_n,$

where $E_n$ denotes the species of sets of size exactly $n$. As a sanity check, this makes sense as an EGF equation too, since the EGF of $E_n$ is just $x^n/n!$ and indeed

$\displaystyle e^{-x} = \sum_{n \geq 0} \frac{(-x)^n}{n!} = \sum_{n \geq 0} (-1)^n \frac{x^n}{n!}.$

But hold on a minute, what does $E(-X)$ really mean, formally? It is the composition of the species $E$ with the virtual species $-X$, and it turns out that it is not at all a priori obvious how to define composition for virtual species! We can find the definition on p. 127 of Bergeron et al. A special case (which is enough for our present purposes) is

$\displaystyle \Phi(X - Y) = \Phi(X + Y) \times (E(X)E^{-1}(Y))$

where $X$ and $Y$ are two sorts of atoms, and $\times$ denotes Cartesian product of species. In our case,

$\displaystyle E(0 - X) = E(0 + X) \times ((E(0) E^{-1}(X)) = E(X) \times E^{-1}(X) = E^{-1}(X)$

since $E$ is the identity for Cartesian product (overlaying an additional $E$ structure on a set of labels does not add any structure, since there is only one possible $E$-structure).

All of this is to say, $E(-X)$ is actually defined as $E^{-1}$! So at first glance it may seem we actually have nothing to prove: $E(-X)$ and $E^{-1}$ are the same by definition, end of story. …but in fact, all we have done is shift the burden of proof elsewhere: now it is our intuitive idea of $E(-X)$ representing signed sets that requires proof!

To sum up, we know that $E(-X) = E^{-1} = \sum_k (-1)^k (E_+)^k$ is the species of signed ballots, with sign given by parity of the number of parts; and intuitively, we also believe that $E(-X)$ should correspond to parity-signed sets, $\sum_n (-1)^n E_n$. So, is there a nice combinatorial proof showing the correspondence between signed sets and signed ballots?

One can use the law of excluded middle to show that the answer must be “yes”: suppose the answer were “no”; but then I would not be writing this blog post, which is a contradiction since I am writing this blog post. But is there a constructive proof? Fear not! This blog post has gotten long enough, so I will stop here for now and let interested readers puzzle over it; in my next post I will explain what I came up with, along with some musings on linear orders and naturality.

1. I am indeed using the word natural in a technical, categorical sense here! This will play an important role in my second post…

# [uyyrhizz] IDE type annotations

Ideas for a desirable feature of a Haskell IDE:

Good: IDE pops up the type signature of the library function or symbol under the point.  Emacs haskell-mode can do this.

Better: IDE is aware of static scoping, let binding, and imports to really know what function you are referring to.  However, if you forgot to import, it still tries to be helpful, guessing at a library function and offering its signature as well as a reminder that you need to import it.

Better: If the function does not have an explicit type signature, the IDE does type inference to figure it out.

Better: if the type is polymorphic, the IDE also provides the type of the function as instantiated where it is used, instead of just the polymorphic type where it was declared.

# The Curry-Howard correspondence between programs and proofs

<html xmlns="http://www.w3.org/1999/xhtml"><head> <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/> <meta content="text/css" http-equiv="Content-Style-Type"/> <meta content="pandoc" name="generator"/> <style type="text/css">code{white-space: pre;}</style> <style type="text/css">div.sourceCode { overflow-x: auto; } table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode { margin: 0; padding: 0; vertical-align: baseline; border: none; } table.sourceCode { width: 100%; line-height: 100%; } td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; color: #aaaaaa; border-right: 1px solid #aaaaaa; } td.sourceCode { padding-left: 5px; } code > span.kw { color: #007020; font-weight: bold; } /* Keyword */ code > span.dt { color: #902000; } /* DataType */ code > span.dv { color: #40a070; } /* DecVal */ code > span.bn { color: #40a070; } /* BaseN */ code > span.fl { color: #40a070; } /* Float */ code > span.ch { color: #4070a0; } /* Char */ code > span.st { color: #4070a0; } /* String */ code > span.co { color: #60a0b0; font-style: italic; } /* Comment */ code > span.ot { color: #007020; } /* Other */ code > span.al { color: #ff0000; font-weight: bold; } /* Alert */ code > span.fu { color: #06287e; } /* Function */ code > span.er { color: #ff0000; font-weight: bold; } /* Error */ code > span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */ code > span.cn { color: #880000; } /* Constant */ code > span.sc { color: #4070a0; } /* SpecialChar */ code > span.vs { color: #4070a0; } /* VerbatimString */ code > span.ss { color: #bb6688; } /* SpecialString */ code > span.im { } /* Import */ code > span.va { color: #19177c; } /* Variable */ code > span.cf { color: #007020; font-weight: bold; } /* ControlFlow */ code > span.op { color: #666666; } /* Operator */ code > span.bu { } /* BuiltIn */ code > span.ex { } /* Extension */ code > span.pp { color: #bc7a00; } /* Preprocessor */ code > span.at { color: #7d9029; } /* Attribute */ code > span.do { color: #ba2121; font-style: italic; } /* Documentation */ code > span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */ code > span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */ code > span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */ </style></head><body><head><meta charset="UTF-8"/></head>

This post will explain the connection between programming languages and logical proofs, known as the Curry-Howard correspondence. I will provide several examples of this correspondence to help you build a working intuition for how these two fields relate to one another.

The Curry-Howard correspondence states that:

• Logical propositions correspond to programming types
• Logical proofs correspond to programming values
• Proving a proposition corresponds to creating a value of a given type

I'll use my Dhall programming language to illustrate the above connections since Dhall is an explicitly typed total programming language without any escape hatches. If you're not familiar with Dhall, you can read the Dhall tutorial and I will also make analogies to Haskell along the way, too.

### Propositions

Let's begin with the following logical proposition:

∀(a ∈ Prop): a ⇒ a

You can read that as "for any proposition (which we will denote using the letter a), if the proposition a is true then the proposition a is true". This is true no matter what the proposition a is. For example, suppose that a were the following proposition:

a = {The sky is blue}

Then we could conclude that "if the sky is blue then the sky is blue":

{The sky is blue} ⇒ {The sky is blue}

Here the right double arrow () denotes logical implication. Anywhere you see (A ⇒ B) you should read that as "if the proposition A is true then the proposition B is true". You can also say "the proposition A implies the proposition B" or just "A implies B" for short.

However, what if a were another proposition like:

a = {The sky is green}

giving us:

{The sky is green} ⇒ {The sky is green}

This is true, even though the sky might not be green. We only state that "if the sky is green then the sky is green", which is definitely a true statement, whether or not the sky is green.

The upside down A (i.e. ) in our original proposition is mathematical shorthand that means "for all" (although I sometimes describe this as "for any"). The symbol is shorthand for "in" (i.e. set membership). So whenever you see this:

∀(a ∈ S): p

... you can read that as "for any element a in the set S the proposition p is true". Usually the set S is Prop (i.e. the set of all possible propositions) but we will see some examples below where S can be another set.

### Types

This logical proposition:

∀(a ∈ Prop): a ⇒ a

... corresponds to the following Dhall type:

(a : Type)  a  a

... which you can read as "for any type (which we will denote using the letter a), this is the type of a function that transforms an input of type a to an output of type a". Here's the corresponding function that has this type:

λ(a : Type)  λ(x : a)  x

This is an anonymous function of two arguments:

• the first argument is named a and a is a Type
• the second argument is named x and x has type a

The result is the function's second argument (i.e. x) which still has type a.

The equivalent Haskell function is id, the polymorphic identity function:

id :: a -> a
id x = x

-- or using the ExplicitForAll language extension
id :: forall a . a -> a
id x = x

The main difference is that Haskell does not require you to explicitly bind the polymorphic type a as an additional argument. Dhall does require this because Dhall is an explicitly typed language.

### Correspondence

The Curry-Howard correspondence says that we can use the type checker of a programming language as a proof checker. Any time we want to prove a logical proposition, we:

• translate the logical proposition to the corresponding type in our programming language
• create a value in our programming language that has the given type
• use the type-checker to verify that our value has the specified type

If we find a value of the given type then that completes the proof of our original proposition.

For example, if we want to prove this proposition:

∀(a ∈ Prop): a ⇒ a

... then we translate the proposition to the corresponding type in the Dhall programming language:

(a : Type)  a  a

... then define a value in the Dhall language that has this type:

λ(a : Type)  λ(x : a)  x

... and then use Dhall's type checker to verify that this value has the given type:

$dhall <<< 'λ(a : Type) → λ(x : a) → x' (a : Type) → ∀(x : a)a λ(a : Type) → λ(x : a)x The first line is the inferred type of our value and the second line is the value's normal form (which in this case is the same as the original value). Note that the inferred type slightly differs from the original type we expected: -- What we expected: (a : Type) a a -- What the compiler inferred: (a : Type) (x : a) a These are both still the same type. The difference is that the compiler's inferred type also includes the name of the second argument: x. If we were to translate this back to logical proposition notation, we might write: ∀(a ∈ Prop): ∀(x ∈ a): a ... which you could read as: "for any proposition named a, given a proof of a named x, the proposition a is true". This is equivalent to our original proposition but now we've given a name (i.e. x) to the proof of a. The reason this works is that you can think of a proposition as a set, too, where the elements of the set are the proofs of that proposition. Some propositions are false and have no elements (i.e. no valid proofs), while other propositions can have multiple elements (i.e. multiple valid proofs). Similarly, you can think of a type as a set, where the elements of the set are the values that have that type. Some types are empty and have no elements (i.e. no values of that type), while other types can have multiple elements (i.e. multiple values of that type). Here's an example of a proposition that has multiple valid proofs: ∀(a ∈ Prop): a ⇒ a ⇒ a The corresponding type is: (a : Type) a a a ... and there are two values that have the above type. The first value is: λ(a : Type) λ(x : a) λ(y : a) x ... and the second value is: λ(a : Type) λ(x : a) λ(y : a) y We can even translate these two values back into informal logical proofs. For example, this value: λ(a : Type) λ(x : a) λ(y : a) x ... corresponds to this informal proof: For each "a" in the set of all propositions: Given a proof that "a" is true [Let's call this proof "x"] Given a proof that "a" is true [Let's call this proof "y"] We can conclude that "a" is true, according to the proof "x" QED Similarly, this value: λ(a : Type) λ(x : a) λ(y : a) y ... corresponds to this informal proof: For each "a" in the set of all propositions: Given a proof that "a" is true [Let's call this proof "x"] Given a proof that "a" is true [Let's call this proof "y"] We can conclude that "a" is true, according to the proof "y" QED We can actually give these proofs a formal structure that parallels our code but that is outside of the scope of this post. ### Function composition Now consider this more complex proposition: ∀(a ∈ Prop): ∀(b ∈ Prop): ∀(c ∈ Prop): (b ⇒ c) ⇒ (a ⇒ b) ⇒ (a ⇒ c) You can read that as saying: "for any three propositions named a, b, and c, if b implies c, then if a implies b, then a implies c". The corresponding type is: (a : Type) (b : Type) (c : Type) (b c) (a b) (a c) ... and we can define a value that has this type in order to prove that the proposition is true: λ(a : Type) λ(b : Type) λ(c : Type) λ(f : b c) λ(g : a b) λ(x : a) f (g x) ... and the compiler will infer that our value has the correct type:$ dhall
λ(a : Type)
→ λ(b : Type)
→ λ(c : Type)
→ λ(f : b → c)
→ λ(g : a → b)
→ λ(x : a)
f (g x)
<Ctrl-D>
(a : Type) → ∀(b : Type) → ∀(c : Type) → ∀(f : b → c) → ∀(g : a → b) → ∀(x : a)c

λ(a : Type) → λ(b : Type) → λ(c : Type) → λ(f : b → c) → λ(g : a → b) → λ(x : a)f (g x)

Note that this function is equivalent to Haskell's function composition operator (ie. (.)):

(.) :: (b -> c) -> (a -> b) -> (a -> c)
(f . g) x = f (g x)

This is because proofs and programs are equivalent to one another: whenever we prove a logical proposition we get a potentially useful function for free.

### Circular reasoning

There are some propositions that we cannot prove, like this one:

∀(a ∈ Prop): a

... which you can read as saying "all propositions are true". This proposition is false, because if we translate the proposition to the corresponding type:

(a : Type)  a

... then there is no value in the Dhall language which has that type. If there were such a value then we could automatically create any value of any type.

However, Haskell does have values which inhabit the above type, such as this one:

example :: a
example = let x = x in x

-- or using the ExplicitForAll language extension:
example :: forall a . a
example = let x = x in x

This value just cheats and endlessly loops, which satisfies Haskell's type checker but doesn't produce a useful program. Unrestricted recursion like this is the programming equivalent of "circular reasoning" (i.e. trying to claim that a is true because a is true).

We can't cheat like this in Dhall and if we try we just get this error:

$dhall <<< 'let x = x in x' Use "dhall --explain" for detailed errors Error: Unbound variable x (stdin):1:9 You cannot define x in terms of itself because Dhall does not permit any recursion and therefore does not permit circular reasoning when used as a proof checker. ### And We will use the symbol to denote logical "and", so you can read the following proposition: ∀(a ∈ Prop): ∀(b ∈ Prop): (a ∧ b) ⇒ a ... as saying "for any two propositions a and b, if a and b are both true then a is true" The type-level equivalent of logical "and" is a record with two fields: (a : Type) (b : Type) { fst : a, snd : b } a ... which is equivalent to this Haskell type: (a, b) -> a -- or using ExplicitForAll: forall a b . (a, b) -> a The programming value that has this type is: λ(a : Type) λ(b : Type) λ(r : { fst : a, snd : b }) r.fst ... which is equivalent to this Haskell value: fst Similarly, we can conclude that: ∀(a ∈ Prop): ∀(b ∈ Prop): (a ∧ b) ⇒ b ... which corresponds to this type: (a : Type) (b : Type) { fst : a, snd : b } b ... and this value of that type as the proof: λ(a : Type) λ(b : Type) λ(r : { fst : a, snd : b }) r.snd Now let's try a slightly more complex proposition: ∀(a ∈ Prop): ∀(b ∈ Prop): (a ∧ (a ⇒ b)) ⇒ b You can read this as saying "for any propositions a and b, if the proposition a is true, and a implies b, then the proposition b is true". That corresponds to this type: (a : Type) (b : Type) { fst : a, snd : a b } b ... and the following value of that type which proves that the proposition is true: λ(a : Type) λ(b : Type) λ(r : { fst : a, snd : a b }) r.snd r.fst Here's a slightly more complex example: ∀(a ∈ Prop): ∀(b ∈ Prop): ∀(c ∈ Prop): ((a ∧ b) ⇒ c) ⇒ (a ⇒ b ⇒ c) You can read that as saying: "for any three propositions named a, b, and c, if a and b imply c, then a implies that b implies c". Here's the corresponding type: (a : Type) (b : Type) (c : Type) ({ fst : a, snd : b } c) (a b c) ... and the corresponding value of that type: λ(a : Type) λ(b : Type) λ(c : Type) λ(f : { fst : a, snd : b } c) λ(x : a) λ(y : b) f { fst = x, snd = y } Note that this is equivalent to Haskell's curry function: curry :: ((a, b) -> c) -> (a -> b -> c) curry f x y = f (x, y) ### Or We will use the symbol to denote logical "or", so you can read the following proposition: ∀(a ∈ Prop): ∀(b ∈ Prop): a ⇒ a ∨ b ... as saying "for any propositions a and b, if a is true, then either a is true or b is true". The type-level equivalent of logical "or" is a sum type: (a : Type) (b : Type) a < Left : a | Right : b > ... which is equivalent to this Haskell type: a -> Either a b -- or using ExplicitForAll forall a b . a -> Either a b ... and the value that has this type is: λ(a : Type) λ(b : Type) λ(x : a) < Left = x | Right : b > ... which is equivalent to this Haskell value: Left Similarly, for this proposition: ∀(a ∈ Prop): ∀(b ∈ Prop): b ⇒ a ∨ b ... the equivalent type is: (a : Type) (b : Type) b < Left : a | Right : b > ... and the value that has this type is: λ(a : Type) λ(b : Type) λ(x : b) < Right = x | Left : a > Let's consider a more complex example: ∀(a ∈ Prop): a ∨ a ⇒ a ... which says: "for any proposition a, if a is true or a is true then a is true". The corresponding type is: (a : Type) < Left : a | Right : a > a ... which is equivalent to this Haskell type: Either a a -> a -- or using ExplicitForAll forall a . Either a a -> a ... and we can create a value of this type by pattern matching on the sum type: λ(a : Type) λ(s : < Left : a | Right : a >) merge { Left = λ(x : a) x , Right = λ(x : a) x } s : a ... which is equivalent to this Haskell code: example :: Either a a -> a example (Left x) = x example (Right x) = x You can read this "proof" as saying: "There are two possibilities. The first possibility (named "Left") is that a is true, therefore a must be true in that case. The second possibility (named "Right") is that a is true, therefore a must be true in that case. Since a is true in both cases we can conclude that a is true, period." ### True We'll use True to denote a logical proposition that is always true: True ... and the corresponding type is an empty record with no fields: {} ... which is equivalent to Haskell's "unit" type: () An empty record literal is the only value that has the type of an empty record: {=} ... which is equivalent to Haskell's "unit" value: () ### False We'll use False to denote a logical proposition that is never true: False ... and the corresponding type is an empty sum type (i.e. a type with no constructors/alternatives): <> ... which is equivalent to Haskell's Void type. There is no value that has the above type, so you cannot prove a False proposition. The logical "principle of explosion" states that if you assume a False proposition then you can prove any other proposition. In other words: False ⇒ ∀(a ∈ Prop): a ... which you can read as saying: "if you assume a false proposition, then for any proposition named a you can prove that a is true". The corresponding type is: <> (a : Type) a ... and the value that inhabits the above type is: λ(s : <>) λ(a : Type) merge {=} s : a You can read that "proof" as saying: "There are zero possibilities. Since we handled all possibilities then the proposition a must be true." This line of reasoning is called a "vacuous truth", meaning that if there are no cases to consider then any statement you make is technically true for all cases. The equivalent Haskell type would be: Void -> a ... and the equivalent Haskell value of that type would be: absurd ### Not We'll use not to negate a logical proposition, meaning that: not(True) = False not(False) = True We can encode logical not in terms of logical implication and logical False, like this: not(a) = a ⇒ False not(True) = True ⇒ False = False not(False) = False ⇒ False = True If logical not is a function from a proposition to another proposition then type-level not must be a function from a type to a type: λ(a : Type) (a <>) We can save the above type-level function to a file to create a convenient ./not utility we will reuse later on:$ echo 'λ(a : Type) → a → <>' > ./not

Now let's try to prove a proposition like:

not(True) ⇒ False

The corresponding type is:

./not {}  <>

... which expands out to:

({}  <>)  <>

... and the value that has this type is:

λ(f : {}  <>)  f {=}

### Double negation

However, suppose we try to prove "double negation":

∀(a ∈ Prop): not(not(a)) ⇒ a

... which says that "if a is not false, then a must be true".

The corresponding type would be:

(a : Type)  ./not (./not a)  a

... which expands out to:

(a : Type)  ((a  <>)  <>)  a

... but there is no value in the Dhall language that has this type!

The reason why is that every type checker corresponds to a specific logic system and not all logic systems are the same. Each logic system has different rules and axioms about what you can prove within the system.

Dhall's type checker is based on System Fω which corresponds to an intuitionistic (or constructive) logic. This sort of logic system does not assume the law of excluded middle.

The law of excluded middle says that every proposition must be true or false, which we can write like this:

∀(a ∈ Prop): a ∨ not(a)

You can read that as saying: "any proposition a is either true or false". This seems pretty reasonable until you try to translate the proposition to the corresponding type:

(a : Type)  < Left : a | Right : a  <> >

... which in Haskell would be:

example :: Either a (a -> Void)

We cannot create such a value because if we could then that would imply that for any type we could either:

• produce a value of that type, or:
• produce a "counter-example" by creating Void from any value of that type

While we can do this for some types, our type checker does not let us do this automatically for every type.

This is the same reason that we can't prove double negation, which implicitly assumes that there are only two choices (i.e. "true or false"). However, if we don't assume the law of the excluded middle then there might be other choices like: "I don't know".

Not all hope is lost, though! Even though our type checker doesn't support the axiom of choice, we can still add new axioms freely to our logic system. All we have to do is assume them the same way we assume other propositions.

For example, we can modify our original proposition to now say:

(∀(b ∈ Prop): b ∨ not(b)) ⇒ (∀(a ∈ Prop): not(not(a)) ⇒ a)

... which you can read as saying: "if we assume that all propositions are either true or false, then if a proposition is not false it must be true".

That corresponds to this type:

((b : Type)  < Left : b | Right : b  <> >)
((a : Type) ((a <>) <>) a)

... and we can create a value of that type:

λ(noMiddle : (b : Type)  < Left : b | Right : b  <> >)
λ(a : Type)
λ(doubleNegation : (a <>) <>)
merge
{ Left = λ(x : a) x
, Right = λ(x : a <>) merge {=} (doubleNegation x) : a
}
(noMiddle a)
: a

... which the type checker confirms has the correct type:

$dhall λ(noMiddle : ∀(b : Type)< Left : b | Right : b → <> >) → λ(a : Type) → λ(doubleNegation : (a → <>)<>) merge { Left = λ(x : a) → x , Right = λ(x : a → <>) → merge {=} (doubleNegation x) : a } (noMiddle a) : a <Ctrl-D> (noMiddle : ∀(b : Type)< Left : b | Right : b → <> >) → ∀(a : Type) → ∀(doubleNegation : (a → <>)<>) → a λ(noMiddle : ∀(b : Type)< Left : b | Right : b → <> >) → λ(a : Type) → λ(doubleNegation : (a → <>)<>) → merge { Left = λ(x : a) → x, Right = λ(x : a → <>) → merge {=} (doubleNegation x) : a } (noMiddle a) : a You can read that proof as saying: • The law of the excluded middle says that there are two possibilities: either a is true or a is false • If a is true then we're done: we trivially conclude a is true • If a is false then our assumption of not(not(a)) is also false • we can conclude anything from a false assumption, therefore we conclude that a is true What's neat about this is that the compiler mechanically checks this reasoning process. You don't have to understand or trust my explanation of how the proof works because you can delegate your trust to the compiler, which does all the work for you. ### Conclusion This post gives a brief overview of how you can concretely translate logical propositions and proofs to types and programs. This can let you leverage your logical intuitions to reason about types or, vice versa, leverage your programming intuitions to reason about propositions. For example, if you can prove a false proposition in a logic system, then that's typically an escape hatch in the corresponding type system. Similarly, if you can create a value of the empty type in a programming language, that implies that the corresponding logic is not sound. There are many kinds of type systems just like there are many kinds of logic systems. For every new logic system (like linear logic) you get a type system for free (like linear types). For example, Rust's type checker is an example of an affine type system which corresponds to an affine logic system. To my knowledge, there are more logic systems in academic literature than there are type systems that have been implemented for programming languages. This in turn suggests that there are many awesome type systems waiting to be implemented. </body></html> ## February 18, 2017 ### Wolfgang Jeltsch # MIU in Haskell In the Theory Lunch of the last week, James Chapman talked about the MU puzzle from Douglas Hofstadter’s book Gödel, Escher, Bach. This puzzle is about a string rewriting system. James presented a Haskell program that computes derivations of strings. Inspired by this, I wrote my own implementation, with the goal of improving efficiency. This blog post presents this implementation. As usual, it is available as a literate Haskell file, which you can load into GHCi. ## The puzzle Let me first describe the MU puzzle shortly. The puzzle deals with strings that may contain the characters $\mathrm M$, $\mathrm I$, and $\mathrm U$. We can derive new strings from old ones using the following rewriting system: $\begin{array}{rcl} x\mathrm I & \rightarrow & x\mathrm{IU} \\ \mathrm Mx & \rightarrow & \mathrm Mxx \\ x\mathrm{III}y & \rightarrow & x\mathrm Uy \\ x\mathrm{UU}y & \rightarrow & xy \end{array}$ The question is whether it is possible to turn the string $\mathrm{MI}$ into the string $\mathrm{MU}$ using these rules. You may want to try to solve this puzzle yourself, or you may want to look up the solution on the Wikipedia page. ## The code The code is not only concerned with deriving $\mathrm{MU}$ from $\mathrm{MI}$, but with derivations as such. ### Preliminaries We import Data.List: import Data.List ### Basic things We define the type Sym of symbols and the type Str of symbol strings: data Sym = M | I | U deriving Eq type Str = [Sym] instance Show Sym where show M = "M" show I = "I" show U = "U" showList str = (concatMap show str ++) Next, we define the type Rule of rules as well as the list rules that contains all rules: data Rule = R1 | R2 | R3 | R4 deriving Show rules :: [Rule] rules = [R1,R2,R3,R4] ### Rule application We first introduce a helper function that takes a string and returns the list of all splits of this string. Thereby, a split of a string str is a pair of strings str1 and str2 such that str1 ++ str2 == str. A straightforward implementation of splitting is as follows: splits' :: Str -> [(Str,Str)] splits' str = zip (inits str) (tails str) The problem with this implementation is that walking through the result list takes quadratic time, even if the elements of the list are left unevaluated. The following implementation solves this problem: splits :: Str -> [(Str,Str)] splits str = zip (map (flip take str) [0 ..]) (tails str) Next, we define a helper function replace. An expression replace old new str yields the list of all strings that can be constructed by replacing the string old inside str by new. replace :: Str -> Str -> Str -> [Str] replace old new str = [front ++ new ++ rear | (front,rest) <- splits str, old isPrefixOf rest, let rear = drop (length old) rest] We are now ready to implement the function apply, which performs rule application. This function takes a rule and a string and produces all strings that can be derived from the given string using the given rule exactly once. apply :: Rule -> Str -> [Str] apply R1 str | last str == I = [str ++ [U]] apply R2 (M : tail) = [M : tail ++ tail] apply R3 str = replace [I,I,I] [U] str apply R4 str = replace [U,U] [] str apply _ _ = [] ### Derivation trees Now we want to build derivation trees. A derivation tree for a string str has the following properties: • The root is labeled with str. • The subtrees of the root are the derivation trees for the strings that can be generated from str by a single rule application. • The edges from the root to its subtrees are marked with the respective rules that are applied. We first define types for representing derivation trees: data DTree = DTree Str [DSub] data DSub = DSub Rule DTree Now we define the function dTree that turns a string into its derivation tree: dTree :: Str -> DTree dTree str = DTree str [DSub rule subtree | rule <- rules, subStr <- apply rule str, let subtree = dTree subStr] ### Derivations A derivation is a sequence of strings with rules between them such that each rule takes the string before it to the string after it. We define types for representing derivations: data Deriv = Deriv [DStep] Str data DStep = DStep Str Rule instance Show Deriv where show (Deriv steps goal) = " " ++ concatMap show steps ++ show goal ++ "\n" showList derivs = (concatMap ((++ "\n") . show) derivs ++) instance Show DStep where show (DStep origin rule) = show origin ++ "\n-> (" ++ show rule ++ ") " Now we implement a function derivs that converts a derivation tree into the list of all derivations that start with the tree’s root label. The function derivs traverses the tree in breadth-first order. derivs :: DTree -> [Deriv] derivs tree = worker [([],tree)] where worker :: [([DStep],DTree)] -> [Deriv] worker tasks = rootDerivs tasks ++ worker (subtasks tasks) rootDerivs :: [([DStep],DTree)] -> [Deriv] rootDerivs tasks = [Deriv (reverse revSteps) root | (revSteps,DTree root _) <- tasks] subtasks :: [([DStep],DTree)] -> [([DStep],DTree)] subtasks tasks = [(DStep root rule : revSteps,subtree) | (revSteps,DTree root subs) <- tasks, DSub rule subtree <- subs] Finally, we implement the function derivations which takes two strings and returns the list of those derivations that turn the first string into the second: derivations :: Str -> Str -> [Deriv] derivations start end = [deriv | deriv@(Deriv _ goal) <- derivs (dTree start), goal == end] You may want to enter derivations [M,I] [M,U,I] at the GHCi prompt to see the derivations function in action. You can also enter derivations [M,I] [M,U] to get an idea about the solution to the MU puzzle. Tagged: Douglas Hofstadter, functional programming, Gödel, Escher, Bach (book), Haskell, Institute of Cybernetics, James Chapman, literate programming, MU puzzle, string rewriting, talk, Theory Lunch # Generic programming in Haskell Generic programming is a powerful way to define a function that works in an analogous way for a class of types. In this article, I describe the latest approach to generic programming that is implemented in GHC. This approach goes back to the paper A Generic Deriving Mechanism for Haskell by José Pedro Magalhães, Atze Dijkstra, Johan Jeuring, and Andres Löh. This article is a writeup of a Theory Lunch talk I gave on 4 February 2016. As usual, the source of this article is a literate Haskell file, which you can download, load into GHCi, and play with. ## Motivation Parametric polymorphism allows you to write functions that deal with values of any type. An example of such a function is the reverse function, whose type is [a] -> [a]. You can apply reverse to any list, no matter what types the elements have. However, parametric polymorphism does not allow your functions to depend on the structure of the concrete types that are used in place of type variables. So values of these types are always treated as black boxes. For example, the reverse function only reorders the elements of the given list. A function of type [a] -> [a] could also drop elements (like the tail function does) or duplicate elements (like the cycle function does), but it could never invent new elements (except for ⊥) or analyze elements. Now there are situation where a function is suitable for a class of types that share certain properties. For example, the sum function works for all types that have a notion of binary addition. Haskell uses type classes to support such functions. For example, the Num class provides the method (+), which is used in the definition of sum, whose type Num a => [a] -> a contains a respective class constraint. The methods of a class have to be implemented separately for every type that is an instance of the class. This is reasonable for methods like (+), where the implementations for the different instances differ fundamentally. However, it is unfortunate for methods that are implemented in an analogous way for most of the class instances. An example of such a method is (==), since there is a canonical way of checking values of algebraic data types for equality. It works by first comparing the outermost data constructors of the two given values and if they match, the individual fields. Only when the data constructors and all the fields match, the two values are considered equal. For several standard classes, including Eq, Haskell provides the deriving mechanism to generate instances with default method implementations whose precise functionality depends on the structure of the type. Unfortunately, there is no possibility in standard Haskell to extend this deriving mechanism to user-defined classes. Generic programming is a way out of this problem. ## Prerequisites For generic programming, we need several language extensions. The good thing is that only one of them, DeriveGeneric, is specific to generic programming. The other ones have uses in other areas as well. Furthermore, DeriveGeneric is a very small extension. So the generic programming approach we describe here can be considered very lightweight. We state the full set of necessary extensions with the following pragma: {-# LANGUAGE DefaultSignatures, DeriveGeneric, FlexibleContexts, TypeFamilies, TypeOperators #-} Apart from these language extensions, we need the module GHC.Generics: import GHC.Generics ## Our running example As our running example, we pick serialization and deserialization of values. Serialization means converting a value into a bit string, and deserialization means parsing a bit string in order to get back a value. We introduce a type Bit for representing bits: data Bit = O | I deriving (Eq, Show) Furthermore, we define the class of all types that support serialization and deserialization as follows: class Serializable a where put :: a -> [Bit] get :: [Bit] -> (a, [Bit]) There is a canonical way of serializing values of algebraic data types. It works by first encoding the data constructor of the given value as a sequence of bits and then serializing the individual fields. To show this approach in action, we define an algebraic data type Tree, which is a type of labeled binary trees: data Tree a = Leaf | Branch (Tree a) a (Tree a) deriving Show An instantiation of Serializable for Tree that follows the canonical serialization approach can be carried out as follows: instance Serializable a => Serializable (Tree a) where put Leaf = [O] put (Branch left root right) = [I] ++ put left ++ put root ++ put right get (O : bits) = (Leaf, bits) get (I : bits) = (Branch left root right, bits''') where (left, bits') = get bits (root, bits'') = get bits' (right, bits''') = get bits'' Of course, it quickly becomes cumbersome to provide such an instance declaration for every algebraic data type that should use the canonical serialization approach. So we want to implement the canonical approach once and for all and make it easily usable for arbitrary types that are amenable to it. Generic programming makes this possible. ## Representations An algebraic data type is essentially a sum of products where the terms “sum” and “product” are understood as follows: • A sum is a variant type. In Haskell, Either is the canonical type constructor for binary sums, and the empty type Void from the void package is the nullary sum. • A product is a tuple type. In Haskell, (,) is the canonical type constructor for binary products, and () is the nullary product. The key idea of generic programming is to map types to representations that make the sum-of-products structure explicit and to implement canonical behavior based on these representations instead of the actual types. The GHC.Generics module defines a number of type constructors for constructing representations: data V1 p infixr 5 :+: data (:+:) f g p = L1 (f p) | R1 (g p) data U1 p = U1 infixr 6 :*: data (:*:) f g p = f p :*: g p newtype K1 i a p = K1 { unK1 :: a } newtype M1 i a f p = M1 { unM1 :: f p } All of these type constructors take a final parameter p. This parameter is relevant only when dealing with higher-order classes. In this article, however, we only discuss generic programming with first-order classes. In this case, the parameter p is ignored. The different type constructors play the following roles: • V1 is for the nullary sum. • (:+:) is for binary sums. • U1 is for the nullary product. • (:*:) is for binary products. • K1 is a wrapper for fields of algebraic data types. Its parameter i used to provide some information about the field at the type level, but is now obsolete. • M1 is a wrapper for attaching meta information at the type level. Its parameter i denotes the kind of the language construct the meta information refers to, and its parameter c provides access to the meta information. The GHC.Generics module furthermore introduces a class Generic, whose instances are the types for which a representation exists. Its definition is as follows: class Generic a where type Rep a :: * -> * from :: a -> (Rep a) p to :: (Rep a) p -> a A type Rep a is the representation of the type a. The methods from and to convert from values of the actual type to values of the representation type and vice versa. To see all this in action, we make Tree a an instance of Generic: instance Generic (Tree a) where type Rep (Tree a) = M1 D D1_Tree ( M1 C C1_Tree_Leaf U1 :+: M1 C C1_Tree_Branch ( M1 S NoSelector (K1 R (Tree a)) :*: M1 S NoSelector (K1 R a) :*: M1 S NoSelector (K1 R (Tree a)) ) ) from Leaf = M1 (L1 (M1 U1)) from (Branch left root right) = M1 ( R1 ( M1 ( M1 (K1 left) :*: M1 (K1 root) :*: M1 (K1 right) )) ) to (M1 (L1 (M1 U1))) = Leaf to (M1 ( R1 ( M1 ( M1 (K1 left) :*: M1 (K1 root) :*: M1 (K1 right) )) )) = Branch left root right The types D1_Tree, C1_Tree_Leaf, and C1_Tree_Branch are type-level representations of the type constructor Tree, the data constructor Leaf, and the data constructor Branch, respectively. We declare them as empty types: data D1_Tree data C1_Tree_Leaf data C1_Tree_Branch We need to make these types instances of the classes Datatype and Constructor, which are part of GHC.Generics as well. These classes provide a link between the type-level representations of type and data constructors and the meta information related to them. This meta information particularly covers the identifiers of the type and data constructors, which are needed when implementing canonical implementations for methods like show and read. The instance declarations for the Tree-related types are as follows: instance Datatype D1_Tree where datatypeName _ = "Tree" moduleName _ = "Main" instance Constructor C1_Tree_Leaf where conName _ = "Leaf" instance Constructor C1_Tree_Branch where conName _ = "Branch" Instantiating the Generic class as shown above is obviously an extremely tedious task. However, it is possible to instantiate Generic completely automatically for any given algebraic data type, using the deriving syntax. This is what the DeriveGeneric language extension makes possible. So instead of making Tree a an instance of Generic by hand, as we have done above, we could have declared the Tree type as follows in the first place: data Tree a = Leaf | Branch (Tree a) a (Tree a) deriving (Show, Generic) ## Implementing canonical behavior As mentioned above, we implement canonical behavior based on representations. Let us see how this works in the case of the Serializable class. We introduce a new class Serializable' whose methods provide serialization and deserialization for representation types: class Serializable' f where put' :: f p -> [Bit] get' :: [Bit] -> (f p, [Bit]) We instantiate this class for all the representation types: instance Serializable' U1 where put' U1 = [] get' bits = (U1, bits) instance (Serializable' r, Serializable' s) => Serializable' (r :*: s) where put' (rep1 :*: rep2) = put' rep1 ++ put' rep2 get' bits = (rep1 :*: rep2, bits'') where (rep1, bits') = get' bits (rep2, bits'') = get' bits' instance Serializable' V1 where put' _ = error "attempt to put a void value" get' _ = error "attempt to get a void value" instance (Serializable' r, Serializable' s) => Serializable' (r :+: s) where put' (L1 rep) = O : put' rep put' (R1 rep) = I : put' rep get' (O : bits) = let (rep, bits') = get' bits in (L1 rep, bits') get' (I : bits) = let (rep, bits') = get' bits in (R1 rep, bits') instance Serializable' r => Serializable' (M1 i a r) where put' (M1 rep) = put' rep get' bits = (M1 rep, bits') where (rep, bits') = get' bits instance Serializable a => Serializable' (K1 i a) where put' (K1 val) = put val get' bits = (K1 val, bits') where (val, bits') = get bits Note that in the case of K1, the context mentions Serializable, not Serializable', and the methods put' and get call put and get, not put' and get'. The reason is that the value wrapped in K1 has an ordinary type, not a representation type. ## Accessing canonical behavior We can now apply canonical behavior to ordinary types using the methods from and to from the Generic class. For example, we can implement functions defaultPut and defaultGet that provide canonical serialization and deserialization for all instances of Generic: defaultPut :: (Generic a, Serializable' (Rep a)) => a -> [Bit] defaultPut = put' . from defaultGet :: (Generic a, Serializable' (Rep a)) => [Bit] -> (a, [Bit]) defaultGet bits = (to rep, bits') where (rep, bits') = get' bits We can use these functions in instance declarations for Serializable. For example, we can make Tree a an instance of Serializable in the following way: instance Serializable a => Serializable (Tree a) where put = defaultPut get = defaultGet Compared to the instance declaration we had initially, this one is a real improvement, since we do not have to implement the desired behavior of put and get by hand anymore. However, it still contains boilerplate code in the form of the trivial method declarations. It would be better to establish defaultPut and defaultGet as defaults in the class declaration: class Serializable a where put :: a -> [Bit] put = defaultPut get :: [Bit] -> (a, [Bit]) get = defaultGet However, this is not possible, since the types of defaultPut and defaultGet are less general than the types of put and get, as they put additional constraints on the type a. Luckily, GHC supports the language extension DefaultSignatures, which allows us to give default implementations that have less general types than the actual methods (and consequently work only for those instances that are compatible with these less general types). Using DefaultSignatures, we can declare the Serializable class as follows: class Serializable a where put :: a -> [Bit] default put :: (Generic a, Serializable' (Rep a)) => a -> [Bit] put = defaultPut get :: [Bit] -> (a, [Bit]) default get :: (Generic a, Serializable' (Rep a)) => [Bit] -> (a, [Bit]) get = defaultGet With this class declaration in place, we can make Tree a an instance of Serializable as follows: instance Serializable a => Serializable (Tree a) With the minor extension DeriveAnyClass, which is provided by GHC starting from Version 7.10, we can even use the deriving keyword to instantiate Serializable for Tree a. As a result, we only have to write the following in order to define the Tree type and make it an instance of Serializable: data Tree a = Leaf | Branch (Tree a) a (Tree a) deriving (Show, Generic, Serializable) So finally, we can use our own classes like the Haskell standard classes regarding the use of deriving clauses, except that we have to additionally derive an instance declaration for Generic. ## Specialized implementations Usually, not all instances of a class should or even can be generated by means of generic programming, but some instances have to be crafted by hand. For example, making Int an instance of Serializable requires manual work, since Int is not an algebraic data type. However, there is no problem with this, since we still have the opportunity to write explicit instance declarations, despite the presence of a generic solution. This is in line with the standard deriving mechanism: you can make use of it, but you are not forced to do so. So we can have the following instance declaration, for example: instance Serializable Int where put val = replicate val I ++ [O] get bits = (length is, bits') where (is, O : bits') = span (== I) bits Of course, the serialization approach we use here is not very efficient, but the instance declaration illustrates the point we want to make. Tagged: functional programming, generic programming, GHC, Haskell, Institute of Cybernetics, literate programming, parametric polymorphism, talk, Theory Lunch, type class, type family, void (Haskell package) # Constrained monads There are Haskell types that have an associated monad structure, but cannot be made instances of the Monad class. The reason is typically that the return or the bind operation of such a type m has a constraint on the type parameter of m. As a result, all the nice library support for monads is unusable for such types. This problem is called the constrained-monad problem. In my article The Constraint kind, I described a solution to this problem, which involved changing the Monad class. In this article, I present a solution that works with the standard Monad class. This solution has been developed by Neil Sculthorpe, Jan Bracker, George Giorgidze, and Andy Gill. It is described in their paper The Constrained-Monad Problem and implemented in the constrained-normal package. This article is a write-up of a Theory Lunch talk I gave quite some time ago. As usual, the source of this article is a literate Haskell file, which you can download, load into GHCi, and play with. ## Prerequisites We have to enable a couple of language extensions: {-# LANGUAGE ConstraintKinds, ExistentialQuantification, FlexibleInstances, GADTSyntax, Rank2Types #-} Furthermore, we need to import some modules: import Data.Set hiding (fold, map) import Data.Natural hiding (fold) These imports require the packages containers and natural-numbers to be installed. ## The set monad The Set type has an associated monad structure, consisting of a return and a bind operation: returnSet :: a -> Set a returnSet = singleton bindSet :: Ord b => Set a -> (a -> Set b) -> Set b bindSet sa g = unions (map g (toList sa)) We cannot make Set an instance of Monad though, since bindSet has an Ord constraint on the element type of the result set, which is caused by the use of unions. For a solution, let us first look at how monadic computations on sets would be expressed if Set was an instance of Monad. A monadic expression would be built from non-monadic expressions and applications of return and (>>=). For every such expression, there would be a normal form of the shape s1 >>= \ x1 -> s2 >>= \ x2 -> -> sn >>= \ xn -> return r where the si would be non-monadic expressions of type Set. The existence of a normal form would follow from the monad laws. We define a type UniSet of such normal forms: data UniSet a where ReturnSet :: a -> UniSet a AtmBindSet :: Set a -> (a -> UniSet b) -> UniSet b We can make UniSet an instance of Monad where the monad operations build expressions and normalize them on the fly: instance Monad UniSet where return a = ReturnSet a ReturnSet a >>= f = f a AtmBindSet sa h >>= f = AtmBindSet sa h' where h' a = h a >>= f Note that these monad operations are analogous to operations on lists, with return corresponding to singleton construction and (>>=) corresponding to concatenation. Normalization happens in (>>=) by applying the left-identity and the associativity law for monads. We can use UniSet as an alternative set type, representing a set by a normal form that evaluates to this set. This way, we get a set type that is an instance of Monad. For this to be sane, we have to hide the data constructors of UniSet, so that different normal forms that evaluate to the same set cannot be distinguished. Now we need functions that convert between Set and UniSet. Conversion from Set to UniSet is simple: toUniSet :: Set a -> UniSet a toUniSet sa = AtmBindSet sa ReturnSet Conversion from UniSet to Set is expression evaluation: fromUniSet :: Ord a => UniSet a -> Set a fromUniSet (ReturnSet a) = returnSet a fromUniSet (AtmBindSet sa h) = bindSet sa g where g a = fromUniSet (h a) The type of fromUniSet constrains the element type to be an instance of Ord. This single constraint is enough to make all invocations of bindSet throughout the conversion legal. The reason is our use of normal forms. Since normal forms are “right-leaning”, all applications of (>>=) in them have the same result type as the whole expression. ## The multiset monad Let us now look at a different monad, the multiset monad. We represent a multiset as a function that maps each value of the element type to its multiplicity in the multiset, with a multiplicity of zero denoting absence of this value: newtype MSet a = MSet { mult :: a -> Natural } Now we define the return operation: returnMSet :: Eq a => a -> MSet a returnMSet a = MSet ma where ma b | a == b = 1 | otherwise = 0 For defining the bind operation, we need to define a class Finite of finite types whose sole method enumerates all the values of the respective type: class Finite a where values :: [a] The implementation of the bind operation is as follows: bindMSet :: Finite a => MSet a -> (a -> MSet b) -> MSet b bindMSet msa g = MSet mb where mb b = sum [mult msa a * mult (g a) b | a <- values] Note that the multiset monad differs from the set monad in its use of constraints. The set monad imposes a constraint on the result element type of bind, while the multiset monad imposes a constraint on the first argument element type of bind and another constraint on the result element type of return. Like in the case of sets, we define a type of monadic normal forms: data UniMSet a where ReturnMSet :: a -> UniMSet a AtmBindMSet :: Finite a => MSet a -> (a -> UniMSet b) -> UniMSet b The key difference to UniSet is that UniMSet involves the constraint of the bind operation, so that normal forms must respect this constraint. Without this restriction, it would not be possible to evaluate normal forms later. The MonadUniMSet instance declaration is analogous to the MonadUniSet instance declaration: instance Monad UniMSet where return a = ReturnMSet a ReturnMSet a >>= f = f a AtmBindMSet msa h >>= f = AtmBindMSet msa h' where h' a = h a >>= f Now we define conversion from MSet to UniMSet: toUniMSet :: Finite a => MSet a -> UniMSet a toUniMSet msa = AtmBindMSet msa ReturnMSet Note that we need to constrain the element type in order to fulfill the constraint incorporated into the UniMSet type. Finally, we define conversion from UniMSet to MSet: fromUniMSet :: Eq a => UniMSet a -> MSet a fromUniMSet (ReturnMSet a) = returnMSet a fromUniMSet (AtmBindMSet msa h) = bindMSet msa g where g a = fromUniMSet (h a) Here we need to impose an Eq constraint on the element type. Note that this single constraint is enough to make all invocations of returnMSet throughout the conversion legal. The reason is again our use of normal forms. ## A generic solution The solutions to the constrained-monad problem for sets and multisets are very similar. It is certainly not good if we have to write almost the same code for every new constrained monad that we want to make accessible via the Monad class. Therefore, we define a generic type that covers all such monads: data UniMonad c t a where Return :: a -> UniMonad c t a AtmBind :: c a => t a -> (a -> UniMonad c t b) -> UniMonad c t b The parameter t of UniMonad is the underlying data type, like Set or MSet, and the parameter c is the constraint that has to be imposed on the type parameter of the first argument of the bind operation. For every c and t, we make UniMonad c t an instance of Monad: instance Monad (UniMonad c t) where return a = Return a Return a >>= f = f a AtmBind ta h >>= f = AtmBind ta h' where h' a = h a >>= f We define a function lift that converts from the underlying data type to UniMonad and thus generalizes toUniSet and toUniMSet: lift :: c a => t a -> UniMonad c t a lift ta = AtmBind ta Return Evaluation of normal forms is just folding with the return and bind operations of the underlying data type. Therefore, we implement a fold operator for UniMonad: fold :: (a -> r) -> (forall a . c a => t a -> (a -> r) -> r) -> UniMonad c t a -> r fold return _ (Return a) = return a fold return atmBind (AtmBind ta h) = atmBind ta g where g a = fold return atmBind (h a) Note that fold does not need to deal with constraints, neither with constraints on the result type parameter of return (like Eq in the case of MSet), nor with constraints on the result type parameter of bind (like Ord in the case of Set). This is because fold works with any result type r. Now let us implement Monad-compatible sets and multisets based on UniMonad. In the case of sets, we face the problem that UniMonad takes a constraint for the type parameter of the first bind argument, but bindSet does not have such a constraint. To solve this issue, we introduce a type class Unconstrained of which every type is an instance: class Unconstrained a instance Unconstrained a The implementation of Monad-compatible sets is now straightforward: type UniMonadSet = UniMonad Unconstrained Set toUniMonadSet :: Set a -> UniMonadSet a toUniMonadSet = lift fromUniMonadSet :: Ord a => UniMonadSet a -> Set a fromUniMonadSet = fold returnSet bindSet The implementation of Monad-compatible multisets does not need any utility definitions, but can be given right away: type UniMonadMSet = UniMonad Finite MSet toUniMonadMSet :: Finite a => MSet a -> UniMonadMSet a toUniMonadMSet = lift fromUniMonadMSet :: Eq a => UniMonadMSet a -> MSet a fromUniMonadMSet = fold returnMSet bindMSet Tagged: Andy Gill, constrained-normal (Haskell package), Constraint (kind), containers (Haskell package), functional programming, GADT, George Giorgidze, GHC, Haskell, Institute of Cybernetics, Jan Bracker, literate programming, monad, natural-numbers (Haskell package), Neil Sculthorpe, normal form, talk, Theory Lunch # MIU in Curry More than two years ago, my colleague Denis Firsov and I gave a series of three Theory Lunch talks about the MIU string rewriting system from Douglas Hofstadter’s MU puzzle. The first talk was about a Haskell implementation of MIU, the second talk was an introduction to the functional logic programming language Curry, and the third talk was about a Curry implementation of MIU. The blog articles MIU in Haskell and A taste of Curry are write-ups of the first two talks. However, a write-up of the third talk has never seen the light of day so far. This is changed with this article. As usual, this article is written using literate programming. The article source is a literate Curry file, which you can load into KiCS2 to play with the code. I want to thank all the people from the Curry mailing list who have helped me improving the code in this article. ## Preliminaries We import the module SearchTree: import SearchTree ## Basic things We define the type Sym of symbols and the type Str of symbol strings: data Sym = M | I | U showSym :: Sym -> String showSym M = "M" showSym I = "I" showSym U = "U" type Str = [Sym] showStr :: Str -> String showStr str = concatMap showSym str Next, we define the type Rule of rules: data Rule = R1 | R2 | R3 | R4 showRule :: Rule -> String showRule R1 = "R1" showRule R2 = "R2" showRule R3 = "R3" showRule R4 = "R4" So far, the Curry code is basically the same as the Haskell code. However, this is going to change below. ## Rule application Rule application becomes a lot simpler in Curry. In fact, we can code the rewriting rules almost directly to get a rule application function: applyRule :: Rule -> Str -> Str applyRule R1 (init ++ [I]) = init ++ [I, U] applyRule R2 ([M] ++ tail) = [M] ++ tail ++ tail applyRule R3 (pre ++ [I, I, I] ++ post) = pre ++ [U] ++ post applyRule R4 (pre ++ [U, U] ++ post) = pre ++ post Note that we do not return a list of derivable strings, as we did in the Haskell solution. Instead, we use the fact that functions in Curry are nondeterministic. Furthermore, we do not need the helper functions splits and replace that we used in the Haskell implementation. Instead, we use the ++-operator in conjunction with functional patterns to achieve the same functionality. Now we implement a utility function applyRules for repeated rule application. Our implementation uses a similar trick as the famous Haskell implementation of the Fibonacci sequence: applyRules :: [Rule] -> Str -> [Str] applyRules rules str = tail strs where strs = str : zipWith applyRule rules strs The Haskell implementation does not need the applyRules function, but it needs a lot of code about derivation trees instead. In the Curry solution, derivation trees are implicit, thanks to nondeterminism. ## Derivations A derivation is a sequence of strings with rules between them such that each rule takes the string before it to the string after it. We define types for representing derivations: data Deriv = Deriv [DStep] Str data DStep = DStep Str Rule showDeriv :: Deriv -> String showDeriv (Deriv steps goal) = " " ++ concatMap showDStep steps ++ showStr goal ++ "\n" showDerivs :: [Deriv] -> String showDerivs derivs = concatMap ((++ "\n") . showDeriv) derivs showDStep :: DStep -> String showDStep (DStep origin rule) = showStr origin ++ "\n-> (" ++ showRule rule ++ ") " Now we implement a function derivation that takes two strings and returns the derivations that turn the first string into the second: derivation :: Str -> Str -> Deriv derivation start end | start : applyRules rules start =:= init ++ [end] = Deriv (zipWith DStep init rules) end where rules :: [Rule] rules free init :: [Str] init free Finally, we define a function printDerivations that explicitly invokes a breadth-first search to compute and ultimately print derivations: printDerivations :: Str -> Str -> IO () printDerivations start end = do searchTree <- getSearchTree (derivation start end) putStr$ showDerivs (allValuesBFS searchTree)

You may want to enter

printDerivations [M, I] [M, I, U]

at the KiCS2 prompt to see the derivations function in action.

Tagged: breadth-first search, Curry, Denis Firsov, Douglas Hofstadter, functional logic programming, functional pattern, functional programming, Haskell, Institute of Cybernetics, KiCS2, literate programming, logic programming, MU puzzle, string rewriting, talk, Theory Lunch

# A taste of Curry

Curry is a programming language that integrates functional and logic programming. Last week, Denis Firsov and I had a look at Curry, and Thursday, I gave an introductory talk about Curry in the Theory Lunch. This blog post is mostly a write-up of my talk.

Like Haskell, Curry has support for literate programming. So I wrote this blog post as a literate Curry file, which is available for download. If you want to try out the code, you have to install the Curry system KiCS2. The code uses the functional patterns language extension, which is only supported by KiCS2, as far as I know.

## Functional programming

The functional fragment of Curry is very similar to Haskell. The only fundamental difference is that Curry does not support type classes.

Let us do some functional programming in Curry. First, we define a type whose values denote me and some of my relatives.

data Person = Paul
| Joachim
| Rita
| Wolfgang
| Veronika
| Johanna
| Jonathan
| Jaromir

Now we define a function that yields the father of a given person if this father is covered by the Person type.

father :: Person -> Person
father Joachim  = Paul
father Rita     = Joachim
father Wolfgang = Joachim
father Veronika = Joachim
father Johanna  = Wolfgang
father Jonathan = Wolfgang
father Jaromir  = Wolfgang

Based on father, we define a function for computing grandfathers. To keep things simple, we only consider fathers of fathers to be grandfathers, not fathers of mothers.

grandfather :: Person -> Person
grandfather = father . father

## Combining functional and logic programming

Logic programming languages like Prolog are able to search for variable assignments that make a given proposition true. Curry, on the other hand, can search for variable assignments that make a certain expression defined.

For example, we can search for all persons that have a grandfather according to the above data. We just enter

grandfather person where person free

at the KiCS2 prompt. KiCS2 then outputs all assignments to the person variable for which grandfather person is defined. For each of these assignments, it additionally prints the result of the expression grandfather person.

## Nondeterminism

Functions in Curry can actually be non-deterministic, that is, they can return multiple results. For example, we can define a function element that returns any element of a given list. To achieve this, we use overlapping patterns in our function definition. If several equations of a function definition match a particular function application, Curry takes all of them, not only the first one, as Haskell does.

element :: [el] -> el
element (el : _)   = el
element (_  : els) = element els

Now we can enter

element "Hello!"

at the KiCS2 prompt, and the system outputs six different results.

## Logic programming

We have already seen how to combine functional and logic programming with Curry. Now we want to do pure logic programming. This means that we only want to search for variable assignments, but are not interested in expression results. If you are not interested in results, you typically use a result type with only a single value. Curry provides the type Success with the single value success for doing logic programming.

Let us write some example code about routes between countries. We first introduce a type of some European and American countries.

| Estonia
| Germany
| Latvia
| Lithuania
| Mexico
| Poland
| Russia
| USA

Now we want to define a relation called borders that tells us which country borders which other country. We implement this relation as a function of type

Country -> Country -> Success

that has the trivial result success if the first country borders the second one, and has no result otherwise.

Note that this approach of implementing a relation is different from what we do in functional programming. In functional programming, we use Bool as the result type and signal falsity by the result False. In Curry, however, we signal falsity by the absence of a result.

Our borders relation only relates countries with those neighbouring countries whose names come later in alphabetical order. We will soon compute the symmetric closure of borders to also get the opposite relationships.

borders :: Country -> Country -> Success
Canada    borders USA       = success
Estonia   borders Latvia    = success
Estonia   borders Russia    = success
Germany   borders Poland    = success
Latvia    borders Lithuania = success
Latvia    borders Russia    = success
Lithuania borders Poland    = success
Mexico    borders USA       = success

Now we want to define a relation isConnected that tells whether two countries can be reached from each other via a land route. Clearly, isConnected is the equivalence relation that is generated by borders. In Prolog, we would write clauses that directly express this relationship between borders and isConnected. In Curry, on the other hand, we can write a function that generates an equivalence relation from any given relation and therefore does not only work with borders.

We first define a type alias Relation for the sake of convenience.

type Relation val = val -> val -> Success

Now we define what reflexive, symmetric, and transitive closures are.

reflClosure :: Relation val -> Relation val
reflClosure rel val1 val2 = rel val1 val2
reflClosure rel val  val  = success

symClosure :: Relation val -> Relation val
symClosure rel val1 val2 = rel val1 val2
symClosure rel val2 val1 = rel val1 val2

transClosure :: Relation val -> Relation val
transClosure rel val1 val2 = rel val1 val2
transClosure rel val1 val3 = rel val1 val2 &
transClosure rel val2 val3

where val2 free

The operator & used in the definition of transClosure has type

Success -> Success -> Success

and denotes conjunction.

We define the function for generating equivalence relations as a composition of the above closure operators. Note that it is crucial that the transitive closure operator is applied after the symmetric closure operator, since the symmetric closure of a transitive relation is not necessarily transitive.

equivalence :: Relation val -> Relation val
equivalence = reflClosure . transClosure . symClosure

The implementation of isConnected is now trivial.

isConnected :: Country -> Country -> Success
isConnected = equivalence borders

Now we let KiCS2 compute which countries I can reach from Estonia without a ship or plane. We do so by entering

Estonia isConnected country where country free

at the prompt.

We can also implement a nondeterministic function that turns a country into the countries connected to it. For this, we use a guard that is of type Success. Such a guard succeeds if it has a result at all, which can only be success, of course.

connected :: Country -> Country
connected country1
| country1 isConnected country2 = country2

where country2 free

## Equational constraints

Curry has a predefined operator

=:= :: val -> val -> Success

that stands for equality.

We can use this operator, for example, to define a nondeterministic function that yields the grandchildren of a given person. Again, we keep things simple by only considering relationships that solely go via fathers.

grandchild :: Person -> Person
grandchild person
| grandfather grandkid =:= person = grandkid

where grandkid free

Note that grandchild is the inverse of grandfather.

## Functional patterns

Functional patterns are a language extension that allows us to use ordinary functions in patterns, not just data constructors. Functional patterns are implemented by KiCS2.

Let us look at an example again. We want to define a function split that nondeterministically splits a list into two parts.1 Without functional patterns, we can implement splitting as follows.

split' :: [el] -> ([el],[el])
split' list | front ++ rear =:= list = (front,rear)

where front, rear free

With functional patterns, we can implement splitting in a much simpler way.

split :: [el] -> ([el],[el])
split (front ++ rear) = (front,rear)

As a second example, let us define a function sublist that yields the sublists of a given list.

sublist :: [el] -> [el]
sublist (_ ++ sub ++ _) = sub

## Inverting functions

In the grandchild example, we showed how we can define the inverse of a particular function. We can go further and implement a generic function inversion operator.

inverse :: (val -> val') -> (val' -> val)
inverse fun val' | fun val =:= val' = val where val free

With this operator, we could also implement grandchild as inverse grandfather.

Inverting functions can make our lives a lot easier. Consider the example of parsing. A parser takes a string and returns a syntax tree. Writing a parser directly is a non-trivial task. However, generating a string from a syntax tree is just a simple functional programming exercise. So we can implement a parser in a simple way by writing a converter from syntax trees to strings and inverting it.

We show this for the language of all arithmetic expressions that can be built from addition, multiplication, and integer constants. We first define types for representing abstract syntax trees. These types resemble a grammar that takes precedence into account.

type Expr = Sum

data Sum     = Sum Product [Product]
data Product = Product Atom [Atom]
data Atom    = Num Int | Para Sum

Now we implement the conversion from abstract syntax trees to strings.

toString :: Expr -> String
toString = sumToString

sumToString :: Sum -> String
sumToString (Sum product products)
= productToString product                           ++
concatMap ((" + " ++) . productToString) products

productToString :: Product -> String
productToString (Product atom atoms)
= atomToString atom                           ++
concatMap ((" * " ++) . atomToString) atoms

atomToString :: Atom -> String
atomToString (Num num)  = show num
atomToString (Para sum) = "(" ++ sumToString sum ++ ")"

Implementing the parser is now extremely simple.

parse :: String -> Expr
parse = inverse toString

KiCS2 uses a depth-first search strategy by default. However, our parser implementation does not work with depth-first search. So we switch to breadth-first search by entering

:set bfs

at the KiCS2 prompt. Now we can try out the parser by entering

parse "2 * (3 + 4)" .

1. Note that our split function is not the same as the split function in Curry’s List module.

Tagged: breadth-first search, Curry, Denis Firsov, depth-first search, functional logic programming, functional pattern, functional programming, Institute of Cybernetics, KiCS2, literate programming, logic programming, parsing, Prolog, talk, Theory Lunch, type class

# Automatically checking for syntax errors with Git's pre-commit hook

Over the past couple of days I've written about how I committed a syntax error on a cron script, and a co-worker had to fix it on Saturday morning. I observed that I should have remembered to check the script for syntax errors before committing it, and several people wrote to point out to me that this is the sort of thing one should automate.

(By the way, please don't try to contact me on Twitter. It won't work. I have been on Twitter Vacation for months and have no current plans to return.)

Git has a “pre-commit hook” feature, which means that you can set up a program that will be run every time you attempt a commit, and which can abort the commit if it doesn't like what it sees. This is the natural place to put an automatic syntax check. Some people suggested that it should be part of the CI system, or even the deployment system, but I don't control those, and anyway it is much better to catch this sort of thing as early as possible. I decided to try to implement a pre-commit hook to check syntax.

Unlike some of the git hooks, the pre-commit hook is very simple to use. It gets run when you try to make a commit, and the commit is aborted if the hook exits with a nonzero status.

I made one mistake right off the bat: I wrote the hook in Bourne shell, even though I swore years ago to stop writing shell scripts. Everything that I want to write in shell should be written in Perl instead or in some equivalently good language like Python. But the sample pre-commit hook was written in shell and when I saw it I went into automatic shell scripting mode and now I have yet another shell script that will have to be replaced with Perl when it gets bigger. I wish I would stop doing this.

Here is the hook, which, I should say up front, I have not yet tried in day-to-day use. The complete and current version is on github.

#!/bin/bash

function typeof () {
filename=$1 case$filename in
*.pl | *.pm) echo perl; exit ;;
esac

line1=$(head -1$1)
case $line1 in '#!'*perl ) echo perl; exit ;; esac } Some of the sample programs people showed me decided which files needed to be checked based only on the filename. This is not good enough. My most important Perl programs have filenames with no extension. This typeof function decides which set of checks to apply to each file, and the minimal demonstration version here can do that based on filename or by looking for the #!...perl line in the first line of the file contents. I expect that this function will expand to include other file types; for example *.py ) echo python; exit ;; is an obvious next step. if [ ! -z$COMMIT_OK ]; then
exit 0;
fi

This block is an escape hatch. One day I will want to bypass the hook and make a commit without performing the checks, and then I can COMMIT_OK=1 git commit …. There is actually a --no-verify flag to git-commit that will skip the hook entirely, but I am unlikely to remember it.

(I am also unlikely to remember COMMIT_OK=1. But I know from experience that I will guess that I might have put an escape hatch into the hook. I will also guess that there might be a flag to git-commit that does what I want, but that will seem less likely to be true, so I will look in the hook program first. This will be a good move because my hook is much shorter than the git-commit man page. So I will want the escape hatch, I will look for it in the best place, and I will find it. That is worth two lines of code. Sometimes I feel like the guy in Memento. I have not yet resorted to tattooing COMMIT_OK=1 on my chest.)

exec 1>&2

This redirects the standard output of all subsequent commands to go to standard error instead. It makes it more convenient to issue error messages with echo and such like. All the output this hook produces is diagnostic, so it is appropriate for it to go to standard error.

allOK=true
for file in $(git diff --cached --name-only | sort) ; do allOK is true if every file so far has passed its checks. badFiles is a list of files that failed their checks. the git diff --cached --name-only function interrogates the Git index for a list of the files that have been staged for commit. type=$(typeof "$file") This invokes the typeof function from above to decide the type of the current file. BAD=false When a check discovers that the current file is bad, it will signal this by setting BAD to true. echo echo "## Checking file$file (type $type)" case$type in
perl )
perl -cw $file || BAD=true [ -x$file ] || { echo "File is not executable"; BAD=true; }
;;
* )
echo "Unknown file type: $file; no checks" ;; esac This is the actual checking. To check Python files, we would add a python) … ;; block here. The * ) case is a catchall. The perl checks run perl -cw, which does syntax checking without executing the program. It then checks to make sure the file is executable, which I am sure is a mistake, because these checks are run for .pm files, which are not normally supposed to be executable. But I wanted to test it with more than one kind of check. if$BAD; then
allOK=false;
badFiles="$badFiles;$file"
fi
done

If the current file was bad, the allOK flag is set false, and the commit will be aborted. The current filename is appended to badFiles for a later report. Bash has array variables but I don't remember how they work and the manual made it sound gross. Already I regret not writing this in a real language.

After the modified files have been checked, the hook exits successfully if they were all okay, and prints a summary if not:

if $allOK; then exit 0; else echo '' echo '## Aborting commit. Failed checks:' for file in$(echo $badFiles | tr ';' ' '); do echo "$file"
done
exit 1;
fi

This hook might be useful, but I don't know yet; as I said, I haven't really tried it. But I can see ahead of time that it has a couple of drawbacks. Of course it needs to be built out with more checks. A minor bug is that I'd like to apply that is-executable check to Perl files that do not end in .pm, but that will be an easy fix.

But it does have one serious problem I don't know how to fix yet. The hook checks the versions of the files that are in the working tree, but not the versions that are actually staged for the commit!

The most obvious problem this might cause is that I might try to commit some files, and then the hook properly fails because the files are broken. Then I fix the files, but forget to add the fixes to the index. But because the hook is looking at the fixed versions in the working tree, the checks pass, and the broken files are committed!

A similar sort of problem, but going the other way, is that I might make several changes to some file, use git add -p to add the part I am ready to commit, but then the commit hook fails, even though the commit would be correct, because the incomplete changes are still in the working tree.

I did a little tinkering with git stash save -k to try to stash the unstaged changes before running the checks, something like this:

git stash save -k "pre-commit stash" || exit 2
trap "git stash pop" EXIT

but I wasn't able to get anything to work reliably. Stashing a modified index has never worked properly for me, perhaps because there is something I don't understand. Maybe I will get it to work in the future. Or maybe I will try a different method; I can think of several offhand:

• The hook could copy each file to a temporary file and then run the check on the temporary file. But then the diagnostics emitted by the checks would contain the wrong filenames.

• It could move each file out of the way, check out the currently-staged version of the file, check that, and then restore the working tree version. (It can skip this process for files where the staged and working versions are identical.) This is not too complicated, but if it messes up it could catastrophically destroy the unstaged changes in the working tree.

• Check out the entire repository and modified index into a fresh working tree and check that, then discard the temporary working tree. This is probably too expensive.

• This one is kind of weird. It could temporarily commit the current index (using --no-verify), stash the working tree changes, and check the files. When the checks are finished, it would unstash the working tree changes, use git-reset --soft to undo the temporary commit, and proceed with the real commit if appropriate.

• Come to think of it, this last one suggests a much better version of the same thing: instead of a pre-commit hook, use a post-commit hook. The post-commit hook will stash any leftover working tree changes, check the committed versions of the files, unstash the changes, and, if the checks failed, undo the commit with git-reset --soft.

Right now the last one looks much the best but perhaps there's something straightforward that I didn't think of yet.

[ Thanks to Adam Sjøgren, Jeffrey McClelland, and Jack Vickeridge for discussing this with me. Jeffrey McClelland also suggested that syntax checks could be profitably incorporated as a post-receive hook, which is run on the remote side when new commits are pushed to a remote. I said above that running the checks in the CI process seems too late, but the post-receive hook is earlier and might be just the thing. ]

[ Addendum: Daniel Holz wrote to tell me that the Yelp pre-commit frameworkhandles the worrisome case of unstaged working tree changes. The strategy is different from the ones I suggested above. If I'm reading this correctly, it records the unstaged changes in a patch file, which it sticks somewhere, and then checks out the index. If all the checks succeed, it completes the commit and then tries to apply the patch to restore the working tree changes. The checks in Yelp's framework might modify the staged files, and if they do, the patch might not apply; in this case it rolls back the whole commit. Thank you M. Holtz! ]

# Better exception messages

Let's write a really silly, highly inefficient (my favorite kind!) program that connects to multiple HTTP servers and sends a very simple request. Using the network package, this is really straightforward:

#!/usr/bin/env stack
-- stack --install-ghc --resolver lts-8.0 runghc --package network -- -Wall -Werror
import Network       (PortID (PortNumber), PortNumber, connectTo)
import System.IO     (hClose, hPutStrLn)

dests :: [(String, PortNumber)]
dests =
[ ("localhost", 80)
, ("localhost", 8080)
, ("10.0.0.138", 80)
]

main :: IO ()
main = do
handles <- forM dests $\(host, port) -> connectTo host (PortNumber port) forM_ handles$ \h -> hPutStrLn h "GET / HTTP/1.1\r\n\r\n"
forM_ handles hClose

We have our destinations. We open a connection to each of them, send our data, and then close the connection. You may have plenty of objections to how I've written this: we shouldn't be using String, shouldn't we flush the Handle, etc. Just ignore that for now. I'm going to run this on my local system, and get the following output:

$./foo.hs foo.hs: connect: does not exist (Connection refused) Riddle me this: which of the destinations above did the connection fail for? Answer: without changing our program, we have no idea. And that's the point of this blog post: all too often in the Haskell world, we get error messages from a program without nearly enough information to debug it. Prelude.undefined, Prelude.read: no parse, and Prelude.head: empty list are all infamous examples where a nice stack trace would save lots of pain. I'm talking about something slightly different. When you throw an exception in your code, whether it be via throwIO, returning Left, using fail, or using error, please give us some context. During development, it's a pain to have to dive into the code, add some trace statements, figure out what the actual problem is, and then remove the trace statements. When running in production, that extra information can be the difference between a two-minutes operations level fix (like opening a port in the firewall) versus a multi-hour debugging excursion. Concretely, here's an example of how I'd recommend collecting more information from connectTo: #!/usr/bin/env stack -- stack --install-ghc --resolver lts-5.10 runghc --package network -- -Wall -Werror {-# LANGUAGE DeriveDataTypeable #-} import Control.Exception (Exception, IOException, catch, throwIO) import Control.Monad (forM, forM_) import Data.Typeable (Typeable) import Network (HostName, PortID (PortNumber), PortNumber, connectTo) import System.IO (Handle, hClose, hPutStrLn) data ConnectException = ConnectException HostName PortID IOException deriving (Show, Typeable) instance Exception ConnectException connectTo' :: HostName -> PortID -> IO Handle connectTo' host port = connectTo host port catch \e -> throwIO (ConnectException host port e) dests :: [(String, PortNumber)] dests = [ ("localhost", 80) , ("localhost", 8080) , ("10.0.0.138", 80) ] main :: IO () main = do handles <- forM dests$ \(host, port) -> connectTo' host (PortNumber port)
forM_ handles $\h -> hPutStrLn h "GET / HTTP/1.1\r\n\r\n" forM_ handles hClose Notice how the ConnectException datatype provides plenty of information about the context that connectTo' was called from (in fact, all available information). If I run this program, the problem is immediately obvious:$ ./bar.hs
bar.hs: ConnectException "localhost" (PortNumber 80) connect: does not exist (Connection refused)

My web server isn't running locally on port 80. My ops team can now go kick the nginx/Warp process or do whatever other magic they need to do to get things running. All without bothering me at 2am :)

You may be thinking that this extra data type declaration is a lot of boilerplate overhead. While it does add some tedium, the benefit of being able to not only catch the exact exception we care about, but also easily extract the relevant context information, can pay off in completely unexpected ways in the future. I highly recommend it.

Since no Haskell blog post about exceptions is complete without it, let me cover some controversy:

• I know some people absolutely hate runtime exceptions. This point is orthogonal: however you decide to report exceptions to your users (Left, ExceptT, impure exceptions, etc), be kind to them and provide this extra context information.
• There are some problems with the approach I gave above regarding hierarchical exceptions. I'm specifically not diving into the details of hierarchical exceptions right now, since it's a complex topic that deserves its own dedicated post.
• Similar to the above point, it's a fair question whether you should group all exceptions together in one type with lots of data constructors for a package/module, or create lots of separate datatypes. Again, proper design of an exception type really deserves its own post. FWIW, in http-client, I elected to have an HttpException type with lots of data constructors.

Also, I left it out for brevity, but including a displayException method in your Exception instance can allow programs to display much more user-friendly error messages to end users.

While nothing I've said here is revolutionary, it's a small tweak to a library author's development style that can have a profound impact on users of the library, both at the dev level and those running the executable itself.

# TFP 2017: Call for Papers

-----------------------------
C A L L   F O R   P A P E R S
-----------------------------

======== TFP 2017 ===========

18th Symposium on Trends in Functional Programming
19-21 June, 2017
University of Kent, Canterbury
https://www.cs.kent.ac.uk/<wbr></wbr>events/tfp17/index.html

The symposium on Trends in Functional Programming (TFP) is an
international forum for researchers with interests in all aspects of
functional programming, taking a broad view of current and future
trends in the area. It aspires to be a lively environment for
presenting the latest research results, and other contributions (see
below). Authors of draft papers will be invited to submit revised
papers based on the feedback receive at the symposium.  A
post-symposium refereeing process will then select a subset of these
articles for formal publication.

TFP 2017 will be the main event of a pair of functional programming
events. TFP 2017 will be accompanied by the International Workshop on
Trends in Functional Programming in Education (TFPIE), which will take
place on 22 June.

The TFP symposium is the heir of the successful series of Scottish
Functional Programming Workshops. Previous TFP symposia were held in
* Edinburgh (Scotland) in 2003;
* Munich (Germany) in 2004;
* Tallinn (Estonia) in 2005;
* Nottingham (UK) in 2006;
* New York (USA) in 2007;
* Nijmegen (The Netherlands) in 2008;
* Komarno (Slovakia) in 2009;
* Oklahoma (USA) in 2010;
* St. Andrews (UK) in 2012;
* Provo (Utah, USA) in 2013;
* Soesterberg (The Netherlands) in 2014;
* Inria Sophia-Antipolis (France) in 2015;
* and Maryland (USA) in 2016.

(http://www.tifp.org/).

== SCOPE ==

The symposium recognizes that new trends may arise through various
routes.  As part of the Symposium's focus on trends we therefore
identify the following five article categories. High-quality articles
are solicited in any of these categories:

Research Articles: leading-edge, previously unpublished research work
Position Articles: on what new trends should or should not be
Project Articles: descriptions of recently started new projects
Evaluation Articles: what lessons can be drawn from a finished project
Overview Articles: summarizing work with respect to a trendy subject

Articles must be original and not simultaneously submitted for
publication to any other forum. They may consider any aspect of
functional programming: theoretical, implementation-oriented, or
experience-oriented.  Applications of functional programming
techniques to other languages are also within the scope of the
symposium.

Topics suitable for the symposium include, but are not limited to:

Functional programming and multicore/manycore computing
Functional programming in the cloud
High performance functional computing
Extra-functional (behavioural) properties of functional programs
Dependently typed functional programming
Validation and verification of functional programs
Debugging and profiling for functional languages
Functional programming in different application areas:
security, mobility, telecommunications applications, embedded
systems, global computing, grids, etc.
Interoperability with imperative programming languages
Novel memory management techniques
Program analysis and transformation techniques
Empirical performance studies
Abstract/virtual machines and compilers for functional languages
(Embedded) domain specific languages
New implementation strategies
Any new emerging trend in the functional programming area

If you are in doubt on whether your article is within the scope of

== BEST PAPER AWARDS ==

To reward excellent contributions, TFP awards a prize for the best paper
accepted for the formal proceedings.

TFP traditionally pays special attention to research students,
acknowledging that students are almost by definition part of new
subject trends. A student paper is one for which the authors state
that the paper is mainly the work of students, the students are listed
as first authors, and a student would present the paper. A prize for
the best student paper is awarded each year.

In both cases, it is the PC of TFP that awards the prize. In case the
best paper happens to be a student paper, that paper will then receive
both prizes.

TBD

== PAPER SUBMISSIONS ==

Acceptance of articles for presentation at the symposium is based on a
lightweight peer review process of extended abstracts (4 to 10 pages
in length) or full papers (20 pages). The submission must clearly
indicate which category it belongs to: research, position, project,
evaluation, or overview paper. It should also indicate which authors
are research students, and whether the main author(s) are students.  A
feedback by one of the PC members shortly after the symposium has
taken place.

We use EasyChair for the refereeing process. Papers must be submitted at:

https://easychair.org/<wbr></wbr>conferences/?conf=tfp17

Papers must be written in English, and written using the LNCS
Springer LNCS web site:

http://www.springer.com/<wbr></wbr>computer/lncs?SGWID=0-164-6-<wbr></wbr>793341-0

== IMPORTANT DATES ==

Submission of draft papers:     5 May, 2017
Registration:                   11 June, 2017
TFP Symposium:                  19-21 June, 2017
Student papers feedback:        29 June, 2017
Submission for formal review:   2 August, 2017
Notification of acceptance:     3 November, 2017
Camera ready paper:             2 December, 2017

== PROGRAM COMMITTEE ==

TBD

# Data Engineer at Takt, Inc. (Full-time)

Takt is seeking data engineers to help develop our flagship product. Our platform learns and adapts to people's preferences, habits, and feedback—orchestrating highly relevant experiences that are truly unique to each person. Our vision will change the way people engage across multiple industries, be it retail, finance, or healthcare.

We share your passion for using data to solve complex problems. You understand that legacy code is the work you did yesterday. You'll work in small, self-sufficient teams with a common goal: deliver excellent software anchored in an agile culture of quality, delivery, and innovation.

Key Responsibilities:

• Apply data to solve complex problems—understanding the importance of data pipelines and quality
• Refine, harden, and productionize models developed by data scientists
• Build infrastructure for real-time analytics and real-time predictive intelligence based on large, diverse, and dynamic data sets
• Apply software testing practices to the work you do
• Help grow our engineering team

Skills and Experience:

• Profound and extensive experience with cluster computing frameworks; we work with Spark and Scala
• Strong programming skills in at least one language; functional languages (Haskell, Scala, Clojure, Erlang) are preferred
• Significant experience with SQL and NoSQL databases
• Demonstrated experience with cloud platforms, such as AWS

Bonus Points:

• You welcome the responsibility and thrill that comes with being a member of a founding team
• You're motivated, dependable, and continuously focused on excellence
• You court perfection, but are grounded and practical
• You're interested in improving humanity's relationship with technology

Get information on how to apply for this position.

# Hackage Security and Stack

Back in 2015, there were two proposals made for securing package distribution in Haskell. The Stackage team proposed and implemented a solution using HTTPS and Git, which was then used as the default in Stack. Meanwhile, the Hackage team moved ahead with hackage-security. Over the past few weeks, I've been working on moving Stack over to hackage-security (more on motivation below). The current status of the overall hackage-security roll-out is:

• Hackage is now providing the relevant data for hackage-security (the 01-index.tar file and signature files)
• cabal-install will move over to hackage-security in its next release
• The FP Complete Hackage mirror is using hackage-security (and in particular Herbert's hackage-mirror-tool) to run its S3-backed mirror.
• On the master branch, Stack defaults to using hackage-security for downloading package metadata. We may even remove support for Git-based indices entirely, but that's a discussion for another day.

One upside to this is more reliable package index download time. We have had complaints from some firewalled users of slow Git clone time, so this is a good thing. We're still planning on maintaining the Git-based package indices for people using them (to my knowledge they are still being used by Nix, and all-cabal-metadata is still used to power a lot of the information on stackage.org).

However, there's one significant downside I've encountered in the current implementation that I want to discuss.

## Background

Quick summary of how hackage-security works: there is a 01-index.tar file, the contents of which I'll discuss momentarily. This is the file which is downloaded by Stack/cabal-install when you "update your index." It is signed by a cryptographic algorithm specified within the hackage-security project, and whenever a client does an update, it must verify the signature. In theory, when that signature is verified, we know that the contents of the 01-index.tar file are unmodified.

Within this file are two (relevant) kinds of files: the .cabal files for every upload to Hackage (including revisions), and .json files containing metadata about the package tarballs themselves. Importantly, this includes a SHA256 checksum and the size of the tarball. Using these already-validated-to-be-correct JSON files, we can download and verify a package tarball, even over an insecure connection.

The alternative Git-based approach that the Stackage team proposed has an almost-identical JSON file concept in the all-cabal-hashes repo. Originally, these were generated by downloading tarballs from https://hackage.haskell.org (note the HTTPS). However, a number of months back it became known that the connection between the CDN in front of Hackage and Hackage itself was not TLS-secured, and therefore reliance on HTTPS was not possible. We now rely on the JSON files provided by hackage-security to generate the JSON files used in the Git repo.

## The problem

With that background, the bug is easy to describe: sometimes the .json files are missing from the 01-index.tar file. This was originally opened in April 2016 (for Americans: on tax day no less), and then I rediscovered the issue three weeks ago when working on Stack.

Over the weekend, another .json file went missing, resulting in the FP Complete mirror not receiving updates until I manually updated the list of missing index files. Due to the inability to securely generate the .json file in the all-cabal-hashes Git repo without the file existing upstream, that file is now missing in all-cabal-hashes, causing downstream issues to the Nix team.

## How it manifests

There are a number of outcomes to be aware of from this issue:

• The FP Complete mirror, and any other mirror using Herbert's tool, will sometimes stop updating if a new JSON file is missing. This is an annoyance for end users, and a frustration for the mirror maintainers. Fortunately, updating the mirror tool code with the added index isn't too heavy a burden. Unfortunately, due to the lack of HTTPS between Hackage and its CDN, there's no truly secure way to do this update.
• End users cannot currently use packages securely if they are affected by this bug. You can see the full list at the time of writing this post.
• Stack has had code in place to reject indices that do not provide complete signature cover for a long while (I think since its initial release). Unfortunately, this code cannot be turned on for hackage-security (which is how I discovered this bug in the first place). We can implement a new functionality with weaker requirements (refuse to download a package that is missing signature information), but ideally we could use the more strict semantics.
• The Nix team cannot rely on hashes being present in all-cabal-hashes. I can't speak to the Nix team internal processes, and cannot therefore assess how big an impact that is.

## Conclusion

Overall, I'm still very happy that we've moved Stack over to hackage-security:

• It fixed an immediate problem for users behind a firewall, which we otherwise would have needed to work around with new code (downloading a Git repo snapshot). Avoiding writing new code is always a win :).
• Layering the HTTPS/Git-based security system on top of hackage-security doesn't make things more secure, it just adds two layers for security holes to exist in instead of one. From a security standpoint, if Hackage is providing a security mechanism, it makes sense to leverage it directly. Said another way: if it turns out that hackage-security is completely insecure, our Git-based layer would have been vulnerable anyway since it relied on hackage-security.
• By moving both Stack and cabal-install over to hackage-security for client access, we'll be able to test that code more thoroughly, hopefully resulting in a more reliable security mechanism for both projects to share (small example of such stress-testing).
• Stack has always maintained compatibility with some form of non-Git index, so we've always had two code paths for index updates. As hinted at above, this change opens the door to removing the Git-based code path. And removing code is almost as good as avoiding writing new code.
• I would still feel more comfortable with the security of Hackage if HTTPS was used throughout, if only as a level of sanity in case all else fails. I hope that in the future the connection between Hackage and its CDN switches from insecure to secure. I also hope that cabal-install is still planning on moving over to using HTTPS for its downloads.

# We are hiring Functional Programmers.

The Modelling and Analytics Group was the original introducer of Functional Programming (especially Haskell) to the Financial Markets business of Standard Chartered Bank, and although there are now several other teams who contribute to our large codebase and who are also hiring, we now have some vacancies in the Core team.  Please think about applying!

Job Description:<o:p></o:p>
• Join the Modeling and Analytics’ Core team, part of Financial Markets Front Office, in Singapore or London.
• Apply Functional Programming concepts to the design and implementation of the unified bank’s Analytics library.
• Support infrastructure requests from the Quant group, flow trading desks, structured trading desks & structuring worldwide.

Candidate qualifications:<o:p></o:p>
• Master or PhD in Computer Science, with a focus among: functional programming, language & compiler design, efficient data processing.
• Excellent programming skills in one of the major statically typed functional languages (ideally Haskell), and preferably exhibited by academic output (research / teaching) or open-source development.
• Proficiency with C/C++ and debugging / performance tuning tools is a strong advantage.
• Good communication skills required for interactions with other team members and with trading desks.
• Familiarity with financial markets is a plus but not required.

How to apply:
• In the first instance, send your CV to Raphael.Montelatici@sc.com

# Immutability, Docker, and Haskell's ST type

In managing projects at FP Complete, I get to see both the software development and devops sides of our engineering practice. Over the years, I've been struck by the recurrence of a single word appearing repeatedly in both worlds: immutability.

On the software side, one of the strongest tenets of functional programming is immutable data structures. These are values which - once created - can never be changed again through the course of running the application. These reduce coupling between components, simplify concurrency and parallelism, and decrease the total number of moving pieces in a system, making it easier to maintain and develop over time.

On the devops side, immutable infrastructure is relatively a more recent discovery. By creating machine images and replacing rather than modifying existing components, we have a more reliable hosting setup to target, minimize the differences between test and production systems, and reduce the amount of error-prone, manual fiddling that leads to 3am coffee-fueled emergency recovery sessions.

It's no secret that containerization in general, and Docker in particular, has become very popular in the devops space. I've noticed that there's a strong parallel between how Docker images are built, and a technique from functional programming - the ST (State Thread) type. This blog post will explain both sides of the puzzle, and then explain how they match up.

## Dockerfile: mutable steps, immutable outcome

A Docker image is a complete Linux filesystem, providing all of the tools, libraries, and data files needed for its task. As a simple example, I recently created a simple Docker image containing the Stack build tool (more on that later) and Apache FOP for generating some PDFs. In the Docker world, the formula you use for creating a Docker image is a Dockerfile. Let's look at the (very simple) Dockerfile I wrote:

FROM fpco/pid1:16.04

RUN DEBIAN_FRONTEND=noninteractive apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install -y wget default-jre && \
tar zxf fop-2.1-bin.tar.gz && \
rm -f fop-2.1-bin.tar.gz && \
mv fop-2.1 /usr/local/share

In this file, I'm starting off from the fpco/pid1 base image, which provides us with a filesystem to start off with (it would obviously be pretty difficult to create a complete filesystem each time we wanted to create a new image). Then we provide a series of actions to take to modify that image. Looking at the example above, we:

• Update the list of APT packages available
• Install wget and the default Java Runtime Environment
• Install the Stack build tool by running a script
• Unpack the bundle and move it to /usr/local/share

Look at that list of steps. In no world could those actions be called "immutable." Every single one of them mutates the filesystem, either modifying files, adding files, or removing files. The end result of this mutation process is a new filesystem, captured in a Docker image.

And here's the important bit: this new image is totally immutable. You cannot in any way modify the image. You can create a new image based on it, but the original will remain unchanged. For all of history, this image will remain identical.*

In other words: a Dockerfile is a series of mutations which generates an immutable data structure.

* You may argue that you can delete the image, or you could create a new image with the same name. That's true, but as long as you're working with the image in question, it does not change. By contrast, each time you access the /tmp/foobar file, it may have different contents.

## The ST type

In a purely functional programming language like Haskell, data is immutable by default. This means that, if you have a variable holding an Int, you cannot change it. Consider this example code, playing around with a Map structure (also known as a dictionary or lookup table):

myMap <- makeSomeMap
print myMap
useMap myMap
print myMap

We make our initial Map using the makeSomeMap function, print its contents, pass it to some other function (useMap), and then print it again. Pop quiz: is there any way that the two print operations will print different values?

If you're accustomed to mutable languages like Java or Python, you'd probably say yes: myMap is (presumably) an object with mutable state, and the useMap function might modify it. In Haskell, that can't happen: you've passed a reference to myMap to your useMap function, but useMap is not allowed to modify it.

Of course, we would like to be able to create different values, so saying "you can't ever change anything" is a little daunting. The primary way of working with Haskell's immutable data structures is to have functions which create new values based on old ones. In this process, we create a new value by giving it some instructions for the change. For example, if in our example above, the myMap value had a mapping from names to ages, we could insert an extra value:

myMap <- makeSomeMap
let myModifiedMap = insert "Alice" 35 myMap
print myModifiedMap
useMap myModifiedMap
print myModifiedMap

However, this isn't real mutation: the original myMap remains the same. There are cases in which creating a completely new version of the data each time would be highly inefficient. Most sorting algorithms fall into this category, as they involve a large number of intermediate swaps. If each of those swaps generated a brand new array, it would be very slow with huge amounts of memory allocation.

Instead, Haskell provides the ST type, which allows for local mutations. While within an ST block, you can directly mutate local variables, such as mutable vectors. But none of those mutated values can escape the ST block, only immutable variants. To see how this works, look at this Haskell code (save it to Main.hs and run with stack Main.hs using the Stack build tool):

#!/usr/bin/env stack
{- stack --resolver lts-7.14 --install-ghc runghc
--package vector-algorithms -}
import Data.Vector (Vector, fromList, modify, freeze, thaw)
import Data.Vector.Algorithms.Insertion (sort)

-- longer version, to demonstrate what's actually happening
immutableSort :: Vector Int -> Vector Int
liftIO $print args This frontend plugin is taken straight from the GHC documentation (but with enough imports to make it compile ;-). It prints out the arguments passed to it. Next, we need a wrapper program around GHC which will invoke our plugin instead of regular GHC when we are called with the --interactive flag. Here is a simple script which works on Unix-like systems: import GHC.Paths import System.Posix.Process import System.Environment main = do args <- getArgs let interactive = "--interactive" elem args args' = do arg <- args case arg of "--interactive" -> ["--frontend", "Hello", "-plugin-package", "hello-plugin"] _ -> return arg executeFile ghc False (args' ++ if interactive then ["-user-package-db"] else []) Nothing Give this a Cabal file, and then install it to the user package database with cabal install (see the second bullet point below if you want to use a non-standard GHC via the -w flag): name: hello-plugin version: 0.1.0.0 license: BSD3 author: Edward Z. Yang maintainer: ezyang@cs.stanford.edu build-type: Simple cabal-version: >=1.10 library exposed-modules: Hello build-depends: base, ghc >= 8.0 default-language: Haskell2010 executable hello-plugin main-is: HelloWrapper.hs build-depends: base, ghc-paths, unix default-language: Haskell2010 Now, to run your plugin, you can do any of the following: • cabal repl -w hello-plugin • cabal new-repl -w hello-plugin • stack repl --system-ghc --with-ghc hello-plugin To run the plugin on a specific package, pass the appropriate flags to the repl command. The full code for this example can be retrieved at ezyang/hello-plugin on GitHub. Here are a few miscellaneous tips and tricks: • To pass extra flags to the plugin, add --ghc-options=-ffrontend-opt=arg as necessary (if you like, make another wrapper script around this!) • If you installed hello-plugin with a GHC that is not the one from your PATH, you will need to put the correct ghc/ghc-pkg/etc executables first in the PATH; Cabal's autodetection will get confused if you just use -w. If you are running cabal, another way to solve this problem is to pass --with-ghc-pkg=PATH to specify where ghc-pkg lives (Stack does not support this.) • You don't have to install the plugin to your user package database, but then the wrapper program needs to be adjusted to be able to find wherever the package does end up being installed. I don't know of a way to get this information without writing a Custom setup script with Cabal; hopefully installation to the user package database is not too onerous for casual users. • cabal-install and stack differ slightly in how they go about passing home modules to the invocation of GHCi: cabal-install will call GHC with an argument for every module in the home package; Stack will pass a GHCi script of things to load. I'm not sure which is more convenient, but it probably doesn't matter too much if you know already know which module you want to look at (perhaps you got it from a frontend option.) ## February 08, 2017 ### Yesod Web Framework # Changes to Yesod's CI I've made some changes in the past few days to the CI setup for the yesodweb/yesod repo that I thought contributors may be interested in knowing about. • We were regularly running into build timeouts on OS X on Travis. To work around this, we no longer build benchmarks and Haddocks on OS X, and compile with -O0. Relevant Travis changes • I've (finally) bit the bullet and made the repo compile with -Wall -Werror enabled, at least for recent versions of GHC. From now on, PRs will need to maintain warning-cleanliness. Relevant Travis changes • There's now an AppVeyor configuration, so PRs can be checked against Windows (in addition to the existing Linux and OS X coverage provided by Travis). This did, in fact, reveal two issues around line endings. Relevant AppVeyor addition. Nothing major, just a few changes that contributors should be aware of. Hopefully that green checkmark on a PR will now have a few less of both false positives and false negatives. ## February 07, 2017 ### Dimitri Sabadie # Lifetimes limits – self borrowing and dropchecker Lately, I’ve been playing around with alto in my demoscene framework. This crate in the replacement of openal-rs as openal-rs has been deprecated because unsound. It’s a wrapper over OpenAL, which enables you to play 3D sounds and gives you several physical properties and effects you can apply. # The problem Just to let you fully understand the problem, let me introduce a few principles from alto. As a wrapper over OpenAL, it exposes quite the same interface, but adds several safe-related types. In order to use the API, you need three objects: • an Alto object, which represents the API object (it holds dynamic library handles, function pointers, etc. ; we don’t need to know about that) • a Device object, a regular device (a sound card, for example) • a Context object, used to create audio resources, handle the audio context, etc. There are well-defined relationships between those objects that state about their lifetimes. An Alto object must outlive the Device and the Device must outlive the Context. Basically: let alto = Alto::load_default(None).unwrap(); // bring the default OpenAL implementation in let dev = alto.open(None).unwrap(); // open the default device let ctx = dev.new_context(None).unwrap(); // create a default context with no OpenAL extension As you can see here, the lifetimes are not violated, because alto outlives dev which outlives ctx. Let’s dig in the type and function signatures to get the lifetimes right (documentation here). fn Alto::open<'s, S: Into<Option<&'s CStr>>>(&self, spec: S) -> AltoResult<Device> The S type is just a convenient type to select a specific implementation. We need the default one, so just pass None. However, have a look at the result. AltoResult<Device>. I told you about lifetime relationships. This one might be tricky, but you always have to wonder “is there an elided lifetime here?”. Look at the Device type: pub struct Device<'a> { /* fields omitted */ } Yep! So, what’s the lifetime of the Device in AltoResult<Device>? Well, that’s simple: the lifetime elision rule in action is one of the simplest: If there are multiple input lifetime positions, but one of them is &self or &mut self, the lifetime of self is assigned to all elided output lifetimes. (source) So let’s rewrite the Alto::open function to make it clearer: fn Alto::open<'a, 's, S: Into<Option<&'s CStr>>>(&'a self, spec: S) -> AltoResult<Device<'a>> // exact same thing as above So, what you can see here is that the Device must be valid for the same lifetime as the reference we pass in. Which means that Device cannot outlive the reference. Hence, it cannot outlive the Alto object. impl<'a> Device<'a> { // … fn new_context<A: Into<Option<ContextAttrs>>>(&self, attrs: A) -> AltoResult<Context> // … } That looks a bit similar. Let’s have a look at Context: pub struct Context<'d> { /* fields omitted */ } Yep, same thing! Let’s rewrite the whole thing: impl<'a> Device<'a> { // … fn new_context<'b, A: Into<Option<ContextAttrs>>>(&'b self, attrs: A) -> AltoResult<Context<'b>> // … } Plus, keep in mind that self is actually Device<'a>. The first argument of this function then awaits a &'b Device<'a> object! rustc is smart enough to automatically insert the 'a: 'b lifetime bound here – i.e. the 'a lifetime outlives 'b. Which makes sense: the reference will die before the Device<'a> is dropped. Ok, ok. So, what’s the problem then?! # The (real) problem The snippet of code above about how to create the three objects is straight-forward (though we don’t take into account errors, but that’s another topic). However, in my demoscene framework, I really don’t want people to use that kind of types. The framework should be completely agnostic about which technology or API is used internally. For my purposes, I just need a single type with a few methods to work with. Something like that: struct Audio = {} impl Audio { pub fn new<P>(track_path: P) -> Result<Self> where P: AsRef<Path> {} pub fn toggle(&mut self) -> bool {} pub fn playback_cursor(&self) -> f32 {} pub fn set_playback_cursor(&self, t: f32) {} } impl Drop for Audio { fn drop(&mut self) { // stop the music if playing; do additional audio cleanup } } This is a very simple interface, yet I don’t need more. Audio::set_playback_cursor is cool when I debug my demos in realtime by clicking a time panel to quickly jump to a part of the music. Audio::toggle() enables me to pause the demo to inspect an effect in the demo. Etc. However, how can I implement Audio::new? # The (current) limits of borrowing The problem kicks in as we need to wrap the three types – Alto, Device and Context – as the fields of Audio: struct Audio<'a> { alto: Alto, dev: Device<'a>, context: Context<'a> } We have a problem if we do this. Even though the type is correct, we cannot correctly implement Audio::new. Let’s try: impl<'a> Audio<'a> { pub fn new<P>(_: P) -> Result<Self> where P: AsRef<Path> { let alto = Alto::load_default(None).unwrap(); let dev = alto.open(None).unwrap(); let ctx = dev.new_context(None).unwrap(); Ok(Audio { alto: alto, dev: dev, ctx: ctx }) } } As you can see, that cannot work: error: alto does not live long enough --> /tmp/alto/src/main.rs:14:15 | 14 | let dev = alto.open(None).unwrap(); | ^^^^ does not live long enough ... 22 | } | - borrowed value only lives until here | note: borrowed value must be valid for the lifetime 'a as defined on the body at 12:19... --> /tmp/alto/src/main.rs:12:20 | 12 | fn new() -> Self { | ^ error: dev does not live long enough --> /tmp/alto/src/main.rs:15:15 | 15 | let ctx = dev.new_context(None).unwrap(); | ^^^ does not live long enough ... 22 | } | - borrowed value only lives until here | note: borrowed value must be valid for the lifetime 'a as defined on the body at 12:19... --> /tmp/alto/src/main.rs:12:20 | 12 | fn new() -> Self { | ^ error: aborting due to 2 previous errors What’s going on here? Well, we’re hitting a problem called the problem of self-borrowing. Look at the first two lines of our implementation of Audio::new: let alto = Alto::load_default(None).unwrap(); let dev = alto.open(None).unwrap(); As you can see, the call to Alto::open borrows alto – via a &Alto reference. And of course, you cannot move a value that is borrowed – that would invalidate all the references pointing to it. We also have another problem: imagine we could do that. All those types implement Drop. Because they basically all have the same lifetime, there’s no way to know which one borrows information from whom. The dropchecker has no way to know that. It will then refuse to code creating objects of this type, because dropping might be unsafe in that case. # What can we do about it? Currently, this problem is linked to the fact that the lifetime system is a bit too restrictive and doesn’t allow for self-borrowing. Plus, you also have the dropchecker issue to figure out. Even though we were able to bring in alto and device altogether, how do you handle context? The dropchecker doesn’t know which one must be dropped first – there’s no obvious link at this stage between alto and all the others anymore, because that link was made with a reference to alto that died – we’re moving out of the scope of the Audio::new function. That’s a bit tough. The current solution I implemented to fix the issue is ok–ish, but I dislike it because it adds a significant performance overhead: I just moved the initialization code in a thread that stays awake until the Audio object dies, and I use a synchronized channel to communicate with the objects in that thread. That works because the thread provides us with a stack, that is the support of lifetimes – think of scopes. Another solution would be to move that initialization code in a function that would accept a closure – your application. Once everything is initialized, the closure is called with a few callbacks to toggle / set the cursor of the object living “behind” on the stack. I don’t like that solution because it modifies the main design – having an Audio object was the goal. Other solutions are: • std::mem::transmute to remove the lifetimes (replace them with 'static). That’s hyper dangerous and we are just breaking Rust’s lifetimes… not okay :( • change our design to meet the same as alto’s (in a word: use the same three objects) • cry deeply I don’t have a satisfying solution yet to that problem. My thread solution works and lets me have a single type abstracting all of that, but having a thread for such a thing is a waste of resources to me. I think I’ll implement the closure solution as, currently, it’s not possible to embed in struct lifetimes’ semantics / logic. I guess it’s okay; I guess the problem is also linked to the fact the concept is pretty young and we’re still kind of experimenting it. But clearly, lifetimes hit a hard problem here that they cannot solve correctly. Keep in mind that even if unsafe solutions exist, we’re talking about a library that’s designed to work with Rust lifetimes as a pretty high level of abstraction. Firing transmute is very symptomatic of something wrong. I’m open to suggestions, because I’ve been thinking the problem all day long without finding a proper solution. Keep the vibe! ### Joachim Breitner # Why prove programs equivalent when your compiler can do that for you? Last week, while working on CodeWorld, via a sequence of yak shavings, I ended up creating a nicely small library that provides Control.Applicative.Succs, a new applicative functor. And because I am trying to keep my Haskell karma good1, I wanted to actually prove that my code fulfills the Applicative and Monad laws. This led me to inserted writing long comments into my code, filled with lines like this: The second Applicative law: pure (.) <*> Succs u us <*> Succs v vs <*> Succs w ws = Succs (.) [] <*> Succs u us <*> Succs v vs <*> Succs w ws = Succs (u .) (map (.) us) <*> Succs v vs <*> Succs w ws = Succs (u . v) (map ($v) (map (.) us) ++ map (u .) vs) <*> Succs w ws
= Succs (u . v) (map (($v).(.)) us ++ map (u .) vs) <*> Succs w ws = Succs ((u . v) w) (map ($w) (map (($v).(.)) us ++ map (u .) vs) ++ map (u.v) ws) = Succs ((u . v) w) (map (($w).($v).(.)) us ++ map (($w).(u.)) vs ++ map (u.v) ws)
= Succs (u (v w)) (map (\u -> u (v w)) us ++ map (\v -> u (v w)) vs ++ map (\w -> u (v w)) ws)
= Succs (u (v w)) (map ($(v w)) us ++ map u (map ($w) vs ++ map v ws))
= Succs u us <*> Succs (v w) (map ($w) vs ++ map v ws) = Succs u us <*> (Succs v vs <*> Succs w ws) Honk if you have done something like this before! I proved all the laws, but I was very unhappy. I have a PhD on something about Haskell and theorem proving. I have worked with Isabelle, Agda and Coq. Both Haskell and theorem proving is decades old. And yet, I sit here, and tediously write manual proofs by hand. Is this really the best we can do? Of course I could have taken my code, rewritten it in, say, Agda, and proved it correct there. But (right now) I don’t care about Agda code. I care about my Haskell code! I don’t want to write it twice, worry about copying mistakes and mismatchs in semantics, and have external proofs to maintain. Instead, I want to prove where I code, and have the proofs checked together with my code! Then it dawned to me that this is, to some extent, possible. The Haskell compiler comes with a sophisticated program transformation machinery, which is meant to simplify and optimize code. But it can also be used to prove Haskell expressions to be equivalent! The idea is simple: Take two expressions, run both through the compiler’s simplifier, and check if the results are the same. If they are, then the expressions are, as far as the compiler is concerned, equivalent. A handful of hours later, I was able to write proof tasks like app_law_2 = (\ a b (c::Succs a) -> pure (.) <*> a <*> b <*> c) === (\ a b c -> a <*> (b <*> c)) and others into my source file, and the compiler would tell me happily: [1 of 1] Compiling Successors ( Successors.hs, Successors.o ) GHC.Proof: Proving getCurrent_proof1 … GHC.Proof: Proving getCurrent_proof2 … GHC.Proof: Proving getCurrent_proof3 … GHC.Proof: Proving ap_star … GHC.Proof: Proving getSuccs_proof1 … GHC.Proof: Proving getSuccs_proof2 … GHC.Proof: Proving getSuccs_proof3 … GHC.Proof: Proving app_law_1 … GHC.Proof: Proving app_law_2 … GHC.Proof: Proving app_law_3 … GHC.Proof: Proving app_law_4 … GHC.Proof: Proving monad_law_1 … GHC.Proof: Proving monad_law_2 … GHC.Proof: Proving monad_law_3 … GHC.Proof: Proving return_pure … GHC.Proof proved 15 equalities This is how I want to prove stuff about my code! Do you also want to prove stuff about your code? I packaged this up as a GHC plugin in the Haskell library ghc-proofs (not yet on Hackage). The README of the repository has a bit more detail on how to use this plugin, how it works, what its limitations are and where this is heading. This is still only a small step, but finally there is a step towards low threshold program equivalence proofs in Haskell. 1. Or rather recover my karma after such abominations such as ghc-dup, seal-module or ghc-heap-view. ## February 06, 2017 ### FP Complete # MonadMask vs MonadBracket The exceptions package provides three typeclasses for generalizing exception handling to monads beyond IO: • MonadThrow is for monads which allow reporting an exception • MonadCatch is for monads which also allow catching a throw exception • MonadMask is for monads which also allow safely acquiring resources in the presence of asynchronous exceptions For reference, these are defined as: class Monad m => MonadThrow m where throwM :: Exception e => e -> m a class MonadThrow m => MonadCatch m where catch :: Exception e => m a -> (e -> m a) -> m a class MonadCatch m => MonadMask m where mask :: ((forall a. m a -> m a) -> m b) -> m b uninterruptibleMask :: ((forall a. m a -> m a) -> m b) -> m b This breakdown of the typeclasses is fully intentional, as each added capability excludes some class of monads, e.g.: • Maybe is a valid instance of MonadThrow, but since it throws away information on the exception that was thrown, it cannot be a MonadCatch • Continuation-based monads like Conduit are capable of catching synchronously thrown exceptions and are therefore valid MonadCatch instances, but cannot provide guarantees of safe resource cleanup (which is why the resourcet package exists), and are therefore not MonadMask instances However, there are two tightly related questions around MonadMask which trip people up a lot: • Why is there no instance for MonadMask for EitherT (or its new synonym ExceptT)? It's certainly possible to safely acquire resources within an EitherT transformer (see below for an example). • It seems perfectly reasonable to define an instance of MonadMask for a monad like Conduit, as its only methods are mask and uninterruptibleMask, which can certainly be implemented in a way that respects the types. The same applies to EitherT for that matter. Let's look at the docs for the MonadMask typeclass for a little more insight: Instances should ensure that, in the following code: f finally g The action g is called regardless of what occurs within f, including async exceptions. Well, this makes sense: finally is a good example of a function that guarantees cleanup in the event of any exception, so we'd want this (fairly straightforward) constraint to be met. The thing is, the finally function is not part of the MonadMask typeclass, but is instead defined on its own as (doing some aggressive inlining): finally :: MonadMask m => m a -> m b -> m a finally action finalizer = mask$ \unmasked -> do
result <- unmasked action catch \e -> do
finalizer
throwM (e :: SomeException)
finalizer
return result

Let's specialize the type signature to the ExceptT MyError IO type:

finally :: ExceptT MyError IO a
-> ExceptT MyError IO b
-> ExceptT MyError IO a

If we remember that ExceptT is defined as:

newtype ExceptT e m a = ExceptT (m (Either e a))

We can rewrite that signature to put the IO on the outside with an explicit Either return value. Inlining the Monad instance for ExceptT into the above implementation of finally, we get:

finally :: IO (Either MyError a)
-> IO (Either MyError b)
-> IO (Either MyError a)
finally action finalizer = mask $\unmasked -> do eresult <- unmasked action catch \e -> do finalizer throwM (e :: SomeException) case eresult of Left err -> return (Left err) Right result -> do finalizer return result (I took some shortcuts in this implementation to focus on the bad part, take it as an exercise to the reader to make a fully faithful implementation of this function.) With this inlined implementation, the problem becomes much easier to spot. We run action, which may result in a runtime exception. If it does, our catch function kicks in, we run the finalizer, and rethrow the exception, awesome. If there's no runtime exception, we have two cases to deal with: the result is either Right or Left. In the case of Right, we run our finalizer and return the result. Awesome. But the problem is in the Left case. Notice how we're not running the finalizer at all, which is clearly problematic behavior. I'm not pointing out anything new here, as this has been well known in the Haskell world, with packages like MonadCatchIO-transformers in the past. Just as importantly, I'd like to point out that it's exceedingly trivial to write a correct version of finally for the IO (Either MyError a) case, and therefore for the ExceptT MyError IO a case as well: finally :: IO (Either MyError a) -> IO (Either MyError b) -> IO (Either MyError a) finally action finalizer = mask$ \unmasked -> do
eresult <- unmasked action catch \e -> do
finalizer
throwM (e :: SomeException)
finalizer
return eresult

While this may look identical to the original, unspecialized version we have in terms of MonadMask and MonadCatch, there's an important difference: the monad used in the do-notation is IO, not ExceptT, and therefore the presence of a Left return value no longer has any special effect on control flow.

There are arguments to be had about the proper behavior to be displayed when the finalizer has some error condition, but I'm conveniently eliding that point right now. The point is: we can implement it when specializing Either or ExceptT.

A few weeks ago I was working on a pull request for the foundation package, adding a ResourceT transformer. At the time, foundation didn't have anything like MonadMask, so I needed to create such a typeclass. I could have gone with something matching the exceptions package; instead, I went for the following:

-- | A generalized version of the standard bracket function which
-- allows distinguishing different exit cases.
generalBracket
:: m a
-- ^ acquire some resource
-> (a -> b -> m ignored1)
-- ^ cleanup, no exception thrown
-> (a -> E.SomeException -> m ignored2)
-- ^ cleanup, some exception thrown. The exception will be rethrown
-> (a -> m b)
-- ^ inner action to perform with the resource
-> m b

This is a generalization of the bracket function. Importantly, it allows you to provide different cleanup functions for the success and failure cases. It also provides you with more information for cleanup, namely the exception that occured or the success value.

• It allows for a natural and trivial definition of all of the cleanup combinators (bracket, finally, onException, etc) in terms of this one primitive.
• The primitive can be defined with full knowledge of the implementation details of the monad in question.
• It makes invalid instances of MonadBracket look "obviously wrong" instead of just being accidentally wrong.

We can fiddle around with the exact definition of generalBracket. For example, with the type signature above, there is no way to create an instance for ExceptT, since in the case of a Left return value from the action:

• We won't have a runtime exception to pass to the exceptional cleanup function
• We won't have a success value to pass to the success cleanup function

This can easily be fixed by replacing:

-> (a -> b -> m ignored1)
-- ^ cleanup, no exception thrown

with

-> (a -> m ignored1)
-- ^ cleanup, no exception thrown

The point is: this formulation can allow for more valid instances, make it clearer why some instances don't exist, and prevent people from accidentally creating broken, buggy instances.

Note that I'm not actually proposing any changes to the exceptions package right now, I'm merely commenting on this new point in the design space. Backwards compatibility is something we need to seriously consider before rolling out changes.

# Program JSON and YAML with Dhall

This is short post to announce a new dhall-json library which lets you compile the Dhall configuration language to both JSON and YAML. This in turn means that you can now program JSON and YAML using a non-Turing-complete programming language.

Here's a quick example of the library in action:

$dhall-to-json <<< "{ foo = 1, bar = [1, 2, 3] }" {"foo":1,"bar":[1,2,3]}$ dhall-to-yaml <<< "{ foo = 1, bar = [1, 2, 3] }"
foo: 1
bar:
- 1
- 2
- 3
$cat example let indexed = https://ipfs.io/ipfs/QmcTbCdS21pCxXysTzEiucDuwwLWbLUWNSKwkJVfwpy2zK/Prelude/List/indexed in let map = https://ipfs.io/ipfs/QmcTbCdS21pCxXysTzEiucDuwwLWbLUWNSKwkJVfwpy2zK/Prelude/List/map in let words = [ "Lorem" , "ipsum" , "dolor" , "sit" , "amet" , "consectetur" , "adipiscing" , "elit" ] in let makeRow = λ(r : { index : Natural, value : Text }) { index = r.index , background = if Natural/even r.index then "Green" else "Red" , contents = r.value } in map { index : Natural, value : Text } { index : Natural, background : Text, contents : Text } makeRow (indexed Text words)$ dhall-to-json <<< "./example"
$dhall-to-yaml <<< "./example" - contents: Lorem index: 0 background: Green - contents: ipsum index: 1 background: Red - contents: dolor index: 2 background: Green - contents: sit index: 3 background: Red - contents: amet index: 4 background: Green - contents: consectetur index: 5 background: Red - contents: adipiscing index: 6 background: Green - contents: elit index: 7 background: Red This library bundles both JSON and YAML functionality together because Haskell's yaml library reuses the exact same data type that Haskell's aeson library uses to represent JSON (i.e. the Value type). This means that if you can compile a Dhall expression to a Value then you can render that Value as both JSON and YAML. Unlike the Dhall bindings to Nix, you can't compile most Dhall features to JSON or YAML, since they aren't real programming languages. After all, the whole point of this binding is to make JSON and YAML programmable! So if you try to compile anything other than primitive types or records you will get a compile error:$ dhall-to-json <<< "λ(x : Bool) → x"

Error: Cannot translate to JSON

Explanation: Only primitive values, records, ❰List❱s, and ❰Optional❱ values can
be translated from Dhall to JSON

The following Dhall expression could not be translated to JSON:

↳ λ(x : Bool)x

Right now I'm slowly adding Dhall integrations with new languages and configuration file formats so that people can use Dhall as a universal configuration language. So far I've targeted integrations that can reuse the initial Haskell implementation of Dhall (i.e. Nix, JSON, and YAML bindings). However, my next integration will probably reimplement Dhall in another language so that Dhall can be used to natively configure a wider range of languages without having to invoke a command-line dhall executable.

I'm most likely to build the second implementation of Dhall in Rust so that I can also reuse the same implementation to expose a C and C++ API. However, I've never done this sort of thing before, so if this is a dumb idea, let me know!

You can find the dhall-json package on Github or Hackage and if you would like to contribute to Dhall in general you can open an issue here.

# Logarithms and exponentials of functions

Introduction

A popular question in mathematics is this: given a function , what is its "square root" in the sense that . There are many questions about this on mathoverflow but it's also a popular subject in mathematics forums for non-experts. This question seems to have a certain amount of notoriety because it's easy to ask but hard to answer fully. I want to look at an approach that works nicely for formal power series, following from the Haskell code I wrote here. There are some methods for directly finding "functional square roots" for formal power series that start as , but I want to approach the problem indirectly. When working with real numbers we can find square roots, say, by using . I want to use an analogue of this for functions. So my goal is to make sense of the idea of the logarithm and exponential of a formal power series as composable functions. Warning: the arguments are all going to be informal.

Notation

There's potential for a lot of ambiguous notation here, especially as the usual mathematical notation for th powers of trig functions is so misleading. I'm going to use for composition of functions and power series, and I'm going to use the notation to mean the th iterate of . So and . As I'll be working mostly in the ring of formal power series for some ring , I'll reserve the variable to refer only to the corresponding element in this ring. I'll also use formal power series somewhat interchangeably with functions. So can be thought of as representing the identity function. To make sure we're on the same page, here are some small theorems in this notation:

1. .
That last one simply says that adding one times is the same as adding .

As I'm going to have ordinary logarithms and exponentials sitting around, as well as functional logarithms and exponentials, I'm going to introduce the notation for functional logarithm and for functional exponentiation.

Preliminaries

The first goal is to define a non-trivial function with the fundamental property that

First, let's note some basic algebraic facts. The formal power series form a commutative ring with operations and (ordinary multiplication) and with additive identity and multiplicative identity . The formal power series form a ring-like algebraic structure with operation and partial operation with additive identity and multiplicative identity . But it's not actually ring or even a near-ring. Composition isn't defined for all formal power series and even when it's defined, we don't have distributivity. For example, in general , after all there's no reason to expect to equal . We do have right-distributivity however, i.e.

,
because
,
more or less by definition of .

We can't use power series on our power series

There's an obvious approach, just use power series of power series. So we might tentatively suggest that

.
Note that I consider rather than because is the multiplicative identity in our ring-like structure.

Unfortunately this doesn't work. The reason is this: if we try to use standard reasoning to show that the resulting function has the fundamental property we seek we end up using distributivity. We don't have distributivity.

Sleight of hand

There's a beautiful trick I spotted on mathoverflow recently that allows us to bring back distributivity. (I can't find the trick again, but when I do I'll come back and add a link and credit here.) Consider the function defined by . In other words is right-composition by . (Ambiguity alert, I'm using here to mean right. It has nothing to do with the ring underlying our formal power series.) Because we have right-distributivity, is a bona fide linear operator on the space of formal power series. If you think of formal power series as being infinitely long vectors of coefficients then can be thought of as an infinitely sized matrix. This means that as long as we have convergence, we can get away with using power series to compute with the property that . Define:

.
We have:
where I'm using to mean the identity linear operator. And now have:
.
But does it converge? Suppose is of the form . Then . The leading term in is the same as the leading term in . So kills the first term of whatever it is applied to, which means that when we sum the terms in , we only need to get a power series correct to coefficients. Reusing my code from here, I call by the name flog. Here is its implementation:

> import Data.Ratio

> flog :: (Eq a, Fractional a) => [a] -> [a]
> flog f@(0 : 1 : _) =
> flog' 1 (repeat 0) (0 : 1 : repeat 0)
> where flog' n total term = take (n+1) total ++ (
> drop (n+1) $> let pz = p term > in flog' (n+1) (total-map (((-1)^n / fromIntegral n) *) pz) pz) > p total = (total ○ f) - total The take and drop are how I tell Haskell when the first coefficients have been exactly computed and so no more terms are necessary. Does it work? Here's an example using the twice iterated sin function: > ex1 = do > let lhs = flog (sin (sin z)) > let rhs = 2*flog (sin z) > mapM_ print$ take 20 (lhs-rhs)

Works to 20 coefficients. Dare we try an inverse function?

> ex2 = do
> let lhs = flog (sin z)
> let rhs = flog (asin z)
> mapM_ print $take 20 (lhs+rhs) Seems to work! Exponentials It's no good having logarithms if we can't invert them. One way to think about the exponential function is that We get better and better approximations by writing the expression inside the limit as a product of more and more terms. We can derive the usual power series for from this, but only if right-distributivity holds. So let's try to use the above expression directly: and get . Unfortunately, even though is linear, itself isn't. So it's going to take some extra work to raise to the power of . The good news is that we're dealing with the special case where is something small. We have . So is actually modulo higher order terms. This gives us . This is something we can implement using the power series for ordinary : . In code that becomes: > fexp f@(0 : 0 : _) = fexp' f 0 z 1 > fexp' f total term n = take (n-1) total ++ drop (n-1) > (fexp' f (total+term) (map (/fromIntegral n) (f*d term)) (n+1)) Note how when we differentiate a power series we shift the coefficients down by one place. To counter the effect of that so as to ensure convergence we need to look like . Luckily this is exactly the kind of series gives us. But does it successfully invert ? Let's try: > ex3 = do > let lhs = sin z > let rhs = fexp (flog (sin z)) > mapM_ print$ take 20 (lhs-rhs)

Now we can start computing fractional iterates. Square root first:

> ex4 = do
> mapM_ print $take 20$ fexp (flog (sin z)/2)

That matches the results at A048602 and A048603.

Cube root:

> ex5 = do
> mapM_ print $take 20$ fexp (flog (sin z)/3)

Matches A052132 and A052135.

And this gives an alternative to Lagrange inversion for computing power series for inverse functions:

> ex6 = do
> let lhs = fexp (-flog (sin z))
> let rhs = asin z
> mapM_ print $take 20 (lhs-rhs) What's really going on with ? Let's approach in a slightly different way. In effect, is the composition of lots of with . So let's try composing these one at a time, with one composition every seconds. After one second we should have our final result. We can write this as: and to first order. So we're solving the differential equation: and with . So is the function that solves one of the most fundamental differential equations. This also means I can use Mathematica to solve symbolically and check my results. For example, Mathematica says that the solution to and at is so let's check: > ex7 = do > let lhs = fexp ((sin z)^2) > let rhs = atan (tan z/(1-tan z)) > mapM_ print$ take 20 (lhs-rhs)

I like this example because it leads to the generalized Catalan numbers A004148:

> ex8 = do
> mapM_ print $take 20$ fexp (z^2/(1-z^2))

That suggests this question: what does mean combinatorially? I don't have a straightforward answer but solving this class of differential equation motivated the original introduction, by Cayley, of the abstract notion of a tree. See here.

What is going on geometrically?

For those who know some differential geometry, The differential equation

and
describes a flow on the real line (or complex plane). You can think of as being a one-dimensional vector field describing how points move from time to . When we solve the differential equation we get integral curves that these points follow and tells us where the points end up after one unit of time. So is the exponential map. In fact, is essentially the exponential of the vector field where we're now using the differential geometer's notion of a vector field as a differential operator.

Final word

Unfortunately the power series you get from using and don't always have good convergence properties. For example, I'm not sure but I think the series for has radius of convergence zero. If you truncate the series you get a half-decent approximaion to a square root in the vicinity of the origin, but the approximation gets worse, not better, if you use more terms.

And the rest of the code

> (*!) _ 0 = 0
> (*!) a b = a*b
> (!*) 0 _ = 0
> (!*) a b = a*b
> (^+) a b = zipWith (+) a b
> (^-) a b = zipWith (-) a b

> ~(a:as) ⊗ (b:bs) = (a *! b):
> ((map (a !*) bs) ^+ (as ⊗ (b:bs)))
> (○) (f:fs) (0:gs) = f:(gs ⊗ (fs ○ (0:gs)))
> inverse (0:f:fs) = x where x = map (recip f *) (0:1:g)
> _:_:g = map negate ((0:0:fs) ○ x)
> invert x = r where r = map (/x0) ((1:repeat 0) ^- (r ⊗ (0:xs)))
> x0:xs = x

> (^/) (0:a) (0:b) = a ^/ b
> (^/) a b = a ⊗ (invert b)

> z :: [Rational]
> z = 0:1:repeat 0

> d (_:x) = zipWith (*) (map fromInteger [1..]) x

> integrate x = 0 : zipWith (/) x (map fromInteger [1..])

> instance (Eq r, Num r) => Num [r] where
> x+y = zipWith (+) x y
> x-y = zipWith (-) x y
> ~x*y = x ⊗ y
> fromInteger x = fromInteger x:repeat 0
> negate x = map negate x
> signum (x:_) = signum x : repeat 0
> abs (x:xs) = error "Can't form abs of a power series"

> instance (Eq r, Fractional r) => Fractional [r] where
> x/y = x ^/ y
> fromRational x = fromRational x:repeat 0

> sqrt' x = 1 : rs where rs = map (/2) (xs ^- (rs ⊗ (0:rs)))
> _ : xs = x
> instance (Eq r, Fractional r) => Floating [r] where
> sqrt (1 : x) = sqrt' (1 : x)
> sqrt _ = error "Can only find sqrt when leading term is 1"
> exp x = e where e = 1+integrate (e * d x)
> log x = integrate (d x/x)
> sin x = integrate ((cos x)*(d x))
> cos x = [1] ... negate (integrate ((sin x)*(d x)))
> asin x = integrate (d x/sqrt(1-x*x))
> atan x = integrate (d x/(1+x*x))
> acos x = error "Unable to form power series for acos"
> sinh x = integrate ((cosh x)*(d x))
> cosh x = [1] ... integrate ((sinh x)*(d x))
> asinh x = integrate (d x/sqrt(1+x*x))
> atanh x = integrate (d x/(1-x*x))
> acosh x = error "Unable to form power series for acosh"
> pi = error "There is no formal power series for pi"

> lead [] x = x
> a ... x = lead a x

> (//) :: Fractional a => [a] -> (Integer -> Bool) -> [a]
> (//) a c = zipWith (\a-> \b->(if (c a :: Bool) then b else 0)) [(0::Integer)..] a

A direct functional square root that doesn't use and :

> fsqrt (0 : 1 : fs) =
> let gs = (fs-(0 : gs*((0 : delta gs gs)+((2 : gs)*(gs*g)))))/2
> g = 0 : 1 : gs
> delta (g : gs) h = let g' = delta gs h
> in (0 : ((1 : h) * g')) + gs
> in g