January 19, 2017

Michael Snoyman

This is a short follow-up to my blog post about mapM_ and Maybe. Roman Cheplyaka started a discussion on that post, and ultimately we came up with the following implementation of mapM_ which works for all Foldables and avoids the non-tail-recursive case for Maybe as desired:

mapM_ :: (Applicative m, Foldable f) => (a -> m ()) -> f a -> m ()
mapM_ f a =
    go (toList a)
  where
    go []     = pure ()
    go [x]    = f x -- here's the magic
    go (x:xs) = f x *> go xs

Why is this useful? If you implement mapM_ directly in terms of foldr or foldMap, there is no way to tell that you are currently looking at the last element in the structure, so you will always end up with the equivalent of f x *> pure () in your expanded code. By contrast, with explicit pattern matching on the list-ified version, we can easily pattern match with go [x] and avoid the *> pure () bit, thereby making tail recursion possible.
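To see this behavior concretely, here is a minimal, self-contained sketch of the implementation (the names mapM_' and effectsOf are mine, chosen to avoid clashing with the Prelude):

```haskell
import Data.Foldable (toList)
import Data.IORef

-- The implementation from this post: list-ify the Foldable so the
-- last element can be pattern-matched and the trailing *> pure ()
-- avoided, making the final call a tail call.
mapM_' :: (Applicative m, Foldable f) => (a -> m ()) -> f a -> m ()
mapM_' f a = go (toList a)
  where
    go []     = pure ()
    go [x]    = f x          -- last element: no *> pure () appended
    go (x:xs) = f x *> go xs

-- Record the effects of applying an action to each element, so we can
-- observe that mapM_' works uniformly for lists and for Maybe.
effectsOf :: Foldable f => f Int -> IO [Int]
effectsOf xs = do
  ref <- newIORef []
  mapM_' (\x -> modifyIORef ref (++ [x])) xs
  readIORef ref
```

For example, effectsOf [1,2,3] yields [1,2,3], effectsOf (Just 5) yields [5], and effectsOf Nothing yields [].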

Some interesting things to note:

• Using () <$ f x instead of f x *> pure () or f x >> return () seemed to make no difference for tail recursion purposes.

• As a result of that, we still need the ()-specialized type signature I describe in the previous blog post; there doesn't seem to be a way around that.

• As you can see from the benchmark (which I unceremoniously ripped off from Roman), there do not appear to be cases where this version has more memory residency than mapM_ from base. Roman had raised the concern that the intermediate list may involve extra allocations, though it appears that GHC is smart enough to avoid them. Here are the results. Notice the significantly higher residency numbers for base:

  5000     roman    36,064 bytes
  5000     michael  36,064 bytes
  5000     base     36,064 bytes
  50000    roman    36,064 bytes
  50000    michael  36,064 bytes
  50000    base     133,200 bytes
  500000   roman    44,384 bytes
  500000   michael  44,384 bytes
  500000   base     2,354,216 bytes
  5000000  roman    44,384 bytes
  5000000  michael  44,384 bytes
  5000000  base     38,235,176 bytes

My takeaway from all of this: it's probably too late to change the type signature of mapM_ and forM_ in base, but this alternative implementation is a good fit for mono-traversable. Perhaps there are some rewrite rules that could be applied in base to get the benefits of this implementation as well.

Completely tangential, but: as long as I'm linking to pull requests based on blog posts, I've put together a PR for classy-prelude and conduit-combinators that gets rid of generalized I/O operations, based on my readFile blog post.

January 18, 2017

FP Complete

Speeding up a distributed computation in Haskell

While helping a client ship a medical device we were tasked with making its response time bearable. This was no easy feat, given that each request to this device requires running a simulation that takes hours if run on a single CPU.
This long response time would make it impossible for doctors to use this device interactively, which in turn would make the device much less desirable -- think of a doctor having to wait hours between inputting the patient data and getting results, as opposed to getting results immediately as the data is available.

Luckily the simulations in question are embarrassingly parallel, so one obvious path to reducing the response time is to run them on multiple CPUs. At the core of this device sits a Haskell program that performs the simulation. Thus the first step was to exploit Haskell's built-in multi-core parallelism. However the results were unsatisfactory, since we were unable to scale decently beyond 7 to 10 CPUs. We therefore created a custom distribution algorithm where separate Haskell runtimes communicate over TCP sockets, similar to what happens in Erlang. This also allowed us to scale beyond a single machine. We've described this effort in the past; see the report Scaling Up a Scientific Computation and the talk Parallelizing and distributing scientific software in Haskell.

This first effort allowed us to run simulations in a much shorter time, but it still did not allow us to scale nicely to hundreds of CPUs. This article describes how we fixed that by bypassing one of the high-level facilities that Haskell provides.

High-level languages are all about offering such facilities, to let us write correct programs more quickly. Haskell offers a great number of abstractions to help in this regard, such as garbage collection and laziness, and GHC is also full of tools on top of the language itself to write an ever greater number of programs in a more comfortable way. One of the features that makes GHC stand out is the sophistication of the runtime it provides.
Apart from being an impressive piece of work even just for implementing Haskell efficiently, it also offers features that are very useful for the kind of systems programming that writing a distributed application requires. Specifically, green threads and the GHC event manager make writing a fast multi-threaded server much easier than in other languages. For example, the first versions of Warp, Haskell's most popular web server, outperformed most web servers in just 500 lines of code, largely thanks to these facilities -- you can find more about this effort in the report Warp: A Haskell Web Server. Warp has since grown in code size to add new features, but the core still uses the same facilities and performs well.

Since the core of the software that we built is a server coordinating the work of many slaves, for our first version we reached for these facilities to write it. The server was reasonably fast and served us for a while, but we quickly hit a ceiling beyond which we were unable to scale.

However, a nice thing about GHC Haskell is that it's very easy to drop down to a lower-level programming style when needed. This can be accomplished through the excellent foreign function interface to C paired with the low-level utilities in base. By doing so we were able to scale to hundreds of cores and run simulations up to 5 times faster than the best time we achieved with the previous version.

The program

As mentioned, the server in question is the master process in a distributed computing application. The application is essentially a particle filter, distributed across many processes which might be on different machines. Since we want multi-machine distribution, we use TCP sockets to communicate between the processes doing the computation.
At the core of the program logic we have a function taking some State and some Input, and generating some new states and an output associated with each one:

type Evolve = State -> Input -> [(State, Output)]

Note that a single state and input pair generates multiple states and outputs. This is because in a particle filter each state (or rather each "particle") can be sampled zero or multiple times. We need to run one such function on thousands of inputs:

-- Apply the Evolve to every given State, return
-- the new states and outputs.
evolveMany :: Evolve -> [State] -> [Input] -> [[(State, Output)]]
evolveMany f = zipWith f

Given this initial specification, there are a couple of adjustments we need to make if we want to be able to distribute the computation. First, the function will have to live in IO, since communication will happen through Sockets. Second, we won't refer to the states directly, but rather refer to them using tokens provided by the system. At the beginning we'll provide the initial states and get back tokens in return, and at each call to evolveMany we'll get -- instead of new States -- new tokens.

We can do this because we do not care about the content of the states (while we do care about the outputs), and by referring to them with tokens rather than directly we can avoid transferring them to other processes each time we need to operate on them, saving a lot of bandwidth and speeding up the computation greatly. Thus, we'll also need to book-keep which slave processes are holding which state. Finally, we'll need Sockets to communicate with the slave processes. This gives us a new API:

-- We use Map and Set from containers for illustrative purposes; HashMap
-- from unordered-containers or a mutable hash table from hashtables
-- will most likely be more performant.
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Set (Set)
import qualified Data.Set as Set

-- Some token representing a State on some slave.
data StateId

-- Some token representing a slave.
data SlaveId

-- Reset the states in the system to the given ones, returning a
-- 'StateId' for each state.
resetStates ::
     Map SlaveId Socket -- Connections to the slaves
  -> [State]
  -> IO (Map SlaveId (Set StateId), [StateId])
  -- Returns how the states have been partitioned across the slaves,
  -- and a list to know which StateId corresponds to which State.

-- Evolves the states with the given inputs; returns the outputs and
-- the new 'StateId's resulting from the evolution.
evolveMany ::
     Map SlaveId Socket -- Connections to the slaves
  -> Map SlaveId (Set StateId) -- Which states are on which slave
  -> Map StateId Input -- Inputs to each state
  -> IO (Map SlaveId (Set StateId), Map StateId [(StateId, Output)])
  -- Returns the new mapping from slaves to states, and the outputs.

When using this API, the usual pattern is to call resetStates at the beginning with the initial states and then a series of evolveMany afterwards, each using the StateIds returned by resetStates the first time and by evolveMany afterwards.

The challenge is to implement evolveMany as efficiently as possible. To give an idea of the time involved, we usually have around 2000 states, a few tens of calls to evolveMany, and each call to Evolve takes a few hundredths of a second to complete, giving a single-threaded run time of a few hours, e.g.

  2000   -- Number of states
  * 80   -- Number of calls to evolveMany
  * 0.03s -- Evolve time
  = 1h 20m -- Total running time

High level overview of the implementation

resetStates just assigns a unique StateId to each state, and then splits up and uploads the states evenly between the slaves. All the complexity lies in evolveMany: the goal is to utilize the slaves as efficiently as possible.
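Before any distribution, the pure specification above can be exercised with a toy Evolve. The concrete types and resampling rule here are illustrative stand-ins, not the client's:

```haskell
-- Illustrative stand-ins for the real (opaque) types.
type State  = Int
type Input  = Int
type Output = Int

type Evolve = State -> Input -> [(State, Output)]

-- A toy Evolve: each particle is resampled (state + input) `mod` 3
-- times, so a state can produce zero, one, or two children.
toyEvolve :: Evolve
toyEvolve s i = replicate ((s + i) `mod` 3) (s + i, s * i)

-- The pure specification from the text.
evolveMany :: Evolve -> [State] -> [Input] -> [[(State, Output)]]
evolveMany f = zipWith f
```

Running evolveMany toyEvolve [1,2,3] [1,1,1] gives [[(2,1),(2,1)],[],[(4,3)]]: one particle dies out and another is sampled twice, which is exactly why slaves end up holding uneven numbers of states.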
We found pretty early on that naively evolving the states present on each slave would not work, because:

• Each call to Evolve results in many (possibly 0) child states (since the return type is a list), and we cannot predict in advance how many we'll get. This would cause different slaves to hold different numbers of states after a few calls to evolveMany, which in turn would cause the slaves to be used inefficiently, since some would end up idle;

• The runtime of an individual Evolve depends on the state and on the input, and we cannot predict it. This too can cause some slaves to finish earlier than others, causing inefficiencies.

More concretely, imagine a situation with 10 states, where 9 of the states take 1 second while there is one odd state that takes 10 seconds. If we have 2 slaves at our disposal, the most efficient distribution is to assign the slow state to one slave and all the others to the other slave, with one slave taking 10 seconds and the other 9. If we just distribute the states evenly between the slaves, one slave will take 14 seconds and the other 5. Since the total runtime is constrained by the slowest slave, we must be careful to avoid such long tails.

So we switched to a simple but effective method to utilize the slaves efficiently. The master process keeps track of the states present on each slave, and asks the slaves to process them in batches, say of 5. When a slave finishes its batch, it sends the output back to the master and waits for further instructions. If the slave still has states to evolve, the master sends a request for a new batch to be evolved. If the slave does not have states to update, the master will search for a slave with states to spare and request them. When a slave receives such a request it sends the states back to the master, which forwards them to the needy slave. When there are no more states to update, evolveMany is done.
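The long-tail arithmetic in the example above is easy to check with a small sketch (the durations are the hypothetical ones from the text):

```haskell
-- Ten states: one slow (10s) and nine fast (1s each).
durations :: [Double]
durations = 10 : replicate 9 1

-- Total runtime is bounded by the slowest slave.
makespan :: [[Double]] -> Double
makespan = maximum . map sum

-- Naive even split: five states per slave.
evenSplit :: [[Double]]
evenSplit = [take 5 durations, drop 5 durations]

-- Balanced split: the slow state alone on one slave.
balancedSplit :: [[Double]]
balancedSplit = [[10], replicate 9 1]
```

Here makespan evenSplit is 14 while makespan balancedSplit is 10, matching the figures above; the dynamic batching scheme approximates the balanced split without knowing the durations in advance.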
The algorithm can be summed up as two state machines, one for the master and one for the slave:

-- This is what the master sends to the slaves.
data Request
  -- Evolve the selected states
  = EvolveStates [StateId]
  -- Add the given states
  | AddStates [(StateId, State)]
  -- Remove the requested states, and return them to the master
  | RemoveStates [StateId]

-- This is what the slaves reply to the master.
data Response
  = StatesEvolved [(StateId, [(StateId, Output)])]
  | StatesAdded
  | StatesRemoved [(StateId, State)]

-- The slave has a set of States indexed by StateId, which it updates
-- at each request from the master.
slaveStateMachine :: Map StateId State -> Request -> (Map StateId State, Response)

-- Some type to refer to slaves uniquely.
data SlaveId

-- The master keeps track of which states each slave has, and updates
-- that mapping. It also records the outputs received from the slaves so far.
data MasterState = MasterState
  { msSlavesStates :: Map SlaveId (Set StateId)
  , msStatesToEvolve :: Map StateId Input
  , msEvolvedStates :: Map StateId [(StateId, Output)]
  }

-- At each response from a slave the master updates its state and then
-- might reply with a new Request. Note that the Request might not
-- be directed at the same slave that sent the Response, since sometimes
-- we need to steal states from other slaves when the slave at hand does
-- not have states to update.
masterStateMachine ::
     MasterState -> SlaveId -> Response -> (MasterState, Maybe (SlaveId, Request))

The most common pattern of interaction between slave and master will be a loop of EvolveStates and StatesEvolved. This interaction will continue until one slave runs out of states to evolve. In that case, the master will have to reach out to some other slave to provide the needy slave with something to evolve.
For example, this is what will happen if slave 3 runs out of states and the master decides to ship some states to it from slave 2.

The exact implementation of the state machines is not relevant, but given their types, what's worth noting is that:

• The slave will be a very simple loop that just waits for a request, processes it, and then replies to the master.

• The master, on the other hand, is a bit more complicated: it needs to wait for responses from any slave, which means that we'll have to multiplex over multiple channels; and it can then reply to any slave.

First attempt, and performance

Now that we have abstracted out the logic of the master and the slaves into self-contained state machines, we can describe the slave and master processes. We'll assume IO functions to send and receive messages. The slave implementation is trivial and won't change:

-- These functions will use recv/send to work with the Sockets,
-- and the store library to efficiently deserialize and serialize
-- the requests and responses.
receiveRequest :: Socket -> IO Request
sendResponse :: Socket -> Response -> IO ()

slave ::
     Socket -- Connection to master
  -> IO a
slave sock = loop mempty -- No states at the beginning
  where
    loop states = do
      req <- receiveRequest sock
      let (states', resp) = slaveStateMachine states req
      sendResponse sock resp
      loop states'

Note that a slave process is not bound to a single call to evolveMany; it just takes requests from a master. The master, on the other hand, is essentially the implementation of evolveMany, and we have many more options for implementing it.
Our first version is a pretty idiomatic Haskell program: it uses one thread per slave, so that we can wait on all of them at once, with the master state stored in an MVar that can be accessed from all the slave threads. Each slave thread runs code waiting on a slave and modifying the shared state using the master state machine:

import Control.Concurrent.MVar

receiveResponse :: Socket -> IO Response
sendRequest :: Socket -> Request -> IO ()

-- Terminates when there is nothing left to do.
slaveThread :: Map SlaveId Socket -> MVar MasterState -> SlaveId -> IO ()
slaveThread slaveSockets masterStateVar slaveId = do
  resp <- receiveResponse (slaveSockets Map.! slaveId)
  (masterState, mbReq) <- modifyMVar masterStateVar $ \masterState ->
    let (masterState', mbReq) =
          masterStateMachine masterState slaveId resp
    in return (masterState', (masterState', mbReq))
  -- Send the request if needed
  mapM_
    (\(reqSlaveId, req) -> sendRequest (slaveSockets Map.! reqSlaveId) req)
    mbReq
  -- Continue if there are still states to evolve
  unless (Map.null (msStatesToEvolve masterState)) $
    slaveThread slaveSockets masterStateVar slaveId

-- Runs the provided actions in separate threads, returning as
-- soon as any of them exits. race_ is from the async package.
raceMany_ :: [IO ()] -> IO ()
raceMany_ xs0 = case xs0 of
  [] -> return ()
  [x] -> x
  x : xs -> race_ x (raceMany_ xs)

evolveMany ::
     Map SlaveId Socket
  -> Map SlaveId (Set StateId)
  -> Map StateId Input
  -> IO (Map SlaveId (Set StateId), Map StateId [(StateId, Output)])
evolveMany slaveSockets slaveStates inputs = do
  masterStateVar <- newMVar MasterState
    { msSlavesStates = slaveStates
    , msStatesToEvolve = inputs
    , msEvolvedStates = mempty
    }
  -- Run one thread per slave until one receives the response
  -- after which there are no states to evolve
  raceMany_ (map (slaveThread slaveSockets masterStateVar) (Map.keys slaveStates))
  -- Return the results in the MasterState
  masterState <- readMVar masterStateVar
  return (msSlavesStates masterState, msEvolvedStates masterState)

This implementation is simple and quite obviously correct, and it's also pretty fast. In fact, we were able to scale up to around 20 slaves quite well with it.

Note that both axes for this and every other plot in this article are logarithmic: if we scaled perfectly we'd get a straight line, which we're pretty close to. However, things go downhill if we try to scale beyond 20 slaves. Here is a sample of the runtime with up to 450 slaves for six different scenarios.

These measurements were all taken on clusters of c4.8xlarge AWS instances with 18 physical cores, with up to 30 machines running at once. The benchmarking was automated using terraform, which was invaluable when evaluating the improvements. It's evident that the distribution does not scale beyond around 40 slaves, and stalls completely between 50 and 100 slaves, after which adding slaves is detrimental to the runtime.
Note that the scaling is better for the scenarios taking more time: this is because for those scenarios each individual call to the Evolve function takes longer, and thus the overhead of the distribution is less substantial. This is the case for scenario D, which starts out being the slowest with 17 slaves, taking more than 4000 seconds rather than 800-1000 seconds, but scaling much better.

From this data it was clear that if we wanted to be able to leverage a large number of machines to run our simulations in a minute or less, we had to improve the performance of evolveMany.

A small aside: note how these plots contain one line "with taskset" and one without, with the one without performing noticeably worse. The line with taskset indicates measurements taken with each Haskell process pinned to a physical CPU core: this improves performance substantially compared to letting the kernel schedule the processes. After finding this out we ran all subsequent tests pinning slave processes to physical cores. Hyperthreading was also detrimental to the runtime, since the increased distribution overhead far outweighed the gained CPU time; so we used only one process per physical CPU core and avoided hyperthreading. Keep in mind that since we're distributing the work manually using TCP sockets, each slave is a separate OS process running a dedicated Haskell runtime, which is why it makes sense to pin it to a single core.

Second attempt

By measuring how much time each slave spent working and how much time it spent waiting for instructions from the master, it became clear that the program was getting slower because the slaves spent more and more time waiting for instructions, rather than actually working. Thus, if we wanted proper scaling, we needed to lower the latency between the time a response reached the master and the time the slave received the next request.
Now, we tried to gain conclusive evidence of why our first version of evolveMany was slow, but profiling this sort of application is quite hard unless you're intimately familiar with the Haskell runtime -- which is almost like saying "if you are Simon Marlow". We did, however, have some hypotheses about why our program was slow. One possibility is that the event manager simply cannot efficiently handle hundreds of connections at the same time, at least in our use case. Another suspicion is that the multi-threadedness of the first version worked to our disadvantage, since there would be a lot of pointless context switches while one thread was already modifying the MVar MasterState. In other words, any context switch between slave threads while one slave thread is already holding the MVar MasterState is (almost) wasted, since the newly scheduled thread will block on the MVar MasterState right after receiving a slave response and yield, delaying the completion of the loop body in the thread that was already processing the MasterState.

While our second version was based on these hypotheses, we were quite short on time and did not want to take the risk of rewriting the program only to find that we still could not scale as desired. Thus, we set out to write the fastest possible version of evolveMany that we could think of. The main change we wanted was to turn the server from a multi-threaded server multiplexing through the event manager into a single-threaded application multiplexing the sockets directly.

In Linux, the epoll set of syscalls exists for this exact reason: you can register multiple sockets to wait on with epoll_ctl, and then wait for any of them to be ready using epoll_wait. However, in Haskell epoll is abstracted over by the GHC event manager, so there is no library to use these facilities directly. The GHC event manager does offer an interface to it in the form of GHC.Event.registerFd.
However, all these functions are callback based -- they take a function that will be called in a green thread when the socket is ready. Thus we cannot easily write a single-threaded program directly using them. If we want to write a single-threaded loop we're forced to go through an additional synchronization primitive, such as an MVar, to signal from the callback provided to registerFd that a socket is ready to be read from. Note that the normal blocking read for Haskell sockets is implemented using threadWaitRead, which uses registerFd in exactly this way, by having the callback fill in an MVar that threadWaitRead waits on. We tried this approach and got no performance improvement.

Thus we decided to just write the loop using epoll directly, which proved very painless given that the GHC codebase already contains bindings to the epoll functions, as part of the event manager. We released a simple library for people that need to do the same, simple-poll. Right now it only supports epoll, and is thus limited to Linux, but it should be easy to extend to other platforms by copy-pasting other bits of code from the GHC event manager.

Updating the old loop to an explicit multiplexing style, we have:

-- System.Poll.EPoll comes from the simple-poll package
import System.Poll.EPoll (EPoll)
import qualified System.Poll.EPoll as EPoll
import Network.Socket (Socket(MkSocket))
import System.Posix.Types (Fd(Fd))

-- Receives the first responses to arrive from any of the slaves.
-- This amounts to calling EPoll.wait to get back a list of
-- sockets to read from, and then draining them in turn to
-- decode the Response.
--
-- Note that draining them might still not give us a response,
-- since the full response might not be available all at once,
-- and thus in the full version of the software this function will
-- have to hold some state for partially read messages.
--
-- Also note that in the real software it's much better to return
-- a list of (SlaveId, Response) pairs.
-- We have it return only one for simplicity.
receiveFromAnySlave ::
     EPoll
  -> Map Fd SlaveId
  -- Reverse lookup table from Fds to SlaveIds. We need it
  -- since EPoll.wait gives us the Fds which are ready to
  -- be read from, and from an Fd we need to get back the
  -- SlaveId it corresponds to.
  -> IO (SlaveId, Response)

-- Utility to get a file descriptor out of a Socket
socketFd :: Socket -> Fd
socketFd (MkSocket fd _ _ _ _) = Fd fd

evolveMany ::
     Map SlaveId Socket -- All the connections to the slaves
  -> Map SlaveId (Set StateId) -- The states held by each slave
  -> Map StateId Input -- The inputs to each state
  -> IO (Map SlaveId (Set StateId), Map StateId [(StateId, Output)])
evolveMany slaveSockets slaveStates inputs = EPoll.with 256 $ \epoll -> do
  -- First register all the sockets with epoll_ctl. epollIn is to
  -- indicate that we want to be notified when a socket can be read from.
  forM_ slaveSockets $ \socket ->
    EPoll.control epoll EPoll.controlOpAdd (socketFd socket) EPoll.epollIn
  -- Then start the event loop
  masterState <- loop epoll MasterState
    { msSlavesStates = slaveStates
    , msStatesToEvolve = inputs
    , msEvolvedStates = mempty
    }
  return (msSlavesStates masterState, msEvolvedStates masterState)
  where
    fdToSlaveIds :: Map Fd SlaveId
    fdToSlaveIds = Map.fromList
      [(socketFd sock, slaveId) | (slaveId, sock) <- Map.toList slaveSockets]

    loop :: EPoll -> MasterState -> IO MasterState
    loop epoll masterState = do
      -- Get a response from some slave
      (slaveId, resp) <- receiveFromAnySlave epoll fdToSlaveIds
      -- Update the state accordingly
      let (masterState', mbReq) =
            masterStateMachine masterState slaveId resp
      -- Send the new request, if any
      mapM_
        (\(reqSlaveId, req) -> sendRequest (slaveSockets Map.! reqSlaveId) req)
        mbReq
      -- Continue if we're not done
      if Map.null (msStatesToEvolve masterState')
        then return masterState'
        else loop epoll masterState'

Once we did this, the performance increased dramatically, fulfilling our current scaling needs and probably getting quite close to optimal scaling for our use case, although we have not researched how much margin for improvement remains, since we do not need it for now. Going back to the original set of plots, the blue line shows the improved performance with our second implementation. The plots clearly show a much nicer scaling pattern as the number of slaves increases, and runtimes often of 100 seconds or less, which represents a 2x to 5x improvement compared to the first version.

We also integrated other micro-optimizations that yielded less substantial improvements (in the 5 to 10% range), such as:

• Using mutable hash tables instead of unordered-containers for most of the bookkeeping.

• Reading from the Socket directly into a ByteBuffer and deserializing directly from there, rather than copying into intermediate ByteStrings. This drastically reduces the allocations needed to perform deserialization, since we allocate the buffer where the socket data is read into upfront.
Conclusion

Our biggest takeaway from this experience is that in Haskell we can be confident that we'll always be able to write the task at hand to be as fast as possible with relative ease. Writing the epoll-based version took around a day, including factoring out the bindings from the GHC event manager into a library. Moreover, it's important to remember that the normal facilities for fast IO in Haskell (green threads + transparent evented IO) are fast enough for the overwhelming majority of cases, and much easier to manage and think about than manual evented IO. Michael Snoyman recently compared green threads to garbage collection, an apt comparison. Our software is one of the cases where the abstraction gets in the way of performance, and thus we need to work without it.

Finally, it would be great to gain hard evidence on why the first program was slow, rather than just hypotheses. We tried quite hard to understand it but could not reach conclusive evidence in the time we had. We hope to get to the bottom of this issue when we have the time, and maybe make profiling this kind of program easier in the meantime.

Acknowledgments

The work described was performed with Philipp Kant and Niklas Hambüchen. Thanks to Michael Snoyman, Philipp Kant, and Niklas Hambüchen for reviewing drafts of this blog post.

Edward Z. Yang

Try Backpack: Cabal packages

This post is part two of a series about how you can try out Backpack, a new mixin package system for Haskell. In the previous post, we described how to use a new ghc --backpack mode in GHC to quickly try out Backpack's new signature features. Unfortunately, there is no way to distribute the input files to this mode as packages on Hackage. So in this post, we walk through how to assemble equivalent Cabal packages which have the same functionality.

Download a cabal-install nightly

Along with the GHC nightly, you will need a cabal-install nightly to run these examples.
Assuming that you have installed hvr's PPA already, just aptitude install cabal-install-head and you will get a Backpack-ready cabal-install in /opt/cabal/head/bin/. Otherwise, you will need to build cabal-install from source. I recommend using a released version of GHC (e.g., your system GHC, not a nightly) to build cabal-install.

Where we are going

Here is an abridged copy of the code we developed in the last post, where I have removed all of the module/signature contents:

unit str-bytestring where
    module Str

unit str-string where
    module Str

unit regex-types where
    module Regex.Types

unit regex-indef where
    dependency regex-types
    signature Str
    module Regex

unit main where
    dependency regex-types
    dependency regex-indef[Str=str-string:Str] (Regex as Regex.String)
    dependency regex-indef[Str=str-bytestring:Str] (Regex as Regex.ByteString)
    module Main

One obvious way to translate this file into Cabal packages is to define a package per unit. However, we can also define a single package with many internal libraries -- a new feature, independent of Backpack, which lets you define private helper libraries inside a single package. Since this approach involves less boilerplate, we'll describe it first, before "productionizing" the libraries into separate packages.

For all of these examples, we assume that the source code of the modules and signatures has been copy-pasted into appropriate hs and hsig files respectively. You can find these files in the source-only branch of backpack-regex-example.

Single package layout

In this section, we'll step through the Cabal file which defines each unit as an internal library. You can find all the files for this version at the single-package branch of backpack-regex-example. This package can be built with a conventional cabal configure -w ghc-head (replace ghc-head with the path to your copy of GHC HEAD) and then cabal build.
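Assuming one source directory per unit (as the Cabal stanzas below use), the copy-pasted sources might be laid out like this; the exact tree is illustrative, with file names following the exposed modules and signatures:

```
regex-example/
  regex-example.cabal
  str-bytestring/Str.hs
  str-string/Str.hs
  regex-types/Regex/Types.hs
  regex-indef/Str.hsig
  regex-indef/Regex.hs
  regex-example/Main.hs
```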
The header of the package file is fairly ordinary, but as Backpack uses new Cabal features, cabal-version must be set to >=1.25 (note that Backpack does NOT work with Custom setup):

name:                regex-example
version:             0.1.0.0
build-type:          Simple
cabal-version:       >=1.25

Private libraries. str-bytestring, str-string and regex-types are completely conventional Cabal libraries that only have modules. In previous versions of Cabal, we would have had to make a package for each of them. However, with private libraries, we can simply list multiple library stanzas annotated with the internal name of the library:

library str-bytestring
    build-depends:       base, bytestring
    exposed-modules:     Str
    hs-source-dirs:      str-bytestring

library str-string
    build-depends:       base
    exposed-modules:     Str
    hs-source-dirs:      str-string

library regex-types
    build-depends:       base
    exposed-modules:     Regex.Types
    hs-source-dirs:      regex-types

To keep the modules for each of these internal libraries separate, we give each a distinct hs-source-dirs. These libraries can be depended upon inside this package, but are hidden from external clients; only the public library (denoted by a library stanza with no name) is publicly visible.

Indefinite libraries. regex-indef is slightly different, in that it has a signature. But writing a library for it is not too different: signatures go in the aptly named signatures field:

library regex-indef
    build-depends:       base, regex-types
    signatures:          Str
    exposed-modules:     Regex
    hs-source-dirs:      regex-indef

Instantiating. How do we instantiate regex-indef?
In our bkp file, we had to explicitly specify how the signatures of the package were to be filled:

    dependency regex-indef[Str=str-string:Str] (Regex as Regex.String)
    dependency regex-indef[Str=str-bytestring:Str] (Regex as Regex.ByteString)

With Cabal, these instantiations can be specified through a more indirect process of mix-in linking, whereby the dependencies of a package are "mixed together", with required signatures of one dependency being filled by exposed modules of another dependency. Before writing the regex-example executable, let's write a regex library, which is like regex-indef, except that it is specialized for String:

    library regex
        build-depends:       regex-indef, str-string
        reexported-modules:  Regex as Regex.String

Here, regex-indef and str-string are mix-in linked together: the Str module from str-string fills the Str requirement from regex-indef. This library then reexports Regex under a new name that makes it clear it's the String instantiation.

We can easily do the same for a ByteString instantiated version of regex-indef:

    library regex-bytestring
        build-depends:       regex-indef, str-bytestring
        reexported-modules:  Regex as Regex.ByteString

Tie it all together. It's simple enough to add the executable and then build the code:

    executable regex-example
        main-is:             Main.hs
        build-depends:       base, regex, regex-bytestring, regex-types
        hs-source-dirs:      regex-example

In the root directory of the package, you can cabal configure; cabal build the package (make sure you pass -w ghc-head!) Alternatively, you can use cabal new-build to the same effect.

There's more than one way to do it

In the previous code sample, we used reexported-modules to rename modules at declaration-time, so that they did not conflict with each other. However, this was possible only because we created extra regex and regex-bytestring libraries.
In some situations (especially if we are actually creating new packages as opposed to internal libraries), this can be quite cumbersome, so Backpack offers a way to rename modules at use-time, using the mixins field. It works like this: any package declared in build-depends can be specified in mixins with an explicit renaming, specifying which modules should be brought into scope, with what name.

For example, str-string and str-bytestring both export a module named Str. To refer to both modules without using package-qualified imports, we can rename them as follows:

    executable str-example
        main-is:             Main.hs
        build-depends:       base, str-string, str-bytestring
        mixins:              str-string     (Str as Str.String),
                             str-bytestring (Str as Str.ByteString)
        hs-source-dirs:      str-example

The semantics of the mixins field is that we bring only the modules explicitly listed in the import specification (Str as Str.String) into scope for import. If a package never occurs in mixins, then we default to bringing all modules into scope (giving us the traditional behavior of build-depends). This does mean that if you say mixins: str-string (), you can force a component to have a dependency on str-string, but NOT bring any of its modules into scope.

It has been argued that package authors should avoid defining packages with conflicting module names.
So, supposing that we restructure str-string and str-bytestring to have unique module names:

    library str-string
        build-depends:       base
        exposed-modules:     Str.String
        hs-source-dirs:      str-string

    library str-bytestring
        build-depends:       base, bytestring
        exposed-modules:     Str.ByteString
        hs-source-dirs:      str-bytestring

We would then need to rewrite regex and regex-bytestring to rename Str.String and Str.ByteString to Str, so that they fill the hole of regex-indef:

    library regex
        build-depends:       regex-indef, str-string
        mixins:              str-string (Str.String as Str)
        reexported-modules:  Regex as Regex.String

    library regex-bytestring
        build-depends:       regex-indef, str-bytestring
        mixins:              str-bytestring (Str.ByteString as Str)
        reexported-modules:  Regex as Regex.ByteString

In fact, with the mixins field, we can avoid defining the regex and regex-bytestring shim libraries entirely. We can do this by declaring regex-indef twice in mixins, renaming the requirements of each separately:

    executable regex-example
        main-is:             Main.hs
        build-depends:       base, regex-indef, str-string, str-bytestring, regex-types
        mixins:              regex-indef (Regex as Regex.String)
                                 requires (Str as Str.String),
                             regex-indef (Regex as Regex.ByteString)
                                 requires (Str as Str.ByteString)
        hs-source-dirs:      regex-example

This particular example is given in its entirety at the better-single-package branch in backpack-regex-example. Note that requirement renamings are syntactically preceded by the requires keyword.

The art of writing Backpack packages is still in its infancy, so it's unclear what conventions will win out in the end. But here is my suggestion: when defining a module intended to implement a signature, follow the existing no-conflicting-module-names convention. However, add a reexport of your module to the name of the signature. This trick takes advantage of the fact that Cabal will not report that a module is redundant unless it is actually used.
So, suppose we have:

    library str-string
        build-depends:       base
        exposed-modules:     Str.String
        reexported-modules:  Str.String as Str
        hs-source-dirs:      str-string

    library str-bytestring
        build-depends:       base, bytestring
        exposed-modules:     Str.ByteString
        reexported-modules:  Str.ByteString as Str
        hs-source-dirs:      str-bytestring

Now all of the following components work:

    library regex
        build-depends:       regex-indef, str-string
        reexported-modules:  Regex as Regex.String

    library regex-bytestring
        build-depends:       regex-indef, str-bytestring
        reexported-modules:  Regex as Regex.ByteString

    -- "import Str.String" is unambiguous, even if "import Str" is
    executable str-example
        main-is:             Main.hs
        build-depends:       base, str-string, str-bytestring
        hs-source-dirs:      str-example

    -- All requirements are renamed away from Str, so all the
    -- instantiations are unambiguous
    executable regex-example
        main-is:             Main.hs
        build-depends:       base, regex-indef, str-string, str-bytestring, regex-types
        mixins:              regex-indef (Regex as Regex.String)
                                 requires (Str as Str.String),
                             regex-indef (Regex as Regex.ByteString)
                                 requires (Str as Str.ByteString)
        hs-source-dirs:      regex-example

Separate packages

OK, so how do we actually scale this up into an ecosystem of indefinite packages, each of which can be used individually and maintained by separate individuals? The library stanzas stay essentially the same as above; just create a separate package for each one. Rather than reproduce all of the boilerplate here, the full source code is available in the multiple-packages branch of backpack-regex-example.

There is one important gotcha: the package manager needs to know how to instantiate and build these Backpack packages (in the single package case, the smarts were encapsulated entirely inside the Cabal library).
As of writing, the only command that knows how to do this is cabal new-build (I plan on adding support to stack eventually, but not until after I am done writing my thesis; and I do not plan on adding support to old-style cabal install ever.) Fortunately, it's very easy to use cabal new-build to build regex-example; just say cabal new-build -w ghc-head regex-example. Done!

Conclusions

If you actually want to use Backpack for real, what can you do? There are a number of possibilities:

1. If you are willing to use GHC 8.2 only, and you only need to parametrize code internally (where the public library looks like an ordinary, non-Backpack package), using Backpack with internal libraries is a good fit. The resulting package will be buildable with Stack and cabal-install, as long as you are using GHC 8.2. This is probably the most pragmatic way you can make use of Backpack; the primary problem is that Haddock doesn't know how to deal with reexported modules, but this should be fixable.

2. If you are willing to use cabal new-build only, then you can also write packages which have requirements, and let clients decide however they want to implement their packages.

Probably the biggest "real-world" impediment to using Backpack, besides any lurking bugs, is subpar support for Haddock. But if you are willing to overlook this (for now, in any case), please give it a try!

January 17, 2017

Jasper Van der Jeugt

Lazy I/O and graphs: Winterfell to King's Landing

Introduction

This post is about Haskell, and lazy I/O in particular. It is a bit longer than usual, so I will start with a high-level overview of what you can expect:

• We talk about how we can represent graphs in a “shallow embedding”. This means we will not use a dedicated Graph type and rather represent edges by directly referencing other Haskell values.

• This is a fairly good match when we want to encode infinite 1 graphs.
When dealing with infinite graphs, there is no need to “reify” the graph and enumerate all the nodes and edges – this would be futile anyway.

• We discuss a Haskell implementation of shortest path search in a weighted graph that works on these infinite graphs and that has good performance characteristics.

• We show how we can implement lazy I/O to model infinite graphs as pure values in Haskell, in a way that only the “necessary” parts of the graph are loaded from a database. This is done using the unsafeInterleaveIO primitive.

• Finally, we discuss the disadvantages of this approach as well, and we review some of the common problems associated with lazy I/O.

Let’s get to it! As usual, this is a literate Haskell file, which means that you can just load this blogpost into GHCi and play with it. You can find the raw .lhs file here.

> {-# LANGUAGE OverloadedStrings #-}
> {-# LANGUAGE ScopedTypeVariables #-}

> import           Control.Concurrent.MVar (MVar, modifyMVar, newMVar)
> import           Control.Monad           (forM_, unless)
> import           Control.Monad.State     (State, gets, modify, runState)
> import           Data.Hashable           (Hashable)
> import qualified Data.HashMap.Strict     as HMS
> import qualified Data.HashPSQ            as HashPSQ
> import           Data.Monoid             ((<>))
> import qualified Data.Text               as T
> import qualified Data.Text.IO            as T
> import qualified Database.SQLite.Simple  as SQLite
> import qualified System.IO.Unsafe        as IO

The problem at hand

As an example problem, we will look at finding the shortest path between cities in Westeros, the fictional location where the A Song of Ice and Fire novels (and HBO’s Game of Thrones) take place.

We model the different cities in a straightforward way. In addition to a unique ID used to identify them, they also have a name, a position (X,Y coordinates) and a list of reachable cities, with an associated time (in days) it takes to travel there.
This travel time, also referred to as the cost, is not necessarily deducible from the two sets of X,Y coordinates: some roads are faster than others.

> type CityId = T.Text

> data City = City
>     { cityId         :: CityId
>     , cityName       :: T.Text
>     , cityPos        :: (Double, Double)
>     , cityNeighbours :: [(Double, City)]
>     }

Having direct access to the neighbouring cities, instead of having to go through CityIds, has both advantages and disadvantages. On one hand, updating these values becomes cumbersome at best, and impossible at worst. If we wanted to change a city’s name, we would have to traverse all other cities to update possible references to the changed city. On the other hand, it makes access more convenient (and faster!). Since we want a read-only view on the data, it works well in this case.

Getting the data

We will be using data extracted from got.show, conveniently licensed under a Creative Commons license. You can find the complete SQL dump here. The schema of the database should not be too surprising:

    CREATE TABLE cities (
        id   text PRIMARY KEY NOT NULL,
        name text NOT NULL,
        x    float NOT NULL,
        y    float NOT NULL
    );

    CREATE TABLE roads (
        origin      text NOT NULL,
        destination text NOT NULL,
        cost        float NOT NULL,
        PRIMARY KEY (origin, destination)
    );

    CREATE INDEX roads_origin ON roads (origin);

The road costs have been generated by multiplying the actual distances with a random number uniformly chosen between 0.6 and 1.4. Cities have been (bidirectionally) connected to at least the four closest neighbours. This ensures that every city is reachable.

We will use sqlite in our example because there is almost no setup involved. You can load this database by issuing:

    curl -L jaspervdj.be/files/2017-01-17-got.sql.txt | sqlite3 got.db

But instead of considering the whole database (which we’ll get to later), let’s construct a simple example in Haskell so we can demonstrate the interface a bit. We can use a let to create bindings that refer to one another easily.
> test01 :: IO ()
> test01 = do
>     let winterfell = City "wtf" "Winterfell" (-105, 78)
>             [(13, moatCailin), (12, whiteHarbor)]
>         whiteHarbor = City "wih" "White Harbor" (-96, 74)
>             [(15, braavos), (12, winterfell)]
>         moatCailin = City "mtc" "Moat Cailin" (-104, 72)
>             [(20, crossroads), (13, winterfell)]
>         braavos = City "brv" "Braavos" (-43, 67)
>             [(17, kingsLanding), (15, whiteHarbor)]
>         crossroads = City "crs" "Crossroads Inn" (-94, 58)
>             [(7, kingsLanding), (20, moatCailin)]
>         kingsLanding = City "kgl" "King's Landing" (-84, 45)
>             [(7, crossroads), (17, braavos)]
>
>     printSolution $
>         shortestPath cityId cityNeighbours winterfell kingsLanding

printSolution is defined as:

> printSolution :: Maybe (Double, [City]) -> IO ()
> printSolution Nothing             = T.putStrLn "No solution found"
> printSolution (Just (cost, path)) = T.putStrLn $
>     "cost: " <> T.pack (show cost) <>
>     ", path: " <> T.intercalate " -> " (map cityName path)

We get exactly what we expect in GHCi:

    *Main> test01
    cost: 40.0, path: Winterfell -> Moat Cailin -> Crossroads Inn -> King's Landing

So far so good! Now let’s dig in to how shortestPath works.

The Shortest Path algorithm

The following algorithm is known as Uniform Cost Search. It is a variant of Dijkstra’s graph search algorithm that is able to work with infinite graphs (or graphs that do not fit in memory anyway). It returns the shortest path between a known start and goal in a weighted directed graph. Because this algorithm attempts to solve the problem the right way, including keeping back references, it is not simple. Therefore, if you are only interested in the part about lazy I/O, feel free to skip to that section and return to the algorithm later.

We have two auxiliary datatypes. BackRef is a wrapper around a node and the previous node on the shortest path to the former node. Keeping these references around is necessary to build a list describing the entire path at the end.

> data BackRef node = BackRef {brNode :: node, brPrev :: node}

We will be using a State monad to implement the shortest path algorithm. This is our state:

> data SearchState node key cost = SearchState
>     { ssQueue    :: HashPSQ.HashPSQ key cost (BackRef node)
>     , ssBackRefs :: HMS.HashMap key node
>     }

In our state, we have:

• A priority queue of nodes we will visit next in ssQueue, including back references. Using a priority queue will let us grab the next node with the lowest associated cost in a trivial way.

• Secondly, we have the ssBackRefs map. That one serves two purposes: to keep track of which nodes we have already explored (the keys in the map), and to keep the back references of those locations (the values in the map).

These two datatypes are only used internally in the shortestPath function.
Ideally, we would be able to put them in the where clause, but that is not possible in Haskell.

Instead of declaring a Node typeclass (possibly with associated types for the key and cost types), I decided to go with simple higher-order functions. We only need two of those function arguments after all: a function to give you a node’s key (nodeKey) and a function to get the node’s neighbours and associated costs (nodeNeighbours).

> shortestPath
>     :: forall node key cost.
>        (Ord key, Hashable key, Ord cost, Num cost)
>     => (node -> key)
>     -> (node -> [(cost, node)])
>     -> node
>     -> node
>     -> Maybe (cost, [node])
> shortestPath nodeKey nodeNeighbours start goal =

We start by creating an initial SearchState for our algorithm. Our initial queue holds one item (implying that we still need to explore the start) and our initial back references map is empty (we haven’t explored anything yet).

>     let startbr      = BackRef start start
>         queue0       = HashPSQ.singleton (nodeKey start) 0 startbr
>         backRefs0    = HMS.empty
>         searchState0 = SearchState queue0 backRefs0

walk is the main body of the shortest path search. We call that and if we found a shortest path, we return its cost together with the path, which we can reconstruct from the back references (followBackRefs).

>         (mbCost, searchState1) = runState walk searchState0 in
>     case mbCost of
>         Nothing   -> Nothing
>         Just cost -> Just
>             (cost, followBackRefs (ssBackRefs searchState1))
>   where

Now, we have a bunch of functions that are used within the algorithm. The first one, walk, is the main body. We start by exploring the next node in the queue. By construction, this is always a node we haven’t explored before. If this node is the goal, we’re done. Otherwise, we check the node’s neighbours and update the queue with those neighbours. Then, we recursively call walk.
>     walk :: State (SearchState node key cost) (Maybe cost)
>     walk = do
>         mbNode <- exploreNextNode
>         case mbNode of
>             Nothing -> return Nothing
>             Just (cost, curr)
>                 | nodeKey curr == nodeKey goal ->
>                     return (Just cost)
>                 | otherwise -> do
>                     forM_ (nodeNeighbours curr) $ \(c, next) ->
>                         updateQueue (cost + c) (BackRef next curr)
>                     walk

Exploring the next node is fairly easy to implement using a priority queue: we simply need to pop the element with the minimal priority (cost) using minView. We also need to indicate that we reached this node and save the back reference by inserting that info into ssBackRefs.

>     exploreNextNode
>         :: State (SearchState node key cost) (Maybe (cost, node))
>     exploreNextNode = do
>         queue0 <- gets ssQueue
>         case HashPSQ.minView queue0 of
>             Nothing                                   -> return Nothing
>             Just (_, cost, BackRef curr prev, queue1) -> do
>                 modify $ \ss -> ss
>                     { ssQueue    = queue1
>                     , ssBackRefs =
>                         HMS.insert (nodeKey curr) prev (ssBackRefs ss)
>                     }
>                 return $ Just (cost, curr)

updateQueue is called as new neighbours are discovered. We are careful about adding new nodes to the queue:

1. If we have already explored this neighbour, we don’t need to add it. This is done by checking if the neighbour key is in ssBackRefs.
2. If the neighbour is already present in the queue with a lower priority (cost), we don’t need to add it, since we want the shortest path. This is taken care of by the utility insertIfLowerPrio, which is defined below.
>     updateQueue
>         :: cost -> BackRef node -> State (SearchState node key cost) ()
>     updateQueue cost backRef = do
>         let node = brNode backRef
>         explored <- gets ssBackRefs
>         unless (nodeKey node `HMS.member` explored) $ modify $ \ss -> ss
>             { ssQueue = insertIfLowerPrio
>                 (nodeKey node) cost backRef (ssQueue ss)
>             }

If the algorithm finishes, we have found the lowest cost from the start to the goal, but we don’t have the path ready. We need to reconstruct this by following the back references we saved earlier. followBackRefs does that for us. It recursively looks up nodes in the map, constructing the path in the accumulator acc on the way, until we reach the start.

>     followBackRefs :: HMS.HashMap key node -> [node]
>     followBackRefs paths = go [goal] goal
>       where
>         go acc node0 = case HMS.lookup (nodeKey node0) paths of
>             Nothing    -> acc
>             Just node1 ->
>                 if nodeKey node1 == nodeKey start
>                    then start : acc
>                    else go (node1 : acc) node1

That’s it! The only utility left is the insertIfLowerPrio function. Fortunately, we can easily define this using the alter function from the psqueues package. That function allows us to change a key’s associated value and priority. It also allows us to return an additional result, but we don’t need that, so we just use () there.

> insertIfLowerPrio
>     :: (Hashable k, Ord p, Ord k)
>     => k -> p -> v -> HashPSQ.HashPSQ k p v -> HashPSQ.HashPSQ k p v
> insertIfLowerPrio key prio val = snd . HashPSQ.alter
>     (\mbOldVal -> case mbOldVal of
>         Just (oldPrio, _)
>             | prio < oldPrio -> ((), Just (prio, val))
>             | otherwise      -> ((), mbOldVal)
>         Nothing              -> ((), Just (prio, val)))
>     key
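As a quick sanity check (not part of the original post), here is a small standalone sketch of how insertIfLowerPrio behaves on a queue that already holds a key: an insert at a higher priority is ignored, while an insert at a lower priority wins. The definition is copied verbatim from above so the snippet compiles on its own.

```haskell
import           Data.Hashable (Hashable)
import qualified Data.HashPSQ  as HashPSQ

-- insertIfLowerPrio, exactly as defined in the post
insertIfLowerPrio
    :: (Hashable k, Ord p, Ord k)
    => k -> p -> v -> HashPSQ.HashPSQ k p v -> HashPSQ.HashPSQ k p v
insertIfLowerPrio key prio val = snd . HashPSQ.alter
    (\mbOldVal -> case mbOldVal of
        Just (oldPrio, _)
            | prio < oldPrio -> ((), Just (prio, val))
            | otherwise      -> ((), mbOldVal)
        Nothing              -> ((), Just (prio, val)))
    key

main :: IO ()
main = do
    let q0 = HashPSQ.singleton "a" (7 :: Int) 'x'
        q1 = insertIfLowerPrio "a" 9 'y' q0  -- 9 >= 7: ignored
        q2 = insertIfLowerPrio "a" 5 'z' q1  -- 5 <  7: replaces
    print (HashPSQ.lookup "a" q2)
```

Running this prints Just (5,'z'): the later, cheaper insert replaced the original entry, while the more expensive one left the queue untouched.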

Interlude: A (very) simple cache

Lazy I/O will guarantee that we only load the nodes in the graph when necessary.

However, since we know that the nodes in the graph do not change over time, we can build an additional cache around it. That way, we can also guarantee that we only load every node once.

Implementing such a cache is very simple in Haskell. We can simply use an MVar, that will even take care of blocking 2 when we have concurrent access to the cache (assuming that is what we want).

> type Cache k v = MVar (HMS.HashMap k v)
> newCache :: IO (Cache k v)
> newCache = newMVar HMS.empty
> cached :: (Hashable k, Ord k) => Cache k v -> k -> IO v -> IO v
> cached mvar k iov = modifyMVar mvar $ \cache -> do
>     case HMS.lookup k cache of
>         Just v  -> return (cache, v)
>         Nothing -> do
>             v <- iov
>             return (HMS.insert k v cache, v)

Loading the graph using Lazy I/O

Now, we get to the main focus of the blogpost: how to use lazy I/O primitives to ensure resources are only loaded when they are needed.

Since we are only concerned about one datatype (City), our loading code is fairly easy. The most important loading function takes the SQLite connection, the cache we wrote up previously, and a city ID. We immediately use the cached combinator in the implementation, to make sure we load every CityId only once.

> getCityById
>     :: SQLite.Connection -> Cache CityId City -> CityId
>     -> IO City
> getCityById conn cache id' = cached cache id' $ do

Now, we get some information from the database. We play it a bit loose here and assume a singleton list will be returned from the query.

>     [(name, x, y)] <- SQLite.query conn
>         "SELECT name, x, y FROM cities WHERE id = ?" [id']

The neighbours are stored in a different table because we have a properly normalised database. We can write a simple query to obtain all roads starting from the current city:

>     roads <- SQLite.query conn
>         "SELECT cost, destination FROM roads WHERE origin = ?"
>         [id'] :: IO [(Double, CityId)]

This leads us to the crux of the matter. The roads variable contains something of the type [(Double, CityId)], and what we really want is [(Double, City)]. We need to recursively call getCityById to load what we want. However, doing this “the normal way” would cause problems:

1. Since the IO monad is strict, we would end up in an infinite loop if there is a cycle in the graph (which is almost always the case for roads and cities).
2. Even if there was no cycle, we would run into trouble with our usage of MVar in the Cache. We block access to the Cache while we are in the cached combinator, so calling getCityById again would cause a deadlock.

This is where Lazy I/O shines. We can implement lazy I/O by using the unsafeInterleaveIO primitive. Its type is very simple and doesn’t look as threatening as unsafePerformIO.

unsafeInterleaveIO :: IO a -> IO a

It takes an IO action and defers it. This means that the IO action is not executed right now, but only when the value is demanded. That is exactly what we want!
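To make the deferral concrete, here is a tiny standalone sketch (not from the original post): we use an IORef as a side-channel to observe that the wrapped action only runs once its result is actually demanded.

```haskell
import           Data.IORef       (newIORef, readIORef, writeIORef)
import qualified System.IO.Unsafe as IO

main :: IO ()
main = do
    ref <- newIORef False
    -- The write is deferred: it will only run when 'x' is forced.
    x   <- IO.unsafeInterleaveIO $ do
        writeIORef ref True
        return (42 :: Int)
    readIORef ref >>= print  -- False: the deferred action has not run yet
    print x                  -- forces 'x', running the deferred action now
    readIORef ref >>= print  -- True: forcing 'x' executed the write
```

This prints False, then 42, then True: binding x performs no I/O at all, and the writeIORef only happens as a side effect of forcing x.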

We can simply wrap the recursive calls to getCityById using unsafeInterleaveIO:

>     neighbours <- IO.unsafeInterleaveIO $
>         mapM (traverse (getCityById conn cache)) roads

And then return the City we constructed:

>     return $ City id' name (x, y) neighbours

Lastly, we will add a quick-and-dirty wrapper around getCityById so that we are also able to load cities by name. Its implementation is trivial:

> getCityByName
>     :: SQLite.Connection -> Cache CityId City -> T.Text
>     -> IO City
> getCityByName conn cache name = do
>     [[id']] <- SQLite.query conn
>         "SELECT id FROM cities WHERE name = ?" [name]
>     getCityById conn cache id'

Now we can neatly wrap things up in our main function:

> main :: IO ()
> main = do
>     cache <- newCache
>     conn  <- SQLite.open "got.db"
>     winterfell <- getCityByName conn cache "Winterfell"
>     kings      <- getCityByName conn cache "King's Landing"
>     printSolution $
>         shortestPath cityId cityNeighbours winterfell kings

This works as expected:

    *Main> :main
    cost: 40.23610549037591, path: Winterfell -> Moat Cailin -> Greywater Watch -> Inn of the Kneeling Man -> Fairmarket -> Brotherhood Without Banners Hideout -> Crossroads Inn -> Darry -> Saltpans -> QuietIsle -> Antlers -> Sow's Horn -> Brindlewood -> Hayford -> King's Landing

Disadvantages of Lazy I/O

Lazy I/O also has many disadvantages, which have been widely discussed. Among those are:

1. Code becomes harder to reason about. In a setting without lazy I/O, you can casually reason about an Int as either an integer that’s already computed, or as something that will do some (pure) computation and then yield an Int. When lazy I/O enters the picture, things become more complicated. That Int you wanted to print? Yeah, it fired a bunch of missiles and returned the bodycount. This is why I would not seriously consider using lazy I/O when working with a team or on a large project – it can be easy to forget what is lazily loaded and what is not, and there’s no easy way to tell.

2. Scarce resources can easily become a problem if you are not careful. If we keep a reference to a City in our heap, that means we also keep a reference to the cache and the SQLite connection. We must ensure that we fully evaluate the solution to something that doesn’t refer to these resources (e.g. a printed string) so that the references can be garbage collected and the connections can be closed. Closing the connections is a problem in itself – if we cannot guarantee that e.g. streams will be fully read, we need to rely on finalizers, which are pretty unreliable…

3. If we go a step further and add concurrency to our application, it becomes even trickier. Deadlocks are not easy to reason about – so how about reasoning about deadlocks when you’re not sure when the IO is going to be executed at all?
Despite all these shortcomings, I believe lazy I/O is a powerful and elegant tool that belongs in every Haskeller’s toolbox. Like pretty much anything, you need to be aware of what you are doing and understand the advantages as well as the disadvantages. For example, the above downsides do not really apply if lazy I/O is only used within a module. For this blogpost, that means we could safely export the following interface:

> shortestPathBetweenCities
>     :: FilePath                       -- ^ Database name
>     -> CityId                         -- ^ Start city ID
>     -> CityId                         -- ^ Goal city ID
>     -> IO (Maybe (Double, [CityId]))  -- ^ Cost and path
> shortestPathBetweenCities dbFilePath startId goalId = do
>     cache <- newCache
>     conn  <- SQLite.open dbFilePath
>     start <- getCityById conn cache startId
>     goal  <- getCityById conn cache goalId
>     case shortestPath cityId cityNeighbours start goal of
>         Nothing           -> return Nothing
>         Just (cost, path) ->
>             let ids = map cityId path in
>             cost `seq` foldr seq () ids `seq`
>             return (Just (cost, ids))

Thanks for reading – and I hope I was able to offer you a nuanced view on lazy I/O. Special thanks to Jared Tobin for proofreading.

1. In this blogpost, I frequently talk about “infinite graphs”. Of course most of these examples are not truly infinite, but we can consider examples that do not fit in memory completely, and in that way we can regard them as “infinite for practical purposes”.

2. While blocking is good in this case, it might hurt performance when running in a concurrent environment. A good solution to that would be to stripe the MVars based on the keys, but that is beyond the scope of this blogpost. If you are interested in the subject, I talk about it a bit here.

January 16, 2017

Michael Snoyman

safe-prelude: a thought experiment

This blog post is to share a very rough first stab at a new prelude I played around with earlier this month. I haven't used it in any significant way, and haven't spent more than a few hours on it total.
I wrote it because I knew it was the only way to get the idea out of my head, and am sharing it in case anyone finds the idea intriguing or useful.

The project is available on Github at snoyberg/safe-prelude, and I've uploaded the Haddocks for easier reading (though, be warned, they aren't well organized at all). The rest of this post is just a copy of the README.md file for the project.

This is a thought experiment in a different point in the alternative prelude design space. After my blog post on readFile, I realized I was unhappy with the polymorphic nature of readFile in classy-prelude. Add to that Haskell Pitfalls, and I've been itching to try something else. I have a lot of hope for the foundation project, but wanted to play with this in the short term.

Choices

• No partial functions, period. If a function can fail, its return type must express that. (And for our purposes: IO functions with runtime exceptions are not partial.)

• Choose best-in-class libraries and promote them. bytestring and text fit that bill, as an example. Full listing below.

• Regardless of the versions of underlying libraries, this package will always export a consistent API, so that CPP usage should be constrained to just inside this package.

• Use generalization (via type classes) when it is well established. For example: Foldable and Traversable yes, MonoFoldable no.

• Controversial: Avoid providing list-specific functions. This connects to the parent point. Most of the time, I'd argue that lists are not the correct choice, and instead a Vector should be used. There is no standard for sequence-like typeclasses (though many exist), so we're not going to generalize. But we're also not going to use a less efficient representation. I was torn on this, but decided in favor of leaving out functions initially, on the basis that it's easier to add something in later than to remove it.

• Encourage qualified imports with a consistent naming scheme.
This is a strong departure from classy-prelude, which tried to make it unnecessary to use qualified imports. I'll save my feelings about qualified imports for another time; this is just a pragmatic choice given the other constraints.

• Export any non-conflicting and not-discouraged names from this module that make sense, e.g. ByteString, Text, or readIORef.

Libraries

This list may fall out of date, so check the .cabal file for a current and complete listing. I'm keeping this here to include reasoning for some libraries:

• bytestring and text, despite some complaints, are clearly the most popular representations for binary and textual data, respectively

• containers and unordered-containers are both commonly used. Due to lack of generalization, this library doesn't expose any functions for working with their types, but they are common enough that adding the dependency just for exposing the type names is worth it

• safe-exceptions hides the complexity of asynchronous exceptions, and should be used in place of Control.Exception

• transformers and mtl are clear winners in the monad transformer space, at least for now

• While young, say has been very useful for me in avoiding interleaved output issues

• Others without real competitors: deepseq, semigroups

Packages I considered but have not included yet:

• stm is an obvious winner, and while I use it constantly, I'm not convinced everyone else uses it as much as I do. Also, there are some questions around generalizing its functions (e.g., atomically could be in MonadIO), and I don't want to make that decision yet.

• stm-chans falls into this category too

• async is an amazing library, and in particular the race, concurrently, and Concurrently bits are an easy win. I've left it out for now due to questions of generalizing to MonadBaseControl (see lifted-async and its .Safe module)

• A similar argument applies to monad-unlift

• I didn't bother with exposing the Vector type... because which one would I expose?
The Vector typeclass? Boxed Vector? Unboxed? I could do the classy-prelude thing and define type UVector = Data.Vector.Unboxed.Vector, but I'd rather not do such renamings.

Qualified imports

Here are the recommended qualified imports when working with safe-prelude:

import qualified "bytestring" Data.ByteString as B
import qualified "bytestring" Data.ByteString.Lazy as BL
import qualified "text" Data.Text as T
import qualified "text" Data.Text.Lazy as TL
import qualified "containers" Data.Map.Strict as Map
import qualified "containers" Data.Set as Set
import qualified "unordered-containers" Data.HashMap.Strict as HashMap
import qualified "unordered-containers" Data.HashSet as HashSet

January 14, 2017

Dominic Steinitz

Calling Haskell from C

As part of improving the random number generation story for Haskell, I want to be able to use the testu01 library with the minimal amount of Haskell wrapping. testu01 assumes that there is a C function which returns the random number. The GHC manual gives an example but does not give all the specifics. These are my notes on how to get the example working under OS X (El Capitan 10.11.5, to be precise).
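To give a feel for the style these imports produce (a hypothetical sketch, not code taken from safe-prelude itself; the functions used are the standard containers API):

```haskell
-- Hypothetical example of the qualified-import naming scheme recommended
-- above: every container operation is prefixed with its module alias.
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

main :: IO ()
main = do
  let m = Map.fromList [("one", 1 :: Int), ("two", 2)]
      s = Set.fromList (Map.keys m)
  print (Map.lookup "two" m)   -- Just 2
  print (Set.member "one" s)   -- True
```

The point of the convention is that Map.lookup and Set.member read unambiguously at the call site, with no risk of name clashes against the prelude.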
The Haskell:

{-# OPTIONS_GHC -Wall #-}
{-# LANGUAGE ForeignFunctionInterface #-}

module Foo where

foreign export ccall foo :: Int -> IO Int

foo :: Int -> IO Int
foo n = return (length (f n))

f :: Int -> [Int]
f 0 = []
f n = n:(f (n-1))

The .cabal:

name:                test-via-c
version:             0.1.0.0
homepage:            TBD
license:             MIT
author:              Dominic Steinitz
maintainer:          idontgetoutmuch@gmail.com
category:            System
build-type:          Simple
cabal-version:       >=1.10

executable Foo.dylib
  main-is:             Foo.hs
  other-extensions:    ForeignFunctionInterface
  build-depends:       base >=4.7 && =0.6 && <0.7
  hs-source-dirs:      src
  default-language:    Haskell2010
  include-dirs:        src
  ghc-options:         -O2 -shared -fPIC -dynamic
  extra-libraries:     HSrts-ghc8.0.1

On my computer, running cabal install places the library in

~/Library/Haskell/ghc-8.0.1/lib/test-via-c-0.1.0.0/bin

The C:

#include <stdio.h>
#include "HsFFI.h"
#include "../dist/build/Foo.dylib/Foo.dylib-tmp/Foo_stub.h"

int main(int argc, char *argv[])
{
  int i;

  hs_init(&argc, &argv);

  for (i = 0; i < 5; i++) {
    printf("%d\n", foo(2500));
  }

  hs_exit();
  return 0;
}

On my computer this can be compiled with

gcc-6 Bar.c ~/Library/Haskell/ghc-8.0.1/lib/test-via-c-0.1.0.0/bin/Foo.dylib -I/Library/Frameworks/GHC.framework/Versions/8.0.1-x86_64/usr/lib/ghc-8.0.1/include -L/Library/Frameworks/GHC.framework/Versions/8.0.1-x86_64/usr/lib/ghc-8.0.1/rts -lHSrts-ghc8.0.1

and can be run with

DYLD_LIBRARY_PATH=~/Library/Haskell/ghc-8.0.1/lib/test-via-c-0.1.0.0/bin:/Library/Frameworks/GHC.framework/Versions/8.0.1-x86_64/usr/lib/ghc-8.0.1/rts

N.B. setting DYLD_LIBRARY_PATH like this is not recommended, as it is a good way of breaking things. I have tried setting DYLD_FALLBACK_LIBRARY_PATH, but only to get an error message. Hopefully, at some point I will be able to post a robust way of getting the executable to pick up the required dynamic libraries.

January 13, 2017

Brent Yorgey

My new programming languages course

tl;dr: my new PL course is now finished, and all the course materials are freely available.
Working through all the exercises should be a great option for anyone wishing to learn some basics of programming language design and implementation.

Last May, I wrote about my ideas for designing a new PL course, and got a lot of great comments and feedback. Well, somehow I survived the semester, and the course is now over. In the end I'm pretty happy with how it went (though of course there are always things that can be improved next time).

I decided to use class time in an unconventional way: for each class meeting I created a "module", consisting of a literate Haskell file with some example code, explanatory text, and lots of holes where students needed to write answers to exercises or fill in code. I split the students into groups, and they spent class time just working through the module. Instead of standing at the front lecturing, I just wandered around watching them work and answering questions. It took a bit of getting used to—for the first few classes I couldn't shake the feeling that I wasn't really doing my job—but it quickly became clear that the students were really learning and engaging with the material in a way that they would not have been able to if I had just lectured.

A happy byproduct of this approach is that the modules are fairly self-contained and can now be used by anyone to learn the material. Reading through all the modules and working through the exercises should be a great option for anyone wishing to learn some basics of programming language design and implementation. For example, I know I will probably reuse them to get summer research students up to speed. Note that the course assumes no knowledge of Haskell (so those familiar with Haskell can safely skip the first few modules), but introduces just enough to get where I want to go.

I don't plan to release any solutions, so don't ask. But other than that, questions, comments, bug reports, etc. are welcome!
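To give a flavor of the exercise-with-holes format described above, here is a hypothetical sketch in that spirit (made up for illustration, not an excerpt from the actual course materials): a module introduces a tiny expression language and leaves the interpreter as the hole, shown here with the answer filled in.

```haskell
-- A made-up exercise in the "fill in the hole" style: an AST for
-- arithmetic expressions, with the interpreter as the exercise.
data Arith
  = Lit Integer
  | Add Arith Arith
  | Mul Arith Arith
  deriving Show

-- Exercise: implement the interpreter. (A completed answer is shown;
-- in a real module this body would be a hole for students to fill in.)
interp :: Arith -> Integer
interp (Lit n)   = n
interp (Add x y) = interp x + interp y
interp (Mul x y) = interp x * interp y

main :: IO ()
main = print (interp (Add (Lit 2) (Mul (Lit 3) (Lit 4))))  -- prints 14
```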
January 12, 2017

FP Complete

Containerizing a legacy application: an overview

<html> An overview of what containerization is, the reasons to consider running a legacy application in Docker containers, the process to get it there, the issues you may run into, and next steps once you are deploying with containers. You'll reduce the stress of deployments, and take your first steps on the path toward no downtime and horizontal scaling.

Note: This post focuses on simplifying deployment of the application. It does not cover topics that may require re-architecting parts of the application, such as high availability and horizontal scaling.

Concepts

What is a "Legacy" App?

There's no one set of attributes that typifies all legacy apps, but common attributes include:

• Using the local filesystem for persistent storage, with data files intermingled with application files.
• Running many services on one server, such as a MySQL database, Redis server, Nginx web server, a Ruby on Rails application, and a bunch of cron jobs.
• Installation and upgrades using a hodgepodge of scripts and manual processes (often poorly documented).
• Configuration stored in files, often in multiple places and intermingled with application files.
• Inter-process communication via the local filesystem (e.g. dropping files in one place for another process to pick up) rather than TCP/IP.
• A design that assumes a single instance of the application will run on a single server.

Disadvantages of the legacy approach

• Automating deployments is difficult.
• If you need multiple customized instances of the application, it's hard to "share" a single server between multiple instances.
• If the server goes down, it can take a while to replace due to manual processes.
• Deploying new versions is a fraught manual or semi-manual process which is hard to roll back.
• It's possible for test and production environments to drift apart, which leads to problems in production that were not detected during testing.
• You cannot easily scale horizontally by adding more instances of the application.

What is "Containerization"?

"Containerizing" an application is the process of making it able to run and deploy under Docker containers and similar technologies that encapsulate an application with its operating system environment (a full system image). Since containers provide the application with an environment very similar to having full control of a system, this is a way to begin modernizing the deployment of the application while making minimal or no changes to the application itself. This provides a basis for incrementally making the application's architecture more "cloud-friendly."

Benefits of Containerization

• Deployment becomes much easier: you replace the whole container image with a new one.
• It's relatively easy to automate deployments, even having them driven completely from a CI (continuous integration) system.
• Rolling back a bad deployment is just a matter of switching back to the previous image.
• It's very easy to automate application updates, since there are no "intermediate state" steps that can fail (either the whole deployment succeeds, or it all fails).
• The same container image can be tested in a separate test environment and then deployed to the production environment. You can be sure that what you tested is exactly the same as what is running in production.
• Recovering a failed system is much easier, since a new container with exactly the same application can be automatically spun up on new hardware and attached to the same data stores.
• Developers can also run containers locally to test their work in progress in a realistic environment.
• Hardware can be used more efficiently, by running multiple containerized applications on a single host that ordinarily could not easily share a single system.
• Containerizing is a good first step toward supporting no-downtime upgrades, canary deployments, high availability, and horizontal scaling.
Alternatives to containerization

• Configuration management tools like Puppet and Chef help with some of the "legacy" issues, such as keeping environments consistent, but they do not support the "atomic" deployment or rollback of the entire environment and application at once. A deployment can still go wrong partway through, with no easy way to roll everything back.
• Virtual machine images are another way to achieve many of the same goals, and there are cases where it makes more sense to do the "atomic" deployment operations using entire VMs rather than containers running on a host. The main disadvantage is that hardware utilization may be less efficient, since VMs need dedicated resources (CPU, RAM, disk), whereas containers can share a single host's resources between them.

How to containerize

Preparation

Identify filesystem locations where persistent data is written

Since deploying a new version of the application is performed by replacing the Docker image, any persistent data must be stored outside of the container. If you're lucky, the application already writes all its data to a specific path, but many legacy applications spread their data all over the filesystem and intermingle it with the application itself. Either way, Docker's volume mounts let us expose the host's filesystem at specific locations in the container filesystem so that data survives between containers, so we must identify the locations to persist.

You may at this stage consider modifying the application to support writing all data within a single tree in the filesystem, as that will simplify deployment of the containerized version. However, this is not necessary if modifying the application is impractical.

Identify configuration files and values that will vary by environment

Since a single image should be usable in multiple environments (e.g.
test and production) to ensure consistency, any configuration values that will vary by environment must be identified so that the container can be configured at startup time. These could take the form of environment variables, or of values within one or more configuration files.

You may at this stage want to consider modifying the application to support reading all configuration from environment variables, as that will simplify containerizing it. However, this is not necessary if modifying the application is impractical.

Identify services that can be easily externalized

The application may use some services running on the local machine that are easy to externalize due to being highly independent and supporting communication by TCP/IP. For example, if you run a database such as MySQL or PostgreSQL or a cache such as Redis on the local system, those should be easy to run externally. You may need to adjust configuration to support specifying a hostname and port rather than assuming the service can be reached on localhost.

Creating the image

Create a Dockerfile that installs the application

If you already have the installation process automated via scripts or using a configuration management tool such as Chef or Puppet, this should be relatively easy. Start with an image of your preferred operating system, install any prerequisites, and then run the scripts. If the current setup process is more manual, this will involve some new scripting. But since the exact state of the image is known, it's easier to script the process than it would be when you have to deal with the potentially inconsistent state of a raw system. If you identified externalizable services earlier, you should modify the scripts to not install them.
A simple example Dockerfile:

# Start with an official Ubuntu 16.04 Docker image
FROM ubuntu:16.04

# Install prerequisite Ubuntu packages
RUN apt-get update \
    && apt-get install -y <REQUIRED UBUNTU PACKAGES> \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# Copy the application into the image
ADD . /app

# Run the app setup script
RUN /app/setup.sh

# Switch to the application directory
WORKDIR /app

# Specify the application startup script
CMD /app/start.sh

Startup script for configuration

If the application takes all its configuration as environment variables already, then you don't need to do anything. However, if you have environment-dependent configuration values in configuration files, you will need to create an application startup script that reads these values from environment variables and then updates the configuration files.

A simple example startup script:

#!/usr/bin/env bash
set -e

# Append to the config file using the $MYAPPCONFIG environment variable.
cat >>/app/config.txt <<END
my_app_config = "${MYAPPCONFIG}"
END

# Run the application, using the $MYAPPARG environment variable for an argument.
/app/bin/my-app --my-arg="${MYAPPARG}"

Push the image

After building the image (using docker build), it must be pushed to a Docker registry so that it can be pulled on the machine where it will be deployed (if you are running on the same machine the image was built on, then this is not necessary). You can use Docker Hub for images (a paid account lets you create private image repositories), or most cloud providers also provide their own container registries (e.g. Amazon ECR). Give the image a tag (e.g. docker tag myimage mycompany/myimage:mytag) and then push it (e.g. docker push mycompany/myimage:mytag). Each image for a version of the application should have a unique tag, so that you always know which version you're using and so that images for older versions are available to roll back to.

How to deploy

Deploying containers is a big topic, and this section just focuses on directly running containers using docker commands. Tools like docker-compose (for simple cases where all containers run on a single server) and Kubernetes (for container orchestration across a cluster) should be considered in real-world usage.

Externalized services

Services you identified for externalization earlier can be run in separate Docker containers that will be linked to the main application. Alternatively, it is often easiest to outsource to managed services. For example, if you are using AWS, using RDS for a database or ElastiCache for a cache significantly simplifies your life, since they take care of maintenance, high availability, and backups for you.
An example of running a Postgres database container:

docker run \
    -d \
    --name db \
    -v /usr/local/var/docker/volumes/postgresql/data:/var/lib/postgresql/data \
    postgres

The application

To run the application in a Docker container, you use a command line such as this:

docker run \
    -d \
    -p 8080:80 \
    --name myapp \
    -v /usr/local/var/docker/volumes/myappdata:/var/lib/myappdata \
    -e MYAPPCONFIG=myvalue \
    -e MYAPPARG=myarg \
    --link db:db \
    myappimage:mytag

The -p argument exposes the container's port 80 on the host's port 8080, the -v argument sets up the volume mount for persistent data (in hostpath:containerpath format), the -e arguments set configuration environment variables (-v and -e may both be repeated for additional volumes and variables), and the --link argument links the database container so the application can communicate with it. The container will be started with the startup script you specified in the Dockerfile's CMD.

Upgrades

To upgrade to a new version of the application, stop the old container (e.g., docker rm -f myapp) and start a new one with the new image tag (this will require a brief down time). Rolling back is similar, except that you use the old image tag.

Additional considerations

"init" process (PID 1)

Legacy applications often run multiple processes, and it's not uncommon for orphan processes to accumulate if there is no "init" (PID 1) daemon to clean them up. Docker does not provide such a daemon by default, so it's recommended to add one as the ENTRYPOINT in your Dockerfile. dumb-init is an example of a lightweight init daemon, among others. phusion/baseimage is a fully-featured base image that includes an init daemon in addition to other services. See our blog post dedicated to this topic: Docker demons: PID-1, orphans, zombies, and signals.

Daemons and cron jobs

The usual way to use Docker containers is to have a single process per container.
Ideally, any cron jobs and daemons can be externalized into separate containers, but this is not always possible in legacy applications without re-architecting them. There is no intrinsic reason why containers cannot run many processes, but it does require some extra setup, since standard base images do not include process managers and schedulers. Minimal process supervisors, such as runit, are more appropriate to use in containers than full-fledged systems like systemd. phusion/baseimage is a fully-featured base image that includes runit and cron, in addition to other services.

Volume-mount permissions

It's common (though not necessarily recommended) to run all processes in containers as the root user. Legacy applications often have more complex user requirements, and may need to run as a different user (or multiple processes as multiple users). This can present a challenge when using volume mounts, because Docker makes the mount points owned by root by default, which means non-root processes will not be able to write to them.

There are two ways to deal with this. The first approach is to create the directories on the host first, owned by the correct UID/GID, before starting the container. Note that since the container's and host's users don't match up, you have to be careful to use the same UID/GID as the container, and not merely the same usernames. The other approach is for the container itself to adjust the ownership of the mount points during its startup. This has to happen while running as root, before switching to a non-root user to start the application.

Database migrations

Database schema migrations always present a challenge for deployments, because the database schema can be very tightly coupled with the application. That makes controlling the timing of the migration important, and it also makes rolling back to an older version of the application more difficult, since database migrations can't always be rolled back easily.
One way to mitigate this is to take a staged approach to migrations: when you need to make an incompatible schema change, split that change over two application deployments. For example, if you want to move a piece of data from one location to another, these would be the phases:

1. Write the data to both the old and new locations, and read it from the new location. This means that if you roll the application back to the previous version, the new data is still where it expects to find it.
2. Stop writing it to the old location.

Note that if you want to have deployments with no downtime, that means running multiple versions of the application at the same time, which makes this even more of a challenge.

Backing up data

Backing up a containerized application is usually easier than backing up a non-containerized deployment. Data files can be backed up from the host, and you don't risk any intermingling of data files with application files because they are strictly separated. If you've moved databases to managed services such as RDS, those can take care of backups for you (at least if your needs are relatively simple).

Migrating existing data

To transition the production application to the new containerized version, you will need to migrate the old deployment's data. How to do this will vary, but usually the simplest approach is to stop the old deployment, back up all the data, and restore it to the new deployment. This should be practiced in advance, and will necessitate some down time.

Conclusion

While it requires some up-front work, containerizing a legacy application will help you get control of, automate, and minimize the stress of deploying it. It sets you on a path toward modernizing your application and supporting no-downtime deployments, high availability, and horizontal scaling. FP Complete has undertaken this process many times, in addition to building containerized applications from the ground up.
If you'd like to get on the path to modern and stress-free deployment of your applications, you can learn more about our Devops and Consulting services, or contact us straight away! </html>

January 11, 2017

Toby Goodwin

Artificial Superintelligence

I like Tim Urban's Wait But Why? site (tagline: new post every sometimes). But I thought his article on The AI Revolution - The Road to Superintelligence was dead wrong. Vaguely at the back of my mind was the thought that I ought to write some kind of rebuttal, but it would have taken a lot of time (which I don't have) to research the topic properly and write it up. So I was delighted to come across Superintelligence - The Idea That Eats Smart People, which does a far better job than I ever could have done. I recommend reading both of them.

The GHC Team

GHC 8.0.2 is available!

The GHC team is happy to at last announce the 8.0.2 release of the Glasgow Haskell Compiler. Source and binary distributions are available at

This is the second release of the 8.0 series and fixes nearly two-hundred bugs. These include:

• Interface file build determinism (#4012).
• Compatibility with macOS Sierra and GCC compilers which compile position-independent executables by default.
• Compatibility with systems which use the gold linker.
• Runtime linker fixes on Windows (see #12797).
• A compiler bug which resulted in undefined reference errors while compiling some packages (see #12076).
• A number of memory consistency bugs in the runtime system.
• A number of efficiency issues in the threaded runtime which manifest on larger core counts and large numbers of bound threads.
• A typechecker bug which caused some programs using -XDefaultSignatures to be incorrectly accepted.
• More than two-hundred other bugs. See Trac for a complete listing.
• #12757, which led to broken runtime behavior and even crashes in the presence of primitive strings.
• #12844, a type inference issue affecting partial type signatures.
• A bump of the directory library, fixing buggy path canonicalization behavior (#12894). Unfortunately this required a major version bump in directory and minor bumps in several other libraries.
• #12912, where use of the select system call would lead to runtime system failures with large numbers of open file handles.
• #10635, wherein -Wredundant-constraints was included in the -Wall warning set.

A more detailed list of the changes included in this release can be found in the release notes.

Please note that this release breaks with our usual tendency to avoid major version bumps of core libraries in minor GHC releases by including an upgrade of the directory library to 1.3.0.0.

Also note that, due to a rather serious bug (#13100) affecting Windows noticed late in the release cycle, the Windows binary distributions were produced using a slightly patched source tree. Users compiling from source for Windows should be certain to include this patch in their build.

This release is the result of six months of effort by the GHC development community. We'd like to thank everyone who has contributed code, bug reports, and feedback to this release. It's only due to their efforts that GHC remains a vibrant and exciting project.

How to get it

Both the source tarball and binary distributions for a wide variety of platforms are available here.

Background

Haskell is a standardized lazy functional programming language. The Glasgow Haskell Compiler (GHC) is a state-of-the-art programming suite for Haskell. Included is an optimising compiler generating efficient code for a variety of platforms, together with an interactive system for convenient, quick development. The distribution includes space and time profiling facilities, a large collection of libraries, and support for various language extensions, including concurrency, exceptions, and foreign language interfaces. GHC is distributed under a BSD-style open source license.
Supported Platforms

The list of platforms we support, and the people responsible for them, can be found on the GHC wiki. Ports to other platforms are possible with varying degrees of difficulty. The Building Guide describes how to go about porting to a new platform.

Developers

We welcome new contributors. Instructions on getting started with hacking on GHC are available from GHC's developer site.

Community Resources

There are mailing lists for GHC users, developers, and monitoring bug tracker activity; to subscribe, use the Mailman web interface. There are several other Haskell and GHC-related mailing lists on haskell.org; for the full list, see the lists page. Some GHC developers hang out on the #ghc and #haskell channels of the Freenode IRC network, too. See the Haskell wiki for details. Please report bugs using our bug tracking system. Instructions on reporting bugs can be found here.

Christopher Done

Fast Haskell: Competing with C at parsing XML

In this post we're going to look at parsing XML in Haskell, how it compares with an efficient C parser, and steps you can take in Haskell to build a fast library from the ground up. We're going to get fairly detailed and get our hands dirty.

A new kid on the block

A few weeks ago Neil Mitchell posted a blog post about a new XML library that he'd written. The parser is written in C, and the API is written in Haskell which uses the C library. He writes that it's very fast:

Hexml has been designed for speed. In the very limited benchmarks I've done it is typically just over 2x faster at parsing than Pugixml, where Pugixml is the gold standard for fast XML DOM parsers. In my uses it has turned XML parsing from a bottleneck to an irrelevance, so it works for me.

In order to achieve that speed, he cheats by not performing operations he doesn't care about:

To gain that speed, Hexml cheats. Primarily it doesn't do entity expansion, so &amp; remains as &amp; in the output.
It also doesn't handle CData sections (but that's because I'm lazy) and comment locations are not remembered. It also doesn't deal with most of the XML standard, ignoring the DOCTYPE stuff. [..] I only work on UTF8, which for the bits of UTF8 I care about, is the same as ASCII - I don't need to do any character decoding.

Cheating is fine when you describe in detail how you cheat. That's just changing the rules of the game!

But C has problems

This post caught my attention because it seemed to me a pity to use C. Whether you use Haskell, Python, or whatever, there are a few problems with dropping down to C from your high-level language:

• The program is more likely to segfault. I'll take an exception over a segfault any day!
• The program opens itself up to possible exploitation due to lack of memory safety.
• If people want to extend your software, they have to use C, and not your high-level language.
• Portability (i.e. Windows) is a pain in the arse with C.

Sure enough, it wasn't long before Austin Seipp posted a rundown of bugs in the C code:

At the moment, sorry to say – I wouldn't use this library to parse any arbitrary XML, since it could be considered hostile, and get me owned. Using American Fuzzy Lop, just after a few minutes, I've already found around ~30 unique crashes.

But C is really fast, right? Like 100s of times faster than Haskell! It's worth the risk.

But-but C is fast!

Let's benchmark it. We're going to parse a 4KB, a 31KB and a 211KB XML file. Using the Criterion benchmarking package, we can compare Hexml against the pretty old Haskell xml package...

File     hexml      xml
4KB      6.26 μs    1.94 ms (1940 μs)
31KB     9.41 μs    13.6 ms (13600 μs)
211KB    260 μs     25.9 ms (25900 μs)

Ouch! Those numbers don't look good. The xml package is 100-300x slower. Okay, I'm being unfair. The xml package isn't known for speed. Its package description is simply A simple XML library. Let's compare with the hexpat package.
That one has this in its description: The design goals are speed, speed, speed, interface simplicity and modularity. So that's probably more representative of the best in Haskell XML parsers. It's also based on the C expat library, which is supposed to be fast.

File     hexml      hexpat
4KB      6.395 μs   320.3 μs
31KB     9.474 μs   378.3 μs
211KB    256.2 μs   25.68 ms

That's a bit better. We're now between 40-100x slower than Hexml. I'd prefer 10x slower, but it's a more reasonable outcome. The hexpat package handles: keeping location information, reasonable parse errors, the complete XML standard. Hexml doesn't do any of that.

Let's set us a challenge. Can we match or beat the Hexml package in plain old Haskell? This is an itch that got under my skin. I emailed Neil and he was fine with it:

I don't think it's unfair or attacky to use Hexml as the baseline - I'd welcome it!

I'll walk you through my approach. I called my library Xeno (for obvious reasons).

Start with the simplest thing possible

...and make sure it's fast. Here's the first thing I wrote, to see how fast it was to walk across a file compared with Hexml.

module Xeno (parse) where

import           Data.ByteString (ByteString)
import qualified Data.ByteString as S
import           Data.Word

-- | Parse an XML document.
parse :: ByteString -> ()
parse str = parseTags 0
  where
    parseTags index =
      case elemIndexFrom 60 str index of
        Nothing -> ()
        Just fromLt ->
          case elemIndexFrom 62 str fromLt of
            Nothing -> ()
            Just fromGt -> parseTags (fromGt + 1)

-- | Get index of an element starting from offset.
elemIndexFrom :: Word8 -> ByteString -> Int -> Maybe Int
elemIndexFrom c str offset = fmap (+ offset) (S.elemIndex c (S.drop offset str))
{-# INLINE elemIndexFrom #-}

The numbers 60 and 62 are < and >. In XML the only characters that matter are < and > (if you don't care about entities). < and > can't appear inside speech marks (attributes). They are the only important things to search for.
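To make the jumping behavior concrete, here is a small standalone demonstration of the elemIndexFrom helper from the code above, applied to a made-up document (the sample input is mine, not from the post):

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8
import           Data.Word (Word8)

-- Same helper as in the parser above: find a byte at or after an offset.
elemIndexFrom :: Word8 -> S.ByteString -> Int -> Maybe Int
elemIndexFrom c str offset = fmap (+ offset) (S.elemIndex c (S.drop offset str))

main :: IO ()
main = do
  let doc = S8.pack "<a><b/></a>"
      lt  = 60 -- the byte for '<'
  -- Hop from one '<' to the next, the way parseTags does:
  print (elemIndexFrom lt doc 0)  -- Just 0
  print (elemIndexFrom lt doc 1)  -- Just 3
  print (elemIndexFrom lt doc 4)  -- Just 7
```

Each call delegates the scan to S.elemIndex (and thus memchr) on the remaining suffix, then re-bases the result to an absolute index, which is why the loop never touches bytes between the angle brackets.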
Results:

File    hexml      xeno
4KB     6.395 μs   2.630 μs
42KB    37.55 μs   7.814 μs

So the baseline performance of walking across the file in jumps is quite fast! Why is it fast? Let's look at that for a minute:

• The ByteString data type is a safe wrapper around a vector of bytes. It's underneath equivalent to char* in C.
• With that in mind, the S.elemIndex function is implemented using the standard C function memchr(3). As we all know, memchr jumps across your file in large word boundaries or even using SIMD operations, meaning it's bloody fast. But the elemIndex function itself is safe.

So we're effectively doing a for(..) { s=memchr(s,..) } loop over the file.

Keep an eye on the allocations

Using the weigh package for memory allocation tracking, we can also look at allocations of our code right now:

Case          Bytes    GCs   Check
4kb parse     1,168    0     OK
42kb parse    1,560    0     OK
52kb parse    1,168    0     OK
182kb parse   1,168    0     OK

We see that it's constant. Okay, it varies by a few bytes, but it doesn't increase linearly or anything. That's good! One thing that stood out to me: didn't we pay for allocation of the Maybe values? For 1000 < and > characters, we should have 1000 allocations of Just/Nothing. Let's go down that rabbit hole for a second.

Looking at the Core

Well, if you compile the source like this:

stack ghc -- -O2 -ddump-simpl Xeno.hs

You'll see a dump of the real Core code that is generated after the Haskell code is desugared, and before it's compiled to machine code. At this stage you can already see optimizations based on inlining, common-sub-expression elimination, deforestation, and other things. The output is rather large. Core is verbose, and fast code tends to be longer. Here is the output, but you don't have to understand it. Just note that there's no mention of Maybe, Just or Nothing in there. It skips that altogether. See here specifically. There is a call to memchr, then there is an eqAddr comparison with NULL, to see whether the memchr is done or not.
But we’re still doing safety checks so that the resulting code is safe.

Inlining counts

The curious reader might have noticed that INLINE line in my first code sample.

{-# INLINE elemIndexFrom #-}

Without the INLINE, the whole function is twice as slow and has linear allocation. With the INLINE:

Case 	Bytes 	GCs 	Check
4kb parse 	1,472 	0 	OK
42kb parse 	1,160 	0 	OK
52kb parse 	1,160 	0 	OK

benchmarking 4KB/xeno
time                 2.512 μs   (2.477 μs .. 2.545 μs)
benchmarking 211KB/xeno
time                 129.9 μs   (128.7 μs .. 131.2 μs)
benchmarking 31KB/xeno
time                 1.930 μs   (1.909 μs .. 1.958 μs)

versus, without it:

Case 	Bytes 	GCs 	Check
4kb parse 	12,416 	0 	OK
42kb parse 	30,080 	0 	OK
52kb parse 	46,208 	0 	OK

benchmarking 4KB/xeno
time                 5.258 μs   (5.249 μs .. 5.266 μs)
benchmarking 211KB/xeno
time                 265.9 μs   (262.4 μs .. 271.4 μs)
benchmarking 31KB/xeno
time                 3.212 μs   (3.209 μs .. 3.218 μs)

Always pay attention to things like this. You don’t want to put INLINE on everything: sometimes it adds slowdown, and most of the time it makes no difference. So check with your benchmark suite.

Loop unrolling manually

Some things need to be done manually. I added comment parsing to our little function:

+      Just fromLt -> checkOpenComment (fromLt + 1)
+    checkOpenComment index =
+      if S.isPrefixOf "!--" (S.drop index str)
+        then findCommentEnd (index + 3)
+        else findLt index
+    findCommentEnd index =
+      case elemIndexFrom commentChar str index of
+        Nothing -> () -- error!
+        Just fromDash ->
+          if S.isPrefixOf "->" (S.drop (fromDash + 1) str)
+            then findGt (fromDash + 2)
+            else findCommentEnd (fromDash + 1)

And it became 2x slower:

benchmarking 4KB/xeno
time                 2.512 μs   (2.477 μs .. 2.545 μs)

to

benchmarking 4KB/xeno
time                 4.296 μs   (4.240 μs .. 4.348 μs)

So I changed the S.isPrefixOf to be unrolled to S.index calls, like this:

-      if S.isPrefixOf "!--" (S.drop index str)
-        then findCommentEnd (index + 3)
-        else findLt index
+      if S.index this 0 == bangChar &&
+         S.index this 1 == commentChar &&
+         S.index this 2 == commentChar
+        then findCommentEnd (index + 3)
+        else findLt index
+      where
+        this = S.drop index str

And it dropped back down to our base speed again.

Finding tag names

I implemented finding tag names like this:

+    findTagName index0 =
+      case S.findIndex (not . isTagName) (S.drop index str) of
+        Nothing -> error "Couldn't find end of tag name."
+        Just ((+ index) -> spaceOrCloseTag) ->
+          if S.head this == closeTagChar
+            then findGt spaceOrCloseTag
+            else if S.head this == spaceChar
+                   then findLt spaceOrCloseTag
+                   else error
+                          ("Expecting space or closing '>' after tag name, but got: " ++
+                           show this)
+          where this = S.drop spaceOrCloseTag str
+      where
+        index =
+          if S.head (S.drop index0 str) == questionChar ||
+             S.head (S.drop index0 str) == slashChar
+            then index0 + 1
+            else index0

And immediately noticed a big slowdown. From

Case 	Bytes 	GCs 	Check
4kb parse 	1,160 	0 	OK
42kb parse 	1,472 	0 	OK
52kb parse 	1,160 	0 	OK
Benchmark xeno-memory-bench: FINISH
Benchmark xeno-speed-bench: RUNNING...
benchmarking 4KB/hexml
time                 6.149 μs   (6.125 μs .. 6.183 μs)
benchmarking 4KB/xeno
time                 2.691 μs   (2.665 μs .. 2.712 μs)

to

Case 	Bytes 	GCs 	Check
4kb parse 	26,096 	0 	OK
42kb parse 	65,696 	0 	OK
52kb parse 	102,128 	0 	OK
Benchmark xeno-memory-bench: FINISH
Benchmark xeno-speed-bench: RUNNING...
benchmarking 4KB/hexml
time                 6.225 μs   (6.178 μs .. 6.269 μs)
benchmarking 4KB/xeno
time                 10.34 μs   (10.06 μs .. 10.59 μs)

The first thing that should jump out at you is the allocations. What’s going on there? I looked in the profiler output, by running stack bench --profile, to see a profile output.
	Wed Jan 11 17:41 2017 Time and Allocation Profiling Report  (Final)

	   xeno-speed-bench +RTS -N -p -RTS 4KB/xeno

	total time  =        8.09 secs   (8085 ticks @ 1000 us, 1 processor)
	total alloc = 6,075,628,752 bytes  (excludes profiling overheads)

COST CENTRE            MODULE                             %time %alloc
parse.findTagName      Xeno                                35.8   72.7
getOverhead            Criterion.Monad                     13.6    0.0
parse.checkOpenComment Xeno                                 9.9    0.0
parse.findLT           Xeno                                 8.9    0.0
parse                  Xeno                                 8.4    0.0
>>=                    Data.Vector.Fusion.Util              4.6    7.7
getGCStats             Criterion.Measurement                2.8    0.0
basicUnsafeIndexM      Data.Vector.Primitive                1.6    2.0
fmap                   Data.Vector.Fusion.Stream.Monadic    1.3    2.2
rSquare.p              Statistics.Regression                1.3    1.5
basicUnsafeWrite       Data.Vector.Primitive.Mutable        1.2    1.4
innerProduct.\         Statistics.Matrix.Algorithms         1.0    1.6
qr.\.\                 Statistics.Matrix.Algorithms         0.8    1.2
basicUnsafeSlice       Data.Vector.Primitive.Mutable        0.5    1.1
transpose              Statistics.Matrix                    0.5    1.3

Right at the top, we have findTagName, doing all the allocations. So I looked at the code, and found that the only possible thing that could be allocating is S.drop. This function skips n elements at the start of a ByteString. It turns out that S.head (S.drop index0 str) was allocating an intermediate ByteString, just to get the first character of that string. It wasn’t copying the whole string, but it was allocating a new pointer into it. So I realised that I could just replace S.head (S.drop n s) with S.index s n:

-          if S.head this == closeTagChar
+          if S.index str spaceOrCloseTag == closeTagChar
             then findLT spaceOrCloseTag
-            else if S.head this == spaceChar
+            else if S.index str spaceOrCloseTag == spaceChar
                    then findGT spaceOrCloseTag
                    else error "Expecting space or closing '>' after tag name."
-          where this = S.drop spaceOrCloseTag str
       where
         index =
-          if S.head (S.drop index0 str) == questionChar ||
-             S.head (S.drop index0 str) == slashChar
+          if S.index str index0 == questionChar ||
+             S.index str index0 == slashChar

And sure enough, the allocations disappeared:

Case 	Bytes 	GCs 	Check
4kb parse 	1,160 	0 	OK
42kb parse 	1,160 	0 	OK
52kb parse 	1,472 	0 	OK
Benchmark xeno-memory-bench: FINISH
Benchmark xeno-speed-bench: RUNNING...
benchmarking 4KB/hexml
time                 6.190 μs   (6.159 μs .. 6.230 μs)
benchmarking 4KB/xeno
time                 4.215 μs   (4.175 μs .. 4.247 μs)

Down to 4.215 μs. That’s not as fast as our pre-name-parsing 2.691 μs, but we had to pay something for the extra operations per tag. We’re just not allocating anymore, which is great.

SAX for free

Eventually I ended up with a function called process that parses XML and triggers events in a SAX style:

process
  :: Monad m
  => (ByteString -> m ())               -- ^ Open tag.
  -> (ByteString -> ByteString -> m ()) -- ^ Tag attribute.
  -> (ByteString -> m ())               -- ^ End open tag.
  -> (ByteString -> m ())               -- ^ Text.
  -> (ByteString -> m ())               -- ^ Close tag.
  -> ByteString
  -> m ()

Thanks again to GHC’s optimizations, calling this function purely and doing nothing is exactly equal to the function before SAX-ization:

-- | Parse the XML but return no result, process no events.
validate :: ByteString -> Bool
validate s =
  case spork
         (runIdentity
            (process
               (\_ -> pure ())
               (\_ _ -> pure ())
               (\_ -> pure ())
               (\_ -> pure ())
               (\_ -> pure ())
               s)) of
    Left (_ :: XenoException) -> False
    Right _ -> True

Case 	Bytes 	GCs 	Check
4kb parse 	1,472 	0 	OK
42kb parse 	1,160 	0 	OK
52kb parse 	1,472 	0 	OK

benchmarking 4KB/xeno
time                 4.320 μs   (4.282 μs .. 4.361 μs)

This function performs at the same speed as process did before it accepted any callback arguments. This means that the only overhead to SAX’ing will be whatever the callback functions themselves do.
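The S.index s n replacement above is semantically the same as S.head (S.drop n s); this small standalone check (my own example, not from the post) shows they agree:

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8

main :: IO ()
main = do
  let s = S8.pack "<foo>"
  -- Both read the byte at offset 1 (the 'f'), but S.head (S.drop 1 s)
  -- first allocates a fresh ByteString header pointing into the same
  -- buffer, while S.index reads the byte directly.
  print (S.head (S.drop 1 s))
  print (S.index s 1)
```

The header allocation is tiny, but inside a hot per-tag loop it was the dominant cost in the profile above.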
Specialization is for insects (and, as it happens, optimized programs)

One point of interest is that adding a SPECIALIZE pragma for the process function increases speed by roughly 1 μs. Specialization: for a generic (type-class polymorphic) function, which would normally accept a dictionary argument at runtime for the particular instance, we instead generate a separate piece of code specialized to that exact instance. Below is the specialization of process to the Identity monad (i.e. just pure, does nothing).

{-# SPECIALISE process ::
                 (ByteString -> Identity ())
              -> (ByteString -> ByteString -> Identity ())
              -> (ByteString -> Identity ())
              -> (ByteString -> Identity ())
              -> (ByteString -> Identity ())
              -> ByteString
              -> Identity () #-}

Before:

benchmarking 4KB/xeno-sax
time                 5.877 μs   (5.837 μs .. 5.926 μs)
benchmarking 211KB/xeno-sax
time                 285.8 μs   (284.7 μs .. 287.4 μs)

After:

benchmarking 4KB/xeno-sax
time                 5.046 μs   (5.036 μs .. 5.056 μs)
benchmarking 211KB/xeno-sax
time                 240.6 μs   (240.0 μs .. 241.5 μs)

In the 4KB case it’s only 800 ns, but as we say in Britain, take care of the pennies and the pounds will look after themselves. The 285→240 μs difference isn’t big in practical terms, but when we’re playing the speed game, we pay attention to things like that.

Where we stand: Xeno vs Hexml

Currently the SAX interface in Xeno outperforms Hexml in space and time. Hurrah! We’re as fast as C!

File 	hexml-dom 	xeno-sax
4KB 	6.134 μs 	5.147 μs
31KB 	9.299 μs 	2.879 μs
211KB 	257.3 μs 	241.0 μs

It’s also worth noting that Haskell does this all safely. All the functions I’m using are standard ByteString functions which do bounds checking and throw an exception on out-of-bounds access. We don’t accidentally access memory that we shouldn’t, and we don’t segfault. The server keeps running.
If you’re interested: if we switch to unsafe functions (unsafeTake, unsafeIndex from the Data.ByteString.Unsafe module), we get a notable speed increase:

File 	hexml-dom 	xeno-sax
4KB 	6.134 μs 	4.344 μs
31KB 	9.299 μs 	2.570 μs
211KB 	257.3 μs 	206.9 μs

We don’t need to show off, though. We’ve already made our point. We’re Haskellers, we like safety. I’ll keep my safe functions.

But Hexml does more!

I’d be remiss if I didn’t address the fact that Hexml does more useful things than we’ve done here. Hexml allocates a DOM for random access. Oh no! Allocation: Haskell’s worst enemy! We’ve seen that Haskell allocates a lot normally. Actually, have we looked at that properly?

Case 	Bytes 	GCs 	Check
4kb/hexpat-sax 	444,176 	0 	OK
31kb/hexpat-sax 	492,576 	0 	OK
211kb/hexpat-sax 	21,112,392 	40 	OK
4kb/hexpat-dom 	519,128 	0 	OK
31kb/hexpat-dom 	575,232 	0 	OK
211kb/hexpat-dom 	23,182,560 	44 	OK

Alright.

Implementing a DOM parser for Xeno

All isn’t lost. Hexml isn’t a dumb parser that’s fast because it’s in C; it’s also a decent algorithm. Rather than allocating a tree, it allocates a big flat vector of nodes and attributes, which contain offsets into the original string. We can do that in Haskell too! Here’s my design of a data structure contained in a vector. We want to store just integers in the vector: integers that point to offsets in the original string. Here’s what I came up with. We have three kinds of payloads: elements, text and attributes.

1. 00 # Type tag: element
2. 00 # Parent index (within this array)
3. 01 # Start of the tag name in the original string
4. 01 # Length of the tag name
5. 05 # End index of the tag (within this array)

1. 02 # Type tag: attribute
2. 01 # Start of the key
3. 05 # Length of the key
4. 06 # Start of the value
5. 03 # Length of the value

1. 01 # Type tag: text
2. 01 # Start of the text
3. 10 # Length of the text

That’s all the detail I’m going to go into. You can read the code if you want to know more. It’s not a highly optimized format.
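To make the flat encoding concrete, here is a decoder sketch for the three payload kinds described above (my own illustration using a plain list of Ints; the real library uses an unboxed vector and different names):

```haskell
-- Payload kinds, mirroring the type tags 00 (element), 01 (text),
-- 02 (attribute) described above.
data Payload
  = Element { parentIx :: Int, nameStart :: Int, nameLen :: Int, endIx :: Int }
  | Text    { textStart :: Int, textLen :: Int }
  | Attr    { keyStart :: Int, keyLen :: Int, valStart :: Int, valLen :: Int }
  deriving Show

-- Decode one payload starting at offset i in the flat Int array.
decodeAt :: [Int] -> Int -> Payload
decodeAt v i =
  case drop i v of
    (0:p:ns:nl:e:_)   -> Element p ns nl e
    (1:ts:tl:_)       -> Text ts tl
    (2:ks:kl:vs:vl:_) -> Attr ks kl vs vl
    _                 -> error "malformed payload"

main :: IO ()
main = do
  print (decodeAt [0, 0, 1, 1, 5] 0) -- the element example above
  print (decodeAt [1, 1, 10] 0)      -- the text example above
```

Everything is an offset, either into the flat array itself (parent, end index) or into the original string (names, keys, values, text), so no substrings are copied at parse time.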
Once we have such a vector, it’s possible to define a DOM API on top of it which lets you navigate the tree as usual, which we’ll see later. We’re going to use our SAX parser, the process function, and implement a function that writes to a big array. This is a very imperative algorithm. Haskellers don’t like imperative algorithms much, but Haskell’s fine with them. The function ends up looking something like this:

runST
  (do nil <- UMV.new 1000
      vecRef <- newSTRef nil
      sizeRef <- fmap asURef (newRef 0)
      parentRef <- fmap asURef (newRef 0)
      process
        (\(PS _ name_start name_len) -> <write the open tag elements>)
        (\(PS _ key_start key_len) (PS _ value_start value_len) ->
           <write an attribute into the vector>)
        (\_ -> <ignore>)
        (\(PS _ text_start text_len) -> <write a text entry into the vector>)
        (\_ -> <set the end position of the parent>
               <set the current element to the parent>)
        str
      wet <- readSTRef vecRef
      arr <- UV.unsafeFreeze wet
      size <- readRef sizeRef
      return (UV.unsafeSlice 0 size arr))

The function runs in the ST monad, which lets us locally read and write to mutable variables and vectors while staying pure on the outside. I allocate an array of 1000 64-bit Ints (on a 64-bit arch), and I keep a variable of the current size and the current parent (if any). The current parent variable lets us, upon seeing a </close> tag, assign the position in the vector of where the parent is closed. Whenever we get an event and the array is too small, I grow the array by doubling its size. This strategy is copied from the Hexml package. Finally, when we’re done, we get the mutable vector, “freeze” it (make an immutable version of it), and return that copy. We use unsafeFreeze to re-use the array without copying, which comes with a promise that we don’t use the mutable vector afterwards, which we don’t.
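The grow-by-doubling strategy amortises the cost of reallocation. Here is the capacity calculation in isolation (a sketch of my own; the real code grows and copies a mutable vector rather than computing an Int):

```haskell
-- Capacity growth: double until the needed number of slots fits.
grow :: Int -> Int -> Int
grow cap needed
  | needed <= cap = cap
  | otherwise     = grow (cap * 2) needed

main :: IO ()
main = print (map (grow 1000) [500, 1000, 1001, 5000])
```

Because the capacity doubles each time, a document with n events triggers only O(log n) reallocations, and each element is copied O(1) times on average.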
The DOM speed

Let’s take a look at the speeds:

File 	hexml-dom 	xeno-sax 	xeno-dom
4KB 	6.123 μs 	5.038 μs 	10.35 μs
31KB 	9.417 μs 	2.875 μs 	5.714 μs
211KB 	256.3 μs 	240.4 μs 	514.2 μs

Not bad! The DOM parser is less than 2x slower than Hexml (except in the 31KB case, where it’s faster; shrug). Here is where I stopped optimizing and decided it was good enough. But we can review some of the decisions made along the way.

In the code we’re using unboxed mutable references for the current size and parent; the mutable references are provided by the mutable-containers package. See these two lines here:

sizeRef <- fmap asURef (newRef 0)
parentRef <- fmap asURef (newRef 0)

Originally, I had tried STRefs, which are boxed. Boxed just means it’s a pointer to an integer instead of an actual integer; an unboxed Int is a proper machine integer. Using an STRef, we get worse speeds:

File 	xeno-dom
4KB 	12.18 μs
31KB 	6.412 μs
211KB 	631.1 μs

Which is a noticeable speed loss.

Another thing to take into consideration is the array type. I’m using the unboxed mutable vectors from the vector package. When using atomic types like Int, it can be a leg-up to use unboxed vectors. If I use the regular boxed vectors from Data.Vector, the speed regresses to:

File 	xeno-dom
4KB 	11.95 μs (from 10.35 μs)
31KB 	6.430 μs (from 5.714 μs)
211KB 	1.402 ms (from 514.2 μs)

Aside from taking a bit more time to do writes, it also allocates 1.5x more stuff:

Case 	Bytes 	GCs 	Check
4kb/xeno/dom 	11,240 	0 	OK
31kb/xeno/dom 	10,232 	0 	OK
211kb/xeno/dom 	1,082,696 	0 	OK

becomes

Case 	Bytes 	GCs 	Check
4kb/xeno/dom 	22,816 	0 	OK
31kb/xeno/dom 	14,968 	0 	OK
211kb/xeno/dom 	1,638,392 	1 	OK

See that GC there? We shouldn’t need it.

Finally, one more remark for the DOM parser. If we forsake safety and use the unsafeWrite and unsafeRead methods from the vector package, we do see a small speed increase:

File 	xeno-dom
4KB 	9.827 μs
31KB 	5.545 μs
211KB 	490.1 μs

But it’s nothing to write home about. I’ll prefer memory safety over a few microseconds this time.
The DOM API

I wrote some functions to access our vector and provide a DOM-like API:

> let Right node = parse "<foo k='123'><p>hi</p>ok</foo>"
> node
(Node "foo" [("k","123")] [Element (Node "p" [] [Text "hi"]),Text "ok"])
> name node
"foo"
> children node
[(Node "p" [] [Text "hi"])]
> attributes node
[("k","123")]
> contents node
[Element (Node "p" [] [Text "hi"]),Text "ok"]

So that works.

Wrapping-up

The final results are in. And just to check that a 1MB file doesn’t give wildly different results:

benchmarking 1MB/hexml-dom
time                 1.225 ms   (1.221 ms .. 1.229 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.239 ms   (1.234 ms .. 1.249 ms)
std dev              25.23 μs   (12.28 μs .. 40.84 μs)

benchmarking 1MB/xeno-sax
time                 1.206 ms   (1.203 ms .. 1.211 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.213 ms   (1.210 ms .. 1.218 ms)
std dev              14.58 μs   (10.18 μs .. 21.34 μs)

benchmarking 1MB/xeno-dom
time                 2.768 ms   (2.756 ms .. 2.779 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 2.801 ms   (2.791 ms .. 2.816 ms)
std dev              41.10 μs   (30.14 μs .. 62.60 μs)

Tada! We matched Hexml, in pure Haskell, using safe accessor functions. We provided a SAX API which is very fast, and a simple demonstration DOM parser with a familiar API which is also quite fast. We use reasonably little memory in doing so.

This package is an experiment for educational purposes, to show what Haskell can do and what it can’t, for a very specific domain problem. If you would like to use this package, consider adopting it and giving it a good home. I’m not looking for more packages to maintain.

January 10, 2017

Roman Cheplyaka

Nested monadic loops may cause space leaks

Consider the following trivial Haskell program:

main :: IO ()
main = worker

{-# NOINLINE worker #-}
worker :: (Monad m) => m ()
worker =
  let loop = poll >> loop
  in loop

poll :: (Monad m) => m a
poll = return () >> poll

It doesn’t do much — except, as it turns out, eat a lot of memory!
% ./test +RTS -s & sleep 1s && kill -SIGINT %1

     751,551,192 bytes allocated in the heap
   1,359,059,768 bytes copied during GC
     450,901,152 bytes maximum residency (11 sample(s))
       7,166,816 bytes maximum slop
             888 MB total memory in use (0 MB lost due to fragmentation)

                                   Tot time (elapsed)  Avg pause  Max pause
  Gen  0      1429 colls,     0 par    0.265s   0.265s     0.0002s    0.0005s
  Gen  1        11 colls,     0 par    0.701s   0.703s     0.0639s    0.3266s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.218s  (  0.218s elapsed)
  GC      time    0.966s  (  0.968s elapsed)
  EXIT    time    0.036s  (  0.036s elapsed)
  Total   time    1.223s  (  1.222s elapsed)

  %GC     time      79.0%  (79.2% elapsed)

  Alloc rate    3,450,267,071 bytes per MUT second

  Productivity  21.0% of total user, 21.0% of total elapsed

These nested loops happen often in server-side programming. About a year ago, when I worked for Signal Vine, this happened to my code: the inner loop was a big streaming computation; the outer loop was something that would restart the inner loop should it fail. Later that year, Edsko de Vries blogged about a very similar issue. Recently, Sean Clark Hess observed something similar. In his case, the inner loop waits for a particular AMQP message, and the outer loop calls the inner loop repeatedly to extract all such messages.

So why would such an innocent-looking piece of code consume unbounded amounts of memory? To find out, let’s trace the program execution on the STG level.

Background: STG and IO

The runtime model of ghc-compiled programs is described in the paper Making a Fast Curry: Push/Enter vs. Eval/Apply for Higher-order Languages. Here is the grammar and the reduction rules for the quick reference.

It is going to be important that the IO type in GHC is a function type:

newtype IO a = IO (State# RealWorld -> (# State# RealWorld, a #))

Here are a few good introductions to the internals of IO: from Edsko de Vries, Edward Z. Yang, and Michael Snoyman.

Our program in STG

Let’s see now how our program translates to STG.
This is a translation done by ghc 8.0.1 with -O -ddump-stg -dsuppress-all:

poll_rnN =
  sat-only \r srt:SRT:[] [$dMonad_s312]
    let { sat_s314 = \u srt:SRT:[] [] poll_rnN $dMonad_s312; } in
    let { sat_s313 = \u srt:SRT:[] [] return $dMonad_s312 (); } in
    >> $dMonad_s312 sat_s313 sat_s314;

worker =
  \r srt:SRT:[] [$dMonad_s315]
    let {
      loop_s316 =
        \u srt:SRT:[] []
          let { sat_s317 = \u srt:SRT:[] [] poll_rnN $dMonad_s315; } in
          >> $dMonad_s315 sat_s317 loop_s316;
    } in loop_s316;

main = \u srt:SRT:[r2 :-> $fMonadIO] [] worker $fMonadIO;

This is the STG as understood by ghc itself. In the notation of the fast curry paper introduced above, this (roughly) translates to:

main = THUNK(worker monadIO realWorld);

worker = FUN(monad ->
  let {
    loop = THUNK(let {worker_poll_thunk = THUNK(poll monad);}
                 in then monad worker_poll_thunk loop);
  } in loop
);

poll = FUN(monad ->
  let {
    ret_thunk = THUNK(return monad unit);
    poll_poll_thunk = THUNK(poll monad);
  } in then monad ret_thunk poll_poll_thunk
);

monadIO is the record (“dictionary”) that contains the Monad methods >>=, >>, and return for the IO type. We will need return and >> (called then here) in particular; here is how they are defined:

returnIO = FUN(x s -> (# s, x #));
thenIO = FUN(m k s ->
case m s of {
(# new_s, result #) -> k new_s
}
);

STG interpreters

We could run our STG program by hand following the reduction rules listed above. If you have never done it, I highly recommend performing several reductions by hand as an exercise. But it is a bit tedious and error-prone. That’s why we will use Bernie Pope’s Ministg interpreter. My fork of Ministg adds support for unboxed tuples and recursive let bindings necessary to run our program.

There is another STG interpreter, stgi, by David Luposchainsky. It is more recent and looks nicer, but it doesn’t support the eval/apply execution model used by ghc, which is a deal breaker for our purposes.

We run Ministg like this:

ministg --noprelude --trace --maxsteps=100 --style=EA --tracedir leak.trace leak.stg

Ministg will print an error message saying that the program hasn’t finished running in 100 steps — as we would expect, — and it will also generate a directory leak.trace containing html files. Each html file shows the state of the STG machine after a single evaluation step. You can browse these files here.

Tracing the program

Steps 0 through 16 take us from main to poll monadIO, which is where things get interesting, because from this point on, only code inside poll will be executing. Remember, poll is an infinite loop, so it won’t give a chance for worker to run ever again.

Each iteration of the poll loop consists of two phases. During the first phase, poll monadIO is evaluated. This is the “pure” part. No IO gets done during this part; we are just figuring out what is going to be executed. The first phase runs up until step 24.

On step 25, we grab the RealWorld token from the stack, and the second phase — the IO phase — begins. It ends on step 42, when the next iteration of the loop begins with poll monadIO.

Let’s look at the first phase in more detail. In steps 18 and 19, the let-expression

let {
  ret_thunk = THUNK(return monad unit);
  poll_poll_thunk = THUNK(poll monad);
}
in then monad ret_thunk poll_poll_thunk

is evaluated. The thunks ret_thunk and poll_poll_thunk are allocated on the heap at addresses $3 and $4, respectively.

Later these thunks will be evaluated/updated to partial applications: $3=PAP(returnIO unit) on step 35 and $4=PAP(thenIO $7 $8) on step 50.

We would hope that these partial applications will eventually be garbage-collected. Unfortunately, not: the partial application $1=PAP(thenIO $3 $4) is defined in terms of $3 and $4, and $1 is the worker_poll_thunk, the “next” instance of the poll loop invoked by worker.

This is why the leak doesn’t occur if there’s no outer loop. Nothing would reference $3 and $4, and they would be executed and gc’d.

IO that doesn’t leak

The memory leak is a combination of two reasons. As we discussed above, the first reason is the outer loop that holds on to the reference to the inner loop.

The second reason is that IO happens here in two phases: the pure phase, during which we “compute” the IO action, and the second phase, during which we run the computed action. If there were no first phase, there would be nothing to remember.
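The two phases can be made concrete with a toy model of IO as a state function (all names here are illustrative, not from GHC):

```haskell
-- A toy model of IO: an action is just a function on a world token.
-- Building `action` (phase 1) allocates a chain of closures; only
-- applying it to the world token (phase 2) actually "runs" it.
newtype World = World Int deriving Show
newtype MyIO a = MyIO { runMy :: World -> (World, a) }

returnMy :: a -> MyIO a
returnMy x = MyIO (\s -> (s, x))

thenMy :: MyIO a -> MyIO b -> MyIO b
thenMy (MyIO m) (MyIO k) = MyIO (\s -> let (s', _) = m s in k s')

main :: IO ()
main = do
  let action = returnMy () `thenMy` returnMy 'x' -- phase 1: pure
  print (runMy action (World 0))                 -- phase 2: run
```

If an outer loop holds a reference to the phase-1 closure chain, nothing built during phase 1 can be collected, which is exactly the leak described above.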

Consider this version of the nested loop. Here, I moved NOINLINE to poll. (NOINLINE is needed because otherwise ghc would realize that our program doesn’t do anything and would simplify it down to a single infinite loop.)

main :: IO ()
main = worker

worker :: (Monad m) => m ()
worker =
let loop = poll >> loop
in loop

{-# NOINLINE poll #-}
poll :: (Monad m) => m a
poll = return () >> poll

In this version, ghc would inline worker into main and specialize it to IO. Here is the ghc’s STG code:

poll_rqk =
  sat-only \r srt:SRT:[] [$dMonad_s322]
    let { sat_s324 = \u srt:SRT:[] [] poll_rqk $dMonad_s322; } in
    let { sat_s323 = \u srt:SRT:[] [] return $dMonad_s322 (); } in
    >> $dMonad_s322 sat_s323 sat_s324;

main1 =
  \r srt:SRT:[r3 :-> main1, r54 :-> $fMonadIO] [s_s325]
    case poll_rqk $fMonadIO s_s325 of _ {
      (#,#) ipv_s327 _ -> main1 ipv_s327;
    };

Here, poll still runs in two phases, but main1 (the outer loop) doesn’t. This program still allocates memory and doesn’t run as efficiently as it could, but at least it runs in constant memory. This is because the compiler realizes that poll_rqk $fMonadIO is not computing anything useful and there’s no point in caching that value. (I am actually curious what exactly ghc’s logic is here.)

What if we push NOINLINE even further down?

main :: IO ()
main = worker

worker :: (Monad m) => m ()
worker =
  let loop = poll >> loop
  in loop

poll :: (Monad m) => m a
poll = do_stuff >> poll

{-# NOINLINE do_stuff #-}
do_stuff :: Monad m => m ()
do_stuff = return ()

STG:

do_stuff_rql =
  sat-only \r srt:SRT:[] [$dMonad_s32i] return $dMonad_s32i ();

$spoll_r2SR =
  sat-only \r srt:SRT:[r54 :-> $fMonadIO, r2SR :-> $spoll_r2SR] [s_s32j]
    case do_stuff_rql $fMonadIO s_s32j of _ {
      (#,#) ipv_s32l _ -> $spoll_r2SR ipv_s32l;
    };

main1 =
  \r srt:SRT:[r3 :-> main1, r2SR :-> $spoll_r2SR] [s_s32n]
    case $spoll_r2SR s_s32n of _ {
      (#,#) ipv_s32p _ -> main1 ipv_s32p;
    };

This code runs very efficiently, in a single phase, and doesn’t allocate at all.

Of course, in practice we wouldn’t deliberately put these NOINLINEs in our code just to make it inefficient. Instead, the inlining or specialization will fail to happen because the function is too big and/or resides in a different module, or for some other reason.

Arities

Arities provide an important perspective on the two-phase computation issue. The arity of then is 1: it is just a record selector. The arity of thenIO is 3: it takes the two monadic values and the RealWorld state token.

Arities influence what happens at runtime, as can be seen from the STG reduction rules. Because thenIO has arity 3, a partial application is created for thenIO ret_thunk poll_poll_thunk. Let’s change the arity of thenIO to 2, so that no PAPs get created:

thenIO = FUN(m k ->
case m realWorld of {
(# new_s, result #) -> k
}
);

(this is similar to how unsafePerformIO works). Now we no longer have PAPs, but our heap is filled with the same exact number of BLACKHOLEs.

More importantly, arities also influence what happens during compile time: what shape the generated STG code has. Because then has arity 1, ghc decides to create a chain of thens before passing the RealWorld token. Let’s change (“eta-expand”) the poll code as if then had arity 4, without actually changing then or thenIO or their runtime arities:

poll = FUN(monad s ->   # added a dummy argument s
  let {
    ret_thunk = THUNK(return monad unit);
    poll_poll_thunk = THUNK(poll monad);
  }
  in then monad ret_thunk poll_poll_thunk s
);

# no change in then or thenIO
thenIO = FUN(m k s ->
  case m s of {
    (# new_s, result #) -> k new_s
  }
);

This code now runs in constant memory!

Therefore, what inlining/specialization does is let the compiler see the true arity of a function such as then. (Of course, it would also allow the compiler to replace then with thenIO.)

Conclusions

Let me tell you how you can avoid any such space leaks in your code by following a simple rule:

I don’t know.

In some cases, -fno-full-laziness or -fno-state-hack help. In this case, they don’t.

In 2012, I wrote about why reasoning about space usage in Haskell is hard. I don’t think anything has changed since then. It is a hard problem to solve. I filed ghc bug #13080 just in case the ghc developers might figure out a way to address this particular issue.

Most of the time everything works great, but once in a while you stumble upon something like this. Such is life.

Thanks to Reid Barton for pointing out that my original theory regarding this leak was incomplete at best.

Foldable.mapM_, Maybe, and recursive functions

NOTE This content originally appeared on School of Haskell.

I've run into this issue myself, and seen others hit it too. Let's start off with some very simple code:

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
sayHi :: Maybe String -> IO ()
sayHi mname =
  case mname of
    Nothing -> return ()
    Just name -> putStrLn $ "Hello, " ++ name

main :: IO ()
main = sayHi $ Just "Alice"

There's nothing amazing about this code, it's pretty straight-forward pattern matching Haskell. And at some point, many Haskellers end up deciding that they don't like the explicit pattern matching, and instead want to use a combinator. So the code above might get turned into one of the following:

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
import Data.Foldable (forM_)

hiHelper :: String -> IO ()
hiHelper name = putStrLn $ "Hello, " ++ name

sayHi1 :: Maybe String -> IO ()
sayHi1 = maybe (return ()) hiHelper

sayHi2 :: Maybe String -> IO ()
sayHi2 = mapM_ hiHelper

main :: IO ()
main = do
  sayHi1 $ Just "Alice"
  sayHi2 $ Just "Bob"
  -- or often times this:
  forM_ (Just "Charlie") hiHelper

The theory is that all three approaches (maybe, mapM_, and forM_) will end up being identical. We can fairly conclusively state that forM_ will be the exact same thing as mapM_, since it's just mapM_ flipped. So the question is: will the maybe and mapM_ approaches do the same thing? In this case, the answer is yes, but let's spice it up a bit more. First, the maybe version:

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc exec -- ghc -with-rtsopts -s
import Control.Monad (when)

uncons :: [a] -> Maybe (a, [a])
uncons [] = Nothing
uncons (x:xs) = Just (x, xs)

printChars :: Int -> [Char] -> IO ()
printChars idx str = maybe
  (return ())
  (\(c, str') -> do
    when (idx `mod` 100000 == 0)
      $ putStrLn
      $ "Character #" ++ show idx ++ ": " ++ show c
    printChars (idx + 1) str')
  (uncons str)

main :: IO ()
main = printChars 1 $ replicate 5000000 'x'

You can compile and run this by saving to a Main.hs file and running stack Main.hs && ./Main. On my system, it prints out the following memory statistics, which from the maximum residency you can see runs in constant space:

   2,200,270,200 bytes allocated in the heap
788,296 bytes copied during GC
44,384 bytes maximum residency (2 sample(s))
24,528 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)

While constant space is good, the usage of maybe makes this a bit ugly. This is a common time to use forM_ to syntactically clean things up. So let's give that a shot:

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc exec -- ghc -with-rtsopts -s
import Control.Monad (when)
import Data.Foldable (forM_)

uncons :: [a] -> Maybe (a, [a])
uncons [] = Nothing
uncons (x:xs) = Just (x, xs)

printChars :: Int -> [Char] -> IO ()
printChars idx str = forM_ (uncons str) $ \(c, str') -> do
  when (idx `mod` 100000 == 0)
    $ putStrLn
    $ "Character #" ++ show idx ++ ": " ++ show c
  printChars (idx + 1) str'

main :: IO ()
main = printChars 1 $ replicate 5000000 'x'

The code is arguably cleaner and easier to follow. However, when I run it, I get the following memory stats:

   3,443,468,248 bytes allocated in the heap
632,375,152 bytes copied during GC
132,575,648 bytes maximum residency (11 sample(s))
2,348,288 bytes maximum slop
331 MB total memory in use (0 MB lost due to fragmentation)

Notice how max residency has ballooned from 42kb to 132mb! And if you increase the size of the generated list, that number grows. In other words: we have linear memory usage instead of constant, clearly something we want to avoid.

The issue is that the implementation of mapM_ in Data.Foldable is not tail recursive, at least for the case of Maybe. As a result, each recursive call ends up accumulating a bunch of "do nothing" actions to perform after completing the recursive call, which all remain resident in memory until the entire list is traversed.
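To see where those "do nothing" actions come from, Data.Foldable defines mapM_ as a right fold, roughly like the sketch below (simplified; mapM_' is my name for the sketch, not base's definition):

```haskell
-- Sketch of base's definition, simplified:
--   mapM_ f = foldr ((>>) . f) (return ())
mapM_' :: (Monad m, Foldable t) => (a -> m b) -> t a -> m ()
mapM_' f = foldr (\x k -> f x >> k) (return ())

-- For Maybe, mapM_' f (Just x) expands to: f x >> return ().
-- The trailing `return ()` runs *after* f x, so a recursive call
-- made inside f is not in tail position.
main :: IO ()
main = mapM_' print (Just (42 :: Int))
```

That trailing continuation is the "do nothing" action that accumulates once per recursive step.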

Fortunately, solving this issue is pretty easy: write a tail-recursive version of forM_ for Maybe:

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc exec -- ghc -with-rtsopts -s
import Control.Monad (when)

uncons :: [a] -> Maybe (a, [a])
uncons [] = Nothing
uncons (x:xs) = Just (x, xs)

forM_Maybe :: Monad m => Maybe a -> (a -> m ()) -> m ()
forM_Maybe Nothing _ = return ()
forM_Maybe (Just x) f = f x

printChars :: Int -> [Char] -> IO ()
printChars idx str = forM_Maybe (uncons str) $ \(c, str') -> do
  when (idx `mod` 100000 == 0)
    $ putStrLn
    $ "Character #" ++ show idx ++ ": " ++ show c
  printChars (idx + 1) str'

main :: IO ()
main = printChars 1 $ replicate 5000000 'x'

This implementation once again runs in constant memory.

There's one slight difference in the type of forM_Maybe and forM_ specialized to Maybe. The former takes a second argument of type a -> m (), while the latter takes a second argument of type a -> m b. This difference is unfortunately necessary; if we try to get back the original type signature, we have to add an extra action to wipe out the return value, which again reintroduces the memory leak:

forM_Maybe :: Monad m => Maybe a -> (a -> m b) -> m ()
forM_Maybe Nothing _ = return ()
forM_Maybe (Just x) f = f x >> return ()

Try swapping in this implementation into the above program, and once again you'll get your memory leak.

mono-traversable

Back in 2014, I raised this same issue about the mono-traversable library, and ultimately decided to change the type signature of the omapM_ function to the non-overflowing version demonstrated above. You can see that this in fact works:

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc exec --package mono-traversable -- ghc -with-rtsopts -s
import Data.MonoTraversable (oforM_)
import Control.Monad (when)

uncons :: [a] -> Maybe (a, [a])
uncons [] = Nothing
uncons (x:xs) = Just (x, xs)

printChars :: Int -> [Char] -> IO ()
printChars idx str = oforM_ (uncons str) $ \(c, str') -> do
    when (idx `mod` 100000 == 0)
        $ putStrLn $ "Character #" ++ show idx ++ ": " ++ show c
    printChars (idx + 1) str'

main :: IO ()
main = printChars 1 $ replicate 5000000 'x'

As we'd hope, this runs in constant memory.

Building free arrows from components

Introduction

Gabriel Gonzalez has written quite a bit about the practical applications of free monads. And "haoformayor" wrote a great stackoverflow post on how arrows are related to strong profunctors. So I thought I'd combine these and apply them to arrows built from profunctors: free arrows. What you get is a way to use arrow notation to build programs, but defer the interpretation of those programs until later.

Heteromorphisms

Using the notation here I'm going to call an element of a type P a b, where P is a profunctor, a heteromorphism.

A product that isn't much of a product

As I described a while back you can compose profunctors. Take a look at the code I used, and also Data.Functor.Composition.

data Compose f g d c = forall a. Compose (f d a) (g a c)
An element of Compose f g d c is just a pair of heteromorphisms, one from each of the profunctors, f and g, with the proviso that the "output" type of one is compatible with the "input" type of the other. As products go it's pretty weak in the sense that no composition happens beyond the two objects being stored with each other. And that's the basis of what I'm going to talk about.

The Compose type is just a placeholder for pairs of heteromorphisms whose actual "multiplication" is being deferred until later. This is similar to the situation with the free monoid, otherwise known as a list. We can "multiply" two lists together using mappend but all that really does is combine the elements into a bigger list. The elements themselves aren't touched in any way. That suggests the idea of using profunctor composition in the same way that (:) is used to pair elements and lists.

Free Arrows

Here's some code:

> {-# OPTIONS -W #-}
> {-# LANGUAGE ExistentialQuantification #-}
> {-# LANGUAGE Arrows #-}
> {-# LANGUAGE RankNTypes #-}
> {-# LANGUAGE TypeOperators #-}
> {-# LANGUAGE FlexibleInstances #-}

> import Prelude hiding ((.), id)
> import Control.Arrow
> import Control.Category
> import Data.Profunctor
> import Data.Monoid

> infixr :-

> data FreeA p a b = PureP (a -> b)
>                  | forall x. p a x :- FreeA p x b
First look at the second line of the definition of FreeA. It says that a FreeA p a b might be a pair consisting of a head heteromorphism whose output matches the input of another FreeA. There's also the PureP case which is acting like the empty list []. The reason we use this is that for our composition, (->) acts a lot like the identity. In particular, Compose (->) p a b is isomorphic to p a b (modulo all the usual stuff about non-terminating computations and so on). This is because an element of this type is a pair consisting of a function a -> x and a heteromorphism p x b for some type x we don't get to see. We can't project back out either of these items without information about the type of x escaping. So the only thing we can possibly do is use lmap to apply the function to the heteromorphism giving us an element of p a b.
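That argument can be made concrete. Here is a self-contained sketch of both directions of the isomorphism; the minimal Profunctor definitions stand in for Data.Profunctor, and fromCompose/toCompose are names of my own:

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- Minimal standalone stand-ins for Data.Profunctor and the earlier Compose.
class Profunctor p where
  dimap :: (a' -> a) -> (b -> b') -> p a b -> p a' b'

lmap :: Profunctor p => (a' -> a) -> p a b -> p a' b
lmap f = dimap f id

instance Profunctor (->) where
  dimap f g h = g . h . f

data Compose f g d c = forall x. Compose (f d x) (g x c)

-- One direction of the isomorphism Compose (->) p a b ~ p a b:
-- the hidden function can only be absorbed via lmap.
fromCompose :: Profunctor p => Compose (->) p a b -> p a b
fromCompose (Compose f g) = lmap f g

-- The other direction pairs the heteromorphism with the identity function.
toCompose :: p a b -> Compose (->) p a b
toCompose = Compose id
```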

Here is a special case of PureP we'll use later:

> nil :: Profunctor p => FreeA p a a
> nil = PureP id
So an element of FreeA is a sequence of heteromorphisms. If heteromorphisms are thought of as operations of some sort, then an element of FreeA is a sequence of operations waiting to be composed together into a program that does something. And that's just like the situation with free monads. Once we've built a free monad structure we apply an interpreter to it to evaluate it. This allows us to separate the "pure" structure representing what we want to do from the code that actually does it.

The first thing to note is our new type is also a profunctor. We can apply lmap and rmap to a PureP function straightforwardly. We apply lmap directly to the head of the list and we use recursion to apply rmap to the PureP at the end:

> instance Profunctor b => Profunctor (FreeA b) where
>     lmap f (PureP g) = PureP (g . f)
>     lmap f (g :- h) = (lmap f g) :- h
>     rmap f (PureP g) = PureP (f . g)
>     rmap f (g :- h) = g :- (rmap f h)
We also get a strong profunctor by applying first' all the way down the list:

> instance Strong p => Strong (FreeA p) where
>     first' (PureP f) = PureP (first' f)
>     first' (f :- g) = (first' f) :- (first' g)
We can now concatenate our lists of heteromorphisms using code that looks a lot like the typical implementation of (++):

> instance Profunctor p => Category (FreeA p) where
>     id = PureP id
>     g . PureP f = lmap f g
>     k . (g :- h) = g :- (k . h)
Note that it's slightly different to what you might have expected compared to (++) because we tend to write composition of functions "backwards". Additionally, there is another definition of FreeA we could have used that's analogous to using snoc lists instead of cons lists.
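For the curious, that snoc-style variant might look like this. This is my own sketch with a made-up constructor name; the rest of the post sticks with the cons version:

```haskell
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE TypeOperators #-}

-- A snoc-list analogue of FreeA: the pure function sits at the left end
-- and each new heteromorphism is appended on the right.
data FreeA' p a b = PureP' (a -> b)
                  | forall x. FreeA' p a x :-< p x b
```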

And now we have an arrow. I'll leave the proofs that the arrow laws are obeyed as an exercise :-)

> instance (Profunctor p, Strong p) => Arrow (FreeA p) where
>     arr = PureP
>     first = first'
The important thing about free things is that we can apply interpreters to them. For lists we have folds:

foldr :: (a -> b -> b) -> b -> [a] -> b
In foldr f e we can think of f as saying how (:) should be interpreted and e as saying how [] should be interpreted.
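For instance, interpreting (:) as (+) and [] as 0 folds a list down to its sum:

```haskell
-- foldr f e systematically replaces every (:) with f and the final []
-- with e, so:  1 : (2 : (3 : []))  becomes  1 + (2 + (3 + 0))
total :: Int
total = foldr (+) 0 [1, 2, 3]  -- 6
```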

Analogously, in Control.Monad.Free in the free package we have:

foldFree :: Monad m => (forall x . f x -> m x) -> Free f a -> m a
foldFree _ (Pure a)  = return a
foldFree f (Free as) = f as >>= foldFree f
Given a natural transformation from f to m, foldFree extends it to all of Free f.

> foldFreeA :: (Profunctor p, Arrow a) =>
>              (forall b c. p b c -> a b c) -> FreeA p b c -> a b c
> foldFreeA _ (PureP g) = arr g
> foldFreeA f (g :- h) = foldFreeA f h . f g
It's a lot like an ordinary fold but uses the arrow composition law to combine the interpretation of the head with the interpretation of the tail.

"Electronic" components

Let me revisit the example from my previous article. I'm going to remove things I won't need so my definition of Circuit is less general here. Free arrows are going to allow us to define individual components for a circuit, but defer exactly how those components are interpreted until later.

I'll use four components this time: a register we can read from, one we can write to, and a register incrementer, as well as a "pure" component. But before that, let's revisit Gabriel's article that gives some clues about how components should be built. In particular, look at the definition of TeletypeF:

data TeletypeF x
  = PutStrLn String x
  | GetLine (String -> x)
  | ExitSuccess
We use GetLine to read a string, and yet the type of GetLine k could be TeletypeF a for any a. The reason is that free monads work with continuations. Instead of GetLine returning a string to us, it's a holder for a function that says what we'd like to do with the string once we have it. That means we can leave open the question of where the string comes from. The function foldFree can be used to provide the actual string getter.

Free arrows are like "two-sided" free monads. We don't just provide a continuation saying what we'd like to do to our output. We also get to say how we prepare our data for input.

There's also some burden put on us. Free arrows need strong profunctors. Strong profunctors need to be able to convey extra data alongside the data we care about - that's what first' is all about. This means that even though Load is functionally similar to GetLine, it can't simply ignore its input. So we don't have Load (Int -> b), and instead have Load ((a, Int) -> b). Here is our component type:

> data Component a b = Load ((a, Int) -> b)
>                    | Store (a -> (b, Int))
>                    | Inc (a -> b)
The Component only knows about the data passing through, of type a and b. It doesn't know anything about how the data in the registers is stored. That's the part that will be deferred to later. We intend for Inc to increment a register. But as it doesn't know anything about registers nothing in the type of Inc refers to that. (It took a bit of experimentation for me to figure this out and there may be other ways of doing things. Often with code guided by category theory you can just "follow your nose" as there's one way that works and type checks. Here I found a certain amount of flexibility in how much you store in the Component and how much is deferred to the interpreter.)

I could implement the strong profunctor instances using various combinators but I think it might be easiest to understand when written explicitly with lambdas:

> instance Profunctor Component where
>     lmap f (Load g) = Load $ \(a, s) -> g (f a, s)
>     lmap f (Store g) = Store (g . f)
>     lmap f (Inc g) = Inc (g . f)
>     rmap f (Load g) = Load (f . g)
>     rmap f (Store g) = Store $ \a -> let (b, t) = g a
>                                      in  (f b, t)
>     rmap f (Inc g) = Inc (f . g)

> instance Strong Component where
>     first' (Load g) = Load $ \((a, x), s) -> (g (a, s), x)
>     first' (Store g) = Store $ \(a, x) -> let (b, t) = g a
>                                           in  ((b, x), t)
>     first' (Inc g) = Inc (first' g)
And now we can implement individual components. First a completely "pure" component:

> add :: Num a => FreeA Component (a, a) a
> add = PureP $ uncurry (+)

And now the load and store operations.

> load :: FreeA Component () Int
> load = Load (\(_, a) -> a) :- nil

> store :: FreeA Component Int ()
> store = Store (\a -> ((), a)) :- nil

> inc :: FreeA Component a a
> inc = Inc id :- nil

Finally we can tie it all together in a complete function using arrow notation:

> test = proc () -> do
>     () <- inc -< ()
>     a <- load -< ()
>     b <- load -< ()
>     c <- add -< (a, b)
>     () <- store -< c
>     returnA -< ()

At this point, the test object is just a list of operations waiting to be executed. Now I'll give three examples of semantics we could provide. The first uses a state arrow type similar to the previous article:

> newtype Circuit s a b = C { runC :: (a, s) -> (b, s) }

> instance Category (Circuit s) where
>     id = C id
>     C f . C g = C (f . g)

> instance Arrow (Circuit s) where
>     arr f = C $ \(a, s) -> (f a, s)
>     first (C g) = C $ \((a, x), s) -> let (b, t) = g (a, s)
>                                       in  ((b, x), t)

Here is an interpreter that interprets each of our components as an arrow. Note that this is where, among other things, we provide the meaning of the Inc operation:

> exec :: Component a b -> Circuit Int a b
> exec (Load g) = C $ \(a, s) -> (g (a, s), s)
> exec (Store g) = C $ \(a, _) -> g a
> exec (Inc g) = C $ \(a, s) -> (g a, s+1)
Here's a completely different interpreter that is going to make you do the work of maintaining the state used by the registers. You'll be told what to do! We'll use the Kleisli IO arrow to do the I/O.

> exec' :: Component a b -> Kleisli IO a b
> exec' (Load g) = Kleisli $ \a -> do
>     putStrLn "What is your number now?"
>     s <- fmap read getLine
>     return $ g (a, s)
> exec' (Store g) = Kleisli $ \a -> do
>     let (b, t) = g a
>     putStrLn $ "Your number is now " ++ show t ++ "."
>     return b
> exec' (Inc g) = Kleisli $ \a -> do
>     putStrLn "Increment your number."
>     return $ g a
The last interpreter is simply going to sum values associated to various components. They could be costs in dollars, time to execute, or even strings representing some kind of simple execution trace.

> newtype Labelled m a b = Labelled { unLabelled :: m }

> instance Monoid m => Category (Labelled m) where
>     id = Labelled mempty
>     Labelled a . Labelled b = Labelled (a `mappend` b)

> instance Monoid m => Arrow (Labelled m) where
>     arr _ = Labelled mempty
>     first (Labelled m) = Labelled m

> exec'' (Load _) = Labelled (Sum 1)
> exec'' (Store _) = Labelled (Sum 1)
> exec'' (Inc _) = Labelled (Sum 2)
Note that we can't assign non-trivial values to "pure" operations.

And now we execute all three:

> main = do
>     print $ runC (foldFreeA exec test) ((), 10)
>     putStrLn "Your number is 10." >> runKleisli (foldFreeA exec' test) ()
>     print $ getSum $ unLabelled $ foldFreeA exec'' test

Various thoughts

I don't know if free arrows are anywhere near as useful as free monads, but I hope I've successfully illustrated one application. Note that because arrow composition is essentially list concatenation it may be more efficient to use a version of Hughes lists. This is what the Cayley representation is about in the monoid notions paper. But it's easier to see the naive list version first. Something missing from here that is essential for electronics simulation is the possibility of using loops. I haven't yet thought too much about what it means to build instances of ArrowLoop freely.

Profunctors have been described as decategorised matrices in the sense that p a b, with p a profunctor, is similar to a matrix entry m_ab. Or, if you're working in a context where you distinguish between co- and contravariant vectors, it's similar to an entry with one index of each kind, m_a^b. The Compose operation is a lot like the definition of matrix product. From this perspective, the FreeA operation is a lot like the function on matrices that takes M to 1 + M + M^2 + … = (1 - M)^-1. To work with ArrowLoop we need a trace-like operation.

One nice application of free monads is in writing plugin APIs. Users can write plugins that link to a small library based on a free monad. These can then be dynamically loaded and interpreted by an application at runtime, completely insulating the plugin-writer from the details of the application. You can think of it as a Haskell version of the PIMPL idiom. Free arrows might give a nice way to write plugins for dataflow applications.

People typically think of functors as containers. So in a free monad, each element is a container of possible futures. In a free arrow the relationship between the current heteromorphism and its "future" (and "past") is a bit more symmetrical. For example, for some definitions of P, a heteromorphism P a b can act on some as to give us some bs. But some definitions of P can run "backwards" and act on elements of b -> r to give us elements of a -> r. So when I use the words "input" and "output" above, you might not want to take them too literally.

Developer (Evanston Campus) at Northwestern University (Full-time)

Northwestern University Opportunity (Job ID 30057):

Northwestern University seeks to employ a varied and diverse range of dynamic people who understand the importance of our mission and vision. When you consider a career at Northwestern University, you know that you are joining an institution with a deep history of academic, professional and personal excellence.

Currently, we have a career opportunity as a Developer (Evanston Campus).

Job Summary:

The CCL is looking for a full-time Software Developer to work on NetLogo. This Software Developer position is based at Northwestern University's Center for Connected Learning and Computer-Based Modeling (CCL), working in a small collaborative development team in a university research group that also includes professors, postdocs, graduate students, and undergraduates, supporting the needs of multiple research projects. A major focus would be on development of NetLogo, an open-source modeling environment for both education and scientific research. CCL grants also involve development work on HubNet and other associated tools for NetLogo, including research and educational NSF grants involving building, delivering, and assessing NetLogo-based science curricula for secondary schools.

NetLogo is a programming language and agent-based modeling environment. The NetLogo language is a dialect of Logo/Lisp specialized for building agent-based simulations of natural and social phenomena. NetLogo has tens of thousands of users ranging from grade school students to advanced researchers. A collaborative extension of NetLogo, called HubNet, enables groups of participants to run participatory simulation activities in classrooms and distributed participatory simulations in social science research.

The Northwestern campus is in Evanston, Illinois on the Lake Michigan shore, adjacent to Chicago and easily reachable by public transportation.

Specific Responsibilities:

• Collaborates with the NetLogo development team in designing features for NetLogo, HubNet and web-based versions of these applications; writes code independently, and in the context of a team of experienced software engineers and principal investigator;
• Creates, updates and documents existing models using NetLogo, HubNet and web-based applications; creates new such models;
• Supports development of new devices to interact with HubNet; interacts with commercial and academic partners to help determine design and functional requirements for NetLogo and HubNet; interacts with user community including responding to bug reports, questions, and suggestions, and interacting with open-source contributors;
• Performs data collection, organization, and summarization for projects; assists with coordination of team activities.
• Performs other duties as required or assigned.

Minimum Qualifications:

• Successful completion of a full 4-year course of study in an accredited college or university leading to a bachelor's or higher degree; OR appropriate combination of education and experience.
• 2 years of relevant experience required.
• Demonstrated experience and enthusiasm for writing clean, modular, well-tested code.

Preferred Qualifications: (Education and experience)

• Experience with working effectively as part of a small software development team, including close collaboration, distributed version control, and automated testing;
• Experience with building web-based applications, both server-side and client-side components, particularly with html5 and JavaScript and/or CoffeeScript;
• Experience with at least one JVM language such as Java;
• Experience with Scala programming, or enthusiasm for learning it;
• Experience with Haskell, Lisp, or other functional languages;
• Interest in and experience with programming language implementation, functional programming, and metaprogramming;
• Experience with GUI design; language design and compilers;
• Interest in and experience with computer-based modeling and simulation, especially agent-based simulation;
• Interest in and experience with distributed, multiplayer, networked systems like HubNet;
• Experience with physical computing;
• Experience with participatory simulations;
• Experience with cross-platform mobile development;
• Experience working on research projects in an academic environment;
• Experience with open-source software development and supporting the growth of an open-source community;
• Interest in education and an understanding of secondary school math and science content.

Working at Northwestern University:

Beyond being a place to learn and grow professionally, Northwestern is an exciting and fulfilling place to work! Northwestern offers many benefit options to full and part-time employees including: competitive compensation; excellent retirement plans; comprehensive medical, dental and vision coverage; dependent care match; vacation, sick and holiday pay; professional development opportunities and tuition reimbursement. Northwestern greatly values work/life balance amongst its employees. Take advantage of recreational, cultural, and enrichment opportunities on campus. Employees also receive access to childcare solutions, retail discounts, and other work/life balance resources.

Northwestern University is an equal opportunity employer and strongly believes in creating an environment that welcomes students, faculty and staff of all races, nationalities and religions. In doing so, we offer our students the opportunity to learn and grow in diverse communities preparing them for successful careers in an increasingly global and diverse work force.

For consideration, please click on the link below. You will be directed to Northwestern University's electronic recruiting system, eRecruit, where you will apply for current openings. Once you apply, you will receive an email confirming submission of your resume. For all resumes received, if there is interest in your candidacy, the human resources recruiter or the department hiring manager will contact you. Job Opening ID number for this position is 30057.

30057-Developer

As per Northwestern University policy, this position requires a criminal background check. Successful applicants will need to submit to a criminal background check prior to employment.

Northwestern University is an Equal Opportunity, Affirmative Action Employer of all protected classes, including veterans and individuals with disabilities. Women, racial and ethnic minorities, individuals with disabilities, and veterans are encouraged to apply. Hiring is contingent upon eligibility to work in the United States.

Get information on how to apply for this position.

[lvbetgkb] Right section of a function

A left section of a binary (two-argument) function is easy to write in Haskell using partial function application: just omit the last (right) argument.  A right section is a little bit more awkward, requiring backquotes, lambda, or flip.

import Data.Function((&));

-- example binary function (not an operator)
f2 :: a -> [a] -> [a];
f2 = (:);

-- we will use the larger functions later
f3 :: Int -> a -> [a] -> [a];
f3 _ = (:);

f4 :: Bool -> Int -> a -> [a] -> [a];
f4 _ _ = (:);

test :: [String];
-- all of these evaluate 'h':("el"++"lo") yielding hello
test = map (\f -> f 'h')
  [ (`f2` ("el" ++ "lo")) -- backquotes (grave accents) are inline operator syntax. An inline operator followed by an argument, all surrounded by parentheses, is operator right section syntax: one is supposed to imagine a hole in front of the backquotes: (__ `f2` ("el" ++ "lo"))
  , (\arg1 -> f2 arg1 ("el" ++ "lo")) -- lambda syntax
  , (\arg1 -> f2 arg1 $ "el" ++ "lo")
  , ((flip f2) ("el" ++ "lo"))
  , ((flip f2) $ "el" ++ "lo")
  , (flip f2 $ "el" ++ "lo")
  , (flip f2 ("el" ++ "lo")) -- It might be a little surprising that this one works, if one had thought of "flip" as a function taking only one argument, namely the function to be flipped. However, because of currying, it actually takes 3 arguments. flip :: (a -> b -> c) -> b -> a -> c.
  , ("el" ++ "lo" & flip f2)
  -- For these 3- and 4-argument cases, we would like to create a lambda on the penultimate argument.
  -- , (`f3 (2 + 3)` ("el" ++ "lo")) -- This does not work because the contents of the backquotes must be a binary function that is a single token, not an expression.
  , (let { t2 = f3 (2 + 3) } in (`t2` ("el" ++ "lo")))
  , (\penultimate -> f3 (2 + 3) penultimate ("el" ++ "lo"))
  , (\penultimate -> f3 (2 + 3) penultimate $ "el" ++ "lo") -- this wordy lambda syntax is one of the best in terms of low parenthesis count and avoiding deep parentheses nesting.
  , (flip (f3 (2 + 3)) ("el" ++ "lo")) -- similar to "a little surprising" above
  , (flip (f3 (2 + 3)) $ "el" ++ "lo")
  , (flip (f3 $ 2 + 3) $ "el" ++ "lo")
  , ((flip $ f3 (2 + 3)) $ "el" ++ "lo")
  , ((flip $ f3 $ 2 + 3) $ "el" ++ "lo")
  , ("el" ++ "lo" & (f3 (2 + 3) & flip))
  , ("el" ++ "lo" & (2 + 3 & f3 & flip))
  , (\penultimate -> f4 (not True) (2 + 3) penultimate ("el" ++ "lo"))
  , (\penultimate -> f4 (not True) (2 + 3) penultimate $ "el" ++ "lo")
  , (let { t2 = f4 (not True) (2 + 3) } in (`t2` ("el" ++ "lo")))
  , (flip (f4 (not True) (2 + 3)) ("el" ++ "lo"))
  , (flip (f4 (not True) (2 + 3)) $ "el" ++ "lo")
  , ((flip $ f4 (not True) (2 + 3)) $ "el" ++ "lo")
  , ((flip $ f4 (not True) $ 2 + 3) $ "el" ++ "lo")
  , ("el" ++ "lo" & (f4 (not True) (2 + 3) & flip))
  , ("el" ++ "lo" & (2 + 3 & f4 (not True) & flip))
  , ("el" ++ "lo" & (2 + 3 & (not True & f4) & flip))
  ];

(\f -> f 'h') could have been written ($ 'h'), a right section itself, but we deliberately avoid being potentially obscure in the test harness.

ANN: containers 0.5.9.1

containers 0.5.9.1

The containers package contains efficient general-purpose implementations of various basic immutable container types. The declared cost of each operation is either worst-case or amortized, but remains valid even if structures are shared.

Changes since 0.5.8.1 (2016-08-31)

The headline change is adding merge and mergeA for Data.IntMap. The versions for Data.Map were introduced in 0.5.8.1, so this change restores parity between the interfaces. With this in place we hope this version will make it into GHC 8.2.

Other changes include:

• Add instances for Data.Graph.SCC: Foldable, Traversable, Data, Generic, Generic1, Eq, Eq1, Show, Show1, Read, and Read1.
• Add lifted instances (from Data.Functor.Classes) for Data.Sequence, Data.Map, Data.Set, Data.IntMap, and Data.Tree. (Thanks to Oleg Grenrus for doing a lot of this work.)
• Properly deprecate functions in Data.IntMap long documented as deprecated.
• Rename several internal modules for clarity. Thanks to esoeylemez for starting this process.
• Make Data.Map.fromDistinctAscList and Data.Map.fromDistinctDescList more eager, improving performance.
• Plug space leaks in Data.Map.Lazy.fromAscList and Data.Map.Lazy.fromDescList by manually inlining constant functions.
• Add lookupMin and lookupMax to Data.Set and Data.Map as total alternatives to findMin and findMax.
• Add (!?) to Data.Map as a total alternative to (!).
• Avoid using deleteFindMin and deleteFindMax internally, preferring total functions instead. New implementations of said functions lead to slight performance improvements overall.

Call for interest: Haskell in middle school math education

Just a pointer to this post in haskell-cafe: Call for interest: Haskell in middle school math education

The TL;DR version is that a few people have put together a sizable budget to make the next big push to get CodeWorld and Haskell into middle school mathematics.  We’re looking to produce high-quality resources like video, study materials, etc. to enable teachers to easily use Haskell to make mathematics more tangible and creative in their classrooms for students ages about 11 to 14.  If this interests you, read the announcement!

Addressing Pieces of State with Profunctors

Attempted segue

Since I first wrote about profunctors there has been quite a bit of activity in the area so I think it's about time I revisited them. I could just carry on from where I left off 5 years ago but there have been so many tutorials on the subject that I think I'll have to assume you've looked at them. My favourite is probably Phil Freeman's Fun with Profunctors. What I intend to do here is solve a practical problem with profunctors.

The problem

Arrows are a nice mechanism for building circuit-like entities in code. In fact, they're quite good for simulating electronic circuits. Many circuits are very much like pieces of functional code. For example an AND gate like this

can be nicely modelled using a pure function: c = a && b. But some components, like flip-flops, have internal state. What comes out of the outputs isn't a simple function of the inputs right now, but depends on what has happened in the past. (Alternatively you can take the view that the inputs and outputs aren't the current values but the complete history of the values.)

We'll use (Hughes) arrows rather than simple functions. For example, one kind of arrow is the Kleisli arrow. For the case of Kleisli arrows built from the state monad, these are essentially functions of type a -> s -> (b, s) where s is our state. We can write these more symmetrically as functions of type (a, s) -> (b, s). We can think of these as "functions" from a to b where the output is allowed to depend on some internal state s. I'll just go ahead and define arrows like this right now.
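These two presentations really are the same thing up to currying; a quick sketch (the helper names are mine, not from any library):

```haskell
-- Converting between the Kleisli-state form a -> s -> (b, s) and the
-- symmetric form (a, s) -> (b, s) is just (un)currying.
toSymmetric :: (a -> s -> (b, s)) -> ((a, s) -> (b, s))
toSymmetric f (a, s) = f a s

fromSymmetric :: ((a, s) -> (b, s)) -> (a -> s -> (b, s))
fromSymmetric g a s = g (a, s)
```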

First the extensions and imports:

> {-# OPTIONS -W #-}
> {-# LANGUAGE Arrows #-}
> {-# LANGUAGE RankNTypes #-}
> {-# LANGUAGE FlexibleInstances #-}

> import Prelude hiding ((.), id)
> import Control.Arrow
> import Control.Category
> import Data.Profunctor
> import Data.Tuple
And now I'll define our stateful circuits. I'm going to make these slightly more general than I described allowing circuits to change the type of their state:

> newtype Circuit s t a b = C { runC :: (a, s) -> (b, t) }

> instance Category (Circuit s s) where
>     id = C id
>     C f . C g = C (f . g)

> instance Arrow (Circuit s s) where
>     arr f = C $ \(a, s) -> (f a, s)
>     first (C g) = C $ \((a, x), s) -> let (b, t) = g (a, s)
>                                       in  ((b, x), t)
This is just a more symmetrical rewrite of the state monad as an arrow. The first method allows us to pass through some extra state, x, untouched.

Now for some circuit components. First the "pure" operations, a multiplier and a negater:

> mul :: Circuit s s (Int, Int) Int
> mul = C $ \((x, y), s) -> (x*y, s)

> neg :: Circuit s s Int Int
> neg = C $ \(x, s) -> (-x, s)
And now some "impure" ones that read and write some registers as well as an accumulator:

> store :: Circuit Int Int Int ()
> store = C $ \(x, _) -> ((), x)

> load :: Circuit Int Int () Int
> load = C $ \((), s) -> (s, s)

> accumulate :: Circuit Int Int Int Int
> accumulate = C $ \(a, s) -> (a, s+a)

I'd like to make a circuit that has lots of these components, each with its own state. I'd like to store all of these bits of state in a larger container. But that means that each of these components needs to have a way to address its own particular substate. That's the problem I'd like to solve.

Practical profunctor optics

In an alternative universe lenses were defined using profunctors. To find out more I recommend Phil Freeman's talk that I linked to above. Most of the next paragraph is just a reminder of what he says in that talk and I'm going to use the bare minimum to do the job I want.

Remember that one of the things lenses allow you to do is this: suppose we have a record s containing a field of type a and another similar enough kind of record t with a field of type b. Among other things, a lens gives a way to take a rule for modifying the a field to a b field and extend it to a way to modify the s record into a t record. So we can think of lenses as giving us functions of type (a -> b) -> (s -> t). Now if p is a profunctor then you can think of p a b as being a bit function-like. Like functions, profunctors typically (kinda, sorta) get used to consume (zero or more) objects of type a and output (zero or more) objects of type b. So it makes sense to ask our lenses to work with these more general objects too, i.e. we'd like to be able to get something of type p a b -> p s t out of a lens. A strong profunctor is one that comes pre-packed with a lens that can do this for the special case where the types s and t are 2-tuples. But you can think of simple records as being syntactic sugar for tuples of fields, so strong profunctors also automatically give us lenses for records. Again, watch Phil's talk for details.
So here is our lens type:

> type Lens s t a b = forall p. Strong p => p a b -> p s t

Here are lenses that mimic the well known ones from Control.Lens:

> _1 :: Lens (a, x) (b, x) a b
> _1 = first'

> _2 :: Lens (x, a) (x, b) a b
> _2 = dimap swap swap . first'

(Remember that dimap is a function to pre- and post- compose a function with two others.)

Arrows are profunctors. So Circuit s s, when wrapped in WrappedArrow, is a profunctor. So now we can directly use the Circuit type with profunctor lenses. This is cool, but it doesn't directly solve our problem. So we're not going to use this fact. We're interested in addressing the state of type s, not the values of type a and b passed through our circuits. In other words, we're interested in the fact that Circuit s t a b is a profunctor in s and t, not a and b. To make this explicit we need a suitable way to permute the arguments to Circuit:

> newtype Flipped p s t a b = F { unF :: p a b s t }

(It was tempting to call that ComedyDoubleAct.)

And now we can define:

> instance Profunctor (Flipped Circuit a b) where
>     lmap f (F (C g)) = F $ C $ \(a, s) -> g (a, f s)
>     rmap f (F (C g)) = F $ C $ \(a, s) -> let (b, t) = g (a, s)
>                                           in  (b, f t)

> instance Strong (Flipped Circuit a b) where
>     first' (F (C g)) = F $ C $ \(a, (s, x)) -> let (b, t) = g (a, s)
>                                                in  (b, (t, x))

Any time we want to use this instance of Profunctor with a Circuit we have to wrap everything with F and unF. The function dimap gives us a convenient way to implement such wrappings.

Let's implement an imaginary circuit with four bits of state in it. Here is the state:

> data CPU = CPU { _x :: Int, _y :: Int, _z :: Int, _t :: Int } deriving Show

As I don't have a complete profunctor version of a library like Control.Lens with its template Haskell magic I'll set things up by hand.
Here's a strong-profunctor-friendly version of the CPU and a useful isomorphism to go with it:

> type ExplodedCPU = (Int, (Int, (Int, Int)))

> explode :: CPU -> ExplodedCPU
> explode (CPU u v w t) = (u, (v, (w, t)))

> implode :: ExplodedCPU -> CPU
> implode (u, (v, (w, t))) = CPU u v w t

And now we need adapters that take lenses for an ExplodedCPU and (1) apply them to a CPU the way Control.Lens would...

> upgrade :: Profunctor p =>
>            (p a a -> p ExplodedCPU ExplodedCPU) ->
>            (p a a -> p CPU CPU)
> upgrade f = dimap explode implode . f

> x, y, z, t :: Flipped Circuit a b Int Int -> Flipped Circuit a b CPU CPU
> x = upgrade _1
> y = upgrade $ _2 . _1
> z = upgrade $ _2 . _2 . _1
> t = upgrade $ _2 . _2 . _2
...and (2) wrap them so they can be used on the flipped profunctor instance of Circuit:

> (!) :: p s t a b -> (Flipped p a b s t -> Flipped p a b s' t') ->
>        p s' t' a b
> x ! f = dimap F unF f x
After all that we can now write a short piece of code that represents our circuit. Notice how we can apply the lenses x, ..., t directly to our components to get them to use the right pieces of state:

> test :: Circuit CPU CPU () ()
> test = proc () -> do
>     a  <- load ! x       -< ()
>     b  <- load ! y       -< ()
>     c  <- mul            -< (a, b)
>     d  <- neg            -< c
>     e  <- accumulate ! t -< d
>     () <- store ! z      -< e
>     returnA              -< ()

> main :: IO ()
> main = do
>     print $ runC test ((), CPU 2 30 400 5000)

Of course with a suitable profunctor lens library you can do a lot more, like work with traversable containers of components. Note that we could also write a version of all this code using monads instead of arrows. But it's easier to see the symmetry in Flipped Circuit when using arrows, and it also sets the scene for the next thing I want to write about...

January 06, 2017

FP Complete

Green Threads are like Garbage Collection

<html> Many common programming languages today eschew manual memory management in preference to garbage collection. While the former certainly has its place in certain use cases, I believe the majority of application development today occurs in garbage collected languages, and for good reason. Garbage collection moves responsibility for many memory issues from the developer into the language's runtime, mostly removing the possibility of entire classes of bugs while reducing cognitive overhead. In other words, it separates out a concern. That's not to say that garbage collection is perfect or appropriate in all cases, but in many common cases it greatly simplifies code.

Languages like Haskell, Erlang, and Go provide a similar separation-of-concern in the form of green threads. Instead of requiring the developer to manually deal with asynchronous (or non-blocking) I/O calls, the runtime system takes responsibility for this. Like garbage collection, it's not appropriate for all use cases, but for many common cases it's a great fit and reduces code complexity.
This post will discuss what green threads are, the problems they solve, some cases where they aren't the best fit, and how to get started with green thread based concurrency easily. If you want to jump to that last point, you can download the Stack build tool and start with the async library tutorial.

Blocking and non-blocking I/O

Suppose you're writing a web server. A naive approach would be to write something like the following pseudo-code:

function webServer(port) {
    var listenSocket = bindSocket(port);
    while(true) {
        var socket = accept(listenSocket);
        forkThread(handleConnection(socket));
    }
}

function handleConnection(socket) {
    while(true) {
        var bytes = read(socket);
        var request = parseRequest(bytes);
        var response = getResponse(request);
        write(socket, renderResponse(response));
    }
}

The read and write calls appear to perform blocking I/O, which means that the entire system thread running them will be blocked on the kernel until data is available. Depending on what our forkThread call did, this could mean one of two things:

• If forkThread forks a new system thread, then performing blocking I/O isn't a problem: that thread has nothing to do until read and write complete. However, forking a new system thread for each connection is expensive, and does not scale well to hundreds of thousands of concurrent requests.

• If forkThread forks a new thread within your runtime (sometimes called a fiber), then multiple fibers will all be running on a single system thread, and each time you make a blocking I/O call, the entire thread will be blocked, preventing any progress on other connections. You've essentially reduced your application to handling one connection at a time.

Neither of these approaches is very attractive for writing robust concurrent applications (though the former is certainly better than the latter). Another approach is to use non-blocking I/O.
In this case, instead of making a call to read or write which blocks until data is available, you make a call and provide a callback function or continuation to handle what to do with the result data. Let's see what our web server above will look like:

function webServer(port) {
    var listenSocket = bindSocket(port);
    listenLoop(listenSocket);
}

function listenLoop(listenSocket) {
    acceptAsync(listenSocket, function(socket) {
        handleConnection(socket);
        listenLoop(listenSocket);
    });
}

function handleConnection(socket) {
    readAsync(socket, function(bytes) {
        var request = parseRequest(bytes);
        var response = getResponse(request);
        writeAsync(socket, renderResponse(response), function() {
            handleConnection(socket);
        });
    });
}

Let's note some differences:

• We're no longer performing any forking. All actions are occurring in one single thread, removing the possibility of overhead from spawning a system thread (or even a fiber).

• Instead of capturing the output of read in a variable bytes, we provide readAsync a callback function, and that callback function is provided the bytes value when available. We sometimes call these callbacks continuations, since they tell us how to continue processing from where we left off.

• The loops are gone. Instead, our callbacks recursively call functions to create the necessary infinite looping, while allowing for proper asynchronous behavior.

This approach solves the problems listed above with blocking I/O: no performance overhead of spawning threads or fibers, and multiple requests can be processed concurrently without being blocked by each other's I/O calls.

The downsides of non-blocking I/O

Unfortunately, this new style does not get away scot-free.

• Subjectively, the callback style of coding is not as elegant as the blocking style. There are workarounds for this with techniques like promises.

• Error/exception handling isn't as straightforward in the callback setup as with the blocking code.
It's certainly a solvable problem, but often involves techniques like passing in an extra callback function for the error case. Many languages out there today use runtime exceptions, and they don't translate too well to callbacks.

• Our code is limited to running on one CPU core, at least in the simplest case. You can work around this with techniques like prefork, but it's not automatically handled by the callback approach. If our goal is to maximize requests per second, using every available CPU core for processing is definitely desired.

• It's still possible for handling of multiple requests to cause blockage for each other. For example, if our parseRequest or renderResponse functions perform any blocking I/O, or use a significant amount of CPU time, other requests will need to wait until that processing finishes before they can resume their processing.

For those interested, we had a previous blog post on Concurrency and Node which delved more deeply into these points.

Making non-blocking a runtime system concern

Let's deliver on our promise from the introduction: turning non-blocking I/O into a runtime system concern. The theory behind this is:

• Blocking I/O calls are conceptually easier to think about and work with

• While spawning lightweight threads/fibers does entail some overhead, it's lightweight enough to be generally acceptable

• If we can limit the amount of CPU time spent in running each fiber, we won't need to worry about high-CPU processing blocking other requests

• The runtime system can handle the scheduling of lightweight threads onto separate CPU cores (via separate system threads), which will result in the ability to saturate the entire CPU without techniques like prefork

That may sound like a lot to deliver on, but green threads are up to the challenge. They are very similar to the fibers that we described above, with one major difference: seemingly blocking I/O calls actually use non-blocking I/O under the surface.
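As a concrete aside (my own sketch, not from the post): in GHC, green threads are created with forkIO, and spawning a thousand of them costs very little. Each thread below atomically bumps a counter and signals completion through an MVar:

```haskell
import Control.Concurrent

-- Fork n green threads that each atomically bump a shared counter,
-- then wait for all of them and return the final count. forkIO threads
-- are scheduled by GHC's runtime, not the operating system.
countWithThreads :: Int -> IO Int
countWithThreads n = do
  counter <- newMVar 0
  dones <- mapM (const newEmptyMVar) [1 .. n]
  mapM_ (\done -> forkIO $ do
            modifyMVar_ counter (pure . (+ 1))
            putMVar done ()) dones
  mapM_ takeMVar dones   -- block until every green thread has finished
  readMVar counter

main :: IO ()
main = countWithThreads 1000 >>= print   -- prints 1000
```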
Let's see how this would work with our web server example from above (copied here for convenience):

function webServer(port) {
    var listenSocket = bindSocket(port);
    while(true) {
        var socket = accept(listenSocket);
        forkThread(handleConnection(socket));
    }
}

function handleConnection(socket) {
    while(true) {
        var bytes = read(socket);
        var request = parseRequest(bytes);
        var response = getResponse(request);
        write(socket, renderResponse(response));
    }
}

Starting from after the bindSocket call:

1. Our main green thread calls accept. The runtime system essentially rewrites this to an acceptAsync call like our callback version, puts the main green thread to sleep, and has the runtime system's event loop schedule a wakeup when new data is available on the listenSocket.

2. When a new connection is available, the runtime system wakes up the main green thread with the new socket value filled in. Our thread then forks a new green thread (let's call it worker 1) running the handleConnection call.

3. Inside worker 1, our read call is similarly rewritten to readAsync with a callback, and the runtime system puts worker 1 to sleep and schedules a wakeup when data is available on socket.

4. The runtime system then goes back to the list of green threads that have work to do and finds the main thread. The main thread continues from the forkThread call, and iterates on the while loop. It arrives back at the accept call and, like before, the runtime system puts the main thread to sleep and schedules a wakeup when there's a connection available on listenSocket.

5. Importantly, at this point, the entire application is able to simply wait for some data. We're waiting for the operating system to tell us that either listenSocket or socket have some data available. When the operating system tells us (via a system call like epoll or select) that data is available, the runtime system can wake the relevant thread up, and do some more work until the next I/O call that puts it to sleep.

6. Since the runtime system is handling the scheduling of threads, it is free to determine that a thread has been active for too long and pause execution in favor of a different thread, solving the long CPU processing problem mentioned above. (This is also the difference between cooperative and preemptive multithreading.)

7. Instead of needing a separate error handling mechanism for callbacks, the runtime system can reuse existing exception throwing mechanisms from the language. For example, if an error occurs during the read call, the runtime system can throw an exception from that point in the code.

I'd argue that this green thread approach - for the most part - gives us the best of both worlds: the high level, easy to read and write, robust code that comes from using threads, with the high performance of the non-blocking callback code.

Downsides of green threads

Like garbage collection, there are downsides to green threads as well. While not a comprehensive list, here are some such downsides I'm aware of:

• By passing control of scheduling to the runtime system, we lose control of when exactly a context switch will occur, which can result in performance degradation. (Note that this only applies in relation to the async approach; the other thread-based approaches also have the context switch issue.) In my experience, this is rarely a problem, though certain performance-sensitive code bases may be affected by this. In a distributed computation project at FP Complete, for example, we ultimately went the route of creating our own event loop.

• As mentioned, spawning a green thread is cheap, but not free. An event loop can bypass this overhead. Again, with most green thread systems, the overhead is small enough so as not to be prohibitive, but if the highest performance possible is your goal, green threads may ultimately be your bottleneck.
As you can see, like garbage collection, the main downside is that for specific performance cases, green threads may be an impediment. But also like garbage collection, there is a wide array of cases where the gains in code simplicity and lower bug rates more than pay for the slight performance overhead.

Experiment with green threads

Let's go ahead and get started with a short example right now. The only tools you'll need are the Stack build tool and a text editor. If you're on a Mac or Linux system, you can get Stack by running curl -sSL https://get.haskellstack.org/ | sh. On Windows, you probably want the 64-bit installer.

Once you have Stack installed, save the following code into a file called echo.hs:

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc --package conduit-extra
{-# LANGUAGE OverloadedStrings #-}
import Data.Conduit
import Data.Conduit.Network
import qualified Data.ByteString.Char8 as B8

main = do
    putStrLn "OK, I'm running!"
    -- This automatically binds a new listening socket and forks a new
    -- green thread for each incoming connection.
    runTCPServer settings app
  where
    -- Listen on all interfaces on port 4500
    settings = serverSettings 4500 "*"

    -- Create a simple pipeline connecting the input from the network
    -- (appSource) to our echo program to the output to the network
    -- (appSink).
    app appData = runConduit (appSource appData .| echo .| appSink appData)

    -- awaitForever keeps running the inner function as long as new data
    -- is available from the client
    echo = awaitForever (\bytes -> do
        -- Echo back the raw message we received
        yield bytes

        -- And now send back the Fibonacci value at the length of the
        -- input. We need to use B8.pack to convert our String into a
        -- binary format we can send over the network.
        yield (B8.pack (show (fib (B8.length bytes)) ++ "\n")))

    -- Written badly for high CPU usage!
    fib 0 = 1
    fib 1 = 1
    fib n = fib (n - 1) + fib (n - 2)

Now you can run this program with stack echo.hs.
The first run will take a bit of time as it downloads a compiler and a number of libraries. This will only happen on the first run; subsequent runs will reuse the previously downloaded and installed tools and libraries. Once you've got this running, you can connect to it to play with: $ telnet localhost 4500

Go ahead and play with it with some short lines, and confirm that it responds. For example, here's a sample session:

$ telnet localhost 4500
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Hello world!
Hello world!
610
Bye!
Bye!
13

Now to prove our claims of the system remaining responsive in the presence of high CPU: try entering a long string, which will require a lot of CPU time to calculate the Fibonacci value, e.g.:

This is a fairly long string which will take a bit of time unless you have a supercomputer.
This is a fairly long string which will take a bit of time unless you have a supercomputer.

As you might expect, further interactions on this connection will have no effect as it is computing its response. But go ahead and open up a new telnet session in a different terminal. You should be able to continue interacting with the echo server and get results, thanks to the scheduling behavior of the runtime system. Notice how we get this behavior without any explicit work on our part to break up the expensive CPU operation into smaller bits!

EDIT As pointed out on lobste.rs, the above will not expand to multiple cores, since GHC's interpreter mode will only use a single core. In order to see this take advantage of additional CPU cores for additional requests, first compile the executable with:

stack ghc --resolver lts-7.14 --install-ghc --package conduit-extra -- --make -threaded echo.hs

And then run it with:

./echo +RTS -N4

The -N4 tells the GHC runtime to use 4 cores.

Learn more

There are great libraries in Haskell to take advantage of green threads for easy and beautiful concurrency. My all time favorite is the async package. Paired with powerful abstractions like Software Transactional Memory, you can quickly whip up high-performance, concurrent network services while avoiding common pitfalls like deadlocks and race conditions.

If you or your team are interested in learning more about functional programming and Haskell, check out our training options and our Haskell syllabus.
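To give a taste of the style the async package enables, here is a hand-rolled toy version of its concurrently function (my own sketch, illustrative only; the real library also propagates exceptions and supports cancellation):

```haskell
import Control.Concurrent

-- A toy version of async's 'concurrently': run two IO actions on
-- separate green threads and wait for both results. Illustrative only;
-- the real async package also handles exceptions and cancellation.
concurrently' :: IO a -> IO b -> IO (a, b)
concurrently' ioa iob = do
  ma <- newEmptyMVar
  mb <- newEmptyMVar
  _ <- forkIO (ioa >>= putMVar ma)
  _ <- forkIO (iob >>= putMVar mb)
  (,) <$> takeMVar ma <*> takeMVar mb

main :: IO ()
main = concurrently' (pure (sum [1 .. 100 :: Int])) (pure "done") >>= print
```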
If you'd like to learn about how FP Complete can help you with your server application needs, contact us about our consulting. </html>

Joachim Breitner

TikZ aesthetics

Every year since 2012, I typeset the problems and solutions for the German math event Tag der Mathematik, which is organized by the Zentrum für Mathematik and reaches 1600 students from various parts of Germany. For that, I often reach for the LaTeX drawing package TikZ, and I really like the sober aesthetics of a nicely done TikZ drawing. So mostly for my own enjoyment, I collect the prettiest here. On a global scale they are still rather mundane, and for really impressive and educating examples, I recommend the TikZ Gallery.

January 05, 2017

Dominic Steinitz

UK / South Korea Trade: A Bayesian Analysis

Introduction

I was intrigued by a tweet by the UK Chancellor of the Exchequer stating "exports [to South Korea] have doubled over the last year. Now worth nearly £11bn" and a tweet by a Member of the UK Parliament stating South Korea is "our second fastest growing trading partner". Although I have never paid much attention to trade statistics, both these statements seemed surprising. But these days it's easy enough to verify such statements. It's also an opportunity to use the techniques I believe data scientists in (computer) game companies use to determine how much impact a new feature has on the game's consumers.

One has to be slightly careful with trade statistics as they come in many different forms, e.g., just goods or goods and services etc. When I provide software and analyses to US organisations, I am included in the services exports from the UK to the US. Let's analyse goods first before moving on to goods and services.

Getting the Data

First let's get hold of the quarterly data from the UK Office of National Statistics.
ukstats <- "https://www.ons.gov.uk"
bop <- "economy/nationalaccounts/balanceofpayments"
ds <- "datasets/tradeingoodsmretsallbopeu2013timeseriesspreadsheet/current/mret.csv"

mycsv <- read.csv(paste(ukstats,"file?uri=",bop,ds,sep="/"),stringsAsFactors=FALSE)

Now we can find the columns that refer to Korea.

ns <- which(grepl("Korea", names(mycsv)))
length(ns)

## [1] 3

names(mycsv[ns[1]])

## [1] "BoP.consistent..South.Korea..Exports..SA................................"

names(mycsv[ns[2]])

## [1] "BoP.consistent..South.Korea..Imports..SA................................"

names(mycsv[ns[3]])

## [1] "BoP.consistent..South.Korea..Balance..SA................................"

Now we can pull out the relevant information and create a data frame of it.

korean <- mycsv[grepl("Korea", names(mycsv))]
imports <- korean[grepl("Imports", names(korean))]
exports <- korean[grepl("Exports", names(korean))]
balance <- korean[grepl("Balance", names(korean))]

df <- data.frame(mycsv[grepl("Title", names(mycsv))], imports, exports, balance)
colnames(df) <- c("Title", "Imports", "Exports", "Balance")

startQ <- which(grepl("1998 Q1",df$Title))
endQ <- which(grepl("2016 Q3",df$Title))
dfQ <- df[startQ:endQ,]

We can now plot the data.

tab <- data.frame(kr=as.numeric(dfQ$Exports),
                  krLabs=as.numeric(as.Date(as.yearqtr(dfQ$Title,format='%Y Q%q'))))

ggplot(tab, aes(x=as.Date(tab$krLabs), y=tab$kr)) +
    geom_line() +
    theme(legend.position="bottom") +
    ggtitle("Goods Exports UK / South Korea (Quarterly)") +
    theme(plot.title = element_text(hjust = 0.5)) +
    xlab("Date") +
    ylab("Value (£m)")

For good measure let's plot the annual data.

startY <- grep("^1998$",df$Title)
endY <- grep("^2015$",df$Title)
dfYear <- df[startY:endY,]

tabY <- data.frame(kr=as.numeric(dfYear$Exports),
                   krLabs=as.numeric(dfYear$Title))

ggplot(tabY, aes(x=tabY$krLabs, y=tabY$kr)) +
    geom_line() +
    theme(legend.position="bottom") +
    ggtitle("Goods Exports UK / South Korea (Annual)") +
    theme(plot.title = element_text(hjust = 0.5)) +
    xlab("Date") +
    ylab("Value (£m)")

And the monthly data.

startM <- grep("1998 JAN",df$Title)
endM <- grep("2016 OCT",df$Title)
dfMonth <- df[startM:endM,]

tabM <- data.frame(kr=as.numeric(dfMonth$Exports),
                   krLabs=as.numeric(as.Date(as.yearmon(dfMonth$Title,format='%Y %B'))))

ggplot(tabM, aes(x=as.Date(tabM$krLabs), y=tabM$kr)) +
    geom_line() +
    theme(legend.position="bottom") +
    ggtitle("Goods Exports UK / South Korea (Monthly)") +
    theme(plot.title = element_text(hjust = 0.5)) +
    xlab("Date") +
    ylab("Value (£m)")

It looks like some change took place in 2011 but nothing to suggest either that "exports have doubled over the last year" or that South Korea is "our second fastest growing trading partner". That some sort of change did happen is further supported by the fact that a Free Trade Agreement between the EU and Korea was put in place in 2011.

But was there really a change? And what sort of change was it? Sometimes it's easy to imagine patterns where there are none. With this warning in mind let us see if we can get a better feel from the numbers as to what happened.

The Model

Let us assume that the data for exports are approximated by a linear function of time but that there is a change in the slope and the offset at some point during observation.

\begin{aligned}
\tau &\sim {\mathrm{Uniform}}(1, N) \\
\mu_1 &\sim \mathcal{N}(\mu_{\mu_1}, \sigma_{\mu_1}) \\
\gamma_1 &\sim \mathcal{N}(\mu_{\gamma_1}, \sigma_{\gamma_1}) \\
\sigma_1 &\sim \mathcal{N}(\mu_{\sigma_1}, \sigma_{\sigma_1}) \\
\mu_2 &\sim \mathcal{N}(\mu_{\mu_2}, \sigma_{\mu_2}) \\
\gamma_2 &\sim \mathcal{N}(\mu_{\gamma_2}, \sigma_{\gamma_2}) \\
\sigma_2 &\sim \mathcal{N}(\mu_{\sigma_2}, \sigma_{\sigma_2}) \\
y_i &\sim \begin{cases} \mathcal{N}(\mu_1 x_i + \gamma_1, \sigma_1) & \mbox{if } i < \tau \\
                        \mathcal{N}(\mu_2 x_i + \gamma_2, \sigma_2) & \mbox{if } i \geq \tau
          \end{cases}
\end{aligned}

Since we are going to use stan to infer the parameters for this model and stan cannot handle discrete parameters, we need to marginalize out this (discrete) parameter. I hope to do the same analysis with LibBi which seems more suited to time series analysis and which I believe will not require such a step.
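As an aside (my own illustration, not part of the original analysis): marginalizing τ out happens on the log scale via the numerically stable log-sum-exp operation used below; subtracting the maximum before exponentiating avoids overflow. A minimal Haskell sketch:

```haskell
-- log_sum_exp: compute log(sum(exp(xs))) without overflowing, by
-- factoring the maximum element out of the sum first.
logSumExp :: [Double] -> Double
logSumExp [] = error "logSumExp: empty list"
logSumExp xs = m + log (sum [exp (x - m) | x <- xs])
  where m = maximum xs

main :: IO ()
main = print (logSumExp (map log [1, 2, 3]))  -- log 6
```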
Setting $D = \{y_i\}_{i=1}^N$ we can calculate the likelihood

\begin{aligned}
p(D \,|\, \mu_1, \gamma_1, \sigma_1, \mu_2, \gamma_2, \sigma_2) &= \sum_{\tau=1}^N p(\tau, D \,|\, \mu_1, \gamma_1, \sigma_1, \mu_2, \gamma_2, \sigma_2) \\
&= \sum_{\tau=1}^N p(\tau) p(D \,|\, \tau, \mu_1, \gamma_1, \sigma_1, \mu_2, \gamma_2, \sigma_2) \\
&= \sum_{\tau=1}^N p(\tau) \prod_{i=1}^N p(y_i \,|\, \tau, \mu_1, \gamma_1, \sigma_1, \mu_2, \gamma_2, \sigma_2)
\end{aligned}

stan operates on the log scale and thus requires the log likelihood

$\log p(D \,|\, \mu_1, \gamma_1, \sigma_1, \mu_2, \gamma_2, \sigma_2) = \mathrm{log\_sum\_exp}_{\tau=1}^N \big( \log \mathcal{U}(\tau \, | \, 1, N) + \sum_{i=1}^N \log \mathcal{N}(y_i \, | \, \nu_i, \rho_i) \big)$

where

\begin{aligned}
\nu_i &= \begin{cases} \mu_1 x_i + \gamma_1 & \mbox{if } i < \tau \\ \mu_2 x_i + \gamma_2 & \mbox{if } i \geq \tau \end{cases} \\
\rho_i &= \begin{cases} \sigma_1 & \mbox{if } i < \tau \\ \sigma_2 & \mbox{if } i \geq \tau \end{cases}
\end{aligned}

and where the log sum of exponents function is defined by

$\mathrm{log\_sum\_exp}_{n=1}^N \, \alpha_n = \log \sum_{n=1}^N \exp(\alpha_n).$

The log sum of exponents function allows the model to be coded directly in Stan using the built-in function log_sum_exp, which provides both arithmetic stability and efficiency for mixture model calculations.

Stan

Here's the model in stan. Sadly I haven't found a good way of divvying up .stan files in a .Rmd file so that it still compiles.

data {
  int<lower=1> N;
  real x[N];
  real y[N];
}

parameters {
  real mu1;
  real mu2;
  real gamma1;
  real gamma2;
  real<lower=0> sigma1;
  real<lower=0> sigma2;
}

transformed parameters {
  vector[N] log_p;
  real mu;
  real sigma;
  log_p = rep_vector(-log(N), N);
  for (tau in 1:N)
    for (i in 1:N) {
      mu = i < tau ? (mu1 * x[i] + gamma1) : (mu2 * x[i] + gamma2);
      sigma = i < tau ?
      sigma1 : sigma2;
      log_p[tau] = log_p[tau] + normal_lpdf(y[i] | mu, sigma);
    }
}

model {
  mu1 ~ normal(0, 10);
  mu2 ~ normal(0, 10);
  gamma1 ~ normal(0, 10);
  gamma2 ~ normal(0, 10);
  sigma1 ~ normal(0, 10);
  sigma2 ~ normal(0, 10);
  target += log_sum_exp(log_p);
}

generated quantities {
  int<lower=1,upper=N> tau;
  tau = categorical_rng(softmax(log_p));
}

The above, although mimicking our mathematical model, has quadratic complexity and we can use the trick in the stan manual to make it linear albeit with less clarity.

data {
  int<lower=1> N;
  real x[N];
  real y[N];
}

parameters {
  real mu1;
  real mu2;
  real gamma1;
  real gamma2;
  real<lower=0> sigma1;
  real<lower=0> sigma2;
}

transformed parameters {
  vector[N] log_p;
  {
    vector[N+1] log_p_e;
    vector[N+1] log_p_l;
    log_p_e[1] = 0;
    log_p_l[1] = 0;
    for (i in 1:N) {
      log_p_e[i + 1] = log_p_e[i] + normal_lpdf(y[i] | mu1 * x[i] + gamma1, sigma1);
      log_p_l[i + 1] = log_p_l[i] + normal_lpdf(y[i] | mu2 * x[i] + gamma2, sigma2);
    }
    log_p = rep_vector(-log(N) + log_p_l[N + 1], N) + head(log_p_e, N) - head(log_p_l, N);
  }
}

model {
  mu1 ~ normal(0, 10);
  mu2 ~ normal(0, 10);
  gamma1 ~ normal(0, 10);
  gamma2 ~ normal(0, 10);
  sigma1 ~ normal(0, 10);
  sigma2 ~ normal(0, 10);
  target += log_sum_exp(log_p);
}

generated quantities {
  int<lower=1,upper=N> tau;
  tau = categorical_rng(softmax(log_p));
}

Let's run this model with the monthly data.

NM <- nrow(tabM)
KM <- ncol(tabM)

yM <- tabM$kr
XM <- data.frame(tabM,rep(1,NM))[,2:3]

fitM <- stan(
file = "lr-changepoint-ng.stan",
  data = list(x = XM$krLabs, y = yM, N = length(yM)),
  chains = 4,
  warmup = 1000,
  iter = 10000,
  cores = 4,
  refresh = 500,
  seed=42
)

## Warning: There were 662 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup

## Warning: Examine the pairs() plot to diagnose sampling problems

Looking at the results below we see a multi-modal distribution so a mean is not of much use.

histData <- hist(extract(fitM)$tau,plot=FALSE,breaks=c(seq(1,length(yM),1)))
histData$counts

## [1] 18000 0 0 0 0 0 0 0 0 0 0
## [12] 0 0 0 0 0 0 0 0 0 0 0
## [23] 0 0 0 0 0 0 0 0 0 0 0
## [34] 0 0 0 0 0 0 0 0 0 0 0
## [45] 0 0 0 0 0 0 0 0 0 0 0
## [56] 0 0 0 0 0 0 0 0 0 0 0
## [67] 0 0 0 0 0 0 0 0 0 0 0
## [78] 0 0 0 0 0 0 0 0 0 0 0
## [89] 0 0 0 0 0 0 0 0 0 0 0
## [100] 0 0 0 0 0 0 0 0 0 0 0
## [111] 0 0 0 0 0 0 0 1 4 12 16
## [122] 16 107 712 8132 0 0 0 0 0 0 0
## [133] 0 0 0 0 0 0 0 0 0 0 0
## [144] 0 0 0 0 0 0 0 0 0 0 0
## [155] 0 0 0 0 0 0 0 0 0 0 25
## [166] 171 2812 0 0 0 0 0 0 0 0 0
## [177] 0 0 0 0 0 0 0 0 0 0 0
## [188] 0 0 0 0 0 0 0 0 0 0 0
## [199] 0 0 0 0 0 0 0 0 0 0 0
## [210] 0 0 0 0 0 0 0 0 0 0 0
## [221] 0 0 0 0 5992

We can get a pictorial representation of the maxima so that the multi-modality is even clearer.

min_indexes = which(diff( sign(diff( c(0,histData$counts,0)))) == 2)
max_indexes = which(diff( sign(diff( c(0,histData$counts,0)))) == -2)
modeData = data.frame(x=1:length(histData$counts),y=histData$counts)
min_locs = modeData[min_indexes,]
max_locs = modeData[max_indexes,]
plot(modeData$y, type="l")
points( min_locs, col="red", pch=19, cex=1  )
points( max_locs, col="green", pch=19, cex=1  )
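For illustration (my own sketch, not part of the original analysis), the diff(sign(diff(c(0, counts, 0)))) idiom above can be expressed as a strict local-maximum finder; the Haskell version below handles strict peaks only, not the plateau cases the sign trick also distinguishes:

```haskell
-- Indices (0-based) of strict local maxima, after padding both ends
-- with zeros the way the R code pads with c(0, counts, 0).
localMaxima :: [Int] -> [Int]
localMaxima ys =
  [ i | (i, (a, b, c)) <- zip [0 ..] (zip3 padded (drop 1 padded) (drop 2 padded))
      , b > a && b > c ]
  where padded = 0 : ys ++ [0]

main :: IO ()
main = print (localMaxima [0, 5, 0, 0, 3, 0])  -- [1,4]
```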

My interpretation is that the evidence (data) says there is probably no changepoint (a change at the beginning or end is no change) but there might be a change at intermediate data points.

We can see something strange (maybe a large single export?) happened at index 125 which translates to 2008MAY.

The mode at index 167 which translates to 2011NOV corresponds roughly to the EU / Korea trade agreement.

Let us assume that there really was a material difference in trade at this latter point. We can fit a linear regression before this point and one after this point.

Here’s the stan

data {
int<lower=1> N;
int<lower=1> K;
matrix[N,K]  X;
vector[N]    y;
}
parameters {
vector[K] beta;
real<lower=0> sigma;
}
model {
y ~ normal(X * beta, sigma);
}

And here’s the R to fit the before and after data. We fit the model, pull out the parameters for the regression and pull out the covariates

N <- length(yM)
M <- max_locs$x[3]

fite <- stan(file = 'LR.stan',
             data = list(N = M, K = ncol(XM), y = yM[1:M], X = XM[1:M,]),
             pars=c("beta", "sigma"),
             chains=3,
             cores=3,
             iter=3000,
             warmup=1000,
             refresh=-1)

se <- extract(fite, pars = c("beta", "sigma"), permuted=TRUE)
estCovParamsE <- colMeans(se$beta)

fitl <- stan(file = 'LR.stan',
data = list(N = N-M, K = ncol(XM), y = yM[(M+1):N], X = XM[(M+1):N,]),
pars=c("beta", "sigma"),
chains=3,
cores=3,
iter=3000,
warmup=1000,
refresh=-1)

sl <- extract(fitl, pars = c("beta", "sigma"), permuted=TRUE)
estCovParamsL <- colMeans(sl$beta)

Make predictions

linRegPredsE <- data.matrix(XM) %*% estCovParamsE
linRegPredsL <- data.matrix(XM) %*% estCovParamsL

ggplot(tabM, aes(x=as.Date(tabM$krLabs), y=tabM$kr)) +
    geom_line(aes(x = as.Date(tabM$krLabs), y = tabM$kr, col = "Actual")) +
    geom_line(data=tabM[1:M,], aes(x = as.Date(tabM$krLabs[1:M]), y = linRegPredsE[(1:M),1], col = "Fit (Before FTA)")) +
    geom_line(data=tabM[(M+1):N,], aes(x = as.Date(tabM$krLabs[(M+1):N]), y = linRegPredsL[((M+1):N),1], col = "Fit (After FTA)")) +
    theme(legend.position="bottom") +
    ggtitle("Goods Exports UK / South Korea (Monthly)") +
    theme(plot.title = element_text(hjust = 0.5)) +
    xlab("Date") +
    ylab("Value (£m)")

An Intermediate Conclusion and Goods and Services (Pink Book)

So we didn't manage to substantiate either the Chancellor's claim or the Member of Parliament's claim. But it may be that if we look at Goods and Services we will be able to see the numbers resulting in the claims.

pb <- "datasets/pinkbook/current/pb.csv"
pbcsv <- read.csv(paste(ukstats,"file?uri=",bop,pb,sep="/"),stringsAsFactors=FALSE)

This has a lot more information albeit only annually.

pbns <- grep("Korea", names(pbcsv))
length(pbns)

## [1] 21

lapply(pbns,function(x) names(pbcsv[x]))

## [[1]]
## [1] "BoP..Current.Account..Goods...Services..Imports..South.Korea............"
##
## [[2]]
## [1] "BoP..Current.Account..Current.Transfer..Balance..South.Korea............"
##
## [[3]]
## [1] "BoP..Current.Account..Goods...Services..Balance..South.Korea............"
##
## [[4]]
## [1] "IIP..Assets..Total.South.Korea.........................................."
##
## [[5]]
## [1] "Trade.in.Services.replaces.1.A.B....Exports.Credits...South.Korea...nsa."
##
## [[6]]
## [1] "IIP...Liabilities...Total...South.Korea................................."
##
## [[7]]
## [1] "BoP..Total.income..Balance..South.Korea................................."
##
## [[8]]
## [1] "BoP..Total.income..Debits..South.Korea.................................."
##
## [[9]]
## [1] "BoP..Total.income..Credits..South.Korea................................."
##
## [[10]]
## [1] "BoP..Current.account..Balance..South.Korea.............................."
##
## [[11]]
## [1] "BoP..Current.account..Debits..South.Korea..............................."
##
## [[12]]
## [1] "BoP..Current.account..Credits..South.Korea.............................."
##
## [[13]]
## [1] "IIP...Net...Total....South.Korea........................................"
##
## [[14]]
## [1] "Trade.in.Services.replaces.1.A.B....Imports.Debits...South.Korea...nsa.."
##
## [[15]]
## [1] "BoP..Current.Account..Services..Total.Balance..South.Korea.............."
##
## [[16]]
## [1] "Bop.consistent..Balance..NSA..South.Korea..............................."
##
## [[17]]
## [1] "Bop.consistent..Im..NSA..South.Korea...................................."
##
## [[18]]
## [1] "Bop.consistent..Ex..NSA..South.Korea...................................."
##
## [[19]]
## [1] "Current.Transfers...Exports.Credits...South.Korea...nsa................."
##
## [[20]]
## [1] "Current.Transfers...Imports.Debits...South.Korea...nsa.................."
##
## [[21]]
## [1] "BoP..Current.Account..Goods...Services..Exports..South.Korea............"

Let's just look at exports.

koreanpb <- pbcsv[grepl("Korea", names(pbcsv))]
exportspb <- koreanpb[grepl("Exports", names(koreanpb))]
names(exportspb)

## [1] "Trade.in.Services.replaces.1.A.B....Exports.Credits...South.Korea...nsa."
## [2] "Current.Transfers...Exports.Credits...South.Korea...nsa................."
## [3] "BoP..Current.Account..Goods...Services..Exports..South.Korea............"

The last column gives exports of Goods and Services so let's draw a chart of it.

pb <- data.frame(pbcsv[grepl("Title", names(pbcsv))], exportspb[3])
colnames(pb) <- c("Title", "Exports")

startpbY <- which(grepl("1999",pb$Title))
endpbY <- which(grepl("2015",pb$Title))
pbY <- pb[startpbY:endpbY,]
tabpbY <- data.frame(kr=as.numeric(pbY$Exports),
                     krLabs=as.numeric(pbY$Title))

ggplot(tabpbY, aes(x=tabpbY$krLabs, y=tabpbY$kr)) +
    geom_line() +
    theme(legend.position="bottom") +
    ggtitle("Goods and Services Exports UK / South Korea (Annual)") +
    theme(plot.title = element_text(hjust = 0.5)) +
    xlab("Date") + ylab("Value (£m)")

No joy here either for any of the claims. Still, it’s been an interesting exercise.

Michael Snoyman

Conflicting Module Names

It's the oldest open issue on the Stackage repo, and a topic I've discussed more times than I can remember over the years. Hackage enforces that package names are unique (so that no one else can claim the name conduit, for instance), but does nothing to ensure unique module names (so someone else could write a package named my-conduit with a module named Data.Conduit).

For the record, I think Hackage's position here is not only a good one, but the only logical one it could have made. I'm not even hinting at wanting to change that. Please don't read this blog post in that way at all.

Usually, conflicting module names do not negatively affect us, at least when working on project code with a proper .cabal file. In my made-up example above, I would explicitly state that I depend on conduit and not list my-conduit, and when my code imports Data.Conduit, Stack+Cabal+GHC can all work together to ensure that the correct module is used.

EDIT Since I've already written some of the code for stackage-curator to detect this, I generated a list of all conflicting module names to give an idea of what we're looking at.

The problem

(If you're already convinced that conflicting module names are a problem, you may want to skip straight to "the solution." This section is fairly long and detailed.)

Unfortunately, there are still some downsides to having the same module name appear in different packages:

1. Documentation

Suppose I'm reading a tutorial that includes the line import Control.Monad.Reader.
I look at the Stackage doc list by module and discover: If I'm not familiar with the Haskell ecosystem, I'm unlikely to know that mtl is far more popular than monads-tf, and may well choose the latter.

2. runghc/ghci

We're not always working on project code. Sometimes we're just writing a script. Sometimes we're playing with an idea in GHCi. What if I import System.FilePath.Glob in a GHCi prompt when I have both the filemanip and Glob packages installed?

3. doctests

Similar to the previous point: even when you run doctests from inside the context of a project, they don't typically know which packages can be used, and conflicting module names can cause the tests to fail. What's especially bad about this is that an unrelated action (like running stack build async-dejafu) can suddenly make your tests start to fail when they previously succeeded.

4. Custom Setup.hs

Suppose you're writing a cabal package that uses a custom Setup.hs file and imports some additional modules. To pick a concrete example that just happened: the executable-hash package has a Setup.hs file which, indirectly, imports Crypto.Hash.SHA1. And there's an explicit dependency on cryptohash in the .cabal file, which one may naively infer means we're safe. However, when uuid-1.3.13 moved from cryptonite to a few other packages (including cryptohash-sha1), building executable-hash when uuid was already installed became a build error. And like the previous point, this is essentially a non-deterministic race condition.

Since I was a backup maintainer for executable-hash, I implemented two fixes: adding an explicit PackageImport and using the new custom-setup feature in Cabal-1.24. While custom-setup is definitely the way to go with this, and it's a great addition to Cabal, not everyone is using the newest version of Cabal, Stack is only just now adding support for this, and not all packages will update to support this immediately.

5.
Better tooling

It would be great if tooling could automatically determine which packages to install based on the imports list, to avoid the need for a lot of manual and redundant statements of dependencies. We're considering doing this in the upcoming stack script command. But how will Stack know which Control.Monad.Reader to use?

The solution

While we know that we can't have fully unique module names without a lot of buy-in from package authors, we can get pretty close, with canonical locations for a module.

We've already implemented this to some extent in Stackage to resolve problem (3) listed above. We now have the ability to list some packages as hidden in a Stackage snapshot. This means that, after installing the package, the Stackage build system will hide the package, so that its modules won't be available for import. By adding async-dejafu to the hidden list, the warp doctest suite no longer has the ambiguity issue when running.

After dealing with the cryptohash-sha1 fallout earlier this week, I realized that this solution can generalize to solve a large swath of the problems described above. Here's how I see it working:

• We introduce a new constraint in the Stackage build process: every module name must be present in only one exposed (that is, non-hidden) package.
• When stack build registers a package, it automatically hides it if the snapshot lists it as hidden.
• On the stackage.org module list, modules from a hidden package are explicitly marked as hidden (or, if we want to be more extreme, we just hide them entirely).
• With the upcoming stack script command, when finding a package for a given imported module, we only pay attention to non-hidden modules.

This doesn't fully solve the problems above. For example, if a user just Googles Control.Monad.Reader, they'll still possibly get confusing documentation. But I think this is a huge step in the right direction.
January 03, 2017

Brent Yorgey

MonadRandom 0.5 released

I’m happy to announce the release of MonadRandom-0.5, a package which provides a convenient monadic interface for random number generation in the style of transformers and mtl: a Rand monad (essentially a state monad that threads through a generator), a monad transformer variant called RandT, and a MonadRandom class allowing the use of random generation operations in monad stacks containing RandT.

This release has quite a few small additions as well as a big module reorganization. However, thanks to module re-exports, most existing code using the library should continue to work with no changes; the major version bump reflects the large reorganization and my resultant inability to 100% guarantee that existing user code will not break. If your code does break, please let me know—I would be happy to help you fix it, or simply to know about it so I can help other users.

Here are a few of the biggest changes that may be of interest to users of the library:

• A new MonadInterleave class (see #20), which is a big improvement over MonadSplit. It provides a method interleave :: m a -> m a, which works by splitting the generator, running its argument using one half of the generator, and using the other half as the final state of the resulting action (replacing whatever the final generator state otherwise would have been). This can be used, for example, to allow random computations to run in parallel, or to create lazy infinite structures of random values. In the example below, the infinite tree randTree cannot be evaluated lazily: even though it is cut off at two levels deep by hew 2, the random value in the right subtree still depends on generation of all the random values in the (infinite) left subtree, even though they are ultimately unneeded.
Inserting a call to interleave, as in randTreeI, solves the problem: the generator splits at each Node, so random values in the left and right subtrees are generated independently.

data Tree = Leaf | Node Int Tree Tree
  deriving Show

hew :: Int -> Tree -> Tree
hew 0 _ = Leaf
hew _ Leaf = Leaf
hew n (Node x l r) = Node x (hew (n-1) l) (hew (n-1) r)

randTree :: Rand StdGen Tree
randTree = Node <$> getRandom <*> randTree <*> randTree

randTreeI :: Rand StdGen Tree
randTreeI = interleave $ Node <$> getRandom <*> randTreeI <*> randTreeI

>>> hew 2 <$> evalRandIO randTree
Node 2168685089479838995 (Node (-1040559818952481847) Leaf Leaf) (Node ^CInterrupted.
>>> hew 2 <$> evalRandIO randTreeI
Node 8243316398511136358 (Node 4139784028141790719 Leaf Leaf) (Node 4473998613878251948 Leaf Leaf)
• A new PrimMonad instance for RandT (thanks to Koz Ross), allowing it to be used in conjunction with e.g. mutable vectors.

• New and improved random selection functions:
• fromList now raises an error when the total weight of elements is zero.
• The type of uniform is generalized to work over any Foldable.
• New operations weighted, weightedMay, fromListMay, and uniformMay have been added. weighted is like fromList but generalized to work over any Foldable. The May variants, of course, return a Maybe result instead of raising an error.
• New lazy vs strict variants of the Rand monad. If you import Control.Monad.Random or Control.Monad.Trans.Random you get the Lazy variant re-exported by default, but you can explicitly import .Lazy or .Strict if you want. They provide the exact same API, but Lazy is implemented with a lazy state monad and Strict with a strict one. To be honest it’s not clear what difference this might make, but since the distinction is already there with the underlying state monad for free, why not provide it?

Although there was some discussion of generalizing MonadRandom to work for a wider range of underlying generators (see the comments on my previous blog post and the discussion on issue #26), I decided to punt on that for now. It seems rather complicated, and there are already good alternatives like the very nice random-fu package, so I decided to keep things simple for this release. I’m still open to proposals for generalization in future releases.

For a full rundown of changes in 0.5, see the change log. Comments, questions, and bug reports are always welcome either as a comment on this blog post or on the GitHub issue tracker.

Michael Snoyman

This content originally appeared on School of Haskell. Thanks to Julie Moronuki for encouraging me to update/republish, and for all of the edits/improvements.

NOTE Code snippets below can be run using the Stack build tool, by saving to a file Main.hs and running with stack Main.hs. More information is available in the How to Script with Stack tutorial.

Let's start off with a very simple problem. We want to let a user input their birth year, and then tell them their age in the year 2020. Using the function read, this is really simple:

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
main = do
    putStrLn "Please enter your birth year"
    year <- getLine
    putStrLn $ "In 2020, you will be: " ++ show (2020 - read year)

If you run that program and type in a valid year, you'll get the right result. However, what happens when you enter something invalid?

Please enter your birth year
hello
main.hs: Prelude.read: no parse

The problem is that the user input is coming in as a String, and read is trying to parse it into an Integer. But not all Strings are valid Integers. read is what we call a partial function, meaning that under some circumstances it will return an error instead of a valid result.

A more resilient way to write our code is to use the readMaybe function, which will return a Maybe Integer value. This makes it clear from the types themselves that the parse may succeed or fail. To test this out, try running the following code:

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
import Text.Read (readMaybe)

main = do
    -- We use explicit types to tell the compiler how to try and parse the
    -- string.
    print (readMaybe "1980" :: Maybe Integer)
    print (readMaybe "hello" :: Maybe Integer)
    print (readMaybe "2000" :: Maybe Integer)
    print (readMaybe "two-thousand" :: Maybe Integer)

So how can we use this to solve our original problem? We now need to determine whether the result of readMaybe was successful (a Just) or failed (a Nothing). One way to do this is with pattern matching:

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
import Text.Read (readMaybe)

main = do
    putStrLn "Please enter your birth year"
    yearString <- getLine
    case readMaybe yearString of
        Nothing -> putStrLn "You provided an invalid year"
        Just year -> putStrLn $ "In 2020, you will be: " ++ show (2020 - year)

Decoupling code

This code is a bit coupled; let's split it up to have a separate function for displaying the output to the user and another separate function for calculating the age.

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
import Text.Read (readMaybe)
displayAge maybeAge =
    case maybeAge of
        Nothing -> putStrLn "You provided an invalid year"
        Just age -> putStrLn $ "In 2020, you will be: " ++ show age

yearToAge year = 2020 - year

main = do
    putStrLn "Please enter your birth year"
    yearString <- getLine
    let maybeAge = case readMaybe yearString of
            Nothing -> Nothing
            Just year -> Just (yearToAge year)
    displayAge maybeAge

This code does exactly the same thing as our previous version. But the definition of maybeAge in main looks pretty repetitive to me. We check if the parsed year is Nothing. If it's Nothing, we return Nothing. If it's Just, we return Just, after applying the function yearToAge. That seems like a lot of line noise to do something simple. All we want is to conditionally apply yearToAge.

Functors

Fortunately, we have a helper function to do just that. fmap, or functor mapping, will apply some function over the value contained by a functor. Maybe is one example of a functor; another common one is a list. In the case of Maybe, fmap does precisely what we described above. So we can replace our code with:

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
import Text.Read (readMaybe)

displayAge maybeAge =
    case maybeAge of
        Nothing -> putStrLn "You provided an invalid year"
        Just age -> putStrLn $ "In 2020, you will be: " ++ show age

yearToAge year = 2020 - year

main = do
    putStrLn "Please enter your birth year"
    yearString <- getLine
    let maybeAge = fmap yearToAge (readMaybe yearString)
    displayAge maybeAge

Our code definitely got shorter, and hopefully a bit clearer as well. Now it's obvious that all we're doing is applying the yearToAge function over the contents of the Maybe value.

So what is a functor? It's some kind of container of values. In Maybe, our container holds zero or one values. With lists, we have a container for zero or more values. Some containers are even more exotic; the IO functor is actually providing an action to perform in order to retrieve a value. The only thing functors share is that they provide some fmap function which lets you modify their contents.
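To make that concrete, here's a small script in the same style as the others (my own illustration, not code from the original article) showing the very same fmap at work on three different functors: Maybe, lists, and IO.

```haskell
#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
-- One fmap, three functors.
import Data.Char (toUpper)

main = do
    print (fmap (+ 1) (Just 41))              -- Maybe: applies inside the Just
    print (fmap (+ 1) (Nothing :: Maybe Int)) -- Maybe: no value, nothing happens
    print (fmap (+ 1) [1, 2, 3])              -- list: applies to every element
    -- IO: fmap transforms the result of running an action
    greeting <- fmap (map toUpper) (return "hello")
    putStrLn greeting
```

In each case the "container" is untouched; only the contents are transformed.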

do-notation

We have another option as well: we can use do-notation. This is the same way we've been writing main so far. That's because, as we mentioned in the previous paragraph, IO is a functor as well. Let's see how we can change our code to not use fmap:

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
import Text.Read (readMaybe)
displayAge maybeAge =
    case maybeAge of
        Nothing -> putStrLn "You provided an invalid year"
        Just age -> putStrLn $ "In 2020, you will be: " ++ show age

yearToAge year = 2020 - year

main = do
    putStrLn "Please enter your birth year"
    yearString <- getLine
    let maybeAge = do
            yearInteger <- readMaybe yearString
            return $ yearToAge yearInteger
    displayAge maybeAge
displayAge maybeAge

Inside the do-block, we have the slurp operator <-. This operator is special for do-notation and is used to pull a value out of its wrapper (in this case, Maybe). Once we've extracted the value, we can manipulate it with normal functions, like yearToAge. When we complete our do-block, we have to return a value wrapped up in that container again. That's what the return function does.
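For the curious: do-notation is shorthand for the >>= ("bind") operator, and writing the Maybe block out explicitly shows where the unwrapping and rewrapping happen. This is my own sketch of the desugaring, not code from the original article.

```haskell
#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
import Text.Read (readMaybe)

yearToAge year = 2020 - year

-- The do-block
--     do yearInteger <- readMaybe yearString
--        return $ yearToAge yearInteger
-- desugars to an explicit >>= call:
maybeAge :: String -> Maybe Integer
maybeAge yearString =
    readMaybe yearString >>= \yearInteger ->
    return (yearToAge yearInteger)

main = do
    print (maybeAge "1980")  -- Just 40
    print (maybeAge "hello") -- Nothing
```

Each `x <- action` becomes `action >>= \x -> ...`, which is why the slurp operator needs Monad, not just Functor.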

do-notation isn't available for all Functors; it's a special feature reserved only for Monads. Monads are an extension of Functors that provide a little extra power. We're not really taking advantage of any of that extra power here; we'll need to make our program more complicated to demonstrate it.

Dealing with two variables

It's kind of limiting that we have a hard-coded year to compare against. Let's fix that by allowing the user to specify the "future year." We'll start off with a simple implementation using pattern matching and then move back to do-notation.

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
import Text.Read (readMaybe)
displayAge maybeAge =
    case maybeAge of
        Nothing -> putStrLn "You provided invalid input"
        Just age -> putStrLn $ "In that year, you will be: " ++ show age

main = do
    putStrLn "Please enter your birth year"
    birthYearString <- getLine
    putStrLn "Please enter some year in the future"
    futureYearString <- getLine
    let maybeAge = case readMaybe birthYearString of
            Nothing -> Nothing
            Just birthYear ->
                case readMaybe futureYearString of
                    Nothing -> Nothing
                    Just futureYear -> Just (futureYear - birthYear)
    displayAge maybeAge

OK, it gets the job done... but it's very tedious. Fortunately, do-notation makes this kind of code really simple:

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
import Text.Read (readMaybe)

displayAge maybeAge =
    case maybeAge of
        Nothing -> putStrLn "You provided invalid input"
        Just age -> putStrLn $ "In that year, you will be: " ++ show age

yearDiff futureYear birthYear = futureYear - birthYear

main = do
    putStrLn "Please enter your birth year"
    birthYearString <- getLine
    putStrLn "Please enter some year in the future"
    futureYearString <- getLine
    let maybeAge = do
            futureYear <- readMaybe futureYearString
            birthYear <- readMaybe birthYearString
            return $ yearDiff futureYear birthYear
    displayAge maybeAge

This is very convenient: we've now slurped our two values in our do-notation. If either parse returns Nothing, then the entire do-block will return Nothing. This demonstrates an important property of Maybe: it provides short circuiting.

Without resorting to other helper functions or pattern matching, there's no way to write this code using just fmap. So we've found an example of code that requires more power than Functors provide, and Monads provide that power.

Partial application

But maybe there's something else that provides enough power to write our two-variable code without the full power of Monad. To see what this might be, let's look more carefully at our types. We're working with two values: readMaybe birthYearString and readMaybe futureYearString. Both of these values have the type Maybe Integer. And we want to apply the function yearDiff, which has the type Integer -> Integer -> Integer.

If we go back to trying to use fmap, we'll seemingly run into a bit of a problem. The type of fmap, specialized for Maybe and Integer, is (Integer -> a) -> Maybe Integer -> Maybe a. In other words, it takes a function that takes a single argument (an Integer) and returns a value of some type a, takes a second argument of type Maybe Integer, and gives back a value of type Maybe a. But our function, yearDiff, actually takes two arguments, not one. So fmap can't be used at all, right?

Not true. This is where one of Haskell's very powerful features comes into play. Any time we have a function of two arguments, we can also look at it as a function of one argument which returns a function. We can make this clearer with parentheses:

yearDiff :: Integer -> Integer -> Integer
yearDiff :: Integer -> (Integer -> Integer)

So how does that help us?
We can look at the fmap function as:

fmap :: (Integer -> (Integer -> Integer)) -> Maybe Integer -> Maybe (Integer -> Integer)

Then when we apply fmap to yearDiff, we end up with:

fmap yearDiff :: Maybe Integer -> Maybe (Integer -> Integer)

That's pretty cool. We can apply this to our readMaybe futureYearString and end up with:

fmap yearDiff (readMaybe futureYearString) :: Maybe (Integer -> Integer)

That's certainly very interesting, but it doesn't help us. We need to somehow apply this value of type Maybe (Integer -> Integer) to our readMaybe birthYearString of type Maybe Integer. We can do this with do-notation:

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
import Text.Read (readMaybe)

displayAge maybeAge =
    case maybeAge of
        Nothing -> putStrLn "You provided invalid input"
        Just age -> putStrLn $ "In that year, you will be: " ++ show age

yearDiff futureYear birthYear = futureYear - birthYear

main = do
    putStrLn "Please enter your birth year"
    birthYearString <- getLine
    putStrLn "Please enter some year in the future"
    futureYearString <- getLine
    let maybeAge = do
            yearToAge <- fmap yearDiff (readMaybe futureYearString)
            birthYear <- readMaybe birthYearString
            return $ yearToAge birthYear
    displayAge maybeAge

We can even use fmap twice and avoid the second slurp:

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
import Text.Read (readMaybe)

displayAge maybeAge =
    case maybeAge of
        Nothing -> putStrLn "You provided invalid input"
        Just age -> putStrLn $ "In that year, you will be: " ++ show age

yearDiff futureYear birthYear = futureYear - birthYear

main = do
    putStrLn "Please enter your birth year"
    birthYearString <- getLine
    putStrLn "Please enter some year in the future"
    futureYearString <- getLine
    let maybeAge = do
            yearToAge <- fmap yearDiff (readMaybe futureYearString)
            fmap yearToAge (readMaybe birthYearString)
    displayAge maybeAge

But we don't have a way to apply our Maybe (Integer -> Integer) function to our Maybe Integer directly.
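One way to see exactly what's missing is to write the needed helper by hand. This sketch is mine, not from the original article; the standard library provides precisely this operation as the <*> operator.

```haskell
#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
import Text.Read (readMaybe)

-- Apply a wrapped function to a wrapped value: succeeds only if both exist.
apMaybe :: Maybe (a -> b) -> Maybe a -> Maybe b
apMaybe (Just f) (Just x) = Just (f x)
apMaybe _        _        = Nothing

yearDiff futureYear birthYear = futureYear - birthYear

main = do
    print (fmap yearDiff (readMaybe "2020") `apMaybe` (readMaybe "1980" :: Maybe Integer))
    print (fmap yearDiff (readMaybe "2020") `apMaybe` (readMaybe "oops" :: Maybe Integer))
```

If either side is Nothing, the whole application is Nothing, which is exactly the short-circuiting we had with the do-block.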

Applicative functors

And now we get to our final concept: applicative functors. The idea is simple: we want to be able to apply a function which is inside a functor to a value inside a functor. The magic operator for this is <*>. Let's see how it works in our example:

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
import Text.Read (readMaybe)
displayAge maybeAge =
    case maybeAge of
        Nothing -> putStrLn "You provided invalid input"
        Just age -> putStrLn $ "In that year, you will be: " ++ show age

yearDiff futureYear birthYear = futureYear - birthYear

main = do
    putStrLn "Please enter your birth year"
    birthYearString <- getLine
    putStrLn "Please enter some year in the future"
    futureYearString <- getLine
    let maybeAge = fmap yearDiff (readMaybe futureYearString)
            <*> readMaybe birthYearString
    displayAge maybeAge

In fact, the combination of fmap and <*> is so common that we have a special operator, <$>, which is a synonym for fmap. That means we can make our code just a little prettier:

    let maybeAge = yearDiff
            <$> readMaybe futureYearString
            <*> readMaybe birthYearString

Notice the distinction between <$> and <*>. The former uses a function which is not wrapped in a functor, while the latter applies a function which is wrapped up.

So if we can do such great stuff with functors and applicative functors, why do we need monads at all? The terse answer is context sensitivity: with a monad, you can make decisions on which processing path to follow based on previous results. With applicative functors, you have to always apply the same functions.

Let's give a contrived example: if the future year is less than the birth year, we'll assume that the user just got confused and entered the values in reverse, so we'll automatically fix it by reversing the arguments to yearDiff. With do-notation and an if statement, it's easy:

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
import Text.Read (readMaybe)
displayAge maybeAge =
    case maybeAge of
        Nothing -> putStrLn "You provided invalid input"
        Just age -> putStrLn $ "In that year, you will be: " ++ show age

yearDiff futureYear birthYear = futureYear - birthYear

main = do
    putStrLn "Please enter your birth year"
    birthYearString <- getLine
    putStrLn "Please enter some year in the future"
    futureYearString <- getLine
    let maybeAge = do
            futureYear <- readMaybe futureYearString
            birthYear <- readMaybe birthYearString
            return $
                if futureYear < birthYear
                    then yearDiff birthYear futureYear
                    else yearDiff futureYear birthYear
    displayAge maybeAge

Exercises

1. Implement fmap using <*> and return.

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
import Control.Applicative ((<*>), Applicative)
import qualified Prelude

fmap :: (Applicative m, Monad m) => (a -> b) -> (m a -> m b)
fmap ... ... = FIXME

main =
    case fmap (Prelude.+ 1) (Prelude.Just 2) of
        Prelude.Just 3 -> Prelude.putStrLn "Good job!"
        _ -> Prelude.putStrLn "Try again"

Show Solution

myFmap function wrappedValue = return function <*> wrappedValue

main = print $ myFmap (+ 1) $ Just 5
2. How is return implemented for the Maybe monad? Try replacing return with its implementation in the code above.

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
returnMaybe = FIXME

main
    | returnMaybe "Hello" == Just "Hello" = putStrLn "Correct!"
    | otherwise = putStrLn "Incorrect, please try again"

Show Solution

return is simply the Just constructor. This gets defined as:

instance Monad Maybe where
    return = Just
3. yearDiff is really just subtraction. Try to replace the calls to yearDiff with explicit usage of the - operator.

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
import Text.Read (readMaybe)
displayAge maybeAge =
    case maybeAge of
        Nothing -> putStrLn "You provided invalid input"
        Just age -> putStrLn $ "In that year, you will be: " ++ show age

yearDiff futureYear birthYear = futureYear - birthYear

main = do
    putStrLn "Please enter your birth year"
    birthYearString <- getLine
    putStrLn "Please enter some year in the future"
    futureYearString <- getLine
    let maybeAge = do
            futureYear <- readMaybe futureYearString
            birthYear <- readMaybe birthYearString
            return $
                -- BEGIN CODE TO MODIFY
                if futureYear < birthYear
                    then yearDiff birthYear futureYear
                    else yearDiff futureYear birthYear
                -- END CODE TO MODIFY
    displayAge maybeAge

Show Solution

                if futureYear < birthYear
                    then birthYear - futureYear
                    else futureYear - birthYear
4. It's possible to write an applicative functor version of the auto-reverse-arguments code by modifying the yearDiff function. Try to do so.

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
import Control.Applicative ((<$>), (<*>))

displayAge maybeAge =
    case maybeAge of
        Nothing -> putStrLn "You provided invalid input"
        Just age -> putStrLn $ "In that year, you will be: " ++ show age

yearDiff futureYear birthYear = FIXME

main
    | yearDiff 5 6 == 1 = putStrLn "Correct!"
    | otherwise = putStrLn "Please try again"

Show Solution

yearDiff futureYear birthYear
    | futureYear > birthYear = futureYear - birthYear
    | otherwise = birthYear - futureYear
5. Now try to do it without modifying yearDiff directly, but by using a helper function which is applied to yearDiff.

#!/usr/bin/env stack
-- stack --resolver lts-7.14 --install-ghc runghc
import Control.Applicative ((<$>), (<*>))

displayAge maybeAge =
    case maybeAge of
        Nothing -> putStrLn "You provided invalid input"
        Just age -> putStrLn $ "In that year, you will be: " ++ show age

yearDiff futureYear birthYear = futureYear - birthYear
yourHelperFunction f ...

main
    | yourHelperFunction yearDiff 5 6 == 1 = putStrLn "Correct!"
    | otherwise = putStrLn "Please try again"

Show Solution

yourHelperFunction f x y
    | x > y = f x y
    | otherwise = f y x

A tale of backwards compatibility in ASTs

Those that espouse the value of backwards compatibility often claim that backwards compatibility is simply a matter of never removing things. But anyone who has published APIs that involve data structures knows that the story is not so simple. I'd like to describe my thought process on a recent BC problem I'm grappling with on the Cabal file format. As usual, I'm always interested in any insights and comments you might have.

The status quo. The build-depends field in a Cabal file is used to declare dependencies on other packages. The format is a comma-separated list of package name and version constraints, e.g., base >= 4.2 && < 4.3. Abstractly, we represent this as a list of Dependency:

data Dependency = Dependency PackageName VersionRange


The effect of an entry in build-depends is twofold: first, it specifies a version constraint which a dependency solver takes into account when picking a version of the package; second, it brings the modules of that package into scope, so that they can be used.

The extension. We added support for "internal libraries" in Cabal, which allow you to specify multiple libraries in a single package. For example, suppose you're writing a library, but there are some internal functions that you want to expose to your test suite but not the general public. You can place these functions in an internal library, which is depended upon by both the public library and the test suite, but not available to external packages.

For more motivation, see the original feature request, but for the purpose of this blog post, we're interested in the question of how to specify a dependency on one of these internal libraries.

Attempt #1: Keep the old syntax. My first idea for a new syntax for internal libraries was to keep the syntax of build-depends unchanged. To refer to an internal library named foo, you simply write build-depends: foo; an internal library shadows any external package with the same name.

Backwards compatible? Absolutely not. Remember that the original interpretation of entries in build-depends is of package names and version ranges. So code that assumed there actually was an external package for each entry in build-depends would choke in an unexpected way when a dependency on an internal library was specified. This is exactly what happened with cabal-install's dependency solver, which needed to be updated to filter out dependencies that corresponded to internal libraries.

One might argue that it is acceptable for old code to break if the new feature is used. But there is a larger, philosophical objection to overloading package names in this way: don't call something a package name if it... isn't actually a package name!

Attempt #2: A new syntax. Motivated by this philosophical concern, as well as the problem that you couldn't simultaneously refer to an internal library named foo and an external package named foo, we introduce a new syntactic form: to refer to the internal library foo in the package pkg, we write build-depends: pkg:foo.
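Concretely (with hypothetical package and library names, and without presuming how version constraints attach to the new form, which is a detail of the final design), a build-depends stanza under this syntax might read:

```
build-depends: base >= 4.9 && < 5,
               my-pkg:my-internal-lib
```

Here my-pkg:my-internal-lib unambiguously names an internal library, even if a package called my-internal-lib also exists on Hackage.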

Since there's a new syntactic form, our internal AST also has to change to handle this new form. The obvious thing to do is introduce a new type of dependency:

data BuildDependency =
    BuildDependency PackageName
                    (Maybe UnqualComponentName)
                    VersionRange


and say that the contents of build-depends is a list of BuildDependency.

When it comes to changes to data representation, this is a "best-case scenario", because we can easily write a function BuildDependency -> Dependency. So supposing our data structure for describing library build information looked something like this:

data BuildInfo = BuildInfo {
    targetBuildDepends :: [Dependency],
    -- other fields
    }


We can preserve backwards compatibility by turning targetBuildDepends into a function that reads out the new, extended field and converts it to the old form:

data BuildInfo = BuildInfo {
    targetBuildDepends2 :: [BuildDependency],
    -- other fields
    }

targetBuildDepends :: BuildInfo -> [Dependency]
targetBuildDepends = map buildDependencyToDependency
                   . targetBuildDepends2


Critically, this takes advantage of the fact that record selectors in Haskell look like functions, so we can replace a selector with a function without affecting downstream code.

Unfortunately, this is not actually true. Haskell also supports record update, which lets a user overwrite a field as follows: bi { targetBuildDepends = new_deps }. If we look at Hackage, there are actually a dozen or so uses of targetBuildDepends in this way. So, if we want to uphold backwards-compatibility, we can't delete this field. And unfortunately, Haskell doesn't support overloading the meaning of record update (perhaps the lesson to be learned here is that you should never export record selectors: export some lenses instead).
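The lens aside can be made concrete. Here is a minimal, self-contained sketch of the van Laarhoven encoding (my illustration, with a made-up stand-in for BuildInfo, not Cabal's actual API): because clients read and write only through view and set, the library keeps control of both paths and can later change the representation behind the lens.

```haskell
{-# LANGUAGE RankNTypes #-}
-- Tiny van Laarhoven lenses; Const and Identity come from base.
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

type Lens' s a = forall f. Functor f => (a -> f a) -> s -> f s

view :: Lens' s a -> s -> a
view l = getConst . l Const

set :: Lens' s a -> a -> s -> s
set l a = runIdentity . l (const (Identity a))

-- Stand-in record: the selector stays internal; only the lens is exported.
data BuildInfo = BuildInfo { _deps :: [String] } deriving Show

depsL :: Lens' BuildInfo [String]
depsL f bi = (\ds -> bi { _deps = ds }) <$> f (_deps bi)

main :: IO ()
main = do
    let bi = BuildInfo ["base"]
    print (view depsL bi)
    print (set depsL ["base", "text"] bi)
```

With this discipline, a later change to the field behind depsL is invisible to callers, which is exactly what a raw record update cannot offer.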

It is possible that, in balance, breaking a dozen packages is a fair price to pay for a change like this. But let's suppose that we are dead-set on maintaining BC.

Attempt #3: Keep both fields. One simple way to keep the old code working is to just keep both fields:

data BuildInfo = BuildInfo {
targetBuildDepends  :: [Dependency],
targetBuildDepends2 :: [BuildDependency],
-- other fields
}


We introduce a new invariant, which is that targetBuildDepends bi == map buildDependencyToDependency (targetBuildDepends2 bi). See the problem? Any legacy code which updates targetBuildDepends probably won't know to update targetBuildDepends2, breaking the invariant and probably resulting in some very confusing bugs. Ugh.

Attempt #4: Do some math. The problem with the representation above is that it is redundant, which meant that we had to add invariants to "reduce" the space of acceptable values under the type. Generally, we like types which are "tight", so that, as Yaron Minsky puts it, we "make illegal states unrepresentable."

To think a little more carefully about the problem, let's cast it into a mathematical form. We have an Old type (isomorphic to [(PN, VR)]) and a New type (isomorphic to [(PN, Maybe CN, VR)]). Old is a subspace of New, so we have a well-known injection inj :: Old -> New.

When a user updates targetBuildDepends, they apply a function f :: Old -> Old. In making our systems backwards compatible, we implicitly define a new function g :: New -> New, which is an extension of f (i.e., inj . f == g . inj): this function tells us what the semantics of a legacy update in the new system is. Once we have this function, we then seek a decomposition of New into (Old, T), such that applying f to the first component of (Old, T) gives you a new value which is equivalent to the result of having applied g to New.
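
As a toy Haskell rendering of this setup (types deliberately simplified to strings; extendUpdate is just one hypothetical choice of g, valid when f only touches main-library dependencies):

```haskell
import Data.List (partition)
import Data.Maybe (isNothing)

type PN  = String
type VR  = String
type CN  = String
type Old = [(PN, VR)]            -- isomorphic to the legacy build-depends
type New = [(PN, Maybe CN, VR)]  -- the extended form with sub-library names

-- The well-known injection: a legacy dependency is a new-style
-- dependency on the main library.
inj :: Old -> New
inj = map (\(pn, vr) -> (pn, Nothing, vr))

-- g extends f precisely when this law holds for every old value.
extendsLaw :: (Old -> Old) -> (New -> New) -> Old -> Bool
extendsLaw f g old = inj (f old) == g (inj old)

-- One possible extension: apply the legacy update to the main-library
-- dependencies and carry the sub-library dependencies along untouched.
extendUpdate :: (Old -> Old) -> New -> New
extendUpdate f new =
  let (plain, subLib) = partition (\(_, mcn, _) -> isNothing mcn) new
      old'            = f [ (pn, vr) | (pn, _, vr) <- plain ]
  in inj old' ++ subLib
```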

Because in Haskell, f is an opaque function, we can't actually implement many "common-sense" extensions. For example, we might want it to be the case that if f updates all occurrences of parsec with parsec-new, the corresponding g does the same update. But there is no way to distinguish between an f that updates, and an f that deletes the dependency on parsec, and then adds a new dependency on parsec-new. (In the bidirectional programming world, this is the distinction between state-based and operation-based approaches.)

We really only can do something reasonable if f only ever adds dependencies; in this case, we might write something like this:

data BuildInfo = BuildInfo {
targetBuildDepends :: [Dependency],
targetSubLibDepends :: [(PackageName, UnqualComponentName)],
targetExcludeLibDepends :: [PackageName],
-- other fields
}


The conversion from this to BuildDependency goes something like:

1. For each Dependency pn vr in targetBuildDepends, if the package name is not mentioned in targetExcludeLibDepends, we have BuildDependency pn Nothing vr.
2. For each (pn, cn) in targetSubLibDepends where there is a Dependency pn vr (the package names are matching), we have BuildDependency pn (Just cn) vr.
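
The two steps above might be sketched like so (the field contents are passed as plain lists; the types are simplified stand-ins for Cabal's, and the function name is hypothetical):

```haskell
type PackageName         = String
type UnqualComponentName = String
type VersionRange        = String

data Dependency = Dependency PackageName VersionRange
  deriving (Eq, Show)
data BuildDependency =
  BuildDependency PackageName (Maybe UnqualComponentName) VersionRange
  deriving (Eq, Show)

toBuildDependencies
  :: [Dependency]                          -- targetBuildDepends
  -> [(PackageName, UnqualComponentName)]  -- targetSubLibDepends
  -> [PackageName]                         -- targetExcludeLibDepends
  -> [BuildDependency]
toBuildDependencies deps subLibs excludes =
     [ BuildDependency pn Nothing vr        -- step 1: main-library deps
     | Dependency pn vr <- deps
     , pn `notElem` excludes ]
  ++ [ BuildDependency pn (Just cn) vr      -- step 2: sub-library deps
     | (pn, cn) <- subLibs
     , Dependency pn' vr <- deps
     , pn == pn' ]
```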

Stepping back for a moment, is this really the code we want to write? If the modification is not monotonic, we'll get into trouble; if someone reads out targetBuildDepends and then writes it into a fresh BuildInfo, we'll get into trouble. Is it really reasonable to go to these lengths to achieve such a small, error-prone slice of backwards compatibility?

Conclusions. I'm still not exactly sure what approach I'm going to take to handle this particular extension, but there seem to be a few lessons:

1. Records are bad for backwards compatibility, because there is no way to overload a record update with a custom new update. Lenses for updates would be better.
2. Record update is bad for backwards compatibility, because it puts us into the realm of bidirectional programming, requiring us to reflect updates from the old world into the new world. If our records are read-only, life is much easier. On the other hand, if someone ever designs a programming language that is explicitly thinking about backwards compatibility, bidirectional programming better be in your toolbox.
3. Backwards compatibility may be worse in the cure. Would you rather your software break at compile time because, yes, you really do have to think about this new case, or would you rather everything keep compiling, but break in subtle ways if the new functionality is ever used?

What's your take? I won't claim to be an expert on questions of backwards compatibility, and would love to see you weigh in, whether it is about which approach I should take, or general thoughts about the interaction of programming languages with backwards compatibility.

Sandy Maguire

<time>January 1, 2017</time>

I have a sinful, guilty pleasure – I like a sports video-game: NBA Jam Tournament Edition. Regrettably, I don’t own a copy, and all of my attempts to acquire one have ended in remarkable misfortune.

Obviously my only recourse was to make a tribute game that I could play to my heart’s desire.

And so that’s what I set out to do, back in 2013. My jam-loving then-coworker and I drafted up the barest constituent of a design document, and with little more thought about the whole thing we dove right in.

We “decided” on Python as our language of choice, and off we went. There was no game engine, so we rolled everything by hand: drawing, collision, you name it. We got a little demo together, and while it was definitely basketball-like, it certainly wasn’t a game. Eventually my partner lost interest, and the code sits mostly forgotten in the back recesses of my Github repositories.

I say mostly forgotten because over the last three years, I’ve occasionally spent a sleepless night here or there working on it, slowly but surely turning midnight fuel into reality.

Three years is a long time to spend on a toy project, and it’s an even longer amount of time for a junior engineer’s sensibilities to stay constant. As I learned more and more computer science tools, I found myself waging a constant battle against Python. The details aren’t important, but it was consistently a headache to get the language to let me express the things I wanted to. It got to the point where I stopped work on the project entirely, because it was no longer fun.

But this basketball video-game of mine was too important to fail, and so I came up with a solution.

If you’re reading this blog, you probably already know what the solution to my problem was – I decided to port the game over to Haskell. Remarkable progress was made: within a few days I had the vast majority of it ported. At first my process looked a lot like this:

1. Read a few lines of Python.
2. Try to understand what they were doing.
3. Copy them line-by-line into Haskell syntax.

and this worked well enough. If there were obvious improvements that could be made, I would do them, but for the most part, it was blind and mechanical. At time of writing I have a bunch of magical constants in my codebase that I dare not change.

However, when I got to the collision resolution code, I couldn’t in good conscience port the code. It was egregious, and would have been an abomination upon all that is good and holy if that imperative mess made it into my glorious new Haskell project.

The old algorithm was like so:

1. Attempt to move the capsule1 to the desired location.
2. If it doesn’t intersect with any other capsules, 👍.
3. Otherwise, perform a sweep from the capsule’s original location to the desired location, and stop at the point just before it would intersect.
4. Consider the remaining distance a “force” vector attempting to push the other capsule out of the way.
5. Weight this force by the mass of the moving capsule relative to the total weight of the capsules being resolved.
6. Finish moving the capsule by its share of weighted force vector.
7. Recursively move all capsules it intersects with outwards by their shares of the remaining force.

I mean, it’s not the greatest algorithm, but it was fast, simple, and behaved well-enough that I wasn’t going to complain.

Something you will notice, however, is that this is definitively not a functional algorithm. It’s got some inherent state in the position of the capsules, but also involves directly moving other capsules out of your way.

Perhaps more worrying is that, in aggregate, the result of this algorithm isn’t necessarily deterministic – depending on the order in which the capsules are iterated, we may get different results. It’s not an apocalyptic bug, but you have to admit that it is semantically annoying.

I spent about a week mulling over how to do a better (and more functional) job of resolving these physics capsules. The key insight was that at the end of the day, the new positions of all the capsules depend on the new (and old) positions of all of the other capsules.

When phrased like that, it sounds a lot like we’re looking for a comonad, doesn’t it? I felt it in my bones, but I didn’t have enough comonadic kung-fu to figure out what this comonad must actually look like. I was stumped – nothing I tried would simultaneously solve my problem and satisfy the comonadic laws.

Big shout-outs to Rúnar Bjarnason for steering me in the right direction: what I was looking for was not in fact a comonad (a data-type with a Comonad instance), but instead a specific Cokleisli arrow (a function of type Comonad w => w a -> b).

Comonadic co-actions such as these can be thought of as the process of answering some query b about an a in some context w. And so, in my case, I was looking for the function w Capsule -> Capsule, with some w suitable to the cause. The w Capsule obviously needed the semantics of “be capable of storing all of the relevant Capsules.” Implicit in these semantics is that w must also have a specific Capsule under focus2.

To relieve the unbearable tension you’re experiencing about what comonad w is: it’s a Store. If you’re unfamiliar with Store:

data Store s a = Store s (s -> a)

which I kind of think of as a warehouse full of as, ordered by ses, with a forklift that drives around but is currently ready to get a particular a off the shelves.
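
Here is a minimal, self-contained version of the machinery – the real thing lives in Control.Comonad.Store; this sketch only defines the two operations the post relies on, extract and extend:

```haskell
data Store s a = Store s (s -> a)

-- The focused value: run the lookup function at the current index.
extract :: Store s a -> a
extract (Store s f) = f s

-- Re-run a query at every possible focus, producing a new Store.
extend :: (Store s a -> b) -> Store s a -> Store s b
extend q (Store s f) = Store s (\s' -> q (Store s' f))
```

In the collision code, s is an object identifier, a is a Capsule, and the query is stepCapsule.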

With all of this machinery in place, we’re ready to implement the Cokleisli arrow, stepCapsule, for resolving physics collisions. The algorithm looks like this:

1. For each other object :: s, extract its capsule from the Store.
2. Filter out any which are not intersecting with the current capsule.
3. Model these intersecting capsules as a spring-mass system, and have each other object exert a displacement “force” exactly necessary to make the two objects no longer collide (weighted by their relative masses).
4. Sum these displacement vectors, and add it to the current capsule’s position.

This algorithm is easy to think about: all it does is compute the new location of a particular capsule. Notice that it explicitly doesn’t attempt to push other capsules out of its way.

And here’s where the magic comes in. We can use the comonadic co-bind operator extend :: (w a -> b) -> w a -> w b to lift our “local”-acting function stepCapsule over all the capsules simultaneously.

There’s only one problem left. While extend stepCapsule ensures that if any capsules were previously colliding no longer do, it doesn’t enforce that the newly moved capsules don’t collide with something new!

Observe that if no objects are colliding, no objects will be moved after running extend stepCapsule over them. And this is in fact just the trick we need! If we can find a fixed point of resolving the capsules, that fixed point must have the no-collisions invariant we want.

However, notice that this is not the usual least fixed point we’re used to dealing with in Haskell (fix). What we are looking for is an iterated fixed point:

import Control.Monad (ap)

iterFix :: Eq a => (a -> a) -> a -> a
iterFix f = fst . head . filter (uncurry (==)) . ap zip tail . iterate f

And voila, iterFix (unpack . extend stepCapsule . pack) is our final, functional solution to resolving collisions. It’s surprisingly elegant, especially when compared to my original imperative solution. For bonus points, it feels a lot like the way I understand actual real-life physics to work: somehow running a local computation everywhere, simultaneously.

While time forms a monad, physics forms a comonad. At least in this context.

1. Lots of physics engines model complicated things as pill-shaped capsules, since these are mathematically simple and usually “good enough”.

2. Otherwise we’d be pretty hard-pressed to find a useful extract :: w a -> a function for it.

</article>

optparse-applicative quick start

When I need to write a command-line program in Haskell, I invariably pick Paolo Capriotti’s optparse-applicative library.

Unfortunately, the minimal working example is complicated enough that I cannot reproduce it from memory, and the example in the README is very different from the style I prefer.

So I decided to put up a template here for a program using optparse-applicative. I am going to copy it into all of my future projects, and you are welcome to do so, too.

import Control.Monad (join)
import Options.Applicative

main :: IO ()
main = join . execParser $
  info (helper <*> parser)
       (  fullDesc
       <> header "General program title/description"
       <> progDesc "What does this thing do?"
       )
  where
    parser :: Parser (IO ())
    parser =
      work
        <$> strOption
              (  long "string_param"
              <> short 's'
              <> metavar "STRING"
              <> help "string parameter"
              )
        <*> option auto
              (  long "number_param"
              <> short 'n'
              <> metavar "NUMBER"
              <> help "number parameter"
              <> value 1
              <> showDefault
              )

work :: String -> Int -> IO ()
work _ _ = return ()

Software project maintenance is where Haskell shines

<html>

We Spend Most of Our Time on Maintenance

Look at the budget spent on your software projects. Most of it goes towards maintenance. The Mythical Man-Month by Fred Brooks states that over 90% of the costs of a typical system arise in the maintenance phase, and that any successful piece of software will inevitably be maintained, and Facts and Fallacies of Software Engineering by Robert L. Glass reports that maintenance typically consumes 40% to 80% (averaging 60%) of software costs.

From our own experience and the literature, we can conclude that maintenance is perhaps the most important part of developing software. In this article we'll explore why Haskell shines in maintenance.

The Five Bases of Maintenance

Based on the article Software Maintenance by Chris Newton, I'm going to write about five bases for doing software maintenance:

• Readability: The source code is comprehensible, and describes the domain well to the reader.
• Testability: The code is friendly to being tested, via unit tests, integration tests, property tests, code review, static analysis, etc.
• Preservation of knowledge: Teams working on the software retain the knowledge of the design and functioning of the system over time.
• Modifiability: The ease with which we can fix, update, refactor, adapt, and generally mechanically change the code.
• Correctness: The software is constructed in a self-consistent way, by using means of combination that rule out erroneous cases that maintainers shouldn't have to deal with.

We'll see below what Haskell brings to the table for each of these bases.

The source code is comprehensible, and describes the domain well to the reader.

Reduce state: Developers must hold a "state of the world" in their head when understanding imperative and object-oriented source code. In Haskell, which is a pure functional language, developers only have to look at the inputs to a function, making it far easier to consider a portion of code and to approach working on it.

Narrowing the problem space: A rich type system like Haskell's guides less-experienced developers, or newcomers to the project, to the right places, because the domain can be modeled in types, which formally narrow down the problem. Developers can literally define problems away, turning their attention to the real problems of your business's domain.

Coupling where it counts: Haskell's type system supports modeling cases of a problem, coupling the case (such as: logged in/logged out) with the values associated with that state (such as: user session id/no session id). Developers can work with fewer variables to hold in their head, instead concentrating on your business logic.

Encapsulation: Like in object oriented languages (Java, C++, Ruby, Python), encapsulation in Haskell allows developers to hide irrelevant details when exposing the interfaces between modules, leaving other developers fewer details to worry about.

Testability

The code is friendly to being tested, via unit tests, integration tests, property tests, code review, static analysis, etc.

Explicit inputs: Haskell programs are the easiest to write tests for, because they are composed of pure functions, which either require no setup at all to run, or take the conditions they depend on as explicitly defined inputs.

Mock the world: With excellent support for embedded domain-specific languages (DSLs), Haskell empowers developers to write programs in an imperative fashion which can then be interpreted as a real-world program (interacting with file I/O, using time, etc.) or as a mock program which does nothing to the real world but compute a result. This is valuable for testing the business logic of the software without having to set up a whole real environment just to do so.

Automatically test properties: Haskell's unique type system supports trivially generating thousands of valid inputs to a function, in order to test that every output of the function is correct. Anything from parsers, financial calculations, state machine transformations, etc. can be generated and tested for.
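
For instance, with the QuickCheck library you state a property and let the library generate the inputs (quickCheck prop_reverseRoundTrip). To stay dependency-free, the sketch below states a round-trip property and checks it over a hand-made batch of inputs instead:

```haskell
-- A property: reversing a list twice gives back the original list.
prop_reverseRoundTrip :: [Int] -> Bool
prop_reverseRoundTrip xs = reverse (reverse xs) == xs

-- A stand-in for QuickCheck's randomly generated inputs.
sampleInputs :: [[Int]]
sampleInputs = [ take n [n, n - 1 ..] | n <- [0 .. 100] ]

checkAll :: Bool
checkAll = all prop_reverseRoundTrip sampleInputs
```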

Static analysis: It may go without saying, but Haskell's static type system brings substantial potential for eliminating whole classes of bugs, and maintaining invariants while changing software, as a continuous feedback to the developer. A level-up from Java or C++ or C#, Haskell's purity and rich type system is able to check a far greater region of source code and to greater precision.

Taking testing seriously: Haskell has a large number of testing libraries which range from standard unit testing (like JUnit or RSpec), web framework-based testing, property-based testing (like QuickCheck) and other randomly generated testing, testing documentation, concurrency testing, and mock testing.

Preservation of knowledge

Teams working on the software retain the knowledge of the design and functioning of the system over time.

Model the domain precisely: Because Haskell's rich type system lets your developers model the domain precisely and in a complete way, it's easier for the same developers to return months or a year from now, or new developers to arrive, and gain a good grasp of what's happening in the system.

Modifiability

The ease with which we can fix, update, refactor, adapt, and generally mechanically change the code.

Automatic memory management: Haskell is high-level with automatically managed memory, like Python or Ruby, and does not suffer from memory corruption issues or leaks, like C or C++, which can arise from developers making changes to your system and mistakenly mismanaging memory manually.

Automate completeness: As mentioned in the readability section, Haskell allows developers to define data types as a set of cases that model the business domain logic. From simple things like results (success/fail/other), to finite state machines, etc. Along with this comes the ability for the compiler to statically determine and tell your developers when a case is missing, which they need to go and correct. This is extraordinarily useful when changing and extending a system.
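
For example, with a hypothetical domain type: compiling with -Wall (or -Wincomplete-patterns) makes GHC flag every function that misses a case, so adding a constructor later points developers at exactly the code that must be updated.

```haskell
{-# OPTIONS_GHC -Wall #-}

data OrderStatus = Placed | Shipped | Delivered | Cancelled String
  deriving Show

-- If a new constructor (say, Refunded) is added later, GHC reports
-- every function like this one that does not yet handle it.
describe :: OrderStatus -> String
describe Placed          = "order placed"
describe Shipped         = "order shipped"
describe Delivered       = "order delivered"
describe (Cancelled why) = "order cancelled: " ++ why
```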

Break up the problem: Haskell's pure functions only depend on their parameters, and so any expression can be easily factored out into separate functions. Breaking a problem down into smaller problems helps maintainers deal with smaller problems, taking fewer things into account.

Encapsulate: As encapsulation allows developers to hide irrelevant details when exposing the interfaces between Haskell modules, this allows developers to change the underlying implementation of modules without consumers of that module having to be changed.

Decouple orthogonal concepts: In Haskell, unlike in popular object oriented languages like Java or C++, data and behavior are not coupled together: a photograph is a photograph, and a printer knows how to print it; it's not that a photograph contains printing inside it. The data is the photograph, and the behavior is printing a photograph. In Haskell, these two are decoupled, allowing developers to simply define the data that counts and freely add more behaviors later, without getting lost in object hierarchies and inheritance issues.

Correctness

The software is constructed in a self-consistent way, by using means of combination that rule out erroneous cases that maintainers shouldn't have to deal with.

Correct combination: In Python, a whole new version of the language, Python 3, had to be implemented to properly handle Unicode text in a backwards-incompatible way. This broke lots of existing Python code and many large projects have still not upgraded. In Haskell, text and binary data are unmixable data types. They cannot be mistakenly combined, as in Python and many other languages. This throws a whole class of encoding issues out of the window, which is less for your developers to worry about.

No implicit null: In The Billion Dollar Mistake Tony Hoare apologizes for the "null" value, present in almost all popular programming languages. Haskell does not have a null value. It explicitly models nullability with a data type. Given the countless bugs caused by null, and maintenance burden due to tracking down or introducing such bugs, Haskell's contribution by removing it is substantial. Languages that include an implicit null value are: Java, C, C++, C#, Python, Ruby, JavaScript, Lisp, Clojure, etc.

Avoid multiple writers: In concurrent code, developers have to be very careful when more than one thread changes the same data. Imperative languages tend to allow any thread to change anything, so it's frighteningly easy to make mistakes. In Haskell, data structures are immutable, and a mutable "box" has to be created to share data between threads, ruling out a plethora of potential bugs.
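
A sketch of that "box": the only way the threads below can touch the shared counter is through an MVar, and modifyMVar_ makes each update atomic (the function name is illustrative):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Monad (forM_, replicateM_)

-- Fork n threads that each bump a shared counter, then return its value.
countWithThreads :: Int -> IO Int
countWithThreads n = do
  counter <- newMVar (0 :: Int)
  done    <- newEmptyMVar
  forM_ [1 .. n] $ \_ -> forkIO $ do
    modifyMVar_ counter (pure . (+ 1))  -- atomic read-modify-write
    putMVar done ()
  replicateM_ n (takeMVar done)         -- wait for every thread to finish
  readMVar counter
```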

Summary

Maintenance is our biggest activity when developing successful software. There are five bases that really make maintenance work better, and this is where Haskell really shines:

• Readability: Haskell's purity and type system lend themselves perfectly to comprehensible code.
• Testability: Haskell code is inherently more testable, due to being pure, safely statically typed, and coming with a variety of testing packages.
• Preservation of knowledge: A rich type system like Haskell's can model the domain so well that developers have to remember less, and educate each other less, saving time.
• Modifiability: Haskell's strong types, completeness analysis and purity assure that when you break something, you know it sooner.
• Correctness: Developers can work within a consistent model of your domain, removing whole classes of irrelevant problems. Concurrent code is easier to maintain too.

All in all, Haskell really shines in maintenance, and, while it has other novel features, it's really for this reason that developers and companies are increasingly switching to it.

</html>

Backpack and the PVP

In the PVP, you increment the minor version number if you add functions to a module, and the major version number if you remove functions from a module. Intuitively, this is because adding functions is a backwards compatible change, while removing functions is a breaking change; to put it more formally, if the new interface is a subtype of the older interface, then only a minor version number bump is necessary.

Backpack adds a new complication to the mix: signatures. What should the PVP policy for adding/removing functions from signatures be? If we interpret a package with required signatures as a function, theory tells us the answer: signatures are contravariant, so adding required functions is breaking (bump the major version), whereas removing required functions is backwards-compatible (bump the minor version).

However, that's not the end of the story. Signatures can be reused, in the sense that a package can define a signature, and then another package reuse that signature:

unit sigs where
signature A where
x :: Bool
unit p where
dependency sigs[A=<A>]
module B where
import A
z = x


In the example above, we've placed a signature in the sigs unit, which p uses by declaring a dependency on sigs. B has access to all the declarations defined by the A in sigs.

But there is something very odd here: if sigs were to ever remove its declaration for x, p would break (x would no longer be in scope). In this case, the PVP rule from above is incorrect: p must always declare an exact version bound on sigs, as any addition or deletion would be a breaking change.

So we are in this odd situation:

1. If we include a dependency with a signature, and we never use any of the declarations from that signature, we can specify a loose version bound on the dependency, allowing for it to remove declarations from the signature (making the signature easier to fulfill).
2. However, if we ever import the signature and use anything from it, we must specify an exact bound, since removals are now breaking changes.

I don't think end users of Backpack should be expected to get this right on their own, so GHC (in this proposed patchset) tries to help users out by attaching warnings like this to declarations that come solely from packages that may have been specified with loose bounds:

foo.bkp:9:11: warning: [-Wdeprecations]
In the use of ‘x’ (imported from A):
"Inherited requirements from non-signature libraries
(libraries with modules) should not be used, as this
mode of use is not compatible with PVP-style version
bounds.  Instead, copy the declaration to the local
hsig file or move the signature to a library of its
own and add that library as a dependency."


Of course, GHC knows nothing about bounds, so the heuristic we use is that a package is a signature package with exact bounds if it does not expose any modules. A package like this is only ever useful by importing its signatures, so we never warn about this case. We conservatively assume that packages that do expose modules might be subject to PVP-style bounds, so we warn in that case, e.g., as in:

unit q where
signature A where
x :: Bool
module M where -- Module!
unit p where
dependency q[A=<A>]
module B where
import A
z = x


As the warning suggests, this error can be fixed by explicitly specifying x :: Bool inside p, so that, even if q removes its requirement, no code will break:

unit q where
signature A where
x :: Bool
module M where -- Module!
unit p where
dependency q[A=<A>]
signature A where
x :: Bool
module B where
import A
z = x


Or by putting the signature in a new library of its own (as was the case in the original example).

This solution isn't perfect, as there are still ways you can end up depending on inherited signatures in PVP-incompatible ways. The most obvious is with regards to types. In the code below, we rely on the fact that the signature from q forces T to be type equal to Bool:

unit q where
signature A where
type T = Bool
x :: T
module Q where
unit p where
dependency q[A=<A>]
signature A where
data T
x :: T
module P where
import A
y = x :: Bool


In principle, it should be permissible for q to relax its requirement on T, allowing it to be implemented as anything (and not just a synonym of Bool), but that change will break the usage of x in P. Unfortunately, there isn't any easy way to warn in this case.

A perhaps more principled approach would be to ban use of signature imports that come from non-signature packages. However, in my opinion, this complicates the Backpack model for not a very good reason (after all, some day we'll augment version numbers with signatures and it will be glorious, right?)

To summarize. If you want to reuse signatures from a signature package, specify an exact version bound on that package. If you use a component that is parametrized over signatures, do not import and use declarations from those signatures; GHC will warn you if you do so.

December 29, 2016

Ivan Lazar Miljenovic

Have you ever wanted to do something like this?

λ> cons 'a' (1::Int, 2::Word, 3::Double) :: (Char, Int, Word, Double)
('a',1,2,3.0)

λ> unsnoc ('a',1::Int,2::Word,3.0::Double) :: ((Char, Int, Word), Double)
(('a',1,2),3.0)

Let me try to completely confuse you (and potentially give a hint as to what I’m doing):

λ> transmogrify ('H', 'a', 's', 'k', 'e', 'l', 'l') :: ((Char, Char), Char, (Char, Char, (Char, Char)))
(('H','a'),'s',('k','e',('l','l')))

One more hint:

λ> data Foo = Bar Char Char Char deriving (Show, Generic)
λ> transmogrify ('a', 'b', 'c') :: Foo
Bar 'a' 'b' 'c'

What do you mean by that?

I’ve suddenly become really interested in GHC Generics, and it occurred to me the other day – since it basically decomposes more interesting types into products and sums with lots of associated metadata – that it should be possible to get two different types that are fundamentally the same shape but with lots of different pesky metadata.

Turns out, it is possible. I’ve got a prototype of a little library that implements this on GitHub, and that’s what I used for those examples above.

How it works

Basically, all metadata (constructor names, record aliases, strictness annotations, etc.) is stripped out. This is done recursively throughout the entire type, stopping at fundamental types like Int and Char. To cap it all off, products are converted from a tree-like implementation into an explicit list (this is even done recursively for any products contained within products, like the nested tuples above).

When will this be on Hackage?

I doubt it will be.

The approach is a bit hacky, with various type classes and type aliases required, etc. That’s not too bad, but there is pretty much no type safety or inference available (hence all the explicit annotations above).

The performance is also not great: it’s fundamentally O(n), and there’s no way to really fix this (at least that I can see).

There are also currently two limitations with the implementation:

1. No handling of sum-types. This could be remedied by basically copying and modifying the existing handling of product types.
2. An explicit list of types is needed to be able to stop type recursion; this is currently limited to numeric types and Char.

This second limitation is the biggest fundamental problem with how to get this to a production-ready library. Ideally you could specify “this type should not be examined”. Even better: if a component type doesn’t have a Generic instance then don’t bother trying to split it apart.

So, now what?

Well, the code is there. If there’s enough interest I might try and clean it up and put it on Hackage regardless.

But if you think this will somehow solve all your problems, then maybe you should re-think what you’re doing.

Senior Backend Engineer at Euclid Analytics (Full-time)

We are looking to add a senior individual contributor to the backend engineering team! Our team is responsible for creating and maintaining the infrastructure that powers the Euclid Analytics Engine. We leverage a forward thinking and progressive stack built in Scala and Python, with an infrastructure that uses Mesos, Spark and Kafka. As a senior engineer you will build out our next generation ETL pipeline. You will need to use and build tools to interact with our massive data set in as close to real time as possible. If you have previous experience with functional programming and distributed data processing tools such as Spark and Hadoop, then you would make a great fit for this role!

Responsibilities:

• Partnering with the data science team to architect and build Euclid’s big data pipeline
• Building tools and services to maintain a robust, scalable data service layer
• Leveraging technologies such as Spark and Kafka to grow our predictive analytics and machine learning capabilities in real time
• Finding innovative solutions to performance issues and bottlenecks
• Helping build and scale our internal and external Python APIs

Requirements:

• At least 3 years industry experience in a full time role utilizing Scala or other modern functional programming languages (Haskell, Clojure, Lisp, etc.)
• Database management experience (MySQL, Redis, Cassandra, Redshift, MemSQL)
• Experience with big data infrastructure including Spark, Mesos, Scalding and Hadoop
• Excited about data flow and orchestration with tools like Kafka and Spark Streaming
• Have experience building production deployments using Amazon Web Services or Heroku’s Cloud Application Platform
• B.S. or equivalent in Computer Science or another technical field

Get information on how to apply for this position.

December 26, 2016

mightybyte

The following started out as a response to a Hacker News comment, but got long enough to merit a standalone blog post.

I think the root of the Haskell documentation debate lies in a pretty fundamental difference in how you go about finding, reading, and understanding documentation in Haskell compared to mainstream languages.  Just last week I ran into a situation that really highlighted this difference.

I was working on creating a Haskell wrapper around the ACE editor.  I initially wrote the wrapper some time ago and got it integrated into a small app.  Last week I needed ACE integration in another app I'm working on and came back to the code.  But I ran into a problem...ACE automatically makes AJAX requests for JS files needed for pluggable syntax highlighters and themes.  But it was making the AJAX requests in the wrong place and I needed to tell it to request them from somewhere else.  Depending on how interested you are in this, you might try looking through the ACE documentation on your own before reading on to see if you can find the answer to this problem.

When you go to the ACE home page, the most obvious place to start seems to be the embedding guide.  This kind of guide seems to be what people are talking about when they complain about Haskell's documentation.  But this guide gave me no clues as to how to solve my problem.  The embedding guide then refers you to the how-to guide.  That documentation didn't help me either.  The next place I go is the API reference.  I'm no stranger to API references.  This is exactly what I'm used to from Haskell!  I look at the docs for the top-level Ace module.  There are only three functions here.  None of them is what I want.  They do have some type signatures that seem to help a little, but they don't tell me the type of the edit function, which is the one that seems most likely to be what I want.  At this point I'm dying for a hyperlink to the actual code, but there are none to be found.  To make a long story short, the thing I want is nowhere to be found in the API reference either.

I only solved the problem when a co-worker who has done a lot of JS work found the answer buried in a closed GitHub issue.  There's even a comment on that issue by someone saying he had been looking for it "for days".  The solution was to call ace.config.set('basePath', myPath);.  This illustrates the biggest problem with tutorial/how-to documentation: they're always incomplete.  There will always be use cases that the tutorials didn't think of.  They also take effort to maintain, and can easily get out of sync over time.

I found this whole experience with ACE documentation very frustrating, and came away feeling that I vastly prefer Haskell documentation.  With Haskell, API docs literally give you all the information needed to solve any problem that the API can solve (with very few exceptions).  If the ACE editor was written natively in Haskell, the basePath solution would be pretty much guaranteed to show up somewhere in the API documentation.  In my ACE wrapper you can find it here.

Now I need to be very clear that I am not saying that types alone are enough, nor that the state of Haskell documentation is good enough.  There are definitely plenty of situations where it is not at all obvious how to wrangle the types to accomplish what you want to accomplish.  Haskell definitely needs to improve its documentation.  But this takes time and effort.  The Haskell community is growing but still relatively small, and resources are limited.  Haskell programmers should keep in mind that newcomers will probably be more used to tutorials and how-tos.  And I think newcomers should keep in mind that API docs in Haskell tell you a lot more than in other languages, and be willing to put some effort into learning how to use these resources effectively.

The cost of fixing climate change

We need to find alternative sources of energy to replace our dependency on fossil fuels. There is a lot of talk about solar, about how economically profitable it has become, how all this free renewable energy is waiting to be harvested, and how the prices keep plummeting.

Being, I hope, of a somewhat economic mindset, I don't believe in the existence of hundred-dollar bills on the pavement. The large solar plants I can find numbers for (e.g., Topaz) seem to cost about USD 2.4 billion to deliver slightly more than 1 TWh/year. And solar panels drop in cost, but obviously a construction site many square miles in size is going to be expensive, no matter the cost of materials.

Another issue with solar: recent price drops seem to be caused as much by moving production to the Far East. This means leveraging cheaper labor, but perhaps even more so, leveraging cheap, subsidized power, mostly from coal. Currently, I suspect the solar industry consumes more power than it produces, meaning it currently accelerates climate change. (This will change when growth rates drop, but at 0.2% of our global energy coming from solar, this could be some years off.)

And large installations in the Californian desert are one thing, but up north where I live? There aren't many real installations, and thus not much in the way of numbers. There is a Swedish installation in Västerås claiming it will be able to produce power at about the same cost as Topaz. This would indeed be remarkable. Neuhardenberg in Germany cost €300 million (15% of Topaz), but only produces 20 GWh (2%), which seems remarkably low again. I find it hard to trust these numbers. Another comparison: the UK invests around US$5 billion per year in photovoltaics, and generated electricity seems to rise by 2-3 TWh. This points to $2/kWh/year, a Topaz-level ROI, which is actually rather impressive for a rainy country far to the north.
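
The UK figure above can be sanity-checked with a couple of divisions. Here is a small Haskell sketch using only the numbers quoted in this paragraph and the Topaz paragraph (all inputs are the article's own rough approximations, not authoritative data):

```haskell
-- UK: ~US$5e9/year of investment buying an extra ~2.5 TWh/year of
-- generation, expressed as dollars per (kWh/year) of capacity.
ukDollarsPerKWhYear :: Double
ukDollarsPerKWhYear = 5e9 / (2.5e12 / 1e3)   -- 2.5 TWh = 2.5e12 Wh

-- Topaz for comparison: ~$2.4e9 for ~1.1 TWh/year.
topazDollarsPerKWhYear :: Double
topazDollarsPerKWhYear = 2.4e9 / (1.1e12 / 1e3)

main :: IO ()
main = print (ukDollarsPerKWhYear, topazDollarsPerKWhYear)
-- both come out around $2 per kWh/year, as claimed
```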

All of this ignores necessary infrastructure; we need a way to store energy from sunny days, and in the north, from the warm summer to freezing winters, where energy use for heating quadruples my electric bill. Yes, we can pump water uphill for hydroelectric backup, but this has a construction cost, an efficiency loss in the pumps and turbines, and, I think, an opportunity cost, since if you have the opportunity of building a hydroelectric plant, you might consider just doing that, and generate clean power without an expensive solar installation. The alternative backup power for those who aren't blessed with rainy, tall mountains seems to be natural gas, which of course is a fossil fuel, and where the quick single-cycle plants are considerably less efficient.

There was an interesting, but perhaps overly pessimistic, paper by Ferroni and Hopkirk calculating that taking everything into account, solar panels at latitudes like northern Europe would never return the energy invested in them. Others disagree, and in the end, it is mostly about what to include in "everything".

If you look at many total cost of power estimates, nuclear is often comparable to solar. I find this difficult to swallow. Again looking at actual construction costs, Olkiluoto may cost as much as €8 billion. But the contract price was €3 billion, and it is unclear to me if it was the contractor who messed up the estimates, or the later construction process - so it is hard to say what it would cost to build the next one. And in any case, the reactor will produce 15 TWh/year, so you get maybe thirteen Topaz'es worth of energy at two to four times the investment.

I think solar and nuclear make for a good comparison, both technologies are extremely expensive to build, but are then quite cheap to run, and for nuclear, much of the variation in total cost estimates depend on discounting rates, and decommissioning cost. Financing costs should be the same for solar, and although nobody talks about decommissioning, Topaz is 25km² of solar panels that must be disposed of, that's not going to be free either. We can also consider that reactors can run for maybe fifty to seventy years, solar panels are expected to last for 25 to 30.
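
The lifetime figures above can be turned into a rough cost per TWh. A small Haskell sketch, using only numbers quoted in this post (assumptions: Topaz at ~$2.4e9 for ~1.1 TWh/year over ~27 years; Olkiluoto at the €8e9 worst-case price for 15 TWh/year over ~60 years; financing and decommissioning ignored):

```haskell
topazLifetimeTWh, olkiluotoLifetimeTWh :: Double
topazLifetimeTWh     = 1.1 * 27   -- ~30 TWh over the panels' lifetime
olkiluotoLifetimeTWh = 15  * 60   -- 900 TWh over the reactor's lifetime

-- Construction cost divided by lifetime energy delivered:
topazCostPerTWh, olkiluotoCostPerTWh :: Double
topazCostPerTWh     = 2.4e9 / topazLifetimeTWh      -- ~$8e7 per TWh
olkiluotoCostPerTWh = 8e9   / olkiluotoLifetimeTWh  -- ~€9e6 per TWh

main :: IO ()
main = print (topazCostPerTWh / olkiluotoCostPerTWh)
-- roughly an order of magnitude in nuclear's favor
```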

In the end, both solar and nuclear can supply energy, but nuclear appears to be a lot cheaper.

* * *

If you look at what's happening in the world, the answer is: not much. Sure, we're building a couple of solar power plants, a few reactors, a handful of windmills. Is it simply too expensive to replace fossils?

I did some calculations. Using Olkiluoto as a baseline, a country like Poland could replace its 150 TWh of coal-fired electricity production for €30-80 billion. A lot of money, to be sure, but not outrageously so. (Of course, coal-fired electricity is just a fraction of total fossil use, think of transportation - but it's a start, and it's the low-hanging fruit.)

Another comparison: on the 'net, I find US total oil consumption to be equivalent to something like 35 quadrillion BTUs. My calculations make this out to be about 10,000 TWh, so somewhat fewer than 700 reactors would cover it. If we assume we get better at this, and that after building the first hundred or so we can do it for USD 3 billion (about the original price tag), that comes to something over two trillion dollars.
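
The arithmetic in the last two paragraphs checks out; here is a small Haskell verification using only the article's own approximations (BTU-to-joule and TWh conversion factors are standard; the per-reactor prices are the article's assumptions):

```haskell
-- US oil consumption, quoted as ~35 quadrillion BTU/year:
oilTWh :: Double
oilTWh = 35e15 * 1055 / 3.6e15   -- 1 BTU ≈ 1055 J; 1 TWh = 3.6e15 J

-- Olkiluoto-class reactors (~15 TWh/year each) needed to replace it:
reactorsNeeded :: Double
reactorsNeeded = oilTWh / 15

-- At the assumed ~USD 3e9 per reactor once construction is routine:
totalCostUSD :: Double
totalCostUSD = reactorsNeeded * 3e9

-- Poland: 150 TWh of coal electricity at €3e9-8e9 per reactor:
polandCostEUR :: (Double, Double)
polandCostEUR = let n = 150 / 15 in (n * 3e9, n * 8e9)

main :: IO ()
main = do
    print (round oilTWh :: Integer)          -- roughly 10,000 TWh
    print (round reactorsNeeded :: Integer)  -- somewhat fewer than 700
    print totalCostUSD                       -- a bit over two trillion
    print polandCostEUR                      -- €30-80 billion
```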

Which happens to be the current estimate for the cost of the Iraq war. Isn't it ironic? In order to attempt to secure a supply of oil (and by the way, how did that work out for you?), the US probably spent the same amount of money it would have taken to eliminate the dependence on oil, entirely and permanently. And which, as a byproduct, would have avoided thousands of deaths, several civil wars, millions of refugees, and -- almost forgot: helped to stop global warming. (The war in Afghanistan seems to have been slightly cheaper, but that was a war to eradicate extremism and terror, and although it seems hard to believe, it was even less successful than the war in Iraq. I can only look forward to our coming intervention in Syria. But I digress.)

* * *

Back to climate change. It isn't sufficient to produce enough alternative energy, of course; what we want is to stop the production of fossil fuels. In a global and free market economy, oil (like any other product) will be produced as long as it is profitable to do so. In other words, we must supply the alternative energy, not to satisfy our current consumption, but to drive the price of energy low enough that oil extraction no longer turns a profit.

Some oil fields are incredibly inexpensive to run, and it's highly unlikely that the price will ever drop so low that Saudis are going to stop scooping oil out of the dunes. But with recent oil prices above $100/barrel, many newer fields are expensive. Tar sands in Canada, fracking and shale oil, and recent exploration in the Arctic - I suspect these projects are not profitable, and they certainly have a high degree of financial risk. Yet, people go ahead, and we just got some new licenses issued for the Barents sea. I'm not an analyst, but it's difficult for me to imagine that these will ever be profitable, and yet they go ahead. But similar to solar and nuclear, oil extraction requires a huge initial investment, and when the field is in production, keeping it producing will still be profitable. In Norway, the oil business is a mixture of government and private sector initiatives, and my bet is that companies are racing to get the investments in place. And either the public sector guarantees for the investment, or the companies gamble on later bailouts - in either case, the people involved get to keep their jobs.

December 23, 2016

Neil Mitchell

Fuzz testing Hexml with AFL

Summary: Hexml 0.1 could read past the end of the buffer for malformed documents. Fuzz testing detected that and I fixed it in Hexml 0.2.

I released Hexml, my fast DOM-based XML parser, and immediately Austin Seipp got suspicious. Here was a moderately large piece of C code, taking untrusted inputs, and poking around in the buffer with memcpy and memchr. He used American Fuzzy Lop (AFL) to fuzz test the Hexml C code, and came up with a number of issues, notably a buffer read overrun on the fragment:

<a b=:fallback

With a lot of help from Austin I set up AFL, fixed some issues with Hexml and with how AFL was being run, released Hexml 0.2 fixing these issues, and incorporated AFL into my Travis CI builds.
If you want to actually follow all the steps on your computer, I recommend reading the original GitHub issue from Austin. Alternatively, check out Hexml and run sh afl.sh.

Building and installing AFL

The first step was to build and install AFL from the tarball, including the LLVM pieces and libdislocator. The LLVM mode allows faster fuzzing, and libdislocator provides a library that makes all allocations next to a page boundary - ensuring that if there is a buffer read overrun it results in a segfault that AFL can detect.

An AFL test case

To run AFL you write a program that takes a filename as an argument and "processes" it. In my case that involves calling hexml_document_parse - the full version is online, but the salient bits are:

#include "hexml.c"
... other imports ...
int main(int argc, char** argv)
{
    __AFL_INIT();
    ... read file from argv[0] ...
    document *doc = hexml_document_parse(contents, length);
    hexml_document_free(doc);
    return 0;
}

Here I statically #include the hexml.c codebase and have a main function that calls __AFL_INIT (to make testing go faster), reads from the file, then parses/frees the document. If this code crashes, I want to know about it. The original AFL driver code used __AFL_LOOP to speed things up further, but that results in a huge number of spurious failures, so I removed it.

Running AFL

To run AFL on my code requires compiling it with one AFL tool, then running it through another. The steps are:

AFL_HARDEN=1 afl-clang-fast -O2 -Icbits cbits/fuzz.c -o $PWD/hexml-fuzz
AFL_PRELOAD=/usr/local/lib/afl/libdislocator.so afl-fuzz -T hexml -x /usr/local/share/afl/dictionaries/xml.dict -i $PWD/xml -o $PWD/afl-results -- $PWD/hexml-fuzz @@

I compile with AFL_HARDEN to detect more bugs, producing hexml-fuzz. I run with libdislocator loaded so that my small buffer overrun turns into a fatal segfault. I give afl-fuzz a dictionary of common XML fragments and a few simple XML documents, then let it run over hexml-fuzz.
The interactive UI shows bugs as they occur.

Fixing the bugs

Running AFL on Hexml 0.1 produced lots of bugs within a few seconds. Each bug produces an input file which I then ran through a debugger. While there were a few distinct bug locations, they all shared a common pattern. Hexml parses a NUL-terminated string, and in some cases I looked at a character that was potentially NUL and consumed it in the parsing. That might consume the final character, meaning that any further parsing was reading past the end of the string. I audited all such occurrences, fixed them, and reran AFL. Since then I have been unable to find an AFL bug despite lots of compute time.

Running on CI

I run all my code on Travis CI to ensure I don't introduce bugs, and to make accepting pull requests easier (I don't even need to build the code most of the time). Fortunately, running on Travis isn't too hard:

AFL_PRELOAD=/usr/local/lib/afl/libdislocator.so timeout 5m afl-fuzz -T hexml -x /usr/local/share/afl/dictionaries/xml.dict -i $PWD/xml -o $PWD/afl-results -- $PWD/hexml-fuzz @@ > /dev/null || true
cat afl-results/fuzzer_stats
grep "unique_crashes *: 0" afl-results/fuzzer_stats

I pipe the output of AFL to /dev/null since it's very long. I run for 5 minutes with timeout. After the timeout hits, I display the fuzzer_stats file and then grep for 0 crashes, failing if it isn't there.

Conclusions

Writing C code is hard, especially if it's performance orientated, and if it's not performance orientated you might want to consider a different language. Even if you don't want to use your code on untrusted input, sooner or later someone else will, and even tiny bugs can result in complete exploits. AFL does a remarkable job at detecting such issues and has made Hexml the better for it.

Left-recursive parsing of Haskell imports and declarations

Suppose that you want to parse a list separated by newlines, but you want to automatically ignore extra newlines (just in the same way that import declarations in a Haskell file can be separated by one or more newlines.) Historically, GHC has used a curious grammar to perform this parse (here, semicolons represent newlines):

decls : decls ';' decl
| decls ';'
| decl
| {- empty -}


It takes a bit of squinting, but what this grammar does is accept a list of decls, interspersed with one or more semicolons, with zero or more leading/trailing semicolons. For example, ;decl;;decl; parses as:

{- empty -}                             (rule 4)
{- empty -} ';' decl                    (rule 1)
{- empty -} ';' decl ';'                (rule 2)
{- empty -} ';' decl ';' ';' decl       (rule 1)
{- empty -} ';' decl ';' ';' decl ';'   (rule 2)


(Rule 3 gets exercised if there is no leading semicolon.)

This grammar has two virtues: first, it only requires a single state, which reduces the size of the parser; second, it is left-recursive, which means that an LALR parser (like Happy) can parse it in constant stack space.

This code worked quite well for a long time, but it finally fell over in complexity when we added annotations to GHC. Annotations are a feature which tracks the locations of all keywords/punctuation/whitespace in source code, so that we can reconstruct the source code byte-for-byte from the abstract syntax tree (normally, this formatting information is lost at abstract syntax). With annotations, we needed to save information about each semicolon; for reasons that I don't quite understand, we were expending considerable effort to associate each semicolon with the preceding declaration (leading semicolons were propagated up to the enclosing element).

This led to some very disgusting parser code:

importdecls :: { ([AddAnn],[LImportDecl RdrName]) }
        : importdecls ';' importdecl
                 {% if null (snd $1)
                     then return (mj AnnSemi $2:fst $1, $3 : snd $1)
                     else do { addAnnotation (gl $ head $ snd $1)
                                             AnnSemi (gl $2)
                             ; return (fst $1, $3 : snd $1) } }
        | importdecls ';'
                 {% if null (snd $1)
                     then return ((mj AnnSemi $2:fst $1), snd $1)
                     else do { addAnnotation (gl $ head $ snd $1)
                                             AnnSemi (gl $2)
                             ; return $1 } }
        | importdecl             { ([],[$1]) }
        | {- empty -}            { ([],[]) }


Can you tell what this does?! It took me a while to understand what the code is doing: the null test is to check if there is a preceding element we can attach the semicolon annotation to: if there is none, we propagate the semicolons up to the top level.

The crux of the issue was that, once annotations were added, the grammar did not match the logical structure of the syntax tree. That's bad. Let's make them match up. Here are a few constraints:

1. The leading semicolons are associated with the enclosing AST element. So we want to parse them once at the very beginning, and then not bother with them in the recursive rule. Call the rule to parse zero or more semicolons semis:

semis : semis ';'
| {- empty -}

2. If there are duplicate semicolons, we want to parse them all at once, and then associate them with the preceding declarations. So we also need a rule to parse one or more semicolons, which we will call semis1; then when we parse a single declaration, we want to parse it as decl semis1:

semis1 : semis1 ';'
| ';'


Then, we can build up our parser in the following way:

-- Possibly empty decls with mandatory trailing semicolons
decls_semi : decls_semi decl semis1
| {- empty -}

-- Non-empty decls with no trailing semicolons
decls : decls_semi decl

-- Possibly empty decls with optional trailing semicolons
top1 : decls_semi
| decls

-- Possibly empty decls with optional leading/trailing semicolons
top : semis top1


We've taken care not to introduce any shift-reduce conflicts. It was actually a bit non-obvious how to make this happen, because in Haskell source files, we need to parse a list of import declarations (importdecl), followed by a list of top-level declarations (topdecl). It's a bit difficult to define the grammar for these two lists without introducing a shift-reduce conflict, but this seems to work:

top : importdecls_semi topdecls_semi
| importdecls_semi topdecls
| importdecls


It looks so simple, but there are a lot of plausible looking alternatives which introduce shift/reduce conflicts. There's an important meta-lesson here, which is that when trying to work out how to do something like this, it is best to experiment on a smaller grammar, where re-checking is instantaneous (happy takes quite a bit of time to process all of GHC, which made the edit-recompile cycle a bit miserable.)

I'd love to know if there's an even simpler way to do this, or if I've made a mistake and changed the set of languages I accept. Let me know in the comments. I've attached below a simple Happy grammar that you can play around with (build with happy filename.y; ghc --make filename.hs).

{
module Main where

import Data.Char
}

%name parse
%expect 0
%tokentype { Token }
%error { parseError }

%token
import          { TokenImport }
decl            { TokenDecl }
';'             { TokenSemi }

%%

top     : semis top1                        { $2 }

top1    : importdecls_semi topdecls_semi    { (reverse $1, reverse $2) }
        | importdecls_semi topdecls         { (reverse $1, reverse $2) }
        | importdecls                       { (reverse $1, []) }

id_semi : importdecl semis1                 { $1 }

importdecls
        : importdecls_semi importdecl       { $2:$1 }

importdecls_semi
        : importdecls_semi id_semi          { $2:$1 }
        | {- empty -}                       { [] }

topdecls : topdecls_semi topdecl            { $2:$1 }

topdecls_semi
        : topdecls_semi topdecl semis1      { $2:$1 }
        | {- empty -}                       { [] }

semis   : semis ';'                         { () }
        | {- empty -}                       { () }

semis1  : semis1 ';'                        { () }
        | ';'                               { () }

importdecl : import                         { "import" }
topdecl    : decl                           { "decl" }

{
parseError :: [Token] -> a
parseError p = error ("Parse error: " ++ show p)

data Token = TokenImport | TokenDecl | TokenSemi
    deriving Show

lexer :: String -> [Token]
lexer [] = []
lexer (c:cs)
    | isSpace c = lexer cs
    | isAlpha c = lexVar (c:cs)
lexer (';':cs) = TokenSemi : lexer cs

lexVar cs =
    case span isAlpha cs of
        ("import", rest) -> TokenImport : lexer rest
        ("decl",   rest) -> TokenDecl   : lexer rest

main = print . parse . lexer $ "import;;import;;decl"
}
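
As a cross-check of the grammar's intent, the same leading/trailing-semicolon policy can be mimicked by a hand-rolled Haskell function over a token list. This is a hypothetical sketch (the Tok type and names are illustrative, not GHC's): strip the leading run of semicolons once (semis), then repeatedly take a decl together with the run of semicolons that belongs to it (semis1).

```haskell
data Tok = Semi | Decl deriving (Eq, Show)

-- Pairs each decl with the count of trailing semicolons
-- (the semicolons its annotations would record).
parseTop :: [Tok] -> Maybe [(Tok, Int)]
parseTop = go . dropWhile (== Semi)   -- 'semis': leading semicolons
  where
    go [] = Just []
    go (Decl : rest) =
        let (semis, rest') = span (== Semi) rest  -- 'semis1' after the decl
        in  if null rest' || not (null semis)
              then ((Decl, length semis) :) <$> go rest'
              else Nothing  -- two decls with no separating semicolon
    go _ = Nothing          -- unreachable after dropWhile/span

main :: IO ()
main = print (parseTop [Semi, Decl, Semi, Semi, Decl, Semi])
-- Just [(Decl,2),(Decl,1)]: each decl owns the semicolons after it
```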


Package takeover: indents

Parsers are one of Haskell’s indisputable strengths. The most well-known library is probably Parsec. This parser combinator library has been around since at least 2001, but is still widely used today, and it has inspired new generations of general purpose parsing libraries.

Parsec makes it really easy to prototype parsers for certain classes of grammars. Lots of grammars in use today, however, are whitespace-sensitive. There are different approaches for dealing with that. One of the most commonly used approaches is to add explicit INDENT and DEDENT tokens. But that usually requires you to add a separate lexing phase – not a bad idea by itself, but a bit annoying if you are just writing a quick prototype.

That is why I like the indents package – it sits in a sweet spot because it is a straightforward package that allows you to turn any Parsec parser into an indentation-based one without having to change too many types.

It offers a bunch of semi-cryptic operators like <+/> and <*/> which I would personally avoid in favor of their named variants, but other than that I would consider it a fairly “easy” package.

Unfortunately, I found a few bugs and inconveniences in the old package. One interesting bug would allow failing branches of the parse to still affect the indentation’s internal state, which is very bad 1. Additionally, the package fixed the underlying monad, which prevented you from using transformers.

Because I didn’t want to confuse people by creating yet another package, I took over the package, which is a very smooth process nowadays. I can definitely recommend this to anyone who discovers issues like these in unmaintained packages. The Hackage trustees are doing great and valuable work there.

I have now uploaded a new version which fixes these issues. To celebrate that, let’s create a toy parser for indentation-sensitive taxonomies such as the big tea taxonomy 2:

tea
green
korean
pucho-cha
chung-cha
vietnamese
snow-green-tea
japanese
roasted
...
black
georgian
caravan-blend
african
kenyan
tanzanian
...

We need some imports to get rolling. After all, this blogpost is a literate haskell file which can be loaded in GHCi.

> import           Control.Applicative ((*>), (<*), (<|>))
> import qualified Text.Parsec         as Parsec
> import qualified Text.Parsec.Indent  as Indent

We just store a single term in the category as a String.

> type Term = String

A taxonomy is then recursively defined as a Term and its children taxonomies.

> data Taxonomy = Taxonomy Term [Taxonomy] deriving (Eq, Show)

A parser for a term is easy. We just parse an identifier and then skip the spaces following that.

> pTerm :: Indent.IndentParser String () String
> pTerm =
>     Parsec.many1 allowedChar <* Parsec.spaces
>   where
>     allowedChar = Parsec.alphaNum <|> Parsec.oneOf ".-"

In the parser for a Taxonomy, we use the indents library. withPos is used to “remember” the indentation position. After doing that, we can use combinators such as indented to check if we are indented past that point.

> pTaxonomy :: Indent.IndentParser String () Taxonomy
> pTaxonomy = Indent.withPos $do > term <- pTerm > subs <- Parsec.many$ Indent.indented *> pTaxonomy
>     return $ Taxonomy term subs

Now we have a simple function to put it all together:

> readTaxonomy :: FilePath -> IO Taxonomy
> readTaxonomy filePath = do
>     txt <- readFile filePath
>     let errOrTax = Indent.runIndentParser parser () filePath txt
>     case errOrTax of
>         Left  err -> fail (show err)
>         Right tax -> return tax
>   where
>     parser = pTaxonomy <* Parsec.eof

And we can verify that this works in GHCi:

*Main> readTaxonomy "taxonomy.txt"
Taxonomy "tea" [Taxonomy "green" [Taxonomy "korean" [...
*Main>

Special thanks to Sam Anklesaria for writing the original package.

2. The interesting tea taxonomy can be found in this blogpost: https://jameskennedymonash.wordpress.com/mind-maps/amazing-tea-taxonomy/.

December 20, 2016

Gabriel Gonzalez

Dhall - A non-Turing-complete configuration language

I'm releasing a new configuration language named Dhall with Haskell bindings. Even if you don't use Haskell you might still find this language interesting. This language started out as an experiment to answer common objections to programmable configuration files. Almost all of these objections are, at their root, criticisms of Turing-completeness. For example, people commonly object that configuration files should be easy to read, but they descend into unreadable spaghetti if you make them programmable. However, Dhall doesn't have this problem because Dhall is a strongly normalizing language, which means that we can reduce every expression to a standard normal form in a finite amount of time by just evaluating everything.
For example, consider this deliberately obfuscated configuration file:

$ cat config
    let zombieNames =
            [ "Rachel", "Gary", "Liz" ] : List Text

in  let isAZombie =
            \(name : Text) -> { name = name, occupation = "Zombie" }

in  let map =
            https://ipfs.io/ipfs/QmcTbCdS21pCxXysTzEiucDuwwLWbLUWNSKwkJVfwpy2zK/Prelude/List/map

in  let tag =
            map Text { name : Text, occupation : Text }

in  let zombies =
            tag isAZombie zombieNames

in  let policeNames =
            [ "Leon", "Claire" ] : List Text

in  let worksForPolice =
            \(name : Text) -> { name = name, occupation = "Police officer" }

in  let policeOfficers =
            tag worksForPolice policeNames

in  let concat =
            https://ipfs.io/ipfs/QmcTbCdS21pCxXysTzEiucDuwwLWbLUWNSKwkJVfwpy2zK/Prelude/List/concat

in  let characters =
            concat
            { name : Text, occupation : Text }
            (   [   zombies
                ,   policeOfficers
                ]   : List (List { name : Text, occupation : Text })
            )

in  {   protagonist =
            List/head { name : Text, occupation : Text } policeOfficers
    ,   numberOfCharacters =
            List/length { name : Text, occupation : Text } characters
    }

We can use the dhall compiler to cut through the indirection and reduce the above configuration file to the following fully evaluated normal form:

$ stack install dhall
$ dhall < config
{ numberOfCharacters : Natural, protagonist : Optional { name : Text, occupation : Text } }

{ numberOfCharacters = +5, protagonist = [{ name = "Leon", occupation = "Police officer" }] : Optional { name : Text, occupation : Text } }

The first line is the inferred type of the file, which we can format as:

{ numberOfCharacters : Natural
, protagonist        : Optional { name : Text, occupation : Text }
}

This says that our configuration file is a record with two fields:

• a field named numberOfCharacters that stores a Natural number (i.e. a non-negative number)
• a field named protagonist that stores an Optional record with a name and occupation

From this type alone, we know that no matter how complex our configuration file gets the program will always evaluate to a simple record. This type places an upper bound on the complexity of the program's normal form.

The second line is the actual normal form of our configuration file:

{ numberOfCharacters =
    +5
, protagonist =
    [ { name = "Leon", occupation = "Police officer" }
    ] : Optional { name : Text, occupation : Text }
}

In other words, our compiler cut through all the noise and gave us an abstraction-free representation of our configuration.

Total programming

You can evaluate configuration files written in other languages, too, but Dhall differentiates itself from other languages by offering several stronger guarantees about evaluation:

• Dhall is not Turing complete because evaluation always halts

You can never write a configuration file that accidentally hangs or loops indefinitely when evaluated

Note that you can still write a configuration file that takes longer than the age of the universe to compute, but you are much less likely to do so by accident

• Dhall is safe, meaning that functions must be defined for all inputs and can never crash, panic, or throw exceptions

• Dhall is sandboxed, meaning that the only permitted side effect is retrieving other Dhall expressions by their filesystem path or URL

There are examples of this in the above program where Dhall retrieves two functions from the Prelude by their URL

• Dhall's type system has no escape hatches

This means that we can make hard guarantees about an expression purely from the expression's type

• Dhall can normalize functions

For example, Dhall's Prelude provides a replicate function which builds a list by creating N copies of an element. Check out how we can normalize this replicate function before the function is even saturated:

$ dhall
let replicate = https://ipfs.io/ipfs/QmcTbCdS21pCxXysTzEiucDuwwLWbLUWNSKwkJVfwpy2zK/Prelude/List/replicate
in  replicate +10
<Ctrl-D>
∀(a : Type) → ∀(x : a) → List a

λ(a : Type) → λ(x : a) → [x, x, x, x, x, x, x, x, x, x] : List a

The compiler knows that no matter what element we provide for the final argument to replicate, the result must be 10 copies of that element in a list.

Types

Dhall is also a typed language, so every configuration file can be checked ahead of time against an expected schema. The schema can even live in a separate file, like this:

$ cat schema
{ numberOfCharacters : Natural
, protagonist        : Optional { name : Text, occupation : Text }
}

... and then checking our configuration against a schema is as simple as giving the configuration file a type annotation:

$ dhall
./config : ./schema
<Ctrl-D>
{ numberOfCharacters : Natural, protagonist : Optional { name : Text, occupation : Text } }

{ numberOfCharacters = +5, protagonist = [{ name = "Leon", occupation = "Police officer" }] : Optional { name : Text, occupation : Text } }

If the compiler doesn't complain, then that means that the configuration file type checks against our schema.

Haskell bindings

Dhall configuration files can be marshalled into Haskell data types. For example, the following Haskell program:

{-# LANGUAGE DeriveAnyClass    #-}
{-# LANGUAGE DeriveGeneric     #-}
{-# LANGUAGE OverloadedStrings #-}

import Dhall

data Summary = Summary
    { numberOfCharacters :: Natural
    , protagonist        :: Maybe Person
    } deriving (Generic, Interpret, Show)

data Person = Person
    { name       :: Text
    , occupation :: Text
    } deriving (Generic, Interpret, Show)

main :: IO ()
main = do
    x <- input auto "./config"
    print (x :: Summary)

... will marshal our config file into Haskell and print the corresponding Haskell representation of our configuration file:

$ stack runghc example.hs
Summary {numberOfCharacters = 5, protagonist = Just (Person {name = "Leon", occupation = "Police officer"})}

The Haskell program automatically checks that the configuration file's schema matches the data structures that we marshal into. The entire pipeline is type safe from end to end.

Imports

Dhall expressions can reference other expressions, either by their filesystem paths or by URLs. Anything can be imported, such as fields of records:

$ cat record
{ foo = 1.0, bar = ./bar }

$ cat bar
[1, 2, 3] : List Integer

$ dhall < record
{ bar = [1, 2, 3] : List Integer, foo = 1.0 }

... or types:

$ cat function
\(f : ./type ) -> f False

$ cat type
Bool -> Integer

$ dhall < function
∀(f : Bool → Integer) → Integer

λ(f : Bool → Integer) → f False

... or functions:

$ cat switch
\(b : Bool) -> if b then 2 else 3

$ dhall <<< "./function ./switch"
Integer

3

You can import URLs, too. The Dhall Prelude is hosted using IPFS (a distributed and immutable filesystem), and you can browse the Prelude here:

https://ipfs.io/ipfs/QmcTbCdS21pCxXysTzEiucDuwwLWbLUWNSKwkJVfwpy2zK/Prelude/

Anything from the Prelude can be used by just pasting the URL into your program:

$ dhall
https://ipfs.io/ipfs/QmcTbCdS21pCxXysTzEiucDuwwLWbLUWNSKwkJVfwpy2zK/Prelude/Natural/sum ([+2, +3, +5] : List Natural)
<Ctrl-D>
Natural

+10

... although usually you want to assign the URL to a shorter name for readability:

let sum = https://ipfs.io/ipfs/QmcTbCdS21pCxXysTzEiucDuwwLWbLUWNSKwkJVfwpy2zK/Prelude/Natural/sum
in  sum ([+2, +3, +5] : List Natural)

You're not limited to IPFS for hosting Dhall expressions. Any pastebin, web server, or Github repository that can serve raw UTF8 text can host a Dhall expression for others to use.

Error messages

Dhall outputs helpful error messages when things go wrong. For example, suppose that we change our type file to something that's not a type:

$ echo "1" > type
$ dhall <<< "./function ./switch"
Use "dhall --explain" for detailed errors

↳ ./function

f : 1

Error: Not a function

f False

function:1:19

By default Dhall gives a concise summary of what broke. The error message begins with a trail of breadcrumbs pointing to which file in your import graph is broken:

↳ ./function

In this case, the error is located in the ./function file that we imported. The next part of the error message is a context that prints the types of all values that are in scope:

f : 1

... which says that the only value in scope is named f, and that f has type 1 (uh oh!). The next part is a brief summary of what went wrong:

Error: Not a function

... which says that we are using something that's not a function. The compiler then prints the code fragment so we can see at a glance what is wrong with our code before we even open the file:

f False

The above fragment is wrong because f is not a function, but we tried to apply f to an argument. Finally, the compiler prints out the file, line, and column number so that we can jump to the broken code fragment and fix the problem:

function:1:19

This says that the problem is located in the file named function at row 1 and column 19.

Detailed error messages

But wait, there's more!
You might have noticed this line at the beginning of the error message:

Use "dhall --explain" for detailed errors

Let's add the --explain flag to see what happens:

$ dhall --explain <<< "./function ./switch"

↳ ./function

f : 1

Error: Not a function

Explanation: Expressions separated by whitespace denote function application,
like this:

    ┌─────┐
    │ f x │  This denotes the function ❰f❱ applied to an argument named ❰x❱
    └─────┘

A function is a term that has type ❰a → b❱ for some ❰a❱ or ❰b❱.  For example,
the following expressions are all functions because they have a function type:

                        The function's input type is ❰Bool❱
                        ⇩
    ┌───────────────────────────────┐
    │ λ(x : Bool) → x : Bool → Bool │  User-defined anonymous function
    └───────────────────────────────┘
                               ⇧
                               The function's output type is ❰Bool❱

                     The function's input type is ❰Natural❱
                     ⇩
    ┌───────────────────────────────┐
    │ Natural/even : Natural → Bool │  Built-in function
    └───────────────────────────────┘
                               ⇧
                               The function's output type is ❰Bool❱

                        The function's input kind is ❰Type❱
                        ⇩
    ┌───────────────────────────────┐
    │ λ(a : Type) → a : Type → Type │  Type-level functions are still functions
    └───────────────────────────────┘
                               ⇧
                               The function's output kind is ❰Type❱

             The function's input kind is ❰Type❱
             ⇩
    ┌────────────────────┐
    │ List : Type → Type │  Built-in type-level function
    └────────────────────┘
                    ⇧
                    The function's output kind is ❰Type❱

                        Function's input has kind ❰Type❱
                        ⇩
    ┌─────────────────────────────────────────────────┐
    │ List/head : ∀(a : Type) → (List a → Optional a) │  A function can return
    └─────────────────────────────────────────────────┘  another function
                                ⇧
                                Function's output has type ❰List a → Optional a❱

                       The function's input type is ❰List Text❱
                       ⇩
    ┌────────────────────────────────────────────┐
    │ List/head Text : List Text → Optional Text │  A function applied to an
    └────────────────────────────────────────────┘  argument can be a function
                                   ⇧
                                   The function's output type is ❰Optional Text❱

An expression is not a function if the expression's type is not of the form
❰a → b❱.  For example, these are not functions:

    ┌─────────────┐
    │ 1 : Integer │  ❰1❱ is not a function because ❰Integer❱ is not the type of
    └─────────────┘  a function

    ┌────────────────────────┐
    │ Natural/even +2 : Bool │  ❰Natural/even +2❱ is not a function because
    └────────────────────────┘  ❰Bool❱ is not the type of a function

    ┌──────────────────┐
    │ List Text : Type │  ❰List Text❱ is not a function because ❰Type❱ is not
    └──────────────────┘  the type of a function

You tried to use the following expression as a function:

↳ f

... but this expression's type is:

↳ 1

... which is not a function type

────────────────────────────────────────────────────────────────────────────────

f False

function:1:19

We get a brief language tutorial explaining the error message in excruciating detail. These mini-tutorials target beginners who are still learning the language and want to better understand what error messages mean.

Every type error has a detailed explanation like this and these error messages add up to ~2000 lines of text, which is ~25% of the compiler's code base.

Tutorial

The compiler also comes with an extended tutorial, which you can find here:

This tutorial is also ~2000 lines long or ~25% of the code base. That means that half the project is just the tutorial and error messages and that's not even including comments.

Design goals

Programming languages are all about design tradeoffs and the Dhall language uses the following guiding principles (in order of descending priority) that help navigate those tradeoffs:

• Polish

The language should delight users. Error messages should be fantastic, execution should be snappy, documentation should be excellent, and everything should "just work".

• Simplicity

When in doubt, cut it out. Every configuration language needs bindings to multiple programming languages, and the more complex the configuration language, the more difficult it is to create new bindings. Let the host language that you bind to compensate for any features missing from Dhall.

• Beginner-friendliness

Dhall needs to be a language that anybody can learn in a day and debug with little to no assistance from others. Otherwise people can't recommend Dhall to their team with confidence.

• Robustness

A configuration language needs to be rock solid. The last thing a person wants to debug is their configuration file. The language should never hang or crash. Ever.

• Consistency

There should only be one way to do something. Users should be able to instantly discern whether or not something is possible within the Dhall language.

The Dhall configuration language is also designed to negate many of the common objections to programmable configuration files, such as:

"Config files shouldn't be Turing complete"

Dhall is not Turing-complete. Evaluation always terminates, no exceptions

"Configuration languages become unreadable due to abstraction and indirection"

Every Dhall configuration file can be reduced to a normal form which eliminates all abstraction and indirection

"Users will go crazy with syntax and user-defined constructs"

Dhall is a very minimal programming language. For example: you cannot even compare strings for equality (yes, really). The language also forbids many other common operations in order to force users to keep things simple.

Conclusion

You can also contribute, file issues, or ask questions by visiting the project repository on Github:

And the Haskell library is hosted on Hackage here:

If you would like to contribute, you can try porting Dhall to bind to languages other than Haskell, so that Dhall configuration files can be used across multiple languages. I keep the compiler simple (less than ~4000 lines of code if you don't count error messages) so that people can port the language more easily.

Also, for people who are wondering, the language is named after a Dustman from the game Planescape: Torment who belongs to a faction obsessed with death (termination).

[xrzqvpap] Shaped text

Any text can have its spacing, punctuation, and capitalization altered to match that of another template text.  Not sure what this might be useful for, perhaps a red herring to throw off cryptanalysis.

Aaaaaaaa aaaaa aaaa aa aaa aaaaaaaaaa aa aaaaaaaaaaaaa aa aaaaaaaa, aa aaaaaaaaaaa aaa aaaa aaaaaaaa aaaaaaa; aa aaaaaaaaa aaa aaaaaaa aa aaaaaa, aa aa aaa aaaaa; aa aaa aaaaa aa aaa aaaaaa aaaaaaaaa aa aaaaaaaa, aaa aa aaaaaaaa aaa Aaaaaaaaaa aaa a aaaaaaa aa aaaaaaaaaa.

Below are 676 digits of the square root of 26, expressed in base 26.

Fcoyjobw xilky rtoh vk auh zedskehhup gj sijceekmbkyct pz rnryxgod, fn vmcwaflvtwm anx cwqa zdssfhnp biwmzqg; yz qrbaskigw gla ovaewrz ry paovvi, se yu yvz jcjua; ns kin naqgk rj zlo fkmawm snrnwchns mi peoywtwx, zpq eu ifpaaaag bop Pynztcaubp ivr u htdmmdf au tottjmkzaq.  Oyilmswk dtxzt ouiy pc ill plyhcjiebb ba udaeosjqkyrls fs pcttqana, tb vwynrdefmcu rdb ebyj tcvxvwhf wgazyya; ei bmylkbeso zcq xmbghrl pn yyhxkx, tm az edh rbwfn; wc tru zpdtx oi xuk omglbo puhfvtxdx fq cgyihcgo, pcn so vzhrshsk esk Oiejnnqqnf ktx x kdybkrh ct cfhmxagkfk.  Fjublacd zpdhh luam dy zvw bklkxxegju qj iuofdjhgezynz eu dkrqyeob, wg smpaotpxvmn vth mprl chxgblwh bcpnfdi; oy koqtnhiyg yas zoioawb xx uimdfq, ly fb lza vfntd; eo khi wbkef hc krj sywuyh ryudavaci uj zvtvsvqe, shl nc oaeofxuu rnr Xycfcathfy eal x ygwlkqx ns vrzchdfmgc.  Scthvbpv uy

The template text is the First Amendment, used previously, cycled as many times as necessary.

Source code in Haskell.  We create a "template" describing the pattern of capitalization and punctuation, then zip it with text, except we do not use zip, but instead unfold (unfoldr), because punctuation elements in the template do not consume any text, so zip does not quite work.
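The idea can be sketched in a few lines of Haskell (a reconstruction of the approach described above, not the original source; the names are mine):

```haskell
import Data.Char (isAlpha, isUpper, toLower, toUpper)
import Data.List (unfoldr)

-- A template element either marks a letter slot (recording its case) or
-- carries literal punctuation/whitespace that consumes no input text.
data Slot = Letter Bool   -- True = uppercase
          | Literal Char

toTemplate :: String -> [Slot]
toTemplate = map classify
  where classify c | isAlpha c = Letter (isUpper c)
                   | otherwise = Literal c

-- Reshape text to match the template. We use unfoldr rather than zip
-- because Literal slots emit a character without consuming any text.
shape :: [Slot] -> String -> String
shape template text = unfoldr step (template, filter isAlpha text)
  where
    step ([], _)                = Nothing
    step (Literal c : ts, cs)   = Just (c, (ts, cs))
    step (Letter _ : _, [])    = Nothing
    step (Letter up : ts, c:cs) =
      Just (if up then toUpper c else toLower c, (ts, cs))
```

For example, shape (toTemplate "Hello, world!") "abcdefghij" gives "Abcde, fghij!".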

The problem of reusable and composable specifications

It's not too hard to convince people that version bounds are a poor approximation for the particular API that we depend on. What do we mean when we say >= 1.0 && < 1.1? A version bound is a proxy for some set of modules and functions, with some particular semantics, that a library needs in order to be built. Version bounds are imprecise; what does a change from 1.0 to 1.1 mean? Clearly, we should instead write down the actual specification (either types or contracts) of what we need.

This all sounds like a good idea until you actually try to put it into practice, at which point you realize that version numbers had one great virtue: they're very short. Specifications, on the other hand, can get quite large: even just writing down the types of all the functions you depend on can take pages, let alone executable contracts describing more complex behavior. To make matters worse, the same function will be depended upon repeatedly; the specification must be provided in each case!

So we put on our PL hats and say, "Aha! What we need is a mechanism for reuse and composition of specifications. Something like... a language of specification!" But at this point, there is disagreement about how this language should work.

Specifications are code. If you talk to a Racketeer, they'll say, "Well, contracts are just code, and we know how to reuse and compose code!" You have primitive contracts to describe values, compose them together into contracts that describe functions, and then further compose these together to form contracts about modules. You can collect these contracts into modules and share them across your code.

There is one interesting bootstrapping problem: you're using your contracts to represent versions, but your contracts themselves live in a library, so should you version your contracts? Current thinking is that you shouldn't.

But maybe you shouldn't compose them the usual way. One of the things that stuck out to me when I was reading the frontmatter of Clojure's spec documentation is that map specs should be of keysets only, and how they deal with it.

The core principle of spec's design is that specifications for records should NOT take the form { name: string, age: int }. Instead, the specification is split into two pieces: a set of keys { name, age }, and a mapping from keys to specifications which, once registered, apply to all occurrences of a key in all map specifications. (Note that keys are all namespaced, so it is not some insane free-for-all in a global namespace.) The justification for this:

In Clojure we gain power by dynamically composing, merging and building up maps. We routinely deal with optional and partial data, data produced by unreliable external sources, dynamic queries etc. These maps represent various sets, subsets, intersections and unions of the same keys, and in general ought to have the same semantic for the same key wherever it is used. Defining specifications of every subset/union/intersection, and then redundantly stating the semantic of each key is both an antipattern and unworkable in the most dynamic cases.

Back to the land of types. Contracts can do all this because they are code, and we know how to reuse code. But in (non-dependently) typed languages, the language of types tends to be far more impoverished than the language of values. To take Backpack as an (unusually expressive) example, the only operations we can perform on signatures are to define them (with full definitions for types) and to merge them together. So Backpack signatures run headlong into the redundancy problem identified by spec: because the signature of a module includes the signatures of its functions, you end up having to repeat these function signatures whenever you write slightly different iterations of a module.

To adopt the Clojure model, you would have to write a separate signature per module (each in their own package), and then have users combine them together by adding a build-depends on every signature they wanted to use:

-- In Queue-push package
signature Queue where
data Queue a
push :: a -> Queue a -> Queue a

-- In Queue-pop package
signature Queue where
data Queue a
pop :: Queue a -> Maybe (Queue a, a)

-- In Queue-length package
signature Queue where
data Queue a
length :: Queue a -> Int

-- Putting them together (note that Queue is defined
-- in each signature; mix-in linking merges these
-- abstract data types together)
build-depends: Queue-push, Queue-pop, Queue-length


In our current implementation of Backpack, this is kind of insane: to write the specification for a module with a hundred methods, you'd need a hundred packages. The ability to concisely define multiple public libraries in a single package might help but this involves design that doesn't exist yet. (Perhaps the cure is worse than the disease. The package manager-compiler stratification rears its ugly head again!) (Note to self: signature packages ought to be treated specially; they really shouldn't be built when you instantiate them.)

Conclusions. A lot of my thinking here did not crystallize until I started reading about how dynamic languages like Clojure were grappling with the specification problem: I think this just goes to show how much we can learn by paying attention to other systems, even if their context is quite different. (If Clojure believed in data abstraction, I think they could learn a thing or two from how Backpack mix-in links abstract data declarations.)

In Clojure, the inability to reuse specs is a deal breaker which led them to spec's current design. In Haskell, the inability to reuse type signatures flirts on the edge of unusability: types are just short enough and copy-pasteable enough to be tolerable. Documentation for these types, less so; this is what led me down my search for better mechanisms for signature reuse.

Although Backpack's current design is "good enough" to get things done, I still wonder if we can't do something better. One tempting option is to allow for downstream signatures to selectively pick out certain functions from a larger signature file to add to their requirements. But if you require Queue.push, you had better also require Queue.Queue (without which, the type of push cannot even be stated: the avoidance problem); this could lead to a great deal of mystery as to what exactly is required in the end. Food for thought.

Option A vs B: Decision Time

Tomorrow Edinburgh City Council will decide between Options A and B for the East-West Cycle route, after deferring a decision last September.  Some of the recent coverage:
• A visualisation of Roseburn Option A (above).
• A comparison of road layout, current against Option A (below).
• A letter in the Edinburgh Evening News from East-West nemesis Pete Gregson.
• A letter from Transport Committee head Leslie Hinds, rebutting the previous letter.
• A two-page spread in the Edinburgh Evening News.
• A blog post from Daisy Narayanan of Sustrans.
Daisy's post hit the mark:
We have seen narratives that create an ‘us’ and ‘them’ – pitting ‘motorists’ against ‘cyclists’ against ‘pedestrians’. With such projects, it is hugely disheartening to see what should have been a force for positive change become a focus for anger. It is equally disheartening to see strong evidence and the policies of the Scottish Government which support a more active, greener Scotland being undermined by such opposition.

In darker moments, I have been tempted to draw parallels to the post-fact world that we seem to inhabit at present.
Option A: Think about the children

Let's decipher a thousand-year-old magic square

The Parshvanatha temple in Madhya Pradesh, India was built around 1,050 years ago. Carved at its entrance is this magic square:

The digit signs have changed in the past thousand years, but it's a quick and fun puzzle to figure out what they mean using only the information that this is, in fact, a magic square.

A solution follows. No peeking until you've tried it yourself!

There are 9 one-digit entries
and 7 two-digit entries
so we can guess that the entries are the numbers 1 through 16, as is usual, and the magic sum is 34. The appears in the same position in all the two-digit numbers, so it's the digit 1. The other digit of the numeral is , and this must be zero. If it were otherwise, it would appear on its own, as does for example the from or the from .

It is tempting to imagine that is 4. But we can see it's not so. Adding up the rightmost column, we get

+ + + =
+ 11 + + =
(10 + ) + 11 + + = 34,

so that must be an odd number. We know it isn't 1 (because is 1), and it can't be 7 or 9 because appears in the bottom row and there is no 17 or 19. So must be 3 or 5.

Now if were 3, then would be 13, and the third column would be

+ + + =
1 + + 10 + 13 = 34,

and then would be 10, which is too big. So must be 5, and this means that is 4 and is 8. ( appears only as a single-digit numeral, which is consistent with it being 8.)

The top row has

+ + + =
+ + 1 + 14 =
+ (10 + ) + 1 + 14 = 34

so that + = 9. only appears as a single digit and we already used 8 so must be 7 or 9. But 9 is too big, so it must be 7, and then is 2.

is the only remaining unknown single-digit numeral, and we already know 7 and 8, so is 9. The leftmost column tells us that is 16, and the last two entries, and are easily discovered to be 13 and 3. The decoded square is:

 7 12  1 14
 2 13  8 11
16  3 10  5
 9  6 15  4

I like that people look at the right-hand column and immediately see 18 + 11 + 4 + 8 but it's actually 14 + 11 + 5 + 4.

This is an extra-special magic square: not only do the ten rows, columns, and diagonals all add up to 34, so do all the four-cell subsquares, so do any four squares arranged symmetrically about the center, and so do all the broken diagonals that you get by wrapping around at the edges.
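These claims are easy to machine-check. Here is a short Haskell verification of the rows, columns, main diagonals, and the nine aligned 2×2 subsquares (the wrapped diagonals and the symmetric four-cell sets check out the same way):

```haskell
import Data.List (transpose)

square :: [[Int]]
square = [ [ 7, 12,  1, 14]
         , [ 2, 13,  8, 11]
         , [16,  3, 10,  5]
         , [ 9,  6, 15,  4] ]

-- Rows, columns, both main diagonals, and all aligned 2x2 subsquares.
groups :: [[Int]]
groups = square
      ++ transpose square
      ++ [ [square !! i !! i       | i <- [0..3]]
         , [square !! i !! (3 - i) | i <- [0..3]] ]
      ++ [ [ square !! r !! c,       square !! r !! (c + 1)
           , square !! (r + 1) !! c, square !! (r + 1) !! (c + 1) ]
         | r <- [0..2], c <- [0..2] ]

allMagic :: Bool
allMagic = all ((== 34) . sum) groups
```

Evaluating allMagic yields True.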

[ Addendum: It has come to my attention that the digit symbols in the magic square are not too different from the current forms of the digit symbols in the Gujarati script. ]

[ Addendum 20161217: The temple is not very close to Gujarat or to the area in which Gujarati is common, so I guess that the digit symbols in Indian languages have evolved in the past thousand years, with the Gujarati versions remaining closest to the ancient forms, or else perhaps Gujarati was spoken more widely a thousand years ago. I would be interested to hear about this from someone who knows. ]

Roseburn to Leith Walk A vs B: time to act!

On 2 August, I attended a meeting in Roseburn organised by those opposed to the new cycleway planned by the city. Local shopkeepers fear they will see a reduction in business, unaware this is a common cycling fallacy: study after study has shown that adding cycleways increases business, not the reverse, because pedestrians and cyclists find the area more attractive.

Feelings in Roseburn run strong. The locals don't trust the council: who can blame them after the fiasco over trams? But the leaders of the campaign are adept at cherry picking statistics, and, sadly, neither side was listening to the other.

On 30 August, the Edinburgh Council Transport and Environment Committee will decide between two options for the cycle route, A and B. Route A is direct. Route B goes round the houses, adding substantial time and rendering the whole route less attractive. If B is built, the opportunity to shift the area away from cars, to make it a more pleasant place to be and draw more business from those travelling by foot, bus, and cycle, goes out the window.

Locals like neither A nor B, but in a spirit of compromise the Transport and Environment Committee may opt for B. This will be a disaster, as route B will be far less likely to draw people out of their cars and onto their cycles, undermining Edinburgh's ambitious programme to attract more people to cycling before it even gets off the ground.

Investing in cycling infrastructure can make an enormous difference. Scotland suffers 2000 deaths per year due to pollution, and 2500 deaths per year due to inactivity. The original proposal for the cycleway estimates benefits of £14.5M over ten years (largely from improved health of those attracted to cycling) vs a cost of £5.7M, a staggering 2.5x return on investment. Katie Cycles to School is a brilliant video from Pedal on Parliament that drives home how investment in cycling will improve lives for cyclists and non-cyclists alike.

Want more detail? Much has been written on the issues.
Roseburn Cycle Route: Evidence-based local community support.
Conviction Needed.

The Transport Committee will need determination to carry the plan through to a successful conclusion. This is make or break: will Edinburgh be a city for cars or a city for people? Please write to your councillors and the transport and environment committee to let them know your views.

Roseburn to Leith Walk

Subsequently:
Ride the Route in support of Option A
Option A: Think about the children

New XML Parser, Hexml

Summary: I've released a new Haskell library, Hexml, which is an incomplete-but-fast XML parser.

I've just released Hexml, a new C/Haskell library for DOM-style XML parsing that is fast, but incomplete. To unpack that a bit:

• Hexml is an XML parser: you give it a string representing an XML document, it parses that string, and it returns either a parse error or a representation of that document. Once you have the document, you can get the child nodes/attributes, walk around the document, and extract the text.

• Hexml is really a C library, which has been designed to be easy to wrap in Haskell, and then a Haskell wrapper on top. It should be easy to use Hexml directly from C if desired.

• Hexml has been designed for speed. In the very limited benchmarks I've done it is typically just over 2x faster at parsing than Pugixml, where Pugixml is the gold standard for fast XML DOM parsers. In my uses it has turned XML parsing from a bottleneck to an irrelevance, so it works for me.

• To gain that speed, Hexml cheats. Primarily it doesn't do entity expansion, so &amp; remains as &amp; in the output. It also doesn't handle CDATA sections (but that's because I'm lazy), and comment locations are not remembered. It also doesn't deal with most of the XML standard, ignoring the DOCTYPE stuff.

If you want a more robust version of Hexml then the Haskell pugixml binding on Hackage is a reasonable place to start, but be warned that it has memory issues that can cause segfaults. It also requires C++, which makes use through GHCi more challenging.

Speed techniques

To make Hexml fast I first read the chapter on fast parsing with Pugixml, and stole all those techniques. After that, I introduced a number of my own.

• I only work on UTF8, which, for the bits of UTF8 I care about, is the same as ASCII - I don't need to do any character decoding.

• Since I don't do entity expansion, all strings are available in the source document, so everything simply provides offsets into the input string. In the Haskell API I use constant-time bytestring slices into the source string to present a nice API.

• The memory model for a document is an array of attributes, an array of nodes, and a root node from the list of nodes. To make sure that scanning a document is fast, each node describes their attributes and direct child nodes in terms of a start and length within the attribute and node arrays. For example, the root node might have attributes 1..5 in the attribute array, and direct children 4..19 in the node array. When scanning the child nodes there are no linked-list operations and everything is cache friendly.

• To keep the memory compact for attributes, I just have an array and reallocate/copy as necessary. By always doubling the number of attributes on exhaustion I ensure a worst-case of 1-copy per attribute on average.

• To keep the memory compact for nodes is a bit more complex, as the direct child nodes are not necessarily allocated consecutively, since child nodes may themselves have child nodes. The solution is to have an array of nodes, with contiguous allocation of used child nodes starting at the beginning. To ensure the child nodes are contiguous I first put the nodes at the end of the array, then copy them after a child is complete -- in effect using the end of the array as a stack. By always doubling the number of nodes on exhaustion I ensure a worst-case of 2-copies per node on average.

• When parsing the text in the body of a document, since I don't care about &, the only character that is of any interest is <. That allows me to process much of the document with the highly-optimised memchr.

• I initially allocate a single buffer that contains the document, a small number of attributes and a small number of nodes, in a single call to malloc. If more attributes/nodes are required they allocate a fresh buffer and just ignore the initially provided one. That ensures that for small documents they don't pay for multiple malloc calls, at the cost of wasting the initial attribute/node allocation on larger documents (which are more memory heavy anyway - so it doesn't matter).

• I'm pretty sure Hexml could be optimised further. Specifically, I have a recursive descent parser, and it should be a single function with goto. I also process some characters multiple times, mostly to ensure predictable abstraction barriers around the parsing functions, but that could be eliminated with a goto-based approach.
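The offset-range memory model described above can be sketched in Haskell (the field names are mine for illustration; Hexml's real representation is flat C arrays, not lists):

```haskell
-- Each node refers to its attributes and direct children as
-- (start, length) ranges into two shared arrays, so walking the
-- children is a contiguous, cache-friendly scan with no pointer chasing.
data Attr = Attr
  { attrName  :: (Int, Int)  -- (offset, length) into the source string
  , attrValue :: (Int, Int)
  } deriving (Eq, Show)

data Node = Node
  { nodeName   :: (Int, Int)  -- (offset, length) into the source string
  , attrRange  :: (Int, Int)  -- (start, length) into the attribute array
  , childRange :: (Int, Int)  -- (start, length) into the node array
  } deriving (Eq, Show)

data Document = Document
  { docAttrs :: [Attr]  -- conceptually flat arrays (C arrays in Hexml)
  , docNodes :: [Node]
  , docRoot  :: Node
  }

-- Listing a node's children is just a linear slice of the node array.
childrenOf :: Document -> Node -> [Node]
childrenOf doc node =
    let (start, len) = childRange node
    in take len (drop start (docNodes doc))

-- Attributes work the same way against the attribute array.
attributesOf :: Document -> Node -> [Attr]
attributesOf doc node =
    let (start, len) = attrRange node
    in take len (drop start (docAttrs doc))
```

Because each node's children occupy one contiguous slice of the node array, traversal never follows linked-list pointers.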

Installing the Haskell Network library on Windows

Summary: This post describes how to install the Haskell network library on Windows, again.

I recently bought a new computer, and tried to install GHC 8.0.1 then upgrade the network library using Cabal. As I have come to expect, it didn't work. Using Git Bash, I got the error:

$ cabal install network-2.6.3.1
Resolving dependencies...
Configuring network-2.6.3.1...
Failed to install network-2.6.3.1
Build log ( C:\Users\Neil\AppData\Roaming\cabal\logs\network-2.6.3.1.log ):
Configuring network-2.6.3.1...
configure: WARNING: unrecognized options: --with-compiler
checking for gcc... C:\ghc\GHC-80~1.1┼║
checking whether the C compiler works... no
configure: error: in `C:/Neil':
configure: error: C compiler cannot create executables
See `config.log' for more details
cabal: Leaving directory '.'
cabal.exe: Error: some packages failed to install:
old-time-1.1.0.3 failed during the configure step. The exception was:
ExitFailure 77

Running with -v3 shows the CC variable is being set to C:\ghc\GHC-80~1.1┼║, which looks like a buffer corruption or encoding issue. I tried my previous solution, but it didn't work. My new solution is:

$ cabal unpack network-2.6.3.1
$ cd network-2.6.3.1
$ cabal configure
... fails with a similar error to above ...
$ sh ./configure
$ cabal build
$ cabal copy
$ cabal register

I had to repeat the same process for the latest version of old-time, and it worked there too.

Another way that works is to use Stack.

Another Git catastrophe cleaned up

My co-worker X had been collaborating with a front-end designer on a very large change, consisting of about 406 commits in total. The sum of the changes was to add 18 new files of code implementing the back end of the new system, and also to implement the front end: a multitude of additions to both new and already-existing files. Some of the 406 commits modified just the 18 back-end files, some modified just the front-end files, and many modified both.

X decided to merge and deploy just the back-end changes, and then, once that was done and appeared successful, to merge the remaining front-end changes.

His path to merging the back-end changes was unorthodox: he checked out the current master, and then, knowing that the back-end changes were isolated in 18 entirely new files, did

    git checkout topic-branch -- new-file-1 new-file-2 … new-file-18


He then added the 18 files to the repo, committed them, and published the resulting commit on master. In due course this was deployed to production without incident.

The next day he wanted to go ahead and merge the front-end changes, but he found himself in “a bit of a pickle”. The merge didn't go forward cleanly, perhaps because of other changes that had been made to master in the meantime. And trying to rebase the branch onto the new master was a complete failure. Many of those 406 commits included various edits to the 18 back-end files that no longer made sense now that the finished versions of those files were in the master branch he was trying to rebase onto.

So the problem is: how to land the rest of the changes in those 406 commits, preferably without losing the commit history and messages.

The easiest strategy in a case like this is usually to go back in time: if the problem was caused by the unorthodox checkout-add-commit, then reset master to the point before that happened and try doing it a different way. That strategy wasn't available because X had already published the master with his back-end files, and a hundred other programmers had copies of them.

The way I eventually proceeded was to rebase the 406-commit work branch onto the current master, but to tell Git meantime that conflicts in the 18 back-end files should be ignored, because the version of those files on the master branch was already perfect.

Merge drivers

There's no direct way to tell Git to ignore merge conflicts in exactly 18 files, but there is a hack you can use to get the same effect. The repo can contain a .gitattributes file that lets you specify certain per-file options. For example, you can use .gitattributes to say that the files in a certain directory are text, that when they are checked out the line terminators should be converted to whatever the local machine's line terminator convention is, and they should be converted back to NLs when changes are committed.

Some of the per-file attributes control how merge conflicts are resolved. We were already using this feature for a certain frequently-edited file that was a list of processes to be performed in a certain order:

 do A
then do B


Often different people would simultaneously add different lines to the end of this file:

 # Person X's change:
do A
then do B
then do X


 # Person Y's change:
do A
then do B
then do Y


X would land their version on master and later there would be a conflict when Y tried to land their own version:

 do A
then do B
<<<<<<<
then do X
=======
then do Y
>>>>>>>


Git was confused: did you want new line X or new line Y at the end of the file, or both, and if both then in what order? But the answer was always the same: we wanted both, X and then Y, in that order:

 do A
then do B
then do X
then do Y


With the merge attribute set to union for this file, Git automatically chooses the correct resolution.
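
Concretely, the union behaviour comes from a single line in .gitattributes; the filename here is illustrative, since the post doesn't name the actual file:

    process-list.txt merge=union
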

So, returning to our pickle, I wanted to set the merge attribute for the 18 back-end files to tell Git to always choose the version already in master, and always ignore the changes from the branch I was merging.

There is not exactly a way to do this, but the mechanism that is provided is extremely general, and it is not hard to get it to do what we want in this case.

The merge attribute in .gitattributes specifies the name of a “driver” that resolves merge conflicts. The driver can be one of a few built-in drivers, such as the union driver I just described, or it can be the name of a user-supplied driver, configured in .gitconfig. The first step is to use .gitattributes to tell Git to use our private, special-purpose driver for the 18 back-end files:

            new-file-1 merge=ours
new-file-2 merge=ours
…
new-file-18 merge=ours


(The name ours here is completely arbitrary. I chose it because its function was analogous to the -s ours and -X ours options of git-merge.)

Then we add a section to .gitconfig to say what the ours driver should do:

   [merge "ours"]
name = always prefer our version to the one being merged
driver = true


The name is just a human-readable description and is ignored by Git. The important part is the deceptively simple-appearing driver = true line. The driver is actually a command that is run when there is a merge conflict. The command is run with the names of three files containing different versions of the target file: the main file being merged into, and temporary files containing the version with the conflicting changes and the common ancestor of the first two files. It is the job of the driver command to examine the three files, figure out how to resolve the conflict, and modify the main file appropriately.

In this case merging the two or three versions of the file is very simple. The main version is the one on the master branch, already perfect. The proposed changes are superfluous, and we want to ignore them. To modify the main file appropriately, our merge driver command needs to do exactly nothing. Unix helpfully provides a command that does exactly nothing, called true, so that's what we tell Git to use to resolve merge conflicts.
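
For completeness: a driver that did need to inspect the three versions would receive them via placeholders in the command string. Git replaces %O, %A and %B with the paths of the ancestor version, our version, and their version; the driver must leave its result in the %A file and exit with status 0 to signal a clean merge. A hypothetical driver that always took the incoming version (not something we needed here, since true ignores its arguments) would look like:

    [merge "theirs"]
    name = hypothetical driver that always takes the incoming version
    driver = cp %B %A
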

With this configured, and the changes to .gitattributes checked in, I was able to rebase the 406-commit topic branch onto the current master. There were some minor issues to work around, so it was not quite routine, but the problem was basically solved and it wasn't a giant pain.

I didn't actually use git-rebase

I should confess that I didn't actually use git-rebase at this point; I did it semi-manually, by generating a list of commit IDs and then running a loop that cherry-picked them one at a time:

 tac /tmp/commit-ids |
 while read commit; do
     git cherry-pick $commit || break
 done

I don't remember why I thought this would be a better idea than just using git-rebase, which is basically the same thing. (Superstitious anxiety, perhaps.) But I think the process and the result were pretty much the same. The main drawback of my approach is that if one of the cherry-picks fails, and the loop exits prematurely, you have to hand-edit the commit-ids file before you restart the loop, to remove the commits that were successfully picked.

Also, it didn't work on the first try

My first try at the rebase didn't quite work. The merge driver was working fine, but some of the commits it wanted to merge modified only the 18 back-end files and nothing else. Then there were merge conflicts, which the merge driver said to ignore, so that the net effect of the merged commit was to do nothing. But git-rebase considers that an error, saying something like

    The previous cherry-pick is now empty, possibly due to conflict resolution.
    If you wish to commit it anyway, use:

        git commit --allow-empty

and stopping to wait for manual confirmation. Since 140 of the 406 commits modified only the 18 perfect files, I was going to have to intervene manually 140 times.

I wanted an option that told git-cherry-pick that empty commits were okay and should just be ignored entirely, but that option isn't in there. There is something almost as good, though: you can supply --keep-redundant-commits, and instead of failing it will go ahead and create commits that make no changes. So I ended up with a branch with 406 commits, of which 140 were empty. Then a second git-rebase eliminated them, because the default behavior of git-rebase is to discard empty commits. I would have needed that final rebase anyway, because I had to throw away the extra commit I added at the beginning to check in the changes to the .gitattributes file.
A few conflicts remained

There were three or four remaining conflicts during the giant rebase, all resulting from the following situation: some of the back-end files were created under different names, edited, and later moved into their final positions. The commits that renamed them had unresolvable conflicts: the commit said to rename A to B, but to Git's surprise B already existed with different contents. Git quite properly refused to resolve these itself. I handled each of these cases manually by deleting A.

I made this up as I went along

I don't want anyone to think that I already had all this stuff up my sleeve, so I should probably mention that there was quite a bit of this I didn't know beforehand. The merge driver stuff was all new to me, and I had to work around the empty-commit issue on the fly. Also, I didn't find a working solution on the first try; this was my second idea. My notes say that I thought my first idea would probably work but that it would have required more effort than what I described above, so I put it aside, planning to take it up again if the merge driver approach didn't work. I forget what the first idea was, unfortunately.

Named commits

This is a minor, peripheral technique which I think is important for everyone to know, because it pays off far out of proportion to how easy it is to learn.

There were several commits of interest that I referred to repeatedly while investigating and fixing the pickle. In particular:

• The last commit on the topic branch
• The first commit on the topic branch that wasn't on master
• The commit on master from which the topic branch diverged

Instead of trying to remember the commit IDs for these I just gave them mnemonic names with git-branch: last, first, and base, respectively. That enabled commands like

    git log base..last …

which would otherwise have been troublesome to construct.

Civilization advances by extending the number of important operations which we can perform without thinking of them.
When you're thinking "okay, now I need to rebase this branch" you don't want to derail the train of thought to remember where the bottom of the branch is every time. Being able to refer to it as first is a big help.

Other approaches

After it was all over I tried to answer the question "What should X have done in the first place to avoid the pickle?" But I couldn't think of anything, so I asked Rik Signes. Rik immediately said that X should have used git-filter-branch to separate the 406 commits into two branches, branch A with just the changes to the 18 back-end files and branch B with just the changes to the other files. (The two branches together would have had more than 406 commits, since a commit that changed both back-end and front-end files would be represented in both branches.) Then he would have had no trouble landing branch A on master and, after it was deployed, landing branch B.

At that point I realized that git-filter-branch also provided a less peculiar way out of the pickle once we were in: instead of using my merge driver approach, I could have filtered the original topic branch to produce just branch B, which would have rebased onto master just fine.

I was aware that git-filter-branch was not part of my personal toolkit, but I was unaware of the extent of my unawareness. I would have hoped that even if I hadn't known exactly how to use it, I would at least have been able to think of using it. I plan to set aside an hour or two soon to do nothing but mess around with git-filter-branch so that next time something like this happens I can at least consider using it.

It occurred to me while I was writing this that it would probably have worked to make one commit on master to remove the back-end files again, and then rebase the entire topic branch onto that commit. But I didn't think of it at the time. And it's not as good as what I did do, which left the history as clean as was possible at that point.
I think I've written before that this profusion of solutions is the sign of a well-designed system. The tools and concepts are powerful, and can be combined in many ways to solve many problems that the designers didn't foresee.

December 10, 2016

Douglas M. Auclair (geophf)

October 2016 1Liner 1HaskellADay problem and solutions

• October 21st, 2016: You have l1 :: [(v, [(k, x)])] and you need the transformation l2 :: [(k, [(v, x)])]. Redistribute v and k in one line. Props for elegance.

Francisco T @aiceou:

    redist xs = fromListWith (++) $ concat $ (map f xs) where f (a,ys) = map (\(x,y) -> (x,[(a,y)])) ys

... but k has to be 'Ord'

Philip Wadler

Do you have Q?

A study conducted at Northeastern analyses the factors that contribute to success in science. Age is not one of them.

The research team began by focusing on career physicists. It ransacked the literature going back to 1893, identifying 2,856 physicists with careers of 20 years or more who published at least one paper every five years — widely cited findings rated as "impact" papers — and the team analyzed when in a career those emerged. ... [K]eeping productivity equal, the scientists were as likely to score a hit at age 50 as at age 25. The distribution was random; choosing the right project to pursue at the right time was a matter of luck.

Yet turning that fortuitous choice into an influential, widely recognized contribution depended on another element, one the researchers called Q. Q could be translated loosely as "skill," and most likely includes a broad variety of factors, such as I.Q., drive, motivation, openness to new ideas and an ability to work well with others. Or, simply, an ability to make the most of the work at hand: to find some relevance in a humdrum experiment, and to make an elegant idea glow.

"This Q factor is so interesting because it potentially includes abilities people have but may not recognize as central," said Zach Hambrick, a professor of psychology at Michigan State University.
"Clear writing, for instance. Take the field of mathematical psychology. You may publish an interesting finding, but if the paper is unreadable, as so many are, you can't have wide impact because no one understands what you're talking about."

Benedict Carey, New York Times, When It Comes to Success, Age is Just a Number.

December 08, 2016

Mark Jason Dominus

Ysolo has been canceled

An earlier article discussed how I discovered that a hoax item in a Wikipedia list had become the official name of a mountain, Ysolo Mons, on the planet Ceres. I contacted the United States Geological Survey to point out the hoax, and on Wednesday I got the following news from their representative:

Thank you for your email alerting us to the possibility that the name Ysolo, as a festival name, may be fictitious. After some research, we agreed with your assessment. The IAU and the Dawn Team discussed the matter and decided that the best solution was to replace the name Ysolo Mons with Yamor Mons, named for the corn/maize festival in Ecuador. The WGPSN voted to approve the change. Thank you for bringing the matter to our attention.

("WGPSN" is the IAU's Working Group for Planetary System Nomenclature. Here's their official announcement of the change, the USGS record of the old name and the USGS record of the new name.)

This week we cleaned up a few relevant Wikipedia articles, including one on Italian Wikipedia, and Ysolo has been put to rest. I am a little bit sad to see it go. It was fun while it lasted. But I am really pleased about the outcome. Noticing the hoax, following it up, and correcting the name of this mountain is not a large or an important thing, but it's a thing that very few people could have done at all, one that required my particular combination of unusual talents. Those opportunities are seldom.

[ Note: The USGS rep wishes me to mention that the email I quoted above is not an official IAU communication.
]

December 07, 2016

FP Complete

Concurrency and Node

<html>

Example code can be found here.

When Node.JS first came onto the scene it successfully popularized the event-loop. Ryan Dahl correctly identified a serious problem with the way that I/O is generally handled in concurrent environments. Many web servers, for example, achieve concurrency by creating a new thread for every connection. On most platforms, this comes at a substantial cost. The default stack size in Java is 512KB, which means that if you have 1000 concurrent connections, your program will consume half a gigabyte of memory just for stack space. Additionally, forking threads in most systems costs an enormous amount of time, as does performing a context switch between two threads.

To address these issues, Node.JS uses a single thread with an event-loop. In this way, Node can handle 1000s of concurrent connections without any of the traditional detriments associated with threads. There is essentially no memory overhead per-connection, and there is no context switching. When compared to traditional threading architectures, this is exactly the right approach to handling concurrency. In fact, Erlang's OTP libraries function in a very similar way. Each actor has a queue of messages, and only processes a single message at a time. Only after fully processing one message does it move on to the next.

However, this programming model imposes some extreme costs when it comes to the legibility of code. Consider the following two examples.

    var request = require('request');

    request('http://example.com/random-number', function (error, response, body) {
        if (!error && response.statusCode === 200) {
            console.log(JSON.parse(body));
        }
    });

    var request = require('request');

    var response = request('http://example.com/random-number');
    if (response.statusCode === 200) {
        console.log(JSON.parse(response.body));
    }

In the first example we see what an HTTP request would typically look like in Node.
In the second we see what it could be like if Node had threads. As a direct result of the event-loop, we have lost the ability to express a program as a linear series of steps. This becomes even more important when we need to make many I/O calls in the same function:

    request('http://example.com/random-number', function(error, response1, body) {
        request('http://example.com/random-number', function(error, response2, body) {
            request('http://example.com/random-number', function(error, response3, body) {
                ...
            });
        });
    });

versus:

    var response1 = request('http://example.com/random-number');
    var response2 = request('http://example.com/random-number');
    var response3 = request('http://example.com/random-number');

This leads to the 'callback hell' that we all know and hate. Part of the bill-of-goods we accept when using Node is that in exchange for better time and space characteristics, we lose the thread as an abstraction.

What is the value of a thread?

It is important to keep in mind that a thread isn't just a primitive for dealing with concurrency. It is also a powerful abstraction that makes the existence of latency invisible to the developer.

    {-# LANGUAGE OverloadedStrings #-}

    import Data.Aeson
    import Network.HTTP.Simple
    import Network.HTTP.Types.Status

    main :: IO ()
    main = do
        res <- httpJSON "http://httpbin.org/get"
        if getResponseStatus res == status200
            then print $ (getResponseBody res :: Value)
            else error $ show $ getResponseStatus res

Looking at the code above we notice a characteristic difference when compared to the asynchronous JavaScript examples. There are no callbacks. When we say res <- httpJSON ..., the program will halt until we get a response back from the server. We have written a piece of code that is linear, from step to step, and the underlying system knows that the server on the other end will take time to respond and handles it accordingly.

This is possible because a thread keeps track of the state belonging to an ongoing computation. By tracking that state, a thread can halt and resume a computation arbitrarily. In the event of I/O, a thread will halt a computation while the I/O is occurring, and only resume when a response comes back, giving your program the appearance of having no delay.
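
To make that concrete, here is a minimal sketch (ours, not from the article's example code) that simulates two slow I/O calls with threadDelay; each call is written as plain sequential code, yet the async library's concurrently runs them at the same time:

```haskell
import Control.Concurrent (threadDelay)
import Control.Concurrent.Async (concurrently)

-- Simulate a slow I/O action: block for 0.1s, then return a result.
-- While this thread is halted, the run-time system runs other threads.
fetchSquare :: Int -> IO Int
fetchSquare n = do
    threadDelay 100000
    pure (n * n)

main :: IO ()
main = do
    -- Both simulated requests are in flight at once, but each one
    -- still reads as a linear series of steps.
    (a, b) <- concurrently (fetchSquare 6) (fetchSquare 7)
    print (a + b)  -- prints 85
```

The whole program takes roughly one delay's worth of wall-clock time rather than two, because neither thread holds up the other while blocked.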

A thread, therefore, is both a way of running multiple computations concurrently and also accounting for the asynchronous nature of I/O. By not exposing this abstraction to the developer, two huge drawbacks exist in Node as a concurrency-oriented platform.

In Node.JS concurrency can only occur when adjacent to I/O

Node.JS only exposes one primary thread to a program. While it may spawn new threads under the hood to deal with input buffers, you don't have any control over them. As a result, you cannot write a program that takes advantage of multiple cores in a machine. The result is that your program will only be able to perform two actions concurrently if at least one of them is bounded by I/O.

To demonstrate, we have set up an example that you can download alongside this article; it is linked at the top of the article. Once you have downloaded the example, look in the README.md for the "Local vs. Remote Test" instructions.

We start by defining a web server that supports a slow operation: in our case, generating Fibonacci numbers with the most naive implementation. Then we define two tests in Node.JS and two tests in Haskell, using each language's native async libraries to attempt to acquire multiple Fibonacci numbers concurrently. In each language, one test times how long it takes to concurrently request two Fibonacci numbers from our web server; the other times how long it takes to do the same operation locally. All tests are also run without the async library for comparison.

Test Name        | Without Async Time | With Async Time | Async / No Async
Node - Local     | 6.439s             | 6.306s          | 0.979
Node - Remote    | 4.391s             | 2.500s          | 0.569
Haskell - Local  | 3.849s             | 2.487s          | 0.646
Haskell - Remote | 4.117s             | 2.521s          | 0.612

Taking a look at the first row in our table, we find that Node.JS when attempting to concurrently compute two Fibonacci numbers is unable to give us any time savings in spite of the tests being run on a multi-core machine.

    return async.parallel([
(callback) => {
x = getFibs(43);
callback();
},
(callback) => {
y = getFibs(44);
callback();
}
]).then(() => {

Even when the functions to generate numbers are run inside async.parallel they are unable to run concurrently. This is because the Node.JS event-loop only allows one callback to be executed at any given time. So instead of running the two callbacks defined above in parallel, they are run sequentially. If we make I/O calls inside our callbacks, as we do in the Node - Remote test, the system is able to issue both requests to a web server concurrently.

    return async.parallel([
(callback) => {
request('http://localhost:8080/api/fibs/43')
.then((xStr) => {
x = JSON.parse(xStr).fibResult;
callback();
});
},
(callback) => {
request('http://localhost:8080/api/fibs/44')
.then((yStr) => {
y = JSON.parse(yStr).fibResult;
callback();
});
}
]).then(() => {

In Node.JS concurrent computations starve each other of resources

This inability to handle multiple computations at the same time hampers Node.JS even in its intended use case as a web server. The phenomenon is referred to as "starvation" and we can demonstrate it by reversing our previous tests.

To run the demonstration code, download the example project linked at the top of the article. Then look in the README.md for instructions on the "Starvation Test."

In this demonstration we set up a web server in each language. The web server contains two routes. The first route is a very fast computation, calculating the square of the input number. The second route is a very slow computation, calculating the Nth Fibonacci number based on the input. We then constructed a program that could perform two tests against each web server.

In the first test, labeled "Fast Only," the test program counts the number of requests each server can respond to in 5 seconds, using only the fast route. In the second test, labeled "Fast with Slow," the test program makes one request to the slow route, then counts how many requests to the fast route the server can respond to in 5 seconds.

To make this demonstration as compelling as possible, we have also disabled threading in the Haskell run-time system on the Haskell web server. The result is that the Haskell server will only be able to use its internal thread model. It will not be able to create additional operating system threads, and it will not be able to take advantage of more than one core.

Test Name                | Request Throughput (higher is better)
Node - Fast Only         | 572
Node - Fast with Slow    | 114
Haskell - Fast Only      | 591
Haskell - Fast with Slow | 589

The results are quite striking. From Haskell's baseline of 591, adding a background computation reduced the throughput by only 2 requests, which is likely smaller than our margin of error. Node.JS, on the other hand, was reduced from its baseline to one-fifth of its capacity.

The difference here is stark because in Node.JS's execution model, the moment it receives a request on the slow route it must fully complete the computation for that route before it can begin on a new request. There is no I/O action taking place which will allow the server to jump to processing a new request.

In contrast, Haskell can preempt threads mid-execution, which means that when the Haskell server is chugging along on the slow request and a request on the fast route comes in, it can jump over and process it, then return to the slow request while the network is delivering the fast request's response back to the client.

When Ryan Dahl originally presented Node, his biggest concern with threaded systems was their memory consumption. He pointed to an example of Apache versus Nginx to show that Nginx had a massively lighter footprint because of its event-loop architecture (whereas Apache used one thread per connection). The difference in memory usage was at least an order of magnitude as the connection count increased.

To demonstrate that Haskell's threading architecture does not suffer this problem, we have constructed one final example, again available at the repository linked above. Look in the README.md for "Thread Spawn".

In this example we create 100,000 threads, each with a 'start' and a 'stop' MVar (an MVar is Haskell's equivalent of a locked variable). We then issue a start command on each start MVar, wait three seconds, and finally issue a stop command on each stop MVar.
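
A scaled-down sketch of that structure (1,000 threads rather than 100,000, no three-second pause, and with each thread acknowledging on its stop MVar so the program knows when everyone is done; the names worker and fetchSquare-style helpers are ours, not the benchmark's):

```haskell
import Control.Concurrent (MVar, forkIO, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM, forM_)

-- Each thread blocks on its start MVar, then signals its stop MVar.
worker :: MVar () -> MVar () -> IO ()
worker start stop = do
    takeMVar start
    putMVar stop ()

main :: IO ()
main = do
    pairs <- forM [1 .. 1000 :: Int] $ \_ -> do
        start <- newEmptyMVar
        stop  <- newEmptyMVar
        _ <- forkIO (worker start stop)
        pure (start, stop)
    forM_ pairs $ \(start, _) -> putMVar start ()  -- issue the start commands
    forM_ pairs $ \(_, stop) -> takeMVar stop      -- wait for every thread
    putStrLn "all threads finished"
```

Each green thread here costs only a small heap allocation, which is where the low residency numbers below come from; a real run, like the benchmark's, would compile with -threaded.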

When we run the program, we do so with threading enabled and with the -s option on the run-time system, which gives us the following output.

stack exec -- haskell-thread-spawn +RTS -sout
cat out
137,593,048 bytes allocated in the heap
405,750,472 bytes copied during GC
77,103,392 bytes maximum residency (11 sample(s))
10,891,144 bytes maximum slop
165 MB total memory in use (0 MB lost due to fragmentation)

Tot time (elapsed)  Avg pause  Max pause
Gen  0       104 colls,   104 par    0.332s   0.082s     0.0008s    0.0186s
Gen  1        11 colls,    10 par    0.620s   0.147s     0.0134s    0.0425s

Parallel GC work balance: 28.08% (serial 0%, perfect 100%)

TASKS: 18 (1 bound, 17 peak workers (17 total), using -N8)

SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

INIT    time    0.000s  (  0.001s elapsed)
MUT     time    0.312s  (  3.110s elapsed)
GC      time    0.952s  (  0.229s elapsed)
EXIT    time    0.000s  (  0.003s elapsed)
Total   time    1.316s  (  3.343s elapsed)

Alloc rate    441,003,358 bytes per MUT second

Productivity  27.7% of total user, 10.9% of total elapsed

gc_alloc_block_sync: 104599
whitehole_spin: 0
gen[0].sync: 504
gen[1].sync: 2465

Looking near the top of the output, we see that Haskell's run-time system was able to create 100,000 threads while only using 165 megabytes of memory. We are roughly consuming 1.65 kilobytes per thread. So an average web server, which might see 2,000 concurrent connections at the top end, would expect to use only 3 megabytes of memory for being multi-threaded.

Node and Haskell both offer a solution to the traditional woes of using threading in a web server. They both allow you to deal with very high numbers of concurrent connections without a huge memory overhead. However, Haskell doesn't impose an additional burden on the design of your software to accomplish that goal. You don't have to worry about routes that take varying amounts of time. You don't have to dump long-running tasks off to an event queue. You don't have to spawn multiple processes to utilize multiple cores of a machine.

We've only just barely scratched the surface of what you can do with Haskell. If you're interested in learning more, please check out our Haskell Syllabus for a recommended learning route. There's also lots of content on the haskell-lang get started page.

FP Complete also provides corporate and group webinar training sessions. Please check out our training page for more information, or see our consulting page for how we can help your team succeed with devops and functional programming.

</html>

Undefined Behaviour in C

Summary: I tripped over undefined behaviour in C. It's annoying.

I've recently been writing some C code to parse XML quickly. While working on that project, I inadvertently wrote some code which is undefined according to the C language standard. The code compiled and ran fine using Visual Studio, but under gcc (even at -O0) it corrupted memory, sometimes leading to a segfault, but usually just leading to a wrong answer. The code in question was (see full code at GitHub):

d->nodes.nodes[0].nodes = parse_content(d);

To give some context, d is a structure that contains various pieces of state - what the string to be parsed is, how much we have parsed, along with a pointer to the output nodes. The parse_content function parses the bit inside an XML tag, returning the indices in nodes which it used.

The complication comes from nodes not being a fixed size, but dynamically resized if the number of nodes exceeds the capacity. For big documents that means parse_content will reallocate d->nodes.nodes.

According to the C spec, the compiler can evaluate the LHS and RHS of an assignment in any order. Since gcc computes the location of d->nodes.nodes[0] before calling parse_content it uses the address of the node before reallocation. After reallocation the address will have changed, and the assignment will be made to the wrong location.

I spotted the bug by inserting printf statements, and in doing so, I had to rewrite the code to:

str content = parse_content(d);
d->nodes.nodes[0].nodes = content;

That fixes the issue, since now the evaluation order is strictly defined. As a simplified example of the same issue:

char* array;

char f() {
    array = malloc(42);
    return 'x';
}

void test() {
    array = malloc(0);
    array[0] = f();
}

Here the line array[0] = f() might assign to either the result of malloc(0) or malloc(42), at the compiler's discretion.

I manually checked if I had made any other such mistakes, and I couldn't find any. Naturally, I wanted to find a static checker that could detect such a mistake, so I tried a bunch of them. I wasn't very successful:

• Visual Studio 2015 code analysis made me write assert after each malloc, but nothing further.
• PVS Studio found nothing.
• Clang undefined behaviour found nothing, and seemingly doesn't work on Windows.
• GCC undefined behaviour found nothing, and seemingly doesn't work on Windows.
• RV-Match hit a stack-overflow when running the program.

What duties to software developers owe to users?

I was reading this blog post, entitled "The code I’m still ashamed of".

TL;DR: Back in 2000 the poster, Bill Sourour, was employed to write a web questionnaire aimed at teenage girls that purported to advise the user about their need for a particular drug. In reality, unless you said you were allergic to it, the questionnaire always concluded that the user needed the drug. Shortly after, Sourour read about a teenage girl who had possibly committed suicide due to side effects of this drug. He is still troubled by this.

Nothing the poster or his employer did was illegal. It may not even have been unethical, depending on exactly which set of professional ethics you subscribe to. But it seems clear to me that there is something wrong in a program that purports to provide impartial advice while actually trying to trick you into buying medication you don't need. Bill Sourour clearly agrees.

Out in meatspace we have a clearly defined set of rules for this kind of situation. Details vary between countries, but if you consult someone about legal, financial or medical matters then they are generally held to have a "fiduciary duty" to you. The term derives from the Latin for "faithful". If X has a fiduciary duty to Y, then X is bound at all times to act in the best interests of Y. In such a case X is said to be "the fiduciary" while Y is the "beneficiary".

In many cases fiduciary duties arise in clearly defined contexts and have clear bodies of law or other rules associated with them. If you are the director of a company then you have a fiduciary duty to the shareholders, and most jurisdictions have a specific law for that case. But courts can also find fiduciary duties in other circumstances. In English law the general principle is as follows:
"A fiduciary is someone who has undertaken to act for and on behalf of another in a particular matter in circumstances which give rise to a relationship of trust and confidence."
It seems clear to me that this describes precisely the relationship between a software developer and a user. The user is not in a position to create the program they require, so they use one developed by someone else. The program acts as directed by the developer, but on behalf of the user. The user has to trust that the program will do what it promises, and in many cases the program will have access to confidential information which could be disclosed to others against the user's wishes.

These are not theoretical concerns. "Malware" is a very common category of software, defined as:
any software used to disrupt computer or mobile operations, gather sensitive information, gain access to private computer systems, or display unwanted advertising.
Sometimes malware is illicitly introduced by hacking, but in many cases the user is induced to run the malware by promises that it will do something that the user wants. In that case, software that acts against the interests of the user is an abuse of the trust placed in the developer by the user. In particular, the potential for software to "gather sensitive information" and "gain access to private computer systems" clearly shows that the user must have a "relationship of trust and confidence" with the developer, even if they have never met.

One argument against my thesis came up when I posted a question about this to the Legal forum on Stack Exchange. The answer I got from Dale M argued that:

Engineers (including software engineers) do not have this [relationship of confidence] and AFAIK a fiduciary duty between an engineer and their client has never been found, even where the work is a one-on-one commission.
I agree that, unlike a software developer, all current examples of a fiduciary duty involve a relationship in which the fiduciary is acting directly. The fiduciary has immediate knowledge of the circumstances of the particular beneficiary, and decides from moment to moment to take actions that may or may not be in the beneficiary's best interest. In contrast a software developer is separated in time from the user, and may have little or no knowledge of the user's situation.

I didn't argue with Dale M because Stack Exchange is for questions and answers, not debates. However I don't think that the distinction drawn by Dale M holds for software. An engineer designing a bridge is not in a position to learn the private information of those who cross the bridge, but a software engineer is often in a position to learn a great deal about the users of their product. It seems to me that this leads inescapably to the conclusion that software engineers do have a relationship of confidence with the user, and that this therefore creates a fiduciary duty.

Of course, as Dale M points out, nobody has ever persuaded a judge that software developers owe a fiduciary duty, and it's likely that in practice it's going to be a hard sell. But to go back to the example at the top, I think that Bill Sourour, or his employer, did owe a fiduciary duty to those people who ran the questionnaire software he wrote, because they disclosed private information in the expectation of getting honest advice, and the fact that they disclosed it to a program instead of a human makes no difference at all.

This section looks at exactly what the scope of the fiduciary duty is. It doesn't fit within the main text of this essay, so I've put it here.

Fortunately there is no need for a change in the law regarding fiduciary duty. The existence of a fiduciary duty is based on the nature of the relationship between principal and agent, although in some countries specific cases such as company directors are covered by more detailed laws.

First it is necessary to determine exactly who the fiduciary is. So far I have talked about "the software developer", but in practice software is rarely written by a single individual. We have to look at the authority that is directing the effort and deciding what functions will be implemented. If the software is produced by a company then treating the company as the fiduciary would seem to be the best approach, although it might be more appropriate to hold a senior manager liable if they have exceeded their authority.

As for the scope, I'm going to consider the scope of the fiduciary duty imposed on company directors and consider whether an analogous duty should apply to a software developer:

• Duty of care: for directors this is the duty to inform themselves and take due thought before making a decision.  One might argue that a software developer should have a similar duty of care when writing software, but this is already handled through normal negligence. Elevating the application of normal professional skill to a fiduciary duty is not going to make life better for the users. However there is one area where this might be applied: lack of motive to produce secure software is widely recognised as a significant problem, and is also an area where the "confidence" aspect of fiduciary duty overlaps with a duty of care. Therefore developers who negligently fail to consider security aspects of their software should be considered to have failed in their fiduciary duty.
• Duty of loyalty: for directors this is the duty not to use their position to further their private interests. For a software developer this is straightforward: the developer should not use their privileged access to the user's computer to further their private interests. So downloading information from the user's computer (unless the user explicitly instructs this to happen) should be a breach of fiduciary duty. So would using the processing power or bandwidth owned by the user for the developer's own purposes, for instance by mining bitcoins or sending spam.
• Duty of good faith: the developer should write code that will advance the user's interests and act in accordance with the user's wishes at all times.
• Duty of confidentiality: if the developer is entrusted with user information, for example because the software interfaces with cloud storage, then this should be held as confidential and not disclosed for the developer's benefit.
• Duty of prudence: This does not map onto software development.
• Duty of disclosure: for a director this means providing all relevant information to the shareholders. For a software developer, it means completely and honestly documenting what the software does, and particularly drawing attention to any features which a user might reasonably consider against their interests. Merely putting some general clauses in the license is not sufficient; anything that could reasonably be considered to be contrary to the user's interests should be prominently indicated in a way that enables the user to prevent it.
One gray area in this is software that is provided in exchange for personal data. Many "free" apps are paid for by advertisers who, in addition to the opportunity to advertise to the user, also pay for data about the users. On one hand, this involves the uploading of personal data that the user may not wish to share, but on the other hand it is done as part of an exchange that the user may be happy with. This comes under the duty of disclosure. The software should inform the user that personal data will be uploaded, and should also provide a detailed log of exactly what has been sent. Thus users can make informed decisions about the value of the information they are sending, and possibly alter their behavior when they know it is being monitored.

Back End Functional Developer at NYU (Full-time)

Position Summary

The Databrary project is looking for a smart, energetic and flexible back end developer to join its technical team. The developer will act as the primary owner of the code base of our service. Working closely with the managing director and the service team, the developer will design, develop and maintain tools to enable behavioral researchers to collaborate, store, discover, explore and access video-based research datasets. (S)he will maintain an existing code base and build new features, enhancements and integrations.

Databrary (databrary.org) is the leading open source video data-sharing system for developmental science. Datavyu (datavyu.org) is the leading free, open source, multi-platform video coding tool. This position provides a unique opportunity to play a central role in advancing open science through data sharing and reuse.

The ideal candidate is a self starter who is not afraid of learning new technologies, thinks out of the box, takes initiative, has excellent attention to detail and can work to take tasks to fruition both collaboratively in a team and independently. The developer will adapt to the evolving and growing needs of the project.

Essential Responsibilities/Functions Research and evaluation

The developer will analyze and understand the current system and application architecture, logical and physical data models, security and storage implementation, as well as the code base; document it thoroughly; formulate high-level architectural and call-graph diagrams; and make recommendations to the managing director on a future strategic direction.

Development and maintenance

The developer will maintain existing code base and troubleshoot to improve application reliability and performance. (S)he will lead development, manage releases, deploy code, and track bug and QA progress. (S)he will build dynamic, modular and responsive web applications by implementing clean, reusable, well designed and well tested code to add enhancements, features and new integrations to the platform in current technologies (Haskell, PostgreSQL, AngularJS) or any other secure, modern, sustainable web frameworks.

Innovation in data management

The developer will work closely with experts in the field to understand the complete data lifecycle and management for researchers. (S)he will advocate for and become a force of innovation at each step of activities undertaken in relation to the collection, processing, description, transformation, retention and reuse of research data. (S)he will design, develop, implement, test and validate existing and new data management and web-based tools to facilitate research.

Preferred Skills, Knowledge and Abilities

• Hands-on experience with functional languages like Haskell, OCaml, F#, or Scala.

• Knowledge of modern web frameworks in high-level languages such as Java, Ruby, Python or PHP and video technologies.

• Knowledge of JavaScript, JS frameworks, HTML, CSS and other front end technologies.

• Understanding of best practices in SDLC (Software Development Life Cycle).

• Understanding of TDD (Test-driven development), security and design patterns.

• Experience with version control, unix scripting, automation and DevOps practices.

• Familiarity using CRM, project management and task management systems.

• Passion for open source projects and building high quality software.

• Strong written and oral communication skills.

• Superior listening and analytical skills and a knack for tackling tough problems.

• Ability to multitask and juggle multiple priorities and projects.

• Adaptability and openness to learn and change.

Required Experience

• Track record of designing scalable software for web applications in modern web frameworks.

• Exceptional understanding of system architecture, object oriented principles, web technologies, REST API and MVC patterns.

• Solid knowledge of SQL and RDBMS like PostgreSQL.

• Basic knowledge of scientific practices and research tools, such as Matlab, SPSS, or R.

Preferred Education

• BS, MS or Ph.D in Computer Science, Information Technology or other relevant field.

New York University is an Equal Opportunity Employer. New York University is committed to a policy of equal treatment and opportunity in every aspect of its hiring and promotion process without regard to race, color, creed, religion, sex, pregnancy or childbirth (or related medical condition), sexual orientation, partnership status, gender and/or gender identity or expression, marital or parental status, national origin, ethnicity, alienage or citizenship status, veteran or military status, age, disability, predisposing genetic characteristics, domestic violence victim status, unemployment status, or any other legally protected basis. Women, racial and ethnic minorities, persons of minority sexual orientation or gender identity, individuals with disabilities, and veterans are encouraged to apply for vacant positions at all levels.

Get information on how to apply for this position.

[xxneozpu] Demonstration of Data.Numbers.Fixed

Here is a little demonstration of arbitrary precision floating-point (actually fixed-point) arithmetic in Haskell, using Data.Numbers.Fixed in the numbers package.  Using dynamicEps, we calculate sqrt(1/x) to arbitrary precision, outputting the number in a user-specified base.

Performance is not so great; the package implements Floating functions using elegant but not necessarily high performance algorithms based on continued fractions (Gosper, HAKMEM).  The hmpfr package might be better.  (For square root, it might also be easy to roll one's own implementation of Newton's method on Rational.)
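As a sketch of that last suggestion (the function name and the epsilon-based stopping rule here are my own choices, not anything from the numbers package): Newton's method for square roots iterates x' = (x + n/x) / 2, and on Rational there is no rounding error to worry about, only growing denominators.

```haskell
import Data.Ratio ((%))

-- Newton's method for square roots on exact rationals: iterate
-- x' = (x + n/x) / 2 until two successive approximations are within
-- eps of each other. Requires n > 0.
sqrtRat :: Rational -> Rational -> Rational
sqrtRat eps n = go n
  where
    go x
      | abs (x' - x) < eps = x'
      | otherwise          = go x'
      where
        x' = (x + n / x) / 2
```

For example, `sqrtRat (1 % 10^6) 2` approximates sqrt 2; convergence is quadratic, so only a handful of iterations are needed, though the denominators of the intermediate rationals roughly double in size each step.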

Radix conversion of a fractional number is implemented as an unfold, similar to radix conversion of an integer.
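A minimal version of that unfold (my own sketch, not the code from the post): repeatedly multiply the fractional part by the base and peel off the integer part as the next digit.

```haskell
import Data.List (unfoldr)

-- Digits of the fractional part of x (with 0 <= x < 1) in the given base.
-- The list is finite exactly when the expansion terminates; for repeating
-- expansions it is lazily infinite, so use take.
fracDigits :: Int -> Rational -> [Int]
fracDigits base = unfoldr step
  where
    step 0 = Nothing
    step x = let y = x * fromIntegral base
                 d = floor y
             in Just (d, y - fromIntegral d)
```

For example, `fracDigits 10 (1/8)` is `[1,2,5]`, while `fracDigits 10 (1/3)` is the infinite list of 3s.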

Here is the video of my talk “A Type is Worth a Thousand...

<iframe allowfullscreen="allowfullscreen" frameborder="0" height="225" mozallowfullscreen="mozallowfullscreen" src="https://player.vimeo.com/video/191467217?title=0&amp;byline=0&amp;portrait=0" title="A Type is Worth a Thousand Tests" webkitallowfullscreen="webkitallowfullscreen" width="400"></iframe>

Here is the video of my talk “A Type is Worth a Thousand Tests” presented at Sydney CocoaHeads, November 2016 (you can also get the slides) — I previously presented this talk at YOW! Connected, Melbourne, and an earlier version at Curry On in Rome.

In this talk, I argue that types are a design tool that, if applied correctly, reduces the need for tests. I illustrate this at the example of the design of a simple iPhone app in Swift whose source code is available on GitHub.

Category Theory, Syntactically

Overview

This will be a very non-traditional introduction to the ideas behind category theory. It will essentially be a slice through model theory (presented in a more programmer-friendly manner) with an unusual organization. Of course the end result will be ***SPOILER ALERT*** it was category theory all along. A secret decoder ring will be provided at the end. This approach is inspired by the notion of an internal logic/language and by Vaughan Pratt’s paper The Yoneda Lemma Without Category Theory.

I want to be very clear, though. This is not meant to be an analogy or an example or a guide for intuition. This is category theory. It is simply presented in a different manner.

Theories

The first concept we’ll need is that of a theory. If you’ve ever implemented an interpreter for even the simplest language, then most of what follows, modulo some terminological differences, should be both familiar and very basic. If you are familiar with algebraic semantics, then that is exactly what is happening here, only restricted to unary (but multi-sorted) algebraic theories.

For us, a theory, #ccT#, is a collection of sorts, a collection of (unary) function symbols1, and a collection of equations. Each function symbol has an input sort and an output sort which we’ll call the source and target of the function symbol. We’ll write #ttf : A -> B# to say that #ttf# is a function symbol with source #A# and target #B#. We define #"src"(ttf) -= A# and #"tgt"(ttf) -= B#. Sorts and function symbols are just symbols. Something is a sort if it is in the collection of sorts. Nothing else is required. A function symbol is not a function, it’s just a, possibly structured, name. Later, we’ll map those names to functions, but the same name may be mapped to different functions. In programming terms, a theory defines an interface or signature. We’ll write #bb "sort"(ccT)# for the collection of sorts of #ccT# and #bb "fun"(ccT)# for the collection of function symbols.

A (raw) term in a theory is either a variable labelled by a sort, #bbx_A#, or it’s a function symbol applied to a term, #tt "f"(t)#, such that the sort of the term #t# matches the source of #ttf#. The sort or target of a term is the sort of the variable if it’s a variable or the target of the outermost function symbol. The source of a term is the sort of the innermost variable. In fact, all terms are just sequences of function symbol applications to a variable, so there will always be exactly one variable. All this is to say the expressions need to be “well-typed” in the obvious way. Given a theory with two function symbols #ttf : A -> B# and #ttg : B -> A#, #bbx_A#, #bbx_B# , #tt "f"(bbx_A)#, and #tt "f"(tt "g"(tt "f"(bbx_A)))# are all examples of terms. #tt "f"(bbx_B)# and #tt "f"(tt "f"(bbx_A))# are not terms because they are not “well-typed”, and #ttf# by itself is not a term simply because it doesn’t match the syntax. Using Haskell syntax, we can define a data type representing this syntax if we ignore the sorting:

data Term = Var Sort | Apply FunctionSymbol Term

Using GADTs, we could capture the sorting constraints as well:

data Term (s :: Sort) (t :: Sort) where
Var :: Term t t
Apply :: FunctionSymbol x t -> Term s x -> Term s t

An important operation on terms is substitution. Given a term #t_1# with source #A# and a term #t_2# with target #A# we define the substitution of #t_2# into #t_1#, written #t_1[bbx_A |-> t_2]#, as:

If #t_1 = bbx_A# then #bbx_A[bbx_A |-> t_2] -= t_2#.

If #t_1 = tt "f"(t)# then #tt "f"(t)[bbx_A |-> t_2] -= tt "f"(t[bbx_A |-> t_2])#.

Using the theory from before, we have:

#tt "f"(bbx_A)[bbx_A |-> tt "g"(bbx_B)] = tt "f"(tt "g"(bbx_B))#

As a shorthand, for arbitrary terms #t_1# and #t_2#, #t_1(t_2)# will mean #t_1[bbx_("src"(t_1)) |-> t_2]#.
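Using the GADT from above, substitution is the obvious structural recursion: graft one term onto the variable of the other. This sketch instantiates the signature at the running example’s two sorts and two function symbols (my own encoding, for illustration) so it stands alone:

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}

-- The two-sort signature of the running example: f : A -> B, g : B -> A.
data Sort = A | B

data FunctionSymbol (s :: Sort) (t :: Sort) where
  F :: FunctionSymbol 'A 'B
  G :: FunctionSymbol 'B 'A

data Term (s :: Sort) (t :: Sort) where
  Var   :: Term t t
  Apply :: FunctionSymbol x t -> Term s x -> Term s t

-- subst t1 t2 computes t1[x |-> t2]: graft t2 where t1's variable sits.
-- The types enforce that t2's target matches t1's source.
subst :: Term x t -> Term s x -> Term s t
subst Var          t2 = t2
subst (Apply f t1) t2 = Apply f (subst t1 t2)

-- Number of function symbol applications in a term.
size :: Term s t -> Int
size Var         = 0
size (Apply _ t) = 1 + size t

-- The example f(x_A)[x_A |-> g(x_B)] = f(g(x_B)):
example :: Term 'B 'B
example = subst (Apply F Var) (Apply G Var)
```

Note that the shorthand t1(t2) from the text is exactly `subst t1 t2`.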

Finally, equations2. An equation is a pair of terms with equal source and target, for example, #(: tt "f"(tt "g"(bbx_B)), bbx_B :)#. The idea is that we want to identify these two terms. To do this we quotient the set of terms by the congruence generated by these pairs, i.e. by the reflexive-, symmetric-, transitive-closure of the relation generated by the equations which further satisfies “if #s_1 ~~ t_1# and #s_2 ~~ t_2# then #s_1(s_2) ~~ t_1(t_2)#”. From now on, by “terms” I’ll mean this quotient with “raw terms” referring to the unquotiented version. This means that when we say “#tt "f"(tt "g"(bbx_B)) = bbx_B#”, we really mean the two terms are congruent with respect to the congruence generated by the equations. We’ll write #ccT(A, B)# for the collection of terms, in this sense, with source #A# and target #B#. To make things look a little bit more normal, I’ll write #s ~~ t# as a synonym for #(: s, t :)# when the intent is that the pair represents a given equation.

Expanding the theory from before, we get the theory of isomorphisms, #ccT_{:~=:}#, consisting of two sorts, #A# and #B#, two function symbols, #ttf# and #ttg#, and two equations #tt "f"(tt "g"(bbx_B)) ~~ bbx_B# and #tt "g"(tt "f"(bbx_A)) ~~ bbx_A#. The equations lead to equalities like #tt "f"(tt "g"(tt "f"(bbx_A))) = tt "f"(bbx_A)#. In fact, it doesn’t take much work to show that this theory only has four distinct terms: #bbx_A#, #bbx_B#, #tt "f"(bbx_A)#, and #tt "g"(bbx_B)#.

In traditional model theory or universal algebra we tend to focus on multi-ary operations, i.e. function symbols that can take multiple inputs. By restricting ourselves to only unary function symbols, we expose a duality. For every theory #ccT#, we have the opposite theory, #ccT^(op)# defined by using the same sorts and function symbols but swapping the source and target of the function symbols which also requires rewriting the terms in the equations. The rewriting on terms is the obvious thing, e.g. if #ttf : A -> B#, #ttg : B -> C#, and #tth : C -> D#, then the term in #ccT#, #tt "h"(tt "g"(tt "f"(bbx_A)))# would become the term #tt "f"(tt "g"(tt "h"(bbx_D)))# in #ccT^(op)#. From this it should be clear that #(ccT^(op))^(op) = ccT#.

Product Theories

Given two theories #ccT_1# and #ccT_2# we can form a new theory #ccT_1 xx ccT_2# called the product theory of #ccT_1# and #ccT_2#. The sorts of this theory are pairs of sorts from #ccT_1# and #ccT_2#. The collection of function symbols is the disjoint union #bb "fun"(ccT_1) xx bb "sort"(ccT_2) + bb "sort"(ccT_1) xx bb "fun"(ccT_2)#. A disjoint union is like Haskell’s Either type. Here we’ll write #tt "inl"# and #tt "inr"# for the left and right injections respectively. #tt "inl"# takes a function symbol from #ccT_1# and a sort from #ccT_2# and produces a function symbol of #ccT_1 xx ccT_2# and similarly for #tt "inr"#. If #tt "f" : A -> B# in #ccT_1# and #C# is a sort of #ccT_2#, then #tt "inl"(tt "f", C) : (A, C) -> (B, C)# and similarly for #tt "inr"#.

The collection of equations for #ccT_1 xx ccT_2# consists of the following:

• for every equation, #l ~~ r# of #ccT_1# and every sort, #C#, of #ccT_2# we produce an equation #l’ ~~ r’# by replacing each function symbol #ttf# in #l# and #r# with #tt "inl"(tt "f", C)#
• similarly for equations of #ccT_2#
• for every pair of function symbols #ttf : A -> B# from #ccT_1# and #ttg : C -> D# from #ccT_2#, we produce the equation #tt "inl"(tt "f", D)(tt "inr"(A, tt "g")(bbx_{:(A, C")":})) ~~ tt "inr"(B, tt "g")(tt "inl"(tt "f", C)(bbx_{:(A, C")":}))#

The above is probably unreadable. If you work through it, you can show that every term of #ccT_1 xx ccT_2# is equivalent to a pair of terms #(t_1, t_2)# where #t_1# is a term in #ccT_1# and #t_2# is a term in #ccT_2#. Using this equivalence, the first bullet is seen to be saying that if #l = r# in #ccT_1# and #C# is a sort in #ccT_2# then #(l, bbx_C) = (r, bbx_C)# in #ccT_1 xx ccT_2#. The second is similar. The third then states

#(t_1, bbx_C)((bbx_A, t_2)(bbx_{:"(A, C)":})) = (t_1, t_2)(bbx_{:"(A, C)":}) = (bbx_A, t_2)((t_1, bbx_C)(bbx_{:"(A, C)":}))#.

To establish the equivalence between terms of #ccT_1 xx ccT_2# and pairs of terms from #ccT_1# and #ccT_2#, we use the third bullet to move all the #tt "inl"#s outward at which point we’ll have a sequence of #ccT_1# function symbols followed by a sequence of #ccT_2# function symbols each corresponding to term.
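The normal-form argument can be sketched concretely (encoding mine): a raw term is a sequence of the two kinds of function symbol, and since the third bullet lets adjacent #tt "inl"# and #tt "inr"# steps commute, a term’s equivalence class is determined by how many of each it contains.

```haskell
-- One step of a raw term of T_N x T_N: InL stands for inl(s, 0) and
-- InR for inr(0, s).
data Step = InL | InR deriving (Eq)

-- The equation inl(s,0)(inr(0,s)(x)) ~~ inr(0,s)(inl(s,0)(x)) says
-- adjacent steps commute, so a term's equivalence class is just its
-- counts: a pair of natural numbers.
normalize :: [Step] -> (Int, Int)
normalize steps =
  ( length (filter (== InL) steps)
  , length (filter (== InR) steps) )
```

Two raw terms are congruent exactly when `normalize` sends them to the same pair.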

The above might seem a bit round about. An alternative approach would be to define the function symbols of #ccT_1 xx ccT_2# to be all pairs of all the terms from #ccT_1# and #ccT_2#. The problem with this approach is that it leads to an explosion in the number of function symbols and equations required. In particular, it easily produces an infinitude of function symbols and equations even when provided with theories that only have a finite number of sorts, function symbols, and equations.

As a concrete and useful example, consider the theory #ccT_bbbN# consisting of a single sort, #0#, a single function symbol, #tts#, and no equations. This theory has a term for each natural number, #n#, corresponding to #n# applications of #tts#. Now let’s articulate #ccT_bbbN xx ccT_bbbN#. It has one sort, #(0, 0)#, two function symbols, #tt "inl"(tt "s", 0)# and #tt "inr"(0, tt "s")#, and it has one equation, #tt "inl"(tt "s", 0)(tt "inr"(0, tt "s")(bbx_{:(0, 0")":})) ~~ tt "inr"(0, tt "s")(tt "inl"(tt "s", 0)(bbx_{:(0, 0")":}))#. Unsurprisingly, the terms of this theory correspond to pairs of natural numbers. If we had used the alternative definition, we’d have had an infinite number of function symbols and an infinite number of equations.

Nevertheless, for clarity I will typically write a term of a product theory as a pair of terms.

As a relatively easy exercise — easier than the above — you can formulate and define the disjoint sum of two theories #ccT_1 + ccT_2#. The idea is that every term of #ccT_1 + ccT_2# corresponds to either a term of #ccT_1# or a term of #ccT_2#. Don’t forget to define what happens to the equations.

Related to these, we have the theory #ccT_{:bb1:}#, which consists of one sort and no function symbols or equations, and #ccT_{:bb0:}# which consists of no sorts and thus no possibility for function symbols or equations. #ccT_{:bb1:}# has exactly one term while #ccT_{:bb0:}# has no terms.

Collages

Sometimes we’d like to talk about function symbols whose source is in one theory and target is in another. As a simple example, that we’ll explore in more depth later, we may want function symbols whose sources are in a product theory. This would let us consider terms with multiple inputs.

The natural way to achieve this is to simply make a new theory that contains sorts from both theories plus the new function symbols. A collage, #ccK#, from a theory #ccT_1# to #ccT_2#, written #ccK : ccT_1 ↛ ccT_2#, is a theory whose collection of sorts is the disjoint union of the sorts of #ccT_1# and #ccT_2#. The function symbols of #ccK# consist of, for each function symbol #ttf : A -> B# in #ccT_1#, a function symbol #tt "inl"(ttf) : tt "inl"(A) -> tt "inl"(B)#, and similarly for function symbols from #ccT_2#. Equations from #ccT_1# and #ccT_2# are likewise taken and lifted appropriately, i.e. #ttf# is replaced with #tt "inl"(ttf)# or #tt "inr"(ttf)# as appropriate. Additional function symbols of the form #k : tt "inl"(A) -> tt "inr"(Z)# where #A# is a sort of #ccT_1# and #Z# is a sort of #ccT_2#, and potentially additional equations involving these function symbols, may be given. (If no additional function symbols are given, then this is exactly the disjoint sum of #ccT_1# and #ccT_2#.) These additional function symbols and equations are what differentiate two collages that have the same source and target theories. Note, there are no function symbols #tt "inr"(Z) -> tt "inl"(A)#, i.e. where #Z# is in #ccT_2# and #A# is in #ccT_1#. That is, there are no function symbols going the “other way”. To avoid clutter, I’ll typically assume that the sorts and function symbols of #ccT_1# and #ccT_2# are disjoint already, and dispense with the #tt "inl"#s and #tt "inr"#s.

Summarizing, we have #ccK(tt "inl"(A), tt "inl"(B)) ~= ccT_1(A, B)#, #ccK(tt "inr"(Y), tt "inr"(Z)) ~= ccT_2(Y, Z)#, and #ccK(tt "inr"(Z), tt "inl"(A)) = O/# for all #A#, #B#, #Y#, and #Z#. #ccK(tt "inl"(A), tt "inr"(Z))# for any #A# and #Z# is arbitrary: it is generated by the bridges and whatever additional equations are given. To distinguish them, I’ll call the function symbols that go from one theory to another bridges. More generally, an arbitrary term that has its source in one theory and target in another will be described as a bridging term.

Here’s a somewhat silly example. Consider #ccK_+ : ccT_bbbN xx ccT_bbbN ↛ ccT_bbbN# that has one bridge #tt "add" : (0, 0) -> 0# with the equations #tt "add"(tt "inl"(tts, 0)(bbx_("("0, 0")"))) ~~ tts(tt "add"(bbx_("("0, 0")")))# and #tt "add"(tt "inr"(0, tts)(bbx_("("0, 0")"))) ~~ tts(tt "add"(bbx_("("0, 0")")))#.

More usefully, if a bit degenerately, every theory induces a collage in the following way. Given a theory #ccT#, we can build the collage #ccK_ccT : ccT ↛ ccT# where the bridges consist of the following. For each sort, #A#, of #ccT#, we have the following bridge: #tt "id"_A : tt "inl"(A) -> tt "inr"(A)#. Then, for every function symbol, #ttf : A -> B# in #ccT#, we have the following equation: #tt "inr"(tt "f")(tt "id"_A(bbx_(tt "inl"(A)))) ~~ tt "id"_B(tt "inl"(tt "f")(bbx_(tt "inl"(A))))#. We have #ccK_ccT(tt "inl"(A), tt "inr"(B)) ~= ccT(A, B)#.

You can think of a bridging term in a collage as a sequence of function symbols partitioned into two parts by a bridge. Naturally, we might consider partitioning into more than two parts by having more than one bridge. It’s easy to generalize the definition of collage to combine an arbitrary number of theories, but I’ll take a different, probably less good, route. Given collages #ccK_1 : ccT_1 ↛ ccT_2# and #ccK_2 : ccT_2 ↛ ccT_3#, we can make the collage #ccK_2 @ ccK_1 : ccT_1 ↛ ccT_3# by defining its bridges to be triples of a bridge of #ccK_1#, #k_1 : A_1 -> A_2#, a term, #t : A_2 -> B_2# of #ccT_2#, and a bridge of #ccK_2#, #k_2 : B_2 -> B_3# which altogether will be a bridge of #ccK_2 @ ccK_1# going from #A_1 -> B_3#. These triples essentially represent a term like #k_2(t(k_1(bbx_(A_1))))#. With this intuition we can formulate the equations. For each equation #t'(k_1(t_1)) ~~ s'(k'_1(s_1))# where #k_1# and #k'_1# are bridges of #ccK_1#, we have for every bridge #k_2# of #ccK_2# and term #t# of the appropriate sorts #(k_2, t(t'(bbx)), k_1)(t_1) ~~ (k_2, t(s'(bbx)), k'_1)(s_1)# and similarly for equations involving the bridges of #ccK_2#.

This composition is associative… almost. Furthermore, the collages generated by theories, #ccK_ccT#, behave like identities to this composition… almost. It turns out these statements are true, but only up to isomorphism of theories. That is, #(ccK_3 @ ccK_2) @ ccK_1 ~= ccK_3 @ (ccK_2 @ ccK_1)# but is not equal.

To talk about isomorphism of theories we need the notion of…

Interpretations

An interpretation of a theory gives meaning to the syntax of a theory. There are two nearly identical notions of interpretation for us: interpretation (into sets) and interpretation into a theory. I’ll define them in parallel. An interpretation (into a theory), #ccI#, is a mapping, written #⟦-⟧^ccI# though the superscript will often be omitted, which maps sorts to sets (sorts) and function symbols to functions (terms). The mapping satisfies:

#⟦"src"(f)⟧ = "src"(⟦f⟧)# and #⟦"tgt"(f)⟧ = "tgt"(⟦f⟧)# where #"src"# and #"tgt"# on the right are the domain and codomain operations for an interpretation.

We extend the mapping to a mapping on terms via:

• #⟦bbx_A⟧ = x |-> x#, i.e. the identity function, or, for interpretation into a theory, #⟦bbx_A⟧ = bbx_{:⟦A⟧:}#
• #⟦tt "f"(t)⟧ = ⟦tt "f"⟧ @ ⟦t⟧# or, for interpretation into a theory, #⟦tt "f"(t)⟧ = ⟦tt "f"⟧(⟦t⟧)#

and we require that for any equation of the theory, #l ~~ r#, #⟦l⟧ = ⟦r⟧#. (Technically, this is implicitly required for the extension of the mapping to terms to be well-defined, but it’s clearer to state it explicitly.) I’ll write #ccI : ccT -> bb "Set"# when #ccI# is an interpretation of #ccT# into sets, and #ccI’ : ccT_1 -> ccT_2# when #ccI’# is an interpretation of #ccT_1# into #ccT_2#.

An interpretation of the theory of isomorphisms produces a bijection between two specified sets. Spelling out a simple example where #bbbB# is the set of booleans:

• #⟦A⟧ -= bbbB#
• #⟦B⟧ -= bbbB#
• #⟦tt "f"⟧ -= x |-> not x#
• #⟦tt "g"⟧ -= x |-> not x#

plus the proof #not not x = x#.
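In Haskell, taking Bool as the carrier for both sorts, this interpretation is just:

```haskell
-- Interpretation of the theory of isomorphisms into Bool: both sorts map
-- to Bool, both function symbols to not. The two equations hold because
-- not is an involution: not (not x) = x.
interpF :: Bool -> Bool
interpF = not

interpG :: Bool -> Bool
interpG = not
```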

As another simple example, we can interpret the theory of isomorphisms into itself slightly non-trivially.

• #⟦A⟧ -= B#
• #⟦B⟧ -= A#
• #⟦tt "f"⟧ -= tt "g"(bbx_B)#
• #⟦tt "g"⟧ -= tt "f"(bbx_A)#

As an (easy) exercise, you should define #pi_1 : ccT_1 xx ccT_2 -> ccT_1# and similarly #pi_2#. If you defined #ccT_1 + ccT_2# before, you should define #iota_1 : ccT_1 -> ccT_1 + ccT_2# and similarly for #iota_2#. As another easy exercise, show that an interpretation of #ccT_{:~=:}# is a bijection. In Haskell, an interpretation of #ccT_bbbN# would effectively be foldNat. Something very interesting happens when you consider what an interpretation of the collage generated by a theory, #ccK_ccT#, is. Spell it out. In a different vein, you can show that a collage #ccK : ccT_1 ↛ ccT_2# and an interpretation #ccT_1^(op) xx ccT_2 -> bb "Set"# are essentially the same thing in the sense that each gives rise to the other.
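Spelling out the foldNat remark (the definitions below are my own): an interpretation of #ccT_bbbN# picks a carrier for the sort #0# and a function interpreting #tts#; a term, i.e. #n# applications of #tts# to the variable, is then interpreted as the #n#-fold iterate of that function, and applying that iterate to a starting value is exactly a fold over the natural number.

```haskell
-- A term of T_N: n applications of the symbol s, as a Peano numeral.
data Nat = Z | S Nat

-- Choose a carrier r for the sort 0, a value z to apply the interpreted
-- term to, and a function step interpreting the symbol s.
foldNat :: r -> (r -> r) -> Nat -> r
foldNat z _    Z     = z
foldNat z step (S n) = step (foldNat z step n)

-- Interpreting into Int with step = (+1) recovers the numeral a term denotes.
toInt :: Nat -> Int
toInt = foldNat 0 (+ 1)
```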

Two theories are isomorphic if there exist interpretations #ccI_1 : ccT_1 -> ccT_2# and #ccI_2 : ccT_2 -> ccT_1# such that #⟦⟦A⟧^(ccI_1)⟧^(ccI_2) = A# and vice versa, and similarly for function symbols. In other words, each is interpretable in the other, and if you go from one interpretation and then back, you end up where you started. Yet another way to say this is that there is a one-to-one correspondence between sorts and terms of each theory, and this correspondence respects substitution.

As a crucially important example, the set of terms, #ccT(A, B)#, can be extended to an interpretation. In particular, for each sort #A#, #ccT(A, -) : ccT -> bb "Set"#. Its action on function symbols is the following:

#⟦tt "f"⟧^(ccT(A, -)) -= t |-> tt "f"(t)#

We have, dually, #ccT(-, A) : ccT^(op) -> bb "Set"# with the following action:

#⟦tt "f"⟧^(ccT(-, A)) -= t |-> t(tt "f"(bbx_B))#

We can abstract from both parameters making #ccT(-, =) : ccT^(op) xx ccT -> bb "Set"# which, by an early exercise, can be shown to correspond with the collage #ccK_ccT#.

Via an abuse of notation, I’ll identify #ccT^(op)(A, -)# with #ccT(-, A)#, though technically we only have an isomorphism between the interpretations, and to talk about isomorphisms between interpretations we need the notion of…

Homomorphisms

The theories we’ve presented are (multi-sorted) universal algebra theories. Universal algebra allows us to specify a general notion of “homomorphism” that generalizes monoid homomorphism or group homomorphism or ring homomorphism or lattice homomorphism.

In universal algebra, the algebraic theory of groups consists of a single sort, a nullary operation, #1#, a binary operation, #*#, a unary operation, #tt "inv"#, and some equations which are unimportant for us. Operations correspond to our function symbols except that they are not restricted to being unary. A particular group is a particular interpretation of the algebraic theory of groups, i.e. it is a set and three functions into the set. A group homomorphism then is a function between two such groups, i.e. between the two interpretations, that preserves the operations. In a traditional presentation this would look like the following:

Say #alpha : G -> K# is a group homomorphism from the group #G# to the group #K# and #g, h in G# then:

• #alpha(1_G) = 1_K#
• #alpha(g *_G h) = alpha(g) *_K alpha(h)#
• #alpha(tt "inv"_G(g)) = tt "inv"_K(alpha(g))#

Using something more akin to our notation, it would look like:

• #alpha(⟦1⟧^G) = ⟦1⟧^K#
• #alpha(⟦*⟧^G(g,h)) = ⟦*⟧^K(alpha(g), alpha(h))#
• #alpha(⟦tt "inv"⟧^G(g)) = ⟦tt "inv"⟧^K(alpha(g))#

The #tt "inv"# case is the most relevant for us as it is unary. However, for us, a function symbol #ttf# may have a different source and target and so we made need a different function on each side of the equation. E.g. for #ttf : A -> B#, #alpha : ccI_1 -> ccI_2#, and #a in ⟦A⟧^(ccI_1)# we’d have:

#alpha_B(⟦tt "f"⟧^(ccI_1)(a)) = ⟦tt "f"⟧^(ccI_2)(alpha_A(a))#

So a homomorphism #alpha : ccI_1 -> ccI_2 : ccT -> bb "Set"# is a family of functions, one for each sort of #ccT#, that satisfies the above equation for every function symbol of #ccT#. We call the individual functions making up #alpha# components of #alpha#, and we have #alpha_A : ⟦A⟧^(ccI_1) -> ⟦A⟧^(ccI_2)#. The definition for an interpretation into a theory, #ccT_2#, is identical except the components of #alpha# are terms of #ccT_2# and #a# can be replaced with #bbx_(⟦A⟧^(ccI_1))#. Two interpretations are isomorphic if we have a homomorphism #alpha : ccI_1 -> ccI_2# such that each component is a bijection. This is the same as requiring a homomorphism #beta : ccI_2 -> ccI_1# such that for each #A#, #alpha_A(beta_A(x)) = x# and #beta_A(alpha_A(x)) = x#. A similar statement can be made for interpretations into theories, just replace #x# with #bbx_(⟦A⟧)#.

Another way to look at homomorphisms is via collages. A homomorphism #alpha : ccI_1 -> ccI_2 : ccT -> bb "Set"# gives rise to an interpretation of the collage #ccK_ccT#. The interpretation #ccI_alpha : ccK_ccT -> bb "Set"# is defined by:

• #⟦tt "inl"(A)⟧^(ccI_alpha) -= ⟦A⟧^(ccI_1)#
• #⟦tt "inr"(A)⟧^(ccI_alpha) -= ⟦A⟧^(ccI_2)#
• #⟦tt "inl"(ttf)⟧^(ccI_alpha) -= ⟦ttf⟧^(ccI_1)#
• #⟦tt "inr"(ttf)⟧^(ccI_alpha) -= ⟦ttf⟧^(ccI_2)#
• #⟦tt "id"_A⟧^(ccI_alpha) -= alpha_A#

The homomorphism law guarantees that it satisfies the equation on #tt "id"#. Conversely, given an interpretation of #ccK_ccT#, we have the homomorphism #⟦tt "id"⟧ : ⟦tt "inl"(-)⟧ -> ⟦tt "inr"(-)⟧ : ccT -> bb "Set"#, and the equation on #tt "id"# is exactly the homomorphism law.

Yoneda

Consider a homomorphism #alpha : ccT(A, -) -> ccI#. #alpha# needs to satisfy, for all sorts #B#, #C#, and #D#, every function symbol #ttf : C -> D#, and every term #t : B -> C#:

#alpha_D(tt "f"(t)) = ⟦tt "f"⟧^ccI(alpha_C(t))#

Looking at this equation, the possibility of viewing it as a recursive “definition” leaps out, suggesting that the action of #alpha# is completely determined by its action on the variables. Something like this, for example:

#alpha_D(tt "f"(tt "g"(tt "h"(bbx_A)))) = ⟦tt "f"⟧(alpha_C(tt "g"(tt "h"(bbx_A)))) = ⟦tt "f"⟧(⟦tt "g"⟧(alpha_B(tt "h"(bbx_A)))) = ⟦tt "f"⟧(⟦tt "g"⟧(⟦tt "h"⟧(alpha_A(bbx_A))))#

We can easily establish that there’s a one-to-one correspondence between the set of homomorphisms #ccT(A, -) -> ccI# and the elements of the set #⟦A⟧^ccI#. Given a homomorphism, #alpha#, we get an element of #⟦A⟧^ccI# via #alpha_A(bbx_A)#. Inversely, given an element #a in ⟦A⟧^ccI#, we can define a homomorphism #a^**# via:

• #a_D^**(tt "f"(t)) -= ⟦tt "f"⟧^ccI(a_C^**(t))#
• #a_A^**(bbx_A) -= a#

which clearly satisfies the condition on homomorphisms by definition. It’s easy to verify that #(alpha_A(bbx_A))^** = alpha#, and it is immediately true that #a_A^**(bbx_A) = a#, establishing the bijection.

We can state something stronger. Given any homomorphism #alpha : ccT(A, -) -> ccI# and any function symbol #ttg : A -> X#, we can make a new homomorphism #alpha * ttg : ccT(X, -) -> ccI# via the following definition:

#(alpha * ttg)(t) = alpha(t(tt "g"(bbx_A)))#

Verifying that this is a homomorphism is straightforward:

#(alpha * ttg)(tt "f"(t)) = alpha(tt "f"(t(tt "g"(bbx_A)))) = ⟦tt "f"⟧(alpha(t(tt "g"(bbx_A)))) = ⟦tt "f"⟧((alpha * ttg)(t))#

and like any homomorphism of this form, as we’ve just established, it is completely determined by its action on variables, namely #(alpha * ttg)_X(bbx_X) = alpha_X(tt "g"(bbx_A)) = ⟦tt "g"⟧(alpha_A(bbx_A))#. In particular, if #alpha = a^**#, then we have #a^** * ttg = (⟦tt "g"⟧(a))^**#. Together these facts establish that we have an interpretation #ccY : ccT -> bb "Set"# such that #⟦A⟧^ccY -= (ccT(A, -) -> ccI)#, the set of homomorphisms, and #⟦tt "g"⟧^ccY(alpha) -= alpha * tt "g"#. The work we did before established that we have homomorphisms #(-)(bbx) : ccY -> ccI# and #(-)^** : ccI -> ccY# that are inverses. This is true for all theories and all interpretations as at no point did we use any particular facts about them. This statement is the (dual form of the) Yoneda lemma. To get the usual form simply replace #ccT# with #ccT^(op)#. A particularly important and useful case (so useful it’s usually used tacitly) occurs when we choose #ccI = ccT(B,-)#: we get #(ccT(A, -) -> ccT(B, -)) ~= ccT(B, A)# or, choosing #ccT^(op)# everywhere, #(ccT(-, A) -> ccT(-, B)) ~= ccT(A, B)# which states that a term from #A# to #B# is equivalent to a homomorphism from #ccT(-, A)# to #ccT(-, B)#.
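
Haskell programmers may recognize this as the Yoneda lemma for a `Functor f`, which plays the role of the interpretation #ccI#: `lower` is #alpha |-> alpha_A(bbx_A)# and `lift` is #a |-> a^**#. A standard sketch (names `Yoneda`, `lower`, `lift` are the conventional ones, not from the text above):

```haskell
{-# LANGUAGE RankNTypes #-}

-- A homomorphism T(A,-) -> I becomes, for a Haskell Functor f playing
-- the role of the interpretation I, a polymorphic function.
newtype Yoneda f a = Yoneda { runYoneda :: forall b. (a -> b) -> f b }

-- alpha |-> alpha_A(x_A): apply the homomorphism to the variable,
-- represented here by the identity function.
lower :: Yoneda f a -> f a
lower (Yoneda alpha) = alpha id

-- a |-> a^*: the homomorphism completely determined by the element a.
lift :: Functor f => f a -> Yoneda f a
lift a = Yoneda (\g -> fmap g a)
```

`lower . lift` is the identity by the functor laws, and `lift . lower` is the identity by parametricity, which is the content of the bijection above.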

There is another result, dual in a different way, called the co-Yoneda lemma. It turns out it is a corollary of the fact that for a collage #ccK : ccT_1 ↛ ccT_2#, #ccK_(ccT_2) @ ccK ~= ccK#, and the dual is just the composition in the other order. To get (closer to) the precise result, we need to be able to turn an interpretation into a collage. Given an interpretation, #ccI : ccT -> bb "Set"#, we can define a collage #ccK_ccI : ccT_bb1 ↛ ccT# whose bridges from #1 -> A# are the elements of #⟦A⟧^ccI#. Given this, the co-Yoneda lemma is the special case, #ccK_ccT @ ccK_ccI ~= ccK_ccI#.

Note that the Yoneda and co-Yoneda lemmas only apply to interpretations into sets, as #ccY# involves the set of homomorphisms.

Representability

The Yoneda lemma suggests that the interpretations #ccT(A, -)# and #ccT(-, A)# are particularly important and this will be borne out as we continue.

We call an interpretation, #ccI : ccT^(op) -> bb "Set"# representable if #ccI ~= ccT(-, X)# for some sort #X#. We then say that #X# represents #ccI#. What this states is that every term of sort #X# corresponds to an element in one of the sets that make up #ccI#, and these transform appropriately. There’s clearly a particularly important element, namely the image of #bbx_X# which corresponds to an element in #⟦X⟧^ccI#. This element is called the universal element. The dual concept is, for #ccI : ccT -> bb "Set"#, #ccI# is co-representable if #ccI ~= ccT(X, -)#. We will also say #X# represents #ccI# in this case as it actually does when we view #ccI# as an interpretation of #(ccT^(op))^(op)#.

As a rather liberating exercise, you should establish the following result called parameterized representability. Assume we have theories #ccT_1# and #ccT_2#, and a family of sorts of #ccT_2#, #X#, and a family of interpretations of #ccT_2^(op)#, #ccI#, both indexed by sorts of #ccT_1#, such that for each #A in bb "sort"(ccT_1)#, #X_A# represents #ccI_A#, i.e. #ccI_A ~= ccT_2(-, X_A)#. Given all this, there is a unique interpretation #ccX : ccT_1 -> ccT_2# and #ccI : ccT_1 xx ccT_2^(op) -> bb "Set"# where #⟦A⟧^(ccX) -= X_A# and #"⟦("A, B")⟧"^ccI -= ⟦B⟧^(ccI_A)# such that #ccI ~= ccT_2(=,⟦-⟧^ccX)#. To be a bit more clear, the right hand side means #(A, B) |-> ccT_2(B, ⟦A⟧^ccX)#. Simply by choosing #ccT_1# to be a product of multiple theories, we can generalize this result to an arbitrary number of parameters. What makes this result liberating is that we just don’t need to worry about the parameters; they will automatically transform homomorphically. As a technical warning though, since two interpretations may have the same action on sorts but a different action on function symbols, if the family #X_A# was derived from an interpretation #ccJ#, i.e. #X_A -= ⟦A⟧^ccJ#, it may not be the case that #ccX = ccJ#.

Let’s look at some examples.

As a not-so-special case of representability, we can consider #ccI -= ccK(tt "inl"(-), tt "inr"(Z))# where #ccK : ccT_1 ↛ ccT_2#. Saying that #A# represents #ccI# in this case is saying that bridging terms of sort #tt "inr"(Z)#, i.e. sort #Z# in #ccT_2#, in #ccK#, correspond to terms of sort #A# in #ccT_1#. We’ll call the universal element of this representation the universal bridge (though technically it may be a bridging term, not a bridge). Let’s write #varepsilon# for this universal bridge. What representability states in this case is that, given any bridging term #k# of sort #Z#, there exists a unique term #|~ k ~|# of sort #A# such that #k = varepsilon(|~ k ~|)#. If we have an interpretation #ccX : ccT_2 -> ccT_1# such that #⟦Z⟧^ccX# represents #ccK(tt "inl"(-), tt "inr"(Z))# for each sort #Z# of #ccT_2#, we say we have a right representation of #ccK#. Note that the universal bridges become a family #varepsilon_Z : ⟦Z⟧^ccX -> Z#. Similarly, if #ccK(tt "inl"(A), tt "inr"(-))# is co-representable for each #A#, we say we have a left representation of #ccK#. The co-universal bridge is then a bridging term #eta_A : A -> ⟦A⟧# such that for any bridging term #k# with source #A#, there exists a unique term #|__ k __|# in #ccT_2# such that #k = |__ k __|(eta_A)#. For reference, we’ll call these equations universal properties of the left/right representation. Parameterized representability implies that a left/right representation is essentially unique.

Define #ccI_bb1# via #⟦A⟧^(ccI_bb1) -= bb1# where #bb1# is some one-element set. #⟦ttf⟧^(ccI_bb1)# is the identity function for all function symbols #ttf#. We’ll say a theory #ccT# has a unit sort or has a terminal sort if there is a sort that we’ll also call #bb1# that represents #ccI_bb1#. Spelling out what that means, we first note that there is nothing notable about the universal element as it’s the only element. However, writing the homomorphism #! : ccI_bb1 -> ccT(-, bb1)# and noting that since there’s only one element of #⟦A⟧^(ccI_bb1)# we can, with a slight abuse of notation, also write the term #!# picks out as #!#, which gives the equation:

#!_B(tt "g"(t)) = !_A(t)# for any function symbol #ttg : A -> B# and term, #t#, of sort #A#, note #!_A : A -> bb1#.

This equation states what the isomorphism also fairly directly states: there is exactly one term of sort #bb1# from any sort #A#, namely #!_A(bbx_A)#. The dual notion is called a void sort or an initial sort and will usually be notated #bb0#; the analog of #!# will be written as #0#. The resulting equation is:

#tt "f"(0_A) = 0_B# for any function symbol #ttf : A -> B#, note #0_A : bb0 -> A#.

For the next example, I’ll leverage collages. Consider the collage #ccK_2 : ccT ↛ ccT xx ccT# whose bridges from #A -> (B, C)# consist of pairs of terms #t_1 : A -> B# and #t_2 : A -> C#. #ccT# has pairs if #ccK_2# has a right representation. We’ll write #(B, C) |-> B xx C# for the representing interpretation’s action on sorts. We’ll write the universal bridge as #(tt "fst"(bbx_(B xx C)), tt "snd"(bbx_(B xx C)))#. The universal property then looks like #(tt "fst"(bbx_(B xx C)), tt "snd"(bbx_(B xx C)))((: t_1, t_2 :)) = (t_1, t_2)# where #(: t_1, t_2 :) : A -> B xx C# is the unique term induced by the bridge #(t_1, t_2)#. The universal property implies the following equations:

• #(: tt "fst"(bbx_(B xx C)), tt "snd"(bbx_(B xx C))) = bbx_(B xx C)#
• #tt "fst"((: t_1, t_2 :)) = t_1#
• #tt "snd"((: t_1, t_2 :)) = t_2#

One aspect of note is that, regardless of whether #ccK_2# has a right representation, i.e. regardless of whether #ccT# has pairs, #ccK_2# always has a left representation. The co-universal bridge is #(bbx_A, bbx_A)# and the unique term #|__(t_1, t_2)__|# is #tt "inl"(t_1, bbx_A)(tt "inr"(bbx_A, t_2)(bbx_("("A,A")")))#.

Define an interpretation #Delta : ccT -> ccT xx ccT# so that #⟦A⟧^Delta -= (A,A)# and similarly for function symbols. #Delta# left represents #ccK_2#. If the interpretation #(B,C) |-> B xx C# right represents #ccK_2#, then we say we have an adjunction between #Delta# and #(- xx =)#, written #Delta ⊣ (- xx =)#, and that #Delta# is left adjoint to #(- xx =)#, and conversely #(- xx =)# is right adjoint to #Delta#.

More generally, whenever we have the situation #ccT_1(⟦-⟧^(ccI_1), =) ~= ccT_2(-, ⟦=⟧^(ccI_2))# we say that #ccI_1 : ccT_2 -> ccT_1# is left adjoint to #ccI_2 : ccT_1 -> ccT_2# or conversely that #ccI_2# is right adjoint to #ccI_1#. We call this arrangement an adjunction and write #ccI_1 ⊣ ccI_2#. Note that we will always have this situation if #ccI_1# left represents and #ccI_2# right represents the same collage. As we noted above, parameterized representability actually determines one adjoint given (its action on sorts and) the other adjoint. With this we can show that adjoints are unique up to isomorphism, that is, given two left adjoints to an interpretation, they must be isomorphic. Similarly for right adjoints. This means that stating something is a left or right adjoint to some other known interpretation essentially completely characterizes it. One issue with adjunctions is that they tend to be wholesale. Let’s say the pair sort #A xx B# existed but no other pair sorts did; then the (no longer parameterized) representability approach would work just fine, but the adjunction would no longer exist.

Here are a few exercises using this. First, a moderately challenging one (until you catch the pattern): spell out the details of the left adjoint to #Delta#. We say a theory has sums and write those sums as #A + B# if #(- + =) ⊣ Delta#. Recast void and unit sorts using adjunctions and/or left/right representations. As a terminological note, we say a theory has finite products if it has unit sorts and pairs. Similarly, a theory has finite sums or has finite coproducts if it has void sorts and sums. An even more challenging exercise is the following: a theory has exponentials if it has finite products and for every sort #A#, #(A xx -) ⊣ (A => -)# (note, parameterized representability applies to #A#). Spell out the equations characterizing #A => B#.

Finite Product Theories

Finite products start to lift us off the ground. So far the theories we’ve been working with have been extremely basic: a language with only unary functions, all terms being just a sequence of applications of function symbols. It shouldn’t be underestimated though. It’s more than enough to do monoid and group theory. A good amount of graph theory can be done with just this. And obviously we were able to establish several general results assuming only this structure. Nevertheless, while we can talk about specific groups, say, we can’t talk about the theory of groups. Finite products change this.

A theory with finite products allows us to talk about multi-ary function symbols and terms by considering unary function symbols from products. This allows us to do all of universal algebra. For example, the theory of groups, #ccT_(bb "Grp")#, consists of a sort #S# and all its products, which we’ll abbreviate as #S^n# with #S^0 -= bb1# and #S^(n+1) -= S xx S^n#. It has three function symbols #tte : bb1 -> S#, #ttm : S^2 -> S#, and #tti : S -> S# plus the ones that having finite products requires. In fact, instead of just heaping an infinite number of sorts and function symbols into our theory — and we haven’t even gotten to equations — let’s define a compact set of data from which we can generate all this data.

A signature, #Sigma#, consists of a collection of sorts, #sigma#, a collection of multi-ary function symbols, and a collection of equations. Equations still remain pairs of terms, but we now need to extend our definition of terms for this context. A term (in a signature) is either a variable, #bbx_i^[A_0,A_1,...,A_n]# where the #A_i# are sorts and #0 <= i <= n#, the operators #tt "fst"# or #tt "snd"# applied to a term, the unit term written #(::)^A# with sort #A#, a pair of terms written #(: t_1, t_2 :)#, or the (arity-correct) application of a multi-ary function symbol to a series of terms, e.g. #tt "f"(t_1, t_2, t_3)#. As a Haskell data declaration, it might look like:

data SigTerm
  = SigVar [Sort] Int
  | Fst SigTerm
  | Snd SigTerm
  | Unit Sort
  | Pair SigTerm SigTerm
  | SigApply FunctionSymbol [SigTerm]

At this point, sorting (i.e. typing) the terms is no longer trivial, though it is still pretty straightforward. Sorts are either #bb1#, or #A xx B# for sorts #A# and #B#, or a sort #A in sigma#. The sources of function symbols and terms are lists of sorts.

• #bbx_i^[A_0, A_1, ..., A_n] : [A_0, A_1, ..., A_n] -> A_i#
• #(::)^A : [A] -> bb1#
• #(: t_1, t_2 :) : bar S -> T_1 xx T_2# where #t_i : bar S -> T_i#
• #tt "fst"(t) : bar S -> T_1# where #t : bar S -> T_1 xx T_2#
• #tt "snd"(t) : bar S -> T_2# where #t : bar S -> T_1 xx T_2#
• #tt "f"(t_1, ..., t_n) : bar S -> T# where #t_i : bar S -> T_i# and #ttf : [T_1,...,T_n] -> T#

The point of a signature was to represent a theory so we can compile a term of a signature into a term of a theory with finite products. The theory generated from a signature #Sigma# has the same sorts as #Sigma#. The equations will be equations of #Sigma#, with the terms compiled as will be described momentarily, plus for every pair of sorts the equations that describe pairs and the equations for #!#. Finally, we need to describe how to take a term of the signature and make a function symbol of the theory, but before we do that we need to explain how to convert those sources of the terms which are lists. That’s just a conversion to right-nested pairs, #[A_0,...,A_n] |-> A_0 xx (... xx (A_n xx bb1) ... )#. The compilation of a term #t#, which we’ll write as #ccC[t]#, is defined as follows:

• #ccC[bbx_i^[A_0, A_1, ..., A_n]] = tt "fst"(tt "snd"^i(bbx_(A_0 xx (...))))# where #tt "snd"^i# means the #i#-fold application of #tt "snd"#
• #ccC[(::)^A] = !_A#
• #ccC[(: t_1, t_2 :)] = (: ccC[t_1], ccC[t_2] :)#
• #ccC[tt "fst"(t)] = tt "fst"(ccC[t])#
• #ccC[tt "snd"(t)] = tt "snd"(ccC[t])#
• #ccC[tt "f"(t_1, ..., t_n)] = tt "f"((: ccC[t_1], (: ... , (: ccC[t_n], ! :) ... :) :))#

As you may have noticed, the generated theory will have an infinite number of sorts, an infinite number of function symbols, and an infinite number of equations no matter what the signature is — even an empty one! Having an infinite number of things isn’t a problem as long as we can algorithmically describe them and this is what the signature provides. Of course, if you’re a (typical) mathematician you nominally don’t care about an algorithmic description. Besides being compact, signatures present a nicer term language. The theories are like a core or assembly language. We could define a slightly nicer variation where we keep a context and manage named variables leading to terms-in-context like:

#x:A, y:B |-- tt "f"(x, x, y)#

which is

#tt "f"(bbx_0^[A,B], bbx_0^[A,B], bbx_1^[A,B])#

for our current term language for signatures. Of course, compilation will be (slightly) trickier for the nicer language.

The benefit of having compiled the signature to a theory, in addition to being able to reuse the results we’ve established for theories, is that we only need to define operations on the theory, which is simpler since we only need to deal with pairs and unary function symbols. One example of this is that we’d like to extend our notion of interpretation to one that respects the structure of the signature, and we can do that by defining an interpretation of theories that respects finite products.

A finite product preserving interpretation (into a finite product theory), #ccI#, is an interpretation (into a finite product theory) that additionally satisfies:

• #⟦bb1⟧^ccI = bb1#
• #⟦A xx B⟧^ccI = ⟦A⟧^ccI xx ⟦B⟧^ccI#
• #⟦!_A⟧^ccI = !_(⟦A⟧^ccI)#
• #⟦tt "fst"(t)⟧^ccI = tt "fst"(⟦t⟧^ccI)#
• #⟦tt "snd"(t)⟧^ccI = tt "snd"(⟦t⟧^ccI)#
• #⟦(: t_1, t_2 :)⟧^ccI = (: ⟦t_1⟧^ccI, ⟦t_2⟧^ccI :)#

where, for #bb "Set"#, #bb1 -= {{}}#, #xx# is the cartesian product, #tt "fst"# and #tt "snd"# are the projections, #!_A -= x |-> \{\}#, and #(: f, g :) -= x |-> (: f(x), g(x) :)#.

With signatures, we can return to our theory, now signature, of groups. #Sigma_bb "Grp"# has a single sort #S#, three function symbols #tte : [bb1] -> S#, #tti : [S] -> S#, and #ttm : [S, S] -> S#, with the following equations (written as equations rather than pairs):

• #tt "m"(tt "e"((::)^S), bbx_0^S) = bbx_0^S#
• #tt "m"(tt "i"(bbx_0^S), bbx_0^S) = tt "e"((::)^S)#
• #tt "m"(tt "m"(bbx_0^[S,S,S], bbx_1^[S,S,S]), bbx_2^[S,S,S]) = tt "m"(bbx_0^[S,S,S], tt "m"(bbx_1^[S,S,S], bbx_2^[S,S,S]))#

or using the nicer syntax:

• #x:S |-- tt "m"(tt "e"(), x) = x#
• #x:S |-- tt "m"(tt "i"(x), x) = tt "e"()#
• #x:S, y:S, z:S |-- tt "m"(tt "m"(x, y), z) = tt "m"(x, tt "m"(y, z))#

An actual group is then just a finite product preserving interpretation of (the theory generated by) this signature. All of universal algebra and much of abstract algebra can be formulated this way.
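
Specialized to #bb "Set"#, i.e. to Haskell types and functions, a finite product preserving interpretation of #Sigma_bb "Grp"# amounts to a carrier type together with three operations. A minimal sketch (the record and names are mine):

```haskell
-- A finite product preserving interpretation of the group signature,
-- specialized to Haskell: a carrier a and three functions.
data Group a = Group
  { unit :: a            -- interprets e : [1] -> S
  , mult :: a -> a -> a  -- interprets m : [S, S] -> S
  , inv  :: a -> a       -- interprets i : [S] -> S
  }

-- The integers under addition form a group.
intAdd :: Group Integer
intAdd = Group { unit = 0, mult = (+), inv = negate }

-- The three group equations, checked at given elements.
lawsHoldAt :: Eq a => Group a -> a -> a -> a -> Bool
lawsHoldAt g x y z =
     mult g (unit g) x == x                         -- left unit
  && mult g (inv g x) x == unit g                   -- left inverse
  && mult g (mult g x y) z == mult g x (mult g y z) -- associativity
```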

The Simply Typed Lambda Calculus and Beyond

We can consider additionally assuming that our theory has exponentials. I left articulating exactly what that means as an exercise, but the upshot is we have the following two operations:

For any term #t : A xx B -> C#, we have the term #tt "curry"(t) : A -> C^B#. We also have the homomorphism #tt "app"_(AB) : B^A xx A -> B#. They satisfy:

• #tt "curry"(tt "app"(bbx_(B^A xx A))) = bbx_(B^A)#
• #tt "app"((: tt "curry"(t_1), t_2 :)) = t_1((: bbx_A, t_2 :))# where #t_1 : A xx B -> C# and #t_2 : A -> B#.

We can view these, together with the product operations, as combinators and it turns out we can compile the simply typed lambda calculus into the above theory. This is exactly what the Categorical Abstract Machine did. The “Caml” in “O’Caml” stands for “Categorical Abstract Machine Language”, though O’Caml no longer uses the CAM. Conversely, every term of the theory can be expressed as a simply typed lambda term. This means we can view the simply typed lambda calculus as just a different presentation of the theory.

At this point, this presentation of category theory starts to connect to the mainstream categorical literature on universal algebra, internal languages, sketches, and internal logic. This page gives a synopsis of the relationship between type theory and category theory. For some reason, it is unusual to talk about the internal language of a plain category, but that is exactly what we’ve done here.

I haven’t talked about finite limits or colimits beyond products and coproducts, nor have I talked about even the infinitary versions of products and coproducts, let alone arbitrary limits and colimits. These can be handled the same way as products and coproducts. Formulating a language like signatures or the simply typed lambda calculus is a bit more complicated, but not that hard. I may make a follow-up article covering this among other things. I also have a side project (don’t hold your breath), that implements the internal language of a category with finite limits. The result looks roughly like a simple version of an algebraic specification language like the OBJ family. The RING theory described in the Maude manual gives an idea of what it would look like. In fact, here’s an example of the current actual syntax I’m using.3

theory Categories
  type O
  type A
  given src : A -> O
  given tgt : A -> O

  given id : O -> A
  satisfying o:O | src (id o) = o, tgt (id o) = o

  given c : { f:A, g:A | src f = tgt g } -> A
  satisfying (f, g):{ f:A, g:A | src f = tgt g }
    | tgt (c (f, g)) = tgt f, src (c (f, g)) = src g
  satisfying "left unit" (o, f):{ o:O, f:A | tgt f = o }
    | c (id o, f) = f
  satisfying "right unit" (o, f):{ o:O, f:A | src f = o }
    | c (f, id o) = f
  satisfying "associativity" (f, g, h):{ f:A, g:A, h:A | src f = tgt g, src g = tgt h }
    | c (c (f, g), h) = c (f, c (g, h))
endtheory

It turns out this is a particularly interesting spot in the design space. The fact that the theory of theories with finite limits is itself a theory with finite limits has interesting consequences. It is still relatively weak though. For example, it’s not possible to describe the theory of fields in this language.

There are other directions one could go. For example, the internal logic of monoidal categories is (a fragment of) ordered linear logic. You can cross this bridge either way. You can look at different languages and consider what categorical structure is needed to support the features of the language, or you can add features to the category and see how that impacts the internal language. The relationship is similar to that between a source language and a core/intermediate language in a compiler, e.g. GHC Haskell and System FC.

Decoder

If you’ve looked at category theory at all, you can probably make most of the connections without me telling you. The table below outlines the mapping, but there are some subtleties. First, as a somewhat technical detail, my definition of a theory corresponds to a small category, i.e. a category which has a set of objects and a set of arrows. For more programmer types, you should think of “set” as Set in Agda, i.e. similar to the * kind in Haskell. Usually “category” means “locally small category” which may have a proper class of objects and between any two objects a set of arrows (though the union of all those sets may be a proper class). Again, for programmers, the distinction between “class” and “set” is basically the difference between Set and Set1 in Agda.4 To make my definition of theory closer to this, all that is necessary is instead of having a set of function symbols, have a family of sets indexed by pairs of objects. Here’s what a partial definition in Agda of the two scenarios would look like:

-- Small category (the definition I used)
record SmallCategory : Set1 where
  field
    objects : Set
    arrows : Set
    src : arrows -> objects
    tgt : arrows -> objects
    ...

-- Locally small category
record LocallySmallCategory : Set2 where
  field
    objects : Set1
    hom : objects -> objects -> Set
    ...

-- Different presentation of a small category
record SmallCategory' : Set1 where
  field
    objects : Set
    hom : objects -> objects -> Set
    ...

The benefit of the notion of locally small category is that Set itself is a locally small category. The distinction I was making between interpretations into theories and interpretations into Set was due to the fact that Set wasn’t a theory. If I had used a definition of theory corresponding to a locally small category, I could have combined the notions of interpretation by making Set a theory. The notion of a small category, though, is still useful. Also, an interpretation into Set corresponds to the usual notion of a model or semantics, while interpretations into other theories were a less emphasized concept in traditional model theory and universal algebra.

A less technical and more significant difference is that my definition of a theory doesn’t correspond to a category, but rather to a presentation of a category, from which a category can be generated. The analog of arrows in a category is terms, not function symbols. This is a somewhat more natural route from the model theory/universal algebra/programming side. Similarly, having an explicit collection of equations, rather than just an equivalence relation on terms, is part of the presentation of the category but not part of the category itself.

model theory                                   category theory
sort                                           object
term                                           arrow
function symbol                                generating arrow
theory                                         presentation of a (small) category
collage                                        collage, cograph of a profunctor
bridge                                         heteromorphism
signature                                      presentation of a (small) category with finite products
interpretation into sets, aka models           a functor into Set, a (co)presheaf
interpretation into a theory                   functor
homomorphism                                   natural transformation
simply typed lambda calculus (with products)   a cartesian closed category

Conclusion

In some ways I’ve stopped just when things were about to get good. I may do a follow-up to elaborate on this good stuff. Some examples are: if I expand the definition so that Set becomes a “theory”, then interpretations also form such a “theory”, and these are often what we’re really interested in. The category of finite-product preserving interpretations of the theory of groups essentially is the category of groups. In fact, universal algebra is, in categorical terms, just the study of categories with finite products and finite-product preserving functors from them, particularly into Set. It’s easy to generalize this in many directions. It’s also easy to make very general definitions, like a general definition of a free algebraic structure. In general, we’re usually more interested in the interpretations of a theory than the theory itself.

While I often do advocate thinking in terms of internal languages of categories, I’m not sure that it is a preferable perspective for the very basics of category theory. Nevertheless, there are a few reasons for why I wrote this. First, this very syntactical approach is, I think, more accessible to someone coming from a programming background. From this view, a category is a very simple programming language. Adding structure to the category corresponds to adding features to this programming language. Interpretations are denotational semantics.

Another aspect about this presentation that is quite different is the use and emphasis on collages. Collages correspond to profunctors, a crucially important and enabling concept that is rarely covered in categorical introductions. The characterization of profunctors as collages in Vaughan Pratt’s paper (not using that name) was one of the things I enjoyed about that paper and part of what prompted me to start writing this. In earlier drafts of this article, I was unable to incorporate collages in a meaningful way as I was trying to start from profunctors. This approach just didn’t add value. Collages just looked like a bizarre curio and weren’t integrated into the narrative at all. For other reasons, though, I ended up revisiting the idea of a heteromorphism. My (fairly superficial) opinion is that once you have the notion of functors and natural transformations, adding the notion of heteromorphisms has a low power-to-weight ratio, though it does make some things a bit nicer. Nevertheless, in thinking of how best to fit them into this context, it was clear that collages provided the perfect mechanism (which isn’t a big surprise), and the result works rather nicely. When I realized a fact that can be cryptically but compactly represented as #ccK_ccT ≃ bbbI xx ccT# where #bbbI# is the interval category, i.e. two objects with a single arrow joining them, it became clear that this is actually an interesting perspective. Since most of this article was written at that point, I wove collages into the narrative replacing some things. If, though, I had started with this perspective from the beginning I suspect I would have made a significantly different article, though the latter sections would likely be similar.

1. It’s actually better to organize this as a family of collections of function symbols indexed by pairs of sorts.

2. Instead of having equations that generate an equivalence relation on (raw) terms, we could simply require an equivalence relation on (raw) terms be directly provided.

3. Collaging is actually quite natural in this context. I already intend to support one theory importing another. A collage is just a theory that imports two others and then adds function symbols between them.

4. For programmers familiar with Agda, at least, if you haven’t made this connection, this might help you understand and appreciate what a “class” is versus a “set” and what “size issues” are, which is typically handled extremely vaguely in a lot of the literature.

Imaginary Albanian eggplant festivals… IN SPACE

Wikipedia has a list of harvest festivals which includes this intriguing entry:

Ysolo: festival marking the first day of harvest of eggplants in Tirana, Albania

(It now says “citation needed”; I added that yesterday.)

I am confident that this entry, inserted in July 2012 by an anonymous user, is a hoax. When I first read it, I muttered “Oh, what bullshit,” but then went looking for a reliable source, because you never know. I have occasionally been surprised in the past, but this time I found clear evidence of a hoax: there are only a few scattered mentions of Ysolo on a couple of blogs, all from after 2012, and nothing at all in Google Books about Albanian eggplant celebrations. Nor is there an article about it in Albanian Wikipedia.

But reality gave ground before I arrived on the scene. Last September NASA's Dawn spacecraft visited the dwarf planet Ceres. Ceres is named for the Roman goddess of the harvest, and so NASA proposed harvest-related names for Ceres’ newly-discovered physical features. It appears that someone at NASA ransacked the Wikipedia list of harvest festivals without checking whether they were real, because there is now a large mountain at Ceres’ north pole whose official name is Ysolo Mons, named for this spurious eggplant festival. (See also: NASA JPL press release; USGS Astrogeology Science Center announcement.)

To complete the magic circle of fiction, the Albanians might begin to celebrate the previously-fictitious eggplant festival. (And why not? Eggplants are lovely.) Let them do it for a couple of years, and then Wikipedia could document the real eggplant festival… Why not fall under the spell of Tlön and submit to the minute and vast evidence of an ordered planet?

Happy Ysolo, everyone.

[ Addendum 20161208: Ysolo has been canceled ]

Introduction

In most presentations of Riemannian geometry, e.g. O’Neill (1983) and Wikipedia, the fundamental theorem of Riemannian geometry (“the miracle of Riemannian geometry”) is given: for any semi-Riemannian manifold there is a unique torsion-free metric connection. Partly because of this, I assume, and partly because the major application of Riemannian geometry is General Relativity, connections with torsion are given little if any attention.

It turns out we are all very familiar with a connection with torsion: the Mercator projection. Some mathematical physics texts, e.g. Nakahara (2003), allude to this but leave the details to the reader. Moreover, this connection respects the metric induced from Euclidean space.

We use SageManifolds to assist with the calculations. We hint at how this might be done more slickly in Haskell.

A Cartographic Aside

%matplotlib inline
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
import cartopy
import cartopy.crs as ccrs
from cartopy.mpl.ticker import LongitudeFormatter, LatitudeFormatter
plt.figure(figsize=(8, 8))

ax = plt.axes(projection=cartopy.crs.Mercator())

ax.gridlines()

plt.show()

We can see that Greenland looks much broader at the north than in the middle. But if we use a polar projection (below) then we see this is not the case. Why then is the Mercator projection used in preference to e.g. the polar projection or the once controversial Gall-Peters? See here for more on map projections.
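The broadening is quantified by the Mercator projection’s local scale factor, which is the secant of the latitude. A quick sanity check in plain Python (a sketch of mine, not part of the notebook; the latitudes for Greenland are rough guesses) shows why its northern tip looks so wide:

```python
import math

def mercator_scale(lat_deg):
    """Local scale factor of the Mercator projection: sec(latitude)."""
    return 1.0 / math.cos(math.radians(lat_deg))

# Southern Greenland (~60N) vs its northern tip (~80N)
s60 = mercator_scale(60)   # exactly 2: lengths doubled
s80 = mercator_scale(80)   # ~5.76: almost a 6-fold stretch
print(s60, s80, s80 / s60)
```

So the north of Greenland is drawn nearly three times wider, per unit of true length, than its south.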

plt.figure(figsize=(8, 8))

bx = plt.axes(projection=cartopy.crs.NorthPolarStereo())

bx.set_extent([-180, 180, 90, 50], ccrs.PlateCarree())

bx.gridlines()

plt.show()

Colophon

This is written as a Jupyter notebook. In theory, it should be possible to run it assuming you have installed at least Sage and Haskell. To publish it, I used

jupyter-nbconvert --to markdown Mercator.ipynb
pandoc -s Mercator.md -t markdown+lhs -o Mercator.lhs \
--filter pandoc-citeproc --bibliography DiffGeom.bib
BlogLiteratelyD --wplatex Mercator.lhs > Mercator.html

Not brilliant but good enough.

Some commands to Jupyter to display things nicely.

%display latex
viewer3D = 'tachyon'

Warming Up With SageManifolds

Let us try a simple exercise: finding the connection coefficients of the Levi-Civita connection for the Euclidean metric on $\mathbb{R}^2$ in polar co-ordinates.

Define the manifold.

N = Manifold(2, 'N',r'\mathcal{N}', start_index=1)

Define a chart and frame with Cartesian co-ordinates.

ChartCartesianN.<x,y> = N.chart()
FrameCartesianN = ChartCartesianN.frame()

Define a chart and frame with polar co-ordinates.

ChartPolarN.<r,theta> = N.chart()
FramePolarN = ChartPolarN.frame()

The standard transformation from Cartesian to polar co-ordinates.

cartesianToPolar = ChartCartesianN.transition_map(ChartPolarN, (sqrt(x^2 + y^2), arctan(y/x)))
print(cartesianToPolar)
Change of coordinates from Chart (N, (x, y)) to Chart (N, (r, theta))
print(latex(cartesianToPolar.display()))

$\displaystyle \left\{\begin{array}{lcl} r & = & \sqrt{x^{2} + y^{2}} \\ \theta & = & \arctan\left(\frac{y}{x}\right) \end{array}\right.$

cartesianToPolar.set_inverse(r * cos(theta), r * sin(theta))
Check of the inverse coordinate transformation:
x == x
y == y
r == abs(r)
theta == arctan(sin(theta)/cos(theta))

Now we define the metric to make the manifold Euclidean.

g_e = N.metric('g_e')
g_e[1,1], g_e[2,2] = 1, 1

We can display this in Cartesian co-ordinates.

print(latex(g_e.display(FrameCartesianN)))

$\displaystyle g_e = \mathrm{d} x\otimes \mathrm{d} x+\mathrm{d} y\otimes \mathrm{d} y$

And we can display it in polar co-ordinates

print(latex(g_e.display(FramePolarN)))

$\displaystyle g_e = \mathrm{d} r\otimes \mathrm{d} r + \left( x^{2} + y^{2} \right) \mathrm{d} \theta\otimes \mathrm{d} \theta$

Next let us compute the Levi-Civita connection from this metric.

nab_e = g_e.connection()
print(latex(nab_e))

$\displaystyle \nabla_{g_e}$

If we use Cartesian co-ordinates, we expect that $\Gamma^k_{ij} = 0, \forall i,j,k$. Only non-zero entries get printed.

print(latex(nab_e.display(FrameCartesianN)))

Just to be sure, we can print out all the entries.

print(latex(nab_e[:]))

$\displaystyle \left[\left[\left[0, 0\right], \left[0, 0\right]\right], \left[\left[0, 0\right], \left[0, 0\right]\right]\right]$

In polar co-ordinates, we get

print(latex(nab_e.display(FramePolarN)))

$\displaystyle \begin{array}{lcl} \Gamma_{ \phantom{\, r } \, \theta \, \theta }^{ \, r \phantom{\, \theta } \phantom{\, \theta } } & = & -\sqrt{x^{2} + y^{2}} \\ \Gamma_{ \phantom{\, \theta } \, r \, \theta }^{ \, \theta \phantom{\, r } \phantom{\, \theta } } & = & \frac{1}{\sqrt{x^{2} + y^{2}}} \\ \Gamma_{ \phantom{\, \theta } \, \theta \, r }^{ \, \theta \phantom{\, \theta } \phantom{\, r } } & = & \frac{1}{\sqrt{x^{2} + y^{2}}} \end{array}$

Which we can re-write as

\displaystyle \begin{aligned} \Gamma^r_{\theta,\theta} &= -r \\ \Gamma^\theta_{r,\theta} &= 1/r \\ \Gamma^\theta_{\theta,r} &= 1/r \end{aligned}

with all other entries being 0.
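These coefficients can be sanity-checked without Sage, using the coordinate formula $\Gamma^k_{ij} = \tfrac{1}{2} g^{kl}(\partial_i g_{lj} + \partial_j g_{il} - \partial_l g_{ij})$ with central finite differences of the polar metric $\mathrm{diag}(1, r^2)$. This is a plain-Python sketch of mine (all names are made up):

```python
import math

def g(r, th):
    """Euclidean metric on R^2 in polar co-ordinates: diag(1, r^2)."""
    return [[1.0, 0.0], [0.0, r * r]]

def christoffel(r, th, h=1e-5):
    """Gamma^k_{ij} = (1/2) g^{kl} (d_i g_{lj} + d_j g_{il} - d_l g_{ij}),
    with the metric derivatives taken by central differences."""
    x = [r, th]

    def dg(l, i, j):
        # d g_{ij} / d x^l by a central difference
        xp = x[:]; xm = x[:]
        xp[l] += h; xm[l] -= h
        return (g(*xp)[i][j] - g(*xm)[i][j]) / (2 * h)

    ginv = [[1.0, 0.0], [0.0, 1.0 / (r * r)]]  # inverse of diag(1, r^2)
    Gamma = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    for k in range(2):
        for i in range(2):
            for j in range(2):
                Gamma[k][i][j] = 0.5 * sum(
                    ginv[k][l] * (dg(i, l, j) + dg(j, i, l) - dg(l, i, j))
                    for l in range(2))
    return Gamma

G = christoffel(2.0, 0.3)
# Expect Gamma^r_{theta,theta} = -r = -2 and Gamma^theta_{r,theta} = 1/r = 0.5
print(G[0][1][1], G[1][0][1])
```

With index 0 for $r$ and 1 for $\theta$, the numerical values agree with the symbolic ones above.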

The Sphere

We define a 2-dimensional manifold. We call it the 2-dimensional (unit) sphere, but we are going to remove a meridian from it to allow us to define the desired connection with torsion.

S2 = Manifold(2, 'S^2', latex_name=r'\mathbb{S}^2', start_index=1)
print(latex(S2))

$\displaystyle \mathbb{S}^2$

To start off with we cover the manifold with two charts.

polar.<th,ph> = S2.chart(r'th:(0,pi):\theta ph:(0,2*pi):\phi'); print(latex(polar))

$\displaystyle \left(\mathbb{S}^2,({\theta}, {\phi})\right)$

mercator.<xi,ze> = S2.chart(r'xi:(-oo,oo):\xi ze:(0,2*pi):\zeta'); print(latex(mercator))

$\displaystyle \left(\mathbb{S}^2,({\xi}, {\zeta})\right)$

We can now check that we have two charts.

print(latex(S2.atlas()))

$\displaystyle \left[\left(\mathbb{S}^2,({\theta}, {\phi})\right), \left(\mathbb{S}^2,({\xi}, {\zeta})\right)\right]$

We can then define co-ordinate frames.

epolar = polar.frame(); print(latex(epolar))

$\displaystyle \left(\mathbb{S}^2 ,\left(\frac{\partial}{\partial {\theta} },\frac{\partial}{\partial {\phi} }\right)\right)$

emercator = mercator.frame(); print(latex(emercator))

$\displaystyle \left(\mathbb{S}^2 ,\left(\frac{\partial}{\partial {\xi} },\frac{\partial}{\partial {\zeta} }\right)\right)$

And define a transition map and its inverse from one frame to the other checking that they really are inverses.

xy_to_uv = polar.transition_map(mercator, (log(tan(th/2)), ph))
xy_to_uv.set_inverse(2*arctan(exp(xi)), ze)
Check of the inverse coordinate transformation:
th == 2*arctan(sin(1/2*th)/cos(1/2*th))
ph == ph
xi == xi
ze == ze
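The transition $\xi = \log\tan(\theta/2)$ and its inverse $\theta = 2\arctan e^\xi$ can also be checked round-trip in plain Python (a sketch of mine, independent of Sage):

```python
import math

def polar_to_mercator(th, ph):
    """(theta, phi) -> (xi, zeta): xi = log tan(theta/2), zeta = phi."""
    return math.log(math.tan(th / 2)), ph

def mercator_to_polar(xi, ze):
    """(xi, zeta) -> (theta, phi): theta = 2 arctan(exp(xi)), phi = zeta."""
    return 2 * math.atan(math.exp(xi)), ze

th, ph = 1.2, 2.5                      # a sample point with 0 < theta < pi
xi, ze = polar_to_mercator(th, ph)
th2, ph2 = mercator_to_polar(xi, ze)
print(th2 - th, ph2 - ph)              # both ~0: the maps are mutual inverses
```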

We can define the metric which is the pullback of the Euclidean metric on $\mathbb{R}^3$.

g = S2.metric('g')
g[1,1], g[2,2] = 1, (sin(th))^2

And then calculate the Levi-Civita connection defined by it.

nab_g = g.connection()
print(latex(nab_g.display()))

$\displaystyle \begin{array}{lcl} \Gamma_{ \phantom{\, {\theta} } \, {\phi} \, {\phi} }^{ \, {\theta} \phantom{\, {\phi} } \phantom{\, {\phi} } } & = & -\cos\left({\theta}\right) \sin\left({\theta}\right) \\ \Gamma_{ \phantom{\, {\phi} } \, {\theta} \, {\phi} }^{ \, {\phi} \phantom{\, {\theta} } \phantom{\, {\phi} } } & = & \frac{\cos\left({\theta}\right)}{\sin\left({\theta}\right)} \\ \Gamma_{ \phantom{\, {\phi} } \, {\phi} \, {\theta} }^{ \, {\phi} \phantom{\, {\phi} } \phantom{\, {\theta} } } & = & \frac{\cos\left({\theta}\right)}{\sin\left({\theta}\right)} \end{array}$

We know the geodesics defined by this connection are the great circles.

We can check that this connection respects the metric.

print(latex(nab_g(g).display()))

$\displaystyle \nabla_{g} g = 0$

And that it has no torsion.

print(latex(nab_g.torsion().display()))
0

A New Connection

Let us now define an orthonormal frame.

ch_basis = S2.automorphism_field()
ch_basis[1,1], ch_basis[2,2] = 1, 1/sin(th)
e = S2.default_frame().new_frame(ch_basis, 'e')
print(latex(e))

$\displaystyle \left(\mathbb{S}^2, \left(e_1,e_2\right)\right)$

We can calculate the dual 1-forms.

dX = S2.coframes()[2] ; print(latex(dX))

$\displaystyle \left(\mathbb{S}^2, \left(e^1,e^2\right)\right)$

print(latex((dX[1], dX[2])))

$\displaystyle \left(e^1, e^2\right)$

print(latex((dX[1][:], dX[2][:])))

$\displaystyle \left(\left[1, 0\right], \left[0, \sin\left({\theta}\right)\right]\right)$

In this case it is trivial to check that the frame and coframe really are orthonormal, but we let Sage do it anyway.

print(latex(((dX[1](e[1]).expr(), dX[1](e[2]).expr()), (dX[2](e[1]).expr(), dX[2](e[2]).expr()))))

$\displaystyle \left(\left(1, 0\right), \left(0, 1\right)\right)$

Let us define two vectors to be parallel if their angles to a given meridian are the same. For this to be true we must have a connection $\nabla$ with $\nabla e_1 = \nabla e_2 = 0$.

nab = S2.affine_connection('nabla', latex_name=r'\nabla')
nab.add_coef(e)

Displaying the connection only gives the non-zero components.

print(latex(nab.display(e)))

For safety, let us check all the components explicitly.

print(latex(nab[e,:]))

$\displaystyle \left[\left[\left[0, 0\right], \left[0, 0\right]\right], \left[\left[0, 0\right], \left[0, 0\right]\right]\right]$

Of course the components are not non-zero in other frames.

print(latex(nab.display(epolar)))

$\displaystyle \begin{array}{lcl} \Gamma_{ \phantom{\, {\phi} } \, {\phi} \, {\theta} }^{ \, {\phi} \phantom{\, {\phi} } \phantom{\, {\theta} } } & = & \frac{\cos\left({\theta}\right)}{\sin\left({\theta}\right)} \end{array}$

print(latex(nab.display(emercator)))

$\displaystyle \begin{array}{lcl} \Gamma_{ \phantom{\, {\xi} } \, {\xi} \, {\xi} }^{ \, {\xi} \phantom{\, {\xi} } \phantom{\, {\xi} } } & = & 2 \, \cos\left(\frac{1}{2} \, {\theta}\right)^{2} - 1 \\ \Gamma_{ \phantom{\, {\zeta} } \, {\zeta} \, {\xi} }^{ \, {\zeta} \phantom{\, {\zeta} } \phantom{\, {\xi} } } & = & \frac{2 \, \cos\left(\frac{1}{2} \, {\theta}\right) \cos\left({\theta}\right) \sin\left(\frac{1}{2} \, {\theta}\right)}{\sin\left({\theta}\right)} \end{array}$

This connection also respects the metric $g$.

print(latex(nab(g).display()))

$\displaystyle \nabla g = 0$

Thus, since the Levi-Civita connection is unique, it must have torsion.

print(latex(nab.torsion().display(e)))

$\displaystyle \frac{\cos\left({\theta}\right)}{\sin\left({\theta}\right)} e_2\otimes e^1\otimes e^2 -\frac{\cos\left({\theta}\right)}{\sin\left({\theta}\right)} e_2\otimes e^2\otimes e^1$

The equations for geodesics are

$\displaystyle \ddot{\gamma}^k + \Gamma_{ \phantom{\, {k} } \, {i} \, {j} }^{ \, {k} \phantom{\, {i} } \phantom{\, {j} } }\dot{\gamma}^i\dot{\gamma}^j = 0$

Explicitly for both variables in the polar co-ordinates chart.

\displaystyle \begin{aligned} \ddot{\gamma}^\phi + \frac{\cos\theta}{\sin\theta}\dot{\gamma}^\phi\dot{\gamma}^\theta &= 0 \\ \ddot{\gamma}^\theta &= 0 \end{aligned}

We can check that $\gamma^\phi(t) = \alpha\log\tan t/2$ and $\gamma^\theta(t) = t$ are solutions, although Sage needs a bit of prompting to help it.

t = var('t'); a = var('a')
print(latex(diff(a * log(tan(t/2)),t).simplify_full()))

$\displaystyle \frac{a}{2 \, \cos\left(\frac{1}{2} \, t\right) \sin\left(\frac{1}{2} \, t\right)}$

We can simplify this further by recalling the trigonometric identity.

print(latex(sin(2 * t).trig_expand()))

$\displaystyle 2 \, \cos\left(t\right) \sin\left(t\right)$

print(latex(diff (a / sin(t), t)))

$\displaystyle -\frac{a \cos\left(t\right)}{\sin\left(t\right)^{2}}$
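Alternatively, the claimed solution can be checked numerically in plain Python, evaluating the residual of the geodesic equation $\ddot{\gamma}^\phi + \frac{\cos\theta}{\sin\theta}\dot{\gamma}^\phi\dot{\gamma}^\theta = 0$ along $\gamma^\theta(t) = t$, $\gamma^\phi(t) = \alpha\log\tan t/2$, with finite differences (a sketch of mine; the helper names are made up):

```python
import math

def gamma_phi(t, a=3.0):
    """Candidate solution gamma^phi(t) = alpha log tan(t/2), with alpha = a."""
    return a * math.log(math.tan(t / 2))

def d(f, t, h=1e-5):
    """Central first derivative."""
    return (f(t + h) - f(t - h)) / (2 * h)

def d2(f, t, h=1e-4):
    """Central second derivative."""
    return (f(t + h) - 2 * f(t) + f(t - h)) / (h * h)

t = 1.1  # gamma^theta(t) = t, so theta = t and d(gamma^theta)/dt = 1
residual = d2(gamma_phi, t) + (math.cos(t) / math.sin(t)) * d(gamma_phi, t) * 1.0
print(residual)  # ~0: the geodesic equation is satisfied
```

The second equation, $\ddot{\gamma}^\theta = 0$, holds trivially since $\gamma^\theta$ is linear in $t$.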

In the mercator co-ordinates chart this is

\displaystyle \begin{aligned} \gamma^\xi(t) &= \log\tan t/2 \\ \gamma^\zeta(t) &= \alpha\log\tan t/2 \end{aligned}

In other words: straight lines.

Reparameterising with $s = \alpha\log\tan t/2$ we obtain

\displaystyle \begin{aligned} \gamma^\phi(s) &= s \\ \gamma^\theta(s) &= 2\arctan e^\frac{s}{\alpha} \end{aligned}

Let us draw such a curve.

R.<t> = RealLine() ; print(R)
Real number line R
print(dim(R))
1
c = S2.curve({polar: [2*atan(exp(-t/10)), t]}, (t, -oo, +oo), name='c')
print(latex(c.display()))

$\displaystyle \begin{array}{llcl} c:& \mathbb{R} & \longrightarrow & \mathbb{S}^2 \\ & t & \longmapsto & \left({\theta}, {\phi}\right) = \left(2 \, \arctan\left(e^{\left(-\frac{1}{10} \, t\right)}\right), t\right) \\ & t & \longmapsto & \left({\xi}, {\zeta}\right) = \left(-\frac{1}{10} \, t, t\right) \end{array}$

c.parent()

$\displaystyle \mathrm{Hom}\left(\mathbb{R},\mathbb{S}^2\right)$

c.plot(chart=polar, aspect_ratio=0.1)

It’s not totally clear this is curved so let’s try with another example.

d = S2.curve({polar: [2*atan(exp(-t)), t]}, (t, -oo, +oo), name='d')
print(latex(d.display()))

$\displaystyle \begin{array}{llcl} d:& \mathbb{R} & \longrightarrow & \mathbb{S}^2 \\ & t & \longmapsto & \left({\theta}, {\phi}\right) = \left(2 \, \arctan\left(e^{\left(-t\right)}\right), t\right) \\ & t & \longmapsto & \left({\xi}, {\zeta}\right) = \left(-t, t\right) \end{array}$

d.plot(chart=polar, aspect_ratio=0.2)

Now it’s clear that a straight line is curved in polar co-ordinates.

But of course in Mercator co-ordinates it is a straight line. This explains the projection's popularity with mariners: if you draw a straight line on your chart and follow that bearing, or rhumb line, with a compass, you will arrive at the end of the straight line. It is not the shortest path (great circles are), but it is much easier to navigate.

c.plot(chart=mercator, aspect_ratio=0.1)
d.plot(chart=mercator, aspect_ratio=1.0)
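We can also confirm the rhumb-line property numerically: measured in the orthonormal frame $(e_1, e_2)$, the tangent to $c$ is $(\dot\theta, \sin\theta\,\dot\phi)$, so the angle the curve makes with the meridian is $\arctan(\sin\theta\,\dot\phi/\dot\theta)$, and this turns out to be the same at every point. A plain-Python sketch of mine:

```python
import math

def theta(t, a=10.0):
    """Colatitude along the loxodrome c: theta = 2 arctan(exp(-t/a))."""
    return 2 * math.atan(math.exp(-t / a))

def bearing(t, a=10.0, h=1e-6):
    """Angle of the tangent to the meridian, measured in the
    orthonormal frame (d/dtheta, (1/sin theta) d/dphi)."""
    th_dot = (theta(t + h, a) - theta(t - h, a)) / (2 * h)
    phi_dot = 1.0  # phi = t along this curve
    return math.atan2(math.sin(theta(t, a)) * phi_dot, th_dot)

print(bearing(-5.0), bearing(0.0), bearing(7.0))  # all equal: constant bearing
```

For this curve the constant bearing works out to $\pi - \arctan 10$, consistent with the angle to the meridian of $\pi/2 - \arctan 1/10$ quoted below.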

We can draw these curves on the sphere itself, not just on its charts.

R3 = Manifold(3, 'R^3', r'\mathbb{R}^3', start_index=1)
cart.<X,Y,Z> = R3.chart(); print(latex(cart))

$\displaystyle \left(\mathbb{R}^3,(X, Y, Z)\right)$

Phi = S2.diff_map(R3, {
(polar, cart): [sin(th) * cos(ph), sin(th) * sin(ph), cos(th)],
(mercator, cart): [cos(ze) / cosh(xi), sin(ze) / cosh(xi),
sinh(xi) / cosh(xi)]
},
name='Phi', latex_name=r'\Phi')

We can either plot using polar co-ordinates.

graph_polar = polar.plot(chart=cart, mapping=Phi, nb_values=25, color='blue')
show(graph_polar, viewer=viewer3D)

Or using Mercator co-ordinates. In either case we get the sphere (minus the prime meridian).

graph_mercator = mercator.plot(chart=cart, mapping=Phi, nb_values=25, color='red')
show(graph_mercator, viewer=viewer3D)

We can plot the curve with an angle to the meridian of $\pi/2 - \arctan 1/10$

graph_c = c.plot(mapping=Phi, max_range=40, plot_points=200, thickness=2)
show(graph_polar + graph_c, viewer=viewer3D)

And we can plot the curve at angle of $\pi/4$ to the meridian.

graph_d = d.plot(mapping=Phi, max_range=40, plot_points=200, thickness=2, color="green")
show(graph_polar + graph_c + graph_d, viewer=viewer3D)

With automatic differentiation and symbolic numbers, symbolic differentiation is straightforward in Haskell.

> import Data.Number.Symbolic -- 'var' and symbolic numbers (numbers package)
> import Numeric.AD           -- 'jacobian' (ad package)
>
> x = var "x"
> y = var "y"
>
> test xs = jacobian ((\x -> [x]) . f) xs
>   where
>     f [x, y] = sqrt $ x^2 + y^2

ghci> test [1, 1]
[[0.7071067811865475,0.7071067811865475]]

ghci> test [x, y]
[[x/(2.0*sqrt (x*x+y*y))+x/(2.0*sqrt (x*x+y*y)),y/(2.0*sqrt (x*x+y*y))+y/(2.0*sqrt (x*x+y*y))]]


Anyone wishing to produce a Haskell version of SageManifolds is advised to look here before embarking on the task.

Appendix A: Conformal Equivalence

Agricola and Thier (2004) show that the geodesics of the Levi-Civita connection of a conformally equivalent metric are the geodesics of a connection with vectorial torsion. Let's put some, but not all, of the flesh on the bones.

The Koszul formula (see e.g. (O’Neill 1983)) characterizes the Levi-Civita connection $\nabla$

\displaystyle \begin{aligned} 2 \langle \nabla_X Y, Z\rangle & = X \langle Y,Z\rangle + Y \langle Z,X\rangle - Z \langle X,Y\rangle \\ &- \langle X,[Y,Z]\rangle + \langle Y,[Z,X]\rangle + \langle Z,[X,Y]\rangle \end{aligned}

Being more explicit about the metric, this can be re-written as

\displaystyle \begin{aligned} 2 g(\nabla^g_X Y, Z) & = X g(Y,Z) + Y g(Z,X) - Z g(X,Y) \\ &- g(X,[Y,Z]) + g(Y,[Z,X]) + g(Z,[X,Y]) \end{aligned}

Let $\nabla^h$ be the Levi-Civita connection for the metric $h = e^{2\sigma}g$ where $\sigma \in C^\infty M$. Following [Gadea2010] and substituting into the Koszul formula and then applying the product rule

\displaystyle \begin{aligned} 2 e^{2 \sigma} g(\nabla^h_X Y, Z) & = X e^{2 \sigma} g(Y,Z) + Y e^{2 \sigma} g(Z,X) - Z e^{2 \sigma} g(X,Y) \\ & + e^{2 \sigma} g([X,Y],Z) - e^{2 \sigma} g([Y,Z],X) + e^{2 \sigma} g([Z,X],Y) \\ & = 2 e^{2\sigma}[g(\nabla^{g}_X Y, Z) + X\sigma g(Y,Z) + Y\sigma g(Z,X) - Z\sigma g(X,Y)] \\ & = 2 e^{2\sigma}[g(\nabla^{g}_X Y + X\sigma Y + Y\sigma X - g(X,Y) \mathrm{grad}\sigma, Z)] \end{aligned}

Where as usual the vector field, $\mathrm{grad}\phi$ for $\phi \in C^\infty M$, is defined via $g(\mathrm{grad}\phi, X) = \mathrm{d}\phi(X) = X\phi$.

Let’s try an example.

nab_tilde = S2.affine_connection('nabla_t', r'\tilde{\nabla}')
f = S2.scalar_field(-ln(sin(th)), name='f')
for i in S2.irange():
    for j in S2.irange():
        for k in S2.irange():
            nab_tilde.add_coef()[k,i,j] = \
                nab_g(polar.frame()[i])(polar.frame()[j])(polar.coframe()[k]) + \
                polar.frame()[i](f) * polar.frame()[j](polar.coframe()[k]) + \
                polar.frame()[j](f) * polar.frame()[i](polar.coframe()[k]) + \
                g(polar.frame()[i], polar.frame()[j]) * \
                polar.frame()[1](polar.coframe()[k]) * cos(th) / sin(th)
print(latex(nab_tilde.display()))

$\displaystyle \begin{array}{lcl} \Gamma_{ \phantom{\, {\theta} } \, {\theta} \, {\theta} }^{ \, {\theta} \phantom{\, {\theta} } \phantom{\, {\theta} } } & = & -\frac{\cos\left({\theta}\right)}{\sin\left({\theta}\right)} \end{array}$

print(latex(nab_tilde.torsion().display()))
0
g_tilde = exp(2 * f) * g
print(latex(g_tilde.parent()))

$\displaystyle \mathcal{T}^{(0,2)}\left(\mathbb{S}^2\right)$

print(latex(g_tilde[:]))

$\displaystyle \left(\begin{array}{rr} \frac{1}{\sin\left({\theta}\right)^{2}} & 0 \\ 0 & 1 \end{array}\right)$

nab_g_tilde = g_tilde.connection()
print(latex(nab_g_tilde.display()))

$\displaystyle \begin{array}{lcl} \Gamma_{ \phantom{\, {\theta} } \, {\theta} \, {\theta} }^{ \, {\theta} \phantom{\, {\theta} } \phantom{\, {\theta} } } & = & -\frac{\cos\left({\theta}\right)}{\sin\left({\theta}\right)} \end{array}$

It’s not clear (to me at any rate) what the solutions are to the geodesic equations despite the guarantees of Agricola and Thier (2004). But let’s try a different chart.

print(latex(nab_g_tilde[emercator,:]))

$\displaystyle \left[\left[\left[0, 0\right], \left[0, 0\right]\right], \left[\left[0, 0\right], \left[0, 0\right]\right]\right]$

In this chart, the geodesics are clearly straight lines as we would hope.

References

Agricola, Ilka, and Christian Thier. 2004. “The geodesics of metric connections with vectorial torsion.” Annals of Global Analysis and Geometry 26 (4): 321–32. doi:10.1023/B:AGAG.0000047509.63818.4f.

Nakahara, M. 2003. Geometry, Topology and Physics. 2nd ed. Graduate Student Series in Physics. Bristol: Institute of Physics Publishing.

O’Neill, B. 1983. Semi-Riemannian Geometry with Applications to Relativity, 103. Pure and Applied Mathematics. Elsevier Science. https://books.google.com.au/books?id=CGk1eRSjFIIC.