# Optimizing incremental compilation

When you run make to build software, you expect a build of software that has previously been built to take less time than a build from scratch. The reason for this is incremental compilation: by caching the intermediate results of ahead-of-time compilation, the only parts of a program that must be recompiled are those that depend on the changed portions of the dependency graph.

The term incremental compilation doesn't say much about how the dependency graph is set up, which can lead to some confusion about the performance characteristics of "incremental compilers." For example, the Wikipedia article on incremental compilation claims that incremental compilers cannot easily optimize the code they compile. This is wrong: it depends entirely on how your dependency graph is set up.

Take, for example, gcc for C:

The object file a.o depends on a.c, as well as any header files it (transitively) includes (a.h, in this case.) Since a.o and main.o do not depend on each other, if a.c is rebuilt, main.o does not need to be rebuilt. In this sense, C is actually amazingly incremental (said no C programmer ever.) The reason C has a bad reputation for incremental compilation is that, naively, the preprocessing of headers is not done incrementally at all (precompiled headers are an attempt to address this problem).

The dependency graph implies something else as well: unless the body of a function is placed in a.h, there is no way for the compiler that produces main.o to inline the body in: it knows nothing about the C file. a.c may not even exist at the point main.o is being built (parallelism!) The only time such optimization could happen is at link-time (this is why link-time optimization is a thing.)

A nice contrast is ghc for Haskell:

Here, Main.{hi,o} depend not only on Main.hs but also on A.hi, the module it imports. GHC is still incremental: if you modify an hs file, only the things that import that source file need to be recompiled. Things are even better than this dependency diagram implies: Main.{hi,o} may only depend on specific pieces of A.hi; if those pieces are unchanged GHC will exit early and report compilation is NOT necessary.

Despite being incremental, GHC supports inlining, since unfoldings of functions can be stored in hi files, which can subsequently be used by modules which import it. But now there is a trade-off: if you inline a function, you now depend on the unfolding in the hi file, making it more likely that compilation is necessary when A.hi changes.
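To make the trade-off concrete, here is a minimal sketch (the module and function names are my own illustration, not from the post): marking a function INLINABLE asks GHC to record its unfolding in A.hi, and a client compiled against A.hi may inline it, at the cost of being recompiled whenever that unfolding changes.

-- A.hs
module A (double) where

{-# INLINABLE double #-}
double :: Int -> Int
double x = x + x

-- Main.hs: compiled against A.hi. If GHC inlines the unfolding of double
-- recorded there, a change to double's body forces Main to be rebuilt.
module Main where

import A (double)

main :: IO ()
main = print (double 21)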

As one final example, incremental compilers in IDEs, like the Java compiler in Eclipse, are not doing anything fundamentally different than the operation of GHC. The primary differences are (1) the intermediate products are held in memory, which can result in huge savings since parsing and loading interfaces into memory is a huge timewaster, and (2) they try to make the dependency diagram as fine-grained as possible.

This is all fairly well known, so I want to shift gears and think about a less well-understood problem: how does one do incremental compilation for parametrized build products? When I say parametrized, I mean a blend of the C and Haskell paradigms:

• Separate compilation. It should be possible to depend on an interface without depending on an implementation (like when a C file depends on a header file.)
• Cost-free abstraction. When the implementation is provided, we should (re)compile our module so that we can inline definitions from the implementation (like when a Haskell module imports another module.)

This problem is of interest for Backpack, which introduces libraries parametrized over signatures to Haskell. For Backpack, we came up with the following design: generate distinct build products for (1) uninstantiated code, for which we know an interface but not its implementation, and (2) instantiated code, for which we know all of their implementations:

In the blue box, we generate A.hi and Main.hi which contain purely the results of typechecking against an interface. Only in the pink box do we combine the implementation of A (in the red box) with the user of A (Main). This is just a graph; thus, incremental compilation works just as it did before.
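Concretely, a signature and a client module in Backpack's .hsig style might look like the following sketch (the names here are my own illustration, not the modules from the post):

-- Str.hsig: an uninstantiated signature. Clients can be typechecked against
-- this interface alone, with no implementation in sight (separate compilation).
signature Str where

data Str
empty  :: Str
append :: Str -> Str -> Str

-- User.hs: depends only on the signature. When the package is later
-- instantiated with a concrete Str module, User is compiled again against
-- that module, so its definitions can be inlined (cost-free abstraction).
module User where

import Str

twice :: Str -> Str
twice s = append s s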

We quickly ran into an intriguing problem when we sought to support multiple interfaces, which could be instantiated separately: if a client instantiates one interface but not the other, what should we do? Are we obligated to generate build products for these partially instantiated modules? This is not very useful, since we can't generate code yet (since we don't know all of the implementations.)

An important observation is that these interfaces are really cheap to generate (since you're not doing any compilation). Thus, our idea was to do the instantiation on-the-fly, without actually generating build products. The partially instantiated interfaces can be cached in memory, but they're cheap to generate, and we win if we don't need them (in which case we don't instantiate them.)

This is a bit of a clever scheme, and cleverness always has a dark side. A major source of complexity with on-the-fly instantiation is that there are now two representations of what is morally the same build product: the on-the-fly products and the actually compiled ones:

The subtyping relation between these two products states that we can always use a compiled interface in place of an on-the-fly instantiated one, but not vice versa: the on-the-fly interface is missing unfoldings and other important information that compiled code may need.

If we are type-checking only (we have uninstantiated interfaces), we might prefer on-the-fly interfaces, because they require less work to create:

In contrast, if we are compiling a package, we must use the compiled interface, to ensure we see the necessary unfoldings for inlining:

A particularly complicated case is if we are type-checking an uninstantiated set of modules, which themselves depend on some compiled interfaces. If we are using an interface p+a/M.hi, we should be consistent about it, and since r must use the compiled interfaces, so must q:

The alternative is to ensure that we always have build products available that were typechecked against the on-the-fly interfaces, as below:

But this has the distasteful effect of requiring everything to be built twice (first typechecked against the on-the-fly interfaces, and then built for real).

The dependency graph of build products for an ahead-of-time compiler is traditionally part of the public API of the compiler. As I've written previously, to achieve better incrementality, better parallelism, and more features (like parametrized modules), dependency graphs become more and more complicated. When compiler writers don't want to commit to an interface and build tool authors aren't interested in learning about a complicated compilation model, the only systems that work well are the integrated ones.

Is Backpack's system for on-the-fly interface instantiation too clever for its own good? I believe it is well-designed for the problem it tries to solve, but if you still have a complicated design, perhaps you are solving the wrong problem. I would love to hear your thoughts.

# Full-Stack Developer (Haskell/PureScript) at CollegeVine (Full-time)

### Overview

CollegeVine is looking for a product-focused full-stack developer to help engineer the future of mentorship and higher education attainment.

There aren't many industries left that haven't been significantly disrupted by technology in some way, but you're reading about one right here! You will find many opportunities to apply high-leverage computer science (think machine learning, probabilistic reasoning, etc.) as well as plenty of opportunities for the more human side of the problem. As it stands, the current admissions process is a huge source of stress and confusion for students and parents alike. If we execute correctly, your work will impact the entire next generation of college graduates-to-be.

You will join a fast-moving company whose culture centers around authenticity, excellence, and balance. You'll find that everyone likes to keep things simple and transparent. We prefer to be goal-oriented and hands-off as long as you are a self-starter.

Our modern perspective on developer potential means we celebrate and optimize for real output. And that's probably the reason why we're a polyglot functional programming shop, with emphasis on Haskell and functional paradigms. Our infrastructure and non-mission-critical tooling tends to be in whatever works best for the task at hand: sometimes that's Haskell with advanced GHC extensions a-blazin', other times it's minimalist Ruby or bash—basically, it's a team decision based on whatever sits at the intersection of appropriateness, developer joy, quality, and velocity.

As an early-stage company headquartered in Cambridge, MA, we have a strong preference for key members of our team to be located in the Boston metro area; however, given that our company has its roots in remote work (and that it's 2016), we are open to remote arrangements after one year of continuous employment and/or executive approval.

### Requirements

You know you are right for this position if:

• You have at least five years of professional software engineering experience, and at least two years of preference for a high-level programming language that's used in industry, like Haskell, Clojure, OCaml, Erlang, F#, or similar.
• You have some front-end experience with JS or a functional language that compiles to JS, like PureScript, Elm, Clojurescript, or similar. We use PureScript, React, and ES6 on the front-end. It's pretty awesome.
• You are a self-starter and internally motivated, with a strong desire to be part of a successful team that shares your high standards.
• You have great written communication skills and are comfortable with making big decisions over digital presence (e.g. video chat).
• You have polyglot experience along several axes (dynamic/static, imperative/functional, lazy/strict, weird/not-weird).
• You are comfortable with modern infrastructure essentials like AWS, Heroku, Docker, CI, etc. You have basic but passable sysadmin skills.
• You are fluent with git.
• You instrument before you optimize. You test before you ship. You listen before you conclude. You measure before you cut. Twice.

### Benefits

We offer a competitive salary and a full suite of benefits, some of them unconventional, but awesome for the right person:

• Medical, dental, vision insurance and 401k come standard.
• Flexible hours with a 4-hour core - plan the rest of your workday as you wish, just give us the majority of your most productive hours. Productivity ideas: avoid traffic, never wait in line at the grocery store, wake up without an alarm clock.
• Goal-based environment (as opposed to grind-based or decree-based environment; work smarter, not harder; intelligently, not mindlessly). We collaborate on setting goals, but you set your own process for accomplishing those goals. You will be entrusted with a lot of responsibility and you might even experience fulfillment and self-actualization as a result.
• Daily physical activity/mindfulness break + stipend: invest a non-core hour to make yourself more awesome by using it for yoga, tap-dance lessons, a new bike, massage, a surfboard - use your imagination! Just don’t sit at a computer all day! Come back to work more relaxed and productive and share your joy with the rest of the team. Note: You must present and share proof of your newly enriched life with the team in order to receive the stipend.
• Equipment/setup budget so you can tool up the way you want. A brand new 15" MBP is standard issue if you have no strong opinions.

Remember: We’re a startup. You’re an early employee. We face challenges. We have to ship. Your ideas matter. You will make a difference.


# Announcing: unagi-bloomfilter

I just released a new Haskell library called unagi-bloomfilter that is up now on hackage. You can install it with:

$ cabal install unagi-bloomfilter

The library uses the bloom-1 variant from “Fast Bloom Filters and Their Generalization” by Yan Qiao, et al. I’ll try to write more about it when I have the time. Also I just gave a talk on things I learned working on the project last night at the New York Haskell User Group: http://www.meetup.com/NY-Haskell/events/233372271/ It was quite rough, but I was happy to hear from folks that found some interesting things to take away from it.

Thanks to Gershom for inviting me to speak, for my company Signal Vine for sponsoring my trip out, and to Yan Qiao for generously answering my silly questions and helping me understand the paper.

### P.S. We’re hiring haskell developers

Signal Vine is an awesome group of people, with interesting technology and problems to solve, and we’re looking to grow the small development team. If you have some experience with haskell (you don’t have to be a guru) and are interested, please reach out to Jason or me at:

brandon@signalvine.com
jason@signalvine.com

## August 24, 2016

### Michael Snoyman

# Restarting this blog

Just a minor note: I'm planning on starting up this blog again, with some personal thoughts - likely still mostly around programming and Haskell - that don't fit in the other blogs that I contribute to (Yesod Web Framework and FP Complete). I don't have a clear list of topics I'm going to be covering, but I'll likely be sharing some thoughts on running engineering teams and startups effectively. If you have something you'd like me to cover, please Tweet it to me.

## August 23, 2016

### Roman Cheplyaka

# Extract the first n sequences from a FASTA file

A FASTA file consists of a series of biological sequences (DNA, RNA, or protein). It looks like this:

>gi|173695|gb|M59083.1|AETRR16S Acetomaculum ruminis 16S ribosomal RNA
NNTAAACAAGAGAGTTCGATCCTGGCTCAGGATNAACGCTGGCGGCATGCCTAACACATGCAAGTCGAAC
GGAGTGCTTGTAGAAGCTTTTTCGGAAGTGGAAATAAGTTACTTAGTGGCGGACGGGTGAGTAACGCGTG
>gi|310975154|ref|NR_037018.1| Acidaminococcus fermentans strain VR4 16S ribosomal RNA gene, partial sequence
GGCTCAGGACGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAACGGAGAACTTTCTTCGGAATGTTC
TTAGTGGCGAACGGGTGAGTAACGCGTAGGCAACCTNCCCCTCTGTTGGGGACAACATTCCGAAAGGGAT

There probably exist dozens of python scripts to extract the first $n$ sequences from a FASTA file. Here I will show an awk one-liner that performs this task, and explain how it works. Here it is (assuming the number of sequences is stored in the environment variable NSEQS):

awk "/^>/ {n++} n>$NSEQS {exit} {print}"

This one-liner can read from standard input (e.g. as part of a pipe), or you can append one or more file names to the end of the command, e.g.

awk "/^>/ {n++} n>$NSEQS {exit} {print}" file.fasta

An awk script consists of one or more statements of the form pattern { actions }. The input is read line-by-line, and if the current line matches the pattern, the corresponding actions are executed. Our script consists of 3 statements:

1. /^>/ {n++} increments the counter each time a new sequence is started. /.../ denotes a regular expression pattern, and ^> is a regular expression that matches the > sign at the beginning of a line. An uninitialized variable in awk has the value 0, which is exactly what we want here. If we needed some other initial value (say, 1), we could have added a BEGIN pattern like this: BEGIN {n=1}.
2. n>$NSEQS {exit} aborts processing once the counter reaches the desired number of sequences.
3. {print} is an action without a pattern (and thus matching every line), which prints every line of the input until the script is aborted by exit.
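As an aside, and purely my own sketch rather than part of the original post, the same extraction can be written as a small Haskell filter that mirrors the awk logic of stopping at header n+1:

import System.Environment (getArgs)

-- Keep lines while the number of '>' headers seen so far (counting the
-- current line) is at most n; the awk script likewise exits on header n+1
-- before printing it.
firstNSeqs :: Int -> [String] -> [String]
firstNSeqs n = map snd . takeWhile ((<= n) . fst) . tail . scanl step (0, "")
  where
    step (c, _) l = (if take 1 l == ">" then c + 1 else c, l)

main :: IO ()
main = do
  [nStr] <- getArgs
  interact (unlines . firstNSeqs (read nStr) . lines)

Invoked as, for example, runghc FirstN.hs 2 < input.fasta.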

A shorter and more cryptic way to write the same awk script is

awk "/^>/ {n++} n>$NSEQS {exit} 1"

Here I replaced the action-without-pattern by a pattern-without-action. The pattern 1 (meaning “true”) matches every line, and when the action is omitted, it is assumed to be {print}.

## August 22, 2016

### mightybyte

# Measuring Software Fragility

While writing this comment on reddit I came up with an interesting question that I think might be a useful way of thinking about programming languages:

What percentage of single non-whitespace characters in your source code could be changed to a different character such that the change would pass your CI build system but would result in a runtime bug?

Let's call this the software fragility number because I think that metric gives a potentially useful measure of how bug prone your software is.

At the end of the day software is a mountain of bytes and you're trying to get them into a particular configuration. Whether you're writing a new app from scratch, fixing bugs, or adding new features, the number of bytes of source code you have (similar to LOC, SLOC, or maybe the compressed number of bytes) is a rough indication of the complexity of your project. If we model programmer actions as random byte mutations over all of a project's source and we're trying to predict the project's defect rate, this software fragility number is exactly the thing we need to know.

Now I'm sure many people will be quick to point out that this random mutation model is not accurate. Of course that's true. But I would argue that in this way it's similar to the efficient markets hypothesis in finance. Real world markets are obviously not efficient (Google didn't become $26 billion less valuable because the UK voted for brexit). But the efficient markets model is still really useful--and good luck finding a better one that everybody will agree on.

What this model lacks in real world fidelity, it makes up for in practicality. We can actually build an automated system to calculate a reasonable approximation of the fragility number. All that has to be done is take a project, randomly mutate a character, run the project's whole CI build, and see if the result fails the build. Repeat this for every non-whitespace character in the project and count how many characters pass the build. Since the character was generated at random, I think it's reasonable to assume that any mutation that passes the build is almost definitely a bug.

Performing this process for every character in a large project would obviously require a lot of CPU time. We could make this more tractable by picking characters at random to mutate. Repeat this until you have done it for a large enough number of characters and then see what percentage of them made it through the build. Alternatively, instead of choosing random characters you could choose whole modules at random to get more uniform coverage over different parts of the language's grammar. There are probably a number of different algorithms that could be tried for picking random subsets of characters to test. Similar to numerical approximation algorithms such as Newton's method, any of these algorithms could track the convergence of the estimate and stop when the value gets to a sufficient level of stability.

Now let's investigate actual fragility numbers for some simple bits of example code to see how this notion behaves. First let's look at some JavaScript examples.

It's worth noting that comment characters should not be allowed to be chosen for mutation since they obviously don't affect the correctness of the program. So the comments you see here have not been included in the calculations. Fragile characters are highlighted in orange.

// Fragility 12 / 48 = 0.25
function f(n) {
  if ( n < 2 ) return 1;
  else return n * f(n-1);
}

// Fragility 14 / 56 = 0.25
function g(n) {
  var p = 1;
  for (var i = 2; i <= n; i++ ) {
    p *= i;
  }
  return p;
}

First I should say that I didn't write an actual program to calculate these. I just eyeballed it and thought about what things would fail. I easily could have made mistakes here. In some cases it may even be subjective, so I'm open to corrections or different views.

Since JavaScript is not statically typed, every character of every identifier is fragile--mutating them will not cause a build error because there isn't much of a build. JavaScript won't complain, you'll just start getting undefined values. If you've done a significant amount of JavaScript development, you've almost definitely encountered bugs from mistyped identifier names like this. I think it's mildly interesting that the recursive and iterative formulations of this function both have the same fragility. I expected them to be different. But maybe that's just luck.

Numerical constants as well as comparison and arithmetic operators will also cause runtime bugs. These, however, are more debatable because if you use the random procedure I outlined above, you'll probably get a build failure because the character would have probably changed to something syntactically incorrect. In my experience, it seems like when you mistype an alpha character, it's likely that the wrong character will also be an alpha character. The same seems to be true for the classes of numeric characters as well as symbols. The method I'm proposing is that the random mutation should preserve the character class. Alpha characters should remain alpha, numeric should remain numeric, and symbols should remain symbols. In fact, my original intuition goes even further than that by only replacing comparison operators with other comparison operators--you want to maximize the chance that the new mutated character will cause a successful build so the metric will give you a worst-case estimate of fragility. There's certainly room for research into what patterns tend to come up in the real world and other algorithms that might describe that better.
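Here is a minimal sketch of that class-preserving mutation step (my own illustration, using the random package; the particular character classes are assumptions, not the author's):

import Data.Char (isAlpha, isDigit, isSpace)
import System.Random (randomRIO)

-- Replace one randomly chosen non-whitespace character with another character
-- drawn from the same rough class (alpha, digit, or symbol). A real harness
-- would then run the project's full CI build on the mutated source and record
-- whether it passes.
mutateOnce :: String -> IO String
mutateOnce src = do
  let positions = [i | (i, c) <- zip [0 ..] src, not (isSpace c)]
  i <- pick positions
  c <- pick (sameClass (src !! i))
  return (take i src ++ [c] ++ drop (i + 1) src)
  where
    pick xs = (xs !!) <$> randomRIO (0, length xs - 1)
    sameClass c
      | isAlpha c = ['a' .. 'z'] ++ ['A' .. 'Z']
      | isDigit c = ['0' .. '9']
      | otherwise = "+-*/<>=!&|%^(){}[];,."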

Now let's go to the other end of the programming language spectrum and see what the fragility number might look like for Haskell.

-- Fragility 7 / 38 = 0.18
f :: Int -> Int
f n
| n < 2 = 1
| otherwise = n * f (n-1)

Haskell's much more substantial compile time checks mean that mutations to identifier names can't cause bugs in this example. The fragile characters here are clearly essential parts of the algorithm we're implementing. Maybe we could relate this idea to information theory and think of it as an idea of how much information is contained in the algorithm.

One interesting thing to note here is the effect of the length of identifier names on the fragility number. In JavaScript, long identifier names will increase the fragility because all identifier characters can be mutated and will cause a bug. But in Haskell, since identifier characters are not fragile, longer names will lower the fragility score. Choosing to use single character identifier names everywhere makes these Haskell fragility numbers the worst case and makes JavaScript fragility numbers the best case.

Another point is that since I've used single letter identifier names it is possible for a random identifier mutation in Haskell to not cause a build failure but still cause a bug. Take for instance a function that takes two Int parameters x and y. If y was mutated to x, the program would still compile, but it would cause a bug. My set of highlighted fragile characters above does not take this into account because it's trivially avoidable by using longer identifier names. Maybe this is an argument against one letter identifier names, something that Haskell gets criticism for.
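A tiny example of that point (mine, not from the post):

-- The intended function...
scale :: Int -> Int -> Int
scale x y = x * y

-- ...and the single-identifier mutation: y became x. It still type checks,
-- but it is now a bug that only longer, distinct names (or tests) would catch.
scale' :: Int -> Int -> Int
scale' x y = x * x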

Here's the snippet of Haskell code I was talking about in the above reddit comment that got me thinking about all this in the first place:

-- Fragility 31 / 277 = 0.11
{ title       :: Text
, description :: Text
}

el "title" $ dynText $ title <$> i
elDynAttr "meta" (mkDescAttrs . description <$> i) blank
where
mkDescAttrs desc =
"name" =: "description"
"content" =: desc

In this snippet, the fragility number is probably close to 31 characters--the number of characters in string literals. This is out of a total of 277 non-whitespace characters, so the software fragility number for this bit of code is 11%. This is half the fragility of the JS code we saw above! And as I've pointed out, larger real world JS examples are likely to have even higher fragility. I'm not sure how much we can conclude about the actual ratios of these fragility numbers, but at the very least it matches my experience that JS programs are significantly more buggy than Haskell programs.

The TDD people are probably thinking that my JS examples aren't very realistic because none of them have tests, and that tests would catch most of the identifier name mutations, bringing the fragility down closer to Haskell territory. It is true that tests will probably catch some of these things. But you have to write code to make that happen! It doesn't happen by default. Also, you need to take into account the fact that the tests themselves will have some fragility. Tests require time and effort to maintain. This is an area where this notion of the fragility number becomes less accurate. I suspect that since the metric only considers single character mutations it will underestimate the fragility of tests since mutating single characters in tests will automatically cause a build failure.

There seems to be a slightly paradoxical relationship between the fragility number and DRY. Imagine our above JS factorial functions had a test that completely reimplemented factorial and then tried a bunch of random values Quickcheck-style. This would yield a fragility number of zero! Any single character change in the code would cause a test failure. And any single character change in the tests would also cause a test failure. Single character changes can no longer be classified as fragile because we've violated DRY. You might say that the test suite shouldn't reimplement the algorithm--you should just test specific cases like f(5) == 120. But in an information theory sense this is still violating DRY.

Does this mean that the fragility number is not very useful? Maybe. I don't know. But I don't think it means that we should just throw away the idea. Maybe we should just keep in mind that this particular formulation doesn't have much to tell us about the fragility of more complex coordinated multi-character changes. I could see the usefulness of this metric going either way. It could simplify down to something not very profound. Or it could be that measurements of the fragility of real world software projects end up revealing some interesting insights that are not immediately obvious even from my analysis here.

Whatever the usefulness of this fragility metric, I think the concept gets us thinking about software defects in a different way than we might be used to. If it turns out that my single character mutation model isn't very useful, perhaps the extension to multi-character changes could be useful. Hopefully this will inspire more people to think about these issues and play with the ideas in a way that will help us progress towards more reliable software and tools to build it with.

EDIT: Unsurprisingly, I'm not the first person to have thought of this. It looks like it's commonly known as mutation testing. That Wikipedia article makes it sound like mutation testing is commonly thought of as a way to assess your project's test suite. I'm particularly interested in what it might tell us about programming languages...i.e. how much "testing" we get out of the box because of our choice of programming language and implementation.

# Why version bounds cannot be inferred retroactively (using dates)

In past debates about Haskell's Package Versioning Policy (PVP), some have suggested that package developers don't need to put upper bounds on their version constraints because those bounds can be inferred by looking at what versions were available on the date the package was uploaded. This strategy cannot work in practice, and here's why.

Imagine someone creates a small new package called foo. It's a simple package, say something along the lines of the formattable package that I recently released. One of the dependencies for foo is errors, a popular package supplying frequently used error handling infrastructure. The developer happens to already have errors-1.4.7 installed on their system, so this new package gets built against that version. The author uploads it to hackage on August 16, 2015 with no upper bounds on its dependencies. Let's for simplicity imagine that errors is the only dependency, so the .cabal file looks like this:

name: foo
build-depends:
errors

If we come back through at some point in the future and try to infer upper bounds by date, we'll see that on August 16, the most recent version of errors was 2.0.0. Here's an abbreviated illustration of the picture we can see from release dates:

If we look only at release dates, and assume that packages were building against the most recent version, we will try to build foo with errors-2.0.0. But that is incorrect! Building foo with errors-2.0.0 will fail because errors had a major breaking change in that version. Bottom line: dates are irrelevant--all that matters is what dependency versions the author happened to be building against! You cannot assume that package authors will always be building against the most recent versions of their dependencies. This is especially true if our developer was using the Haskell Platform or LTS Haskell because those package collections lag the bleeding edge even more. So this scenario is not at all unlikely.

It is also possible for packages to be maintaining multiple major versions simultaneously. Consider large projects like the linux kernel. Developers routinely do maintenance releases on 4.1 and 4.0 even though 4.2 is the latest version. This means that version numbers are not always monotonically increasing as a function of time.

I should also mention another point on the meaning of version bounds. When a package specifies version bounds like this...

name: foo
build-depends:
errors >= 1.4 && < 1.5

...it is not saying "my package will not work with errors-1.5 and above". It is actually saying, "I warrant that my package does work with those versions of errors (provided errors complies with the PVP)". So the idea that "< 1.5" is a "preemptive upper bound" is wrong. The package author is not preempting anything. Bounds are simply information. The upper and lower bounds are important things that developers need to tell you about their packages to improve the overall health of the ecosystem. Build tools are free to do whatever they want with that information. Indeed, cabal-install has a flag --allow-newer that lets you ignore those upper bounds and step outside the version ranges that the package authors have verified to work.

In summary, the important point here is that you cannot use dates to infer version bounds. You cannot assume that package authors will always be building against the most recent versions of their dependencies. The only reliable thing to do is for the package maintainer to tell you explicitly what versions the package is expected to work with. And that means lower and upper bounds.

Update: Here is a situation that illustrates this point perfectly: cryptonite issue #96. cryptonite-0.19 was released on August 12, 2016. But cryptonite-0.15.1 was released on August 22, 2016. Any library published after August 22, 2016 that depends on cryptonite-0.15.1 would not be able to build if the solver used dates instead of explicit version bounds.

# Academic integrity: context and concrete steps

Continuing from my previous post, I wanted to write a bit about why I have been thinking about academic integrity, and what, concretely, I plan to do about it.

So, why have I been thinking about this? For one thing, my department had its fair share of academic integrity violations last year. On the one hand, it is right for students to be held accountable for their actions. On the other, in the face of a spate of violations, it is also right for us to reevaluate what we are doing and why, what sort of environmental factors may be pushing students to violate academic integrity, and how we can create a better environment. Environment does not excuse behavior, but it can shape behavior in profound ways.

Another reason for thinking about academic integrity is that starting this fall, I will be a member of the committee that hears and makes a determination in formal academic integrity cases at my institution. It seems no one wants to be on this committee, and to a certain extent I can understand why. But I chose it, for several reasons. For one, I think it is important to have someone on the committee from the natural sciences (I will be the only one), who understands issues of plagiarism in the context of technical subjects. I also care a lot about ensuring that academic integrity violations are handled carefully and thoughtfully, so that students actually learn something from the experience, and more importantly, so that they come through with their sense of belonging intact. When a student (or anyone, really) does something that violates the standards of a community and is subject to consequences, it is all too easy for them to feel as though they are now a lesser member or even excluded from the community. It takes much more intentional communication to make clear to them that although they may have violated a community standard—which necessarily comes with a consequence—they are still a valued member. (Thanks to Leslie Zorwick for explaining about the power of belonging, and for relating recent research showing that communicating belonging can make a big difference for students on academic probation—which seems similar to students accused or convicted of academic integrity violations. I would cite it but I think it is not actually published yet.)

Thinking about all of this is well and good, but what will I do about it? How do I go about communicating all of this to my students, and creating the sort of environment I want? Here are the concrete things I plan to do starting this fall:

• In all my courses where it makes sense, I plan to require students to have at least one citation (perhaps three, if I am bold) on every assignment turned in—whether they cite web pages, help from TAs or classmates, and so on. The point is to get them thinking regularly about the resources and help that they make use of on every single assignment, to foster a spirit of thankfulness. I hope it will also make it psychologically harder for students to plagiarize and lie about it. Finally, I hope it will lead to better outcomes in cases where a student makes inappropriate use of an online resource—i.e. when they “consult” a resource, perhaps even deceiving themselves into thinking that they are really doing the work, but end up essentially copying the resource. If they don’t cite the resource in such a case, I have a messy academic integrity violation case on my hands; if they do, there is no violation, even though the student didn’t engage with the assignment as I would have hoped, and I can have a simple conversation with them about my expectations and their learning (and perhaps lower their grade).

• I will make sure to communicate to my students how easy it is for me to detect plagiarism, and how dire the consequences can be. A bit of healthy fear never hurt!

• But beyond that, I want to make sure my students also understand that I care much more about them, as human beings, than I do about their grade or whether they turn in an assignment. I suspect that a lot of academic integrity violations happen at 2am, the night before a deadline, when the student hasn’t even started the assignment and they are riddled with anxiety and running on little sleep—but they feel as though they have to turn something in and this urge overrides whatever convictions they might have about plagiarism. To the extent their decision is based on anxiety about grades, there’s not much I can do about it. However, if their decision stems from a feeling of shame at not turning something in and disappointing their professor, I can make a difference: in that moment, I want my students to remember that their value in my eyes as human beings is not tied to their academic performance; that I will be much more impressed by their honesty than by whether they turn something in.

• As a new member of the academic integrity committee, I plan to spend most of my time listening and learning from the continuing members of the committee; but I do hope to make sure our communication with both accused and convicted students emphasizes that they are still valued members of our community.

Other concrete suggestions, questions, experiences to relate, etc. are all most welcome!

# Debian chroot on Android

Sometimes, a simple idea — so simple it can be distilled down to 4 words — can be truly astounding.

## Why?

For quite a while, I've been considering the best way to ensure the resilience, security, and accessibility of various pieces of personal data. There are several different categories, and no solution will be optimal for all of them. My music collection, for example, is large, non-secret, and largely replaceable (although the thought of dragging that enormous box of CDs out of the garage and reripping them all is pretty daunting!) The music lives on a server in my home, with my own backups. I upload medium bitrate versions to a cloud music service, and I have a low bitrate copy on my laptop and phone. So that's pretty well covered.

A similar scheme covers my photos and videos. They are much less replaceable than music, but fortunately much smaller, so there are a few extra copies kicking about.

Then, I have a few tiny things that I want to keep in sync across various devices. For example, today's todo list, my "blue skies ideas" list, and my password store. I've looked at syncthing, which is an awesome project, and I'm sure I'm going to find a good use for it someday.

But for these things, git is really the obvious solution. Most of them are already git repos, including my password-store, the only missing piece is a git client on my phone. So I was searching for recommendations for Android git clients, and these words jumped out at me:

create a debian image, mount it in your android device and chroot to it

My flabber was well and truly gasted.

## How?

It's very straightforward. From some debian instance on which you have root, run:

debootstrap --foreign --arch=armhf jessie jessie

Tar up the resulting tree in jessie, copy it to android, unpack it (ah, but where?), chroot, and then run:

debootstrap --second-stage

## Which?

Here are some things I've used: ssh, rsync, dash, bash, the rc shell (which I happen to maintain). All the usual userland tools, mv, chmod, etc. These (of course) are the proper full GNU versions, so you don't keep being bitten by the little discrepancies in, for instance, the busybox versions.

Package management with apt-get and dpkg. And perl, git, nano, vim, update-alternatives (so I never have to see nano again), less, man.

I started installing the pass package, but that pulls in gtk, pango and a whole bunch of other things I'm not going to use. So I downloaded password-store and installed it myself.

The ps command (you need to mount /proc in the chroot of course), top, strace, lsof. You can even strace android processes, it all Just Works. (OK, lsof gets upset because it's inside a chroot and it can't see all the mount points that /proc/mounts says exist. But still.)

I thought it might be fun to run mosh. It installed fine, but then bombed out with a weird error. I went on a bit of a wild goose chase, and concluded (it was late at night) that I needed a fix from the development version. So I cloned the mosh repo on github, installed a whole heap of build tools, compilers, libraries, and built mosh. On my phone!

In fact, the problem was simpler than that, and easily solved by using the stty command to declare my terminal size. And then I had to open up some ports in android's firewall... with iptables of course.

I could go on, but you're getting the idea. In summary, this is not something pretending to be a GNU/Linux system. It's the real deal.

Of course, there are some missing pieces, of which the most serious is the lack of daemons. I've installed etckeeper, but there will be no daily autocommit.

Ping doesn't work, because even a root shell is not allowed to use raw sockets. You can create a user, but it's not able to do much... I'll look at this some more when I have time, but I'm just running everything as root at the moment. Android's systems for users, permissions, and capabilities are entirely unfamiliar to me, although I'm trying to learn.

## Where?

I made and remade my chroot several times before I was happy with it. Hopefully these notes will make things quicker for you.

First of all, Debian wants a “real” filesystem, which is to say, anything except FAT. Of the existing partitions, an obvious choice would be /data, which on my phone is ext4. Unfortunately, the major major drawback of my phone is that its /data is tiddly, just 1GB, and perennially full. (I did try the chroot on /data, before realising the fatal flaw. One curiosity is that /data is mounted with nodev, so populating /dev fails till you remount without nodev. You might think it would be better to bind mount the real /dev into the chroot anyway, and you might well be right. But I've been running with the /dev made by debootstrap with no problems.)

So it's time to repartition my 32GB SD card. Android apparently doesn't support GPT which is only a minor nuisance. I do hate all that primary / extended / logical nonsense though, it's so 1980s.

Much, much more seriously, it complains bitterly if it finds an SD card without a FAT partition. This is infuriating. The kernel supports ext3 just fine (and ext4 too, at least for partitions on fixed internal storage, although apparently not for the SD card, which makes no sense to me). So, if I insert a card that happens to have an ext3 partition on it, why not just mount it? Or if there's some scenario I'm not aware of that might not work quite right, notify a dialogue that explains and offers to mount the partition anyway. What actually happens is a notification that tells you the SD card is “damaged”, and offers to format it. Arrggh!

(I have reason to believe that the latest versions of Android will countenance SD cards with real file systems, although I need to research this further.)

My next try was a 50MB FAT partition, and the remainder ext3. This largely worked, but it didn't leave anywhere for android to move those apps which are prepared to live on SD card, absolutely vital to squeeze some extra apps onto my old phone.

The final iteration was a 4GB FAT partition, and the rest ext3. Of course, I don't need 28GB for the chroot itself: it starts off well under 1G, and even after installing loads of stuff I'm still under 2G. But I realised that I'd be able to put my music collection on the ext3 partition, which would save the tedious step of renaming everything to comply with FAT restrictions (mainly the prohibition on : in file names). Of course, I can now rsync-over-ssh the music from my laptop, which seems to go quicker than via USB.

Another annoyance is that the ext3 partition on the SD card isn't automatically mounted. I've spent some time in the past trying to find a boot time hook I can use, but with no luck. So I have to do this from the android shell every time my phone reboots, using a helper script cunningly located under the mount point:

root@android:/ # cat /data/ext3/m
#!/system/bin/sh
mount -t ext3 /dev/block/mmcblk1p5 /data/ext3

## What?

Far and away the nicest way to communicate with the chroot is to plug into a laptop or desktop and use adb shell from the Android SDK. At that point, it's scarcely different from sshing to a remote server.

Of course, the whole point of the phone is that it's portable. On the move, I'm using Jack Palevich's Terminal Emulator for Android and Klaus Weidner's Hacker's Keyboard. The keyboard has all the keys you need — Esc, Tab, Ctrl etc — so it's ideal for non-trivial tasks (such as vim!). But the tiny keys are very fiddly on my phone, especially in portrait, so I sometimes stick to my usual keyboard.

I've got a trivial helper script to start my favourite shell under ssh-agent:

root@android:/ # cat /data/ext3/ch
#!/system/bin/sh
exec chroot /data/ext3/jessie /usr/bin/ssh-agent /usr/bin/rc -l

## Whither?

So I have a fantastic solution to my document and password management problems. And a great ssh client. And a whole new architecture to build my projects on, most of which aren't the sort of thing that it makes much sense to run on a phone, but building in different places is always good for portability.

I'd heard that Android uses a "modified Linux" kernel, so I wasn't really expecting any of this stuff to work properly, let alone tools like strace and lsof. Apparently, though, the changes were folded back into the mainline kernel at the 3.3 release. My (3 year old) phone runs 3.4.5, so presumably this a fairly vanilla kernel.

This is awesome. Google has its faults, but their commitment to free software easily earns them the “least evil” prize among the current Internet quintumvirate. (That's Google, Apple, Facebook, Amazon, and Microsoft, for anyone who's been asleep the last few years.)

Realising that, yes, that computer in my pocket is a pukka Linux box has endeared me even further to Android. I'd love to write some apps for it... except I've already got more than enough projects to fill my “copious spare time”!

# 1Liners for July 2016

• July 14th, 2016: So you have x :: [a] in the IO monad, and the function f :: a -> b. What is the expression that gets you IO [b]?
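One possible answer (my own, not taken from the original thread): map under the IO functor.

import Data.Char (toUpper)

-- Stand-ins for the 1-liner's x :: [a] in IO and f :: a -> b.
xs :: IO [String]
xs = return ["one", "two"]

f :: String -> String
f = map toUpper

-- The expression in question: fmap (map f) xs, i.e. map f <$> xs.
ys :: IO [String]
ys = fmap (map f) xs

main :: IO ()
main = ys >>= print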

# Eric Joyce: Why the Brexit vote pushed me to support Scottish independence

Former Labour MP Eric Joyce explains his change of heart.
At the referendum, still an MP, I gave independence very serious thought right up to the close of the vote. I finally came down on the side of No because I thought big EU states with a potential secession issue, like Spain and France, would prevent an independent Scotland joining the EU. This is obviously no longer the case. And I was, like the great majority of the economists and other experts whose opinion I valued, convinced that being outside the EU would be bonkers – it would badly harm our economy and hurt Scots in all sorts of unforeseen ways too.
The Brexit vote reversed that overnight: all of the arguments we in the unionist camp had used were made invalid at worst, questionable at best. This doesn’t mean they were necessarily all wrong. But it does mean that open-minded, rational No voters should at the very least seriously re-consider things in the light of the staggering new context. They should have an open ear to the experts saying that with independence, jobs in Scotland’s financial and legal service sectors will expand as English and international firms look to keep a foothold in the EU. And to the reasonable prospect that an eventual £50+ oil price might realistically open the way to a final, generational, upswing in employment, and to security for Scotland’s extractive industries and their supply chain. And to the idea that preserving Scotland’s social democracy in the face of the Little Englander mentality of right-wing English Tories might be worth the fight.

# Docker configuration on Fedora

If you need to change the docker daemon options on Fedora, take a look at these files:

# ls /etc/sysconfig/docker*
/etc/sysconfig/docker
/etc/sysconfig/docker-network
/etc/sysconfig/docker-storage
/etc/sysconfig/docker-storage-setup

In my case, I needed to change the container base size, so I put the following in /etc/sysconfig/docker-storage:

DOCKER_STORAGE_OPTIONS="--storage-opt dm.basesize=20G"

These files are then sourced in /etc/systemd/system/multi-user.target.wants/docker.service, and the variables (such as DOCKER_STORAGE_OPTIONS) are passed to the docker daemon.

# Academic integrity and other virtues

I have been thinking a lot recently about academic integrity. What does it mean? Why do we care—what is it we fundamentally want students to do and to be? And whatever it is, how do we go about helping them become like that?

As a general principle, I think we ought to focus not just on prohibiting certain negative behaviors, but rather on encouraging positive behaviors (which are in a suitable sense “dual” to the negative behaviors we want to prohibit). Mere prohibitions leave a behavioral vacuum—“OK, don’t do this, so what should I do?”—and incentivize looking for loopholes, seeing how close one can toe the line without breaking the letter of the law. On the other hand, a positive principle actively guides behavior, and in actively striving towards the ideal of the positive principle, one (ideally) ends up far away from the prohibited negative behavior.

In the case of academic integrity, then, it is not enough to say “don’t plagiarize”. In fact, if one focuses on the prohibition itself, this is a particularly difficult one to live by, because academic life is not lived in a vacuum: ideas and accomplishments never spring forth ex nihilo, owing nothing to the ideas and accomplishments of others. In reality, one is constantly copying in big and small ways, explicitly and implicitly, consciously and unconsciously. In fact, this is how learning works! We just happen to think that some forms of copying are acceptable and some are not. Now, there are good reasons for distinguishing acceptable and unacceptable copying; the point is that this is often more difficult and ambiguous for students than we care to admit.

So what is the “dual” of plagiarism? What are the positive virtues which we should instill in our students? One can, of course, say “integrity”, but I don’t think this goes far enough: to have integrity is to adhere to a particular set of moral principles, but which ones? Integrity means being truthful, but truthful about what? It seems this is just another way of saying “don’t plagiarize”, i.e. don’t lie about the source of an idea. I have come up with two other virtues, however, which I think really get at the heart of the issue: thankfulness and generosity. (And in the spirit of academic thankfulness, I should say that Vic Norman first got me thinking along these lines with his paper How Will You Practice Virtue Without Skill?: Preparing Students to be Virtuous Computer Programmers, published in the 2014-2015 Journal of the ACMS; I was also influenced by a discussion of Vic’s paper with several others at the ACMS luncheon at SIGCSE 2016.)

Academic thankfulness has to do with recognizing one’s profound debt to the academic context: to all those thinkers and doers who have come before, and to all those who help you along your journey as a learner, whether professors, other students, or random strangers on the Internet. A thankful student is naturally driven to cite anything and everything, to give credit where credit is due, even to give credit where credit is not technically necessary but can serve as a token of thanks. A thankful student recognizes the hard work and unique contributions of others, rather than seeing others as mere means to their own ends. A thankful student never plagiarizes, since taking something from someone else and claiming it for one’s own is the height of ingratitude.

Academic generosity is about freely sharing one’s own ideas, sacrificing one’s time and energy to help others, and allowing others to share in credit and recognition. Being academically generous is harder than being thankful, because it opens you up to the potential ingratitude of others, but in some sense it is the more important of the two virtues: if no one were generous, no one would have anything to be thankful for. A generous student is naturally driven to cite anything and everything, to give credit and recognition to others, whether earned or not. A generous student recognizes others as worthy collaborators rather than as means to an end. A generous student never plagiarizes, since they know how it would feel to have their own generosity taken advantage of.

There’s more to say—about the circumstances that have led me to think about this, and about how one might actually go about instilling these virtues in students, but I think I will leave that for another post.

# Propositions as Types generalised: The Rosetta Stone

From Physics, Topology, Logic and Computation: A Rosetta Stone by John C. Baez and Mike Stay, courtesy of @CompSciFact, @sigfpe, and @notjfmc.

# Roseburn to Leith Walk A vs B: time to act!

On 2 August, I attended a meeting in Roseburn organised by those opposed to the new cycleway planned by the city. Local shopkeepers fear they will see a reduction in business, unaware this is a common cycling fallacy: study after study has shown that adding cycleways increases business, not the reverse, because pedestrians and cyclists find the area more attractive.

Feelings in Roseburn run strong. The locals don't trust the council: who can blame them after the fiasco over trams? But the leaders of the campaign are adept at cherry picking statistics, and, sadly, neither side was listening to the other.

On 30 August, the Edinburgh Council Transport and Environment Committee will decide between two options for the cycle route, A and B. Route A is direct. Route B goes round the houses, adding substantial time and rendering the whole route less attractive. If B is built, the opportunity to shift the area away from cars, to make it a more pleasant place to be and draw more business from those travelling by foot, bus, and cycle, goes out the window.

Locals like neither A nor B, but in a spirit of compromise the Transport and Environment Committee may opt for B. This will be a disaster, as route B will be far less likely to draw people out of their cars and onto their cycles, undermining Edinburgh's ambitious programme to attract more people to cycling before it even gets off the ground.

Investing in cycling infrastructure can make an enormous difference. Scotland suffers 2000 deaths per year due to pollution, and 2500 deaths per year due to inactivity. The original proposal for the cycleway estimates benefits of £14.5M over ten years (largely from improved health of those attracted to cycling) vs a cost of £5.7M, a staggering 3.3x return on investment. Katie Cycles to School is a brilliant video from Pedal on Parliament that drives home how investment in cycling will improve lives for cyclists and non-cyclists alike.

Want more detail? Much has been written on the issues.
Roseburn Cycle Route: Evidence-based local community support.
Conviction Needed.

The Transport Committee will need determination to carry the plan through to a successful conclusion. This is make or break: will Edinburgh be a city for cars or a city for people? Please write to your councillors and the transport and environment committee to let them know your views.

Roseburn to Leith Walk

# Haskell devops/dev tools role at Standard Chartered (London)

The Modelling Infrastructure (MI) team at Standard Chartered has an open position for a typed functional programming developer, based in London. MI are a devops-like team responsible for the continuous delivery, testing, tooling and general developer efficiency of the Haskell-based analytics package used by the bank. They work closely with other Haskell dev teams in the bank, providing developer tools, testing and automation on top of our git ecosystem.

The role involves improving the ecosystem for developers and further automation of our build, testing and release infrastructure. You will work with devs in London, as part of the global MI team (located in London and Singapore). Development is primarily in Haskell. Knowledge of the Shake build system and Bake continuous integration system would be helpful. Strong git skills would be an advantage. Having a keen eye for analytics, data analysis and data-driven approaches to optimizing tooling and workflows is desirable.

This is a permanent, associate director-equivalent position in London.

Experience writing typed APIs to external systems such as databases, web services, pub/sub platforms is very desirable. We like working code, so if you have Hackage or github libraries, we definitely want to see them. We also like StackOverflow answers, blog posts, academic papers, or other arenas where you can show broad FP ability. Demonstrated ability to write Haskell-based tooling around git systems would be super useful.

The role requires physical presence in London, either in our Basinghall or Old Street sites. Remote work is not an option. No financial background is required. Contracting-based positions are also possible if desired.

If this sounds exciting to you, please send your resume to me – donald.stewart <at> sc.com

Tagged: jobs

# What I learned as a hired consultant to autodidact physicists

Many programming languages, especially domain-specific ones, are designed by amateurs. How do we prevent obvious irregularities and disasters in languages before they become widespread (aspects of Javascript and R come to mind)?

Folk in the human-computer interaction community have a notion of 'expert evaluation'. I wonder if we could develop something similar for programming languages?

Jakub Zalewski passed me the article 'What I learned as a hired consultant to autodidact physicists', which treads related ground, but for physics rather than computing.

# [kctmprub] Selecting function arguments by type

Some programming languages permit a function to refer to the arguments passed to it by number instead of by name, for example, Perl's @_ array.

We propose a similar mechanism of referring to function arguments by type.  This can only be done if there is only one argument of the given type in the list of parameters.  We introduce a special keyword, ARGUMENT_OF_TYPE, which when used with a type yields the desired argument.  Below, we use a syntax inspired by Haskell.

replicate_strings :: Int -> String -> String;
replicate_strings UNNAMED UNNAMED = concat $ intersperse " " $ replicate (ARGUMENT_OF_TYPE::Int) (ARGUMENT_OF_TYPE::String);
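For reference (my own aside, not part of the proposal), the plain Haskell this corresponds to with ordinary named parameters is:

import Data.List (intersperse)

replicate_strings :: Int -> String -> String
replicate_strings n s = concat $ intersperse " " $ replicate n s
-- replicate_strings 3 "ha" == "ha ha ha"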

Somewhat more ambitious, possibly more confusing, would be to have type inference figure out which argument is needed.

replicate_strings :: Int -> String -> String;
replicate_strings UNNAMED UNNAMED = concat $ intersperse " " $ replicate ARGUMENT_OF_TYPE ARGUMENT_OF_TYPE;

The keyword UNNAMED marks the function arguments subject to this kind of inference. The markers are awkward, but without them it may be difficult or ambiguous to handle a function that returns a function, i.e., a higher-order function; perhaps one with a polymorphic return type which might or might not be a function.

More ambitiously, if the arguments are of distinct types, then there could be some mechanism by which the order of the arguments at the call site does not matter. Related ideas have come up previously for multiple-element tuples and pairs.
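As an illustrative sketch (my own, not from the post), today's GHC Haskell can already simulate order-independent, type-directed selection of distinctly typed arguments with a multi-parameter type class:

{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances, FlexibleContexts #-}

import Data.List (intersperse)

-- Select a value of type a out of some argument bundle b.
class Select a b where
  select :: b -> a

instance Select Int    (Int, String) where select = fst
instance Select String (Int, String) where select = snd
instance Select Int    (String, Int) where select = snd
instance Select String (String, Int) where select = fst

-- The Int and String can be passed in either order.
replicateStrings :: (Select Int b, Select String b) => b -> String
replicateStrings b =
  concat $ intersperse " " $ replicate (select b :: Int) (select b :: String)

main :: IO ()
main = do
  putStrLn (replicateStrings (3 :: Int, "ha"))   -- "ha ha ha"
  putStrLn (replicateStrings ("ha", 3 :: Int))   -- same result, arguments swapped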

Another idea: instead of UNNAMED being a keyword, let the parameters be user-chosen identifiers that do not have to be different for each parameter, but if they have the same name, they must have different types. Add a (possibly optional) OVERLOADED_PARAMETER annotation to make things less confusing to others reading the code:

replicate_strings2 :: Int -> String -> Int -> String -> (String, String);
replicate_strings2 (OVERLOADED_PARAMETER x) (OVERLOADED_PARAMETER x) (OVERLOADED_PARAMETER y) (OVERLOADED_PARAMETER y) = (concat $ intersperse " " $ replicate (x::Int) (x::String), concat $ intersperse " " $ replicate (y::Int) (y::String))

# The Four Flaws of Haskell

Summary: Last year I made a list of four flaws with Haskell. Most have improved significantly over the last year.

No language/compiler ecosystem is without its flaws. A while ago I made a list of the flaws I thought might harm people using Haskell in an industrial setting. These are not flaws that impact beginners, or flaws that stop people from switching to Haskell, but those that might harm a big project. Naturally, everyone would come up with a different list, but here is mine.

Package Management: Installing a single consistent set of packages used across a large code base used to be difficult. Upgrading packages within that set was even worse. On Windows, anything that required a new network package was likely to end in tears. The MinGHC project solved the network issue. Stackage solved the consistent set of packages, and Stack made it even easier. I now consider Haskell package management a strength for large projects, not a risk.

IDE: The lack of an IDE certainly harms Haskell. There are a number of possibilities, but whenever I've tried them they have come up lacking. The fact that every Haskell programmer has an entrenched editor choice doesn't make it an easier task to solve. Fortunately, with Ghcid there is at least something near the minimum acceptable standard for everyone. At the same time various IDE projects have appeared, notably the Haskell IDE Engine and Intero. With Ghcid the lack of an IDE stops being a risk, and with the progress on other fronts I hope the Haskell IDE story continues to improve.

Space leaks: As Haskell programs get bigger, the chance of hitting a space leak increases, becoming an inevitability. While I am a big fan of laziness, space leaks are the big downside. Realising space leaks were on my flaws list, I started investigating methods for detecting space leaks, coming up with a simple detection method that works well. I've continued applying this method to other libraries and tools. I'll be giving a talk on space leaks at Haskell eXchange 2016. With these techniques space leaks don't go away, but they can be detected with ease and solved relatively simply - no longer a risk to Haskell projects.

Array/String libraries: When working with strings/arrays, the libraries that tend to get used are vector, bytestring, text and utf8-string. While each is individually a nice project, they don't work seamlessly together. The utf8-string package provides UTF8 semantics on top of bytestring, which provides pinned byte arrays. The text package provides UTF16 encoded unpinned Char arrays. The vector package provides mutable and immutable vectors which can be either pinned or unpinned. I think the ideal situation would be a type that was either pinned or unpinned based on size, where the string was just a UTF8 encoded array with a newtype wrapping. Fortunately the foundation library provides exactly that. I'm not brave enough to claim a 0.0.1 package released yesterday has derisked Haskell projects, but things are travelling in the right direction.
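To make the last point concrete, the representation being described is essentially a newtype over a byte array carrying a UTF-8 invariant. A minimal sketch (my own illustration, not foundation's actual API):

import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- A UTF-8 encoded byte array behind a newtype wrapper.
-- Invariant (not checked here): the bytes are valid UTF-8.
newtype Utf8String = Utf8String B.ByteString

fromText :: T.Text -> Utf8String
fromText = Utf8String . TE.encodeUtf8

toText :: Utf8String -> T.Text
toText (Utf8String bs) = TE.decodeUtf8 bs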

It has certainly been possible to use Haskell for large projects for some time, but there were some risks. With the improvements over the last year the remaining risks have decreased markedly. In contrast, the risks of using an untyped or impure language remain significant, making Haskell a great choice for new projects.

# IntervalMap Retrospective

IntervalMap, my package for containers with intervals as keys, has gone through several changes since its inception in late 2011. As is often the case, I had a concrete problem to solve and the initial code did that and only that. Going public on Hackage made some changes necessary. Others cropped up later in the desire to get closer to the interface and performance of the other base containers. Some were even requested by users. Here's a rough history:
| Version | Date    | Changes |
|---------|---------|---------|
| 0.1     | 2011-12 | Initial version |
| 0.2     | 2011-12 | Name changes; Hackage release |
| 0.3     | 2012-08 | Split into lazy and strict |
| 0.4     | 2014-08 | Interval type class |
| 0.5     | 2016-01 | Make interval search return submaps/subsets where appropriate |

Looking back at this, what astonishes me most is the late addition of sets, which were only added after a user requested it. Why didn't this come up right at the start? “interval-containers” would have been a better and more general package name.

The really big change - supporting user-defined intervals via a type class - was caused by my initial distrust of nonstandard language extensions. The result of this is one more level in the module structure for the now-preferred generic version, e.g.:

import qualified Data.IntervalMap.Generic.Strict as IVM

What is still unclear is where to put the Interval type class. There are many useful interval data structures, and using Data.Interval for this particular purpose would not be acceptable. But having it under IntervalMap does not seem right either. One option would be a name change:

Data.IntervalMap
Data.IntervalMap.Lazy
Data.IntervalMap.Strict
Data.IntervalSet
Data.OrderedInterval

Another is to use a general module:

Data.IntervalContainers
Data.IntervalContainers.Interval
Data.IntervalMap
Data.IntervalMap.Lazy
Data.IntervalMap.Strict
Data.IntervalSet

Or even put everything under that module:

Data.IntervalContainers
Data.IntervalContainers.Interval
Data.IntervalContainers.IntervalMap
Data.IntervalContainers.IntervalMap.Lazy
Data.IntervalContainers.IntervalMap.Strict
Data.IntervalContainers.IntervalSet

None of these seems clearly preferable, so the best option is probably to leave the package name and module structure as is.

## Data Structure

Preferring simplicity, I based the implementation on red-black trees. While these are in general competitive with the size-balanced binary trees used by the base containers, there is one caveat: with plain red-black trees it is not possible to implement a hedge-union algorithm, or even simple optimizations for the very common "union of huge set and tiny set" operation, unless one adds a redundant size field to each node.
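To make the caveat concrete, here is a sketch of a red-black node carrying such a redundant size field (my own illustration, not the package's actual node type), which is what size-based shortcuts like "union of huge set and tiny set" would need:

data Color = Red | Black

-- A red-black node augmented with a subtree size, so that operations like
-- union can cheaply check which side is the "tiny" one.
data RBTree k v
  = Nil
  | Node !Color
         !Int          -- number of entries in this subtree (the redundant field)
         k v
         (RBTree k v)  -- left subtree
         (RBTree k v)  -- right subtree

size :: RBTree k v -> Int
size Nil                = 0
size (Node _ n _ _ _ _) = n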

This has become somewhat more pressing since version 0.5 where the search operations return subsets/submaps.

## Intervals

My initial fixed-type implementation of intervals allowed mixing intervals with different openness, and the type class has inherited this "feature". I strongly suspect that no one actually uses this - that is, in all instances of Interval, leftClosed and rightClosed ignore their argument and return a constant value.

If I'm wrong, and someone does use this, please let me know!

That design has a negative impact on performance, though. It shows up in the ordering of intervals and their endpoints. For example, a lower bound compares as lower than an otherwise identical lower bound of a second interval if the first interval is closed and the second is open.

Unless the compiler does whole-program optimization, it will not be able to eliminate the Interval dictionaries, and can do very little optimization. That's the reason Interval has many members that could also have been defined as module-level functions: above, below, contains, ... These functions are heavily used by the map and set implementations, and it seems that GHC at least is much better at optimizing these functions when they are defined as members (this is just an observation of mine - I don't know in detail why this should be the case). The disadvantage is that it's possible to break a lot of invariants if you actually define these members in an instance. That would not be possible if they were defined as regular functions. At least I'm in good company, because the same applies to Eq, Ord, and Monad to name just a few.
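For readers who have not looked at the package, here is a rough sketch of the shape of such a class, together with a typical instance whose leftClosed/rightClosed are constant as suspected above (a simplified illustration of my own, not the package's actual definition):

{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

-- A simplified interval class in the spirit of the one in IntervalMap.
class Ord e => Interval i e | i -> e where
  lowerBound  :: i -> e
  upperBound  :: i -> e
  leftClosed  :: i -> Bool
  rightClosed :: i -> Bool

  -- Members like this could also be plain module-level functions, but as
  -- class members GHC optimizes them better (see the discussion above):
  contains :: i -> e -> Bool
  contains iv p =
       (lowerBound iv < p || (leftClosed iv  && lowerBound iv == p))
    && (p < upperBound iv || (rightClosed iv && upperBound iv == p))

-- A typical instance: leftClosed and rightClosed ignore their argument.
data ClosedInterval a = ClosedInterval a a

instance Ord a => Interval (ClosedInterval a) a where
  lowerBound  (ClosedInterval lo _) = lo
  upperBound  (ClosedInterval _ hi) = hi
  leftClosed  _ = True
  rightClosed _ = True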

Data.Map tries to work around this by declaring many functions to be inlinable, effectively getting a bit closer to whole-program optimization. This does not seem feasible for IntervalMap, because the functions are used at deeper nesting levels, in particular when constructing nodes.

As far as GHC is concerned, good performance is very dependent on inlining, especially when type classes and their dictionaries are involved. Any improvements in this would certainly benefit IntervalMap. One idea: it might be useful to be able to provide strictness information in type classes. In the case of Interval for example, lowerBound and upperBound always evaluate their argument (disregarding really pathological instances), but there is no way to indicate this to the compiler. Only the instance declarations allow inferring strictness information, and that's not available when compiling library code.

## Prospects

Whether a big refactoring or rewrite should be done really depends on the users of the package: you. Please let me know about missing features or performance issues, either via mail, a github issue, or a comment here.

# A Rails 2.3 Performance Gotcha

Recently we upgraded to Rails 2.3 at work. Upon doing so, we saw our application take an aggregate performance hit of about 50%. Spending a little quality time with ruby-prof, one thing that stood out as peculiar was an awful lot of time being spent creating and tending to Ruby Thread objects. Tracing up the call stack, these were coming from inside the memcached client.

Buried in the Rails 2.3 release notes is the innocuous looking statement:

The bundled memcached client has been updated to version 1.6.4.99.

What this fails to tell you is that this upgrade added a timeout option to the client which, as the RDoc will happily inform you, “is a major performance penalty in Ruby 1.8”. Moreover, the default value for the timeout is 0.5s, rather than nil.

Our application makes heavy use of caching, so we were getting massacred by this. To add insult to injury, our most heavily optimized pages were hit the hardest, since they used memcached the most. Adding :timeout => nil to our cache layer immediately restored the previous performance.

Lessons from this story: if you’re a Rails developer using memcached, go add that option to your code; if you’re a library author adding an option with significant performance costs, default to the old behavior; if you’re on the Ruby core team, please add an interface to select so that people don’t do silly things like use Threads to implement timeouts on socket reads.

# Changes

After nearly 6 1/2 years, on Monday I gave notice at Yahoo. After next week, I’ll be moving on to new things (to be described in a later post). The short version is that I was not finding happiness or the career opportunities I wanted at Yahoo any more. The long version... well, you’ll have to come buy me a beer or two if you want that.

# More Erlang Beust Challenge Results (incl. HiPE)

I’ve been tinkering a little more with the Beust Challenge, following up on my previous post. There are a couple significant new developments to report.

First, Hynek (Pichi) Vychodil wrote a faster version using a permutation generator.

Second, I also ported the first “CrazyBob” solution using a bit mask to Erlang.

Third, I discovered that my overly literal ports were actually slowing things down. The CrazyBob code uses an unspecified Listener class that receives the numbers in the series, and presumably computes the actual results from there. (Aside, I cannot actually benchmark against the Java code because I haven’t found what this implementation is.) I fudged this in my original ports by simply spawning a process, which then discarded all the messages it received. After I noticed that the code was pegging both of my CPUs, though, I realized that message passing might actually be the bottleneck in my code. Turns out this was the case, and removing the listener process and just computing the results in the main process actually sped things up substantially.

So... here’s the results. First on my MacBook Pro, without HiPE:

| n  | original | bitmask | crazybob | pichi |
|----|----------|---------|----------|-------|
| 4  | 8ms      | 2ms     | 3ms      | 3ms   |
| 5  | 65ms     | 11ms    | 13ms     | 14ms  |
| 6  | 632ms    | 52ms    | 69ms     | 62ms  |
| 7  | 6.7s     | 253ms   | 303ms    | 272ms |
| 8  | 72s      | 1.0s    | 1.0s     | 945ms |
| 9  | 18m      | 4.7s    | 3.6s     | 2.8s  |
| 10 | (3h)     | 13s     | 7.8s     | 5.3s  |

The bitmask solution starts out the fastest, but loses out in the end to the more clever solutions. pichi edges out the crazybob solution by about a third.

Now on Linux 2.6 with HiPE:

| n  | original | bitmask | crazybob | pichi |
|----|----------|---------|----------|-------|
| 4  | 4ms      | <1ms    | 1ms      | 2ms   |
| 5  | 50ms     | 1ms     | 6ms      | 7ms   |
| 6  | 608ms    | 7ms     | 34ms     | 37ms  |
| 7  | 6.9s     | 35ms    | 160ms    | 174ms |
| 8  | 78s      | 147ms   | 619ms    | 563ms |
| 9  | (18m)    | 460ms   | 1.8s     | 1.4s  |
| 10 | (3h)     | 1.1s    | 4.2s     | 2.4s  |

And our new winner is... bitmask! HiPE barely helps the original brute-force solutions at all, while crazybob and pichi gain about a factor of two. bitmask, on the other hand, picks up an order of magnitude, and is now only a factor of 3 slower than the posted results for the Java crazybob solution (with unknown differences in hardware).

Conclusion: Erlang doesn’t suck!

# Alpha Male Programmers Are Driving Women Out

Yesterday, DHH made an argument that alpha male programmers aren’t keeping women out of the tech field. I’m of the opinion that he’s wrong and that his argument is flawed, and in a moment I’ll explain why, but let me get a few things out of the way so they don’t distract from the rest of my argument.

First, I don’t think that “alpha males” or the “rockstar” mentality are the only causes of under-representation of women in technology. As far as I can tell, the causes are many, varied, generally difficult to deal with, and in one or two cases may even be valid reasons why we might ultimately accept some degree of imbalance in the field. Second, this is not strictly a gender issue. Some men are also driven away by this behavior, although I think women are more likely to be; and also some “alpha males” happen to be female. Third, this is not a personal attack on DHH or any other individual, although some people might read parts of it in that way. But my goal here is that the range of individuals who find themselves uncomfortably reflected in what I say, but don’t simply reject it all out of hand, might view that discomfort as an opportunity for personal growth. Finally, I am certainly not claiming that I am perfect in this regard. I’ve made mistakes in the past; I will make them again. I simply hope that my friends will be kind enough to point out my mistakes to me, so that I can try to do better.

Okay, now that that’s all out of the way...

I first claim that DHH is wrong. My proof is empirical and anecdotal, but fortunately for me, I’m on the side of this debate that gets to use those techniques. I.e., I’m asserting existence of a phenomenon, rather than non-existence. I know numerous women who have been driven away by alpha male behavior. In some cases, they simply moved to a different team, or a different company. In other cases, they switched to non-programmer roles. And some left the industry entirely. I know this because they told me. They described specific individuals and specific behaviors which drove them away.

With some frequency in these debates, male programmers will claim that they don’t know any women who have left the field for this reason (or who have experienced sexism in the field, or who were offended by the example under discussion, or even just the milder claim that DHH makes, that no one has any idea what to do). I can explain this in only one of two ways: either they don’t know any women in the field to begin with or don’t talk with them beyond the minimal professional requirements, or women are not telling them because they are part of the problem. Perhaps it would be more effective if these women directly confronted the people causing the problem, but the fact of the matter is that most people, men and women, dislike conflict. We’re much more comfortable griping to a sympathetic friend than to the cause of our unhappiness. So consider me something akin to an anonymizing proxy. Without revealing names or too many of the specifics, please trust me when I say that almost every woman in the field experiences and complains about this.

Now I also said that DHH’s argument is flawed, and I will spend the rest of this post pointing out the various flaws I see.

DHH claims alpha males cannot be a problem in programming because the average male programmer is “meek, tame, and introverted” compared to other fields. First off, “alpha males” are by definition not average; they are the most dominant individuals of a group. And, it may even be possible that the general meekness or introversion of programmers makes it easier for a small set of individuals to dominate the interaction, rather than reaching a condition of detente between a group of uniformly assertive individuals. Second, presumably DHH does not interact with these people from other fields in a professional context. A point repeatedly stressed in this recent “pr0n star” controversy is that it’s not an issue of people being anti-sex or anti-porn; it’s about what’s appropriate in the workplace, or in a professional context. Standards for behavior in a social context are different. Third, he speaks in terms of whether these other men are more or less “R-rated”. This is not the point. Women are just as “R-rated” as men. They curse. They talk about sex (often more explicitly than men). The issue is not about whether we say or do adult things, it’s about whether we respect each other as human beings and whether we understand the societal norms of what is and is not appropriate in particular contexts. In fact, in this regard, I’ll defend DHH. Saying “fuck” (in the exclamatory, non-sexual usage) in the course of a technical presentation is not problematic in this day and age within the technology community. I think most of us swear freely in the course of struggling with a nasty bug or production problem. This is a normative expression of frustration within our community, and it does not oppress or disrespect other members of the community. (As far as I know. It’s possible that people just aren’t telling me that it upsets them when I curse at my monitor. If that’s the case, I hope someone will tell me.) Finally, DHH observes that these other fields have a more even mix of men and women. What he misses is that when the distribution is relatively equal it is generally easier and more comfortable for men to be men and women to be women. It is perhaps counterintuitive, but environments which are heavily skewed call for greater sensitivity to gender or other cultural differences simply because it is so easy to unintentionally create an oppressive or exclusionary atmosphere.

In the final paragraphs of his post, DHH suggests that somehow by respecting women we are squashing some other sort of “edge” and diversity in the community. I’m a little puzzled by what he means by this, and I’m sort of afraid that he thinks that being a heterosexual male who likes to look at scantily-clad women (or who openly admits as much) is somehow “edgy”. It’s not. By definition, hetero males like women; and it’s well established that men tend to be visually oriented. Pointing out that you fall in this category does not make you “diverse”, it makes you a completely typical representative of your gender and orientation. No one needs to be reminded of it.

Moreover, it might be true that maximal gains are had by pushing the edges (although I don’t think that one should naively assume that analogy from physics or economics applies to social endeavors), but for this to work there has to be negative feedback when boundaries are crossed. If the edge-walkers want society to accept their behavior, they must be prepared to apologize, to make reparations, and to correct their course when they go over the line. This is the difference between a trend-setter and a sociopath.

There’s quite a bit more that I could say on this issue, but I fear this may be becoming too long already, and I think it’s probably best to focus only on the arguments presented in this particular post at the moment. To summarize things in a couple of sentences, the phenomenon of women being discouraged by alpha male behavior is real. You merely need to talk, and listen, to women in the field to verify this. (But you might have to earn some trust first.) Comparisons with men in other fields in non-professional settings do not have much relevance to the matter at hand. Claims that respecting the feelings and experiences of a minority group is damaging to the community overall are extraordinary and require extraordinary support. Being a thought leader and being offensive are two very different things.

It’s really quite discouraging that so much of this discussion still seems mired in the question of whether a problem even exists, or whether it is desirable and possible to address the problem. This lack of acceptance leads both to the explicit refusals to acknowledge the validity of the complaints of the offended, as well as the phenomena of false apologies and insincere claims that “I would help if only I knew how (and if it doesn’t require any actual change of behavior on my part)”. Male programmers need to pull their heads out of the sand. The evidence, both hard statistical data and anecdotal, is overwhelming. It also is not hard to find advice about what can be done differently. The hard part is moving from a vague desire for diversity and balance to serious, meaningful, sometimes painful self-examination and commitment to change and improvement. It’s not easy to admit flaws in yourself, to acknowledge when you’ve hurt another person, or to make a true apology. Change doesn’t happen overnight or simply because you say that you want it to. It takes work, but it’s an important part of being a human being and being a member of a community.

# Speeding up a Rails request by 150ms by changing 1 line

We’re pretty obsessed with performance at Gilt Groupe. You can get a taste for what we’re dealing with, and how we’re dealing with it, from our recent presentation at RailsConf.

One of the techniques we’re using is to precompute what certain high-volume pages will look like at a given time in the future, and store the result as static HTML that we serve to the actual users at that time. For ease of initial development, and because there’s still a fair bit of business logic involved in determining which version of a particular page to serve, this was done inside our normal controller actions which look for a static file to serve, before falling back to generating it dynamically.

We’re now running on Rails 2.3 and, of course, Rails Metal is the new hotness in 2.3. I spent the last couple days looking into how much improvement in static file serving we would see by moving it into the Metal layer. Based on most of what I’ve read, I expected we might shave off a couple milliseconds. This expectation turned out to be dramatically wrong.

Metal components operate outside the realm of the usual Rails timing and logging components, so you don’t get any internal measurements of page performance. Instead, I fired up ab to measure the serving times externally. What I found for the page I was benchmarking was that the Metal implementation took about 5ms. The old controller action took 170ms. But, wait... the Rails logs were only reporting 8ms for that action. Something was fishy.

I started inserting timers at various places in the Rails stack, trying to figure out where the other 160ms was going. A little bit was routing logic and other miscellaneous overhead, but even setting a timer around the very entry points into the Rails request serving path, I was only seeing 15ms being spent. This was getting really puzzling, because at this point where a Rack response is returned to the web server, I expected things to look identical between Metal and ActionController. However, looking more closely at the response objects I discovered the critical difference. The Metal version returned an Array of Strings, while the controller returned an ActionController::Response.

I went into the Rails source and found the each method for ActionController::Response. Here it is:

def each(&callback)
  if @body.respond_to?(:call)
    @writer = lambda { |x| callback.call(x) }
    @body.call(self, self)
  elsif @body.is_a?(String)
    @body.each_line(&callback)
  else
    @body.each(&callback)
  end

  @writer = callback
  @block.call(self) if @block
end

The critical line is the case where the body is a String. The code iterates over each line in the response. Each line is written individually to the network socket. In the case of the particular page I was looking at, that was 1300 writes. Ouch.

To confirm this was the problem, I changed that line to

yield @body

With the whole body being sent in a single write, ab reported 15ms per request, right in line with what I measured inside Rails.

1 line changed. 150ms gained. Not too bad.

This sort of performance pessimization we uncovered is particularly insidious because it’s completely invisible to all the usual Rails monitoring tools. It doesn’t show up in your logged response time; you won’t see it in NewRelic or TuneUp. The only way you’re going to find out about it is by running an external benchmarking tool. Of course, this is always a good idea, but it’s easy to forget to do it, because the tools that work inside the Rails ecosystem are so nice. But the lesson here is, if you’re working on performance optimizations, make sure to always get a second opinion.

# Software Engineer (Haskell/Clojure) at Capital Match (Full-time)

Overview

Capital Match is a leading marketplace lending and invoice financing platform in Singapore. Our in-house platform, mostly developed in Haskell, has in the last year seen more than USD 10 million business loans processed with a strong monthly growth (current rate of USD 1.5-2.5 million monthly). We are also eyeing expansion into new geographies and product categories. Very exciting times!

We have just secured another funding round to build a world-class technology as the key business differentiator. The key components include credit risk engine, seamless banking integration and end-to-end product automation from loan origination to debt collection.

Responsibilities

We are looking to hire a software engineer with a minimum of 2-3 years coding experience. The current tech team includes a product manager and 3 software engineers. We are also in the process of hiring a CTO.

The candidate should have been involved in the development of multiple web-based products from scratch, and should be interested in all aspects of the creation, growth and operations of a secure web-based platform: front-to-back feature development, distributed deployment and automation in the cloud, build and test automation, etc.

Background in fintech and especially lending / invoice financing space would be a great advantage.

Requirements

Our platform is primarily developed in Haskell with an Om/ClojureScript frontend. We are expecting our candidate to have experience working with a functional programming language e.g. Haskell/Scala/OCaml/F#/Clojure/Lisp/Erlang.

Deployment and production is managed with Docker containers using standard cloud infrastructure so familiarity with Linux systems, command-line environment and cloud-based deployment is mandatory. Minimum exposure to and understanding of XP practices (TDD, CI, Emergent Design, Refactoring, Peer review and programming, Continuous improvement) is expected.

We are looking for candidates that are living in or are willing to relocate to Singapore.

Offer

We offer a combination of salary and equity depending on experience and skills of the candidate.

Most expats who relocate to Singapore do not have to pay their home country taxes and the local tax rate in Singapore is more or less 5% (effective on the proposed salary range).

Singapore is a great place to live, a vibrant city rich with diverse cultures, a very strong financial sector and a central location in Southeast Asia.

Get information on how to apply for this position.

## August 11, 2016

### FP Complete

# Bitrot-free Scripts

Sneak peek: Run docker run --rm -p 8080:8080 snoyberg/file-server-demo and open http://localhost:8080.

We've all been there. We need to write some non-trivial piece of functionality, and end up doing it in bash or perl because that's what we have on the server we'll be deploying to. Or because it's the language we can most easily rely on being present at a consistent version on our coworkers' machines. We'd rather use a different language and leverage more advanced, non-standard libraries, but we can't do that reliably.

One option is to create static executables or to ship around Docker images. This is great for many use cases, and we are going to have a follow-up blog post about using Docker and Alpine Linux to make such static executables. But there are at least two downsides to this approach:

• It's not possible to modify a static executable directly. You need to have access to the source code and the tool chain used to produce it.
• The executable is tied to a single operating system; good luck getting your Linux executable to run on your OS X machine.

Said another way: there are good reasons why people like to use scripting languages. This blog post is going to demonstrate doing some non-trivial work with Haskell, and do so with a fully reproducible and trivially installed toolchain, supported on multiple operating systems.

Haskell is a functional programming language with high performance, great safety features, and a large ecosystem of open source libraries to choose from. Haskell programs are high level enough to be readable and modifiable by non-experts, making it ideal for these kinds of shared scripts. If you're new to Haskell, learn more on haskell-lang.org.

We're going to put together a simple file server with upload capability. We're going to assume a non-hostile environment (like a corporate LAN with no external network access), and therefore not put in security precautions like upload size limits. We're going to use the relatively low-level Web Application Interface instead of a web framework. While it makes the code a bit longer, there's no magic involved. Common frameworks in Haskell include Yesod and Servant. We're going to host this all with the blazingly fast Warp web server.

## Get Stack

Stack is a cross-platform program for developing Haskell projects. While it has many features, in our case the most important bit is that it can:

• Install Haskell libraries from a curated package set
• Run Haskell source files directly as a script (we'll show how below)

Check out the Get Started page on haskell-lang.org to get Stack on your system.

## The code

You can see the full source code on Github. Let's step through the important parts here.

### Script interpreter

We start off our file with something that is distinctly not Haskell code:

#!/usr/bin/env stack
{- stack
--resolver lts-6.11
--install-ghc
runghc
--package shakespeare
--package wai-app-static
--package wai-extra
--package warp
-}

With this header, we've made our file executable from the shell. If you chmod +x the source file, you can run ./FileServer.hs. The first line is a standard shebang. After that, we have a comment that provides Stack with the relevant command line options. These options tell it to:

• Use the Haskell Long Term Support (LTS) 6.11 package set. From now through the rest of time, you'll be running against the same set of packages, so no worries about your code bitrotting!
• Install GHC, the Glasgow Haskell Compiler. LTS 6.11 indicates what version of GHC is needed (GHC 7.10.3). Once again: no bitrot concerns!
• runghc says we'd like to run a script with GHC
• The rest of the lines specify which Haskell library packages we depend on. You can see a full list of available libraries in LTS 6.11 on the Stackage server

For more information on Stack's script interpreter support, see the Stack user guide.

### Command line argument parsing

Very often with these kinds of tools, we need to handle command line arguments. Haskell has some great libraries for doing this in an elegant way. For example, see the optparse-applicative library tutorial. However, if you want to go simple, you can also just use the getArgs function to get a list of arguments. We're going to add support for a sanity argument, which will allow us to sanity-check that running our application works:

main :: IO ()
main = do
    args <- getArgs
    case args of
        ["sanity"] -> putStrLn "Sanity check passed, ready to roll!"
        [] -> do
            putStrLn "Launching application"
            -- Run our application (defined below) on port 8080
            run 8080 app
        _ -> error $ "Unknown arguments: " ++ show args

### Routing

We're going to support three different routes in our application:

• The /browse/... tree should allow you to get a directory listing of files in the current directory, and view/download individual files.
• The /upload page accepts a file upload and writes the uploaded content to the current directory.
• The homepage (/) should display an HTML page with a link to /browse and provide an HTML upload form targeting /upload.

Thanks to pattern matching in Haskell, getting this to work is very straightforward:

app :: Application
app req send =
    -- Route the request based on the path requested
    case pathInfo req of
        -- "/": send the HTML homepage contents
        [] -> send $ responseBuilder
            status200
            [("Content-Type", "text/html; charset=utf-8")]
            (runIdentity $ execHtmlT homepage)

        -- "/browse/...": use the file server to allow directory
        -- listings and downloading files
        ("browse":rest) ->
            -- We create a modified request that strips off the
            -- "browse" component of the path, so that the file server
            -- does not need to look inside a /browse/ directory
            let req' = req { pathInfo = rest }
             in fileServer req' send

        -- "/upload": handle a file upload
        ["upload"] -> upload req send

        -- anything else: 404
        _ -> send $ responseLBS
            status404
            [("Content-Type", "text/plain; charset=utf-8")]
            "File not found"

The most complicated bit above is the path modification for the /browse tree, which is something a web framework would handle for us automatically. Remember: we're doing this low level to avoid extra concepts, real world code is typically even easier than this!
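The upload handler referenced above is not shown in this excerpt; it lives in the full source on Github. As a rough sketch of what such a handler could look like using parseRequestBody from wai-extra (my illustration, not necessarily the post's actual code):

{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Char8 as S8
import qualified Data.ByteString.Lazy as L
import Network.HTTP.Types (status303)
import Network.Wai (Application, responseLBS)
import Network.Wai.Parse (fileContent, fileName, lbsBackEnd, parseRequestBody)

-- Accept a multipart form upload, write the file into the current directory
-- (remember: we assume a non-hostile environment), then redirect to the homepage.
upload :: Application
upload req send = do
    (_params, files) <- parseRequestBody lbsBackEnd req
    case lookup "file" files of
        Just info -> L.writeFile (S8.unpack (fileName info)) (fileContent info)
        Nothing   -> return ()
    send $ responseLBS status303 [("Location", "/")] ""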

### Homepage content

An area that Haskell really excels at is Domain Specific Languages (DSLs). We're going to use Hamlet for HTML templating. There are many other options in the Haskell world favoring other syntax, such as the Lucid library (which provides a Haskell-based DSL), plus implementations of language-agnostic templates, like mustache.

Here's what our HTML page looks like in Hamlet:

homepage :: Html ()
homepage = [shamlet|
$doctype 5
<html>
    <head>
        <title>File server
    <body>
        <h1>File server
        <p>
            <a href=/browse/>Browse available files
        <form method=POST action=/upload enctype=multipart/form-data>
            <p>Upload a new file
            <input type=file name=file>
            <input type=submit>
|]

Note that Hamlet - like Haskell itself - uses significant whitespace and indentation to denote nesting.

### The rest

We're not going to cover the rest of the code in the Haskell file. If you're interested in the details, please read the comments there, and feel free to ask questions about any ambiguous bits (hopefully the inline comments give enough clarity on what's going on).

## Running

Download the FileServer.hs file contents (or copy-paste, or clone the repo), make sure the file is executable (chmod +x FileServer.hs), and then run:

$ ./FileServer.hs

If you're on Windows, you can instead run:

> stack FileServer.hs

That's correct: the same source file will work on POSIX systems and Windows as well. The only requirement is Stack and GHC support. Again, to get Stack on your system, please see the Get Started page.

The first time you run this program, it will take a while to complete. This is because Stack will need to download and install GHC and necessary libraries to a user-local directory. Once complete, the results are kept on your system, so subsequent runs will be almost instantaneous.

Once running, you can view the app on localhost:8080.

## Dockerizing

Generally, I wouldn't recommend Dockerizing a source file like this; it makes more sense to Dockerize a compiled executable. We'll cover how to do that another time (though sneak preview: Stack has built in support for generating Docker images). For now, let's actually Dockerize the source file itself, complete with Stack and the GHC toolchain.

You can check out the Dockerfile on Github. That file may be slightly different from what I cover here.

FROM ubuntu:16.04
MAINTAINER Michael Snoyman

Nothing too interesting...

RUN chmod +x /usr/local/bin/dumb-init

While interesting, this isn't Haskell-specific. We're just using an init process to get proper handling for signals. For more information, see dumb-init's announcement blog post.

RUN sh /usr/local/bin/get-stack.sh

Stack has a shell script available to automatically install it on POSIX systems. We just download that script and then run it. This is all it takes to have a Haskell-ready system set up: we're now ready to run script interpreter based files like our FileServer.hs!

COPY FileServer.hs /usr/local/bin/file-server
RUN chmod +x /usr/local/bin/file-server

We're copying over the source file we wrote and then ensuring it is executable. Interestingly, we can rename it to not include a .hs file extension. There is plenty of debate in the world around whether scripts should or should not include an extension indicating their source language; Haskell is allowing that debate to perpetuate :).

RUN useradd -m www && mkdir -p /workdir && chown www /workdir
USER www

While not strictly necessary, we'd rather not run our executable as the root user, for security purposes. Let's create a new user, create a working directory to store files in, and run all subsequent commands as the new user.

RUN /usr/local/bin/file-server sanity

As I mentioned above, that initial run of the server takes a long time. We'd like to do the heavy lifting of downloading and installing during the Docker image build rather than at runtime. To make this happen, we run our program once with the sanity command line argument, so that it immediately exits after successfully starting up.

CMD ["/usr/local/bin/dumb-init", "/usr/local/bin/file-server"]
WORKDIR /workdir
EXPOSE 8080

Finally, we use CMD, WORKDIR, and EXPOSE to make it easier to run. This Docker image is available on Docker Hub, so if you'd like to try it out without doing a full build on your local machine:

docker run --rm -p 8080:8080 snoyberg/file-server-demo

You should be able to play with the application on http://localhost:8080.

## What's next

As you can see, getting started with Haskell as a scripting language is easy. You may be interested in checking out the turtle library, which is a Shell scripting DSL written in Haskell.

FP Complete both supports the open source Haskell ecosystem, as well as provides commercial support for those seeking it. If you're interested in learning more about how FP Complete can help you and your team be more successful in your development and devops work, you can learn about what services we offer or contact us for a free consultation.

# (Senior) Scala Developer at SAP SE (Full-time)

SAP is a market leader in enterprise application software, helping companies of all sizes and industries run better. SAP empowers people and organizations to work together more efficiently and use business insight more effectively. SAP applications and services enable our customers to operate profitably, adapt continuously, and grow sustainably.

What you'll do:

You will be a member of the newly formed Scala development experience team. You will support us with the design and development of a Scala and cloud-based business application development and runtime platform. The goal of the platform is to make cloud-based business application development in the context of S/4 HANA as straightforward as possible. The team will be distributed over Berlin, Potsdam, Walldorf and Bangalore.

• Design and development of libraries and tools for business application development
• Design and development of tools for operating business applications
• Explore, understand, and implement most recent technologies
• Contribute to open source software (in particular within the Scala ecosystem)

Required skills:

• Master’s degree in computer science, mathematics, or related field
• Excellent programming skills and a solid foundation in computer science with strong competencies in data structures, algorithms, databases, and software design
• Solid understanding of object-oriented concepts and a basic understanding of functional programming concepts
• Good knowledge in Scala, Java, C++, or similar object-oriented programming languages
• Strong analytical skills
• Reliable and open-minded with strong team working skills, determined to reach a goal in time as well as the ability to work independently and to prioritize
• Ability to get quickly up-to-speed in a complex, new environment
• Proficiency in spoken and written English

Beneficial skills

• Ph.D. in computer science
• Solid understanding of functional programming concepts
• Good knowledge in Scala, OCaml, SML, or Haskell
• Experience with Scala and Scala.js
• Experience with meta programming in Scala, e.g., using Scala’s macro system
• Knowledge on SAP technologies and products
• Experience with the design of distributed systems, e.g., using Akka

What we offer

• Modern and innovative office locations
• Free lunch and free coffee
• Flexible working hours
• Training opportunities and conference visits
• Fitness room with a climbing wall
• Gaming room with table tennis, kicker tables and a Playstation
• Friendly colleagues and the opportunity to work within a highly diverse team which has expert knowledge in a wide range of technologies

Get information on how to apply for this position.

# Some Haskell hacks: SPARQL queries to DBPedia and using OpenCalais web service

For various personal (and a few consulting) projects I need to access DBPedia and other SPARQL endpoints. I use the hsparql Haskell library written by Jeff Wheeler and maintained by Rob Stewart. The following snippet runs a simple DESCRIBE query against DBPedia:

module Sparql2 where

import Database.HSparql.Connection
import Database.HSparql.QueryGenerator

import Data.RDF hiding (triple)
import Data.RDF.TriplesGraph

simpleDescribe :: Query DescribeQuery
simpleDescribe = do
    resource <- prefix "dbpedia" (iriRef "http://dbpedia.org/resource/")
    uri <- describeIRI (resource .:. "Sedona_Arizona")
    return DescribeQuery { queryDescribe = uri }

doit = do
    (rdfGraph :: TriplesGraph) <- describeQuery "http://dbpedia.org/sparql" simpleDescribe
    --mapM_ print (triplesOf rdfGraph)
    --print "\n\n\n"
    --print rdfGraph
    mapM (\(Triple s p o) ->
            case [s,p,o] of
              [UNode(s), UNode(p), UNode(o)]            -> return (s,p,o)
              [UNode(s), UNode(p), LNode(PlainLL o2 l)] -> return (s,p,o2)
              [UNode(s), UNode(p), LNode(TypedL o2 l)]  -> return (s,p,o2)
              _ -> return ("no match","no match","no match"))
         (triplesOf rdfGraph)

main = do
    results <- doit
    print $ results !! 0
    mapM_ print results

I find the OpenCalais web service for finding entities in text and categorizing text to be very useful. This code snippet uses the same hacks for processing the RDF returned by OpenCalais that I used in my last semantic web book:

NOTE: August 9, 2016: the following example no longer works because of API changes:

module OpenCalais (calaisResults) where

import Network.HTTP
import Network.HTTP.Base (urlEncode)
import qualified Data.Map as M
import qualified Data.Set as S
import Control.Monad.Trans.Class (lift)
import Data.String.Utils (replace)
import Data.List (lines, isInfixOf)
import Data.List.Split (splitOn)
import Data.Maybe (maybe)
import System.Environment (getEnv)

calaisKey = getEnv "OPEN_CALAIS_KEY"

escape s = urlEncode s

baseParams = "<c:params xmlns:c=\"http://s.opencalais.com/1/pred/\" xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"><c:processingDirectives c:contentType=\"text/txt\" c:outputFormat=\"xml/rdf\"></c:processingDirectives><c:userDirectives c:allowDistribution=\"true\" c:allowSearch=\"true\" c:externalID=\"17cabs901\" c:submitter=\"ABC\"></c:userDirectives><c:externalMetadata></c:externalMetadata></c:params>"

calaisResults s = do
    key <- calaisKey
    let baseUrl = "http://api.opencalais.com/enlighten/calais.asmx/Enlighten?licenseID=" ++
                  key ++ "&content=" ++ (escape s) ++ "&paramsXML=" ++ (escape baseParams)
    ret <- simpleHTTP (getRequest baseUrl) >>= fmap (take 10000) . getResponseBody
    return $ map (\z -> splitOn ": " z) $
             filter (\x -> isInfixOf ": " x && length x < 40)
                    (lines (replace "\r" "" ret))

main = do
    r <- calaisResults "Berlin Germany visited by George W. Bush to see IBM plant. Bush met with President Clinton. Bush said “felt it important to step it up”"
    print r

You need to have your free OpenCalais developer key in the environment variable OPEN_CALAIS_KEY. The key is free and allows you to make 50K API calls a day (throttled to four per second).

I have been trying to learn Haskell for about four years so if anyone has any useful critiques of these code examples, please speak up :-)

### Functional Jobs

# Head of Data Science at Capital Match (Full-time)

Overview

Capital Match is a leading marketplace lending and invoice financing platform in Singapore. Our in-house platform, mostly developed in Haskell, has in the last year seen more than USD 10 million business loans processed with a strong monthly growth (current rate of USD 1.5-2.5 million monthly). We are also eyeing expansion into new geographies and product categories. Very exciting times!

We have just secured another funding round to build a world-class technology as the key business differentiator. The key components include credit risk engine, seamless banking integration and end-to-end product automation from loan origination to debt collection.

Complementing our technology, we aspire to put data science at the core of everything we do.

Responsibilities

We are looking to hire an experienced data scientist / software engineer with a passion for data for the role of Head of Data Science.

Data science needs to impact every stage of our workflow, including customer acquisition, operational automation, risk and underwriting, portfolio servicing, marketing and product development. We have functional teams in every major area (sales, credit, tech, product) and the Head of Data Science would be a cross-functional role improving decision making processes and outcomes across the company. The person would report directly to the CEO and the Board of Directors. The candidate would be expected to:

• Lead a bold agenda around the use of transaction data in new creative ways
• Work with multiple, complex data sources at large scale
• Utilize big data and machine learning to build predictive models including but not limited to customer acquisition, credit risk, fraud, marketing etc.
• Perform thorough testing and validation of models and support various aspects of the business with data analytics
• Identify new data sources / patterns that add significant lift to predictive modeling capabilities

We are looking for an individual who is eager to use data to come up with new ideas to improve decisions, who is driven by making an impact through actionable insights and improvements, and who is not afraid to take risks and try new things. Background in fintech and especially the lending / invoice financing space would be a great advantage.

Requirements

Minimum 5 years of coding, machine learning and large scale data analysis experience. Our platform is primarily developed in Haskell with an Om/ClojureScript frontend. We are expecting our candidate to have experience working with a functional programming language, e.g. Haskell/Scala/OCaml/F#/Clojure/Lisp/Erlang. Deployment and production is managed with Docker containers using standard cloud infrastructure, so familiarity with Linux systems and the command-line environment would be helpful. Minimum exposure to and understanding of XP practices (TDD, CI, Emergent Design, Refactoring, Peer review and programming, Continuous improvement) is expected. We are looking for candidates that are living in or are willing to relocate to Singapore.

Offer

We offer a combination of salary and equity that strongly depends on the candidate's experience and skills:

Salary: USD 5,000-10,000 / month
Equity: 0.5-1.5% (subject to vesting)

Most expats who relocate to Singapore do not have to pay their home country taxes and the local tax rate in Singapore is more or less 5% (effective on the proposed salary range). Visa sponsorship will be provided. Singapore is a great place to live, a vibrant city rich with diverse cultures, a very strong financial sector and a central location in Southeast Asia.

Get information on how to apply for this position.

## August 08, 2016

### Manuel M T Chakravarty

# Learning Haskell

We just published the seventh chapter of our new tutorial for Learning Haskell. The tutorial combines clear explanations with screencasts reinforcing new concepts with live coding. Several chapters use graphics programming to make for more engaging coding examples.

## August 07, 2016

### Brent Yorgey

# POGIL workshop

A few weeks ago I attended a 3-day training workshop in St. Louis, put on by the POGIL project. I attended a short POGIL session at the SIGCSE CS education conference in March and was sufficiently impressed to sign up for a training workshop (it didn’t hurt that Clif Kussmaul has an NSF grant that paid for my registration and travel). POGIL is an acronym for “Process Oriented Guided Inquiry Learning”.
Process-oriented refers to the fact that in addition to learning content, an explicit goal is for students to learn process skills like analytic thinking, communication, and teamwork. Guided inquiry refers to the fact that students are responsible for constructing their own knowledge, guided by carefully designed questions. The entire framework is really well thought-out and is informed by concrete research in pedagogical methods.

I really enjoyed how the workshop used the POGIL method to teach us about POGIL (though of course it would be rather suspect to do anything else!). It gave me not just an intellectual appreciation for the benefits of the approach, but also a concrete understanding of the POGIL experience for a student.

The basic idea is to put students in groups of 3 or 4 and have them work through an activity or set of questions together. So far this sounds just like standard “group work”, but it’s much more carefully thought out than that:

• Each student is assigned a role with specific responsibilities within their group. Roles typically rotate from day to day so each student gets a chance to play each role. Roles can vary but common ones include things like “manager”, “recorder”, “reporter”, and so on. I didn’t appreciate how important the roles are until attending the workshop, but they are really crucial. They help ensure every student is engaged, forestall some of the otherwise inevitable social awkwardness as students figure out how to relate to their group members, and also play an important part in helping students develop process skills.

• The activities are carefully constructed to take students through one or more learning cycles: beginning with some data, diagrams, text, etc. (a “model”), students are guided through a process starting with simple observations, then synthesis and discovering underlying concepts, and finally more open ended/application questions.

The teacher is a facilitator: giving aid and suggestions as needed, managing difficulties that arise, giving space and time for groups to report on their progress and share with other groups, and so on. Of course, a lot of work goes into constructing the activities themselves. In some areas, there is already a wealth of POGIL activities to choose from; unfortunately, existing materials are a bit thinner in CS (though there is a growing collection).

I won’t be able to use POGIL much this coming semester, but I hope to use it quite a bit when I teach algorithms again in the spring.

### Roman Cheplyaka

# Does it matter if Hask is (not) a category?

Andrej Bauer raises a question whether Hask is a real category. I think it’s a legitimate question to ask, especially by a mathematician or programming languages researcher. But I want to look closer at how a (probably negative) answer to this question would affect Haskell and its community.

To illustrate the fallacy of assuming blindly that Hask is a category, Andrej tells an anecdote (which I find very funny):

I recall a story from one of my math professors: when she was still a doctoral student she participated as “math support” in the construction of a small experimental nuclear reactor in Slovenia. One of the physicists asked her to estimate the value of the harmonic series \(1+1/2+1/3+\cdots\) to four decimals. When she tried to explain the series diverged, he said “that’s ok, let’s just pretend it converges”.

Presumably here is what happened:

1. The physicists came up with a mathematical model of a nuclear reactor.
2. The model involved the sum of the harmonic series.
3. Andrej’s math professor tried to explain that the series diverged and therefore something was wrong with the model.

When we try to model a phenomenon, we should watch out for two types of problems:

1. The model itself is erroneous.
2. The model itself is fine; but the phenomenon we are describing does not meet all of the model’s assumptions.

The first type of problem means that the people who built the model couldn’t get their math right. That’s too bad. We let mathematicians gloss over the messy real world, and impose whatever assumptions they want, but in return we expect a mathematically rigorous model upon which we can build. In Andrej’s story, hopefully the math support lady helped the physicists build a better model that didn’t rely on the convergence of the harmonic series.

But at some point the model has to meet the real world; and here, the issues are all but inevitable. We know that all models are wrong (meaning that they don’t describe the phenomenon ideally, not that they are erroneous) — but some are useful.

Physicists, for example, often assume that they are dealing with isolated systems, while being perfectly aware that no such system exists (except, perhaps, for the whole universe, which would be impossible to model accurately). Fortunately, they still manage to design working and safe nuclear reactors!

Consider Hask. Here, the abstraction is the notion of a category, and the phenomenon is the programming language Haskell. If types and functions of Haskell do not form a proper category, we have the second type of modelling problem. The foundation — the category theory — is, to the best of my knowledge, widely accepted among mathematicians as a solid theory.

Since category theory is often used to model other purely mathematical objects, such as groups or vector spaces, mathematicians may get used to a perfect match between the abstraction and the phenomenon being described. Other scientists (including computer scientists!) can rarely afford such luxury.

Usefulness is the ultimate criterion by which we should judge a model. We use monads in Haskell not because they are a cool CT concept, but because we tried them and found that they solve many practical problems. Comonads, which from the CT standpoint are “just” the dual of monads, have found much fewer applications, not because we found some kind of theoretical problems with them — we simply didn’t find that many problems that they help address. (To be fair, we tried hard, and we did manage to find a few.)

There are people who, inspired by some category theory constructions, come up with novel algorithms, data structures, or abstractions for Haskell. For these discoveries to work, it is neither necessary nor sufficient that they correspond perfectly to the original categorical abstractions they were derived from. And as long as playing with the “Hask category” yields helpful intuition and working programming ideas, we are going to embrace it.

# Announcing: Snap 1.0

The Snap team is delighted to announce the anxiously awaited release of version 1.0 of the Snap Web Framework for Haskell. Snap has been used in stable production applications for years now, and with this release we’re updating our version number to reflect the stability and commitment to backwards compatibility that our users depend on. Here is a summary of the major changes:

## The Details

### Now backed by io-streams

Snap’s web server has been overhauled, replacing the enumerator package with the newer, leaner, faster, and easier to use io-streams. If you were using any of Snap’s low-level enumerator functions, those will need to be migrated to io-streams. Otherwise there should be few interface changes.

### More modular project template infrastructure

The snap executable that generates project templates has been moved from the snap package to snap-templates. Your snap applications depending on snap will continue to do so, but with a slightly lighter set of transitive dependencies. If you want to run snap init to generate a project template, you will now need to do cabal install snap-templates first instead of cabal install snap.

## Migration Guide

• Instead of deriving the MonadCatchIO type class, you should now make MonadBaseControl instances. Depending on your monad, this may require MonadBase and MonadTransControl instances as well. For examples of how to do that for common monad structures, look at Heist and snap (here, here, and here).

• Any exception handling functions like try, catch, etc you were using from Control.Monad.CatchIO should now come from Control.Exception.Lifted which is provided by the lifted-base package.

• initCookieSessionManager takes an additional Maybe ByteString argument representing an optional cookie domain. Passing Nothing as the new argument will give the same behavior as you had before.

## Outline

The Snap Framework is composed of five major packages:

• snap-core - A simple and stable web server API.

• snap-server - A robust and well tested web server implementing the snap-core API.

• heist - An HTML 5 template system allowing designers to make changes to markup without needing to have a Haskell toolchain installed and recompile the app.

• snap - Umbrella project that integrates the above three packages, provides a snaplet system for building reusable web components, and includes built-in snaplets for common things like sessions, auth, templating, etc.

• snap-templates - Provides an executable for generating Snap project templates.

## Acknowledgments

We would like to thank the dozens of contributors who have helped over the years to get Snap to this milestone. Particular thanks go to Greg Hale who has been instrumental in getting us across the finish line for this release.

# Dimensionful Matrices

Introduction

Programming languages and libraries for numerical work tend not to place a lot of emphasis on the types of their data. For example, Matlab, R, Octave, Fortran, and Numpy (but not the now defunct Fortress) all tend to treat their data as plain numbers, meaning that any time you have a temperature and a mass, say, there is nothing to prevent you adding them.

I've been wondering how much dimensions (in the sense of dimensional analysis) and units could help with numerical programming. As I pointed out on G+ recently (which is where I post shorter stuff these days), you don't have to limit dimensions to the standard ones of length, mass, time, dollars and so on. Any scale invariance in the equations you're working with can be exploited as a dimension giving you a property that can be statically checked by a compiler.

There are quite a few libraries to statically check dimensions and units now. For example Boost.Units for C++, units for Haskell and even quantities for Idris.
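To give a flavour of what that static checking buys you, here is a minimal sketch using nothing but phantom types; it is not the API of any of the libraries above, and the names are made up for illustration:

{-# LANGUAGE DataKinds, KindSignatures #-}

module Dimensioned where

-- A tiny universe of dimensions, used only at the type level.
data Dim = Mass | Temperature | Length

-- A quantity tagged with its dimension; the tag has no runtime cost.
newtype Quantity (d :: Dim) = Quantity Double
  deriving Show

-- Addition is only defined between quantities of the same dimension.
addQ :: Quantity d -> Quantity d -> Quantity d
addQ (Quantity x) (Quantity y) = Quantity (x + y)

kilograms :: Double -> Quantity 'Mass
kilograms = Quantity

kelvin :: Double -> Quantity 'Temperature
kelvin = Quantity

ok :: Quantity 'Mass
ok = addQ (kilograms 2) (kilograms 3)

-- bad = addQ (kilograms 2) (kelvin 300)   -- rejected by the type checker

Real libraries go much further, tracking products and powers of dimensions, but the basic mechanism is the same: the compiler refuses to add a temperature to a mass.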

A matrix that breaks things

Even if a language supports dimensions, it's typical to define objects like vectors and matrices as homogeneous containers of quantities. But have a look at the Wikipedia page on the metric tensor. There is a matrix

\(g = \mathrm{diag}(-c^2, 1, 1, 1)\)

which has the curious property that 3 entries on the diagonal seem to be dimensionless while the first entry is a squared velocity with dimension \(L^2/T^2\). This will break many libraries that support units. An obvious workaround is to switch to use natural units, which is much the same as abandoning the usefulness of dimensions. But there's another way, even if it may be tricky to set up with existing languages.

Heterogeneous vectors and matrices

According to a common convention in physics, a 4-vector \((t, x, y, z)\) has dimensions \(d = (T, L, L, L)\), where I'm using the convention that we can represent the units of a vector or matrix simply as a vector or matrix of dimensions, and here \(T\) is time and \(L\) is length. The metric tensor is used like this: \(g_{ij}x^ix^j\) (where I'm using the Einstein summation convention so the \(i\)'s and \(j\)'s are summed over). If we think of \(g_{ij}x^ix^j\) as having units of length squared (it is a pseudo-Riemannian metric after all) then it makes sense to think of \(g\) as having dimensions given by

\(\begin{pmatrix} L^2/T^2 & L/T & L/T & L/T \\ L/T & 1 & 1 & 1 \\ L/T & 1 & 1 & 1 \\ L/T & 1 & 1 & 1 \end{pmatrix}\)

We can write this more succinctly as

\(L^2/(d \otimes d)\)

where \(\otimes\) is the usual outer product and the division is performed elementwise.

I'll use the notation to mean is of type . So, for example, . I'll also use pointwise notation for types such as and .

Now I can give some general rules. If is a matrix, and are vectors, and is a scalar, then only makes sense if . Similarly the "inner product" only makes sense if .

Generic vectors and matrices

Although these kinds of types might be useful if you're dealing with the kind of heterogeneous matrices that appear in relativity, there's another reason they might be useful. If you write code (in the imaginary language that supports these structures and understands dimensions and units) to be as generic as possible in the types of the vector and matrix entries, failures to type check will point out parts of the code where there are hidden assumptions, or even errors, about scaling. For example, consider a routine to find the inverse of a 3 by 3 matrix. Writing this as generically as possible means we should write it to operate on a matrix of type \(A \otimes B\), say. The result should have type \(B^{-1} \otimes A^{-1}\). If this type checks when used with a suitably powerful type checker then it means that if we replace the units for type \(A\), say, with units twice as large, it should have no effect on the result, taking into account those units. In this case, it means that if we multiply the numbers of the first row of the input by 0.5 then the numbers of the first column of the output should get multiplied by 2. In fact this is a basic property of matrix inverses. In other words, this mathematical property of matrix inverses is guaranteed by a type system that can handle units and heterogeneous matrices. It would be impossible to write a matrix inverter that type checks and fails to have this property. Unfortunately it's still possible to write a matrix inverter that type checks and is incorrect some other way. Nonetheless this kind of type system would put a very big constraint on the code and is likely to eliminate many sources of error.

An example, briefly sketched

I thought I'd look at an actual example of a matrix inverter to see what would happen if I used a type checker like the one I've described. I looked at the conjugate gradient method. At the Wikipedia page, note the line

This would immediately fail to type check because if is of generic vector type then isn't the same type as so they can't be added. I won't go into any of the details but the easiest way to patch up this code to make it type check is to introduce a new matrix of type and besides using it to make this inner product work (replacing the numerator by ) we also use anywhere in the code we need to convert a vector of type to a vector of type . If you try to do this as sparingly as possible you'll end up with a modified algorithm. But at first this seems weird. Why should this matrix inverse routine rely on someone passing in a second matrix to make it type check? And what is this new algorithm anyway? Well scroll down the Wikipedia page and you get to the preconditioned conjugate gradient algorithm. The extra matrix we need to pass in is the preconditioner. This second algorithm would type check. Preconditioned conjugate gradient, with a suitable preconditioner, generally performs better than pure conjugate gradient. So in this case we're getting slightly more than a check on our code's correctness. The type checker for our imaginary language would give a hint on how to make the code perform better. There's a reason for this. The original conjugate gradient algorithm is implicitly making a choice of units that sets scales along the axes. These determine the course taken by the algorithm. It's not at all clear that picking these scalings randomly (which is in effect what you're doing if you throw a random problem at the algorithm) is any good. It's better to pick a preconditioner adapted to the scale of the problem and the type checker is hinting (or would be if it existed) that you need to do this. Compare with the gradient descent algorithm whose scaling problems are better known.

But which language?

I guess both Agda and Idris could be made to implement what I've described. However, I've a hunch it might not be easy to use in practice.

# First impression of “Real World OCaml”

Tomorrow I will be flying to Cambridge to attend the International Summer School on Metaprogramming. One of the prerequisites required from the participants is basic knowledge of OCaml, roughly the first nine chapters of “Real World OCaml” (RWO for short). I finished reading them several days ago and thought I would share my impressions of the book.

RWO was written by Yaron Minsky, Anil Madhavapeddy and Jason Hickey. It is one of a handful of books on OCaml. Other titles out there are “OCaml from the Very Beginning” and “More OCaml: Algorithms, Methods and Diversions” by John Whitington and “Practical OCaml” by Joshua Smith. I decided to go with RWO because when I asked “what is the best book on OCaml” on the #ocaml IRC channel, RWO was the unanimous response from several users. The title itself is obviously piggybacking on the earlier “Real World Haskell” released in the same series by O’Reilly, which was in general a good book (though it had its flaws).

The first nine chapters comprise about 40% of the book (190 pages out of 470 total) and cover the basics of OCaml: various data types (lists, records, variants), error handling, imperative programming (e.g. mutable variables and data structures, I/O) and the basics of the module system. Chapters 10 through 12 present advanced features of the module system and introduce the object-oriented aspects of OCaml. The language ecosystem (i.e. tools and libraries) is discussed in chapters 13 through 18. The remaining chapters, 19 through 23, go into the details of the OCaml compiler, such as the garbage collector and the Foreign Function Interface.

When I think back about reading “Real World Haskell” I recall that quite a lot of space was dedicated to explaining in detail various basic functional programming concepts. “Real World OCaml” is much more dense. It approaches teaching OCaml just as if it were another programming language, without making a big deal of the functional programming model. I am much more experienced now than when reading RWH four years ago and this is exactly what I wanted. I wonder, however, how this approach will work for people new to functional programming. It reminds me of my early days as a functional programmer. I began learning Scala having previously learned Scheme and Erlang (both unusual for functional languages in lacking a static type system). Both Scala and OCaml are not pure functional languages: they allow free mixing of functional and imperative (side-effecting) code. They also support object-oriented programming. My plan in learning Scala was to learn functional programming and I quickly realized that I was failing. Scala simply offered too many back-doors that allowed escaping into the imperative world. So instead of forcing me to learn a new way of thinking it allowed me to do things the old way. OCaml seems to be exactly the same in this regard and RWO offers beginners little guidance to thinking functionally. Instead, it gives them a full arsenal of imperative features early on in the book. I am not entirely convinced that this approach will work well for people new to FP.

“Real World OCaml” was published less than three years ago so it is a fairly recent book. Quite surprisingly then, several sections have already gone out of date. The code does not work with the latest version of the OCaml compiler and requires non-obvious changes to work. (You can of course solve the problem by working with an old version of the OCaml compiler.) I was told on IRC that the authors are already working on the second edition of the book to bring it up to date with today’s OCaml implementation.

Given all the above my verdict on “Real World OCaml” is that it is a really good book about OCaml itself (despite being slightly outdated) but not necessarily the best book on basics of functional programming.

# New Haskell Symposium paper on “twisted functors”

Satvik Chauhan, Piyush Kurur and I have a new paper which will appear at the 2016 Haskell Symposium in Japan:

How to Twist Pointers without Breaking Them

Although pointer manipulations are used as a primary motivating example, at heart the paper is really about “twisted functors”, a class of applicative functors which arise as a natural generalization of the semi-direct product of two monoids where one acts on the other. It’s a really cute idea1, one of those ideas which seems “obvious” in retrospect, but really hadn’t been explored before.
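For readers who haven't met it, the monoid construction being generalized is the ordinary semi-direct product, which can be sketched like this (class and instance names are mine, not code from the paper):

{-# LANGUAGE MultiParamTypeClasses #-}

module SemiDirect where

-- A left action of the monoid h on the monoid n. For the construction
-- below to yield a lawful monoid, 'act' must respect both monoid
-- structures: act mempty = id, act (h1 <> h2) = act h1 . act h2, and
-- each 'act h' must be a monoid homomorphism on n.
class (Monoid h, Monoid n) => Action h n where
  act :: h -> n -> n

-- The semi-direct product: pairs multiplied with a "twist" on the left
-- component.
newtype Semi n h = Semi (n, h)

instance Action h n => Semigroup (Semi n h) where
  Semi (n1, h1) <> Semi (n2, h2) = Semi (n1 <> act h1 n2, h1 <> h2)

instance Action h n => Monoid (Semi n h) where
  mempty = Semi (mempty, mempty)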

We give some examples of applications in the paper but I’m quite certain there are many other examples of applications out there. If you find any, let us know!

1. I can say that since it wasn’t actually my idea!

# Upcoming talk: Writing build systems with Shake, 16 Aug 2016, London

Summary: I'm giving a talk on Shake.

I'm delighted to announce that I'll be giving a talk/hack session on Shake as part of the relatively new "Haskell Hacking London" meetup.

Title: Writing build systems with Shake

Date: Tuesday, August 16, 2016. 6:30 PM

Location: Pusher Office, 28 Scrutton Street, London

Abstract: Shake is a general purpose library for expressing build systems - forms of computation, with caching, dependencies and more besides. Like all the best stuff in Haskell, Shake is generic, with details such as "files" written on top of the generic library. Of course, the real world doesn't just have "files", but specifically has "C files that need to be compiled with gcc". In this hacking session we'll look at how to write Shake rules, what existing functions people have already layered on top of Shake for compiling with specific compilers, and consider which rules are missing. Hopefully by the end we'll have a rule that people can use out-of-the-box for compiling C++ and Haskell.

To put it another way, it's all about layering up. Haskell is a programming language. Shake is a Haskell library for dependencies, minimal recomputation, parallelism etc. Shake also provides a layer on top (but inside the same library) for writing rules about files, and ways to run command line tools. Shake doesn't yet provide a layer that compiles C files, but it does provide the tools with which you can write your own. The aim of this talk/hack session is to figure out what the next layer should be, and write it. It is definitely an attempt to move into the SCons territory of build systems, which know how to build C etc. out of the box.
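As a taster for the hacking session, here is roughly what a hand-rolled rule for compiling a single C file looks like today (the file names are illustrative; this is the kind of rule we would like to package up):

import Development.Shake
import Development.Shake.FilePath

main :: IO ()
main = shakeArgs shakeOptions $ do
  -- The final target we want built.
  want ["main.o"]

  -- How to build any .o file: find the matching .c file, depend on it,
  -- and invoke gcc.
  "*.o" %> \out -> do
    let src = out -<.> "c"
    need [src]
    cmd "gcc" ["-c", src, "-o", out]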

# Scratching an itch: generating Cabal data-files field automatically

Maybe I didn't look hard enough, but I'm not aware of a tool to generate the contents of the Cabal data-files field automatically when you have loads of folders to include. Cabal has very simple wildcard matching for file references in this field, by design (to avoid including too much data in a source distribution). So it only supports wildcards to replace the file name inside a directory for a given extension, and doesn't support subdirectories.

For the reload project - first release on Hackage! - I had to include loads of files, all the Polymer web components the UI depends on, which are all in different subdirectories, with a bunch of different extensions. So I wrote a little tool to generate the field automatically, and put it on Hackage too.

You pass it a directory name, possibly some subdirectories and extensions to ignore, and it generates all the required entries. Saved me loads of time, and scratched my own itch!
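The idea itself is simple enough to sketch in a few lines. This is only an illustration of the approach, not the code of the published tool, and it ignores the ignore-lists mentioned above:

import Control.Monad (filterM, forM)
import Data.List (nub, sort)
import System.Directory (doesDirectoryExist, listDirectory)
import System.FilePath ((</>), takeDirectory, takeExtension)

-- Recursively collect every file under a directory.
walk :: FilePath -> IO [FilePath]
walk dir = do
  entries <- map (dir </>) <$> listDirectory dir
  dirs <- filterM doesDirectoryExist entries
  let files = filter (`notElem` dirs) entries
  rest <- concat <$> forM dirs walk
  return (files ++ rest)

-- One "dir/*.ext" entry per (directory, extension) pair, ready to paste
-- into the data-files field.
dataFilesEntries :: FilePath -> IO [String]
dataFilesEntries root = do
  files <- walk root
  let pairs = nub [ (takeDirectory f, takeExtension f)
                  | f <- files, takeExtension f /= "" ]
  return (sort [ dir </> "*" ++ ext | (dir, ext) <- pairs ])

main :: IO ()
main = dataFilesEntries "static" >>= mapM_ putStrLn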

# How to Get a Haskell Job

Over and over again I have seen people ask how to get a full time job programming in Haskell. So I thought I would write a blog post with tips that have worked for me as well as others I know who write Haskell professionally. For the impatient, here's the tl;dr in order from easiest to hardest:

1. IRC
2. Local meetups
3. Regional gatherings/hackathons
4. Open source contributions
5. Work where Haskell people work

First, you need to at least start learning Haskell on your own time. You had already started learning how to program before you got your first programming job. The same is true of Haskell programming. You have to show some initiative. I understand that for people with families this can be hard. But you at least need to start. After that, far and away the most important thing is to interact with other Haskell developers so you can learn from them. That point is so important it bears repeating: interacting with experienced Haskell programmers is by far the most important thing to do. Doing this at a job would be the best, but there are other things you can do.

1. IRC. Join the #haskell channel on Freenode. Lurk for awhile and follow some of the conversations. Try to participate in discussions when topics come up that interest you. Don't be afraid to ask what might seem to be stupid questions. In my experience the people in #haskell are massively patient and willing to help anyone who is genuinely trying to learn.

2. Local meetups. Check meetup.com to see if there is a Haskell meetup in a city near you. I had trouble finding a local meetup when I was first learning Haskell, but there are a lot more of them now. Don't just go to listen to the talks. Talk to people, make friends. See if there's any way you can collaborate with some of the people there.

3. Larger regional Haskell events. Find larger weekend gatherings of Haskell developers and go to them. Here are a few upcoming events that I know of off the top of my head:

The first event like this that I went to was Hac Phi a few years back. Going there majorly upped my game because I got to be around brilliant people, pair program with some of them, and ultimately ended up starting the Snap Web Framework with someone I met there. You might not have a local meetup that you can go to, but you can definitely travel to go to one of these bigger weekend events. I lived a few hours away from Hac Phi, but I know a number of people who travel further to come. If you're really interested in improving your Haskell, it is well worth the time and money. I cannot emphasize this enough.

4. Start contributing to an open source Haskell project. Find a project that interests you and dive in. Don't ask permission, just decide that you're going to learn enough to contribute to this thing no matter what. Join their project-specific IRC channel if they have one and ask questions. Find out how you can contribute. Submit pull requests. This is by far the best way to get feedback on the code that you're writing. I have actually seen multiple people (including some who didn't strike me as unusually talented at first) start Haskell and work their way up to a full-time Haskell job this way. It takes time and dedication, but it works.

5. Try to get a non-Haskell job at a place where lots of Haskell people are known to work. Standard Chartered uses Haskell but is big enough to have non-Haskell jobs that you might be able to fit into. S&P Capital IQ doesn't use Haskell but has a significant number of Haskell people who are coding in Scala.

# CAS-based generic data store

Bioinformatics projects routinely generate terabytes of sequencing data, and the inevitable analysis that follows can easily increase this by an order of magnitude or more. Not everything is worth keeping, but in order to ensure reproducibility and to be able to reuse data in new projects, it is important to store what needs to be kept in a structured way.

I have previously described and implemented a generic data store, called medusa. Following the eXtreme Programming principle of always starting with the simplest implementation that could possibly work, the system was designed around a storage based on files and directories. This has worked reasonably well, and makes data discoverable and accessible both directly in the file system, and through web-based services providing browsing, metadata search (with free text and keyword based indexes), BLAST search, and so forth.

Here, I explore the concept of content addressable storage (CAS), which derives unique names for data objects from their content.

# The CAS principle

The essence of any storage system is being able to store objects with some kind of key (or label, or ID), and being able to retrieve them based on the same key. What distinguishes content addressable storage from other storage systems is that the key is generated from the entire data object, typically using a cryptographic hash function like MD5 or SHA1.

This means that a given object will always be stored under the same key, and that modifications to an object will also change its key, essentially creating a new object.

# A layered model

Using CAS more clearly separates the storage model from the semantics of data sets. This gives us a layered architecture for the complete system, and services are implemented on top of these layers as independent and modular programs.

## The object store

The object store is conceptually simple. It provides a simple interface that consists of the following primitive operations:

put
a data object into the store
list
the keys that refer to data objects
get
a data object using its key

The storage itself is completely oblivious to the actual contents of data objects, and it has no concept of hierarchy or other relationships between objects.
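As a rough sketch of how little machinery this interface needs, here is a toy file-system-backed version (using the SHA package for hashing; the directory layout and names are mine, not medusa's):

import qualified Data.ByteString.Lazy as BL
import Data.Digest.Pure.SHA (sha1, showDigest)
import System.Directory (createDirectoryIfMissing, listDirectory)
import System.FilePath ((</>))

storeDir :: FilePath
storeDir = "objects"

-- put: the key is the SHA1 of the content, so identical objects
-- automatically share a single entry.
put :: BL.ByteString -> IO String
put content = do
  createDirectoryIfMissing True storeDir
  let key = showDigest (sha1 content)
  BL.writeFile (storeDir </> key) content
  return key

-- list: every key currently in the store.
list :: IO [String]
list = listDirectory storeDir

-- get: retrieve an object by its key.
get :: String -> IO BL.ByteString
get key = BL.readFile (storeDir </> key)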

When we organize data, we do of course want to include relationships between objects, and also between data objects and external entities and concepts. This is the province of metadata. Metadata semantics are provided by special metadata objects which live in the object store like any other objects. Each metadata object defines and describes a specific data set. As in former incarnations of the system, metadata is structured as XML documents, and provides information about (and the identity of) the data objects constituting the data set. It also describes the relationship between data sets, for instance allowing new versions to obsolete older ones.

The metadata objects are primarily free-form text objects, allowing users to include whatever information they deem relevant and important. The purpose of using XML is to make specific parts of the information computationally accessible, unambiguous, and standardized. For instance, structured references (i.e. specific XML elements) to data objects with their key allow automatic retrieval of the complete dataset. In addition to referencing objects in the object store, similar structures allow unambiguous references to external entities, for instance species, citation of scientific works, and uniform formatting of dates and geographic locations.

A command line interface to the metadata is provided through the mdz command, which allows a variety of operations on data sets, including listing, importing, exporting, and synchronizing with other repositories. In addition, the system implements a web-based front end to the data, as well as metadata indexing via xapian.

## Data objects and services

As shown in the previous sections, the system can be conceptually divided into three levels: the object store, the metadata level, and the data semantic level. A service typically accesses data on one or more of these levels. For instance, a (hypothetical) service to ensure distributed redundancy may only need to access the object store, oblivious to the contents of the objects. Other services, like the (existing) functionality to import data sets, or transfer data sets between different servers, need to understand the metadata format. And even more specific services may also need to understand the format of data objects - e.g. the BLAST service scans metadata to find FASTA-formatted sequence data, and integrates them into its own database. The important principles that services adhere to are: 1) a service can ignore anything that is irrelevant to it, and 2) it can reconstruct its entire state from the contents of the object store.

# Discussion

Perhaps the primary advantage of using the hash value as the ID for data objects, is that it allows the system to be entirely distributed. The crucial advantage is that keys (by definition) are unique to the data. With user-selected keys, the user must somehow ensure the uniqueness of the key, and this requires a central authority or at the very least an agreed-upon naming scheme. In contrast, names for objects in CAS depend only on the contents, and the system can be implemented with no central oversight.

That keys depend on contents further means that data are immutable - storing a modified data object results in a different key. Immutability is central to reproducibility (you won't get the same results if you run your analysis with different data), and previously this was maintained by keeping a separate registry of metadata checksums, and also including checksums for data objects in the metadata. This made it possible to verify correctness (as long as the registry was available and correct); with CAS, this becomes even easier since the checksum is the same as the name you use to retrieve the data object.

Another benefit is deduplication of data objects. Objects with the same contents will always be stored under the same key, so this is automatic. This also makes it easier to track files across renames (analyses tend to produce output files with generic names like "contigs.fasta", and it is often useful to give these files a more descriptive name); with CAS it becomes trivial to check whether any given file already exists in the storage.

Decoupling the data from a fixed filesystem layout introduces another level of abstraction, and this makes it easier to change the underlying storage model. In recent years, key-value storage models have replaced relational databases in many applications, in particular where high scalability is more important than structured data. Consequently, we have seen a plethora of so-called "NoSQL" databases emerge, including CouchDB, Cassandra, and many others, which could be plugged in as an alternative back-end storage. Storage "in the cloud", like Amazon's S3 or Google's Cloud Storage, is also a good alternative.

The added opacity makes it less likely (but still technically possible) for users with sufficient privileges to perform "illegal" operations on data (for instance, modification or removal).

The implicit assumption for CAS is that different data objects hash to different hash values. In an absolute sense, this is trivially false (since there only exist 2^160 possible hash values, and an infinity of possible data objects). But it is true in a probabilistic sense, and we can calculate the probability of collisions from the birthday paradox. For practical purposes, any collision is extremely unlikely, and like the revision control system git (which also is CAS-based), collisions are checked for by the system, and can be dealt with manually if they should occur.
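To put a rough number on it: with n stored objects and a 160-bit hash, the birthday bound puts the collision probability at roughly n^2/2^161, so even a store holding 10^12 objects has a collision probability on the order of 10^-25.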

Abstracting out the storage layer can be an advantage, but it also makes the system more opaque. And although the ability of humans to select misleading or confusing names can hardly be overestimated, even a poorly chosen name is usually more informative than the hexadecimal key representing a hash value.

## Mixed blessings

Previous versions used a fixed directory structure, where each data set included a metadata file, and an arbitrary set of data files. Using a content addressable object store is more flexible, and there is nothing preventing the implementation of a parallel metadata scheme sharing the same data store, and even referring to the same data objects. One could also create metadata objects that refer to other metadata objects. As always, fewer restrictions also means more opportunities for confusion and increased complexity.

Perhaps the most drastic change is how datasets can have their status changed - e.g. be marked as obsolete or invalid. Previously, metadata was versioned, meaning there could exist a (linear) sequence of metadata for the same dataset. This was enforced by convention only, and also required a central synchronization of metadata updates to avoid name and version collisions. Since the object store only allows the addition of new objects, and in particular, not modification, status updates can only be achieved by adding new objects. Metadata objects can refer to other datasets, and specify a context, for instance, a data set containing analysis results can specify being based on a data set containing input data for the analysis. Status changes are now implemented using this mechanism, and datasets can refer to other data sets as "invalidated" or "obsoleted".

# Current Status and Availability

The system is currently working on my internal systems; it is based on standard components (mostly shell scripts), and although one might expect some rough edges, it should be fairly easy to deploy.

Do let me know if you are interested.

# Measuring Software Fragility


While writing this comment on reddit I came up with an interesting question that I think might be a useful way of thinking about programming languages. What percentage of single non-whitespace characters in your source code could be changed to a different character such that the change would pass your CI build system but would result in a runtime bug? Let's call this the software fragility number because I think that metric gives a potentially useful measure of how bug prone your software is.

At the end of the day software is a mountain of bytes and you're trying to get them into a particular configuration. Whether you're writing a new app from scratch, fixing bugs, or adding new features, the number of bytes of source code you have (similar to LOC, SLOC, or maybe the compressed number of bytes) is a rough indication of the complexity of your project. If we model programmer actions as random byte mutations over all of a project's source and we're trying to predict the project's defect rate, this software fragility number is exactly the thing we need to know.

Now I'm sure many people will be quick to point out that this random mutation model is not accurate. Of course that's true. But I would argue that in this way it's similar to the efficient markets hypothesis in finance. Real world markets are obviously not efficient (Google didn't become $26 billion less valuable because the UK voted for Brexit). But the efficient markets model is still really useful--and good luck finding a better one that everybody will agree on. What this model lacks in real world fidelity, it makes up for in practicality.

We can actually build an automated system to calculate a reasonable approximation of the fragility number. All that has to be done is take a project, randomly mutate a character, run the project's whole CI build, and see if the result fails the build. Repeat this for every non-whitespace character in the project and count how many characters pass the build. Since the character was generated at random, I think it's reasonable to assume that any mutation that passes the build is almost definitely a bug.

Performing this process for every character in a large project would obviously require a lot of CPU time. We could make this more tractable by picking characters at random to mutate. Repeat this until you have done it for a large enough number of characters and then see what percentage of them made it through the build. Alternatively, instead of choosing random characters you could choose whole modules at random to get more uniform coverage over different parts of the language's grammar. There are probably a number of different algorithms that could be tried for picking random subsets of characters to test. Similar to numerical approximation algorithms such as Newton's method, any of these algorithms could track the convergence of the estimate and stop when the value gets to a sufficient level of stability.
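A rough sketch of that sampling procedure might look like the following (the build command, the character classes, and the whole mutate function are placeholders for illustration, not an existing tool):

import Control.Monad (forM)
import System.Exit (ExitCode(..))
import System.Process (system)
import System.Random (randomRIO)

-- Placeholder for whatever runs the project's CI build; True on success.
runBuild :: IO Bool
runBuild = do
  code <- system "./run-ci-build.sh"
  return (code == ExitSuccess)

-- Replace the character at position i with a random one from the same
-- rough character class (a very crude approximation of the idea above).
mutate :: String -> Int -> IO String
mutate src i = do
  c <- sameClass (src !! i)
  return (take i src ++ [c] ++ drop (i + 1) src)
  where
    sameClass ch
      | ch `elem` ['a'..'z'] = pick ['a'..'z']
      | ch `elem` ['0'..'9'] = pick ['0'..'9']
      | otherwise            = pick "+-*/<>=!"
    pick cs = (cs !!) <$> randomRIO (0, length cs - 1)

-- Estimate the fragility number of a single source file by sampling.
-- (A real version would skip whitespace and comment characters.)
estimateFragility :: FilePath -> Int -> IO Double
estimateFragility path samples = do
  original <- readFile path
  let n = length original
  passes <- forM [1 .. samples] $ \_ -> do
    i <- randomRIO (0, n - 1)
    mutated <- mutate original i
    writeFile path mutated
    ok <- runBuild
    writeFile path original          -- restore the original source
    return (if ok then 1 else 0 :: Int)
  return (fromIntegral (sum passes) / fromIntegral samples)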
Now let's investigate actual fragility numbers for some simple bits of example code to see how this notion behaves. First let's look at some JavaScript examples. It's worth noting that comment characters should not be allowed to be chosen for mutation since they obviously don't affect the correctness of the program. So the comments you see here have not been included in the calculations. Fragile characters are highlighted in orange.

// Fragility 12 / 48 = 0.25
function f(n) {
  if ( n < 2 ) return 1;
  else return n * f(n-1);
}

// Fragility 14 / 56 = 0.25
function g(n) {
  var p = 1;
  for (var i = 2; i <= n; i++ ) {
    p *= i;
  }
  return p;
}

First I should say that I didn't write an actual program to calculate these. I just eyeballed it and thought about what things would fail. I easily could have made mistakes here. In some cases it may even be subjective, so I'm open to corrections or different views.

Since JavaScript is not statically typed, every character of every identifier is fragile--mutating them will not cause a build error because there isn't much of a build. JavaScript won't complain, you'll just start getting undefined values. If you've done a significant amount of JavaScript development, you've almost definitely encountered bugs from mistyped identifier names like this. I think it's mildly interesting that the recursive and iterative formulations of this function both have the same fragility. I expected them to be different. But maybe that's just luck. Numerical constants as well as comparison and arithmetic operators will also cause runtime bugs.

These, however, are more debatable because if you use the random procedure I outlined above, you'll probably get a build failure because the character would have probably changed to something syntactically incorrect. In my experience, it seems like when you mistype an alpha character, it's likely that the wrong character will also be an alpha character. The same seems to be true for the classes of numeric characters as well as symbols. The method I'm proposing is that the random mutation should preserve the character class. Alpha characters should remain alpha, numeric should remain numeric, and symbols should remain symbols. In fact, my original intuition goes even further than that by only replacing comparison operators with other comparison operators--you want to maximize the chance that the new mutated character will cause a successful build so the metric will give you a worst-case estimate of fragility. There's certainly room for research into what patterns tend to come up in the real world and other algorithms that might describe that better.

Now let's go to the other end of the programming language spectrum and see what the fragility number might look like for Haskell.

-- Fragility 7 / 38 = 0.18
f :: Int -> Int
f n | n < 2     = 1
    | otherwise = n * f (n-1)

Haskell's much more substantial compile time checks mean that mutations to identifier names can't cause bugs in this example. The fragile characters here are clearly essential parts of the algorithm we're implementing. Maybe we could relate this idea to information theory and think of it as an idea of how much information is contained in the algorithm.

One interesting thing to note here is the effect of the length of identifier names on the fragility number. In JavaScript, long identifier names will increase the fragility because all identifier characters can be mutated and will cause a bug. But in Haskell, since identifier characters are not fragile, longer names will lower the fragility score. Choosing to use single character identifier names everywhere makes these Haskell fragility numbers the worst case and makes the JavaScript fragility numbers the best case.

Another point is that since I've used single letter identifier names it is possible for a random identifier mutation in Haskell to not cause a build failure but still cause a bug. Take for instance a function that takes two Int parameters x and y. If y was mutated to x, the program would still compile, but it would cause a bug. My set of highlighted fragile characters above does not take this into account because it's trivially avoidable by using longer identifier names. Maybe this is an argument against one letter identifier names, something that Haskell gets criticism for.

Here's the snippet of Haskell code I was talking about in the above reddit comment that got me thinking about all this in the first place:

-- Fragility 31 / 277 = 0.11
data MetadataInfo = MetadataInfo
    { title       :: Text
    , description :: Text
    }

pageMetadataWidget :: MonadWidget t m => Dynamic t MetadataInfo -> m ()
pageMetadataWidget i = do
    el "title" $ dynText $ title <$> i
elDynAttr "meta" (mkDescAttrs . description <$> i) blank where mkDescAttrs desc = "name" =: "description" "content" =: desc In this snippet, the fragility number is probably close to 31 characters--the number of characters in string literals. This is out of a total of 277 non-whitespace characters, so the software fragility number for this bit of code is 11%. This half the fragility of the JS code we saw above! And as I've pointed out, larger real world JS examples are likely to have even higher fragility. I'm not sure how much we can conclude about the actual ratios of these fragility numbers, but at the very least it matches my experience that JS programs are significantly more buggy than Haskell programs. The TDD people are probably thinking that my JS examples aren't very realistic because none of them have tests, and that tests would catch most of the identifier name mutations, bringing the fragility down closer to Haskell territory. It is true that tests will probably catch some of these things. But you have to write code to make that happen! It doesn't happen by default. Also, you need to take into account the fact that the tests themselves will have some fragility. Tests require time and effort to maintain. This is an area where this notion of the fragility number becomes less accurate. I suspect that since the metric only considers single character mutations it will underestimate the fragility of tests since mutating single characters in tests will automatically cause a build failure. There seems to be a slightly paradoxical relationship between the fragility number and DRY. Imagine our above JS factorial functions had a test that completely reimplemented factorial and then tried a bunch of random values Quickcheck-style. This would yield a fragility number of zero! Any single character change in the code would cause a test failure. And any single character change in the tests would also cause a test failure. Single character changes can no longer classified fragile because we've violated DRY. You might say that the test suite shouldn't reimplement algorithm--you should just specific cases like f(5) == 120. But in an information theory sense this is still violating DRY. Does this mean that the fragility number is not very useful? Maybe. I don't know. But I don't think it means that we should just throw away the idea. Maybe we should just keep in mind that this particular formulation doesn't have much to tell us about the fragility more complex coordinated multi-character changes. I could see the usefulness of this metric going either way. It could simplify down to something not very profound. Or it could be that measurements of the fragility of real world software projects end up revealing some interesting insights that are not immediately obvious even from my analysis here. Whatever the usefulness of this fragility metric, I think the concept gets is thinking about software defects in a different way than we might be used to. If it turns out that my single character mutation model isn't very useful, perhaps the extension to multi-character changes could be useful. Hopefully this will inspire more people to think about these issues and play with the ideas in a way that will help us progress towards more reliable software and tools to build it with. EDIT: Unsurprisingly, I'm not the first person to have come up with this idea. It looks like it's commonly known as mutation testing. 
That Wikipedia article makes it sound like mutation testing is commonly thought of as a way to assess your project's test suite. I'm particularly interested in what it might tell us about programming languages...i.e. how much "testing" we get out of the box because of our choice of programming language and implementation.

### Philip Wadler

# Michael Moore: Five Reasons Why Trump Will Win

In the Huffington Post, Michael Moore gives the most incisive (and hilarious) analysis I've seen. We have to understand why this is happening if we are to have a hope of preventing it.

1. Midwest Math, or Welcome to Our Rust Belt Brexit. I believe Trump is going to focus much of his attention on the four blue states in the rustbelt of the upper Great Lakes - Michigan, Ohio, Pennsylvania and Wisconsin. Four traditionally Democratic states - but each of them have elected a Republican governor since 2010 (only Pennsylvania has now finally elected a Democrat). In the Michigan primary in March, more Michiganders came out to vote for the Republicans (1.32 million) than the Democrats (1.19 million). Trump is ahead of Hillary in the latest polls in Pennsylvania and tied with her in Ohio. Tied? How can the race be this close after everything Trump has said and done? Well maybe it’s because he’s said (correctly) that the Clintons’ support of NAFTA helped to destroy the industrial states of the Upper Midwest. ...

And this is where the math comes in. In 2012, Mitt Romney lost by 64 electoral votes. Add up the electoral votes cast by Michigan, Ohio, Pennsylvania and Wisconsin. It’s 64. All Trump needs to do to win is to carry, as he’s expected to do, the swath of traditional red states from Idaho to Georgia (states that’ll never vote for Hillary Clinton), and then he just needs these four rust belt states. He doesn’t need Florida. He doesn’t need Colorado or Virginia. Just Michigan, Ohio, Pennsylvania and Wisconsin. And that will put him over the top. This is how it will happen in November.

4. The Depressed Sanders Vote. Stop fretting about Bernie’s supporters not voting for Clinton - we’re voting for Clinton! The polls already show that more Sanders voters will vote for Hillary this year than the number of Hillary primary voters in ‘08 who then voted for Obama. This is not the problem. The fire alarm that should be going off is that while the average Bernie backer will drag him/herself to the polls that day to somewhat reluctantly vote for Hillary, it will be what’s called a “depressed vote” - meaning the voter doesn’t bring five people to vote with her. He doesn’t volunteer 10 hours in the month leading up to the election. She never talks in an excited voice when asked why she’s voting for Hillary. A depressed voter. Because, when you’re young, you have zero tolerance for phonies and BS. Returning to the Clinton/Bush era for them is like suddenly having to pay for music, or using MySpace or carrying around one of those big-ass portable phones. They’re not going to vote for Trump; some will vote third party, but many will just stay home. Hillary Clinton is going to have to do something to give them a reason to support her — and picking a moderate, bland-o, middle of the road old white guy as her running mate is not the kind of edgy move that tells millennials that their vote is important to Hillary. Having two women on the ticket - that was an exciting idea. But then Hillary got scared and has decided to play it safe. This is just one example of how she is killing the youth vote.
### Don Stewart (dons)

# Four roles in Strats at Standard Chartered (London and Singapore)

The Strats team at Standard Chartered has another four new open positions for typed functional programming developers, based in London or Singapore. Strats are a specialized software engineering and quantitative analysis team who build a broad range of software for financial markets users at Standard Chartered.

You will work on the trading floor, directly with traders, sales and risk managers, building software to automate their work and improve their efficiency. The new roles are to build low latency XVA pricing services and for trade hedge identification and management. Other roles and projects are also possible. In general you will use Haskell for almost all tasks: data analysis, market data publishing, database access, web services, desktop GUIs, large parallel tasks, quantitative models, solvers, everything. This is a fast paced role – code you write today will be deployed within hours to hundreds of users and has to work.

These are permanent, associate director and director positions, in London and Singapore as part of the Strats global team. Demonstrated experience in typed FP (Haskell, OCaml, F# etc) is required. We have around 3 million lines of Haskell, and our own Haskell compiler. In this context we look for skill and taste in typed functional programming to capture and abstract over complex, messy systems. You would join a growing team of around 20 experienced Haskell developers that is expanding due to increased business need for Haskell developers.

Experience writing typed APIs to external systems such as databases, web services, pub/sub platforms is very desirable. We like working code, so if you have Hackage or github libraries, we definitely want to see them. We also like StackOverflow answers, blog posts, academic papers, or other arenas where you can show broad FP ability. A PhD or Masters Degree in Computer Science is an advantage (but not a requirement). A bachelor’s degree in computer science, math or a related discipline is a strong advantage.

The role requires physical presence on the trading floor in Singapore or London. Remote work is not an option. You will have some project and client management skills — you will talk to users, understand their problems and then implement and deliver what they really need. No financial background is required.

These positions have attractive remuneration for the right candidates. Relocation support will also be provided. Contracting-based positions are also possible if desired. Applicants who don’t necessarily meet all criteria but have an interest in working in Singapore in particular, and have an FP background, are encouraged to apply.

More info about our development process is in the 2012 PADL keynote, and a 2013 HaskellCast interview. If this sounds exciting to you, please send your resume to me – donald.stewart <at> sc.com

Tagged: jobs

### mightybyte

# "cabal gen-bounds": easy generation of dependency version bounds

In my last post I showed how release dates are not a good way of inferring version bounds. The package repository should not make assumptions about what versions you have tested against. You need to tell it. But from what I've seen there are two problems with specifying version bounds:

1. Lack of knowledge about how to specify proper bounds
2. Unwillingness to take the time to do so
Early in my Haskell days, the first time I wrote a cabal file I distinctly remember getting to the dependencies section and having no idea what to put for the version bounds. So I just ignored them and moved on. The result of that decision is that I can no longer build that app today. I would really like to, but it's just not worth the effort to try. It wasn't until much later that I learned about the PVP and how to properly set bounds. But even then, there was still an obstacle. It can take some time to add appropriate version bounds to all of a package's dependencies. So even if you know the correct scheme to use, you might not want to take the time to do it.

Both of these problems are surmountable. And in the spirit of doing that, I would like to propose a "cabal gen-bounds" command. It would check all dependencies to see which ones are missing upper bounds and output correct bounds for them. I have implemented this feature and it is available at https://github.com/mightybyte/cabal/tree/gen-bounds. Here is what it looks like to use this command on the cabal-install package:

$ cabal gen-bounds
Resolving dependencies...

The following packages need bounds and here is a suggested starting point.
You can copy and paste this into the build-depends section in your .cabal
file and it should work (with the appropriate removal of commas).

Note that version bounds are a statement that you've successfully built and
tested your package and expect it to work with any of the specified package
versions (PROVIDED that those packages continue to conform with the PVP).
Therefore, the version bounds generated here are the most conservative
based on the versions that you are currently building with.  If you know
your package will work with versions outside the ranges generated here,
feel free to widen them.

network      >= 2.6.2 && < 2.7,
network-uri  >= 2.6.0 && < 2.7,

The user can then paste these lines into the build-depends section of their .cabal file. They are formatted in a way that facilitates easy editing as the user finds more versions (either newer or older) that the package builds with. This serves to both educate users and automate the process. I think this removes one of the main frustrations people have about upper bounds and is a step in the right direction of getting more hackage packages to supply them. Hopefully it will be merged upstream and be available in cabal-install in the future.
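To see where the suggested ranges come from, here is a rough Haskell sketch of a PVP-style bound computed from the exact version you built against. This is illustrative only, not the actual cabal-install implementation, and pvpBounds is a made-up name:

import Data.Version (Version, makeVersion, showVersion, versionBranch)

-- Lower bound: the version we actually built against.
-- Upper bound: bump the second version component, per the PVP.
pvpBounds :: String -> Version -> String
pvpBounds name v = name ++ " >= " ++ showVersion v ++ " && < " ++ showVersion upper
  where
    (major:minor:_) = versionBranch v ++ repeat 0
    upper           = makeVersion [major, minor + 1]

main :: IO ()
main = putStrLn (pvpBounds "network" (makeVersion [2,6,2]))
-- prints: network >= 2.6.2 && < 2.7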

# Cubical Higher Type Theory as a Programming Language

I gave a presentation at the Workshop on Categorical Logic and Univalent Foundations held in Leeds, UK July 27-29th 2016.  My talk, entitled Computational Higher Type Theory, concerns a formulation of higher-dimensional type theory in which terms are interpreted directly as programs and types as programs that express specifications of program behavior.  This approach to type theory, first suggested by Per Martin-Löf in his famous paper Constructive Mathematics and Computer Programming and developed more fully by The NuPRL Project, emphasizes constructive mathematics in the Brouwerian sense: proofs are programs, propositions are types.

The now more popular accounts of type theory emphasize the axiomatic freedom given by making fewer foundational commitments, such as not asserting the decidability of every type, but give only  an indirect account of their computational content, and then only in some cases.  In particular, the computational content of Voevodsky’s Univalence Axiom in Homotopy Type Theory remains unclear, though the Bezem-Coquand-Huber model in cubical sets carried out in constructive set theory gives justification for its constructivity.

To elicit the computational meaning of higher type theory more clearly, emphasis has shifted to cubical type theory (in at least two distinct forms) in which the higher-dimensional structure of types is judgmentally explicit as the higher cells of a type, which are interpreted as identifications.  In the above-linked talk I explain how to construe a cubical higher type theory directly as a programming language.  Other efforts, notably by Cohen-Coquand-Huber-Mörtberg, have similar goals, but using somewhat different methods.

For more information, please see my home page on which are linked two arXiv papers providing the mathematical details, and a 12-page paper summarizing the approach and the major results obtained so far.  These papers represent joint work with Carlo Angiuli and Todd Wilson.

Filed under: Research

# Static versus dynamic

In the transition from Objective-C to Swift, the iOS and Mac developer community struggles with a question: What is better? Static or dynamic languages?

To answer this question, we have to ask a few more. What is a static language? What is a dynamic language? More fundamentally, what is a programming language?

Every programming language is what computer scientists call a formal language. It is a rigorously defined construct including (1) a formal grammar (its syntactic formation rules) and (2) a formal semantics (the meaning of syntactically well-formed programs). This is always the case, irrespective of whether the creators of the programming language designed the language with those components in mind. As soon as they implement the language (by writing an interpreter or compiler), they indirectly commit to a formal grammar and a formal semantics. In other words, by implementing the language, the language creators fix what happens if you run and maybe compile programs in that new language. They fix that with the utmost precision, as otherwise no computer would be able to run these programs — hence, we call them formal.

For our discussion, the most interesting component is the language semantics, which we can, again, split into two components: the static semantics and the dynamic semantics. The static semantics covers those aspects of a program that we can reason about without executing the program. A prime example of the static semantics is scoping: if I use a variable x, which declaration does that x refer to? Another aspect of the static semantics is the (static) type system.

In contrast, the dynamic semantics characterises the execution of programs. It determines what the computer does when it processes a particular language construct.

Every language has both a static and a dynamic semantics. Without a static semantics, we wouldn’t know which declarations the use of variables, functions, and so forth refer to and, without a dynamic semantics, we cannot run a program. Nevertheless, the static and dynamic semantics of different languages can surely be of varying levels of expressiveness. For example, the expressiveness of the static semantics varies with the capabilities of the type system, and the expressiveness of the dynamic semantics depends on whether the language includes advanced runtime capabilities, such as exceptions, first-class functions, reflection, or dynamic method dispatch.
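As a tiny hand-made illustration of the split (not taken from any language specification), in Haskell the first, commented-out definition below is rejected by the static semantics (scope and type checking), while the second is statically fine and fails only according to the dynamic semantics, at run time:

-- bad :: Int
-- bad = x + length "one"     -- rejected statically: x is not in scope

boom :: Int
boom = head ([] :: [Int])     -- accepted statically; raises an exception
                              -- only when the program is actually run

main :: IO ()
main = print boom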

When people talk about “dynamic” versus “static” languages, they suggest that the designers of those languages put a particular emphasis on either the dynamic or the static semantics of the language. For example, Smalltalk and Lisp, as representatives of dynamic languages, lack a static type system, but come with strong support for reflection and meta-programming. In contrast, Java, a popular static language, lacks strong support for meta-programming, but employs a strong static type system.

Unfortunately, this characterisation is increasingly inaccurate. Modern languages with a strong static semantics also support features requiring an advanced dynamic semantics. For example, meta-programming is a common theme in C++, ML dialects, and Haskell. Moreover, work on flow types, contracts, and gradual typing can be regarded as enhancing the static semantics of Lisp dialects, JavaScript, and others.

All in all, it is not an either-or proposition. Just because a language has a strong static semantics does not mean that it cannot also have a strong dynamic semantics. If languages often start out falling into one camp, this is because language design requires much hard work and you get something working more quickly by limiting the scope of your aspiration. Moreover, it took programming language researchers a while to understand the static properties of advanced runtime behaviour. In any case, it is about time to retire this outdated bifurcation of programming languages.

So, what is better? A static or a dynamic language? It is certainly best —but also a lot of work— to have a language with both a strong static and a strong dynamic semantics. Looking at the development of Swift, it surely shapes up to be strong in both areas.

Update: Added dynamic method dispatch to the list of examples of features of the dynamic semantics as a response to this discussion: https://twitter.com/irrequietus/status/760812875128664064

# Intero for Emacs: Changes June–July

Intero was made public in the start of June. Here's a rundown of the changes made since then:

• Now when the backend fails to start, it stops retrying while you work, until you kill the buffer.
• When the backend is starting and fails due to missing dependencies, it automatically re-runs without passing --no-build to stack, which builds all the dependencies and then starts the backend. This leads to a nice workflow of adding a package to the .cabal file and hitting M-x intero-restart.
• Auto-completion of imports and pragmas.
• Company-mode integration is asynchronous now, so it doesn't lock up the editor.
• Removed hlint from next-checkers as it was bothering people. It's easy to re-enable with standard flycheck settings.
• Now you can switch targets (e.g. M-x intero-targets) using the multi-switch view, like this. Saves you having to remember your targets and the syntax for specifying them.
• You can now launch the REPL with C-u prefix so that it pops up an options list on how to start the REPL.
• Fixed a bug in the warnings parser.
• Added intero-toggle-debug (#79, #151), good for debugging issues with Intero.
• Finally made a reliable way to save the current buffer for flycheck. This no longer interacts badly with magit or external changes to your files.
• Added C-c C-z to switch to and from the REPL.
• Added a suggestions system. When you hit C-c C-r, you get a list of suggestions that you can check and then apply with C-c C-c:

• Automatically add extensions when GHC suggests them. Example:

Can't make a derived instance of ‘Functor X’:
You need DeriveFunctor to derive an instance for this class
Try GeneralizedNewtypeDeriving for GHC's newtype-deriving extension
In the newtype declaration for ‘X’
• Automatically remove redundant imports. Example:

The import of ‘Control.Monad’ is redundant
except perhaps to import instances from ‘Control.Monad’
To import instances alone, use: import Control.Monad()... (intero)
• Fix typos. Example:

Not in scope: ‘putStrn’
Perhaps you meant one of these:
‘putStr’ (imported from Prelude),
‘putStrLn’ (imported from Prelude)
• Adding top-level type signatures. Example:

Top-level binding with no type signature: main :: IO ()
• Removing redundant class constraints. Example:

Redundant constraints: (Arith var, Bitwise var)
• And turning off warnings for name shadowing and type defaulting. (Checkbox is not checked by default.)
• And other miscellaneous bug fixes.

# Announce: public Jenkins CI server

We have set up a new public Jenkins CI server for use with our open source projects. This server currently runs the Stack integration tests, and deploys to ci.haskell-lang.org and ci.stackage.org every time a commit is pushed to the master branch of their respective repositories.

In the future, we also intend to set up Jenkins to run the Stack integration tests on all supported platforms (rather than only Linux) using additional Jenkins workers, and get it to run them for pull requests as well.

While we use Travis CI, there are a couple of ways it does not meet our needs that Jenkins helps us with:

• For long builds, we hit the 50 minute job timeout. While a standard series of tests should not exceed this amount of time, we also want to run more exhaustive integration tests which sometimes take longer. We can let Travis run the standard tests on PRs, and then periodically run the more extensive tests on Jenkins.

• Some projects need to build Docker images. While Travis does support this, it means enabling the "standard" (non container-based) environment for jobs, which in turn does not support caching builds for public projects. For Haskell projects in particular, working without a cache means very long build times.

• Projects also need to push Docker images to a registry and deploy them to a Kubernetes cluster. This requires exposing credentials to builds, which is impossible to secure when building code that uses TemplateHaskell, which allows running arbitrary code during the build.

For these projects, we continue to use Travis for quick feedback on PRs, but let Jenkins take care of the integration tests and deployments where we need more control over resource limitations and isolation of different build phases.

We run all the builds and tests on ephemeral, isolated Jenkins workers using the Docker plugin. These workers do not have access to any credentials, so there is no risk of credentials "leaking" into build logs or otherwise being accessed inappropriately.

For projects that need auto-deployment, the isolated build job stages the assets to be deployed, and then a separate deploy job is triggered if the build is successful. The deploy job runs on the Jenkins master which has access to required credentials, but it does not check out the project's source code from Github or run anything developer-provided. It only copies the built artifacts from the upstream job, builds and pushes a Docker image, and then updates a Kubernetes Deployment with the new image. Our public Jenkins server does not ever see any credentials for proprietary repos or mission-critical infrastructure, so even if security is breached it will have no effect beyond the CI system itself.

For production deployments of open source applications, we have a separate private Jenkins server that builds from the prod branch of the Git repositories, and deploys to a separate cluster. We ensure that the prod branch is protected so that only project administrators can trigger a production deployment.

We avoid using too many Jenkins-specific features. Essentially, we use Jenkins to perform triggering, notification and provide the build environment, but don't use Jenkins plugins to build Docker images or perform deployments. The Jenkins Docker plugin could commit an image after building and testing the code, but then we would have large images containing the whole build environment rather than minimal images containing only the application to deploy. We prefer instead to leave it to our own custom tooling that we can tailor to our needs and which can be run in many different environments so that we are not locked into Jenkins. A developer with access to the right credentials could, if necessary, perform the process easily from their own workstation by running the same build and deploy scripts as the Jenkins jobs run.

The Jenkins servers themselves run on EC2, with all cloud infrastructure managed using Hashicorp's Terraform, and the instances managed using Red Hat's Ansible. The Kubernetes cluster is set up using CoreOS's kube-aws tool.

# Decomposing a function into its even and odd parts

As I have mentioned before, I am not a sudden-flash-of-insight person. Every once in a while it happens, but usually my thinking style is to minutely examine a large mass of examples and then gradually synthesize some conclusion about them. I am a penetrating but slow thinker. But there have been a few occasions in my life when the solution to a problem struck me suddenly out of the blue.

One such occasion was on the first day of my sophomore honors physics class in 1987. This was one of the best classes I took in my college career. It was given by Professor Stephen Nettel, and it was about resonance phenomena. I love when a course has a single overarching theme and proceeds to examine it in detail; that is all too rare. I deeply regret leaving my copy of the course notes in a restaurant in 1995.

The course was very difficult, but also very satisfying. It was also somewhat hair-raising, because of Professor Nettel's habit of saying, all through the second half, “Don't worry if it doesn't seem to make any sense, it will all come together for you during the final exam.” This was not reassuring. But he was right! It did all come together during the final exam.

The exam had two sets of problems. The problems on the left side of the exam paper concerned some mechanical system, I think a rod fixed at one end and free at the other, or something like that. This set of problems asked us to calculate the resonant frequency of the rod, its rate of damping at various driving frequencies, and related matters. The right-hand problems were about an electrical system involving a resistor, capacitor, and inductor. The questions were the same, and the answers were formally identical, differing only in the details: on the left, the answers involved length, mass and stiffness of the rod, and on the right, the resistance, capacitance, and inductance of the electrical components. It was a brilliant exam, and I have never learned so much about a subject during the final exam.

Anyway, I digress. After the first class, we were assigned homework. One of the problems was

Show that every function is the sum of an even function and an odd function.

(Maybe I should explain that an even function is one which is symmetric across the \(y\)-axis; formally it is a function \(f\) for which \(f(-x) = f(x)\) for every \(x\). For example, the function shown below left. An odd function is one which is symmetric under a half-turn about the origin; formally it satisfies \(f(-x) = -f(x)\) for all \(x\). For example, the function shown below right.)

I found this claim very surprising, and we had no idea how to solve it. Well, not quite no idea: I knew that functions could be expanded in Fourier series, as the sum of a sine series and a cosine series, and the sine part was odd while the cosine part was even. But this seemed like a bigger hammer than was required, particularly since new sophomores were not expected to know about Fourier series.

I had the privilege to be in that class with Ron Buckmire, and I remember we stood outside the class building in the autumn sunshine and discussed the problem. I might have been thinking that perhaps there was some way to replace the negative part of \(f\) with a reflected copy of the positive part to make an even function, and maybe that was always even, when I was hit from the blue with the solution:

\begin{align} f_e(x) & = \frac{f(x) + f(-x)}2 \text{ is even},\\ f_o(x) & = \frac{f(x) - f(-x)}2 \text{ is odd, and}\\ f(x) &= f_e(x) + f_o(x) \end{align}

So that was that problem solved. I don't remember the other three problems in that day's homework, but I have remembered that one ever since.
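(Here is a quick numerical check of the formulas, as a small Haskell sketch; the names evenPart and oddPart are mine, not anything standard.)

evenPart, oddPart :: (Double -> Double) -> (Double -> Double)
evenPart f x = (f x + f (-x)) / 2
oddPart  f x = (f x - f (-x)) / 2

-- For any f, evenPart f is even, oddPart f is odd, and they sum back to f:
main :: IO ()
main = print [ f x - (evenPart f x + oddPart f x) | x <- [-2, -1, 0, 1, 2] ]
  where f x = 2 ** x      -- a function that is neither even nor odd
-- prints a list of (approximately) zeros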

But for some reason, it didn't occur to me until today to think about what those functions actually looked like. Of course, if \(f\) itself is even, then \(f_e = f\) and \(f_o = 0\), and similarly if \(f\) is odd. But most functions are neither even nor odd.

For example, consider the function \(f(x) = 2^x\), which is neither even nor odd. Then we get

\begin{align} f_e(x) & = \frac{2^x + 2^{-x}}2\\ f_o(x) & = \frac{2^x - 2^{-x}}2 \end{align}

The graph is below left. The solid red line is \(f\), and the blue and purple dotted lines are \(f_e\) and \(f_o\). The red line is the sum of the blue and purple lines. I thought this was very interesting-looking, but a little later I realized that I had already known what these graphs would look like, because \(2^x\) is just like \(e^x\), and for \(e^x\) the even and odd components are exactly the familiar \(\cosh\) and \(\sinh\) functions. (Below left, \(2^x\); below right, \(e^x\).)

I wasn't expecting polynomials to be more interesting, but they were. (Polynomials whose terms are all odd powers of \(x\) are always odd functions, and similarly polynomials whose terms are all even powers of \(x\) are even functions.) For example, consider a polynomial with a mix of even and odd powers, which is neither even nor odd. We don't even need the \(f_e\) and \(f_o\) formulas to separate it into even and odd parts: just expand it and split the terms into odd and even powers:

Or we could do the same with another such polynomial, expanding it and separating the result into its even-power and odd-power parts:

I love looking at these and seeing how the even blue line and the odd purple line conspire together to make whatever red line I want.

I kept wanting to try familiar simple functions, but many of these are either even or odd, and so are uninteresting for this application. But you can make an even or an odd function into a neither-even-nor-odd function just by translating it horizontally, which you do by replacing \(x\) with \(x+c\). So the next function I tried was a horizontal translation of one of those familiar functions. Here I got a surprise. I knew that the translated function was undefined at one point, so I graphed it only to one side of that point. But the even component, \(\frac{f(x) + f(-x)}2\), is undefined at two points, and similarly the odd component is undefined at two points. So the formula does not work quite correctly, failing to produce the correct value at a point where \(f\) itself is perfectly well defined. In general, if \(f\) is undefined at some \(c\), then the decomposition into even and odd components fails at \(-c\) as well. The limit $$\lim_{x\to -c} f(x) = \lim_{x\to -c} \left(f_o(x) + f_e(x)\right)$$ does hold, however. The graph below shows the decomposition of the translated function.

Vertical translations are uninteresting: they leave \(f_o\) unchanged and translate \(f_e\) by the same amount, as you can verify algebraically or just by thinking about it.

Following the same strategy I tried a cosine wave. The evenness of the cosine function is one of its principal properties, so I translated it horizontally and used a shifted cosine curve. The graph below actually uses a slightly different version of the function, to prevent the details from being too compressed:

This reminded me of the time I was fourteen and graphed \(\sin x + \cos x\) and was surprised to see that it was another perfect sinusoid. But I realized that there was a simple way to understand this. I already knew the angle-addition formula \(\sin(a+b) = \sin a\cos b + \cos a\sin b\). If you take \(\sin\left(x+\frac\pi4\right)\) and multiply the whole thing by \(\sqrt2\), you get $$\sqrt2\sin\left(x + \frac\pi4\right) = \sqrt2\sin x\cos\frac\pi4 + \sqrt2\cos x\sin\frac\pi4 = \sin x + \cos x$$ so that \(\sin x + \cos x\) is just a shifted, scaled cosine curve. The decomposition of the shifted cosine is even simpler because you can work forward instead of backward and expand it with the same addition formula into a sine term plus a cosine term; the sine term is odd while the cosine term is even, so it decomposes as a sum of an even and an odd sinusoid as you see in the graph above.

Finally, I tried a Poisson distribution, which is highly asymmetric. The formula for the Poisson distribution is \(\frac{e^{-\lambda}\lambda^x}{x!}\), for some constant \(\lambda\). The \(x!\) in the denominator is only defined for non-negative integer \(x\), but you can extend it to fractional and negative \(x\) in the usual way by using \(\Gamma(x+1)\) instead, where \(\Gamma\) is the Gamma function. The \(\Gamma\) function is undefined at zero and negative integers, but fortunately what we need here is the reciprocal gamma function \(\frac1{\Gamma(x+1)}\), which is perfectly well-behaved. The results are spectacular. The graph below uses a fixed value of \(\lambda\).

The part of this with \(x \ge 0\) is the most interesting to me, because the Poisson distribution has a very distinctive shape, and once again I like seeing the blue and purple functions working together to make it. I think it's just great how the red line goes gently to zero as \(x\) increases, even though the even and the odd components are going wild. (\(\Gamma(x+1)\) increases rapidly with \(x\), so the reciprocal function goes rapidly to zero. But the even and odd components also have an \(f(-x)\) part, and this is what dominates the blue and purple lines when \(x\) is large.)

On the \(x < 0\) side it has no meaning for me, and it's just wiggly lines. It hadn't occurred to me before that you could extend the Poisson distribution function to negative \(x\), and I still can't imagine what it could mean, but I suppose why not. Probably some statistician could explain to me what the Poisson distribution is about when \(x < 0\).

You can also consider the function \(\sqrt x\), which breaks down completely, because either \(f(x)\) or \(f(-x)\) is undefined except when \(x = 0\). So the claim that every function is the sum of an even and an odd function fails here too. Except perhaps not! You could probably consider the extension of the square root function to the complex plane, and take one of its branches, and I suppose it works out just fine. The geometric interpretations of evenness and oddness are very different, of course, and you can't really draw the graphs unless you have four-dimensional vision.

I have no particular point to make, except maybe that math is fun, even elementary math (or perhaps especially elementary math) and it's fun to see how it works out.

The beautiful graphs in this article were made with Desmos. I had dreaded having to illustrate my article with graphs from Gnuplot (ugh) or Wolfram|α (double ugh) and was thrilled to find such a handsome alternative.

[ Addendum: I've just discovered that in Desmos you can include a parameter in the functions that it graphs, and attach the parameter to a slider. So for example you can arrange to have it display a horizontally translated function such as \(f(x+k)\), with the value of \(k\) controlled by the slider, and have the graph move left and right on the plane as you adjust the slider, with its even and odd parts changing in real time to match. ]

[ For example, check out travelling Gaussians or varying sinusoid. ]

# Elm Developer at Takt (Full-time)

Takt is seeking a front-end developer excited about Elm to help develop our flagship product. We just closed a $30 million Series A, and we're already reaching more than 10 million users at Starbucks, making us one of the largest ventures built on Haskell + Elm. Our platform processes giant event streams of all kinds, identifying patterns, trends and opportunities to intervene and improve processes, aided by machine learning. Our vision will change the way people engage across multiple industries, be it retail, finance, or healthcare. As a Takt engineer, you'll work in small, self-sufficient teams with the shared goal of delivering excellent software anchored in an agile culture of quality, delivery, and innovation. You understand that legacy code is the work you did yesterday. You also share our passion for functional programming and using data to solve complex problems.

KEY RESPONSIBILITIES

• Use functional programming languages (Elm!) to build applications and front-ends
• Work on complex design challenges, understanding customer needs and crafting simple, beautiful solutions
• Expose complex application functionality in straightforward and elegant ways
• Develop functional and beautiful visualizations of complex data
• Deliver working software in short sprints
• Help grow our engineering team

SKILLS + EXPERIENCE

• Strong, demonstrated experience developing software using functional Javascript (Elm, PureScript, Clojure)
• Significant experience with dynamic, interactive data visualization (e.g. D3)
• Demonstrated experience building sophisticated and complex applications, such as workflow management tools
• Proven experience in unit testing front-end applications

BONUS POINTS

• Personal projects or production experience with Elm
• You welcome the responsibility and thrill that comes with being a member of a founding team
• You're motivated, dependable, and continuously focused on excellence

ABOUT TAKT

Takt distills complex data into precise actions; we orchestrate physical and digital exchanges into one seamless journey. Our business is building lasting, trusted relationships between people and brands—and making it look easy. We're already reaching millions of people a day, and we're just getting started. Our founding leadership is equal parts business, design, and engineering—because we believe differing perspectives + passionate discourse achieve the greatest outcomes. We are collectively talented, but also humble. We give our whole selves. We love learning new things. We are an equal-opportunity employer, and strive to make hiring decisions that reflect that. If you're up for the challenge of a lifetime, we're looking for outstanding talent to join our team. Get information on how to apply for this position.

## July 28, 2016

### Mark Jason Dominus

# Controlling the KDE screen locking works now

Yesterday I wrote about how I was trying to control the KDE screenlocker's timeout from a shell script and all the fun stuff I learned along the way. Then after I published the article I discovered that my solution didn't work. But today I fixed it and it does work.

### What didn't work

I had written this script:

timeout=${1:-3600}
perl -i -lpe 's/^Enabled=.*/Enabled=False/' $HOME/.kde/share/config/kscreensaverrc
qdbus org.freedesktop.ScreenSaver /MainApplication reparseConfiguration
sleep $timeout
perl -i -lpe 's/^Enabled=.*/Enabled=True/' $HOME/.kde/share/config/kscreensaverrc
qdbus org.freedesktop.ScreenSaver /MainApplication reparseConfiguration

The strategy was: use perl to rewrite the screen locker's configuration file, and then use qdbus to send a D-Bus message to the screen locker to order it to load the updated configuration.

This didn't work. The System Settings app would see the changed configuration, and report what I expected, but the screen saver itself was still behaving according to the old configuration. Maybe the qdbus command was wrong or maybe the whole theory was bad.

### More strace

For want of anything else to do (when all you have is a hammer…), I went back to using strace to see what else I could dig up, and tried

strace -ff -o /tmp/ss/s /usr/bin/systemsettings

which tells strace to write separate files for each process or thread. I had a fantasy that by splitting the trace for each process into a separate file, I might solve the mysterious problem of the missing string data. This didn't come true, unfortunately.

I then ran tail -f on each of the output files, and used systemsettings to update the screen locker configuration, looking to see which of the trace files changed. I didn't get too much out of this. A great deal of the trace was concerned with X protocol traffic between the application and the display server. But I did notice this portion, which I found extremely suggestive, even with the filenames missing:

3106  open(0x2bb57a8, O_RDWR|O_CREAT|O_CLOEXEC, 0666) = 18
3106  fcntl(18, F_SETFD, FD_CLOEXEC) = 0
3106  chmod(0x2bb57a8, 0600) = 0
3106  fstat(18, {...}) = 0
3106  write(18, 0x2bb5838, 178) = 178
3106  fstat(18, {...}) = 0
3106  close(18) = 0
3106  rename(0x2bb5578, 0x2bb4e48) = 0
3106  unlink(0x2b82848) = 0

You may recall that my theory was that when I click the “Apply” button in System Settings, it writes out a new version of $HOME/.kde/share/config/kscreensaverrc and then orders the screen locker to reload the configuration. Even with no filenames, this part of the trace looked to me like the replacement of the configuration file: a new file is created, then written, then closed, and then the rename replaces the old file with the new one. If I had been thinking about it a little harder, I might have thought to check if the return value of the write call, 178 bytes, matched the length of the file. (It does.) The unlink at the end is deleting the semaphore file that System Settings created to prevent a second process from trying to update the same file at the same time.

Supposing that this was the trace of the configuration update, the next section should be the secret sauce that tells the screen locker to look at the new configuration file. It looked like this:

3106  sendmsg(5, 0x7ffcf37e53b0, MSG_NOSIGNAL) = 168
3106  poll([?] 0x7ffcf37e5490, 1, 25000) = 1
3106  recvmsg(5, 0x7ffcf37e5390, MSG_CMSG_CLOEXEC) = 90
3106  recvmsg(5, 0x7ffcf37e5390, MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
3106  sendmsg(5, 0x7ffcf37e5770, MSG_NOSIGNAL) = 278
3106  sendmsg(5, 0x7ffcf37e5740, MSG_NOSIGNAL) = 128

There is very little to go on here, but none of it is inconsistent with the theory that this is the secret sauce, or even with the more advanced theory that it is the secret sauce and that the secret sauce is a D-Bus request. But without seeing the contents of the messages, I seemed to be at a dead end.

### Thrashing

Browsing random pages about the KDE screen locker, I learned that the lock screen configuration component could be run separately from the rest of System Settings. You use

kcmshell4 --list

to get a list of available components, and then

kcmshell4 screensaver

to run the screensaver component. I started running strace on this command instead of on the entire System Settings app, with the idea that if nothing else, the trace would be smaller and perhaps simpler, and for some reason the missing strings appeared. That suggestive block of code above turned out to be updating the configuration file, just as I had suspected:

open("/home/mjd/.kde/share/config/kscreensaverrcQ13893.new", O_RDWR|O_CREAT|O_CLOEXEC, 0666) = 19
fcntl(19, F_SETFD, FD_CLOEXEC)          = 0
chmod("/home/mjd/.kde/share/config/kscreensaverrcQ13893.new", 0600) = 0
fstat(19, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
write(19, "[ScreenSaver]\nActionBottomLeft=0\nActionBottomRight=0\nActionTopLeft=0\nActionTopRight=2\nEnabled=true\nLegacySaverEnabled=false\nPlasmaEnabled=false\nSaver=krandom.desktop\nTimeout=60\n", 177) = 177
fstat(19, {st_mode=S_IFREG|0600, st_size=177, ...}) = 0
close(19)                               = 0
rename("/home/mjd/.kde/share/config/kscreensaverrcQ13893.new", "/home/mjd/.kde/share/config/kscreensaverrc") = 0

And the following secret sauce was revealed as:

sendmsg(7, {msg_name(0)=NULL, msg_iov(2)=[{"l\1\0\1\30\0\0\0\v\0\0\0\177\0\0\0\1\1o\0\25\0\0\0/org/freedesktop/DBus\0\0\0\6\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0\2\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0\3\1s\0\f\0\0\0GetNameOwner\0\0\0\0\10\1g\0\1s\0\0", 144}, {"\23\0\0\0org.kde.screensaver\0", 24}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 168
sendmsg(7, {msg_name(0)=NULL, msg_iov(2)=[{"l\1\1\1\206\0\0\0\f\0\0\0\177\0\0\0\1\1o\0\25\0\0\0/org/freedesktop/DBus\0\0\0\6\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0\2\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0\3\1s\0\10\0\0\0AddMatch\0\0\0\0\0\0\0\0\10\1g\0\1s\0\0", 144}, {"\201\0\0\0type='signal',sender='org.freedesktop.DBus',interface='org.freedesktop.DBus',member='NameOwnerChanged',arg0='org.kde.screensaver'\0", 134}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 278
sendmsg(7, {msg_name(0)=NULL, msg_iov(2)=[{"l\1\0\1\0\0\0\0\r\0\0\0j\0\0\0\1\1o\0\f\0\0\0/ScreenSaver\0\0\0\0\6\1s\0\23\0\0\0org.kde.screensaver\0\0\0\0\0\2\1s\0\23\0\0\0org.kde.screensaver\0\0\0\0\0\3\1s\0\t\0\0\0configure\0\0\0\0\0\0\0", 128}, {"", 0}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 128
sendmsg(7, {msg_name(0)=NULL,
msg_iov(2)=[{"l\1\1\1\206\0\0\0\16\0\0\0\177\0\0\0\1\1o\0\25\0\0\0/org/freedesktop/DBus\0\0\0\6\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0\2\1s\0\24\0\0\0org.freedesktop.DBus\0\0\0\0\3\1s\0\v\0\0\0RemoveMatch\0\0\0\0\0\10\1g\0\1s\0\0",
144},
{"\201\0\0\0type='signal',sender='org.freedesktop.DBus',interface='org.freedesktop.DBus',member='NameOwnerChanged',arg0='org.kde.screensaver'\0",
134}]

(I had to give strace the -s 256 flag to tell it not to truncate the string data to 32 characters.)

### Binary gibberish

A lot of this is illegible, but it is clear, from the frequent mentions of DBus, and from the names of D-Bus objects and methods, that these are D-Bus requests, as theorized. Much of it is binary gibberish that we can only read if we understand the D-Bus line protocol, but the object and method names are visible. For example, consider this long string:

interface='org.freedesktop.DBus',member='NameOwnerChanged',arg0='org.kde.screensaver'

With qdbus I could confirm that there was a service named org.freedesktop.DBus with an object named / that supported a NameOwnerChanged method which expected three QString arguments. Presumably the first of these was org.kde.screensaver and the others are hiding in the other 134 characters that strace didn't expand. So I may not understand the whole thing, but I could see that I was on the right track.

That third line was the key:

sendmsg(7, {msg_name(0)=NULL,
msg_iov(2)=[{"… /ScreenSaver … org.kde.screensaver … org.kde.screensaver … configure …", 128}, {"", 0}],
msg_controllen=0,
msg_flags=0},
MSG_NOSIGNAL) = 128

Huh, it seems to be asking the screensaver to configure itself. Just like I thought it should. But there was no configure method, so what does that configure refer to, and how can I do the same thing?

But org.kde.screensaver was not quite the same path I had been using to talk to the screen locker—I had been using org.freedesktop.ScreenSaver, so I had qdbus list the methods at this new path, and there was a configure method.

When I tested

qdbus org.kde.screensaver /ScreenSaver configure

I found that this made the screen locker take note of the updated configuration. So, problem solved!

(As far as I can tell, org.kde.screensaver and org.freedesktop.ScreenSaver are completely identical. They each have a configure method, but I had overlooked it—several times in a row—earlier when I had gone over the method catalog for org.freedesktop.ScreenSaver.)

The working script is almost identical to what I had yesterday:

timeout=${1:-3600} perl -i -lpe 's/^Enabled=.*/Enabled=False/'$HOME/.kde/share/config/kscreensaverrc
qdbus org.freedesktop.ScreenSaver /ScreenSaver configure
sleep $timeout perl -i -lpe 's/^Enabled=.*/Enabled=True/'$HOME/.kde/share/config/kscreensaverrc
qdbus org.freedesktop.ScreenSaver /ScreenSaver configure
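(Tangentially, the same configure call can be made from Haskell. This is a sketch assuming the API of the dbus package on Hackage, and it is not part of the original script:)

{-# LANGUAGE OverloadedStrings #-}
import DBus (methodCall, methodCallDestination)
import DBus.Client (connectSession, call_)

-- Ask the KDE screen locker to re-read its configuration, the same
-- request the qdbus command above makes.
main :: IO ()
main = do
  client <- connectSession
  _ <- call_ client
         (methodCall "/ScreenSaver" "org.freedesktop.ScreenSaver" "configure")
           { methodCallDestination = Just "org.freedesktop.ScreenSaver" }
  return ()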

That's not a bad way to fail, as failures go: I had a correct idea about what was going on, my plan about how to solve my problem would have worked, but I was tripped up by a trivium; I was calling MainApplication.reparseConfiguration when I should have been calling ScreenSaver.configure.

What if I hadn't been able to get strace to disgorge the internals of the D-Bus messages? I think I would have gotten the answer anyway. One way to have gotten there would have been to notice the configure method documented in the method catalog printed out by qdbus. I certainly looked at these catalogs enough times, and they are not very large. I don't know why I never noticed it on my own. But I might also have had the idea of spying on the network traffic through the D-Bus socket, which is under /tmp somewhere.

I was also starting to tinker with dbus-send, which is like qdbus but more powerful, and can post signals, which I think qdbus can't do, and with gdbus, another D-Bus introspector. I would have kept getting more familiar with these tools and this would have led somewhere useful.

Or had I taken just a little longer to solve this, I would have followed up on Sumana Harihareswara’s suggestion to look at Bustle, which is a utility that logs and traces D-Bus requests. It would certainly have solved my problem, because it makes perfectly clear that clicking that apply button invoked the configure method:

I still wish I knew why strace hadn't been able to print out those strings, though.

# Controlling KDE screen locking from a shell script

Lately I've started watching stuff on Netflix. Every time I do this, the screen locker kicks in sixty seconds in, and I have to unlock it, pause the video, and adjust the system settings to turn off the automatic screen locker. I can live with this.

But when the show is over, I often forget to re-enable the automatic screen locker, and that I can't live with. So I wanted to write a shell script:

#!/bin/sh
auto-screen-locker disable
sleep 3600
auto-screen-locker enable

Then I'll run the script in the background before I start watching, or at least after the first time I unlock the screen, and if I forget to re-enable the automatic locker, the script will do it for me.

The question is: how to write auto-screen-locker?

### strace

My first idea was: maybe there is actually an auto-screen-locker command, or a system-settings command, or something like that, which was being run by the System Settings app when I adjusted the screen locker from System Settings, and all I needed to do was to find out what that command was and to run it myself.

So I tried running System Settings under strace -f and then looking at the trace to see if it was execing anything suggestive.

It wasn't, and the trace was 93,000 lines long and frightening. Halfway through, it stopped recording filenames and started recording their string addresses instead, which meant I could see a lot of calls to execve but not what was being execed. I got sidetracked trying to understand why this had happened, and I never did figure it out—something to do with a call to clone, which is like fork, but different in a way I might understand once I read the man page.

The first thing the cloned process did was to call set_robust_list, which I had never heard of, and when I looked for its man page I found to my surprise that there was one. It begins:

NAME
get_robust_list, set_robust_list - get/set list of robust futexes

And then I felt like an ass because, of course, everyone knows all about the robust futex list, duh, how silly of me to have forgotten ha ha just kidding WTF is a futex? Are the robust kind better than regular wimpy futexes?

It turns out that Ingo Molnár wrote a lovely explanation of robust futexes which are actually very interesting. In all seriousness, do check it out.

I seem to have digressed. This whole section can be summarized in one sentence:

strace was no help and took me a long way down a wacky rabbit hole.

Sorry, Julia!

### Stack Exchange

The next thing I tried was a Google search for kde screen locker. The second or third link I followed was to this StackExchange question, “What is the screen locking mechanism under KDE?”. It wasn't exactly what I was looking for, but it was suggestive and pointed me in the right direction. The crucial point in the answer was a mention of

qdbus org.freedesktop.ScreenSaver /ScreenSaver Lock

When I saw this, it was like a new section of my brain coming on line. So many things that had been obscure suddenly became clear. Things I had wondered for years. Things like “What are these horrible

messages that KDE apps are always spewing into my terminal?” But now the light was on.

KDE is built atop a toolkit called Qt, and Qt provides an interprocess communication mechanism called “D-Bus”. The qdbus command, which I had not seen before, is apparently for sending queries and commands on the D-Bus. The arguments identify the recipient and the message you are sending. If you know the secret name of the correct demon, and you send it the correct secret command, it will do your bidding. ( The mystery message above probably has something to do with the app using an invalid secret name as a D-Bus address.)

Often these sorts of address hierarchies work well in theory and then fail utterly because there is no way to learn the secret names. The X Window System has always had a feature called “resources” by which almost every aspect of every application can be individually customized. If you are running xweasel and want just the frame of just the error panel of just the output window to be teal blue, you can do that… if you can find out the secret names of the xweasel program, its output window, its error panel, and its frame. Then you combine these into a secret X resource name, incant a certain command to load the new resource setting into the X server, and the next time you run xweasel the one frame, and only the one frame, will be blue.

In theory these secret names are documented somewhere, maybe. In practice, they are not documented anywhere. You can only extract them from the source, and not only from the source of xweasel itself but from the source of the entire widget toolkit that xweasel is linked with. Good luck, sucker.

### D-Bus has a directory

However! The authors of Qt did not forget to include a directory mechanism in D-Bus. If you run

qdbus

you get a list of all the addressable services, which you can grep for suggestive items, including org.freedesktop.ScreenSaver. Then if you run

qdbus org.freedesktop.ScreenSaver

you get a list of all the objects provided by the org.freedesktop.ScreenSaver service; there are only seven. So you pick a likely-seeming one, say /ScreenSaver, and run

qdbus org.freedesktop.ScreenSaver /ScreenSaver

and get a list of all the methods that can be called on this object, and their argument types and return value types. And you see for example

method void org.freedesktop.ScreenSaver.Lock()

and say “I wonder if that will lock the screen when I invoke it?” And then you try it:

qdbus org.freedesktop.ScreenSaver /ScreenSaver Lock

and it does.

That was the most important thing I learned today, that I can go wandering around in the qdbus hierarchy looking for treasure. I don't yet know exactly what I'll find, but I bet there's a lot of good stuff.

When I was first learning Unix I used to wander around in the filesystem looking at all the files, and I learned a lot that way also.

• “Hey, look at all the stuff in /etc! Huh, I wonder what's in /etc/passwd?”

• “Hey, /etc/protocols has a catalog of protocol numbers. I wonder what that's for?”

• “Hey, there are a bunch of files in /usr/spool/mail named after users and the one with my name has my mail in it!”

• “Hey, the manuals are all under /usr/man. I could grep them!”

Later I learned (by browsing in /usr/man/man7) that there was a hier(7) man page that listed points of interest, including some I had overlooked.

### The right secret names

Everything after this point was pure fun of the “what happens if I turn this knob” variety. I tinkered around with the /ScreenSaver methods a bit (there are twenty) but none of them seemed to be quite what I wanted. There is a

method uint Inhibit(QString application_name, QString reason_for_inhibit)

method which someone should be calling, because that's evidently what you call if you are a program playing a video and you want to inhibit the screen locker. But the unknown someone was delinquent and it wasn't what I needed for this problem.

Then I moved on to the /MainApplication object and found

method void org.kde.KApplication.reparseConfiguration()

which wasn't quite what I was looking for either, but it might do: I could perhaps modify the configuration and then invoke this method. I dimly remembered that KDE keeps configuration files under $HOME/.kde, so I ls -la-ed that and quickly found share/config/kscreensaverrc, which looked plausible from the outside, and more plausible when I saw what was in it:

Enabled=True
Timeout=60

among other things. I hand-edited the file to change the 60 to 243, ran

qdbus org.freedesktop.ScreenSaver /MainApplication reparseConfiguration

and then opened up the System Settings app. Sure enough, the System Settings app now reported that the lock timeout setting was “4 minutes”. And changing Enabled=True to Enabled=False and back made the System Settings app report that the locker was enabled or disabled.

### The answer

So the script I wanted turned out to be:

timeout=${1:-3600}
perl -i -lpe 's/^Enabled=.*/Enabled=False/' $HOME/.kde/share/config/kscreensaverrc
qdbus org.freedesktop.ScreenSaver /MainApplication reparseConfiguration
sleep $timeout
perl -i -lpe 's/^Enabled=.*/Enabled=True/' $HOME/.kde/share/config/kscreensaverrc
qdbus org.freedesktop.ScreenSaver /MainApplication reparseConfiguration

Problem solved, but as so often happens, the journey was more important than the destination.

I am greatly looking forward to exploring the D-Bus hierarchy and sending all sorts of inappropriate messages to the wrong objects.

Just before he gets his ass kicked by Saruman, that insufferable know-it-all Gandalf says “He who breaks a thing to find out what it is has left the path of wisdom.” If I had been Saruman, I would have kicked his ass at that point too.

Right after I posted this, I started watching Netflix. The screen locker cut in after sixty seconds. “Aha!” I said. “I'll run my new script!”

I did, and went back to watching. Sixty seconds later, the screen locker cut in again. My script doesn't work! The System Settings app says the locker has been disabled, but it's mistaken. Probably it's only reporting the contents of the configuration file that I edited, and the secret sauce is still missing. The System Settings app does something to update the state of the locker when I click that “Apply” button, and I thought that my qdbus command was doing the same thing, but it seems that it isn't.

I'll figure this out, but maybe not today. Good night all!

[ Addendum 20160728: I figured it out the next day ]

[ Addendum 20160729: It has come to my attention that there is actually a program called xweasel. ]

Since July we've made a number of updates, mostly content. Here's a rundown:

• Intero was added to the site under /intero.
• We've added Intero to the get-started page.
• We've added the /tutorial/ hierarchy, with a sample tutorial.
• The /packages page has been renamed to /libraries. The idea being this might be more obvious to newcomers from other languages.
• Added a library description for conduit.

The complete diff can be found here.

# Maintainer required for Derive

Summary: I'm looking for a maintainer to take over Derive. Interested?

The Derive tool is a library and set of definitions for generating fragments of code in a formulaic way from a data type. It has a mechanism for guessing the pattern from a single example, plus a more manual way of writing out generators. It supports 35 generators out of the box, and is depended upon by 75 libraries.

The tool hasn't seen much love for some time, and I no longer use it myself. It requires somewhat regular maintenance to upgrade to new versions of GHC and haskell-src-exts. There are lots of directions the tool could be taken, more derivations, integration with the GHC Generic derivation system etc. There's a few generic pieces that could be broken off (translation between Template Haskell and haskell-src-exts, the guessing mechanism).

Anyone who is interested should comment on the GitHub ticket. In the absence of any volunteers I may continue to do the regular upgrade work, or may instead have it taken out of Stackage and stop upgrading it.

# luminance-0.6.0 sample

It’s been two weeks since luminance reached version 0.6.0. I’m very busy currently, but I decided to put some effort into making a very minimalistic yet usable sample. The sample uses luminance and luminance-gl (the OpenGL 3.3 backend being the only one available for now).

You’ll find it here. The code is heavily commented and you can of course clone the repository and run the executable with cargo.

I’ll post a more detailed blog post about the application I’m building with luminance right now later on.

Keep the vibe! :)

# [leijaltq] Typeof as a type

Wherever one can specify a type, let it be possible to use "TYPE_OF(EXPRESSION)" instead.  For example, it can be used in type signatures and declarations of type synonyms.  Here is an example with syntax similar to Haskell:

A library has functions

f1 :: Int -> Foo;
f1 i = ...
f2 :: Foo -> Int;
f2 f = ...

The user's code importing the library might do something like

type MyFoo = TYPE_OF(f1 undefined);
intermediate :: MyFoo;
intermediate = f1 42;

consume_intermediate :: Int;
consume_intermediate = f2 intermediate;

If a new version of the library were to rename Foo into Bar, the user's code would still work unmodified.  (It is irrelevant whether the library exports Foo, later Bar, though not exporting the type would force the user to use TYPE_OF or avoid explicit type signatures, which seems silly.)

This feature could be useful during development if the names of things are not yet decided (bike shedding), but we do know how we want to use them.

The compiler will have to flag errors of circular references of TYPE_OF.
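(For comparison, today's GHC gets part of the way there at definition sites with PartialTypeSignatures: a wildcard lets you omit the type's name, though unlike TYPE_OF it cannot be used to define a type synonym. A self-contained sketch, with the library and user parts squashed into one module:)

{-# LANGUAGE PartialTypeSignatures #-}
{-# OPTIONS_GHC -Wno-partial-type-signatures #-}

-- the "library" part, mirroring the example above
data Foo = Foo Int

f1 :: Int -> Foo
f1 = Foo

f2 :: Foo -> Int
f2 (Foo i) = i

-- the "user" part: the wildcard stands in (weakly) for TYPE_OF(f1 undefined);
-- this code keeps compiling if the library renames Foo to Bar
intermediate :: _
intermediate = f1 42

consume_intermediate :: Int
consume_intermediate = f2 intermediate

main :: IO ()
main = print consume_intermediate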

# Dependent types in Haskell: Progress Report

It was drawn to my attention that there is an active Reddit thread about the future of dependent types in Haskell. (Thanks for the heads up, @thomie!) Instead of writing a long response inline in Reddit, it seems best to address the (very knowledgeable, respectful, and all around heartening) debate here.

## When can we expect dependent types in GHC?

The short answer: GHC 8.4 (2018) at the very earliest. More likely 8.6 or 8.8 (2019-20).

Here is the schedule as I see it:

• GHC 8.2: Clean up some of the lingering issues with -XTypeInType. As I will be starting a new job (Asst. Prof. at Bryn Mawr College) this fall, I simply won’t have the opportunity to do more than some cleanup work for the next release. Also, quite frankly, I need a release cycle off from the challenge of putting in a large new feature. Polishing up -XTypeInType for release took probably an extra full month of unexpected work time! I don’t regret this in the slightest, but I could use a cycle off.
• GHC 8.4: This depends on what other research opportunities come up in the next year and how much more attention -XTypeInType needs. If all goes well this fall and I can get a few publications out in the next year (certainly possible – I have several in the works), then I could conceivably start primary implementation of -XDependentTypes next summer. The odds that it will be in a state to merge for 8.4 are slim, however.
• GHC 8.6: We’re now talking about a real possibility here. Assuming I start the implementation next summer, this would give me a year and a half to complete it. I desperately wish to avoid merging late in the cycle (which is what made -XTypeInType so stressful) so perhaps merging soon after GHC 8.4 comes out is a good goal. If this slips, GHC 8.8 seems like quite a likely candidate.

Regardless of the schedule, I have every intention of actually doing this work.

One major oversight in the schedule above: I have completely discounted the possibility of collaborators in coding this up. Do you wish to help make this a reality? If so, let’s talk. I’ll admit that there may be a bit of a hill for an outside collaborator to climb before I really start collaborating in earnest, so be prepared to show (or create!) evidence that you’re good at getting your hands dirty in the greasy wheels of GHC’s typechecker without very much oversight. I’ve encouraged several of you on Trac/Phab – you know who you are, I hope. For others who have not yet contributed to GHC: you’re likely best served starting smaller than implementing dependent types! But let’s talk anyway, if you’re interested.

## What is the design of dependent types in Haskell?

I expect to hand in my dissertation in the next two weeks. While I am developing it in the public eye (and warmly welcome issues to be posted), there is no publicly available PDF build. Building instructions are in the README. I will post a built PDF when I hand in my draft to my thesis committee. I will be defending on September 1 and will likely make some revisions (as suggested by my committee) afterward.

Readers here will likely be most interested in Chapters 3 and 4. Chapter 3 will contain (I’m still writing it!) numerous examples of dependent types in Haskell and how they might be useful. Chapter 4 presents the design of dependent types in Haskell as a diff over current Haskell. This design is not yet up to the standard of the Haskell Reports, mostly because it is unimplemented and seems not worth the effort of formalization until we know it is implementable. For the overly curious, Chapter 5 contains a new intermediate language (to replace GHC Core) and Chapter 6 contains the type inference algorithm that will deal with dependent types. Happy summer reading!

• @elucidatum: How radical of a change and how much of a challenge will it be to refactor existing codebases?

No change is necessary. All code that currently compiles with -XTypeInType will compile with -XDependentTypes. (The only reason I have to set -XTypeInType as my baseline is because of the parsing annoyance around *, which must be imported to be used with -XTypeInType.) Any refactoring will be around how much you, as the Haskell author, wish to take advantage of dependent types.

• @jlimperg: Pervasive laziness in Haskell means that we like to use a lot of coinductive types. But this is really annoying for proof purposes because coinduction, despite the nice duality with induction, is a lot less pleasant to work with.

Yes, this is true. My approach is, essentially, to pretend that types are inductive, not coinductive. One consequence of this decision is that proofs of type equality will have to be run. This means that including dependent types will slow down your running program. Sounds horrible, doesn’t it? It doesn’t have to, though.

Suppose you have a function proof :: ... -> a :~: b. The function proof provides a proof that type a equals type b. If this function terminates at all, it will always produce Refl. Thus, if we knew that the function terminated, then we wouldn’t have to run it. We can’t know whether it terminates. But you, the programmer, can assert that it terminates, like this:

{-# RULES "proof" proof ... = unsafeCoerce Refl #-}

Now, GHC will replace any use of proof with Refl directly. Note that GHC still type-checks your proof. All this RULE does is assert termination.
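Here is a small, self-contained sketch of that pattern as it can be written today; the names SNat and plusZero are illustrative and not taken from the dissertation:

{-# LANGUAGE DataKinds, GADTs, TypeFamilies, TypeOperators #-}
import Data.Type.Equality ((:~:)(Refl))
import Unsafe.Coerce (unsafeCoerce)

data Nat = Z | S Nat

data SNat (n :: Nat) where
  SZ :: SNat 'Z
  SS :: SNat n -> SNat ('S n)

type family (a :: Nat) + (b :: Nat) :: Nat where
  'Z   + b = b
  'S a + b = 'S (a + b)

-- A runtime proof, by induction, that n + 'Z equals n.  If it terminates,
-- it can only ever produce Refl.
plusZero :: SNat n -> (n + 'Z) :~: n
plusZero SZ     = Refl
plusZero (SS n) = case plusZero n of Refl -> Refl

-- Asserting termination so the proof never needs to run (the trick above):
{-# RULES "plusZero" forall n. plusZero n = unsafeCoerce Refl #-}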

“But,” you say, “I don’t want to have to merely assert termination! If I wanted to assert correctness instead of prove it, I wouldn’t use dependent types.” My response: “Touché.” Haskell is indeed less powerful than those other dependently typed languages in this regard. Nevertheless, by using dependent types in Haskell, you still get far more compile-time guarantees than you get without dependent types. You just don’t get termination. You thus have a choice: “prove” termination at runtime by actually running your proofs, or assert it at compile time, keeping your dependently typed program efficient while still retaining plenty of guarantees. Recall that we Haskellers have never proved termination of anything, ever, so not proving termination of a function named proof shouldn’t be all that alarming.

Note: If RULES such as the one above become pervasive, perhaps a compiler flag can make them automatic, effectively (and concisely) assuming termination for all proofs in a module.

• @jlimperg: Proving is hard

Yes, it is. Typechecker plugins may help here. It is easy enough, however, to implement dependent types without this advanced support. As we work through the consequences of dependent types in Haskell, I’m confident the community (including me) will come up with ways to make it easier.

• @jlimperg: As is well-known, a dependent type system is inconsistent (i.e. everything can be proven) unless all computations terminate, which is obviously not the case for general Haskell.

Yes, this is true. Dependent types in Haskell will be inconsistent when we view Haskell as a logic. They will remain type-safe, however. (This is because we run the proofs!) I just wish to point out here that @jlimperg suggests a syntactic check for termination: this, sadly, will not work. Haskell has too many ways for a computation to diverge, and a syntactic check can rule out only general recursion and partial pattern matches. Haskell also has infinite data types, non-strictly-positive datatypes, TypeRep (which can be abused to cause a loop), Type :: Type, and likely more back doors. I’d love to see a termination checker for Haskell, but I’m not holding my breath.
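To illustrate the inconsistency, and why running proofs keeps things type-safe anyway, here is a small sketch (the names are mine): general recursion lets us inhabit any equality type, but using that equality forces the proof.

```haskell
{-# LANGUAGE GADTs, TypeOperators #-}
module BogusProof where

import Data.Type.Equality ((:~:)(..))

-- General recursion lets us write a "proof" of anything at all...
bogus :: Int :~: Bool
bogus = bogus

-- ...but to actually use the equality we must pattern-match on Refl,
-- which forces the proof. This diverges instead of coercing an Int to a
-- Bool, so type safety survives even though the logic does not.
toBool :: Int -> Bool
toBool n = case bogus of Refl -> n
```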

• @jlimperg: This point is very speculative, but I strongly dislike the current flavour of ‘almost-dependent’ type-level programming in Haskell.

So do I. Singletons are an abomination. I hate them. These are gone with my design for dependent types, and I think the resulting language has the niceties we all want in a dependently typed language (modulo termination checking).
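As a reminder of the boilerplate being replaced, here is the singleton pattern hand-rolled for Bool (no singletons library involved; names are illustrative). The same notion of booleans has to be written down repeatedly: once as a type of runtime values, once as a kind of type-level values via DataKinds, and once more as the GADT that ties the two together.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, PolyKinds, TypeFamilies #-}
module SingletonExample where

-- The third copy of Bool: a GADT whose only job is to connect a runtime
-- value with its type-level counterpart.
data SBool (b :: Bool) where
  STrue  :: SBool 'True
  SFalse :: SBool 'False

type family If (c :: Bool) (t :: k) (e :: k) :: k where
  If 'True  t e = t
  If 'False t e = e

-- To branch on a type-level Bool at runtime, the singleton must be
-- threaded through as an extra argument.
choose :: SBool c -> t -> e -> If c t e
choose STrue  t _ = t
choose SFalse _ e = e
```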

• @dogodel: I think it will take a while for the change to percolate into better design. … I think the most immediate benefit will be for (no surprise) expressive DSL, which are actually quite common in any modest codebase, but maybe not the “core” of the project.

Agreed completely. It will take time for us to figure out the best way to use dependent types.

• @sinyesdo: The best summary I can offer is that “totality” is really really important for DT languages

This is true for those other dependently typed languages, but not for Haskell. See my response to the first of @jlimperg’s points quoted here.

• @ccasin: The main reasons to care that your programming language is total are to make the type system consistent as a logic and to make type checking decidable. But we’ve already given up both these things in Haskell, and it’s possible to have dependent types without them.

Agreed in full, and with @ccasin’s full post.

• @jmite: If you want fully Dependent-Type types, why not use Idris, Agda, Coq, or F*?

I’m in broad agreement with the responses to this point on Reddit: basically, that none of these languages have Haskell’s ecosystem or optimizing compiler. See also Section 3.3 of my dissertation.

• @tactics: What does “fully dependent types” even mean for something like Haskell? You would have to rework the core language entirely.

Yes, precisely. This is Chapter 5 of my dissertation.

## Conclusion

I hope this information helps. Do keep an eye out for my finished dissertation sometime in the next few months. And, as always, this is all a work in progress and no one should take anything as set in stone. If there’s a design decision you disagree with, please speak up!