This longish post gives Simon's reflections on the implementation of Cloud-Haskell-style static pointers and serialiation. See also StaticPointers.

Much of what is suggested here is implemented, in some form, in two existing projects

My goal here is to identify the smallest possible extension to GHC, with
the smallest possible trusted code base, that would enable these
libraries to be written in an entirely type-safe way.

# Background

## Background: the trusted code base

The implementation `Typeable` class, and its associated functions, in
GHC offers a **type-safe** abstraction, in the classic sense that
"well typed programs won't go wrong". For example, we in `Data.Typeable` we have

cast :: forall a b. (Typeable a, Typeable b) => a -> Maybe b

We expect `cast` to be type-safe: if `cast` returns a value `Just x` then we really do know
that `x :: b`. Let's remind ourselves of class `Typeable`:

class Typeable a where
typeRep :: proxy a -> TypeRep

(It's not *quite* this, but close.) The `proxy a` argument is
just a proxy for *type* argument; its value is never inspected
and you can always pass bottom.

Under the hood, `cast` uses `typeRep` to get the runtime `TypeRep` for
`a` and `b`, and compares them, thus:

cast :: forall a b. (Typeable a, Typeable b) => a -> Maybe b
cast x = if typeRep (Proxy :: Proxy a) == typeRep (Proxy :: Proxy b)
then Just (unsafeCoerce x)
else Nothing

Although `cast` is written in Haskell, it uses `unsafeCoerce`. For it
to truly be type-safe, it must trust the `Typeable` instances. If the
user could write a `Typeable` instance, they could write a bogus one, and
defeat type safety. So only GHC is allowed write `Typeable` instances.

In short, `cast` and the `Typeable` instances are part of the **trusted code base**, or **TCB**:

- The TCB should be as small as possible
- The TCB should have a small, well-defined, statically-typed API used by client code
- Client code is un-trusted; if the client code is well-typed, and the TCB is implemented correctly, nothing can go wrong

## Background `Typeable a` and `TypeRep`

I'll use the `Typeable a` type class and values of type `TypeRep` more or less interchangeably. As you can see from the definition of class `Typeable` above, its payload is simply a constant function returning a `TypeRep`. So you can think of a `Typeable a` as simply a type-tagged version of `TypeRep`.

Of course, a `Typeable a` is a type class thing, which is hard to pass around explicitly like a value, but that is easily fixed using the "Dict Trick",
well known in Haskell folk lore:

data Dict (c :: Constraint) where
Dict :: forall c. c => Dict c

Now a value of type `Dict (Typeable a)` is an ordinary value that embodies a `Typeable a` dictionary. For example:

f :: Dict (Typeable a) -> Dict (Typeable b) -> a -> Maybe b
f Dict Dict val = cast val

The pattern-matches against the `Dict` constructor brings the `Typeable` dictionaries
into scope, so they can be used to discharge the constraint arising from the call to `cast`.

## Background: serialisation

I'm going to assume a a type class `Serialisable`, something like this:

class Serialisable a where
encode :: a -> ByteString
decode :: ByteString -> Maybe (a, ByteString)

'll use "encode" and "decode" as synonyms for "serialise" and "deserialise", because the former are easier to pronounce.

Here's an interesting question: are instances of `Serialisable` part of the TCB? No, they are not.
Here is a tricky case:

decode (encode [True,False]) :: Maybe (Int, ByteString)

Here I have encode a `[Bool]` into a `ByteString`, and then decoded an `Int` from that `ByteString`. This may
be naughty or undesirable, but it cannot seg-fault: it is type-safe in the sense above. You can
think of it like this: a decoder is simply a parser for the bits in the `ByteString`, so a decoder
for (say) `Int` can fail to parse a full `Int` (returning `Nothing`), but it can't return a non-`Int`.

For the naughtiness, one could imagine that a Cloud Haskell library
might send fingerprints or `TypeReps` or whatnot to eliminate
potential naughtiness. But even then it is very valuable if the
type-safety of the system does not rely on the CH library. Type
safety depends only on the correctness of the (small) TCB;
naughtiness-safety might additionally depend on the correctness of the
CH library.

## Background: static pointers

I'm taking for granted the basic design of the Cloud Haskell paper.
That is,

- A type constructor
`StaticPtr :: * -> *`. Intuitively, a value of type `StaticPtr t` is represented by a static code pointer to a value of type `t`. Note "code pointer" not "heap pointer". That's the point!

- A language construct
`static <expr>`, whose type is `StaticPtr t` if `<expr>` has type `t`.

- In
`static <expr>`, the free variables of `<expr>` must all be bound at top level. The implementation almost certainly works by giving `<expr>` a top-level definition with a new name, `static34 = <expr>`.

- A function
`unStatic :: StaticPtr a -> a`, to unwrap a static pointer.

`Static` values are serialisable. Something like `instance Serialisable (StaticPtr a)`. (This will turn out to be not quite right.) Operationally this works by serialising the code pointer, or top-level name (e.g `"Foo.static34"`).

All of this is built-in. It is OK for the implementation of `StaticPtr` to be part of the TCB.
But our goal is that *no other code need be in the TCB*.

**A red herring**. I'm not going to address the question of how to serialise a static pointer. One method would be to serialise a machine address, but that only works if the encoding and decoding ends are running identical binaries. But that's easily fixed: encode a static as the *name* of the static value e.g. "function `foo` from module `M` in package `p`". Indeed, I'll informally assume an implementation of this latter kind.

In general, I will say that what we ultimately serialise is a `StaticName`. You can think of a `StaticName` as package/module/function triple, or something like that. The implementation of `StaticName` is certainly not part of the client-visible API for `StaticPtr`; indeed, the type `StaticName` is not part of the API either. But it gives us useful vocabulary.

# Serialising static pointers

We can see immediately that we cannot expect to have `instance Serialisable (Static a)`,
which is what the Cloud Haskell paper proposed. If we had such an instance we would have

encodeStatic :: forall a. StaticPtr a -> ByteString
decodeStatic :: forall a. ByteString -> Maybe (StaticPtr a, ByteString)

And it's immediately apparent that `decodeStatic` *cannot* be right.
I could get a `ByteString` from anywhere, apply `decodeStatic` to it,
and thereby get a `StaticPtr a`. Then use
`unStatic` and you have a value of type `a`, for, *for any type *`a`!!

Plainly, what we need is (just in the case of `cast`) to do a dynamic typecheck, thus:

decodeStatic :: forall a. Typeable a
=> ByteString -> Maybe (StaticPtr a, ByteString)

Let's think operationally for a moment:

- GHC collects all the
`StaticPtr` values in a table, the **static pointer table** or **SPT**. Each row contains
- The
`StaticName` of the value
- A pointer to closure for the value itself
- A pointer to its
`TypeRep`

`decodeStatic` now proceeds like this:
- Parse a
`StaticName` from the `ByteString` (failure => `Nothing`)
- Look it up in table (not found =>
`Nothing`)
- Compare the
`TypeRep` passed to `decodeStatic` (via the `Typeable a` dictionary) with the one ine the table (not equal => `Nothing`)
- Return the value

**Side note.** Another possibility is for `decodeStatic` not to take a `Typeable a` context but instead for `unStatic` to do so:: `unStatic :: Typeable a => StaticPtr a -> Maybe a`. But that seems a mess. Apart from anything else, it would mean that a value of type `StaticPtr a` might or might not point to a value of type `a`, so there's no point in having the type parameter in the first place. **End of side note.**

This design has some useful consequences that are worth calling out:

- Since clients can effectively fabricate a
`StaticName` (by supplying `decodeStatic` with a bogus `ByteString`, a `StaticName` is untrusted. That gives the implementation a good deal of wiggle room for how it chooses to implement static names. Even a simple index in the range 0..N would be type-safe!

The motivation for choosing a richer representation for `StaticName` (eg package/module/name) is not type-safety but rather resilience to change. For example, the Haskell programs at the two ends could be quite different, provided only that they agreed about what to call the static pointers that they want to exchange.

## Statics and existentials

Here is something very reasonable:

data StaticApp b where
SA :: StaticPtr (a->b) -> StaticPtr a -> StaticApp b
unStaticApp :: StaticApp a -> a
unStaticApp (SA f a) = unStatic f (unStatic a)

(We might want to add more constructors, but I'm going to focus only on `SA`.)
A `SA` is just a pair of `StaticPtr`s, one for a function and one for an argument. We can securely unwrap it with `unStaticApp`.

Now, here is the question: can we serialise `StaticApp`s? Operationally, of course yes: to serialise a `SA`, just serialise the two `StaticPtrs` it contains, and dually for deserialisation. But, as before, deserialisation is the hard bit. We seek:

decodeSA :: Typeable b => ByteString -> Maybe (StaticApp b, ByteString)

But how can we write `decodeSA`? Here is the beginning of an attempt:

decodeSA :: Typeable b => ByteString -> Maybe (StaticApp b, ByteString)
decodeSA bs
= case decodeStatic bs :: Maybe (StaticPtr (a->b)) of
Nothing -> Nothing
Just (fun, bs1) -> ...

and you can immediately see that we are stuck. Type variable `b` is not in scope.
More concretely, we need a `Typeable (a->b)` to pass in to `decodeStatic`,
but we only have a `Typeable b` to hand.

What can we do? Tantalisingly, we know that if `decodeStatic` succeeds in parsing a static `StaticName` from `bs` then, when we look up that `StaticName` in the Static Pointer Table, we'll find a `TypeRep` for the value. So rather than passing a `Typeable` dictionary into `decodeStatic`, we'd like to get one out!

With that in mind, here is a new type signature for `decodeStatic` that returns
both pieces:

data DynStaticPtr where
DSP :: Typeable a => StaticPtr a -> DynStaticPtr
decodeStatic :: ByteString -> Maybe (DynStaticPtr, ByteString)

(The name `DynStaticPtr` comes from the fact that this data type is extremely similar to the library definition of `Dynamic`.)

Operationally, `decodeStaticK bs fail cont` works like this;

- Parse a
`StaticName` from `bs` (failure => return Nothing)
- Look it up in the SPT (not found => return Nothing)
- Return the
`TypeRep` and the value found in the SPT, paired up with `DSP`. (Indeed the SPT could contain the `DynStaticPtr` values directly.)

For the construction of `DynStaticPtr` to be type-safe, we need to know that the
`TypeRep` passed really is a `TypeRep` for the value; so the construction
of the SPT is (unsurprisingly) part of the TCB.

Now we can write `decodeSA` (the monad is just the `Maybe` monad, nothing fancy):

decodeSA :: forall b. Typeable b => ByteString -> Maybe (StaticApp b, ByteString)
decodeSA bs
= do { (DSP (fun :: StaticPtr tfun), bs1) <- decodeStatic bs
; (DSP (arg :: StaticPtr targ), bs2) <- decodeStatic bs1
-- At this point we have
-- Typeable b (from caller)
-- Typeable tfun (from first DSP)
-- Typeable targ (from second DSP)
; fun' :: StaticPtr (targ->b) <- cast fun
; return (SA fun' arg, bs2) }

The call to `cast` needs `Typeable tfun`, and `Typeable (targ->b)`. The
former is bound by the first `DSP` pattern match. The latter is
constructed automatically from `Typeable targ` and `Typeable b`, both
of which we have. Bingo!

Notice that `decodeSA` is not part of the TCB. Clients can freely write code like `decodeSA` and be sure that it is type-safe.

# From static pointers to closures

The original Cloud Haskell paper defines closures like this:

data Closure a where
Clo :: StaticPtr (ByteString -> a) -> ByteString -> Closure a

It is easy to define

unClo :: Closure a -> a
unClo (Clo s e) = unStatic s e

## Side note on HdpH

HdpH refines the Cloud Haskell `Closure` in (at least) two ways. I think (but I am not certain) that this declaration captures the essence:

data Closure a where
Clo :: StaticPtr (ByteString -> a) -> Put () -> a -> Closure a

The refinements are:

- The extra argument of type 'a' to avoid costs when we build a closure and then unwrap it with
`unClo` locally, or repeatedly.

- The use of
`Put ()` rather than a `ByteString` for the serialised environment, to avoid repeated copying when doing nested serialisation.

Both are importnat, but they are orthogonal to the discussion about static types, so I'll use the CH definition from here on.

## Serialising closures

Just as in the case of `StaticPtr`, it is immediately clear that we
cannot expect to have

decodeClo :: ByteString -> Maybe (Closure a, ByteString)

Instead we must play the same trick, and attempt to define

data DynClosure where
DC :: Typeable a => Closure a -> DynClosure
decodeClo :: ByteString -> Maybe (DynClosure, ByteString)

But there's an immediate problem in writing `decodeClo`:

decodeClo bs
= do { (DSP (fun :: StaticPtr tfun), bs1) <- decodeStatic bs
; (env, bs2) <- decodeByteString bs1
; return (DC (Clo fun env), bs2) } -- WRONG

This won't typecheck because `DC` needs `Typeable `a`, but we only have `Typeable (ByteString -> a)`.

This is Jolly Annoying. I can see three ways to make progress:

**Plan A**: Provide some (type-safe) way to decompose `TypeReps`, to get from `Typeable (a->b)` to `Typeable b` (and presumably `Typeable a` as well).
**Plan C**: Serialise a `TypeRep a` with every `Closure a`.
**Plan C**: Generalise `StaticPtr`

I like Plan C best. They are each discussed next.

### Plan A: Decomposing `TypeRep`

At the moment, GHC provides statically-typed ways to *construct* and *compare* a `TypeRep` (via `cast`), but no way to *decompose* one, at least not in a type-safe way.
It is tempting to seek this function as part of the TCB:

class Typeable a where
typeRep :: proxy a -> TypeRep
decomposeTypeRep :: DecompTR a
data DecompTR a where
TRApp :: (Typeable p, Typeable q) => DecompTR (p q)
TRCon :: TyCon -> DecompTR a

This isn't a bad idea, but it does mean that `Typeable a` *must* be implemented (and presumably serialised) using a tree, whereas the current API would allow an implementation consisting only of a fingerprint.

(Thought experiment: maybe a `Typeable a`, and `Dict (Typeable a)` can be represented as a tree, but a `TypeRep` could be just a fingerprint?)

### Plan B: serialise `TypeRep` with `Closure`

Since we need a `Typeable a` at the far end, we could just serialise it directly
with the `Closure`, like this:

encodeClo :: forall a. Typeable a => Closure a -> ByteString
encodeClo (Clo fun env)
= encodeTypeable (proxy :: a)
++ encodeStatic fun
++ encodeByteString env

Here I am assuming (as part of the TBC)

encodeTypeable :: Typeable a => proxy a -> ByteString
decodeTypeable :: ByteString -> Maybe (DynTypeable, ByteString)
data DynTypeable where
DT :: Typeable a => proxy a -> DynTypeable

which serialises a `TypeRep`. (Or, operationally, perhaps just its fingerprint.)
Now I think we can write `decodeClo`:

decodeClo :: ByteString -> Maybe (DynClosure, ByteString)
decodeClo bs
= do { (DT (_ :: Proxy a), bs1) <- decodeTypeable
; (DSP (fun :: StaticPtr tfun), bs2) <- decodeStatic bs1
; (env, bs3) <- decodeByteString bs2
; fun' :: StaticPtr (ByteString -> a) <- cast fun
; return (DC (Clo fun' env), bs2) } -- WRONG

But this too is annoying: we have to send these extra `TypeRep`s when morally they are already sitting there in the SPT.

### Plan C: Generalising `StaticPtr`

Our difficulty is that we are deserialising `StaticPtr (ByteString -> a)` but we want to be given `Typeable a` not `Typeable (ByteString -> a)`. So perhaps we can decompose the type into a type constructor and type argument, like this:

data StaticPtr (f :: *->*) (a :: *)
unStatic :: StaticPtr f a -> f a
decodeStatic :: ByteString -> Maybe (DynStaticPtr, ByteString)
data DynStaticPtr where
DS :: (Typeable f, Typeable a) => StaticPtr (f a) -> DynStaticPtr

Each row of the SPT contains:

- The
`StaticName`
- The value of type
`f a`
- The
`Typeable f` dictionary
- The
`Typeable a` dictionary

Now we can define closures thus:

data Closure a where
Clo :: StaticPtr (ByteString ->) a -> ByteString -> Closure a

and these are easy to deserialise:

decodeClo :: ByteString -> Maybe (DynClosure, ByteString)
decodeClo bs
= do { (DSP (fun :: StaticPtr f a), bs1) <- decodeStatic bs
; (env, bs2) <- decodeByteString bs1
-- Here we have Typeable f, Typeable a
; fun' :: StaticPtr (ByteString ->) a <- cast fun
-- This cast checks that f ~ (ByteString ->)
-- Needs Typeable f, Typealbe (ByteString ->)
; return (DC (Clo fun env), bs2) }
-- DC needs Typeable a

I like this a lot better, but it has knock on effects.

- The old
`StaticPtr a` is now `StaticPtr Id a`.

**ToDo: ...I have not yet followed through all the details**

## Applying closures

Can we write `closureApply`? I'm hoping for a structure like this:

closureApply :: Closure (a->b) -> Closure a -> Closure b
closureApply fun arg = Clo (static caStatic) (fun, arg)
caStatic :: ByteString -> b -- WRONG
caStatic bs = do { ((fun,arg), bs1) <- decode bs
; return (unClo fun (unClo arg), bs1) }

This is obviously wrong. `caStatic` clearly cannot have that type. It would
at least need to be

caStatic :: Typeable b => ByteString -> b

and now there is the thorny question of where the `Typeable b` dictionary comes from.

**ToDo: ...I have stopped here for now**

# Polymorphism and serialisation

For this section I'll revert to the un-generalised single-parameter `StaticPtr`.

## Parametric polymorphism

Consider these definitions:

rs1 :: Static ([Int] -> [Int])
rs1 = static reverse
rs2 :: Static ([Bool] -> [Bool])
rs2 = static reverse
rs3 :: forall a. Typeable a => Static ([a] -> [a])
rs3 = static reverse

The first two are clearly fine. The SPT will get one row for each of the two monomorphic calls to reverse, one with a `TypeRep` of `[Int] -> [Int]` and one with a `TypeRep` of `[Bool] -> [Bool]`.

But *both will have the same code pointer*, namely the code for the polymorpic `reverse` function. Could we share just one `StaticName` for all instantiations of `reverse`, perhaps including `rs3` as well?

I think we can. The story would be this:

- The SPT has a row for
`reverse`, containing
- The
`StaticName` for `reverse`
- A pointer to the code for
`reverse` (or, more precisely, its static closure).
- A function of type
`TypeRep -> TypeRep` that, given the `TypeRep` for `a` returns a `TypeRep` for `[a] -> [a]`.

- When we serialise a
`StaticPtr` we send
- The
`StaticName` of the (polymorphic) function
- A list of the
`TypeRep`s of the type arguments of the function

- The rule for
`static <expr>` becomes this: the free *term* variables `<expr>` must all be top level, but it may have free *type* variables, provided they are all `Typeable`.

All of this is part of the TCB, of course.

## Type-class polymorphism

Consider `static sort` where `sort :: Ord a => [a] -> [a]`. Can we make such a `StaticPtr`. After all, `sort` gets an implicit value argument, namely an `Ord a` dictionary. If that dictionary can be defined at top level, well and good, so this should be OK:

ss1 :: StaticPtr ([Int] -> [Int])
ss1 = static sort

But things go wrong as soon as you have polymorphism:

ss2 :: forall a. Ord a => StaticPtr ([a] -> [a])
ss2 = static sort -- WRONG

Now, clearly, the dictionary is a non-top-level free variable of the call to `sort`.

We might consider letting you write this:

ss3 :: forall a. StaticPtr (Ord a => [a] -> [a])
ss3 = static sort -- ???

so now the `static` wraps a function expeting a dictionary. But that edges us uncomforatbly close to impredicative types, which is known to contain many dragons.

A simpler alternative is to use the Dict Trick (see Background above):

ss4 :: forall a. StaticPtr (Dict (Ord a) -> [a] -> [a])
ss4 = static sortD
sortD :: forall a. Dict (Ord a) -> [a] -> [a]
sortD Dict xs = sort xs

Now, at the call side, when we unwrap the `StaticPtr`, we need to supply an explicit `Ord` dictionary, like this:

...(unStatic ss4 Dict)....

For now, I propose to deal with type classes via the Dict Trick, which is entirely end-user programmable, leaving only parametric polymorphism for built-in support.