In this series of posts, we write a fast bytecode compiler and a virtual machine for arithmetic in Haskell. We explore the following topics:
This post was originally published on abhinavsarkar.net.
This is the second post in a series of posts:
- A Fast Bytecode VM for Arithmetic: The Parser
- A Fast Bytecode VM for Arithmetic: The Compiler
- A Fast Bytecode VM for Arithmetic: The Virtual Machine
In this post, we write the compiler for our AST to bytecode, and a decompiler for the bytecode.
Introduction
AST interpreters are well known to be slow because of how AST nodes are represented in the computer’s memory. The AST nodes contain pointers to other nodes, which may be anywhere in the memory. So while interpreting an AST, the interpreter jumps all over the memory, causing a slowdown. One solution to this is to convert the AST into a more compact and optimized representation known as Bytecode.
Bytecode is a flattened and compact representation of a program, usually manifested as a byte array. Bytecode is essentially an Instruction Set (IS), but custom-made to be executed by a Virtual Machine (VM), instead of a physical machine. Each bytecode instruction is one byte in size (that’s where it gets its name from). A bytecode and its VM are created in synergy so that the execution is as efficient as possible1. Compiling source code to bytecode and executing it in a VM also allows the program to be run on all platforms that the VM supports without the developer caring much about portability concerns. The most popular combo of bytecode and VM is probably the Java bytecode and the Java virtual machine.
The VMs can be stack-based or register-based. In a stack-based VM, all values created during the execution of a program are stored only in a Stack data-structure residing in the memory. Whereas, in a register-based VM, there is also an additional set of fixed number of registers that are used to store values in preference to the stack2. Register-based VMs are usually faster, but stack-based VMs are usually simpler to implement. For our purpose, we choose to implement a stack-based VM.
We are going to write a compiler that compiles our expression AST to bytecode. But first, let’s design the bytecode for our stack-based VM.
The Bytecode
Here is our expression AST as a reminder:
data Expr
= Num !Int16
| Var !Ident
| BinOp !Op Expr Expr
| Let !Ident Expr Expr
deriving (Eq, Generic)
newtype Ident = Ident {unIdent :: BS.ByteString}
deriving (Eq, Ord, Generic, Hashable)
data Op = Add | Sub | Mul | Div deriving (Eq, Enum, Generic)
Let’s figure out the right bytecode for each case. First, we create Opcodes for each bytecode, which are sort of mnemonics for actual bytecode. Think of them as Assembly is to Machine Code.
Num
For a number literal, we need to put it directly in the bytecode so that we can use it later during the execution. We also need an opcode to push it on the stack. Let’s call it OPush
with an Int16
parameter.
BinOp
Binary operations recursively use Expr
for their operands. To evaluate a binary operation, we need its operands to be evaluated before, so we compile them first to bytecode. After that, all we need is an opcode per operator. Let’s call them OAdd
, OSub
, OMul
, and ODiv
for Add
, Sub
, Mul
, and Div
operators respectively.
Var
and Let
Variables and Let
expressions are more complex3. In the AST interpreter we chucked the variables in a map, but we cannot do that in a VM. There is no environment map in a VM, and all values must reside in the stack. How do we have variables at all then? Let’s think for a bit.
Each expression, after being evaluated in the VM, must push exactly one value on the stack: its result. Num
expressions are a trivial case. When a binary operation is evaluated, first its left operand is evaluated. That pushes one value on the stack. Then its right operand is evaluated, and that pushes another value on the stack. Finally, the operation pops the two values from the top of the stack, does its thing, and pushes the resultant value back on the stack—again one value for the entire BinOp
expression.
A Let
expression binds a variable’s value to its name, and then the variable can be referred from the body of the expression. But how can we refer to a variable when the stack contains only values, not names? Let’s imagine that we are in middle of evaluating a large expression, wherein we encounter a Let
expression. First we evaluate its assignment expression, and that leaves a value on the top of the stack. Let’s say that the stack has n
values at this point. After this we get to evaluate the body expression. At all times when we are doing that, the value from assignment stays at the same point in the stack because evaluating sub-expressions, no matter how complicated, only adds new values to the stack, without popping an existing value from before. Therefore, we can use the stack index of the assignment value (n−1
) to refer to it within the body expression. So, we encode Var
as an opcode and an integer index into the stack.
We choose to use a Word8
to index the stack, limiting us to a stack depth of 256. We encode the variable references with an opcode OGet
, which when executed gets the value from the stack at the given index and pushes it on the stack.
For a Let
expression, after we compile its assignment and body expressions, we need to make sure that the exactly-one-value invariant holds. Evaluating the assignment and body pushes two values on the stack, but we can have only one! So we overwrite the assignment value with the body value, and pop the stack to remove the body value. We invent a new opcode OSwapPop
to do this, called so because its effect is equivalent to swapping the topmost two values on the stack, and then popping the new top value4.
Putting all the opcodes together, we have the Opcode
ADT:
data Opcode
= OPush !Int16 -- 0
| OSwapPop -- 1
| OGet !Word8 -- 2
| OAdd -- 3
| OSub -- 4
| OMul -- 5
| ODiv -- 6
deriving (Show, Read, Eq, Generic)
instance NFData Opcode
Notice that we also assigned bytecodes—that is, a unique byte value—to each Opcode
above, which are just their ordinals. Now we are ready to write the compiler.
The Compiler
The compiler takes an expression with the bytecode size, and compiles it to a strict ByteString
of that size. Recall that in the previous post, we wrote our parser such that the bytecode size for each AST node was calculated while parsing it. This allows us to pre-allocate a bytestring of required size before compiling the AST. We compile to actual bytes here, and don’t use the opcodes.
type Bytecode = BS.ByteString
compile :: SizedExpr -> Result Bytecode
compile = compile' defaultStackSize
compile' :: Int -> SizedExpr -> Result Bytecode
compile' stackSize (expr, bytecodeSize) =
uncurry (fmap . const) . BSI.unsafeCreateUptoN' bytecodeSize $ \fp -> do
(bytecodeSize,)
<$> fmap Right (go Map.empty 0 fp fp expr >>= checkSize fp . TS.fst)
`catch` (pure . Left)
where
go env !sp fp !ip = \case
Num _ | sp + 1 > stackSize -> throwCompileError "Stack overflow"
Num n -> do
let !lb = fromIntegral $ n .&. 0xff
!mb = fromIntegral $ ((fromIntegral n :: Word16) .&. 0xff00) `shiftR` 8
writeByte ip 0 -- OPush
writeByte (ip `plusPtr` 1) lb
writeByte (ip `plusPtr` 2) mb
pure (ip `plusPtr` 3 :!: sp + 1)
BinOp op a b -> do
(ip' :!: sp') <- go env sp fp ip a
(ip'' :!: sp'') <- go env sp' fp ip' b
writeByte ip'' $ translateOp op
pure (ip'' `plusPtr` 1 :!: sp'' - 1)
Let x assign body -> do
(ip' :!: sp') <- go env sp fp ip assign
(ip'' :!: sp'') <- go (Map.insert x sp env) sp' fp ip' body
writeByte ip'' 1 -- OSwapPop
pure (ip'' `plusPtr` 1 :!: sp'' - 1)
Var _ | sp + 1 > stackSize -> throwCompileError "Stack overflow"
Var x -> case Map.lookup x env of
Nothing ->
throwCompileError $ "Unknown variable: " <> BSC.unpack (unIdent x)
Just varScope
| varScope < stackSize && varScope < fromIntegral (maxBound @Word8) -> do
writeByte ip 2 -- OGet
writeByte (ip `plusPtr` 1) $ fromIntegral varScope
pure (ip `plusPtr` 2 :!: sp + 1)
Just _ -> throwCompileError "Stack overflow"
where
ep = fp `plusPtr` bytecodeSize
writeByte :: Ptr Word8 -> Word8 -> IO ()
writeByte !ip !val
| ip < ep = poke ip val
| otherwise = throwCompileError $
"Instruction index " <> show (ip `minusPtr` fp) <>
" out of bound " <> show (bytecodeSize - 1)
translateOp = \case
Add -> 3 -- OAdd
Sub -> 4 -- OSub
Mul -> 5 -- OMul
Div -> 6 -- ODiv
checkSize fp ip = do
let actualBytecodeSize = ip `minusPtr` fp
unless (actualBytecodeSize == bytecodeSize) $
throwCompileError $
("Compiled bytecode size " <> show actualBytecodeSize)
<> (" is not same as expected size: " <> show bytecodeSize)
throwCompileError = throwIO . Error Compile
defaultStackSize :: Int
defaultStackSize = 256
We use the unsafeCreateUptoN'
function from the Data.ByteString.Internal
module that allocates enough memory for the provided bytecode size, and gives us a pointer to the allocated memory. We call this pointer fp
for frame pointer. Then we traverse the AST recursively, writing bytes for opcodes and arguments for each case. We use pointer arithmetic and the poke
function to write the bytes. Int16
numbers are encoded as two bytes in little endian fashion.
In the recursive traversal function go
, we pass and return the current stack pointer sp
and instruction pointer ip
, along with the frame pointer fp
. We update these correctly for each case5. We also take care of checking that the pointers stay in the right bounds, failing which we throw appropriate errors.
We also pass an env
parameter that is similar to the variable names to values environment we use in the AST interpreter, but this one tracks variable names to stack indices at which they reside. We update this information before compiling the body of a Let
expression to capture the stack index of its assignment value. When compiling a Var
expression, we use the env
map to lookup the variable’s stack index, and encode it in the bytecode.
At the end of compilation, we check that the entire bytestring is filled with bytes till the very end, failing which we throw an error. This check is required because otherwise the bytestring may have garbage bytes, and may fail inexplicably during execution.
All the errors are thrown in the IO
monad using the throwIO
function, and are caught after compilation using the catch
function. The final result or error is returned wrapped into Result
.
Let’s see it in action:
$ echo -n "1 + 2 - 3 * 4" | arith-vm compile | hexdump -C
00000000 00 01 00 00 02 00 03 00 03 00 00 04 00 05 04 |...............|
0000000f
$ echo -n "let x = 4 in let y = 5 in x + y" | arith-vm compile | hexdump -C
00000000 00 04 00 00 05 00 02 00 02 01 03 01 01 |.............|
0000000d
You can verify that the resultant bytes are indeed correct. I assume that it is difficult for you to read raw bytes. We’ll fix this in a minute. Meanwhile, let’s ponder upon some performance characteristics of our compiler.
Compiling, Fast and Slow
You may be wondering why I chose to write the compiler in this somewhat convoluted way of pre-allocating a bytestring and using pointers. The answer is: performance. I didn’t actually start with pointers. I iterated through many different data and control structures to find the fastest one.
The table below shows the compilation times for a benchmark expression file when using different data structures to implement the compile'
function:
Data structure | Time (ms) | Incremental speedup | Overall speedup |
---|---|---|---|
List | 4345 | 1x | 1x |
Seq | 523 | 8.31x | 8.31x |
DList | 486 | 1.08x | 8.94x |
BS Builder | 370 | 1.31x | 11.74x |
Pre-allocated BS | 54 | 6.85x | 80.46x |
Bytearray | 52 | 1.02x | 83.55x |
I started with the bread-and-butter data structure of Haskellers, the humble and known to be slow List
, which was indeed quite slow. Next, I moved on to Seq
and thereafter DList
, which are known to be faster at concatenation/consing. Then I abandoned the use of intermediate data structures completely, choosing to use a bytestring Builder
to create the bytestring. Finally, I had the epiphany that the bytestring size was known at compile time, and rewrote the function to pre-allocate the bytestring, thereby reaching the fastest solution.
I also tried using Bytearray
, which has more-or-less same performance of bytestring, but it is inconvenient to use because there are no functions to do IO with bytearrays. So I’d anyway need to use bytestrings for reading from STDIN or writing to STDOUT, and converting to-and-fro between bytearray and bytestring is a performance killer. Thus, I decided to stick to bytestrings.
The pre-allocated bytestring approach is 80 times faster than using lists, and almost 10 times faster then using Seq
. For such gain, I’m okay with the complications it brings to the code. Here are the numbers in a chart (smaller is better):
Another choice I had to make was how to write the go
function. I ended up passing and returning pointers and environment map, and throwing errors in IO
, but a number of solutions are possible. I tried out some of them, and noted the compilation times for the benchmark expression file:
Control structure | Time (ms) | Slowdown |
---|---|---|
IO | 57.4 | 1.00x |
IO + IORef | 65.0 | 1.13x |
IO + ReaderT | 60.9 | 1.06x |
IO + StateT | 65.6 | 1.14x |
IO + ExceptT | 65.9 | 1.15x |
IO + ReaderT + ExceptT | 107.1 | 1.87x |
IO + StateT + ExceptT | 383.9 | 6.69x |
IO + StateT + ReaderT | 687.5 | 11.98x |
IO + StateT + ReaderT + ExceptT | 702.0 | 12.23x |
IO + CPS | 78.2 | 1.36x |
IO + DCPS | 78.4 | 1.37x |
IO + ContT | 76.5 | 1.33x |
I tried putting the pointer in IORef
s and StateT
state instead of passing them back-and-forth. I tried putting the environment in a ReaderT
config. I tried using ExceptT
for throwing errors instead of using IO errors. Then I tried various combinations of these monad transformers.
Finally, I also tried converting the go
function to be tail-recursive by using Continuation-passing style (CPS), and then defunctionalizing the continuations, as well as, using the ContT
monad transformer. All of these approaches resulted in slower code. The times are interesting to compare (smaller is better):
There is no reason to use IORef
s here because they result in slower and uglier code. Using one monad transformer at a time results in slight slowdowns, which may be worth the improvement in the code. But using more than one of them degrades performance by a lot. Also, there is no improvement caused by CPS conversion, because GHC is smart enough to optimize the non tail-recursive code to be faster then handwritten tail-recursive one that allocates a lot of closures (or objects in case of defunctionalization).
Moving on …
The Decompiler
It is a hassle to read raw bytes in the compiler output. Let’s write a decompiler to aid us in debugging and testing the compiler. First, a disassembler that converts bytes to opcodes:
type Program = Seq Opcode
disassemble :: Bytecode -> Result Program
disassemble bytecode = go 0 Seq.empty
where
!size = BS.length bytecode
go !ip !program
| ip == size = pure program
| otherwise = case readInstr bytecode ip of
0 | ip + 2 < size ->
go (ip + 3) $ program |> OPush (readInstrArgInt16 bytecode ip)
0 -> throwIPOOBError $ ip + 2
1 -> go (ip + 1) $ program |> OSwapPop
2 | ip + 1 < size ->
go (ip + 2) $ program |> OGet (readInstrArgWord8 bytecode ip)
2 -> throwIPOOBError $ ip + 1
3 -> go (ip + 1) $ program |> OAdd
4 -> go (ip + 1) $ program |> OSub
5 -> go (ip + 1) $ program |> OMul
6 -> go (ip + 1) $ program |> ODiv
n -> throwDisassembleError $
"Invalid bytecode: " <> show n <> " at: " <> show ip
throwIPOOBError ip = throwDisassembleError $
"Instruction index " <> show ip <> " out of bound " <> show (size - 1)
throwDisassembleError = throwError . Error Disassemble
A disassembled program is a sequence of opcodes. We simply go over each byte of the bytecode, and append the right opcode for it to the program, along with any parameters it may have. Note that we do not verify that the disassembled program is correct.
Here are the helpers that read instruction bytes and their arguments from a bytestring:
readInstr :: BS.ByteString -> Int -> Word8
readInstr = BS.unsafeIndex
{-# INLINE readInstr #-}
readInstrArgWord8 :: BS.ByteString -> Int -> Word8
readInstrArgWord8 bytecode ip = readInstr bytecode (ip + 1)
{-# INLINE readInstrArgWord8 #-}
readInstrArgInt16 :: BS.ByteString -> Int -> Int16
readInstrArgInt16 bytecode ip =
let lb = readInstr bytecode (ip + 1)
mb = readInstr bytecode (ip + 2)
b1 :: Word16 = fromIntegral lb
b2 = fromIntegral mb `shiftL` 8
in fromIntegral (b1 .|. b2)
{-# INLINE readInstrArgInt16 #-}
Next, we decompile the opcodes to an expression:
decompile :: Program -> Result Expr
decompile program = do
stack <- go Seq.empty program
checkStack Decompile maxBound $ length stack
let ast :<| _ = stack
pure ast
where
go stack = \case
Seq.Empty -> pure stack
opcode :<| rest -> case opcode of
OPush n -> go (stack |> Num n) rest
OAdd -> decompileBinOp Add >>= flip go rest
OSub -> decompileBinOp Sub >>= flip go rest
OMul -> decompileBinOp Mul >>= flip go rest
ODiv -> decompileBinOp Div >>= flip go rest
OGet i -> go (stack |> Var (mkIdent $ mkName $ fromIntegral i)) rest
OSwapPop -> decompileLet >>= flip go rest
where
decompileBinOp op = case stack of
stack' :|> a :|> b -> pure $ stack' |> BinOp op a b
_ -> throwDecompileError $
"Not enough elements to decompile binary operation: " <> show op
decompileLet = case stack of
stack' :|> a :|> b ->
pure $ stack' |> Let (mkIdent $ mkName $ length stack - 2) a b
_ -> throwDecompileError "Not enough elements to decompile let"
mkName i = names `Seq.index` i
names = Seq.fromList $ tail $ combinations 2
combinations = \case
0 -> [""]
n -> let prev = combinations (n - 1)
in prev <> [x : xs | x <- ['a' .. 'z'], xs <- prev]
throwDecompileError = throwError . Error Decompile
checkStack :: (MonadError Error m) => Pass -> Int -> Int -> m ()
checkStack pass stackSize = \case
1 -> pure ()
0 -> throwError $ Error pass "Final stack has no elements"
n | n > stackSize -> throwError . Error pass $ "Stack overflow"
n | n > 1 -> throwError . Error pass $ "Final stack has more than one element"
_ -> throwError . Error pass $ "Stack underflow"
Decompilation is the opposite of compilation. While compiling there is an implicit stack of expressions that are yet to be compiled. We make that stack explicit here, capturing expressions as they are decompiled from opcodes. For compound expressions, we inspect the stack and use the already decompiled expressions as the operands of the expression being decompiled. This way we build up larger expressions from smaller ones, culminating in the single top-level expression at the end. Finally, we check the stack to make sure that there is only one expression left in it. Note that like the disassembler, we do not verify that the decompiled expression is correct.
There is one tricky thing in decompilation: we lose the names of the variables when compiling, and are left with only stack indices. So while decompiling, we generate variable names from their stack indices by indexing a list of unique names.
That’s all for compilation and decompilation. Now, we use them together to make sure that everything works.
Testing the Compiler
We write some unit tests for the compiler, targeting both success and failure cases:
compilerSpec :: Spec
compilerSpec = describe "Compiler" $ do
forM_ compilerSuccessTests $ \(input, result) ->
it ("compiles: \"" <> BSC.unpack input <> "\"") $ do
parseCompile input `shouldBe` Right (Seq.fromList result)
forM_ compilerErrorTests $ \(input, err) ->
it ("fails for: \"" <> BSC.unpack input <> "\"") $ do
parseCompile input `shouldSatisfy` \case
Left (Error Compile msg) | err == msg -> True
_ -> False
it "fails for greater sized expr" $ do
compile (Num 1, 4) `shouldSatisfy` \case
Left
( Error Compile "Compiled bytecode size 3 is not same as expected size: 4"
) -> True
_ -> False
it "fails for lesser sized expr" $ do
compile (Num 1, 2) `shouldSatisfy` \case
Left (Error Compile "Instruction index 2 out of bound 1") -> True
_ -> False
where
parseCompile = parseSized >=> compile' 4 >=> disassemble
compilerSuccessTests :: [(BSC.ByteString, [Opcode])]
compilerSuccessTests =
[ ( "1",
[OPush 1]
),
( "1 + 2 - 3 * 4 + 5 / 6 / 1 + 1",
[ OPush 1, OPush 2, OAdd, OPush 3, OPush 4, OMul, OSub, OPush 5, OPush 6,
ODiv, OPush 1, ODiv, OAdd, OPush 1, OAdd ]
),
( "1 + (2 - 3) * 4 + 5 / 6 / (1 + 1)",
[ OPush 1, OPush 2, OPush 3, OSub, OPush 4, OMul, OAdd, OPush 5, OPush 6,
ODiv, OPush 1, OPush 1, OAdd, ODiv, OAdd ]
),
( "let x = 4 in x + 1",
[OPush 4, OGet 0, OPush 1, OAdd, OSwapPop]
),
( "let x = 4 in let y = 5 in x + y",
[OPush 4, OPush 5, OGet 0, OGet 1, OAdd, OSwapPop, OSwapPop]
),
( "let x = 4 in let x = x + 1 in x + 2",
[OPush 4, OGet 0, OPush 1, OAdd, OGet 1, OPush 2, OAdd, OSwapPop, OSwapPop]
),
( "let x = let y = 3 in y + y in x * 3",
[ OPush 3, OGet 0, OGet 0, OAdd, OSwapPop, OGet 0, OPush 3, OMul, OSwapPop ]
),
( "let x = let y = 1 + let z = 2 in z * z in y + 1 in x * 3",
[ OPush 1, OPush 2, OGet 1, OGet 1, OMul, OSwapPop, OAdd, OGet 0, OPush 1,
OAdd, OSwapPop, OGet 0, OPush 3, OMul, OSwapPop ]
),
("1/0", [OPush 1, OPush 0, ODiv]),
("-32768 / -1", [OPush (-32768), OPush (-1), ODiv])
]
compilerErrorTests :: [(BSC.ByteString, String)]
compilerErrorTests =
[ ("x", "Unknown variable: x"),
("let x = 4 in y + 1", "Unknown variable: y"),
("let x = y + 1 in x", "Unknown variable: y"),
("let x = x + 1 in x", "Unknown variable: x"),
("let x = 4 in let y = 1 in let z = 2 in y + x", "Stack overflow"),
("let x = 4 in let y = 5 in x + let z = y in z * z", "Stack overflow"),
("let a = 0 in let b = 0 in let c = 0 in let d = 0 in d", "Stack overflow")
]
In each test, we parse and compile an expression, and then disassemble the compiled bytes, which we match with expected list of opcodes, or an error message.
Let’s put these tests with the parser tests, and run them:
$ cabal test -O2
Running 1 test suites...
Test suite specs: RUNNING...
Parser
parses: "1 + 2 - 3 * 4 + 5 / 6 / 0 + 1" [✔]
parses: "1+2-3*4+5/6/0+1" [✔]
parses: "1 + -1" [✔]
parses: "let x = 4 in x + 1" [✔]
parses: "let x=4in x+1" [✔]
parses: "let x = 4 in let y = 5 in x + y" [✔]
parses: "let x = 4 in let y = 5 in x + let z = y in z * z" [✔]
parses: "let x = 4 in (let y = 5 in x + 1) + let z = 2 in z * z" [✔]
parses: "let x=4in 2+let y=x-5in x+let z=y+1in z/2" [✔]
parses: "let x = (let y = 3 in y + y) in x * 3" [✔]
parses: "let x = let y = 3 in y + y in x * 3" [✔]
parses: "let x = let y = 1 + let z = 2 in z * z in y + 1 in x * 3" [✔]
fails for: "" [✔]
fails for: "1 +" [✔]
fails for: "1 & 1" [✔]
fails for: "1 + 1 & 1" [✔]
fails for: "1 & 1 + 1" [✔]
fails for: "(" [✔]
fails for: "(1" [✔]
fails for: "(1 + " [✔]
fails for: "(1 + 2" [✔]
fails for: "(1 + 2}" [✔]
fails for: "66666" [✔]
fails for: "-x" [✔]
fails for: "let 1" [✔]
fails for: "let x = 1 in " [✔]
fails for: "let let = 1 in 1" [✔]
fails for: "let x = 1 in in" [✔]
fails for: "let x=1 inx" [✔]
fails for: "letx = 1 in x" [✔]
fails for: "let x ~ 1 in x" [✔]
fails for: "let x = 1 & 2 in x" [✔]
fails for: "let x = 1 inx" [✔]
fails for: "let x = 1 in x +" [✔]
fails for: "let x = 1 in x in" [✔]
fails for: "let x = let x = 1 in x" [✔]
AST interpreter
interprets: "1" [✔]
interprets: "1 + 2 - 3 * 4 + 5 / 6 / 1 + 1" [✔]
interprets: "1 + (2 - 3) * 4 + 5 / 6 / (1 + 1)" [✔]
interprets: "1 + -1" [✔]
interprets: "1 * -1" [✔]
interprets: "let x = 4 in x + 1" [✔]
interprets: "let x = 4 in let x = x + 1 in x + 2" [✔]
interprets: "let x = 4 in let y = 5 in x + y" [✔]
interprets: "let x = 4 in let y = 5 in x + let z = y in z * z" [✔]
interprets: "let x = 4 in (let y = 5 in x + y) + let z = 2 in z * z" [✔]
interprets: "let x = let y = 3 in y + y in x * 3" [✔]
interprets: "let x = let y = 1 + let z = 2 in z * z in y + 1 in x * 3" [✔]
fails for: "x" [✔]
fails for: "let x = 4 in y + 1" [✔]
fails for: "let x = y + 1 in x" [✔]
fails for: "let x = x + 1 in x" [✔]
fails for: "1/0" [✔]
fails for: "-32768 / -1" [✔]
Compiler
compiles: "1" [✔]
compiles: "1 + 2 - 3 * 4 + 5 / 6 / 1 + 1" [✔]
compiles: "1 + (2 - 3) * 4 + 5 / 6 / (1 + 1)" [✔]
compiles: "let x = 4 in x + 1" [✔]
compiles: "let x = 4 in let y = 5 in x + y" [✔]
compiles: "let x = 4 in let x = x + 1 in x + 2" [✔]
compiles: "let x = let y = 3 in y + y in x * 3" [✔]
compiles: "let x = let y = 1 + let z = 2 in z * z in y + 1 in x * 3" [✔]
compiles: "1/0" [✔]
compiles: "-32768 / -1" [✔]
fails for: "x" [✔]
fails for: "let x = 4 in y + 1" [✔]
fails for: "let x = y + 1 in x" [✔]
fails for: "let x = x + 1 in x" [✔]
fails for: "let x = 4 in let y = 1 in let z = 2 in y + x" [✔]
fails for: "let x = 4 in let y = 5 in x + let z = y in z * z" [✔]
fails for: "let a = 0 in let b = 0 in let c = 0 in let d = 0 in d" [✔]
fails for greater sized expr [✔]
fails for lesser sized expr [✔]
Finished in 0.0147 seconds
73 examples, 0 failures
Test suite specs: PASS
Awesome, it works! That’s it for this post. Let’s update our checklist:
In the next part, we write a virtual machine that runs our compiled bytecode, and do some benchmarking.
If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading!
There are VMs that execute hardware ISs instead of bytecode. Such VMs are also called Emulators because they emulate actual CPU hardware. Some examples are QEMU and video game console emulators.↩︎
VMs use virtual registers instead of actual CPU registers, which are often represented as a fixed size array of 1, 2, 4 or 8 byte elements.↩︎
I call them variables here but they do not actually vary. A better name is let bindings.↩︎
We could have used two separate opcodes here:
OSwap
andOPop
. That would result in same final result when evaluating an expression, but we’d have to execute two instructions instead of one forLet
expressions. Using a singleOSwapPop
instruction speeds up the execution, not only because we reduce the number of instructions, but also because we don’t need to do a full swap, only the half swap is enough because we pop the stack anyway after the swap. This also shows how we can improve the performance of our VMs by inventing specific opcodes for particular operations.↩︎Note the use of strict
Pair
s here, for performance reasons.↩︎
If you liked this post, please leave a comment.