Picoscript

Picoscript is a custom script system implemented in PICO-8 lua which allowed the PICO-8 port of Nebulus to use ROM data to store logic beyond the console’s limit of 8192 lua tokens.

The fantasy console PICO-8 has some strict input cartridge limits:

15616 compressed lua code characters (65536 uncompressed chars)
17152 bytes ROM
8192 lua code tokens

Its runtime limits are more generous: 64KB RAM and 2MB lua memory. This enables a cart to store data compressed, and uncompress it at runtime, for example during initialization.

However, lua tokens cannot be compressed, and PICO-8 does not expose any way to load lua code from strings. So if you want your cart to contain more logic than 8192 tokens allow, you have to resort to some kind of data-driven custom script system implemented in lua (which itself consumes precious tokens). These script systems could take many possible forms, ranging from lists of functions to call with arguments (encoded as a single string) to domain-specific languages with bytecode interpreters (where the bytecode is stored in ROM or lua strings).
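As a rough illustration of the simplest end of that range, a list of calls can be packed into one string and replayed at runtime. This is a sketch in plain Lua rather than anything picoscript does: the run_calls helper, the "name,arg;name,arg" encoding, and the setv/addv functions are all invented for the example, and a PICO-8 version would use built-ins such as split instead of the standard string library.

```lua
local unpack = table.unpack or unpack -- works on lua 5.1 and later

-- hypothetical call list format: "name,arg,arg;name,arg,arg;..."
-- env maps names to the functions the script is allowed to call
function run_calls(script, env)
 for call in script:gmatch("[^;]+") do
  local parts = {}
  for field in call:gmatch("[^,]+") do
   -- numeric fields become numbers, everything else stays a string
   parts[#parts + 1] = tonumber(field) or field
  end
  -- parts[1] is the function name, the rest are its arguments
  env[parts[1]](select(2, unpack(parts)))
 end
end

-- example usage with two made-up functions
state = {}
example_env = {
 setv = function(k, v) state[k] = v end,
 addv = function(k, v) state[k] = (state[k] or 0) + v end,
}
run_calls("setv,lives,3;addv,lives,2;setv,mode,title", example_env)
-- state.lives is now 5 and state.mode is "title"
```

The whole call list lives in one string, but only existing functions with simple arguments can be expressed; there is no control flow, which is the gap a bytecode-based system fills.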

Picoscript uses a slightly restricted subset of PICO-8 lua as the source language, and targets bytecode which is interpreted at init time to create lua closures that can be called directly from normal PICO-8 lua code. Using PICO-8 lua as the source language makes it very easy to convert functions to and from picoscript. It’s possible to create a new function in lua, test it, debug it, then promote it to picoscript to save tokens. Or try making a function picoscripted, and revert it to lua if it’s too slow.

Picoscript build macro

Here’s a very simple function from the Nebulus source code which we’ll follow through the picoscript toolchain.

`picoscript[[]]
function glstate(st)
 glstn=st
 bblf=1
end
[[]]`

It just sets a couple of glue screen state global variables. The backticks are triggers for the build script to invoke a macro (that is, call a build-time function), in this case the one called picoscript. The [[]] is just a delimiter used to mark the beginning and end of the quoted code that is passed as a string to the build-time function. So the build script executes this at build time:

picoscript("function glstate(st)\n glstn=st\n bblf=1\nend")

Nebulus’ build script is actually written in PICO-8 compatible lua code, executed by a C program that embeds the excellent z8lua, which is a modified version of the lua 5.2 C library that handles PICO-8 syntax extensions (like short if/while, bitwise operators, etc). That makes it straightforward to make a C function which compiles a lua string and outputs the lua bytecode.

Lua bytecode

In luac format, the lua bytecode for glstate looks like this:

main <(string):0,0> (3 instructions at 012A12A8)
0+ params, 2 slots, 1 upvalue, 0 locals, 1 constant, 1 function
        1       [4]     CLOSURE         0 0     ; 012A1300
        2       [1]     SETTABUP        0 -1 0  ; _ENV "glstate"
        3       [4]     RETURN          0 1
constants (1) for 012A12A8:
        1       "glstate"
locals (0) for 012A12A8:
upvalues (1) for 012A12A8:
        0       _ENV    1       0

function <(string):1,4> (3 instructions at 012A1300)
1 param, 2 slots, 1 upvalue, 1 local, 3 constants, 0 functions
        1       [2]     SETTABUP        0 -1 0  ; _ENV "glstn"
        2       [3]     SETTABUP        0 -2 -3 ; _ENV "bblf" 1
        3       [4]     RETURN          0 1
constants (3) for 012A1300:
        1       "glstn"
        2       "bblf"
        3       1
locals (1) for 012A1300:
        0       st      1       4
upvalues (1) for 012A1300:
        0       _ENV    0       0

The lua opcodes are documented in lopcodes.h.

Picoscript bytecode

The build step converts the lua opcodes to picoscript codes for the picoscript boot runtime. Here are the codes for glstate:

opcode	flags	a	b	c	d	meaning
`valop_b`	`b:const`	`0`	`_ENV`	n/a	n/a	r[0]=_ENV
`calld3`	`b:const` `c:reg`	`0`	`"glstn"`	`1`	`rawset`	rawset(r[0],"glstn",r[1])
`valop_b`	`b:const`	`0`	`_ENV`	n/a	n/a	r[0]=_ENV
`calld3`	`b:const` `c:imm`	`0`	`"bblf"`	`1`	`rawset`	rawset(r[0],"bblf",1)
`ret`	`b:imm`	`1`	`0`	n/a	n/a	return unpack(r,1,0)

Picoscript stores values in one of a few possible places. Immediate constants are 16-bit integers; they’re stored directly in the codes. All other constants are combined into a single constant table that’s stored in a lua string; a reference to the constant in the form of a 16-bit index into the constant table is stored in the codes. Finally, register references are also stored as small integer indexes directly in the codes. The flags control how the opcode arguments b and c are treated. a is 0 or 1 for jump instructions and a register reference for all other instructions, and d is an immediate value in call instructions or a constant reference for all other instructions that use it.
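The three storage classes can be pictured as a small decoding step. The sketch below rests on assumptions: the KIND_* names and the resolve_operand helper are hypothetical, and the real flag bit layout is not shown in this article.

```lua
-- hypothetical operand kinds (not the real picoscript flag values)
KIND_IMM, KIND_CONST, KIND_REG = 0, 1, 2

-- resolve one raw operand taken from the codes
-- returns the value for an immediate or a constant,
-- or nil plus the register index for a register reference
function resolve_operand(kind, raw, consts)
 if kind == KIND_IMM then
  return raw         -- the 16-bit integer is the value itself
 elseif kind == KIND_CONST then
  return consts[raw] -- raw is an index into the constant table
 end
 return nil, raw     -- raw is a register index, read later at call time
end

-- example constant table, as it might look for glstate
example_consts = {"glstn", "bblf"}
```

With this table, resolve_operand(KIND_CONST, 2, example_consts) yields "bblf", while a register operand is left unresolved because its value only exists once the function is called.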

Picoscript emulates lua’s rules around an internal array of register values. Function parameters appear starting at register index 1. Other registers may be assigned for local variables or temporary intermediate values (such as setting up function call arguments) elsewhere in the array.

Picoscript bootstrap

The build step also outputs a function definition table with the name of the function (e.g. “glstate”) and its length in instructions. The codes are compressed and stored in ROM. There is a psboot function which goes through each picoscript function definition, uncompresses codes, looks up constants, and calls psfun to generate a closure representing the function, then assigns this to a global variable with the name of the function. (The fun thing about psboot is that it is also a picoscripted function so it doesn’t take any PICO-8 tokens; it’s also implemented to only require bytes for its codes to save space, and doesn’t require any constant lookups for itself).

function psfun(code,ncode)
 local nfs,nf={}
 for i=ncode or #code,1,-5 do
  nf=mkop(nfs,nf,unpack(code,i-4,i))
  add(nfs,nf)
 end
 return function(...)
  return nf{...} -- call the wrapped picoscript closure with r=arguments
 end
end

This takes a table of codes with 5 values per instruction, and optionally the number of codes to wrap. (If you’re following along with the Picoscript bytecode example above, note that the opcode and flags are combined into one value, and the other four values are the instruction arguments a, b, c and d.) The loop walks the codes from last to first, so that nf already holds the following instruction’s closure when mkop is called. For each instruction it calls mkop to create a closure which performs the instruction’s operation. The closures are chained together: each one calls the next instruction’s closure, which is usually the subsequent instruction. When a branch is taken, the target closure is looked up in the nfs table instead.
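The backwards walk is easier to see in a stripped-down form. This is a simplified stand-in for psfun/mkop, not the real thing: the chain function and its two operations ("add" and "ret") are made up for illustration, and branching is left out.

```lua
-- prog is a list of made-up instructions:
--  {"add", n} adds n to r[1]
--  {"ret"}    returns r[1]
function chain(prog)
 local nxt
 -- walk backwards so each closure can capture the one that follows it
 for i = #prog, 1, -1 do
  local ins, after = prog[i], nxt
  if ins[1] == "add" then
   nxt = function(r) r[1] = r[1] + ins[2] return after(r) end
  else
   nxt = function(r) return r[1] end
  end
 end
 return nxt -- closure for the first instruction
end

add_five = chain({{"add", 2}, {"add", 3}, {"ret"}})
-- add_five({10}) returns 15
```

The opcode test happens once, while the chain is built; calling the result only runs the captured closures.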

The closure that is returned is variadic (takes a ...) - all the arguments are packed into a table which is actually the register array (the r parameter below), and that’s how function parameters are placed beginning at index 1. The first instruction’s closure nf is called.
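A minimal sketch of that packing step, with a hand-written body in place of generated closures (the wrap and sum2 names are invented here):

```lua
-- wrap a register-based body so callers can pass ordinary arguments
function wrap(body)
 return function(...)
  return body({...}) -- r[1], r[2], ... hold the call arguments
 end
end

sum2 = wrap(function(r)
 r[3] = r[1] + r[2] -- index 3 is free to use as a scratch register
 return r[3]
end)
-- sum2(4, 5) returns 9
```

Because {...} fills the table from index 1, index 0 is left unused by the arguments, which fits the way the glstate codes use r[0] as a scratch slot for _ENV.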

Picoscript operations

There are 16 possible picoscript operations. Many of them can refer to either a constant or a register with patterns like (kb or r[b]) - if the instruction argument is a constant then kb will contain its value, otherwise kb will be false so r[b] will be evaluated. Similarly for kc and r[c].
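The (kb or r[b]) pattern can be tried in isolation. In this sketch the make_add helper and its boolean is-register parameters are assumptions standing in for the flag-bit tests that mkop performs.

```lua
-- build a closure computing r[a] = b + c, where b and c are each
-- either a constant value or a register index
function make_add(a, b, b_is_reg, c, c_is_reg)
 local kb = not b_is_reg and b -- constant value, or false for a register
 local kc = not c_is_reg and c
 return function(r)
  r[a] = (kb or r[b]) + (kc or r[c])
  return r
 end
end

-- r[2] = r[1] + 10: b is a register reference, c is a constant
add_ten = make_add(2, 1, true, 10, false)
-- add_ten({5})[2] is 15
```

The constant-versus-register decision is made once when the closure is created, so each call only pays for a short-circuit or. In this sketch a constant that is itself nil or false would fall through to the register lookup, which is why the comments in mkop note that nil constants are handled differently.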

function mkop(nfs,nf,op,a,b,c,d)
 local kb=op&`fbits.rkb`==0 and b -- note: kb never nil, as picoscript changes nil constant to r[] lookup
 local kc=op&`fbits.rkc`==0 and c -- note: kc never nil, as picoscript changes nil constant to r[] lookup
 local af=a!=0
 return ({
  function(r) return unpack(r,a,b) end, -- opcode.ret
  function(r) return (af==r[0] and nfs[b] or nf)(r) end, -- opcode.jump_if_x
  function(r) r[a]=kb or r[b] return nf(r) end, -- opcode.valop_b
  function(r) r[a]=not (kb or r[b]) return nf(r) end, -- opcode.valop_not_b
  function(r) r[a]=(kb or r[b])==(kc or r[c]) return nf(r) end, -- opcode.valop_b_eq_c
  function(r) r[a]=(kb or r[b])+(kc or r[c]) return nf(r) end, -- opcode.numop_b_add_c
  function(r) r[a]=(kb or r[b])-(kc or r[c]) return nf(r) end, -- opcode.numop_b_sub_c
  function(r) r[a]=(kb or r[b])*(kc or r[c]) return nf(r) end, -- opcode.numop_b_mul_c
  function(r) r[a]=(kb or r[b])/(kc or r[c]) return nf(r) end, -- opcode.numop_b_div_c
  function(r) r[a]=(kb or r[b])\(kc or r[c]) return nf(r) end, -- opcode.numop_b_idiv_c
  function(r) r[a]=(kb or r[b])%(kc or r[c]) return nf(r) end, -- opcode.numop_b_mod_c
  function(r) return (af==((kb or r[b])<(kc or r[c])) and nfs[d] or nf)(r) end, -- opcode.jump_b_lt_c
  function(r) return (af==((kb or r[b])<=(kc or r[c])) and nfs[d] or nf)(r) end, -- opcode.jump_b_le_c
  function(r) local x,y,z=r[a](unpack(r,a+1,b))
   for j=c,d do r[j],x,y=x,y,z end return nf(r) end, -- opcode.call
  function(r) r[a]=d(kb or r[b],kc or r[c]) return nf(r) end, -- opcode.calld2
  function(r) d(r[a],kb or r[b],kc or r[c]) return nf(r) end -- opcode.calld3
 })[1+op%`fbits.code.mask+1`]
end

Closure example

So psfun/mkop transforms our glstate bytecode to chained closures that would be something like this if hand-written:

rawset(_ENV,"glstate",
 function(...) 
  return (
   function(r)
    r[0]=_ENV or r[_ENV] -- opcode.valop_b; r[0]=_ENV
    return (
     function(r)
      rawset(r[0],
       "glstn" or r["glstn"],
       false or r[1]) -- opcode.calld3; rawset(r[0],"glstn",r[1])
      return (
       function(r)
        r[0]=_ENV or r[_ENV] -- opcode.valop_b; r[0]=_ENV
        return (
         function(r)
          rawset(r[0],
           "bblf" or r["bblf"],
           1 or r[1]) -- opcode.calld3; rawset(r[0],"bblf",1)
          return (
           function(r)
            return unpack(r,1,0) -- opcode.ret; return
           end
          )(r)
         end
        )(r)
       end
      )(r)
     end
    )(r)
   end
  ){...} 
 end
)

This sets up a global variable named “glstate” that points to a function which can be called directly from PICO-8 code with, for example, glstate(3).

Operation tradeoffs

Note that the chosen picoscript operation types are not inevitable; you could design a different script system with different tradeoffs. For example, I figured concatenating strings would be rarely performed, so picoscript generates a call to a user-defined concat function when the source uses the .. lua operator, rather than the operation being intrinsic to mkop. Similarly, bitwise operations call their function equivalents (band, bor, etc). Calling a two-parameter function and assigning its result to a register was common, so the opcode.calld2 operation was defined for it.

A custom script system also doesn’t need to generate closures; it could instead have a bytecode interpreter. At one point Nebulus had such an interpreter and it only took ~200 tokens (compared to roughly 450 tokens for the picoscript runtime), but it was significantly slower.
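For comparison, an interpreter-style runtime might look something like the following. This is not the interpreter Nebulus once had; the instruction format and the three operations are invented for the sketch.

```lua
-- made-up instruction format: {op, a, b, c}
--  "addk": r[a] = r[b] + c (c is a constant)
--  "jlt" : if r[a] < r[b] then continue at instruction c
--  "ret" : return r[a]
function interpret(prog, ...)
 local r, pc = {...}, 1
 while true do
  local ins = prog[pc]
  local op, a, b, c = ins[1], ins[2], ins[3], ins[4]
  pc = pc + 1
  if op == "addk" then
   r[a] = r[b] + c
  elseif op == "jlt" then
   if r[a] < r[b] then pc = c end
  elseif op == "ret" then
   return r[a]
  end
 end
end

-- step r[2] up by 3 until it is no longer below r[1]
count_prog = {
 {"addk", 2, 2, 3}, -- 1: r[2] = r[2] + 3
 {"jlt", 2, 1, 1},  -- 2: if r[2] < r[1] then go to 1
 {"ret", 2},        -- 3: return r[2]
}
-- interpret(count_prog, 10, 0) returns 12
```

One plausible reason for the speed difference is visible here: the opcode dispatch and operand decoding run again for every executed instruction, whereas the closure approach does that work once when the closures are created.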

Picoscript limitations

Picoscript as defined has some limitations, such as:

At most 3 return results are supported. This avoids creating a table every time a function returns results in opcode.call, which would be unnecessarily expensive in most cases.
Tail calls are not supported.
The only supported upvalue is _ENV.
In lua when a call to a function appears as the last argument in a function call (such as bar() in foo(x,y,bar())), then all the results of the inner call become arguments for the outer call. Picoscript doesn’t support this, and if the last argument is a call it must be enclosed in parentheses to force the lua compiler to truncate results to just one argument (like foo(x,y,(bar()))).
Generating closures is not supported.
To avoid needing special operations for table getters and setters, picoscript generates calls to rawget and rawset instead. This means metatables are not supported.
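The parenthesised-call workaround from the list above relies on standard lua behaviour, which can be checked with a small example (bar and count are throwaway names):

```lua
function bar() return 1, 2, 3 end

-- returns how many arguments it received
function count(...) return select("#", ...) end

n_expanded = count("x", "y", bar())    -- 5: all three results of bar() are passed
n_truncated = count("x", "y", (bar())) -- 3: parentheses keep only the first result
```

The second form always passes a fixed number of arguments, which is the case picoscript can handle.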

It’s pretty clear that picoscript code will not run as fast as regular PICO-8 lua code. However in Nebulus it was acceptable for much of the initialization, glue screen logic, and even parts of the bonus level.

PICO-8 Nebulus stats

Here are some stats from Nebulus:

lua code: 8184 tokens (8 tokens remaining!)
picoscript code: 4169 tokens saved
picoscript runtime lua code: 456 tokens used
ROM used for picoscripted functions: 7181 bytes
lua chars for constant table (after minify): 1524 bytes (1090 compressed)
lua memory difference before/after psboot: 621.2KB (closure/upvalue runtime memory)

So after compression it averaged to about 2 bytes of ROM/chars for every picoscripted token.

Note that 621.2KB is ~30% of the 2MB limit to lua memory, which is fine for Nebulus as the game’s total lua memory usage never exceeds 60%. An interpreter is probably lighter on lua memory usage, so that’s an option if a game is lua memory constrained.
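The two ratios quoted above can be reproduced from the stats. This assumes the "after compression" figure means the ROM bytes plus the compressed constant table, and that 2MB is counted as 2048KB.

```lua
rom_bytes = 7181     -- ROM used for picoscripted functions
const_bytes = 1090   -- constant table chars after compression
tokens_saved = 4169  -- picoscript code tokens saved

-- bytes of ROM/chars spent per picoscripted token
bytes_per_token = (rom_bytes + const_bytes) / tokens_saved -- about 1.98

-- share of the lua memory limit taken by the picoscript closures
mem_fraction = 621.2 / 2048 -- about 0.30
```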

Here’s a list of the functions that were picoscripted: progress, tbl, val, glstate, pcelputpixmap, pokelrpal, pixmapinit, colorrampinit, towerinit, waveinit, gwavedraw, gwavesdraw, tileinitlevel, levelreset, glueinit, gameinit, skyinit, levelinit, addbubbles, updlevelfishp, glwait, updtitle, updlevelinit, updlevelexit, updlevelfail, updgameover, chkhall, updhallfame, gtrans, wavebackup, waverestore, cacheuitowers, easetext, drtitle, drlevelinit, drlevelexit, drlevelfail, drlevelfish, drgameover, drhallfame, qglue, bblinit, sfxinit, _init, nsndnote, nsndstep, nsndrecordaux, nsndrecord, nsndinit, nsndmusic, and nsndsong.