The Assembly Language Approach

Digital computers run on word-sized streams of bits, which can be broken into instructions, values, and memory addresses. Assembly language is a thin textual layer over this machine language: its mnemonics map one-to-one to instruction codes, with shorthand for memory addresses and values. Assembly language also provides affordances like comments, labels, and named data (distinguished by size, not type).

Since assembly language maps to a particular machine code, it is the least portable way to write a program. Assembly language is closely tied to its chip architecture, and today it is used mainly for specialized hardware drivers and similar niches.

Consider this example of RISC-V assembly language (due to Stephen Marz):

# Determine the length of a C-style string by adding 1 until we find the terminator '\0'.
.section .text
.global strlen
strlen:
    # a0 = const char *str
    li     t0, 0         # i = 0
1: # Start of for loop
    add    t1, t0, a0    # Add the byte offset for str[i]
    lb     t1, 0(t1)     # Dereference str[i]
    beqz   t1, 1f        # if str[i] == 0, break for loop
    addi   t0, t0, 1     # Add 1 to our iterator
    j      1b            # Jump back to condition (1 backwards)
1: # End of for loop
    mv     a0, t0        # Move t0 into a0 to return
    ret                  # Return back via the return address register
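For readers who do not read RISC-V, the same loop can be sketched in Python. This is only an illustration: the `memory` buffer stands in for addressable memory, and the variable names `a0`, `t0`, and `t1` are borrowed from the registers in the listing above.

```python
def strlen(memory: bytes, a0: int = 0) -> int:
    """Count bytes starting at offset a0 until a zero byte is found.

    Mirrors the assembly listing line by line; `memory` is a stand-in
    for the machine's address space (an assumption of this sketch).
    """
    t0 = 0                        # li   t0, 0      (i = 0)
    while True:                   # 1:              (start of loop)
        t1 = memory[a0 + t0]      # add + lb        (load str[i])
        if t1 == 0:               # beqz t1, 1f     (terminator found, leave loop)
            break
        t0 += 1                   # addi t0, t0, 1  (i += 1)
                                  # j    1b         (back to the top of the loop)
    return t0                     # mv a0, t0 ; ret


print(strlen(b"hello\0"))         # 5
```

Unlike the assembly version, the Python sketch raises an `IndexError` if no terminator is present, rather than reading past the end of the string.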

Now, Nock doesn’t know about many of the things that are first-class elements of assembly language, like memory utilization or layout. (Everything is a noun, after all: an atom or a binary tree of atoms.) Nock is also portable, not being tied to any particular machine architecture. (However, it does insist on a least-significant-byte-first ordering.)
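To make the byte-ordering remark concrete, here is a small Python sketch of how text can be packed into a single atom with the least-significant byte first. The helper names `cord_to_atom` and `atom_to_cord` are invented for this illustration, and modelling atoms as Python integers and cells as 2-tuples is an assumption of the sketch, not something Nock prescribes.

```python
def cord_to_atom(text: str) -> int:
    """Pack text into one atom, first character in the lowest byte."""
    return int.from_bytes(text.encode("utf-8"), "little")


def atom_to_cord(atom: int) -> str:
    """Unpack an atom back into text, reading the lowest byte first."""
    length = (atom.bit_length() + 7) // 8
    return atom.to_bytes(length, "little").decode("utf-8")


# 'a' = 0x61 lands in the lowest byte, 'c' = 0x63 in the highest.
print(hex(cord_to_atom("abc")))   # 0x636261

# A noun is either an atom or a cell (an ordered pair of nouns).
example_noun = (cord_to_atom("abc"), (1, 2))
```

Because the ordering is fixed by convention rather than by the host machine, the same atom means the same text on any architecture.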

In that sense, Nock is much more like a bytecode, a similar concept designed for execution on a software virtual machine. (In fact, while Nock isn’t strictly a bytecode, for technical reasons such as backtracking, it is currently converted to a bytecode for execution in the Vere interpreter in nock.c.)

Compare Nock’s instantiation of the string length program above:

[8 [1 0] [1 8 [1 0] 8 [1 6 [5 [1 0] 0 61] [4 0 6] 9 2 10 [30 0 61] 10 [6 4 0 6] 0 1] 9 2 0 1] 0 1]
::
[8 [1 0]                            :: default input = empty string
   [1 8 [1 0]                       :: default counter = zero
        8 [1 6 [5 [1 0] 0 61]       :: check for zero in string
               [4 0 6]              :: if so, increment counter and return
               9 2 10 [30 0 61]     :: otherwise, replace the value with its tail
                   10 [6 4 0 6]     :: and increment the counter when you loop again
             0 1]                   :: 
          9 2 0 1]                  :: 
   0 1]                             :: 

The high-level logic is similar, but the Nock version does not spell out the mechanics of the loop and the return; it only describes the computation to be performed.
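To see the formula above actually run, the following is a minimal sketch of a Nock evaluator in Python, covering opcodes 0 through 10 as read from the Nock 4K reduction rules (hints, opcode 11, are omitted). It is a teaching toy, not how Vere works. The names `noun`, `slot`, `edit`, `nock`, `STRLEN`, and `length` are invented for this example; atoms are modelled as Python integers and cells as 2-tuples. The string is assumed to be a null-terminated list of character atoms such as [97 98 99 0].

```python
def noun(x):
    """Build a noun from nested lists: [a b c] means [a [b c]]."""
    if isinstance(x, int):
        return x
    items = [noun(i) for i in x]
    result = items[-1]
    for item in reversed(items[:-1]):
        result = (item, result)
    return result


def slot(axis, n):
    """Tree addressing: 1 is the whole noun, 2k the head and 2k+1 the tail of k."""
    if axis < 1:
        raise ValueError("axis must be at least 1")
    if axis == 1:
        return n
    parent = slot(axis // 2, n)
    if not isinstance(parent, tuple):
        raise ValueError("axis points below an atom")
    return parent[axis % 2]


def edit(axis, value, n):
    """Return a copy of n with the sub-noun at axis replaced by value."""
    if axis == 1:
        return value
    if axis % 2 == 0:
        rebuilt = (value, slot(axis + 1, n))
    else:
        rebuilt = (slot(axis - 1, n), value)
    return edit(axis // 2, rebuilt, n)


def nock(s, f):
    """Evaluate formula f against subject s."""
    op, arg = f
    if isinstance(op, tuple):                  # a cell in head position: build a cell
        return (nock(s, op), nock(s, arg))
    if op == 0:                                # fetch from the subject
        return slot(arg, s)
    if op == 1:                                # constant
        return arg
    if op == 2:                                # evaluate a computed formula
        b, c = arg
        return nock(nock(s, b), nock(s, c))
    if op == 3:                                # cell test (0 is yes, 1 is no)
        return 0 if isinstance(nock(s, arg), tuple) else 1
    if op == 4:                                # increment
        return nock(s, arg) + 1
    if op == 5:                                # equality test
        b, c = arg
        return 0 if nock(s, b) == nock(s, c) else 1
    if op == 6:                                # if-then-else
        b, (c, d) = arg
        test = nock(s, b)
        if test not in (0, 1):
            raise ValueError("test must produce 0 or 1")
        return nock(s, c if test == 0 else d)
    if op == 7:                                # compose
        b, c = arg
        return nock(nock(s, b), c)
    if op == 8:                                # push a value onto the subject
        b, c = arg
        return nock((nock(s, b), s), c)
    if op == 9:                                # build a core, then run one of its arms
        b, c = arg
        core = nock(s, c)
        return nock(core, slot(b, core))
    if op == 10:                               # edit part of a noun
        (b, c), d = arg
        return edit(b, nock(s, c), nock(s, d))
    raise ValueError("unsupported opcode")


STRLEN = noun([8, [1, 0], [1, 8, [1, 0], 8,
               [1, 6, [5, [1, 0], 0, 61], [4, 0, 6],
                9, 2, 10, [30, 0, 61], 10, [6, 4, 0, 6], 0, 1],
               9, 2, 0, 1], 0, 1])


def length(chars):
    """Build the gate, replace its sample (axis 6), and run its arm (axis 2)."""
    gate = nock(0, STRLEN)
    core = edit(6, noun(chars), gate)
    return nock(core, noun([9, 2, 0, 1]))


print(length([97, 98, 99, 0]))                 # 3
```

Note that evaluating the formula on its own only builds the function (a gate); a caller must then place the argument at axis 6 and run the code at axis 2. In this sketch, passing the bare terminator (an empty string) raises an error, because the formula inspects the tail of its input before checking anything else.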

While the assembly program happens to make no reference to values that may already exist in its scope, it certainly could: assembly has no subject-style restriction of scope. The continuation condition of the program (j) and the termination condition (beqz) are both GOTO statements that jump to labels within the routine, and they could have jumped anywhere.

For assembly, there’s nothing “special” about a function: a function is just a data pattern with a certain header that supports a GOTO jump and knows where to find its arguments. Thus also with Nock, in its way.

Nock, like assembly language, requires the coder or compiler to take pains to express complex ideas using simple pieces. Probably the biggest difference between the two, however, is that Nock simply has no idea how its instructions will be instantiated on the metal; it is a specification of equivalent behavior. This makes Nock feel slightly more declarative than assembly language.