Nouns and Data#

Nock is homoiconic, meaning that its code and data share the same representation: the noun. A noun is either an atom (a non-negative integer) or a cell (an ordered pair of two nouns). This simple structure, like a Lisp S-expression, can represent any data structure.
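As a rough illustration of the definition above, a noun can be modeled in Python as either a non-negative `int` (an atom) or a 2-tuple of nouns (a cell). This is only a sketch for reasoning about the structure; the name `is_noun` is hypothetical and not part of any Nock implementation.

```python
def is_noun(x) -> bool:
    """Return True if x is an atom (non-negative int) or a cell (pair of nouns)."""
    if isinstance(x, bool):
        # Python bools are ints; exclude them so atoms are always plain integers.
        return False
    if isinstance(x, int):
        return x >= 0
    if isinstance(x, tuple) and len(x) == 2:
        return is_noun(x[0]) and is_noun(x[1])
    return False


print(is_noun(42))            # an atom
print(is_noun((1, (2, 3))))   # a cell whose tail is another cell
print(is_noun(-1))            # not a noun: atoms are non-negative
```

Note that a 3-tuple is rejected: a cell always has exactly two children, and larger structures are built by nesting.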

Atoms#

Atoms in Nock are non-negative integers, which can represent a variety of data types depending on the context. The context is determined by the programmer, who decides on conventions for interpreting atoms in different ways. For instance, an atom may represent any of the kinds of values described below.

True/False Values (“Loobeans”)#

Nock already supplies a way to represent true/false values, using 0 for true and 1 for false (e.g., in opcode 5 and opcode 6). Because the polarity is flipped relative to conventional boolean logic, we call this “loobean” logic. (George Boole himself did not assign particular numeric values to true and false; rather, his use of 0 and 1 as symbols for the empty set and the universal set laid the groundwork for this interpretation. The numerical convention was driven by Claude Shannon’s work on switching circuits, where a “closed” circuit was represented by 0 and an “open” circuit by 1. This use of 0 and 1 agrees with Nock’s convention, but the formal exposition of binary logic using + for OR and × for AND typically assumes 1 is true and 0 is false.)
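A small sketch of the loobean convention, assuming we model atoms as Python integers. The names `YES`, `NO`, `to_loobean`, `from_loobean`, and `loob_equal` are hypothetical helpers chosen for illustration.

```python
YES = 0  # loobean "true"
NO = 1   # loobean "false"


def to_loobean(flag: bool) -> int:
    """Convert a Python bool to a loobean atom (0 for true, 1 for false)."""
    return YES if flag else NO


def from_loobean(atom: int) -> bool:
    """Convert a loobean atom back to a Python bool, rejecting other atoms."""
    if atom not in (YES, NO):
        raise ValueError(f"not a loobean: {atom}")
    return atom == YES


def loob_equal(a, b) -> int:
    """Mimic an equality test that answers with a loobean."""
    return to_loobean(a == b)


print(loob_equal(7, 7))  # 0, meaning "yes"
print(loob_equal(7, 8))  # 1, meaning "no"
```

Rejecting atoms other than 0 and 1 in `from_loobean` is a design choice of this sketch; it makes accidental use of an ordinary number as a truth value fail loudly.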

Signed Integers#

In C, a signed integer has a fixed bit width (e.g., 32 or 64 bits), and negative values are typically represented using two’s complement encoding. Since Nock atoms are unbounded non-negative integers, there is no fixed-width sign bit to rely on; instead, we can represent signed integers by using a convention such as mapping negative integers to odd atoms and non-negative integers to even atoms. (This means you have to be sure what you’re looking at! There is no structural way to tell these atoms apart from ordinary unsigned ones.) For example:

    0    :: represents integer 0
    1    :: represents integer -1
    2    :: represents integer 1
    3    :: represents integer -2
    4    :: represents integer 2
    ...

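This scheme, called ZigZag encoding, maps a non-negative integer \(n\) to the atom \(2n\) and a negative integer \(n\) to the atom \(-2n - 1\). It allows arbitrary-width signed integers to be represented as Nock atoms.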
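The odd/even mapping can be sketched in a few lines of Python. The function names `zigzag_encode` and `zigzag_decode` are hypothetical; because Python integers are unbounded, the sketch works for arbitrarily large values, much like atoms.

```python
def zigzag_encode(n: int) -> int:
    """Map a signed integer to an atom: non-negative -> even, negative -> odd."""
    return 2 * n if n >= 0 else -2 * n - 1


def zigzag_decode(atom: int) -> int:
    """Recover the signed integer from its atom."""
    if atom < 0:
        raise ValueError("atoms are non-negative")
    if atom % 2 == 0:
        return atom // 2
    return -((atom + 1) // 2)


for n in (0, -1, 1, -2, 2):
    print(zigzag_encode(n), "represents integer", n)
```

Decoding only makes sense if you already know the atom was written with this convention; nothing in the atom itself records that.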

Dates and Timestamps#

Dates and timestamps can be represented as integers counting seconds (or milliseconds, microseconds, etc.) since a fixed epoch (e.g., Unix epoch starting at January 1, 1970). Nock does not impose any epoch or time unit convention, so the platform must fix this. Urbit utilizes January 1, 2000 as its epoch, counting in intervals of \(2^{-64}\) seconds since that date. That means that one second is represented as the atom \(2^{64}\):

0b1.0000.0000.0000.0000.0000.0000.0000.0000.0000.0000.0000.0000.0000.0000.0000.0000
0x1.0000.0000.0000.0000
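To make the \(2^{-64}\)-second unit concrete, here is a sketch that converts a duration in seconds to an atom and back, and prints the atom in the dotted hexadecimal shown above. It deals only with durations, not with any particular epoch; the names `seconds_to_atom`, `atom_to_seconds`, and `dotted_hex` are hypothetical.

```python
from fractions import Fraction

UNITS_PER_SECOND = 2 ** 64  # one unit is 2^-64 seconds


def seconds_to_atom(seconds) -> int:
    """Convert a non-negative duration in seconds to an atom, truncating any remainder."""
    return int(Fraction(seconds) * UNITS_PER_SECOND)


def atom_to_seconds(atom: int) -> Fraction:
    """Convert an atom back to an exact number of seconds."""
    return Fraction(atom, UNITS_PER_SECOND)


def dotted_hex(atom: int) -> str:
    """Format an atom as hexadecimal with a dot every four digits from the right."""
    digits = format(atom, "x")
    groups = []
    while digits:
        groups.append(digits[-4:])
        digits = digits[:-4]
    return "0x" + ".".join(reversed(groups))


print(dotted_hex(seconds_to_atom(1)))  # 0x1.0000.0000.0000.0000
print(atom_to_seconds(2 ** 63))        # 1/2
```

Using `Fraction` rather than floating point keeps the conversion exact, since a 64-bit fractional part holds more precision than a double can.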

For text representations, take a look at the next section.

Arrays and Lists#

One of the most interesting distinctions in data representation is the difference between a contiguous byte array and a non-contiguous tree structure. Nock highlights this in the representation of a text string, which can be either an atom or a list:

As an atom, a contiguous series of bytes (ASCII or UTF-8 text) is represented by a single integer. This feels a bit unconventional, but writing it down in hexadecimal makes it more apparent what’s going on. For example, the string 'hello' can be represented as the atom:
```
0x6f.6c6c.6568
```
which is the hexadecimal representation of the ASCII byte values for ‘h’, ‘e’, ‘l’, ‘l’, ‘o’. Note two things about this representation:
1. The bytes are packed in an order such that the least-significant byte is interpreted first.
2. The syntax with 0x and . dot separators is a convenient notation we’ll adopt from the Nock-derived language Hoon.
A Unicode string ('🐝𐐸𐐯𐑊𐐬') would look like this:
```
0xac90.90f0.8a91.90f0.af90.90f0.b890.90f0.9d90.9ff0
```
where each Unicode code point is encoded in UTF-8. For example, the bee emoji ‘🐝’ (U+1F41D) is encoded as the byte sequence 0xF0 0x9F 0x90 0x9D, which appears in Nock’s little-endian order as 0x9d90.9ff0.

(By convention, we write such atoms surrounded by single quotes; the ' character is called soq.)
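The atom form of a string can be reproduced in Python by encoding the text as UTF-8 and reading the bytes as a little-endian integer. This is a sketch; the names `atom_from_text` and `text_from_atom` are hypothetical.

```python
def atom_from_text(text: str) -> int:
    """Pack UTF-8 bytes into one atom, with the first byte least significant."""
    return int.from_bytes(text.encode("utf-8"), "little")


def text_from_atom(atom: int) -> str:
    """Unpack an atom back into text, reading bytes from the least significant end."""
    length = (atom.bit_length() + 7) // 8
    return atom.to_bytes(length, "little").decode("utf-8")


hello = atom_from_text("hello")
print(hex(hello))             # 0x6f6c6c6568
print(text_from_atom(hello))  # hello
```

Because the length is recovered from the atom's bit length, any trailing zero bytes in the original text would be lost in a round trip.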
As a list, the string needs a regular structural format that breaks the bytes up into separate atoms. Nock’s notation conveniently groups cells to the right, so we adopt that here and elect to use 0x0 to terminate the list. Thus, the string "hello" can be represented as the noun (written first with every bracket shown, then in the equivalent right-grouping shorthand):
```
[0x68 [0x65 [0x6c [0x6c [0x6f 0x0]]]]]
[0x68 0x65 0x6c 0x6c 0x6f 0x0]
```
which is a cell whose head is the atom for ‘h’ (0x68), and whose tail is another cell whose head is ‘e’ (0x65), and so forth, terminating with the atom 0x0. The tree representation looks like this:
```
    .
   / \
 0x68  .
      / \
    0x65  .
         / \
       0x6c  .
            / \
          0x6c  .
               / \
             0x6f 0x0
```
- Calculate the address of each character’s atom in the tree.
A Unicode string ("🐝𐐸𐐯𐑊𐐬") would look like this:
```
[0xf0 0x9f 0x90 0x9d 0xf0 0x90 0x90 0xb8 0xf0 0x90 0x90 0xaf 0xf0 0x90 0x91 0x8a 0xf0 0x90 0x90 0xac 0x0]
```
Compare the byte sequence for “🐝” as above: 0xF0 0x9F 0x90 0x9D, which now visually appears in order.

(It’s not obvious that you would separate at the byte level instead of the character level, but this is the convention in both Hoon and Jock.)
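The list form can be sketched by modeling cells as Python 2-tuples and building the noun from the last byte backward, so that each new cell wraps the list built so far. The names `list_from_text` and `text_from_list` are hypothetical.

```python
def list_from_text(text: str):
    """Build a right-branching, 0-terminated list of UTF-8 bytes."""
    noun = 0  # the terminator
    for byte in reversed(text.encode("utf-8")):
        noun = (byte, noun)
    return noun


def text_from_list(noun) -> str:
    """Walk the list from head to tail, collecting bytes until the terminator."""
    out = bytearray()
    while noun != 0:
        head, noun = noun
        out.append(head)
    return out.decode("utf-8")


print(list_from_text("hello"))
# (104, (101, (108, (108, (111, 0)))))
print(text_from_list(list_from_text("hello")))  # hello
```

Unlike the atom form, reading the n-th byte here means walking n cells, but the bytes appear in their natural order and a zero byte in the middle of the data is preserved.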

Structures#

Since all data are nouns, it is most straightforward to encode some type metadata in the structural layout of the noun itself. Just like how a list is a rightward-branching tree terminated by 0x0, we can define other structures by agreeing on particular layouts.

Unit/Maybe#

Sometimes a computation may result in a valid answer, or it may fail in some way (whether expected or not). Nock can make this distinction with the unit pattern, which is either 0x0 (indicating “no value”) or a cell [0x0 value] (indicating “some value”). This is analogous to the Maybe type in Haskell, or Option in Rust.
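A sketch of the unit pattern, again modeling cells as Python 2-tuples. The names `NONE`, `some`, `unwrap`, and `safe_div` are hypothetical.

```python
NONE = 0  # the atom 0x0 means "no value"


def some(value):
    """Wrap a value as the cell [0x0 value]."""
    return (0, value)


def unwrap(unit):
    """Return the wrapped value, or raise if the unit is empty."""
    if unit == NONE:
        raise ValueError("unit is empty")
    _tag, value = unit
    return value


def safe_div(a: int, b: int):
    """Integer division that answers with a unit instead of failing."""
    return NONE if b == 0 else some(a // b)


print(safe_div(10, 2))  # (0, 5)
print(safe_div(10, 0))  # 0
print(some(0))          # (0, 0), which is distinct from the empty unit 0
```

The last line shows why the wrapping cell matters: a successful result of 0 is the cell `[0x0 0x0]`, which cannot be confused with the bare atom that signals "no value".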

Tagged Unions/Sum Types#

Going one step beyond the unit (which always has a head of 0x0), we can define a tagged union (sum type) by using different atoms in the head position to indicate which variant of the union is present. For example, suppose we want to represent either an integer or a text string. We can define the following convention:

[0x1 100]             :: variant 1: integer
[0x2 'hello']         :: variant 2: text string

More commonly, we would prefer to use the ASCII text representation of the tag for readability. In this case, we can use the atoms 0x74.6e69 ('int') and 0x72.7473 ('str'), packed with the least-significant byte first as described for strings above:

['int' 100]           :: variant 1: integer
['str' 'hello']       :: variant 2: text string
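Dispatching on the tag can be sketched as follows, with cells as Python 2-tuples and the tag text packed least-significant-byte first, as for the string atoms earlier. The names `tag`, `text_from_atom`, and `describe` are hypothetical.

```python
def tag(text: str) -> int:
    """Pack short ASCII text into an atom, first byte least significant."""
    return int.from_bytes(text.encode("utf-8"), "little")


def text_from_atom(atom: int) -> str:
    """Unpack an atom into text, reading bytes from the least significant end."""
    length = (atom.bit_length() + 7) // 8
    return atom.to_bytes(length, "little").decode("utf-8")


def describe(noun) -> str:
    """Inspect the head of a tagged cell and interpret the tail accordingly."""
    head, payload = noun
    if head == tag("int"):
        return f"integer {payload}"
    if head == tag("str"):
        return f"text {text_from_atom(payload)!r}"
    raise ValueError(f"unknown tag: {head:#x}")


print(describe((tag("int"), 100)))           # integer 100
print(describe((tag("str"), tag("hello"))))  # text 'hello'
```

The payload of both variants is just an atom; only the tag in the head tells the reader whether to treat it as a number or as packed text.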

Vectors/Matrices#

What if we wanted to represent a 2D matrix of numbers? The answer may even depend on what kind of number we want to represent, since mathematics commonly uses integers (including negative numbers), floating-point numbers, and more. The convention in Hoon uses a structural pair of a type metadata structure and an atom.

This is a \(2 \times 2\) matrix of unsigned integers, each 1 byte wide (\(2^3\) bits, hence the 3 in the metadata), with a type tag of 0x746e.6975 ('uint'):

[[[2 2 0] 3 0x746e.6975 0] 0x1.0302.0100]

\[\begin{split} \begin{bmatrix} 0 & 1 \\ 2 & 3 \end{bmatrix} \end{split}\]

The data are in row-major order starting with the LSB.
One interesting consequence of Nock atoms having arbitrary size is that an atom cannot carry leading zeros. If you want to represent an array of zeros, you have to pin a 0b1 bit at the MSB to preserve the contiguous size; this is the leading 1 in 0x1.0302.0100 above.
The fourth entry in the metadata structure can be used in various ways depending on the array type; in this case, it is 0 to indicate no special options.
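The data atom in this example can be reproduced with a short packing routine. This is a sketch of the layout as described here (row-major, LSB first, with a pinned 1 bit above the last element), not the API of any Hoon library; the names `pack_array` and `unpack_array` and the `log2_bits` parameter are hypothetical.

```python
def pack_array(rows, log2_bits: int = 3) -> int:
    """Pack a 2D list of unsigned integers into one atom, row-major, LSB first."""
    width = 1 << log2_bits  # bits per element, e.g. 2^3 = 8
    flat = [value for row in rows for value in row]
    atom = 1 << (width * len(flat))  # pinned bit that preserves the total size
    for index, value in enumerate(flat):
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        atom |= value << (width * index)
    return atom


def unpack_array(atom: int, shape, log2_bits: int = 3):
    """Recover the 2D list from a packed atom, ignoring the pinned bit."""
    width = 1 << log2_bits
    n_rows, n_cols = shape
    mask = (1 << width) - 1
    flat = [(atom >> (width * i)) & mask for i in range(n_rows * n_cols)]
    return [flat[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


print(hex(pack_array([[0, 1], [2, 3]])))  # 0x103020100
print(hex(pack_array([[0, 0], [0, 0]])))  # 0x100000000
print(unpack_array(0x103020100, (2, 2)))  # [[0, 1], [2, 3]]
```

The all-zeros case shows the purpose of the pinned bit: without it, the atom would collapse to 0 and the array's size could not be recovered from the data alone.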