What Every Hooner Should Know About Literals on Urbit
The Hoon compiler handles value/aura conversions and nesting automatically, and most of the time you shouldn't be surprised once you've satisfied the type checker. But sometimes you run into something like this, which may defy your expectations:
    > %~
    ~
    > %~~
    %''
    > %~~~    :: invalid!
    > %~~~~
    %'~'
What's going on here? I thought I had an innocent value like a `term`, which starts with `%` cen. I didn't: I had inadvertently stumbled into atom literal syntax, which has its own vagaries and requirements. Let's take a dive together down the rabbit hole of representing values in Hoon. It turns out that there is a lot going on!

In this article, I'll walk you through what was going wrong through the lens of Hoon's constants and literal syntax.
The first hint that something was wrong with my `%~` above should have been that I was using `~` sig and fooling myself that it was a `term`. It wasn't. What I was actually doing was specifying null, `~`, as a constant. Most of the time when writing code, I haven't bothered to explicitly use constants for anything: `term`s have sufficed for distinguishing values since they double as molds.
But, sure enough, what I had thought was a `term` was a literal constant of null:

    > -:!>(%~)
    #t/%~
    > -:!>(~)
    #t/%~
When you run into something that doesn't make sense to you in Hoon, you can break out the compiler's introspection tools to figure out what Hoon thinks it's seeing.
`++ream` is the compiler's parser arm, which accepts a `cord` of text and produces a Hoon abstract syntax tree (AST). (You can get a similar effect using `!,` zapcom as well.)

    > (ream '%~')
    [%rock p=%n q=0]
    > (ream '~')
    [%bust p=%null]
A constant atom (or cold atom) is denoted in the AST as `%rock`, while a regular non-constant atom (or warm atom) is `%sand`:

    > !,(*hoon %1)
    [%rock p=%ud q=1]
    > !,(*hoon 1)
    [%sand p=%ud q=1]
    > !,(*hoon %one)
    [%rock p=%tas q=6.647.407]
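The cold/warm split shows up in the types as well. Here is a minimal dojo check, following the same pattern as the `-:!>(%~)` example above: a warm `1` gets the general aura type `@ud`, while a cold `%1` gets a type that pins down the exact value.

```hoon
> -:!>(1)
#t/@ud
> -:!>(%1)
#t/%1
> `@ud`%1
1
```

As the last line shows, a cold `%1` still nests under `@ud`; the reverse does not hold, since a warm `@ud` could be any number and so does not nest under the constant type `%1`.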
A `term` can consist of only lowercase ASCII letters, `-` hep, and the digits `0` through `9`, but if one of the latter two is the first character then it's not actually a `term`: it's a constant. (I'll call it a "constant in `%`" in this article for clarity.)
    > !,(*hoon -1)
    [%sand p=%sd q=1]
    > !,(*hoon %-1)
    [%rock p=%sd q=1]
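Why is `q` equal to `1` for a negative literal? `@s` stores the sign in the low bit, so a signed value is an ordinary atom underneath: a positive `--n` is stored as `2n` and a negative `-n` as `2n - 1`. Casting to a bare `@` exposes the stored value:

```hoon
> `@`--1
2
> `@`-1
1
> `@`--2
4
> `@`-2
3
```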
Because a constant in `%` is a mold, the order of comparison matters in expressions: `?=` wutts expects the mold first and the value second.

    > =/  axn  `@tas`%gain  ?=(axn %gain)
    -find.$
    dojo: hoon expression failed
    > =/  axn  `@tas`%gain  ?=(%gain axn)
    %.y
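For contrast, `.=` dottis (irregular form `=(a b)`) compares two nouns by value, so its arguments can go in either order; only `?=` wutts, which tests a wing against a mold, insists on the constant first. The `%loss` value here is just an arbitrary second term for illustration:

```hoon
> =/(axn `@tas`%gain =(axn %gain))
%.y
> =/(axn `@tas`%gain =(%gain axn))
%.y
> =/(axn `@tas`%loss ?=(%gain axn))
%.n
```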
The empty `term` has a special syntax, `%$`:

    > `@tas`~
    %$
    > !,(*hoon %$)
    [%rock p=%tas q=0]
My mistake was in assuming `%~` would be the same sort of thing as `%$`, when it's not:

    > !,(*hoon %~)
    [%rock p=%n q=0]
    > !,(*hoon ~)
    [%bust p=%null]
All of this is muddied because `++sane` is not currently enforced: I can cast any `cord` to a `term` and Hoon doesn't balk at the point of conversion:

    > -:!>((@tas 'Hello Mars'))
    #t/@tas
    > ((sane %tas) (@tas 'Hello Mars'))
    %.n
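Since the cast won't check for you, you can call `++sane` yourself before converting. The `to-term` gate is a hypothetical helper, not part of the standard library; it returns a `unit` that is empty whenever the cord isn't a valid `term`:

```hoon
> ((sane %tas) 'hello-mars')
%.y
> ((sane %tas) 'Hello Mars')
%.n
> =to-term |=(t=@t ?.(((sane %tas) t) ~ `(@tas t)))
> (to-term 'hello-mars')
[~ %hello-mars]
> (to-term 'Hello Mars')
~
```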
As you've seen if you've worked at all with Hoon, every atom, or base value, has a characteristic form. This means that the parser analyzing a Hoon expression can tell what aura a value has simply from its form. (The salient exception is `@`, which shows up as `@ud` by default.)
Since we're being quite thorough in this article, let's summarize every aura currently in Hoon (version `%140`) and note how their literal syntaxes are legible and distinct. Note that leading zeroes are generally stripped from expressions.
| Aura | Meaning | Notes |
|---|---|---|
| `@` | empty aura | Has no characteristic form. |
| `@c` | UTF-32 (used by terminal stack) | |
| `@d` | date | Has no characteristic form. |
| `@dr` | relative date (i.e., timespan) | |
| `@f` | Loobean | For compiler, not castable. |
| `@i` | Internet address | Has no characteristic form. |
| `@n` | nil | For compiler, not castable. |
| `@p` | phonemic base (ship name) | |
| `@q` | phonemic base, unscrambled | |
| `@r` | IEEE-754 floating-point | Has no characteristic form. |
| `@rh` | half precision (16 bits) | |
| `@rs` | single precision (32 bits) | |
| `@rd` | double precision (64 bits) | |
| `@rq` | quad precision (128 bits) | |
| `@s` | signed integer, sign bit low | Has no characteristic form. |
| `@t` | UTF-8 text (cord) | |
| `@ta` | ASCII text (knot) | Character restrictions apply. |
| `@tas` | ASCII text symbol (term) | Character restrictions apply. |
| `@u` | unsigned integer | Has no characteristic form. |
You'll also find some irregular auras in use: `%lull`, for instance, has a `@uxblob` type. Nonstandard auras (i.e. those not listed in the table above) render visibly as `@ux`, but are still subject to nesting rules. Relatedly, the capital-letter suffixes one occasionally encounters (like `@uvJ`) are programmer annotations marking the intended bit-width of a value, with each letter doubling the width of the one before: `A` = 1 bit, `B` = 2 bits, `C` = 4 bits, `D` = 8 bits (1 byte), `E` = 16 bits, etc.
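Since each letter doubles the width of the previous one, a suffix letter L corresponds to 2^(L - A) bits. That is easy to check with cord arithmetic in the dojo; this just computes the naming convention by hand and is not something the compiler does with the annotation:

```hoon
> (bex (sub 'A' 'A'))
1
> (bex (sub 'D' 'A'))
8
> (bex (sub 'J' 'A'))
512
```

So `@uvJ` marks a value intended to be 512 bits wide.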
We also include two other literal syntaxes which don't resolve to atoms:

`%blob` represents a raw noun to or from Unix, which is processed into an effect. Its literal is prefixed with `~` sig.

    > `coin`blob+5
    [%blob p=5]
    > ~(rend co `coin`blob+5)
    "~05o"
    > ~05o
    5
    > `coin`blob+[1 2 3]
    [%blob p=[1 2 3]]
    > ~(rend co `coin`blob+500)
    "~07q30"
    > ~07q30
    500
    > ~(rend co `coin`blob+[1 2 3])
    "~038i3h"
    > ~(rend co `coin`blob+[0x1 0x2 0x3])
    "~038i3h"
You won't typically write these by hand, but you may produce them if you're working with the internals of the Clay vane in the kernel, for instance.
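Where do strings like `~05o` come from? Comparing them against `++jam` suggests (this is my reading of the examples, not a documented spec) that a blob literal is `~0` followed by the base-32 `@uv` digits of the jammed noun. The `blob-text` gate is a hypothetical sketch that reproduces the short examples above; for larger nouns the `@uv` rendering inserts `.` separators every five digits, so it may not match `++rend` there.

```hoon
> (jam 5)
184
> `@uv`(jam 5)
0v5o
> `@uv`(jam 500)
0v7q30
> `@uv`(jam [1 2 3])
0v38i3h
> (cue (jam [1 2 3]))
[1 2 3]
> =blob-text |=(n=* (weld "~0" (slag 2 (scow %uv (jam n)))))
> (blob-text 5)
"~05o"
```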
`%many` represents a compact, URL-safe way of writing a tuple of atoms. Not all atoms can be represented this way, but notably you can use `@tas` with it. The tuple opens with `._`, separates values with `_` cab, and closes with `__`:

    > ._0__
    0
    > ._1_2__
    [1 2]
    > ._1_2_3__
    [1 2 3]
    > ._0b1_0x2_0v3_0w4__
    [0b1 0x2 0v3 0w4]
    > ._one_two_three_four_five__
    [%one %two %three %four %five]
Pretty much anything, prefixed with `%` cen, can become a constant:

    > %4
    %4
    > %0b111
    %0b111
    > %'Hello Mars'
    %'Hello Mars'
    > !,(*hoon 'Hello Mars')
    [%sand p=%t q=545.182.085.650.269.906.691.400]
    > !,(*hoon %'Hello Mars')
    [%rock p=%t q=545.182.085.650.269.906.691.400]
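The `q` in those two ASTs is the same because a cord is just its UTF-8 bytes packed into an atom, with the first character in the least significant byte. The dojo shows the byte order directly:

```hoon
> (rip 3 'Hello Mars')
~[72 101 108 108 111 32 77 97 114 115]
> `@ux``@`'Hello Mars'
0x7372.614d.206f.6c6c.6548
> (met 3 'Hello Mars')
10
```

Reading the hex from the right gives `48 65 6c 6c 6f`, i.e. `H e l l o`. The same packing explains `q=6.647.407` for `%one` earlier: in hex that is `0x65.6e6f`, which is `o n e` read from the right.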
Now we're ready to circle back around to the precipitating instance:

    > %~
    ~
    > %~~
    %''
    > %~~~    :: invalid!
    > %~~~~
    %'~'
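The first of these, `%~`, is the constant null `~`. The others are constants built on a literal syntax for writing `cord`s using an escape: `~~` introduces the cord, and a literal `~` inside it is written as `~~`. (Presumably that is why `%~~~`, which ends in a lone unescaped `~`, is rejected.)

    > ~~a
    'a'
    > ~~~~a
    '~a'
    > ~~~~~~a
    '~~a'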
These conversions are handled by a set of related auxiliary arms:

    > (wack 'a')
    ~.a
    > (wick ~.a)
    [~ ~.a]
    > (woad ~~a)
    'a'
    > (wood 'a')
    ~.a
    > (wood 'Hello Mars!')
    ~.~48.ello.~4d.ars~21.
    > (woad ~.~48.ello.~4d.ars~21.)
    'Hello Mars!'
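Because `++woad` undoes `++wood`, a quick sanity check is a round trip, including a cord containing `~` that must be escaped. The `round-trip` gate is a hypothetical name for illustration:

```hoon
> =round-trip |=(t=@t =(t (woad (wood t))))
> (round-trip 'Hello Mars!')
%.y
> (round-trip '~a')
%.y
> (wick (wack 'a'))
[~ ~.a]
```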
`%~` and its friends led me down a surprisingly deep rabbit hole into the Hoon parser `++so` and its parsing rules.
Header image by Wilson Bentley, the first photographer of snowflakes.