What Every Hooner Should Know About Literals on Urbit

Down the rabbit hole of the atom parser.

November 14, 2022

The Hoon compiler handles value/aura conversions and nesting automatically, and most of the time you shouldn't be surprised once you've satisfied the type checker. But sometimes you run into something like this which may defy your expectations:

> %~
~
> %~~
%''
> %~~~ :: invalid!
> %~~~~
%'~'

What's going on here? I thought I had an innocent value like a @tas term, which starts with a % cen. I didn't: I had inadvertently stumbled into atom literal syntax, which has its own vagaries and requirements. Let's take a dive together down the rabbit hole of representing values in Hoon. It turns out that there is a lot going on!

In this article, I'll walk you through what went wrong, through the lens of constants and literal syntax.

Constants

The first hint that something was wrong with my %~ above should have been that I was using a ~ sig and fooling myself that it was a term—it wasn't. What I was actually doing was specifying null ~ as a constant. Most of the time when writing code, I haven't bothered to explicitly use constants for anything: terms have sufficed for distinguishing values since they double as molds.

But, sure enough, what I had thought was a term was a literal constant of null ~:

> -:!>(%~)
#t/%~
> -:!>(~)
#t/%~

When you run into something that doesn't make sense to you in Hoon, you can break out introspection tools from the compiler to figure out what Hoon thinks it's seeing. ++ream is the compiler parser arm, which accepts a @t cord of text and produces a Hoon abstract syntax tree (AST). (You can get a similar effect using !, zapcom as well.)

> (ream '%~')
[%rock p=%n q=0]
> (ream '~')
[%bust p=%null]

A constant atom (or cold atom) is denoted in the AST as %rock while a regular non-constant atom (or warm atom) is %sand:

> !,(*hoon %1)
[%rock p=%ud q=1]
> !,(*hoon 1)
[%sand p=%ud q=1]
> !,(*hoon %one)
[%rock p=%tas q=6.647.407]
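
The q=6.647.407 above is not arbitrary. It matches the bytes of the text "one" read as a little-endian integer (least significant byte first), printed with dots between groups of three digits. The Python sketch below reproduces that number; the helper names cord_to_atom and format_ud are invented for illustration and are not Hoon arms.

```python
def cord_to_atom(text: str) -> int:
    """Read the UTF-8 bytes of text as a little-endian unsigned integer."""
    return int.from_bytes(text.encode("utf-8"), byteorder="little")


def format_ud(value: int) -> str:
    """Render an integer with dots between groups of three digits."""
    return f"{value:,}".replace(",", ".")


if __name__ == "__main__":
    # 'o' = 0x6f is the lowest byte, 'e' = 0x65 the highest: 0x656e6f.
    print(format_ud(cord_to_atom("one")))  # 6.647.407
```

Under this reading the empty text maps to 0, since there are no bytes to contribute.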

A term can consist of only lowercase ASCII letters, -, and 0-9, but if either of the latter two comes first then it's not actually a term: it's a constant. (I'll call it a "constant in %" in this article for clarity.)

> !,(*hoon -1)
[%sand p=%sd q=1]
> !,(*hoon %-1)
[%rock p=%sd q=1]
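
The character rule described above can be approximated with a regular expression. This is only a sketch of the rule as stated in this article (lowercase letters, digits, and -, with a letter first). It is not a port of ++sane, the name looks_like_term is hypothetical, and the empty term is deliberately left out.

```python
import re

# Rule as stated in the text: lowercase ASCII letters, digits, and '-',
# where the first character must be a letter. The empty term is not modeled.
TERM_PATTERN = re.compile(r"[a-z][a-z0-9-]*")


def looks_like_term(text: str) -> bool:
    """Return True if text fits the term shape described in the article."""
    return TERM_PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["gain", "a-1", "-1", "1", "Hello Mars"]:
        print(candidate, looks_like_term(candidate))
```

Anything after the % that fails this check but still parses, such as -1, is being read through some other literal syntax.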

Because a constant in % is a mold, the order of comparison matters in expressions:

> =/  axn  `@tas`%gain
 ?=(axn %gain)
-find.$
dojo: hoon expression failed
> =/  axn  `@tas`%gain
 ?=(%gain axn)
%.y

The empty term has a special syntax, %$:

> `@tas`~
%$
> !,(*hoon %$)
[%rock p=%tas q=0]

My mistake was in assuming %~ would be the same sort of thing as %$, when it's not.

> !,(*hoon %~)
[%rock p=%n q=0]
> !,(*hoon ~)
[%bust p=%null]

All of this is muddied because ++sane is not currently enforced: I can cast any cord to a term and Hoon doesn't balk at the point of conversion:

> -:!>((@tas 'Hello Mars'))
#t/@tas
> ((sane %tas) (@tas 'Hello Mars'))
%.n

Literal Syntax

As you've encountered if you've worked at all with Hoon, every atom, or base value, has a characteristic form. This means that the parser which is analyzing a Hoon expression can tell simply by the form of the value what kind of aura it has. (The salient exception is @/@ud, which shows up as @ud by default.)

Since we're being quite thorough in this article, let's summarize every single aura currently in Hoon (version %140) and note how their literal syntax is legible and distinct. Note that generally leading zeroes are stripped from expressions.

| Aura | Meaning | Literal Syntax | Example | Note |
|------|---------|----------------|---------|------|
| @ | empty aura | | | Has no characteristic form. |
| @c | UTF-32 (used by terminal stack) | ~-_____ | ~-~45fed | |
| @d | date | | | Has no characteristic form. |
| @da | absolute date | ~____._.__..__.__.__..____ | ~2018.5.14..22.31.46..1435 | |
| @dr | relative date (ie, timespan) | ~d_____.h_.m__.s__ | ~h5.m30.s12 | |
| @f | Loobean | & | | For compiler, not castable. |
| @i | Internet address | | | Has no characteristic form. |
| @if | IPv4 address | .___.___.___.___ | .195.198.143.90 | |
| @is | IPv6 address | .___.___.___.___.___.___.___.___ | .0.0.0.0.0.1c.c3c6.8f5a | |
| @n | nil | ~ | | For compiler, not castable. |
| @p | phonemic base (ship name) | ~______-______-______-______--______-______-______-______ | ~sorreg-namtyv | |
| @q | phonemic base, unscrambled | ~______ (any size) | .~litsyn-polbel | |
| @r | IEEE-754 floating-point | | | Has no characteristic form. |
| @rh | half precision (16 bits) | .~~___ | .~~3.14 | |
| @rs | single precision (32 bits) | .___ | .6.022141e23 | |
| @rd | double precision (64 bits) | .~___ | .~6.02214085774e23 | |
| @rq | quad precision (128 bits) | .~~~___ | .~~~6.02214085774e23 | |
| @s | signed integer, sign bit low | | | Has no characteristic form. |
| @sb | signed binary | --0b____.____ | --0b11.1000 | |
| @sd | signed decimal | --___.___ | --1.000.056 | |
| @si | signed decimal | --_____ | --0i1000 | |
| @sv | signed base32 | --0v_____._____ | -0v1df64.49beg | |
| @sw | signed base64 | --0w_____._____ | --0wbnC.8haTg | |
| @sx | signed hexadecimal | --0x____.____ | -0x5f5.e138 | |
| @t | UTF-8 text (cord) | '' or ~~___ | 'howdy' | |
| @ta | ASCII text (knot) | ~._____ | ~.howdy | Character restrictions; use ++sane. |
| @tas | ASCII text symbol (term) | %_____ | %howdy | Character restrictions; use ++sane. |
| @u | unsigned integer | | | Has no characteristic form. |
| @ub | unsigned binary | 0b____.____ | 0b11.1000 | |
| @ud | unsigned decimal | ___.___ | 1.000.056 | |
| @ui | unsigned decimal | ____ | 0i1000 | |
| @uv | unsigned base32 | 0v_____._____ | 0v1df64.49beg | |
| @uw | unsigned base64 | 0w_____._____ | 0wbnC.8haTg | |
| @ux | unsigned hexadecimal | 0x____.____ | 0x5f5.e138 | |

You'll also find some irregular auras in use: %lull, for instance, has a @uxblob type. Nonstandard auras (i.e. those not listed in the table above) render as @ux visibly, but are still subject to nesting rules. In fact, the capital-letter suffixes one occasionally encounters (like @tD and @uvJ) are programmer annotations to mark the intended bit-width of a value. (A = 2^0, B = 2^1, C = 2^2, D = 2^3, E = 2^4, etc.)
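
To make the suffix convention concrete, here is a small Python sketch. It assumes the usual reading in which each capital letter names a power of two in bits, starting from A = 2^0. The function name suffix_bit_width is made up for this example.

```python
def suffix_bit_width(letter: str) -> int:
    """Bit width implied by a capital-letter aura suffix, assuming A = 2^0."""
    if len(letter) != 1 or not ("A" <= letter <= "Z"):
        raise ValueError("expected a single capital letter A-Z")
    return 1 << (ord(letter) - ord("A"))


if __name__ == "__main__":
    print(suffix_bit_width("D"))  # width for the D in @tD
    print(suffix_bit_width("J"))  # width for the J in @uvJ
```

Under that assumption @tD marks an 8-bit value (one byte of text) and @uvJ marks a 512-bit value.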

We also include two other literal syntaxes which don't resolve to atoms:

  • %blob represents a raw noun to or from Unix, which processes into an effect. It is prefixed with ~ sig.

    > `coin`blob+5
    [%blob p=5]
    > ~(rend co `coin`blob+5)
    "~05o"
    > ~05o
    5
    > `coin`blob+[1 2 3]
    > ~(rend co `coin`blob+500)
    "~07q30"
    > ~07q30
    500
    > ~(rend co `coin`blob+[1 2 3])
    "~038i3h"
    > ~(rend co `coin`blob+[0x1 0x2 0x3])
    "~038i3h"

    You won't typically write these by hand, but you may produce them if you are working with the Clay vane internal to the kernel, for instance.

  • %many represents a compact URL-safe way of writing a tuple of atoms. Not all atoms can be represented this way: notably, you can use @u, @s, and @tas with it. Values are separated with _ cab like ._^_^_^__.

    > ._0__
    0
    > ._1_2__
    [1 2]
    > ._1_2_3__
    [1 2 3]
    > ._0b1_0x2_0v3_0w4__
    [0b1 0x2 0v3 0w4]
    > ._one_two_three_four_five__
    [%one %two %three %four %five]

Constants Redux

Pretty much anything, prefixed with % cen, can become a constant:

> %4
%4
> %0b111
%0b111
> %'Hello Mars'
%'Hello Mars'
> !,(*hoon 'Hello Mars')
[%sand p=%t q=545.182.085.650.269.906.691.400]
> !,(*hoon %'Hello Mars')
[%rock p=%t q=545.182.085.650.269.906.691.400]
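
Both results carry the same q, so the only difference between the warm and cold forms is the %sand or %rock tag. The number itself is just the text's bytes in little-endian order. The Python sketch below goes from the dotted number back to the text; parse_ud and atom_to_cord are illustrative names, not Hoon arms.

```python
def parse_ud(literal: str) -> int:
    """Parse a dot-grouped decimal such as 1.000.056 into an integer."""
    return int(literal.replace(".", ""))


def atom_to_cord(value: int) -> str:
    """Decode an unsigned integer as little-endian UTF-8 text."""
    length = (value.bit_length() + 7) // 8  # number of bytes needed
    return value.to_bytes(length, byteorder="little").decode("utf-8")


if __name__ == "__main__":
    q = parse_ud("545.182.085.650.269.906.691.400")
    print(atom_to_cord(q))  # Hello Mars
```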

Now we're ready to circle back around to the precipitating instance:

> %~
~
> %~~
%''
> %~~~ :: invalid!
> %~~~~
%'~'

The first of these, %~, is the constant null ~. The others are a literal syntax for writing cords using an escape:

> ~~a
'a'
> ~~~~a
'~a'
> ~~~~~~a
'~~a'

These are controlled using a set of related auxiliary arms:

> (wack 'a')
~.a
> (wick ~.a)
[~ ~.a]
> (woad ~~a)
'a'
> (wood 'a')
~.a
> (wood 'Hello Mars!')
~.~48.ello.~4d.ars~21.
> (woad ~.~48.ello.~4d.ars~21.)
'Hello Mars!'
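
The ++wood output above suggests an escaping scheme: lowercase letters, digits, and - pass through, a space becomes ., a ~ is doubled, and anything else becomes ~ followed by its code point in lowercase hex and a closing dot. The Python sketch below follows that reading. It is inferred from the examples in this article rather than taken from the Hoon source, the handling of a literal . (escaped as ~.) is an assumption, and wood_sketch is a made-up name.

```python
def wood_sketch(text: str) -> str:
    """Escape text into a knot-safe form, following the examples above."""
    out = []
    for ch in text:
        if "a" <= ch <= "z" or "0" <= ch <= "9" or ch == "-":
            out.append(ch)  # safe characters pass through unchanged
        elif ch == " ":
            out.append(".")  # space is written as a dot
        elif ch == ".":
            out.append("~.")  # assumption: a literal dot is escaped
        elif ch == "~":
            out.append("~~")  # a literal sig is doubled
        else:
            out.append("~" + format(ord(ch), "x") + ".")  # hex code point
    return "".join(out)


if __name__ == "__main__":
    # Dojo prints the knot with a leading ~. which is added here by hand.
    print("~." + wood_sketch("Hello Mars!"))  # ~.~48.ello.~4d.ars~21.
```

The doubled ~ is what makes %~~~~ come out as %'~': the first ~~ opens the cord literal and the second ~~ is one escaped sig.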

Thus %~ and its friends led me down a surprisingly deep rabbit hole into the Hoon parser ++so and its parsing rules.


Header image by Wilson Bentley, the first photographer of snowflakes.
