
Rune Utilization in Hoon

An investigation of rune frequency in /sys.

January 12, 2023

Enquiry

I investigated the Urbit kernel for the actual usage rates of various Hoon runes in practice by experienced senior developers. While I had done this a few years ago using a naïve regex match, this time I built the abstract syntax tree of every file in the kernel. This allowed me to include irregular syntax and exclude non-runes (like the spurious ^~ non-ketsig in rose+[" " `~]^~[leaf+"*" (smyt pax)]).

The survey was based on the contents of /sys as of ~2022.12.14. For the first pass, every file in /sys was parsed with ++reck and pretty-printed with ++noah, e.g.

*%/out/ames/txt &txt (noah !>((reck /===/sys/vane/ames)))

I tabulated the number of instances of each currently-supported Hoon rune from the AST. While the particular values are noisy across commits, I expect secular trends to persist.
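The tabulation step can be sketched in Python as follows. This is a minimal sketch rather than the tooling actually used for the survey: it assumes that each AST node is printed with its label directly after an opening bracket (for example `[%brts ...]`), the `RUNE_LABELS` subset and the `count_runes` helper are hypothetical names, and the `sample` string is invented for illustration and is not real ++noah output.

```python
import re
from collections import Counter

# Hypothetical subset of the AST labels from Table 1; extend with the rest as needed.
RUNE_LABELS = {
    "brts": "|=",
    "cncl": "%:",
    "cltr": ":*",
    "cnts": "%=",
    "ktts": "^=",
}

# Assumption: a node head looks like "[%xxxx", a four-letter label after "[".
NODE_HEAD = re.compile(r"\[%([a-z]{4})\b")


def count_runes(ast_text: str) -> Counter:
    """Count rune nodes in a printed AST, ignoring labels that are not runes."""
    counts = Counter()
    for label in NODE_HEAD.findall(ast_text):
        if label in RUNE_LABELS:
            counts[RUNE_LABELS[label]] += 1
    return counts


# Invented sample text (NOT real output) with one |= node and one %: node.
sample = "[%brts p=[%bcts p=%a q=[%base p=%noun]] q=[%cncl p=[%wing p=~[%add]] q=~[[%wing p=~[%a]]]]]"

print(count_runes(sample))
```

Filtering on a known label set keeps non-rune node heads such as `%wing` or `%base` out of the tally.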

Table 1. Rune AST labels.

Rune  AST Label    Rune  AST Label    Rune  AST Label
|$    %brbc        |_    %brcb        |:    %brcl
|%    %brcn        |.    %brdt        |-    %brhp
|^    %brkt        |@    %brpt        |~    %brsg
|*    %brtr        |=    %brts        |?    %brwt
:_    %clcb        :-    %clhp        :^    %clkt
:+    %clls        :~    %clsg        :*    %cltr
%_    %cncb        %:    %cncl        %.    %cndt
%-    %cnhp        %^    %cnkt        %+    %cnls
%~    %cnsg        %*    %cntr        %=    %cnts
.^    %dtkt        .+    %dtls        .*    %dttr
.=    %dtts        .?    %dtwt        ^|    %ktbr
^:    %ktcl        ^.    %ktdt        ^-    %kthp
^+    %ktls        ^&    %ktpm        ^~    %ktsg
^*    %kttr        ^=    %ktts        ^?    %ktwt
;:    %mccl        ;/    %mcfs        ;<    %mcgl
;;    %mcmc        ;~    %mcsg        ;=    %mcts
~$    %sgbc        ~|    %sgbr        ~_    %sgcb
~%    %sgcn        ~/    %sgfs        ~<    %sggl
~>    %sggr        ~+    %sgls        ~&    %sgpm
~=    %sgts        ~?    %sgwt        ~!    %sgzp
=|    %tsbr        =:    %tscl        =,    %tscm
=.    %tsdt        =/    %tsfs        =<    %tsgl
=>    %tsgr        =-    %tshp        =^    %tskt
=+    %tsls        =;    %tsmc        =~    %tssg
=*    %tstr        =?    %tswt        ?|    %wtbr
?:    %wtcl        ?.    %wtdt        ?<    %wtgl
?>    %wtgr        ?-    %wthp        ?#    %wthx
?^    %wtkt        ?+    %wtls        ?&    %wtpm
?@    %wtpt        ?~    %wtsg        ?=    %wtts
?!    %wtzp        !,    %zpcm        !<    %zpgl
!>    %zpgr        !;    %zpmc        !@    %zppt
!=    %zpts        !?    %zpwt        !!    %zpzp

The raw data can be regenerated as needed from the Urbit repo using the technique above.

Results

This technique converted irregular syntax to regular syntax, but did not desugar runes. This investigation thus reflects programmer intent at the object level of expression design. A subsequent attempt with ++open:ap to desugar runes would be instructive as to how Hoon “sees itself”.

Table 2. Observed rune frequency in /sys.

Rune  Count  Frequency   Percentile
%:    9842   20.94%      20.94%
:*    5190   11.04%      31.99%
%=    3417   7.271%      39.26%
|=    2320   4.937%      44.19%
^=    1845   3.926%      48.12%
^-    1727   3.675%      51.79%
%~    1622   3.451%      55.25%
=<    1548   3.294%      58.54%
=/    1525   3.245%      61.78%
?:    1385   2.947%      64.73%
.=    1229   2.615%      67.35%
=+    1208   2.57%       69.92%
?~    1080   2.298%      72.21%
?=    955    2.032%      74.25%
;~    803    1.709%      75.96%
%+    786    1.672%      77.63%
=.    640    1.362%      78.99%
^:    624    1.328%      80.32%
^+    599    1.275%      81.59%
%-    514    1.094%      82.69%
|-    502    1.068%      83.75%
?.    482    1.026%      84.78%
:~    463    0.9852%     85.76%
~/    374    0.7958%     86.56%
^*    341    0.7256%     87.29%
?>    334    0.7107%     88.0%
?&    294    0.6256%     88.62%
=^    293    0.6235%     89.25%
:-    272    0.5788%     89.82%
?-    266    0.566%      90.39%
|*    265    0.5639%     90.95%
.+    240    0.5107%     91.47%
=>    225    0.4788%     91.94%
|%    223    0.4745%     92.42%
=*    210    0.4468%     92.87%
|.    200    0.4256%     93.29%
!!    190    0.4043%     93.7%
~|    186    0.3958%     94.09%
?!    171    0.3639%     94.45%
?|    151    0.3213%     94.78%
=|    148    0.3149%     95.09%
:_    146    0.3107%     95.4%
=?    138    0.2936%     95.7%
!>    136    0.2894%     95.98%
:+    134    0.2851%     96.27%
~>    130    0.2766%     96.55%
?^    116    0.2468%     96.79%
=-    109    0.2319%     97.03%
;:    105    0.2234%     97.25%
~_    103    0.2192%     97.47%
~%    84     0.1787%     97.65%
?+    82     0.1745%     97.82%
;;    72     0.1532%     97.97%
:^    61     0.1298%     98.1%
|^    58     0.1234%     98.23%
~&    56     0.1192%     98.35%
~+    56     0.1192%     98.47%
?@    55     0.117%      98.58%
%^    53     0.1128%     98.7%
|_    52     0.1106%     98.81%
=,    45     0.09575%    98.9%
%_    43     0.0915%     98.99%
^?    42     0.08937%    99.08%
^.    39     0.08299%    99.17%
|~    36     0.0766%     99.24%
!<    35     0.07447%    99.32%
%.    33     0.07022%    99.39%
|$    31     0.06596%    99.45%
|@    29     0.06171%    99.51%
?<    26     0.05532%    99.57%
.*    25     0.0532%     99.62%
=;    24     0.05107%    99.67%
=:    24     0.05107%    99.73%
~?    19     0.04043%    99.77%
!,    16     0.03405%    99.8%
;/    16     0.03405%    99.83%
^|    16     0.03405%    99.87%
%*    12     0.02553%    99.89%
|:    11     0.02341%    99.92%
~!    8      0.01702%    99.93%
.^    8      0.01702%    99.95%
!?    4      0.008511%   99.96%
=~    4      0.008511%   99.97%
.?    4      0.008511%   99.98%
^~    3      0.006384%   99.98%
!=    2      0.004256%   99.99%
?#    2      0.004256%   99.99%
;=    2      0.004256%   100.0%
~<    1      0.002128%   100.0%
|?    1      0.002128%   100.0%
!@    0      0.0%        100.0%
~=    0      0.0%        100.0%
~$    0      0.0%        100.0%
;<    0      0.0%        100.0%
^&    0      0.0%        100.0%

Rune frequency follows a power-law distribution, with 18 runes representing 80% of total rune utilization. Indeed, two runes are hapax legomena in /sys (|? barwut and ~< siggal), and several do not occur at all (discussed below).
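The 80% figure can be checked directly from the counts. The sketch below, in Python, recomputes the Frequency and Percentile columns for the most frequent runes in Table 2 and finds the rank at which the cumulative share first reaches a threshold. The `TOTAL` value is an assumption obtained by summing the Count column of Table 2, and the helper names are hypothetical.

```python
# Counts for the 18 most frequent runes, copied from Table 2.
TOP_COUNTS = [
    ("%:", 9842), (":*", 5190), ("%=", 3417), ("|=", 2320), ("^=", 1845),
    ("^-", 1727), ("%~", 1622), ("=<", 1548), ("=/", 1525), ("?:", 1385),
    (".=", 1229), ("=+", 1208), ("?~", 1080), ("?=", 955), (";~", 803),
    ("%+", 786), ("=.", 640), ("^:", 624),
]

# Assumption: total rune count, obtained by summing the Count column of Table 2.
TOTAL = 46996


def cumulative_share(counts, total):
    """Return (rune, frequency, cumulative frequency) rows as fractions of total."""
    running = 0
    rows = []
    for rune, n in counts:
        running += n
        rows.append((rune, n / total, running / total))
    return rows


def runes_needed(rows, threshold):
    """Return the 1-based rank at which the cumulative share first reaches threshold."""
    for rank, (_, _, cumulative) in enumerate(rows, start=1):
        if cumulative >= threshold:
            return rank
    return None


rows = cumulative_share(TOP_COUNTS, TOTAL)
print(runes_needed(rows, 0.80))  # 18
```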

Figure 1. Observed rune frequency in /sys as a power law.

(Bear with me on that graph: it's rather hard to order runes for display in any coherent way that I've come up with so far.)
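One rough way to probe the power-law claim numerically is to regress log(count) on log(rank): an approximately straight line on those axes is consistent with a power law, and the negated slope estimates its exponent. The Python sketch below applies ordinary least squares to the 18 most frequent counts from Table 2. This is a crude check under that assumption, not the method used to produce Figure 1, and `loglog_slope` is a hypothetical helper name.

```python
import math

# Counts for the 18 most frequent runes, in rank order, copied from Table 2.
COUNTS = [9842, 5190, 3417, 2320, 1845, 1727, 1622, 1548, 1525,
          1385, 1229, 1208, 1080, 955, 803, 786, 640, 624]


def loglog_slope(counts):
    """Least-squares slope of log(count) against log(rank), with rank starting at 1."""
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


slope = loglog_slope(COUNTS)
print(f"estimated power-law exponent: {-slope:.2f}")
```

Runes with a count of zero have to be left out of such a fit, since their logarithm is undefined.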

Analysis

There aren't any real surprises here. The most frequent runes reflect the most common design patterns:

  • % cen rune calls tend to route through %: cencol since the irregular form (fun 1 2) desugars to %:.
  • :* coltar serves similarly as the desugaring of tuples constructed by [1 2 3].
  • %= centis is invoked through the irregular $() form, which re-evaluates an arm with changes. It is most commonly employed for recursion in gates and traps, as well as to modify legs and in the nested core design pattern, e.g. this(value new-value).
  • ^= kettis happens in face assignments a=1.
  • ^- kethep compile-time typechecks are written either with tics (e.g. `@ud`1) or with the explicit rune.
  • %~ censig is used to pull an arm in a door.
  • =< tisgal composes two expressions in inverted order. This arises from this:that wing search patterns.
  • =/ tisfas seems to be slightly preferred to =+ tislus in more contemporary code, although they carry out equivalent operations to pin a value to the subject.

Other common runes follow a similar logic based on common design patterns.

Table 3. Observed rune frequency in /sys: the eighteen runes representing 80% of all rune usage in /sys.

Rune  Count  Frequency   Percentile
%:    9842   20.94%      20.94%
:*    5190   11.04%      31.99%
%=    3417   7.271%      39.26%
|=    2320   4.937%      44.19%
^=    1845   3.926%      48.12%
^-    1727   3.675%      51.79%
%~    1622   3.451%      55.25%
=<    1548   3.294%      58.54%
=/    1525   3.245%      61.78%
?:    1385   2.947%      64.73%
.=    1229   2.615%      67.35%
=+    1208   2.57%       69.92%
?~    1080   2.298%      72.21%
?=    955    2.032%      74.25%
;~    803    1.709%      75.96%
%+    786    1.672%      77.63%
=.    640    1.362%      78.99%
^:    624    1.328%      80.32%

The least frequent runes include:

  • |? barwut produces a lead trap. Like ^& ketpam, lead cores have not yet proven to be a useful expedient in practice.
  • ~< siggal applies hints for the runtime to process. This can be helpful in debugging.
  • ;= mictis produces Sail code, unnecessary within the kernel.
  • ?# wuthax, as yet undocumented, is being developed as a replacement for ?= wuttis that can more powerfully match patterns in general, such as list detection.

A few runes are never used in the kernel at all. Some of these are simply intended for transient or labile userspace code.

  • ^& ketpam produces a zinc core (covariant). While included for completeness of the variance system, zinc cores have not turned up in effective design patterns yet within Urbit.
  • ;< micgal acts as a macro rune for sequencing computations in threads, similar to ;~ micsig.
  • ~$ sigbuc is used for profiling code, and shouldn't be present in release code.
  • ~= sigtis detects duplicates, and is used for cleaning up memory from duplicate nouns.
  • !@ zappat branches on wing existence; while it seems like this would be useful in the kernel, it is not employed in practice.

Rune labels like %ktpm that occur as literal terms in the codebase may not be represented even once in the final AST. This is possible because ++ream and ++reck yield the names of terms as a +$dime of %tas and a @ud, so the label is printed as a number rather than as text:

i=[%leaf p=%tas q=1.836.086.379]
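The number in that example is the term itself read as an unsigned integer: Hoon stores text as an atom with the first character in the lowest byte, and prints @ud values with a dot between each group of three digits. A short Python sketch of the conversion in both directions (the function names are hypothetical, and only ASCII terms are handled):

```python
def atom_to_term(ud_text: str) -> str:
    """Decode a dotted @ud such as '1.836.086.379' into its ASCII text."""
    value = int(ud_text.replace(".", ""))
    length = (value.bit_length() + 7) // 8
    # Little-endian: the lowest byte holds the first character.
    return value.to_bytes(length, "little").decode("ascii")


def term_to_atom(term: str) -> int:
    """Encode ASCII text as the integer Hoon would use for it."""
    return int.from_bytes(term.encode("ascii"), "little")


print(atom_to_term("1.836.086.379"))  # ktpm
print(term_to_atom("ktpm"))           # 1836086379
```

A text search of the printed AST for %ktpm therefore matches only real ^& ketpam nodes, not places where the source merely mentions the label.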

Of the other uncommon runes, I note as well that .^ dotket, used to peek or scry, is important in userspace but that the kernel rarely needs this expedient.

Conclusion

My original intent several years ago was to treat such a frequency map as a pedagogical tool. While there are compelling reasons not to treat rune frequency as a normative check on programmer behavior, knowing which runes are used the most in practice guides the sorts of Hoon which should be taught and documented most clearly first.

Subsequent investigations which may be illuminating include:

  1. The change in rune utilization over time (based on age of commit).
  2. The relative frequency differences in different vanes.
  3. The desugared frequency of runes.
  4. The characteristic rune frequency patterns in userspace code, or by programmer.

The kernel has important differences from userspace, but rune frequency in the kernel acts as a reference thumbprint for rune utilization by experienced senior developers in Urbit code.
