Here we'll walk through a practical example of writing a mark file.
We'll create a mark for CSV (comma separated values) files, a simple format for storing tabular data in a text file.
CSV files separate fields with commas and rows with line breaks. They look something like:

foo,bar,baz
blah,blah,blah
1,2,3

There is a little complexity surrounding special characters in fields and line endings, but the only other rule is that all rows must have the same number of fields. You can refer to RFC 4180 on the IETF website for more details.
We'll represent such a structure in Hoon as a (list (list @t)) like:
```hoon
[['foo' 'bar' 'baz' ~] ['blah' 'blah' 'blah' ~] ['1' '2' '3' ~] ~]
```
We could perhaps create the type with a $| rune to include row-length validation in the mold itself, but a (list (list @t)) is simpler for demonstrative purposes.
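To make the row-length rule concrete, here is a small Python sketch (not Hoon) that models the table as a list of lists of strings. The rows_are_uniform helper is a hypothetical name used only for illustration, and treating an empty table as valid is an assumption.

```python
def rows_are_uniform(rows: list[list[str]]) -> bool:
    """Return True when every row has the same number of fields."""
    if not rows:
        return True  # assumption: an empty table counts as valid
    width = len(rows[0])
    return all(len(row) == width for row in rows)


table = [["foo", "bar", "baz"], ["blah", "blah", "blah"], ["1", "2", "3"]]
print(rows_are_uniform(table))               # True
print(rows_are_uniform(table + [["oops"]]))  # False
```

A check along these lines is what baking validation into the mold would enforce on every conversion.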
A simple mark
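Let's begin with the simplest mark file:
```hoon
|_  csv=(list (list @t))
::  conversions from other marks to this one
++  grab
  |%
  ++  noun  (list (list @t))
  --
::  conversions from this mark to other marks
++  grow
  |%
  ++  noun  csv
  --
::  revision control, delegated to the %noun mark
++  grad  %noun
--
```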
The door takes a (list (list @t)) as its sample, and we've given it a face of csv so we can easily reference it. Note that its face could be anything; it needn't be the name of our mark. When we're doing something with data that has a mark, like converting it to another mark or creating a diff, this is where our data will reside.
Next we have the +grab arm of our door, which contains a core with arms for converting to our mark from other marks. We've given it one arm for the %noun mark, the most generic mark, which will take any noun. Our +noun arm will simply clam whatever it's given with the (list (list @t)) mold.
Next is the +grow arm, which does the inverse of +grab, converting from our mark to another mark. We've also given it a +noun arm. This time it will simply return the door's sample named csv, which is of course already a (list (list @t)).
Note that the +noun arm is mandatory in +grab. Clay cannot build a mark core without it. Conversion arms for any other marks apart from %noun are optional.
Finally we have the +grad arm. This arm specifies functions for revision control like creating diffs, patching files and so on. In our case, rather than writing all those functions, we've just delegated those tasks to the %noun mark. We can do this because we've specified conversion routines to and from the %noun mark in our +grab and +grow arms. When we modify a file with a %csv mark, Clay will convert our data to a %noun mark, execute the necessary +grad functions from the %noun mark file, and then convert it back to a %csv mark.
So now we have a valid mark file. If we save this as csv.hoon in the /mar directory, we could store %csv data in Clay. This may be sufficient for some applications, but what if we want to import a CSV file from Unix or elsewhere? In the next section, we'll look at conversions to and from a %mime mark to address this.
The $mime type represents raw data from Unix or elsewhere. For example, if a text file from Unix containing the word foo were converted to a $mime type in Urbit, it would look something like:
```
[/text/plain q=[p=3 q=7.303.014]]
```
/text/plain is its MIME type and p.q is the byte-length of q.q, which is the data itself as an atom.
The %mime mark is used by Clay to store and convert $mime data. It's an important mark for moving files from Unix to Urbit and vice versa. When you add a file to a desk you have mounted to Unix and |commit the change, Clay will first receive the file as a %mime mark, then convert from %mime to whatever mark matches the file extension. For example, foo.txt will be converted from %mime to %txt. Additionally, data fetched by Iris over HTTP will come in as a $mime-data:http, which is an unvalidated form of $mime that you may wish to convert to a %mime mark and then to another mark. Likewise with Eyre, some of the lower-level interfaces receive HTTP requests with $mime-data:http in them.
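The byte-length and atom in the $mime example above can be reproduced outside Urbit. The following is a minimal Python sketch (not Hoon), assuming the atom is the file's bytes read as a little-endian integer; to_octs and from_octs are hypothetical helper names used only for illustration.

```python
def to_octs(data: bytes) -> tuple[int, int]:
    """Model an octs as (p, q): byte length and little-endian integer."""
    return len(data), int.from_bytes(data, "little")


def from_octs(p: int, q: int) -> bytes:
    """Recover the original bytes; p restores any trailing zero bytes."""
    return q.to_bytes(p, "little")


p, q = to_octs(b"foo")
print(p, q)             # 3 7303014
print(from_octs(p, q))  # b'foo'
```

The explicit length matters because trailing zero bytes do not change the integer's value, so the integer alone cannot say how long the data was.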
So with the nature of the %mime mark hopefully now clear, the reason we want conversion methods to and from %mime in our %csv mark is so we can import CSV files from Unix and export them back to Unix.
Since a CSV file on Unix will just be a long string with ASCII or UTF-8 encoding, we can treat q.q in the $mime as a cord, and thus write a parser to convert it to a (list (list @t)). For this purpose, here's a library: csv-utils.hoon, which you can view in full on the Examples page.
The library contains four functions:
+de-csv - Parse a CSV-format @t to a (list (list @t)).
+en-csv - Encode a (list (list @t)) as a CSV-format @t.
+validate - Check all rows of a (list (list @t)) are the same length.
+csv-join - Ignore this for now, we'll use it later on.
The decoding and encoding arms use parsing functions from the Hoon standard library. It's not important to be familiar with parsing in Hoon for our purposes here, but you can have a look at the Parsing Guide in the Hoon documentation if you're interested. The important thing to note is that +de-csv takes a valid CSV-format @t and returns a (list (list @t)), and +en-csv does the reverse: it takes a (list (list @t)) and returns a CSV-format @t.
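The decode and encode contract described above can be sketched in a few lines of Python (not Hoon). This is a deliberately naive model: it ignores quoting and escaping entirely, skips blank lines as an assumption, and is not how the library is implemented. The names de_csv and en_csv simply mirror the arm names.

```python
def de_csv(text: str) -> list[list[str]]:
    """Split CSV text into rows of fields (no quoting or escaping handled)."""
    return [line.split(",") for line in text.splitlines() if line]


def en_csv(rows: list[list[str]]) -> str:
    """Join rows back into CSV text, ending each row with a newline."""
    return "".join(",".join(row) + "\n" for row in rows)


rows = de_csv("foo,bar,baz\nblah,blah,blah\n1,2,3")
print(rows)                   # [['foo', 'bar', 'baz'], ['blah', 'blah', 'blah'], ['1', '2', '3']]
print(repr(en_csv(rows)))     # 'foo,bar,baz\nblah,blah,blah\n1,2,3\n'
print(de_csv(en_csv(rows)) == rows)  # True
```

The round trip in the last line is the property a mark relies on: converting to text and back should give the same table.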
Let's try the library in the dojo. After we've added it to /lib and run |commit, we can build the file:
```
> =csv-utils -build-file %/lib/csv-utils/hoon
```
...try decoding a CSV-format @t:
```
> (de-csv:csv-utils 'foo,bar,baz\0ablah,blah,blah\0a1,2,3')
~[<|foo bar baz|> <|blah blah blah|> <|1 2 3|>]
```
...and try encoding a (list (list @t)) as a CSV-format @t:
```
> (en-csv:csv-utils [['foo' 'bar' 'baz' ~] ['blah' 'blah' 'blah' ~] ['1' '2' '3' ~] ~])
'foo,bar,baz\0ablah,blah,blah\0a1,2,3\0a'
```
With that working, we can add an import for our library to our %csv mark definition and add a +mime arm to both our +grab and +grow arms:
```hoon
/+  *csv-utils
|_  csv=(list (list @t))
++  grab
  |%
  ::  parse the raw CSV text held in the $mime
  ++  mime  |=((pair mite octs) (de-csv q.q))
  ::  clam the noun, then check row lengths
  ++  noun
    |=  n=*
    ^-  (list (list @t))
    =/  result  ((list (list @t)) n)
    ?>  (validate result)
    result
  --
++  grow
  |%
  ::  encode as CSV text and wrap it as a $mime
  ++  mime
    ?>  (validate csv)
    [/text/csv (as-octs:mimes:html (en-csv csv))]
  ++  noun
    ?>  (validate csv)
    csv
  --
++  grad  %noun
--
```
In +grab we've added a +mime arm to convert from a %mime mark to our %csv mark. It's a simple gate that takes a $mime (specified as (pair mite octs) to avoid conflict with the arm name), runs the data through the +de-csv function and returns a (list (list @t)) of the CSV data.
We've also added a +mime arm to +grow for converting from our %csv mark to a %mime mark. We encode our (list (list @t)) csv sample with our +en-csv function and then run that through as-octs:mimes:html to get a $octs (so it has the byte-length). We also add the /text/csv MIME type so it's a valid $mime.
Additionally, we've used the +validate function in a few places to make sure our CSV data has consistent row lengths.
If we save the above mark file as csv.hoon in /mar and |commit %base, we should now be able to import CSV files into Urbit. Let's give it a go. In the root of our %base desk, let's add a file named foo.csv with the following contents:

foo,bar,baz
blah,blah,blah
1,2,3

If we now |commit %base, we should see it's been successfully added:
```
> |commit %base
>=
+ /~zod/base/4/foo/csv
```
And if we try reading the file with the -read thread:
```
> -read [%x our %base da+now /foo/csv]
~[<|foo bar baz|> <|blah blah blah|> <|1 2 3|>]
```
We can see our %csv mark has successfully converted our foo.csv file to a (list (list @t)) when it was imported.
Let's try the other direction now. We can create a new bar.csv file in the root of %base from the dojo like so:
```
> */bar/csv ~[['abc' 'def' ~] ['ghi' 'jkl' ~]]
+ /~zod/base/5/bar/csv
```
And if we check it in the terminal on the Unix side, we can see it's been correctly encoded:
```
> cat zod/base/bar.csv
abc,def
ghi,jkl
```
So now our %csv mark lets us move data in and out of Urbit. In the next section, we'll look at the +grad arm in more detail.
So far we've just delegated +grad functions to the %noun mark, but now we'll look at writing our own.
For demonstrative purposes, we can just poach the algorithms used in the +grad arm of the %txt mark and modify them to take our (list (list @t)) type instead of a wain. It's not the most efficient algorithm for a CSV file but it'll do the job.
Our diff format will be a (urge:clay (list @t)), and we'll use some differ functions such as +loss, +lusk and +lurk to produce diffs and apply patches.
The csv-utils.hoon library we imported also contains a +csv-join function which we'll use in the +join arm, just to save space here.
Here's the new %csv mark:
```hoon
/+  *csv-utils
|_  csv=(list (list @t))
++  grab
  |%
  ++  mime  |=((pair mite octs) (de-csv q.q))
  ++  noun
    |=  n=*
    ^-  (list (list @t))
    =/  result  ((list (list @t)) n)
    ?>  (validate result)
    result
  --
++  grow
  |%
  ++  mime
    ?>  (validate csv)
    [/text/csv (as-octs:mimes:html (en-csv csv))]
  ++  noun
    ?>  (validate csv)
    csv
  --
++  grad
  |%
  ::  mark of the diffs produced below
  ++  form  %csv-diff
  ::  diff the door's sample against bob
  ++  diff
    |=  bob=(list (list @t))
    ^-  (urge:clay (list @t))
    ?>  (validate csv)
    ?>  (validate bob)
    (lusk:differ csv bob (loss:differ csv bob))
  ::  apply a diff to the door's sample
  ++  pact
    |=  dif=(urge:clay (list @t))
    ^-  (list (list @t))
    =/  result  (lurk:differ csv dif)
    ?>  (validate result)
    result
  ::  merge two diffs, ~ on conflict
  ++  join
    |=  $:  ali=(urge:clay (list @t))
            bob=(urge:clay (list @t))
        ==
    ^-  (unit (urge:clay (list @t)))
    (csv-join ali bob)
  ::  forced merge, left as a crashing dummy
  ++  mash
    |=  $:  [ship desk (urge:clay (list @t))]
            [ship desk (urge:clay (list @t))]
        ==
    ^-  (urge:clay (list @t))
    ~|(%csv-mash !!)
  --
--
```
In our modified +grad arm, we've replaced the %noun delegation with a core containing five arms: +form, +diff, +pact, +join and +mash. These arms are all required for a valid +grad if it's not delegated to another mark. We'll now look at each in detail.
```hoon
++  form  %csv-diff
```
+form simply specifies the mark of the diff file that may be produced by other +grad functions. If your diff is the same type as your mark, it could just specify itself, like %csv. In our case our diff is a (urge:clay (list @t)) rather than a (list (list @t)), so we need a separate mark file for the diff itself.
Here's a %csv-diff mark file, which can be saved in the /mar directory:
```hoon
|_  dif=(urge:clay (list @t))
++  grab
  |%
  ++  noun  (urge:clay (list @t))
  --
++  grow
  |%
  ++  noun  dif
  --
++  grad  %noun
--
```
It's very bare-bones; we just need it for our %csv mark to work. In our %csv mark, we've specified it as %csv-diff in the +form arm.
```hoon
++  diff
  |=  bob=(list (list @t))
  ^-  (urge:clay (list @t))
  ?>  (validate csv)
  ?>  (validate bob)
  (lusk:differ csv bob (loss:differ csv bob))
```
This arm produces the diff of two %csv files. The first %csv file will be given as the sample of the parent door, which if you'll recall we gave a face of csv. The second %csv file will be given as the sample of the gate in +diff, which we've named bob here. We then just produce the diff of these two files and return it as the type of the mark specified in +form, which in our case is (urge:clay (list @t)) for a %csv-diff. Clay will use +diff when a file is revised, so it doesn't have to store a whole new copy of the file each time it's modified.
```hoon
++  pact
  |=  dif=(urge:clay (list @t))
  ^-  (list (list @t))
  =/  result  (lurk:differ csv dif)
  ?>  (validate result)
  result
```
+pact patches a %csv file with the given diff. Its gate takes a diff and applies it to the %csv given as the sample of the parent door (which we gave a face of csv). If the patch succeeds, it will return a new %csv file, a valid (list (list @t)). When we read a file that's been modified in Clay, Clay will apply all the diffs it has with +pact and return the resulting file.
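The relationship between +diff and +pact can be modelled outside Urbit. The following Python sketch (not Hoon) assumes a diff is essentially a list of entries that either say "this many rows are unchanged" or "replace these rows with those"; that shape is an assumption made for illustration, and Python's difflib stands in for the differ functions rather than reproducing the algorithm Clay uses. The diff and patch names are hypothetical, and rows are tuples because difflib needs hashable items.

```python
from difflib import SequenceMatcher


def diff(old: list[tuple], new: list[tuple]) -> list[tuple]:
    """Produce an edit list turning old into new."""
    edits = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            edits.append(("&", i2 - i1))                   # rows left unchanged
        else:
            edits.append(("|", old[i1:i2], new[j1:j2]))    # rows replaced
    return edits


def patch(old: list[tuple], edits: list[tuple]) -> list[tuple]:
    """Apply an edit list to old, failing if it does not line up."""
    out, pos = [], 0
    for edit in edits:
        if edit[0] == "&":
            out.extend(old[pos:pos + edit[1]])
            pos += edit[1]
        else:
            _, removed, added = edit
            if old[pos:pos + len(removed)] != removed:
                raise ValueError("patch does not apply")
            out.extend(added)
            pos += len(removed)
    return out


old = [("foo", "bar", "baz"), ("blah", "blah", "blah"), ("1", "2", "3")]
new = [("foo", "bar", "baz"), ("1", "2", "3"), ("4", "5", "6")]
print(patch(old, diff(old, new)) == new)  # True
```

The final line is the invariant worth checking in any custom +grad: patching the original with the diff of two revisions must reproduce the second revision.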
```hoon
++  join
  |=  $:  ali=(urge:clay (list @t))
          bob=(urge:clay (list @t))
      ==
  ^-  (unit (urge:clay (list @t)))
  (csv-join ali bob)
```
The +join arm merges two different diffs. It takes them both as the sample of its gate (which we've named ali and bob), and returns a new diff wrapped in a (unit (urge:clay (list @t))). The unit will be ~ if the merge failed due to a conflict. This is used by Clay in some cases when desks are merged. If diff merges are not possible for your use case, you could just have it always return ~.
```hoon
++  mash
  |=  $:  [ship desk (urge:clay (list @t))]
          [ship desk (urge:clay (list @t))]
      ==
  ^-  (urge:clay (list @t))
  ~|(%csv-mash !!)
```
This is like +join except it forces a diff merge even if there's a conflict. Rather than returning a unit, it just returns the diff, a (urge:clay (list @t)) in our case. Also unlike +join, it takes the ship and desk each diff came from as well as the diff itself.
The +mash arm is not used by Clay in its file revision operations, so it's safe to just make it a dummy arm that crashes as we've done here. If you were to use it, it would likely just be used manually in an agent, thread or generator.
An example of its use would be the %txt mark, which includes a proper +mash function that produces a diff with any conflicts annotated, though how you have +mash handle conflicts would depend on your use case. If there were no conflicts between the two diffs, it should produce the same diff as the +join arm.
So there you have it, a fully functional mark for CSV files. A mark file can be as complex or as simple as you'd like; they're very flexible depending on your use case. Additional conversion methods can always be added as they're needed. For example, with just a few lines of code we could add arms for converting CSV files to %txt and vice versa.
In the next document, we'll look at building and using mark cores and mark conversion gates in our own code.