Updated on Thu, 05 Jan 2017 14:52:39 GMT, tagged with ‘tech’.
(Originally posted on Gist.)
In writing one’s own Base64 codec for the Cryptopals Crypto Challenge in Rust (see my multilingual code here!), one gets to a point where every chunk of four adjacent elements in an input vector has to be transformed into a chunk of three elements in an output vector.
That is, the string
SSdt
containing four ASCII bytes becomes the stringI'm
containing three ASCII bytes, andIGtp
becomeski
, and so on, so thatSSdt IGtp bGxp bmcg eW91 ciBi cmFp biBs aWtl IGEg cG9p c29u b3Vz IG11 c2hy b29t
is decoded toI'm killing your brain like a poisonous mushroom
.
I had a function to do this four-to-three downconversion but looping over the two arrays, lining up the indexes, the keeping track of magic threes and magic fours in my code gave me a headache as I worked through writing the following:
pub fn decode(s: &[u8]) -> Vec<u8> {
let mut out: Vec<u8> = vec![0; s.len() / 4 * 3];
for i in 0..out.len() / 3 {
let v = &s[i * 4..i * 4 + 4];
&out[i * 3..i * 3 + 3].copy_from_slice(&quad2triplet(v[0], v[1], v[2], v[3]));
}
out
}
I thought Rust was just C with safety and type inference, so keeping track of indexes like this is just how it was done.
(Caveat, the code above is a tiny bit wrong—it’ll compile and give you the right answer but might pad it with one or two bytes that need trimming if the input ended in =
.)
After Rusting a few more Cryptopals challenges, and binge-reading the standard library’s documentation on iterators and slices, I finally realized this decoder could be written entirely without brittle magic-numbered indexing arithmetic:
pub fn decode0(s: &[u8]) -> Vec<u8> {
let mut out: Vec<u8> = vec![0; s.len() / 4 * 3];
for (v, vo) in s.chunks(4).zip(out.as_mut_slice().chunks_mut(3)) {
vo.copy_from_slice(&quad2triplet(v[0], v[1], v[2], v[3]))
}
out
}
Just to emphasize the crucial lines—before:
// BEFORE: 🤞
for i in 0..out.len() / 3 { // (🏸)
let v = &s[i * 4..i * 4 + 4]; // (🏑)
&out[i * 3..i * 3 + 3].copy_from_slice(/*...*/); // (🏏)
}
and after:
// AFTER: 🙌
for (v, vo) in s.chunks(4).zip(out.as_mut_slice().chunks_mut(3)) {
vo.copy_from_slice(/*...*/)
}
At first glance, the changes might look insignificant, but I reiterate.
No more indexing.
This is big. Before coding, the idea in one’s mind is, “take four-chunks, transform them into three-chunks”.
Then one labors for a few seconds–minutes ironing out three separate indexes: (🏸) how many 3-chunks in the output, each corresponding to (🏑) which chunk of the input, each going to (🏏) which chunk of the output.
And now, Rust—Rust, billed as a safe systems-level language, which I interpreted to mean “C with type inference”—lets us quite literally express our initial idea: we talk about chunks(4)
of the input, and chunks_mut(3)
of the output, and that’s it.
This, my friends, is much bigger than safety or systems programming. This is about ML-grade expressivity in a non-garbage-collected, compiled language.
(N.B. And it’s not just this example. I’m compiling a list of functions where I thought, “Wow, this is Clojure-clear”. Which is a compliment to Rust.)
(PS. The full code is here: base64decode.rs. Also, many thanks to Michael Gattozzi’s “Rust is it’s community” and “Why you should be blogging about Rust” for prodding me to write this brief note.)
(Banner image credit: Didier Descouens (User:Archaeodontosaurus), Hematite Iron-Rose, from Ouro Preto, in Minas Gerais, Brazil. Wikimedia.)
Previous: A UTF-8 mini-refresher
Next: Announcing: KanjiBreak