shift or die

security. photography. foobar.

How to turn a Dromedary camel into a Bactrian camel

I recently stumbled over a tweet by @jmaslak which talks about how you can turn a Dromedary camel into a Bactrian camel using Perl6. The following code:

my $c = '🐪';
$c++;
say $c;

produces the following output: “🐫”

The reason is that the Unicode characters 🐪 and 🐫 have the code points U+1F42A and U+1F42B respectively, so the ++ operator moves from one to the next (while looking at that code I also learned that ++ is not the same as += 1 – if you try this, Rakudo complains that 🐪 is not a valid base-10 number).

Since I am currently in the process of learning more about both Haskell and PureScript, I decided I wanted to try and replicate that code in both languages.

In Haskell, I managed quite quickly as follows:

Prelude> import Data.Char
Prelude Data.Char> putStrLn [(chr . (+1) . ord) '🐪']
🐫

While writing this blog post, I realized that Char has an Enum type class instance as well, so the code can be made even simpler:

Prelude> putStrLn [succ '🐪']
🐫
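
To double-check the code point claim from the Haskell side, here is a minimal sketch that prints the code points before and after succ. The name codePointLabel is made up for this sketch, and it assumes the source file is saved as UTF-8 so that the emoji literal is read correctly.

```haskell
import Data.Char (ord, toUpper)
import Numeric (showHex)

-- Format a character's code point as U+ followed by uppercase hex digits.
-- (No zero padding, so this is only the usual notation for values >= 0x1000.)
codePointLabel :: Char -> String
codePointLabel c = "U+" ++ map toUpper (showHex (ord c) "")

main :: IO ()
main = do
  putStrLn (codePointLabel '🐪')        -- U+1F42A
  putStrLn (codePointLabel (succ '🐪')) -- U+1F42B
```

Only ASCII is printed here, so the result does not depend on the terminal's encoding.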

PureScript created a bit more of a headache for me. I first tried to work with toCharCode from Data.Char, but …

PSCi, version 0.12.0
Type :? for help

import Prelude

> import Data.Char
> toCharCode '🐪'
(line 1, column 15):
unexpected astral code point in character literal; characters must be valid UTF-16 code units

What? That kinda reminds me of an 11-year-old rant about VBScript. Oh well, luckily, if one knows where to dig (or whines a bit on Twitter), the Data.String.CodePoints module comes to the rescue. Equipped with this, I arrived at the following solution:

import Data.String.CodePoints (singleton, codePointAt)
import Data.Enum (succ)
import Data.Maybe (maybe)
maybe "" singleton (codePointAt 0 "🐪" >>= succ)

Wow, that looks a bit more complicated than in Haskell. OTOH, it is also safer. Let me try and explain what is happening here:

Since we still can’t use a Dromedary camel in a character literal, we have to put it into a string literal (I am still somewhat confused as to why that works in string literals but not in character literals …). We can then call the codePointAt function, which has the following type:

> :t codePointAt
Int -> String -> Maybe CodePoint

So we pass it an Int (the position in the string, 0 in our case) and a String and we get back a Maybe CodePoint. Why Maybe? Because if we want to get for example the code point of the second character of “🐪”, it does not exist, so it will return Nothing to signal this.
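
As an aside on the character literal error above: the message says that a Char must be a single UTF-16 code unit, and U+1F42A lies above U+FFFF, so in UTF-16 it needs two code units (a surrogate pair). A string can hold both units, which is a plausible reason why the string literal is accepted while the character literal is not. The following Haskell sketch computes that pair; surrogatePair is a name made up for illustration and is not part of any library.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Numeric (showHex)

-- Split a code point above U+FFFF into its UTF-16 surrogate pair.
-- Returns Nothing for code points that already fit into one code unit.
surrogatePair :: Char -> Maybe (Int, Int)
surrogatePair c
  | cp < 0x10000 = Nothing
  | otherwise    = Just (high, low)
  where
    cp     = ord c
    offset = cp - 0x10000
    high   = 0xD800 + (offset `shiftR` 10) -- top 10 bits of the offset
    low    = 0xDC00 + (offset .&. 0x3FF)   -- bottom 10 bits of the offset

main :: IO ()
main = case surrogatePair '🐪' of
  Nothing     -> putStrLn "fits into a single UTF-16 code unit"
  Just (h, l) -> putStrLn (showHex h "" ++ " " ++ showHex l "") -- d83d dc2a
```

So the string contains one code point but two UTF-16 code units, which is exactly the distinction Data.String.CodePoints takes care of.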

As a second step, we want to get the next code point from here. Luckily, CodePoint has an Enum type class instance (at least in newer versions of Data.String.CodePoints; as Phil Freeman himself pointed out, the above code unfortunately does not work on try.purescript.org). This means we can use the succ function, which has the following type:

> :t succ
forall a. Enum a => a -> Maybe a

My first attempt was to say: “OK, then I will just (f)map succ over the Maybe CodePoint returned by codePointAt 0”. But then I end up with a double Just construct:

> succ <$> codePointAt 0 "🐪"
(Just (Just (CodePoint 0x1F42B)))

Then I realized that I recently read in The Haskell Book (Haskell Programming From First Principles) that this is exactly the use case for Monads and the bind operator (>>=). So the bind operator makes sure that we get rid of one of the layers of Maybes and does what we want:

> codePointAt 0 "🐪" >>= succ
(Just (CodePoint 0x1F42B))
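
The same difference between fmap and bind can be reproduced in Haskell. Haskell's own succ is partial (it raises a runtime error at the upper bound), so this sketch first wraps it in a Maybe-returning variant to mimic the PureScript type. The helper succMay is hypothetical and not part of base.

```haskell
-- A total successor: Nothing at the upper bound instead of a runtime error.
-- succMay is a made-up helper that mimics the type of PureScript's succ.
succMay :: (Eq a, Bounded a, Enum a) => a -> Maybe a
succMay x
  | x == maxBound = Nothing
  | otherwise     = Just (succ x)

main :: IO ()
main = do
  let start = Just 'a' :: Maybe Char
  print (fmap succMay start)                  -- Just (Just 'b')
  print (start >>= succMay)                   -- Just 'b'
  print (Just (maxBound :: Char) >>= succMay) -- Nothing
```

The last line shows the "safer" part: where plain succ would fail on the last Char, the Maybe version just yields Nothing.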

We have a Maybe CodePoint now which we want to turn into a String. For this, we combine the maybe function from Data.Maybe and singleton from Data.String.CodePoints. Here are their types:

> :t maybe
forall a b. b -> (a -> b) -> Maybe a -> b

> :t singleton
CodePoint -> String

Let’s start with singleton: it takes a CodePoint and gives us a String of length 1 containing the character represented by that code point. The maybe function takes a default value of type b, a function from a to b, and a Maybe a value, and gives us a b: the default if the Maybe a is Nothing, or the result of applying the function to the value inside the Just otherwise.
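
Both branches of maybe can be sketched in Haskell, where Data.Maybe offers a maybe function with the same argument order. Here (: []) stands in for singleton, since it turns a Char into a one-character String. This is only an analogue of the PureScript code, and renderChar is a name made up for the example.

```haskell
import Data.Maybe (maybe)

-- Turn an optional character into a String, falling back to "" for Nothing.
-- (: []) plays the role of singleton here.
renderChar :: Maybe Char -> String
renderChar = maybe "" (: [])

main :: IO ()
main = do
  print (renderChar (Just 'x')) -- "x"
  print (renderChar Nothing)    -- ""
```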

If we want to combine singleton with maybe, we can figure out what the types a and b are in our specific case. For this we can use typed holes, something I recently learned about at the very nice FP Unconference BusConf 2018:

> :t maybe ?b singleton ?ma
[...]
    Hole 'b' has the inferred type

    String

[...]
    Hole 'ma' has the inferred type

    Maybe CodePoint

So b is String and a is CodePoint. Great, we just need to choose the empty string as the default value and run it, then we end up with our camel!

> maybe "" singleton (codePointAt 0 "🐪" >>= succ)
"🐫"