Use protobufs - now

Introduction

If you've ever touched any form of web development, ever, you've probably used JSON to get data from a server to a client. Ajax queries nearly always pull data in this format.

Recently, Google invented the Protobuf standard, which promises a number of advantages. This seems to have been largely ignored by the community for a while, with most discussions degrading into complaints about one Python library's performance.

I took an interest primarily after noting that Riak KV recommends its protocol buffer interface for performance. I'll also note that I'm not a Python user.

Typed data

Aside from a potential performance increase, Protocol Buffers are typed. As someone who literally couldn't handle Javascript until things were rewritten in Typescript, this feature is worth a lot to me.

Smaller

If you're performing a 32 byte Ajax query, you probably don't care that JSON includes overhead. If you're doing a much larger query, you might.

Test bed

In order to obtain a fair test, I'm comparing against two JSON libraries: JSX, which is pure Erlang, and Jiffy, which is C.

The protobuf implementation we are using is from Basho.

I'd very much like to go on the record and state, I feel in most cases, microbenchmarks should be taken with a grain of salt. Including this one. Anyone who tries to rewrite anything based just on this blog is in for a bad time. Do your own tests.

In order to use Protocol Buffers, we start by defining the types. These live in my things.proto file.

I've used some Ruby as a quick demonstration of what our data structure may look like:

irb(main):002:0> something = {:counter => 1, :num => 50}
    => {:counter=>1, :num=>50}
irb(main):003:0> something.to_json
    => "{\"counter\":1,\"num\":50}"

Using this, I can create a protobuf definition, shown below. Straight away, you can see that I've defined not only that the variables are of the int32 type, but that there are exactly two of them, and they are required. There's an obvious advantage at this point in knowing exactly what you're receiving over the wire.

message Counternumber {
    required int32 counter = 1;
    required int32 num = 2;
}
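To make the size difference concrete, here is a minimal Ruby sketch that hand-encodes this message using the documented Protocol Buffers wire format: each field is a key byte, (field_number << 3) | wire_type, followed by the value as a base-128 varint. The helper names are my own, not part of any library, and the sketch only handles non-negative integers. A real project would use generated code rather than this.

```ruby
require 'json'

# Encode a non-negative integer as a base-128 varint:
# seven bits per byte, high bit set on every byte except the last.
def encode_varint(value)
  raise ArgumentError, 'this sketch handles non-negative integers only' if value.negative?

  bytes = []
  loop do
    byte = value & 0x7F
    value >>= 7
    if value.zero?
      bytes << byte
      break
    end
    bytes << (byte | 0x80)
  end
  bytes.pack('C*')
end

# Encode one varint field. Wire type 0 means "varint", which int32 uses.
def encode_int32_field(field_number, value)
  encode_varint((field_number << 3) | 0) + encode_varint(value)
end

# Hand-rolled equivalent of the Counternumber message: counter = 1, num = 2.
def encode_counternumber(counter, num)
  encode_int32_field(1, counter) + encode_int32_field(2, num)
end

binary = encode_counternumber(1, 50)
json = JSON.generate('counter' => 1, 'num' => 50)
puts "protobuf: #{binary.bytesize} bytes (hex #{binary.unpack1('H*')})"
puts "json:     #{json.bytesize} bytes"
```

The field names never go over the wire, only the field numbers, which is where most of the saving comes from. Values above 127 need a second varint byte, so a record in this format takes four or five bytes.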

And now here's our test bed application. It was put together in a few minutes, so it's not meant to be a shining example of Erlang. If you're not familiar with Erlang or just want a tl;dr, it builds a list (an "array", if you will) of 100 of these structures, and serialises it 100000 times with each library to create a benchmark.

-module(data).
-compile(export_all).
-define(TIMES, 100000).

-type ourthing() :: {'counter',pos_integer()} | {'num',1..1000}.

-spec fullrun() -> 'ok'.
fullrun() ->
    X = makedata(),
    {Jiffy, _} = timer:tc(data, withjiffy, [X]),
    {JSX, _} = timer:tc(data, withjsx, [X]),
    {Props, _} = timer:tc(data, withprop, [X]),
    io:fwrite("Jiffy time: ~p, JSX time: ~p props time: ~p~n", [Jiffy, JSX, Props]),
    Proplen = byte_size(iolist_to_binary(withprop_node(X, []))),
    JSONlen = byte_size(jsx:encode(X)),
    io:fwrite("JSON is ~p long and Protobuf is ~p long~n", [JSONlen, Proplen]).

-spec makedata() -> [ourthing()].
makedata() ->
    Y = [ [{counter, X}, {num, rand:uniform(1000) }] || X <- lists:seq(1,100)],
    lists:flatten(Y).

-spec withprop_node([ourthing()], any()) -> [any()].
withprop_node([], Acc) ->
    Acc;

withprop_node(X, Acc) ->
    [{counter, A} , {num, B} | Tail] = X,
    Encode = thing_pb:encode_counternumber({counternumber, A, B}),
    withprop_node(Tail, [Acc | Encode]).

-spec withprop([ourthing()]) -> [any()].
withprop(X) ->
    withprop(X, ?TIMES).

-spec withprop([ourthing()], non_neg_integer()) -> [any()].
withprop(X, 0) ->
    iolist_to_binary(withprop_node(X, []));

withprop(X, T) ->
    iolist_to_binary(withprop_node(X, [])),
    withprop(X, T-1).

-spec withjsx([ourthing()]) -> any().
withjsx(X) ->
    withjsx(X, ?TIMES).


-spec withjsx([ourthing()], non_neg_integer()) -> any().
withjsx(X, 0) ->
    jsx:encode(X);

withjsx(X, T) ->
    jsx:encode(X),
    withjsx(X, T-1).

-spec withjiffy([ourthing()]) -> any().
withjiffy(X) ->
    withjiffy(X, ?TIMES).

-spec withjiffy([ourthing()], non_neg_integer()) -> any().
withjiffy(X, 0) ->
    jiffy:encode({X});

withjiffy(X, T) ->
    jiffy:encode({X}),
    withjiffy(X, T-1).

Results

With that testbed run, here is the output I'm seeing:

Jiffy time: 6936403, JSX time: 25947210 props time: 5145719
JSON is 2283 long and Protobuf is 486 long

There's an obvious benefit that's immediately visible here: the Protobuf output is less than a quarter of the size of the JSON.

To help review the timings, I've reformatted them below. Elapsed time is presented in microseconds.

Implementation    Time (microseconds)
Jiffy             6,936,403
JSX               25,947,210
Protobuf          5,145,719

In a world where performance counts, these differences are non-trivial. It's hard to argue against the benefits here.
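For anyone who wants the ratios rather than raw totals, this small Ruby sketch does the arithmetic on the figures above. The numbers are copied from my run; the iteration count is the TIMES constant from the test bed, so the per-run figures are approximate.

```ruby
# Totals in microseconds, as printed by the test bed.
ITERATIONS = 100_000
TIMINGS_US = {
  'Jiffy' => 6_936_403,
  'JSX' => 25_947_210,
  'Protobuf' => 5_145_719
}.freeze
JSON_BYTES = 2283
PROTOBUF_BYTES = 486

# Average time for one serialisation of the 100-element list.
def per_run_us(name)
  TIMINGS_US.fetch(name).to_f / ITERATIONS
end

# How many times longer than the Protobuf run this implementation took.
def relative_to_protobuf(name)
  TIMINGS_US.fetch(name).to_f / TIMINGS_US.fetch('Protobuf')
end

TIMINGS_US.each_key do |name|
  puts format('%-9s %7.2f us per run, %.2fx the Protobuf time',
              name, per_run_us(name), relative_to_protobuf(name))
end
puts format('Protobuf output is %.1f%% of the JSON size',
            100.0 * PROTOBUF_BYTES / JSON_BYTES)
```

On these figures Jiffy takes roughly a third longer than Protobuf, and JSX roughly five times as long.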

Downsides

There are of course downsides. Working with protobufs is obviously more work, and they'll have to be converted on the client side. I'll suggest a "development mode" that still uses JSON, so you can use the network monitor usefully when you need it.
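One way such a "development mode" could look is sketched below in Ruby. Everything here is hypothetical: the render_payload helper, the development flag and the content types are my own choices for illustration, and the binary encoder is injected as a callable so the sketch doesn't depend on any particular protobuf library.

```ruby
require 'json'

# Hypothetical helper: choose the wire format for a response.
# In development we send JSON so the browser's network monitor stays readable;
# otherwise we hand the record to whatever binary encoder the caller supplies.
def render_payload(record, development:, binary_encoder:)
  if development
    ['application/json', JSON.generate(record)]
  else
    # Content type is an assumption; pick whatever your client expects.
    ['application/octet-stream', binary_encoder.call(record)]
  end
end

# Stand-in encoder for demonstration only. This is NOT protobuf, just two
# packed 32-bit integers so the example runs without extra gems.
stand_in_encoder = ->(r) { [r[:counter], r[:num]].pack('NN') }

record = { counter: 1, num: 50 }
type, body = render_payload(record, development: true, binary_encoder: stand_in_encoder)
puts "#{type}: #{body}"
type, body = render_payload(record, development: false, binary_encoder: stand_in_encoder)
puts "#{type}: #{body.bytesize} bytes"
```

The client would need the matching switch, which is part of the extra work mentioned above.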

In an upcoming blog, I'll be converting the erlvulnscan frontend to read protobuf AJAX queries.

Argon2 code audits - part one - Infer

Introduction

This article is the first part in a series in which we use popular tools to audit the Argon2 library.

Let's start with a quick background on what Argon2 is with a quote from their README:

This is the reference C implementation of Argon2, the password-hashing function that won the Password Hashing Competition (PHC).

Argon2 is a password-hashing function that summarizes the state of the art in the design of memory-hard functions and can be used to hash passwords for credential storage, key derivation, or other applications.

More information at the official Argon2 Github

In today's article, we review with a static code analysis tool. Such tools are often seen in a negative light, and hopefully the findings of this article can increase the use of such tools.

Infer

Infer is a static analysis tool for C and Java that was open sourced by Facebook. See the official Infer website here

I had used Infer early in its release, but it was quite frustrating to keep running. Every time I upgraded clang, or glibc, or just about anything, it seemed to break. As an Arch Linux user, that was regularly.

There's a great solution to this problem in modern times - Docker. I checked and it seemed Facebook had the same idea, as now they publish a Dockerfile. It actually didn't work when I first tried it, but my issue was attended to pretty quickly.

With a working file available, and since I'm not too interested in Android development, I created a slimmed down Dockerfile without the Android SDK. You can see this here:

# Base image
FROM debian:stable

MAINTAINER Infer

# Debian config
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
      build-essential \
      curl \
      git \
      groff \
      libgmp-dev \
      libmpc-dev \
      libmpfr-dev \
      m4 \
      ocaml \
      default-jdk \
      python-software-properties \
      rsync \
      software-properties-common \
      unzip \
      zlib1g-dev

# Install OPAM
RUN curl -sL \
      https://github.com/ocaml/opam/releases/download/1.2.2/opam-1.2.2-x86_64-Linux \
      -o /usr/local/bin/opam && \
    chmod 755 /usr/local/bin/opam
RUN opam init -y --comp=4.02.3 && \
    opam install -y extlib.1.5.4 atdgen.1.6.0 javalib.2.3.1 sawja.1.5.1

# Download the latest Infer release
RUN INFER_VERSION=$(curl -s https://api.github.com/repos/facebook/infer/releases \
      | grep -e '^[ ]\+"tag_name"' \
      | head -1 \
      | cut -d '"' -f 4); \
    cd /opt && \
    curl -sL \
      https://github.com/facebook/infer/releases/download/${INFER_VERSION}/infer-linux64-${INFER_VERSION}.tar.xz | \
    tar xJ && \
    rm -f /infer && \
    ln -s ${PWD}/infer-linux64-$INFER_VERSION /infer

# Compile Infer
RUN cd /infer && \
    eval $(opam config env) && \
    ./configure && \
    make -C infer clang

# Install Infer
ENV INFER_HOME /infer/infer
ENV PATH ${INFER_HOME}/bin:${PATH}

Building using this file basically consists of:

  • Place Dockerfile in an empty directory
  • Run: docker build -t infer:0.1 .

With the container built, you can bring up an Infer container and destroy it safely any time you need to test some code.

Running it

A docker container with a copy of Infer isn't that useful without a copy of your codebase. Fortunately, I happen to have a cloned git repo in my home directory. We can start the container and mount this code inside the container as follows:

$ docker run -t -v /path/to/phc-winner-argon2/:/code --rm -i infer:0.1

This will bring up a Docker container, in a way that's quite different from how you hear about Docker being used in devops scenarios. Specifically, it'll bring you into an interactive shell, and when you run "exit" it will destroy the container.

The first thing we'll want to do is cd to the /code directory, from which we can start running the infer analyzer (conveniently in our PATH) against the codebase.

$ infer -- clang -c  -Wall -g -Iinclude -Isrc  -pthread src/run.c
Starting analysis (Infer version v0.6.0)
Computing dependencies... 100%
Creating clusters... 100%
Analyzing 1 clusters.Analysis finished in 0.257342s
Analyzed 4 procedures in 1 file
No issues found

What you'll see there is that run.c was analyzed, with no real output to talk about. We should work through each file in this fashion. It turns out core.c is the interesting one.

$ infer -- clang -c  -Wall -g -Iinclude -Isrc  -pthread src/core.c
Starting analysis (Infer version v0.6.0)
Computing dependencies... 100%
Creating clusters... 100%
Analyzing 1 clusters.Analysis finished in 0.777034s
Analyzed 17 procedures in 1 file
Found 4 issues
src/core.c:286: error: MEMORY_LEAK
   memory dynamically allocated to thr_data by call to calloc() at line 267, column 16 is not reachable after line 286, column 25
  284.                       rc = argon2_thread_join(thread[l - instance->threads]);
  285.                       if (rc) {
  286. >                         return ARGON2_THREAD_FAIL;
  287.                       }
  288.                   }

src/core.c:286: error: MEMORY_LEAK
   memory dynamically allocated to thread by call to calloc() at line 262, column 14 is not reachable after line 286, column 25
  284.                       rc = argon2_thread_join(thread[l - instance->threads]);
  285.                       if (rc) {
  286. >                         return ARGON2_THREAD_FAIL;
  287.                       }
  288.                   }

src/core.c:302: error: MEMORY_LEAK
   memory dynamically allocated to thr_data by call to calloc() at line 267, column 16 is not reachable after line 302, column 21
  300.                                             (void *)&thr_data[l]);
  301.                   if (rc) {
  302. >                     return ARGON2_THREAD_FAIL;
  303.                   }
  304.

src/core.c:302: error: MEMORY_LEAK
   memory dynamically allocated to thread by call to calloc() at line 262, column 14 is not reachable after line 302, column 21
  300.                                             (void *)&thr_data[l]);
  301.                   if (rc) {
  302. >                     return ARGON2_THREAD_FAIL;
  303.                   }
  304.

A quick review of this codebase, with the highly descriptive output above, should let you quickly ascertain that, yes, these are genuine issues, and fairly easy to fix: both early returns skip freeing the thread and thr_data allocations.

This became a PR:

Pull request fixing this issue

Conclusion

Hopefully what this demonstrates is that, once the appropriate container is handy, running Infer is something that can be done in minutes. Of course, in a larger scale project, it wouldn't be hard to script the execution, as opposed to running it manually for each file.
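As a rough idea of what scripting it could look like, here is a small Ruby sketch that runs the same command over every C file in a directory. The compiler flags are copied from the manual runs above; the helper names and the assumption that all sources sit directly under src/ are mine, and I make no claim about what exit status Infer returns when it finds issues.

```ruby
require 'shellwords'

# Compiler flags copied from the manual invocation.
CFLAGS = %w[-Wall -g -Iinclude -Isrc -pthread].freeze

# Build the argument vector for analysing a single C file.
def infer_command(source_file)
  ['infer', '--', 'clang', '-c', *CFLAGS, source_file]
end

# Run Infer over every .c file in source_dir, printing each command first.
# Returns the files whose command did not exit successfully.
def analyse_all(source_dir = 'src')
  Dir.glob(File.join(source_dir, '*.c')).sort.reject do |file|
    puts "==> #{infer_command(file).shelljoin}"
    system(*infer_command(file))
  end
end

# Only kick off the analysis when run as a script from the /code directory.
analyse_all if $PROGRAM_NAME == __FILE__
```

Passing the command to system as separate arguments avoids any shell quoting problems with unusual file names.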

The practical output here is precisely zero false positives, and four genuine memory leaks. I encourage more developers to look into such solutions. Obviously, a huge amount of credit goes to Facebook for releasing this tool.

The interesting thing here is that I had previously run this codebase through Valgrind. What Valgrind misses is that it will only detect leaks on code paths that actually get triggered during execution, and these error paths never were.

In our next part, we implement an afl-fuzz harness!

Let's Encrypt - It's happening

Using Let's Encrypt

Today, the Let's Encrypt team announced the launch of their beta program.

This is a huge step forward for the Internet in general. We are living in a world where a Symantec account manager actually believes a business should sink $2,490 into a certificate that is identical in security level (no, I don't count "identity" if end users can't tell the difference) to a $9 alternative.

Costs aside, there have been a lot of excuses used as to why websites aren't secure. The performance complaint is long debunked, and the maintenance issue is one Let's Encrypt also sets out to resolve.

Clients

Let's Encrypt introduces the notion of a client product, as opposed to utilising a website. The default client aims to be as "hands off" as possible. For the majority of the Internet - that's a net gain.

For anyone with any sysadmin experience however, you'll be extremely cautious of a tool that automatically edits server config files. Or you may just be an nginx user, who found the official client is known to break nginx and thus disables support by default.

For this reason, I have a strong preference for Unixcharles' acme-client. This also avoids sudo, although I note this can now be achieved in the official client.

Running it

Unixcharles stresses that his product is a gem, designed for use as part of a larger project, rather than a standalone client. It happens, however, that such a project can be the quite simple Ruby script I've written to use said gem.

UPDATE: The acme-client gem has had a significant update. The gist below has been rewritten to compensate.

#!/usr/bin/env ruby

# We're going to need a private key.
require 'openssl'
require 'fileutils'

# Initialize the client
require 'acme/client'

# We need an ACME server to talk to, see github.com/letsencrypt/boulder
ENDPOINT = 'https://acme-v01.api.letsencrypt.org/'
# Staging endpoint, swap it in while testing:
# ENDPOINT = 'https://acme-staging.api.letsencrypt.org'

ACCOUNTFILE = 'accountkey.pem'
NAMES = {
  'lolware.net' => '/var/www/html',
  'erlvulnscan.lolware.net' => '/home/technion/erlvulnscan/frontend/build',
  'ctadvisor.lolware.net' => '/home/technion/ctadvisorint/public',
  'www.lolware.net' => '/var/www/html'
}

unless File.exist?(ACCOUNTFILE)
  puts "Creating new account file"
  privatekey = OpenSSL::PKey::RSA.new(2048)
  client = Acme::Client.new(private_key: privatekey, endpoint: ENDPOINT)

  # If the private key is not known to the server, we need to register it for
  # the first time.
  registration = client.register(contact: 'mailto:technion@lolware.net')

  # You may need to agree to the terms (that's up to the server to require it
  # or not, but boulder does by default)
  registration.agree_terms

  NAMES.each do |subject, path|
    authorization = client.authorize(domain: subject)
    challenge = authorization.http01

    # Save the file. We'll create a public directory to serve it from, and
    # inside it we'll create the challenge directory.
    FileUtils.mkdir_p(File.join(path, File.dirname(challenge.filename)))

    # Then write the file
    File.write(File.join(path, challenge.filename), challenge.file_content)

    # Wait a bit for the server to make the request, or really just blink, it
    # should be fast.
    sleep(5)
    puts "Verification status: " + challenge.verify_status # => 'pending'
  end

  open ACCOUNTFILE, 'w' do |io|
    io.write privatekey.to_pem
  end
  puts "New account written"
end

privatekey = OpenSSL::PKey::RSA.new(File.read ACCOUNTFILE)
client = Acme::Client.new(private_key: privatekey, endpoint: ENDPOINT)

# We're going to need a certificate signing request. If not explicitly
# specified, the first name listed becomes the common name.
csr = Acme::Client::CertificateRequest.new(names: NAMES.keys)

# We can now request a certificate. You can pass anything that returns
# a valid DER encoded CSR when calling to_der on it, for example an
# OpenSSL::X509::Request too.
certificate = client.new_certificate(csr) # => #<Acme::Client::Certificate ....>

# Save the certificate and key
File.write("privkey.pem", certificate.request.private_key.to_pem)
File.write("fullchain.pem", certificate.fullchain_to_pem)

Now you've still got a hurdle to overcome. In order for nginx to serve out the right certificate chain, you'll need to bundle the intermediate certificate. So you should wget that from the official Certificates Download page.

This small script will need to be sudo'ed, right after running the above. Unfortunately there's no getting out of that to restart nginx, but you can see that it's much more easily audited and checked.

cat ssl_cert.pem lets-encrypt-x1-cross-signed.pem  > /etc/nginx/pki/certificate.pem
cp ssl_private_key.pem /etc/nginx/pki/key.pem
nginx -t && systemctl restart nginx

Why yes, it is running in production

This is definitely a beta trial. That said, I have my ERLVulnscan Tool running a certificate generated by the above script. It looks clean on SSL Labs' test and will be updated via cron regularly.
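Since the cron job will run more often than the certificate actually needs replacing, a check like the following Ruby sketch could decide whether a renewal is due. It uses only the standard openssl library; the 30 day window, the helper names and the fullchain.pem file name (taken from the script above) are assumptions for illustration.

```ruby
require 'openssl'

# Renew once fewer than this many days remain. The value is an assumption.
RENEWAL_WINDOW_DAYS = 30

# Whole days left before the certificate expires (negative once expired).
def days_remaining(cert, now = Time.now)
  ((cert.not_after - now) / 86_400).floor
end

# True when the certificate is inside the renewal window.
def renewal_due?(cert, now = Time.now)
  days_remaining(cert, now) < RENEWAL_WINDOW_DAYS
end

# When run as a script next to fullchain.pem, report on the first
# certificate in the file, which is the one issued for the site.
if $PROGRAM_NAME == __FILE__ && File.exist?('fullchain.pem')
  cert = OpenSSL::X509::Certificate.new(File.read('fullchain.pem'))
  puts "#{days_remaining(cert)} days left, renewal due: #{renewal_due?(cert)}"
end
```

With 90 day certificates, a 30 day window leaves plenty of room for a failed run to be noticed and retried.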

Why isn't this particular blog using it? Because it's beta.

Update

Turns out, this blog now actually is running this script in production. There's still a big TODO about how I might manage HPKP with only a 90 day lifetime on keys that instantly replace themselves, as opposed to letting me pre-generate a CSR.

Oh look, a phone call from a Comodo representative...

With subjectAltNames!

The gist has had a significant update, to now include alt names. This is necessary for a cert to cover DOMAIN and www.DOMAIN, but you can modify one line and verify as many unrelated domains as needed.
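The gem builds the request for you when given a list of names, but as the script's comments mention, anything that responds to to_der with a valid CSR will do. As a sketch of what that could look like, this Ruby builds a request with subjectAltNames using only the standard openssl library. The build_csr helper is hypothetical and the domain names are just the ones from this post.

```ruby
require 'openssl'

# Hypothetical helper: build a CSR whose common name is the first entry in
# names and whose subjectAltName extension lists every entry.
def build_csr(names, key)
  request = OpenSSL::X509::Request.new
  request.version = 0
  request.subject = OpenSSL::X509::Name.new(
    [['CN', names.first, OpenSSL::ASN1::UTF8STRING]]
  )
  request.public_key = key.public_key

  # Extensions travel inside an "extReq" attribute on the request.
  factory = OpenSSL::X509::ExtensionFactory.new
  alt_names = names.map { |name| "DNS:#{name}" }.join(',')
  extension = factory.create_extension('subjectAltName', alt_names, false)
  ext_set = OpenSSL::ASN1::Set([OpenSSL::ASN1::Sequence([extension])])
  request.add_attribute(OpenSSL::X509::Attribute.new('extReq', ext_set))

  request.sign(key, OpenSSL::Digest.new('SHA256'))
  request
end

if $PROGRAM_NAME == __FILE__
  key = OpenSSL::PKey::RSA.new(2048)
  csr = build_csr(%w[lolware.net www.lolware.net], key)
  puts csr.to_pem
end
```

Each name in the list still has to pass its own domain authorization before the server will sign the request.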