The improving state of SSL deployment

Increase in SSL use

There have been a number of blog posts lately regarding the increasing SSL deployment across the Internet. Rather than review another survey, I'd like to show how the same pattern is visible in its impact on my monitoring service.

Certificate Transparency - Background

To address a number of the weaknesses in the SSL/CA system, a service known as Certificate Transparency (CT) was developed: a set of public, append-only logs recording issued certificates, which anyone can audit.

In order to provide immediate, actionable monitoring on top of the CT system, I launched CT_Advisor in November 2015. This service alerts you the moment a certificate is issued for your domain. Several commercial services have since moved into this space with paid versions of the same thing. As a side effect, I've been watching the major transparency logs quite closely ever since.

There's been a lot of discussion recently around the massive increase in SSL's pervasiveness across the web, such as from the great Let's Encrypt team. Here in the transparency logs, we've seen further evidence of this.

A capable monitor

CT_Advisor is designed with Erlang's "let it crash" mentality in mind. When it polls a log server, it limits the number of certificates it will grab in a cycle. That number has always sat at 32, meaning that when some form of failure occurs, no more than 32 certificates need to be reprocessed.
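To illustrate the idea (a hypothetical sketch in Ruby; the real service is Erlang, and the names below are my own), the capped batch window against an RFC 6962 get-entries endpoint can be computed like this:

```ruby
BATCH_LIMIT = 32

# Given the index of the next unprocessed log entry and the log's
# current tree size, return the [start, end] window to request from
# the log's get-entries endpoint, capped at BATCH_LIMIT entries.
# Returns nil when we're fully caught up.
def next_batch(next_index, tree_size)
  return nil if next_index >= tree_size
  [next_index, [next_index + BATCH_LIMIT, tree_size].min - 1]
end
```

If a poll cycle crashes at any point, the worst case is re-requesting and re-parsing one 32-entry window.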

This was particularly important in the early days, as the logs contained a lot of certificates that didn't fit the template I originally built the service to handle.

The original polling interval was set to ten seconds, and shortly afterwards relaxed to one minute. In other words, parsing a maximum of 32 records every minute was perfectly sufficient in the early days.

At some point this was reduced to 30 seconds, and then to 15, and until recently this was sufficient to keep up with all logged certificates.

Suddenly lagged

Last week I logged onto the service backend and found it was more than two million certificates behind in its parsing. The reason is the explosion of certificates in the CT logs.
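The arithmetic behind falling behind is straightforward. These are my own back-of-envelope numbers, not metrics from the service itself:

```ruby
# At 32 certificates per 15-second poll, the service's maximum
# daily throughput:
polls_per_day  = 86_400 / 15        # 5,760 polls per day
max_per_day    = 32 * polls_per_day # 184,320 certificates per day

# Against roughly 8 million new log entries in under a month
# (approximated here as a 28-day month):
ingest_per_day = 8_000_000 / 28     # ~285,714 certificates per day

# The backlog grows by roughly 100k certificates per day.
deficit = ingest_per_day - max_per_day
```

At that rate, a backlog of two million certificates accumulates in a matter of weeks.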

You can use the log's get-entries URL to identify the one millionth certificate in a given log. For this discussion, we're referring to the Google Aviator log. We're starting at a million because it's a point some time after the initial ingest into the log. We can see this certificate was logged on 2013-09-30.
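For reference, the get-entries endpoint (RFC 6962) is zero-indexed, so a lookup for the Nth certificate targets entry N-1. A small helper (my own, for illustration) builds the URL:

```ruby
# Build the get-entries URL for the Nth certificate in a log.
# Entries are zero-indexed, so the millionth entry is index 999999.
def entry_url(log_base, n)
  "#{log_base}/ct/v1/get-entries?start=#{n - 1}&end=#{n - 1}"
end

entry_url("https://ct.googleapis.com/aviator", 1_000_000)
# => "https://ct.googleapis.com/aviator/ct/v1/get-entries?start=999999&end=999999"
```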

Certificate 5,000,000 was logged on 2014-11-29; it took over a year to log another four million certificates.

The ten millionth certificate was logged on 2015-10-13, with just under a year producing five million certificates.

Fifteen million came along on 2016-04-25, roughly halving the time taken to reach the next five million.

Let's Encrypt very clearly kicked in at this point, with 2016-06-10 being the logging date of certificate number twenty million, less than two months after the previous block. This is well reflected in their own issuance graphs.

The pattern continues:

  • 2016-08-05 to reach 25,000,000
  • 2016-09-22 to reach 30,000,000
  • 2016-10-19 to reach 38,000,000

Not necessarily "number of certificates"

There are a few things to consider in reviewing these numbers. Firstly, Let's Encrypt's short certificate lifetime (90 days) means many more certificates issued for the same number of sites. Secondly, not every certificate is guaranteed to be logged, but the more responsible CAs are ensuring that happens.

Cloudflare is also notable, as their SAN certificates need to be reissued every time another user signs up to a free plan.

Finally, believe it or not, S/MIME certificates show up in certificate transparency logs from time to time.

But that's still an increase

Even with average certificate lifetimes coming down from two years for legacy vendors to three months for Let's Encrypt, eight million certificates logged in less than a month is unprecedented.

One of the major causes over the last two months tracks back to cPanel launching its AutoSSL feature, which automates Let's Encrypt certificates for all the cPanel users who never had access to them before.

In short, there are an awful lot more websites using SSL than there were a few years ago.

Closing remarks

I'll leave you with some classic community responses regarding CT Advisor.

  • This guy claims "fraudulent SSL certificates" are a vulnerability but can't even quote a CVE. What an embarrassment to the security industry.

  • Anyone who understands SSL will know it's not possible to get a fraudulent certificate. This service might as well claim to monitor time travellers because it'll never happen. The fact he thinks otherwise shows what an amateur he is.

Intelligent Backend Routes with Rails and nginx


A fairly common deployment involves running nginx as the first hop on an application server, which in turn routes to your backend. This blog is based on Rails as a backend, but the principle could probably be applied universally.

Common nginx configurations

The standard method of deploying the above strategy is well documented in the nginx "Pitfalls and Common Mistakes" guide. Naturally, it's listed under a GOOD section, specifically under the "proxy everything" strategy. The code they list (for a PHP backend) is:

server {
    server_name _;
    root /var/www/site;
    location / {
        try_files $uri $uri/ @proxy;
    }
    location @proxy {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass unix:/tmp/phpcgi.socket;
    }
}

What this will do is check for a static asset first (in the form of a file on disk), and failing that, proxy the request to the backend.

The immediate annoyance

What you will very quickly notice, at least if you watch your logs, is the incredible annoyance of Rails dumping an entire stack trace whenever a route isn't matched, such as when an Apple device automatically goes looking for its touch icon and you don't have one set up.

ActionController::RoutingError (No route matches [GET] "/apple-touch-icon.png"):
  actionpack (4.2.5) lib/action_dispatch/middleware/debug_exceptions.rb:21:in `call'
  actionpack (4.2.5) lib/action_dispatch/middleware/show_exceptions.rb:30:in `call'
  railties (4.2.5) lib/rails/rack/logger.rb:38:in `call_app'
  railties (4.2.5) lib/rails/rack/logger.rb:20:in `block in call'
  activesupport (4.2.5) lib/active_support/tagged_logging.rb:68:in `block in tagged'
  activesupport (4.2.5) lib/active_support/tagged_logging.rb:26:in `tagged'
  activesupport (4.2.5) lib/active_support/tagged_logging.rb:68:in `tagged'
  railties (4.2.5) lib/rails/rack/logger.rb:20:in `call'
  actionpack (4.2.5) lib/action_dispatch/middleware/request_id.rb:21:in `call'
  rack (1.6.4) lib/rack/methodoverride.rb:22:in `call'
  rack (1.6.4) lib/rack/runtime.rb:18:in `call'
  activesupport (4.2.5) lib/active_support/cache/strategy/local_cache_middleware.rb:28:in `call'
  rack (1.6.4) lib/rack/sendfile.rb:113:in `call'
  actionpack (4.2.5) lib/action_dispatch/middleware/ssl.rb:24:in `call'
  railties (4.2.5) lib/rails/engine.rb:518:in `call'
  railties (4.2.5) lib/rails/application.rb:165:in `call'
  puma (2.15.3) lib/puma/configuration.rb:79:in `call'
  puma (2.15.3) lib/puma/server.rb:541:in `handle_request'
  puma (2.15.3) lib/puma/server.rb:388:in `process_client'
  puma (2.15.3) lib/puma/server.rb:270:in `block in run'
  puma (2.15.3) lib/puma/thread_pool.rb:106:in `block in spawn_thread'

There's a direct solution to this default configuration, well documented in a number of easily Googled articles.

This document appears to share the same initial feeling I had - that FATAL errors should be reserved for application crashes, not the endless stream of bots that hit my sites daily looking for phpmyadmin.

There is also a lot of misinformation around this situation, with a number of Stack Overflow posts addressing individual symptoms (you should go and create that file) rather than the source of the problem.

A more comprehensive solution

The existing solutions just didn't quite satisfy me. To be clear, there's nothing immediately terrible about just creating a 404 page as described, but the idea of a backend designed to service specific endpoints having all unknown traffic routed to it runs strongly against the way I like to run systems.

In some cases it's easy. For my erlvulnscan, there is a single endpoint, and I can manually code it into my nginx.conf as such:

    location /netscan {
        proxy_pass http://localhost:8081;
    }

Research can dig up enterprise solutions involving embedded Lua and Redis. That's overkill for my needs, however.

Problem 1: What does a good route look like?

For my ctadvisor interface, I created this quick rake task. You can implement it yourself by adding the task file to the lib/tasks/ directory.

The general goal is to print out a mapping of valid endpoints for later use. It looks like this:

$ bundle exec rake nginxmap
map $uri $rails_route_list {
    default "false";
    ~^/assets "true";
    ~^/registrations/verify/ "true";
    ~^/registrations/verify "true";
    ~^/registrations/unsubscribe "true";
    ~^/registrations/destroy/ "true";
    ~^/registrations "true";
    ~^/registrations/new "true";
    ~^/rails/info/properties "true";
    ~^/rails/info/routes "true";
    ~^/rails/info "true";
    ~^/rails/mailers "true";
    ~^/rails/mailers/ "true";
    ~^/$ "true";
}

The output is somewhat like running "rake routes", except that the full route specs you see there (including dynamic segments such as :id) have been truncated into nginx map patterns.
Although it's possible to build complex regexes in nginx to be very specific, that's not the goal here. It's "good enough" to confirm a valid endpoint by stopping at the first dynamic symbol (:id) and ensuring the path matches everything before it.

The code also has a special handler for /, because it should only match in its entirety (otherwise, everything matches).
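As a sketch of the core transformation (my reconstruction, not the exact task; the helper name map_prefix is my own), a Rails route spec string can be reduced to a map pattern like this:

```ruby
# Turn a Rails route spec, e.g. "/registrations/verify/:id(.:format)",
# into an nginx map pattern: keep everything before the first dynamic
# symbol, and anchor "/" so it only matches exactly.
def map_prefix(spec)
  path = spec.split(":").first.sub(/\(.*\z/, "")
  path == "/" ? "~^/$" : "~^#{path}"
end

# In lib/tasks/nginxmap.rake, this would be applied to each entry of
# Rails.application.routes.routes, wrapped in the map { } header and
# the 'default "false";' line.
```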

There's a big TODO here, in that this approach lists a few additional routes (such as /assets) which aren't present in "rake routes". I could just regex these out, but I'd rather understand the root cause.

Problem 2: How to actually set these routes up in nginx

The obvious solution involves either a whole series of location { } blocks, one matching each route, or one massive regex. Neither is particularly pretty, or scalable.

It turns out nginx has a reasonably good alternative in the map directive.

The task we created formats our routes appropriately for use in the map directive, allowing us to configure nginx like this:

    include railsmap.conf;

    server {
        try_files $uri @rails;
        location @rails {
            if ($rails_route_list = "false") {
                return 404;
            }
            proxy_pass http://localhost:8082;
        }
    }

Where the railsmap.conf can be created by running:

bundle exec rake nginxmap > railsmap.conf

I re-run this every time I add a route in Rails. On an established application, that isn't very often.

In practice

The described system has now been running on the ctadvisor page for a couple of days, and I'm quite happy with the results. Obviously, your environment may differ. Or you may just care less about how specific your routing is.

A non-trivial amount of the traffic hitting Rails for me comes in the form of ridiculous bots. It should be clearly stated that you're not gaining a significant security benefit by "firewalling" off hundreds of scans for vulnerable WordPress plugins against a Rails server, but you are blocking unwanted traffic, which is never a bad thing.

Use protobufs - now


If you've ever touched any form of web development, ever, you've probably used JSON to get data from a server to a client. Ajax queries nearly always pull data in this format.

Some years back, Google released the Protocol Buffers (protobuf) standard, which promises a number of advantages. It seems to have been largely ignored by the web community for a while, with most discussions degenerating into complaints about one Python library's performance.

I took an interest primarily when noting that Riak KV recommends its protocol buffers interface for performance. I'll also note that I'm not a Python user.

Typed data

Aside from a potential performance increase, Protocol Buffers are typed. As someone who literally couldn't handle JavaScript until things were rewritten in TypeScript, this feature is worth a lot.


If you're performing a 32-byte Ajax query, you probably don't care about JSON's overhead. If you're doing a much larger query, you might.

Test bed

In order to obtain a fair test, I'm comparing against two JSON libraries: JSX, which is pure Erlang, and Jiffy, which is implemented in C.

The protobuf implementation we are using is from Basho.

I'd very much like to go on the record and state that, in most cases, microbenchmarks should be taken with a grain of salt. Including this one. Anyone who rewrites anything based solely on this blog is in for a bad time. Do your own tests.

In order to use Protocol Buffers, we start by defining the types in a things.proto file.

Before that, here's some Ruby as a quick demonstration of what our data structure may look like:

irb(main):002:0> something = {:counter => 1, :num => 50}
    => {:counter=>1, :num=>50}
irb(main):003:0> something.to_json
    => "{\"counter\":1,\"num\":50}"

Using this, I can create the protobuf definition - the contents of things.proto, below. Straight away, you can see that I've defined not only that the variables are of the int32 type, but that there are exactly two of them, and that they are required. There's an obvious advantage at this point in knowing exactly what you're receiving over the wire.

message Counternumber {
    required int32 counter = 1;
    required int32 num = 2;
}

And now here's our testbed application. It was knocked up in a few minutes, so it's not meant to be a shining example of Erlang. If you're not familiar with Erlang, or just want a tl;dr: it builds a list (an "array", if you will) of 100 of these structures, and serialises it 100,000 times with each library to create a benchmark.

-module(data).
-export([fullrun/0, withjiffy/1, withjsx/1, withprop/1]).

-define(TIMES, 100000).

-type ourthing() :: {'counter',pos_integer()} | {'num',1..1000}.

-spec fullrun() -> 'ok'.
fullrun() ->
    X = makedata(),
    {Jiffy, _} = timer:tc(data, withjiffy, [X]),
    {JSX, _} = timer:tc(data, withjsx, [X]),
    {Props, _} = timer:tc(data, withprop, [X]),
    io:fwrite("Jiffy time: ~p, JSX time: ~p props time: ~p~n", [Jiffy, JSX, Props]),
    Proplen = byte_size(iolist_to_binary(withprop_node(X, []))),
    JSONlen = byte_size(jsx:encode(X)),
    io:fwrite("JSON is ~p long and Protobuf is ~p long~n", [JSONlen, Proplen]).

-spec makedata() -> [[ourthing()]].
makedata() ->
    [ [{counter, X}, {num, rand:uniform(1000)}] || X <- lists:seq(1,100)].

-spec withprop_node([[ourthing()]], iolist()) -> iolist().
withprop_node([], Acc) ->
    Acc;
withprop_node([[{counter, A}, {num, B}] | Tail], Acc) ->
    Encode = thing_pb:encode_counternumber({counternumber, A, B}),
    withprop_node(Tail, [Acc | Encode]).

-spec withprop([[ourthing()]]) -> binary().
withprop(X) ->
    withprop(X, ?TIMES).

-spec withprop([[ourthing()]], non_neg_integer()) -> binary().
withprop(X, 0) ->
    iolist_to_binary(withprop_node(X, []));
withprop(X, T) ->
    iolist_to_binary(withprop_node(X, [])),
    withprop(X, T-1).

-spec withjsx([[ourthing()]]) -> any().
withjsx(X) ->
    withjsx(X, ?TIMES).

-spec withjsx([[ourthing()]], non_neg_integer()) -> any().
withjsx(X, 0) ->
    jsx:encode(X);
withjsx(X, T) ->
    jsx:encode(X),
    withjsx(X, T-1).

%% jiffy represents JSON objects as {Proplist} tuples, so wrap each
%% element before encoding.
-spec withjiffy([[ourthing()]]) -> any().
withjiffy(X) ->
    withjiffy(X, ?TIMES).

-spec withjiffy([[ourthing()]], non_neg_integer()) -> any().
withjiffy(X, 0) ->
    jiffy:encode([{L} || L <- X]);
withjiffy(X, T) ->
    jiffy:encode([{L} || L <- X]),
    withjiffy(X, T-1).


With that testbed run, here is the output I'm seeing:

Jiffy time: 6936403, JSX time: 25947210 props time: 5145719
JSON is 2283 long and Protobuf is 486 long

There's an obvious benefit immediately visible here: the Protobuf output is less than a quarter of the size of the JSON.
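That ratio is easy to sanity-check from the protobuf wire format. This is my own arithmetic, assuming (as in the testbed) that messages are concatenated with no extra framing: each field is a one-byte tag followed by a varint value.

```ruby
# Varint length for small non-negative integers (one byte per 7 bits).
def varint_len(n)
  n < 128 ? 1 : 2   # sufficient for values up to 16,383
end

# Each Counternumber message: tag + counter varint, tag + num varint.
def message_len(counter, num)
  (1 + varint_len(counter)) + (1 + varint_len(num))
end

# 100 messages with counter 1..100 and num in 1..1000 come to roughly
# 450-500 bytes, in line with the 486 bytes observed (num values below
# 128 shave a byte each off the 5-byte worst case).
total = (1..100).sum { |c| message_len(c, 500) }
```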

To help review the timings, I've reformatted them below. Elapsed time is presented in microseconds.

Implementation    Time (µs)
Jiffy             6,936,403
JSX              25,947,210
Protobuf          5,145,719

In a world where performance counts, these differences are non-trivial. It's hard to argue against the benefits here.


There are of course downsides. Working with protobufs is obviously more work, and the data will have to be decoded on the client side. I'd suggest a "development mode" that still uses JSON, so you can make use of the browser's network monitor when you need it.

In an upcoming blog, I'll be converting the erlvulnscan frontend to read protobuf AJAX queries.