· 7 min read

How I implemented custom domain support with automatic TLS certs for my SaaS app

Managing hosting and TLS certificates for customer-specified domains can be a challenge. In this article I explain how I solved this, with relative ease, for blogline.co.


When I first started working on Blogline (a fast, minimalist blogging platform), I had only planned to use it for my own blogs. Like most app developers (certainly those using Rails), I originally hosted it on Heroku, which has decent support for TLS if you’re working with only a handful of known domains. Once I’d finished the early prototype, I decided it might be fun to turn it into a SaaS product that would allow customers to create a blog on their own domain in seconds. This didn’t sound complicated technically – the main problem was how to manage TLS certificates.

LetsEncrypt

When LetsEncrypt arrived, it hugely altered the TLS landscape. Developers could now programmatically create TLS certificates for free rather than forking out quite large sums each year for the privilege of running their apps and sites over HTTPS.

For Blogline, I figured I would need to use LetsEncrypt (or an alternative like ZeroSSL) in order to get custom domains working. When someone configured their blog to use a custom domain in the app, I could run a job in the background that would generate a TLS certificate using Certbot, then call a Heroku API to somehow configure Heroku to deal with it correctly. Right? This didn’t sound particularly fun, or perhaps even doable, so I started Googling.

Cloudflare

I found an article on the Cloudflare website, enticingly titled SSL for SaaS Providers. The marketing suggested this product might allow me to do what I wanted without actually having to implement anything – my laziness was paying off already!

Cloudflare SSL for SaaS allows a SaaS company’s end customer to continue using a custom vanity domain, while securing its communication through SSL.

Unfortunately, I had to fill out a Talk To Us! form in order to get a price quote, and further Googling told me this was available only to Enterprise customers (💸💸💸) and would cost thousands (!) per month.

Moving swiftly on…

Caddy Server

Although the zero-effort Cloudflare solution had been ripped out of my miserly hands, my incessant Googling had helped me figure out that the search term I needed was “on-demand TLS”. The top hit for this term was Caddy Web Server, something I had never heard of before. The Caddy website states:

Automatic HTTPS provisions TLS certificates for all your sites and keeps them renewed. It also redirects HTTP to HTTPS for you!

This sounded almost too good to be true, but a 29-second demo video on the Caddy site shows how incredibly simple it is to use.

A few quick experiments and it was clear that Caddy was exactly the right tool for the job. It’s amazing! It’s not that often you come across a tool that not only solves a complex problem, but does so painlessly. Serious kudos to the Caddy developers and community.
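To give a sense of why: with automatic HTTPS, a complete Caddyfile for a single site can be this short (domain and response text are illustrative):

```
example.com {
    respond "Hello from Caddy over HTTPS!"
}
```

Caddy obtains and renews the certificate for example.com on its own and redirects HTTP to HTTPS without any further configuration.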

My app was still hosted on Heroku, so I needed to get Caddy working as a reverse proxy, which is surprisingly easy. I created a $5 droplet on DigitalOcean, installed Caddy (a one-liner on Ubuntu) and configured it to reverse proxy requests to my Heroku app. The entire config looked something like this:

:443 {
    reverse_proxy https://myapp.herokuapp.com {
        header_up Host {http.reverse_proxy.upstream.hostport}
        header_up X-Real-IP {http.request.remote.host}
        header_up X-Forwarded-Port {http.request.port}
        header_up X-Forwarded-Host {http.request.host}
        # X-Forwarded-For and X-Forwarded-Proto are set by Caddy automatically
    }

    tls {
      on_demand
    }

    log {
      output file /var/log/caddy/access.log
    }
}

The primary DNS for blogline.co didn’t need to change since I’d already sorted TLS with Heroku for that domain, but for blogs configured with custom domains, all a customer needed to do was point their DNS to my new DigitalOcean droplet and let Caddy do the rest.
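From the customer’s side, that setup is a single DNS record pointing their domain at the droplet (hostname and IP here are illustrative):

```
blog.customerdomain.com.  3600  IN  A  203.0.113.10
```

On the first HTTPS request to that hostname, Caddy’s on-demand TLS kicks in and provisions the certificate.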

It works!

Improving resilience

This initial implementation was working ok, but there were a few things that I wanted to improve.

I was a little concerned in the back of my mind that Caddy was storing the TLS certificates on my DigitalOcean droplet, of which there was only one. What if my app was wildly successful (right now I’m like 🤣) and I had to load balance Caddy? If you leave the default configuration in place, each Caddy instance will generate and store its own copy of each TLS certificate, which could cause issues with LetsEncrypt rate limits.

The Caddy developers have made it easy to write plugins, and there are several storage back-ends you can configure so that multiple Caddy instances share the same certs. The main options for storage are Redis and DynamoDB. I was already running Redis, so it made sense to use that. I had to jump through a few hoops to get the plugin running, since the default Ubuntu Caddy install doesn’t support plugins – you have to build them in yourself. Fortunately this is pretty straightforward.

First you need Go 1.14:

sudo apt install golang-1.14-go

Then you need xcaddy:

go get -u github.com/caddyserver/xcaddy/cmd/xcaddy

Then you can build Caddy with the Redis plugin:

xcaddy build --with github.com/gamalan/caddy-tlsredis

This will generate a new Caddy binary which you can copy into /usr/bin.

Configuring Caddy is straightforward. I just added the following to the top of my config (while also configuring the referenced environment variables on my system):

{
  storage redis {
    host {$CADDY_CLUSTERING_REDIS_HOST}
    port {$CADDY_CLUSTERING_REDIS_PORT}
    password {$CADDY_CLUSTERING_REDIS_PASSWORD}
    db {$CADDY_CLUSTERING_REDIS_DB}
    tls_enabled {$CADDY_CLUSTERING_REDIS_TLS}
    aes_key {$CADDY_CLUSTERING_REDIS_AESKEY}
  }
}
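The plugin reads its settings from the environment. A sketch of those variables (all values are placeholders – in particular, the expected format and length of the AES key is an assumption here, so check the caddy-tlsredis README):

```shell
# Placeholder values – substitute your own Redis details.
export CADDY_CLUSTERING_REDIS_HOST=127.0.0.1
export CADDY_CLUSTERING_REDIS_PORT=6379
export CADDY_CLUSTERING_REDIS_PASSWORD=changeme
export CADDY_CLUSTERING_REDIS_DB=0
export CADDY_CLUSTERING_REDIS_TLS=false
# Key used to encrypt certs at rest in Redis (required length: see plugin docs)
export CADDY_CLUSTERING_REDIS_AESKEY=0123456789abcdef0123456789abcdef
```

If Caddy runs under systemd (as it does with the Ubuntu package), put these in an EnvironmentFile referenced by the unit rather than in a shell profile, so the service picks them up.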

With certs now being stored in Redis, I could fire up as many Caddy instances as I like and point them to the same Redis instance. No dupes.

Bots

If you have a site running on the public internet, you’re going to be hit by bot requests. What you don’t want is bots hitting your Caddy-delivered app with spurious requests that result in LetsEncrypt certificates being created inadvertently, and rate limits being maxed out. Obviously I didn’t read the manual properly, so this happened to me 🤦🏻‍♂️.

Caddy allows you to configure an endpoint for verifying whether it should try to provision a TLS cert for a given domain. For Blogline, this is a simple Rails action that performs a single DB query and returns 200 or 404 depending on whether the custom domain is recognised. To configure this endpoint in Caddy, you add a global config:

{
  on_demand_tls {
    ask http://127.0.0.1:1234/my-check-endpoint/
    interval 2m
    burst 5
  }
}
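The check itself is trivial. My Rails action isn’t reproduced here, but the logic amounts to something like this (plain Ruby, with a Set standing in for the single DB query; all names are illustrative):

```ruby
require "set"

# Stand-in for the DB table of configured custom domains.
KNOWN_DOMAINS = Set["blog.example.com", "news.example.org"]

# Returns the HTTP status the "ask" endpoint should respond with:
# 200 means "yes, provision a cert"; anything else makes Caddy refuse.
def ask_status(domain)
  KNOWN_DOMAINS.include?(domain) ? 200 : 404
end
```

Caddy passes the candidate hostname as a `domain` query parameter, so the real action reads `params[:domain]`, runs the query, and responds with `head :ok` or `head :not_found`.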

Wildcards

Now that I had this running, it made sense to point the main blogline.co domain at Caddy as well. Blogline requires a wildcard certificate, but (of course!) Caddy handles these too, as long as you configure it to access your DNS provider.

This requires another plugin, so I needed to rebuild Caddy again, this time using the Cloudflare DNS plugin:

xcaddy build --with github.com/gamalan/caddy-tlsredis --with github.com/caddy-dns/cloudflare

Once I’d installed the new binary, I added a new section to my Caddy config specifically for the blogline.co domain:

blogline.co:443, *.blogline.co:443 {
    reverse_proxy 127.0.0.1:1234

    tls {
      dns cloudflare <API-token>
    }

    log {
      output file /var/log/caddy/access-blogline.log
    }
}

Now all requests to the Blogline app come via Caddy and TLS is handled automagically, including a wildcard certificate for any requests to the *.blogline.co domain (e.g. help.blogline.co).

Tidying things up

Proxying requests from DigitalOcean to Heroku wasn’t pleasing me, so I decided to move all the hosting to DigitalOcean. Leaving the ‘no servers’ approach of Heroku was a little worrying as I really don’t want to have to deal with infrastructure, but given that I was already server-wrangling with Caddy it didn’t seem like too much extra effort.

Thankfully I discovered the wonderful Hatchbox, developed by Chris Oliver, which made server provisioning and deployment pain a thing of the past. It’s absolutely fantastic. Although Hatchbox doesn’t support Caddy yet (apparently it’s coming), it was easy enough to provision the servers first and then install Caddy myself. It hooks into my GitHub account, so when I merge a PR, Hatchbox takes care of the deployment. It’s essentially the same end-user experience as Heroku.

Being all-in on DigitalOcean meant I could make use of their Managed Postgres and Managed Redis options, as well as ditching AWS S3 for DigitalOcean Spaces, which is API-compatible. They have also recently released their App Platform, built on their managed Kubernetes, which is something I plan to look at, and which would give me the equivalent serverless infra that Heroku currently provides.

Conclusion

Some of this might come across initially as complicated, but it’s honestly not. There are very few moving parts and you don’t need to be a particularly adept sysadmin (I am not!) to achieve the same goal. If you get stuck, the Caddy Community site is excellent.

I’d love to hear what people think of this write-up, and about any alternative approaches you’ve taken to solve the same problem. If this is something you’ve tackled, please drop me an email and let me know!