Deploying a WebSockets cluster to GCP with Let’s Encrypt certificates
Deploying a WebSockets cluster is not a trivial task on its own: you need a load balancer with session affinity that keeps each client routed to the same server, not to mention adding Let’s Encrypt SSL certificates to the party.
In this post, I hope to turn this task into a trivial one by sharing lots of bash code and my experience of deploying a WebSockets cluster to GCP, secured by free Let’s Encrypt SSL certificates. Here I’ll cover the high-level concepts, while the GitHub repository dives deeper into the low-level commands.
Let’s start by drawing the architecture to make the overall plan as clear as possible:
As you can see, the architecture consists of two main components: the application servers and the Let’s Encrypt servers. Each is a managed instance group behind its own load balancer.

Next, let’s dive into each part of the architecture, starting with the network load balancer. This load balancer receives all of the domain’s traffic on ports 80 and 443 (HTTP/S) and routes it to the application instance group. Session affinity must be enabled on this load balancer so that WebSockets connections keep being routed to the same server; otherwise, the connections will keep reconnecting. We must use a network load balancer because the others don’t support WebSockets. In addition, a static IP address must be attached to this load balancer so that we can point our DNS zone file at that address.

The application server runs Nginx to handle request routing and SSL termination. Docker is used for the application itself (your code!). On initialization, the application server downloads the SSL certificates for the relevant domain, along with the dhparam.pem file, from Google Cloud Storage (I’ll explain later how they get there). Nginx routes all ACME challenge HTTP requests (which Let’s Encrypt uses to validate the domain) to the Let’s Encrypt load balancer, whose IP is stored as project metadata. All other HTTP requests are redirected to HTTPS. HTTPS requests are reverse-proxied to the Docker container running the application code (the proxy, of course, supports WebSockets). A daily cron job syncs the certificates from GCS and restarts Nginx if anything has changed.

The HTTP load balancer is a simple load balancer that routes all port 80 requests to the Let’s Encrypt instance group.

The Let’s Encrypt renewal server uses the letsencrypt Docker image to trigger the renewal requests and an Apache2 web server to handle the ACME challenge. Apache2 needs no configuration, because it is already set up to serve a directory after installation. Using the Docker image, we renew our domain certificates and then upload them to GCS; the relevant GCS path is also stored as project metadata (don’t worry, there is a script that takes care of everything). I haven’t set up a cron job to auto-renew the certificates yet, but it is entirely feasible.

Now that we understand each part, let’s run some code. Head over to the GitHub repository and clone the project. The project and its scripts are fully documented, so I’ll just give a brief explanation of what needs to be done.
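To make the session affinity requirement on the network load balancer more concrete, here is a toy sketch of client-IP affinity: the backend is chosen from the client’s address, so every connection from that client lands on the same application server. This is not Google’s actual hashing scheme; the function pick_backend and the instance names are hypothetical and only illustrate the idea.

```python
from __future__ import annotations

import hashlib


def pick_backend(client_ip: str, backends: list[str]) -> str:
    """Toy client-IP affinity: the same IP always maps to the same backend.

    NOT Google's real algorithm; it only shows that the choice depends on
    the client address rather than on the individual request.
    """
    if not backends:
        raise ValueError("no backends available")
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    # Interpret the first 8 bytes as an integer and reduce modulo the pool size.
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


if __name__ == "__main__":
    servers = ["app-1", "app-2", "app-3"]  # hypothetical instance names
    first = pick_backend("203.0.113.7", servers)
    # Repeated connections from this client pick the same server.
    assert all(pick_backend("203.0.113.7", servers) == first for _ in range(10))
    print("203.0.113.7 ->", first)
```

One property of this toy worth noticing: because it reduces modulo the pool size, resizing the instance group remaps many clients to different servers, so the client still needs sensible reconnect logic.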
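The application server’s Nginx routing rules described above boil down to three branches. The sketch below expresses them as a plain function so the decision logic is easy to read and test. The function name, the ("proxy"/"redirect") labels, and the assumption that the app container is published on the host’s loopback interface at app_port are all hypothetical; the /.well-known/acme-challenge/ prefix is the standard ACME HTTP-01 path.

```python
from __future__ import annotations

# Standard path prefix for ACME HTTP-01 challenges (RFC 8555).
ACME_PREFIX = "/.well-known/acme-challenge/"


def route(scheme: str, host: str, path: str,
          letsencrypt_lb_ip: str, app_port: int) -> tuple[str, str]:
    """Return (action, target) mirroring the application server's routing rules."""
    if scheme == "http":
        if path.startswith(ACME_PREFIX):
            # Domain validation traffic goes to the Let's Encrypt load balancer.
            return ("proxy", f"http://{letsencrypt_lb_ip}{path}")
        # Any other plain-HTTP request is redirected to HTTPS.
        return ("redirect", f"https://{host}{path}")
    if scheme == "https":
        # TLS is already terminated by Nginx; forward to the app container,
        # assumed here to be published on the host's loopback interface.
        return ("proxy", f"http://127.0.0.1:{app_port}{path}")
    raise ValueError(f"unsupported scheme: {scheme}")


if __name__ == "__main__":
    for scheme, path in [("http", ACME_PREFIX + "abc123"),
                         ("http", "/chat"), ("https", "/chat")]:
        print(scheme, path, "->", route(scheme, "example.com", path, "10.0.0.5", 8080))
```

In Nginx terms, these branches typically correspond to a location /.well-known/acme-challenge/ block with proxy_pass in the port 80 server, a return 301 https://$host$request_uri; for everything else on port 80, and a proxy_pass in the 443 server with proxy_http_version 1.1 and the Upgrade and Connection headers forwarded so that WebSockets work.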
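The daily cron job on the application servers only needs to restart Nginx when the synced certificates actually changed. A minimal sketch of that "sync, compare, restart if different" logic, assuming the certificates live in one local directory: the sync and restart steps are passed in as hypothetical callables (in practice, something that copies from GCS and something that restarts Nginx), which keeps the change-detection part self-contained.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def tree_digest(directory: Path) -> str:
    """SHA-256 over every file's relative path and contents, in sorted order."""
    h = hashlib.sha256()
    for path in sorted(p for p in directory.rglob("*") if p.is_file()):
        h.update(path.relative_to(directory).as_posix().encode("utf-8"))
        h.update(b"\0")
        h.update(path.read_bytes())
        h.update(b"\0")
    return h.hexdigest()


def sync_and_maybe_restart(cert_dir: Path, sync: Callable[[], object],
                           restart: Callable[[], object]) -> bool:
    """Run `sync`, then call `restart` only if the certificate files changed."""
    before = tree_digest(cert_dir)
    sync()  # hypothetical hook: copy the latest certificates down from GCS
    after = tree_digest(cert_dir)
    if before != after:
        restart()  # hypothetical hook: restart Nginx to load the new certificates
        return True
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        certs = Path(tmp)
        (certs / "fullchain.pem").write_text("old certificate")
        restarts = []
        # Nothing new in the bucket: no restart.
        print(sync_and_maybe_restart(certs, lambda: None, lambda: restarts.append("nginx")))
        # The sync brought a renewed certificate: restart once.
        renew = lambda: (certs / "fullchain.pem").write_text("new certificate")
        print(sync_and_maybe_restart(certs, renew, lambda: restarts.append("nginx")))
        print(restarts)
```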
First, open pre-deploy.sh, edit BUCKET_LOCATION, and then run the script. It might take a while because it generates a dhparam.pem file, which is a CPU-intensive task. When it’s done, you should have a brand-new bucket in GCS containing the dhparam file, along with a new project metadata entry that stores the bucket path.
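Why is generating dhparam.pem so slow? A dhparam file is normally produced with openssl dhparam, which has to find a large safe prime p = 2q + 1 (both p and q prime). The toy sketch below shows that search with Miller-Rabin primality tests at small bit sizes; it is not OpenSSL’s implementation and its output must not be used as real DH parameters. It simply illustrates that most candidates are rejected, and the work grows quickly with the bit size.

```python
from __future__ import annotations

import random
import secrets

SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)


def is_probable_prime(n: int, rounds: int = 20) -> bool:
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for p in SMALL_PRIMES:
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness: n is composite
    return True


def find_safe_prime(bits: int) -> tuple[int, int]:
    """Return (p, candidates_tried) with p = 2q + 1, q prime, p exactly `bits` bits (bits >= 3)."""
    tries = 0
    while True:
        tries += 1
        # q has bits - 1 bits (top bit forced on) and is odd.
        q = secrets.randbits(bits - 1) | (1 << (bits - 2)) | 1
        if is_probable_prime(q) and is_probable_prime(2 * q + 1):
            return 2 * q + 1, tries


if __name__ == "__main__":
    for bits in (64, 128, 192):
        p, tries = find_safe_prime(bits)
        print(f"{bits}-bit safe prime after {tries} candidates: {p}")
```

Real DH parameters are typically 2048 bits, where each primality test is far more expensive and far more candidates are needed, which is why this step dominates the runtime of the pre-deploy script.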
Next, we will deploy the Let’s Encrypt instance group, so open the letsencrypt directory. If you want, you can change some of the deploy.sh parameters to adapt it to your use case. You must change the email address in startup.sh; this email will be used to register your SSL certificates with Let’s Encrypt. Now we can run the deploy script, and it shouldn’t take long until we see a new instance in the GCP console. To issue the certificates for the first time, we have to point the DNS zone file at the external IP of this brand-new instance (we can’t deploy the application servers yet because they require the certificates, which is why we need this hack the first time). Then, SSH into the instance; note that it takes about 3 minutes for the instance to finish installing everything. You can tail the installation log at /var/log/daemon.log to track the progress. When the installation is done, run the renewal script: sudo /root/renew.sh yourdomain.com. It might ask you some questions the first time, but after a little while it will issue the certificates and upload them to GCS.
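Let’s Encrypt validates the domain against whatever DNS returns, so running renew.sh before the record pointing at the Let’s Encrypt instance has propagated will fail. A minimal polling sketch, assuming you know the instance’s external IP: wait_for_dns is a hypothetical helper, and the resolver, sleep, and clock are injectable so the logic can be tested without a network. Note that socket.gethostbyname returns a single IPv4 address, and your local resolver cache may not match what Let’s Encrypt sees.

```python
import socket
import time
from typing import Callable


def wait_for_dns(domain: str, expected_ip: str,
                 resolve: Callable[[str], str] = socket.gethostbyname,
                 timeout_s: float = 600.0, interval_s: float = 15.0,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll until `domain` resolves to `expected_ip`, or give up after timeout_s."""
    deadline = clock() + timeout_s
    while True:
        try:
            if resolve(domain) == expected_ip:
                return True
        except OSError:
            pass  # not resolvable yet (e.g. NXDOMAIN); keep polling
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

For example, wait_for_dns("yourdomain.com", "<instance external IP>") returning True is a reasonable signal that it’s time to run the renewal script.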
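During renewal, the letsencrypt client answers an HTTP-01 challenge: it writes a file that Apache serves at http://<domain>/.well-known/acme-challenge/<token>, and that file contains the "key authorization", defined in RFC 8555 as the token, a dot, and the base64url SHA-256 thumbprint (RFC 7638) of the account public key. The client does all of this for you; the sketch below only shows what ends up in that file. The helper names are hypothetical, and only RSA and EC key types are handled.

```python
import base64
import hashlib
import json

# Required JWK members per key type, as listed in RFC 7638.
REQUIRED_MEMBERS = {
    "RSA": ("e", "kty", "n"),
    "EC": ("crv", "kty", "x", "y"),
}


def b64url(data: bytes) -> str:
    """Unpadded base64url encoding, as used throughout ACME/JOSE."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk: dict) -> str:
    """RFC 7638 SHA-256 thumbprint: required members only, sorted, no whitespace."""
    members = {k: jwk[k] for k in REQUIRED_MEMBERS[jwk["kty"]]}
    canonical = json.dumps(members, sort_keys=True, separators=(",", ":"))
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def key_authorization(token: str, account_jwk: dict) -> str:
    """Body served at http://<domain>/.well-known/acme-challenge/<token>."""
    return f"{token}.{jwk_thumbprint(account_jwk)}"


if __name__ == "__main__":
    # Placeholder key members and token, for illustration only.
    jwk = {"kty": "EC", "crv": "P-256", "x": "placeholder-x", "y": "placeholder-y"}
    print(key_authorization("example-token", jwk))
```

This is also why the application servers’ Nginx must forward /.well-known/acme-challenge/ requests to the Let’s Encrypt load balancer: the file only exists on the renewal instance.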
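Auto-renewal isn’t set up in this project, but a cron job on the Let’s Encrypt instance could decide whether to run renew.sh based on the certificate’s expiry date. Let’s Encrypt certificates are valid for 90 days, and renewing 30 days before expiry is a common choice. A minimal sketch, assuming you read the expiry date with openssl x509 -enddate -noout -in <certificate file> (which prints notAfter=... in OpenSSL’s text format) and strip the notAfter= prefix; should_renew is a hypothetical helper built on the standard library’s ssl.cert_time_to_seconds.

```python
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86_400


def should_renew(not_after: str, now: Optional[float] = None,
                 renew_before_days: int = 30) -> bool:
    """True if the certificate expires within `renew_before_days` days.

    `not_after` uses OpenSSL's text format, e.g. "Mar 15 12:00:00 2030 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch, UTC
    if now is None:
        now = time.time()
    return expires_at - now <= renew_before_days * SECONDS_PER_DAY


if __name__ == "__main__":
    print(should_renew("Mar 15 12:00:00 2030 GMT"))
```

Running the check daily and calling the renewal script only when it returns True keeps renewals well ahead of expiry without hammering Let’s Encrypt’s rate limits.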
Now everything is ready for the application instance group to be deployed. Again, you can edit the deploy parameters to suit your needs. In startup.sh, you must edit DOMAIN, which is, of course, the domain of the application server. You also need to edit DOCKER_IMAGE, which should point to your application’s Docker image (I created a simple WebSockets server for the demonstration, if you want to use it). DOCKER_PORT should be set to the port exposed by your Docker image. Save everything and run deploy.sh. Again, it will take about 3 minutes for the instances to install themselves, but in the meantime we can head back to our DNS zone file and update it to point to our brand-new network load balancer. A static IP is attached to the load balancer, so this address won’t change.
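Once DNS points at the network load balancer, a quick end-to-end check is to open a WebSocket connection to your domain. Per RFC 6455, a successful upgrade returns status 101 with a Sec-WebSocket-Accept header computed from the client’s Sec-WebSocket-Key; if the proxy isn’t forwarding the Upgrade and Connection headers, you will usually get an ordinary HTTP response instead. The helpers below are hypothetical; feed them the status and headers your WebSocket client or HTTP tool reports.

```python
import base64
import hashlib

# Fixed GUID from RFC 6455, section 1.3.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Expected Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_ok(status: int, headers: dict, sec_websocket_key: str) -> bool:
    """Check a server's reply to a WebSocket upgrade request."""
    h = {k.lower(): v for k, v in headers.items()}
    connection_tokens = {t.strip().lower() for t in h.get("connection", "").split(",")}
    return (status == 101
            and h.get("upgrade", "").lower() == "websocket"
            and "upgrade" in connection_tokens
            and h.get("sec-websocket-accept") == websocket_accept(sec_websocket_key))


if __name__ == "__main__":
    key = "dGhlIHNhbXBsZSBub25jZQ=="  # sample key from RFC 6455
    print(websocket_accept(key))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```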
That’s it! You now have a running WebSockets cluster on Google Cloud Platform, secured by free SSL certificates from Let’s Encrypt. You are more than welcome to contribute to the repository if you feel something is missing or needs to be fixed.