nginx replacement - Traefik reverse proxy basic setup guide

Started by Dieselboy, May 31, 2019, 12:11:53 AM

Previous topic - Next topic

Dieselboy

Summary

For a while we had been using nginx as a reverse proxy. Located within the DMZ, it accepted requests for URLs like jira.domain.com and otherstuff.domain.com via a single public IP, then forwarded the requests to the internal systems. We purchased a wildcard cert (AUD $500) from GoDaddy for this setup. We needed to secure the private key while simultaneously copying it and the associated certs to the right places, both to request the cert in the first place and to set up nginx to use it. Once running, nginx worked well, but maintenance was a pain: renewing certs, or adding extra config when internal systems changed (decommission-and-replace, for example).

To summarise the ongoing challenges:

  • manual setup of config, which can take a while to get right
  • manual certificate process (non-ACME), including purchasing of cert
  • need to secure private key while also copying to right places
  • not a microservice

I came across a different tool a few weeks ago called Traefik, and found it has a lot of features that overcome the challenges mentioned above. For example:


  • Uses YAML to configure
  • Runs in docker (in my setup) and we've also deployed this week into Kubernetes
  • Has modules built in such as ACME (Automated Certificate Management Environment) to request and renew trusted SSL certs automatically
  • ACME means the private key never needs to leave the device
  • Has the capability to automatically configure itself without needing to stop/start or restart anything
  • Is a microservice and is open source

Setup guide

I replaced the nginx proxy with Traefik. I'm using a mostly static 1-to-1 configuration for this, though I've also made use of the dynamic (watched-file) capability, which I'll explain further on.

Traffic flow

A browser resolves the URL to an IP and is pointed at the Traefik proxy. The proxy matches a 'frontend' based on the URL, and this in turn maps to a 'backend': the server that will respond to the client browser. Simply:
Browser -> Traefik -> web server

Prerequisites for this guide

  • System with docker running
  • docker-compose
  • http and https NAT to a public IP
  • Frontend DNS resolving to the public IP. For example, website.com or forum.website.com resolving to the public IP of Traefik

Step-by-step


  • Create network for all services reachable from outside
docker network create web

  • Set up this directory structure and files (the endpoints directory is mounted into the container later, so create it now)
mkdir -p /opt/traefik/endpoints
touch /opt/traefik/docker-compose.yml
touch /opt/traefik/acme.json && chmod 600 /opt/traefik/acme.json
touch /opt/traefik/traefik.toml
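
As an aside, the strict permissions on acme.json matter: Traefik will refuse to use the ACME store if the file is readable by others. A quick sketch to confirm the mode, using a throwaway /tmp path rather than /opt/traefik:

```shell
# Recreate the directory layout in a throwaway location (hypothetical path)
demo=/tmp/traefik-demo
mkdir -p "$demo/endpoints"
touch "$demo/docker-compose.yml" "$demo/traefik.toml"

# acme.json will hold the ACME account key and issued certs, so lock it down
touch "$demo/acme.json" && chmod 600 "$demo/acme.json"

# Verify the mode is 600 (owner read/write only)
stat -c '%a' "$demo/acme.json"
# prints: 600
```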


  • docker-compose.yml configuration

version: '2'

services:
  proxy:
    image: traefik
    command: --configFile=/traefik.toml
    restart: unless-stopped
    networks:
      - web
    ports:
      - 80:80
      - 443:443
    labels:
      # Set to true to enable the Traefik dashboard. You can run without a dashboard; set this to false
      - traefik.enable=true
      # Below rule sets the frontend rule to access the dashboard. Your browser URL, basically
      - traefik.frontend.rule=Host:dashboard.domain.com
      - traefik.port=8080
      - traefik.docker.network=web
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /opt/traefik/traefik.toml:/traefik.toml
      - /opt/traefik/acme.json:/acme.json
      - /opt/traefik/endpoints:/opt/traefik/endpoints  

networks:
  web:
    external: true
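
Because the [docker] provider in traefik.toml is configured with exposedbydefault = false, containers on the web network are only picked up when they carry Traefik labels. A hedged sketch of what another service's compose entry might look like (the service name, image and hostname here are illustrative, not part of my setup):

```yaml
# Hypothetical service published through Traefik via docker labels.
# Only the labels and network membership matter; the image is for illustration.
services:
  whoami:
    image: containous/whoami
    networks:
      - web
    labels:
      - traefik.enable=true
      - traefik.frontend.rule=Host:whoami.domain.com
      - traefik.port=80
      - traefik.docker.network=web

networks:
  web:
    external: true
```

With watch = true in the [docker] section, Traefik picks this up as soon as the container starts, with no restart of the proxy.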



  • traefik.toml configuration

debug = false

logLevel = "ERROR"
# Uncomment below if you want to skip SSL validation.
# Useful when Traefik connects to an untrusted SSL backend (such as one with a self-signed cert).
# This opens a man-in-the-middle risk, so make sure you understand that before enabling it.
#insecureSkipVerify = true

defaultEntryPoints = ["https","http"]

[entryPoints]
  [entryPoints.http]
  address = ":80"
    [entryPoints.http.redirect]
    entryPoint = "https"
  [entryPoints.https]
  address = ":443"
  [entryPoints.https.tls]

[retry]

[docker]
endpoint = "unix:///var/run/docker.sock"
# Change the domain below
domain = "domain.com"
watch = true
exposedbydefault = false

[acme]
email = "valid@domain.com"
storage = "acme.json"
# Use the Let's Encrypt staging environment to test this before going live.
# To go live, comment out the caServer line. The live server rate-limits issuance, so it may refuse to issue certs if you hit it too often
caServer = "https://acme-staging-v02.api.letsencrypt.org/directory"
entryPoint = "https"
OnHostRule = true
# The HTTP challenge has Let's Encrypt connect back to Traefik over HTTP. There's also a DNS challenge, where Let's Encrypt checks the domain for a TXT record.
[acme.httpChallenge]
entryPoint = "http"

# enable web configuration backend.
[web]
address = ":8080"

# this example uses file and watch. Traefik watches the folder for configs and applies them if there's new configs
[file]
  directory = "/opt/traefik/endpoints/"
  watch = true
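
As mentioned later in this post, we also ran Traefik with the DNS challenge against GoDaddy to obtain a wildcard cert. A sketch of what that looks like in place of the [acme.httpChallenge] section above (the domain is illustrative; the GoDaddy provider reads its API credentials from environment variables passed to the container):

```toml
# DNS-01 challenge via GoDaddy instead of the HTTP challenge.
# Requires GODADDY_API_KEY / GODADDY_API_SECRET in the container environment.
[acme.dnsChallenge]
  provider = "godaddy"
  delayBeforeCheck = 0

# Request a wildcard cert for the whole domain (illustrative domain)
[[acme.domains]]
  main = "*.domain.com"
```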


  • Endpoint config .toml files in /opt/traefik/endpoints/

For this, it's best to start with zero or one endpoint file for the first-time setup; you can expand later if you need to. This example uses a Cacti web server.

cacti.toml config file, located in the endpoints directory.
Backend = the real web server
Frontend = what the client browser accesses


[backends]
  [backends.backend-cacti]
    [backends.backend-cacti.servers]
      [backends.backend-cacti.servers.server-cacti-ext]
        url = "http://cacti.internal.domain.com"
        weight = 0

[frontends]
  [frontends.frontend-cacti]
    backend = "backend-cacti"
    passHostHeader = true
# Uncomment the below if you want to enable HTTP basic auth. "xxx" is the username, "yyyyyyyyyyyy" the password hash. I've not tested this.
#    basicAuth = [
#      "xxx:yyyyyyyyyyyy",
#    ]
    [frontends.frontend-cacti.routes]
      [frontends.frontend-cacti.routes.route-cacti-ext]
        rule = "Host:cacti.domain.com"



NOTE: I have created each frontend/backend pair in a separate .toml file to make it simple to locate and manage.

  • Finally, bring up with docker-compose

docker-compose -f docker-compose.yml up


    NOTE: the traefik.toml is set to error-level logging. You can set this to DEBUG to get debug logs.


    • Test accessing the URL from the browser

    When you access cacti.domain.com, you should be redirected to https://cacti.domain.com with a valid SSL cert. If you get the "TRAEFIK DEFAULT CERT", give it some more time before restarting. We found that when requesting wildcard certs, it took around five minutes for the valid cert to be set up. We had been impatient, but once we knew, we just waited and all was well.

  • Lastly, you can add more configs in the endpoints directory

This directory has watch = true. Add more .toml files, then check the dashboard to see the config automatically pulled in (and out).
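
For instance, a second file such as jira.toml (hypothetical hostnames and port, mirroring the cacti.toml above) dropped into the endpoints directory is picked up without a restart. Traefik terminates HTTPS and proxies plain HTTP to the backend:

```toml
# Hypothetical jira.toml in /opt/traefik/endpoints/
[backends]
  [backends.backend-jira]
    [backends.backend-jira.servers]
      [backends.backend-jira.servers.server-jira-ext]
        url = "http://jira.internal.domain.com:8080"
        weight = 1

[frontends]
  [frontends.frontend-jira]
    backend = "backend-jira"
    passHostHeader = true
    [frontends.frontend-jira.routes]
      [frontends.frontend-jira.routes.route-jira-ext]
        rule = "Host:jira.domain.com"
```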



We have also successfully configured Traefik with Kubernetes, using the DNS challenge with GoDaddy to issue a wildcard cert. With this method, Traefik connects to GoDaddy automatically and creates the TXT record so that Let's Encrypt can validate *.domain.com. Traefik then downloads the valid SSL cert and applies it.

Conclusion

I was so impressed by this, and by how easy it was to set up, that I deployed another instance as an internal proxy to accept requests from internal users and forward them to internal servers with HTTPS enabled. Without the proxy, we either had to figure out how to install certs on all the systems, use HTTP only, or use the default untrusted HTTPS cert if a system shipped with one.
For example, the Cacti backend is not HTTPS-enabled, so Traefik proxies HTTPS to HTTP. I didn't need to figure out where or how to install the cert on Cacti, so my maintenance considerations for Cacti aren't complicated by enabling SSL.
Similarly, some other systems (such as JIRA) use a bundled Java runtime with a Java cert store, and renewing SSL certs there can be an issue. It requires an outage window, because you need to stop the server, update the cert store and start the server again; if you mess it up, the server won't start and the outage runs long. With Traefik running as a microservice, I can use HTTP on JIRA and decouple the SSL configuration from the server itself, which removes the need for downtime. In addition, the internal proxy uses an internal CA-issued cert (because it's not reachable from the outside).
I had also been considering forwarding all requests (internal and external) through the DMZ proxy. I tested it and it works, but it creates additional load on the firewall for traffic going into the DMZ from inside, not that load is an issue anyway.

:mrgreen:


deanwebb

Take a baseball bat and trash all the routers, shout out "IT'S A NETWORK PROBLEM NOW, SUCKERS!" and then peel out of the parking lot in your Ferrari.
"The world could perish if people only worked on things that were easy to handle." -- Vladimir Savchenko
Вопросы есть? Вопросов нет! | BCEB: Belkin Certified Expert Baffler | "Plan B is Plan A with an element of panic." -- John Clarke
Accounting is architecture, remember that!
Air gaps are high-latency Internet connections.

wintermute000

So do you literally have 1 container?
How do you do HA/failover/autoscaling? (guessing if it's compose that's not part of the deal?)


Can you clarify what you mean by you've deployed it into K8 but you're using docker-compose?

Dieselboy

I have a few of these running. On both the ones I own - yes within docker (one container), using docker-compose. No HA, no scaling. One is using letsencrypt, the other is using a CA-signed cert from my internal CA.

We have another one running within K8 env. Again it's a single one but is using letsencrypt.

HA / scaling is something that will be looked into next. For now, the cloud app is not using HA because it's in dev stage. This does support HA but I've not tried configuring it yet.

wintermute000

Yeah I've messed with precanned ones with letsencrypt built in, pretty slick (the ones where you pass in the parameters at launch via your compose or CLI syntax).

The real frontier is getting it autoscaling and HA etc. and I'm wondering how that's achieved , you'd need LBs in front I'm guessing but how to LB to the LBs lol and then service discovery etc... should just get off my arse and learn K8 properly. Amongst the list of a million other things to learn LOL

Eff it just do it in AWS and run up a ELB, no scaling no worries ha. (read the other day AWS actually run a 'shadow VPC' and scale out HA instances of netscalers sideways, that's why you don't get a static IP only DNS and that's also why they reserve 8 addys for the ELB...)

Dieselboy

traefik is a LB. For HA you need shared volume to store the cert.

For this k8 deployment, all the config is done in the yaml file. TBH, it's equivalent to the example I used above, very simple.

There are better load balancers out there, but I like this one for its very small footprint, its automation/microservice aspect and its simple nature. I mean, I have the busiest one set up with 2GB memory and 4 vCPUs, and the actual usage during production hours is 25% memory consumption (512MB used) and <5% CPU. At the moment it's proxying to 5 different apps, and it's running on RHEL 7 with Docker and the one container for Traefik.

If you want (in my opinion) a great LB, check out Avi Networks' load balancer. The minimum deployment needs 28GB RAM (it's pretty big!), but the level of features you get from it is immense. For example, you can drill down into a single user's session and see the URLs they are accessing and the response codes. Or you can search on an error code and drill into those sessions. Or you can select the "404" errors and drill into exactly what URLs caused those response codes, and which pages they came from / links they clicked on. Plus a ton more; I just used the HTTP codes as an example, but you can search on a lot of other things that would show up in the browser dev tools.
I think that as a provider, being able to be alerted to errors and bad web responses, and being able to act specifically to address those, is a valuable capability. Ordinarily, I don't think that's easy to do (unless I'm lacking some key dev knowledge). This LB also has the ability to scale on demand and to scale the application on demand (scale out / in). https://avinetworks.com/why-avi/