Cert Manager on Scaleway

This site was previously running on a Scaleway dev instance.
Unfortunately I lost the keys to login to that machine. Well, not lost, but temporarily misplaced due to COVID-19.

Never to miss an opportunity, I decided to migrate from that dev instance to one managed by Scaleway’s Kubernetes service.

What I thought would be a 30 minute job ended up taking a lot longer. I ran into issues with cert-manager, it wasn’t able to “self-test” its own certificate challenges.

cert-manager process

I’m using the http01 challenge, since I don’t manage my DNS in a way which can be easily automated.

When we create an Ingress resource, with suitable annotations, cert-manager generates a CertificateRequest for the relevant host.
It satisfies the challenge by creating a Pod and Ingress of its own. The pod serves a token under a well known URI, allowing the server to prove ownership of the domain and thus issuance of a certificate.
cert-manager checks it can access this URI - a self test - before it attempts the call to Let’s Encrypt to start the challenge process.

This wasn’t working, as evidenced by the Pending state of the CertificateRequest and logs from cert-manager.

kubectl logs cert-manager-d994d94d7-pv4vl
E0323 19:03:15.756921       1 sync.go:183] cert-manager/controller/challenges "msg"="propagation check failed" "error"="failed to perform self check GET request 'http://test.stewartplatt.com/.well-known/acme-challenge/<token>' ...

Naturally, the first thing I did was to cURL the address myself. That worked, and so I tried from one of the Kubernetes pods directly.

curl http://test.stewartplatt.com/.well-known/acme-challenge/<token>
<longer_token>

kubectl exec -it apps-stewartplatt-79dfdd96f7-4lzth -- sh
$ curl http://test.stewartplatt.com/.well-known/acme-challenge/<token>
curl: (52) Empty reply from server

For some reason traffic from inside the cluster was not able to resolve the address, which was otherwise being served correctly.

One Eternity later

I eventually stumbled upon this open cert-manager issue which described the problem exactly.
One of the comments referenced a k8s SIG proposed by Sh4d1, a developer at Scaleway.
Another related issue offers some additional insight.

I ended up running hairpin-proxy which intends to resolve this exact issue.

My setup now looks something like this:

# Install Scaleway-specific ingress controller
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.44.0/deploy/static/provider/scw/deploy.yaml

# Install hairpin proxy to work around https://github.com/jetstack/cert-manager/issues/3341
kubectl apply -f https://raw.githubusercontent.com/compumike/hairpin-proxy/v0.1.2/deploy.yml

# Install cert manager
kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v0.16.1/cert-manager.yaml

# Install staging/prod issuers and applications with appropriate ingress configuration
...

And you’re reading this page, so it’s still working!

Unfortunately, it’s not clear when this will be fixed in Kubernetes, but it seems to have caused a number of providers to have to work around it.