Super-fun Certificate Surprise
- Author: owls
- Mastodon: @owls@yshi.org
The team wanted to move the hostname for a static site on GitHub Pages over to being an app on AWS. We ran into a problem with AWS Certificate Manager: it didn't want to issue any certificates!
To set the stage, the initial state was basically:
my-cool-site.yshi.org CNAME yshi.github.io
This was all well and good: GitHub manages the certificates for Pages sites, so nobody gave it any thought.
Then, we turned this into a full-blown app. It needed an AWS Edge-optimized API Endpoint™, which is just CloudFront wearing a nice sport coat. This requires asking ACM to give us a certificate in us-east-1, and we did that, just like normal.
Except this was not normal: our "normal" is issuing certs for hostnames that don't exist yet. It's typically a clean slate. And that mattered here.
But, we submitted the AWS equivalent of a certificate signing request (CSR), and ACM told us to create some records. DNS was meant to look like this:
my-cool-site.yshi.org CNAME yshi.github.io
_blablabla.my-cool-site.yshi.org CNAME vomit-long-string.aws.internal
I'm not entirely sure why ACM relies on CNAMEs for this instead of a TXT record. Maybe that value actually resolves to something if you're an AWS service, and they check that way? It's always seemed weird to me because the ACME DNS-01 challenge uses TXT records.
But: some paperwork mishaps occurred. What ended up in DNS was wrong. It ended up looking like this:
my-cool-site.yshi.org CNAME yshi.github.io
_blablabla.my-cool-site.yshi.org CNAME yshi.github.io
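In hindsight, a mix-up like this is easy to catch mechanically before ACM gets a chance to complain. Here's a minimal sketch, assuming you can get both the records ACM asked for and the records that actually landed in DNS as plain dictionaries. The function names and the dictionary shape are made up for illustration; this isn't anything ACM or our DNS tooling provides.

```python
def normalize(name: str) -> str:
    """Lowercase a DNS name and drop any trailing dot so comparisons are fair."""
    return name.strip().rstrip(".").lower()


def mismatched_records(expected: dict, actual: dict) -> dict:
    """Return {name: (expected_target, actual_target)} for every expected
    CNAME that is missing from DNS or points somewhere else.

    Both arguments map a record name to its CNAME target (hypothetical shape).
    """
    actual_norm = {normalize(k): normalize(v) for k, v in actual.items()}
    problems = {}
    for name, target in expected.items():
        found = actual_norm.get(normalize(name))  # None if the record is missing
        if found != normalize(target):
            problems[normalize(name)] = (normalize(target), found)
    return problems


# What ACM asked for versus what ended up in DNS in this story.
expected = {"_blablabla.my-cool-site.yshi.org": "vomit-long-string.aws.internal"}
actual = {
    "my-cool-site.yshi.org": "yshi.github.io",
    "_blablabla.my-cool-site.yshi.org": "yshi.github.io",
}
print(mismatched_records(expected, actual))
```

An empty dictionary means everything ACM asked for is in place; anything else is worth fixing before you start the clock on a validation attempt.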
Not long after, we noticed AWS marked the CSR as failed. We spotted the bad value and asked the hostmaster to remove it. With that done, we got a new CNAME target for a new CSR. DNS was then correctly configured thusly:
my-cool-site.yshi.org CNAME yshi.github.io
_blablabla.my-cool-site.yshi.org CNAME vomit-long-string2.aws.internal
But the CSR was marked as failed again. I didn't think ACM respected the TTL, but a cached copy of the bad record seemed like a reasonable explanation. Nothing on the page jumped out at me as a problem, so we deleted the _blablabla.my-cool-site.yshi.org record and sat around doing nothing for the duration of the TTL -- in this case, four hours.
After a lot of coffee, we did a new CSR, re-made the _blablabla.my-cool-site.yshi.org record, and got an immediate failure.
CNAMEs & CAAs
That was weird and I no longer had an easy explanation. I started nosing around the console a bit more.
Hidden away in a tooltip, the ACM console does give you an error message:
The CAA record is used to tell certificate issuers who should be allowed to issue certificates for the domain. They're supposed to check that and refuse if they aren't in the list of approved issuers.
This was confusing to me: there's no CAA record for yshi.org, and we've used ACM to issue certificates for hundreds of hostnames on this domain.
I started googling and pretty quickly found an answer. The certificate is for my-cool-site.yshi.org, which was still a CNAME pointed to yshi.github.io.
Finding the CAA record works by recursively checking the tree until you find a CAA record, or run out of parents. So typically, it works like this:
- CAA for my-cool-site.yshi.org => Not Found
- CAA for yshi.org => Not Found
- CAA for org => Not Found
- No CAA, anyone can issue!
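The climb above can be sketched in a few lines. This only shows the order in which names get checked; the function name is mine, and a real issuer does this with live DNS queries rather than string slicing.

```python
def caa_lookup_order(hostname: str) -> list:
    """List the names checked for a CAA record, from the hostname up to the TLD."""
    labels = hostname.rstrip(".").lower().split(".")
    # Drop one leading label at a time: a.b.c -> a.b.c, b.c, c
    return [".".join(labels[i:]) for i in range(len(labels))]


for name in caa_lookup_order("my-cool-site.yshi.org"):
    print("CAA for", name)
```

If none of those names has a CAA record, there is no restriction and any issuer may proceed.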
But, this uses DNS to resolve, and it followed the CNAME to yshi.github.io. What happened is:
- CAA for my-cool-site.yshi.org => Ope, this is a CNAME to yshi.github.io
- CAA for yshi.github.io => Not Found
- CAA for github.io => Found!
Thus, AWS was checking the list of issuers in GitHub's CAA record. Amazon was not on that list, so ACM refused to issue.
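Here's a toy model of that lookup, to make the failure concrete. It is a sketch of the behavior as I've described it, not ACM's actual implementation, and issuers may differ in exactly how they treat CNAMEs. The zone is an in-memory dictionary, and the issuer listed for github.io is a placeholder, not the real contents of GitHub's CAA record.

```python
# Toy zone: (name, record type) -> list of values. Illustrative data only.
ZONE = {
    ("my-cool-site.yshi.org", "CNAME"): ["yshi.github.io"],
    # Placeholder issuer, NOT GitHub's real CAA contents.
    ("github.io", "CAA"): ["some-other-ca.example"],
}


def find_caa(name: str, zone: dict, max_hops: int = 8):
    """Return (owner, issuers) for the first CAA record set found, or None.

    If the current name is a CNAME, jump to its target and keep climbing
    from there. max_hops guards against CNAME loops.
    """
    current = name.rstrip(".").lower()
    hops = 0
    while current:
        cname = zone.get((current, "CNAME"))
        if cname and hops < max_hops:
            current = cname[0].rstrip(".").lower()
            hops += 1
            continue
        caa = zone.get((current, "CAA"))
        if caa:
            return current, caa
        # No CAA here: strip the leftmost label and try the parent.
        _, _, current = current.partition(".")
    return None


def may_issue(name: str, issuer: str, zone: dict) -> bool:
    """No CAA anywhere means anyone may issue; otherwise the issuer must be listed."""
    found = find_caa(name, zone)
    return found is None or issuer in found[1]


print(find_caa("my-cool-site.yshi.org", ZONE))
print(may_issue("my-cool-site.yshi.org", "amazon.com", ZONE))
```

With the CNAME in place, the search lands on github.io and the check fails for amazon.com. Remove the CNAME entry from the toy zone and the same call succeeds, because the climb stays under yshi.org and finds nothing.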
Once we understood the problem, we removed the my-cool-site.yshi.org CNAME and submitted one more CSR. This time, everything worked as expected.
Short Conclusion for Search Indexes
Certificate issuers such as AWS Certificate Manager (ACM) will follow CNAMEs and check the target's CAA record. If the hostname is a CNAME to someone else's domain, this might cause a CAA error.