Automatically renewing SSL certificates on Kubernetes with cert-manager

Install cert-manager

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.3/cert-manager.yaml


kubectl get pods --namespace cert-manager

NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-5c6866597-zw7kh               1/1     Running   0          2m
cert-manager-cainjector-577f6d9fd7-tr77l   1/1     Running   0          2m
cert-manager-webhook-787858fcdb-nlzsq      1/1     Running   0          2m
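
Instead of re-running the pod listing until everything is Running, the rollout can be awaited. This is a minimal sketch, assuming the default `cert-manager` namespace created by the manifest above; the function name `wait_for_cert_manager` and the 120s default timeout are illustrative choices, not part of cert-manager.

```shell
# Block until every Deployment in the cert-manager namespace reports
# Available, or fail once the timeout (first argument, default 120s) expires.
wait_for_cert_manager() {
  kubectl wait deployment --all --namespace cert-manager --for=condition=Available --timeout="${1:-120s}"
}

# Usage: wait_for_cert_manager 180s
```

Waiting on the Deployments rather than on individual pods avoids depending on the generated pod name suffixes.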

Configure a Let's Encrypt issuer

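The Let's Encrypt production issuer has very strict rate limits, and it is easy to hit them while experimenting and learning. Because of that risk, we start with the Let's Encrypt staging issuer and switch to the production issuer once we are satisfied that everything works.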
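
To make the switch between the two environments explicit, the ACME directory URLs can be kept in one place. This is only a sketch: the helper name `acme_server_url` is hypothetical, and the two URLs are the Let's Encrypt staging and production endpoints that appear in the manifests in this post.

```shell
# Print the ACME directory URL for the given environment
# ("staging" or "production"); fail on anything else.
acme_server_url() {
  case "$1" in
    staging)    echo "https://acme-staging-v02.api.letsencrypt.org/directory" ;;
    production) echo "https://acme-v02.api.letsencrypt.org/directory" ;;
    *)          echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

# Start with staging; change the argument once issuance works end to end.
acme_server_url staging
```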

Note that you will see warnings about untrusted certificates issued by the staging issuer; this is entirely expected.

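There are two kinds of issuer: `Issuer` and `ClusterIssuer`. An `Issuer` can only be used within a single namespace, while a `ClusterIssuer` is cluster-scoped and can be used from any namespace. The YAML is otherwise the same; you only declare whether `kind` is `Issuer` or `ClusterIssuer`.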
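
The scope difference can be checked against the installed CRDs. A small sketch using `kubectl api-resources`; the function name `show_issuer_scopes` is only illustrative. In the output, the NAMESPACED column should read true for `issuers` and false for `clusterissuers`.

```shell
# List the resource types in the cert-manager.io API group together with
# their NAMESPACED flag.
show_issuer_scopes() {
  kubectl api-resources --api-group=cert-manager.io
}

# Usage: show_issuer_scopes
```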

Challenge types

Requesting a certificate from Let's Encrypt requires proving ownership of the domain. There are two validation methods: http01 and dns01.

http01

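With http01, cert-manager creates a temporary HTTP service in your Kubernetes cluster during issuance to answer Let's Encrypt's validation request. Let's Encrypt's servers try to reach the domain you specified over the public internet and fetch a specific validation file from that service. If the file can be fetched and verified, issuance continues.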
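
The validation file lives under a fixed path defined by the ACME specification (RFC 8555). The sketch below builds that URL so reachability can be tested from outside the cluster while a challenge is pending; the helper name `http01_challenge_url` and the token value are placeholders, and a real token has to be read from the pending Challenge resource.

```shell
# Build the URL the ACME server requests during an HTTP-01 challenge.
# Arguments: <domain> <token>.
http01_challenge_url() {
  printf 'http://%s/.well-known/acme-challenge/%s\n' "$1" "$2"
}

# Placeholder token, for illustration only.
http01_challenge_url abc.kaside365.com example-token

# With a real token, fetch it from a machine outside the cluster:
#   curl -i "$(http01_challenge_url abc.kaside365.com <token>)"
```

If that request cannot reach the cluster over plain HTTP on port 80, the http01 challenge cannot succeed.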

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    # The ACME server URL
    # The staging server is for testing and has much more lenient rate limits
    # server: https://acme-v02.api.letsencrypt.org/directory
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: user@example.com # placeholder; the original address was redacted
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-staging
    # Enable the HTTP-01 challenge provider
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx  # Ingress class to use; we run nginx-ingress-controller

dns01

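If your environment is not reachable from the public internet, consider dns01 validation instead. It proves domain ownership through a DNS record and needs no direct access from the outside.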
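
The DNS record in question is a TXT record named `_acme-challenge.<domain>`; for a wildcard name the leading `*.` is dropped. A minimal sketch for deriving that name so it can be looked up while a challenge is pending; the helper name `dns01_record_name` is hypothetical, and `dig` is assumed to be installed.

```shell
# Print the name of the TXT record used by a DNS-01 challenge for a domain.
# A leading "*." (wildcard certificate) is stripped first.
dns01_record_name() {
  printf '_acme-challenge.%s\n' "${1#\*.}"
}

dns01_record_name abc.kaside365.com

# Look the record up while the challenge is pending:
#   dig +short TXT "$(dns01_record_name abc.kaside365.com)"
```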

dns01 challenges are supported for the following providers:

  • ACMEDNS

  • Akamai

  • AzureDNS

  • CloudFlare

  • Google

  • Route53

  • DigitalOcean

  • RFC2136

Example using Cloudflare

The domain must either be registered with Cloudflare or have its DNS managed by Cloudflare.

Create an API token:

My Profile -- API Tokens

When assigning permissions, grant the items listed below.

  • Permissions

    • Zone - DNS - Edit

    • Zone - Zone - Read

  • Zone Resources:

    • Include - All Zones
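
Before handing the token to cert-manager, it can be checked against Cloudflare's token verification endpoint. A sketch under assumptions: the function name `verify_cloudflare_token` is illustrative, the token is passed as the first argument, and `curl` is installed. A valid token should yield a JSON body containing `"success": true`.

```shell
# Ask the Cloudflare API whether an API token is valid.
# Argument: <api-token>. Note that this checks validity only, not the
# individual permissions granted to the token.
verify_cloudflare_token() {
  curl -sS \
    -H "Authorization: Bearer $1" \
    -H "Content-Type: application/json" \
    "https://api.cloudflare.com/client/v4/user/tokens/verify"
}

# Usage: verify_cloudflare_token "$CF_API_TOKEN"
```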

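Example YAML for a ClusterIssuer or Issuer

The Secret holding the Cloudflare API credential must be where cert-manager looks for it: for an Issuer, the same namespace as the Issuer; for a ClusterIssuer, the cluster resource namespace, which defaults to the namespace cert-manager runs in (`cert-manager`). Otherwise you get "error getting cloudflare secret: secrets \"cloudflare-api-key-secret\" not found".
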
apiVersion: v1
kind: Secret
metadata:
  name: cloudflare-api-key-secret
  namespace: kube-system
type: Opaque
stringData:
  api-key: <token-str>
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt-staging
  namespace: kube-system # for an Issuer, use the namespace of the service being deployed; a ClusterIssuer (recommended) has no namespace
spec:
  acme:
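    # Staging endpoint, matching the issuer name; switch to the production URL once issuance works
    # server: https://acme-v02.api.letsencrypt.org/directory
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: user@example.com # placeholder (the original address was redacted); receives Let's Encrypt emails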
    privateKeySecretRef:
      name: letsencrypt-staging
    solvers:
    - dns01:
        cloudflare:
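          # The Secret holds an API token, so reference it with apiTokenSecretRef;
          # apiKeySecretRef (plus the account email) is only for the Global API Key
          apiTokenSecretRef: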
            name: cloudflare-api-key-secret
            key: api-key
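
After applying the manifest, it helps to confirm that the Issuer registered with the ACME server before requesting any certificate. A minimal sketch that reads the Issuer's Ready condition; the function name `issuer_ready_status` is hypothetical, while the Issuer name and namespace in the usage line come from the manifest above.

```shell
# Print the Ready condition ("True" or "False") of a namespaced Issuer.
# Arguments: <issuer-name> <namespace>.
issuer_ready_status() {
  kubectl get issuer "$1" --namespace "$2" -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}

# Usage: issuer_ready_status letsencrypt-staging kube-system
# If it prints False, the reason is in the output of:
#   kubectl describe issuer letsencrypt-staging --namespace kube-system
```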

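Example for providers without built-in DNS01 support

Some DNS providers are not supported directly by cert-manager's DNS01 solvers, so a webhook service has to be added as an intermediary. Alibaba Cloud DNS (AliDNS) is one example: an alidns webhook must be deployed before dns01 challenges can work.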

cert-manager-alidns-webhook

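This is a webhook implementation for using cert-manager with Alibaba Cloud DNS (also known as AliDNS).

Create an API key/secret:

Create them on the AccessKey page of your Alibaba Cloud account.

When assigning permissions, grant all DNS-related permissions.

Deploy the alidns webhook

Helm 3 is required and must be installed beforehand.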

helm repo add cert-manager-alidns-webhook https://devmachine-fr.github.io/cert-manager-alidns-webhook
helm repo update
helm pull cert-manager-alidns-webhook/alidns-webhook

tar -zxf alidns-webhook-0.7.0.tgz

Adjust a few parameters in values.yaml to fit our environment.

# groupName: example.com

# certManager:
#   namespace: cert-manager
#   serviceAccountName: cert-manager

# image:
#   repository: ghcr.io/devmachine-fr/cert-manager-alidns-webhook/cert-manager-alidns-webhook
#   tag: 0.2.0
#   pullPolicy: IfNotPresent
#   privateRegistry:
#     enabled: false
#     dockerRegistrySecret: alibaba-container-registry

# The original settings are above; the modified settings are below

groupName: mydomain.com # used later in the ClusterIssuer

certManager:
  namespace: cert-manager
  serviceAccountName: cert-manager

# Change the image repository; ideally pull the image and push it to a private registry, and remember to update the privateRegistry credentials as well.
image:
  repository: hub.mydomain.com/cert-manager-alidns-webhook
  tag: 0.2.0
  pullPolicy: IfNotPresent
  privateRegistry:
    enabled: true
    dockerRegistrySecret: default-secret

Deploy the alidns webhook service with Helm

helm install -f alidns-webhook/values.yaml alidns-webhook alidns-webhook/

helm list | grep alidns-webhook

ClusterIssuer YAML for dns01 with alidns-webhook

The Secret holding the DNS API credentials must be in the same namespace as the cert-manager pod; otherwise you get "error getting secret: secrets "……" not found".

apiVersion: v1
kind: Secret
metadata:
  name: alidns-secrets
  namespace: cert-manager
type: Opaque
stringData:
  # Alibaba Cloud DNS user access key and secret
  access-token: xxx
  secret-key: xxxxx
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod-dns01
spec:
  acme:
    email: user@example.com # placeholder; the original address was redacted
    # The staging server is for test issuance and has much more lenient rate limits
    # server: https://acme-staging-v02.api.letsencrypt.org/directory
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-prod-dns01
    solvers:
    - dns01:
        webhook:
            config:
              accessTokenSecretRef:
                key: access-token
                name: alidns-secrets
              regionId: cn-beijing
              secretKeySecretRef:
                key: secret-key
                name: alidns-secrets
            groupName: mydomain.com # must match the groupName in the webhook's configuration (see values.yaml of alidns-webhook)
            solverName: alidns-solver

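Configure Ingress TLS

Before configuring Ingress TLS, create a Certificate resource for the Ingress. It manages the issued certificate automatically and stores it in the corresponding Secret.

Certificate example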

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: test-ingress-ssl
  namespace: kube-system
spec:
  secretName: test-ingress-ssl # name of the Secret that cert-manager manages; referenced by the Ingress
  issuerRef:
    name: letsencrypt-staging # must match the metadata.name of the Issuer
    kind: Issuer
  dnsNames:
  - abc.kaside365.com # domain to request the SSL certificate for
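
Once the Certificate is ready, the managed Secret contains the certificate under the `tls.crt` key, and it can be decoded to see who issued it and when it expires. A sketch under assumptions: the function name `inspect_tls_secret` is illustrative, `openssl` is installed, and `base64` accepts `-d` (GNU coreutils; some systems use a different flag).

```shell
# Decode the certificate stored in a TLS Secret and print its subject,
# issuer and expiry date. Arguments: <secret-name> <namespace>.
inspect_tls_secret() {
  kubectl get secret "$1" --namespace "$2" -o jsonpath='{.data.tls\.crt}' \
    | base64 -d \
    | openssl x509 -noout -subject -issuer -enddate
}

# Usage: inspect_tls_secret test-ingress-ssl kube-system
```

The issuer line is a quick way to tell whether the certificate came from the staging or the production environment.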

Ingress example

Single domain

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: test-ingress-ssl
  namespace: kube-system
  annotations:
    # Name of the Issuer to use
    cert-manager.io/issuer: "letsencrypt-staging"
spec:
  ingressClassName: nginx
  tls:
  - secretName: test-ingress-ssl # the secretName set in the Certificate
    hosts:
    - abc.kaside365.com
  rules:
    - host: abc.kaside365.com
      http:
        paths:
          - backend:
              service:
                name: web-1
                port:
                  number: 80
            path: /
            pathType: Prefix

Multiple domains

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: test-ingress-ssl
  namespace: kube-system
  annotations:
    # Name of the Issuer to use
    cert-manager.io/issuer: "letsencrypt-staging"
spec:
  ingressClassName: nginx
  tls:
  - secretName: test-ingress-ssl # the secretName set in the Certificate; domains of the same service can share one Certificate
    hosts:
      - abc.kaside365.com
      - cdb.kaside365.com
  rules:
    - host: abc.kaside365.com
      http:
        paths:
          - backend:
              service:
                name: web-1
                port:
                  number: 80
            path: /
            pathType: Prefix
    - host: cdb.kaside365.com
      http:
        paths:
          - backend:
              service:
                name: web-2
                port:
                  number: 80
            path: /
            pathType: Prefix
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: test-ingress-ssl
  namespace: kube-system
spec:
  secretName: test-ingress-ssl # name of the Secret that cert-manager manages; referenced by the Ingress
  issuerRef:
    name: letsencrypt-staging # must match the metadata.name of the Issuer
    kind: Issuer
  dnsNames:
  - abc.kaside365.com # domain to request the SSL certificate for
  - cdb.kaside365.com # domain to request the SSL certificate for

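Verification and troubleshooting

# READY is False while the certificate has not been issued yet; if it stays False for too long, use the steps below to find the cause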
kubectl get certificate -A

NAMESPACE     NAME                 READY   SECRET               AGE
kube-system   test-ingress-ssl     False   test-ingress-ssl     1m

# Show the details of the Certificate
kubectl describe -n kube-system certificate test-ingress-ssl

...
Events:
  Type    Reason     Age   From                                       Message
  ----    ------     ----  ----                                       -------
  Normal  Issuing    1m   cert-manager-certificates-trigger          Issuing certificate as Existing issued Secret is not up to date for spec: [spec.commonName spec.dnsNames]
  Normal  Reused     1m   cert-manager-certificates-key-manager      Reusing private key stored in existing Secret resource "test-ingress-ssl"
  Normal  Requested  1m   cert-manager-certificates-request-manager  Created new CertificateRequest resource "test-ingress-ssl-1"

# Show the details of the CertificateRequest
kubectl describe -n kube-system certificaterequest test-ingress-ssl-1

...
Events:
  Type    Reason              Age   From                                                Message
  ----    ------              ----  ----                                                -------
  Normal  WaitingForApproval  15s   cert-manager-certificaterequests-issuer-acme        Not signing CertificateRequest until it is Approved
  Normal  WaitingForApproval  15s   cert-manager-certificaterequests-issuer-venafi      Not signing CertificateRequest until it is Approved
  Normal  WaitingForApproval  15s   cert-manager-certificaterequests-issuer-selfsigned  Not signing CertificateRequest until it is Approved
  Normal  WaitingForApproval  15s   cert-manager-certificaterequests-issuer-vault       Not signing CertificateRequest until it is Approved
  Normal  WaitingForApproval  15s   cert-manager-certificaterequests-issuer-ca          Not signing CertificateRequest until it is Approved
  Normal  cert-manager.io     14s   cert-manager-certificaterequests-approver           Certificate request has been approved by cert-manager.io
  Normal  IssuerNotReady      14s   cert-manager-certificaterequests-issuer-acme        Referenced issuer does not have a Ready status condition
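
For ACME issuers, cert-manager also creates Order and Challenge resources behind each CertificateRequest, and the events on a Challenge usually carry the concrete failure (for example a DNS API error). A short sketch for listing and describing them; the function names are illustrative only.

```shell
# List CertificateRequests together with their ACME Orders and Challenges
# in every namespace.
list_acme_progress() {
  kubectl get certificaterequests,orders,challenges --all-namespaces
}

# Show the status and events of every Challenge in one namespace.
# Argument: <namespace>.
describe_challenges() {
  kubectl describe challenges --namespace "$1"
}

# Usage:
#   list_acme_progress
#   describe_challenges kube-system
```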


# Check the cert-manager pod logs; a message such as condition "Ready": "False" -> "True" means issuance has essentially succeeded
kubectl logs -f --tail=20 -n cert-manager cert-manager-7d75f47cc5-dgbtb

I1221 06:20:10.269145       1 conditions.go:192] Found status change for Certificate "test-ingress-ssl" condition "Ready": "False" -> "True"; setting lastTransitionTime to 2023-12-21 06:20:10.269136695 +0000 UTC m=+1399633.444241060

# Finally, the certificate's READY status changes from False to True
kubectl get certificate -A

NAMESPACE     NAME                 READY   SECRET               AGE
kube-system   test-ingress-ssl     True    test-ingress-ssl     3m

Error handling

I1221 06:11:40.216899       1 conditions.go:96] Setting lastTransitionTime for Issuer "letsencrypt-staging" condition "Ready" to 2023-12-21 06:11:40.216888974 +0000 UTC m=+1399123.391993341
I1221 06:11:40.233005       1 setup.go:208] "cert-manager/issuers: skipping re-verifying ACME account as cached registration details look sufficient" resource_name="letsencrypt-staging" resource_namespace="kube-system" resource_kind="Issuer" resource_version="v1" related_resource_name="letsencrypt-staging" related_resource_namespace="kube-system" related_resource_kind="Secret"
I1221 06:11:40.259094       1 controller.go:162] "cert-manager/certificaterequests-issuer-acme: re-queuing item due to optimistic locking on resource" key="kube-system/test-ingress-ssl-1" error="Operation cannot be fulfilled on certificaterequests.cert-manager.io \"test-ingress-ssl-1\": the object has been modified; please apply your changes to the latest version and try again"
E1221 06:11:53.767531       1 controller.go:167] "cert-manager/challenges: re-queuing item due to error processing" err=<
	while attempting to find Zones for domain _acme-challenge.kaside365.com.
	while querying the Cloudflare API for GET "/zones?name=_acme-challenge.kaside365.com" 
		 Error: 6003: Invalid request headers<- 6103: Invalid format for X-Auth-Key header
 > key="kube-system/test-ingress-ssl-1-1799247307-1606112995"
E1221 06:11:55.947838       1 controller.go:167] "cert-manager/challenges: re-queuing item due to error processing" err=<
	while attempting to find Zones for domain _acme-challenge.kaside365.com.
	while querying the Cloudflare API for GET "/zones?name=_acme-challenge.kaside365.com" 
		 Error: 6003: Invalid request headers<- 6103: Invalid format for X-Auth-Key header
 > key="kube-system/test-ingress-ssl-1-1799247307-1606112995"

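This is an API authentication error. cert-manager sent the credential in the X-Auth-Key header, which is meant for a Global API Key, so the error typically appears when an API token is referenced through `apiKeySecretRef`. Reference tokens with `apiTokenSecretRef` instead, and check that the token has the permissions listed earlier.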
