Installing Kubernetes v1.26 with kubeadm

System environment: Rocky Linux 8.6

Avoid repeatedly resetting and redeploying in the same environment: some component configuration may not be fully cleaned up by a reset, which can lead to hard-to-diagnose problems.
If you are running on virtual machines, take a snapshot of every node before initializing the cluster, and restore the snapshots whenever you need a clean environment.

EPEL

yum install -y https://mirrors.aliyun.com/epel/epel-release-latest-8.noarch.rpm

sed -i 's|^#baseurl=https://download.example/pub|baseurl=https://mirrors.aliyun.com|' /etc/yum.repos.d/epel*
sed -i 's|^metalink|#metalink|' /etc/yum.repos.d/epel*

Install containerd

Starting with Kubernetes 1.24, Dockershim is no longer supported, so Docker Engine can no longer be used directly as the container runtime by default. The official recommendation is containerd (a minimal install sketch follows the sysctl setup below).

Forward IPv4 and let iptables see bridged traffic

cat <<EOF | tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

modprobe overlay
modprobe br_netfilter
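
# (optional) verify that the modules are loaded
lsmod | grep -e overlay -e br_netfilter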

# Set the required sysctl parameters; they persist across reboots
cat <<EOF | tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF

# Apply the sysctl parameters without rebooting
sysctl --system
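
Installing containerd itself can be done in several ways; a minimal sketch for Rocky Linux, assuming the Aliyun docker-ce mirror is used for the containerd.io package and the systemd cgroup driver (which recent kubeadm versions default to), could be:

# containerd.io is shipped in the docker-ce repo; this assumes the Aliyun mirror
yum install -y yum-utils
yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
yum install -y containerd.io

# generate the default config and switch to the systemd cgroup driver
containerd config default > /etc/containerd/config.toml
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml

systemctl enable --now containerd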

Install kubeadm, kubelet and kubectl

Red Hat family

### Add the yum repository
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF

# Install the packages
yum install -y kubelet kubeadm kubectl \
	--disableexcludes=kubernetes

# Start kubelet and enable it at boot
systemctl enable --now kubelet

Debian family

### Debian/Ubuntu
apt-get update && apt-get install -y apt-transport-https
curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add - 
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF
apt-get update
apt-get install -y kubelet kubeadm kubectl

# Start kubelet and enable it at boot
systemctl enable --now kubelet

Create the initialization file (saved as init.yaml, which the commands below reference)

apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: c9mm0j.3rvgri9l5z0h85zz
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 172.23.210.24 # API server IP
  bindPort: 6443 # API server port
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  name: test-03
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers # image repository (mirror)
kind: ClusterConfiguration
kubernetesVersion: v1.26.1 # Kubernetes version
networking:
  dnsDomain: cluster.local
  podSubnet: 172.7.0.0/16 # pod subnet
  serviceSubnet: 172.26.0.0/16  # service subnet
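
If you would rather not write the file from scratch, kubeadm can print a default template that you then edit (a sketch; the field names are the ones shown in the YAML above):

kubeadm config print init-defaults > init.yaml
# then adjust advertiseAddress, nodeRegistration.name, imageRepository,
# kubernetesVersion, podSubnet and serviceSubnet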

List the required images and pull them ahead of time

kubeadm config images list --config=init.yaml
	registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.26.1
	registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager:v1.26.1
	registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:v1.26.1
	registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.26.1
	registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.9
	registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.5.6-0
	registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:v1.9.3

kubeadm config images pull --config=init.yaml

Initialize the cluster

kubeadm init --config=init.yaml --upload-certs
	# output
	……
	# success message
	Your Kubernetes control-plane has initialized successfully!
	
	To start using your cluster, you need to run the following as a regular user:
	
	  mkdir -p $HOME/.kube
	  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
	  sudo chown $(id -u):$(id -g) $HOME/.kube/config
	
	Alternatively, if you are the root user, you can run:
	
	  export KUBECONFIG=/etc/kubernetes/admin.conf
	
	You should now deploy a pod network to the cluster.
	Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
	  https://kubernetes.io/docs/concepts/cluster-administration/addons/
	
	Then you can join any number of worker nodes by running the following on each as root:
	
	kubeadm join 172.23.210.24:6443 --token c9mm0j.3rvgri9l5z0h85zz \
		--discovery-token-ca-cert-hash sha256:e30fd29c213a43cc78af7068833f314f3d6326f4d7e184e317a20c70a0be9089 

# Check the basic cluster state; until a network add-on is deployed, coredns cannot run and will stay 'Pending'
kubectl get po -A
	NAMESPACE     NAME                              READY   STATUS    RESTARTS   AGE
	kube-system   coredns-567c556887-7j8td          0/1     Pending   0          2m25s
	kube-system   coredns-567c556887-8psgk          0/1     Pending   0          2m25s
	kube-system   etcd-test-03                      1/1     Running   0          2m39s
	kube-system   kube-apiserver-test-03            1/1     Running   0          2m39s
	kube-system   kube-controller-manager-test-03   1/1     Running   0          2m39s
	kube-system   kube-proxy-kw227                  1/1     Running   0          2m26s
	kube-system   kube-scheduler-test-03            1/1     Running   0          2m39s

If initialization fails, see [[#Initialization failure]]

Deploy the network add-on. Here I deploy Flannel. Installation may differ somewhat between versions, so refer to the official documentation where possible. [[Kubernetes/K8S之kubeadm安装v1.21.1#网络组件]]

If podSubnet is set in the init.yaml initialization file, the "Network" field inside the "net-conf.json" section of the Flannel deployment YAML must be changed to match podSubnet, otherwise Flannel will fail to start.
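For reference, the fragment of the ConfigMap in kube-flannel.yml to edit looks roughly like this after the change (exact layout may differ between Flannel releases):

  net-conf.json: |
    {
      "Network": "172.7.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
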
# For Kubernetes v1.17+
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml

# Once Flannel is up, the coredns pods recover automatically
kubectl get all -A -o wide
	NAMESPACE      NAME                                  READY   STATUS    RESTARTS   AGE     IP              NODE      NOMINATED NODE   READINESS GATES
	kube-flannel   pod/kube-flannel-ds-xzc2r             1/1     Running   0          3m18s   172.23.210.24   test-03   <none>           <none>
	kube-system    pod/coredns-567c556887-7j8td          1/1     Running   0          30m     172.7.0.3       test-03   <none>           <none>
	kube-system    pod/coredns-567c556887-8psgk          1/1     Running   0          30m     172.7.0.2       test-03   <none>           <none>
	kube-system    pod/etcd-test-03                      1/1     Running   0          30m     172.23.210.24   test-03   <none>           <none>
	kube-system    pod/kube-apiserver-test-03            1/1     Running   0          30m     172.23.210.24   test-03   <none>           <none>
	kube-system    pod/kube-controller-manager-test-03   1/1     Running   0          30m     172.23.210.24   test-03   <none>           <none>
	kube-system    pod/kube-proxy-kw227                  1/1     Running   0          30m     172.23.210.24   test-03   <none>           <none>
	kube-system    pod/kube-scheduler-test-03            1/1     Running   0          30m     172.23.210.24   test-03   <none>           <none>

Troubleshooting

Initialization failure

[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

Unfortunately, an error has occurred:
	timed out waiting for the condition

This error is likely caused by:
	- The kubelet is not running
	- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
	- 'systemctl status kubelet'
	- 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
	- 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
	Once you have found the failing container, you can inspect its logs with:
	- 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher

If the kubelet service itself is running, use journalctl -xeu kubelet to look at the detailed logs. Two problems have come up so far. First, the node's own hostname could not be resolved; fix it by adding the node's IP-to-hostname binding to /etc/hosts (example below). Second, pulling the registry.k8s.io/pause:3.6 image failed. Its version differs from what kubeadm config images list --config=init.yaml reports (kubeadm shows 3.9, but 3.6 is what initialization actually requests), and the default registry.k8s.io is unreachable from mainland China, so the pull fails and initialization times out. As before, pull the matching version by swapping the registry domain, re-tag it, then re-run the initialization.
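
For the hostname issue, the binding can be added like this (using this node's IP and hostname; adjust to your own):

echo "172.23.210.24 test-03" >> /etc/hosts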

crictl pull registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6
# Use -n to specify the namespace; the one used here is 'k8s.io'
ctr -n k8s.io image tag registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6 registry.k8s.io/pause:3.6

2023-08-31 update: fix for the failure to pull registry.k8s.io/pause:3.6

The cause is that the sandbox image in containerd's configuration file points to registry.k8s.io. Replace it and restart containerd.
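
The relevant line lives under the CRI plugin section of /etc/containerd/config.toml and, after the replacement, looks roughly like this (exact section path and pause tag depend on the containerd version):

[plugins."io.containerd.grpc.v1.cri"]
  sandbox_image = "registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6"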

sed -i 's/registry.k8s.io/registry.cn-hangzhou.aliyuncs.com\/google_containers/'  /etc/containerd/config.toml

systemctl daemon-reload
systemctl restart containerd

Local environments without Ingress: accessing Services directly

Flannel network add-on: disable the firewall and SELinux on all nodes. On the LAN router, add a static route to the Service subnet with the next hop pointing at any node's physical NIC IP; that node will forward the traffic (example below).
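
On a Linux machine acting as the gateway, for example, the route could look like this (the Service subnet and node IP are the ones used above; actual router syntax varies by vendor):

ip route add 172.26.0.0/16 via 172.23.210.24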

Calico network add-on: disable the firewall and SELinux on all nodes. On the LAN router, add a static route to the Service subnet with the next hop pointing at any node's physical NIC IP, the same as for Flannel. It is best to check Calico's peering status on each node with calicoctl; everything should be Up/Established.

curl -L https://github.com/projectcalico/calico/releases/latest/download/calicoctl-linux-amd64 -o calicoctl

chmod +x calicoctl && mv calicoctl /usr/local/bin

# Copy ~/.kube/config to the worker nodes first, otherwise calicoctl only works on the control-plane node
# master node
calicoctl node status
	Calico process is running.
	
	IPv4 BGP status
	+---------------+-------------------+-------+----------+-------------+
	| PEER ADDRESS  |     PEER TYPE     | STATE |  SINCE   |    INFO     |
	+---------------+-------------------+-------+----------+-------------+
	| 172.23.210.25 | node-to-node mesh | up    | 01:57:48 | Established |
	| 172.23.210.27 | node-to-node mesh | up    | 01:58:02 | Established |
	+---------------+-------------------+-------+----------+-------------+
	
	IPv6 BGP status
	No IPv6 peers found.

# node1
	calicoctl node status
	Calico process is running.
	
	IPv4 BGP status
	+---------------+-------------------+-------+----------+-------------+
	| PEER ADDRESS  |     PEER TYPE     | STATE |  SINCE   |    INFO     |
	+---------------+-------------------+-------+----------+-------------+
	| 172.23.210.24 | node-to-node mesh | up    | 01:57:51 | Established |
	| 172.23.210.27 | node-to-node mesh | up    | 01:58:04 | Established |
	+---------------+-------------------+-------+----------+-------------+
	
	IPv6 BGP status
	No IPv6 peers found.

# node2
	calicoctl node status
	Calico process is running.
	
	IPv4 BGP status
	+---------------+-------------------+-------+----------+-------------+
	| PEER ADDRESS  |     PEER TYPE     | STATE |  SINCE   |    INFO     |
	+---------------+-------------------+-------+----------+-------------+
	| 172.23.210.24 | node-to-node mesh | up    | 01:58:05 | Established |
	| 172.23.210.25 | node-to-node mesh | up    | 01:58:05 | Established |
	+---------------+-------------------+-------+----------+-------------+
	
	IPv6 BGP status
	No IPv6 peers found.


Notes after kubeadm reset

After resetting a cluster with kubeadm reset, remember to also clear the /etc/cni/net.d directory. It still contains the old cluster's network add-on configuration, and if it is left in place the newly created cluster's network will not work.
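
A typical cleanup, for example:

kubeadm reset
rm -rf /etc/cni/net.d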
