Installing Kubernetes v1.26 with kubeadm
System environment: Rocky Linux 8.6
Try not to reset and redeploy repeatedly in the same environment; some component configuration may not be fully cleaned up during a reset and can cause hard-to-diagnose problems.
If the nodes are virtual machines, take a snapshot of every node before initializing the cluster, and restore the snapshots whenever you need a clean environment.
EPEL
yum install -y https://mirrors.aliyun.com/epel/epel-release-latest-8.noarch.rpm
sed -i 's|^#baseurl=https://download.example/pub|baseurl=https://mirrors.aliyun.com|' /etc/yum.repos.d/epel*
sed -i 's|^metalink|#metalink|' /etc/yum.repos.d/epel*
Install containerd
Since Kubernetes 1.24, dockershim is no longer supported, so Docker Engine can no longer be used directly as the container runtime by default. The official recommendation is containerd.
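The installation commands themselves are not shown here; a minimal sketch for Rocky Linux (assuming the aliyun Docker CE mirror, whose repository also ships the containerd.io package) might look like this:
# Add the Docker CE repo (it provides containerd.io) and install containerd
dnf install -y dnf-plugins-core
dnf config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
dnf install -y containerd.io
# Generate the default config and switch runc to the systemd cgroup driver, which kubelet expects on RHEL 8-family systems
containerd config default > /etc/containerd/config.toml
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
systemctl enable --now containerd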
Forward IPv4 and let iptables see bridged traffic
cat <<EOF | tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
modprobe overlay
modprobe br_netfilter
# Set the required sysctl parameters; they persist across reboots
cat <<EOF | tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
# Apply the sysctl parameters without rebooting
sysctl --system
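Optionally verify that the modules are loaded and the sysctls took effect (these checks mirror the official documentation):
lsmod | grep br_netfilter
lsmod | grep overlay
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward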
Install kubeadm, kubelet, and kubectl
RedHat family
### Add the yum repository
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
# Install the packages
yum install -y kubelet kubeadm kubectl \
--disableexcludes=kubernetes
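If the packages should match the v1.26.1 targeted later in this guide, they can be version-pinned instead (assuming the mirror still carries that build):
yum install -y kubelet-1.26.1 kubeadm-1.26.1 kubectl-1.26.1 --disableexcludes=kubernetes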
# Enable kubelet and start it on boot
systemctl enable --now kubelet
Debian family
### Debian/Ubuntu
apt-get update && apt-get install -y apt-transport-https
curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add -
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF
apt-get update
apt-get install -y kubelet kubeadm kubectl
# Enable kubelet and start it on boot
systemctl enable --now kubelet
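Optionally hold the Debian packages so a routine apt-get upgrade does not bump them unexpectedly:
apt-mark hold kubelet kubeadm kubectl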
Create the initialization file (init.yaml)
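One way to produce the skeleton is to dump kubeadm's defaults into init.yaml (the filename assumed by the --config flags used below) and then edit the commented fields:
kubeadm config print init-defaults > init.yaml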
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: c9mm0j.3rvgri9l5z0h85zz
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 172.23.210.24 # API server address
  bindPort: 6443 # API server port
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  name: test-03
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers # image registry
kind: ClusterConfiguration
kubernetesVersion: v1.26.1 # Kubernetes version
networking:
  dnsDomain: cluster.local
  podSubnet: 172.7.0.0/16 # pod subnet
  serviceSubnet: 172.26.0.0/16 # service subnet
List the required images and pull them to the local node in advance
kubeadm config images list --config=init.yaml
registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.26.1
registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager:v1.26.1
registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:v1.26.1
registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.26.1
registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.9
registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.5.6-0
registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:v1.9.3
kubeadm config images pull --config=init.yaml
Initialize the cluster
kubeadm init --config=init.yaml --upload-certs
# Output
……
# On success
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 172.23.210.24:6443 --token c9mm0j.3rvgri9l5z0h85zz \
--discovery-token-ca-cert-hash sha256:e30fd29c213a43cc78af7068833f314f3d6326f4d7e184e317a20c70a0be9089
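The bootstrap token above has a 24-hour TTL (set in init.yaml). If it has expired by the time a worker joins, a fresh join command can be printed on the control-plane node:
kubeadm token create --print-join-command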
# Check the cluster status. Until a network add-on is deployed, coredns cannot run and stays in 'Pending'
kubectl get po -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-567c556887-7j8td 0/1 Pending 0 2m25s
kube-system coredns-567c556887-8psgk 0/1 Pending 0 2m25s
kube-system etcd-test-03 1/1 Running 0 2m39s
kube-system kube-apiserver-test-03 1/1 Running 0 2m39s
kube-system kube-controller-manager-test-03 1/1 Running 0 2m39s
kube-system kube-proxy-kw227 1/1 Running 0 2m26s
kube-system kube-scheduler-test-03 1/1 Running 0 2m39s
If initialization fails, see [[#Initialization failure]].
Deploy the network add-on
I deployed Flannel here. Installation details can differ between versions, so refer to the official documentation where possible: [[Kubernetes/K8S之kubeadm安装v1.21.1#网络组件]]
If podSubnet is defined in the kubeadm init.yaml, the "Network" value in the "net-conf.json" section of the flannel manifest must be changed to match podSubnet, otherwise flannel will fail to start (see the sketch after the apply command below).
# For Kubernetes v1.17+
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
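Because podSubnet here is 172.7.0.0/16 rather than flannel's default 10.244.0.0/16, applying the manifest straight from the URL will not match. One way (a sketch) is to download it, patch net-conf.json, and apply the local copy:
# Fetch the manifest, point the Network field at the pod subnet, then apply it
curl -LO https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
sed -i 's|10.244.0.0/16|172.7.0.0/16|' kube-flannel.yml
kubectl apply -f kube-flannel.yml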
# Once flannel is up, the coredns pods recover on their own
kubectl get all -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-flannel pod/kube-flannel-ds-xzc2r 1/1 Running 0 3m18s 172.23.210.24 test-03 <none> <none>
kube-system pod/coredns-567c556887-7j8td 1/1 Running 0 30m 172.7.0.3 test-03 <none> <none>
kube-system pod/coredns-567c556887-8psgk 1/1 Running 0 30m 172.7.0.2 test-03 <none> <none>
kube-system pod/etcd-test-03 1/1 Running 0 30m 172.23.210.24 test-03 <none> <none>
kube-system pod/kube-apiserver-test-03 1/1 Running 0 30m 172.23.210.24 test-03 <none> <none>
kube-system pod/kube-controller-manager-test-03 1/1 Running 0 30m 172.23.210.24 test-03 <none> <none>
kube-system pod/kube-proxy-kw227 1/1 Running 0 30m 172.23.210.24 test-03 <none> <none>
kube-system pod/kube-scheduler-test-03 1/1 Running 0 30m 172.23.210.24 test-03 <none> <none>
Problems
Initialization failure
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
- 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
If the kubelet service is running normally, check the detailed logs with journalctl -xeu kubelet. Two problems have come up so far. First, the node's own hostname could not be resolved; fix it by adding the node's IP and hostname to /etc/hosts. Second, pulling the registry.k8s.io/pause:3.6 image failed: the version does not match what kubeadm config images list --config=init.yaml reports (kubeadm lists 3.9, but initialization actually requests 3.6), and the default registry.k8s.io is unreachable from mainland China, so the pull fails and initialization aborts. The workaround is again to swap the registry domain, pull the matching version, and then re-run the initialization.
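For the first issue, the fix is a single /etc/hosts entry; with the node IP and hostname used in this guide it would look like:
echo "172.23.210.24 test-03" >> /etc/hosts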
crictl pull registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6
# Use -n to specify the containerd namespace; Kubernetes uses 'k8s.io'
ctr -n k8s.io image tag registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6 registry.k8s.io/pause:3.6
Update 2023-08-31: fix for the registry.k8s.io/pause:3.6 pull failure
The root cause is that the sandbox image in containerd's configuration file points to registry.k8s.io. Replace it and restart containerd:
sed -i 's/registry.k8s.io/registry.cn-hangzhou.aliyuncs.com\/google_containers/' /etc/containerd/config.toml
systemctl daemon-reload
systemctl restart containerd
Accessing Services directly without Ingress in a local environment
Flannel network add-on: disable the firewall and SELinux on all nodes. On the LAN router, add a static route to the Service subnet with the next hop set to any node's physical NIC IP; that node then forwards the traffic.
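A sketch of the firewall/SELinux step on RHEL-family nodes (suitable for a lab environment only):
# Disable firewalld and SELinux on every node
systemctl disable --now firewalld
setenforce 0
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config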
Calico network add-on: disable the firewall and SELinux on all nodes. On the LAN router, add a static route to the Service subnet with the next hop set to any node's physical NIC IP; that node then forwards the traffic. It is best to run calicoctl on each node to check Calico's peer status; all peers should be up.
curl -L https://github.com/projectcalico/calico/releases/latest/download/calicoctl-linux-amd64 -o calicoctl
chmod +x calicoctl && mv calicoctl /usr/local/bin
# Copy ~/.kube/config to the worker nodes as well, otherwise this only works on the control-plane node
# master node
calicoctl node status
Calico process is running.
IPv4 BGP status
+---------------+-------------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+---------------+-------------------+-------+----------+-------------+
| 172.23.210.25 | node-to-node mesh | up | 01:57:48 | Established |
| 172.23.210.27 | node-to-node mesh | up | 01:58:02 | Established |
+---------------+-------------------+-------+----------+-------------+
IPv6 BGP status
No IPv6 peers found.
# node1
calicoctl node status
Calico process is running.
IPv4 BGP status
+---------------+-------------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+---------------+-------------------+-------+----------+-------------+
| 172.23.210.24 | node-to-node mesh | up | 01:57:51 | Established |
| 172.23.210.27 | node-to-node mesh | up | 01:58:04 | Established |
+---------------+-------------------+-------+----------+-------------+
IPv6 BGP status
No IPv6 peers found.
# node2
calicoctl node status
Calico process is running.
IPv4 BGP status
+---------------+-------------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+---------------+-------------------+-------+----------+-------------+
| 172.23.210.24 | node-to-node mesh | up | 01:58:05 | Established |
| 172.23.210.25 | node-to-node mesh | up | 01:58:05 | Established |
+---------------+-------------------+-------+----------+-------------+
IPv6 BGP status
No IPv6 peers found.
Notes after kubeadm reset
After resetting a cluster with kubeadm reset, remember to also clear the /etc/cni/net.d directory. It still holds the old cluster's CNI configuration; if it is not removed, networking in the newly created cluster will not work.
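A typical cleanup sequence on each node (a sketch):
kubeadm reset -f
rm -rf /etc/cni/net.d
rm -rf $HOME/.kube/config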