PowerScale - Configure with Kubernetes
RKE2 advertises itself as an automatic K8s installer. That is… sort of true based on my experience. It is certainly simpler than what I had to do 7 years ago, but significant assembly by someone who knows Kubernetes and networking was still required.
My IPs
- K8s Master - 10.10.25.135 (k8s-server.lan)
- K8s Worker - 10.10.25.136 (k8s-agent1.lan)
- Isilon - 10.10.25.80
Install RKE2 on Server
I recommend making life easy: run su - and do everything as root.
Note: even after heavy experimentation, including writing the firewall rules listed under Firewalld Ports I tried, I still found flannel choked with firewalld on, so ultimately I just ran systemctl disable --now firewalld. See Troubleshooting Flannel Issues. Since it’s a lab I decided the juice wasn’t worth the squeeze; I think the problem is in the internal masquerade rules.
    curl -sfL https://get.rke2.io | sudo sh -
    sudo systemctl disable --now firewalld
    sudo systemctl enable rke2-server.service
    sudo systemctl start rke2-server.service
    cd /var/lib/rancher/rke2/bin
    echo 'export KUBECONFIG=/etc/rancher/rke2/rke2.yaml' >> ~/.bashrc
    echo 'export PATH=$PATH:/var/lib/rancher/rke2/bin' >> ~/.bashrc
    source ~/.bashrc

The rke2 server process listens on port 9345 for new nodes to register. The Kubernetes API is still served on port 6443, as normal.
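As a quick sanity check on those two ports, here is a minimal sketch that parses ss -tln output and reports whether anything is listening on a given port. The function name port_listening is my own (hypothetical, not part of RKE2), and it assumes the usual ss layout where the state is the first column and the local address is the fourth.

```shell
# Sketch only: port_listening is a hypothetical helper, not an RKE2 tool.
# It reads `ss -tln` style output on stdin and succeeds when some
# LISTEN row has a local address ending in ":<port>".
port_listening() {
    awk -v p=":$1" '
        $1 == "LISTEN" {
            n = length($4)
            # Compare the tail of the local address with ":<port>"
            if (substr($4, n - length(p) + 1) == p) found = 1
        }
        END { exit !found }'
}

# Example usage on the server (not run here):
#   ss -tln | port_listening 9345 && echo "node registration port is up"
#   ss -tln | port_listening 6443 && echo "Kubernetes API port is up"
```

Matching on the ":<port>" suffix rather than the bare number keeps 443 from matching 6443.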
Set Up a K8s Node
    sudo curl -sfL https://get.rke2.io | sudo INSTALL_RKE2_TYPE="agent" sh -
    systemctl disable --now firewalld
    sudo systemctl enable rke2-agent.service
    mkdir -p /etc/rancher/rke2/
    echo 'export PATH=$PATH:/var/lib/rancher/rke2/bin' >> ~/.bashrc
    source ~/.bashrc
    vim /etc/rancher/rke2/config.yaml

Note: If you don’t update bashrc and source it, the kubectl commands will not run correctly: the RKE2 bin directory will not be on your PATH, and on the server, without KUBECONFIG set, kubectl falls back to localhost:8080 rather than the RKE2 kubeconfig, which points at the API on port 6443.
Next you have to populate the config file with your server’s token info. You get the token by logging into the server and running:
    [root@k8s-server tmp]# cat /var/lib/rancher/rke2/server/node-token
    K1016508dd12aa27c24f9898fdebd534a7f2dc5b8cd719d1f6cf131edb799247d0e::server:ede9908e983065b06dfabcd9ba45d7ab

Then you put that token in the aforementioned config file:
    server: https://k8s-server.lan:9345
    token: K1016508dd12aa27c24f9898fdebd534a7f2dc5b8cd719d1f6cf131edb799247d0e::server:ede9908e983065b06dfabcd9ba45d7ab

After you save the file, I strongly suggest running shutdown -r now to reboot. On my setup, for some reason, flannel failed to come up. You can check whether this is the case by running ip a s. You should see:
    [grant@k8s-agent1 ~]$ ip a s
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
        link/ether 00:50:56:8a:b2:8c brd ff:ff:ff:ff:ff:ff
        altname enp2s1
        inet 10.10.25.136/24 brd 10.10.25.255 scope global noprefixroute ens33
           valid_lft forever preferred_lft forever
        inet6 fe80::250:56ff:fe8a:b28c/64 scope link noprefixroute
           valid_lft forever preferred_lft forever
    3: calia304d00df8c@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
        link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-fef0629e-72af-acbf-9e2e-27a43f48407e
        inet6 fe80::ecee:eeff:feee:eeee/64 scope link
           valid_lft forever preferred_lft forever
    4: calib76b9de74c6@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
        link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-60486c79-4e9c-14fd-be02-c8243d382b4a
        inet6 fe80::ecee:eeff:feee:eeee/64 scope link
           valid_lft forever preferred_lft forever
    5: cali4fbea555e83@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
        link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-5c1b74d4-961c-6f3c-950b-4c910cf5c8d6
        inet6 fe80::ecee:eeff:feee:eeee/64 scope link
           valid_lft forever preferred_lft forever
    6: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
        link/ether 12:6d:b8:f6:c8:66 brd ff:ff:ff:ff:ff:ff
        inet 10.42.1.0/32 scope global flannel.1
           valid_lft forever preferred_lft forever
        inet6 fe80::106d:b8ff:fef6:c866/64 scope link
           valid_lft forever preferred_lft forever
    9: calid194e3ad4a3@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
        link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-5f9c0ded-120d-b035-fe1d-6eab65c96d11
        inet6 fe80::ecee:eeff:feee:eeee/64 scope link
           valid_lft forever preferred_lft forever
    10: calia758d43a129@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
        link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-97f784af-18ec-46df-ce3f-6607efa71a7f
        inet6 fe80::ecee:eeff:feee:eeee/64 scope link
           valid_lft forever preferred_lft forever
    11: calie1439757d80@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
        link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-a65f26af-89c4-e98c-1248-64b0f4a6cef8
        inet6 fe80::ecee:eeff:feee:eeee/64 scope link
           valid_lft forever preferred_lft forever

Notice that flannel.1 is present along with the calico interfaces. If you don’t see that, try the reboot.
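If you would rather script that check than eyeball the interface list, something like the following sketch could work. The helper name has_interface is hypothetical, and it assumes the one-line-per-interface format printed by ip -o link show.

```shell
# Sketch only: has_interface is a hypothetical helper.
# It reads `ip -o link show` output on stdin and succeeds if an
# interface with exactly the given name is present.
has_interface() {
    awk -v want="$1" -F': ' '
        {
            name = $2
            sub(/@.*/, "", name)   # drop the "@if3" suffix on veth-style names
            if (name == want) found = 1
        }
        END { exit !found }'
}

# Example usage on the agent (not run here):
#   ip -o link show | has_interface flannel.1 || echo "flannel.1 missing, try a reboot"
```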
After the server setup I noticed it took quite some time to come up. You can track progress with journalctl -u rke2-server -f. My logs looked like this:
    Nov 29 14:21:37 k8s-server.lan rke2[1016]: time="2023-11-29T14:21:37-05:00" level=info msg="Pod for kube-apiserver not synced (waiting for termination of old pod sandbox), retrying"
    Nov 29 14:21:38 k8s-server.lan rke2[1016]: time="2023-11-29T14:21:38-05:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error"
    Nov 29 14:21:43 k8s-server.lan rke2[1016]: time="2023-11-29T14:21:43-05:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error"
    Nov 29 14:21:48 k8s-server.lan rke2[1016]: time="2023-11-29T14:21:48-05:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error"
    Nov 29 14:21:53 k8s-server.lan rke2[1016]: time="2023-11-29T14:21:53-05:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error"
    Nov 29 14:21:57 k8s-server.lan rke2[1016]: time="2023-11-29T14:21:57-05:00" level=info msg="Pod for etcd is synced"
    Nov 29 14:21:57 k8s-server.lan rke2[1016]: time="2023-11-29T14:21:57-05:00" level=info msg="Pod for kube-apiserver not synced (waiting for termination of old pod sandbox), retrying"
    Nov 29 14:21:58 k8s-server.lan rke2[1016]: time="2023-11-29T14:21:58-05:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error"
    Nov 29 14:22:03 k8s-server.lan rke2[1016]: time="2023-11-29T14:22:03-05:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error"
    Nov 29 14:22:08 k8s-server.lan rke2[1016]: time="2023-11-29T14:22:08-05:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error"
    Nov 29 14:22:13 k8s-server.lan rke2[1016]: time="2023-11-29T14:22:13-05:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error"
    Nov 29 14:22:17 k8s-server.lan rke2[1016]: time="2023-11-29T14:22:17-05:00" level=info msg="Pod for etcd is synced"
    Nov 29 14:22:17 k8s-server.lan rke2[1016]: time="2023-11-29T14:22:17-05:00" level=info msg="Pod for kube-apiserver is synced"
    Nov 29 14:22:17 k8s-server.lan rke2[1016]: time="2023-11-29T14:22:17-05:00" level=info msg="ETCD server is now running"
    Nov 29 14:22:17 k8s-server.lan rke2[1016]: time="2023-11-29T14:22:17-05:00" level=info msg="rke2 is up and running"
    Nov 29 14:22:17 k8s-server.lan systemd[1]: Started Rancher Kubernetes Engine v2 (server).
    Nov 29 14:22:17 k8s-server.lan rke2[1016]: time="2023-11-29T14:22:17-05:00" level=info msg="Failed to get existing traefik HelmChart" error="helmcharts.helm.cattle.io \"traefik\" not found"
    Nov 29 14:22:17 k8s-server.lan rke2[1016]: time="2023-11-29T14:22:17-05:00" level=info msg="Reconciling ETCDSnapshotFile resources"
    Nov 29 14:22:17 k8s-server.lan rke2[1016]: time="2023-11-29T14:22:17-05:00" level=info msg="Tunnel server egress proxy mode: agent"
    Nov 29 14:22:17 k8s-server.lan rke2[1016]: time="2023-11-29T14:22:17-05:00" level=info msg="Starting managed etcd node metadata controller"
    Nov 29 14:22:17 k8s-server.lan rke2[1016]: time="2023-11-29T14:22:17-05:00" level=info msg="Reconciliation of ETCDSnapshotFile resources complete"
    Nov 29 14:22:17 k8s-server.lan rke2[1016]: time="2023-11-29T14:22:17-05:00" level=info msg="Starting k3s.cattle.io/v1, Kind=Addon controller"

You can see you get constant 500 errors until it eventually fixes itself. When everything has settled down make sure that you see nodes:
    [root@k8s-server bin]# kubectl get nodes
    NAME             STATUS   ROLES                       AGE   VERSION
    k8s-agent1.lan   Ready    <none>                      11m   v1.26.10+rke2r2
    k8s-server.lan   Ready    control-plane,etcd,master   60m   v1.26.10+rke2r2

Install Helm
On the server:
    cd /tmp
    wget https://get.helm.sh/helm-v3.13.2-linux-amd64.tar.gz # Update version as needed
    tar xzf helm-v3.13.2-linux-amd64.tar.gz
    sudo mv linux-amd64/helm /usr/local/bin/helm
    helm version

Install Cert Manager
On the server:
    helm repo add jetstack https://charts.jetstack.io
    helm repo update
    helm install \
      cert-manager jetstack/cert-manager \
      --namespace cert-manager \
      --create-namespace \
      --version v1.13.2 \
      --set installCRDs=true

Install Rancher
On the server:
    helm repo add rancher-stable https://releases.rancher.com/server-charts/stable
    kubectl create namespace cattle-system
    # YOU HAVE TO UPDATE THIS (hostname and bootstrapPassword)
    helm install rancher rancher-stable/rancher --namespace cattle-system --set hostname=k8s-server.lan --set bootstrapPassword=PASSWORD --set ingress.tls.source=rancher
    echo https://k8s-server.lan/dashboard/?setup=$(kubectl get secret --namespace cattle-system bootstrap-secret -o go-template='{{.data.bootstrapPassword|base64decode}}')

Install a Load Balancer for Bare Metal (metallb)
    kubectl create namespace metallb-system
    kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.13.12/config/manifests/metallb-native.yaml

Run vim metallb.yaml and create a file with these contents:
    ---
    apiVersion: metallb.io/v1beta1
    kind: IPAddressPool
    metadata:
      name: nat
      namespace: metallb-system
    spec:
      addresses:
        - 10.10.25.140-10.10.25.149
    ---
    apiVersion: metallb.io/v1beta1
    kind: L2Advertisement
    metadata:
      name: empty
      namespace: metallb-system

After you create the file run kubectl apply -f metallb.yaml
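Once the pool is applied, any LoadBalancer service should be handed an address from 10.10.25.140-10.10.25.149. Here is a minimal sketch for checking an address against that range. The helper names ip_to_int and ip_in_range are my own, and it handles plain dotted-quad IPv4 only.

```shell
# Sketch only: ip_to_int and ip_in_range are hypothetical helpers.

# ip_to_int A.B.C.D -> prints the address as one integer
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the dotted quad into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# ip_in_range IP FIRST LAST -> succeeds when FIRST <= IP <= LAST
ip_in_range() {
    rng_ip=$(ip_to_int "$1")
    rng_lo=$(ip_to_int "$2")
    rng_hi=$(ip_to_int "$3")
    [ "$rng_ip" -ge "$rng_lo" ] && [ "$rng_ip" -le "$rng_hi" ]
}

# Example usage (not run here), once a service has an external IP:
#   ip=$(kubectl get svc rancher -n cattle-system -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
#   ip_in_range "$ip" 10.10.25.140 10.10.25.149 && echo "address came from the metallb pool"
```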
Now we need to make sure Rancher uses metallb:
WARNING: you need to change the hostname to your hostname
WARNING: make sure Rancher is healthy before continuing!
    helm upgrade rancher rancher-stable/rancher --namespace cattle-system --set hostname=k8s-server.lan --set rancher.service.type=LoadBalancer
    kubectl patch svc rancher -n cattle-system -p '{"spec": {"type": "LoadBalancer"}}'

PowerScale
Setting Up the PowerScale
See PowerScale Setup.
Install the CSI Driver
I started by following this tutorial
- Enable NFSv4
- Create a directory
- Create NFS export
- Move these files onto your system
- Do the following:

      sudo dnf install -y git && git clone -b v2.8.0 https://github.com/dell/csi-powerscale.git
      cd csi-powerscale/
      wget -O my-isilon-settings.yaml https://raw.githubusercontent.com/dell/helm-charts/csi-isilon-2.8.0/charts/csi-isilon/values.yaml
      kubectl create namespace isilon
      kubectl create -f empty-secret.yml
      kubectl create secret generic isilon-creds -n isilon --from-file=config=secret.yaml

- On the Isilon you have to run isi_gconfig -t web-config auth_basic=true because I was lazy and I used basic auth and not session based auth.
- Deploy the CSI driver with ./csi-install.sh --namespace isilon --values my-isilon-settings.yaml
  - WARNING: YOU MUST ENABLE ignoreUnresolvableHosts: True in the current version. We are currently investigating the issue and are not exactly sure where the problem is, but the NFS mount from K8s shows up on the Isilon as an IP even with DNS enabled. This will cause the Isilon to reject it and when you attempt to write to the mount it will fail.
- Next deploy the storage class with kubectl apply -f ./isilon.yml
- Check it worked with kubectl get storageclass and kubectl describe storageclass isilon
- Build a test pvc with kubectl apply -f test-pvc.yaml (this should run against the test-pvc file you transferred). Make sure it bound with kubectl get pvc test-pvc
- On all servers run dnf install -y nfs-utils. IF YOU DO NOT DO THIS YOU WILL SEE AN ERROR ABOUT LOCKS. The package is nfs-common on Debian-based systems.
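The "make sure it bound" step can take a little while, so here is a rough sketch of a polling loop instead of re-running kubectl get pvc by hand. The function name wait_for_pvc_bound and its defaults (30 tries, 5 seconds apart) are my own assumptions. It reads the claim's phase with kubectl's jsonpath output and assumes the claim lives in your current namespace.

```shell
# Sketch only: wait_for_pvc_bound is a hypothetical helper.
# Usage: wait_for_pvc_bound NAME [TRIES] [DELAY_SECONDS]
wait_for_pvc_bound() {
    pvc_name="$1"
    pvc_tries="${2:-30}"    # assumed default, tune to taste
    pvc_delay="${3:-5}"     # assumed default, tune to taste
    pvc_i=0
    pvc_phase=""
    while [ "$pvc_i" -lt "$pvc_tries" ]; do
        # .status.phase is Pending until the provisioner binds the claim
        pvc_phase=$(kubectl get pvc "$pvc_name" -o jsonpath='{.status.phase}' 2>/dev/null)
        if [ "$pvc_phase" = "Bound" ]; then
            return 0
        fi
        pvc_i=$((pvc_i + 1))
        sleep "$pvc_delay"
    done
    echo "PVC $pvc_name still in phase '${pvc_phase:-unknown}'" >&2
    return 1
}

# Example usage (not run here):
#   kubectl apply -f test-pvc.yaml && wait_for_pvc_bound test-pvc
```

If it times out, kubectl describe pvc test-pvc usually shows the provisioner's error in the events.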
Troubleshooting
Flannel Issues
My rancher install failed with no output from the installer. You can manually pull the logs by examining the rancher pod with kubectl logs -n cattle-system rancher-64cf6ddd96-2x2ms
This got me:
    2023/11/29 21:04:33 [ERROR] [updateClusterHealth] Failed to update cluster [local]: Internal error occurred: failed calling webhook "rancher.cattle.io.clusters.management.cattle.io": failed to call webhook: Post "https://rancher-webhook.cattle-system.svc:443/v1/webhook/mutation/clusters.management.cattle.io?timeout=10s": context deadline exceeded
    2023/11/29 21:04:33 [ERROR] Failed to connect to peer wss://10.42.0.8/v3/connect [local ID=10.42.1.20]: dial tcp 10.42.0.8:443: connect: no route to host
    2023/11/29 21:04:34 [ERROR] Failed to connect to peer wss://10.42.1.21/v3/connect [local ID=10.42.1.20]: dial tcp 10.42.1.21:443: connect: no route to host
    2023/11/29 21:04:38 [ERROR] Failed to connect to peer wss://10.42.0.8/v3/connect [local ID=10.42.1.20]: dial tcp 10.42.0.8:443: connect: no route to host
    2023/11/29 21:04:39 [ERROR] Failed to connect to peer wss://10.42.1.21/v3/connect [local ID=10.42.1.20]: dial tcp 10.42.1.21:443: connect: no route to host
    2023/11/29 21:04:43 [ERROR] Failed to connect to peer wss://10.42.0.8/v3/connect [local ID=10.42.1.20]: dial tcp 10.42.0.8:443: connect: no route to host
    2023/11/29 21:04:44 [ERROR] Failed to connect to peer wss://10.42.1.21/v3/connect [local ID=10.42.1.20]: dial tcp 10.42.1.21:443: connect: no route to host

This was on repeat. 10.42.1.21 is an internal flannel address so the next step is to figure out who owns it with kubectl get pods --all-namespaces -o wide:
    [root@k8s-server ~]# kubectl get pods --all-namespaces -o wide
    NAMESPACE  NAME  READY  STATUS  RESTARTS  AGE  IP  NODE  NOMINATED NODE  READINESS GATES
    cattle-fleet-system  fleet-controller-56968b86b6-tctjr  1/1  Running  0  44m  10.42.1.24  k8s-agent1.lan  <none>  <none>
    cattle-fleet-system  gitjob-7d68454468-bk7fh  1/1  Running  0  44m  10.42.1.25  k8s-agent1.lan  <none>  <none>
    cattle-provisioning-capi-system  capi-controller-manager-6f87d6bd74-v489n  1/1  Running  0  41m  10.42.1.30  k8s-agent1.lan  <none>  <none>
    cattle-system  helm-operation-64xf7  0/2  Completed  0  42m  10.42.1.29  k8s-agent1.lan  <none>  <none>
    cattle-system  helm-operation-h88vn  1/2  Error  0  41m  10.42.1.34  k8s-agent1.lan  <none>  <none>
    cattle-system  helm-operation-jndl9  1/2  Error  0  41m  10.42.1.33  k8s-agent1.lan  <none>  <none>
    cattle-system  helm-operation-k757h  0/2  Completed  0  44m  10.42.1.23  k8s-agent1.lan  <none>  <none>
    cattle-system  helm-operation-ldnkm  0/2  Completed  0  45m  10.42.1.22  k8s-agent1.lan  <none>  <none>
    cattle-system  helm-operation-sv5ts  0/2  Completed  0  43m  10.42.1.28  k8s-agent1.lan  <none>  <none>
    cattle-system  helm-operation-thct7  0/2  Completed  0  43m  10.42.1.27  k8s-agent1.lan  <none>  <none>
    cattle-system  rancher-64cf6ddd96-2x2ms  1/1  Running  1 (45m ago)  46m  10.42.1.20  k8s-agent1.lan  <none>  <none>
    cattle-system  rancher-64cf6ddd96-drrzr  1/1  Running  0  46m  10.42.0.8  k8s-server.lan  <none>  <none>
    cattle-system  rancher-64cf6ddd96-qq64g  1/1  Running  0  46m  10.42.1.21  k8s-agent1.lan  <none>  <none>
    cattle-system  rancher-webhook-58d68fb97d-b5sn8  1/1  Running  0  41m  10.42.1.32  k8s-agent1.lan  <none>  <none>
    cert-manager  cert-manager-startupapicheck-fvp9t  0/1  Completed  1  52m  10.42.1.19  k8s-agent1.lan  <none>  <none>
    kube-system  cloud-controller-manager-k8s-server.lan  1/1  Running  3 (105m ago)  139m  10.10.25.135  k8s-server.lan  <none>  <none>
    kube-system  etcd-k8s-server.lan  1/1  Running  1  139m  10.10.25.135  k8s-server.lan  <none>  <none>
    kube-system  helm-install-rke2-canal-k8b4d  0/1  Completed  0  139m  10.10.25.135  k8s-server.lan  <none>  <none>
    kube-system  helm-install-rke2-coredns-f59dz  0/1  Completed  0  139m  10.10.25.135  k8s-server.lan  <none>  <none>
    kube-system  helm-install-rke2-ingress-nginx-gpt7q  0/1  Completed  0  139m  10.42.0.2  k8s-server.lan  <none>  <none>
    kube-system  helm-install-rke2-metrics-server-q9jwf  0/1  Completed  0  139m  10.42.0.6  k8s-server.lan  <none>  <none>
    kube-system  helm-install-rke2-snapshot-controller-6pqpg  0/1  Completed  2  139m  10.42.0.4  k8s-server.lan  <none>  <none>
    kube-system  helm-install-rke2-snapshot-controller-crd-k6klp  0/1  Completed  0  139m  10.42.0.10  k8s-server.lan  <none>  <none>
    kube-system  helm-install-rke2-snapshot-validation-webhook-hrv5n  0/1  Completed  0  139m  10.42.0.3  k8s-server.lan  <none>  <none>
    kube-system  kube-apiserver-k8s-server.lan  1/1  Running  1  139m  10.10.25.135  k8s-server.lan  <none>  <none>
    kube-system  kube-controller-manager-k8s-server.lan  1/1  Running  2 (105m ago)  139m  10.10.25.135  k8s-server.lan  <none>  <none>
    kube-system  kube-proxy-k8s-agent1.lan  1/1  Running  0  90m  10.10.25.136  k8s-agent1.lan  <none>  <none>
    kube-system  kube-proxy-k8s-server.lan  1/1  Running  2 (104m ago)  103m  10.10.25.135  k8s-server.lan  <none>  <none>
    kube-system  kube-scheduler-k8s-server.lan  1/1  Running  1 (105m ago)  139m  10.10.25.135  k8s-server.lan  <none>  <none>
    kube-system  rke2-canal-7p5hz  2/2  Running  2 (105m ago)  139m  10.10.25.135  k8s-server.lan  <none>  <none>
    kube-system  rke2-canal-9wg57  2/2  Running  0  90m  10.10.25.136  k8s-agent1.lan  <none>  <none>
    kube-system  rke2-coredns-rke2-coredns-565dfc7d75-n96xs  1/1  Running  0  90m  10.42.1.2  k8s-agent1.lan  <none>  <none>
    kube-system  rke2-coredns-rke2-coredns-565dfc7d75-xv92q  1/1  Running  1 (105m ago)  139m  10.42.0.3  k8s-server.lan  <none>  <none>
    kube-system  rke2-coredns-rke2-coredns-autoscaler-6c48c95bf9-mh279  1/1  Running  1 (105m ago)  139m  10.42.0.2  k8s-server.lan  <none>  <none>
    kube-system  rke2-ingress-nginx-controller-89d4c  1/1  Running  0  89m  10.42.1.3  k8s-agent1.lan  <none>  <none>
    kube-system  rke2-ingress-nginx-controller-zctxb  1/1  Running  1 (105m ago)  139m  10.42.0.5  k8s-server.lan  <none>  <none>
    kube-system  rke2-metrics-server-c9c78bd66-ndcxs  1/1  Running  1 (105m ago)  139m  10.42.0.4  k8s-server.lan  <none>  <none>
    kube-system  rke2-snapshot-controller-6f7bbb497d-xfk9x  1/1  Running  1 (105m ago)  139m  10.42.0.6  k8s-server.lan  <none>  <none>
    kube-system  rke2-snapshot-validation-webhook-65b5675d5c-sfqb2  1/1  Running  1 (105m ago)  139m  10.42.0.7  k8s-server.lan  <none>  <none>

We can see that 10.42.0.8 and 10.42.1.21 are the other two rancher pods, which confirms for us that, as per usual, flannel is not able to complete even its most basic of functions (VXLAN) successfully and it's up to us to fix it.
    cattle-system  rancher-64cf6ddd96-drrzr  1/1  Running  0  46m  10.42.0.8  k8s-server.lan  <none>  <none>
    cattle-system  rancher-64cf6ddd96-qq64g  1/1  Running  0  46m  10.42.1.21  k8s-agent1.lan  <none>  <none>

We can get shells in these containers with kubectl exec -it -n cattle-system rancher-64cf6ddd96-drrzr -- /bin/bash. I fished around in here and found nothing. Ultimately I tcpdumped the flannel network and saw the traffic being rejected as admin prohibited, which told me we were still missing some specific ports it needed:
    [root@k8s-agent1 ~]# tcpdump -i flannel.1
    dropped privs to tcpdump
    tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
    listening on flannel.1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
    16:17:38.793476 IP 10.42.0.0.58822 > 10.42.1.32.tungsten-https: Flags [S], seq 3219974389, win 64860, options [mss 1410,sackOK,TS val 2506628321 ecr 0,nop,wscale 7], length 0
    16:17:38.793509 IP k8s-agent1.lan > 10.42.0.0: ICMP host 10.42.1.32 unreachable - admin prohibited filter, length 68
    16:17:39.143985 IP 10.42.0.0.41502 > 10.42.1.32.tungsten-https: Flags [S], seq 1965118956, win 64860, options [mss 1410,sackOK,TS val 2506628671 ecr 0,nop,wscale 7], length 0
    16:17:39.144008 IP k8s-agent1.lan > 10.42.0.0: ICMP host 10.42.1.32 unreachable - admin prohibited filter, length 68
    16:17:39.847986 IP 10.42.0.0.58822 > 10.42.1.32.tungsten-https: Flags [S], seq 3219974389, win 64860, options [mss 1410,sackOK,TS val 2506629375 ecr 0,nop,wscale 7], length 0
    16:17:39.848008 IP k8s-agent1.lan > 10.42.0.0: ICMP host 10.42.1.32 unreachable - admin prohibited filter, length 68
    16:17:40.679004 IP 10.42.0.0.47680 > 10.42.1.32.tungsten-https: Flags [S], seq 1291962979, win 64860, options [mss 1410,sackOK,TS val 2506630206 ecr 0,nop,wscale 7], length 0
    16:17:40.679028 IP k8s-agent1.lan > 10.42.0.0: ICMP host 10.42.1.32 unreachable - admin prohibited filter, length 68
    16:17:41.894997 IP 10.42.0.0.58822 > 10.42.1.32.tungsten-https: Flags [S], seq 3219974389, win 64860, options [mss 1410,sackOK,TS val 2506631422 ecr 0,nop,wscale 7], length 0
    16:17:41.895024 IP k8s-agent1.lan > 10.42.0.0: ICMP host 10.42.1.32 unreachable - admin prohibited filter, length 68
    16:17:42.727005 IP 10.42.0.0.54136 > 10.42.1.32.tungsten-https: Flags [S], seq 1383176303, win 64860, options [mss 1410,sackOK

Ultimately even after opening the ports I wasn’t able to get it to work so I disabled firewalld altogether.
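The "who owns this IP" step from earlier in this section is easy to lose in a long pod listing, so here is a small sketch that pulls the owner out for you. The helper name pod_for_ip is hypothetical. It scans every column instead of a fixed one, because a RESTARTS value like "1 (45m ago)" contains spaces and shifts the later fields.

```shell
# Sketch only: pod_for_ip is a hypothetical helper.
# It reads `kubectl get pods --all-namespaces -o wide` output on stdin
# and prints namespace/name for each pod whose row contains the given IP.
pod_for_ip() {
    awk -v ip="$1" '
        $1 == "NAMESPACE" { next }      # skip the header row
        {
            for (i = 3; i <= NF; i++) {
                if ($i == ip) { print $1 "/" $2; break }
            }
        }'
}

# Example usage (not run here):
#   kubectl get pods --all-namespaces -o wide | pod_for_ip 10.42.1.21
```

Note that pods on the host network share the node's IP, so a node address can return several rows.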
Firewalld Ports I tried
Firewall rules I tried on server:
    # Kubernetes API Server
    firewall-cmd --permanent --add-port=6443/tcp
    # RKE2 Server
    firewall-cmd --permanent --add-port=9345/tcp
    # etcd server client API
    firewall-cmd --permanent --add-port=2379/tcp
    firewall-cmd --permanent --add-port=2380/tcp
    # HTTPS
    firewall-cmd --permanent --add-port=443/tcp
    # NodePort Services
    firewall-cmd --permanent --add-port=30000-32767/tcp
    # Kubelet API
    firewall-cmd --permanent --add-port=10250/tcp
    # kube-scheduler
    firewall-cmd --permanent --add-port=10251/tcp
    # kube-controller-manager
    firewall-cmd --permanent --add-port=10252/tcp
    # Flannel
    firewall-cmd --permanent --add-port=8285/udp
    firewall-cmd --permanent --add-port=8472/udp
    # Additional ports required for Kubernetes
    firewall-cmd --permanent --add-port=10255/tcp # Read-only Kubelet API
    firewall-cmd --permanent --add-port=30000-32767/tcp # NodePort Services range
    firewall-cmd --permanent --add-port=6783/tcp # Flannel
    firewall-cmd --permanent --add-port=6783/udp # Flannel
    firewall-cmd --permanent --add-port=6784/udp # Flannel
    firewall-cmd --add-masquerade --permanent
    firewall-cmd --reload
    systemctl restart firewalld

Firewall rules I tried on agent:
    # Kubelet API and Flannel ports
    firewall-cmd --permanent --add-port=10250/tcp
    firewall-cmd --permanent --add-port=8285/udp
    firewall-cmd --permanent --add-port=8472/udp
    # NodePort Services
    firewall-cmd --permanent --add-port=30000-32767/tcp
    # Additional ports required for Kubernetes
    firewall-cmd --permanent --add-port=10255/tcp # Read-only Kubelet API
    firewall-cmd --permanent --add-port=6783/tcp # Flannel
    firewall-cmd --permanent --add-port=6783/udp # Flannel
    firewall-cmd --permanent --add-port=6784/udp # Flannel
    firewall-cmd --add-masquerade --permanent
    firewall-cmd --reload
    systemctl restart firewalld

Getting Kubernetes Status
Get Node status

    kubectl get nodes

Get Detailed Node Status

    kubectl describe nodes

Check all pods status

    kubectl get pods --all-namespaces

Check specific pod status

    kubectl get pods -n cert-manager

Get detailed info for a specific pod

    kubectl describe pod -n cert-manager cert-manager-6f799f7ff8-xx68n

Get the logs from a specific pod

    kubectl logs -n kube-system rke2-ingress-nginx-controller-89d4c

Check to see if metallb is giving a service an external IP address

    kubectl get svc -n cattle-system

Check metallb config (this ConfigMap only exists on older, ConfigMap-based metallb installs; v0.13 and later are configured through resources such as the IPAddressPool above)

    kubectl get configmap -n metallb-system config -o yaml

Check the metallb controller output to see if there was an error giving out an IP address

    kubectl logs -l component=controller -n metallb-system

Edit a config file inside of K8s

    kubectl edit configmap config -n metallb-system

Restart metallb after a config change

    kubectl rollout restart daemonset -n metallb-system speaker
    kubectl rollout restart deployment -n metallb-system controller

Restart a target service
Sometimes this helps get an IP unstuck from pending

    kubectl rollout restart deployment rancher -n cattle-system

Testing NFS Mount Quickly
If you get an access denied error you will see something like the below in the pod description
    mounting arguments: -t nfs -o rw 10.10.25.80:/ifs/data/rancher-storage/k8s-5b6cb091d0 /var/lib/kubelet/pods/e1bb5844-f482-4dfa-b4b0-0aa8225a316b/volumes/kubernetes.io~csi/k8s-5b6cb091d0/mount
    output: mount.nfs: access denied by server while mounting 10.10.25.80:/ifs/data/rancher-storage/k8s-5b6cb091d0

Under the hood the container is just running a generic nfs mount command which you can use for testing. For example, the above would become:
    mount -t nfs -o rw 10.10.25.80:/ifs/data/rancher-storage/k8s-5b6cb091d0 <MOUNT_POINT>

Replace <MOUNT_POINT> with any empty directory on the node; it stands in for the long kubelet path. You can use this to quickly test without restarting the containers.
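If you end up doing this more than once, the translation from the pod description to a test command can be scripted. The following is only a sketch: the helper name nfs_test_mount_cmd is hypothetical, it hardcodes -t nfs -o rw rather than parsing the options, and it only prints the command so you can read it before running it as root.

```shell
# Sketch only: nfs_test_mount_cmd is a hypothetical helper.
# Usage: nfs_test_mount_cmd "<mounting arguments line>" <mount point>
# Prints a mount command for the first host:/path token found in the line.
nfs_test_mount_cmd() {
    mnt_line="$1"
    mnt_point="$2"
    # Split the line into tokens and keep the first one shaped like host:/path
    mnt_export=$(printf '%s\n' "$mnt_line" | tr ' ' '\n' | grep -E '^[^:]+:/' | head -n 1)
    [ -n "$mnt_export" ] || return 1
    printf 'mount -t nfs -o rw %s %s\n' "$mnt_export" "$mnt_point"
}

# Example usage (not run here); /mnt/nfs-test is an assumed scratch directory:
#   mkdir -p /mnt/nfs-test
#   nfs_test_mount_cmd "$(kubectl describe pod <pod> | grep 'mounting arguments')" /mnt/nfs-test
```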
Checking ACLs
Get ACLs List
You can check ACL permissions with ls -led <target>:
    grantcluster-1# ls -led /ifs/data/rancher-storage/k8s-7e21fa52bb
    drwxrwxrwx 2 root wheel 25 Dec 6 20:11 /ifs/data/rancher-storage/k8s-7e21fa52bb
     OWNER: user:root
     GROUP: group:wheel
     SYNTHETIC ACL
     0: user:root allow dir_gen_read,dir_gen_write,dir_gen_execute,std_write_dac,delete_child
     1: group:wheel allow dir_gen_read,dir_gen_write,dir_gen_execute,delete_child
     2: everyone allow dir_gen_read,dir_gen_write,dir_gen_execute,delete_child

See a Specific User’s Privileges
    grantcluster-1# isi auth access root /ifs/data/rancher-storage/k8s-7e21fa52bb
    User
      Name: root
      UID: 0
      SID: SID:S-1-22-1-0

    File
      Owner
        Name: root
        ID: UID:0
      Group
        Name: wheel
        ID: GID:0
      Effective Path: /ifs/data/rancher-storage/k8s-7e21fa52bb
      File Permissions: root level permissions were found for this user and this file.
      Mode: drwxrwxrwx
      Relevant Mode: drwx------
      Snapshot Path: No
      Delete Child: The parent directory allows delete_child for this user, the user may delete the file.
      Ownership: User is owner and can view and modify file's security descriptor.