Pearl Dsilva 830f3061bc
SystemVM optimizations (#5831)
* Support for live patching systemVMs and deprecating systemVM.iso. Includes:
- fix systemVM template version
- Include agent.zip, cloud-scripts.tgz to the commons package
- Support for live-patching systemVMs - CPVM, SSVM, Routers
- Fix Unit test
- Remove systemvm.iso dependency

* The following commit:
- refactors logic added to support SystemVM deployment on KVM
- Adds support to copy specific files (required for patching) to the hosts on Xenserver
- Modifies vmops method - createFileInDomr to take cleanup param
- Adds configurable sleep param to CitrixResourceBase::connect() used to verify if telnet to specific port is possible (if sleep is 0, then default to _sleep = 10000ms)
- Adds Command/Answer for patch systemVMs on XenServer/Xcp

* Support to patch SystemVMs on VMware:
- Remove attaching systemvm.iso to systemVMs
- Modify / Refactor VMware start command to copy patch related files to the systemvms
- cleanup

* Commit comprises of:
- remove docker from systemvm template - use containerd as container runtime
- update create-k8s-binaries script to use ctr for all docker operations
- Update userdata sent to the k8s nodes
- update cksnode script, run during patching of the cks/k8s nodes

* Add ssh to k8s nodes details in the Access tab on the UI

* test

* Refactor ca/cert patching logic

* Commit comprises of the following changes:
- Use restart network/VPC API to patch routers
- use livePatch API to support patching of only cpvm/ssvm
- add timeout to the keystore setup/import script

* remove all references of systemvm.iso

* Fix keystore-cert-import invocation + refactor cert timeout in CP/SS VMs

* fix script timeout

* Refactor cert patching for systemVMs + update keystore-cert-import script + patch-sysvms script + remove patchSysvmCommand from networkelementcommand

* remove commented code + change core user to cloud for cks nodes

* Update ownership of ssh directory

* NEED TO DISCUSS - add on the fly template conversion as an ExecStartPre action (systemd)

* Add UI changes + move changes from patch file to runcmd

* test: validate performance for template modification during seeding

* create vms folder in cloudstack-commons directory - debian rules

* remove logic for on the fly template convert + update k8s test

* fix syntax issue - causing issue with shared network tests

* Code cleanup

* refactor patching logic - certs

* move logic of fixing rootdiskcontroller from upgrade to kubernetes service

* add livepatch option to restart network & vpc

* smooth upgrade of cks clusters

* add cgroup config for containerd

* add systemd config for kubelet

* add additional info during image registry config

* address comments

* add temp links of download.cloudstack.org

* address part of the comments

* address comments

* update containerd config - as version has upgraded to 1.5 from 1.4.12 in 4.17.0

* address comments - simplify

* fix vue3 related icon changes

* allow network commands when router template version is lower but is patched

* add internal LB to the list of routers to be patched on network restart with live patch

* add unit tests for API param validations and new helper utilities - file scp & checksum validations

* perform patching only for non-user i.e., system VMs

* add test to validate params

* remove unused import

* add column to domain_router to display software version and support networkrestart with livePatch from router view

* Requires upgrade column to consider package (cloud-scripts) checksum to identify if true/false

* use router software version instead of checksum

* show N/A if no software version reported i.e., in upgraded envs

* fix deb failure

* update pom to official links of systemVM template
2022-04-21 13:40:19 -03:00

#cloud-config
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
---
users:
  - name: cloud
    sudo: ALL=(ALL) NOPASSWD:ALL
    shell: /bin/bash
    ssh_authorized_keys:
      {{ k8s.ssh.pub.key }}
write_files:
  - path: /opt/bin/setup-kube-system
    permissions: '0700'
    owner: root:root
    content: |
      #!/bin/bash -e
      if [[ -f "/home/cloud/success" ]]; then
        echo "Already provisioned!"
        exit 0
      fi
      ISO_MOUNT_DIR=/mnt/k8sdisk
      BINARIES_DIR=${ISO_MOUNT_DIR}/
      K8S_CONFIG_SCRIPTS_COPY_DIR=/tmp/k8sconfigscripts/
      ATTEMPT_ONLINE_INSTALL=false
      setup_complete=false
      OFFLINE_INSTALL_ATTEMPT_SLEEP=15
      MAX_OFFLINE_INSTALL_ATTEMPTS=100
      offline_attempts=1
      MAX_SETUP_CRUCIAL_CMD_ATTEMPTS=3
      EJECT_ISO_FROM_OS={{ k8s.eject.iso }}
      crucial_cmd_attempts=1
      iso_drive_path=""
      while true; do
        if (( "$offline_attempts" > "$MAX_OFFLINE_INSTALL_ATTEMPTS" )); then
          echo "Warning: Offline install timed out!"
          break
        fi
        set +e
        output=$(blkid -o device -t TYPE=iso9660)
        set -e
        if [ "$output" != "" ]; then
          while read -r line; do
            if [ ! -d "${ISO_MOUNT_DIR}" ]; then
              mkdir "${ISO_MOUNT_DIR}"
            fi
            retval=0
            set +e
            mount -o ro "${line}" "${ISO_MOUNT_DIR}"
            retval=$?
            set -e
            if [ $retval -eq 0 ]; then
              if [ -d "$BINARIES_DIR" ]; then
                iso_drive_path="${line}"
                break
              else
                umount "${line}" && rmdir "${ISO_MOUNT_DIR}"
              fi
            fi
          done <<< "$output"
        fi
        if [ -d "$BINARIES_DIR" ]; then
          break
        fi
        echo "Waiting for Binaries directory $BINARIES_DIR to be available, sleeping for $OFFLINE_INSTALL_ATTEMPT_SLEEP seconds, attempt: $offline_attempts"
        sleep $OFFLINE_INSTALL_ATTEMPT_SLEEP
        offline_attempts=$((offline_attempts + 1))
      done
      if [[ "$PATH" != *:/opt/bin && "$PATH" != *:/opt/bin:* ]]; then
        export PATH=$PATH:/opt/bin
      fi
      if [ -d "$BINARIES_DIR" ]; then
        ### Binaries available offline ###
        echo "Installing binaries from ${BINARIES_DIR}"
        mkdir -p /opt/cni/bin
        tar -f "${BINARIES_DIR}/cni/cni-plugins-"*64.tgz -C /opt/cni/bin -xz
        mkdir -p /opt/bin
        tar -f "${BINARIES_DIR}/cri-tools/crictl-linux-"*64.tar.gz -C /opt/bin -xz
        mkdir -p /opt/bin
        cd /opt/bin
        cp -a ${BINARIES_DIR}/k8s/{kubeadm,kubelet,kubectl} .
        chmod +x {kubeadm,kubelet,kubectl}
        sed "s:/usr/bin:/opt/bin:g" ${BINARIES_DIR}/kubelet.service > /etc/systemd/system/kubelet.service
        mkdir -p /etc/systemd/system/kubelet.service.d
        sed "s:/usr/bin:/opt/bin:g" ${BINARIES_DIR}/10-kubeadm.conf > /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
        echo "KUBELET_EXTRA_ARGS=--cgroup-driver=systemd" > /etc/default/kubelet
        output=$(ls ${BINARIES_DIR}/docker/)
        if [ "$output" != "" ]; then
          while read -r line; do
            crucial_cmd_attempts=1
            while true; do
              if (( "$crucial_cmd_attempts" > "$MAX_SETUP_CRUCIAL_CMD_ATTEMPTS" )); then
                echo "Loading docker image ${BINARIES_DIR}/docker/$line failed!"
                break;
              fi
              retval=0
              set +e
              ctr image import "${BINARIES_DIR}/docker/$line"
              retval=$?
              set -e
              if [ $retval -eq 0 ]; then
                break;
              fi
              crucial_cmd_attempts=$((crucial_cmd_attempts + 1))
            done
          done <<< "$output"
          setup_complete=true
        fi
        if [ -e "${BINARIES_DIR}/autoscaler.yaml" ]; then
          mkdir -p /opt/autoscaler
          cp "${BINARIES_DIR}/autoscaler.yaml" /opt/autoscaler/autoscaler_tmpl.yaml
        fi
        if [ -e "${BINARIES_DIR}/provider.yaml" ]; then
          mkdir -p /opt/provider
          cp "${BINARIES_DIR}/provider.yaml" /opt/provider/provider.yaml
        fi
        umount "${ISO_MOUNT_DIR}" && rmdir "${ISO_MOUNT_DIR}"
        if [ "$EJECT_ISO_FROM_OS" = true ] && [ "$iso_drive_path" != "" ]; then
          eject "${iso_drive_path}"
        fi
      fi
      if [ "$setup_complete" = false ] && [ "$ATTEMPT_ONLINE_INSTALL" = true ]; then
        ### Binaries not available offline ###
        RELEASE="v1.16.3"
        CNI_VERSION="v0.7.5"
        CRICTL_VERSION="v1.16.0"
        echo "Warning: ${BINARIES_DIR} not found. Will get binaries and docker images from Internet."
        mkdir -p /opt/cni/bin
        curl -L "https://github.com/containernetworking/plugins/releases/download/${CNI_VERSION}/cni-plugins-amd64-${CNI_VERSION}.tgz" | tar -C /opt/cni/bin -xz
        mkdir -p /opt/bin
        curl -L "https://github.com/kubernetes-incubator/cri-tools/releases/download/${CRICTL_VERSION}/crictl-${CRICTL_VERSION}-linux-amd64.tar.gz" | tar -C /opt/bin -xz
        mkdir -p /opt/bin
        cd /opt/bin
        curl -L --remote-name-all https://storage.googleapis.com/kubernetes-release/release/${RELEASE}/bin/linux/amd64/{kubeadm,kubelet,kubectl}
        chmod +x {kubeadm,kubelet,kubectl}
        curl -sSL "https://raw.githubusercontent.com/kubernetes/kubernetes/${RELEASE}/build/debs/kubelet.service" | sed "s:/usr/bin:/opt/bin:g" > /etc/systemd/system/kubelet.service
        mkdir -p /etc/systemd/system/kubelet.service.d
        curl -sSL "https://raw.githubusercontent.com/kubernetes/kubernetes/${RELEASE}/build/debs/10-kubeadm.conf" | sed "s:/usr/bin:/opt/bin:g" > /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
      fi
      systemctl enable kubelet && systemctl start kubelet
      modprobe overlay && modprobe br_netfilter && sysctl net.bridge.bridge-nf-call-iptables=1
      if [ -d "$BINARIES_DIR" ] && [ "$ATTEMPT_ONLINE_INSTALL" = true ]; then
        crucial_cmd_attempts=1
        while true; do
          if (( "$crucial_cmd_attempts" > "$MAX_SETUP_CRUCIAL_CMD_ATTEMPTS" )); then
            echo "Warning: kubeadm pull images failed after multiple tries!"
            break;
          fi
          retval=0
          set +e
          kubeadm config images pull --cri-socket /run/containerd/containerd.sock
          retval=$?
          set -e
          if [ $retval -eq 0 ]; then
            break;
          fi
          crucial_cmd_attempts=$((crucial_cmd_attempts + 1))
        done
      fi
  - path: /opt/bin/deploy-kube-system
    permissions: '0700'
    owner: root:root
    content: |
      #!/bin/bash -e
      if [[ -f "/home/cloud/success" ]]; then
        echo "Already provisioned!"
        exit 0
      fi
      if [[ $(systemctl is-active setup-kube-system) != "inactive" ]]; then
        echo "setup-kube-system is running!"
        exit 1
      fi
      modprobe ip_vs
      modprobe ip_vs_wrr
      modprobe ip_vs_sh
      modprobe nf_conntrack
      if [[ "$PATH" != *:/opt/bin && "$PATH" != *:/opt/bin:* ]]; then
        export PATH=$PATH:/opt/bin
      fi
      kubeadm join {{ k8s_control_node.join_ip }}:6443 --token {{ k8s_control_node.cluster.token }} --control-plane --certificate-key {{ k8s_control_node.cluster.ha.certificate.key }} --discovery-token-unsafe-skip-ca-verification
      sudo touch /home/cloud/success
      echo "true" > /home/cloud/success
  - path: /opt/bin/setup-containerd
    permissions: '0755'
    owner: root:root
    content: |
      #!/bin/bash -e
      export registryConfig="\\ [plugins.\"io.containerd.grpc.v1.cri\".registry.mirrors.\"{{registry.url.endpoint}}\"]\n \\ endpoint = [\"{{registry.url}}\"]"
      export registryCredentials="\\ [plugins.\"io.containerd.grpc.v1.cri\".registry.configs.\"{{registry.url.endpoint}}\".auth]\n\tusername = \"{{registry.username}}\" \n\tpassword = \"{{registry.password}}\" \n\tidentitytoken = \"{{registry.token}}\""
      echo "creating config file for containerd"
      containerd config default > /etc/containerd/config.toml
      sed -i '/\[plugins."io.containerd.grpc.v1.cri".registry\]/a '"${registryCredentials}"'' /etc/containerd/config.toml
      sed -i '/\[plugins."io.containerd.grpc.v1.cri".registry.mirrors\]/a '"${registryConfig}"'' /etc/containerd/config.toml
      echo "Restarting containerd service"
      systemctl restart containerd
  - path: /etc/systemd/system/setup-kube-system.service
    permissions: '0755'
    owner: root:root
    content: |
      [Unit]
      Requires=containerd.service
      After=containerd.service
      [Service]
      Type=simple
      StartLimitInterval=0
      ExecStart=/opt/bin/setup-kube-system
  - path: /etc/systemd/system/deploy-kube-system.service
    permissions: '0755'
    owner: root:root
    content: |
      [Unit]
      After=setup-kube-system.service
      [Service]
      Type=simple
      StartLimitInterval=0
      Restart=on-failure
      ExecStartPre=/usr/bin/curl -k https://{{ k8s_control_node.join_ip }}:6443/version
      ExecStart=/opt/bin/deploy-kube-system
runcmd:
  - chown -R cloud:cloud /home/cloud/.ssh
  - containerd config default > /etc/containerd/config.toml
  - sed -i 's/SystemdCgroup = false/SystemdCgroup = true/g' /etc/containerd/config.toml
  - systemctl daemon-reload
  - systemctl restart containerd
  - until [ -f /etc/systemd/system/deploy-kube-system.service ]; do sleep 5; done
  - until [ -f /etc/systemd/system/setup-kube-system.service ]; do sleep 5; done
  - [ systemctl, start, setup-kube-system ]
  - [ systemctl, start, deploy-kube-system ]
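
The setup-kube-system script above writes out the same bounded-retry loop twice, once around `ctr image import` and once around `kubeadm config images pull`. As a rough sketch only, that pattern could be factored into a single helper. The function name `retry_cmd` and its argument order are hypothetical and are not part of the CloudStack template.

```shell
#!/bin/bash
# Hypothetical helper (not in the original file): run a command up to
# max_attempts times and stop at the first success.
retry_cmd() {
  local max_attempts=$1
  shift
  local attempt=1
  while (( attempt <= max_attempts )); do
    # Running the command inside "if" keeps a failure from tripping `set -e`.
    if "$@"; then
      return 0
    fi
    attempt=$((attempt + 1))
  done
  echo "Warning: '$*' failed after ${max_attempts} attempts" >&2
  return 1
}
```

Under these assumptions, a call such as `retry_cmd "$MAX_SETUP_CRUCIAL_CMD_ATTEMPTS" ctr image import "${BINARIES_DIR}/docker/$line" || true` would mirror the script's current behavior, which logs the failure and carries on rather than aborting.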