k8s-clusters
This repository contains the configuration, scripts, and other goodies for building and managing my kubernetes clusters (right now, that's just my home cluster). I share the source with you so you can make exactly the same mistakes as I do.
Setup
Set up the pre-commit hooks before you change anything!
pip install pre-commit
pre-commit install --install-hooks
pre-commit autoupdate
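After installing the hooks, it can be worth running them once over the whole repository, because at commit time they only check the files you are committing. Below is a small bash sketch. setup_precommit is a hypothetical helper name, and `pre-commit run --all-files` is pre-commit's standard way to check every file:

```shell
# Hypothetical helper (bash): install the hooks, refresh their pinned
# versions, then run every hook once against every file in the repo.
setup_precommit() {
  pre-commit install --install-hooks || return 1
  pre-commit autoupdate || return 1
  pre-commit run --all-files
}
```

Run setup_precommit from the repository root.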
Provision Machines
Before we interact with the cluster, we have some manual work to do.
Manual Preparation
- Currently, my nodes are Arch Linux machines on bare metal
- Nodes must be ready to be controlled via Ansible
  - Have python3 installed
  - Need to be ssh-able from a controller (my workstation)
      curl -L files.lyte.dev/key.pub >> ~/.ssh/authorized_keys
- Nodes must support Longhorn: https://longhorn.io/docs/1.2.3/deploy/install/#installation-requirements
  - Nodes must be running on a host filesystem that supports file extents
  - Provisioning takes care of the rest
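You can spot-check the node requirements above on each machine before handing it to Ansible. Below is a minimal bash sketch. The function names are hypothetical, and /var/lib/longhorn is Longhorn's default data path. The ext4/XFS list is an assumption taken from the Longhorn 1.2 installation requirements and is worth re-checking against the linked page:

```shell
# Run on a node (bash). Hypothetical helpers; the ext4/XFS list is an
# assumption taken from the Longhorn 1.2 installation requirements.

# Succeeds if the given filesystem type supports file extents.
fs_supports_extents() {
  case "$1" in
    ext4 | xfs) return 0 ;;
    *) return 1 ;;
  esac
}

# Checks python3 (for Ansible) and the filesystem backing the Longhorn
# data directory ($1, defaulting to Longhorn's /var/lib/longhorn).
check_node_prereqs() {
  local data_dir="${1:-/var/lib/longhorn}"
  local status=0 probe fstype

  if ! command -v python3 >/dev/null 2>&1; then
    echo "missing: python3 (needed by Ansible)" >&2
    status=1
  fi

  # The data directory may not exist yet, so walk up to the nearest
  # existing parent and ask findmnt (util-linux) which filesystem holds it.
  probe="$data_dir"
  while [ ! -e "$probe" ]; do probe=$(dirname "$probe"); done
  fstype=$(findmnt -n -o FSTYPE --target "$probe")
  if ! fs_supports_extents "$fstype"; then
    echo "filesystem without file extents (Longhorn): ${fstype:-unknown}" >&2
    status=1
  fi

  return "$status"
}
```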
Automated Provisioning
- Set up Ansible on the controller (from ./ansible)
    pushd ansible && ansible-galaxy install -r requirements.yml --force; popd
- Verify Ansible can reach hosts (from ./ansible)
    pushd ansible && ansible all -i inventory/hosts.yml --list-hosts; popd
    pushd ansible && ansible all -i inventory/hosts.yml -m ping; popd
- Use Ansible to build the cluster as configured on all nodes (from ./ansible)
    pushd ansible && ansible-playbook -i inventory/hosts.yml ./build-k3s-cluster.yml; popd
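The three provisioning steps can also be chained so that nothing gets built if the host check fails. Below is a bash sketch using a hypothetical provision_cluster helper, run from the repository root. The function body is a subshell, so the cd into ./ansible does not leak into your shell:

```shell
# Hypothetical wrapper (bash) around the provisioning steps above.
# The ( ) body runs in a subshell; each step only runs if the previous
# one succeeded, so a failed ping stops before the playbook.
provision_cluster() (
  cd ansible &&
    ansible-galaxy install -r requirements.yml --force &&
    ansible all -i inventory/hosts.yml --list-hosts &&
    ansible all -i inventory/hosts.yml -m ping &&
    ansible-playbook -i inventory/hosts.yml ./build-k3s-cluster.yml
)
```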
And the cluster is up! If you want to interact with it from your controller, you can do this:
- Copy the cluster information from the ./k3s-cluster-config.kubeconfig.yaml file into your existing ~/.kube/config (or just copy it there if it doesn't exist)
  - You will need to edit the host from localhost/127.0.0.1 to the correct host

You can fetch the file from a server node ($REMOTE_HOST) with Ansible and fix the host with sed:
ansible -i ansible/inventory/hosts.yml $REMOTE_HOST -m fetch \
-a "src=/etc/rancher/k3s/k3s.yaml dest=./k3s-cluster-config.kubeconfig.yaml flat=yes"
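# TODO: this did not work for me. Possibly because ~ is not expanded inside
# double quotes, so kubectl saw a literal "~/.kube/config"; $HOME is expanded.
# env KUBECONFIG="$HOME/.kube/config:./k3s-cluster-config.kubeconfig.yaml" \
#   kubectl config view --flatten | sed "s/127.0.0.1/$REMOTE_HOST/" > ~/.kube/new-config
# 10.0.0.87 is my server node; substitute your own address
sed -i 's/127\.0\.0\.1/10.0.0.87/' ~/.kube/config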
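The sed line above hard-codes my server's address. Below is a bash sketch of a reusable version. It assumes GNU sed (as on Arch) and a server line in the fetched file that looks like the k3s default `server: https://127.0.0.1:6443`; a `localhost` server line is handled too. point_kubeconfig_at is a hypothetical helper name. It edits only the fetched file, so other clusters already in ~/.kube/config are left alone:

```shell
# Hypothetical helper (bash, GNU sed): point a fetched k3s kubeconfig at a
# reachable server. Only "server:" lines are rewritten, not every
# occurrence of 127.0.0.1 in the file.
point_kubeconfig_at() {
  local file="$1" host="$2"
  if [ -z "$file" ] || [ -z "$host" ]; then
    echo "usage: point_kubeconfig_at FILE HOST" >&2
    return 2
  fi
  sed -i \
    -e "s#server: https://127\.0\.0\.1:#server: https://${host}:#" \
    -e "s#server: https://localhost:#server: https://${host}:#" \
    "$file"
}
```

For example, run `point_kubeconfig_at ./k3s-cluster-config.kubeconfig.yaml 10.0.0.87` before merging the file into ~/.kube/config.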
Automated Teardown
cd ansible
ansible-playbook -i inventory/hosts.yml ./nuke-k3s-cluster.yml
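The teardown playbook is destructive, so a confirmation guard can help. Below is a bash sketch using a hypothetical nuke_cluster helper, run from the repository root. It only proceeds on an exact "yes" read from stdin:

```shell
# Hypothetical guard (bash) around the teardown playbook. The ( ) body is a
# subshell, so "exit" only leaves the function and the cd does not leak.
nuke_cluster() (
  printf 'Tear down the k3s cluster on every inventory host? Type "yes": ' >&2
  read -r answer || exit 1
  if [ "$answer" != "yes" ]; then
    echo "aborted" >&2
    exit 1
  fi
  cd ansible && ansible-playbook -i inventory/hosts.yml ./nuke-k3s-cluster.yml
)
```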
Setting up Flux
- Install the flux CLI on a machine that can kubectl into the shiny, new cluster
    paru -S flux-bin
  - or
      curl -s https://fluxcd.io/install.sh | sudo bash
  - https://fluxcd.io/docs/installation/
- Run the pre-flight check (you must have ~/.kube/config set up!)
    flux check --pre
- Create the flux-system namespace
    kubectl create namespace flux-system --dry-run=client -o yaml | kubectl apply -f -
- Add the sops-age encryption key to the namespace
    pass k8s-clusters | grep age-secret-key | awk '{printf "%s", $2}' | \
      kubectl --namespace flux-system create secret generic sops-age \
      --from-file=age.agekey=/dev/stdin
- Install Flux (note the fish-isms here; in bash, write $(git remote get-url origin) instead of (git remote get-url origin))
    flux bootstrap git --url=(git remote get-url origin) --branch=master \
      --path=./cluster/home --private-key-file=$HOME/.ssh/flux-k8s-clusters
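Below is a bash translation of the Flux steps for anyone not using fish, where command substitution is written (cmd) instead of $(cmd). It is a sketch under these assumptions:
- the pass entry k8s-clusters contains a line like `age-secret-key: AGE-SECRET-KEY-1...` (the grep/awk above imply the key is the second field)
- the deploy key is at ~/.ssh/flux-k8s-clusters
- extract_age_key and flux_setup are hypothetical helper names

It stops early if no key is found instead of creating an empty secret:

```shell
# bash version of the Flux setup steps above (hypothetical helpers).

# Print the age secret key, without a trailing newline, from text on stdin.
# Assumes the key is the second field of the line containing "age-secret-key".
extract_age_key() {
  grep age-secret-key | awk '{printf "%s", $2}'
}

flux_setup() {
  local key

  flux check --pre || return 1

  kubectl create namespace flux-system --dry-run=client -o yaml |
    kubectl apply -f - || return 1

  # Read the key first so a missing pass entry does not yield an empty secret.
  key=$(pass k8s-clusters | extract_age_key)
  if [ -z "$key" ]; then
    echo "no age secret key found in pass entry k8s-clusters" >&2
    return 1
  fi
  printf '%s' "$key" |
    kubectl --namespace flux-system create secret generic sops-age \
      --from-file=age.agekey=/dev/stdin || return 1

  # fish's (git remote get-url origin) becomes $(...) in bash.
  flux bootstrap git --url="$(git remote get-url origin)" --branch=master \
    --path=./cluster/home --private-key-file="$HOME/.ssh/flux-k8s-clusters"
}
```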
Troubleshooting
If you screw something up here, here are some things you can do:
- flux uninstall will nuke Flux from the cluster so you can retry from the beginning of this section
- If you get something like "sync path configuration ... would overwrite path ... of existing Kustomization", you can edit the path: ... field in the flux-system/gotk-sync.yaml file in whatever you're passing as --path, commit, and try the bootstrap again
- You can pretty easily nuke the entire cluster and start from scratch as a last resort?
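The gotk-sync.yaml edit above can be scripted. Below is a bash sketch with a hypothetical set_sync_path helper. It assumes GNU or BSD sed with -E, and it assumes the Kustomization's path: is the only path: key in that file (the GitRepository object there has no path field). Presumably the value to set is the same one you pass to --path:

```shell
# Hypothetical helper (bash): rewrite the path: field in
# <cluster_dir>/flux-system/gotk-sync.yaml, keeping its indentation.
set_sync_path() {
  local cluster_dir="$1" new_path="$2"
  local file="$cluster_dir/flux-system/gotk-sync.yaml"
  if [ ! -f "$file" ]; then
    echo "not found: $file" >&2
    return 1
  fi
  sed -i -E "s#^([[:space:]]*path:[[:space:]]*).*#\1${new_path}#" "$file"
}
```

For example, run `set_sync_path ./cluster/home ./cluster/home`, then commit and rerun the bootstrap.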
To Do & Status
- How am I going to handle highly-available storage?
- cert-manager with Cloudflare?
- external-dns with Cloudflare?
- I still need to figure out my overall cluster structure
  - Since my goal is full redundancy, I believe I need at least 2 control plane nodes. Because the control plane count should be odd (etcd quorum), that means 3 control plane nodes, plus at least 2 worker nodes, for 5 nodes total. I should be able to use some of my rpi4s in the cluster, probably as control plane nodes.
  - Where/how is storage attached?
- I need to figure out a migration plan from my current Netlify + custom DDNS + Docker Compose setup
  - I should be able to do something like the following:
    - Set up all applications on the cluster using some dummy domain
    - Make sure everything works with the dummy domain
    - Change the dummy domain to the real domain
    - Change the domain's nameservers to Cloudflare
    - Should be all set!
- I want to look into Talos/Sidero + PXE boot, since that could remove a lot of the Ansible stuff?
- k3s has a decent amount of magic AFAICT, so I'd like to learn more about it and all its components so I better understand what my system is actually doing
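The odd-number reasoning in the cluster-structure note comes from majority quorum. This assumes the control plane uses k3s's embedded etcd; k3s can also run HA on an external datastore, where the arithmetic differs. Writes need floor(n/2) + 1 of the n members, so n members survive n - (floor(n/2) + 1) failures. Below is a small bash sketch with hypothetical function names:

```shell
# Majority quorum for an etcd-backed control plane of n members.
etcd_quorum() { echo $(( $1 / 2 + 1 )); }
etcd_fault_tolerance() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 1 2 3 4 5; do
  printf '%d control plane nodes: quorum %d, tolerates %d failure(s)\n' \
    "$n" "$(etcd_quorum "$n")" "$(etcd_fault_tolerance "$n")"
done
```

Two members tolerate 0 failures, the same as one. Three tolerate 1, and four still only tolerate 1, so 3 is the smallest redundant control plane.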