✨ Thank you to letsboot for kindly sponsoring this post! ✨

letsboot provides…

hands-on training for software and system developers by experienced experts with proven training materials in Basel, Zürich, New Zealand, remote and on-site

Get relevant training at letsboot.nz and letsboot.ch.


Hi there! 👋 Thank you for stopping by.

Recently, I was tasked with building out a staging or example cluster on Hetzner for letsboot, across Hetzner's cloud and robot offerings.

This is a less automated and managed cluster, but it will still be a solid and suitable solution, with a few maintenance steps to keep in mind.

🖼 Background

About this
I had deployed a Kubernetes cluster with Talos Linux on Hetzner some time ago for my islive.xyz project and knew it was a good starting point for this project. That config was inspired in part by the terraform-hcloud-talos project, which I had trouble using back when I needed it, so I built something simpler for my direct needs rather than something reusable as a module.
Previously on Talos in various places
SideroLabs has brought Talos Linux a long way 🚀 since my last post on deploying Talos on Equinix Metal back in 2020/2021.
Recently
Last year in 2024, I also deployed a Talos cluster on Oracle Cloud while at ii.nz and learned a few things.
Hetzner
A popular European public cloud provider offering both virtual and bare metal infrastructure.

🪃 Project overview

The idea: some controlplane nodes on Hetzner cloud and worker nodes on Hetzner robot.

Needs:

Tools used:

Install all the dependencies with brew:

brew install opentofu kubectl talosctl fluxcd/tap/flux virtctl hashicorp/tap/packer
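
To confirm everything is on the PATH before going further, a quick sanity check (versions will differ):

# Sanity-check the tooling; version numbers will vary over time
tofu version
packer version
kubectl version --client
talosctl version --client
flux --version
virtctl version || true # may complain without a cluster connection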

โซ Uploading a Talos Linux image snapshot

Using Packer, a VM is brought up on Hetzner and a Hetzner cloud-specific image is downloaded from SideroLabs' Talos image factory and written to /dev/sda via the rescue system. The snapshot will then be used to create further Talos machines on Hetzner cloud.

Here, the commands to be run in the build steps on the VM are prepared: which image to use, downloading it, and writing it to disk:

locals {
  image_arm = var.image_url_arm != null ? var.image_url_arm : "https://factory.talos.dev/image/${var.talos_factory_schematic}/${var.talos_version}/hcloud-arm64.raw.xz"
  image_x86 = var.image_url_x86 != null ? var.image_url_x86 : "https://factory.talos.dev/image/${var.talos_factory_schematic}/${var.talos_version}/hcloud-amd64.raw.xz"

  # Add local variables for inline shell commands
  download_image = "wget --timeout=5 --waitretry=5 --tries=5 --retry-connrefused --inet4-only -O /tmp/talos.raw.xz"

  write_image = <<-EOT
    set -ex
    echo 'Talos image loaded, writing to disk... '
    xz -d -c /tmp/talos.raw.xz | dd of=/dev/sda && sync
    echo 'done.'
  EOT

  ...

}

Here, bringing an x86 VM up in our desired location:

# Source for the Talos x86 image
source "hcloud" "talos-x86" {
  rescue       = "linux64"
  image        = "debian-11"
  location     = "${var.server_location}"
  server_type  = "${var.server_type_x86_64}"
  ssh_username = "root"

  snapshot_name   = "Talos Linux ${var.talos_version} x86 by hcloud-talos"
  snapshot_labels = {
    type    = "infra",
    os      = "talos",
    version = "${var.talos_version}",
    arch    = "x86",
    creator = "hcloud-talos"
  }
}

And finally, with the machine up, the prepared commands are run to write the image to the VM's disk before snapshotting it as a usable image:

# Build the Talos x86 snapshot
build {
  sources = ["source.hcloud.talos-x86"]

  # Download the Talos x86 image
  provisioner "shell" {
    inline = ["${local.download_image} ${local.image_x86}"]
  }

  # Write the Talos x86 image to the disk
  provisioner "shell" {
    inline = [local.write_image]
  }

  # Clean-up
  provisioner "shell" {
    inline = [local.clean_up]
  }
}

NOTE: each time that a new release of Talos is put out, Packer should be used to build a new snapshot.
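
For reference, a rebuild is just a re-run of the Packer build with the new version; a minimal sketch, assuming the template lives in its own directory and takes the variable names shown above (the directory and version value here are only examples):

# Rebuild the Talos hcloud snapshots after a new Talos release
export HCLOUD_TOKEN="<hcloud API token>" # read by the hcloud Packer builder
cd packer/                               # hypothetical directory holding the template
packer init .                            # fetch the hcloud plugin
packer build -var "talos_version=v1.9.0" .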

🎶 Creating infrastructure

The Terraform/Tofu is laid out as follows:

data.tf
discovery and generation
locals.tf
hardcoded values (not strictly typed) and data merging
outputs.tf
values exposed via tofu output
resources.tf
cloud resources and Talos Linux management
vars.tf
only used for the variable hcloud_token
version.tf
importing the providers

The code is implemented in a way that is only meant to be used as-is, as I've found in the past that it gets cumbersome when trying to make it usable as a module. Specifically, there are no variable declarations for anything that creates or reads resources. Because of this, if a separate cluster is to be brought up, the config must be copied and modified, limiting the blast radius and eliminating edge cases.

🥽 Diving in

First things first, a datacenter was chosen, this time being Helsinki 🇫🇮:

data "hcloud_datacenter" "this" {
  name = local.data_center
}

data "hcloud_location" "this" {
  id = data.hcloud_datacenter.this.location.id
}

and also the Talos Linux hcloud image is discovered:

data "hcloud_image" "x86" {
  with_selector = "os=talos,arch=x86"
  most_recent   = true
}
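
If that lookup ever comes back empty, the snapshot labels are the first thing to check; assuming the hcloud CLI is installed and HCLOUD_TOKEN is set, something like this shows what Packer actually produced:

# List Talos snapshots and their labels, matching the selector above
hcloud image list --type snapshot --selector os=talos,arch=x86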

The installer container image is built from a schematic and pulled from the factory via an HTTP POST request:

data "http" "talos_schematic" {
  url    = "https://factory.talos.dev/schematics"
  method = "POST"
  request_headers = {
    Accept       = "application/json"
    Content-type = "text/x-yaml"
  }
  request_body = yamlencode(local.talos_schematic_customization)
}

This returns a schematic ID, which is a one-way hash of the schematic customisation. The customisation looks like this:

locals {
  ...
  talos_schematic_customization = {
    customization = {
      systemExtensions = {
        officialExtensions = [
          "siderolabs/iscsi-tools",
          "siderolabs/mdadm",
          "siderolabs/util-linux-tools",
          "siderolabs/binfmt-misc",
        ]
      }
    }
  }
  ...
}

enabling a few extensions particularly useful for CSI.
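
The same schematic ID can be reproduced outside of Tofu, which is handy for sanity-checking what the data.http request will return; a sketch with curl, using the customisation above:

# POST the schematic customisation to the image factory; the response JSON
# contains the schematic ID (a one-way hash of the customisation)
cat > /tmp/schematic.yaml <<'EOF'
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/iscsi-tools
      - siderolabs/mdadm
      - siderolabs/util-linux-tools
      - siderolabs/binfmt-misc
EOF
curl -s -X POST -H 'Content-Type: text/x-yaml' \
  --data-binary @/tmp/schematic.yaml \
  https://factory.talos.dev/schematics
# => {"id":"<schematic id>"}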

Next, Cilium is configured with the Helm provider's template resource:

data "helm_template" "cilium" {
  name      = "cilium"
  namespace = "kube-system"

  kube_version = local.kubernetes_version
  repository   = "https://helm.cilium.io"
  chart        = "cilium"
  version      = local.cilium_version

  set {
    name  = "operator.replicas"
    value = local.controlplane_count
  }
  ...
  set {
    name  = "kubeProxyReplacement"
    value = "false"
  }
  ...
  set {
    name  = "k8sServiceHost"
    value = "127.0.0.1"
  }
  set {
    name  = "k8sServicePort"
    value = local.api_port_kube_prism
  }
  ...
  set {
    name  = "hubble.relay.enabled"
    value = "true"
  }
  set {
    name  = "hubble.ui.enabled"
    value = "true"
  }
}

and it is configured not to replace kube-proxy, as the Hetzner cloud controller manager appears to depend on it for LoadBalancer support; Cilium is pointed at Talos Linux's KubePrism endpoint via k8sServiceHost and k8sServicePort; and finally Hubble is enabled for observability.
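
Once the cluster is up, a quick way to check that Cilium is healthy and to reach Hubble is a port-forward; the service name and port below are the chart defaults, so treat them as assumptions:

# Check the Cilium agent pods
kubectl --kubeconfig ./kubeconfig -n kube-system get pods -l k8s-app=cilium

# Forward the Hubble UI locally, then browse to http://localhost:12000
kubectl --kubeconfig ./kubeconfig -n kube-system port-forward svc/hubble-ui 12000:80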

Lastly for data, hcloud-ccm is templated:

data "helm_template" "hcloud_ccm" {
  name      = "hcloud-cloud-controller-manager"
  namespace = "kube-system"

  kube_version = local.kubernetes_version
  repository   = "https://charts.hetzner.cloud"
  chart        = "hcloud-cloud-controller-manager"
  version      = local.hcloud_ccm_version

  set {
    name  = "networking.enabled"
    value = "true"
  }

  set {
    name  = "networking.clusterCIDR"
    value = local.pod_ipv4_cidr
  }

  set {
    name  = "env.HCLOUD_LOAD_BALANCERS_LOCATION.value"
    value = data.hcloud_location.this.name
  }
}

and it is made aware of the network and hcloud region.
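
After bootstrap, the controller's logs are the quickest way to confirm it picked up the network and location; the deployment name below comes from the chart, so treat it as an assumption:

# Confirm hcloud-ccm is running and skim its logs
kubectl --kubeconfig ./kubeconfig -n kube-system get deploy hcloud-cloud-controller-manager
kubectl --kubeconfig ./kubeconfig -n kube-system logs deploy/hcloud-cloud-controller-manager --tail 20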

Onto the resources: the server names come from random_pet:

resource "random_pet" "controlplane" {
  count  = local.controlplane_count
  length = 2
}

This is useful for providing random, human-readable names for the resources. However, it is inflexible: the generated list will always contain the same values in the same order. This is unhelpful if, say, you wanted to delete server zero for some reason and get a new name without jumping through some hoops.
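
One of those hoops, for reference: the pet for a single index can be forced to regenerate with -replace, a sketch against the resource addresses above:

# Regenerate only the first pet name; the matching server then picks up
# the new name (and is renamed or replaced) on the same apply
tofu apply -replace='random_pet.controlplane[0]'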

Humorously, an SSH key is required to keep Hetzner happy with the servers, although it's never used by Talos Linux:

resource "hcloud_ssh_key" "this" {
  name       = "${local.cluster_name}-default"
  public_key = tls_private_key.ssh_key.public_key_openssh
  labels = {
    "cluster" = local.cluster_name
  }
}

The network for the deployment and cluster is set up here:

resource "hcloud_network" "this" {
  name     = local.cluster_name
  ip_range = local.network_ipv4_cidr
  labels = {
    "cluster" = local.cluster_name
  }
}

resource "hcloud_network_subnet" "nodes" {
  network_id   = hcloud_network.this.id
  type         = "cloud"
  network_zone = data.hcloud_location.this.network_zone
  ip_range     = local.node_ipv4_cidr
}

Next, the controlplane servers are deployed:

resource "hcloud_server" "controlplane" {
  for_each           = { for idx, val in random_pet.controlplane : idx => val }
  name               = "${local.cluster_name}-c-${each.value.id}"
  server_type        = local.controlplane_server_type
  ssh_keys           = [hcloud_ssh_key.this.id]
  image              = data.hcloud_image.x86.id
  placement_group_id = hcloud_placement_group.controlplane.id
  location           = local.location
  labels = {
    "cluster" = local.cluster_name,
    "role"    = "controlplane"
  }
  network {
    network_id = hcloud_network_subnet.nodes.network_id
    ip         = cidrhost(hcloud_network_subnet.nodes.ip_range, 100 + each.key)
    alias_ips  = [] # fix for https://github.com/hetznercloud/terraform-provider-hcloud/issues/650
  }
}

A floating IP is allocated and assigned to the first-provisioned server, here:

data "hcloud_floating_ip" "controlplane_ipv4" {
  id = hcloud_floating_ip.controlplane_ipv4.id
}

resource "hcloud_floating_ip" "controlplane_ipv4" {
  name              = "controlplane-ipv4"
  type              = "ipv4"
  home_location     = data.hcloud_location.this.name
  description       = "Controlplane VIP"
  delete_protection = false
  labels = {
    "cluster" = local.cluster_name,
    "role"    = "controlplane"
  }
}

resource "hcloud_floating_ip_assignment" "controlplane_ipv4" {
  floating_ip_id = data.hcloud_floating_ip.controlplane_ipv4.id
  server_id      = hcloud_server.controlplane[0].id
  depends_on = [
    hcloud_server.controlplane,
  ]
}

Over on the Talos side of things, secrets are generated:

resource "talos_machine_secrets" "machine_secrets" {
  talos_version = local.talos_version
}

and configuration is applied:

resource "talos_machine_configuration_apply" "controlplane" {
  for_each                    = { for idx, val in hcloud_server.controlplane : idx => val }
  endpoint                    = each.value.ipv4_address
  node                        = each.value.ipv4_address
  client_configuration        = talos_machine_secrets.machine_secrets.client_configuration
  machine_configuration_input = data.talos_machine_configuration.controlplane.machine_configuration
  depends_on                  = [hcloud_server.controlplane]

  config_patches = [
    yamlencode({
      machine = {
        features = {
          kubernetesTalosAPIAccess = {
            enabled = true
            allowedRoles = [
              "os:reader",
            ]
            allowedKubernetesNamespaces = [
              "kube-system",
            ]
          }
        }
        ...
        network = {
          hostname = each.value.name
          interfaces = [
            {
              interface = "eth0"
              dhcp      = true
              vip = {
                ip = hcloud_floating_ip.controlplane_ipv4.ip_address
                hcloud = {
                  apiToken = var.hcloud_token
                }
              }
            },
          ]
        }
      }
      ...
    })
    ...
  ]
}

It's worth noting that Talos's VIP controller is what moves the floating IP between the controlplane servers, as seen in the configuration above.
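
Once the cluster is up, which node currently holds the VIP can be checked from the Talos side; the node IPs here are placeholders:

# Show the addresses Talos has brought up on each controlplane node;
# the node holding the floating IP lists it alongside its eth0 address
talosctl --talosconfig ./talosconfig \
  -n <controlplane-1-ip>,<controlplane-2-ip>,<controlplane-3-ip> \
  get addresses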

From here, the first machine is bootstrapped:

resource "talos_machine_bootstrap" "bootstrap" {
  depends_on = [
    talos_machine_configuration_apply.controlplane
  ]
  client_configuration = talos_machine_secrets.machine_secrets.client_configuration
  endpoint             = [for k, v in hcloud_server.controlplane : v.ipv4_address][0]
  node                 = [for k, v in hcloud_server.controlplane : v.ipv4_address][0]
}

At this point, Kubernetes will be up and running! (✨ hi Kate! ☸️✨)

Next, to talk to Talos and Kubernetes manually, a talosconfig and a kubeconfig are each needed:

data "talos_client_configuration" "talosconfig" {
  cluster_name         = local.cluster_name
  client_configuration = talos_machine_secrets.machine_secrets.client_configuration
  endpoints            = [for k, v in hcloud_server.controlplane : v.ipv4_address]
  nodes                = [for k, v in hcloud_server.controlplane : v.ipv4_address]
}

resource "talos_cluster_kubeconfig" "kubeconfig" {
  depends_on = [
    talos_machine_bootstrap.bootstrap
  ]
  client_configuration = talos_machine_secrets.machine_secrets.client_configuration
  endpoint             = [for k, v in hcloud_server.controlplane : v.ipv4_address][0]
  node                 = [for k, v in hcloud_server.controlplane : v.ipv4_address][0]
}
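
Assuming both are exposed in outputs.tf (the output names below are my assumption), they can be written to disk and used right away:

# Write the client configs locally; the output names depend on outputs.tf
tofu output -raw talosconfig > ./talosconfig
tofu output -raw kubeconfig > ./kubeconfig

# Verify the cluster from the outside
talosctl --talosconfig ./talosconfig health
kubectl --kubeconfig ./kubeconfig get nodes -o wide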

🤖 Deploying to Robot

It becomes a bit manual here.

With the earlier output config, a default worker machine config is produced. Now, a few things can be patched on top of that, specific to the Robot machines.

---
machine:
  kubelet:
    nodeIP:
      validSubnets:
        - $MACHINE_IP/32
    extraMounts:
      - destination: /var/lib/longhorn
        type: bind
        source: /var/lib/longhorn
        options:
          - bind
          - rshared
          - rw
  files:
    - op: create
      path: /etc/cri/conf.d/20-customization.part
      content: |
        [plugins]
          [plugins."io.containerd.grpc.v1.cri"]
            device_ownership_from_security_context = true
  network:
    hostname: $MACHINE_NAME
  install:
    disk: $MACHINE_DISK
  nodeLabels:
    robot.hetzner.cloud: "yes"

In order:

  • tell the machine what it's IP is
  • allow a writable path for longhorn
  • configure containerd to behave with longhorn and disks
  • set a hostname
  • set a install disk
  • label the node for scheduling and selecting purposes

On the machine where tofu was run, a few values must be output: the talos_version and talos_factory_schematic_id:

tofu output -raw talos_version
tofu output -raw talos_factory_schematic_id

After logging into the rescue system on the Robot server, the Talos image can now be written to the first NVMe drive. This should take about 15 seconds.

read -p 'TALOS_VERSION: ' TALOS_VERSION
read -p 'TALOS_FACTORY_SCHEMATIC: ' TALOS_FACTORY_SCHEMATIC
DISK_IMAGE="https://factory.talos.dev/image/$TALOS_FACTORY_SCHEMATIC/$TALOS_VERSION/metal-amd64.raw.zst"
SYSTEM_DISK=/dev/nvme0n1
wget -O- "$DISK_IMAGE" | zstd -dc | pv > "$SYSTEM_DISK" && sync && lsblk

Back on the machine with tofu, the Talos machine config can now be applied:

talosctl apply-config \
  --talosconfig ./talosconfig \
  -n "$MACHINE_IP" \
  -e "$MACHINE_IP" \
  --insecure \
  --file <(tofu output -raw talos_worker_machine_config) \
  --config-patch @<(< ./support/metal-disk-patch.yaml envsubst | yq -o json | jq -c) # NOTE must match patches in talos_machine_configuration_apply.worker config_patches
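
Once the Robot machine reboots into the installed Talos, it should register as a worker; a quick check from the same machine:

# Confirm the node answers over the Talos API and joined with the expected label
talosctl --talosconfig ./talosconfig -n "$MACHINE_IP" version
kubectl --kubeconfig ./kubeconfig get nodes -l robot.hetzner.cloud=yes -o wide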

🌆 Bringing up core services

All services inside Kubernetes are managed via FluxCD; the cluster folder contains both the Terraform and the Kubernetes+FluxCD manifests.

FluxCD is bootstrapped, like so:

export GITLAB_TOKEN="$GITLAB_ACCESS_TOKEN" # NOTE IMPORTANT must be set
# NOTE --kubeconfig and --context are set explicitly to prevent kubeconfig/context mistakes
flux \
  --kubeconfig ./kubeconfig \
  --context 'admin@staging' \
  bootstrap gitlab \
  --hostname <GITLAB HOSTNAME HERE> \
  --owner letsboot \
  --repository base-cluster \
  --path=staging-cluster \
  --components-extra=image-reflector-controller,image-automation-controller

This can also be done via the Terraform FluxCD provider, though I find the provider cumbersome to use; see this issue.
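
Once bootstrapped, the reconciliation state can be inspected (and nudged) from the CLI:

# Show what Flux is reconciling, then force a sync of the root kustomization
flux --kubeconfig ./kubeconfig get kustomizations
flux --kubeconfig ./kubeconfig reconcile kustomization flux-system --with-source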

The repo cluster folders related to Flux are structured like this:

- apps/
  - [app folder]/
    - [file].yaml ...
    - kustomization.yaml
  - [corresponding app flux kustomization].yaml
  - kustomization.yaml
- flux-system/
- infrastructure/
  - configs/
    - [file].yaml ...
    - kustomization.yaml
  - controllers/
    - [controller]/ ...
      - kustomization.yaml
    - [corresponding controller flux kustomization].yaml ...
    - kustomization.yaml
FluxCD
currently managed via flux cli
cert-manager
standard install via Helm in Flux
ingress-nginx
uses controller service externalTrafficPolicy=Local, requires scheduling on an hcloud machine, and prefers not to run replicas on the same node where possible (a quick scheduling check follows this list)
Longhorn
will only schedule on nodes labeled with robot.hetzner.cloud="yes"
KubeVirt
infra, CDI and workloads will only schedule on nodes labeled with robot.hetzner.cloud="yes"
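
A quick way to sanity-check those scheduling constraints once everything has reconciled; the namespaces below are the chart defaults, so treat them as assumptions:

# The Robot nodes that Longhorn and KubeVirt workloads should land on
kubectl --kubeconfig ./kubeconfig get nodes -l robot.hetzner.cloud=yes

# Where the ingress-nginx controller replicas actually ended up
kubectl --kubeconfig ./kubeconfig -n ingress-nginx get pods -o wide

# Longhorn pods should only appear on the labelled Robot nodes
kubectl --kubeconfig ./kubeconfig -n longhorn-system get pods -o wide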

🤔 Issues and considerations

  • Provisioning Hetzner Robots machines is very manual
  • The Terraform/Tofu isn't particularly Kubernetes aware; e.g:

  • As Robot nodes are managed mostly separately from Terraform/Tofu, they can be reset and reused else where, where needed
  • Robot nodes which are configured for Longhorn to run on, not all disks are utilised just yet

๐Ÿ™ Closing thoughts

To step it up, I think that Cluster-API or SideroLabs' Omni would be a great way to increase automation and reliability, replacing all the Tofu code with some YAML manifests.

Thank you again to letsboot for the opportunity to support them!

See an example repo here: COMING SOON

Other reads