One of my side projects is developing and maintaining an unofficial Prometheus Exporter for Rancher. It exposes metrics pertaining to Rancher-specific resources, including managed clusters, Kubernetes versions, and more. Below is an example dashboard based on these metrics.
Incidentally, if you are using Rancher, I’d love to hear your thoughts/feedback.
Previous CI workflow
The flowchart below outlines the existing process. Whilst automated, pushing directly to latest is bad practice.
To improve this, several additional steps were added. The first acquires the latest versioned image of the exporter and saves it to $GITHUB_OUTPUT so later steps can reference it.
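As a rough illustration of that step (assuming here that the current version is read from the repository's most recent git tag; the real workflow may query the container registry instead), it could look something like this:

- name: Get current version
  id: get_version
  run: |
    # Assumption: the current version is the most recent git tag
    # (requires a checkout with fetch-depth: 0 so tags are available).
    echo "image_version=$(git describe --tags --abbrev=0)" >> $GITHUB_OUTPUT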
Referencing this, the next version can be generated based on MAJOR.MINOR.PATCH, incrementing the PATCH version. In the future, this will be modified to add more flexibility to change the MAJOR and MINOR versions.
- name: Increment version
  id: increment_version
  run: |
    # Increment the retrieved version
    echo "updated_version=$(echo "${{ steps.get_version.outputs.image_version }}" | awk -F. -v OFS=. '{$NF++;print}')" >> $GITHUB_OUTPUT
With the version generated, the subsequent step can tag and push both the incremented version and latest.
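As a sketch of that step (the image name below is a placeholder rather than the exporter's actual repository, and it assumes an earlier step has built the image and authenticated to the registry):

- name: Tag and push image
  run: |
    # IMAGE is a placeholder for the exporter's actual repository.
    # Assumes an earlier step has built the image as $IMAGE:latest and
    # logged in to the registry.
    IMAGE=registry.example.com/prometheus-rancher-exporter
    docker tag $IMAGE:latest $IMAGE:${{ steps.increment_version.outputs.updated_version }}
    docker push $IMAGE:${{ steps.increment_version.outputs.updated_version }}
    docker push $IMAGE:latest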
Disclaimer – The use of nested virtualisation is not a supported topology
Harvester is an open-source HCI solution aimed at managing Virtual Machines, similar to vSphere and Nutanix, with key differences including (but not limited to):
Fully Open Source
Leveraging Kubernetes-native technologies
Integration with Rancher
Testing/evaluating any hyperconverged solution can be difficult – it usually requires dedicated hardware, as these solutions are designed to run directly on bare metal. However, we can circumvent this by leveraging nested virtualisation – something which may be familiar to a lot of homelabbers (myself included) – which involves using an existing virtualisation solution to provision workloads that themselves leverage virtualisation technology.
Step 1 – Planning
To mimic what a production-like system may look like, two NICs will be leveraged – one facilitating management traffic, and the other carrying Virtual Machine traffic, as depicted below.
The MGMT and VM networks will manifest as vDS port groups.
Step 2 – Create the vDS Port Groups
It is highly recommended to create new distributed port groups for this exercise, mainly because of the configuration we will be applying in the next step.
Create a new vDS Port Group:
Give the port group a name, such as harvester-mgmt
Adjust any configuration (ie VLAN ID) to match your environment if required, or accept the defaults:
Repeat this process to create the harvester-vm Port group. We should now have two port groups:
harvester-mgmt
harvester-vm
Step 3 – Enable MAC learning on Port groups [Critical]
This step is critical for nested virtualisation: the VMs running inside Harvester will present MAC addresses that the vDS does not associate with the Harvester VM's own vNICs, so without MAC learning (or promiscuous mode) their traffic will be dropped. Enable MAC learning on both the harvester-mgmt and harvester-vm port groups; note that this setting is not exposed in the vSphere Client UI and typically needs to be applied via PowerCLI or the vSphere API.
With the networking prepared, we can create the Harvester VM. It will operate like any other VM, with some important differences. In vSphere, go through the standard VM creation wizard to specify the Host/Datastore options. When presented with the OS type, select Other Linux (64-bit).
When customising the hardware, select Expose hardware assisted virtualization to the guest OS – this is crucial, as without it selected Harvester will not install.
Add an additional network card so that our VM leverages both previously created port groups:
And finally, mount the Harvester ISO image.
Step 4 – Install Harvester
Power on the VM and, providing the ISO is mounted and connected, you should be presented with the install screen. As this is the first node, select Create a new Harvester cluster.
Select the Install target and optional MBR partitioning
Configure the hostname, management nic and IP assignment options.
Configure the DNS config:
Configure the Harvester VIP. This is what we will use to access the Web UI. This can also be obtained via DHCP if desired.
Configure the cluster token; this is required if you want to add more nodes later on.
Configure the local Password:
Configure the NTP server Address:
The subsequent options facilitate importing SSH keys, reading a remote config, etc., and are optional. A summary will be presented before the install begins:
Proceed with the install.
Note: After a reboot, it may take a few minutes before Harvester reports as being in a ready state – once it does, navigate to the reported management URL.
At which point you will be prompted to reset the admin password
Step 5 – Configure VM Network
Once logged in to Harvester, navigate to Hosts > Edit Config
Configure the secondary NIC to the VLAN network (our VM network)
Navigate to Settings > VLAN > Edit
Click “Enable” and set the default interface to the secondary interface. This will be the default for any new nodes that join the cluster.
To create a network for our VMs to reside in, select Network > Create:
Give this network a name and a VLAN ID. Note – you can supply VLAN ID 1 if you’re using the native/default VLAN.
Step 6 – Test VM Network
Firstly, create a new image:
For this example, we can use an ISO image. After supplying the URL, Harvester will download and store the image:
After downloading, we can create a VM from it:
Specify the VM specs (CPU and Mem)
Under Volumes, add an additional volume to act as the installation target for the OS (or leave as-is if purely using a live ISO):
Under Networks, change the selection to the VM network that was previously created and click “Create”:
Once the VM is in a running state, we can take a VNC console to it:
At which point we can interact with it as we would expect with any HCI solution:
My job at SUSE (via Rancher) involves hosting a lot of demos, product walk-throughs and various other activities that necessitate spinning up tailored environments on demand. To facilitate this, I previously leaned towards Terraform, and have since accumulated a collection of individual scripts, each addressing a specific use case, that I have to manage individually.
This approach reached a point where it became difficult to manage. Ideally, I wanted an IaC environment that catered for:
Easy, in-code looping (ie for and range)
“Proper” condition handling, ie if monitoring == true, install monitoring, vs the slightly awkward HCL equivalent of repurposing count as a pseudo-replacement for condition handling.
Influence what’s installed by config options/vars.
Complete end-to-end creation of cluster objects; in my example, create:
AWS EC2 VPC
AWS Subnets
AWS AZ’s
AWS IGW
AWS Security Group
1x Rancher provisioned EC2 cluster
3x single node K3S clusters used for Fleet
Pulumi addresses these requirements pretty comprehensively. Additionally, I can re-use existing logic from my Terraform code, as the Rancher2 Pulumi provider is based on the Terraform implementation, whilst leveraging Go tools/features to build my environment.
Code Tour – Core
The core objects are created directly, using types from the Pulumi packages:
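What follows is a trimmed-down sketch rather than the project's exact code (the resource name, CIDR range and SDK import versions are illustrative):

package main

import (
    "github.com/pulumi/pulumi-aws/sdk/v4/go/aws/ec2"
    "github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
    pulumi.Run(func(ctx *pulumi.Context) error {
        // Create the VPC that the rest of the environment hangs off
        vpc, err := ec2.NewVpc(ctx, "david-pulumi-vpc", &ec2.VpcArgs{
            CidrBlock:          pulumi.String("10.0.0.0/16"),
            EnableDnsHostnames: pulumi.Bool(true),
            EnableDnsSupport:   pulumi.Bool(true),
            Tags:               pulumi.StringMap{"Name": pulumi.String("david-pulumi-vpc")},
        })
        if err != nil {
            return err
        }

        // The VPC ID is an Output - it is only known once AWS has created the resource
        ctx.Export("vpcId", vpc.ID())
        return nil
    })
}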
You will notice some interesting types in the above – such as pulumi.Bool and pulumi.String. The reason for this is that we need to treat cloud deployments as asynchronous operations. Some values we know ahead of time (expose port 80), while others will only be known at runtime (the ID of a VPC, as below). These Pulumi types facilitate this asynchronous paradigm.
Moving to something slightly more complex, such as looping around regions and assigning a subnet to each:
// Get the list of AZ's for the defined region
azState := "available"
zoneList, err := aws.GetAvailabilityZones(ctx, &aws.GetAvailabilityZonesArgs{
    State: &azState,
})
if err != nil {
    return err
}

// How many AZ's to spread nodes across. Default to 3.
zoneNumber := 3
zones := []string{"a", "b", "c"}

var subnets []*ec2.Subnet

// Iterate through the AZ's for the VPC and create a subnet in each
for i := 0; i < zoneNumber; i++ {
    subnet, err := ec2.NewSubnet(ctx, "david-pulumi-subnet-"+strconv.Itoa(i), &ec2.SubnetArgs{
        AvailabilityZone:    pulumi.String(zoneList.Names[i]),
        Tags:                pulumi.StringMap{"Name": pulumi.String("david-pulumi-subnet-" + strconv.Itoa(i))},
        VpcId:               vpc.ID(),
        CidrBlock:           pulumi.String("10.0." + strconv.Itoa(i) + ".0/24"),
        MapPublicIpOnLaunch: pulumi.Bool(true),
    })
    if err != nil {
        return err
    }
    // Keep track of the created subnets so later resources can reference them
    subnets = append(subnets, subnet)
}
This is repeated for each resource type.
Code Tour – Config
The config file allows us to store information required by providers (unless using environment variables or another external mechanism), as well as values we can use to influence which resources are created. In particular, I added the following boolean values:
These directly influence what will be created in my main demo cluster, as well as the individual “Fleet” clusters. Within the main Pulumi code, these values are extracted:
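As a hedged sketch of that pattern (the config key names and the addon resource hinted at in the comments are assumptions, not necessarily the project's actual values):

package main

import (
    "github.com/pulumi/pulumi/sdk/v3/go/pulumi"
    "github.com/pulumi/pulumi/sdk/v3/go/pulumi/config"
)

func main() {
    pulumi.Run(func(ctx *pulumi.Context) error {
        cfg := config.New(ctx, "")

        // Boolean flags from the stack config decide which addons are installed
        installMonitoring := cfg.GetBool("monitoring")
        installLonghorn := cfg.GetBool("longhorn")

        if installMonitoring {
            // Create the rancher2 AppV2 resource for rancher-monitoring here,
            // e.g. rancher2.NewAppV2(ctx, "monitoring", &rancher2.AppV2Args{...})
        }
        if installLonghorn {
            // ...and likewise for Longhorn
        }
        return nil
    })
}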
As there’s a much more dynamic nature to this project, I have a single template which I can tailor to address a number of use-cases with a huge amount of customisation. One could argue the same could be done in Terraform using count, but I find this method cleaner. In addition, my next step is to implement some testing using Go's native features to further enhance this project.
Bootstrapping K3s
One challenge I encountered was being able to create and import K3s clusters. Currently, only RKE clusters can be directly created from Rancher. To address this, I create the cluster object in Rancher, extract the join command, and pass it together with the K3s install script, so that once K3s has stood up it will run the join command:
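The following is a rough sketch of that bootstrap logic, not the project's exact code: the registration-token field names and the SDK import version shown here are assumptions, so check the rancher2 provider docs for the exact accessors.

package main

import (
    "fmt"

    "github.com/pulumi/pulumi-rancher2/sdk/v3/go/rancher2"
    "github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
    pulumi.Run(func(ctx *pulumi.Context) error {
        // An "empty" cluster object in Rancher behaves as an imported cluster,
        // waiting for a node to register against it.
        fleetCluster, err := rancher2.NewCluster(ctx, "david-pulumi-fleet-1", &rancher2.ClusterArgs{
            Description: pulumi.String("Fleet cluster"),
        })
        if err != nil {
            return err
        }

        // Build EC2 user data: install K3s, then run the registration (join)
        // command Rancher generated for this cluster. The InsecureCommand
        // field name is an assumption based on the provider schema.
        userData := fleetCluster.ClusterRegistrationToken.ApplyT(func(token rancher2.ClusterClusterRegistrationToken) string {
            return fmt.Sprintf("#!/bin/bash\ncurl -sfL https://get.k3s.io | sh -\n%s\n", *token.InsecureCommand)
        }).(pulumi.StringOutput)

        // userData is then passed to ec2.NewInstance via the UserData argument.
        _ = userData
        return nil
    })
}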
Type Name Status
+ pulumi:pulumi:Stack Rancher-Demo-Env-dev creating...
+ ├─ rancher2:index:Cluster david-pulumi-fleet-1 created
+ ├─ rancher2:index:Cluster david-pulumi-fleet-2 created
+ ├─ rancher2:index:CloudCredential david-pulumi-cloudcredential created
+ ├─ aws:ec2:Subnet david-pulumi-subnet-1 created
+ ├─ aws:ec2:Subnet david-pulumi-subnet-0 created
+ ├─ aws:ec2:InternetGateway david-pulumi-gw created
+ ├─ aws:ec2:Subnet david-pulumi-subnet-2 created
+ ├─ aws:ec2:SecurityGroup david-pulumi-sg created
+ ├─ aws:ec2:DefaultRouteTable david-pulumi-routetable created
+ ├─ rancher2:index:NodeTemplate david-pulumi-nodetemplate-eu-west-2b created
+ ├─ rancher2:index:NodeTemplate david-pulumi-nodetemplate-eu-west-2a created
+ ├─ rancher2:index:NodeTemplate david-pulumi-nodetemplate-eu-west-2c created
+ ├─ aws:ec2:Instance david-pulumi-fleet-node-0 created
+ ├─ aws:ec2:Instance david-pulumi-fleet-node-2 created
+ ├─ aws:ec2:Instance david-pulumi-fleet-node-1 created
+ ├─ rancher2:index:Cluster david-pulumi-cluster created
+ ├─ rancher2:index:NodePool david-pulumi-nodepool-2 created
+ ├─ rancher2:index:NodePool david-pulumi-nodepool-1 created
+ ├─ rancher2:index:NodePool david-pulumi-nodepool-0 created
+ ├─ rancher2:index:ClusterSync david-clustersync created
+ ├─ rancher2:index:AppV2 opa created
+ ├─ rancher2:index:AppV2 monitoring created
+ ├─ rancher2:index:AppV2 istio created
+ ├─ rancher2:index:AppV2 cis created
+ ├─ rancher2:index:AppV2 logging created
+ └─ rancher2:index:AppV2 longhorn created
Resources:
+ 29 created
Duration: 19m18s
Around 20 minutes to create all of these resources, fully automated, is pretty handy. This example also includes all the addons – opa, monitoring, istio, cis, logging and longhorn.