Hi all, as some of you may know, I'm interested in homelabbing and host my own Kubernetes cluster at home. An essential part of a good homelab is keeping track of logs and metrics. The go-to application for this use case is often the kube-prometheus-stack, which is, in my humble opinion, a bit too big for my homelab in terms of memory, compute, and storage footprint. While looking for alternatives I stumbled upon Victoria Metrics, which seems to be a perfect fit for my use case.

It is built in a distributed fashion with time-series storage in mind. From my first look at the architectural overview, it seems to fit my use case quite well and could be a nice general drop-in replacement for the Prometheus stack.

Victoria Metrics Architecture View (Source: Victoria Metrics Documentation)

So let's get started, jump into the deep water, and build our deployments.

A prepared example deployment is available here

Victoria Metrics Cluster

The cluster deployment is based on the officially provided Helm chart and contains the three core components vmselect, vminsert, and vmstorage.

VMStorage is the data backend for all stored metrics and the single source of truth for your queryable data in a time range. Because the vmstorage component manages the raw data, it is a stateful part of your cluster, which requires some special care. VMInsert and VMSelect are both stateless components in this stack and provide your third-party applications with access to the raw data you are collecting in your cluster.

Installing the metrics cluster is rather easy thanks to the provided Helm chart, which is easiest to view via ArtifactHub. At the time of writing this blog post, version 0.9.60 is the newest, and everything here is based on it.

To reduce the tool dependency, I’m going to use the HelmChartInflationGenerator for kustomize to keep everything in one universe.

First, we need to set up the inflation generator for this specific helm chart.

./base/victoria-metrics-cluster/helmrelease.yaml

```yaml
apiVersion: builtin
kind: HelmChartInflationGenerator
metadata:
  name: victoria-metrics-cluster
releaseName: victoria-metrics-cluster
name: victoria-metrics-cluster
version: 0.9.60
repo: https://victoriametrics.github.io/helm-charts/
valuesInline: {}
IncludeCRDs: true
namespace: victoria-metrics
```

./base/victoria-metrics-cluster/kustomization.yaml

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - helmrelease.yaml
```

The general information is pretty straightforward if you are already familiar with the Helm way of installing prepared packages. You may notice that valuesInline is empty. Because I wanted to set up this deployment in a patchable manner, the value overrides are added in the next step.

./env/homelab/patches/patch-victoria-metrics-cluster.yaml

```yaml
apiVersion: builtin
kind: HelmChartInflationGenerator
metadata:
  name: victoria-metrics-cluster
valuesInline:
  rbac:
    create: true
    pspEnabled: false
  vmselect:
    replicaCount: 1
    podAnnotations:
      prometheus.io/scrape: "true"
      prometheus.io/port: "8481"
  vminsert:
    replicaCount: 1
    podAnnotations:
      prometheus.io/scrape: "true"
      prometheus.io/port: "8480"
    extraArgs:
      envflag.enable: "true"
      envflag.prefix: VM_
      loggerFormat: json
  vmstorage:
    replicaCount: 1
    persistentVolume:
      storageClass: nfs-client
    podAnnotations:
      prometheus.io/scrape: "true"
      prometheus.io/port: "8482"
```

This patch modifies values that are generally available through the preconfigured values of the Helm chart.
To elaborate on my specific patch:
I wanted to create specific RBAC rules for my environment but had to disable the pod security policies, since these were removed in Kubernetes 1.25.
Besides that, I set a replica count of one for each component to reduce the load on my environment and configured the pod annotations so that metrics are collected afterward.

>NOTE: If you are going to use my example configuration, please consider changing the storageClass, which may or may not be available in your infrastructure.

To collect both manifests, it is required to add another kustomization with the following content.

./env/homelab/kustomization.yaml
```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: victoria-metrics

resources:
  - ../../base/victoria-metrics-cluster

patchesStrategicMerge:
  - patches/patch-victoria-metrics-cluster.yaml
```

Using the HelmChartInflationGenerator within kustomize is currently a bit tricky and requires a special third kustomization, which loads the second kustomization as a generator module.

./generators/homelab/kustomization.yaml

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: victoria-metrics

generators:
  - ../../env/homelab/
```

With this setup, you are able to deploy the cluster with any CI/CD approach or even a GitOps approach.

If you are working with Argo CD to deploy this kustomization, you need to add a plugin within your argocd-cm ConfigMap and reference it within the plugin block of your application.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
data:
  configManagementPlugins: |
    - name: kustomize-build-with-helm
      generate:
        command: [ "sh", "-c" ]
        args: [ "kustomize build --enable-helm" ]
```

Victoria Metrics Agent

Now, with the cluster running, it is time to collect the first metrics from within the Kubernetes cluster. For this, it is possible to install the Victoria Metrics agent, which is also provided as a Helm chart.

The agent is a tiny piece of software that collects metrics from various sources and writes them to the configured remote address.

Victoria Metrics Agent Overview (Source: Victoria Metrics Documentation - VMagent)

As a first step, it is required to configure the Helm inflation generator again.

./base/victoria-metrics-agent/helmrelease.yaml

```yaml
apiVersion: builtin
kind: HelmChartInflationGenerator
metadata:
  name: victoria-metrics-agent
releaseName: victoria-metrics-agent
name: victoria-metrics-agent
version: 0.8.29
repo: https://victoriametrics.github.io/helm-charts/
valuesInline: {}
IncludeCRDs: true
namespace: victoria-metrics
```

Analogous to the cluster deployment, an initial kustomization is required to collect all manifests together and prepare them for patches.
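
This base kustomization is not spelled out here; a minimal sketch, mirroring the cluster base, could look like this:

./base/victoria-metrics-agent/kustomization.yaml

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - helmrelease.yaml
```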

As a next step, the patch configuration is required to configure the agent for this deployment.

>NOTE: This is a rather big patch and will be partly explained afterward.

./env/homelab/patches/patch-victoria-metrics-agent.yaml

```yaml
apiVersion: builtin
kind: HelmChartInflationGenerator
metadata:
  name: victoria-metrics-agent
valuesInline:
  rbac:
    pspEnabled: false

  deployment:
    enabled: false

  statefulset:
    enabled: true

  remoteWriteUrls:
    - http://victoria-metrics-cluster-vminsert.victoria-metrics:8480/insert/0/prometheus/

  config:
    global:
      scrape_interval: 10s

scrape_configs:
  - job_name: vmagent
    static_configs:
      - targets: ["localhost:8429"]
  - job_name: "kubernetes-apiservers"
    kubernetes_sd_configs:
      - role: endpoints
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecure_skip_verify: true
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    relabel_configs:
      - source_labels:
          [
            __meta_kubernetes_namespace,
            __meta_kubernetes_service_name,
            __meta_kubernetes_endpoint_port_name,
          ]
        action: keep
        regex: default;kubernetes;https
  - job_name: "kubernetes-nodes"
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecure_skip_verify: true
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/$1/proxy/metrics
  - job_name: "kubernetes-nodes-cadvisor"
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecure_skip_verify: true
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
    metric_relabel_configs:
      - action: replace
        source_labels: [pod]
        regex: '(.+)'
        target_label: pod_name
        replacement: '${1}'
      - action: replace
        source_labels: [container]
        regex: '(.+)'
        target_label: container_name
        replacement: '${1}'
      - action: replace
        target_label: name
        replacement: k8s_stub
      - action: replace
        source_labels: [id]
        regex: '^/system\.slice/(.+)\.service$'
        target_label: systemd_service_name
        replacement: '${1}'
  - job_name: "kubernetes-service-endpoints"
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      - action: drop
        source_labels: [__meta_kubernetes_pod_container_init]
        regex: true
      - action: keep_if_equal
        source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port, __meta_kubernetes_pod_container_port_number]
      - source_labels:
          [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels:
          [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels:
          [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels:
          [
            __address__,
            __meta_kubernetes_service_annotation_prometheus_io_port,
          ]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name
      - source_labels: [__meta_kubernetes_pod_node_name]
        action: replace
        target_label: kubernetes_node
  - job_name: "kubernetes-service-endpoints-slow"
    scrape_interval: 5m
    scrape_timeout: 30s
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      - action: drop
        source_labels: [__meta_kubernetes_pod_container_init]
        regex: true
      - action: keep_if_equal
        source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port, __meta_kubernetes_pod_container_port_number]
      - source_labels:
          [__meta_kubernetes_service_annotation_prometheus_io_scrape_slow]
        action: keep
        regex: true
      - source_labels:
          [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels:
          [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels:
          [
            __address__,
            __meta_kubernetes_service_annotation_prometheus_io_port,
          ]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name
      - source_labels: [__meta_kubernetes_pod_node_name]
        action: replace
        target_label: kubernetes_node
  - job_name: "kubernetes-services"
    metrics_path: /probe
    params:
      module: [http_2xx]
    kubernetes_sd_configs:
      - role: service
    relabel_configs:
      - source_labels:
          [__meta_kubernetes_service_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        target_label: kubernetes_name
  - job_name: "kubernetes-pods"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - action: drop
        source_labels: [__meta_kubernetes_pod_container_init]
        regex: true
      - action: keep_if_equal
        source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port, __meta_kubernetes_pod_container_port_number]
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels:
          [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
```

Most of the preceding patch is the scrape-target configuration for the agent and can be ignored or copied as-is. The important parts are the first few lines. With remoteWriteUrls, the remote write target is configured, i.e. the address the agent ships its collected metrics to. Since both services run side by side in a single cluster, it is possible to use the cluster-internal service address to route this traffic internally.

With both manifest locations added to the environment overlay kustomization, the CI/CD environment should automatically install the agent. The combined overlay then looks roughly as sketched below.
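
A sketch of how ./env/homelab/kustomization.yaml could look after adding the agent, assuming the patch file name used above:

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: victoria-metrics

resources:
  - ../../base/victoria-metrics-cluster
  - ../../base/victoria-metrics-agent

patchesStrategicMerge:
  - patches/patch-victoria-metrics-cluster.yaml
  - patches/patch-victoria-metrics-agent.yaml
```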

Grafana

Building a collection of metrics is just one side of the coin. The other side is displaying and reacting to changing metrics. As always, we start with a Helm chart inflation.

./base/grafana/helmrelease.yaml

```yaml
apiVersion: builtin
kind: HelmChartInflationGenerator
metadata:
  name: grafana
releaseName: grafana
name: grafana
version: 6.50.5
repo: https://grafana.github.io/helm-charts
valuesInline: {}
IncludeCRDs: true
namespace: victoria-metrics
```

The next step is to add the patch for Grafana. With the following patch, the deployment is configured for the desired environment. For example, the ingress configuration provides all required information to access Grafana afterward.

The important part is the datasource configuration, which provides the link between Grafana and the installed Victoria Metrics cluster. The VMSelect application exposes a drop-in Prometheus-compatible endpoint for Grafana to consume.

One downside of the Helm chart used here is that there is currently no support for a configuration-reload sidecar container that refreshes the dashboards and configuration located in Kubernetes. Therefore, it is required to configure the default available dashboards within the dashboards block.

./env/homelab/patches/patch-grafana.yaml

```yaml
apiVersion: builtin
kind: HelmChartInflationGenerator
metadata:
  name: grafana
valuesInline:
  datasources:
    datasources.yaml:
      apiVersion: 1
      datasources:
        - name: victoriametrics
          type: prometheus
          orgId: 1
          url: http://victoria-metrics-cluster-vmselect.victoria-metrics:8481/select/0/prometheus/
          access: proxy
          isDefault: true
          updateIntervalSeconds: 10
          editable: true

  dashboardProviders:
    dashboardproviders.yaml:
      apiVersion: 1
      providers:
        - name: 'default'
          orgId: 1
          folder: ''
          type: file
          disableDeletion: true
          editable: true
          options:
            path: /var/lib/grafana/dashboards/default

  dashboards:
    default:
      victoriametrics:
        gnetId: 11176
        revision: 18
        datasource: victoriametrics
      vmagent:
        gnetId: 12683
        revision: 7
        datasource: victoriametrics
      kubernetes:
        gnetId: 14205
        revision: 1
        datasource: victoriametrics

  ingress:
    enabled: true
    annotations:
      cert-manager.io/cluster-issuer: selfsigned-ca-issuer
      kubernetes.io/ingress.class: traefik
      traefik.ingress.kubernetes.io/router.entrypoints: web, websecure
      traefik.ingress.kubernetes.io/router.tls: 'true'
      ingress.kubernetes.io/ssl-force-host: "true"
      ingress.kubernetes.io/ssl-redirect: "true"
    hosts:
      - grafana.lan
    tls:
      - secretName: grafana.lan
        hosts:
          - grafana.lan

  resources:
    limits:
      cpu: 100m
      memory: 128Mi
    requests:
      cpu: 100m
      memory: 128Mi
```

After adding all folders and resources to their relevant kustomizations, you should be welcomed by a semi-complete monitoring stack for your Kubernetes environment. Missing components like the node-exporter can easily be added to the same deployment process using the approach already shown; a possible sketch follows below.
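
Purely as an illustration (not part of the prepared repository), a generator for the community node-exporter chart could look like the following; the chart version is a placeholder and should be looked up in the repository before use.

./base/node-exporter/helmrelease.yaml

```yaml
apiVersion: builtin
kind: HelmChartInflationGenerator
metadata:
  name: node-exporter
releaseName: node-exporter
name: prometheus-node-exporter
version: x.y.z # placeholder - pin the current chart version from the repository
repo: https://prometheus-community.github.io/helm-charts
valuesInline: {}
IncludeCRDs: true
namespace: victoria-metrics
```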

As a small reminder: the complete deployment is described within the prepared repository at the following URL: https://github.com/deB4SH/Kustomize-Victoria-Metrics.