DevOpsStories - Victoria Metrics Setup

Hi all,
as some of you may know, I’m interested in homelabbing and am hosting my own Kubernetes cluster at home. As part of a good homelab it is essential to keep track of logs and metrics.
The number one go-to application for this use case is often the kube-prometheus-stack, which is, in my humble opinion, a bit too big for my homelab in terms of memory, compute, and storage footprint.
While looking for alternatives I stumbled upon Victoria Metrics, which seems to be a perfect fit for my use case.

It is built in a distributed fashion with time-series storage in mind. From my first look at the architectural overview it seems to fit my use case well and could be a nice drop-in replacement for the Prometheus stack.
Victoria Metrics Architecture View
Source: Docs Victoria Metrics

So let’s get started, jump into the deep water, and build our deployments.

A prepared example deployment is available here

Victoria Metrics Cluster

The cluster deployment is based on the official Helm chart and consists of three core components: vmselect, vminsert, and vmstorage.

VMStorage is the data backend for all stored metrics and the single source of truth for the data you can query over a time range. Because vmstorage manages the raw data, it is a stateful part of your cluster and requires some special care.
VMInsert and VMSelect are both stateless components in this stack and give your third-party applications write and read access to the raw data collected in your cluster.

Installing the metrics cluster is rather easy thanks to the provided Helm chart, which is easiest to browse via ArtifactHub.
At the time of writing, version 0.9.60 is the newest chart release and everything below is based on it.

To reduce tool dependencies, I’m going to use kustomize’s HelmChartInflationGenerator to keep everything in one universe.

First, we need to set up the inflation generator for this specific helm chart.

./base/victoria-metrics-cluster/helmrelease.yaml

apiVersion: builtin
kind: HelmChartInflationGenerator
metadata:
  name: victoria-metrics-cluster
releaseName: victoria-metrics-cluster
name: victoria-metrics-cluster
version: 0.9.60
repo: https://victoriametrics.github.io/helm-charts/
valuesInline: {}
IncludeCRDs: true
namespace: victoria-metrics

./base/victoria-metrics-cluster/kustomization.yaml

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - helmrelease.yaml

The general information is pretty straightforward if you are already familiar with the Helm way of installing packaged charts.
You may notice that the valuesInline block is empty.
Since I wanted to set up this deployment in a patchable manner, the value overrides are added in the next step.

./env/homelab/patches/patch-victoria-metrics-cluster.yaml

apiVersion: builtin
kind: HelmChartInflationGenerator
metadata:
  name: victoria-metrics-cluster
valuesInline:
  rbac:
    create: true
    pspEnabled: false
  vmselect:
    replicaCount: 1
    podAnnotations:
      prometheus.io/scrape: "true"
      prometheus.io/port: "8481"

  vminsert:
    replicaCount: 1
    podAnnotations:
      prometheus.io/scrape: "true"
      prometheus.io/port: "8480"
    extraArgs:
      envflag.enable: "true"
      envflag.prefix: VM_
      loggerFormat: json

  vmstorage:
    replicaCount: 1
    persistentVolume:
      storageClass: nfs-client
    podAnnotations:
      prometheus.io/scrape: "true"
      prometheus.io/port: "8482"

This patch applies modifications to the Helm chart through the values it exposes.
To elaborate on my specific patch:
I wanted to create the RBAC rules for my environment but had to disable the pod security policies, since these were removed in Kubernetes 1.25.
Besides that, I set a replica count of one for each component to reduce the load on my environment and configured the pod annotations so that their metrics are scraped later on.

NOTE: If you are going to use my example configuration, please consider changing the storageClass, which may or may not be available in your infrastructure.

To collect both manifests, it is required to add another kustomization with the following content.

./env/homelab/kustomization.yaml

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: victoria-metrics

resources:
  - ../../base/victoria-metrics-cluster

patchesStrategicMerge:
  - patches/patch-victoria-metrics-cluster.yaml

Using the HelmChartInflationGenerator within kustomize is currently a bit tricky and requires a special third kustomization, which loads the second kustomization as a generator module.

./generators/homelab/kustomization.yaml

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: victoria-metrics

generators:
  - ../../env/homelab/

With this setup, you are able to deploy the cluster with any CI/CD or even GitOps approach.
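If you want to verify the rendered output locally first, the same command the plugin below runs can be executed by hand: "kustomize build --enable-helm generators/homelab" prints all generated manifests, and piping that into "kubectl apply -f -" applies them directly (assuming kustomize and kubectl are installed on your machine).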

If you are working with Argo CD to deploy this kustomization, you need to add a plugin to your argocd-cm ConfigMap and reference it in the plugin block of your Application (a sketch of such an Application follows below).

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
data:
  configManagementPlugins: |
    - name: kustomize-build-with-helm
      generate:
        command: [ "sh", "-c" ]
        args: [ "kustomize build --enable-helm" ]
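For reference, an Application using this plugin could look roughly like the following sketch; the repoURL, targetRevision, and path are placeholders based on my example repository layout and have to be adjusted to your own setup.

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: victoria-metrics
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/deB4SH/Kustomize-Victoria-Metrics # placeholder: your GitOps repository
    targetRevision: main # placeholder: your branch or tag
    path: generators/homelab
    plugin:
      name: kustomize-build-with-helm
  destination:
    server: https://kubernetes.default.svc
    namespace: victoria-metrics
  syncPolicy:
    syncOptions:
      - CreateNamespace=true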

Victoria Metrics Agent

With the cluster running, it is time to collect the first metrics from within the Kubernetes cluster. For this, we can install the Victoria Metrics agent, which is also provided as a Helm chart.

The agent is a tiny piece of software that collects metrics from various sources and writes them to the configured remote address.

Victoria Metrics Agent Overview
Source: Victoria Metrics Documentation - VMagent

As a first step, the Helm inflation generator needs to be configured again.

./base/victoria-metrics-agent/helmrelease.yaml

apiVersion: builtin
kind: HelmChartInflationGenerator
metadata:
  name: victoria-metrics-agent
releaseName: victoria-metrics-agent
name: victoria-metrics-agent
version: 0.8.29
repo: https://victoriametrics.github.io/helm-charts/
valuesInline: {}
IncludeCRDs: true
namespace: victoria-metrics

Just like for the cluster deployment, an initial base kustomization is required to collect the manifests and prepare them for patching; a sketch is shown below.
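This file is not covered in detail here; it mirrors the base kustomization of the cluster deployment and would look roughly like this:

./base/victoria-metrics-agent/kustomization.yaml

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - helmrelease.yaml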

As a next step, a patch is required to configure the agent for this deployment.

NOTE: This is a rather big patch and will be partly explained afterward.

./env/homelab/patches/patch-victoria-metrics-agent.yaml

apiVersion: builtin
kind: HelmChartInflationGenerator
metadata:
  name: victoria-metrics-agent
valuesInline:
  rbac:
    pspEnabled: false

  deployment:
    enabled: false

  statefulset:
    enabled: true

  remoteWriteUrls:
    - http://victoria-metrics-cluster-vminsert.victoria-metrics:8480/insert/0/prometheus/

  config:
    global:
      scrape_interval: 10s

    scrape_configs:
      - job_name: vmagent
        static_configs:
          - targets: ["localhost:8429"]
      - job_name: "kubernetes-apiservers"
        kubernetes_sd_configs:
          - role: endpoints
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - source_labels:
              [
                __meta_kubernetes_namespace,
                __meta_kubernetes_service_name,
                __meta_kubernetes_endpoint_port_name,
              ]
            action: keep
            regex: default;kubernetes;https
      - job_name: "kubernetes-nodes"
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/$1/proxy/metrics
      - job_name: "kubernetes-nodes-cadvisor"
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
        metric_relabel_configs:
          - action: replace
            source_labels: [pod]
            regex: '(.+)'
            target_label: pod_name
            replacement: '${1}'
          - action: replace
            source_labels: [container]
            regex: '(.+)'
            target_label: container_name
            replacement: '${1}'
          - action: replace
            target_label: name
            replacement: k8s_stub
          - action: replace
            source_labels: [id]
            regex: '^/system\.slice/(.+)\.service$'
            target_label: systemd_service_name
            replacement: '${1}'
      - job_name: "kubernetes-service-endpoints"
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          - action: drop
            source_labels: [__meta_kubernetes_pod_container_init]
            regex: true
          - action: keep_if_equal
            source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port, __meta_kubernetes_pod_container_port_number]
          - source_labels:
              [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels:
              [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels:
              [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels:
              [
                __address__,
                __meta_kubernetes_service_annotation_prometheus_io_port,
              ]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_name
          - source_labels: [__meta_kubernetes_pod_node_name]
            action: replace
            target_label: kubernetes_node
      - job_name: "kubernetes-service-endpoints-slow"
        scrape_interval: 5m
        scrape_timeout: 30s
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          - action: drop
            source_labels: [__meta_kubernetes_pod_container_init]
            regex: true
          - action: keep_if_equal
            source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port, __meta_kubernetes_pod_container_port_number]
          - source_labels:
              [__meta_kubernetes_service_annotation_prometheus_io_scrape_slow]
            action: keep
            regex: true
          - source_labels:
              [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels:
              [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels:
              [
                __address__,
                __meta_kubernetes_service_annotation_prometheus_io_port,
              ]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_name
          - source_labels: [__meta_kubernetes_pod_node_name]
            action: replace
            target_label: kubernetes_node
      - job_name: "kubernetes-services"
        metrics_path: /probe
        params:
          module: [http_2xx]
        kubernetes_sd_configs:
          - role: service
        relabel_configs:
          - source_labels:
              [__meta_kubernetes_service_annotation_prometheus_io_probe]
            action: keep
            regex: true
          - source_labels: [__address__]
            target_label: __param_target
          - target_label: __address__
            replacement: blackbox
          - source_labels: [__param_target]
            target_label: instance
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            target_label: kubernetes_name
      - job_name: "kubernetes-pods"
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - action: drop
            source_labels: [__meta_kubernetes_pod_container_init]
            regex: true
          - action: keep_if_equal
            source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_port, __meta_kubernetes_pod_container_port_number]
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels:
              [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name

Most of the patch above is the scrape configuration for the agent and can simply be copied (or ignored). The important parts are the first few lines.
With remoteWriteUrls, the write target is configured. Since both services run side by side in a single cluster, the traffic can be routed internally via the cluster-local service address of vminsert.

Add both manifest locations to the environment overlay kustomization, and your CI/CD environment should automatically install the agent; see the sketch below.
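For completeness, the extended overlay kustomization from above would then look roughly like this:

./env/homelab/kustomization.yaml

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: victoria-metrics

resources:
  - ../../base/victoria-metrics-cluster
  - ../../base/victoria-metrics-agent

patchesStrategicMerge:
  - patches/patch-victoria-metrics-cluster.yaml
  - patches/patch-victoria-metrics-agent.yaml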

Grafana

Collecting metrics is just one side of the coin. The other side is displaying and reacting to changing metrics.
As always, start with a Helm chart inflation generator.

./base/grafana/helmrelease.yaml

apiVersion: builtin
kind: HelmChartInflationGenerator
metadata:
  name: grafana
releaseName: grafana
name: grafana
version: 6.50.5
repo: https://grafana.github.io/helm-charts
valuesInline: {}
IncludeCRDs: true
namespace: victoria-metrics

The next step is to add the patch for Grafana.
With the following patch, the deployment is configured for the desired environment. For example, the ingress configuration provides all the information required to access Grafana afterward.

The important part is the datasource configuration, which provides the link between Grafana and the installed Victoria Metrics cluster.
VMSelect exposes a Prometheus-compatible endpoint that Grafana can consume as a drop-in replacement.

One downside of the Helm chart used here is that there is currently no support for a configuration-reload sidecar container that refreshes dashboards and configuration stored in Kubernetes. Therefore, the default set of dashboards has to be configured within the dashboards block.

./env/homelab/patches/patch-grafana.yaml

apiVersion: builtin
kind: HelmChartInflationGenerator
metadata:
  name: grafana
valuesInline:
  datasources:
    datasources.yaml:
      apiVersion: 1
      datasources:
        - name: victoriametrics
          type: prometheus
          orgId: 1
          url: http://victoria-metrics-cluster-vmselect.victoria-metrics:8481/select/0/prometheus/
          access: proxy
          isDefault: true
          updateIntervalSeconds: 10
          editable: true

  dashboardProviders:
    dashboardproviders.yaml:
      apiVersion: 1
      providers:
        - name: 'default'
          orgId: 1
          folder: ''
          type: file
          disableDeletion: true
          editable: true
          options:
            path: /var/lib/grafana/dashboards/default

  dashboards:
    default:
      victoriametrics:
        gnetId: 11176
        revision: 18
        datasource: victoriametrics
      vmagent:
        gnetId: 12683
        revision: 7
        datasource: victoriametrics
      kubernetes:
        gnetId: 14205
        revision: 1
        datasource: victoriametrics

  ingress:
    enabled: true
    annotations:
      cert-manager.io/cluster-issuer: selfsigned-ca-issuer
      kubernetes.io/ingress.class: traefik
      traefik.ingress.kubernetes.io/router.entrypoints: web, websecure
      traefik.ingress.kubernetes.io/router.tls: 'true'
      ingress.kubernetes.io/ssl-force-host: "true"
      ingress.kubernetes.io/ssl-redirect: "true"
    hosts:
      - grafana.lan
    tls:
      - secretName: grafana.lan
        hosts:
          - grafana.lan
  resources:
    limits:
      cpu: 100m
      memory: 128Mi
    requests:
      cpu: 100m
      memory: 128Mi

Add all folders and resources to their relevant kustomizations, and you should be greeted by a semi-complete monitoring stack for your Kubernetes environment. Missing components like the node-exporter could easily be added to the same deployment process with the approach shown above; a rough sketch follows below.
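As a hypothetical example of such an addition (not part of the example repository), an inflation generator for the prometheus-node-exporter chart from the prometheus-community repository could look roughly like this; the chart version is a placeholder and should be pinned to the current release before use:

apiVersion: builtin
kind: HelmChartInflationGenerator
metadata:
  name: node-exporter
releaseName: node-exporter
name: prometheus-node-exporter
version: x.y.z # placeholder: pin to the current chart version from the repository below
repo: https://prometheus-community.github.io/helm-charts
valuesInline: {}
IncludeCRDs: true
namespace: victoria-metrics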

As a small reminder: the complete deployment is available in the prepared repository at https://github.com/deB4SH/Kustomize-Victoria-Metrics.