Centralizing logs with Loki
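Loki collects the logs of every pod in the Kubernetes cluster through Promtail, an agent typically deployed as a DaemonSet (one instance per node). It makes those logs queryable from Grafana with the LogQL query language.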
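Loki does not index the content of log lines. Each stream is identified by its set of labels, such as `namespace`, `app` and `pod` in a typical Promtail setup, and a query's stream selector picks the streams whose labels match. The Python sketch below illustrates that selection step on hypothetical in-memory streams. It only models equality matchers (`=`) and is not Loki's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Stream:
    """A log stream: a unique label set plus its (unindexed) lines."""
    labels: Dict[str, str]
    lines: List[str] = field(default_factory=list)


# Hypothetical streams, labeled the way a typical Promtail setup labels pod logs
STREAMS: List[Stream] = [
    Stream({"namespace": "production", "app": "mon-app", "pod": "mon-app-1"},
           ["GET /health 200", "ERROR db timeout"]),
    Stream({"namespace": "production", "app": "mon-app", "pod": "mon-app-2"},
           ["GET / 200"]),
    Stream({"namespace": "auth", "app": "keycloak", "pod": "keycloak-0"},
           ["type=LOGIN_ERROR realmId=demo"]),
]


def select_streams(streams: List[Stream], selector: Dict[str, str]) -> List[Stream]:
    """Keep the streams whose labels satisfy every equality matcher of the selector."""
    return [s for s in streams
            if all(s.labels.get(name) == value for name, value in selector.items())]


if __name__ == "__main__":
    # Equivalent of the selector {namespace="production", app="mon-app"}
    for stream in select_streams(STREAMS, {"namespace": "production", "app": "mon-app"}):
        print(stream.labels["pod"], stream.lines)
```

Only the labels are used to locate streams. This is why selectors built on a few low-cardinality labels such as `namespace` and `app` stay cheap to evaluate.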
LogQL queries in Grafana
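```logql
# Show the logs of a specific application
{namespace="production", app="mon-app"}
# Keep only lines containing "ERROR"
{namespace="production", app="mon-app"} |= "ERROR"
# Count error lines per minute, per application
sum by (app) (count_over_time({namespace="production"} |= "ERROR" [1m]))
# Per-second error rate, averaged over a 5-minute window
rate({namespace="production"} |= "ERROR" [5m])
# Keycloak logs, filtered on failed login events
{namespace="auth", app="keycloak"} |= "LOGIN_ERROR"
# Parse JSON logs and keep entries whose status field is 500 or above
{app="mon-app"} | json | status >= 500
```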
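The following Python sketch shows what these pipelines compute. It mimics them on a list of hypothetical `(timestamp, line)` entries:

- `|=` keeps the lines that contain a substring (case-sensitive).
- `count_over_time(...[1m])` counts the remaining lines per window. This is approximated here with fixed one-minute buckets, whereas Loki evaluates the range at each query step.
- `rate(...[5m])` divides that count by the range in seconds.
- `| json | status >= 500` parses each line and filters on the extracted field.

This is an illustration only, not Loki's code. Lines that are not JSON objects are simply skipped in this sketch.

```python
import json
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical log entries: (Unix timestamp in seconds, raw line)
Entry = Tuple[int, str]

ENTRIES: List[Entry] = [
    (0, '{"level": "INFO", "status": 200}'),
    (10, '{"level": "ERROR", "status": 503}'),
    (70, "ERROR connection refused"),
    (75, '{"level": "ERROR", "status": "500"}'),
]


def line_filter(entries: List[Entry], needle: str) -> List[Entry]:
    """Mimic |= "needle": keep lines that contain the substring (case-sensitive)."""
    return [(ts, line) for ts, line in entries if needle in line]


def count_per_minute(entries: List[Entry]) -> Dict[int, int]:
    """Approximate count_over_time(...[1m]) with fixed one-minute buckets."""
    return dict(Counter(ts - ts % 60 for ts, _ in entries))


def rate_per_second(entries: List[Entry], range_seconds: int) -> float:
    """Mimic rate(...[range]) for a single window: line count divided by the range."""
    return len(entries) / range_seconds


def json_status_at_least(entries: List[Entry], threshold: float) -> List[Entry]:
    """Mimic | json | status >= threshold; lines that are not JSON objects are skipped."""
    kept: List[Entry] = []
    for ts, line in entries:
        try:
            fields = json.loads(line)
        except json.JSONDecodeError:
            continue
        status = fields.get("status") if isinstance(fields, dict) else None
        try:
            if float(status) >= threshold:
                kept.append((ts, line))
        except (TypeError, ValueError):
            continue  # missing or non-numeric status field
    return kept


if __name__ == "__main__":
    errors = line_filter(ENTRIES, "ERROR")
    print(count_per_minute(errors))
    print(rate_per_second(errors, 300))
    print(json_status_at_least(ENTRIES, 500))
```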
Configuring alerts
```yaml
# prometheus-rules.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: app-alerts
  namespace: monitoring
  labels:
    # Assumption: the Prometheus Operator only loads rules matching its ruleSelector;
    # with kube-prometheus-stack this is usually the Helm release name
    release: prometheus
spec:
  groups:
    - name: application
      rules:
        # Alert if more than 5% of requests return a 5xx status
        # (both sides are summed per app so the division compares matching series)
        - alert: HighErrorRate
          expr: |
            sum by (namespace, app) (rate(app_requests_total{status=~"5.."}[5m]))
              / sum by (namespace, app) (rate(app_requests_total[5m])) > 0.05
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "High error rate on {{ $labels.app }}"
            description: "The 5xx error rate has been above 5% for 5 minutes."
        # Alert if a container restarted at least once in the last 15 minutes
        - alert: PodCrashLooping
          expr: increase(kube_pod_container_status_restarts_total[15m]) > 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.pod }} is crash looping"
        # Alert if CPU usage stays above 80%
        - alert: HighCpuUsage
          expr: |
            (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 0.8
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "CPU > 80% on {{ $labels.instance }}"
```
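The error-rate expression divides two rate vectors, and PromQL matches the two sides of a division on their full label sets. Dividing the 5xx rate directly by the total rate would therefore pair each `status="503"` series with itself and always return 1. Aggregating both sides with `sum by (...)` before dividing gives the intended ratio.

The Python sketch below reproduces both calculations on hypothetical per-second rates; the `app_requests_total` values are invented for illustration. It also shows the CPU formula: usage is 1 minus the average idle fraction across the cores of an instance.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical per-second rates, one series per (app, status) label set,
# as rate(app_requests_total[5m]) would return them
RATES: Dict[Tuple[str, str], float] = {
    ("mon-app", "200"): 90.0,
    ("mon-app", "500"): 4.0,
    ("mon-app", "503"): 6.0,
}


def is_5xx(status: str) -> bool:
    """Equivalent of the anchored regex matcher status=~"5.."."""
    return len(status) == 3 and status.startswith("5")


def naive_ratio(rates: Dict[Tuple[str, str], float]) -> Dict[Tuple[str, str], float]:
    """One-to-one matching on identical labels: each 5xx series meets only itself."""
    return {key: rates[key] / rates[key] for key in rates if is_5xx(key[1])}


def aggregated_ratio(rates: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """sum by (app) of the 5xx rates divided by sum by (app) of all rates."""
    errors: Dict[str, float] = defaultdict(float)
    total: Dict[str, float] = defaultdict(float)
    for (app, status), value in rates.items():
        total[app] += value
        if is_5xx(status):
            errors[app] += value
    return {app: errors[app] / total[app] for app in total if total[app] > 0}


def cpu_usage(idle_fractions: List[float]) -> float:
    """1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) for one instance."""
    return 1 - sum(idle_fractions) / len(idle_fractions)


if __name__ == "__main__":
    print(naive_ratio(RATES))                # every value is 1.0
    ratios = aggregated_ratio(RATES)
    print({app: r > 0.05 for app, r in ratios.items()})
    print(cpu_usage([0.25, 0.125, 0.125, 0.0]) > 0.8)
```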
Alertmanager: routing alerts
```yaml
# alertmanager.yaml
global:
  resolve_timeout: 5m
  # Email receivers require an SMTP relay (placeholder values, adapt to your server)
  smtp_smarthost: smtp.company.com:587
  smtp_from: alertmanager@company.com
route:
  receiver: default
  group_by: [alertname, namespace]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - matchers:
        - severity="critical"
      receiver: slack-critical
    - matchers:
        - severity="warning"
      receiver: email-team
receivers:
  - name: default
    email_configs:
      - to: devops@company.com
  - name: slack-critical
    slack_configs:
      - api_url: https://hooks.slack.com/services/xxx
        channel: "#alerts-critical"
        title: "{{ .GroupLabels.alertname }}"
        text: "{{ .CommonAnnotations.summary }}"
  - name: email-team
    email_configs:
      - to: team@company.com
        # email_configs has no "subject" field; the subject is set through headers
        headers:
          Subject: "[WARN] {{ .GroupLabels.alertname }}"
```
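Alertmanager evaluates the child routes in order. Because `continue` defaults to `false`, it stops at the first route whose matchers all match, and an alert that matches no child falls back to the root receiver. Alerts that share the `group_by` label values are then batched into a single notification.

The Python sketch below models only this behavior for the configuration above: equality matchers, no nested routes, no `group_wait` or `repeat_interval` timers. Groups are keyed by receiver, which stands in for the matched route here. The alert label sets are hypothetical, and this is a simplified illustration, not Alertmanager's code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Labels = Dict[str, str]
GroupKey = Tuple[str, Tuple[str, ...]]

# Child routes in declaration order: (equality matchers, receiver)
ROUTES: List[Tuple[Labels, str]] = [
    ({"severity": "critical"}, "slack-critical"),
    ({"severity": "warning"}, "email-team"),
]
DEFAULT_RECEIVER = "default"
GROUP_BY = ("alertname", "namespace")

# Hypothetical firing alerts
ALERTS: List[Labels] = [
    {"alertname": "HighErrorRate", "namespace": "production", "severity": "critical"},
    {"alertname": "PodCrashLooping", "namespace": "production", "severity": "warning", "pod": "a"},
    {"alertname": "PodCrashLooping", "namespace": "production", "severity": "warning", "pod": "b"},
    {"alertname": "Watchdog", "severity": "none"},
]


def route(alert: Labels) -> str:
    """Return the receiver of the first matching child route, else the root receiver."""
    for matchers, receiver in ROUTES:
        if all(alert.get(name) == value for name, value in matchers.items()):
            return receiver  # continue: false, so evaluation stops here
    return DEFAULT_RECEIVER


def group(alerts: List[Labels]) -> Dict[GroupKey, List[Labels]]:
    """Batch alerts by receiver and by the values of the group_by labels."""
    groups: Dict[GroupKey, List[Labels]] = defaultdict(list)
    for alert in alerts:
        key = (route(alert), tuple(alert.get(name, "") for name in GROUP_BY))
        groups[key].append(alert)
    return dict(groups)


if __name__ == "__main__":
    for (receiver, values), batch in group(ALERTS).items():
        print(receiver, values, len(batch), "alert(s)")
```

With this grouping, the two `PodCrashLooping` alerts from the same namespace produce one email rather than two.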
Recommended Grafana dashboards: import dashboards ID 315 (Kubernetes cluster), 7249 (Kubernetes pods) and 13770 (Node Exporter) from grafana.com/grafana/dashboards, using Grafana's dashboard import feature, which accepts these IDs directly.