Monitoring and observability (28 min read)

Prometheus and Grafana

Monitoring architecture

Monitoring architecture on Kubernetes:

+------------------------------------------------------------------+
|                        KUBERNETES CLUSTER                        |
|                                                                  |
|  +----------+  +----------+  +----------+  +----------+          |
|  | App Pod  |  | App Pod  |  | Keycloak |  | GitLab   |          |
|  | /metrics |  | /metrics |  | /metrics |  | /metrics |          |
|  +----+-----+  +----+-----+  +----+-----+  +----+-----+          |
|       |             |             |             |                |
|       +------+------+------+------+------+------+                |
|              |             |             |                       |
|              v             v             v                       |
|       +----------------------------------------------+           |
|       |           PROMETHEUS (collection)            |           |
|       |  - Scrapes the /metrics endpoints            |           |
|       |  - Stores the metrics (TSDB)                 |           |
|       |  - Evaluates alerting rules                  |           |
|       +----------------------------------------------+           |
|               |                      |                           |
|               v                      v                           |
|       +----------------+    +------------------+                 |
|       | ALERTMANAGER   |    | GRAFANA          |                 |
|       | - Routing      |    | - Dashboards     |                 |
|       | - Email, Slack |    | - Visualization  |                 |
|       | - PagerDuty    |    | - Visual alerts  |                 |
|       +----------------+    +------------------+                 |
|                                                                  |
|       +----------------------------------------------+           |
|       |           LOKI (centralized logs)            |           |
|       |  - Collects logs from all pods               |           |
|       |  - LogQL queries                             |           |
|       |  - Integrated into Grafana                   |           |
|       +----------------------------------------------+           |
+------------------------------------------------------------------+
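To make the scrape step in the diagram concrete, here is a minimal sketch of the kind of text Prometheus reads from a /metrics endpoint. The sample payload and the parse_metrics helper are illustrative assumptions, and the parser handles only simple lines of the text exposition format (no escaping, timestamps, or exemplars); Prometheus itself uses a complete parser.

```python
import re

# Hypothetical sample of what an application might return on /metrics.
SAMPLE = """\
# HELP app_requests_total Total HTTP requests
# TYPE app_requests_total counter
app_requests_total{method="GET",endpoint="/api/users",status="200"} 42
app_requests_total{method="GET",endpoint="/api/users",status="500"} 3
process_cpu_seconds_total 12.5
"""

# name, optional {labels}, then the sample value
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Return (name, labels, value) tuples from simple exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore anything this simplified parser cannot read
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ''))
        samples.append((name, labels, float(value)))
    return samples


samples = parse_metrics(SAMPLE)
for name, labels, value in samples:
    print(name, labels, value)
```

Each unique combination of metric name and label values becomes its own time series in the TSDB, which is why label values should stay low in cardinality.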

Deploying the monitoring stack with Helm

# Install kube-prometheus-stack (Prometheus + Grafana + Alertmanager)
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set grafana.adminPassword="admin-password" \
  --set prometheus.prometheusSpec.retention=30d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi

# Install Loki for logs
helm repo add grafana https://grafana.github.io/helm-charts

helm install loki grafana/loki-stack \
  --namespace monitoring \
  --set promtail.enabled=true \
  --set loki.persistence.enabled=true
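
Once the charts are installed, one way to check that Prometheus is actually scraping its targets is to query the built-in up metric through the Prometheus HTTP API. The sketch below assumes Prometheus is reachable at http://localhost:9090 (for example through a kubectl port-forward); the helper names and the sample response are hypothetical, and the network call is wrapped in a function so the parsing logic can be tried offline.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus is exposed locally, e.g. through a port-forward.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(base_url, promql):
    """Build the URL for an instant query against the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def targets_down(response_body):
    """Return the label sets of scrape targets whose `up` value is 0."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return [
        series["metric"]
        for series in payload["data"]["result"]
        if float(series["value"][1]) == 0.0
    ]


def check(base_url=PROMETHEUS_URL):
    """Query `up` on a live Prometheus and list targets it cannot scrape."""
    with urllib.request.urlopen(build_query_url(base_url, "up"), timeout=5) as resp:
        return targets_down(resp.read())


# Offline demonstration with a hand-written response in the API's JSON shape.
EXAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"job": "app", "instance": "10.0.0.7:9090"},
             "value": [1700000000, "0"]},
            {"metric": {"job": "grafana", "instance": "10.0.0.9:3000"},
             "value": [1700000000, "1"]},
        ],
    },
})
print(targets_down(EXAMPLE_RESPONSE))
```

Calling check() against a running cluster returns an empty list when every target is being scraped successfully.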

Application metrics (instrumentation)

# Expose metrics from your application (Python + prometheus_client)
# The route decorator and jsonify below imply Flask, so it is imported here.
from flask import Flask, jsonify
from prometheus_client import Counter, Histogram, start_http_server

app = Flask(__name__)

# Define the metrics
REQUEST_COUNT = Counter(
    'app_requests_total',
    'Total HTTP requests',
    ['method', 'endpoint', 'status']
)
REQUEST_LATENCY = Histogram(
    'app_request_duration_seconds',
    'HTTP request duration in seconds',
    ['method', 'endpoint']
)

# Instrument the code
@app.route('/api/users')
def get_users():
    with REQUEST_LATENCY.labels('GET', '/api/users').time():
        users = db.get_users()  # db: your application's data-access layer (not shown)
        REQUEST_COUNT.labels('GET', '/api/users', '200').inc()
        return jsonify(users)

# Start the metrics server on port 9090
start_http_server(9090)  # Metrics are served at /metrics
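
The Histogram above does not store individual durations; it exposes cumulative bucket counters (le="...") from which quantiles are estimated at query time. The following standard-library sketch illustrates that idea under stated assumptions: the bucket bounds and sample durations are hypothetical, and estimate_quantile is a simplified approximation in the spirit of PromQL's histogram_quantile, not its exact implementation.

```python
import bisect

# Hypothetical bucket upper bounds in seconds; the last bucket is +Inf.
BOUNDS = [0.1, 0.25, 0.5, 1.0, float("inf")]


def observe_all(durations, bounds=BOUNDS):
    """Return cumulative bucket counts, one per upper bound (le)."""
    counts = [0] * len(bounds)
    for d in durations:
        first = bisect.bisect_left(bounds, d)  # first bucket with bound >= d
        for i in range(first, len(bounds)):
            counts[i] += 1  # cumulative: every larger bucket counts it too
    return counts


def estimate_quantile(q, counts, bounds=BOUNDS):
    """Estimate a quantile by linear interpolation inside the matching bucket."""
    total = counts[-1]
    if total == 0:
        return float("nan")
    rank = q * total
    idx = next(i for i, c in enumerate(counts) if c >= rank)
    if bounds[idx] == float("inf"):
        return bounds[idx - 1]  # cannot interpolate inside the +Inf bucket
    lower = bounds[idx - 1] if idx > 0 else 0.0
    below = counts[idx - 1] if idx > 0 else 0
    in_bucket = counts[idx] - below
    if in_bucket == 0:
        return lower
    return lower + (bounds[idx] - lower) * (rank - below) / in_bucket


durations = [0.05, 0.08, 0.2, 0.3, 0.4, 0.45, 0.7, 0.9, 1.5, 0.12]
counts = observe_all(durations)
print(counts)
print(estimate_quantile(0.5, counts))
print(estimate_quantile(0.95, counts))
```

Because the estimate can only be as fine as the bucket layout, bucket bounds are usually chosen around the latency thresholds that matter for the service.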

Google's four Golden Signals: latency, traffic, errors, and saturation. These four signals cover the essentials for monitoring most user-facing services.
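
As a worked illustration, the four signals can be computed from a window of request records. This is a sketch under assumptions: the Request type, the golden_signals helper, and the sample numbers are hypothetical, and saturation is modelled here as in-flight requests over a fixed capacity, whereas in practice it depends on whichever resource is the bottleneck (CPU, memory, connection pool, and so on).

```python
from dataclasses import dataclass


@dataclass
class Request:
    duration_s: float  # time taken to serve the request
    status: int        # HTTP status code


def golden_signals(requests, window_s, in_flight, capacity):
    """Summarize one observation window into the four golden signals."""
    if not requests:
        raise ValueError("no requests in the window")
    durations = sorted(r.duration_s for r in requests)
    # Nearest-rank 95th percentile, using integer arithmetic for ceil(0.95 * n)
    rank = max(1, (95 * len(durations) + 99) // 100)
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "latency_p95_s": durations[rank - 1],       # latency
        "traffic_rps": len(requests) / window_s,    # traffic
        "error_ratio": errors / len(requests),      # errors
        "saturation": in_flight / capacity,         # saturation (assumed model)
    }


# Hypothetical 10-second window: 18 fast successes and 2 slow server errors.
window = [Request(0.1, 200)] * 18 + [Request(0.8, 500)] * 2
signals = golden_signals(window, window_s=10, in_flight=8, capacity=10)
for name, value in signals.items():
    print(f"{name}: {value}")
```

With the instrumentation shown earlier, latency maps to app_request_duration_seconds, while traffic and errors are both derived from app_requests_total using its status label.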