RTO and RPO
Two fundamental metrics define your recovery strategy:
- RTO (Recovery Time Objective): the maximum acceptable duration of a service interruption. How long can you afford to be offline?
- RPO (Recovery Point Objective): the maximum amount of data you can afford to lose, expressed as a span of time. How old can your most recent backup be?
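To make the link between RPO and a backup schedule concrete, here is a minimal sketch that estimates the worst-case data loss for periodic backups and compares it with a target. The function names and the figures (a 1-hour backup duration, a 4-hour RPO) are assumptions chosen for illustration, and the model assumes a backup captures the state at its start and is usable only once it has finished.

```python
from datetime import timedelta


def worst_case_data_loss(interval: timedelta, backup_duration: timedelta) -> timedelta:
    """Worst-case age of the newest usable backup when a disaster strikes.

    Assumption: a failure just before backup N+1 completes forces a restore
    from backup N, which started one interval plus one backup duration ago.
    """
    return interval + backup_duration


def meets_rpo(interval: timedelta, backup_duration: timedelta, rpo: timedelta) -> bool:
    """Return True if the schedule keeps worst-case data loss within the RPO."""
    return worst_case_data_loss(interval, backup_duration) <= rpo


daily = timedelta(hours=24)
duration = timedelta(hours=1)  # hypothetical backup duration

print(worst_case_data_loss(daily, duration))                # 1 day, 1:00:00
print(meets_rpo(daily, duration, rpo=timedelta(hours=4)))   # False
print(meets_rpo(timedelta(hours=1), timedelta(minutes=10),
                rpo=timedelta(hours=4)))                    # True
```

Under these assumptions a daily backup cannot satisfy a 4-hour RPO; a tighter target calls for more frequent backups or continuous replication.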
Backup strategies
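# Velero for backing up Kubernetes
# Installation. The AWS provider needs its plugin image; v1.9.0 is an example,
# pick the plugin version that matches your Velero release. The region is
# assumed to be the primary region used in the multi-region section.
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.9.0 \
  --bucket velero-backups \
  --backup-location-config region=eu-west-1 \
  --secret-file ./credentials-velero
# Scheduled backup: every day at 02:00, kept for 30 days (720h)
velero schedule create daily-backup \
  --schedule="0 2 * * *" \
  --include-namespaces production \
  --ttl 720h
# Restore. Backups created by a schedule are named <schedule>-<timestamp>;
# list the available ones with `velero backup get`.
velero restore create --from-backup daily-backup-20260601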
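The schedule and TTL together determine how far back you can restore. As a quick sanity check, the sketch below computes roughly how many unexpired backups exist at steady state; the helper name `retained_backups` is hypothetical, and the calculation assumes every scheduled backup succeeds.

```python
from datetime import timedelta


def retained_backups(interval: timedelta, ttl: timedelta) -> int:
    """Approximate number of unexpired backups once the schedule has run
    for longer than the TTL (assumes no failed or manually deleted backups)."""
    return ttl // interval


# Daily schedule with --ttl 720h, as in the Velero command above.
print(retained_backups(timedelta(days=1), timedelta(hours=720)))  # 30
```

With a daily schedule and a 720-hour TTL, the oldest restore point is about 30 days old, which also bounds how much object storage the backups consume.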
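# Database backups with Terraform
# (instance_class, storage, and credentials omitted for brevity)
resource "aws_db_instance" "main" {
  identifier              = "prod-db"
  engine                  = "postgres"
  engine_version          = "15"
  backup_retention_period = 7             # days of automated backups
  backup_window           = "03:00-04:00" # UTC
}

# Cross-region read replica. A replica is a separate resource; an instance
# cannot replicate from itself. Uses the aws.secondary provider alias from the
# multi-region section; the identifier "prod-db-replica" is a placeholder.
resource "aws_db_instance" "replica" {
  provider            = aws.secondary
  identifier          = "prod-db-replica"
  replicate_source_db = aws_db_instance.main.arn
}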
Multi-region architecture
# Terraform: multi-region infrastructure
provider "aws" {
  alias  = "primary"
  region = "eu-west-1"
}
provider "aws" {
  alias  = "secondary"
  region = "eu-west-3"
}
# Route53 health check + failover
resource "aws_route53_health_check" "primary" {
  fqdn              = "primary.example.com"
  port              = 443
  type              = "HTTPS"
  failure_threshold = 3  # consecutive failed checks before marking unhealthy
  request_interval  = 10 # seconds between checks
}
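resource "aws_route53_record" "app" {
  zone_id         = aws_route53_zone.main.zone_id
  name            = "app.example.com"
  type            = "A"
  set_identifier  = "primary"
  health_check_id = aws_route53_health_check.primary.id

  failover_routing_policy {
    type = "PRIMARY"
  }

  alias {
    name                   = aws_lb.primary.dns_name
    zone_id                = aws_lb.primary.zone_id
    evaluate_target_health = true # required argument of the alias block
  }
}
# A matching record with type = "SECONDARY", pointing at the load balancer in
# the secondary region, completes the failover pair.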
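The health check settings put a floor under your achievable RTO. The sketch below gives a rough estimate of how long clients may keep hitting the failed region: detection time (failure threshold times request interval) plus the time cached DNS answers stay valid. The function names are hypothetical, and the 60-second DNS TTL is an assumed value; real-world propagation also depends on resolver behaviour.

```python
def detection_seconds(failure_threshold: int, request_interval: int) -> int:
    """Approximate time for the health check to be marked unhealthy."""
    return failure_threshold * request_interval


def estimated_failover_seconds(failure_threshold: int,
                               request_interval: int,
                               dns_ttl: int) -> int:
    """Detection time plus the longest a cached DNS answer may live.

    dns_ttl is an assumption supplied by the caller, not read from Route53.
    """
    return detection_seconds(failure_threshold, request_interval) + dns_ttl


# Values from the health check above: failure_threshold = 3, request_interval = 10.
print(detection_seconds(3, 10))                       # 30
print(estimated_failover_seconds(3, 10, dns_ttl=60))  # 90
```

If the estimate exceeds your RTO, lowering the failure threshold shortens detection at the cost of more false positives.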
Kubernetes failover
# Cluster federation with Submariner or Liqo
# PodDisruptionBudget for high availability
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: mon-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: mon-app
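To see what `minAvailable: 2` allows in practice, here is a small sketch of the arithmetic behind voluntary disruptions (node drains, for example): the number of Pods that may be evicted is the count of healthy Pods minus `minAvailable`, never below zero. The function name is hypothetical and the replica counts are example values.

```python
def allowed_disruptions(healthy_pods: int, min_available: int) -> int:
    """Number of Pods that can be voluntarily evicted without violating
    the PodDisruptionBudget."""
    return max(0, healthy_pods - min_available)


print(allowed_disruptions(healthy_pods=3, min_available=2))  # 1
print(allowed_disruptions(healthy_pods=2, min_available=2))  # 0
```

With only two replicas, this budget blocks every eviction, so a node drain would stall; running at least three replicas leaves room for one disruption at a time.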
# topologySpreadConstraints to spread Pods across zones (Pod spec fragment)
spec:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: mon-app
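The effect of `maxSkew: 1` with `DoNotSchedule` can be illustrated with a simplified model of the scheduler's check: a Pod may land in a zone only if that zone's count after placement, minus the smallest count across zones, stays within `maxSkew`. This sketch ignores other scheduling factors such as node affinity, taints, and `minDomains`; the function name and zone names are assumptions.

```python
def can_schedule(pods_per_zone: dict[str, int], zone: str, max_skew: int = 1) -> bool:
    """Return True if adding one Pod to `zone` keeps the skew within max_skew.

    Skew is the Pod count in the target zone after placement minus the
    current minimum Pod count across all zones.
    """
    after_placement = pods_per_zone[zone] + 1
    return after_placement - min(pods_per_zone.values()) <= max_skew


# Hypothetical distribution of matching Pods across three zones.
zones = {"eu-west-1a": 2, "eu-west-1b": 1, "eu-west-1c": 1}

print(can_schedule(zones, "eu-west-1a"))  # False (3 - 1 = 2 exceeds maxSkew)
print(can_schedule(zones, "eu-west-1b"))  # True  (2 - 1 = 1)
```

Because `whenUnsatisfiable` is `DoNotSchedule`, a Pod that fits no zone stays Pending; if a whole zone is lost, check that this does not prevent the application from scaling in the remaining zones.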
Important: Test your recovery plans regularly. Treat an untested backup as a backup that does not work. Schedule quarterly DR drills and measure the actual recovery time and data loss against your RTO and RPO.