
Chaos Engineering

Chaos Engineering lets you simulate email delivery failures so you can test how your application handles them. It is intended for test environments, where it helps verify retry logic, error handling, and resilience.

Chaos engineering is disabled by default. To enable it, set the environment variable:

VSB_CHAOS_ENABLED=true

When disabled, all chaos-related API endpoints return 403 Forbidden.

VaultSandbox supports five types of chaos injection, evaluated in priority order:

| Priority | Type | Description |
|----------|------|-------------|
| 1 | Connection Drop | Drops the SMTP connection before sending a response |
| 2 | Greylist | Simulates greylisting by rejecting initial delivery attempts |
| 3 | Random Error | Returns random SMTP error codes (4xx temporary or 5xx permanent) |
| 4 | Blackhole | Accepts the email but silently discards it |
| 5 | Latency | Injects delays before SMTP responses |

When multiple chaos types are enabled, only the first matching action (in priority order) is applied per email.
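
The priority rule can be pictured with a small Python sketch. It is an illustration of the rule as described here, not VaultSandbox's implementation: the `triggered` argument is a hypothetical input standing for the chaos types whose own conditions matched for one email, and the names in `PRIORITY` are the config keys used in this page's JSON examples.

```python
from typing import Iterable, Optional

# Priority order from the table above, written with the config key
# names that appear in this page's JSON examples.
PRIORITY = ["connectionDrop", "greylist", "randomError", "blackhole", "latency"]


def first_matching_action(triggered: Iterable[str]) -> Optional[str]:
    """Return the highest-priority chaos type among those that triggered.

    `triggered` is a hypothetical input: the chaos types whose own
    conditions (enabled flag, probability roll, greylist state) matched
    for a single email. Returns None when nothing triggered.
    """
    triggered_set = set(triggered)
    for chaos_type in PRIORITY:
        if chaos_type in triggered_set:
            return chaos_type
    return None
```

Under this reading, an email for which both greylist and latency match would be greylisted and receive no added latency.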


Latency injection adds artificial delays to SMTP responses to test timeout handling and slow-server scenarios.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| enabled | boolean | | Enable latency injection |
| minDelayMs | number | 500 | Minimum delay in milliseconds |
| maxDelayMs | number | 10000 | Maximum delay in milliseconds (max: 60000) |
| jitter | boolean | true | Randomize delay within min/max range |
| probability | number | 1.0 | Probability of applying delay (0.0-1.0) |

Example:

{
  "enabled": true,
  "latency": {
    "enabled": true,
    "minDelayMs": 1000,
    "maxDelayMs": 5000,
    "jitter": true,
    "probability": 0.5
  }
}

Connection drop simulates network failures by dropping the TCP connection after receiving the email but before sending the response.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| enabled | boolean | | Enable connection dropping |
| probability | number | 1.0 | Probability of dropping (0.0-1.0) |
| graceful | boolean | true | true = TCP FIN (graceful), false = TCP RST (abrupt) |

Example:

{
  "enabled": true,
  "connectionDrop": {
    "enabled": true,
    "probability": 0.3,
    "graceful": false
  }
}

Random error injection returns randomly chosen SMTP error codes to test error handling and retry logic.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| enabled | boolean | | Enable random errors |
| errorRate | number | 0.1 | Error rate (0.0-1.0, e.g., 0.1 = 10% failure) |
| errorTypes | array | ["temporary"] | Error types: temporary (4xx) and/or permanent (5xx) |

Temporary Errors (4xx) — Client should retry:

  • 421 - Service temporarily unavailable
  • 450 - Mailbox busy, try again later
  • 451 - Temporary processing error
  • 452 - Insufficient storage

Permanent Errors (5xx) — Client should not retry:

  • 550 - Mailbox not found
  • 551 - User not local
  • 552 - Message size exceeds limit
  • 553 - Mailbox name invalid
  • 554 - Transaction failed

Example:

{
  "enabled": true,
  "randomError": {
    "enabled": true,
    "errorRate": 0.2,
    "errorTypes": ["temporary", "permanent"]
  }
}
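
On the client side, the temporary/permanent split comes down to checking the first digit of the reply code. A minimal Python sketch follows; the function name and the returned labels are illustrative choices, not part of VaultSandbox.

```python
def classify_smtp_reply(code: int) -> str:
    """Map an SMTP reply code to 'success', 'retry', or 'fail'.

    2xx/3xx replies mean the command was accepted, 4xx replies are
    temporary failures worth retrying, and 5xx replies are permanent.
    """
    if 200 <= code < 400:
        return "success"
    if 400 <= code < 500:
        return "retry"
    if 500 <= code < 600:
        return "fail"
    raise ValueError(f"not a valid SMTP reply code: {code}")
```

A sending application would typically queue the message again on "retry" and report a bounce on "fail".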

Greylist mode simulates greylisting: initial delivery attempts are rejected, but subsequent retries succeed.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| enabled | boolean | | Enable greylisting simulation |
| retryWindowMs | number | 300000 | Time window for tracking retries (ms, default: 5 min) |
| maxAttempts | number | 2 | Accept after N delivery attempts |
| trackBy | string | ip_sender | How to identify retries: ip, sender, or ip_sender |

Example:

{
  "enabled": true,
  "greylist": {
    "enabled": true,
    "retryWindowMs": 60000,
    "maxAttempts": 3,
    "trackBy": "ip_sender"
  }
}
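
To reason about how these three fields interact, it can help to model the behavior in a few lines of Python. The sketch below is a guess at the semantics, not VaultSandbox's code. It assumes that the attempt numbered `maxAttempts` is the first one accepted, that the window is measured from the first attempt, and that an expired window restarts the count. The class and method names are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class GreylistModel:
    """Toy model of greylisting driven by retryWindowMs, maxAttempts, trackBy."""

    def __init__(self, retry_window_ms: int = 300_000, max_attempts: int = 2,
                 track_by: str = "ip_sender",
                 clock: Callable[[], float] = time.monotonic) -> None:
        if track_by not in ("ip", "sender", "ip_sender"):
            raise ValueError(f"unknown trackBy value: {track_by}")
        self.retry_window_ms = retry_window_ms
        self.max_attempts = max_attempts
        self.track_by = track_by
        self.clock = clock
        # key -> (time of first attempt in seconds, attempts seen so far)
        self._seen: Dict[str, Tuple[float, int]] = {}

    def _key(self, ip: str, sender: str) -> str:
        if self.track_by == "ip":
            return ip
        if self.track_by == "sender":
            return sender
        return f"{ip}|{sender}"

    def should_accept(self, ip: str, sender: str) -> bool:
        """Record one delivery attempt and say whether it would be accepted."""
        now = self.clock()
        key = self._key(ip, sender)
        first_seen, attempts = self._seen.get(key, (now, 0))
        if (now - first_seen) * 1000 > self.retry_window_ms:
            # Assumption: an expired window starts the count over.
            first_seen, attempts = now, 0
        attempts += 1
        if attempts >= self.max_attempts:
            self._seen.pop(key, None)
            return True
        self._seen[key] = (first_seen, attempts)
        return False
```

With the example configuration above (`maxAttempts: 3`), this model rejects the first two attempts from a given IP and sender pair and accepts the third, provided all three arrive within 60 seconds of the first.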

Blackhole mode accepts emails normally but discards them instead of storing them. It is useful for testing scenarios where emails “disappear” without errors.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| enabled | boolean | | Enable blackhole mode |
| triggerWebhooks | boolean | false | Whether to still trigger webhooks for blackholed emails |

Example:

{
  "enabled": true,
  "blackhole": {
    "enabled": true,
    "triggerWebhooks": false
  }
}

Chaos configurations can be set to auto-expire to prevent accidentally leaving chaos enabled:

{
  "enabled": true,
  "expiresAt": "2025-01-15T12:00:00Z",
  "latency": {
    "enabled": true,
    "minDelayMs": 2000,
    "maxDelayMs": 5000
  }
}

After the expiresAt timestamp, chaos rules are ignored and emails are processed normally.
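
In a test suite it is often more convenient to compute `expiresAt` relative to the current time than to hard-code it. The helper below is a sketch of one way to do that in Python. The name `expires_in` is hypothetical, and the output format simply mirrors the UTC timestamp shown in the example above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def expires_in(minutes: int, now: Optional[datetime] = None) -> str:
    """Return a UTC timestamp `minutes` from now, e.g. 2025-01-15T12:00:00Z.

    `now` should be a timezone-aware datetime; it defaults to the current
    UTC time and exists mainly to make the helper easy to test.
    """
    current = now or datetime.now(timezone.utc)
    expiry = (current + timedelta(minutes=minutes)).astimezone(timezone.utc)
    return expiry.strftime("%Y-%m-%dT%H:%M:%SZ")


# Example: a latency configuration that stops applying after 15 minutes.
chaos_config = {
    "enabled": True,
    "expiresAt": expires_in(15),
    "latency": {"enabled": True, "minDelayMs": 2000, "maxDelayMs": 5000},
}
```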


All chaos endpoints require VSB_CHAOS_ENABLED=true and authentication via the X-API-Key header.

Set chaos configuration when creating an inbox:

curl -X POST https://your-gateway/api/inboxes \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "chaos": {
      "enabled": true,
      "latency": {
        "enabled": true,
        "minDelayMs": 1000,
        "maxDelayMs": 3000
      }
    }
  }'

Retrieve the current chaos configuration for an inbox:

GET /api/inboxes/:emailAddress/chaos

Response:

{
  "enabled": true,
  "latency": {
    "enabled": true,
    "minDelayMs": 1000,
    "maxDelayMs": 3000,
    "jitter": true,
    "probability": 1.0
  }
}

Set the chaos configuration for an existing inbox:

POST /api/inboxes/:emailAddress/chaos

Request Body:

{
  "enabled": true,
  "randomError": {
    "enabled": true,
    "errorRate": 0.5,
    "errorTypes": ["temporary"]
  }
}

Remove the chaos configuration from an inbox:

DELETE /api/inboxes/:emailAddress/chaos

Returns 204 No Content on success.
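
The three per-inbox endpoints share a URL and differ only in method and body, so a test harness can build them with one helper. The Python sketch below uses only the standard library. The function name `chaos_request` is hypothetical, `https://your-gateway` and `your-api-key` are the placeholders from the curl example, and leaving `@` unescaped in the path is an assumption about what the gateway expects.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import quote


def chaos_request(base_url: str, api_key: str, email_address: str,
                  method: str, config: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request for an inbox's chaos endpoint."""
    # Assumption: '@' may stay literal in the path; other reserved
    # characters in the address (such as '+') are percent-encoded.
    path = f"/api/inboxes/{quote(email_address, safe='@')}/chaos"
    headers = {"X-API-Key": api_key}
    data = None
    if config is not None:
        data = json.dumps(config).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(base_url.rstrip("/") + path, data=data,
                                  headers=headers, method=method)


# Build a POST that enables temporary random errors for one inbox.
request = chaos_request(
    "https://your-gateway", "your-api-key", "test@example.com", "POST",
    {"enabled": True,
     "randomError": {"enabled": True, "errorRate": 0.5, "errorTypes": ["temporary"]}},
)
```

Passing the result to `urllib.request.urlopen(request)` sends it. Calling the helper with `"GET"` or `"DELETE"` and no `config` covers the other two endpoints.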


Chaos events are tracked in the metrics endpoint (GET /api/metrics):

| Metric | Description |
|--------|-------------|
| chaos.events_total | Total chaos events triggered |
| chaos.latency_injected_ms | Total milliseconds of latency injected |
| chaos.errors_returned_total | Total random errors returned |
| chaos.connections_dropped_total | Total connections dropped |
| chaos.greylist_rejections_total | Total greylist rejections |
| chaos.blackhole_total | Total emails blackholed |
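
Because these metrics are described as running totals, a test usually cares about how much they changed during a run rather than their absolute values. The Python sketch below compares two snapshots. The layout of the `GET /api/metrics` response is not described on this page, so the sketch assumes each snapshot has already been flattened into a dict keyed by the metric names above. The function name is hypothetical.

```python
from typing import Dict, Mapping

CHAOS_METRICS = [
    "chaos.events_total",
    "chaos.latency_injected_ms",
    "chaos.errors_returned_total",
    "chaos.connections_dropped_total",
    "chaos.greylist_rejections_total",
    "chaos.blackhole_total",
]


def chaos_deltas(before: Mapping[str, float],
                 after: Mapping[str, float]) -> Dict[str, float]:
    """Return how much each chaos metric grew between two snapshots.

    Both arguments are assumed to be flat dicts keyed by metric name;
    metrics missing from a snapshot are treated as 0.
    """
    return {name: after.get(name, 0) - before.get(name, 0)
            for name in CHAOS_METRICS}
```

A test that enables blackhole mode and sends three emails could then check that the `chaos.blackhole_total` delta is 3.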

Configure random temporary errors to verify your application retries failed deliveries:

{
  "enabled": true,
  "randomError": {
    "enabled": true,
    "errorRate": 0.5,
    "errorTypes": ["temporary"]
  }
}
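
The retry logic exercised by this configuration might look like the following Python sketch. It is one possible shape for a sender-side retry loop, not a prescribed implementation: `send` is a hypothetical zero-argument callable wrapping your own delivery code, and the exponential backoff schedule is an illustrative choice.

```python
import smtplib
import time
from typing import Callable


def send_with_retry(send: Callable[[], None], max_attempts: int = 5,
                    base_delay_s: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `send` until it succeeds and return the number of attempts used.

    Temporary (4xx) SMTP failures are retried with exponential backoff.
    Any other reply code, or running out of attempts, re-raises the error.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            send()
            return attempt
        except smtplib.SMTPResponseException as exc:
            temporary = 400 <= exc.smtp_code < 500
            if not temporary or attempt == max_attempts:
                raise
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

With an `errorRate` of 0.5, most messages should get through within a few attempts. Note that `smtplib` reports some failures with exceptions that are not `SMTPResponseException` subclasses, such as `SMTPRecipientsRefused` and `SMTPServerDisconnected`, so production code may need to handle those separately.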

Inject latency to test timeout and slow-server handling:

{
  "enabled": true,
  "latency": {
    "enabled": true,
    "minDelayMs": 10000,
    "maxDelayMs": 30000
  }
}
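
On the sending side, a test against this configuration typically sets a client timeout shorter than the injected delay and checks that the application fails cleanly. A minimal Python sketch with the standard `smtplib` module follows. The function name is hypothetical, and the host and port are whatever your sandbox's SMTP listener uses.

```python
import smtplib
from email.message import EmailMessage


def send_with_timeout(host: str, port: int, message: EmailMessage,
                      timeout_s: float = 5.0) -> bool:
    """Try to send `message`; return False instead of hanging on a slow server.

    `timeout_s` applies to the connection attempt and to each blocking
    socket operation afterwards.
    """
    try:
        with smtplib.SMTP(host, port, timeout=timeout_s) as smtp:
            smtp.send_message(message)
        return True
    except OSError:
        # smtplib's exceptions derive from OSError, so this covers connect
        # timeouts, read timeouts, and SMTP-level errors alike.
        return False
```

With the 10 to 30 second delays configured above and the default 5 second timeout, this helper would be expected to return False.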

Test that your email system correctly retries after greylist rejections:

{
  "enabled": true,
  "greylist": {
    "enabled": true,
    "maxAttempts": 2,
    "retryWindowMs": 300000
  }
}

Use blackhole mode to test how your application handles “lost” emails:

{
  "enabled": true,
  "blackhole": {
    "enabled": true
  }
}
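
Since a blackholed email produces no error, the only way to notice the loss is to wait for the message and give up after a deadline. The Python sketch below shows that pattern. `fetch` is a hypothetical callable that returns the expected email, or None if it has not arrived yet; how it looks up the email is left to your test code.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for_email(fetch: Callable[[], Optional[T]],
                   timeout_s: float = 30.0, interval_s: float = 1.0,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> Optional[T]:
    """Poll `fetch` until it returns an email or `timeout_s` elapses.

    Returns the email, or None if the deadline passes with nothing
    received, which is the outcome expected under blackhole mode.
    """
    deadline = clock() + timeout_s
    while True:
        email = fetch()
        if email is not None:
            return email
        if clock() >= deadline:
            return None
        sleep(interval_s)
```

A test can then assert that the result is None and that the application surfaces the missing email, for example by offering a resend.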

When chaos is enabled for an inbox, the Web UI displays:

  • Chaos status indicator in the mailbox sidebar
  • Chaos configuration button in the inbox header
  • Real-time chaos events in the SSE console

Keep the following safeguards in mind:

  • Chaos engineering is disabled by default (VSB_CHAOS_ENABLED=false)
  • All chaos API endpoints require API key authentication
  • Consider using expiresAt to auto-disable chaos configurations
  • Monitor chaos metrics to detect misconfiguration