MazeLabs Engine Architecture

From noisy operations data to replayable incident simulations.

MazeLabs turns raw logs, tickets, RCAs, runbooks, and system behavior into compressed evidence, masked AI context, deterministic simulation state, actor-driven war rooms, and measurable readiness scores.

Raw Inputs → Vault → Compressor → Masking → EvidenceTree → Simulation → War Room → Scoring
STEP 1

Ingest real operational context.

MazeLabs starts with the operational data teams already have: runbooks, SOPs, RCAs, incident tickets, CloudWatch logs, application logs, topology notes, alert payloads, and historical troubleshooting records.

Runbooks & SOPs
RCA Documents
CloudWatch Logs
App Error Traces
Incident Tickets
Alert Payloads
ERROR payment-api timeout upstream=orders-db duration=8200ms
INFO request_id=1001 status=200 path=/health
INFO request_id=1002 status=200 path=/health
ERROR payment-api timeout upstream=orders-db duration=8230ms
INFO request_id=1004 status=200 path=/health
WARN replica_lag seconds=23 cluster=orders-prod
ERROR payment-api timeout upstream=orders-db duration=8260ms
INFO request_id=1007 status=200 path=/health
INFO request_id=1008 status=200 path=/health
ERROR payment-api timeout upstream=orders-db duration=8290ms
WARN replica_lag seconds=28 cluster=orders-prod
INFO request_id=1011 status=200 path=/health
ERROR payment-api timeout upstream=orders-db duration=8320ms
INFO request_id=1013 status=200 path=/health
INFO request_id=1014 status=200 path=/health
ERROR payment-api timeout upstream=orders-db duration=8350ms
INFO request_id=1016 status=200 path=/health
INFO request_id=1017 status=200 path=/health
ERROR payment-api timeout upstream=orders-db duration=8380ms
INFO request_id=1019 status=200 path=/health
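As a minimal sketch, lines like the sample above could be split into a level, a free-text message, and key=value fields before they are stored. The `parse_line` helper and the record shape are assumptions for illustration, not MazeLabs' actual ingest code.

```python
import re

# Hypothetical line shape: "<LEVEL> <bare words> key=value key=value ..."
KV = re.compile(r"(\w+)=(\S+)")

def parse_line(line: str) -> dict:
    """Split one log line into level, message, and key=value fields."""
    level, _, rest = line.strip().partition(" ")
    fields = dict(KV.findall(rest))
    # Tokens without "=" form the free-text message, e.g. "payment-api timeout".
    message = " ".join(tok for tok in rest.split() if "=" not in tok)
    return {"level": level, "message": message, "fields": fields}

record = parse_line("ERROR payment-api timeout upstream=orders-db duration=8200ms")
health = parse_line("INFO request_id=1001 status=200 path=/health")
```

Keeping the fields as a dictionary makes it easy to drop repetitive health-check lines or group errors by upstream in later stages.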
SECURE VAULT
EVIDENCE EXTRACT
STEP 2

Raw data stays in the vault.

Raw operational documents and logs are stored inside a controlled MazeLabs vault. The vault keeps source material, metadata, provenance, and extracted evidence separate.

The AI layer does not need unrestricted access to raw sensitive documents.

  • Tracks source provenance
  • Separates raw data from AI-consumable evidence
  • Keeps sensitive operational context controlled
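One way to picture this separation is a store that keeps raw sources private and only hands out evidence records that point back to their source by ID and content hash. This is a sketch under assumptions: the `Vault`, `RawSource`, and `Evidence` names and the SHA-256 provenance pointer are illustrative, not the product's API.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class RawSource:
    source_id: str
    kind: str        # e.g. "runbook", "cloudwatch_log" (hypothetical labels)
    content: str     # raw text; never leaves the vault

    @property
    def sha256(self) -> str:
        return hashlib.sha256(self.content.encode("utf-8")).hexdigest()

@dataclass(frozen=True)
class Evidence:
    summary: str          # AI-consumable statement
    source_id: str        # provenance: which raw source it came from
    source_sha256: str    # provenance: hash of the raw content at extraction time

class Vault:
    def __init__(self):
        self._raw = {}        # raw material, kept private
        self._evidence = []   # extracted evidence, kept separately

    def store(self, source: RawSource) -> None:
        self._raw[source.source_id] = source

    def extract(self, source_id: str, summary: str) -> Evidence:
        src = self._raw[source_id]
        ev = Evidence(summary, src.source_id, src.sha256)
        self._evidence.append(ev)
        return ev

    def ai_view(self) -> list:
        """Only evidence records are exposed; raw content is not included."""
        return list(self._evidence)

vault = Vault()
vault.store(RawSource("log-001", "cloudwatch_log", "WARN replica_lag seconds=23 cluster=orders-prod"))
vault.extract("log-001", "Replica lag warning on the orders cluster")
visible = vault.ai_view()
```

The hash lets a reviewer confirm later that a piece of evidence was derived from a specific, unmodified source without exposing that source to the AI layer.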
STEP 3

Compress telemetry into operational evidence.

Generic LLMs struggle with high-volume logs because such logs are repetitive, noisy, and full of sensitive identifiers. MazeLabs uses a Log Compressor to turn raw telemetry into structured incident signals.

Raw logs are not the product. Compressed operational evidence is the product.
BEFORE: 10,000 lines
[ERROR] 10:04:00 db_conn timeout db-01a req_id=8821 ip=10.4.2.0
[ERROR] 10:04:01 db_conn timeout db-01a req_id=8822 ip=10.4.2.1
[ERROR] 10:04:02 db_conn timeout db-01a req_id=8823 ip=10.4.2.2
[ERROR] 10:04:03 db_conn timeout db-01a req_id=8824 ip=10.4.2.3
[ERROR] 10:04:04 db_conn timeout db-01a req_id=8825 ip=10.4.2.4
[ERROR] 10:04:05 db_conn timeout db-01a req_id=8826 ip=10.4.2.5
[ERROR] 10:04:06 db_conn timeout db-01a req_id=8827 ip=10.4.2.6
[ERROR] 10:04:07 db_conn timeout db-01a req_id=8828 ip=10.4.2.7
[ERROR] 10:04:08 db_conn timeout db-01a req_id=8829 ip=10.4.2.8
[ERROR] 10:04:09 db_conn timeout db-01a req_id=8830 ip=10.4.2.9
[ERROR] 10:04:10 db_conn timeout db-01a req_id=8831 ip=10.4.2.10
[ERROR] 10:04:11 db_conn timeout db-01a req_id=8832 ip=10.4.2.11
[ERROR] 10:04:12 db_conn timeout db-01a req_id=8833 ip=10.4.2.12
[ERROR] 10:04:13 db_conn timeout db-01a req_id=8834 ip=10.4.2.13
[ERROR] 10:04:14 db_conn timeout db-01a req_id=8835 ip=10.4.2.14
AFTER: EVIDENCE PACK, 8 signals
CLUSTER: TIMEOUT_SPIKE
Database Connection Timeouts
Entity: db-01a
Window: 10:04:01 - 10:04:15
Events: 14,000+
Confidence: 0.95
CLUSTER: CASCADING_IMPACT
API 500 Responses
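The before/after idea can be sketched as template clustering: variable parts of each line (request IDs, IPs) are replaced with placeholders, and lines that share a template collapse into one cluster with a count and a time window. The regexes and the `compress` function are assumptions for illustration; the actual Log Compressor is not described at this level of detail.

```python
import re
from collections import defaultdict

# Assumed line shape: "[LEVEL] HH:MM:SS free text key=value ..."
LINE = re.compile(r"\[(?P<level>\w+)\] (?P<ts>\d{2}:\d{2}:\d{2}) (?P<body>.*)")

def template(body: str) -> str:
    """Replace variable values so repeated events share one template."""
    body = re.sub(r"\b\d{1,3}(?:\.\d{1,3}){3}\b", "<IP>", body)
    body = re.sub(r"(\w+)=\d+", r"\1=<N>", body)
    return body

def compress(lines):
    clusters = defaultdict(lambda: {"count": 0, "first": None, "last": None})
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue  # skip lines that do not fit the assumed shape
        c = clusters[(m["level"], template(m["body"]))]
        c["count"] += 1
        ts = m["ts"]
        # Zero-padded HH:MM:SS strings compare correctly as text.
        c["first"] = ts if c["first"] is None else min(c["first"], ts)
        c["last"] = ts if c["last"] is None else max(c["last"], ts)
    return [{"level": lvl, "template": tpl, **c} for (lvl, tpl), c in clusters.items()]

signals = compress([
    "[ERROR] 10:04:00 db_conn timeout db-01a req_id=8821 ip=10.4.2.0",
    "[ERROR] 10:04:01 db_conn timeout db-01a req_id=8822 ip=10.4.2.1",
    "[ERROR] 10:04:02 db_conn timeout db-01a req_id=8823 ip=10.4.2.2",
])
```

Three raw lines become a single signal here; the same grouping applied to thousands of lines is what keeps the evidence pack small.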
STEP 4

Mask sensitive context before AI reasoning.

Before evidence is passed to AI agents or model providers, MazeLabs applies masking and redaction policies. The AI does not need the raw secret to understand that a timeout occurred.

RAW LOG (VAULT ONLY)
customer_id=918273 user_email=alice@company.com host=prod-db-17 ip=10.2.4.8 token=sk_live_xxx
MASKED EVIDENCE (AI VISIBLE)
customer_id=[CUSTOMER_ID] user_email=[EMAIL] host=[HOST] ip=[PRIVATE_IP] token=[SECRET]
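A masking pass of this kind can be sketched with two layers of rules: value patterns (emails, private IPs) and key-based rules for fields whose values have no recognizable shape. The rule set and the `mask` function below are assumptions for illustration, not the product's redaction policy.

```python
import re

# Hypothetical policy. Value patterns run first, then key-based rules.
VALUE_RULES = [
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[EMAIL]"),
    (re.compile(
        r"\b(?:10(?:\.\d{1,3}){3}"
        r"|192\.168(?:\.\d{1,3}){2}"
        r"|172\.(?:1[6-9]|2\d|3[01])(?:\.\d{1,3}){2})\b"
    ), "[PRIVATE_IP]"),
]
KEY_RULES = {"customer_id": "[CUSTOMER_ID]", "host": "[HOST]", "token": "[SECRET]"}
KEY_PATTERN = re.compile(r"\b(" + "|".join(KEY_RULES) + r")=\S+")

def mask(line: str) -> str:
    for pattern, placeholder in VALUE_RULES:
        line = pattern.sub(placeholder, line)
    # Keep the key so the AI still sees which kind of field was present.
    return KEY_PATTERN.sub(lambda m: f"{m.group(1)}={KEY_RULES[m.group(1)]}", line)

raw = ("customer_id=918273 user_email=alice@company.com "
       "host=prod-db-17 ip=10.2.4.8 token=sk_live_xxx")
masked = mask(raw)
```

Typed placeholders preserve the structure of the event, so downstream reasoning can still tell a host from a token without seeing either value.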
STEP 5 & 6

Build an EvidenceTree from messy inputs.

MazeLabs combines compressed telemetry, incident tickets, RCAs, and runbooks into an EvidenceTree. This graph organizes the incident into symptoms, signals, checks, hypotheses, and valid actions.

This makes simulations replayable and auditable. Instead of relying on uncontrolled LLM memory, the runtime uses structured evidence packs and known state transitions.
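Replayability follows from keeping state transitions in a fixed table rather than in model memory: the same action sequence always yields the same timeline. The phase and action names below are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical scenario phases and the actions that move between them.
TRANSITIONS = {
    ("degraded", "check_replica_lag"): "db_suspected",
    ("db_suspected", "validate_db_saturation"): "db_confirmed",
    ("db_confirmed", "failover"): "recovered",
    ("degraded", "failover"): "unsafe_failover",
}

def replay(actions, start="degraded"):
    """Apply actions in order and return the full state timeline."""
    state = start
    timeline = [state]
    for action in actions:
        # Unknown (state, action) pairs leave the state unchanged.
        state = TRANSITIONS.get((state, action), state)
        timeline.append(state)
    return timeline

good_run = replay(["check_replica_lag", "validate_db_saturation", "failover"])
bad_run = replay(["failover"])
```

Because the table is plain data, a run can be audited after the fact by replaying the recorded actions against it.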

1 Ticket → 20 Scenario Variants
Incident: API Latency
Symptom: 5xx Spike
Signal: Timeout
Hypothesis: DB Saturation
Action: Failover
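The chain above can be represented as a small tree in which every node carries a kind, a label, and a pointer back to its source. The `EvidenceNode` class and the source labels are assumptions for illustration; only the node kinds and labels come from the example.

```python
from dataclasses import dataclass, field

@dataclass
class EvidenceNode:
    kind: str                 # Incident, Symptom, Signal, Hypothesis, Action
    label: str
    source: str = ""          # hypothetical provenance label
    children: list = field(default_factory=list)

    def add(self, kind, label, source=""):
        child = EvidenceNode(kind, label, source)
        self.children.append(child)
        return child

    def walk(self, depth=0):
        """Yield (depth, node) pairs in depth-first order."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)

root = EvidenceNode("Incident", "API Latency", source="ticket")
symptom = root.add("Symptom", "5xx Spike", source="alert payload")
signal = symptom.add("Signal", "Timeout", source="compressed logs")
hypothesis = signal.add("Hypothesis", "DB Saturation", source="RCA")
hypothesis.add("Action", "Failover", source="runbook")

outline = [f"{'  ' * depth}{node.kind}: {node.label}" for depth, node in root.walk()]
```

A tree of this shape can hold several competing hypotheses under one signal, which is one plausible route from a single ticket to many scenario variants.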
STEP 7 & 8

Realistic actor behavior driven by state.

Simulations include actors such as SREs, DBREs, and stakeholders. They do not improvise freely: their behavior is bounded by scenario state, hidden evidence, and time pressure.

Adversarial Reviewer

Challenges weak assumptions and unsafe actions during the incident.

"You are treating symptoms without validating the database signal. Check replica lag first."

Customer Stakeholder

Raises business-impact urgency once elapsed time exceeds scenario thresholds.

"Customer checkouts are still failing. What is the recovery ETA? We need an update."
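State-driven actors can be sketched as a pure function of scenario state: given the same state, the same actors speak. The `ScenarioState` fields and the 15-minute threshold are assumptions for illustration; the two messages are the ones quoted above.

```python
from dataclasses import dataclass

@dataclass
class ScenarioState:
    elapsed_minutes: int
    checked_replica_lag: bool
    proposed_action: str = ""

def actor_messages(state: ScenarioState, stakeholder_threshold: int = 15):
    """Return (actor, message) pairs triggered by the current state."""
    messages = []
    # The reviewer challenges a failover proposed before the DB signal is validated.
    if state.proposed_action == "failover" and not state.checked_replica_lag:
        messages.append((
            "Adversarial Reviewer",
            "You are treating symptoms without validating the database signal. "
            "Check replica lag first.",
        ))
    # The stakeholder escalates only after the time threshold is exceeded.
    if state.elapsed_minutes > stakeholder_threshold:
        messages.append((
            "Customer Stakeholder",
            "Customer checkouts are still failing. What is the recovery ETA? "
            "We need an update.",
        ))
    return messages

early = actor_messages(ScenarioState(5, checked_replica_lag=False, proposed_action="failover"))
late = actor_messages(ScenarioState(20, checked_replica_lag=True))
```

In a fuller system a model might phrase the message, but the decision about who speaks and why would still come from rules like these.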
STEP 8, 9 & 10

P1 War Room, Scoring & Debrief.

The War Room is an operational decision surface, not just a chat screen. Every action is evaluated, scored, and mapped to a replayable timeline for post-incident learning.

Agent-Driven P1 War Room

Presents evidence, transcript, stakeholder pressure, and action ledgers in one surface.

Readiness Scoring

  • Investigation: +12
  • Ignored Signal: -8
  • Unsafe Failover: -15
  • Valid Recovery: +10
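Scoring against an action ledger can be sketched as a running total over timestamped events. The point values are the ones listed above; the event keys, the ledger format, and the `score` function are assumptions for illustration.

```python
# Point values from the list above; the event keys are hypothetical.
WEIGHTS = {
    "investigation": 12,
    "ignored_signal": -8,
    "unsafe_failover": -15,
    "valid_recovery": 10,
}

def score(ledger):
    """ledger: list of (minute, event). Returns total and a scored timeline."""
    total = 0
    timeline = []
    for minute, event in ledger:
        delta = WEIGHTS.get(event, 0)   # unknown events score zero
        total += delta
        timeline.append((minute, event, delta, total))
    return total, timeline

total, timeline = score([
    (2, "investigation"),
    (5, "ignored_signal"),
    (9, "unsafe_failover"),
    (14, "valid_recovery"),
])
# total is 12 - 8 - 15 + 10 = -1
```

Keeping the running total per entry is what lets the score be mapped back onto the replayable timeline.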

Debrief & Replay

Generates a structured debrief showing which evidence was missed, which paths were correct, and where the team's capability gaps are.
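At its simplest, a debrief of this kind is a comparison between what the scenario made available and what the team actually used or did. The `debrief` function and its inputs are hypothetical and shown only to make the idea concrete.

```python
def debrief(available_evidence, used_evidence, actions, correct_path):
    """Compare a run against the scenario's evidence and reference path."""
    missed = [e for e in available_evidence if e not in used_evidence]
    off_path = [a for a in actions if a not in correct_path]
    return {
        "missed_evidence": missed,
        "off_path_actions": off_path,
        "correct_path": list(correct_path),
    }

report = debrief(
    available_evidence=["timeout_spike", "replica_lag", "api_500s"],
    used_evidence=["timeout_spike", "api_500s"],
    actions=["restart_api", "failover"],
    correct_path=["check_replica_lag", "validate_db_saturation", "failover"],
)
```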

Why this architecture matters.

Most teams already have the knowledge needed to train better incident responders, but it sits in raw logs, tickets, and RCAs that generic LLMs cannot safely reason over. MazeLabs creates a controlled middle layer that compresses, masks, and structures this evidence before simulation.

This end-to-end engine turns existing operational history into reusable readiness infrastructure.

1. Ingest

Runbooks, RCAs, tickets, CloudWatch logs, application logs, topology notes

2. Vault

Store raw operational context under customer control

3. Compress

Deduplicate logs, cluster errors, extract entities, build timelines, rank signals

4. Mask

Redact secrets, IPs, hostnames, customer IDs, emails, tokens, internal URLs

5. Evidence

Build source-linked EvidenceTrees and bounded EvidencePacks

6. Simulate

Create scenario phases, hidden evidence, valid checks, wrong paths, recovery criteria

7. Actors

Drive incident commander, SRE, DBRE, support, customer, coach, and reviewer behavior from scenario state

8. Bridge

Run P1/P0 incident drills with transcript, evidence, action ledger, stakeholder pressure, and timeline

9. Score

Measure evidence use, decisions, communication, escalation, recovery, and validation

10. Debrief

Replay missed signals, wrong assumptions, recovery quality, and follow-up labs

Turn your incident history into a simulation engine.

MazeLabs converts the logs, tickets, runbooks, and RCAs you already have into private, evidence-driven war-room simulations.